Creating insights from sources

Last updated:

|Edit this page

The data warehouse enables you to visualize and query data from external sources with PostHog's tools.

After linking your source, you can use its data in a trend like you would event or person data.

To do this, create a new insight, click the series, go to the Data warehouse tables tab, hover over the table you want, and click Select.

Selecting a data warehouse table in a trend

The main difference from events is that you must identify 3 fields from the data warehouse data to visualize it:

  • ID: A field that corresponds to the ID of the element.
  • Distinct ID: A field that corresponds to an ID representing the user associated with the element. If none, can match ID.
  • Timestamp: A timestamp field representing when the element was created.

It's infeasible for us to map these fields. We pre-fill with a best guess, but allow you to edit it.

Filters and breakdowns

When using data warehouse tables in insights, you can use properties from those tables filter and breakdown as you would with any other insight. For example, you could filter zendesk_tickets for ones where the status is open or hubspot_companies where lifecyclestage is lead.

Filtering and breaking down data warehouse data

Want to create insights with a combination of PostHog and external data? Check out our docs on joining tables and persons.

Querying sources with SQL

For more complicated queries, you can use SQL insights.

To start, either create a new SQL insight or go to the data warehouse tab. Here you can see the schemas all available external and PostHog tables as well as saved views

To see available sources, go to the data pipeline page. This page shows the external and PostHog tables as well as saved views you can visualize with trends or query through SQL insights.

Data warehouse tab in PostHog app

Here you can:

  1. Write your SQL query using your table like SELECT * FROM hubspot_contacts
  2. Click Update and run to see the results.
  3. Modify your query using SQL commands as needed to get the data you want like SELECT email FROM hubspot_contacts WHERE city = 'Brisbane'.
  4. Save your query.
Querying a source in PostHog

Querying multiple sources together

Much of the power of the data warehouse comes from combining multiple sources, like your Stripe or Hubspot data with your product analytics data. Two of the easiest ways of doing this are WHERE IN and JOIN SQL commands.

For example, to get a count of events for your Hubspot contacts you can filter events.distinct_id by email FROM hubspot_contacts like this:

SQL
SELECT COUNT() AS event_count, distinct_id
FROM events
WHERE distinct_id IN (SELECT email FROM hubspot_contacts)
GROUP BY distinct_id
ORDER BY event_count DESC

You can also use a JOIN such as INNER JOIN or LEFT JOIN to combine data. For example, to get a count of events for your Stripe customers you can INNER JOIN on distinct_id and email like this:

SQL
SELECT events.distinct_id, COUNT() AS event_count
FROM events
INNER JOIN prod_stripe_customer ON events.distinct_id = prod_stripe_customer.email
GROUP BY events.distinct_id
ORDER BY event_count DESC

Questions?

Was this page useful?

Next article

Joining data

The real power of the data warehouse is the ability to combine data from multiple tables in a single query. Joins enable you to do this. They enable you to choose fields that act as connections between PostHog and external sources. Table joins You can join external data on existing PostHog schemas and other external data tables. These joins are saved and interpreted anytime they're accessed on the origin table. To define a join, go to the data warehouse tab , click the three dots next to your…

Read next article