Snowplow 0.8.10 released with analytics cubes and recipes 'baked in'
We are pleased to announce the release of Snowplow 0.8.10. In this release, we have taken many of the SQL recipes we have covered in the Analysts Cookbook and ‘baked them’ into Snowplow by providing them as views that can be added directly to your Snowplow data in Amazon Redshift or PostgreSQL.
- Background on this release
- Reorganizing the Snowplow database
- Seeing a recipe in action: charting the number of uniques over time
- Seeing a cube in action: interrogating the visitors cube in Tableau
- Installing this release
- Next steps: where to go from here
1. Background on this release
One of the things we’ve learnt from many new Snowplow users is that they want to get up and running analyzing Snowplow data as fast as possible: often by putting a familiar business intelligence tool directly on top of Snowplow data, to start exploring and visualizing that data.
For those users, a frustration with Snowplow is that each analysis typically starts with having to write a SQL query on the Snowplow data, to transform it into a format suitable for analysing in a BI / OLAP tool. Whilst we believe it is a strength that Snowplow gives you the flexibility to design and structure a wide range of different analyses, we recognise that for new users in particular, it would be nicer if they could dive straight into the data in their BI tool of choice.
This release aims to bridge that gap: we are providing a range of recipes and cubes as SQL views on top of the atomic Snowplow data, all of which can be loaded directly into BI tools like Excel, ChartIO and Tableau.
2. Reorganizing the Snowplow database
With this release we have reorganized Snowplow data:
- The events table is now located in the atomic schema. (This is new for Redshift users, not for Postgres users.)
- There are three new schemas that contain “cubes” – i.e. views of the Snowplow data that can be consumed directly in a pivot / OLAP / BI tool e.g. PowerPivot or Tableau. These are cubes_pages, cubes_visits and cubes_transactions.
- There are three new schemas that contain “recipes” – views of the Snowplow data that can be visualized directly in any graphics package. These are recipes_basic, recipes_customer and recipes_catalog.
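To see exactly which views each of the new schemas provides, you can list them straight from the database catalog. A minimal sketch, which should work on both PostgreSQL and Redshift since both expose the standard pg_views catalog view:

```sql
-- List the recipe and cube views installed by this release,
-- together with the schema each one lives in.
SELECT schemaname, viewname
FROM pg_views
WHERE schemaname LIKE 'recipes_%'
   OR schemaname LIKE 'cubes_%'
ORDER BY schemaname, viewname;
```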
3. Seeing a recipe in action: charting the number of uniques over time
Let’s start by showing how easy the views make it to start plotting Snowplow data in ChartIO. Log into ChartIO and create a new connection to your database, just as before, but this time set the Schema name field to ‘recipes_basic’ rather than ‘atomic’. (This is necessary because ChartIO requires a different connection for each schema on your database, rather than a single connection per database.)
Now go back into ChartIO and select to create a new graph. Select your new data source: a long list of different “recipes” should be displayed. Select the Uniques And Visits By Day recipe from the list – this will reveal the dimensions and metrics returned in that view:
Now you can drag the relevant metrics and dimensions over from the list into the design pane. Let’s simply plot uniques by date: drag the Date dimension to the Dimensions pane, and the Uniques measure over to the Measures pane:
Now click Chart Query to draw the graph. Simple visualization of Snowplow data with no SQL required!
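Under the hood, ChartIO is simply selecting from the underlying view. A rough SQL equivalent of the chart above, assuming the recipe is exposed as a view named uniques_and_visits_by_day with columns matching the Date and Uniques labels shown in ChartIO (check the view definition in your own install for the exact names):

```sql
-- Daily count of unique visitors, as plotted in the ChartIO example above.
SELECT
  "date",   -- the Date dimension
  uniques   -- the Uniques measure
FROM recipes_basic.uniques_and_visits_by_day
ORDER BY "date";
```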
4. Seeing a cube in action: interrogating the visitors cube in Tableau
We’ve included a number of “cube views” in the new release. These can be opened directly in your pivoting tool of choice.
For this example, we’re going to open the cubes_visits.referer_entries_and_exits view directly in Tableau.
Open up Tableau, create a new database connection and enter your database details. Select the referer_entries_and_exits view to connect to.
Note: if you are connecting to the views in Redshift, you will need to add the new schemas to your search_path before they are visible in Tableau. You can, however, access them directly by selecting Custom SQL and entering SELECT * FROM cubes_visits.referer_entries_and_exits.
Tableau will ask whether you want to import all the data, or connect live. If you have a lot of data, we recommend connecting live.
You can now drag and drop any of the dimensions and any of the metrics listed. For example, we can drag in entry_page_path, visit_start_ts and visit_duration to see how average visit lengths have changed per landing page over time:
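The same analysis can be reproduced in plain SQL against the view. A rough sketch of the aggregation Tableau performs when you drag those three fields in, assuming visit_duration is a numeric column (adjust the date grain to taste):

```sql
-- Average visit duration per landing page per day,
-- mirroring the Tableau drag-and-drop described above.
SELECT
  entry_page_path,
  DATE_TRUNC('day', visit_start_ts) AS visit_day,
  AVG(visit_duration) AS avg_visit_duration
FROM cubes_visits.referer_entries_and_exits
GROUP BY entry_page_path, DATE_TRUNC('day', visit_start_ts)
ORDER BY visit_day, entry_page_path;
```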
5. Installing this release
5.1 Redshift users
If you’re using Redshift, you will need to migrate your Snowplow events table from the public schema to the atomic schema. This can be done using this migration script.
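If you would like a sense of what the migration involves before running it, the core steps look roughly like the following. This is a minimal sketch only, assuming you simply copy the data into a freshly created atomic.events table; the linked migration script is the authoritative version and may include additional changes:

```sql
-- Sketch of the Redshift migration: create the new schema, recreate the
-- events table inside it, copy the data across, and only drop the old
-- table once you have verified the copy (e.g. by comparing row counts).
CREATE SCHEMA atomic;

CREATE TABLE atomic.events (LIKE public.events);

INSERT INTO atomic.events
SELECT * FROM public.events;

-- After checking that row counts match:
-- DROP TABLE public.events;
```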
You will then need to update your StorageLoader config file to ensure that from now on, all new data is loaded into the atomic.events table, rather than the public.events table. You do this by updating the file so that the :table: key is set to ‘atomic.events’ rather than just ‘events’:
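A minimal sketch of the relevant line in the StorageLoader config file; everything else in your existing config stays as it is, and only the :table: value changes:

```yaml
# Illustrative fragment only - leave the rest of your StorageLoader
# config untouched and just point :table: at the new location.
:table: atomic.events   # previously just 'events'
```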
Now you need to create the new schemas for the different views, and define each view. The following six scripts need to be run:
- recipes-basic.sql
- recipes-catalog.sql
- recipes-customers.sql
- cube-visits.sql
- cube-transactions.sql
- cube-pages.sql
These can be run directly using the psql command line tool, as described in the setup guide.
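For example, to run the first script against a Redshift cluster (the hostname, database and username below are placeholders for your own connection details; psql will prompt for the password):

```bash
# Run each script in turn with psql; repeat for the other five files.
psql -h your-cluster.redshift.amazonaws.com -p 5439 \
     -U your_username -d your_database \
     -f recipes-basic.sql
```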
Finally, you will want to add the new schemas to your search_path. This is necessary for the views in these schemas to show up in tools like Tableau and SQL Workbench. An explanation of how to update the search path is given here, in the setup guide.
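For reference, a minimal sketch of how the search path might be set on Redshift, run by a user with the appropriate privileges (substitute your own username; the linked setup guide remains the authoritative reference):

```sql
-- Make the new schemas visible to tools that rely on the search path.
ALTER USER your_username SET search_path TO
  atomic, recipes_basic, recipes_customer, recipes_catalog,
  cubes_pages, cubes_visits, cubes_transactions;
```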
5.2 PostgreSQL users
If you are using PostgreSQL, your events data should already be in the atomic.events table.
All you need to do is update your events table definition, as per this migration script.
Afterwards, you can create the new schemas and views, by running the following scripts:
- recipes-basic.sql
- recipes-catalog.sql
- recipes-customers.sql
- cube-visits.sql
- cube-transactions.sql
- cube-pages.sql
These can be run directly using the psql command line tool, as described in the setup guide.
6. Next steps: where to go from here
We’ll be covering how to use the recipes and cubes in more detail in forthcoming blog posts, and of course adding new recipes to the Analysts Cookbook. In the meantime, we recommend that curious users start experimenting with the different views, refer to the underlying SQL to understand how each one is put together, and tweak those statements to deliver the data in whatever format they need.