
Introducing our BigQuery and Snowflake web data models

Following the recent release of our new generation web model for Redshift, we are very excited to announce the release of the Snowplow web data model for BigQuery and Snowflake as well. The new web model is now available for all three major storage targets, empowering Snowplow BDP customers and Snowplow Open Source users to work with their Snowplow data in an extensible, scalable and incremental manner.

Why we are investing in building data models

Modeling large amounts of raw data into various tables optimised for analytical use cases in a performant way is hard. It requires in-depth understanding of how the data is collected and structured, as well as how the data warehouse processes and stores the data. Testing large SQL data models is also hard. We want to support our customers and Open Source users with these challenges, so they can focus on applying their business logic and solving their data use cases.

With Snowplow you’re able to power multiple use cases from a single source of truth

Similarities and differences between the models

All three versions of the Snowplow web data model have the same modular structure and incrementalization logic, about which you can read more in the documentation. Essentially, independent of the storage target, the data model:

  • Runs incrementally on the same inputs.
  • Produces the same temporary and persistent output tables per module, on which any custom module can build further.
  • Logs metadata about every core module’s run.
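As a rough sketch of what this incremental pattern looks like in practice, each run stages only events newer than the last processed timestamp and then upserts them into a persistent table. The table and column names below are illustrative, not the model's actual schema:

```sql
-- Illustrative sketch only: derived.page_views and scratch_page_views
-- are hypothetical names, not the model's real output tables.

-- 1. Stage only events newer than the last processed timestamp.
CREATE OR REPLACE TEMPORARY TABLE scratch_page_views AS
SELECT
  page_view_id,
  domain_sessionid,
  derived_tstamp
FROM atomic.events
WHERE derived_tstamp > (SELECT MAX(derived_tstamp)
                        FROM derived.page_views);

-- 2. Upsert the staged rows into the persistent output table.
MERGE INTO derived.page_views AS t
USING scratch_page_views AS s
  ON t.page_view_id = s.page_view_id
WHEN MATCHED THEN UPDATE
  SET derived_tstamp = s.derived_tstamp
WHEN NOT MATCHED THEN INSERT (page_view_id, domain_sessionid, derived_tstamp)
  VALUES (s.page_view_id, s.domain_sessionid, s.derived_tstamp);
```

Because only new events are scanned and the merge is keyed on a stable identifier, reruns stay cheap and idempotent regardless of how much history the persistent table holds.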

However, each warehouse differs. To start with, the raw data is structured a bit differently: in Redshift the data is structured in federated tables, while in BigQuery and Snowflake all data is loaded into a single atomic.events table. In addition, schema versioning works a bit differently in BigQuery, where, unlike in Snowflake, even a non-breaking schema change results in a new column.

Furthermore, it was essential to adopt each target’s best practices, to take into account its unique properties, for example concerning partitioning, clustering and transactions, and to take advantage of its unique strengths to drive performance. This is also the reason why in BigQuery and Snowflake the models define and use some stored procedures, of which you can also take advantage in your custom modules.
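For instance, a custom module could invoke one of these procedures with a standard CALL statement. The procedure name and arguments below are hypothetical, shown only to illustrate the pattern; the actual procedures and their signatures are defined in the model's SQL:

```sql
-- Hypothetical procedure name and arguments, for illustration only.
CALL derived.column_check('scratch', 'page_views_staged',
                          'derived', 'page_views');
```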

In the official data models’ GitHub repository, besides the SQL code, users can also find the playbooks to run the model via SQL Runner, as well as detailed READMEs per storage target:

  1. Redshift
  2. BigQuery
  3. Snowflake
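A SQL Runner playbook ties a sequence of the model's SQL files to a storage target. The fragment below is a hypothetical sketch of the general shape only; the target names, variables and file paths are placeholders, so refer to the per-target README for the real playbooks:

```yaml
# Hypothetical playbook sketch; values are placeholders.
:targets:
  - :name: "my-snowflake-target"
    :type: snowflake
:variables:
  :output_schema: derived
:steps:
  - :name: 01-page-views
    :queries:
      - :name: 01-main
        :file: standard/01-page-views/01-main.sql
        :template: true
```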

Snowplow customers can also find the data modeling configuration files to run their data models via Snowplow BDP.

Start building out data models that power your business with Snowplow today

If you are interested in event data modeling and want to derive behavioral insights from granular, event-level data points, get in touch with us today! Alternatively, you can explore the event-level data Snowplow provides or try out Snowplow for yourself!

More about the author

Ada Tzereme

Ada Tzereme is a Graduate Engineer at Snowplow and a recent Information and Communication Systems graduate from the University of the Aegean. You can find her on GitHub.

