Introducing Snowplow 22.01 Western Ghats
In this post we are excited to discuss the latest features of the Snowplow platform as part of our third Snowplow OS Distribution since we introduced distributions in early 2021.
22.01 Western Ghats brings a wide range of improvements. Read on to find out more.
Snowplow 22.01 Western Ghats
So what’s new in 22.01 Western Ghats? Since the release of 21.08 North Cascades, we have focused on:
- Making it easier to get started on Google Cloud Platform with new Terraform modules
- Getting your Snowplow data to more destinations with Google Tag Manager Server Side
- Getting your data from more sources with new Trackers
- Landing your Snowplow events in core destinations more reliably, more cost-effectively and with better observability, thanks to our updated Loaders.
There are some other goodies discussed below too, so without any further delay let’s begin with our new getting started guide.
Open Source Quick Start for Google Cloud Platform
Following on from the launch of the Open Source Quick Start on AWS in 21.08 North Cascades, we are very pleased to announce support for GCP.
These Terraform modules make it far quicker and easier than ever before to get started with Snowplow open source on GCP. Coupled with a new quick start guide for GCP, they should give you everything you need to get up and running.
You can find more information in the Open Source Quick Start now available on GCP blog post and the quick start documentation.
Google Tag Manager Server Side support
With the introduction of Snowplow’s support for Google Tag Manager server-side you can now effortlessly forward your behavioral data to downstream destinations.
Get started with Snowplow’s out-of-the-box authored tags for Amplitude, Braze and Iterable, or utilise Google Tag Manager’s rich library of vendor and community authored tags such as Facebook Conversions, TikTok Ads and many more.
Alternatively, take full control by leveraging our new HTTP Request Tag Template to forward your behavioral data to any JSON HTTP destination.
GTM SS with Snowplow can be set up in two different configurations.
You can find more detailed information on these different configurations and the different Clients and Tags in our GTM SS announcement post.
Iglu Central Schema lists
Snowplow’s RDB loader and Postgres loader use the schema list endpoints in order to discover and gather all available schema patches and revisions, and therefore to create table columns with the correct types.
Until now, those loaders required the user to run an Iglu server, because that was the only style of Iglu repository that supported the list endpoints.
In order to use Iglu Central schemas, users had to manually upload those schemas to their own privately-run Iglu Server. With this new change, users can run the RDB loader and Postgres loader with an Iglu resolver that simply uses the http://iglucentral.com repository instead of self-hosting those schemas. We still recommend having your own Iglu Server or Static Iglu Repository with Schema Lists for your custom schemas.
The full announcement, with information about any impact this change might have on your pipelines, is available on Discourse.
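As a rough sketch of what such a resolver could look like, the snippet below builds an Iglu resolver configuration that points at Iglu Central and, optionally, at your own repository for custom schemas. The helper name `buildResolverConfig`, the `Acme Iglu Server` repository, its `com.acme` vendor prefix, the `https://iglu.acme.example` URL, and the cache size and priority values are all illustrative assumptions; check the resolver documentation for the exact config version your loader expects.

```javascript
// Hypothetical helper: builds an Iglu resolver config as a plain object.
// Iglu Central is always included; a custom repository is optional.
function buildResolverConfig(customRepo) {
  const repositories = [
    {
      name: "Iglu Central",
      priority: 0, // illustrative value
      vendorPrefixes: ["com.snowplowanalytics"],
      connection: { http: { uri: "http://iglucentral.com" } },
    },
  ];

  if (customRepo) {
    repositories.push({
      name: customRepo.name,
      priority: 1, // illustrative value
      vendorPrefixes: customRepo.vendorPrefixes,
      connection: { http: { uri: customRepo.uri } },
    });
  }

  return {
    schema: "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
    data: { cacheSize: 500, repositories },
  };
}

// Example: Iglu Central plus an assumed self-hosted repository.
const resolver = buildResolverConfig({
  name: "Acme Iglu Server",
  vendorPrefixes: ["com.acme"],
  uri: "https://iglu.acme.example",
});
console.log(JSON.stringify(resolver, null, 2));
```

The printed JSON can be saved to a file and passed to the loader wherever it expects its resolver configuration.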
Snowplow Destination Improvements
We’ve released a whole host of improvements across the Snowplow Loaders since 21.08. Many of these alone could take up an entire blog post so we’ll give each update a brief introduction here and links to where you can find more information.
Redshift
There have been two big releases of RDB Loader. v2.0.0 added SNS as a new destination for the Shredder’s shredding complete message, allowing multiple Loaders to run in parallel, and overhauled the config structure of both Loader and Shredder to make them less tightly coupled. v2.1.0 then introduced RDB Loader as a long-running (or “self-healing”) application with proper retry logic and a variety of other usability and stability improvements.
BigQuery
BigQuery Stream Loader is an entirely new way of loading your BigQuery Data Warehouse from your Snowplow pipeline. It no longer requires Beam and runs as a standalone application, which should simplify setup and reduce running costs. In addition, we introduced a new load_tstamp field to all events loaded into BigQuery. This timestamp represents the time when the data arrived in the warehouse and can be used for incremental processing of new data in data modeling. This release also brings observability to the BigQuery Loader through a StatsD-compatible reporting mechanism.
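To illustrate how load_tstamp can drive incremental processing, here is a minimal sketch that builds a parameterised query selecting only the rows loaded after a previously recorded watermark. The helper name `buildIncrementalQuery`, the table name and the watermark value are hypothetical, and how you store the watermark between runs is left out.

```javascript
// Hypothetical helper: returns a query and its named parameter for
// selecting only events that arrived in the warehouse after the watermark.
function buildIncrementalQuery(table, lastLoadTstamp) {
  // Illustrative guard: accept only simple project.dataset.table identifiers.
  if (!/^[A-Za-z0-9_.-]+$/.test(table)) {
    throw new Error("Unexpected table identifier: " + table);
  }
  return {
    query:
      "SELECT * FROM `" + table + "` " +
      "WHERE load_tstamp > TIMESTAMP(@last_load_tstamp) " +
      "ORDER BY load_tstamp",
    params: { last_load_tstamp: lastLoadTstamp },
  };
}

const job = buildIncrementalQuery(
  "my-project.snowplow.events", // assumed table name
  "2022-01-01T00:00:00Z" // assumed watermark from the previous run
);
console.log(job.query);
console.log(job.params);
```

After each run, the largest load_tstamp that was processed becomes the watermark for the next one.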
S3
The S3 Loader v2 brings observability through a StatsD-compatible reporting mechanism for enriched data latency and counts for all rows, including bad and raw.
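For readers unfamiliar with StatsD, metrics are sent as short plain-text lines of the form name:value|type. The sketch below formats a gauge and a counter in that shape. The helper name `statsdLine` and the metric names are made up for illustration and are not the names the loaders actually emit.

```javascript
// Formats one StatsD line: <name>:<value>|<type>, where type is
// "g" (gauge), "c" (counter) or "ms" (timer).
function statsdLine(name, value, type) {
  if (!["g", "c", "ms"].includes(type)) {
    throw new Error("Unsupported StatsD type: " + type);
  }
  if (!Number.isFinite(value)) {
    throw new Error("Metric value must be a finite number");
  }
  return `${name}:${value}|${type}`;
}

// Hypothetical metric names, for illustration only.
console.log(statsdLine("snowplow.loader.latency_ms", 1250, "g"));
console.log(statsdLine("snowplow.loader.rows_good", 42, "c"));
```

Any StatsD-compatible agent listening for these lines can then aggregate them and forward them to your monitoring tool of choice.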
Elasticsearch
Snowplow Mini 0.13 introduced a new endpoint to reset the Elasticsearch index. If Snowplow Mini can no longer load events into Elasticsearch because of a breaking schema update, calling this endpoint resets the index so that it can be recreated and loading can continue.
New Trackers for Roku and Flutter
We’ve recently introduced two new trackers, covering two new and exciting technologies that we’ve received a number of requests for.
Roku Tracker
First up is the Roku Tracker, which even at v0.2.0 already comes with a sizable feature set. It enables tracking common event types (self-describing, structured, and screen view) along with custom context entities. It automatically enriches each event with context information about the Roku device, unique identifiers for channel and device, current device usage, and more. However, perhaps the most exciting feature offered by the Roku tracker is video tracking.
For more information, head over to the full Roku Tracker announcement post.
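The Roku tracker itself targets Roku devices rather than the browser, so the following is only an illustration, written in JavaScript to match the web tracker example later in this post, of the general shape of a self-describing event with an attached context entity. The `com.example` schemas, their fields and the `isSelfDescribing` helper are hypothetical.

```javascript
// An Iglu schema URI has the form iglu:vendor/name/format/MODEL-REVISION-ADDITION.
const IGLU_URI = /^iglu:[A-Za-z0-9_.-]+\/[A-Za-z0-9_-]+\/[A-Za-z0-9_-]+\/\d+-\d+-\d+$/;

// Hypothetical check: a self-describing JSON pairs a schema URI with a data object.
function isSelfDescribing(json) {
  return (
    json !== null &&
    typeof json === "object" &&
    typeof json.schema === "string" &&
    IGLU_URI.test(json.schema) &&
    json.data !== null &&
    typeof json.data === "object"
  );
}

// A custom event and a custom context entity, both with made-up schemas.
const ratingEvent = {
  schema: "iglu:com.example/video_rated/jsonschema/1-0-0",
  data: { rating: 4 },
};
const subscriptionEntity = {
  schema: "iglu:com.example/subscription/jsonschema/1-0-0",
  data: { plan: "premium" },
};

const tracked = { event: ratingEvent, context: [subscriptionEntity] };
console.log(isSelfDescribing(tracked.event) && tracked.context.every(isSelfDescribing));
```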
Flutter Tracker
The Flutter tracker takes much inspiration from our React Native tracker, and combines our robust mobile trackers into a wrapper that makes it easy to work with Snowplow tracking in your Flutter apps. Not only that, but the Flutter tracker also works for Flutter Web by using the Snowplow JavaScript Tracker under the hood. You can find the new Flutter tracker on pub.dev along with documentation to help you get started.
Video Tracking for Web
Version 3.2 of the Snowplow JavaScript Trackers brings the long-awaited video (and audio) tracking functionality to the JavaScript Tracker, with further improvements in v3.3. This uses the same schemas as the Roku video tracking above but adds support for any HTML5 <video> and <audio> tag as well as YouTube.
It’s easy to set up as a tracker plugin: pass your video element’s id to the tracker, and all the events you wish to track will be automatically sent to your Snowplow collector.
import { newTracker } from "@snowplow/browser-tracker";
import { MediaTrackingPlugin, enableMediaTracking } from "@snowplow/browser-plugin-media-tracking";

// Create a tracker with the media tracking plugin enabled.
newTracker("sp2", "{{collector_url}}", {
  appId: "my-app-id",
  plugins: [ MediaTrackingPlugin() ],
});

// Start tracking the media element with this id.
enableMediaTracking({
  id: "example-id"
});
You can read more about version 3.2 in the official Discourse announcement.
Snowplow web dbt package on all warehouses
The latest releases for the Snowplow web model for dbt have brought support for BigQuery, Snowflake and Postgres! This means you can now run the web model using the dbt package directly from the dbt hub across all the supported Snowplow warehouses. We’ve also added support for dbt v1 with version 0.5.0. Keep your eyes peeled for the Snowplow mobile model landing on the dbt hub very soon.
Looker web block
We also announced the release of looker-snowplow-web v1 (and v1.1), a block that makes the derived tables produced by the Snowplow Web V1 model available for exploration in Looker. A series of dashboards have also been included to help visualize the data.
Recommended Component Versions
This section links to the recommended components in 22.01 Western Ghats. We’ve listed the major features above but many components have also seen smaller but significant updates. Running the recommended 22.01 Western Ghats components ensures you will be able to use all the features listed above and have the confidence they are battle tested and ready for production.
Recommended Component Versions are detailed on the 22.01 Western Ghats Version Compatibility Matrix. Components which have been updated since the last release are highlighted.
Latest releases and the public roadmap
Keep up to date with the latest Snowplow releases:
- GitHub; give us a star to stay up to date with individual component releases.
- Discourse; sign up and watch the new releases category for announcements.
- Snowplow blog; check out the releases and product features sections for longer form content.
- Public roadmap; let us know which features you are most excited about by adding an emoji or comment.
- You can read more in our Public Roadmap blog post.
Snowplow BDP
For Snowplow BDP customers reading this, the majority of pipelines are already running 22.01 Western Ghats components so you should be good to go ahead and explore the features above. If you’d like to find out exactly which versions you are running currently, please contact Snowplow Support.