Today we bring together the latest updates to the Snowplow platform and reveal our new way of announcing Snowplow releases. Our new release announcements will clarify our recommended component versions and discuss our latest features.
In April 2020 we made our last umbrella release, R119 Tycho Magnetic Anomaly Two. Following this release, we moved the Snowplow components into separate repositories and we announced why we’re changing the way we’re releasing in a blog post. This has given us far greater agility and the flexibility to make updates to each individual component separately, but we’ve come to realise that it’s hard to keep up with all the updates across the Snowplow platform and understand how they all fit together.
With this in mind, over the last couple of months we’ve been thinking hard about how we can continue to help you understand the latest features available in your Snowplow pipeline, and which versions of the Snowplow components you need to be running to use them.
This brings us to today, where we reveal our new way of announcing Snowplow platform releases. We will be moving to period-based releases, named by the year and month of the release along with a memorable, mountainous name 🏔. We wanted to land on something memorable, aspirational and ‘Snowy’ enough to tie these releases into the wider Snowplow ecosystem.
We will continue to publish new versions of our components within their associated repositories, but these platform releases will provide clarity on the current recommended component versions that are fully compatible with each other, battle tested and ready for production.
There is no precise cadence for these releases; we will define a release when we feel there is a notable set of new features available for your Snowplow pipeline. We will aim to do at least two releases per year, although you may well see more, as we will assess each quarter whether we’re ready to announce a platform release.
If you’d like to find the very latest updates and features, you can look at the latest commits to snowplow/snowplow where we now push component updates, check our product roadmap or you can check the releases and product features sections of the Snowplow Analytics blog.
Snowplow 21.04 Pennine Alps
- Automated snowplow/snowplow updates
- Surge Protection for AWS
- Anonymous Tracking
- Data Models for Web
- Reliability and General Hardening
- Postgres Loader
- Observability Updates
Automated snowplow/snowplow updates
Our Open Source homepage has been updated with new graphics, an architectural overview and automatic updates when any component of the Snowplow platform receives an update. This means it’s never been easier to spot the latest releases across the entire Snowplow platform.
Head to https://github.com/snowplow/snowplow to check out the latest updates and whilst you’re there, watch and star the repository to keep up to date with all the latest releases going forward.
Surge Protection for AWS
We have released a new feature on AWS so our customers can have confidence that their pipeline will successfully scale to handle even the most extreme traffic spikes.
We have achieved this by adding Amazon Simple Queue Service (SQS) as a buffer mechanism, acting as a pressure valve between the collector and Kinesis and preventing messages from having to wait in the collector’s memory while Kinesis is scaling. Because Kinesis scales slowly, the scaling algorithm would otherwise have to be over-sensitive and over-provisioned, which is costly (unlike on GCP, where there is no need to wait for Pub/Sub to scale). Instead, messages are written to SQS, where they are queued whilst Kinesis is resizing, and the sqs2kinesis application is then responsible for reading the messages and writing them to Kinesis once it is ready. With Surge Protection, customers now have even greater assurance that their pipeline will scale faster to handle even the most extreme data surges, without having to pre-provision capacity.
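The pressure-valve pattern described above can be sketched in a few lines of Python. This is a simulation of the flow, not the actual collector or sqs2kinesis code: the class and function names are ours, and the in-memory queue stands in for SQS.

```python
import queue

class SimulatedKinesis:
    """Stand-in for a Kinesis stream that throttles while it resizes."""
    def __init__(self, capacity):
        self.capacity = capacity  # records accepted before throttling (illustrative)
        self.records = []

    def put_record(self, record):
        if len(self.records) >= self.capacity:
            raise RuntimeError("ProvisionedThroughputExceeded")
        self.records.append(record)

def collect(events, stream, buffer):
    """Collector-side logic: write to Kinesis; spill to the SQS buffer on throttling."""
    for event in events:
        try:
            stream.put_record(event)
        except RuntimeError:
            buffer.put(event)  # pressure valve: queue instead of holding in memory

def drain(buffer, stream):
    """sqs2kinesis-style logic: replay buffered records once the stream has scaled."""
    while not buffer.empty():
        record = buffer.get()
        try:
            stream.put_record(record)
        except RuntimeError:
            buffer.put(record)  # still scaling; leave it queued
            break

stream = SimulatedKinesis(capacity=2)
sqs_buffer = queue.Queue()
collect(["e1", "e2", "e3", "e4"], stream, sqs_buffer)  # two records spill to the buffer
stream.capacity = 10  # simulate Kinesis finishing its resize
drain(sqs_buffer, stream)  # buffered records are replayed into the stream
```

The key design point is that the collector never blocks on a throttled stream: excess records are handed to a durable queue, and a separate consumer replays them when capacity is available.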
Check out the announcement blog post and the docs for Open Source users to set this up. This was rolled out automatically to Snowplow BDP customers.
Anonymous Tracking
We have added support for cookieless and anonymous tracking. For more information on how to leverage cookieless and anonymous tracking with Snowplow, read our detailed blog post.
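Conceptually, server-side anonymisation means the collector discards identifying fields when the tracker asks for it. The sketch below illustrates that idea in Python; the field names and the exact behaviour are simplified assumptions for illustration, not the collector’s implementation (in the real platform, the JavaScript tracker’s anonymous mode signals the collector via an SP-Anonymous header).

```python
def anonymise(event, headers):
    """Sketch: when the tracker signals anonymous tracking (here via an
    SP-Anonymous header), drop the client IP and the network user id.
    Field names are illustrative."""
    if headers.get("SP-Anonymous") == "*":
        event = dict(event, user_ipaddress=None, network_userid=None)
    return event

event = {"event_id": "abc", "user_ipaddress": "203.0.113.7", "network_userid": "n-123"}
anonymous = anonymise(event, {"SP-Anonymous": "*"})   # identifiers stripped
identified = anonymise(event, {})                     # unchanged
```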
Data Models for Web
We have introduced the next generation of our data models, starting with our web models. These models address a number of challenges by moving to a new modular approach to data modelling. This allows us to separate out the ‘heavy lifting’ of an incremental Snowplow model by extracting the incremental logic into its own ‘base’ module. The base module produces a table which contains only events relevant to this run of the incremental logic: both new events and those events that require recomputing (for example, because they are part of an ongoing session).
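The base module’s selection logic can be sketched as follows. This is a minimal Python illustration of the idea, not the actual SQL model; the field names and timestamp scheme are our own simplifications.

```python
def base_module(events, last_processed_ts):
    """Keep events that are new since the last run, plus all earlier events
    from any session that received new events, so ongoing sessions are
    recomputed in full."""
    new_events = [e for e in events if e["collector_tstamp"] > last_processed_ts]
    sessions_to_recompute = {e["session_id"] for e in new_events}
    return [e for e in events if e["session_id"] in sessions_to_recompute]

events = [
    {"event_id": 1, "session_id": "s1", "collector_tstamp": 100},  # old, but session is ongoing
    {"event_id": 2, "session_id": "s1", "collector_tstamp": 250},  # new event
    {"event_id": 3, "session_id": "s2", "collector_tstamp": 120},  # old, untouched session
]
this_run = base_module(events, last_processed_ts=200)
# only session s1 is selected: its new event plus its earlier event
```

Downstream modules can then run against this much smaller table rather than scanning the full events table on every run.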
To find out more about our new data models, you can read our introduction to them as well as our follow up for BigQuery and Snowflake.
Reliability and General Hardening
On top of the above features, since the big R119 Failed Events update, we’ve been hard at work ensuring the core pipeline components – collector, enrich and loaders – are as reliable as possible. A number of recommended updates have been released since R119, offering a range of improvements and fixes to ensure your pipeline performs optimally. Whilst some components may not have received new features since R119, many have seen updates as we continued to test and roll them out across new and existing pipelines.
Postgres Loader
Following much demand from the OSS community, we released an initial version of a Postgres Loader (v0.1.0), providing an alternative to Redshift or Snowflake when looking to try out Snowplow open source at lower volumes or for QA purposes. In fact, the Postgres Loader is being used in our recently launched Try Snowplow experience.
You can find further information on the Postgres Loader here, as well as documentation on how to set this up as an open source user.
Observability Updates
As part of our drive to improve the observability of each of the pipeline components, we introduced a ‘gauge’ metric to our BigQuery Loader which samples, every second, the latency of the data between collection and loading into BigQuery, giving you greater visibility of the health of your pipeline. We have published a blog post on this topic on the Snowplow blog.
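A gauge of this kind simply records the latest observed value, which a reporter then samples on a schedule. The sketch below illustrates the semantics in Python; the class and method names are ours, not the BigQuery Loader’s, and the periodic reporter is omitted for brevity.

```python
class LatencyGauge:
    """Illustrative gauge: holds the collector-to-load latency of the most
    recently loaded event; a reporter would call sample() periodically
    (the real loader samples every second)."""
    def __init__(self):
        self._latency = None

    def on_load(self, collector_tstamp, load_tstamp):
        # gauge semantics: keep only the latest observed value
        self._latency = load_tstamp - collector_tstamp

    def sample(self):
        return self._latency

gauge = LatencyGauge()
gauge.on_load(collector_tstamp=1000.0, load_tstamp=1003.5)  # 3.5s behind
gauge.on_load(collector_tstamp=1001.0, load_tstamp=1002.2)  # now 1.2s behind
```

A rising sample over time is an early signal that loading is falling behind collection, before any data is actually lost.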
In addition, we introduced basic observability to Snowplow Mini (v0.12.0) such that the logs of each of its internal services are now exported to CloudWatch on AWS and Cloud Logging on GCP, as well as runtime metrics. Read the documentation for this feature to find out more.
Recommended Component Versions
Our Version Compatibility Matrix is now broken down into specific platform releases and the latest recommended component versions. We’ve listed the major features above, but many components have also seen smaller but significant updates. Running the components listed in the Snowplow 21.04 Pennine Alps Version Compatibility Matrix ensures you will be able to use all the features listed above and have the confidence that they are battle tested and ready for production. Components which have been updated since the last release are highlighted in purple.
We’ve published the Snowplow 21.04 Pennine Alps Version Compatibility Matrix on our documentation site.
If you’re eager to play with the very latest Snowplow technology, you should head over to our Public Roadmap which highlights the latest updates we’ve released and what will be coming soon. We’d also love to hear from you, so please add an emoji or a comment to the features you’re excited about or that you’d like to know more about. If you’d like to know more, you can read all about it in our Public Roadmap blog post.
For Snowplow BDP customers reading this, the majority of pipelines are already running 21.04 Pennine Alps components so you should be good to go ahead and explore the features above. If you’d like to find out exactly which versions you are running currently, please contact Snowplow Support.
As a Snowplow BDP customer, you will also have benefitted from a number of features that make managing and configuring your pipeline easier, including: the ability to manage domain & cookie configuration as well as your data models from the Snowplow BDP Console, an improved UI and a supporting API for managing your Data Structures, real-time data quality alerting, and a re-design of many areas to improve the overall experience. Additionally, the above-mentioned Surge Protection on AWS was automatically rolled out to all Snowplow BDP pipelines.