
Delivering on Our Promise: New Snowplow Pipeline Developments

By Adam Roche
July 29, 2024

In January 2024, we announced our new licensing approach. Since then, we’ve rolled out significant updates to our core pipeline and broader customer data infrastructure (CDI) capabilities, delivering on our promise of continued innovation and value for our customers. These enhancements showcase our commitment to advancing our technology to meet our customers’ business needs while operating under the Snowplow Limited Use License Agreement (SLULA).

What’s new with the Snowplow core pipeline?

Since January, we have released several exciting new upgrades to the Snowplow pipeline to make it faster, more cost efficient to run, more flexible, and more secure. These include:

  • Snowflake Streaming Loader: This new loader, released exclusively under the current licensing agreement, makes Snowplow data available in Snowflake in real time. Compared to its predecessor, RDB Loader, latency is 100 times lower and costs are 8 times lower.
  • Lake Loader: Now accessible on AWS (in addition to GCP and Azure), this feature offers a cost-effective data storage solution using open table formats like Iceberg and Delta.
  • Enhanced RDB Loader: Version 6.0.0 enhances robustness and gracefully handles invalid schema evolution.
  • Snowbridge Updates: Version 2.4.0 now includes OAuth2 support for improved security and integration capabilities.
  • Failed Events Handling: The upcoming Enrich 5.0 will make it easier to explore and recover failed events.
  • Continuous Security Updates: We have addressed various Common Vulnerabilities and Exposures (CVEs), including several classified with “high” or “critical” severity. We’re also investing in moving away from third-party components that are no longer maintained or patched, such as Lightbend Akka 2.6, on which the Collector depends. The full list of CVEs we’ve fixed can be found at the end of this post.

What benefits do these updates provide to Snowplow customers?

These new features and improvements demonstrate our commitment to investing in our pipeline components. They address important needs for our users and offer advantages not available in previous versions of the pipeline or in software derived from them:

  • Performance: The Snowflake Streaming Loader makes data available in real time, much faster than RDB Loader. This enables Snowflake users to build real-time use cases. The speed boost can be a game-changer for industries where getting timely insights is key.
  • Cost-efficiency: The Snowflake Streaming Loader and Lake Loader offer big infrastructure cost savings. This means companies can handle larger volumes of data without a proportional increase in budget.
  • Flexibility: We have made Lake Loader available for AWS in addition to GCP and Azure. This gives businesses more choices for storing and managing data to fit their unique needs and systems.
  • Reliability: Improvements to the RDB Loader and the upcoming failed events handling in Enrich 5.0 make data processing even more dependable. This means businesses can trust that their data is accurate and complete, even when dealing with invalid schemas or processing issues.
  • Security: Regular security updates keep sensitive customer data, as well as the infrastructure itself, safe from known threats, which is crucial as data privacy regulations become more stringent around the world.

Software paired with best-in-class support

By entering into a commercial agreement with Snowplow, you’ll not only benefit from continuous technical innovation and enhancements, but also gain a supportive ecosystem designed to enhance your data infrastructure:

  • Get help from Snowplow’s top-notch Support Engineers at any time of day to resolve issues swiftly and keep your data pipeline running smoothly.
  • Pick between Standard and Enhanced support levels tailored to your business needs, with Enhanced support promising responses within 30 minutes for critical issues.
  • Access exclusive resources like the customer-only Help Center and receive assistance through email, the Help Center, or, for Enhanced support customers, Slack.
  • Receive expert advice on tracker setup, schema design, infrastructure management, and data modeling, drawing from valuable insights gained from Snowplow’s diverse customer base.
  • Share your thoughts on Snowplow’s product development, influencing the software’s future to better align with your business requirements.
  • Gain peace of mind with a pipeline that is continuously enhanced by its original creators, ensuring your data infrastructure stays up-to-date and secure.

Future Pipeline Developments

Our engineers work continuously to improve our pipeline components. Here are just some of the future updates we have in store:

  • We’re introducing a new BigQuery Loader to help cut costs and store Snowplow data more compactly (in fewer columns).
  • Our upcoming Databricks Loader will allow real-time data ingestion for Databricks users.
  • We’re also planning to move to a cloud-agnostic messaging framework, which should result in significant infrastructure cost savings on AWS, GCP and Azure.

These updates are designed to further boost performance, lower costs, and make data handling more flexible. By continuously improving our pipeline components, we’re committed to keeping our customers one step ahead.

Ready to explore our latest technology?

If you’re using an older version of Snowplow open source software, get in touch with our team to find out how these new advancements can improve your data infrastructure, increase performance, and move your business forward. Take advantage of the competitive edge that real-time, reliable behavioral data can offer in the age of AI.

If you’re new to Snowplow, contact us to schedule a demo to learn about the full set of capabilities and benefits you gain with our customer data infrastructure (CDI) designed for AI.

Resolved CVEs

  • Collector: CVE-2022-21653 [7.5 – high], CVE-2023-31442 [7.5 – high], CVE-2023-33251 [5.5 – medium], CVE-2023-44487 [7.5 – high]
  • Enrich: CVE-2007-6755 [5.4 – medium], CVE-2010-0928 [5.1 – medium], CVE-2010-4756 [5.3 – medium], CVE-2018-20796 [7.5 – high], CVE-2019-1010022 [9.8 – critical], CVE-2019-1010023 [8.8 – high], CVE-2019-1010024 [5.3 – medium], CVE-2019-1010025 [5.3 – medium], CVE-2019-9192 [7.5 – high], CVE-2022-1471 [9.8 – critical], CVE-2022-21653 [7.5 – high], CVE-2023-2976 [7.1 – high], CVE-2023-3446 [5.3 – medium], CVE-2023-34462 [6.5 – medium], CVE-2023-3817 [5.3 – medium], CVE-2023-44487 [7.5 – high], CVE-2023-4806 [5.9 – medium], CVE-2023-4813 [5.9 – medium], CVE-2023-4911 [7.8 – high], CVE-2023-5678 [5.3 – medium], CVE-2024-0727 [5.5 – medium], CVE-2024-1597 [9.8 – critical], CVE-2024-21634 [7.5 – high], CVE-2024-2511 [3.7 – low], CVE-2024-29025 [7.5 – high]
  • RDB Loader: CVE-2024-21634 [7.5 – high], CVE-2024-25710 [7.5 – high], CVE-2024-26308 [7.5 – high]
