Expanding our range of real-time destinations with Snowbridge

Summary

Problem: Connecting real-time Snowplow data to a broad range of analytics tools and applications has been a common challenge for Open Source users.

Solution: We are open-sourcing Snowbridge — an application that can forward Snowplow data to various streaming platforms (e.g. Kafka) and analytics tools (via Google Tag Manager) in real time.

Prerequisites: Running Snowplow on AWS or GCP

Available from: January 2023

Documentation: Available here

Background

Snowplow enables organizations to create high-quality behavioral data. This data can power numerous use cases, such as product analytics (e.g., to find out which product features are most valuable) or content optimization (e.g., to decide which articles to display on the home page). Canada’s leading news publisher The Globe and Mail did exactly this through their AI platform Sophi, as illustrated in this case study.

More often than not, the best place for the rich, reliable, and accurate data you create with Snowplow is your data warehouse or data lake. 

That’s why we provide out-of-the-box loaders for a number of popular destinations, including BigQuery, Redshift, and our technology partners Snowflake and Databricks. Any of these targets can become the source of truth for your entire organization and power various applications, especially when used in conjunction with Reverse ETL.

The Problem

Sometimes, you need to consume the data in real time, and doing so from the warehouse is not always practical. That’s why Snowplow also provides the events in Kinesis or Pub/Sub streams on AWS and GCP, respectively.

Still, we felt that something was missing. Something that would allow users to easily send real-time data to analytics applications, other streams, or even other clouds. Ideally, without having to write much code. Not only did we build it, we are also making it available as open source! Meet Snowbridge.

What is Snowbridge?

Snowbridge is an application that takes data — including Snowplow data — from AWS Kinesis, AWS SQS or Google Pub/Sub, and forwards it to Kinesis, SQS, Pub/Sub, Apache Kafka, Azure Event Hubs, or any HTTP endpoint (for testing purposes, standard input and standard output are also supported).

In addition, Snowbridge can filter and transform data along the way, either with built-in functions tailored to Snowplow events or with a custom script.
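
As a rough illustration of the custom script option, here is a minimal sketch of a JavaScript transformation. It assumes, rather than documents, the scripting interface: a `main(input)` entry point, the event already parsed into an object in `input.Data`, and a `FilterOut` flag on the result to drop an event. The rule itself (drop events without a `user_id`, strip `user_ipaddress`) is invented for the example, so check the Snowbridge documentation for the exact contract.

```javascript
// Sketch of a custom script transformation (assumed interface, see above).
// Assumes input.Data is the Snowplow event already parsed into an object.
function main(input) {
  var event = input.Data;

  // Drop events that carry no user identifier.
  if (!event.user_id) {
    return { FilterOut: true };
  }

  // Remove a field we do not want to forward downstream.
  delete event.user_ipaddress;

  // Pass the modified event on to the next step.
  return { Data: event };
}
```

A script like this would sit in the transform step, alongside or instead of the built-in functions.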

Key benefits are very low latency (we’ve seen 70 ms on average) and the ability to scale horizontally to thousands of events per second. We have been using Snowbridge in production in our Behavioral Data Platform (Snowplow BDP) product for over a year, where it powers our Destination Hub.

What can I do with Snowbridge?

You do not need to use Snowplow to take advantage of Snowbridge. It is a standalone application that can help you replicate data from one stream to another, for example, if you are building a cross-cloud solution.

However, Snowbridge really shines when combined with Snowplow pipelines.

Integrating with streaming platforms

If you are running Snowplow on AWS or GCP, you can immediately take advantage of Snowbridge to forward your validated and enriched data to a cloud-agnostic streaming platform like Apache Kafka and integrate with other systems. 

You only need minimal configuration, like this:

source {
  use "pubsub" {
    project_id = "<your GCP project id>"
    subscription_id = "<your Pub/Sub subscription id>"
  }
}

transform {
  // only pass the web events
  use "spEnrichedFilter" {
    atomic_field = "platform"
    regex = "web"
    filter_action = "keep"
  }

  // convert Snowplow events to JSON
  use "spEnrichedToJson" {}
}

target {
  use "kafka" {
    brokers = "<your Kafka broker connection string>"
    topic_name = "<your Kafka topic name>"
  }
}

Integrating with analytics applications

Kafka might be a good choice if you are building your own data analytics stack or even a real-time recommendation engine. But what if you just want to feed your data into existing analytics tools?

Snowbridge can send events to any HTTP endpoint. As long as your preferred tool supports HTTP input, with some scripting you can “massage” your data into the format the tool expects and then send it. 
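
To make the “massaging” concrete, the sketch below reshapes a Snowplow event into the payload of a made-up analytics tool. The target format (`event`, `userId`, `timestamp`, `properties`) is hypothetical, and the `main(input)` interface with the parsed event in `input.Data` is an assumption to verify against the Snowbridge documentation.

```javascript
// Map a Snowplow event (parsed object) to a hypothetical tool's payload.
function toToolPayload(event) {
  return {
    event: event.event_name,
    // Fall back to the cookie-based identifier when no user_id is set.
    userId: event.user_id || event.domain_userid,
    timestamp: event.collector_tstamp,
    properties: {
      appId: event.app_id,
      platform: event.platform,
      pageUrl: event.page_url
    }
  };
}

// Assumed entry point: receives { Data: ... } and returns the reshaped Data.
function main(input) {
  return { Data: toToolPayload(input.Data) };
}
```

The reshaped object is what the HTTP target would then send as the request body, assuming the tool accepts JSON.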

That still sounds like a bit of work, doesn’t it?

This is where Google Tag Manager (GTM) server-side comes into play. It can host various server-side “tags” that forward events in real time to myriad applications and platforms. With Snowbridge, any Snowplow user can tap into this ecosystem. Simply point Snowbridge to your GTM server container.

We have prepared our own GTM tags for Amplitude, Braze and Iterable, with more in the pipeline. These tags take advantage of Snowplow’s rich data. For example, if your events contain context specific to your organization, you can specify how it should be mapped to the properties available in the destination tool.

Suppose you want to send your Snowplow data to Amplitude. What are the steps?

Step 1. Set up your GTM server-side instance.

Step 2. Add the Snowplow Client to GTM.

This module populates the common event data so that many GTM server-side tags understand your Snowplow events with no additional configuration. The module also populates a set of additional properties to ensure that the rich Snowplow event data is available to the tags that benefit from it (like our own tags mentioned above).

Step 3. Add our Amplitude Tag.

You can configure how Snowplow schemas translate to Amplitude. For example, you can map the context coming from the Snowplow Media Tracking plugin to Amplitude’s YouTube event properties.

Step 4. Configure Snowbridge to send events to your GTM instance. 

You will need something like this:

source { … }

transform {
  // convert Snowplow events to JSON
  use "spEnrichedToJson" {}
}

target {
  use "http" {
    content_type = "application/json"
    url = "<your GTM server container endpoint>"

    // add a preview header if using GTM in preview mode
    headers = "{\"x-gtm-server-preview\": \"...\"}"
  }
}

Step 5. Test your setup. 

You can enable GTM’s preview mode and run Snowbridge locally, sending events to standard input. You should see them arrive in GTM and trigger the Amplitude tag.

Step 6. Enjoy your Snowplow events in Amplitude!

* * *

Going forward, the next logical step would be for Snowbridge to replace the various ad hoc streaming connectors we have developed over the years, such as sqs2kinesis or snowplow-mixpanel-relay. All in a single package that’s easy to maintain and contribute to. Speaking of that…

Snowbridge is open source

Snowplow is an open core company, and a large portion of our code is open source. From the beginning, we have been building our product transparently and collaboratively, making the key building blocks available for free. We believe this is the right way to develop software and this remains integral to our corporate culture.

Snowbridge is no exception: you can find the code on GitHub. It’s written in Go, a popular and easy-to-learn language widely used in networking and low-latency applications.

Have an idea for a new source or destination? Perhaps streaming the data to your favorite analytics tool directly, bypassing Google Tag Manager? Or perhaps a new integrated transformation for Snowplow data? We welcome all contributions and hope Snowbridge grows thanks to input from, and collaboration with, our community.

Snowplow Community License

Most of our code is released under the Apache 2.0 license. However, because Snowbridge is so powerful and flexible, it is important to specify that it may not be used to develop competing products. Therefore, we are introducing the Snowplow Community License, which includes a non-compete clause while preserving the spirit of open source development.

You can find the license and the answers to common questions in our documentation.

Getting started

Ready to dive in? Everything you need to get started, including the start guide and configuration examples, can be found in our documentation.

If you have any questions, or want to let us know what you think (we’d love to hear from you), Discourse is the best place to discuss all things Snowbridge.

About the author

Nick Stanchenko

Product Manager, Open Source
