Snowplow for Media part 1: how can I use Snowplow?

Media organizations are increasingly realizing the potential of behavioral data. Whether it’s to inform their newsroom and editorial teams, to power personalization engines or to optimize their paywall for conversion, there are huge opportunities for Media companies that embrace behavioral data.

Snowplow is the trusted Behavioral Data Platform for Media firms, helping them to answer questions such as:

“We don’t know where to focus our content creation efforts. Which content leads to retention and subscription? Which content categories, authors and themes drive high CPMs?

We have several disparate brands, and users engage with multiple products on the web and in our app, so we don’t have a single customer view. This means we can’t effectively group our users based on engagement or monitor how they are retained.

We don’t have a good understanding of how our marketing spend across multiple channels affects subscription or readership.”

This series will explore how Snowplow can support Media firms to answer these questions and more. Click below to reach the most relevant section for you:

  1. What should I track?
  2. What can we do with the data, when we’re getting started?
  3. What can we do with the data, while we’re growing?
  4. What can we do with the data, now we’re well established?

Snowplow is trusted by leading Media organizations, such as the below:

Media companies using Snowplow


What is Snowplow?

Simply put, Snowplow is a behavioral data platform.

Snowplow runs in your cloud environment (AWS or GCP) to capture and deliver rich, high-quality behavioral data at scale.

You can use one of Snowplow’s trackers in your website, app or server, or a Snowplow webhook to capture third-party data, and the Snowplow platform will deliver the data to the destination of your choice. From there, you are free to use the data as you wish.

the Snowplow pipeline

Why would I use Snowplow?

There are several reasons why Snowplow is a powerful solution for Media data teams:

  1. All your data lives in your own environment, not Snowplow’s, so you have unlimited access to all your event level data;
  2. Data is available in real time, which means fresh data for your reports or apps in seconds;
  3. Data from all platforms (web/mobile/server-side/emails/ad impressions) is structured in the same way and stored in the same place;
  4. It’s possible to fully customize your data capture with Snowplow, letting you set up data collection to work for your unique business and use cases;
  5. Extensive privacy tooling supports GDPR compliance, including data hashing at collection and scrubbing of stored data;
  6. Data can be enriched with 3rd-party sources to bring more meaning and context to your data set;
  7. Snowplow data is well structured and of exceptionally high quality thanks to the validation step of the pipeline. Events that fail validation are also stored, just not in the warehouse, so the pipeline is non-lossy. This makes the data a perfect input for machine learning models or advanced data use cases such as personalization;
  8. Snowplow tracking is versioned, enabling you to evolve your data strategy over time. As your business grows and you add more features, you can adapt your data collection approach to meet your needs;
  9. Snowplow events are extremely rich – each one collected with at least 130 properties (where available) allowing for a deep understanding of your users.
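
To make the validation point in reason 7 concrete, here is a minimal sketch of the idea: events that pass a schema check go one way, events that fail are kept elsewhere along with the reason they failed, and nothing is dropped. This is not Snowplow's implementation. The schema, the field names and the `route` function are all hypothetical and only illustrate the non-lossy principle.

```python
# Hypothetical, much-simplified schema: field name -> required Python type.
ARTICLE_VIEW_SCHEMA = {
    "article_id": str,
    "author": str,
    "word_count": int,
}


def validate(event, schema):
    """Return a list of validation errors (empty if the event is valid)."""
    errors = []
    for field, expected_type in schema.items():
        if field not in event:
            errors.append(f"missing field: {field}")
        elif not isinstance(event[field], expected_type):
            errors.append(f"wrong type for {field}")
    return errors


def route(events, schema):
    """Split events into valid ones and failed ones; no event is discarded."""
    good, bad = [], []
    for event in events:
        errors = validate(event, schema)
        if errors:
            # Failed events are stored with their errors so they can be
            # inspected and recovered later.
            bad.append({"event": event, "errors": errors})
        else:
            good.append(event)
    return good, bad


events = [
    {"article_id": "a-1", "author": "J. Doe", "word_count": 850},
    {"article_id": "a-2", "author": "J. Doe", "word_count": "850"},  # wrong type
]
good, bad = route(events, ARTICLE_VIEW_SCHEMA)
```

Every input event ends up in exactly one of the two lists, which is what makes it possible to fix and reprocess failures instead of silently losing them.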

How do I use Snowplow?

With Snowplow, you’re free to focus on meaningful data projects. All you need to do to get started is:

  1. Decide what you want to track;
  2. Start building data models so the data can be put to use.

All it takes is a front-end developer who can paste tracking code into your website, app and server as required. Out-of-the-box tracking (page/screen views, heartbeats, link clicks, form fills, searches) should only take an hour, while custom tracking can take up to a day.

Snowplow is designed to support data teams who can empower their organization with rich, behavioral data.

What should I track?

With Snowplow, you can track entities as well as events.
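
As a rough illustration of the difference, the sketch below builds one event (a share) and attaches one entity (the article being shared) as plain dictionaries. Snowplow describes custom events and entities with self-describing JSON, meaning a `schema` reference plus a `data` payload. The vendor `com.example`, the schema names and every field shown here are hypothetical.

```python
import json


def self_describing(schema_uri, data):
    """Wrap a payload together with a reference to the schema that describes it."""
    return {"schema": schema_uri, "data": data}


# The event: something that happened at a point in time.
# (Hypothetical schema and fields.)
engage_event = self_describing(
    "iglu:com.example/engage/jsonschema/1-0-0",
    {"type": "share", "network": "facebook"},
)

# The entity: a thing involved in the event, reusable across many events.
# (Hypothetical schema and fields.)
content_entity = self_describing(
    "iglu:com.example/content/jsonschema/1-0-0",
    {"author": "J. Doe", "word_count": 850, "is_native_ad": False},
)

tracked = {"event": engage_event, "entities": [content_entity]}
print(json.dumps(tracked, indent=2))
```

The same content entity could be attached to a page view, a video play or a share, so properties such as the author only need to be defined once.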

For now, let’s assume you’ve set up tracking and are looking at the data coming in.

What will the data actually look like?

Let’s take an example of someone looking to learn more about machines that plow snow.

Snowplow data structure

This shows a very simplified user journey as it would appear in your data warehouse. Data from all trackers is loaded into one table; the image above shows what a subset of your BigQuery columns may look like.

Note: remember that only 3 of the 130 out-of-the-box properties are shown here. Each event can also come with timestamps, weather, location, device, cookies, marketing campaign and much more. In addition, each custom event and entity can have many more properties; only a subset is shown here.
Warning: The name is shown as an example field to make the blog post more readable. Always be cautious when collecting PII.

These are the actions that correspond with the data in this table:

  1. Someone goes on their laptop and opens an email sent by you. You know which campaign it was part of and more importantly, the 3rd-party cookie of the user that opened it.
  2. They click a link in the email and visit your site. You know which page they visited, when, and on which browser and device.
  3. They decide they want to learn more about machines that plow snow so they search for “snowplo” as they are unsure how to spell it. Your site returns 2 results, one for an article on Snowplows, the other on Snowploughs.
  4. They choose the article on Snowplows! You also know who wrote the article, any products/companies mentioned in the article, the length of the article and whether the article is a native ad as these are custom properties of the content entity (not shown in table above).
  5. They enjoy the article so much they share it. You know that they shared it on Facebook as that was one of the properties of the engage event (not shown).
  6. They attempt to download a fact file on snowplows but they are blocked by a paywall.
  7. The paywall works and they begin the subscription flow, entering personal details such as their name (events for this not shown) to create an account and choosing a monthly plan.
  8. After some other steps (not shown), Joe successfully subscribes. A subscription ID is now associated with all future events created by Joe on those devices.
  9. Since this is a critical event to track, to get around potential ad blocker issues, the subscription is confirmed server-side too.
  10. Some days later Joe is on their phone, having previously downloaded the app and logged in, to find that article again to show some friends they are with.
  11. Joe and friends watch a Snowstorm video in the related content section of the Snowplow article in the app.
  12. They spot a picture in the same article that they want to send to another friend so they screenshot it. In reality this event occurred offline while Joe and friends were in the metro, and it was sent once the connection was re-established (the timestamp on the event still accurately reflects when the event was created).
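
Steps 8 and 10 rely on one identifier tying Joe's laptop and phone activity together. The sketch below shows one simple way that stitching could be done once the data is in your hands: map each device to the subscription ID seen on it, then apply that ID to earlier anonymous events from the same device. The column names (`device_id`, `subscription_id`, `stitched_user_id`) and the sample rows are hypothetical, and real stitching logic is usually more involved.

```python
# Hypothetical sample rows, loosely following the journey above.
events = [
    {"device_id": "laptop-cookie-1", "subscription_id": None, "event": "page_view"},
    {"device_id": "laptop-cookie-1", "subscription_id": None, "event": "search"},
    {"device_id": "laptop-cookie-1", "subscription_id": "sub-42", "event": "subscribe"},
    {"device_id": "phone-app-9", "subscription_id": "sub-42", "event": "video_play"},
    {"device_id": "tablet-7", "subscription_id": None, "event": "page_view"},
]


def build_identity_map(events):
    """Map each device identifier to the last subscription ID seen on it."""
    mapping = {}
    for event in events:
        if event["subscription_id"] is not None:
            mapping[event["device_id"]] = event["subscription_id"]
    return mapping


def stitch(events):
    """Add a stitched_user_id to every event.

    Devices that were never linked to a subscription fall back to their
    own device identifier, so anonymous users stay distinct.
    """
    mapping = build_identity_map(events)
    return [
        {**event, "stitched_user_id": mapping.get(event["device_id"], event["device_id"])}
        for event in events
    ]


stitched = stitch(events)
```

With this in place, the anonymous page view and search on the laptop are attributed to the same user as the later video play on the phone, while the unrelated tablet remains its own user.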

Hopefully you can now begin to see what Snowplow can do. The tool collects and delivers great raw data. What you do next is in the hands of your data team.

What can we do with the data?

You can use this wealth of data in a number of ways to drive value:

  • Increase the lifetime value of subscribers;
  • Increase revenue from ads;
  • Reduce spend on content creation;
  • Reduce marketing spend;
  • Increase subscription/donation rates.

What you do with the data depends to a large degree on the level of data maturity in your organization. To start with, let’s look at how Snowplow can be effective across three levels of data maturity, and at the use cases that make most sense to explore at each level. Follow the links to read a full post on how a team of each size in the Media sector could consume Snowplow data (note that each post assumes you have read the previous ones):

  • We’re getting started (AKA Data aware: the data team has one data analyst)
    • Stitching user journeys across web and mobile
    • Aggregating that data to understand engagement
    • Retention analysis
    • Marketing attribution
  • We’re growing (AKA Data adept: the data team has several analysts)
    • Tracking server-side and ingesting third-party marketing data
    • Funnel analysis and paywall optimization
    • Advanced user stitching
    • Content production and producer dashboards
    • Ad analytics
  • We’re well established (AKA Data informed: we have a thriving data team of analysts, engineers and scientists)
    • Marketing automation
    • Personalization of the product
    • Recommendation engines
    • Anomaly detection
    • Fraud detection
    • Sentiment analysis
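
As a taste of the first level, here is a minimal sketch of "aggregating that data to understand engagement". It approximates engaged time per user by counting heartbeat events and multiplying by the heartbeat interval. The 10-second interval, the event names and the sample data are assumptions for illustration, and the result is an approximation rather than an exact measure of time on page.

```python
from collections import Counter

# Assumed heartbeat interval; in practice this is whatever the tracker is configured to send.
HEARTBEAT_SECONDS = 10

# Hypothetical (user, event_name) pairs.
events = [
    ("joe", "page_view"),
    ("joe", "page_ping"),
    ("joe", "page_ping"),
    ("joe", "page_ping"),
    ("ann", "page_view"),
    ("ann", "page_ping"),
]


def engaged_seconds(events, heartbeat=HEARTBEAT_SECONDS):
    """Approximate engaged time per user as heartbeat count x interval."""
    pings = Counter(user for user, name in events if name == "page_ping")
    return {user: count * heartbeat for user, count in pings.items()}


engagement = engaged_seconds(events)
print(engagement)
```

The same pattern extends to grouping by article, author or content category once those properties are attached to each event.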

Dare I ask, GDPR?

If you are an existing Snowplow customer or are interested in becoming one, do get in touch and we can send you extensive documentation on how we help you comply with GDPR.

Otherwise, there are some key points to consider:

  • Specific rows of data can be deleted from all the places they are stored if necessary;
  • Personally Identifiable Information (PII), such as IP address, can be hashed by the pipeline so you can still analyze anonymous user behavior;
  • Fields can be removed from events using custom enrichments if necessary;
  • The client data that you capture all lives in your own cloud environment, not Snowplow’s. You always have total control and ownership of your data.
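
To illustrate the hashing point, the sketch below replaces an IP address with a keyed hash so that the same visitor still produces the same value, but the raw address is not stored. This is a generic example built on Python's standard library, not a description of how the Snowplow pipeline does it. The field name `user_ipaddress`, the secret and the function names are assumptions.

```python
import hashlib
import hmac

# Assumption: a secret kept outside the data set. Without a secret, a small
# input space such as IPv4 addresses could be reversed by brute force.
SECRET_SALT = b"replace-with-a-real-secret"


def pseudonymize(value, salt=SECRET_SALT):
    """Return a stable, keyed SHA-256 hash of a PII value."""
    return hmac.new(salt, value.encode("utf-8"), hashlib.sha256).hexdigest()


def hash_pii_fields(event, pii_fields):
    """Return a copy of the event with the listed fields replaced by hashes."""
    cleaned = dict(event)
    for field in pii_fields:
        if cleaned.get(field) is not None:
            cleaned[field] = pseudonymize(cleaned[field])
    return cleaned


# 203.0.113.7 is from an address range reserved for documentation.
raw_event = {"user_ipaddress": "203.0.113.7", "page": "/snowplows"}
safe_event = hash_pii_fields(raw_event, ["user_ipaddress"])
```

Because the hash is deterministic, you can still count distinct visitors or sessions per hashed value without retaining the original address.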

Read next: What do I track?

Find out how we’ve helped other media companies accelerate their data journeys. Book a demo today

More about the author

Archit Goyal

Product Strategy at Snowplow
