Snowplow Product Directory


Last updated: 30 September 2025

This Snowplow Product Directory provides descriptions of the products you have purchased on an Order Form or enabled as part of a pilot. While our Product Directory may be updated from time to time, the descriptions of the products as of the Start Date in your Order Form will apply to the Products specified in your Order Form. If new terms are introduced for new features or functions made available within a Product during the Term of your Agreement, these new terms will apply to the use of those new features or functions if you use them.

Snowplow Customer Data Infrastructure

Snowplow Customer Data Infrastructure (“CDI”) enables businesses to own and unlock the value of their customer behavioral data across all digital touchpoints to fuel AI-driven analytics, real-time customer experiences, fraud mitigation, and agentic applications. Our CDI collects, manages, and delivers this BI- and AI-ready data to your chosen destinations to help you improve your data quality, data governance, and data usability.

Our CDI product includes Data Pipeline, Data Management, and Extensions components organized into Workspaces. 

Workspace:

A Workspace is an isolated environment that includes a production Data Pipeline as well as optional quality assurance (QA) or development Data Pipelines.

Data Pipeline

The Snowplow Data Pipeline enables the collection of your behavioral data across multiple digital touchpoints and applications, including but not limited to:

  • Web applications
  • Mobile applications
  • Desktop applications
  • Server applications
  • Smart TV applications

Your data is processed by Snowplow in real time. Data processing steps include, but are not limited to:

  • Validating the data against schemas: these are objects that define the structure of the data you collect, including the fields recorded with each event and the validation criteria for each of those fields (see the example after this list).
  • Enriching the data with first-party and third-party data sets
  • Obfuscating data fields (e.g., for data protection)
  • Transforming the data into a format optimized for loading into downstream data destinations
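
As an illustration of how tracked events reference schemas, the sketch below uses the Snowplow browser tracker to send a self-describing event against a custom schema. The collector endpoint, application ID, and schema URI are placeholders for this example; the call names are those of the @snowplow/browser-tracker package.

```typescript
import { newTracker, trackSelfDescribingEvent } from '@snowplow/browser-tracker';

// Placeholder collector endpoint and application ID.
newTracker('sp1', 'https://collector.example.com', { appId: 'my-web-app' });

// The event references an Iglu schema URI (a hypothetical one here); the
// pipeline validates the payload against that schema's fields and criteria.
trackSelfDescribingEvent({
  event: {
    schema: 'iglu:com.example/button_click/jsonschema/1-0-0',
    data: {
      buttonId: 'signup-cta',
      pagePath: '/pricing',
    },
  },
});
```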

The data is delivered into downstream destinations activated by you. These cloud data destinations can include:

  • Data warehouses and lakehouses (e.g., Snowflake, Databricks, GCP BigQuery, AWS Redshift, AWS S3, Azure Fabric)
  • Streaming technologies (e.g., Kafka, GCP Pub/Sub, AWS Kinesis, Azure Event Hubs)

Distribution: Cloud or Private Managed Cloud (“PMC”)

Snowplow Data Pipeline is distributed via two models: Cloud and Private Managed Cloud.

If you choose Cloud distribution, all your customer behavioral data is processed on Snowplow’s own infrastructure, in our own cloud environment, before being delivered to your selected cloud data destination (e.g., data warehouse or lakehouse).

If you choose the Private Managed Cloud (PMC) distribution, your customer behavioral data is processed end-to-end in your own cloud infrastructure (on your AWS, GCP, or Azure Cloud accounts).

Outage Protection (AWS only)

The Outage Protection product protects you against data loss in the event of a region-wide AWS outage. If such an outage occurs, this service redirects your data processing to a secondary region selected by you until the outage is over and your data can be redirected to your previous region. While we cannot guarantee that data loss will be completely eliminated, Outage Protection will minimize it as much as practicable.

Infrastructure and Security

Standard

The following Infrastructure and Security features are included with the Data Pipeline (availability may differ between the Cloud and PMC distributions):

  • SSO: Single sign-on with support for Active Directory/LDAP, ADFS, Azure Active Directory, Azure Active Directory Native, Google Workspace, OpenID Connect, Okta, PingFederate, and SAML.
  • Custom IAM Permissions Boundary (AWS only): To control which IAM permissions Snowplow services are allowed to have, you may configure an IAM Permissions Boundary policy that sandboxes the service, in addition to or instead of account-wide permissions.
  • Custom Tagging (AWS only): Up to 5 custom tags can be defined that will be appended to every AWS resource that Snowplow deploys. If needed, specific tags can be defined for VPC assets and S3 bucket assets that are not propagated to every other resource.

High

The following Infrastructure and Security features can be added by customers to their Data Pipeline (availability may differ between the Cloud and PMC distributions):

  • PrivateLink (AWS only): Connect to a Snowflake or Redshift warehouse over a private AWS network using the AWS PrivateLink feature, as opposed to the public Internet. For Snowflake, PrivateLink is supported for loading data only, not for data modeling.
  • VPC Peering (AWS and GCP only): As part of the Snowplow pipeline setup, a Virtual Private Cloud (VPC) housing the pipeline is set up in your cloud account. If you wish to enable VPC peering between an existing VPC you own and the new Snowplow VPC, you can choose the CIDR/IP range used in the Snowplow-setup VPC so that peering is possible.
  • HTTP access controls: All HTTP (i.e., non-encrypted) traffic to internet-facing load balancers deployed as part of Snowplow can be disabled.
  • SSH access controls (AWS only): In line with your internal security policies, Snowplow’s SSH access to the environment can be disabled.
  • CVE Reporting (AWS and GCP only): Provides a periodic report on Common Vulnerabilities and Exposures (CVEs) identified in any relevant software component, as well as regular patching of the same.

Advanced

The following Infrastructure and Security features can be added by customers to their Data Pipeline (availability may differ between the Cloud and PMC distributions):

  • Custom CMK (AWS only): Allows Snowplow data in AWS Kinesis to be encrypted using a custom CMK (customer managed key) from another AWS account.
  • Custom IAM Policy (AWS only): Agent installation on EC2 nodes (e.g., the SSM agent) can require extra IAM permissions to function correctly. IAM policies attached to EC2 servers can be extended with a customer-defined policy if needed.
  • Custom VPC integration (AWS only): As part of a Private Managed Cloud deployment, Snowplow deploys a VPC within which all other Snowplow infrastructure is deployed. Customers who require Snowplow to set up pipelines and other Snowplow infrastructure in a pre-existing VPC (rather than creating one from scratch) should select this option. The VPC must allow Snowplow access to the internet via a directly connected Internet Gateway (IGW) and have sufficient NACL rules in place for the deployment to function as expected; it must be signed off by the Snowplow team prior to deployment.
  • Custom security agents (AWS and GCP only): On AWS, a customer’s custom security agents may be installed on all EC2 servers deployed as part of the service via an S3 object made available by the customer, and on all EKS clusters deployed as part of the service via a Helm chart. On GCP, a customer’s custom security agents may be installed on all GKE clusters deployed as part of the service via a Helm chart.
  • Custom EKS AMIs (AWS only): Provision of a custom hardened AMI (machine image) for use in EKS node pools instead of standard AWS images.

Note: the features that apply to you depend on the cloud infrastructure vendor you are using. For example, the Custom EKS AMIs feature described above only applies if you use AWS and are a PMC customer who purchases a subscription to the “Advanced” Infrastructure and Security.

Event Forwarding

Event Forwarding enables organizations to seamlessly deliver enriched behavioral data to their preferred downstream application and streaming destinations in real time.

Event Forwarding has native integrations with platforms like Google Tag Manager Server-Side, Kafka, Braze, and Amplitude, driving use cases such as real-time personalization, customer engagement, and analytics with minimal latency. Snowplow continually adds support for additional destinations.

Event Forwarding supports JavaScript-based event filtering, data mapping, and transformations to ensure your data is delivered in the format preferred by your downstream applications.
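
As a minimal sketch of this filtering and mapping logic, the function below drops everything except page views from a production web app and reshapes the payload for a downstream destination. The entry-point name, input fields, and return contract are assumptions for illustration and may differ from your configured forwarder; the event field names follow the Snowplow enriched event format.

```typescript
// Hypothetical JavaScript-style transformation for Event Forwarding.
// Assumptions for illustration: the forwarder calls main() once per enriched
// event, the serialized event is available on input.Data, and returning
// { FilterOut: true } drops the event. Check the Event Forwarding
// documentation for the exact contract of your configured destination.
interface ForwarderInput {
  Data: string;            // enriched event, serialized as JSON
  PartitionKey?: string;
  FilterOut?: boolean;
}

function main(input: ForwarderInput): ForwarderInput {
  const event = JSON.parse(input.Data);

  // Filter: only forward page views from the production web app.
  if (event.event_name !== 'page_view' || event.app_id !== 'prod-web') {
    return { ...input, FilterOut: true };
  }

  // Map: keep only the fields the downstream destination expects.
  const mapped = {
    userId: event.user_id,
    pageUrl: event.page_url,
    occurredAt: event.collector_tstamp,
  };

  return { ...input, Data: JSON.stringify(mapped) };
}
```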

Data Management

Snowplow provides standard access to functionality to help you manage and govern your behavioral data. This functionality is called “Event Data Management” and is further described in the table below.

You can also subscribe to our “Data Product Studio” and “Data Model Pack” products. These services provide enhanced functionality to manage and model your behavioral data, further described in the table below.

Event Data Management

Access to a library of Snowplow SDKs for collecting behavioral data in different application environments. A full list of our current Snowplow SDKs can be found in our documentation.

Tools for developers to support your setup of Snowplow’s SDKs including:

  • Snowtype: A tool for enabling developers to more easily integrate Snowplow tracking SDKs based on their data design by creating type-safe, client-specific functions and methods for instrumenting Snowplow tracking.
  • Snowplow Micro: A tool to enable developers to inspect Snowplow data from a development environment easily, and set up automated tests to fail builds that break Snowplow tracking.
  • Snowplow browser extension: A tool to enable developers to conveniently inspect and validate web tracking via a Chrome plugin.
  • Cookie extension service (formerly ID service): A tool to help you set your own first-party persistent cookies for tracking your users on your web domains.

A user interface that enables you to:

  • Instrument and configure pre-built Snowplow behavioral data sets (behavioral data products), including authoring up to 5 new custom schemas (data structures).
  • Enable and configure enrichments on the behavioral data sets
  • View the different behavioral data products you are using Snowplow to deliver, including the data definitions and instructions on how to access and understand the data.
  • Receive alerts for any deviations (quality issues) in the data collected from those definitions. (Failures can be reviewed via a live dashboard.)

Data Product Studio

Access to functionality to help you define, extend, manage, and socialize the data generated by the Snowplow Data Pipeline. The Data Product Studio includes the following functionality to enable you to:

  • Design unlimited new behavioral data sets (behavioral data products), including defining new schemas (data structures) and event specifications (semantics).
  • Assign ownership to those data sets.
  • Provide controls on who can create and update behavioral data set definitions.
  • Generate machine-readable data contracts for those behavioral data sets.
  • Report against the data set definition/design.
  • Record changes to data set definitions over time.
  • Enable users within your organization to “subscribe” to updates and receive notifications on changes to the associated definitions.

Data Model Packs

The Digital Analytics Data Model Pack comprises predefined data models, data product templates, and example dashboards with visualizations for the following use cases:

  1. User and Marketing Analytics: Understand your customer engagement with digital channels.
  2. Marketing Attribution: Understand the impact of different marketing channels on conversions and traffic levels.
  3. Funnel Analytics: Understand the sequential steps users take toward a specific goal, identifying drop-off points and optimizing the user journey for higher conversions.
  4. Video and Media Analytics: Understand engagement with video, audio, and streaming content, including clicks through to conversions and advertisements.

The Data Model Pack includes data product templates which specify the events to be tracked and data models (written using dbt open source software) that aggregate the underlying event-level data in the cloud data destination into AI- and Business Intelligence-ready tables. Example tables include a user-level table, a session-level table, and a pageview-level table. These tables directly power the graphs and charts in the example visualizations and user interfaces, and can be used and customized by you to perform more sophisticated analytics and AI. The included dbt packages are:

  • Unified Digital: Understand user behavior across web and mobile apps
  • Attribution: Attribute conversions and revenue through multiple attribution methods
  • Media Player: Calculate aggregate play and ad statistics across video, audio, and streaming content
  • Normalize: Filter and flatten your event data into a format more suitable for downstream applications
  • Utils: Contains our base processing logic for all other packages

The data models implement several data processing steps, including but not limited to:

  • Deduplicating the underlying event data
  • Stitching user identities across different platforms and channels (e.g., web and mobile)
  • Accurately calculating time spent engaging with different content items (e.g., web pages, mobile screens)
  • Sessionizing the data

The data models aggregate the data in a performant, incremental fashion which may reduce your cost of data processing and increase the speed of data delivery. The data models are extendable and run in your selected cloud data destination.

The Ecommerce Analytics Data Model Pack includes the underlying dbt models (written using dbt open source software) from the Digital Analytics Data Model Pack, along with an associated ecommerce data product and dbt package to help you understand and optimize a digital shopping experience.

The ecommerce dbt package creates AI and Business Intelligence-ready tables describing carts, checkouts, product performance, transactions, and sessions. These tables directly power the example visualizations and user interfaces, and can be used and customized to power more sophisticated analytics and AI.

Extensions

Snowplow Extensions introduce tools and integrations that enhance Snowplow’s core functionality, enabling organizations to seamlessly extend the value of their behavioral data. Extensions are designed to empower teams to operationalize insights, streamline workflows, and sync data by connecting Snowplow to a broader ecosystem of platforms and tools.

Reverse ETL

Reverse ETL, powered by Census, empowers data teams to operationalize their data by seamlessly syncing insights from data warehouses to business tools such as CRMs, marketing platforms, and analytics tools. Designed for flexibility and precision, Reverse ETL enables organizations to create personalized customer experiences, streamline workflows, and drive better decision-making by making data actionable across teams. With robust automation, field-level controls, and support for complex data models, Reverse ETL bridges the gap between data warehouses and the tools where business happens.

Audience Hub

Audience Hub, powered by Census, enables teams to create, manage, and activate highly targeted customer segments directly from their data warehouse. With an intuitive, no-code interface, Audience Hub empowers marketing, sales, and customer success teams to craft dynamic audiences based on real-time data, behavioral insights, and business logic. By eliminating the need for complex engineering workflows, it accelerates the ability to personalize campaigns, drive engagement, and improve customer retention.

Snowplow Signals

Snowplow Signals (“Signals”) is a real-time customer intelligence system that enables product and engineering teams to compute, store, and serve user attributes for real-time applications. Signals transforms behavioral data into actionable customer context for personalization engines, adaptive user interfaces, and agentic applications.

Signals operates within your cloud infrastructure and integrates with Snowplow Data Pipeline(s). User attributes can be computed using both streaming behavioral data and batch warehouse data. The system includes authentication mechanisms, autoscaling capabilities, and monitoring tools for production deployment. Signals is compatible with AWS and GCP cloud platforms and supports integration with Snowflake and Google BigQuery data warehouses. 

Signals offers fast-start tooling and resources for developers, including:

  • SDKs for Python and TypeScript to define user attributes and retrieve profile data
  • Declarative attribute definition framework 
  • Solution accelerators, code samples, and integration documentation
  • No-code user interface for configuring and monitoring Signals and managing the Profiles Store and Interventions.

Signals includes two core components:

Profiles Store

A low-latency API that provides applications with access to real-time and historical user attributes. The Profiles Store is populated by two data processing engines:

  • Streaming Engine: Computes real-time user attributes from live event streams in seconds
  • Sync Engine: Processes historical customer data from data warehouses and lakehouses, with materialization capabilities to sync computed user attributes to the Profiles Store. Snowplow customers can generate attribute tables using Signals, or sync their existing tables based on a shared attribute key.

Profile Reads are requests to the Profiles Store API to retrieve user attributes. Profile Writes are updates to user attributes in the Profiles Store from the Streaming Engine or Sync Engine.
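
As a purely illustrative sketch (not the documented Signals SDK or Profiles Store API surface), a Profile Read from an application service could look like the following; the endpoint path, authentication scheme, and attribute names are hypothetical.

```typescript
// Hypothetical Profile Read against the Profiles Store.
// The endpoint, header, and response shape below are illustrative
// placeholders, not the documented Signals API.
const SIGNALS_ENDPOINT = 'https://signals.example.com'; // placeholder
const SIGNALS_API_KEY = '<your-api-key>';               // placeholder

interface UserProfile {
  userId: string;
  attributes: Record<string, unknown>; // e.g. pages_viewed_last_7d, last_seen_at
}

async function readProfile(userId: string): Promise<UserProfile> {
  const response = await fetch(
    `${SIGNALS_ENDPOINT}/profiles/${encodeURIComponent(userId)}`,
    { headers: { Authorization: `Bearer ${SIGNALS_API_KEY}` } },
  );
  if (!response.ok) {
    throw new Error(`Profile read failed with status ${response.status}`);
  }
  return (await response.json()) as UserProfile;
}

// Example usage in a personalization service:
// const profile = await readProfile('user-123');
// if ((profile.attributes.pages_viewed_last_7d as number) > 10) { /* personalize */ }
```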

Interventions

A real-time decisioning framework that detects user behaviors based on configurable rules or machine learning models and executes personalized actions within applications. Interventions operate on the event stream within the Snowplow Data Pipeline to enable in-session personalization and contextual responses.