Companies use the Snowplow Behavioral Data Platform to manage and operationalize behavioral data.
Behavioral data is collected from different digital touchpoints, typically including websites and mobile apps (referred to as “Sources”). Sources can also include third-party SaaS solutions that support webhooks (e.g. Zendesk, Mailgun).
The Snowplow Behavioral Data Platform processes and delivers this behavioral data to different endpoints (referred to as “Destinations”). From there, companies use that data to accomplish different use cases. Destinations include data warehouses (e.g. AWS Redshift, GCP BigQuery, Snowflake, Azure Synapse Analytics), data lakes (e.g. AWS S3, GCP GCS, Azure OneLake) and streams (e.g. AWS Kinesis, GCP Pub/Sub, Azure Event Hubs).
Data Applications can (optionally) be purchased as part of the Snowplow Behavioral Data Platform. These can include data models that transform the data in the data warehouse into AI- and BI-ready tables, and a user interface that enables end users to use the data to drive insight and activation.
Behavioral Data Pipeline
Key Capabilities
Robust data delivery
The Data Pipeline powers every step in processing the event data, from receiving events to delivering them to a data warehouse or data lake destination. It is engineered for maximum reliability – it’s highly available and scales automatically to match the load levels.
Secure data handling
Regardless of the deployment model (Cloud or Private Managed Cloud), the Data Pipeline is ISO 27001 compliant and is designed such that the data is always owned by the customer (rather than Snowplow), as an asset in their data warehouse or data lake.
Key Features
Deployment
Customers are able to choose between two deployment models for their Data Pipeline(s): Cloud or Private Managed Cloud.
Cloud
The Data Pipeline is deployed in Snowplow’s cloud infrastructure and configured to load to customer-owned cloud destinations (such as a data warehouse, data lake, or event stream). This is a traditional SaaS deployment model.
Private Managed Cloud (PMC)
The Data Pipeline is deployed in the Customer’s cloud infrastructure (AWS, GCP, Azure account) and managed by Snowplow. This is a secure deployment model whereby the customer has full visibility and auditability over the data processing steps and the data itself. The management of the Data Pipeline by Snowplow involves:
- Proactively ensuring uptime
- Managed upgrades, including compliance and security releases
- Performance and cost optimization
IAM Permissions Boundary
For the Private Managed Cloud service, the customer may configure an IAM Permissions Boundary policy to control which IAM permissions Snowplow services are allowed to have. This sandboxes the service in addition to, or instead of, account-wide SCPs.
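For illustration, the sketch below shows one way such a boundary could be created and attached with boto3. The policy contents, resource names, and listed services are assumptions for the example only, not Snowplow’s required permission set; the actual boundary is agreed with Snowplow for each deployment.

```python
# Illustrative sketch only: the policy contents and resource names below are
# assumptions, not Snowplow's actual permission requirements.
import json
import boto3

iam = boto3.client("iam")

# Example boundary document capping Snowplow-managed roles to a set of services.
boundary_document = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": [
            "ec2:*", "s3:*", "kinesis:*", "dynamodb:*",
            "cloudwatch:*", "autoscaling:*", "elasticloadbalancing:*",
        ],
        "Resource": "*",
    }],
}

boundary = iam.create_policy(
    PolicyName="snowplow-permissions-boundary",   # illustrative name
    PolicyDocument=json.dumps(boundary_document),
)

# Roles created for the Snowplow deployment can then be capped by the boundary.
iam.create_role(
    RoleName="snowplow-pipeline-role",             # illustrative name
    AssumeRolePolicyDocument=json.dumps({
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ec2.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }),
    PermissionsBoundary=boundary["Policy"]["Arn"],
)
```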
Data processing-as-a-Service
The Snowplow Behavioral Data Platform is provided as a service. Snowplow takes responsibility for the setup, administration, and successful running of the Snowplow behavioral data pipeline(s) and all related technology infrastructure.
Please note that for customers on our Private Managed Cloud service, the Snowplow team can only do the above subject to the customer providing Snowplow with the required level of access to their cloud infrastructure and complying with all Snowplow Documentation and reasonable instructions.
Data residency
For all data processed and collected with Snowplow Behavioral Data Platform, the customer decides:
- What data is tracked
- Where it is processed and stored (e.g. what cloud and region)
- What the data is used for and who has access to it
- How long the data is retained
The customer’s and Snowplow’s respective obligations with respect to data protection and privacy are set forth in a Data Protection Agreement.
For the Private Managed Cloud service, all data processed and tracked with the Snowplow Behavioral Data Platform is undertaken within the customer’s own cloud account (e.g. AWS, GCP, Azure). It is the customer’s obligation (and not Snowplow’s) to maintain and administer the cloud account.
For BDP Cloud customers, behavioral data is processed in Snowplow’s own cloud account (e.g. AWS, GCP, Azure) before being delivered into the destinations specified by the customer, which will typically include a data warehouse or lake in the customer’s own cloud environment.
Snowplow has maximum retention limits for Behavioral Data Platform customers using the Cloud service.
Workspaces
Customers will have access to one workspace as part of the Behavioral Data Pipeline, consisting of one production pipeline, zero or one staging pipeline, and zero or one development pipeline.
Test environments
Full staging pipeline
A minimally specced staging pipeline can be deployed to enable full end-to-end testing of changes prior to deployment in a production environment.
Development pipeline
An additional, limited-functionality pipeline (aka Snowplow Mini) is deployed for real-time debugging of events in OpenSearch against schemas and enrichments in development.
Integrated Test Suite Functionality
Snowplow Micro is made available to be used in integrated test suites to run automated checks prior to changes being deployed to a production environment.
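As a minimal sketch of what such a check might look like, the test below queries a locally running Snowplow Micro instance for its event counts and fails if any events were rejected. It assumes Micro is already running on port 9090 and exposes the /micro/all counts endpoint; verify the endpoint and port against the current Snowplow Micro documentation.

```python
# Minimal sketch: assumes Snowplow Micro is already running locally on port 9090
# (e.g. started from its Docker image in the CI job) and exposes /micro/all.
import requests

MICRO_URL = "http://localhost:9090"

def test_tracking_produces_no_failed_events():
    counts = requests.get(f"{MICRO_URL}/micro/all", timeout=5).json()
    # Fail the build if any event emitted by the test suite failed validation.
    assert counts["bad"] == 0, f"{counts['bad']} events failed validation"
    assert counts["good"] > 0, "expected at least one valid event"
```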
Behavioral Data Management
Event Data Management
Key Capabilities
Define and Track
The event data and its structure are defined up front and then tracked using a Snowplow SDK from websites, mobile applications, IoT devices, or server-side applications.
Validate and Enrich
The event data is processed in real time before being loaded to the destinations. Data is validated against the predefined structure and enriched against 1st- and 3rd-party datasets. Specific data fields can be pseudonymized and obfuscated at this stage as well.
Store
The event data is delivered to a data warehouse or lake in a uniform format that follows the defined data structure and is optimized for analytics and AI applications.
Key Features
Robust 1st party cookies
Customers can configure a custom collector domain to match the primary domain on which data is being collected to enable first party cookies. Customers can enhance the robustness of these cookies where ITP is enabled using an ID service.
Data products
Data products are used to define the event data to collect upfront, and manage and govern the data over time. Customers can create custom data products to meet the bespoke needs of their business as part of the Data Product Studio, or can benefit from Snowplow authored event specifications as part of the Data Application packages.
Data quality reporting & alerting
Events are validated as they are processed by the Data Pipeline. Customers can access a UI to monitor and configure email alerts on events failing validation.
Obfuscation transformations
Customers have the ability to configure real-time event transformations to truncate IP addresses or pseudonymize fields that may contain PII, such as user and session identifiers or referrer fields.
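The snippet below is an illustrative sketch of what these transformations do conceptually; it is not Snowplow’s implementation, which is configured as pipeline enrichments via the Snowplow UI.

```python
# Conceptual illustration only; in practice these transformations are
# configured as enrichments and applied inside the Data Pipeline.
import hashlib

def truncate_ip(ip: str, keep_octets: int = 2) -> str:
    """Zero out trailing octets of an IPv4 address, e.g. 203.0.113.42 -> 203.0.0.0."""
    octets = ip.split(".")
    return ".".join(octets[:keep_octets] + ["0"] * (4 - keep_octets))

def pseudonymize(value: str, salt: str = "example-salt") -> str:
    """Replace a PII field (user ID, session ID, referrer) with a salted hash."""
    return hashlib.sha256((salt + value).encode()).hexdigest()

print(truncate_ip("203.0.113.42"))   # -> 203.0.0.0
print(pseudonymize("user-12345"))    # -> deterministic, irreversible token
```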
Enrichments
Event Data Management ships with a set of default enrichments that enrich the data with 1st and 3rd party datasets in real time. Customers can configure these enrichments via the Snowplow UI.
Storage destinations
Events can be stored in one of the supported data warehouses or lakes. Customers can set up the required destinations via the Snowplow UI and monitor the number of events loaded.
Data Product Studio
Key Capabilities
Collaborative tooling
Data Product Studio ships with workflows and tooling designed to minimize data silos and support better collaboration between data producers and data consumers within a business.
Control & flexibility
Data Product Studio provides customers with the functionality to design and manage event data that is completely custom to the needs of their business. Centrally managed data schemas support the creation of well-governed, consistent event data across an organization.
Governance & data quality management
Customers are able to create well-documented, well-governed datasets and have access to enhanced data quality tooling, supporting the creation of trusted datasets that drive a self-serve culture around data.
Key features
Custom Events & Entities
Customers can access a UI in the Snowplow Data Product Studio to create, manage, and change the definitions for custom events and entities.
Custom Data Products
Customers will have access to functionality to create & manage custom data products, and deploy & manage custom data models.
Tracking Catalog
Tracking Catalog helps customers better understand the data that has already been collected. It contains a list of all the events, entities, and properties that have gone through the Data Pipeline, as well as functionality to see all properties associated with events and their structure, including an event map, which shows which entities are tracked with each event.
API key access
Customers have access to the Snowplow Data Product Studio API to read and write event definitions, access failed event aggregates, retrieve and execute data models, and manage data products.
Fine Grained User Permissions
Admin users of the Snowplow Data Product Studio can configure custom permissions for other users, governing their level of access to the monitoring and configuration functionality available in the UI and APIs.
Behavioral Data Applications
Digital Analytics
Key capabilities
Time to value
The Digital Analytics product consists of a suite of pre-built data applications that help customers to answer key questions about customer behavior and marketing performance on their digital products. These are accessible via the Snowplow UI.
AI & BI ready data
Digital Analytics ships with underlying dbt models, that prepare the data in the data warehouse in a format optimized for analytics and AI, and provide the specific aggregations necessary to power the data applications. These dbt models are run as part of the data app delivery.
Event volumes
Snowplow Behavioral Data Platform customers purchase capacity to process a certain volume of events per month, e.g. 100M or 5Bn.
Customers can use their capacity across all their Snowplow production pipelines if they have more than one: it is up to the customer how capacity is distributed between pipelines.
The “volume of events” in a given time period is calculated as the number of records written to the good enriched stream in that time period (UTC). For customers on the Private Managed Cloud deployment, on AWS this number is provided as a CloudWatch metric, on GCP as a Google Cloud metric, and on Azure as an Azure Monitor metric.
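For illustration, a Private Managed Cloud customer on AWS could read the monthly figure with boto3 along the lines of the sketch below. The namespace and metric name are placeholders; use the metric details provided by Snowplow for your deployment.

```python
# Illustrative sketch: namespace and metric name are placeholders, not the
# actual identifiers, which are provided by Snowplow for each deployment.
from datetime import datetime, timezone
import boto3

cloudwatch = boto3.client("cloudwatch")

resp = cloudwatch.get_metric_statistics(
    Namespace="YourSnowplowNamespace",   # placeholder
    MetricName="EnrichedGoodEvents",     # placeholder
    StartTime=datetime(2024, 1, 1, tzinfo=timezone.utc),
    EndTime=datetime(2024, 2, 1, tzinfo=timezone.utc),
    Period=86400,                        # daily datapoints summed below
    Statistics=["Sum"],
)

monthly_volume = sum(dp["Sum"] for dp in resp["Datapoints"])
print(f"Events written to the good enriched stream (UTC month): {monthly_volume:,.0f}")
```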
Enterprise Configurations
Enterprise configurations are a set of features that customers can purchase to meet more bespoke requirements.
Performance and Resilience
Customers with a Private Managed Cloud deployment on AWS with additional requirements may choose to enable the following features.
Outage Protection
An outage-protected AWS pipeline is deployed to span multiple distinct AWS regions: one “failover pipeline” is deployed into a different region from the primary pipeline. In the event of an outage in the primary pipeline’s region, traffic is rerouted to the failover pipeline, which buffers the data collected until the outage is over, at which point the buffered data is relayed back to the primary pipeline. This is available for Customers on AWS only.
Infrastructure and Security
Customers with a Private Managed Cloud deployment with custom infrastructure security requirements may choose any number of the following bundles to meet those requirements.
High
VPC Peering (AWS, GCP)
As part of the Snowplow pipeline setup, a Virtual Private Cloud (VPC) housing the pipeline is set up in the Customer’s cloud account.
Customers that wish to enable VPC peering between any existing VPC they own and the new Snowplow VPC can choose the CIDR/IP range used in the Snowplow-setup VPC so that peering is possible.
HTTP Access Controls (AWS)
All HTTP (i.e. non-encrypted) traffic to internet facing Load Balancers deployed as part of Snowplow Behavioral Data Platform can be disabled.
SSH Access Controls (AWS)
As part of customers’ internal security policies, Snowplow’s SSH access to the environment can be disabled.
CVE Reporting (AWS, GCP)
CVE Reporting provides a periodic report on Common Vulnerabilities and Exposures (CVEs) identified in any relevant software component, as well as regular patching of the same.
Advanced
Custom VPC Integration (AWS)
As part of a Private Managed Cloud deployment, Snowplow deploys a VPC for all other Snowplow infrastructure to be deployed within.
If customers require Snowplow to set up pipelines and other Snowplow infrastructure in a pre-existing VPC (rather than creating one from scratch), they need to select this option.
This VPC must allow Snowplow access to the internet via a directly connected Internet Gateway (IGW) and must have NACL rules permissive enough for the deployment to function as expected; the VPC must be signed off by the Snowplow team prior to deployment.
Custom IAM Policy (AWS)
Agents installed on EC2 nodes (e.g. the SSM agent) can require extra IAM permissions to function correctly. IAM policies attached to EC2 servers can be extended with a customer-defined policy if needed.
Custom Security Agents (AWS, GCP)
On AWS, for all EC2 servers that are deployed as part of the service, a customer’s custom security agents may be installed via an S3 object made available by the customer.
On GCP, for all GKE clusters that are deployed as part of the service, a customer’s custom security agents may be installed via a helm chart.
This is run as an addendum to Snowplow’s user-data scripts and can allow customers to meet certain security compliance needs.
Custom EKS AMIs (AWS)
Provision of a custom hardened AMI (machine image) for use in EKS node pools instead of standard AWS images.
Additional Cloud Destinations
Each workspace will load events to one data warehouse or lake. Private Managed Cloud customers also receive one streaming destination as part of the workspace.
Customers may choose to load event data to additional destinations. These are destinations within the customer’s cloud from where data applications can be built such as data warehouses, data lakes and event streams.
Additional Workspaces
Customers may purchase additional workspaces. Note that the additional workspace must be of the same deployment type as the first unless otherwise agreed with Snowplow.
Service Level Agreements (SLAs)
Snowplow provides Service Level Agreements (SLAs) on Collector Uptime, Data Latency and Support.
Collector Uptime SLA
Snowplow BDP customers benefit from uptime SLAs given above. Uptime refers to “collector uptime” i.e. the availability of the Snowplow collector, that receives data for processing from different sources.
‘Collector Uptime’ is the % of time that the collector is available over a calendar month, and is calculated according to the following:
[total minutes in calendar month that collector is ‘up’ / total minutes in a calendar month]
The collector is defined as ‘up’ if the number of 5xx responses make up less than 5% of responses in a 1 minute period. If there are no requests in the period the collector will also be defined as being ‘up’.
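As an illustrative calculation (figures assumed): a 30-day month contains 43,200 minutes, so a collector that is ‘up’ for 43,178 of those minutes has an uptime of:

```latex
\text{Collector Uptime} = \frac{43\,178}{43\,200} \approx 99.95\%
```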
Snowplow Analytics Limited commits to providing a monthly Collector Uptime percentage to the Client as denoted by the pipeline type.
If Snowplow Analytics Limited does not meet the commitment over a calendar month, the customer will be entitled to service level credits on their Snowplow BDP monthly fee equal to the % of time that we are in breach of the SLA, up to a maximum of 20%.
The SLA will not apply to any downtime due to:
- An AWS, GCP or Azure outage
- A failure by AWS, GCP or Azure to scale up the collector load balancer*
- A Client making any direct configuration change to any of the Snowplow BDP infrastructure running in their cloud account
- A feature identified as being pilot, alpha or beta
* Snowplow is responsible for and controls the scaling of the collector application, but AWS, GCP and Azure control and are responsible for the scaling of the load balancer.
Data Latency SLA
Snowplow BDP customers benefit from Data Latency SLAs given above.
The Snowplow Analytics Data Latency SLA is calculated as follows:
[Total time in a calendar month that the latency of the data is within the time period denoted / Total time in a calendar month]
Snowplow Analytics will ensure that data is available in the destination selected within the time periods denoted by the pipeline type, 99.9% of the time each calendar month (based on UTC timestamp).
The latency of data in Redshift and Snowflake is measured at each point in time as the difference between the current time and the max collector timestamp for all events loaded into that destination.
The latency of data in BigQuery is measured by periodically sampling (e.g. every 1 second) the difference between the collector timestamp and the current time for events as they are loaded into the destination.
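For illustration (notation assumed, not taken from the agreement), the Redshift and Snowflake measurement can be written as:

```latex
\mathrm{latency}(t) \;=\; t \;-\; \max_{e \,\in\, E(t)} \mathrm{collector\_tstamp}(e)
```

where E(t) is the set of events loaded into the destination by time t; the BigQuery measurement samples the same difference per event as it is loaded.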
If Snowplow Analytics Limited does not meet the commitment over a calendar month, the customer will be entitled to service level credits on their Snowplow BDP monthly fee equal to the % of time that we are in breach of the SLA, up to a maximum of 20%.
The SLA will not apply if a failure to process and load the data into the data warehouse is due to a factor out of the control of Snowplow Analytics Limited, for example:
- The customer-owned and managed data warehouse has run out of capacity, so it is not possible to load data or to load it in a performant manner
- An outage of the data warehouse
- A broader outage in AWS, GCP or Azure
- The Customer making any direct configuration change to any of the Snowplow BDP infrastructure running in their cloud account
The SLA will also not apply:
- For failed events e.g. events that fail to validate against the associated schemas
- If latency is caused by features that are identified as either pilot, alpha or beta
- For any data that does not reach the Snowplow collector to be processed (e.g. because of an issue upstream of the collector such as a network or connectivity issue)
- Until onboarding has been completed with the Customer and production-level volumes are being processed by the pipeline
In order to ensure that we can honor the SLA stated, Snowplow Analytics Limited reserves the right to make periodic adjustments to the Customer’s pipeline, unless otherwise agreed in writing with the Customer.
Support SLA
Snowplow provides support to Snowplow Behavioral Data Platform customers under the terms and services described in our Statement of Support.
When you contact Snowplow Support, tickets are prioritised as level 1 (highest) to level 4 (lowest), depending on the severity and impact of the issue or question. These severity levels are defined in our Statement of Support.
Snowplow Behavioral Data Platform customers also have access to our Community Forum.
Support SLAs are dependent on the severity of the issue and on which Support Level (Standard or Enhanced) you have purchased. Snowplow reserves the right to adjust the Severity of any Support Ticket to align with the definitions laid out in the Statement of Support.
In the event that a customer is entitled to access Snowplow Support through multiple agreements or multiple product purchases, the highest SLA across those entitlements will apply across all purchased products. SLA commitments are still bound by the terms of service outlined in the Statement of Support and the associated severity of any ticket.
Snowplow will ensure that customers receive human responses to questions submitted via support tickets as detailed in the table above. Snowplow does not commit to resolution time SLAs. In the event that Snowplow does not meet its obligations under the SLA, the customer will be entitled to receive service level credits on their Snowplow Behavioral Data Platform fee of 2% of the monthly fee per support ticket if Snowplow ultimately determines that the SLA was breached, up to a maximum of 20% per month.
The monthly fee is defined as the annual contract value ÷ 12.
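As an illustrative calculation (figures assumed): with an annual contract value of $120,000 the monthly fee is $10,000, and two support tickets ultimately determined to have breached the SLA would entitle the customer to:

```latex
2 \times 2\% \times \$10{,}000 = \$400 \quad (\text{capped at } 20\% \times \$10{,}000 = \$2{,}000 \text{ per month})
```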
Service Level Credits
Combined credits across all SLA breaches (collector uptime, data latency and support) will be capped at 20% of the monthly fee.