Introducing Advanced Event Filtering: Streamline Your Snowplow Data Pipelines
Data engineers and developers, we've heard your feedback. Processing irrelevant events wastes your resources and inflates your costs. Today, Snowplow announces a solution that addresses this problem at its source: pipeline-level event filtering during enrichment.
This new capability lets you define precise JavaScript conditions that identify and remove unwanted events (from bots, deprecated apps, or test environments, for example) before they progress through the rest of your pipeline. By filtering at the enrichment stage, teams eliminate unnecessary processing costs, storage fees, and downstream filtering complexity.
The Challenge: Not All Data Deserves Equal Treatment
In the world of data collection, volume doesn't always equal value. Even perfectly structured and enriched events can be irrelevant to your business goals:
- Bot traffic ranging from search engine crawlers to malicious scanners generates events that distort your customer behavior metrics
- Deprecated applications continue sending analytics data long after they've been phased out of your ecosystem
- Test environments produce events structurally identical to production but irrelevant for analysis
These events don't just muddy your analytics. They also cost real money through:
- Streaming and compute resources needed for processing
- Storage fees for data you'll never use
- Additional workloads to filter them downstream
Previous Limitations
Until now, options for handling irrelevant events have been suboptimal:
- Bot protection products are highly effective against known patterns, but they add cost, can affect UX (CAPTCHAs, for example), and cover only bot traffic
- Validation failure approaches remove unwanted events from analytics but pollute data quality metrics and can mask legitimate issues
- Downstream filtering creates clean data layers, but you still pay the processing and storage costs and you add complexity to the pipeline
Introducing Pipeline-Level Event Filtering
Our new feature enables you to filter out unwanted events directly during the enrichment phase of your Snowplow pipeline. This approach offers two key advantages:
1. Cost Efficiency
Events filtered at this stage:
- Don't consume additional compute resources
- Don't incur storage costs
- Aren't counted toward your Snowplow usage metrics
2. Developer-Friendly Configuration
Filter conditions are defined using JavaScript, giving you the flexibility to:
- Target specific event fields
- Leverage existing enrichment data (including bot detection signals)
- Implement complex conditional logic for precision filtering
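To make the idea concrete, here is a minimal sketch of the kind of condition you might write. It is illustrative only: the function name `shouldDrop`, the plain-object event shape, the `is_bot` field, and the app ID values are all assumptions made for this example, not the feature's actual interface. Refer to the documentation for how conditions are registered in your pipeline.

```javascript
// Hypothetical sketch: names, fields, and values here are assumptions
// for illustration, not the actual Snowplow filter interface.

// App IDs treated as retired or test-only (assumed values).
var BLOCKED_APP_IDS = ["legacy-ios-v1", "checkout-staging"];

// Simple heuristic for user agents that identify themselves as crawlers.
var BOT_PATTERN = /bot|crawler|spider/i;

// Returns true when the event should be removed from the pipeline.
function shouldDrop(event) {
  // Deprecated apps and test environments, matched by app ID.
  if (BLOCKED_APP_IDS.indexOf(event.app_id) !== -1) {
    return true;
  }
  // Signal set by an earlier bot-detection enrichment (hypothetical field).
  if (event.is_bot === true) {
    return true;
  }
  // Fallback: self-identified crawlers in the user agent string.
  var ua = event.useragent || "";
  return BOT_PATTERN.test(ua);
}
```

Keeping the decision in a single predicate makes the rules easy to test against sample events before they go anywhere near production traffic. The user-agent pattern is deliberately simple and will miss bots that disguise themselves, which is why the sketch checks an enrichment signal first.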
Getting Started
This feature is now available to all Snowplow BDP customers. To set it up, refer to the documentation or get in touch at support@snowplow.io.