Enriching Snowplow Events in Real Time with SQL Query Enrichment and MySQL
Snowplow’s SQL Query Enrichment provides a powerful mechanism to widen your event data in real time. By querying a relational database as events flow through the pipeline, you can attach relevant attributes without inflating your tracking payloads or deferring enrichment to a batch process.
In this tutorial, we’ll show how to use SQL Query Enrichment with a MySQL database to enhance form submission events with user information stored in an internal CRM.
The Use Case
Imagine a website with a simple lead capture form. When a user submits their email address, you want to enrich that submission with details already known about them—such as their name and signup date—stored in a MySQL database.
By using SQL Query Enrichment, you can do exactly this in real time, ensuring that downstream systems like your data warehouse or customer engagement tools receive fully enriched user data.
Step 1 – Create the Lookup Table
We’ll use a MySQL table called users, which stores known information about site visitors:
CREATE TABLE users (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
firstname VARCHAR(30),
lastname VARCHAR(30),
email VARCHAR(50) UNIQUE,
reg_date TIMESTAMP
);
This table could be populated by your CRM or marketing platform. We'll use the email field as the join key when enriching events.
Step 2 – Track Email Input with a Snowplow Event
We’ll use the change_form event type provided by the Snowplow JavaScript Tracker to capture the email address entered in the form:
<script type="text/javascript">
window.snowplow('newTracker', 'sp', 'collector.acme.com', { appId: 'web' });
window.snowplow('enableFormTracking');
</script>
When a user submits the form, the tracker emits a change_form event. The entered email is included in the value field, which we’ll extract and use as the input to our enrichment.
Step 3 – Define a Schema for the Enriched Data
We’ll define a custom context schema to describe the enriched output we expect from our SQL query:
{
"$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
"self": {
"vendor": "com.acme",
"name": "user",
"format": "jsonschema",
"version": "1-0-0"
},
"type": "object",
"properties": {
"firstname": { "type": "string" },
"lastname": { "type": "string" },
"email": { "type": "string" },
"reg_date": { "type": "string", "format": "date-time" }
},
"required": ["email"],
"additionalProperties": false
}
Once created, upload this schema to your Iglu registry and generate JSONPaths if needed for your warehouse:
igluctl static generate --with-json-paths schemas/
Step 4 – Configure the SQL Query Enrichment
Now, let’s configure the enrichment to:
- Extract the email address from the incoming event
- Query the MySQL database for a matching user
- Attach the result as a new context
Here’s a sample enrichment configuration:
{
"schema": "iglu:com.snowplowanalytics.snowplow.enrichments/sql_query_enrichment_config/jsonschema/1-0-0",
"data": {
"name": "sql_query_enrichment_config",
"vendor": "com.snowplowanalytics.snowplow.enrichments",
"enabled": true,
"parameters": {
"inputs": [
{
"placeholder": 1,
"json": {
"field": "unstruct_event",
"schemaCriterion": "iglu:com.snowplowanalytics.snowplow/change_form/jsonschema/1-0-0",
"jsonPath": "$.value"
}
}
],
"database": {
"mysql": {
"host": "your-db-host.amazonaws.com",
"port": 3306,
"sslMode": false,
"username": "{{USERNAME}}",
"password": "{{PASSWORD}}",
"database": "{{DATABASE}}"
}
},
"query": {
"sql": "SELECT firstname, lastname, email, reg_date FROM users WHERE email = ? LIMIT 1"
},
"output": {
"expectedRows": "AT_MOST_ONE",
"json": {
"schema": "iglu:com.acme/user/jsonschema/1-0-0",
"describes": "EVERY_ROW",
"propertyNames": "AS_IS"
}
},
"cache": {
"size": 3000,
"ttl": 60
}
}
}
}
Some notes:
- placeholder: 1 tells the enrichment to use the extracted email as the first parameter in the SQL query.
- expectedRows: "AT_MOST_ONE" avoids enrichment failure on duplicates.
- Caching is enabled to reduce query volume and latency.
Step 5 – Test and Inspect the Enriched Event
Once deployed:
- Restart your enrichment job (e.g., Snowplow stream-enrich).
- Submit a test form with an email present in the users table.
- Use a tool like Snowplow Mini or a staging warehouse to inspect the resulting event.
You should see a derived context added to the event:
"derived_contexts": {
"data": [
{
"schema": "iglu:com.acme/user/jsonschema/1-0-0",
"data": {
"firstname": "Jane",
"lastname": "Doe",
"email": "jane@example.com",
"reg_date": "2021-02-14T13:25:00Z"
}
}
]
}
This enriched data is now available to any downstream consumers of the pipeline—from your data warehouse to your reverse ETL tool.
Why Use SQL Query Enrichment?
SQL Query Enrichment is a powerful and flexible way to:
- Join event data with relational sources like MySQL, PostgreSQL, Redshift, or SQL Server
- Keep tracking lightweight and flexible by deferring enrichment logic
- Perform real-time lookups for use cases like user identification, product metadata, or campaign mapping
By integrating existing databases directly into your Snowplow enrichment process, you gain the ability to respond to business logic in real time—without compromising on event fidelity or schema governance.
Next Steps
- Try replacing MySQL with Redshift or PostgreSQL for other lookup use cases.
- Use this approach to join product IDs with catalog metadata, UTM tags with campaign info, or user IDs with subscription tiers.