Blog

Enriching Snowplow Events in Real Time with SQL Query Enrichment and MySQL

By
Snowplow Team
&
August 29, 2024
Share this post

Snowplow’s SQL Query Enrichment provides a powerful mechanism to widen your event data in real time. By querying a relational database as events flow through the pipeline, you can attach relevant attributes without inflating your tracking payloads or deferring enrichment to a batch process.

In this tutorial, we’ll show how to use SQL Query Enrichment with a MySQL database to enhance form submission events with user information stored in an internal CRM.

The Use Case

Imagine a website with a simple lead capture form. When a user submits their email address, you want to enrich that submission with details already known about them—such as their name and signup date—stored in a MySQL database.

By using SQL Query Enrichment, you can do exactly this in real time, ensuring that downstream systems like your data warehouse or customer engagement tools receive fully enriched user data.

Step 1 – Create the Lookup Table

We’ll use a MySQL table called users, which stores known information about site visitors:

CREATE TABLE users (
  id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  firstname VARCHAR(30),
  lastname VARCHAR(30),
  email VARCHAR(50) UNIQUE,
  reg_date TIMESTAMP
);

This table could be populated by your CRM or marketing platform. We'll use the email field as the join key when enriching events.

Step 2 – Track Email Input with a Snowplow Event

We’ll use the change_form event type provided by the Snowplow JavaScript Tracker to capture the email address entered in the form:

<script type="text/javascript">
  window.snowplow('newTracker', 'sp', 'collector.acme.com', { appId: 'web' });
  window.snowplow('enableFormTracking');
</script>

When a user submits the form, the tracker emits a change_form event. The entered email is included in the value field, which we’ll extract and use as the input to our enrichment.

Step 3 – Define a Schema for the Enriched Data

We’ll define a custom context schema to describe the enriched output we expect from our SQL query:

{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "self": {
    "vendor": "com.acme",
    "name": "user",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {
    "firstname": { "type": "string" },
    "lastname": { "type": "string" },
    "email": { "type": "string" },
    "reg_date": { "type": "string", "format": "date-time" }
  },
  "required": ["email"],
  "additionalProperties": false
}

Once created, upload this schema to your Iglu registry and generate JSONPaths if needed for your warehouse:

igluctl static generate --with-json-paths schemas/

Step 4 – Configure the SQL Query Enrichment

Now, let’s configure the enrichment to:

  • Extract the email address from the incoming event

  • Query the MySQL database for a matching user

  • Attach the result as a new context

Here’s a sample enrichment configuration:

{
  "schema": "iglu:com.snowplowanalytics.snowplow.enrichments/sql_query_enrichment_config/jsonschema/1-0-0",
  "data": {
    "name": "sql_query_enrichment_config",
    "vendor": "com.snowplowanalytics.snowplow.enrichments",
    "enabled": true,
    "parameters": {
      "inputs": [
        {
          "placeholder": 1,
          "json": {
            "field": "unstruct_event",
            "schemaCriterion": "iglu:com.snowplowanalytics.snowplow/change_form/jsonschema/1-0-0",
            "jsonPath": "$.value"
          }
        }
      ],
      "database": {
        "mysql": {
          "host": "your-db-host.amazonaws.com",
          "port": 3306,
          "sslMode": false,
          "username": "{{USERNAME}}",
          "password": "{{PASSWORD}}",
          "database": "{{DATABASE}}"
        }
      },
      "query": {
        "sql": "SELECT firstname, lastname, email, reg_date FROM users WHERE email = ? LIMIT 1"
      },
      "output": {
        "expectedRows": "AT_MOST_ONE",
        "json": {
          "schema": "iglu:com.acme/user/jsonschema/1-0-0",
          "describes": "EVERY_ROW",
          "propertyNames": "AS_IS"
        }
      },
      "cache": {
        "size": 3000,
        "ttl": 60
      }
    }
  }
}

Some notes:

  • placeholder: 1 tells the enrichment to use the extracted email as the first parameter in the SQL query.

  • expectedRows: "AT_MOST_ONE" avoids enrichment failure on duplicates.

  • Caching is enabled to reduce query volume and latency.

Step 5 – Test and Inspect the Enriched Event

Once deployed:

  1. Restart your enrichment job (e.g., Snowplow stream-enrich).

  2. Submit a test form with an email present in the users table.

  3. Use a tool like Snowplow Mini or a staging warehouse to inspect the resulting event.

You should see a derived context added to the event:

"derived_contexts": {
  "data": [
    {
      "schema": "iglu:com.acme/user/jsonschema/1-0-0",
      "data": {
        "firstname": "Jane",
        "lastname": "Doe",
        "email": "jane@example.com",
        "reg_date": "2021-02-14T13:25:00Z"
      }
    }
  ]

}

This enriched data is now available to any downstream consumers of the pipeline—from your data warehouse to your reverse ETL tool.

Why Use SQL Query Enrichment?

SQL Query Enrichment is a powerful and flexible way to:

  • Join event data with relational sources like MySQL, PostgreSQL, Redshift, or SQL Server

  • Keep tracking lightweight and flexible by deferring enrichment logic

  • Perform real-time lookups for use cases like user identification, product metadata, or campaign mapping

By integrating existing databases directly into your Snowplow enrichment process, you gain the ability to respond to business logic in real time—without compromising on event fidelity or schema governance.

Next Steps

  • Try replacing MySQL with Redshift or PostgreSQL for other lookup use cases.

  • Use this approach to join product IDs with catalog metadata, UTM tags with campaign info, or user IDs with subscription tiers.
Subscribe to our newsletter

Get the latest content to your inbox monthly.

Get Started

Whether you’re modernizing your customer data infrastructure or building AI-powered applications, Snowplow helps eliminate engineering complexity so you can focus on delivering smarter customer experiences.