Blog

JSONPaths Files Demystified: Mapping Unstructured Data into Redshift with Confidence

By
Snowplow Team
&
August 28, 2024
Share this post

Working with unstructured or self-describing events in Amazon Redshift often leads to one of the most frequently asked questions among Snowplow users:

“What is a JSONPaths file and how do I use it correctly with Redshift?”

While JSONPaths files aren’t unique to Snowplow, they play a critical role in how Snowplow loads custom event and context data into your Redshift tables. This post walks through how JSONPaths work, common pitfalls, and best practices — with real examples tailored for Snowplow-powered pipelines.

What is a JSONPaths file and why do I need one?

A JSONPaths file is a text file used by Amazon Redshift's COPY FROM JSON command to map fields from JSON data to columns in a Redshift table.

Snowplow uses JSONPaths files when loading self-describing events and contexts from S3 into Redshift via the COPY command. Since JSON field order is not guaranteed and the keys might not match column names, a JSONPaths file explicitly defines how to extract the relevant fields in the correct order.

Think of JSONPaths like XPath, but for JSON.

How does a JSONPaths file work?

A JSONPaths file contains a single JSON object with one key: "jsonpaths", which maps to an array of JSONPath expressions. Each expression tells Redshift how to locate a field in the incoming JSON object.

Example:

Given this input JSON:

{
  "schema": "iglu:com.acme/context/jsonschema/1-0-0",
  "data": {
    "version_app": "1.7.0",
    "build": "1002"
  }
}

A corresponding JSONPaths file might look like:

{
  "jsonpaths": [
    "$.data.version_app",
    "$.data.build"
  ]
}

Redshift uses these paths to extract values and map them to the target table columns, in order.

Common Pitfall: Field Order, Not Field Names

The biggest gotcha: JSONPaths files map values to Redshift columns by order, not by name.

Redshift doesn’t care what you call the keys in your JSON. It relies on the position of each JSONPath expression in the array matching the column order of your Redshift table.

So if your table is:

CREATE TABLE com_acme_context (
  app_version VARCHAR,
  build_id    VARCHAR
);

The first path in the JSONPaths file should extract version_app, the second should extract build, even if their key names differ from the column names.

How does this fit into Snowplow’s pipeline?

When Snowplow enriches events and stores them in S3, any attached contexts or unstructured events are serialized as self-describing JSONs. These are loaded into Redshift using COPY FROM JSON with a JSONPaths file generated based on the Iglu schema for that event or context.

Snowplow’s Redshift Loader automatically manages this if you're using the standard pipeline configuration, including:

  • Generating CREATE TABLE statements from your Iglu schemas

  • Creating matching JSONPaths files

  • Running COPY with the correct mapping

However, if you are writing custom loaders or troubleshooting data not appearing correctly, understanding and validating the JSONPaths file becomes essential.

Where do JSONPaths files live?

In a typical Snowplow Redshift setup:

  • JSONPaths files are stored in your Iglu-compatible resolver repo, often alongside your schemas.

  • When using Snowplow’s batch loader, these files are uploaded to the same S3 location where enriched data is stored.

  • The loader references the JSONPaths file in the COPY command like so:

COPY com_acme_context
FROM 's3://your-bucket/enriched/archive/'
CREDENTIALS 'aws_iam_role=...'
JSON 's3://your-bucket/jsonpaths/com.acme/context_1.jsonpaths'

Best Practices for Using JSONPaths with Redshift

✅ Use dot notation consistently

Snowplow-generated JSONPaths files use dot notation (e.g., $.data.version_app). Stick to this format unless you have a reason to use bracket notation.

✅ Match Redshift column order exactly

The position of each JSONPath in the array must correspond to the column order in your Redshift CREATE TABLE statement — not the JSON key names.

✅ Use Iglu schemas to automate this

If you’re managing schemas through Iglu Central or your own Iglu registry, Snowplow tools can auto-generate both the Redshift table DDL and the JSONPaths file using Iglu Schema Tools.

✅ Validate JSONPath files

Before loading data, test your JSONPaths files with sample JSON to ensure values are extracted correctly. This can help avoid silent data drops during the Redshift load phase.

Recap: How Redshift COPY Works with JSONPaths

Component - Role

JSON Input - The event or context data in self-describing JSON

JSONPaths File - List of JSONPath expressions to extract values

Redshift Table - Defines column order; not matched by field names

COPY Command - Reads from S3 using mapping defined in JSONPaths

💬 Still Seeing Data Misalignment?

Check for:

  • Mismatched column order between JSONPaths and Redshift table

  • Missing or incorrect JSONPath expressions

  • Copying from the wrong S3 folder or referencing the wrong JSONPaths file

  • Typos in the schema names or paths

If you're stuck, get in touch with our team.

JSONPaths may seem obscure at first, but once you understand their mapping role, they become a powerful tool in loading structured behavioral data from JSON into Redshift with precision — enabling better modeling, cleaner data, and stronger analytics.

Subscribe to our newsletter

Get the latest content to your inbox monthly.

Get Started

Whether you’re modernizing your customer data infrastructure or building AI-powered applications, Snowplow helps eliminate engineering complexity so you can focus on delivering smarter customer experiences.