Make Big Data Small Again with Redshift ZSTD Compression
How Can Redshift ZSTD Compression Help Snowplow Users?
Amazon Redshift introduced support for Zstandard (ZSTD) compression in early 2017, unlocking substantial storage and performance efficiencies for Snowplow users working with large datasets. This guide explains how to implement ZSTD compression with Snowplow's atomic.events and other derived tables, what benefits you can expect, and common pitfalls to avoid.
What Is ZSTD and Why Does It Matter for Snowplow?
Zstandard, or ZSTD, is a modern compression algorithm developed at Facebook that combines a Lempel-Ziv dictionary stage with tANS entropy encoding. Compared to the older encodings Redshift offers, such as LZO or RUNLENGTH, ZSTD typically delivers:
- Higher compression ratios (typically 2–3x over Snowplow’s standard setup)
- Faster decompression speeds
- Support across all Redshift-compatible data types
These characteristics make ZSTD ideal for compressing large VARCHAR columns—especially useful for verbose fields like unstruct_event, contexts, or any schema with large JSON strings.
How Do I Apply ZSTD Compression in Redshift?
There are two main approaches:
1. Analyze Existing Tables
Use Redshift's ANALYZE COMPRESSION to profile a table and get encoding recommendations.
ANALYZE COMPRESSION atomic.events;
Pros:
- Automated suggestions
- Fast for small/medium tables
Cons:
- Sampling is capped at 1 billion rows, so very large tables are profiled on a subset of the data
- Recommends an encoding for every column, including SORTKEY columns, which is not ideal
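If the default sample is too coarse, COMPROWS raises the number of rows inspected (up to the 1 billion cap above), and a column list restricts the analysis to the fields you actually care about. A quick example using a few standard atomic.events columns; adjust to your own table:

-- Profile a larger sample, and only the verbose columns of interest.
-- Note: ANALYZE COMPRESSION takes an exclusive lock on the table while it runs.
ANALYZE COMPRESSION atomic.events (unstruct_event, contexts, useragent) COMPROWS 1000000;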
2. Manual Deep Copy with Explicit Encodings
Create a new table manually and explicitly define encodings based on your workload and query patterns, then deep copy the data (a fuller sketch follows the pros and cons below):
CREATE TABLE atomic.events_new (...); -- with ZSTD encodings
INSERT INTO atomic.events_new SELECT * FROM atomic.events;
Pros:
- Full control over compression types
- Opportunity to clean, sort, and reoptimize data layout
Cons:
- Requires disk headroom (50%+ recommended)
- Need to recreate views, permissions, and dependencies
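For illustration, here is a minimal sketch of that deep copy with a handful of atomic.events columns (the real table has well over a hundred); the types, lengths, and encodings shown are plausible choices rather than Snowplow's canonical DDL:

-- Sketch only: the real atomic.events has far more columns
CREATE TABLE atomic.events_new (
  event_id         CHAR(36)       ENCODE RAW,   -- distribution key, left uncompressed
  collector_tstamp TIMESTAMP      ENCODE RAW,   -- SORTKEY column, keep RAW
  app_id           VARCHAR(255)   ENCODE ZSTD,
  useragent        VARCHAR(1000)  ENCODE ZSTD,
  unstruct_event   VARCHAR(65535) ENCODE ZSTD,  -- large JSON strings compress very well
  contexts         VARCHAR(65535) ENCODE ZSTD
)
DISTSTYLE KEY
DISTKEY (event_id)
SORTKEY (collector_tstamp);

-- Deep copy, sorting on the way in so the new table lands in sort key order
INSERT INTO atomic.events_new
SELECT event_id, collector_tstamp, app_id, useragent, unstruct_event, contexts
FROM atomic.events
ORDER BY collector_tstamp;

Because the INSERT orders rows by the sort key into an empty table, the copy should arrive largely sorted and a full VACUUM can often be skipped; running ANALYZE on the new table afterwards is still worthwhile.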
Should I Compress Sort Keys?
Short answer: No.
Compressing SORTKEY columns can hurt query performance: a heavily compressed sort key packs far more rows into each 1 MB block, so range-restricted scans end up reading many more rows than they need. AWS and the Snowplow team recommend ENCODE RAW for these columns (e.g. collector_tstamp, event_id), as in the deep-copy sketch above.
What Compression Ratios Can I Expect?
Snowplow users have reported the following when switching from default compression to ZSTD:
- ~3x reduction in disk space across datasets of 10 million to 5 billion rows
- Negligible query performance impact, though this depends on workload and access patterns
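To verify the ratio on your own cluster, compare the old and new tables' footprints; svv_table_info reports size in 1 MB blocks along with row counts:

-- Compare on-disk size (1 MB blocks) and row counts of the old and new tables
SELECT "table", size, tbl_rows
FROM svv_table_info
WHERE schema = 'atomic'
  AND "table" IN ('events', 'events_new');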
What Are the Steps to Migrate Safely?
- Run ANALYZE COMPRESSION or analyze workloads manually
- Create new table with ZSTD-compressed columns (except SORTKEYs)
- Deep copy data into the new table
- Validate row counts and data consistency
- Recreate views, grants, foreign keys, and constraints
- Swap tables (rename atomic.events_new to atomic.events); a sketch of the validation and swap follows this list
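A minimal sketch of the validation and swap, assuming the deep copy has completed and keeping the old table around as a safety net:

-- 1. Validate: both counts should match before swapping
SELECT COUNT(*) FROM atomic.events;
SELECT COUNT(*) FROM atomic.events_new;

-- 2. Swap inside a transaction so readers never see a missing table
BEGIN;
ALTER TABLE atomic.events RENAME TO events_old;
ALTER TABLE atomic.events_new RENAME TO events;
COMMIT;

-- 3. Drop atomic.events_old only once grants, views, and downstream jobs are verified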
Be sure to:
- Pause pipeline if needed
- Check disk space before copying (a quick check query follows this list)
- Reapply table ownership and ACLs
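For the disk check, a rough cluster-wide usage figure can be pulled from stv_partitions (visible to superusers only); a deep copy temporarily holds two full copies of the table, so leave yourself plenty of headroom:

-- Rough cluster-wide disk usage; keep this comfortably low before a deep copy
SELECT SUM(used)::FLOAT / SUM(capacity) * 100 AS pct_disk_used
FROM stv_partitions;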
Use the SQL query below to extract user grants before migration:
-- Permission recovery query omitted for brevity (see source)
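That original query isn't reproduced here; as a rough stand-in, the built-in HAS_TABLE_PRIVILEGE function can snapshot per-user privileges before the swap (extend the privilege list as needed):

-- Not the original query: a simple snapshot of who can do what on atomic.events
SELECT u.usename,
       HAS_TABLE_PRIVILEGE(u.usename, 'atomic.events', 'select') AS can_select,
       HAS_TABLE_PRIVILEGE(u.usename, 'atomic.events', 'insert') AS can_insert,
       HAS_TABLE_PRIVILEGE(u.usename, 'atomic.events', 'delete') AS can_delete
FROM pg_user u
ORDER BY u.usename;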
What’s the Status of Tooling Support in Snowplow?
- igluctl: Now supports ZSTD as a compression option for DDL generation (since 2018)
- Schema Guru: Doesn’t auto-generate ZSTD yet, but can be integrated with igluctl workflows
For managed deployments, Snowplow BDP users can get support from the team for planning and executing ZSTD migrations.
Final Recommendations
- Use ZSTD for large VARCHAR and JSON-heavy fields
- Avoid compressing SORTKEYs
- Benchmark both compression ratio and query performance before deploying to production
- Always validate and backup before dropping or renaming key tables