How Do I Generate Redshift DDL for Existing Tables? A Quick Guide for Data Engineers
When managing event data at scale using Snowplow pipelines in Amazon Redshift, it’s not uncommon to migrate or replicate data between clusters — perhaps to isolate production workloads, test new models, or establish dev/stage environments.
But when the target cluster doesn’t yet have a matching table, you’ll need to recreate it — which means regenerating the original CREATE TABLE DDL.
This post explains how to generate that DDL directly from Redshift using AWS-provided utilities — a valuable technique for Snowplow users and any Redshift practitioner working with complex schemas.
Q: Can Redshift generate the DDL for an existing table?
Yes — while Redshift doesn’t have built-in SHOW CREATE TABLE support like MySQL, AWS provides an officially supported workaround via their v_generate_tbl_ddl view in the amazon-redshift-utils repository.
Q: How do I set it up?
To use this utility, follow these steps:
1. Clone the AWS Redshift Utils repo
Visit: https://github.com/awslabs/amazon-redshift-utils
2. Create the admin.v_generate_tbl_ddl view
Navigate to:
src/AdminViews/v_generate_tbl_ddl.sql
Ensure you have a schema named admin in your Redshift cluster:
CREATE SCHEMA admin;
Then, execute the SQL script in that file to create the view. You’ll need sufficient privileges (typically a superuser or an account with pg_catalog access).
Q: How do I generate the DDL for a specific table?
Once the view is created, simply run:
SELECT ddl
FROM admin.v_generate_tbl_ddl
WHERE schemaname = 'your_schema'
AND tablename = 'your_table';
This will return the full CREATE TABLE statement, including:
- Columns and data types
- Sort and distribution keys
- Encoding specifications
- Compression encodings
Q: Can I use this with Snowplow-generated tables?
Absolutely. This is particularly useful for Snowplow users who:
- Deploy to multiple Redshift clusters
- Need to replicate Snowplow atomic and derived schemas
- Are migrating between environments (e.g. staging to prod)
- Want to audit or version control DDLs for compliance or reproducibility
Snowplow’s dbt models, for example, generate tables dynamically during model runs — so being able to extract their final structure can be helpful for documentation or backup purposes.
Q: Is this better than using pg_table_def?
Yes — pg_table_def only gives you basic metadata (e.g. column names and types), whereas v_generate_tbl_ddl provides the full DDL, including constraints, encodings, and sort/dist keys — making it suitable for complete migrations.
Q: Any limitations or caveats?
- You must have sufficient privileges to access system tables (pg_class, pg_namespace, etc.)
- Some features (e.g. defaults, constraints) may be partially implemented, depending on your Redshift version
- The utility doesn’t output GRANT statements or COMMENT ON clauses — you’ll need to handle permissions separately
Final Thoughts
For Snowplow users managing structured event data in Redshift, the ability to programmatically extract CREATE TABLE DDLs simplifies migration, recovery, and multi-environment consistency. AWS’s v_generate_tbl_ddl view makes this straightforward, and is a must-have in any Redshift-based analytics toolkit.
Snowplow pipelines generate powerful, schema-rich data — this method helps you keep that structure portable and auditable across Redshift environments.