
GDPR Tutorial: Deleting Customer Data from Snowflake with Snowplow

By the Snowplow Team, May 9, 2024

With the enforcement of the GDPR (General Data Protection Regulation), organizations collecting behavioral data must be able to honor a user’s right to erasure — the "right to be forgotten." This means deleting a data subject’s personal data upon request from all systems, including your Snowplow pipeline’s storage target.

For users leveraging Snowflake as their data warehouse, the process of identifying and removing customer data is more straightforward than in other systems like Redshift. In this tutorial, we’ll walk through how to safely and comprehensively delete user data from the Snowplow atomic.events table in Snowflake while preserving data integrity and compliance.

When and Why to Delete Customer Data

Under GDPR, users can request erasure of their data. As a data engineer or architect responsible for compliance:

  • You must identify and delete all identifiable records.

  • You should understand how Snowflake’s Time Travel and Fail-safe features impact true deletion.

  • You must confirm that modeled data derived from atomic events will also be reprocessed or invalidated.

Assumptions for This Tutorial

  • You’re storing raw Snowplow events in Snowflake's atomic schema, specifically atomic.events.

  • Your data models are either fully recomputed daily or designed to reflect upstream deletions.

  • You’re deleting based on user_id, though the approach applies to other identifiers (e.g., domain_userid, network_userid, or user_ipaddress).

For incremental dbt models or non-recomputable downstream systems, further engineering consideration is needed and is out of scope for this guide.

Step 1: Validate the Target Data

Before running any deletion, it’s important to sanity-check what data will be removed:

SELECT
  COUNT(*) AS event_count,
  MIN(collector_tstamp) AS first_seen,
  MAX(collector_tstamp) AS last_seen
FROM
  atomic.events
WHERE
  user_id = 'REDACTED_USER_ID';

This helps verify the time window and volume of data impacted. Always confirm with stakeholders or auditors as needed.
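If the deletion request covers more than one identifier, a similar preview across the other Snowplow identifier columns can catch rows a `user_id` filter alone would miss. A sketch, where the `REDACTED_*` placeholder values are hypothetical stand-ins for the subject's actual identifiers:

```sql
-- Preview matches on other identifier-bearing columns
-- (placeholder values are hypothetical; substitute real identifiers)
SELECT
  COUNT_IF(user_id        = 'REDACTED_USER_ID')        AS user_id_rows,
  COUNT_IF(domain_userid  = 'REDACTED_DOMAIN_USERID')  AS domain_userid_rows,
  COUNT_IF(network_userid = 'REDACTED_NETWORK_USERID') AS network_userid_rows
FROM atomic.events;
```

Any nonzero count outside `user_id_rows` suggests the deletion predicate should be widened before Step 2.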

Step 2: Delete Events from atomic.events

If the previewed data looks correct, proceed with the deletion:

DELETE FROM atomic.events
WHERE user_id = 'REDACTED_USER_ID';

This statement permanently removes the matching rows from the active table. However, Snowflake retains historical versions, so deletion is not immediately permanent.
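A quick way to confirm the `DELETE` took effect on the active table is to re-run the Step 1 count and check that it now returns zero:

```sql
-- Verify no active rows remain for this subject
SELECT COUNT(*) AS remaining_rows
FROM atomic.events
WHERE user_id = 'REDACTED_USER_ID';
-- remaining_rows should be 0
```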

Step 3: Understand Time Travel and Fail-safe in Snowflake

Time Travel

Snowflake's Time Travel allows you to query previous versions of data for a defined retention period (by default, 1 day):

  • Standard Edition: 1-day retention (can be reduced to 0)

  • Enterprise Edition: Up to 90 days

This means deleted records can be recovered or viewed via Time Travel during that period. To comply with GDPR:

  • Minimize Time Travel retention for sensitive tables

  • Document and communicate the retention window clearly in privacy notices

Example of querying historical data:

SELECT *
FROM atomic.events AT (OFFSET => -3600) -- 3600 seconds = 1 hour ago
WHERE user_id = 'REDACTED_USER_ID';

Alternatively, drop and recreate the table with retention disabled (consider the impact on other jobs that depend on this table before doing so).
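Retention can also be reduced in place, without recreating the table, by setting the table's `DATA_RETENTION_TIME_IN_DAYS` parameter. For example, to disable Time Travel on the events table entirely:

```sql
-- Disable Time Travel for atomic.events (0 days of retention).
-- Trade-off: accidental DELETEs can no longer be undone via Time Travel.
ALTER TABLE atomic.events SET DATA_RETENTION_TIME_IN_DAYS = 0;
```

Weigh this against operational recovery needs; a short nonzero window (e.g. 1 day) is a common middle ground.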

Fail-safe

Fail-safe is an additional 7-day period after Time Travel ends, during which Snowflake can recover deleted data only upon request. You cannot access this data directly.

  • Fail-safe is non-configurable

  • Intended for disaster recovery, not GDPR compliance

  • Responses to data subject requests should note that complete removal may be delayed by up to the Time Travel retention period plus the 7-day Fail-safe window

💡 According to GDPR, data must be erased "without undue delay" — consider this in your retention policies.

Best Practices for GDPR Compliance with Snowplow + Snowflake

  • Design atomic tables with GDPR in mind:
    • Cluster on user identifiers where possible (Snowflake uses clustering keys rather than explicit partitions)
    • Avoid duplicating personal data across tables

  • Minimize retention windows:
    • Lower Time Travel where feasible
    • Control retention per table or schema with the DATA_RETENTION_TIME_IN_DAYS parameter

  • Automate erasure workflows:
    • Build stored procedures or use dbt macros to detect and delete by user_id
    • Track and log deletions for compliance audits

  • Ensure downstream data refreshes:
    • Use dbt or Airflow to re-run models post-deletion
    • Confirm the event deletion propagates to derived datasets

  • Audit your pipeline:
    • Run regular scans for PII leakage
    • Map all fields that may carry identifiers (e.g., user_ipaddress, domain_userid, network_userid, se_label)
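The "automate erasure workflows" practice above can be sketched as a Snowflake Scripting stored procedure. The procedure name and the `gdpr_deletion_log` audit table are assumptions for illustration, not part of Snowplow's schema:

```sql
-- Hypothetical procedure: delete one subject's events and log the action.
-- Assumes an audit table gdpr_deletion_log(user_id STRING, deleted_at TIMESTAMP_LTZ).
CREATE OR REPLACE PROCEDURE erase_user(target_user_id STRING)
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
  -- Remove the subject's rows from the active table
  DELETE FROM atomic.events WHERE user_id = :target_user_id;
  -- Record the deletion for compliance audits
  INSERT INTO gdpr_deletion_log (user_id, deleted_at)
    VALUES (:target_user_id, CURRENT_TIMESTAMP());
  RETURN 'deleted: ' || target_user_id;
END;
$$;

-- Invoke once per erasure request:
CALL erase_user('REDACTED_USER_ID');
```

Logging deletions in a dedicated table gives auditors a record of when each request was honored without retaining the erased data itself.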

Automating Deletions with dbt + Snowplow

If you’re managing your Snowflake transformations with dbt, consider adding deletion logic as part of your orchestration pipeline. For example:

-- macros/gdpr_deletion.sql
-- dbt models must be SELECT statements, so deletion logic belongs in a
-- macro (run via `dbt run-operation` or a hook), not in a model file.
-- Assumes a source named 'atomic' is defined for the raw events table.
{% macro gdpr_deletion() %}
  {% set sql %}
    DELETE FROM {{ source('atomic', 'events') }}
    WHERE user_id IN (
      SELECT user_id FROM {{ ref('gdpr_deletion_requests') }}
    )
  {% endset %}
  {% do run_query(sql) %}
{% endmacro %}

Trigger this with a flag-based mechanism or a separate dbt run step. Make sure this runs before your daily model builds.

Final Thoughts

As behavioral data pipelines mature, ensuring GDPR compliance becomes a core responsibility for data teams. With Snowplow and Snowflake, you have powerful tools to help:

  • Capture granular, high-quality behavioral data

  • Delete data programmatically and transparently

  • Maintain trust and compliance through data governance
