Can I Use Snowplow Data to Build Sankey Diagrams of User Journeys?

By Snowplow Team, September 24, 2024

User journey visualization is a powerful technique for understanding how users flow through a website or app. Tools like Google Analytics (GA) offer only limited, out-of-the-box path reports. With Snowplow, you can build custom, flexible, and privacy-respecting Sankey diagrams that reveal actual user paths and drop-offs across sessions.

In this post, we explain how to turn your Snowplow event data into Sankey-style visualizations, covering data modeling, transformation, and rendering options.

Q: What is a Sankey diagram in the context of digital analytics?

A Sankey diagram is a flow-based visualization that shows the movement of users (or other entities) between states or stages — such as page views or actions. It’s often used to reveal:

  • Popular entry paths

  • Drop-off points in funnels

  • Conversion paths across multiple steps

  • Differences between audience segments

For Snowplow users, Sankey diagrams can bring rich context to web, mobile, or app journeys using the full power of your event-level data.

Q: Can I build this with Snowplow data?

Yes — Snowplow’s granular behavioral data is a perfect fit for constructing Sankey diagrams. Unlike aggregated tools (like GA), Snowplow gives you:

  • Session-level and page-level events

  • Full timestamped navigation sequences

  • Custom contexts (e.g. product categories, device type, marketing source)

  • First-party data with ownership and flexibility

You can model exactly the journey flow you want — and segment it however you choose.

Q: What data structure do I need to generate a Sankey diagram?

To render a Sankey diagram, most libraries and BI tools expect a list of links: one row per pair of consecutive steps (a source and a target), with a count of how many times that transition occurred.

To get this structure from Snowplow:

  1. Group events by user session

  2. Sort by event timestamp

  3. Create “step pairs” from the ordered event paths

  4. Aggregate by step pair

This can be done in dbt, SQL, or Spark depending on your Snowplow setup.
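As a rough illustration of the four steps above, here is a minimal Python sketch that works on a handful of in-memory rows. The sample events and the `step_pair_counts` helper are hypothetical and only show the shape of the transformation; in practice this logic would usually run in your warehouse.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows: (session id, ISO timestamp, page path).
events = [
    ("s1", "2024-09-24 10:00:00", "/home"),
    ("s1", "2024-09-24 10:01:00", "/pricing"),
    ("s1", "2024-09-24 10:02:00", "/signup"),
    ("s2", "2024-09-24 11:00:05", "/pricing"),
    ("s2", "2024-09-24 11:00:00", "/home"),
]


def step_pair_counts(rows):
    """Return a Counter mapping (from_step, to_step) to a transition count."""
    # 1. Group events by session.
    sessions = defaultdict(list)
    for session_id, ts, page in rows:
        sessions[session_id].append((ts, page))

    counts = Counter()
    for visits in sessions.values():
        # 2. Sort each session's events by timestamp.
        path = [page for _, page in sorted(visits)]
        # 3. Pair each step with the one that follows it.
        # 4. Aggregate identical pairs across all sessions.
        counts.update(zip(path, path[1:]))
    return counts


flows = step_pair_counts(events)
for (from_step, to_step), n in flows.items():
    print(from_step, "->", to_step, n)
```

Note that a session with a single page view produces no pairs, so exits are not represented unless you add an explicit end-of-session step.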

Q: How do I model user journeys in dbt?

If you’re using the Snowplow dbt packages, you already have access to modeled sessions and page_views data.

A basic journey model might look like:

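-- Table and column names here are illustrative; match them to the
-- page views model in your warehouse.
WITH ordered_events AS (
  SELECT
    domain_sessionid,
    page_urlpath,
    -- Number each page view within its session, in time order
    ROW_NUMBER() OVER (
      PARTITION BY domain_sessionid
      ORDER BY event_time
    ) AS step_order
  FROM snowplow.page_views
)
, step_pairs AS (
  -- Pair each page view with the next one in the same session.
  -- Joining on the session alone is sufficient, and it avoids silently
  -- dropping rows where domain_userid is NULL.
  SELECT
    a.page_urlpath AS from_step,
    b.page_urlpath AS to_step
  FROM ordered_events a
  JOIN ordered_events b
    ON a.domain_sessionid = b.domain_sessionid
   AND a.step_order = b.step_order - 1
)
-- Count how often each transition occurs across all sessions
SELECT
  from_step,
  to_step,
  COUNT(*) AS flow_count
FROM step_pairs
GROUP BY from_step, to_step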

This can be materialized into a table and used directly in your BI tool or exported for visualization.
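Before materializing a model like this, it can help to sanity-check the logic on a few rows. The sketch below runs a trimmed-down version of the query against an in-memory SQLite database from Python. The `page_views` table and its sample rows are made up for the test, and it assumes a SQLite build with window-function support (3.25 or later); your warehouse dialect may differ slightly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_views ("
    "domain_sessionid TEXT, page_urlpath TEXT, event_time TEXT)"
)
# Hypothetical sample data: three sessions, one of them a single page view.
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?, ?)",
    [
        ("s1", "/home", "2024-09-24 10:00:00"),
        ("s1", "/pricing", "2024-09-24 10:01:00"),
        ("s1", "/signup", "2024-09-24 10:02:00"),
        ("s2", "/home", "2024-09-24 11:00:00"),
        ("s2", "/pricing", "2024-09-24 11:00:30"),
        ("s3", "/blog", "2024-09-24 12:00:00"),
    ],
)

QUERY = """
WITH ordered_events AS (
  SELECT
    domain_sessionid,
    page_urlpath,
    ROW_NUMBER() OVER (
      PARTITION BY domain_sessionid
      ORDER BY event_time
    ) AS step_order
  FROM page_views
),
step_pairs AS (
  SELECT a.page_urlpath AS from_step, b.page_urlpath AS to_step
  FROM ordered_events a
  JOIN ordered_events b
    ON a.domain_sessionid = b.domain_sessionid
   AND a.step_order = b.step_order - 1
)
SELECT from_step, to_step, COUNT(*) AS flow_count
FROM step_pairs
GROUP BY from_step, to_step
ORDER BY flow_count DESC, from_step, to_step
"""

rows = conn.execute(QUERY).fetchall()
for from_step, to_step, flow_count in rows:
    print(from_step, "->", to_step, flow_count)
```

The single-page session `s3` contributes no rows, which is worth keeping in mind if you want the diagram to show where sessions end.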

Q: What tools can I use to visualize a Sankey diagram?

Popular options include:

  • Looker / Tableau / Power BI

    • Some support Sankey visualizations via custom extensions or plug-ins

  • Observable / D3.js / Plotly

    • Full control via code for web-based dashboards

  • Python (Plotly, HoloViews)

    • Ideal for data scientists working in Jupyter or Streamlit apps

  • SankeyMATIC

    • A browser-based tool for quick diagrams built from pasted flow data

Choose based on your team’s skill set and delivery needs — Snowplow gives you the flexibility to integrate with any visualization layer.
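Many code-based renderers, Plotly among them, describe a Sankey diagram as a list of node labels plus links that refer to those nodes by index. The following is a minimal sketch of that reshaping step in plain Python; the `flow_rows` sample and the `to_sankey_inputs` helper are hypothetical.

```python
# Hypothetical aggregated output of a journey model:
# (from_step, to_step, flow_count).
flow_rows = [
    ("/home", "/pricing", 120),
    ("/home", "/blog", 45),
    ("/pricing", "/signup", 60),
]


def to_sankey_inputs(rows):
    """Convert step-pair rows into node labels and index-based links."""
    labels = []
    index = {}
    # Assign each distinct step a stable integer id in first-seen order.
    for from_step, to_step, _ in rows:
        for step in (from_step, to_step):
            if step not in index:
                index[step] = len(labels)
                labels.append(step)
    links = {
        "source": [index[f] for f, _, _ in rows],
        "target": [index[t] for _, t, _ in rows],
        "value": [n for _, _, n in rows],
    }
    return labels, links


labels, links = to_sankey_inputs(flow_rows)
print(labels)
print(links)
```

With Plotly installed, these structures can be passed along the lines of `go.Figure(go.Sankey(node=dict(label=labels), link=links))`, where `go` is `plotly.graph_objects`.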

Q: Are there best practices for Sankey diagrams with Snowplow?

Absolutely. Here are some:

  • Limit the number of unique steps: Group long-tail pages into “Other” or “Explore” to keep visuals readable

  • Use meaningful labels: Consider labeling steps by category or funnel stage rather than raw URLs

  • Segment the audience: Visualize journeys by device, campaign, or referrer for deeper insights

  • Pre-aggregate the data: Don’t render at the session level — use aggregated pairs (as shown above) to optimize performance
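To illustrate the first practice, here is one possible way to fold long-tail steps into an "Other" bucket after aggregation. It is a sketch under assumptions: the `bucket_long_tail` helper, the `keep` threshold, and the sample rows are all hypothetical, and ranking steps by total traffic is just one reasonable choice.

```python
from collections import Counter

# Hypothetical aggregated step pairs: (from_step, to_step, flow_count).
flow_rows = [
    ("/home", "/pricing", 120),
    ("/home", "/blog/post-a", 4),
    ("/home", "/blog/post-b", 3),
    ("/pricing", "/signup", 60),
    ("/blog/post-a", "/pricing", 2),
]


def bucket_long_tail(rows, keep=3, other_label="Other"):
    """Keep the `keep` busiest steps and merge all others into one label."""
    # Total traffic touching each step, as either source or target.
    volume = Counter()
    for from_step, to_step, n in rows:
        volume[from_step] += n
        volume[to_step] += n
    kept = {step for step, _ in volume.most_common(keep)}

    # Relabel long-tail steps, then re-aggregate the merged pairs.
    merged = Counter()
    for from_step, to_step, n in rows:
        a = from_step if from_step in kept else other_label
        b = to_step if to_step in kept else other_label
        merged[(a, b)] += n
    return merged


bucketed = bucket_long_tail(flow_rows, keep=3)
for (from_step, to_step), n in bucketed.items():
    print(from_step, "->", to_step, n)
```

If two long-tail pages follow each other, this produces an "Other" to "Other" link, which you may prefer to filter out before rendering.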

Q: Does Snowplow support this out of the box?

While Snowplow doesn’t ship with built-in Sankey visualizations, it provides everything you need to build one, including:

  • Complete event-level, time-ordered data

  • Out-of-the-box session modeling

  • Integration with dbt, BigQuery, Redshift, and Databricks

  • First-party, privacy-compliant tracking from web, mobile, and server sources

Final Thoughts

Sankey diagrams are a powerful way to bring your Snowplow data to life. Whether you’re mapping product funnels, onboarding flows, or content journeys, Snowplow gives you the depth, structure, and ownership needed to visualize user behavior with confidence.

Your data, your structure, your visual — Snowplow puts you in control of your user journey analytics.
