Can I Use Snowplow Data to Build Sankey Diagrams of User Journeys?
User journey visualization is a powerful technique for understanding behavior flows across a website or app. Tools like Google Analytics (GA) offer only limited out-of-the-box flow reports, but with Snowplow you can create custom, flexible, and privacy-respecting Sankey diagrams that reveal the paths users actually take and where they drop off across sessions.
In this post, we explain how to turn your Snowplow event data into Sankey-style visualizations, covering data modeling, transformation, and rendering options.
Q: What is a Sankey diagram in the context of digital analytics?
A Sankey diagram is a flow-based visualization that shows the movement of users (or other entities) between states or stages — such as page views or actions. It’s often used to reveal:
- Popular entry paths
- Drop-off points in funnels
- Conversion paths across multiple steps
- Differences between audience segments
For Snowplow users, Sankey diagrams can bring rich context to web and mobile app journeys using the full power of your event-level data.
Q: Can I build this with Snowplow data?
Yes — Snowplow’s granular behavioral data is a perfect fit for constructing Sankey diagrams. Unlike aggregated tools (like GA), Snowplow gives you:
- Session-level and page-level events
- Full timestamped navigation sequences
- Custom contexts (e.g. product categories, device type, marketing source)
- First-party data with ownership and flexibility
You can model exactly the journey flow you want — and segment it however you choose.
Q: What data structure do I need to generate a Sankey diagram?
To render a Sankey diagram, most libraries or BI tools expect pairs of sequential steps with associated flow counts.
To get this structure from Snowplow:
- Group events by user session
- Sort by event timestamp
- Create “step pairs” from the ordered event paths
- Aggregate by step pair
This can be done in dbt, SQL, or Spark depending on your Snowplow setup.
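To make the target shape concrete, here is a minimal pure-Python sketch of the four steps above. The sample rows, the tuple layout, and the function name `step_pair_counts` are all hypothetical; in practice the same logic would run in your warehouse rather than in memory.

```python
from collections import Counter, defaultdict

# Hypothetical sample rows: (session_id, timestamp, page_path).
events = [
    ("s1", 3, "/checkout"),
    ("s1", 1, "/home"),
    ("s1", 2, "/product"),
    ("s2", 1, "/home"),
    ("s2", 2, "/product"),
    ("s3", 1, "/home"),
]

def step_pair_counts(rows):
    """Return {(from_step, to_step): flow_count} across all sessions."""
    # 1. Group events by session.
    sessions = defaultdict(list)
    for session_id, ts, page in rows:
        sessions[session_id].append((ts, page))

    counts = Counter()
    for visits in sessions.values():
        # 2. Sort each session's events by timestamp.
        path = [page for _, page in sorted(visits)]
        # 3. Pair each step with the one that follows it.
        # 4. Aggregate by counting identical pairs.
        counts.update(zip(path, path[1:]))
    return counts

flows = step_pair_counts(events)
for (from_step, to_step), n in sorted(flows.items()):
    print(from_step, "->", to_step, n)
```

Note that a single-page session such as `s3` produces no pairs at all, so it does not appear in the output.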
Q: How do I model user journeys in dbt?
If you’re using the Snowplow dbt packages, you already have access to modeled sessions and page_views data.
A basic journey model might look like:
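-- Table and column names are illustrative; adjust them to your derived
-- page views table and its timestamp column.
WITH ordered_events AS (
    -- Number each page view within its session in chronological order
    SELECT
        domain_userid,
        domain_sessionid,
        page_urlpath,
        event_time,
        ROW_NUMBER() OVER (
            PARTITION BY domain_sessionid
            ORDER BY event_time
        ) AS step_order
    FROM snowplow.page_views
)
, step_pairs AS (
    -- Pair each page view with the next one in the same session.
    -- Match on session only: a join on domain_userid would silently
    -- drop rows where it is NULL.
    SELECT
        a.page_urlpath AS from_step,
        b.page_urlpath AS to_step
    FROM ordered_events a
    JOIN ordered_events b
        ON a.domain_sessionid = b.domain_sessionid
        AND a.step_order = b.step_order - 1
)
-- Count how many times each transition occurs
SELECT
    from_step,
    to_step,
    COUNT(*) AS flow_count
FROM step_pairs
GROUP BY from_step, to_step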
This can be materialized into a table and used directly in your BI tool or exported for visualization.
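Two things are worth knowing about plain page-to-page pairs: a session’s last page has no outgoing pair, so drop-offs are not counted, and paths can loop back on themselves (for example /home to /product and back), which some Sankey renderers handle poorly. One possible workaround, sketched below in Python, is to prefix each step with its position in the session and append an explicit exit step. The `(exit)` label, the `max_steps` cut-off, and the function name are assumptions for illustration, not part of any Snowplow package.

```python
from collections import Counter

EXIT = "(exit)"  # hypothetical label marking the end of a session

def positioned_pairs(path, max_steps=5):
    """Return (from_step, to_step) pairs with labels prefixed by step position."""
    steps = list(path[:max_steps])
    # Only mark an exit when the session really ended inside the window;
    # longer sessions are simply truncated.
    if len(path) <= max_steps:
        steps.append(EXIT)
    labelled = [f"{i}: {step}" for i, step in enumerate(steps, start=1)]
    return list(zip(labelled, labelled[1:]))

# Hypothetical ordered page paths, one list per session.
paths = [
    ["/home", "/product", "/home"],
    ["/home"],
]

flows = Counter()
for path in paths:
    flows.update(positioned_pairs(path))

for (from_step, to_step), n in sorted(flows.items()):
    print(from_step, "->", to_step, n)
```

Because every label carries its position, "1: /home" and "3: /home" become separate nodes and flows only ever move forward, at the cost of more nodes in the diagram.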
Q: What tools can I use to visualize a Sankey diagram?
Popular options include:
- Looker / Tableau / Power BI: Some support Sankey visualizations via custom extensions or plug-ins
- Observable / D3.js / Plotly: Full control via code for web-based dashboards
- Python (Plotly, HoloViews): Ideal for data scientists working in Jupyter or Streamlit apps
- SankeyMATIC: A browser-based tool (not a Python library) for quick one-off diagrams
Choose based on your team’s skill set and delivery needs — Snowplow gives you the flexibility to integrate with any visualization layer.
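Whichever tool you pick, code-driven renderers usually want the flows reshaped into a list of node labels plus links that refer to those nodes by index. Plotly’s Sankey trace, for example, takes a label list and parallel source, target, and value lists. The sketch below shows that reshaping with the standard library only; the function name and the sample `flows` dictionary are hypothetical.

```python
def to_sankey_indices(flows):
    """Convert {(from_step, to_step): count} into labels plus index-based links."""
    labels = []   # node labels, in first-seen order
    index = {}    # label -> position in `labels`

    def node_id(label):
        # Assign the next free index the first time a label is seen.
        if label not in index:
            index[label] = len(labels)
            labels.append(label)
        return index[label]

    source, target, value = [], [], []
    for (from_step, to_step), count in flows.items():
        source.append(node_id(from_step))
        target.append(node_id(to_step))
        value.append(count)
    return {"labels": labels, "source": source, "target": target, "value": value}

# Hypothetical aggregated step pairs.
flows = {("/home", "/product"): 2, ("/product", "/checkout"): 1}
sankey = to_sankey_indices(flows)
print(sankey)
```

With Plotly installed, these lists map onto `plotly.graph_objects.Sankey(node=dict(label=sankey["labels"]), link=dict(source=sankey["source"], target=sankey["target"], value=sankey["value"]))`.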
Q: Are there best practices for Sankey diagrams with Snowplow?
Absolutely. Here are some:
- Limit the number of unique steps: Group long-tail pages into “Other” or “Explore” to keep visuals readable
- Use meaningful labels: Consider labeling steps by category or funnel stage rather than raw URLs
- Segment the audience: Visualize journeys by device, campaign, or referrer for deeper insights
- Pre-aggregate the data: Don’t render at the session level — use aggregated pairs (as shown above) to optimize performance
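The first practice, grouping long-tail pages, can be applied directly to the aggregated pairs. The following Python sketch keeps the busiest steps and relabels everything else as "Other" before re-aggregating. The ranking rule (total flow touching a step), the `keep` parameter, and the sample numbers are assumptions chosen for illustration.

```python
from collections import Counter

def collapse_long_tail(flows, keep=10, other="Other"):
    """Relabel all but the `keep` busiest steps as `other` and re-aggregate."""
    # Rank steps by total traffic touching them (incoming plus outgoing).
    traffic = Counter()
    for (from_step, to_step), count in flows.items():
        traffic[from_step] += count
        traffic[to_step] += count
    top = {step for step, _ in traffic.most_common(keep)}

    collapsed = Counter()
    for (from_step, to_step), count in flows.items():
        a = from_step if from_step in top else other
        b = to_step if to_step in top else other
        collapsed[(a, b)] += count
    return collapsed

# Hypothetical aggregated step pairs with two low-traffic blog pages.
flows = {
    ("/home", "/product"): 50,
    ("/home", "/blog/a"): 3,
    ("/home", "/blog/b"): 2,
    ("/product", "/checkout"): 20,
}
collapsed = collapse_long_tail(flows, keep=3)
for (from_step, to_step), n in sorted(collapsed.items()):
    print(from_step, "->", to_step, n)
```

Here the two blog pages merge into a single "Other" node receiving 5 sessions from /home. The same idea works for the second practice: swap the top-N rule for a lookup that maps each URL to a category or funnel stage.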
Q: Does Snowplow support this out of the box?
While Snowplow doesn’t ship with built-in Sankey visualizations, it provides everything you need to build one, including:
- Complete event-level, time-ordered data
- Out-of-the-box session modeling
- Integration with dbt, BigQuery, Redshift, and Databricks
- First-party, privacy-compliant tracking from web, mobile, and server sources
Final Thoughts
Sankey diagrams are a powerful way to bring your Snowplow data to life. Whether you’re mapping product funnels, onboarding flows, or content journeys, Snowplow gives you the depth, structure, and ownership needed to visualize user behavior with confidence.
Your data, your structure, your visual — Snowplow puts you in control of your user journey analytics.