Exploring funnel-based product analytics with Snowplow
This article is intended to serve as an introduction to funnel based product analytics that is based on event level behavioural data. Below are some of the key metrics used in funnel analysis and recipes for SQL queries that can be used to generate them.
This article is for anyone who is keen to explore ideas that will help to formalise their product analysis process. The process defined is applicable to any event-level behavioral data that captures a session identifier.
As this article is intended as an introduction to event level product analytics, some staple product analytics topics such as A/B testing or the consideration of web page UX have been omitted intentionally. For more information on topics that are not covered in this article please visit our product analytics ebook.
The benefits of Product Analytics
By implementing tracking on your product it is possible to see what interactions are taking place during individual user sessions on your product. Some interactions are desirable and some may be less desirable. The desirable interactions are often things that are linked intimately to business metrics; this link is most commonly expressed in the maxim “engagement drives retention”. By gathering and analysing data on these desired interactions and the interactions that surround them it’s possible to identify areas where product interactions are performing suboptimally and thereby arrive at insights that allow for suboptimal business metrics to be improved.
It should be noted that product analytics is an iterative process and that when action has been taken to change a product for the better it should be possible to observe a change in the desired user behaviour. For changes that are small, and drive only slight incremental improvements A/B testing may be required to determine whether or not the user behaviour has indeed seen a significant improvement.
The purpose of implementing automatic tracking
Automatic tracking is a class of tracking that is based on 4 predefined methods in the Snowplow Javascript Tracker, they are:
trackPageView - Which tracks page view events
enableActivityTracking - Which sets a configurable heartbeat (triggered by the page view event), this can be used to understand user scroll behaviour and time engaged.
enableLinkClickTracking - Which tracks link clicks for any links in an <a ref>
enableFormTracking - Which tracks html form interactions; 3 events are generated:
- Form focus
- Form changes
- Form submits
These are termed automatic tracking as they are so easy to set up - anyone with access to a GTM account can do it - see our guide here. To read more about the automatic tracking methods visit our docs site here.
We generally recommend our customers set up automatic tracking before doing anything else due to the richness of data this type of tracking can generate (especially for the product analytics use case). Gaps in the tracking data delivered by automatic tracking should be filled with custom defined events. The rest of this article considers product analytics that is enabled exclusively via automatic tracking.
The purpose of implementing anonymous tracking
Over time there has been an increase in privacy concerns from users and businesses alike when it comes to tracking user behavior. Snowplow has evolved to meet the needs of our customers by developing a Javascript tracker that allows for fully anonymous tracking. This generates behavioural data that simultaneously protects user privacy and can be used for product analytics. The analytics queries below do not rely on any concept of user identification - only session identifiers are required to run the queries.
Conversion analysis for product analytics
In order to measure conversion, you first needto define what a conversion looks like i.e. what is desirable in-session behaviour. In the case of an ecommerce retail organisation this could be a user purchasing a product, similarly a media organisation might be interested in driving users to subscribe or register a user account. A good starting point used by many organisations is to use the url captured in page_view events as a conversion indicator. A more sophisticated conversion indicator could be indicated by the presence of an order_id in a transaction context (in the case of an ecommerce purchase) or the presence of a specific form-submit event (in the case of a user completing an account sign up funnel).
It is possible to have multiple types of conversion defined and tracked concurrently. A conversion can also be inverted and could be indicative of an undesired in-session behaviour e.g. users navigating through a website’s product get-help section in order to have a costly call with a call centre agent.
Naturally different weights should be assigned to different types of conversion and these weights should influence project prioritization, for example a project that delivers 2% increase in conversion rate in a main order flow where the average order value is £40 should be prioritised over a project that delivers a 2% decrease in conversion rate in a get-help flow where the cost of a call with an agent is £5.
Calculating the conversion rates in your funnel
There are two types of conversion rates to consider:
Specific Conversion
Where the session starts with the intent to convert e.g. viewing the product listings on an ecommerce site or starting the user signup process and the session finishes their journey with an action that signifies conversion e.g. making a purchase or creating a user account, often with the specification that the conversion must occur after the user signals their intent. This conversion rate is calculated as:
General Conversion
Where session intent is not factored into the conversion, but instead the number of conversions is divided by the total number of sessions.
When discussing general conversion it is important to caveat that all sessions refers to all valid sessions - any session behaviour that is attributable to bots should be filtered out of the results.
The following query can be used to calculate both general and specific conversion rates:
SELECT COUNT(DISTINCT domain_sessionid) AS all_sessions
, COUNT(DISTINCT CASE WHEN page_urlpath LIKE '%{{.conversion_intent}}%' THEN domain_sessionid END) AS sessions_w_intent
, COUNT(DISTINCT CASE WHEN page_urlpath LIKE '%{{.conversion}}%' THEN domain_sessionid END) AS converting_sessions
, converting_sessions / (1.0 *sessions_w_intent) AS specific_conversion_rate
, converting_sessions / (1.0 * all_sessions) AS general_conversion_rate
FROM atomic.events
WHERE DATE(collector_tstamp) >= '2021-01-01' -- limit on partition, improve query performance
AND event_name = 'page_view'
;
Adding in a Group By clause to this query makes it possible to view conversion rates in a number of different ways - by day or by geography for example. This opens up conversion rate analysis to an analyst to consider questions such as:
- Are conversion rates affected by day of the week?
- How are conversion rates trending over time?
- What implication does a poor conversion rate in a specific geography have for my marketing strategy?
- If a session starts on a particular section of the product how does this impact their likelihood to convert?
From this we can see how even this short, simple query has a lot to offer in terms of insights.
A note on templates in queries
The contents of this article are intended to be as generic as possible so that the process described is applicable to all verticals. For this reason {{.templates}} are included in the queries for the user to populate. A table has been created below to demonstrate how an analyst working in different verticals might approach populating the templates to arrive at their first insights:
VerticalTemplateSuggested ValueEcommerce retail{{.conversion_intent}}Text urlpath string indicative of visit to a product listings page.Ecommerce retail{{.intermediate_step}}Text urlpath string indicative of visit to the page for a specific product.Ecommerce retail{{.conversion}}Text urlpath string indicative of order-completion.Media{{.conversion_intent}}Text urlpath string indicative of visit to the homepage. Media{{.intermediate_step}}Text urlpath string indicative of visit to the page for a specific article.Media{{.conversion}}Text urlpath string indicative of subscription sign-up.
Calculating the time to convert in funnel analytics
Another important consideration is the time it takes to convert as this time can indicate friction in the conversion journey. The following query is a template that can be used to calculate the time a user was actively engaged during their session in between signalling intent to convert and actually converting.
WITH time_boundaries AS( -- define the time boundaries in between intent and converting
SELECT domain_sessionid
, MIN(CASE WHEN page_urlpath LIKE '%{{.conversion_intent}}%' THEN derived_tstamp END) AS first_intent_signalled -- replace with template
, MIN(CASE WHEN page_urlpath LIKE '%{{.conversion}}%' THEN derived_tstamp END) AS first_conversion -- replace with template
FROM atomic.events
WHERE DATE(collector_tstamp) >= '2021-01-01' -- limit on partition, improve query performance
AND event_name = 'page_view'
GROUP BY 1
)
, session_ttc AS( -- for each session calculate the time to convert
SELECT ev.domain_sessionid
, COUNT(DISTINCT event_id) * {{.interval_seconds}} AS session_time_to_convert
FROM atomic.events AS ev
JOIN time_boundaries AS tb
ON ev.domain_sessionid = tb.domain_sessionid
AND ev.derived_tstamp > first_intent_signalled
AND ev.derived_tstamp < first_conversion
WHERE DATE(collector_tstamp) >= '2021-01-01' -- limit on partition
AND event_name = 'page_ping'
GROUP BY 1
)
-- calculate the average time to convert
SELECT AVG(session_time_to_convert) AS avg_time_to_convert
FROM session_ttc
;
The page_ping automatic event type is used in this query, this is a heartbeat event that fires at a specified interval of user engagement on a page. A user is considered to be engaged on a page if any of the following are true: the tab in focus, the mouse moves over the page, the user scrolls etc.
Alternatively for an organisation unwilling to take on the increased cost associated with the recording of heartbeat events a simple timestamp difference between the intent to convert event and the converting event can be taken.
Similar to the first conversion rate query this query can also be used as a starting point to derive practically unlimited insights.
Funnel Analysis - SQL query
Conversion rates give a useful overview of the main funnels defined in a product but they do not come close to telling the full story for each conversion. By drilling down into the next level of granularity it is possible to yield more insights and show where to prioritise focus in order to have the highest impact on the overall conversion rate. The diagram below shows the structure of a generic product funnel and some associated and important metrics. The funnel includes a required starting node, required end node and required intermediate node(s), edges between nodes indicate the directional transition from one node to a subsequent node.
For reporting purposes it is generally a good idea to run collate funnel data over a defined and reasonable time period, e.g. the past week, and to have separate queries that cover each of the core funnels in the product funnel.
The following query can be used to calculate the metrics shown in the funnel diagram above:
WITH funnel_prep AS(
SELECT domain_sessionid
, page_urlpath
, derived_tstamp
FROM atomic.events
WHERE event_name = 'page_view'
AND DATE(collector_tstamp) = '2021-01-07' -- timestamp argument, limit on partition key to improve query performance
)
, intent AS(
SELECT domain_sessionid
, MIN(derived_tstamp) AS intent_tstamp
FROM funnel_prep
WHERE page_urlpath LIKE '%{{.conversion_intent}}%'
GROUP BY 1
)
, step1 AS(
SELECT domain_sessionid
, MIN(derived_tstamp) AS step1_tstamp
FROM funnel_prep
WHERE page_urlpath LIKE '%{{.funnel_step1}}%'
GROUP BY 1
)
, conversion AS(
SELECT domain_sessionid
, MIN(derived_tstamp) AS conversion_tstamp
FROM funnel_prep
WHERE page_urlpath LIKE '%{{.conversion}}%'
GROUP BY 1
)
SELECT COUNT(DISTINCT i.domain_sessionid) AS sessions_entering_funnel -- W
, COUNT(DISTINCT s1.domain_sessionid) / (1.0 * COUNT(DISTINCT i.domain_sessionid)) AS proportion_intent_to_step1 -- X
, COUNT(DISTINCT c.domain_sessionid) / (1.0 * COUNT(DISTINCT s1.domain_sessionid)) AS proportion_step1_to_conversion -- Y
, COUNT(DISTINCT c.domain_sessionid) AS sessions_exiting_funnel -- Z
FROM intent AS i
LEFT JOIN step1 AS s1
ON i.domain_sessionid = s1.domain_sessionid
AND i.intent_tstamp < s1.step1_tstamp
LEFT JOIN conversion AS c
ON s1.domain_sessionid = c.domain_sessionid
AND s1.step1_tstamp < c.conversion_tstamp
;
Time spent at each stage of a customer journey
More depth can be added to the analysis above if page ping events are enabled. Page_pings can be queried to calculate the amount of time a user has spent engaged on each individual step in the funnel. All that is required to perform this analysis is for each funnel step to have a clear definition, in the example query below distinct values for the page_urlpath are used:
WITH funnel_prep AS(
SELECT domain_sessionid
, page_urlpath
, derived_tstamp
, event_id
FROM atomic.events
WHERE event_name = 'page_ping'
AND DATE(collector_tstamp) = '2021-01-07' -- timestamp argument, limit on partition key to improve query performance
)
, intent AS(
SELECT domain_sessionid
, MIN(derived_tstamp) AS intent_tstamp
, COUNT(DISTINCT event_id) * {{.interval_seconds}} AS time_engaged
FROM funnel_prep
WHERE page_urlpath LIKE '%{{.conversion_intent}}%'
GROUP BY 1
)
, step1 AS(
SELECT domain_sessionid
, MIN(derived_tstamp) AS step1_tstamp
, COUNT(DISTINCT event_id) * {{.interval_seconds}} AS time_engaged
FROM funnel_prep
WHERE page_urlpath LIKE '%{{.funnel_step1}}%'
GROUP BY 1
)
, conversion AS(
SELECT domain_sessionid
, MIN(derived_tstamp) AS conversion_tstamp
, COUNT(DISTINCT event_id) * {{.interval_seconds}} AS time_engaged
FROM funnel_prep
WHERE page_urlpath LIKE '%{{.conversion}}%'
GROUP BY 1
)
SELECT AVG(i.time_engaged) AS {{.conversion_intent}}_time_engaged_seconds
, AVG(s1.time_engaged) AS {{.funnel_step1}}_time_engaged_seconds
, AVG(c.time_engaged) AS {{.conversion}}_time_engaged_seconds
FROM intent AS i
LEFT JOIN step1 AS s1
ON i.domain_sessionid = s1.domain_sessionid
AND i.intent_tstamp < s1.step1_tstamp
LEFT JOIN conversion AS c
ON s1.domain_sessionid = c.domain_sessionid
AND s1.step1_tstamp < c.conversion_tstamp
;
Note: If heartbeat events are not enabled then it may be possible to adapt this query to calculate the time taken on a step by performing a DATE_DIFF calculation on the minimum and maximum timestamps for events in each step.
Similar to reporting on conversion rates this type of basic funnel analysis gives a good overview of the performance of the main and most clearly defined business funnels, and the outputs of these queries can be used to show where there is user friction either due to users taking a long time to complete a step or due to users dropping off a funnel.
Building out funnels in this way leads to the question: how can I analyze users dropping off a funnel? The answer to this question lies in multi-path funnel analysis.
Multi-Path funnel analysis
There are a variety of different ways to conduct multi-path funnel analysis. The first case examined answers the question: where do users go when they drop off a funnel?
Multi-Path Funnel without a defined terminal
This multi-path funnel has a defined starting point - but does not have a defined endpoint. Instead the query returns the following steps that a user lands on in a session after continuing their journey starting from the predefined start point. In the example given below up to 2 steps after the initial starting point are possible - meaning it is possible to see the number of sessions that follow any combination of 1, 2 or 3 steps from a predefined starting point. This session journey description is represented in the graph below:
The following query produces a table that conforms to the logic displayed in the diagram above:
WITH funnel_prep AS(
SELECT domain_sessionid
, page_urlpath
, derived_tstamp
FROM atomic.events
WHERE event_name = 'page_view'
AND DATE(collector_tstamp) >= '2021-01-07' -- timestamp argument, limit on partition key to improve query performance
)
, entry AS(
SELECT domain_sessionid
, page_urlpath
, MIN(derived_tstamp) AS entry_tstamp
FROM funnel_prep
WHERE page_urlpath LIKE '%{{.starting_url_placeholder}}%' -- define the starting point here
GROUP BY 1,2
)
, step_n AS(
SELECT domain_sessionid
, page_urlpath
, MIN(derived_tstamp) AS step_tstamp
FROM funnel_prep
WHERE domain_sessionid IN(SELECT domain_sessionid FROM entry)
AND page_urlpath NOT LIKE '%{{.starting_url_placeholder}}%'
GROUP BY 1,2
)
SELECT e.page_urlpath AS entry
, f.page_urlpath AS step1
, g.page_urlpath AS step2
, COUNT(DISTINCT e.domain_sessionid) AS sessions
FROM entry AS e
LEFT JOIN step_n AS f -- n = 1
ON e.domain_sessionid = f.domain_sessionid
AND e.entry_tstamp < f.step_tstamp
LEFT JOIN step_n AS g -- n = 2
ON e.domain_sessionid = g.domain_sessionid
AND f.step_tstamp < g.step_tstamp
GROUP BY 1,2,3
ORDER BY 4 DESC
;
An additional benefit of this query is the flexibility afforded by not having to define what product steps are allowed on each node of the graph – resulting in a table with all existing session journey combinations in the data, this makes the output table ideal for self serve analytics for less technical stakeholders.
Example output table:
(Taking the example of a fictional ecommerce company where the chosen starting point is a product-listings page)
EntryStep1Step2Sessionsproduct-listingsview-product_1checkout6067product-listingsview-product_2product-listings2904product-listingsNULLNULL2570product-listingsview-product_1NULL2528product-listingsview-product_3product-listing/sale1840product-listingsproduct-listing/saleNULL1441product-listingsview-product_3view-product_21352
From this table it is relatively simple for a business user to self serve to quickly establish the answers to questions such as:
- How many sessions ended immediately after visiting the product-listings page
- How many sessions end after viewing a single product
- How many sessions go to checkout after viewing product_1, product_2, product_x....
- How many sessions view one product before going back to the product listings page
- How many sessions return to the sales page after viewing a product
- How many sessions only contain x number of steps
Multi-path funnels with a defined terminal
An important variant of the multi-path funnel described above is the multi-path funnel that has both a starting point and a destination defined. Generally this destination is a conversion variant. This funnel type can be depicted with the following graph:
The following query produces a table that conforms to the logic displayed in the diagram above:
WITH funnel_prep AS(
SELECT domain_sessionid
, page_urlpath
, derived_tstamp
FROM atomic.events
WHERE event_name = 'page_view'
AND DATE(collector_tstamp) >= '2021-01-07' -- timestamp argument, limit on partition key to improve query performance
)
, entry AS(
SELECT domain_sessionid
, page_urlpath
, MIN(derived_tstamp) AS entry_tstamp
FROM funnel_prep
WHERE page_urlpath LIKE '%{{.starting_url_placeholder}}%' -- define the starting point here
GROUP BY 1,2
)
, step1 AS(
SELECT domain_sessionid
, page_urlpath
, MIN(derived_tstamp) AS step_tstamp
FROM funnel_prep
WHERE domain_sessionid IN(SELECT domain_sessionid FROM entry)
GROUP BY 1,2
)
, conversion AS(
SELECT domain_sessionid
, page_urlpath
, MIN(derived_tstamp) AS step_tstamp
FROM funnel_prep
WHERE domain_sessionid IN(SELECT domain_sessionid FROM entry)
AND page_urlpath LIKE '%{{.conversion}}%'
GROUP BY 1,2
)
SELECT e.page_urlpath AS entry
, f.page_urlpath AS step1
, g.page_urlpath AS step2
, COUNT(DISTINCT e.domain_sessionid) AS sessions
FROM entry AS e
LEFT JOIN step1 AS f
ON e.domain_sessionid = f.domain_sessionid
AND e.entry_tstamp < f.step_tstamp
JOIN conversion AS g
ON e.domain_sessionid = g.domain_sessionid
AND f.step_tstamp < g.step_tstamp
GROUP BY 1,2,3
ORDER BY 4 DESC
The output table of this query is very similar to the output table of the previous query, with the exception that all session journeys must end in a conversion. It may be useful to reflect on the self-serve analytics requirements of the business and consider if one table should be set up to contain funnel paths for sessions that end in conversions and one table should be set up to contain funnel paths for sessions that do not end in conversions. Additionally it is useful to consider and how many nodes should be included in these graphs.
Funnel node analysis
The funnel analytics above is designed to enable product analysts and other stakeholders to drill down into their funnels to find exactly which nodes in one or more funnels are causing user friction. Once these nodes have been identified as areas that require actioning a more nuanced approach may be needed.
It may be the case for example that the product analyst and product manager consider the causal problem associated with the node to be something that is difficult to measure with event tracking, and alternative approaches may be considered such as heatmapping or viewing the problem through the lens of a UX or CX perspective. However these approaches that do not use behavior event data are outside of the scope of this article.
Where event tracking can help with this sort of funnel node analysis is by examining what on a particular funnel node it is possible to track, consider the following examples:
- It may be possible to best understand the funnel node as a micro funnel in its own right, if the funnel node is a product page that contains 4 forms and the majority of users drop off after only filling in 3 out of the 4 forms then this is a valuable insight that should be considered carefully.
- Alternatively the implementation of additional custom tracking could help to understand why sessions fail to progress through a particular funnel. An example of this could be setting up a custom event to fire on form submission failure detailing the reason for failure.
At this stage of granularity there are practically an unlimited number of directions to go in - knowing which direction is the best one is open to the interpretation of the individual product analyst.
Funnel analytics in SQL with Snowplow
Hopefully the above provides a foundation for measuring product performance using behavioural data. The SQL queries written here can and should be customized according to the needs of your business. There is great value to unlock by editing these queries to compare (for example) the conversion rates when split by a sessions’ landing page, the session marketing channel, user geography, dates, etc.
The product analytics tooling market is a crowded one, and it can be difficult to decide what tool to purchase. Hopefully this article shows that with rich behavioral data and the right frameworks it is possible to go beyond the limits of packaged product analytics tools with powerful, granular insights without too much effort.
Footnotes:
- {{.}} format denotes a placeholder values, populate them with strings of text which are relevant to the placeholder descriptors (and can be found in your behavioural event dataset).
- Each SQL CTE represents a node in the directed funnel graphs above, to add more nodes to a directed graph simply add more CTEs in the same format as the existing code.
- The sub-queries in this article are joined together on common session identifiers. If a reliable user identifier is present in the initial data set then this could be substituted into the queries in place of the session identifier to return more user focussed metrics.