How to do marketing attribution with Snowplow
What is attribution modeling?
Attribution modeling is the process of assigning credit for conversions to marketing touchpoints. There are many ways to attribute conversions to marketing activities; the most common models are:
- First touch: this model gives all credit to the user’s first touch preceding a conversion
- Last touch: this model gives all credit to the user’s last touch preceding a conversion
- Last non-direct touch: this model gives all credit to the user’s last non-direct touch preceding a conversion
- Linear: this model gives equal credit to all user touches preceding a conversion
- Bathtub: this model gives the majority of the credit to the user’s first and last touch (typically 40% each) preceding a conversion, with the remaining credit (typically 20%) linearly split between all touches in between
- Time decay: this model gives credit to all of a user’s touches that precede a conversion, with that amount of credit decaying the longer ago the touch occurred
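To make the rules above concrete, here is a minimal sketch of how each model could turn an ordered list of touches into credit weights. The function name `attribution_weights`, its parameters and the seven-day half-life used for time decay are illustrative assumptions rather than part of any Snowplow model. Last non-direct touch is left out because it needs channel labels as well as positions.

```python
def attribution_weights(n_touches, model, days_before=None, half_life_days=7.0):
    """Return one credit weight per touch (oldest first); weights sum to 1.

    Hypothetical helper for illustration only.
    """
    n = n_touches
    if n < 1:
        raise ValueError("need at least one touch")
    if model == "first_touch":
        return [1.0] + [0.0] * (n - 1)
    if model == "last_touch":
        return [0.0] * (n - 1) + [1.0]
    if model == "linear":
        return [1.0 / n] * n
    if model == "bathtub":
        # 40% to the first and last touch, 20% shared by the touches in between.
        # With only one or two touches there is no middle, so split evenly.
        if n <= 2:
            return [1.0 / n] * n
        middle = 0.2 / (n - 2)
        return [0.4] + [middle] * (n - 2) + [0.4]
    if model == "time_decay":
        # Assumption: credit halves every `half_life_days` before the conversion.
        if days_before is None or len(days_before) != n:
            raise ValueError("time_decay needs one days_before value per touch")
        raw = [0.5 ** (d / half_life_days) for d in days_before]
        total = sum(raw)
        return [r / total for r in raw]
    raise ValueError(f"unknown model: {model}")


bathtub = attribution_weights(4, "bathtub")
decay = attribution_weights(3, "time_decay", days_before=[14, 7, 0])
```

Returning weights that always sum to 1 means each conversion contributes exactly one unit of credit, whichever model is chosen, so the models stay comparable side by side.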
The key to attribution modeling is capturing all marketing touchpoints and all conversions, and being able to assign them to a specific user. This allows you to look at the effectiveness of your marketing spend across platforms and channels, over time.
Why should you take control of your marketing attribution?
Marketing attribution models have received a fair bit of criticism over the years. Any model that aims to quantify the impact marketing activities have on customer behaviour will have its limitations, for example:
- What is the value of branding?
- What customer activity happened because of marketing activity, rather than regardless of it?
Owning your attribution modeling forces you to make assumptions explicitly and deliberately, a crucial step in understanding its limitations and using its outputs appropriately. Furthermore, being able to run multiple attribution models in parallel allows you to see the impact that different modeling logic has on the outputs of the model.
Getting started with attribution in SQL
This guide will assume you already have all of your marketing touches and all conversions captured. For more information on this, take a look at our Introduction to marketing attribution with Snowplow.
The approach is as follows:
- Collect all marketing touches into scratch.marketing_touches
- Collect all conversions into scratch.conversions
- Union scratch.marketing_touches and scratch.conversions to create scratch.touches_and_conversions
- Join scratch.touches_and_conversions to itself, then select only the rows where the interaction occurred before the conversion and save those into scratch.touches_by_conversions
- Rank the touches into scratch.touches_by_conversions_ranked, including the maximum rank for each conversion
- Compute the attribution scores based on the ranks and maximum rank
The final table should contain one row per conversion and channel combination, with a score for each attribution model.
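The guide describes these steps in SQL. As a rough illustration of the same logic, the sketch below walks through them in plain Python on a handful of made-up rows. The sample data, the `attribute` function and the tuple layouts are all hypothetical. For brevity it joins touches to conversions directly instead of unioning them first, which should select the same rows in this small case, and it scores only three of the models.

```python
from collections import defaultdict

# Hypothetical rows: (user_id, timestamp, channel)
marketing_touches = [
    ("u1", 1, "social"),
    ("u1", 3, "search"),
    ("u1", 5, "content"),
    ("u2", 2, "search"),
    ("u1", 9, "social"),  # happens after u1's conversion, so it earns no credit
]
# Hypothetical rows: (user_id, timestamp, conversion_id)
conversions = [
    ("u1", 6, "c1"),
    ("u2", 4, "c2"),
]


def attribute(touches, conversions):
    # Join on user and keep only touches that occurred before the conversion
    touches_by_conversion = defaultdict(list)
    for user, conv_ts, conv_id in conversions:
        for touch_user, touch_ts, channel in touches:
            if touch_user == user and touch_ts < conv_ts:
                touches_by_conversion[conv_id].append((touch_ts, channel))

    # Rank touches per conversion, then score each model from rank and max rank
    scores = defaultdict(
        lambda: {"first_touch": 0.0, "last_touch": 0.0, "linear": 0.0}
    )
    for conv_id, rows in touches_by_conversion.items():
        rows.sort()  # oldest touch first
        max_rank = len(rows)
        for rank, (_, channel) in enumerate(rows, start=1):
            row = scores[(conv_id, channel)]
            row["first_touch"] += 1.0 if rank == 1 else 0.0
            row["last_touch"] += 1.0 if rank == max_rank else 0.0
            row["linear"] += 1.0 / max_rank
    return dict(scores)


# One entry per (conversion, channel) pair, with a score for each model
scores = attribute(marketing_touches, conversions)
```

Keeping one row per conversion and channel, rather than aggregating straight away, leaves room to join on extra dimensions such as device type or campaign later.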
Additionally, you might want to consider the following:
- Adding acquisition costs
- Adding revenue
- Splitting out attribution by additional dimensions, such as device type or campaign information
- Considering different types of conversions, or modeling intent-to-convert
Owning your attribution model allows you to iterate on your approach over time, refining your analysis as you learn more about your users and your marketing activities. However, it also means you’ll need to find the best way for other teams to access these insights. For this purpose, you’ll want to visualise the models in your BI tool of choice.
Building an attribution dashboard
As discussed, the output of your attribution model will be a table of one row per conversion and channel combination, with a score for each attribution model as well as any other relevant dimensions you might want to add.
First, you’ll need to sum the data by channel. You’ll want to do this in the BI tool, so you can easily add filters to split attribution by dimensions such as device type or campaign. Next, consider how to present the data so that your marketing teams have everything they need to make informed decisions. For example, you’ll definitely want visualizations that show the relative performance of channels, but you might also want to show clearly how each model weights the channels differently.
Below is an example of what an attribution dashboard in Looker could look like. Specifically, it shows a random sample of data from our website, with all marketing activities grouped into the following four channels for simplicity: social, content, search and other.
Advanced approaches to attribution
Rule-based attribution can sometimes oversimplify the complex interaction of channels that ultimately leads to conversions. Nowadays, there are data-driven solutions, such as Markov chain attribution (analysing how the removal of a given touchpoint from the customer journey affects the likelihood of conversion) or the Shapley value (calculating the average value of each channel’s marginal contribution given all possible channel combinations). The latter is borrowed from game theory and is the basis of how advanced attribution works in GA360. In this guide, we will show an example of how the Shapley value can be calculated from Snowplow data to get a better understanding of how the different marketing channels are performing. For detailed instructions on this approach check out James Kinley’s guide!
The Shapley value
The aim of the Shapley value is to understand the impact of each player’s contribution to a team by observing how the team reaches goals. In our case, the channels are the players, the different possible combinations of the channels across a user journey (called coalitions) are the team, and the conversions are the goals. We will use the same data set as we used for the SQL-based attribution above, consisting of four channels: social, content, search and other. These channels create fifteen possible coalitions (every non-empty combination of four channels: 2^4 - 1 = 15). These coalitions are the primary units of decision-making for a user and represent cooperative behaviour: channels working together can create more conversions. The idea is that this approach allows us to estimate the impact of each channel on the final conversion. The Shapley value therefore provides deeper insight into channel performance by dividing credit fairly according to measured contribution, rather than according to rules.
Steps
Calculating the conversion ratio: the first step in this process is to find the ratios of how often each channel (and coalition of channels) occurs in the data set. In our example, we had 1852 conversions from 1260 users. All of these conversions were made from four channels or coalitions of these channels. The table below shows all possible channel coalitions and their conversion ratios:
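As a rough illustration of this step, the sketch below counts how often each exact set of channels appears across conversions and divides by the total. It assumes that a conversion ratio is the share of conversions whose journey involved exactly that set of channels. The `journeys` list is made-up sample data, not the data set described above, and `conversion_ratios` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical input: the set of channels touched before each conversion
journeys = [
    {"search"},
    {"search", "social"},
    {"content"},
    {"search"},
]


def conversion_ratios(journeys):
    """Share of conversions attributed to each exact coalition of channels."""
    counts = Counter(frozenset(journey) for journey in journeys)
    total = sum(counts.values())
    return {coalition: n / total for coalition, n in counts.items()}


ratios = conversion_ratios(journeys)
```

Using `frozenset` keys means the order in which channels were touched is ignored, which matches the idea of a coalition as an unordered group of channels.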
Computing the binary square matrix: this matrix shows coalition memberships. For example, the coalition “content and other” includes members: content, other, and content+other resulting in the coefficients [1,1,0,0,1,0,0,0,0,0]:
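One way to build such a membership matrix is sketched below. It assumes coalitions are ordered by size and then alphabetically (S1 content, S2 other, S3 search, S4 social, S5 content+other, and so on up to S15). That ordering is consistent with the labels used in this guide but is otherwise an assumption. With four channels, each row has fifteen entries, one per coalition.

```python
from itertools import combinations

channels = ["content", "other", "search", "social"]

# All non-empty coalitions, ordered by size and then alphabetically (S1..S15)
coalitions = [
    frozenset(combo)
    for size in range(1, len(channels) + 1)
    for combo in combinations(channels, size)
]

# Row i, column j is 1 when coalition j is contained in coalition i
membership = [
    [1 if sub <= coalition else 0 for sub in coalitions]
    for coalition in coalitions
]

# Row for "content and other": ones for content, other and content+other only
content_other_row = membership[coalitions.index(frozenset({"content", "other"}))]
```

The last row, for the coalition of all four channels, contains only ones because every other coalition is a subset of it.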
Computing the coalition (channel grouping) worth: this is determined by the product of the coalition memberships and the coalition ratios. In our example, we calculate this worth by summing the conversion ratios of every channel and sub-coalition contained in a coalition. For example, to calculate the value for the coalition S7 [content + social], we would do the following:
Coalition S7 = Content+Social
v(S7) = content (S1) + social (S4) + content+social (S7)
v(S7) = 0.03 + 0.12 + 0.02
v(S7) = 0.17
The worth is calculated by the characteristic function v(S), which assigns a value to each coalition to signify its worth. A coalition’s worth represents the payoff that it can generate when its channels occur together. The coalition containing all channels in a given solution is called the “grand coalition”, and its worth should be equal to the total payoff. In our case this is v(S15): content+other+search+social.
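The S7 calculation above can be reproduced with a few lines of Python. This is a minimal sketch: `coalition_worth` is a hypothetical helper, and the `ratios` dictionary holds only the three conversion ratios quoted in the worked example, not the full table.

```python
def coalition_worth(coalition, ratios):
    """v(S): sum the ratios of every coalition contained in `coalition`."""
    return sum(
        ratio for members, ratio in ratios.items() if members <= coalition
    )


# Only the three ratios quoted in the S7 example above
ratios = {
    frozenset({"content"}): 0.03,             # S1
    frozenset({"social"}): 0.12,              # S4
    frozenset({"content", "social"}): 0.02,   # S7
}

worth_s7 = coalition_worth(frozenset({"content", "social"}), ratios)
```

Here `worth_s7` comes out at roughly 0.17, matching the hand calculation (up to floating-point rounding). The subset test `members <= coalition` plays the same role as a row of the binary membership matrix.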
Calculating the Shapley value: the Shapley value represents the average value of each channel's marginal contribution to the grand coalition, taking into account all possible orderings. It is calculated by distributing the worth of the grand coalition (total payoff) between our four channels based on the worth of all the possible coalitions:
From this we can see that search has the highest marginal contribution to the grand coalition, followed by social, content, then other. This shows the individual strength of each channel, taking into account all possible orderings of a user journey, and so gives a measured rather than rule-based view of which channels are most responsible for conversions.
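For readers who want to experiment, the sketch below computes Shapley values with the standard formula: each channel’s marginal contribution v(S ∪ {i}) - v(S) is averaged over all coalitions S that exclude it, weighted by |S|!(n - |S| - 1)!/n!. The `ratios` here are invented for illustration and are not the figures from our data set. `shapley_values` and `worth` are hypothetical names.

```python
from itertools import combinations
from math import factorial

channels = ["content", "other", "search", "social"]

# Made-up conversion ratios, for illustration only
ratios = {
    frozenset({"search"}): 0.40,
    frozenset({"social"}): 0.20,
    frozenset({"content"}): 0.10,
    frozenset({"other"}): 0.10,
    frozenset({"search", "social"}): 0.20,
}


def worth(coalition):
    """v(S): sum of the ratios of every coalition contained in S."""
    return sum(r for members, r in ratios.items() if members <= coalition)


def shapley_values(channels, worth):
    n = len(channels)
    values = {}
    for channel in channels:
        others = [c for c in channels if c != channel]
        total = 0.0
        for size in range(n):
            # Weight for coalitions of this size that exclude `channel`
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                total += weight * (worth(s | {channel}) - worth(s))
        values[channel] = total
    return values


shapley = shapley_values(channels, worth)
```

A useful sanity check is that the values sum to the worth of the grand coalition, so all of the credit is distributed. The loop visits every subset of the other channels, which is fine for four channels but grows exponentially as channels are added.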
Take control of your attribution modeling with Snowplow
This article has reviewed some of the ways you can go beyond the standard last non-direct click approach to understand the value your marketing spend is driving for your business. It has shown that effective marketing attribution depends on having high-quality, granular data on how your users are engaging with you across platforms and channels, over time. If you are interested in learning more about how Snowplow can help you deliver on marketing attribution, get in touch with us today!