First and Last Touch Attribution Models in SQL: A Comprehensive Tutorial
In today’s data-driven marketing landscape, accurately attributing conversions to specific marketing channels is critical for optimizing marketing spend and strategy. First and last touch attribution models are foundational approaches to connecting revenue events to the marketing touchpoints that preceded them. This tutorial will guide you through implementing these models in SQL, using Snowplow data as the primary dataset.
We’ll focus on Redshift-compatible SQL, but the logic can be adapted for other databases with minor modifications.
Why Attribution Matters
Attribution models help digital marketers connect revenue events with marketing touches, providing insights into which channels or campaigns are driving conversions. First touch attribution assigns all credit to the initial marketing touch, while last touch attribution attributes it to the most recent touch before the conversion. Understanding both models equips data analysts with essential tools for performance analysis and optimization.
Data Preparation
1. Identifying Marketing Touches
To start, we need to identify all marketing touch events. We can generate a table with the relevant data as follows:
CREATE TABLE derived.marketing_touches AS (
SELECT
domain_userid,
derived_tstamp,
event_id,
mkt_medium,
mkt_source,
mkt_term,
mkt_content,
mkt_campaign,
refr_medium,
refr_source,
refr_term
FROM atomic.events
WHERE (refr_medium IS NULL OR refr_medium != 'internal')
ORDER BY 1, 2
);
2. Identifying Revenue Events
Next, we need to create a table to capture revenue events. For this example, we assume the revenue events are standard Snowplow transaction events:
CREATE TABLE derived.revenue_events AS (
SELECT
domain_userid,
derived_tstamp,
event_id,
tr_total
FROM atomic.events
WHERE event_name = 'transaction'
ORDER BY 1, 2
);
Implementing First Touch Attribution
In a first touch attribution model, all revenue is attributed to the first marketing touch event for each user. We identify the first marketing touch per user and connect it to their revenue events.
Step 1: Identifying First Touch
WITH first_touch_tstamps AS (
SELECT
domain_userid,
MIN(derived_tstamp) AS first_touch_tstamp
FROM derived.marketing_touches
GROUP BY domain_userid
)
CREATE TABLE derived.first_marketing_touch AS (
SELECT
m.*
FROM derived.marketing_touches m
JOIN first_touch_tstamps f ON m.domain_userid = f.domain_userid
AND m.derived_tstamp = f.first_touch_tstamp
);
Step 2: Connecting First Touch with Revenue Events
SELECT
f.*,
r.tr_total
FROM derived.first_marketing_touch f
RIGHT JOIN derived.revenue_events r ON f.domain_userid = r.domain_userid;
Implementing Last Touch Attribution
In a last touch attribution model, the revenue is credited to the most recent marketing touch before each revenue event.
Step 1: Combining Touches and Revenue Events
CREATE TABLE derived.marketing_touches_and_revenue_events AS (
SELECT
domain_userid,
derived_tstamp,
event_id AS marketing_event_id,
NULL AS revenue_event_id,
'marketing touch' AS event_type,
NULL AS revenue
FROM derived.marketing_touches
UNION
SELECT
domain_userid,
derived_tstamp,
NULL AS marketing_event_id,
event_id AS revenue_event_id,
'revenue event' AS event_type,
tr_total AS revenue
FROM derived.revenue_events
);
Step 2: Identifying Last Touch for Each Revenue Event
WITH last_touch_event_ids_calculated AS (
SELECT
domain_userid,
derived_tstamp,
LAST_VALUE(marketing_event_id IGNORE NULLS) OVER (
PARTITION BY domain_userid
ORDER BY derived_tstamp
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS last_marketing_event_id,
revenue_event_id,
event_type,
revenue
FROM derived.marketing_touches_and_revenue_events
)
SELECT
r.domain_userid,
r.derived_tstamp AS revenue_event_tstamp,
r.last_marketing_event_id,
r.revenue_event_id,
r.revenue,
m.mkt_medium,
m.mkt_source,
m.mkt_term,
m.mkt_content,
m.mkt_campaign,
m.refr_medium,
m.refr_source,
m.refr_term
FROM last_touch_event_ids_calculated r
LEFT JOIN derived.marketing_touches m ON r.last_marketing_event_id = m.event_id
WHERE r.event_type = 'revenue event';
Handling Edge Cases and Best Practices
- Direct Traffic Handling: Ensure that refr_medium includes NULL to capture direct traffic.
- Mobile Events: Handle cases where domain_userid is NULL for mobile events.
- Attribution Windows: Implement date-based filtering to limit attribution to a specific window (e.g., 7 days).
Example for 7-Day Attribution Window:
SELECT
*
FROM last_touch_event_ids_calculated
WHERE derived_tstamp BETWEEN last_marketing_event_id_tstamp AND last_marketing_event_id_tstamp + INTERVAL '7 days';
Conclusion
By implementing first and last touch attribution models, data teams can gain deeper insights into the marketing channels that drive conversions. This SQL tutorial demonstrates how to build both models using Snowplow data, enabling more effective analysis of marketing performance.
For advanced implementations, consider incorporating multi-touch attribution models, attribution windows, and identity stitching to further refine your analysis.
Stay tuned for more advanced SQL tutorials and attribution modeling strategies on the Snowplow blog.