Blog

First and Last Touch Attribution Models in SQL: A Comprehensive Tutorial

By
&
November 13, 2024
Share this post

In today’s data-driven marketing landscape, accurately attributing conversions to specific marketing channels is critical for optimizing marketing spend and strategy. First and last touch attribution models are foundational approaches to connecting revenue events to the marketing touchpoints that preceded them. This tutorial will guide you through implementing these models in SQL, using Snowplow data as the primary dataset.

We’ll focus on Redshift-compatible SQL, but the logic can be adapted for other databases with minor modifications.

Why Attribution Matters

Attribution models help digital marketers connect revenue events with marketing touches, providing insights into which channels or campaigns are driving conversions. First touch attribution assigns all credit to the initial marketing touch, while last touch attribution attributes it to the most recent touch before the conversion. Understanding both models equips data analysts with essential tools for performance analysis and optimization.

Data Preparation

1. Identifying Marketing Touches

To start, we need to identify all marketing touch events. We can generate a table with the relevant data as follows:

CREATE TABLE derived.marketing_touches AS (
  SELECT
    domain_userid,
    derived_tstamp,
    event_id,
    mkt_medium,
    mkt_source,
    mkt_term,
    mkt_content,
    mkt_campaign,
    refr_medium,
    refr_source,
    refr_term
  FROM atomic.events
  WHERE (refr_medium IS NULL OR refr_medium != 'internal')
  ORDER BY 1, 2
);

2. Identifying Revenue Events

Next, we need to create a table to capture revenue events. For this example, we assume the revenue events are standard Snowplow transaction events:

CREATE TABLE derived.revenue_events AS (
  SELECT
    domain_userid,
    derived_tstamp,
    event_id,
    tr_total
  FROM atomic.events
  WHERE event_name = 'transaction'
  ORDER BY 1, 2
);

Implementing First Touch Attribution

In a first touch attribution model, all revenue is attributed to the first marketing touch event for each user. We identify the first marketing touch per user and connect it to their revenue events.

Step 1: Identifying First Touch

WITH first_touch_tstamps AS (
  SELECT
    domain_userid,
    MIN(derived_tstamp) AS first_touch_tstamp
  FROM derived.marketing_touches
  GROUP BY domain_userid
)
CREATE TABLE derived.first_marketing_touch AS (
  SELECT
    m.*
  FROM derived.marketing_touches m
  JOIN first_touch_tstamps f ON m.domain_userid = f.domain_userid
  AND m.derived_tstamp = f.first_touch_tstamp
);

Step 2: Connecting First Touch with Revenue Events

SELECT
  f.*,
  r.tr_total
FROM derived.first_marketing_touch f
RIGHT JOIN derived.revenue_events r ON f.domain_userid = r.domain_userid;

Implementing Last Touch Attribution

In a last touch attribution model, the revenue is credited to the most recent marketing touch before each revenue event.

Step 1: Combining Touches and Revenue Events

CREATE TABLE derived.marketing_touches_and_revenue_events AS (
  SELECT
    domain_userid,
    derived_tstamp,
    event_id AS marketing_event_id,
    NULL AS revenue_event_id,
    'marketing touch' AS event_type,
    NULL AS revenue
  FROM derived.marketing_touches
  UNION
  SELECT
    domain_userid,
    derived_tstamp,
    NULL AS marketing_event_id,
    event_id AS revenue_event_id,
    'revenue event' AS event_type,
    tr_total AS revenue
  FROM derived.revenue_events
);

Step 2: Identifying Last Touch for Each Revenue Event

WITH last_touch_event_ids_calculated AS (
  SELECT
    domain_userid,
    derived_tstamp,
    LAST_VALUE(marketing_event_id IGNORE NULLS) OVER (
      PARTITION BY domain_userid
      ORDER BY derived_tstamp
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS last_marketing_event_id,
    revenue_event_id,
    event_type,
    revenue
  FROM derived.marketing_touches_and_revenue_events
)
SELECT
  r.domain_userid,
  r.derived_tstamp AS revenue_event_tstamp,
  r.last_marketing_event_id,
  r.revenue_event_id,
  r.revenue,
  m.mkt_medium,
  m.mkt_source,
  m.mkt_term,
  m.mkt_content,
  m.mkt_campaign,
  m.refr_medium,
  m.refr_source,
  m.refr_term
FROM last_touch_event_ids_calculated r
LEFT JOIN derived.marketing_touches m ON r.last_marketing_event_id = m.event_id
WHERE r.event_type = 'revenue event';

Handling Edge Cases and Best Practices

  1. Direct Traffic Handling: Ensure that refr_medium includes NULL to capture direct traffic.

  2. Mobile Events: Handle cases where domain_userid is NULL for mobile events.

  3. Attribution Windows: Implement date-based filtering to limit attribution to a specific window (e.g., 7 days).

Example for 7-Day Attribution Window:

SELECT
  *
FROM last_touch_event_ids_calculated
WHERE derived_tstamp BETWEEN last_marketing_event_id_tstamp AND last_marketing_event_id_tstamp + INTERVAL '7 days';

Conclusion

By implementing first and last touch attribution models, data teams can gain deeper insights into the marketing channels that drive conversions. This SQL tutorial demonstrates how to build both models using Snowplow data, enabling more effective analysis of marketing performance.

For advanced implementations, consider incorporating multi-touch attribution models, attribution windows, and identity stitching to further refine your analysis.

Stay tuned for more advanced SQL tutorials and attribution modeling strategies on the Snowplow blog.

Subscribe to our newsletter

Get the latest content to your inbox monthly.

Get Started

Whether you’re modernizing your customer data infrastructure or building AI-powered applications, Snowplow helps eliminate engineering complexity so you can focus on delivering smarter customer experiences.