Identity Stitching in Snowplow: A Q&A for Data Engineers
Identity stitching is the process of taking the common identifiers scattered across a user's behavior (cookies, device IDs, email addresses, phone numbers, CRM IDs) and tying them back to a single person. When done correctly, it allows businesses to create a unified customer profile out of events that would otherwise look like dozens of different users.
We first put this Q&A together back in 2023. Back then, the conversation was mainly about warehouse-level stitching, SQL models, nightly rebuilds, and a mapping table you built yourself.
In 2026, that’s the fallback, not the default. Browsers have eroded cookies, and users bounce between anonymous and logged-in states constantly. Use cases now include real-time personalization, mid-session activation, and AI agents that need to know who they’re talking to right now, none of which can wait for a nightly job.
So we’ve refreshed this post around Snowplow Identities, our graph-based, real-time identity resolution layer that launched in April 2026. The core questions data engineers ask about identity stitching haven’t changed much, but the answers certainly have.
What is identity stitching in Snowplow?
Identity stitching is the process of linking behavioral events to unique users across sessions, devices, browsers, and domains. By doing so, you can resolve a user’s scattered identifiers down to a single canonical ID, backed by an identity graph, so every downstream system sees a unified customer profile instead of a fragmented one.
In Snowplow, this happens inside Snowplow Identities. It runs on your event stream, in your own cloud, and writes a stable snowplow_id onto every enriched event before it lands in the warehouse. You still own the identity graph and the customer data. You just don't have to build the resolution logic yourself.
Why is identity stitching important?
Without it, your behavioral events scatter across cookies, devices, and platforms, with nothing tying them back to the same user. Consequently, you won’t be able to:
- Track a full customer journey from first anonymous visit to conversion
- Attribute conversions back to the right touchpoints
- Build a unified customer profile for analytics, personalization, or modelling
- Feed AI agents the full customer context they need to reason well
Anonymous-to-known is the moment where broken identity hurts most. Let’s say a user builds up two weeks of behavior before creating an account. If your identity resolution doesn’t stitch that anonymous history to the new account in real time, everything that anonymous user taught you is effectively lost the moment they log in. As a result, the campaign that converted them goes uncredited, the next email reverts to generic, and any AI agent reasoning about that customer starts from scratch.
How do I implement identity stitching with Snowplow data?
With Snowplow Identities, you turn it on as part of your pipeline. The heavy lifting (resolving identifiers, maintaining the graph, handling merges) is done for you. What you're responsible for is feeding it good identifiers.
Step 1: Track as many identifiers as possible on each event
Snowplow trackers collect a wide range of identifiers automatically. The useful ones for stitching:
- network_userid: set server-side by the collector
- user_ipaddress: the user's IP address
- user_id: your own identifier (e.g. an authenticated email or CRM ID)
- domain_userid, domain_sessionid: first-party cookie IDs, JavaScript tracker
- apple_idfa, apple_idfv, android_idfa: mobile tracker identifiers
- client_session.user_id, client_session.session_id: mobile session context
- Custom contexts for additional identifiers like email_address, loyalty_id, a hashed phone number, or any business-specific ID
One note for 2026. domain_userid is a client-side cookie, and Safari now caps those at seven days. Pair it with network_userid and an authenticated user_id wherever you can.
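To make "feeding it good identifiers" concrete, here is a minimal Python sketch that pulls the non-empty identifiers off a single event so you can check what each event actually carries. The field names come from the list above; the event is modelled as a plain dictionary, and the `IDENTIFIER_FIELDS` tuple and `extract_identifiers` helper are hypothetical names for illustration, not part of any Snowplow library.

```python
from typing import Any

# Atomic fields named in the list above. Which ones you check is a choice
# for illustration; extend the tuple to match your own tracking plan.
IDENTIFIER_FIELDS = (
    "user_id",
    "network_userid",
    "domain_userid",
    "apple_idfv",
    "android_idfa",
)


def extract_identifiers(event: dict[str, Any]) -> list[tuple[str, str]]:
    """Return (identifier_type, value) pairs for every non-empty identifier."""
    pairs = []
    for field in IDENTIFIER_FIELDS:
        value = event.get(field)
        if value:  # skip missing fields, None, and empty strings
            pairs.append((field, str(value)))
    return pairs


# Hypothetical web event: a cookie ID, a collector-set ID, and a logged-in user.
event = {
    "event_id": "e1",
    "domain_userid": "duid-123",
    "network_userid": "nuid-456",
    "user_id": "crm-42",
    "apple_idfv": None,
}

for id_type, value in extract_identifiers(event):
    print(id_type, value)
```

An event that yields only one pair (typically just domain_userid) gives a resolver nothing to link, so a check like this is a quick way to spot where stronger identifiers are missing.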
Step 2: Let Snowplow Identities resolve them in real time
Snowplow Identities stitches identifiers together the moment two of them appear together on an event. Log in, sign up, link an account, and the anonymous history is merged to the known identity before the event is loaded. Every event that follows carries the resolved snowplow_id.
With Identities, you don’t have to worry about nightly mapping jobs, LEFT JOINs in the warehouse, or reverse ETL to push profiles back out to activation tools.
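For intuition about what "stitching the moment two identifiers appear together" means, the following is a toy Python sketch of deterministic stitching using a union-find structure. It is not how Snowplow Identities is implemented. The `IdentityGraph` class, its method names, and the rule that the earliest-seen identifier becomes the resolved ID are all assumptions made for illustration.

```python
class IdentityGraph:
    """Toy deterministic identity graph backed by union-find (illustrative only)."""

    def __init__(self) -> None:
        self._parent: dict[str, str] = {}
        self._first_seen: dict[str, int] = {}

    def _find(self, node: str) -> str:
        # Register unseen identifiers as their own root, remembering arrival order.
        if node not in self._parent:
            self._parent[node] = node
            self._first_seen[node] = len(self._first_seen)
        # Walk up to the root, halving the path as we go.
        while self._parent[node] != node:
            self._parent[node] = self._parent[self._parent[node]]
            node = self._parent[node]
        return node

    def observe(self, identifiers: list[str]) -> str:
        """Link all identifiers seen together on one event; return the resolved ID."""
        if not identifiers:
            raise ValueError("an event needs at least one identifier")
        roots = {self._find(i) for i in identifiers}
        # Assumed rule: the earliest-seen root survives, so the resolved ID
        # does not change when a later identifier is merged in.
        canonical = min(roots, key=self._first_seen.__getitem__)
        for root in roots:
            if root != canonical:
                self._parent[root] = canonical
        return canonical

    def resolve(self, identifier: str) -> str:
        return self._find(identifier)


graph = IdentityGraph()
anonymous = graph.observe(["domain_userid:abc"])                  # anonymous visit
logged_in = graph.observe(["domain_userid:abc", "user_id:crm-42"])  # login event
print(anonymous == logged_in)
```

The login event is the only place the cookie and the account ID appear together, and that single co-occurrence is enough to attach the earlier anonymous history to the known user.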
Can I stitch across platforms like mobile and web?
Yes. Identities resolves across web, mobile, and cross-domain as long as you're tracking the right identifiers on each platform: domain_userid on web, apple_idfv / android_idfa on mobile, and a shared user_id or hashed email wherever a user authenticates.
The graph walk that used to live in recursive CTEs in your warehouse runs in the pipeline now. You don't write it, and it doesn't scale linearly with your event volume.
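As a rough picture of the graph walk mentioned above, here is a small Python sketch that finds every identifier reachable from a starting one with a breadth-first search. The identifier strings, the `build_adjacency` and `connected_identifiers` helpers, and the in-memory adjacency map are assumptions for illustration; they say nothing about how the pipeline stores or traverses its graph.

```python
from collections import defaultdict, deque


def build_adjacency(events: list[list[str]]) -> dict[str, set[str]]:
    """Connect identifiers that appear together on the same event."""
    adjacency: dict[str, set[str]] = defaultdict(set)
    for identifiers in events:
        for current in identifiers:
            adjacency[current].update(i for i in identifiers if i != current)
    return adjacency


def connected_identifiers(adjacency: dict[str, set[str]], start: str) -> set[str]:
    """Breadth-first walk returning every identifier linked to `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


# Hypothetical events: a web login, a mobile login by the same user,
# and an unrelated anonymous web visitor.
events = [
    ["domain_userid:web-1", "user_id:crm-42"],
    ["apple_idfv:ios-9", "user_id:crm-42"],
    ["domain_userid:web-2"],
]
adjacency = build_adjacency(events)
print(sorted(connected_identifiers(adjacency, "domain_userid:web-1")))
```

The web cookie and the iOS vendor ID never appear on the same event; they end up in the same profile only because both were seen alongside the shared user_id.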
What about shared devices?
Shared devices (family tablets, public computers, kiosks) are where deterministic stitching can misattribute events. Identities gives you a few levers:
- Mark identifiers as "unique" so they can never be merged with another (for example, a login ID you know should only ever belong to one person)
- Configure identifier priority so more reliable IDs win merges
- Exclude known shared-device identifiers from the resolution logic entirely
Everything is deterministic, so you're not auditing probabilistic guesses. When a merge happens, the event that triggered it is recorded.
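Two of these levers, unique identifiers and exclusions, can be sketched as simple guards in Python. This is an illustration of the idea only: `UNIQUE_TYPES`, `EXCLUDED`, `linkable`, and `can_merge` are hypothetical names, and the real product's configuration format is not shown here. Identifier priority is left out of the sketch.

```python
Identifier = tuple[str, str]  # (identifier_type, value)

# Hypothetical settings: a profile may hold only one user_id, and one
# known kiosk cookie is never allowed to link anything.
UNIQUE_TYPES = {"user_id"}
EXCLUDED = {("domain_userid", "kiosk-cookie-1")}


def linkable(identifiers: list[Identifier]) -> list[Identifier]:
    """Drop excluded identifiers before they can connect two profiles."""
    return [i for i in identifiers if i not in EXCLUDED]


def can_merge(profile_a: set[Identifier], profile_b: set[Identifier]) -> bool:
    """Refuse a merge that would put two values of a unique type in one profile."""
    combined = profile_a | profile_b
    for id_type in UNIQUE_TYPES:
        values = {value for kind, value in combined if kind == id_type}
        if len(values) > 1:
            return False
    return True


# Two people logging in on the same family tablet share a cookie.
alice = {("user_id", "alice"), ("domain_userid", "tablet-1")}
bob = {("user_id", "bob"), ("domain_userid", "tablet-1")}
print(can_merge(alice, bob))  # False: two different user_id values
```

Without the unique flag, the shared tablet cookie would pull both accounts into one profile; with it, the cookie simply stops being evidence that they are the same person.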
Can I include third-party marketing identifiers like GCLID or CID?
Yes. Capture them via custom contexts or URL parameters and attach them to the event at landing. Identities will associate them with the resolved snowplow_id from that point on, so every downstream system sees the click ID tied to the right user.
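As one way to capture a click ID from URL parameters, the Python sketch below reads it from a landing-page URL using the standard library. The `CLICK_ID_PARAMS` tuple and `extract_click_ids` helper are hypothetical; `gclid` is the parameter discussed above, and `fbclid` is included only as an example of extending the list. How you attach the result to the event (for example, in a custom context) depends on your own schemas.

```python
from urllib.parse import parse_qs, urlsplit

# Query parameters to look for. Extend this tuple to suit your ad platforms.
CLICK_ID_PARAMS = ("gclid", "fbclid")


def extract_click_ids(page_url: str) -> dict[str, str]:
    """Return the click IDs present in a landing-page URL's query string."""
    query = parse_qs(urlsplit(page_url).query)  # blank values are dropped by default
    return {param: query[param][0] for param in CLICK_ID_PARAMS if param in query}


# Hypothetical landing-page URL.
url = "https://example.com/landing?utm_source=google&gclid=abc123"
print(extract_click_ids(url))
```

Capturing the value at landing matters because the parameter usually disappears from the URL as soon as the user navigates to a second page.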
What's changed since cookies started disappearing?
Short answer: the reliability of anonymous identifiers has dropped, and the latency requirements on stitching have risen.
Longer answer:
- Third-party cookies are effectively dead on Safari and Firefox. network_userid as a third-party cookie isn't safe to rely on. First-party is where you need to be
- First-party cookies are under pressure too. Safari's ITP caps client-side-set cookies at seven days. Server-set first-party cookies last longer
- Mobile identifiers are patchier. 54% of mobile impressions lack identifiers, according to BDEX
- Data teams are being asked to support use cases (personalization, agentic context, mid-session activation) that can't wait for tomorrow's warehouse job
That last point is the real shift. Stitching used to be a reporting problem. In 2026 it's an activation and AI problem.
Does identity stitching need to be real-time now?
For a growing set of use cases, yes. A few tests:
- Does your business personalize the next page view, content served, email, or ad based on earlier behavior? Requirement: Real-time
- Do you have AI agents reasoning about a customer mid-session? Requirement: Real-time
- Are you pushing unified customer profiles into activation tools like Braze while the user is still on the site? Requirement: Real-time
- Is identity stitching purely fueling next-day dashboards and quarterly LTV reports? Requirement: Batch is fine
A lot of teams end up needing both, which is fine. But if any real-time use case is on your roadmap, warehouse-only stitching will always be a step behind.
What do I actually get with Snowplow Identities?
In April 2026 we launched Snowplow Identities to solve identity resolution at the pipeline layer. It's graph-based, fully deterministic, and runs on the Snowplow event stream in your own cloud.
What it gives you:
- A stable snowplow_id on every enriched event. It follows a user from their first anonymous interaction
- Real-time deterministic merges. The moment a user logs in, signs up, or links accounts, their anonymous history is merged into the known identity. No orphaned pre-authentication behavior
- Events already stitched when they land. Your warehouse receives resolved identities. No nightly mapping job. No reverse ETL to push profiles back out
- Auditable merges. Every merge is triggered by a specific event linking two identifiers, and the merge event is emitted so you can trace it
- Edge cases handled natively. Shared devices, cross-domain tracking, configurable identifier priority, unique ID flags to prevent incorrect merges
- Your infrastructure, your graph. Identity data sits in a managed Postgres alongside your pipeline, VPC-hosted, GDPR deletion supported
The same snowplow_id feeds your warehouse, Snowplow Signals, Event Forwarding to activation tools, and any AI agent that needs identity-aware context. Identity is resolved once, at the source.
Can I still stitch identities in the warehouse?
Yes, if that's what fits. Batch stitching with the Snowplow dbt packs is still supported and still a reasonable choice for a few scenarios:
- Pipelines with no real-time use case on the roadmap
- Teams already deep into a dbt-based warehouse model who don't want another moving part
- Small event volumes where pipeline-level resolution isn't worth the setup
The packs ship with identity mapping models (snowplow_id_mapping, snowplow_id_mapping_scd, snowplow_id_changes) that do the same job in SQL that Identities does in the pipeline, just on yesterday's data.
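To show the general shape of batch stitching, here is a Python sketch of the "build a mapping table, then join it back" pattern. It does not reproduce the models named above. The `build_id_mapping` and `stitch` helpers, the `stitched_user_id` column name, and the rule that the most recent user_id wins for each domain_userid are assumptions for illustration.

```python
def build_id_mapping(events: list[dict]) -> dict[str, str]:
    """Map each domain_userid to the most recent user_id seen with it."""
    mapping: dict[str, str] = {}
    # ISO 8601 timestamps in one timezone sort correctly as strings.
    for event in sorted(events, key=lambda e: e["collector_tstamp"]):
        if event.get("user_id"):
            mapping[event["domain_userid"]] = event["user_id"]
    return mapping


def stitch(events: list[dict], mapping: dict[str, str]) -> list[dict]:
    """Add a stitched ID to every event, falling back to the cookie ID."""
    return [
        {**e, "stitched_user_id": mapping.get(e["domain_userid"], e["domain_userid"])}
        for e in events
    ]


# Hypothetical events: two anonymous page views, then a login on the same cookie.
events = [
    {"collector_tstamp": "2026-01-01T10:00:00Z", "domain_userid": "d1", "user_id": None},
    {"collector_tstamp": "2026-01-01T10:05:00Z", "domain_userid": "d2", "user_id": None},
    {"collector_tstamp": "2026-01-02T09:00:00Z", "domain_userid": "d1", "user_id": "crm-42"},
]
stitched = stitch(events, build_id_mapping(events))
print([e["stitched_user_id"] for e in stitched])
```

The anonymous event on cookie d1 only receives its user ID after the mapping has been rebuilt, which is the latency that separates batch stitching from resolution in the pipeline.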
For everything else (real-time personalization, agentic use cases, cross-domain, anonymous-to-known merges that matter in-session), Identities is the default path going forward.
Are there tools that help with identity stitching in Snowplow?
Yes. A few that matter:
- Snowplow Identities: the real-time graph-based resolution layer. Default for almost every modern use case
- Snowplow dbt packs: warehouse-level mapping models for batch workflows
- Snowplow Enrich / Loader: shape and validate identifier payloads as they move through the pipeline
- Snowplow Signals: the real-time feature store that reads snowplow_id straight from the pipeline and serves customer context to activation and AI workloads
Final thoughts
Identity stitching used to be a reporting problem solved in the warehouse with SQL. In 2026, it's a pipeline problem solved in real time, feeding analytics, personalization, and AI agents in the same motion.
The principle that still holds: collect the raw behavioral data cleanly and keep it decoupled from your stitching logic. Your identity graph will evolve. New IDs get added, edge cases emerge, business rules change. You want to change how stitching works without losing the raw customer data underneath. That's still true. The difference in 2026 is that Snowplow Identities handles the resolution for you, while you keep the ownership.
If you'd like to see what pipeline-level identity stitching looks like in practice, you can read the Snowplow Identities announcement blog, explore the Identities product page, or see how we think about identity stitching vs. the competition.
And if you'd like to talk through your own setup, get in touch.