Introducing Agent Self-Tracking - A New Approach to Measuring First-Party Agent Experiences
Understanding the three layers of tracking needed to truly understand how your agents are serving your customers
Companies are deploying first-party agents at an incredible rate - so much so that they’re completely transforming how users interact with brands. But almost nobody has figured out how to measure whether those customer-facing agents are actually improving the customer experience.
In part one of my blog series, I explored the taxonomy of agents - back-office, third-party, and first-party agents - and the challenges they bring from both a customer experience and data analytics perspective. Businesses have been experimenting with internal AIs and agents extensively. First-party agents, on the other hand, have been less common, though this is starting to change.
Companies like Bank of America, Indeed, Kayak, Booking.com, Home Depot, Amazon.com, Jo Malone, Zendesk, and many more are building agents to go directly in front of end users. These agents perform a huge array of tasks, from applying for jobs and booking flights to answering support tickets and summarizing user reviews. Fundamentally, they all aim to give end users a more natural way to complete tasks.
And it’s not just chat-based agents. With the rise of generative UIs - interfaces that are dynamically composed by an AI based on what the user needs - we may soon be in a world where the majority of digital experiences are agentic. So instead of users seeing a static page template served from a CMS, they’ll see a bespoke interface assembled on the fly by an agent that’s trying to best serve that specific user at that specific moment.
From where we’re standing, this is the direction digital experiences are heading. And we therefore need to get serious about understanding how effectively our agents are serving our customers. That starts with understanding how well your agent understands each customer in the first place. How do we do this? The answer lies in agent analytics.
The tools for building and deploying agents out in the wild (LangChain, Vercel AI SDK, Google’s Agent Development Kit (ADK), Langfuse, etc.) have enabled the rapid development of agents. But the tooling for understanding the impact of first-party, customer-facing agents on the overall customer experience (CX) - agentic analytics - is severely lacking.
Agents break the traditional digital analytics paradigm
With traditional web or digital analytics, everything is completely deterministic. If a user lands on a web page, fills out a lead form and clicks/taps a button that says “Book Now”, the website sends a request to a backend system that adds a booking to a database, and the frontend navigates the user to a booking confirmation page. Bar something going wrong technically (which is obviously a big caveat in a lot of cases), this should happen every time: if two users enter the page and provide the same inputs, the system as a whole should handle them in a completely deterministic way.
Agentic systems break this paradigm. AI agents are inherently non-deterministic. An agent can receive identical inputs multiple times in a row and behave completely differently each time. They are built this way. It’s not a bug, it’s a feature.
This brings new challenges for digital measurement, such as tracking the various steps the agent takes on the fly, as well as how the end user behaves in this new kind of experience.
Agentic interactions are hard to represent as structured data
Traditional analytics is built around the concept of well-defined events mapping to well-defined user actions. A page view, a button click, an add-to-cart - these all translate naturally into neat rows of structured data. Agentic interactions, especially conversations, don't map anywhere near as cleanly.
A single agentic session might involve a free-form conversation spanning multiple topics. The agent might reason through several possible approaches, make dynamic tool calls with varying inputs and outputs, and render a generative UI with completely different components depending on context. Trying to represent all of this as structured data that's queryable and analyzable is a fundamentally harder problem than tracking a user clicking through a series of predefined pages.
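To make the contrast concrete, here’s a sketch - with entirely hypothetical event shapes - of a traditional page-view event next to what a single agent turn might need to capture:

```typescript
// A traditional analytics event: flat, predictable, easy to query.
const pageView = {
  event: 'page_view',
  url: '/products/trail-shoes',
  timestamp: '2025-01-15T10:32:00Z',
};

// One turn of an agentic session: nested, variable in shape, high-cardinality.
const agentTurn = {
  event: 'agent_turn',
  conversationId: 'c_123',
  userMessage: 'Find me trail shoes under $120 that ship this week',
  detectedIntent: { topic: 'product_search', confidence: 'high' },
  toolCalls: [
    { name: 'searchProducts', input: { category: 'trail-shoes', maxPrice: 120 } },
    { name: 'checkShippingWindow', input: { withinDays: 7 } },
  ],
  renderedComponents: ['ProductGrid', 'ShippingBadge', 'FilterChips'],
  timestamp: '2025-01-15T10:32:04Z',
};
```

The second event isn’t just bigger - its shape changes from turn to turn, which is exactly what row-oriented, schema-per-event analytics tools struggle with.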
The volume and dimensionality of events can overwhelm traditional tools
Let’s consider what happens when an agent dynamically creates a new web page for a user, composed of different React components assembled based on the conversation context. How would you track that in a traditional analytics tool? trackBrandNewPageCreatedNeverSeenBeforeWith200Properties?
Traditional analytics platforms simply aren't designed for the volume of events, the cardinality of event types, or the depth of properties that agentic applications can generate.
Each agent interaction can produce orders of magnitude more events than a traditional page-based interaction, and each event can carry significantly more context. This is a data infrastructure challenge as much as it is an analytics challenge.
Traditional funnels no longer apply
There is another shift that agentic experiences bring. They change how we understand user intent when a user hits our website or app. We have traditionally categorized pages in our digital estate by user intent - typically along a purchasing funnel.
- Blog page - top-of-funnel, discovery phase
- Search pages - research phase
- Product pages - mid-funnel, consideration phase
- Checkout pages - bottom-of-funnel. Ready to purchase, do not lose them, cross-sell and upsell if you can
However, this journey is potentially blown apart by an agentic chat app, or an entirely generative UI that dynamically changes based on what the user asks. Traditional funnels don’t apply when the agent can go off in a whole host of different directions, depending on what the user wants, how they respond, their conversation history, the agent’s capabilities, the underlying AI model, etc.
Interestingly, this is at odds with the fact that in a lot of cases, when users are dealing with AIs, they are literally telling us what their intent is. But our current tooling is not geared up to make the most of this unique shift in user behavior.
We are relying on the intelligent agent to interpret the user’s intent and, based on that interpretation, take the best next step for that user given all the relevant context. As digital analysts, we want to know why the agent made the decision it made. If we know this, we can start to parse back together our conceptual “funnels” as they map onto the overall customer journey, and start to understand how beneficial our agentic applications are to our end users and their CX.
Traditional digital analytics tools track “what” users do. With agentic applications, we have the opportunity to understand the “why” and truly optimize for user intent.
The three layers of agentic tracking
To fully understand the behavior and impact of an agentic application, we need to utilize a layer of tracking that’s not been available before.
Conceptually, there are three places within an agentic application that events can be generated:
- The user’s client - The browser, app, or device interacting with the agentic app. Events are triggered by the end user.
- The application’s server - The servers hosting the agentic app, which call APIs and databases to serve the user. Events are triggered on the server.
- From within the agent - The agent itself, which orchestrates messages back and forth with the user and coordinates tool calls to assist them. Events are triggered from within the agent.
The first two are well understood in the field of digital data collection. The most sophisticated tracking setups generally utilize both client-side and server-side event tracking to ensure the most complete picture of user interactions.
The third, however, is completely new. Never before have we had such a capable and autonomous system in our digital ecosystem - one we can tap into to gather granular data on how users are interacting with our platforms.
To be slightly more concrete, here’s a breakdown of how events from the three layers work:

| Layer | Events triggered by | Example events |
| --- | --- | --- |
| Client-side | The end user, in their browser, app, or device | Page views, button clicks, messages sent to the agent |
| Server-side | The application’s servers | API requests, database reads and writes, responses returned to the client |
| Agent-side | The agent itself, via self-tracking tools | User intent detected, tool call decision, constraint encountered |
The third layer - agent-side tracking - is about capturing the agent's reasoning. What did the agent understand about the user's intent? Why did it decide to call a particular tool? What constraints did it encounter? This is the layer that bridges the gap between knowing what happened and understanding why it happened.
This framework also helps address the “blank page problem” outlined in the previous post, whereby the agent is blind to what a user was doing before they started interacting with it. Client-side events give the agent that context, and agent-side tracking helps us verify whether the agent is interpreting that context correctly.
Introducing Agent Self-Tracking
Implementing agent-side self-tracking in practice requires working with the non-deterministic nature of an LLM, rather than around or against it. We’re proposing a pattern we call Agent Self-Tracking - making the agent itself responsible for instrumenting its own behavior, using the same tool-calling mechanisms it already uses to perform business tasks.
The non-deterministic nature of LLMs means this is a non-trivial problem to solve, as we have to enforce some level of predictability and determinism in order to reliably collect data on the agent’s behavior. We have a few levers we can pull to achieve this:
- The system prompt - most agents follow the instructions set in their system prompt quite closely, if the system prompt is well constructed
- Tools - by providing agents with tools to complete tasks that connect to other more traditional systems (like APIs, CLIs, and databases), we can have some level of predictability
- Structured inputs and outputs - most LLM providers and harnesses allow for defining the data format that an LLM must provide to a tool, and a structured definition for the output generated by an LLM (e.g. a JSON schema, or Structured Outputs)
We can take advantage of these levers in order to achieve reliable and predictable agent self-tracking.
Firstly, we surface self-tracking functions as tools the agent can call (e.g. userIntentDetected, agentDecisionLogged, etc.), right alongside business tools (e.g. getObjectFromSalesforce, searchFlights, uploadFileToGoogleDrive, etc.).
These tools have input schemas that the agent must adhere to in order to call them. The structured input object the agent produces is passed to the self-tracking tool, which in turn hands it to a tracking function that sends an event to the event collection platform (such as Snowplow).
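As a minimal sketch - assuming the Vercel AI SDK v5 tool() shape, a hypothetical trackSelfDescribingEvent wrapper around your tracker SDK, and an illustrative schema URI - a self-tracking tool might look like this:

```typescript
import { tool } from 'ai';
import { z } from 'zod';
// Hypothetical wrapper around your event collection SDK (e.g. a Snowplow tracker)
import { trackSelfDescribingEvent } from './tracking';

// A self-tracking tool, defined with the same tool() interface as business tools
export const userIntentDetected = tool({
  description:
    'Record the intent you detected in the latest user message, along with ' +
    'your confidence and reasoning. Call this immediately after receiving ' +
    'a message from the user.',
  inputSchema: z.object({
    intent: z.string().describe('The detected user intent, as a short phrase'),
    confidence: z.enum(['low', 'medium', 'high']),
    reasoning: z.string().describe('Why you interpreted the message this way'),
  }),
  execute: async (input) => {
    // Forward the structured input to the event collection platform
    trackSelfDescribingEvent({
      schema: 'iglu:com.example/user_intent_detected/jsonschema/1-0-0',
      data: input,
    });
    return { logged: true };
  },
});
```

The input schema is the determinism lever in action: the agent can phrase its reasoning however it likes, but the event that lands in your pipeline always has the same shape.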
The agent is instructed to use the self-tracking tools at various points during the agentic session (such as immediately after receiving a message from the user, or immediately before taking a specific action such as calling a business tool or closing off a conversation).
Technically, the two kinds of tools are identical (interfaces with inline descriptions, structured input, and a function to be executed - as per the tool() interface from the Vercel AI SDK). Conceptually, however, they are very different, and we can use the system prompt to differentiate between them.
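Putting the pieces together, the agent loop might look like the following sketch (again assuming AI SDK v5; searchFlights is a hypothetical business tool, and the prompt wording is illustrative):

```typescript
import { streamText, stepCountIs } from 'ai';
import { openai } from '@ai-sdk/openai';
import { searchFlights } from './tools/business';      // hypothetical business tool
import { userIntentDetected } from './tools/tracking'; // self-tracking tool from above

const result = streamText({
  model: openai('gpt-4o'),
  // The system prompt is what differentiates the two kinds of tools
  system: `You are a travel booking assistant.

You have two kinds of tools:
- Business tools (e.g. searchFlights) complete tasks for the user.
- Self-tracking tools (e.g. userIntentDetected) record your own reasoning.

Immediately after every user message, call userIntentDetected before doing
anything else. Self-tracking never replaces a business tool call; always
continue with the user's task afterwards.`,
  messages: [{ role: 'user', content: 'Find me a flight to Lisbon next Friday' }],
  tools: { searchFlights, userIntentDetected },
  // Allow multiple steps so the agent can self-track, call tools, then respond
  stopWhen: stepCountIs(8),
});
// Stream result.textStream (or a UI message stream) back to the client as usual
```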
The elegance of this approach is that it works with the agent's natural tool-calling behavior rather than trying to bolt tracking on from the outside. The alternative - relying solely on embedding tracking within business tools so it fires whenever a tool is called - would capture usage, but miss the most valuable data: why the agent chose that tool, how it interpreted the user’s intent, and what constraints it considered before acting.
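For contrast, that embedded alternative might look like this sketch (queryFlightApi and the schema URI are hypothetical): the tracking fires reliably on every call, but the event can only ever describe the tool’s inputs, not the reasoning that led there.

```typescript
import { tool } from 'ai';
import { z } from 'zod';
import { trackSelfDescribingEvent } from './tracking'; // hypothetical, as above
import { queryFlightApi } from './flights';            // hypothetical API client

// Tracking embedded inside a business tool: captures usage, not reasoning
export const searchFlights = tool({
  description: 'Search for flights between two airports.',
  inputSchema: z.object({
    origin: z.string().describe('IATA code, e.g. LHR'),
    destination: z.string().describe('IATA code, e.g. LIS'),
    departureDate: z.string().describe('ISO 8601 date'),
  }),
  execute: async (input) => {
    // We know the tool was called, and with what inputs - but not why
    trackSelfDescribingEvent({
      schema: 'iglu:com.example/business_tool_called/jsonschema/1-0-0',
      data: { tool: 'searchFlights', ...input },
    });
    return queryFlightApi(input);
  },
});
```

In practice the two can coexist: embedded tracking guarantees coverage of tool usage, while self-tracking tools capture the interpretation and decisions around it.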
What Agent Analytics can unlock
With the three layers of agentic app tracking implemented, we now not only get insights into how our users are utilizing our agentic apps, but also how our agents are behaving in front of our users and why. The three layers combined allow analyses such as the following:
- Is the agent understanding the intent of users correctly?
- Is the agent calling the right tools at the right time? Do long-running tool calls impact conversion rates?
- When the agent gives advice to a user, does it impact the rest of their conversion journey?
- Do longer user<->agent conversations lead to higher or lower conversion rates and average order values?
- Do users actually follow agent recommendations (product recommendations, for instance)?
- Are there frequent scenarios that the agent can’t satisfy (constraint violations)? How do these constraint violations impact the rest of the user journey?
- Has the underlying model provider rolled out a change that has impacted the agent’s behavior?
This is even more important in Multi-Agent Systems (MAS), where multiple agents operate within a customer-facing app. To improve efficiency and protect the customer experience, you need to understand both individual agent behavior and how those agents work together.
At Snowplow, we’re dedicating a lot of our time and focus to helping businesses build first-party, customer-facing agents, as we believe these will be the future of how end users interact with brands and their digital estates.
To that end, we’ve designed and launched a set of event and entity schemas on Iglu Central, as well as a new accelerator featuring a fully implemented agentic app with all three layers of tracking already installed, built on Next.js using the Vercel AI SDK.
You can check out the repo and explore the fully implemented app, or walk through the different stages of implementing agentic tracking into an agentic application.
Appendix
Agent observability, while a well-served area of tooling, doesn’t cover CX use cases
You may be thinking “The agentic platform I use does provide insight into model reasoning, input and output tokens, how quickly the agent responded, what tools it called, etc. This isn’t an area we’re under-served in”. And you’d be correct. However, I would argue that this is a separate concern from the one I outlined above.
This is what I would class as Agent Observability. In the same way that Datadog and New Relic give SREs and DevOps staff visibility into the health, uptime, latency, reliability, and performance of their infrastructure, Agent Observability systems are aimed at AI Platform Engineers, providing insights into how the agent is running.
- How fast is my agent responding?
- How many tokens is my agent consuming?
- Does my agent ever error-out?
- How long does my agent take reasoning vs actually responding or taking actions?
All of these are very important questions that Agent Observability tools help monitor and answer.
However, they are not within the scope of what I would call Agent Analytics. This is the same differentiation I would make between web analytics and website monitoring or observability.
While we of course want to know that our agents are always up and working, and that we aren’t bankrupting our company with inference costs, we also want to know things like:
- What is the CSAT score of users who interact with our agent compared to those who don’t?
- How often does the agent actually solve our users' problems?
- Are users who interact with the agent more likely to convert than those who don’t?
If your company is investing in developing first-party, customer-facing agents, then you should be serious about understanding whether those agent experiences are improving the customer experience as a whole, not just whether the agent has four nines of availability (as important as that is).
Agent Observability tells you if your agent is working. Agent Analytics tells you if your agent is working for your customers.