How to Detect Bots and AI Agent Traffic on Your Website

By Nick Stanchenko & Adam Roche
February 20, 2026

In the first post of our agentic browsing series, we explained what an agentic browser is and why it matters. The second explored the hidden risks agentic browsing poses to your analytics stack. Now we’re going to get practical and answer the question: how do you actually detect bots and agentic browsers? 

What follows is drawn from our recent session on AI agent and bot behavioral intelligence featuring Snowplow Co-Founder & CEO Alex Dean and Senior Product Manager Nick Stanchenko. Between them, they laid out a detection framework built on what Snowplow sees across more than 2 million websites and over a trillion events per month. 

By the end of this post, you’ll understand why detection beats blocking, three ranked methods for identifying agent traffic, and how to build a composite bot score that separates your data into actionable segments. If you’re trying to figure out how to detect bots in your analytics, separate real user behavior from bot activity, and actually do something useful with that knowledge, this is the framework to follow.

Why Detection Beats Blocking

To date, the default industry response to bots has been defensive: companies deploy web application firewalls (WAFs), edge protection, and outright bot blocking. For malicious bots, these tactics make sense. But when it comes to AI agents, blocking creates a blind spot. You can’t understand or optimize for traffic you’ve shut out entirely.

Why is this an issue? McKinsey projects that 20–50% of traditional search traffic is at risk of displacement by AI-powered search, and that this traffic is expected to carry $750 billion in consumer spend by 2028. As we’ve touched on before, HUMAN Security has documented a four-digit percentage increase in agentic traffic over the past year alone. So it’s clear: blocking this growing traffic means blocking commercial intent.

To address this, Alex Dean made the case for playing offense: proactively detecting agents and analyzing what they’re doing, in real time, while they’re still on site. Rather than treating every non-human visitor as a security risk, you start treating AI agents as first-class visitors whose behavior is worth modeling and responding to. A traditional bot detection solution focuses on blocking threats and minimizing the negative impact on user experience. That’s necessary but insufficient. This shift from defense to offense is the foundation everything else in this post builds on.


Three Detection Methods, Ranked by Effectiveness

During the webinar, Nick Stanchenko framed detection as a spectrum. The methods below apply to bots and agents that show up in your browser-based analytics, those that execute JavaScript and generate client-side events. AI agents that fetch pages without running any scripts are a different problem, which we’ll cover later in this post.

The three bot detection techniques below are ranked by how hard they are for a bot to fake, and by how difficult they are for you to implement. Think of it as a trade-off between simplicity and reliability; each method catches a different tier of bot.

Method 1: Identity Signals: Who They Appear to Be

This is the simplest check. If a user agent string says GPTBot, it’s probably GPTBot. If automation flags are turned on in the browser, or the screen size is anomalous, or the browser fingerprint has irregularities, those are signs you’re looking at a bot rather than a human in a traditional browser.

New standards are emerging here too. For instance, RFC 9421 HTTP Message Signatures allow agents to cryptographically prove their identity, and Cloudflare’s web bot authentication standard uses this protocol. Adoption is still early, but these standards will help verify that a bot claiming to be ChatGPT actually is ChatGPT and not a malicious crawler posing as one.

However, the limitation with this method is obvious: this only catches bots that self-identify or are sloppy. If a bot or agent says it’s Chrome, it doesn’t necessarily mean it’s Chrome. Identity signals are the easiest layer to fake.
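As a rough illustration, an identity-signal check can be a simple lookup against known bot tokens plus obvious automation flags. The token list and function names below are illustrative assumptions, not an exhaustive registry:

```python
# Minimal identity-signal check: match known bot tokens in the
# User-Agent string and flag obvious automation hints.
# KNOWN_BOT_TOKENS is a small illustrative sample, not a full registry.
KNOWN_BOT_TOKENS = ["gptbot", "chatgpt-user", "perplexitybot", "claudebot"]

def identity_signal(user_agent: str, webdriver_flag: bool = False) -> float:
    """Return a 0-1 score for how bot-like the identity signals look."""
    ua = user_agent.lower()
    if any(token in ua for token in KNOWN_BOT_TOKENS):
        return 1.0  # self-identified bot
    if webdriver_flag:
        return 0.9  # browser reports navigator.webdriver = true
    return 0.0  # nothing conclusive; the UA could still be spoofed

print(identity_signal("Mozilla/5.0 (compatible; GPTBot/1.2)"))  # 1.0
```

Note the asymmetry: a hit is strong evidence of a bot, but a miss proves nothing, which is exactly the limitation described below.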

Method 2: Network-Origin Signals: Where They Come From

The next layer examines IP addresses and the autonomous systems they belong to. If traffic originates from an internet service provider, it’s probably a human. If it’s coming from a cloud provider like AWS or DigitalOcean, it’s probably a bot. This touches on what Nick calls the sweet spot for reliability versus complexity, meaning it’s harder to spoof than a user agent string, but simpler to implement than full behavioral analysis.

The caveat here is that VPN users create false positives. A real user browsing through a cloud-hosted VPN will look like a bot at the network-origin level. This is why no single method is sufficient on its own.
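A network-origin check can be sketched as an ASN lookup. The ASN sets below are tiny illustrative samples (you would use a full IP-to-ASN database in practice), and the scores are hypothetical:

```python
# Network-origin signal sketch: classify traffic by the autonomous
# system (ASN) its IP belongs to. The sets below are illustrative
# samples only; real deployments use a full IP-to-ASN database.
CLOUD_ASNS = {16509, 14618, 14061}   # e.g. AWS, DigitalOcean
RESIDENTIAL_ASNS = {7922, 3320}      # e.g. consumer ISPs

def network_signal(asn: int) -> float:
    """Return a 0-1 score: cloud-hosted IPs look bot-like."""
    if asn in CLOUD_ASNS:
        return 0.8  # likely a bot, but could be a cloud-hosted VPN user
    if asn in RESIDENTIAL_ASNS:
        return 0.1  # likely a human on a consumer ISP
    return 0.5  # unknown network: no evidence either way

print(network_signal(16509))  # 0.8
```

The 0.8 (rather than 1.0) for cloud ASNs reflects exactly the VPN false-positive caveat above: network origin is evidence, not proof.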

Method 3: Behavioral Signals: How They Act

This brings us to our final method, which is the hardest layer for bots and agents to fake. Whether it’s a scraper, a search agent, or an AI assistant acting on behalf of a user, the underlying constraint is the same: efficiency. Bots simply cannot afford to behave like humans and wander aimlessly around pages. They navigate directly, process content quickly, and move on. 

The behavioral signals to watch for include the presence or absence of mouse movements or screen taps, scrolling patterns, and click or tap intervals. We know humans exhibit irregular timing and indirect paths. But when you look at bot and agent behavior, they tend to browse pages in rapid succession with regular intervals. 

HUMAN Security's threat intelligence team found that ChatGPT's agent, for example, always moves its mouse in increments of 0.25 pixels, with smooth, linear traces that are visually distinct from the messy, unpredictable paths humans produce. Combined with device fingerprinting, these behavioral signals create a layered picture that’s extremely difficult to spoof. 

The more sophisticated an agent’s navigation, and the more it tries to mimic human behavior while handling multi-step workflows across web pages, the more computational overhead it incurs. Efficiency and stealth are at odds, which makes behavioral signals the most reliable detection method.
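One way to quantify timing regularity is the coefficient of variation (stdev / mean) of the gaps between events: near-constant intervals are machine-like, messy intervals are human-like. The thresholds and scores below are illustrative assumptions, not Snowplow’s actual model:

```python
# Behavioral-signal sketch: humans produce irregular inter-event
# timing; bots tend toward near-constant intervals. We score the
# coefficient of variation (stdev / mean) of gaps between events.
# Thresholds (0.1, 0.5) and scores are illustrative only.
from statistics import mean, stdev

def behavioral_signal(event_timestamps: list[float]) -> float:
    """Return a 0-1 bot-likeness score from event-timing regularity."""
    gaps = [b - a for a, b in zip(event_timestamps, event_timestamps[1:])]
    if len(gaps) < 2:
        return 0.5  # not enough data to judge
    cv = stdev(gaps) / mean(gaps)  # low variation => machine-like
    return 1.0 if cv < 0.1 else (0.5 if cv < 0.5 else 0.1)

# A bot clicking every 200ms vs. a human with messy timing:
print(behavioral_signal([0.0, 0.2, 0.4, 0.6, 0.8]))  # 1.0 (regular)
print(behavioral_signal([0.0, 1.3, 1.9, 4.2, 4.5]))  # 0.1 (irregular)
```

Real systems would add mouse-trace geometry, scroll patterns, and device fingerprinting on top of timing, but the efficiency constraint shows up even in this one-dimensional view.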


Combining Signals Into a Composite Bot Score

Nick stresses that you cannot look at your analytics data and separate human users from automated bots with 100% accuracy. It’s a best-effort, diminishing-returns exercise. Instead, he suggests establishing a composite bot score: a probability that combines all three signal types into a single metric. This is the core of effective bot detection. The output isn’t a binary yes/no but a confidence score.

By this, he means weighting signals by how reliable they are. 

Crucially, the weights need to be tunable per business. For example, marketplaces typically see short-session price crawlers that hit a single listing and disappear. Media companies, on the other hand, see longer-running bots that traverse content libraries. As agentic browsing scales, a one-size-fits-all threshold doesn’t work.
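A weighted average is the simplest way to express this. The weights below (behavioral > network > identity, matching the reliability ranking above) and the function name are illustrative assumptions; the whole point is that each business tunes them:

```python
# Composite-score sketch: weight the three signal types by
# reliability and combine into one probability-like score.
# DEFAULT_WEIGHTS are illustrative and meant to be tuned per business.
DEFAULT_WEIGHTS = {"identity": 0.2, "network": 0.3, "behavioral": 0.5}

def composite_bot_score(signals: dict[str, float],
                        weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Combine per-signal 0-1 scores into a single 0-1 bot score.
    Missing signals default to 0.5 (no evidence either way)."""
    total = sum(weights.values())
    return sum(weights[k] * signals.get(k, 0.5) for k in weights) / total

# A session with a spoofed UA but cloud origin and machine-like timing:
session = {"identity": 0.0, "network": 0.8, "behavioral": 1.0}
print(round(composite_bot_score(session), 2))  # 0.74
```

A marketplace fighting price crawlers might raise the network weight and lower its flagging threshold; a media company tracking long-running content bots might lean harder on behavioral signals.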

Once you have a bot score, the downstream value is significant. You can segment analytics by threshold to isolate human KPIs like bounce rate, click-through rate, and conversion rate. You can filter agent traffic out of the sensitive data feeding your ML models. These models were designed to learn from human behavior, not from agents browsing every page in rapid succession. 

As we explored in the second post in this series, tangled data doesn’t just skew dashboards, it degrades every decision built on top of it.

The Invisible AI Problem: Why Your Analytics Can’t See Most AI Agent Traffic

Everything we’ve discussed so far assumes the AI agent is running in a browser or uses a browser-like automation framework that executes JavaScript and therefore shows up in your analytics. Most don’t. The majority of AI agents - training crawlers, AI search agents, retrieval agents - fetch web pages without ever executing JavaScript or triggering your client-side tracking. As a result, they’re completely invisible to traditional web analytics tools like GA4. 

During the session, Nick illustrated this concept as an iceberg. Consider the full journey when a user gets an answer from ChatGPT that references your content. First, a training crawler (GPTBot) visited your page and added it to the model’s training data. Later, a search agent (ChatGPT-User) checked the current version. Then the user got their answer from the chatbot. Finally, the user clicked through to your site. That last step, the referral with utm_source=chatgpt, is the only part your analytics sees. The rest is below the waterline, visible only in server logs.

Why does this matter? Because optimizing for AI requires data on the full picture. You need to know which platforms are most active on your site, which content they’re indexing, and which pages they’re ignoring. Without server-log data, you’re basically making decisions on the tip of the iceberg. But if you deploy a standalone server-log tool, you end up creating a data silo. So the moment you need to answer a question that spans both datasets, like “which AI platforms are most active on my highest-engagement pages?”, you’re stuck. The solution, therefore, is a unified platform that ingests both client-side events and server-side logs into the same warehouse, with schemas that support different data types natively. 
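The below-the-waterline traffic is recoverable from access logs with nothing more than a user-agent scan. This sketch assumes combined-format log lines and uses a small illustrative crawler-token list:

```python
# Server-log sketch: crawlers that skip JavaScript never reach
# browser analytics, but they do appear in access logs. This counts
# hits per known AI crawler token. The token list is illustrative.
from collections import Counter

AI_CRAWLER_TOKENS = ["GPTBot", "ChatGPT-User", "PerplexityBot", "ClaudeBot"]

def count_ai_crawler_hits(log_lines: list[str]) -> Counter:
    """Tally access-log lines per AI crawler user-agent token."""
    hits = Counter()
    for line in log_lines:
        for token in AI_CRAWLER_TOKENS:
            if token in line:
                hits[token] += 1
    return hits

logs = [
    '1.2.3.4 - - [20/Feb/2026] "GET /pricing HTTP/1.1" 200 "-" "GPTBot/1.2"',
    '5.6.7.8 - - [20/Feb/2026] "GET /blog HTTP/1.1" 200 "-" "Mozilla/5.0"',
]
print(count_ai_crawler_hits(logs))  # Counter({'GPTBot': 1})
```

Joining these counts against client-side engagement data is exactly the cross-dataset question a standalone log tool can’t answer, which is the argument for landing both in one warehouse.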


The Hardest Problem: Detecting Mid-Session Handoffs

Historically, a session on your site was either human or bot. Agentic browsing breaks that assumption entirely. 

Let’s say a user visits your site. They browse normally, then hand off to Perplexity Comet to book a flight or fill out forms for a subscription renewal. At this point, the agent takes over, and your analytics sees one continuous visit. 

In this scenario, behavioral data can tell you the story of the handoff if you know what to look for. During the human phase, you’ll see wandering across the page, irregular scrolling, indirect paths, engagement with promotions and ads. But when the agent takes control, you’ll see the pattern changes sharply. The agent will navigate directly to target elements, with consistent timing between actions, and won’t engage with anything peripheral. As Nick put it: 

“the agent just processes the content, doesn’t click on anything considered secondary to its primary objective, and ignores all your ads and promotions.”

Detecting this handoff requires event-level behavioral scoring that updates in real time as the session progresses. When the score flips from “human” to “agent,” you can flag the transition in your analytics. But more importantly, you can adapt the experience for the agent, which brings us to the real opportunity. 

From Detection to Adaptation

Detection isn’t the end goal here. It’s what makes real-time adaptation possible. In a demo during the session, Nick showed what this looks like in practice using Snowplow event tracking and Snowplow Signals.

When the system detected that an agent had taken over a browsing session, the website responded by serving an abridged version of the content, which is a summary rather than the full article. This then encourages the human to intervene to see the rest. When the human returned and resumed normal browsing behavior, the site detected the transition back and introduced its own customer-facing AI agent, an expert in the site’s content domain, to re-engage the user in a hyper-personalized manner based on both historical and in-session context. All interactions were logged as standard Snowplow events, meaning the full session - human phases, agent phases, and transitions - remained fully analyzable in the warehouse.

The key insight here is that you’re setting the terms for how the browser agent intermediates between your website and the user, rather than ceding all control. That’s the difference between playing defense and playing offense. 

Agentic browsing may still feel like it’s in its infancy, but its recent growth has been staggering. The introduction of Gemini in Chrome suggests agentic browsing is becoming increasingly mainstream. So now is the time for organizations to start setting up their offense, and that’s only possible when detection works at event-level granularity, in real time, across both client-side and server-side data.

See How Snowplow Solves Bot and Agentic Browser Detection

Watch the full on-demand session featuring Snowplow Co-Founder & CEO Alex Dean and Senior Product Manager Nick Stanchenko. See the three-pillar detection framework, the composite bot score, and the live adaptation demo in action.

Get Started

Whether you’re modernizing your customer data infrastructure or building AI-powered applications, Snowplow helps eliminate engineering complexity so you can focus on delivering smarter customer experiences.