AI Doesn't Have an Intelligence Problem, It Has a Context Problem: Our Key Takeaways from Databricks' Data + AI Summit 2026
The Snowplow team spent our fifth consecutive year at Databricks' Data + AI Summit at the Moscone Center in San Francisco, alongside roughly 30,000 data professionals, engineers, and AI practitioners. While the scale of the event was record-breaking, what stuck with us wasn't the attendance number. It was a single line from the opening keynote.
Databricks CEO Ali Ghodsi opened by asking the room how many people believed AGI had already arrived. The response was split. He argued it had, then immediately shifted to the harder question: if the models are already capable enough, why aren't enterprises getting the outcomes they're building for?
His answer organized the entire keynote: enterprises don't have an AI intelligence problem. They have a context problem.
"AI does not have an intelligence problem, it has a context problem." - Ali Ghodsi, Databricks CEO, Data + AI Summit 2026 (BigDATAwire)
Ghodsi structured the keynote around four challenges facing every enterprise deploying AI: context, control, cost, and choice. Context came first and kept coming back. Customer records, internal documents, business processes, and operational systems sit scattered across disconnected platforms. However capable a model is, it can only reason over what it can see. Connecting AI to data that is governed, structured, and ready to reason over is the work that determines whether AI delivers or disappoints.
This is the argument Snowplow has been making since we first started building behavioral data infrastructure for AI workloads. Hearing Databricks frame the core challenge of enterprise AI as a data context problem, from the keynote stage of the leading data and AI conference, was a clear statement of where this industry is heading. The bottleneck isn't the model. It's the data context the model reasons over.
Every major announcement at the Summit flowed from that thesis.
Our big takeaway:
The data platform is no longer just where data lives. It's becoming the layer where business decisions get made: continuously, in real time, by agents acting on behalf of marketing teams, product teams, and customers alike.
A few of the headline releases at the Summit:
- CustomerLake: Databricks entered the marketing technology market with an agentic Customer Data Platform built natively on the lakehouse. CustomerLake brings identity resolution, profile agents, campaign agents, and activation across major martech platforms into a single governed foundation. It's available in Private Preview, with early customers including HP, Circle K, AB InBev, and Getnet by Santander.
- Genie One (GA): Databricks' agentic coworker went generally available, enabling business teams to produce documents, reports, and automated workflows over structured and unstructured data, with scheduling, alerts, and MCP tool integration built in.
- Genie Ontology: A live context layer that automatically extracts and continuously updates business knowledge from Databricks and connected workplace tools, grounding Genie answers in governed data rather than document embeddings.
- Agent Bricks expansion: The agent-building platform now supports the Claude Code SDK, LangGraph, CrewAI, and OpenAI Agent SDKs. Over 100,000 agents have been built on Agent Bricks to date.
- Unity AI Gateway: Runtime governance for models, agents, tools, and MCP services, with spend caps, intelligent routing, and guardrails for PII and prompt injection.
- LTAP and Lakehouse RT: New infrastructure that unifies transactional and analytical workloads on a single governed storage layer, and delivers sub-100ms query latency on governed Delta Lake and Iceberg tables via a new compute engine called Reyden.
Snowplow was present throughout the week: at the booth, in partner conversations, and on stage. Here's a closer look at what mattered most and what it means for teams building on Databricks.
CustomerLake and the decisioning layer
The most discussed announcement of the Summit was CustomerLake. On the surface, it reads as Databricks entering the CDP market. The more accurate read is that it signals something broader.
Traditional CDPs were designed as middleware, sitting between data collection and activation, managing identity, segmentation, and campaign delivery. That position is being pressured from two directions simultaneously. The lakehouse has consolidated where data lives, removing the case for a separate data layer underneath. And AI agents are taking over the campaign execution that previously required human operators above. The CDP-as-middleware gets squeezed out.
CustomerLake isn't a like-for-like replacement for traditional CDPs. It's built for a fundamentally different architecture: one where data, models, agents, and activation all run on the same governed platform. Profile Agents build customer 360 records from raw data. Campaign Agents build audiences, decide next-best actions, and activate across channels in continuous loops Databricks calls "infinity campaigns." Rather than a marketer triggering a one-off campaign, the system runs always-on, reacting to customer context as it changes. Notably, Databricks frames CustomerLake around "Golden Context" rather than the traditional CDP concept of a "Golden Record." Static profile snapshots are being replaced by dynamic, agent-accessible context.
For Snowplow customers on Databricks, this creates a real opportunity. CustomerLake's agents need clean, validated, first-party behavioral data to produce reliable outputs. Profile Agents that reason over inconsistent or schema-less event data produce inconsistent profiles. Snowplow validates every event against a schema at the point of collection, before it reaches the lakehouse. The behavioral data CustomerLake reasons over is already governed and trustworthy when it arrives.
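To make "validated against a schema at the point of collection" concrete, here is a minimal sketch in Python. It is not Snowplow's implementation: the schema registry, the `com.example` vendor, the `add_to_cart` fields, and the `validate_event` and `route` helpers are all assumptions made for illustration. The only thing borrowed from Snowplow is the general shape of a self-describing event, a `schema` URI plus a `data` payload.

```python
# Hypothetical in-memory schema registry keyed by an Iglu-style schema URI.
# The vendor, event name, and field types are illustrative assumptions.
SCHEMAS = {
    "iglu:com.example/add_to_cart/jsonschema/1-0-0": {
        "sku": str,
        "quantity": int,
        "unit_price": float,
    },
}


def validate_event(event):
    """Return a list of validation errors; an empty list means the event is valid."""
    schema = SCHEMAS.get(event.get("schema", ""))
    if schema is None:
        return [f"unknown schema: {event.get('schema')!r}"]

    data = event.get("data", {})
    errors = []
    for field, expected in schema.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            errors.append(f"{field}: expected {expected.__name__}")
    # Reject fields the schema does not declare, so nothing ambiguous slips through.
    errors.extend(f"unexpected field: {name}" for name in data if name not in schema)
    return errors


def route(event):
    """Pass valid events onward; quarantine invalid ones together with their errors."""
    errors = validate_event(event)
    if errors:
        return ("bad", {"event": event, "errors": errors})
    return ("good", event)
```

The point of the design is that the check happens before the event lands anywhere: a downstream agent only ever sees the "good" stream, and failed events are kept with their errors rather than silently dropped.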
What CustomerLake doesn't do is collect first-party behavioral data from your digital estate. That's deliberate. Doing it well, at the volume and consistency that makes it useful across multiple workflows, is a harder problem than it appears. As more Databricks customers turn to CustomerLake for their CDP layer, that collection foundation becomes the thing everything else depends on.
There's also a structural advantage that comes from running inside the Databricks ecosystem. Because Snowplow integrates natively with Databricks and Unity Catalog, the behavioral data it produces can interact directly with the governance layer that CustomerLake is built on. Third-party behavioral data vendors operating outside the lakehouse can't do the same. For teams that want to build on CustomerLake with full confidence in their identity resolution and segmentation, having first-party behavioral customer context that's already in the lakehouse is a meaningful starting point.
Snowplow CEO Alex Dean wrote a deeper analysis of what CustomerLake signals about the broader competitive shift underway in the marketing technology market, including why this isn't really a CDP story and what it means for the decisioning layer more broadly. Read it on his Substack →
Jon Malloy at DAIS: The missing layer between your lakehouse and Genie
On Tuesday afternoon, Snowplow's Jon Malloy presented "The Missing Layer Between Your Lakehouse & Genie: Customer Context with Snowplow," a session that addressed a problem many Databricks customers hit when they start deploying Genie in production.
Genie handles structured, well-defined data questions well: revenue last quarter, signups this month, highest-return products. The questions it struggles with are behavioral: why did conversions drop last Tuesday, which users are showing churn signals, what is a specific customer's intent right now?
Those questions don't fail because Genie isn't capable. They fail because the behavioral data needed to answer them was never captured, validated, or structured for AI consumption in the first place.
Jon's argument: agents are only as good as the data they reason over, and Genie is no exception. To answer behavioral questions reliably, Genie needs three things. First, validated event schemas, so agents understand what each event means and don't misinterpret ambiguous fields. Second, curated behavioral profiles: pre-built semantic models that expose only the relevant signal rather than flooding agent context with raw clickstream noise. Third, governed definitions loaded into Genie's Knowledge Store, so metrics like "engaged user" or "high-intent session" are consistent across every query rather than re-interpreted each time.
That's the context Snowplow provides. Every event is validated against a schema at collection. Pre-built data models translate raw clickstream into behavioral profiles that land in the lakehouse ready for Genie to reason over. The result is a flywheel: better-governed input data produces more reliable Genie answers, which drives broader adoption, which in turn justifies further investment in better data.
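As a rough illustration of the second and third requirements, the Python sketch below collapses raw events into one compact profile per user and keeps the definition of "engaged user" in a single place. Everything here is hypothetical: the event fields, the `Profile` shape, and the thresholds are assumptions for the example, not Snowplow's data models or Genie's Knowledge Store format.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical governed definition: the one place that says what "engaged user" means.
ENGAGED_MIN_SESSIONS = 3
ENGAGED_MIN_PAGE_VIEWS = 10


@dataclass
class Profile:
    user_id: str
    sessions: int = 0
    page_views: int = 0
    add_to_carts: int = 0

    @property
    def is_engaged(self) -> bool:
        # Every consumer reads the same definition instead of re-deriving it per query.
        return (self.sessions >= ENGAGED_MIN_SESSIONS
                and self.page_views >= ENGAGED_MIN_PAGE_VIEWS)


def build_profiles(events):
    """Collapse raw events into one compact profile per user."""
    profiles = {}
    seen_sessions = defaultdict(set)
    for event in events:
        user_id = event["user_id"]
        profile = profiles.setdefault(user_id, Profile(user_id))
        seen_sessions[user_id].add(event["session_id"])
        if event["event"] == "page_view":
            profile.page_views += 1
        elif event["event"] == "add_to_cart":
            profile.add_to_carts += 1
    # Count distinct sessions once all events have been seen.
    for user_id, sessions in seen_sessions.items():
        profiles[user_id].sessions = len(sessions)
    return profiles
```

An agent handed a `Profile` sees four numbers and one flag rather than thousands of clickstream rows, and "engaged" means the same thing in every answer because it is computed from one definition.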
The session title gets at something real: Genie is a capable agent, but its usefulness in answering customer and behavioral questions depends almost entirely on the quality of what's underneath it. That customer context doesn't come for free.
Other highlights from the Databricks team
Several other announcements shaped the week.
Genie One (GA) is Databricks' agentic coworker for internal business teams, now generally available across web, iOS, Android, Slack, and Teams. It goes beyond answering data questions to scheduling tasks, alerting on anomalies, and producing documents and reports. Jon Malloy's session covered the ground-level reality of deploying Genie in production: the behavioral data layer that has to exist before Genie can answer customer questions reliably. That context applies directly here: Genie One is only as useful as the schemas, profiles, and semantic definitions underneath it.
Genie Ontology is Databricks' approach to the grounding problem at scale. It automatically extracts and continuously updates business knowledge from Databricks and connected workplace tools (Google Drive, Slack, Jira, Confluence, SharePoint) to give Genie more accurate answers at lower token cost. The behavioral event definitions and semantic models Snowplow publishes can feed directly into this layer, keeping Genie's understanding of customer behavior current as product behavior evolves.
Unity AI Gateway addresses the cost and governance concerns that came up repeatedly in conversations throughout the week. As agentic systems consume more compute, teams need visibility into AI spend, hard caps, and guardrails against PII exposure. Unity AI Gateway brings models, agents, tools, and MCP services under a single governance layer, the same one already covering data assets in Unity Catalog. This matters for Snowplow customers because the behavioral data flowing through Snowplow's pipeline is governed from the point of collection, which gives Unity AI Gateway a clean, auditable data source to work with.
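For readers who want a feel for what "hard caps and guardrails" mean mechanically, here is a deliberately simple Python sketch. It is not the Unity AI Gateway API: the `Gateway` class, the `BudgetExceeded` exception, the per-call cost argument, and the email-only redaction rule are all assumptions invented for this example.

```python
import re

# Illustrative PII rule: redact anything that looks like an email address.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


class BudgetExceeded(Exception):
    """Raised when a call would push spend past the configured cap."""


class Gateway:
    """Hypothetical wrapper that enforces a spend cap and redacts PII before a model call."""

    def __init__(self, spend_cap_usd):
        self.spend_cap_usd = spend_cap_usd
        self.spent_usd = 0.0

    def redact(self, prompt):
        return EMAIL.sub("[REDACTED_EMAIL]", prompt)

    def call(self, prompt, cost_usd, model):
        # Refuse the call up front rather than discovering the overspend afterwards.
        if self.spent_usd + cost_usd > self.spend_cap_usd:
            raise BudgetExceeded(
                f"cap {self.spend_cap_usd} USD would be exceeded by this call"
            )
        self.spent_usd += cost_usd
        # The model only ever sees the redacted prompt.
        return model(self.redact(prompt))
```

Here `model` is any callable that takes a prompt string. A real gateway would add routing, prompt-injection checks, and audit logging on top, but the ordering is the useful part: budget check first, redaction second, model call last.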
LTAP and Lakehouse RT collapse two infrastructure tiers that have historically required separate systems. LTAP (Lake Transactional/Analytical Processing) stores Postgres-native transactional data in Delta and Iceberg format from the moment of write, removing the ETL pipelines that have traditionally connected operational and analytical systems. Lakehouse RT adds sub-100ms query latency on governed tables. Both move the lakehouse closer to the real-time behavioral data infrastructure Snowplow customers are already operating at the event level.
Lakebase, Databricks' serverless Postgres database, expanded with cross-region disaster recovery, git-style branching, and a hybrid vector search beta. Now processing 12 million database launches per day, it continues to be the operational database backbone for teams building applications on top of the lakehouse.
Looking ahead
Ghodsi closed his keynote with a line that put the week in context: "The applications are not going away. The databases are not going away. What's going away is that you're not going to interact with them directly anymore."
That shift is already in motion. Agents are handling more of the interaction layer. The question for every enterprise is what those agents are reasoning over, and whether that data is trustworthy enough to produce reliable outcomes.
That's where Snowplow has always operated. We validate behavioral data at collection, before it reaches the lakehouse. We publish semantic models that translate raw events into governed, business-readable profiles. With CustomerLake now adding a native decisioning and activation layer on top of that same foundation, teams have a clearer path from first-party behavioral data to agentic marketing than has existed before.
The open question for most teams is not whether to build on the agentic stack. It's whether the data underneath that stack is good enough to make agents reliable in production. That's where we'll keep focusing.
Want to learn more? Spin up a free trial or get in touch to see how Snowplow works inside the Databricks ecosystem.