Blog

What is a Data Flywheel? A Guide to Sustainable Business Growth

By
Adam Roche
&
Yali Sassoon
December 5, 2024
Share this post
“The important thing about the chat bot, the customer service, is partly about the fact that you can automate. But it’s mostly about the data flywheel. You want to capture the conversation. You want to capture the engagement in your data flywheel. It’s going to create more data, of course. We’re probably seeing data expanding about 10x every five years. Wow.”

Jensen Huang, CEO, NVIDIA

Organizations today are in a race to harness the power of artificial intelligence (AI). But many haven’t recognized a powerful framework that could amplify their success: the data flywheel. 

While some organizations may have implemented aspects of data flywheels without labeling them as such, understanding this framework is crucial for any organization looking to build sustainable competitive advantages through AI. 

As a result, the concept, championed by industry leaders like Jensen Huang of NVIDIA, is starting to gain attention. 

In this article, which draws key insights from Yali Sassoon (CTO and Co-founder of Snowplow) and his talk on data flywheels at Data Science Connect, we’ll delve into what a data flywheel is and how it’s transforming industries. 

We’ll explore real-world examples of successful implementations and offer practical steps for building your own data flywheel so you can drive continuous improvement and business growth. 

Let’s dive in.

What is a Data Flywheel?

High-level illustration of a data flywheel

A data flywheel is a strategic framework for creating sustainable competitive advantage through AI applications. 

At its core, it's a self-reinforcing system where proprietary data - your organization's "special fuel" - powers AI to deliver valuable services, whether that's handling customer support, optimizing content placement, or designing advanced hardware like GPUs.

What makes it truly powerful is the flywheel effect: as your AI provides these services, you capture detailed data about both its operations and impact. 

This new data becomes additional proprietary fuel that feeds back into the system, enabling your AI to continuously improve its performance in ways that would be difficult for competitors to replicate with generic data sets.

Think of it like a wheel with unique properties. The initial push requires significant effort - gathering your proprietary data and implementing AI services. But once the wheel starts turning, each rotation generates richer and more valuable data about your AI's performance and impact. 

This data feeds back into the system, making the next rotation even more powerful. Over time, this creates an exponential growth cycle where your AI's effectiveness increases with each iteration, while the proprietary nature of your data makes it increasingly difficult for competitors to catch up.

The key difference from basic AI implementation is that a data flywheel doesn't just use data - it systematically captures the results of AI operations to generate even better data, creating a compounding advantage that grows stronger over time.

Why Do Data Flywheels Matter?

1. Sustainable Competitive Advantage

"The flywheel of machine learning is the most important thing. A lot of people don't even realize that it takes AI to curate data to teach an AI. We have agents helping design chips - Hopper wouldn't be possible, Blackwell wouldn't be possible."

Jensen Huang, CEO, NVIDIA

Firstly, data flywheels are proven to help organizations create a sustainable competitive advantage—a rare feat in today's hyper-competitive market. Organizations like NVIDIA are using data flywheels to gain an incredible edge.

NVIDIA has accumulated more chip design data than any other company in the industry. By being an early adopter of using large language models to support chip design, they've been continuously expanding this proprietary data advantage. 

The company leverages this vast repository of historical design data to fuel AI-driven chip development, creating a powerful positive feedback loop: their AI systems improve chip design, which generates more valuable data, which in turn makes their AI systems even better. 

This creates an almost insurmountable advantage over newcomers, who face a double disadvantage: they not only lack the historical design data needed to train competitive AI systems but also can't match the sophisticated AI-enhanced design capabilities NVIDIA has developed over years of iteration. 

This data advantage compounds over time, making it increasingly difficult for new entrants to compete effectively in the GPU market.

2. Improved Performance and Efficiency

Data flywheels are also capable of generating significant performance and efficiency gains. Let’s take Klarna as an example. 

Klarna was one of the first companies to leverage AI for customer service. It built an AI assistant using OpenAI’s Large Language Model (LLM), capable of handling support conversations in over 35 languages. And the results have been impressive:

  • Reduced response times from 11 minutes to 2 minutes
  • Maintained high customer satisfaction levels
  • Saved $40 million annually
  • Replaced 700 human agents while improving service quality

So, what makes this a data flywheel example? 

Well, Klarna initially trained its AI using its own customer service data, like past conversations and service documents. Now, each new customer interaction generates more data for the AI to learn from. 

The more customers it serves, the better it gets, giving Klarna a level of customer service that’s tough for competitors to match.

3. Continuous Learning and Improvement

Data flywheels improve your AI’s performance over time. The scale of this improvement is significant. As Jensen Huang notes, these systems are driving data expansion “about 10x every five years” through continuous feedback loops. 

We’re seeing this impact across multiple industries. In his talk, Yali shared how one media organization evolved their capabilities: starting with algorithms to measure content monetary value, they progressed to optimizing content placement, then automated content positioning through news cycles to maximize revenue. 

This eventually led to sophisticated algorithms for paywall decisions and dynamic subscription pricing.

Going back to our original example, NVIDIA's GPU design process shows the ultimate potential. Their system continuously learns from countless design iterations, making it nearly "impossible to design new generations of GPUs without AI assistance." 

The system gets smarter with each iteration, creating a compounding advantage that grows exponentially over time. Each interaction generates new data that feeds back into the system, creating a virtuous cycle of improvement that is challenging for competitors to match. 

Just as compound interest grows wealth exponentially in finance, compound learning through data flywheels grows competitive advantage exponentially in AI. 

Building Your Data Flywheel: 5 Essential Elements

1. Proprietary Data Foundation

Every effective data flywheel starts with rich proprietary data. This information is unique to your organization and is difficult for your competitors to replicate. 

And the opportunity is growing significantly - as Jensen Huang notes, we're seeing organizational data expand "about 10 times every five years." This makes it critical to start capturing and leveraging your proprietary data now, before the complexity and volume become overwhelming.

This might include: 

  • Customer behavioral data
  • Operational data related to manufacturing, supply chain and fulfilment
  • Design data (product design, machine design)
  • Knowledge bases (documentation, manuals)
  • Market research

Remember: this element is paramount. We’re in a world where everyone is using the same models provided by the same companies - from Llama to Mistral through to Gemini and ChatGPT. Consequently, businesses need to think about what makes them unique and what proprietary data they can tap into to differentiate themselves from the competition. 

2. Real-Time Architecture

Once you have identified the proprietary data you want to gather, it’s time to think about your data architecture. 

For your data flywheel to thrive, you need to have real-time architecture in place. Why is this? Well, data flywheels are all about momentum. The faster you can move through the observe-analyze-act cycle, the faster your system learns and improves

A robust real-time architecture enables you to:

  • Make instant decisions using the latest user data
  • Capture the impact of those decisions immediately
  • Feed results back into your AI systems quickly
  • Stay ahead of competitors by learning faster
  • Scale your operations without losing speed

The good news here is that it’s easier to build a real-time architecture than ever before. This is thanks to the growing number of modern tools and technologies (such as Delta Live Tables, Kafka, and Confluent) that have simplified the process of building these high-speed data systems.

3. Feature Pipeline Development

Having a vast amount of raw, unstructured data stored in a data lake isn’t enough for your data flywheel. You need to turn it into something meaningful that your AI can learn from. 

In his talk, Yali demonstrated this with a customer email example: rather than just collecting entire emails, their system breaks them down into specific features like humor, voice, length, and social proof. This structured approach helps the AI understand what works and what doesn't.

A good feature pipeline helps you:

  • Break down complex data into learnable characteristics
  • Create clear metrics to measure success
  • Help AI systems learn faster from each interaction
  • Test and improve your models quickly

Think of it like teaching a new language - instead of showing someone entire books, you start with basic vocabulary and grammar. Similarly, feature pipelines break down your raw data into "vocabulary" that your AI can understand and learn from.

 4. Data Quality and Governance

"Data doesn't rain down from clouds," as Yali explained - it must be actively manufactured with quality and governance built in from the start. This is crucial now that AIs are making more decisions than humans in many systems.

Quality and governance needs focus on:

  • Ensuring both humans and AIs understand exactly what the data means
  • Building trust through consistent data quality
  • Maintaining clear rules about how data can be used
  • Following compliance requirements throughout all processing

Think of it like manufacturing - you wouldn't build a car without quality controls. The same applies to your data flywheel: quality and governance aren't extra features, they're essential foundations.

5. Comprehensive Action Tracking

Lastly, for your data flywheel to succeed you need comprehensive tracking in place. It’s no longer a case of understanding customer journeys through user behavior - you also need to track what your AIs do. 

It’s important to note here that this element is often missed by organizations. Even though many of them are collecting this data type, they often fail to use it to drive systemic improvements in their application performance. 

Every journey involves multiple AI decisions happening behind the scenes predicting what a customer might buy, personalizing emails, and optimizing timing of communications.

Therefore, your tracking needs to capture:

  • What the customer does (like browsing products)
  • What the AI decides (like product recommendations)
  • How customers respond to AI decisions
  • Whether these interactions lead to success

Think of it like watching a tennis match - you need to see both players' moves to understand the game. Similarly, you need to track both user and AI actions to understand and improve your customer interactions.

Getting Started with Your Data Flywheel

1. Identify your unique data advantages

2. Choose a focused initial use case

3. Build real-time data collection systems

4. Implement quality controls from the start

5. Create clear feedback loops

6. Monitor and measure results

7. Expand gradually based on success

Other Industry Examples to Inspire Your Flywheel

Retail: Building a Smart Shopping Experience

Several of our retail customers are demonstrating how data flywheels can transform the entire customer journey. By layering multiple AI algorithms throughout their shopping experience, they create increasingly sophisticated personalization:

  • Homepage algorithms analyze available inventory, trending items, and user preferences to showcase the most relevant content and offers
  • Search functionality learns from user behavior to deliver personalized results
  • Product pages leverage AI to recommend complementary or alternative items based on purchase likelihood
  • Checkout experiences suggest both related and unrelated items to help customers complete their baskets

What makes this a powerful data flywheel is how each algorithm's learning compounds over time. The deep understanding of customer behavior developed in one area accelerates improvements across the entire journey. 

Success with homepage personalization, for instance, provides valuable insights that speed up the development of checkout recommendations.

Media: Balancing Breadth and Depth

We’re also seeing some of our media customers implementing similarly sophisticated flywheels that balance content discovery with deep engagement. Their systems combine two types of algorithms:

  • "Breadth" algorithms that help users explore diverse content options
  • "Depth" algorithms that guide users deeper into specific topics of interest

These algorithms are seamlessly integrated into the user interface, giving readers the freedom to either explore broadly or dive deep into particular subjects. The system continuously learns from growing volumes of user interaction data, becoming increasingly adept at striking the right balance for each individual reader.

This creates a powerful feedback loop: as users engage more with the content, the algorithms get better at understanding and serving their preferences, which in turn drives more engagement and generates more learning data for the system.

Conclusion

Now you understand the basic elements of a data flywheel and the high level steps to get started, it’s time to put theory into practice. While the initial effort may be substantial, the long-term benefits make it a crucial investment for any organization looking to create a sustainable competitive advantage. 

The key is to start small, focus on quality, and build momentum over time. Remember, every successful data flywheel started with a single turn of the wheel.

Ready to Build Your Data Flywheel?

If you’d like to learn more about data flywheels and how to build one, click here to watch Yali's Data Flywheel presentation in full. 

Over the coming months, we’ll also be releasing in-depth technical content to help you understand how to power your data flywheel with Snowplow data. Stay tuned! 

You can also contact us to learn how we can help you implement a successful data flywheel strategy for your organization.

Subscribe to our newsletter

Get the latest blog posts to your inbox every week.

Get Started

Unlock the value of your behavioral data with customer data infrastructure for AI, advanced analytics, and personalized experiences