
How to Send Google Analytics Data to a Different Warehouse Without the High Cost

By Trent T
July 1, 2024

With Universal Analytics now retired, the migration to GA4 is complete for many businesses. But many companies we have spoken to recently realized too late that Google requires the use of Google Cloud Platform (GCP) and BigQuery, Google’s data warehouse, in order to use raw GA4 data outside of Google Analytics. This vendor lock-in has been a major headache for businesses already invested in Snowflake, Databricks, and other cloud platforms.

With Universal Analytics gone, businesses are now facing the full impact of GA4’s limitations and costs, making alternative solutions more critical than ever.

This blog post will guide you through the process of leveraging Snowplow to load your GA4 data straight to Snowflake, Databricks, Redshift, or a data lake.

What is the cost of exporting GA data?

Now that Universal Analytics has been retired, finding cost-effective alternatives to GA4’s data export limitations is more pressing than ever.

Exporting your valuable website and app data into BigQuery, only to export it again to a different destination, is a waste of time, money, and valuable resources. As event volumes scale, the costs become unjustifiable.

The team at HelloFresh have posted a stern warning on this:

“BigQuery gets very costly when it comes to compute and outbound data transfer (egress) if not done mindfully. The processing cost can vary from $5/TB to $7/TB and egress cost can vary from $0.12/GB to $0.19/GB.”
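To put those rates in perspective, here is a rough back-of-the-envelope estimate using the ranges quoted above. The 10 TB monthly volume is a hypothetical example, and it assumes 1 TB = 1,000 GB for simplicity:

```javascript
// Rough monthly cost of re-exporting GA4 data out of BigQuery.
// Rates taken from the HelloFresh quote above; the volume is hypothetical.
const PROCESSING_PER_TB = [5, 7];   // USD per TB processed (low, high)
const EGRESS_PER_GB = [0.12, 0.19]; // USD per GB transferred out (low, high)

function monthlyExportCost(terabytes) {
  const gigabytes = terabytes * 1000; // decimal TB for simplicity
  return {
    low: Math.round(terabytes * PROCESSING_PER_TB[0] + gigabytes * EGRESS_PER_GB[0]),
    high: Math.round(terabytes * PROCESSING_PER_TB[1] + gigabytes * EGRESS_PER_GB[1]),
  };
}

console.log(monthlyExportCost(10)); // { low: 1250, high: 1970 }
```

Note that egress dominates: at these rates, moving the data out costs well over an order of magnitude more than processing it.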

There simply aren’t any mainstream alternatives for working around this obstacle. Snowflake released a GA Importer, but it still pulls from BigQuery under the hood. The root of the issue is that GA4 does not offer an API for querying raw data.

That’s why we’ve developed an alternative. You can bypass BigQuery and GCP altogether and send your GA4 data directly to your destinations in real time. This not only streamlines your data pipeline but also gives you more control over your event structure, allowing for deeper analysis.

What is Snowplow?

Snowplow is a next-generation Customer Data Infrastructure (CDI) platform that enables organizations to capture, process, and analyze granular customer and event data from multiple sources and channels in real time. It is available as SaaS or can be deployed in your own AWS, GCP, or Azure cloud environment.

Snowplow is data warehouse/lake-focused and ensures that data lands in a format designed for downstream processing.

Direct-to-warehouse solution

The diagram below shows the steps to land GA4 data directly in both Google Analytics and your warehouse while avoiding BigQuery and GCP costs.

The solution here is to use Google Tag Manager Server-side (GTM SS) to send a copy of your data to both GA and your warehouse (via Snowplow).

GTM SS is designed to be hosted in your infrastructure, receive events from visitors on your website/app, and send them to Google Analytics. Snowplow has also created a tag for GTM SS, making it the perfect place to split the stream in two. GTM SS can run on any cloud, including AWS and Azure, so you don’t need GCP to host it.
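On the client side, the only change needed is to point the GA4 tracker at your server container instead of sending hits directly to Google. A minimal configuration fragment, where the measurement ID and domain are placeholders:

```javascript
// Route GA4 hits through your GTM Server-side container rather than
// directly to google-analytics.com. "G-XXXXXXX" and the domain are
// placeholders for your own measurement ID and server container URL.
gtag('config', 'G-XXXXXXX', {
  server_container_url: 'https://gtm.example.com'
});
```

Everything downstream of this point, including the split to Snowplow, happens server-side and is invisible to the browser.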

Send an exact copy of your data directly to your warehouse

Let’s break down the steps:

  1. The GA JavaScript tracker sends data to Google Tag Manager Server-side
  2. GTM SS picks up the request using the pre-installed GA4 client
  3. GTM SS sends data to both GA4 using the native GA4 tag, and your Snowplow pipeline using the Snowplow Tag for GTM SS
  4. Snowplow processes, validates, and enriches the events with extra data points
  5. Snowplow streams the data to your real-time stream (Kinesis, Kafka, or Pub/Sub)
  6. Snowplow sends the data to your data warehouse
    1. For Snowflake, we recommend using the Snowflake Streaming Loader
    2. For Databricks or any data lake, we recommend using the Lake Loader
    3. For Redshift, use the RDB Loader
    4. If you use BigQuery but you like our capabilities better, we also support BigQuery
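The fan-out in step 3 can be sketched as a pure function: one incoming GA4 event produces two outbound payloads. This is only an illustration of the idea, not the actual internals of the Snowplow Tag; the field names and schema URI are hypothetical placeholders:

```javascript
// Illustrative sketch of the fan-out in step 3: one incoming GA4 event
// becomes two outbound payloads. The schema URI and field names below
// are hypothetical placeholders, not the Snowplow Tag's real internals.
function fanOut(ga4Event) {
  return {
    // Forwarded to Google Analytics unchanged by the native GA4 tag.
    toGa4: ga4Event,
    // Wrapped as a self-describing event for the Snowplow pipeline,
    // where it will be validated and enriched (steps 4-6).
    toSnowplow: {
      schema: 'iglu:com.example/ga4_event/jsonschema/1-0-0', // placeholder
      data: {
        event_name: ga4Event.event_name,
        client_id: ga4Event.client_id,
        params: ga4Event.params,
      },
    },
  };
}

const out = fanOut({
  event_name: 'page_view',
  client_id: '123.456',
  params: { page_location: 'https://example.com/' },
});
console.log(out.toSnowplow.data.event_name); // page_view
```

Because both payloads are derived from the same incoming request, GA4 and your warehouse see an identical picture of user behavior, with no sampling or export lag in between.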

Our enterprise customers value the fact that this solution can be deployed to process all data in their own cloud for Personally Identifiable Information (PII) and compliance purposes (e.g. GDPR).

Test this solution out today with Snowplow Community Edition or reach out if you want a quick demo.
