The best-in-class tools for web analytics
When it comes to a web analytics stack, one size doesn’t fit all. As we mentioned in our last chapter, breaking out of a packaged analytics solution to build a modular stack from best-in-class tools will put you back in control of your data and data infrastructure.
Building a data stack like this opens up opportunities to do more with your data. But it isn’t easy. It means finding, researching and evaluating a number of vendors to find the tools that work best for your business.
To make it easier, we’ve compiled a list of key tools to consider when building out your web analytics stack. It’s not exhaustive, but a combination of these solutions will put you in a good place for leveraging behavioral data from web and other sources.
Data warehouse
One of the best ways to start making the most of your web and behavioral data is to load it into a data warehouse. This not only lets analysts slice and dice the data however they wish, but also scales as data volumes grow over time. The best data warehouses also offer standout features such as integrations with other analytical products and services, and extra capabilities such as machine learning (ML) or querying semi-structured data (such as JSON).
Check out this post from Poplin for a more in-depth comparison of the major data warehouse solutions.
Redshift, launched in 2013, is what started the popularity of cloud-hosted data warehouses. Its ease of use and low cost (compared to the popular on-prem solutions available at the time) drove huge adoption of Amazon's data warehouse. It has struggled somewhat in recent years to keep up with the innovation of its competitors, but new RA3 node types (which separate storage and compute, previously tightly coupled) and recent features such as Redshift ML and the SUPER data type (with fuller JSON support than ever) are making Redshift an appealing choice again. Tight integration with AWS services (such as S3, SageMaker and Glue) and reserved-node pricing for predictable cost forecasting are also big selling points.
BigQuery, Google's cloud data warehouse (originally developed for internal use, where it was long used to analyze Google's search index), is now available as a pay-as-you-go web service (DWaaS). With great integrations into the rest of GCP (Google Dataflow, Google Cloud Storage, Google Cloud ML, etc.) as well as the Google marketing stack (Google Ads/Search Ads 360, DoubleClick, Ads Data Hub, etc.), BigQuery is a great service to act as the center of all your marketing and customer data efforts. It also has good support for nested or repeated JSON records, supports real-time ingestion (through streaming inserts) and can even run ML algorithms with BigQuery ML (BQML).
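To make "nested or repeated records" concrete, here is a minimal Python sketch of what BigQuery's UNNEST does conceptually: an event row holding a repeated products field becomes one output row per product, with the parent columns copied onto each. The field names (event_id, products, sku, price) and the unnest helper are hypothetical, and this only illustrates the semantics; in BigQuery itself you would write something like SELECT e.event_id, p.sku FROM events e CROSS JOIN UNNEST(e.products) AS p.

```python
from typing import Any


def unnest(rows: list[dict[str, Any]], field: str) -> list[dict[str, Any]]:
    """Emit one output row per element of the repeated `field`.

    Parent columns are copied onto every child row. Like CROSS JOIN UNNEST,
    rows whose repeated field is empty (or missing) produce no output.
    """
    out = []
    for row in rows:
        parent = {k: v for k, v in row.items() if k != field}
        for child in row.get(field) or []:
            out.append({**parent, **child})
    return out


# Hypothetical event rows with a repeated `products` record
events = [
    {"event_id": "e1", "products": [{"sku": "A", "price": 10.0},
                                    {"sku": "B", "price": 5.5}]},
    {"event_id": "e2", "products": []},
]

for r in unnest(events, "products"):
    print(r)
```

Note that e2 disappears from the output because it has no products, which matches inner-join UNNEST behaviour; a LEFT JOIN UNNEST would keep it with null product columns.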
Snowflake is a cloud data warehouse with some very powerful and unique features, available on all three major cloud platforms (AWS, Azure and GCP). Like BigQuery, it separates storage and compute, but it allows further control through separate virtual warehouses, which can each be sized and suited for different purposes. Because the data is stored separately from these virtual warehouses, Snowflake is probably the most scalable data warehouse on the market, and we generally see our highest-volume customers moving to it. Snowflake also has excellent support for semi-structured JSON or XML data through its VARIANT data type, which means it can also act as a data lake and has helped popularize the data lakehouse approach.
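Snowflake lets you query into a VARIANT column with path notation (for example raw:page.url). The Python sketch below mimics that idea on parsed JSON so the behaviour is easy to see: a hypothetical get_path helper walks a dotted path, indexes into arrays with numeric segments, and returns None when any step is missing, roughly as a missing path yields NULL in Snowflake. The payload shape is an assumption made for illustration.

```python
import json
from typing import Any, Optional


def get_path(doc: Any, path: str) -> Optional[Any]:
    """Walk a dotted path such as 'page.url' or 'contexts.0.id'.

    Numeric segments index into lists. Returns None if any step is
    missing, loosely mirroring how a missing VARIANT path yields NULL.
    """
    current = doc
    for segment in path.split("."):
        if isinstance(current, dict):
            current = current.get(segment)
        elif isinstance(current, list) and segment.isdigit():
            idx = int(segment)
            current = current[idx] if idx < len(current) else None
        else:
            return None
        if current is None:
            return None
    return current


# Hypothetical semi-structured event payload stored as JSON text
raw = json.loads('{"page": {"url": "https://example.com/", "title": "Home"},'
                 ' "contexts": [{"id": "ctx-1"}]}')

print(get_path(raw, "page.url"))
print(get_path(raw, "contexts.0.id"))
print(get_path(raw, "page.referrer"))
```

The point of schema-on-read storage is visible here: no table definition was needed for page or contexts, and a field that some events lack simply comes back empty rather than breaking the query.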
Data Visualization
For most users, staring at a large and unwieldy table of numbers can be daunting and hard to understand. In order to relay insights and findings to other stakeholders in the business, your web analytics stack needs good visualization capabilities.
Data Studio is Google's free data visualization tool. More of a dashboarding tool than a BI tool, it connects well with services in the Google marketing stack (Google Ads, Google Search Console, DoubleClick/Google Marketing Platform, etc.) and integrates tightly with Google BigQuery. If you're heavily invested in the Google stack, this is a great starting point for dashboarding your data.
Looker, Google's enterprise BI tool, is aimed at companies that want to enable self-serve analytics across their organisation. Its proprietary data modelling language, LookML, allows analysts to define a metric once and have it used by all end users throughout the business. It's designed specifically for cloud data warehouses and takes advantage of their performance. It is currently considered best in class, though the flexibility of its visualization options leaves something to be desired.
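The core idea behind a modelling layer like LookML is that a metric is defined once and every query reuses that definition. The sketch below illustrates that idea in plain Python against an in-memory SQLite table; it is not LookML syntax, and the METRICS registry, build_query helper and web_sessions table are all hypothetical.

```python
import sqlite3

# Each metric is defined exactly once, as a SQL aggregate expression
METRICS = {
    "sessions": "COUNT(DISTINCT session_id)",
    "conversion_rate": "AVG(CASE WHEN converted = 1 THEN 1.0 ELSE 0.0 END)",
}
# Only whitelisted dimensions may be interpolated into the SQL
DIMENSIONS = {"country", "device"}


def build_query(metric: str, dimension: str, table: str = "web_sessions") -> str:
    """Compile a grouped query from the single shared metric definition."""
    if metric not in METRICS or dimension not in DIMENSIONS:
        raise ValueError(f"unknown metric or dimension: {metric}, {dimension}")
    return (f"SELECT {dimension}, {METRICS[metric]} AS {metric} "
            f"FROM {table} GROUP BY {dimension} ORDER BY {dimension}")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE web_sessions (session_id TEXT, country TEXT, "
             "device TEXT, converted INTEGER)")
conn.executemany("INSERT INTO web_sessions VALUES (?, ?, ?, ?)", [
    ("s1", "UK", "mobile", 1),
    ("s2", "UK", "desktop", 1),
    ("s3", "US", "mobile", 0),
    ("s4", "US", "mobile", 1),
])

# Different users slice the same metric definitions by different dimensions
by_country = conn.execute(build_query("conversion_rate", "country")).fetchall()
by_device = conn.execute(build_query("sessions", "device")).fetchall()
print(by_country)
print(by_device)
```

Because nobody writes the conversion-rate SQL by hand, changing its definition in one place changes it in every report built on it.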
One of the major players in the BI space for a number of years, Tableau is enterprise-ready and leads the industry in drag-and-drop visualization building. It is the most capable tool in the space for creating custom visualizations, and because it takes a low-code to no-code approach, it's generally very easy for traditional BI analysts to use. Tableau leans heavily on a legacy approach of loading data extracts onto its own servers to power its dashboards, but is rolling out new features that enable more cloud-native approaches to data visualization.
Power BI is built on the Microsoft BI stack that has been popular for decades, and its Windows-only desktop app is popular with Excel analysts. Beyond that, it has powerful data preparation and modelling capabilities (through Power Query and its M formula language), and is flexible enough to work with the popular cloud and on-prem data warehouses. A very affordable price tag also makes it a good choice if you want to start small and scale up.
Although Holistics hasn’t been in the BI market for long, it offers a powerful combination of data governance, ELT/transformation and visualization capabilities in a single attractive product. Entirely web-based, the service is built from the ground up for the cloud and uses the performance of cloud data warehouses to keep dashboards fast. This is a great tool if you're looking for a modern, all-in-one, cloud-native dashboarding and BI solution.
Further reading: Snowplow and Holistics
Data Monitoring
With the increasingly large volume and diversity of data flowing through your website and into your points of analysis, it's more important than ever to monitor your data quality at every stage. These tools check and alert on your data quality across various points of your data lifecycle.
A great tool for running automated scans on your website(s) to audit and monitor your tagging setup. By default it crawls every page and logs every tag that fires on that page, but custom user journeys (such as checkout flows or product interactions) can be added, and it will alert you if at any point tags stop firing or start sending incorrect or unexpected values. It is enterprise-level software, with a price tag to match.
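The per-page check such a scanner performs can be sketched as a comparison between the tags you expect and the tags a crawl observed firing. This is a hypothetical Python sketch: the tag names, the AuditResult type and the audit_page helper are illustrative and do not reflect any vendor's API. It reports tags that did not fire, tags that fired unexpectedly, and parameters with unexpected values.

```python
from dataclasses import dataclass, field


@dataclass
class AuditResult:
    missing: set[str] = field(default_factory=set)     # expected but not fired
    unexpected: set[str] = field(default_factory=set)  # fired but not expected
    bad_values: dict[str, dict[str, object]] = field(default_factory=dict)


def audit_page(expected: dict[str, dict[str, object]],
               fired: dict[str, dict[str, object]]) -> AuditResult:
    """Compare expected tags (name -> required params) with fired tags."""
    result = AuditResult()
    result.missing = set(expected) - set(fired)
    result.unexpected = set(fired) - set(expected)
    for tag in set(expected) & set(fired):
        # Record the observed value of every parameter that does not match
        wrong = {k: fired[tag].get(k)
                 for k, v in expected[tag].items()
                 if fired[tag].get(k) != v}
        if wrong:
            result.bad_values[tag] = wrong
    return result


# Hypothetical expectations for a checkout page vs. what a crawl observed
expected = {
    "snowplow_page_view": {"page": "/checkout"},
    "ads_conversion": {"currency": "GBP"},
}
fired = {
    "snowplow_page_view": {"page": "/checkout"},
    "ads_conversion": {"currency": "USD"},
    "legacy_pixel": {},
}

print(audit_page(expected, fired))
```

Running this check on a schedule against every page and journey, and alerting on any non-empty result, is the essence of automated tag monitoring.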
Iteratively helps teams catch analytics bugs before they hit production so you don’t have to worry about bad data downstream. The product consists of two parts: an intuitive web app where analysts, PMs and marketers can create and evolve their tracking plan (ditching their spreadsheets), and developer tooling for engineers to quickly and easily instrument tracking with type safety and auto-complete. They work hand-in-hand to ensure event tracking is implemented accurately and that the tracking plan is always enforced.
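Enforcing a tracking plan boils down to checking each event against a declared schema of names, properties and types. The sketch below is a hypothetical runtime version of that check in Python; the plan format, event names and validate_event function are illustrative, not Iteratively's tooling, which (as described above) aims to catch these mistakes earlier through type safety and auto-complete at instrumentation time.

```python
from typing import Any

# Hypothetical tracking plan: event name -> {property: expected Python type}
TRACKING_PLAN: dict[str, dict[str, type]] = {
    "product_viewed": {"sku": str, "price": float},
    "checkout_started": {"cart_value": float, "item_count": int},
}


def validate_event(name: str, props: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the event conforms."""
    schema = TRACKING_PLAN.get(name)
    if schema is None:
        return [f"event '{name}' is not in the tracking plan"]
    errors = []
    for prop, expected_type in schema.items():
        if prop not in props:
            errors.append(f"missing property '{prop}'")
        elif not isinstance(props[prop], expected_type):
            errors.append(f"'{prop}' should be {expected_type.__name__}, "
                          f"got {type(props[prop]).__name__}")
    # Properties sent but not declared in the plan
    for prop in sorted(props.keys() - schema.keys()):
        errors.append(f"unexpected property '{prop}'")
    return errors


print(validate_event("product_viewed", {"sku": "A1", "price": 9.99}))
print(validate_event("product_viewed", {"sku": "A1", "price": "9.99"}))
print(validate_event("add_to_wishlist", {}))
```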
Great Expectations (GE) is an open-source framework for running automated tests against the data in your data pipeline. Tests range from simple checks, such as whether a column's values are unique, to more complex assertions, such as whether a value falls within 2 standard deviations of the column's median. GE can run all sorts of tests on your data as it is ingested and transformed. We use it at Snowplow in our latest V1 data models for BigQuery, Redshift and Snowflake.
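The two example checks above can be written out in plain Python to show exactly what they assert. This is a hypothetical sketch using only the standard library, not the Great Expectations API (which wraps such checks as named, reusable expectations); the function names and sample data are illustrative. It reads "within 2 standard deviations of the median" literally and assumes the population standard deviation of the whole column.

```python
import statistics


def expect_unique(values: list) -> bool:
    """Pass if no value appears more than once in the column."""
    return len(values) == len(set(values))


def values_outside_band(values: list[float], k: float = 2.0) -> list[float]:
    """Return values further than k standard deviations from the median.

    Uses the population standard deviation of the whole column (an
    assumption; a sample standard deviation would also be defensible).
    """
    median = statistics.median(values)
    spread = statistics.pstdev(values)
    return [v for v in values if abs(v - median) > k * spread]


event_ids = ["e1", "e2", "e3", "e2"]
page_load_ms = [120.0, 130.0, 125.0, 118.0, 122.0, 127.0, 900.0]

print(expect_unique(event_ids))
print(values_outside_band(page_load_ms))
```

Here the duplicated "e2" fails the uniqueness check and the 900 ms page load is flagged; in a pipeline, either result would stop or alert on the load before bad data reaches reports.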
Tag Management
Deploying tracking to your website is central to your data collection, data quality and data privacy strategies. Tag management systems make it more straightforward to do this at scale, and with the flexibility required to track all customer interactions.
Google Tag Manager (GTM), Google's hugely popular free tag management solution (also available as a paid version, GTM 360), is primarily aimed at marketers and analysts. It has templates for common tag types and is extensible through custom templates. For many, this is the default choice in the industry.
Tealium iQ, Tealium's enterprise tag management system, is aimed at organizations that want more high-end features, such as granular access controls and deployment workflows, as well as a more developer-friendly experience. It also integrates with Tealium's CDP product.
Adobe Launch (formerly known as Dynamic Tag Management, or DTM) is the go-to choice if your infrastructure sits in the Adobe ecosystem: Adobe Analytics, Adobe Target, Adobe Experience Manager and so on.
Testing/Debugging
When debugging any web implementation, it's important to see what the browser is doing and what data it is sending, where and when. These Chrome extensions cover the dataLayer and common web analytics solutions, help spot common installation issues, and let you check whether the data being sent is correct. Use them both during implementation (before publishing to production) and when investigating any issues.
Google Tag Manager Assistant Chrome extension
Snowplow Inspector by Poplin extension
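Extensions like these essentially capture the requests a page sends and decode them into something readable. As a rough illustration of that decoding step, this Python sketch parses a Snowplow-style GET request. It relies on our understanding of the Snowplow tracker protocol (e for event type, url for the page URL, cx for URL-safe base64-encoded context JSON) and builds its own example request to a hypothetical collector host with a hypothetical context schema, so treat the parameter handling as an assumption to check against the protocol documentation.

```python
import base64
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed mapping of tracker protocol event codes to readable names
EVENT_TYPES = {"pv": "page_view", "pp": "page_ping",
               "se": "structured_event", "ue": "self_describing_event"}


def b64url_decode(data: str) -> bytes:
    """Decode URL-safe base64, restoring any padding the sender stripped."""
    return base64.urlsafe_b64decode(data + "=" * (-len(data) % 4))


def inspect_request(request_url: str) -> dict:
    """Summarise a tracker GET request: event type, page URL, contexts."""
    params = {k: v[0] for k, v in parse_qs(urlsplit(request_url).query).items()}
    summary = {
        "event": EVENT_TYPES.get(params.get("e", ""), params.get("e")),
        "page_url": params.get("url"),
        "contexts": [],
    }
    if "cx" in params:
        # cx holds a wrapper JSON whose "data" array lists the contexts
        summary["contexts"] = json.loads(b64url_decode(params["cx"]))["data"]
    return summary


# Build an example request the way a tracker might (hypothetical values)
ctx = {"schema": "iglu:com.example/user/jsonschema/1-0-0",
       "data": {"logged_in": True}}
wrapper = {"schema": "iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-0",
           "data": [ctx]}
cx = base64.urlsafe_b64encode(json.dumps(wrapper).encode()).decode().rstrip("=")
request = "https://collector.example.com/i?" + urlencode(
    {"e": "pv", "url": "https://example.com/pricing", "cx": cx})

print(inspect_request(request))
```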
Analysis tools
Beyond visualizing your behavioural data in dashboards and reports, you may want to run higher-level analyses over it. BI tools and dashboarding solutions struggle with statistical analysis such as predictive modelling, forecasting and dynamic segmentation. The following programming languages and tools, aimed specifically at data scientists and statisticians, will get you started.
R & RStudio
Python
Data Transformation
In order to perform any analysis or generate any reports, your data will need preparing. Transforming your data in a modern cloud data warehouse is a great way to do this, as it is performant, cost effective and can easily scale up with your data volumes. There are some great tools available to orchestrate this in-warehouse pipeline.
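The ELT pattern described here (load raw data first, then transform it with SQL inside the warehouse) can be sketched with Python's built-in SQLite standing in for a cloud data warehouse. The raw_events and page_views_per_user tables are hypothetical; in practice, tools such as Dataform or dbt would manage, test and schedule transformation SQL like this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse

# 1. Load: raw events land untransformed
conn.execute("CREATE TABLE raw_events (event_id TEXT, user_id TEXT, "
             "event_type TEXT, page_url TEXT)")
conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?, ?)", [
    ("e1", "u1", "page_view", "/home"),
    ("e2", "u1", "page_view", "/pricing"),
    ("e3", "u2", "page_view", "/home"),
    ("e4", "u2", "link_click", "/home"),
])

# 2. Transform: derive a modelled table with SQL, inside the database
conn.execute("""
    CREATE TABLE page_views_per_user AS
    SELECT user_id, COUNT(*) AS page_views
    FROM raw_events
    WHERE event_type = 'page_view'
    GROUP BY user_id
""")

rows = conn.execute(
    "SELECT user_id, page_views FROM page_views_per_user ORDER BY user_id"
).fetchall()
print(rows)
```

Keeping the raw table untouched means the model can be rebuilt from scratch whenever its logic changes, which is a large part of why in-warehouse transformation scales well.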
Dataform was recently acquired by Google Cloud and now focuses specifically on BigQuery. Built on TypeScript and Node.js, Dataform works almost entirely in the browser (though there is an open-source CLI tool), providing instant compilation, automatic dependency inference, custom JavaScript functions for repeating common tasks, and scheduling to run your ELT pipelines inside BigQuery. It is also likely to get a lot of focus and development from Google Cloud in the coming years.
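"Automatic dependency inference" means the tool reads each transformation's SQL, finds the tables it references, and orders execution accordingly. Here is a minimal Python sketch of that idea, assuming Dataform-style ${ref("name")} references; the model names, SQL and infer_dependencies helper are hypothetical, and the real compiler does considerably more.

```python
import re
from graphlib import TopologicalSorter

# Hypothetical SQLX-style models: name -> SQL that may reference other models
MODELS = {
    "sessions": 'SELECT session_id, MIN(ts) AS start '
                'FROM ${ref("events_clean")} GROUP BY 1',
    "events_clean": 'SELECT * FROM ${ref("raw_events")} WHERE ts IS NOT NULL',
    "raw_events": "SELECT * FROM source.events",
    "daily_report": 'SELECT * FROM ${ref("sessions")} '
                    'JOIN ${ref("events_clean")} USING (session_id)',
}

REF_PATTERN = re.compile(r"""\$\{\s*ref\(\s*["']([\w.]+)["']\s*\)\s*\}""")


def infer_dependencies(models: dict[str, str]) -> dict[str, set[str]]:
    """Map each model to the set of models its SQL references via ref()."""
    deps = {}
    for name, sql in models.items():
        refs = set(REF_PATTERN.findall(sql))
        unknown = refs - models.keys()
        if unknown:
            raise ValueError(f"{name} references unknown models: {sorted(unknown)}")
        deps[name] = refs
    return deps


# Run order: every model comes after everything it depends on
order = list(TopologicalSorter(infer_dependencies(MODELS)).static_order())
print(order)
```

Because dependencies come from the SQL itself, nobody has to maintain a separate run order by hand, and a reference to a model that does not exist fails at compile time rather than mid-pipeline.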
dbt - Redshift, Snowflake, BigQuery, PostgreSQL
dbt has built a huge community of open-source users, bringing analytics engineering to the masses. dbt is open source, based on Python, and supports all the major cloud data warehouses. Given its popularity across the industry, there are lots of packages for common tasks (including a popular Snowplow package). dbt can be self-hosted with no licence fee, or you can use dbt Cloud in the browser.
Data management
Snowplow is the leading platform for behavioral data management, including web data. For data teams looking to get more from their behavioral data, Snowplow offers unrivalled control and flexibility over your data set, as well as complete ownership of your raw, unopinionated data.
While this list isn’t exhaustive, we hope it helps to get you started on your journey to a more complete stack for web analytics. Once in place, your data stack should evolve with your business, setting you up for success with near-term goals as well as future aspirations. For this reason, although it takes time, effort and investment to piece together a stack that’s effective for modern web analytics, the hard work will be worth it in the long run.