Blog

Improving data discovery with Snowplow: The new Tracking Catalog

By
Phil Western
November 2, 2022
Share this post

Data discovery is a thorny issue.

Today's data platforms involve massive technical complexity, but with this comes the need to ensure everyone in the company can find and understand the data they need. Without this, all you have is a very expensive and complicated silo.

Our new Tracking Catalog release is another step on our journey to help all team members understand what events, entities, and properties are being tracked with Snowplow and - crucially - how different events are connected.

What is Tracking Catalog?

Tracking Catalog is a list of all the events, entities, and properties that went through your pipeline in the last 90 days. 

When you click on an event, you can see all properties associated with it and their structure, as well as an event map, which shows which entities are tracked with each event. 

How does Tracking Catalog help with data discovery?

Visualizing events

Entities are generally business-level concepts, such as a product or user, that the business wants to consistently track across events. This way of using entities is a very powerful aspect of Snowplow data, as it prevents inconsistencies in the grammar of your tracking.

Using entities across events also helps data teams maintain backward compatibility as changes to entities can be easily managed across every event in one go - as opposed to searching out every instance to make a small change. 

The Tracking Catalog enables teams to easily visualize this interconnectedness and understand the tracking holistically.

Connecting the whole team

Another huge advantage of Tracking Catalog is that it allows less-technical team members to contribute to and understand the wider structure of Snowplow data and the constituent parts of each event. If a Data Product Manager, for example, is taking stakeholders through the pipeline, Tracking Catalog can help them explain the events using clear visuals and event descriptions written in plain English.

Before this, data teams could be a bottleneck, as only a few specialists could fully understand the wider context of the live events by looking at the JSON schemas and the raw event data.

With effective data discovery, teams can go from this:

to something more like this:

Who can use Tracking Calalog?

At present, Tracking Catalog is only available to Snowplow BDP customers on Ascent and Summit tiers.

If you are an existing customer and want to upgrade your Snowplow tier, contact customer success.

If you are not yet a Snowplow customer, book a chat or schedule a demo.

Is Tracking Catalog a type of data catalog?

There are similarities and differences between Snowplow’s Tracking Catalog and a data catalog.

Here are some of the key distinctions:

Tracking CatalogData CatalogsPart of Snowplow BDP. Built with Snowplow users in mind, and maps to the unique logic of our tracking - e.g. mapping entities across events.Third-party software built to track metadata on a whole data platform. Not made with a specific product logic in mind.Reads your live stream directly from the JSON schema, allowing the wider team to access previously tribal knowledge.Generally scrapes data from the warehouse in order to display the metadata, such as column names, to end users.Reflects the data before validationReflects the data in the storage destinationDesigned for discovering what is being tracked A two-way approach where metadata can be inserted and tagged

Where are we going next?

Solving problems for Data Product Managers

The Data Product Manager is an emergent profession that has come from the increasing complexity of data platforms and the need to treat data with the same consistency and sense of ownership that we treat product development.

We have identified 5 challenges that Data Product Managers (DPM) face and are working on iterating solutions for greater ownership of data projects (see the infographic below).

Tracking Catalog is a solution for the first DPM challenge: finding out exactly what is being tracked. 

There is also a Tracking Plan in the works, which will solve for ‘Data Evolution’, or growing your data in a considered and systematic way from first principles.

Compliance, value and trust (steps 3, 4 and 5) are already key to our product, due to the way Snowplow allows data to be created from scratch, rather than being extracted from poor-quality sources, but we will keep rolling out improvements to help teams in these areas.

Learn about Data Creation with Snowplow.

The 5 key problems for Data Product Managers

Summary

Data discovery revolves around ease of access, intuitive, human-centered design, and allowing for effective collaboration between teams.

Even though the role of Data Product Manager is still relatively new, the concept of taking full ownership of your data pipeline is something which affects every organization.

Subscribe to our newsletter

Get the latest blog posts to your inbox every week.

Get Started

Unlock the value of your behavioral data with customer data infrastructure for AI, advanced analytics, and personalized experiences