A guide to data governance: What it is, and why your organization needs it
Delivering well-understood, accurate, and complete data is a challenging task for most organizations, and its difficulty increases as organizations grow. To combat this challenge, companies are investing an abundance of resources into data governance.
A recent study from McKinsey & Company found that companies, on average, are investing between 2.5% and 7.5% of their IT spend in data governance. For a midsized financial institution, this is estimated to be between $20 million and $50 million. Data governance helps organizations deliver high-quality data that users can easily understand and leverage.
What is data governance?
There are a variety of definitions out there for data governance. A simple Google search will have you feeling overwhelmed by the number of different explanations. Here at Snowplow, we find this definition from Gartner useful:
“Data governance is the specification of decision rights and an accountability framework to ensure the appropriate behavior in the valuation, creation, consumption and control of data and analytics.”
Data governance is arguably one of an organization’s most valuable practices. With more teams consuming data across businesses, behavioral data is no longer siloed in digital marketing or other teams. With data governance, specific people and teams own data use cases and are responsible for the data’s output and impact on the organization.
Strava uses a single source of truth to empower multiple self-serve analysts, thanks to Snowplow.
Why is it important?
Over the past 10 years, companies have begun to move beyond basic reporting to using data to drive competitive advantage. To rise above the competitors in their space, a company’s data needs to be well understood, accurate, and complete. Companies need a unified organizational effort and strong data governance in order to achieve these goals.
Strong data governance practices improve data accessibility for users in your organization. With data governance, users consume data in an actionable format, and various departments such as product teams, marketing, and customer support are empowered to use the data to make crucial business decisions.
Without data governance, there is a substantial chance your organization’s database or CRM contains flawed (inconsistent, duplicate, incomplete) data. When data is inaccurate, users lose trust in the information they interact with daily. Trust is a key component of why data quality matters. When data consumers trust the data, the return on investment from your data increases because decision-makers can act confidently when drawing insights from the data.
Data governance increases the value of data to your organization, so the full potential of your data comes to life. Businesses go from working with data in a limited scope to using data to drive various use cases that bring real value to their team.
Four goals of data governance
Data governance is an essential part of ensuring success with data in the organization. Data governance also aids companies in achieving the following goals:
1. Data is well understood
For data to be well understood, it needs to be structured in a way that allows employees across the company to use it. This is done through well-thought-out documentation, employee training videos, or seminars.
Data also needs to be tracked and processed to ensure the data can fulfill its potential. In our opinion, well-understood data is an underrated aspect of data governance that we focus on here with our product. Creating a single source of truth enables you, as an organization, to work with a single data set that can be actioned across the entire business. We strongly believe that this starts with data collection.
2. Data is accurate and complete
By accurate data, we mean the user performed the actions described by the data, not a different set of actions. An example of this could be a user’s time spent on a website. Did a user on your site spend five minutes on the page, or did they leave the tab open while visiting a different page in another tab?
By complete data, we mean the users didn’t perform additional actions that the data failed to capture. With complete data, no information is missing or lost, so you have a clear picture of the events you are analyzing.
“Data quality is an urgent issue, it’s time consuming and painful and slows us down the most. We find that it’s important to detect and surface data quality issues, make them easier to visualize and more transparent, not fix them.”
Rahul Jain, Principal Engineering Manager, Omio
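To make the idea of surfacing issues (rather than silently fixing them) concrete, here is a minimal sketch in Python. It assumes events arrive as dictionaries; the field names in `REQUIRED_FIELDS` and the function `surface_quality_issues` are hypothetical and not part of any Snowplow API. The function only reports incomplete and duplicate events and never modifies the data.

```python
from collections import Counter

# Hypothetical schema: the fields every event is assumed to carry.
REQUIRED_FIELDS = ("event_id", "user_id", "event_name", "timestamp")


def surface_quality_issues(events):
    """Report incomplete and duplicate events without changing them."""
    incomplete = []
    for index, event in enumerate(events):
        # A field counts as missing when it is absent, None, or empty.
        missing = [f for f in REQUIRED_FIELDS if event.get(f) in (None, "")]
        if missing:
            incomplete.append((index, missing))

    # An event_id that appears more than once suggests a duplicate.
    id_counts = Counter(e.get("event_id") for e in events if e.get("event_id"))
    duplicate_ids = sorted(eid for eid, n in id_counts.items() if n > 1)

    return {
        "total": len(events),
        "incomplete": incomplete,
        "duplicate_ids": duplicate_ids,
    }
```

A report like this can feed a dashboard or an alert, so the people who own the data decide how to handle each issue.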
3. All data is compliant
Data compliance is an important topic for businesses because users are becoming warier about their privacy and how private companies might use their data. Compliance requirements vary by country or state, depending on where data is collected and stored. Regulations such as the GDPR and the CCPA are forcing companies to adopt stricter data governance policies, which in turn require them to answer the following questions:
- What data is collected?
- From whom?
- For what purpose?
- How can that data be used?
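One way to keep the answers to these four questions in a form that both people and systems can check is a small data inventory. The sketch below is only an illustration: the `CollectionRecord` class, the example `INVENTORY` entries, and the `is_use_permitted` helper are all hypothetical names, and a real inventory would be driven by your own legal and consent requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CollectionRecord:
    field_name: str            # What data is collected?
    data_subject: str          # From whom?
    purpose: str               # For what purpose?
    permitted_uses: frozenset  # How can that data be used?


# Hypothetical example entries, for illustration only.
INVENTORY = {
    "email": CollectionRecord(
        field_name="email",
        data_subject="registered users",
        purpose="account management",
        permitted_uses=frozenset({"account_login", "service_notifications"}),
    ),
    "page_url": CollectionRecord(
        field_name="page_url",
        data_subject="site visitors who gave consent",
        purpose="product analytics",
        permitted_uses=frozenset({"product_analytics"}),
    ),
}


def is_use_permitted(inventory, field_name, use):
    """Return True only if the field is documented and the use is listed."""
    record = inventory.get(field_name)
    return record is not None and use in record.permitted_uses
```

Fields that are not in the inventory are treated as not permitted, which keeps undocumented data from being used by default.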
Organizations not only have to comply with regulations but also need to respect and take responsibility for their users when it comes to tracking their information and acting on their consent. Ideally, businesses should not send their users’ information off to third parties.
Further Reading: Rethinking your data collection strategy
How we think about user identification and privacy at Snowplow
“We have two enrichments (IP anonymisation and PII pseudonymization) which will anonymise the data at the enrichment stage and hence before the data is stored in your data warehouse.”
Mike Jongbloet, Product Designer, Snowplow
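To illustrate what these two techniques can look like in general, here is a minimal Python sketch. It is not Snowplow's implementation: the function names, the keyed-hash approach (HMAC-SHA256), and the choice of how many address bits to keep are all assumptions made for the example.

```python
import hashlib
import hmac
import ipaddress


def pseudonymize(value: str, secret_key: bytes) -> str:
    """Replace a PII value with a keyed hash (HMAC-SHA256).

    The same input and key always give the same token, so records can
    still be joined, but the original value is not stored.
    """
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def anonymize_ip(ip: str) -> str:
    """Zero out the host part of an IP address.

    Assumption for this sketch: keep the first 24 bits of an IPv4
    address and the first 48 bits of an IPv6 address.
    """
    address = ipaddress.ip_address(ip)
    prefix = 24 if address.version == 4 else 48
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)
```

Applying transformations like these before storage means the warehouse never holds the raw identifiers, which matches the ordering described in the quote above.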
4. Data is trusted within your organization
Data governance is built on trust, and not all information deserves the same level of trust, since the data ecosystem is dynamic and continuously evolving. Third-party data comes in from an external source, so its trustworthiness is not on the same level as primary data that is captured directly from the user. This can lead to users not trusting the data and not using it to make decisions. If people do not trust the data, they will not work with it moving forward.
One of our clients, Omio, was able to build a quality-first data culture where data quality was ensured up front. They use Snowplow to improve their data quality across the organization.
Further Reading: How Omio drives a quality-first data culture
5 best practices of data governance
We strongly believe that data governance is essential for increasing the value of data at your organization. Gartner recommends several data governance best practices to follow when implementing a strategy within your own organization.
1. Align data governance to specific business outcomes
It is crucial to align your business strategies and priorities with your data governance framework. More often than not, data governance efforts are not associated with business priorities. When data governance is not connected with business priorities, data leads are often not heard when it comes to decision-making.
Aligning data governance with specific business outcomes is attainable for organizations. It can be done by laying out attributable business metrics to stakeholders who work with data. Data governance decisions should reference both business and data metrics while relating those decisions to business goals.
Let’s say a marketing team does not trust the data they’re working with. They are not only unable to base their decisions on the data but are also likely to take their data elsewhere. When data is siloed off, it is much harder to unify data within an organization, which ultimately means game-changing use cases never become a reality.
2. Ensure a data governance strategy is rooted in accountability
Basing a data governance strategy on accountability will assure stakeholders that the right governance processes are in place. When users are accountable for their actions with data governance, users’ confidence in the quality of data will increase. This will lead to increased data productivity across the organization.
To achieve a data governance strategy rooted in accountability, it is best to start by evaluating the current accountability model your company follows for data governance. Compare that model with the way data decisions are actually made in your organization, and then understand why the two differ.
Once you figure out the difference between what is happening and what should be happening, you can identify the business impact and take action on your current accountability model. Here are some steps your organization can take to ensure data governance is rooted in accountability:
- Create a centralized team or person responsible for data governance in your organization.
- Work with internal stakeholders to agree on an approach to data governance.
- Use tools to deploy a data governance model that works for both systems and people.
- Review and reassess your data governance strategy as often as possible.
Further reading: Data governance increasing data productivity
“You need someone internally to take responsibility for the initiative. Great analytics isn’t plug and play – but once you get it set up properly with Snowplow, it just works!!”
Ty W, VP Operations, Anon
3. Ensure data governance operations are transparent and ethical
Your governance operating procedures should be connected to the principles of your digital ethics, which are clear and are communicated throughout the organization. When it comes to data governance, it is important to be as transparent as possible with your users and to gather consent before tracking their information.
Under the GDPR, companies are expected to handle site visitors’ data ethically. Users should be able to request all data that a company has stored about them and to have their accounts and data deleted.
To achieve this, be transparent with handling users’ personal information, and follow clear and consistent governance policies. While first-party cookies allow you to create a better user experience, it may be time to rethink your tracking strategy in a world where users are cautious with their privacy.
Websites now feature cookie banners explaining data requirements and give users the ability to either opt in to or out of cookies. With privacy a top-level concern for users, it’s time to ask yourself whether site visitors want their information sent off to third-party vendors to be tracked as they navigate the web.
“We recommend setting your collector to track from a first-party domain so it can set a first-party server-side cookie; these cookies are unaffected by prevention methods such as Safari’s ITP and Mozilla’s ETP.”
Alex Denne, Product Marketing Lead, Snowplow
Further reading: How to leverage cookieless and anonymous tracking with Snowplow (You can still track your users without holding personally identifiable information)
4. Train employees involved in data governance to improve their efficiency
While training will vary across your organization because of employees’ different roles, all employees who work with data should be trained in some capacity. To support data governance at your organization, education and training webinars should be conducted and should be an ongoing process. Information should be up to date for employees, and training should be measured through metrics to ensure that employees understand the material. Your organization should make data governance documentation accessible to employees and keep a centralized tracking plan.
For data governance to be effective at your organization, employees should know what to do, why it needs to be done, how to do it, and, most importantly, want to do it. This especially becomes important when it comes to setting up tracking in a consistent way across your organization.
Further reading: How to build a future proof approach of data collection
Who should really own your tracking plan?
5. Create a collaborative data governance environment for your organization
Data leaders should be brought into collaborative data governance initiatives throughout the organization; that way, it is not siloed in one part of the organization. To make data governance a collaborative task, it is best to assist employees in understanding the role they play in data governance.
To achieve a collaborative environment, organizations have relied on storytelling, which leads to a more productive environment. When organizations tell stories, they connect with employees’ personal experience, their roles in the business, and their feelings regarding the value they contribute.
Organizations should empower individuals to take ownership of data governance and explain the importance of upholding it within the organization. It is also important for organizations to encourage a culture of transparency around what data governance looks like in the organization.
“Snowplow is open source, which means that we can have confidence in it; we can look at the code and figure out what’s going on or change things. We want to be able to control and own all of our data.”
Rahul Jain, Principal Engineering Manager, Omio
Getting started with data governance at your organization
Data is far too valuable to your organization to go without effective governance. The data governance framework you lay out today will guide your organization down the path of improving the quality and accessibility of the data its users consume daily.
How organizations govern their data drives the value and productivity of data. If it is done effectively, the data is extremely valuable to end users because they trust it to make business decisions. If data governance is done poorly, users don’t trust the data and don’t use it to make decisions. To increase the value of your data, data governance is a must for your organization.
“I found Snowplow after one year of trying to work with third party solutions, but now I am finally the owner of all my data. They helped me implement best practices in my data layer and my data pipeline.”
Pedro Gemal Lanzieri, CTO, PEBMED