The 14 best open source analytics tools
It’s fairly easy to find the top SaaS analytics tools. Just search Google for “analytics tools” and bingo. Finding open source analytics tools isn’t quite as easy.
So here are our top choices broken down by what they do well, as well as any weak spots they might have.
Table of contents
We've grouped the 14 top open source analytics tools into 9 categories:
1. Open source Behavioral Data Platforms
2. Open source product analytics tools
3. Open source A/B testing tools
4. Open source CDPs / Reverse ETL tools
5. Open source data validation tools
6. Open source analytics engineering tools
7. Open source anomaly detection tools
8. Open source databases
9. Open source data visualization tools
Open Source Behavioral Data Platforms
1. Snowplow
Snowplow / Mixed license, including Apache-2.0 / 5.8k stars
The strength of Snowplow is complete ownership of your data and data infrastructure. You have direct access to your granular data and can collect, process, analyze, and store it exactly as you need. Snowplow has trackers and webhooks to pull in multiple data sources and integrates with the main cloud data warehouses.
Snowplow is one of the most widely used trackers in the world, running on more than 2 million mobile apps and websites.
The Snowplow Behavioral Data Platform is a transparent-core product, with certain elements of its powerful infrastructure available as open-source software.
Snowplow now has pre-made Data Applications to help customers deliver high-impact use cases faster.
Open source product analytics tools
These are entire platforms that can supersede your packaged SaaS tools and give you end-to-end control and insight into your product data. The overall pluses of these types of tools are control and customization. You have complete access to your data and can decide exactly how the data is analyzed. The downside is that they can be resource-intensive to set up and run.
2. Countly for easy mobile analytics
Countly / Countly GitHub / AGPL v3 license / 4.6k stars
The strength of Countly is easy access to your data with read and write API access and analytics for mobile, web, and desktop. It features a number of open source plugins to help you collect and understand your data better.
The downside of the open source version is that it doesn’t include all the features of the Enterprise paid version. With open source, you miss out on real-time data, user profiles, and the ability to design funnels. The open source version also “stores data (only) in an aggregated format,” so you can’t export the data and perform more granular analysis elsewhere (though this does make reporting faster).
3. PostHog for quick setup of self-hosted analytics
PostHog / PostHog GitHub / MIT license / 3.4k stars
PostHog is a self-hosted, open source analytics platform that allows for extremely easy deployment. You can deploy the tool directly to Heroku in one click. This sets it apart from a lot of other open source analytics tools that have a more involved setup process and require more knowledge to get up and running. PostHog works well for teams new to the open source world.
A weakness of PostHog is that you might be limited if you are building out marketing attribution with open source analytics. PostHog doesn’t currently have email link tracking or ad campaign tracking, so you will be missing a subset of your data when trying to understand your marketing campaigns better.
A note on ‘enterprise scale’ web analytics tools:
If you are looking for open source analytics at enterprise scale, you might want to consider a mesh of tools that deliver analytics data into a data warehouse. This is exactly what Snowplow, number 1 on this list, was made for.
This would then enable you to build competitive advantage based on how you amass high quality data at scale, and activate it within tools built especially for real-time marketing automation, customer engagement and business intelligence.
Open source A/B testing tools
4. Wasabi for real-time, enterprise-grade A/B testing
Wasabi / Wasabi GitHub / Apache-2.0 license / 973 stars
Wasabi is a real-time, open-source, 100% API-driven, A/B testing platform by Intuit. The open-source testing software allows users to own their data and experiment across the web, mobile, and desktop. Users utilize Wasabi because it’s fast, scalable, and easy to use for organizations of all sizes.
Developers lean toward Wasabi for A/B testing because it is 100% API-driven, so it can be called from any programming language and environment. The software has been tested for years with products like TurboTax and QuickBooks.
While Wasabi is a proven open-source platform that can run on your servers or in the cloud, it is no longer under active development or supported by Intuit, as of August 28, 2019.
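To make "API-driven" assignment a little more concrete, here is a minimal sketch of the kind of deterministic bucket assignment an A/B testing service might perform behind its API. It is not Wasabi's implementation or API: the function name `assign_bucket`, the experiment name, and the bucket weights are all hypothetical, and hashing is just one common way to keep assignments stable.

```python
import hashlib
from typing import Mapping


def assign_bucket(user_id: str, experiment: str, buckets: Mapping[str, float]) -> str:
    """Deterministically map a user to a bucket, weighted by allocation.

    Hypothetical helper for illustration only; not part of Wasabi.
    """
    if abs(sum(buckets.values()) - 1.0) > 1e-9:
        raise ValueError("bucket weights must sum to 1.0")
    # Hash experiment + user so the same user always lands in the same bucket
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # Turn the first 8 hex digits into a float in [0, 1)
    point = int(digest[:8], 16) / 0x100000000
    cumulative = 0.0
    for name, weight in buckets.items():
        cumulative += weight
        if point < cumulative:
            return name
    return name  # guard against floating-point rounding at the top edge


# Hypothetical experiment with a 50/50 split
allocation = {"control": 0.5, "variant": 0.5}
first = assign_bucket("user-123", "checkout_button", allocation)
second = assign_bucket("user-123", "checkout_button", allocation)
```

Because the assignment depends only on the user and experiment identifiers, any client in any language can ask the same question and get the same answer, which is the practical benefit of keeping the logic behind an API.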
Open source CDPs / Reverse ETL tools
5. Grouparoo for integrating customer data with cloud-based tools
Grouparoo / Grouparoo GitHub / Mozilla Public License 2.0 / 428 stars
Grouparoo is an open-source Reverse ETL solution that makes it easy to send data from your data warehouse to cloud-based marketing, sales and customer platforms like Mailchimp, Salesforce and Zendesk. Grouparoo integrates with any tech stack; you can configure your setup locally, commit changes, and deploy with git, just as you would deploy dbt projects. There's also a web-based user interface to support complex configurations.
Grouparoo is a very new solution and therefore doesn't feature as many integrations as its non open-source counterparts in the reverse ETL category. That being said, it's a hugely promising platform with advantages in its privacy and the fact you can fit it into your existing engineering workflow. Grouparoo also has great segmentation capabilities, including a group building tool that can be used by engineers as well as less technical teams like marketers. This can be used to determine which profiles get synced to certain tools and will also create tags or lists in the destination systems.
6. Pimcore for managing digital data
Pimcore / Pimcore GitHub / GPLv3 license / 2k stars
Pimcore was introduced to the open-source world in 2010. The open-source platform assists organizations in managing digital data and customer experience. Pimcore is 100% API-driven, allowing integration into any tech stack. Eighty-two thousand customers across 56 countries use Pimcore to manage their data, including Sony and Pepsi.
Pimcore stores data independently and can provide the managed data to any channel, such as B2B websites, ecommerce systems, and mobile applications.
It is important to know that Pimcore is not an “out of the box” software product and, therefore, is meant for people with software development experience.
Open source data validation tools
These tools have a specific use within your data pipeline. You can add them in as a step within an open source data platform to perform a single function. The plus of these tools is that they perform important operations that you are unlikely to get in packaged SaaS tools. The downside is that they are built specifically for certain purposes—you need multiple tools like these to answer every use case you have.
7. Great Expectations for data validation
Great Expectations / Great Expectations GitHub / Apache-2.0 license / 3.2k stars
The strength of Great Expectations (apart from its amazing name!) is that it allows you to set and assert specific validation rules for your data and be alerted when your data is straying from those rules. You can also automatically create documentation directly from these assertions.
A caveat is that Great Expectations is very new. It has a lot of promise, but key features, such as autogenerated documentation from tests and data profiling, are still experimental.
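To show what "setting and asserting validation rules" looks like in practice, here is a small plain-Python sketch of the idea. It deliberately does not use the Great Expectations API; the helper names (`expect_not_null`, `expect_between`, `validate`) and the sample rows are made up for illustration.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Check = Callable[[List[Row]], List[int]]


def expect_not_null(column: str) -> Check:
    """Build a check that lists the indexes of rows where `column` is missing."""
    def check(rows: List[Row]) -> List[int]:
        return [i for i, row in enumerate(rows) if row.get(column) is None]
    return check


def expect_between(column: str, low: float, high: float) -> Check:
    """Build a check that lists rows whose non-null `column` falls outside [low, high]."""
    def check(rows: List[Row]) -> List[int]:
        return [
            i for i, row in enumerate(rows)
            if row.get(column) is not None and not (low <= row[column] <= high)
        ]
    return check


def validate(rows: List[Row], expectations: Dict[str, Check]) -> Dict[str, List[int]]:
    """Run every named rule and return the failing row indexes per rule."""
    return {name: check(rows) for name, check in expectations.items()}


# Made-up sample data: one missing user_id, one impossible session length
rows = [
    {"user_id": "a1", "session_seconds": 42},
    {"user_id": None, "session_seconds": 30},
    {"user_id": "c3", "session_seconds": -5},
]
report = validate(rows, {
    "user_id is not null": expect_not_null("user_id"),
    "session_seconds in [0, 86400]": expect_between("session_seconds", 0, 86400),
})
# Keep only the rules that were violated, e.g. to feed an alert
failures = {rule: indexes for rule, indexes in report.items() if indexes}
```

Naming each rule in plain language is what makes it possible to turn the same assertions into human-readable documentation.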
Open source analytics engineering tools
8. dbt for improved analytics workflow
dbt / dbt GitHub / Apache-2.0 license / 2.2k stars
dbt’s strength is that it allows you to bring general engineering principles, such as version control, testing, and sandboxing, into your data pipeline. You can perform data transformation and business logic without impacting users in separate, collaborative environments.
The limitation of dbt is that it is purely a transformation tool. It expects that extraction and loading will be done by another tool. This is fine, as there are plenty of other tools that can do these jobs in the pipeline, but it’s important to realize this is just one step in a larger process.
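The sketch below illustrates that transform-only role using Python's built-in sqlite3 module as a stand-in warehouse. It is not dbt itself: the table and view names are hypothetical, and the raw table is inserted by hand to stand in for whatever extraction and loading tool you use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extraction and loading happen elsewhere; this fakes a raw table that
# another tool has already loaded into the warehouse.
conn.execute("CREATE TABLE raw_events (user_id TEXT, event TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [
        ("u1", "purchase", 20.0),
        ("u1", "purchase", 5.0),
        ("u2", "purchase", 12.5),
        ("u2", "page_view", None),
    ],
)

# The "model": a SELECT statement materialized as a view on top of raw data
conn.execute("""
    CREATE VIEW user_revenue AS
    SELECT user_id, SUM(amount) AS revenue
    FROM raw_events
    WHERE event = 'purchase'
    GROUP BY user_id
""")

# The "test": a query that returns rows only when an assumption is violated
failing_rows = conn.execute("""
    SELECT user_id FROM user_revenue
    WHERE user_id IS NULL OR revenue IS NULL
""").fetchall()

# Read the transformed result back as {user_id: revenue}
revenue = dict(conn.execute("SELECT user_id, revenue FROM user_revenue"))
```

Keeping the model and the test as plain SQL is what lets them live in version control and run in a sandbox before anything reaches production users.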
Open source anomaly detection tools
9. Hastic for data anomaly detection
Hastic / Hastic GitHub / Apache-2.0 license / 269 stars
The strength of Hastic is its ability to find anomalies in your data and alert you immediately. You set up predefined parameters for possible anomalies in your data, and Hastic will find them if they reoccur.
The limitation here is that Hastic only works with the open source analytics monitoring platform Grafana, so you can’t see these plots in Superset or Metabase. Hastic is also currently lightly documented, so setup and maintainability might be a challenge.
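For a feel of what parameter-driven anomaly detection involves, here is a minimal sketch that flags points deviating sharply from a trailing window. This is a generic z-score approach chosen for illustration, not Hastic's algorithm; the function name `find_anomalies` and the default `window` and `threshold` values are assumptions.

```python
from statistics import mean, stdev
from typing import List


def find_anomalies(values: List[float], window: int = 5, threshold: float = 3.0) -> List[int]:
    """Return indexes whose value sits more than `threshold` standard
    deviations away from the mean of the preceding `window` values."""
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up metric with a single spike at index 6
series = [10, 11, 10, 12, 11, 10, 50, 11, 10, 12]
spikes = find_anomalies(series)
```

In a monitoring setup, the returned indexes would be mapped back to timestamps and used to trigger an alert.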
Open source databases
Open source databases allow you to store your data outside of the larger proprietary warehouses. A lot of databases, such as MySQL, PostgreSQL, CockroachDB, MongoDB, and SQLite, are open source, but the two highlighted here are different in that they are engineered to deal with specific types of data and analysis.
10. Apache Druid for real-time DB querying
Druid / Druid GitHub / Apache-2.0 license / 10.3k stars
The strength of Druid is in real-time analytics, where a user is performing multiple queries in rapid succession and needs sub-second answers. If you are working on a product that requires you to analyze data on the fly, then Druid is the right database to choose.
Druid’s lack of fault tolerance has been cited as a weakness, particularly if your environment is susceptible to network failures.
11. Timescale for time-series querying
Timescale / Timescale GitHub / Apache-2.0 license / 9.8k stars
Timescale’s strength is that it is optimized for time-series data. If you are working with time-series data, such as ongoing product usage, Timescale allows you to perform complex queries on the data.
A weakness of Timescale is that, though the relational database model is versatile, it can be more difficult to get started with. There is a steep learning curve for the tool.
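A typical time-series query groups events into fixed-width time buckets and aggregates each one. The sketch below shows that idea in plain Python rather than in Timescale's SQL; the function name `bucket_counts` and the sample timestamps are hypothetical, and timestamps are assumed to be timezone-aware.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable


def bucket_counts(timestamps: Iterable[datetime], width: timedelta) -> Dict[datetime, int]:
    """Count events per fixed-width time bucket, keyed by bucket start time."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    counts: Dict[datetime, int] = defaultdict(int)
    for ts in timestamps:
        # Floor each timestamp to the start of the bucket it falls in
        offset = (ts - epoch) // width
        counts[epoch + offset * width] += 1
    return dict(sorted(counts.items()))


# Made-up product usage events, bucketed into 5-minute windows
events = [
    datetime(2024, 1, 1, 10, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 10, 4, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 10, 7, tzinfo=timezone.utc),
    datetime(2024, 1, 1, 10, 16, tzinfo=timezone.utc),
]
usage = bucket_counts(events, timedelta(minutes=5))
```

A time-series database does this kind of bucketing inside the query engine, which is what makes it practical on large volumes of ongoing usage data.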
Open source data visualization tools
For any data analysis, you want the ability to query and visualize the data. Proprietary dashboards and business intelligence tools such as Looker, Tableau, or Chartio are extremely popular, but so are some of the open source visualization tools available. These are some of the most starred and forked open source analytics tools out there.
12. Superset for visualizing data in any DB
Superset / Superset GitHub / Apache-2.0 license / 31.4k stars
The main strength of Superset is that it integrates with dozens of modern databases, so wherever your data currently lives, Superset can interface, allowing you to visualize your data. You can also visualize and analyze data from different sources simultaneously.
Superset is not necessarily an “enterprise-ready” tool. There is a challenging setup process, and some cite potential security risks of giving a Docker image access to your data. But it is an extremely powerful tool if you take the time to learn all that Superset has to offer.
13. Metabase for quick visualization
Metabase / Metabase GitHub / AGPL license / 22.9k stars
The strength of Metabase is its simplicity, both in setup (boasting a five-minute setup process) and in the analysis, where anyone on your team can use Metabase to query your data and get answers.
Its strength is also its weakness, in that the simplicity can mean complex querying of your data is more difficult. There is an SQL mode, but this isn’t the main feature of the tool as in other business intelligence tools.
14. Redash for different dashboards for different teams
Redash / Redash GitHub / BSD-2-Clause license / 17.7k stars
Like Metabase, the strength of Redash is in its ease of use. Though you do need some SQL experience to get the most out of the tool, you can easily create visualizations based on your data, and you can create different dashboards for different teams.
The downside of Redash is that its visualizations and dashboards aren’t quite as polished or sophisticated as those you can produce with Metabase, and it doesn’t have quite the power of Superset. It was also recently acquired by Databricks, so its future direction is uncertain.
Take control of your data with open source tools
With some of these tools, you can get an entire pipeline, from collection to transformation and visualization, up and running in an hour. Others will take your entire data team weeks to configure.
Whatever your use case, it makes sense to explore the flexibility of open source tools. In particular, it's worth taking advantage of thriving open source analytics communities, discourse forums, Slack environments, and Twitter chats to find the best tools for your chosen use case.
Snowplow users frequently integrate many of the above tools to build an open source data stack. Snowplow’s modular technology can slot into your existing processes, giving you the flexibility to use Snowplow for multiple use cases.
If you’d like to learn more about how Snowplow’s open core infrastructure can empower you on your data journey, why not try Snowplow yourself?