
SaaS vs private-SaaS to maximize data ownership and compliance

With regulatory frameworks becoming increasingly strict on the ownership and good governance of data, alternative models to SaaS tools are becoming more prominent. 

What is private SaaS?

Many companies are choosing a ‘high governance and control’ approach: they are reducing their technical dependencies on SaaS tools and opting instead for ‘private SaaS’.

SaaS can be seen as a ‘multi-tenanted environment’ or a public cloud in which the Vendor hosts and manages a data tool for many Clients. Private SaaS, on the other hand, means the data pipeline runs in the Client’s private cloud environment, while ongoing pipeline maintenance is managed by the Vendor.

This model offers a combination of ownership and convenience, while requiring some technical know-how on the part of the Client.

SaaS vs private SaaS vs self-hosted data tools 

This article focuses only on Cloud storage locations.

“Self hosted” generally refers to open source products.

| | SaaS | Private SaaS | Self-hosted |
|---|---|---|---|
| Hosting | Vendor public cloud | Client private Cloud | Client private Cloud |
| Technical level | Low | Medium | High |
| Choice of storage location (GDPR) | Rare | Yes | Yes |
| Pipeline management | Vendor | Vendor | Client |
| Data compliance | Dependent on Vendor | Significant Client control | Full Client control |
| Data governance | No | Full Client control | Full Client control |
| Management costs/overheads | Low | Fairly low | High |
| Vendor lock-in | Frequent | Rare | Rare |
| Black-box decisions made by Vendor | Frequent | No | No |

An example of a private SaaS data pipeline

Critically, a private-SaaS deployment sees the Client’s data move from private digital products, such as apps and websites, to a fully owned storage destination without ever leaving a private cloud environment. This offers complete transparency on the way the data pipeline works, including any logic built into it.


The costs and benefits of private SaaS applications for digital analytics

Private SaaS often brings some technical requirements that SaaS customers don’t need to think about, primarily involving the hosting of the service in their private cloud. Although the service itself is managed by the Vendor, the Client has to set up permissions and configure their cloud environment correctly in order to get started.

Provided these fairly simple steps are taken correctly, however, private SaaS offers a host of benefits, particularly around ownership and compliance. As the whole infrastructure is owned by the Client, every decision taken about the data, along with any associated metadata, can be fully scrutinized (great for monitoring and observability). This means audits can be carried out without hindrance, and decisions traced back all the way to first principles.

The good, the bad and the ugly of SaaS applications

SaaS tools have different degrees of transparency. 

A minority are fully open-source with the option of a hosted SaaS solution. These solutions can provide full visibility into what’s going on under the hood whilst offering to host the platform to make things easier for the customer (shameless plug: we’ve just created such a solution called BDP Cloud).

The majority of SaaS tools, however, do not offer this transparency. Larger suites of data tools operate in the Vendor’s cloud and don’t allow the Client any visibility into the inner workings of the data pipeline. These inner workings are, in effect, the value provided and so are proprietary.

The benefits of SaaS

SaaS tools can still be very convenient and deliver a fast time to value, frequently offering suites of tools that integrate seamlessly with one another.

The most famous example in the world of data is Google; users of Google Analytics have the advantage of being able to easily integrate with a massive array of tools in the Google ecosystem, such as Pub/Sub for real-time streaming, BigQuery for storage and Data Studio for visualization, not to mention Google Ads, which is very convenient for marketers.

The problems with SaaS

Since Google Analytics is a black-box SaaS tool, it does have its limitations. With software of this type, which also includes Mixpanel, Segment and Adobe, users lack full governance over their data pipelines. This creates downstream issues, with one very topical example being a lack of ‘data sovereignty’ (i.e. users not having the choice of where to store or process their data). This has led data protection authorities in France, Austria and Denmark to rule that particular uses of Google Analytics violated GDPR.

Another black-box limitation common to packaged SaaS tools is uncertainty over how the provider uses your data. Within Google’s walled garden, for example, your data might be sold as part of their advertising ecosystem. You cannot control how or where this data is used, even with reference to the vague terms of service. The invalidation of Privacy Shield has further undermined confidence in this type of arrangement.

A similar issue is found with how metrics are defined, such as session-length or how a ‘user’ is defined, with hidden decisions being baked into the SaaS logic. Is Google’s 30-minute session definition, for example, optimal for all digital products, without exception?
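To make the point about baked-in metric definitions concrete, here is a minimal Python sketch of sessionization in which the inactivity timeout is a parameter rather than a hidden constant. The function name `sessionize` and the sample timestamps are illustrative assumptions, not part of any vendor's implementation; the 30-minute default simply mirrors the definition mentioned above.

```python
from datetime import datetime, timedelta

def sessionize(timestamps, timeout=timedelta(minutes=30)):
    """Group event timestamps into sessions.

    A new session starts whenever the gap since the previous event
    is longer than `timeout` (hypothetical helper, for illustration).
    """
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)   # still within the current session
        else:
            sessions.append([ts])     # gap too long: open a new session
    return sessions

# Made-up events for a single user
events = [
    datetime(2023, 1, 1, 9, 0),
    datetime(2023, 1, 1, 9, 10),
    datetime(2023, 1, 1, 9, 50),   # 40 minutes after the previous event
    datetime(2023, 1, 1, 10, 5),
]

sessions_30 = sessionize(events)                                 # 30-minute timeout
sessions_60 = sessionize(events, timeout=timedelta(minutes=60))  # 60-minute timeout
print(len(sessions_30), len(sessions_60))
```

The same four events yield a different session count depending on the timeout, which is exactly the kind of decision that stays invisible when the logic is fixed inside a black-box tool.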

Users of SaaS tools should therefore plan for the potential trade-off between convenience and transparency by making a strong business case for their choice of deployment – private SaaS, SaaS or self-hosted.

Snowplow’s approach to private SaaS

Snowplow has always been based on a private-SaaS ethos. We believe strongly that businesses should have full visibility over how their data is managed.

Snowplow manages everything from the automation, deployment and monitoring of pipelines, right up to the scaling and maintenance. Our BDP Enterprise tool is a private SaaS streaming analytics pipeline for behavioral event data. Metrics like ‘time on page’ and ‘session length’ can be tracked with incredible accuracy and used to create data products and applications, such as churn propensity or marketing attribution. Within this, our technical teams support integration with multiple cloud environments and manage integrations with existing customer systems. 

“What we offer is a fully managed service, but it’s isolated in a client’s own sub-account. So essentially, what that means is that each client comes to us and gives us their own sub-account, such as their own Google Cloud project, and we set up and maintain a full data pipeline within that. This way every client has their own isolated infrastructure entirely segmented from every other client. Basically, there’s no shared tenancy across anything”

Josh Beemster, Head of Engineering at Snowplow

The main benefits of our private SaaS deployment are:

1. Customization

Snowplow makes sure your data pipeline is set up and tailored to your specific services, using a unique Data Creation approach to tracking.

We can fully customize how we manage instance reservations, security protocols and how we right-size pipelines for your particular traffic patterns.

We can also balance cost against latency. For example, if latency is your priority, we can aim to get it down to around 1 second, but equally we can relax that latency target to economize, if cost is the priority. A key question here might be: does the report run once a week or is it constantly updated?

Similarly, we can limit the number of destinations you load to, based on projected expenses. No other mainstream provider really allows you to design your pipeline around the optimal way for you to consume information.

Listen to our Head of Engineering explain the challenges involved with offering a private SaaS solution and how we can customize to your needs (please note: this is very technical).

2. Auditability

We make sure that everything works for each Client’s security team, so that by the time they sign off on the implementation, they have verified that the configuration is up to spec.

This includes:

  • Recording the legal basis for tracking as GDPR contexts
  • Optimizing access permissions
  • Designing in accordance with security protocols


3. Monitoring and Observability

A pipeline latency graph for BigQuery – an essential metric to observe pipeline health

Due to the fully exposed nature of Snowplow’s design, users can observe any metadata associated with their pipeline using CloudWatch and Stackdriver in a way that is not possible with SaaS tools. 

Basically, users can receive the same alerts as our Operations teams, and export these metrics as needed for complete transparency on pipeline status and health.
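As a rough illustration of the kind of health metric described here, the sketch below computes per-event pipeline latency and flags events that exceed a threshold. The field names `collected_at` and `loaded_at`, the helper names and the 300-second threshold are all assumptions made for this example; a real deployment would read equivalent metrics from the cloud provider's monitoring service.

```python
from datetime import datetime

def latencies_seconds(events):
    """Seconds between collection and load for each event (hypothetical fields)."""
    return [(e["loaded_at"] - e["collected_at"]).total_seconds() for e in events]

def latency_breaches(events, threshold_seconds):
    """Return the IDs of events whose latency exceeds the threshold."""
    return [
        e["event_id"]
        for e, latency in zip(events, latencies_seconds(events))
        if latency > threshold_seconds
    ]

# Made-up sample events
sample_events = [
    {"event_id": "e1",
     "collected_at": datetime(2023, 1, 1, 12, 0, 0),
     "loaded_at": datetime(2023, 1, 1, 12, 0, 2)},
    {"event_id": "e2",
     "collected_at": datetime(2023, 1, 1, 12, 0, 0),
     "loaded_at": datetime(2023, 1, 1, 12, 0, 45)},
    {"event_id": "e3",
     "collected_at": datetime(2023, 1, 1, 12, 0, 0),
     "loaded_at": datetime(2023, 1, 1, 12, 6, 40)},
]

slow_events = latency_breaches(sample_events, threshold_seconds=300)
print(latencies_seconds(sample_events), slow_events)
```

Because the Client owns the infrastructure, a check like this can run against the same raw metrics the Vendor sees, rather than against a summarized status page.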


Snowplow’s pipeline

This example is from Snowplow’s private SaaS tool ‘BDP Enterprise’. Data is collected from various sources with an SDK and sent through different processes such as ‘validate’ and ‘enrich’ in the Client’s private cloud environment, before being sent to a data warehouse or lake. From the storage location, it can be integrated with downstream tools, such as ad platforms and BI tools. 
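The flow described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Snowplow's implementation: the required fields, the `is_mobile` enrichment and the function names are all hypothetical, and the "warehouse" is just a list.

```python
# Hypothetical set of fields an event must carry to pass validation
REQUIRED_FIELDS = {"event_id", "event_name", "user_id"}

def validate(event):
    """An event is valid if it carries every required field."""
    return REQUIRED_FIELDS.issubset(event.keys())

def enrich(event):
    """Return a copy of the event with a derived field (illustrative enrichment)."""
    enriched = dict(event)
    enriched["is_mobile"] = "Mobile" in event.get("useragent", "")
    return enriched

def run_pipeline(raw_events):
    """Validate, then enrich; route failures to a separate 'bad' list."""
    good, bad = [], []
    for event in raw_events:
        if validate(event):
            good.append(enrich(event))
        else:
            bad.append(event)
    return good, bad

# Made-up raw events
raw = [
    {"event_id": "1", "event_name": "page_view", "user_id": "u1",
     "useragent": "Mozilla/5.0 (iPhone) Mobile Safari"},
    {"event_id": "2", "event_name": "page_view"},  # missing user_id
]

warehouse_rows, failed_events = run_pipeline(raw)
print(len(warehouse_rows), len(failed_events))
```

Keeping failed events in their own list, rather than dropping them, is one way to preserve the auditability discussed earlier: every decision the pipeline makes about an event remains inspectable.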

Snowplow’s alternatives to private SaaS

While private SaaS is generally our recommended deployment approach, we recognize that it isn’t for everyone. We also have a fully open-source product (self hosted) and Snowplow BDP Cloud (which is the ‘transparent’ breed of SaaS tool discussed earlier, as design decisions are not made in a black box). 

More about the author

Phil Western

Product Marketing Manager at Snowplow
