SaaS vs private-SaaS to maximize data ownership and compliance
With regulatory frameworks becoming increasingly strict on the ownership and good governance of data, alternative models to SaaS tools are becoming more prominent.
What is private SaaS?
Many companies are choosing a ‘high governance and control’ approach: reducing their technical dependencies on SaaS tools and opting instead for ‘private SaaS’.
SaaS can be seen as a ‘multi-tenanted environment’ or a public cloud in which the Vendor hosts and manages a data tool for many clients. Private SaaS, on the other hand, means the data pipeline is run in a client’s private storage environment, while ongoing pipeline maintenance is managed by the Vendor.
This model offers a combination of ownership and convenience, while requiring some technical know-how on the part of the Client.
SaaS vs private SaaS vs self-hosted data tools
This article focuses only on Cloud storage locations.
“Self hosted” generally refers to open source products.
| |SaaS|Private SaaS|Self-hosted|
|---|---|---|---|
|Hosting|Vendor public cloud|Client private Cloud|Client private Cloud|
|Choice of storage location (GDPR)|Rare|Yes|Yes|
|Data compliance|Dependent on Vendor|Significant Client control|Full Client control|
|Data governance|No|Full Client control|Full Client control|
|Management costs/overheads|Low|Fairly low|High|
|Black-box decisions made by Vendor|Frequent|No|No|
An example of a private SaaS data pipeline
Critically, a private-SaaS deployment sees the Client’s data move from private digital products, such as apps and websites, to a fully owned storage destination without ever leaving a private cloud environment. This offers complete transparency on the way the data pipeline works, including any logic built into it.
The costs and benefits of private SaaS applications for digital analytics
Private SaaS often brings some technical requirements that SaaS customers don’t need to think about, primarily involving the hosting of the service in their private cloud. Although the service is managed by the Vendor, the Client has to set up permissions and configure their cloud environment correctly in order to get started.
Provided these fairly simple steps are taken correctly, however, private SaaS offers a host of benefits, particularly around ownership and compliance. As the whole infrastructure is owned by the Client, every decision taken about the data, and any associated metadata, can be fully scrutinized (great for monitoring and observability). This means audits can be carried out without hindrance, and decisions traced all the way back to first principles.
The good, the bad and the ugly of SaaS applications
SaaS tools have different degrees of transparency.
A minority are fully open-source with the option of a hosted SaaS solution. These solutions can provide full visibility into what’s going on under the hood whilst offering to host the platform to make things easier for the customer (shameless plug: we’ve just created such a solution called BDP Cloud).
The majority of SaaS tools, however, do not offer this transparency. Larger suites of data tools operate in the Vendor’s cloud and don’t allow the Client any visibility into the inner workings of the data pipeline. These inner workings are, in effect, the value provided and so are proprietary.
The benefits of SaaS
SaaS tools can still be very convenient and deliver a great time to value, frequently offering suites of tools that all seamlessly integrate together.
The most famous example in the world of data is Google. Users of Google Analytics have the advantage of being able to easily integrate with a massive array of tools in the Google ecosystem, such as Pub/Sub for real-time streaming, BigQuery for storage and Data Studio for visualization, not to mention Google Ads, which is very convenient for marketers.
The problems with SaaS
Since Google Analytics is a black-box SaaS tool, it does have its limitations. Software of this type, also including Mixpanel, Segment, and Adobe, means users lack full governance over their data pipelines. This creates downstream issues, with one very topical example being a lack of ‘data sovereignty’ (i.e. users not having the choice of where to store or process their data). This has led to large fines being issued in France, Austria and Denmark for violations of GDPR.
Another example of a black-box limitation common with packaged SaaS tools is the question of how your data is used by the provider. Within Google’s walled garden, for example, your data might be sold as part of their advertising ecosystem. You cannot control how or where this data is used, even with reference to the vague terms of service. The invalidation of the EU-US Privacy Shield has further undermined confidence in this type of arrangement.
A similar issue is found with how metrics are defined, such as session-length or how a ‘user’ is defined, with hidden decisions being baked into the SaaS logic. Is Google’s 30-minute session definition, for example, optimal for all digital products, without exception?
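To make the point about baked-in definitions concrete, here is a minimal sketch of inactivity-based sessionization, in which a new session starts whenever the gap between two consecutive events exceeds a timeout. The function name `count_sessions` and the sample timestamps are hypothetical, and this is a simplified rule for illustration only, not the exact logic used by Google Analytics or any other tool.

```python
from datetime import datetime, timedelta


def count_sessions(timestamps, timeout_minutes):
    """Count sessions for a single user.

    A new session starts whenever the gap since the previous event
    is longer than the inactivity timeout (simplified rule).
    """
    ordered = sorted(timestamps)
    if not ordered:
        return 0
    timeout = timedelta(minutes=timeout_minutes)
    sessions = 1
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > timeout:
            sessions += 1
    return sessions


# Hypothetical events for one user (illustrative data only).
start = datetime(2023, 1, 1, 9, 0)
events = [start + timedelta(minutes=m) for m in (0, 5, 50, 55, 140)]

# The same events yield a different session count per timeout.
for timeout_minutes in (30, 60):
    print(timeout_minutes, count_sessions(events, timeout_minutes))
```

With these sample events, a 30-minute timeout gives three sessions and a 60-minute timeout gives two. When the timeout is fixed inside a black-box tool, a metric like this cannot be tuned to the product it describes.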
Users of SaaS tools should therefore plan for the potential trade-off between convenience and transparency by making a strong business case for their choice of deployment – private SaaS, SaaS or self-hosted.
Snowplow’s approach to private SaaS
Snowplow has always been based on a private-SaaS ethos. We believe strongly that businesses should have full visibility over how their data is managed.
Snowplow manages everything from the automation, deployment and monitoring of pipelines, right up to the scaling and maintenance. Our BDP Enterprise tool is a private SaaS streaming analytics pipeline for behavioral event data. Metrics like ‘time on page’ and ‘session length’ can be tracked with incredible accuracy and used to create data products and applications, such as churn propensity or marketing attribution. Within this, our technical teams support integration with multiple cloud environments and manage integrations with existing customer systems.
“What we offer is a fully managed service, but it’s isolated in a client’s own sub-account. So essentially, what that means is that each client comes to us and gives us their own sub-account, such as their own Google Cloud project, and we set up and maintain a full data pipeline within that. This way every client has their own isolated infrastructure entirely segmented from every other client. Basically, there’s no shared tenancy across anything.” Josh Beemster, Head of Engineering at Snowplow
The main benefits of our private SaaS deployment are:
Snowplow makes sure your data pipeline is set up and tailored to your specific services, using a unique Data Creation approach to tracking.
We can fully customize how we manage instance reservations, security protocols and how we right-size pipelines for your particular traffic patterns.
We can also manage for cost vs latency. For example, if latency is your priority, we can aim to get it down to around 1 second, but equally we can relax that latency target to economize, if that’s the priority. A key question here might be: does the report run once a week or is it constantly updated?
Similarly, we can limit the number of destinations you load to based on projected expenses. No other mainstream provider really allows you to design your pipeline around the optimal way for you to consume information.
Listen to our Head of Engineering explain the challenges involved with offering a private SaaS solution and how we can customize to your needs (please note: this is very technical).
We make absolutely sure that everything works for each client’s security teams, so when they finally sign off on the implementation, they’ve really thought about whether the configuration is up to spec.
- Recording basis for tracking as GDPR contexts
- Optimizing access permissions
- Designing in accordance with security protocols
3. Monitoring and Observability
Due to the fully exposed nature of Snowplow’s design, users can observe any metadata associated with their pipeline using CloudWatch and Stackdriver in a way that is not possible with SaaS tools.
Basically, users can receive the same alerts as our Operations teams, and export these metrics as needed for complete transparency on pipeline status and health.
This example is from Snowplow’s private SaaS tool ‘BDP Enterprise’. Data is collected from various sources with an SDK and sent through different processes such as ‘validate’ and ‘enrich’ in the Client’s private cloud environment, before being sent to a data warehouse or lake. From the storage location, it can be integrated with downstream tools, such as ad platforms and BI tools.
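As a rough illustration of the ‘validate’ and ‘enrich’ stages described above, the following sketch passes raw events through a required-field check and then adds a derived field before they would be loaded to storage. All function names, field names and the sample events are hypothetical. This is not Snowplow’s implementation, only a minimal picture of how pipeline logic can be inspected when it runs in the Client’s own environment.

```python
from urllib.parse import urlparse

# Hypothetical required fields for an event (assumption for illustration).
REQUIRED_FIELDS = ("event_id", "event_name", "timestamp")


def validate(event):
    """Return (is_valid, missing_fields) for one raw event."""
    missing = [field for field in REQUIRED_FIELDS if field not in event]
    return (not missing), missing


def enrich(event):
    """Return a copy of the event with a derived page_host field."""
    enriched = dict(event)
    enriched["page_host"] = urlparse(enriched.get("page_url", "")).netloc
    return enriched


def run_pipeline(raw_events):
    """Split events into enriched good events and rejected bad events."""
    good, bad = [], []
    for event in raw_events:
        is_valid, missing = validate(event)
        if is_valid:
            good.append(enrich(event))
        else:
            # Keep rejected events with the reason, so they can be audited.
            bad.append({"event": event, "missing_fields": missing})
    return good, bad


# Illustrative input: one complete event and one missing a timestamp.
raw = [
    {"event_id": "1", "event_name": "page_view",
     "timestamp": "2023-01-01T09:00:00Z",
     "page_url": "https://example.com/pricing"},
    {"event_id": "2", "event_name": "page_view"},
]
good, bad = run_pipeline(raw)
print(len(good), len(bad))
```

Because rejected events are kept alongside the reason they failed, every decision the pipeline makes stays open to audit.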
Snowplow’s alternatives to private SaaS
While private SaaS is generally our recommended deployment approach, we recognize that it isn’t for everyone. We also have a fully open-source product (self hosted) and Snowplow BDP Cloud (which is the ‘transparent’ breed of SaaS tool discussed earlier, as design decisions are not made in a black box).