Snowplow R98 Argentomagus released


We are pleased to announce the release of Snowplow R98 Argentomagus. This realtime pipeline release brings some critical security and data quality improvements, new Scala Stream Collector capabilities, and support for the four webhooks introduced in R97 Knossos.
The new features for the Scala Stream Collector were driven by community member Rick Bolkey from OneSpot – huge thanks, Rick!
Read on for more information on R98 Argentomagus, named after [the ancient Roman city located in central France][argentomagus]:
- Stream Enrich: better timestamp validation
- Scala Stream Collector: configurable Flash cross-domain policy
- Other Scala Stream Collector improvements
- Upgrading
- Roadmap
- Getting help
1. Stream Enrich: better timestamp validation
Prior to this release, both the realtime and batch enrichment processes would let nonsensical timestamps, such as `22017-11-28 10:01:36`, through. However, those events would then fail to load into the database of your choice.
With Argentomagus, our realtime enrichment process will now reject those events, which will be routed to the “bad rows” event stream.
This data quality improvement will make its way to the batch pipeline in the next release.
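To illustrate the kind of check involved, here is a minimal Scala sketch – not the actual Stream Enrich code – that only accepts timestamps which both parse and carry a plausible year (the input format, object and method names are our own):

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

import scala.util.Try

// Minimal sketch assuming a "yyyy-MM-dd HH:mm:ss" input format; this is an
// illustration, not the actual Stream Enrich validation code.
object TimestampCheck {
  private val Formatter = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss")

  /** Accepts only timestamps that parse and whose year is plausible. */
  def validate(raw: String): Either[String, LocalDateTime] =
    Try(LocalDateTime.parse(raw, Formatter)).toEither
      .left.map(e => s"Could not parse timestamp '$raw': ${e.getMessage}")
      .filterOrElse(
        ts => ts.getYear >= 0 && ts.getYear <= 9999,
        s"Timestamp '$raw' has an implausible year"
      )
}

// A value such as "22017-11-28 10:01:36" either fails to parse or resolves to
// a five-digit year, so it lands on the Left and would be routed to bad rows.
```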
2. Scala Stream Collector: configurable Flash cross-domain policy
On the security side of things, we have made the Scala Stream Collector’s Flash cross-domain policy configurable.
First, what is a Flash cross-domain policy? Quoting the Adobe website:
A cross-domain policy file is an XML document that grants a web client, such as Adobe Flash Player or Adobe Acrobat (though not necessarily limited to these), permission to handle data across domains. When clients request content hosted on a particular source domain and that content makes requests directed towards a domain other than its own, the remote domain needs to host a cross-domain policy file that grants access to the source domain, allowing the client to continue the transaction.
To allow a Flash media player hosted on another web server to access content from the Adobe Media Server web server, we require a crossdomain.xml file. A typical use case will be HTTP streaming (VOD or Live) to a Flash Player. The crossdomain.xml file grants a web client the required permission to handle data across multiple domains.
A cross-domain policy file gives the necessary permissions when, for example, you are trying to make a request to a Snowplow collector from a Flash game given that both are running on different hosts.
Until this release, the Scala Stream Collector embedded a very permissive cross-domain policy file, granting access to any domain and not enforcing HTTPS:
```
<?xml version="1.0"?>
<cross-domain-policy>
  <allow-access-from domain="*" secure="false" />
</cross-domain-policy>
```
With R98, we are removing the `/crossdomain.xml` route entirely by default – it will have to be manually re-enabled by adding the following `crossDomain` section to the configuration:
```
collector {
  # ...
  crossDomain {
    # Domain that is granted access, *.acme.com will match http://acme.com and http://sub.acme.com
    domain = "*"
    # Whether to only grant access to HTTPS or both HTTPS and HTTP sources
    secure = true
  }
}
```
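To make the mapping from configuration to the served policy concrete, here is a small illustrative Scala sketch; the `CrossDomainConfig` case class and `render` method are our own names, not the collector’s internals:

```scala
// Illustrative sketch only: mirrors the crossDomain settings shown above.
final case class CrossDomainConfig(domain: String, secure: Boolean)

object CrossDomainPolicy {
  /** Renders the XML body that would be served on /crossdomain.xml. */
  def render(config: CrossDomainConfig): String =
    s"""<?xml version="1.0"?>
       |<cross-domain-policy>
       |  <allow-access-from domain="${config.domain}" secure="${config.secure}" />
       |</cross-domain-policy>""".stripMargin
}

// render(CrossDomainConfig(domain = "*", secure = true)) produces a policy
// that matches any domain but only grants access to HTTPS sources.
```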
3. Other Scala Stream Collector improvements
Rick Bolkey from OneSpot has contributed a whole suite of improvements to the Scala Stream Collector – much appreciated, Rick.
3.1 URL redirect replacement macro
This new feature lets you scan the URL you are redirecting to for a placeholder and replace it with the `network_userid`. This is a powerful tool for cookie matching, aka “cookie sync”: sharing your Snowplow third-party cookie IDs with an ad platform or similar.
As an example, let’s say you’ve enabled this feature by adding the following to your configuration:
```
collector {
  # ...
  redirectMacro {
    enabled = true
    placeholder = "[TOKEN]"
  }
}
```
And you’re making a redirect request to:
```
http://your-collector-endpoint/r/tp2?u=http%3A%2F%2Fexample.com%3Fnuid%3D[TOKEN]
```
The redirect will point to:
```
http://example.com?nuid=123
```
where `123` is the `network_userid`.
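Conceptually, the macro is a straightforward placeholder substitution. Here is a minimal Scala sketch of the idea, using illustrative names rather than the collector’s actual internals:

```scala
// Sketch of the redirect macro idea: substitute the configured placeholder
// in the (already URL-decoded) redirect target with the network_userid.
object RedirectMacro {
  def expand(targetUrl: String, placeholder: String, networkUserId: String): String =
    targetUrl.replace(placeholder, networkUserId)
}

// expand("http://example.com?nuid=[TOKEN]", "[TOKEN]", "123")
// returns "http://example.com?nuid=123"
```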
3.2 Preserving the HTTP scheme when leveraging cookie bounce
In Snowplow R93 Virunum, we introduced cookie bounce. The limitation of this feature was that, when running Scala Stream Collectors behind a load balancer, redirects would lose the original request’s scheme and `http` would always be assumed.
Now the collector can read the original scheme from a header set by your load balancer and use it in the redirect, thanks to the following configuration:
```
collector {
  # ...
  cookieBounce {
    # ...
    forwardedProtocolHeader = "X-Forwarded-Proto"
  }
}
```
Note that for AWS Classic ELB, the original request’s scheme is contained in the `X-Forwarded-Proto` header; your load balancer may use a different header.
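The scheme selection itself boils down to preferring the forwarded header when it is present. A hedged Scala sketch of that logic, with names of our own choosing:

```scala
// Sketch of the scheme-selection logic; assumes the header name comes from
// the cookieBounce.forwardedProtocolHeader setting shown above.
object SchemeResolver {
  def resolve(
    requestScheme: String,                  // scheme seen by the collector itself
    headers: Map[String, String],           // incoming request headers
    forwardedProtocolHeader: Option[String] // e.g. Some("X-Forwarded-Proto")
  ): String =
    forwardedProtocolHeader
      .flatMap(headers.get)                 // value set by the load balancer
      .getOrElse(requestScheme)             // fall back to what was seen directly
}

// Behind an AWS Classic ELB terminating TLS, the headers would contain
// "X-Forwarded-Proto" -> "https", so the bounce redirect keeps https.
```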
3.3 Bypassing Akka-HTTP partial URL decoding of redirects
When issuing redirects, the Scala Stream Collector relied on the built-in `Location` header provided by Akka HTTP, the HTTP server library underlying the collector. However, if the redirect target contained a URL as a query parameter, that URL would be partially decoded and could no longer be resolved. This has been fixed in Argentomagus.
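For context, Akka HTTP’s modeled `Location` header wraps a parsed `Uri`, and re-rendering that `Uri` can change the percent-encoding of a URL nested in the query string. One way to side-step this, shown purely as an illustration and not necessarily the collector’s exact fix, is to emit the redirect target verbatim as a raw header:

```scala
import akka.http.scaladsl.model.{HttpResponse, StatusCodes}
import akka.http.scaladsl.model.headers.RawHeader

// Illustration only: send the redirect target back exactly as received,
// bypassing the re-encoding performed by the modeled Location header.
object RawRedirect {
  def redirectTo(target: String): HttpResponse =
    HttpResponse(
      status  = StatusCodes.Found,
      headers = List(RawHeader("Location", target))
    )
}
```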
4. Upgrading
The realtime applications for R98 Argentomagus are available at the following locations:
```
http://dl.bintray.com/snowplow/snowplow-generic/snowplow_scala_stream_collector_0.12.0.zip
http://dl.bintray.com/snowplow/snowplow-generic/snowplow_stream_enrich_0.13.0.zip
```
Docker images for those new artifacts will follow shortly.
5. Roadmap
Upcoming Snowplow releases will include:
- R99 [BAT] GDPR support, the first wave of GDPR features being added to Snowplow
- R9x [STR] GCP support, which will let you run the Snowplow realtime pipeline on Google Cloud Platform
- R9x [BAT] Priority fixes, the release analogous to this one for the batch pipeline
6. Getting help
For more details on this release, please check out the release notes on GitHub.
If you have any questions or run into any problems, please visit our Discourse forum.