Snowplow R115 with minor updates to EmrEtlRunner and Event Manifest Populator


We are pleased to release Snowplow R115 Sigiriya, named after the ancient rock fortress in Sri Lanka. This release includes two updates to EmrEtlRunner and one to the Event Manifest Populator (used for cold-start deduplication).
1. Updates to EmrEtlRunner
1.1. Bug fix to the function displaying failures
While we were improving the reliability of EmrEtlRunner in R114, a bug was introduced whereby a step could fail without its error message appearing in the logs.
This is now fixed, and we have also improved EmrEtlRunner's code quality by adding more unit tests.
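To illustrate the kind of guarantee this fix restores, here is a minimal sketch of a failure-reporting function that never drops a failed step's reason. It is not EmrEtlRunner's actual code (EmrEtlRunner is written in Ruby); `StepStatus` and `describe_failures` are hypothetical names, and only the EMR step state names come from the EMR API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Terminal EMR step states that indicate the step did not succeed.
FAILED_STATES = {"FAILED", "CANCELLED", "INTERRUPTED"}


@dataclass
class StepStatus:
    """Hypothetical, simplified snapshot of an EMR step's final status."""
    name: str
    state: str
    failure_message: Optional[str] = None
    log_file: Optional[str] = None


def describe_failures(steps: List[StepStatus]) -> List[str]:
    """Build one log line per unsuccessful step, always including a reason."""
    lines = []
    for step in steps:
        if step.state not in FAILED_STATES:
            continue
        # Fall back to an explicit placeholder instead of silently dropping the reason.
        reason = step.failure_message or "no failure message reported by EMR"
        line = f"Step '{step.name}' ended in state {step.state}: {reason}"
        if step.log_file:
            line += f" (logs: {step.log_file})"
        lines.append(line)
    return lines
```

The explicit placeholder means a missing message shows up as such in the logs, rather than as a failure with no explanation at all.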
1.2. Step failure on transient EMR cluster
Continuing our effort to improve the reliability of EmrEtlRunner, we have updated it so that it now fails if an EMR step cannot be successfully submitted to a transient EMR cluster.
This was already the behavior for a standard EMR cluster.
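The fail-fast behavior described above can be sketched as follows. This is a simplified model, not EmrEtlRunner's implementation: `submit_steps`, `StepSubmissionError`, and the `submit` callback are hypothetical stand-ins for whatever actually adds a step to the cluster.

```python
from typing import Callable, List


class StepSubmissionError(RuntimeError):
    """Raised when a step could not be added to the EMR cluster."""


def submit_steps(step_names: List[str], submit: Callable[[str], str]) -> List[str]:
    """Submit steps in order and abort on the first failed submission.

    `submit` is a hypothetical callback that returns the new step's ID and
    raises an exception if the cluster rejects the step.
    """
    step_ids = []
    for name in step_names:
        try:
            step_ids.append(submit(name))
        except Exception as err:
            # Fail fast instead of carrying on with an incomplete pipeline.
            raise StepSubmissionError(f"could not submit step '{name}': {err}") from err
    return step_ids
```

Raising on the first rejected submission (and chaining the original exception with `from err`) keeps the root cause visible, whichever type of cluster the steps were destined for.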
2. Event Manifest Populator
To solve the “cold start” problem for cross-batch deduplication in the RDB Shredder, we developed the Event Manifest Populator.
It was originally built to read events emitted by spark-enrich as its input; it can now also read events emitted by stream-enrich.
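Why pre-populating the manifest matters can be shown with a toy, in-memory model. This sketch assumes a manifest keyed by event ID and event fingerprint that records the ETL timestamp of the batch that first wrote each entry; the real manifest is a DynamoDB table, and all names below are hypothetical. It also does not model the difference between spark-enrich and stream-enrich input.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class EnrichedEvent:
    event_id: str
    event_fingerprint: str
    etl_tstamp: str


class InMemoryManifest:
    """Toy stand-in for the event manifest, keyed by (event_id, fingerprint)."""

    def __init__(self) -> None:
        self._items: Dict[Tuple[str, str], str] = {}

    def put_if_absent(self, event: EnrichedEvent) -> bool:
        """Record the event; return True if it is new or was first seen in this batch."""
        key = (event.event_id, event.event_fingerprint)
        # setdefault keeps the ETL timestamp of whichever batch wrote the key first.
        first_seen = self._items.setdefault(key, event.etl_tstamp)
        return first_seen == event.etl_tstamp


def populate(manifest: InMemoryManifest, historical: Iterable[EnrichedEvent]) -> int:
    """Cold start: seed the manifest with events from earlier batches; return how many were read."""
    count = 0
    for event in historical:
        manifest.put_if_absent(event)
        count += 1
    return count
```

Without the seeding step, an event that was already processed in a batch from before deduplication was enabled would look new to the shredder; after `populate`, a later `put_if_absent` for that event returns `False` and it can be treated as a cross-batch duplicate.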
3. Upgrading
The new version of EmrEtlRunner with improved reliability is available in our Bintray.
4. Roadmap
Two Snowplow releases are currently in progress:
- R116 Madara Rider: this release will mainly add features to the Scala Stream Collector, such as custom path mappings, support for TLS port binding and certificates, and the ability to use multiple cookie domains.
- R117 Morgantina: this release will incorporate the new bad row format discussed in the dedicated RFC.
Stay tuned for announcements of more upcoming Snowplow releases soon!
5. Getting help
For more details on this release, please check out the release notes on GitHub.
If you have any questions or run into any problems, please visit our Discourse forum.