Snowplow 62 Tropical Parula released
We are pleased to announce the immediate availability of Snowplow 62, Tropical Parula. This release is designed to fix an incompatibility issue between r61’s EmrEtlRunner and some older Elastic Beanstalk configurations. It also includes some other EmrEtlRunner improvements.
Many thanks to Snowplow community member Dani Solà from Simply Business for his contribution to this release!
- Fix to support legacy Beanstalk access logs
- Custom bootstrap actions
- Other improvements to EmrEtlRunner
- Upgrading
- Getting help
1. Fix to support legacy Beanstalk access logs
After the release of r61 Pygmy Parrot, we became aware that the updated file handling code for access logs generated by the Clojure Collector did not work with certain legacy Elastic Beanstalk environments, thus:
- Middle portion of access log filename is tomcat7_rotated: works fine
- Middle portion of access log filename is tomcat8_rotated: works fine
- Middle portion of access log filename is tomcat7: EmrEtlRunner does not move logs to Staging bucket
This is an easy issue to diagnose: if it affects you, then following an upgrade to r61 Pygmy Parrot, your Snowplow pipeline will copy no Clojure Collector access logs to Staging, and thus generate no enriched events.
This issue (#1480) is resolved in this release: EmrEtlRunner now supports all Clojure Collector access log filename formats again.
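To make the three filename variants concrete, here is a minimal Ruby sketch of a check that accepts all of them. It is not the code that ships in EmrEtlRunner: the filename layout in LOG_NAME, the sample filenames and the method name supported_access_log? are assumptions made for illustration only.

```ruby
# Illustrative only: not EmrEtlRunner's actual file-matching code.
# The three middle portions come from the list above.
SUPPORTED_MIDDLE_PORTIONS = %w[tomcat7_rotated tomcat8_rotated tomcat7].freeze

# Assumed (hypothetical) filename layout:
#   _var_log_<middle>_localhost_access_log<suffix>
LOG_NAME = /\A_var_log_(?<middle>.+?)_localhost_access_log/

# Returns true when the filename's middle portion is one of the supported ones.
def supported_access_log?(filename)
  match = LOG_NAME.match(filename)
  !match.nil? && SUPPORTED_MIDDLE_PORTIONS.include?(match[:middle])
end

# Hypothetical filenames, one per variant.
%w[
  _var_log_tomcat7_rotated_localhost_access_log.txt1.gz
  _var_log_tomcat8_rotated_localhost_access_log.txt1.gz
  _var_log_tomcat7_localhost_access_log.txt1.gz
].each do |name|
  puts "#{name}: #{supported_access_log?(name)}"
end
```

Listing the accepted middle portions explicitly, rather than matching on a _rotated suffix, is what keeps the legacy tomcat7 case from being silently skipped.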
2. Custom bootstrap actions
The EmrEtlRunner now has support for adding one or more of your own custom bootstrap actions (#1405). This is particularly useful if you are running your own Hadoop job steps as part of your scheduled jobflow on EMR. Many thanks to Dani Solà for contributing this feature.
You set your custom bootstrap actions in your EmrEtlRunner’s config.yml as an array.
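As a rough illustration of such an array, the Ruby sketch below embeds a fragment of config.yml and reads the bootstrap actions back out. The S3 paths are placeholders, the assumption that each entry is the location of a script is ours, and the helper bootstrap_actions is hypothetical; a real config.yml contains many more properties than shown here.

```ruby
require 'yaml'

# Hypothetical config.yml fragment: the bucket and script names are placeholders.
BOOTSTRAP_EXAMPLE = <<~CONFIG
  :emr:
    :bootstrap:
      - "s3://my-bucket/bootstrap/install-deps.sh"
      - "s3://my-bucket/bootstrap/tune-hadoop.sh"
CONFIG

# Parses the YAML text and returns the list under :emr: -> :bootstrap:.
# Symbol must be permitted because the config keys are Ruby symbols.
def bootstrap_actions(yaml_text)
  config = YAML.safe_load(yaml_text, permitted_classes: [Symbol])
  config[:emr][:bootstrap]
end

bootstrap_actions(BOOTSTRAP_EXAMPLE).each do |script|
  puts "bootstrap action: #{script}"
end
```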
3. EmrEtlRunner improvements
We have made a variety of improvements “under the hood” to EmrEtlRunner:
- EmrEtlRunner now tolerates more exception types in EmrJob’s wait_for (#358). This should reduce the incidence of monitoring failures during EMR runs
- We have bumped the version of Contracts to 0.7 (#1498), and moved include Contracts into classes and modules following best practice (#1438)
- The missing :archive: property has been added into the BucketHash (#1475)
- We have removed time_diff as a dependency because it was no longer used (#1352)
- The failing test in EmrEtlRunner’s test suite is now fixed (#1287), so the suite passes again
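The first item above, tolerating more exception types while waiting on a job, follows a common pattern that can be sketched as below. This is not EmrJob’s real wait_for: the list of tolerated errors, the failure limit and the method name poll_with_tolerance are all assumptions chosen to keep the example self-contained.

```ruby
# Hypothetical sketch: none of these names or values come from EmrEtlRunner.
# Transient errors we choose to tolerate while polling (an assumed list).
TOLERATED_ERRORS = [IOError, SystemCallError].freeze

# Runs the block, retrying when it raises a tolerated error.
# Re-raises once more than max_failures tolerated errors have occurred.
def poll_with_tolerance(max_failures: 3, pause: 0)
  failures = 0
  begin
    yield
  rescue *TOLERATED_ERRORS
    failures += 1
    raise if failures > max_failures
    sleep(pause)
    retry
  end
end

# Usage: a status check that fails twice before succeeding.
attempts = 0
status = poll_with_tolerance do
  attempts += 1
  raise IOError, 'transient failure' if attempts < 3
  :completed
end
puts "status #{status} after #{attempts} attempts"
```

Widening the list of tolerated exception types means a brief monitoring hiccup no longer aborts the run, while the failure limit still surfaces persistent problems.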
4. Upgrading
You need to update EmrEtlRunner to the latest code (0.13.0) on GitHub.
You must also update your EmrEtlRunner’s configuration file, or else you will get a Contract failure on start. See below for details.
Whether or not you use the new bootstrap option, you must update your EmrEtlRunner’s config.yml file to include an entry for it.
In the :emr: section of your EmrEtlRunner’s config.yml file, add in a :bootstrap: property.
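If you want to confirm that your configuration carries the new property before running, a small pre-flight check along the following lines may help. It is a sketch under assumptions: the empty array as the value for "no custom bootstrap actions" is inferred from the array format described earlier, the config fragment is deliberately minimal, and bootstrap_entry_present? is a hypothetical helper rather than part of EmrEtlRunner.

```ruby
require 'yaml'

# Hypothetical minimal fragment; an empty array is assumed to mean
# "no custom bootstrap actions". A real config.yml has many more properties.
UPGRADED_CONFIG = <<~CONFIG
  :emr:
    :bootstrap: []
CONFIG

# Returns true when the :emr: section carries a :bootstrap: array.
def bootstrap_entry_present?(yaml_text)
  config = YAML.safe_load(yaml_text, permitted_classes: [Symbol])
  emr = config.is_a?(Hash) ? config[:emr] : nil
  emr.is_a?(Hash) && emr[:bootstrap].is_a?(Array)
end

puts ":bootstrap: present? #{bootstrap_entry_present?(UPGRADED_CONFIG)}"
```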
For a complete example, see our sample config.yml template.
5. Getting help
For more details on this release, please check out the r62 Tropical Parula Release Notes on GitHub.
If you have any questions or run into any problems, please raise an issue or get in touch with us through the usual channels.