Kinesis S3 0.3.0 released


We are pleased to announce the release of Kinesis S3 version 0.3.0. This release greatly improves the speed, efficiency, and reliability of Snowplow’s real-time S3 sink for Kinesis streams.
Table of contents:
- Embedded Snowplow tracking
- Optimization and efficiency
- More informative bad rows
- Improved Vagrant VM
- Other changes
- Upgrading
- Getting help
1. Embedded Snowplow tracking
This release adds the ability to record Snowplow events from within the sink application itself. These events include a heartbeat, sent every 5 minutes so we know that the application is still alive and kicking; an event for each failure to push events to Kinesis streams or S3; and initialization/shutdown events.
Using Snowplow to monitor the performance of the Kinesis S3 application is of course optional.
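To make the event types above concrete, here is a minimal Python sketch of a monitor that emits them. It is not the Kinesis S3 implementation: the class name `SinkMonitor`, the `emit` callback and the event names are hypothetical, and only the 5-minute heartbeat interval comes from this post.

```python
import time
from typing import Callable, Optional

# The post states that a heartbeat is sent every 5 minutes.
HEARTBEAT_INTERVAL_SECONDS = 5 * 60


class SinkMonitor:
    """Hypothetical monitor; event names and API are illustrative only."""

    def __init__(
        self,
        emit: Callable[[str, float], None],
        clock: Callable[[], float] = time.monotonic,
        interval: float = HEARTBEAT_INTERVAL_SECONDS,
    ) -> None:
        self._emit = emit          # stand-in for "send a Snowplow event"
        self._clock = clock        # injectable so the logic can be tested
        self._interval = interval
        self._last_heartbeat: Optional[float] = None

    def start(self) -> None:
        """Emit the initialization event and start the heartbeat timer."""
        now = self._clock()
        self._emit("initialization", now)
        self._last_heartbeat = now

    def tick(self) -> None:
        """Call regularly; emits a heartbeat once the interval has elapsed."""
        now = self._clock()
        if self._last_heartbeat is None:
            return  # not started yet
        if now - self._last_heartbeat >= self._interval:
            self._emit("heartbeat", now)
            self._last_heartbeat = now

    def record_failure(self) -> None:
        """Emit one event per failed write to Kinesis or S3."""
        self._emit("failure", self._clock())

    def stop(self) -> None:
        """Emit the shutdown event."""
        self._emit("shutdown", self._clock())
```

Injecting the clock keeps the timing logic independent of real time, so the heartbeat behaviour can be checked without waiting five minutes.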
2. Optimization and efficiency
Release 0.3.0 also addresses a design flaw which was causing excessive heap consumption. Essentially, events were being stored twice, as a String and as an Array[Byte], which at large volumes could result in the application running out of usable memory. An easy fix was just to increase the heap size available to the application, or to reduce the maximum number of events buffered before flushing.
However, we are not fans of workarounds, and this fix should result in much lower memory usage, even under very high volumes. That said, you should still give the app a decent amount of heap to play with! See ticket #32 for more information.
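As a rough illustration of why storing each event twice matters, the Python sketch below compares the footprint of a buffer that keeps only the raw bytes with one that also keeps a decoded text copy. This is an analogy only: the function name `buffer_footprint` is hypothetical, the figures come from CPython's `sys.getsizeof` rather than from the JVM, and the real application's data structures are not shown in this post.

```python
import sys
from typing import Iterable


def buffer_footprint(events: Iterable[bytes], keep_text_copy: bool) -> int:
    """Approximate bytes held by a buffer of events (CPython-specific).

    With keep_text_copy=True, each event is counted twice: once as the raw
    bytes and once as a decoded string, mirroring the double storage
    described above.
    """
    total = 0
    for raw in events:
        total += sys.getsizeof(raw)
        if keep_text_copy:
            total += sys.getsizeof(raw.decode("utf-8"))
    return total


if __name__ == "__main__":
    # 10,000 synthetic events of 1 KB each (made-up data for illustration).
    sample = [b"x" * 1024 for _ in range(10_000)]
    single = buffer_footprint(sample, keep_text_copy=False)
    double = buffer_footprint(sample, keep_text_copy=True)
    print(f"bytes only:   {single:,}")
    print(f"bytes + text: {double:,}")
```

Under these assumptions the second figure comes out at roughly twice the first, which is why dropping the duplicate copy helps more than raising the heap limit.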
3. More informative bad rows
Kinesis S3 can emit bad rows corresponding to failed events. These bad rows have a line field, containing the body of the failed event, and an errors field, containing a non-empty list of problems with the event. This release adds a timestamp field containing the time at which the event failed. This makes it easier to monitor the progress of applications which consume failed events; it also makes it easier to query these bad rows in Elasticsearch.
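The following Python sketch shows what a bad row with these three fields might look like, and how a consumer could use the timestamp to pick up only recent failures. The field names line, errors and timestamp come from this post; the JSON encoding, the ISO 8601 timestamp format, the use of plain strings for errors, and the helper names `make_bad_row` and `failed_since` are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from typing import Iterable, Iterator, List, Optional


def make_bad_row(line: str, errors: List[str],
                 failed_at: Optional[datetime] = None) -> str:
    """Serialize a bad row; the timestamp format is an assumption."""
    if not errors:
        raise ValueError("errors must be a non-empty list")
    if failed_at is None:
        failed_at = datetime.now(timezone.utc)
    return json.dumps({
        "line": line,                         # body of the failed event
        "errors": list(errors),               # problems with the event
        "timestamp": failed_at.isoformat(),   # when the event failed
    })


def failed_since(bad_rows: Iterable[str], cutoff: datetime) -> Iterator[dict]:
    """Yield parsed bad rows that failed at or after the cutoff."""
    for raw in bad_rows:
        row = json.loads(raw)
        if datetime.fromisoformat(row["timestamp"]) >= cutoff:
            yield row
```

A consumer that remembers the timestamp of the last bad row it processed can pass that value as the cutoff on its next run, which is one way the new field helps track progress.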
4. Improved Vagrant VM
Building the Snowplow apps using sbt assembly in the Vagrant virtual machine involves reading a lot of files. To speed up this process, we have added comments to the project’s Vagrantfile indicating how to use NFS and how to allow the VM to use multiple cores.
5. Other changes
We have also:
- Unified the logging configuration so that you can control both the application and KCL logging level (#19)
- Made Kinesis S3 exit immediately if the bad stream does not exist, rather than waiting until the first bad event (#18)
6. Upgrading
The Kinesis S3 application is now available as a single zip file here:
http://bintray.com/artifact/download/snowplow/snowplow-generic/kinesis_s3_0.3.0.zip
Upgrading will require various configuration changes to the application’s HOCON configuration file:
- Add the following for logging control for this application:
logging { level: "" }
- If you want to enable Snowplow tracking for this application, append the following:
monitoring {
  snowplow {
    collector-uri: ""
    collector-port: 80
    app-id: ""
    method: "GET"
  }
}
Note that this section is optional: if you do not want Snowplow tracking to occur, do not add it to your configuration file.
And that’s it – you should now be fully upgraded!
7. Getting help
For more details on this release, please check out the Kinesis S3 0.3.0 release on GitHub.
If you have any questions or run into any problems, please raise an issue or get in touch with us through the usual channels.