We are pleased to announce the release of Kinesis S3 version 0.4.0. Many thanks to Kacper Bielecki from Avari for his contribution to this release!
1. gzip support
Kinesis S3 now supports gzip as a second storage/compression option for the files it writes out to S3. Using this format, each record is treated as a byte array containing a UTF-8 encoded string (whether CSV, JSON or TSV). The records are then written to files as strings, one record per line, and the files are gzipped.
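To make the file layout concrete, here is a minimal Python sketch of the format described above: UTF-8 records written one per line into a gzipped payload. It is an illustration only, not the Kinesis S3 implementation; the function names `write_gzip_batch` and `read_gzip_batch` are hypothetical.

```python
import gzip
import io


def write_gzip_batch(records):
    """Write byte-array records as UTF-8 lines and gzip the result.

    Hypothetical helper: illustrates the file layout only.
    """
    buffer = io.BytesIO()
    with gzip.GzipFile(fileobj=buffer, mode="wb") as gz:
        for record in records:
            # Each record is a byte array holding a UTF-8 encoded string.
            line = record.decode("utf-8")
            # One record per line.
            gz.write((line + "\n").encode("utf-8"))
    return buffer.getvalue()


def read_gzip_batch(payload):
    """Inverse of write_gzip_batch: return the records as strings."""
    text = gzip.decompress(payload).decode("utf-8")
    # Every record ends with "\n", so drop the trailing empty element.
    return text.split("\n")[:-1]
```

A file produced this way can be inspected with any standard gzip tool, since each line is simply one record.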
Big thanks go to Kacper Bielecki for contributing this storage option! For more information please see Kacper’s pull request.
Snowplow users please note: you must continue to use the LZO format for storing raw Snowplow events.
2. Infinite loops
During the recent Amazon S3 outage in us-east-1, we discovered an issue where Kinesis S3 was unable to recover its connection to S3 even after the service was restored. This resulted in an infinite loop of failed attempts to PUT records into S3. To fix this, we had to manually restart all Kinesis S3 instances.
To prevent this recurring, Kinesis S3 now supports a failure timeout: if failures extend beyond this timeout, then Kinesis S3 will self-terminate. You can specify this timeout, in milliseconds, in the sink.s3 section of the configuration file.
This feature can be neatly coupled with an automated restart wrapper to ensure that the application will recover without human intervention.
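The failure-timeout behaviour can be sketched roughly as follows. This is a simplified Python illustration under stated assumptions, not the actual Kinesis S3 code: the class `FailureTimeout`, the function `put_with_timeout`, and the retry delay are all hypothetical names and values chosen for the example.

```python
import sys
import time


class FailureTimeout:
    """Tracks how long an unbroken run of failures has lasted (hypothetical)."""

    def __init__(self, max_timeout_ms, clock=time.monotonic):
        self.max_timeout_ms = max_timeout_ms
        self.clock = clock
        self.first_failure_at = None  # None while writes are succeeding

    def record_success(self):
        # Any success ends the current run of failures.
        self.first_failure_at = None

    def record_failure(self):
        """Return True once failures have lasted longer than the timeout."""
        now = self.clock()
        if self.first_failure_at is None:
            self.first_failure_at = now
        elapsed_ms = (now - self.first_failure_at) * 1000
        return elapsed_ms > self.max_timeout_ms


def put_with_timeout(put, batch, tracker, retry_delay_s=1.0, sleep=time.sleep):
    """Retry put(batch) until it succeeds or the failure timeout is exceeded."""
    while True:
        try:
            put(batch)
            tracker.record_success()
            return
        except Exception:
            if tracker.record_failure():
                # Self-terminate so an external restart wrapper can take over.
                sys.exit(1)
            sleep(retry_delay_s)
```

Exiting with a non-zero status is what lets a supervisor or restart wrapper notice the failure and start a fresh process with a new S3 connection.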
3. Safer record batching
In the previous release post we discussed potential out-of-memory problems for this application. To improve things further we have implemented a new configuration option, max-records, which specifies how many records the application is allowed to read per GetRecords call. This helps prevent the application from exceeding the heap during sudden traffic spikes.
Unless you are experiencing out-of-memory issues, we recommend using the default of 10000. Please note that 10000 is, for the moment, also the maximum setting. If it is set any higher, an InvalidArgumentException will be thrown.
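If you generate or template your configuration, it can help to check the value before deploying. The following Python sketch is a hypothetical pre-flight check, not part of Kinesis S3: `validate_max_records` is an invented name, `ValueError` stands in for the InvalidArgumentException mentioned above, and the lower bound of 1 is an assumption based on the documented range of the Kinesis GetRecords limit.

```python
MAX_RECORDS_LIMIT = 10000  # default and current maximum, per the text above


def validate_max_records(max_records=MAX_RECORDS_LIMIT):
    """Return max_records if it lies within 1..10000, else raise ValueError.

    Hypothetical helper; ValueError stands in for InvalidArgumentException.
    """
    if not 1 <= max_records <= MAX_RECORDS_LIMIT:
        raise ValueError(
            "max-records must be between 1 and %d, got %d"
            % (MAX_RECORDS_LIMIT, max_records)
        )
    return max_records
```

Lowering the value below the default trades throughput per call for a smaller worst-case batch held in memory.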
4. Bug fixes
We have also:
- Fixed a bug where the Snowplow Tracker was using the wrong event type for
- Added logging for OutOfMemoryErrors so that they are easier to debug in the future (#29)
5. Upgrading
The Kinesis S3 application is available in a single zip file here:
Upgrading will require various configuration changes to the application’s HOCON configuration file:
- In the sink.kinesis.in section, set max-records to configure how many records you want the application to get at any one time
- In the sink.s3 section, select either lzo or gzip to control what format files are written in
- In the sink.s3 section, enter the maximum failure timeout in ms for the application
And that’s it – you should now be fully upgraded!
6. Getting help
For more details on this release, please check out the Kinesis S3 0.4.0 release on GitHub.
If you have any questions or run into any problems, please raise an issue or get in touch with us through the usual channels.