Snowplow 0.9.9 released with campaign attribution enrichment
We are pleased to announce the release of Snowplow 0.9.9. This is primarily a comprehensive bug fix release, although it also adds the new `campaign_attribution` enrichment to our enrichment registry. Here are the sections after the fold:
- The campaign_attribution enrichment
- Clojure Collector fixes
- StorageLoader fixes
- EmrEtlRunner fixes and enhancements
- Hadoop Enrich fixes and enhancements
- Upgrading
- Documentation and help
1. The campaign_attribution enrichment
Snowplow has five fields relating to campaign attribution: `mkt_medium`, `mkt_source`, `mkt_term`, `mkt_content`, and `mkt_campaign`. In previous versions of Snowplow, the values of these fields were based on the corresponding five `utm_` fields supported by Google for manual campaign tagging.
The new `campaign_attribution` enrichment allows you to alter this behavior. For each of the five fields, you can specify an array of querystring fields to check for the appropriate value.
This is the configuration to use if you want to duplicate the functionality of previous Snowplow versions, populating the campaign fields from the standard `utm_` querystring parameters:
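A sketch of that configuration file follows; the schema URI and camelCased field names are based on the self-describing JSON format used by Snowplow's configurable enrichments, so check the example enrichment files in the repository if in doubt:

```json
{
  "schema": "iglu:com.snowplowanalytics.snowplow/campaign_attribution/jsonschema/1-0-0",
  "data": {
    "name": "campaign_attribution",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "mapping": "static",
      "fields": {
        "mktMedium": ["utm_medium"],
        "mktSource": ["utm_source"],
        "mktTerm": ["utm_term"],
        "mktContent": ["utm_content"],
        "mktCampaign": ["utm_campaign"]
      }
    }
  }
}
```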
The JSON has the same format as the JSONs for the other enrichments: static `name` and `vendor` fields, an `enabled` field which can be used to turn the enrichment off, and a `parameters` field containing data specific to the enrichment:
- The `mapping` field must, for now, have the value `"static"`. See the roadmap discussion below for an explanation of our plans for this field.
- The `fields` field matches each of the five Snowplow `mkt_` fields with the list of querystring fields from which it should be populated.
With the above configuration, if the querystring contained
`...&utm_content=logolink&utm_source=google&utm_medium=cpc&utm_term=shoes&utm_campaign=april_sale...`
then the fields would be populated like this:
| Field | Value |
|---|---|
| `mkt_medium` | `"cpc"` |
| `mkt_source` | `"google"` |
| `mkt_term` | `"shoes"` |
| `mkt_content` | `"logolink"` |
| `mkt_campaign` | `"april_sale"` |
You can have more than one querystring field in each array:
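For instance, a `parameters` object along these lines would make `mkt_medium` fall back from `utm_medium` to a plain `medium` parameter (a sketch; `medium` and `cid` here are just hypothetical alternate parameter names):

```json
{
  "mapping": "static",
  "fields": {
    "mktMedium": ["utm_medium", "medium"],
    "mktSource": ["utm_source"],
    "mktTerm": ["utm_term"],
    "mktContent": ["utm_content"],
    "mktCampaign": ["utm_campaign", "cid"]
  }
}
```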
The first field name found takes precedence. In this example, if there is a `utm_medium` field in the querystring, its value will be used as the `mkt_medium`; otherwise, if there is a `medium` field in the querystring, its value will be used; otherwise, the `mkt_medium` field will be `null`.
We plan on extending the `campaign_attribution` enrichment to also extract the advert's click ID, if found (#1073). This will serve as a good basis for more granular campaign analytics.
We have also sketched out a potential option to set the `"mapping"` field to `"script"` to enable JavaScript scripting support (#436). This would allow the use of more complex custom transformations to extract campaign attribution values from the querystring.
2. Clojure Collector fixes
We have fixed a pair of bugs which caused issues with the IP addresses recorded by the Clojure Collector, especially when running in a VPC with multiple nodes. The tickets are here:
- Fixed regression in log record format caused by #854 (#992)
- Correctly handles multiple IPs in X-Forwarded-For (#970)
Thank you for your patience while we resolved these issues. We have had the updated version in test with various community members, and everything now seems to be functioning correctly.
3. StorageLoader fixes
There was an issue (#1012) where the StorageLoader was attempting to fetch JSON Path files from the main Snowplow Hosted Assets bucket, which is in `eu-west-1`. For users trying to load shredded JSONs into a Redshift instance in another region, the `COPY FROM JSON` was failing, because JSON Path files must be in the same region as the target table.
We have fixed this by mirroring all of our hosted assets (including JSON Path files) to per-region buckets (`s3://snowplow-hosted-assets-us-east-1` etc). StorageLoader now chooses the correct Snowplow Hosted Assets bucket based on the region of the target Redshift database.
4. EmrEtlRunner fixes and enhancements
We have resolved two issues which should make running EmrEtlRunner smoother:
- We fixed a regression with `--process-enrich`; thanks to community member Rob Kingston for spotting this (#1089)
- If there are no rows to process, EmrEtlRunner now correctly returns a 0 status code at the command line, not a 1 as before (#1018)
To make EmrEtlRunner more robust in scenarios where it is run very frequently (e.g. every hour), we have added checks that the `:enriched:good` and `:shredded:good` folders are empty before starting jobflow steps that would write additional data to them. Please see issue #1124 for more details on this.
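The folders in question are the ones your `config.yml` points to under `:enriched:` and `:shredded:`; a sketch, with hypothetical bucket paths:

```yaml
:s3:
  :buckets:
    :enriched:
      :good: s3://my-snowplow-data/enriched/good   # checked to be empty before the run
    :shredded:
      :good: s3://my-snowplow-data/shredded/good   # checked to be empty before the run
```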
5. Hadoop Enrich fixes and enhancements
0.9.9 fixes a bug in how Snowplow's Hadoop Enrichment process validates an incoming (i.e. tracker-generated) `event_id` UUID. According to the specification, UUIDs containing capital letters are valid on read, but our validation was rejecting them. This release fixes the bug by downcasing all incoming UUIDs.
This release also now supports trackers sending in the original client's useragent via the `&ua=` parameter (new in the Snowplow Tracker Protocol). This is useful for situations where your tracker does not reflect the true source of the event, e.g. with the Ruby Tracker reporting a user's checkout event in Rails.
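For example (hypothetical host and values), a server-side tracker could attach the end user's browser useragent to the collector request, URL-encoded, alongside the event itself:

```
http://collector.acme.com/i?e=se&se_ca=checkout&ua=Mozilla%2F5.0%20(Macintosh%3B%20Intel%20Mac%20OS%20X%2010_9)
```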
Finally, this version of the Hadoop Enrichment process introduces some more robust handling of numeric field validation (#570 and #1062).
6. Upgrading
You need to update EmrEtlRunner and StorageLoader to the latest code (0.9.2 and 0.3.3 respectively) on GitHub:
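The usual steps look like this (a sketch; `0.9.9` is assumed to be the Git tag for this release):

```bash
git clone git://github.com/snowplow/snowplow.git
cd snowplow
git checkout 0.9.9
# EmrEtlRunner 0.9.2
cd 3-enrich/emr-etl-runner
bundle install --deployment
# StorageLoader 0.3.3
cd ../../4-storage/storage-loader
bundle install --deployment
```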
This release bumps the Hadoop Enrichment process to version 0.8.0.
In your EmrEtlRunner's `config.yml` file, update your Hadoop enrich job's version to 0.8.0, like so:
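Assuming the standard `config.yml` layout, the relevant setting is:

```yaml
:snowplow:
  :hadoop_enrich: 0.8.0   # Version of the Hadoop Enrichment process
```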
For a complete example, see our sample `config.yml` template.
If you upgrade Hadoop Enrich to version 0.8.0 as above, you MUST also follow these steps, or else campaign attribution will be disabled.
To use the new enrichment, add a `campaign_attribution.json` file containing a `campaign_attribution` enrichment JSON to your enrichments directory. Note that the previously automatic behavior of populating the `mkt_` fields from the `utm_` querystring fields no longer happens by default; to reproduce it, use the Google-style manual tagging configuration shown in section 1 above.
This release bumps the Clojure Collector to version 0.8.0.
To upgrade to this release:
- Download the new warfile by right-clicking on this link and selecting “Save As…”
- Log in to your Amazon Elastic Beanstalk console
- Browse to your Clojure Collector’s application
- Click the “Upload New Version” button and upload your warfile
7. Documentation and help
Documentation relating to enrichments is available on the wiki.
As always, if you do run into any issues or don’t understand any of the above changes, please raise an issue or get in touch with us via the usual channels.