Possible Values of the refr_medium Field in Snowplow
Understanding the refr_medium field is essential for accurately classifying traffic sources in Snowplow data. This post details the possible values of the refr_medium field and provides guidance on interpreting and managing these values effectively.
Q: What are the possible values of the refr_medium field?
The refr_medium field can take the following values:
- NULL: Occurs when the page view has no referrer.
- internal: The referrer and the page URL share the same host, or the referring domain is configured as internal.
- social: The referring domain is listed as a social network in the referer-parser YAML file.
- search: The referring domain is listed as a search engine in the referer-parser YAML file.
- email: The referring domain is listed as a webmail provider in the referer-parser YAML file.
- unknown: A referrer is present but does not match any entry in the referer-parser YAML file.
Q: How does Snowplow determine the refr_medium value?
The values are derived from the referer-parser YAML file, which contains predefined lists of domains grouped by medium (e.g., social, search, email).
- internal: Assigned when the referrer and the page URL share the same host, or when the referring domain matches one of the internal domains configured in referer_parser.json (the configuration file for the referer parser enrichment).
- social, search, email: Assigned by looking up the referring domain in the lists in the referer-parser YAML file.
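The order of checks described above can be sketched in Python. This is a simplified illustration, not the enrichment's actual code: the `REFERERS` dictionary is a small hypothetical stand-in for the YAML file, the function names are invented for this example, and the lookup only matches exact hostnames.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical stand-in for the YAML file: medium -> source -> domains.
REFERERS = {
    "search": {"Google": {"domains": ["www.google.com"]}},
    "social": {"Twitter": {"domains": ["twitter.com", "t.co"]}},
    "email": {"Gmail": {"domains": ["mail.google.com"]}},
}


def build_lookup(referers):
    """Flatten the nested structure into a domain -> medium mapping."""
    return {
        domain: medium
        for medium, sources in referers.items()
        for source in sources.values()
        for domain in source["domains"]
    }


def classify_medium(page_url, referrer_url, internal_domains=(), lookup=None) -> Optional[str]:
    """Return the medium for a referrer, or None when there is no referrer."""
    if not referrer_url:
        return None  # stored as NULL in refr_medium
    lookup = build_lookup(REFERERS) if lookup is None else lookup
    refr_host = urlparse(referrer_url).hostname
    page_host = urlparse(page_url).hostname
    # Internal: same host as the page, or explicitly configured as internal.
    if refr_host is not None and (refr_host == page_host or refr_host in internal_domains):
        return "internal"
    # Otherwise fall back to the domain lists, defaulting to "unknown".
    return lookup.get(refr_host, "unknown")
```

For example, `classify_medium("https://shop.example.com/a", "https://t.co/abc")` returns `"social"` with the sample data above, while a referrer on a host that is not listed returns `"unknown"`.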
Q: What if the referrer is classified as unknown?
If the referrer is classified as unknown, it means that the referring domain was not present in the YAML file. To address this:
- Update the YAML file: If the domain belongs under a specific medium, consider submitting a pull request to add it to the referer-parser YAML file.
- Monitor Future Releases: New versions of the referer-parser YAML file may add classifications for common domains.
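Before proposing additions, it helps to know which unclassified hosts actually drive traffic. A minimal sketch, assuming you have exported pairs of `refr_urlhost` and `refr_medium` from atomic.events into Python (the function name `top_unknown_hosts` is hypothetical):

```python
from collections import Counter


def top_unknown_hosts(rows, limit=10):
    """Count referrer hosts whose medium is 'unknown'.

    rows: iterable of (refr_urlhost, refr_medium) pairs.
    Returns a list of (host, count) tuples, most frequent first.
    """
    counts = Counter(
        host for host, medium in rows if medium == "unknown" and host
    )
    return counts.most_common(limit)
```

Hosts near the top of this list are the strongest candidates for a new entry in the YAML file.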
Q: Can the referer YAML file be customized?
Yes, Snowplow users can edit the YAML file to include custom domains or reclassify existing ones. However, note that:
- Custom Versions: In upcoming releases, Snowplow plans to decouple the YAML file from the enrichment pipeline, allowing for more flexible updates without affecting the core pipeline.
- Best Practices: When customizing the file, follow the existing structure to maintain consistency and avoid breaking the parsing logic.
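One way to guard against structural mistakes is to check a customized file before deploying it. The sketch below assumes the file has already been loaded into a Python dictionary shaped as medium, then source name, then a `domains` list, which is how the upstream file is organized to the best of my knowledge. The function `validate_referers` is hypothetical and not part of any Snowplow tooling.

```python
def validate_referers(referers):
    """Return a list of problems found in a medium -> source -> {domains: [...]} mapping."""
    problems = []
    seen = {}  # domain -> "medium/source" where it was first listed
    for medium, sources in referers.items():
        if not isinstance(sources, dict):
            problems.append(f"{medium}: expected a mapping of sources")
            continue
        for source, spec in sources.items():
            domains = spec.get("domains") if isinstance(spec, dict) else None
            if not isinstance(domains, list) or not domains:
                problems.append(f"{medium}/{source}: missing non-empty 'domains' list")
                continue
            for domain in domains:
                if domain in seen:
                    problems.append(
                        f"{domain}: listed under both {seen[domain]} and {medium}/{source}"
                    )
                else:
                    seen[domain] = f"{medium}/{source}"
    return problems
```

An empty result means the custom entries follow the expected shape. A domain listed under two sources is reported because it would make the classification ambiguous.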
Q: How can I query the refr_medium values in Redshift?
To analyze the distribution of refr_medium values in the atomic.events table, run the following query:
-- Count events per referrer medium, most common first
SELECT refr_medium, COUNT(*) AS event_count
FROM atomic.events
GROUP BY refr_medium
ORDER BY event_count DESC;
Final Thoughts
The refr_medium field is a vital component for tracking traffic sources in Snowplow. By understanding how these values are determined and how to handle unknown values, Snowplow users can maintain accurate traffic classification and ensure comprehensive data coverage.