Snowplow, the global leader in customer data infrastructure (CDI) for AI, enables every organization to own and unlock the value of its customer behavioral data to fuel AI-driven marketing, digital products and services, customer experiences, and fraud mitigation.
1. New option to lint schemas to a higher standard
Snowplow users define JSON Schemas for event and context types, and then use Igluctl to auto-generate the associated Redshift table definition using the igluctl static generate command.
Often, a JSON Schema is entirely valid, yet not precise enough to fully determine the corresponding Redshift table definition. Two examples:
1a. Determining the correct numeric type in Redshift
If you have a schema that defines a numeric field e.g.
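For illustration, a minimal self-describing schema of this shape (the vendor, event name, and field name are placeholders):

```json
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Schema for an example event",
  "self": {
    "vendor": "com.example_company",
    "name": "example_event",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {
    "example_integer_field": {
      "type": "integer"
    }
  },
  "additionalProperties": false
}
```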
When generating the associated Redshift table definition, which Redshift numeric type should be assigned to the example_integer_field? Redshift supports three integer types:
smallint, with a range from -32768 to +32767
integer, with a range from -2147483648 to +2147483647
bigint, with a range from -9223372036854775808 to +9223372036854775807
The existing version of the JSON Schema doesn't contain enough information for Igluctl to determine which of the above Redshift types to use.
Now, if you lint the above schema with the default severity level it will pass, because it is a valid JSON Schema:
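For example, assuming the schema is stored at the standard Iglu directory layout (the path below is illustrative):

```bash
$ igluctl lint schemas/com.example_company/example_event/jsonschema/1-0-0
```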
However, if you lint the above schema with the increased severity level 2 it will fail because the schema under-determines the associated Redshift table definition:
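The stricter check is requested with the --severityLevel option (option name as introduced in this release; the schema path is illustrative):

```bash
$ igluctl lint --severityLevel 2 schemas/com.example_company/example_event/jsonschema/1-0-0
```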
If we now update the schema to include the minimum and maximum properties:
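A sketch of the updated field definition, assuming bounds that fit within Redshift's smallint range:

```json
"example_integer_field": {
  "type": "integer",
  "minimum": 0,
  "maximum": 10000
}
```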
Linting the schema with the higher severity level now works:
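Re-running the stricter lint against the updated schema (the --severityLevel option and schema path are illustrative of this release):

```bash
$ igluctl lint --severityLevel 2 schemas/com.example_company/example_event/jsonschema/1-0-0
```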
Now when we use Igluctl to generate our Redshift DDL we can see that Igluctl has correctly set the corresponding Redshift column type to smallint:
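A sketch of the resulting table definition (the table name follows Igluctl's vendor_name_model convention; the standard Snowplow columns that Igluctl also emits are omitted here for brevity):

```sql
CREATE TABLE atomic.com_example_company_example_event_1 (
    "example_integer_field" SMALLINT
);
```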
1b. Determining the correct string types in Redshift
The same issue of a JSON Schema field definition under-determining the associated Redshift column type occurs for string fields. If we have the following schema, for example:
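For illustration, a fragment defining a string field with no length constraint (the field name is a placeholder):

```json
{
  "type": "object",
  "properties": {
    "example_string_field": {
      "type": "string"
    }
  },
  "additionalProperties": false
}
```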
It is clear that the column type for the example_string_field should be VARCHAR. However, there is nothing to indicate how long the field should be. As a result, the schema under-determines the associated Redshift DDL, and linting the schema with the increased severity level will fail:
If we update the field definition to include a maxLength property:
    {
      "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
      "description": "Schema for an example event",
      "self": {
        "vendor": "com.example_company",
        "name": "example_event",
        "format": "jsonschema",
        "version": "1-0-1"
      },
      "type": "object",
      "properties": {
        "exampleNumericField": {
          "type": "integer",
          "minimum": 0,
          "maximum": 10000
        },
        "exampleStringField": {
          "type": "string",
          "maxLength": 100
        }
      },
      "minProperties": 1,
      "additionalProperties": false
    }
The schema now validates at the higher severity level:
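Assuming the updated schema is saved as version 1-0-1 (the option name and path are illustrative of this release):

```bash
$ igluctl lint --severityLevel 2 schemas/com.example_company/example_event/jsonschema/1-0-1
```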
Now Igluctl generates the associated Redshift table DDL with the correct field length:
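A sketch of the DDL generated from the 1-0-1 schema (Igluctl converts camelCase property names to snake_case columns; the standard Snowplow columns are omitted here):

```sql
CREATE TABLE atomic.com_example_company_example_event_1 (
    "example_numeric_field" SMALLINT,
    "example_string_field"  VARCHAR(100)
);
```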
2. Publish schemas and JSON Path files to S3
Previously Igluctl enabled users to publish schemas stored locally to a remote Iglu registry using the igluctl static push command.
However, users who wanted to publish schemas to S3-backed static registries, or publish JSON Path files to S3 so they can be used to load event and context data into Redshift, had to use another tool to do so (most commonly Amazon's excellent AWS CLI).
Igluctl now has a new command, s3cp, for copying local files to S3. This means that you can publish JSON Schemas to S3-backed static registries:
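For example (the bucket name is a placeholder, and the exact flags should be checked against the command's own help output):

```bash
$ igluctl static s3cp schemas my-iglu-schemas-bucket
```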
and publish JSON Path files to S3:
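Similarly for JSON Path files (the bucket name and --s3path prefix are placeholders):

```bash
$ igluctl static s3cp jsonpaths my-jsonpaths-bucket --s3path jsonpaths
```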
3. Other updates
The updated Igluctl includes a number of other small but important updates: