Iglu R7 Treskilling Yellow released
We are pleased to announce a new Iglu release, R7 Treskilling Yellow, bringing the codebase up-to-date and preparing it for some significant releases going forward.
- Iglu Core overhaul for Scala developers
- New linters for igluctl users
- A custom format, date, for JSON Schema v4
- Other updates
- Getting Help
Read on for more information on Release 7 Treskilling Yellow, named after a Swedish postage stamp of which only one example is known to exist:
1. Iglu Core overhaul for Scala developers
In preparation for our planned work on schema versioning and schema inference (RFC coming soon), we have made various changes to our Scala reference implementation of Iglu Core.
The most important change is that the SchemaKey entity can no longer be attached to both the schema and the self-describing instance itself. Now, SchemaKey can only be attached to the self-describing instance, and contains a SchemaVer that can be either Full or Partial, to reflect the fact that we may need to infer the schema’s version.
Meanwhile, schema objects are described by a new SchemaMap. It is similar to SchemaKey, but with an always-explicit SchemaVer, because a schema must always be defined with a definite version.
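To make the split between keys and maps concrete, here is a minimal Scala sketch of the three types. It is a simplified, hypothetical model: only the type names SchemaVer, SchemaKey and SchemaMap, and the idea of a full versus a partial version, come from this post. The fields, the render, toIgluUri and toSchemaKey helpers, and the use of ? for unknown version parts are assumptions for illustration, not Iglu Core's actual API.

```scala
// Hypothetical, simplified model of the R7 Iglu Core types.
// Field names and helpers are illustrative assumptions, not the library's API.

sealed trait SchemaVer
object SchemaVer {
  // A fully known version, e.g. 1-0-0
  final case class Full(model: Int, revision: Int, addition: Int) extends SchemaVer
  // A version whose parts may still be unknown (e.g. while being inferred)
  final case class Partial(model: Option[Int], revision: Option[Int], addition: Option[Int]) extends SchemaVer

  // Render a version, using "?" for unknown parts of a Partial version
  def render(ver: SchemaVer): String = ver match {
    case Full(m, r, a) => s"$m-$r-$a"
    case Partial(m, r, a) =>
      List(m, r, a).map(part => part.fold("?")(_.toString)).mkString("-")
  }
}

// Attached to self-describing data: the version may be Full or Partial
final case class SchemaKey(vendor: String, name: String, format: String, version: SchemaVer) {
  def toIgluUri: String = s"iglu:$vendor/$name/$format/${SchemaVer.render(version)}"
}

// Attached to a schema: the type only admits a Full version
final case class SchemaMap(vendor: String, name: String, format: String, version: SchemaVer.Full) {
  // A schema's coordinates can always be used as a fully versioned key
  def toSchemaKey: SchemaKey = SchemaKey(vendor, name, format, version)
}
```

Because SchemaMap's version field is typed as SchemaVer.Full in this sketch, the compiler rejects a schema with an unknown version, while a SchemaKey on incoming data can carry SchemaVer.Partial(Some(1), None, None), rendered here as 1-?-?.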
Another important change lies in the package structure. Previously, you had to know which type-class instances you needed to import, and also had to mark them implicit yourself. This was confusing for people exploring Iglu Core who just wanted to use its primitives.
So now, all you need to know are these two imports:
This should greatly improve the usability and portability of Iglu Core.
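The general idea behind this layout can be shown with a common Scala pattern: type-class instances are declared implicit once, inside a dedicated object, so callers only need wildcard imports. The Show type class, the Version type and the syntax and implicits objects below are hypothetical names chosen for illustration; they are not the actual Iglu Core imports.

```scala
// Hypothetical illustration of the "one import brings every instance" layout.
// None of these names are Iglu Core's; they only demonstrate the pattern.

// A minimal type class: render a value as a String
trait Show[A] {
  def show(a: A): String
}

final case class Version(model: Int, revision: Int, addition: Int)

// Syntax: lets you call .show on any value with a Show instance in scope
object syntax {
  implicit class ShowOps[A](a: A) {
    def show(implicit instance: Show[A]): String = instance.show(a)
  }
}

// Instances are declared implicit here, once, so users never mark them implicit themselves
object implicits {
  implicit val versionShow: Show[Version] = new Show[Version] {
    def show(v: Version): String = s"${v.model}-${v.revision}-${v.addition}"
  }
}

// Client code: two wildcard imports are all that is needed
import syntax._
import implicits._

val rendered: String = Version(1, 0, 0).show
```

Without the second import, Version(1, 0, 0).show would fail to compile, because no Show[Version] instance would be in implicit scope.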
And lastly, Iglu Scala Core now supports all major Scala versions (2.10, 2.11 and 2.12), the latest stable version of Circe (0.9.0) and the version of Json4s used widely in Snowplow (3.2.11).
2. New linters for igluctl users
Moving on to less dramatic changes to Iglu, this release adds some new linting features to igluctl’s lint command, which should help you avoid subtle mistakes in your schemata.
2.1 Linting custom formats
Let’s say you have a JSON Schema which applies a fictitious camelCase format to certain properties. Although this format might be supported by certain non-standard validators, Iglu does not support it. Our linter now rejects this format, making it explicit that properties will not actually be validated against it.
This also helps with plain old typos. For example, if the schema author mistyped the date-time format as datetime, this linter will pick it up.
This new linting feature is available for the (default) first severity level, meaning it will always run.
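At its core, this check compares each declared format against a whitelist. Below is a minimal Scala sketch under simplifying assumptions. The schema is reduced to a hypothetical map from property name to declared format, and the whitelist is also an assumption: the six formats defined by JSON Schema draft 4, plus date, which this release supports. This is not igluctl's actual list or implementation.

```scala
// Hypothetical sketch of an unknown-format lint, not igluctl's implementation.

// Formats defined by JSON Schema draft 4, plus "date" (assumed supported by this release)
val knownFormats: Set[String] =
  Set("date-time", "email", "hostname", "ipv4", "ipv6", "uri", "date")

// Input: property name -> value of its "format" keyword (simplified schema model)
def lintFormats(formats: Map[String, String]): List[String] =
  formats.toList.sortBy(_._1).collect {
    case (property, format) if !knownFormats.contains(format) =>
      s"Property [$property] has unknown format [$format]"
  }
```

For example, lintFormats(Map("createdAt" -> "datetime", "userName" -> "camelCase", "email" -> "email")) reports the typo on createdAt and the unsupported format on userName, and accepts email.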
2.2 Linting optional fields
Let’s say you have a JSON Schema like the following:
Although this is a valid JSON Schema, at Snowplow we find it better to express “nullability” in a record with an actual null value, instead of omitting that property entirely.
Taking this approach gives us some distinct advantages:
- Schema-derivation tools (like Spark DataFrame’s own) will always end up knowing about the null property even for super-small datasets
- It shows that the developer is aware of the optional property (and has not simply forgotten it)
- Null is more convenient when working with templating languages
This explicit use of null is an opinionated preference, so this check is available only at the third severity level of linting.
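One plausible way to express this preference as a lint rule is to flag every property that is not listed in required but whose type does not include null. Both that rule and the simplified Property model below are assumptions made for illustration, not igluctl's actual code.

```scala
// Hypothetical sketch of an "optional properties must allow null" lint.

// Simplified model of a schema property: its name and the JSON types it allows
final case class Property(name: String, types: Set[String])

def lintOptionalNull(properties: List[Property], required: Set[String]): List[String] =
  properties.collect {
    // Optional (not required) and unable to hold an explicit null
    case Property(name, types) if !required.contains(name) && !types.contains("null") =>
      s"Optional property [$name] does not allow null"
  }
```

Under this rule, a property declared with type ["string", "null"] passes even when it is optional, so producers can always send it, with null standing in for "no value".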
2.3 Linting maxLength
Let’s say you have a JSON Schema like the following:
Again, for most users this might look like a valid JSON Schema.
But if you’re going to use this schema with Redshift, you’ll hit one of Redshift’s limitations: its VARCHAR type is limited to at most 65,535 bytes. This means your data will be silently truncated, which is most likely not what you want.
As this is a common problem in our users’ schemas, we decided to make this available at the default, first severity level.
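Here is a minimal sketch of such a check. It assumes the schema has been reduced to a hypothetical map from string property name to its maxLength, and it flags any value above the 65,535-byte VARCHAR limit mentioned above. This is an illustration, not igluctl's implementation.

```scala
// Hypothetical sketch of a maxLength lint against Redshift's VARCHAR limit.

// Redshift's maximum VARCHAR size, in bytes
val RedshiftMaxVarcharBytes: Int = 65535

// Input: string property name -> its maxLength (simplified schema model)
def lintMaxLength(maxLengths: Map[String, Int]): List[String] =
  maxLengths.toList.sortBy(_._1).collect {
    case (property, maxLength) if maxLength > RedshiftMaxVarcharBytes =>
      s"Property [$property] has maxLength $maxLength, above Redshift's VARCHAR limit of $RedshiftMaxVarcharBytes bytes"
  }
```

Note that JSON Schema's maxLength counts characters, while the Redshift limit is measured in bytes. With multi-byte UTF-8 text, a value can exceed 65,535 bytes even when maxLength is below that number, so a check like this catches the obvious cases but not every possible overflow.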
3. A new “date” custom format
One of the most common missteps in writing JSON Schemas is trying to use a date format, which does not formally exist in JSON Schema v4.
Rather than prohibiting it with our new linters, Mike Robbins of Snowflake Analytics has submitted a PR implementing support for the date format in igluctl’s static generate command. Huge thanks, Mike!
Now, all JSON Schemas with the date format can be transformed into Redshift DDLs with a corresponding DATE column.
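Supporting the date format involves two pieces: checking that values are valid dates, and picking a Redshift column type for the generated DDL. Here is a minimal Scala sketch of both. It assumes date means an RFC 3339 full-date (YYYY-MM-DD), which java.time's LocalDate.parse accepts by default. The redshiftType helper, including its date-time to TIMESTAMP case and VARCHAR fallback, is hypothetical and is not igluctl's actual DDL generator.

```scala
// Hypothetical sketch: validating "date" values and mapping formats to Redshift types.
import java.time.LocalDate
import scala.util.Try

// LocalDate.parse uses ISO_LOCAL_DATE (YYYY-MM-DD) with strict resolution,
// so impossible dates such as 2018-02-30 are rejected
def isValidDate(value: String): Boolean =
  Try(LocalDate.parse(value)).isSuccess

// Choose a Redshift column type for a string property (illustrative mapping only)
def redshiftType(format: Option[String], maxLength: Int): String = format match {
  case Some("date")      => "DATE"
  case Some("date-time") => "TIMESTAMP"
  case _                 => s"VARCHAR($maxLength)"
}
```

Using a dedicated DATE column instead of a VARCHAR keeps the values sortable and comparable as dates in SQL.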
4. Other updates
Treskilling Yellow also fixes an important bug in igluctl that could generate output filenames incompatible with the Snowplow RDB Loader.
Also, we have now added the Iglu Ruby Client to the list of officially supported Iglu clients.
5. Getting help
For more details on this release, as always do check out the release notes on GitHub.
If you have any questions or run into any problems, please visit our Discourse forum.