Iglu JSON Schema Registry 3 Penny Black released
We are excited to announce the immediate availability of a new version of Iglu, incorporating a release of the Swagger-powered Scala Repo Server. Iglu has existed as a project at Snowplow for over two years now: after a period of relative quiet, we have an ambitious release schedule for Iglu planned for 2016, starting with this release.
To reflect the growing importance of Iglu, and the number of moving parts within the platform, we will be following the Snowplow naming system for Iglu, with a release number plus a codename. The individual components of Iglu (such as Iglu clients and servers) will continue to use semantic versioning.
This is release 3; the codenames for Iglu will be famous postage stamps, starting today with the Penny Black. Read on for more information on Release 3 Penny Black.
- Scala Repo Server
- Elastic Beanstalk deployment
- Using the Scala Repo Server
- Vagrant quickstart
- The future
- Getting help
1. Scala Repo Server
Scala Repo Server is a more powerful alternative to our static schema repository, and its API is a superset of that repository’s API. At the moment it offers the following additional features:
- Authentication: in the static repo, anybody can view all schemas. Scala Repo Server supports both public documents and private documents which require a key to access. Multiple users with separate keys can use the same Scala Repo Server instance. Support for authenticated Iglu repos will be coming to Snowplow soon
- Schema validation: in this server, attempts to upload an invalid schema will be rejected. This is in contrast with the static schema repository, which can hold invalid schemas, leading to errors at schema retrieval time
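To illustrate the kind of check that rejects a schema at upload time rather than at retrieval time, here is a minimal sketch in Python. It is not the server's actual validation logic: the function name, the SchemaVer pattern and the set of checks are assumptions for illustration, based on the self block that self-describing JSON Schemas carry.

```python
import re

# SchemaVer is MODEL-REVISION-ADDITION, e.g. "1-0-0".
# This exact pattern is an assumption of the sketch, not taken from the server.
SCHEMAVER = re.compile(r"^[1-9][0-9]*-[0-9]+-[0-9]+$")
REQUIRED_SELF_KEYS = ("vendor", "name", "format", "version")


def find_schema_problems(schema):
    """Return a list of problems; an empty list means these checks passed."""
    if not isinstance(schema, dict):
        return ["schema must be a JSON object"]
    self_block = schema.get("self")
    if not isinstance(self_block, dict):
        return ["missing 'self' block"]
    # Every identifying field must be present and non-empty.
    problems = [
        f"self.{key} is missing"
        for key in REQUIRED_SELF_KEYS
        if not self_block.get(key)
    ]
    version = self_block.get("version")
    if version and not SCHEMAVER.match(str(version)):
        problems.append(f"self.version {version!r} is not a valid SchemaVer")
    return problems
```

A repository that runs a check like this before storing a document can refuse a bad upload immediately, whereas a static repository would serve the broken file until a client tried to use it.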
Please be aware that the Scala Repo Server remains in “beta” – we continue to recommend S3-based static schema repositories for all production use cases of Iglu, including with Snowplow; there are no plans to move Iglu Central over to the Scala Repo Server at this time.
2. Elastic Beanstalk deployment
Scala Repo Server can now run on AWS Elastic Beanstalk!
Elastic Beanstalk will automatically configure and manage the EC2 instances needed to run the app. Instructions are available on the Setting up Iglu Server on AWS wiki page.
3. Using the Scala Repo Server
Once we have a Scala Repo Server up-and-running, we can start to interact with it. If you browse to the HTTP root (/) of your Scala Repo Server’s API, you will see auto-generated Swagger documentation on all of the available API endpoints.
To start with, we need to create a user who can create schemas for a new vendor prefix:
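As a rough sketch of this step in Python, the helper below builds (but does not send) a key-generation request. The endpoint path /api/auth/keygen, the vendor_prefix parameter, the apikey header, the host name and the function name are all assumptions for illustration; check them against the Swagger documentation served by your own instance.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_keygen_request(base_url, super_api_key, vendor_prefix):
    """Build (but do not send) a request for keys scoped to a vendor prefix."""
    # Assumed endpoint and parameter names; confirm them in the Swagger docs at "/".
    url = base_url.rstrip("/") + "/api/auth/keygen"
    body = urlencode({"vendor_prefix": vendor_prefix}).encode("utf-8")
    return Request(url, data=body, headers={"apikey": super_api_key}, method="POST")


# Hypothetical host and key, for illustration only.
request = build_keygen_request("http://iglu.example.com", "your-super-api-key", "com.acme")
print(request.get_method(), request.full_url)
```

Passing the returned object to urllib.request.urlopen would actually send it; keeping construction separate makes the request easy to inspect first.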
Now let’s grab a schema that we have available and POST it to Iglu:
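A hedged Python sketch of such an upload follows. It assumes the schema is self-describing (it carries a self block with vendor, name, format and version), that schemas live under /api/schemas/{vendor}/{name}/{format}/{version}, and that the key travels in an apikey header; the function name and host are hypothetical, and the path layout should be confirmed in the Swagger documentation.

```python
import json
from urllib.parse import quote
from urllib.request import Request


def build_schema_upload_request(base_url, api_key, schema, is_public=True):
    """Build (but do not send) a POST that uploads a self-describing JSON Schema."""
    meta = schema["self"]
    # Assumed path layout: /api/schemas/{vendor}/{name}/{format}/{version}
    path = "/".join(
        quote(meta[key], safe="") for key in ("vendor", "name", "format", "version")
    )
    flag = "true" if is_public else "false"
    url = f"{base_url.rstrip('/')}/api/schemas/{path}?isPublic={flag}"
    body = json.dumps(schema).encode("utf-8")
    headers = {"apikey": api_key, "Content-Type": "application/json"}
    return Request(url, data=body, headers=headers, method="POST")
```

Deriving the path from the schema's own self block keeps the URL and the document from drifting apart.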
It’s important to note:
- All API operations should be addressed to the
- The isPublic=true flag ensures that our schema is publicly visible – particularly important as Snowplow does not support authenticated repositories yet
The Scala Repo Server should now contain two public JSON Schemas – let’s check:
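One way to sketch such a check in Python is to turn an Iglu schema URI into a lookup request, one schema at a time. The /api/schemas prefix, the optional apikey header, the host and the function name are assumptions here; whether a key is needed to read public schemas is something to confirm in the Swagger documentation.

```python
import re
from urllib.request import Request

# Iglu schema URIs look like iglu:com.acme/ad_click/jsonschema/1-0-0
IGLU_URI = re.compile(r"^iglu:([^/]+)/([^/]+)/([^/]+)/(\d+-\d+-\d+)$")


def build_schema_lookup_request(base_url, iglu_uri, api_key=None):
    """Build (but do not send) a GET for the schema named by an Iglu URI."""
    match = IGLU_URI.match(iglu_uri)
    if match is None:
        raise ValueError(f"not an Iglu schema URI: {iglu_uri}")
    # Assumed path layout: /api/schemas/{vendor}/{name}/{format}/{version}
    url = f"{base_url.rstrip('/')}/api/schemas/" + "/".join(match.groups())
    headers = {"apikey": api_key} if api_key else {}
    return Request(url, headers=headers, method="GET")
```

Sending the request with urllib.request.urlopen and parsing the body as JSON would then show whether the schema is stored as expected.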
Good! We can see both of our schemas are now available.
This has been a whirlwind tour through the new capabilities of the Scala Repo Server. I’d recommend setting up an instance, consulting the Swagger documentation at root (/) and trying out some other commands.
4. Vagrant quickstart
5. The future
We are making it increasingly easy to work with schemas in Snowplow and Iglu. Originally it was necessary to manually write three files:
- The JSON Schema, defining what the event should look like. This is uploaded to an Iglu repo
- The JSON Paths file, used to load JSONs conforming to the schema into Redshift. This is uploaded to S3 and used by Snowplow’s StorageLoader component
- The Redshift table definition DDL, used to create the table into which these JSONs are loaded. This table definition is manually deployed into the Redshift database
Then we started to simplify this process with Schema Guru:
- schema-guru schema can generate a JSON Schema from a corpus of JSONs
- schema-guru ddl can automatically generate the JSON Paths file and Redshift DDL for a given schema
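To give a feel for what deriving a JSON Paths file from a schema involves, here is a deliberately simplified Python sketch. It is not Schema Guru's implementation: it only handles top-level properties, it assumes the event payload sits under a data key, and the function name and the alphabetical ordering are choices made for this example.

```python
import json


def jsonpaths_for(schema):
    """Return a Redshift JSONPaths document with one path per top-level property."""
    properties = schema.get("properties", {})
    # Sort for a stable column order; a real tool's ordering may differ.
    return {"jsonpaths": [f"$.data.{name}" for name in sorted(properties)]}


# Hypothetical schema, for illustration only.
example_schema = {
    "type": "object",
    "properties": {
        "clickId": {"type": "string"},
        "bannerId": {"type": "string"},
    },
}
print(json.dumps(jsonpaths_for(example_schema), indent=2))
```

The Redshift table definition has to list its columns in the same order as these paths, which is why generating both from a single schema avoids a common source of load errors.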
Future Iglu releases will make the process still easier. Scala Repo Server will be able to automatically generate the JSON Paths and Redshift DDL files when a schema is uploaded; these files will be served by Scala Repo Server, so it will no longer be necessary to host them in GitHub and/or S3 separately.
A word of warning: the Iglu Server API is still evolving, so future releases are unlikely to be backward compatible with this one. Please continue to use a static repo for all production use cases, such as with Snowplow.
6. Getting help
Information on setting up the Scala Repo Server is available on the wiki.