Iglu release 2 with a new RESTful schema server

We are pleased to announce the second release of Iglu, our machine-readable schema repository system for JSON Schema. If you are not familiar with what Iglu is, please read the blog post for the initial release of Iglu.
Iglu release 2 introduces a new Scala-based repository server, allowing users to publish, test and serve schemas via an easy-to-use RESTful interface. This is a huge step forward compared to our current approach, which involves uploading schemas to a static website on Amazon S3. The new Scala repository server is version 0.1.0.
In this post, we will cover the following aspects of the new repository server:
- The schema service
- Schema validation and the validation service
- API authentication
- Running your own server
- Documentation and support
1. The schema service
Our new Scala repository server takes the form of a RESTful API containing various services, the most important of which is the schema service. It lets you interact with schemas via simple HTTP requests.
1.1 POST requests
Use a POST request to the schema service to publish new schemas to your repository.
For example, let’s say you own the com.acme prefix (the details regarding owning a vendor prefix will be covered in the API authentication section) and you have a JSON schema defined as follows:
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Schema for an Acme Inc ad click event",
  "self": {
    "vendor": "com.acme",
    "name": "ad_click",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {
    "clickId": { "type": "string" },
    "targetUrl": { "type": "string", "minLength": 1 }
  },
  "required": ["targetUrl"],
  "additionalProperties": false
}
Adding this schema to your own registry of JSON schemas is as simple as making a POST request following this URL pattern:
HOST/api/schemas/vendor/name/format/version
You have three options to pass the schema you want to add:
- through the request body
- through a form entry named schema
- through a query parameter named schema
Additionally, you can add an isPublic parameter which takes the value true or false depending on whether or not you want to make your schema available to others (it defaults to false if not specified).
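For scripting, the same publish request can be assembled in Python. This is a minimal sketch using only the standard library: the helper name build_publish_request is hypothetical, and sending isPublic as a query parameter alongside a JSON request body is an assumption based on the options listed above, not confirmed server behaviour. It reads the path segments from the schema's own self object and builds the request without sending it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_publish_request(host, api_key, schema, is_public=False, method="POST"):
    """Build (but do not send) a request that publishes a self-describing schema.

    Hypothetical helper. Assumption: isPublic is accepted as a query parameter
    while the schema itself travels in the request body.
    """
    self_desc = schema["self"]
    # HOST/api/schemas/vendor/name/format/version, taken from the "self" object
    path = "/api/schemas/{vendor}/{name}/{format}/{version}".format(**self_desc)
    query = urlencode({"isPublic": "true" if is_public else "false"})
    return Request(
        url=f"{host.rstrip('/')}{path}?{query}",
        data=json.dumps(schema).encode("utf-8"),  # the schema goes in the body
        headers={"api_key": api_key, "Content-Type": "application/json"},
        method=method,  # "PUT" reuses the same shape to correct or upsert a schema
    )
```

To actually publish, pass the returned object to urllib.request.urlopen.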
With our example, if we wanted to keep our schema private and pass the schema through the request body, it would be:
curl HOST/api/schemas/com.acme/ad_click/jsonschema/1-0-0 -X POST -H "api_key: your_api_key" -d '{ "your": "json" }'
Or, if we wanted to make our schema public and pass the schema through a query parameter (curl's -G flag appends the data to the URL as a query string instead of sending it in the request body):
curl -G -X POST HOST/api/schemas/com.acme/ad_click/jsonschema/1-0-0 -H "api_key: your_api_key" --data-urlencode 'schema={ "your": "json" }' -d "isPublic=true"
Once the request is processed, you should receive a JSON response like this one:
{ "status": 201, "message": "Schema successfully added", "location": "/api/schemas/com.acme/ad_click/jsonschema/1-0-0" }
1.2 PUT requests
Let’s say you have made a mistake in your initial schema which you would like to correct. You can correct it with a PUT request following this URL pattern:
HOST/api/schemas/vendor/name/format/version
You can pass the new schema as you would for a POST request (request body, query parameter or form data). You can also specify an isPublic parameter if you would like to change the visibility of your schema (from private to public or vice versa).
As an example:
curl HOST/api/schemas/com.acme/ad_click/jsonschema/1-0-0 -X PUT -H "api_key: your_api_key" -d '{ "your": "new json" }'
You can also create a schema through a PUT request if it doesn’t already exist.
1.3 Single GET requests
As soon as your schema is added to the repository, you can retrieve it by making a GET request:
curl HOST/api/schemas/com.acme/ad_click/jsonschema/1-0-0 -X GET -H "api_key: your_api_key"
The JSON response should look like this:
{
  "$schema": "http://iglucentral.com/schemas/com.snowplowanalytics.self-desc/schema/jsonschema/1-0-0#",
  "description": "Schema for an Acme Inc ad click event",
  "self": {
    "vendor": "com.acme",
    "name": "ad_click",
    "format": "jsonschema",
    "version": "1-0-0"
  },
  "type": "object",
  "properties": {
    "clickId": { "type": "string" },
    "targetUrl": { "type": "string", "minLength": 1 }
  },
  "required": ["targetUrl"],
  "additionalProperties": false,
  "metadata": {
    "location": "/api/schemas/com.acme/ad_click/jsonschema/1-0-0",
    "createdAt": "08/19/2014 12:51:15",
    "updatedAt": "08/22/2014 17:22:02",
    "permissions": {
      "read": "private",
      "write": "private"
    }
  }
}
As you might have noticed, the repository server inserts some metadata into the schema. One important thing to note is the permissions object, which contains the read/write authorizations of this specific schema. In particular, the read field contains the value public if your schema is public, or private if your schema is private. The write field contains private if you have write access to this schema, or none if you do not, according to your API key’s permissions.
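Client code can turn these two fields into booleans. The sketch below is a hypothetical helper (read_permissions is not part of Iglu) that parses a response body shaped like the example above and interprets the values exactly as just described.

```python
import json


def read_permissions(schema_json):
    """Return (is_public, can_write) from the metadata the server adds to a schema.

    Hypothetical helper; field values follow the description in this post.
    """
    perms = json.loads(schema_json)["metadata"]["permissions"]
    is_public = perms["read"] == "public"
    # "private" means your API key may write to this schema; "none" means it may not
    can_write = perms["write"] == "private"
    return is_public, can_write
```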
If you do not need to retrieve the schema itself and just want to check its metadata, you can send a GET request to:
HOST/api/schemas/vendor/name/format/version?filter=metadata
For example:
curl "HOST/api/schemas/com.acme/ad_click/jsonschema/1-0-0?filter=metadata" -X GET -H "api_key: your_api_key"
To get back:
{ "vendor": "com.acme", "name": "ad_click", "format": "jsonschema", "version": "1-0-0", "metadata": { "location": "/api/schemas/com.acme/ad_click/jsonschema/1-0-0", "createdAt": "08/19/2014 12:51:15", "updatedAt": "08/22/2014 17:22:02", "permissions": { "read": "private", "write": "private" } } }
1.4 Multiple GET requests
If you need to retrieve multiple schemas in a single GET request, you can do so in a few different ways:
Vendor-based requests
You can retrieve every schema belonging to a vendor (if you own it):
HOST/api/schemas/vendor
curl HOST/api/schemas/com.acme -X GET -H "api_key: your_api_key"
You will get back an array of every schema belonging to this vendor.
You can also retrieve every schema from multiple vendors using a comma-separated list of vendors:
HOST/api/schemas/vendor1,vendor2
curl HOST/api/schemas/com.acme,uk.co.acme -X GET -H "api_key: your_api_key"
As you might expect, you will get back an array of every schema belonging to com.acme or uk.co.acme.
Please note: if you do not own those vendors, you will still be able to make these requests but you will only retrieve public schemas (if any).
Name-based requests
Using the same approach, you can get every version of every format of a schema:
HOST/api/schemas/vendor/name
curl HOST/api/schemas/com.acme/ad_click -X GET -H "api_key: your_api_key"
Or retrieve every version of every format of multiple schemas:
HOST/api/schemas/vendor/name1,name2
curl HOST/api/schemas/com.acme/ad_click,ad_impression,ad_conversion -X GET -H "api_key: your_api_key"
Format-based requests
The same concept applies when you want to retrieve every version of a schema in a given format (e.g. JSON Schema):
HOST/api/schemas/vendor/name/format
curl HOST/api/schemas/com.acme/ad_click/jsonschema -X GET -H "api_key: your_api_key"
And if you want to retrieve every version of a schema in multiple formats:
HOST/api/schemas/vendor/name/format1,format2
curl HOST/api/schemas/com.acme/ad_click/jsonschema,jsontable -X GET -H "api_key: your_api_key"
Version-based requests
If you need a single version of a schema, this is simply the single GET request covered above, but you can also retrieve multiple versions:
HOST/api/schemas/vendor/name/format/version1,version2
curl HOST/api/schemas/com.acme/ad_click/jsonschema/1-0-0,1-0-1 -X GET -H "api_key: your_api_key"
Combinations
You can also combine those URLs to satisfy your needs. I will give a few examples in this section.
If you want to retrieve two specific versions of two different schemas:
curl HOST/api/schemas/com.acme/ad_click,link_click/jsonschema/1-0-0,1-0-1 -X GET -H "api_key: your_api_key"
If you want to retrieve a specific version of a specific schema in two different formats:
curl HOST/api/schemas/com.acme/ad_click/jsonschema,jsontable/1-0-0 -X GET -H "api_key: your_api_key"
Or let’s say you want to compare your schema with that of a company which has made its schema public:
curl HOST/api/schemas/com.snowplowanalytics.snowplow,com.acme/ad_click/jsonschema/1-0-0 -X GET -H "api_key: your_api_key"
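Because each level of the path (vendor, name, format, version) accepts a comma-separated list, these combined URLs can be built mechanically. The following Python sketch uses a hypothetical helper, multi_get_url, that only builds the URL and does not contact a server; it also rejects combinations that skip a level, such as versions without formats.

```python
def multi_get_url(host, vendors, names=(), formats=(), versions=()):
    """Build a multiple-GET URL where each path level is a comma-separated list.

    Hypothetical helper: levels must be filled left to right.
    """
    if not vendors:
        raise ValueError("at least one vendor is required")
    levels = [vendors, names, formats, versions]
    segments = []
    for i, values in enumerate(levels):
        if not values:
            # e.g. versions were given but formats were not
            if any(levels[i + 1:]):
                raise ValueError("a deeper path level was given without this one")
            break
        segments.append(",".join(values))
    return f"{host.rstrip('/')}/api/schemas/" + "/".join(segments)


# Two specific versions of two different schemas:
# multi_get_url("HOST", ["com.acme"], ["ad_click", "link_click"],
#               ["jsonschema"], ["1-0-0", "1-0-1"])
```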
Retrieving multiple disjoint schemas
You can retrieve multiple schemas which are completely independent of each other like so:
HOST/api/schemas/vendor1/name1/format1/version1,vendor2/name2/format2/version2
curl HOST/api/schemas/com.snowplowanalytics.snowplow/ad_click/jsonschema/1-0-0,com.acme/ad_click/jsonschema/1-0-0 -X GET -H "api_key: your_api_key"
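The disjoint form is a comma-separated list of complete vendor/name/format/version paths. A minimal sketch with a hypothetical helper, disjoint_get_url, that builds such a URL from tuples:

```python
def disjoint_get_url(host, schema_keys):
    """Build a GET URL for independent schemas.

    Hypothetical helper; schema_keys is a list of (vendor, name, format, version) tuples.
    """
    parts = []
    for key in schema_keys:
        if len(key) != 4:
            raise ValueError(f"expected (vendor, name, format, version), got {key!r}")
        parts.append("/".join(key))
    return f"{host.rstrip('/')}/api/schemas/" + ",".join(parts)
```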
Public schemas
You can also retrieve a list of every single public schema with this endpoint:
HOST/api/schemas/public
curl HOST/api/schemas/public -X GET -H "api_key: your_api_key"
Metadata filter
You can add a filter=metadata query parameter to any of the previous types of URLs if you do not need the whole schemas.
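If you build these URLs programmatically, the filter can be appended with the standard library's URL tools. A small sketch with a hypothetical helper, with_metadata_filter, that keeps any existing query parameters and replaces a previous filter value:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_metadata_filter(url):
    """Return url with filter=metadata set, keeping any other query parameters."""
    parts = urlsplit(url)
    # drop any existing "filter" parameter, then add filter=metadata
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "filter"]
    query.append(("filter", "metadata"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```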
1.5 Swagger support
We have added Swagger support to our API so you can explore the repository server’s API interactively.
The Swagger UI is available at the root URL of your repository server.
You will have to enter your API key in the form on the top right of the page. Once this is done, you are free to explore the API using the Swagger UI.
2. Schema validation and the validation service
2.1 Schema validation when adding a schema
One thing you may have noticed is that every schema you add to the repository must be self-describing (please read the post on self-describing JSON Schemas if you are not familiar with the concept). This essentially means that your schema must have a self property which itself contains the following properties: vendor, name, format and version.
If your schema is not self-describing you will get back this JSON response when trying to add it to the repository:
{ "status": 400, "message": "The schema provided is not a valid self-describing schema", "report": { ... } }
The report object will contain the full validation failure message for you to analyze.
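Before calling the server, you can run a cheap local pre-check for the most common mistake: a missing or incomplete self object. This is a hypothetical helper that only checks the four properties described above; the server validates against the full self-describing meta-schema, so passing this check does not guarantee the schema will be accepted.

```python
REQUIRED_SELF_FIELDS = ("vendor", "name", "format", "version")


def self_describing_errors(schema):
    """List reasons a parsed schema (a dict) is not self-describing; [] means OK.

    Hypothetical pre-check: only inspects the "self" object, not the full meta-schema.
    """
    self_desc = schema.get("self")
    if not isinstance(self_desc, dict):
        return ["missing 'self' object"]
    return [
        f"'self' is missing string property '{field}'"
        for field in REQUIRED_SELF_FIELDS
        if not isinstance(self_desc.get(field), str)
    ]
```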
2.2 The validation service
As well as validating that a schema is self-describing when adding it to the repository, we also provide a validation service which lets you:
- Check that a schema is valid without adding it to the repository
- Validate an instance against its schema
For example, if you want to make sure that your schema is a valid self-describing JSON Schema before adding it to the repository:
HOST/api/schemas/validate/format?schema={ "some": "schema" }
curl -G HOST/api/schemas/validate/jsonschema -H "api_key: your_api_key" --data-urlencode 'schema={ "schema": "to be validated" }'
The schema query parameter contains the schema you want to validate.
For now, only the jsonschema format is supported, but additional schema formats such as JSON Table and Avro will be supported in the future.
As with a POST request, if the validation fails you will receive the following response:
{ "status": 400, "message": "The schema provided is not a valid self-describing schema", "report": { ... } }
The report object contains the full validation failure message.
If the validation succeeds, you should get back something like:
{ "status": 200, "message": "The schema provided is a valid self-describing schema" }
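Because the schema travels in the query string, it has to be URL-encoded. A minimal Python sketch with a hypothetical helper, schema_validation_url, that serializes a schema and builds the validation URL; it rejects formats other than jsonschema since that is the only one supported in this release.

```python
import json
from urllib.parse import urlencode


def schema_validation_url(host, schema, fmt="jsonschema"):
    """Build the GET URL asking the validation service to check a schema.

    Hypothetical helper; the schema dict is JSON-serialized and URL-encoded.
    """
    if fmt != "jsonschema":
        raise ValueError("only the jsonschema format is supported for now")
    query = urlencode({"schema": json.dumps(schema)})
    return f"{host.rstrip('/')}/api/schemas/validate/{fmt}?{query}"
```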
You can also validate an instance against its schema:
HOST/api/schemas/validate/vendor/name/format/version?instance={ "some": "instance" }
curl -G HOST/api/schemas/validate/com.acme/ad_click/jsonschema/1-0-0 -H "api_key: your_api_key" --data-urlencode 'instance={ "instance": "to be validated" }'
Here, the path indicates the schema to validate against, and the instance query parameter contains the instance to be validated.
Similarly to validating a schema, you will receive the following JSON if the instance is not valid against the schema:
{ "status": 400, "message": "The instance provided is not valid against the schema", "report": { ... } }
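To make concrete what "not valid against the schema" means for the com.acme ad_click example, here is a hand-written Python check that mirrors that one schema's rules (targetUrl required with minLength 1, both properties strings, no additional properties). It is an illustration only, not a general JSON Schema validator and not how the server performs validation; the function name ad_click_errors is hypothetical.

```python
def ad_click_errors(instance):
    """Hand-written check mirroring com.acme/ad_click/jsonschema/1-0-0.

    Illustration only: re-implements this single schema's rules by hand.
    """
    if not isinstance(instance, dict):
        return ["instance must be an object"]  # "type": "object"
    errors = []
    extra = sorted(set(instance) - {"clickId", "targetUrl"})
    if extra:  # "additionalProperties": false
        errors.append(f"additional properties not allowed: {extra}")
    if "targetUrl" not in instance:  # "required": ["targetUrl"]
        errors.append("missing required property 'targetUrl'")
    elif not isinstance(instance["targetUrl"], str) or len(instance["targetUrl"]) < 1:
        errors.append("'targetUrl' must be a non-empty string")  # "minLength": 1
    if "clickId" in instance and not isinstance(instance["clickId"], str):
        errors.append("'clickId' must be a string")
    return errors
```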
The validation service is also accessible through the Swagger UI.
3. API authentication
To restrict access to schemas, we have implemented an API key-based authentication system. The administrator of your Iglu repository server can generate a pair of API keys (one with read access and one with read-and-write access) for any given vendor prefix. Users of the repository server will need to provide one of these API keys with each request through an api_key HTTP header, as shown in the previous examples.
For example, let’s say you work for Acme Inc, and so the administrator of the Iglu repository you are using gives you a pair of keys for the com.acme vendor prefix.
One of these API keys has read access and consequently lets you retrieve schemas through GET requests. The other has both read and write access, so you will be able to publish and modify schemas through POST and PUT requests in addition to retrieving them. It is then up to you to distribute those two keys however you want. These keys grant you access to every schema whose vendor starts with com.acme.
As a concrete example, let’s say you request API keys from your administrator and she sends you this pair of API keys:
- 663ee2a1-98a2-4a85-a05b-20f343e4961d for read access
- 86da37e8-fdac-406a-8c71-3ae964e75882 for both read and write access
Using the second API key you will be able to create schemas, as long as the vendor starts with com.acme:
curl HOST/api/schemas/com.acme.project1/ad_click/jsonschema/1-0-0 -X POST -H "api_key: 86da37e8-fdac-406a-8c71-3ae964e75882" -d '{ "your": "json" }'
And you will be able to retrieve this schema with either one of those API keys:
curl HOST/api/schemas/com.acme.project1/ad_click/jsonschema/1-0-0 -X GET -H "api_key: 663ee2a1-98a2-4a85-a05b-20f343e4961d"
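The access rules described in this section can be summarized as a small decision function. This is a sketch of the rules as stated in this post, not the server's actual code: the permission labels "read" and "write" and the function name key_allows are hypothetical, the prefix test is a plain "starts with" as described above, and public schemas are treated as readable by any key, as noted in the multiple GET requests section.

```python
def key_allows(key_permission, key_vendor_prefix, http_method, schema_vendor,
               schema_is_public=False):
    """Sketch of the access rule described in this post (hypothetical labels).

    key_permission is "read" or "write"; a "write" key also has read access.
    """
    if http_method == "GET" and schema_is_public:
        return True  # public schemas can be read even without owning the vendor
    if not schema_vendor.startswith(key_vendor_prefix):
        return False
    if http_method == "GET":
        return key_permission in ("read", "write")
    if http_method in ("POST", "PUT"):
        return key_permission == "write"
    return False
```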
4. Running your own server
Running your own Iglu repository server lets you publish, test and serve schemas in support of applications like Snowplow and others. Running your own repository server requires a few steps which will be detailed here.
4.1 Installing the executable jarfile
You have two options to get the jarfile:
1. Download the server jarfile directly
To get a pre-built copy, you can download the jarfile from Iglu Hosted Assets.
2. Compile it from source
Alternatively, you can compile it yourself by cloning the Iglu repo:
git clone https://github.com/snowplow/iglu.git
Navigating to the Scala repository server folder:
cd 2-repositories/scala-repo-server/
And finally, building the jarfile with SBT:
sbt assembly
The jarfile will be saved as iglu-server-0.1.0 in the target/scala-2.10 subdirectory.
4.2 Configuring the server
To configure your server you will have to download a copy of our sample application.conf file and fill in the appropriate values.
The Scala repository server uses PostgreSQL to store all schemas and related data. Assuming that you already have a PostgreSQL instance available, modify your application.conf file with your PostgreSQL connection details:
- postgres.host
- postgres.port
- postgres.dbname
- postgres.username
- postgres.password
You can also modify the HTTP server settings repo-server.interface and repo-server.port to fit your needs.
4.3 Launching the server
Once your application.conf file is filled in properly, you can launch the server and the necessary tables (apikeys and schemas) will be created automatically:
java -Dconfig.file=/path/to/your/application.conf -jar iglu-server-0.1.0 com.snowplowanalytics.iglu.server.Boot
4.4 The super API key
Once the server is launched, you will still need to add a super API key manually to the database. This API key will be used to generate your clients’ API keys.
insert into apikeys (uid, vendor_prefix, permission, createdat) values ('an-uuid', '.', 'super', current_timestamp);
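The 'an-uuid' placeholder should be replaced by a freshly generated UUID. A minimal Python sketch (the helper name super_key_insert is hypothetical) that generates one and returns a parameterized version of the statement above, using the %s placeholder style of PostgreSQL drivers such as psycopg2, so it can be run with cursor.execute(sql, params):

```python
import uuid


def super_key_insert():
    """Return (sql, params) for inserting a new super API key.

    Hypothetical helper; mirrors the SQL statement above with %s placeholders.
    """
    key = str(uuid.uuid4())  # random version-4 UUID used as the API key
    sql = ("insert into apikeys (uid, vendor_prefix, permission, createdat) "
           "values (%s, %s, %s, current_timestamp)")
    return sql, (key, ".", "super")
```

Keep the generated key somewhere safe: it is the value you will send in the api_key header when calling the key generation service.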
4.5 The API key generation service
Once your super API key has been created, you will be able to use it to generate API keys for your clients through the API key generation service.
This service is as simple to use as the schema service and validation service.
To generate a read and write pair of keys for a specific vendor prefix, simply send a POST request to this URL, using your super API key in an api_key HTTP header:
HOST/api/auth/keygen
As with the schema service, you have the choice of how you want to pass the new API keys’ vendor prefix:
- through the request body
- through a form entry named vendor_prefix
- through a query parameter named vendor_prefix
For example, through a query parameter:
curl "HOST/api/auth/keygen?vendor_prefix=com.acme" -X POST -H "api_key: your_super_api_key"
You should receive a JSON response like this one:
{ "read": "an-uuid", "write": "another-uuid" }
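The same call and the handling of its response can be sketched in Python with the standard library. Both helper names, keygen_request and parse_key_pair, are hypothetical; the request is built but not sent (pass it to urllib.request.urlopen to send it), and the parser expects the read/write JSON shape shown above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def keygen_request(host, super_api_key, vendor_prefix):
    """Build (but do not send) a POST asking for a read/write key pair."""
    query = urlencode({"vendor_prefix": vendor_prefix})
    return Request(f"{host.rstrip('/')}/api/auth/keygen?{query}",
                   headers={"api_key": super_api_key}, method="POST")


def parse_key_pair(body):
    """Extract (read_key, write_key) from the service's JSON response."""
    keys = json.loads(body)
    return keys["read"], keys["write"]
```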
If you want to revoke a specific API key, send a DELETE request like so:
HOST/api/auth/keygen?key=some-uuid
curl "HOST/api/auth/keygen?key=some-uuid" -X DELETE -H "api_key: your_super_api_key"
You can also delete every API key linked to a specific vendor prefix by sending a DELETE request:
HOST/api/auth/keygen?vendor_prefix=the.vendor.prefix.in.question
curl "HOST/api/auth/keygen?vendor_prefix=some.vendor.prefix" -X DELETE -H "api_key: your_super_api_key"
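Both revocation forms share one endpoint and differ only in the query parameter. A minimal sketch with a hypothetical helper, revoke_request, that builds the DELETE request for either a single key or a whole vendor prefix and refuses ambiguous calls:

```python
from urllib.parse import urlencode
from urllib.request import Request


def revoke_request(host, super_api_key, key=None, vendor_prefix=None):
    """Build (but do not send) a DELETE revoking one key or all keys for a prefix."""
    if (key is None) == (vendor_prefix is None):
        raise ValueError("pass exactly one of key or vendor_prefix")
    params = {"key": key} if key is not None else {"vendor_prefix": vendor_prefix}
    return Request(f"{host.rstrip('/')}/api/auth/keygen?{urlencode(params)}",
                   headers={"api_key": super_api_key}, method="DELETE")
```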
The API key generation service is also accessible through the Swagger UI.
5. Documentation and support
And that’s it! As always, if there is a feature you would like to see implemented or if you encounter a bug, please raise an issue on the GitHub project page.
To find out more about the Scala repository server, check out its documentation.
And if you have more general questions about Iglu or clarifications about this release, please do get in touch with us via the usual channels.