Towards universal event analytics – building an event grammar
As we outgrow our “fat table” structure for Snowplow events in Redshift, we have been spending more time thinking about how we can model digital events in Snowplow in the most universal, flexible and future-proof way possible.
When we blogged about building out the Snowplow event model earlier this year, a comment left on that post by Loic Dias Da Silva made us realize that we were missing an even more fundamental point: defining a Snowplow event grammar to underpin our Snowplow event dictionary. Here is part of Loic’s excellent comment – although I would encourage you to read it in full on the blog post:
Hi, we’re also working on an event model for our global eventing platform but our events currently are more macro, inspired by RDF in a sense:
An Actor(id/type) made an Action(verb, context) on another Object(id/type).
Each Actor, Action and Object can hold k/v properties.
The context itself, owned by the action, is a k/v dictionary.
So in designing his event grammar, Loic was influenced by the Resource Description Framework, the W3C specifications for modelling relationships to web resources.
An event grammar inspired by RDF is certainly interesting, but I am using a much older, more sophisticated and more tested “event grammar” to write this sentence: the grammar of human language. Why not start, then, from the core grammar underpinning English, Latin, Greek, German and other languages to see just how far this approach can take us in modelling events in the digital world?
So, in the rest of this post we will:
- Introduce the components of our grammar
- Model some ecommerce events
- Model some videogame events
- Model some digital media events
- Discuss what we have learnt
- Draw some conclusions
1. The components of our grammar
All of the human languages mentioned above (and many, many others) share the same fundamental building blocks in their grammars for describing an event with a verb in the active voice:
To go through these in turn:
- Subject, or noun in the nominative case. This is the entity which is carrying out the action: “I wrote a letter”
- Verb. This describes the action being done by the Subject: “I wrote a letter”
- Direct Object, or simply Object or noun in the accusative case. This is the entity to which the action is being done: “I wrote a letter”
- Indirect Object, or noun in the dative case. A slightly more tricky concept: this is the entity indirectly affected by the action: “I sent the letter to Tom”
- Prepositional Object. An object introduced by a preposition (in, for, of etc), but not the direct or indirect object: “I put the letter in an envelope”. In a language such as German, prepositional objects will be found in the accusative, dative or genitive case depending on the preposition used
- Context. Not a grammatical term, but we will use context to describe the phrases of time, manner, place and so on which provide additional information about the action being performed: “I posted the letter on Tuesday from Boston”
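One way to picture these six building blocks is as the fields of a single record. The sketch below is purely illustrative: the `Entity` and `Event` names, their fields and the identifiers used are assumptions made for this example, not part of any existing Snowplow schema. The sample event combines two of the example sentences above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """A noun in our grammar: anything that can fill a Subject or Object slot."""
    type: str  # hypothetical, e.g. "person" or "letter"
    id: str    # hypothetical identifier


@dataclass(frozen=True)
class Event:
    """One event, with a slot per grammatical building block."""
    subject: Entity                                # who carries out the action
    verb: str                                      # the action, active voice
    direct_object: Optional[Entity] = None         # what the action is done to
    indirect_object: Optional[Entity] = None       # who/what is indirectly affected
    prepositional_object: Optional[Entity] = None  # object introduced by a preposition
    context: dict = field(default_factory=dict)    # time, manner, place and so on


# "I sent the letter to Tom", with the time and place from the Context example
sent = Event(
    subject=Entity("person", "me"),
    verb="send",
    direct_object=Entity("letter", "letter-1"),
    indirect_object=Entity("person", "tom"),
    context={"when": "Tuesday", "where": "Boston"},
)
```

Only the Subject and Verb are mandatory in this sketch; the remaining slots default to empty because not every sentence fills them.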
With these grammatical building blocks defined, let’s now put them through their paces modelling some digital events – starting with some online retail events:
2. Modelling some ecommerce events
Here are some ecommerce events mapped to our grammatical model:
In this event, a shopper (Subject) views (Verb) a t-shirt (Direct Object) while browsing an online store (Context).
Here we introduce an Indirect Object which has been affected by the event: the shopper (Subject) adds (Verb) a t-shirt (Direct Object) to her shopping basket (Indirect Object). Again, this is while browsing the online store (Context).
Here we have an Object introduced by preposition: the shopper (Subject) pays (Verb) for his order (Prepositional Object). This is all within the checkout flow (Context).
3. Modelling some videogame events
So far so good, but how well does this model work with events generated by a gaming session?
In a gifting screen within the game (Context), the player (Subject) gifts (Verb) some gold (Direct Object) to another player (Indirect Object).
During a two-player skirmish (Context), the first player (Subject) kills (Verb) the second player (Direct Object) using a nailgun (Prepositional Object). This illustrates how your end-users can be the Object of events, not just their Subjects.
Here we illustrate a reflexive verb: through grinding (Context), the player (Subject) levels herself up (Verb, reflexive). A reflexive Verb is one where the Subject and the Object are the same.
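As a rough sketch, the three gaming events above could be written down as records like the following. The `Event` class, the `is_reflexive` helper and identifiers such as `player:1` are hypothetical names chosen for illustration. The reflexive case is modelled here by putting the same entity in both the Subject and Direct Object slots, which is one possible choice rather than the only one.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Event:
    subject: str
    verb: str
    direct_object: Optional[str] = None
    indirect_object: Optional[str] = None
    prepositional_object: Optional[str] = None
    context: dict = field(default_factory=dict)

    def is_reflexive(self) -> bool:
        # A reflexive event is one whose Subject is also its Direct Object.
        return self.direct_object == self.subject


# The player gifts some gold to another player.
gift = Event("player:1", "gift", direct_object="gold",
             indirect_object="player:2", context={"screen": "gifting"})

# The first player kills the second player using a nailgun.
kill = Event("player:1", "kill", direct_object="player:2",
             prepositional_object="nailgun",
             context={"mode": "two-player skirmish"})

# The player levels herself up.
level_up = Event("player:1", "level up", direct_object="player:1",
                 context={"activity": "grinding"})

print(level_up.is_reflexive())  # True
print(kill.is_reflexive())      # False
```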
4. Modelling some digital media events
This seems to be working well! Finally, let’s map our new event grammar onto the world of digital media and publishing:
While consuming media on your site (Context), a user (Subject) reads (Verb) an article (Direct Object).
Wanting to share content socially (Context), a user (Subject) shares (Verb) a video (Direct Object) on Twitter (Prepositional Object). Also note that Twitter here is a proper noun (not a common noun).
Working from the moderation UI (Context), an administrator (Subject) bans (Verb) user #23 (Direct Object). This illustrates how an end-user can be the Object of an event, and how someone other than an end-user can be the Subject of the event.
5. What have we learnt
As you can see, it is relatively straightforward to map any of the digital events above into these six “slots”: Subject, Verb, Direct Object, Indirect Object, Prepositional Object and Context. This is unsurprising: our core grammar has been unambiguously describing events in many different human languages across thousands of years.
Going through the above exercise, several further things have become clear to us that we will want to factor into the Snowplow event grammar going forwards:
Implicit Subjects are a mistake
Most web and event analytics systems make the mistake of making the Subject of the event implicit:
- (End user) adds product to basket
- (Admin) bans user #23
This is a mistake, because as we have seen above, expressing the Subject is a key component of our event grammar.
Going further, it is particularly dangerous to assume that the Subject of every event is your end-user or customer, because we have already seen events where it is not: an administrator banning a user, for example.
An entity can be Subject or Object or both across multiple events
As per these gaming examples:
- User #1 gifts gold to user #2
- User #2 kills user #3
- User #2 levels up
- Admin bans user #1
As we can see from this, the same entities will be found as Subject, Direct Object, Indirect Object or Prepositional Object depending on the event.
Most analytics systems miss the fact that an end-user (for example) is not merely the implicit Subject of multiple events, but is in fact an entity which is the Subject and the Object of different events.
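To make this concrete, here is a minimal sketch that takes the four gaming events listed in this section and collects the grammatical roles each entity plays. The tuple layout, the `roles_by_entity` helper and identifiers such as `user:1` are assumptions made for illustration; "levels up" is treated as reflexive, so user #2 appears as both Subject and Direct Object of that event.

```python
from collections import defaultdict

# Each event as (subject, verb, direct_object, indirect_object).
events = [
    ("user:1", "gift", "gold", "user:2"),
    ("user:2", "kill", "user:3", None),
    ("user:2", "level up", "user:2", None),  # reflexive
    ("admin", "ban", "user:1", None),
]

SLOTS = ("subject", "direct_object", "indirect_object")


def roles_by_entity(events):
    """Map each entity to the set of grammatical slots it has filled."""
    roles = defaultdict(set)
    for subject, _verb, direct_object, indirect_object in events:
        for slot, entity in zip(SLOTS, (subject, direct_object, indirect_object)):
            if entity is not None:
                roles[entity].add(slot)
    return dict(roles)


roles = roles_by_entity(events)
print(sorted(roles["user:2"]))  # ['direct_object', 'indirect_object', 'subject']
```

A model that only ever records the end-user as an implicit Subject has no way to answer a question like "which events happened to user #1?".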
We can keep our Verbs really simple
All of the events above were modelled simply using verbs in the active voice, not the passive voice:
- Active voice: “Alex watched a video”
- Passive voice: “the video was watched by Alex”
We don’t need to use passive voice for our event model, because we can always derive (if needed) a passive voice event from our active voice event.
Going further, Verbs conjugate in lots of other ways (tense, person, mood etc) – but again we don’t need to include any of this in our event model: all of this can be derived (if needed) from our event’s Context.
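As a small illustration of deriving the passive voice on demand, the sketch below stores an event in the active voice only and renders a passive reading when one is needed. The dictionary layout, the `passive_view` function and the `PAST_PARTICIPLES` lookup table are all hypothetical; a real system would need a fuller treatment of verb forms.

```python
# Hypothetical lookup from a verb's base form to its past participle.
PAST_PARTICIPLES = {"watch": "watched", "ban": "banned", "kill": "killed"}


def passive_view(event: dict) -> str:
    """Render an active-voice event as a passive-voice sentence."""
    participle = PAST_PARTICIPLES[event["verb"]]
    return f'{event["direct_object"]} was {participle} by {event["subject"]}'


# Stored once, in the active voice.
active = {"subject": "Alex", "verb": "watch", "direct_object": "the video"}

print(passive_view(active))  # the video was watched by Alex
```

Because the passive form is computed from the stored slots, nothing is lost by keeping only active-voice Verbs in the model.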
Context is king
Our idea of Context does not map cleanly onto a single grammatical component, but it is just too useful to exclude. In fact, we already have a rich web context for Snowplow events in our Canonical event model, including:
- When the event occurred
- Where (geographically) the event occurred
- Properties of the device on which the event occurred
6. Conclusions
We hope this has been an interesting exploration of how we can potentially adapt and simplify the grammar of human languages to express a new grammar for digital events. We are really excited about the possibilities this opens up – initially around expressing such a grammar in our new Avro event model, and later hopefully in graph databases such as Neo4j.
Of course, we have only just started to sketch out this new event model, and we hope that it will prompt a wider debate with the Snowplow and analytics communities. We are excited to evolve these ideas and build a model for universal event analytics with you, together – and we look forward to continuing the conversation on our snowplow-user mailing list.
And finally, many thanks again to Loic Dias Da Silva for sharing his original Actor-Action-Object idea on our blog!