History Digest Overview

This article provides an overview of history digests. For full documentation, see the History Digests API.

What are History Digests for?

History data is unstructured and large. Any data that can be expressed in JSON can be stored in the subject or in an event in a history, so any collection of fields might exist in each subject/event. Because of the large number of events and their unstructured nature, it can be slow and cumbersome to get answers to certain questions from the history service. Questions like:

"How many different reports have come from this postcode?"
"Did we close more cases in August than July?"

What does a Digester do?

Digesters take the entire history database and copy a subset of the data into a separate database table. This has two advantages over the complete history database:

The data is structured
The data is small

Only the fields that were specified are copied into the digest, so all the other fields in each event are not present. Only the histories and events that were of interest to the digester are copied so the bulk of the history data is left behind. This small table with only the necessary columns can give very high performance in answering the sort of questions posed above.

Digests vs Aggregations

Most of these types of questions rely on aggregate functions like SUM, COUNT, AVERAGE, MIN and MAX, but the digester doesn't do any aggregations across multiple histories. This is because, while history events are immutable, histories can be deleted. If an aggregation (like an average cost) were saved and then one of the histories which contributed to that average were deleted, it wouldn't be possible to modify the average to match. It would be like trying to take one egg out of an omelette.

Digest tables are strictly one-row-per-history and when a history is deleted so too is that row of the digest. All the aggregations should be performed either as the data is queried from the view or in post processing of the queried data. The two main challenges of designing a digest are deciding what data to keep in the digest and working out how to query that data to produce meaningful information.
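
As a rough illustration of aggregating at query time, the sketch below uses Python's built-in sqlite3 module as a stand-in for a digest table. The table name case_digest and its columns (history_id, postcode, closed_month, cost) are hypothetical and are not part of the History Digests API; the point is only that counts and averages are computed when the data is queried, so deleting a row is reflected in the next query.

```python
import sqlite3

# Hypothetical digest table: one row per history (all names are illustrative).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE case_digest ("
    " history_id TEXT PRIMARY KEY,"
    " postcode TEXT,"
    " closed_month TEXT,"
    " cost REAL)"
)
conn.executemany(
    "INSERT INTO case_digest VALUES (?, ?, ?, ?)",
    [
        ("h1", "AB1 2CD", "2022-07", 10.0),
        ("h2", "AB1 2CD", "2022-08", 20.0),
        ("h3", "EF3 4GH", "2022-08", 60.0),
    ],
)


def closed_per_month(connection):
    """Count closed cases per month at query time."""
    rows = connection.execute(
        "SELECT closed_month, COUNT(*) FROM case_digest"
        " GROUP BY closed_month ORDER BY closed_month"
    ).fetchall()
    return dict(rows)


def average_cost(connection):
    """Compute the average cost at query time rather than storing it."""
    return connection.execute("SELECT AVG(cost) FROM case_digest").fetchone()[0]


before = (closed_per_month(conn), average_cost(conn))

# Deleting a history removes its digest row; the next query simply reflects that.
conn.execute("DELETE FROM case_digest WHERE history_id = ?", ("h3",))
after = (closed_per_month(conn), average_cost(conn))
```

Because nothing aggregated is stored, there is no saved average to correct after the delete: the second query works from the remaining rows only.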

Designing a Digester

registerDigest() builds your digester. It includes different levels of filter which can:

Target specific histories
Only consider histories that contain certain events
Pick the events within a history that will be digested

Each row in your digest table corresponds to one history, and the columns in your table will be one of seven types, populated using one of nine column operations, including the ability to create child tables.

That level of flexibility can make a digester difficult to get your head around!

Which Histories?

There are two filters you can use to select the histories to be digested. They can be used on their own or together.

The top-level filter parameter uses a filter object to select histories by their top-level properties, like their labels or created and last updated dates. For example:

"filter": { "key": "labela", "EQ": "Customer Enquiry" }

The top-level eventFilter parameter uses a filter object to select histories by the properties of an event in the history. When a history is found that has an event that matches the filter, that event and any subsequent events will be digested. For example:

"eventFilter": { "key": "event", "EQ": "Mail Sent" }

Generally the top-level eventFilter is used to refine the histories selected by the filter - you can use it to only consider a history once an event has happened. It can also be used on its own, bringing every history into scope (for example every history event that records a Case Management case as being "escalated" no matter what the case type).
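
The way the two filters combine can be sketched in Python. This is a simplified model and not the service's implementation: the in-memory shape of a history (a "labels" dictionary plus an ordered "events" list) and the function names are assumptions made for illustration, and only the EQ comparison shown above is handled.

```python
def matches(record, flt):
    """Return True if record[flt['key']] equals flt['EQ'] (only EQ is sketched)."""
    return record.get(flt["key"]) == flt["EQ"]


def events_in_scope(history, history_filter=None, event_filter=None):
    """Return the events that would be considered, or [] if the history is out of scope."""
    # The top-level filter looks at the history's own properties (labels here).
    if history_filter is not None and not matches(history["labels"], history_filter):
        return []
    events = history["events"]
    if event_filter is None:
        return list(events)
    # The top-level eventFilter keeps the first matching event and everything after it.
    for index, event in enumerate(events):
        if matches(event, event_filter):
            return events[index:]
    return []


# Hypothetical history used only to exercise the sketch.
history = {
    "labels": {"labela": "Customer Enquiry"},
    "events": [
        {"event": "Case Opened"},
        {"event": "Mail Sent"},
        {"event": "Case Closed"},
    ],
}

selected = events_in_scope(
    history,
    history_filter={"key": "labela", "EQ": "Customer Enquiry"},
    event_filter={"key": "event", "EQ": "Mail Sent"},
)
```

Here selected holds the "Mail Sent" and "Case Closed" events. The "Case Opened" event is left out because it happened before the first event that matched the eventFilter.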

Which Events?

Once you have created filters that select relevant histories, the columns in your digester use an operation and an optional eventFilter to pick the event data (and history labels) that will populate the table.

Each column you define always has a name, type and operation. Operations can retrieve a history's labels and "sealed" state, values and timestamps from events, can count events, and create child tables.

Child tables have their own set of columns, and these columns have their own operations. The all operation selects a value (which matches a key) or timestamp from relevant events. The distinct operation creates one column in the child table for data that matches a value.

A column operation used on its own will find the first or last value (as appropriate) from any event in the history. You can also use a column-level eventFilter so that only certain events are considered.

For example, this operation would record the timestamp of the first time an event called "Mail Sent" was recorded:

{ "name": "first_response_time", "type": "datetime", "eventFilter": { "key": "event", "EQ": "Mail Sent" }, "operation": "firsttimestamp" }

And this would count how many events called "Mail Sent" there are in the history:

{ "name": "mail_sent_count", "type": "int", "eventFilter": { "key": "event", "EQ": "Mail Sent" }, "operation": "countevents" }
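
To make the effect of these two column definitions concrete, here is a small Python sketch of what firsttimestamp and countevents would be expected to yield for a single history. It is an approximation based on the descriptions above, not the digester's actual code: the event shape (an "event" name plus a "timestamp" string), the assumption that events are held in chronological order, and the function names are all hypothetical.

```python
def matches(event, flt):
    """Return True if event[flt['key']] equals flt['EQ'] (only EQ is sketched)."""
    return event.get(flt["key"]) == flt["EQ"]


def first_timestamp(events, event_filter=None):
    """Timestamp of the first event passing the filter, or None if there is none."""
    for event in events:  # events are assumed to be in chronological order
        if event_filter is None or matches(event, event_filter):
            return event["timestamp"]
    return None


def count_events(events, event_filter=None):
    """Number of events passing the filter."""
    return sum(
        1 for event in events if event_filter is None or matches(event, event_filter)
    )


# Hypothetical events for one history.
events = [
    {"event": "Case Opened", "timestamp": "2022-08-01T10:00:00Z"},
    {"event": "Mail Sent", "timestamp": "2022-08-02T09:30:00Z"},
    {"event": "Mail Sent", "timestamp": "2022-08-05T14:15:00Z"},
]
mail_sent = {"key": "event", "EQ": "Mail Sent"}

# One digest row for this history, with one value per column definition.
row = {
    "first_response_time": first_timestamp(events, mail_sent),
    "mail_sent_events": count_events(events, mail_sent),
}
```

For this history the row would hold "2022-08-02T09:30:00Z" as the first response time and 2 as the event count. Without the column-level eventFilter, the same operations would consider every event in the history.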

Performing the Digest

Once a digest has been registered, digestHistories() actually performs the digest operation. That function is often used in an End Point, which can be called by a Scheduled Task. Where we have used digests to provide data for dashboards, the scheduled task generally calls the End Point every five minutes.

Last modified on 11 October 2022
