Toggle menu

Complex Digest with Query Examples

This digester contains many more columns. It produces data suitable for generating statistical reports and it demonstrates most of the capabilities of digesters.

{
    "id": 1002,
    "method": "registerdigest",
    "params": {
        "name": "binoperationstasks",
        "filter": {
            "AND": [
                {
                    "key": "labela",
                    "EQ": "Bin Operations Task"
                },
                {
                    "key": "labelc",
                    "EQ": "binOperationsTask"
                }
            ]
        },
        "eventFilter": {
            "OR": [
                {
                    "KEYEXISTS": "taskDate"
                },
                {
                    "KEYEXISTS": "uprn"
                },
                {
                    "KEYEXISTS": "binTypes.quantity"
                }
            ]
        },
        "columns": [
            {
                "name": "Process",
                "type": "varchar",
                "size": 50,
                "operation": "label",
                "key": "labela"
            },
            {
                "name": "BusinessKey",
                "type": "varchar",
                "size": 50,
                "operation": "label",
                "key": "labelb"
            },
            {
                "name": "Sealed",
                "type": "boolean",
                "operation": "issealed"
            },
            {
                "name": "Started",
                "type": "datetime",
                "operation": "firsttimestamp"
            },
            {
                "name": "SlotType",
                "type": "varchar",
                "size": 16,
                "operation": "firstvalue",
                "key": "slotType"
            },
            {
                "name": "ProcessType",
                "type": "varchar",
                "size": 20,
                "operation": "firstvalue",
                "key": "processType"
            },
            {
                "name": "Ward",
                "type": "varchar",
                "size": 50,
                "operation": "firstvalue",
                "key": "loc_ward"
            },
            {
                "name": "Area",
                "type": "varchar",
                "size": 50,
                "operation": "firstvalue",
                "key": "loc_area"
            },
            {
                "name": "Latitude",
                "type": "float",
                "operation": "firstvalue",
                "key": "loc_lat"
            },
            {
                "name": "Longitude",
                "type": "float",
                "operation": "firstvalue",
                "key": "loc_lon"
            },
            {
                "name": "taskDate",
                "type": "varchar",
                "size": 20,
                "operation": "lastvalue",
                "key": "taskDate"
            },
            {
                "name": "UserRole",
                "type": "varchar",
                "size": 20,
                "operation": "firstvalue",
                "key": "userRole"
            },
            {
                "name": "BinTypes",
                "type": "table",
                "operation": "all",
                "eventFilter": {
                    "AND": [
                        {
                            "KEYEXISTS": "binTypes.quantity"
                        },
                        {
                            "key": "binTypes.reviewed",
                            "EQ": true
                        }
                    ]
                },
                "columns": [
                    {
                        "name": "Quantity",
                        "type": "int",
                        "operation": "value",
                        "key": "binTypes.quantity"
                    },
                    {
                        "name": "Charge",
                        "type": "float",
                        "operation": "value",
                        "key": "binTypes.charge"
                    },
                    {
                        "name": "BinType",
                        "type": "varchar",
                        "size": 50,
                        "operation": "value",
                        "key": "binTypes.binType"
                    },
                    {
                        "name": "ServiceChannel",
                        "type": "varchar",
                        "size": 20,
                        "operation": "value",
                        "key": "binTypes.serviceChannel"
                    },
                    {
                        "name": "WhyReplacing",
                        "type": "varchar",
                        "size": 50,
                        "operation": "value",
                        "key": "binTypes.whyReplacing"
                    }
                ]
            }
        ]
    }
}

The "eventFilter" means that in addition to the "filter" on labels, each history will only become eligible for digestion once it contains an event with either a "taskDate", "uprn" or "binTypes.quantity".

The columns "Process" and "BusinessKey" copy values from history labels just like in the first example.

The column "Sealed" copies the "sealedness" of the entire history. That is whether the history has been sealed at the time of digestion.

The column "Started" copies the timestamp of the first event that is considered in the history. The first event that is considered in each history has either a "taskDate", "uprn" or "binTypes.quantity". A column level "eventFilter" could also be set, which would bring even fewer events into consideration. This can be useful to get data like "when was this case first reviewed" or "when was this case finally closed".

Columns "SlotType", "ProcessType", "Ward", "Area", "Latitude", "Longitude", "taskDate", "UserRole" all use the "firstvalue" or "lastvalue" operations to copy fields from the history into the digest.

The column "BinTypes" is much more complicated and interesting. Because multiple bins are recorded one-per-event in the history we need to keep data from multiple events in the same history. Keeping just the first or last value would miss all the bins other than the first or last one. The operation is "all" and the column type is "table". The digest will create a child view called "dh_binoperationstasks_bintypes_vw" which is linked to the main digest "dh_binoperationstasks_vw" by the public history id. "dh_binoperationstasks_vw" will have one-row-per-eligible-history, containing the labels, fields and timestamps that we have already described. "dh_binoperationstasks_bintypes_vw" can have any number of rows associated with each one row in "dh_binoperationstasks_vw". "BinTypes" has its own event filter so that only events with "binTypes.quantity" and "binTypes.reviewed"=true get considered when populating the child view. "BinTypes" has 5 columns which all use the "value" operation to copy data from the event into the child view. The sub-operations available within "all" are "value" and "timestamp".

Query Techniques

All Data

-- This shows all the digested bin tasks data by joining both views
SELECT *
FROM dh_binoperationstasks_vw tasks, dh_binoperationstasks_bintypes_vw bintypes
WHERE tasks.PublicID = bintypes.PublicID
ORDER BY tasks.PublicID

PublicIDprocessbusinesskeysealedstartedslottypeprocesstypewardarealatitudelongitudetaskdateuserrolePublicIDquantitychargebintypeservicechannelwhyreplacing
D45EB573-FDFD-44E8-B1C1-01B102D302A9"Bin Operations Task""5675-2052-3762-1492"02021-01-05 15:01:39.167"Replacement""Replacement bins""DUNSTON AND TEAMS""Inner West"54.9483299255371-1.6357569694519"2021-01-14""PUBLIC"D45EB573-FDFD-44E8-B1C1-01B102D302A9135"Garden Waste 240L""online""Lost"
D45EB573-FDFD-44E8-B1C1-01B102D302A9"Bin Operations Task""5675-2052-3762-1492"02021-01-05 15:01:39.167"Replacement""Replacement bins"DUNSTON AND TEAMS""Inner West"54.9483299255371-1.6357569694519"2021-01-14""PUBLIC"D45EB573-FDFD-44E8-B1C1-01B102D302A9125"Recycling Blue With Inner Caddy 240L""online""Lost"
B89E60FD-3A5E-4AE7-8A47-0254A79337CC"Bin Operations Task""4715-1814-7663-9659"12020-12-31 08:39:30.687"Repair""Replacement bins""BRIDGES""Central"54.9553031921387-1.59022200107574"2021-01-12""PUBLIC"B89E60FD-3A5E-4AE7-8A47-0254A79337CC10"Recycling Blue With Inner Caddy 240L""online""Damaged"
2C720AEE-7905-49C5-B9CD-05B92CC254CD"Bin Operations Task""6292-8744-1200-6828"02020-12-30 09:18:40.903"Delivery""Additional bins""DUNSTON HILL AND WHICKHAM EAST""Inner West"54.9434547424316-1.64054596424103"2021-01-07""PUBLIC"2C720AEE-7905-49C5-B9CD-05B92CC254CD135"Residual 240L""face"NULL
9B474A76-0E21-4DD3-9E9A-097101D2AD4C"Bin Operations Task""8164-4667-5524-6821"12021-01-25 15:31:27.813"Delivery""Additional bins""WHICKHAM NORTH""Inner West"54.9493026733398-1.68119704723358"2021-02-04""PUBLIC"9B474A76-0E21-4DD3-9E9A-097101D2AD4C135"Residual 240L""face"NULL
Many rows omitted for clarity                

Notice that some business keys (the first two rows) include multiple bin types and charges and the some fields (bottom right) contain NULL because there was no value in the history to digest.

Focused aggregate query

This query uses the aggregate function SUM() and groups the results by ward to answer the question "how many bins in each ward". This is a very useful type of query, but to produce a dashboard or statistical report you would have to issue many similar queries.

-- Focused aggregate query for "how many bins in each ward"
SELECT SUM(quantity) AS NumBins, ward
FROM dh_binoperationstasks_vw tasks, dh_binoperationstasks_bintypes_vw bintypes
WHERE tasks.PublicID = bintypes.PublicID
GROUP BY ward

NumBinsward
16"BRIDGES"
24"DUNSTON AND TEAMS"
13"DUNSTON HILL AND WHICKHAM EAST"
9"HIGH FELL"
4"LOW FELL"
91"WHICKHAM NORTH"

Group by everything aggregate query

This is the form of query that most of our dashboards and reports have tended to use. All the columns of interest to the report appear either in an aggregate function or in the GROUP BY clause. While the result-set might not be as useful as a focused query it is simple to post-process it into multiple useful metrics. You could run through the result-set once, row by row, and maintain counters and totals for, say, the total price charged for new bins and the number of damaged bins in each ward.

-- This is the classic "group by everything" structure that our projects have tended to use
SELECT bintype, SUM(quantity) AS NumBins, SUM(quantity * charge) AS Gross, 
    slottype, ward, userrole, servicechannel, processtype, whyreplacing
FROM dh_binoperationstasks_vw tasks, dh_binoperationstasks_bintypes_vw bintypes
WHERE tasks.PublicID = bintypes.PublicID
GROUP BY bintype, slottype, ward, userrole, servicechannel, processtype, whyreplacing

bintypeNumBinsGrossslottypewarduserroleservicechannelprocesstypewhyreplacing
"Garden Waste 240L"5175"Delivery""HIGH FELL""PUBLIC""online""Additional bins"NULL
"Garden Waste 240L"270"Delivery""LOW FELL""PUBLIC""online""New bins"NULL
"Garden Waste 240L"411435"Delivery""WHICKHAM NORTH""PUBLIC""online""Additional bins"NULL
"Garden Waste 240L"10"Repair""LOW FELL""PUBLIC""online""Replacement bins""Damaged"
"Garden Waste 240L"20"Repair""WHICKHAM NORTH""PUBLIC""online""Replacement bins""Damaged"
"Residual 240L"135"Delivery""HIGH FELL""PUBLIC""face""Additional bins"NULL
"Residual 240L"399"Delivery""WHICKHAM NORTH""PUBLIC""face""Additional bins"NULL
"Residual 240L"135"Replacement""WHICKHAM NORTH""PUBLIC""online""Replacement bins""Damaged"
Many rows omitted for clarity     

Time buckets

Some reports need to split the data into different periods, but you don't want to go back to the database for every single period. This technique uses an inner query with a CASE statement to put the data into different time buckets. The outer query can group by all the usual fields as well as the time buckets.

-- Inner query with time buckets
SELECT start_bucket, bintype, SUM(quantity) AS NumBins, SUM(quantity * charge) AS Gross, 
slottype, ward, userrole, servicechannel, processtype, whyreplacing
FROM (
    SELECT
        (
            CASE WHEN started >= '2020-12-01 00:00:00' AND started < '2021-01-01 00:00:00' THEN 'December'
            WHEN started >= '2021-01-01 00:00:00' AND started < '2021-02-01 00:00:00' THEN 'January'
            WHEN started >= '2021-02-01 00:00:00' AND started < '2021-03-01 00:00:00' THEN 'February' END
        ) AS start_bucket,
        bintype, quantity, charge, slottype, ward, userrole, servicechannel, processtype, whyreplacing
    FROM dh_binoperationstasks_vw tasks, dh_binoperationstasks_bintypes_vw bintypes
    WHERE tasks.PublicID = bintypes.PublicID
) bucketedbins
GROUP BY start_bucket, bintype, slottype, ward, userrole, servicechannel, processtype, whyreplacing

start_bucketbintypeNumBinsGrossslottypewarduserroleservicechannelprocesstypewhyreplacing
January"Residual 240L"135"Delivery""DUNSTON AND TEAMS""PUBLIC""online""Additional bins"NULL
January"Residual 240L"10"Repair""DUNSTON AND TEAMS""PUBLIC""online""Replacement bins""Damaged"
February"Residual 240L"60"Repair""DUNSTON AND TEAMS""PUBLIC""online""Replacement bins""Damaged"
Many rows omitted for clarity      

This result-set would still be post-processed row-by-row to add up the counts and totals for different wards and different months.

Last modified on November 18, 2022

Share this page

Facebook icon Twitter icon email icon

Print

print icon