Asynchronous End Point Framework

This article looks at one way very large datasets can be processed in batches, with the state of the job stored by the Site Session Store worker, which can then be queried for progress updates.

The API Server supports asynchronous requests, with the result of a request being passed in a callback to another worker. This allows us to create a looping process where one end point calls another, then passes the result back to itself, where a decision can be made whether or not to make the call again. Each pass of the loop processes the next set of data. When all of the data has been processed the loop ends and your end point "does something" with the result.
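
The shape of this loop can be pictured without the API Server at all. The following is a minimal sketch in plain JavaScript, not the framework's code: runJob, continueCallback and countingTarget are hypothetical names, and the asynchronous worker call is replaced by a direct function call so the control flow is easy to follow.

```javascript
// Hypothetical stand-in for a target end point: each pass handles up to
// batchSize items and reports whether anything is left.
function countingTarget(params) {
  // On the first pass there is no previous result, so start from the full count.
  const remainingBefore = params.previousResult
    ? params.previousResult.remaining
    : params.endPointParams.totalItems;
  const remaining = Math.max(0, remainingBefore - params.endPointParams.batchSize);
  return { remaining: remaining, complete: remaining === 0 };
}

// Hypothetical stand-in for the callback end point: inspect the result and
// either call the target again or finish.
function continueCallback(target, endPointParams, result, passes) {
  if (result.complete) {
    return { passes: passes, lastResult: result };
  }
  const next = target({ endPointParams: endPointParams, previousResult: result });
  return continueCallback(target, endPointParams, next, passes + 1);
}

// Hypothetical stand-in for the start end point: make the first call and
// hand the result to the callback.
function runJob(target, endPointParams) {
  const first = target({ endPointParams: endPointParams });
  return continueCallback(target, endPointParams, first, 1);
}

const outcome = runJob(countingTarget, { totalItems: 25, batchSize: 10 });
console.log(outcome.passes); // 3
```

In the framework itself each pass is a separate asynchronous request, so the real loop does not build up a call stack the way this recursive sketch does.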

The framework can be downloaded from this page and used as is. The zip file includes several example "target" end points, which do the actual work of processing the data and can be copied and modified as needed.

Quick Start

Download the end point group from the downloads area and import it. You'll find the end points in a group called formstraining.ajaxpolling.

The start end point has several tests you can run. The Article Query test is looked at in more detail below.

To add your own end point into the framework, its name needs to be passed to the start end point as the endPointName parameter.

The example framework includes a state storage mechanism which uses the Site Session Store worker. This standard worker is not enabled by default in all sites. You may need to add it to the config file of your API Server as:

{ "name": "sitesessionstore", "instances": 1 }

Overview

The core of the framework uses three end points: start, continueCB, and your own target end point. The end points that interact with the Session Store worker are optional. They can be removed, along with the references to them in the other end points and their schemas.

End Points

Start

The start end point first creates a session which is used to record the progress of the job. It then calls your target end point asynchronously, which will actually do the work that's needed.

resp = this.callWorkerMethod("serverlibrary", params.endPointName, {
    "sessionData": sessionData,
    "endPointParams": endPointParams,
    "_async": {
        "asyncCallerId": params.id,
        "callback": {
            "methodName": "formstraining.ajaxpolling.continueCB",
            "workerName": "serverlibrary",
            "additionalParams": {
                "endPointName": params.endPointName,
                "id": params.id,
                "endPointParams": endPointParams,
                "sessionData": sessionData
            }
        }
    }
});

The result of the call to your target end point is passed to continueCB, along with the parameters of the original request in the additionalParams. Passing the original parameters in this way maintains them throughout the loop.

Parameters

The start end point requires the following parameters:

{
    "endPointName": "formstraining.examples.articleQuery",
    "endPointParams": {
        "fromEmail": "test@testsite.com",
        "batchSize": 10,
        "toEmail": "me@testsite.com"
    },
    "id": "1"
}

endPointName - the name of your target end point that will do the work in batches
endPointParams - the parameters object included with the initial request to the target end point
id - an identifier for the job, passed as the asyncCallerId and recorded throughout the processing

Target

The target end point actually performs the job you need doing (database queries, requests to other services, sending spam, etc.). It is also responsible for batching up the data to process. It is called in two different situations:

An initial request is made by the start end point. This request must include all of the parameters your end point needs in the endPointParams object.
Subsequent requests are made by the continueCB end point. This end point maintains the endPointParams of the initial request and a previousResult object which holds the result of the initial and subsequent calls to the target end point.

Your target end point must return an object. This object should include:

A "complete" boolean to indicate whether or not processing is complete
An optional "progress" : { "progressPercent" : 50 } property
An optional finalResult property (which will be stored in the session data)
Parameters needed for the next pass of the loop
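
A minimal sketch of a target that follows this contract is shown below. It assumes, based on the calls shown in this article, that the target receives endPointParams on every call and previousResult on every call after the first. The rowCount and batchSize parameters and the nextOffset property are hypothetical, and the actual work is left as a comment.

```javascript
// Hypothetical target: walks through rowCount rows, batchSize at a time.
function exampleTarget(params) {
  const rowCount = params.endPointParams.rowCount;
  const batchSize = params.endPointParams.batchSize;

  // Initial request: no previousResult yet, so start at the first row.
  // Subsequent requests: resume from the offset returned by the last pass.
  const offset = params.previousResult ? params.previousResult.nextOffset : 0;
  const nextOffset = Math.min(offset + batchSize, rowCount);

  // ... process rows offset to nextOffset - 1 here ...

  if (nextOffset >= rowCount) {
    return {
      complete: true,
      finalResult: { summary: "Processed " + rowCount + " rows" }
    };
  }

  return {
    complete: false,
    progress: { progressPercent: (nextOffset * 100) / rowCount },
    nextOffset: nextOffset // needed by the next pass of the loop
  };
}

const firstPass = exampleTarget({ endPointParams: { rowCount: 25, batchSize: 10 } });
const secondPass = exampleTarget({
  endPointParams: { rowCount: 25, batchSize: 10 },
  previousResult: firstPass
});
console.log(firstPass.nextOffset, secondPass.nextOffset); // 10 20
```

Carrying a single offset keeps the object passed around the loop small. The Article Query example below carries the whole array of remaining IDs instead.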

Example - Article Query

In reality this example wouldn't need to be processed in batches. Querying one or two properties of items in the iCM database is fairly quick, even when dealing with thousands of items. However, if you are requesting object data, or article content, responses will be slower and use more memory, so making queries in batches of 50 or so makes sense.

This end point queries the iCM database for all article headings and IDs. Articles are queried in batches of ten, and the result of each query is added to a CSV file. The CSV is emailed at the end of the process and then deleted.

The articleQuery end point in the download zip has inline comments you should read alongside this documentation.
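
For illustration only, appending each batch to a temporary CSV file could look something like the sketch below. This is not the code from the download: csvField and appendBatch are hypothetical helpers, the article data is made up, and only the Node fs, os and path calls are standard.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Quote a value for CSV: wrap in double quotes and double any embedded quotes.
function csvField(value) {
  return '"' + String(value).replace(/"/g, '""') + '"';
}

// Append one line per article to the export file.
function appendBatch(exportFileName, articles) {
  const lines = articles.map(function (article) {
    return csvField(article.id) + "," + csvField(article.heading) + "\n";
  });
  fs.appendFileSync(exportFileName, lines.join(""));
}

// Hypothetical data: two batches of made-up articles.
const exportFileName = path.join(os.tmpdir(), "articles-" + Date.now() + ".csv");
fs.writeFileSync(exportFileName, "id,heading\n");
appendBatch(exportFileName, [{ id: 1, heading: "Bins" }, { id: 2, heading: 'The "Big" Event' }]);
appendBatch(exportFileName, [{ id: 3, heading: "Parking, permits" }]);

const csv = fs.readFileSync(exportFileName, "utf8");
fs.unlinkSync(exportFileName); // delete the temporary file once it has been used
console.log(csv);
```

Quoting every field is a simple way to cope with headings that contain commas or quotation marks.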

Initial Request

The first time this end point is called it returns:

{
    "exportFileName": exportFileName,
    "articleIDs": articleIDs,
    "totalNumArticles": totalNumArticles,
    "progress": {
        "progressPercent": (((totalNumArticles - articleIDs.length) * 100) / totalNumArticles)
    },
    "complete": false
}

exportFileName - the path to a temporary file created on the file system using Node's "fs"
articleIDs - an array of the IDs of all of the (non-secure and live) articles in the database
totalNumArticles - a count of the article IDs
progress - the percentage of articles processed so far
complete - false, we need to make another call

This is passed to the continueCB end point as part of the callback from the start end point.

Subsequent Requests

Further calls to the end point process the next batch of articles in the array. The array of current article IDs is "spliced" (remember that splice changes the original array), so our articleIDs array is reduced by the batch size, and those that are spliced-off can be processed.

If there are still articles to process, the end point returns an updated version of the response object above.
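
The effect of splice on the array can be seen in a short standalone sketch. The IDs and the takeBatch helper are hypothetical. Only Array.prototype.splice itself is standard JavaScript.

```javascript
// Remove up to batchSize IDs from the front of the array and return them.
// splice mutates articleIDs, so the array shrinks on every call.
function takeBatch(articleIDs, batchSize) {
  return articleIDs.splice(0, batchSize);
}

const articleIDs = [101, 102, 103, 104, 105];

const firstBatch = takeBatch(articleIDs, 2);  // [101, 102]
const secondBatch = takeBatch(articleIDs, 2); // [103, 104]
const lastBatch = takeBatch(articleIDs, 2);   // [105], fewer than batchSize left

console.log(articleIDs.length); // 0
```

When fewer than batchSize items remain, splice simply returns whatever is left, so the last batch needs no special handling.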

Final Request

When there are no more article IDs to process (when articleIDs.length === 0) the end point uses the Email worker to send the CSV file of data. The file is then deleted and the end point returns:

{ "complete": true, "finalResult": { "summary": "Finished processing all articles" } }

continueCB

When called for the first time this end point receives the result from the target end point, plus all of the additionalParams set in the start end point.

If the "complete" flag is set to false in the response from the target, continueCB sets up another asynchronous call to the target, with itself as the end point to handle the callback (creating a loop). The endPointName and endPointParams from the start end point are passed in again. The previousResult object holds the full response from the last call to the target, minus the progress count, complete flag, and optional final description properties.

resp = this.callWorkerMethod("serverlibrary", params.endPointName, {
    "sessionData": params.sessionData,
    // Provided by the start end point on the first call, then by the
    // callback.additionalParams below on subsequent calls
    "endPointParams": params.endPointParams,
    // Returned from the target end point
    "previousResult": params.response.result,
    "_async": {
        "asyncCallerId": params.id,
        "callback": {
            "methodName": "formstraining.ajaxpolling.continueCB",
            "workerName": "serverlibrary",
            "additionalParams": {
                "endPointName": params.endPointName,
                "id": params.id,
                "endPointParams": params.endPointParams,
                "sessionData": params.sessionData
            }
        }
    }
});
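
Separating the bookkeeping properties from the data carried into the next pass might be sketched as follows. This is an assumption about the approach rather than the framework's code: splitTargetResult is a hypothetical helper, and it treats complete, progress and finalResult (the property names listed in the Target section) as the ones to strip.

```javascript
// Separate the properties the loop itself consumes from the data that the
// target needs back on its next pass.
function splitTargetResult(result) {
  const carried = Object.assign({}, result); // shallow copy, leave the input alone
  delete carried.complete;
  delete carried.progress;
  delete carried.finalResult;

  return {
    complete: result.complete === true,
    progressPercent: result.progress ? result.progress.progressPercent : undefined,
    finalResult: result.finalResult,
    previousResult: carried
  };
}

// Hypothetical response from a target part-way through a job.
const split = splitTargetResult({
  exportFileName: "/tmp/articles.csv",
  articleIDs: [3, 4, 5],
  totalNumArticles: 5,
  progress: { progressPercent: 40 },
  complete: false
});

console.log(Object.keys(split.previousResult).join(", "));
// exportFileName, articleIDs, totalNumArticles
```

Treating anything other than an explicit true as "not complete" means a target that forgets the flag keeps looping, so a real implementation may prefer to report that as an error.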

Session Data

The example framework uses the Session Store worker to store progress information while data is being processed. If your target end point returns a "progress" object you can use the framework as is.

The start end point creates a session then immediately records a progress update:

var sessionData = this.invokeEP(".initProgress", {
    "id": params.id
});
sessionData = this.invokeEP(".updateProgress", {
    "sessionData": sessionData,
    "status": "RUNNING",
    "progressPercent": 0,
    "description": "Running"
});

Your target end point should return a progress update to the continueCB end point. The examples calculate a percentage:

"progress": { "progressPercent": (((totalNumArticles - articleIDs.length) * 100) / totalNumArticles) }

The continueCB end point then passes that to the session end points for recording:

params.sessionData = this.invokeEP(".updateProgress", {
    "sessionData": params.sessionData,
    "status": "RUNNING",
    "progressPercent": Math.floor(progressPercent),
    "description": "Running",
    "accumulate": accumulate
});
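
As a worked example of the percentage calculation, with 25 articles in total and 15 still to process, (25 - 15) * 100 / 25 gives 40. The sketch below wraps the same arithmetic in a hypothetical helper. The guard for an empty job is an assumption rather than something the examples do. Without it, a total of zero would produce NaN.

```javascript
// Percentage of items processed so far, given the total and the number left.
function progressPercent(totalNumArticles, remaining) {
  if (totalNumArticles === 0) {
    return 100; // nothing to do, so treat the job as fully processed
  }
  return ((totalNumArticles - remaining) * 100) / totalNumArticles;
}

console.log(progressPercent(25, 15)); // 40

// The raw value is often not a whole number (here one third of the way
// through), which is why it is rounded down before being recorded.
console.log(Math.floor(progressPercent(3, 2))); // 33
```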

The final pass of the loop records that the job is complete:

params.sessionData = this.invokeEP(".completeProgress", {
    "sessionData": params.sessionData,
    "description": "Completed",
    "finalResult": finalResult
});

Progress can be queried by calling the queryProgress end point with the sessionKey (returned from the start end point) as a parameter.

Last modified on 2 August 2023