Workflow Transactions - Avoiding Problems - The Digital Platform Documentation Site

This article looks in more detail at how and when various elements of a process instance are generated.

The workflow engine behaves transactionally. A transaction must complete and reach a "wait" state to be considered successful. Any errors thrown during a transaction cause it to return to its previous state (this could mean the instance not starting at all).

Wait states are elements that cause an execution to pause. These are:

User tasks
Intermediate timer, signal and message events
Any activity marked as asynchronous in its properties

See the Workflow Transactions - Foreground and Background Jobs article for more information.
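
As a rough illustration of the wait-state rule above, the following Python sketch works out where a first transaction would stop. It is a toy model, not the workflow engine's API: the `Element` class, the `kind` labels, and the example element names are all assumptions made for this illustration.

```python
from dataclasses import dataclass

# Element kinds that pause an execution, per the list above.
# The label strings are hypothetical, chosen for this sketch only.
WAIT_KINDS = {"user_task", "timer_event", "signal_event", "message_event"}


@dataclass
class Element:
    name: str
    kind: str
    asynchronous: bool = False  # models the "asynchronous" property


def is_wait_state(element: Element) -> bool:
    """An element is a wait state if it is asynchronous or of a pausing kind."""
    return element.asynchronous or element.kind in WAIT_KINDS


def first_transaction(elements: list[Element]) -> list[str]:
    """Names of the elements visited up to and including the first wait state."""
    visited = []
    for element in elements:
        visited.append(element.name)
        if is_wait_state(element):
            break
    return visited


process = [
    Element("Call End Point", "service_task"),
    Element("Send mail", "mail_task"),
    Element("Review", "user_task"),
    Element("Archive", "service_task"),
]

print(first_transaction(process))
# ['Call End Point', 'Send mail', 'Review']
```

In this model, everything up to "Review" belongs to one transaction, so an error in either of the first two elements would prevent the instance from starting.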

The Business Key and Starting a Process Instance

A unique business key is created for a process instance as part of the call to the workflow worker's startProcess function. The function generates the key, making it immediately available to activities. However, the first transaction must complete and reach a wait state for the key to be returned to the user and the instance to be truly active.

Instances started by methods internal to the workflow engine, like timers, signals, and messages, don't have business keys.

Potential Problems

Consider this example, which deliberately generates an error.

When the form that starts this process is submitted, the process starts. A business key is generated and can be used in the End Point activity, which completes successfully. The dodgy API Server task then throws an error. Because this error is still part of the first transaction, the form displays an error message telling the user that the process failed to start. However, the End Point call has still happened and, if history is enabled, the workflow will have written a history record.

By inspecting the API Server trace we can see, from bottom to top:

The request to start an instance of the process
The history being written
The End Point successfully being called and returning
The End Point success being logged to the history
The error generated by the API Server task
The startProcess failing and the error being returned to the form

So despite the process failing to start, two other elements of the platform have carried out their activities with no way to roll back their actions. Similarly, if a mail task were placed before the erroring element, an email would be sent that could include a business key for a process instance that doesn't exist.
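
The behaviour described above can be imitated with a small Python simulation. This is only a sketch under stated assumptions: the `Engine` class, the `BK-0001` key format, and the step functions are hypothetical and do not reflect the real startProcess implementation. It shows that discarding the uncommitted instance does nothing to undo a call that has already left the engine.

```python
import itertools


class Engine:
    """Toy engine: instance state is transactional, external calls are not."""

    def __init__(self):
        self._keys = itertools.count(1)
        self.instances = {}      # committed process instances
        self.external_log = []   # effects outside the engine; cannot be rolled back

    def start_process(self, steps):
        # The key is generated up front, so every step can use it immediately.
        business_key = f"BK-{next(self._keys):04d}"  # hypothetical key format
        completed = []
        for step in steps:
            # An exception here aborts the transaction: nothing is committed.
            step(self, business_key)
            completed.append(step.__name__)
        # Only reached if the first transaction ran to its end without errors.
        self.instances[business_key] = completed
        return business_key


def end_point(engine, business_key):
    engine.external_log.append(f"End Point called with {business_key}")


def api_server_task(engine, business_key):
    raise RuntimeError("API Server task failed")


engine = Engine()
try:
    engine.start_process([end_point, api_server_task])
except RuntimeError as err:
    print("start failed:", err)

print(engine.instances)     # {}
print(engine.external_log)  # ['End Point called with BK-0001']
```

The external log holds a business key for an instance that was never committed, which is the same situation as the email example above.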

Solutions

How you handle this problem depends upon the nature of your process. You could set the problem activity to ignore any errors returned, which would allow the execution to progress to the next stage. Attaching an error boundary event and setting up an alternate path for the execution to follow would allow you to design in some manual intervention or set the activity up to be tried again. The Workflow Error Handling and External Calls - Callbacks and Polling articles look at these options in a bit more detail.
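
The two options in the paragraph above can be compared in a minimal Python sketch. The `run_activity` function, its `on_error` values, and the returned path names are invented for illustration and are not workflow engine settings.

```python
def run_activity(action, on_error="propagate"):
    """Return the name of the path the execution would take next."""
    try:
        action()
    except Exception:
        if on_error == "ignore":
            # Errors are swallowed and the execution simply carries on.
            return "next activity"
        if on_error == "boundary":
            # An error boundary event diverts the execution to an alternate path.
            return "manual intervention"
        raise  # default: the error fails the surrounding transaction
    return "next activity"


def failing_call():
    raise RuntimeError("service unavailable")


print(run_activity(failing_call, on_error="ignore"))    # next activity
print(run_activity(failing_call, on_error="boundary"))  # manual intervention
```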

Marking the problem activity as asynchronous would also allow the instance to start successfully with its proper business key, only erroring in the background job created for the asynchronous task. Some form of error handling would still be needed to fix the problem, but at least the instance would exist in the workflow engine.
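
To make the effect of the asynchronous flag concrete, here is a hedged Python sketch. The `Step` class, the job queue, the status strings, and the key format are all assumptions for this illustration. The only idea it models is that an asynchronous step ends the first transaction before the step itself runs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]
    asynchronous: bool = False


def failing_action():
    raise RuntimeError("API Server task failed")


def start_process(steps, instances, jobs):
    """Run steps until an asynchronous one is met, then commit the instance."""
    key = f"BK-{len(instances) + 1:04d}"  # hypothetical key format
    for index, step in enumerate(steps):
        if step.asynchronous:
            # Wait state: hand the rest of the work to a background job.
            jobs.append((key, steps[index:]))
            break
        step.action()  # an error here propagates and nothing is committed
    instances[key] = "active"
    return key


def run_next_job(instances, jobs):
    """Execute the oldest background job; a failure leaves the instance in place."""
    key, remaining = jobs.pop(0)
    try:
        for step in remaining:
            step.action()
    except RuntimeError as err:
        instances[key] = f"job failed: {err}"


instances, jobs = {}, []
steps = [
    Step("End Point", lambda: None),
    Step("API Server task", failing_action, asynchronous=True),
]

key = start_process(steps, instances, jobs)
print(key, instances)  # BK-0001 {'BK-0001': 'active'}

run_next_job(instances, jobs)
print(instances)       # {'BK-0001': 'job failed: API Server task failed'}
```

The instance still needs error handling to recover, but it exists and its business key is valid.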

Messages

Messages can be used to start instances, catch an execution (pausing it until a message is received to proceed), and be attached as boundary events to other elements, potentially cancelling them.

When using any of the message elements, you need to set a message reference in that element's properties. This is the message that the element listens for; when it is received, it causes the execution to start, continue, or cancel, depending on the element.

Message Listeners

Consider this process. The first user task is completed, the execution then pauses until a message is received, then it progresses on to another user task.

Elsewhere in this documentation it's noted that there is no point sending a message to a process until the execution arrives and is waiting at the message event. If the message is sent too early it will have no effect (the execution isn't paused yet) and if too late there's nothing to receive it.

Looking at message events in more detail, badly timed messages actually fail because listeners only exist while the execution is at the message event, not when the instance starts. This means that a workflow activity, perhaps an End Point calling an external service, cannot send a message back to its process instance to be caught by a future message event. Care should also be taken when messaging parallel branches (see below).

In the example above, querying the instance when it is at the first user task would show:

{ ... "messageListeners": {} ... }

Performing the same query once the first task is complete and the execution is waiting returns:

{ ... "messageListeners": { "myMessage": { "id": "myMessage", "formName": null } } ... }

Repeating the query once the execution has moved on to task two would also return an empty object. (Note that in the current release of the Workflow worker, "includeMessageListeners" only returns information about boundary messages. That is a limitation of the API; the listeners are there.)
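
The listener lifecycle in this example can be mimicked with a short Python model. It is a sketch only: the `Instance` class and its methods are hypothetical and do not call the Workflow worker, although the listener entries copy the shape of the query results shown above.

```python
class Instance:
    """Toy process instance: task one, a message catch event, then task two."""

    def __init__(self, elements):
        self.elements = elements  # list of (kind, name) pairs
        self.position = 0

    def message_listeners(self):
        # A listener exists only while the execution sits at the message event.
        kind, name = self.elements[self.position]
        if kind == "message_event":
            return {name: {"id": name, "formName": None}}
        return {}

    def complete_task(self):
        kind, name = self.elements[self.position]
        if kind != "user_task":
            raise ValueError(f"{name!r} is not a user task")
        self.position += 1

    def send_message(self, name):
        if name not in self.message_listeners():
            raise LookupError(f"no listener for message {name!r}")
        self.position += 1


instance = Instance([
    ("user_task", "Task one"),
    ("message_event", "myMessage"),
    ("user_task", "Task two"),
])

print(instance.message_listeners())  # {}
try:
    instance.send_message("myMessage")  # too early: nothing is listening yet
except LookupError as err:
    print("rejected:", err)

instance.complete_task()
print(instance.message_listeners())
# {'myMessage': {'id': 'myMessage', 'formName': None}}

instance.send_message("myMessage")
print(instance.message_listeners())  # {}
```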

Parallel Executions

When an execution arrives at a parallel gateway it generates one or more child executions. Each child execution is generated sequentially within a single transaction. If one of them errors, all of them could (depending on error handling) be rolled back. However, any activities that have non-transactional interactions, such as calling a third-party service, could still be executed.

For a transaction that includes a parallel gateway to commit, all paths from the gateway must come to a successful wait state.

A Problem Gateway

This is a simplified model, but the scenario is not uncommon in the real world.

The execution reaches the gateway, then attempts to create three child executions. The user and mail tasks do not error, but the End Point task does. This means that the mail task will send its email, but the user task will not be created. The execution returns to the gateway and becomes stuck.
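
The outcome described above can be reproduced in a small Python simulation. It is a sketch, not engine code: `run_gateway`, the branch functions, and the example strings are all invented for illustration, and the rollback is reduced to clearing a list.

```python
def user_task(tasks, emails):
    tasks.append("Approve request")           # transactional: lives in the engine


def mail_task(tasks, emails):
    emails.append("Your request was received")  # external: leaves the engine


def end_point_task(tasks, emails):
    raise RuntimeError("End Point returned an error")


def run_gateway(branches):
    """Create each child execution in turn inside one simulated transaction."""
    created_tasks = []  # rolled back if any branch fails
    sent_emails = []    # cannot be rolled back
    try:
        for branch in branches:
            branch(created_tasks, sent_emails)
    except RuntimeError:
        created_tasks.clear()  # rollback of the engine's own state
        return "stuck at gateway", created_tasks, sent_emails
    return "committed", created_tasks, sent_emails


print(run_gateway([user_task, mail_task, end_point_task]))
# ('stuck at gateway', [], ['Your request was received'])
```

Removing the failing branch lets the same gateway commit with both the user task and the email in place.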

Solutions

Guarding against this problem is similar to the earlier scenario. If the End Point task is marked as asynchronous, the process instance will start and reach a wait state before the End Point is called, which allows the instance to exist in the process engine and the user task to be active.

Error handling should also be set up. You could choose to ignore any errors returned from the End Point, allowing the execution to progress to the next stage, or the activity could be set to throw a BPMN error that can then be handled in the design of the model.

Last modified on 28 February 2019
