Understanding Data Retention Schedules

The schedules you create in the Data Retention Manager (DRM) are important because they delete data from the platform irreversibly: once deleted, it cannot be recovered. Make sure that your schedules delete only the data you intend to remove, and that they don't leave behind data that should be removed.

This article provides a detailed look at how data is identified for deletion, what happens when data is found, and what is deleted.

How is History Data Selected?

Your DRM schedules work on history records, so it's important that you understand our history conventions and how we use "labela" and "labelc" values.

History Labels

When you create a schedule, the first thing you do is pick the history "labela" value you want the schedule to apply to. This label is the name of the service (e.g. the name of the form or workflow process that wrote the history).

Histories also have a "labelc" value, which identifies the type of history record. (You can ignore "labelb": it is a randomly generated unique ID for the history, and it matches the process instance business key if the history was created by a workflow process.) Different products use "labelc" for different things, but common uses include storing emails, attachments or additional notes separately from the main history. The default "labelc" for histories written by workflow processes is null, and these histories hold a record of everything that happened during the process instance.

When you create your DRM schedule, you pick which "labelc" records you want to delete. You can delete all of them at once, or create separate schedules for different labels. The list of labels also includes {DEFAULT}, which covers notes, attachments, email and null.
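As a rough illustration only, you can think of label selection as a set-membership test: a history is picked up when its "labela" equals the schedule's "labela" and its "labelc" is one of the values you chose. The minimal sketch below assumes that `None` represents a null "labelc" and that {DEFAULT} expands to exactly notes, attachments, email and null; the function names are hypothetical and are not part of the DRM.

```python
from typing import Iterable, Optional, Set

# Hypothetical: None stands for a null "labelc" (the workflow default).
DEFAULT_LABELC: Set[Optional[str]] = {"notes", "attachments", "email", None}


def expand_labelc_selection(selected: Iterable[str]) -> Set[Optional[str]]:
    """Turn the labels picked in a schedule into the set of labelc values it targets."""
    targets: Set[Optional[str]] = set()
    for label in selected:
        if label == "{DEFAULT}":
            targets |= DEFAULT_LABELC  # {DEFAULT} covers notes, attachments, email and null
        else:
            targets.add(label)
    return targets


def matches_labels(labela: str, labelc: Optional[str],
                   schedule_labela: str, targets: Set[Optional[str]]) -> bool:
    """A history is selected only if both its labela and its labelc match the schedule."""
    return labela == schedule_labela and labelc in targets
```

With this model, a schedule that picks only {DEFAULT} would not touch a history whose "labelc" is "reporting", because that value is not in the expanded set.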

There are two special types of "labelc" value:

"reporting" - This label is only ever used to store data that contains no personally identifiable information, such as a count of processes, dates, durations or the type of request. Our dashboards use these histories as their data source. Keep these records for as long as you need to be able to report on your process.
"audit" - This label records the fact that other histories have been deleted. These histories don't contain any personal data and should not be deleted.
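If you script or review schedule configurations yourself, a simple pre-flight check can catch the two special labels before a schedule is saved. This is a hypothetical helper, not a DRM feature; it assumes the proposed "labelc" values are held in a set of strings, with `None` for null.

```python
from typing import List, Optional, Set


def review_labelc_selection(targets: Set[Optional[str]]) -> List[str]:
    """Hypothetical pre-flight check: flag special labelc values in a proposed schedule."""
    warnings: List[str] = []
    if "audit" in targets:
        # Audit histories record that other histories were deleted; they hold no personal data.
        warnings.append("'audit' histories should not be deleted")
    if "reporting" in targets:
        # Reporting histories hold no personal data and feed the dashboards.
        warnings.append("'reporting' histories are dashboard data; keep them as long as you need to report")
    return warnings
```

An empty list means neither special label was selected; any warning is a prompt to double-check the schedule rather than a hard rule.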

Retention Periods

When your schedule runs (nightly, as part of a scheduled task set up during installation), it looks for histories with the "labela" and "labelc" values you've chosen. Histories that fall outside your retention period are captured for further filtering.

You can choose whether your retention period for selecting histories counts from the date the history was created or the date it was last updated.
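The capture step amounts to comparing one date against a cutoff of "now" minus the retention period. The sketch below is an assumption-based model: the names are hypothetical, and whether a date falling exactly on the cutoff is kept or captured is not stated here, so the code simply keeps it.

```python
from datetime import datetime, timedelta
from enum import Enum


class RetentionBasis(Enum):
    CREATED = "created"
    LAST_UPDATED = "last_updated"


def outside_retention(created: datetime, last_updated: datetime,
                      basis: RetentionBasis, retention: timedelta,
                      now: datetime) -> bool:
    """True if the chosen date is older than the retention period, counted back from now."""
    reference = created if basis is RetentionBasis.CREATED else last_updated
    cutoff = now - retention
    return reference < cutoff  # assumption: a date exactly on the cutoff is kept
```

For example, with a 365-day retention period, a history created 400 days ago but last updated 10 days ago is captured when the basis is the created date, but not when the basis is the last updated date.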

Sealed Histories

Histories can be in one of two states, sealed or not sealed. Sealed histories are considered "historic" - they cannot have any further events added to them. Histories that are not sealed are "active" - they can have new events added to them.

In most circumstances, a history stays unsealed while its process instance is active and is sealed when the instance ends. However, some histories remain unsealed after the instance ends. For example, in Case Management the "notes" history is left unsealed so that notes can be added to closed cases.

We generally recommend that your schedules delete both sealed and unsealed histories.
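A schedule's sealed-state choice acts as one more filter on each history. In this minimal sketch, the three options and their names are assumptions made for illustration.

```python
from enum import Enum


class SealedChoice(Enum):
    SEALED = "sealed"      # historic: no further events can be added
    UNSEALED = "unsealed"  # active: new events can still be added
    BOTH = "both"          # generally recommended


def matches_sealed_state(is_sealed: bool, choice: SealedChoice) -> bool:
    """Hypothetical check of a history's sealed flag against the schedule's choice."""
    if choice is SealedChoice.BOTH:
        return True
    return is_sealed == (choice is SealedChoice.SEALED)
```

This shows why the recommendation matters: a schedule limited to sealed histories would never match an unsealed Case Management "notes" history, even long after the case has closed.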

Filtering History Records

Just because a history record falls outside of your retention period doesn't mean it will be deleted.

Active Process Instances

If a history record relates to an active process instance (the DRM can check this because the "labelb" is also the process instance business key), it is never deleted, even if it falls outside your retention period. The DRM looks for an active instance with a matching business key and, if it finds one, filters the history record out.
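The active-instance check is essentially a lookup of each captured history's "labelb" in the set of active business keys. This is a hypothetical model: the `History` type and the way active keys are gathered are assumptions, not the DRM's internals.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass
class History:
    labelb: str  # unique ID; equals the business key when a workflow process wrote the history


def drop_active_instances(candidates: Iterable[History],
                          active_business_keys: Set[str]) -> List[History]:
    """Remove any candidate whose labelb matches an active process instance business key."""
    return [h for h in candidates if h.labelb not in active_business_keys]
```

Using a set for the active keys keeps each lookup cheap, which matters when a schedule captures many histories.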

Closed Process Instances

When the DRM finds a closed process instance, it checks the instance's end date, counting backwards from "now" (i.e. the time the schedule runs). If the end date is within your retention period, the history record is kept; if it is older than your retention period, the record moves to the deletion stage. It doesn't matter whether your schedule uses the created or last updated date as the basis of the retention period: if there's a closed process instance, retention is always calculated from its end date.
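For a history linked to a closed instance, only the instance's end date decides the outcome. The function below is a hypothetical sketch of that comparison; treating a date exactly on the cutoff as kept is an assumption.

```python
from datetime import datetime, timedelta


def closed_instance_expired(end_date: datetime, retention: timedelta,
                            now: datetime) -> bool:
    """For a closed instance, compare only its end date; the created/last updated basis is ignored."""
    cutoff = now - retention  # count back from the time the schedule runs
    return end_date < cutoff  # assumption: an end date exactly on the cutoff is kept
```

So a history created three years ago whose process instance closed last month is kept under a one-year retention period, because the end date is still inside that period.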

No Process Instance

The DRM might not find a process instance for a history record. That could be because the history was created by something else, or because the process instance has already been deleted.

Where there is no process instance, the created or last updated date of the history record is used (depending on what you chose for your retention period).

If you create multiple schedules for the histories written by a process instance, take care over which schedule deletes the closed instance. Once the instance has gone, retention for the remaining history records is calculated from their own created or last updated dates, not from the end date of the instance.
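The switch between the instance's end date and the history's own date can be modelled as a small helper that picks the reference date for a history that has already been captured. The types and names are hypothetical; active instances are assumed to have been filtered out already, so the sketch raises an error if one slips through.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class History:
    created: datetime
    last_updated: datetime


@dataclass
class ProcessInstance:
    end_date: Optional[datetime]  # None while the instance is still active


def retention_reference_date(history: History,
                             instance: Optional[ProcessInstance],
                             use_last_updated: bool) -> datetime:
    """Pick the date the retention period is measured from (hypothetical helper)."""
    if instance is not None:
        if instance.end_date is None:
            raise ValueError("active instances are never deleted and should already be filtered out")
        return instance.end_date  # closed instance: always its end date
    # No instance found (never existed, or already deleted): use the history's own date.
    return history.last_updated if use_last_updated else history.created
```

For example, take a history created in January 2021 whose instance closed in August 2024. With a one-year retention period and a run in September 2024, the history is kept while the instance exists. If another schedule deletes the instance first, the next run measures from January 2021 and the history becomes eligible for deletion.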

Which Histories and Process Instances are Deleted?

After filtering, the DRM will have a list of histories that:

match your labels
match your chosen sealed state
don't relate to active process instances
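have a created or last updated date (whichever your schedule uses) that is older than your retention period
and, where they relate to a closed process instance, belong to an instance whose end date is also older than your retention period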

These histories are deleted. If your schedule is set to delete the terminated process instance, that instance is deleted too, and any search indexes are updated.
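Putting the selection and filtering stages together, the sketch below models how a schedule might arrive at its deletion list. It is a hypothetical, simplified model built from the steps above, not the DRM's implementation: all type and field names are assumptions, and the actual deletion of process instances and the search index updates are not modelled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional, Set


@dataclass
class History:
    labela: str
    labelb: str              # equals the business key for workflow-written histories
    labelc: Optional[str]    # None stands for a null labelc
    sealed: bool
    created: datetime
    last_updated: datetime


@dataclass
class ProcessInstance:
    end_date: Optional[datetime]  # None means the instance is still active


@dataclass
class Schedule:
    labela: str
    labelc_targets: Set[Optional[str]]
    sealed_states: Set[bool]      # {True}, {False} or {True, False}
    retention: timedelta
    use_last_updated: bool


def histories_to_delete(schedule: Schedule, histories: List[History],
                        instances: Dict[str, ProcessInstance],
                        now: datetime) -> List[History]:
    """Hypothetical model of selection followed by filtering."""
    cutoff = now - schedule.retention
    to_delete: List[History] = []
    for h in histories:
        # Selection: labels, then the history's own created/last updated date.
        if h.labela != schedule.labela or h.labelc not in schedule.labelc_targets:
            continue
        own_date = h.last_updated if schedule.use_last_updated else h.created
        if own_date >= cutoff:
            continue  # still inside the retention period
        # Filtering: sealed state, then the related process instance (if any).
        if h.sealed not in schedule.sealed_states:
            continue
        instance = instances.get(h.labelb)  # instances keyed by business key
        if instance is not None:
            if instance.end_date is None:
                continue  # active process instance: never deleted
            if instance.end_date >= cutoff:
                continue  # closed too recently: kept
        to_delete.append(h)
    return to_delete
```

Keeping every check in one loop makes it easy to see the order of the stages; a real system would more likely query the database with these conditions than loop in memory.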

Form Data

The policies for form data are far simpler than those for history records.

When you set up Form Data Retention Policies, you select the form data type (i.e. the form) and how long you want to keep its data (form submissions). Saved form data that is older than your retention period, measured from either its created or last modified date, is deleted.
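Because form data has no labels, sealed state or process instances to consider, the rule reduces to one date comparison per submission. The sketch below is illustrative only; `FormSubmission` and `expired_form_data` are hypothetical names, and a date exactly on the cutoff is assumed to be kept.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class FormSubmission:
    form_type: str
    created: datetime
    last_modified: datetime


def expired_form_data(submissions: List[FormSubmission], form_type: str,
                      retention: timedelta, use_last_modified: bool,
                      now: datetime) -> List[FormSubmission]:
    """Hypothetical sketch: saved submissions of one form type that are older than the retention period."""
    cutoff = now - retention
    expired: List[FormSubmission] = []
    for s in submissions:
        if s.form_type != form_type:
            continue  # a policy applies to a single form data type
        reference = s.last_modified if use_last_modified else s.created
        if reference < cutoff:  # assumption: a date exactly on the cutoff is kept
            expired.append(s)
    return expired
```

As with histories, the choice of date matters: a submission created two years ago but edited last month is expired under a created-date policy, but not under a last-modified policy.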

Last modified on 17 September 2024
