Toggle menu

Data Storage - Strengths and Weaknesses

Each of the platform's storage options has been built for a specific purpose, which means they have their own strengths and weaknesses. Being aware of them will help you choose the correct storage for the service you are building.

Summary

StorageQuery TypeRead Only/EditablePersistence
HistorySQL StyleRead Only (once written)Very long (Until deleted - Deletion policies set by the data retention manager)
Form Data/ObjectsSolr SearchEditableLong (Until deleted)
WorkflowSQL + SolrEditableMedium (Length of process + months)
Session StoreNoneEditableShort (Session length)

History

History is an immutable store of data (although complete histories can be deleted). A history will consume and store any shape of JSON object. Once stored you can then query for any part of that JSON object. Over time the database backing this will end up with tens of millions of rows, as the JSON object is deconstructed into name and value pairs - that's by design. Think of it as schema on read. The data is stored as a stream of time based events against a 5 key index. A history can be sealed which means that no more data can be written into that history.

Strengths and Weaknesses

  • Once written they cannot be changed so good for auditing and compliance
  • Is a date organised stream of events, that can be granularly queried by any element in the JSON packet
  • Focused on fast writes, or a very specific reading of an event/history
  • Not currently backed by a search index
  • Can have poor performance if doing wide ranging report-type queries, do not use it for this
  • If asked to return all data for all time, it will attempt this, often at the expense of the database server's health
  • Is one of the platform's main transaction engines, so do not use it for reporting (read replica coming soon)
  • Data can be "digested" into other tables for reporting or general SQL queries. If you have to produce lots of digests you are probably using the wrong storage type

Form Data and Objects

Objects provide Read-Write object (JSON) based storage, for mutable data. The objects table has one row per object, and schemas are simply created using the forms designer. The database is backed by a specialised search index (currently a Solr index). Any field can be marked as searchable. Fields can also be marked as geographic coordinates for searching. Due to the search index you are able to bucket and facet data very quickly.

Strengths and Weaknesses

  • Quick and simple to store, read and search. The search interface is particularly quick for range queries or facets. Much more difficult to focus on specific field value pairs as you will need to use the search rather than direct matching
  • Solr indexing supports searches by geospatial coordinates
  • The search is eventually consistent, so there can be several seconds lag as the index updates (depending on the queue).
  • The data can be accessed within iCM using the form data browser, due to this it is easy to set permissions to give access to certain iCM users
  • The form data browser allows data to be exported as XML
  • Lack of an audit for the data changing - only the last update time and user is recorded
  • If the schema changes, the old data is not modified, however data can be coerced into the new schema by reading and re-writing it. So new data will be given it's default values and data with no place in the schema will simply not then be stored. So if you really needed to ensure all data matches the current schema, you would have to read and write all the data

User and User Profiles

A user's core properties are stored in database tables (including things like display name, email address, last login) whilst the user's profile is backed by objects. This is a mutable data store, where data will live as long as the user is active. Users can be marked as inactive.

Strengths and Weaknesses

  • Data that needs very quick common access is stored in the database columns, whilst the flexibility and searchability of objects powers the user profile
  • Some data is duplicated in the profile object for ease of use and search capability
  • Data is mutable and does not have a long term audit, this is both a strength and a weakness for GDPR compliance
  • Users can also be linked to one another, links will have metadata describing their purpose

Workflow

The workflow engine is also backed by a database and search index, it is also a mutable store. Workflow will store information about the process instance and also a limited set of data about user tasks. The workflow engine is a transaction engine therefore is designed to hold the data whilst the interactions are in flight or for a short time afterwards. It is not designed to be the long term storage of data. For further performance optimisation internally the workflow organises its data by active processes and complete (historic) processes.  It will store limited data in fixed columns on the database and the rest in a key-value style.

Strengths and Weaknesses

  • Can handle a large number of concurrent transactions
  • Older transactions are completely removed to ensure the transaction speed is not bogged down by years of historic data
  • Solr indexing supports searches by geospatial coordinates
  • Changes during a process can be written to a history
  • Due to the nature of the storage the more process variables you ask for when querying the more the impact on performance - only ask for what you need
  • The querying of data with workflow is extremely powerful and flexible, you also have the capacity to do wide ranging searches.
  • Workflow queries (not searches) will truncate process variables returned to 20,000, this is still an awful lot of data so ensure your queries are focused in order to not materially impact the database instance

Secure Filestore

The filestore is a binary blob store, mostly used by other components of the platform. For example, if workflow or forms need to store binary objects they will offload the storage to the filestore and create a reference to the binary object that is stored. The filestore will take the binary blob, encrypt it and then keep it stored whilst there are references to it, once all references are removed it is likely to be deleted during the next clean up. You can add and remove references via the API.

Strengths and Weaknesses

  • Allows the other systems to not have to worry about large binary objects
  • The content is encrypted on disk, so virus checking the encrypted object is pointless as none of the virus signatures would match
  • Due to the files being encrypted they cannot be executed on the server where they are stored
Last modified on 16 May 2023

Share this page

Facebook icon Twitter icon email icon

Print

print icon