
Core Concepts

Files are stored on disk, encrypted by default, with a corresponding file record stored in the database in the following tables:

[Figure: File Store Worker Database Tables]

File Details

The following is the JSON representation of a file record (i.e. the file details), as stored in the database:

{
    // Unique file identifier used in further requests
    "id": "98ceef29-33d5-4eee-8a5d-25ed07111936",
    // Original file name, as supplied when the file was stored
    "filename": "pusheen.png",
    // MIME Type, as supplied when the file was stored
    "type": "image/png",
    // MD5 hash of the file
    "hash": "ba44f7f8f88543fbf91f9b9eb0217cfe",
    // Size of the file in bytes
    "size": 37029,
    // References to the file
    "references": [{   
        "refType": "OBJECT",
        "refIdentifier": "1"  
    }],
    // When the file record was first created
    "created": "2016-02-22T11:37Z",
    // Optional creator identifier, see 'uploaderId' URL parameter
    "createdBy": "Pusheen",
    // When the file record was last updated
    "lastUpdated": "2016-02-22T11:37Z",
    // Optional updater identifier, see 'uploaderId' URL parameter
    "lastUpdatedBy": "Pusheen"
}

File References

File references exist in the file store to solve the following issue:

  • Multiple items (ITEMA, ITEMB) have an interest in the same file (FILEA)
  • When an item is deleted it must delete files that it references so that they are not left orphaned in the file store taking up disk space
  • Upon being deleted, ITEMA deletes FILEA
  • ITEMB no longer has access to FILEA

To demonstrate how file references solve this issue:

  • FormA has a file upload field and two form actions: DATABASESAVE and STARTWORKFLOW
  • On submit, the form SESSION, DATABASESAVE, and STARTWORKFLOW each add a reference to the file. The form session creates its reference when the page containing the file upload field is submitted, not when the whole form is submitted
  • When the form session ends, the Forms Service issues a deleteFiles call specifying its refDetails. The file still exists, but the SESSION file reference is deleted. Had no other references been added before the form session ended (i.e. the form didn't have any action fields), the file could have been deleted at this point
  • When the workflow instance ends, the Workflow worker issues a deleteFiles call specifying its refDetails. The file still exists, but the WORKFLOW file reference is deleted
  • Some time later an iCM admin user deletes the form data saved by the DATABASESAVE field type via iCM. iCM issues a deleteFiles call specifying its refDetails. As this is the last file reference, the file can now be deleted from the file store
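
The walkthrough above can be sketched with a small in-memory model. This is only an illustration of the reference-counting behaviour described here, not the File Store worker's implementation: the class FileStoreModel, its method names, and the identifiers "session-1", "workflow-1" and "object-1" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FileStoreModel:
    """Toy in-memory stand-in for the file store (hypothetical, not the real worker)."""
    # Maps a file id to the set of (refType, refIdentifier) pairs that reference it.
    files: dict = field(default_factory=dict)

    def add_reference(self, file_id, ref_type, ref_identifier):
        self.files.setdefault(file_id, set()).add((ref_type, ref_identifier))

    def delete_files(self, ref_type, ref_identifier):
        """Drop the caller's reference, then delete any file left with no references."""
        ref = (ref_type, ref_identifier)
        deleted = []
        for file_id in list(self.files):
            refs = self.files[file_id]
            if ref in refs:
                refs.discard(ref)
                if not refs:
                    del self.files[file_id]
                    deleted.append(file_id)
        return deleted


store = FileStoreModel()
for ref_type, ref_id in [("SESSION", "session-1"),
                         ("OBJECT", "object-1"),
                         ("WORKFLOW", "workflow-1")]:
    store.add_reference("FILEA", ref_type, ref_id)

after_session = store.delete_files("SESSION", "session-1")     # form session ends
after_workflow = store.delete_files("WORKFLOW", "workflow-1")  # workflow instance ends
after_object = store.delete_files("OBJECT", "object-1")        # saved form data deleted
```

In this model the first two calls return an empty list because other references remain; only the final call, which removes the last reference, reports FILEA as deleted.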

The JSON-RPC methods deleteFiles, getFileDetails, addReferences, and removeReferences all allow an array of file references to be specified via their refDetails parameter rather than an explicit file ID.
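
As a rough sketch of what such a call might look like, the snippet below builds a JSON-RPC 2.0 request body that identifies files by reference. The method name and the refDetails parameter come from this page; the helper build_request and the exact layout of the params object are assumptions, so check the method documentation before relying on them.

```python
import json


def build_request(method, ref_details, request_id=1):
    """Build a JSON-RPC 2.0 envelope. The params layout is an assumption."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": method,
        # Files are selected by reference rather than by explicit file id.
        "params": {"refDetails": ref_details},
    }


request = build_request(
    "deleteFiles",
    [{"refType": "WORKFLOW", "refIdentifier": "5753-0922-2693-3183"}],
)
body = json.dumps(request)
```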

Reference Identifiers

Standard platform components structure their file references as follows.

Objects created by form submissions using the database save action use the object type (ie the form name) and the object label, separated by an underscore:

{
    "refIdentifier": "FORM_COPYOFFILEUPLOAD_17853C4D-A07A-4C01-9CEF-3069BD6F8F23",
    "refType": "OBJECT"
}

Workflow processes use the business key:

{
    "refIdentifier": "5753-0922-2693-3183",
    "refType": "WORKFLOW"
}

History records use the five history labels:

{
    "refIdentifier": "\"fileuplaod\"_\"1659-0688-1032-4145\"_null_null_null",
    "refType": "HISTORY"
}

Form sessions use the session ID:

{
    "refIdentifier": "3F532135-7204-48BD-B0CF-5D3F20B2B516",
    "refType": "SESSION"
}
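
The four conventions above could be reproduced with helpers along these lines. The function names are hypothetical, and the history format (each label JSON-encoded, then joined with underscores) is inferred from the single example shown here rather than from a specification.

```python
import json


def object_ref(object_type, object_label):
    # Object type and object label, separated by an underscore.
    return {"refIdentifier": f"{object_type}_{object_label}", "refType": "OBJECT"}


def workflow_ref(business_key):
    return {"refIdentifier": business_key, "refType": "WORKFLOW"}


def history_ref(labels):
    # Assumption: each of the five labels is JSON-encoded (None becomes null).
    if len(labels) != 5:
        raise ValueError("expected exactly five history labels")
    identifier = "_".join(json.dumps(label) for label in labels)
    return {"refIdentifier": identifier, "refType": "HISTORY"}


def session_ref(session_id):
    return {"refIdentifier": session_id, "refType": "SESSION"}


history = history_ref(["fileuplaod", "1659-0688-1032-4145", None, None, None])
saved_object = object_ref("FORM_COPYOFFILEUPLOAD", "17853C4D-A07A-4C01-9CEF-3069BD6F8F23")
```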

Deleting Files

There is often some confusion about file references and deleting files. If you store a file, it is your responsibility to delete it. A file is not automatically removed from the File Store when all references to it are removed unless you set removeFileIfUnreferenced to true (it defaults to false, see removeReferences).

Alternatively, make a deleteFiles call when you have finished working with a file. The call contains a file reference or ID; it removes your reference and attempts to delete the file. The file is deleted only if no other references exist. As described above, this functionality is built into the other workers (History, Workflow, Forms Service etc.) that interact with the File Store, so that when their content is deleted (e.g. a history is deleted or a form session ends) a delete call is also made.
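
The difference between the two calls can be modelled in a few lines. This is a sketch of the semantics described above only: the functions operate on a plain list of references for a single file, and their signatures are hypothetical rather than the worker's API.

```python
def remove_references(references, ref, remove_file_if_unreferenced=False):
    """Return (remaining references, whether the file is deleted)."""
    remaining = [r for r in references if r != ref]
    file_deleted = remove_file_if_unreferenced and not remaining
    return remaining, file_deleted


def delete_files(references, ref):
    """Remove the caller's reference and delete the file if nothing else refers to it."""
    return remove_references(references, ref, remove_file_if_unreferenced=True)


session = ("SESSION", "3F532135-7204-48BD-B0CF-5D3F20B2B516")
workflow = ("WORKFLOW", "5753-0922-2693-3183")

orphaned = remove_references([session], session)   # default flag: file stays on disk
cleaned = remove_references([session], session, remove_file_if_unreferenced=True)
shared = delete_files([session, workflow], session)  # another reference remains
```

With the default flag the last reference is removed but the file is left behind, which is the orphaned-file case the paragraph warns about.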

File Encryption

By default files are encrypted before they are stored on the file system. The key and initialization vector (IV) are randomly generated per file and stored separately in the database, in the file record. To decrypt a file you must therefore be in possession of both the encrypted file and the file record.

Encryption Algorithm Details

Family    Key Size    Mode    Padding
AES       128         CTR     NoPadding

Files are decrypted upon retrieval via the http/file/<fileId> method. CTR mode was chosen because, unlike chained modes such as CBC, it allows both encryption and decryption to be performed in parallel. In file storage and retrieval tests CTR mode proved to be consistently 3-4 times quicker than other popular modes such as CBC.
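
To illustrate why counter mode parallelises, the sketch below implements the CTR construction with a SHA-256 based stand-in for the AES block function, so that it runs with the standard library alone. It is not AES and must not be used for real encryption; the 16-byte IV and the counter layout are assumptions. The point is that each block's keystream depends only on the key, the IV and the block index, so blocks can be processed in any order.

```python
import hashlib
import secrets

BLOCK = 16  # AES block size in bytes


def keystream_block(key, iv, counter):
    # Stand-in for the AES block function: NOT AES, illustration only.
    return hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()[:BLOCK]


def ctr_block(key, iv, index, chunk):
    """Encrypt or decrypt one block. Only this block's index is needed."""
    stream = keystream_block(key, iv, index)
    # zip stops at the shorter input, so a short final block needs no padding.
    return bytes(a ^ b for a, b in zip(chunk, stream))


def ctr_transform(key, iv, data, order=None):
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    out = [None] * len(blocks)
    for index in (order if order is not None else range(len(blocks))):
        out[index] = ctr_block(key, iv, index, blocks[index])
    return b"".join(out)


key = secrets.token_bytes(16)  # 128-bit key, random per file
iv = secrets.token_bytes(16)   # random per file (size is an assumption)
plaintext = b"pusheen.png contents, longer than one block"
ciphertext = ctr_transform(key, iv, plaintext)

# Decrypt the blocks in reverse order: the result is unchanged.
n_blocks = -(-len(ciphertext) // BLOCK)
recovered = ctr_transform(key, iv, ciphertext, order=reversed(range(n_blocks)))
```

Because the same XOR step both encrypts and decrypts, and the output is the same length as the input, this also shows why the mode is paired with NoPadding.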

File encryption can be disabled via the encryptFiles worker configuration option. Note that setting encryptFiles to false will only affect files uploaded from that point onwards. Files that were previously uploaded while encryptFiles was true will still be encrypted.

File Integrity

Each retrieved file is guaranteed to be bit-for-bit identical to the file that was stored.

Upon file upload an MD5 hash is taken of the (unencrypted) uploaded file. Once the file has been written to disk (and encrypted, if encryption is enabled) another MD5 hash is taken of the stored file. Both of these hashes are stored with the file record.

Upon file retrieval the integrity of the (possibly encrypted) stored file is validated against the stored file hash, and after decryption the decrypted file's integrity is verified against the hash of the file prior to encryption.
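
The two-stage check could look roughly like this. The record field "hash" matches the file details shown earlier; "storedHash" is a hypothetical name for the second hash, and the XOR transform is a toy stand-in for the real encryption step.

```python
import hashlib


def md5_hex(data):
    return hashlib.md5(data).hexdigest()


def store_file(plain, encrypt):
    """Hash the upload, transform it, then hash what is written to disk."""
    stored = encrypt(plain)
    record = {"hash": md5_hex(plain), "storedHash": md5_hex(stored)}
    return stored, record


def retrieve_file(stored, record, decrypt):
    """Verify the stored bytes first, then the decrypted result."""
    if md5_hex(stored) != record["storedHash"]:
        raise ValueError("stored file does not match its recorded hash")
    plain = decrypt(stored)
    if md5_hex(plain) != record["hash"]:
        raise ValueError("decrypted file does not match the original hash")
    return plain


def xor_cipher(data):
    # Toy reversible transform standing in for encryption and decryption.
    return bytes(b ^ 0x5A for b in data)


stored, record = store_file(b"pusheen", xor_cipher)
restored = retrieve_file(stored, record, xor_cipher)
```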

Storage Paths

NTFS slows down considerably when accessing a directory that contains many files. For this reason files are stored in a directory structure based on the first three characters of the public file ID (a UUID), with one directory level per character. As these characters are effectively random hexadecimal digits, files are distributed evenly across the 4096 (16 x 16 x 16) possible directories.

For example the file 01f5fc00-64c5-4888-a8d4-c614b758374d.zip will be stored at <rootStorageDir>/0/1/f/01f5fc00-64c5-4888-a8d4-c614b758374d.zip.
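
A minimal sketch of this path scheme follows. The function name storage_path and the root directory "/data/filestore" are hypothetical; only the nesting rule is taken from the example above.

```python
from pathlib import PurePosixPath


def storage_path(root_storage_dir, stored_name):
    """Nest the file under one directory per leading character of its id."""
    first, second, third = stored_name[:3]
    return PurePosixPath(root_storage_dir) / first / second / third / stored_name


path = storage_path("/data/filestore", "01f5fc00-64c5-4888-a8d4-c614b758374d.zip")

# Three hexadecimal characters give 16 ** 3 leaf directories.
directory_count = 16 ** 3
```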

Last modified on August 11, 2021
