Files are stored on disk (encrypted by default), with a corresponding file record stored in the database in the following tables:
File Details
The following is the JSON representation of a file record (i.e. the file details) as stored in the database:
{
  // Unique file identifier used in further requests
  "id": "98ceef29-33d5-4eee-8a5d-25ed07111936",
  // Original file name, as supplied when the file was stored
  "filename": "pusheen.png",
  // MIME Type, as supplied when the file was stored
  "type": "image/png",
  // MD5 hash of the file
  "hash": "ba44f7f8f88543fbf91f9b9eb0217cfe",
  // Size of the file in bytes
  "size": 37029,
  // References to the file
  "references": [{
    "refType": "OBJECT",
    "refIdentifier": "1"
  }],
  // When the file record was first created
  "created": "2016-02-22T11:37Z",
  // Optional creator identifier, see 'uploaderId' URL parameter
  "createdBy": "Pusheen",
  // When the file record was last updated
  "lastUpdated": "2016-02-22T11:37Z",
  // Optional updater identifier, see 'uploaderId' URL parameter
  "lastUpdatedBy": "Pusheen"
}
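To illustrate the record's shape, the sketch below loads the JSON above (with the // comments removed, since JSON does not allow comments) into Python dataclasses. The class and field names are hypothetical, and the minute-precision UTC timestamp format is inferred from the example values only.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

# Timestamp layout seen in the example record, e.g. "2016-02-22T11:37Z".
# ASSUMPTION: inferred from the example only (minute precision, UTC).
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%MZ"


def parse_timestamp(value: str) -> datetime:
    """Parse an example-style timestamp into a timezone-aware UTC datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


@dataclass
class FileReference:
    ref_type: str        # e.g. "OBJECT", "WORKFLOW", "HISTORY", "SESSION"
    ref_identifier: str


@dataclass
class FileRecord:
    id: str
    filename: str
    type: str            # MIME type as supplied on upload
    hash: str            # MD5 hex digest
    size: int            # bytes
    created: datetime
    last_updated: datetime
    references: List[FileReference] = field(default_factory=list)
    created_by: Optional[str] = None       # optional, see 'uploaderId'
    last_updated_by: Optional[str] = None  # optional, see 'uploaderId'

    @classmethod
    def from_json(cls, text: str) -> "FileRecord":
        """Build a FileRecord from the record's JSON (comments stripped)."""
        data = json.loads(text)
        return cls(
            id=data["id"],
            filename=data["filename"],
            type=data["type"],
            hash=data["hash"],
            size=data["size"],
            created=parse_timestamp(data["created"]),
            last_updated=parse_timestamp(data["lastUpdated"]),
            references=[
                FileReference(r["refType"], r["refIdentifier"])
                for r in data.get("references", [])
            ],
            created_by=data.get("createdBy"),
            last_updated_by=data.get("lastUpdatedBy"),
        )
```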
File References
File references exist in the file store to solve the following issue:
- Multiple items (ITEMA, ITEMB) have an interest in the same file (FILEA)
- When an item is deleted, it must delete the files it references so that they are not left orphaned in the file store, taking up disk space
- Upon being deleted, ITEMA deletes FILEA
- ITEMB no longer has access to FILEA
To demonstrate how file references solve this issue:
- FormA has a file upload field and two form actions: DATABASESAVE and STARTWORKFLOW
- On submit, the form session (SESSION), DATABASESAVE, and STARTWORKFLOW each add a reference to the file. The form session adds its reference when the page containing the file upload field is submitted, not when the whole form is submitted
- When the form session ends, the Forms Service issues a deleteFiles call specifying its refDetails. The SESSION file reference is deleted, but the file still exists. Had no other references been added before the form session ended (i.e. the form had no action fields), the file could be deleted at this point
- When the workflow instance ends, the Workflow worker issues a deleteFiles call specifying its refDetails. The WORKFLOW file reference is deleted, but the file still exists
- Some time later, an iCM admin user deletes, via iCM, the form data saved by the DATABASESAVE action. iCM issues a deleteFiles call specifying its refDetails. As this is the last file reference, the file can now be deleted from the file store
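The reference lifecycle in this walkthrough can be modelled as a set of references per file. The sketch below is a hypothetical in-memory model, not the File Store's implementation: `FileStoreModel`, `add_reference` and `delete_files` are illustrative names, and it assumes, as in the walkthrough, that a deleteFiles call removes the matching references and deletes a file only once none remain.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# A reference is identified by its (refType, refIdentifier) pair.
Ref = Tuple[str, str]


class FileStoreModel:
    """Hypothetical model of file-reference bookkeeping (illustration only)."""

    def __init__(self) -> None:
        self.refs: Dict[str, Set[Ref]] = defaultdict(set)  # fileId -> references
        self.files: Set[str] = set()                        # fileIds still stored

    def store(self, file_id: str) -> None:
        self.files.add(file_id)

    def add_reference(self, file_id: str, ref: Ref) -> None:
        self.refs[file_id].add(ref)

    def delete_files(self, ref_details: Iterable[Ref]) -> Set[str]:
        """Remove the given references; delete any file left with none.

        Returns the ids of the files that were actually deleted.
        """
        wanted = set(ref_details)
        deleted: Set[str] = set()
        for file_id in list(self.files):
            held = self.refs[file_id]
            if held & wanted:
                held -= wanted          # in-place: updates self.refs[file_id]
                if not held:            # last reference gone: file is deleted
                    self.files.discard(file_id)
                    del self.refs[file_id]
                    deleted.add(file_id)
        return deleted
```

Replaying the FormA walkthrough, the SESSION and WORKFLOW calls each return an empty set and leave the file in place; only the final call, removing the OBJECT reference held for the DATABASESAVE data, deletes it.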
The JSONRPC methods deleteFiles, getFileDetails, addReferences, and removeReferences all allow an array of file references to be specified via their refDetails parameter rather than an explicit fileID.
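A call that targets files by reference rather than by fileID might be serialised as below. This is a sketch only: the JSON-RPC 2.0 envelope, passing refDetails as a named parameter, and the `build_request` helper are all assumptions, not the documented wire format.

```python
import json
from itertools import count
from typing import Dict, List

# Methods that accept refDetails, per the paragraph above.
REF_METHODS = {"deleteFiles", "getFileDetails", "addReferences", "removeReferences"}

_request_ids = count(1)


def build_request(method: str, ref_details: List[Dict[str, str]]) -> str:
    """Serialise a JSON-RPC call that selects files by their references.

    ASSUMPTIONS: JSON-RPC 2.0 envelope; refDetails passed as a named
    parameter holding a list of {"refType", "refIdentifier"} objects.
    """
    if method not in REF_METHODS:
        raise ValueError(f"unsupported method: {method}")
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": method,
        "params": {"refDetails": ref_details},
    })
```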
Reference Identifiers
Standard platform components structure their file references as follows.
Objects created by form submissions using the database save action use the object type (i.e. the form name) and the object label, separated by an underscore:
{
"refIdentifier": "FORM_COPYOFFILEUPLOAD_17853C4D-A07A-4C01-9CEF-3069BD6F8F23",
"refType": "OBJECT"
}
Workflow processes use the business key:
{
"refIdentifier": "5753-0922-2693-3183",
"refType": "WORKFLOW"
}
History records use the five history labels, each quoted (or null if unset) and separated by underscores:
{
"refIdentifier": "\"fileupload\"_\"1659-0688-1032-4145\"_null_null_null",
"refType": "HISTORY"
}
Form sessions use the session ID:
{
"refIdentifier": "3F532135-7204-48BD-B0CF-5D3F20B2B516",
"refType": "SESSION"
}
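These conventions can be captured in small helper functions. The helper names are hypothetical, and the HISTORY format (each label JSON-encoded, so strings are quoted and unset labels become null, then joined with underscores) is inferred from the single example above and may not cover every case.

```python
import json
from typing import Dict, Optional, Sequence


def make_ref(ref_type: str, identifier: str) -> Dict[str, str]:
    return {"refType": ref_type, "refIdentifier": identifier}


def object_ref(object_type: str, label: str) -> Dict[str, str]:
    # Object type and object label joined by an underscore.
    return make_ref("OBJECT", f"{object_type}_{label}")


def workflow_ref(business_key: str) -> Dict[str, str]:
    return make_ref("WORKFLOW", business_key)


def session_ref(session_id: str) -> Dict[str, str]:
    return make_ref("SESSION", session_id)


def history_ref(labels: Sequence[Optional[str]]) -> Dict[str, str]:
    # ASSUMPTION (inferred from the example): each label is written as a
    # JSON value, i.e. a quoted string or null, and the five are joined by "_".
    if len(labels) != 5:
        raise ValueError("history references use exactly five labels")
    return make_ref("HISTORY", "_".join(json.dumps(label) for label in labels))
```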
Deleting Files
There is often some confusion about file references and deleting files. If you store a file, it is your responsibility to delete it. Files are not automatically removed from the File Store when all references to them are removed unless you set
File Encryption
By default, files are encrypted before they are stored on the file system. The key and initialization vector (IV) are randomly generated per file and stored separately, in the file record in the database. To decrypt a file you must therefore have both the encrypted file and its file record.
Encryption Algorithm Details
| Family | Key Size | Mode | Padding |
|---|---|---|---|
| AES | 128 bits | CTR | NoPadding |
Files are decrypted upon retrieval via the http/file/<fileId> method. CTR mode was chosen because, unlike chained modes such as CBC, it allows both encryption and decryption to be performed in parallel. In file storage and retrieval tests, CTR mode was consistently 3-4 times faster than other popular modes such as CBC.
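The sketch below shows AES-128 in CTR mode with a random per-file key and IV, using the third-party Python `cryptography` package as a stand-in; it is not the File Store's own code. It also shows why CTR parallelises: any block-aligned chunk can be decrypted on its own by starting the counter at IV + block index. `ctr_transform` and `counter_at` are hypothetical helper names.

```python
import os
from typing import Tuple

from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

BLOCK = 16  # AES block size in bytes


def new_key_and_iv() -> Tuple[bytes, bytes]:
    """Random 128-bit key and 128-bit initial counter block, one pair per file."""
    return os.urandom(16), os.urandom(16)


def ctr_transform(key: bytes, iv: bytes, data: bytes) -> bytes:
    """AES-128-CTR with no padding; the same operation encrypts and decrypts,
    because CTR simply XORs the data with a keystream."""
    ctx = Cipher(algorithms.AES(key), modes.CTR(iv)).encryptor()
    return ctx.update(data) + ctx.finalize()


def counter_at(iv: bytes, offset: int) -> bytes:
    """Counter block for a block-aligned byte offset.

    The `cryptography` package (via OpenSSL) treats the IV as a 128-bit
    big-endian counter incremented once per block, so block k uses IV + k.
    That independence is what lets chunks be processed in parallel.
    """
    if offset % BLOCK:
        raise ValueError("offset must be a multiple of the block size")
    value = (int.from_bytes(iv, "big") + offset // BLOCK) % (1 << 128)
    return value.to_bytes(16, "big")
```

Because no padding is applied, the ciphertext is exactly as long as the plaintext, which matches the NoPadding entry in the table.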
File encryption can be disabled via the encryptFiles worker configuration option. Note that setting encryptFiles to false will only affect files uploaded from that point onwards. Files that were previously uploaded while encryptFiles was true will still be encrypted.
File Integrity
Retrieved files are guaranteed to be bit-for-bit identical to the files that were stored.
Upon file upload, an MD5 hash is taken of the (unencrypted) uploaded file. After encryption (if enabled), another MD5 hash is taken of the stored file. Both hashes are stored with the file record.
Upon file retrieval, the stored (possibly encrypted) file is validated against the stored-file hash, and after decryption the decrypted file is verified against the hash taken before encryption.
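The two-hash scheme can be sketched with the standard library's `hashlib`. The `store`/`retrieve` functions, the `StoredFile` fields `plain_md5` and `stored_md5`, and `IntegrityError` are hypothetical names; the record example above shows only a single `hash` field. Encryption is passed in as a function so the same code covers encryptFiles set to true or false.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable

Transform = Callable[[bytes], bytes]  # encrypt or decrypt (identity if disabled)


@dataclass
class StoredFile:
    data: bytes        # bytes as written to disk (possibly encrypted)
    plain_md5: str     # MD5 of the uploaded, unencrypted file
    stored_md5: str    # MD5 of the bytes as stored


class IntegrityError(Exception):
    pass


def md5_hex(data: bytes) -> str:
    return hashlib.md5(data).hexdigest()


def store(upload: bytes, encrypt: Transform) -> StoredFile:
    plain_md5 = md5_hex(upload)   # hash taken before encryption
    stored = encrypt(upload)      # identity when encryption is disabled
    return StoredFile(stored, plain_md5, md5_hex(stored))


def retrieve(record: StoredFile, decrypt: Transform) -> bytes:
    # 1. Check the stored bytes against the stored-file hash.
    if md5_hex(record.data) != record.stored_md5:
        raise IntegrityError("stored file does not match its stored hash")
    # 2. Decrypt, then check against the hash taken at upload time.
    plain = decrypt(record.data)
    if md5_hex(plain) != record.plain_md5:
        raise IntegrityError("decrypted file does not match the upload hash")
    return plain
```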
Storage Paths
NTFS slows down considerably when accessing a directory that contains many files. For this reason, files are stored in a three-level directory structure derived from the first three characters of the public file ID (a UUID), one directory level per character. Each character is a hexadecimal digit, giving 16^3 = 4096 possible leaf directories, and because these characters are effectively random, files are spread evenly across them.
For example, the file 01f5fc00-64c5-4888-a8d4-c614b758374d.zip will be stored at <rootStorageDir>/0/1/f/01f5fc00-64c5-4888-a8d4-c614b758374d.zip.
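The path layout can be computed as follows. `storage_path` is a hypothetical helper, and `PurePosixPath` is used only so the output matches the forward-slash example above.

```python
import uuid
from pathlib import PurePosixPath


def storage_path(root: str, stored_name: str) -> PurePosixPath:
    """Place a file under one directory level per leading UUID character."""
    file_id = stored_name.split(".", 1)[0]
    uuid.UUID(file_id)  # raises ValueError if the name is not UUID-based
    return PurePosixPath(root, file_id[0], file_id[1], file_id[2], stored_name)
```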