Toggle menu

importData

This method creates a database table and the defined columns, then imports values from a CSV file.

CSV columns are mapped to table columns and a range of filters and formatters can be applied to rows and values. It is also possible to create columns not present in the CSV file.

Naming

All table names are automatically prefixed di_ (data import). Although, if you provide a table name that starts with di_ the prefixing is skipped - you won't end up with di_di_tablename.

Import Version Numbers

When you first perform an import the version number is recorded in the table's di_version column as version 1. Subsequent imports into the same table create additional rows with the version number incremented by 1. The purgeImportData method can be used to tidy up old versions.

Appending Data

It is possible to append data to an existing import without increasing the version number. The existing data must have been imported with allowAppend: true. Subsequent imports should set the appendTo parameter, which is the di_version of the existing data that the import should be added to.

Metadata

Two tables hold metadata for the imports:

  • di_meta_tables_v1 records the names and latest version of each data import table
  • di_meta_versions_v1 records the table name, version, label, import start time and import complete time of each data import. The version number increases each time (non appended) data is imported into a table. The importcompleted column will be NULL if allowAppend: true when the import is performed

Parameters

NameTypeDescription
tablenameStringRequired. The name of the table to create. All table names will be automatically prefixed di_ 
fileStringOne of file or filestoreid must be provided. The path to a local file (local to the server)
filestoreidStringOne of file or filestoreid must be provided. The fileID of a file held in the platform's filestore
filecolumnsArray of objectsRequired. Each object represents a column in the CSV file and holds the column name, index, the tablecolumn mapping etc, see below for more detail
tablecolumnsArray of objectsRequired. The columns to create. Each object includes the column's name, type, size etc, see below for more detail
datasourceStringOptional. If not provided the datasource will be set as iCM datasource appended with _custom, eg icm_custom. The importer is designed to be used with its own datasource and the name of that datasource should be provided with each request - speak to your account manager about getting a datasource set up
labelStringOptional. A label to identify the import in the di_meta tables. If not set the file or filestoreid will be used
columnHeaderBooleanOptional, default: true. Whether or not the CSV has column headers. If it doesn't have headers, set the col property in the filecolumns instead, see below
rowFilterStringOptional. A JavaScript statement to filter the CSV rows as they are imported, see below
delimiterStringOptional, default: ","
batchsizeIntegerOptional, default: 50. Set a value between 50 and 1000
appendToIntegerOptional. An existing import version number to append this data to. If not set the data will be imported with a new version number. allowAppend must be set as true in the data you would like to append to
allowAppendBooleanOptional, default: false. Set true to allow future imports to be appended to this import version
sqlCopyBooleanOptional, default: false. Whether or not to use the PostgreSQL COPY FROM command when performing the import. The file must come from the filestore, or be local to the database server. You may like to experiment with this and compare the performance of the import

File Columns

The filecolumns parameter is an array of objects where each object represents a column in the CSV file.

For example, this represents some data held in a CSV about cars:

"filecolumns":[
    {"name":"ID","tablecolumn":"ID"},
    {"name":"MODEL","tablecolumn":"Model","nullvalue":"","formatters":"trim"},
    {"name":"COLOUR","tablecolumn":"Colour","nullvalue":"","formatters":"trim,lower"}
]

Each object has the following properties.

PropertyTypeDescription
nameStringRequired. The name of the column that's being imported, if the CSV file has a header row and columnHeader: true.  If the CSV doesn't have a header row this name is still used in the rowFilter.

It's possible to read a column from the CSV multiple times and insert the values into different table columns, perhaps with different formatters.

You can also set the name as "NULL" to create table columns that are not present in the CSV file, useful if you want to insert empty columns or, for example, the current date using the "now" formatter
colIntegerOptional. If the CSV file doesn't have a header row, this is the index of the column to read data from. If you set both a name and col, the col is used
tableColumnStringRequired. The table column to import into. Must be defined in the tablecolumns array, see below
nullValueStringOptional. If this value is found in the data it will be inserted as null
formattersStringOptional. One or more named formatters (comma separated) to apply to the data

Formatters

The following formatters can be applied to rows as data is imported.

NameDescription
nowInsert the current date and time as "yyyy-MM-dd HH:mm:ss"
lowerInsert the value in lowercase
upperInsert the value in uppercase
trimInsert the value with whitespace trimmed from both ends
ltrimInsert the value with whitespace trimmed from the start
rtrimInsert the value with whitespace trimmed from the end
capitaliseInsert the value with the first letter capitalised (other letters are first lowercased)
quoteInsert the value wrapped in double quotes
singlequoteInsert the value wrapped in single quotes
nospaceInsert the value with spaces removed
roundInsert the value rounded to the nearest integer

Row Filters

Row filters are JavaScript statements. Only rows in the CSV file that match the filter will be imported.

The statement is provided with variables derived from the name field in each filecolumn. 

Note than any non-alphanumeric characters in column names are replaced with underscores before being passed to the filter engine. Be careful with case sensitivity. For a name of MAG the filter could use MAG, Mag or mag, but not mAg or maG.

For example, this filter would insert rows that have either a value in the "Mag" column that is less than 5 and a value in the "Distance" column that is less than 200, or a value in the "Mag" column greater than 30:

"rowFilter": "(Mag < 5 && Distance < 200) || Mag > 30"

This filter only imports rows that have "Yes" in the "In use?" column:

"rowFilter": "IN_USE_ == 'Yes'"

Table Columns

The tablecolumns parameter is an array of objects where each object represents a column that will be created in the new database table. Note that a di_version column is automatically created in every table as described above.

PropertyTypeDescription
nameStringRequired. The name of the column
typeStringRequired. The data type
sizeIntegerRequired if type is VARCHAR
primaryKeyBooleanOptional. Whether or not this column is the primary key
nullableBooleanOptional. Whether or not the column allows null values
indexStringOptional. If set the data will be added to an index of this name
indexorderStringOptional. The position in the index, eg "1"
includeStringOptional. The names of other table columns to include as non-key columns in the index

Examples

This example imports data from four named columns in the CSV file into four columns in the table. It will be imported as a new version and wouldn't allow future imported data to be appended to it.

{
    "id": 1234567,
    "method": "importData",
    "params": {
        "datasource": "my_datasource",
        "tablename": "di_cars",
        "filestoreid": "fe7bfe91-a4f1-446f-8678-0dc8657f0a64",
        "filecolumns": [
            {"name": "ID", "tablecolumn":"ID"},
            {"name": "MANU", "tablecolumn": "Manu", "nullvalue": "", "formatters": "trim"},
            {"name" :"MODEL", "tablecolumn": "Model", "nullvalue": "", "formatters": "trim"},
            {"name":"COLOUR", "tablecolumn": "Colour", "nullvalue": "", "formatters": "trim"}
        ],
        "tablecolumns": [
            {"name": "ID", "type": "INTEGER", "primaryKey": true},
            {"name": "Manu", "type": "VARCHAR", "size": 60},
            {"name": "Model", "type": "VARCHAR", "size": 25},
            {"name": "Colour", "type": "VARCHAR", "size": 25}
        ],
        "label": "Cars"
    },
    "jsonrpc": "2.0"
}

This example imports postcode data. Only rows that are "in use" will be imported. The import uses column indexes as there are no column headers. Any empty strings in the CSV will be inserted as NULL.

{
    "id": 1234567,
    "method": "importData",
    "params": {
        "datasource": "my_datasource",
        "tablename": "postcodes",
        "filestoreid": "9b181ce4-e3f8-4c9f-913f-d652b176fe89",
        "columnHeader": false,
        "filecolumns": [
            {"name": "Postcode", "col" :1, "tablecolumn": "Postcode"},
            {"name": "Inuse", "col": 2, "tablecolumn": "Inuse"},
            {"name": "Easting", "col": 5, "tablecolumn": "Easting", "nullvalue": ""},
            {"name": "Northing", "col": 6, "tablecolumn": "Northing", "nullvalue": ""},
            {"name": "Latitude", "col": 3, "tablecolumn": "Latitude", "nullvalue": ""},
            {"name": "Longitude", "col": 4, "tablecolumn": "Longitude", "nullvalue": ""}
        ],
        "tablecolumns":[
            {"name": "Postcode", "type": "VARCHAR", "size": 25, "primaryKey": true},
            {"name": "InUse", "type": "VARCHAR", "size": 5},
            {"name": "Easting", "type": "INTEGER", "nullable": true},
            {"name": "Northing", "type": "INTEGER", "nullable": true},
            {"name": "Latitude", "type": "FLOAT", "nullable": true},
            {"name": "Longitude", "type": "FLOAT", "nullable": true}
        ],
        "rowFilter": "INUSE == 'Yes'",
        "batchsize": 500,
        "label": "PL_postcodes"
    },
    "jsonrpc": "2.0"
}

This final example adds the imported data to an existing version and creates columns in the table which don't exist in the CSV file (the three NULL columns) one of which inserts the date and time of the import.

{
    "id": 1234567,
    "method": "importData",
    "params": {
        "datasource": "my_datasource",
        "tablename":"stars10",
        "filestoreid":"4f0bba11-6a8c-4cd9-8381-1e46f4aedb92",
        "columnHeader":true,
        "filecolumns":[
            {"name": "StarID", "tablecolumn": "ID"},
            {"name": "PROPERNAME", "tablecolumn": "StarName", "nullvalue": "-"},
            {"name": "RA", "tablecolumn":" RightAsc"},
            {"name": "DEC", "tablecolumn": "Declination"},
            {"name": "DISTANCE", "tablecolumn": "Distance"},
            {"name": "MAG", "tablecolumn": "Magnitude"},
            {"name": "SPECTRUM", "tablecolumn": "Spectrum1", "formatters": "trim,lower"},
            {"name": "SPECTRUM", "tablecolumn": "Spectrum2", "formatters": "lower", "nullvalue": "-"},
            {"name": "NULL", "tablecolumn": "ImportedLocE"},
            {"name": "NULL", "tablecolumn": "ImportedLocN"},
            {"name": "NULL", "tablecolumn": "ImportedWhen", "formatters": "now"}
        ],
        "tablecolumns":[
            {"name": "ID", "type": "INTEGER", "primaryKey": true},
            {"name": "StarName", "type": "VARCHAR", "size": 60, "nullable": true},
            {"name": "RightAsc", "type": "FLOAT", "index": "radec"},
            {"name": "Declination", "type": "FLOAT", "index": "radec"},
            {"name": "Distance", "type": "FLOAT"},
            {"name": "Magnitude", "type": "FLOAT"},
            {"name": "Spectrum1", "type": "VARCHAR", "size": 30},
            {"name": "Spectrum2", "type": "VARCHAR", "size": 30, "nullable": true},
            {"name": "ImportedLocE", "type": "INTEGER", "nullable": true},
            {"name": "ImportedLocN", "type": "INTEGER", "nullable": true},
            {"name": "ImportedWhen", "type": "DATETIME"}],
        "rowFilter": "(Mag < 5 && Distance < 200) || Mag > 30",
        "batchsize": 20,
        "appendTo": 6,
        "allowAppend": true
    },
    "jsonrpc": "2.0"
}

Last modified on 6 September 2022

Share this page

Facebook icon Twitter icon email icon

Print

print icon