
indexData

indexData(ctx, tablename, version, keytype, mappings, sqlFilter, queryPages, batchsize, commitsize, progresscallback, progressrowcount)

This function indexes data from a table into the SOLR site collection so it can be searched.

Parameters

Name | Type | Description
ctx | Struct, required | Database details, see above
tablename | String, required | The name of the table to index. The name is prefixed with DI_
version | Integer, required | The DI_version number of the data to index
keytype | String, required | A name for the SOLR collection (SOLR keytype) to index the data into
mappings | Array of objects, required | An array of mappings. These set which data is indexed into which fields
mappings[n].primaryKey | Boolean, required | Indicates that this mapping provides the primary key which uniquely identifies SOLR documents. If the rows in your data have a unique identifier, use that. The key can also be constructed from a combination of column values (see the example below)
mappings[n].tableCols | Array of strings, required | The names of the columns that provide the data for this mapping
mappings[n].solrField | String, optional | The name of the top level field in the SOLR document where this data will be indexed/stored. See the Search Indexing article for a list of field names
mappings[n].solrObjField | String, optional | The name of the dynamic object field in the SOLR document where this data will be indexed/stored
mappings[n].type | String, optional | The type of data that will be stored in SOLR. One of "VARCHAR", "INTEGER", "FLOAT", "DATETIME", "LATLONG"
mappings[n].formatters | Array of function names, optional | An array of callback functions that format a value before it is inserted into the database table. The functions are called in order. Each function must return a formatted version of the data value. These formatters work in the same way as the formatters described for indexing data
sqlFilter | String, optional | A clause that is ANDed onto the SQL query to reduce the number of rows that are read from the database and indexed into SOLR
queryPages | Integer, optional | The number of separate queries used to load the data. Defaults to 1
batchsize | Integer, optional | The number of rows indexed in each batch. Defaults to 200
commitsize | Integer, optional | The number of rows that are indexed before they are committed. Defaults to 1000
progresscallback | Function, optional | A callback function which is told how many rows have been processed. The function must not return a value. It takes two arguments: nRead (Integer), the number of rows that have been read, and nWritten (Integer), the number of rows that have been indexed
progressrowcount | Integer, optional | The number of rows after which the progresscallback function is called. Defaults to 250
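
The optional filtering and sizing parameters can be combined in a single call. The following is a minimal sketch, assuming that di and table2solr have been set up as in the Example section of this page. The filter expression and the sizes are illustrative assumptions, not recommended values, and the exact SQL syntax accepted by sqlFilter may depend on your database type.

```cfml
<!--- Sketch only: di and table2solr are assumed to exist, as in the Example section --->
<!--- The PostCode filter and the page/batch/commit sizes are illustrative values --->
<cfset di.indexData(ctx=#di#,
        tablename="timstable",
        version=14,
        keytype="names",
        mappings=#table2solr#,
        sqlFilter="PostCode LIKE 'BL4%'",
        queryPages=4,
        batchsize=500,
        commitsize=2000
)>
```

Any parameter left out keeps the default listed in the table above.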

Returns

Name | Type | Description
nRead | Integer | The number of rows read from the database
nWritten | Integer | The number of rows indexed in SOLR
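
This page does not state how the two values are packaged. The sketch below assumes they come back as a struct with nRead and nWritten keys, and that di and table2solr are set up as in the Example section. Check the shape of the return value in your own installation before relying on it.

```cfml
<!--- Assumption: indexData returns a struct with nRead and nWritten keys --->
<cfset result=di.indexData(ctx=#di#,
        tablename="timstable",
        version=14,
        keytype="names",
        mappings=#table2solr#
)>
<!--- report the totals once indexing has finished --->
<cfoutput>Read:#result.nRead#, Indexed:#result.nWritten#<br></cfoutput>
```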

Example

In this example we're indexing a version of some data that has already been imported. If you are indexing data at the same time as it is imported, you can use the version returned from importData. A progress callback has been included that outputs the number of rows read and indexed so far.

<cfset ds=APPLICATION.datasource>
<cfset dt=APPLICATION.databasetype>
<cfset tablename="timstable">
<!--- get a data importer --->
<cfmodule template="/icm/admin/dataimport/importer_v1.cfm" name="di" datasource=#ds# databasetype=#dt#>
<!--- database table to SOLR mappings --->
<cfset table2solr=[
    {"tableCols":["FirstName", "LastName", "PostCode"], "primaryKey":true},
    {"tableCols":["FirstName"], "solrField":"title", "type":"VARCHAR"},
    {"tableCols":["FirstName", "LastName", "PostCode"], "solrField":"summary", "type":"VARCHAR"}
]>
<!--- index data from the table into solr --->
<cfset di.indexData(ctx=#di#,
        tablename=#tablename#,
        version=14,
        keytype="names",
        mappings=#table2solr#,
        progresscallback=progress,
        progressrowcount=200
)>
<cfoutput>...Done<br></cfoutput><cfflush>
<!--- custom progress updates --->
<cffunction name="progress" returntype="Void">
    <cfargument name="nRead" type="Numeric" required="yes">
    <cfargument name="nWritten" type="Numeric" required="yes">
    <cfoutput>Read:#ARGUMENTS.nRead#, Written:#ARGUMENTS.nWritten#<br></cfoutput><cfflush>
</cffunction>

In SOLR each document looks like this:

{
    "custom1": "names",
    "custom2": "14",
    "keyid": "Alesia Katie BL4 7AF",
    "nkeyid": 0,
    "groupkey": "names(14)Alesia Katie BL4 7AF",
    "keytype": "names(14)",
    "metadata": "",
    "parentdata": "",
    "summary": "Alesia Katie BL4 7AF",
    "title": "Alesia",
    "id": "names(14)Alesia Katie BL4 7AF",
    "securitydata": "0",
    "OBJECT_C__title": "Alesia"
}
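
The identifier fields in this document follow a visible pattern: keyid is the primary key columns joined with spaces, keytype is the supplied keytype followed by the version in brackets, and id is the two concatenated. The sketch below reproduces that pattern from the sample values. It is an illustration of what the sample shows, not the importer's actual implementation, and the variable names are hypothetical.

```cfml
<cfscript>
// Illustration only: mimics the pattern visible in the sample document,
// not the importer's own code. All variable names here are hypothetical.
keytype = "names";
version = 14;
row = {"FirstName":"Alesia", "LastName":"Katie", "PostCode":"BL4 7AF"};
keyCols = ["FirstName", "LastName", "PostCode"];

// Collect the primary key column values in mapping order
parts = [];
for (col in keyCols) {
    arrayAppend(parts, row[col]);
}

// Join them with a single space to form the keyid
keyid = arrayToList(parts, " ");

// The stored keytype carries the version in brackets
solrKeytype = keytype & "(" & version & ")";

// The document id is the versioned keytype followed by the keyid
solrId = solrKeytype & keyid;

writeOutput(solrId);
</cfscript>
```

With the sample values this gives the same string as the id field above, which suggests that the primary key columns need to be unique in combination within a given keytype and version.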

Last modified on September 30, 2022
