indexData(ctx, tablename, version, keytype, mappings, sqlFilter, queryPages, batchsize, commitsize, progresscallback, progressrowcount)
This function indexes data from a table into the SOLR site collection so it can be searched.
Parameters
Name | Type | Description |
---|---|---|
ctx | Struct, required | Database details, see above |
tablename | String, required | The name of the table to index. Name will be prepended with |
version | Integer, required | The version of the imported data to index |
keytype | String, required | A name for the SOLR collection (SOLR keytype) to index the data into |
mappings | Array of objects, required | An array of mappings. These set which data is indexed into which fields |
mappings[n].primaryKey | Boolean, required | Indicates that this mapping provides the primary key which uniquely identifies SOLR documents. If the rows in your data have a unique identifier, use that. Can also be constructed from a combination of values (see below) |
mappings[n].tableCols | Array of strings, required | The names of columns that will provide the data for this mapping |
mappings[n].solrField | String, optional | The name of the top level field in the SOLR document where this data will be indexed/stored. See the Search Indexing article for a list of field names |
mappings[n].solrObjField | String, optional | The name of the dynamic object field in the SOLR document where this data will be indexed/stored |
mappings[n].type | String, optional | The type of data that will be stored in SOLR. One of "VARCHAR", "INTEGER", "FLOAT", "DATETIME", "LATLONG" |
mappings[n].formatters | Array of function names, optional | An array of callback functions that format a value before it is indexed into SOLR. The functions are called in order, and each must return a formatted version of the value. These formatters work in the same way as the formatters described for importing data (see the sketch after this table) |
sqlFilter | String, optional | A clause that can be anded to the SQL query to reduce the number of rows that are read from the database and indexed into SOLR |
queryPages | Integer, optional | The number of separate queries that will be used to load the data. Defaults to 1 |
batchsize | Integer, optional | The number of rows indexed in each batch. Defaults to 200 |
commitsize | Integer, optional | The number of rows that are indexed before they are committed. Defaults to 1000 |
progresscallback | Function, optional | A callback function which is told how many rows have been processed. The function must not return a value. It takes two arguments: nRead (Integer), the number of rows that have been read; and nWritten (Integer), the number of rows that have been indexed |
progressrowcount | Integer, optional | The number of rows after which the progresscallback function is called. Defaults to 250 |
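As a sketch only, a formatter is an ordinary function referred to by name in a mapping. The function and mapping names below are illustrative, and it is assumed here that a formatter receives the column value as its single argument:
<!--- illustrative formatter: trims and upper-cases a value before it is indexed --->
<cffunction name="tidyPostcode" returntype="String">
	<cfargument name="value" type="String" required="yes">
	<cfreturn UCase(Trim(ARGUMENTS.value))>
</cffunction>
<!--- the mapping refers to the formatter by name --->
<cfset postcodeMapping={"tableCols":["PostCode"], "solrField":"summary", "type":"VARCHAR", "formatters":["tidyPostcode"]}>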
Returns
Name | Type | Description |
---|---|---|
nRead | Integer | The number of rows read from the database
nWritten | Integer | The number of rows indexed in SOLR
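These two values are presumably returned together in a struct; a minimal sketch of reading them (assuming struct keys nRead and nWritten, and using the variables set up in the example below):
<cfset result=di.indexData(ctx=#di#, tablename=#tablename#, version=14, keytype="names", mappings=#table2solr#)>
<cfoutput>Read: #result.nRead#, Written: #result.nWritten#<br></cfoutput>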
Example
In this example we're indexing a version of some data that has already been imported. If you are indexing data at the same time as it is imported, you can use the version returned by the import instead.
<cfset ds=APPLICATION.datasource>
<cfset dt=APPLICATION.databasetype>
<cfset tablename="timstable">
<!--- get a data importer --->
<cfmodule template="/icm/admin/dataimport/importer_v1.cfm" name="di" datasource=#ds# databasetype=#dt#>
<!--- database table to SOLR mappings --->
<cfset table2solr=[
{"tableCols":["FirstName", "LastName", "PostCode"], "primaryKey":true},
{"tableCols":["FirstName"], "solrField":"title", "type":"VARCHAR"},
{"tableCols":["FirstName", "LastName", "PostCode"], "solrField":"summary", "type":"VARCHAR"}
]>
<!--- index data from the table into solr --->
<cfset di.indexData(ctx=#di#,
tablename=#tablename#,
version=14,
keytype="names",
mappings=#table2solr#,
progresscallback=progress,
progressrowcount=200
)>
<cfoutput>...Done<br></cfoutput><cfflush>
<!--- custom progress updates --->
<cffunction name="progress" returntype="Void">
<cfargument name="nRead" type="Numeric" required="yes">
<cfargument name="nWritten" type="Numeric" required="yes">
<cfoutput>Read:#ARGUMENTS.nRead#, Written:#ARGUMENTS.nWritten#<br></cfoutput><cfflush>
</cffunction>
In SOLR each document looks like this:
{
"custom1": "names",
"custom2": "14",
"keyid": "Alesia Katie BL4 7AF",
"nkeyid": 0,
"groupkey": "names(14)Alesia Katie BL4 7AF",
"keytype": "names(14)",
"metadata": "",
"parentdata": "",
"summary": "Alesia Katie BL4 7AF",
"title": "Alesia",
"id": "names(14)Alesia Katie BL4 7AF",
"securitydata": "0",
"OBJECT_C__title": "Alesia"
}
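The optional parameters can be added to the same call. The sketch below builds on the example above; the sqlFilter clause is illustrative and assumes the table has a Deleted column:
<!--- only index live rows, in larger batches --->
<cfset di.indexData(ctx=#di#,
	tablename=#tablename#,
	version=14,
	keytype="names",
	mappings=#table2solr#,
	sqlFilter="Deleted = 0",
	batchsize=500,
	commitsize=2000
)>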