Search Indexing

When content is created and updated in iCM it is indexed in iCM's SOLR search engine. iCM maintains two SOLR instances, one for searching in iCM, and one for the website. Each instance has several categories of item, known as collections.

| Collection | Description | Available |
| --- | --- | --- |
| Article | The articles in iCM | Site and iCM |
| Media | Items in the iCM media library and (optionally) the media file components | Site and iCM |
| Message | Replies to forum threads | Site and iCM |
| Metadata | Metadata values in iCM | Site and iCM |
| Object | iCM objects | Site and iCM |
| PublishedEndpoint | Published End Points | iCM only |
| PublishedForm | Published Forms | iCM only |
| SiteUser | Website users | Site and iCM |
| WIPEndpoint | Work in progress End points | iCM only |
| WIPForm | Work in progress forms | iCM only |
| Workflow | Workflow process instance and task data. Use the Process Instance Search Indexing API to query this data | Site only |

Indexed Content

The content of each item in a collection is stored in fields defined in SOLR's schema.xml. The schema is part of the default iCM installation and editing it is not supported. The standard fields are:

<field name="keyid" type="string" indexed="true" stored="true" required="true" />
<field name="groupkey" type="string" indexed="true" stored="true" required="false" />
<field name="nkeyid" type="tint" indexed="true" stored="true" required="true" />
<field name="keytype" type="string" indexed="true" stored="true" required="true" />
<field name="id" type="string" indexed="true" stored="true" required="false" />
<field name="custom1" type="commaDelimLowerCase" indexed="true" stored="true" />
<field name="custom2" type="commaDelimLowerCase" indexed="true" stored="true" />
<field name="custom3" type="commaDelimLowerCase" indexed="true" stored="true" required="false" />
<field name="metadata" type="commaDelimLowerCase" indexed="true" stored="true" />
<field name="parentdata" type="commaDelimLowerCase" indexed="true" stored="true" />
<field name="title" type="text" indexed="true" stored="true" omitNorms="false" />
<field name="summary" type="text" indexed="true" stored="true" omitNorms="false" />
<field name="body" type="text" indexed="true" stored="false" />
<field name="keywords" type="textKeywords" indexed="true" stored="false" />
<field name="url" type="objectRawText" indexed="true" stored="true" />
<field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true" />
<field name="creationdate" type="tdate" indexed="true" stored="true" required="false" />
<field name="modificationdate" type="tdate" indexed="true" stored="true" required="false" />
<field name="displaystartdate" type="tdate" indexed="true" stored="true" required="false" />
<field name="displayenddate" type="tdate" indexed="true" stored="true" required="false" />
<field name="securitydata" type="string" indexed="true" stored="true" required="false" />

The spell field collects values from other fields to create an index which can be used for spellchecking.

<copyField source="title" dest="spell" />
<copyField source="summary" dest="spell" />
<copyField source="body" dest="spell" />
<copyField source="keywords" dest="spell" />

There is also a set of dynamic fields, generally used to store iCM object data (article extras, object properties, form fields, user profile information etc). They are dynamic because any number of fields could be present:

<dynamicField name="OBJECT_TEXT_*" type="objectText" indexed="true" stored="false" />
<dynamicField name="OBJECT_RAWC_*" type="objectRawText" indexed="true" stored="false" />
<dynamicField name="OBJECT_C_*" type="textKeywords" indexed="true" stored="true" />
<dynamicField name="OBJECT_I_*" type="tint" indexed="true" stored="true" />
<dynamicField name="OBJECT_N_*" type="tdouble" indexed="true" stored="true" />
<dynamicField name="OBJECT_DT_*" type="tdate" indexed="true" stored="true" />
<dynamicField name="OBJECT_SF_*" type="searchField" indexed="true" stored="true" />
<dynamicField name="OBJECT_LL_*" type="latlong" indexed="true" stored="true" />
<dynamicField name="OBJECT_CODE_*" type="textCode" indexed="true" stored="false" />
<dynamicField name="OBJECT_ASSETLIST_*" type="commaDelimLowerCase" indexed="true" stored="true" />
<dynamicField name="OBJECT_ASSET_*" type="tint" indexed="true" stored="true" />
<dynamicField name="OBJECT_FILE_*" type="objectRawText" indexed="false" stored="false" />
<dynamicField name="ignored_*" type="ignored" multiValued="true" />
<dynamicField name="attr_*" type="ignored" indexed="true" stored="true" multiValued="true" />
<dynamicField name="random*" type="random" indexed="true" stored="false" />

OBJECT_RAWC_* is a copy of OBJECT_C_* that is indexed differently. The values in OBJECT_C_* are broken down into individual tokens (for keyword stemming etc); those in OBJECT_RAWC_* are not.
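
The dynamic field patterns above can be read as a prefix lookup. The following is a minimal Python sketch, not part of iCM: the helper names `dynamic_field_name` and `schema_type` are hypothetical, the prefix table is copied from the OBJECT_* entries in the listing, and longest-prefix matching is an assumption about how overlapping patterns should be resolved.

```python
# Prefix -> schema type, copied from the OBJECT_* dynamicField entries above.
DYNAMIC_FIELD_TYPES = {
    "OBJECT_TEXT_": "objectText",
    "OBJECT_RAWC_": "objectRawText",
    "OBJECT_C_": "textKeywords",
    "OBJECT_I_": "tint",
    "OBJECT_N_": "tdouble",
    "OBJECT_DT_": "tdate",
    "OBJECT_SF_": "searchField",
    "OBJECT_LL_": "latlong",
    "OBJECT_CODE_": "textCode",
    "OBJECT_ASSETLIST_": "commaDelimLowerCase",
    "OBJECT_ASSET_": "tint",
}


def dynamic_field_name(prefix, name):
    """Build an index field name from a dynamic prefix and a field name (hypothetical helper)."""
    if prefix not in DYNAMIC_FIELD_TYPES:
        raise ValueError(f"unknown dynamic field prefix: {prefix}")
    return prefix + name


def schema_type(field_name):
    """Return the schema type for a dynamic field name, or None if no pattern matches."""
    matches = [p for p in DYNAMIC_FIELD_TYPES if field_name.startswith(p)]
    if not matches:
        return None
    # Assumption: when several patterns match, the longest prefix wins.
    return DYNAMIC_FIELD_TYPES[max(matches, key=len)]


print(dynamic_field_name("OBJECT_LL_", "LOCATIONEXTRAS"))
print(schema_type("OBJECT_C__title"))
print(schema_type("OBJECT_RAWC__title"))
```

A lookup like this can help when deciding whether a value in a result is tokenised text (OBJECT_C_), a raw copy (OBJECT_RAWC_) or a typed value such as a date or integer.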

Indexed vs Stored

Fields in the schema have indexed and stored properties.

indexed="true" makes the field searchable. Values will be indexed and you can use them for things like faceting. If a field is not indexed then you won't be able to search for the values stored in it. All of the fields above are indexed apart from OBJECT_FILE_*. This field is not currently used or supported, but may hold references to files in the filestore in future developments.

stored="true" means the actual values of the field will be returned in your results.

For example, an article stores its articleid, articleheading, articleintrotext, articletext, ArticleLinkText, ArticleSummary, ExtraData, MetaDataNames and FriendlyURL in the body field. The body field has indexed="true" but stored="false". This means all of those bits of an article are indexed and can be searched for, but that field is never returned directly in a search result. The article does store some of those values in other fields (eg heading in the title field, summary in the summary field) which are stored, so do return values in the search results.
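
As a rough illustration of the indexed and stored distinction, the sketch below parses a few of the field definitions from the schema listing and separates the fields that are returned in results from those that can only be searched. It is a minimal Python sketch: the `<fields>` wrapper element and the `classify_fields` function are assumptions added for the example, not part of iCM or SOLR.

```python
import xml.etree.ElementTree as ET

# A few definitions copied from the schema listing, wrapped in a <fields>
# element (an assumption) so they parse as one XML document.
SCHEMA_FRAGMENT = """
<fields>
  <field name="title" type="text" indexed="true" stored="true" omitNorms="false" />
  <field name="summary" type="text" indexed="true" stored="true" omitNorms="false" />
  <field name="body" type="text" indexed="true" stored="false" />
  <field name="keywords" type="textKeywords" indexed="true" stored="false" />
  <field name="url" type="objectRawText" indexed="true" stored="true" />
</fields>
"""


def classify_fields(schema_xml):
    """Split field names into those returned in results and those that are search-only."""
    returned, search_only = [], []
    for field in ET.fromstring(schema_xml).iter("field"):
        name = field.get("name")
        if field.get("stored") == "true":
            returned.append(name)
        elif field.get("indexed") == "true":
            search_only.append(name)
    return returned, search_only


returned, search_only = classify_fields(SCHEMA_FRAGMENT)
print("Returned in results:", returned)
print("Searchable only:", search_only)
```

Here body and keywords land in the search-only list, which matches the article example: their content can be matched by a query but never comes back in a result.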

The following sections detail the content, fields and values indexed and stored for each collection.

Article

Standard Fields

| Field | Stored | Description |
| --- | --- | --- |
| keyid | true | The article ID as a string |
| groupkey | true | Article+keyid, eg Article123 |
| nkeyid | true | The article ID as an integer |
| keytype | true | The type of item - ie Article |
| id | true | Combined keytype and keyid |
| custom1 | true | The article parentdata, see below |
| custom2 | true | The article metadata, see below |
| custom3 | true | The ID of the template this article is using |
| metadata | true | The IDs of any metadata values related to this article. String of comma separated values. |
| parentdata | true | The current article ID and ancestor IDs in the article tree, back to 0. Hidden articles (Display:Off or with display date properties outside of the current date) don't appear. String of comma separated values |
| title | true | The article heading |
| summary | true | The article introductory and body text. This will be replaced by the article summary text if there is any. The summary has a maximum length of 997 characters |
| body | false | The articleid, articleheading, articleintrotext, articletext, ArticleLinkText, ArticleSummary, ExtraData, MetaDataNames and FriendlyURL |
| keywords | false | Any "boosted keywords" added to an article using the "Search" tab |
| url | true | The friendly URL |
| spell | false | The combined title, summary, body and keyword fields |
| creationdate | true | The article creation date |
| modificationdate | true | The article last modified date |
| displaystartdate | true | The article display start date |
| displayenddate | true | The article display end date |
| securitydata | true | A list of codes that indicate which users have access to this article |
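
Because parentdata holds the article's own ID and its ancestors as a comma separated string, a returned result can be checked against a branch of the article tree. This is a minimal Python sketch under assumptions: the sample document and its IDs are invented, and `ancestor_ids` and `is_in_branch` are hypothetical helpers, not iCM API calls.

```python
def ancestor_ids(parentdata):
    """Turn a comma separated parentdata string into a list of integer IDs."""
    return [int(part) for part in parentdata.split(",") if part.strip()]


def is_in_branch(doc, branch_root_id):
    """True if the article is the branch root itself or sits somewhere beneath it."""
    return branch_root_id in ancestor_ids(doc["parentdata"])


# Hypothetical result document: article 123, under 45, under 7, under the root 0.
article = {"keyid": "123", "nkeyid": 123, "parentdata": "123,45,7,0"}

print(is_in_branch(article, 45))
print(is_in_branch(article, 99))
```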

Dynamic Fields

Articles always include the OBJECT_C__title (indexed and stored) and OBJECT_RAWC__title (indexed, not stored) fields. OBJECT_C__title is split into individual tokens so the words can be stemmed etc; OBJECT_RAWC__title is not. The difference between the two is most easily seen when creating facets using CSSearchMultiple:

"OBJECT_RAWC__title": {
    "parentsubsite home page": 1,
    "another home": 1,
    "file upload": 1,
    "javasite home page": 1
},
"OBJECT_C__title": {
    "file": 1,
    "upload": 1,
    "another": 1,
    "javasite": 1,
    "page": 2,
    "home": 3,
    "parentsubsite": 1
}
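
The facet counts above can be reproduced with a short Python sketch. It assumes the four underlying titles are the four OBJECT_RAWC__title values, and it uses whitespace splitting as a stand-in for the real analyser, which may also stem and filter words.

```python
from collections import Counter

# Assumed source titles, taken from the OBJECT_RAWC__title facet values.
titles = [
    "parentsubsite home page",
    "another home",
    "file upload",
    "javasite home page",
]

# OBJECT_RAWC__title: each whole value is a single facet term.
raw_facets = Counter(titles)

# OBJECT_C__title: each value is split into tokens before counting.
token_facets = Counter(word for title in titles for word in title.split())

print(dict(raw_facets))
print(dict(token_facets))
```

The tokenised counts come out as home 3, page 2 and 1 for every other word, the same totals as the OBJECT_C__title facet.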

Remaining dynamic fields are used to index and store values entered via the article extras. The field type used by each field on the extra form determines the dynamic field used to store its value. Names are built from the dynamic type prefix and the name of the field in the form designer. For example, here's an article with two fields on its extra form, a text field called "SOMETEXTFIELD" and a location picker field called "LOCATIONEXTRAS":

"OBJECT_C_SOMETEXTFIELD": "some text",
"OBJECT_C__title": "My Best Article",
"OBJECT_LL_LOCATIONEXTRAS": "51.4930534589,-0.1301716614"

Media

The media collection holds two distinct, but linked, types of item. The first are the media items themselves, created and managed in iCM. The second are the actual physical media files associated with an item (a media item's components). Files are only indexed if a media type definition has the "Index files" checkbox checked. This means that every media item will have an entry, but not all items will have indexed files (nor should they). Some items may have several indexed files. Media items and their component files are linked by matching groupkeys.

Standard Fields - Media Items

| Field | Stored | Description |
| --- | --- | --- |
| keyid | true | The media ID as a string |
| groupkey | true | Media+keyid, eg Media123 |
| nkeyid | true | The media ID as an integer |
| keytype | true | The type of item - ie Media |
| id | true | Combined keytype and keyid |
| custom1 | true | The media parentdata, see below |
| custom2 | true | The group the media item is in, followed by the ID of its type |
| custom3 | true | Not used |
| metadata | true | The IDs of any metadata values related to this item. String of comma separated values. |
| parentdata | true | The ID of the group and ancestor groups this item is in, back to 0. String of comma separated values |
| title | true | The title of the media item |
| summary | true | Keywords, title, description and metadata values. |
| body | false | The mediaid, keywords, title, description and MetaData |
| keywords | false | Not used (the "keywords" added to a media item are stored in the summary and body fields) |
| url | true | Not used |
| spell | false | Not used |
| creationdate | true | The media item creation date |
| modificationdate | true | The media item last modified date |
| displaystartdate | true | The media item display start date |
| displayenddate | true | The media item display end date |
| securitydata | true | A list of codes that indicate which users have access to this media item |

Dynamic Fields - Media Items

Media items only include the OBJECT_C__title (indexed and stored) and OBJECT_RAWC__title (indexed, not stored) fields.

Standard Fields - Indexed Files

Indexed files take many of their field values from the parent media item they are a component of.

| Field | Stored | Description |
| --- | --- | --- |
| keyid | true | The filepath |
| groupkey | true | Media+ the ID of the media item this file is a component of, eg Media123. This creates the link between an indexed item and its components |
| nkeyid | true | Always 0 |
| keytype | true | The type of item - ie Media |
| id | true | Combined keytype and keyid |
| custom1 | true | Always "MediaFile" |
| custom2 | true | The ID of the media item this file is a component of |
| custom3 | true | The group the media item is in, followed by the ID of its type (ie the custom2 field of the parent media item) |
| metadata | true | Not used |
| parentdata | true | The ID of the group and ancestor groups this item is in, back to 0. String of comma separated values |
| title | true | Not used |
| summary | true | Not used |
| body | false | Not used |
| keywords | false | Not used |
| url | true | The URL of this media item accessed via iCM's mediaaccess.cfm eg http://timssite/enterprise/icm/mediaaccess.cfm?file=/image/p/c/528927-country-pub-warwickshire.jpeg |
| spell | false | Not used |
| creationdate | true | The media parent item's creation date |
| modificationdate | true | The media parent item's last modified date |
| displaystartdate | true | The media parent item's display start date |
| displayenddate | true | The media parent item's display end date |
| securitydata | true | Not used (the parent item is secured, not the individual files) |

Dynamic Fields - Indexed Files

Media files only include the OBJECT_C__title (indexed and stored) and OBJECT_RAWC__title (indexed, not stored) fields. Both of these are taken from the parent item.
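
Since media items and their indexed files share a groupkey, a mixed set of results from the Media collection can be regrouped into items with their files. The Python sketch below is illustrative only: the sample documents, file paths and the `group_media` helper are hypothetical, and it relies on the custom1 value "MediaFile" described above to tell files from items.

```python
from collections import defaultdict


def group_media(docs):
    """Group Media collection results by groupkey into one item plus its indexed files."""
    grouped = defaultdict(lambda: {"item": None, "files": []})
    for doc in docs:
        entry = grouped[doc["groupkey"]]
        if doc.get("custom1") == "MediaFile":
            entry["files"].append(doc)   # an indexed component file
        else:
            entry["item"] = doc          # the media item itself
    return dict(grouped)


# Hypothetical results: one media item with two indexed files, one without any.
results = [
    {"keyid": "123", "groupkey": "Media123", "nkeyid": 123, "custom1": "10,0"},
    {"keyid": "/pdf/a/guide.pdf", "groupkey": "Media123", "nkeyid": 0,
     "custom1": "MediaFile", "custom2": "123"},
    {"keyid": "/pdf/a/guide-large-print.pdf", "groupkey": "Media123", "nkeyid": 0,
     "custom1": "MediaFile", "custom2": "123"},
    {"keyid": "124", "groupkey": "Media124", "nkeyid": 124, "custom1": "10,0"},
]

grouped = group_media(results)
for key, entry in grouped.items():
    print(key, "files:", [f["keyid"] for f in entry["files"]])
```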

Message

Standard Fields

| Field | Stored | Description |
| --- | --- | --- |
| keyid | true | The message ID as a string |
| groupkey | true | Message+keyid, eg Message123 |
| nkeyid | true | The message ID as an integer |
| keytype | true | The type of item - ie Message |
| id | true | Combined keytype and keyid |
| custom1 | true | The parent thread ID |
| custom2 | true | The parent forum ID |
| custom3 | true | Not used |
| metadata | true | Not used |
| parentdata | true | The ID of the group and ancestor groups this message is in, back to 0. String of comma separated values |
| title | true | The message subject |
| summary | true | Who the message is "from" followed by the message body |
| body | false | The MessageID, Subject, UserName and Body |
| keywords | false | Not used |
| url | true | Not used |
| spell | false | Not used |
| creationdate | true | The message creation date |
| modificationdate | true | The message last modified date |
| displaystartdate | true | Not used |
| displayenddate | true | Not used |
| securitydata | true | Not used |

Dynamic Fields

Messages include five dynamic fields. OBJECT_RAWC__title is indexed but not stored. The remaining four are indexed and stored:

"OBJECT_C__title": "The message subject",
"OBJECT_I_MESSAGEPARENT": 4,
"OBJECT_C_MESSAGEBODY": "The body of the message",
"OBJECT_C_MESSAGEUSERNAME": "username"

"OBJECT_I_MESSAGEPARENT" is the ID of a message, should this message be a reply to an existing message. It will be 0 if the message is a "top level" message in a thread.

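The OBJECT_I_MESSAGEPARENT value makes it possible to rebuild the reply structure of a thread from a flat set of results. This is a minimal Python sketch with invented sample messages; `replies_by_parent` and `thread_lines` are hypothetical helpers, and it assumes nkeyid is the message ID that replies point at.

```python
from collections import defaultdict


def replies_by_parent(docs):
    """Map each parent message ID to the IDs of its direct replies (0 = top level)."""
    children = defaultdict(list)
    for doc in docs:
        children[doc["OBJECT_I_MESSAGEPARENT"]].append(doc["nkeyid"])
    return children


def thread_lines(children, parent_id=0, depth=0):
    """Return indented lines describing the thread, depth first."""
    lines = []
    for message_id in children.get(parent_id, []):
        lines.append("  " * depth + f"message {message_id}")
        lines.extend(thread_lines(children, message_id, depth + 1))
    return lines


# Hypothetical results from one thread.
messages = [
    {"nkeyid": 4, "OBJECT_I_MESSAGEPARENT": 0},
    {"nkeyid": 5, "OBJECT_I_MESSAGEPARENT": 4},
    {"nkeyid": 6, "OBJECT_I_MESSAGEPARENT": 4},
    {"nkeyid": 7, "OBJECT_I_MESSAGEPARENT": 5},
]

children = replies_by_parent(messages)
print("\n".join(thread_lines(children)))
```
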
Metadata

Standard Fields

| Field | Stored | Description |
| --- | --- | --- |
| keyid | true | The metadata ID as a string |
| groupkey | true | Metadata+keyid, eg Metadata123 |
| nkeyid | true | The metadata ID as an integer |
| keytype | true | The type of item - ie Metadata |
| id | true | Combined keytype and keyid |
| custom1 | true | The parentdata |
| custom2 | true | The schema name and schema ID, comma separated |
| custom3 | true | Not used |
| metadata | true | Not used |
| parentdata | true | The ID of the property, group and ancestor groups this value is in. String of comma separated values |
| title | true | The metadata property and value, colon separated, eg "colour : red" |
| summary | true | The metadata value, description, local terms, schema name, schema ID and known terms |
| body | false | As per the summary, plus the value ID |
| keywords | false | Not used |
| url | true | Not used |
| spell | false | Not used |
| creationdate | true | Not used |
| modificationdate | true | Not used |
| displaystartdate | true | Not used |
| displayenddate | true | Not used |
| securitydata | true | Not used |

Dynamic Fields

Metadata items only include the OBJECT_C__title (indexed and stored) and OBJECT_RAWC__title (indexed, not stored) fields.

Search Fields

When creating metadata groups and properties you can add a "Search Field" name. When metadata values from a group or property that has a search field are related to another iCM content item (an article, for example), the values will be indexed as a dynamic "OBJECT_SF_*" field, using the name set in the group/property. See Understanding Metadata Properties and Values for an example.

This functionality is intended to be used for faceting. When searching against this field you can use a pipe (|) delimited set of values, which won't be split into tokens.

Object

The most common objects you search for will be those created by form submissions that include a "database save action". Fields you'd like to search on (ie the properties of the object) need to be set as searchable in your form design. If your form uses an external type definition, the settings in the external definition determine whether a field is searchable.

Standard Fields

| Field | Stored | Description |
| --- | --- | --- |
| keyid | true | The object ID as a string |
| groupkey | true | Object+keyid, eg Object123 |
| nkeyid | true | The object ID as an integer |
| keytype | true | The type of item - ie Object |
| id | true | Combined keytype and keyid |
| custom1 | true | The object's type |
| custom2 | true | "0" for private types, "1" for public types |
| custom3 | true | Not used |
| metadata | true | Not used |
| parentdata | true | Not used |
| title | true | The type, date-time created and created by |
| summary | true | The type and label |
| body | false | The values from properties of the object set as searchable |
| keywords | false | Not used |
| url | true | The object's label |
| spell | false | Not used |
| creationdate | true | The object's creation date |
| modificationdate | true | The object's last modified date |
| displaystartdate | true | Not used |
| displayenddate | true | Not used |
| securitydata | true | Not used |

Dynamic Fields

The dynamic fields of an indexed object are those properties that have values and are set as searchable. This object, called PERSONNAME, has two properties, LASTNAME and FIRSTNAME, that are searchable and have values:

"OBJECT_C__icmTypeName": "PERSONNAME",
"OBJECT_C__icmCreatedBy": "UNKNOWN",
"OBJECT_C__icmPublicType": "1",
"OBJECT_C__title": "PERSONNAME,{ts '2017-05-16 12:22:01'},UNKNOWN",
"OBJECT_C_LASTNAME": "Gulliver",
"OBJECT_C_FIRSTNAME": "Tim",
"OBJECT_C__icmLastUpdatedBy": "UNKNOWN"

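All objects also include the same five standard dynamic fields: those above whose names begin with an underscore after the OBJECT_C_ prefix (_icmTypeName, _icmCreatedBy, _icmPublicType, _title and _icmLastUpdatedBy).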
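
When processing object results it can be useful to separate the five standard fields from the object's own searchable properties. The Python sketch below uses the PERSONNAME example above; `split_object_fields` is a hypothetical helper, and it assumes that property names defined in a form never begin with an underscore.

```python
def split_object_fields(doc):
    """Separate standard dynamic fields (leading underscore) from object properties."""
    standard, properties = {}, {}
    for name, value in doc.items():
        if not name.startswith("OBJECT_"):
            continue
        # "OBJECT_C__title" -> ["OBJECT", "C", "_title"]
        _, _, short_name = name.split("_", 2)
        if short_name.startswith("_"):
            standard[short_name] = value
        else:
            properties[short_name] = value
    return standard, properties


# The PERSONNAME object from the example above.
person = {
    "OBJECT_C__icmTypeName": "PERSONNAME",
    "OBJECT_C__icmCreatedBy": "UNKNOWN",
    "OBJECT_C__icmPublicType": "1",
    "OBJECT_C__title": "PERSONNAME,{ts '2017-05-16 12:22:01'},UNKNOWN",
    "OBJECT_C_LASTNAME": "Gulliver",
    "OBJECT_C_FIRSTNAME": "Tim",
    "OBJECT_C__icmLastUpdatedBy": "UNKNOWN",
}

standard, properties = split_object_fields(person)
print(sorted(standard))
print(properties)
```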

Site User

Note that only site account users are fully indexed. The various login provider accounts associated with a site user are not. However, the login provider usernames linked to an account are indexed in the OBJECT_K___loginusernames field of their parent account. For many providers this value will be a completely unknown ID. However, searching by this field may be useful for users created via LDAP or ADFS, where usernames are known.

Standard Fields

| Field | Stored | Description |
| --- | --- | --- |
| keyid | true | The user ID as a string |
| groupkey | true | SiteUser+keyid, eg SiteUser123 |
| nkeyid | true | The user ID as an integer |
| keytype | true | The type of item - ie SiteUser |
| id | true | Combined keytype and keyid |
| custom1 | true | The ID of the userprofile object associated with this user's profile |
| custom2 | true | A comma separated list of the group IDs this user is a member of (this value does not appear to be returned in results) |
| custom3 | true | Whether the user is disabled (1) or not (0) |
| metadata | true | Not used |
| parentdata | true | Not used |
| title | true | The username |
| summary | true | The description, or the OBJECT_C___searchableFields value if the description is blank |
| body | false | The values from properties of the userprofile object set as searchable |
| keywords | false | Not used |
| url | true | Not used |
| spell | false | Not used |
| creationdate | true | The site user creation date |
| modificationdate | true | The site user last modified date. A user's last successful login will also update this field |
| displaystartdate | true | Not used |
| displayenddate | true | Not used |
| securitydata | true | Not used |

Dynamic Fields

In a similar way to Objects, items in the SiteUser collection index any properties of a user's userprofile that have been set as searchable. The OBJECT_C___searchableFields property includes all of the searchable fields in the profile. OBJECT_C___prefusername and OBJECT_C___email correspond to the prefname and email fields set in your User Settings.

For example, in the fields below OBJECT_C_EMAIL is the actual name of the searchable field in the user profile form that a user enters their email into. OBJECT_C___email is the value of the field iCM has been told holds the email address. This entry will always be called OBJECT_C___email, no matter what the actual name of the field in the form design is. The same is true for OBJECT_C___prefusername.

"OBJECT_C_FAMILYNAME": "Gulliver",
"OBJECT_C_DISPLAYNAME": "Tim Gulliver",
"OBJECT_C_PREFNAME": "Tim",
"OBJECT_C__icmCreatedBy": "ADMIN",
"OBJECT_C_EMAIL": "support@gossinteractive.com",
"OBJECT_C_GIVENNAMES": "Tim",
"OBJECT_C__title": "TIMG",
"OBJECT_C__icmLastUpdatedBy": "ADMIN",
"OBJECT_C___displayusername": "Tim",
"OBJECT_C___searchableFields": "Tim Gulliver support@gossinteractive.com",
"OBJECT_K___loginusernames": "G_112233445566778899",
"OBJECT_C___email": "support@gossinteractive.com",
"OBJECT_C___prefusername": "Prefname",
"OBJECT_DT___loginuserlastloggedin": "2023-03-09T14:05:33Z"

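Because OBJECT_C___email keeps the same name whatever the profile form calls its email field, code that reads site user results can rely on it. The Python sketch below reads that field and parses the last login timestamp from a trimmed copy of the example above; `contact_email` and `last_login` are hypothetical helpers, and the timestamp format is assumed from the single sample value shown.

```python
from datetime import datetime, timezone


def contact_email(doc):
    """Return the email iCM has mapped for this user, whatever the form field is called."""
    return doc.get("OBJECT_C___email")


def last_login(doc):
    """Parse the last login timestamp as a UTC datetime, or None if it is absent."""
    raw = doc.get("OBJECT_DT___loginuserlastloggedin")
    if raw is None:
        return None
    # Assumed format, based on the sample value "2023-03-09T14:05:33Z".
    return datetime.strptime(raw, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


# Trimmed copy of the example site user.
user = {
    "OBJECT_C_EMAIL": "support@gossinteractive.com",
    "OBJECT_C__title": "TIMG",
    "OBJECT_C___email": "support@gossinteractive.com",
    "OBJECT_DT___loginuserlastloggedin": "2023-03-09T14:05:33Z",
}

print(contact_email(user))
print(last_login(user).isoformat())
```
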
Last modified on 17 August 2023
