Elasticsearch Indexing Logic

Updated: March 18, 2024

Indexing

When a session is used to create, update or delete documents, a synchronous listener stacks the indexing commands to be processed. These commands are factorized and then processed either in an asynchronous job or at post commit time.
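
As a rough illustration of what factorizing a stack of commands could look like, the sketch below merges commands that target the same document so that each document is processed once. The `IndexingCommand` class, the `factorize` function and the merge rules are all assumptions made for this example, not the platform's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexingCommand:
    """Hypothetical indexing command recorded by the synchronous listener."""
    doc_id: str
    action: str            # "insert", "update" or "delete"
    recursive: bool = False


def factorize(commands):
    """Merge stacked commands so that one command remains per document.

    Assumed rules: a delete replaces whatever was recorded before it,
    a command recorded after a delete replaces the delete, and otherwise
    the first action is kept while the recursive flags are combined.
    """
    merged = {}
    for cmd in commands:
        previous = merged.get(cmd.doc_id)
        if previous is None or cmd.action == "delete" or previous.action == "delete":
            merged[cmd.doc_id] = cmd
        else:
            merged[cmd.doc_id] = IndexingCommand(
                cmd.doc_id, previous.action, previous.recursive or cmd.recursive
            )
    # dict preserves insertion order, so documents keep their first-seen order
    return list(merged.values())


stack = [
    IndexingCommand("doc-1", "insert"),
    IndexingCommand("doc-1", "update"),
    IndexingCommand("doc-2", "update", recursive=True),
    IndexingCommand("doc-2", "update"),
]
factorized = factorize(stack)
```

With these assumed rules, the four stacked commands collapse into two: one insert for `doc-1` and one recursive update for `doc-2`.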

Post Commit Mode

If the commands are recorded on a UI thread (for instance, a thread used to render a JSF page), they are treated in post commit. This means that after the transaction is committed, the indexing commands are sent to Elasticsearch, followed by a refresh operation that makes the indexed documents available to the next query. This approach gives the appearance of real-time indexing: a document created by one action is searchable on the next action.

Asynchronous Mode

The asynchronous mode processes the commands without sending any refresh operation, so the documents become searchable in near real time (about 1 second after the indexing command is sent).
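
The difference between the two modes can be sketched as follows. This is a simplified model, not the platform's code: `plan_operations` and the `("index", ...)` / `("refresh", ...)` tuples are hypothetical names used only to show that post commit mode appends an explicit refresh while asynchronous mode does not.

```python
def plan_operations(doc_ids, on_ui_thread):
    """Return the ordered operations sent to Elasticsearch (illustrative only)."""
    operations = [("index", doc_id) for doc_id in doc_ids]
    if on_ui_thread:
        # Post commit mode: an explicit refresh makes the documents
        # visible to the very next query.
        operations.append(("refresh", None))
    # Asynchronous mode: no refresh is sent, so the documents become
    # visible only once Elasticsearch refreshes the index on its own.
    return operations


post_commit_ops = plan_operations(["doc-1", "doc-2"], on_ui_thread=True)
async_ops = plan_operations(["doc-1", "doc-2"], on_ui_thread=False)
```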

Recursive Commands

A command can target a single document or also apply to its children (recursive). As a result, the number of commands processed that is reported in the Admin tab does not necessarily match the number of documents processed.

Recursive commands that are triggered when moving a folder or changing an ACL are not fully treated in the post commit listener. Only the first level is treated in post commit; the recursive indexing is done asynchronously.
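
A minimal sketch of this split, assuming that "first level" means the document the command targets and that every descendant is handled asynchronously. The `split_recursive` function and the `children_of` mapping (an in-memory stand-in for the repository) are hypothetical.

```python
from collections import deque


def split_recursive(root_id, children_of):
    """Split one recursive command into post commit and asynchronous parts.

    children_of maps a document id to the ids of its direct children.
    """
    post_commit = [root_id]
    asynchronous = []
    pending = deque(children_of.get(root_id, []))
    while pending:
        doc_id = pending.popleft()
        asynchronous.append(doc_id)
        # Walk the tree breadth-first to reach every descendant.
        pending.extend(children_of.get(doc_id, []))
    return post_commit, asynchronous


tree = {"folder": ["a", "b"], "a": ["a1"]}
sync_part, async_part = split_recursive("folder", tree)
```

Here a single recursive command on `folder` results in four documents being processed, which is why the command count and the document count can differ.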

JSON Document

When indexing a document, the Nuxeo Platform sends a JSON representation of it to Elasticsearch. For now, a creation or an update command submits the complete document. The JSON document can be viewed in the _source field of the Elasticsearch document, which contains all the fields.
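
To show where the submitted JSON ends up, the sketch below reads the _source field from the kind of response Elasticsearch returns when fetching a single document. The response is a hand-written sample: the index name, the identifier and the field names and values inside _source are assumptions chosen for the example.

```python
import json

# Hand-written sample of a single-document GET response (illustrative values).
sample_response = json.loads("""
{
  "_index": "nuxeo",
  "_id": "doc-1",
  "found": true,
  "_source": {
    "ecm:uuid": "doc-1",
    "ecm:primaryType": "File",
    "ecm:path": "/default-domain/workspaces/demo/report",
    "dc:title": "Report"
  }
}
""")


def source_of(response):
    """Return the JSON document that was submitted at indexing time."""
    if not response.get("found"):
        raise KeyError("document not found in the index")
    return response["_source"]


document = source_of(sample_response)
```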

Searching and Limitations

NXQL Queries

An NXQL query can be translated to an Elasticsearch query, with some limitations. See the NXQL documentation page.

When the query does not specify an ordering, the results are sorted by descending relevance, as described in the Elasticsearch documentation. There are multiple ways to tune relevance.

Operators and Mapping

Some operators need an explicit mapping to work properly. This is the case for the FULLTEXT, LIKE and ILIKE operators (STARTSWITH on ecm:path has a special mapping set up by default). See the page Configuring the Elasticsearch Mapping for more information.
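
To give an idea of what an explicit mapping can involve, here is a hedged sketch of a multi-field mapping in which one property is indexed several ways and each operator targets the sub-field that suits it. The analyzer name `lowercase_analyzer`, the sub-field names and the `field_for` helper are assumptions for this example; the mapping page referenced above remains the authority on the real configuration.

```python
# Hypothetical analysis settings: a custom analyzer that keeps the whole
# value as one token and lowercases it, for case-insensitive matching.
settings = {
    "analysis": {
        "analyzer": {
            "lowercase_analyzer": {
                "type": "custom",
                "tokenizer": "keyword",
                "filter": ["lowercase"],
            }
        }
    }
}

# Hypothetical mapping fragment: one property with two extra sub-fields.
title_mapping = {
    "dc:title": {
        "type": "keyword",
        "fields": {
            "fulltext": {"type": "text"},
            "lowercase": {"type": "text", "analyzer": "lowercase_analyzer"},
        },
    }
}

# Assumed association between an operator and the sub-field it would query.
SUB_FIELD_BY_OPERATOR = {"FULLTEXT": "fulltext", "ILIKE": "lowercase"}


def field_for(operator, prop="dc:title"):
    """Return the field name a given operator would query in this sketch."""
    sub_field = SUB_FIELD_BY_OPERATOR.get(operator)
    return f"{prop}.{sub_field}" if sub_field else prop
```

Without the matching sub-field in the mapping, an operator such as ILIKE would have nothing suitable to query, which is why the mapping has to be declared explicitly.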

Security and ACLs

The security clause is automatically added to match the principal and its groups. Each document contains the list of the users or groups that have permission to browse the document.
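
A minimal sketch of such a security clause, written as Elasticsearch query DSL built from Python dictionaries. It assumes that the list of allowed users and groups is stored in a field named `ecm:acl`; that field name and the helper functions are assumptions for the example.

```python
def security_filter(principal, groups):
    """Build a terms filter matching documents the principal can browse."""
    return {"terms": {"ecm:acl": [principal, *groups]}}


def with_security(query, principal, groups):
    """Wrap a query so that it only returns documents the principal can browse."""
    return {
        "bool": {
            "must": query,
            "filter": security_filter(principal, groups),
        }
    }


secured = with_security({"match_all": {}}, "alice", ["members", "Everyone"])
```

A document matches when at least one of the listed names appears in its list of allowed users and groups.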

Only the simplified ACL is supported with Elasticsearch (this is the default security mode since 6.0). Simplified ACL means that only a DENY on Everyone (which blocks all rights) is handled; a DENY on individual principals is not.
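
The simplified ACL model can be sketched as a walk over the ordered access entries that stops at the first DENY on Everyone. The `readers` function and the `(principal, granted)` pair representation are hypothetical. Raising an error on a DENY for a specific principal is a choice made here to make the limitation visible, not a description of how the platform behaves.

```python
EVERYONE = "Everyone"


def readers(aces):
    """Compute the principals allowed to browse a document.

    aces is an ordered list of (principal, granted) pairs, evaluated
    from first to last.
    """
    allowed = []
    for principal, granted in aces:
        if granted:
            if principal not in allowed:
                allowed.append(principal)
        elif principal == EVERYONE:
            # Block all rights: every entry after this one is ignored.
            break
        else:
            raise ValueError(f"DENY on principal {principal!r} is not supported")
    return allowed


acl = [("alice", True), ("members", True), ("Everyone", False), ("guests", True)]
allowed_readers = readers(acl)
```

In this example `guests` is granted access only after the DENY on Everyone, so it does not appear in the resulting list.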