Moving Load from Database to Elasticsearch

By moving query load from the database to Elasticsearch, applications can drastically increase performance and scalability.

It is easy to pinpoint slow queries that need to be migrated from the database to Elasticsearch by monitoring slow queries.

Migrating Content View or Page Provider

This can be done as described in How to Make a Page Provider or Content View Query Elasticsearch Index.

Using a page provider to query the repository makes it easy to tune or override queries.

Migrating an Explicit NXQL Query

If you can not use a page provider and want to migrate code like this:

DocumentModelList docs = session.query(nxql);

Using Elasticsearch you will use a query builder:

ElasticSearchService ess = Framework.getService(ElasticSearchService.class);
DocumentModelList docs = ess.query(new NxQueryBuilder(session).nxql(nxql).limit(10000));

The first difference is that using session.queryall the documents are returned while using Elasticsearch there is a default limit of 10 documents and a maximum of 10000, see index.max_result_window section. To get all the documents use the scroll API. Note that with a limit set to 0 you can get the total results size (using docs.totalSize()) without loading any documents.

Another difference is that documents that are searchable at time **t** may be different between database and Elasticsearch:

When using the repository API, a document is searchable after a modification once there is a session.save() or after the transaction commit for others sessions.
When using Elasticsearch a document is searchable after a modification only when: the transaction is committed AND asynchronous indexing job is done AND Elasticsearch index is refreshed, which happens every second by default.

For instance migrating this code:

doc.setPropertyValue("dc:title", "A new title");
session.saveDocument(doc);
session.save();

docs = session.query("SELECT * FROM Document WHERE dc:title = 'A new title'"); // expect to match "doc"

Can be done like this:

doc.setPropertyValue("dc:title", "A new title");
session.saveDocument(doc);
session.save();

ElasticSearchAdmin esa = Framework.getService(ElasticSearchAdmin.class);
TransactionHelper.commitOrRollbackTransaction();
TransactionHelper.startTransaction();
esa.prepareWaitForIndexing().get(20, TimeUnit.SECONDS); // wait for indexing
esa.refresh(); // explicit refresh

ess.query(new NxQueryBuilder(session).nxql("SELECT * FROM Document WHERE dc:title = 'A new title'")); // "doc" is returned

Obviously there is a write overhead here because we are splitting the transaction and explicitly call a refresh. This can be useful for unit test migration but on normal code you have to decide if it make sense to search documents that are probably already loaded in your context.

Migrating a queryAndFetch Request

Replace the code:

IterableQueryResult rows = session.queryAndFetch("SELECT ecm:uuid, dc:title FROM Document", NXQL.NXQL);
...

With:

EsResult result = ess.queryAndAggregate(new NxQueryBuilder(session).nxql("SELECT ecm:uuid, dc:title FROM Document").limit(10000));
IterableQueryResult rows = result.getRows();

And you gain the limit/offset options.

For now the select clause support is limited to scalar properties. See the page Elasticsearch limitations for more information.

Migrating a Request using scroll API

Since Elasticsearch 2.x, the engine rejects request where from + size > 10000. You can change this value by changing index.max_result_window, but it is highly unadvised. If you need to get all documents from Elasticsearch, you should use scroll API.

To do that, we can use ElasticSearchService#scroll(NxQueryBuilder, long) and ElasticSearchService#scroll(EsScrollResult).

For instance, with the previous query to get documents:

ElasticSearchService ess = Framework.getService(ElasticSearchService.class);

// Perform initial search and get first batch of 20 results
String query = "SELECT * FROM Document WHERE dc:title = 'A new title'";
EsScrollResult scrollResult = ess.scroll(new NxQueryBuilder(session).nxql(query).limit(20), 10000);

while (!scrollResult.getDocuments().isEmpty()) {
    DocumentModelList batchOfDocs = scrollResult.getDocuments();
    for (DocumentModel doc : batchOfDocs) {
        // Process document
        ...
    }
    // Get next batch of results
    scrollResult = ess.scroll(scrollResult);
}

Limit given to NxQueryBuilder represents the size of each batch retrieved with scroll API. The keepAlive parameter in milliseconds only needs to be long enough to perform the next scroll query.

In a context of IterableQueryResult, you can use EsIterableQueryResultImpl class to get an IterableQueryResult relying on scroll API. For example:

String query = "SELECT ecm:uuid, dc:title FROM Document";
EsScrollResult scrollResult = ess.scroll(new NxQueryBuilder(session).nxql(query).limit(20), 10000);
IterableQueryResult rows = new EsIterableQueryResultImpl(ess, scrollResult);

Deactivating Database Optimizations

By default there are two optimizations done at the database level: one for the document path search (STARTSWITH operator) and one for the right filtering (ACL Read). They optimize the read request but they have a cost on write operations. Basically they are materializing data (document path and ACL read) using stored procedures.

See VCS configuration documentation to see how to disable pathOptimizations and aclOptimizations.

If you disable these optimizations you will have bad response time on NXQL for non-administrator user and for queries involving the STARTSWITH operator. Again look at the slow queries monitoring and migrate them to Elasticsearch (see above section).

Deactivating Database Full-Text Search

By default full text is indexed at the database level. If you have moved your full-text search to Elasticsearch you don't need to maintain database full-text index and trigger.

By using nuxeo.vcs.fulltext.search.disabled=true option in the nuxeo.conf file, full-text will be extracted and saved into the database, but there will be no full-text index, triggers and duplication overhead.

When disabling Database Full-text Search on an existing instance you have to remove the trigger and index manually, for instance for PostgreSQL:

DROP TRIGGER nx_trig_ft_update ON fulltext;
DROP INDEX fulltext_fulltext_idx;
DROP INDEX fulltext_fulltext_title_idx;

Going Further with Elasticsearch

Searching over Multiple Repositories

If you have set up a multi repositories configuration to query over them just use the searchOnAllRepositories option:

docs = ess.query(new NxQueryBuilder(session).nxql(nxql).searchOnAllRepositories());

Using the native Elasticsearch HTTP API

The nuxeo-elasticsearch-http-read-only addon exposes a limited set of Read Only Elasticsearch HTTP REST API, taking in account the Nuxeo authentication and authorization.

See the addon README for more information.