Moving Load from Database to Elasticsearch

By moving query load from the database to Elasticsearch, applications can dramatically increase performance and scalability.

It is easy to pinpoint slow queries that need to be migrated from the database to Elasticsearch by monitoring slow queries.

Migrating Content View or Page Provider

This can be done as described in How to Make a Page Provider or Content View Query Elasticsearch Index.

Using a page provider to query the repository makes it easy to tune or override queries.

Migrating an Explicit NXQL Query

If you can not use a page provider and want to migrate code like this:

DocumentModelList docs = session.query(nxql);

Using Elasticsearch you will use a query builder:

ElasticSearchService ess = Framework.getLocalService(ElasticSearchService.class);
DocumentModelList docs = ess.query(new NxQueryBuilder(session).nxql(nxql).limit(-1)); // do we need to load all documents ?

The first difference is that using session.query all the documents are returned while using Elasticsearch there is a default limit of 10 documents. To get all the documents use a limit of -1. Think twice before using the -1 limit: loading all the documents can affect performance especially if the query is dynamic and can match all the documents in the repository. Note that with a limit set to 0 you can get the total results size (using docs.totalSize()) without loading any documents.

Another difference is that documents that are searchable at time **t** may be different between database and Elasticsearch:

When using the repository API, a document is searchable after a modification once there is a session.save() or after the transaction commit for others sessions.
When using Elasticsearch a document is searchable after a modification only when: the transaction is committed AND asynchronous indexing job is done AND Elasticsearch index is refreshed, which happens every second by default.

For instance migrating this code:

doc.setPropertyValue("dc:title", "A new title");
session.saveDocument(doc);
session.save();

docs = session.query("SELECT * FROM Document WHERE dc:title = 'A new title'"); // expect to match "doc"

Can be done like this:

doc.setPropertyValue("dc:title", "A new title");
session.saveDocument(doc);
session.save();

ElasticSearchAdmin esa = Framework.getService(ElasticSearchAdmin.class);
TransactionHelper.commitOrRollbackTransaction();
TransactionHelper.startTransaction();
esa.prepareWaitForIndexing().get(20, TimeUnit.SECONDS); // wait for indexing
esa.refresh(); // explicit refresh

ess.query(new NxQueryBuilder(session).nxql("SELECT * FROM Document WHERE dc:title = 'A new title'").limit(-1)); // "doc" is returned

Obviously there is a write overhead here because we are splitting the transaction and explicitly call a refresh. This can be useful for unit test migration but on normal code you have to decide if it make sense to search documents that are probably already loaded in your context.

Migrating a queryAndFetch Request

Replace the code:

        IterableQueryResult rows = session.queryAndFetch("SELECT ecm:uuid, dc:title FROM Document", NXQL.NXQL);
        ...

With:

        EsResult result = ess.queryAndAggregate(new NxQueryBuilder(session).nxql("SELECT ecm:uuid, dc:title FROM Document").limit(-1));
        IterableQueryResult rows = result.getRows();

And you gain the limit/offset options.

For now the select clause support is limited to scalar properties. See the page Elasticsearch limitations for more information.

Deactivating Database Optimizations

By default there are two optimizations done at the database level: one for the document path search (STARTSWITH operator) and one for the right filtering (ACL Read). They optimize the read request but they have a cost on write operations. Basically they are materializing data (document path and ACL read) using stored procedures.

See VCS configuration documentation to see how to disable pathOptimizations and aclOptimizations.

If you disable these optimizations you will have bad response time on NXQL for non-administrator user and for queries involving the STARTSWITH operator. Again look at the slow queries monitoring and migrate them to Elasticsearch (see above section).

Deactivating Database Full-Text Search

By default full text is indexed at the database level. If you have moved your full-text search to Elasticsearch you don't need to maintain database full-text index and trigger.

By using nuxeo.vcs.fulltext.search.disabled=true option in the nuxeo.conf file, full-text will be extracted and saved into the database, but there will be no full-text index, triggers and duplication overhead.

When disabling Database Full-text Search on an existing instance you have to remove the trigger and index manually, for instance for PostgreSQL:

DROP TRIGGER nx_trig_ft_update ON fulltext;
DROP INDEX fulltext_fulltext_idx;
DROP INDEX fulltext_fulltext_title_idx;

Going further with Elasticsearch

Searching over Multiple Repositories

If you have set up a multi repositories configuration to query over them just use the searchOnAllRepositories option:

docs = ess.query(new NxQueryBuilder(session).nxql(nxql).searchOnAllRepositories());

Using the native Elasticsearch HTTP API

The nuxeo-elasticsearch-http-read-only addon exposes a limited set of Read Only Elasticsearch HTTP REST API, taking in account the Nuxeo authentication and authorization.

See the addon README for more information.