Moving Load from Database to Search Engine

By moving query load from the database to a search engine (Elasticsearch, OpenSearch, ...), applications can drastically increase performance and scalability.

It is easy to pinpoint slow queries that need to be migrated from the database by monitoring slow queries.

Migrating Content View or Page Provider

This can be done as described in How to Make a Page Provider or Content View Query Elasticsearch Index.

Using a page provider to query the repository makes it easy to tune or override queries.

Migrating an Explicit NXQL Query

If you can not use a page provider and want to migrate code like this:

DocumentModelList docs = session.query(nxql);

Using the Search Service you will use a SearchQuery:

SearchService searchService = Framework.getService(SearchService.class);
SearchResponse response = searchService.search(SearchQuery.Builder(nxql).limit(100).build());
DocumentModelList docs = response.loadDocuments(session);

The first difference is that using session.query all the documents are returned while using SearchService there is a default limit of 10 documents. On the OpenSearch client implementation, this limit is capped at 10000 (see index.max_result_window section). To get all the documents, use the scroll API. Note that with a limit set to 0 you can get the total results size (using docs.totalSize()) without loading any documents. The documents are loaded from the repository in a second step during response.loadDocument(session). Be aware that this operation can be expensive.

Another difference is that documents that are searchable at time **t** may be different between database and Elasticsearch:

When using the repository API, a document is searchable after a modification once there is a session.save() or after the transaction commit for others sessions.
When using the SearchService, a document is searchable after a modification only when the transaction is committed AND, indexing processors are done AND depending on the search client implementation, if the index is refreshed (which happens every second on OpenSearch).

For instance migrating this code:

doc.setPropertyValue("dc:title", "A new title");
session.saveDocument(doc);
session.save();

docs = session.query("SELECT * FROM Document WHERE dc:title = 'A new title'"); // expect to match "doc"

Can be done like this:

doc.setPropertyValue("dc:title", "A new title");
session.saveDocument(doc);
session.save();
// flag the thread to wait for synchronous indexing on commit
ThreadLocalIndexingCommandsStacker.useSyncIndexing.set(true);
// commit
TransactionHelper.commitOrRollbackTransaction();
TransactionHelper.startTransaction();

SearchService searchService = Framework.getService(SearchService.class);
SearchResponse response = searchService.search(SearchQuery.Builder("SELECT * FROM Document WHERE dc:title = 'A new title'").build());
DocumentModelList docs = response.loadDocuments(session); // "doc" is returned

Obviously, there is a write overhead here because we are splitting the transaction and explicitly calling a refresh. This can be useful but on normal code you have to decide if it makes sense to search documents that are probably already loaded in your context. Note that in unit test you don't have to do this because the CoreSearchFeature is already configured to wait for synchronous indexing during commit.

Migrating a queryAndFetch Request

Replace the code:

IterableQueryResult rows = session.queryAndFetch("SELECT ecm:uuid, dc:title FROM Document", NXQL.NXQL);
...

With:

SearchResponse response = searchService.search(SearchQuery.Builder("SELECT ecm:uuid, dc:title FROM Document").build());
IterableQueryResult docs = response.getHitsAsIterator();

And you gain the limit/offset options.

For now the select clause support is limited to scalar properties. See the page Elasticsearch limitations for more information.

Migrating a Request using scroll API

To prevent overloading Nuxeo, some search client implementation such as Elasticsearch/OpenSearch will reject search request where from + size > 10000. If you need to get all documents from query, you should use scroll search request.

To do that, we can use the following code snippet`.

SearchService searchService = Framework.getService(SearchService.class);

// Perform initial search and get first batch of 20 results
String query = "SELECT * FROM Document WHERE dc:title = 'A new title'";
var response = searchService.search(SearchQuery.Builder(nxql).scrollSearch(true).scrollSize(20).build);

while (response.getHitsCount() > 0) {
    for (DocumentModel doc : response.loadDocuments(session)) {
        // Process document
        ...
    }
    // Get next batch of results
    response = searchService.scrollSearch(response.getScrollContext());
}

The scrollSize represents the size of each batch retrieved with scroll API. A scrollKeepAlive can also be set, it should be long enough to perform the next scroll query.

Deactivating Database Optimizations

By default there are two optimizations done at the database level: one for the document path search (STARTSWITH operator) and one for the right filtering (ACL Read). They optimize the read request but they have a cost on write operations. Basically they are materializing data (document path and ACL read) using stored procedures.

See VCS configuration documentation to see how to disable pathOptimizations and aclOptimizations.

If you disable these optimizations you will have bad response time on NXQL for non-administrator user and for queries involving the STARTSWITH operator. Again look at the slow queries monitoring and migrate them to Elasticsearch (see above section).

Deactivating Database Full-Text Search

By default full text is indexed at the database level. If you have moved your full-text search to Elasticsearch you don't need to maintain database full-text index and trigger.

By using nuxeo.vcs.fulltext.search.disabled=true option in the nuxeo.conf file, full-text will be extracted and saved into the database, but there will be no full-text index, triggers and duplication overhead.

When disabling Database Full-text Search on an existing instance you have to remove the trigger and index manually, for instance for PostgreSQL:

DROP TRIGGER nx_trig_ft_update ON fulltext;
DROP INDEX fulltext_fulltext_idx;
DROP INDEX fulltext_fulltext_title_idx;

On MongoDB:

// remove the fulltext index
db.default.dropIndex('fulltext')
// remove the field containing the metadata fulltext
db.default.updateMany({}, {$unset: {"ecm:fulltextSimple":1}});

Using a Search Engine Pass Through (HTTP API)

The OpenSearch search client package comes with PassThrough endpoint, it exposes a limited set of Read Only OpenSearch HTTP REST API, taking in account the Nuxeo authentication and authorization.

See the OpenSearch Passthrough page for more information.