By moving query load from the database to Elasticsearch, applications can dramatically increase performance and scalability.
Monitoring slow queries makes it easy to pinpoint the queries that should be migrated from the database to Elasticsearch.
This can be done as described in How to Make a Page Provider or Content View Query Elasticsearch Index.
Using a page provider to query the repository makes it easy to tune or override queries.
If you cannot use a page provider and want to migrate code like this:

```java
DocumentModelList docs = session.query(nxql);
```

with Elasticsearch you will use a query builder:

```java
ElasticSearchService ess = Framework.getService(ElasticSearchService.class);
DocumentModelList docs = ess.query(new NxQueryBuilder(session).nxql(nxql).limit(-1)); // do we need to load all documents?
```
The first difference is that `session.query` returns all the matching documents, while Elasticsearch applies a default limit of 10 documents. To get all the documents, use a limit of `-1`. Think twice before using the `-1` limit: loading all the documents can hurt performance, especially if the query is dynamic and can match all the documents in the repository. Note that with a limit set to `0` you can get the total result size (using `docs.totalSize()`) without loading any documents.
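The counting trick above can be sketched as follows. This is an illustrative fragment, not a runnable program: it assumes a running Nuxeo instance, an open `CoreSession` named `session`, and the `ElasticSearchService` API shown earlier.

```java
import org.nuxeo.ecm.core.api.DocumentModelList;
import org.nuxeo.elasticsearch.api.ElasticSearchService;
import org.nuxeo.elasticsearch.query.NxQueryBuilder;
import org.nuxeo.runtime.api.Framework;

// limit(0): fetch no documents, but still compute the total hit count
ElasticSearchService ess = Framework.getService(ElasticSearchService.class);
DocumentModelList docs = ess.query(new NxQueryBuilder(session)
        .nxql("SELECT * FROM Document")
        .limit(0));
long total = docs.totalSize(); // total number of matching documents
// docs.size() is 0: no document was actually loaded
```

This is a cheap way to display a result count before deciding whether to fetch the documents themselves.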
Another difference is that the set of documents searchable at a given time **t** may differ between the database and Elasticsearch:
- When using the repository API, a document is searchable after a modification as soon as there is a `session.save()` (for the same session), or after the transaction commit for other sessions.
- When using Elasticsearch, a document is searchable after a modification only when the transaction is committed, AND the asynchronous indexing job is done, AND the Elasticsearch index has been refreshed (which happens every second by default).
For instance, migrating this code:

```java
doc.setPropertyValue("dc:title", "A new title");
session.saveDocument(doc);
session.save();
docs = session.query("SELECT * FROM Document WHERE dc:title = 'A new title'"); // expect to match "doc"
```

can be done like this:

```java
doc.setPropertyValue("dc:title", "A new title");
session.saveDocument(doc);
session.save();
ElasticSearchAdmin esa = Framework.getService(ElasticSearchAdmin.class);
TransactionHelper.commitOrRollbackTransaction();
TransactionHelper.startTransaction();
esa.prepareWaitForIndexing().get(20, TimeUnit.SECONDS); // wait for indexing
esa.refresh(); // explicit refresh
ess.query(new NxQueryBuilder(session).nxql("SELECT * FROM Document WHERE dc:title = 'A new title'").limit(-1)); // "doc" is returned
```
Obviously there is a write overhead here, because we are splitting the transaction and explicitly calling a refresh. This can be useful when migrating unit tests, but in normal code you have to decide whether it makes sense to search for documents that are probably already loaded in your context.
Replace the code:

```java
IterableQueryResult rows = session.queryAndFetch("SELECT ecm:uuid, dc:title FROM Document", NXQL.NXQL);
...
```

with:

```java
EsResult result = ess.queryAndAggregate(new NxQueryBuilder(session).nxql("SELECT ecm:uuid, dc:title FROM Document").limit(-1));
IterableQueryResult rows = result.getRows();
```
And you gain the limit/offset options.
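As a sketch of how these options can be used for pagination (assuming `NxQueryBuilder` exposes `limit` and `offset` setters, and that `IterableQueryResult` must be closed after use; this fragment requires a running Nuxeo instance and an open `CoreSession`):

```java
import org.nuxeo.ecm.core.api.IterableQueryResult;
import org.nuxeo.elasticsearch.api.EsResult;
import org.nuxeo.elasticsearch.query.NxQueryBuilder;

// Page through results 20 rows at a time instead of loading everything
int pageSize = 20;
for (int offset = 0; ; offset += pageSize) {
    EsResult page = ess.queryAndAggregate(new NxQueryBuilder(session)
            .nxql("SELECT ecm:uuid, dc:title FROM Document")
            .limit(pageSize)
            .offset(offset));
    IterableQueryResult rows = page.getRows();
    try {
        if (!rows.iterator().hasNext()) {
            break; // no more results
        }
        // process rows here...
    } finally {
        rows.close(); // always release the result set
    }
}
```

Paginating this way avoids the `-1` limit and keeps memory usage bounded regardless of how many documents the query matches.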
For now, the SELECT clause support is limited to scalar properties. See the Elasticsearch limitations page for more information.
By default there are two optimizations done at the database level: one for document path search (the `STARTSWITH` operator) and one for rights filtering (ACL read). They optimize read requests but have a cost on write operations: basically, they materialize data (the document path and the read ACLs) using stored procedures.
See the VCS configuration documentation to learn how to disable these optimizations.
If you disable these optimizations you will get bad response times on NXQL queries for non-administrator users and for queries involving the `STARTSWITH` operator. Again, use the slow queries monitoring to find them and migrate them to Elasticsearch (see the section above).
By default, full text is indexed at the database level. If you have moved your full-text search to Elasticsearch, you don't need to maintain the database full-text index and trigger.
By setting the `nuxeo.vcs.fulltext.search.disabled=true` option in the `nuxeo.conf` file, full text will still be extracted and saved into the database, but there will be no full-text index, triggers or duplication overhead.
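For reference, the corresponding line in `nuxeo.conf` looks like this (a configuration fragment, to be merged with your existing settings):

```
# Disable database full-text search: full text is still extracted and stored,
# but no database full-text index or triggers are maintained.
# Full-text search is then served by Elasticsearch.
nuxeo.vcs.fulltext.search.disabled=true
```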
When disabling database full-text search on an existing instance you have to remove the trigger and index manually, for instance for PostgreSQL:

```sql
DROP TRIGGER nx_trig_ft_update ON fulltext;
DROP INDEX fulltext_fulltext_idx;
DROP INDEX fulltext_fulltext_title_idx;
```
If you have set up a multi-repository configuration, to query over all of them just use the `searchOnAllRepositories` option:

```java
docs = ess.query(new NxQueryBuilder(session).nxql(nxql).searchOnAllRepositories());
```
The nuxeo-elasticsearch-http-read-only addon exposes a limited, read-only subset of the Elasticsearch HTTP REST API, taking into account the Nuxeo authentication and authorization.
See the addon README for more information.
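A hypothetical request against this addon might look like the following. The URL path (`/site/es/...`), the index name (`nuxeo`), and the credentials are all assumptions for illustration only; check the addon README for the actual endpoint and configuration:

```shell
# Hypothetical example: path, index name and credentials are assumptions.
# Authentication uses regular Nuxeo credentials; results are filtered
# according to the user's permissions.
curl -u jdoe:secret \
  "http://localhost:8080/nuxeo/site/es/nuxeo/_search?size=10" \
  -H 'Content-Type: application/json' \
  -d '{"query": {"match": {"dc:title": "report"}}}'
```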