Indexing and Query

Updated: September 10, 2024

Hyland University
Watch the related course on Hyland University:
Video on Indexes from the Data Persistence course

Architecture

Data store and index

The Nuxeo Platform stores documents and their property values either in a SQL database (VCS) or in a NoSQL database (DBS). The same data is also indexed in an Elasticsearch index. The platform offers several ways to query those documents, depending on whether you are calling from a remote application or from Java code executed server-side. In the end, all the methods lead to two possibilities:

  • Query the data store (VCS or DBS). Queries there are "transactional", which means that the result reflects exactly the state of the database in the transaction where the query is executed.
  • Query the Elasticsearch index. This is the most scalable and efficient way to perform a query. Benchmarks show that querying the repository through the Elasticsearch index scales orders of magnitude better than querying the database.

A Query Language

The natural way of expressing a query in the Nuxeo Platform is with NXQL, the Nuxeo Query Language. 

SELECT * FROM Document WHERE
dc:contributors = ?                          -- simple match on a multi-valued field
AND ecm:mixinType != 'Folderish'             -- use facet to remove all folderish documents
AND ecm:mixinType != 'HiddenInNavigation'    -- use facet to remove all documents that should be hidden
AND ecm:isVersion = 0                        -- exclude versions, only get live documents
AND ecm:isProxy = 0                          -- don't return proxies
AND ecm:isTrashed = 0                        -- don't return documents that are in the trash

As you can see, there is no security clause, because the repository only ever returns documents that the current user can see. Security filtering is built in, so you don't have to post-filter the results returned by a search, even if you use complex custom security policies.

The Nuxeo Platform is also compatible with CMISQL, the query language defined in the CMIS standard.

Page Providers: a Pagination Service

The framework also provides a paginated query system, the Page Providers. A page provider exposes a query defined in NXQL together with additional services: pagination, parameters, a maximum number of results, and aggregate definitions. Page providers are named and declared to the server via a contribution. More information can be found about the page provider object. Page providers are used in many places in the platform, for instance in the web application for browsing and for dashboards.

Resource endpoints are also based on page providers. Because they are declarative, page providers are very easy to override. That way, most of the document-list logic of the default application can be redefined just by overriding the corresponding page provider. You can also build your own application in the same way. Note that in the web application, page providers are associated with a higher-level concept, the Content View, which wraps all the UI aspects of executing and presenting a search result (see paragraph below).
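To make the pagination service more concrete, here is a minimal, self-contained sketch of the paging arithmetic that a paginated query exposes. It is not Nuxeo code: the class PagingSketch and its methods are hypothetical names used for illustration, and the zero-based page index is an assumption chosen to mirror the currentPageIndex parameter shown later on this page.

```java
import java.util.List;

/** Illustrative only: paging arithmetic over an in-memory list, not the Nuxeo PageProvider API. */
class PagingSketch {

    /** Number of pages needed to show resultsCount items with pageSize items per page. */
    static long numberOfPages(long resultsCount, long pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        // Ceiling division without floating point.
        return (resultsCount + pageSize - 1) / pageSize;
    }

    /** Returns the items of the page at the given zero-based index (empty if past the end). */
    static <T> List<T> page(List<T> items, int pageSize, int pageIndex) {
        if (pageSize <= 0 || pageIndex < 0) {
            throw new IllegalArgumentException("pageSize must be positive and pageIndex non-negative");
        }
        long offset = (long) pageIndex * pageSize;
        int from = (int) Math.min(offset, items.size());
        int to = (int) Math.min(offset + pageSize, items.size());
        return items.subList(from, to);
    }

    public static void main(String[] args) {
        List<String> titles = List.of("a", "b", "c", "d", "e");
        System.out.println(numberOfPages(titles.size(), 2));
        System.out.println(page(titles, 2, 2));
    }
}
```

A real page provider computes the same kind of offset from its page size and current page index, but delegates the actual slicing to the data store or to Elasticsearch.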

How to Query the Repository

The following list gives an overview of the different ways of querying the repository.

  1. Search Endpoint (Client side) A resource-oriented REST API that lets you execute NXQL queries directly or use a named page provider that has been declared server-side. The API returns serialized JSON documents and offers all the mechanisms provided by the Nuxeo Platform REST API (content enrichers, specific headers, …).

    Example:

    http://localhost:8080/nuxeo/site/api/v1/search/lang/NXQL/execute?query=select * from Document&pageSize=2&currentPageIndex=1
    


  2. Command Operations (Client side) A set of Automation operations lets you query a page provider that has been declared on the server. The scope is much the same as with the search endpoint.

    Example:

    curl -XPOST -u Administrator:Administrator -H"Content-Type: application/json+nxrequest; charset=UTF-8" http://localhost:8080/nuxeo/site/automation/Repository.PageProvider -d '{"params":{"providerName":"default_document_suggestion"}}'
    


  3. CMIS (Client side & Server side) The Nuxeo Platform is compatible with the CMIS standard, which covers querying through CMISQL. You can query the Nuxeo Platform repository using CMISQL in server-side Java, or remotely via the JSON-based or AtomPub bindings.

    Example:

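    // "session" is an OpenCMIS Session; false means "do not search all versions"
    ItemIterable<QueryResult> results = session.query("SELECT * FROM cmis:document", false);
    for (QueryResult hit : results) {
      for (PropertyData<?> property : hit.getProperties()) {
        String queryName = property.getQueryName();  // name of the property as used in the query
        Object value = property.getFirstValue();     // first (or only) value of the property
      }
    }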
    


  4. PageProvider (Server side) Page provider objects implement the PageProvider interface, which provides in Java all the primitives for getting documents, pages and related information.

    Example:

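    // props must carry the CoreSession (key CoreQueryDocumentPageProvider.CORE_SESSION_PROPERTY);
    // the three null arguments are the sort infos, the page size and the current page index
    PageProvider<DocumentModel> pp = (PageProvider<DocumentModel>) ppService.getPageProvider(
            "TREE_CHILDREN_PP", null, null, null, props,
            new Object[] { myDoc.getId() });  // query parameter: the parent document id
    List<DocumentModel> documents = pp.getCurrentPage();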
    


  5. CoreSession.query() (Server side) The CoreSession object is the main server-side Java interface for accessing the repository. Among its methods, query() lets you perform an NXQL query directly and get a list of DocumentModel objects (the basic Java wrapping of a Nuxeo document). In most situations it is better to rely on a page provider, as it is easier to override and maintain, but session.query() is still an option.
  6. CoreSession.queryAndFetch() (Server side) Like session.query(), CoreSession.queryAndFetch() performs an NXQL query, but returns an iterable of Java Map objects instead of DocumentModel objects.
  7. CoreSession.queryProjection() (Server side) The queryProjection() methods perform an NXQL query and return a page of projections as Java Map objects. Note: query() can also return a page, but the result is then a DocumentModelList.
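The Search Endpoint example in item 1 contains spaces in its query parameter, which must be URL-encoded when the request is built programmatically. The following is a minimal sketch of how a Java client might assemble that URL. The class SearchUrlSketch and the method searchUri are hypothetical names; the path and the query, pageSize and currentPageIndex parameters are taken from the example above.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Illustrative only: builds the search endpoint URL shown in item 1, with an encoded NXQL query. */
class SearchUrlSketch {

    /**
     * @param baseUrl assumed to be the REST root without trailing slash,
     *                e.g. http://localhost:8080/nuxeo/site
     */
    static URI searchUri(String baseUrl, String nxql, int pageSize, int currentPageIndex) {
        // URLEncoder produces form encoding ("+" for spaces); switch to %20.
        // A literal "+" in the query is already encoded as %2B, so the replace is safe.
        String encoded = URLEncoder.encode(nxql, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create(baseUrl + "/api/v1/search/lang/NXQL/execute"
                + "?query=" + encoded
                + "&pageSize=" + pageSize
                + "&currentPageIndex=" + currentPageIndex);
    }

    public static void main(String[] args) {
        System.out.println(searchUri("http://localhost:8080/nuxeo/site", "select * from Document", 2, 1));
    }
}
```

The resulting URI can then be passed to any HTTP client, together with the authentication your server requires.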

Elasticsearch Configuration

The default configuration uses an embedded Elasticsearch instance that runs in the same JVM as the Nuxeo Platform. By default, the Elasticsearch indexes are located in nxserver/data/elasticsearch.

This embedded mode is for testing purposes only and should not be used in production.

See the documentation to set up and configure an Elasticsearch cluster.

Full-Text Capabilities

Both the VCS/DBS implementations and Elasticsearch provide full-text search capabilities. With VCS/DBS, capabilities may differ slightly depending on the back end (Oracle, PostgreSQL, SQL Server, …). The Elasticsearch implementation performs best in terms of relevance, dictionary configuration, stemming, and so on. It is therefore advised to use an Elasticsearch page provider when you want to search the full-text index.

More documentation can be found about full-text search expressions.

You should also read carefully how you can tune the full-text index for maximizing the relevance depending on your context of use.

Facets and Other Aggregates Support

Aggregates are a way to compute additional information on a search result, so as to group and count result items and project them against various axes. For instance: "in the search result, 5 of the documents have the value 'Specifications' for the field dc:nature".
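As a rough illustration of what a "terms" aggregate computes, the sketch below groups and counts in-memory search hits by the value of one field. It does not call Elasticsearch or any Nuxeo API: the class TermsAggregateSketch, the method termsAggregate and the representation of hits as simple maps are all assumptions made for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Illustrative only: an in-memory equivalent of a terms aggregate on one field. */
class TermsAggregateSketch {

    /** Counts how many hits carry each distinct value of the given field; hits without the field are skipped. */
    static Map<String, Long> termsAggregate(List<Map<String, String>> hits, String field) {
        return hits.stream()
                .map(hit -> hit.get(field))
                .filter(Objects::nonNull)
                // TreeMap keeps the buckets sorted by term for stable output.
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Map<String, String>> hits = List.of(
                Map.of("dc:title", "Spec A", "dc:nature", "Specifications"),
                Map.of("dc:title", "Report B", "dc:nature", "Report"),
                Map.of("dc:title", "Spec C", "dc:nature", "Specifications"),
                Map.of("dc:title", "Untyped D"));
        System.out.println(termsAggregate(hits, "dc:nature"));
    }
}
```

Each entry of the returned map corresponds to one bucket (a term and its document count), which is the information an aggregate widget displays next to each selectable value.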

The Elasticsearch page provider implementation provides aggregates support. You can define which aggregates are requested from Elasticsearch with each query, and a mechanism lets you filter subsequent queries using the aggregates system offered by Elasticsearch.

It is possible to leverage aggregates both at the API level and in the user interface, where a set of dedicated aggregates widgets has been added. They can be used from Nuxeo Studio.

See the How-to about aggregates widgets.

  • Terms with nuxeo-checkbox-aggregation element:

    terms-vocabulary.png

  • Date Histograms and Date Ranges:

    date-histogram.png

  • Terms with User & Groups with the nuxeo-dropdown-aggregation element:

    authors-suggestion.png

  • Range:

    range.png

Configuring Search Interfaces in the Nuxeo Platform

Searches are modeled through the notion of "page providers" in the Nuxeo Platform. The page provider object holds all the information necessary for rendering a search: an NXQL query, pagination logic, quick filters, sorting capabilities, etc. Page providers are non-UI objects: to display a search to users (typically in Nuxeo Web UI), you also need to generate the form layout and the corresponding result layouts.

These elements are fully configurable via Nuxeo Studio, so configuring new business-specific search screens for your application users takes only a few minutes.

search-exemple.png

Indexing Logic

The section Elasticsearch Indexing Logic provides more details on how documents are indexed in Elasticsearch.