Indexing and Query

Updated: October 16, 2020

Architecture

Data store and index

The Nuxeo Platform stores documents and their property values either in a relational database (VCS) or in a NoSQL database (DBS). This data is simultaneously indexed in Elasticsearch. The platform offers several ways to query those documents, depending on whether you are querying from a remote application or from Java code executed server-side. In the end, all these methods lead to one of two possibilities:

  • Query the data store (VCS or DBS). Queries there are "transactional": the result reflects exactly the state of the database within the transaction in which the query is executed.
  • Query the Elasticsearch index. This is the most scalable and efficient way to perform a query. Benchmarks show that querying the repository through the Elasticsearch index scales orders of magnitude better than querying the database.

A Query Language

The natural way of expressing a query in the Nuxeo Platform is with NXQL, the Nuxeo Query Language.

SELECT * FROM Document WHERE
dc:contributors = ? -- simple match on a multi-valued field
AND ecm:mixinType != 'Folderish' -- use facet to remove all folderish documents
AND ecm:mixinType != 'HiddenInNavigation' -- use facet to remove all documents that should be hidden
AND ecm:isVersion = 0 -- only get live documents, not archived versions
AND ecm:isProxy = 0 -- don't return proxies
AND ecm:currentLifeCycleState != 'deleted' -- don't return documents that are in the trash

Note that there is no security clause: the repository only ever returns documents that the current user can see. Security filtering is built in, so you don't have to post-filter the results returned by a search, even if you use complex custom security policies.
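
When such a query is assembled in Java code instead of being passed a `?` parameter, string values have to be quoted. The following is a minimal sketch of that, assuming that NXQL string literals are single-quoted and escape an embedded quote with a backslash. The `NxqlSketch` class and its methods are hypothetical and do not call any Nuxeo API.

```java
import java.util.List;

/** Hypothetical helper, not part of the Nuxeo API: assembles the NXQL query shown above. */
final class NxqlSketch {

    /** Wraps a value in single quotes, escaping backslashes and quotes (assumed NXQL escaping). */
    static String quote(String value) {
        return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'";
    }

    /** Builds the query for the documents a given user contributed to. */
    static String contributedBy(String contributor) {
        List<String> clauses = List.of(
                "dc:contributors = " + quote(contributor),
                "ecm:mixinType != 'Folderish'",
                "ecm:mixinType != 'HiddenInNavigation'",
                "ecm:isVersion = 0",
                "ecm:isProxy = 0",
                "ecm:currentLifeCycleState != 'deleted'");
        return "SELECT * FROM Document WHERE " + String.join(" AND ", clauses);
    }

    public static void main(String[] args) {
        // "jdoe" is a made-up user name
        System.out.println(contributedBy("jdoe"));
    }
}
```

Keeping the clauses in a list makes it easy to add or drop a filter without having to rebalance the `AND` keywords by hand.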

The Nuxeo Platform also supports CMISQL, the query language defined in the CMIS standard.

Page Providers: a Pagination Service 

The framework also provides a paginated query system, the Page Providers. Page providers expose a query defined in NXQL along with additional services: pagination, parameters, a maximum number of results and the definition of aggregates. Page providers are named and declared to the server via a contribution. More information can be found about the page provider object. Page providers are used in many places in the platform, such as document browsing in the web application and dashboards.

Resource endpoints are also based on page providers. Because they are declarative, page providers are very easy to override. That way, most of the document-listing logic of the default application can be redefined just by overriding the corresponding page provider. You can also build your own application in the same way. Note that in the web application, page providers are associated with a higher-level concept, the Content View, which wraps all the UI aspects of executing a search and presenting its results (see the Content Views section below).
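
The pagination service a page provider offers comes down to simple arithmetic on a page size and a page index. The sketch below illustrates only that arithmetic, assuming a zero-based page index. The `PageWindow` class is hypothetical and does not reproduce the actual `PageProvider` interface.

```java
/** Hypothetical illustration of pagination arithmetic; not the Nuxeo PageProvider interface. */
final class PageWindow {
    final long pageSize;
    final long pageIndex;    // zero-based (assumption)
    final long resultsCount; // total number of documents matching the query

    PageWindow(long pageSize, long pageIndex, long resultsCount) {
        if (pageSize <= 0 || pageIndex < 0 || resultsCount < 0) {
            throw new IllegalArgumentException("pageSize must be > 0, other values >= 0");
        }
        this.pageSize = pageSize;
        this.pageIndex = pageIndex;
        this.resultsCount = resultsCount;
    }

    /** Position of the first result of the current page in the full result set. */
    long offset() {
        return pageSize * pageIndex;
    }

    /** Total number of pages, rounding up so a partial last page is counted. */
    long numberOfPages() {
        return (resultsCount + pageSize - 1) / pageSize;
    }

    /** Number of results actually present on the current page. */
    long currentPageSize() {
        return Math.max(0, Math.min(pageSize, resultsCount - offset()));
    }

    boolean hasNextPage() {
        return pageIndex + 1 < numberOfPages();
    }

    public static void main(String[] args) {
        PageWindow page = new PageWindow(2, 1, 5);
        System.out.println("offset=" + page.offset() + ", pages=" + page.numberOfPages());
    }
}
```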

How to Query the Repository

The following list gives an overview of the different ways of querying the repository.

  1. Query Endpoint

    Client side

    A resource-oriented REST API that lets you execute NXQL queries directly or use a named page provider that has been declared server side. The API returns documents serialized as JSON and offers all the mechanisms provided by the Nuxeo Platform REST API (content enrichers, specific headers…).

    Example:

    http://NUXEO_SERVER/nuxeo/site/api/v1/query?query=select * from Document&pageSize=2&currentPageIndex=1
    


  2. Command Operations

    Client side

    A set of Automation operations lets you query a page provider that has been declared on the server. The scope is much the same as with the query endpoint. You may prefer Automation if you are in a Java environment, as the Automation Java client is best suited to this use case.

    Example: TODO sample cURL POST


  3. CMIS

    Client side & Server side

    The Nuxeo Platform is compatible with the CMIS standard, which covers querying through CMISQL. You can query the Nuxeo Platform repository using CMISQL in server-side Java code, or remotely via the SOAP and AtomPub bindings.

    Example:

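    import org.apache.chemistry.opencmis.client.api.ItemIterable;
    import org.apache.chemistry.opencmis.client.api.QueryResult;
    import org.apache.chemistry.opencmis.commons.data.PropertyData;

    // "session" is assumed to be an already opened OpenCMIS Session
    ItemIterable<QueryResult> results = session.query("SELECT * FROM cmis:document", false);
    for (QueryResult hit : results) {
        // Each hit exposes the selected properties of one matching document
        for (PropertyData<?> property : hit.getProperties()) {
            String queryName = property.getQueryName();
            Object value = property.getFirstValue();
        }
    }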
    


  4. PageProvider

    Server side

    Page provider objects implement the PageProvider interface, which provides in Java all the primitives for getting documents, pages and related information.

    Example:

    PageProvider<DocumentModel> pp = (PageProvider<DocumentModel>) ppService.getPageProvider(
            "TREE_CHILDREN_PP", null, null, null, props,
            new Object[] { myDoc.getId() });
    List<DocumentModel> documents = pp.getCurrentPage();
    


  5. CoreSession.query

    Server side

    The CoreSession object is the main server-side Java interface for accessing the repository. Among its methods is query(), which lets you run an NXQL query directly and get a list of DocumentModel objects (the basic Java wrapper of a Nuxeo document). In most situations it is better to rely on a page provider, as it is easier to override and maintain, but session.query() is still an option.

  6. CoreSession.queryAndFetch

    Server side TODO
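
The Query Endpoint example in item 1 contains spaces in its NXQL query, which have to be percent-encoded before the request is sent. The following is a minimal sketch using only the JDK. The path and parameter names are taken from that example, while the `QueryEndpointUrl` class and the `nuxeo.example.com` host are placeholders.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds a Query Endpoint URL with an encoded NXQL query. */
final class QueryEndpointUrl {

    static URI build(String serverBase, String nxql, int pageSize, int currentPageIndex) {
        // URLEncoder targets HTML forms and writes spaces as '+'; switch to %20.
        // A literal '+' in the input is already encoded as %2B, so the replace is safe.
        String encodedQuery = URLEncoder.encode(nxql, StandardCharsets.UTF_8).replace("+", "%20");
        return URI.create(serverBase + "/nuxeo/site/api/v1/query"
                + "?query=" + encodedQuery
                + "&pageSize=" + pageSize
                + "&currentPageIndex=" + currentPageIndex);
    }

    public static void main(String[] args) {
        // Placeholder host; replace with the address of your own server
        System.out.println(build("http://nuxeo.example.com", "select * from Document", 2, 1));
    }
}
```

The resulting `URI` can be handed to any HTTP client. Authentication and request headers are left out of this sketch.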

Elasticsearch Configuration

The default configuration uses an embedded Elasticsearch instance that runs in the same JVM as the Nuxeo Platform. By default, the Elasticsearch indexes are located in nxserver/data/elasticsearch.

This embedded mode is only for testing purposes and should not be used in production.

See the administration documentation to set up and configure an Elasticsearch cluster.

Full-Text Capabilities

Both the VCS/DBS implementations and Elasticsearch provide full-text search capabilities. Depending on the back end (Oracle, PostgreSQL, SQL Server, …), capabilities may differ slightly. The Elasticsearch implementation performs best in terms of relevancy, dictionary configuration, stemming, etc. It is therefore advised to use an Elasticsearch page provider when you want to search the full-text index.

More documentation can be found about full-text search expressions.

You should also read carefully how you can tune the full-text index for maximizing the relevance depending on your context of use.

Facets and Other Aggregates Support

Aggregates compute additional information on a search result by grouping and counting result items and projecting them against various axes. For instance: "in the search result, 5 of the documents have the value 'Specifications' for the field dc:nature". The Elasticsearch page provider implementation supports aggregates. You can define which aggregates are requested from Elasticsearch with each query, and a mechanism lets subsequent queries be filtered using the aggregates system offered by Elasticsearch. Aggregates can be leveraged both at the API level and in the user interface, where a set of dedicated aggregate widgets is available. These widgets can be used from Nuxeo Studio.
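
Conceptually, a terms aggregate groups the hits by a field value and counts each group. The sketch below reproduces that computation in plain Java on an in-memory list, purely to illustrate the idea. It does not use the Elasticsearch or Nuxeo APIs, and the class name and sample values are invented for the example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical illustration of what a terms aggregate computes. */
final class TermsAggregateSketch {

    /** Counts how many times each value occurs: one bucket per distinct value. */
    static Map<String, Long> countByValue(List<String> fieldValues) {
        return fieldValues.stream()
                .collect(Collectors.groupingBy(value -> value, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up dc:nature values, one per document of an imaginary search result
        List<String> natures = List.of("Specifications", "Report", "Specifications", "Memo");
        countByValue(natures).forEach((nature, count) -> System.out.println(nature + ": " + count));
    }
}
```

In the platform itself this computation is delegated to Elasticsearch, which returns the buckets along with the page of results.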

See the How-to about aggregates widgets.

Terms with Directory Widget
Date Histograms and Date Ranges
Range

Configuring Search Interfaces in the Nuxeo Platform Back Office: Content Views

The search UI is conceptualized through the notion of a "content view" in the Nuxeo Platform framework. The Content View object holds all the information needed to render a search filter, the associated page provider, the search results and all the actions that can be performed on those results (sorting, exporting, slideshow, selection actions…). Content views are fully configurable via Nuxeo Studio, so configuring new business-specific search screens for your application users takes only a few minutes.

A Content View

Indexing Logic

The section Elasticsearch Indexing Logic provides more details on how documents are indexed in Elasticsearch.
