Nuxeo Server

Indexing and Query

Updated: December 2, 2016

Architecture

Data store and index

The Nuxeo Platform stores documents and their property values either in a relational (SQL) database, through VCS, or in a NoSQL database, through DBS. The same data is also indexed in an Elasticsearch index. The platform offers several ways to query those documents, depending on whether you are querying from a remote application or from Java code executed server-side. In the end, all of these methods lead to one of two possibilities:

  • Query the data store (VCS or DBS). Queries there are "transactional", which means that the result reflects exactly the state of the database in the transaction where the query is executed.
  • Query the Elasticsearch index. This is the most scalable and efficient way to perform a query. Benchmarks show that querying the repository through the Elasticsearch index scales orders of magnitude better than querying the database.

A Query Language

The natural way of expressing a query in the Nuxeo Platform is with NXQL, the Nuxeo Query Language. 

SELECT * FROM Document WHERE
dc:contributors = ?                          -- simple match on a multi-valued field
AND ecm:mixinType != 'Folderish'             -- use facet to remove all folderish documents
AND ecm:mixinType != 'HiddenInNavigation'    -- use facet to remove all documents that should be hidden
AND ecm:isCheckedInVersion = 0               -- only get checked-out documents
AND ecm:isProxy = 0                          -- don't return proxies
AND ecm:currentLifeCycleState != 'deleted'   -- don't return documents that are in the trash

As you can see, there is no security clause: the repository only ever returns documents that the current user can see. Security filtering is built in, so you don't have to post-filter the results returned by a search, even if you use complex custom security policies.
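The `?` in the query above is a positional parameter. As a rough illustration of what binding a value to it involves, here is a minimal sketch in plain Java. The `NxqlParams` class and its `quote` and `bind` methods are hypothetical names invented for this example, not part of the Nuxeo API, and the escaping rule (a backslash before single quotes and backslashes) is an assumption. In a real project, prefer the platform's own query-building helpers.

```java
import java.util.List;

/** Hypothetical helper for illustration only; not part of the Nuxeo API. */
final class NxqlParams {

    private NxqlParams() {
    }

    /** Wraps a value in single quotes, escaping backslashes and single quotes (assumed rule). */
    static String quote(String value) {
        String escaped = value.replace("\\", "\\\\").replace("'", "\\'");
        return "'" + escaped + "'";
    }

    /**
     * Replaces each '?' placeholder, left to right, with the matching quoted value.
     * Limitation of this sketch: a '?' inside a quoted literal would also be replaced.
     */
    static String bind(String query, List<String> values) {
        StringBuilder out = new StringBuilder();
        int next = 0;
        for (char c : query.toCharArray()) {
            if (c == '?') {
                if (next >= values.size()) {
                    throw new IllegalArgumentException("Not enough values for the placeholders");
                }
                out.append(quote(values.get(next++)));
            } else {
                out.append(c);
            }
        }
        if (next != values.size()) {
            throw new IllegalArgumentException("More values than placeholders");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String nxql = "SELECT * FROM Document WHERE dc:contributors = ?"
                + " AND ecm:mixinType != 'Folderish'"
                + " AND ecm:isProxy = 0";
        // "jdoe" is a made-up user name.
        System.out.println(bind(nxql, List.of("jdoe")));
    }
}
```

Quoting the value rather than concatenating it raw keeps a name such as O'Brien from breaking the query.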

The Nuxeo Platform also supports CMISQL, the query language defined by the CMIS standard.

Page Providers: a Pagination Service 

The framework also provides a paginated query system: the Page Providers. A page provider exposes a query defined in NXQL together with additional services: pagination, parameters, a maximum number of results, and aggregate definitions. Page providers are named and declared to the server via a contribution. See the documentation on the page provider object for more information. Page providers are used in many places in the platform: in the web application for browsing, in dashboards, and so on.

The Resources endpoints are also based on page providers. Because they are declarative, page providers are very easy to override. Most of the document-listing logic of the default application can therefore be redefined just by overriding the corresponding page provider, and you can build your own application in the same way. Note that in the web application, page providers are associated with a higher-level concept, the Content View, which wraps all the UI aspects of executing a search and presenting its results (see the Content Views section below).
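To make the pagination service more concrete, here is a minimal sketch of the arithmetic that any paginated query has to perform. The `PageWindow` class is hypothetical and is not the Nuxeo `PageProvider` API. The field names echo the `pageSize` and `currentPageIndex` parameters of the query endpoint, and a zero-based page index is assumed.

```java
/** Hypothetical illustration of pagination arithmetic; not the Nuxeo PageProvider API. */
final class PageWindow {

    final long pageSize;
    final long currentPageIndex; // zero-based (assumption)
    final long resultsCount;     // total number of matching documents

    PageWindow(long pageSize, long currentPageIndex, long resultsCount) {
        if (pageSize <= 0 || currentPageIndex < 0 || resultsCount < 0) {
            throw new IllegalArgumentException("pageSize must be > 0, the other values >= 0");
        }
        this.pageSize = pageSize;
        this.currentPageIndex = currentPageIndex;
        this.resultsCount = resultsCount;
    }

    /** Index of the first result of the current page within the full result set. */
    long offset() {
        return pageSize * currentPageIndex;
    }

    /** Total number of pages, rounding up for a partially filled last page. */
    long numberOfPages() {
        return (resultsCount + pageSize - 1) / pageSize;
    }

    /** True when at least one more page follows the current one. */
    boolean isNextPageAvailable() {
        return currentPageIndex + 1 < numberOfPages();
    }

    /** Number of items on the current page (0 when the page lies past the end). */
    long currentPageItemCount() {
        return Math.max(0, Math.min(pageSize, resultsCount - offset()));
    }

    public static void main(String[] args) {
        PageWindow page = new PageWindow(2, 1, 5);
        System.out.println("offset=" + page.offset() + ", pages=" + page.numberOfPages());
    }
}
```

A page provider takes care of this bookkeeping on the server, which is one reason to prefer it over hand-written paging around raw queries.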

How to Query the Repository

The following list gives an overview of the different ways of querying the repository.

  1. Query Endpoint (Client side) A resource-oriented REST API that lets you run NXQL queries directly or use a named page provider that has been declared server-side. The API returns documents serialized as JSON and offers all the mechanisms provided by the Nuxeo Platform REST API (content enrichers, specific headers…). Example:

    http://localhost:8080/nuxeo/site/api/v1/query?query=select * from Document&pageSize=2&currentPageIndex=1
    


  2. Command Operations (Client side) A set of Automation operations lets you query a page provider that has been declared on the server. The scope is much the same as with the query endpoint; you may prefer Automation if you are in a Java environment, as the Automation Java client is best suited for this use case. Example: TODO sample cURL POST
  3. CMIS (Client side & Server side) The Nuxeo Platform is compatible with the CMIS standard, which covers querying through CMISQL. You can query the Nuxeo Platform repository using CMISQL in server-side Java, or remotely via the SOAP and AtomPub bindings. Example:

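    import org.apache.chemistry.opencmis.client.api.ItemIterable;
    import org.apache.chemistry.opencmis.client.api.QueryResult;
    import org.apache.chemistry.opencmis.commons.data.PropertyData;

    // session is an OpenCMIS Session; false means "do not search all versions"
    ItemIterable<QueryResult> results = session.query("SELECT * FROM cmis:document", false);
    for (QueryResult hit : results) {
        // each hit exposes the selected properties of one matching document
        for (PropertyData<?> property : hit.getProperties()) {
            String queryName = property.getQueryName();
            Object value = property.getFirstValue();
        }
    }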
    


  4. PageProvider (Server side) Page provider objects implement the PageProvider interface, which provides in Java all the primitives for getting documents, pages and related information. Example:

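    // ppService is the PageProviderService; props carries the provider properties.
    // The three nulls stand for sort infos, page size and current page, left at their defaults.
    PageProvider<DocumentModel> pp = (PageProvider<DocumentModel>) ppService.getPageProvider(
            "TREE_CHILDREN_PP", null, null, null, props,
            new Object[] { myDoc.getId() }); // query parameter: the id of myDoc
    List<DocumentModel> documents = pp.getCurrentPage();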
    


  5. CoreSession.query (Server side) The CoreSession object is the main server-side Java interface for accessing the repository. Among its methods is query(), which runs an NXQL query directly and returns a list of DocumentModel objects (the basic Java wrapper of a Nuxeo document). In most situations it is better to rely on a page provider, as it is easier to override and maintain, but session.query() is still an option.
  6. CoreSession.queryAndFetch() (Server side) TODO
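The query endpoint example in item 1 contains spaces in its `query` parameter, so a client has to encode the NXQL string before sending the request. The sketch below shows one way to build that URL with the standard Java library only. The `QueryEndpointUrl` class is a hypothetical helper, and it assumes the server accepts form-style encoding, where a space becomes `+`.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for illustration only; not part of any Nuxeo client library. */
final class QueryEndpointUrl {

    private QueryEndpointUrl() {
    }

    /**
     * Builds the query endpoint URL for an NXQL query.
     * baseUrl is the REST root without a trailing slash, e.g. http://localhost:8080/nuxeo/site
     */
    static URI build(String baseUrl, String nxql, int pageSize, int currentPageIndex) {
        // Form-style encoding: spaces become '+', reserved characters are percent-encoded.
        String encodedQuery = URLEncoder.encode(nxql, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/api/v1/query"
                + "?query=" + encodedQuery
                + "&pageSize=" + pageSize
                + "&currentPageIndex=" + currentPageIndex);
    }

    public static void main(String[] args) {
        URI uri = build("http://localhost:8080/nuxeo/site", "select * from Document", 2, 1);
        System.out.println(uri);
    }
}
```

The resulting URI can be passed to any HTTP client; authentication and the response format are outside the scope of this sketch.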

Elasticsearch Configuration

The default configuration uses an embedded Elasticsearch instance that runs in the same JVM as the Nuxeo Platform. By default, the Elasticsearch indexes are located in nxserver/data/elasticsearch.

This embedded mode is for testing purposes only and should not be used in production.

See the administration documentation to set up and configure an Elasticsearch cluster.

Full-Text Capabilities

Both the VCS/DBS implementations and Elasticsearch provide full-text search capabilities. With VCS and DBS, capabilities may differ slightly depending on the back end (Oracle, PostgreSQL, SQL Server, …). The Elasticsearch implementation performs best in terms of relevancy, dictionary configuration, stemming and so on. It is therefore advised to use an Elasticsearch page provider when you want to search the full-text index.

See the documentation on full-text search expressions for more details.

You should also read how to tune the full-text index to maximize relevance for your context of use.

Facets and Other Aggregates Support

Aggregates are a way to compute additional information on a search result, so as to group and count result items and project them against various axes. For instance: "in the search result, 5 of the documents have the value 'Specifications' for the field dc:nature". The Elasticsearch page provider implementation provides aggregates support. You can define which aggregates are requested from Elasticsearch with each query, and a mechanism is implemented to filter subsequent queries using the aggregates system offered by Elasticsearch. Aggregates can be leveraged both at the API level and in the user interface, where a set of dedicated aggregate widgets has been added. These widgets can be used from Nuxeo Studio.
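As an illustration of what a terms-style aggregate returns, the sketch below groups and counts an in-memory list of results by one field, in plain Java. In the platform this computation is delegated to Elasticsearch; the `TermsAggregateSketch` class, the map-based document representation and the sample values are all assumptions made for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

/** Hypothetical in-memory illustration of a terms aggregate; not how Elasticsearch computes it. */
final class TermsAggregateSketch {

    private TermsAggregateSketch() {
    }

    /** Counts how many results carry each value of the given field, ignoring results without it. */
    static Map<String, Long> countByTerm(List<Map<String, String>> results, String field) {
        return results.stream()
                .filter(doc -> doc.get(field) != null)
                .collect(Collectors.groupingBy(
                        doc -> doc.get(field), // bucket key: the field value
                        TreeMap::new,          // sorted buckets for a stable display order
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up search result: each document is reduced to a map of property values.
        List<Map<String, String>> results = List.of(
                Map.of("dc:title", "Spec A", "dc:nature", "Specifications"),
                Map.of("dc:title", "Spec B", "dc:nature", "Specifications"),
                Map.of("dc:title", "Q3 report", "dc:nature", "Report"),
                Map.of("dc:title", "Untyped note"));
        System.out.println(countByTerm(results, "dc:nature"));
    }
}
```

Each bucket (a value and its count) is what an aggregate widget displays, and selecting a bucket adds a filter to the next query.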

See the How-to about aggregates widgets.

Terms with Directory Widget

Date Histograms and Date Ranges

Range

Configuring Search Interfaces in the Nuxeo Platform Back Office: Content Views

In the Nuxeo Platform framework, the search UI is built around the notion of a “content view”. The Content View object holds all the information needed to render a search filter, the associated page provider, the search results, and all the actions that can be performed on those results (sorting, exporting, slideshow, selection actions…). Content views are fully configurable via Nuxeo Studio, so configuring new business-specific search screens for your application users takes only a few minutes.

A Content View

Indexing Logic

The section Elasticsearch Indexing Logic provides more details on how documents are indexed in Elasticsearch.

History: Created by Florent Guillaume