Architecture
Data store and index
The Nuxeo Platform stores documents and their property values either in a relational database (VCS) or in a NoSQL database (DBS). This data is also indexed at the same time in an Elasticsearch index. The platform offers several ways to query those documents, depending on whether you are working from a remote application or in Java code executed server-side. In the end, all the methods lead to two possibilities:
- Query the data store (VCS or DBS). Queries there are "transactional", which means that the result will reflect exactly the state of the database in the transaction where the query is executed.
- Query the Elasticsearch index. This is the most scalable and efficient way to perform a query. Benchmarks show querying the repository using this Elasticsearch index scales orders of magnitude better than the database.
A Query Language
The natural way of expressing a query in the Nuxeo Platform is with NXQL, the Nuxeo Query Language.
SELECT * FROM Document WHERE
dc:contributors = ? -- simple match on a multi-valued field
AND ecm:mixinType != 'Folderish' -- use facet to remove all folderish documents
AND ecm:mixinType != 'HiddenInNavigation' -- use facet to remove all documents that should be hidden
AND ecm:isVersion = 0 -- only get checked-out documents
AND ecm:isProxy = 0 -- don't return proxies
AND ecm:currentLifeCycleState != 'deleted' -- don't return documents that are in the trash
As you may see, there is no security clause, because the repository will always only return documents that the current user can see. Security filtering is built-in, so you don't have to post-filter results returned by a search, even if you use complex custom security policies.
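The `?` placeholder in the query above stands for a value supplied at execution time. As a minimal sketch of what binding such a value involves, the plain-Java helper below quotes a string and substitutes it for the first placeholder. The class and method names are hypothetical and not part of the Nuxeo API, and the escaping rule (a backslash before single quotes and backslashes) is an assumption about NXQL string literals; on a real server, prefer the parameter handling offered by the platform.

```java
final class NxqlLiteralSketch {

    /** Wraps a value in single quotes, backslash-escaping backslashes and single quotes (assumed NXQL rule). */
    static String quote(String value) {
        StringBuilder sb = new StringBuilder("'");
        for (char c : value.toCharArray()) {
            if (c == '\\' || c == '\'') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('\'').toString();
    }

    /**
     * Replaces the first '?' in the query with the quoted value.
     * Naive on purpose: it does not check whether that '?' sits inside a literal.
     */
    static String bind(String nxql, String value) {
        int i = nxql.indexOf('?');
        if (i < 0) {
            throw new IllegalArgumentException("no placeholder in query");
        }
        return nxql.substring(0, i) + quote(value) + nxql.substring(i + 1);
    }

    public static void main(String[] args) {
        String nxql = "SELECT * FROM Document WHERE dc:contributors = ? AND ecm:isProxy = 0";
        System.out.println(bind(nxql, "O'Brien"));
    }
}
```

Quoting the value before it reaches the query text is what keeps a contributor name containing a quote from breaking the statement.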
The Nuxeo Platform is also compatible with the CMISQL defined in the CMIS standard.
Page Providers: a Pagination Service
The framework also provides a paginated query system: page providers. A page provider exposes a query defined in NXQL with additional services: pagination, parameters, maximum number of results, and aggregate definitions. Page providers are named and declared to the server via a contribution. More information can be found about the page provider object. Page providers are used in many places in the platform: in the web application for browsing, for dashboards, …
Resource endpoints are also based on a page provider. Because they are declarative, page providers are very easy to override. That way, most of the document list logic of the default application can be redefined just by overriding the corresponding page provider. You can also build your own application in the same way. Note that in the web application, page providers are associated with a higher-level concept, the Content View, which wraps all the UI aspects of executing and presenting a search result (see paragraph below).
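To make the pagination service more concrete, here is a minimal sketch of the arithmetic behind a paginated result. It is plain Java with hypothetical names, not the PageProvider interface itself, and it assumes a zero-based page index.

```java
final class PageWindowSketch {

    /** Number of pages needed to show resultsCount items, pageSize items at a time. */
    static long numberOfPages(long resultsCount, long pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        return (resultsCount + pageSize - 1) / pageSize; // ceiling division
    }

    /** Index of the first item of the given page (zero-based page index assumed). */
    static long offset(long pageIndex, long pageSize) {
        return pageIndex * pageSize;
    }

    /** True when at least one more page follows the given one. */
    static boolean hasNextPage(long resultsCount, long pageSize, long pageIndex) {
        return pageIndex + 1 < numberOfPages(resultsCount, pageSize);
    }
}
```

For example, 45 results with a page size of 20 give three pages, and the third page (index 2) starts at item 40.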
How to Query the Repository
The following table and schema give an overview of the different ways of querying the repository.
Query Endpoint
Client side
A resource-oriented REST API that allows executing direct NXQL queries or using a named page provider that has been declared server side. The API returns serialized JSON documents and offers all the mechanisms provided by the Nuxeo Platform REST API (content enrichers, specific headers…).
Example:
http://NUXEO_SERVER/nuxeo/site/api/v1/query?query=select * from Document&pageSize=2&currentPageIndex=1
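A remote client normally has to URL-encode the NXQL statement before placing it in the query string. The sketch below builds such a URL in plain Java; the class and method names are hypothetical, while the path and the `query`, `pageSize` and `currentPageIndex` parameters are the ones used in the example URL.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

final class QueryUrlSketch {

    /** Builds a query endpoint URL, encoding the NXQL statement for use in a query string. */
    static String build(String baseUrl, String nxql, int pageSize, int currentPageIndex) {
        // URLEncoder produces form encoding: spaces become '+', reserved characters are percent-encoded
        String encoded = URLEncoder.encode(nxql, StandardCharsets.UTF_8);
        return baseUrl + "/site/api/v1/query?query=" + encoded
                + "&pageSize=" + pageSize
                + "&currentPageIndex=" + currentPageIndex;
    }

    public static void main(String[] args) {
        System.out.println(build("http://NUXEO_SERVER/nuxeo", "select * from Document", 2, 1));
    }
}
```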
Related topics:
Command Operations
Client side
A set of Automation operations allows querying a page provider that has been declared on the server. The scope is much the same as with the query endpoint. You may prefer Automation if you are in a Java environment, as the Automation Java client is well suited to this use case.
Example: TODO sample cURL POST
Related topics:
- PageProvider Operation definition
- How to Use the Java Automation Client
- How to use the PageProvider operation from the JavaScript client (search for "Document.PageProvider")
CMIS
Client side & Server side
The Nuxeo Platform is compatible with the CMIS standard. CMIS covers queries, using CMISQL. It is possible to query the Nuxeo Platform repository using CMISQL in server-side Java, or remotely via the SOAP and AtomPub bindings.
Example:
ItemIterable<QueryResult> results = session.query("SELECT * FROM cmis:document", false);
for (QueryResult hit : results) {
    // Each hit exposes the selected properties of one matching object
    for (PropertyData<?> property : hit.getProperties()) {
        String queryName = property.getQueryName();
        Object value = property.getFirstValue();
    }
}
Related topics:
PageProvider
Server side
Page provider objects implement the PageProvider interface. It provides in Java all the primitives for getting documents, pages and related information.
Example:
PageProvider<DocumentModel> pp = (PageProvider<DocumentModel>) ppService.getPageProvider(
        "TREE_CHILDREN_PP", null, null, null, props, new Object[] { myDoc.getId() });
List<DocumentModel> documents = pp.getCurrentPage();
Related topics:
CoreSession.query
Server side
The CoreSession object is the main server-side Java interface for accessing the repository. Among the available methods is query(), which allows performing an NXQL query directly and getting a list of DocumentModels (the basic Java wrapping of a Nuxeo document). In most situations it is better to rely on a page provider as it is easier to override, maintain, etc., but session.query() is still an option.
CoreSession.queryAndFetch()
Server side
TODO
Elasticsearch Configuration
The default configuration uses an embedded Elasticsearch instance that runs in the same JVM as the Nuxeo Platform. By default the Elasticsearch indexes are located in nxserver/data/elasticsearch.
This embedded mode is only for testing purposes and should not be used in production.
See the administration documentation to set up and configure an Elasticsearch cluster.
Full-Text Capabilities
Both the VCS/DBS implementations and Elasticsearch provide full-text search capabilities. With VCS/DBS, capabilities may differ slightly depending on the back end (Oracle, PostgreSQL, SQL Server, …). The Elasticsearch implementation performs best in terms of relevance, dictionary configuration, stemming, etc. It is therefore advised to use an Elasticsearch page provider when you want to search the full-text index.
More documentation can be found about full-text search expressions.
You should also read carefully how you can tune the full-text index for maximizing the relevance depending on your context of use.
Facets and Other Aggregates Support
Aggregates are a way to compute additional information on a search result so as to group and count result items and project them against various axes. For instance: "in the search result, 5 of the documents have the value "Specifications" for the field dc:nature". The Elasticsearch page provider implementation provides aggregate support. It is possible to define which aggregates are requested from Elasticsearch with each query, and a mechanism filters subsequent queries using the aggregates system offered by Elasticsearch. Aggregates can be leveraged both at the API level and in the user interface, where a set of dedicated aggregate widgets has been added. They can be used from Nuxeo Studio.
See the How-to about aggregates widgets.
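As an illustration of what a "group and count" aggregate computes, the sketch below counts in-memory search hits per value of a field such as dc:nature. It is plain Java with hypothetical names and sample data; it does not call Elasticsearch or the page provider API, and only mirrors the kind of result a terms-style aggregate returns.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

final class TermsAggregateSketch {

    /** Counts hits per distinct value of the given field, ignoring hits that lack the field. */
    static Map<String, Long> countByField(List<Map<String, String>> hits, String field) {
        return hits.stream()
                .filter(hit -> hit.get(field) != null)
                .collect(Collectors.groupingBy(hit -> hit.get(field), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Hypothetical search result: each hit is a map of property name to value
        List<Map<String, String>> hits = List.of(
                Map.of("dc:title", "Spec A", "dc:nature", "Specifications"),
                Map.of("dc:title", "Spec B", "dc:nature", "Specifications"),
                Map.of("dc:title", "Q3 figures", "dc:nature", "Report"),
                Map.of("dc:title", "Untyped note"));
        System.out.println(countByField(hits, "dc:nature"));
    }
}
```

Each key of the resulting map corresponds to one bucket that a search filter could offer as a selectable value, with its document count.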
Configuring Search Interfaces in the Nuxeo Platform Back Office: Content Views
Search UI has been conceptualized through the notion of “content view” in the Nuxeo Platform framework. The Content View object holds all the information necessary for rendering a search filter, the associated page provider, a search result and all the actions that can be performed on that search result (sorting, exporting, slideshow, selection actions…). Content views are fully configurable via Nuxeo Studio, which makes it a matter of a few minutes to configure new business-specific search screens for your application users.
Indexing Logic
The section Elasticsearch Indexing Logic provides more details on how documents are indexed in Elasticsearch.
In this section:
- NXQL
- Full-Text Queries — Nuxeo documents can be searched using full-text queries; the standard way to do so is to use the top-right "quick search" box in the Nuxeo Platform. Search queries are expressed in a Nuxeo-defined syntax, described below.
- Page Providers — Page providers allow retrieving items with pagination facilities, they can be used in a non-UI or non-JSF context like event listeners or core services.
- Page Provider Aggregates — When using the Elasticsearch Page Provider, you can define aggregates that will be returned along with the query result.
- Configuring the Elasticsearch Mapping — This documentation page talks about the many aspects you can tune for improving the search experience for your users when it comes to full-text search. This page is limited to full-text searches querying the Elasticsearch index, which is the recommended index for performing full-text searches.
- Elasticsearch Indexing Logic
- How to Configure a New Default Search Form in the Search Tab
- How to Make CMISQL Queries Using Java
- How to Make a Page Provider or Content View Query Elasticsearch Index — Learn how to make a content view query Elasticsearch instead of the Core API.
- How to Configure a Search Filter With Facets and Other Aggregates
- Indexing and Querying How-To Index
- Quick Search — The simple search is configured to work in conjunction with a content view. This section describes the document type and layouts used in the default simple search.
- Moving Load from Database to Elasticsearch — By moving query load from the database to Elasticsearch, applications can dramatically increase performance and scalability.