Nuxeo Server

Elasticsearch Setup

Updated: November 13, 2017

This page provides several configuration use cases for Elasticsearch.

Setting up an Elasticsearch Cluster

Elasticsearch Supported Versions

The Nuxeo Platform can communicate with Elasticsearch using two different protocols:

  • The transport client protocol (port 9300 by default). With this protocol you are encouraged to use the same major version on the client and cluster sides, as described in the matrix below. We recommend using the same JVM version for all Elasticsearch nodes and Nuxeo.
  • The HTTP REST protocol (port 9200 by default), which provides looser coupling with Elasticsearch. This protocol is supported since Nuxeo 9.3.

Nuxeo Platform Version        Elasticsearch Library   Elasticsearch Cluster
9.3                           5.6.3                   5.6.x
9.2                           2.3.5                   2.3.x to 2.4.x
9.1                           2.3.5                   2.3.x to 2.4.x
LTS 2016 (from 8.1 to 8.3)    1.5.2                   1.5.2 to 1.7.x
LTS 2016 (from 8.10)          2.3.5                   2.3.x to 2.4.x
LTS 2015                      1.5.2                   1.5.2 to 1.7.x
6.0                           1.1.2                   1.1.2 to 1.7.x
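
As a rough illustration of the "same major version" guidance for the transport client, the check can be sketched in Python. The function name and the dotted version-string format are assumptions made for this example; they are not part of Nuxeo or Elasticsearch.

```python
def same_major_version(library_version: str, cluster_version: str) -> bool:
    """Return True when both dotted version strings share the same major number."""
    library_major = library_version.split(".")[0]
    cluster_major = cluster_version.split(".")[0]
    return library_major == cluster_major


# Library 5.6.3 (Nuxeo 9.3) against a node of a 5.6.x cluster.
print(same_major_version("5.6.3", "5.6.4"))
# Library 2.3.5 against the same node: the major versions differ.
print(same_major_version("2.3.5", "5.6.4"))
```

The matrix remains the reference: it is narrower than a plain major-version match, so this check only catches the most obvious mismatches.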

The default configuration uses an embedded Elasticsearch instance that runs in the same JVM as the Nuxeo Platform.

This embedded mode is for testing purposes only and must not be used in production: neither Elasticsearch nor Nuxeo supports an embedded installation.

For production you need to set up an Elasticsearch cluster.

Installing the Elasticsearch Cluster

Refer to the Elasticsearch documentation to install and secure your cluster. Basically:

  • Don’t run Elasticsearch open to the public.
  • Don’t run Elasticsearch as root.

Use an explicit cluster name by setting cluster.name in the /etc/elasticsearch/elasticsearch.yml file. This avoids conflicts with other environments.

If you have a large number of documents, or if you run Nuxeo in cluster mode, you may reach the limits of the default configuration. Here is some recommended tuning:

Consider disabling OS swapping, or use another Elasticsearch option to prevent the heap from being swapped.

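In the /etc/default/elasticsearch file you can increase the JVM heap to half of the available OS memory. The example below applies to Elasticsearch 2.x and earlier; Elasticsearch 5.x no longer supports ES_HEAP_SIZE and takes the heap size from -Xms and -Xmx in its jvm.options file: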

# For a dedicated node with 12g of RAM
ES_HEAP_SIZE=6g
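
The "half of the available memory" rule can be written down as a small Python helper. This is only a sketch: the function names are made up for this example, and the 31 GB ceiling is an assumption taken from general Elasticsearch guidance about keeping the heap below roughly 32 GB, not something this page states.

```python
def recommended_heap_gb(total_ram_gb: int, cap_gb: int = 31) -> int:
    """Half of the node's RAM in whole gigabytes, limited by cap_gb (an assumed ceiling)."""
    if total_ram_gb < 2:
        raise ValueError("need at least 2 GB of RAM to halve into whole gigabytes")
    return min(total_ram_gb // 2, cap_gb)


def heap_setting(total_ram_gb: int) -> str:
    """Format the ES_HEAP_SIZE line used in /etc/default/elasticsearch."""
    return f"ES_HEAP_SIZE={recommended_heap_gb(total_ram_gb)}g"


print(heap_setting(12))  # ES_HEAP_SIZE=6g
```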

To prevent indexing errors like:

EsRejectedExecutionException[rejected execution (queue capacity 50)

increase the bulk queue size in the /etc/elasticsearch/elasticsearch.yml configuration file (in Elasticsearch 5.x the setting is named thread_pool.bulk.queue_size):

threadpool.bulk.queue_size: 500

Configuring Nuxeo to Access the Elasticsearch Cluster

Nuxeo supports two protocols to access the Elasticsearch cluster: the transport client protocol and the REST client.

The Transport Client protocol (default)

Here are the nuxeo.conf options available for the Transport Client protocol:

elasticsearch.client=TransportClient
elasticsearch.addressList=somenode:9300,anothernode:9300
elasticsearch.clusterName=elasticsearch

Where:

  • elasticsearch.client selects the TransportClient protocol. This is the default, so the option is not required.
  • elasticsearch.addressList points to one or more Elasticsearch nodes, as a comma-separated list of host:port entries. Note that the default port for this protocol is 9300 (not 9200).
  • elasticsearch.clusterName is the name of the cluster to join; elasticsearch is the default cluster name.
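
A common mistake with the transport client is to point elasticsearch.addressList at the HTTP port. The following Python sketch parses a host:port list and flags such entries before the value goes into nuxeo.conf. The function names are hypothetical, and the parsing rules are an assumption about the format described above rather than Nuxeo's own parser.

```python
from typing import List, Tuple

HTTP_PORT = 9200  # default HTTP port, which the transport client cannot use


def parse_address_list(value: str) -> List[Tuple[str, int]]:
    """Split 'host:port,host:port' into (host, port) pairs."""
    nodes = []
    for entry in value.split(","):
        host, _, port = entry.strip().rpartition(":")
        if not host or not port.isdigit():
            raise ValueError(f"expected host:port, got {entry!r}")
        nodes.append((host, int(port)))
    return nodes


def suspicious_nodes(nodes: List[Tuple[str, int]]) -> List[str]:
    """Return the hosts configured with the HTTP port instead of the transport port."""
    return [host for host, port in nodes if port == HTTP_PORT]


nodes = parse_address_list("somenode:9300,anothernode:9300")
print(nodes)
print(suspicious_nodes(nodes))
```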

The REST Client

This protocol is supported since Nuxeo 9.3:

elasticsearch.client=RestClient
elasticsearch.addressList=http://somenode:9200,https://anothernode:443

Where:

  • elasticsearch.client selects the RestClient protocol.
  • elasticsearch.addressList is a comma-separated list of URLs.
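
For the REST client the list holds full URLs rather than host:port pairs. A minimal Python sketch of a sanity check on such a value follows. The function name is hypothetical, and falling back to port 80 or 443 when the URL has no explicit port is a choice made for this example only, not documented Nuxeo behaviour.

```python
from urllib.parse import urlsplit


def parse_rest_address_list(value: str):
    """Split a comma-separated URL list into (scheme, host, port) tuples."""
    default_ports = {"http": 80, "https": 443}  # assumption for URLs without a port
    endpoints = []
    for entry in value.split(","):
        parts = urlsplit(entry.strip())
        if parts.scheme not in default_ports or not parts.hostname:
            raise ValueError(f"expected an http(s) URL, got {entry!r}")
        port = parts.port or default_ports[parts.scheme]
        endpoints.append((parts.scheme, parts.hostname, port))
    return endpoints


print(parse_rest_address_list("http://somenode:9200,https://anothernode:443"))
```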

Advanced REST Client configuration

If you need to use Basic Authentication or an X.509 certificate, you need to override the default template. Here are all the available options:

  <extension target="org.nuxeo.elasticsearch.ElasticSearchComponent" point="elasticSearchClient">
  <elasticSearchClient class="org.nuxeo.elasticsearch.client.ESRestClientFactory">
    <option name="addressList">https://somenode:443</option>
    <!-- basic auth -->
    <option name="username">scott</option>
    <option name="password">tiger</option>
    <!-- use a keystore -->
    <option name="keystore.path">/usr/lib/jvm/java-8-oracle/jre/lib/security/cacerts</option>
    <option name="keystore.password">changeit</option>
    <!-- connection and socket timeout for the rest client -->
    <option name="connection.timeout.ms">5000</option>
    <option name="socket.timeout.ms">20000</option>
  </elasticSearchClient>
  </extension>
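
Before wiring credentials into the contribution above, it can help to confirm that the cluster accepts them. The Python sketch below builds, without sending it, a request carrying the same example credentials as an HTTP Basic Authorization header. The function name is hypothetical, and the commented-out call at the end is what would actually contact the node.

```python
import base64
import urllib.request


def build_authenticated_request(url: str, username: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request carrying HTTP Basic credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, method="GET")
    request.add_header("Authorization", f"Basic {token}")
    return request


request = build_authenticated_request("https://somenode:443/", "scott", "tiger")
print(request.get_header("Authorization"))

# To send it, with a timeout in seconds comparable to connection.timeout.ms:
# with urllib.request.urlopen(request, timeout=5) as response:
#     print(response.status)
```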

Index Names

Nuxeo manages three Elasticsearch indexes:

  • The repository index, used to index document content. This index can be rebuilt from scratch by extracting content from the repository.
  • The audit logs index, used to store audit entries. This index is a primary storage and cannot be rebuilt.
  • A sequence index, used to serve unique values that can be used as primary keys. This index is also a primary storage.

To connect the Nuxeo Platform instance to the Elasticsearch cluster, check the following options in the nuxeo.conf file and edit them if you need to change the default values:

elasticsearch.indexName=nuxeo
elasticsearch.indexNumberOfReplicas=0
audit.elasticsearch.indexName=${elasticsearch.indexName}-audit
seqgen.elasticsearch.indexName=${elasticsearch.indexName}-uidgen

Where:

  • elasticsearch.indexName is the name of the Elasticsearch index for the default document repository.
  • elasticsearch.indexNumberOfReplicas is the number of replicas. By default you have 5 shards and 1 replica. If you have a single node in your cluster you should set indexNumberOfReplicas to 0. Visit the Elasticsearch documentation for more information on shards and replicas.
  • audit.elasticsearch.indexName is the name of the Elasticsearch index for audit logs.
  • seqgen.elasticsearch.indexName is the name of the Elasticsearch index for the uid sequencer, extensively used for audit logs.

You can find all the available options in the nuxeo.defaults file.
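
The audit and sequence index names are derived from elasticsearch.indexName through the ${...} placeholder, so changing the base name renames all three indexes. The Python sketch below imitates that substitution to show the resulting names. It is a simplified stand-in written for this example, not Nuxeo's actual property resolver.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def expand(properties: dict) -> dict:
    """Resolve ${key} references between the given properties."""
    def resolve(value: str, depth: int = 0) -> str:
        if depth > 10:
            raise ValueError("placeholder nesting too deep or circular")

        def substitute(match):
            key = match.group(1)
            # Leave unknown placeholders untouched rather than guessing a value.
            if key not in properties:
                return match.group(0)
            return resolve(properties[key], depth + 1)

        return PLACEHOLDER.sub(substitute, value)

    return {key: resolve(value) for key, value in properties.items()}


conf = {
    "elasticsearch.indexName": "nuxeo",
    "audit.elasticsearch.indexName": "${elasticsearch.indexName}-audit",
    "seqgen.elasticsearch.indexName": "${elasticsearch.indexName}-uidgen",
}
for key, value in expand(conf).items():
    print(f"{key}={value}")
```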

Disabling Elasticsearch

Elasticsearch is enabled by default. To disable Elasticsearch indexing and search, add the following option to nuxeo.conf:

elasticsearch.enabled=false

Disabling Elasticsearch for Audit Logs

When Elasticsearch is enabled and the audit.elasticsearch.enabled property is set to true in nuxeo.conf, which is the case by default, Elasticsearch is used as the backend for audit logs.

This improves scalability, especially when using Nuxeo Drive with a large set of users.

When Elasticsearch is used as the backend for audit logs, it becomes the reference storage (there is no SQL backend anymore, unlike in Nuxeo versions lower than 7.3).

For this reason, make sure you read the Backing Up and Restoring the Audit Elasticsearch Index page.

If you want to disable Elasticsearch for audit logs and use the SQL database as the backend instead, update this property in nuxeo.conf:

audit.elasticsearch.enabled=false

Rebuilding the Repository Index

If you need to reindex the whole repository, you can do this from the Admin > Elasticsearch > Admin tab.

You can fine tune the indexing process using the following options:

  • Sizing the indexing worker thread pool. The default size is 4; using more threads crawls the repository faster:

    elasticsearch.indexing.maxThreads=4
    
  • Tuning the number of documents per worker and the number of documents submitted using the Elasticsearch bulk API:

    # Reindexing option, number of documents to process per worker
    elasticsearch.reindex.bucketReadSize=500
    # Reindexing option, number of documents to submit to Elasticsearch per bulk command
    elasticsearch.reindex.bucketWriteSize=50
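
To see how the two bucket sizes relate, the Python sketch below splits one worker bucket into bulk-sized slices. It only illustrates the arithmetic (500 documents read, 50 per bulk command, hence 10 bulk commands per bucket). The function name is hypothetical and the actual Nuxeo worker code may batch differently.

```python
from typing import Iterator, List, Sequence


def split_into_bulks(doc_ids: Sequence[str], write_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most write_size document ids."""
    if write_size <= 0:
        raise ValueError("write_size must be positive")
    for start in range(0, len(doc_ids), write_size):
        yield list(doc_ids[start:start + write_size])


bucket = [f"doc-{i}" for i in range(500)]   # one worker bucket (bucketReadSize=500)
bulks = list(split_into_bulks(bucket, 50))  # bucketWriteSize=50
print(len(bulks))  # 10
```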
    

Changing the Mappings and Settings of Indexes

Updating the Repository Index Configuration

Nuxeo comes with a default mapping that sets the locale for full-text and declares some fields as being date or numeric.

For fields that are not explicitly defined in the mapping, Elasticsearch tries to guess the type the first time it indexes the field. If the field is empty it is treated as a String field. This is why, most of the time, you need to explicitly set the mapping for your custom fields of type date, numeric or full-text. Fields that are used for sorting and that could be empty also need to be defined, to prevent an unmapped field error.

The default mapping is located in the ${NUXEO_HOME}/templates/common-base/nxserver/config/elasticsearch-config.xml.nxftl.

To override and tune the default mapping:

Since Nuxeo 9.3, instead of overriding the extension point you can simply override the default mapping or settings JSON files:

  1. Create a custom template like myapp with a nuxeo.defaults file that contains:

    myapp.target=.
    
  2. In this custom template create a file named nxserver/config/elasticsearch-doc-mapping.json to override the mapping. You can create a file named nxserver/config/elasticsearch-doc-settings.json to override the settings.

  3. Update nuxeo.conf to use your custom template:

    nuxeo.templates=default,/etc/nuxeo/myapp

  4. Restart and re-index the entire repository from the Admin tab (see the previous section). Re-indexing is needed to apply the new settings and mapping.

For mapping customization examples, see the page Configuring the Elasticsearch Mapping.
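
The template layout from the steps above could be scaffolded with a short Python script such as the one below. The function name is hypothetical and the empty mapping is only a placeholder. The real content of elasticsearch-doc-mapping.json has to come from your own mapping.

```python
import json
import tempfile
from pathlib import Path


def scaffold_template(parent: Path, name: str, mapping: dict) -> Path:
    """Create <parent>/<name> with a nuxeo.defaults file and a mapping override."""
    template = parent / name
    config_dir = template / "nxserver" / "config"
    config_dir.mkdir(parents=True, exist_ok=True)
    (template / "nuxeo.defaults").write_text(f"{name}.target=.\n", encoding="utf-8")
    (config_dir / "elasticsearch-doc-mapping.json").write_text(
        json.dumps(mapping, indent=2) + "\n", encoding="utf-8"
    )
    return template


# Written to a temporary directory here; on a server the parent would be e.g. /etc/nuxeo.
with tempfile.TemporaryDirectory() as tmp:
    path = scaffold_template(Path(tmp), "myapp", {})  # {} is a placeholder, not a real mapping
    for item in sorted(p.relative_to(path).as_posix() for p in path.rglob("*") if p.is_file()):
        print(item)
```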

Updating the Audit Logs Index Configuration

Here the index is a primary storage and you cannot rebuild it, so you need a tool that extracts the _source of documents from one index and submits it to a new index that has been set up with the new configuration.

  1. Update the mappings or settings configuration by overriding the ${NUXEO_HOME}/templates/common-base/nxserver/config/elasticsearch-audit-index-config.xml file (follow the same procedure as in the section above for the repository index).
  2. Use a new name for audit.elasticsearch.indexName (like nuxeo-audit2).
  3. Start the Nuxeo Platform. The new index is created with the new mapping.
  4. Stop the Nuxeo Platform.
  5. Copy the audit log entries to the new index using stream2es. Here we copy nuxeo-audit to nuxeo-audit2:

    curl -O download.elasticsearch.org/stream2es/stream2es; chmod +x stream2es
    ./stream2es es --source http://localhost:9200/nuxeo-audit --target http://localhost:9200/nuxeo-audit2 --replace
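
This page relies on stream2es for the copy. As a possible alternative that this page does not describe, clusters running Elasticsearch 2.3 or later expose a Reindex API (POST _reindex) that copies documents from one index to another. The Python sketch below only builds such a request so its shape can be inspected. The function name is hypothetical, and whether this approach suits the Nuxeo audit index is an assumption to validate on a test cluster first.

```python
import json
import urllib.request


def build_reindex_request(base_url: str, source_index: str, target_index: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the _reindex endpoint."""
    body = {"source": {"index": source_index}, "dest": {"index": target_index}}
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/_reindex",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_reindex_request("http://localhost:9200", "nuxeo-audit", "nuxeo-audit2")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To run the copy for real:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```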
    

Configuration for Multi Repositories

You need to define an index for each repository. This is done by adding an elasticSearchIndex contribution.

  1. Create a custom template as described in the above section "Changing the Mappings and Settings of Indexes".
  2. Add a second elasticSearchIndex contribution:

    <elasticSearchIndex name="nuxeo-repo2" type="doc" repository="repo2"> ....
    

    Where name is the Elasticsearch index name and repository the repository name.

Investigating and Reporting Problems

Activate Traces

To understand why a document is not present in search results or not indexed, you can activate a debug trace.

Open the lib/log4j.xml file and uncomment the ELASTIC section:

      <appender name="ELASTIC" class="org.apache.log4j.FileAppender">
        <errorHandler class="org.apache.log4j.helpers.OnlyOnceErrorHandler" />
        <param name="File" value="${nuxeo.log.dir}/elastic.log" />
        <param name="Append" value="false" />
        <layout class="org.apache.log4j.PatternLayout">
          <param name="ConversionPattern" value="%d{ISO8601} %-5p [%t][%c] %m%X%n" />
        </layout>
      </appender>
      <category name="org.nuxeo.elasticsearch" additivity="false">
        <priority value="TRACE" />
        <appender-ref ref="ELASTIC" />
      </category>

The elastic.log file will contain all the requests made by the Nuxeo Platform to Elasticsearch, including the curl command, ready to be copied, pasted and debugged in a terminal.

Reporting Settings and Mapping

It is also important to report the current settings and mapping of the Elasticsearch index (here called nuxeo):

# URLs are quoted so that the shell does not interpret the ? character
curl 'localhost:9200/nuxeo/_settings?pretty' > /tmp/nuxeo-settings.json
curl 'localhost:9200/nuxeo/_mapping?pretty' > /tmp/nuxeo-mapping.json
# misc info and stats on Elasticsearch
curl 'localhost:9200' > /tmp/es-info.txt
curl 'localhost:9200/_cluster/stats?pretty' >> /tmp/es-info.txt
curl 'localhost:9200/_nodes/stats?pretty' >> /tmp/es-info.txt
curl 'localhost:9200/_cat/health?v' >> /tmp/es-info.txt
curl 'localhost:9200/_cat/nodes?v' >> /tmp/es-info.txt
curl 'localhost:9200/_cat/indices?v' >> /tmp/es-info.txt

Testing an Analyzer

To test the full-text analyzer:

curl -XGET 'localhost:9200/nuxeo/_analyze?analyzer=fulltext&pretty' -d 'This is a text for testing, file_name/1-foos-BAR.jpg'

To test an analyzer derived from the mapping:

curl -XGET 'localhost:9200/nuxeo/_analyze?field=ecm:path.children&pretty' -d 'workspaces/main folder/folder'

Viewing Indexed Terms for Document Field

This can be done using a tool like Luke to analyze the index at the Lucene level. It is also possible to use an aggregation on fields that are not of type text, or on text fields with the fielddata option enabled:

# view indexed terms for dc:title of document 3d50118c-7472-4e99-9cc9-321deb4fe053
curl -XGET 'localhost:9200/nuxeo/doc/_search?pretty' -d'{
 "query" : {"ids" : { "values" : ["3d50118c-7472-4e99-9cc9-321deb4fe053"] }},
 "aggs": {"my_aggs": {"terms": {"field": "dc:title", "order" : { "_count" : "desc" }, "size": 1000}}}}'

You may need to change the size parameter to get more or less indexed terms.
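
The terms of interest are returned under aggregations.my_aggs.buckets, each bucket holding a key and a doc_count. A minimal Python sketch for pulling them out of the JSON response follows. The function name is hypothetical, and the sample response is hand-written for illustration, not captured from a real cluster.

```python
from typing import Dict, List, Tuple


def indexed_terms(response: Dict, aggregation: str = "my_aggs") -> List[Tuple[str, int]]:
    """Return (term, doc_count) pairs from a terms aggregation response."""
    buckets = response.get("aggregations", {}).get(aggregation, {}).get("buckets", [])
    return [(bucket["key"], bucket["doc_count"]) for bucket in buckets]


# Hand-written response fragment for illustration only.
sample_response = {
    "aggregations": {
        "my_aggs": {
            "buckets": [
                {"key": "My Document", "doc_count": 1},
            ]
        }
    }
}
print(indexed_terms(sample_response))
```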

Comparing the Elasticsearch Index with the Database Content

You can use the esync tool to compare the two and pinpoint discrepancies.

This is a read-only standalone tool. It requires access to both the database and Elasticsearch (using the transport client on port 9300).


History: Created by Benoit Delbosc