Configuring the Elasticsearch Mapping

Updated: November 13, 2017

This page describes the settings you can tune to improve the search experience for your users when they search the document repository index.

This page applies only to Nuxeo Platform 9.3 or later with Elasticsearch 5.6 or later.

Nuxeo comes with a default mapping that can work with custom fields of your schemas, but in a limited way. To leverage the search capabilities of Elasticsearch you need to define your own mapping, for instance in the following cases:

  • using a non-English or custom analyzer
  • using specific NXQL operators on a custom field: LIKE, ILIKE, ecm:fulltext.custom, STARTSWITH
  • excluding a field from the full-text search

To do this, create your own custom template that redefines the Elasticsearch mapping. This way the mapping reference stays on the Nuxeo configuration side, and you should not update the mapping directly on the Elasticsearch side.

Nuxeo updates the mapping and settings on Elasticsearch only when:

  • The Elasticsearch index does not exist
  • A full repository re-indexing is performed

Customizing the Language

The Nuxeo code and mapping use a full-text analyzer named fulltext. This analyzer is defined in the settings file as an English analyzer.

You can reconfigure the fulltext analyzer to match your language and requirements. Note that fulltext_fr is provided as an example of a French analyzer.

Since Nuxeo 9.3 and the switch to Elasticsearch 5.6, the default _all field is no longer used. This field is deprecated as of Elasticsearch 6.0.

The default fulltext field now relies on a custom field named all_field.

The default mapping contains a dynamic template that copies any text fields into this all_field.

This means that:

  • if you don't set an explicit mapping for a text field it will be part of the all_field.

  • if you set an explicit mapping for a text field you need to choose:

    • Include the field in the default fulltext by using the copy_to option:
      "my:field" : {
        "type" : "keyword",
        "copy_to" : "all_field"
      }
      
    • Exclude the field from the default fulltext by omitting the copy_to option:
      "my:unsearchable_field" : {
        "type" : "keyword"
      }
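
The effect of these two choices can be checked with a small script. The following Python sketch is only an illustration: the helper name fields_in_default_fulltext and the sample dictionary are hypothetical, and it inspects explicit mappings only (fields picked up by the dynamic template are not modeled).

```python
def fields_in_default_fulltext(properties, target="all_field"):
    """Return the explicitly mapped fields that are copied into `target`."""
    included = []
    for name, definition in properties.items():
        copy_to = definition.get("copy_to", [])
        # Elasticsearch accepts either a single field name or a list for copy_to.
        if isinstance(copy_to, str):
            copy_to = [copy_to]
        if target in copy_to:
            included.append(name)
    return included


# Hypothetical mapping fragment reusing the two fields shown above.
properties = {
    "my:field": {"type": "keyword", "copy_to": "all_field"},
    "my:unsearchable_field": {"type": "keyword"},
}

print(fields_in_default_fulltext(properties))  # ['my:field']
```

Running such a check against your own mapping before a re-indexing is one way to confirm which explicitly mapped fields feed the default fulltext.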
      

Making LIKE Work

The LIKE query can be translated to match_phrase_prefix for right truncation. This requires a text field defined as:

"my:field" : {
  "type" : "text",
  "copy_to" : "all_field"
}

If the field is also used for sorting results, it needs to have a special fielddata option:

"my:field" : {
  "type" : "text",
  "copy_to" : "all_field",
  "fielddata" : true
}
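
To make the translation of LIKE more concrete, here is a minimal Python sketch of the idea. It is not the Nuxeo converter code: the function name like_to_es_query is hypothetical, the mapping of _ to ? is an assumption based on SQL LIKE conventions, and escaping of literal wildcard characters is left out.

```python
def like_to_es_query(field, pattern):
    """Translate an NXQL LIKE pattern into an Elasticsearch query body (sketch)."""
    # Right truncation only: a single trailing % and no other wildcard.
    body = pattern[:-1]
    if pattern.endswith("%") and body and "%" not in body and "_" not in body:
        return {"match_phrase_prefix": {field: body}}
    # Any other pattern falls back to a wildcard query.
    # Assumption: NXQL % and _ map to the Elasticsearch wildcards * and ?.
    # Escaping of literal * or ? in the pattern is ignored in this sketch.
    wildcard = pattern.replace("%", "*").replace("_", "?")
    return {"wildcard": {field: wildcard}}


print(like_to_es_query("my:field", "foo%"))
print(like_to_es_query("my:field", "%foo%"))
```

Only the first call can use match_phrase_prefix. The second pattern has a leading wildcard, so it ends up as a wildcard query.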

To run a case-insensitive search with the ILIKE operator, declare your field as a multi-field with a lowercase subfield like this:

"my:field" : {
  "type" : "keyword",
  "copy_to" : "all_field",
  "fields" : {
    "lowercase" : {
      "type": "text",
      "analyzer" : "lowercase_analyzer"
    }
  }
}

Making STARTSWITH Work with a Custom Field

To use the STARTSWITH operator on a field with a path pattern, such as a hierarchical vocabulary, turn your field into a multi-field with a children subfield:

"my:field" : {
  "type" : "keyword",
  "fields" : {
    "children" : {
      "type" : "text",
      "search_analyzer" : "keyword",
      "analyzer" : "path_analyzer"
    }
  }
}
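
The following Python sketch illustrates why this mapping supports STARTSWITH. It assumes that path_analyzer splits the value into its ancestor paths using / as the delimiter (the behavior of a path hierarchy tokenizer) and that the keyword search analyzer keeps the query value as a single term. The function names and the sample vocabulary value are hypothetical.

```python
def path_tokens(value, delimiter="/"):
    """Emit every ancestor prefix of a path, then the full path."""
    tokens = []
    # Start at 1 so a leading delimiter does not yield an empty token.
    pos = value.find(delimiter, 1)
    while pos != -1:
        tokens.append(value[:pos])
        pos = value.find(delimiter, pos + 1)
    if value:
        tokens.append(value)
    return tokens


def startswith_matches(stored_value, prefix):
    """The query value stays whole and must equal one of the indexed tokens."""
    return prefix in path_tokens(stored_value)


print(path_tokens("europe/france/paris"))
# ['europe', 'europe/france', 'europe/france/paris']
print(startswith_matches("europe/france/paris", "europe/france"))  # True
print(startswith_matches("europe/france/paris", "europe/fra"))     # False
```

In this model the match works on whole path segments, so a partial segment such as europe/fra does not match.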

Adding a New Full-Text Field

To use the full-text search syntax on a custom field you need to create a multi field with a fulltext index like this:

"my:text" : {
  "type" : "keyword",
  "copy_to" : "all_field",
  "fields" : {
    "fulltext" : {
      "type": "text",
      "analyzer" : "fulltext"
    }
  }
}

When you need to search with left truncation (or both left and right truncation), the NXQL syntax to use is LIKE '%foo%'. This kind of query uses an Elasticsearch wildcard search, but left truncation is costly because the term index cannot be used efficiently. Using an NGram index is a good alternative in such a case.

First you need to define an nGram analyzer in your settings:

   "analysis" : {
...
      "tokenizer" : {
...
         "ngram_tokenizer": {
           "type": "nGram",
           "min_gram": 3,
           "max_gram": 12
         },
...
      "analyzer" : {
...
        "ngram_analyzer": {
          "type": "custom",
          "filter": [
            "lowercase"
          ],
          "tokenizer": "ngram_tokenizer"
        },
...

Then use it in the mapping:

   "properties" : {
...
      "dc:title" : {
         "type" : "text",
         "fields" : {
           "fulltext" : {
             "type": "text",
             "analyzer" : "fulltext",
             "boost": 2
           },
           "ngram": {
             "type": "text",
             "analyzer": "ngram_analyzer"
           }
         }
      },

Now you can do an efficient version of:

SELECT * FROM Document WHERE dc:title ILIKE '%Foo%'

Using:

SELECT * FROM Document WHERE /*+ES: INDEX(dc:title.ngram) ANALYZER(lowercase_analyzer) OPERATOR(match) */ dc:title = 'Foo'
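
To see why the n-gram index can answer this query without a wildcard search, here is a minimal Python sketch of the idea. It is a simplified model, not the Elasticsearch implementation: it assumes the tokenizer emits every character n-gram between min_gram and max_gram (3 and 12 in the settings above), that the lowercase filter is applied at index time, and that the query value is lowercased and kept as a single term. The function names and sample title are hypothetical.

```python
def ngrams(text, min_gram=3, max_gram=12):
    """Return all lowercased character n-grams with min_gram <= length <= max_gram."""
    text = text.lower()
    grams = set()
    for size in range(min_gram, max_gram + 1):
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


def ngram_match(title, query):
    """The query is lowercased as one term and looked up among the indexed n-grams."""
    return query.lower() in ngrams(title)


print(ngram_match("Seafood recipes", "Foo"))  # True
print(ngram_match("Seafood recipes", "fo"))   # False
```

In this model, a search term shorter than min_gram or longer than max_gram can never match, which is worth keeping in mind when choosing these two values.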

History: Created by Alain Escaffre