
Elasticsearch Hints Cheat Sheet

Updated: July 17, 2023

This page lists practical use cases for Elasticsearch hints.

 

Fuzzy Search on Full Text Index

Configuration

  • Drop any string field in the search layout of your content view
  • Use the following values for the ES hints configuration:
    • Index: _all
    • Analyzer: fulltext
    • Operator: fuzzy

Test case

  • Create a new document with an attached text file that contains the string "Nuxeo rocks"
  • Search for "Nuxo": the document you just created appears in the results
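
For readers who want to reproduce this search directly against Elasticsearch, the hints above correspond roughly to a fuzzy query on the _all field. The request body below is a minimal sketch under that assumption. The exact query Nuxeo generates from the hints is not shown on this page and may differ, and the Analyzer hint is left out of the sketch.

```json
{
  "query": {
    "fuzzy": {
      "_all": {
        "value": "Nuxo"
      }
    }
  }
}
```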

Using the Common Operator on the Main Attachment Content

 

Extract from the course "What's New in Nuxeo Platform LTS 2015?" on Hyland University.

Suppose you want to be able to search using the common operator on your documents' main attachment content. This Elasticsearch operator is interesting for two reasons:

  • The common operator can be seen as an alternative to full-text search. One notable difference is that it lets you search on terms that the full-text analyzer would have removed. If I absolutely want to find “Not Beyond Space Travel Agencies”, I’d like to be able to search for the “Not” keyword.
  • The common operator is smart. It splits query terms into those that are rare in the index and those that are common in it. Rare terms get a boost, while common terms carry less weight. Say you have lots of contracts in your repository and you search for "confidentiality clause". If both query terms were given the same importance, the most relevant results might be drowned out. The common operator recognizes that "confidentiality" is rare and boosts it, while lowering the importance of "clause", which is common. This helps you get the most relevant results first.
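
To make the rare/common split concrete, here is a minimal sketch of an Elasticsearch common terms query for the contracts example. The field name contract_text and the cutoff value are hypothetical and chosen only for illustration. With a cutoff_frequency of 0.01, terms found in more than 1% of documents are treated as common and the others as rare.

```json
{
  "query": {
    "common": {
      "contract_text": {
        "query": "confidentiality clause",
        "cutoff_frequency": 0.01
      }
    }
  }
}
```

The right cutoff depends on the size and vocabulary of the index, so treat this value as a starting point to tune rather than a recommendation.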

To implement this use case:

  • In the analyzer configuration, add an analyzer that will be used to index the main attachment's content:
"my_attachment_analyzer" : {
  "type" : "custom",
  "filter" : [
    "word_delimiter_filter",
    "lowercase",
    "asciifolding"
  ],
  "tokenizer" : "standard"
}

 

  • In the properties configuration, update the ecm:binarytext field mapping configuration to the following:
"ecm:binarytext" : {
  "type" : "multi_field",
  "fields" : {
    "ecm:binarytext" : {
      "type" : "string",
      "index" : "no",
      "include_in_all" : true
    },
    "common" : {
      "type" : "string",
      "analyzer" : "my_attachment_analyzer",
      "include_in_all" : false
    }
  }
}

You can now configure hints in Nuxeo Studio using the common operator when querying on the ecm:binarytext.common index.

Nuxeo Studio Configuration

  • Drop any string field in the search layout of your content view
  • Use the following values for the ES hints configuration:
    • Index: ecm:binarytext.common
    • Analyzer: my_attachment_analyzer
    • Operator: common

Test case

  • Create a new document with an attachment that contains the string "Not Beyond Space Travel Agency"
  • Search for "Not": the document you just created appears in the results

Note that this is only a basic test case. The common operator is most useful on very large indexes.
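
As a rough illustration of how the Studio hints in this use case could translate into an Elasticsearch request, the sketch below maps the Index hint to the queried field, the Analyzer hint to the analyzer parameter, and the Operator hint to the query type. This mapping is an assumption made for illustration. The request Nuxeo actually sends is not shown on this page and may differ.

```json
{
  "query": {
    "common": {
      "ecm:binarytext.common": {
        "query": "Not",
        "analyzer": "my_attachment_analyzer"
      }
    }
  }
}
```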