Package org.nuxeo.common.utils
Class FullTextUtils
java.lang.Object
org.nuxeo.common.utils.FullTextUtils
Utility functions for simple fulltext parsing. They are not meant to be exhaustive, but they handle simple cases.
-
Field Details
-
wordPattern
-
MIN_SIZE
public static final int MIN_SIZE
-
STOP_WORDS
-
stopWords
-
UNACCENTED
-
Method Details
-
parseFullText
Extracts the words from a string for simple fulltext indexing. The initial order is kept, but duplicate words are removed.
It omits short words and stop words, removes accents when removeDiacritics is set, and does pseudo-stemming.
Parameters:
string - the string
removeDiacritics - whether diacritics must be removed
Returns:
an ordered set of resulting words
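As a rough illustration of the ordering contract described above (first occurrence kept, later duplicates dropped), the following Java sketch collects words into a LinkedHashSet. It is not Nuxeo's implementation: the class name OrderedWordsSketch, the minimum length of 3, the punctuation-splitting regex and the simplify helper are all hypothetical. Stop words, diacritics and stemming are deliberately left out, which is also why the removeDiacritics parameter is omitted.

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch of the ordering contract described for parseFullText:
 * the first occurrence of a word wins and later duplicates are dropped.
 * This is not Nuxeo's implementation.
 */
final class OrderedWordsSketch {

    // Assumed minimum length; the real MIN_SIZE value is not given here.
    static final int MIN_SIZE = 3;

    static Set<String> parseFullText(String string) {
        // LinkedHashSet iterates in insertion order and ignores repeated adds.
        Set<String> words = new LinkedHashSet<>();
        // Assumed tokenization: split on runs of whitespace or ASCII punctuation.
        for (String token : string.split("[\\s\\p{Punct}]+")) {
            String word = simplify(token);
            if (word != null) {
                words.add(word);
            }
        }
        return words;
    }

    // Hypothetical stand-in for parseWord: lowercases and drops short words
    // (including the empty token produced by leading punctuation).
    // Stop words, diacritics and stemming are not handled in this sketch.
    static String simplify(String token) {
        String word = token.toLowerCase(Locale.ROOT);
        return word.length() < MIN_SIZE ? null : word;
    }
}
```

Under these assumptions, parseFullText("Beta alpha beta, a Gamma alpha") returns [beta, alpha, gamma]: the result follows insertion order rather than alphabetical order, and the one-letter word "a" is dropped as too short.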
-
parseWord
Parses a word and returns a simplified lowercase form.
Parameters:
string - the word
removeDiacritics - whether diacritics must be removed
Returns:
the simplified word, or null if it was removed as a stop word or a short word
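The per-word simplification can be sketched as follows. This is a hypothetical illustration, not Nuxeo's code: the class name WordSketch, the minimum length of 3, the four-entry stop-word list and the "drop one trailing s" stemming rule are assumptions, and diacritics are stripped here with java.text.Normalizer rather than with whatever mechanism the library actually uses.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch of a parseWord-style simplification. The minimum
 * length, stop-word list and stemming rule are assumptions, not Nuxeo's.
 */
final class WordSketch {

    // Assumed values, chosen only for illustration.
    static final int MIN_SIZE = 3;
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "and", "for", "with"));

    static String parseWord(String string, boolean removeDiacritics) {
        String word = string.toLowerCase(Locale.ROOT);
        if (removeDiacritics) {
            // NFD splits an accented letter into its base letter plus combining
            // marks; the \p{M} regex then deletes those marks.
            word = Normalizer.normalize(word, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
        }
        if (word.length() < MIN_SIZE || STOP_WORDS.contains(word)) {
            return null; // removed as a short word or a stop word
        }
        // Assumed pseudo-stemming: drop one trailing 's' from longer words.
        if (word.length() > MIN_SIZE && word.endsWith("s")) {
            word = word.substring(0, word.length() - 1);
        }
        return word;
    }
}
```

With these assumptions, parseWord("Cafés", true) yields "cafe", parseWord("Cafés", false) yields "café", and parseWord("The", true) yields null. Normalizer-based stripping only covers letters that decompose under NFD; characters such as ø or ß have no canonical decomposition and pass through unchanged, which is one reason a library might instead use an explicit character-mapping table.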
-