Package org.nuxeo.common.utils
Class FullTextUtils
java.lang.Object
    org.nuxeo.common.utils.FullTextUtils

public class FullTextUtils extends Object
Functions related to simple fulltext parsing. They do not try to be exhaustive, but they work for simple cases.
Field Summary
Modifier and Type     Field
static int            MIN_SIZE
static String         STOP_WORDS
static Set<String>    stopWords
static String         UNACCENTED
static Pattern        wordPattern

Method Summary
Modifier and Type     Method and Description
static Set<String>    parseFullText(String string, boolean removeDiacritics)
                      Extracts the words from a string for simple fulltext indexing.
static String         parseWord(String string, boolean removeDiacritics)
                      Parses a word and returns a simplified lowercase form.

Field Detail
wordPattern
public static final Pattern wordPattern
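The page gives no value or description for wordPattern; judging only by its name and type, it is presumably used to split input text into words. A minimal sketch of that kind of tokenization, assuming a separator regex of whitespace and ASCII punctuation (the regex and the class WordSplitSketch are hypothetical, not the library's constant):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class WordSplitSketch {
    // Hypothetical separator: runs of whitespace or ASCII punctuation.
    // The real value of FullTextUtils.wordPattern is not documented on this page.
    static final Pattern WORD_SEPARATOR = Pattern.compile("[\\s\\p{Punct}]+");

    // Splits text into tokens, skipping the empty token that a leading separator produces.
    static List<String> split(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : WORD_SEPARATOR.split(text)) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("Hello, world! foo-bar")); // [Hello, world, foo, bar]
    }
}
```

Pattern.split drops trailing empty strings but keeps a leading one when the input starts with a separator, which is why the sketch filters out empty tokens.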

MIN_SIZE
public static final int MIN_SIZE
See Also: Constant Field Values

STOP_WORDS
public static final String STOP_WORDS
See Also: Constant Field Values
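The summary lists both a STOP_WORDS string constant and a stopWords set, which suggests the set is derived from the string. A minimal sketch, assuming the constant is a whitespace-separated list of lowercase words; the sample words, StopWordsSketch and isStopWord are hypothetical, and the real value is on the Constant Field Values page:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class StopWordsSketch {
    // Hypothetical sample value; the real STOP_WORDS constant is not reproduced here.
    static final String STOP_WORDS = "a an and are the of to";

    // Assumes whitespace-separated words and builds an unmodifiable lookup set once.
    static final Set<String> STOP_WORD_SET = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList(STOP_WORDS.split("\\s+"))));

    // Expects an already lowercased word, since the sample set is all lowercase.
    static boolean isStopWord(String lowercaseWord) {
        return STOP_WORD_SET.contains(lowercaseWord);
    }

    public static void main(String[] args) {
        System.out.println(isStopWord("the"));   // true
        System.out.println(isStopWord("index")); // false
    }
}
```

Building the set once at class initialization turns each stop-word check into a constant-time hash lookup instead of a scan of the string.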

UNACCENTED
public static final String UNACCENTED
See Also: Constant Field Values

Method Detail
parseFullText
public static Set<String> parseFullText(String string, boolean removeDiacritics)
Extracts the words from a string for simple fulltext indexing. The initial order is kept, but duplicate words are removed.
It omits short words and stop words, removes accents (when removeDiacritics is true) and does pseudo-stemming.
Parameters:
    string - the string
    removeDiacritics - whether the diacritics must be removed
Returns:
    an ordered set of resulting words
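The ordering and deduplication contract described above maps naturally onto a LinkedHashSet, which preserves insertion order and ignores repeated adds. A minimal sketch, assuming a whitespace/punctuation separator, a minimum length of 3 and a three-word stop list (all hypothetical stand-ins for wordPattern, MIN_SIZE and stopWords). It omits the removeDiacritics parameter, accent removal and pseudo-stemming to keep the focus on ordering, so it is not the library's implementation:

```java
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Pattern;

public class ParseFullTextSketch {
    // Hypothetical values; the real constants are not shown on this page.
    static final Pattern SEPARATOR = Pattern.compile("[\\s\\p{Punct}]+");
    static final int MIN_LENGTH = 3;
    static final Set<String> STOP = Set.of("the", "and", "for");

    // Simplified per-word step: lowercase, then return null for short words and stop words.
    // Accent removal and pseudo-stemming are deliberately left out of this sketch.
    static String simplify(String token) {
        String w = token.toLowerCase(Locale.ROOT);
        return (w.length() < MIN_LENGTH || STOP.contains(w)) ? null : w;
    }

    // LinkedHashSet keeps first-occurrence order and ignores later duplicates.
    static Set<String> parse(String text) {
        Set<String> words = new LinkedHashSet<>();
        for (String token : SEPARATOR.split(text)) {
            String w = simplify(token);
            if (w != null) {
                words.add(w);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(parse("The index, the Index and an indexer")); // [index, indexer]
    }
}
```

Using a LinkedHashSet rather than a HashSet is what preserves the first-occurrence order that the method promises; a plain HashSet would deduplicate but return words in hash order.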
parseWord
public static String parseWord(String string, boolean removeDiacritics)
Parses a word and returns a simplified lowercase form.
Parameters:
    string - the word
    removeDiacritics - whether the diacritics must be removed
Returns:
    the simplified word, or null if it was removed as a stop word or a short word
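The parseWord contract (a lowercase form, or null for short words and stop words) can be sketched as follows. This is a hypothetical reimplementation, not the library's code: the minimum length of 3 and the stop list are assumptions, accent removal swaps in java.text.Normalizer (the library's own approach, possibly involving the UNACCENTED constant, is not documented here), and stripping a trailing plural "s" is only a stand-in for the unspecified pseudo-stemming.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.Set;

public class ParseWordSketch {
    // Hypothetical values; the real MIN_SIZE and stop-word list are not shown on this page.
    static final int MIN_LENGTH = 3;
    static final Set<String> STOP = Set.of("the", "and", "for");

    // Returns a simplified lowercase form, or null for a short word or a stop word.
    static String simplifyWord(String word, boolean removeDiacritics) {
        String w = word.toLowerCase(Locale.ROOT);
        if (removeDiacritics) {
            // Swapped-in technique: decompose to NFD, then strip combining marks (e.g. acute accents).
            w = Normalizer.normalize(w, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
        }
        if (w.length() < MIN_LENGTH || STOP.contains(w)) {
            return null;
        }
        // Stand-in for pseudo-stemming: drop a trailing plural "s" from longer words.
        if (w.length() > MIN_LENGTH && w.endsWith("s")) {
            w = w.substring(0, w.length() - 1);
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(simplifyWord("Caf\u00e9s", true));  // cafe
        System.out.println(simplifyWord("Caf\u00e9s", false)); // accent kept, trailing s dropped
        System.out.println(simplifyWord("The", true));         // null
        System.out.println(simplifyWord("it", true));          // null
    }
}
```

Returning null rather than an empty string mirrors the documented contract, so a caller such as a fulltext parser can skip filtered words with a single null check.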