Class FullTextUtils


  • public class FullTextUtils
    extends Object
    Utility methods for simple fulltext parsing. They do not aim to be exhaustive, but they work for simple cases.
    • Method Detail

      • parseFullText

        public static Set<String> parseFullText(String string,
                                                boolean removeDiacritics)
        Extracts the words from a string for simple fulltext indexing.

        The original word order is preserved, and duplicate words are removed.

        Short words and stop words are omitted, accents are removed when removeDiacritics is true, and pseudo-stemming is applied.

        Parameters:
        string - the string to parse
        removeDiacritics - whether diacritics should be removed
        Returns:
        an ordered set of resulting words
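The behavior described above (order kept, duplicates dropped, short and stop words omitted, optional accent removal) can be approximated with a LinkedHashSet and java.text.Normalizer. The sketch below is an illustration only, not the implementation of FullTextUtils: the class name FullTextSketch, the stop-word list, the minimum length of 3 and the tokenization rule are all assumptions, and pseudo-stemming is left out.

```java
import java.text.Normalizer;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in used for illustration; not the real FullTextUtils.
final class FullTextSketch {

    // Assumed stop-word list and minimum word length, chosen for this example only.
    private static final Set<String> STOP_WORDS = Set.of("the", "and", "for", "with");
    private static final int MIN_LENGTH = 3;

    /** Lowercases a word, optionally strips accents, and returns null if the word is dropped. */
    static String simplify(String word, boolean removeDiacritics) {
        String w = word.toLowerCase(Locale.ROOT);
        if (removeDiacritics) {
            // Decompose accented characters, then remove the combining marks.
            w = Normalizer.normalize(w, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        }
        if (w.length() < MIN_LENGTH || STOP_WORDS.contains(w)) {
            return null;
        }
        return w;
    }

    /** Splits on non-letter characters and collects the kept words in first-seen order. */
    static Set<String> extractWords(String text, boolean removeDiacritics) {
        // LinkedHashSet keeps insertion order and silently ignores duplicates.
        Set<String> words = new LinkedHashSet<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (token.isEmpty()) {
                continue; // a leading separator produces an empty first token
            }
            String w = simplify(token, removeDiacritics);
            if (w != null) {
                words.add(w);
            }
        }
        return words;
    }
}
```

With these assumed settings, extractWords("The quick caf\u00e9 and the Quick fox", true) yields the ordered set [quick, cafe, fox]: the stop words are dropped, the accent is removed, and the second "Quick" is a duplicate.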
      • parseWord

        public static String parseWord(String string,
                                       boolean removeDiacritics)
        Parses a word and returns a simplified lowercase form.
        Parameters:
        string - the word to parse
        removeDiacritics - whether diacritics should be removed
        Returns:
        the simplified word, or null if it was removed as a stop word or a short word
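Because parseWord returns null for removed words, callers need a null check before using the result. The sketch below shows one way to handle that. The helper name keepParsed is hypothetical, and the parser argument stands in for a method with the same shape as parseWord, so the example compiles without the library on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical caller-side helper; not part of FullTextUtils.
final class ParseWordUsage {

    /**
     * Applies a word parser to each token and keeps only the non-null results.
     * The parser is expected to behave like parseWord(String, boolean):
     * it returns a simplified word, or null when the word is removed.
     */
    static List<String> keepParsed(List<String> tokens,
                                   BiFunction<String, Boolean, String> parser,
                                   boolean removeDiacritics) {
        List<String> kept = new ArrayList<>();
        for (String token : tokens) {
            String word = parser.apply(token, removeDiacritics);
            if (word != null) { // null means a stop word or a short word
                kept.add(word);
            }
        }
        return kept;
    }
}
```

With the library available, the parser argument could be the method reference FullTextUtils::parseWord.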