Class BrazilianAnalyzer

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    public final class BrazilianAnalyzer
    extends org.apache.lucene.analysis.StopwordAnalyzerBase
    Analyzer for the Brazilian Portuguese language.

    Supports an external list of stopwords (words that will not be indexed at all) and an external list of exclusions (words that will be indexed but not stemmed).

    NOTE: This class uses the same Version-dependent settings as StandardAnalyzer.
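
    As a rough illustration of the difference between stopwords and stem exclusions, the following plain-Java sketch mimics the two lists without using Lucene. The class name StopwordExclusionSketch, the toyStem helper, and the sample words are assumptions made for this example; the real analyzer relies on Lucene's own filters and the Brazilian stemmer, which behave differently in detail.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; it does not use or reproduce Lucene's classes.
class StopwordExclusionSketch {

    // Hypothetical stand-in for a stemmer: strips a trailing "s" from longer terms.
    // The real Brazilian stemmer applies far more elaborate rules.
    static String toyStem(String term) {
        return term.length() > 3 && term.endsWith("s")
                ? term.substring(0, term.length() - 1)
                : term;
    }

    // Lowercases each whitespace-separated term, drops stopwords,
    // and stems every remaining term that is not in the exclusion set.
    static List<String> analyze(String text, Set<String> stopwords, Set<String> stemExclusions) {
        List<String> indexed = new ArrayList<>();
        for (String raw : text.split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // leading whitespace yields an empty first element
            }
            String term = raw.toLowerCase(Locale.ROOT);
            if (stopwords.contains(term)) {
                continue; // stopword: never indexed
            }
            // exclusion: indexed as-is; everything else: indexed in stemmed form
            indexed.add(stemExclusions.contains(term) ? term : toyStem(term));
        }
        return indexed;
    }

    public static void main(String[] args) {
        Set<String> stopwords = new HashSet<>(Arrays.asList("os", "de"));
        Set<String> exclusions = new HashSet<>(Arrays.asList("campinas"));
        // prints [livro, campinas]
        System.out.println(analyze("Os livros de Campinas", stopwords, exclusions));
    }
}
```

    In this sketch "os" and "de" never reach the index, "livros" is indexed in its stemmed form, and "campinas" is indexed unchanged because it is listed as an exclusion.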

    • Field Detail

      • DEFAULT_STOPWORD_FILE

        public static final String DEFAULT_STOPWORD_FILE
        File containing default Brazilian Portuguese stopwords.
        See Also:
        Constant Field Values
    • Constructor Detail

      • BrazilianAnalyzer

        public BrazilianAnalyzer(org.apache.lucene.util.Version matchVersion)
        Builds an analyzer with the default stop words (getDefaultStopSet()).
      • BrazilianAnalyzer

        public BrazilianAnalyzer(org.apache.lucene.util.Version matchVersion,
                                 Set<?> stopwords)
        Builds an analyzer with the given stop words.
        Parameters:
        matchVersion - Lucene compatibility version
        stopwords - a stopword set
      • BrazilianAnalyzer

        public BrazilianAnalyzer(org.apache.lucene.util.Version matchVersion,
                                 Set<?> stopwords,
                                 Set<?> stemExclusionSet)
        Builds an analyzer with the given stop words and stemming exclusion words.
        Parameters:
        matchVersion - Lucene compatibility version
        stopwords - a stopword set
        stemExclusionSet - a set of words that will be indexed but not stemmed
      • BrazilianAnalyzer

        @Deprecated
        public BrazilianAnalyzer(org.apache.lucene.util.Version matchVersion,
                                 Map<?,?> stopwords)
        Deprecated.
        Builds an analyzer with the given stop words.
    • Method Detail

      • getDefaultStopSet

        public static Set<?> getDefaultStopSet()
        Returns an unmodifiable instance of the default stop-words set.
        Returns:
        an unmodifiable instance of the default stop-words set.
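
        Because the returned set is unmodifiable, a caller who wants to extend the defaults has to copy them first. The sketch below shows that pattern with plain java.util collections. The class UnmodifiableStopSetSketch, the method defaultStopSetStandIn(), and the placeholder words are assumptions for illustration; they stand in for getDefaultStopSet() and are not the actual contents of the default stopword file.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Plain-Java illustration only; it does not call Lucene.
class UnmodifiableStopSetSketch {

    // Hypothetical stand-in for getDefaultStopSet(); the words are placeholders,
    // not the actual contents of the default stopword file.
    static Set<?> defaultStopSetStandIn() {
        return Collections.unmodifiableSet(new HashSet<>(Arrays.asList("a", "o", "de")));
    }

    // Returns true if the set rejects an attempt to add an element.
    static boolean rejectsAdd(Set<?> set) {
        try {
            @SuppressWarnings("unchecked")
            Set<Object> raw = (Set<Object>) set;
            raw.add("extra");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    // Copies the base set and adds extra entries, leaving the original untouched.
    static Set<Object> extendedCopy(Set<?> base, String... extraWords) {
        Set<Object> copy = new HashSet<>(base);
        copy.addAll(Arrays.asList(extraWords));
        return copy;
    }

    public static void main(String[] args) {
        Set<?> defaults = defaultStopSetStandIn();
        System.out.println(rejectsAdd(defaults));                     // prints true
        System.out.println(extendedCopy(defaults, "exemplo").size()); // prints 4
    }
}
```

        A copy built this way could then be passed to one of the constructors that accept a stopword set.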
      • createComponents

        protected org.apache.lucene.analysis.ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName,
                                                                                                         Reader reader)
        Creates ReusableAnalyzerBase.TokenStreamComponents used to tokenize all the text in the provided Reader.
        Specified by:
        createComponents in class org.apache.lucene.analysis.ReusableAnalyzerBase
        Returns:
        ReusableAnalyzerBase.TokenStreamComponents built from a StandardTokenizer filtered with LowerCaseFilter, StandardFilter, StopFilter, and BrazilianStemFilter.