Package org.apache.lucene.analysis
Class StopFilter
- java.lang.Object
  - org.apache.lucene.util.AttributeSource
    - org.apache.lucene.analysis.TokenStream
      - org.apache.lucene.analysis.TokenFilter
        - org.apache.lucene.analysis.FilteringTokenFilter
          - org.apache.lucene.analysis.StopFilter
All Implemented Interfaces:
Closeable, AutoCloseable
public final class StopFilter extends FilteringTokenFilter
Removes stop words from a token stream.
You must specify the required Version compatibility when creating StopFilter:
- As of 3.1, StopFilter correctly handles Unicode 4.0 supplementary characters in stopwords, and position increments are preserved.
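To make "position increments are preserved" concrete, here is a minimal plain-Java sketch of the idea. It does not use Lucene; the class `StopWordPositionSketch`, the type `PositionedTerm`, and the method `filter` are hypothetical names chosen for illustration, and every input term is assumed to occupy exactly one position.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class StopWordPositionSketch {

    /** A kept term plus its distance, in positions, from the previously kept term. */
    static final class PositionedTerm {
        final String term;
        final int positionIncrement;

        PositionedTerm(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return term + "(+" + positionIncrement + ")";
        }
    }

    /**
     * Drops every term found in stopWords. When preserveIncrements is true, each
     * kept term's increment also counts the stop words removed just before it.
     */
    static List<PositionedTerm> filter(List<String> terms, Set<String> stopWords,
                                       boolean preserveIncrements) {
        List<PositionedTerm> kept = new ArrayList<>();
        int skipped = 0;
        for (String term : terms) {
            if (stopWords.contains(term)) {
                skipped++; // remember the gap left by the removed term
                continue;
            }
            int increment = preserveIncrements ? 1 + skipped : 1;
            kept.add(new PositionedTerm(term, increment));
            skipped = 0;
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("the", "a", "of"));
        List<String> terms = Arrays.asList("the", "quick", "fox", "of", "the", "north");
        System.out.println(filter(terms, stopWords, true));
        System.out.println(filter(terms, stopWords, false));
    }
}
```

With increments preserved, the kept terms carry increments 2, 1 and 3, so a phrase query can still tell that words were removed between "fox" and "north". Without them, every kept term has increment 1 and the gaps are lost.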
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource
AttributeSource.AttributeFactory, AttributeSource.State
-
-
Field Summary
-
Fields inherited from class org.apache.lucene.analysis.TokenFilter
input
-
-
Constructor Summary
Constructor and Description
- StopFilter(boolean enablePositionIncrements, TokenStream in, Set<?> stopWords)
  Deprecated. Use StopFilter(Version, TokenStream, Set) instead.
- StopFilter(boolean enablePositionIncrements, TokenStream input, Set<?> stopWords, boolean ignoreCase)
  Deprecated. Use StopFilter(Version, TokenStream, Set) instead.
- StopFilter(Version matchVersion, TokenStream in, Set<?> stopWords)
  Constructs a filter which removes words from the input TokenStream that are named in the Set.
- StopFilter(Version matchVersion, TokenStream input, Set<?> stopWords, boolean ignoreCase)
  Deprecated. Use StopFilter(Version, TokenStream, Set) instead.
-
Method Summary
Modifier and Type, Method, and Description
- protected boolean accept()
  Returns the next input Token whose term() is not a stop word.
- static boolean getEnablePositionIncrementsVersionDefault(Version matchVersion)
  Deprecated. Use StopFilter(Version, TokenStream, Set) instead.
- static Set<Object> makeStopSet(String... stopWords)
  Deprecated. Use makeStopSet(Version, String...) instead.
- static Set<Object> makeStopSet(String[] stopWords, boolean ignoreCase)
  Deprecated. Use makeStopSet(Version, String[], boolean) instead.
- static Set<Object> makeStopSet(List<?> stopWords)
  Deprecated. Use makeStopSet(Version, List) instead.
- static Set<Object> makeStopSet(List<?> stopWords, boolean ignoreCase)
  Deprecated. Use makeStopSet(Version, List, boolean) instead.
- static Set<Object> makeStopSet(Version matchVersion, String... stopWords)
  Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor.
- static Set<Object> makeStopSet(Version matchVersion, String[] stopWords, boolean ignoreCase)
  Creates a stopword set from the given stopword array.
- static Set<Object> makeStopSet(Version matchVersion, List<?> stopWords)
  Builds a Set from a list of stop words, appropriate for passing into the StopFilter constructor.
- static Set<Object> makeStopSet(Version matchVersion, List<?> stopWords, boolean ignoreCase)
  Creates a stopword set from the given stopword list.
-
Methods inherited from class org.apache.lucene.analysis.FilteringTokenFilter
getEnablePositionIncrements, incrementToken, reset, setEnablePositionIncrements
-
Methods inherited from class org.apache.lucene.analysis.TokenFilter
close, end
-
Methods inherited from class org.apache.lucene.util.AttributeSource
addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
-
Constructor Detail
-
StopFilter
@Deprecated public StopFilter(boolean enablePositionIncrements, TokenStream input, Set<?> stopWords, boolean ignoreCase)
Deprecated. Use StopFilter(Version, TokenStream, Set) instead.
Construct a token stream filtering the given input. If stopWords is an instance of CharArraySet (true if makeStopSet() was used to construct the set), it is used directly and ignoreCase is ignored, since CharArraySet directly controls case sensitivity. If stopWords is not an instance of CharArraySet, a new CharArraySet is constructed and ignoreCase specifies the case sensitivity of that set.
Parameters:
enablePositionIncrements - true if token positions should record the removed stop words
input - Input TokenStream
stopWords - A Set of Strings or char[] or any other toString()-able set representing the stopwords
ignoreCase - if true, all words are lower cased first
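The rule for when ignoreCase takes effect can be sketched in plain Java. This is an illustration under stated assumptions, not Lucene's implementation: `CaseAwareSet` is a hypothetical stand-in for CharArraySet, and `select` is a hypothetical helper that mirrors the decision described above.

```java
import java.util.AbstractSet;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Locale;
import java.util.Set;

class StopSetSelectionSketch {

    /** Hypothetical stand-in for CharArraySet: a set that owns its case-sensitivity setting. */
    static final class CaseAwareSet extends AbstractSet<String> {
        private final Set<String> words = new HashSet<>();
        private final boolean ignoreCase;

        CaseAwareSet(Collection<?> source, boolean ignoreCase) {
            this.ignoreCase = ignoreCase;
            for (Object word : source) {
                words.add(normalize(word.toString())); // any toString()-able element works
            }
        }

        @Override
        public boolean contains(Object term) {
            return term != null && words.contains(normalize(term.toString()));
        }

        @Override
        public Iterator<String> iterator() {
            return words.iterator();
        }

        @Override
        public int size() {
            return words.size();
        }

        private String normalize(String s) {
            return ignoreCase ? s.toLowerCase(Locale.ROOT) : s;
        }
    }

    /** Reuses a CaseAwareSet as-is; otherwise builds a new one that honors ignoreCase. */
    static CaseAwareSet select(Set<?> stopWords, boolean ignoreCase) {
        if (stopWords instanceof CaseAwareSet) {
            return (CaseAwareSet) stopWords; // the ignoreCase argument has no effect here
        }
        return new CaseAwareSet(stopWords, ignoreCase);
    }

    public static void main(String[] args) {
        Set<String> plain = new HashSet<>(Arrays.asList("The", "And"));

        // Plain set: a new case-insensitive set is built, so "the" matches "The".
        System.out.println(select(plain, true).contains("the"));

        // Pre-built case-sensitive set: reused directly, so ignoreCase=true changes nothing.
        CaseAwareSet caseSensitive = new CaseAwareSet(plain, false);
        CaseAwareSet reused = select(caseSensitive, true);
        System.out.println(reused == caseSensitive);
        System.out.println(reused.contains("the"));
    }
}
```

The practical consequence is that case sensitivity should be decided when the set is built, not when the filter is constructed.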
-
StopFilter
@Deprecated public StopFilter(Version matchVersion, TokenStream input, Set<?> stopWords, boolean ignoreCase)
Deprecated. Use StopFilter(Version, TokenStream, Set) instead.
Construct a token stream filtering the given input. If stopWords is an instance of CharArraySet (true if makeStopSet() was used to construct the set), it is used directly and ignoreCase is ignored, since CharArraySet directly controls case sensitivity. If stopWords is not an instance of CharArraySet, a new CharArraySet is constructed and ignoreCase specifies the case sensitivity of that set.
Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the stop set if Version > 3.0. See above for details.
input - Input TokenStream
stopWords - A Set of Strings or char[] or any other toString()-able set representing the stopwords
ignoreCase - if true, all words are lower cased first
-
StopFilter
@Deprecated public StopFilter(boolean enablePositionIncrements, TokenStream in, Set<?> stopWords)
Deprecated. Use StopFilter(Version, TokenStream, Set) instead.
Constructs a filter which removes words from the input TokenStream that are named in the Set.
Parameters:
enablePositionIncrements - true if token positions should record the removed stop words
in - Input stream
stopWords - A Set of Strings or char[] or any other toString()-able set representing the stopwords
See Also:
makeStopSet(Version, java.lang.String[])
-
StopFilter
public StopFilter(Version matchVersion, TokenStream in, Set<?> stopWords)
Constructs a filter which removes words from the input TokenStream that are named in the Set.
Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the stop set if Version > 3.0. See above for details.
in - Input stream
stopWords - A Set of Strings or char[] or any other toString()-able set representing the stopwords
See Also:
makeStopSet(Version, java.lang.String[])
-
-
Method Detail
-
makeStopSet
@Deprecated public static final Set<Object> makeStopSet(String... stopWords)
Deprecated. Use makeStopSet(Version, String...) instead.
Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor. This lets the stop word set be built once and cached when an Analyzer is constructed.
See Also:
the makeStopSet overload that takes ignoreCase, passing false to ignoreCase
-
makeStopSet
public static final Set<Object> makeStopSet(Version matchVersion, String... stopWords)
Builds a Set from an array of stop words, appropriate for passing into the StopFilter constructor. This lets the stop word set be built once and cached when an Analyzer is constructed.
Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
stopWords - An array of stopwords
See Also:
the makeStopSet overload that takes ignoreCase, passing false to ignoreCase
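The caching remark can be illustrated with a small plain-Java sketch. Nothing here is Lucene API: `buildStopSet` merely plays the role of makeStopSet, and `SimpleAnalyzer` is a hypothetical analyzer-like class that splits on whitespace. The `buildCount` counter exists only to make the single construction observable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CachedStopSetSketch {

    /** Counts set constructions so the caching effect is observable. */
    static int buildCount = 0;

    /** Hypothetical helper playing the role of makeStopSet: builds an unmodifiable set. */
    static Set<String> buildStopSet(String... stopWords) {
        buildCount++;
        return Collections.unmodifiableSet(new HashSet<>(Arrays.asList(stopWords)));
    }

    /** Hypothetical analyzer-like class: the stop set is built once in the constructor. */
    static final class SimpleAnalyzer {
        private final Set<String> stopWords;

        SimpleAnalyzer(String... stopWords) {
            this.stopWords = buildStopSet(stopWords); // built once per analyzer, then reused
        }

        /** Splits on whitespace and drops stop words, reusing the cached set on every call. */
        List<String> analyze(String text) {
            List<String> kept = new ArrayList<>();
            for (String term : text.split("\\s+")) {
                if (!term.isEmpty() && !stopWords.contains(term)) {
                    kept.add(term);
                }
            }
            return kept;
        }
    }

    public static void main(String[] args) {
        SimpleAnalyzer analyzer = new SimpleAnalyzer("the", "a");
        System.out.println(analyzer.analyze("the cat chased a mouse"));
        System.out.println(analyzer.analyze("a dog"));
        System.out.println("stop sets built: " + buildCount);
    }
}
```

However many texts the analyzer processes, the set is constructed only once, which is the reason to build it up front rather than inside each filter invocation.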
-
makeStopSet
@Deprecated public static final Set<Object> makeStopSet(List<?> stopWords)
Deprecated. Use makeStopSet(Version, List) instead.
Builds a Set from a list of stop words, appropriate for passing into the StopFilter constructor. This lets the stop word set be built once and cached when an Analyzer is constructed.
Parameters:
stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
Returns:
A Set (CharArraySet) containing the words
See Also:
the makeStopSet overload that takes ignoreCase, passing false to ignoreCase
-
makeStopSet
public static final Set<Object> makeStopSet(Version matchVersion, List<?> stopWords)
Builds a Set from a list of stop words, appropriate for passing into the StopFilter constructor. This lets the stop word set be built once and cached when an Analyzer is constructed.
Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
Returns:
A Set (CharArraySet) containing the words
See Also:
the makeStopSet overload that takes ignoreCase, passing false to ignoreCase
-
makeStopSet
@Deprecated public static final Set<Object> makeStopSet(String[] stopWords, boolean ignoreCase)
Deprecated. Use makeStopSet(Version, String[], boolean) instead.
Creates a stopword set from the given stopword array.
Parameters:
stopWords - An array of stopwords
ignoreCase - If true, all words are lower cased first.
Returns:
a Set containing the words
-
makeStopSet
public static final Set<Object> makeStopSet(Version matchVersion, String[] stopWords, boolean ignoreCase)
Creates a stopword set from the given stopword array.
Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
stopWords - An array of stopwords
ignoreCase - If true, all words are lower cased first.
Returns:
a Set containing the words
-
makeStopSet
@Deprecated public static final Set<Object> makeStopSet(List<?> stopWords, boolean ignoreCase)
Deprecated. Use makeStopSet(Version, List, boolean) instead.
Creates a stopword set from the given stopword list.
Parameters:
stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
ignoreCase - if true, all words are lower cased first
Returns:
A Set (CharArraySet) containing the words
-
makeStopSet
public static final Set<Object> makeStopSet(Version matchVersion, List<?> stopWords, boolean ignoreCase)
Creates a stopword set from the given stopword list.
Parameters:
matchVersion - Lucene version to enable correct Unicode 4.0 behavior in the returned set if Version > 3.0
stopWords - A List of Strings or char[] or any other toString()-able list representing the stopwords
ignoreCase - if true, all words are lower cased first
Returns:
A Set (CharArraySet) containing the words
-
accept
protected boolean accept() throws IOException
Returns the next input Token whose term() is not a stop word.
Specified by:
accept in class FilteringTokenFilter
Throws:
IOException
-
getEnablePositionIncrementsVersionDefault
@Deprecated public static boolean getEnablePositionIncrementsVersionDefault(Version matchVersion)
Deprecated. Use StopFilter(Version, TokenStream, Set) instead.
Returns the version-dependent default for enablePositionIncrements. Analyzers that embed StopFilter use this method when creating the StopFilter. For versions prior to 2.9, this returns false; for 2.9 or later, it returns true.
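The version rule amounts to a single comparison. The sketch below uses a hypothetical, abbreviated `Release` enum declared in ascending order. It is not Lucene's Version class, and the method name is likewise illustrative.

```java
class PositionIncrementDefaultSketch {

    /** Hypothetical, abbreviated release list in ascending order; not Lucene's Version. */
    enum Release { V2_4, V2_9, V3_0, V3_1 }

    /** Returns false before 2.9 and true for 2.9 or later, following the rule stated above. */
    static boolean defaultEnablePositionIncrements(Release matchVersion) {
        // Enum constants compare by declaration order, so this is "matchVersion >= 2.9".
        return matchVersion.compareTo(Release.V2_9) >= 0;
    }

    public static void main(String[] args) {
        for (Release release : Release.values()) {
            System.out.println(release + " -> " + defaultEnablePositionIncrements(release));
        }
    }
}
```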
-
-