Uses of Class
org.apache.lucene.analysis.TokenFilter
Packages that use TokenFilter
Package  Description
org.apache.lucene.analysis
    API and code to convert text into indexable/searchable tokens.
org.apache.lucene.analysis.ar
    Analyzer for Arabic.
org.apache.lucene.analysis.bg
    Analyzer for Bulgarian.
org.apache.lucene.analysis.br
    Analyzer for Brazilian Portuguese.
org.apache.lucene.analysis.cjk
    Analyzer for Chinese, Japanese, and Korean, which indexes bigrams (overlapping groups of two adjacent Han characters).
org.apache.lucene.analysis.cn
    Analyzer for Chinese, which indexes unigrams (individual Chinese characters).
org.apache.lucene.analysis.cn.smart
    Analyzer for Simplified Chinese, which indexes words.
org.apache.lucene.analysis.compound
    A filter that decomposes the compound words found in many Germanic languages into their word parts.
org.apache.lucene.analysis.cz
    Analyzer for Czech.
org.apache.lucene.analysis.de
    Analyzer for German.
org.apache.lucene.analysis.el
    Analyzer for Greek.
org.apache.lucene.analysis.en
    Analyzer for English.
org.apache.lucene.analysis.es
    Analyzer for Spanish.
org.apache.lucene.analysis.fa
    Analyzer for Persian.
org.apache.lucene.analysis.fi
    Analyzer for Finnish.
org.apache.lucene.analysis.fr
    Analyzer for French.
org.apache.lucene.analysis.ga
    Analysis for Irish.
org.apache.lucene.analysis.gl
    Analyzer for Galician.
org.apache.lucene.analysis.hi
    Analyzer for Hindi.
org.apache.lucene.analysis.hu
    Analyzer for Hungarian.
org.apache.lucene.analysis.hunspell
    Stemming TokenFilter using a Java implementation of the Hunspell stemming algorithm.
org.apache.lucene.analysis.icu
    Analysis components based on ICU.
org.apache.lucene.analysis.id
    Analyzer for Indonesian.
org.apache.lucene.analysis.in
    Analysis components for Indian languages.
org.apache.lucene.analysis.it
    Analyzer for Italian.
org.apache.lucene.analysis.ja
    Analyzer for Japanese.
org.apache.lucene.analysis.lv
    Analyzer for Latvian.
org.apache.lucene.analysis.miscellaneous
    Miscellaneous TokenStreams.
org.apache.lucene.analysis.ngram
    Character n-gram tokenizers and filters.
org.apache.lucene.analysis.nl
    Analyzer for Dutch.
org.apache.lucene.analysis.no
    Analyzer for Norwegian.
org.apache.lucene.analysis.payloads
    Provides various convenience classes for creating payloads on Tokens.
org.apache.lucene.analysis.phonetic
    Analysis components for phonetic search.
org.apache.lucene.analysis.position
    Filter for assigning position increments.
org.apache.lucene.analysis.pt
    Analyzer for Portuguese.
org.apache.lucene.analysis.reverse
    Filter to reverse token text.
org.apache.lucene.analysis.ru
    Analyzer for Russian.
org.apache.lucene.analysis.shingle
    Word n-gram filters.
org.apache.lucene.analysis.snowball
    TokenFilter and Analyzer implementations that use Snowball stemmers.
org.apache.lucene.analysis.standard
    Standards-based analyzers implemented with JFlex.
org.apache.lucene.analysis.stempel
    Stempel: Algorithmic Stemmer.
org.apache.lucene.analysis.sv
    Analyzer for Swedish.
org.apache.lucene.analysis.synonym
    Analysis components for Synonyms.
org.apache.lucene.analysis.th
    Analyzer for Thai.
org.apache.lucene.analysis.tr
    Analyzer for Turkish.
org.apache.lucene.collation
    CollationKeyFilter converts each token into its binary CollationKey using the provided Collator, and then encodes the CollationKey as a String using IndexableBinaryStringTools, to allow it to be stored as an index term.
org.apache.lucene.facet.enhancements
    Enhanced category features.
org.apache.lucene.facet.enhancements.association
    Association category enhancements.
org.apache.lucene.facet.index.streaming
    Expert: attributes streaming definition for indexing facets.
org.apache.lucene.queryParser
    A simple query parser implemented with JavaCC.
org.apache.lucene.search.highlight
    The highlight package contains classes to provide "keyword in context" features typically used to highlight search terms in the text of results pages.

Uses of TokenFilter in org.apache.lucene.analysis
Subclasses of TokenFilter in org.apache.lucene.analysis
Modifier and Type  Class  Description
class ASCIIFoldingFilter
    This class converts alphabetic, numeric, and symbolic Unicode characters which are not in the first 127 ASCII characters (the "Basic Latin" Unicode block) into their ASCII equivalents, if one exists.
class CachingTokenFilter
    This class can be used if the token attributes of a TokenStream are intended to be consumed more than once.
class FilteringTokenFilter
    Abstract base class for TokenFilters that may remove tokens.
class ISOLatin1AccentFilter
    Deprecated. If you build a new index, use ASCIIFoldingFilter, which covers a superset of Latin 1.
class KeywordMarkerFilter
    Marks terms as keywords via the KeywordAttribute.
class LengthFilter
    Removes words that are too long or too short from the stream.
class LimitTokenCountFilter
    This TokenFilter limits the number of tokens while indexing.
class LookaheadTokenFilter<T extends LookaheadTokenFilter.Position>
    An abstract TokenFilter to make it easier to build graph token filters requiring some lookahead.
class LowerCaseFilter
    Normalizes token text to lower case.
class MockFixedLengthPayloadFilter
    TokenFilter that adds random fixed-length payloads.
class MockGraphTokenFilter
    Randomly inserts overlapped (posInc=0) tokens with posLength sometimes > 1.
class MockHoleInjectingTokenFilter
class MockRandomLookaheadTokenFilter
    Uses LookaheadTokenFilter to randomly peek at future tokens.
class MockVariableLengthPayloadFilter
    TokenFilter that adds random variable-length payloads.
class PorterStemFilter
    Transforms the token stream as per the Porter stemming algorithm.
class StopFilter
    Removes stop words from a token stream.
class TeeSinkTokenFilter
    This TokenFilter provides the ability to set aside attribute states that have already been analyzed.
class TypeTokenFilter
    Removes tokens whose types appear in a set of blocked types from a token stream.
class ValidatingTokenFilter
    A TokenFilter that checks consistency of the tokens (e.g., offsets are consistent with one another).

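The filters listed above are typically chained, each one wrapping the stream produced by the previous one. The following is a plain-Java sketch of that wrapping idea only; it does not use the Lucene API. The class `FilterChainSketch`, its `DroppingFilter` base class, and the helper methods are hypothetical names introduced for illustration, and tokens are modeled as plain strings rather than attribute-based token streams.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical stand-in for a chain of token filters; NOT the Lucene API.
final class FilterChainSketch {

    /** Wraps an upstream token source and drops tokens that fail accept(). */
    abstract static class DroppingFilter implements Iterator<String> {
        private final Iterator<String> input;
        private String pending; // next accepted token, or null if none fetched yet

        DroppingFilter(Iterator<String> input) {
            this.input = input;
        }

        protected abstract boolean accept(String token);

        @Override
        public boolean hasNext() {
            while (pending == null && input.hasNext()) {
                String candidate = input.next();
                if (accept(candidate)) {
                    pending = candidate;
                }
            }
            return pending != null;
        }

        @Override
        public String next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            String token = pending;
            pending = null;
            return token;
        }
    }

    /** Lower-cases every token (compare LowerCaseFilter). */
    static Iterator<String> lowerCase(final Iterator<String> input) {
        return new Iterator<String>() {
            @Override
            public boolean hasNext() {
                return input.hasNext();
            }

            @Override
            public String next() {
                return input.next().toLowerCase(Locale.ROOT);
            }
        };
    }

    /** Drops tokens contained in stopWords (compare StopFilter). */
    static Iterator<String> stop(Iterator<String> input, final Set<String> stopWords) {
        return new DroppingFilter(input) {
            @Override
            protected boolean accept(String token) {
                return !stopWords.contains(token);
            }
        };
    }

    /** Drops tokens shorter than min or longer than max (compare LengthFilter). */
    static Iterator<String> length(Iterator<String> input, final int min, final int max) {
        return new DroppingFilter(input) {
            @Override
            protected boolean accept(String token) {
                return token.length() >= min && token.length() <= max;
            }
        };
    }

    /** Runs the chain: lower-case, then stop-word removal, then length filtering. */
    static List<String> analyze(List<String> tokens) {
        Set<String> stopWords = new HashSet<String>(Arrays.asList("the", "a"));
        Iterator<String> chain = length(stop(lowerCase(tokens.iterator()), stopWords), 2, 10);
        List<String> out = new ArrayList<String>();
        while (chain.hasNext()) {
            out.add(chain.next());
        }
        return out;
    }
}
```

Order matters in such a chain: lower-casing runs before stop-word removal here so that "The" matches the lower-case stop word "the".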
Uses of TokenFilter in org.apache.lucene.analysis.ar
Subclasses of TokenFilter in org.apache.lucene.analysis.ar
Modifier and Type  Class  Description
class ArabicNormalizationFilter
    A TokenFilter that applies ArabicNormalizer to normalize the orthography.
class ArabicStemFilter
    A TokenFilter that applies ArabicStemmer to stem Arabic words.

Uses of TokenFilter in org.apache.lucene.analysis.bg
Subclasses of TokenFilter in org.apache.lucene.analysis.bg
Modifier and Type  Class  Description
class BulgarianStemFilter
    A TokenFilter that applies BulgarianStemmer to stem Bulgarian words.

Uses of TokenFilter in org.apache.lucene.analysis.br
Subclasses of TokenFilter in org.apache.lucene.analysis.br
Modifier and Type  Class  Description
class BrazilianStemFilter
    A TokenFilter that applies BrazilianStemmer.

Uses of TokenFilter in org.apache.lucene.analysis.cjk
Subclasses of TokenFilter in org.apache.lucene.analysis.cjk
Modifier and Type  Class  Description
class CJKBigramFilter
    Forms bigrams of CJK terms that are generated from StandardTokenizer or ICUTokenizer.
class CJKWidthFilter
    A TokenFilter that normalizes CJK width differences: folds fullwidth ASCII variants into the equivalent Basic Latin, and folds halfwidth Katakana variants into the equivalent kana.

Uses of TokenFilter in org.apache.lucene.analysis.cn
Subclasses of TokenFilter in org.apache.lucene.analysis.cn
Modifier and Type  Class  Description
class ChineseFilter
    Deprecated. Use StopFilter instead, which has the same functionality.

Uses of TokenFilter in org.apache.lucene.analysis.cn.smart
Subclasses of TokenFilter in org.apache.lucene.analysis.cn.smart
Modifier and Type  Class  Description
class WordTokenFilter
    A TokenFilter that breaks sentences into words.

Uses of TokenFilter in org.apache.lucene.analysis.compound
Subclasses of TokenFilter in org.apache.lucene.analysis.compound
Modifier and Type  Class  Description
class CompoundWordTokenFilterBase
    Base class for decomposition token filters.
class DictionaryCompoundWordTokenFilter
    A TokenFilter that decomposes compound words found in many Germanic languages.
class HyphenationCompoundWordTokenFilter
    A TokenFilter that decomposes compound words found in many Germanic languages.

Uses of TokenFilter in org.apache.lucene.analysis.cz
Subclasses of TokenFilter in org.apache.lucene.analysis.cz
Modifier and Type  Class  Description
class CzechStemFilter
    A TokenFilter that applies CzechStemmer to stem Czech words.

Uses of TokenFilter in org.apache.lucene.analysis.de
Subclasses of TokenFilter in org.apache.lucene.analysis.de
Modifier and Type  Class  Description
class GermanLightStemFilter
    A TokenFilter that applies GermanLightStemmer to stem German words.
class GermanMinimalStemFilter
    A TokenFilter that applies GermanMinimalStemmer to stem German words.
class GermanNormalizationFilter
    Normalizes German characters according to the heuristics of the German2 snowball algorithm.
class GermanStemFilter
    A TokenFilter that stems German words.

Uses of TokenFilter in org.apache.lucene.analysis.el
Subclasses of TokenFilter in org.apache.lucene.analysis.el
Modifier and Type  Class  Description
class GreekLowerCaseFilter
    Normalizes token text to lower case, removes some Greek diacritics, and standardizes final sigma to sigma.
class GreekStemFilter
    A TokenFilter that applies GreekStemmer to stem Greek words.

Uses of TokenFilter in org.apache.lucene.analysis.en
Subclasses of TokenFilter in org.apache.lucene.analysis.en
Modifier and Type  Class  Description
class EnglishMinimalStemFilter
    A TokenFilter that applies EnglishMinimalStemmer to stem English words.
class EnglishPossessiveFilter
    TokenFilter that removes possessives (trailing 's) from words.
class KStemFilter
    A high-performance kstem filter for English.

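As an illustration of what removing a trailing 's means for a single term, here is a minimal plain-Java sketch. It is not the EnglishPossessiveFilter implementation; the class and method names are hypothetical, and it assumes only the ASCII apostrophe needs handling.

```java
// Hypothetical illustration of possessive stripping; NOT the Lucene implementation.
final class PossessiveSketch {

    /** Returns term without a trailing 's or 'S; other terms are returned unchanged. */
    static String stripPossessive(String term) {
        int n = term.length();
        if (n >= 2
                && term.charAt(n - 2) == '\''
                && (term.charAt(n - 1) == 's' || term.charAt(n - 1) == 'S')) {
            return term.substring(0, n - 2);
        }
        return term;
    }
}
```

For example, `PossessiveSketch.stripPossessive("dog's")` yields `dog`, while `dogs` is left as is.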
Uses of TokenFilter in org.apache.lucene.analysis.es
Subclasses of TokenFilter in org.apache.lucene.analysis.es
Modifier and Type  Class  Description
class SpanishLightStemFilter
    A TokenFilter that applies SpanishLightStemmer to stem Spanish words.

Uses of TokenFilter in org.apache.lucene.analysis.fa
Subclasses of TokenFilter in org.apache.lucene.analysis.fa
Modifier and Type  Class  Description
class PersianNormalizationFilter
    A TokenFilter that applies PersianNormalizer to normalize the orthography.

Uses of TokenFilter in org.apache.lucene.analysis.fi
Subclasses of TokenFilter in org.apache.lucene.analysis.fi
Modifier and Type  Class  Description
class FinnishLightStemFilter
    A TokenFilter that applies FinnishLightStemmer to stem Finnish words.

Uses of TokenFilter in org.apache.lucene.analysis.fr
Subclasses of TokenFilter in org.apache.lucene.analysis.fr
Modifier and Type  Class  Description
class ElisionFilter
    Removes elisions from a TokenStream.
class FrenchLightStemFilter
    A TokenFilter that applies FrenchLightStemmer to stem French words.
class FrenchMinimalStemFilter
    A TokenFilter that applies FrenchMinimalStemmer to stem French words.
class FrenchStemFilter
    Deprecated. Use SnowballFilter with FrenchStemmer instead, which has the same functionality.

Uses of TokenFilter in org.apache.lucene.analysis.ga
Subclasses of TokenFilter in org.apache.lucene.analysis.ga
Modifier and Type  Class  Description
class IrishLowerCaseFilter
    Normalises token text to lower case, handling t-prothesis and n-eclipsis (i.e., 'nAthair' should become 'n-athair').

Uses of TokenFilter in org.apache.lucene.analysis.gl
Subclasses of TokenFilter in org.apache.lucene.analysis.gl
Modifier and Type  Class  Description
class GalicianMinimalStemFilter
    A TokenFilter that applies GalicianMinimalStemmer to stem Galician words.
class GalicianStemFilter
    A TokenFilter that applies GalicianStemmer to stem Galician words.

Uses of TokenFilter in org.apache.lucene.analysis.hi
Subclasses of TokenFilter in org.apache.lucene.analysis.hi
Modifier and Type  Class  Description
class HindiNormalizationFilter
    A TokenFilter that applies HindiNormalizer to normalize the orthography.
class HindiStemFilter
    A TokenFilter that applies HindiStemmer to stem Hindi words.

Uses of TokenFilter in org.apache.lucene.analysis.hu
Subclasses of TokenFilter in org.apache.lucene.analysis.hu
Modifier and Type  Class  Description
class HungarianLightStemFilter
    A TokenFilter that applies HungarianLightStemmer to stem Hungarian words.

Uses of TokenFilter in org.apache.lucene.analysis.hunspell
Subclasses of TokenFilter in org.apache.lucene.analysis.hunspell
Modifier and Type  Class  Description
class HunspellStemFilter
    TokenFilter that uses hunspell affix rules and words to stem tokens.

Uses of TokenFilter in org.apache.lucene.analysis.icu
Subclasses of TokenFilter in org.apache.lucene.analysis.icu
Modifier and Type  Class  Description
class ICUFoldingFilter
    A TokenFilter that applies search term folding to Unicode text, applying foldings from UTR#30 Character Foldings.
class ICUNormalizer2Filter
    Normalize token text with ICU's Normalizer2.
class ICUTransformFilter
    A TokenFilter that transforms text with ICU.

Uses of TokenFilter in org.apache.lucene.analysis.id
Subclasses of TokenFilter in org.apache.lucene.analysis.id
Modifier and Type  Class  Description
class IndonesianStemFilter
    A TokenFilter that applies IndonesianStemmer to stem Indonesian words.

Uses of TokenFilter in org.apache.lucene.analysis.in
Subclasses of TokenFilter in org.apache.lucene.analysis.in
Modifier and Type  Class  Description
class IndicNormalizationFilter
    A TokenFilter that applies IndicNormalizer to normalize text in Indian Languages.

Uses of TokenFilter in org.apache.lucene.analysis.it
Subclasses of TokenFilter in org.apache.lucene.analysis.it
Modifier and Type  Class  Description
class ItalianLightStemFilter
    A TokenFilter that applies ItalianLightStemmer to stem Italian words.

Uses of TokenFilter in org.apache.lucene.analysis.ja
Subclasses of TokenFilter in org.apache.lucene.analysis.ja
Modifier and Type  Class  Description
class JapaneseBaseFormFilter
    Replaces term text with the BaseFormAttribute.
class JapaneseKatakanaStemFilter
    A TokenFilter that normalizes common katakana spelling variations ending in a long sound character by removing this character (U+30FC).
class JapanesePartOfSpeechStopFilter
    Removes tokens that match a set of part-of-speech tags.
class JapaneseReadingFormFilter
    A TokenFilter that replaces the term attribute with the reading of a token in either katakana or romaji form.

Uses of TokenFilter in org.apache.lucene.analysis.lv
Subclasses of TokenFilter in org.apache.lucene.analysis.lv
Modifier and Type  Class  Description
class LatvianStemFilter
    A TokenFilter that applies LatvianStemmer to stem Latvian words.

Uses of TokenFilter in org.apache.lucene.analysis.miscellaneous
Subclasses of TokenFilter in org.apache.lucene.analysis.miscellaneous
Modifier and Type  Class  Description
class StemmerOverrideFilter
    Provides the ability to override any KeywordAttribute-aware stemmer with custom dictionary-based stemming.

Uses of TokenFilter in org.apache.lucene.analysis.ngram
Subclasses of TokenFilter in org.apache.lucene.analysis.ngram
Modifier and Type  Class  Description
class EdgeNGramTokenFilter
    Tokenizes the given token into n-grams of given size(s).
class NGramTokenFilter
    Tokenizes the input into n-grams of the given size(s).

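To make "n-grams of the given size(s)" concrete, the sketch below computes character n-grams for one term in plain Java. It is an illustration only, not the Lucene implementation: the class and method names are hypothetical, the emission order (all grams of the smallest size first) is an assumption, and the edge variant here only anchors grams at the front of the term.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of character n-grams; NOT the Lucene implementation.
final class NGramSketch {

    /** All substrings of term with length between minGram and maxGram (inclusive). */
    static List<String> nGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<String>();
        for (int size = minGram; size <= maxGram; size++) {
            for (int start = 0; start + size <= term.length(); start++) {
                grams.add(term.substring(start, start + size));
            }
        }
        return grams;
    }

    /** Only the n-grams anchored at the start of term (front edge n-grams). */
    static List<String> frontEdgeNGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<String>();
        for (int size = minGram; size <= maxGram && size <= term.length(); size++) {
            grams.add(term.substring(0, size));
        }
        return grams;
    }
}
```

With these definitions, `nGrams("abcd", 2, 3)` gives `[ab, bc, cd, abc, bcd]` and `frontEdgeNGrams("abcd", 1, 3)` gives `[a, ab, abc]`.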
Uses of TokenFilter in org.apache.lucene.analysis.nl
Subclasses of TokenFilter in org.apache.lucene.analysis.nl
Modifier and Type  Class  Description
class DutchStemFilter
    Deprecated. Use SnowballFilter with DutchStemmer instead, which has the same functionality.

Uses of TokenFilter in org.apache.lucene.analysis.no
Subclasses of TokenFilter in org.apache.lucene.analysis.no
Modifier and Type  Class  Description
class NorwegianLightStemFilter
    A TokenFilter that applies NorwegianLightStemmer to stem Norwegian words.
class NorwegianMinimalStemFilter
    A TokenFilter that applies NorwegianMinimalStemmer to stem Norwegian words.

Uses of TokenFilter in org.apache.lucene.analysis.payloads
Subclasses of TokenFilter in org.apache.lucene.analysis.payloads
Modifier and Type  Class  Description
class DelimitedPayloadTokenFilter
    Characters before the delimiter are the "token", those after are the payload.
class NumericPayloadTokenFilter
    Assigns a payload to a token based on the Token.type().
class TokenOffsetPayloadTokenFilter
    Adds the Token.setStartOffset(int) and Token.setEndOffset(int). First 4 bytes are the start
class TypeAsPayloadTokenFilter
    Makes the Token.type() a payload.

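The DelimitedPayloadTokenFilter description ("characters before the delimiter are the token, those after are the payload") can be pictured with the plain-Java sketch below. It is not the Lucene implementation: the class and method names are hypothetical, the payload is kept as a raw string rather than encoded bytes, and splitting on the first occurrence of the delimiter is an assumption.

```java
// Hypothetical illustration of delimiter-based payload splitting; NOT the Lucene implementation.
final class DelimitedPayloadSketch {

    /**
     * Splits raw at the first delimiter.
     * Returns {token, payload}; payload is null when raw contains no delimiter.
     */
    static String[] split(String raw, char delimiter) {
        int idx = raw.indexOf(delimiter);
        if (idx < 0) {
            return new String[] { raw, null };
        }
        return new String[] { raw.substring(0, idx), raw.substring(idx + 1) };
    }
}
```

For the input `fox|2.5` with delimiter `|`, the token part is `fox` and the payload part is `2.5`.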
Uses of TokenFilter in org.apache.lucene.analysis.phonetic
Subclasses of TokenFilter in org.apache.lucene.analysis.phonetic
Modifier and Type  Class  Description
class BeiderMorseFilter
    TokenFilter for Beider-Morse phonetic encoding.
class DoubleMetaphoneFilter
    Filter for DoubleMetaphone (supporting secondary codes).
class PhoneticFilter
    Create tokens for phonetic matches.

Uses of TokenFilter in org.apache.lucene.analysis.position
Subclasses of TokenFilter in org.apache.lucene.analysis.position
Modifier and Type  Class  Description
class PositionFilter
    Sets the positionIncrement of all tokens to the configured "positionIncrement", except the first returned token, which retains its original positionIncrement value.

Uses of TokenFilter in org.apache.lucene.analysis.pt
Subclasses of TokenFilter in org.apache.lucene.analysis.pt
Modifier and Type  Class  Description
class PortugueseLightStemFilter
    A TokenFilter that applies PortugueseLightStemmer to stem Portuguese words.
class PortugueseMinimalStemFilter
    A TokenFilter that applies PortugueseMinimalStemmer to stem Portuguese words.
class PortugueseStemFilter
    A TokenFilter that applies PortugueseStemmer to stem Portuguese words.

Uses of TokenFilter in org.apache.lucene.analysis.reverse
Subclasses of TokenFilter in org.apache.lucene.analysis.reverse
Modifier and Type  Class  Description
class ReverseStringFilter
    Reverses the token string, for example "country" => "yrtnuoc".

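The "country" => "yrtnuoc" example can be reproduced for a single term with the plain-Java sketch below. It is an illustration only and not the ReverseStringFilter implementation; the class and method names are hypothetical.

```java
// Hypothetical illustration of term reversal; NOT the Lucene implementation.
final class ReverseSketch {

    /** Returns the characters of term in reverse order. */
    static String reverse(String term) {
        // StringBuilder.reverse() keeps surrogate pairs in the correct order.
        return new StringBuilder(term).reverse().toString();
    }
}
```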
Uses of TokenFilter in org.apache.lucene.analysis.ru
Subclasses of TokenFilter in org.apache.lucene.analysis.ru
Modifier and Type  Class  Description
class RussianLightStemFilter
    A TokenFilter that applies RussianLightStemmer to stem Russian words.
class RussianLowerCaseFilter
    Deprecated. Use LowerCaseFilter instead, which has the same functionality.
class RussianStemFilter
    Deprecated. Use SnowballFilter with RussianStemmer instead, which has the same functionality.

Uses of TokenFilter in org.apache.lucene.analysis.shingle
Subclasses of TokenFilter in org.apache.lucene.analysis.shingle
Modifier and Type  Class  Description
class ShingleFilter
    A ShingleFilter constructs shingles (token n-grams) from a token stream.

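A shingle is an n-gram over whole tokens rather than characters. The plain-Java sketch below builds shingles of one fixed size from a token list. It is not the ShingleFilter implementation: the class and method names are hypothetical, the single-space separator is an assumption, and no unigrams or mixed sizes are emitted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of token n-grams (shingles); NOT the Lucene implementation.
final class ShingleSketch {

    /** Joins every run of `size` adjacent tokens with a single space. */
    static List<String> shingles(List<String> tokens, int size) {
        List<String> out = new ArrayList<String>();
        for (int start = 0; start + size <= tokens.size(); start++) {
            out.add(String.join(" ", tokens.subList(start, start + size)));
        }
        return out;
    }
}
```

For the tokens `please`, `divide`, `this`, `sentence` and size 2, the result is `[please divide, divide this, this sentence]`.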
Uses of TokenFilter in org.apache.lucene.analysis.snowball
Subclasses of TokenFilter in org.apache.lucene.analysis.snowball
Modifier and Type  Class  Description
class SnowballFilter
    A filter that stems words using a Snowball-generated stemmer.

Uses of TokenFilter in org.apache.lucene.analysis.standard
Subclasses of TokenFilter in org.apache.lucene.analysis.standard
Modifier and Type  Class  Description
class ClassicFilter
    Normalizes tokens extracted with ClassicTokenizer.
class StandardFilter
    Normalizes tokens extracted with StandardTokenizer.

Uses of TokenFilter in org.apache.lucene.analysis.stempel
Subclasses of TokenFilter in org.apache.lucene.analysis.stempel
Modifier and Type  Class  Description
class StempelFilter
    Transforms the token stream as per the stemming algorithm.

Uses of TokenFilter in org.apache.lucene.analysis.sv
Subclasses of TokenFilter in org.apache.lucene.analysis.sv
Modifier and Type  Class  Description
class SwedishLightStemFilter
    A TokenFilter that applies SwedishLightStemmer to stem Swedish words.

Uses of TokenFilter in org.apache.lucene.analysis.synonym
Subclasses of TokenFilter in org.apache.lucene.analysis.synonym
Modifier and Type  Class  Description
class SynonymFilter
    Matches single or multi word synonyms in a token stream.

Uses of TokenFilter in org.apache.lucene.analysis.th
Subclasses of TokenFilter in org.apache.lucene.analysis.th
Modifier and Type  Class  Description
class ThaiWordFilter
    TokenFilter that uses BreakIterator to break each Token that is Thai into separate Token(s) for each Thai word.

Uses of TokenFilter in org.apache.lucene.analysis.tr
Subclasses of TokenFilter in org.apache.lucene.analysis.tr
Modifier and Type  Class  Description
class TurkishLowerCaseFilter
    Normalizes Turkish token text to lower case.

Uses of TokenFilter in org.apache.lucene.collation
Subclasses of TokenFilter in org.apache.lucene.collation
Modifier and Type  Class  Description
class CollationKeyFilter
    Converts each token into its CollationKey, and then encodes the CollationKey with IndexableBinaryStringTools, to allow it to be stored as an index term.
class ICUCollationKeyFilter
    Converts each token into its CollationKey, and then encodes the CollationKey with IndexableBinaryStringTools, to allow it to be stored as an index term.

Uses of TokenFilter in org.apache.lucene.facet.enhancements
Subclasses of TokenFilter in org.apache.lucene.facet.enhancements
Modifier and Type  Class  Description
class EnhancementsCategoryTokenizer
    A tokenizer which adds a payload to each category token, according to the CategoryEnhancements defined in the given EnhancementsIndexingParams.

Uses of TokenFilter in org.apache.lucene.facet.enhancements.association
Subclasses of TokenFilter in org.apache.lucene.facet.enhancements.association
Modifier and Type  Class  Description
class AssociationListTokenizer
    Tokenizer for associations of a category.

Uses of TokenFilter in org.apache.lucene.facet.index.streaming
Subclasses of TokenFilter in org.apache.lucene.facet.index.streaming
Modifier and Type  Class  Description
class CategoryListTokenizer
    A base class for category list tokenizers, which add category list tokens to category streams.
class CategoryParentsStream
    This class adds parents to a CategoryAttributesStream.
class CategoryTokenizer
    Basic class for setting the CharTermAttributes and PayloadAttributes of category tokens.
class CategoryTokenizerBase
    A base class for all token filters which add term and payload attributes to tokens and are to be used in CategoryDocumentBuilder.
class CountingListTokenizer
    CategoryListTokenizer for facet counting.

Uses of TokenFilter in org.apache.lucene.queryParser
Subclasses of TokenFilter in org.apache.lucene.queryParser
Modifier and Type  Class  Description
static class QueryParserTestBase.QPTestFilter
    Filter which discards the token 'stop' and which expands the token 'phrase' into 'phrase1 phrase2'.

Uses of TokenFilter in org.apache.lucene.search.highlight
Subclasses of TokenFilter in org.apache.lucene.search.highlight
Modifier and Type  Class  Description
class OffsetLimitTokenFilter
    This TokenFilter limits the number of tokens while indexing by adding up the current offset.