Class ChineseTokenizer

  • All Implemented Interfaces:
    Closeable, AutoCloseable

    @Deprecated
    public final class ChineseTokenizer
    extends org.apache.lucene.analysis.Tokenizer
    Deprecated.
    Use StandardTokenizer instead, which has the same functionality. This tokenizer will be removed in Lucene 5.0.
    Tokenizes Chinese text into individual Chinese characters.

    ChineseTokenizer and CJKTokenizer differ in how they split text into tokens.

    For example, if the Chinese text "C1C2C3C4" is to be indexed:

    • The tokens returned from ChineseTokenizer are C1, C2, C3, C4.
    • The tokens returned from CJKTokenizer are C1C2, C2C3, C3C4.

    Therefore, the index created by CJKTokenizer is much larger, because it stores character pairs rather than single characters.

    The trade-off is that searches for C1, C1C2, C1C3, C4C2, C1C2C3 ... work with ChineseTokenizer, but not with CJKTokenizer.
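
    The following is a minimal sketch of the two tokenization schemes described above, written in plain Java without any Lucene dependency. It is not the actual implementation of either class: the names TokenizationSketch, unigrams, bigrams and matches are hypothetical, and the matches helper assumes a query hits only when every query token appears among the indexed tokens, which is a simplification of how Lucene evaluates queries. Edge cases such as single-character documents are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not Lucene code.
class TokenizationSketch {

    // One token per character (code point), as described for ChineseTokenizer.
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            tokens.add(new String(Character.toChars(cp)));
            i += Character.charCount(cp);
        }
        return tokens;
    }

    // Overlapping pairs of adjacent characters, as described for CJKTokenizer.
    static List<String> bigrams(String text) {
        List<String> chars = unigrams(text);
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i + 1 < chars.size(); i++) {
            tokens.add(chars.get(i) + chars.get(i + 1));
        }
        return tokens;
    }

    // Simplified matching assumption: every query token must be indexed.
    static boolean matches(List<String> indexed, List<String> query) {
        return !query.isEmpty() && indexed.containsAll(query);
    }

    public static void main(String[] args) {
        // Four Chinese characters (C1C2C3C4), written as Unicode escapes.
        String text = "\u6211\u662F\u4E2D\u56FD";
        String single = "\u6211";            // C1
        String nonAdjacent = "\u6211\u4E2D"; // C1C3

        System.out.println(unigrams(text).size()); // 4
        System.out.println(bigrams(text).size());  // 3

        // Unigram index: both queries match.
        System.out.println(matches(unigrams(text), unigrams(single)));      // true
        System.out.println(matches(unigrams(text), unigrams(nonAdjacent))); // true

        // Bigram index: neither query matches.
        System.out.println(matches(bigrams(text), bigrams(single)));      // false
        System.out.println(matches(bigrams(text), bigrams(nonAdjacent))); // false
    }
}
```

    Under these assumptions, a single character (C1) or a non-adjacent pair (C1C3) is found in the one-character index but not in the pair index, because the pair index holds only C1C2, C2C3 and C3C4.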

    Version:
    1.0
    • Nested Class Summary

      • Nested classes/interfaces inherited from class org.apache.lucene.util.AttributeSource

        org.apache.lucene.util.AttributeSource.AttributeFactory, org.apache.lucene.util.AttributeSource.State
    • Field Summary

      • Fields inherited from class org.apache.lucene.analysis.Tokenizer

        input
    • Method Summary

      Modifier and Type   Method                Description
      void                end()                 Deprecated.
      boolean             incrementToken()      Deprecated.
      void                reset()               Deprecated.
      void                reset(Reader input)   Deprecated.
      • Methods inherited from class org.apache.lucene.analysis.Tokenizer

        close, correctOffset
      • Methods inherited from class org.apache.lucene.util.AttributeSource

        addAttribute, addAttributeImpl, captureState, clearAttributes, cloneAttributes, copyTo, equals, getAttribute, getAttributeClassesIterator, getAttributeFactory, getAttributeImplsIterator, hasAttribute, hasAttributes, hashCode, reflectAsString, reflectWith, restoreState, toString
    • Constructor Detail

      • ChineseTokenizer

        public ChineseTokenizer(Reader in)
        Deprecated.
      • ChineseTokenizer

        public ChineseTokenizer(org.apache.lucene.util.AttributeSource source,
                                Reader in)
        Deprecated.
      • ChineseTokenizer

        public ChineseTokenizer(org.apache.lucene.util.AttributeSource.AttributeFactory factory,
                                Reader in)
        Deprecated.
    • Method Detail

      • incrementToken

        public boolean incrementToken()
                               throws IOException
        Deprecated.
        Specified by:
        incrementToken in class org.apache.lucene.analysis.TokenStream
        Throws:
        IOException
      • end

        public final void end()
        Deprecated.
        Overrides:
        end in class org.apache.lucene.analysis.TokenStream
      • reset

        public void reset()
                   throws IOException
        Deprecated.
        Overrides:
        reset in class org.apache.lucene.analysis.TokenStream
        Throws:
        IOException
      • reset

        public void reset(Reader input)
                   throws IOException
        Deprecated.
        Overrides:
        reset in class org.apache.lucene.analysis.Tokenizer
        Throws:
        IOException