Class CharStream

  • All Implemented Interfaces:
    Closeable, AutoCloseable, Readable
    Direct Known Subclasses:
    CharFilter, CharReader

    public abstract class CharStream
    extends Reader
    CharStream adds correctOffset(int) functionality to Reader. All Tokenizers accept a CharStream instead of a Reader as input, which enables arbitrary character-based filtering before tokenization. The correctOffset(int) method fixes offsets to account for the removal or insertion of characters, so that the offsets reported in the tokens match the character offsets of the original Reader.
    • Constructor Detail

      • CharStream

        public CharStream()
    • Method Detail

      • correctOffset

        public abstract int correctOffset(int currentOff)
        Called by CharFilters and Tokenizers to correct a token offset.
        Parameters:
        currentOff - offset as seen in the output
        Returns:
        corrected offset based on the input
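
The correctOffset(int) contract can be illustrated with a minimal, self-contained sketch. It does not use the library classes: SketchCharStream, HyphenStrippingStream and OffsetDemo are hypothetical names invented for this example, and the bookkeeping shown is one possible approach, not necessarily how any real CharFilter is implemented. The filter removes every '-' from its input and maps each output offset back to the position of that character in the original Reader.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical driver: reads the filtered stream, then corrects each output offset.
class OffsetDemo {
    static int[] correctedOffsets(String text) {
        try (HyphenStrippingStream stream =
                 new HyphenStrippingStream(new StringReader(text))) {
            int length = 0;
            while (stream.read() != -1) {
                length++; // count the characters that survive filtering
            }
            int[] offsets = new int[length];
            for (int i = 0; i < length; i++) {
                offsets[i] = stream.correctOffset(i);
            }
            return offsets;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(correctedOffsets("a-b-c")));
    }
}

// Hypothetical stand-in for the abstract class documented above (not the library class).
abstract class SketchCharStream extends Reader {
    public abstract int correctOffset(int currentOff);
}

// Hypothetical filter: drops every '-' and remembers where each removal happened.
class HyphenStrippingStream extends SketchCharStream {
    private final Reader input;
    // One entry per removed character: the output position at the time of removal.
    private final List<Integer> removalPositions = new ArrayList<>();
    private int outputPos = 0;

    HyphenStrippingStream(Reader input) {
        this.input = input;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        int written = 0;
        while (written < len) {
            int c = input.read();
            if (c == -1) {
                break; // underlying input exhausted
            }
            if (c == '-') {
                removalPositions.add(outputPos); // character removed here
                continue;
            }
            cbuf[off + written++] = (char) c;
            outputPos++;
        }
        return (written == 0 && len > 0) ? -1 : written;
    }

    // Adds back the characters removed at or before the given output offset.
    @Override
    public int correctOffset(int currentOff) {
        int removed = 0;
        for (int pos : removalPositions) {
            if (pos <= currentOff) {
                removed++;
            }
        }
        return currentOff + removed;
    }

    @Override
    public void close() throws IOException {
        input.close();
    }
}
```

For the input "a-b-c" the filtered output is "abc", and the corrected offsets of its three characters are 0, 2 and 4, which are their positions in the original text. A linear scan over the recorded removals keeps the sketch short; an implementation handling large inputs would more likely store cumulative differences and look them up with a binary search.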