Class IndexableBinaryStringTools


  • public final class IndexableBinaryStringTools
    extends Object
    Provides support for converting byte sequences to Strings and back again. The resulting Strings preserve the original byte sequences' sort order.

    The Strings are constructed using a Base 8000h encoding of the original binary data: each char of an encoded String represents a 15-bit chunk from the byte sequence. Base 8000h was chosen because it allows all lower 15 bits of a char to be used without restriction; the surrogate range [U+D800-U+DFFF] does not represent valid chars, and using char's high bit would require complicated handling to avoid that range.
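
    As a small illustration of the range argument, the sketch below uses only JDK calls to check that no 15-bit value collides with a UTF-16 surrogate. The class name Base8000hRange and its helper are hypothetical and are not part of this API.

```java
final class Base8000hRange {
    /** Largest value a 15-bit chunk can take. */
    static final int MAX_CHUNK = 0x7FFF;

    /** Returns true if no 15-bit value falls in the UTF-16 surrogate range. */
    static boolean chunksAvoidSurrogates() {
        for (int v = 0; v <= MAX_CHUNK; v++) {
            if (Character.isSurrogate((char) v)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // The surrogate range starts well above the largest 15-bit value.
        System.out.println("first surrogate: U+"
            + Integer.toHexString(Character.MIN_SURROGATE).toUpperCase());
        System.out.println("15-bit chunks avoid surrogates: " + chunksAvoidSurrogates());
    }
}
```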

    Although unset bits are used as padding in the final char, the original byte sequence could contain trailing bytes with no set bits (null bytes): padding is indistinguishable from valid information. To overcome this problem, a char is appended, indicating the number of encoded bytes in the final content char.
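
    The padding ambiguity can be reproduced with a simplified packer. This is a sketch only, assuming big-endian bit order and zero padding; the class FifteenBitPacking and its pack method are hypothetical, write no trailing count char, and are not this class's actual implementation.

```java
import java.util.Arrays;

final class FifteenBitPacking {
    /**
     * Packs bytes, most significant bit first, into 15-bit chunks and
     * zero-pads the last chunk. Hypothetical helper: no count char is appended.
     */
    static char[] pack(byte[] input) {
        int numChars = (int) ((8L * input.length + 14L) / 15L); // ceil(8n / 15)
        char[] out = new char[numChars];
        int bitBuffer = 0; // pending bits not yet written out
        int bitCount = 0;  // number of valid bits in bitBuffer
        int outPos = 0;
        for (byte b : input) {
            bitBuffer = (bitBuffer << 8) | (b & 0xFF);
            bitCount += 8;
            if (bitCount >= 15) {
                bitCount -= 15;
                out[outPos++] = (char) ((bitBuffer >>> bitCount) & 0x7FFF);
                bitBuffer &= (1 << bitCount) - 1; // keep only the leftover bits
            }
        }
        if (bitCount > 0) {
            // Left-align the leftover bits; the low bits are zero padding.
            out[outPos] = (char) ((bitBuffer << (15 - bitCount)) & 0x7FFF);
        }
        return out;
    }

    public static void main(String[] args) {
        char[] twoBytes = pack(new byte[] {0x12, 0x34});
        char[] threeBytes = pack(new byte[] {0x12, 0x34, 0x00});
        // Without a count char, the trailing null byte cannot be told apart from padding.
        System.out.println(Arrays.equals(twoBytes, threeBytes)); // prints true
    }
}
```

    Both inputs pack to the same two chars, which is why a length indicator has to follow the content chars.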

    Some methods in this class are defined over CharBuffers and ByteBuffers, but these are deprecated in favor of methods that operate directly on byte[] and char[] arrays. Note that this class calls array() and arrayOffset() on the CharBuffers and ByteBuffers it uses, so only wrapped arrays may be used. This class interprets the arrayOffset() and limit() values returned by its input buffers as beginning and end+1 positions on the wrapped array, respectively; similarly, on the output buffer, arrayOffset() is the first position written to, and limit() is set to one past the final output array position.

    WARNING: This means that the deprecated Buffer-based methods only work correctly with buffers that have an offset of 0. For example, they will not correctly interpret buffers returned by ByteBuffer.slice().
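
    The following sketch, using only java.nio, shows why a sliced buffer is a problem: slice() yields a non-zero arrayOffset() while its limit() is relative to the slice. The class SliceOffsetDemo and its helper are hypothetical names chosen for illustration.

```java
import java.nio.ByteBuffer;

final class SliceOffsetDemo {
    /** Index in the backing array of the buffer's first remaining byte. */
    static int startInBackingArray(ByteBuffer buffer) {
        return buffer.arrayOffset() + buffer.position();
    }

    public static void main(String[] args) {
        byte[] backing = {10, 20, 30, 40, 50};
        ByteBuffer whole = ByteBuffer.wrap(backing);
        whole.position(2);
        ByteBuffer slice = whole.slice(); // views backing[2], backing[3], backing[4]

        System.out.println(slice.arrayOffset()); // 2
        System.out.println(slice.limit());       // 3
        // If arrayOffset() and limit() are read as begin and end+1 positions
        // on the array, the slice looks like positions [2, 3) instead of [2, 5).

        // Arguments one could pass to the array-based methods instead:
        int offset = startInBackingArray(slice); // 2
        int length = slice.remaining();          // 3
        System.out.println(offset + " " + length);
    }
}
```

    Passing array(), the computed offset and remaining() to the byte[]-based methods avoids relying on how the Buffer-based methods interpret a slice.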

    WARNING: This API is experimental and might change in incompatible ways in the next release.
    • Method Detail

      • getEncodedLength

        @Deprecated
        public static int getEncodedLength(ByteBuffer original)
                                    throws IllegalArgumentException
        Deprecated.
        Use getEncodedLength(byte[], int, int) instead. This method will be removed in Lucene 4.0.
        Returns the number of chars required to encode the given byte sequence.
        Parameters:
        original - The byte sequence to be encoded. Must be backed by an array.
        Returns:
        The number of chars required to encode the given byte sequence
        Throws:
        IllegalArgumentException - If the given ByteBuffer is not backed by an array
      • getEncodedLength

        public static int getEncodedLength(byte[] inputArray,
                                           int inputOffset,
                                           int inputLength)
        Returns the number of chars required to encode the given bytes.
        Parameters:
        inputArray - byte sequence to be encoded
        inputOffset - initial offset into inputArray
        inputLength - number of bytes in inputArray
        Returns:
        The number of chars required to encode the given bytes.
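        The length arithmetic implied by the class description (15 bits per content char, plus one appended count char) can be sketched as follows. The class EncodedLengthSketch and its method are hypothetical; the formula is derived from the description above and is not quoted from this method's implementation.

```java
final class EncodedLengthSketch {
    /**
     * Hypothetical helper: ceil(8 * numBytes / 15) content chars
     * plus one trailing count char.
     */
    static int encodedLength(int numBytes) {
        // long intermediates avoid int overflow for large inputs
        return (int) ((8L * numBytes + 14L) / 15L) + 1;
    }

    public static void main(String[] args) {
        System.out.println(encodedLength(1));  // 2: one content char + count char
        System.out.println(encodedLength(2));  // 3: 16 bits need two content chars
        System.out.println(encodedLength(15)); // 9: 120 bits fill eight chars exactly
    }
}
```

        Under this assumption the result depends only on the number of bytes, not on their values.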
      • getDecodedLength

        @Deprecated
        public static int getDecodedLength(CharBuffer encoded)
                                    throws IllegalArgumentException
        Deprecated.
        Use getDecodedLength(char[], int, int) instead. This method will be removed in Lucene 4.0.
        Returns the number of bytes required to decode the given char sequence.
        Parameters:
        encoded - The char sequence to be decoded. Must be backed by an array.
        Returns:
        The number of bytes required to decode the given char sequence
        Throws:
        IllegalArgumentException - If the given CharBuffer is not backed by an array
      • getDecodedLength

        public static int getDecodedLength(char[] encoded,
                                           int offset,
                                           int length)
        Returns the number of bytes required to decode the given char sequence.
        Parameters:
        encoded - char sequence to be decoded
        offset - initial offset into encoded
        length - number of characters
        Returns:
        The number of bytes required to decode the given char sequence
      • encode

        public static void encode(byte[] inputArray,
                                  int inputOffset,
                                  int inputLength,
                                  char[] outputArray,
                                  int outputOffset,
                                  int outputLength)
        Encodes the input byte sequence into the output char sequence. Before calling this method, ensure that the output array has sufficient capacity by calling getEncodedLength(byte[], int, int).
        Parameters:
        inputArray - byte sequence to be encoded
        inputOffset - initial offset into inputArray
        inputLength - number of bytes in inputArray
        outputArray - char sequence to store encoded result
        outputOffset - initial offset into outputArray
        outputLength - length of output, must be getEncodedLength(inputArray, inputOffset, inputLength)
      • decode

        public static void decode(char[] inputArray,
                                  int inputOffset,
                                  int inputLength,
                                  byte[] outputArray,
                                  int outputOffset,
                                  int outputLength)
        Decodes the input char sequence into the output byte sequence. Before calling this method, ensure that the output array has sufficient capacity by calling getDecodedLength(char[], int, int).
        Parameters:
        inputArray - char sequence to be decoded
        inputOffset - initial offset into inputArray
        inputLength - number of chars in inputArray
        outputArray - byte sequence to store decoded result
        outputOffset - initial offset into outputArray
        outputLength - length of output, must be getDecodedLength(inputArray, inputOffset, inputLength)