Class BytesRefHash


  • public final class BytesRefHash
    extends Object
    BytesRefHash is a special-purpose, hash-map-like data structure optimized for BytesRef instances. It maintains a mapping from byte arrays to ordinals (conceptually a Map<BytesRef,int>) and stores the hashed bytes efficiently in continuous storage. The mapping to the ordinal is encapsulated inside BytesRefHash, and the ordinal is guaranteed to increase with each added BytesRef.

    Note: A BytesRef instance passed to add(BytesRef) must not be longer than ByteBlockPool.BYTE_BLOCK_SIZE-2 bytes. The internal storage is limited to 2GB of total byte storage.

    NOTE: This API is for internal purposes only and might change in incompatible ways in the next release.
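
    The bytes-to-ordinal mapping described above can be pictured with a small toy model. This is only a sketch built on java.util collections, not the Lucene implementation: the class name ToyBytesHash and its methods are hypothetical, and it uses no ByteBlockPool or contiguous storage. It shows just the contract that ordinals grow with each new entry and that a repeated entry is reported as -(ord)-1.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of the ordinal mapping; NOT how BytesRefHash is implemented.
final class ToyBytesHash {
    // ByteBuffer compares by content, so it can serve as a map key for byte arrays.
    private final Map<ByteBuffer, Integer> ords = new HashMap<>();
    private final List<byte[]> values = new ArrayList<>();

    /** Returns the new ord, or -(ord)-1 if the same bytes were added before. */
    int add(byte[] bytes) {
        ByteBuffer key = ByteBuffer.wrap(bytes.clone()); // defensive copy
        Integer existing = ords.get(key);
        if (existing != null) {
            return -existing - 1;
        }
        int ord = values.size(); // ordinals are handed out in increasing order
        ords.put(key, ord);
        values.add(key.array());
        return ord;
    }

    /** Returns the bytes stored for the given ord (0 <= ord < size()). */
    byte[] get(int ord) {
        return values.get(ord);
    }

    int size() {
        return values.size();
    }
}
```

    In this model, adding {1, 2}, then {3}, then {1, 2} again yields 0, 1 and -1 in turn, and size() stays at 2.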
    • Method Detail

      • get

        public BytesRef get​(int ord,
                            BytesRef ref)
        Populates and returns a BytesRef with the bytes for the given ord.

        Note: the given ord must be a non-negative integer less than the current size (size()).

        Parameters:
        ord - the ord
        ref - the BytesRef to populate
        Returns:
        the given BytesRef instance populated with the bytes for the given ord
      • compact

        public int[] compact()
        Returns the ords array in arbitrary order. Valid ords start at offset 0 and end at offset size() - 1.

        Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.

      • sort

        public int[] sort​(Comparator<BytesRef> comp)
        Returns the values array sorted by the referenced byte values.

        Note: This is a destructive operation. clear() must be called in order to reuse this BytesRefHash instance.

        Parameters:
        comp - the Comparator used for sorting
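
        The idea of returning ords ordered by the bytes they refer to can be sketched without Lucene. The names OrdSortSketch, sortOrds and UNSIGNED_LEX below are hypothetical, and the byte[][] argument stands in for the hash's internal storage. Unlike sort(Comparator), this sketch is not destructive.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical illustration: order ordinals by the byte values they refer to.
final class OrdSortSketch {
    /** Unsigned lexicographic order over byte arrays (an assumed comparator choice). */
    static final Comparator<byte[]> UNSIGNED_LEX = (x, y) -> {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int c = (x[i] & 0xFF) - (y[i] & 0xFF);
            if (c != 0) {
                return c;
            }
        }
        return x.length - y.length; // shorter array sorts first on a shared prefix
    };

    /** values[ord] holds the bytes for ord; returns the ords sorted by comp. */
    static int[] sortOrds(byte[][] values, Comparator<byte[]> comp) {
        Integer[] boxed = new Integer[values.length];
        for (int i = 0; i < boxed.length; i++) {
            boxed[i] = i;
        }
        // Compare ordinals indirectly, through the bytes they reference.
        Arrays.sort(boxed, (a, b) -> comp.compare(values[a], values[b]));
        int[] ords = new int[boxed.length];
        for (int i = 0; i < ords.length; i++) {
            ords[i] = boxed[i];
        }
        return ords;
    }
}
```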
      • clear

        public void clear​(boolean resetPool)
        Clears all entries from this BytesRefHash. If resetPool is true, the internally used ByteBlockPool is reset as well.
      • clear

        public void clear()
      • close

        public void close()
        Closes the BytesRefHash and releases all internally used memory
      • add

        public int add​(BytesRef bytes,
                       int code)
        Adds a new BytesRef with a pre-calculated hash code.
        Parameters:
        bytes - the bytes to hash
        code - the bytes hash code

        Hashcode is defined as:

         int hash = 0;
         for (int i = offset; i < offset + length; i++) {
           hash = 31 * hash + bytes[i];
         }
         
        Returns:
        the ord assigned to the given bytes if there was no previous mapping for them, otherwise (-(ord)-1), where ord is the ord already assigned. This guarantees that the return value is always >= 0 if the given bytes have not been hashed before.
        Throws:
        BytesRefHash.MaxBytesLengthExceededException - if the given bytes are longer than ByteBlockPool.BYTE_BLOCK_SIZE - 2
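
        As a hedged illustration, the hash code definition and the return-value convention above can be written as plain Java helpers. The class and method names (BytesRefHashCodeSketch, hash, wasAlreadyPresent, toOrd) are hypothetical and not part of the BytesRefHash API; a bare byte[] with offset and length stands in for a BytesRef.

```java
// Hypothetical helpers; they mirror the documented formulas, not Lucene source code.
final class BytesRefHashCodeSketch {
    /** Computes the hash code exactly as defined in the documentation of add. */
    static int hash(byte[] bytes, int offset, int length) {
        int hash = 0;
        for (int i = offset; i < offset + length; i++) {
            hash = 31 * hash + bytes[i]; // bytes are signed in Java
        }
        return hash;
    }

    /** A negative result from add means the bytes were already in the hash. */
    static boolean wasAlreadyPresent(int addResult) {
        return addResult < 0;
    }

    /** Recovers the ord from either form of the return value. */
    static int toOrd(int addResult) {
        return addResult < 0 ? -addResult - 1 : addResult;
    }
}
```

        A caller would typically compute the code with hash(...), pass it to add, and then use toOrd(...) on the result regardless of whether the bytes were new.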
      • addByPoolOffset

        public int addByPoolOffset​(int offset)
      • reinit

        public void reinit()
        Reinitializes the BytesRefHash after a previous clear() call. If clear() has not been called previously, this method has no effect.
      • byteStart

        public int byteStart​(int ord)
        Returns the bytesStart offset into the internally used ByteBlockPool for the given ord
        Parameters:
        ord - the ord to look up
        Returns:
        the bytesStart offset into the internally used ByteBlockPool for the given ord