Class DirectoryTaxonomyWriter

    • Field Detail

      • INDEX_CREATE_TIME

        public static final String INDEX_CREATE_TIME
        Property name of user commit data that contains the creation time of a taxonomy index.

        Applications should not use this property in their commit data because it will be overridden by this taxonomy writer.

        See Also:
        Constant Field Values
    • Method Detail

      • setDelimiter

        public void setDelimiter(char delimiter)
        setDelimiter changes the character that the taxonomy uses in its internal storage as a delimiter between category components. Do not use this method unless you really know what you are doing. It has nothing to do with whatever character the application may be using to represent categories for its own use.

        If you do use this method, make sure you call it before any other method that actually queries the taxonomy. Moreover, make sure you always pass the same delimiter for all DirectoryTaxonomyWriter and DirectoryTaxonomyReader objects you create for the same directory.
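
        Why the delimiter must be consistent can be seen in a toy model. The sketch below is not Lucene's storage format; the class DelimiterDemo and the helper toStoredKey are hypothetical names, and it only assumes that a category path is flattened into one stored key by joining its components with the delimiter.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: not Lucene's storage format. All names here are hypothetical.
class DelimiterDemo {
    // Joins path components into one stored key using the given delimiter.
    static String toStoredKey(char delimiter, String... components) {
        return String.join(String.valueOf(delimiter), components);
    }

    public static void main(String[] args) {
        Map<String, Integer> stored = new HashMap<>();
        // The "writer" side stores the category using '/' as the internal delimiter.
        stored.put(toStoredKey('/', "Author", "Mark Twain"), 2);

        // A "reader" using the same delimiter builds the same key and finds the entry.
        System.out.println(stored.get(toStoredKey('/', "Author", "Mark Twain")));
        // A "reader" using a different delimiter builds a different key and misses.
        System.out.println(stored.get(toStoredKey('|', "Author", "Mark Twain")));
    }
}
```

        In this model the second lookup misses even though the category exists, which is the kind of mismatch the warning above is meant to prevent.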

      • unlock

        public static void unlock(Directory directory)
                           throws IOException
        Forcibly unlocks the taxonomy in the named directory.

        Caution: this should only be used by failure recovery code, when it is known that no other process or thread is currently accessing this taxonomy.

        This method is unnecessary if your Directory uses a NativeFSLockFactory instead of the default SimpleFSLockFactory. When the "native" lock is used, a lock does not stay behind forever when the process using it dies.

        Throws:
        IOException
      • defaultTaxonomyWriterCache

        public static TaxonomyWriterCache defaultTaxonomyWriterCache()
        Defines the default TaxonomyWriterCache to use in constructors which do not specify one.

        The current default is Cl2oTaxonomyWriterCache constructed with the parameters (1024, 0.15f, 3), i.e., the entire taxonomy is cached in memory while building it.

      • getCacheMemoryUsage

        public int getCacheMemoryUsage()
        Returns the number of memory bytes used by the cache.
        Returns:
        Number of cache bytes in memory, for CL2O only; zero otherwise.
      • closeResources

        protected void closeResources()
                               throws IOException
        A hook for extending classes to close additional resources that were used. The default implementation closes the IndexReader as well as the TaxonomyWriterCache instances that were used.
        NOTE: if you override this method, you should include a super.closeResources() call in your implementation.
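
        The override pattern recommended in the note can be sketched as follows. BaseWriter and AuditingWriter are hypothetical stand-ins rather than Lucene classes; only the shape of the hook is modelled, and the try/finally is one possible way to make sure the base cleanup still runs if the subclass cleanup fails.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical stand-in for the writer; only the hook pattern is modelled.
class BaseWriter implements Closeable {
    boolean baseResourcesClosed = false;

    protected void closeResources() throws IOException {
        // Stands in for closing the reader and the cache.
        baseResourcesClosed = true;
    }

    @Override
    public void close() throws IOException {
        closeResources();
    }
}

// Hypothetical subclass that owns one extra resource.
class AuditingWriter extends BaseWriter {
    boolean auditLogClosed = false;

    @Override
    protected void closeResources() throws IOException {
        try {
            // Release the subclass's own resource first.
            auditLogClosed = true;
        } finally {
            // Always let the base class release its resources as well.
            super.closeResources();
        }
    }
}

class CloseHookDemo {
    public static void main(String[] args) throws IOException {
        AuditingWriter writer = new AuditingWriter();
        writer.close();
        System.out.println(writer.auditLogClosed && writer.baseResourcesClosed);
    }
}
```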
        Throws:
        IOException
      • findCategory

        protected int findCategory(CategoryPath categoryPath)
                            throws IOException
        Look up the given category in the cache and/or the on-disk storage, returning the category's ordinal, or a negative number in case the category does not yet exist in the taxonomy.
        Throws:
        IOException
      • addCategory

        public int addCategory(CategoryPath categoryPath)
                        throws IOException
        Description copied from interface: TaxonomyWriter
        addCategory() adds a category with a given path name to the taxonomy, and returns its ordinal. If the category was already present in the taxonomy, its existing ordinal is returned.

        Before adding a category, addCategory() makes sure that all its ancestor categories exist in the taxonomy as well. As a result, the ordinal of a category is guaranteed to be smaller than the ordinal of any of its descendants.
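
        The ordering guarantee follows directly from adding ancestors first. The sketch below is a toy in-memory model, not the Lucene implementation; ToyTaxonomy and its fields are hypothetical, and it uses -1 as a placeholder for "no parent".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy in-memory model, not the Lucene implementation. Names are hypothetical.
class ToyTaxonomy {
    private final Map<List<String>, Integer> ordinals = new HashMap<>();
    private final List<Integer> parents = new ArrayList<>();

    ToyTaxonomy() {
        // The root (empty path) gets ordinal 0; -1 marks "no parent".
        ordinals.put(new ArrayList<String>(), 0);
        parents.add(-1);
    }

    // Returns the existing ordinal, or adds the category after its ancestors.
    int addCategory(String... path) {
        return add(Arrays.asList(path));
    }

    private int add(List<String> path) {
        Integer existing = ordinals.get(path);
        if (existing != null) {
            return existing;
        }
        // Recurse on the parent path first, so ancestors receive smaller ordinals.
        int parent = add(path.subList(0, path.size() - 1));
        int ordinal = parents.size();
        ordinals.put(new ArrayList<>(path), ordinal);
        parents.add(parent);
        return ordinal;
    }

    int getParent(int ordinal) {
        return parents.get(ordinal);
    }

    int getSize() {
        return parents.size();
    }

    public static void main(String[] args) {
        ToyTaxonomy taxonomy = new ToyTaxonomy();
        int twain = taxonomy.addCategory("Author", "Mark Twain");
        // "Author" was added implicitly and has a smaller ordinal than its child.
        System.out.println(taxonomy.getParent(twain) < twain);
    }
}
```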

        Specified by:
        addCategory in interface TaxonomyWriter
        Throws:
        IOException
      • ensureOpen

        protected final void ensureOpen()
        Verifies that this instance has not been closed; throws AlreadyClosedException if it has.
      • getSize

        public int getSize()
        getSize() returns the number of categories in the taxonomy.

        Because categories are numbered consecutively starting with 0, it means the taxonomy contains ordinals 0 through getSize()-1.

        Note that the number returned by getSize() is often slightly higher than the number of categories inserted into the taxonomy; this is because when a category is added to the taxonomy, its ancestors are also added automatically (including the root, which always gets ordinal 0).
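
        A small worked count makes the difference concrete. The sketch below is illustrative only: SizeDemo is a hypothetical class, and it assumes categories are written as '/'-separated strings purely for the sake of the example.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Illustrative counting model only; the '/' separator is an assumption of this sketch.
class SizeDemo {
    // Returns how many taxonomy entries exist after inserting the given paths.
    static int sizeAfterInserting(String... paths) {
        Set<String> entries = new LinkedHashSet<>();
        entries.add(""); // the root, which takes ordinal 0
        for (String path : paths) {
            // Add every ancestor prefix before the category itself.
            int slash = path.indexOf('/');
            while (slash >= 0) {
                entries.add(path.substring(0, slash));
                slash = path.indexOf('/', slash + 1);
            }
            entries.add(path);
        }
        return entries.size();
    }

    public static void main(String[] args) {
        // Two categories inserted; the root and "Author" are added implicitly.
        System.out.println(sizeAfterInserting("Author/Mark Twain", "Author/Jane Austen"));
    }
}
```

        Here two inserted categories yield four entries: the root, "Author", and the two authors.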

        Specified by:
        getSize in interface TaxonomyWriter
      • setCacheMissesUntilFill

        public void setCacheMissesUntilFill(int i)
        Set the number of cache misses before an attempt is made to read the entire taxonomy into the in-memory cache.

        DirectoryTaxonomyWriter holds an in-memory cache of recently seen categories to speed up operation. On each cache miss, the on-disk index needs to be consulted. When an existing taxonomy is opened, many such slow disk reads are needed until the cache is filled, so it is more efficient to read the entire taxonomy into memory at once. We do this complete read after a certain number (defined by this method) of cache misses.

        If the number is set to 0, the entire taxonomy is read into the cache on first use, without fetching individual categories first.

        Note that if the memory cache of choice is limited in size, and cannot hold the entire content of the on-disk taxonomy, then it is never read in its entirety into the cache, regardless of the setting of this method.
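
        The threshold behaviour described above can be modelled with a simple miss counter. This is a sketch under stated assumptions, not the writer's actual code: ToyFillingCache and its members are hypothetical, a map stands in for the on-disk index, and the size-limited cache case is not modelled.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of "fill the cache after N misses". All names are hypothetical.
class ToyFillingCache {
    private final Map<String, Integer> onDisk; // stands in for the on-disk index
    private final Map<String, Integer> cache = new HashMap<>();
    private final int missesUntilFill;
    private int misses = 0;
    boolean filled = false;
    int diskLookups = 0; // counts simulated single-category disk reads

    ToyFillingCache(Map<String, Integer> onDisk, int missesUntilFill) {
        this.onDisk = onDisk;
        this.missesUntilFill = missesUntilFill;
    }

    // Returns the category's ordinal, or -1 if it does not exist.
    int find(String category) {
        Integer cached = cache.get(category);
        if (cached != null) {
            return cached;
        }
        if (filled) {
            return -1; // the cache is complete, so a miss means "not present"
        }
        if (misses >= missesUntilFill) {
            // Threshold reached: read everything at once instead of one entry.
            cache.putAll(onDisk);
            filled = true;
            Integer value = cache.get(category);
            return value != null ? value : -1;
        }
        // Below the threshold: pay for one individual disk lookup.
        misses++;
        diskLookups++;
        Integer fromDisk = onDisk.get(category);
        if (fromDisk == null) {
            return -1;
        }
        cache.put(category, fromDisk);
        return fromDisk;
    }

    public static void main(String[] args) {
        Map<String, Integer> disk = new HashMap<>();
        disk.put("a", 1);
        disk.put("b", 2);
        disk.put("c", 3);
        ToyFillingCache cache = new ToyFillingCache(disk, 2);
        cache.find("a");
        cache.find("b");
        cache.find("c"); // third miss triggers the full read
        System.out.println(cache.diskLookups + " individual lookups, filled=" + cache.filled);
    }
}
```

        With a threshold of 0 in this model, the very first lookup triggers the full read and no individual disk lookups occur.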

      • getParent

        public int getParent(int ordinal)
                      throws IOException
        Description copied from interface: TaxonomyWriter
        getParent() returns the ordinal of the parent category of the category with the given ordinal.

        When a category is specified as a path name, finding the path of its parent is as trivial as dropping the last component of the path. getParent() is functionally equivalent to calling getPath() on the given ordinal, dropping the last component of the path, and then calling getOrdinal() to get an ordinal back.
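
        The functional equivalence described above can be written out literally. The sketch below is a toy model, not the writer's implementation: ParentByPathDemo is hypothetical, getPath and getOrdinal are simple list lookups, and the constant values 0 and -1 are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of "getPath, drop the last component, getOrdinal". Names are hypothetical.
class ParentByPathDemo {
    static final int ROOT_ORDINAL = 0;     // assumed value for this sketch
    static final int INVALID_ORDINAL = -1; // assumed value for this sketch

    // Index is the ordinal, value is the path; the root is the empty path.
    private final List<List<String>> paths = new ArrayList<>();

    ParentByPathDemo() {
        paths.add(new ArrayList<String>());
    }

    // Appends a category; in this toy, the caller must add ancestors first.
    int add(String... path) {
        paths.add(Arrays.asList(path));
        return paths.size() - 1;
    }

    List<String> getPath(int ordinal) {
        return paths.get(ordinal);
    }

    int getOrdinal(List<String> path) {
        return paths.indexOf(path);
    }

    int getParent(int ordinal) {
        List<String> path = getPath(ordinal);
        if (path.isEmpty()) {
            return INVALID_ORDINAL; // the root has no parent
        }
        // Drop the last component and look the shorter path up again.
        return getOrdinal(path.subList(0, path.size() - 1));
    }

    public static void main(String[] args) {
        ParentByPathDemo taxonomy = new ParentByPathDemo();
        int author = taxonomy.add("Author");
        int twain = taxonomy.add("Author", "Mark Twain");
        System.out.println(taxonomy.getParent(twain) == author);
        System.out.println(taxonomy.getParent(author) == ROOT_ORDINAL);
    }
}
```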

        If the given ordinal is the ROOT_ORDINAL, an INVALID_ORDINAL is returned. If the given ordinal is a top-level category, the ROOT_ORDINAL is returned. If an invalid ordinal is given (negative or beyond the last available ordinal), an ArrayIndexOutOfBoundsException is thrown. However, it is expected that getParent will only be called for ordinals which are already known to be in the taxonomy.

        TODO (Facet): instead of a getParent(ordinal) method, consider having a getCategory(categorypath, prefixlen) which is similar to addCategory except it doesn't add new categories; this method can be used to get the ordinals of all prefixes of the given category, and it can use exactly the same code and cache used by addCategory() so it means less code.

        Specified by:
        getParent in interface TaxonomyWriter
        Throws:
        IOException