Class ComplementNaiveBayes

  • All Implemented Interfaces:
    java.io.Serializable, java.lang.Cloneable, CapabilitiesHandler, OptionHandler, RevisionHandler, TechnicalInformationHandler, WeightedInstancesHandler

    public class ComplementNaiveBayes
    extends Classifier
    implements OptionHandler, WeightedInstancesHandler, TechnicalInformationHandler
    Class for building and using a Complement class Naive Bayes classifier.

    For more information, see:

    Jason D. Rennie, Lawrence Shih, Jaime Teevan, David R. Karger: Tackling the Poor Assumptions of Naive Bayes Text Classifiers. In: ICML, 616-623, 2003.

    P.S.: TF, IDF and length normalization transforms, as described in the paper, can be performed through weka.filters.unsupervised.StringToWordVector.
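    The three transforms named above can be sketched in plain Java as follows. This is a minimal illustration of the formulas given in the cited paper (log term frequency, inverse document frequency, and Euclidean length normalization), not the code used by weka.filters.unsupervised.StringToWordVector. The class and method names are hypothetical.

```java
/** Hypothetical helper, not part of Weka: transforms described by Rennie et al. (2003). */
class TextTransformSketch {

    /** TF transform: replaces each raw count f with log(1 + f). */
    static double[] tf(double[] counts) {
        double[] out = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            out[i] = Math.log(1.0 + counts[i]);
        }
        return out;
    }

    /** IDF transform: scales word i by log(numDocs / docFreq[i]). */
    static double[] idf(double[] values, int numDocs, int[] docFreq) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // A word that occurs in no document carries no information; keep it at 0.
            out[i] = docFreq[i] == 0
                    ? 0.0
                    : values[i] * Math.log((double) numDocs / docFreq[i]);
        }
        return out;
    }

    /** Length normalization: divides the vector by its Euclidean norm. */
    static double[] lengthNormalize(double[] values) {
        double sumSquares = 0.0;
        for (double v : values) {
            sumSquares += v * v;
        }
        double norm = Math.sqrt(sumSquares);
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = norm == 0.0 ? 0.0 : values[i] / norm;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] counts = {3.0, 0.0, 1.0};   // raw word counts for one document
        int[] docFreq = {5, 2, 10};          // documents containing each word
        double[] transformed = lengthNormalize(idf(tf(counts), 10, docFreq));
        System.out.println(java.util.Arrays.toString(transformed));
    }
}
```

    In the sketch the transforms are applied in the order TF, IDF, length normalization, which is the order in which the paper lists them.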

    BibTeX:

     @inproceedings{Rennie2003,
        author = {Jason D. Rennie and Lawrence Shih and Jaime Teevan and David R. Karger},
        booktitle = {ICML},
        pages = {616-623},
        publisher = {AAAI Press},
        title = {Tackling the Poor Assumptions of Naive Bayes Text Classifiers},
        year = {2003}
     }
     

    Valid options are:

     -N
      Normalize the word weights for each class
     
     -S
      Smoothing value to avoid zero WordGivenClass probabilities (default=1.0).
     
    Version:
    $Revision: 5516 $
    Author:
    Ashraf M. Kibriya (amk14@cs.waikato.ac.nz)
    See Also:
    Serialized Form
    • Constructor Detail

      • ComplementNaiveBayes

        public ComplementNaiveBayes()
    • Method Detail

      • listOptions

        public java.util.Enumeration listOptions()
        Returns an enumeration describing the available options.
        Specified by:
        listOptions in interface OptionHandler
        Overrides:
        listOptions in class Classifier
        Returns:
        an enumeration of all the available options.
      • getOptions

        public java.lang.String[] getOptions()
        Gets the current settings of the classifier.
        Specified by:
        getOptions in interface OptionHandler
        Overrides:
        getOptions in class Classifier
        Returns:
        an array of strings suitable for passing to setOptions
      • setOptions

        public void setOptions(java.lang.String[] options)
                        throws java.lang.Exception
        Parses a given list of options.

        Valid options are:

         -N
          Normalize the word weights for each class
         
         -S
          Smoothing value to avoid zero WordGivenClass probabilities (default=1.0).
         
        Specified by:
        setOptions in interface OptionHandler
        Overrides:
        setOptions in class Classifier
        Parameters:
        options - the list of options as an array of strings
        Throws:
        java.lang.Exception - if an option is not supported
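        As an illustration of the option format, the sketch below parses an array such as {"-N", "-S", "0.5"}. It is a hypothetical stand-in, not Weka's parsing code, and it assumes that the smoothing value follows -S as the next array element. The class name OptionsSketch and its fields are invented for this example.

```java
/** Hypothetical stand-in for the two documented options; not Weka code. */
class OptionsSketch {
    boolean normalizeWordWeights = false; // set by -N
    double smoothingParameter = 1.0;      // set by -S, documented default is 1.0

    /** Parses -N and -S; assumes the value of -S is the next array element. */
    static OptionsSketch parse(String[] options) {
        OptionsSketch parsed = new OptionsSketch();
        for (int i = 0; i < options.length; i++) {
            if (options[i].equals("-N")) {
                parsed.normalizeWordWeights = true;
            } else if (options[i].equals("-S")) {
                if (i + 1 >= options.length) {
                    throw new IllegalArgumentException("-S requires a value");
                }
                parsed.smoothingParameter = Double.parseDouble(options[++i]);
            } else {
                throw new IllegalArgumentException("Unsupported option: " + options[i]);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        OptionsSketch parsed = parse(new String[] {"-N", "-S", "0.5"});
        System.out.println(parsed.normalizeWordWeights + " " + parsed.smoothingParameter);
    }
}
```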
      • getNormalizeWordWeights

        public boolean getNormalizeWordWeights()
        Returns true if the word weights for each class are to be normalized.
        Returns:
        true if the word weights are normalized
      • setNormalizeWordWeights

        public void setNormalizeWordWeights(boolean doNormalize)
        Sets whether the word weights for each class should be normalized.
        Parameters:
        doNormalize - whether the word weights are to be normalized
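        One way to picture this normalization is the weight-magnitude scheme from the cited paper, in which each class's weights are divided by the sum of their absolute values. The sketch below assumes that scheme; whether this class applies exactly the same formula is not stated on this page. The class and method names are hypothetical.

```java
/** Hypothetical illustration of per-class weight normalization; not Weka code. */
class WeightNormalizationSketch {

    /** Divides every weight of class c by the sum of absolute weights of class c. */
    static double[][] normalize(double[][] weights) {
        double[][] out = new double[weights.length][];
        for (int c = 0; c < weights.length; c++) {
            double sumAbs = 0.0;
            for (double w : weights[c]) {
                sumAbs += Math.abs(w);
            }
            out[c] = new double[weights[c].length];
            for (int i = 0; i < weights[c].length; i++) {
                // Leave an all-zero row untouched rather than dividing by zero.
                out[c][i] = sumAbs == 0.0 ? 0.0 : weights[c][i] / sumAbs;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] weights = {{-1.0, -3.0}, {-2.0, -2.0}};
        System.out.println(java.util.Arrays.deepToString(normalize(weights)));
    }
}
```

        After normalization the absolute weights of every class sum to 1, so no class dominates the score merely because its weights are larger in magnitude.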
      • normalizeWordWeightsTipText

        public java.lang.String normalizeWordWeightsTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • getSmoothingParameter

        public double getSmoothingParameter()
        Gets the smoothing value to be used to avoid zero WordGivenClass probabilities.
        Returns:
        the smoothing value
      • setSmoothingParameter

        public void setSmoothingParameter(double val)
        Sets the smoothing value used to avoid zero WordGivenClass probabilities.
        Parameters:
        val - the new smoothing value
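        To show why a smoothing value is needed, the sketch below computes complement-class word weights in the form given in the cited paper: the weight of word i for class c is the log of the smoothed share of word i among all words in the other classes. It assumes the smoothing value is added once per word in the numerator and once per vocabulary entry in the denominator; the exact arithmetic used by this class may differ. All names are hypothetical.

```java
/** Hypothetical illustration of smoothed complement weights; not Weka code. */
class ComplementWeightSketch {

    /**
     * counts[c][i] is the total frequency of word i in training documents of class c.
     * Returns w[c][i] = log((countOutsideC[i] + alpha) / (totalOutsideC + alpha * numWords)).
     */
    static double[][] complementWeights(double[][] counts, double alpha) {
        int numClasses = counts.length;
        int numWords = counts[0].length;
        double[] wordTotals = new double[numWords];
        double[] classTotals = new double[numClasses];
        double grandTotal = 0.0;
        for (int c = 0; c < numClasses; c++) {
            for (int i = 0; i < numWords; i++) {
                wordTotals[i] += counts[c][i];
                classTotals[c] += counts[c][i];
                grandTotal += counts[c][i];
            }
        }
        double[][] weights = new double[numClasses][numWords];
        for (int c = 0; c < numClasses; c++) {
            // Everything counted outside class c, plus smoothing mass for each word.
            double denominator = grandTotal - classTotals[c] + alpha * numWords;
            for (int i = 0; i < numWords; i++) {
                double numerator = wordTotals[i] - counts[c][i] + alpha;
                weights[c][i] = Math.log(numerator / denominator);
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        double[][] counts = {{2.0, 0.0}, {0.0, 2.0}};
        System.out.println(java.util.Arrays.deepToString(complementWeights(counts, 1.0)));
        // With alpha = 0 a word unseen outside a class gets log(0), which is -Infinity.
        System.out.println(java.util.Arrays.deepToString(complementWeights(counts, 0.0)));
    }
}
```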
      • smoothingParameterTipText

        public java.lang.String smoothingParameterTipText()
        Returns the tip text for this property
        Returns:
        tip text for this property suitable for displaying in the explorer/experimenter gui
      • globalInfo

        public java.lang.String globalInfo()
        Returns a string describing this classifier
        Returns:
        a description of the classifier suitable for displaying in the explorer/experimenter gui
      • getTechnicalInformation

        public TechnicalInformation getTechnicalInformation()
        Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
        Specified by:
        getTechnicalInformation in interface TechnicalInformationHandler
        Returns:
        the technical information about this class
      • buildClassifier

        public void buildClassifier(Instances instances)
                             throws java.lang.Exception
        Generates the classifier.
        Specified by:
        buildClassifier in class Classifier
        Parameters:
        instances - set of instances serving as training data
        Throws:
        java.lang.Exception - if the classifier has not been built successfully
      • classifyInstance

        public double classifyInstance(Instance instance)
                                throws java.lang.Exception
        Classifies a given instance.

        The classification rule is:
        argmin over classes c of the sum over all words i of (ti * Wci)
        where
        ti is the frequency of word i in the given instance, and
        Wci is the weight of word i in class c.

        For more information, see section 4.4 of the paper cited in the class description.

        Overrides:
        classifyInstance in class Classifier
        Parameters:
        instance - the instance to classify
        Returns:
        the index of the class the instance is most likely to belong to.
        Throws:
        java.lang.Exception - if the classifier has not been built yet.
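        The classification rule can be sketched in plain Java as below: score every class by the sum of ti * Wci and return the class with the smallest score. The class name, the array layout, and the example weights are assumptions made for illustration; this is not the Weka implementation.

```java
/** Hypothetical illustration of the minimum-score rule; not Weka code. */
class ComplementRuleSketch {

    /**
     * wordFrequencies[i] is ti, the frequency of word i in the instance.
     * weights[c][i] is Wci, the weight of word i in class c.
     * Returns the index of the class with the smallest sum of ti * Wci.
     */
    static int classify(double[] wordFrequencies, double[][] weights) {
        int bestClass = 0;
        double bestScore = Double.POSITIVE_INFINITY;
        for (int c = 0; c < weights.length; c++) {
            double score = 0.0;
            for (int i = 0; i < wordFrequencies.length; i++) {
                score += wordFrequencies[i] * weights[c][i];
            }
            if (score < bestScore) {
                bestScore = score;
                bestClass = c;
            }
        }
        return bestClass;
    }

    public static void main(String[] args) {
        // Example weights: word 0 is rare outside class 0, word 1 is rare outside class 1.
        double[][] weights = {{-1.386, -0.288}, {-0.288, -1.386}};
        double[] instance = {3.0, 0.0}; // the instance contains word 0 three times
        System.out.println(classify(instance, weights));
    }
}
```

        The minimum is taken because the weights describe how well a word fits the complement of each class, so the class whose complement fits the instance worst is the predicted one.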
      • toString

        public java.lang.String toString()
        Prints out the internal model built by the classifier. In this case it prints out the word weights calculated when building the classifier.
        Overrides:
        toString in class java.lang.Object
      • main

        public static void main(java.lang.String[] argv)
        Main method for testing this class.
        Parameters:
        argv - the options