Class DemoHTMLParser

  • All Implemented Interfaces:
    HTMLParser

    public class DemoHTMLParser
    extends Object
    implements HTMLParser
    An HTML parser based on Lucene's demo HTML parser.
    • Constructor Detail

      • DemoHTMLParser

        public DemoHTMLParser()
    • Method Detail

      • parse

        public DocData parse(DocData docData,
                             String name,
                             Date date,
                             String title,
                             Reader reader,
                             DateFormat dateFormat)
                      throws IOException,
                             InterruptedException
        Description copied from interface: HTMLParser
        Parse the input Reader and return DocData. The provided name, title, and date are used for the result unless they are null, in which case an attempt is made to set them from the parsed data.
        Specified by:
        parse in interface HTMLParser
        Parameters:
        docData - result object to reuse.
        name - name of the result doc data.
        date - date of the result doc data. If null, an attempt is made to set it from the parsed data.
        title - title of the result doc data. If null, an attempt is made to set it from the parsed data.
        reader - reader of the HTML text to parse.
        dateFormat - date format to use when extracting the date.
        Returns:
        Parsed doc data.
        Throws:
        IOException
        InterruptedException
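The null-fallback contract of parse (caller-supplied values win, parsed values fill in nulls, and the result object is reused) can be illustrated with a small, stdlib-only sketch. This is not Lucene's DemoHTMLParser implementation. ParseContractSketch and DocSketch (a stand-in for DocData) are hypothetical names. Taking a fallback date from a <meta name="date"> tag and using regular expressions for title and text extraction are illustrative assumptions only.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the parse() contract; NOT Lucene's DemoHTMLParser. */
public class ParseContractSketch {

    /** Hypothetical stand-in for DocData: a mutable, reusable result holder. */
    public static final class DocSketch {
        public String name;
        public String title;
        public Date date;
        public String body;

        void clear() { name = null; title = null; date = null; body = null; }
    }

    private static final Pattern TITLE =
        Pattern.compile("<title>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // Assumption for illustration: a fallback date lives in <meta name="date" content="...">.
    private static final Pattern META_DATE =
        Pattern.compile("<meta\\s+name=\"date\"\\s+content=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    public static DocSketch parse(DocSketch docData, String name, Date date, String title,
                                  Reader reader, DateFormat dateFormat) throws IOException {
        // Read the whole HTML input into memory.
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        String html = sb.toString();

        // Reuse the caller's result object instead of allocating a new one.
        docData.clear();
        docData.name = name;

        // A caller-supplied title wins; otherwise try the <title> element.
        if (title == null) {
            Matcher m = TITLE.matcher(html);
            if (m.find()) {
                title = m.group(1).trim();
            }
        }
        docData.title = title;

        // A caller-supplied date wins; otherwise parse the meta date with dateFormat.
        if (date == null) {
            Matcher m = META_DATE.matcher(html);
            if (m.find()) {
                try {
                    date = dateFormat.parse(m.group(1));
                } catch (ParseException e) {
                    // Leave the date null if the text does not match dateFormat.
                }
            }
        }
        docData.date = date;

        // Crude text extraction: replace tags with spaces, then collapse whitespace.
        docData.body = TAG.matcher(html).replaceAll(" ").replaceAll("\\s+", " ").trim();
        return docData;
    }

    public static void main(String[] args) throws IOException {
        String html = "<html><head><title>Hello</title>"
            + "<meta name=\"date\" content=\"2024-01-15\"></head>"
            + "<body><p>Some text</p></body></html>";
        DateFormat fmt = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        DocSketch reused = new DocSketch();

        // No title or date supplied: both are taken from the HTML.
        parse(reused, "doc1", null, null, new StringReader(html), fmt);
        System.out.println(reused.title + " | " + fmt.format(reused.date)); // prints: Hello | 2024-01-15

        // A caller-supplied title overrides the <title> element.
        parse(reused, "doc2", null, "Given", new StringReader(html), fmt);
        System.out.println(reused.title); // prints: Given
    }
}
```

A real HTML parser also has to cope with entities, comments, scripts, and malformed markup; the regular expressions here only go far enough to show the precedence rule and the reuse of the result object across calls.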