Class TrecDocParser

    • Constructor Detail

      • TrecDocParser

        public TrecDocParser()
    • Method Detail

      • stripTags

        public static String stripTags(StringBuilder buf,
                                       int start)
        Strips tags from buf: each tag is replaced by a single blank.
        Returns:
        the text obtained by stripping all tags from buf (the input StringBuilder is not modified).
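        The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the library's source: the class StripTagsSketch is hypothetical, a "tag" is assumed to be anything between '<' and the next '>', and start is assumed to be the offset in buf from which text is taken (the description does not say).

```java
import java.util.regex.Pattern;

public class StripTagsSketch {
    // Assumption: a tag is any run of characters from '<' up to the next '>'.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    // Hypothetical stand-in for stripTags, written from the description only.
    // Assumption: start is the offset in buf where processing begins.
    public static String stripTags(StringBuilder buf, int start) {
        // substring copies the characters, so buf itself is never modified.
        String text = buf.substring(start);
        // Each tag is replaced by exactly one blank.
        return TAG.matcher(text).replaceAll(" ");
    }

    public static void main(String[] args) {
        StringBuilder buf = new StringBuilder("<DOC><TITLE>Hello</TITLE> world</DOC>");
        System.out.println("[" + stripTags(buf, 0) + "]");
        System.out.println(buf); // the original markup is still there
    }
}
```

        Because every tag becomes a blank, adjacent tags yield runs of blanks; callers that need compact text would have to normalize whitespace themselves.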
      • extract

        public static String extract(StringBuilder buf,
                                     String startTag,
                                     String endTag,
                                     int maxPos,
                                     String[] noisePrefixes)
        Extracts from buf the text of interest within the specified tags.
        Parameters:
        buf - entire input text
        startTag - tag marking the start of the text of interest
        endTag - tag marking the end of the text of interest
        maxPos - if ≥ 0, sets a limit on the start of the text of interest
        Returns:
        the text of interest, or null if not found
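        A possible reading of this contract is sketched below. It is an illustration under stated assumptions, not the library's implementation: the class ExtractSketch is hypothetical, maxPos is assumed to bound the position of the start tag only, noisePrefixes (undocumented above) is assumed to list strings to skip when they appear between the tags, and the result is assumed to be trimmed.

```java
public class ExtractSketch {
    // Hypothetical stand-in for extract, written from the description only.
    public static String extract(StringBuilder buf, String startTag, String endTag,
                                 int maxPos, String[] noisePrefixes) {
        int from = buf.indexOf(startTag);
        // Assumption: maxPos >= 0 limits where the start tag may be found.
        if (from < 0 || (maxPos >= 0 && from >= maxPos)) {
            return null;
        }
        from += startTag.length();
        int to = buf.indexOf(endTag, from);
        if (to < 0) {
            return null; // no closing tag, so nothing to extract
        }
        // Assumption: a noise prefix found between the tags is skipped over.
        if (noisePrefixes != null) {
            for (String noise : noisePrefixes) {
                int n = buf.indexOf(noise, from);
                if (n >= 0 && n + noise.length() <= to) {
                    from = n + noise.length();
                }
            }
        }
        // Assumption: surrounding whitespace is not part of the text of interest.
        return buf.substring(from, to).trim();
    }

    public static void main(String[] args) {
        StringBuilder buf = new StringBuilder(
            "<DOC><DATE>Date: 1 Jan 1990</DATE><TEXT>body</TEXT></DOC>");
        System.out.println(extract(buf, "<DATE>", "</DATE>", -1, new String[] {"Date:"}));
        System.out.println(extract(buf, "<TEXT>", "</TEXT>", 10, null));
    }
}
```

        In this sketch a negative maxPos disables the limit, and a null noisePrefixes array means no prefixes are skipped.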