Class FuzzyQuery

  • All Implemented Interfaces:
    Serializable, Cloneable

    public class FuzzyQuery
    extends MultiTermQuery
    Implements the fuzzy search query. The similarity measurement is based on the Levenshtein (edit distance) algorithm.
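To make the similarity measure concrete, here is a minimal sketch of a textbook Levenshtein distance. The class and method names are hypothetical and this is not the code FuzzyQuery runs internally; it only illustrates what "edit distance" means in the description above.

```java
// Illustrative only: a textbook Levenshtein distance, not Lucene's internal code.
class EditDistanceSketch {

    // Returns the minimum number of single-character insertions,
    // deletions, and substitutions needed to turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning the empty string into b[0..j) takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning a[0..i) into the empty string takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full table
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // 3
        System.out.println(levenshtein("lucene", "lucent"));  // 1
    }
}
```

With a prefix length of 0, a computation of this kind is needed for every term in the field, which is why the warning below matters.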

    Warning: this query does not scale well with its default prefix length of 0. In that case, *every* term is enumerated and an edit distance is computed for each one.

    This query uses MultiTermQuery.TopTermsScoringBooleanQueryRewrite by default, so terms are collected and scored according to their edit distance. Only the top-scoring terms are used to build the BooleanQuery. Changing the rewrite mode for fuzzy queries is not recommended.
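The idea of keeping only the top-scoring terms can be sketched with a bounded min-heap. This is an illustration under assumptions, not the rewrite class itself: the names TopTermsSketch, topTerms, and maxTerms are hypothetical, and the scores in the example are made up for demonstration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Illustrative only: selects the best-scoring terms with a bounded min-heap.
class TopTermsSketch {

    // Returns up to maxTerms term texts, highest score first.
    static List<String> topTerms(Map<String, Float> scores, int maxTerms) {
        PriorityQueue<Map.Entry<String, Float>> heap =
            new PriorityQueue<Map.Entry<String, Float>>(
                (x, y) -> Float.compare(x.getValue(), y.getValue()));
        for (Map.Entry<String, Float> entry : scores.entrySet()) {
            heap.offer(entry);
            if (heap.size() > maxTerms) {
                heap.poll(); // evict the current lowest-scoring term
            }
        }
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey()); // drains lowest score first
        }
        Collections.reverse(result); // highest score first
        return result;
    }

    public static void main(String[] args) {
        Map<String, Float> scores = new LinkedHashMap<>();
        scores.put("luck", 0.2f);     // made-up scores for demonstration
        scores.put("lucene", 1.0f);
        scores.put("lucerne", 0.7f);
        scores.put("lucent", 0.8f);
        System.out.println(topTerms(scores, 2)); // [lucene, lucent]
    }
}
```

The heap never holds more than maxTerms + 1 entries, so memory stays bounded no matter how many terms are enumerated.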

    See Also:
    Serialized Form
    • Constructor Detail

      • FuzzyQuery

        public FuzzyQuery(Term term,
                          float minimumSimilarity,
                          int prefixLength,
                          int maxExpansions)
        Create a new FuzzyQuery that will match terms with a similarity of at least minimumSimilarity to term. If a prefixLength > 0 is specified, a common prefix of that length is also required.
        Parameters:
        term - the term to search for
        minimumSimilarity - a value between 0 and 1 that sets the required similarity between the query term and the matching terms. For example, with a minimumSimilarity of 0.5, a term of the same length as the query term is considered similar to the query term if the edit distance between the two terms is less than length(term)*0.5.
        prefixLength - length of common (non-fuzzy) prefix
        maxExpansions - the maximum number of terms to match. If this number is greater than BooleanQuery.getMaxClauseCount() when the query is rewritten, then the maxClauseCount will be used instead.
        Throws:
        IllegalArgumentException - if minimumSimilarity is >= 1 or < 0 or if prefixLength < 0
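The argument checks and the minimumSimilarity example above can be sketched as follows. The class and method names are hypothetical. The threshold method generalizes the documented 0.5 example to distance < length * (1 - minimumSimilarity), which is an assumption: it reduces to the documented rule at 0.5 and covers only terms of the same length as the query term.

```java
// Illustrative only: mirrors the documented constraints, not Lucene's source.
class SimilarityThresholdSketch {

    // Rejects the argument values the constructor documentation lists as invalid.
    static void checkArguments(float minimumSimilarity, int prefixLength) {
        if (minimumSimilarity >= 1.0f) {
            throw new IllegalArgumentException("minimumSimilarity >= 1");
        }
        if (minimumSimilarity < 0.0f) {
            throw new IllegalArgumentException("minimumSimilarity < 0");
        }
        if (prefixLength < 0) {
            throw new IllegalArgumentException("prefixLength < 0");
        }
    }

    // Assumed generalization of the documented example for same-length terms:
    // at minimumSimilarity = 0.5 this is editDistance < termLength * 0.5.
    static boolean withinThreshold(int editDistance, int termLength, float minimumSimilarity) {
        return editDistance < termLength * (1.0f - minimumSimilarity);
    }

    public static void main(String[] args) {
        checkArguments(0.5f, 0); // valid, no exception
        System.out.println(withinThreshold(2, 6, 0.5f)); // true: 2 < 3
        System.out.println(withinThreshold(3, 6, 0.5f)); // false: 3 is not < 3
    }
}
```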
    • Method Detail

      • getMinSimilarity

        public float getMinSimilarity()
        Returns the minimum similarity that is required for this query to match.
        Returns:
        float value between 0.0 and 1.0
      • getPrefixLength

        public int getPrefixLength()
        Returns the non-fuzzy prefix length. This is the number of characters at the start of a term that must be identical (not fuzzy) to the query term if the query is to match that term.
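The non-fuzzy prefix requirement can be pictured as a plain string prefix test applied before any edit distance is computed. This sketch uses hypothetical names and assumes that a prefix length longer than the query text is capped at the text length; it counts Java char units and is not taken from Lucene's source.

```java
// Illustrative only: the exact-prefix filter implied by prefixLength.
class PrefixCheckSketch {

    // True if candidate starts with the first prefixLength characters of queryText.
    static boolean sharesPrefix(String queryText, String candidate, int prefixLength) {
        // Assumption: cap the prefix at the query text length.
        int n = Math.min(prefixLength, queryText.length());
        return candidate.startsWith(queryText.substring(0, n));
    }

    public static void main(String[] args) {
        System.out.println(sharesPrefix("lucene", "lucent", 3)); // true: both start with "luc"
        System.out.println(sharesPrefix("lucene", "lacene", 2)); // false: "la" differs from "lu"
        System.out.println(sharesPrefix("lucene", "apache", 0)); // true: empty prefix always matches
    }
}
```

A longer prefix narrows the set of candidate terms that need an edit distance computation, which is the main lever for making fuzzy queries cheaper.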
      • getTerm

        public Term getTerm()
        Returns the pattern term.
      • toString

        public String toString(String field)
        Description copied from class: Query
        Prints a query to a string, with field assumed to be the default field and omitted.

        The representation used is one that is supposed to be readable by QueryParser. However, there are the following limitations:

        • If the query was created by the parser, the printed representation may not be exactly what was parsed. For example, characters that need to be escaped will be represented without the required backslash.
        • Some of the more complicated queries (e.g. span queries) don't have a representation that can be parsed by QueryParser.
        Specified by:
        toString in class Query