"Fossies" - the Fresh Open Source Software Archive

Member "elasticsearch-6.8.23/docs/reference/query-dsl.asciidoc" (29 Dec 2021, 1514 Bytes) of package /linux/www/elasticsearch-6.8.23-src.tar.gz:


As a special service "Fossies" has tried to format the requested source page into HTML format (assuming AsciiDoc format). Alternatively you can here view or download the uninterpreted source code file. A member file download can also be achieved by clicking within a package contents listing on the according byte size field.

Query and filter context

The behaviour of a query clause depends on whether it is used in query context or in filter context:

Query context

A query clause used in query context answers the question "How well does this document match this query clause?" Besides deciding whether or not the document matches, the query clause also calculates a _score representing how well the document matches, relative to other documents.

Query context is in effect whenever a query clause is passed to a query parameter, such as the query parameter in the search API.

Filter context

In filter context, a query clause answers the question "Does this document match this query clause?" The answer is a simple Yes or No — no scores are calculated. Filter context is mostly used for filtering structured data, e.g.

  • Does this timestamp fall into the range 2015 to 2016?

  • Is the status field set to "published"?

Frequently used filters will be cached automatically by Elasticsearch, to speed up performance.

Filter context is in effect whenever a query clause is passed to a filter parameter, such as the filter or must_not parameters in the bool query, the filter parameter in the constant_score query, or the filter aggregation.

Below is an example of query clauses being used in query and filter context in the search API. This query will match documents where all of the following conditions are met:

  • The title field contains the word search.

  • The content field contains the word elasticsearch.

  • The status field contains the exact word published.

  • The publish_date field contains a date from 1 Jan 2015 onwards.

GET /_search
{
  "query": { (1)
    "bool": { (2)
      "must": [
        { "match": { "title":   "Search"        }},
        { "match": { "content": "Elasticsearch" }}
      ],
      "filter": [ (3)
        { "term":  { "status": "published" }},
        { "range": { "publish_date": { "gte": "2015-01-01" }}}
      ]
    }
  }
}
  1. The query parameter indicates query context.

  2. The bool and two match clauses are used in query context, which means that they are used to score how well each document matches.

  3. The filter parameter indicates filter context. Its term and range clauses are used in filter context. They will filter out documents which do not match, but they will not affect the score for matching documents.

Warning
Scores calculated for queries in query context are represented as single precision floating point numbers; they have only 24 bits for significand’s precision. Score calculations that exceed the significand’s precision will be converted to floats with loss of precision.
Tip
Use query clauses in query context for conditions which should affect the score of matching documents (i.e. how well does the document match), and use all other query clauses in filter context.

Match All Query

The simplest query; it matches all documents, giving them all a _score of 1.0.

GET /_search
{
    "query": {
        "match_all": {}
    }
}

The _score can be changed with the boost parameter:

GET /_search
{
    "query": {
        "match_all": { "boost" : 1.2 }
    }
}

Match None Query

This is the inverse of the match_all query: it matches no documents.

GET /_search
{
    "query": {
        "match_none": {}
    }
}

Full text queries

The high-level full text queries are usually used for running full text queries on full text fields like the body of an email. They understand how the field being queried is analyzed and will apply each field’s analyzer (or search_analyzer) to the query string before executing.

The queries in this group are:

match query

The standard query for performing full text queries, including fuzzy matching and phrase or proximity queries.

match_phrase query

Like the match query but used for matching exact phrases or word proximity matches.

match_phrase_prefix query

The poor man’s search-as-you-type. Like the match_phrase query, but does a wildcard search on the final word.

multi_match query

The multi-field version of the match query.

common terms query

A more specialized query which gives more preference to uncommon words.

query_string query

Supports the compact Lucene query string syntax, allowing you to specify AND|OR|NOT conditions and multi-field search within a single query string. For expert users only.

simple_query_string query

A simpler, more robust version of the query_string syntax suitable for exposing directly to users.

Match Query

The match query accepts text/numerics/dates, analyzes them, and constructs a query. For example:

GET /_search
{
    "query": {
        "match" : {
            "message" : "this is a test"
        }
    }
}

Note that message is the name of a field; you can substitute the name of any field instead.

match

The match query is of type boolean. This means that the text provided is analyzed, and the analysis process constructs a boolean query from the provided text. The operator flag can be set to or or and to control the boolean clauses (defaults to or). The minimum number of optional should clauses to match can be set using the minimum_should_match parameter.

Here is an example with additional parameters (note the slight change in structure; message is the field name):

GET /_search
{
    "query": {
        "match" : {
            "message" : {
                "query" : "this is a test",
                "operator" : "and"
            }
        }
    }
}

The analyzer can be set to control which analyzer will perform the analysis process on the text. It defaults to the analyzer defined in the field's explicit mapping, or to the default search analyzer.

The lenient parameter can be set to true to ignore exceptions caused by data-type mismatches, such as trying to query a numeric field with a text query string. Defaults to false.
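
As a sketch, the minimum_should_match and lenient parameters described above might be combined like this (the message field and the values are illustrative, not recommendations):

GET /_search
{
    "query": {
        "match" : {
            "message" : {
                "query" : "this is a test",
                "minimum_should_match" : "75%",
                "lenient" : true
            }
        }
    }
}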

Fuzziness

fuzziness allows fuzzy matching based on the type of field being queried. See the fuzziness section for allowed settings.

The prefix_length and max_expansions parameters can be set in this case to control the fuzzy process. If the fuzzy option is set, the query will use top_terms_blended_freqs_${max_expansions} as its rewrite method; the fuzzy_rewrite parameter allows you to control how the query will get rewritten.

Fuzzy transpositions (abba) are allowed by default but can be disabled by setting fuzzy_transpositions to false.

Note that fuzzy matching is not applied to terms with synonyms, as under the hood these terms are expanded to a special synonym query that blends term frequencies, which does not support fuzzy expansion.

GET /_search
{
    "query": {
        "match" : {
            "message" : {
                "query" : "this is a testt",
                "fuzziness": "AUTO"
            }
        }
    }
}
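
As a sketch, the tuning parameters described above could be added to the same query (the values are illustrative, not recommendations):

GET /_search
{
    "query": {
        "match" : {
            "message" : {
                "query" : "this is a testt",
                "fuzziness": "AUTO",
                "prefix_length": 2,
                "max_expansions": 10,
                "fuzzy_transpositions": false
            }
        }
    }
}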
Zero terms query

If the analyzer removes all tokens from a query, as a stop filter does, the default behavior is to match no documents at all. To change that, the zero_terms_query option can be used; it accepts none (default) and all, the latter corresponding to a match_all query.

GET /_search
{
    "query": {
        "match" : {
            "message" : {
                "query" : "to be or not to be",
                "operator" : "and",
                "zero_terms_query": "all"
            }
        }
    }
}
Cutoff frequency

The match query supports a cutoff_frequency that allows specifying an absolute or relative document frequency above which terms are moved into an optional subquery. These high frequency terms are then only scored if at least one of the low frequency (below the cutoff) terms matches in the case of an or operator, or if all of the low frequency terms match in the case of an and operator.

This query allows handling stopwords dynamically at runtime, is domain independent, and doesn't require a stopword file. It avoids scoring and iterating over high frequency terms, taking them into account only if a more significant (lower frequency) term matches a document. However, if all of the query terms are above the given cutoff_frequency, the query is automatically transformed into a pure conjunction (and) query to ensure fast execution.

The cutoff_frequency can either be relative to the total number of documents, if in the range [0..1), or absolute, if greater than or equal to 1.0.

Here is an example showing a query composed of stopwords exclusively:

GET /_search
{
    "query": {
        "match" : {
            "message" : {
                "query" : "to be or not to be",
                "cutoff_frequency" : 0.001
            }
        }
    }
}
Important
The cutoff_frequency option operates on a per-shard level. This means that when trying it out on test indexes with low document numbers you should follow the advice in the "Relevance is broken" chapter of the Elasticsearch Definitive Guide.
Synonyms

The match query supports multi-term synonym expansion with the synonym_graph token filter. When this filter is used, the parser creates a phrase query for each multi-term synonym. For example, the following synonym: "ny, new york" would produce:

(ny OR ("new york"))

It is also possible to match multi-term synonyms with conjunctions instead:

GET /_search
{
   "query": {
       "match" : {
           "message": {
               "query" : "ny city",
               "auto_generate_synonyms_phrase_query" : false
           }
       }
   }
}

The example above creates a boolean query:

(ny OR (new AND york)) city

that matches documents with the term ny or the conjunction new AND york. By default the parameter auto_generate_synonyms_phrase_query is set to true.

Comparison to query_string / field

The match family of queries does not go through a "query parsing" process. It does not support field name prefixes, wildcard characters, or other "advanced" features. For this reason, the chances of it failing are very small to non-existent, and it provides excellent behavior when you simply want to analyze some text and run it as a query (which is usually what a text search box does). Also, the phrase_prefix type can provide great "as you type" behavior by automatically loading search results.

Match Phrase Query

The match_phrase query analyzes the text and creates a phrase query out of the analyzed text. For example:

GET /_search
{
    "query": {
        "match_phrase" : {
            "message" : "this is a test"
        }
    }
}

A phrase query matches terms up to a configurable slop (which defaults to 0) in any order. Transposed terms have a slop of 2.
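
For example, a minimal sketch with a slop of 2, which would match "this is a test" since "test" is two positions away from "this":

GET /_search
{
    "query": {
        "match_phrase" : {
            "message" : {
                "query" : "this test",
                "slop" : 2
            }
        }
    }
}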

The analyzer can be set to control which analyzer will perform the analysis process on the text. It defaults to the analyzer defined in the field's explicit mapping, or to the default search analyzer, for example:

GET /_search
{
    "query": {
        "match_phrase" : {
            "message" : {
                "query" : "this is a test",
                "analyzer" : "my_analyzer"
            }
        }
    }
}

This query also accepts zero_terms_query, as explained in match query.

Match Phrase Prefix Query

The match_phrase_prefix is the same as match_phrase, except that it allows for prefix matches on the last term in the text. For example:

GET /_search
{
    "query": {
        "match_phrase_prefix" : {
            "message" : "quick brown f"
        }
    }
}

It accepts the same parameters as the phrase type. In addition, it also accepts a max_expansions parameter (default 50) that controls how many suffixes the last term will be expanded to. It is highly recommended to set it to an acceptable value to control the execution time of the query. For example:

GET /_search
{
    "query": {
        "match_phrase_prefix" : {
            "message" : {
                "query" : "quick brown f",
                "max_expansions" : 10
            }
        }
    }
}
Important

The match_phrase_prefix query is a poor man's autocomplete. It is very easy to use, which lets you get started quickly with search-as-you-type, but its results, while usually good enough, can sometimes be confusing.

Consider the query string quick brown f. This query works by creating a phrase query out of quick and brown (i.e. the term quick must exist and must be followed by the term brown). Then it looks at the sorted term dictionary to find the first 50 terms that begin with f, and adds these terms to the phrase query.

The problem is that the first 50 terms may not include the term fox so the phrase quick brown fox will not be found. This usually isn’t a problem as the user will continue to type more letters until the word they are looking for appears.

For better solutions for search-as-you-type see the completion suggester and the "Index-Time Search-as-You-Type" section of the Elasticsearch Definitive Guide.

Multi Match Query

The multi_match query builds on the match query to allow multi-field queries:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":    "this is a test", (1)
      "fields": [ "subject", "message" ] (2)
    }
  }
}
  1. The query string.

  2. The fields to be queried.

fields and per-field boosting

Fields can be specified with wildcards, eg:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":    "Will Smith",
      "fields": [ "title", "*_name" ] (1)
    }
  }
}
  1. Query the title, first_name and last_name fields.

Individual fields can be boosted with the caret (^) notation:

GET /_search
{
  "query": {
    "multi_match" : {
      "query" : "this is a test",
      "fields" : [ "subject^3", "message" ] (1)
    }
  }
}
  1. The subject field is three times as important as the message field.

If no fields are provided, the multi_match query defaults to the index.query.default_field index setting, which in turn defaults to *. * extracts all fields in the mapping that are eligible to term queries and filters the metadata fields. All extracted fields are then combined to build a query.

Warning
If you have a huge number of fields, the above auto expansion might lead to querying a large number of fields, which could cause performance issues. In future versions (starting in 7.0), there will be a limit on the number of fields that can be queried at once. This limit will be determined by the indices.query.bool.max_clause_count setting, which defaults to 1024.

Types of multi_match query:

The way the multi_match query is executed internally depends on the type parameter, which can be set to:

best_fields

(default) Finds documents which match any field, but uses the _score from the best field. See best_fields.

most_fields

Finds documents which match any field and combines the _score from each field. See most_fields.

cross_fields

Treats fields with the same analyzer as though they were one big field. Looks for each word in any field. See cross_fields.

phrase

Runs a match_phrase query on each field and uses the _score from the best field. See phrase and phrase_prefix.

phrase_prefix

Runs a match_phrase_prefix query on each field and combines the _score from each field. See phrase and phrase_prefix.

best_fields

The best_fields type is most useful when you are searching for multiple words best found in the same field. For instance "brown fox" in a single field is more meaningful than "brown" in one field and "fox" in the other.

The best_fields type generates a match query for each field and wraps them in a dis_max query, to find the single best matching field. For instance, this query:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "brown fox",
      "type":       "best_fields",
      "fields":     [ "subject", "message" ],
      "tie_breaker": 0.3
    }
  }
}

would be executed as:

GET /_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match": { "subject": "brown fox" }},
        { "match": { "message": "brown fox" }}
      ],
      "tie_breaker": 0.3
    }
  }
}

Normally the best_fields type uses the score of the single best matching field, but if tie_breaker is specified, then it calculates the score as follows:

  • the score from the best matching field

  • plus tie_breaker * _score for all other matching fields
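
For example, with hypothetical scores: if the subject field matches with a score of 1.5 and the message field with a score of 0.9, a tie_breaker of 0.3 yields a final score of 1.5 + 0.3 * 0.9 = 1.77.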

Also, accepts analyzer, boost, operator, minimum_should_match, fuzziness, lenient, prefix_length, max_expansions, rewrite, zero_terms_query, cutoff_frequency, auto_generate_synonyms_phrase_query and fuzzy_transpositions, as explained in match query.

Important
operator and minimum_should_match

The best_fields and most_fields types are field-centric — they generate a match query per field. This means that the operator and minimum_should_match parameters are applied to each field individually, which is probably not what you want.

Take this query for example:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "Will Smith",
      "type":       "best_fields",
      "fields":     [ "first_name", "last_name" ],
      "operator":   "and" (1)
    }
  }
}
  1. All terms must be present.

This query is executed as:

  (+first_name:will +first_name:smith)
| (+last_name:will  +last_name:smith)

In other words, all terms must be present in a single field for a document to match.

See cross_fields for a better solution.

most_fields

The most_fields type is most useful when querying multiple fields that contain the same text analyzed in different ways. For instance, the main field may contain synonyms, stemming and terms without diacritics. A second field may contain the original terms, and a third field might contain shingles. By combining scores from all three fields we can match as many documents as possible with the main field, but use the second and third fields to push the most similar results to the top of the list.

This query:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "quick brown fox",
      "type":       "most_fields",
      "fields":     [ "title", "title.original", "title.shingles" ]
    }
  }
}

would be executed as:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title":          "quick brown fox" }},
        { "match": { "title.original": "quick brown fox" }},
        { "match": { "title.shingles": "quick brown fox" }}
      ]
    }
  }
}

The score from each match clause is added together, then divided by the number of match clauses.

Also, accepts analyzer, boost, operator, minimum_should_match, fuzziness, lenient, prefix_length, max_expansions, rewrite, zero_terms_query and cutoff_frequency, as explained in match query, but see operator and minimum_should_match.

phrase and phrase_prefix

The phrase and phrase_prefix types behave just like best_fields, but they use a match_phrase or match_phrase_prefix query instead of a match query.

This query:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "quick brown f",
      "type":       "phrase_prefix",
      "fields":     [ "subject", "message" ]
    }
  }
}

would be executed as:

GET /_search
{
  "query": {
    "dis_max": {
      "queries": [
        { "match_phrase_prefix": { "subject": "quick brown f" }},
        { "match_phrase_prefix": { "message": "quick brown f" }}
      ]
    }
  }
}

Also, accepts analyzer, boost, lenient and zero_terms_query as explained in Match Query, as well as slop which is explained in Match Phrase Query. Type phrase_prefix additionally accepts max_expansions.
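
For instance, a minimal sketch of the phrase type combined with a slop (the value is illustrative):

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "quick brown fox",
      "type":       "phrase",
      "fields":     [ "subject", "message" ],
      "slop":       2
    }
  }
}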

Important
phrase, phrase_prefix and fuzziness

The fuzziness parameter cannot be used with the phrase or phrase_prefix type.

cross_fields

The cross_fields type is particularly useful with structured documents where multiple fields should match. For instance, when querying the first_name and last_name fields for "Will Smith", the best match is likely to have "Will" in one field and "Smith" in the other.

This sounds like a job for most_fields but there are two problems with that approach. The first problem is that operator and minimum_should_match are applied per-field, instead of per-term (see explanation above).

The second problem is to do with relevance: the different term frequencies in the first_name and last_name fields can produce unexpected results.

For instance, imagine we have two people: "Will Smith" and "Smith Jones". "Smith" as a last name is very common (and so is of low importance) but "Smith" as a first name is very uncommon (and so is of great importance).

If we do a search for "Will Smith", the "Smith Jones" document will probably appear above the better matching "Will Smith" because the score of first_name:smith has trumped the combined scores of first_name:will plus last_name:smith.

One way of dealing with these types of queries is simply to index the first_name and last_name fields into a single full_name field. Of course, this can only be done at index time.

The cross_field type tries to solve these problems at query time by taking a term-centric approach. It first analyzes the query string into individual terms, then looks for each term in any of the fields, as though they were one big field.

A query like:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "Will Smith",
      "type":       "cross_fields",
      "fields":     [ "first_name", "last_name" ],
      "operator":   "and"
    }
  }
}

is executed as:

+(first_name:will  last_name:will)
+(first_name:smith last_name:smith)

In other words, all terms must be present in at least one field for a document to match. (Compare this to the logic used for best_fields and most_fields.)

That solves one of the two problems. The problem of differing term frequencies is solved by blending the term frequencies for all fields in order to even out the differences.

In practice, first_name:smith will be treated as though it has the same frequencies as last_name:smith, plus one. This will make matches on first_name and last_name have comparable scores, with a tiny advantage for last_name since it is the most likely field that contains smith.

Note that cross_fields is usually only useful on short string fields that all have a boost of 1. Otherwise boosts, term freqs and length normalization contribute to the score in such a way that the blending of term statistics is not meaningful anymore.

If you run the above query through the validate API, it returns this explanation:

+blended("will",  fields: [first_name, last_name])
+blended("smith", fields: [first_name, last_name])

Also, accepts analyzer, boost, operator, minimum_should_match, lenient, zero_terms_query and cutoff_frequency, as explained in match query.

cross_field and analysis

The cross_field type can only work in term-centric mode on fields that have the same analyzer. Fields with the same analyzer are grouped together as in the example above. If there are multiple groups, they are combined with a bool query.

For instance, if we have a first and last field which have the same analyzer, plus a first.edge and last.edge which both use an edge_ngram analyzer, this query:

GET /_search
{
  "query": {
    "multi_match" : {
      "query":      "Jon",
      "type":       "cross_fields",
      "fields":     [
        "first", "first.edge",
        "last",  "last.edge"
      ]
    }
  }
}

would be executed as:

    blended("jon", fields: [first, last])
| (
    blended("j",   fields: [first.edge, last.edge])
    blended("jo",  fields: [first.edge, last.edge])
    blended("jon", fields: [first.edge, last.edge])
)

In other words, first and last would be grouped together and treated as a single field, and first.edge and last.edge would be grouped together and treated as a single field.

Having multiple groups is fine, but when combined with operator or minimum_should_match, it can suffer from the same problem as most_fields or best_fields.

You can easily rewrite this query yourself as two separate cross_fields queries combined with a bool query, and apply the minimum_should_match parameter to just one of them:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        {
          "multi_match" : {
            "query":      "Will Smith",
            "type":       "cross_fields",
            "fields":     [ "first", "last" ],
            "minimum_should_match": "50%" (1)
          }
        },
        {
          "multi_match" : {
            "query":      "Will Smith",
            "type":       "cross_fields",
            "fields":     [ "*.edge" ]
          }
        }
      ]
    }
  }
}
  1. Either will or smith must be present in either of the first or last fields

You can force all fields into the same group by specifying the analyzer parameter in the query.

GET /_search
{
  "query": {
   "multi_match" : {
      "query":      "Jon",
      "type":       "cross_fields",
      "analyzer":   "standard", (1)
      "fields":     [ "first", "last", "*.edge" ]
    }
  }
}
  1. Use the standard analyzer for all fields.

which will be executed as:

blended("will",  fields: [first, first.edge, last.edge, last])
blended("smith", fields: [first, first.edge, last.edge, last])
tie_breaker

By default, each per-term blended query will use the best score returned by any field in a group, then these scores are added together to give the final score. The tie_breaker parameter can change the default behaviour of the per-term blended queries. It accepts:

0.0

Take the single best score out of (eg) first_name:will and last_name:will (default)

1.0

Add together the scores for (eg) first_name:will and last_name:will

0.0 < n < 1.0

Take the single best score plus tie_breaker multiplied by each of the scores from other matching fields.
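
For example, a sketch of a cross_fields query with an intermediate tie_breaker (the value is illustrative):

GET /_search
{
  "query": {
    "multi_match" : {
      "query":       "Will Smith",
      "type":        "cross_fields",
      "fields":      [ "first_name", "last_name" ],
      "tie_breaker": 0.3
    }
  }
}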

Important
cross_fields and fuzziness

The fuzziness parameter cannot be used with the cross_fields type.

Common Terms Query

The common terms query is a modern alternative to stopwords which improves the precision and recall of search results (by taking stopwords into account), without sacrificing performance.

The problem

Every term in a query has a cost. A search for "The brown fox" requires three term queries, one for each of "the", "brown" and "fox", all of which are executed against all documents in the index. The query for "the" is likely to match many documents and thus has a much smaller impact on relevance than the other two terms.

Previously, the solution to this problem was to ignore terms with high frequency. By treating "the" as a stopword, we reduce the index size and reduce the number of term queries that need to be executed.

The problem with this approach is that, while stopwords have a small impact on relevance, they are still important. If we remove stopwords, we lose precision (eg we are unable to distinguish between "happy" and "not happy") and we lose recall (eg text like "The The" or "To be or not to be" would simply not exist in the index).

The solution

The common terms query divides the query terms into two groups: more important (ie low frequency terms) and less important (ie high frequency terms which would previously have been stopwords).

First it searches for documents which match the more important terms. These are the terms which appear in fewer documents and have a greater impact on relevance.

Then, it executes a second query for the less important terms — terms which appear frequently and have a low impact on relevance. But instead of calculating the relevance score for all matching documents, it only calculates the _score for documents already matched by the first query. In this way the high frequency terms can improve the relevance calculation without paying the cost of poor performance.

If a query consists only of high frequency terms, then a single query is executed as an AND (conjunction) query, in other words all terms are required. Even though each individual term will match many documents, the combination of terms narrows down the resultset to only the most relevant. The single query can also be executed as an OR with a specific minimum_should_match; in this case a high enough value should probably be used.

Terms are allocated to the high or low frequency groups based on the cutoff_frequency, which can be specified as an absolute frequency (>=1) or as a relative frequency (0.0 .. 1.0). (Remember that document frequencies are computed on a per-shard level, as explained in the "Relevance is broken" chapter of the Elasticsearch Definitive Guide.)

Perhaps the most interesting property of this query is that it adapts to domain specific stopwords automatically. For example, on a video hosting site, common terms like "clip" or "video" will automatically behave as stopwords without the need to maintain a manual list.

Examples

In this example, words that have a document frequency greater than 0.1% (eg "this" and "is") will be treated as common terms.

GET /_search
{
    "query": {
        "common": {
            "body": {
                "query": "this is bonsai cool",
                "cutoff_frequency": 0.001
            }
        }
    }
}

The number of terms which should match can be controlled with the minimum_should_match (high_freq, low_freq), low_freq_operator (default "or") and high_freq_operator (default "or") parameters.

For low frequency terms, set the low_freq_operator to "and" to make all terms required:

GET /_search
{
    "query": {
        "common": {
            "body": {
                "query": "nelly the elephant as a cartoon",
                "cutoff_frequency": 0.001,
                "low_freq_operator": "and"
            }
        }
    }
}

which is roughly equivalent to:

GET /_search
{
    "query": {
        "bool": {
            "must": [
            { "term": { "body": "nelly"}},
            { "term": { "body": "elephant"}},
            { "term": { "body": "cartoon"}}
            ],
            "should": [
            { "term": { "body": "the"}},
            { "term": { "body": "as"}},
            { "term": { "body": "a"}}
            ]
        }
    }
}

Alternatively use minimum_should_match to specify a minimum number or percentage of low frequency terms which must be present, for instance:

GET /_search
{
    "query": {
        "common": {
            "body": {
                "query": "nelly the elephant as a cartoon",
                "cutoff_frequency": 0.001,
                "minimum_should_match": 2
            }
        }
    }
}

which is roughly equivalent to:

GET /_search
{
    "query": {
        "bool": {
            "must": {
                "bool": {
                    "should": [
                    { "term": { "body": "nelly"}},
                    { "term": { "body": "elephant"}},
                    { "term": { "body": "cartoon"}}
                    ],
                    "minimum_should_match": 2
                }
            },
            "should": [
                { "term": { "body": "the"}},
                { "term": { "body": "as"}},
                { "term": { "body": "a"}}
                ]
        }
    }
}

A different minimum_should_match can be applied for low and high frequency terms with the additional low_freq and high_freq parameters. Here is an example when providing additional parameters (note the change in structure):

GET /_search
{
    "query": {
        "common": {
            "body": {
                "query": "nelly the elephant not as a cartoon",
                "cutoff_frequency": 0.001,
                "minimum_should_match": {
                    "low_freq" : 2,
                    "high_freq" : 3
                }
            }
        }
    }
}

which is roughly equivalent to:

GET /_search
{
    "query": {
        "bool": {
            "must": {
                "bool": {
                    "should": [
                    { "term": { "body": "nelly"}},
                    { "term": { "body": "elephant"}},
                    { "term": { "body": "cartoon"}}
                    ],
                    "minimum_should_match": 2
                }
            },
            "should": {
                "bool": {
                    "should": [
                    { "term": { "body": "the"}},
                    { "term": { "body": "not"}},
                    { "term": { "body": "as"}},
                    { "term": { "body": "a"}}
                    ],
                    "minimum_should_match": 3
                }
            }
        }
    }
}

In this case it means the high frequency terms have only an impact on relevance when there are at least three of them. But the most interesting use of the minimum_should_match for high frequency terms is when there are only high frequency terms:

GET /_search
{
    "query": {
        "common": {
            "body": {
                "query": "how not to be",
                "cutoff_frequency": 0.001,
                "minimum_should_match": {
                    "low_freq" : 2,
                    "high_freq" : 3
                }
            }
        }
    }
}

which is roughly equivalent to:

GET /_search
{
    "query": {
        "bool": {
            "should": [
            { "term": { "body": "how"}},
            { "term": { "body": "not"}},
            { "term": { "body": "to"}},
            { "term": { "body": "be"}}
            ],
            "minimum_should_match": "3<50%"
        }
    }
}

The generated query for the high frequency terms is then slightly less restrictive than with an AND.

The common terms query also supports boost and analyzer as parameters.
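
As a sketch, both parameters could be added to an earlier example (the analyzer name and boost value are illustrative):

GET /_search
{
    "query": {
        "common": {
            "body": {
                "query": "nelly the elephant as a cartoon",
                "cutoff_frequency": 0.001,
                "analyzer": "standard",
                "boost": 2.0
            }
        }
    }
}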

Query String Query

A query that uses a query parser in order to parse its content. Here is an example:

GET /_search
{
    "query": {
        "query_string" : {
            "default_field" : "content",
            "query" : "this AND that OR thus"
        }
    }
}

The query_string query parses the input and splits text around operators. Each textual part is analyzed independently. For instance the following query:

GET /_search
{
    "query": {
        "query_string" : {
            "default_field" : "content",
            "query" : "(new york city) OR (big apple)" (1)
        }
    }
}
  1. will be split into new york city and big apple and each part is then analyzed independently by the analyzer configured for the field.

Warning
Whitespace is not considered an operator; this means that new york city will be passed "as is" to the analyzer configured for the field. If the field is a keyword field, the analyzer will create a single term new york city, and the query builder will use this term in the query. If you want to query each term separately, you need to add explicit operators around the terms (e.g. new AND york AND city).

When multiple fields are provided, it is also possible to modify how the different field queries are combined inside each textual part using the type parameter. The possible modes are the multi_match types described above, and the default is best_fields.

The query_string top level parameters include:

Parameter Description

query

The actual query to be parsed. See Query string syntax.

default_field

The default field for query terms if no prefix field is specified. Defaults to the index.query.default_field index setting, which in turn defaults to *. * extracts all fields in the mapping that are eligible to term queries and filters the metadata fields. All extracted fields are then combined to build a query when no prefix field is provided.

WARNING: In future versions (starting in 7.0), there will be a limit on the number of fields that can be queried at once. This limit will be determined by the indices.query.bool.max_clause_count setting which defaults to 1024. Currently this will be raised and logged as a Warning only.

default_operator

The default operator used if no explicit operator is specified. For example, with a default operator of OR, the query capital of Hungary is translated to capital OR of OR Hungary, and with default operator of AND, the same query is translated to capital AND of AND Hungary. The default value is OR.

analyzer

The analyzer name used to analyze the query string.

quote_analyzer

The name of the analyzer that is used to analyze quoted phrases in the query string. For those parts, it overrides other analyzers that are set using the analyzer parameter or the search_quote_analyzer setting.

allow_leading_wildcard

When set, * or ? are allowed as the first character. Defaults to true.

enable_position_increments

Set to true to enable position increments in result queries. Defaults to true.

fuzzy_max_expansions

Controls the number of terms fuzzy queries will expand to. Defaults to 50.

fuzziness

Set the fuzziness for fuzzy queries. Defaults to AUTO. See the fuzziness section for allowed settings.

fuzzy_prefix_length

Set the prefix length for fuzzy queries. Default is 0.

fuzzy_transpositions

Set to false to disable fuzzy transpositions (abba). Default is true.

phrase_slop

Sets the default slop for phrases. If zero, then exact phrase matches are required. Default value is 0.

boost

Sets the boost value of the query. Defaults to 1.0.

auto_generate_phrase_queries

Deprecated setting. This setting is ignored; use type=phrase instead to make phrase queries out of all text that is within query operators, or use explicitly quoted strings if you need finer-grained control.

analyze_wildcard

By default, wildcard terms in a query string are not analyzed. By setting this value to true, a best effort will be made to analyze them as well.

max_determinized_states

Limit on how many automaton states regexp queries are allowed to create. This protects against too-difficult (e.g. exponentially hard) regexps. Defaults to 10000.

minimum_should_match

A value controlling how many "should" clauses in the resulting boolean query should match. It can be an absolute value (2), a percentage (30%) or a combination of both.

lenient

If set to true, format-based failures (like providing text to a numeric field) will be ignored.

time_zone

Time zone to be applied to any range query related to dates. See also the JODA time zone documentation.

quote_field_suffix

A suffix to append to fields for quoted parts of the query string. This allows you to use a field that has a different analysis chain for exact matching.

auto_generate_synonyms_phrase_query

Whether phrase queries should be automatically generated for multi-term synonyms. Defaults to true.

all_fields

Deprecated in 6.0.0; set default_field to * instead. Perform the query on all fields detected in the mapping that can be queried. Used by default when the _all field is disabled, no default_field is specified (either in the index settings or in the request body), and no fields are specified.

When a multi term query is being generated, one can control how it gets rewritten using the rewrite parameter.
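
For example, a minimal sketch that applies a rewrite method to a wildcard query (the content field and the chosen rewrite value are illustrative):

GET /_search
{
    "query": {
        "query_string" : {
            "default_field" : "content",
            "query" : "bro*",
            "rewrite" : "constant_score"
        }
    }
}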

Default Field

When not explicitly specifying the field to search on in the query string syntax, the index.query.default_field will be used to derive which field to search on. If index.query.default_field is not specified, the query_string will automatically attempt to determine the existing fields in the index’s mapping that are queryable, and perform the search on those fields. This will not include nested documents; use a nested query to search those documents.

Note
For mappings with a large number of fields, searching across all queryable fields in the mapping could be expensive.

Multi Field

The query_string query can also run against multiple fields. Fields can be provided via the fields parameter (example below).

The idea of running the query_string query against multiple fields is to expand each query term to an OR clause like this:

field1:query_term OR field2:query_term | ...

For example, the following query

GET /_search
{
    "query": {
        "query_string" : {
            "fields" : ["content", "name"],
            "query" : "this AND that"
        }
    }
}

matches the same words as

GET /_search
{
    "query": {
        "query_string": {
            "query": "(content:this OR name:this) AND (content:that OR name:that)"
        }
    }
}

Since several queries are generated from the individual search terms, combining them is automatically done using a dis_max query with a tie_breaker. For example (the name is boosted by 5 using ^5 notation):

GET /_search
{
    "query": {
        "query_string" : {
            "fields" : ["content", "name^5"],
            "query" : "this AND that OR thus",
            "tie_breaker" : 0
        }
    }
}

Simple wildcards can also be used to search "within" specific inner elements of the document. For example, if we have a city object with several fields (or inner object with fields) in it, we can automatically search on all "city" fields:

GET /_search
{
    "query": {
        "query_string" : {
            "fields" : ["city.*"],
            "query" : "this AND that OR thus"
        }
    }
}

Another option is to provide the wildcard field search in the query string itself (properly escaping the * sign), for example: city.\*:something:

GET /_search
{
    "query": {
        "query_string" : {
            "query" : "city.\\*:(this AND that OR thus)"
        }
    }
}
Note
Since \ (backslash) is a special character in json strings, it needs to be escaped, hence the two backslashes in the above query_string.

When running the query_string query against multiple fields, the following additional parameters are allowed:

Parameter Description

type

How the fields should be combined to build the text query. See the multi_match types above for a complete example. Defaults to best_fields.

tie_breaker

The disjunction max tie breaker for multi fields. Defaults to 0.

The fields parameter can also include pattern based field names, allowing automatic expansion to the relevant fields (dynamically introduced fields included). For example:

GET /_search
{
    "query": {
        "query_string" : {
            "fields" : ["content", "name.*^5"],
            "query" : "this AND that OR thus"
        }
    }
}

Synonyms

The query_string query supports multi-term synonym expansion with the synonym_graph token filter. When this filter is used, the parser creates a phrase query for each multi-term synonym. For example, the following synonym: ny, new york would produce:

(ny OR ("new york"))

It is also possible to match multi-term synonyms with conjunctions instead:

GET /_search
{
   "query": {
       "query_string" : {
           "default_field": "title",
           "query" : "ny city",
           "auto_generate_synonyms_phrase_query" : false
       }
   }
}

The example above creates a boolean query:

(ny OR (new AND york)) city

that matches documents with the term ny or the conjunction new AND york. By default the parameter auto_generate_synonyms_phrase_query is set to true.

Minimum should match

The query_string splits the query around each operator to create a boolean query for the entire input. You can use minimum_should_match to control how many "should" clauses in the resulting query should match.

GET /_search
{
    "query": {
        "query_string": {
            "fields": [
                "title"
            ],
            "query": "this that thus",
            "minimum_should_match": 2
        }
    }
}

The example above creates a boolean query:

(title:this title:that title:thus)~2

that matches documents with at least two of the terms this, that or thus in the single field title.

Multi Field
GET /_search
{
    "query": {
        "query_string": {
            "fields": [
                "title",
                "content"
            ],
            "query": "this that thus",
            "minimum_should_match": 2
        }
    }
}

The example above creates a boolean query:

(content:this content:that content:thus) | (title:this title:that title:thus)

that matches documents with the disjunction max over the fields title and content. Here the minimum_should_match parameter can’t be applied.

GET /_search
{
    "query": {
        "query_string": {
            "fields": [
                "title",
                "content"
            ],
            "query": "this OR that OR thus",
            "minimum_should_match": 2
        }
    }
}

Adding explicit operators forces each term to be considered as a separate clause.

The example above creates a boolean query:

((content:this | title:this) (content:that | title:that) (content:thus | title:thus))~2

that matches documents with at least two of the three "should" clauses, each of them made of the disjunction max over the fields for each term.

Cross Field
GET /_search
{
    "query": {
        "query_string": {
            "fields": [
                "title",
                "content"
            ],
            "query": "this OR that OR thus",
            "type": "cross_fields",
            "minimum_should_match": 2
        }
    }
}

The cross_fields value in the type field indicates that fields that have the same analyzer should be grouped together when the input is analyzed.

The example above creates a boolean query:

(blended(terms:[field2:this, field1:this]) blended(terms:[field2:that, field1:that]) blended(terms:[field2:thus, field1:thus]))~2

that matches documents with at least two of the three per-term blended queries.

Query string syntax

The query string "mini-language" is used by the Query String Query and by the q query string parameter in the search API.

The query string is parsed into a series of terms and operators. A term can be a single word — quick or brown — or a phrase, surrounded by double quotes — "quick brown" — which searches for all the words in the phrase, in the same order.

Operators allow you to customize the search — the available options are explained below.

Field names

As mentioned in Query String Query, the default_field is searched for the search terms, but it is possible to specify other fields in the query syntax:

  • where the status field contains active

    status:active
  • where the title field contains quick or brown. If you omit the OR operator, the default operator will be used

    title:(quick OR brown)
    title:(quick brown)
  • where the author field contains the exact phrase "john smith"

    author:"John Smith"
  • where the first name field contains Alice (note how we need to escape the space with a backslash)

    first\ name:Alice
  • where any of the fields book.title, book.content or book.date contains quick or brown (note how we need to escape the * with a backslash):

    book.\*:(quick brown)
  • where the field title has any non-null value:

    _exists_:title
Wildcards

Wildcard searches can be run on individual terms, using ? to replace a single character, and * to replace zero or more characters:

qu?ck bro*

Be aware that wildcard queries can use an enormous amount of memory and perform very badly — just think how many terms need to be queried to match the query string "a* b* c*".

Warning

Pure wildcards * are rewritten to exists queries for efficiency. As a consequence, the wildcard "field:*" would match documents with an empty value like the following:

{
  "field": ""
}

... and would not match if the field is missing or set with an explicit null value like the following:

{
  "field": null
}
Warning

Allowing a wildcard at the beginning of a word (eg "*ing") is particularly heavy, because all terms in the index need to be examined, just in case they match. Leading wildcards can be disabled by setting allow_leading_wildcard to false.

Only parts of the analysis chain that operate at the character level are applied. So for instance, if the analyzer performs both lowercasing and stemming, only the lowercasing will be applied: it would be wrong to perform stemming on a word that is missing some of its letters.

By setting analyze_wildcard to true, queries that end with a * will be analyzed and a boolean query will be built out of the different tokens, by ensuring exact matches on the first N-1 tokens, and prefix match on the last token.
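
For example, a sketch of the behavior described above (the content field is illustrative):

GET /_search
{
    "query": {
        "query_string" : {
            "default_field" : "content",
            "query" : "quick bro*",
            "analyze_wildcard" : true
        }
    }
}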

Regular expressions

Regular expression patterns can be embedded in the query string by wrapping them in forward-slashes ("/"):

name:/joh?n(ath[oa]n)/

The supported regular expression syntax is explained in Regular expression syntax.

Warning

The allow_leading_wildcard parameter does not have any control over regular expressions. A query string such as the following would force Elasticsearch to visit every term in the index:

/.*n/

Use with caution!

Fuzziness

We can search for terms that are similar to, but not exactly like our search terms, using the "fuzzy" operator:

quikc~ brwn~ foks~

This uses the Damerau-Levenshtein distance to find all terms with a maximum of two changes, where a change is the insertion, deletion or substitution of a single character, or transposition of two adjacent characters.

The default edit distance is 2, but an edit distance of 1 should be sufficient to catch 80% of all human misspellings. It can be specified as:

quikc~1
Proximity searches

While a phrase query (eg "john smith") expects all of the terms in exactly the same order, a proximity query allows the specified words to be further apart or in a different order. In the same way that fuzzy queries can specify a maximum edit distance for characters in a word, a proximity search allows us to specify a maximum edit distance of words in a phrase:

"fox quick"~5

The closer the text in a field is to the original order specified in the query string, the more relevant that document is considered to be. When compared to the above example query, the phrase "quick fox" would be considered more relevant than "quick brown fox".

Ranges

Ranges can be specified for date, numeric or string fields. Inclusive ranges are specified with square brackets [min TO max] and exclusive ranges with curly brackets {min TO max}.

  • All days in 2012:

    date:[2012-01-01 TO 2012-12-31]
  • Numbers 1..5

    count:[1 TO 5]
  • Tags between alpha and omega, excluding alpha and omega:

    tag:{alpha TO omega}
  • Numbers from 10 upwards

    count:[10 TO *]
  • Dates before 2012

    date:{* TO 2012-01-01}

Curly and square brackets can be combined:

  • Numbers from 1 up to but not including 5

    count:[1 TO 5}

Ranges with one side unbounded can use the following syntax:

age:>10
age:>=10
age:<10
age:<=10
Note

To combine an upper and lower bound with the simplified syntax, you would need to join two clauses with an AND operator:

age:(>=10 AND <20)
age:(+>=10 +<20)

The parsing of ranges in query strings can be complex and error prone. It is much more reliable to use an explicit range query.
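
For instance, a sketch of the explicit equivalent of age:(>=10 AND <20), assuming a numeric age field:

GET /_search
{
    "query": {
        "range" : {
            "age" : {
                "gte" : 10,
                "lt" : 20
            }
        }
    }
}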

Boosting

Use the boost operator ^ to make one term more relevant than another. For instance, if we want to find all documents about foxes, but we are especially interested in quick foxes:

quick^2 fox

The default boost value is 1, but can be any positive floating point number. Boosts between 0 and 1 reduce relevance.

Boosts can also be applied to phrases or to groups:

"john smith"^2   (foo bar)^4
Boolean operators

By default, all terms are optional, as long as one term matches. A search for foo bar baz will find any document that contains one or more of foo or bar or baz. We have already discussed the default_operator above which allows you to force all terms to be required, but there are also boolean operators which can be used in the query string itself to provide more control.

The preferred operators are + (this term must be present) and - (this term must not be present). All other terms are optional. For example, this query:

quick brown +fox -news

states that:

  • fox must be present

  • news must not be present

  • quick and brown are optional — their presence increases the relevance

The familiar boolean operators AND, OR and NOT (also written &&, || and !) are also supported but beware that they do not honor the usual precedence rules, so parentheses should be used whenever multiple operators are used together. For instance the previous query could be rewritten as:

((quick AND fox) OR (brown AND fox) OR fox) AND NOT news

This form now replicates the logic from the original query correctly, but the relevance scoring bears little resemblance to the original.

In contrast, the same query rewritten using the match query would look like this:

{
    "bool": {
        "must":     { "match": "fox"         },
        "should":   { "match": "quick brown" },
        "must_not": { "match": "news"        }
    }
}
Grouping

Multiple terms or clauses can be grouped together with parentheses, to form sub-queries:

(quick OR brown) AND fox

Groups can be used to target a particular field, or to boost the result of a sub-query:

status:(active OR pending) title:(full text search)^2
Reserved characters

If you need to use any of the characters which function as operators in your query itself (and not as operators), then you should escape them with a leading backslash. For instance, to search for (1+1)=2, you would need to write your query as \(1\+1\)\=2. When using JSON for the request body, two preceding backslashes (\\) are required; the backslash is a reserved escaping character in JSON strings.

GET /twitter/_search
{
  "query" : {
    "query_string" : {
      "query" : "kimchy\\!",
      "fields"  : ["user"]
    }
  }
}

The reserved characters are: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /

Failing to escape these special characters correctly could lead to a syntax error which prevents your query from running.

Note
< and > can’t be escaped at all. The only way to prevent them from attempting to create a range query is to remove them from the query string entirely.
Empty Query

If the query string is empty or contains only whitespace, the query will yield an empty result set.

Simple Query String Query

A query that uses the SimpleQueryParser to parse its content. Unlike the regular query_string query, the simple_query_string query will never throw an exception, and discards invalid parts of the query. Here is an example:

GET /_search
{
  "query": {
    "simple_query_string" : {
        "query": "\"fried eggs\" +(eggplant | potato) -frittata",
        "fields": ["title^5", "body"],
        "default_operator": "and"
    }
  }
}

The simple_query_string top level parameters include:

Parameter Description

query

The actual query to be parsed. See below for syntax.

fields

The fields to perform the parsed query against. Defaults to the index.query.default_field index setting, which in turn defaults to *. * extracts all fields in the mapping that are eligible to term queries and filters the metadata fields.

WARNING: In future versions (starting in 7.0), there will be a limit on the number of fields that can be queried at once. This limit will be determined by the indices.query.bool.max_clause_count setting which defaults to 1024. Currently this will be raised and logged as a Warning only.

default_operator

The default operator used if no explicit operator is specified. For example, with a default operator of OR, the query capital of Hungary is translated to capital OR of OR Hungary, and with default operator of AND, the same query is translated to capital AND of AND Hungary. The default value is OR.

analyzer

Forces the analyzer used to analyze each term of the query when creating composite queries.

flags

A set of flags specifying which features of the simple_query_string to enable. Defaults to ALL.

analyze_wildcard

Whether terms of prefix queries should be automatically analyzed or not. If true, a best effort will be made to analyze the prefix. However, some analyzers will not be able to provide meaningful results based just on the prefix of a term. Defaults to false.

lenient

If set to true, format-based failures (like providing text to a numeric field) will be ignored.

minimum_should_match

The minimum number of clauses that must match for a document to be returned. See the minimum_should_match documentation for the full list of options.

quote_field_suffix

A suffix to append to fields for quoted parts of the query string. This allows you to use a field that has a different analysis chain for exact matching.

auto_generate_synonyms_phrase_query

Whether phrase queries should be automatically generated for multi-term synonyms. Defaults to true.

all_fields

Deprecated in 6.0.0; set default_field to * instead. Perform the query on all fields detected in the mapping that can be queried. Used by default when the _all field is disabled, no default_field is specified (either in the index settings or in the request body), and no fields are specified.

fuzzy_prefix_length

Set the prefix length for fuzzy queries. Default is 0.

fuzzy_max_expansions

Controls the number of terms fuzzy queries will expand to. Defaults to 50.

fuzzy_transpositions

Set to false to disable fuzzy transpositions (ab → ba). Default is true.

Simple Query String Syntax

The simple_query_string supports the following special characters:

  • + signifies AND operation

  • | signifies OR operation

  • - negates a single token

  • " wraps a number of tokens to signify a phrase for searching

  • * at the end of a term signifies a prefix query

  • ( and ) signify precedence

  • ~N after a word signifies edit distance (fuzziness)

  • ~N after a phrase signifies slop amount

In order to search for any of these special characters, they will need to be escaped with \.
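
For instance, to match the literal text foo+bar rather than treating + as the AND operator, the + is escaped with a backslash (doubled here because of JSON string escaping; the body field is illustrative):

GET /_search
{
  "query": {
    "simple_query_string": {
      "query": "foo\\+bar",
      "fields": ["body"]
    }
  }
}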

Be aware that this syntax may have a different behavior depending on the default_operator value. For example, consider the following query:

GET /_search
{
    "query": {
        "simple_query_string" : {
            "fields" : ["content"],
            "query" : "foo bar -baz"
        }
    }
}

You might expect that documents containing only "foo" or "bar" will be returned, as long as they do not contain "baz". However, because the default_operator is OR, this really means: match documents that contain "foo", or documents that contain "bar", or documents that do not contain "baz". If this is unintended, the query can be switched to "foo bar +-baz", which will not return documents that contain "baz".
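
In full, the corrected query would be:

GET /_search
{
    "query": {
        "simple_query_string" : {
            "fields" : ["content"],
            "query" : "foo bar +-baz"
        }
    }
}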

Default Field

When not explicitly specifying the field to search on in the query string syntax, the index.query.default_field will be used to derive which fields to search on. It defaults to * and the query will automatically attempt to determine the existing fields in the index’s mapping that are queryable, and perform the search on those fields.

Multi Field

The fields parameter can also include pattern-based field names, allowing automatic expansion to the relevant fields (dynamically introduced fields included). For example:

GET /_search
{
    "query": {
        "simple_query_string" : {
            "fields" : ["content", "name.*^5"],
            "query" : "foo bar baz"
        }
    }
}

Flags

The simple_query_string query supports multiple flags to specify which parsing features should be enabled. They are specified as a |-delimited string with the flags parameter:

GET /_search
{
    "query": {
        "simple_query_string" : {
            "query" : "foo | bar + baz*",
            "flags" : "OR|AND|PREFIX"
        }
    }
}

The available flags are:

Flag Description

ALL

Enables all parsing features. This is the default.

NONE

Switches off all parsing features.

AND

Enables the + AND operator.

OR

Enables the | OR operator.

NOT

Enables the - NOT operator.

PREFIX

Enables the * Prefix operator.

PHRASE

Enables the " quotes operator used to search for phrases.

PRECEDENCE

Enables the ( and ) operators to control operator precedence.

ESCAPE

Enables \ as the escape character.

WHITESPACE

Enables whitespaces as split characters.

FUZZY

Enables the ~N operator after a word where N is an integer denoting the allowed edit distance for matching (see [fuzziness]).

SLOP

Enables the ~N operator after a phrase where N is an integer denoting the slop amount.

NEAR

Synonymous to SLOP.

Synonyms

The simple_query_string query supports multi-terms synonym expansion with the synonym_graph token filter. When this filter is used, the parser creates a phrase query for each multi-term synonym. For example, the synonym "ny, new york" would produce:

(ny OR ("new york"))

It is also possible to match multi-term synonyms with conjunctions instead:

GET /_search
{
   "query": {
       "simple_query_string" : {
           "query" : "ny city",
           "auto_generate_synonyms_phrase_query" : false
       }
   }
}

The example above creates a boolean query:

(ny OR (new AND york)) city

that matches documents with the term ny or the conjunction new AND york. By default the parameter auto_generate_synonyms_phrase_query is set to true.

Term-level queries

You can use term-level queries to find documents based on precise values in structured data. Examples of structured data include date ranges, IP addresses, prices, or product IDs.

Unlike full-text queries, term-level queries do not analyze search terms. Instead, term-level queries match the exact terms stored in a field.

Note

Term-level queries still normalize search terms for keyword fields with the normalizer property. For more details, see normalizer.

Types of term-level queries

term query

Returns documents that contain an exact term in a provided field.

terms query

Returns documents that contain one or more exact terms in a provided field.

terms_set query

Returns documents that contain a minimum number of exact terms in a provided field. You can define the minimum number of matching terms using a field or script.

range query

Returns documents that contain terms within a provided range.

exists query

Returns documents that contain any indexed value for a field.

prefix query

Returns documents that contain a specific prefix in a provided field.

wildcard query

Returns documents that contain terms matching a wildcard pattern.

regexp query

Returns documents that contain terms matching a regular expression.

fuzzy query

Returns documents that contain terms similar to the search term. Elasticsearch measures similarity, or fuzziness, using a Levenshtein edit distance.

type query

Returns documents of the specified type.

ids query

Returns documents based on their document IDs.

Term Query

The term query finds documents that contain the exact term specified in the inverted index. For instance:

POST _search
{
  "query": {
    "term" : { "user" : "Kimchy" } (1)
  }
}
  1. Finds documents which contain the exact term Kimchy in the inverted index of the user field.

A boost parameter can be specified to give this term query a higher relevance score than another query, for instance:

GET _search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "status": {
              "value": "urgent",
              "boost": 2.0 (1)
            }
          }
        },
        {
          "term": {
            "status": "normal" (2)
          }
        }
      ]
    }
  }
}
  1. The urgent query clause has a boost of 2.0, meaning it is twice as important as the query clause for normal.

  2. The normal clause has the default neutral boost of 1.0.

A term query can also match against range data types.

Why doesn’t the term query match my document?

String fields can be of type text (treated as full text, like the body of an email), or keyword (treated as exact values, like an email address or a zip code). Exact values (like numbers, dates, and keywords) have the exact value specified in the field added to the inverted index in order to make them searchable.

However, text fields are analyzed. This means that their values are first passed through an analyzer to produce a list of terms, which are then added to the inverted index.

There are many ways to analyze text: the default standard analyzer drops most punctuation, breaks up text into individual words, and lower cases them. For instance, the standard analyzer would turn the string "Quick Brown Fox!" into the terms [quick, brown, fox].

This analysis process makes it possible to search for individual words within a big block of full text.

The term query looks for the exact term in the field’s inverted index — it doesn’t know anything about the field’s analyzer. This makes it useful for looking up values in keyword fields, or in numeric or date fields. When querying full text fields, use the match query instead, which understands how the field has been analyzed.

To demonstrate, try out the example below. First, create an index, specifying the field mappings, and index a document:

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "full_text": {
          "type":  "text" (1)
        },
        "exact_value": {
          "type":  "keyword" (2)
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "full_text":   "Quick Foxes!", (3)
  "exact_value": "Quick Foxes!"  (4)
}
  1. The full_text field is of type text and will be analyzed.

  2. The exact_value field is of type keyword and will NOT be analyzed.

  3. The full_text inverted index will contain the terms: [quick, foxes].

  4. The exact_value inverted index will contain the exact term: [Quick Foxes!].

Now, compare the results for the term query and the match query:

GET my_index/_search
{
  "query": {
    "term": {
      "exact_value": "Quick Foxes!" (1)
    }
  }
}

GET my_index/_search
{
  "query": {
    "term": {
      "full_text": "Quick Foxes!" (2)
    }
  }
}

GET my_index/_search
{
  "query": {
    "term": {
      "full_text": "foxes" (3)
    }
  }
}

GET my_index/_search
{
  "query": {
    "match": {
      "full_text": "Quick Foxes!" (4)
    }
  }
}
  1. This query matches because the exact_value field contains the exact term Quick Foxes!.

  2. This query does not match, because the full_text field only contains the terms quick and foxes. It does not contain the exact term Quick Foxes!.

  3. A term query for the term foxes matches the full_text field.

  4. This match query on the full_text field first analyzes the query string, then looks for documents containing quick or foxes or both.

Terms Query

Filters documents that have fields that match any of the provided terms (not analyzed). For example:

GET /_search
{
    "query": {
        "terms" : { "user" : ["kimchy", "elasticsearch"]}
    }
}
Note
Highlighting terms queries is best-effort only, so terms of a terms query might not be highlighted depending on the highlighter implementation that is selected and on the number of terms in the terms query.
Terms lookup mechanism

When you need to specify a terms filter with a lot of terms, it can be beneficial to fetch those term values from a document in an index. A concrete example would be filtering tweets tweeted by your followers. The number of user ids specified in the terms filter can potentially be large. In this scenario it makes sense to use the terms filter’s terms lookup mechanism.

The terms lookup mechanism supports the following options:

index

The index to fetch the term values from.

type

The type to fetch the term values from.

id

The id of the document to fetch the term values from.

path

The field specified as path to fetch the actual values for the terms filter.

routing

A custom routing value to be used when retrieving the external terms doc.

The values for the terms filter will be fetched from a field in a document with the specified id in the specified type and index. Internally a get request is executed to fetch the values from the specified path. At the moment, for this feature to work, the _source needs to be stored.

Also, consider using an index with a single shard and fully replicated across all nodes if the "reference" terms data is not large. The lookup terms filter will prefer to execute the get request on a local node if possible, reducing the need for networking.

Warning
Executing a Terms Query request with a lot of terms can be quite slow, as each additional term demands extra processing and memory. To safeguard against this, the maximum number of terms that can be used in a Terms Query, either directly or through lookup, has been limited to 65536. This default maximum can be changed for a particular index with the index setting index.max_terms_count.
Terms lookup twitter example

First, we index the information for the user with id 2, specifically its followers. Then we index a tweet from the user with id 1. Finally, we search for all the tweets that match the followers of user 2.

PUT /users/_doc/2
{
    "followers" : ["1", "3"]
}

PUT /tweets/_doc/1
{
    "user" : "1"
}

GET /tweets/_search
{
    "query" : {
        "terms" : {
            "user" : {
                "index" : "users",
                "type" : "_doc",
                "id" : "2",
                "path" : "followers"
            }
        }
    }
}

The structure of the external terms document can also include an array of inner objects, for example:

PUT /users/_doc/2
{
 "followers" : [
   {
     "id" : "1"
   },
   {
     "id" : "2"
   }
 ]
}

In which case, the lookup path will be followers.id.
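
A lookup query using that nested path would then look like this (a minimal sketch mirroring the twitter example above):

GET /tweets/_search
{
    "query" : {
        "terms" : {
            "user" : {
                "index" : "users",
                "type" : "_doc",
                "id" : "2",
                "path" : "followers.id"
            }
        }
    }
}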

Terms Set Query

Returns documents that match a minimum number of the provided terms. The terms are not analyzed and thus must match exactly. The number of terms that must match varies per document and is either controlled by a minimum should match field or computed per document in a minimum should match script.

The field that controls the number of required terms that must match must be a number field:

PUT /my-index
{
    "mappings": {
        "_doc": {
            "properties": {
                "required_matches": {
                    "type": "long"
                }
            }
        }
    }
}

PUT /my-index/_doc/1?refresh
{
    "codes": ["ghi", "jkl"],
    "required_matches": 2
}

PUT /my-index/_doc/2?refresh
{
    "codes": ["def", "ghi"],
    "required_matches": 2
}

An example that uses the minimum should match field:

GET /my-index/_search
{
    "query": {
        "terms_set": {
            "codes" : {
                "terms" : ["abc", "def", "ghi"],
                "minimum_should_match_field": "required_matches"
            }
        }
    }
}

Response:

{
  "took": 13,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped" : 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "my-index",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "codes": ["def", "ghi"],
          "required_matches": 2
        }
      }
    ]
  }
}

Scripts can also be used to control how many terms are required to match in a more dynamic way. For example, a creation date or a popularity field can be used as the basis for the number of required terms to match.

Also the params.num_terms parameter is available in the script to indicate the number of terms that have been specified.

An example that ensures the number of required matching terms never becomes larger than the number of terms specified:

GET /my-index/_search
{
    "query": {
        "terms_set": {
            "codes" : {
                "terms" : ["abc", "def", "ghi"],
                "minimum_should_match_script": {
                   "source": "Math.min(params.num_terms, doc['required_matches'].value)"
                }
            }
        }
    }
}

Range Query

Matches documents with fields that have terms within a certain range. The type of the Lucene query depends on the field type: for string fields it is a TermRangeQuery, while for number/date fields it is a NumericRangeQuery. The following example returns all documents where age is between 10 and 20:

GET _search
{
    "query": {
        "range" : {
            "age" : {
                "gte" : 10,
                "lte" : 20,
                "boost" : 2.0
            }
        }
    }
}

The range query accepts the following parameters:

gte

Greater-than or equal to

gt

Greater-than

lte

Less-than or equal to

lt

Less-than

boost

Sets the boost value of the query, defaults to 1.0

Ranges on date fields

When running range queries on fields of type date, ranges can be specified using [date-math]:

GET _search
{
    "query": {
        "range" : {
            "date" : {
                "gte" : "now-1d/d",
                "lt" :  "now/d"
            }
        }
    }
}
Date math and rounding

When using date math to round dates to the nearest day, month, hour, etc, the rounded dates depend on whether the ends of the ranges are inclusive or exclusive.

Rounding up moves to the last millisecond of the rounding scope, and rounding down to the first millisecond of the rounding scope. For example:

gt

Greater than the date rounded up: 2014-11-18||/M becomes 2014-11-30T23:59:59.999, i.e. excluding the entire month.

gte

Greater than or equal to the date rounded down: 2014-11-18||/M becomes 2014-11-01, i.e. including the entire month.

lt

Less than the date rounded down: 2014-11-18||/M becomes 2014-11-01, i.e. excluding the entire month.

lte

Less than or equal to the date rounded up: 2014-11-18||/M becomes 2014-11-30T23:59:59.999, i.e. including the entire month.
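
For instance, the following minimal sketch matches the whole of November 2014, because gte rounds the date down to 2014-11-01 while lte rounds it up to 2014-11-30T23:59:59.999 (the date field is the one from the earlier example):

GET _search
{
    "query": {
        "range" : {
            "date" : {
                "gte" : "2014-11-18||/M",
                "lte" : "2014-11-18||/M"
            }
        }
    }
}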

Date format in range queries

Formatted dates will be parsed using the format specified on the date field by default, but it can be overridden by passing the format parameter to the range query:

GET _search
{
    "query": {
        "range" : {
            "born" : {
                "gte": "01/01/2012",
                "lte": "2013",
                "format": "dd/MM/yyyy||yyyy"
            }
        }
    }
}

Note that if the date misses some of the year, month, or day components, the missing parts are filled in with the start of Unix time, which is January 1st, 1970. This means that when specifying, for example, dd as the format, a value like "gte" : 10 will translate to 1970-01-10T00:00:00.000Z.
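
To illustrate, in this minimal sketch the value 10 is interpreted as 1970-01-10T00:00:00.000Z:

GET _search
{
    "query": {
        "range" : {
            "born" : {
                "gte" : 10,
                "format": "dd"
            }
        }
    }
}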

Time zone in range queries

Dates can be converted from another timezone to UTC either by specifying the time zone in the date value itself (if the format accepts it), or it can be specified as the time_zone parameter:

GET _search
{
    "query": {
        "range" : {
            "timestamp" : {
                "gte": "2015-01-01T00:00:00", (1)
                "lte": "now", (2)
                "time_zone": "+01:00"
            }
        }
    }
}
  1. This date will be converted to 2014-12-31T23:00:00 UTC.

  2. now is not affected by the time_zone parameter; it’s always the current system time (in UTC). However, when using date math rounding (e.g. down to the nearest day using now/d), the provided time_zone will be considered.

Querying range fields

range queries can be used on fields of type range, allowing you to match a range specified in the query with a range field value in the document. The relation parameter controls how these two ranges are matched:

WITHIN

Matches documents whose range field is entirely within the query’s range.

CONTAINS

Matches documents whose range field entirely contains the query’s range.

INTERSECTS

Matches documents whose range field intersects the query’s range. This is the default value when querying range fields.

For examples, see range mapping type.
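
As a quick illustration, here is a minimal sketch assuming a hypothetical field expected_attendees of type integer_range:

GET /_search
{
    "query": {
        "range" : {
            "expected_attendees" : {
                "gte" : 10,
                "lte" : 20,
                "relation" : "WITHIN"
            }
        }
    }
}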

Exists Query

Returns documents that contain a value other than null or [] in a provided field.

Example request

GET /_search
{
    "query": {
        "exists": {
            "field": "user"
        }
    }
}

Top-level parameters for exists

field

(Required, string) Name of the field you wish to search.

To return a document, this field must exist and contain a value other than null or []. These values can include:

  • Empty strings, such as "" or "-"

  • Arrays containing null and another value, such as [null, "foo"]

  • A custom null-value, defined in field mapping

Notes

Find documents with null values

To find documents that contain only null values or [] in a provided field, use the must_not boolean query with the exists query.

The following search returns documents that contain only null values or [] in the user field.

GET /_search
{
    "query": {
        "bool": {
            "must_not": {
                "exists": {
                    "field": "user"
                }
            }
        }
    }
}

Prefix Query

Matches documents that have fields containing terms with a specified prefix (not analyzed). The prefix query maps to Lucene PrefixQuery. The following matches documents where the user field contains a term that starts with ki:

GET /_search
{ "query": {
    "prefix" : { "user" : "ki" }
  }
}

A boost can also be associated with the query:

GET /_search
{ "query": {
    "prefix" : { "user" :  { "value" : "ki", "boost" : 2.0 } }
  }
}

This multi term query allows you to control how it gets rewritten using the rewrite parameter.
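
For instance, a minimal sketch that forces a constant-score rewrite:

GET /_search
{ "query": {
    "prefix" : { "user" : { "value" : "ki", "rewrite" : "constant_score" } }
  }
}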

Wildcard Query

Returns documents that contain terms matching a wildcard pattern.

A wildcard operator is a placeholder that matches one or more characters. For example, the * wildcard operator matches zero or more characters. You can combine wildcard operators with other characters to create a wildcard pattern.

Example request

The following search returns documents where the user field contains a term that begins with ki and ends with y. These matching terms can include kiy, kity, or kimchy.

GET /_search
{
    "query": {
        "wildcard": {
            "user": {
                "value": "ki*y",
                "boost": 1.0,
                "rewrite": "constant_score"
            }
        }
    }
}

Top-level parameters for wildcard

<field>

(Required, object) Field you wish to search.

Parameters for <field>

value

(Required, string) Wildcard pattern for terms you wish to find in the provided <field>.

This parameter supports two wildcard operators:

  • ?, which matches any single character

  • *, which can match zero or more characters, including an empty one

Warning
Avoid beginning patterns with * or ?. This can increase the iterations needed to find matching terms and slow search performance.
boost

(Optional, float) Floating point number used to decrease or increase the relevance scores of a query. Defaults to 1.0.

You can use the boost parameter to adjust relevance scores for searches containing two or more queries.

Boost values are relative to the default value of 1.0. A boost value between 0 and 1.0 decreases the relevance score. A value greater than 1.0 increases the relevance score.

rewrite

(Optional, string) Method used to rewrite the query. For valid values and more information, see the rewrite parameter.

Regexp Query

The regexp query allows you to use regular expression term queries. See Regular expression syntax for details of the supported regular expression language. The "term queries" in that first sentence means that Elasticsearch will apply the regexp to the terms produced by the tokenizer for that field, and not to the original text of the field.

Note: The performance of a regexp query heavily depends on the regular expression chosen. Matching everything like .* is very slow, as is using lookaround regular expressions. If possible, you should try to use a long prefix before your regular expression starts. Wildcard matchers like .*?+ will mostly lower performance.

GET /_search
{
    "query": {
        "regexp":{
            "name.first": "s.*y"
        }
    }
}

Boosting is also supported:

GET /_search
{
    "query": {
        "regexp":{
            "name.first":{
                "value":"s.*y",
                "boost":1.2
            }
        }
    }
}

You can also use special flags:

GET /_search
{
    "query": {
        "regexp":{
            "name.first": {
                "value": "s.*y",
                "flags" : "INTERSECTION|COMPLEMENT|EMPTY"
            }
        }
    }
}

Possible flags are ALL (default), ANYSTRING, COMPLEMENT, EMPTY, INTERSECTION, INTERVAL, or NONE. Please check the Lucene documentation for their meaning.

Regular expressions are dangerous because it’s easy to accidentally create an innocuous looking one that requires an exponential number of internal determinized automaton states (and corresponding RAM and CPU) for Lucene to execute. Lucene prevents these using the max_determinized_states setting (defaults to 10000). You can raise this limit to allow more complex regular expressions to execute.

GET /_search
{
    "query": {
        "regexp":{
            "name.first": {
                "value": "s.*y",
                "flags" : "INTERSECTION|COMPLEMENT|EMPTY",
                "max_determinized_states": 20000
            }
        }
    }
}
Note
By default the maximum length of regex string allowed in a Regexp Query is limited to 1000. You can update the index.max_regex_length index setting to bypass this limit.
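
For instance, a minimal sketch of raising the limit on an existing index my_index (assuming the setting is updated dynamically through the update index settings API):

PUT /my_index/_settings
{
  "index" : {
    "max_regex_length" : 2000
  }
}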

Regular expression syntax

Regular expression queries are supported by the regexp and the query_string queries. The Lucene regular expression engine is not Perl-compatible but supports a smaller range of operators.

Note

We will not attempt to explain regular expressions, but just explain the supported operators.

Standard operators
Anchoring

Most regular expression engines allow you to match any part of a string. If you want the regexp pattern to start at the beginning of the string or finish at the end of the string, then you have to anchor it specifically, using ^ to indicate the beginning or $ to indicate the end.

Lucene’s patterns are always anchored. The pattern provided must match the entire string. For string "abcde":

ab.*     # match
abcd     # no match
Allowed characters

Any Unicode characters may be used in the pattern, but certain characters are reserved and must be escaped. The standard reserved characters are:

. ? + * | { } [ ] ( ) " \

If you enable optional features (see below) then these characters may also be reserved:

# @ & < >  ~

Any reserved character can be escaped with a backslash, e.g. "\*", including a literal backslash character: "\\".

Additionally, any characters (except double quotes) are interpreted literally when surrounded by double quotes:

john"@smith.com"
Match any character

The period "." can be used to represent any character. For string "abcde":

ab...   # match
a.c.e   # match
One-or-more

The plus sign "+" can be used to repeat the preceding shortest pattern once or more times. For string "aaabbb":

a+b+        # match
aa+bb+      # match
a+.+        # match
aa+bbb+     # match
Zero-or-more

The asterisk "*" can be used to match the preceding shortest pattern zero-or-more times. For string `"aaabbb`":

a*b*        # match
a*b*c*      # match
.*bbb.*     # match
aaa*bbb*    # match
Zero-or-one

The question mark "?" makes the preceding shortest pattern optional. It matches zero or one times. For string "aaabbb":

aaa?bbb?    # match
aaaa?bbbb?  # match
.....?.?    # match
aa?bb?      # no match
Min-to-max

Curly brackets "{}" can be used to specify a minimum and (optionally) a maximum number of times the preceding shortest pattern can repeat. The allowed forms are:

{5}     # repeat exactly 5 times
{2,5}   # repeat at least twice and at most 5 times
{2,}    # repeat at least twice

For string "aaabbb":

a{3}b{3}        # match
a{2,4}b{2,4}    # match
a{2,}b{2,}      # match
.{3}.{3}        # match
a{4}b{4}        # no match
a{4,6}b{4,6}    # no match
a{4,}b{4,}      # no match
Grouping

Parentheses "()" can be used to form sub-patterns. The quantity operators listed above operate on the shortest previous pattern, which can be a group. For string "ababab":

(ab)+       # match
ab(ab)+     # match
(..)+       # match
(...)+      # no match
(ab)*       # match
abab(ab)?   # match
ab(ab)?     # no match
(ab){3}     # match
(ab){1,2}   # no match
Alternation

The pipe symbol "|" acts as an OR operator. The match will succeed if the pattern on either the left-hand side OR the right-hand side matches. The alternation applies to the longest pattern, not the shortest. For string "aabb":

aabb|bbaa   # match
aacc|bb     # no match
aa(cc|bb)   # match
a+|b+       # no match
a+b+|b+a+   # match
a+(b|c)+    # match
Character classes

Ranges of potential characters may be represented as character classes by enclosing them in square brackets "[]". A leading ^ negates the character class. The allowed forms are:

[abc]   # 'a' or 'b' or 'c'
[a-c]   # 'a' or 'b' or 'c'
[-abc]  # '-' or 'a' or 'b' or 'c'
[abc\-] # '-' or 'a' or 'b' or 'c'
[^abc]  # any character except 'a' or 'b' or 'c'
[^a-c]  # any character except 'a' or 'b' or 'c'
[^-abc]  # any character except '-' or 'a' or 'b' or 'c'
[^abc\-] # any character except '-' or 'a' or 'b' or 'c'

Note that the dash "-" indicates a range of characters, unless it is the first character or if it is escaped with a backslash.

For string "abcd":

ab[cd]+     # match
[a-d]+      # match
[^a-d]+     # no match
Optional operators

These operators are available by default as the flags parameter defaults to ALL. Different flag combinations (concatenated with "|") can be used to enable/disable specific operators:

{
    "regexp": {
        "username": {
            "value": "john~athon<1-5>",
            "flags": "COMPLEMENT|INTERVAL"
        }
    }
}
Complement

The complement is probably the most useful option. The shortest pattern that follows a tilde "~" is negated. For instance, "ab~cd" means:

  • Starts with a

  • Followed by b

  • Followed by a string of any length that is anything but c

  • Ends with d

For the string "abcdef":

ab~df     # match
ab~cf     # match
ab~cdef   # no match
a~(cb)def # match
a~(bc)def # no match

Enabled with the COMPLEMENT or ALL flags.

Interval

The interval option enables the use of numeric ranges, enclosed by angle brackets "<>". For string "foo80":

foo<1-100>     # match
foo<01-100>    # match
foo<001-100>   # no match

Enabled with the INTERVAL or ALL flags.

Intersection

The ampersand "&" joins two patterns in a way that both of them have to match. For string "aaabbb":

aaa.+&.+bbb     # match
aaa&bbb         # no match

Using this feature usually means that you should rewrite your regular expression.

Enabled with the INTERSECTION or ALL flags.

Any string

The at sign "@" matches any string in its entirety. This could be combined with the intersection and complement above to express ``everything except''. For instance:

@&~(foo.+)      # anything except string beginning with "foo"

Enabled with the ANYSTRING or ALL flags.

Fuzzy Query

The fuzzy query uses similarity based on Levenshtein edit distance.

String fields

The fuzzy query generates matching terms that are within the maximum edit distance specified in fuzziness and then checks the term dictionary to find out which of those generated terms actually exist in the index. The final query uses up to max_expansions matching terms.

Here is a simple example:

GET /_search
{
    "query": {
       "fuzzy" : { "user" : "ki" }
    }
}

Or with more advanced settings:

GET /_search
{
    "query": {
        "fuzzy" : {
            "user" : {
                "value": "ki",
                "boost": 1.0,
                "fuzziness": 2,
                "prefix_length": 0,
                "max_expansions": 100
            }
        }
    }
}
Parameters
fuzziness

The maximum edit distance. Defaults to AUTO. See [fuzziness].

prefix_length

The number of initial characters which will not be "fuzzified". This helps to reduce the number of terms which must be examined. Defaults to 0.

max_expansions

The maximum number of terms that the fuzzy query will expand to. Defaults to 50.

transpositions

Whether fuzzy transpositions (ab → ba) are supported. Default is false.

Warning
This query can be very heavy if prefix_length is set to 0 and if max_expansions is set to a high number. It could result in every term in the index being examined!

Type Query

Filters documents matching the provided document / mapping type.

GET /_search
{
    "query": {
        "type" : {
            "value" : "_doc"
        }
    }
}

Ids Query

Returns documents based on their IDs. This query uses document IDs stored in the _id field.

Example request

GET /_search
{
    "query": {
        "ids" : {
            "type" : "_doc",
            "values" : ["1", "4", "100"]
        }
    }
}

Top-level parameters for ids

values

An array of document IDs.

Compound queries

Compound queries wrap other compound or leaf queries, either to combine their results and scores, to change their behaviour, or to switch from query to filter context.

The queries in this group are:

constant_score query

A query which wraps another query, but executes it in filter context. All matching documents are given the same "constant" _score.

bool query

The default query for combining multiple leaf or compound query clauses, as must, should, must_not, or filter clauses. The must and should clauses have their scores combined — the more matching clauses, the better — while the must_not and filter clauses are executed in filter context.

dis_max query

A query which accepts multiple queries, and returns any documents which match any of the query clauses. While the bool query combines the scores from all matching queries, the dis_max query uses the score of the single best-matching query clause.

function_score query

Modify the scores returned by the main query with functions to take into account factors like popularity, recency, distance, or custom algorithms implemented with scripting.

boosting query

Return documents which match a positive query, but reduce the score of documents which also match a negative query.

Constant Score Query

Wraps a filter query and returns every matching document with a relevance score equal to the boost parameter value.

GET /_search
{
    "query": {
        "constant_score" : {
            "filter" : {
                "term" : { "user" : "kimchy"}
            },
            "boost" : 1.2
        }
    }
}

Top-level parameters for constant_score

filter

(Required, query object) Filter query you wish to run. Any returned documents must match this query.

Filter queries do not calculate relevance scores. To speed up performance, Elasticsearch automatically caches frequently used filter queries.

boost

(Optional, float) Floating point number used as the constant relevance score for every document matching the filter query. Defaults to 1.0.

Bool Query

A query that matches documents matching boolean combinations of other queries. The bool query maps to Lucene BooleanQuery. It is built using one or more boolean clauses, each clause with a typed occurrence. The occurrence types are:

Occur Description

must

The clause (query) must appear in matching documents and will contribute to the score.

filter

The clause (query) must appear in matching documents. However, unlike must, the score of the query will be ignored. Filter clauses are executed in filter context, meaning that scoring is ignored and clauses are considered for caching.

should

The clause (query) should appear in the matching document. If the bool query is in a query context and has a must or filter clause, then a document will match the bool query even if none of the should queries match. In this case these clauses are only used to influence the score. If the bool query is in a filter context or has neither must nor filter, then at least one of the should queries must match a document for it to match the bool query. This behavior may be explicitly controlled by setting the minimum_should_match parameter.

must_not

The clause (query) must not appear in the matching documents. Clauses are executed in filter context meaning that scoring is ignored and clauses are considered for caching. Because scoring is ignored, a score of 0 for all documents is returned.

Important
Bool query in filter context

If this query is used in a filter context and it has should clauses then at least one should clause is required to match.

The bool query takes a more-matches-is-better approach, so the score from each matching must or should clause will be added together to provide the final _score for each document.

POST _search
{
  "query": {
    "bool" : {
      "must" : {
        "term" : { "user" : "kimchy" }
      },
      "filter": {
        "term" : { "tag" : "tech" }
      },
      "must_not" : {
        "range" : {
          "age" : { "gte" : 10, "lte" : 20 }
        }
      },
      "should" : [
        { "term" : { "tag" : "wow" } },
        { "term" : { "tag" : "elasticsearch" } }
      ],
      "minimum_should_match" : 1,
      "boost" : 1.0
    }
  }
}

Using minimum_should_match

You can use the minimum_should_match parameter to specify the number or percentage of should clauses returned documents must match.

If the bool query includes at least one should clause and no must or filter clauses, the default value is 1. Otherwise, the default value is 0.

For other valid values, see the minimum_should_match parameter.
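
For instance, a minimal sketch that requires at least two of the three should clauses to match:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "tag": "wow" } },
        { "term": { "tag": "elasticsearch" } },
        { "term": { "tag": "search" } }
      ],
      "minimum_should_match": 2
    }
  }
}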

Scoring with bool.filter

Queries specified under the filter element have no effect on scoring — scores are returned as 0. Scores are only affected by the query that has been specified. For instance, all three of the following queries return all documents where the status field contains the term active.

This first query assigns a score of 0 to all documents, as no scoring query has been specified:

GET _search
{
  "query": {
    "bool": {
      "filter": {
        "term": {
          "status": "active"
        }
      }
    }
  }
}

This bool query has a match_all query, which assigns a score of 1.0 to all documents.

GET _search
{
  "query": {
    "bool": {
      "must": {
        "match_all": {}
      },
      "filter": {
        "term": {
          "status": "active"
        }
      }
    }
  }
}

This constant_score query behaves in exactly the same way as the second example above. The constant_score query assigns a score of 1.0 to all documents matched by the filter.

GET _search
{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "status": "active"
        }
      }
    }
  }
}

Using named queries to see which clauses matched

If you need to know which of the clauses in the bool query matched the documents returned from the query, you can use named queries to assign a name to each clause.
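
A minimal sketch using the _name parameter on each clause (the title and content fields are illustrative); the names of the clauses that matched are then reported for each hit under matched_queries:

GET /_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title":   { "query": "search",        "_name": "title_clause"   } } },
        { "match": { "content": { "query": "elasticsearch", "_name": "content_clause" } } }
      ]
    }
  }
}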

Dis Max Query

A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries.

This is useful when searching for a word in multiple fields with different boost factors (so that the fields cannot be combined equivalently into a single search field). We want the primary score to be the one associated with the highest boost, not the sum of the field scores (as Boolean Query would give). If the query is "albino elephant" this ensures that "albino" matching one field and "elephant" matching another gets a higher score than "albino" matching both fields. To get this result, use both Boolean Query and DisjunctionMax Query: for each term a DisjunctionMaxQuery searches for it in each field, while the set of these DisjunctionMaxQuery’s is combined into a BooleanQuery.

The tie breaker capability allows results that include the same term in multiple fields to be judged better than results that include this term in only the best of those multiple fields, without confusing this with the better case of two different terms in the multiple fields. The default tie_breaker is 0.0.

This query maps to Lucene DisjunctionMaxQuery.

GET /_search
{
    "query": {
        "dis_max" : {
            "tie_breaker" : 0.7,
            "boost" : 1.2,
            "queries" : [
                {
                    "term" : { "age" : 34 }
                },
                {
                    "term" : { "age" : 35 }
                }
            ]
        }
    }
}

Function Score Query

The function_score allows you to modify the score of documents that are retrieved by a query. This can be useful if, for example, a score function is computationally expensive and it is sufficient to compute the score on a filtered set of documents.

To use function_score, the user has to define a query and one or more functions that compute a new score for each document returned by the query.

function_score can be used with only one function like this:

GET /_search
{
    "query": {
        "function_score": {
            "query": { "match_all": {} },
            "boost": "5",
            "random_score": {}, (1)
            "boost_mode":"multiply"
        }
    }
}
  1. See [score-functions] for a list of supported functions.

Furthermore, several functions can be combined. In this case one can optionally choose to apply the function only if a document matches a given filtering query:

GET /_search
{
    "query": {
        "function_score": {
          "query": { "match_all": {} },
          "boost": "5", (1)
          "functions": [
              {
                  "filter": { "match": { "test": "bar" } },
                  "random_score": {}, (2)
                  "weight": 23
              },
              {
                  "filter": { "match": { "test": "cat" } },
                  "weight": 42
              }
          ],
          "max_boost": 42,
          "score_mode": "max",
          "boost_mode": "multiply",
          "min_score" : 42
        }
    }
}
  1. Boost for the whole query.

  2. See [score-functions] for a list of supported functions.

Note
The scores produced by the filtering query of each function do not matter.

If no filter is given with a function, this is equivalent to specifying "match_all": {}.

First, each document is scored by the defined functions. The parameter score_mode specifies how the computed scores are combined:

multiply

scores are multiplied (default)

sum

scores are summed

avg

scores are averaged

first

the first function that has a matching filter is applied

max

maximum score is used

min

minimum score is used

Because scores can be on different scales (for example, between 0 and 1 for decay functions but arbitrary for field_value_factor) and also because sometimes a different impact of functions on the score is desirable, the score of each function can be adjusted with a user defined weight. The weight can be defined per function in the functions array (example above) and is multiplied with the score computed by the respective function. If weight is given without any other function declaration, weight acts as a function that simply returns the weight.

In case score_mode is set to avg the individual scores will be combined by a weighted average. For example, if two functions return score 1 and 2 and their respective weights are 3 and 4, then their scores will be combined as (1*3+2*4)/(3+4) and not (1*3+2*4)/2.

The new score can be restricted to not exceed a certain limit by setting the max_boost parameter. The default for max_boost is FLT_MAX.

The newly computed score is combined with the score of the query. The parameter boost_mode defines how:

multiply

query score and function score are multiplied (default)

replace

only function score is used, the query score is ignored

sum

query score and function score are added

avg

average

max

max of query score and function score

min

min of query score and function score

By default, modifying the score does not change which documents match. To exclude documents that do not meet a certain score threshold the min_score parameter can be set to the desired score threshold.

Note
For min_score to work, all documents returned by the query need to be scored and then filtered out one by one.

The function_score query provides several types of score functions.

Script score

The script_score function allows you to wrap another query and optionally customize its scoring with a computation derived from other numeric field values in the doc, using a script expression. Here is a simple example:

GET /_search
{
    "query": {
        "function_score": {
            "query": {
                "match": { "message": "elasticsearch" }
            },
            "script_score" : {
                "script" : {
                  "source": "Math.log(2 + doc['likes'].value)"
                }
            }
        }
    }
}
Important

In Elasticsearch, all document scores are positive 32-bit floating point numbers.

If the script_score function produces a score with greater precision, it is converted to the nearest 32-bit float.

Similarly, scores must be non-negative. Otherwise, Elasticsearch returns an error.

In addition to the different scripting field values and expressions, the _score script parameter can be used to retrieve the score based on the wrapped query.

Script compilation is cached for faster execution. If the script has parameters that it needs to take into account, it is preferable to reuse the same script and provide the parameters to it:

GET /_search
{
    "query": {
        "function_score": {
            "query": {
                "match": { "message": "elasticsearch" }
            },
            "script_score" : {
                "script" : {
                    "params": {
                        "a": 5,
                        "b": 1.2
                    },
                    "source": "params.a / Math.pow(params.b, doc['likes'].value)"
                }
            }
        }
    }
}

Note that unlike the custom_score query, the score of the query is multiplied with the result of the script scoring. If you wish to inhibit this, set "boost_mode": "replace".
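
For instance, a minimal sketch that makes the script result the only source of the score:

GET /_search
{
    "query": {
        "function_score": {
            "query": {
                "match": { "message": "elasticsearch" }
            },
            "script_score" : {
                "script" : {
                  "source": "Math.log(2 + doc['likes'].value)"
                }
            },
            "boost_mode": "replace"
        }
    }
}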

Weight

The weight score allows you to multiply the score by the provided weight. This can sometimes be desired since the boost value set on specific queries gets normalized, while for this score function it does not. The number value is of type float.

"weight" : number

Random

The random_score generates scores that are uniformly distributed from 0 up to but not including 1. By default, it uses the internal Lucene doc ids as a source of randomness, which is very efficient but unfortunately not reproducible since documents might be renumbered by merges.

In case you want scores to be reproducible, it is possible to provide a seed and field. The final score will then be computed based on this seed, the minimum value of field for the considered document and a salt that is computed based on the index name and shard id so that documents that have the same value but are stored in different indexes get different scores. Note that documents that are within the same shard and have the same value for field will however get the same score, so it is usually desirable to use a field that has unique values for all documents. A good default choice might be to use the _seq_no field, whose only drawback is that scores will change if the document is updated since update operations also update the value of the _seq_no field.

Note
It was possible to set a seed without setting a field, but this has been deprecated as this requires loading fielddata on the _id field which consumes a lot of memory.
GET /_search
{
    "query": {
        "function_score": {
            "random_score": {
                "seed": 10,
                "field": "_seq_no"
            }
        }
    }
}

Field Value factor

The field_value_factor function allows you to use a field from a document to influence the score. It’s similar to using the script_score function, however, it avoids the overhead of scripting. If used on a multi-valued field, only the first value of the field is used in calculations.

As an example, imagine you have a document indexed with a numeric likes field and wish to influence the score of a document with this field. An example of doing so would look like:

GET /_search
{
    "query": {
        "function_score": {
            "field_value_factor": {
                "field": "likes",
                "factor": 1.2,
                "modifier": "sqrt",
                "missing": 1
            }
        }
    }
}

This will translate into the following formula for scoring:

sqrt(1.2 * doc['likes'].value)

There are a number of options for the field_value_factor function:

field

Field to be extracted from the document.

factor

Optional factor to multiply the field value with, defaults to 1.

modifier

Modifier to apply to the field value, can be one of: none, log, log1p, log2p, ln, ln1p, ln2p, square, sqrt, or reciprocal. Defaults to none.

Modifier Meaning

none

Do not apply any multiplier to the field value

log

Take the common logarithm of the field value

log1p

Add 1 to the field value and take the common logarithm

log2p

Add 2 to the field value and take the common logarithm

ln

Take the natural logarithm of the field value

ln1p

Add 1 to the field value and take the natural logarithm

ln2p

Add 2 to the field value and take the natural logarithm

square

Square the field value (multiply it by itself)

sqrt

Take the square root of the field value

reciprocal

Reciprocate the field value, same as 1/x where x is the field’s value

missing

Value used if the document doesn’t have that field. The modifier and factor are still applied to it as though it were read from the document.

Keep in mind that taking the log() of 0, or the square root of a negative number, is an illegal operation, and an exception will be thrown. Be sure to limit the values of the field with a range filter to avoid this, or use log1p and ln1p.
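
For instance, a minimal sketch that guards against invalid values by restricting the field with a range query and using the log1p modifier:

GET /_search
{
    "query": {
        "function_score": {
            "query": { "range": { "likes": { "gte": 0 } } },
            "field_value_factor": {
                "field": "likes",
                "modifier": "log1p",
                "missing": 1
            }
        }
    }
}
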
Warning
Scores produced by the field_value_factor function must be non-negative, otherwise a deprecation warning will be issued.

Decay functions

Decay functions score a document with a function that decays depending on the distance of a numeric field value of the document from a user given origin. This is similar to a range query, but with smooth edges instead of boxes.

To use distance scoring on a query that has numerical fields, the user has to define an origin and a scale for each field. The origin is needed to define the "central point" from which the distance is calculated, and the scale to define the rate of decay. The decay function is specified as:

"DECAY_FUNCTION": { (1)
    "FIELD_NAME": { (2)
          "origin": "11, 12",
          "scale": "2km",
          "offset": "0km",
          "decay": 0.33
    }
}
  1. The DECAY_FUNCTION should be one of linear, exp, or gauss.

  2. The specified field must be a numeric, date, or geo-point field.

In the above example, the field is a geo_point and origin can be provided in geo format. scale and offset must be given with a unit in this case. If your field is a date field, you can set scale and offset as days, weeks, and so on. Example:

GET /_search
{
    "query": {
        "function_score": {
            "gauss": {
                "date": {
                      "origin": "2013-09-17", (1)
                      "scale": "10d",
                      "offset": "5d", (2)
                      "decay" : 0.5 (2)
                }
            }
        }
    }
}
  1. The date format of the origin depends on the format defined in your mapping. If you do not define the origin, the current time is used.

  2. The offset and decay parameters are optional.

origin

The point of origin used for calculating distance. Must be given as a number for numeric fields, a date for date fields, and a geo point for geo fields. Required for geo and numeric fields. For date fields the default is now. Date math (for example now-1h) is supported for origin.

scale

Required for all types. Defines the distance from origin + offset at which the computed score will equal the decay parameter. For geo fields: can be defined as number+unit (1km, 12m, …). Default unit is meters. For date fields: can be defined as number+unit ("1h", "10d", …). Default unit is milliseconds. For numeric fields: any number.

offset

If an offset is defined, the decay function will only compute the decay function for documents with a distance greater than the defined offset. The default is 0.

decay

The decay parameter defines how documents are scored at the distance given at scale. If no decay is defined, documents at the distance scale will be scored 0.5.

In the first example, your documents might represent hotels and contain a geo location field. You want to compute a decay function depending on how far the hotel is from a given location. You might not immediately see what scale to choose for the gauss function, but you can say something like: "At a distance of 2km from the desired location, the score should be reduced to one third." The parameter "scale" will then be adjusted automatically to assure that the score function computes a score of 0.33 for hotels that are 2km away from the desired location.

In the second example, documents with a field value between 2013-09-12 and 2013-09-22 would get a weight of 1.0 and documents which are 15 days from that date a weight of 0.5.

Supported decay functions

The DECAY_FUNCTION determines the shape of the decay:

gauss

Normal decay, computed as:

S(doc) = exp(-(max(0, |fieldvalue - origin| - offset))^2 / (2 * sigma^2))

where sigma is computed to assure that the score takes the value decay at distance scale from origin±offset:

sigma^2 = -scale^2 / (2 * ln(decay))

See Normal decay, keyword gauss for graphs demonstrating the curve generated by the gauss function.

exp

Exponential decay, computed as:

S(doc) = exp(lambda * max(0, |fieldvalue - origin| - offset))

where again the parameter lambda is computed to assure that the score takes the value decay at distance scale from origin±offset:

lambda = ln(decay) / scale

See Exponential decay, keyword exp for graphs demonstrating the curve generated by the exp function.

linear

Linear decay, computed as:

S(doc) = max((s - max(0, |fieldvalue - origin| - offset)) / s, 0)

where again the parameter s is computed to assure that the score takes the value decay at distance scale from origin±offset:

s = scale / (1 - decay)

In contrast to the normal and exponential decay, this function actually sets the score to 0 if the field value exceeds twice the user given scale value.

For single functions, the three decay functions together with their parameters can be visualized like this (the field in this example is called "age"):

(figure omitted: plots of the linear, exp, and gauss decay curves over the "age" field)

Multi-valued fields

If a field used for computing the decay contains multiple values, by default the value closest to the origin is chosen for determining the distance. This can be changed by setting multi_value_mode.

min

Distance is the minimum distance

max

Distance is the maximum distance

avg

Distance is the average distance

sum

Distance is the sum of all distances

Example:

    "DECAY_FUNCTION": {
        "FIELD_NAME": {
              "origin": ...,
              "scale": ...
        },
        "multi_value_mode": "avg"
    }

Detailed example

Suppose you are searching for a hotel in a certain town. Your budget is limited. Also, you would like the hotel to be close to the town center, so the farther the hotel is from the desired location the less likely you are to check in.

You would like the query results that match your criterion (for example, "hotel, Nancy, non-smoker") to be scored with respect to distance to the town center and also the price.

Intuitively, you would like to define the town center as the origin and maybe you are willing to walk 2km to the town center from the hotel.
In this case your origin for the location field is the town center and the scale is ~2km.

If your budget is low, you would probably prefer something cheap above something expensive. For the price field, the origin would be 0 Euros and the scale depends on how much you are willing to pay, for example 20 Euros.

In this example, the fields might be called "price" for the price of the hotel and "location" for the coordinates of this hotel.

The function for price in this case would be

"gauss": { (1)
    "price": {
          "origin": "0",
          "scale": "20"
    }
}
  1. This decay function could also be linear or exp.

and for location:

"gauss": { (1)
    "location": {
          "origin": "11, 12",
          "scale": "2km"
    }
}
  1. This decay function could also be linear or exp.

Suppose you want to multiply these two functions on the original score; the request would then look like this:

GET /_search
{
    "query": {
        "function_score": {
          "functions": [
            {
              "gauss": {
                "price": {
                  "origin": "0",
                  "scale": "20"
                }
              }
            },
            {
              "gauss": {
                "location": {
                  "origin": "11, 12",
                  "scale": "2km"
                }
              }
            }
          ],
          "query": {
            "match": {
              "properties": "balcony"
            }
          },
          "score_mode": "multiply"
        }
    }
}

Next, we show what the computed score looks like for each of the three possible decay functions.

Normal decay, keyword gauss

When choosing gauss as the decay function in the above example, the contour and surface plot of the multiplier looks like this:

(figures omitted: contour and surface plot of the multiplier)

Suppose your original search results match three hotels:

  • "Backback Nap"

  • "Drink n Drive"

  • "BnB Bellevue".

"Drink n Drive" is pretty far from your defined location (nearly 2 km) and is not too cheap (about 13 Euros) so it gets a low factor a factor of 0.56. "BnB Bellevue" and "Backback Nap" are both pretty close to the defined location but "BnB Bellevue" is cheaper, so it gets a multiplier of 0.86 whereas "Backpack Nap" gets a value of 0.66.

Exponential decay, keyword exp

When choosing exp as the decay function in the above example, the contour and surface plot of the multiplier looks like this:

(figures omitted: contour and surface plot of the multiplier)

Linear decay, keyword linear

When choosing linear as the decay function in the above example, the contour and surface plot of the multiplier looks like this:

[Figure: contour and surface plots of the linear multiplier]

Supported fields for decay functions

Only numeric, date, and geo-point fields are supported.

What if a field is missing?

If the numeric field is missing in the document, the function will return 1.

Boosting Query

The boosting query can be used to effectively demote results that match a given query. Unlike the "NOT" clause in bool query, this still selects documents that contain undesirable terms, but reduces their overall score.

GET /_search
{
    "query": {
        "boosting" : {
            "positive" : {
                "term" : {
                    "field1" : "value1"
                }
            },
            "negative" : {
                 "term" : {
                     "field2" : "value2"
                }
            },
            "negative_boost" : 0.2
        }
    }
}

Joining queries

Performing full SQL-style joins in a distributed system like Elasticsearch is prohibitively expensive. Instead, Elasticsearch offers two forms of join which are designed to scale horizontally.

nested query

Documents may contain fields of type nested. These fields are used to index arrays of objects, where each object can be queried (with the nested query) as an independent document.

has_child and has_parent queries

A join field relationship can exist between documents within a single index. The has_child query returns parent documents whose child documents match the specified query, while the has_parent query returns child documents whose parent document matches the specified query.

Also see the terms-lookup mechanism in the terms query, which allows you to build a terms query from values contained in another document.
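
As an illustration, the following is a minimal sketch of a terms lookup; the users index, document id 2, and its followers array field are all hypothetical:

GET /_search
{
    "query": {
        "terms": {
            "user": {
                "index": "users",
                "type": "_doc",
                "id": "2",
                "path": "followers"
            }
        }
    }
}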

Nested Query

Wraps another query to search nested fields.

The nested query searches nested field objects as if they were indexed as separate documents. If an object matches the search, the nested query returns the root parent document.

Example request

Index setup

To use the nested query, your index must include a nested field mapping. For example:

PUT /my_index
{
    "mappings": {
        "_doc" : {
            "properties" : {
                "obj1" : {
                    "type" : "nested"
                }
            }
        }
    }
}
Example query
GET /my_index/_search
{
    "query": {
        "nested" : {
            "path" : "obj1",
            "query" : {
                "bool" : {
                    "must" : [
                        { "match" : {"obj1.name" : "blue"} },
                        { "range" : {"obj1.count" : {"gt" : 5}} }
                    ]
                }
            },
            "score_mode" : "avg"
        }
    }
}

Top-level parameters for nested

path

(Required, string) Path to the nested object you wish to search.

query

(Required, query object) Query you wish to run on nested objects in the path. If an object matches the search, the nested query returns the root parent document.

You can search nested fields using dot notation that includes the complete path, such as obj1.name.

Multi-level nesting is automatically supported and detected: if an inner nested query exists within another nested query, it automatically matches the relevant nesting level rather than the root.
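
As a minimal sketch (the driver and driver.vehicle nested mapping assumed here is hypothetical), an inner nested query matches at its own level:

GET /my_index/_search
{
    "query": {
        "nested" : {
            "path" : "driver",
            "query" : {
                "nested" : {
                    "path" : "driver.vehicle",
                    "query" : {
                        "match" : { "driver.vehicle.make" : "toyota" }
                    }
                }
            }
        }
    }
}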

score_mode

(Optional, string) Indicates how scores for matching child objects affect the root parent document’s relevance score. Valid values are:

avg (Default)

Use the mean relevance score of all matching child objects.

max

Uses the highest relevance score of all matching child objects.

min

Uses the lowest relevance score of all matching child objects.

none

Do not use the relevance scores of matching child objects. The query assigns parent documents a score of 0.

sum

Add together the relevance scores of all matching child objects.

ignore_unmapped

(Optional, boolean) Indicates whether to ignore an unmapped path and not return any documents instead of an error. Defaults to false.

If false, {es} returns an error if the path is an unmapped field.

You can use this parameter to query multiple indices that may not contain the field path.
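
For example, a sketch of the nested query above with ignore_unmapped enabled (my_other_index is hypothetical and need not map obj1):

GET /my_index,my_other_index/_search
{
    "query": {
        "nested" : {
            "path" : "obj1",
            "ignore_unmapped" : true,
            "query" : {
                "match" : { "obj1.name" : "blue" }
            }
        }
    }
}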

Has Child Query

The has_child filter accepts a query and the child type to run against, and results in parent documents that have child docs matching the query. Here is an example:

GET /_search
{
    "query": {
        "has_child" : {
            "type" : "blog_tag",
            "query" : {
                "term" : {
                    "tag" : "something"
                }
            }
        }
    }
}

Note that the has_child is a slow query compared to other queries in the query DSL because it performs a join. The performance degrades as the number of matching child documents pointing to unique parent documents increases. If you care about query performance you should not use this query. However, if you do happen to use this query, use it as little as possible. Each has_child query that gets added to a search request can increase query time significantly.

Scoring capabilities

The has_child also has scoring support. The supported score modes are min, max, sum, avg or none. The default is none and yields the same behaviour as in previous versions. If the score mode is set to another value than none, the scores of all the matching child documents are aggregated into the associated parent documents. The score type can be specified with the score_mode field inside the has_child query:

GET /_search
{
    "query": {
        "has_child" : {
            "type" : "blog_tag",
            "score_mode" : "min",
            "query" : {
                "term" : {
                    "tag" : "something"
                }
            }
        }
    }
}

Min/Max Children

The has_child query allows you to specify that a minimum and/or maximum number of children are required to match for the parent doc to be considered a match:

GET /_search
{
    "query": {
        "has_child" : {
            "type" : "blog_tag",
            "score_mode" : "min",
            "min_children": 2, (1)
            "max_children": 10, (1)
            "query" : {
                "term" : {
                    "tag" : "something"
                }
            }
        }
    }
}
  1. Both min_children and max_children are optional.

The min_children and max_children parameters can be combined with the score_mode parameter.

Ignore Unmapped

When set to true the ignore_unmapped option will ignore an unmapped type and will not match any documents for this query. This can be useful when querying multiple indexes which might have different mappings. When set to false (the default value) the query will throw an exception if the type is not mapped.

Sorting

Parent documents can’t be sorted by fields in matching child documents via the regular sort options. If you need to sort parent document by field in the child documents then you should use the function_score query and then just sort by _score.

Sorting blogs by child documents' click_count field:

GET /_search
{
    "query": {
        "has_child" : {
            "type" : "blog_tag",
            "score_mode" : "max",
            "query" : {
                "function_score" : {
                    "script_score": {
                        "script": "_score * doc['click_count'].value"
                    }
                }
            }
        }
    }
}

Has Parent Query

The has_parent query accepts a query and a parent type. The query is executed in the parent document space, which is specified by the parent type. This query returns child documents whose associated parents have matched. Otherwise, the has_parent query has the same options and works in the same manner as the has_child query.

GET /_search
{
    "query": {
        "has_parent" : {
            "parent_type" : "blog",
            "query" : {
                "term" : {
                    "tag" : "something"
                }
            }
        }
    }
}

Note that the has_parent is a slow query compared to other queries in the query DSL because it performs a join. The performance degrades as the number of matching parent documents increases. If you care about query performance you should not use this query. However, if you do happen to use this query, use it as little as possible. Each has_parent query that gets added to a search request can increase query time significantly.

Scoring capabilities

The has_parent also has scoring support. The default is false which ignores the score from the parent document. The score is in this case equal to the boost on the has_parent query (Defaults to 1). If the score is set to true, then the score of the matching parent document is aggregated into the child documents belonging to the matching parent document. The score mode can be specified with the score field inside the has_parent query:

GET /_search
{
    "query": {
        "has_parent" : {
            "parent_type" : "blog",
            "score" : true,
            "query" : {
                "term" : {
                    "tag" : "something"
                }
            }
        }
    }
}

Ignore Unmapped

When set to true the ignore_unmapped option will ignore an unmapped type and will not match any documents for this query. This can be useful when querying multiple indexes which might have different mappings. When set to false (the default value) the query will throw an exception if the type is not mapped.

Sorting

Child documents can’t be sorted by fields in matching parent documents via the regular sort options. If you need to sort child documents by field in the parent documents then you should use the function_score query and then just sort by _score.

Sorting tags by the parent documents' view_count field:

GET /_search
{
    "query": {
        "has_parent" : {
            "parent_type" : "blog",
            "score" : true,
            "query" : {
                "function_score" : {
                    "script_score": {
                        "script": "_score * doc['view_count'].value"
                    }
                }
            }
        }
    }
}

Parent Id Query

The parent_id query can be used to find child documents which belong to a particular parent. Given the following mapping definition:

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "my_join_field": {
          "type": "join",
          "relations": {
            "my_parent": "my_child"
          }
        }
      }
    }
  }
}

PUT my_index/_doc/1?refresh
{
  "text": "This is a parent document",
  "my_join_field": "my_parent"
}

PUT my_index/_doc/2?routing=1&refresh
{
  "text": "This is a child document",
  "my_join_field": {
    "name": "my_child",
    "parent": "1"
  }
}
GET /my_index/_search
{
  "query": {
    "parent_id": {
      "type": "my_child",
      "id": "1"
    }
  }
}

Parameters

This query has two required parameters:

type

The child type name, as specified in the join field.

id

The ID of the parent document.

ignore_unmapped

When set to true this will ignore an unmapped type and will not match any documents for this query. This can be useful when querying multiple indexes which might have different mappings. When set to false (the default value) the query will throw an exception if the type is not mapped.

Geo queries

Elasticsearch supports two types of geo data: geo_point fields which support lat/lon pairs, and geo_shape fields, which support points, lines, circles, polygons, multi-polygons, etc.

The queries in this group are:

geo_shape query

Finds documents with geo-shapes which either intersect, are contained by, or do not intersect with the specified geo-shape.

geo_bounding_box query

Finds documents with geo-points that fall into the specified rectangle.

geo_distance query

Finds documents with geo-points within the specified distance of a central point.

geo_polygon query

Find documents with geo-points within the specified polygon.

GeoShape Query

Filter documents indexed using the geo_shape type.

Requires the geo_shape Mapping.

The geo_shape query uses the same grid square representation as the geo_shape mapping to find documents that have a shape that intersects with the query shape. It will also use the same Prefix Tree configuration as defined for the field mapping.

The query supports two ways of defining the query shape, either by providing a whole shape definition, or by referencing the name of a shape pre-indexed in another index. Both formats are defined below with examples.

Inline Shape Definition

Similar to the geo_shape type, the geo_shape query uses GeoJSON to represent shapes.

Given the following index:

PUT /example
{
    "mappings": {
        "_doc": {
            "properties": {
                "location": {
                    "type": "geo_shape"
                }
            }
        }
    }
}

POST /example/_doc?refresh
{
    "name": "Wind & Wetter, Berlin, Germany",
    "location": {
        "type": "point",
        "coordinates": [13.400544, 52.530286]
    }
}

The following query will find the point using Elasticsearch’s envelope GeoJSON extension:

GET /example/_search
{
    "query":{
        "bool": {
            "must": {
                "match_all": {}
            },
            "filter": {
                "geo_shape": {
                    "location": {
                        "shape": {
                            "type": "envelope",
                            "coordinates" : [[13.0, 53.0], [14.0, 52.0]]
                        },
                        "relation": "within"
                    }
                }
            }
        }
    }
}

Pre-Indexed Shape

The query also supports using a shape which has already been indexed in another index and/or index type. This is particularly useful when you have a pre-defined list of shapes which are useful to your application and you want to reference them using a logical name (for example 'New Zealand') rather than having to provide their coordinates each time. In this situation it is only necessary to provide:

  • id - The ID of the document that contains the pre-indexed shape.

  • index - Name of the index where the pre-indexed shape is. Defaults to 'shapes'.

  • type - Index type where the pre-indexed shape is.

  • path - The field specified as path containing the pre-indexed shape. Defaults to 'shape'.

  • routing - The routing of the shape document if required.

The following is an example of using the Filter with a pre-indexed shape:

PUT /shapes
{
    "mappings": {
        "_doc": {
            "properties": {
                "location": {
                    "type": "geo_shape"
                }
            }
        }
    }
}

PUT /shapes/_doc/deu
{
    "location": {
        "type": "envelope",
        "coordinates" : [[13.0, 53.0], [14.0, 52.0]]
    }
}

GET /example/_search
{
    "query": {
        "bool": {
            "filter": {
                "geo_shape": {
                    "location": {
                        "indexed_shape": {
                            "index": "shapes",
                            "type": "_doc",
                            "id": "deu",
                            "path": "location"
                        }
                    }
                }
            }
        }
    }
}

Spatial Relations

The geo_shape strategy mapping parameter determines which spatial relation operators may be used at search time.

The following is a complete list of spatial relation operators available:

  • INTERSECTS - (default) Return all documents whose geo_shape field intersects the query geometry.

  • DISJOINT - Return all documents whose geo_shape field has nothing in common with the query geometry.

  • WITHIN - Return all documents whose geo_shape field is within the query geometry.

  • CONTAINS - Return all documents whose geo_shape field contains the query geometry. Note: this is only supported using the recursive Prefix Tree Strategy (deprecated in 6.6).
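
For example, reusing the example index above, a sketch that switches the operator via the relation parameter:

GET /example/_search
{
    "query": {
        "bool": {
            "filter": {
                "geo_shape": {
                    "location": {
                        "shape": {
                            "type": "envelope",
                            "coordinates" : [[13.0, 53.0], [14.0, 52.0]]
                        },
                        "relation": "disjoint"
                    }
                }
            }
        }
    }
}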

Ignore Unmapped

When set to true the ignore_unmapped option will ignore an unmapped field and will not match any documents for this query. This can be useful when querying multiple indexes which might have different mappings. When set to false (the default value) the query will throw an exception if the field is not mapped.

Geo Bounding Box Query

A query that filters hits based on a point location, using a bounding box. Assuming the following indexed document:

PUT /my_locations
{
    "mappings": {
        "_doc": {
            "properties": {
                "pin": {
                    "properties": {
                        "location": {
                            "type": "geo_point"
                        }
                    }
                }
            }
        }
    }
}

PUT /my_locations/_doc/1
{
    "pin" : {
        "location" : {
            "lat" : 40.12,
            "lon" : -71.34
        }
    }
}

Then the following simple query can be executed with a geo_bounding_box filter:

GET /_search
{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_bounding_box" : {
                    "pin.location" : {
                        "top_left" : {
                            "lat" : 40.73,
                            "lon" : -74.1
                        },
                        "bottom_right" : {
                            "lat" : 40.01,
                            "lon" : -71.12
                        }
                    }
                }
            }
        }
    }
}

Query Options


_name

Optional name field to identify the filter

validation_method

Set to IGNORE_MALFORMED to accept geo points with invalid latitude or longitude, or set to COERCE to also try to infer the correct latitude or longitude (default is STRICT).

type

Set to one of indexed or memory to define whether this filter will be executed in memory or against the index. See Type below for further details. Default is memory.
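
Putting these options together, a sketch (the filter name my_bbox is hypothetical):

GET /_search
{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_bounding_box" : {
                    "pin.location" : {
                        "top_left" : {
                            "lat" : 40.73,
                            "lon" : -74.1
                        },
                        "bottom_right" : {
                            "lat" : 40.01,
                            "lon" : -71.12
                        }
                    },
                    "validation_method" : "COERCE",
                    "_name" : "my_bbox"
                }
            }
        }
    }
}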

Accepted Formats

Just as the geo_point type can accept different representations of the geo point, the filter can accept them as well:

Lat Lon As Properties
GET /_search
{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_bounding_box" : {
                    "pin.location" : {
                        "top_left" : {
                            "lat" : 40.73,
                            "lon" : -74.1
                        },
                        "bottom_right" : {
                            "lat" : 40.01,
                            "lon" : -71.12
                        }
                    }
                }
            }
        }
    }
}
Lat Lon As Array

Format in [lon, lat]. Note the order of lon/lat here, which conforms with GeoJSON.

GET /_search
{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_bounding_box" : {
                    "pin.location" : {
                        "top_left" : [-74.1, 40.73],
                        "bottom_right" : [-71.12, 40.01]
                    }
                }
            }
        }
    }
}
Lat Lon As String

Format in lat,lon.

GET /_search
{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_bounding_box" : {
                    "pin.location" : {
                        "top_left" : "40.73, -74.1",
                        "bottom_right" : "40.01, -71.12"
                    }
                }
            }
        }
    }
}
Bounding Box as Well-Known Text (WKT)
GET /_search
{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_bounding_box" : {
                    "pin.location" : {
                        "wkt" : "BBOX (-74.1, -71.12, 40.73, 40.01)"
                    }
                }
            }
        }
    }
}
Geohash
GET /_search
{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_bounding_box" : {
                    "pin.location" : {
                        "top_left" : "dr5r9ydj2y73",
                        "bottom_right" : "drj7teegpus6"
                    }
                }
            }
        }
    }
}

Vertices

The vertices of the bounding box can either be set by top_left and bottom_right or by top_right and bottom_left parameters. Moreover, the names topLeft, bottomRight, topRight and bottomLeft are supported. Instead of setting the values pairwise, one can use the simple names top, left, bottom and right to set the values separately.

GET /_search
{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_bounding_box" : {
                    "pin.location" : {
                        "top" : 40.73,
                        "left" : -74.1,
                        "bottom" : 40.01,
                        "right" : -71.12
                    }
                }
            }
        }
    }
}

geo_point Type

The filter requires the geo_point type to be set on the relevant field.

Multi Location Per Document

The filter can work with multiple locations / points per document. Once a single location / point matches the filter, the document will be included in the results.
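
For instance, a sketch of a document with an array of points (document id 2 and its coordinates are hypothetical); it matches as soon as any one point falls inside the box:

PUT /my_locations/_doc/2
{
    "pin" : {
        "location" : [
            { "lat" : 40.12, "lon" : -71.34 },
            { "lat" : 41.12, "lon" : -71.34 }
        ]
    }
}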

Type

The type of the bounding box execution is set to memory by default, which means the check whether a doc falls within the bounding box range is done in memory. In some cases, an indexed option will perform faster (but note that the geo_point type must have lat and lon indexed in this case). Note, when using the indexed option, multiple locations per document are not supported. Here is an example:

GET /_search
{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_bounding_box" : {
                    "pin.location" : {
                        "top_left" : {
                            "lat" : 40.73,
                            "lon" : -74.1
                        },
                        "bottom_right" : {
                            "lat" : 40.10,
                            "lon" : -71.12
                        }
                    },
                    "type" : "indexed"
                }
            }
        }
    }
}

Ignore Unmapped

When set to true the ignore_unmapped option will ignore an unmapped field and will not match any documents for this query. This can be useful when querying multiple indexes which might have different mappings. When set to false (the default value) the query will throw an exception if the field is not mapped.

Notes on Precision

Geopoints have limited precision and are always rounded down during index time. During query time, upper boundaries of the bounding boxes are rounded down, while lower boundaries are rounded up. As a result, points along the lower bounds (bottom and left edges of the bounding box) might not make it into the bounding box due to the rounding error. At the same time, points along the upper bounds (top and right edges) might be selected by the query even if they are located slightly outside the edge. The rounding error should be less than 4.20e-8 degrees on the latitude and less than 8.39e-8 degrees on the longitude, which translates to less than 1cm error even at the equator.

Geo Distance Query

Filters documents to include only hits that exist within a specific distance from a geo point. Assuming the following mapping and indexed document:

PUT /my_locations
{
    "mappings": {
        "_doc": {
            "properties": {
                "pin": {
                    "properties": {
                        "location": {
                            "type": "geo_point"
                        }
                    }
                }
            }
        }
    }
}

PUT /my_locations/_doc/1
{
    "pin" : {
        "location" : {
            "lat" : 40.12,
            "lon" : -71.34
        }
    }
}

Then the following simple query can be executed with a geo_distance filter:

GET /my_locations/_search
{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_distance" : {
                    "distance" : "200km",
                    "pin.location" : {
                        "lat" : 40,
                        "lon" : -70
                    }
                }
            }
        }
    }
}

Accepted Formats

Just as the geo_point type can accept different representations of the geo point, the filter can accept them as well:

Lat Lon As Properties
GET /my_locations/_search
{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_distance" : {
                    "distance" : "12km",
                    "pin.location" : {
                        "lat" : 40,
                        "lon" : -70
                    }
                }
            }
        }
    }
}
Lat Lon As Array

Format in [lon, lat]. Note the order of lon/lat here, which conforms with GeoJSON.

GET /my_locations/_search
{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_distance" : {
                    "distance" : "12km",
                    "pin.location" : [-70, 40]
                }
            }
        }
    }
}
Lat Lon As String

Format in lat,lon.

GET /my_locations/_search
{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_distance" : {
                    "distance" : "12km",
                    "pin.location" : "40,-70"
                }
            }
        }
    }
}
Geohash
GET /my_locations/_search
{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_distance" : {
                    "distance" : "12km",
                    "pin.location" : "drm3btev3e86"
                }
            }
        }
    }
}

Options

The following are options allowed on the filter:

distance

The radius of the circle centred on the specified location. Points which fall into this circle are considered to be matches. The distance can be specified in various units; see the distance units documentation.

distance_type

How to compute the distance. Can either be arc (default), or plane (faster, but inaccurate on long distances and close to the poles).

_name

Optional name field to identify the query

validation_method

Set to IGNORE_MALFORMED to accept geo points with invalid latitude or longitude, set to COERCE to additionally try and infer correct coordinates (default is STRICT).
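
Combining these options, a sketch (the query name near_pin is hypothetical):

GET /my_locations/_search
{
    "query": {
        "bool" : {
            "filter" : {
                "geo_distance" : {
                    "distance" : "12km",
                    "distance_type" : "plane",
                    "_name" : "near_pin",
                    "pin.location" : {
                        "lat" : 40,
                        "lon" : -70
                    }
                }
            }
        }
    }
}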

geo_point Type

The filter requires the geo_point type to be set on the relevant field.

Multi Location Per Document

The geo_distance filter can work with multiple locations / points per document. Once a single location / point matches the filter, the document will be included in the filter.

Ignore Unmapped

When set to true the ignore_unmapped option will ignore an unmapped field and will not match any documents for this query. This can be useful when querying multiple indexes which might have different mappings. When set to false (the default value) the query will throw an exception if the field is not mapped.

Geo Polygon Query

A query returning only hits that fall within a polygon of points. Here is an example:

GET /_search
{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_polygon" : {
                    "person.location" : {
                        "points" : [
                            {"lat" : 40, "lon" : -70},
                            {"lat" : 30, "lon" : -80},
                            {"lat" : 20, "lon" : -90}
                        ]
                    }
                }
            }
        }
    }
}

Query Options


_name

Optional name field to identify the filter

validation_method

Set to IGNORE_MALFORMED to accept geo points with invalid latitude or longitude, COERCE to try and infer correct latitude or longitude, or STRICT to reject them (the default).

Allowed Formats

Lat Long as Array

Format as [lon, lat]

Note: the order of lon/lat here must conform with GeoJSON.

GET /_search
{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
                "geo_polygon" : {
                    "person.location" : {
                        "points" : [
                            [-70, 40],
                            [-80, 30],
                            [-90, 20]
                        ]
                    }
                }
            }
        }
    }
}
Lat Lon as String

Format in lat,lon.

GET /_search
{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
               "geo_polygon" : {
                    "person.location" : {
                        "points" : [
                            "40, -70",
                            "30, -80",
                            "20, -90"
                        ]
                    }
                }
            }
        }
    }
}
Geohash
GET /_search
{
    "query": {
        "bool" : {
            "must" : {
                "match_all" : {}
            },
            "filter" : {
               "geo_polygon" : {
                    "person.location" : {
                        "points" : [
                            "drn5x1g8cu2y",
                            "30, -80",
                            "20, -90"
                        ]
                    }
                }
            }
        }
    }
}

geo_point Type

The query requires the geo_point type to be set on the relevant field.

Ignore Unmapped

When set to true the ignore_unmapped option will ignore an unmapped field and will not match any documents for this query. This can be useful when querying multiple indexes which might have different mappings. When set to false (the default value) the query will throw an exception if the field is not mapped.

Specialized queries

This group contains queries which do not fit into the other groups:

more_like_this query

This query finds documents which are similar to the specified text, document, or collection of documents.

script query

This query allows a script to act as a filter. Also see the function_score query.

percolate query

This query finds queries that are stored as documents that match with the specified document.

wrapper query

A query that accepts other queries as a json or yaml string.

More Like This Query

The More Like This Query finds documents that are "like" a given set of documents. In order to do so, MLT selects a set of representative terms of these input documents, forms a query using these terms, executes the query and returns the results. The user controls the input documents, how the terms should be selected and how the query is formed.

The simplest use case consists of asking for documents that are similar to a provided piece of text. Here, we are asking for all movies that have some text similar to "Once upon a time" in their "title" and in their "description" fields, limiting the number of selected terms to 12.

GET /_search
{
    "query": {
        "more_like_this" : {
            "fields" : ["title", "description"],
            "like" : "Once upon a time",
            "min_term_freq" : 1,
            "max_query_terms" : 12
        }
    }
}

A more complicated use case consists of mixing texts with documents already existing in the index. In this case, the syntax to specify a document is similar to the one used in the Multi GET API.

GET /_search
{
    "query": {
        "more_like_this" : {
            "fields" : ["title", "description"],
            "like" : [
            {
                "_index" : "imdb",
                "_type" : "movies",
                "_id" : "1"
            },
            {
                "_index" : "imdb",
                "_type" : "movies",
                "_id" : "2"
            },
            "and potentially some more text here as well"
            ],
            "min_term_freq" : 1,
            "max_query_terms" : 12
        }
    }
}

Finally, users can mix some texts, a chosen set of documents but also provide documents not necessarily present in the index. To provide documents not present in the index, the syntax is similar to artificial documents.

GET /_search
{
    "query": {
        "more_like_this" : {
            "fields" : ["name.first", "name.last"],
            "like" : [
            {
                "_index" : "marvel",
                "_type" : "quotes",
                "doc" : {
                    "name": {
                        "first": "Ben",
                        "last": "Grimm"
                    },
                    "_doc": "You got no idea what I'd... what I'd give to be invisible."
                  }
            },
            {
                "_index" : "marvel",
                "_type" : "quotes",
                "_id" : "2"
            }
            ],
            "min_term_freq" : 1,
            "max_query_terms" : 12
        }
    }
}

How it Works

Suppose we wanted to find all documents similar to a given input document. Obviously, the input document itself should be its best match for that type of query. And the reason would be mostly, according to the Lucene scoring formula, due to the terms with the highest tf-idf. Therefore, the terms of the input document that have the highest tf-idf are good representatives of that document, and could be used within a disjunctive query (or OR) to retrieve similar documents. The MLT query simply extracts the text from the input document, analyzes it, usually using the same analyzer as the field, then selects the top K terms with the highest tf-idf to form a disjunctive query of these terms.

Important
The fields on which to perform MLT must be indexed and of type text or keyword. Additionally, when using like with documents, either _source must be enabled or the fields must be stored or store term_vector. In order to speed up analysis, it could help to store term vectors at index time.

For example, if we wish to perform MLT on the "title" and "tags.raw" fields, we can explicitly store their term_vector at index time. We can still perform MLT on the "description" and "tags" fields, as _source is enabled by default, but there will be no speed up on analysis for these fields.

PUT /imdb
{
    "mappings": {
        "movies": {
            "properties": {
                "title": {
                    "type": "text",
                    "term_vector": "yes"
                },
                "description": {
                    "type": "text"
                },
                "tags": {
                    "type": "text",
                    "fields" : {
                        "raw": {
                            "type" : "text",
                            "analyzer": "keyword",
                            "term_vector" : "yes"
                        }
                    }
                }
            }
        }
    }
}

Parameters

The only required parameter is like; all other parameters have sensible defaults. There are three types of parameters: one to specify the document input, one for term selection, and one for query formation.

Document Input Parameters

like

The only required parameter of the MLT query is like, which follows a versatile syntax in which the user can specify free form text and/or a single or multiple documents (see examples above). The syntax to specify documents is similar to the one used by the Multi GET API. When specifying documents, the text is fetched from fields unless overridden in each document request. The text is analyzed by the analyzer of the field, but could also be overridden. The syntax to override the analyzer of the field is similar to the per_field_analyzer parameter of the Term Vectors API. Additionally, to provide documents not necessarily present in the index, artificial documents are also supported.

unlike

The unlike parameter is used in conjunction with like in order not to select terms found in a chosen set of documents. In other words, we could ask for documents like: "Apple", but unlike: "cake crumble tree". The syntax is the same as like.
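
A minimal sketch of that example (the description field is assumed):

GET /_search
{
    "query": {
        "more_like_this" : {
            "fields" : ["description"],
            "like" : "Apple",
            "unlike" : "cake crumble tree",
            "min_term_freq" : 1,
            "max_query_terms" : 12
        }
    }
}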

fields

A list of fields to fetch and analyze the text from. Defaults to the _all field for free text and to all possible fields for document inputs.

like_text

The text to find documents like it.

ids or docs

A list of documents following the same syntax as the Multi GET API.

Term Selection Parameters

max_query_terms

The maximum number of query terms that will be selected. Increasing this value gives greater accuracy at the expense of query execution speed. Defaults to 25.

min_term_freq

The minimum term frequency below which the terms will be ignored from the input document. Defaults to 2.

min_doc_freq

The minimum document frequency below which the terms will be ignored from the input document. Defaults to 5.

max_doc_freq

The maximum document frequency above which the terms will be ignored from the input document. This could be useful in order to ignore highly frequent words such as stop words. Defaults to unbounded (0).

min_word_length

The minimum word length below which the terms will be ignored. The old name min_word_len is deprecated. Defaults to 0.

max_word_length

The maximum word length above which the terms will be ignored. The old name max_word_len is deprecated. Defaults to unbounded (0).

stop_words

An array of stop words. Any word in this set is considered "uninteresting" and ignored. If the analyzer allows for stop words, you might want to tell MLT to explicitly ignore them, as for the purposes of document similarity it seems reasonable to assume that "a stop word is never interesting".

analyzer

The analyzer that is used to analyze the free form text. Defaults to the analyzer associated with the first field in fields.
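
As a sketch combining a few of these term selection parameters (the stop word list and the choice of the built-in english analyzer are assumptions for illustration):

GET /_search
{
    "query": {
        "more_like_this" : {
            "fields" : ["title", "description"],
            "like" : "Once upon a time",
            "min_term_freq" : 1,
            "max_query_terms" : 12,
            "stop_words" : ["a", "the", "upon"],
            "analyzer" : "english"
        }
    }
}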

Query Formation Parameters

minimum_should_match

After the disjunctive query has been formed, this parameter controls the number of terms that must match. The syntax is the same as the minimum should match. (Defaults to "30%").

fail_on_unsupported_field

Controls whether the query should fail (throw an exception) if any of the specified fields are not of the supported types (text or keyword). Set this to false to ignore the field and continue processing. Defaults to true.

boost_terms

Each term in the formed query could be further boosted by their tf-idf score. This sets the boost factor to use when using this feature. Defaults to deactivated (0). Any other positive value activates terms boosting with the given boost factor.

include

Specifies whether the input documents should also be included in the search results returned. Defaults to false.

boost

Sets the boost value of the whole query. Defaults to 1.0.

Script Query

A query that allows scripts to act as queries. They are typically used in a filter context, for example:

GET /_search
{
    "query": {
        "bool" : {
            "filter" : {
                "script" : {
                    "script" : {
                        "source": "doc['num1'].value > 1",
                        "lang": "painless"
                     }
                }
            }
        }
    }
}

Custom Parameters

Scripts are compiled and cached for faster execution. If the same script can be used, just with different parameters provided, it is preferable to use the ability to pass parameters to the script itself, for example:

GET /_search
{
    "query": {
        "bool" : {
            "filter" : {
                "script" : {
                    "script" : {
                        "source" : "doc['num1'].value > params.param1",
                        "lang"   : "painless",
                        "params" : {
                            "param1" : 5
                        }
                    }
                }
            }
        }
    }
}

Percolate Query

The percolate query can be used to match queries stored in an index. The percolate query itself contains the document that will be used as query to match with the stored queries.

Sample Usage

Create an index with two fields:

PUT /my-index
{
    "mappings": {
        "_doc": {
            "properties": {
                "message": {
                    "type": "text"
                },
                "query": {
                    "type": "percolator"
                }
            }
        }
    }
}

The message field is the field used to preprocess the document defined in the percolator query before it gets indexed into a temporary index.

The query field is used for indexing the query documents. It will hold a json object that represents an actual Elasticsearch query. The query field has been configured to use the percolator field type. This field type understands the query dsl and stores the query in such a way that it can be used later on to match documents defined on the percolate query.

Register a query in the percolator:

PUT /my-index/_doc/1?refresh
{
    "query" : {
        "match" : {
            "message" : "bonsai tree"
        }
    }
}

Match a document to the registered percolator queries:

GET /my-index/_search
{
    "query" : {
        "percolate" : {
            "field" : "query",
            "document" : {
                "message" : "A new bonsai tree in the office"
            }
        }
    }
}

The above request will yield the following response:

{
  "took": 13,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped" : 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      { (1)
        "_index": "my-index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "query": {
            "match": {
              "message": "bonsai tree"
            }
          }
        },
        "fields" : {
          "_percolator_document_slot" : [0] (2)
        }
      }
    ]
  }
}
  1. The query with id 1 matches our document.

  2. The _percolator_document_slot field indicates which document has matched with this query. Useful when percolating multiple documents simultaneously.

Tip
To provide a simple example, this documentation uses one index my-index for both the percolate queries and documents. This set-up can work well when there are just a few percolate queries registered. However, with heavier usage it is recommended to store queries and documents in separate indices. Please see How it Works Under the Hood for more details.

Parameters

The following parameters are required when percolating a document:

field

The field of type percolator that holds the indexed queries. This is a required parameter.

name

The suffix to be used for the _percolator_document_slot field in case multiple percolate queries have been specified. This is an optional parameter.

document

The source of the document being percolated.

documents

Like the document parameter, but accepts multiple documents via a json array.

document_type

The type / mapping of the document being percolated. This setting is deprecated and only required for indices created before 6.0.

Instead of specifying the source of the document being percolated, the source can also be retrieved from an already stored document. The percolate query will then internally execute a get request to fetch that document.

In that case the document parameter can be substituted with the following parameters:

index

The index the document resides in. This is a required parameter.

type

The type of the document to fetch. This is a required parameter.

id

The id of the document to fetch. This is a required parameter.

routing

Optionally, routing to be used to fetch document to percolate.

preference

Optionally, preference to be used to fetch document to percolate.

version

Optionally, the expected version of the document to be fetched.

Percolating in a filter context

In case you are not interested in the score, better performance can be expected by wrapping the percolator query in a bool query’s filter clause or in a constant_score query:

GET /my-index/_search
{
    "query" : {
        "constant_score": {
            "filter": {
                "percolate" : {
                    "field" : "query",
                    "document" : {
                        "message" : "A new bonsai tree in the office"
                    }
                }
            }
        }
    }
}

At index time terms are extracted from the percolator query and the percolator can often determine whether a query matches just by looking at those extracted terms. However, computing scores requires deserializing each matching query and running it against the percolated document, which is a much more expensive operation. Hence, if computing scores is not required, the percolate query should be wrapped in a constant_score query or a bool query’s filter clause.

Note that the percolate query never gets cached by the query cache.

Percolating multiple documents

The percolate query can match multiple documents simultaneously with the indexed percolator queries. Percolating multiple documents in a single request can improve performance as queries only need to be parsed and matched once instead of multiple times.

The _percolator_document_slot field that is being returned with each matched percolator query is important when percolating multiple documents simultaneously. It indicates which documents matched with a particular percolator query. The numbers correlate with the slot in the documents array specified in the percolate query.

GET /my-index/_search
{
    "query" : {
        "percolate" : {
            "field" : "query",
            "documents" : [ (1)
                {
                    "message" : "bonsai tree"
                },
                {
                    "message" : "new tree"
                },
                {
                    "message" : "the office"
                },
                {
                    "message" : "office tree"
                }
            ]
        }
    }
}
  1. The documents array contains 4 documents that are going to be percolated at the same time.

{
  "took": 13,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped" : 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.5606477,
    "hits": [
      {
        "_index": "my-index",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.5606477,
        "_source": {
          "query": {
            "match": {
              "message": "bonsai tree"
            }
          }
        },
        "fields" : {
          "_percolator_document_slot" : [0, 1, 3] (1)
        }
      }
    ]
  }
}
  1. The _percolator_document_slot indicates that the first, second and last documents specified in the percolate query are matching with this query.

Percolating an Existing Document

In order to percolate a newly indexed document, the percolate query can be used. Based on the response from an index request, the _id and other meta information can be used to immediately percolate the newly added document.

Example

Based on the previous example.

Index the document we want to percolate:

PUT /my-index/_doc/2
{
  "message" : "A new bonsai tree in the office"
}

Index response:

{
  "_index": "my-index",
  "_type": "_doc",
  "_id": "2",
  "_version": 1,
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "result": "created",
  "_seq_no" : 0,
  "_primary_term" : 1
}

Percolating an existing document, using the index response as a basis to build the new search request:

GET /my-index/_search
{
    "query" : {
        "percolate" : {
            "field": "query",
            "index" : "my-index",
            "type" : "_doc",
            "id" : "2",
            "version" : 1 (1)
        }
    }
}
  1. The version is optional, but useful in certain cases. We can ensure that we are trying to percolate the document we have just indexed. A change may have been made after we indexed the document, and if that is the case the search request will fail with a version conflict error.

The search response returned is identical as in the previous example.

Percolate query and highlighting

The percolate query is handled in a special way when it comes to highlighting. The query's hits are used to highlight the document that is provided in the percolate query, whereas with regular highlighting the query in the search request is used to highlight the hits.

Example

This example is based on the mapping of the first example.

Save a query:

PUT /my-index/_doc/3?refresh
{
    "query" : {
        "match" : {
            "message" : "brown fox"
        }
    }
}

Save another query:

PUT /my-index/_doc/4?refresh
{
    "query" : {
        "match" : {
            "message" : "lazy dog"
        }
    }
}

Execute a search request with the percolate query and highlighting enabled:

GET /my-index/_search
{
    "query" : {
        "percolate" : {
            "field": "query",
            "document" : {
                "message" : "The quick brown fox jumps over the lazy dog"
            }
        }
    },
    "highlight": {
      "fields": {
        "message": {}
      }
    }
}

This will yield the following response.

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped" : 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "my-index",
        "_type": "_doc",
        "_id": "4",
        "_score": 0.5753642,
        "_source": {
          "query": {
            "match": {
              "message": "lazy dog"
            }
          }
        },
        "highlight": {
          "message": [
            "The quick brown fox jumps over the <em>lazy</em> <em>dog</em>" (1)
          ]
        },
        "fields" : {
          "_percolator_document_slot" : [0]
        }
      },
      {
        "_index": "my-index",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.5753642,
        "_source": {
          "query": {
            "match": {
              "message": "brown fox"
            }
          }
        },
        "highlight": {
          "message": [
            "The quick <em>brown</em> <em>fox</em> jumps over the lazy dog" (1)
          ]
        },
        "fields" : {
          "_percolator_document_slot" : [0]
        }
      }
    ]
  }
}
  1. The terms from each query have been highlighted in the document.

Instead of the query in the search request highlighting the percolator hits, the percolator queries are highlighting the document defined in the percolate query.

When percolating multiple documents at the same time like the request below then the highlight response is different:

GET /my-index/_search
{
    "query" : {
        "percolate" : {
            "field": "query",
            "documents" : [
                {
                    "message" : "bonsai tree"
                },
                {
                    "message" : "new tree"
                },
                {
                    "message" : "the office"
                },
                {
                    "message" : "office tree"
                }
            ]
        }
    },
    "highlight": {
      "fields": {
        "message": {}
      }
    }
}

The slightly different response:

{
  "took": 13,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped" : 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1.5606477,
    "hits": [
      {
        "_index": "my-index",
        "_type": "_doc",
        "_id": "1",
        "_score": 1.5606477,
        "_source": {
          "query": {
            "match": {
              "message": "bonsai tree"
            }
          }
        },
        "fields" : {
          "_percolator_document_slot" : [0, 1, 3]
        },
        "highlight" : { (1)
          "0_message" : [
              "<em>bonsai</em> <em>tree</em>"
          ],
          "3_message" : [
              "office <em>tree</em>"
          ],
          "1_message" : [
              "new <em>tree</em>"
          ]
        }
      }
    ]
  }
}
  1. The highlight fields have been prefixed with the document slot they belong to, in order to know which highlight field belongs to what document.

Specifying multiple percolate queries

It is possible to specify multiple percolate queries in a single search request:

GET /my-index/_search
{
    "query" : {
        "bool" : {
            "should" : [
                {
                    "percolate" : {
                        "field" : "query",
                        "document" : {
                            "message" : "bonsai tree"
                        },
                        "name": "query1" (1)
                    }
                },
                {
                    "percolate" : {
                        "field" : "query",
                        "document" : {
                            "message" : "tulip flower"
                        },
                        "name": "query2" (1)
                    }
                }
            ]
        }
    }
}
  1. The name parameter will be used to identify which percolator document slots belong to what percolate query.

The _percolator_document_slot field name will be suffixed with what is specified in the _name parameter. If that isn’t specified then the field parameter will be used, which in this case will result in ambiguity.

The above search request returns a response similar to this:

{
  "took": 13,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped" : 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "my-index",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.5753642,
        "_source": {
          "query": {
            "match": {
              "message": "bonsai tree"
            }
          }
        },
        "fields" : {
          "_percolator_document_slot_query1" : [0] (1)
        }
      }
    ]
  }
}
  1. The _percolator_document_slot_query1 percolator slot field indicates that these matched slots are from the percolate query with _name parameter set to query1.

How it Works Under the Hood

When indexing a document into an index that has the percolator field type mapping configured, the query part of the document gets parsed into a Lucene query and is stored into the Lucene index. A binary representation of the query gets stored, but also the query’s terms are analyzed and stored into an indexed field.

At search time, the document specified in the request gets parsed into a Lucene document and is stored in an in-memory temporary Lucene index. This in-memory index holds just this one document and is optimized for that. After this, a special query is built based on the terms in the in-memory index that selects candidate percolator queries based on their indexed query terms. These candidate queries are then evaluated against the in-memory index to determine whether they actually match.

The selection of candidate percolator query matches is an important performance optimization during the execution of the percolate query, as it can significantly reduce the number of candidate matches the in-memory index needs to evaluate. The percolate query can do this because the query terms are extracted and indexed along with each percolator query at indexing time. Unfortunately the percolator cannot extract terms from all queries (for example the wildcard or geo_shape query), so in certain cases the percolator can’t perform this selection optimization (for example if an unsupported query is defined in a required clause of a boolean query, or if the unsupported query is the only query in the percolator document). These queries are marked by the percolator and can be found by running the following search:

GET /_search
{
  "query": {
    "term" : {
      "query.extraction_result" : "failed"
    }
  }
}
Note
The above example assumes that there is a query field of type percolator in the mappings.

Given the design of percolation, it often makes sense to use separate indices for the percolate queries and the documents being percolated, as opposed to the single index used in the examples above. There are a few benefits to this approach:

  • Because percolate queries contain a different set of fields from the percolated documents, using two separate indices allows for fields to be stored in a denser, more efficient way.

  • Percolate queries do not scale in the same way as other queries, so percolation performance may benefit from using a different index configuration, like the number of primary shards.
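
For illustration, here is a minimal sketch of the two-index approach, assuming hypothetical index names my-queries and my-docs and the message field from the examples above. The queries index still needs a mapping for the document fields, so that the stored queries can be parsed against them:

PUT /my-queries
{
  "mappings": {
    "_doc": {
      "properties": {
        "query":   { "type": "percolator" },
        "message": { "type": "text" }
      }
    }
  }
}

Documents indexed into my-docs can then be percolated by reference, using the index, type and id parameters of the percolate query instead of an inline document:

GET /my-queries/_search
{
    "query": {
        "percolate": {
            "field": "query",
            "index": "my-docs",
            "type": "_doc",
            "id": "1"
        }
    }
}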

Wrapper Query

A query that accepts any other query as a base64 encoded string.

GET /_search
{
    "query" : {
        "wrapper": {
            "query" : "eyJ0ZXJtIiA6IHsgInVzZXIiIDogIktpbWNoeSIgfX0=" (1)
        }
    }
}
  1. Base64 encoded string: {"term" : { "user" : "Kimchy" }}

This query is most useful in the context of the Java high-level REST client or transport client, where it allows queries to be accepted as a JSON formatted string. In these cases queries can be specified as a JSON or YAML formatted string, or as a query builder (which is available in the Java high-level REST client).

Span queries

Span queries are low-level positional queries which provide expert control over the order and proximity of the specified terms. These are typically used to implement very specific queries on legal documents or patents.

Setting boost on inner span queries is deprecated. Compound span queries, like span_near, only use the list of matching spans of inner span queries in order to find their own spans, which they then use to produce a score. Scores are never computed on inner span queries, which is the reason why their boosts don’t make sense.

Span queries cannot be mixed with non-span queries (with the exception of the span_multi query).

The queries in this group are:

span_term query

The equivalent of the term query but for use with other span queries.

span_multi query

Wraps a term, range, prefix, wildcard, regexp, or fuzzy query.

span_first query

Accepts another span query whose matches must appear within the first N positions of the field.

span_near query

Accepts multiple span queries whose matches must be within the specified distance of each other, and possibly in the same order.

span_or query

Combines multiple span queries — returns documents which match any of the specified queries.

span_not query

Wraps another span query, and excludes any documents which match that query.

span_containing query

Accepts a list of span queries, but only returns those spans which also match a second span query.

span_within query

The result from a single span query is returned as long as its span falls within the spans returned by a list of other span queries.

field_masking_span query

Allows queries like span-near or span-or across different fields.

Span Term Query

Matches spans containing a term. The span term query maps to Lucene SpanTermQuery. Here is an example:

GET /_search
{
    "query": {
        "span_term" : { "user" : "kimchy" }
    }
}

A boost can also be associated with the query:

GET /_search
{
    "query": {
       "span_term" : { "user" : { "value" : "kimchy", "boost" : 2.0 } }
    }
}

Or:

GET /_search
{
    "query": {
        "span_term" : { "user" : { "term" : "kimchy", "boost" : 2.0 } }
    }
}

Span Multi Term Query

The span_multi query allows you to wrap a multi term query (one of wildcard, fuzzy, prefix, range or regexp query) as a span query, so it can be nested. Example:

GET /_search
{
    "query": {
        "span_multi":{
            "match":{
                "prefix" : { "user" :  { "value" : "ki" } }
            }
        }
    }
}

A boost can also be associated with the query:

GET /_search
{
    "query": {
        "span_multi":{
            "match":{
                "prefix" : { "user" :  { "value" : "ki", "boost" : 1.08 } }
            }
        }
    }
}
Warning
span_multi queries will hit a too many clauses failure if the number of terms that match the query exceeds the boolean query limit (defaults to 1024). To avoid an unbounded expansion, you can set the rewrite method of the multi term query to a top_terms_* rewrite. Or, if you use span_multi on prefix queries only, you can activate the index_prefixes field option of the text field instead. This will rewrite any prefix query on the field to a single term query that matches the indexed prefix.
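
For example, a sketch of capping the expansion with a top_terms_* rewrite, reusing the prefix query from the examples above (the size 50 is arbitrary):

GET /_search
{
    "query": {
        "span_multi":{
            "match":{
                "prefix" : { "user" :  { "value" : "ki", "rewrite" : "top_terms_50" } }
            }
        }
    }
}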

Span First Query

Matches spans near the beginning of a field. The span first query maps to Lucene SpanFirstQuery. Here is an example:

GET /_search
{
    "query": {
        "span_first" : {
            "match" : {
                "span_term" : { "user" : "kimchy" }
            },
            "end" : 3
        }
    }
}

The match clause can be any other span type query. The end controls the maximum end position permitted in a match.

Span Near Query

Matches spans which are near one another. One can specify slop, the maximum number of intervening unmatched positions, as well as whether matches are required to be in-order. The span near query maps to Lucene SpanNearQuery. Here is an example:

GET /_search
{
    "query": {
        "span_near" : {
            "clauses" : [
                { "span_term" : { "field" : "value1" } },
                { "span_term" : { "field" : "value2" } },
                { "span_term" : { "field" : "value3" } }
            ],
            "slop" : 12,
            "in_order" : false
        }
    }
}

The clauses element is a list of one or more other span type queries and the slop controls the maximum number of intervening unmatched positions permitted.

Span Or Query

Matches the union of its span clauses. The span or query maps to Lucene SpanOrQuery. Here is an example:

GET /_search
{
    "query": {
        "span_or" : {
            "clauses" : [
                { "span_term" : { "field" : "value1" } },
                { "span_term" : { "field" : "value2" } },
                { "span_term" : { "field" : "value3" } }
            ]
        }
    }
}

The clauses element is a list of one or more other span type queries.

Span Not Query

Removes matches which overlap with another span query, or which are within x tokens before (controlled by the parameter pre) or y tokens after (controlled by the parameter post) another span query. The span not query maps to Lucene SpanNotQuery. Here is an example:

GET /_search
{
    "query": {
        "span_not" : {
            "include" : {
                "span_term" : { "field1" : "hoya" }
            },
            "exclude" : {
                "span_near" : {
                    "clauses" : [
                        { "span_term" : { "field1" : "la" } },
                        { "span_term" : { "field1" : "hoya" } }
                    ],
                    "slop" : 0,
                    "in_order" : true
                }
            }
        }
    }
}

The include and exclude clauses can be any span type query. The include clause is the span query whose matches are filtered, and the exclude clause is the span query whose matches must not overlap those returned.

In the above example, all documents with the term hoya are returned, except those where hoya is immediately preceded by la.

Other top level options:

pre

If set, the exclude span may not overlap with this number of tokens before the include span. Defaults to 0.

post

If set, the exclude span may not overlap with this number of tokens after the include span. Defaults to 0.

dist

If set, the exclude span may not overlap within this number of tokens of the include span, on either side. Equivalent to setting both pre and post.
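
For instance, a sketch using pre to also exclude hoya when la appears up to three tokens earlier (the distance 3 is arbitrary):

GET /_search
{
    "query": {
        "span_not" : {
            "include" : {
                "span_term" : { "field1" : "hoya" }
            },
            "exclude" : {
                "span_term" : { "field1" : "la" }
            },
            "pre" : 3
        }
    }
}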

Span Containing Query

Returns matches which enclose another span query. The span containing query maps to Lucene SpanContainingQuery. Here is an example:

GET /_search
{
    "query": {
        "span_containing" : {
            "little" : {
                "span_term" : { "field1" : "foo" }
            },
            "big" : {
                "span_near" : {
                    "clauses" : [
                        { "span_term" : { "field1" : "bar" } },
                        { "span_term" : { "field1" : "baz" } }
                    ],
                    "slop" : 5,
                    "in_order" : true
                }
            }
        }
    }
}

The big and little clauses can be any span type query. Matching spans from big that contain matches from little are returned.

Span Within Query

Returns matches which are enclosed inside another span query. The span within query maps to Lucene SpanWithinQuery. Here is an example:

GET /_search
{
    "query": {
        "span_within" : {
            "little" : {
                "span_term" : { "field1" : "foo" }
            },
            "big" : {
                "span_near" : {
                    "clauses" : [
                        { "span_term" : { "field1" : "bar" } },
                        { "span_term" : { "field1" : "baz" } }
                    ],
                    "slop" : 5,
                    "in_order" : true
                }
            }
        }
    }
}

The big and little clauses can be any span type query. Matching spans from little that are enclosed within big are returned.

Span Field Masking Query

Wrapper to allow span queries to participate in composite single-field span queries by 'lying' about their search field. The span field masking query maps to Lucene’s SpanFieldMaskingQuery.

This can be used to support queries like span-near or span-or across different fields, which is not ordinarily permitted.

The span field masking query is invaluable in conjunction with multi-fields, when the same content is indexed with multiple analyzers. For instance, we could index a field with the standard analyzer, which breaks text up into words, and again with the english analyzer, which stems words into their root form.
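
Here is a sketch of the multi-field mapping assumed by the example below; the index name my-index is hypothetical. The text field is indexed with the standard analyzer, and text.stems with the english analyzer:

PUT /my-index
{
  "mappings": {
    "_doc": {
      "properties": {
        "text": {
          "type": "text",
          "analyzer": "standard",
          "fields": {
            "stems": {
              "type": "text",
              "analyzer": "english"
            }
          }
        }
      }
    }
  }
}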

Example:

GET /_search
{
  "query": {
    "span_near": {
      "clauses": [
        {
          "span_term": {
            "text": "quick brown"
          }
        },
        {
          "field_masking_span": {
            "query": {
              "span_term": {
                "text.stems": "fox"
              }
            },
            "field": "text"
          }
        }
      ],
      "slop": 5,
      "in_order": false
    }
  }
}

Note: as the span field masking query returns the masked field, scoring will be done using the norms of the field name supplied. This may lead to unexpected scoring behaviour.

Minimum Should Match

The minimum_should_match parameter accepts the following possible values:

Integer (e.g. 3)

Indicates a fixed value regardless of the number of optional clauses.

Negative integer (e.g. -2)

Indicates that the total number of optional clauses, minus this number, should be mandatory.

Percentage (e.g. 75%)

Indicates that this percent of the total number of optional clauses are necessary. The number computed from the percentage is rounded down and used as the minimum.

Negative percentage (e.g. -25%)

Indicates that this percent of the total number of optional clauses can be missing. The number computed from the percentage is rounded down, before being subtracted from the total to determine the minimum.

Combination (e.g. 3<90%)

A positive integer, followed by the less-than symbol, followed by any of the previously mentioned specifiers is a conditional specification. It indicates that if the number of optional clauses is equal to (or less than) the integer, they are all required, but if it’s greater than the integer, the specification applies. In this example: if there are 1 to 3 clauses they are all required, but for 4 or more clauses only 90% are required.

Multiple combinations (e.g. 2<-25% 9<-3)

Multiple conditional specifications can be separated by spaces, each one only being valid for numbers greater than the one before it. In this example: if there are 1 or 2 clauses both are required, if there are 3-9 clauses all but 25% are required, and if there are more than 9 clauses, all but three are required.

NOTE:

When dealing with percentages, negative values can be used to get different behavior in edge cases. 75% and -25% mean the same thing when dealing with 4 clauses, but when dealing with 5 clauses 75% means 3 are required (75% of 5 is 3.75, rounded down to 3), while -25% means 4 are required (25% of 5 is 1.25, rounded down to 1 clause that may be missing, leaving 4 required).

If the calculations based on the specification determine that no optional clauses are needed, the usual rules about BooleanQueries still apply at search time (a BooleanQuery containing no required clauses must still match at least one optional clause).

No matter what number the calculation arrives at, a value greater than the number of optional clauses, or a value less than 1, will never be used (i.e. no matter how low or high the result of the calculation is, the minimum number of required matches will never be lower than 1 or greater than the number of clauses).
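
For illustration, a minimal sketch of minimum_should_match on a bool query; the tags field and its values are hypothetical. With three optional clauses and a value of 2, at least two of the should clauses must match (a string value such as "2<-25% 9<-3" can be supplied instead, to use the conditional forms described above):

GET /_search
{
    "query": {
        "bool": {
            "should": [
                { "term": { "tags": "search" }},
                { "term": { "tags": "lucene" }},
                { "term": { "tags": "elasticsearch" }}
            ],
            "minimum_should_match": 2
        }
    }
}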

Multi Term Query Rewrite

Queries that match multiple terms, like wildcard and prefix, are called multi term queries and go through a process of rewrite. The same happens with the query_string query. All of these queries allow you to control how they are rewritten using the rewrite parameter:

  • constant_score (default): A rewrite method that performs like constant_score_boolean when there are few matching terms, and otherwise visits all matching terms in sequence and marks documents for that term. Matching documents are assigned a constant score equal to the query’s boost.

  • scoring_boolean: A rewrite method that first translates each term into a should clause in a boolean query, and keeps the scores as computed by the query. Note that typically such scores are meaningless to the user, and require non-trivial CPU to compute, so it’s almost always better to use constant_score. This rewrite method will hit a too many clauses failure if it exceeds the boolean query limit (defaults to 1024).

  • constant_score_boolean: Similar to scoring_boolean except scores are not computed. Instead, each matching document receives a constant score equal to the query’s boost. This rewrite method will hit a too many clauses failure if it exceeds the boolean query limit (defaults to 1024).

  • top_terms_N: A rewrite method that first translates each term into a should clause in a boolean query, and keeps the scores as computed by the query. This rewrite method only uses the top scoring terms, so it will not overflow the boolean max clause count. The N controls the size of the top scoring terms to use.

  • top_terms_boost_N: A rewrite method that first translates each term into a should clause in a boolean query, but the scores are only computed as the boost. This rewrite method only uses the top scoring terms, so it will not overflow the boolean max clause count. The N controls the size of the top scoring terms to use.

  • top_terms_blended_freqs_N: A rewrite method that first translates each term into a should clause in a boolean query, but all term queries compute scores as if they had the same frequency. In practice the frequency used is the maximum frequency of all matching terms. This rewrite method only uses the top scoring terms, so it will not overflow the boolean max clause count. The N controls the size of the top scoring terms to use.
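
For example, a sketch of setting the rewrite parameter on a wildcard query; the user field and pattern are reused from earlier examples, and top_terms_10 keeps only the 10 highest-scoring matching terms:

GET /_search
{
    "query": {
        "wildcard": {
            "user": {
                "value": "ki*y",
                "rewrite": "top_terms_10"
            }
        }
    }
}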