Query and filter context
The behaviour of a query clause depends on whether it is used in query context or in filter context:
- Query context
-
A query clause used in query context answers the question "How well does this document match this query clause?" Besides deciding whether or not the document matches, the query clause also calculates a _score representing how well the document matches, relative to other documents.
Query context is in effect whenever a query clause is passed to a query parameter, such as the query parameter in the search API.
- Filter context
-
In filter context, a query clause answers the question "Does this document match this query clause?" The answer is a simple Yes or No — no scores are calculated. Filter context is mostly used for filtering structured data, e.g.
- Does this timestamp fall into the range 2015 to 2016?
- Is the status field set to "published"?
Frequently used filters will be cached automatically by Elasticsearch, to speed up performance.
Filter context is in effect whenever a query clause is passed to a filter parameter, such as the filter or must_not parameters in the bool query, the filter parameter in the constant_score query, or the filter aggregation.
Below is an example of query clauses being used in query and filter context in the search API. This query will match documents where all of the following conditions are met:
- The title field contains the word search.
- The content field contains the word elasticsearch.
- The status field contains the exact word published.
- The publish_date field contains a date from 1 Jan 2015 onwards.
GET /_search
{
"query": { (1)
"bool": { (2)
"must": [
{ "match": { "title": "Search" }},
{ "match": { "content": "Elasticsearch" }}
],
"filter": [ (3)
{ "term": { "status": "published" }},
{ "range": { "publish_date": { "gte": "2015-01-01" }}}
]
}
}
}
- The query parameter indicates query context.
- The bool and two match clauses are used in query context, which means that they are used to score how well each document matches.
- The filter parameter indicates filter context. Its term and range clauses are used in filter context. They will filter out documents which do not match, but they will not affect the score for matching documents.
Warning
|
Scores calculated for queries in query context are represented as single precision floating point numbers; they have only 24 bits for significand’s precision. Score calculations that exceed the significand’s precision will be converted to floats with loss of precision. |
Tip
|
Use query clauses in query context for conditions which should affect the score of matching documents (i.e. how well does the document match), and use all other query clauses in filter context. |
Match All Query
The simplest query; it matches all documents, giving them all a _score of 1.0.
GET /_search
{
"query": {
"match_all": {}
}
}
The _score can be changed with the boost parameter:
GET /_search
{
"query": {
"match_all": { "boost" : 1.2 }
}
}
Match None Query
This is the inverse of the match_all query, which matches no documents.
GET /_search
{
"query": {
"match_none": {}
}
}
Full text queries
The high-level full text queries are usually used for running full text queries on full text fields like the body of an email. They understand how the field being queried is analyzed and will apply each field’s analyzer (or search_analyzer) to the query string before executing.
The queries in this group are:
- match query: The standard query for performing full text queries, including fuzzy matching and phrase or proximity queries.
- match_phrase query: Like the match query but used for matching exact phrases or word proximity matches.
- match_phrase_prefix query: The poor man’s search-as-you-type. Like the match_phrase query, but does a wildcard search on the final word.
- multi_match query: The multi-field version of the match query.
- common terms query: A more specialized query which gives more preference to uncommon words.
- query_string query: Supports the compact Lucene query string syntax, allowing you to specify AND|OR|NOT conditions and multi-field search within a single query string. For expert users only.
- simple_query_string query: A simpler, more robust version of the query_string syntax suitable for exposing directly to users.
Match Query
The match query accepts text/numerics/dates, analyzes the input, and constructs a query. For example:
GET /_search
{
"query": {
"match" : {
"message" : "this is a test"
}
}
}
Note that message is the name of a field; you can substitute the name of any field instead.
match
The match query is of type boolean. It means that the text provided is analyzed and the analysis process constructs a boolean query from the provided text. The operator flag can be set to or or and to control the boolean clauses (defaults to or). The minimum number of optional should clauses to match can be set using the minimum_should_match parameter.
Here is an example with additional parameters (note the slight change in structure; message is the field name):
GET /_search
{
"query": {
"match" : {
"message" : {
"query" : "this is a test",
"operator" : "and"
}
}
}
}
The analyzer can be set to control which analyzer will perform the analysis process on the text. It defaults to the field’s explicit mapping definition, or the default search analyzer.
The lenient parameter can be set to true to ignore exceptions caused by data-type mismatches, such as trying to query a numeric field with a text query string. Defaults to false.
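For instance, a minimal sketch (the numeric field price and the mismatched value are illustrative, not from the original examples):
GET /_search
{
    "query": {
        "match" : {
            "price" : {
                "query" : "not_a_number",
                "lenient" : true
            }
        }
    }
}
With lenient set to true, the type mismatch on price is ignored instead of failing the whole request.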
Fuzziness
fuzziness allows fuzzy matching based on the type of field being queried. See [fuzziness] for allowed settings.
The prefix_length and max_expansions parameters can be set in this case to control the fuzzy process. If the fuzzy option is set, the query will use top_terms_blended_freqs_${max_expansions} as its rewrite method; the fuzzy_rewrite parameter allows you to control how the query will get rewritten.
Fuzzy transpositions (ab → ba) are allowed by default but can be disabled by setting fuzzy_transpositions to false.
Note that fuzzy matching is not applied to terms with synonyms, as under the hood these terms are expanded to a special synonym query that blends term frequencies, which does not support fuzzy expansion.
GET /_search
{
"query": {
"match" : {
"message" : {
"query" : "this is a testt",
"fuzziness": "AUTO"
}
}
}
}
Zero terms query
If the analyzer used removes all tokens in a query like a stop filter does, the default behavior is to match no documents at all. In order to change that, the zero_terms_query option can be used, which accepts none (default) and all (which corresponds to a match_all query).
GET /_search
{
"query": {
"match" : {
"message" : {
"query" : "to be or not to be",
"operator" : "and",
"zero_terms_query": "all"
}
}
}
}
Cutoff frequency
The match query supports a cutoff_frequency that allows specifying an absolute or relative document frequency above which terms are moved into an optional subquery. These high frequency terms are then only scored if at least one of the low frequency (below the cutoff) terms matches in the case of an or operator, or if all of the low frequency terms match in the case of an and operator.
This query allows handling stopwords dynamically at runtime, is domain independent and doesn’t require a stopword file. It prevents scoring / iterating high frequency terms and only takes the terms into account if a more significant / lower frequency term matches a document. Yet, if all of the query terms are above the given cutoff_frequency the query is automatically transformed into a pure conjunction (and) query to ensure fast execution.
The cutoff_frequency can either be relative to the total number of documents if in the range [0..1) or absolute if greater or equal to 1.0.
Here is an example showing a query composed of stopwords exclusively:
GET /_search
{
"query": {
"match" : {
"message" : {
"query" : "to be or not to be",
"cutoff_frequency" : 0.001
}
}
}
}
Important
|
The cutoff_frequency option operates on a per-shard-level. This means
that when trying it out on test indexes with low document numbers you
should follow the advice in {defguide}/relevance-is-broken.html[Relevance is broken].
|
Synonyms
The match query supports multi-term synonym expansion with the synonym_graph token filter. When this filter is used, the parser creates a phrase query for each multi-term synonym. For example, the following synonym: "ny, new york" would produce:
(ny OR ("new york"))
It is also possible to match multi-term synonyms with conjunctions instead:
GET /_search
{
"query": {
"match" : {
"message": {
"query" : "ny city",
"auto_generate_synonyms_phrase_query" : false
}
}
}
}
The example above creates a boolean query:
(ny OR (new AND york)) city
that matches documents with the term ny or the conjunction new AND york.
By default the parameter auto_generate_synonyms_phrase_query is set to true.
Match Phrase Query
The match_phrase query analyzes the text and creates a phrase query out of the analyzed text. For example:
GET /_search
{
"query": {
"match_phrase" : {
"message" : "this is a test"
}
}
}
A phrase query matches terms up to a configurable slop
(which defaults to 0) in any order. Transposed terms have a slop of 2.
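For example, a sketch that tolerates transposed terms by raising slop (the field and text are carried over from the example above):
GET /_search
{
    "query": {
        "match_phrase" : {
            "message" : {
                "query" : "this is a test",
                "slop" : 2
            }
        }
    }
}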
The analyzer can be set to control which analyzer will perform the analysis process on the text. It defaults to the field’s explicit mapping definition, or the default search analyzer, for example:
GET /_search
{
"query": {
"match_phrase" : {
"message" : {
"query" : "this is a test",
"analyzer" : "my_analyzer"
}
}
}
}
This query also accepts zero_terms_query, as explained in the match query.
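A minimal sketch (the stop analyzer and the all-stopword text are illustrative):
GET /_search
{
    "query": {
        "match_phrase" : {
            "message" : {
                "query" : "to be or not to be",
                "analyzer" : "stop",
                "zero_terms_query" : "all"
            }
        }
    }
}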
Match Phrase Prefix Query
The match_phrase_prefix query is the same as match_phrase, except that it allows for prefix matches on the last term in the text. For example:
GET /_search
{
"query": {
"match_phrase_prefix" : {
"message" : "quick brown f"
}
}
}
It accepts the same parameters as the phrase type. In addition, it also accepts a max_expansions parameter (default 50) that controls how many suffixes the last term will be expanded into. It is highly recommended to set it to an acceptable value to control the execution time of the query. For example:
GET /_search
{
"query": {
"match_phrase_prefix" : {
"message" : {
"query" : "quick brown f",
"max_expansions" : 10
}
}
}
}
Important
|
The match_phrase_prefix query can produce surprising results. Consider the query string quick brown f. This query works by creating a phrase query out of quick and brown (i.e. the term quick must exist and must be followed by the term brown), then looking at the sorted term dictionary to find the first 50 terms that begin with f and adding those terms to the phrase query. The problem is that the first 50 terms may not include the term fox, so the phrase quick brown fox will not be found. This usually isn’t a problem, because the user will continue to type more letters until the word they are looking for appears. For better solutions for search-as-you-type see the completion suggester and {defguide}/_index_time_search_as_you_type.html[Index-Time Search-as-You-Type].
|
Multi Match Query
The multi_match query builds on the match query to allow multi-field queries:
GET /_search
{
"query": {
"multi_match" : {
"query": "this is a test", (1)
"fields": [ "subject", "message" ] (2)
}
}
}
- The query string.
- The fields to be queried.
fields and per-field boosting
Fields can be specified with wildcards, eg:
GET /_search
{
"query": {
"multi_match" : {
"query": "Will Smith",
"fields": [ "title", "*_name" ] (1)
}
}
}
- Query the title, first_name and last_name fields.
Individual fields can be boosted with the caret (^) notation:
GET /_search
{
"query": {
"multi_match" : {
"query" : "this is a test",
"fields" : [ "subject^3", "message" ] (1)
}
}
}
- The subject field is three times as important as the message field.
If no fields are provided, the multi_match query defaults to the index.query.default_field index setting, which in turn defaults to *. * extracts all fields in the mapping that are eligible to term queries and filters the metadata fields. All extracted fields are then combined to build a query.
Warning
|
If you have a huge number of fields, the above auto expansion might lead to querying a
large number of fields which might cause performance issues. In future versions (starting in 7.0),
there will be a limit on the number of fields that can be queried at once. This limit will be
determined by the indices.query.bool.max_clause_count setting which defaults to 1024.
|
Types of multi_match query:
The way the multi_match query is executed internally depends on the type parameter, which can be set to:
best_fields | (default) Finds documents which match any field, but uses the _score from the best field. See best_fields. |
most_fields | Finds documents which match any field and combines the _score from each field. See most_fields. |
cross_fields | Treats fields with the same analyzer as though they were one big field. Looks for each word in any field. See cross_fields. |
phrase | Runs a match_phrase query on each field and uses the _score from the best field. See phrase and phrase_prefix. |
phrase_prefix | Runs a match_phrase_prefix query on each field and uses the _score from the best field. See phrase and phrase_prefix. |
best_fields
The best_fields type is most useful when you are searching for multiple words best found in the same field. For instance "brown fox" in a single field is more meaningful than "brown" in one field and "fox" in the other.
The best_fields type generates a match query for each field and wraps them in a dis_max query, to find the single best matching field. For instance, this query:
GET /_search
{
"query": {
"multi_match" : {
"query": "brown fox",
"type": "best_fields",
"fields": [ "subject", "message" ],
"tie_breaker": 0.3
}
}
}
would be executed as:
GET /_search
{
"query": {
"dis_max": {
"queries": [
{ "match": { "subject": "brown fox" }},
{ "match": { "message": "brown fox" }}
],
"tie_breaker": 0.3
}
}
}
Normally the best_fields type uses the score of the single best matching field, but if tie_breaker is specified, then it calculates the score as follows:
- the score from the best matching field
- plus tie_breaker * _score for all other matching fields
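For instance, with the tie_breaker of 0.3 from the query above and illustrative scores: if subject matched with a score of 2.0 and message with a score of 1.0, the document would score 2.0 + 0.3 * 1.0 = 2.3.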
Also, accepts analyzer, boost, operator, minimum_should_match, fuzziness, lenient, prefix_length, max_expansions, rewrite, zero_terms_query, cutoff_frequency, auto_generate_synonyms_phrase_query and fuzzy_transpositions, as explained in match query.
Important
|
operator and minimum_should_match The best_fields and most_fields types are field-centric: they generate a match query per field, so the operator and minimum_should_match parameters are applied to each field individually, which is probably not what you want. Take this query for example: a best_fields query for "Will Smith" over the first_name and last_name fields with the and operator.
This query is executed as: (+first_name:will +first_name:smith) | (+last_name:will +last_name:smith) In other words, all terms must be present in a single field for a document to match. See cross_fields for a better solution.
|
most_fields
The most_fields
type is most useful when querying multiple fields that
contain the same text analyzed in different ways. For instance, the main
field may contain synonyms, stemming and terms without diacritics. A second
field may contain the original terms, and a third field might contain
shingles. By combining scores from all three fields we can match as many
documents as possible with the main field, but use the second and third fields
to push the most similar results to the top of the list.
This query:
GET /_search
{
"query": {
"multi_match" : {
"query": "quick brown fox",
"type": "most_fields",
"fields": [ "title", "title.original", "title.shingles" ]
}
}
}
would be executed as:
GET /_search
{
"query": {
"bool": {
"should": [
{ "match": { "title": "quick brown fox" }},
{ "match": { "title.original": "quick brown fox" }},
{ "match": { "title.shingles": "quick brown fox" }}
]
}
}
}
The score from each match clause is added together, then divided by the number of match clauses.
Also, accepts analyzer, boost, operator, minimum_should_match, fuzziness, lenient, prefix_length, max_expansions, rewrite, zero_terms_query and cutoff_frequency, as explained in match query, but see operator and minimum_should_match.
phrase and phrase_prefix
The phrase and phrase_prefix types behave just like best_fields, but they use a match_phrase or match_phrase_prefix query instead of a match query.
This query:
GET /_search
{
"query": {
"multi_match" : {
"query": "quick brown f",
"type": "phrase_prefix",
"fields": [ "subject", "message" ]
}
}
}
would be executed as:
GET /_search
{
"query": {
"dis_max": {
"queries": [
{ "match_phrase_prefix": { "subject": "quick brown f" }},
{ "match_phrase_prefix": { "message": "quick brown f" }}
]
}
}
}
Also, accepts analyzer, boost, lenient and zero_terms_query as explained in Match Query, as well as slop which is explained in Match Phrase Query. Type phrase_prefix additionally accepts max_expansions.
Important
|
phrase, phrase_prefix and fuzziness The fuzziness parameter cannot be used with the phrase or phrase_prefix type.
|
cross_fields
The cross_fields type is particularly useful with structured documents where multiple fields should match. For instance, when querying the first_name and last_name fields for "Will Smith", the best match is likely to have "Will" in one field and "Smith" in the other.
One way of dealing with these types of queries is simply to index the first_name and last_name fields into a single full_name field. Of course, this can only be done at index time.
The cross_fields type tries to solve these problems at query time by taking a term-centric approach. It first analyzes the query string into individual terms, then looks for each term in any of the fields, as though they were one big field.
A query like:
GET /_search
{
"query": {
"multi_match" : {
"query": "Will Smith",
"type": "cross_fields",
"fields": [ "first_name", "last_name" ],
"operator": "and"
}
}
}
is executed as:
+(first_name:will last_name:will) +(first_name:smith last_name:smith)
In other words, all terms must be present in at least one field for a document to match. (Compare this to the logic used for best_fields and most_fields.)
That solves one of the two problems. The problem of differing term frequencies is solved by blending the term frequencies for all fields in order to even out the differences.
In practice, first_name:smith will be treated as though it has the same frequencies as last_name:smith, plus one. This will make matches on first_name and last_name have comparable scores, with a tiny advantage for last_name since it is the most likely field to contain smith.
Note that cross_fields is usually only useful on short string fields that all have a boost of 1. Otherwise boosts, term freqs and length normalization contribute to the score in such a way that the blending of term statistics is not meaningful anymore.
If you run the above query through the [search-validate], it returns this explanation:
+blended("will", fields: [first_name, last_name]) +blended("smith", fields: [first_name, last_name])
Also, accepts analyzer, boost, operator, minimum_should_match, lenient, zero_terms_query and cutoff_frequency, as explained in match query.
cross_fields and analysis
The cross_fields type can only work in term-centric mode on fields that have the same analyzer. Fields with the same analyzer are grouped together as in the example above. If there are multiple groups, they are combined with a bool query.
For instance, if we have a first and last field which have the same analyzer, plus a first.edge and last.edge which both use an edge_ngram analyzer, this query:
GET /_search
{
"query": {
"multi_match" : {
"query": "Jon",
"type": "cross_fields",
"fields": [
"first", "first.edge",
"last", "last.edge"
]
}
}
}
would be executed as:
blended("jon", fields: [first, last]) | ( blended("j", fields: [first.edge, last.edge]) blended("jo", fields: [first.edge, last.edge]) blended("jon", fields: [first.edge, last.edge]) )
In other words, first and last would be grouped together and treated as a single field, and first.edge and last.edge would be grouped together and treated as a single field.
Having multiple groups is fine, but when combined with operator or minimum_should_match, it can suffer from the same problem as most_fields or best_fields.
You can easily rewrite this query yourself as two separate cross_fields queries combined with a bool query, and apply the minimum_should_match parameter to just one of them:
GET /_search
{
"query": {
"bool": {
"should": [
{
"multi_match" : {
"query": "Will Smith",
"type": "cross_fields",
"fields": [ "first", "last" ],
"minimum_should_match": "50%" (1)
}
},
{
"multi_match" : {
"query": "Will Smith",
"type": "cross_fields",
"fields": [ "*.edge" ]
}
}
]
}
}
}
- Either will or smith must be present in either of the first or last fields.
You can force all fields into the same group by specifying the analyzer
parameter in the query.
GET /_search
{
"query": {
"multi_match" : {
"query": "Jon",
"type": "cross_fields",
"analyzer": "standard", (1)
"fields": [ "first", "last", "*.edge" ]
}
}
}
- Use the standard analyzer for all fields.
which will be executed as:
blended("jon", fields: [first, first.edge, last.edge, last])
tie_breaker
By default, each per-term blended query will use the best score returned by any field in a group, then these scores are added together to give the final score. The tie_breaker parameter can change the default behaviour of the per-term blended queries. It accepts:
0.0 | Take the single best score out of (eg) first_name:will and last_name:will (default). |
1.0 | Add together the scores for (eg) first_name:will and last_name:will. |
0.0 < n < 1.0 | Take the single best score plus tie_breaker multiplied by each of the scores from other matching fields. |
Important
|
cross_fields and fuzziness The fuzziness parameter cannot be used with the cross_fields type.
|
Common Terms Query
The common
terms query is a modern alternative to stopwords which
improves the precision and recall of search results (by taking stopwords
into account), without sacrificing performance.
The problem
Every term in a query has a cost. A search for "The brown fox" requires three term queries, one for each of "the", "brown" and "fox", all of which are executed against all documents in the index. The query for "the" is likely to match many documents and thus has a much smaller impact on relevance than the other two terms.
Previously, the solution to this problem was to ignore terms with high frequency. By treating "the" as a stopword, we reduce the index size and reduce the number of term queries that need to be executed.
The problem with this approach is that, while stopwords have a small impact on relevance, they are still important. If we remove stopwords, we lose precision (eg we are unable to distinguish between "happy" and "not happy") and we lose recall (eg text like "The The" or "To be or not to be" would simply not exist in the index).
The solution
The common terms query divides the query terms into two groups: more important (ie low frequency terms) and less important (ie high frequency terms which would previously have been stopwords).
First it searches for documents which match the more important terms. These are the terms which appear in fewer documents and have a greater impact on relevance.
Then, it executes a second query for the less important terms — terms
which appear frequently and have a low impact on relevance. But instead
of calculating the relevance score for all matching documents, it only
calculates the _score
for documents already matched by the first
query. In this way the high frequency terms can improve the relevance
calculation without paying the cost of poor performance.
If a query consists only of high frequency terms, then a single query is executed as an AND (conjunction) query; in other words, all terms are required. Even though each individual term will match many documents, the combination of terms narrows down the result set to only the most relevant. The single query can also be executed as an OR with a specific minimum_should_match; in this case a high enough value should probably be used.
Terms are allocated to the high or low frequency groups based on the cutoff_frequency, which can be specified as an absolute frequency (>=1) or as a relative frequency (0.0 .. 1.0). (Remember that document frequencies are computed on a per shard level, as explained in the blog post {defguide}/relevance-is-broken.html[Relevance is broken].)
Perhaps the most interesting property of this query is that it adapts to
domain specific stopwords automatically. For example, on a video hosting
site, common terms like "clip"
or "video"
will automatically behave
as stopwords without the need to maintain a manual list.
Examples
In this example, words that have a document frequency greater than 0.1% (eg "this" and "is") will be treated as common terms.
GET /_search
{
"query": {
"common": {
"body": {
"query": "this is bonsai cool",
"cutoff_frequency": 0.001
}
}
}
}
The number of terms which should match can be controlled with the minimum_should_match (high_freq, low_freq), low_freq_operator (default "or") and high_freq_operator (default "or") parameters.
For low frequency terms, set the low_freq_operator to "and" to make all terms required:
GET /_search
{
"query": {
"common": {
"body": {
"query": "nelly the elephant as a cartoon",
"cutoff_frequency": 0.001,
"low_freq_operator": "and"
}
}
}
}
which is roughly equivalent to:
GET /_search
{
"query": {
"bool": {
"must": [
{ "term": { "body": "nelly"}},
{ "term": { "body": "elephant"}},
{ "term": { "body": "cartoon"}}
],
"should": [
{ "term": { "body": "the"}},
{ "term": { "body": "as"}},
{ "term": { "body": "a"}}
]
}
}
}
Alternatively use minimum_should_match to specify a minimum number or percentage of low frequency terms which must be present, for instance:
GET /_search
{
"query": {
"common": {
"body": {
"query": "nelly the elephant as a cartoon",
"cutoff_frequency": 0.001,
"minimum_should_match": 2
}
}
}
}
which is roughly equivalent to:
GET /_search
{
"query": {
"bool": {
"must": {
"bool": {
"should": [
{ "term": { "body": "nelly"}},
{ "term": { "body": "elephant"}},
{ "term": { "body": "cartoon"}}
],
"minimum_should_match": 2
}
},
"should": [
{ "term": { "body": "the"}},
{ "term": { "body": "as"}},
{ "term": { "body": "a"}}
]
}
}
}
A different minimum_should_match can be applied for low and high frequency terms with the additional low_freq and high_freq parameters. Here is an example when providing additional parameters (note the change in structure):
GET /_search
{
"query": {
"common": {
"body": {
"query": "nelly the elephant not as a cartoon",
"cutoff_frequency": 0.001,
"minimum_should_match": {
"low_freq" : 2,
"high_freq" : 3
}
}
}
}
}
which is roughly equivalent to:
GET /_search
{
"query": {
"bool": {
"must": {
"bool": {
"should": [
{ "term": { "body": "nelly"}},
{ "term": { "body": "elephant"}},
{ "term": { "body": "cartoon"}}
],
"minimum_should_match": 2
}
},
"should": {
"bool": {
"should": [
{ "term": { "body": "the"}},
{ "term": { "body": "not"}},
{ "term": { "body": "as"}},
{ "term": { "body": "a"}}
],
"minimum_should_match": 3
}
}
}
}
}
In this case it means the high frequency terms have only an impact on relevance when there are at least three of them. But the most interesting use of the minimum_should_match for high frequency terms is when there are only high frequency terms:
GET /_search
{
"query": {
"common": {
"body": {
"query": "how not to be",
"cutoff_frequency": 0.001,
"minimum_should_match": {
"low_freq" : 2,
"high_freq" : 3
}
}
}
}
}
which is roughly equivalent to:
GET /_search
{
"query": {
"bool": {
"should": [
{ "term": { "body": "how"}},
{ "term": { "body": "not"}},
{ "term": { "body": "to"}},
{ "term": { "body": "be"}}
],
"minimum_should_match": "3<50%"
}
}
}
The high frequency generated query is then slightly less restrictive than with an AND.
The common terms query also supports boost and analyzer as parameters.
Query String Query
A query that uses a query parser in order to parse its content. Here is an example:
GET /_search
{
"query": {
"query_string" : {
"default_field" : "content",
"query" : "this AND that OR thus"
}
}
}
The query_string query parses the input and splits text around operators. Each textual part is analyzed independently of the others. For instance the following query:
GET /_search
{
"query": {
"query_string" : {
"default_field" : "content",
"query" : "(new york city) OR (big apple)" (1)
}
}
}
- will be split into new york city and big apple, and each part is then analyzed independently by the analyzer configured for the field.
Warning
|
Whitespace is not considered an operator, which means that new york city will be passed "as is" to the analyzer configured for the field. If the field is a keyword field the analyzer will create a single term new york city and the query builder will use this term in the query. If you want to query each term separately you need to add explicit operators around the terms (e.g. new AND york AND city).
|
When multiple fields are provided it is also possible to modify how the different
field queries are combined inside each textual part using the type
parameter.
The possible modes are described here and the default is best_fields
.
The query_string top level parameters include:
Parameter | Description |
---|---|
query | The actual query to be parsed. See Query string syntax. |
default_field | The default field for query terms if no prefix field is specified. Defaults to the index.query.default_field index setting, which in turn defaults to *. When set to *, it extracts all fields in the mapping that are eligible to term queries and filters the metadata fields. WARNING: In future versions (starting in 7.0), there will be a limit on the number of fields that can be queried at once. This limit will be determined by the indices.query.bool.max_clause_count setting, which defaults to 1024. |
default_operator | The default operator used if no explicit operator is specified. For example, with a default operator of OR, the query capital of Hungary is translated to capital OR of OR Hungary, and with a default operator of AND, the same query is translated to capital AND of AND Hungary. The default value is OR. |
analyzer | The analyzer name used to analyze the query string. |
quote_analyzer | The name of the analyzer that is used to analyze quoted phrases in the query string. For those parts, it overrides other analyzers that are set using the analyzer parameter or the search_quote_analyzer setting. |
allow_leading_wildcard | When set, * or ? are allowed as the first character of the query string. Defaults to true. |
enable_position_increments | Set to true to enable position increments in result queries. Defaults to true. |
fuzzy_max_expansions | Controls the number of terms fuzzy queries will expand to. Defaults to 50. |
fuzziness | Set the fuzziness for fuzzy queries. Defaults to AUTO. |
fuzzy_prefix_length | Set the prefix length for fuzzy queries. Default is 0. |
fuzzy_transpositions | Set to false to disable fuzzy transpositions (ab → ba). Default is true. |
phrase_slop | Sets the default slop for phrases. If zero, then exact phrase matches are required. Default value is 0. |
boost | Sets the boost value of the query. Defaults to 1.0. |
auto_generate_phrase_queries | Deprecated setting. This setting is ignored, use [type=phrase] instead to make phrase queries out of all text that is within query operators, or use explicitly quoted strings if you need finer-grained control. |
analyze_wildcard | By default, wildcard terms in a query string are not analyzed. By setting this value to true, a best effort will be made to analyze those as well. |
max_determinized_states | Limit on how many automaton states regexp queries are allowed to create. This protects against too-difficult (e.g. exponentially hard) regexps. Defaults to 10000. |
minimum_should_match | A value controlling how many "should" clauses in the resulting boolean query should match. It can be an absolute value (2), a percentage (30%) or a combination of both. |
lenient | If set to true will cause format based failures (like providing text to a numeric field) to be ignored. |
time_zone | Time Zone to be applied to any range query related to dates. See also JODA timezone. |
quote_field_suffix | A suffix to append to fields for quoted parts of the query string. This allows to use a field that has a different analysis chain for exact matching. Look here for a comprehensive example. |
auto_generate_synonyms_phrase_query | Whether phrase queries should be automatically generated for multi term synonyms. Defaults to true. |
all_fields | deprecated[6.0.0, set default_field to * instead]. Perform the query on all fields detected in the mapping that can be queried. |
When a multi term query is being generated, one can control how it gets rewritten using the rewrite parameter.
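For instance, a sketch (the field name and wildcard term are illustrative) that sets the rewrite method for the generated multi term query:
GET /_search
{
    "query": {
        "query_string" : {
            "default_field" : "content",
            "query" : "qu*",
            "rewrite" : "constant_score"
        }
    }
}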
Default Field
When not explicitly specifying the field to search on in the query string syntax, the index.query.default_field setting will be used to derive which field to search on. If index.query.default_field is not specified, the query_string query will automatically attempt to determine the existing fields in the index’s mapping that are queryable, and perform the search on those fields. This will not include nested documents; use a nested query to search those documents.
Note
|
For mappings with a large number of fields, searching across all queryable fields in the mapping could be expensive. |
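As a sketch, the default field can be pinned in the index settings (the index and field names here are placeholders):
PUT /my-index/_settings
{
    "index.query.default_field" : "content"
}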
Multi Field
The query_string query can also run against multiple fields. Fields can be provided via the fields parameter (example below).
The idea of running the query_string query against multiple fields is to expand each query term to an OR clause like this:
field1:query_term OR field2:query_term | ...
For example, the following query
GET /_search
{
"query": {
"query_string" : {
"fields" : ["content", "name"],
"query" : "this AND that"
}
}
}
matches the same words as
GET /_search
{
"query": {
"query_string": {
"query": "(content:this OR name:this) AND (content:that OR name:that)"
}
}
}
Since several queries are generated from the individual search terms, combining them is automatically done using a dis_max query with a tie_breaker. For example (the name field is boosted by 5 using the ^5 notation):
GET /_search
{
"query": {
"query_string" : {
"fields" : ["content", "name^5"],
"query" : "this AND that OR thus",
"tie_breaker" : 0
}
}
}
A simple wildcard can also be used to search "within" specific inner elements of the document. For example, if we have a city object with several fields (or inner objects with fields) in it, we can automatically search on all "city" fields:
GET /_search
{
"query": {
"query_string" : {
"fields" : ["city.*"],
"query" : "this AND that OR thus"
}
}
}
Another option is to provide the wildcard field search in the query string itself (properly escaping the * sign), for example city.\*:something:
GET /_search
{
"query": {
"query_string" : {
"query" : "city.\\*:(this AND that OR thus)"
}
}
}
Note
|
Since \ (backslash) is a special character in json strings, it needs to
be escaped, hence the two backslashes in the above query_string .
|
When running the query_string
query against multiple fields, the
following additional parameters are allowed:
Parameter | Description |
---|---|
type | How the fields should be combined to build the text query. See types for a complete example. Defaults to best_fields. |
tie_breaker | The disjunction max tie breaker for multi fields. Defaults to 0. |
The fields parameter can also include pattern based field names, allowing automatic expansion to the relevant fields (dynamically introduced fields included). For example:
GET /_search
{
"query": {
"query_string" : {
"fields" : ["content", "name.*^5"],
"query" : "this AND that OR thus"
}
}
}
Synonyms
The query_string query supports multi-term synonym expansion with the synonym_graph token filter. When this filter is used, the parser creates a phrase query for each multi-term synonym. For example, the following synonym: ny, new york would produce:
(ny OR ("new york"))
It is also possible to match multi-term synonyms with conjunctions instead:
GET /_search
{
"query": {
"query_string" : {
"default_field": "title",
"query" : "ny city",
"auto_generate_synonyms_phrase_query" : false
}
}
}
The example above creates a boolean query:
(ny OR (new AND york)) city
that matches documents with the term ny or the conjunction new AND york.
By default the parameter auto_generate_synonyms_phrase_query is set to true.
Minimum should match
The query_string query splits the query around each operator to create a boolean query for the entire input. You can use minimum_should_match to control how many "should" clauses in the resulting query should match.
GET /_search
{
"query": {
"query_string": {
"fields": [
"title"
],
"query": "this that thus",
"minimum_should_match": 2
}
}
}
The example above creates a boolean query:
(title:this title:that title:thus)~2
that matches documents with at least two of the terms this, that or thus in the single field title.
Multi Field
GET /_search
{
"query": {
"query_string": {
"fields": [
"title",
"content"
],
"query": "this that thus",
"minimum_should_match": 2
}
}
}
The example above creates a boolean query:
(content:this content:that content:thus) | (title:this title:that title:thus)
that matches documents with the disjunction max over the fields title and content. Here the minimum_should_match parameter can’t be applied.
GET /_search
{
"query": {
"query_string": {
"fields": [
"title",
"content"
],
"query": "this OR that OR thus",
"minimum_should_match": 2
}
}
}
Adding explicit operators forces each term to be considered as a separate clause.
The example above creates a boolean query:
((content:this | title:this) (content:that | title:that) (content:thus | title:thus))~2
that matches documents with at least two of the three "should" clauses, each of them made of the disjunction max over the fields for each term.
Cross Field
GET /_search
{
"query": {
"query_string": {
"fields": [
"title",
"content"
],
"query": "this OR that OR thus",
"type": "cross_fields",
"minimum_should_match": 2
}
}
}
The cross_fields value in the type field indicates that fields that have the same analyzer should be grouped together when the input is analyzed.
The example above creates a boolean query:
(blended(terms:[field2:this, field1:this]) blended(terms:[field2:that, field1:that]) blended(terms:[field2:thus, field1:thus]))~2
that matches documents with at least two of the three per-term blended queries.
Query string syntax
The query string "mini-language" is used by the Query String Query and by the q query string parameter in the search API.
The query string is parsed into a series of terms and operators. A
term can be a single word — quick
or brown
— or a phrase, surrounded by
double quotes — "quick brown"
— which searches for all the words in the
phrase, in the same order.
Operators allow you to customize the search — the available options are explained below.
Field names
As mentioned in Query String Query, the default_field
is searched for the
search terms, but it is possible to specify other fields in the query syntax:
- where the status field contains active:
status:active
- where the title field contains quick or brown. If you omit the OR operator the default operator will be used:
title:(quick OR brown)
title:(quick brown)
- where the author field contains the exact phrase "john smith":
author:"John Smith"
- where the first name field contains Alice (note how we need to escape the space with a backslash):
first\ name:Alice
- where any of the fields book.title, book.content or book.date contains quick or brown (note how we need to escape the * with a backslash):
book.\*:(quick brown)
- where the field title has any non-null value:
_exists_:title
Wildcards
Wildcard searches can be run on individual terms, using ? to replace a single character, and * to replace zero or more characters:
qu?ck bro*
Be aware that wildcard queries can use an enormous amount of memory and
perform very badly — just think how many terms need to be queried to
match the query string "a* b* c*"
.
Warning
|
Pure wildcards \* are rewritten to exists queries for efficiency. As a consequence, the wildcard field:* would match documents with a non-null value in that field, and would not match if the field is missing or set with an explicit null value like the following: {"field": null}
|
Warning
|
Allowing a wildcard at the beginning of a word (eg "*ing") is particularly heavy, because all terms in the index need to be examined, just in case they match. Leading wildcards can be disabled by setting allow_leading_wildcard to false.
|
Only parts of the analysis chain that operate at the character level are applied. So for instance, if the analyzer performs both lowercasing and stemming, only the lowercasing will be applied: it would be wrong to perform stemming on a word that is missing some of its letters.
By setting analyze_wildcard
to true, queries that end with a *
will be
analyzed and a boolean query will be built out of the different tokens, by
ensuring exact matches on the first N-1 tokens, and prefix match on the last
token.
Regular expressions
Regular expression patterns can be embedded in the query string by wrapping them in forward-slashes ("/"):
name:/joh?n(ath[oa]n)/
The supported regular expression syntax is explained in Regular expression syntax.
Warning
|
The allow_leading_wildcard parameter does not have any control over regular expressions. A query string such as the following would force Elasticsearch to visit every term in the index: /.*n/ Use with caution!
|
Fuzziness
We can search for terms that are similar to, but not exactly like, our search terms, using the "fuzzy" operator:
quikc~ brwn~ foks~
This uses the Damerau-Levenshtein distance to find all terms with a maximum of two changes, where a change is the insertion, deletion or substitution of a single character, or transposition of two adjacent characters.
The default edit distance is 2, but an edit distance of 1 should be sufficient to catch 80% of all human misspellings. It can be specified as:
quikc~1
Proximity searches
While a phrase query (eg "john smith"
) expects all of the terms in exactly
the same order, a proximity query allows the specified words to be further
apart or in a different order. In the same way that fuzzy queries can
specify a maximum edit distance for characters in a word, a proximity search
allows us to specify a maximum edit distance of words in a phrase:
"fox quick"~5
The closer the text in a field is to the original order specified in the
query string, the more relevant that document is considered to be. When
compared to the above example query, the phrase "quick fox"
would be
considered more relevant than "quick brown fox"
.
Ranges
Ranges can be specified for date, numeric or string fields. Inclusive ranges are specified with square brackets [min TO max] and exclusive ranges with curly brackets {min TO max}.
- All days in 2012:
date:[2012-01-01 TO 2012-12-31]
- Numbers 1..5:
count:[1 TO 5]
- Tags between alpha and omega, excluding alpha and omega:
tag:{alpha TO omega}
- Numbers from 10 upwards:
count:[10 TO *]
- Dates before 2012:
date:{* TO 2012-01-01}
Curly and square brackets can be combined:
- Numbers from 1 up to but not including 5:
count:[1 TO 5}
Ranges with one side unbounded can use the following syntax:
age:>10 age:>=10 age:<10 age:<=10
Note
|
To combine an upper and lower bound with the simplified syntax, you would need to join two clauses with an AND operator:
age:(>=10 AND <20)
age:(+>=10 +<20)
|
The parsing of ranges in query strings can be complex and error prone. It is much more reliable to use an explicit range query, as sketched below.
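For instance, the bounded range above could be written as (the age field is carried over from the examples):
GET /_search
{
    "query": {
        "range" : {
            "age" : {
                "gte" : 10,
                "lt" : 20
            }
        }
    }
}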
Boosting
Use the boost operator ^ to make one term more relevant than another.
For instance, if we want to find all documents about foxes, but we are
especially interested in quick foxes:
quick^2 fox
The default boost value is 1, but it can be any positive floating point number. Boosts between 0 and 1 reduce relevance.
Boosts can also be applied to phrases or to groups:
"john smith"^2 (foo bar)^4
Boolean operators
By default, all terms are optional, as long as one term matches. A search for foo bar baz will find any document that contains one or more of foo or bar or baz. We have already discussed the default_operator above, which allows you to force all terms to be required, but there are also boolean operators which can be used in the query string itself to provide more control.
The preferred operators are + (this term must be present) and - (this term must not be present). All other terms are optional.
For example, this query:
quick brown +fox -news
states that:
- fox must be present
- news must not be present
- quick and brown are optional — their presence increases the relevance
The familiar boolean operators AND, OR and NOT (also written &&, || and !) are also supported, but beware that they do not honor the usual precedence rules, so parentheses should be used whenever multiple operators are used together. For instance the previous query could be rewritten as:
((quick AND fox) OR (brown AND fox) OR fox) AND NOT news
This form now replicates the logic from the original query correctly, but the relevance scoring bears little resemblance to the original.
In contrast, the same query rewritten using the match query would look like this:
{
    "bool": {
        "must":     { "match": "fox"         },
        "should":   { "match": "quick brown" },
        "must_not": { "match": "news"        }
    }
}
Grouping
Multiple terms or clauses can be grouped together with parentheses, to form sub-queries:
(quick OR brown) AND fox
Groups can be used to target a particular field, or to boost the result of a sub-query:
status:(active OR pending) title:(full text search)^2
Reserved characters
If you need to use any of the characters which function as operators in your
query itself (and not as operators), then you should escape them with
a leading backslash. For instance, to search for (1+1)=2
, you would
need to write your query as \(1\+1\)\=2
. When using JSON for the request body, two preceding backslashes (\\
) are required; the backslash is a reserved escaping character in JSON strings.
GET /twitter/_search
{
"query" : {
"query_string" : {
"query" : "kimchy\\!",
"fields" : ["user"]
}
}
}
The reserved characters are: + - = && || > < ! ( ) { } [ ] ^ " ~ * ? : \ /
Failing to escape these special characters correctly could lead to a syntax error which prevents your query from running.
Note
|
< and > can’t be escaped at all. The only way to prevent them from
attempting to create a range query is to remove them from the query string
entirely.
|
Empty Query
If the query string is empty or contains only whitespace, the query will yield an empty result set.
Simple Query String Query
A query that uses the SimpleQueryParser to parse its content. Unlike the regular query_string query, the simple_query_string query will never throw an exception, and discards invalid parts of the query. Here is an example:
GET /_search
{
"query": {
"simple_query_string" : {
"query": "\"fried eggs\" +(eggplant | potato) -frittata",
"fields": ["title^5", "body"],
"default_operator": "and"
}
}
}
The simple_query_string top level parameters include:
Parameter | Description |
---|---|
query | The actual query to be parsed. See below for syntax. |
fields | The fields to perform the parsed query against. Defaults to the index.query.default_field index setting, which in turn defaults to *. WARNING: In future versions (starting in 7.0), there will be a limit on the number of fields that can be queried at once. This limit will be determined by the indices.query.bool.max_clause_count setting, which defaults to 1024. |
default_operator | The default operator used if no explicit operator is specified. For example, with a default operator of OR, the query capital of Hungary is translated to capital OR of OR Hungary, and with a default operator of AND, the same query is translated to capital AND of AND Hungary. The default value is OR. |
analyzer | Force the analyzer to use to analyze each term of the query when creating composite queries. |
flags | A set of flags specifying which features of the simple_query_string to enable. Defaults to ALL. |
analyze_wildcard | Whether terms of prefix queries should be automatically analyzed or not. If true, a best effort will be made to analyze the prefix. However, some analyzers will not be able to provide a meaningful prefix. This requires caution. Defaults to false. |
lenient | If set to true will cause format based failures (like providing text to a numeric field) to be ignored. |
minimum_should_match | The minimum number of clauses that must match for a document to be returned. See the minimum_should_match documentation for the full list of options. |
quote_field_suffix | A suffix to append to fields for quoted parts of the query string. This allows to use a field that has a different analysis chain for exact matching. Look here for a comprehensive example. |
auto_generate_synonyms_phrase_query | Whether phrase queries should be automatically generated for multi term synonyms. Defaults to true. |
all_fields | deprecated[6.0.0, set fields to * instead]. Perform the query on all fields detected in the mapping that can be queried. |
fuzzy_prefix_length | Set the prefix length for fuzzy queries. Default is 0. |
fuzzy_max_expansions | Controls the number of terms fuzzy queries will expand to. Defaults to 50. |
fuzzy_transpositions | Set to false to disable fuzzy transpositions (ab → ba). Default is true. |
Simple Query String Syntax
The simple_query_string supports the following special characters:
- + signifies AND operation
- | signifies OR operation
- - negates a single token
- " wraps a number of tokens to signify a phrase for searching
- * at the end of a term signifies a prefix query
- ( and ) signify precedence
- ~N after a word signifies edit distance (fuzziness)
- ~N after a phrase signifies slop amount
In order to search for any of these special characters, they will need to be escaped with \.
Be aware that this syntax may have a different behavior depending on the default_operator value. For example, consider the following query:
GET /_search
{
"query": {
"simple_query_string" : {
"fields" : ["content"],
"query" : "foo bar -baz"
}
}
}
You may expect that documents containing only "foo" or "bar" will be returned, as long as they do not contain "baz". However, due to the default_operator being OR, this really means "match documents that contain "foo", or documents that contain "bar", or documents that don’t contain "baz"". If this is unintended, the query can be switched to "foo bar +-baz", which will not return documents that contain "baz".
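As a sketch, the corrected query from the paragraph above would be sent as:
GET /_search
{
    "query": {
        "simple_query_string" : {
            "fields" : ["content"],
            "query" : "foo bar +-baz"
        }
    }
}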
Default Field
When not explicitly specifying the field to search on in the query string syntax, the index.query.default_field setting will be used to derive which fields to search on. It defaults to *, and the query will automatically attempt to determine the existing fields in the index’s mapping that are queryable, and perform the search on those fields.
Multi Field
The fields parameter can also include pattern based field names, allowing automatic expansion to the relevant fields (dynamically introduced fields included). For example:
GET /_search
{
"query": {
"simple_query_string" : {
"fields" : ["content", "name.*^5"],
"query" : "foo bar baz"
}
}
}
Flags
simple_query_string supports multiple flags to specify which parsing features should be enabled. It is specified as a |-delimited string with the flags parameter:
GET /_search
{
"query": {
"simple_query_string" : {
"query" : "foo | bar + baz*",
"flags" : "OR|AND|PREFIX"
}
}
}
The available flags are:
Flag | Description |
---|---|
ALL | Enables all parsing features. This is the default. |
NONE | Switches off all parsing features. |
AND | Enables the + AND operator. |
OR | Enables the \| OR operator. |
NOT | Enables the - NOT operator. |
PREFIX | Enables the * prefix operator. |
PHRASE | Enables the " quotes operator used to search for phrases. |
PRECEDENCE | Enables the ( and ) operators to control operator precedence. |
ESCAPE | Enables \ as the escape character. |
WHITESPACE | Enables whitespaces as split characters. |
FUZZY | Enables the ~N operator after a word, where N indicates the maximum edit distance for fuzzy matching. |
SLOP | Enables the ~N operator after a phrase, where N indicates the slop amount. |
NEAR | Synonymous to SLOP. |
Synonyms
The simple_query_string query supports multi-term synonym expansion with the synonym_graph token filter. When this filter is used, the parser creates a phrase query for each multi-term synonym. For example, the following synonym: "ny, new york" would produce:
(ny OR ("new york"))
It is also possible to match multi-term synonyms with conjunctions instead:
GET /_search
{
"query": {
"simple_query_string" : {
"query" : "ny city",
"auto_generate_synonyms_phrase_query" : false
}
}
}
The example above creates a boolean query:
(ny OR (new AND york)) city
that matches documents with the term ny or the conjunction new AND york.
By default the parameter auto_generate_synonyms_phrase_query is set to true.
Term-level queries
You can use term-level queries to find documents based on precise values in structured data. Examples of structured data include date ranges, IP addresses, prices, or product IDs.
Unlike full-text queries, term-level queries do not analyze search terms. Instead, term-level queries match the exact terms stored in a field.
Note
|
Term-level queries still normalize search terms for keyword fields with the normalizer property. For more details, see normalizer.
|
Types of term-level queries
- term query: Returns documents that contain an exact term in a provided field.
- terms query: Returns documents that contain one or more exact terms in a provided field.
- terms_set query: Returns documents that contain a minimum number of exact terms in a provided field. You can define the minimum number of matching terms using a field or script.
- range query: Returns documents that contain terms within a provided range.
- exists query: Returns documents that contain any indexed value for a field.
- prefix query: Returns documents that contain a specific prefix in a provided field.
- wildcard query: Returns documents that contain terms matching a wildcard pattern.
- regexp query: Returns documents that contain terms matching a regular expression.
- fuzzy query: Returns documents that contain terms similar to the search term. {es} measures similarity, or fuzziness, using a Levenshtein edit distance.
- type query: Returns documents of the specified type.
- ids query: Returns documents based on their document IDs.
Term Query
The term query finds documents that contain the exact term specified in the inverted index. For instance:
POST _search
{
"query": {
"term" : { "user" : "Kimchy" } (1)
}
}
- Finds documents which contain the exact term Kimchy in the inverted index of the user field.
A boost parameter can be specified to give this term query a higher relevance score than another query, for instance:
GET _search
{
"query": {
"bool": {
"should": [
{
"term": {
"status": {
"value": "urgent",
"boost": 2.0 (1)
}
}
},
{
"term": {
"status": "normal" (2)
}
}
]
}
}
}
- The urgent query clause has a boost of 2.0, meaning it is twice as important as the query clause for normal.
- The normal clause has the default neutral boost of 1.0.
A term query can also match against range data types.
Terms Query
Filters documents that have fields that match any of the provided terms (not analyzed). For example:
GET /_search
{
"query": {
"terms" : { "user" : ["kimchy", "elasticsearch"]}
}
}
Note
|
Highlighting terms queries is best-effort only, so terms of a terms
query might not be highlighted depending on the highlighter implementation that
is selected and on the number of terms in the terms query.
|
Terms lookup mechanism
When you need to specify a terms filter with a lot of terms, it can be beneficial to fetch those term values from a document in an index. A concrete example would be to filter tweets tweeted by your followers. Potentially the number of user ids specified in the terms filter can be large. In this scenario it makes sense to use the terms filter’s terms lookup mechanism.
The terms lookup mechanism supports the following options:
index | The index to fetch the term values from. |
type | The type to fetch the term values from. |
id | The id of the document to fetch the term values from. |
path | The field specified as path to fetch the actual values for the terms filter. |
routing | A custom routing value to be used when retrieving the external terms doc. |
The values for the terms filter will be fetched from a field in a document with the specified id in the specified type and index. Internally a get request is executed to fetch the values from the specified path. At the moment, for this feature to work, the _source needs to be stored.
Also, consider using an index with a single shard and fully replicated across all nodes if the "reference" terms data is not large. The lookup terms filter will prefer to execute the get request on a local node if possible, reducing the need for networking.
Warning
|
Executing a Terms Query request with a lot of terms can be quite slow,
as each additional term demands extra processing and memory.
To safeguard against this, the maximum number of terms that can be used
in a Terms Query both directly or through lookup has been limited to 65536 .
This default maximum can be changed for a particular index with the index setting
index.max_terms_count .
|
Terms lookup twitter example
First we index the information for the user with id 2, specifically its followers; then we index a tweet from the user with id 1. Finally we search for all the tweets that match the followers of user 2.
PUT /users/_doc/2
{
"followers" : ["1", "3"]
}
PUT /tweets/_doc/1
{
"user" : "1"
}
GET /tweets/_search
{
"query" : {
"terms" : {
"user" : {
"index" : "users",
"type" : "_doc",
"id" : "2",
"path" : "followers"
}
}
}
}
The structure of the external terms document can also include an array of inner objects, for example:
PUT /users/_doc/2
{
"followers" : [
{
"id" : "1"
},
{
"id" : "2"
}
]
}
In which case the lookup path will be followers.id.
Terms Set Query
Returns any documents that match one or more of the provided terms. The terms are not analyzed and thus must match exactly. The number of terms that must match varies per document and is either controlled by a minimum-should-match field or computed per document in a minimum-should-match script.
The field that controls the number of required terms that must match must be a number field:
PUT /my-index
{
"mappings": {
"_doc": {
"properties": {
"required_matches": {
"type": "long"
}
}
}
}
}
PUT /my-index/_doc/1?refresh
{
"codes": ["ghi", "jkl"],
"required_matches": 2
}
PUT /my-index/_doc/2?refresh
{
"codes": ["def", "ghi"],
"required_matches": 2
}
An example that uses the minimum should match field:
GET /my-index/_search
{
"query": {
"terms_set": {
"codes" : {
"terms" : ["abc", "def", "ghi"],
"minimum_should_match_field": "required_matches"
}
}
}
}
Response:
{
"took": 13,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped" : 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.5753642,
"hits": [
{
"_index": "my-index",
"_type": "_doc",
"_id": "2",
"_score": 0.5753642,
"_source": {
"codes": ["def", "ghi"],
"required_matches": 2
}
}
]
}
}
Scripts can also be used to control how many terms are required to match in a more dynamic way. For example a create date or a popularity field can be used as basis for the number of required terms to match.
Also, the params.num_terms parameter is available in the script to indicate the number of terms that have been specified.
An example that limits the number of required terms to match so it never becomes larger than the number of terms specified:
GET /my-index/_search
{
"query": {
"terms_set": {
"codes" : {
"terms" : ["abc", "def", "ghi"],
"minimum_should_match_script": {
"source": "Math.min(params.num_terms, doc['required_matches'].value)"
}
}
}
}
}
Range Query
Matches documents with fields that have terms within a certain range. The type of the Lucene query depends on the field type: for string fields, the TermRangeQuery, while for number/date fields, the query is a NumericRangeQuery. The following example returns all documents where age is between 10 and 20:
GET _search
{
"query": {
"range" : {
"age" : {
"gte" : 10,
"lte" : 20,
"boost" : 2.0
}
}
}
}
The range query accepts the following parameters:
gte | Greater-than or equal to |
gt | Greater-than |
lte | Less-than or equal to |
lt | Less-than |
boost | Sets the boost value of the query, defaults to 1.0. |
Ranges on date fields
When running range queries on fields of type date, ranges can be specified using [date-math]:
GET _search
{
"query": {
"range" : {
"date" : {
"gte" : "now-1d/d",
"lt" : "now/d"
}
}
}
}
Date math and rounding
When using date math to round dates to the nearest day, month, hour, etc, the rounded dates depend on whether the ends of the ranges are inclusive or exclusive.
Rounding up moves to the last millisecond of the rounding scope, and rounding down to the first millisecond of the rounding scope. For example:
gt | Greater than the date rounded up: 2014-11-18||/M becomes 2014-11-30T23:59:59.999, ie excluding the entire month. |
gte | Greater than or equal to the date rounded down: 2014-11-18||/M becomes 2014-11-01, ie including the entire month. |
lt | Less than the date rounded down: 2014-11-18||/M becomes 2014-11-01, ie excluding the entire month. |
lte | Less than or equal to the date rounded up: 2014-11-18||/M becomes 2014-11-30T23:59:59.999, ie including the entire month. |
Date format in range queries
Formatted dates will be parsed using the format specified on the date field by default, but it can be overridden by passing the format parameter to the range query:
GET _search
{
"query": {
"range" : {
"born" : {
"gte": "01/01/2012",
"lte": "2013",
"format": "dd/MM/yyyy||yyyy"
}
}
}
}
Note that if the date is missing some of the year, month and day components, the missing parts are filled in with the start of unix time, which is January 1st, 1970. This means that when e.g. specifying `dd` as the format, a value like `"gte" : 10` will translate to `1970-01-10T00:00:00.000Z`.
Time zone in range queries
Dates can be converted from another timezone to UTC either by specifying the
time zone in the date value itself (if the format
accepts it), or it can be specified as the time_zone
parameter:
GET _search
{
"query": {
"range" : {
"timestamp" : {
"gte": "2015-01-01T00:00:00", (1)
"lte": "now", (2)
"time_zone": "+01:00"
}
}
}
}
- This date will be converted to `2014-12-31T23:00:00 UTC`.
- `now` is not affected by the `time_zone` parameter; it is always the current system time (in UTC). However, when using date math rounding (e.g. down to the nearest day using `now/d`), the provided `time_zone` will be considered.
Querying range fields
`range` queries can be used on fields of type `range`, allowing you to match a range specified in the query with a range field value in the document. The `relation` parameter controls how these two ranges are matched:

| Relation | Description |
|---|---|
| `WITHIN` | Matches documents whose range field is entirely within the query's range. |
| `CONTAINS` | Matches documents whose range field entirely contains the query's range. |
| `INTERSECTS` | Matches documents whose range field intersects the query's range. This is the default value when querying range fields. |

For examples, see the `range` mapping type.
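A minimal sketch, assuming a hypothetical `integer_range` field named `expected_attendees`:
GET /_search
{
    "query" : {
        "range" : {
            "expected_attendees" : {
                "gte" : 10,
                "lte" : 20,
                "relation" : "WITHIN"
            }
        }
    }
}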
Exists Query
Returns documents that contain a value other than null
or []
in a provided
field.
Example request
GET /_search
{
"query": {
"exists": {
"field": "user"
}
}
}
Top-level parameters for exists
field
-
(Required, string) Name of the field you wish to search.
To return a document, this field must exist and contain a value other than
null
or[]
. These values can include:-
Empty strings, such as
""
or"-"
-
Arrays containing
null
and another value, such as[null, "foo"]
-
A custom
null-value
, defined in field mapping
-
Notes
Find documents with null values
To find documents that contain only null
values or []
in a provided field,
use the must_not
boolean query with the exists
query.
The following search returns documents that contain only null
values or []
in the user
field.
GET /_search
{
"query": {
"bool": {
"must_not": {
"exists": {
"field": "user"
}
}
}
}
}
Prefix Query
Matches documents that have fields containing terms with a specified
prefix (not analyzed). The prefix query maps to Lucene PrefixQuery
.
The following matches documents where the user field contains a term
that starts with ki
:
GET /_search
{ "query": {
"prefix" : { "user" : "ki" }
}
}
A boost can also be associated with the query:
GET /_search
{ "query": {
"prefix" : { "user" : { "value" : "ki", "boost" : 2.0 } }
}
}
This multi term query allows you to control how it gets rewritten using the `rewrite` parameter.
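A minimal sketch combining `value` and `rewrite` (the `constant_score` rewrite shown here is one of the valid values documented for the `rewrite` parameter):
GET /_search
{
  "query": {
    "prefix" : {
      "user" : {
        "value" : "ki",
        "rewrite" : "constant_score"
      }
    }
  }
}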
Wildcard Query
Returns documents that contain terms matching a wildcard pattern.
A wildcard operator is a placeholder that matches one or more characters: the `?` operator matches any single character, while the `*` operator matches zero or more characters. You can combine wildcard operators with other characters to create a wildcard pattern.
Example request
The following search returns documents where the user
field contains a term
that begins with ki
and ends with y
. These matching terms can include kiy
,
kity
, or kimchy
.
GET /_search
{
"query": {
"wildcard": {
"user": {
"value": "ki*y",
"boost": 1.0,
"rewrite": "constant_score"
}
}
}
}
Top-level parameters for wildcard
<field>
-
(Required, object) Field you wish to search.
Parameters for <field>
value
-
(Required, string) Wildcard pattern for terms you wish to find in the provided
<field>
.This parameter supports two wildcard operators:
- `?`, which matches any single character
- `*`, which matches zero or more characters, including none

Warning: Avoid beginning patterns with `*` or `?`. This can increase the iterations needed to find matching terms and slow search performance.
boost
-
(Optional, float) Floating point number used to decrease or increase the relevance scores of a query. Defaults to
1.0
.You can use the
boost
parameter to adjust relevance scores for searches containing two or more queries.Boost values are relative to the default value of
1.0
. A boost value between0
and1.0
decreases the relevance score. A value greater than1.0
increases the relevance score. rewrite
-
(Optional, string) Method used to rewrite the query. For valid values and more information, see the
rewrite
parameter.
Regexp Query
The regexp
query allows you to use regular expression term queries.
See Regular expression syntax for details of the supported regular expression language.
The "term queries" in that first sentence means that Elasticsearch will apply
the regexp to the terms produced by the tokenizer for that field, and not
to the original text of the field.
Note: The performance of a `regexp` query heavily depends on the regular expression chosen. Matching everything like `.*` is very slow, as is using lookaround regular expressions. If possible, you should try to use a long prefix before your regular expression starts. Wildcard matchers like `.*?+` will mostly lower performance.
GET /_search
{
"query": {
"regexp":{
"name.first": "s.*y"
}
}
}
Boosting is also supported:
GET /_search
{
"query": {
"regexp":{
"name.first":{
"value":"s.*y",
"boost":1.2
}
}
}
}
You can also use special flags:
GET /_search
{
"query": {
"regexp":{
"name.first": {
"value": "s.*y",
"flags" : "INTERSECTION|COMPLEMENT|EMPTY"
}
}
}
}
Possible flags are `ALL` (default), `ANYSTRING`, `COMPLEMENT`, `EMPTY`, `INTERSECTION`, `INTERVAL`, or `NONE`. Please check the Lucene documentation for their meaning.
Regular expressions are dangerous because it’s easy to accidentally
create an innocuous looking one that requires an exponential number of
internal determinized automaton states (and corresponding RAM and CPU)
for Lucene to execute. Lucene prevents these using the
max_determinized_states
setting (defaults to 10000). You can raise
this limit to allow more complex regular expressions to execute.
GET /_search
{
"query": {
"regexp":{
"name.first": {
"value": "s.*y",
"flags" : "INTERSECTION|COMPLEMENT|EMPTY",
"max_determinized_states": 20000
}
}
}
}
Note
|
By default the maximum length of regex string allowed in a Regexp Query
is limited to 1000. You can update the index.max_regex_length index setting
to bypass this limit.
|
Regular expression syntax
Regular expression queries are supported by the regexp
and the query_string
queries. The Lucene regular expression engine
is not Perl-compatible but supports a smaller range of operators.
Note
|
We will not attempt to explain regular expressions, but just explain the supported operators. |
Standard operators
- Anchoring
-
Most regular expression engines allow you to match any part of a string. If you want the regexp pattern to start at the beginning of the string or finish at the end of the string, then you have to anchor it specifically, using
^
to indicate the beginning or$
to indicate the end.Lucene’s patterns are always anchored. The pattern provided must match the entire string. For string
"abcde"
:ab.* # match abcd # no match
- Allowed characters
-
Any Unicode characters may be used in the pattern, but certain characters are reserved and must be escaped. The standard reserved characters are:
. ? + * | { } [ ] ( ) " \
If you enable optional features (see below) then these characters may also be reserved:
# @ & < > ~
Any reserved character can be escaped with a backslash
"\*"
including a literal backslash character:"\\"
Additionally, any characters (except double quotes) are interpreted literally when surrounded by double quotes:
john"@smith.com"
- Match any character
-
The period
"."
can be used to represent any character. For string"abcde"
:ab... # match a.c.e # match
- One-or-more
-
The plus sign
"+"
can be used to repeat the preceding shortest pattern once or more times. For string"aaabbb"
:a+b+ # match aa+bb+ # match a+.+ # match aa+bbb+ # match
- Zero-or-more
-
The asterisk
"*"
can be used to match the preceding shortest pattern zero-or-more times. For string "aaabbb":a*b* # match a*b*c* # match .*bbb.* # match aaa*bbb* # match
- Zero-or-one
-
The question mark
"?"
makes the preceding shortest pattern optional. It matches zero or one times. For string"aaabbb"
:aaa?bbb? # match aaaa?bbbb? # match .....?.? # match aa?bb? # no match
- Min-to-max
-
Curly brackets
"{}"
can be used to specify a minimum and (optionally) a maximum number of times the preceding shortest pattern can repeat. The allowed forms are:{5} # repeat exactly 5 times {2,5} # repeat at least twice and at most 5 times {2,} # repeat at least twice
For string
"aaabbb"
:a{3}b{3} # match a{2,4}b{2,4} # match a{2,}b{2,} # match .{3}.{3} # match a{4}b{4} # no match a{4,6}b{4,6} # no match a{4,}b{4,} # no match
- Grouping
-
Parentheses
"()"
can be used to form sub-patterns. The quantity operators listed above operate on the shortest previous pattern, which can be a group. For string"ababab"
:(ab)+ # match ab(ab)+ # match (..)+ # match (...)+ # no match (ab)* # match abab(ab)? # match ab(ab)? # no match (ab){3} # match (ab){1,2} # no match
- Alternation
-
The pipe symbol
"|"
acts as an OR operator. The match will succeed if the pattern on either the left-hand side OR the right-hand side matches. The alternation applies to the longest pattern, not the shortest. For string"aabb"
:aabb|bbaa # match aacc|bb # no match aa(cc|bb) # match a+|b+ # no match a+b+|b+a+ # match a+(b|c)+ # match
- Character classes
-
Ranges of potential characters may be represented as character classes by enclosing them in square brackets
"[]"
. A leading^
negates the character class. The allowed forms are:[abc] # 'a' or 'b' or 'c' [a-c] # 'a' or 'b' or 'c' [-abc] # '-' or 'a' or 'b' or 'c' [abc\-] # '-' or 'a' or 'b' or 'c' [^abc] # any character except 'a' or 'b' or 'c' [^a-c] # any character except 'a' or 'b' or 'c' [^-abc] # any character except '-' or 'a' or 'b' or 'c' [^abc\-] # any character except '-' or 'a' or 'b' or 'c'
Note that the dash
"-"
indicates a range of characters, unless it is the first character or if it is escaped with a backslash.For string
"abcd"
:ab[cd]+ # match [a-d]+ # match [^a-d]+ # no match
Optional operators
These operators are available by default as the flags
parameter defaults to ALL
.
Different flag combinations (concatenated with "|"
) can be used to enable/disable
specific operators:
{ "regexp": { "username": { "value": "john~athon<1-5>", "flags": "COMPLEMENT|INTERVAL" } } }
- Complement
-
The complement is probably the most useful option. The shortest pattern that follows a tilde
"~"
is negated. For instance, "ab~cd" means:
Starts with
a
-
Followed by
b
-
Followed by a string of any length that is anything but
c
-
Ends with
d
For the string
"abcdef"
:ab~df # match ab~cf # match ab~cdef # no match a~(cb)def # match a~(bc)def # no match
Enabled with the
COMPLEMENT
orALL
flags. -
- Interval
-
The interval option enables the use of numeric ranges, enclosed by angle brackets
"<>"
. For string "foo80":
foo<1-100> # match foo<01-100> # match foo<001-100> # no match
Enabled with the
INTERVAL
orALL
flags. - Intersection
-
The ampersand
"&"
joins two patterns in a way that both of them have to match. For string"aaabbb"
:aaa.+&.+bbb # match aaa&bbb # no match
Using this feature usually means that you should rewrite your regular expression.
Enabled with the
INTERSECTION
orALL
flags. - Any string
-
The at sign
"@"
matches any string in its entirety. This could be combined with the intersection and complement above to express "everything except". For instance:@&~(foo.+) # anything except string beginning with "foo"
Enabled with the
ANYSTRING
orALL
flags.
Fuzzy Query
The fuzzy query uses similarity based on Levenshtein edit distance.
String fields
The fuzzy
query generates matching terms that are within the
maximum edit distance specified in fuzziness
and then checks the term
dictionary to find out which of those generated terms actually exist in the
index. The final query uses up to max_expansions
matching terms.
Here is a simple example:
GET /_search
{
"query": {
"fuzzy" : { "user" : "ki" }
}
}
Or with more advanced settings:
GET /_search
{
"query": {
"fuzzy" : {
"user" : {
"value": "ki",
"boost": 1.0,
"fuzziness": 2,
"prefix_length": 0,
"max_expansions": 100
}
}
}
}
Parameters
| Parameter | Description |
|---|---|
| `fuzziness` | The maximum edit distance. Defaults to `AUTO`. |
| `prefix_length` | The number of initial characters which will not be "fuzzed". This helps to reduce the number of terms which must be examined. Defaults to `0`. |
| `max_expansions` | The maximum number of terms that the `fuzzy` query will expand to. Defaults to `50`. |
| `transpositions` | Whether fuzzy transpositions (`ab` → `ba`) are supported. Defaults to `true`. |
Warning
|
This query can be very heavy if prefix_length is set to 0 and if
max_expansions is set to a high number. It could result in every term in the
index being examined!
|
Type Query
Filters documents matching the provided document / mapping type.
GET /_search
{
"query": {
"type" : {
"value" : "_doc"
}
}
}
Ids Query
Returns documents based on their IDs. This query uses document IDs stored in
the _id
field.
Example request
GET /_search
{
"query": {
"ids" : {
"type" : "_doc",
"values" : ["1", "4", "100"]
}
}
}
Top-level parameters for ids
values
-
(Required, array of strings) An array of document IDs.
Compound queries
Compound queries wrap other compound or leaf queries, either to combine their results and scores, to change their behaviour, or to switch from query to filter context.
The queries in this group are:
constant_score
query-
A query which wraps another query, but executes it in filter context. All matching documents are given the same "constant" `_score`.
. bool
query-
The default query for combining multiple leaf or compound query clauses, as
must
,should
,must_not
, orfilter
clauses. Themust
andshould
clauses have their scores combined — the more matching clauses, the better — while themust_not
andfilter
clauses are executed in filter context. dis_max
query-
A query which accepts multiple queries, and returns any documents which match any of the query clauses. While the
bool
query combines the scores from all matching queries, thedis_max
query uses the score of the single best-matching query clause.
query-
Modify the scores returned by the main query with functions to take into account factors like popularity, recency, distance, or custom algorithms implemented with scripting.
boosting
query-
Return documents which match a
positive
query, but reduce the score of documents which also match anegative
query.
Constant Score Query
Wraps a filter query and returns every matching
document with a relevance score equal to the boost
parameter value.
GET /_search
{
"query": {
"constant_score" : {
"filter" : {
"term" : { "user" : "kimchy"}
},
"boost" : 1.2
}
}
}
Top-level parameters for constant_score
filter
-
(Required, query object) Filter query you wish to run. Any returned documents must match this query.
Filter queries do not calculate relevance scores. To speed up performance, {es} automatically caches frequently used filter queries.
boost
-
(Optional, float) Floating point number used as the constant relevance score for every document matching the
filter
query. Defaults to1.0
.
Bool Query
A query that matches documents matching boolean combinations of other
queries. The bool query maps to Lucene BooleanQuery
. It is built using
one or more boolean clauses, each clause with a typed occurrence. The
occurrence types are:
| Occur | Description |
|---|---|
| `must` | The clause (query) must appear in matching documents and will contribute to the score. |
| `filter` | The clause (query) must appear in matching documents. However unlike `must`, the score of the query will be ignored. Filter clauses are executed in filter context, meaning that scoring is ignored and clauses are considered for caching. |
| `should` | The clause (query) should appear in the matching document. If the `bool` query is in a query context and has a `must` or `filter` clause then a document will match the `bool` query even if none of the `should` queries match. In this case these clauses are only used to influence the score. If the `bool` query is in a filter context or has neither `must` nor `filter` then at least one of the `should` queries must match a document for it to match the `bool` query. The minimum number of `should` clauses to match can be set using the `minimum_should_match` parameter. |
| `must_not` | The clause (query) must not appear in the matching documents. Clauses are executed in filter context, meaning that scoring is ignored and clauses are considered for caching. Because scoring is ignored, a score of `0` for all documents is returned. |
Important
|
Bool query in filter context
If this query is used in a filter context and it has `should` clauses then at least one `should` clause is required to match. |
The bool
query takes a more-matches-is-better approach, so the score from
each matching must
or should
clause will be added together to provide the
final _score
for each document.
POST _search
{
"query": {
"bool" : {
"must" : {
"term" : { "user" : "kimchy" }
},
"filter": {
"term" : { "tag" : "tech" }
},
"must_not" : {
"range" : {
"age" : { "gte" : 10, "lte" : 20 }
}
},
"should" : [
{ "term" : { "tag" : "wow" } },
{ "term" : { "tag" : "elasticsearch" } }
],
"minimum_should_match" : 1,
"boost" : 1.0
}
}
}
Using minimum_should_match
You can use the minimum_should_match
parameter to specify the number or
percentage of should
clauses returned documents must match.
If the bool
query includes at least one should
clause and no must
or
filter
clauses, the default value is 1
.
Otherwise, the default value is 0
.
For other valid values, see the
minimum_should_match
parameter.
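A minimal sketch (the `tag` values are illustrative): because `minimum_should_match` is set explicitly, at least two of the three `should` clauses below must match:
GET /_search
{
  "query": {
    "bool" : {
      "should" : [
        { "term" : { "tag" : "wow" } },
        { "term" : { "tag" : "elasticsearch" } },
        { "term" : { "tag" : "search" } }
      ],
      "minimum_should_match" : 2
    }
  }
}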
Scoring with bool.filter
Queries specified under the filter
element have no effect on scoring — scores are returned as 0
. Scores are only affected by the query that has
been specified. For instance, all three of the following queries return
all documents where the status
field contains the term active
.
This first query assigns a score of 0
to all documents, as no scoring
query has been specified:
GET _search
{
"query": {
"bool": {
"filter": {
"term": {
"status": "active"
}
}
}
}
}
This bool
query has a match_all
query, which assigns a score of 1.0
to
all documents.
GET _search
{
"query": {
"bool": {
"must": {
"match_all": {}
},
"filter": {
"term": {
"status": "active"
}
}
}
}
}
This constant_score
query behaves in exactly the same way as the second example above.
The constant_score
query assigns a score of 1.0
to all documents matched
by the filter.
GET _search
{
"query": {
"constant_score": {
"filter": {
"term": {
"status": "active"
}
}
}
}
}
Using named queries to see which clauses matched
If you need to know which of the clauses in the bool query matched the documents returned from the query, you can use named queries to assign a name to each clause.
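A minimal sketch of named queries (the `_name` values here are illustrative):
GET /_search
{
  "query": {
    "bool" : {
      "should" : [
        { "match" : { "name.first" : { "query" : "shay", "_name" : "first" } } },
        { "match" : { "name.last" :  { "query" : "banon", "_name" : "last" } } }
      ]
    }
  }
}
The names of the clauses that matched are then returned in the `matched_queries` field of each hit.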
Dis Max Query
A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries.
This is useful when searching for a word in multiple fields with different boost factors (so that the fields cannot be combined equivalently into a single search field). We want the primary score to be the one associated with the highest boost, not the sum of the field scores (as Boolean Query would give). If the query is "albino elephant" this ensures that "albino" matching one field and "elephant" matching another gets a higher score than "albino" matching both fields. To get this result, use both Boolean Query and DisjunctionMax Query: for each term a DisjunctionMaxQuery searches for it in each field, while the set of these DisjunctionMaxQuery’s is combined into a BooleanQuery.
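As a sketch of the "albino elephant" scenario just described (the `title` and `body` field names are assumptions):
GET /_search
{
  "query": {
    "dis_max" : {
      "queries" : [
        { "match" : { "title" : "albino elephant" } },
        { "match" : { "body" : "albino elephant" } }
      ],
      "tie_breaker" : 0.3
    }
  }
}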
The tie breaker capability allows results that include the same term in multiple fields to be judged better than results that include this term in only the best of those multiple fields, without confusing this with the better case of two different terms in the multiple fields. The default `tie_breaker` is `0.0`.
This query maps to Lucene DisjunctionMaxQuery
.
GET /_search
{
"query": {
"dis_max" : {
"tie_breaker" : 0.7,
"boost" : 1.2,
"queries" : [
{
"term" : { "age" : 34 }
},
{
"term" : { "age" : 35 }
}
]
}
}
}
Function Score Query
The function_score
allows you to modify the score of documents that are
retrieved by a query. This can be useful if, for example, a score
function is computationally expensive and it is sufficient to compute
the score on a filtered set of documents.
To use `function_score`, the user has to define a query and one or more functions that compute a new score for each document returned by the query.
function_score
can be used with only one function like this:
GET /_search
{
"query": {
"function_score": {
"query": { "match_all": {} },
"boost": "5",
"random_score": {}, (1)
"boost_mode":"multiply"
}
}
}
-
See [score-functions] for a list of supported functions.
Furthermore, several functions can be combined. In this case one can optionally choose to apply the function only if a document matches a given filtering query:
GET /_search
{
"query": {
"function_score": {
"query": { "match_all": {} },
"boost": "5", (1)
"functions": [
{
"filter": { "match": { "test": "bar" } },
"random_score": {}, (2)
"weight": 23
},
{
"filter": { "match": { "test": "cat" } },
"weight": 42
}
],
"max_boost": 42,
"score_mode": "max",
"boost_mode": "multiply",
"min_score" : 42
}
}
}
-
Boost for the whole query.
-
See [score-functions] for a list of supported functions.
Note
|
The scores produced by the filtering query of each function do not matter. |
If no filter is given with a function, this is equivalent to specifying `"match_all": {}`.
First, each document is scored by the defined functions. The parameter
score_mode
specifies how the computed scores are combined:
| score_mode | Description |
|---|---|
| `multiply` | scores are multiplied (default) |
| `sum` | scores are summed |
| `avg` | scores are averaged |
| `first` | the first function that has a matching filter is applied |
| `max` | maximum score is used |
| `min` | minimum score is used |
Because scores can be on different scales (for example, between 0 and 1 for decay functions but arbitrary for field_value_factor
) and also
because sometimes a different impact of functions on the score is desirable, the score of each function can be adjusted with a user defined
weight
. The weight
can be defined per function in the functions
array (example above) and is multiplied with the score computed by
the respective function.
If weight is given without any other function declaration, weight
acts as a function that simply returns the weight
.
In case score_mode
is set to avg
the individual scores will be combined by a weighted average.
For example, if two functions return score 1 and 2 and their respective weights are 3 and 4, then their scores will be combined as
(1*3+2*4)/(3+4)
and not (1*3+2*4)/2
.
The new score can be restricted to not exceed a certain limit by setting
the max_boost
parameter. The default for max_boost
is FLT_MAX.
The newly computed score is combined with the score of the
query. The parameter boost_mode
defines how:
| boost_mode | Description |
|---|---|
| `multiply` | query score and function score are multiplied (default) |
| `replace` | only function score is used, the query score is ignored |
| `sum` | query score and function score are added |
| `avg` | average of query score and function score |
| `max` | max of query score and function score |
| `min` | min of query score and function score |
By default, modifying the score does not change which documents match. To exclude
documents that do not meet a certain score threshold the min_score
parameter can be set to the desired score threshold.
Note
|
For min_score to work, all documents returned by the query need to be scored and then filtered out one by one.
|
The `function_score` query provides several types of score functions:
- `script_score`
- `weight`
- `random_score`
- `field_value_factor`
- decay functions: `gauss`, `linear`, `exp`
Script score
The script_score
function allows you to wrap another query and customize
the scoring of it optionally with a computation derived from other numeric
field values in the doc using a script expression. Here is a
simple sample:
GET /_search
{
"query": {
"function_score": {
"query": {
"match": { "message": "elasticsearch" }
},
"script_score" : {
"script" : {
"source": "Math.log(2 + doc['likes'].value)"
}
}
}
}
}
Important
|
In {es}, all document scores are positive 32-bit floating point numbers. If the `script_score` function produces a score with greater precision, it is converted to the nearest 32-bit float. Similarly, scores must be non-negative. Otherwise, {es} returns an error. |
On top of the different scripting field values and expression, the
_score
script parameter can be used to retrieve the score based on the
wrapped query.
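A minimal sketch, reusing the example above, where the `_score` of the wrapped query is folded into the script's computation:
GET /_search
{
    "query": {
        "function_score": {
            "query": {
                "match": { "message": "elasticsearch" }
            },
            "script_score" : {
                "script" : {
                    "source": "_score * Math.log(2 + doc['likes'].value)"
                }
            }
        }
    }
}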
Scripts compilation is cached for faster execution. If the script has parameters that it needs to take into account, it is preferable to reuse the same script, and provide parameters to it:
GET /_search
{
"query": {
"function_score": {
"query": {
"match": { "message": "elasticsearch" }
},
"script_score" : {
"script" : {
"params": {
"a": 5,
"b": 1.2
},
"source": "params.a / Math.pow(params.b, doc['likes'].value)"
}
}
}
}
}
Note that unlike the `custom_score` query, the score of the query is multiplied with the result of the script scoring. If you wish to inhibit this, set `"boost_mode": "replace"`.
Weight
The `weight` score allows you to multiply the score by the provided `weight`. This can sometimes be desired since the boost value set on specific queries gets normalized, while for this score function it does not. The number value is of type float.
"weight" : number
Random
The random_score
generates scores that are uniformly distributed from 0 up to
but not including 1. By default, it uses the internal Lucene doc ids as a
source of randomness, which is very efficient but unfortunately not
reproducible since documents might be renumbered by merges.
In case you want scores to be reproducible, it is possible to provide a seed
and field
. The final score will then be computed based on this seed, the
minimum value of field
for the considered document and a salt that is computed
based on the index name and shard id so that documents that have the same
value but are stored in different indexes get different scores. Note that
documents that are within the same shard and have the same value for field
will however get the same score, so it is usually desirable to use a field that
has unique values for all documents. A good default choice might be to use the
_seq_no
field, whose only drawback is that scores will change if the document
is updated since update operations also update the value of the _seq_no
field.
Note
|
It was possible to set a seed without setting a field, but this has been
deprecated as this requires loading fielddata on the _id field which consumes
a lot of memory.
|
GET /_search
{
"query": {
"function_score": {
"random_score": {
"seed": 10,
"field": "_seq_no"
}
}
}
}
Field Value factor
The field_value_factor
function allows you to use a field from a document to
influence the score. It’s similar to using the script_score
function, however,
it avoids the overhead of scripting. If used on a multi-valued field, only the
first value of the field is used in calculations.
As an example, imagine you have a document indexed with a numeric `likes` field and wish to influence the score of a document with this field. An example doing so would look like:
GET /_search
{
"query": {
"function_score": {
"field_value_factor": {
"field": "likes",
"factor": 1.2,
"modifier": "sqrt",
"missing": 1
}
}
}
}
Which will translate into the following formula for scoring:
sqrt(1.2 * doc['likes'].value)
There are a number of options for the `field_value_factor` function:

| Option | Description |
|---|---|
| `field` | Field to be extracted from the document. |
| `factor` | Optional factor to multiply the field value with, defaults to `1`. |
| `modifier` | Modifier to apply to the field value, can be one of: `none`, `log`, `log1p`, `log2p`, `ln`, `ln1p`, `ln2p`, `square`, `sqrt`, or `reciprocal`. Defaults to `none`. |

| Modifier | Meaning |
|---|---|
| `none` | Do not apply any multiplier to the field value |
| `log` | Take the common logarithm of the field value |
| `log1p` | Add 1 to the field value and take the common logarithm |
| `log2p` | Add 2 to the field value and take the common logarithm |
| `ln` | Take the natural logarithm of the field value |
| `ln1p` | Add 1 to the field value and take the natural logarithm |
| `ln2p` | Add 2 to the field value and take the natural logarithm |
| `square` | Square the field value (multiply it by itself) |
| `sqrt` | Take the square root of the field value |
| `reciprocal` | Reciprocate the field value, same as `1/x` where `x` is the field's value |
missing
-
Value used if the document doesn’t have that field. The modifier and factor are still applied to it as though it were read from the document.
Keep in mind that taking the log() of 0, or the square root of a negative number is an illegal operation, and an exception will be thrown. Be sure to limit the values of the field with a range filter to avoid this, or use `log1p` and `ln1p`.
Warning
|
Scores produced by the field_value_factor function must be
non-negative, otherwise a deprecation warning will be issued.
|
Decay functions
Decay functions score a document with a function that decays depending on the distance of a numeric field value of the document from a user given origin. This is similar to a range query, but with smooth edges instead of boxes.
To use distance scoring on a query that has numerical fields, the user has to define an `origin` and a `scale` for each field. The `origin` is needed to define the "central point" from which the distance is calculated, and the `scale` to define the rate of decay. The decay function is specified as
"DECAY_FUNCTION": { (1)
"FIELD_NAME": { (2)
"origin": "11, 12",
"scale": "2km",
"offset": "0km",
"decay": 0.33
}
}
-
The
DECAY_FUNCTION
should be one oflinear
,exp
, orgauss
. -
The specified field must be a numeric, date, or geo-point field.
In the above example, the field is a geo_point
and origin can
be provided in geo format. scale
and offset
must be given with a unit in
this case. If your field is a date field, you can set scale
and offset
as
days, weeks, and so on. Example:
GET /_search
{
"query": {
"function_score": {
"gauss": {
"date": {
"origin": "2013-09-17", (1)
"scale": "10d",
"offset": "5d", (2)
"decay" : 0.5 (2)
}
}
}
}
}
-
The date format of the origin depends on the
format
defined in your mapping. If you do not define the origin, the current time is used. -
The
offset
anddecay
parameters are optional.
| Parameter | Description |
|---|---|
| `origin` | The point of origin used for calculating distance. Must be given as a number for numeric fields, a date for date fields and a geo point for geo fields. Required for geo and numeric fields. For date fields the default is `now`. |
| `scale` | Required for all types. Defines the distance from origin + offset at which the computed score will equal the `decay` parameter. For geo fields: can be defined as number+unit (`1km`, `12m`, ...), the default unit is meters. For date fields: can be defined as number+unit (`1h`, `10d`, ...), the default unit is milliseconds. For numeric fields: any number. |
| `offset` | If an `offset` is defined, the decay function will only compute the decay for documents with a distance greater than the defined `offset`. The default is `0`. |
| `decay` | The `decay` parameter defines how documents are scored at the distance given at `scale`. If no `decay` is defined, documents at the distance `scale` will be scored `0.5`. |
In the first example, your documents might represent hotels and contain a geo location field. You want to compute a decay function depending on how far the hotel is from a given location. You might not immediately see what scale to choose for the gauss function, but you can say something like: "At a distance of 2km from the desired location, the score should be reduced to one third." The parameter "scale" will then be adjusted automatically to assure that the score function computes a score of 0.33 for hotels that are 2km away from the desired location.
In the second example, documents with a field value between 2013-09-12 and 2013-09-22 would get a weight of 1.0 and documents which are 15 days from that date a weight of 0.5.
Supported decay functions
The DECAY_FUNCTION
determines the shape of the decay:
gauss
-
Normal decay, computed as:

$$S(doc) = \exp\left(-\frac{\max\left(0, |fieldvalue_{doc} - origin| - offset\right)^2}{2\sigma^2}\right)$$

where $\sigma^2$ is computed to assure that the score takes the value `decay` at distance `scale` from `origin` ± `offset`:

$$\sigma^2 = -\frac{scale^2}{2\ln(decay)}$$

See Normal decay, keyword `gauss` for graphs demonstrating the curve generated by the `gauss` function.
exp
-
Exponential decay, computed as:

$$S(doc) = \exp\left(\lambda \cdot \max\left(0, |fieldvalue_{doc} - origin| - offset\right)\right)$$

where again the parameter $\lambda$ is computed to assure that the score takes the value `decay` at distance `scale` from `origin` ± `offset`:

$$\lambda = \frac{\ln(decay)}{scale}$$

See Exponential decay, keyword `exp` for graphs demonstrating the curve generated by the `exp` function.
linear
-
Linear decay, computed as:

$$S(doc) = \max\left(\frac{s - \max\left(0, |fieldvalue_{doc} - origin| - offset\right)}{s}, 0\right)$$

where again the parameter $s$ is computed to assure that the score takes the value `decay` at distance `scale` from `origin` ± `offset`:

$$s = \frac{scale}{1 - decay}$$

In contrast to the normal and exponential decay, this function actually sets the score to 0 if the field value exceeds twice the user given scale value.
For single functions, the three decay functions together with their parameters can be visualized like this (the field in this example is called "age"):
Multi-valued fields
If a field used for computing the decay contains multiple values, by default the value closest to the origin is chosen for determining the distance. This can be changed by setting `multi_value_mode`:

| multi_value_mode | Description |
|---|---|
| `min` | Distance is the minimum distance |
| `max` | Distance is the maximum distance |
| `avg` | Distance is the average distance |
| `sum` | Distance is the sum of all distances |
Example:
"DECAY_FUNCTION": {
"FIELD_NAME": {
"origin": ...,
"scale": ...
},
"multi_value_mode": "avg"
}
Detailed example
Suppose you are searching for a hotel in a certain town. Your budget is limited. Also, you would like the hotel to be close to the town center, so the farther the hotel is from the desired location the less likely you are to check in.
You would like the query results that match your criterion (for example, "hotel, Nancy, non-smoker") to be scored with respect to distance to the town center and also the price.
Intuitively, you would like to define the town center as the origin and
maybe you are willing to walk 2km to the town center from the hotel.
In this case your origin for the location field is the town center
and the scale is ~2km.
If your budget is low, you would probably prefer something cheap above something expensive. For the price field, the origin would be 0 Euros and the scale depends on how much you are willing to pay, for example 20 Euros.
In this example, the fields might be called "price" for the price of the hotel and "location" for the coordinates of this hotel.
The function for price
in this case would be
"gauss": { (1)
"price": {
"origin": "0",
"scale": "20"
}
}
-
This decay function could also be
linear
orexp
.
and for location
:
"gauss": { (1)
"location": {
"origin": "11, 12",
"scale": "2km"
}
}
-
This decay function could also be
linear
orexp
.
Suppose you want to multiply these two functions on the original score, the request would look like this:
GET /_search
{
"query": {
"function_score": {
"functions": [
{
"gauss": {
"price": {
"origin": "0",
"scale": "20"
}
}
},
{
"gauss": {
"location": {
"origin": "11, 12",
"scale": "2km"
}
}
}
],
"query": {
"match": {
"properties": "balcony"
}
},
"score_mode": "multiply"
}
}
}
Next, we show what the computed score looks like for each of the three possible decay functions.
Normal decay, keyword gauss
When choosing gauss
as the decay function in the above example, the
contour and surface plot of the multiplier looks like this:


Suppose your original search results match three hotels:
- "Backback Nap"
- "Drink n Drive"
- "BnB Bellevue"
"Drink n Drive" is pretty far from your defined location (nearly 2 km) and is not too cheap (about 13 Euros) so it gets a low factor of 0.56. "BnB Bellevue" and "Backback Nap" are both pretty close to the defined location but "BnB Bellevue" is cheaper, so it gets a multiplier of 0.86 whereas "Backback Nap" gets a value of 0.66.
Exponential decay, keyword exp
When choosing exp
as the decay function in the above example, the
contour and surface plot of the multiplier looks like this:


Linear decay, keyword linear
When choosing linear
as the decay function in the above example, the
contour and surface plot of the multiplier looks like this:


Supported fields for decay functions
Only numeric, date, and geo-point fields are supported.
What if a field is missing?
If the numeric field is missing in the document, the function will return 1.
Boosting Query
The boosting
query can be used to effectively demote results that
match a given query. Unlike the "NOT" clause in bool query, this still
selects documents that contain undesirable terms, but reduces their
overall score.
GET /_search
{
"query": {
"boosting" : {
"positive" : {
"term" : {
"field1" : "value1"
}
},
"negative" : {
"term" : {
"field2" : "value2"
}
},
"negative_boost" : 0.2
}
}
}
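To make the arithmetic concrete: under the parameters above, a document matching only the positive query keeps its original relevance score, while a document that also matches the negative query has that score multiplied by the `negative_boost` of `0.2`, e.g. a score of `1.0` becomes `1.0 * 0.2 = 0.2`.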
Joining queries
Performing full SQL-style joins in a distributed system like Elasticsearch is prohibitively expensive. Instead, Elasticsearch offers two forms of join which are designed to scale horizontally.
nested
query-
Documents may contain fields of type
nested
. These fields are used to index arrays of objects, where each object can be queried (with thenested
query) as an independent document. has_child
andhas_parent
queries-
A
join
field relationship can exist between documents within a single index. Thehas_child
query returns parent documents whose child documents match the specified query, while thehas_parent
query returns child documents whose parent document matches the specified query.
Also see the terms-lookup mechanism in the terms
query, which allows you to build a terms
query from values contained in
another document.
Nested Query
Wraps another query to search nested fields.
The nested
query searches nested field objects as if they were indexed as
separate documents. If an object matches the search, the nested
query returns
the root parent document.
Example request
Index setup
To use the nested
query, your index must include a nested field
mapping. For example:
PUT /my_index
{
"mappings": {
"_doc" : {
"properties" : {
"obj1" : {
"type" : "nested"
}
}
}
}
}
Example query
GET /my_index/_search
{
"query": {
"nested" : {
"path" : "obj1",
"query" : {
"bool" : {
"must" : [
{ "match" : {"obj1.name" : "blue"} },
{ "range" : {"obj1.count" : {"gt" : 5}} }
]
}
},
"score_mode" : "avg"
}
}
}
Top-level parameters for nested
path
-
(Required, string) Path to the nested object you wish to search.
query
-
(Required, query object) Query you wish to run on nested objects in the `path`. If an object matches the search, the `nested` query returns the root parent document.

You can search nested fields using dot notation that includes the complete path, such as `obj1.name`.

Multi-level nesting is automatically supported and detected, so an inner `nested` query automatically matches the relevant nesting level, rather than the root, if it exists within another `nested` query.
score_mode
-
(Optional, string) Indicates how scores for matching child objects affect the root parent document’s relevance score. Valid values are:
avg
(Default)-
Use the mean relevance score of all matching child objects.
max
-
Uses the highest relevance score of all matching child objects.
min
-
Uses the lowest relevance score of all matching child objects.
none
-
Do not use the relevance scores of matching child objects. The query assigns parent documents a score of
0
. sum
-
Add together the relevance scores of all matching child objects.
ignore_unmapped
-
(Optional, boolean) Indicates whether to ignore an unmapped
path
and not return any documents instead of an error. Defaults tofalse
.If
false
, {es} returns an error if thepath
is an unmapped field.You can use this parameter to query multiple indices that may not contain the field
path
.
Has Child Query
The `has_child` query accepts a query and the child type to run against, and results in parent documents that have child docs matching the query. Here is an example:
GET /_search
{
"query": {
"has_child" : {
"type" : "blog_tag",
"query" : {
"term" : {
"tag" : "something"
}
}
}
}
}
Note that the has_child
is a slow query compared to other queries in the
query dsl due to the fact that it performs a join. The performance degrades
as the number of matching child documents pointing to unique parent documents
increases. If you care about query performance you should not use this query.
However if you do happen to use this query then use it as little as possible.
Each has_child
query that gets added to a search request can increase query
time significantly.
Scoring capabilities
The has_child
also has scoring support. The
supported score modes are min
, max
, sum
, avg
or none
. The default is
none
and yields the same behaviour as in previous versions. If the
score mode is set to another value than none
, the scores of all the
matching child documents are aggregated into the associated parent
documents. The score type can be specified with the score_mode
field
inside the has_child
query:
GET /_search
{
"query": {
"has_child" : {
"type" : "blog_tag",
"score_mode" : "min",
"query" : {
"term" : {
"tag" : "something"
}
}
}
}
}
Min/Max Children
The has_child
query allows you to specify that a minimum and/or maximum
number of children are required to match for the parent doc to be considered
a match:
GET /_search
{
"query": {
"has_child" : {
"type" : "blog_tag",
"score_mode" : "min",
"min_children": 2, (1)
"max_children": 10, (1)
"query" : {
"term" : {
"tag" : "something"
}
}
}
}
}
-
Both
min_children
andmax_children
are optional.
The min_children
and max_children
parameters can be combined with
the score_mode
parameter.
Ignore Unmapped
When set to true
the ignore_unmapped
option will ignore an unmapped type
and will not match any documents for this query. This can be useful when
querying multiple indexes which might have different mappings. When set to
false
(the default value) the query will throw an exception if the type
is not mapped.
Sorting
Parent documents can’t be sorted by fields in matching child documents via the
regular sort options. If you need to sort parent documents by fields in the child
documents then you should use the function_score
query and then just sort
by _score
.
Sorting blogs by child documents' click_count
field:
GET /_search
{
"query": {
"has_child" : {
"type" : "blog_tag",
"score_mode" : "max",
"query" : {
"function_score" : {
"script_score": {
"script": "_score * doc['click_count'].value"
}
}
}
}
}
}
Has Parent Query
The has_parent
query accepts a query and a parent type. The query is
executed in the parent document space, which is specified by the parent
type. This query returns child documents which associated parents have
matched. For the rest has_parent
query has the same options and works
in the same manner as the has_child
query.
GET /_search
{
"query": {
"has_parent" : {
"parent_type" : "blog",
"query" : {
"term" : {
"tag" : "something"
}
}
}
}
}
Note that the `has_parent` is a slow query compared to other queries in the query dsl due to the fact that it performs a join. The performance degrades as the number of matching parent documents increases. If you care about query performance you should not use this query. However if you do happen to use this query then use it as little as possible. Each `has_parent` query that gets added to a search request can increase query time significantly.
Scoring capabilities
The has_parent
also has scoring support. The default is false
which
ignores the score from the parent document. The score is in this
case equal to the boost on the has_parent
query (Defaults to 1). If
the score is set to true
, then the score of the matching parent
document is aggregated into the child documents belonging to the
matching parent document. The score mode can be specified with the
score
field inside the has_parent
query:
GET /_search
{
"query": {
"has_parent" : {
"parent_type" : "blog",
"score" : true,
"query" : {
"term" : {
"tag" : "something"
}
}
}
}
}
Ignore Unmapped
When set to true
the ignore_unmapped
option will ignore an unmapped type
and will not match any documents for this query. This can be useful when
querying multiple indexes which might have different mappings. When set to
false
(the default value) the query will throw an exception if the type
is not mapped.
Sorting
Child documents can’t be sorted by fields in matching parent documents via the
regular sort options. If you need to sort child documents by fields in the parent
documents then you should use the function_score
query and then just sort
by _score
.
Sorting tags by parent documents' `view_count` field:
GET /_search
{
"query": {
"has_parent" : {
"parent_type" : "blog",
"score" : true,
"query" : {
"function_score" : {
"script_score": {
"script": "_score * doc['view_count'].value"
}
}
}
}
}
}
Parent Id Query
The parent_id
query can be used to find child documents which belong to a particular parent.
Given the following mapping definition:
PUT my_index
{
"mappings": {
"_doc": {
"properties": {
"my_join_field": {
"type": "join",
"relations": {
"my_parent": "my_child"
}
}
}
}
}
}
PUT my_index/_doc/1?refresh
{
"text": "This is a parent document",
"my_join_field": "my_parent"
}
PUT my_index/_doc/2?routing=1&refresh
{
"text": "This is a child document",
"my_join_field": {
"name": "my_child",
"parent": "1"
}
}
GET /my_index/_search
{
"query": {
"parent_id": {
"type": "my_child",
"id": "1"
}
}
}
Parameters
This query has two required parameters (`type` and `id`) and one optional parameter:

| Parameter | Description |
|---|---|
| `type` | The child type name, as specified in the `join` field. |
| `id` | The ID of the parent document. |
| `ignore_unmapped` | (Optional) When set to `true` this will ignore an unmapped `type` and will not match any documents for this query. This can be useful when querying multiple indexes which might have different mappings. When set to `false` (the default value) the query will throw an exception if the `type` is not mapped. |
Geo queries
Elasticsearch supports two types of geo data:
geo_point
fields which support lat/lon pairs, and
geo_shape
fields, which support points,
lines, circles, polygons, multi-polygons, etc.
The queries in this group are:
geo_shape
query-
Finds documents with geo-shapes which either intersect, are contained by, or do not intersect with the specified geo-shape.
geo_bounding_box
query-
Finds documents with geo-points that fall into the specified rectangle.
geo_distance
query-
Finds documents with geo-points within the specified distance of a central point.
geo_polygon
query-
Find documents with geo-points within the specified polygon.
GeoShape Query
Filter documents indexed using the geo_shape
type.
Requires the geo_shape
Mapping.
The geo_shape
query uses the same grid square representation as the
geo_shape
mapping to find documents that have a shape that intersects
with the query shape. It will also use the same Prefix Tree configuration
as defined for the field mapping.
The query supports two ways of defining the query shape, either by providing a whole shape definition, or by referencing the name of a shape pre-indexed in another index. Both formats are defined below with examples.
Inline Shape Definition
Similar to the geo_shape
type, the geo_shape
query uses
GeoJSON to represent shapes.
Given the following index:
PUT /example
{
"mappings": {
"_doc": {
"properties": {
"location": {
"type": "geo_shape"
}
}
}
}
}
POST /example/_doc?refresh
{
"name": "Wind & Wetter, Berlin, Germany",
"location": {
"type": "point",
"coordinates": [13.400544, 52.530286]
}
}
The following query will find the point using Elasticsearch's `envelope` GeoJSON extension:
GET /example/_search
{
"query":{
"bool": {
"must": {
"match_all": {}
},
"filter": {
"geo_shape": {
"location": {
"shape": {
"type": "envelope",
"coordinates" : [[13.0, 53.0], [14.0, 52.0]]
},
"relation": "within"
}
}
}
}
}
}
Pre-Indexed Shape
The query also supports using a shape which has already been indexed in another index and/or index type. This is particularly useful when you have a pre-defined list of shapes which are useful to your application and you want to reference this using a logical name (for example 'New Zealand') rather than having to provide their coordinates each time. In this situation it is only necessary to provide:
-
id
- The ID of the document that contains the pre-indexed shape. -
index
- Name of the index where the pre-indexed shape is. Defaults to 'shapes'. -
type
- Index type where the pre-indexed shape is. -
path
- The field specified as path containing the pre-indexed shape. Defaults to 'shape'. -
routing
- The routing of the shape document if required.
The following is an example of using the Filter with a pre-indexed shape:
PUT /shapes
{
"mappings": {
"_doc": {
"properties": {
"location": {
"type": "geo_shape"
}
}
}
}
}
PUT /shapes/_doc/deu
{
"location": {
"type": "envelope",
"coordinates" : [[13.0, 53.0], [14.0, 52.0]]
}
}
GET /example/_search
{
"query": {
"bool": {
"filter": {
"geo_shape": {
"location": {
"indexed_shape": {
"index": "shapes",
"type": "_doc",
"id": "deu",
"path": "location"
}
}
}
}
}
}
}
Spatial Relations
The geo_shape strategy mapping parameter determines which spatial relation operators may be used at search time.
The following is a complete list of spatial relation operators available:
-
INTERSECTS
- (default) Return all documents whosegeo_shape
field intersects the query geometry. -
DISJOINT
- Return all documents whosegeo_shape
field has nothing in common with the query geometry. -
WITHIN
- Return all documents whosegeo_shape
field is within the query geometry. -
CONTAINS
- Return all documents whosegeo_shape
field contains the query geometry. Note: this is only supported using the `recursive` Prefix Tree Strategy (deprecated in 6.6).
Ignore Unmapped
When set to true
the ignore_unmapped
option will ignore an unmapped field
and will not match any documents for this query. This can be useful when
querying multiple indexes which might have different mappings. When set to
false
(the default value) the query will throw an exception if the field
is not mapped.
Geo Bounding Box Query
A query that allows filtering hits based on a point location using a bounding box. Assuming the following indexed document:
PUT /my_locations
{
"mappings": {
"_doc": {
"properties": {
"pin": {
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
}
}
}
PUT /my_locations/_doc/1
{
"pin" : {
"location" : {
"lat" : 40.12,
"lon" : -71.34
}
}
}
Then the following simple query can be executed with a
geo_bounding_box
filter:
GET /_search
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_bounding_box" : {
"pin.location" : {
"top_left" : {
"lat" : 40.73,
"lon" : -74.1
},
"bottom_right" : {
"lat" : 40.01,
"lon" : -71.12
}
}
}
}
}
}
}
Query Options
| Option | Description |
|---|---|
| `_name` | Optional name field to identify the filter |
| `validation_method` | Set to `IGNORE_MALFORMED` to accept geo points with invalid latitude or longitude, or `COERCE` to additionally try and infer correct coordinates (default is `STRICT`). |
| `type` | Set to one of `indexed` or `memory` to define whether this filter will be executed in memory or indexed. See the Type section below for further details. Default is `memory`. |
Accepted Formats
In much the same way the geo_point type can accept different representations of the geo point, the filter can accept it as well:
Lat Lon As Properties
GET /_search
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_bounding_box" : {
"pin.location" : {
"top_left" : {
"lat" : 40.73,
"lon" : -74.1
},
"bottom_right" : {
"lat" : 40.01,
"lon" : -71.12
}
}
}
}
}
}
}
Lat Lon As Array
Format in `[lon, lat]`. Note the order of lon/lat here, which conforms with GeoJSON.
GET /_search
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_bounding_box" : {
"pin.location" : {
"top_left" : [-74.1, 40.73],
"bottom_right" : [-71.12, 40.01]
}
}
}
}
}
}
Lat Lon As String
Format in lat,lon
.
GET /_search
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_bounding_box" : {
"pin.location" : {
"top_left" : "40.73, -74.1",
"bottom_right" : "40.01, -71.12"
}
}
}
}
}
}
Bounding Box as Well-Known Text (WKT)
GET /_search
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_bounding_box" : {
"pin.location" : {
"wkt" : "BBOX (-74.1, -71.12, 40.73, 40.01)"
}
}
}
}
}
}
Geohash
GET /_search
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_bounding_box" : {
"pin.location" : {
"top_left" : "dr5r9ydj2y73",
"bottom_right" : "drj7teegpus6"
}
}
}
}
}
}
Vertices
The vertices of the bounding box can either be set by top_left
and
bottom_right
or by top_right
and bottom_left
parameters. Moreover, the names `topLeft`, `bottomRight`, `topRight` and `bottomLeft` are supported. Instead of setting the values pairwise, one can use the simple names `top`, `left`, `bottom` and `right` to set the values separately.
GET /_search
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_bounding_box" : {
"pin.location" : {
"top" : 40.73,
"left" : -74.1,
"bottom" : 40.01,
"right" : -71.12
}
}
}
}
}
}
geo_point Type
The filter requires the geo_point
type to be set on the relevant
field.
Multi Location Per Document
The filter can work with multiple locations / points per document. Once a single location / point matches the filter, the document will be included in the results.
Type
The type of the bounding box execution is set to `memory` by default, which means the check whether a document falls within the bounding box is done in memory. In some cases, an `indexed` option will perform faster (but note that the `geo_point` type must have lat and lon indexed in this case). Note, when using the indexed option, multi locations per document field are not supported. Here is an example:
GET /_search
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_bounding_box" : {
"pin.location" : {
"top_left" : {
"lat" : 40.73,
"lon" : -74.1
},
"bottom_right" : {
"lat" : 40.10,
"lon" : -71.12
}
},
"type" : "indexed"
}
}
}
}
}
Ignore Unmapped
When set to true
the ignore_unmapped
option will ignore an unmapped field
and will not match any documents for this query. This can be useful when
querying multiple indexes which might have different mappings. When set to
false
(the default value) the query will throw an exception if the field
is not mapped.
Notes on Precision
Geopoints have limited precision and are always rounded down at index time. At query time, the upper boundaries of the bounding boxes are rounded down, while the lower boundaries are rounded up. As a result, points along the lower bounds (the bottom and left edges of the bounding box) might not make it into the bounding box due to the rounding error. At the same time, points along the upper bounds (the top and right edges) might be selected by the query even if they are located slightly outside the edge. The rounding error should be less than 4.20e-8 degrees on the latitude and less than 8.39e-8 degrees on the longitude, which translates to less than 1cm of error even at the equator.
Geo Distance Query
Filters documents to include only hits that exist within a specific distance from a geo point. Assuming the following mapping and indexed document:
PUT /my_locations
{
"mappings": {
"_doc": {
"properties": {
"pin": {
"properties": {
"location": {
"type": "geo_point"
}
}
}
}
}
}
}
PUT /my_locations/_doc/1
{
"pin" : {
"location" : {
"lat" : 40.12,
"lon" : -71.34
}
}
}
Then the following simple query can be executed with a geo_distance
filter:
GET /my_locations/_search
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_distance" : {
"distance" : "200km",
"pin.location" : {
"lat" : 40,
"lon" : -70
}
}
}
}
}
}
Accepted Formats
In much the same way the geo_point
type can accept different
representations of the geo point, the filter can accept it as well:
Lat Lon As Properties
GET /my_locations/_search
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_distance" : {
"distance" : "12km",
"pin.location" : {
"lat" : 40,
"lon" : -70
}
}
}
}
}
}
Lat Lon As Array
Format in `[lon, lat]`. Note the order of lon/lat here, which conforms with GeoJSON.
GET /my_locations/_search
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_distance" : {
"distance" : "12km",
"pin.location" : [-70, 40]
}
}
}
}
}
Lat Lon As String
Format in lat,lon
.
GET /my_locations/_search
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_distance" : {
"distance" : "12km",
"pin.location" : "40,-70"
}
}
}
}
}
Geohash
GET /my_locations/_search
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_distance" : {
"distance" : "12km",
"pin.location" : "drm3btev3e86"
}
}
}
}
}
Options
The following are options allowed on the filter:
| Option | Description |
|---|---|
| `distance` | The radius of the circle centred on the specified location. Points which fall into this circle are considered to be matches. The `distance` can be specified in various units; see [distance-units]. |
| `distance_type` | How to compute the distance. Can either be `arc` (default), or `plane` (faster, but inaccurate on long distances and close to the poles). |
| `_name` | Optional name field to identify the query |
| `validation_method` | Set to `IGNORE_MALFORMED` to accept geo points with invalid latitude or longitude, or `COERCE` to additionally try and infer correct coordinates (default is `STRICT`). |
geo_point Type
The filter requires the geo_point
type to be set on the relevant
field.
Multi Location Per Document
The `geo_distance` filter can work with multiple locations / points per document. Once a single location / point matches the filter, the document will be included in the results.
Ignore Unmapped
When set to true
the ignore_unmapped
option will ignore an unmapped field
and will not match any documents for this query. This can be useful when
querying multiple indexes which might have different mappings. When set to
false
(the default value) the query will throw an exception if the field
is not mapped.
Geo Polygon Query
A query returning hits that only fall within a polygon of points. Here is an example:
GET /_search
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_polygon" : {
"person.location" : {
"points" : [
{"lat" : 40, "lon" : -70},
{"lat" : 30, "lon" : -80},
{"lat" : 20, "lon" : -90}
]
}
}
}
}
}
}
Query Options
| Option | Description |
|---|---|
| `_name` | Optional name field to identify the filter |
| `validation_method` | Set to `IGNORE_MALFORMED` to accept geo points with invalid latitude or longitude, or `COERCE` to additionally try and infer correct coordinates (default is `STRICT`). |
Allowed Formats
Lat Long as Array
Format as `[lon, lat]`. Note: the order of lon/lat here must conform with GeoJSON.
GET /_search
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_polygon" : {
"person.location" : {
"points" : [
[-70, 40],
[-80, 30],
[-90, 20]
]
}
}
}
}
}
}
Lat Lon as String
Format in lat,lon
.
GET /_search
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_polygon" : {
"person.location" : {
"points" : [
"40, -70",
"30, -80",
"20, -90"
]
}
}
}
}
}
}
Geohash
GET /_search
{
"query": {
"bool" : {
"must" : {
"match_all" : {}
},
"filter" : {
"geo_polygon" : {
"person.location" : {
"points" : [
"drn5x1g8cu2y",
"30, -80",
"20, -90"
]
}
}
}
}
}
}
geo_point Type
The query requires the geo_point
type to be set on the
relevant field.
Ignore Unmapped
When set to true
the ignore_unmapped
option will ignore an unmapped field
and will not match any documents for this query. This can be useful when
querying multiple indexes which might have different mappings. When set to
false
(the default value) the query will throw an exception if the field
is not mapped.
Specialized queries
This group contains queries which do not fit into the other groups:
more_like_this
query-
This query finds documents which are similar to the specified text, document, or collection of documents.
script
query-
This query allows a script to act as a filter. Also see the
function_score
query. percolate
query-
This query finds queries that are stored as documents that match with the specified document.
wrapper
query-
A query that accepts other queries as json or yaml string.
More Like This Query
The More Like This Query finds documents that are "like" a given set of documents. In order to do so, MLT selects a set of representative terms of these input documents, forms a query using these terms, executes the query and returns the results. The user controls the input documents, how the terms should be selected and how the query is formed.
The simplest use case consists of asking for documents that are similar to a provided piece of text. Here, we are asking for all movies that have some text similar to "Once upon a time" in their "title" and in their "description" fields, limiting the number of selected terms to 12.
GET /_search
{
"query": {
"more_like_this" : {
"fields" : ["title", "description"],
"like" : "Once upon a time",
"min_term_freq" : 1,
"max_query_terms" : 12
}
}
}
A more complicated use case consists of mixing texts with documents already existing in the index. In this case, the syntax to specify a document is similar to the one used in the Multi GET API.
GET /_search
{
"query": {
"more_like_this" : {
"fields" : ["title", "description"],
"like" : [
{
"_index" : "imdb",
"_type" : "movies",
"_id" : "1"
},
{
"_index" : "imdb",
"_type" : "movies",
"_id" : "2"
},
"and potentially some more text here as well"
],
"min_term_freq" : 1,
"max_query_terms" : 12
}
}
}
Finally, users can mix some texts, a chosen set of documents but also provide documents not necessarily present in the index. To provide documents not present in the index, the syntax is similar to artificial documents.
GET /_search
{
"query": {
"more_like_this" : {
"fields" : ["name.first", "name.last"],
"like" : [
{
"_index" : "marvel",
"_type" : "quotes",
"doc" : {
"name": {
"first": "Ben",
"last": "Grimm"
},
"_doc": "You got no idea what I'd... what I'd give to be invisible."
}
},
{
"_index" : "marvel",
"_type" : "quotes",
"_id" : "2"
}
],
"min_term_freq" : 1,
"max_query_terms" : 12
}
}
}
How it Works
Suppose we wanted to find all documents similar to a given input document.
Obviously, the input document itself should be its best match for that type of
query. And the reason would be mostly, according to the
Lucene scoring formula,
due to the terms with the highest tf-idf. Therefore, the terms of the input
document that have the highest tf-idf are good representatives of that
document, and could be used within a disjunctive query (or OR
) to retrieve similar
documents. The MLT query simply extracts the text from the input document,
analyzes it, usually using the same analyzer as the field, then selects the
top K terms with the highest tf-idf to form a disjunctive query of these terms.
Important
|
The fields on which to perform MLT must be indexed and of type
`text` or `keyword`. Additionally, when using `like` with documents, either
`_source` must be enabled or the fields must be `stored` or store
`term_vector`. In order to speed up analysis, it could help to store term
vectors at index time.
|
For example, if we wish to perform MLT on the "title" and "tags.raw" fields,
we can explicitly store their term_vector
at index time. We can still
perform MLT on the "description" and "tags" fields, as _source
is enabled by
default, but there will be no speed up on analysis for these fields.
PUT /imdb
{
"mappings": {
"movies": {
"properties": {
"title": {
"type": "text",
"term_vector": "yes"
},
"description": {
"type": "text"
},
"tags": {
"type": "text",
"fields" : {
"raw": {
"type" : "text",
"analyzer": "keyword",
"term_vector" : "yes"
}
}
}
}
}
}
}
Parameters
The only required parameter is `like`; all other parameters have sensible
defaults. There are three types of parameters: one to specify the document
input, one for term selection, and one for query formation.
Document Input Parameters
Option | Description
---|---
`like` | The only required parameter of the MLT query is `like` and follows a versatile syntax, in which the user can specify free form text and/or a single or multiple documents (see examples above). The syntax to specify documents is similar to the one used by the Multi GET API. Additionally, to provide documents not necessarily present in the index, artificial documents are also supported.
`unlike` | The `unlike` parameter is used in conjunction with `like` in order not to select terms found in a chosen set of documents. In other words, we could ask for documents like "Apple", but unlike "cake crumble tree". The syntax is the same as `like`.
`fields` | A list of fields to fetch and analyze the text from. Defaults to the `index.query.default_field` index setting.
`like_text` | The text to find documents like it.
`ids` or `docs` | A list of documents following the same syntax as the Multi GET API.
Term Selection Parameters
Option | Description
---|---
`max_query_terms` | The maximum number of query terms that will be selected. Increasing this value gives greater accuracy at the expense of query execution speed. Defaults to `25`.
`min_term_freq` | The minimum term frequency below which the terms will be ignored from the input document. Defaults to `2`.
`min_doc_freq` | The minimum document frequency below which the terms will be ignored from the input document. Defaults to `5`.
`max_doc_freq` | The maximum document frequency above which the terms will be ignored from the input document. This could be useful in order to ignore highly frequent words such as stop words. Defaults to unbounded (`Integer.MAX_VALUE`).
`min_word_length` | The minimum word length below which the terms will be ignored. The old name `min_word_len` is deprecated. Defaults to `0`.
`max_word_length` | The maximum word length above which the terms will be ignored. The old name `max_word_len` is deprecated. Defaults to unbounded (`0`).
`stop_words` | An array of stop words. Any word in this set is considered "uninteresting" and ignored. If the analyzer allows for stop words, you might want to tell MLT to explicitly ignore them, as for the purposes of document similarity it seems reasonable to assume that "a stop word is never interesting".
`analyzer` | The analyzer that is used to analyze the free form text. Defaults to the analyzer associated with the first field in `fields`.
Query Formation Parameters
Option | Description
---|---
`minimum_should_match` | After the disjunctive query has been formed, this parameter controls the number of terms that must match. The syntax is the same as the minimum should match. (Defaults to `30%`.)
`fail_on_unsupported_field` | Controls whether the query should fail (throw an exception) if any of the specified fields are not of the supported types (`text` or `keyword`). Set to `false` to ignore the field and continue processing. Defaults to `true`.
`boost_terms` | Each term in the formed query could be further boosted by their tf-idf score. This sets the boost factor to use when using this feature. Any positive value activates terms boosting with the given boost factor. Defaults to deactivated (`0`).
`include` | Specifies whether the input documents should also be included in the search results returned. Defaults to `false`.
`boost` | Sets the boost value of the whole query. Defaults to `1.0`.
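To tie these parameters together, here is a hypothetical query that tunes both term selection and query formation; all values are illustrative rather than recommendations:
GET /_search
{
    "query": {
        "more_like_this" : {
            "fields" : ["title", "description"],
            "like" : "Once upon a time",
            "unlike" : "frozen snow",
            "min_term_freq" : 1,
            "min_doc_freq" : 3,
            "max_query_terms" : 25,
            "minimum_should_match" : "30%",
            "include" : false
        }
    }
}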
Script Query
A query that allows scripts to be used as queries. They are typically used in a filter context, for example:
GET /_search
{
"query": {
"bool" : {
"filter" : {
"script" : {
"script" : {
"source": "doc['num1'].value > 1",
"lang": "painless"
}
}
}
}
}
}
Custom Parameters
Scripts are compiled and cached for faster execution. If the same script can be used, just with different parameters provided, it is preferable to use the ability to pass parameters to the script itself, for example:
GET /_search
{
"query": {
"bool" : {
"filter" : {
"script" : {
"script" : {
"source" : "doc['num1'].value > params.param1",
"lang" : "painless",
"params" : {
"param1" : 5
}
}
}
}
}
}
}
Percolate Query
The percolate
query can be used to match queries
stored in an index. The percolate
query itself
contains the document that will be used as query
to match with the stored queries.
Sample Usage
Create an index with two fields:
PUT /my-index
{
"mappings": {
"_doc": {
"properties": {
"message": {
"type": "text"
},
"query": {
"type": "percolator"
}
}
}
}
}
The message
field is the field used to preprocess the document defined in
the percolator
query before it gets indexed into a temporary index.
The query
field is used for indexing the query documents. It will hold a
json object that represents an actual Elasticsearch query. The query
field
has been configured to use the percolator field type. This field
type understands the query dsl and stores the query in such a way that it can be
used later on to match documents defined in the percolate
query.
Register a query in the percolator:
PUT /my-index/_doc/1?refresh
{
"query" : {
"match" : {
"message" : "bonsai tree"
}
}
}
Match a document to the registered percolator queries:
GET /my-index/_search
{
"query" : {
"percolate" : {
"field" : "query",
"document" : {
"message" : "A new bonsai tree in the office"
}
}
}
}
The above request will yield the following response:
{
"took": 13,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped" : 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.5753642,
"hits": [
{ (1)
"_index": "my-index",
"_type": "_doc",
"_id": "1",
"_score": 0.5753642,
"_source": {
"query": {
"match": {
"message": "bonsai tree"
}
}
},
"fields" : {
"_percolator_document_slot" : [0] (2)
}
}
]
}
}
-
The query with id
1
matches our document. -
The
_percolator_document_slot
field indicates which document has matched with this query. Useful when percolating multiple documents simultaneously.
Tip
|
To provide a simple example, this documentation uses one index my-index for both the percolate queries and documents.
This set-up can work well when there are just a few percolate queries registered. However, with heavier usage it is recommended
to store queries and documents in separate indices. Please see How it Works Under the Hood for more details.
|
Parameters
The following parameters are required when percolating a document:
Option | Description
---|---
`field` | The field of type `percolator` that holds the indexed queries. This is a required parameter.
`name` | The suffix to be used for the `_percolator_document_slot` field in case multiple `percolate` queries have been specified. This is an optional parameter.
`document` | The source of the document being percolated.
`documents` | Like the `document` parameter, but accepts multiple documents via a json array.
`document_type` | The type / mapping of the document being percolated. This setting is deprecated and only required for indices created before 6.0.
Instead of specifying the source of the document being percolated, the source can also be retrieved from an already
stored document. The percolate
query will then internally execute a get request to fetch that document.
In that case the document
parameter can be substituted with the following parameters:
index
|
The index the document resides in. This is a required parameter. |
type
|
The type of the document to fetch. This is a required parameter. |
id
|
The id of the document to fetch. This is a required parameter. |
routing
|
Optionally, routing to be used to fetch document to percolate. |
preference
|
Optionally, preference to be used to fetch document to percolate. |
version
|
Optionally, the expected version of the document to be fetched. |
Percolating in a filter context
In case you are not interested in the score, better performance can be expected by wrapping
the percolator query in a bool
query’s filter clause or in a constant_score
query:
GET /my-index/_search
{
"query" : {
"constant_score": {
"filter": {
"percolate" : {
"field" : "query",
"document" : {
"message" : "A new bonsai tree in the office"
}
}
}
}
}
}
At index time terms are extracted from the percolator query and the percolator
can often determine whether a query matches just by looking at those extracted
terms. However, computing scores requires deserializing each matching query
and running it against the percolated document, which is a much more expensive
operation. Hence if computing scores is not required the percolate
query
should be wrapped in a constant_score
query or a bool
query’s filter clause.
Note that the percolate
query never gets cached by the query cache.
Percolating multiple documents
The percolate
query can match multiple documents simultaneously with the indexed percolator queries.
Percolating multiple documents in a single request can improve performance as queries only need to be parsed and
matched once instead of multiple times.
The _percolator_document_slot
field that is being returned with each matched percolator query is important when percolating
multiple documents simultaneously. It indicates which documents matched with a particular percolator query. The numbers
correlate with the slot in the documents
array specified in the percolate
query.
GET /my-index/_search
{
"query" : {
"percolate" : {
"field" : "query",
"documents" : [ (1)
{
"message" : "bonsai tree"
},
{
"message" : "new tree"
},
{
"message" : "the office"
},
{
"message" : "office tree"
}
]
}
}
}
-
The documents array contains 4 documents that are going to be percolated at the same time.
{
"took": 13,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped" : 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1.5606477,
"hits": [
{
"_index": "my-index",
"_type": "_doc",
"_id": "1",
"_score": 1.5606477,
"_source": {
"query": {
"match": {
"message": "bonsai tree"
}
}
},
"fields" : {
"_percolator_document_slot" : [0, 1, 3] (1)
}
}
]
}
}
-
The
_percolator_document_slot
indicates that the first, second and last documents specified in thepercolate
query are matching with this query.
Percolating an Existing Document
In order to percolate a newly indexed document, the percolate
query can be used. Based on the response
from an index request, the _id
and other meta information can be used to immediately percolate the newly added
document.
Example
Based on the previous example.
Index the document we want to percolate:
PUT /my-index/_doc/2
{
"message" : "A new bonsai tree in the office"
}
Index response:
{
"_index": "my-index",
"_type": "_doc",
"_id": "2",
"_version": 1,
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"result": "created",
"_seq_no" : 0,
"_primary_term" : 1
}
Percolating an existing document, using the index response as the basis to build the new search request:
GET /my-index/_search
{
"query" : {
"percolate" : {
"field": "query",
"index" : "my-index",
"type" : "_doc",
"id" : "2",
"version" : 1 (1)
}
}
}
-
The version is optional, but useful in certain cases. We can ensure that we are trying to percolate the document we have just indexed. A change may be made after we have indexed, and if that is the case the search request would fail with a version conflict error.
The search response returned is identical to the one in the previous example.
Percolate query and highlighting
The percolate
query is handled in a special way when it comes to highlighting. The query hits are used
to highlight the document that is provided in the percolate
query, whereas with regular highlighting the query in
the search request is used to highlight the hits.
Example
This example is based on the mapping of the first example.
Save a query:
PUT /my-index/_doc/3?refresh
{
"query" : {
"match" : {
"message" : "brown fox"
}
}
}
Save another query:
PUT /my-index/_doc/4?refresh
{
"query" : {
"match" : {
"message" : "lazy dog"
}
}
}
Execute a search request with the percolate
query and highlighting enabled:
GET /my-index/_search
{
"query" : {
"percolate" : {
"field": "query",
"document" : {
"message" : "The quick brown fox jumps over the lazy dog"
}
}
},
"highlight": {
"fields": {
"message": {}
}
}
}
This will yield the following response.
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped" : 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.5753642,
"hits": [
{
"_index": "my-index",
"_type": "_doc",
"_id": "4",
"_score": 0.5753642,
"_source": {
"query": {
"match": {
"message": "lazy dog"
}
}
},
"highlight": {
"message": [
"The quick brown fox jumps over the <em>lazy</em> <em>dog</em>" (1)
]
},
"fields" : {
"_percolator_document_slot" : [0]
}
},
{
"_index": "my-index",
"_type": "_doc",
"_id": "3",
"_score": 0.5753642,
"_source": {
"query": {
"match": {
"message": "brown fox"
}
}
},
"highlight": {
"message": [
"The quick <em>brown</em> <em>fox</em> jumps over the lazy dog" (1)
]
},
"fields" : {
"_percolator_document_slot" : [0]
}
}
]
}
}
-
The terms from each query have been highlighted in the document.
Instead of the query in the search request highlighting the percolator hits, the percolator queries are highlighting
the document defined in the percolate
query.
When percolating multiple documents at the same time, as in the request below, the highlight response is different:
GET /my-index/_search
{
"query" : {
"percolate" : {
"field": "query",
"documents" : [
{
"message" : "bonsai tree"
},
{
"message" : "new tree"
},
{
"message" : "the office"
},
{
"message" : "office tree"
}
]
}
},
"highlight": {
"fields": {
"message": {}
}
}
}
The slightly different response:
{
"took": 13,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped" : 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1.5606477,
"hits": [
{
"_index": "my-index",
"_type": "_doc",
"_id": "1",
"_score": 1.5606477,
"_source": {
"query": {
"match": {
"message": "bonsai tree"
}
}
},
"fields" : {
"_percolator_document_slot" : [0, 1, 3]
},
"highlight" : { (1)
"0_message" : [
"<em>bonsai</em> <em>tree</em>"
],
"3_message" : [
"office <em>tree</em>"
],
"1_message" : [
"new <em>tree</em>"
]
}
}
]
}
}
-
The highlight fields have been prefixed with the document slot they belong to, in order to know which highlight field belongs to what document.
Specifying multiple percolate queries
It is possible to specify multiple percolate
queries in a single search request:
GET /my-index/_search
{
"query" : {
"bool" : {
"should" : [
{
"percolate" : {
"field" : "query",
"document" : {
"message" : "bonsai tree"
},
"name": "query1" (1)
}
},
{
"percolate" : {
"field" : "query",
"document" : {
"message" : "tulip flower"
},
"name": "query2" (1)
}
}
]
}
}
}
-
The
name
parameter will be used to identify which percolator document slots belong to whatpercolate
query.
The _percolator_document_slot
field name will be suffixed with what is specified in the name
parameter.
If that isn’t specified then the field
parameter will be used, which in this case will result in ambiguity.
The above search request returns a response similar to this:
{
"took": 13,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped" : 0,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.5753642,
"hits": [
{
"_index": "my-index",
"_type": "_doc",
"_id": "1",
"_score": 0.5753642,
"_source": {
"query": {
"match": {
"message": "bonsai tree"
}
}
},
"fields" : {
"_percolator_document_slot_query1" : [0] (1)
}
}
]
}
}
-
The
_percolator_document_slot_query1
percolator slot field indicates that these matched slots are from thepercolate
query with the name
parameter set toquery1
.
How it Works Under the Hood
When indexing a document into an index that has the percolator field type mapping configured, the query part of the document gets parsed into a Lucene query and is stored into the Lucene index. A binary representation of the query gets stored, but also the query’s terms are analyzed and stored into an indexed field.
At search time, the document specified in the request gets parsed into a Lucene document and is stored in an in-memory temporary Lucene index. This in-memory index can just hold this one document and it is optimized for that. After this, a special query is built based on the terms in the in-memory index that selects candidate percolator queries based on their indexed query terms. These queries are then evaluated against the in-memory index to check whether they actually match.
The selection of candidate percolator query matches is an important performance optimization during the execution
of the percolate
query as it can significantly reduce the number of candidate matches the in-memory index needs to
evaluate. The reason the percolate
query can do this is because during indexing of the percolator queries the query
terms are being extracted and indexed with the percolator query. Unfortunately the percolator cannot extract terms from
all queries (for example the wildcard
or geo_shape
query) and as a result of that in certain cases the percolator
can’t do the selecting optimization (for example if an unsupported query is defined in a required clause of a boolean query
or the unsupported query is the only query in the percolator document). These queries are marked by the percolator and
can be found by running the following search:
GET /_search
{
"query": {
"term" : {
"query.extraction_result" : "failed"
}
}
}
Note
|
The above example assumes that there is a query field of type
percolator in the mappings.
|
Given the design of percolation, it often makes sense to use separate indices for the percolate queries and the documents being percolated, as opposed to the single index we use in these examples. There are a few benefits to this approach (a sketch of such a set-up follows the list):
-
Because percolate queries contain a different set of fields from the percolated documents, using two separate indices allows for fields to be stored in a denser, more efficient way.
-
Percolate queries do not scale in the same way as other queries, so percolation performance may benefit from using a different index configuration, like the number of primary shards.
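As a sketch of this set-up (the index name queries is illustrative), note that the percolator index still needs mappings for every field referenced by the stored queries, so that those queries can be parsed at index time:
PUT /queries
{
    "mappings": {
        "_doc": {
            "properties": {
                "query": {
                    "type": "percolator"
                },
                "message": {
                    "type": "text" (1)
                }
            }
        }
    }
}
-
The message mapping is required even though no documents are indexed into this index, because the stored queries that reference message must be parsed against a field mapping.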
Wrapper Query
A query that accepts any other query as base64 encoded string.
GET /_search
{
"query" : {
"wrapper": {
"query" : "eyJ0ZXJtIiA6IHsgInVzZXIiIDogIktpbWNoeSIgfX0=" (1)
}
}
}
-
Base64 encoded string:
{"term" : { "user" : "Kimchy" }}
This query is more useful in the context of the Java high-level REST client or transport client, where it makes it possible to accept queries as json formatted strings. In these cases queries can be specified as a json or yaml formatted string or as a query builder (which is available in the Java high-level REST client).
Span queries
Span queries are low-level positional queries which provide expert control over the order and proximity of the specified terms. These are typically used to implement very specific queries on legal documents or patents.
Setting boost
on inner span queries is deprecated. Compound span queries,
like span_near, only use the list of matching spans of inner span queries in
order to find their own spans, which they then use to produce a score. Scores
are never computed on inner span queries, which is the reason why their boosts
don’t make sense.
Span queries cannot be mixed with non-span queries (with the exception of the span_multi
query).
The queries in this group are:
span_term
query-
The equivalent of the
term
query but for use with other span queries. span_multi
query-
Wraps a
term
,range
,prefix
,wildcard
,regexp
, orfuzzy
query. span_first
query-
Accepts another span query whose matches must appear within the first N positions of the field.
span_near
query-
Accepts multiple span queries whose matches must be within the specified distance of each other, and possibly in the same order.
span_or
query-
Combines multiple span queries — returns documents which match any of the specified queries.
span_not
query-
Wraps another span query, and excludes any documents which match that query.
span_containing
query-
Accepts a list of span queries, but only returns those spans which also match a second span query.
span_within
query-
The result from a single span query is returned as long as its span falls within the spans returned by a list of other span queries.
field_masking_span
query-
Allows queries like
span-near
orspan-or
across different fields.
Span Term Query
Matches spans containing a term. The span term query maps to Lucene
SpanTermQuery
. Here is an example:
GET /_search
{
"query": {
"span_term" : { "user" : "kimchy" }
}
}
A boost can also be associated with the query:
GET /_search
{
"query": {
"span_term" : { "user" : { "value" : "kimchy", "boost" : 2.0 } }
}
}
Or:
GET /_search
{
"query": {
"span_term" : { "user" : { "term" : "kimchy", "boost" : 2.0 } }
}
}
Span Multi Term Query
The span_multi
query allows you to wrap a multi term query
(one of wildcard,
fuzzy, prefix, range or regexp query) as a span query
, so
it can be nested. Example:
GET /_search
{
"query": {
"span_multi":{
"match":{
"prefix" : { "user" : { "value" : "ki" } }
}
}
}
}
A boost can also be associated with the query:
GET /_search
{
"query": {
"span_multi":{
"match":{
"prefix" : { "user" : { "value" : "ki", "boost" : 1.08 } }
}
}
}
}
Warning
|
span_multi queries will hit a "too many clauses" failure if the number of terms that match the query exceeds the
boolean query limit (defaults to 1024). To avoid an unbounded expansion you can set the rewrite method of the multi term query to a top_terms_* rewrite. Or, if you use span_multi on a prefix query only,
you can activate the index_prefixes field option of the text field instead. This will
rewrite any prefix query on the field to a single term query that matches the indexed prefix.
|
Span First Query
Matches spans near the beginning of a field. The span first query maps
to Lucene SpanFirstQuery
. Here is an example:
GET /_search
{
"query": {
"span_first" : {
"match" : {
"span_term" : { "user" : "kimchy" }
},
"end" : 3
}
}
}
The match
clause can be any other span type query. The end
controls
the maximum end position permitted in a match.
Span Near Query
Matches spans which are near one another. One can specify slop, the
maximum number of intervening unmatched positions, as well as whether
matches are required to be in-order. The span near query maps to Lucene
SpanNearQuery
. Here is an example:
GET /_search
{
"query": {
"span_near" : {
"clauses" : [
{ "span_term" : { "field" : "value1" } },
{ "span_term" : { "field" : "value2" } },
{ "span_term" : { "field" : "value3" } }
],
"slop" : 12,
"in_order" : false
}
}
}
The clauses
element is a list of one or more other span type queries
and the slop
controls the maximum number of intervening unmatched
positions permitted.
Span Or Query
Matches the union of its span clauses. The span or query maps to Lucene
SpanOrQuery
. Here is an example:
GET /_search
{
"query": {
"span_or" : {
"clauses" : [
{ "span_term" : { "field" : "value1" } },
{ "span_term" : { "field" : "value2" } },
{ "span_term" : { "field" : "value3" } }
]
}
}
}
The clauses
element is a list of one or more other span type queries.
Span Not Query
Removes matches which overlap with another span query or which are
within x tokens before (controlled by the parameter pre
) or y tokens
after (controlled by the parameter post
) another SpanQuery. The span not
query maps to Lucene SpanNotQuery
. Here is an example:
GET /_search
{
"query": {
"span_not" : {
"include" : {
"span_term" : { "field1" : "hoya" }
},
"exclude" : {
"span_near" : {
"clauses" : [
{ "span_term" : { "field1" : "la" } },
{ "span_term" : { "field1" : "hoya" } }
],
"slop" : 0,
"in_order" : true
}
}
}
}
}
The include
and exclude
clauses can be any span type query. The
include
clause is the span query whose matches are filtered, and the
exclude
clause is the span query whose matches must not overlap those
returned.
In the above example all documents with the term hoya match, except the ones that have 'la' preceding them.
Other top level options:
Option | Description
---|---
`pre` | If set, the amount of tokens before the include span that can’t have overlap with the exclude span. Defaults to 0.
`post` | If set, the amount of tokens after the include span that can’t have overlap with the exclude span. Defaults to 0.
`dist` | If set, the amount of tokens from within the include span that can’t have overlap with the exclude span. Equivalent to setting both `pre` and `post`.
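As a sketch, the pre option could relax the example above so that hoya is excluded only when la occurs within one token before it, rather than requiring an exact adjacent phrase:
GET /_search
{
    "query": {
        "span_not" : {
            "include" : {
                "span_term" : { "field1" : "hoya" }
            },
            "exclude" : {
                "span_term" : { "field1" : "la" }
            },
            "pre" : 1
        }
    }
}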
Span Containing Query
Returns matches which enclose another span query. The span containing
query maps to Lucene SpanContainingQuery
. Here is an example:
GET /_search
{
"query": {
"span_containing" : {
"little" : {
"span_term" : { "field1" : "foo" }
},
"big" : {
"span_near" : {
"clauses" : [
{ "span_term" : { "field1" : "bar" } },
{ "span_term" : { "field1" : "baz" } }
],
"slop" : 5,
"in_order" : true
}
}
}
}
}
The big
and little
clauses can be any span type query. Matching
spans from big
that contain matches from little
are returned.
Span Within Query
Returns matches which are enclosed inside another span query. The span within
query maps to Lucene SpanWithinQuery
. Here is an example:
GET /_search
{
"query": {
"span_within" : {
"little" : {
"span_term" : { "field1" : "foo" }
},
"big" : {
"span_near" : {
"clauses" : [
{ "span_term" : { "field1" : "bar" } },
{ "span_term" : { "field1" : "baz" } }
],
"slop" : 5,
"in_order" : true
}
}
}
}
}
The big
and little
clauses can be any span type query. Matching
spans from little
that are enclosed within big
are returned.
Span Field Masking Query
Wrapper to allow span queries to participate in composite single-field span queries by 'lying' about their search field. The span field masking query maps to Lucene’s SpanFieldMaskingQuery.
This can be used to support queries like span-near
or span-or
across different fields, which is not ordinarily permitted.
Span field masking query is invaluable in conjunction with multi-fields when the same content is indexed with multiple analyzers. For instance, we could index a field with the standard analyzer which breaks text up into words, and again with the english analyzer which stems words into their root form.
Example:
GET /_search
{
"query": {
"span_near": {
"clauses": [
{
"span_term": {
"text": "quick brown"
}
},
{
"field_masking_span": {
"query": {
"span_term": {
"text.stems": "fox"
}
},
"field": "text"
}
}
],
"slop": 5,
"in_order": false
}
}
}
Note: as span field masking query returns the masked field, scoring will be done using the norms of the field name supplied. This may lead to unexpected scoring behaviour.
Minimum Should Match
The minimum_should_match
parameter possible values:

Type | Example | Description
---|---|---
Integer | `3` | Indicates a fixed value regardless of the number of optional clauses.
Negative integer | `-2` | Indicates that the total number of optional clauses, minus this number should be mandatory.
Percentage | `75%` | Indicates that this percent of the total number of optional clauses are necessary. The number computed from the percentage is rounded down and used as the minimum.
Negative percentage | `-25%` | Indicates that this percent of the total number of optional clauses can be missing. The number computed from the percentage is rounded down, before being subtracted from the total to determine the minimum.
Combination | `3<90%` | A positive integer, followed by the less-than symbol, followed by any of the previously mentioned specifiers is a conditional specification. It indicates that if the number of optional clauses is equal to (or less than) the integer, they are all required, but if it’s greater than the integer, the specification applies. In this example: if there are 1 to 3 clauses they are all required, but for 4 or more clauses only 90% are required.
Multiple combinations | `2<-25% 9<-3` | Multiple conditional specifications can be separated by spaces, each one only being valid for numbers greater than the one before it. In this example: if there are 1 or 2 clauses both are required, if there are 3-9 clauses all but 25% are required, and if there are more than 9 clauses, all but three are required.
NOTE:
When dealing with percentages, negative values can be used to get different behavior in edge cases. 75% and -25% mean the same thing when dealing with 4 clauses, but when dealing with 5 clauses 75% means 3 are required, but -25% means 4 are required.
If the calculations based on the specification determine that no optional clauses are needed, the usual rules about BooleanQueries still apply at search time (a BooleanQuery containing no required clauses must still match at least one optional clause).
No matter what number the calculation arrives at, a value greater than the number of optional clauses, or a value less than 1 will never be used. (i.e. no matter how low or how high the result of the calculation is, the minimum number of required matches will never be lower than 1 or greater than the number of clauses.)
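As an illustration, here is a hypothetical bool query using the combination specifier from the table; the tags field and its values are made up. With one or two clauses all would be required, but since there are three, 2<75% yields 75% of 3 rounded down, so a minimum of two clauses must match:
GET /_search
{
    "query": {
        "bool" : {
            "should" : [
                { "term" : { "tags" : "search" } },
                { "term" : { "tags" : "elasticsearch" } },
                { "term" : { "tags" : "lucene" } }
            ],
            "minimum_should_match" : "2<75%"
        }
    }
}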
Multi Term Query Rewrite
Queries that expand into multiple terms, like the
wildcard and
prefix queries, are called
multi term queries and end up going through a process of rewrite. This
also happens with the
query_string query.
All of those queries allow you to control how they will get rewritten using
the rewrite
parameter:
-
`constant_score` (default): A rewrite method that performs like `constant_score_boolean` when there are few matching terms and otherwise visits all matching terms in sequence and marks documents for that term. Matching documents are assigned a constant score equal to the query’s boost.
-
`scoring_boolean`: A rewrite method that first translates each term into a should clause in a boolean query, and keeps the scores as computed by the query. Note that typically such scores are meaningless to the user, and require non-trivial CPU to compute, so it’s almost always better to use `constant_score`. This rewrite method will hit a too many clauses failure if it exceeds the boolean query limit (defaults to `1024`).
-
`constant_score_boolean`: Similar to `scoring_boolean` except scores are not computed. Instead, each matching document receives a constant score equal to the query’s boost. This rewrite method will hit a too many clauses failure if it exceeds the boolean query limit (defaults to `1024`).
-
`top_terms_N`: A rewrite method that first translates each term into a should clause in a boolean query, and keeps the scores as computed by the query. This rewrite method only uses the top scoring terms so it will not overflow the boolean max clause count. The `N` controls the size of the top scoring terms to use.
-
`top_terms_boost_N`: A rewrite method that first translates each term into a should clause in a boolean query, but the scores are only computed as the boost. This rewrite method only uses the top scoring terms so it will not overflow the boolean max clause count. The `N` controls the size of the top scoring terms to use.
-
`top_terms_blended_freqs_N`: A rewrite method that first translates each term into a should clause in a boolean query, but all term queries compute scores as if they had the same frequency. In practice the frequency which is used is the maximum frequency of all matching terms. This rewrite method only uses the top scoring terms so it will not overflow the boolean max clause count. The `N` controls the size of the top scoring terms to use.
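For example, a hypothetical wildcard query (the user field and pattern are made up) could cap its expansion with a top_terms_* rewrite like this:
GET /_search
{
    "query": {
        "wildcard" : {
            "user" : {
                "value" : "ki*y",
                "rewrite" : "top_terms_10"
            }
        }
    }
}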
controls the size of the top scoring terms to use.