"Fossies" - the Fresh Open Source Software Archive

Member "elasticsearch-6.8.23/docs/reference/scripting.asciidoc" (29 Dec 2021, 2158 Bytes) of package /linux/www/elasticsearch-6.8.23-src.tar.gz:



How to use scripts

Wherever scripting is supported in the Elasticsearch API, the syntax follows the same pattern:

  "script": {
    "lang":   "...",  (1)
    "source" | "id": "...", (2)
    "params": { ... } (3)
  }
  1. The language the script is written in, which defaults to painless.

  2. The script itself which may be specified as source for an inline script or id for a stored script.

  3. Any named parameters that should be passed into the script.

For example, the following script is used in a search request to return a scripted field:

PUT my_index/_doc/1
{
  "my_field": 5
}

GET my_index/_search
{
  "script_fields": {
    "my_doubled_field": {
      "script": {
        "lang":   "expression",
        "source": "doc['my_field'] * multiplier",
        "params": {
          "multiplier": 2
        }
      }
    }
  }
}

Script parameters

lang

Specifies the language the script is written in. Defaults to painless.

source, id

Specifies the source of the script. An inline script is specified with source, as in the example above. A stored script is specified with id and is retrieved from the cluster state (see Stored Scripts).

params

Specifies any named parameters that are passed into the script as variables.

Important
Prefer parameters

The first time Elasticsearch sees a new script, it compiles it and stores the compiled version in a cache. Compilation can be a heavy process.

If you need to pass variables into the script, you should pass them in as named params instead of hard-coding values into the script itself. For example, if you want to be able to multiply a field value by different multipliers, don’t hard-code the multiplier into the script:

  "source": "doc['my_field'] * 2"

Instead, pass it in as a named parameter:

  "source": "doc['my_field'] * multiplier",
  "params": {
    "multiplier": 2
  }

The first version has to be recompiled every time the multiplier changes. The second version is only compiled once.

If you compile too many unique scripts within a small amount of time, Elasticsearch will reject the new dynamic scripts with a circuit_breaking_exception error. By default, up to 15 inline scripts per minute will be compiled. You can change this setting dynamically by setting script.max_compilations_rate.
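The effect of hard-coded values on the compilation cache can be sketched with a toy model. This is not Elasticsearch's actual cache: compile_script and compiled_cache are hypothetical stand-ins, and the only assumption is that compiled scripts are looked up by language and source text.

```python
# Toy model of a compiled-script cache keyed on (lang, source).
# All names here are hypothetical; this is not Elasticsearch code.
compiled_cache = {}
compilations = 0


def compile_script(lang, source):
    """Return the cached entry for a script, counting real compilations."""
    global compilations
    key = (lang, source)
    if key not in compiled_cache:
        compilations += 1  # a cache miss stands for an expensive compile
        compiled_cache[key] = "compiled:" + source
    return compiled_cache[key]


# Hard-coded multiplier: every distinct value yields a new source string.
for m in (2, 3, 4):
    compile_script("expression", "doc['my_field'] * %d" % m)
hard_coded = compilations

# Named parameter: the source text never changes, only params would.
compiled_cache.clear()
compilations = 0
for m in (2, 3, 4):
    compile_script("expression", "doc['my_field'] * multiplier")
parameterized = compilations

print(hard_coded, parameterized)  # prints: 3 1
```

In this model the hard-coded variant compiles once per multiplier, while the parameterized variant compiles once in total, which is the behavior the paragraph above describes.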

Short script form

A short script form can be used for brevity. In the short form, script is represented by a string instead of an object. This string contains the source of the script.

Short form:

  "script": "ctx._source.likes++"

The same script in the normal form:

  "script": {
    "source": "ctx._source.likes++"
  }
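Client code that accepts either form may want to normalize it first. The following is a minimal sketch; normalize_script is a hypothetical helper, not part of any Elasticsearch client.

```python
def normalize_script(script):
    """Expand the short string form of a script into the object form."""
    if isinstance(script, str):
        # Short form: the string is the script source.
        return {"source": script}
    # Normal form: copy so the caller's dict is not modified later.
    return dict(script)


short_form = "ctx._source.likes++"
normal_form = {"source": "ctx._source.likes++"}

print(normalize_script(short_form) == normalize_script(normal_form))
```

Both inputs normalize to the same object, so later code only has to handle one shape.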

Stored scripts

Scripts may be stored in and retrieved from the cluster state using the _scripts endpoint.

Request examples

The following are examples of using a stored script that lives at /_scripts/{id}.

First, create the script called calculate-score in the cluster state:

POST _scripts/calculate-score
{
  "script": {
    "lang": "painless",
    "source": "Math.log(_score * 2) + params.my_modifier"
  }
}

This same script can be retrieved with:

GET _scripts/calculate-score

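Stored scripts can be used by specifying the id parameter. Because calculate-score returns a number and reads _score, it is used here as a script_score function rather than in a script query, which expects a boolean result:

GET _search
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "script_score": {
        "script": {
          "id": "calculate-score",
          "params": {
            "my_modifier": 2
          }
        }
      }
    }
  }
}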

And deleted with:

DELETE _scripts/calculate-score
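From application code, the same create, retrieve, and delete calls are plain HTTP requests. The sketch below only builds the requests with Python's standard library and does not send them; BASE_URL (a local, unsecured node) and the stored_script_request helper are assumptions made for illustration.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: a local, unsecured node


def stored_script_request(method, script_id, body=None):
    """Build (but do not send) a request against /_scripts/{id}."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url="%s/_scripts/%s" % (BASE_URL, script_id),
        data=data,
        headers={"Content-Type": "application/json"},
        method=method,
    )


create = stored_script_request("POST", "calculate-score", {
    "script": {
        "lang": "painless",
        "source": "Math.log(_score * 2) + params.my_modifier",
    }
})
fetch = stored_script_request("GET", "calculate-score")
delete = stored_script_request("DELETE", "calculate-score")

for req in (create, fetch, delete):
    print(req.get_method(), req.full_url)
```

Passing any of these objects to urllib.request.urlopen would send it to a running cluster.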

Search templates

You can also use the _scripts API to store search templates. Search templates save specific search requests with placeholder values, called template parameters.

You can use stored search templates to run searches without writing out the entire query. Just provide the stored template’s ID and the template parameters. This is useful when you want to run a commonly used query quickly and without mistakes.

Search templates use the mustache templating language. See the search template documentation for more information and examples.

Script caching

All scripts are cached by default so that they only need to be recompiled when updates occur. By default, scripts do not have a time-based expiration, but you can change this behavior by using the script.cache.expire setting. You can configure the size of this cache by using the script.cache.max_size setting. By default, the cache size is 100.

Note
The size of scripts is limited to 65,535 bytes. You can raise this soft limit with the script.max_size_in_bytes setting, but if scripts are really large, consider a native script engine instead.

Accessing document fields and special variables

Depending on where a script is used, it will have access to certain special variables and document fields.

Update scripts

A script used in the update, update-by-query, or reindex API will have access to the ctx variable which exposes:

ctx._source

Access to the document _source field.

ctx.op

The operation that should be applied to the document: index or delete.

ctx._index etc

Access to document meta-fields, some of which may be read-only.
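As an illustration of how these pieces fit together, the following sketch builds an update request body whose Painless script either changes ctx._source or sets ctx.op. The likes field and the limit parameter are hypothetical, and the body is only printed, not sent.

```python
import json

# Hypothetical update body: delete the document once "likes" reaches a
# limit, otherwise increment the counter in ctx._source.
update_body = {
    "script": {
        "lang": "painless",
        "source": (
            "if (ctx._source.likes >= params.limit) { ctx.op = 'delete' } "
            "else { ctx._source.likes++ }"
        ),
        # The limit is a named parameter, so changing it does not
        # change the script source.
        "params": {"limit": 100},
    }
}

print(json.dumps(update_body, indent=2))
```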

Search and aggregation scripts

With the exception of script fields which are executed once per search hit, scripts used in search and aggregations will be executed once for every document which might match a query or an aggregation. Depending on how many documents you have, this could mean millions or billions of executions: these scripts need to be fast!

Field values can be accessed from a script using doc-values, the _source field, or stored fields, each of which is explained below.

Accessing the score of a document within a script

Scripts used in the function_score query, in script-based sorting, or in aggregations have access to the _score variable which represents the current relevance score of a document.

Here’s an example of using a script in a function_score query to alter the relevance _score of each document:

PUT my_index/_doc/1?refresh
{
  "text": "quick brown fox",
  "popularity": 1
}

PUT my_index/_doc/2?refresh
{
  "text": "quick fox",
  "popularity": 5
}

GET my_index/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "text": "quick brown fox"
        }
      },
      "script_score": {
        "script": {
          "lang": "expression",
          "source": "_score * doc['popularity']"
        }
      }
    }
  }
}

Doc values

By far the fastest and most efficient way to access a field value from a script is to use the doc['field_name'] syntax, which retrieves the field value from doc values. Doc values are a columnar field value store, enabled by default on all fields except for analyzed text fields.

PUT my_index/_doc/1?refresh
{
  "cost_price": 100
}

GET my_index/_search
{
  "script_fields": {
    "sales_price": {
      "script": {
        "lang":   "expression",
        "source": "doc['cost_price'] * markup",
        "params": {
          "markup": 0.2
        }
      }
    }
  }
}

Doc values can only return "simple" field values like numbers, dates, geo-points, terms, etc., or arrays of these values if the field is multi-valued. They cannot return JSON objects.

Note
Missing fields

Accessing doc['field'] throws an error if field is missing from the mappings. In Painless, you can first check with doc.containsKey('field') to guard access to the doc map. Unfortunately, there is no way to check for the existence of the field in mappings in an expression script.
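A guarded access might look like the sketch below, which builds a search body in Python and prints it. The script field name safe_cost and the fallback parameter are hypothetical; cost_price is the field used in the example above.

```python
import json

# Hypothetical search body: return cost_price when the field is mapped,
# otherwise fall back to a value passed in as a named parameter.
search_body = {
    "script_fields": {
        "safe_cost": {
            "script": {
                "lang": "painless",
                "source": (
                    "doc.containsKey('cost_price') "
                    "? doc['cost_price'].value : params.fallback"
                ),
                "params": {"fallback": 0},
            }
        }
    }
}

print(json.dumps(search_body, indent=2))
```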

Note
Doc values and text fields

The doc['field'] syntax can also be used for analyzed text fields if fielddata is enabled, but BEWARE: enabling fielddata on a text field requires loading all of the terms into the JVM heap, which can be very expensive both in terms of memory and CPU. It seldom makes sense to access text fields from scripts.

The document _source

The document _source can be accessed using the _source.field_name syntax (in Painless, through params, as in params._source.field_name). The _source is loaded as a map-of-maps, so properties within object fields can be accessed as, for example, _source.name.first.

Important
Prefer doc-values to _source

Accessing the _source field is much slower than using doc-values. The _source field is optimised for returning several fields per result, while doc values are optimised for accessing the value of a specific field in many documents.

It makes sense to use _source or stored fields when generating a script field for the top ten hits from a search result but, for other search and aggregation use cases, always prefer using doc values.

For instance:

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "first_name": {
          "type": "text"
        },
        "last_name": {
          "type": "text"
        }
      }
    }
  }
}

PUT my_index/_doc/1?refresh
{
  "first_name": "Barry",
  "last_name": "White"
}

GET my_index/_search
{
  "script_fields": {
    "full_name": {
      "script": {
        "lang": "painless",
        "source": "params._source.first_name + ' ' + params._source.last_name"
      }
    }
  }
}

Stored fields

Stored fields — fields explicitly marked as "store": true in the mapping — can be accessed using the _fields['field_name'].value or _fields['field_name'] syntax:

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "full_name": {
          "type": "text",
          "store": true
        },
        "title": {
          "type": "text",
          "store": true
        }
      }
    }
  }
}

PUT my_index/_doc/1?refresh
{
  "full_name": "Alice Ball",
  "title": "Professor"
}

GET my_index/_search
{
  "script_fields": {
    "name_with_title": {
      "script": {
        "lang": "painless",
        "source": "params._fields['title'].value + ' ' + params._fields['full_name'].value"
      }
    }
  }
}

Tip
Stored vs _source

The _source field is just a special stored field, so the performance is similar to that of other stored fields. The _source provides access to the original document body that was indexed (including the ability to distinguish null values from empty fields, single-value arrays from plain scalars, etc).

The only time it really makes sense to use stored fields instead of the _source field is when the _source is very large and it is less costly to access a few small stored fields instead of the entire _source.

Scripting and security

While Elasticsearch contributors make every effort to prevent scripts from running amok, security is something best done in layers because all software has bugs and it is important to minimize the risk of failure in any security layer. Below are rules of thumb for how to keep Elasticsearch from being a vulnerability.

Do not run as root

First and foremost, never run Elasticsearch as the root user as this would allow any successful effort to circumvent the other security layers to do anything on your server. Elasticsearch will refuse to start if it detects that it is running as root but this is so important that it is worth double and triple checking.

Do not expose Elasticsearch directly to users

Do not expose Elasticsearch directly to users; instead, have an application make requests on behalf of users. If this is not possible, have an application sanitize requests from users. If that is not possible, have some mechanism to track which users did what. Understand that it is quite possible to write a _search that overwhelms Elasticsearch and brings down the cluster. All such searches should be considered bugs, and the Elasticsearch contributors make an effort to prevent them, but they are still possible.

Do not expose Elasticsearch directly to the Internet

Do not expose Elasticsearch to the Internet; instead, have an application make requests on behalf of the Internet. Do not entertain the thought of having an application "sanitize" requests to Elasticsearch. Understand that it is possible for a sufficiently determined malicious user to write searches that overwhelm the Elasticsearch cluster and bring it down. For example:

Good:

Bad:

  • Users can write arbitrary scripts, queries, _search requests.

  • User actions make documents with structure defined by users.

Other security layers

In addition to user privileges and script sandboxing, Elasticsearch uses the Java Security Manager and native security tools as additional layers of security.

As part of its startup sequence Elasticsearch enables the Java Security Manager which limits the actions that can be taken by portions of the code. Painless uses this to limit the actions that generated Painless scripts can take, preventing them from being able to do things like write files and listen to sockets.

Elasticsearch uses seccomp in Linux, Seatbelt in macOS, and ActiveProcessLimit on Windows to prevent Elasticsearch from forking or executing other processes.

Below we describe the security settings for scripts and how you can change them from the defaults described above. You should be very, very careful when allowing more than the defaults. Any extra permission weakens the total security of the Elasticsearch deployment.

Allowed script types setting

By default all script types are allowed to be executed. This can be modified using the setting script.allowed_types. Only the types specified as part of the setting will be allowed to be executed. To specify no types are allowed, set script.allowed_types to be none.

script.allowed_types: inline (1)
  1. This will allow only inline scripts to be executed but not stored scripts (or any other types).

Allowed script contexts setting

By default all script contexts are allowed to be executed. This can be modified using the setting script.allowed_contexts. Only the contexts specified as part of the setting will be allowed to be executed. To specify no contexts are allowed, set script.allowed_contexts to be none.

script.allowed_contexts: search, update (1)
  1. This will allow only search and update scripts to be executed but not aggs or plugin scripts (or any other contexts).

Painless scripting language

Painless is a simple, secure scripting language designed specifically for use with Elasticsearch. It is the default scripting language for Elasticsearch and can safely be used for inline and stored scripts. For a detailed description of the Painless syntax and language features, see the Painless Language Specification.

You can use Painless anywhere scripts can be used in Elasticsearch. Painless provides:

  • Fast performance: Painless scripts run several times faster than the alternatives.

  • Safety: Fine-grained whitelist with method call/field granularity. See the Painless API Reference for a complete list of available classes and methods.

  • Optional typing: Variables and parameters can use explicit types or the dynamic def type.

  • Syntax: Extends Java’s syntax to provide Groovy-style scripting language features that make scripts easier to write.

  • Optimizations: Designed specifically for Elasticsearch scripting.

Ready to start scripting with Painless? See Getting Started with Painless in the guide to the Painless Scripting Language.

Lucene expressions language

Lucene’s expressions compile a JavaScript expression to bytecode. They are designed for high-performance custom ranking and sorting functions and are enabled for inline and stored scripting by default.

Performance

Expressions were designed to have competitive performance with custom Lucene code. This performance is due to having low per-document overhead as opposed to other scripting engines: expressions do more "up-front".

This allows for very fast execution, even faster than if you had written a native script.

Syntax

Expressions support a subset of JavaScript syntax: a single expression.

See the expressions module documentation for details on what operators and functions are available.

Expression scripts can access the following variables:

  • Document fields, e.g. doc['myfield'].value

  • Variables and methods that the field supports, e.g. doc['myfield'].empty

  • Parameters passed into the script, e.g. mymodifier

  • The current document’s score, _score (only available when used in a script_score)

You can use expression scripts for script_score, script_fields, sort scripts, and numeric aggregation scripts; simply set the lang parameter to expression.

Numeric field API

Expression                    Description

doc['field_name'].value       The value of the field, as a double.
doc['field_name'].empty       A boolean indicating if the field has no values within the doc.
doc['field_name'].length      The number of values in this document.
doc['field_name'].min()       The minimum value of the field in this document.
doc['field_name'].max()       The maximum value of the field in this document.
doc['field_name'].median()    The median value of the field in this document.
doc['field_name'].avg()       The average of the values in this document.
doc['field_name'].sum()       The sum of the values in this document.

When a document is missing the field completely, by default the value will be treated as 0. You can treat it as another value instead, e.g. doc['myfield'].empty ? 100 : doc['myfield'].value

When a document has multiple values for the field, by default the minimum value is returned. You can choose a different value instead, e.g. doc['myfield'].sum().
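These defaults can be modeled in a few lines of Python. The functions below are a hypothetical model of the behavior described above (missing fields read as 0, multi-valued fields read as their minimum), not the expression engine itself.

```python
def expression_empty(values):
    """Model of doc['myfield'].empty: true when the doc has no values."""
    return len(values) == 0


def expression_value(values):
    """Model of doc['myfield'].value: 0 if missing, else the minimum."""
    return float(min(values)) if values else 0.0


missing = []            # document without the field
multi = [3.0, 1.0, 2.0]  # multi-valued field

# Default behavior.
print(expression_value(missing))
print(expression_value(multi))

# doc['myfield'].empty ? 100 : doc['myfield'].value
print(100.0 if expression_empty(missing) else expression_value(missing))

# doc['myfield'].sum() picks a different aggregate for multiple values.
print(sum(multi))
```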

Boolean fields are exposed as numerics, with true mapped to 1 and false mapped to 0. For example: doc['on_sale'].value ? doc['price'].value * 0.5 : doc['price'].value

Date field API

Date fields are treated as the number of milliseconds since January 1, 1970 and support the numeric field API above, plus access to some date-specific fields:

Expression                               Description

doc['field_name'].date.centuryOfEra      Century (1-2920000)
doc['field_name'].date.dayOfMonth        Day (1-31), e.g. 1 for the first of the month.
doc['field_name'].date.dayOfWeek         Day of the week (1-7), e.g. 1 for Monday.
doc['field_name'].date.dayOfYear         Day of the year, e.g. 1 for January 1.
doc['field_name'].date.era               Era: 0 for BC, 1 for AD.
doc['field_name'].date.hourOfDay         Hour (0-23).
doc['field_name'].date.millisOfDay       Milliseconds within the day (0-86399999).
doc['field_name'].date.millisOfSecond    Milliseconds within the second (0-999).
doc['field_name'].date.minuteOfDay       Minute within the day (0-1439).
doc['field_name'].date.minuteOfHour      Minute within the hour (0-59).
doc['field_name'].date.monthOfYear       Month within the year (1-12), e.g. 1 for January.
doc['field_name'].date.secondOfDay       Second within the day (0-86399).
doc['field_name'].date.secondOfMinute    Second within the minute (0-59).
doc['field_name'].date.year              Year (-292000000 - 292000000).
doc['field_name'].date.yearOfCentury     Year within the century (1-100).
doc['field_name'].date.yearOfEra         Year within the era (1-292000000).

The following example shows the difference in years between the date fields date0 and date1:

doc['date1'].date.year - doc['date0'].date.year

geo_point field API

Expression                 Description

doc['field_name'].empty    A boolean indicating if the field has no values within the doc.
doc['field_name'].lat      The latitude of the geo point.
doc['field_name'].lon      The longitude of the geo point.

The following example computes distance in kilometers from Washington, DC:

haversin(38.9072, -77.0369, doc['field_name'].lat, doc['field_name'].lon)

In this example the coordinates could have been passed as parameters to the script, e.g. based on geolocation of the user.
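For reference, the great-circle distance that a haversine function computes can be sketched in Python as follows. This is a standalone illustration, not the Lucene implementation: the Earth radius constant is an assumption and may differ slightly from the one Lucene uses, and the argument order (lat1, lon1, lat2, lon2) is assumed to mirror the call above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius in kilometers


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine formula: a is the squared half-chord length between points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Washington, DC to New York City; west longitudes are negative.
dc_to_nyc = haversine_km(38.9072, -77.0369, 40.7128, -74.0060)
print(round(dc_to_nyc))
```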

Limitations

There are a few limitations relative to other script languages:

  • Only numeric, boolean, date, and geo_point fields may be accessed

  • Stored fields are not available

Advanced scripts using script engines

A ScriptEngine is a backend for implementing a scripting language. It may also be used to write scripts that need to use advanced internals of scripting, for example, a script that uses term frequencies while scoring.

The plugin documentation has more information on how to write a plugin so that Elasticsearch will properly load it. To register the ScriptEngine, your plugin should implement the ScriptPlugin interface and override the getScriptEngine(Settings settings) method.

The following is an example of a custom ScriptEngine which uses the language name expert_scripts. It implements a single script called pure_df which may be used as a search script to override each document’s score as the document frequency of a provided term.

include-tagged::{plugins-examples-dir}/script-expert-scoring/src/main/java/org/elasticsearch/example/expertscript/ExpertScriptPlugin.java[expert_engine]

You can execute the script by specifying its lang as expert_scripts, and the name of the script as the script source:

POST /_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "body": "foo"
        }
      },
      "functions": [
        {
          "script_score": {
            "script": {
              "source": "pure_df",
              "lang": "expert_scripts",
              "params": {
                "field": "body",
                "term": "foo"
              }
            }
          }
        }
      ]
    }
  }
}