"Fossies" - the Fresh Open Source Software Archive

Member "elasticsearch-6.8.23/docs/reference/indices.asciidoc" (29 Dec 2021, 2096 Bytes) of package /linux/www/elasticsearch-6.8.23-src.tar.gz:


As a special service "Fossies" has tried to format the requested source page into HTML format (assuming AsciiDoc format). Alternatively you can here view or download the uninterpreted source code file. A member file download can also be achieved by clicking within a package contents listing on the according byte size field.

Create Index

The Create Index API is used to manually create an index in Elasticsearch. All documents in Elasticsearch are stored inside of one index or another.

The most basic command is the following:

PUT twitter

This creates an index named twitter with all default settings.

Note
Index name limitations

There are several limitations to what you can name your index. The complete list of limitations is:

  • Lowercase only

  • Cannot include \, /, *, ?, ", <, >, |, ` ` (space character), ,, #

  • Indices prior to 7.0 could contain a colon (:), but that’s been deprecated and won’t be supported in 7.0+

  • Cannot start with -, _, +

  • Cannot be . or ..

  • Cannot be longer than 255 bytes (note it is bytes, so multi-byte characters will count towards the 255 limit faster)

Index Settings

Each index created can have specific settings associated with it, defined in the body:

PUT twitter
{
    "settings" : {
        "index" : {
            "number_of_shards" : 3, (1)
            "number_of_replicas" : 2 (2)
        }
    }
}
  1. Default for number_of_shards is 5

  2. Default for number_of_replicas is 1 (ie one replica for each primary shard)

or, more simply:

PUT twitter
{
    "settings" : {
        "number_of_shards" : 3,
        "number_of_replicas" : 2
    }
}
Note
You do not have to explicitly specify the index section inside the settings section.

For more information regarding all the different index level settings that can be set when creating an index, please check the index modules section.

Mappings

The create index API allows you to provide a type mapping:

PUT test
{
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "_doc" : {
            "properties" : {
                "field1" : { "type" : "text" }
            }
        }
    }
}

Aliases

The create index API also allows you to provide a set of aliases:

PUT test
{
    "aliases" : {
        "alias_1" : {},
        "alias_2" : {
            "filter" : {
                "term" : {"user" : "kimchy" }
            },
            "routing" : "kimchy"
        }
    }
}

Wait For Active Shards

By default, index creation will only return a response to the client when the primary copies of each shard have been started, or the request times out. The index creation response will indicate what happened:

{
    "acknowledged": true,
    "shards_acknowledged": true,
    "index": "test"
}

acknowledged indicates whether the index was successfully created in the cluster, while shards_acknowledged indicates whether the requisite number of shard copies were started for each shard in the index before timing out. Note that it is still possible for either acknowledged or shards_acknowledged to be false even though the index creation was successful. These values simply indicate whether the operation completed before the timeout. If acknowledged is false, then we timed out before the cluster state was updated with the newly created index, but it probably will be created sometime soon. If shards_acknowledged is false, then we timed out before the requisite number of shards were started (by default just the primaries), even if the cluster state was successfully updated to reflect the newly created index (i.e. acknowledged=true).

We can change the default of only waiting for the primary shards to start through the index setting index.write.wait_for_active_shards (note that changing this setting will also affect the wait_for_active_shards value on all subsequent write operations):

PUT test
{
    "settings": {
        "index.write.wait_for_active_shards": "2"
    }
}

or through the request parameter wait_for_active_shards:

PUT test?wait_for_active_shards=2

A detailed explanation of wait_for_active_shards and its possible values can be found here.

Skipping types

Types are being removed from Elasticsearch: in 7.0, the mappings element will no longer take the type name as a top-level key by default. You can already opt in for this behavior by setting include_type_name=false and putting mappings directly under mappings in the index creation call, without specifying a type name.

Here is an example:

PUT test?include_type_name=false
{
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword"
      }
    }
  }
}

Delete Index

Use the delete index API to delete an existing index.

DELETE /twitter

The above example deletes an index called twitter. Specifying an index or a wildcard expression is required. Aliases cannot be used to delete an index. Wildcard expressions are resolved to matching concrete indices only.

The delete index API can also be applied to more than one index, by either using a comma separated list, or on all indices (be careful!) by using _all or * as index.
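
For example, to delete two indices with one request (the index names here are illustrative):

DELETE /twitter,blog

or, to delete every index in the cluster:

DELETE /_all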

To disable deleting indices via wildcards or _all, set the action.destructive_requires_name setting in the config to true. This setting can also be changed via the cluster update settings API.
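
For example, a minimal sketch of enabling this protection dynamically via the cluster update settings API:

PUT /_cluster/settings
{
    "transient": {
        "action.destructive_requires_name": true
    }
}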

Get Index

The get index API allows you to retrieve information about one or more indices.

GET /twitter

The above example gets the information for an index called twitter. Specifying an index, alias or wildcard expression is required.

The get index API can also be applied to more than one index, or on all indices by using _all or * as index.
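
For example (the index names are illustrative):

GET /twitter,kimchy

GET /_all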

Skipping types

Types are being removed from Elasticsearch: in 7.0, the mappings element will no longer return the type name as a top-level key by default. You can already opt in for this behavior by setting include_type_name=false on the request.

Note
Such calls will be rejected on indices that have multiple types as it introduces ambiguity as to which mapping should be returned. Only indices created by Elasticsearch 5.x may have multiple types.

Here is an example:

GET twitter?include_type_name=false

which returns

{
    "twitter": {
        "aliases": {},
        "mappings" : {
            "properties" : {
              "date" : {
                "type" : "date"
              },
              "likes" : {
                "type" : "long"
              },
              "message" : {
                "type" : "text",
                "fields" : {
                  "keyword" : {
                    "type" : "keyword",
                    "ignore_above" : 256
                  }
                }
              },
              "user" : {
                "type" : "keyword"
              }
            }
        },
        "settings": {
            "index": {
                "creation_date": "1547028674905",
                "number_of_shards": "1",
                "number_of_replicas": "1",
                "uuid": "u1YpkPqLSqGIn3kNAvY8cA",
                "version": {
                    "created": ...
                },
                "provided_name": "twitter"
            }
        }
    }
}

Indices Exists

Used to check if the index (indices) exists or not. For example:

HEAD twitter

The HTTP status code indicates if the index exists or not. A 404 means it does not exist, and 200 means it does.

Important
This request does not distinguish between an index and an alias, i.e. status code 200 is also returned if an alias exists with that name.

Open / Close Index API

The open and close index APIs allow you to close an index and later open it. A closed index has almost no overhead on the cluster (except for maintaining its metadata), and is blocked for read/write operations. A closed index can be opened, which will then go through the normal recovery process.

Warning

To reduce the risk of data loss, avoid keeping closed indices in your cluster for long periods of time. When a node leaves your cluster, Elasticsearch does not rebuild the lost copies of the shards in any closed indices. As you replace the nodes in your cluster over time you will gradually lose the shard data for any closed indices.

In managed environments such as Elastic Cloud, nodes may be automatically replaced at any time, which will result in the loss of data held in closed indices. You should not close any indices if your cluster is running in such a managed environment.

If you want to reduce the overhead of an index while keeping it available for occasional searches, freeze the index instead. If you want to store an index outside of the cluster, use a snapshot.
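
For example, assuming a cluster where the freeze API is available (it was introduced in Elasticsearch 6.6), an index could be frozen rather than closed; the index name is illustrative:

POST /my_index/_freeze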

The REST endpoints are /{index}/_close and /{index}/_open. For example:

POST /my_index/_close

POST /my_index/_open

It is possible to open and close multiple indices. An error will be thrown if the request explicitly refers to a missing index. This behaviour can be disabled using the ignore_unavailable=true parameter.
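
For example, the following request opens two indices and succeeds even if one of them is missing (the index names are illustrative):

POST /my_index_1,my_index_2/_open?ignore_unavailable=true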

All indices can be opened or closed at once using _all as the index name or specifying patterns that identify them all (e.g. *).

Identifying indices via wildcards or _all can be disabled by setting the action.destructive_requires_name flag in the config file to true. This setting can also be changed via the cluster update settings API.

Closed indices consume a significant amount of disk space, which can cause problems in managed environments. Closing indices is enabled by default; you can disable it by setting cluster.indices.close.enable to false using the cluster settings API.

Important
Closed indices are ignored by many APIs. For instance, the shards of a closed index are not included in the output of the cat shards API.

Wait For Active Shards

Because opening an index allocates its shards, the wait_for_active_shards setting on index creation applies to the index opening action as well. The default value for the wait_for_active_shards setting on the open index API is 0, which means that the command won’t wait for the shards to be allocated.
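
To make the open command wait for at least the primary of each shard to be allocated, the parameter can be passed explicitly, for example:

POST /my_index/_open?wait_for_active_shards=1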

Shrink Index

The shrink index API allows you to shrink an existing index into a new index with fewer primary shards. The requested number of primary shards in the target index must be a factor of the number of shards in the source index. For example an index with 8 primary shards can be shrunk into 4, 2 or 1 primary shards or an index with 15 primary shards can be shrunk into 5, 3 or 1. If the number of shards in the index is a prime number it can only be shrunk into a single primary shard. Before shrinking, a (primary or replica) copy of every shard in the index must be present on the same node.

Shrinking works as follows:

  • First, it creates a new target index with the same definition as the source index, but with a smaller number of primary shards.

  • Then it hard-links segments from the source index into the target index. (If the file system doesn’t support hard-linking, then all segments are copied into the new index, which is a much more time-consuming process. Also, if using multiple data paths, shards on different data paths require a full copy of segment files if they are not on the same disk, since hard links don’t work across disks.)

  • Finally, it recovers the target index as though it were a closed index which had just been re-opened.

Preparing an index for shrinking

In order to shrink an index, the index must be marked as read-only, and a (primary or replica) copy of every shard in the index must be relocated to the same node and have health green.

These two conditions can be achieved with the following request:

PUT /my_source_index/_settings
{
  "settings": {
    "index.routing.allocation.require._name": "shrink_node_name", (1)
    "index.blocks.write": true (2)
  }
}
  1. Forces the relocation of a copy of each shard to the node with name shrink_node_name. See the shard allocation filtering documentation for more options.

  2. Prevents write operations to this index while still allowing metadata changes like deleting the index.

It can take a while to relocate the source index. Progress can be tracked with the _cat recovery API, or the cluster health API can be used to wait until all shards have relocated with the wait_for_no_relocating_shards parameter.
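
For example, either of the following requests could be used; the second blocks until no shards are relocating or the request times out:

GET _cat/recovery?v

GET _cluster/health?wait_for_no_relocating_shards=true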

Shrinking an index

To shrink my_source_index into a new index called my_target_index, issue the following request:

POST my_source_index/_shrink/my_target_index?copy_settings=true
{
  "settings": {
    "index.routing.allocation.require._name": null, (1)
    "index.blocks.write": null (2)
  }
}
  1. Clear the allocation requirement copied from the source index.

  2. Clear the index write block copied from the source index.

The above request returns immediately once the target index has been added to the cluster state — it doesn’t wait for the shrink operation to start.

Important

Indices can only be shrunk if they satisfy the following requirements:

  • The target index must not exist.

  • The source index must have more primary shards than the target index.

  • The number of primary shards in the target index must be a factor of the number of primary shards in the source index.

  • The index must not contain more than 2,147,483,519 documents in total across all shards that will be shrunk into a single shard on the target index, as this is the maximum number of docs that can fit into a single shard.

  • The node handling the shrink process must have sufficient free disk space to accommodate a second copy of the existing index.

The _shrink API is similar to the create index API and accepts settings and aliases parameters for the target index:

POST my_source_index/_shrink/my_target_index?copy_settings=true
{
  "settings": {
    "index.number_of_replicas": 1,
    "index.number_of_shards": 1, (1)
    "index.codec": "best_compression" (2)
  },
  "aliases": {
    "my_search_indices": {}
  }
}
  1. The number of shards in the target index. This must be a factor of the number of shards in the source index.

  2. Best compression will only take effect when new writes are made to the index, such as when force-merging the shard to a single segment.

Note
Mappings may not be specified in the _shrink request.
Note
By default, with the exception of index.analysis, index.similarity, and index.sort settings, index settings on the source index are not copied during a shrink operation. With the exception of non-copyable settings, settings from the source index can be copied to the target index by adding the URL parameter copy_settings=true to the request. Note that copy_settings cannot be set to false. The parameter copy_settings will be removed in 8.0.0.

Deprecated in 6.4.0: not copying settings is deprecated; copying settings will be the default behavior in 7.x.

Monitoring the shrink process

The shrink process can be monitored with the _cat recovery API, or the cluster health API can be used to wait until all primary shards have been allocated by setting the wait_for_status parameter to yellow.
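
For example, to follow the recovery of the target index from the earlier example:

GET _cat/recovery/my_target_index?v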

The _shrink API returns as soon as the target index has been added to the cluster state, before any shards have been allocated. At this point, all shards are in the state unassigned. If, for any reason, the target index can’t be allocated on the shrink node, its primary shard will remain unassigned until it can be allocated on that node.

Once the primary shard is allocated, it moves to state initializing, and the shrink process begins. When the shrink operation completes, the shard will become active. At that point, Elasticsearch will try to allocate any replicas and may decide to relocate the primary shard to another node.

Wait For Active Shards

Because the shrink operation creates a new index to shrink the shards to, the wait for active shards setting on index creation applies to the shrink index action as well.

Split Index

The split index API allows you to split an existing index into a new index, where each original primary shard is split into two or more primary shards in the new index.

Important
The _split API requires the source index to be created with a specific number_of_routing_shards in order to be split in the future. This requirement has been removed in Elasticsearch 7.0.

The number of times the index can be split (and the number of shards that each original shard can be split into) is determined by the index.number_of_routing_shards setting. The number of routing shards specifies the hashing space that is used internally to distribute documents across shards with consistent hashing. For instance, a 5 shard index with number_of_routing_shards set to 30 (5 x 2 x 3) could be split by a factor of 2 or 3. In other words, it could be split as follows:

  • 5 → 10 → 30 (split by 2, then by 3)

  • 5 → 15 → 30 (split by 3, then by 2)

  • 5 → 30 (split by 6)

How does splitting work?

Splitting works as follows:

  • First, it creates a new target index with the same definition as the source index, but with a larger number of primary shards.

  • Then it hard-links segments from the source index into the target index. (If the file system doesn’t support hard-linking, then all segments are copied into the new index, which is a much more time consuming process.)

  • Once the low-level files are created, all documents will be hashed again to delete documents that belong to a different shard.

  • Finally, it recovers the target index as though it were a closed index which had just been re-opened.

Why doesn’t Elasticsearch support incremental resharding?

Going from N shards to N+1 shards, also known as incremental resharding, is indeed a feature that is supported by many key-value stores. Adding a new shard and pushing new data to this new shard only is not an option: this would likely be an indexing bottleneck, and figuring out which shard a document belongs to given its _id, which is necessary for get, delete and update requests, would become quite complex. This means that we need to rebalance existing data using a different hashing scheme.

The most common way that key-value stores do this efficiently is by using consistent hashing. Consistent hashing only requires 1/N-th of the keys to be relocated when growing the number of shards from N to N+1. However, Elasticsearch’s unit of storage, the shard, is a Lucene index. Because of their search-oriented data structure, taking a significant portion of a Lucene index, be it only 5% of documents, deleting them and indexing them on another shard typically comes with a much higher cost than with a key-value store. This cost is kept reasonable when growing the number of shards by a multiplicative factor as described in the above section: this allows Elasticsearch to perform the split locally, which in turn allows the split to be performed at the index level rather than reindexing documents that need to move, as well as using hard links for efficient file copying.

In the case of append-only data, it is possible to get more flexibility by creating a new index and pushing new data to it, while adding an alias that covers both the old and the new index for read operations. Assuming that the old and new indices have respectively M and N shards, this has no overhead compared to searching an index that would have M+N shards.

Preparing an index for splitting

Create an index with a routing shards factor:

PUT my_source_index
{
    "settings": {
        "index.number_of_shards" : 1,
        "index.number_of_routing_shards" : 2 (1)
    }
}
  1. Allows the index to be split into two shards; in other words, it allows for a single split operation.

In order to split an index, the index must be marked as read-only, and have health green.

This can be achieved with the following request:

PUT /my_source_index/_settings
{
  "settings": {
    "index.blocks.write": true (1)
  }
}
  1. Prevents write operations to this index while still allowing metadata changes like deleting the index.

Splitting an index

To split my_source_index into a new index called my_target_index, issue the following request:

POST my_source_index/_split/my_target_index?copy_settings=true
{
  "settings": {
    "index.number_of_shards": 2
  }
}

The above request returns immediately once the target index has been added to the cluster state — it doesn’t wait for the split operation to start.

Important

Indices can only be split if they satisfy the following requirements:

  • The target index must not exist.

  • The source index must have fewer primary shards than the target index.

  • The number of primary shards in the target index must be a multiple of the number of primary shards in the source index.

  • The node handling the split process must have sufficient free disk space to accommodate a second copy of the existing index.

The _split API is similar to the create index API and accepts settings and aliases parameters for the target index:

POST my_source_index/_split/my_target_index?copy_settings=true
{
  "settings": {
    "index.number_of_shards": 5 (1)
  },
  "aliases": {
    "my_search_indices": {}
  }
}
  1. The number of shards in the target index. This must be a multiple of the number of shards in the source index.

Note
Mappings may not be specified in the _split request.
Note
By default, with the exception of index.analysis, index.similarity, and index.sort settings, index settings on the source index are not copied during a split operation. With the exception of non-copyable settings, settings from the source index can be copied to the target index by adding the URL parameter copy_settings=true to the request. Note that copy_settings cannot be set to false. The parameter copy_settings will be removed in 8.0.0.

Deprecated in 6.4.0: not copying settings is deprecated; copying settings will be the default behavior in 7.x.

Monitoring the split process

The split process can be monitored with the _cat recovery API, or the cluster health API can be used to wait until all primary shards have been allocated by setting the wait_for_status parameter to yellow.
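
For example, to block until every primary shard of the target index from the earlier example has been allocated:

GET _cluster/health/my_target_index?wait_for_status=yellow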

The _split API returns as soon as the target index has been added to the cluster state, before any shards have been allocated. At this point, all shards are in the state unassigned. If, for any reason, the target index can’t be allocated, its primary shard will remain unassigned until it can be allocated on that node.

Once the primary shard is allocated, it moves to state initializing, and the split process begins. When the split operation completes, the shard will become active. At that point, Elasticsearch will try to allocate any replicas and may decide to relocate the primary shard to another node.

Wait For Active Shards

Because the split operation creates a new index to split the shards to, the wait for active shards setting on index creation applies to the split index action as well.

Rollover Index

The rollover index API rolls an alias over to a new index when the existing index is considered to be too large or too old.

The API accepts a single alias name and a list of conditions. The alias must point to a write index for a Rollover request to be valid. There are two ways this can be achieved, and depending on the configuration, the alias metadata will be updated differently. The two scenarios are as follows:

  • The alias only points to a single index with is_write_index not configured (defaults to null).

In this scenario, the rollover alias will be added to the newly created index and removed from the original (rolled-over) index.

  • The alias points to one or more indices with is_write_index set to true on the index to be rolled over (the write index).

In this scenario, the write index will have its rollover alias' is_write_index set to false, while the newly created index will now have the rollover alias pointing to it as the write index with is_write_index as true.

The available conditions are:

Table 1. conditions parameters

max_age

The maximum age of the index

max_docs

The maximum number of documents the index should contain. This does not add documents multiple times for replicas

max_size

The maximum estimated size of the primary shard of the index

PUT /logs-000001 (1)
{
  "aliases": {
    "logs_write": {}
  }
}

# Add > 1000 documents to logs-000001

POST /logs_write/_rollover (2)
{
  "conditions": {
    "max_age":   "7d",
    "max_docs":  1000,
    "max_size":  "5gb"
  }
}
  1. Creates an index called logs-000001 with the alias logs_write.

  2. If the index pointed to by logs_write was created 7 or more days ago, or contains 1,000 or more documents, or has an index size of at least around 5GB, then the logs-000002 index is created and the logs_write alias is updated to point to logs-000002.

The above request might return the following response:

{
  "acknowledged": true,
  "shards_acknowledged": true,
  "old_index": "logs-000001",
  "new_index": "logs-000002",
  "rolled_over": true, (1)
  "dry_run": false, (2)
  "conditions": { (3)
    "[max_age: 7d]": false,
    "[max_docs: 1000]": true,
    "[max_size: 5gb]": false,
  }
}
  1. Whether the index was rolled over.

  2. Whether the rollover was dry run.

  3. The result of each condition.

Naming the new index

If the name of the existing index ends with - and a number — e.g. logs-000001 — then the name of the new index will follow the same pattern, incrementing the number (logs-000002). The number is zero-padded with a length of 6, regardless of the old index name.

If the old name doesn’t match this pattern then you must specify the name for the new index as follows:

POST /my_alias/_rollover/my_new_index_name
{
  "conditions": {
    "max_age":   "7d",
    "max_docs":  1000,
    "max_size": "5gb"
  }
}

Using date math with the rollover API

It can be useful to use date math to name the rollover index according to the date that the index rolled over, e.g. logstash-2016.02.03. The rollover API supports date math, but requires the index name to end with a dash followed by a number, e.g. logstash-2016.02.03-1 which is incremented every time the index is rolled over. For instance:

# PUT /<logs-{now/d}-1> with URI encoding:
PUT /%3Clogs-%7Bnow%2Fd%7D-1%3E (1)
{
  "aliases": {
    "logs_write": {}
  }
}

PUT logs_write/_doc/1
{
  "message": "a dummy log"
}

POST logs_write/_refresh

# Wait for a day to pass

POST /logs_write/_rollover (2)
{
  "conditions": {
    "max_docs":   "1"
  }
}
  1. Creates an index named with today’s date, e.g. logs-2016.10.31-1

  2. Rolls over to a new index with today’s date, e.g. logs-2016.10.31-000002 if run immediately, or logs-2016.11.01-000002 if run after 24 hours

These indices can then be referenced as described in the date math documentation. For example, to search over indices created in the last three days, you could do the following:

# GET /<logs-{now/d}-*>,<logs-{now/d-1d}-*>,<logs-{now/d-2d}-*>/_search
GET /%3Clogs-%7Bnow%2Fd%7D-*%3E%2C%3Clogs-%7Bnow%2Fd-1d%7D-*%3E%2C%3Clogs-%7Bnow%2Fd-2d%7D-*%3E/_search

Defining the new index

The settings, mappings, and aliases for the new index are taken from any matching index templates. Additionally, you can specify settings, mappings, and aliases in the body of the request, just like the create index API. Values specified in the request override any values set in matching index templates. For example, the following rollover request overrides the index.number_of_shards setting:

PUT /logs-000001
{
  "aliases": {
    "logs_write": {}
  }
}

POST /logs_write/_rollover
{
  "conditions" : {
    "max_age": "7d",
    "max_docs": 1000,
    "max_size": "5gb"
  },
  "settings": {
    "index.number_of_shards": 2
  }
}

Dry run

The rollover API supports dry_run mode, where request conditions can be checked without performing the actual rollover:

PUT /logs-000001
{
  "aliases": {
    "logs_write": {}
  }
}

POST /logs_write/_rollover?dry_run
{
  "conditions" : {
    "max_age": "7d",
    "max_docs": 1000,
    "max_size": "5gb"
  }
}

Wait For Active Shards

Because the rollover operation creates a new index to rollover to, the wait_for_active_shards setting on index creation applies to the rollover action as well.

Write Index Alias Behavior

The rollover alias, when rolling over a write index that has is_write_index explicitly set to true, is not swapped during rollover actions. Since having an alias point to multiple indices is ambiguous in distinguishing which is the correct write index to roll over, it is not valid to roll over an alias that points to multiple indices. For this reason, the default behavior is to swap which index is being pointed to by the write-oriented alias. This was logs_write in some of the above examples. Since setting is_write_index enables an alias to point to multiple indices while also being explicit as to which is the write index that rollover should target, removing the alias from the rolled-over index is not necessary. This simplifies things by allowing one alias to behave as both the write and read alias for indices that are being managed with Rollover.

Look at the behavior of the aliases in the following example where is_write_index is set on the rolled over index.

PUT my_logs_index-000001
{
  "aliases": {
    "logs": { "is_write_index": true } (1)
  }
}

PUT logs/_doc/1
{
  "message": "a dummy log"
}

POST logs/_refresh

POST /logs/_rollover
{
  "conditions": {
    "max_docs":   "1"
  }
}

PUT logs/_doc/2 (2)
{
  "message": "a newer log"
}
  1. configures my_logs_index-000001 as the write index for the logs alias

  2. newly indexed documents against the logs alias will write to the new index

{
  "_index" : "my_logs_index-000002",
  "_type" : "_doc",
  "_id" : "2",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

After the rollover, the alias metadata for the two indices will have the is_write_index setting reflect each index’s role, with the newly created index as the write index.

{
  "my_logs_index-000002": {
    "aliases": {
      "logs": { "is_write_index": true }
    }
  },
  "my_logs_index-000001": {
    "aliases": {
      "logs": { "is_write_index" : false }
    }
  }
}

Put Mapping

The PUT mapping API allows you to add fields to an existing index or to change the search-only settings of existing fields.

PUT twitter (1)
{}

PUT twitter/_mapping/_doc (2)
{
  "properties": {
    "email": {
      "type": "keyword"
    }
  }
}
  1. Creates an index called twitter without any type mapping.

  2. Uses the PUT mapping API to add a new field called email to the _doc mapping type.

More information on how to define type mappings can be found in the mapping section.

Multi-index

The PUT mapping API can be applied to multiple indices with a single request. For example, we can update the twitter-1 and twitter-2 mappings at the same time:

# Create the two indices
PUT twitter-1
PUT twitter-2

# Update both mappings
PUT /twitter-1,twitter-2/_mapping/_doc (1)
{
  "properties": {
    "user_name": {
      "type": "text"
    }
  }
}
  1. Note that the indices specified (twitter-1,twitter-2) follow the multiple index names and wildcard format.

Note
When updating the default mapping with the PUT mapping API, the new mapping is not merged with the existing mapping. Instead, the new default mapping replaces the existing one.

Updating field mappings

In general, the mapping for existing fields cannot be updated. There are some exceptions to this rule: for instance, new properties can be added to object fields, new multi-fields can be added to existing fields, and the ignore_above parameter can be updated.

For example:

PUT my_index (1)
{
  "mappings": {
    "_doc": {
      "properties": {
        "name": {
          "properties": {
            "first": {
              "type": "text"
            }
          }
        },
        "user_id": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT my_index/_mapping/_doc
{
  "properties": {
    "name": {
      "properties": {
        "last": { (2)
          "type": "text"
        }
      }
    },
    "user_id": {
      "type": "keyword",
      "ignore_above": 100 (3)
    }
  }
}
  1. Create an index with a first field under the name object field, and a user_id field.

  2. Add a last field under the name object field.

  3. Update the ignore_above setting from its default of 0.

Each mapping parameter specifies whether or not its setting can be updated on an existing field.

Skipping types

Types are being removed from Elasticsearch: in 7.0, the mappings element will no longer take the type name as a top-level key by default. You can already opt in for this behavior by setting include_type_name=false and putting mappings directly under mappings in the index creation call, without specifying a type name.

Note
On indices created on Elasticsearch 5.x, such calls will actually introduce or update mappings for the _doc type. It is recommended to avoid calling the put-mapping API with include_type_name=false on 5.x indices.

Here is an example:

PUT my_index?include_type_name=false
{
  "mappings": {
    "properties": {
      "name": {
        "properties": {
          "first": {
            "type": "text"
          }
        }
      },
      "user_id": {
        "type": "keyword"
      }
    }
  }
}

Get Mapping

The get mapping API allows you to retrieve mapping definitions for an index or index/type.

GET /twitter/_mapping/_doc

Multiple Indices and Types

The get mapping API can be used to get more than one index or type mapping with a single call. General usage of the API follows the following syntax: host:port/{index}/_mapping/{type} where both {index} and {type} can accept a comma-separated list of names. To get mappings for all indices you can use _all for {index}. The following are some examples:

GET /_mapping/_doc

GET /_all/_mapping/_doc

If you want to get mappings of all indices and types then the following two examples are equivalent:

GET /_all/_mapping

GET /_mapping

Skipping types

Types are being removed from Elasticsearch: in 7.0, the mappings element will no longer return the type name as a top-level key by default. You can already opt in for this behavior by setting include_type_name=false on the request.

Note
Such calls will be rejected on indices that have multiple types as it introduces ambiguity as to which mapping should be returned. Only indices created by Elasticsearch 5.x may have multiple types.

Here is an example:

PUT test?include_type_name=false
{
  "mappings": {
    "properties": {
      "foo": {
        "type": "keyword"
      }
    }
  }
}

GET test/_mappings?include_type_name=false

which returns

{
  "test": {
    "mappings": {
      "properties": {
        "foo": {
          "type": "keyword"
        }
      }
    }
  }
}

Get Field Mapping

The get field mapping API allows you to retrieve mapping definitions for one or more fields. This is useful when you do not need the complete type mapping returned by the Get Mapping API.

For example, consider the following mapping:

PUT publications
{
    "mappings": {
        "_doc": {
            "properties": {
                "id": { "type": "text" },
                "title":  { "type": "text"},
                "abstract": { "type": "text"},
                "author": {
                    "properties": {
                        "id": { "type": "text" },
                        "name": { "type": "text" }
                    }
                }
            }
        }
    }
}

The following returns the mapping of the field title only:

GET publications/_mapping/_doc/field/title

For which the response is:

{
   "publications": {
      "mappings": {
         "_doc": {
            "title": {
               "full_name": "title",
               "mapping": {
                  "title": {
                     "type": "text"
                  }
               }
            }
         }
      }
   }
}

Multiple Indices, Types and Fields

The get field mapping API can be used to get the mapping of multiple fields from more than one index or type with a single call. General usage of the API follows the following syntax: host:port/{index}/{type}/_mapping/field/{field} where {index}, {type} and {field} can stand for comma-separated lists of names or wildcards. To get mappings for all indices you can use _all for {index}. The following are some examples:

GET /twitter,kimchy/_mapping/field/message

GET /_all/_mapping/_doc/field/message,user.id

GET /_all/_mapping/_doc/field/*.id

Specifying fields

The get field mapping API allows you to specify a comma-separated list of fields.

For instance, to select the id of the author field, you must use its full name author.id.

GET publications/_mapping/_doc/field/author.id,abstract,name

returns:

{
   "publications": {
      "mappings": {
         "_doc": {
            "author.id": {
               "full_name": "author.id",
               "mapping": {
                  "id": {
                     "type": "text"
                  }
               }
            },
            "abstract": {
               "full_name": "abstract",
               "mapping": {
                  "abstract": {
                     "type": "text"
                  }
               }
            }
         }
      }
   }
}

The get field mapping API also supports wildcard notation.

GET publications/_mapping/_doc/field/a*

returns:

{
   "publications": {
      "mappings": {
         "_doc": {
            "author.name": {
               "full_name": "author.name",
               "mapping": {
                  "name": {
                     "type": "text"
                  }
               }
            },
            "abstract": {
               "full_name": "abstract",
               "mapping": {
                  "abstract": {
                     "type": "text"
                  }
               }
            },
            "author.id": {
               "full_name": "author.id",
               "mapping": {
                  "id": {
                     "type": "text"
                  }
               }
            }
         }
      }
   }
}

Other options

include_defaults

Adding include_defaults=true to the query string will cause the response to include default values, which are normally suppressed.
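
For example, using the publications index defined above:

GET publications/_mapping/_doc/field/title?include_defaults=true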

Types Exists

Used to check if a type/types exists in an index/indices.

HEAD twitter/_mapping/tweet

The HTTP status code indicates if the type exists or not. A 404 means it does not exist, and 200 means it does.

Index Aliases

APIs in Elasticsearch accept an index name when working against a specific index, and several indices when applicable. The index aliases API allows aliasing an index with a name, with all APIs automatically converting the alias name to the actual index name. An alias can also be mapped to more than one index, and when specifying it, the alias will automatically expand to the aliased indices. An alias can also be associated with a filter that will automatically be applied when searching, and with routing values. An alias cannot have the same name as an index.

Here is a sample of associating the alias alias1 with index test1:

POST /_aliases
{
    "actions" : [
        { "add" : { "index" : "test1", "alias" : "alias1" } }
    ]
}

And here is removing that same alias:

POST /_aliases
{
    "actions" : [
        { "remove" : { "index" : "test1", "alias" : "alias1" } }
    ]
}

Renaming an alias is a simple remove then add operation within the same API. This operation is atomic; there is no need to worry about a short period of time where the alias does not point to an index:

POST /_aliases
{
    "actions" : [
        { "remove" : { "index" : "test1", "alias" : "alias1" } },
        { "add" : { "index" : "test2", "alias" : "alias1" } }
    ]
}

Associating an alias with more than one index is simply several add actions:

POST /_aliases
{
    "actions" : [
        { "add" : { "index" : "test1", "alias" : "alias1" } },
        { "add" : { "index" : "test2", "alias" : "alias1" } }
    ]
}

Multiple indices can be specified for an action with the indices array syntax:

POST /_aliases
{
    "actions" : [
        { "add" : { "indices" : ["test1", "test2"], "alias" : "alias1" } }
    ]
}

To specify multiple aliases in one action, the corresponding aliases array syntax exists as well.
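
For example, a minimal sketch reusing the index and alias names from above:

POST /_aliases
{
    "actions" : [
        { "add" : { "index" : "test1", "aliases" : ["alias1", "alias2"] } }
    ]
}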

For the example above, a glob pattern can also be used to associate an alias with multiple indices that share a common name:

POST /_aliases
{
    "actions" : [
        { "add" : { "index" : "test*", "alias" : "all_test_indices" } }
    ]
}

In this case, the alias is a point-in-time alias that will group all current indices that match; it will not automatically update as new indices that match this pattern are added or removed.

It is an error to index to an alias which points to more than one index.

It is also possible to swap an index with an alias in one operation:

PUT test     (1)
PUT test_2   (2)
POST /_aliases
{
    "actions" : [
        { "add":  { "index": "test_2", "alias": "test" } },
        { "remove_index": { "index": "test" } }  (3)
    ]
}
  1. An index we’ve added by mistake

  2. The index we should have added

  3. remove_index is just like Delete Index

Filtered Aliases

Aliases with filters provide an easy way to create different "views" of the same index. The filter can be defined using Query DSL and is applied to all Search, Count, Delete By Query and More Like This operations with this alias.

To create a filtered alias, first we need to ensure that the fields already exist in the mapping:

PUT /test1
{
  "mappings": {
    "_doc": {
      "properties": {
        "user" : {
          "type": "keyword"
        }
      }
    }
  }
}

Now we can create an alias that uses a filter on field user:

POST /_aliases
{
    "actions" : [
        {
            "add" : {
                 "index" : "test1",
                 "alias" : "alias2",
                 "filter" : { "term" : { "user" : "kimchy" } }
            }
        }
    ]
}

Routing

It is possible to associate routing values with aliases. This feature can be used together with filtering aliases in order to avoid unnecessary shard operations.

The following command creates a new alias alias1 that points to index test. After alias1 is created, all operations with this alias are automatically modified to use value 1 for routing:

POST /_aliases
{
    "actions" : [
        {
            "add" : {
                 "index" : "test",
                 "alias" : "alias1",
                 "routing" : "1"
            }
        }
    ]
}

It’s also possible to specify different routing values for searching and indexing operations:

POST /_aliases
{
    "actions" : [
        {
            "add" : {
                 "index" : "test",
                 "alias" : "alias2",
                 "search_routing" : "1,2",
                 "index_routing" : "2"
            }
        }
    ]
}

As shown in the example above, search routing may contain several values separated by commas. Index routing can contain only a single value.

If a search operation that uses a routing alias also has a routing parameter, an intersection of both the search alias routing and the routing specified in the parameter is used. For example, the following command will use "2" as a routing value:

GET /alias2/_search?q=user:kimchy&routing=2,3

Write Index

It is possible to associate the index pointed to by an alias as the write index. When specified, all index and update requests against an alias that point to multiple indices will attempt to resolve to the one index that is the write index. Only one index per alias can be assigned to be the write index at a time. If no write index is specified and there are multiple indices referenced by an alias, then writes will not be allowed.

It is possible to specify an index associated with an alias as a write index using both the aliases API and index creation API.

Setting an index to be the write index with an alias also affects how the alias is manipulated during Rollover (see Rollover With Write Index).

POST /_aliases
{
    "actions" : [
        {
            "add" : {
                 "index" : "test",
                 "alias" : "alias1",
                 "is_write_index" : true
            }
        },
        {
            "add" : {
                 "index" : "test2",
                 "alias" : "alias1"
            }
        }
    ]
}

In this example, we associate the alias alias1 with both test and test2, where test will be the index chosen for writing.

PUT /alias1/_doc/1
{
    "foo": "bar"
}

The new document that was indexed to /alias1/_doc/1 will be indexed as if it were /test/_doc/1.

GET /test/_doc/1

To swap which index is the write index for an alias, the Aliases API can be leveraged to do an atomic swap. The swap is not dependent on the ordering of the actions.

POST /_aliases
{
    "actions" : [
        {
            "add" : {
                 "index" : "test",
                 "alias" : "alias1",
                 "is_write_index" : false
            }
        }, {
            "add" : {
                 "index" : "test2",
                 "alias" : "alias1",
                 "is_write_index" : true
            }
        }
    ]
}
Important

Aliases that do not explicitly set is_write_index: true for an index, and only reference one index, will have that referenced index behave as if it is the write index until an additional index is referenced. At that point, there will be no write index and writes will be rejected.

Add a single alias

An alias can also be added with the endpoint

PUT /{index}/_alias/{name}

where

index

The index the alias refers to. Can be any of * | _all | glob pattern | name1, name2, …

name

The name of the alias. This is a required option.

routing

An optional routing that can be associated with an alias.

filter

An optional filter that can be associated with an alias.

You can also use the plural _aliases.

Examples:

Adding a time-based alias
PUT /logs_201305/_alias/2013
Adding a user alias

First create the index and add a mapping for the user_id field:

PUT /users
{
    "mappings" : {
        "_doc" : {
            "properties" : {
                "user_id" : {"type" : "integer"}
            }
        }
    }
}

Then add the alias for a specific user:

PUT /users/_alias/user_12
{
    "routing" : "12",
    "filter" : {
        "term" : {
            "user_id" : 12
        }
    }
}

Aliases during index creation

Aliases can also be specified during index creation:

PUT /logs_20162801
{
    "mappings" : {
        "_doc" : {
            "properties" : {
                "year" : {"type" : "integer"}
            }
        }
    },
    "aliases" : {
        "current_day" : {},
        "2016" : {
            "filter" : {
                "term" : {"year" : 2016 }
            }
        }
    }
}

Delete aliases

The REST endpoint is: /{index}/_alias/{name}

where

index

* | _all | glob pattern | name1, name2, …

name

* | _all | glob pattern | name1, name2, …

Alternatively you can use the plural _aliases. Example:

DELETE /logs_20162801/_alias/current_day

Retrieving existing aliases

The get index alias API allows filtering by alias name and index name. This API redirects to the master and fetches the requested index aliases, if available. It only serialises the found index aliases.

Possible options:

index

The index name to get aliases for. Partial names are supported via wildcards, and multiple index names can be specified separated by commas. The alias name for an index can also be used.

alias

The name of the alias to return in the response. Like the index option, this option supports wildcards and the option to specify multiple alias names separated by commas.

ignore_unavailable

What to do if a specified index name doesn’t exist. If set to true then those indices are ignored.

The REST endpoint is: /{index}/_alias/{alias}.

Examples:

All aliases for the index logs_20162801:

GET /logs_20162801/_alias/*

Response:

{
 "logs_20162801" : {
   "aliases" : {
     "2016" : {
       "filter" : {
         "term" : {
           "year" : 2016
         }
       }
     }
   }
 }
}

All aliases with the name 2016 in any index:

GET /_alias/2016

Response:

{
  "logs_20162801" : {
    "aliases" : {
      "2016" : {
        "filter" : {
          "term" : {
            "year" : 2016
          }
        }
      }
    }
  }
}

All aliases that start with 20 in any index:

GET /_alias/20*

Response:

{
  "logs_20162801" : {
    "aliases" : {
      "2016" : {
        "filter" : {
          "term" : {
            "year" : 2016
          }
        }
      }
    }
  }
}

There is also a HEAD variant of the get indices aliases API to check if index aliases exist. The indices aliases exists API supports the same options as the get indices aliases API. Examples:

HEAD /_alias/2016
HEAD /_alias/20*
HEAD /logs_20162801/_alias/*

Update Indices Settings

Change specific index level settings in real time.

The REST endpoint is /_settings (to update all indices) or {index}/_settings to update the settings of one (or more) indices. The body of the request includes the updated settings, for example:

PUT /twitter/_settings
{
    "index" : {
        "number_of_replicas" : 2
    }
}

To reset a setting back to the default value, use null. For example:

PUT /twitter/_settings
{
    "index" : {
        "refresh_interval" : null
    }
}

The list of per-index settings which can be updated dynamically on live indices can be found in the index modules section. To prevent existing settings from being updated, the preserve_existing request parameter can be set to true.
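
For example, the following request would set refresh_interval only if it has not already been set on the twitter index:

PUT /twitter/_settings?preserve_existing=true
{
    "index" : {
        "refresh_interval" : "30s"
    }
}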

Bulk Indexing Usage

For example, the update settings API can be used to dynamically tune an index for bulk indexing performance, and then move it back to a more real-time indexing state. Before the bulk indexing is started, use:

PUT /twitter/_settings
{
    "index" : {
        "refresh_interval" : "-1"
    }
}

(Another optimization option is to start the index without any replicas and only add them later, but that really depends on the use case.)
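
A sketch of that option: drop the replicas before the bulk load, then restore them (e.g. back to 1) once indexing has finished.

PUT /twitter/_settings
{
    "index" : {
        "number_of_replicas" : 0
    }
}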

Then, once bulk indexing is done, the settings can be updated (back to the defaults for example):

PUT /twitter/_settings
{
    "index" : {
        "refresh_interval" : "1s"
    }
}

And, a force merge should be called:

POST /twitter/_forcemerge?max_num_segments=5

Updating Index Analysis

It is also possible to define new analyzers for the index, but the index must be closed first and reopened after the changes are made.

For example, if the content analyzer hasn’t been defined on the twitter index yet, you can use the following commands to add it:

POST /twitter/_close

PUT /twitter/_settings
{
  "analysis" : {
    "analyzer":{
      "content":{
        "type":"custom",
        "tokenizer":"whitespace"
      }
    }
  }
}

POST /twitter/_open

Get Settings

The get settings API allows you to retrieve the settings of one or more indices:

GET /twitter/_settings

Multiple Indices and Types

The get settings API can be used to get settings for more than one index with a single call. General usage of the API follows the following syntax: host:port/{index}/_settings where {index} can stand for a comma-separated list of index names and aliases. To get settings for all indices you can use _all for {index}. Wildcard expressions are also supported. The following are some examples:

GET /twitter,kimchy/_settings

GET /_all/_settings

GET /log_2013_*/_settings

Filtering settings by name

The settings that are returned can be filtered with wildcard matching as follows:

GET /log_2013_*/_settings/index.number_*

Analyze

Performs the analysis process on a text and returns the token breakdown of the text.

It can be used without specifying an index against one of the many built-in analyzers:

GET _analyze
{
  "analyzer" : "standard",
  "text" : "this is a test"
}

If the text parameter is provided as an array of strings, it is analyzed as a multi-valued field.

GET _analyze
{
  "analyzer" : "standard",
  "text" : ["this is a test", "the second text"]
}

Or by building a custom transient analyzer out of tokenizers, token filters and char filters. Token filters can use the shorter filter parameter name:

GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "text" : "this is a test"
}
GET _analyze
{
  "tokenizer" : "keyword",
  "filter" : ["lowercase"],
  "char_filter" : ["html_strip"],
  "text" : "this is a <b>test</b>"
}

Deprecated in 5.0.0: use filter/char_filter instead of filters/char_filters; token_filters has been removed.

Custom tokenizers, token filters, and character filters can be specified in the request body as follows:

GET _analyze
{
  "tokenizer" : "whitespace",
  "filter" : ["lowercase", {"type": "stop", "stopwords": ["a", "is", "this"]}],
  "text" : "this is a test"
}

It can also run against a specific index:

GET analyze_sample/_analyze
{
  "text" : "this is a test"
}

The above will run an analysis on the "this is a test" text, using the default index analyzer associated with the analyze_sample index. An analyzer can also be provided to use a different analyzer:

GET analyze_sample/_analyze
{
  "analyzer" : "whitespace",
  "text" : "this is a test"
}

Also, the analyzer can be derived based on a field mapping, for example:

GET analyze_sample/_analyze
{
  "field" : "obj1.field1",
  "text" : "this is a test"
}

This will cause the analysis to happen based on the analyzer configured in the mapping for obj1.field1 (and if not, the default index analyzer).

A normalizer can be provided for a keyword field with a normalizer associated with the analyze_sample index.

GET analyze_sample/_analyze
{
  "normalizer" : "my_normalizer",
  "text" : "BaR"
}

Or by building a custom transient normalizer out of token filters and char filters.

GET _analyze
{
  "filter" : ["lowercase"],
  "text" : "BaR"
}

Explain Analyze

If you want to get more advanced details, set explain to true (it defaults to false). It will output all token attributes for each token. You can filter the token attributes you want to output by setting the attributes option.

Note
The format of the additional detail information is labelled as experimental in Lucene and it may change in the future.
GET _analyze
{
  "tokenizer" : "standard",
  "filter" : ["snowball"],
  "text" : "detailed output",
  "explain" : true,
  "attributes" : ["keyword"] (1)
}
  1. Set "keyword" to output "keyword" attribute only

The request returns the following result:

{
  "detail" : {
    "custom_analyzer" : true,
    "charfilters" : [ ],
    "tokenizer" : {
      "name" : "standard",
      "tokens" : [ {
        "token" : "detailed",
        "start_offset" : 0,
        "end_offset" : 8,
        "type" : "<ALPHANUM>",
        "position" : 0
      }, {
        "token" : "output",
        "start_offset" : 9,
        "end_offset" : 15,
        "type" : "<ALPHANUM>",
        "position" : 1
      } ]
    },
    "tokenfilters" : [ {
      "name" : "snowball",
      "tokens" : [ {
        "token" : "detail",
        "start_offset" : 0,
        "end_offset" : 8,
        "type" : "<ALPHANUM>",
        "position" : 0,
        "keyword" : false (1)
      }, {
        "token" : "output",
        "start_offset" : 9,
        "end_offset" : 15,
        "type" : "<ALPHANUM>",
        "position" : 1,
        "keyword" : false (1)
      } ]
    } ]
  }
}
  1. Output only the "keyword" attribute, since "attributes" was specified in the request.

Index Templates

Index templates allow you to define templates that will automatically be applied when new indices are created. The templates include both settings and mappings and a simple pattern template that controls whether the template should be applied to the new index.

Note
Templates are only applied at index creation time. Changing a template will have no impact on existing indices. When using the create index API, the settings/mappings defined as part of the create index call will take precedence over any matching settings/mappings defined in the template.

For example:

PUT _template/template_1
{
  "index_patterns": ["te*", "bar*"],
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "_doc": {
      "_source": {
        "enabled": false
      },
      "properties": {
        "host_name": {
          "type": "keyword"
        },
        "created_at": {
          "type": "date",
          "format": "EEE MMM dd HH:mm:ss Z yyyy"
        }
      }
    }
  }
}
Note
Index templates support C-style /* */ block comments. Comments are allowed everywhere in the JSON document except before the initial opening curly bracket.

Defines a template named template_1, with a template pattern of te* or bar*. The settings and mappings will be applied to any index name that matches the te* or bar* pattern.

It is also possible to include aliases in an index template as follows:

PUT _template/template_1
{
    "index_patterns" : ["te*"],
    "settings" : {
        "number_of_shards" : 1
    },
    "aliases" : {
        "alias1" : {},
        "alias2" : {
            "filter" : {
                "term" : {"user" : "kimchy" }
            },
            "routing" : "kimchy"
        },
        "{index}-alias" : {} (1)
    }
}
  1. The {index} placeholder in the alias name will be replaced with the actual index name that the template gets applied to during index creation.

Deleting a Template

Index templates are identified by a name (in the above case template_1) and can be deleted as well:

DELETE /_template/template_1

Getting templates

Index templates are identified by a name (in the above case template_1) and can be retrieved using the following:

GET /_template/template_1

You can also match several templates by using wildcards like:

GET /_template/temp*
GET /_template/template_1,template_2

To get a list of all index templates you can run:

GET /_template

Template exists

Used to check if the template exists or not. For example:

HEAD _template/template_1

The HTTP status code indicates if the template with the given name exists or not. Status code 200 means it exists and 404 means it does not.
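
For example (template name hypothetical), checking a template that does not exist:

HEAD _template/missing_template

would return status code 404.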

Multiple Templates Matching

Multiple index templates can potentially match an index; in this case, both the settings and mappings are merged into the final configuration of the index. The order of the merging can be controlled using the order parameter, with lower orders being applied first and higher orders overriding them. For example:

PUT /_template/template_1
{
    "index_patterns" : ["*"],
    "order" : 0,
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "_doc" : {
            "_source" : { "enabled" : false }
        }
    }
}

PUT /_template/template_2
{
    "index_patterns" : ["te*"],
    "order" : 1,
    "settings" : {
        "number_of_shards" : 1
    },
    "mappings" : {
        "_doc" : {
            "_source" : { "enabled" : true }
        }
    }
}

The above will disable storing the _source, but for indices that start with te*, _source will still be enabled. Note, for mappings, the merging is "deep", meaning that specific object/property based mappings can easily be added/overridden on higher order templates, with lower order templates providing the basis.

Note
Multiple matching templates with the same order value will result in a non-deterministic merging order.
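
As a sketch of the effect (index names hypothetical):

# matches only template_1 (*): _source is disabled
PUT /logs_2014
# matches both templates; template_2 (order 1) wins: _source stays enabled
PUT /te_logs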

Template Versioning

A template can optionally carry a version number, which can be any integer value, in order to simplify template management by external systems. The version field is completely optional and is meant solely for external management of templates. To unset a version, simply replace the template without specifying one.

PUT /_template/template_1
{
    "index_patterns" : ["*"],
    "order" : 0,
    "settings" : {
        "number_of_shards" : 1
    },
    "version": 123
}

To check the version, you can filter responses using filter_path to limit the response to just the version:

GET /_template/template_1?filter_path=*.version

This should give a small response that makes it both easy and inexpensive to parse:

{
  "template_1" : {
    "version" : 123
  }
}

Indices Stats

Indices level stats provide statistics on different operations happening on an index. The API provides statistics on the index level scope (though most stats can also be retrieved using node level scope).

The following returns high level aggregation and index level stats for all indices:

GET /_stats

Specific index stats can be retrieved using:

GET /index1,index2/_stats

By default, all stats are returned. You can also restrict the response to specific stats by naming them in the URI. Those stats can be any of:

docs

The number of docs / deleted docs (docs not yet merged out). Note, affected by refreshing the index.

store

The size of the index.

indexing

Indexing statistics. These can be combined with a comma-separated list of types to provide document type level stats.

get

Get statistics, including missing stats.

search

Search statistics including suggest statistics. You can include statistics for custom groups by adding an extra groups parameter (search operations can be associated with one or more groups). The groups parameter accepts a comma separated list of group names. Use _all to return statistics for all groups.

segments

Retrieve the memory use of the open segments. Optionally, set the include_segment_file_sizes flag to also report the aggregated disk usage of each of the Lucene index files.

completion

Completion suggest statistics.

fielddata

Fielddata statistics.

flush

Flush statistics.

merge

Merge statistics.

request_cache

Shard request cache statistics.

refresh

Refresh statistics.

warmer

Warmer statistics.

translog

Translog statistics.

Some statistics allow per-field granularity, accepting a comma-separated list of included fields. By default all fields are included:

fields

List of fields to be included in the statistics. This is used as the default list unless a more specific field list is provided (see below).

completion_fields

List of fields to be included in the Completion Suggest statistics.

fielddata_fields

List of fields to be included in the Fielddata statistics.
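
For example (field names hypothetical), fielddata and completion stats can be restricted to particular fields:

GET /_stats/fielddata?fields=field1,field2

GET /_stats/completion?completion_fields=field1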

Here are some samples:

# Get back stats for merge and refresh only for all indices
GET /_stats/merge,refresh
# Get back stats for type1 and type2 documents for the my_index index
GET /my_index/_stats/indexing?types=type1,type2
# Get back just search stats for group1 and group2
GET /_stats/search?groups=group1,group2

The stats returned are aggregated on the index level, with primaries and total aggregations, where primaries are the values for only the primary shards, and total are the accumulated values for both primary and replica shards.

In order to get back shard level stats, set the level parameter to shards.
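
For example:

GET /my_index/_stats?level=shards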

Note, as shards move around the cluster, their stats will be cleared as they are created on other nodes. On the other hand, even after a shard has "left" a node, that node will still retain the stats that the shard contributed to it.

Indices Segments

Provides low-level information about the segments a Lucene index (at the shard level) is built with. It can be used to gather more information on the state of a shard and an index, for example to spot optimization opportunities or data "wasted" by deletes.

Endpoints include segments for a specific index:

GET /test/_segments

For several indices:

GET /test1,test2/_segments

Or for all indices:

GET /_segments

Response:

{
  "_shards": ...
  "indices": {
    "test": {
      "shards": {
        "0": [
          {
            "routing": {
              "state": "STARTED",
              "primary": true,
              "node": "zDC_RorJQCao9xf9pg3Fvw"
            },
            "num_committed_segments": 0,
            "num_search_segments": 1,
            "segments": {
              "_0": {
                "generation": 0,
                "num_docs": 1,
                "deleted_docs": 0,
                "size_in_bytes": 3800,
                "memory_in_bytes": 1410,
                "committed": false,
                "search": true,
                "version": "7.0.0",
                "compound": true,
                "attributes": {
                }
              }
            }
          }
        ]
      }
    }
  }
}
_0

The key of the JSON document is the name of the segment. This name is used to generate file names: all files starting with this segment name in the directory of the shard belong to this segment.

generation

A generation number that is incremented each time a new segment needs to be written. The segment name is derived from this generation number.

num_docs

The number of non-deleted documents that are stored in this segment.

deleted_docs

The number of deleted documents that are stored in this segment. It is perfectly fine if this number is greater than 0; space will be reclaimed when this segment gets merged.

size_in_bytes

The amount of disk space that this segment uses, in bytes.

memory_in_bytes

Segments need to store some data into memory in order to be searchable efficiently. This number returns the number of bytes that are used for that purpose. A value of -1 indicates that Elasticsearch was not able to compute this number.

committed

Whether the segment has been synced to disk. Segments that are committed would survive a hard reboot. There is no need to worry if this is false: the data from uncommitted segments is also stored in the transaction log, so that Elasticsearch is able to replay the changes on the next start.

search

Whether the segment is searchable. A value of false would most likely mean that the segment has been written to disk but no refresh occurred since then to make it searchable.

version

The version of Lucene that has been used to write this segment.

compound

Whether the segment is stored in a compound file. When true, this means that Lucene merged all files from the segment into a single one in order to save file descriptors.

attributes

Contains information about whether high compression was enabled.

Verbose mode

To add additional information that can be used for debugging, use the verbose flag.

Note
The format of the additional detail information is labelled as experimental in Lucene and it may change in the future.
GET /test/_segments?verbose=true

Response:

{
    ...
        "_0": {
            ...
            "ram_tree": [
                {
                    "description": "postings [PerFieldPostings(format=1)]",
                    "size_in_bytes": 2696,
                    "children": [
                        {
                            "description": "format 'Lucene50_0' ...",
                            "size_in_bytes": 2608,
                            "children" :[ ... ]
                        },
                        ...
                    ]
                },
                ...
                ]

        }
    ...
}

Indices Recovery

The indices recovery API provides insight into on-going index shard recoveries. Recovery status may be reported for specific indices, or cluster-wide.

For example, the following command would show recovery information for the indices "index1" and "index2".

GET index1,index2/_recovery?human

To see cluster-wide recovery status simply leave out the index names.

GET /_recovery?human

Response:

{
  "index1" : {
    "shards" : [ {
      "id" : 0,
      "type" : "SNAPSHOT",
      "stage" : "INDEX",
      "primary" : true,
      "start_time" : "2014-02-24T12:15:59.716",
      "start_time_in_millis": 1393244159716,
      "stop_time" : "0s",
      "stop_time_in_millis" : 0,
      "total_time" : "2.9m",
      "total_time_in_millis" : 175576,
      "source" : {
        "repository" : "my_repository",
        "snapshot" : "my_snapshot",
        "index" : "index1",
        "version" : "{version}",
        "restoreUUID": "PDh1ZAOaRbiGIVtCvZOMww"
      },
      "target" : {
        "id" : "ryqJ5lO5S4-lSFbGntkEkg",
        "host" : "my.fqdn",
        "transport_address" : "my.fqdn",
        "ip" : "10.0.1.7",
        "name" : "my_es_node"
      },
      "index" : {
        "size" : {
          "total" : "75.4mb",
          "total_in_bytes" : 79063092,
          "reused" : "0b",
          "reused_in_bytes" : 0,
          "recovered" : "65.7mb",
          "recovered_in_bytes" : 68891939,
          "percent" : "87.1%"
        },
        "files" : {
          "total" : 73,
          "reused" : 0,
          "recovered" : 69,
          "percent" : "94.5%"
        },
        "total_time" : "0s",
        "total_time_in_millis" : 0,
        "source_throttle_time" : "0s",
        "source_throttle_time_in_millis" : 0,
        "target_throttle_time" : "0s",
        "target_throttle_time_in_millis" : 0
      },
      "translog" : {
        "recovered" : 0,
        "total" : 0,
        "percent" : "100.0%",
        "total_on_start" : 0,
        "total_time" : "0s",
        "total_time_in_millis" : 0,
      },
      "verify_index" : {
        "check_index_time" : "0s",
        "check_index_time_in_millis" : 0,
        "total_time" : "0s",
        "total_time_in_millis" : 0
      }
    } ]
  }
}

The above response shows a single index recovering a single shard. In this case, the source of the recovery is a snapshot repository and the target of the recovery is the node with name "my_es_node".

Additionally, the output shows the number and percent of files recovered, as well as the number and percent of bytes recovered.

In some cases a higher level of detail may be preferable. Setting "detailed=true" will present a list of physical files in recovery.

GET _recovery?human&detailed=true

Response:

{
  "index1" : {
    "shards" : [ {
      "id" : 0,
      "type" : "STORE",
      "stage" : "DONE",
      "primary" : true,
      "start_time" : "2014-02-24T12:38:06.349",
      "start_time_in_millis" : "1393245486349",
      "stop_time" : "2014-02-24T12:38:08.464",
      "stop_time_in_millis" : "1393245488464",
      "total_time" : "2.1s",
      "total_time_in_millis" : 2115,
      "source" : {
        "id" : "RGMdRc-yQWWKIBM4DGvwqQ",
        "host" : "my.fqdn",
        "transport_address" : "my.fqdn",
        "ip" : "10.0.1.7",
        "name" : "my_es_node"
      },
      "target" : {
        "id" : "RGMdRc-yQWWKIBM4DGvwqQ",
        "host" : "my.fqdn",
        "transport_address" : "my.fqdn",
        "ip" : "10.0.1.7",
        "name" : "my_es_node"
      },
      "index" : {
        "size" : {
          "total" : "24.7mb",
          "total_in_bytes" : 26001617,
          "reused" : "24.7mb",
          "reused_in_bytes" : 26001617,
          "recovered" : "0b",
          "recovered_in_bytes" : 0,
          "percent" : "100.0%"
        },
        "files" : {
          "total" : 26,
          "reused" : 26,
          "recovered" : 0,
          "percent" : "100.0%",
          "details" : [ {
            "name" : "segments.gen",
            "length" : 20,
            "recovered" : 20
          }, {
            "name" : "_0.cfs",
            "length" : 135306,
            "recovered" : 135306
          }, {
            "name" : "segments_2",
            "length" : 251,
            "recovered" : 251
          }
          ]
        },
        "total_time" : "2ms",
        "total_time_in_millis" : 2,
        "source_throttle_time" : "0s",
        "source_throttle_time_in_millis" : 0,
        "target_throttle_time" : "0s",
        "target_throttle_time_in_millis" : 0
      },
      "translog" : {
        "recovered" : 71,
        "total" : 0,
        "percent" : "100.0%",
        "total_on_start" : 0,
        "total_time" : "2.0s",
        "total_time_in_millis" : 2025
      },
      "verify_index" : {
        "check_index_time" : 0,
        "check_index_time_in_millis" : 0,
        "total_time" : "88ms",
        "total_time_in_millis" : 88
      }
    } ]
  }
}

This response shows a detailed listing (truncated for brevity) of the actual files recovered and their sizes.

Also shown are the timings in milliseconds of the various stages of recovery: index retrieval, translog replay, and index start time.

Note that the above listing indicates that the recovery is in stage "done". All recoveries, whether on-going or complete, are kept in cluster state and may be reported on at any time. Setting "active_only=true" will cause only on-going recoveries to be reported.
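
For example:

GET _recovery?active_only=true&human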

Here is a complete list of options:

detailed

Display a detailed view. This is primarily useful for viewing the recovery of physical index files. Default: false.

active_only

Display only those recoveries that are currently on-going. Default: false.

Description of output fields:

id

Shard ID

type

Recovery type:

  • store

  • snapshot

  • replica

  • relocating

stage

Recovery stage:

  • init: Recovery has not started

  • index: Reading index meta-data and copying bytes from source to destination

  • start: Starting the engine; opening the index for use

  • translog: Replaying transaction log

  • finalize: Cleanup

  • done: Complete

primary

True if shard is primary, false otherwise

start_time

Timestamp of recovery start

stop_time

Timestamp of recovery finish

total_time_in_millis

Total time to recover shard in milliseconds

source

Recovery source:

  • repository description if recovery is from a snapshot

  • description of source node otherwise

target

Destination node

index

Statistics about physical index recovery

translog

Statistics about translog recovery

start

Statistics about time to open and start the index

Indices Shard Stores

Provides store information for shard copies of indices. Store information reports on which nodes shard copies exist, the shard copy allocation ID, a unique identifier for each shard copy, and any exceptions encountered while opening the shard index or from earlier engine failure.

By default, it only lists store information for shards that have at least one unallocated copy. When the cluster health status is yellow, this will list store information for shards that have at least one unassigned replica. When the cluster health status is red, it will list store information for shards that have unassigned primaries.

Endpoints include shard stores information for a specific index, several indices, or all:

# return information of only index test
GET /test/_shard_stores

# return information of only test1 and test2 indices
GET /test1,test2/_shard_stores

# return information of all indices
GET /_shard_stores

The scope of shards for which store information is listed can be changed through the status param. It defaults to 'yellow' and 'red': 'yellow' lists store information of shards with at least one unassigned replica, and 'red' of shards with an unassigned primary. Use 'green' to list store information for shards with all copies assigned.

GET /_shard_stores?status=green

Response:

The shard stores information is grouped by indices and shard ids.

{
   "indices": {
       "my-index": {
           "shards": {
              "0": { (1)
                "stores": [ (2)
                    {
                        "sPa3OgxLSYGvQ4oPs-Tajw": { (3)
                            "name": "node_t0",
                            "ephemeral_id" : "9NlXRFGCT1m8tkvYCMK-8A",
                            "transport_address": "local[1]",
                            "attributes": {}
                        },
                        "allocation_id": "2iNySv_OQVePRX-yaRH_lQ", (4)
                        "allocation" : "primary|replica|unused" (5)
                        "store_exception": ... (6)
                    }
                ]
              }
           }
       }
   }
}
  1. The key is the corresponding shard id for the store information

  2. A list of store information for all copies of the shard

  3. The node information that hosts a copy of the store; the key is the unique node id.

  4. The allocation id of the store copy

  5. The status of the store copy, whether it is used as a primary, replica or not used at all

  6. Any exception encountered while opening the shard index or from earlier engine failure

Clear Cache

The clear cache API allows you to clear either all caches or specific caches associated with one or more indices.

POST /twitter/_cache/clear

The API, by default, will clear all caches. Specific caches can be cleared explicitly by setting the query, fielddata, or request URL parameter to true.

POST /twitter/_cache/clear?query=true      (1)
POST /twitter/_cache/clear?request=true    (2)
POST /twitter/_cache/clear?fielddata=true   (3)
  1. Cleans only the query cache

  2. Cleans only the request cache

  3. Cleans only the fielddata cache
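
The parameters can also be combined to clear several caches in one request, for example:

POST /twitter/_cache/clear?query=true&request=true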

In addition to this, all caches relating to a specific field can also be cleared by specifying fields url parameter with a comma delimited list of the fields that should be cleared. Note that the provided names must refer to concrete fields — objects and field aliases are not supported.

POST /twitter/_cache/clear?fields=foo,bar   (1)
  1. Clear the cache for the foo and bar fields

Multi Index

The clear cache API can be applied to more than one index with a single call, or even on _all the indices.

POST /kimchy,elasticsearch/_cache/clear

POST /_cache/clear

Flush

The flush API allows you to flush one or more indices. The flush process of an index makes sure that any data that is currently only persisted in the transaction log is also permanently persisted in Lucene. This reduces recovery times, as that data does not need to be reindexed from the transaction log after the Lucene index is opened. By default, Elasticsearch uses heuristics in order to automatically trigger flushes as required. It is rare for users to need to call the API directly.

POST twitter/_flush

Request Parameters

The flush API accepts the following request parameters:

wait_if_ongoing

If set to true, the flush operation will block until it can be executed when another flush operation is already running. The default is false, in which case an exception will be thrown at the shard level if another flush operation is already running. See the example after this list.

force

Whether a flush should be forced even if it is not necessarily needed, i.e. if no changes will be committed to the index. This is useful if transaction log IDs should be incremented even if no uncommitted changes are present. (This setting can be considered internal.)
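
For example, to wait for any concurrent flush to finish rather than fail:

POST twitter/_flush?wait_if_ongoing=true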

Multi Index

The flush API can be applied to more than one index with a single call, or even on _all the indices.

POST kimchy,elasticsearch/_flush

POST _flush

Synced Flush

Elasticsearch tracks the indexing activity of each shard. Shards that have not received any indexing operations for 5 minutes are automatically marked as inactive. This presents an opportunity for Elasticsearch to reduce shard resources and also perform a special kind of flush, called synced flush. A synced flush performs a normal flush, then adds a generated unique marker (sync_id) to all shards.

Since the sync id marker was added when there were no ongoing indexing operations, it can be used as a quick way to check if the two shards' Lucene indices are identical. This quick sync id comparison (if present) is used during recovery or restarts to skip the first and most costly phase of the process. In that case, no segment files need to be copied and the transaction log replay phase of the recovery can start immediately. Note that since the sync id marker was applied together with a flush, it is very likely that the transaction log will be empty, speeding up recoveries even more.

This is particularly useful for use cases having lots of indices which are never or very rarely updated, such as time based data. This use case typically generates lots of indices whose recovery without the synced flush marker would take a long time.

To check whether a shard has a marker or not, look for the commit section of shard stats returned by the indices stats API:

GET twitter/_stats?filter_path=**.commit&level=shards (1)
  1. filter_path is used to reduce the verbosity of the response, but is entirely optional

which returns something similar to:

{
   "indices": {
      "twitter": {
         "shards": {
            "0": [
               {
                 "commit" : {
                   "id" : "3M3zkw2GHMo2Y4h4/KFKCg==",
                   "generation" : 3,
                   "user_data" : {
                     "translog_uuid" : "hnOG3xFcTDeoI_kvvvOdNA",
                     "history_uuid" : "XP7KDJGiS1a2fHYiFL5TXQ",
                     "local_checkpoint" : "-1",
                     "translog_generation" : "2",
                     "max_seq_no" : "-1",
                     "sync_id" : "AVvFY-071siAOuFGEO9P", (1)
                     "max_unsafe_auto_id_timestamp" : "-1"
                   },
                   "num_docs" : 0
                 }
               }
            ],
            "1": ...,
            "2": ...,
            "3": ...,
            "4": ...
         }
      }
   }
}
  1. the sync id marker

Synced Flush API

The Synced Flush API allows an administrator to initiate a synced flush manually. This can be particularly useful for a planned (rolling) cluster restart where you can stop indexing and don’t want to wait the default 5 minutes for idle indices to be sync-flushed automatically.

While handy, there are a couple of caveats for this API:

  1. Synced flush is a best effort operation. Any ongoing indexing operations will cause the synced flush to fail on that shard. This means that some shards may be sync-flushed while others aren't. See below for more.

  2. The sync_id marker is removed as soon as the shard is flushed again. That is because a flush replaces the low-level Lucene commit point where the marker is stored. Uncommitted operations in the transaction log do not remove the marker. In practice, one should consider any indexing operation on an index as removing the marker, since a flush can be triggered by Elasticsearch at any time.

Note
It is harmless to request a synced flush while there is ongoing indexing. Shards that are idle will succeed and shards that are not will fail. Any shards that succeeded will have faster recovery times.
POST twitter/_flush/synced

The response contains details about how many shards were successfully sync-flushed and information about any failure.

Here is what it looks like when all shards of an index with two shards and one replica successfully sync-flushed:

{
   "_shards": {
      "total": 2,
      "successful": 2,
      "failed": 0
   },
   "twitter": {
      "total": 2,
      "successful": 2,
      "failed": 0
   }
}

Here is what it looks like when one shard group failed due to pending operations:

{
   "_shards": {
      "total": 4,
      "successful": 2,
      "failed": 2
   },
   "twitter": {
      "total": 4,
      "successful": 2,
      "failed": 2,
      "failures": [
         {
            "shard": 1,
            "reason": "[2] ongoing operations on primary"
         }
      ]
   }
}
Note
The above error is shown when the synced flush fails due to concurrent indexing operations. The HTTP status code in that case will be 409 CONFLICT.

Sometimes the failures are specific to a shard copy. The copies that failed will not be eligible for fast recovery but those that succeeded still will be. This case is reported as follows:

{
   "_shards": {
      "total": 4,
      "successful": 1,
      "failed": 1
   },
   "twitter": {
      "total": 4,
      "successful": 3,
      "failed": 1,
      "failures": [
         {
            "shard": 1,
            "reason": "unexpected error",
            "routing": {
               "state": "STARTED",
               "primary": false,
               "node": "SZNr2J_ORxKTLUCydGX4zA",
               "relocating_node": null,
               "shard": 1,
               "index": "twitter"
            }
         }
      ]
   }
}
Note
When a shard copy fails to sync-flush, the HTTP status code returned will be 409 CONFLICT.

The synced flush API can be applied to more than one index with a single call, or even on _all the indices.

POST kimchy,elasticsearch/_flush/synced

POST _flush/synced

Refresh

The refresh API allows you to explicitly refresh one or more indices, making all operations performed since the last refresh available for search. The (near) real-time capabilities depend on the index engine used. For example, the internal one requires refresh to be called, but by default a refresh is scheduled periodically.

POST /twitter/_refresh

Multi Index

The refresh API can be applied to more than one index with a single call, or even on _all the indices.

POST /kimchy,elasticsearch/_refresh

POST /_refresh

Force Merge

The force merge API allows you to force the merging of one or more indices. The merge relates to the number of segments a Lucene index holds within each shard. The force merge operation reduces the number of segments by merging them.

This call will block until the merge is complete. If the HTTP connection is lost, the request will continue in the background, and any new requests will block until the previous force merge is complete.

Warning
Force merge should only be called against read-only indices. Running force merge against a read-write index can cause very large segments to be produced (>5GB per segment), and the merge policy will never consider them for merging again until they mostly consist of deleted docs. This can cause very large segments to remain in the shards.
POST /twitter/_forcemerge

Request Parameters

The force merge API accepts the following request parameters:

max_num_segments

The number of segments to merge to. To fully merge the index, set it to 1. Defaults to simply checking if a merge needs to execute, and if so, executes it.

only_expunge_deletes

Should the merge process only expunge segments with deletes in them. In Lucene, a document is not deleted from a segment, just marked as deleted. During a merge process of segments, a new segment is created that does not have those deletes. This flag allows you to merge only segments that contain deletes. Defaults to false. Note that this won't override the index.merge.policy.expunge_deletes_allowed threshold.

flush

Should a flush be performed after the forced merge. Defaults to true.

POST /kimchy/_forcemerge?only_expunge_deletes=false&max_num_segments=100&flush=true
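
For example, to fully merge the twitter index down to a single segment per shard:

POST /twitter/_forcemerge?max_num_segments=1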

Multi Index

The force merge API can be applied to more than one index with a single call, or even on _all the indices. Multi index operations are executed one shard at a time per node. Force merge makes the storage for the shard being merged temporarily increase, up to double its size if max_num_segments is set to 1, as all segments need to be rewritten into a new one.

POST /kimchy,elasticsearch/_forcemerge

POST /_forcemerge