"Fossies" - the Fresh Open Source Software Archive

Member "elasticsearch-6.8.23/docs/java-api/docs.asciidoc" (29 Dec 2021, 760 Bytes) of package /linux/www/elasticsearch-6.8.23-src.tar.gz:


As a special service "Fossies" has tried to format the requested source page into HTML format (assuming AsciiDoc format). Alternatively you can here view or download the uninterpreted source code file. A member file download can also be achieved by clicking within a package contents listing on the according byte size field.

Document APIs

This section describes the following CRUD APIs:

Single document APIs

  • Index API

  • Get API

  • Delete API

  • Update API

Multi-document APIs

  • Multi Get API

  • Bulk API

Note
All CRUD APIs are single-index APIs. The index parameter accepts a single index name, or an alias which points to a single index.

Index API

The index API allows one to index a typed JSON document into a specific index and make it searchable.

Generate JSON document

There are several different ways of generating a JSON document:

  • Manually (aka do it yourself) using native byte[] or as a String

  • Using a Map that will be automatically converted to its JSON equivalent

  • Using a third party library to serialize your beans such as Jackson

  • Using built-in helpers XContentFactory.jsonBuilder()

Internally, each type is converted to byte[] (so a String is converted to a byte[]). Therefore, if the object is already in this form, use it. The jsonBuilder is a highly optimized JSON generator that directly constructs a byte[].

Do It Yourself

Nothing really difficult here but note that you will have to encode dates according to the {ref}/mapping-date-format.html[Date Format].

String json = "{" +
        "\"user\":\"kimchy\"," +
        "\"postDate\":\"2013-01-30\"," +
        "\"message\":\"trying out Elasticsearch\"" +
    "}";
Using Map

A Map is a key/value pair collection. It represents a JSON structure:

Map<String, Object> json = new HashMap<String, Object>();
json.put("user","kimchy");
json.put("postDate",new Date());
json.put("message","trying out Elasticsearch");
Serialize your beans

You can use Jackson to serialize your beans to JSON. Please add Jackson Databind to your project. Then you can use ObjectMapper to serialize your beans:

import com.fasterxml.jackson.databind.*;

// instance a json mapper
ObjectMapper mapper = new ObjectMapper(); // create once, reuse

// generate json
byte[] json = mapper.writeValueAsBytes(yourbeaninstance);
Use Elasticsearch helpers

Elasticsearch provides built-in helpers to generate JSON content.

import static org.elasticsearch.common.xcontent.XContentFactory.*;

XContentBuilder builder = jsonBuilder()
    .startObject()
        .field("user", "kimchy")
        .field("postDate", new Date())
        .field("message", "trying out Elasticsearch")
    .endObject();

Note that you can also add arrays with the startArray(String) and endArray() methods. By the way, the field method
accepts many object types. You can directly pass numbers, dates and even other XContentBuilder objects.

If you need to see the generated JSON content, you can use the Strings.toString() method.

import org.elasticsearch.common.Strings;

String json = Strings.toString(builder);

Index document

The following example indexes a JSON document into an index called twitter, under a type called _doc, with id 1:

import static org.elasticsearch.common.xcontent.XContentFactory.*;

IndexResponse response = client.prepareIndex("twitter", "_doc", "1")
        .setSource(jsonBuilder()
                    .startObject()
                        .field("user", "kimchy")
                        .field("postDate", new Date())
                        .field("message", "trying out Elasticsearch")
                    .endObject()
                  )
        .get();

Note that you can also index your documents as a JSON String and that you don’t have to give an ID:

String json = "{" +
        "\"user\":\"kimchy\"," +
        "\"postDate\":\"2013-01-30\"," +
        "\"message\":\"trying out Elasticsearch\"" +
    "}";

IndexResponse response = client.prepareIndex("twitter", "_doc")
        .setSource(json, XContentType.JSON)
        .get();

The IndexResponse object will give you a report:

// Index name
String _index = response.getIndex();
// Type name
String _type = response.getType();
// Document ID (generated or not)
String _id = response.getId();
// Version (if it's the first time you index this document, you will get: 1)
long _version = response.getVersion();
// status of the operation (e.g. CREATED when the document was created, OK when it was updated)
RestStatus status = response.status();

For more information on the index operation, check out the REST {ref}/docs-index_.html[index] docs.

Get API

The get API allows you to get a typed JSON document from the index based on its id. The following example gets a JSON document from an index called twitter, under a type called _doc, with id 1:

GetResponse response = client.prepareGet("twitter", "_doc", "1").get();

For more information on the get operation, check out the REST {ref}/docs-get.html[get] docs.

Delete API

The delete API allows one to delete a typed JSON document from a specific index based on its id. The following example deletes the JSON document from an index called twitter, under a type called _doc, with id 1:

DeleteResponse response = client.prepareDelete("twitter", "_doc", "1").get();

For more information on the delete operation, check out the {ref}/docs-delete.html[delete API] docs.

Delete By Query API

The delete by query API allows one to delete a given set of documents based on the result of a query:

include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[delete-by-query-sync]
  1. query

  2. index

  3. execute the operation

  4. number of deleted documents
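
A minimal sketch of a synchronous delete by query with the 6.x transport client (classes under org.elasticsearch.index.reindex); the persons index and the gender query are illustrative:

BulkByScrollResponse response =
    DeleteByQueryAction.INSTANCE.newRequestBuilder(client)
        .filter(QueryBuilders.matchQuery("gender", "male"))  // query
        .source("persons")                                    // index
        .get();                                               // execute the operation

long deleted = response.getDeleted();                         // number of deleted documents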

As it can be a long running operation, you can run it asynchronously by calling execute instead of get and providing a listener:

include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[delete-by-query-async]
  1. query

  2. index

  3. listener

  4. number of deleted documents
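
A corresponding asynchronous sketch, again with illustrative index and query values, passing an ActionListener to execute:

DeleteByQueryAction.INSTANCE.newRequestBuilder(client)
    .filter(QueryBuilders.matchQuery("gender", "male"))      // query
    .source("persons")                                        // index
    .execute(new ActionListener<BulkByScrollResponse>() {    // listener
        @Override
        public void onResponse(BulkByScrollResponse response) {
            long deleted = response.getDeleted();             // number of deleted documents
        }

        @Override
        public void onFailure(Exception e) {
            // handle the failure
        }
    });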

Update API

You can either create an UpdateRequest and send it to the client:

UpdateRequest updateRequest = new UpdateRequest();
updateRequest.index("index");
updateRequest.type("_doc");
updateRequest.id("1");
updateRequest.doc(jsonBuilder()
        .startObject()
            .field("gender", "male")
        .endObject());
client.update(updateRequest).get();

Or you can use prepareUpdate() method:

client.prepareUpdate("ttl", "doc", "1")
        .setScript(new Script(
            "ctx._source.gender = \"male\"", (1)
            ScriptService.ScriptType.INLINE, null, null))
        .get();

client.prepareUpdate("ttl", "doc", "1")
        .setDoc(jsonBuilder()               (2)
            .startObject()
                .field("gender", "male")
            .endObject())
        .get();
  1. Your script. It could also be a locally stored script name. In that case, you’ll need to use ScriptService.ScriptType.FILE

  2. Document which will be merged to the existing one.

Note that you can’t provide both script and doc.

Update by script

The update API allows you to update a document based on a provided script:

UpdateRequest updateRequest = new UpdateRequest("ttl", "doc", "1")
        .script(new Script("ctx._source.gender = \"male\""));
client.update(updateRequest).get();

Update by merging documents

The update API also supports passing a partial document, which will be merged into the existing document (simple recursive merge, inner merging of objects, replacing core "keys/values" and arrays). For example:

UpdateRequest updateRequest = new UpdateRequest("index", "type", "1")
        .doc(jsonBuilder()
            .startObject()
                .field("gender", "male")
            .endObject());
client.update(updateRequest).get();

Upsert

There is also support for upsert. If the document does not exist, the content of the upsert element will be used to index the fresh doc:

IndexRequest indexRequest = new IndexRequest("index", "type", "1")
        .source(jsonBuilder()
            .startObject()
                .field("name", "Joe Smith")
                .field("gender", "male")
            .endObject());
UpdateRequest updateRequest = new UpdateRequest("index", "type", "1")
        .doc(jsonBuilder()
            .startObject()
                .field("gender", "male")
            .endObject())
        .upsert(indexRequest);              (1)
client.update(updateRequest).get();
  1. If the document does not exist, the one in indexRequest will be added

If the document index/type/1 already exists, after this operation we will have a document like:

{
    "name"  : "Joe Dalton",
    "gender": "male"        (1)
}
  1. This field is added by the update request

If it does not exist, we will have a new document:

{
    "name" : "Joe Smith",
    "gender": "male"
}

Multi Get API

The multi get API allows you to get a list of documents based on their index and id:

MultiGetResponse multiGetItemResponses = client.prepareMultiGet()
    .add("twitter", "_doc", "1")            (1)
    .add("twitter", "_doc", "2", "3", "4")  (2)
    .add("another", "_doc", "foo")          (3)
    .get();

for (MultiGetItemResponse itemResponse : multiGetItemResponses) { (4)
    GetResponse response = itemResponse.getResponse();
    if (response.isExists()) {                      (5)
        String json = response.getSourceAsString(); (6)
    }
}
  1. get by a single id

  2. or by a list of ids for the same index

  3. you can also get from another index

  4. iterate over the result set

  5. you can check if the document exists

  6. access to the _source field

For more information on the multi get operation, check out the REST {ref}/docs-multi-get.html[multi get] docs.

Bulk API

The bulk API allows one to index and delete several documents in a single request. Here is a sample usage:

import static org.elasticsearch.common.xcontent.XContentFactory.*;

BulkRequestBuilder bulkRequest = client.prepareBulk();

// either use client#prepare, or use Requests# to directly build index/delete requests
bulkRequest.add(client.prepareIndex("twitter", "_doc", "1")
        .setSource(jsonBuilder()
                    .startObject()
                        .field("user", "kimchy")
                        .field("postDate", new Date())
                        .field("message", "trying out Elasticsearch")
                    .endObject()
                  )
        );

bulkRequest.add(client.prepareIndex("twitter", "_doc", "2")
        .setSource(jsonBuilder()
                    .startObject()
                        .field("user", "kimchy")
                        .field("postDate", new Date())
                        .field("message", "another post")
                    .endObject()
                  )
        );

BulkResponse bulkResponse = bulkRequest.get();
if (bulkResponse.hasFailures()) {
    // process failures by iterating through each bulk response item
}

Using Bulk Processor

The BulkProcessor class offers a simple interface to flush bulk operations automatically based on the number or size of requests, or after a given period.

To use it, first create a BulkProcessor instance:

import org.elasticsearch.action.bulk.BackoffPolicy;
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;

BulkProcessor bulkProcessor = BulkProcessor.builder(
        client,  (1)
        new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId,
                                   BulkRequest request) { ... } (2)

            @Override
            public void afterBulk(long executionId,
                                  BulkRequest request,
                                  BulkResponse response) { ... } (3)

            @Override
            public void afterBulk(long executionId,
                                  BulkRequest request,
                                  Throwable failure) { ... } (4)
        })
        .setBulkActions(10000) (5)
        .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB)) (6)
        .setFlushInterval(TimeValue.timeValueSeconds(5)) (7)
        .setConcurrentRequests(1) (8)
        .setBackoffPolicy(
            BackoffPolicy.exponentialBackoff(TimeValue.timeValueMillis(100), 3)) (9)
        .build();
  1. Add your Elasticsearch client

  2. This method is called just before the bulk is executed. You can for example see the numberOfActions with request.numberOfActions()

  3. This method is called after bulk execution. You can for example check if there were some failing requests with response.hasFailures()

  4. This method is called when the bulk failed and raised a Throwable

  5. We want to execute the bulk every 10 000 requests

  6. We want to flush the bulk every 5mb

  7. We want to flush the bulk every 5 seconds regardless of the number of requests

  8. Set the number of concurrent requests. A value of 0 means that only a single request will be allowed to be executed. A value of 1 means 1 concurrent request is allowed to be executed while accumulating new bulk requests.

  9. Set a custom backoff policy which will initially wait for 100ms, increase exponentially and retry up to three times. A retry is attempted whenever one or more bulk item requests fail with an EsRejectedExecutionException, which indicates that there were too few compute resources available for processing the request. To disable backoff, pass BackoffPolicy.noBackoff().

By default, BulkProcessor:

  • sets bulkActions to 1000

  • sets bulkSize to 5mb

  • does not set flushInterval

  • sets concurrentRequests to 1, which means the flush operation is executed asynchronously.

  • sets backoffPolicy to an exponential backoff with 8 retries and a start delay of 50ms. The total wait time is roughly 5.1 seconds.

Add requests

Then you can simply add your requests to the BulkProcessor:

bulkProcessor.add(new IndexRequest("twitter", "_doc", "1").source(/* your doc here */));
bulkProcessor.add(new DeleteRequest("twitter", "_doc", "2"));

Closing the Bulk Processor

When all documents have been loaded into the BulkProcessor it can be closed using the awaitClose or close methods:

bulkProcessor.awaitClose(10, TimeUnit.MINUTES);

or

bulkProcessor.close();

Both methods flush any remaining documents and disable any other scheduled flushes (if they were scheduled by setting flushInterval). If concurrent requests were enabled, the awaitClose method waits for up to the specified timeout for all bulk requests to complete and then returns true; if the specified waiting time elapses before all bulk requests complete, it returns false. The close method doesn’t wait for any remaining bulk requests to complete and exits immediately.
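
For example, assuming a bulkProcessor instance built as above (awaitClose throws InterruptedException, so the caller has to handle it):

boolean terminated = bulkProcessor.awaitClose(10, TimeUnit.MINUTES);
if (!terminated) {
    // the timeout elapsed before all in-flight bulk requests completed
}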

Using Bulk Processor in tests

If you are running tests with Elasticsearch and are using the BulkProcessor to populate your dataset, you should set the number of concurrent requests to 0 so that the flush operation of the bulk is executed synchronously:

BulkProcessor bulkProcessor = BulkProcessor.builder(client, new BulkProcessor.Listener() { /* Listener methods */ })
        .setBulkActions(10000)
        .setConcurrentRequests(0)
        .build();

// Add your requests
bulkProcessor.add(/* Your requests */);

// Flush any remaining requests
bulkProcessor.flush();

// Or close the bulkProcessor if you don't need it anymore
bulkProcessor.close();

// Refresh your indices
client.admin().indices().prepareRefresh().get();

// Now you can start searching!
client.prepareSearch().get();

Global Parameters

Global parameters can be specified on the BulkRequest as well as BulkProcessor, similar to the REST API. These global parameters serve as defaults and can be overridden by local parameters specified on each sub request. Some parameters have to be set before any sub request is added - index, type - and you have to specify them during BulkRequest or BulkProcessor creation. Some are optional - pipeline, routing - and can be specified at any point before the bulk is sent.

include-tagged::{hlrc-tests}/BulkProcessorIT.java[bulk-processor-mix-parameters]
  1. global parameters from the BulkRequest will be applied on a sub request

  2. local pipeline parameter on a sub request will override global parameters from BulkRequest

include-tagged::{hlrc-tests}/BulkRequestWithGlobalParametersIT.java[bulk-request-mix-pipeline]
  1. local pipeline parameter on a sub request will override global pipeline from the BulkRequest

  2. global parameter from the BulkRequest will be applied on a sub request

Update By Query API

The simplest usage of updateByQuery updates each document in an index without changing the source. This usage enables picking up a new property or another online mapping change.

include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query]
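
A minimal sketch of this usage with the 6.x transport client (classes under org.elasticsearch.index.reindex); the source_index name is illustrative:

UpdateByQueryRequestBuilder updateByQuery =
    UpdateByQueryAction.INSTANCE.newRequestBuilder(client);
updateByQuery.source("source_index")
    .abortOnVersionConflict(false);          // do not abort on version conflicts
BulkByScrollResponse response = updateByQuery.get();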

Calls to the updateByQuery API start by getting a snapshot of the index, indexing any documents found using the internal versioning.

Note
Version conflicts happen when a document changes between the time of the snapshot and the time the index request processes.

When the versions match, updateByQuery updates the document and increments the version number.

All update and query failures cause updateByQuery to abort. These failures are available from the BulkByScrollResponse#getIndexingFailures method. Any successful updates remain and are not rolled back. While the first failure causes the abort, the response contains all of the failures generated by the failed bulk request.

To prevent version conflicts from causing updateByQuery to abort, set abortOnVersionConflict(false). The first example does this because it is trying to pick up an online mapping change and a version conflict means that the conflicting document was updated between the start of the updateByQuery and the time when it attempted to update the document. This is fine because that update will have picked up the online mapping update.

The UpdateByQueryRequestBuilder API supports filtering the updated documents, limiting the total number of documents to update, and updating documents with a script:

include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-filter]
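
A sketch of such a request, assuming a Painless script; the index, field, and values are illustrative:

UpdateByQueryRequestBuilder updateByQuery =
    UpdateByQueryAction.INSTANCE.newRequestBuilder(client);
updateByQuery.source("source_index")
    .filter(QueryBuilders.termQuery("level", "awesome"))   // only update matching documents
    .size(1000)                                            // limit the total number of documents updated
    .script(new Script(ScriptType.INLINE, "painless",
        "ctx._source.awesome = 'absolutely'", Collections.emptyMap()));
BulkByScrollResponse response = updateByQuery.get();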

UpdateByQueryRequestBuilder also enables direct access to the query used to select the documents. You can use this access to change the default scroll size or otherwise modify the request for matching documents.

include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-size]
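
For example, changing the scroll batch size through the underlying search request (a sketch; the batch size is illustrative):

UpdateByQueryRequestBuilder updateByQuery =
    UpdateByQueryAction.INSTANCE.newRequestBuilder(client);
updateByQuery.source("source_index")
    .source().setSize(500);                                // scroll batch size per request
BulkByScrollResponse response = updateByQuery.get();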

You can also combine size with sorting to limit the documents updated:

include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-sort]
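
A sketch combining size with a sort on the underlying search request (the field name is illustrative):

UpdateByQueryRequestBuilder updateByQuery =
    UpdateByQueryAction.INSTANCE.newRequestBuilder(client);
updateByQuery.source("source_index")
    .size(100)                                             // update at most 100 documents
    .source().addSort("cat", SortOrder.DESC);              // taken in descending order of the cat field
BulkByScrollResponse response = updateByQuery.get();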

In addition to changing the _source field for the document, you can use a script to change the action, similar to the Update API:

include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-script]

As in the Update API, you can set the value of ctx.op to change the operation that executes:

noop

Set ctx.op = "noop" if your script doesn’t make any changes. The updateByQuery operation then omits that document from the updates. This behavior increments the noop counter in the response body.

delete

Set ctx.op = "delete" if your script decides that the document must be deleted. The deletion will be reported in the deleted counter in the response body.

Setting ctx.op to any other value generates an error. Setting any other field in ctx generates an error.
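
A sketch of a Painless script that uses ctx.op; the index, field, and values are illustrative:

UpdateByQueryRequestBuilder updateByQuery =
    UpdateByQueryAction.INSTANCE.newRequestBuilder(client);
updateByQuery.source("source_index")
    .script(new Script(ScriptType.INLINE, "painless",
        "if (ctx._source.awesome == 'absolutely') {"
      + "    ctx.op = 'noop';"                             // already up to date, skip the document
      + "} else if (ctx._source.awesome == 'lame') {"
      + "    ctx.op = 'delete';"                           // delete the document instead
      + "} else {"
      + "    ctx._source.awesome = 'absolutely';"          // otherwise update the source
      + "}",
        Collections.emptyMap()));
BulkByScrollResponse response = updateByQuery.get();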

This API doesn’t allow you to move the documents it touches, just modify their source. This is intentional! We’ve made no provisions for removing the document from its original location.

You can also perform these operations on multiple indices and types at once, similar to the search API:

include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-multi-index]

If you provide a routing value then the process copies the routing value to the scroll query, limiting the process to the shards that match that routing value:

include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-routing]
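
A sketch of setting a routing value on the underlying search request (the routing value is illustrative):

UpdateByQueryRequestBuilder updateByQuery =
    UpdateByQueryAction.INSTANCE.newRequestBuilder(client);
updateByQuery.source("source_index")
    .source().setRouting("cat");                           // only shards matching this routing value
BulkByScrollResponse response = updateByQuery.get();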

updateByQuery can also use the ingest node by specifying a pipeline like this:

include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-pipeline]
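
A sketch of specifying an ingest pipeline; the setPipeline setter on the request builder and the pipeline name are assumptions here:

UpdateByQueryRequestBuilder updateByQuery =
    UpdateByQueryAction.INSTANCE.newRequestBuilder(client);
updateByQuery.setPipeline("hurray")                        // assumed: name of an existing ingest pipeline
    .source("source_index");
BulkByScrollResponse response = updateByQuery.get();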

Works with the Task API

You can fetch the status of all running update-by-query requests with the Task API:

include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-list-tasks]
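
A sketch of listing the running update-by-query tasks through the cluster admin client:

ListTasksResponse tasksList = client.admin().cluster().prepareListTasks()
    .setActions(UpdateByQueryAction.NAME)                  // only update-by-query tasks
    .setDetailed(true)
    .get();

for (TaskInfo info : tasksList.getTasks()) {
    TaskId taskId = info.getTaskId();
    BulkByScrollTask.Status status = (BulkByScrollTask.Status) info.getStatus();
    // inspect the task status, e.g. how many documents have been updated so far
}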

With the TaskId shown above you can look up the task directly:

include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-get-task]
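
A sketch of looking up a single task by its TaskId:

GetTaskResponse get = client.admin().cluster()
    .prepareGetTask(taskId)
    .get();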

Works with the Cancel Task API

Any Update By Query can be canceled using the Task Cancel API:

include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-cancel-task]
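
A sketch of cancelling tasks, either every running update-by-query request or one specific task by id:

// cancel all running update-by-query requests
client.admin().cluster().prepareCancelTasks()
    .setActions(UpdateByQueryAction.NAME)
    .get()
    .getTasks();

// cancel one specific update-by-query request by task id
client.admin().cluster().prepareCancelTasks()
    .setTaskId(taskId)
    .get()
    .getTasks();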

Use the list tasks API to find the value of taskId.

Cancelling a request is typically a very fast process but can take up to a few seconds. The task status API continues to list the task until the cancellation is complete.

Rethrottling

Use the _rethrottle API to change the value of requests_per_second on a running update:

include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[update-by-query-rethrottle]
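
A sketch of rethrottling a running task with RethrottleAction; the requests-per-second value is illustrative:

RethrottleAction.INSTANCE.newRequestBuilder(client)
    .setTaskId(taskId)                                     // the task to rethrottle
    .setRequestsPerSecond(2.0f)                            // new throttle value
    .get();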

Use the list tasks API to find the value of taskId.

As with the updateByQuery API, the value of requests_per_second can be any positive float value to set the level of the throttle, or Float.POSITIVE_INFINITY to disable throttling. A value of requests_per_second that speeds up the process takes effect immediately. requests_per_second values that slow the query take effect after completing the current batch in order to prevent scroll timeouts.

Reindex API

See {ref}/docs-reindex.html[reindex API].

include-tagged::{client-reindex-tests}/ReindexDocumentationIT.java[reindex1]
  1. Optionally a query can be provided to filter which documents should be re-indexed from the source to the target index.
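
A sketch of a reindex call with the 6.x transport client; the source and target index names and the optional filter query are illustrative:

BulkByScrollResponse response =
    ReindexAction.INSTANCE.newRequestBuilder(client)
        .source("source_index")
        .destination("target_index")
        .filter(QueryBuilders.matchQuery("category", "xzy"))  // optional filter query
        .get();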