"Fossies" - the Fresh Open Source Software Archive
Snapshot/Restore Repository Plugins

Repository plugins extend the {ref}/modules-snapshots.html[Snapshot/Restore] functionality in Elasticsearch by adding repositories backed by the cloud or by distributed file systems:

Core repository plugins

The core repository plugins are:

S3 Repository

The S3 repository plugin adds support for using S3 as a repository.

Azure Repository

The Azure repository plugin adds support for using Azure as a repository.

HDFS Repository

The Hadoop HDFS Repository plugin adds support for using HDFS as a repository.

Google Cloud Storage Repository

The GCS repository plugin adds support for using Google Cloud Storage service as a repository.

Community contributed repository plugins

The following plugin has been contributed by our community:

Azure Repository Plugin

The Azure Repository plugin adds support for using Azure as a repository for {ref}/modules-snapshots.html[Snapshot/Restore].

Installation

This plugin can be installed using the plugin manager:

sudo bin/elasticsearch-plugin install repository-azure

The plugin must be installed on every node in the cluster, and each node must be restarted after installation.

This plugin can be downloaded for offline install from {plugin_url}/repository-azure/repository-azure-{version}.zip.
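
For an offline install, the downloaded zip can also be passed to the plugin manager as a file URL; the local path below is a placeholder:

sudo bin/elasticsearch-plugin install file:///path/to/repository-azure-{version}.zip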

Removal

The plugin can be removed with the following command:

sudo bin/elasticsearch-plugin remove repository-azure

The node must be stopped before removing the plugin.

Azure Repository

To enable Azure repositories, you first have to define your Azure storage settings as {ref}/secure-settings.html[secure settings] before starting up the node:

bin/elasticsearch-keystore add azure.client.default.account
bin/elasticsearch-keystore add azure.client.default.key

Where account is the Azure account name and key the Azure secret key. Instead of an account key, you can alternatively define a shared access signature (SAS) token under sas_token to use for authentication. When using an SAS token instead of an account key, the SAS token must have read (r), write (w), list (l), and delete (d) permissions for the repository base path and all its contents. These permissions need to be granted for the blob service (b) and apply to resource types service (s), container (c), and object (o). These settings are used by the repository's internal Azure client.
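
For example, to authenticate the default client with a SAS token instead of an account key, the secure settings would be added following the same keystore pattern:

bin/elasticsearch-keystore add azure.client.default.account
bin/elasticsearch-keystore add azure.client.default.sas_token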

Note that you can also define more than one account:

bin/elasticsearch-keystore add azure.client.default.account
bin/elasticsearch-keystore add azure.client.default.key
bin/elasticsearch-keystore add azure.client.secondary.account
bin/elasticsearch-keystore add azure.client.secondary.sas_token

default is the default account name which will be used by a repository, unless you set an explicit one in the repository settings.

The account, key, and sas_token storage settings are {ref}/secure-settings.html#reloadable-secure-settings[reloadable]. After you reload the settings, the internal azure clients, which are used to transfer the snapshot, will utilize the latest settings from the keystore.

Note
In-progress snapshot/restore jobs will not be preempted by a reload of the storage secure settings. They will complete using the client as it was built when the operation started.
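
After updating the keystore on each node, the new values can be picked up with the reload secure settings API, for example:

POST _nodes/reload_secure_settings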

You can set the client-side timeout to use when making any single request. It can be defined globally, per account, or both. It is not set by default, which means that Elasticsearch uses the default value set by the Azure client (5 minutes).

max_retries controls the exponential backoff policy. It sets the number of retries after a failure before the snapshot is considered failed. Defaults to 3 retries. The initial backoff period is defined by the Azure SDK as 30s, meaning 30s of wait time before retrying after a first timeout or failure. The maximum backoff period is defined by the Azure SDK as 90s.

endpoint_suffix can be used to specify the Azure endpoint suffix explicitly. Defaults to core.windows.net.

cloud.azure.storage.timeout: 10s
azure.client.default.max_retries: 7
azure.client.default.endpoint_suffix: core.chinacloudapi.cn
azure.client.secondary.timeout: 30s

In this example, the default client uses a timeout of 10s per try, 7 retries before failing, and the endpoint suffix core.chinacloudapi.cn, while the secondary client uses a timeout of 30s per try with the default 3 retries.

Important
Supported Azure Storage Account types

The Azure Repository plugin works with all Standard storage accounts:

  • Standard Locally Redundant Storage - Standard_LRS

  • Standard Zone-Redundant Storage - Standard_ZRS

  • Standard Geo-Redundant Storage - Standard_GRS

  • Standard Read Access Geo-Redundant Storage - Standard_RAGRS

Premium Locally Redundant Storage (Premium_LRS) is not supported as it is only usable as VM disk storage, not as general storage.

You can register a proxy per client using the following settings:

azure.client.default.proxy.host: proxy.host
azure.client.default.proxy.port: 8888
azure.client.default.proxy.type: http

Supported values for proxy.type are direct (default), http or socks. When proxy.type is set to http or socks, proxy.host and proxy.port must be provided.

Repository settings

The Azure repository supports the following settings:

client

Azure named client to use. Defaults to default.

container

Container name. You must create the azure container before creating the repository. Defaults to elasticsearch-snapshots.

base_path

Specifies the path within the container to the repository data. Defaults to empty (root directory).

chunk_size

Big files can be broken down into chunks during snapshotting if needed. Specify the chunk size as a value and unit, for example: 10MB, 5KB, 500B. Defaults to 64MB (64MB max).

compress

When set to true metadata files are stored in compressed format. This setting doesn’t affect index files that are already compressed by default. Defaults to false.

max_restore_bytes_per_sec

Throttles per node restore rate. Defaults to 40mb per second.

max_snapshot_bytes_per_sec

Throttles per node snapshot rate. Defaults to 40mb per second.

readonly

Makes repository read-only. Defaults to false.

location_mode

primary_only or secondary_only. Defaults to primary_only. Note that if you set it to secondary_only, it will force readonly to true.

Some examples, using scripts:

# The simplest one
PUT _snapshot/my_backup1
{
    "type": "azure"
}

# With some settings
PUT _snapshot/my_backup2
{
    "type": "azure",
    "settings": {
        "container": "backup-container",
        "base_path": "backups",
        "chunk_size": "32m",
        "compress": true
    }
}


# With two clients defined in the keystore (default and secondary)
PUT _snapshot/my_backup3
{
    "type": "azure",
    "settings": {
        "client": "secondary"
    }
}
PUT _snapshot/my_backup4
{
    "type": "azure",
    "settings": {
        "client": "secondary",
        "location_mode": "primary_only"
    }
}
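
As a further illustration (hypothetical repository name), a repository that reads from the secondary location, and is therefore forced to read-only as described above, could be registered like this:

PUT _snapshot/my_backup5
{
    "type": "azure",
    "settings": {
        "client": "secondary",
        "location_mode": "secondary_only"
    }
}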

Example using Java:

// Register an Azure snapshot repository named "my_backup_java1" via the transport client
client.admin().cluster().preparePutRepository("my_backup_java1")
    .setType("azure").setSettings(Settings.builder()
        .put(Storage.CONTAINER, "backup-container")
        .put(Storage.CHUNK_SIZE, new ByteSizeValue(32, ByteSizeUnit.MB))
    ).get();

Repository validation rules

According to the container naming guide, a container name must be a valid DNS name, conforming to the following naming rules:

  • Container names must start with a letter or number, and can contain only letters, numbers, and the dash (-) character.

  • Every dash (-) character must be immediately preceded and followed by a letter or number; consecutive dashes are not permitted in container names.

  • All letters in a container name must be lowercase.

  • Container names must be from 3 through 63 characters long.
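
For example, a hypothetical repository using a container name that conforms to these rules:

PUT _snapshot/my_backup
{
    "type": "azure",
    "settings": {
        "container": "elasticsearch-snapshots-001"
    }
}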

S3 Repository Plugin

The S3 repository plugin adds support for using AWS S3 as a repository for {ref}/modules-snapshots.html[Snapshot/Restore].

If you are looking for a hosted solution of Elasticsearch on AWS, please visit http://www.elastic.co/cloud.

Installation

This plugin can be installed using the plugin manager:

sudo bin/elasticsearch-plugin install repository-s3

The plugin must be installed on every node in the cluster, and each node must be restarted after installation.

This plugin can be downloaded for offline install from {plugin_url}/repository-s3/repository-s3-{version}.zip.

Removal

The plugin can be removed with the following command:

sudo bin/elasticsearch-plugin remove repository-s3

The node must be stopped before removing the plugin.

Getting Started

The plugin provides a repository type named s3 which may be used when creating a repository. The repository defaults to using ECS IAM Role or EC2 IAM Role credentials for authentication. The only mandatory setting is the bucket name:

PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket"
  }
}

Client Settings

The client that you use to connect to S3 has a number of settings available. The settings have the form s3.client.CLIENT_NAME.SETTING_NAME. By default, s3 repositories use a client named default, but this can be modified using the repository setting client. For example:

PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket",
    "client": "my_alternate_client"
  }
}

Most client settings can be added to the elasticsearch.yml configuration file with the exception of the secure settings, which you add to the {es} keystore. For more information about creating and updating the {es} keystore, see {ref}/secure-settings.html[Secure settings].

For example, before you start the node, run these commands to add AWS access key settings to the keystore:

bin/elasticsearch-keystore add s3.client.default.access_key
bin/elasticsearch-keystore add s3.client.default.secret_key
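
If you use temporary credentials, the session token described below is added to the keystore in the same way:

bin/elasticsearch-keystore add s3.client.default.session_token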

All client secure settings of this plugin are {ref}/secure-settings.html#reloadable-secure-settings[reloadable]. After you reload the settings, the internal s3 clients, used to transfer the snapshot contents, will utilize the latest settings from the keystore. Any existing s3 repositories, as well as any newly created ones, will pick up the new values stored in the keystore.

Note
In-progress snapshot/restore tasks will not be preempted by a reload of the client’s secure settings. The task will complete using the client as it was built when the operation started.

The following list contains the available client settings. Those that must be stored in the keystore are marked as "secure" and are reloadable; the other settings belong in the elasticsearch.yml file.

access_key ({ref}/secure-settings.html[Secure])

An S3 access key. The secret_key setting must also be specified.

secret_key ({ref}/secure-settings.html[Secure])

An S3 secret key. The access_key setting must also be specified.

session_token ({ref}/secure-settings.html[Secure])

An S3 session token. The access_key and secret_key settings must also be specified.

endpoint

The S3 service endpoint to connect to. This defaults to s3.amazonaws.com but the AWS documentation lists alternative S3 endpoints. If you are using an S3-compatible service then you should set this to the service’s endpoint.

protocol

The protocol to use to connect to S3. Valid values are either http or https. Defaults to https.

proxy.host

The host name of a proxy to connect to S3 through.

proxy.port

The port of a proxy to connect to S3 through.

proxy.username ({ref}/secure-settings.html[Secure])

The username to connect to the proxy.host with.

proxy.password ({ref}/secure-settings.html[Secure])

The password to connect to the proxy.host with.

read_timeout

The socket timeout for connecting to S3. The value should specify the unit. For example, a value of 5s specifies a 5 second timeout. The default value is 50 seconds.

max_retries

The number of retries to use when an S3 request fails. The default value is 3.

use_throttle_retries

Whether retries should be throttled (i.e. should back off). Must be true or false. Defaults to true.
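
For example, the non-secure client settings above could be set in elasticsearch.yml like this (the values shown are illustrative):

s3.client.default.endpoint: s3.eu-central-1.amazonaws.com
s3.client.default.protocol: https
s3.client.default.read_timeout: 50s
s3.client.default.max_retries: 3
s3.client.default.use_throttle_retries: true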

S3-compatible services

There are a number of storage systems that provide an S3-compatible API, and the repository-s3 plugin allows you to use these systems in place of AWS S3. To do so, you should set the s3.client.CLIENT_NAME.endpoint setting to the system’s endpoint. This setting accepts IP addresses and hostnames and may include a port. For example, the endpoint may be 172.17.0.2 or 172.17.0.2:9000. You may also need to set s3.client.CLIENT_NAME.protocol to http if the endpoint does not support HTTPS.

Minio is an example of a storage system that provides an S3-compatible API. The repository-s3 plugin allows {es} to work with Minio-backed repositories as well as repositories stored on AWS S3. Other S3-compatible storage systems may also work with {es}, but these are not tested or supported.
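
As a sketch, a client for a hypothetical Minio instance could be configured in elasticsearch.yml (host, port, and client name are placeholders):

s3.client.minio.endpoint: 172.17.0.2:9000
s3.client.minio.protocol: http

and then referenced from a repository, with its access_key and secret_key stored in the keystore under s3.client.minio.*:

PUT _snapshot/my_minio_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket",
    "client": "minio"
  }
}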

Repository Settings

The s3 repository type supports a number of settings to customize how data is stored in S3. These can be specified when creating the repository. For example:

PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket_name",
    "another_setting": "setting_value"
  }
}

The following settings are supported:

bucket

The name of the bucket to be used for snapshots. (Mandatory)

client

The name of the S3 client to use to connect to S3. Defaults to default.

base_path

Specifies the path to the repository data within its bucket. Defaults to an empty string, meaning that the repository is at the root of the bucket. The value of this setting should not start or end with a /.

chunk_size

Big files can be broken down into chunks during snapshotting if needed. Specify the chunk size as a value and unit, for example: 1GB, 10MB, 5KB, 500B. Defaults to 1GB.

compress

When set to true metadata files are stored in compressed format. This setting doesn’t affect index files that are already compressed by default. Defaults to false.

max_restore_bytes_per_sec

Throttles per node restore rate. Defaults to 40mb per second.

max_snapshot_bytes_per_sec

Throttles per node snapshot rate. Defaults to 40mb per second.

readonly

Makes repository read-only. Defaults to false.

server_side_encryption

When set to true files are encrypted on server side using AES256 algorithm. Defaults to false.

buffer_size

Minimum threshold below which the chunk is uploaded using a single request. Beyond this threshold, the S3 repository will use the AWS Multipart Upload API to split the chunk into several parts, each of buffer_size length, and to upload each part in its own request. Note that setting a buffer size lower than 5mb is not allowed since it will prevent the use of the Multipart API and may result in upload errors. It is also not possible to set a buffer size greater than 5gb as it is the maximum upload size allowed by S3. Defaults to the minimum between 100mb and 5% of the heap size.

canned_acl

The S3 repository supports all S3 canned ACLs: private, public-read, public-read-write, authenticated-read, log-delivery-write, bucket-owner-read, bucket-owner-full-control. Defaults to private. You can specify a canned ACL using the canned_acl setting. When the S3 repository creates buckets and objects, it adds the canned ACL to the buckets and objects.

storage_class

Sets the S3 storage class for objects stored in the snapshot repository. Values may be standard, reduced_redundancy, or standard_ia. Defaults to standard. Changing this setting on an existing repository only affects the storage class for newly created objects, resulting in a mixed usage of storage classes. Additionally, S3 Lifecycle Policies can be used to manage the storage class of existing objects. Due to the extra complexity with the Glacier class lifecycle, it is not currently supported by the plugin. For more information about the different classes, see the AWS Storage Classes Guide.
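
For example, a repository combining several of these settings could be registered as follows (bucket name and values are illustrative):

PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket_name",
    "server_side_encryption": true,
    "storage_class": "standard_ia",
    "canned_acl": "bucket-owner-full-control"
  }
}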

Note
The option of defining client settings in the repository settings as documented below is considered deprecated, and will be removed in a future version.

In addition to the above settings, you may also specify all non-secure client settings in the repository settings. In this case, the client settings found in the repository settings will be merged with those of the named client used by the repository. Conflicts between client and repository settings are resolved by the repository settings taking precedence over client settings.

For example:

PUT _snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "client": "my_client_name",
    "bucket": "my_bucket_name",
    "endpoint": "my.s3.endpoint"
  }
}

This sets up a repository that uses all client settings from the client my_client_name except for the endpoint that is overridden to my.s3.endpoint by the repository settings.

Recommended S3 Permissions

In order to restrict the Elasticsearch snapshot process to the minimum required resources, we recommend using Amazon IAM in conjunction with pre-existing S3 buckets. Here is an example policy which will allow the snapshot process access to an S3 bucket named "snaps.example.com". This may be configured through the AWS IAM console by creating a Custom Policy and using a Policy Document similar to this (changing snaps.example.com to your bucket name).

{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}

You may further restrict the permissions by specifying a prefix within the bucket, in this example named "foo".

{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Condition": {
        "StringLike": {
          "s3:prefix": [
            "foo/*"
          ]
        }
      },
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::snaps.example.com/foo/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}

The bucket needs to exist to register a repository for snapshots. If you did not create the bucket then the repository registration will fail.

Note: Starting in version 7.0, all bucket operations use the path style access pattern. In previous versions the decision to use virtual hosted style or path style access was made by the AWS Java SDK.

AWS VPC Bandwidth Settings

AWS instances resolve S3 endpoints to a public IP. If the Elasticsearch instances reside in a private subnet in an AWS VPC then all traffic to S3 will go through that VPC's NAT instance. If your VPC's NAT instance is a smaller instance size (e.g. a t1.micro) or is handling a high volume of network traffic, your bandwidth to S3 may be limited by that NAT instance's networking bandwidth limitations.

Instances residing in a public subnet in an AWS VPC will connect to S3 via the VPC’s internet gateway and not be bandwidth limited by the VPC’s NAT instance.

Hadoop HDFS Repository Plugin

The HDFS repository plugin adds support for using HDFS File System as a repository for {ref}/modules-snapshots.html[Snapshot/Restore].

Installation

This plugin can be installed using the plugin manager:

sudo bin/elasticsearch-plugin install repository-hdfs

The plugin must be installed on every node in the cluster, and each node must be restarted after installation.

This plugin can be downloaded for offline install from {plugin_url}/repository-hdfs/repository-hdfs-{version}.zip.

Removal

The plugin can be removed with the following command:

sudo bin/elasticsearch-plugin remove repository-hdfs

The node must be stopped before removing the plugin.

Getting started with HDFS

The HDFS snapshot/restore plugin is built against the latest Apache Hadoop 2.x (currently 2.7.1). If the distro you are using is not protocol compatible with Apache Hadoop, consider replacing the Hadoop libraries inside the plugin folder with your own (you might have to adjust the security permissions required).

Even if Hadoop is already installed on the Elasticsearch nodes, for security reasons, the required libraries need to be placed under the plugin folder. Note that in most cases, if the distro is compatible, one simply needs to configure the repository with the appropriate Hadoop configuration files (see below).

Windows Users

Using Apache Hadoop on Windows is problematic and thus it is not recommended. For those really wanting to use it, make sure you place the elusive winutils.exe under the plugin folder and point the HADOOP_HOME environment variable to it; this should minimize the amount of permissions Hadoop requires (though one would still have to add some more).

Configuration Properties

Once installed, define the configuration for the hdfs repository through the {ref}/modules-snapshots.html[REST API]:

PUT _snapshot/my_hdfs_repository
{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://namenode:8020/",
    "path": "elasticsearch/repositories/my_hdfs_repository",
    "conf.dfs.client.read.shortcircuit": "true"
  }
}

The following settings are supported:

uri

The uri address for hdfs. ex: "hdfs://<host>:<port>/". (Required)

path

The file path within the filesystem where data is stored/loaded. ex: "path/to/file". (Required)

load_defaults

Whether to load the default Hadoop configuration or not. (Enabled by default)

conf.<key>

Inlined configuration parameter to be added to Hadoop configuration. (Optional) Only client oriented properties from the hadoop core and hdfs configuration files will be recognized by the plugin.

compress

Whether to compress the metadata or not. (Disabled by default)

max_restore_bytes_per_sec

Throttles per node restore rate. Defaults to 40mb per second.

max_snapshot_bytes_per_sec

Throttles per node snapshot rate. Defaults to 40mb per second.

readonly

Makes repository read-only. Defaults to false.

chunk_size

Override the chunk size. (Disabled by default)

security.principal

Kerberos principal to use when connecting to a secured HDFS cluster. If you are using a service principal for your elasticsearch node, you may use the _HOST pattern in the principal name and the plugin will replace the pattern with the hostname of the node at runtime (see Creating the Secure Repository).

A Note on HDFS Availability

When you initialize a repository, its settings are persisted in the cluster state. When a node comes online, it will attempt to initialize all repositories for which it has settings. If your cluster has an HDFS repository configured, then all nodes in the cluster must be able to reach HDFS when starting. If not, then the node will fail to initialize the repository at start up and the repository will be unusable. If this happens, you will need to remove and re-add the repository or restart the offending node.

Hadoop Security

The HDFS Repository Plugin integrates seamlessly with Hadoop’s authentication model. The following authentication methods are supported by the plugin:

simple

Also means "no security" and is enabled by default. Uses information from the underlying operating system account running Elasticsearch to inform Hadoop of the name of the current user. Hadoop makes no attempts to verify this information.

kerberos

Authenticates to Hadoop using a Kerberos principal and keytab. Interfacing with HDFS clusters secured with Kerberos requires a few additional steps to enable (see Principals and Keytabs and Creating the Secure Repository for more information).

Principals and Keytabs

Before attempting to connect to a secured HDFS cluster, provision the Kerberos principals and keytabs that the Elasticsearch nodes will use for authenticating to Kerberos. For maximum security and to avoid tripping up the Kerberos replay protection, you should create a service principal per node, following the pattern of elasticsearch/hostname@REALM.

Warning
In some cases, if the same principal is authenticating from multiple clients at once, services may reject authentication for those principals under the assumption that they could be replay attacks. If you are running the plugin in production with multiple nodes you should be using a unique service principal for each node.
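
As a sketch only, on an MIT Kerberos KDC a per-node service principal and its keytab could be provisioned like this (hostname and realm are placeholders):

kadmin.local -q "addprinc -randkey elasticsearch/es-node1.example.com@REALM"
kadmin.local -q "ktadd -k krb5.keytab elasticsearch/es-node1.example.com@REALM"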

On each Elasticsearch node, place the appropriate keytab file in the node’s configuration location under the repository-hdfs directory using the name krb5.keytab:

$> cd elasticsearch/config
$> ls
elasticsearch.yml  jvm.options        log4j2.properties  repository-hdfs/   scripts/
$> cd repository-hdfs
$> ls
krb5.keytab
Note
Make sure you have the correct keytabs! If you are using a service principal per node (like elasticsearch/hostname@REALM) then each node will need its own unique keytab file for the principal assigned to that host!
Creating the Secure Repository

Once your keytab files are in place and your cluster is started, creating a secured HDFS repository is simple. Just add the name of the principal that you will be authenticating as in the repository settings under the security.principal option:

PUT _snapshot/my_hdfs_repository
{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://namenode:8020/",
    "path": "/user/elasticsearch/repositories/my_hdfs_repository",
    "security.principal": "elasticsearch@REALM"
  }
}

If you are using different service principals for each node, you can use the _HOST pattern in your principal name. Elasticsearch will automatically replace the pattern with the hostname of the node at runtime:

PUT _snapshot/my_hdfs_repository
{
  "type": "hdfs",
  "settings": {
    "uri": "hdfs://namenode:8020/",
    "path": "/user/elasticsearch/repositories/my_hdfs_repository",
    "security.principal": "elasticsearch/_HOST@REALM"
  }
}
Authorization

Once Elasticsearch is connected and authenticated to HDFS, HDFS will infer a username to use for authorizing file access for the client. By default, it picks this username from the primary part of the Kerberos principal used to authenticate to the service. For example, in the case of a principal like elasticsearch@REALM or elasticsearch/hostname@REALM, the username that HDFS extracts for file access checks will be elasticsearch.

Note
The repository plugin makes no assumptions about what Elasticsearch's principal name is. The main fragment of the Kerberos principal is not required to be elasticsearch. If you have a principal or service name that works better for you or your organization, feel free to use it instead!

Google Cloud Storage Repository Plugin

The GCS repository plugin adds support for using the Google Cloud Storage service as a repository for {ref}/modules-snapshots.html[Snapshot/Restore].

Installation

This plugin can be installed using the plugin manager:

sudo bin/elasticsearch-plugin install repository-gcs

The plugin must be installed on every node in the cluster, and each node must be restarted after installation.

This plugin can be downloaded for offline install from {plugin_url}/repository-gcs/repository-gcs-{version}.zip.

Removal

The plugin can be removed with the following command:

sudo bin/elasticsearch-plugin remove repository-gcs

The node must be stopped before removing the plugin.

Getting started

The plugin uses the Google Cloud Java Client for Storage to connect to the Storage service. If you are using Google Cloud Storage for the first time, you must connect to the Google Cloud Platform Console and create a new project. After your project is created, you must enable the Cloud Storage Service for your project.

Creating a Bucket

The Google Cloud Storage service uses the concept of a bucket as a container for all the data. Buckets are usually created using the Google Cloud Platform Console. The plugin does not automatically create buckets.

To create a new bucket:

  1. Connect to the Google Cloud Platform Console.

  2. Select your project.

  3. Go to the Storage Browser.

  4. Click the Create Bucket button.

  5. Enter the name of the new bucket.

  6. Select a storage class.

  7. Select a location.

  8. Click the Create button.

For more detailed instructions, see the Google Cloud documentation.

Service Authentication

The plugin must authenticate the requests it makes to the Google Cloud Storage service. It is common for Google client libraries to employ a strategy named application default credentials. However, that strategy is not supported for use with Elasticsearch. The plugin operates under the Elasticsearch process, which runs with the security manager enabled. The security manager obstructs the "automatic" credential discovery. Therefore, you must configure service account credentials even if you are using an environment that does not normally require this configuration (such as Compute Engine, Kubernetes Engine or App Engine).

Using a Service Account

You have to obtain and provide service account credentials manually.

For detailed information about generating JSON service account files, see the Google Cloud documentation. Note that the PKCS12 format is not supported by this plugin.

Here is a summary of the steps:

  1. Connect to the Google Cloud Platform Console.

  2. Select your project.

  3. Go to the Permission tab.

  4. Select the Service Accounts tab.

  5. Click Create service account.

  6. After the account is created, select it and download a JSON key file.

A JSON service account file looks like this:

{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "service-account-for-your-repository@your-project-id.iam.gserviceaccount.com",
  "client_id": "...",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://accounts.google.com/o/oauth2/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/your-bucket@your-project-id.iam.gserviceaccount.com"
}

To provide this file to the plugin, it must be stored in the {ref}/secure-settings.html[Elasticsearch keystore]. You must add a setting name of the form gcs.client.NAME.credentials_file, where NAME is the name of the client configuration for the repository. The implicit client name is default, but a different client name can be specified in the repository settings with the client key.

Note
Passing the file path via the GOOGLE_APPLICATION_CREDENTIALS environment variable is not supported.

For example, if you added a gcs.client.my_alternate_client.credentials_file setting in the keystore, you can configure a repository to use those credentials like this:

PUT _snapshot/my_gcs_repository
{
  "type": "gcs",
  "settings": {
    "bucket": "my_bucket",
    "client": "my_alternate_client"
  }
}

The credentials_file settings are {ref}/secure-settings.html#reloadable-secure-settings[reloadable]. After you reload the settings, the internal gcs clients, which are used to transfer the snapshot contents, utilize the latest settings from the keystore.

Note
Snapshot or restore jobs that are in progress are not preempted by a reload of the client’s credentials_file settings. They complete using the client as it was built when the operation started.

Client Settings

The client used to connect to Google Cloud Storage has a number of settings available. Client setting names are of the form gcs.client.CLIENT_NAME.SETTING_NAME and are specified inside elasticsearch.yml. The default client name looked up by a gcs repository is called default, but can be customized with the repository setting client.

For example:

PUT _snapshot/my_gcs_repository
{
  "type": "gcs",
  "settings": {
    "bucket": "my_bucket",
    "client": "my_alternate_client"
  }
}

Some settings are sensitive and must be stored in the {ref}/secure-settings.html[Elasticsearch keystore]. This is the case for the service account file:

bin/elasticsearch-keystore add-file gcs.client.default.credentials_file /path/service-account.json

The following are the available client settings. Those that must be stored in the keystore are marked as Secure.

credentials_file

The service account file that is used to authenticate to the Google Cloud Storage service. (Secure)

endpoint

The Google Cloud Storage service endpoint to connect to. This will be automatically determined by the Google Cloud Storage client but can be specified explicitly.

connect_timeout

The timeout to establish a connection to the Google Cloud Storage service. The value should specify the unit. For example, a value of 5s specifies a 5 second timeout. The value of -1 corresponds to an infinite timeout. The default value is 20 seconds.

read_timeout

The timeout to read data from an established connection. The value should specify the unit. For example, a value of 5s specifies a 5 second timeout. The value of -1 corresponds to an infinite timeout. The default value is 20 seconds.

application_name

Name used by the client when it uses the Google Cloud Storage service. Setting a custom name can be useful to authenticate your cluster when request statistics are logged in the Google Cloud Platform. Defaults to repository-gcs.

project_id

The Google Cloud project id. This will be automatically inferred from the credentials file but can be specified explicitly. For example, it can be used to switch between projects when the same credentials are usable for both the production and the development projects.
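
The non-secure client settings can likewise be placed in elasticsearch.yml, for example (illustrative values):

gcs.client.default.connect_timeout: 20s
gcs.client.default.read_timeout: 20s
gcs.client.default.application_name: my-elasticsearch-cluster
gcs.client.default.project_id: your-project-id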

Repository Settings

The gcs repository type supports a number of settings to customize how data is stored in Google Cloud Storage.

These can be specified when creating the repository. For example:

PUT _snapshot/my_gcs_repository
{
  "type": "gcs",
  "settings": {
    "bucket": "my_other_bucket",
    "base_path": "dev"
  }
}

The following settings are supported:

bucket

The name of the bucket to be used for snapshots. (Mandatory)

client

The name of the client to use to connect to Google Cloud Storage. Defaults to default.

base_path

Specifies the path within the bucket to the repository data. Defaults to the root of the bucket.

chunk_size

Big files can be broken down into chunks during snapshotting if needed. Specify the chunk size as a value and unit, for example: 10MB or 5KB. Defaults to 100MB, which is the maximum permitted.

compress

When set to true metadata files are stored in compressed format. This setting doesn’t affect index files that are already compressed by default. Defaults to false.

max_restore_bytes_per_sec

Throttles per node restore rate. Defaults to 40mb per second.

max_snapshot_bytes_per_sec

Throttles per node snapshot rate. Defaults to 40mb per second.

readonly

Makes repository read-only. Defaults to false.

application_name

deprecated:[6.3.0, "This setting is now defined in the client settings."] Name used by the client when it uses the Google Cloud Storage service.

Recommended Bucket Permission

The service account used to access the bucket must have "Writer" access to the bucket:

  1. Connect to the Google Cloud Platform Console.

  2. Select your project.

  3. Go to the Storage Browser.

  4. Select the bucket and "Edit bucket permission".

  5. The service account must be configured as a "User" with "Writer" access.