"Fossies" - the Fresh Open Source Software Archive

Member "elasticsearch-6.8.23/docs/plugins/repository-gcs.asciidoc" (29 Dec 2021, 9946 Bytes) of package /linux/www/elasticsearch-6.8.23-src.tar.gz:



Google Cloud Storage Repository Plugin

The GCS repository plugin adds support for using the Google Cloud Storage service as a repository for {ref}/modules-snapshots.html[Snapshot/Restore].

Installation

This plugin can be installed using the plugin manager:

sudo bin/elasticsearch-plugin install repository-gcs

The plugin must be installed on every node in the cluster, and each node must be restarted after installation.

This plugin can be downloaded for offline install from {plugin_url}/repository-gcs/repository-gcs-{version}.zip.
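One way to check the "every node" requirement is to compare the cluster's node names against the rows returned by `GET _cat/plugins?format=json`. The sketch below assumes each row carries a `name` key (the node name) and a `component` key (the plugin name); the function name and the sample data are hypothetical and only illustrate the comparison.

```python
def nodes_missing_plugin(node_names, cat_plugins_rows, plugin="repository-gcs"):
    """Return the sorted names of nodes that do not report the given plugin.

    cat_plugins_rows is assumed to be the parsed JSON list from
    GET _cat/plugins?format=json, one row per (node, plugin) pair.
    """
    nodes_with_plugin = {
        row["name"] for row in cat_plugins_rows if row.get("component") == plugin
    }
    return sorted(set(node_names) - nodes_with_plugin)


# Hypothetical sample: node-2 has a different plugin, node-3 has none at all.
rows = [
    {"name": "node-1", "component": "repository-gcs", "version": "6.8.23"},
    {"name": "node-2", "component": "analysis-icu", "version": "6.8.23"},
]
missing = nodes_missing_plugin(["node-1", "node-2", "node-3"], rows)
print(missing)
```

Any node listed in the result still needs the plugin installed and a restart.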

Removal

The plugin can be removed with the following command:

sudo bin/elasticsearch-plugin remove repository-gcs

The node must be stopped before removing the plugin.

Getting started

The plugin uses the Google Cloud Java Client for Storage to connect to the Storage service. If you are using Google Cloud Storage for the first time, you must connect to the Google Cloud Platform Console and create a new project. After your project is created, you must enable the Cloud Storage Service for your project.

Creating a Bucket

The Google Cloud Storage service uses the concept of a bucket as a container for all the data. Buckets are usually created using the Google Cloud Platform Console. The plugin does not automatically create buckets.

To create a new bucket:

  1. Connect to the Google Cloud Platform Console.

  2. Select your project.

  3. Go to the Storage Browser.

  4. Click the Create Bucket button.

  5. Enter the name of the new bucket.

  6. Select a storage class.

  7. Select a location.

  8. Click the Create button.

For more detailed instructions, see the Google Cloud documentation.

Service Authentication

The plugin must authenticate the requests it makes to the Google Cloud Storage service. It is common for Google client libraries to employ a strategy named application default credentials. However, that strategy is not supported for use with Elasticsearch. The plugin operates under the Elasticsearch process, which runs with the security manager enabled. The security manager obstructs the "automatic" credential discovery. Therefore, you must configure service account credentials even if you are using an environment that does not normally require this configuration (such as Compute Engine, Kubernetes Engine or App Engine).

Using a Service Account

You have to obtain and provide service account credentials manually.

For detailed information about generating JSON service account files, see the Google Cloud documentation. Note that the PKCS12 format is not supported by this plugin.

Here is a summary of the steps:

  1. Connect to the Google Cloud Platform Console.

  2. Select your project.

  3. Go to the Permission tab.

  4. Select the Service Accounts tab.

  5. Click Create service account.

  6. After the account is created, select it and download a JSON key file.

A JSON service account file looks like this:

{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "service-account-for-your-repository@your-project-id.iam.gserviceaccount.com",
  "client_id": "...",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://accounts.google.com/o/oauth2/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/your-bucket@your-project-id.iam.gserviceaccount.com"
}
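Before adding a key file to the keystore, it can help to confirm that it is a JSON service account file and not, for example, a PKCS12 key. The following is a minimal sketch, not part of the plugin: the function name is hypothetical, and the list of required fields is an assumption taken from the sample above.

```python
import json

# Assumption: these fields, taken from the sample file, are treated as required.
REQUIRED_FIELDS = ("type", "project_id", "private_key_id", "private_key", "client_email")


def check_service_account_key(text):
    """Return a list of problems found in a service account key; empty if none."""
    try:
        data = json.loads(text)
    except ValueError as exc:  # binary formats such as PKCS12 end up here
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    if data.get("type") != "service_account":
        problems.append('"type" is not "service_account"')
    return problems


sample = json.dumps({
    "type": "service_account",
    "project_id": "your-project-id",
    "private_key_id": "...",
    "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
    "client_email": "service-account-for-your-repository@your-project-id.iam.gserviceaccount.com",
})
print(check_service_account_key(sample))
```

The check is structural only; it does not verify that the key is valid or that the account has access to the bucket.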

To provide this file to the plugin, store it in the {ref}/secure-settings.html[Elasticsearch keystore] under a setting name of the form gcs.client.NAME.credentials_file, where NAME is the name of the client configuration for the repository. The implicit client name is default, but a different client name can be specified in the repository settings with the client key.

Note
Passing the file path via the GOOGLE_APPLICATION_CREDENTIALS environment variable is not supported.

For example, if you added a gcs.client.my_alternate_client.credentials_file setting in the keystore, you can configure a repository to use those credentials like this:

PUT _snapshot/my_gcs_repository
{
  "type": "gcs",
  "settings": {
    "bucket": "my_bucket",
    "client": "my_alternate_client"
  }
}
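The link between the keystore setting name and the repository's client key can be sketched as follows. The helper names are hypothetical, and the rejection of dots in client names is a precaution of this sketch, not a documented rule; the command line mirrors the documented `bin/elasticsearch-keystore add-file` invocation.

```python
def credentials_setting(client_name="default"):
    """Return the keystore setting name for a client's service account file."""
    # Assumption of this sketch: an empty name or a name containing a dot
    # would make the resulting setting name ambiguous, so both are rejected.
    if not client_name or "." in client_name:
        raise ValueError(f"unsuitable client name: {client_name!r}")
    return f"gcs.client.{client_name}.credentials_file"


def keystore_add_file_command(client_name, key_path):
    """Build the argument list for adding the key file to the keystore."""
    return ["bin/elasticsearch-keystore", "add-file",
            credentials_setting(client_name), key_path]


command = keystore_add_file_command("my_alternate_client", "/path/service-account.json")
print(" ".join(command))
```

The client name passed here must match the value of the client key in the repository settings.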

The credentials_file settings are {ref}/secure-settings.html#reloadable-secure-settings[reloadable]. After you reload the settings, the internal gcs clients that transfer the snapshot contents use the latest settings from the keystore.

Note
Snapshot or restore jobs that are in progress are not preempted by a reload of the client’s credentials_file settings. They complete using the client as it was built when the operation started.
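A reload is triggered through the nodes reload secure settings API after the keystore has been updated on each node. The sketch below only builds the HTTP request and does not send it; the base URL is an assumption, and authentication, TLS, and any request body your version may expect are left out.

```python
import urllib.request


def reload_secure_settings_request(base_url="http://localhost:9200"):
    """Build (but do not send) a POST request to reload secure settings."""
    url = base_url.rstrip("/") + "/_nodes/reload_secure_settings"
    return urllib.request.Request(url, method="POST")


request = reload_secure_settings_request()
print(request.get_method(), request.full_url)
# To send it against a running cluster:
#     urllib.request.urlopen(request)
```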

Client Settings

The client used to connect to Google Cloud Storage has a number of settings available. Client setting names are of the form gcs.client.CLIENT_NAME.SETTING_NAME and are specified inside elasticsearch.yml. A gcs repository looks up the client named default unless the repository setting client names a different one.

For example:

PUT _snapshot/my_gcs_repository
{
  "type": "gcs",
  "settings": {
    "bucket": "my_bucket",
    "client": "my_alternate_client"
  }
}

Some settings are sensitive and must be stored in the {ref}/secure-settings.html[Elasticsearch keystore]. This is the case for the service account file:

bin/elasticsearch-keystore add-file gcs.client.default.credentials_file /path/service-account.json

The following are the available client settings. Those that must be stored in the keystore are marked as Secure.

credentials_file

The service account file that is used to authenticate to the Google Cloud Storage service. (Secure)

endpoint

The Google Cloud Storage service endpoint to connect to. This will be automatically determined by the Google Cloud Storage client but can be specified explicitly.

connect_timeout

The timeout to establish a connection to the Google Cloud Storage service. The value should specify the unit. For example, a value of 5s specifies a 5-second timeout. A value of -1 corresponds to an infinite timeout. The default value is 20 seconds.

read_timeout

The timeout to read data from an established connection. The value should specify the unit. For example, a value of 5s specifies a 5-second timeout. A value of -1 corresponds to an infinite timeout. The default value is 20 seconds.

application_name

Name used by the client when it uses the Google Cloud Storage service. Setting a custom name can be useful to identify your cluster when request statistics are logged in the Google Cloud Platform. Defaults to repository-gcs.

project_id

The Google Cloud project id. This will be automatically inferred from the credentials file but can be specified explicitly. For example, it can be used to switch between projects when the same credentials are usable for both the production and the development projects.
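The split between plain and secure client settings can be illustrated with a small helper that renders elasticsearch.yml lines and refuses anything that belongs in the keystore. This is a sketch under assumptions: the helper name is hypothetical, and the setting lists simply mirror the settings described above.

```python
# Settings marked Secure above must go in the keystore, not elasticsearch.yml.
SECURE_SETTINGS = {"credentials_file"}
PLAIN_SETTINGS = {"endpoint", "connect_timeout", "read_timeout",
                  "application_name", "project_id"}


def render_client_settings(client_name, settings):
    """Render gcs.client.CLIENT_NAME.SETTING_NAME lines for elasticsearch.yml."""
    lines = []
    for key in sorted(settings):
        if key in SECURE_SETTINGS:
            raise ValueError(f"{key} is a secure setting; store it in the keystore")
        if key not in PLAIN_SETTINGS:
            raise ValueError(f"unknown client setting: {key}")
        lines.append(f"gcs.client.{client_name}.{key}: {settings[key]}")
    return "\n".join(lines)


rendered = render_client_settings(
    "default", {"read_timeout": "30s", "connect_timeout": "5s"}
)
print(rendered)
```

The timeout values here are examples only; both settings default to 20 seconds when omitted.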

Repository Settings

The gcs repository type supports a number of settings to customize how data is stored in Google Cloud Storage.

These can be specified when creating the repository. For example:

PUT _snapshot/my_gcs_repository
{
  "type": "gcs",
  "settings": {
    "bucket": "my_other_bucket",
    "base_path": "dev"
  }
}

The following settings are supported:

bucket

The name of the bucket to be used for snapshots. (Mandatory)

client

The name of the client to use to connect to Google Cloud Storage. Defaults to default.

base_path

Specifies the path within the bucket to the repository data. Defaults to the root of the bucket.

chunk_size

Big files can be broken down into chunks during snapshotting if needed. Specify the chunk size as a value and unit, for example: 10MB or 5KB. Defaults to 100MB, which is the maximum permitted.

compress

When set to true, metadata files are stored in compressed format. This setting doesn’t affect index files, which are already compressed by default. Defaults to false.

max_restore_bytes_per_sec

Throttles the per-node restore rate. Defaults to 40mb per second.

max_snapshot_bytes_per_sec

Throttles the per-node snapshot rate. Defaults to 40mb per second.

readonly

Makes the repository read-only. Defaults to false.

application_name

Deprecated in 6.3.0. This setting is now defined in the client settings. Name used by the client when it uses the Google Cloud Storage service.
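The constraints on the repository settings (a mandatory bucket and a chunk_size of at most 100MB) can be checked before the request body is sent. The following is a minimal sketch with hypothetical helper names; the size parser handles only a few units and is an assumption, not the parser Elasticsearch uses.

```python
import re

# Assumption: a minimal unit table, enough for values such as 10MB or 5KB.
_UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}
MAX_CHUNK_BYTES = 100 * 1024 ** 2  # 100MB, the documented maximum


def parse_size(value):
    """Convert a value-and-unit string such as '10MB' into a byte count."""
    match = re.fullmatch(r"(\d+)\s*([a-zA-Z]+)", value.strip())
    if not match or match.group(2).lower() not in _UNITS:
        raise ValueError(f"unrecognized size: {value!r}")
    return int(match.group(1)) * _UNITS[match.group(2).lower()]


def repository_body(bucket, **settings):
    """Build the body for PUT _snapshot/<name> with basic validation."""
    if not bucket:
        raise ValueError("bucket is mandatory")
    chunk_size = settings.get("chunk_size")
    if chunk_size is not None and parse_size(chunk_size) > MAX_CHUNK_BYTES:
        raise ValueError("chunk_size exceeds the 100MB maximum")
    return {"type": "gcs", "settings": {"bucket": bucket, **settings}}


body = repository_body("my_other_bucket", base_path="dev")
print(body)
```

Serializing the result with `json.dumps` gives a body equivalent to the example request at the start of this section.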

Recommended Bucket Permission

The service account used to access the bucket must have "Writer" access to the bucket:

  1. Connect to the Google Cloud Platform Console.

  2. Select your project.

  3. Go to the Storage Browser.

  4. Select the bucket, then choose "Edit bucket permission".

  5. The service account must be configured as a "User" with "Writer" access.