"Fossies" - the Fresh Open Source Software Archive

Member "elasticsearch-6.8.23/docs/reference/high-availability.asciidoc" (29 Dec 2021, 1035 Bytes) of package /linux/www/elasticsearch-6.8.23-src.tar.gz:


As a special service "Fossies" has tried to format the requested source page into HTML format (assuming AsciiDoc format). Alternatively you can here view or download the uninterpreted source code file. A member file download can also be achieved by clicking within a package contents listing on the according byte size field.

{ccr-cap}

The {ccr} (CCR) feature enables replication of indices in remote clusters to a local cluster. This functionality can be used in some common production use cases:

  • Disaster recovery in case a primary cluster fails. A secondary cluster can serve as a hot backup

  • Geo-proximity so that reads can be served locally

This guide provides an overview of {ccr}.

Overview

{ccr-cap} is done on an index-by-index basis. Replication is configured at the index level. For each configured replication there is a replication source index called the leader index and a replication target index called the follower index.

Replication is active-passive. This means that while the leader index can be written to directly, the follower index cannot accept direct writes.

Replication is pull-based. This means that replication is driven by the follower index. This simplifies state management on the leader index and means that {ccr} does not interfere with indexing on the leader index.

Important
{ccr-cap} requires {ref}/modules-remote-clusters.html[remote clusters].

Configuring replication

Replication can be configured in two ways:

  • Manually creating specific follower indices (in {kib} or by using the {ref}/ccr-put-follow.html[create follower API])

  • Automatically creating follower indices from auto-follow patterns (in {kib} or by using the {ref}/ccr-put-auto-follow-pattern.html[create auto-follow pattern API])

For more information about managing {ccr} in {kib}, see {kibana-ref}/working-remote-clusters.html[Working with remote clusters].

Note
You must also configure the leader index (see Requirements for leader indices).

When you initiate replication either manually or through an auto-follow pattern, the follower index is created on the local cluster. Once the follower index is created, the remote recovery process copies all of the Lucene segment files from the remote cluster to the local cluster.

By default, if you initiate following manually (by using {kib} or the create follower API), the recovery process is asynchronous with respect to the {ref}/ccr-put-follow.html[create follower request]. The request returns before the remote recovery process completes. If you want to wait for the process to complete, you can use the wait_for_active_shards parameter.
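
For example, a minimal sketch of a follow request that waits for the primary shard of the follower index to become active before returning (the index and cluster names are placeholders):

PUT /follower_index/_ccr/follow?wait_for_active_shards=1
{
  "remote_cluster" : "remote_cluster",
  "leader_index" : "leader_index"
}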

The mechanics of replication

While replication is managed at the index level, it is performed at the shard level. When a follower index is created, it is automatically configured with the same number of shards as the leader index. A follower shard task in the follower index pulls from the corresponding leader shard in the leader index by sending read requests for new operations. These read requests can be served from any copy of the leader shard (primary or replicas).

For each read request sent by the follower shard task, if there are new operations available on the leader shard, the leader shard responds with operations limited by the read parameters that you established when you configured the follower index. If there are no new operations available on the leader shard, the leader shard waits up to a configured timeout for new operations. If new operations occur within that timeout, the leader shard immediately responds with those new operations. Otherwise, if the timeout elapses, the leader shard replies that there are no new operations. The follower shard task updates some statistics and immediately sends another read request to the leader shard. This ensures that the network connections between the remote cluster and the local cluster are continually being used so as to avoid forceful termination by an external source (such as a firewall).

If a read request fails, the cause of the failure is inspected. If the cause of the failure is deemed to be a failure that can be recovered from (for example, a network failure), the follower shard task enters into a retry loop. Otherwise, the follower shard task is paused and requires user intervention before it can be resumed with the {ref}/ccr-post-resume-follow.html[resume follower API].

When operations are received by the follower shard task, they are placed in a write buffer. The follower shard task manages this write buffer and submits bulk write requests from this write buffer to the follower shard. The write buffer and these write requests are managed by the write parameters that you established when you configured the follower index. The write buffer serves as back-pressure against read requests. If the write buffer exceeds its configured limits, no additional read requests are sent by the follower shard task. The follower shard task resumes sending read requests when the write buffer no longer exceeds its configured limits.

Note
The intricacies of how operations are replicated from the leader are governed by settings that you can configure when you create the follower index in {kib} or by using the {ref}/ccr-put-follow.html[create follower API].
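
For example, a sketch of a follow request that tunes a few of these parameters (the values shown are illustrative, not recommendations):

PUT /follower_index/_ccr/follow
{
  "remote_cluster" : "remote_cluster",
  "leader_index" : "leader_index",
  "max_read_request_operation_count" : 5120,
  "max_outstanding_read_requests" : 12,
  "read_poll_timeout" : "1m"
}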

Mapping updates applied to the leader index are automatically retrieved as-needed by the follower index.

Settings updates applied to the leader index that are needed by the follower index are automatically retrieved as-needed by the follower index. Not all settings updates are needed by the follower index. For example, changing the number of replicas on the leader index is not replicated by the follower index.

Note
If you apply a non-dynamic settings change to the leader index that is needed by the follower index, the follower index will go through a cycle of closing itself, applying the settings update, and then re-opening itself. The follower index will be unavailable for reads and not replicating writes during this cycle.

Inspecting the progress of replication

You can inspect the progress of replication at the shard level with the {ref}/ccr-get-follow-stats.html[get follower stats API]. This API gives you insight into the reads and writes managed by the follower shard task. It also reports read exceptions that can be retried and fatal exceptions that require user intervention.
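
For example (follower_index is a placeholder for the name of your follower index):

GET /follower_index/_ccr/stats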

Pausing and resuming replication

You can pause replication with the {ref}/ccr-post-pause-follow.html[pause follower API] and then later resume replication with the {ref}/ccr-post-resume-follow.html[resume follower API]. Using these APIs in tandem enables you to adjust the read and write parameters on the follower shard task if your initial configuration is not suitable for your use case.
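
For example, a sketch that pauses following and later resumes it with an adjusted read parameter (the value is illustrative):

POST /follower_index/_ccr/pause_follow

POST /follower_index/_ccr/resume_follow
{
  "max_read_request_operation_count" : 1024
}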

Leader index retaining operations for replication

If the follower is unable to replicate operations from a leader for a period of time, the following process can fail due to the leader lacking a complete history of operations necessary for replication.

Operations replicated to the follower are identified using a sequence number generated when the operation was initially performed. Lucene segment files are occasionally merged in order to optimize searches and save space. When these merges occur, it is possible for operations associated with deleted or updated documents to be pruned during the merge. When the follower requests the sequence number for a pruned operation, the process will fail due to the operation missing on the leader.

This scenario is not possible in an append-only workflow. As documents are never deleted or updated, the underlying operation will not be pruned.

Elasticsearch attempts to mitigate this potential issue for update workflows using a Lucene feature called soft deletes. When a document is updated or deleted, the underlying operation is retained in the Lucene index for a period of time. This period of time is governed by the index.soft_deletes.retention_lease.period setting which can be configured on the leader index.
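
For example, a sketch of creating a leader index with a longer retention period (the index name and the 24h value are illustrative):

PUT /leader_index
{
  "settings" : {
    "index.soft_deletes.enabled" : true,
    "index.soft_deletes.retention_lease.period" : "24h"
  }
}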

When a follower initiates the index following, it acquires a retention lease from the leader. This informs the leader that it should not allow a soft delete to be pruned until either the follower indicates that it has received the operation or the lease expires. It is valuable to have monitoring in place to detect a follower replication issue prior to the lease expiring so that the problem can be remedied before the follower falls fatally behind.

Remedying a follower that has fallen behind

If a follower falls sufficiently far behind a leader that it can no longer replicate operations, this can be detected in {kib} or by using the {ref}/ccr-get-follow-stats.html[get follow stats API]. It will be reported as an indices[].fatal_exception.

In order to restart the follower, you must pause the following process, close the index, and then create the follower index again. For example:

POST /follower_index/_ccr/pause_follow

POST /follower_index/_close

PUT /follower_index/_ccr/follow?wait_for_active_shards=1
{
  "remote_cluster" : "remote_cluster",
  "leader_index" : "leader_index"
}

Re-creating the follower index is a destructive action. All of the existing Lucene segment files are deleted on the follower cluster. The remote recovery process copies the Lucene segment files from the leader again. After the follower index initializes, the following process starts again.

Terminating replication

You can terminate replication with the {ref}/ccr-post-unfollow.html[unfollow API]. This API converts a follower index to a regular (non-follower) index.
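
The unfollow API requires that the follower index is first paused and closed. A complete sketch (follower_index is a placeholder):

POST /follower_index/_ccr/pause_follow

POST /follower_index/_close

POST /follower_index/_ccr/unfollow

POST /follower_index/_open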

Requirements for leader indices

{ccr-cap} works by replaying the history of individual write operations that were performed on the shards of the leader index. This means that the history of these operations needs to be retained on the leader shards so that they can be pulled by the follower shard tasks. The underlying mechanism used to retain these operations is soft deletes. A soft delete occurs whenever an existing document is deleted or updated. By retaining these soft deletes up to configurable limits, the history of operations can be retained on the leader shards and made available to the follower shard tasks as they replay the history of operations.

Soft deletes must be enabled for indices that you want to use as leader indices. Enabling soft deletes requires the addition of some index settings at index creation time. You must add these settings to your create index requests or to the index templates that you use to manage the creation of new indices.
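
For example, a sketch of an index template that enables soft deletes on any new index whose name matches a pattern (the template name and pattern are placeholders):

PUT /_template/leader_template
{
  "index_patterns" : [ "metrics-*" ],
  "settings" : {
    "index.soft_deletes.enabled" : true
  }
}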

Important
This means that {ccr} cannot be used on existing indices. If you have existing data that you want to replicate from another cluster, you must {ref}/docs-reindex.html[reindex] your data into a new index with soft deletes enabled.

Soft delete settings

index.soft_deletes.enabled

Whether or not soft deletes are enabled on the index. Soft deletes can only be configured at index creation and only on indices created on or after 6.5.0. The default value is false.

index.soft_deletes.retention_lease.period

The maximum period to retain a shard history retention lease before it is considered expired. Shard history retention leases ensure that soft deletes are retained during merges on the Lucene index. If a soft delete is merged away before it can be replicated to a follower the following process will fail due to incomplete history on the leader. The default value is 12h.

For more information about index settings, see {ref}/index-modules.html[Index modules].

Automatically following indices

In time series use cases where you want to follow new indices that are periodically created (such as daily Beats indices), manually configuring follower indices for each new leader index can be an operational burden. The auto-follow functionality in {ccr} is aimed at easing this burden. With the auto-follow functionality, you can specify that new indices in a remote cluster that have a name that matches a pattern are automatically followed.

Managing auto-follow patterns

You can add a new auto-follow pattern configuration with the {ref}/ccr-put-auto-follow-pattern.html[create auto-follow pattern API]. When you create a new auto-follow pattern configuration, you are configuring a collection of patterns against a single remote cluster. Any time a new index with a name that matches one of the patterns in the collection is created in the remote cluster, a follower index is configured in the local cluster. The follower index uses the new index as its leader index.

You can inspect all configured auto-follow pattern collections with the {ref}/ccr-get-auto-follow-pattern.html[get auto-follow pattern API]. To delete a configured auto-follow pattern collection, use the {ref}/ccr-delete-auto-follow-pattern.html[delete auto-follow pattern API].

Since auto-follow functionality is handled automatically in the background on your behalf, error reporting is done through logs on the elected master node and through the {ref}/ccr-get-stats.html[{ccr} stats API].
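
For example, sketches of inspecting a pattern collection named beats, retrieving overall {ccr} stats, and deleting the collection (beats is a placeholder name):

GET /_ccr/auto_follow/beats

GET /_ccr/stats

DELETE /_ccr/auto_follow/beats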

Getting started with {ccr}

This getting-started guide for {ccr} shows you how to:

  • Connect your local cluster to a remote cluster

  • Create a leader index in the remote cluster

  • Create a follower index that replicates the leader index

  • Automatically create follower indices

Before you begin

  1. {stack-gs}/get-started-elastic-stack.html#install-elasticsearch[Install {es}] on your local and remote clusters.

  2. Obtain a license that includes the {ccr} features. See subscriptions and {stack-ov}/license-management.html[license management].

  3. If the Elastic {security-features} are enabled in your local and remote clusters, you need a user that has appropriate authority to perform the steps in this tutorial.

    The {ccr} features use cluster privileges and built-in roles to make it easier to control which users have authority to manage {ccr}.

    By default, you can perform all of the steps in this tutorial by using the built-in elastic user. However, a password must be set for this user before the user can do anything. For information about how to set that password, see [security-getting-started].

    If you are performing these steps in a production environment, take extra care because the elastic user has the superuser role and you could inadvertently make significant changes.

    Alternatively, you can assign the appropriate privileges to a user ID of your choice. On the remote cluster that contains the leader index, a user will need the read_ccr cluster privilege and monitor and read privileges on the leader index.

    ccr_user:
      cluster:
        - read_ccr
      indices:
        - names: [ 'leader-index' ]
          privileges:
            - monitor
            - read

    On the local cluster that contains the follower index, the same user will need the manage_ccr cluster privilege and monitor, read, write and manage_follow_index privileges on the follower index.

    ccr_user:
      cluster:
        - manage_ccr
      indices:
        - names: [ 'follower-index' ]
          privileges:
            - monitor
            - read
            - write
            - manage_follow_index

    If you are managing the connection to the remote cluster via the cluster update settings API, you will also need a user with the all cluster privilege.

Connecting to a remote cluster

The {ccr} features require that you {ref}/modules-remote-clusters.html[connect your local cluster to a remote cluster]. In this tutorial, we will connect our local cluster to a remote cluster with the cluster alias leader.

PUT /_cluster/settings
{
  "persistent" : {
    "cluster" : {
      "remote" : {
        "leader" : {
          "seeds" : [
            "127.0.0.1:9300" (1)
          ]
        }
      }
    }
  }
}
  1. Specifies the hostname and transport port of a seed node in the remote cluster.

You can verify that the local cluster is successfully connected to the remote cluster.

GET /_remote/info

The API will respond by showing that the local cluster is connected to the remote cluster.

{
  "leader" : {
    "seeds" : [
      "127.0.0.1:9300"
    ],
    "http_addresses" : [
      "127.0.0.1:9200"
    ],
    "connected" : true, (1)
    "num_nodes_connected" : 1, (2)
    "max_connections_per_cluster" : 3,
    "initial_connect_timeout" : "30s",
    "skip_unavailable" : false
  }
}
  1. This shows the local cluster is connected to the remote cluster with cluster alias leader.

  2. This shows the number of nodes in the remote cluster the local cluster is connected to.

Alternatively, you can manage remote clusters on the Management / Elasticsearch / Remote Clusters page in {kib}:

The Remote Clusters page in {kib}

Creating a leader index

Leader indices require a special index setting to ensure that the operations that need to be replicated are available when the follower requests them from the leader. This setting is used to enable soft deletes on the leader index. A soft delete occurs whenever a document is deleted or updated. Soft deletes can be enabled only on new indices created on or after {es} 6.5.0.

In the following example, we will create a leader index in the remote cluster:

PUT /server-metrics
{
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 0,
      "soft_deletes" : {
        "enabled" : true (1)
      }
    }
  },
  "mappings" : {
    "metric" : {
      "properties" : {
        "@timestamp" : {
          "type" : "date"
        },
        "accept" : {
          "type" : "long"
        },
        "deny" : {
          "type" : "long"
        },
        "host" : {
          "type" : "keyword"
        },
        "response" : {
          "type" : "float"
        },
        "service" : {
          "type" : "keyword"
        },
        "total" : {
          "type" : "long"
        }
      }
    }
  }
}
  1. Enables soft deletes on the leader index.

Creating a follower index

Follower indices are created with the {ref}/ccr-put-follow.html[create follower API]. When you create a follower index, you must reference the remote cluster and the leader index that you created in the remote cluster.

PUT /server-metrics-copy/_ccr/follow?wait_for_active_shards=1
{
  "remote_cluster" : "leader",
  "leader_index" : "server-metrics"
}

The follower index is initialized using the remote recovery process. The remote recovery process transfers the existing Lucene segment files from the leader to the follower. When the remote recovery process is complete, the index following begins.

Now when you index documents into your leader index, you will see these documents replicated in the follower index. You can inspect the status of replication using the {ref}/ccr-get-follow-stats.html[get follower stats API].
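
For example, a sketch that indexes a sample document into the leader index on the remote cluster and then checks replication progress on the local cluster (the document values are illustrative):

PUT /server-metrics/metric/1
{
  "@timestamp" : "2021-12-29T00:00:00Z",
  "accept" : 5,
  "deny" : 0,
  "host" : "server-1",
  "response" : 100.0,
  "service" : "app-server",
  "total" : 5
}

GET /server-metrics-copy/_ccr/stats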

Automatically create follower indices

The auto-follow feature in {ccr} helps for time series use cases where you want to follow new indices that are periodically created in the remote cluster (such as daily Beats indices). Auto-following is configured using the {ref}/ccr-put-auto-follow-pattern.html[create auto-follow pattern API]. With an auto-follow pattern, you reference the remote cluster that you connected your local cluster to. You must also specify a collection of patterns that match the indices you want to automatically follow.

For example:

PUT /_ccr/auto_follow/beats
{
  "remote_cluster" : "leader",
  "leader_index_patterns" :
  [
    "metricbeat-*", (1)
    "packetbeat-*" (2)
  ],
  "follow_index_pattern" : "{{leader_index}}-copy" (3)
}
  1. Automatically follow new {metricbeat} indices.

  2. Automatically follow new {packetbeat} indices.

  3. The name of the follower index is derived from the name of the leader index by adding the suffix -copy to the name of the leader index.

Alternatively, you can manage auto-follow patterns on the Management / Elasticsearch / Cross Cluster Replication page in {kib}:

The Auto-follow patterns page in {kib}

Remote recovery

When you create a follower index, you cannot use it until it is fully initialized. The remote recovery process builds a new copy of a shard on a follower node by copying data from the primary shard in the leader cluster. {es} uses this remote recovery process to bootstrap a follower index using the data from the leader index. This process provides the follower with a copy of the current state of the leader index, even if a complete history of changes is not available on the leader due to Lucene segment merging.

Remote recovery is a network-intensive process that transfers all of the Lucene segment files from the leader cluster to the follower cluster. The follower requests that a recovery session be initiated on the primary shard in the leader cluster. The follower then requests file chunks concurrently from the leader. By default, the process concurrently requests five 1mb file chunks. This default behavior is designed to support leader and follower clusters with high network latency between them.

There are dynamic settings that you can use to rate-limit the transmitted data and manage the resources consumed by remote recoveries. See {ref}/ccr-settings.html[{ccr-cap} settings].

You can obtain information about an in-progress remote recovery by using the {ref}/cat-recovery.html[recovery API] on the follower cluster. Remote recoveries are implemented using the {ref}/modules-snapshots.html[snapshot and restore] infrastructure. This means that ongoing remote recoveries are labelled as type snapshot in the recovery API.
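
For example, while a remote recovery is in progress on the follower cluster, it appears with type snapshot in the output of:

GET /_cat/recovery?v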

Upgrading clusters

Clusters that are actively using {ccr} require a careful approach to upgrades. Otherwise index following may fail during a rolling upgrade, for the following reasons:

  • If a new index setting or mapping type is replicated from an upgraded cluster to a non-upgraded cluster, the non-upgraded cluster will reject it and index following will fail.

  • Lucene is not forwards compatible. If index following falls back to file-based recovery, a node in a non-upgraded cluster will reject index files written by a newer Lucene version than the one it is using.

Rolling upgrades of clusters that use {ccr} differ depending on whether the setup uses uni-directional or bi-directional index following.

Uni-directional index following

In a uni-directional setup between two clusters, one cluster contains only leader indices, and the other cluster contains only follower indices following indices in the first cluster.

In this setup, the cluster with follower indices should be upgraded first and the cluster with leader indices should be upgraded last. If clusters are upgraded in this order then index following can continue during the upgrade without downtime.

Note that a chained index following setup can also be upgraded in this way. For example, suppose cluster A contains all leader indices, cluster B follows indices in cluster A, and cluster C follows indices in cluster B. In this case, cluster C should be upgraded first, then cluster B, and finally cluster A.

Bi-directional index following

In a bi-directional setup between two clusters, each cluster contains both leader and follower indices.

When upgrading clusters in this setup, all index following needs to be paused using the {ref}/ccr-post-pause-follow.html[pause follower API] prior to upgrading both clusters. After both clusters have been upgraded, index following can be resumed using the {ref}/ccr-post-resume-follow.html[resume follower API].