EC2 Discovery Plugin
The EC2 discovery plugin uses the AWS API for unicast discovery.
If you are looking for a hosted solution of Elasticsearch on AWS, please visit http://www.elastic.co/cloud.
Installation
This plugin can be installed using the plugin manager:
sudo bin/elasticsearch-plugin install discovery-ec2
The plugin must be installed on every node in the cluster, and each node must be restarted after installation.
This plugin can be downloaded for offline install from {plugin_url}/discovery-ec2/discovery-ec2-{version}.zip.
Removal
The plugin can be removed with the following command:
sudo bin/elasticsearch-plugin remove discovery-ec2
The node must be stopped before removing the plugin.
Getting started with AWS
The plugin provides a hosts provider for zen discovery named ec2. This hosts provider finds other Elasticsearch instances in EC2 through AWS metadata. Authentication is done using IAM Role credentials by default. To enable the plugin, set the unicast host provider for Zen discovery to ec2:
discovery.zen.hosts_provider: ec2
Settings
EC2 host discovery supports a number of settings. Some settings are sensitive and must be stored in the {ref}/secure-settings.html[elasticsearch keystore]. For example, to use explicit AWS access keys:
bin/elasticsearch-keystore add discovery.ec2.access_key
bin/elasticsearch-keystore add discovery.ec2.secret_key
The following are the available discovery settings. Each takes the discovery.ec2. prefix. Those that must be stored in the keystore are marked as Secure.
- access_key: An EC2 access key. The secret_key setting must also be specified. (Secure)
- secret_key: An EC2 secret key. The access_key setting must also be specified. (Secure)
- session_token: An EC2 session token. The access_key and secret_key settings must also be specified. (Secure)
- endpoint: The EC2 service endpoint to connect to. See http://docs.aws.amazon.com/general/latest/gr/rande.html#ec2_region. This defaults to ec2.us-east-1.amazonaws.com.
- protocol: The protocol to use to connect to EC2. Valid values are either http or https. Defaults to https.
- proxy.host: The host name of a proxy to connect to EC2 through.
- proxy.port: The port of a proxy to connect to EC2 through.
- proxy.username: The username to connect to the proxy.host with. (Secure)
- proxy.password: The password to connect to the proxy.host with. (Secure)
- read_timeout: The socket timeout for connecting to EC2. The value should specify the unit. For example, a value of 5s specifies a 5 second timeout. The default value is 50 seconds.
- groups: Either a comma separated list or array based list of (security) groups. Only instances with the provided security groups will be used in the cluster discovery. (NOTE: You can provide either the group NAME or the group ID.)
- host_type: The type of host to use to communicate with other instances. Can be one of private_ip, public_ip, private_dns, public_dns or tag:TAGNAME, where TAGNAME refers to the name of a tag configured for all EC2 instances. Instances which don’t have this tag set will be ignored by the discovery process. For example, if you defined a tag my-elasticsearch-host in EC2 and set it to myhostname1.mydomain.com, then setting host_type: tag:my-elasticsearch-host will tell the discovery-ec2 plugin to read the host name from the my-elasticsearch-host tag. In this case, it will be resolved to myhostname1.mydomain.com. Read more about EC2 Tags. Defaults to private_ip.
- availability_zones: Either a comma separated list or array based list of availability zones. Only instances within the provided availability zones will be used in the cluster discovery.
- any_group: If set to false, will require all security groups to be present for the instance to be used for the discovery. Defaults to true.
- node_cache_time: How long the list of hosts is cached to prevent further requests to the AWS API. Defaults to 10s.
All secure settings of this plugin are {ref}/secure-settings.html#reloadable-secure-settings[reloadable]. After you reload the settings, an AWS SDK client with the latest settings from the keystore will be used.
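As a rough illustration of how the non-secure settings above fit together, the following elasticsearch.yml fragment is a minimal sketch. The endpoint region, the security group name es-nodes, the availability zones, and the timeout values are hypothetical and chosen only for the example; the setting names themselves come from the list above.

```yaml
# Sketch only: the values below are illustrative assumptions, not recommendations.
discovery.zen.hosts_provider: ec2

# Hypothetical region; the documented default is ec2.us-east-1.amazonaws.com.
discovery.ec2.endpoint: ec2.eu-west-1.amazonaws.com
discovery.ec2.protocol: https

# Hypothetical security group name; a group ID may be given instead.
discovery.ec2.groups: es-nodes
# With a single group, any_group has no practical effect; true is the default.
discovery.ec2.any_group: true

# Hypothetical zones inside the region assumed above.
discovery.ec2.availability_zones: eu-west-1a,eu-west-1b

# Address used to reach other instances; private_ip is the documented default.
discovery.ec2.host_type: private_ip

# Illustrative values; the documented defaults are 10s and 50 seconds.
discovery.ec2.node_cache_time: 30s
discovery.ec2.read_timeout: 10s
```

Secure settings such as access_key and secret_key do not belong in this file; they go in the keystore as described earlier.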
Important: Binding the network host
It’s important to define network.host, as by default it is bound to localhost. You can use {ref}/modules-network.html[core network host settings] or ec2 specific host settings:
EC2 Network Host
When the discovery-ec2 plugin is installed, the following are also allowed as valid network host settings:
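EC2 Host Value | Description |
---|---|
_ec2:privateIpv4_ | The private IP address (ipv4) of the machine. |
_ec2:privateDns_ | The private host of the machine. |
_ec2:publicIpv4_ | The public IP address (ipv4) of the machine. |
_ec2:publicDns_ | The public host of the machine. |
_ec2:privateIp_ | equivalent to _ec2:privateIpv4_. |
_ec2:publicIp_ | equivalent to _ec2:publicIpv4_. |
_ec2_ | equivalent to _ec2:privateIpv4_. |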
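For instance, binding a node to its private address could look like the fragment below. This is a sketch that assumes the EC2 host value _ec2:privateIpv4_ (the private IPv4 address of the machine) is the one you want; pick whichever value matches how your nodes reach each other.

```yaml
# Sketch: bind to the instance's private IPv4 address as reported by EC2.
# Requires the discovery-ec2 plugin to be installed on the node.
network.host: _ec2:privateIpv4_
```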
Recommended EC2 Permissions
EC2 discovery requires making a call to the EC2 service. You’ll want to set up an IAM policy to allow this. You can create a custom policy via the IAM Management Console. It should look similar to this:
{
  "Statement": [
    {
      "Action": [
        "ec2:DescribeInstances"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ]
    }
  ],
  "Version": "2012-10-17"
}
Filtering by Tags
EC2 discovery can also filter the machines to include in the cluster based on tags (and not just groups). The settings to use include the discovery.ec2.tag. prefix. For example, if you defined a tag stage in EC2 and set it to dev, setting discovery.ec2.tag.stage to dev will include only instances that have a tag with the key stage and the value dev. Adding multiple discovery.ec2.tag settings will require all of those tags to be set for the instance to be included.
One practical use for tag filtering is when an ec2 cluster contains many nodes that are not running Elasticsearch. In
this case (particularly with high discovery.zen.ping_timeout
values) there is a risk that a new node’s discovery phase
will end before it has found the cluster (which will result in it declaring itself master of a new cluster with the same
name - highly undesirable). Tagging Elasticsearch ec2 nodes and then filtering by that tag will resolve this issue.
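The tag filter described in this section can be sketched in elasticsearch.yml as follows. The stage tag and its dev value come from the example above; the second tag, role, and its value are hypothetical and only show that several tag settings combine as an AND.

```yaml
# Sketch: only instances carrying BOTH tags are considered during discovery.
discovery.zen.hosts_provider: ec2

# From the example above: tag key "stage" must have the value "dev".
discovery.ec2.tag.stage: dev

# Hypothetical second tag, used here to separate Elasticsearch nodes
# from other instances running in the same account.
discovery.ec2.tag.role: elasticsearch
```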
Automatic Node Attributes
Although this does not depend on actually using ec2 for discovery (it still requires the discovery-ec2 plugin to be installed), the plugin can automatically add node attributes relating to EC2. Currently it adds only an aws_availability_zone node attribute, which is the availability zone of the current node; in the future this may support other attributes. Attributes can be used to isolate primary and replica shards across availability zones by using the {ref}/allocation-awareness.html[Allocation Awareness] feature.
To enable it, set cloud.node.auto_attributes to true in the settings. For example:
cloud.node.auto_attributes: true
cluster.routing.allocation.awareness.attributes: aws_availability_zone
Best Practices in AWS
A collection of best practices and other information about running Elasticsearch on AWS.
Instance/Disk
When selecting disk, please be aware of the following order of preference:
- EFS - Avoid. The sacrifices made to offer durability, shared storage, and grow/shrink come at a performance cost, and such file systems have been known to cause corruption of indices. Because Elasticsearch is distributed and has built-in replication, the benefits that EFS offers are not needed.
- EBS - Works well if running a small cluster (1-2 nodes) that cannot easily tolerate the loss of all storage backing a node, or if running indices with no replicas. If EBS is used, leverage provisioned IOPS to ensure performance.
- Instance Store - When running larger clusters with replicas, the ephemeral nature of Instance Store is ideal, since Elasticsearch can tolerate the loss of shards. With Instance Store you get the performance benefit of having disk physically attached to the host running the instance, and also the cost benefit of not paying extra for EBS.
Prefer Amazon Linux AMIs: since Elasticsearch runs on the JVM, OS dependencies are minimal, and you can benefit from the lightweight nature, support, and EC2-specific performance tweaks that the Amazon Linux AMIs offer.
Networking
- Network throttling takes place on smaller instance types, in the form of both bandwidth and number of connections. Therefore, if a large number of connections is needed and networking is becoming a bottleneck, avoid instance types with networking labeled as Moderate or Low.
- Multicast is not supported, even in a VPC; instead, use the AWS cloud plugin, which joins by performing a security group lookup.
- When running in multiple availability zones, be sure to leverage {ref}/allocation-awareness.html[shard allocation awareness] so that not all copies of shard data reside in the same availability zone.
- Do not span a cluster across regions. If necessary, use cross cluster search.
Misc
- If you have split your nodes into roles, consider tagging the EC2 instances by role to make it easier to filter and view your EC2 instances in the AWS console.
- Consider enabling termination protection for all of your instances to avoid accidentally terminating a node in the cluster and causing a potentially disruptive reallocation.