"Fossies" - the Fresh Open Source Software Archive

Member "elasticsearch-6.8.23/docs/plugins/discovery.asciidoc" (29 Dec 2021, 1302 Bytes) of package /linux/www/elasticsearch-6.8.23-src.tar.gz:


As a special service "Fossies" has tried to format the requested source page into HTML format (assuming AsciiDoc format). Alternatively you can here view or download the uninterpreted source code file. A member file download can also be achieved by clicking within a package contents listing on the according byte size field.

Discovery Plugins

Discovery plugins extend Elasticsearch by adding new discovery mechanisms that can be used instead of {ref}/modules-discovery-zen.html[Zen Discovery].

Core discovery plugins

The core discovery plugins are:

EC2 discovery

The EC2 discovery plugin uses the AWS API for unicast discovery.

Azure Classic discovery

The Azure Classic discovery plugin uses the Azure Classic API for unicast discovery.

GCE discovery

The Google Compute Engine discovery plugin uses the GCE API for unicast discovery.

File-based discovery

The File-based discovery plugin allows providing the unicast hosts list through a dynamically updatable file.

Community contributed discovery plugins

A number of discovery plugins have been contributed by our community:

EC2 Discovery Plugin

The EC2 discovery plugin uses the AWS API for unicast discovery.

If you are looking for a hosted solution of Elasticsearch on AWS, please visit http://www.elastic.co/cloud.

Installation

This plugin can be installed using the plugin manager:

sudo bin/elasticsearch-plugin install discovery-ec2

The plugin must be installed on every node in the cluster, and each node must be restarted after installation.

This plugin can be downloaded for offline install from {plugin_url}/discovery-ec2/discovery-ec2-{version}.zip.

Removal

The plugin can be removed with the following command:

sudo bin/elasticsearch-plugin remove discovery-ec2

The node must be stopped before removing the plugin.

Getting started with AWS

The plugin provides a hosts provider for zen discovery named ec2. This hosts provider finds other Elasticsearch instances in EC2 through AWS metadata. Authentication is done using IAM Role credentials by default. To enable the plugin, set the unicast host provider for Zen discovery to ec2:

discovery.zen.hosts_provider: ec2

Settings

EC2 host discovery supports a number of settings. Some settings are sensitive and must be stored in the {ref}/secure-settings.html[elasticsearch keystore]. For example, to use explicit AWS access keys:

bin/elasticsearch-keystore add discovery.ec2.access_key
bin/elasticsearch-keystore add discovery.ec2.secret_key

The following are the available discovery settings. All should be prefixed with discovery.ec2.. Those that must be stored in the keystore are marked as Secure.
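
For example, a minimal elasticsearch.yml sketch combining the ec2 hosts provider with a few of these settings (the endpoint and security group name below are illustrative):

discovery.zen.hosts_provider: ec2
discovery.ec2.endpoint: ec2.eu-west-1.amazonaws.com
discovery.ec2.groups: my-security-group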

access_key

An ec2 access key. The secret_key setting must also be specified. (Secure)

secret_key

An ec2 secret key. The access_key setting must also be specified. (Secure)

session_token

An ec2 session token. The access_key and secret_key settings must also be specified. (Secure)

endpoint

The ec2 service endpoint to connect to. See http://docs.aws.amazon.com/general/latest/gr/rande.html#ec2_region. This defaults to ec2.us-east-1.amazonaws.com.

protocol

The protocol to use to connect to ec2. Valid values are either http or https. Defaults to https.

proxy.host

The host name of a proxy to connect to ec2 through.

proxy.port

The port of a proxy to connect to ec2 through.

proxy.username

The username to connect to the proxy.host with. (Secure)

proxy.password

The password to connect to the proxy.host with. (Secure)
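
For example, to route EC2 API calls through a proxy (the proxy host and port here are illustrative), set the plain settings in elasticsearch.yml:

discovery.ec2.proxy.host: proxy.example.com
discovery.ec2.proxy.port: 8080

and add the secure credentials to the keystore:

bin/elasticsearch-keystore add discovery.ec2.proxy.username
bin/elasticsearch-keystore add discovery.ec2.proxy.password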

read_timeout

The socket timeout for connecting to ec2. The value should specify the unit. For example, a value of 5s specifies a 5 second timeout. The default value is 50 seconds.

groups

Either a comma-separated list or an array of security groups. Only instances with the provided security groups will be used in the cluster discovery. (NOTE: You can provide either the group NAME or the group ID.)

host_type

The type of host to use to communicate with other instances. Can be one of private_ip, public_ip, private_dns, public_dns or tag:TAGNAME where TAGNAME refers to the name of a tag configured for all EC2 instances. Instances which don’t have this tag set will be ignored by the discovery process.

For example, if you defined a tag my-elasticsearch-host in EC2 and set it to myhostname1.mydomain.com, then setting host_type: tag:my-elasticsearch-host will tell the EC2 discovery plugin to read the host name from the my-elasticsearch-host tag. In this case, it will be resolved to myhostname1.mydomain.com. Read more about EC2 Tags.

Defaults to private_ip.
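
For example, to use the hypothetical tag described above:

discovery.ec2.host_type: tag:my-elasticsearch-host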

availability_zones

Either a comma separated list or array based list of availability zones. Only instances within the provided availability zones will be used in the cluster discovery.

any_group

If set to false, will require all security groups to be present for the instance to be used for the discovery. Defaults to true.

node_cache_time

How long the list of hosts is cached to prevent further requests to the AWS API. Defaults to 10s.

All secure settings of this plugin are {ref}/secure-settings.html#reloadable-secure-settings[reloadable]. After you reload the settings, an AWS SDK client with the latest settings from the keystore will be used.
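
For example, after updating the keys in the keystore on each node, you can reload the secure settings without restarting the nodes:

POST _nodes/reload_secure_settings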

Important
Binding the network host

It’s important to define network.host as by default it’s bound to localhost.

You can use {ref}/modules-network.html[core network host settings] or ec2 specific host settings:

EC2 Network Host

When the discovery-ec2 plugin is installed, the following are also allowed as valid network host settings:

ec2:privateIpv4

The private IP address (ipv4) of the machine.

ec2:privateDns

The private host of the machine.

ec2:publicIpv4

The public IP address (ipv4) of the machine.

ec2:publicDns

The public host of the machine.

ec2:privateIp

Equivalent to ec2:privateIpv4.

ec2:publicIp

Equivalent to ec2:publicIpv4.

ec2

Equivalent to ec2:privateIpv4.
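
For example, to bind to the private IPv4 address of the instance (a sketch, using the same underscore notation as the GCE examples later in this document):

network.host: _ec2:privateIpv4_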

Recommended EC2 Permissions

EC2 discovery requires making a call to the EC2 service. You’ll want to set up an IAM policy to allow this. You can create a custom policy via the IAM Management Console. It should look similar to this.

{
  "Statement": [
    {
      "Action": [
        "ec2:DescribeInstances"
      ],
      "Effect": "Allow",
      "Resource": [
        "*"
      ]
    }
  ],
  "Version": "2012-10-17"
}

Filtering by Tags

The ec2 discovery can also filter machines to include in the cluster based on tags (and not just groups). The settings to use include the discovery.ec2.tag. prefix. For example, if you defined a tag stage in EC2 and set it to dev, setting discovery.ec2.tag.stage to dev will only filter instances with a tag key set to stage, and a value of dev. Adding multiple discovery.ec2.tag settings will require all of those tags to be set for the instance to be included.

One practical use for tag filtering is when an ec2 cluster contains many nodes that are not running Elasticsearch. In this case (particularly with high discovery.zen.ping_timeout values) there is a risk that a new node’s discovery phase will end before it has found the cluster (which will result in it declaring itself master of a new cluster with the same name - highly undesirable). Tagging Elasticsearch ec2 nodes and then filtering by that tag will resolve this issue.
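
For example, a sketch combining the hosts provider with tag filtering (the stage tag mirrors the example above; the role tag is hypothetical):

discovery.zen.hosts_provider: ec2
discovery.ec2.tag.stage: dev
discovery.ec2.tag.role: elasticsearch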

Automatic Node Attributes

Even when EC2 is not used for discovery (though the discovery-ec2 plugin must still be installed), the plugin can automatically add node attributes relating to EC2. In the future this may support other attributes, but currently it will only add an aws_availability_zone node attribute, which is the availability zone of the current node. Attributes can be used to isolate primary and replica shards across availability zones by using the {ref}/allocation-awareness.html[Allocation Awareness] feature.

In order to enable it, set cloud.node.auto_attributes to true in the settings. For example:

cloud.node.auto_attributes: true
cluster.routing.allocation.awareness.attributes: aws_availability_zone

Best Practices in AWS

This section collects best practices and other information about running Elasticsearch on AWS.

Instance/Disk

When selecting a disk type, please be aware of the following order of preference:

  • EFS - Avoid. The sacrifices made to offer durability, shared storage, and grow/shrink capability come at a performance cost; such file systems have been known to cause corruption of indices, and because Elasticsearch is distributed and has built-in replication, the benefits that EFS offers are not needed.

  • EBS - Works well if you are running a small cluster (1-2 nodes) and cannot easily tolerate the loss of all storage backing a node, or if you are running indices with no replicas. If EBS is used, then leverage provisioned IOPS to ensure performance.

  • Instance Store - When running clusters of larger size and with replicas the ephemeral nature of Instance Store is ideal since Elasticsearch can tolerate the loss of shards. With Instance Store one gets the performance benefit of having disk physically attached to the host running the instance and also the cost benefit of avoiding paying extra for EBS.

Prefer Amazon Linux AMIs: since Elasticsearch runs on the JVM, OS dependencies are very minimal, and you can benefit from the lightweight nature, support, and EC2-specific performance tweaks that the Amazon Linux AMIs offer.

Networking
  • Network throttling takes place on smaller instance types, both in bandwidth and in number of connections. Therefore, if a large number of connections is needed and networking is becoming a bottleneck, avoid instance types with networking labeled as Moderate or Low.

  • Multicast is not supported, even in a VPC; the aws cloud plugin instead joins the cluster by performing a security group lookup.

  • When running in multiple availability zones be sure to leverage {ref}/allocation-awareness.html[shard allocation awareness] so that not all copies of shard data reside in the same availability zone.

  • Do not span a cluster across regions. If necessary, use cross cluster search.

Misc
  • If you have split your nodes into roles, consider tagging the EC2 instances by role to make it easier to filter and view your EC2 instances in the AWS console.

  • Consider enabling termination protection for all of your instances to avoid accidentally terminating a node in the cluster and causing a potentially disruptive reallocation.

Azure Classic Discovery Plugin

The Azure Classic Discovery plugin uses the Azure Classic API for unicast discovery.

deprecated[5.0.0, Use coming Azure ARM Discovery plugin instead]

Installation

This plugin can be installed using the plugin manager:

sudo bin/elasticsearch-plugin install discovery-azure-classic

The plugin must be installed on every node in the cluster, and each node must be restarted after installation.

This plugin can be downloaded for offline install from {plugin_url}/discovery-azure-classic/discovery-azure-classic-{version}.zip.

Removal

The plugin can be removed with the following command:

sudo bin/elasticsearch-plugin remove discovery-azure-classic

The node must be stopped before removing the plugin.

Azure Virtual Machine Discovery

Azure VM discovery allows you to use the Azure APIs to perform automatic discovery (similar to multicast in non-hostile multicast environments). Here is a simple sample configuration:

cloud:
    azure:
        management:
             subscription.id: XXX-XXX-XXX-XXX
             cloud.service.name: es-demo-app
             keystore:
                   path: /path/to/azurekeystore.pkcs12
                   password: WHATEVER
                   type: pkcs12

discovery:
    zen.hosts_provider: azure

Important
The keystore file must be placed in a directory accessible by Elasticsearch, such as the config directory.

Important
Binding the network host

It’s important to define network.host as by default it’s bound to localhost.

You can use {ref}/modules-network.html[core network host settings]. For example, en0.
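
For example, to bind to a specific network interface:

network.host: _en0_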

How to start (short story)
  • Create Azure instances

  • Install Elasticsearch

  • Install Azure plugin

  • Modify elasticsearch.yml file

  • Start Elasticsearch

Azure credential API settings

The following settings can further control the credential API:

cloud.azure.management.keystore.path

/path/to/keystore

cloud.azure.management.keystore.type

pkcs12, jceks or jks. Defaults to pkcs12.

cloud.azure.management.keystore.password

your_password for the keystore

cloud.azure.management.subscription.id

your_azure_subscription_id

cloud.azure.management.cloud.service.name

your_azure_cloud_service_name. This is the cloud service name/DNS but without the cloudapp.net part. So if the DNS name is abc.cloudapp.net then the cloud.service.name to use is just abc.
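
For example, if the DNS name is abc.cloudapp.net:

cloud.azure.management.cloud.service.name: abc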

Advanced settings

The following settings can further control the discovery:

discovery.azure.host.type

Either public_ip or private_ip (default). Azure discovery will use the one you set to ping other nodes.

discovery.azure.endpoint.name

When using public_ip, this setting is used to identify the endpoint name used to forward requests to Elasticsearch (that is, the transport port name). Defaults to elasticsearch. In the Azure management console, you could define an endpoint named elasticsearch that forwards, for example, requests arriving on the public IP on port 8100 to the virtual machine on port 9300.

discovery.azure.deployment.name

Deployment name if any. Defaults to the value set with cloud.azure.management.cloud.service.name.

discovery.azure.deployment.slot

Either staging or production (default).

For example:

discovery:
    type: azure
    azure:
        host:
            type: private_ip
        endpoint:
            name: elasticsearch
        deployment:
            name: your_azure_cloud_service_name
            slot: production

Setup process for Azure Discovery

We describe here one strategy, which is to hide the Elasticsearch cluster from the outside world.

With this strategy, only VMs behind the same virtual port can talk to each other. That means that with this mode, you can use Elasticsearch unicast discovery to build a cluster, using the Azure API to retrieve information about your nodes.

Prerequisites

Before starting, you need to have:

  • A Windows Azure account

  • OpenSSL that isn’t from MacPorts; specifically, OpenSSL 1.0.1f 6 Jan 2014 doesn’t seem to create a valid keypair for ssh. FWIW, OpenSSL 1.0.1c 10 May 2012 on Ubuntu 14.04 LTS is known to work.

  • SSH keys and certificate

    You should follow this guide to learn how to create or use existing SSH keys. If you have already done this, you can skip the following.

    Here is a description on how to generate SSH keys using openssl:

    # You may want to use another dir than /tmp
    cd /tmp
    openssl req -x509 -nodes -days 365 -newkey rsa:2048 -keyout azure-private.key -out azure-certificate.pem
    chmod 600 azure-private.key azure-certificate.pem
    openssl x509 -outform der -in azure-certificate.pem -out azure-certificate.cer

    Generate a keystore that the plugin will use to authenticate all Azure API calls with a certificate.

    # Generate a keystore (azurekeystore.pkcs12)
    # Transform private key to PEM format
    openssl pkcs8 -topk8 -nocrypt -in azure-private.key -inform PEM -out azure-pk.pem -outform PEM
    # Transform certificate to PEM format
    openssl x509 -inform der -in azure-certificate.cer -out azure-cert.pem
    cat azure-cert.pem azure-pk.pem > azure.pem.txt
    # You MUST enter a password!
    openssl pkcs12 -export -in azure.pem.txt -out azurekeystore.pkcs12 -name azure -noiter -nomaciter

    Upload the azure-certificate.cer file both in the Elasticsearch Cloud Service (under Manage Certificates), and under Settings → Manage Certificates.

    Important
    When prompted for a password, you need to enter a non empty one.

    See this guide for more details about how to create keys for Azure.

    Once done, you need to upload your certificate in Azure:

    • Go to the management console.

    • Sign in using your account.

    • Click on Portal.

    • Go to Settings (bottom of the left list)

    • On the bottom bar, click on Upload and upload your azure-certificate.cer file.

    You may want to use the Windows Azure Command-Line Tool:

  • Install NodeJS, for example using homebrew on MacOS X:

    brew install node
  • Install Azure tools

    sudo npm install azure-cli -g
  • Download and import your azure settings:

    # This will open a browser and will download a .publishsettings file
    azure account download
    
    # Import this file (we have downloaded it to /tmp)
    # Note, it will create needed files in ~/.azure. You can remove azure.publishsettings when done.
    azure account import /tmp/azure.publishsettings

Creating your first instance

You need to have a storage account available. Check Azure Blob Storage documentation for more information.

You will need to choose the operating system you want to run on. To get a list of official available images, run:

azure vm image list

Let’s say we are going to deploy an Ubuntu image on an extra small instance in West Europe:

Azure cluster name: azure-elasticsearch-cluster
Image: b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-13_10-amd64-server-20130808-alpha3-en-us-30GB
VM Name: myesnode1
VM Size: extrasmall
Location: West Europe
Login: elasticsearch
Password: password1234!!

Using command line:

azure vm create azure-elasticsearch-cluster \
                b39f27a8b8c64d52b05eac6a62ebad85__Ubuntu-13_10-amd64-server-20130808-alpha3-en-us-30GB \
                --vm-name myesnode1 \
                --location "West Europe" \
                --vm-size extrasmall \
                --ssh 22 \
                --ssh-cert /tmp/azure-certificate.pem \
                elasticsearch password1234\!\!

You should see something like:

info:    Executing command vm create
+ Looking up image
+ Looking up cloud service
+ Creating cloud service
+ Retrieving storage accounts
+ Configuring certificate
+ Creating VM
info:    vm create command OK

Now, your first instance is started.

Tip
Working with SSH

You need to give the private key and username each time you log in to your instance:

ssh -i ~/.ssh/azure-private.key elasticsearch@myescluster.cloudapp.net

But you can also define it once in the ~/.ssh/config file:

Host *.cloudapp.net
 User elasticsearch
 StrictHostKeyChecking no
 UserKnownHostsFile=/dev/null
 IdentityFile ~/.ssh/azure-private.key

Next, you need to install Elasticsearch on your new instance. First, copy your keystore to the instance, then connect to the instance using SSH:

scp /tmp/azurekeystore.pkcs12 azure-elasticsearch-cluster.cloudapp.net:/home/elasticsearch
ssh azure-elasticsearch-cluster.cloudapp.net

Once connected, install Elasticsearch:

# Install Latest Java version
# Read http://www.webupd8.org/2012/09/install-oracle-java-8-in-ubuntu-via-ppa.html for details
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

# If you want to install OpenJDK instead
# sudo apt-get update
# sudo apt-get install openjdk-8-jre-headless

# Download Elasticsearch
curl -s https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-{version}.deb -o elasticsearch-{version}.deb

# Prepare Elasticsearch installation
sudo dpkg -i elasticsearch-{version}.deb

Check that Elasticsearch is running:

GET /

This command should give you a JSON result:

{
  "name" : "Cp8oag6",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "AT69_T_DTp-1qgIJlatQqA",
  "version" : {
    "number" : "{version}",
    "build_flavor" : "{build_flavor}",
    "build_type" : "zip",
    "build_hash" : "f27399d",
    "build_date" : "2016-03-30T09:51:41.449Z",
    "build_snapshot" : false,
    "lucene_version" : "{lucene_version}",
    "minimum_wire_compatibility_version" : "1.2.3",
    "minimum_index_compatibility_version" : "1.2.3"
  },
  "tagline" : "You Know, for Search"
}

Install Elasticsearch cloud azure plugin

# Stop Elasticsearch
sudo service elasticsearch stop

# Install the plugin
sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install discovery-azure-classic

# Configure it
sudo vi /etc/elasticsearch/elasticsearch.yml

And add the following lines:

# If you don't remember your account id, you may get it with `azure account list`
cloud:
    azure:
        management:
             subscription.id: your_azure_subscription_id
             cloud.service.name: your_azure_cloud_service_name
             keystore:
                   path: /home/elasticsearch/azurekeystore.pkcs12
                   password: your_password_for_keystore

discovery:
    type: azure

# Recommended (warning: non durable disk)
# path.data: /mnt/resource/elasticsearch/data

Restart Elasticsearch:

sudo service elasticsearch start

If anything goes wrong, check your logs in /var/log/elasticsearch.

Scaling Out!

First, you need to create an image of your previous machine. Disconnect from the machine and run the following commands locally:

# Shutdown the instance
azure vm shutdown myesnode1

# Create an image from this instance (it could take some minutes)
azure vm capture myesnode1 esnode-image --delete

# Note that the previous instance has been deleted (mandatory)
# So you need to create it again and BTW create other instances.

azure vm create azure-elasticsearch-cluster \
                esnode-image \
                --vm-name myesnode1 \
                --location "West Europe" \
                --vm-size extrasmall \
                --ssh 22 \
                --ssh-cert /tmp/azure-certificate.pem \
                elasticsearch password1234\!\!

Tip

Azure may change the endpoint’s public IP address, and DNS propagation can take a few minutes before you can connect again using the DNS name. If needed, you can get the IP address from Azure using:

# Look at Network `Endpoints 0 Vip`
azure vm show myesnode1

Let’s start more instances!

for x in $(seq  2 10)
	do
		echo "Launching azure instance #$x..."
		azure vm create azure-elasticsearch-cluster \
		                esnode-image \
		                --vm-name myesnode$x \
		                --vm-size extrasmall \
		                --ssh $((21 + $x)) \
		                --ssh-cert /tmp/azure-certificate.pem \
		                --connect \
		                elasticsearch password1234\!\!
	done

If you want to remove your running instances:

azure vm delete myesnode1

GCE Discovery Plugin

The Google Compute Engine Discovery plugin uses the GCE API for unicast discovery.

Installation

This plugin can be installed using the plugin manager:

sudo bin/elasticsearch-plugin install discovery-gce

The plugin must be installed on every node in the cluster, and each node must be restarted after installation.

This plugin can be downloaded for offline install from {plugin_url}/discovery-gce/discovery-gce-{version}.zip.

Removal

The plugin can be removed with the following command:

sudo bin/elasticsearch-plugin remove discovery-gce

The node must be stopped before removing the plugin.

GCE Virtual Machine Discovery

Google Compute Engine VM discovery allows you to use the Google APIs to perform automatic discovery (similar to multicast in non-hostile multicast environments). Here is a simple sample configuration:

cloud:
  gce:
      project_id: <your-google-project-id>
      zone: <your-zone>
discovery:
      zen.hosts_provider: gce

The following gce settings (prefixed with cloud.gce) are supported:

project_id

Your Google project id. By default the project id will be derived from the instance metadata.

Note: Deriving the project id from system properties or environment variables
(`GOOGLE_CLOUD_PROJECT` or `GCLOUD_PROJECT`) is not supported.

zone

Helps to retrieve instances running in a given zone. It should be one of the GCE supported zones. By default the zone will be derived from the instance metadata. See also Using GCE zones.

retry

If set to true, the client will use an ExponentialBackOff policy to retry failed HTTP requests. Defaults to true.

max_wait

The maximum elapsed time after the client starts to retry. If the elapsed time exceeds max_wait, the client stops retrying. A negative value means that it will retry indefinitely. Defaults to 0s (retry indefinitely).

refresh_interval

How long the list of hosts is cached to prevent further requests to the GCE API. 0s disables caching. A negative value will cause infinite caching. Defaults to 0s.
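
For example, a sketch with explicit (illustrative) values for these client settings:

cloud.gce.retry: true
cloud.gce.max_wait: 30s
cloud.gce.refresh_interval: 60s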

Important
Binding the network host

It’s important to define network.host as by default it’s bound to localhost.

You can use {ref}/modules-network.html[core network host settings] or gce specific host settings:

GCE Network Host

When the discovery-gce plugin is installed, the following are also allowed as valid network host settings:

gce:privateIp:X

The private IP address of the machine for a given network interface.

gce:hostname

The hostname of the machine.

gce

Same as gce:privateIp:0 (recommended).

Examples:

# get the IP address from network interface 1
network.host: _gce:privateIp:1_
# Using GCE internal hostname
network.host: _gce:hostname_
# shortcut for _gce:privateIp:0_ (recommended)
network.host: _gce_

How to start (short story)
  • Create Google Compute Engine instance (with compute rw permissions)

  • Install Elasticsearch

  • Install Google Compute Engine Cloud plugin

  • Modify elasticsearch.yml file

  • Start Elasticsearch

Setting up GCE Discovery

Prerequisites

Before starting, you need a Google Cloud project and the gcloud SDK installed.

If you have not set it yet, you can define the default project you will work on:

gcloud config set project es-cloud

Login to Google Cloud

If you haven’t already, log in to Google Cloud:

gcloud auth login

This will open your browser. You will be asked to sign-in to a Google account and authorize access to the Google Cloud SDK.

Creating your first instance

gcloud compute instances create myesnode1 \
       --zone <your-zone> \
       --scopes compute-rw

When done, a report like this one should appear:

Created [https://www.googleapis.com/compute/v1/projects/es-cloud-1070/zones/us-central1-f/instances/myesnode1].
NAME      ZONE          MACHINE_TYPE  PREEMPTIBLE INTERNAL_IP   EXTERNAL_IP   STATUS
myesnode1 us-central1-f n1-standard-1             10.240.133.54 104.197.94.25 RUNNING

You can now connect to your instance:

# Connect using google cloud SDK
gcloud compute ssh myesnode1 --zone europe-west1-a

# Or using SSH with external IP address
ssh -i ~/.ssh/google_compute_engine 192.158.29.199

Important
Service Account Permissions

It’s important when creating an instance that the correct permissions are set. At a minimum, you must ensure you have:

scopes=compute-rw

Failing to set this will result in unauthorized messages when starting Elasticsearch. See Machine Permissions.

Once connected, install Elasticsearch:

sudo apt-get update

# Download Elasticsearch
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-2.0.0.deb

# Prepare Java installation (Oracle)
sudo echo "deb http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" | sudo tee /etc/apt/sources.list.d/webupd8team-java.list
sudo echo "deb-src http://ppa.launchpad.net/webupd8team/java/ubuntu trusty main" | sudo tee -a /etc/apt/sources.list.d/webupd8team-java.list
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys EEA14886
sudo apt-get update
sudo apt-get install oracle-java8-installer

# Prepare Java installation (or OpenJDK)
# sudo apt-get install java8-runtime-headless

# Prepare Elasticsearch installation
sudo dpkg -i elasticsearch-2.0.0.deb

Install Elasticsearch discovery gce plugin

Install the plugin:

# Use Plugin Manager to install it
sudo bin/elasticsearch-plugin install discovery-gce

Open the elasticsearch.yml file:

sudo vi /etc/elasticsearch/elasticsearch.yml

And add the following lines:

cloud:
  gce:
      project_id: es-cloud
      zone: europe-west1-a
discovery:
      zen.hosts_provider: gce

Start Elasticsearch:

sudo /etc/init.d/elasticsearch start

If anything goes wrong, you should check logs:

tail -f /var/log/elasticsearch/elasticsearch.log

If needed, you can change log level to trace by opening log4j2.properties:

sudo vi /etc/elasticsearch/log4j2.properties

and adding the following line:

# discovery
logger.discovery_gce.name = discovery.gce
logger.discovery_gce.level = trace

Cloning your existing machine

In order to build a cluster on many nodes, you can clone your configured instance to new nodes. You won’t have to reinstall everything!

First create an image of your running instance and upload it to Google Cloud Storage:

# Create an image of your current instance
sudo /usr/bin/gcimagebundle -d /dev/sda -o /tmp/

# An image has been created in `/tmp` directory:
ls /tmp
e4686d7f5bf904a924ae0cfeb58d0827c6d5b966.image.tar.gz

# Upload your image to Google Cloud Storage:
# Create a bucket to hold your image, let's say `esimage`:
gsutil mb gs://esimage

# Copy your image to this bucket:
gsutil cp /tmp/e4686d7f5bf904a924ae0cfeb58d0827c6d5b966.image.tar.gz gs://esimage

# Then add your image to images collection:
gcloud compute images create elasticsearch-2-0-0 --source-uri gs://esimage/e4686d7f5bf904a924ae0cfeb58d0827c6d5b966.image.tar.gz

# If the previous command did not work for you, logout from your instance
# and launch the same command from your local machine.

Start new instances

Now that you have an image, you can create as many instances as you need:

# Just change node name (here myesnode2)
gcloud compute instances create myesnode2 --image elasticsearch-2-0-0 --zone europe-west1-a

# If you want to provide all details directly, you can use:
gcloud compute instances create myesnode2 --image=elasticsearch-2-0-0 \
       --zone europe-west1-a --machine-type f1-micro --scopes=compute-rw

Remove an instance (aka shut it down)

You can use Google Cloud Console or CLI to manage your instances:

# Stopping and removing instances
gcloud compute instances delete myesnode1 myesnode2 \
       --zone=europe-west1-a

# Consider removing disk as well if you don't need them anymore
gcloud compute disks delete boot-myesnode1 boot-myesnode2  \
       --zone=europe-west1-a

Using GCE zones

cloud.gce.zone helps to retrieve instances running in a given zone. It should be one of the GCE supported zones.

The GCE discovery can support multiple zones, although you need to be aware of network latency between zones. To enable discovery across more than one zone, just add your list of zones to the cloud.gce.zone setting:

cloud:
  gce:
      project_id: <your-google-project-id>
      zone: ["<your-zone1>", "<your-zone2>"]
discovery:
      zen.hosts_provider: gce

Filtering by tags

The GCE discovery can also filter machines to include in the cluster based on tags using the discovery.gce.tags setting. For example, setting discovery.gce.tags to dev will only include instances having a tag set to dev. Setting several tags will require all of those tags to be set for the instance to be included.

One practical use for tag filtering is when a GCE cluster contains many nodes that are not running Elasticsearch. In this case (particularly with high discovery.zen.ping_timeout values) there is a risk that a new node’s discovery phase will end before it has found the cluster (which will result in it declaring itself master of a new cluster with the same name - highly undesirable). Adding a tag to the Elasticsearch GCE nodes and then filtering by that tag will resolve this issue.

Add your tag when building the new instance:

gcloud compute instances create myesnode1 --project=es-cloud \
       --scopes=compute-rw \
       --tags=elasticsearch,dev

Then, define it in elasticsearch.yml:

cloud:
  gce:
      project_id: es-cloud
      zone: europe-west1-a
discovery:
      zen.hosts_provider: gce
      gce:
            tags: elasticsearch, dev

Changing default transport port

By default, the Elasticsearch GCE plugin assumes that Elasticsearch runs on the default port, 9300. But you can specify the port Elasticsearch should use with the Google Compute Engine metadata key es_port:

When creating instance

Add --metadata es_port=9301 option:

# when creating first instance
gcloud compute instances create myesnode1 \
       --scopes=compute-rw,storage-full \
       --metadata es_port=9301

# when creating an instance from an image
gcloud compute instances create myesnode2 --image=elasticsearch-1-0-0-RC1 \
       --zone europe-west1-a --machine-type f1-micro --scopes=compute-rw \
       --metadata es_port=9301

On a running instance

gcloud compute instances add-metadata myesnode1 \
       --zone europe-west1-a \
       --metadata es_port=9301

GCE Tips

Store project id locally

If you don’t want to repeat the project id each time, you can save it in the local gcloud config:

gcloud config set project es-cloud

Machine Permissions

If you have created a machine without the correct permissions, you will see 403 unauthorized error messages. To change the permissions of an existing instance, first stop the instance, then click Edit and scroll down to Access Scopes to change the permissions. The other way to alter these permissions is to delete the instance (NOT THE DISK) and create another one with the correct permissions.

Creating machines with gcloud

Ensure the following flags are set:

--scopes=compute-rw

Creating with console (web)

When creating an instance using the web portal, click Show advanced options.

At the bottom of the page, under PROJECT ACCESS, choose >> Compute >> Read Write.

Creating with knife google

Set the service account scopes when creating the machine:

knife google server create www1 \
    -m n1-standard-1 \
    -I debian-8 \
    -Z us-central1-a \
    -i ~/.ssh/id_rsa \
    -x jdoe \
    --gce-service-account-scopes https://www.googleapis.com/auth/compute.full_control

Or, you may use the alias:

    --gce-service-account-scopes compute-rw

Testing GCE

Integration tests in this plugin require a working GCE configuration and are therefore disabled by default. To enable the tests, prepare a config file elasticsearch.yml with the following content:

cloud:
  gce:
      project_id: es-cloud
      zone: europe-west1-a
discovery:
      zen.hosts_provider: gce

Replace project_id and zone with your settings.

To run the tests:

mvn -Dtests.gce=true -Dtests.config=/path/to/config/file/elasticsearch.yml clean test

File-Based Discovery Plugin

The functionality provided by the discovery-file plugin is now available in Elasticsearch without requiring a plugin. This plugin still exists to ensure backwards compatibility, but it will be removed in a future version.

On installation, this plugin creates a file at $ES_PATH_CONF/discovery-file/unicast_hosts.txt that comprises comments that describe how to use it. It is preferable not to install this plugin and instead to create this file, and its containing directory, using standard tools.
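
For reference, a minimal sketch of such a unicast_hosts.txt (the addresses below are illustrative). Each line contains one host, optionally with a transport port; lines starting with # are comments:

10.10.10.5
10.10.10.6:9305
# IPv6 addresses with a port must be wrapped in brackets
[2001:db8::1]:9301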

Installation

This plugin can be installed using the plugin manager:

sudo bin/elasticsearch-plugin install discovery-file

The plugin must be installed on every node in the cluster, and each node must be restarted after installation.

This plugin can be downloaded for offline install from {plugin_url}/discovery-file/discovery-file-{version}.zip.

Removal

The plugin can be removed with the following command:

sudo bin/elasticsearch-plugin remove discovery-file

The node must be stopped before removing the plugin.