Managing Amazon Elasticsearch Service Domains

As the size and number of documents in your Amazon Elasticsearch Service (Amazon ES)
domain grow and as network
traffic increases, you likely will need to update the configuration of your Elasticsearch
cluster. To know when it's time to reconfigure your domain, you need to monitor domain
metrics. You also have the option of managing your own index snapshots, auditing
data-related API calls to your domain, and assigning tags to your domain. This section
describes how to perform these and other tasks related to managing your domains.

About Dedicated Master Nodes

Amazon Elasticsearch Service (Amazon ES) uses dedicated master nodes to increase cluster
stability. A dedicated master node is a cluster node that performs cluster management
tasks, but does not hold data or respond to data upload requests. This offloading of
cluster management tasks increases the stability of your Elasticsearch clusters.

Note

We recommend that you allocate three dedicated master nodes for each Amazon ES domain
in production.

A dedicated master node performs the following cluster management tasks:

Tracks all nodes in the cluster

Tracks the number of indices in the cluster

Tracks the number of shards belonging to each index

Maintains routing information for nodes in the cluster

Updates the cluster state after state changes, such as creating an index and
adding or removing nodes in the cluster

Replicates changes to the cluster state across all nodes in the cluster

Monitors the health of all cluster nodes by sending heartbeat signals, periodic signals that monitor the
availability of the data nodes in the cluster

The following illustration shows an Amazon ES domain with ten instances. Seven of
the
instances are data nodes and three are dedicated master nodes. Only one of the dedicated
master nodes is active; the two gray dedicated master nodes wait as backup in case
the
active dedicated master node fails. All data upload requests are served by the seven
data nodes, and all cluster management tasks are offloaded to the active dedicated
master node.

Although the dedicated master instances do not process search and query requests,
their size is highly correlated with the number of instances, indices, and shards
that
they can manage. For production clusters, we recommend the following sizing for
dedicated master instances. These recommendations are based on typical workloads on
the
service and can vary based on your workload requirements.

About Configuration Changes

Amazon ES uses a blue-green deployment process when updating domains.
Blue-green typically refers to the practice of running two production environments,
one
live and one idle, and switching the two as you make software changes. In the case
of
Amazon ES, it refers to the practice of creating a new environment for domain updates
and
routing users to the new environment after those updates are complete. The practice
minimizes downtime and maintains the original environment in the event that deployment
to the new environment is unsuccessful.

Domain updates occur when you make configuration changes, but they also occur when
the
Amazon ES team makes certain software changes to the service. If you make configuration
changes, the domain state changes to Processing. If the
Amazon ES team makes software changes, the state remains Active. In both cases, you can review the cluster health and Amazon CloudWatch
metrics and see that the number of nodes in the cluster temporarily doubles while the domain update occurs. In the following
illustration, you can see the number of nodes doubling from 11 to 22 during a
configuration change and returning to 11 when the update is complete.

This temporary doubling can strain the cluster's dedicated master nodes, which
suddenly have twice as many nodes to manage. It is important to maintain sufficient
capacity on dedicated master nodes to handle the overhead that is associated with
these blue-green deployments.

Important

You do not incur any additional charges during
configuration changes and service maintenance. You are billed only for the number
of
nodes that you request for your cluster.

To prevent overloading dedicated master nodes, you can monitor usage with the Amazon
CloudWatch
metrics that are shown in the following table. Use a larger instance type for dedicated
master nodes when these metrics reach their maximum recommended values.

CloudWatch Metric

Guideline

MasterCPUUtilization

Measures the percentage utilization of the CPU for the dedicated
master nodes. We recommend increasing the size of the instance type when
this metric exceeds 40% with a domain status of Active and exceeds 60% with a domain status of Processing.

MasterJVMMemoryPressure

Measures the percentage utilization of the JVM memory for the
dedicated master nodes. We recommend increasing the size of the instance
type when this metric exceeds 60% with a domain status of Active and exceeds 85% with a domain status
of Processing.
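
The guidelines in the preceding table can be expressed as a small check. The following is a minimal sketch, not part of the service: the function name and the dictionary layout are hypothetical, and it assumes that each threshold applies separately to the domain status that it is listed with.

```python
# Guideline thresholds (percent) from the preceding table, keyed by metric
# name and then by domain status. The layout is an illustrative assumption.
THRESHOLDS = {
    "MasterCPUUtilization": {"Active": 40.0, "Processing": 60.0},
    "MasterJVMMemoryPressure": {"Active": 60.0, "Processing": 85.0},
}


def should_upsize_master(metric_name, value_percent, domain_status):
    """Return True when the metric exceeds its guideline for the given status."""
    limit = THRESHOLDS[metric_name][domain_status]
    return value_percent > limit
```

For example, a MasterCPUUtilization value of 45 exceeds the guideline while the domain is Active, but not while it is Processing.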

Enabling Zone Awareness (Console)

Each AWS Region is a separate geographic area with multiple, isolated locations
known as Availability Zones. To prevent data loss and minimize
downtime in the event of node and data center failure, you can use the Amazon ES console
to
allocate nodes and replica index shards that belong to an Elasticsearch cluster across
two
Availability Zones in the same region. This allocation is known as zone
awareness. If you enable zone awareness, you also must use the native
Elasticsearch API to create replica shards for your cluster. Amazon ES distributes
the replicas
across the nodes in the Availability Zones, which increases the availability of your
cluster. Enabling zone awareness for a cluster slightly increases network latencies.

Important

Zone awareness requires an even number of instances in the instance count. The
default configuration for any index is a replica count of 1. If you specify a
replica count of 0 for an index, zone awareness doesn't replicate the shards to the
second Availability Zone. Without replica shards, there are no replicas to
distribute to a second Availability Zone, and enabling the feature doesn't provide
protection from data loss.
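
The two requirements above (an even instance count and at least one replica per index) can be checked before you enable the feature. The following is a minimal sketch with hypothetical function names. The JSON body uses the number_of_replicas index setting of the Elasticsearch update index settings API; sending the request to your domain endpoint is not shown.

```python
import json


def zone_awareness_problems(instance_count, replica_count):
    """List the reasons a configuration would not benefit from zone awareness."""
    problems = []
    if instance_count % 2 != 0:
        problems.append("instance count must be even")
    if replica_count < 1:
        problems.append("no replica shards to place in the second Availability Zone")
    return problems


def replica_settings_body(replica_count):
    """Build the JSON body for an update index settings request (PUT /<index>/_settings)."""
    return json.dumps({"index": {"number_of_replicas": replica_count}})
```

An empty list from zone_awareness_problems means that neither issue applies to the configuration.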

The following illustration shows a four-node cluster with zone awareness enabled.
The
service places all the primary index shards in one Availability Zone and all the replica
shards in the second Availability Zone.

An Elasticsearch cluster is a collection of one or more data nodes, optional dedicated
master nodes, and storage required to run Elasticsearch and operate your Amazon ES
domain. Each
node in an Elasticsearch cluster automatically sends performance metrics to Amazon
CloudWatch in
one-minute intervals. Use the Monitoring tab in the Amazon Elasticsearch Service
console to view these metrics, provided at no charge.

Statistics provide you with broader insight into each metric. For example, view the
Average statistic for the CPUUtilization
metric to compute the average CPU utilization for all nodes in the cluster. Each of the
metrics falls into one of three categories: cluster metrics, dedicated master node
metrics, and EBS volume metrics.

For a list of relevant statistics for each metric, see the tables in Cluster
Metrics. Some statistics are not relevant for a given metric. For
example, the Sum statistic is not meaningful for the
Nodes metric.
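
If you prefer to retrieve a statistic programmatically, the request parameters can be assembled as follows. This is a sketch under stated assumptions: the function name is hypothetical, and the DomainName and ClientId (AWS account ID) dimension names are assumptions that you should verify for your domain. The resulting dictionary has the shape of the keyword arguments that the CloudWatch GetMetricStatistics operation expects, for example through the get_metric_statistics method of an AWS SDK for Python CloudWatch client. The call itself is not made here.

```python
from datetime import datetime, timedelta, timezone


def build_metric_query(domain_name, account_id, metric_name, statistic,
                       hours=1, end=None):
    """Assemble GetMetricStatistics parameters for one Amazon ES metric."""
    end = end or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/ES",
        "MetricName": metric_name,
        # Dimension names are assumptions; confirm them in the CloudWatch console.
        "Dimensions": [
            {"Name": "DomainName", "Value": domain_name},
            {"Name": "ClientId", "Value": account_id},
        ],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 60,  # nodes report metrics in one-minute intervals
        "Statistics": [statistic],
    }
```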

Choose Update graph.

Cluster Metrics

Note

To check your cluster metrics if metrics are unavailable in the Amazon Elasticsearch
Service
console, use Amazon CloudWatch.

The AWS/ES namespace includes the following metrics for clusters.

Metric

Description

ClusterStatus.green

Indicates that all index shards are allocated to nodes in the cluster.

Relevant statistics: Minimum, Maximum

ClusterStatus.yellow

Indicates that the primary shards for all indices are allocated to nodes in a cluster,
but the replica shards for at least one index are not. Single node clusters always
initialize with this cluster status because there is no second node to which a replica
can be assigned. You can either increase your node count to obtain a green cluster
status, or you can use the Elasticsearch API to set the number_of_replicas setting for your index to 0. For more information, see Configuring Amazon Elasticsearch Service Domains and Update Indices Settings in the Elasticsearch documentation.

Relevant statistics: Minimum, Maximum

ClusterStatus.red

Indicates that the primary and replica shards of at least one index are not allocated
to nodes in a cluster. A common cause for this state is a lack of free storage space
on one or more of the data nodes in the cluster. In turn, a lack of free storage space
prevents the service from distributing replica shards to the affected data node or
nodes, and causes all new indices to start with a red cluster status. To recover, you must
add EBS-based storage to existing data nodes, use larger instance types, or delete
the indices and restore them from a snapshot. For more information, see Red Cluster Status.

Relevant statistics: Minimum, Maximum

Nodes

The number of nodes in the Amazon ES cluster.

Relevant statistics: Minimum, Maximum, Average

SearchableDocuments

The total number of searchable documents across all indices in the cluster.

Relevant statistics: Minimum, Maximum, Average

DeletedDocuments

The total number of deleted documents across all indices in the cluster.

Relevant statistics: Minimum, Maximum, Average

CPUUtilization

The maximum percentage of CPU resources used for data nodes in the cluster.

Relevant statistics: Maximum, Average

FreeStorageSpace

The free space, in megabytes, for all data nodes in the cluster. Amazon ES throws
a ClusterBlockException when this metric reaches 0. To recover, you must either delete indices, add larger instances, or add EBS-based
storage to existing instances. To learn more, see Recovering from a Lack of Free Storage Space.

Note

FreeStorageSpace will always be lower than the value that the Elasticsearch _cluster/stats API provides. Amazon ES reserves a percentage of the storage space on each instance
for internal operations.

Relevant statistics: Minimum

ClusterUsedSpace

The total used space, in megabytes, for a cluster. You can view this metric in the
Amazon CloudWatch console, but not in the Amazon ES console.

Relevant statistics: Minimum, Maximum

ClusterIndexWritesBlocked

Indicates whether your cluster is accepting or blocking incoming write requests. A
value of 0 means that the cluster is accepting requests. A value of 1 means that it
is blocking requests.

Many factors can cause a cluster to begin blocking requests. Some common factors include
the following: FreeStorageSpace is too low, JVMMemoryPressure is too high, or CPUUtilization is too high. To alleviate this issue, consider adding more disk space or scaling
your cluster.

Relevant statistics: Maximum

Note

You can view this metric in the Amazon CloudWatch console, but not the Amazon ES console.

JVMMemoryPressure

The maximum percentage of the Java heap used for all data nodes in the cluster.

Relevant statistics: Maximum

AutomatedSnapshotFailure

The number of failed automated snapshots for the cluster. A value of 1 indicates
that no automated snapshot was taken for the domain in the previous 36 hours.

Relevant statistics: Minimum, Maximum

CPUCreditBalance

The remaining CPU credits available for data nodes in the cluster. A CPU credit provides
the performance of a full CPU core for one minute. For more information, see CPU Credits in the Amazon EC2 User Guide for Linux Instances. This metric is available only for the t2.micro.elasticsearch, t2.small.elasticsearch,
and t2.medium.elasticsearch instance types.

Relevant statistics: Minimum

KibanaHealthyNodes

A health check for Kibana. A value of 1 indicates normal behavior. A value of 0 indicates
that Kibana is inaccessible. In most cases, the health of Kibana mirrors the health
of the cluster.

Relevant statistics: Minimum

Note

You can view this metric on the Amazon CloudWatch console, but not the Amazon ES console.

The following screenshot shows the
cluster metrics that are described in the preceding table.
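
The metric descriptions in the preceding table can be combined into a simple health summary. The following is a minimal sketch with a hypothetical function name. It assumes that you already have the latest value of each metric in a dictionary, and that a value of 1 for a ClusterStatus metric means that the status applies.

```python
def summarize_cluster(metrics):
    """Translate the latest cluster metric values into a list of findings."""
    findings = []
    # Assumption: a ClusterStatus.* value of 1 means that the status applies.
    if metrics.get("ClusterStatus.red", 0) >= 1:
        findings.append("red")
    elif metrics.get("ClusterStatus.yellow", 0) >= 1:
        findings.append("yellow")
    elif metrics.get("ClusterStatus.green", 0) >= 1:
        findings.append("green")
    if metrics.get("ClusterIndexWritesBlocked", 0) == 1:
        findings.append("writes blocked")
    if metrics.get("AutomatedSnapshotFailure", 0) == 1:
        findings.append("no automated snapshot in the previous 36 hours")
    if metrics.get("KibanaHealthyNodes", 1) == 0:
        findings.append("Kibana inaccessible")
    return findings
```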

Dedicated Master Node Metrics

The AWS/ES namespace includes the following metrics for dedicated master nodes.

Metric

Description

MasterCPUUtilization

The maximum percentage of CPU resources used by the dedicated master nodes. We recommend
increasing the size of the instance type when this metric reaches 60 percent.

Relevant statistics: Average

MasterFreeStorageSpace

This metric is not relevant and can be ignored. The service does not use master nodes
as data nodes.

MasterJVMMemoryPressure

The maximum percentage of the Java heap used for all dedicated master nodes in the
cluster. We recommend moving to a larger instance type when this metric reaches 85
percent.

Relevant statistics: Maximum

MasterCPUCreditBalance

The remaining CPU credits available for dedicated master nodes in the cluster. A CPU
credit provides the performance of a full CPU core for one minute. For more information,
see CPU Credits in the Amazon EC2 User Guide for Linux Instances. This metric is available only for the t2.micro.elasticsearch, t2.small.elasticsearch,
and t2.medium.elasticsearch instance types.

Relevant statistics: Minimum

MasterReachableFromNode

A health check for MasterNotDiscovered exceptions. A value of 1 indicates normal behavior. A value of 0 indicates that /_cluster/health/ is failing.

Failures mean that the master node stopped or is not reachable. They are usually the
result of a network connectivity issue or AWS dependency problem.

Relevant statistics: Minimum

Note

You can view this metric on the Amazon CloudWatch console, but not the Amazon ES console.

The following screenshot shows the dedicated
master node metrics that are described in the preceding table.

EBS Volume Metrics

The AWS/ES namespace includes the following metrics for EBS volumes.

Metric

Description

ReadLatency

The latency, in seconds, for read operations on EBS volumes.

Relevant statistics: Minimum, Maximum, Average

WriteLatency

The latency, in seconds, for write operations on EBS volumes.

Relevant statistics: Minimum, Maximum, Average

ReadThroughput

The throughput, in bytes per second, for read operations on EBS volumes.

Relevant statistics: Minimum, Maximum, Average

WriteThroughput

The throughput, in bytes per second, for write operations on EBS volumes.

Relevant statistics: Minimum, Maximum, Average

DiskQueueDepth

The number of pending input and output (I/O) requests for an EBS volume.

Relevant statistics: Minimum, Maximum, Average

ReadIOPS

The number of input and output (I/O) operations per second for read operations on
EBS volumes.

Relevant statistics: Minimum, Maximum, Average

WriteIOPS

The number of input and output (I/O) operations per second for write operations on
EBS volumes.

Relevant statistics: Minimum, Maximum, Average

The following screenshot shows the EBS
volume metrics that are described in the preceding table.

Auditing Amazon Elasticsearch Service Domains with AWS CloudTrail

Amazon Elasticsearch Service (Amazon ES) is integrated with AWS CloudTrail, a service
that logs all AWS API calls
made by, or on behalf of, your AWS account. The log files are delivered to an Amazon
S3
bucket that you create and configure with a bucket policy that grants CloudTrail permissions
to write log files to the bucket. CloudTrail captures all Amazon ES configuration
service API
calls, including those submitted by the Amazon Elasticsearch Service console.

You can use the information collected by CloudTrail to monitor activity for your search
domains. You can determine the request that was made to Amazon ES, the source IP address
from
which the request was made, who made the request, and when it was made. To learn more
about CloudTrail, including how to configure and enable it, see the AWS CloudTrail User Guide. To learn more about how to create and configure an S3
bucket for CloudTrail, see Amazon S3 Bucket Policy for CloudTrail.

Amazon Elasticsearch Service Information in CloudTrail

When CloudTrail logging is enabled in your AWS account, API calls made to Amazon Elasticsearch
Service
(Amazon ES) operations are tracked in log files. Amazon ES records are written together
with
other AWS service records in a log file. CloudTrail determines when to create and
write
to a new file based on a time period and file size.

All Amazon ES configuration service operations are logged. For example, calls to
CreateElasticsearchDomain,
DescribeElasticsearchDomain, and
UpdateElasticsearchDomainConfig generate entries in the CloudTrail log
files. Every log entry contains information about who generated the request. The
user identity information in the log helps you determine whether the request was
made with root or IAM user credentials, with temporary security credentials for a
role or federated user, or by another AWS service. For more information, see the
userIdentity field in the CloudTrail Event
Reference.

You can store your log files in your bucket indefinitely, or you can define Amazon
S3
lifecycle rules to archive or delete log files automatically. By default, your log
files are encrypted using Amazon S3 server-side encryption (SSE). You can choose to
have
CloudTrail publish Amazon SNS notifications when new log files are delivered if you
want to
take quick action upon log file delivery. For more information, see Configuring Amazon SNS
Notifications for CloudTrail. You also can aggregate Amazon ES log files from
multiple AWS Regions and multiple AWS accounts into a single Amazon S3 bucket. For
more information, see Receiving CloudTrail Log Files from Multiple Regions.

Understanding Amazon Elasticsearch Service Log File Entries

CloudTrail log files contain one or more log entries where each entry is made up of
multiple JSON-formatted events. A log entry represents a single request from any
source and includes information about the requested action, any parameters, the date
and time of the action, and so on. The log entries are not guaranteed to be in any
particular order—they are not an ordered stack trace of the public API calls.
CloudTrail log files include events for all AWS API calls for your AWS account, not
just calls to the Amazon ES configuration service API. However, you can read the log
files and scan for entries whose eventSource is es.amazonaws.com. The eventName element contains the name
of the configuration service action that was called.
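
Scanning a log file for Amazon ES entries can be sketched as follows. The function name is hypothetical. The sketch assumes that you have already downloaded and decompressed a CloudTrail log file, and that its entries are stored under a top-level Records array.

```python
import json


def amazon_es_events(log_text):
    """Return selected fields of the Amazon ES entries in a CloudTrail log file."""
    records = json.loads(log_text).get("Records", [])
    return [
        {
            "eventName": record.get("eventName"),
            "eventTime": record.get("eventTime"),
            "sourceIPAddress": record.get("sourceIPAddress"),
        }
        for record in records
        if record.get("eventSource") == "es.amazonaws.com"
    ]
```

Because log entries are not guaranteed to be in any particular order, sort the result by eventTime if you need a chronological view.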

Signing Amazon Elasticsearch Service Requests

If you're using a language for which AWS provides an SDK, we recommend that you use
the SDK to submit Amazon Elasticsearch Service (Amazon ES) requests. All AWS SDKs
greatly simplify the process
of signing requests and save you a significant amount of time when compared to using
the
Amazon ES APIs directly. The SDKs integrate with your development environment and
provide
easy access to related commands.

If you choose to call the Amazon ES configuration service operations directly, you
must
sign your own requests. Configuration service requests must always be signed. Upload
and
search requests must be signed unless you configure anonymous access for those services.

To sign a request, you calculate a digital signature using a cryptographic hash
function, which returns a hash value based on the input. The input includes the text
of
your request and your secret access key. The hash function returns a hash value that
you
include in the request as your signature. The signature is part of the
Authorization header of your request.

After receiving your request, Amazon ES recalculates the signature using the same
hash
function and input that you used to sign the request. If the resulting signature matches
the signature in the request, Amazon ES processes the request. Otherwise, the request
is
rejected.
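
The sign-then-verify flow described above can be illustrated with a keyed hash. The following is a partial sketch, not a complete signer. It assumes the AWS Signature Version 4 key derivation with the service name es, which this section does not state explicitly, and it omits the construction of the canonical request and the string to sign. The function names are hypothetical. If an AWS SDK is available for your language, use it instead.

```python
import hashlib
import hmac


def _hmac_sha256(key, message):
    """Return the raw HMAC-SHA256 digest of message under key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key, date_stamp, region, service="es"):
    """Derive a signing key from the secret access key (Signature Version 4 style)."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(string_to_sign, signing_key):
    """Return the hex signature that the client places in the Authorization header."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"),
                    hashlib.sha256).hexdigest()


def signatures_match(received, string_to_sign, signing_key):
    """Recalculate the signature and compare it, as the receiving side would."""
    return hmac.compare_digest(received, sign(string_to_sign, signing_key))
```

The same inputs always produce the same signature, which is what allows the receiving side to recalculate it and compare the result with the value in the request.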

Tagging Amazon Elasticsearch Service Domains

You can use Amazon ES tags to add metadata to your Amazon ES domains. AWS does
not apply any semantic meaning to your tags. Tags are interpreted strictly as character
strings. All tags have the following elements.

Tag Element

Description

Tag key

The tag key is the required name of the tag. Tag keys must be unique
for the Amazon ES domain to which they are attached. For a list of basic
restrictions on tag keys and values, see User-Defined Tag Restrictions.

Tag value

The tag value is an optional string value of the tag. Tag values can
be null and do not have to be unique in a tag set. For example, you can
have a key-value pair in a tag set of project/Trinity and
cost-center/Trinity. For a list of basic restrictions on tag keys and
values, see User-Defined Tag Restrictions.

Each Amazon ES domain has a tag set, which contains all the tags that are assigned to that
Amazon ES domain. AWS does not automatically set any tags on Amazon ES domains. A
tag set can contain up to 50 tags, or it can be empty. If you add a
tag to an Amazon ES domain that has the same key as an existing tag for that domain, the new
value overwrites the old value.
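
The tag set rules above (unique keys, optional values, a limit of 50 tags, and overwrite on a duplicate key) can be modeled with a dictionary. The following is a minimal sketch of those rules only. The function name is hypothetical, and the sketch does not call the Amazon ES API.

```python
MAX_TAGS = 50  # limit stated in this section


def add_tag(tag_set, key, value=""):
    """Return a copy of tag_set with the tag added or its value overwritten."""
    if not key:
        raise ValueError("a tag key is required")
    if key not in tag_set and len(tag_set) >= MAX_TAGS:
        raise ValueError("a tag set can contain at most 50 tags")
    updated = dict(tag_set)
    updated[key] = value  # an existing key keeps its place; the value is replaced
    return updated
```

For example, adding the key project with the value Salix to a tag set that already contains project with the value Trinity leaves a single project tag whose value is Salix.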

You can use these tags to track costs by grouping expenses for similarly tagged
resources. An Amazon ES domain tag is a name-value pair that you define and associate
with an
Amazon ES domain. The name is referred to as the key. You can use tags
to assign arbitrary information to an Amazon ES domain. A tag key could be used, for
example,
to define a category, and the tag value could be an item in that category. For example,
you could define a tag key of “project” and a tag value of “Salix,” indicating that
the
Amazon ES domain is assigned to the Salix project. You could also use tags to designate
Amazon ES
domains as being used for test or production by using a key such as environment=test
or
environment=production. We recommend that you use a consistent set of tag keys to
make
it easier to track metadata that is associated with Amazon ES domains.

You also can use tags to organize your AWS bill to reflect your own cost structure.
To do this, sign up to get your AWS account bill with tag key values included. Then,
organize your billing information according to resources with the same tag key values
to
see the cost of combined resources. For example, you can tag several Amazon ES domains
with
key-value pairs, and then organize your billing information to see the total cost
for
each domain across several services. For more information, see Using Cost Allocation Tags in the AWS Billing and
Cost Management documentation.

Note

Tags are cached for authorization purposes. Because of this, additions and updates
to tags on Amazon ES domains might take several minutes before they are available.