GeoMesa can run on top of HBase using S3 as the underlying storage engine. This mode of running GeoMesa is
cost-effective because the database cluster is sized for its compute and memory requirements, not its storage requirements.
The following guide describes how to bootstrap GeoMesa in this manner. This guide assumes you have an Amazon Web
Services account already provisioned as well as an IAM key pair. To set up the AWS command line tools, follow the
instructions found in the AWS online documentation.
The instructions below were executed on an AWS EC2 machine running Amazon Linux.

First, you will need to configure an S3 bucket for use by HBase. Make sure to replace <bucket-name> with your bucket
name. You can also use a different root directory for HBase if you desire. If you’re using the AWS CLI, you can create
the bucket and the root “directory” from the command line.
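
A minimal sketch of the bucket setup with the AWS CLI, assuming the CLI is already configured with credentials. S3 has no real directories, so the root "directory" is created here as an empty object whose key ends in a slash; the key name hbase-root/ matches the path used later in this guide.

```shell
# Placeholder: replace with your own bucket name.
BUCKET="<bucket-name>"

# Create the bucket.
aws s3 mb "s3://${BUCKET}"

# Create an empty object with a trailing slash to act as the HBase root "directory".
aws s3api put-object --bucket "${BUCKET}" --key hbase-root/
```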

Next, create a local JSON file named geomesa-hbase-on-s3.json with the following content. Make sure to replace
<bucket-name>/hbase-root with the unique root directory for HBase that you configured in the previous step.
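
One way to write the file is with a shell here-document, sketched below. The two classifications are the EMR settings for running HBase against S3: hbase.rootdir points HBase at the bucket and hbase.emr.storageMode switches its storage to S3. The <bucket-name> value remains a placeholder.

```shell
# Write the EMR configuration file into the current directory.
# Replace <bucket-name>/hbase-root with the root directory created earlier.
cat > geomesa-hbase-on-s3.json <<'EOF'
[
  {
    "Classification": "hbase-site",
    "Properties": {
      "hbase.rootdir": "s3://<bucket-name>/hbase-root"
    }
  },
  {
    "Classification": "hbase",
    "Properties": {
      "hbase.emr.storageMode": "s3"
    }
  }
]
EOF
```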

Then, use the following command to bootstrap an EMR cluster with HBase. You will need to change __KEY_NAME__ to
the name of the key pair you intend to use for this cluster and __SUBNET_ID__ to the ID of the subnet if that key is
associated with a specific subnet. You can also edit the instance types to a size appropriate for your use case.
Specify the appropriate path to the JSON file you created in the last step.

You may want to run aws configure before running this command. If you don’t, you’ll need to specify a region
with something like --region us-west-2. You’ll also need to ensure that your EC2 instance has an IAM role that
permits the elasticmapreduce:RunJobFlow action. This guide’s configuration creates a single master and 3 worker nodes.
You may wish to increase or decrease the number of worker nodes or change the instance types to suit your query needs.
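
The cluster creation described above might look like the following sketch. The cluster name, release label, and instance types are illustrative assumptions rather than values prescribed by this guide, and __KEY_NAME__, __SUBNET_ID__, and the file path are placeholders. The --query option is used so that only the cluster ID is captured in the CID variable.

```shell
# Assumed values: cluster name, release label, and instance types.
# Placeholders: __KEY_NAME__, __SUBNET_ID__, /path/to/geomesa-hbase-on-s3.json.
export CID=$(
  aws emr create-cluster \
    --name "GeoMesa HBase on S3" \
    --release-label emr-5.5.0 \
    --use-default-roles \
    --ec2-attributes KeyName=__KEY_NAME__,SubnetId=__SUBNET_ID__ \
    --applications Name=Hadoop Name=ZooKeeper Name=HBase \
    --instance-groups \
      Name=Master,InstanceCount=1,InstanceGroupType=MASTER,InstanceType=m4.2xlarge \
      Name=Workers,InstanceCount=3,InstanceGroupType=CORE,InstanceType=m4.xlarge \
    --configurations file:///path/to/geomesa-hbase-on-s3.json \
    --query ClusterId --output text
)

# Print the cluster ID for later use.
echo "Cluster ID: ${CID}"
```

Add --region us-west-2 (or your own region) to the command if no default region is configured.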

Optionally, you can find the hostname of the master node in the AWS Management Console. Find the cluster by its name
(as specified in the aws emr command) and click through to its details page. Under the Hardware section, you can
find the master node and its IP address. Copy the IP address and then run the following command.

export MASTER=<ip_address>
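
As an alternative to the console, the master node's address can also be looked up from the command line once the cluster is running. This is a sketch that assumes you know the cluster ID (the j-... identifier printed by aws emr create-cluster or shown in the console); <cluster-id> is a placeholder. Note that it returns the master's public DNS name rather than its IP address, which works equally well for SSH.

```shell
# Placeholder: replace with the ID of your EMR cluster.
CID="<cluster-id>"

# Ask EMR for the master node's public DNS name and store it in MASTER.
export MASTER=$(
  aws emr describe-cluster \
    --cluster-id "${CID}" \
    --query 'Cluster.MasterPublicDnsName' \
    --output text
)
echo "Master node: ${MASTER}"
```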

To configure GeoMesa, remote into the master node of your new AWS EMR cluster using the following command:
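
A sketch of the SSH login, assuming MASTER holds the master node's address as set above. The key path is a placeholder, and hadoop is the usual login user on EMR nodes; adjust both if your setup differs.

```shell
# /path/to/key.pem is a placeholder for the private key of your key pair.
# "hadoop" is assumed to be the login user on the EMR master node.
ssh -i /path/to/key.pem hadoop@"${MASTER}"
```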

Then, bootstrap GeoMesa on HBase on S3 by executing the provided script. This script sets up the needed environment
variables, copies Hadoop JARs into GeoMesa’s lib directory, copies the GeoMesa distributed runtime into S3 where HBase
can use it, and sets up the GeoMesa coprocessor registration, among other administrative tasks.
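
The sketch below shows how the script might be invoked on the master node. Both the install location (/opt/geomesa) and the script name (bootstrap-geomesa-hbase-aws.sh) are assumptions based on the layout of the GeoMesa HBase binary distribution; check the bin directory of your own installation before running it.

```shell
# Assumed install location of the GeoMesa HBase distribution.
GEOMESA_HOME=/opt/geomesa

# Assumed name of the bootstrap script shipped in the distribution's bin directory.
BOOTSTRAP="${GEOMESA_HOME}/bin/bootstrap-geomesa-hbase-aws.sh"

# Run the script only if it exists at the assumed path.
if [ -x "${BOOTSTRAP}" ]; then
  sudo "${BOOTSTRAP}"
else
  echo "Bootstrap script not found at ${BOOTSTRAP}" >&2
fi
```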

GeoMesa ships with predefined data models for many open spatio-temporal data sets such as GDELT. To ingest the most recent 7 days of GDELT from Amazon’s public S3 bucket, one can copy the files locally to the cluster or use a distributed ingest:
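
A sketch of the distributed variant follows. It assumes GNU date (as on Amazon Linux), that the public GDELT bucket exposes daily files at s3a://gdelt-open-data/events/YYYYMMDD.export.csv, and that the predefined schema and converter are both named gdelt. The catalog name geomesa.gdelt is a hypothetical choice.

```shell
# Build the list of the 7 most recent daily GDELT export files, oldest first.
FILES=$(for d in $(seq 7 -1 1); do
  echo "s3a://gdelt-open-data/events/$(date -d "${d} days ago" +%Y%m%d).export.csv"
done)

# Ingest the files using the predefined "gdelt" schema (-s) and converter (-C).
# ${FILES} is left unquoted on purpose so each URL becomes its own argument.
if command -v geomesa-hbase >/dev/null 2>&1; then
  geomesa-hbase ingest -c geomesa.gdelt -s gdelt -C gdelt ${FILES}
else
  echo "geomesa-hbase not found on PATH" >&2
fi
```

Because the inputs are remote s3a:// paths, the ingest should run as a distributed job; to ingest locally instead, copy the files to the cluster first and pass their local paths.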