The GeoMesa FileSystem Datastore (GeoMesa FSDS) takes advantage of the performance characteristics of modern
cloud-native and distributed filesystems to scale bulk analytic queries. The FSDS is a good choice for doing bulk egress
queries or large analytic jobs using frameworks such as Spark SQL and MapReduce. The FSDS differs from other datastores
in that ingest and point query latencies are traded for high-throughput query performance. The FSDS pairs well with
low-latency ingest and cache-based datastores systems such as HBase or Kafka to provide an optimal pairing of “hot” and
“warm” storage options. This pairing is commonly known as a Lambda Architecture.

The GeoMesa FSDS consists of a few primary components:

FileSystem - A separately managed storage system that implements the GeoMesa FileSystem API

Partition Scheme - A stategy for laying out data on within the filesystem

Storage Format - A defined format or encoding to store data in files

Query Engine - A query engine or client to fulfill queries and run analytic jobs

GeoMesa FSDS can utilize any filesystem that implements the Hadoop FileSystem API. The most common filesystems used
with GeoMesa FSDS are:

HDFS - Hadoop Distributed File System

S3 - Amazon Simple Storage

GCS - Google Cloud Storage

WASB - Windows Azure Blob Store

Local - Locally Mounted File System (e.g. local disk or NFS)

Choosing a filesystem depends generally on cost and performance requirements. One thing to note is that S3, GCS, and
WASB are all “cloud-native” storage meaning that they are built into Amazon, Google, and Microsoft Azure cloud
platforms. These cloud-native filesystems are scaled separately from the compute nodes which generally provides a more
cost efficient storage solution. Compared to HDFS, their price per GigaByte of storage is lower but their latency is
higher. They also have the ability to persist data after you turn off all your compute nodes.

Any of the filesystems mentioned about are good choices for the FSDS. If you have more questions about making a choice
contact the GeoMesa team

The partition scheme defines how data is stored within the filesystem. The scheme is important because it defines how
the data is queried. Most OLAP queries in GeoMesa contain a date range and geometric predicate and the partition scheme
can aid in finding the files satisfy the query. There are two main partition schmes used with GeoMesa FSDS:

Apache Parquet - Apache Parquet is the leading interoperable columnar format in the Hadoop ecosystem. It provides
efficient compression, storage, and query of structured data. Apache Parquet is currently the only format that can be
used for writing data into the FileSystem datastore.

Converter Storage - The converter storage format is a synthetic format which allows you to overlay a GeoMesa converter
on top of a filesystem using a defined partition scheme. This allows you to utilize existing data storage layouts of
data stored in JSON, CSV, TSV, Avro, or other formats. Converters are pluggable allowing users to expose their own
custom storage formats if desired.