What is Amazon Athena?

Amazon Athena is an interactive query service that makes it easy to analyze data directly
in Amazon Simple Storage Service (Amazon S3) using standard SQL. With a few actions
in the AWS Management Console, you can point Athena at your data stored in Amazon
S3 and begin using standard SQL to run ad-hoc queries and get results in seconds.

Athena is serverless, so there is no infrastructure to set up or manage, and you pay
only for the queries you run. Athena scales automatically—executing queries in parallel—so
results are fast, even with large datasets and complex queries.

When should I use Athena?

Athena helps you analyze unstructured, semi-structured, and structured data stored
in Amazon S3. Examples include CSV, JSON, or columnar data formats such as Apache
Parquet and Apache ORC. You can use Athena to run ad-hoc queries using ANSI SQL, without
the need to aggregate or load the data into Athena.

Athena integrates with the AWS Glue Data Catalog, which offers a persistent metadata
store for your data in Amazon S3. This allows you to create tables and query data
in Athena based on a central metadata store available throughout your AWS account
and integrated with the ETL and data discovery features of AWS Glue. For more information,
see Integration with AWS Glue and What is AWS Glue in the AWS Glue Developer Guide.

You can create named queries with AWS CloudFormation and run them in Athena. Named
queries allow you to map a query name to a query and then call the query multiple
times by referencing its name. For more information, see CreateNamedQuery in the
Amazon Athena API Reference, and AWS::Athena::NamedQuery in the AWS CloudFormation User Guide.
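
As a rough illustration of the CreateNamedQuery operation mentioned above, the following Python sketch builds the request and, optionally, sends it with the AWS SDK for Python (boto3). The query name, database name, SQL text, and default Region are hypothetical values chosen for illustration. The example assumes that boto3 is installed and that AWS credentials are configured before `create_named_query` is called.

```python
import uuid


def build_named_query_request(name, database, query_string, description=""):
    """Build keyword arguments for the Athena CreateNamedQuery operation."""
    request = {
        "Name": name,
        "Database": database,
        "QueryString": query_string,
        # Idempotency token so that a retried request does not create a duplicate.
        "ClientRequestToken": str(uuid.uuid4()),
    }
    if description:
        request["Description"] = description
    return request


def create_named_query(request, region_name="us-east-1"):
    """Send the request with boto3 and return the ID of the new named query."""
    import boto3  # third-party AWS SDK, imported lazily; assumed to be installed

    client = boto3.client("athena", region_name=region_name)
    return client.create_named_query(**request)["NamedQueryId"]


# Hypothetical names and SQL, used only to show the shape of the request.
request = build_named_query_request(
    name="daily_error_count",
    database="weblogs",
    query_string="SELECT status, COUNT(*) FROM access_logs GROUP BY status",
    description="Counts requests per HTTP status",
)
```

Building the request separately from sending it makes it possible to inspect or log the request without contacting AWS.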

Accessing Athena

You can access Athena using the AWS Management Console, through a JDBC connection,
using the Athena API, or using the Athena CLI.
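
For access through the Athena API, a minimal Python sketch might look like the following. It assumes a boto3 Athena client is passed in as `client`. The function name `run_query`, the polling interval, and the database and Amazon S3 output location shown in the usage note are hypothetical. The sketch starts a query and polls until the query reaches a terminal state.

```python
import time

# States after which a query execution no longer changes.
TERMINAL_STATES = ("SUCCEEDED", "FAILED", "CANCELLED")


def run_query(client, sql, database, output_location, poll_seconds=1.0):
    """Start a query, wait for a terminal state, and return (state, execution_id)."""
    started = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        # Athena writes query results to this Amazon S3 location.
        ResultConfiguration={"OutputLocation": output_location},
    )
    execution_id = started["QueryExecutionId"]
    while True:
        execution = client.get_query_execution(QueryExecutionId=execution_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return state, execution_id
        time.sleep(poll_seconds)
```

With boto3 installed and credentials configured, a call could look like `run_query(boto3.client("athena"), "SELECT 1", "default", "s3://example-bucket/results/")`, where the bucket name is a placeholder. Passing the client in as a parameter keeps the function easy to test with a stand-in object.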

Understanding Tables, Databases, and the Data Catalog

In Athena, tables and databases are containers for the metadata definitions that define
a schema for underlying source data. For each dataset, a table needs to exist in Athena.
The metadata in the table tells Athena where the data is located in Amazon S3, and
specifies the structure of the data, for example, column names, data types, and the
name of the table. Databases are a logical grouping of tables, and also hold only
metadata and schema information for a dataset.
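
To make the idea of table metadata concrete, the following Python sketch renders a Hive-style CREATE EXTERNAL TABLE statement from a table name, a list of columns with data types, and an Amazon S3 location. The helper name `build_create_table_ddl`, the database, table, and column names, and the bucket path are hypothetical. The sketch also assumes comma-delimited text data; other formats need a different ROW FORMAT or STORED AS clause.

```python
def build_create_table_ddl(database, table, columns, location):
    """Render a CREATE EXTERNAL TABLE statement for comma-delimited data in Amazon S3."""
    # One "name type" entry per column, for example: `status` int
    column_lines = ",\n".join(
        f"  `{name}` {data_type}" for name, data_type in columns
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"{column_lines}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{location}'"
    )


# Hypothetical dataset: web access logs stored under an example bucket.
ddl = build_create_table_ddl(
    database="weblogs",
    table="access_logs",
    columns=[("request_time", "string"), ("status", "int")],
    location="s3://example-bucket/logs/",
)
print(ddl)
```

The statement carries only metadata: the column names and types, the table name, and where the data is located. The data itself stays in Amazon S3.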

For each dataset that you'd like to query, Athena must have an underlying table it
will use for obtaining and returning query results. Therefore, before querying data,
a table must be registered in Athena. Registration occurs when you create a table,
either automatically or manually.

Regardless of how the tables are created, the table creation process registers the
dataset with Athena. This registration occurs either in the AWS Glue Data Catalog
or in the internal Athena data catalog, and it enables Athena to run queries on the data.

To create a table automatically, use an AWS Glue crawler from within Athena. For more
information about AWS Glue and crawlers, see Integration with AWS Glue. When AWS Glue creates a table, it registers it in its own AWS Glue Data Catalog.
Athena uses the AWS Glue Data Catalog to store and retrieve this metadata, using it
when you run queries to analyze the underlying dataset.
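
A crawler can also be defined programmatically. The following Python sketch builds a request for the AWS Glue CreateCrawler operation and, optionally, creates and starts the crawler with boto3. The crawler name, IAM role ARN, database name, Amazon S3 path, and default Region are hypothetical. The example assumes that boto3 is installed and that AWS credentials are configured before `create_and_start_crawler` is called.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for the AWS Glue CreateCrawler operation."""
    return {
        "Name": name,
        "Role": role_arn,          # IAM role that AWS Glue assumes to read the data
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(request, region_name="us-east-1"):
    """Create the crawler and start one run (requires boto3 and AWS credentials)."""
    import boto3  # third-party AWS SDK, imported lazily; assumed to be installed

    glue = boto3.client("glue", region_name=region_name)
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Hypothetical values, used only to show the shape of the request.
crawler_request = build_crawler_request(
    name="weblogs-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="weblogs",
    s3_path="s3://example-bucket/logs/",
)
```

After a crawler run finishes, the tables it creates appear in the AWS Glue Data Catalog database named in the request, where Athena can query them.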

The AWS Glue Data Catalog is accessible throughout your AWS account. Other AWS services
can share the AWS Glue Data Catalog, so databases and tables created elsewhere in your
organization are visible in Athena, and vice versa. In addition, AWS Glue lets you automatically
discover data schema and extract, transform, and load (ETL) data.

Note

You use the internal Athena data catalog in regions where AWS Glue is not available
and where the AWS Glue Data Catalog cannot be used.

To create a table manually:

Use the Athena console to run the Create Table Wizard.

Use the Athena console to write Hive DDL statements in the Query Editor.

Use the Athena API or CLI to execute a SQL query string with DDL statements.

Use the Athena JDBC or ODBC driver.

When you create tables and databases manually, Athena uses HiveQL data definition
language (DDL) statements such as CREATE TABLE, CREATE DATABASE, and DROP TABLE under the hood to create tables and databases in the AWS Glue Data Catalog, or in
its internal data catalog in those regions where AWS Glue is not available.

Note

If you have tables in Athena created before August 14, 2017, they were created in
an Athena-managed data catalog that exists side-by-side with the AWS Glue Data Catalog
until you choose to update. For more information, see Upgrading to the AWS Glue Data Catalog Step-by-Step.

When you query an existing table, Amazon Athena uses Presto, a distributed SQL engine,
under the hood. Athena includes examples with sample data that show you how to create
a table and then issue a query against it. Athena also has a tutorial
in the console that helps you get started creating a table based on data that is stored
in Amazon S3.

For a step-by-step tutorial on creating a table and writing queries in the Athena Query
Editor, see Getting Started.

Run the Athena tutorial in the console. This launches automatically if you log in
to https://console.aws.amazon.com/athena/ for the first time. You can also choose Tutorial in the console to launch it.