Amazon S3

Working with files stored in S3

You can query files and directories stored in your S3 buckets. Dremio supports a number of different file formats. To learn more, see the chapter on Files and Directories.

Amazon Configuration

Amazon S3 Credentials

To list your AWS account's S3 buckets as a source, you must provide your AWS credentials in the form of an access key and a secret key. You can find instructions for creating these keys in Amazon's documentation.

NOTE: AWS credentials are not necessary if you are accessing only public S3 buckets.

Dremio Configuration

The following source-specific options are available:

Name                  | Description
AWS Access Key        | AWS access key.
AWS Access Secret     | AWS access secret.
Enable SSL Encryption | Whether to enable secure connections.
External Buckets      | A list of external buckets that are not included with the provided AWS account credentials.
Properties            | A list of additional Amazon S3 connection properties.

WARNING: If your S3 datasets include large Parquet files with 100 or more columns, you must raise the maximum number of connections to S3 that each Dremio processing unit is allowed to open. To do so, add a connection property named fs.s3a.connection.maximum and set it to a custom value greater than the default of 100.
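As a rough illustration of the warning above, the following Python sketch builds the name/value pair you would enter under Properties. Only the property name fs.s3a.connection.maximum and the default of 100 come from the text above. The helper name max_connections_property and the dictionary layout are hypothetical, chosen for illustration, and the value 500 is an arbitrary example rather than a recommendation.

```python
# Default connection limit, as stated in the warning above.
DEFAULT_MAX_CONNECTIONS = 100


def max_connections_property(value: int) -> dict:
    """Return a name/value pair for the source's Properties list.

    Hypothetical helper: the dict layout is an assumption for illustration,
    not a documented Dremio structure.
    """
    if value <= DEFAULT_MAX_CONNECTIONS:
        raise ValueError(
            f"value must be greater than the default of {DEFAULT_MAX_CONNECTIONS}"
        )
    # Connection properties are entered as text, so the value is stringified.
    return {"name": "fs.s3a.connection.maximum", "value": str(value)}


prop = max_connections_property(500)  # 500 is an arbitrary example value
print(f'{prop["name"]} = {prop["value"]}')
```

The check only enforces the "greater than the default" condition from the warning. A suitable value depends on your workload and is not specified here.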

Connecting through a proxy server

Optionally, you can configure your S3 source to connect through a proxy server. To do so, add the following properties under Properties in the settings for your S3 source:

Property Name         | Description
fs.s3a.proxy.host     | Proxy host.
fs.s3a.proxy.port     | Proxy port number.
fs.s3a.proxy.username | Username for authenticated connections, optional.
fs.s3a.proxy.password | Password for authenticated connections, optional.
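The proxy properties listed above can be assembled programmatically before you enter them in the source settings. The Python sketch below is a minimal illustration under stated assumptions: the function name proxy_properties, the dictionary layout, and the host proxy.example.com are hypothetical. Only the four fs.s3a.proxy.* property names, and the fact that the username and password are optional, come from the text above.

```python
from typing import Optional


def proxy_properties(
    host: str,
    port: int,
    username: Optional[str] = None,
    password: Optional[str] = None,
) -> list:
    """Build name/value pairs for the proxy-related connection properties.

    Hypothetical helper: the list-of-dicts layout is an assumption made
    for illustration only.
    """
    if not host:
        raise ValueError("proxy host must not be empty")
    if not 1 <= port <= 65535:
        raise ValueError("proxy port must be between 1 and 65535")

    props = [
        {"name": "fs.s3a.proxy.host", "value": host},
        {"name": "fs.s3a.proxy.port", "value": str(port)},
    ]
    # Username and password are optional; include them only when provided.
    if username is not None:
        props.append({"name": "fs.s3a.proxy.username", "value": username})
    if password is not None:
        props.append({"name": "fs.s3a.proxy.password", "value": password})
    return props


# Example with a placeholder host and no authentication.
for prop in proxy_properties("proxy.example.com", 8080):
    print(f'{prop["name"]} = {prop["value"]}')
```

Omitting the credentials yields only the host and port entries, which matches an unauthenticated proxy.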

Connecting to a bucket in AWS GovCloud

To connect to a bucket in AWS GovCloud, set the correct GovCloud endpoint for your S3 source. To do so, add the following properties under Properties in the settings: