"exclusions": (Optional) A string containing a JSON list of Unix-style glob
patterns to exclude. For example, "[\"**.pdf\"]" excludes all PDF files. For more
information about the glob syntax supported by AWS Glue, see Using
Include and Exclude Patterns.

"compressionType": (Optional) Specifies how the data is compressed.
This is generally not necessary if the data has a standard file extension. Possible
values are "gzip" and "bzip".

"groupFiles": (Optional) Grouping files is enabled by default when the
input contains more than 50,000 files. To enable grouping with fewer than 50,000 files,
set this parameter to "inPartition". To disable grouping when there are more
than 50,000 files, set this parameter to "none".

"groupSize": (Optional) The target group size in bytes. The default
is computed based on the input data size and the size of your cluster. When there
are fewer
than 50,000 input files, "groupFiles" must be set to "inPartition"
for this to take effect.

"recurse": (Optional) If set to true, recursively reads files
in all subdirectories under the specified paths.

"maxBand": (Optional, Advanced) This option controls the duration
in seconds after which an Amazon S3 listing is likely to be consistent. Files with
modification timestamps falling within the last maxBand seconds are tracked specially
when using job bookmarks to account for Amazon S3 eventual consistency. Most users do
not need to set this option. The default is 900 seconds.

"maxFilesInBand": (Optional, Advanced) This option specifies
the maximum number of files to save from the last maxBand seconds. If this
number is exceeded, extra files are skipped and only processed in the next job run.
Most
users do not need to set this option.
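Taken together, the Amazon S3 options above can be collected into a single connectionOptions map. The following is a minimal sketch in Python; the bucket path and group size are placeholder values, not recommendations, and the option names come from this section:

```python
import json

# Sketch of a connectionOptions map for "connectionType": "s3".
# The bucket name and path are placeholders.
connection_options = {
    "paths": ["s3://example-bucket/input/"],  # where to read from
    "exclusions": json.dumps(["**.pdf"]),     # a string containing a JSON list of globs
    "compressionType": "gzip",                # only needed for nonstandard file extensions
    "groupFiles": "inPartition",              # enable grouping with fewer than 50,000 files
    "groupSize": str(64 * 1024 * 1024),       # target group size in bytes (~64 MB here)
    "recurse": True,                          # also read files in subdirectories
}

# "exclusions" must be a string, not a list -- hence json.dumps above.
print(connection_options["exclusions"])
```

Note that "exclusions" is a string containing a JSON list, not a list itself; serializing with json.dumps avoids hand-escaping the quotes.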

"connectionType": "parquet"

Designates a connection to files stored in Amazon Simple Storage Service (Amazon S3)
in the Apache Parquet
file format.

Use the following connectionOptions with "connectionType": "parquet":

paths: (Required) A list of the Amazon S3 paths from which to read.

(Other option name/value pairs):
Any additional options, including formatting options, are passed
directly to the SparkSQL DataSource.
For more information, see Redshift data source for Spark.
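As a sketch, a connectionOptions map for the Parquet connection type needs only "paths"; any extra key/value pair is handed to the underlying SparkSQL DataSource. The path is a placeholder, and "mergeSchema" is shown here only as one example of a Spark Parquet option that would pass through:

```python
# Sketch of connectionOptions for "connectionType": "parquet".
connection_options = {
    "paths": ["s3://example-bucket/parquet-data/"],  # required: Amazon S3 paths to read
    "mergeSchema": "true",  # example pass-through option for the Spark Parquet source
}
```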

"connectionType": "postgresql"

Designates a connection to a PostgreSQL database.

Use these connectionOptions with JDBC connections:

"url": (Required) The JDBC URL for the database.

"dbtable": The database table to read from. For JDBC data stores that support schemas within
a database, specify schema.table-name. If a schema is not provided, then the default "public" schema is used.

"redshiftTmpDir": (Required for Amazon Redshift, optional for other JDBC types) The Amazon S3 path
where temporary
data can be staged when copying out of the database.

"user": (Required) The username to use when
connecting.

"password": (Required) The password to use when connecting.

All other option name/value pairs that are included in connectionOptions
for a JDBC connection, including formatting options, are passed directly to the underlying
SparkSQL DataSource.
For more information, see Redshift data source for Spark.
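The JDBC options above can be sketched as a connectionOptions map for a PostgreSQL source. The host, database, table, and credentials below are placeholders; in practice the password would come from a secrets store rather than the script:

```python
# Sketch of connectionOptions for "connectionType": "postgresql".
# All values are placeholders, not real resources or credentials.
connection_options = {
    "url": "jdbc:postgresql://db.example.com:5432/mydb",  # required JDBC URL
    "dbtable": "public.sales",  # schema.table-name; "public" is the default schema
    "user": "etl_user",
    "password": "not-a-real-password",
}
```

Since PostgreSQL supports schemas within a database, "dbtable" is written as schema.table-name; omitting the schema falls back to "public".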

"connectionType": "dynamodb"

Designates a connection to Amazon DynamoDB (DynamoDB).

Use the following connectionOptions with "connectionType": "dynamodb":

"dynamodb.input.tableName": (Required) The DynamoDB table from which to read.

"dynamodb.throughput.read.percent": (Optional) The percentage of reserved capacity units (RCU) to use.
The default is set to "0.5". Acceptable values are from "0.1" to "1.5", inclusive.
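A DynamoDB connectionOptions map is correspondingly small. The table name below is a placeholder; the read percentage is shown at its default and checked against the documented 0.1–1.5 range:

```python
# Sketch of connectionOptions for "connectionType": "dynamodb".
connection_options = {
    "dynamodb.input.tableName": "example-table",  # placeholder table name
    "dynamodb.throughput.read.percent": "0.5",    # default; valid range is 0.1 to 1.5
}

# Values above 1.0 deliberately consume more than the table's reserved read capacity.
pct = float(connection_options["dynamodb.throughput.read.percent"])
assert 0.1 <= pct <= 1.5
```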
