COPY FROM copies data from a URI to the specified table as a raw data
import.

The nodes in the cluster will attempt to read the files available at the URI
and import the data. The files must be UTF-8 encoded and contain one
JSON object per line. Each key in an object is added as a column,
regardless of the columns previously defined on the table. Empty lines are
simply skipped.
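For illustration, here is a minimal sketch of such an import; the table name quotes and the file path are hypothetical examples, not part of this reference:

```sql
-- Assume /tmp/import_data/quotes.json contains one JSON object per line:
--   {"id": 1, "quote": "Don't panic."}
--   {"id": 2, "quote": "Time is an illusion."}

-- Import the file; the keys "id" and "quote" become the columns
-- of the imported rows.
COPY quotes FROM 'file:///tmp/import_data/quotes.json';
```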

CrateDB does not check whether the column’s data types match the types in the
import file. It does not cast the types but always imports the data as given in
the source file. Furthermore, CrateDB only checks for primary key duplicates,
not for other column constraints like NOT NULL.

For example, a WKT string cannot be imported into a column of geo_shape
or geo_point type, since there is no implicit cast to the GeoJSON format.
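As a hypothetical illustration (the column name area is invented), the first line below could be imported into a geo_shape column because it is a GeoJSON object, while the second would fail because the WKT string is not cast:

```json
{"area": {"type": "Polygon", "coordinates": [[[0, 0], [1, 0], [1, 1], [0, 0]]]}}
{"area": "POLYGON ((0 0, 1 0, 1 1, 0 0))"}
```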

Note

If the COPY FROM statement fails, the log output on the node will
provide an error message. Any data imported up to that point has been
written to the table and should be deleted before restarting the import.

The provided (absolute) path should point to files available on at least one
node and readable by the CrateDB process (which runs as the user ‘crate’ by
default).

By default, each node will attempt to read the specified files. If the URI
points to a shared folder that other CrateDB nodes can also access, the
shared option must be set to true in order to avoid importing duplicates.
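A sketch of this case, assuming a hypothetical table quotes and a network mount visible to all nodes under the same path:

```sql
-- Every node sees the same shared folder, so read each file
-- only once across the cluster:
COPY quotes FROM 'file:///mnt/shared/quotes.json' WITH (shared = true);
```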

A secret key provided by Amazon Web Services can contain characters such
as ‘/’, ‘+’ or ‘=’. These characters must be URL encoded (‘/’ becomes %2F,
‘+’ becomes %2B and ‘=’ becomes %3D). For a detailed explanation, read the
official AWS documentation.
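For example, with entirely hypothetical credentials where the secret key is ‘a/b+c=’, the key is URL encoded as ‘a%2Fb%2Bc%3D’ before being embedded in the URI:

```sql
-- Access key and secret key are embedded in the s3 URI;
-- the secret key 'a/b+c=' appears in its URL-encoded form.
COPY quotes FROM 's3://AKIAEXAMPLE:a%2Fb%2Bc%3D@my-bucket/imports/quotes.json';
```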

Note

Versions prior to 0.51.x use HTTP for connections to S3. Since 0.51.x these
connections use the HTTPS protocol. Please make sure you update your
firewall rules to allow outgoing connections on port 443.

In addition to the schemes above, CrateDB supports all protocols supported by
the URL implementation of its JVM (typically http, https, ftp,
and jar). Please refer to the documentation of the JVM vendor for an
accurate list of supported protocols.

For partitioned tables this clause can be used to import data into the
specified partition. It takes one or more partition columns and a value for
each column.

[PARTITION(partition_column=value[,...])]

partition_column:

The name of the column by which the table is partitioned.
All partition columns that were part of the
PARTITIONED BY Clause of the CREATE TABLE
statement must be specified.

value:

The column’s value.

Note

Partitioned tables do not store the value of the partition column in each
row, hence every row will be imported into the specified partition
regardless of the partition column values contained in the source file.
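For illustration, assuming a hypothetical table parted_table partitioned by a column named date:

```sql
-- Import every row from the file into the partition where
-- date = 1408312800000; any "date" values inside the source
-- file itself are ignored.
COPY parted_table PARTITION (date = 1408312800000)
  FROM 'file:///tmp/import_data/parted.json';
```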

The number of nodes that will read the resources specified in the URI. Defaults
to the number of nodes available in the cluster. If the option is set to a
number greater than the number of available nodes, each node will still be used
only once for the import. The value must be an integer greater than 0.
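A sketch of this option, again using the hypothetical table quotes and assuming the option described above is named num_readers:

```sql
-- Restrict the import to at most two reading nodes:
COPY quotes FROM 'file:///tmp/import_data/quotes.json'
  WITH (num_readers = 2);
```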

If shared is set to false, this option should be used with caution. It might
exclude the wrong nodes, causing COPY FROM to read no files or only a subset of
them.