VARCHAR Data Type (CDH 5.2 or higher only)

A variable-length character type, truncated during processing if necessary to fit within the specified length.

Syntax:

In the column definition of a CREATE TABLE statement:

column_name VARCHAR(max_length)

The maximum length you can specify is 65,535.

Partitioning: This type can be used for partition key columns. Because of the efficiency advantage of numeric values over character-based values, if the
partition key is a string representation of a number, prefer to use an integer type with sufficient range (INT, BIGINT, and so on) where
practical.

HBase considerations: This data type cannot be used with HBase tables.

Parquet considerations:

This type can be read from and written to Parquet files.

There is no requirement for a particular level of Parquet.

Parquet files generated by Impala and containing this type can be freely interchanged with other components such as Hive and MapReduce.

Parquet data files can contain values that are longer than allowed by the VARCHAR(n) length limit. Impala ignores any
extra trailing characters when it processes those values during a query.

Text table considerations:

Text data files can contain values that are longer than allowed by the VARCHAR(n) length limit. Any extra trailing
characters are ignored when Impala processes those values during a query.

Avro considerations:

The Avro specification allows string values up to 2**64 bytes in length. Impala queries for Avro tables use 32-bit integers to hold string lengths. In
CDH 5.7 / Impala 2.5 and higher, Impala truncates CHAR and VARCHAR values in
Avro tables to (2**31)-1 bytes. If a query encounters a STRING value longer than (2**31)-1 bytes in an Avro table, the query fails. In earlier releases, encountering
such long values in an Avro table could cause a crash.

Schema evolution considerations:

You can use ALTER TABLE ... CHANGE to switch column data types to and from VARCHAR. You can convert from STRING to VARCHAR(n), or from VARCHAR(n) to
STRING, or from CHAR(n) to VARCHAR(n), or from VARCHAR(n) to CHAR(n).
When switching back and forth between VARCHAR and CHAR, you can also change the length value. This schema evolution works the same for
tables using any file format. If a table contains values longer than the maximum length defined for a VARCHAR column, Impala does not return an error. Any extra
trailing characters are ignored when Impala processes those values during a query.

Compatibility:

This type is available in CDH 5.2 / Impala 2.0 or higher.

Internal details: Represented in memory as a byte array with the minimum size needed to represent each value.

Added in:CDH 5.2.0 / Impala 2.0.0

Column statistics considerations: Because the values of this type have variable size, none of the column statistics fields are filled in until you run the
COMPUTE STATS statement.

Kudu considerations:

Currently, the data types CHAR, VARCHAR, ARRAY, MAP, and STRUCT cannot be used with Kudu tables.

Restrictions:

All data in CHAR and VARCHAR columns must be in a character encoding that is compatible with UTF-8. If you have binary data
from another database system (that is, a BLOB type), use a STRING column to hold it.

Examples:

The following examples show how long and short VARCHAR values are treated. Values longer than the maximum specified length are truncated by CAST(), or when queried from existing data files. Values shorter than the maximum specified length are represented as the actual length of the value, with no extra padding as seen
with CHAR values.

The following example shows how identical VARCHAR values compare as equal, even if the columns are defined with different maximum lengths. Both tables
contain 'a' and 'b' values. The longer 'hello' and 'world' values from the
VARCHAR_20 table were truncated when inserted into the VARCHAR_1 table.

If this documentation includes code, including but not limited to, code examples, Cloudera makes this available to you under the terms of the Apache License, Version 2.0, including any required
notices. A copy of the Apache License Version 2.0 can be found here.