A character-separated values (CSV) file represents a tabular data set consisting of rows and columns.
Each row is a plain-text line. A line is usually terminated by a line feed \n or a carriage return \r.
The line feed \n is the default line delimiter in Tajo. Each record consists of multiple fields, separated by
some other character or string, most commonly a literal vertical bar |, a comma , or a tab \t.
The vertical bar is used as the default field delimiter in Tajo.

Some table storage formats provide parameters for enabling or disabling features and adjusting physical parameters.
The WITH clause in the CREATE TABLE statement allows users to set those parameters.

Currently, the CSV storage format provides the following physical properties.

text.delimiter: delimiter character. | or \u0001 is usually used, and the default field delimiter is |.

text.null: NULL character. The default NULL character is an empty string ''. Hive’s default NULL character is '\\N'.

compression.codec: compression codec. Setting this property enables compression and selects the algorithm used to compress files. The codec name should be the fully qualified name of a class that implements org.apache.hadoop.io.compress.CompressionCodec. By default, compression is disabled.

timezone: the time zone that the table uses for writing. When table rows are read or written, `timestamp` and `time` column values are adjusted by this time zone if it is set. The time zone can be given as an abbreviation such as ‘PST’, as an offset-based form such as ‘UTC+9’, or as a location-based form such as ‘Asia/Seoul’.
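
As an illustration, a table-level time zone could be set as in the sketch below. The table name events and its columns are hypothetical, and the statement assumes that CSV is the storage type name accepted by the USING clause; only the timezone property and the ‘Asia/Seoul’ value come from the description above.

```sql
-- Hypothetical table: values of the timestamp column are adjusted
-- by the table's time zone when rows are read or written.
CREATE TABLE events (
  id int,
  created_at timestamp
) USING CSV WITH ('timezone'='Asia/Seoul');
```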

text.error-tolerance.max-num: the maximum number of permissible parsing errors. The value must be an integer, and the default is 0. Parsing errors are handled differently depending on the value:
* If text.error-tolerance.max-num<0, all parsing errors are ignored.
* If text.error-tolerance.max-num==0, no parsing error is allowed. If any error occurs, the query fails. (default)
* If text.error-tolerance.max-num>0, up to the given number of parsing errors is permitted in each task.
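
The third case can be sketched as follows. The table name raw_log, its columns, and the limit of 10 are hypothetical values chosen for illustration, and the statement assumes that CSV is the storage type name accepted by the USING clause.

```sql
-- Hypothetical table: each task tolerates up to 10 parsing errors.
CREATE TABLE raw_log (
  id int,
  message text
) USING CSV WITH ('text.error-tolerance.max-num'='10');
```

Setting the value to '-1' instead would ignore all parsing errors, and '0' (the default) would make the query fail on the first one.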

The following example sets a custom field delimiter, NULL character, and compression codec:
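
A minimal sketch of such a statement is shown here. The table name table1 and its columns are hypothetical, the statement assumes that CSV is the storage type name accepted by the USING clause, and Hadoop's GzipCodec is chosen only as one example of a class implementing CompressionCodec.

```sql
-- Hypothetical table using \u0001 as the field delimiter,
-- Hive's default NULL character, and gzip compression.
CREATE TABLE table1 (
  id int,
  name text,
  score float,
  type text
) USING CSV WITH (
  'text.delimiter'='\u0001',
  'text.null'='\\N',
  'compression.codec'='org.apache.hadoop.io.compress.GzipCodec'
);
```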

By default, the NULL character in CSV files is an empty string ''.
In other words, an empty field is recognized as a NULL value in Tajo.
If the field type is TEXT, however, an empty field is recognized as the string value '' instead of a NULL value.
You can also use your own NULL character by specifying the physical property text.null.

CSV files generated in Tajo can be processed directly by Apache Hive™ without further processing.
In this section, we explain some compatibility issues for users who use both Hive and Tajo.

If you set a custom field delimiter, the CSV tables cannot be directly used in Hive.
To specify the custom field delimiter in Hive, you need to use the ROW FORMAT DELIMITED FIELDS TERMINATED BY
clause in Hive’s CREATE TABLE statement as follows:
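
A minimal sketch of such a Hive statement, assuming a hypothetical table table1 whose fields are separated by the vertical bar (the table name, columns, and delimiter value are illustrative):

```sql
-- Hive DDL; the delimiter must match the one used when Tajo wrote the files.
CREATE TABLE table1 (
  id INT,
  name STRING,
  score FLOAT,
  type STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE;
```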