You can use the PXF HDFS connector to read one or more multi-line text files in HDFS, each as a single table row. This is useful when you want to read multiple files into the same Greenplum Database external table, for example when individual JSON files each contain a separate record.

Prerequisites

Ensure that you have met the PXF Hadoop Prerequisites before you attempt to read files from HDFS.

Reading Multi-Line Text and JSON Files

You can read single- and multi-line files into a single table row, including files with embedded linefeeds. If you are reading multiple JSON files, each file must be a complete record, and each file must contain the same record type.

PXF reads the complete file data into a single row and column. When you create the external table to read multiple files, you must ensure that all of the files that you want to read are of the same type (text or JSON). You must also specify a single column of type text or json, depending upon the file type.

The following syntax creates a Greenplum Database readable external table that references one or more text or JSON files on HDFS:
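The general form can be sketched as follows, assembled from the profile and options described in this section. The angle-bracket names (<table_name>, <column_name>, <path-to-files>, <server_name>) are placeholders to replace with your own values, and the square brackets mark the optional SERVER setting; treat this as an outline rather than the authoritative grammar.

```sql
-- Syntax outline only: substitute the <...> placeholders before running.
-- [&SERVER=<server_name>] is optional; omit it to use the default server.
CREATE EXTERNAL TABLE <table_name>
    ( <column_name> text|json )
LOCATION ('pxf://<path-to-files>?PROFILE=hdfs:text:multi[&SERVER=<server_name>]&FILE_AS_ROW=true')
FORMAT 'CSV';
```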

SERVER=<server_name>

The named server configuration that PXF uses to access the data. Optional; PXF uses the default server if not specified.

FILE_AS_ROW=true

The required option that instructs PXF to read each file into a single table row.

FORMAT

The FORMAT must specify 'CSV'.

Note: The hdfs:text:multi profile does not support additional format options when you specify the FILE_AS_ROW=true option.

For example, if /data/pxf_examples/jdir identifies an HDFS directory that contains a number of JSON files, the following statement creates a Greenplum Database external table that references all of the files in that directory:
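A minimal sketch of such a statement, using the pxf_readjfiles table name and the jdir path from this section. The column name j1 is an assumption chosen for illustration, and the statement relies on the default PXF server because no SERVER option is given.

```sql
-- One json column holds the full contents of each file in jdir/.
-- The column name j1 is illustrative; any valid column name works.
CREATE EXTERNAL TABLE pxf_readjfiles (j1 json)
  LOCATION ('pxf://data/pxf_examples/jdir?PROFILE=hdfs:text:multi&FILE_AS_ROW=true')
FORMAT 'CSV';
```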

When you query the pxf_readjfiles table with a SELECT statement, PXF returns the contents of each JSON file in jdir/ as a separate row in the external table.

When you read JSON files, you can use the JSON functions provided in Greenplum Database to access individual data fields in the JSON record. For example, if the pxf_readjfiles external table above reads a JSON file that contains this JSON record:
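As an illustration, assume that one file in jdir/ holds the hypothetical record shown in the comment below, and that the external table's json column is named j1 (both are assumptions, not values from this document). The standard json operators -> and ->> can then pull out individual fields:

```sql
-- Hypothetical file contents, for illustration only:
--   {"id": 1, "user": {"name": "alice", "location": "COLUMBUS"}}

-- -> returns a json value; ->> returns the value as text.
SELECT j1->>'id'               AS id,
       j1->'user'->>'location' AS location
FROM pxf_readjfiles;
```

Because each file is one row, the query returns one result row per JSON file in the directory.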

Example: Reading an HDFS Text File into a Single Table Row

Perform the following procedure to create three sample text files in an HDFS directory, and use the PXF hdfs:text:multi profile and the default PXF server to read all of these text files in a single external table query.
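The Greenplum Database side of this procedure might look like the sketch below. It assumes that the three text files have already been copied into an HDFS directory; the directory /data/pxf_examples/tdir, the table name pxf_readfileasrow, and the column name c1 are hypothetical names chosen for illustration.

```sql
-- Assumes three text files already exist in the hypothetical HDFS
-- directory /data/pxf_examples/tdir. No SERVER option is given, so
-- PXF uses the default server.
CREATE EXTERNAL TABLE pxf_readfileasrow (c1 text)
  LOCATION ('pxf://data/pxf_examples/tdir?PROFILE=hdfs:text:multi&FILE_AS_ROW=true')
FORMAT 'CSV';

-- A single query reads every file in the directory; each file is
-- returned as one row, with its full contents in column c1.
SELECT c1 FROM pxf_readfileasrow;
```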