Identifying HAWQ Table HDFS Files

You can determine the HDFS location of the data file(s) associated with a specific HAWQ table using the HAWQ filespace HDFS location, the table identifier, and the identifiers for the tablespace and database in which the table resides.

The number of HDFS data files associated with a HAWQ table depends on the distribution mechanism (hash or random) specified when the table was created or altered.

One way to find the HAWQ filespace HDFS location is to view the HAWQ service Configs > Advanced, General pane, in your Ambari console.

You can determine the tablespace, database, and table object identifiers through HAWQ catalog queries. See the Example below.
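As a rough illustration of how these pieces might combine, the sketch below joins a filespace location with the three identifiers to form a candidate directory path. The ordering (filespace location, then tablespace, database, and table identifier) is an assumption made for illustration, not something this section states; the function name `table_hdfs_dir` and all sample values are hypothetical placeholders, so verify the layout against your own cluster.

```python
def table_hdfs_dir(filespace_url, tablespace_id, database_id, table_id):
    """Join the filespace location and identifiers into one path.

    Assumed layout (not confirmed by this section):
    <filespace>/<tablespace>/<database>/<table>
    """
    # Drop any trailing slash so the joined path has no empty component.
    parts = [filespace_url.rstrip("/"),
             str(tablespace_id), str(database_id), str(table_id)]
    return "/".join(parts)


# Placeholder values only; substitute the identifiers from your catalog queries.
print(table_hdfs_dir("hdfs://namenode:8020/hawq_default", 16385, 16387, 16508))
```

If the assumed layout holds on your system, an HDFS administrator could list the resulting directory to see the table's data files.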

Number of Data Files

The number of data files that are created for a HAWQ table differs for hash-distributed and randomly-distributed HAWQ tables.

Hash-distributed HAWQ tables use a fixed number of virtual segments (vsegs). This number is set by the default_hash_table_bucket_number server configuration parameter, or by the BUCKETNUM value you provide in the CREATE TABLE statement. The number of HDFS files that HAWQ creates for a hash-distributed table also depends on the maximum number of concurrent inserts that have been executed against the table: the file count is always the default_hash_table_bucket_number or BUCKETNUM value multiplied by that maximum.
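The rule for hash-distributed tables can be worked through with a small calculation. The sketch below simply applies the multiplication described in the preceding paragraph; the function name `hash_table_file_count` and the sample values (a bucket number of 6 and a peak of 3 concurrent inserts) are hypothetical and chosen only for illustration.

```python
def hash_table_file_count(bucket_number, max_concurrent_inserts):
    """Expected HDFS file count for a hash-distributed table.

    bucket_number: the default_hash_table_bucket_number or BUCKETNUM value.
    max_concurrent_inserts: the most inserts ever run concurrently on the table.
    """
    return bucket_number * max_concurrent_inserts


# Hypothetical table: 6 buckets, at most 3 inserts running at the same time.
print(hash_table_file_count(6, 3))  # 18
```

With a single inserter at a time, the same table would be expected to have one file per bucket.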

The number of HDFS files generated for a randomly-distributed HAWQ table varies depending on the total number of virtual segments that have written data to the table.

Example: Locating HDFS Files for a HAWQ Table

Perform the following steps to identify the HDFS location of the data files associated with a hash-distributed HAWQ table. The SQL queries used in this example are applicable to randomly-distributed HAWQ tables as well.