In this example, the dump files used are the same as those created in the previous example using the ORACLE_LOADER access driver. However, in this example, in addition to partitioning the data using customer_number, the data is subpartitioned using postal_code. For every partition, there is a subpartition where the postal_code is less than 50000 and another subpartition for all other values of postal_code. With three partitions, each containing two subpartitions, a total of six files is required. To create the files, use the SQL CREATE TABLE AS SELECT statement to select the correct rows for the partition and then write those rows into the file for the ORACLE_DATAPUMP driver.

The following statement creates a file with data for the first subpartition (postal_code less than 50000) of partition p1 (customer_number less than 100).
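The statement could look like the following sketch. All object names here (the source table sales, the directory object def_dir1, and the dump file sh1.dmp) are illustrative assumptions, not names from the original example.

```sql
-- Sketch only; source table, directory object, and file name are assumed.
-- Selects the rows for subpartition 1 of partition p1 and writes them to
-- a dump file with the ORACLE_DATAPUMP access driver.
CREATE TABLE sales_sp1
ORGANIZATION EXTERNAL (
  TYPE ORACLE_DATAPUMP
  DEFAULT DIRECTORY def_dir1
  LOCATION ('sh1.dmp')
)
AS SELECT customer_number, postal_code, sale_amount
   FROM sales
   WHERE customer_number < 100
     AND TO_NUMBER(postal_code) < 50000;
```

Repeating this statement with the WHERE predicates for the other five partition/subpartition combinations produces the remaining five dump files.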

You can select from each of these external tables to verify that it has the data you intended to write out. After you have executed the SQL CREATE TABLE AS SELECT statement, you can drop these external tables.

To use a virtual column to partition the table, create the partitioned ORACLE_DATAPUMP table. Again, the table is partitioned on the customer_number column and subpartitioned on the postal_code column. The postal_code column is a character field that contains numbers, but this example partitions it based on the numeric value, not a character string. In order to do this, create a virtual column, postal_code_num, whose value is the postal_code field converted to a NUMBER data type. The SUBPARTITION clause uses the virtual column to determine the subpartition for the row.
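A table definition along these lines illustrates the approach; the column names, directory object, and dump file names are assumptions carried over from the sketch above, not the original example's exact names.

```sql
-- Sketch only; all object names are illustrative. The virtual column
-- postal_code_num converts the character postal_code to a NUMBER, and the
-- SUBPARTITION BY RANGE clause partitions on that virtual column.
CREATE TABLE sales_dp (
  customer_number NUMBER,
  postal_code     VARCHAR2(10),
  sale_amount     NUMBER,
  postal_code_num AS (TO_NUMBER(postal_code))   -- virtual column
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_DATAPUMP
  DEFAULT DIRECTORY def_dir1
)
REJECT LIMIT UNLIMITED
PARTITION BY RANGE (customer_number)
SUBPARTITION BY RANGE (postal_code_num)
(
  PARTITION p1 VALUES LESS THAN (100)
  (
    SUBPARTITION p1_sp1 VALUES LESS THAN (50000)   LOCATION ('sh1.dmp'),
    SUBPARTITION p1_sp2 VALUES LESS THAN (MAXVALUE) LOCATION ('sh2.dmp')
  ),
  PARTITION p2 VALUES LESS THAN (200)
  (
    SUBPARTITION p2_sp1 VALUES LESS THAN (50000)   LOCATION ('sh3.dmp'),
    SUBPARTITION p2_sp2 VALUES LESS THAN (MAXVALUE) LOCATION ('sh4.dmp')
  ),
  PARTITION p3 VALUES LESS THAN (MAXVALUE)
  (
    SUBPARTITION p3_sp1 VALUES LESS THAN (50000)   LOCATION ('sh5.dmp'),
    SUBPARTITION p3_sp2 VALUES LESS THAN (MAXVALUE) LOCATION ('sh6.dmp')
  )
);
```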

To create a partitioned external table for an ORACLE_HIVE table, you need a partitioned Hive external table. Then you need to use the PL/SQL procedure DBMS_HADOOP.CREATE_EXTDDL_FOR_HIVE(). Additionally, this example creates the partitioned Hive table from the HDFS files used in the previous example. Before the partitioned Hive table can be created, you must create an HDFS directory in which to store the data.

This example creates the Hive table using the data files from the previous example, which showed how to use ORACLE_HDFS to create partitioned external tables. The following commands are all performed inside the Hive CLI, so they use Hive syntax.

First, use Hive to create a Hive external table on top of the HDFS data files, as follows:
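The Hive DDL could look like the following sketch. The table name, column names, and HDFS path are illustrative assumptions; the statement must be run in the Hive CLI.

```sql
-- Hive syntax (run in the Hive CLI); names and path are assumed.
-- Creates an external Hive table partitioned on postal_code over the
-- HDFS data files from the previous example.
CREATE EXTERNAL TABLE sales_hive (
  customer_number INT,
  sale_amount     DOUBLE
)
PARTITIONED BY (postal_code STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/hive/sales';
```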

Now you can go back to SQL*Plus to create the partitioned external table inside the Oracle database. First, use the PL/SQL procedure DBMS_HADOOP.CREATE_EXTDDL_FOR_HIVE() to create the external table. The arguments are as follows:

the name of the Hadoop cluster

the name of the Hive user that owns the table

the name of the partitioned Hive table

a boolean value indicating whether you want the partition columns in the Hive table to be included as columns in the external table

the name of the partitioned ORACLE_HIVE table that is created

a boolean value indicating whether the CREATE DDL is executed

a CLOB containing the CREATE DDL string

This example gets the CLOB for the CREATE DDL string and uses the DBMS_OUTPUT.PUT_LINE() procedure to write it. Setting SERVEROUTPUT ON tells SQL*Plus to display the data from the DBMS_OUTPUT.PUT_LINE() procedure. Setting LINESIZE to 132 makes sure there are no line breaks at odd places when displaying the data from the DBMS_OUTPUT.PUT_LINE() procedure.
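The call could be sketched as follows. The cluster name, Hive schema, and table names are illustrative assumptions, and the parameter names are taken from the DBMS_HADOOP package specification; verify them against your release.

```sql
-- Sketch only; cluster, schema, and table names are assumed.
SET SERVEROUTPUT ON
SET LINESIZE 132

DECLARE
  ddl_text CLOB;
BEGIN
  DBMS_HADOOP.CREATE_EXTDDL_FOR_HIVE(
    CLUSTER_ID      => 'hadoop_cl_1',    -- name of the Hadoop cluster
    DB_NAME         => 'default',        -- Hive schema that owns the table
    HIVE_TABLE_NAME => 'sales_hive',     -- partitioned Hive table
    HIVE_PARTITION  => TRUE,             -- include partition columns
    TABLE_NAME      => 'SALES_HIVE_XT',  -- ORACLE_HIVE table to create
    PERFORM_DDL     => TRUE,             -- execute the CREATE DDL
    TEXT_OF_DDL     => ddl_text          -- CLOB with the CREATE DDL string
  );
  DBMS_OUTPUT.PUT_LINE(ddl_text);
END;
/
```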

Because Hive does not specify a maximum character count for STRING columns, the column definition for the external table is VARCHAR2(4000). If you want a smaller length for some columns, you can call the DBMS_HADOOP.CREATE_EXTDDL_FOR_HIVE() procedure and specify that you do not want to execute the CREATE DDL. Then, you can edit the CREATE statement returned in the CLOB to set the length of the VARCHAR2 columns to a more appropriate value.

Also, note that the numbers in the partition name can vary.

Now that the table is created, executing a SELECT * statement returns all of the rows. Note that the SET LINESIZE executed above means that SQL*Plus uses 132 characters per line when displaying customer_name and postal_code.

18.5 Loading LOBs From External Tables

External tables are particularly useful for loading large numbers of records from a single file, so that each record appears in its own row in the table. The following example shows how to perform such a load.

The second line in the dump file requires quotation marks around the full text string; otherwise the field would be terminated at the comma.

Note:

Although not a problem in the dump file being used in this example, if something in the full text string contained quotation marks, then you would enclose it in another set of quotation marks, as follows for the word alone:
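For illustration, a record using this doubling convention could look like the following line; the record itself is hypothetical, not taken from the example's dump file.

```
2,"The word ""alone"" is enclosed in a second set of quotation marks"
```

The doubled quotation marks around alone are collapsed to a single pair when the field is loaded, rather than terminating the field.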

If the full text might contain the record delimiter character (0x'0A', or newline), you can specify a separate file for each document. External tables do not support filler fields, so instead you must use a COLUMN TRANSFORMS clause to specify that the fname field contains the name of the external file:
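A table definition using COLUMN TRANSFORMS could be sketched as follows, assuming a directory object MY_DIRECTORY and a list file loader.txt whose fname field names one CLOB file per record; the table and column names are illustrative.

```sql
-- Sketch only; object names are assumed. The fname field in loader.txt
-- names the external file whose contents become the doc CLOB value.
CREATE TABLE lob_tab (
  fname VARCHAR2(100),
  doc   CLOB
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY my_directory
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    (fname CHAR(100))
    COLUMN TRANSFORMS (doc FROM LOBFILE (fname) FROM (MY_DIRECTORY) CLOB)
  )
  LOCATION ('loader.txt')
);
```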

The FROM (MY_DIRECTORY) clause is not strictly necessary, because MY_DIRECTORY has already been specified as the default directory. However, it is shown here for illustration: if the loader.txt file and the individual CLOB files were in different locations, the clause would be required.

Once the data is in an external table, you can either leave it there and perform normal table operations (DML and most DDL) on the external table, or you can use the external table as a staging table to get the data into a normal table. To create a new normal (non-external) table, you could use the following SQL statement:

CREATE TABLE normaltable AS SELECT * FROM externaltable;

You can similarly use the following SQL statement to insert data into the new normal table:

INSERT INTO normaltable SELECT * FROM externaltable;

18.6 Loading CSV Files From External Tables

This topic provides examples of how to load CSV files from external tables under various conditions. Some of the examples build on previous examples.

Example 18-6 Loading Data From CSV Files With No Access Parameters

This example requires the following conditions:

The order of the columns in the table must match the order of fields in the data file.

The records in the data file must be terminated by newline.

The fields in the records in the data file must be separated by commas (if field values are enclosed in quotation marks, then the quotation marks are not removed from the field).
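Under those conditions, a minimal table definition could look like the following sketch; because there is no ACCESS PARAMETERS clause, the defaults apply. The table, column, directory, and file names are illustrative assumptions.

```sql
-- Sketch only; names are assumed. With no ACCESS PARAMETERS clause, the
-- defaults apply: records terminated by newline, fields separated by
-- commas, columns matched to fields in order.
CREATE TABLE emp_load (
  first_name VARCHAR2(15),
  last_name  VARCHAR2(20),
  hire_date  DATE
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY def_dir1
  LOCATION ('emp.csv')
);
```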

Example 18-7 Default Date Mask For the Session Does Not Match the Format of Data Fields in the Data File

This example is the same as the previous example, except that the default date mask for the session does not match the format of date fields in the data file. In the example below, the session format for dates is DD-Mon-YYYY, whereas the format of dates in the data file is MM/DD/YYYY. If the external table definition does not have a date mask, then the ORACLE_LOADER access driver uses the session date mask to attempt to convert the character data in the data file to a date data type. You can specify an access parameter for the date mask, which is then used for all fields in the data file that load date columns in the external table.
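The date mask can be supplied in the access parameters, along the lines of the following sketch; the object names are illustrative assumptions, and the DATE_FORMAT clause syntax should be verified against your release.

```sql
-- Sketch only; names are assumed. DATE_FORMAT sets the mask used for all
-- fields in the file that load DATE columns, overriding the session mask.
CREATE TABLE events_xt (
  event_name VARCHAR2(30),
  event_date DATE
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY def_dir1
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    DATE_FORMAT DATE MASK "mm/dd/yyyy"
  )
  LOCATION ('events.csv')
);
```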

The following is the contents of the data file for the external table:

This example is the same as the first example in this section, except for the following:

The data is split across two data files.

Each data file has a row containing the names of the fields.

Some fields in the data file are enclosed by quotation marks.

The FIELD NAMES ALL FILES clause tells the access driver that the first row in each file contains the names of the fields in that file. The access driver matches the field names to the names of the columns in the table, which means the order of the fields in the file can differ from the order of the columns in the table. If a field name in the first row is not enclosed in quotation marks, then the access driver uppercases the name before trying to find the matching column name in the table. If the field name is enclosed in quotation marks, then the access driver does not change its case before looking for a matching column name.

Because the fields are enclosed in quotation marks, the access parameter requires the CSV WITHOUT EMBEDDED RECORD TERMINATORS clause. This clause states the following:

Fields in the data file are separated by commas.

If the fields are enclosed in double quotation marks, then the access driver removes them from the field value.

There are no new lines embedded in the field values (this option allows the access driver to skip some checks that can slow the performance of SELECT operations on the external table).
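Putting the two clauses together, the table definition could be sketched as follows; the table, column, directory, and file names are illustrative assumptions.

```sql
-- Sketch only; names are assumed. FIELD NAMES ALL FILES matches fields
-- to columns by the header row in each file; CSV WITHOUT EMBEDDED RECORD
-- TERMINATORS handles comma-separated, quote-enclosed fields with no
-- embedded newlines.
CREATE TABLE events_xt_3 (
  event_name VARCHAR2(30),
  event_date DATE
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY def_dir1
  ACCESS PARAMETERS (
    RECORDS FIELD NAMES ALL FILES
    FIELDS CSV WITHOUT EMBEDDED RECORD TERMINATORS
  )
  LOCATION ('events_1.csv', 'events_2.csv')
);
```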

Example 18-11 Not All Fields in the Data File Use Default Settings for the Access Parameters

This example shows what to do when most fields in the data file use the default settings for the access parameters but a few do not. Instead of listing the settings for all fields, this example shows how you can set attributes for just the fields that differ from the default. The differences are as follows:

There are two date fields, one of which uses the session format, but registration_deadline uses a different format.

The registration_deadline field also uses the value NONE to indicate a null value.

The table definition is as follows. The ALL FIELDS OVERRIDE clause allows you to specify information for that field while using defaults for the remaining fields. The remaining fields have a data type of CHAR(255) and the field data is terminated by a comma with a trimming option of LDRTRIM.
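The definition could be sketched along the following lines; the table, column, directory, and file names are illustrative assumptions, and the exact placement of the ALL FIELDS OVERRIDE clause within the FIELDS clause should be verified against your release.

```sql
-- Sketch only; names are assumed. Only registration_deadline gets
-- explicit settings (a date mask and a NULLIF value); the remaining
-- fields keep the defaults: CHAR(255), terminated by comma, LDRTRIM.
CREATE TABLE event_reg_xt (
  event_name            VARCHAR2(30),
  event_date            DATE,
  registration_deadline DATE
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY def_dir1
  ACCESS PARAMETERS (
    FIELDS TERMINATED BY ','
    ALL FIELDS OVERRIDE
    (
      registration_deadline CHAR(10)
        DATE_FORMAT DATE MASK "mm/dd/yyyy"
        NULLIF registration_deadline = 'NONE'
    )
  )
  LOCATION ('event_reg.csv')
);
```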