The Unix operating system has a number of utilities that can be very useful for pre-processing data files to be loaded with SQL*Loader. Even when the same functionality can be achieved through SQL*Loader, the utilities described

here will be much faster. Data warehousing applications, in particular, can benefit greatly from these utilities.

PRESORTING THE DATA FILEThe performance of direct path loads can be significantly improved by presorting the input data on indexed columns. Pre-sorting minimizes the demand for temporary segments during the load.

The Unix command to be used for presorting is “sort”.In Example 1, suppose you have added the SORTED INDEXES (empno) clause to the control file to indicate that fields in the data file are presorted on the EMPNO column. To do that presorting, you would enter at the Unix prompt: % sort +0 -1 example1.dat > example1.srt

This will sort file example1.dat by its first field (by default fields are delimited by spaces and tabs) and send the output to file example1.srt: 7654 MARTIN SALESMAN 7698 1894.00 0.15 30 7782 CLARK MANAGER 7839 2572.50 0.20 10 7839 KING PRESIDENT 5850.00 10

In Example 2, if you wanted to sort file example2.dat by column DNAME, you would enter: % sort -t, -d +1 -2 example2.dat > example2.srtwhere “-t,” indicates that commas are delimiters, “-d” causes sort to consider only letters and digits in comparisons, and example2.srt is the output file: 10,”ACCOUNTING”,CLEVELAND 13,”FINANCE”,”BOSTON” 12,RESEARCH,”SARATOGA”

ELIMINATING FIELDS FROM THE DATA FILEOften, it is necessary to remove one or more fields from all the records in the data file. The Unix command that does that is “cut”.In Example 1, if you want to eliminate the COMM field altogether from the data file, enter at the Unix prompt: % cut -c1-38,44- example1.dat > example1.cutwhere the “-c” option specifies the character ranges that you want to extract from each record. The output file example1.cut contains: 7782 CLARK MANAGER 7839 2572.50 10 7839 KING PRESIDENT 5850.00 10 7654 MARTIN SALESMAN 7698 1894.00 30

In Example 2, to eliminate the LOC field from the data file, you would enter: % cut -f1-2 -d, example2.dat > example2.cutwhere “-f1-2” indicates you want to extract the first two fields of each record and “-d,” tells cut to treat comma as a delimiter. The output file example2.cut would contain: 12,RESEARCH 10,”ACCOUNTING” 13,”FINANCE”

REPLACING CHARACTERS IN THE DATA FILETwo Unix commands can be used here: “tr” or “sed”.For instance, if you want to replace all double quotes in the data file in Example 2 by single quotes, you may enter: % cat example2.dat | tr \” \’ > example2.sqtThe piped “cat” is necessary because tr’s input source is the standard input. Single and double quotes are preceded by backslashes because they are special characters. The output file will be: 12,RESEARCH,’SARATOGA’ 10,’ACCOUNTING’,CLEVELAND 13,’FINANCE’,’BOSTON’

ELIMINATING CHARACTERS FROM THE DATA FILEJust as for replacing characters, “tr” and “sed” can be used for eliminating them from the data file.If you want to remove all double quotes from the data file in Example 2, you may type: % cat example2.dat | tr -d \” > example2.noqThe contents of file example2.dat are piped to the tr process, in which the “-d” option stands for “delete”. The output file example2.noq would look like: 12,RESEARCH,SARATOGA 10,ACCOUNTING,CLEVELAND 13,FINANCE,BOSTON

An identical result would be obtained by using sed: % sed ‘s/\”//g’ example2.dat > example2.noqThe string in single quotes indicates that double quotes should be replaced by an empty string globally in the input file. Another interesting usage of tr would be to squeeze multiple blanks between fields down to a single space character. That can be achieved by doing:

INSERTING CHARACTERS INTO THE DATA FILEA typical situation in which you may need to insert characters into the datafile would be to convert a fixed position data file into a delimited one. The data file in Example 1 is a fixed position one. To convert it into a file delimited by commas, you would enter % cat example1.dat | tr -s ‘ ‘ ‘,’ > example1.dlmand obtain 7782,CLARK,MANAGER,7839,2572.50,0.20,10 7839,KING,PRESIDENT,5850.00,10 7654,MARTIN,SALESMAN,7698,1894.00,0.15,30

MERGING DIFFERENT FILES INTO A SINGLE DATA FILEMerging can be done by using “paste”. This command allows you to specify a list of files to be merged and the character(s) to be used as delimiter(s). For instance, to merge the data files in Examples 1 and 2, you may enter: % paste -d’ ‘ example1.dat example2.dat > example.mrgwhere “-d’ ‘” specifies a blank character as the delimiter between records being merged and example.mrg is the merged output file: 7782 CLARK MANAGER 7839 2572.50 0.20 10 12,RESEARCH,”SARATOGA” 7839 KING PRESIDENT 5850.00 10 10,”ACCOUNTING”,CLEVELAND 7654 MARTIN SALESMAN 7698 1894.00 0.15 30 13,”FINANCE”,”BOSTON”

ELIMINATING DUPLICATE RECORDS/FIELDS FROM THE DATA FILEThe Unix command for this is “uniq”. It eliminates or reports consecutive lines that are identical. Because only adjacent lines are compared, you may have to use the sort utility before using uniq.In Example1, suppose you wanted to keep only the first entry with DEPTNO = 10. The Unix command would be: % uniq +43 example1.dat > example1.unqThe “+43” indicates that the first 43 characters in each record should be ignored for the comparison. The output file example1.unq would contain: 7782 CLARK MANAGER 7839 2572.50 0.20 10 7654 MARTIN SALESMAN 7698 1894.00 0.15 30

COUNTING THE NUMBER OF LINES IN THE DATA FILEUse “wc”. For example: % wc -l example1.dat 3 example1.datwhich indicates that the file contains three lines.

CONVERTING BETWEEN ASCII AND EBCDICThe “dd” Unix command can be used to convert EBCDIC data files to ASCII and vice-versa. For example: % dd if=example1.ebc conv=ascii > example1.asctakes example1.ebc as EBCDIC input file, converts it into ASCII, and writes the converted output to file example1.asc.

Follow me on Twitter

Shivmohan Purohit - A Learner by Heart and ERP Professional by Work. Contribution to success of my organization and people around. Learning new things everyday.
Belongs to family, being a son, brother, a husband and a father of a joyful son. Looking to Collaborate with people around the work.
Looking forward to connect and learn with you