Thursday, March 13, 2008

There have been occasions when I needed to bulk load fixed width or Column Aligned Text (CAT) format files into a table in Microsoft Access. Access has a great Import Wizard that allows me to interactively define the fixed width columns before importing to a table. However I needed a non-graphical batch tool to do the same job. So I decided to use FME for the job. I explored the FME Workbench for a way to define the source CAT dataset but it was nowhere to be found. There is the Comma Separated Value (CSV) format but my dataset was not formatted that way. I did some research on Google and could not find a suitable method within the Workbench. So I had to roll up my sleeves and tried to write a FME Mapping file that did the job. So here are the steps that I took.

The above lines define some macros which are used by the FME Universal Translator graphical user interface to get the necessary input and output parameters. The macro DATABASE_COMPRESS_AT_END is set to 'yes' in the example above. The figure below shows how the macros look like when the mapping file is run from the Translator.

Now we have to add in the *_DEF statements to define the file formats for the input CAT dataset and the output Access dataset. For example, we may have a sample CAT file that looks like the few lines below.

* This is a comment lineS0000000105981116893620091001XXX S0000000216897716897720091001XXX S0000000516965916862220081001XXX

The first 9 characters make up an ID number. The next 6 characters starting from column 10 define the OLDPOSCODE field. The NEWPOSCODE field is 6 characters long starting from the 16th column. The next 8 characters starting from column 22 make up the REGDATE field. Next is a 1 character REGCODE field. Finally the last 3 characters define the AREA field. The correct input TABLE definition statement in the mapping file would therefore be:

We want to bulk load all the input data into the output Access table. The output DATABASE_DEF table definition should be almost identical as shown below. If you do not want to append to the Access table then add in the keyword SQL_DROP_TABLE and set it to "yes"; this will tell FME to drop and recreate the table.

We have now created a basic mapping file that can be used to bulk load the CAT dataset to an Access database without any data validation. The next step is to add the statements to perform the data checking.

Step 2 - Adding the Checking

There may be comment lines or blank lines in the CAT file which you may not want to load. Comment lines usually have a comment character like '#' or '!' in the first column. To filter out these lines, you can use the TestFactory and define a TCL regular expression to test for the conditions. In our example sample data, if the first column has the '*' character, then the line is a comment line and should be ignored.

The first factory is named as "Validate Line" with the FACTORY_NAME keyword. The INPUT FEATURE_TYPE line with the '*' allows the factory to read all lines from the CAT file. For each ID field of the input lines, the factory will test it against the TCL regular expression "{^\*|^$)". In plain English, the regular expression will look for the existence of the '*' character or the newline character '$' in the first column. If either of these is found, then the entire TEST TCL expression will evaluate to the value of 1 and the line record will fail the test and the factory will not pass the record down the pipeline to the second factory.

Step 3 - Run the Translation

Once the mapping file has been created, you can run it through the FME Universal Translator interface or you can run it from the DOS Command Prompt. An example translation run is shown below.

Recently, I did some GeoMedia 6.0 VB programming and I had a bit of a problem connecting to a GeoMedia SmartStore Warehouse. According to the GeoMedia Professional Object Reference document, the Connection object's ConnectInfo property is supposed to be set according to the following:

DDCLocation=file | folder; CSF=csffile

However, this is incorrect. The DDCLocation keyword is actually in uppercase.