About this blog

I'm a well-known mainframe performance guy, with almost 30 years of experience helping customers manage systems. I also dabble in lots of other technology. I've sought to widen the Performance role, incorporating aspects of infrastructural architecture.
I'm a world-famous podcaster and screencaster (albeit VERY thinly spread). :-)

XML, XSLT and DFSORT, Part Two - DFSORT

The DFSORT code in this post parses the Comma-Separated Values (CSV) file produced by XSLT processing. In this simple example it merely produces a flat file report, but the post has a few additional details you might find valuable.

First, here's the SORTIN DD JCL statement. It's not like a regular sequential file statement as it has to access the zFS file system we wrote the data to with Saxon:

Of particular note in this DD statement are the record format (VB), the logical record length (255) and the block size (32760). This is definitely VB data. I've found that an LRECL greater than the longest record Saxon has produced is fine. Similarly, any sensible block size works. FILEDATA=TEXT is also needed, so that each newline-delimited line in the file is treated as a record.
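
For readers who want something concrete, a DD statement matching that description might look like the sketch below. The path name is hypothetical and PATHOPTS=ORDONLY (read-only access) is an assumption; the RECFM, LRECL, BLKSIZE and FILEDATA values are the ones described above.

```jcl
//* Sketch only: the path name below is hypothetical
//SORTIN   DD PATH='/u/userid/output.csv',
//            PATHOPTS=ORDONLY,
//            FILEDATA=TEXT,
//            RECFM=VB,LRECL=255,BLKSIZE=32760
```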

Here's the SYMNAMES file:

Contents of the SYMNAMES File

RDW,1,4,BI
Row,%01
a,%02

You'll need an accompanying SYMNOUT DD - for the messages DFSORT (or ICETOOL) produces when the SYMNAMES file is processed.
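
As a sketch, the two DD statements could be coded as follows, with the symbols supplied in-stream. Routing SYMNOUT to SYSOUT=* is an assumption; any destination where you can read the messages would do.

```jcl
//* Sketch only: SYSOUT=* for SYMNOUT is an assumption
//SYMNAMES DD *
RDW,1,4,BI
Row,%01
a,%02
/*
//SYMNOUT  DD SYSOUT=*
```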

I'm showing you this first so you can understand the main DFSORT control statements file: Everywhere you see the symbol "Row" in these statements you can interpret it as "%01", whatever that is. Similarly for "a" and "%02". The "RDW" symbol maps the Record Descriptor Word that we need for variable-length record processing. (DFSORT can convert from variable- to fixed-record format but we won't do that here.)

This is a very simple case of using DFSORT. So, for example, there's no SORT, no OUTFIL, nor any ICETOOL sophistication. It's meant to show how you can get the data into a format DFSORT can use. Let me explain how it works:

VLSHRT and the INCLUDE statement will, between them, remove the blank lines Saxon created.
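
A minimal sketch of statements that would behave this way follows. Both OPTION COPY and the INCLUDE condition are assumptions: the condition relies on every data row starting with a double quote in its first data byte (position 5, just after the 4-byte RDW). Blank lines are too short to contain position 5, and with VLSHRT in effect DFSORT treats a comparison involving a short field as false rather than raising an error, so those records are dropped.

```jcl
//* Sketch only: the INCLUDE condition is an assumption
//SYSIN    DD *
* Copy the records, tolerating ones too short for the compare field
  OPTION COPY,VLSHRT
* Keep only records whose first data byte is a double quote
  INCLUDE COND=(5,1,CH,EQ,C'"')
/*
```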

IFOUTLEN sets the output record length (from INREC) to 70 bytes.

The first WHEN=INIT clause parses the input (CSV) data.

The %01 field takes the characters after the first " and before the closing ", (the quote followed by a comma). It becomes a fixed-length character field of 10 bytes.

The %02 field is filled with the remainder of the data in the record - for a length of 8 bytes.

We write out the RDW and both parsed fields.

The second WHEN=INIT clause produces the report lines. We print the %01 field ("Row"), a space, and the %02 numeric field ("a"). For the numeric field we parse the characters to extract the numeric value (with SFF) and then immediately reformat it (with EDIT=(I,IIT)) to insert commas.
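
Putting the last few paragraphs together, the INREC statement might look something like the sketch below. It is a reconstruction from the description, not necessarily the exact statement used: in particular, STARTAFT as the way of skipping the opening quote is an assumption. The symbols RDW, Row and a are the ones defined in the SYMNAMES file.

```jcl
//* Sketch only: reconstructed from the description in the text
//SYSIN    DD *
* First clause parses the CSV; second clause formats the report line
  INREC IFOUTLEN=70,
    IFTHEN=(WHEN=INIT,
      PARSE=(Row=(STARTAFT=C'"',ENDBEFR=C'",',FIXLEN=10),
             a=(FIXLEN=8)),
      BUILD=(RDW,Row,a)),
    IFTHEN=(WHEN=INIT,
      BUILD=(RDW,Row,X,a,SFF,EDIT=(I,IIT)))
/*
```

The I,IIT pattern caters for four digits with a comma after the first, so a longer pattern would be needed for larger values.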

And here's the output:

The Resultant Output

One 1
Two 12
Three 903

Of course we needn't have just printed the data, as I've indicated. With a more interesting data set you could do a lot more.

The use of symbols ("Row" and "a") was largely gratuitous here. It just shows you can use them. If you're a regular DFSORT or ICETOOL user you'll know their value.

If you were to strip this down to the bare essentials the first WHEN=INIT does most of the work - parsing the data into fixed positions. (The one really useful thing the second WHEN=INIT does is to convert the numeric field from free-form characters into a proper numeric value, which it then edits for printing.)
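
To illustrate the stripped-down form, here is a sketch of an INREC statement that only parses the data into fixed positions, using the %01 and %02 parsed fields directly instead of symbols. As before, STARTAFT is an assumption about how the opening quote is skipped.

```jcl
//* Sketch only: bare-essentials variant, no IFTHEN and no symbols
//SYSIN    DD *
* Parse into a 10-byte and an 8-byte field, written after the RDW
  INREC PARSE=(%01=(STARTAFT=C'"',ENDBEFR=C'",',FIXLEN=10),
               %02=(FIXLEN=8)),
        BUILD=(1,4,%01,%02)
/*
```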

So, over these three posts I've shown how you can use XSLT to half tame XML data and DFSORT to complete the taming. I have a couple of other things I want to talk about in relation to this. But those belong in a separate post.