Programmers' Patch

Matthew Phillips delves into disc directories

As the Programmers' Patch mailbag has not even been underwhelmed with a letter from Mrs Trellis of North Wales, I am going to continue with the topic I was pursuing last issue (I'm glad you didn't say 'month' there - Richard). We had had a good look at how data is physically stored on a floppy disc. I perhaps should have pointed out that most of what I said about physical storage applies to almost any floppy disc on any computer. This month we are going to get on to stuff which is specific to the Amstrad (or rather to CP/M, whose disc format Amstrad adopted for Amsdos). I also forgot to point out the rather strange numbering of the sectors in that diagram of a disc which appeared. Sector C6 intervened between sectors C1 and C2. Peculiar. More on that later.

Right, now it is time to fire up your disc sector editor and start rooting around the Directory. Fire up Discedit, Sected, Discology, Xexor, DMon, or whatever and pop a typical Data disc in the drive. I am using Martin Schroeder's Discedit for the screenshots in this article. A listing for this was published in Amstrad Action, November 1987, but all sector editors look very similar when you get down to it, as there is not a lot you can do with an 80-column screen once you have displayed the sector data in hexadecimal and in characters.

Sector editing for beginners

In the main part of the screen there are three columns. The column of four-digit hex numbers on the left side is just for reference, so you can work out how far into the sector data each line is. It happens that Discedit starts the numbering at &3000 because that is where it stores the sector data in memory. The middle area shows the bytes of the sector in hexadecimal. They are arranged in lines of sixteen bytes. Some sector editors put an extra gap after eight bytes to help you to count. On the right-hand side, the appropriate characters from the Amstrad's character set are displayed, so it is easy to read off ASCII data such as DISCEDITBAS. Discedit displays all characters from 0 to 255: some sector editors just show a full stop in place of the control characters.

There are only 16 lines of data shown, and as most disc sectors on Amstrad discs are 512 bytes long, only half a sector fits on screen at once. With Discedit the ']' key allows you to flip the display between the two halves of the sector. Across the top the display shows which drive and track we are looking at, and then there is a list of the ID numbers of the sectors present on that track. In the screenshot C1 is highlighted because that is the sector in view in the main part of the window.

The data in view in the main part of the window can be edited, usually by typing in hex codes or by typing the characters. It is not actually written to the disc until you tell the editor to do so. Generally, any changes you have made will be forgotten when you move to another sector, unless you have saved them to the disc first.

The interleave principle

Again, in the list of sector ID numbers, you will see that C1 to C9 are jumbled up, with C6 intervening between C1 and C2. The track has been formatted this way in order to make it quicker to read the disc.

Often, when reading files from the disc, the computer will need to read several sectors in a row. After sector C1 has passed under the drive head, the computer takes a moment to digest the data it has just read, and there is a risk that the next sector on the track will have already started passing under the drive head when the drive is told to read the data. If this happens, the disc will have to make a complete revolution before we can begin reading again. This slows things down, so this trick of sector skewing is used. The next numbered sector on the track, C2, is actually the next but one physically, so we have plenty of time to digest the data without missing the start of the next sector.
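The resulting layout can be sketched in a few lines of Python. This is only an illustration: it assumes nine sectors per track and that each numbered sector is placed two physical positions after the previous one, which matches the C1, C6, C2 pattern described above. The function name and its parameters are invented for the example.

```python
def interleaved_ids(first_id=0xC1, count=9, step=2):
    """Return the sector IDs of one track in physical order.

    Assumption: each numbered sector sits `step` physical slots after the
    previous one, wrapping round the track and skipping slots already used.
    """
    slots = [None] * count
    pos = 0
    for n in range(count):
        while slots[pos] is not None:   # slot already taken, move on one
            pos = (pos + 1) % count
        slots[pos] = first_id + n
        pos = (pos + step) % count
    return slots


# Prints: C1 C6 C2 C7 C3 C8 C4 C9 C5
print(" ".join(f"{sector:02X}" for sector in interleaved_ids()))
```

Reading C1 and then C2 leaves one whole sector's worth of time (C6 passing under the head) for the computer to deal with the data.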

Looking at the directory

With a Data format disc in the drive, move to sector C1 on track zero and you will see something similar to that displayed in the screenshot. You should recognise the names of some of the files on your disc. On Data format discs the first four sectors are taken up with the directory. Each directory entry is 32 bytes long, recording both the name of the file and where it is stored on the disc. We will take a look at the first one as an example.

First byte: User number.

f1 to f8: File name. Up to eight ASCII characters, upper case, the rest filled with spaces.

t1 to t3: Extension. Up to three characters, upper case, the rest filled with spaces.

ex: Extent number.

s1: Last record byte count (CP/M+ only).

s2: High byte of extent number, if required.

rc: Record count.

d0 to dF: Disc blocks allocated to this file.

These are the conventional names for each byte of the directory entry, with a brief explanation for reference. Let us look at them in more detail.
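For readers who would rather see the layout as code, here is a minimal Python sketch that splits one 32-byte entry into its fields. It assumes the Data format case, where each of d0 to dF is a single byte, and it treats a zero block number as unused. The function name, the dictionary keys and the sample entry (built from the DISCEDIT.BAS values quoted in this article) are all made up for the illustration.

```python
def parse_entry(raw):
    """Split one 32-byte directory entry into named fields."""
    if len(raw) != 32:
        raise ValueError("a directory entry is exactly 32 bytes")
    # The top bit of each name character is not part of the name,
    # so it is masked off before decoding.
    name = bytes(b & 0x7F for b in raw[1:9]).decode("ascii").rstrip()
    ext = bytes(b & 0x7F for b in raw[9:12]).decode("ascii").rstrip()
    return {
        "user": raw[0],
        "name": name,
        "ext": ext,
        "ex": raw[12],
        "s1": raw[13],
        "s2": raw[14],
        "rc": raw[15],
        # Assumption: one byte per block number, zero meaning "not used".
        "blocks": [b for b in raw[16:32] if b != 0],
    }


# Hand-built sample entry: user 0, DISCEDIT.BAS, rc = &28, blocks 2 to 6.
sample = (bytes([0]) + b"DISCEDIT" + b"BAS" + bytes([0, 0, 0, 0x28])
          + bytes([2, 3, 4, 5, 6]) + bytes(11))
entry = parse_entry(sample)

# Prints: DISCEDIT BAS 40 [2, 3, 4, 5, 6]
print(entry["name"], entry["ext"], entry["rc"], entry["blocks"])
```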

If you don't know what user numbers are... you haven't missed an awful lot. Look up the |USER command in your manual. The important thing to note is that when a file is deleted, most of the directory entry is retained: only the user number byte is changed to &E5. This makes it very easy to 'undelete' a file, providing that the space it occupied on the disc has not already been reused.

When a disc is first formatted, it is entirely filled with &E5; but CP/M does not bother to wipe the actual data from the disc when deleting files, as that would take time. Instead the directory entry is flagged in this manner, and that and the areas of the disc used for the file will be reused in due course. In the screenshot the sixth file, SFORMAT.BAS, has been deleted. CP/M+ discs can have other codes in the user number slot, and these are outlined in the table.
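The deletion flag is simple enough to check in code. The Python sketch below scans a block of directory data for erased entries and shows the one-byte change that an 'undelete' amounts to. It is an illustration only: the function names are invented, the directory is just three hand-built entries, and restoring the user number to 0 is an assumption (the original user number is lost when the byte is overwritten). An entry whose name bytes are all &E5 is taken to be one that has never been used since formatting.

```python
DELETED = 0xE5


def entry_name(entry):
    """Return NAME.EXT from a 32-byte entry, ignoring the top bits."""
    chars = bytes(b & 0x7F for b in entry[1:12]).decode("ascii")
    return chars[:8].rstrip() + "." + chars[8:].rstrip()


def deleted_names(directory):
    """List the names of erased files in raw directory data."""
    found = []
    for offset in range(0, len(directory) - 31, 32):
        entry = directory[offset:offset + 32]
        never_used = all(b == DELETED for b in entry[1:12])
        if entry[0] == DELETED and not never_used:
            found.append(entry_name(entry))
    return found


def undelete(entry, user=0):
    """Return a copy of the entry with the user number byte restored.

    Only worthwhile if the file's blocks have not been reused since.
    """
    return bytes([user]) + entry[1:]


live = bytes([0]) + b"DISCEDITBAS" + bytes(20)
gone = bytes([DELETED]) + b"SFORMAT BAS" + bytes(20)
blank = bytes([DELETED]) * 32

# Prints: ['SFORMAT.BAS']
print(deleted_names(live + gone + blank))
```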

Names and extensions

The storage of the file name and extension (or file type) is easy to see, but note that the full stop that we are used to seeing between the name and the extension is not actually stored on the disc. The name must be stored in capital letters, or weird things start happening. If you fancy experimenting, then use your sector editor to change some of the names to lower case. You will find that the files appear in lower case when you catalogue the disc, but if you try to load any such files, Amsdos and CP/M will not find them, because the directory will be searched for the upper case name instead.

The characters for the name must be from the ASCII range 32 to 127, and many of the characters in that range are not allowed either. Amsdos, CP/M 2.2 and CP/M+ all have slightly different restrictions on which punctuation characters are permitted in file names. The general restriction that characters must be less than ASCII 128 means that the top bit of each of the bytes of the name and extension is potentially free for other purposes. Those in the extension, referred to as t1', t2' and t3', are used as follows:

t1': read-only (file cannot be altered or deleted)

t2': system (file is hidden and available to all users)

t3': archive (file has been backed up)

You can try these out with your sector editor as well. For example, change the &42 at the start of the extension of DISCEDIT.BAS to &C2 and drop back to BASIC and catalogue the disc. You will find an asterisk appears by the file name, and you will not be able to delete the file using |ERA. The archive bit, t3', is intended for use by software that makes back-up copies of files, and is not otherwise useful. Under CP/M+, the top bits of f1 to f4 can be used for user (or application) defined attributes, but f5 to f8 must have the top bit clear.
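The same bit-twiddling can be written out in Python. This is a sketch rather than anything a sector editor actually runs: the function names are invented, and the entry is a hand-built one for DISCEDIT.BAS with everything after the extension left as zero.

```python
def file_attributes(entry):
    """Read the three attribute flags from the top bits of t1 to t3."""
    t1, t2, t3 = entry[9:12]
    return {
        "read_only": bool(t1 & 0x80),
        "system": bool(t2 & 0x80),
        "archive": bool(t3 & 0x80),
    }


def set_read_only(entry, on=True):
    """Return a copy of the entry with the t1 top bit set or cleared."""
    t1 = (entry[9] | 0x80) if on else (entry[9] & 0x7F)
    return entry[:9] + bytes([t1]) + entry[10:]


entry = bytes([0]) + b"DISCEDITBAS" + bytes(20)
locked = set_read_only(entry)

# The 'B' of BAS is &42; with the top bit set it becomes &C2.
# Prints: C2 True
print(f"{locked[9]:02X}", file_attributes(locked)["read_only"])
```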

The size of a cow

The rest of the bytes (ex, s1, s2, rc and d0 to dF) all deal with how large the file is and where it is stored on the disc. For Amsdos Data format discs it is quite easy to understand, but it can be much more complicated for larger disc formats. I will explain the terminology with the simple arrangement of Data format, and then we will see how it extends to Romdos and Parados formats.

The first concept to grasp is the "block". The whole of a CP/M disc is divided into equal sized blocks. On Data format discs they are 1K, which is actually the smallest block size CP/M allows. The first sector of the directory marks the beginning of block 0. On a Data format disc the sectors are 512 bytes big, so block 0 is made up of sectors C1 and C2 on track 0. Note that blocks can straddle two tracks. For example, block 4 is made up of sector C9 on track 0 and sector C1 on track 1.

The first four sectors of a Data format disc are taken up by the directory, equating to blocks 0 and 1. Thus the first block available for storing files is block 2. Have another look at the directory entry for DISCEDIT.BAS and you will find that the d0 byte has the value 2! Get the idea?

The second line of the directory entry (d0 to dF) tells us that DISCEDIT.BAS is stored in blocks 2, 3, 4, 5, and 6. Block 2 is made up of sectors C5 and C6 on track 0. Have a go at working out where one of the files on your disc starts, pop along with the sector editor and see if you can recognise it! For DISCEDIT.BAS, d5 to dF are all set to zero. The zero value means that there is no more space allocated to the file. Block zero itself is always the start of the directory, so it could never form part of a file's allocation anyway.
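If you would rather let the computer do the sums, the arithmetic can be sketched in Python as follows. It assumes nine 512-byte sectors per track and 1K blocks, as on Data format discs. The function and parameter names are made up for the example; the defaults are the Data format values (first sector ID &C1, no reserved tracks), and a System format disc would instead start at sector &41 after two reserved tracks.

```python
SECTORS_PER_TRACK = 9
SECTORS_PER_BLOCK = 2        # a 1K block is two 512-byte sectors


def block_to_sectors(block, first_id=0xC1, reserved_tracks=0):
    """Return the (track, sector ID) pairs that make up one block."""
    result = []
    for i in range(SECTORS_PER_BLOCK):
        logical = block * SECTORS_PER_BLOCK + i      # sectors counted from the directory start
        track, index = divmod(logical, SECTORS_PER_TRACK)
        result.append((track + reserved_tracks, first_id + index))
    return result


for block in (2, 4):
    pairs = [(track, f"{sector:02X}") for track, sector in block_to_sectors(block)]
    print(block, pairs)
# Prints:
# 2 [(0, 'C5'), (0, 'C6')]
# 4 [(0, 'C9'), (1, 'C1')]
```

Block 4 shows the straddling case: its first half is the last sector of track 0 and its second half is the first sector of track 1.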

Precision engineering

Right, what does the record count (rc) indicate? At a fundamental level, CP/M works in "records" which are always 128 bytes long. That is because when CP/M was originally developed, floppy discs usually had sectors only 128 bytes in size. The rc value shows how long the file is in records. DISCEDIT.BAS has a record count of &28, or 40 in decimal, and 40 times 128 is 5120, or 5K. See how that matches up with the five 1K blocks allocated on the second line?

If you have a look at the other directory entries in the screen shot, you will see that it is not always this neat. DISCEDIT.BIN has a record count of 3, making only 384 bytes, much less than the 1K allocated. DEPRO0.BIN, at the bottom, only occupies two records, or 256 bytes. This reflects the fact that there is a certain amount of space wasted on the disc, because files have to be made of whole 1K blocks. Nevertheless, CP/M records the length of the files in records, multiples of 128 bytes.
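The wasted space is easy to put a figure on. Here is a small Python sketch using the record counts quoted above; the function name is invented, and the assumption that DISCEDIT.BIN and DEPRO0.BIN each have a single 1K block allocated is mine, based on their sizes.

```python
RECORD_SIZE = 128
BLOCK_SIZE = 1024            # Data format block size


def extent_usage(rc, block_count):
    """Return (bytes recorded by rc, bytes allocated but unused)."""
    used = rc * RECORD_SIZE
    allocated = block_count * BLOCK_SIZE
    return used, allocated - used


for name, rc, block_count in (("DISCEDIT.BAS", 0x28, 5),
                              ("DISCEDIT.BIN", 3, 1),
                              ("DEPRO0.BIN", 2, 1)):
    print(name, extent_usage(rc, block_count))
# Prints:
# DISCEDIT.BAS (5120, 0)
# DISCEDIT.BIN (384, 640)
# DEPRO0.BIN (256, 768)
```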

Most newer operating systems record the exact number of bytes in a file: Amsdos and CP/M 2.2 just round up to the next multiple of 128 bytes. As far as text files are concerned, the actual end of the file is usually signified with CTRL-Z (character 26, or &1A). Amsdos .BIN and .BAS files start with a header which tells the firmware the precise length of the file. For many other types of file, the fact that there may be a few bytes of garbage on the end does not actually matter.

When CP/M+ was introduced, a method of recording the exact length of the file was brought in. The s1 byte can be used to store how many bytes in the last record of the file are actually used. For the sake of compatibility, a value of 0 means that all of the 128 bytes are used. Very few CP/M+ programs actually make use of this feature, so you will rarely see anything but zero in this byte. In the screen shot you will see that ARTICLE.TXT has s1 equal to &1C and rc equal to 7. This means that the file is precisely 6 x 128 + &1C bytes long, or 796 bytes. That was when I had only typed the first ten lines of this article! The s1 value was only set in this case because I had transferred the file from RISC OS using my own CPMFS filing system, which understands these things!
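As a quick check on that sum, the rule can be written as a short Python function. It is a sketch for a file that fits in a single directory entry, and the function name is made up for the example.

```python
RECORD_SIZE = 128


def exact_length(rc, s1):
    """Length in bytes of a single-entry file, using the s1 byte.

    s1 = 0 means the last record is completely used.
    """
    if rc == 0:
        return 0
    if s1 == 0:
        return rc * RECORD_SIZE
    return (rc - 1) * RECORD_SIZE + s1


# ARTICLE.TXT: rc = 7, s1 = &1C.  Prints: 796
print(exact_length(7, 0x1C))
```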

All creatures great (over 16K) and small

So what are ex and s2 for? If you have been thinking ahead, you will have seen that, as I have explained it so far, there is a problem storing files bigger than 16K. All of the d0 to dF values will be filled up, and there will be no way to indicate where the rest of the file is located. It's time for another screen shot.

This example is from side B of WACCI PD Disc 131. When you catalogue that disc, you will find a file AMSPLAY3.ASM which takes up 18K. As you see from the screen shot, there are two directory entries for this file. In the first one, blocks &18 to &27 are allocated to the file, and the record count is &80, making 128 x 128 bytes, or 16K. In the second directory entry, blocks &28 and &29 are allocated, with a record count of nine, giving 1152 bytes, just over 1K. The important thing to note is the ex value in each case. The first directory entry has ex equal to zero, and in the second it is set to one. The chunk of a file recorded by one directory entry is known as an 'extent', and ex is the extent number. In this example AMSPLAY3.ASM consists of two extents.

One important thing to realise is that the extents of a file do not necessarily appear in the directory in the right order. Files created using Amsdos cannot be lengthened without saving the whole file again, but it is possible to lengthen a file using CP/M. If a new extent is started, the operating system will just take the first free directory entry it comes across. If a previous file has been deleted in the meantime, then that directory entry will be used, and the extents will not be in numeric order in the directory. Even more common is a situation which can occur under Amsdos as well, where the directory entries may be in the right order, but are not consecutive.

The same applies to the blocks allocated to a file. The examples in the screenshots tend to be neatly arranged with consecutive blocks allocated to a file, but once you have deleted a few files and saved a few more, things will be very different. Take a look at a well-used disc in your sector editor, and you will see what I mean.
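Putting a file back together therefore means collecting its directory entries, sorting them by extent number, and then following the block lists in the order they are given. A minimal Python sketch of that idea follows. It assumes the entries have already been decoded into dictionaries and that deleted entries have been left out; the key names and the OTHER.BAS entry are invented, while the AMSPLAY3.ASM figures are the ones quoted above, deliberately listed out of order.

```python
RECORD_SIZE = 128


def extents_in_order(entries, name):
    """Pick out one file's directory entries and sort them by extent."""
    mine = [e for e in entries if e["name"] == name]
    return sorted(mine, key=lambda e: (e["s2"], e["ex"]))


def block_chain(extents):
    """Blocks of the whole file, in the order they should be read."""
    return [block for e in extents for block in e["blocks"]]


def rounded_size(extents):
    """File size in bytes as the record counts give it."""
    return sum(e["rc"] for e in extents) * RECORD_SIZE


directory = [
    {"name": "AMSPLAY3.ASM", "ex": 1, "s2": 0, "rc": 0x09,
     "blocks": [0x28, 0x29]},
    {"name": "OTHER.BAS", "ex": 0, "s2": 0, "rc": 0x10,       # hypothetical file
     "blocks": [0x2A, 0x2B]},
    {"name": "AMSPLAY3.ASM", "ex": 0, "s2": 0, "rc": 0x80,
     "blocks": list(range(0x18, 0x28))},
]

extents = extents_in_order(directory, "AMSPLAY3.ASM")

# Prints: 18 17536
print(len(block_chain(extents)), rounded_size(extents))
```

Eighteen blocks matches the 18K shown in the catalogue, and 17536 bytes is the 16K of the first extent plus the 1152 bytes of the second.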

CP/M imposes no restrictions whatsoever on the use of the disc's storage. This might seem an advantage especially when compared with disc operating systems on other home computers of the same era. The BBC Micro, for example, had a system where files could only be stored in continuous blocks. After deleting a few files, you might have 20K free on the disc; but if the spare space was not all gathered together, it would not be possible to save a 20K file without using the *COMPACT command, which took ages, and might not be available if you were trying to save a file from an application.

There are, however, some disadvantages to CP/M's flexibility. If files are split up in little bits across the disc, it can be much slower to load them, as the drive head has to seek from one track to another much more. Also, it is harder to recover the files if you accidentally wipe your directory with your sector editor!

Keep it simple

Since the days of CP/M, operating system designers have continued with these different approaches. MS-DOS and Windows have the same flexibility and failings as CP/M: it sometimes pays to 'defragment' your hard drive to sort out the mess. Other operating systems (including RISC OS, the successor to the BBC Micro's operating system) have relied on more sophisticated algorithms to ensure that files are kept in one piece so far as is possible. Most systems split up the jobs of recording the names of the files and the spaces on the disc which they occupy. Meanwhile, CP/M's simplicity is a great advantage when it comes to delving in there with a trusty sector editor!

Next issue, we will look some more at the directory entry, and deal with the more complicated situation found on larger format discs where the capacity exceeds 256K. In the meantime, as a taster, I will leave you with a little BASIC program which works out from the block number what the corresponding track and sector numbers are. You can use it together with your sector editor, if you have two computers! It is set up for Data format discs, but if you alter the first two lines to set lows% to &41 and res% to 2, then it will give you the correct answers for System format discs instead. By the way, if you miss out all the percentage signs it will work just as well and will be quicker to type in.