Contents of the MDCD.DOC file

This documentation is organized in the following order, under the following headings:

    WHAT IS IT ?
    WHAT ISN'T IT ?
    WHY WAS IT WRITTEN ?
    HOW CAN I USE IT ?
    WHAT ARE THE RESTRICTIONS ?
    CONCERN # 1
    CONCERN # 2
    CONCERN # 3
    CONCERN # 4
    MDCD GENERAL INFORMATION & FEATURES
    MDCD COMPRESS OPTION
    MDCD DECOMPRESS OPTIONS
    MDCD LIST OPTIONS
    MDCD FUTURE RELEASES
    SOFTWARE DESCRIPTION
    WRITING YOUR OWN PROGRAMS
    THE COMPRESSION ALGORITHM
    MISCELLANEOUS INFO
    LIABILITY
    DISCLAIMERS
    ACKNOWLEDGEMENTS
    WHERE CAN I GET THE MOST RECENT COPY OF THE SOFTWARE ?
    WHERE CAN I CONTACT THE AUTHOR ?

WHAT IS IT ?

MDCD 1.0 is the first release of a file compression and decompression program that compresses data using a 13 bit LZW algorithm. It was written in Turbo Pascal and requires the Turbo Pascal 5.0 compiler. Portions are written in 808x assembler and require Turbo Assembler 1.0 or MASM 3.0+.

It is not as fast as PKWARE products but compresses almost as well. It is significantly faster and compresses better than the only version of ARC that I could find to compare it against (version 5.20).

It demonstrates some interesting differences when compared to current file compression/decompression programs. I will talk about the differences in a bit, but first I want to talk about what it is not...

WHAT ISN'T IT ?

It is not a replacement for PKWARE/SEA products. It is not a solution to the ARC wars that currently have (to mention a few) BBS Sysops, telecommunications networks and PC users in general in total disarray. It is not meant to show off my computer knowledge as I am terrible when it comes to math, and have only a simplistic understanding of the compression algorithm involved.

You are probably asking yourself, "Then why was it written?" I shall explain...

WHY WAS IT WRITTEN ?

I write commercial PC software for a living. This includes a specialized communications program, a telephone call accounting system and several clients that require low level systems programming. The communications program uses compression in the process of preparing files for transmission. The call accounting system is rather large and I need to reduce disk storage and increase reliability when shipping and installing. I also like to have an orderly method of keeping track of various releases and revision levels of this software.

I did not want to pay an exorbitant fee to license compression technology from a third party. Not only is this cost-prohibitive for someone trying to make a simple living, but it is nearly impossible to get something that can be tailored to one's own needs. You are stuck with a program that more than likely requires more disk space, or gobbles up more memory, than you care to relinquish.

So, I wrote my own. It is written for my needs. It uses minimal memory. It requires no disk work area. It allows me to remember the exact path that a file was compressed from. It lets me store comments about the file. It lets me keep duplicate file names in a compressed file, retaining their real-time chronological order. It lets me remember the original file's date, time and size. It allows me to retain a file's original attributes so that if it is a hidden, system, read only file before compression, it will be likewise after. It allows me to simply group files together in a single area and doesn't waste time trying to compress already compressed .ARC or .ZOO files. It serves my exact needs and is written for me, a commercial software author, to facilitate control and distribution of software.

I wrote it for one other express purpose. I continually grow in my programming abilities. I have been programming computers for over 20 years and can't think of any period in my career where I wasn't constantly increasing my knowledge and awareness of current computer technology. This has happened at a fairly logical pace for most of my programming life... Until I discovered the crazy world of telecommunications/public domain/shareware/BBS'ing/networks in the sky/etc./etc.

This crazy world has increased my knowledge, solved my day-to-day technical problems, provided me with one heck of a lot of fun and exposed me to computer technology at a rate I would never have believed possible. I have learned from the generosity of others sharing their knowledge, experience and brilliance and I want to attempt to repay that in kind.

"So", you ask yourself, "how can I use this software?"

HOW CAN I USE IT ?

You can use it in any way that it serves your needs, providing you adhere to a couple of restrictions I will talk about in a minute. I am including all of the source code. If you are a shareware, or commercial software author, you could probably use it in many of the ways I mentioned above. Some ideas that come to mind:

- Compression of files by a communications program prior to transmission.

- Graphics compression while saving to disk.

- Word processor document compression.

- Distributing products in compressed format and reliably determining a successful installation.

- A personal file compression program.

- Maintaining version/release control of software products, word processing documents, or just about anything.

To mention a few!

"Hmmmmm, sounds like I could use this, but what ARE those restrictions ?"

WHAT ARE THE RESTRICTIONS ?

I toyed with the idea of copyrighting this software. I wanted to somehow have a means to control the few restrictions & requests that I had decided on. But that seemed kind of silly, and basically unenforceable. I was modifying a brilliant algorithm (which I do not understand), written by someone else (who had placed the code in the public domain) and adding some simple stuff to move some files around and keep track of basic information. And on top of that I was releasing the source to the general public, virtually no strings attached! I figured that I would get thrown out by any court trying to decide if my copyright was enforceable, especially if the proprietary source code was made available to all. So... NO COPYRIGHT.

I have three restrictions. All are unenforceable legally, one is enforceable morally, the other two are enforceable by virtue of the media in which this is distributed.

Restriction 1: If you use this code as part of a product that you gain monetarily from, I will be considered a "registered" user of that product, with all related privileges. In other words, you send me current releases of your product. If you have a BBS that supports your product and charge a related fee, I will be able to use it. If you have a quarterly newsletter, I will receive a copy. I consider this enforceable morally as I find it difficult to believe that any shareware or commercial organization would prohibit unauthorized use of their own software and yet violate this principle in using others' software against stated restrictions. (is he naive, you ask?)

Restriction 2: You may not use this code to create a product that competes with existing compression programs, shareware or otherwise. I do not want to be part of, or contribute to, the mass confusion that currently exists over conflicts between SEA and PKWARE. I will not delve into this issue in this document, but please KNOW that I see RED every time I think of the grief and chaos that this issue is causing a whole world full of users. This is somewhat enforceable in two ways. Number 1, if you are discovered by the public in general, your product will not be supported, and number 2, if I find out about it, I will make it common knowledge on every BBS and major network in this country. And believe me, tenacious and relentless are my two middle names. (yup... he's naive!)

Restriction 3: If you distribute this software, don't charge for it. Charges for postage & handling (that are in line with currently prevailing rates for shareware & public domain software) or connection and access charges to commercial networks or pay BBS's are excluded from the "don't charge for it" portion of this restriction. Make sure that it is distributed in the same form that you received it in. Do not remove or modify any of the files. This is enforceable in the same manner as restriction 2 discussed above.

"Okay, I want to use it, and I agree to the restrictions, ... BUT I have some concerns".

CONCERN # 1

"I am concerned about the legality of using this software."

If you have second thoughts about using this software in your commercial or shareware products, and are worried about not REALLY having my permission, write me a letter stating that you would like express permission to use this software, and state that you will adhere to the restrictions outlined above. I will send you my permission on (what do they call that stuff?? oh yeh..) paper giving you the authority to use the programs in any way you see fit.

CONCERN # 2

"I'm not sure I'm up to modifying this compression stuff."

If you're not sure you want to get into the complexities of writing or adapting compression technology to your software products, I will make myself available to you (after all, this IS how I feed my rug rats!) on a consulting basis. While generating direct revenue was not a consideration for creating this product, I would rather enjoy the task of adapting this software in unique areas.

CONCERN # 3

"Does this stuff really work?"

I have tested it as extensively as I can. That means that the first person to try it will probably find everything I missed, and then some. I have also had 5 other people putting it through its paces. So far, no problems. I have included extensive disk error checking. Throughout the entire development process, I did not once wipe out any of my disk files, or create any cross-linked clusters, and most of the disk I/O is done in assembler. I attribute this to 1) starting with a working public domain program, 2) extensive disk error checking, 3) Turbo Pascal's extensive error checking, and 4) extreme caution and care on my part.

CONCERN # 4

"I've heard about them VIRII. Am I taking a chance?"

All of the source code is included. Look at it, recompile it yourself, and test it in your own carefully created environment.

MDCD GENERAL INFORMATION & FEATURES

MDCD has three basic functions. It allows you to compress a file or files, decompress a previously compressed file or list the directory of files contained within a compressed file.

Typing MDCD with no parameters or entering an invalid command will display a help screen.

MDCD is a fairly functional compress/decompress program. It was written to test and exercise the core routine that does the actual compression/decompression (MDCD1213.ASM). The source code provides excellent examples for writing your own compression program or routines.

Some of the features, and lack thereof, are:

- Uses 13 bit LZW compression.

- Original file date, time, and attributes are preserved.

- Compress file extensions default to .MD if not specified.

- Files added to an existing Compress file are appended to the end of the file.

- Because duplicate files are allowed, and because files are added in physical order, you get an inherent ability to keep chronological backups of files.

- File headers are added in the order that files are added to the Compress file. Therefore Compress file directory lists will not be in alphabetical order as you are probably used to. They will be in the same order that you added files to the Compress file.

- No external disk work space is required for any functions.

- The program is fairly small: 39k for Code, 1k for Data, 3k for Stack and 45k for Heap, for a total of 88k. Using 12 bit compression would reduce the program size by 20k, requiring 68k total memory.

- The complete drive:path\name is kept for every compressed file.

- 122 bytes of overhead is incurred for each file stored in a compress file. Most of this is for the original drive:\path\name. I chose to do this on the first release because having this information available is much more valuable than the related disk space required.

- If a file does not compress smaller than the original, it is stored as is, retaining its original size.

- .ZOO and .ARC files are automatically recognized, and no attempt at compression is performed. The file is stored as is.

- Conventional DOS wildcards may be used when specifying files to be compressed.

- MDCD will not inadvertently try to compress the Compress file that it is currently writing to.

- MDCD will, at this time, only decompress an entire Compress file.
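
The store-as-is rules in the list above can be sketched in a few lines. This is only an illustration, written in Python rather than the Turbo Pascal of MDCD itself. The function name store_or_compress and the compress callback are made up for the example, and the check here looks only at the file extension, whereas MDCD also looks at the first character stored in the file.

```python
def store_or_compress(name, data, compress):
    """Return ("stored", data) or ("compressed", packed).

    `compress` is any callable taking bytes and returning bytes. It is a
    hypothetical stand-in for the real LZW routine.
    """
    # .ARC and .ZOO files are stored as is, with no compression attempt.
    if name.upper().endswith((".ARC", ".ZOO")):
        return "stored", data
    packed = compress(data)
    # A file that does not come out smaller is stored at its original size.
    if len(packed) >= len(data):
        return "stored", data
    return "compressed", packed
```

Passing the compressor in as a callback keeps the decision logic separate from the algorithm, so the same rule applies to 12 bit and 13 bit compression alike.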

MDCD COMPRESS OPTION

This option allows you to compress a single file or multiple files. You can compress to a new file or to an existing file. Additions are always made to the end of the file. The compress file may contain exact duplicate drive:\path\filenames. If you attempt to compress to an existing file, a validity check is made prior to any physical writing to the file. If it is determined not to be a compress file, you will be informed and the program will terminate. Three (3) parameters are required:

Parameter 1: This parameter always contains the option. In the case of compression it must be 'C'.

Parameter 2: File to be compressed. This may include a complete drive:\path\ in front of the file name. If no drive: or path\ is specified, the current directory will be searched. Valid DOS wildcards are allowed.

Parameter 3: File to contain the compressed input file. If it already exists, it must be a valid compress file. If not found, it will be created. It may include a complete drive:\path\ in front of the file name. The drive:\path\ is verified for existence. If no drive: or path\ is specified, the current directory will be used. If no file extension is specified, the extension of .MD will be appended. If you wish no file extension for the file, end your file name with a period, e.g. [COMPFILE.].
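
The extension rule for the compress file name can be sketched as follows. This is an illustration in Python, not code from MDCD.PAS. The helper name apply_default_extension is hypothetical, and dropping the trailing period from the final name is an assumption based on how DOS treats such names.

```python
def apply_default_extension(name: str, default_ext: str = ".MD") -> str:
    """Return the compress file name after applying the default extension rule."""
    # Split off any drive:\path\ prefix so periods in directory names are ignored.
    sep = max(name.rfind("\\"), name.rfind(":"))
    prefix, base = name[: sep + 1], name[sep + 1 :]
    if base.endswith("."):
        # A trailing period means "no extension" (assumed: the period is dropped).
        return prefix + base[:-1]
    if "." in base:
        # An explicit extension is kept as typed.
        return prefix + base
    # No extension given, so the default is appended.
    return prefix + base + default_ext
```

For example, COMPFILE becomes COMPFILE.MD, while COMPFILE. stays without an extension and SAVE.ARC is left alone.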

MDCD DECOMPRESS OPTIONS

There are two different decompress options allowing the decompression of all files in an existing compress file. They are identical except that the 'D' option will pause and prompt you if it encounters an existing file of the same name as one it is about to decompress. The 'R' option will automatically 'R'eplace any files it decompresses, informing you with a message, but no pause.

If you attempt to decompress a nonexistent file, you will be informed. If you attempt to decompress an existing file, a validity check is made. If it is determined not to be a compress file, you will be informed and the program will terminate. Two (2) or three (3) parameters may be specified:

Parameter 1: This parameter always contains the option. In the case of decompression, it must be 'D' or 'R'. Specifying 'D' will cause a pause and prompt to occur if the program attempts to decompress a file that currently exists. You may respond with 'Y'es to replace the file or any other key to ignore the file and go on. Specifying 'R' will automatically replace any pre-existing files encountered.

Parameter 2: File to be decompressed. This may include a complete drive:\path\ in front of the file name. If no drive: or path\ is specified, the current directory will be searched. If no file extension is specified, the extension of .MD will be appended. If you wish no file extension for the file, end your file name with a period, e.g. [COMPFILE.].

Parameter 3: Drive:\path to contain the decompressed output files. This parameter may be left blank, in which case files are decompressed to the current directory. If a drive:\path\ is entered, it is verified for existence. One slight problem exists when specifying the path. If you specify an output drive:\path other than the root directory, you must leave off the trailing '\'. In other words, if you want to decompress to the root directory of your C: drive, enter C:\ - if you want to decompress to a subdirectory off the root directory of your C: drive, enter C:\SUBDIR - do NOT enter C:\SUBDIR\. I will fix this in the next release.
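
A front end that builds the MDCD command line could apply the trailing '\' rule itself. The sketch below is an illustration in Python. The helper name normalize_output_dir is hypothetical and is not part of MDCD.

```python
def normalize_output_dir(path: str) -> str:
    """Strip a trailing backslash unless the path names a root directory."""
    is_root = path == "\\" or path.endswith(":\\")
    if path.endswith("\\") and not is_root:
        return path.rstrip("\\")
    return path
```

With this rule C:\ is passed through unchanged, while C:\SUBDIR\ becomes C:\SUBDIR.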

MDCD LIST OPTIONS

There are two different list options. They both allow you to display a directory of all of the files in an existing compress file. They are identical except that the 'L' option displays the original file date and time information and the 'F' option displays the original file drive:\path\ in its place.

If you attempt to list a nonexistent file, you will be informed. If you attempt to list an existing file, a validity check is made. If it is determined not to be a compress file, you will be informed and the program will terminate. Two (2) parameters are required:

Parameter 1: This parameter always contains the option. In the case of list, it must be 'L' or 'F'. Both options will display a directory of all compressed files contained in a compress file. Specifying option 'L' will cause the original file's date and time to be displayed. Specifying option 'F' will cause the original file's drive:\path to be displayed instead of the date/time.

Parameter 2: File to be listed. This may include a complete drive:\path\ in front of the file name. If no drive: or path\ is specified, the current directory will be searched. If no file extension is specified, the extension of .MD will be appended. If you wish no file extension for the file, end your file name with a period, e.g. [COMPFILE.]. Wildcards cannot currently be used to specify multiple compress files to be listed. This will be addressed in the next release.

MDCD FUTURE RELEASES

MDCD will be an evolving product. Items on my to-do list are:

- Ability to sort several ways when doing a compress file directory list.

- Various length header records to allow a smaller header record without the drive:\path\, one to allow a file comment to be attached to each file and one to allow a file description to be attached to each file (variable length).

- Ability to override file compression and force file(s) to be stored directly.

- Ability to disable the saving/restoring of original file attributes.

- Ability to compress and decompress an entire disk(ette), or sub-directory and all lower level sub-directories, maintaining original disk(ette) structure.

- Ability to force decompression to the original drive:\path\.

- Program to allow the creation of self-extracting compress files.

- Ability to decompress individual files and selected file(s) that have duplicate names.

- Fix the irritation when using append or similar TSR's where a file appears to already exist, but doesn't.

- Allow wildcards to be used for decompressing and listing multiple Compress files.

- Implement heuristic logic and user specified requests allowing the use of both 12 bit and 13 bit compression.

- Change MDCD1213.ASM to allow simpler interface with Turbo C and various models.

WRITING YOUR OWN PROGRAMS

MDCD1213.ASM is the compression/decompression algorithm routine. It can be assembled with TASM 1.0 or MASM 3.0 and up. It creates MDCD1213.OBJ. It has two routines which may be called: CompressFile and DecompressFile. Each is a FAR assembler proc. Make sure you identify these calls as FAR in your high level language. They are both implemented as high level language functions in that they pass back a WORD/INT/DW value indicating the success of the request. They also pass back a RECORD/STRUCT/STRUC that contains either the compressed file CRC and size, or the decompressed file CRC and size. It is up to the caller to determine if the CRC is correct. This assumes that the user of this routine has implemented his/her own internal file structure.

To call CompressFile or DecompressFile, you need to pass several parameters on the stack. The parameters are the same for both functions; however, their meaning is a bit different depending on which routine you are calling. For a more detailed description of the parameters, see the listing for MDCD1213.ASM or look at TESTC.PAS or TESTD.PAS. These parameters are described below and are pushed onto the stack in the order described:

Parameter 1: This is a valid DOS file handle for the input file. The file must be currently open under this handle. For CompressFile, this is the individual file to be compressed. For DecompressFile, this is the file that is to be decompressed.

Parameter 2: This is a valid DOS file handle for the output file. The file must be currently open under this handle. For CompressFile, this is the compressed output file. For DecompressFile, this is the file, opened under whatever name you want it to have, that receives the decompressed data.

Parameter 3: This is the LONGINT/LONG/DD value that specifies the byte offset within the input file at which you want to start decompressing. This is provided so that you may include multiple files in a compressed file. See MDCD.PAS for an example. MDCD1213.ASM specifies this parameter as two word values, but to the caller it is strictly a double word long value. If you are only compressing and decompressing single files, this value will always be zero (0).

Parameter 4: This is a @POINTER/&POINTER/SEGMENT:OFFSET that points to a RECORD/STRUCT/STRUC defined in the caller's program. MDCD1213.ASM specifies this parameter as two word values, but to the high level language caller this is strictly a single far pointer. To an assembler programmer, the first parameter is the segment and the second parameter is the offset. This area will receive return data from CompressFile or DecompressFile. This record needs to contain 6 bytes, allocated as a WORD/INT/DW (2 bytes) and a LONGINT/LONG/DD (4 bytes). For CompressFile, the compressed file's CRC and size in bytes are returned. For DecompressFile, the decompressed file's CRC and size in bytes are returned. The exact declaration depends on your language.
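
As an illustration of the return record described for Parameter 4, the sketch below decodes such a record from raw bytes. It is written in Python and is not taken from the MDCD source. The field order (CRC first, then size), the little-endian byte order and the absence of padding are assumptions, and the names RESULT_RECORD and unpack_result are made up for the example.

```python
import struct

# Assumed layout: a 16 bit unsigned CRC followed by a 32 bit signed size
# (a Pascal LONGINT), little-endian, with no padding between the fields.
RESULT_RECORD = struct.Struct("<Hl")


def unpack_result(raw: bytes):
    """Split a raw return record into (crc, size_in_bytes)."""
    crc, size = RESULT_RECORD.unpack(raw)
    return crc, size
```

RESULT_RECORD.size gives the total number of bytes the caller must allocate under these assumptions, namely a 2 byte word plus a 4 byte long.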

Parameter 5: This is a WORD/INT/DW containing the segment address of a contiguous area of memory used for storing the temporary hash table. The size of this area varies depending on whether you are using 12 or 13 bit compression, and on the routine being called.

Parameter 6: This is a value indicating the type of compression or decompression to be performed. This value MUST be 12 for 12 bit or 13 for 13 bit. The values are not checked, and an incorrect value will more than likely lunch your machine.

In the event of a severe error in the MDCD1213.OBJ module, control will be passed back to the caller (after cleaning up the stack) and a function return code of $ffff/0xffff/0ffffh will be returned. The SS & SP registers and the number of parameters on the stack are stored upon entry so that MDCD1213.OBJ can unwind itself no matter what routine the error occurs in and perform a far jump back to the caller.

For a simple example of using MDCD1213.OBJ see the included files TESTC.PAS and TESTD.PAS. TESTC.PAS allows you to compress a file by entering:

TESTC [uncompressed_file_name] [compressed_file_name]

TESTD allows you to decompress a file by entering:

TESTD [compressed_file_name] [uncompressed_file_name]

NOTE: MDCD1213.ASM needs to be changed to interface with Turbo C. It is currently model sensitive, and the parameters would have to be passed backwards, or with Pascal calling conventions. The segment names would also have to agree with Turbo C conventions.

THE COMPRESSION ALGORITHM

The compression algorithm is contained in MDCD1213.ASM. It is an extensively modified version of Tom Pfau's LZCOMP & LZDCMP programs written in 808x assembler. It provides for 12 bit and 13 bit LZW compression. Tom's original programs are contained in this download.

Tom's original source implemented the Lempel-Ziv-Welch 12 bit compression with non-repeat packing. The code size started at 9 bits and proceeded to a maximum of 12 bits. Once the number of codes exceeded what the current code size could represent, the number of bits was increased. When the table filled (4096 entries, the most that 12 bits allows), a clear code was transmitted for the decompression routine and the table was re-initialized, starting over at 9 bits. This algorithm is referred to as "Crunched" (note the UPPER case 'C') in several compression programs.

I modified the code to also allow for a maximum of 13 bits, or 8192 codes. Other than that, it is identical to 12 bit compression. This algorithm is referred to as "Squashed" in several compression programs.
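
The scheme described above (codes growing from 9 bits up to a 12 or 13 bit maximum, with a clear code when the table fills) can be sketched roughly as follows. This is a simplified Python illustration, not a port of MDCD1213.ASM or of LZCOMP/LZDCMP. It returns (code, width) pairs instead of packing bits, and the code numbers chosen for CLEAR and EOF, as well as the exact moment at which the width grows, are assumptions.

```python
CLEAR, EOF, FIRST_FREE = 256, 257, 258  # assumed code assignments
MIN_BITS = 9


def lzw_compress(data: bytes, max_bits: int = 13):
    """Return a list of (code, width_in_bits) pairs for data."""
    max_codes = 1 << max_bits
    table = {bytes([i]): i for i in range(256)}
    next_code, width = FIRST_FREE, MIN_BITS
    out, prefix = [], b""
    for value in data:
        candidate = prefix + bytes([value])
        if candidate in table:
            prefix = candidate
            continue
        out.append((table[prefix], width))
        if next_code < max_codes:
            table[candidate] = next_code
            next_code += 1
            # Widen once a code exists that no longer fits in `width` bits.
            if next_code > (1 << width) and width < max_bits:
                width += 1
        else:
            # Table full: tell the decoder to start over at 9 bits.
            out.append((CLEAR, width))
            table = {bytes([i]): i for i in range(256)}
            next_code, width = FIRST_FREE, MIN_BITS
        prefix = bytes([value])
    if prefix:
        out.append((table[prefix], width))
    out.append((EOF, width))
    return out


def lzw_decompress(codes, max_bits: int = 13) -> bytes:
    """Rebuild the original bytes from the pairs made by lzw_compress."""
    max_codes = 1 << max_bits
    table = {i: bytes([i]) for i in range(256)}
    next_code, previous = FIRST_FREE, None
    out = bytearray()
    for code, _width in codes:
        if code == EOF:
            break
        if code == CLEAR:
            table = {i: bytes([i]) for i in range(256)}
            next_code, previous = FIRST_FREE, None
            continue
        if code in table:
            entry = table[code]
        else:
            # The code refers to the entry that is about to be defined.
            entry = previous + previous[:1]
        out += entry
        if previous is not None and next_code < max_codes:
            table[next_code] = previous + entry[:1]
            next_code += 1
        previous = entry
    return bytes(out)
```

Calling lzw_compress(data, 12) corresponds to the 12 bit variant and lzw_compress(data, 13) to the 13 bit variant. The only difference is the table limit of 4096 versus 8192 codes, and the decompressor must be given the same max_bits that was used to compress.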

The original source of the algorithm is from the article "A Technique for High Performance Data Compression" by Terry A. Welch which appeared in IEEE Computer Volume 17, Number 6 (June 1984), pp 8-19.

MISCELLANEOUS INFO

MDCD1213.ASM uses the same 16 bit CRC used by communications programs.
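
For readers who want to check CRC values outside of MDCD, here is a small Python sketch of a 16 bit CRC. It assumes that "the same 16 bit CRC used by communications programs" means the CRC used by XMODEM-CRC (polynomial 1021 hex, initial value 0). That is a guess, so check MDCD1213.ASM before relying on it. The function name crc16_xmodem is made up for the example.

```python
def crc16_xmodem(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-16 with polynomial 0x1021 and no bit reflection."""
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc
```

The optional crc argument lets a caller feed a file through in several buffers, passing the previous result back in each time.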

If MDCD1213.ASM encounters a corrupted file during decompression, or if invalid parameters are passed to it (e.g. 12 bit when it was actually compressed using 13 bit), your machine will probably go to lunch. I will work on trying to recognize these problems and finding a solution that allows for a graceful exit.

All of the included source code is HEAVILY commented... Maybe even excessively. This should help to make it more understandable for anyone attempting to utilize it in their own software tools.

My tests of 12 vs. 13 bit compression show that .EXE files are an average of 1/2 of 1% smaller using 12 bit. Most other files, especially text type files, will be anywhere from 4% to 7% larger using 12 bit. This is why I chose to implement the 13 bit LZW compression option in MDCD.

LIABILITY

LIMIT OF LIABILITY

MDCD and its related source code are distributed as-is. The author disclaims all warranties, expressed or implied. The author will assume no liability for damages either from the direct use of this product or as a consequence of the use of this product.

DISCLAIMERS

I have NOT looked at the source code for ARC, ZOO, DWC or any other file compression programs other than the original LZCOMP/LZDCMP assembler code, in the process of designing or programming this software. I have not used any existing ARC, ZOO or DWC documentation to assist in the design of file formats or file stowage methodologies. I have purposefully avoided any likeness to existing file compression programs that I have knowledge of. I have used the command line to pass parameters to my program. If THIS ever becomes a "look-and-feel" issue, we are ALL in big trouble.

I HAVE looked at hex dumps of both ARC and ZOO files to determine how to identify them as such, so that I can avoid the compression cycle and subsequently store the file. This is limited to finding a commonality that identifies these files, and in both cases, is determined by the file extension, along with the first character stored in the file. This information was NOT garnered from ARC/ZOO file documentation.

This program will NOT extract files from ARC or ZOO files or compress files into existing ARC or ZOO files. It WILL store an ARC or ZOO file into my .MD files as it will ALL files.

ACKNOWLEDGEMENTS

We walk on the shoulders of others that came before us, or however that old cliche goes.. I want to acknowledge several people who have contributed directly or indirectly to this program; most, if not all of whom have never heard of me. Please forgive any omissions:

Terry A. Welch, for the compression algorithm.

Tom Pfau, for LZCOMP & LZDCMP.

SEA & PKWARE (in a convoluted way), for creating the atmosphere that "incited" me to write this thing.

TurboPower Software, for the highest quality programming tools and a refreshingly unique commitment to professionalism and business practices.

Jerry D. Stuckle for the ASYNC.ASM tutorial, from which I learned 808x assembly and the 8250.

Phil "PIB" Burns for PIBTERM, from which I learned many excellent Turbo Pascal techniques and overcame communications hurdles.

Ray Duncan for "Advanced MSDOS Programming", Microsoft Press, without which, life would have been much more complex.

Jim Kyle, Chip Rabinowitz, Ray Duncan & other contributors to "The MS-DOS Encyclopedia", Microsoft Press, a formidable reference to Ms. DOS and all her idiosyncrasies.