The author of CRCSET (hereafter referred to as "the author") makes nowarranty of any kind, expressed or implied, including without limitation, anywarranties of merchantability and/or fitness for a particular purpose. Theauthor shall not be liable for any damages, whether direct, indirect, special,or consequential arising from a failure of this program to operate in themanner desired by the user. The author shall not be liable for any damage todata or property which may be caused directly or indirectly by use of theprogram.

In no event will the author be liable to the user for any damages,including any lost profits, lost savings, or other incidental or consequentialdamages arising out of the use or inability to use the program, or for anyclaim by any other party.

- Page 1 -

License

This program is public domain. As such, it may be freely distributedby anyone by any means. No person or organization may charge for this programexcept for a minimal charge to cover handling and distribution.

Having said that, I would also like to add that this algorithm hastaken a lot of time and work to develop. If you would like to make anycontribution for the use of this program, please do so, but it is notrequired. I consider anti-virus algorithms too useful to be proprietary. Asfor the contributions themselves, they don't have to be money. Send me copiesof your own programs or whatever you feel is appropriate.

Also, if you have any questions or would like to see some morefeatures in the program, drop me a note by surface or electronic mail (myaddress is on the first page of this file). I will answer all mail regardingthis program.

Customization is available.

- Page 2 -

Introduction

CRCSET is an anti-virus utility. Its purpose is to protect programs(in which supporting code is linked) with one of the most effective weaponsagainst computer viruses: the Cyclic Redundancy Check, or CRC. A fullunderstanding of the CRC is not required to use this utility; if you like, youcan skip over the discussion of the CRC to the section entitled "How to UseCRCSET".

There are many utilities that perform CRC checks on other programs(VALIDATE.COM by McAfee associates is one) but most of these are externalprograms that are usually run only once, if at all. The CRC generated bythese utilities must be compared to a value in an external file; if the valuesmatch, the program is not infected.

This approach has two problems: the first is that the CRC check isrun only once when the user gets the program, if at all. Most people wouldnever run the check a second time. The second problem is that the CRC isstored in an external file (e.g. the documentation). If someone wants to tacka virus onto the program, it becomes a simple matter to run the validationprogram, copy the CRC values to the documentation, and distribute the infectedprogram. Anyone running the validation program would find the same CRC in theprogram as in the documentation, and in comes the virus.

Another (increasingly popular) approach is for the CRC to be stored inthe program itself (the .EXE file) and for the program to do its own checkevery time it is loaded. This method is much more effective than the previousone because it means that the moment the program is infected and the CRCchanges, the infection will be detected the next time the program is run.

There is one big problem with this method, but before I get into that,we need some background.

- Page 3 -

What is a CRC?

The CRC, or Cyclic Redundancy Check, is an error-checking algorithmused in many types of computer operations, especially in data transfer. Forexample, whenever your computer writes to disk, the disk controller calculatesthe CRC of the data being written and writes it with the data. If your diskshould somehow become corrupted, the CRC check will fail when you next try toread the data and the disk controller will return with an error, forcing DOSto display the critical error "Data error reading drive C:". Most filetransfer protocols (like Xmodem, Zmodem, and some derivatives of Kermit) alsouse a CRC check to validate the data being transmitted; if the CRC transmittedwith the data does not match the CRC calculated by the receiving program, thenthe transmission has failed and the sending program is asked to retry thetransmission.

The actual calculation of the CRC is very simple. The algorithmconsists of two parts, a CRC polynomial and a CRC register, and is really anexercise in modulo-2 arithmetic. The rules for modulo-2 arithmetic are shownin the following table:

0 + 0 = 0 0 + 1 = 1 1 + 0 = 1 1 + 1 = 0

For those of you familiar with binary logic, these rules are equivalent tothe exclusive-or operation. Note: under modulo-2 arithmetic, subtraction isequivalent to addition.

There is nothing magical about modulo-2 arithmetic and it has nospecial properties that make it better suited to CRC calculations thanstandard arithmetic; rather, since modulo-2 arithmetic doesn't carry from onecolumn to the next (i.e. 1 + 1 = 0 with no carry), it's just easier toimplement in hardware and software than any other method. Consider thefollowing:

Each '+' symbol represents modulo-2 addition. The numbers above the CRCregister are the bit numbers of the register.

The CRC is calculated as follows:

1. Initialize the CRC register to 0.

2. Add the incoming bit of the data stream to the outgoing bit (bit 3) of the CRC register.

3. Send the result of step 1 into the polynomial feedback loop.

4. Add the value in the feedback loop to the bits in the CRC register as it is shifted left. The bits affected are determined by the CRC polynomial (i.e. there is an addition for every bit in the polynomial that is equal to 1; if the bit is 0, it is not fed back into the register). In this case, the polynomial represented is 1011.

5. Repeat steps 2-4 for every bit in the data stream.

6. The CRC is the value in the register.

Let's try this with the data stream 11010111 and the polynomial 1011. Theresult will be a 4-bit CRC.

The output stream to the left is the result of each addition operationat the left-most gate. This is the value that is fed into the polynomialfeedback loop during the left shift.

What should be obvious at this point is that if a single bit in thedata stream is changed, the value in the CRC register is corrupted completely.The feedback loop ensures that the error is propagated throughout the entirecalculation.

Most CRC algorithms use either a 16- or 32-bit polynomial. The longerthe polynomial, the more effective it is at catching errors; a 16-bit CRC, forexample, algorithm catches more than 99.99% of all errors in a data stream.

All other CRC algorithms are analogous to the 4-bit algorithmpresented here. There are some optimizations that can process several bits ata time; the source code included with this program uses a lookup table thatcan process 8 bits at once. For further discussion of the CRC algorithm andits variations, I recommend "C Programmer's Guide to Serial Communications" byJoe Campbell, published by Howard W. Sams & Company.

- Page 7 -

How CRCSET Works

The idea of storing a program's CRC in the executable file itself hasone drawback: since the CRC is part of the program, it becomes part of thedata stream that is used to calculate itself. In other words, you have toknow what the CRC of the program is in order to calculate it! At compile andlink time, this is downright impossible; changing the slightest thing in yourcode will change the CRC the next time you recompile.

Most algorithms that store the CRC in the program itself get aroundthis drawback by breaking up the program into three chunks:

I won't explain how (I don't want to give any virus-writers anyideas), but with the right technique the CRC can be found, recalculated, andrewritten in under 30 seconds.

CRCSET overcomes this limitation by making the polynomial and the CRCpart of the data stream. In order to calculate the CRC, CRCSET looks for apredefined string in the program (the default is DEAN_CRC), replaces the firstfour bytes with a 32-bit polynomial, sets the next four bytes (the true CRC)to 0, and calculates the intermediate CRC assuming that the true CRC is 0.Then, through the magic of matrix algebra, CRCSET calculates what the true CRCshould have been in order to yield itself instead of the intermediate CRC atthe end. Let's take a look at a 4-bit CRC calculation as an example.

Let's assume that the polynomial in use is 1011, that the CRCcalculated up to the point where we reach the search string (represented bythe bit pattern STUVWXYZ) is 0010, and that the bit pattern 1100 follows thesearch string:

If we collect all the variables on the left and all the constants on theright (keeping in mind that we are dealing with modulo-2 arithmetic):

X2 + X1 = 1 X3 + X2 + X0 = 0 X2 + X1 = 1 X3 + X2 + X0 = 0

The value 1010 is the intermediate CRC mentioned earlier.

Here we have an interesting situation. The first and third equationsare the same and so are the second and fourth. What we come down to is this:

X2 + X1 = 1 X3 + X2 + X0 = 0

We have four variables and only two equations. There is no unique solution;in fact, there are four (2 to the power of (4 - number of independentequations)) separate and distinct sets of values that will satisfy theseequations.

Since CRCSET needs a numeric solution, we have to arbitrarily set bitsto get one. For arguments sake, let's set X2 to 1.

1 + X1 = 1 X3 + 1 + X0 = 0

In other words:

X1 = 0 X3 + + X0 = 1

By setting X2 to 1, we have also fixed X1. Now let's set X0 to 0.

X3 + + 0 = 1

In other words:

X3 = 1

We now have a solution for the CRC of the program: 1100. There are threeothers: 0101, 0010, and 1011. If we replace the string WXYZ with any of thesevalues, the CRC calculation process will yield that value at the end, e.g.:

If you're not sure about this, try it with pen and paper. Plug in each of thefour values and you should get that same value at the end of the CRCcalculation process. To help you out, here are the intermediate values of theCRC register for each solution (the first value is the value after step 2 ofthe calculation):

The fact that there is not a unique solution isn't really important;only about 30% of the time will there be a unique solution. This does notdiminish the effectiveness of the CRC calculation because whichever of thefour values the CRC is set to, any virus installing itself in the program willstill change it. The fact that we did not get a unique solution does mean,however, that it is possible to get the following situation:

X2 + X1 = 1 X3 + X2 + X0 = 1 X2 + X1 = 1 X3 + X2 + X0 = 0

Here equations 2 and 4 contradict each other. There are no values of X3 to X0that will satisfy these equations. If the CRCSET program comes across thissituation, it will simply try again.

For illustration, I have used only a 4-bit CRC; the CRCSET algorithmuses 32 bits. The principle is the same; it just takes more time (and ink,paper, patience, caffeine, pizza, and chocolate chip cookies).

- Page 12 -

How to Use CRCSET

This is the easy part.

For C programmers, add the files VALIDCRC.C and VIRUSDAT.C to the listof files required to build the program you are working on (in Turbo C, addthem to the the project file). Add a call to isvalidcrc("PROGNAME.EXE")somewhere in your program (preferably in main()). The function isvalidcrc()returns 1 if the CRC is valid or 0 if it is invalid, if the program file isnot found, or if the CRC polynomial has been reset to 0.

For Turbo Pascal programmers, add the unit VirusCRC to your "uses"clause and call the function IsValidCRC('PROGNAME.EXE'). IsValidCRC returnsTRUE if the CRC is valid or FALSE if it is invalid, if the program file is notfound, or if the CRC polynomial has been reset to 0.

Example programs, TESTCRC.C and TESTCRC.PAS, have been provided.

Once you have compiled your program, you have to calculate its CRC.The program CRCSET.EXE has been provided for this purpose. The syntax is:

CRCSET [-] file [file [file [...]]] is an 8-character (0-padded) string in which to store CRC data (default is DEAN_CRC), [file] is one or more files for which a CRC is to be calculated.

The string for which CRCSET.EXE searches is stored in the variable _viruscrcin C and _VirusCRC in Turbo Pascal. The default is DEAN_CRC but you maychange it if there is a conflict (i.e. if there is more than one instance ofDEAN_CRC in the program, CRCSET.EXE will not know which one holds the CRC andso will not set it). The string is replaced with a randomly-generatedpolynomial and the CRC itself.

For example, to set the CRC for either of the test programs, thecommand is:

CRCSET TESTCRC.EXE

If you change the string in one of the test programs to something like"MyName", you would enter:

CRCSET -MyName TESTCRC.EXE

The case of the string on the command line must match exactly the case of thestring in the program. Also, any strings shorter than 8 characters must bepadded with 0's in the program.

If you run TESTCRC before setting the CRC, it will abort with awarning that it may have been infected. After you set the CRC, run TESTCRC toassure yourself that its CRC is valid.

If you want to test the reliability of the CRC check, change a fewbytes in TESTCRC.EXE (be careful, changing bytes in the code segment can hangthe program, so try bytes in the data segment, towards the end). Run TESTCRCagain, and it should warn you that it may have been infected.

Despite its complexity, CRCSET.EXE takes only a few seconds tocalculate the CRC of the target file. I have made some optimizations to the

- Page 13 -

algorithm that make the calculation time almost constant regardless of thesize of the file.

Once a CRC has been determined for your program, it takes little timefor the validation function to verify it every time the program is run.

CRCSET will display the name of the file that it is working on andone or more of the following messages:

CRC search string "" found more than once.

The search string specified occurs more than once in the file. This is an error, since CRCSET has no way of knowing which occurrence of the string it is supposed to replace with the polynomial and CRC. Change the search string in _viruscrc (C version) or _VirusCRC (Turbo Pascal version), recompile, and try again with the new string as an argument to CRCSET (e.g. CRCSET -NewStrng PROG.EXE).

CRC search string "" not found.

The search string specified does not occur in the file. Make sure that the string passed to CRCSET is correct and also that the validation module is being linked (i.e. that there is a call to the isvalidcrc() or IsValidCRC somewhere in your program).

Testing polynomial abcdef12 ... no solution.

The matrix generated by CRCSET using the polynomial abcdef12 does not have a solution. CRCSET will try again with another polynomial.

Testing polynomial abcdef12 ... CRC is 34567890. It is unique.

The matrix generated by CRCSET using the polynomial abcdef12 has the unique solution 34567890, i.e. 34567890 is the CRC of your program under the polynomial abcdef12. A unique solution will occur only about 30% of the time.

Testing polynomial abcdef12 ... CRC is 34567890. It is not unique (2^N solutions).

The matrix generated by CRCSET using the polynomial abcdef12 has the non-unique solution 34567890, i.e. 34567890 is the CRC of your program under the polynomial abcdef12. The number of free variables (the number of bits in the CRC whose value is irrelevant to the solution) is N; each bit has two possible values, so there are 2^N possible solutions. It is theoretically possible that all 32 bits will be free variables, but this is extremely unlikely. The fact that there is not a unique solution does not diminish the effectiveness of the CRC validation in any way.

Note: the validation function (both the C and Pascal versions) findsthe name of the running program in _argv[0] (or ParamStr(0)) if the program isrunning under DOS versions 3 or greater and searches the DOS PATH for theprogram under DOS version 2.

- Page 14 -

Vulnerability

The CRCSET algorithm, like every other anti-virus algorithm, isvulnerable to attack. Hand-tweaking the code to bypass the virus protectionis always possible. Direct attack to determine the storage location of thepolynomial and the CRC and to change it is also possible, but this can takeupwards of 5-10 minutes on a 386. Any virus that ties up the computer forthat long wins no points for discretion. Any user that doesn't notice asystem lockup lasting over 30 seconds probably has many other doors open forviruses anyway.

There is no substitute for proper precautions: downloading from areputable BBS, avoiding pirated software, scanning programs for viruses beforeusing them, and so on. This program was developed with the knowledge thatmost people don't take these precautions (based on a sample size of at least 1- me); rather than leave it up to the end user to protect against viruses,with this we programmers can take on some of the burden by protecting theprograms we write against them.