This document is about C64 tape loaders. I do not pretend to be rigorous since I'm more interested in showing how things work, rather than leaving a historical note. Of course, my view point is not optimal, being very familiar with the subject, therefore you may find omissions here and there. Anyway I am more than happy to discuss corrections and enhancements.

I think a short introduction about the C64 I/O hardware is required in order to deeply understand the meaning of data in Tap file format, which is not just a hardware-independent array of bytes.

I don't own this book, I just have 4 scans from its pages and it seems to be a professional and very detailed one. You can find a list of other interesting works from the same author at the end of this document.

Each bit of data or program sent to the C2N is encoded by the operating system using audio frequencies: that means square waves with a 50% duty cycle, often referred to as pulses (this is also the term I'll often be using moving forward).
A sequence of bits is therefore encoded as a sequence of square waves on tape, back-to-back.

It is evident that each name refers to the period (period = 1/frequency) of the square wave rather than to its frequency.
Data bits are encoded by a couple of pulses (medium and short pulses are used). The structure of this loader is discussed in the CBM Loader page. You may give it a look when done with the following paragraph about Tap format. The software details of CBM loader are discussed in Nick Hampshire's book mentioned above.

On the other side, a Turbo Loader usually uses just two pulses:

“short pulse”

“long pulse”

whose frequencies are chosen by its designer. The length of a pulse saved on tape decides whether the bit is a 1 or a 0. In fact, these loaders don't usually use a sequence of pulses to encode a bit: just a single pulse does the job.

NOTE about the Hardware: a thing to point out is that during a SAVE operation, on the WRITE line of the Datasette port, “pulse length” is intended as the time distance between two consecutive low-high transitions. During a LOAD operation, pulse length is the time distance between two consecutive high-low transitions, since the 6526 READ line triggers on negative edges. In other words, the signal from C2N to C64 is the negative version of the one from C64 to C2N. That's why tape duplication hardware consists in an inverter (with a BJT and 2 resistors used in a so called “common emitter” scheme): the signal from the C2N performing LOAD, intended for being sent to C64, needs being inverted before being sent to the C2N doing the SAVE operation.

Check end of article (after the Terminator 2 loader analysis) for infomation on CIA 1 + 2 and Vectors.

Designed by Per Hakan Sundell (author of the CCS64 C64 emulator) in 1997, this format attempts to duplicate the data stored on a C64 cassette tape, pulse after pulse. The difference from a WAV sampling of any C64 tape and its Tap file data is that Tap file is not a sampled version of the waveforms stored on tape.

Each nonzero byte in the Tap file data area represents the time length of a single pulse. The conversion formula is given here:

Tap Data Byte=Pulse length (expressed in seconds)*C64 PAL Frequency/8

where “C64 Pal Frequency” is 985248 Hz. By calculating the constant 1E-6*C64 PAL Frequency/8, the following equivalent formula can be produced:

Tap Data Byte=Pulse length (microseconds)*0.123156

where “Pulse length” is the time interval between two negative edges of the received square wave. As example, CBM pulses correspond to the following TAP values:

It is clear that this conversion introduces some information alteration due to quantization: two pulses with similar length may produce the same Tap value.

NOTE about CBM pulses size: Tap imports of older tapes show a better consistence with the above mentioned values than younger ones. Anyway the operating system can produce a correction factor which allows a very wide variation in tape speed without affecting reading. In fact, the sequence of short pulses written on the CBM leader is used to synchronize the read routine timing to the timing on the tape.

We'll have a general discussion here about Turbo Loaders and C64 I/O dedicated hardware. For a detailed discussion about this topic you may consult Nick Hampshire's book, “The Commodore 64 Kernal and Hardware Revealed”. A detailed example can be found in next paragraph.

Almost every marketed C64 tape software uses some form of Turbo Loader. The origin of these Turbo Loaders is rather obscure since many of the software houses use the same routines.

A Turbo Loader is a routine which must be loaded into C64 RAM before being executed and therefore every Turbo Loader routine is stored in a Standard CBM encoded “boot” file. Usually a part of the Turbo Loader routines is stored in the CBM file Header and therefore loaded in the tape buffer (at $033C-$03FB). CBM file Data is often used both for other Turbo Loader routines and to modify the table of vectors in low RAM, to cause the autostart of the turbo loader itself (eg. it may modify $0326/$0327 where the output vector is located). When the standard LOAD ends, the operating system executes various operations, one of which is printing the “READY.” message on the C64 screen. By default, at $0326/$0327 there's the start address of the onscreen print routine (remember any of the 64K memory addresses can be identified by 2 bytes, low significant part first and then most significant. As example, the couple of values 01-08 is a pointer to $0801). If CBM loader loads Data block overwriting this vector with the Turbo Loader start address, the operating system, instead of printing the “READY.” message at the end of CBM LOAD, executes the Turbo Loader.

When executed, a Turbo Loader “replaces” the existing LOAD and allows a program or data to be loaded from tape at a faster speed than the normal LOAD. This is achieved by simply reducing the length of pulses stored onto the tape, in order to allow a far greater density of information storage per inch of tape.

Each bit is flagged in the interrupt register on the falling (negative) edge of the pulse. A widely used Turbo loader scheme runs with the interrupts disabled, sets a timer to between the two lengths (which we will refer as “threshold” value), and when the timer runs out, the interrupt register is checked to see if the pulse came in or not. If the falling edge of the pulse sets the relevant interrupt flag before the timer runs out then the pulse was a “short” pulse (usually identified with bit zero), otherwise it was a “long” one (bit one). Bits are then rotated into a byte storage until 8 bits have been read, thereby loading a full byte. This rotation may be to left or right, which establishes endianess: MSbF or LSbF.

Before any byte can be read and stored, the Turbo Loader must set itself to be in sync with the bits on the tape. This is done by writing a certain string of bits at every byte interval. The routine then tries to align itself by recognising the value of the byte. An example of a header byte for aligning would be the value 64, hex $40 or in binary: 01000000. A series of these bytes is written as the header; only when this byte has been read in a number of times consecutively, the actual program can be read without risk of alignment errors.

We document here a “nonIRQ based” loader step-by-step. Before starting with this reading it's necessary to have a good knowledge of CIA Timers. I reported in Appendix A and B an extract (I did some changes where needed) from MAPC6410.TXT (the Project 64 etext of the “Mapping The Commodore 64 book”). Those paragraphs just cover the CIAs use we are interested in.
In addition to CIA Timers, you should have a copy of the “Commodore 64 Programmer's Reference Guide”, to consult it while studying the ASM code.

Here we'll see how this Turbo Loader performs the operations we discussed before. Please consult the documentation about this loader (CHR Loader T1) coming with Stewart Wilson't Final TAP. Get a Tap version of “Cauldron” if you want to extract yourself the listings I report here.

CHR Loader T1-T3 routines are partly stored in the CBM file Header. CBM file Data (loaded at $02A7-0303) stores other routines and is used to cause the autostart of this loader. The autostart feature is performed by using the IMAIN Vector at $0302-0303. By default, this vector points to the address of the main BASIC program loop at $A483. This is the routine that is operating when you are in the direct mode (READY.). It executes statements, or stores them as program lines. Cauldron Loader sets the IMAIN Vector to point to $02AE, therefore, when CBM LOAD ends, control is given to the Turbo Loader.

Using Final TAP to exam the mentioned Tap file, you can get the following listings.

I'll assume you are familiar with hardware interrupts and ISRs (you don't absolutely require to know how they work on a C64, but a small knowledge about interrupts in general is enough). If you know about any data-link layer networking protocol, it can be useful (for understanding things better) to compare the datasette to a network adapter. Import the problems of framing you have on the data-link layer (which is equivalent to our loader) and adapt them to our study.

First, here's a summary of what we need to do when writing an IRQ-based loader:

We first need to disable the system of interrupts, by setting the interrupt disable status bit (this is done by a SEI instruction).

Then we have to disable all interrupts individually (by WRITING to $DC0D, which is an Interrupt Control Register when written to) and clear any latched interrupt request (by READING the clear-on-read register $DC0D, which is an Interrupt Latch Register when read from- e.g. bit 1 reads 1 when CIA #1 Timer B countdown expires).

Now we have to set the start value of the timer we'll be using to measure the length of the pulses coming from the tape (CIA #1 Timer A was chosen in the discussed loader). That's done by WRITING the start value in $DC04/$DC05 (which is the CIA #1 16-bit Timer A latch value). The timer will count down to zero starting from the value we just chose (one-shot mode). We'll restart the countdown every time we received a pulse, to measure the pulse that will come after the one we just measured.

Then we have to enable the FLAG line interrupt (the interrupt that triggers when a pulse is read from the datasette). The interrupt won't trigger until we enable the system of interrupts. Before doing that, we have to declare where our Interrupt Service Routine is (by making the vector at $FFFE/$FFFF point to our ISR).

After enabling interrupts (CLI instruction), we are ready to measure the pulses coming from the datasette, align our read routine with the bit stream (using the pilot byte information), syncronize (ie. know where exactly the turbo frame starts), and finally read the header which tells us where to store the following data bytes in the RAM.

I'm referring to those loaders using an IRQ handler routine and FLAG line(1) interrupt. For some reason, the Threshold value for them is often omitted in docs. Here is a note about how to extract it. I will refer to the ASM code of a loader I just found, which uses CIA #1, Timer B. Other loaders may use a different combination, but the results are the same.

Before changing the vector to main IRQ handler (by writing to $FFFE/$FFFF) we have:

What this code does is to compare the Timer B Countdown value with $0200. Since the initial value is $03A0 (clock cycles), and it counts DOWN to 0, if the Countdown is at a value greater than $0200 clock cycles when a pulse received on FLAG line(1), the latter is shorter than $01A0 clock cycles. $01A0 is therefore the Threshold value (in clock cycles).

In our example, the TAP value is then:

TAP threshold byte = Threshold (in microseconds) * 0.123156 = $34

where the Threshold (in microseconds) is Threshold * 1e6/CPUFrequency.

(1) on CIA #1, this FLAG line is connected to the Cassette Read line of the Cassette Port.

Locations 56320-56335 ($DC00-$DC0F) are used to communicate with the Complex Interface Adapter chip #1 (CIA #1). This chip allows the 6510 microprocessor to communicate with peripheral input and output devices. The specific devices that CIA #1 reads data from and sends data to are the joystick controllers, the paddle fire buttons, and the keyboard.

In addition to its two data ports, CIA #1 has two timers, each of which can count an interval from a millionth of a second to a fifteenth of a second. Or the timers can be hooked together to count much longer intervals. CIA #1 has an interrupt line which is connected to the 6510 IRQ line. These two timers can be used to generate interrupts at specified intervals (such as the 1/60 second interrupt used for keyboard scanning, or the more complexly timed interrupts that drive the tape read and write routines).

Location Range: 56320-56321 ($DC00-$DC01)
CIA #1 Data Ports A and B

Data Port B can be used as an output by either Timer A or B. It is possible to set a mode in which the timers do not cause an interrupt when they run down (see the descriptions of Control Registers A and B at 56334-5 ($DC0E-F)). Instead, they cause the output on Bit 6 or 7 of Data Port B to change. Timer A can be set either to pulse the output of Bit 6 for one machine cycle, or to toggle that bit from 1 to 0 or 0 to 1. Timer B can use Bit 7 of this register for the same purpose.

Location Range: 56324-56327 ($DC04-$DC07)
Timers A and B Low and High Bytes

These four timer registers (two for each timer) have different functions depending on whether you are reading from them or writing to them. When you read from these registers, you get the present value of the Timer Counter (which counts down from its initial value to 0). When you write data to these registers, it is stored in the Timer Latch, and from there it can be used to load the Timer Counter using the Force Load bit of Control Register A or B (see 56334-5 ($DC0E-F) below).

These interval timers can hold a 16-bit number from 0 to 65535, in normal 6510 low-byte, high-byte format (VALUE=LOW BYTE+256*HIGH BYTE). Once the Timer Counter is set to an initial value, and the timer is started, the timer will count down one number every microprocessor clock cycle. Since the clock speed of the 64 (using the American NTSC television standard) is 1,022,730 cycles per second, every count takes approximately a millionth of a second. The formula for calculating the amount of time it will take for the timer to count down from its latch value to 0 is:

TIME=LATCH VALUE/CLOCK SPEED

where LATCH VALUE is the value written to the low and high timer registers (LATCH VALUE=TIMER LOW+256*TIMER HIGH), and CLOCK SPEED is 1,022,370 cycles per second for American (NTSC) standard television monitors, or 985,250 for European (PAL) monitors.

When Timer Counter A or B gets to 0, it will set Bit 0 or 1 in the Interrupt Control Register at 56333 ($DC0D). If the timer interrupt has been enabled (see 56333 ($DC0D)), an IRQ will take place, and the high bit of the Interrupt Control Register will be set to 1. Alternately, if the Port B output bit is set, the timer will write data to Bit 6 or 7 of Port B. After the timer gets to 0, it will reload the Timer Latch Value, and either stop or count down again, depending on whether it is in one-shot or continuous mode (determined by Bit 3 of the Control Register).

Although usually a timer will be used to count the microprocessor cycles, Timer A can count either the microprocessor clock cycles or external pulses on the CTN line, which is connected to pin 4 of the User Port.

Timer B is even more versatile. In addition to these two sources, Timer B can count the number of times that Timer A goes to 0. By setting Timer A to count the microprocessor clock, and setting Timer B to count the number of times that Timer A zeros, you effectively link the two timers into one 32-bit timer that can count up to 70 minutes with accuracy within 1/15 second.

In the 64, CIA #1 Timer A is used to generate the interrupt which drives the routine for reading the keyboard and updating the software clock. Both Timers A and B are also used for the timing of the routines that read and write tape data. Normally, Timer A is set for continuous operation, and latched with a value of 149 in the low byte and 66 in the high byte, for a total Latch Value of 17045. This means that it is set to count to 0 every 17045/1022730 seconds, or approximately 1/60 second.

For tape reads and writes, the tape routines take over the IRQ vectors. Even though the tape write routines use the on-chip I/O port at location 1 for the actual data output to the cassette, reading and writing to the cassette uses both CIA #1 Timer A and Timer B for timing the I/O routines.

This register is used to control the five interrupt sources on the 6526 CIA chip. These sources are Timer A, Timer B, the Time of Day Clock, the Serial Register, and the FLAG line. Timers A and B cause an interrupt when they count down to 0. The Time of Day Clock generates an interrupt when it reaches the ALARM time. The Serial Shift Register interrupts when it compiles eight bits of input or output. An external signal pulling the CIA hardware line called FLAG low will also cause an interrupt (on CIA #1, this FLAG line is connected to the Cassette Read line of the Cassette Port).

Even if the condition for a particular interrupt is satisfied, the interrupt must still be enabled for an IRQ actually to occur. This is done by writing to the Interrupt Control Register. What happens when you write to this register depends on the way that you set Bit 7. If you set it to 0, any other bit that was written to with a 1 will be cleared, and the corresponding interrupt will be disabled. If you set Bit 7 to 1, any bit written to with a 1 will be set, and the corresponding interrupt will be enabled. In either case, the interrupt enable flags for those bits written to with a 0 will not be affected.

For example, in order to disable all interrupts from BASIC, you could POKE 56333, 127. This sets Bit 7 to 0, which clears all of the other bits, since they are all written with 1's. Don't try this from BASIC immediate mode, as it will turn off Timer A which causes the IRQ for reading the keyboard, so that it will in effect turn off the keyboard.

To turn on the Timer A interrupt, a program could POKE 56333,129. Bit 7 is set to 1 and so is Bit 0, so the interrupt which corresponds to Bit 0 (Timer A) is enabled.

When you read this register, you can tell if any of the conditions for a CIA Interrupt were satisfied because the corresponding bit will be set to a 1. For example, if Timer A counts down to 0, Bit 0 of this register will be set to 1. If, in addition, the mask bit that corresponds to that interrupt source is set to 1, and an interrupt occurs, Bit 7 will also be set. This allows a multi-interrupt system to read one bit and see if the source of a particular interrupt was CIA #1. You should note, however, that reading this register clears it, so you should preserve its contents in RAM if you want to test more than one bit.

Bits 0-3. This nybble controls Timer A. Bit 0 is set to 1 to start the timer counting down, and set to 0 to stop it. Bit 3 sets the timer for one-shot or continuous mode.

In one-shot mode, the timer counts down to 0, sets the counter value back to the latch value, and then sets Bit 0 back to 0 to stop the timer. In continuous mode, it reloads the latch value and starts all over again.

Bits 1 and 2 allow you to send a signal on Bit 6 of Data Port B when the timer counts. Setting Bit 1 to 1 forces this output (which overrides the Data Direction Register B Bit 6, and the normal Data Port B value). Bit 2 allows you to choose the form this output to Bit 6 of Data Port B will take. Setting Bit 2 to a value of 1 will cause Bit 6 to toggle to the opposite value when the timer runs down (a value of 1 will change to 0, and a value of 0 will change to 1). Setting Bit 2 to a value of 0 will cause a single pulse of a one machine-cycle duration (about a millionth of a second) to occur.

Bit 4. This bit is used to load the Timer A counter with the value that was previously written to the Timer Low and High Byte Registers. Writing a 1 to this bit will force the load (although there is no data stored here, and the bit has no significance on a read).

Bit 5. Bit 5 is used to control just what it is Timer A is counting. If this bit is set to 1, it counts the microprocessor machine cycles (which occur at the rate of 1,022,730 cycles per second). If the bit is set to 0, the timer counts pulses on the CNT line, which is connected to pin 4 of the User Port. This allows you to use the CIA as a frequency counter or an event counter, or to measure pulse width or delay times of external signals.

Bit 6. Whether the Serial Port Register is currently inputting or outputting data (see the entry for that register at 56332 ($DC0C) for more information) is controlled by this bit.

Bit 7. This bit allows you to select from software whether the Time of Day Clock will use a 50 Hz or 60 Hz signal on the TOD pin in order to keep accurate time (the 64 uses a 60 Hz signal on that pin).

Bits 0-3. This nybble performs the same functions for Timer B that Bits 0-3 of Control Register A perform for Timer A, except that Timer B output on Data Port B appears at Bit 7, and not Bit 6.

Bits 5 and 6. These two bits are used to select what Timer B counts. If both bits are set to 0, Timer B counts the microprocessor machine cycles (which occur at the rate of 1,022,730 cycles per second). If Bit 6 is set to 0 and Bit 5 is set to 1, Timer B counts pulses on the CNT line, which is connected to pin 4 of the User Port. If Bit 6 is set to 1 and Bit 5 is set to 0, Timer B counts Timer A underflow pulses, which is to say that it counts the number of times that Timer A counts down to 0. This is used to link the two numbers into one 32-bit timer that can count up to 70 minutes with accuracy to within 1/15 second. Finally, if both bits are set to 1, Timer B counts the number of times that Timer A counts down to 0 and there is a signal on the CNT line (pin 4 of the User Port).

Bit 7. Bit 7 controls what happens when you write to the Time of Day registers. If this bit is set to 1, writing to the TOD registers sets the ALARM time. If this bit is cleared to 0, writing to the TOD registers sets the TOD clock.

Locations 56576-56591 ($DD00-$DD0F) are used to address the Complex Interface Adapter chip #2 (CIA #2). Since the chip itself is identical to CIA #1, which is addressed at 56320 ($DC00), the discussion here will be limited to the use which the 64 makes of this particular chip. For more general information on the chip registers, please see the corresponding entries for CIA #1.

A significant (for our purposes) difference between CIA chips #1 and #2 is that the interrupt line of CIA #1 is wired to the 6510 IRQ line, while that of CIA #2 is wired to the NMI line. This means that interrupts from this chip cannot be masked by setting the Interrupt disable flag (SEI). They can be disabled from CIA's Mask Register, though. Be sure to use the NMI vector when setting up routines to be driven by interrupts generated by this chip.

This vector points to the address of the routine that will be executed when a Non-Maskable Interrupt (NMI) occurs (currently at 65095 ($FE47)).

There are two possible sources for an NMI interrupt. The first is the RESTORE key, which is connected directly to the 6510 NMI line. The second is CIA #2, the interrupt line of which is connected to the 6510 NMI line.

When an NMI interrupt occurs, a ROM routine sets the Interrupt disable flag, and then jumps through this RAM vector. The default vector points to an interrupt routine which checks to see what the cause of the NMI was.

If the cause was CIA #2, the routine checks to see if one of the RS-232 routines should be called. If the source was the RESTORE key, it checks for a cartridge, and if present, the cartridge is entered at the warm start entry point. If there is no cartridge, the STOP key is tested. If the STOP key was pressed at the same time as the RESTORE key, several of the Kernal initialization routines such as RESTOR, IOINIT and part of CINT are executed, and BASIC is entered through its warm start vector at 40962. If the STOP key was not pressed simultaneously with the RESTORE, the interrupt will end without letting the user know that anything happened at all when the RESTORE key was pressed.

Since this vector controls the outcome of pressing the RESTORE key, it can be used to disable the STOP/RESTORE sequence. A simple way to do this is to change this vector to point to the RTI instruction. A simple

LDA #$C1
STA $0318

will accomplish this. To set the vector back:

LDA #$47
STA $0318

Note that this will cut out all NMIs, including those required for RS-232 I/O.

Location Range: 65530-65535 ($FFFA-$FFFF)
6510 Hardware Vectors

The last six locations in memory are reserved by the 6510 processor chip for three fixed vectors. These vectors let the chip know at what address to start executing machine language program code when an NMI interrupt occurs, when the computer is turned on, or when an IRQ interrupt or BRK occurs.