Decoding BASIC variables

After I figured out that the save game files in Questron were memory dumps of the BASIC variables section, I had to find out how to interpret that data. Luckily, a lot of old books about Commodore 64 programming are now available in various places on the web. In particular, I referred to Jim Butterfield’s Machine Language for the Commodore 64, 128, and other Commodore Computers and Compute!’s Vic-20 and Commodore 64 Tool Kit: BASIC by Dan Heeb.

And by the way, a lot of this should be pretty applicable to other 6502 computers, since most of them used versions of Microsoft 6502 BASIC.

The variables space is divided into three sections: the scalar variables, the array variables, and the character data for strings. The scalar section is pretty straightforward. Each variable is described using 7 bytes. The first two bytes are the variable’s name, but there’s a trick: the type of the variable is also indicated by setting the high bit in one byte or the other. The variable is a float if neither high bit is set, an int if both are set, and a character string if only the high bit in the second letter of the name is set.
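Here is a minimal sketch of how the two name bytes could be turned back into a name and a type. The function names `decode_name` and `split_scalars` are made up for illustration, and the fourth bit pattern (high bit set in the first byte only) is not covered above; as far as I know it marks DEF FN entries, but treat that as an assumption.

```python
def decode_name(b0, b1):
    """Return (name, kind) for the two name bytes of a variable entry."""
    kinds = {
        (False, False): "float",
        (True, True): "int",
        (False, True): "string",
        # Assumption, not from the text above: first-byte-only marks DEF FN.
        (True, False): "function",
    }
    kind = kinds[(bool(b0 & 0x80), bool(b1 & 0x80))]
    # Strip the type bits; a zero second byte means a one-letter name.
    name = chr(b0 & 0x7F)
    if b1 & 0x7F:
        name += chr(b1 & 0x7F)
    suffix = {"int": "%", "string": "$"}.get(kind, "")
    return name + suffix, kind


def split_scalars(dump):
    """Yield (name, kind, value_bytes) for each 7-byte entry in dump."""
    for offset in range(0, len(dump) - 6, 7):
        entry = dump[offset:offset + 7]
        name, kind = decode_name(entry[0], entry[1])
        yield name, kind, entry[2:]
```

Feeding the scalar section of a dump to `split_scalars` should give one tuple per variable, with the 5 value bytes left undecoded.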

This implementation detail actually explains some of the behavior of the BASIC interpreter. For example, C64 BASIC only cares about the first two letters of a variable name. You can use more, but it won’t matter – the interpreter treats FOO and FOOBAR as the same variable. On the other hand, variables with different types can have their own entries. So the floating-point variable HP would be distinct from the string HP$. I vaguely remember those rules from way back when I first learned BASIC, but it’s neat now to see where they came from.
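To make the naming rules concrete, here is a small sketch that goes the other way and builds the two name bytes from a name as typed in a program. `name_key` is a hypothetical helper, and it ignores details like keyword tokenization that the real interpreter has to deal with.

```python
def name_key(source_name):
    """Build the two name bytes for a variable name such as 'HP' or 'HP$'."""
    # High bits to set in (first byte, second byte) for each type suffix.
    kind_bits = {"%": (0x80, 0x80), "$": (0x00, 0x80)}
    suffix = source_name[-1] if source_name[-1] in kind_bits else ""
    base = source_name[:-1] if suffix else source_name
    high0, high1 = kind_bits.get(suffix, (0x00, 0x00))
    first = ord(base[0])
    # Only the first two characters matter; short names pad with zero.
    second = ord(base[1]) if len(base) > 1 else 0
    return bytes([first | high0, second | high1])
```

With this, `name_key("FOO")` and `name_key("FOOBAR")` come out identical, while `name_key("HP")` and `name_key("HP$")` differ in the high bit of the second byte.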

The remaining 5 bytes of data are the value of the variable. For integer variables, two bytes are used to store a 16-bit integer value. For strings, one byte stores the string length and two more are a pointer into the string data area. Floating point variables, which are kind of the default, are more complicated. And as for why the authors of Questron used floating point variables to store fields like hit points or food? Who knows.
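The integer and string cases can be sketched in a few lines. The byte orders here are assumptions worth checking against a real dump: the integer is taken as signed with the high byte first, and the string pointer as little-endian. The helper names and the `base_address` parameter (the C64 address that the first byte of `memory` corresponds to) are made up for the example.

```python
import struct


def decode_int_value(value_bytes):
    """Decode an integer variable from its 5 value bytes."""
    # Assumption: signed 16-bit, high byte first; the other 3 bytes are unused.
    return struct.unpack(">h", bytes(value_bytes[:2]))[0]


def decode_string_descriptor(value_bytes):
    """Return (length, pointer) from a string variable's 5 value bytes."""
    length = value_bytes[0]
    # Assumption: the pointer is stored low byte first.
    pointer = value_bytes[1] | (value_bytes[2] << 8)
    return length, pointer


def read_string(memory, base_address, value_bytes):
    """Fetch the string's characters from a memory image."""
    length, pointer = decode_string_descriptor(value_bytes)
    start = pointer - base_address
    return bytes(memory[start:start + length])
```

The result of `read_string` is raw PETSCII, so it may still need translating before it prints nicely on a modern machine.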

Commodore BASIC uses 5 bytes for “packed” floats. One byte is the exponent and 4 bytes are the mantissa, but it cheats: a normalized mantissa always starts with a 1 bit, so that bit isn’t saved and its place is used to store the sign bit instead. I wrote a Python function to see if I could figure out the decoding:

def decodePackedFloat(data):
    """decodes the 5-byte packed c64 BASIC float in data, and returns a
    float value. See appendix F of Butterfield"""
    # data[0] is both the exponent and a zero flag. If it's zero, the whole
    # number is 0.
    if data[0] == 0:
        return 0.0
    # The packed representation has an 8 bit exponent and a 32 bit mantissa.
    # If exponent is 128, all 32-bits of mantissa should be to the right
    # of the decimal point - which we get by multiplying the mantissa by
    # 2^-32.
    exponent = data[0] - 128 - 32
    # Since the highest bit of mantissa is always going to be a 1, the
    # packed format can cheat and use that high bit for something else -
    # the sign of the number.
    if data[1] >= 128:
        sign = -1
    else:
        sign = 1
    # Build the mantissa out of the last 4 elements of data. Note we're
    # making sure the high bit is 1. Also note that the number is stored
    # big-endian, which is very un-C64-like. I blame Bill Gates.
    mantissa = ((data[1] | 0x80) << 24) + (data[2] << 16) + (data[3] << 8) + data[4]
    # put together our final result
    return sign * mantissa * pow(2, exponent)
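A character editor would also need to go the other way and write values back. Here is a rough sketch of the inverse, following the same layout as the decoder. `encode_packed_float` is a hypothetical name, the mantissa is truncated rather than rounded, and I haven't checked how the real interpreter rounds, so small differences in the last byte are possible for values that don't fit exactly.

```python
import math


def encode_packed_float(value):
    """Pack a Python float into the 5-byte layout described above."""
    if value == 0:
        return bytes(5)
    # frexp gives abs(value) == fraction * 2**exponent, 0.5 <= fraction < 1,
    # so the top bit of the 32-bit mantissa below is always 1.
    fraction, exponent = math.frexp(abs(value))
    biased = exponent + 128
    if biased < 1:
        return bytes(5)  # too small to represent; store zero
    if biased > 255:
        raise OverflowError("value too large for a 5-byte float")
    mantissa = int(fraction * (1 << 32))  # truncates, does not round
    packed = mantissa.to_bytes(4, "big")
    # Replace the always-1 top bit of the mantissa with the sign bit.
    first = packed[0] & 0x7F
    if value < 0:
        first |= 0x80
    return bytes([biased, first]) + packed[1:]
```

For example, 1.0 packs to 81 00 00 00 00 and 10.0 to 84 20 00 00 00 (hex), and feeding those bytes to the decoder gives the original values back.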

The array variables are a bit more complicated, especially since you can theoretically have an array of up to 256 dimensions, though you wouldn’t be able to write a BASIC line long enough to dereference it. Arrays use the same naming convention as scalar variables, and the same rules apply. After the name there are two bytes storing the total size of the array entry, one byte storing the number of dimensions, and then the size of each dimension stored as two bytes (the dimensions are stored in reverse order). The array data follows, I believe in column-major order.
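As a sketch, the header layout could be parsed like this. It assumes the total size is little-endian and each dimension size is big-endian, and that column-major means the first subscript varies fastest. The names `parse_array_header` and `element_index` are made up for the example.

```python
def parse_array_header(data, offset=0):
    """Parse one array header starting at offset in data."""
    name_bytes = bytes(data[offset:offset + 2])
    # Assumption: total size of the entry is stored low byte first.
    total_size = data[offset + 2] | (data[offset + 3] << 8)
    dim_count = data[offset + 4]
    dims = []
    pos = offset + 5
    for _ in range(dim_count):
        # Assumption: each dimension size is stored high byte first.
        dims.append((data[pos] << 8) | data[pos + 1])
        pos += 2
    dims.reverse()  # stored last dimension first
    return {
        "name_bytes": name_bytes,
        "total_size": total_size,
        "dims": dims,
        "data_offset": pos,  # where the element data begins
    }


def element_index(dims, subscripts):
    """Element number for the given subscripts, first subscript fastest."""
    index = 0
    stride = 1
    for size, subscript in zip(dims, subscripts):
        index += subscript * stride
        stride *= size
    return index
```

If the size field really does cover the whole entry, then `offset + total_size` should land on the next array's header, which makes a handy sanity check while walking a dump.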

I had a hard time with the array data, and the thing that threw me was that the total size of the array entry is stored in little-endian order, but the sizes of the individual dimensions are apparently big-endian. This actually contradicts the diagrams in the Tool Kit: BASIC book. I did some peeking and poking on actual hardware to convince myself I had the right order, and then my decoding routine started giving good results. Here are some of the more interesting BASIC variables I found:

So now I pretty much have all the information I need to create a character editor. I just need to decide how to do it. I’ve been using Python for analyzing disks and prototyping stuff, but if I want a piece of code that’ll run on an actual C64, it might be time to dust off cc65 again.