10 ATARI BASIC

WHAT IS ATARI BASIC?

ATARI BASIC is an interpreted language. This means programs can be run when they are entered without intermediate stages of compilation and linking. The ATARI BASIC interpreter resides in an 8K ROM cartridge in the left slot of the computer. It encompasses addresses A000 through BFFF. At least 8K of RAM is required to run BASIC.

To use ATARI BASIC effectively, you must know its strengths and weaknesses. With this information, programs can be written that make good use of the assets and features of ATARI BASIC.

Strengths of ATARI BASIC

It supports the operating system graphics – Simple graphics calls can be
made to display information on the screen.

It supports the hardware – Such calls as SOUND, STICK and PADDLE are
simple interfaces to the hardware of the computer.

ROM based interpreter – The BASIC interpreter is in ROM, which prevents accidental modification by the user program.

DOS support – Specialized calls such as NOTE and POINT (DOS 2.0S) allow
the user to randomly access a disk through the disk operating system.

Peripheral support – Any peripheral recognized by the operating system
can be accessed from a BASIC program.

Weaknesses of ATARI BASIC

No support for integers – All numbers are stored as 6-byte BCD floating
point numbers.

Slow math package – Since all numbers are six bytes long, math
operations become rather slow.

No string arrays – Only one-dimensional strings can be
created.

HOW ATARI BASIC WORKS

The workings of the BASIC interpreter are summarized as follows:

BASIC gets a line of input from the user and converts it into a
tokenized form.

It then puts this line into a token program.

This program is then executed.

The details of these operations are discussed in the following four sections.

The Tokenizing Process

The Token File Structure

The Program Execution Process

System Interaction

THE TOKENIZING PROCESS

In simple terms, the tokenization of a line of code in BASIC looks like this:

BASIC gets a line of input

It then checks for legal syntax

During syntax checking it is tokenized

The tokenized line is moved into the token program

If the line is in immediate mode it is executed

To better understand the tokenizing process, some terms must first be defined:

Token

An 8-bit byte containing a particular interpretable code.

Statement

A complete "sentence" of tokens that causes BASIC to perform some
meaningful task. In LIST form, statements are separated by
colons.

Line

One or more statements, either preceded by a line number in the
range of 0 to 32767 or entered in immediate mode with no line number.

Command

The first executable token of a statement that tells BASIC to
interpret the tokens that follow in a particular way.

Variable

A token that is an indirect pointer to its actual value; thus the
value can be changed without changing the token.

Constant

A 6-byte BCD value preceded by a special token. This value
remains unchanged throughout program execution.

Operator

Any one of 46 tokens that in some way move or modify the values
that follow them.

Function

A token that when executed returns a value to the program.

EOL

End of Line. A character with the value 9B hex.

BCD

Binary coded decimal. A number format in which each 4-bit nibble holds
one decimal digit, as used by the 6502 decimal mode.

BASIC begins the tokenizing process by getting a line of input. This input is obtained from one of the handlers of the operating system. Normally it comes from the screen editor; however, with the ENTER command, any device can be specified. The call BASIC issues is a GET RECORD command, and the data returned is ATASCII information terminated by an EOL. This data is stored by CIO into the BASIC Input Line Buffer from 580 to 5FF hex.

After the record is returned, the syntax checking and tokenizing processes begin. First BASIC looks for a line number. If one is found, it is converted into a 2-byte integer. If no line number is present, it is assumed to be in immediate mode and the line-number 8000 hex is assigned to it. These will be the first two tokens of the tokenized line. This line is built in the token output buffer that is 256 bytes long and resides at the end of the reserved operating system RAM.

The next token is a dummy byte reserved for the byte count (or offset) from the start of this line to the start of the next line. Following that is another dummy byte for the count from the start of this line to the start of the next statement. These values are set when tokenization is complete for the line and the statement respectively. The use of these values is discussed in the program execution process section.

BASIC now looks for the command of the first statement of the input line. A check is made to determine if this is a valid command by scanning a list of
legal commands in ROM. If a match is found, then the next byte in the token line becomes the number of the entry in the ROM list that matched. If no match is found, a syntax error token is assigned to that byte and BASIC stops tokenizing, copies the rest of the input buffer in ATASCII format to the token output buffer, and prints the error line.

Assuming a good line, one of seven items can follow the command: a variable, a constant, an operator, a function, a double quote, another statement, or an EOL. BASIC tests if the next input character is numeric. If not then it compares that character and those following against the entries of the variable name table. If this is the first line of code entered in the program then no match is found. The characters are then compared against the function and operator tables. If no match is found there then BASIC assumes that this is a new variable name. Since this is the first variable it is assigned the first entry in the variable name table. The characters are copied out of the input buffer and stored into the name table with the most significant bit (MSB) set on the last byte of the name. Eight bytes are then reserved in the variable value table for this entry. (See the variable value table discussion in the section, "Token File Structure".)

The token that ends up in the tokenized line is the variable number minus
one, with the MSB set. Thus the token of the first variable entered would be 80
hex, the second would be 81, and so on up to FF for a total of 128 unique
variable numbers.
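The name table and token numbering described above can be modeled in a few lines of Python. This is a minimal sketch, not the ROM routine: the function names `split_names` and `variable_token` are hypothetical, and the sketch ignores the function and operator table checks that BASIC performs before it decides a name is new.

```python
def split_names(name_table: bytes) -> list[str]:
    """Recover the names from a table in which the last byte of each
    name has its most significant bit (MSB) set."""
    names, current = [], bytearray()
    for byte in name_table:
        current.append(byte & 0x7F)
        if byte & 0x80:                  # MSB marks the end of a name
            names.append(current.decode("ascii"))
            current = bytearray()
    return names


def variable_token(name_table: bytearray, name: str) -> int:
    """Return the token for `name`, appending it to the table if new."""
    names = split_names(name_table)
    if name not in names:
        if len(names) == 128:
            raise ValueError("name table already holds 128 variables")
        raw = bytearray(name.encode("ascii"))
        raw[-1] |= 0x80                  # set the MSB on the last character
        name_table.extend(raw)
        names.append(name)
    # Entry 1 becomes token 80 hex, entry 2 becomes 81 hex, and so on.
    return 0x80 | names.index(name)


table = bytearray()
first = variable_token(table, "X")       # new name, first entry
second = variable_token(table, "COUNT")  # new name, second entry
again = variable_token(table, "X")       # existing name, same token as before
```

Because the token is only an index, renaming a variable in the table would not require any change to the tokenized program.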

If a function is found, then its entry number in the operator function table
is assigned to the token. Functions require certain sequences of parameters;
these are contained in syntax tables, and if they are not matched, a syntax
error will result.

If an operator is found, then a token is given its table entry number.
Operators can follow each other in a rather complex fashion (such as multiple
parentheses), so the syntax checking of them is a bit complicated.

In the case of the double quotes, BASIC assumes that a character string is
following and assigns a 0F hex to the output token and reserves a dummy byte for the string length. The characters are moved from the input buffer into the
output buffer until the second set of quotes is found. The length byte is then
set to the character count.

If the next characters in the input buffer are numeric, BASIC converts them into a 6-byte BCD constant. A 0E hex token will be put in the output buffer, followed by the six byte constant.
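As an illustration of the constant token, the Python sketch below encodes a non-negative whole number. The byte layout used here (an exponent byte in excess-64 form counting powers of 100, followed by five bytes of packed decimal digits) is not spelled out in this section and should be treated as an assumption about the floating point format; the function names are hypothetical, and negative numbers and fractions are left out.

```python
def to_bcd_constant(n: int) -> bytes:
    """Encode a non-negative whole number as a 6-byte BCD value
    (assumed layout: exponent byte, then five bytes of packed digits)."""
    if not 0 <= n < 10**10:
        raise ValueError("only 0 .. 9999999999 is handled in this sketch")
    if n == 0:
        return bytes(6)
    pairs = []                           # base-100 digits, least significant first
    while n:
        n, pair = divmod(n, 100)
        pairs.append(pair)
    pairs.reverse()                      # most significant pair first
    exponent = len(pairs) - 1            # power of 100 of the leading pair
    mantissa = [((p // 10) << 4) | (p % 10) for p in pairs]
    mantissa += [0] * (5 - len(mantissa))
    return bytes([0x40 + exponent] + mantissa)


def numeric_constant_token(n: int) -> bytes:
    """The 0E hex token followed by the six constant bytes."""
    return bytes([0x0E]) + to_bcd_constant(n)


print(numeric_constant_token(1).hex(" "))
```

Each numeric constant therefore occupies seven bytes in the tokenized line, however few digits were typed.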

When a colon is encountered, a 14 hex token is inserted in the output buffer and the offset from the start of the line is stored in the dummy byte that was reserved for the count to the start of the next statement. At this point another dummy byte is reserved and the process goes back to get a command.

When the EOL is found, a 16 hex token is stored and the offset from the start of the line is put in the dummy byte for the line offset. At this point, tokenization is complete and BASIC moves the token line into the token program. First it searches the program for that line number. If it is found, BASIC replaces the old line with the new one. If it is not found, BASIC inserts the new line in the correct numerical sequence. In both cases, the data following the line is moved up or down in memory to allow for the expanding or contracting program size.

BASIC now checks if the tokenized line is an immediate mode line. If so, that line is executed according to the methods described in the section on the program execution process; if not, BASIC goes back to get another line of input.

If at any time during the tokenizing process the length of the token line
exceeds 256 bytes, an ERROR 14 message (line too long) is sent to the screen and BASIC goes back to get the next line of input.

An example line of input and its token form looks like this (all token values
are hexadecimal):
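A line such as 10 LET X=1 : PRINT X can be assembled by hand from the rules given in this section. The Python sketch below is a simplified model, not the ROM routine. The command token values for LET (06) and PRINT (20) and the assignment operator token (2D) are assumptions, as are the low-byte-first storage of the line number and the choice to give the last statement's offset byte the end-of-line offset. The function name `tokenize_line` is hypothetical.

```python
CMD_LET, CMD_PRINT = 0x06, 0x20      # assumed command table entry numbers
OP_ASSIGN = 0x2D                     # assumed operator token for "="
TOK_CONST, TOK_END_STMT, TOK_EOL = 0x0E, 0x14, 0x16
IMMEDIATE = 0x8000                   # line number given to immediate mode lines


def tokenize_line(line_number: int, statements: list[bytes]) -> bytes:
    """Wrap already-tokenized statement bodies in the line structure:
    line number, line offset, then per statement an offset byte, the
    statement tokens, and a terminator."""
    out = bytearray(line_number.to_bytes(2, "little"))   # assumed low byte first
    out.append(0)                                        # dummy: offset to next line
    for index, body in enumerate(statements):
        offset_at = len(out)
        out.append(0)                                    # dummy: offset to next statement
        out += body
        last = index == len(statements) - 1
        out.append(TOK_EOL if last else TOK_END_STMT)
        if len(out) > 0xFF:                              # offsets must fit in one byte
            raise ValueError("line too long")
        out[offset_at] = len(out)                        # measured from start of line
    out[2] = len(out)
    return bytes(out)


ONE = bytes.fromhex("400100000000")                      # BCD constant for 1 (assumed layout)
let_x = bytes([CMD_LET, 0x80, OP_ASSIGN, TOK_CONST]) + ONE
print_x = bytes([CMD_PRINT, 0x80])
print(tokenize_line(10, [let_x, print_x]).hex(" "))
```

Under these assumptions the line occupies 19 bytes (13 hex), which is the value stored in its line offset byte.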

THE TOKEN FILE STRUCTURE

The token file contains two major segments: (1) a group of zero page pointers that point into the token file, and (2) the actual token file itself. The zero page pointers are 2-byte values that point to various sections of the token file. There are nine 2-byte pointers and they are in locations 80 to 91 hex. Following is a list of the pointers and the sections of the token file they reference.

Pointer (hex)

Token File Section (Contiguous Blocks)

LOMEM 80,81

Token output buffer – This is the buffer BASIC uses to tokenize one
line of code. It is 256 bytes long. This buffer resides at the end of the
operating system's allocated RAM.

VNTP 82,83

Variable name table – A list of all the variable names that have been
entered in the program. They are stored as ATASCII characters, each new
name stored in the order it was entered. Three types of name entries
exist:

Scalar variables – MSB set on last character in name.

String variables – last character is a "$" with the MSB set.

Array variables – last character is a "(" with the MSB set.

VNTD 84,85

Variable name table dummy end – BASIC uses this pointer to indicate
the end of the name table. This normally points to a dummy zero byte when
there are fewer than 128 variables. When 128 variables are present, this
points to the last byte of the last variable name.

VVTP 86,87

Variable value table – This table contains current information on each
variable. For each variable in the name table, eight bytes are reserved in
the value table. The information for each variable type is:

Variable type       Byte 1   Byte 2   Bytes 3-4             Bytes 5-6       Bytes 7-8

Scalar              00       Var#     6-byte BCD constant (bytes 3 through 8)

Array (DIMed)       41       Var#     Offset from STARP     First DIM + 1   Second DIM + 1
Array (unDIMed)     40                (8C,8D)

String (DIMed)      81       Var#     Offset from STARP     Length          DIM
String (unDIMed)    80

A scalar variable contains a numeric value. An example is X=1. The
scalar is X and its value is 1, stored in 6-byte BCD format. An array is
composed of numeric elements stored in the string/array area and has one
entry in the value table. A string, composed of character elements in the
string/array area, also has one entry in the table.

The first byte of each value entry indicates the type of variable: 00
for a scalar, 40 for an array, and 80 for a string. If the array or string
has been dimensioned, then the LSB is set on the first byte.

The second byte contains the variable number. The first variable entry
is number zero, and if 128 variables were present, the last would be 7F.

In the case of the scalar variable, the third through eighth bytes
contain the 6-byte BCD number currently assigned to it.

For arrays and strings, the third and fourth bytes contain an offset
from the start of the string/array area (described below) to the beginning
of the data.

The fifth and sixth bytes of an array contain its first dimension. The
quantity is a 16-bit integer and its value is 1 greater than the user
entered. The seventh and eighth bytes are the second dimension, also a
value of 1 greater.

The fifth and sixth bytes of a string are a 16-bit integer that
contains its current length. The seventh and eighth bytes are its
dimension (up to 32767 bytes in size).
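The 8-byte value table entry described above can be decoded as in the following Python sketch. It assumes that the 16-bit fields are stored low byte first; the function name `decode_value_entry` and the dictionary keys are hypothetical.

```python
def decode_value_entry(entry: bytes) -> dict:
    """Interpret one 8-byte variable value table entry."""
    if len(entry) != 8:
        raise ValueError("a value table entry is exactly eight bytes")
    kinds = {0x00: "scalar", 0x40: "array", 0x80: "string"}
    kind = kinds[entry[0] & 0xC0]
    info = {"kind": kind, "var": entry[1]}
    if kind == "scalar":
        info["bcd"] = bytes(entry[2:8])          # current value, 6-byte BCD
        return info

    def word(i: int) -> int:
        # 16-bit field starting at index i (assumed low byte first)
        return int.from_bytes(entry[i:i + 2], "little")

    info["dimensioned"] = bool(entry[0] & 0x01)  # LSB set once DIMed
    info["offset"] = word(2)                     # offset from STARP
    if kind == "array":
        info["dim1"] = word(4) - 1               # stored one greater than entered
        info["dim2"] = word(6) - 1
    else:
        info["length"] = word(4)                 # current string length
        info["dim"] = word(6)                    # DIMed size
    return info


# A DIMed string, variable number 2, 5 characters used out of 20:
print(decode_value_entry(bytes([0x81, 2, 0x10, 0x00, 5, 0, 20, 0])))
```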

STMTAB 88,89

Statement Table – This block of data includes all the lines of code
that have been entered by the user and tokenized by BASIC, and it also
includes the immediate mode line. The format of these lines is described
in the tokenized line example of the section on the tokenizing
process.

STMCUR 8A,8B

Current Statement – This pointer is used by BASIC to reference
particular tokens within a line of the statement table. When BASIC is
waiting for input, this pointer is set to the beginning of the immediate
mode line.

STARP 8C,8D

String/Array area – This block contains all the string and array data.
String characters are stored as one byte ATASCII entries, so a string of
20 characters will require 20 bytes. Arrays are stored with 6-byte BCD
numbers for each element. A 10-element array would require 60 bytes. This
area is allocated and subsequently enlarged by each dimension statement
encountered, the amount being equal to the size of a string dimension or
six times the size of an array dimension.

RUNSTK 8E,8F

Run time stack – This software stack contains GOSUB and FOR/NEXT
entries. The GOSUB entry consists of four bytes. The first is a 0 byte
indicating GOSUB, followed by the 2-byte integer line number on which the
call occurred. This is followed by the offset into that line so the RETURN
can come back and execute the next statement. The FOR/NEXT entry contains
16 bytes. The first six bytes are the limit the counter variable can reach.
The next six bytes are the step or counter increment. Each of these
quantities is in 6-byte BCD format. The thirteenth byte is the counter
variable number with the MSB set. The fourteenth and fifteenth bytes are the
line number, and the sixteenth is the line offset to the FOR statement.

MEMTOP 90,91

Top of application RAM – This is the end of the user program. Program
expansion can occur from this point to the end of free RAM, which is
defined by the start of the display list. The FRE function returns the
amount of free RAM by subtracting MEMTOP from HIMEM (2E5,2E6). Note that
the BASIC MEMTOP is not the same as the OS variable called
MEMTOP.

THE PROGRAM EXECUTION PROCESS

Executing a line of code is a process that involves reading the tokens that were created during the tokenization process. Each token has a particular meaning that causes BASIC to execute a specific series of operations. The method of doing this requires that BASIC get one token at a time from the token program and then process it. The token is an index into a jump table of routines, so a PRINT token will point indirectly to a PRINT processing routine. When that processing is complete, BASIC returns to get the next token. The pointer that is used to fetch each token is called STMCUR and is at 8A and 8B.

The first line of code that is executed in a program is the immediate mode
line. This is usually a RUN or GOTO. In the case of the RUN, BASIC gets the
first line of tokens from the statement table (tokenized program) and processes it. If all the code is in-line, then BASIC merely executes consecutive lines.

If a GOTO is encountered, then the line to go to must be found. The statement table contains a linked list of tokenized BASIC lines. These lines are stored in ascending numerical order. To find a line somewhere in the middle of the table, BASIC starts by finding the first line of the program.

The address of the first line is contained in the STMTAB pointer at 88 and
89. This address is now stored in a temporary pointer. The first 2 bytes of the
first line are its line number which is compared against the requested line
number. If the first number is less, then BASIC gets the next line by adding the
third byte of the first line to the temporary pointer. The temporary pointer
will now be pointing to the second line. Again the first 2 bytes of this new
line are compared to the requested line, and if they are less, the third byte is
added to the pointer. If a line number does match, the contents of the temporary pointer are moved into STMCUR and BASIC fetches the next token from the new line. Should the requested line number not be found, an ERROR 12 is generated.
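The search just described amounts to walking a chain of length bytes. The Python sketch below models it over a byte string standing in for the statement table. The function name `find_line` is hypothetical, line numbers are assumed to be stored low byte first, and the early exit once a larger line number is seen is a simplification that relies on the lines being in ascending order.

```python
def find_line(statement_table: bytes, wanted: int) -> int:
    """Return the offset of line `wanted` within the statement table.
    The offset plays the role of the temporary pointer, which starts
    at the address held in STMTAB."""
    pointer = 0
    while pointer + 3 <= len(statement_table):
        number = int.from_bytes(statement_table[pointer:pointer + 2], "little")
        if number == wanted:
            return pointer                 # this value would be moved into STMCUR
        if number > wanted:
            break                          # ascending order: the line cannot follow
        step = statement_table[pointer + 2]
        if step == 0:
            raise ValueError("corrupt line offset")
        pointer += step                    # third byte: offset to the next line
    raise LookupError("ERROR 12: line not found")


def demo_line(number: int) -> bytes:
    # Minimal 6-byte line: number, line offset, statement offset,
    # a placeholder command token, and the 16 hex end-of-line token.
    return number.to_bytes(2, "little") + bytes([6, 6, 0x15, 0x16])


table = demo_line(10) + demo_line(20) + demo_line(30)
print(find_line(table, 30))
```

The cost of the search grows with the number of lines that precede the target, which is why frequently referenced lines are best placed near the start of a program.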

The GOSUB involves more processing than the GOTO. The line finding routine is the same, but before BASIC goes to that line it sets up an entry in the Run Time Stack. It allocates four bytes at the end of the stack and stores a 0 in the first byte to indicate a GOSUB stack entry. It then stores the line number it was on when the call was made into the next two bytes of the stack. The final byte contains the offset in bytes from the start of that line to where the GOSUB token was found. BASIC then executes the line it looked up. When the RETURN is found, the entry on the stack is pulled off, and BASIC returns to the calling line.

The FOR command causes BASIC to allocate 16 bytes on the Run Time Stack. The first six bytes are the limit the variable can reach in 6-byte BCD format. The second six bytes are the step, in the same format. Following these, BASIC stores the variable number (MSB set) of the counting variable. It then stores the present line number (two bytes) and the offset into the line. The rest of the line is then executed.

When BASIC finds the NEXT command, it looks at the last entry on the stack. It makes sure the variable referenced by the NEXT is the same as the one on the stack and checks if the counter has reached or exceeded the limit. If not then BASIC returns to the line with the FOR statement and continues execution. If the limit was reached, then the FOR entry is pulled off the stack and execution continues from that point.
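The two kinds of Run Time Stack entry can be modeled as byte strings pushed onto a growing buffer. The Python sketch below is an illustration under stated assumptions: line numbers are stored low byte first, and an entry is recognized when popped by inspecting the fourth byte from the end, which is 0 for a GOSUB and a variable number with the MSB set for a FOR. That recognition rule is inferred from the layouts given here rather than taken from the ROM, and all function names are hypothetical.

```python
def gosub_entry(line_number: int, offset: int) -> bytes:
    """Four bytes: a 0 marker, the calling line number, the line offset."""
    return bytes([0]) + line_number.to_bytes(2, "little") + bytes([offset])


def for_entry(limit: bytes, step: bytes, var_number: int,
              line_number: int, offset: int) -> bytes:
    """Sixteen bytes: limit and step (6-byte BCD each), the counter
    variable number with the MSB set, the line number, the line offset."""
    if len(limit) != 6 or len(step) != 6:
        raise ValueError("limit and step must be 6-byte BCD values")
    return (limit + step + bytes([0x80 | var_number])
            + line_number.to_bytes(2, "little") + bytes([offset]))


def pop_entry(stack: bytearray) -> dict:
    """Remove and describe the most recent entry."""
    header = bytes(stack[-4:])
    del stack[-4:]
    entry = {"line": int.from_bytes(header[1:3], "little"), "offset": header[3]}
    if header[0] == 0:
        entry["kind"] = "GOSUB"
    else:
        entry["kind"] = "FOR"
        entry["var"] = header[0] & 0x7F
        entry["limit"] = bytes(stack[-12:-6])
        entry["step"] = bytes(stack[-6:])
        del stack[-12:]
    return entry


TEN = bytes.fromhex("401000000000")   # BCD constants, assumed layout
ONE = bytes.fromhex("400100000000")
stack = bytearray()
stack += for_entry(TEN, ONE, 0, 100, 4)   # a FOR loop on line 100
stack += gosub_entry(120, 9)              # a GOSUB inside the loop, on line 120
```

Every GOSUB inside a loop adds and later removes four bytes, and every FOR adds sixteen, which is the bookkeeping cost referred to in the performance guidelines.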

When an expression is evaluated, the operators are put onto an operator stack and are pulled off one at a time and evaluated. The order in which the operators are put onto the stack can either be implied, in which case BASIC looks up the operator's precedence from a ROM table, or the order can be explicitly stated by the placement of parentheses.
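The general idea of an operator stack driven by a precedence table can be illustrated with a small evaluator. The Python sketch below is a generic operator-precedence evaluator, not a transcription of the BASIC routine; the precedence numbers are invented for the illustration and only four operators are handled.

```python
import operator

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}      # illustrative values only
APPLY = {"+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.truediv}


def evaluate(tokens: list[str]) -> float:
    """Evaluate an infix expression given as a list of tokens."""
    values, ops = [], []

    def reduce_top() -> None:
        # Pull one operator off the stack and apply it to two values.
        op = ops.pop()
        right, left = values.pop(), values.pop()
        values.append(APPLY[op](left, right))

    for tok in tokens:
        if tok == "(":
            ops.append(tok)                        # explicit ordering starts here
        elif tok == ")":
            while ops[-1] != "(":
                reduce_top()
            ops.pop()                              # discard the "("
        elif tok in PRECEDENCE:
            # Implied ordering: finish stacked operators of equal or
            # higher precedence before stacking this one.
            while ops and ops[-1] != "(" and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]:
                reduce_top()
            ops.append(tok)
        else:
            values.append(float(tok))
    while ops:
        reduce_top()
    return values[0]


print(evaluate(["1", "+", "2", "*", "3"]))
print(evaluate(["(", "1", "+", "2", ")", "*", "3"]))
```

The first call relies on the precedence table to perform the multiplication first; the second uses parentheses to force the addition first.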

Pressing the BREAK key at any time causes the operating system to set a flag to indicate this occurrence. BASIC checks this flag after each token is
processed. If it finds it has been set, it stores the line number at which this
occurred, prints out a "STOPPED AT LINE XXXX" message, clears the BREAK flag and waits for user input. At this point the user could type CONT and program execution would continue at the next line.

SYSTEM INTERACTION

BASIC communicates with the Operating System primarily through the use of I/O calls to the Central I/O Utility (CIO). Following is a list of user BASIC calls and the corresponding operating system IOCB (Input/Output Control Block) setups.

IOCB=1. BASIC uses a special put byte vector in the IOCB to talk
directly to the handler.

XIO 18,#6,12,0,"S:"

IOCB=6, Command=18 (Special – Fill), Aux1=12, Aux2=0

SAVE/LOAD: When a BASIC token program is saved to a device, two blocks of information are written. The first block consists of seven of the nine zero page pointers that BASIC uses to maintain the token file. These are LOMEM (80,81) through STARP (8C,8D). There is one change made to these pointers when they are written out: the value of LOMEM is subtracted from each of the 2-byte pointers, and these new values are written to the device. Thus the first two bytes written will be 0,0.

The second block of information written consists of the following token file
sections: (1) The variable name table, (2) the variable value table, (3) the
token program, and (4) the immediate mode line.

When this program is loaded into memory, BASIC looks at the OS variable MEMLO (2E7,2E8) and adds its value to each of the 2-byte zero page pointers as they are read from the device. These pointers are placed back on page zero and then the values of RUNSTK (8E,8F) and MEMTOP (90,91) are set to the value in STARP.

Next, 256 bytes are reserved in memory above the value of MEMLO to allocate space for the token output buffer. Then the token file information, consisting of the variable name table through the immediate mode line, is read in. This data is placed in memory immediately following the token output buffer.
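The pointer arithmetic performed by SAVE and LOAD can be summarized in a short Python sketch. It models only the first block (the seven pointers), assumes each pointer is written low byte first, and uses hypothetical function names and example addresses.

```python
POINTER_NAMES = ["LOMEM", "VNTP", "VNTD", "VVTP", "STMTAB", "STMCUR", "STARP"]


def save_header(pointers: dict) -> bytes:
    """Write the seven pointers relative to LOMEM (14 bytes in all)."""
    base = pointers["LOMEM"]
    return b"".join((pointers[name] - base).to_bytes(2, "little")
                    for name in POINTER_NAMES)


def load_header(header: bytes, memlo: int) -> dict:
    """Rebuild the zero page pointers for a machine whose MEMLO is `memlo`."""
    pointers = {
        name: int.from_bytes(header[2 * i:2 * i + 2], "little") + memlo
        for i, name in enumerate(POINTER_NAMES)
    }
    # RUNSTK and MEMTOP are both set to the value in STARP.
    pointers["RUNSTK"] = pointers["MEMTOP"] = pointers["STARP"]
    return pointers


# Example addresses only; they are not taken from a real memory map.
saved_from = {"LOMEM": 0x0700, "VNTP": 0x0800, "VNTD": 0x0805, "VVTP": 0x0806,
              "STMTAB": 0x0816, "STMCUR": 0x0830, "STARP": 0x0840}
header = save_header(saved_from)
loaded = load_header(header, memlo=0x1D00)
```

Because only relative values are stored, a program saved on a machine without DOS loads correctly on one where DOS has raised MEMLO.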

Figure 10-2 OS and BASIC Pointers (No DOS Present)

IMPROVING PROGRAM PERFORMANCE

Program performance can be improved in two ways. First the execution time can be decreased (it will run faster) and second, the amount of space required can be decreased, allowing it to use less RAM. To attain these two goals, the following lists can be used as guidelines. The methods of improvement in each list are primarily arranged in order of decreasing effectiveness. Therefore the method at the top of a list will have more impact than one on the bottom.

Speeding Up a BASIC Program

Recode – Because BASIC is not a structured language, the code written in
it tends to be inefficient. After many revisions it becomes even worse. Thus,
the time spent restructuring the code is worthwhile.

Check algorithm logic – Make sure that the code to execute a process is as
efficient as possible.

Put frequently called subroutines and FOR/NEXT loops at the start of the
program – BASIC starts at the beginning of a program to look for a line
number, so any line references near the end will take longer to reach.

For frequently called operations within a loop use in-line code rather
than subroutines – The program speed can be improved here since BASIC spends
time adding and removing entries from the run time stack.

Make the most frequently changing loop of a nested set the deepest – In
this way, the run time stack will be altered the fewest number of times.

Simplify floating point calculations within the loop – If a result is
obtained by multiplying a constant by a counter, time could be saved by
changing the operation to an add of a constant.

Set up loops as multiple statements on one line – In this way the BASIC
interpreter will not have to get the next line to continue the loop.

Disable the screen display – If visual information is not important for a
period of time, up to a 30 percent time savings can be made with a POKE 559,0.

Use a faster graphics mode or a short display list – If a full screen
display is not necessary then up to 25 percent time savings can be made.

Use assembly code – Time savings can be made by encoding loops in
assembler and using the USR function.

Saving Space In A BASIC Program

Recode – As mentioned previously, restructuring the program will make it
more efficient. It will also save space.

Remove remarks – Remarks are stored as ATASCII data and merely take up
space in the running program.

Replace a constant used three times or more with a variable – BASIC
allocates seven bytes for a constant but only one for a variable reference, so
six bytes can be saved each time a constant is replaced with a variable
assigned to that constant's value.

Initialize variables with a read statement – A data statement is stored in
ATASCII code, one byte per character, whereas an assignment statement requires
seven bytes for one constant.

Try to convert numbers used once and twice to operations of predefined
variables – An example is to define Z1 to equal 1, Z2 to equal 2, and if the
number 3 is required, replace it with the expression Z1 + Z2.

Set frequently used line numbers (in GOSUB and GOTO) to predefined
variables – If the line 100 is referenced 50 times, approximately 300 bytes
can be saved by equating Z100 to 100 and referencing Z100.

Keep the number of variables to a minimum – Each new variable entry
requires 8 more bytes in the variable value table plus a few bytes for its
name.

Clean up the value and name tables – Variable entries are not deleted from
the value and name tables even after all references to them are removed from
the program. To delete the entries LIST the program to disk or cassette, type
NEW, then ENTER the program.

Keep variable names as short as possible – Each variable name is stored in
the name table as ATASCII information. The shorter the names, the shorter the
table.

Replace text used repeatedly with strings – On screens with a lot of text,
space can be saved by assigning a string to a commonly used set of characters.

Initialize strings with assignment statements – An assignment of a string
with data in quotes requires less space than a READ statement and a CHR$
function.

Concatenate lines into multiple statements – Three bytes can be saved each
time two lines are converted into two statements on one line.

Replace once used subroutines with in-line code – The GOSUB and RETURN
statements waste bytes if used only once.

Use cursor control characters rather than POSITION statements – The POSITION
statement requires 15 bytes for the X,Y parameters whereas the cursor editing
characters are one byte each.

Delete lines of code via program control – See the advanced programming
techniques section.

Modify the string/array pointer to load predefined data – By changing the
value in STARP, string and array information can be saved.

Small assembly routines can be stored in USR calls – For example
X=USR(ADR("hhh*LVd"),16).

Chain programs – An example would be an initialization routine that is run
first and then loads and executes the main program.
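The byte counts quoted for constants and variables in the list above can be checked with simple arithmetic. The Python sketch below uses hypothetical function names and counts only the per-reference saving and the table space a new variable occupies. The statement that initializes the variable is not modeled, so the net figure is an upper bound.

```python
CONSTANT_BYTES = 7        # constant token plus the 6-byte BCD value
VARIABLE_REF_BYTES = 1    # a variable reference is a single token
VALUE_ENTRY_BYTES = 8     # one entry in the variable value table


def gross_saving(uses: int) -> int:
    """Bytes saved in the tokenized program by the replacements alone."""
    return uses * (CONSTANT_BYTES - VARIABLE_REF_BYTES)


def table_overhead(name: str) -> int:
    """Bytes the new variable occupies in the value and name tables."""
    return VALUE_ENTRY_BYTES + len(name)


def net_saving(uses: int, name: str) -> int:
    """Upper bound on the net saving; the initializing statement is ignored."""
    return gross_saving(uses) - table_overhead(name)


# Line 100 referenced 50 times through a variable named Z100:
print(gross_saving(50), net_saving(50, "Z100"))
```

For the Z100 case the gross figure is the 300 bytes mentioned in the list, and the table entries reduce it only slightly.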

ADVANCED PROGRAMMING TECHNIQUES

An understanding of fundamentals of ATARI BASIC makes it possible to write some interesting applications. These can be strictly BASIC operations, or they can also involve features of the operating system.

Example 1 – String Initialization – This program will set all the bytes of a
string of any length to the same value. BASIC copies the first byte of the
source string into the first byte of the destination string, then the second,
third, and so on. By making the destination string the second byte of the
source, the same character can be stored throughout the entire string.
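The effect relies on the copy running forward one byte at a time while the source and destination overlap. The Python sketch below models that behavior on a byte buffer; it is an illustration of the mechanism, not BASIC code, and the function name `fill_string` is hypothetical.

```python
def fill_string(length: int, char: str) -> str:
    """Fill a buffer with one character by an overlapping forward copy."""
    if length < 1:
        raise ValueError("length must be at least 1")
    buf = bytearray(length)
    buf[0] = ord(char)                 # seed the first byte of the string
    # Copy source bytes 1..length-1 onto destination bytes 2..length.
    # Each byte read was written by the previous step, so the seed
    # character propagates through the whole buffer.
    for i in range(length - 1):
        buf[i + 1] = buf[i]
    return buf.decode("ascii")


print(fill_string(10, "A"))
```

A copy routine that ran backward, or that buffered the source first, would not produce this result, so the technique depends on the byte-by-byte order described above.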

Example 2 – Delete Lines Of Code – By using a feature of the operating
system, a program can delete or modify lines of code within itself. The screen
editor can be set to accept data from the screen without user input. Thus by
first setting up the screen, positioning the cursor to the top, and then
stopping the program, BASIC will be getting the commands that have been printed on the screen.

Example 3 – Player/Missile (P/M) Graphics With Strings – A fast way to move player/missile graphics data is shown in this example. A dimensioned string has its string/array area offset value changed to point to the P/M graphics area. Writing to this string with an assignment statement will now write data into the P/M area at assembly language rates.