LAUGHTON ELECTRONICS

Cheap Video à la Lancaster and the back story re: my KIM-1

At the dawn of the microcomputer era, back in the days of the 8080 and
the Motorola 6800, a certain Pioneer of the Art published plans for a
remarkable microprocessor interface. Don Lancaster published a series
of books, including The Cheap Video Cookbook and Son of Cheap Video.
These explain an unorthodox technique that allows rudimentary
microcomputers like the Altair and the KIM-1 to generate video output.

The novelty of Lancaster's approach impressed me deeply, and I took to
heart an important lesson: sometimes the most expedient way to solve a
problem is to "lie" to the machine! (See below.)

Incidentally, the title of my KimKlone article is a grateful
acknowledgment to Mr Lancaster and his humorous style:
(Lancaster's book:) Cheap Video Cookbook
(Lancaster's book:) Son of Cheap Video
(my spin-off article:) Bride of Son of Cheap Video

Cheap Video and Lying To the Machine

Cheap Video is a means of outputting video without the need for DMA
hardware or a Video Controller chip with dedicated video memory.
Instead, a portion of the existing system RAM serves as video memory,
and, instead of DMA, what's used is programmed I/O. In other words,
video is generated as the output of an actual program running on the
computer. This would ordinarily be impossible due to the very high data
rate required, but Cheap Video slips a joker in the deck: a simple
hardware trick (described later) which fools the CPU.

At the heart of the video program is a loop, and each iteration of this
inner loop outputs one row of pixels, corresponding to one horizontal
sweep (or "scan") of the CRT/LCD monitor. Each loop iteration begins
with the CPU making a Jump To Subroutine (JSR) to some address within a
portion of memory you've chosen to use as the video buffer. That's
right: it jumps to an address where data is stored!

Thanks to the unusual hardware I mentioned, what the CPU "sees" in the
buffer is not pixel data. Instead, those addresses appear to contain a
dubious subroutine composed of dozens of ORA #$00 instructions.
(Alternatively, something like AND #$FF or CMP #$C9 instructions could
be used. The effect is the same, namely a two-byte, two-cycle NOP.)
Naturally the CPU follows orders and executes these virtual NOPs. Then
the do-nothing subroutine terminates with an RTS.

What's noteworthy is that the address bus rapidly and steadily
increments, once per cycle, as the NOPs and the first two cycles of the
RTS execute. In other words, the CPU's PC register spends a few dozen
cycles behaving like an ordinary 16-bit counter... and it is counting
its way through a selected portion of the video buffer. This is the
"action" part of the sequence, and the CPU is doing what we would want
a DMA controller to do: quickly read a series of bytes from memory.
Each byte is immediately sent to a shift register that serializes the
bits in order to output the video bit stream.
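
The "action" part of a scan can be modeled in a few lines of Python. This is only a behavioral sketch, not Lancaster's circuit: the function name scan_line, the MSB-first bit order, and the value of the byte the CPU reads and discards in the second cycle of the RTS are all assumptions made for illustration.

```python
ORA_IMM = 0x09  # 6502 opcode for ORA #immediate (2 bytes, 2 cycles)
RTS = 0x60      # 6502 opcode for RTS


def scan_line(ram, start, nop_count):
    """Model the "action" part of one scan.

    Returns (cpu_bytes, pixels): what the CPU is told it fetched, and the
    bit stream the shift register sends to the display.
    """
    # Fabricated instruction stream fed to the CPU. The final 0x00 stands in
    # for the byte the 6502 reads and discards in the second cycle of RTS
    # (its value is an assumption).
    fake_stream = [ORA_IMM, 0x00] * nop_count + [RTS, 0x00]

    cpu_bytes, pixels = [], []
    addr = start
    for fake in fake_stream:
        video_byte = ram[addr]          # memory still answers with real data...
        cpu_bytes.append(fake)          # ...but the CPU is handed the fake byte
        for bit in range(7, -1, -1):    # serialize MSB first (assumed order)
            pixels.append((video_byte >> bit) & 1)
        addr = (addr + 1) & 0xFFFF      # PC steps like a 16-bit counter
    return cpu_bytes, pixels


ram = bytearray(0x10000)
ram[0x2000:0x2004] = bytes([0xFF, 0x00, 0xAA, 0x55])
cpu_bytes, pixels = scan_line(ram, 0x2000, nop_count=1)
```

With a single NOP the CPU believes it executed ORA #$00 followed by RTS, while the four bytes actually stored at $2000 to $2003 have gone out as 32 pixels.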

After the RTS occurs the spell is broken. We stop fetching from the
video buffer, and the RTS's return address takes us back to finish the
rest of the loop. We output a horizontal sync pulse (typically via a
parallel port bit), we compute a new address to be used by the next
JSR, then usually the loop reiterates. There's no loop exit until there
have been enough scans (horizontal lines) to refresh the entire screen
from top to bottom, i.e., one frame. To produce a continuous succession
of frames, the inner loop is wrapped in an outer loop that ultimately
outputs the Vertical Sync pulse and rolls the JSR address back to its
top-of-the-screen value.
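
The nesting of the two loops might be sketched like this. The generator frame_events and its parameters are hypothetical, and the fixed stride from one line to the next is an assumption; the text says only that a new address is computed for each JSR.

```python
def frame_events(buffer_base, bytes_per_line, lines_per_frame):
    """Yield the events of one frame in order."""
    jsr_target = buffer_base                 # top-of-screen value
    for _ in range(lines_per_frame):         # inner loop: one scan per pass
        yield ("scan", jsr_target)           # JSR into the video buffer
        yield ("hsync",)                     # pulse a parallel port bit
        jsr_target += bytes_per_line         # new address for the next JSR
    yield ("vsync",)                         # outer loop: end of frame


events = list(frame_events(0x2000, 32, 3))
```

A real implementation is cycle-counted assembly language, since every pass through the inner loop must take exactly the same time. The sketch shows only the order of events.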

The scheme just described produces a non-interlaced, bit-mapped
display. (Interlacing can be had by altering the software.)
Character-based displays are also readily possible. One option is to
throw hardware at the problem and install a Character Generator ROM.
But that approach may not be worthwhile, given that the same result can
be obtained by software. You can simply use the bit-mapped display and
have it updated by an assembly-language character-drawing routine.

The sneaky trick
mentioned earlier is what causes the
CPU to see the buffer area as containing quasi-NOPs rather than what's
really there
(the video data). Here's how it's done:

Usually when a CPU sends out an address, memory will faithfully reply
with the byte stored at that address. But with Cheap Video a major
connection, the one between the CPU's data bus and the memory's data
bus, gets temporarily severed. This lets Cheap Video "lie" about what's
in memory. (See the diagrams above, Business as Usual and Cheap Video.)
During the "action" part of each scan, the bytes fetched onto the
memory data bus don't get relayed back to the CPU's data bus. Instead,
the bytes (i.e., the pixel data we needed to fetch) get shipped off to
the video display. Meanwhile, some Cheap Video flimflam logic feeds the
CPU bus a brazen fabrication, a persistent ORA #$00 (and eventual RTS)
which appear to reside at the addresses actually containing data.

Obviously there needs to be a mechanism that cues the hardware
regarding when to suspend reality and produce dummy op-codes.
Lancaster's version takes its cues from the values appearing on the
address bus. A portion of the 64K map (perhaps 4K or 8K bytes in size)
is recognized by the decode hardware as the video buffer. When scanning
is enabled, from the CPU's point of view the entire buffer region is
filled with repeated images from a 32-byte PROM containing mostly
ORA #$00 instructions plus an RTS. From this, clever wiring and coding
can yield 40- and 80-byte-wide displays, although of course
power-of-two widths such as 64 are easier. With all cheap-video
schemes, proper scan timing depends hugely on how the code is written,
particularly in that the execution time mustn't vary from one line to
the next.
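
The address-cued substitution can be illustrated with a small model of what the CPU reads. The PROM layout below (fifteen ORA #$00 instructions, then an RTS and one filler byte), the buffer location at $2000, and the function name cpu_read are assumptions chosen for the sketch, not details taken from Lancaster's books.

```python
ORA_IMM, RTS = 0x09, 0x60

# Assumed PROM layout: 15 x "ORA #$00" (30 bytes), then RTS and one filler byte.
PROM = bytes([ORA_IMM, 0x00] * 15 + [RTS, 0x00])

BUF_START, BUF_SIZE = 0x2000, 0x1000   # hypothetical 4K buffer at $2000


def cpu_read(addr, ram, scanning):
    """Return the byte the CPU sees when it reads addr."""
    in_buffer = BUF_START <= addr < BUF_START + BUF_SIZE
    if scanning and in_buffer:
        # Only the low five address bits reach the PROM, so its 32 bytes
        # repeat throughout the whole buffer region.
        return PROM[addr & 0x1F]
    return ram[addr]                    # business as usual


ram = bytearray(0x10000)
ram[0x2000] = 0xAB                      # some pixel data
```

Under this assumed layout, a JSR to the start of any 32-byte image would step through 32 consecutive buffer addresses before the RTS takes effect. With scanning disabled, the same addresses read back as ordinary RAM, which is how the program can update the display contents.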

The KimKlone is not cued by addresses. Instead of JSR, the inner loop
uses a KimKlone JSR variant (coded as opcode $33) to initiate the scan.
Scans terminate according to a VIA timer cycling at the horizontal
frequency. Under this system the video buffer, or an array of them, can
reside anywhere in KK's 16 MByte space.

Lancaster realized that a microprocessor is capable of burst-reads of
memory, sustained at a rate of one byte every cycle, even though
conventional processing uses only sporadic accesses to small chunks of
data. But prolonged sequences of memory reads do occur as the chip
fetches the bytes of its program. The CPU unwittingly mimics a 16-bit
counter or a DMA controller, with its address bus outputting an
ascending 16-bit count.

I am indebted to Mr Lancaster for the lesson I learned from Cheap
Video, namely that a microprocessor can readily be manipulated by
hardware tricks in order to produce unusual behaviors that are useful.
The KimKlone, of course, relies very heavily on this principle.

My KIM (the original mashup) and its mutant spawn, the KimKlone

My very first computer was a KIM-1, the classic 1-MHz 6502 board from
MOS Technology. I hadn't had it long before I added some extra RAM
(2114's), a pair of 6522's, an ASCII keyboard & a paper tape reader
and, of course, Cheap Video. But around 1980 I switched the focus from
video to memory-space expansion. The reason? On the surplus market I'd
acquired a DRAM board of 128K capacity! I was agog; I felt hypoxemic.
This utterly outclassed my previous expansion of 8K! And of course it
was twice as much as the processor could address.

I decided to down-rate the new board to 112K, which allowed the new
memory, the pre-existing memory and the I/O space all to reside within
128K. Then I devised a circuit which recognized some of the undefined
aka "illegal" 65c02 opcodes and used them as cues to direct access
between "this" bank and "the other" bank. As with the KimKlone (which
came later), the banks were a full 64K in size. This contrasts sharply
with conventional expansion schemes, which are restricted to a
comparatively small "window" (e.g., 16K) into the expanded space. Bank
switches were implemented as transient events lasting less than one
instruction cycle; my new circuitry had to manage its task on a bus
cycle by bus cycle basis.

If I recall correctly, the deal with my KIM was that each illegal
op-code of the pattern xxxxx011 would cause the upper, don't-care bits
(the xxxxx) to select one of thirty-two 8-bit patterns held in a TTL
PROM, and the selected pattern was parallel-loaded into a shift
register and regurgitated serially. The xxxxx011 op-code acted as a
prefix instruction, and the shift register would trot out the
corresponding pattern, one bit per cycle, while the following
instruction (the target of the prefix) executed. The target would be a
normal 65xx memory reference instruction such as INC Absolute, STA
Indirect, CMP Indirect-Y or whatever. The shift register's serial
output toggled a flip-flop feeding A16, the most-significant address
line. Typical timing patterns caused A16 to flip from one 64K bank to
the other for a single bus cycle only, exactly during the time the
target instruction performed its fetch or store. (Read-Modify-Write
instructions used patterns that produced a three-cycle bank switch.)
There were other capabilities as well: for instance you could JMP to
the alternate bank and stay there, or do a Far JSR and later a Far RTS.
The exact details escape me. But the 65c02's 64K address limit was
transcended by using undefined op-codes as prefixes to specify Far
addressing for legacy instructions.
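
The prefix mechanism can be approximated in Python. Since the exact details are not recorded above, this is a guess at the behavior rather than a description of the actual circuit: the function names, the MSB-first shift order, the rule that each 1 bit toggles A16 at the start of its cycle, and the two example patterns are all assumptions.

```python
def is_prefix(opcode):
    """True for op-codes of the form xxxxx011."""
    return (opcode & 0b111) == 0b011


def pattern_index(opcode):
    """The upper five bits pick one of the 32 PROM entries."""
    return opcode >> 3


def a16_per_cycle(pattern, a16=0):
    """Shift an 8-bit pattern out MSB first, one bit per bus cycle.

    Each 1 bit toggles the A16 flip-flop (assumed behavior); the result
    is the A16 level during each of the eight cycles.
    """
    levels = []
    for bit in range(7, -1, -1):
        if (pattern >> bit) & 1:
            a16 ^= 1
        levels.append(a16)
    return levels


single_cycle = a16_per_cycle(0b00011000)   # toggle on, then straight off
three_cycle = a16_per_cycle(0b00010010)    # stays flipped for three cycles
```

In this model, two adjacent 1 bits give the one-cycle bank flip needed for a fetch or store, and two 1 bits spaced three cycles apart give the longer flip a Read-Modify-Write instruction would need. A pattern with a single 1 bit would leave A16 flipped, which is one way a "JMP and stay there" could be modeled.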

The arrangement I've described was perfectly functional, but a more
elegant solution would be to infer timing information directly from the
target op-code. The KIM circuit didn't even sample the target
instruction; its behavior depended solely on the prefix. So, instead of
just a few prefixes, a few sets of prefixes had to be made available,
with members of each set identical except in regard to timing. That's
how I was able to match the timing of the CPU as it executes different
target instructions using different address modes. It seemed a shame to
use all those undefined op-codes so inefficiently, but with the KIM it
didn't really matter because there was nothing else that needed to be
controlled. Later the KimKlone, a "clean sheet of paper" design, pushed
the envelope a great deal further. See KimKlone Short Summary.