01234567890123456789012345678901234567890123456789012345678901234567890123456789
Videobrain Unwrapped
--------------------
061013
V0.05
Written by:
Kevin Horton
Description:
------------
The Videobrain Family Computer was an ill-received home computer and videogame
playing machine that was way ahead of its time. The system was designed in
1977 and only sold for about one year somewhere in the 1977-1979 time frame.
There were only a few games and application programs released for the system
before it died a premature death. With more advertising and programs, it
might've gotten more popular; a lack of both doomed it.
---
Reproduction:
-------------
I got interested in the VB because I got a hold of a bunch of the ASICs for
the system by chance. Videobrain systems (from now on called "VB") are very
hard to come by. The last one on ebay was $120 or so and broken, and before
that, $400 for a complete system.
Since the systems are so rare and expensive, and I had some ASICs, I decided
to BUILD my own system! I had most of the chips and parts already to build it
so I bought some perf board, 2102 RAM chips, and proceeded to build it. It
took me about 4 days to complete it. I followed the schematic I found on the
internet.
After assembly, I went through and checked every connection with the
multimeter. I found 3 or 4 minor flubs which were fixed, and I tried running
it. I fired it up and the CPU was getting stuck somewhere. Before the CPU got
wedged, the address lines to the RAM chips were toggling as the CPU
tried to clear RAM out every reset. After the RAM clearing, it'd hang up and
not restart.
I managed to find the datasheet for the 2102 RAM chip (this was quite difficult)
and figured out that the schematic had Din and Dout flipped. Great. Moving 24
wires later, things looked better: RAM was only being cleared once now
and it was possible to see where the CPU was reading the signature byte to
skip the clear routine.
At this point, the CPU was still stopping. I hooked my logic analyzer up
(an Agilent 16700B with three 16750A acquisition cards).
Within about 5 minutes I could watch the CPU executing code, and see it
was dying when it tried to write to an ASIC register. RAM reading and writing
were working fine, halting the CPU and such, but when the ASIC was written
to it died and the CPU never woke up.
It took me a bit longer to figure out why THAT was happening. After playing
around a bit and checking out the spreadsheet that Sean Riddle produced which
had the circuit connections in it, it became obvious that the schematic had
yet another error. Pins 35 and 37 were shown swapped on the schematic.
I flipped these around and now the CPU was running without stopping.
I dropped a wire onto some ASIC pins and fed it into my monitor and some stuff
that looked like video was appearing on the screen. After a little dorking
around, I made an RGB video mod for the system.
---
Conventions in this document:
I will define several terms first which will hopefully make it easier to explain
what is going on.
MCLK - Master Clock. This is a 28.6363MHz clock from which all timing is
derived.
CPUCLK - CPU Clock. This is a 2.04MHz clock which runs the CPU.
BRCLK - BR Clock. Not sure what BR stands for. It runs at 3.579MHz.
buffered bus - This is the buffered address/data bus connected to the
cartridge, SRAM, and UV201.
All timing is related to one of these three clocks. To make testing and reverse
engineering easier, I hacked my Videobrain "prototype" up so that both the CPU
clock and the UV202 clocks were derived off the same source.
The UV202 uses a 14.3181MHz crystal, and the CPU runs off a 4MHz crystal divided
by two via the wait state generator JK flipflop. Originally, the UV202 generated
a 14.3181MHz / 7 or 2.04MHz signal, but it stopped 1 cycle too late and this
caused issues when waiting the CPU.
A bunch of "fixup" logic was provided externally to work around the issue.
Instead of the 2.04MHz signal, they went with a 2.0MHz clock derived from a
separate oscillator.
My hack was to start with a 28.6363MHz oscillator and divide it by 2, feeding
this 14.318181MHz signal into the UV202. The same 28.6363MHz clock is also
divided by 7 to produce a 4.09MHz signal, which is fed into the wait state
circuit in place of its 4.0MHz crystal.
This hack then ties both the CPU and UV202 timing together under a common clock
source. Hopefully this will make sussing out timing easier. It also gives my
logic analyzer a perfect clock to suckle off of which will keep the sample count
an exact multiple of the CPU and UV202 clocks.
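The divider chain can be sanity-checked with a little arithmetic. The following
is only a sketch in Python; the names are made up for illustration, and the
oscillator value assumes exactly twice the nominal 14.318181MHz crystal
frequency.

```python
# Sketch of the hacked clock tree. All names are illustrative assumptions.
MCLK_HZ = 28_636_362  # assumed: 2 x 14.318181MHz


def derived_clocks(mclk_hz=MCLK_HZ):
    """Return the clocks derived from the common master oscillator."""
    uv202_xin = mclk_hz / 2   # MCLK / 2 feeds the UV202 crystal input
    brclk = uv202_xin / 4     # the UV202 divides its input by 4 for BRCLK
    wait_in = mclk_hz / 7     # MCLK / 7 replaces the 4.0MHz crystal
    cpuclk = wait_in / 2      # the JK flipflop halves it for the CPU
    return {"uv202_xin": uv202_xin, "brclk": brclk,
            "wait_in": wait_in, "cpuclk": cpuclk}


clocks = derived_clocks()
for name, hz in clocks.items():
    print(f"{name:10s} {hz / 1e6:9.6f} MHz")
```

Under these assumptions one BRCLK is 8 MCLKs and one CPUCLK is 14 MCLKs, so an
analyzer sampling on MCLK sees a whole number of samples per cycle of either
clock.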
---
Chip descriptions:
Videobrain ASIC UV202:
----------------------
The UV202 is the timing generator and bus transceiver that links the data bus
from the CPU with the buffered data bus. All of the video timing information
is generated by the UV202, and it also handles the CPU wait state stuff, but
this was broken somewhat in the ASIC so they did some external circuitry to fix
it.
I investigated it, and the UV202's clock that would've run the 3850 CPU runs
a cycle or so too long vs. the external hardware solution. I guess this was
too much and it'd hit some kind of timing glitch during a wait state. It
seems to accurately track the external clock otherwise.
This clock is approximately 2.045MHz, which is 14.318181MHz / 7. It somewhat
surprised me that they'd divide by 7 and not 8. Too bad that it is buggy. The
duty cycle was 50% so they are checking both edges of the input clock. This
is somewhat similar to divide by 3 on the Atari 2600 TIA that divides the
3.579MHz clock down to approx. 1.19MHz.
The 3850 is now run off an external 2MHz clock generated by a JK flipflop
running off a 4MHz crystal oscillator.
UV-202 pinout:
1 - UMIREQ1
2 - UMIREQ0
3 - CPUREQ0
4 - Xin
5 - Xout
6 - DMAREQ0
7 - CPU CLK
8 - WACK
9 - D0
10 - BD0
11 - D1
12 - BD1
13 - D2
14 - BD2
15 - D3
16 - BD3
17 - Hblank
18 - VBLANK
19 - Burst
20 - Csynch
21 - +5V
22 - +12V
23 - Scanline
24 - Field
25 - BD4
26 - D4
27 - BD5
28 - D5
29 - BD6
30 - D6
31 - BD7
32 - D7
33 - BRCLK
34 - COLCLK
35 - DMAREQ1
36 - RST
37 - /800-BFF
38 - GND
39 - CPUREQ1
40 - BISTROBE
1 - UMIREQ1
High when the secondary UV201 wants to DMA data from buffered bus for display.
It has low priority for accessing the bus with respect to CPU accesses.
2 - UMIREQ0
High when the primary UV201 wants to DMA data from buffered bus for display.
It has low priority for accessing the bus with respect to CPU accesses.
3 - CPUREQ0
See the wait states section for information about this pin.
4 - Xin
5 - Xout
Normally connected to a 14.318181MHz crystal. This sets timing for all
video rendering and wait state generation. To drive this pin from an
external clock, you must connect an inverter from Xin to Xout, and then
drive the Xin pin. A 74HC04 is ideal.
6 - DMAREQ0
Pulses high to tell the primary UV201 that it is now OK to perform one DMA
from buffered bus.
7 - CPUCLK
Broken CPU clock output. This pin toggles at 2.04MHz, which is 14.3181MHz
divided by 7. It is a 50% duty cycle waveform. It does not quit soon enough
in the wait state, so it was fixed with external circuitry.
8 - WACK
Wait Acknowledge. Goes high to acknowledge the wait state. It will not
cause the CPU clock to continue until the falling edge of this signal.
It is used to gate the ASIC and SRAM writes when high.
9 - D0
11 - D1
13 - D2
15 - D3
26 - D4
28 - D5
30 - D6
32 - D7
These are the data lines that connect to a 2K ROM and the CPU and 3853 SRAM
interface.
10 - BD0
12 - BD1
14 - BD2
16 - BD3
25 - BD4
27 - BD5
29 - BD6
31 - BD7
These are the buffered data lines. They connect to the SRAM, UV201 data lines,
a 2K ROM, and the cartridge.
17 - Hblank
Goes high in horizontal blank.
18 - VBLANK
Goes high for 21 scanlines every 262 or 263 scanlines depending on if this is
the odd or even field. It is offset 1/2 scanline during odd fields. This signal
is used by the UV201 to synchronize rendering to the scan.
19 - Burst
Goes high every scanline immediately after the synch pulse, except during the
first 9 scanlines of VBLANK.
20 - Csynch
Composite synch line that goes high at the beginning of each scanline, except
in the Vblank region (see below).
21 - 5V
Connects to +5V supply
22 - 12V
Connects to +12V supply
23 - Scanline
This line is not used on the VB, but it toggles at the start of each scanline.
It has been useful during debugging.
24 - Field
This line toggles every field and reflects if the odd or even field is being
displayed. low = odd field, high = even field.
33 - BRCLK
This clock is a 3.579MHz clock that is generated by dividing the 14.3181MHz
clock down by 4. It clocks the UV201 and provides all timing information
to it.
34 - COLCLK
This clock appears to be an exact clone of BRCLK. It's the same phase and
frequency as BRCLK. It is used by the LM1889 NTSC encoder / TV modulator.
35 - DMAREQ1
Pulses high to tell the secondary UV201 that it is now OK to perform one DMA
from buffered bus.
36 - RST
Taking this low resets the chip. It is normally connected to a flipflop
so that reset goes high at the start of the frame. I think this was done
simply so that the reset button was a "reset once when pressed" signal
instead of continuous while the button is held down. This would be required
for the weird "top loading VCR" style cartridge slot, which resets the CPU
when a cartridge is inserted.
37 - /800-BFF input.
The external logic must pull this line low when the CPU attempts to access
800-BFF, which is the UV201 address range.
38 - ground
Connects to ground.
39 - CPUREQ1
See the wait states section for more information about this pin.
40 - BISTROBE
Pulses low to perform a read or write operation.
---
Videobrain ASIC UV201:
----------------------
The UV201 is the actual video rendering portion of the system. It fetches data
via DMA and then squirts it out its 4 video pins. It relies on the UV202 to
generate frame timing information for it. This chip holds ALL of the ASIC
registers.
7 - UV201-4 (D0)
8 - UV201-2 (D1)
9 - UV201-5 (D2)
10- UV201-3 (D3)
UV201 pinout:
-------------
1 - GND
2 - video D1
3 - video D3
4 - video D0
5 - video D2
6 - BA0
7 - BA1
8 - BA2
9 - BA3
10 - BA4
11 - BA5
12 - BA6
13 - BA7
14 - BA8
15 - BA9
16 - BA10
17 - BA11
18 - BA12
19 - +5V
20 - +12V
21 - BD0
22 - BD1
23 - BD2
24 - BD3
25 - BD4
26 - BD5
27 - BD6
28 - BD7
29 - Field
30 - EXT INT
31 - keypad column 8
32 - R/W
33 - VBLANK
34 - HBLANK
35 - BRCLK
36 - UMIREQ0
37 - BISTROBE
38 - RESET
39 - /CE
40 - DMAREQ
1 - GND
System ground
4 - video D0
2 - video D1
5 - video D2
3 - video D3
These 4 pins emit the colour. A table is provided below.
6 - BA0
7 - BA1
8 - BA2
9 - BA3
10 - BA4
11 - BA5
12 - BA6
13 - BA7
14 - BA8
15 - BA9
16 - BA10
17 - BA11
18 - BA12
These 13 pins are the buffered address bus. They are used to decode the write
address when the CPU writes to the UV201, and they also drive the buffered
address bus when performing a DMA read.
19 - +5V
5V power supply
20 - +12V
12V power supply
21 - BD0
22 - BD1
23 - BD2
24 - BD3
25 - BD4
26 - BD5
27 - BD6
28 - BD7
These 8 pins are the buffered data bus.
29 - Field
Connects to pin 24 of the UV202 for even/odd field determination.
30 - EXT INT
Drives the 3853 SRAM interface chip to perform interrupts.
31 - keypad column 8
Drives column 8 of the keypad (lol)
32 - R/W
37 - BISTROBE
39 - /CE
These three pins determine what is occurring to the UV201. When /CE is pulled
low, a CPU read or write is in progress. The R/W line will determine if this
is a read (low) or write (high). It is an inverted version of CPUREQ1.
BISTROBE pulses low when the UV202 is finishing its buffered bus cycle, and
lets the UV201 and SRAM know that a write should be taking place.
33 - VBLANK
34 - HBLANK
These two signals are generated by the UV202 and inform the UV201 where the
raster is.
35 - BRCLK
3.579MHz clock that runs the show.
36 - UMIREQ0
Pulled high by UV201 when it wishes to DMA data from the buffered bus.
38 - RESET
Low resets the chip.
40 - DMAREQ
The UV202 pulses this line high when the UV201 should perform its DMA cycle.
---
Frame timing:
-------------
Video on the system is indeed interlaced. The timing looks very good. It's
not exactly 100% NTSC timing compliant, but for 1978 it probably would've been
plenty good enough to be used in a broadcast environment.
They seem to have gone all out to make sure their signal is standards-compliant.
I suspect they were hoping it might be used at a TV or cable station for
showing info screens and such.
The way timing is internally generated is mostly done via a scanline counter
that counts by 262 or 263 depending on if the odd or even field is being
displayed.
I have used my logic analyzer (an HP 16700B) to suss out the exact timing, to
the cycle, of the VB's frame.
The system alternates between even and odd fields approximately 60 times a
second, and displays approximately 30 frames per second.
Exact cycle timings:
             Odd     Even
scanlines    263     262
clocks       59964   59736   (BRCLKs)
fields/sec   59.81
frames/sec   29.90
scanline     228             (BRCLKs)
To perform the interlace, the hardware will render 262 scanlines for one field,
then 263 for the other, and then shift the vsynch period back or forth 1/2
scanline to align it at the 1/2 scanline point each field. This is common
and how most interlaced video is generated at the hardware level.
The exact number of clocks for the video is as follows:
Each scanline is 228 BRCLKs long, which incidentally is the same as the Atari
2600. It's a tad longer than it "should" be, but it works well enough.
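The figures in the timing table follow directly from the scanline length and
the BRCLK rate. A short Python sketch of the arithmetic, assuming BRCLK is
exactly the nominal 14.318181MHz / 4 (the variable names are illustrative):

```python
# Frame timing arithmetic. BRCLK_HZ is an assumed nominal value.
BRCLK_HZ = 14_318_181 / 4
BRCLKS_PER_LINE = 228
ODD_LINES, EVEN_LINES = 263, 262

odd_clocks = ODD_LINES * BRCLKS_PER_LINE     # BRCLKs in the odd field
even_clocks = EVEN_LINES * BRCLKS_PER_LINE   # BRCLKs in the even field
frame_clocks = odd_clocks + even_clocks      # one full interlaced frame

frames_per_sec = BRCLK_HZ / frame_clocks
fields_per_sec = 2 * frames_per_sec
line_rate_hz = BRCLK_HZ / BRCLKS_PER_LINE

print(odd_clocks, even_clocks)
print(f"{fields_per_sec:.2f} fields/sec, {frames_per_sec:.2f} frames/sec")
print(f"{line_rate_hz:.1f} scanlines/sec")
```

The resulting line rate comes out a little below the NTSC nominal of roughly
15734Hz, which is what "a tad longer" amounts to.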
During a normal scanline:
CSYNCH goes high during cycles 0-17 (18 total).
BURST goes high during cycles 21-29 (9 total).
HBLANK goes high during cycles 222-227 and 0-32 (39 total).
HBLANK spans the end of the previous line to the beginning of the next line.
This makes the total visible area on the screen 189 BRCLKs long.
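The three windows on a normal scanline can be expressed as a small lookup. This
is a Python sketch using the cycle ranges listed above; the function name is an
assumption for illustration only.

```python
# Signal states on a normal (non-vertical-blank) scanline, per BRCLK cycle.
LINE_LEN = 228  # BRCLKs per scanline


def normal_line_signals(cycle):
    """Return the CSYNCH/BURST/HBLANK levels for a given BRCLK cycle."""
    c = cycle % LINE_LEN
    return {
        "csynch": 0 <= c <= 17,          # 18 cycles
        "burst": 21 <= c <= 29,          # 9 cycles
        "hblank": c >= 222 or c <= 32,   # wraps around the line boundary
    }


def count_high(name):
    """Count how many cycles of one scanline a signal spends high."""
    return sum(normal_line_signals(c)[name] for c in range(LINE_LEN))


visible = LINE_LEN - count_high("hblank")
print(count_high("csynch"), count_high("burst"), count_high("hblank"), visible)
```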
During the vertical blanking interval, timing is a bit different. Basically,
it is composed of the following:
--- (odd field)
3 scanlines of vsynch
3 scanlines of equalization pulses
254 normal scanlines
2.5 scanlines of equalization pulses
--- (even field)
3 scanlines of vsynch
3 scanlines of equalization pulses
253.5 normal scanlines
3 scanlines of equalization pulses
These two alternate back and forth every field (approx. 60 times a second) to
perform the interlacing.
The TV synchronizes off of Vsynch, so if you count scanlines starting with
it, each field will contain 262.5 scanlines exactly.
CSYNCH goes high TWICE during scanlines which contain equalization pulses:
It goes high on cycles 0-8, and again at cycles 114-122. These pulses are only
9 clocks wide instead of 18 on a normal scanline.
Vsynch is different yet again: CSYNCH is high for the majority of the scanline
and pulses low twice. Unlike the equalization scanlines, its low pulses are
18 clocks wide (same as on a normal scanline, but inverted). It sends these
pulses out on cycles 96-113 and 210-227.
NOTE: even though video is interlaced, as far as the rendering is concerned, it
is progressive: each field shows the same data, so there are only around 262
addressable scanlines instead of the full 525.
---
Wait States
-----------
The UV202 performs all the wait stating. When the CPU wishes to access the
buffered bus, the CPU is stopped and waits are inserted. The quantity of wait
states inserted varies greatly depending on what peripheral the CPU is
accessing.
For this timing, I replaced the 28.63MHz oscillator I was using with a 1MHz
one. The reason for this is I was being buffaloed by propagation delays in the
chips and connections, which made sussing out the exact timing very difficult.
The following is what happens inside the UV202 when no propagation delays are
taken into account. This will be cleared up later. All of the following timing
is relative to the state of the pins on the UV202, vs. the inputs of the D-flops
that synchronize the signals (UMIREQ0, CPUREQ0 and 1).
---
The pins related to wait states are:
UMIREQ1 (pin 1) - High when secondary UV201 wants to DMA data from buffered bus.
UMIREQ0 (pin 2) - High when primary UV201 wants to DMA data from buffered bus.
CPUREQ0 (pin 3) - High when the CPU writes anywhere in 800-1FFF (ASIC, RAM,
cart), or reads from C00-1FFF (RAM, cart).
DMAREQ0 (pin 6) - The UV202 pulls this high to let the primary UV201 perform DMA.
WACK (pin 8) - Wait Ack. Goes high when the UV202 is performing a CPU based
read or write. The falling edge restarts the CPU clock.
DMAREQ1 (pin 35) - The UV202 pulls this high to let secondary UV201 perform DMA.
/800-BFF (pin 37) - Low when the CPU is accessing 800-BFFh, high otherwise.
CPUREQ1 (pin 39) - High when the CPU reads from 800-1FFFh. (ASIC, RAM, cart)
BISTROBE (pin 40) - Pulses low in the middle of every buffered bus access
This truth table should make it a bit clearer what happens on each pin
during a specific kind of access:
event:         UMIREQ0  CPUREQ0  CPUREQ1  /800-BFF
---------------------------------------------------
UV201 Write       x        1        0        0
RAM Write         x        1        0        1
UV201 Read        x        0        1        0
RAM Read          x        1        1        1
Cart Read         x        1        1        1
DMA Read          1        0        0        0
x = don't care
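For anyone decoding logic analyzer captures, the truth table can be turned into
a lookup. This Python sketch is an illustration only: the function name and the
"not listed" result for pin combinations the table does not cover are
assumptions, and RAM and cart reads cannot be told apart from these pins alone.

```python
# Lookup built from the truth table rows; keys are
# (CPUREQ0, CPUREQ1, /800-BFF) as seen on the UV202 pins.
CPU_ROWS = {
    (1, 0, 0): "UV201 write",
    (1, 0, 1): "RAM write",
    (0, 1, 0): "UV201 read",
    (1, 1, 1): "RAM or cart read",
}


def classify_access(umireq0, cpureq0, cpureq1, n800_bff):
    """Name the bus event for one sample of the four pins."""
    key = (cpureq0, cpureq1, n800_bff)
    if key in CPU_ROWS:
        return CPU_ROWS[key]      # UMIREQ0 is a don't-care for CPU rows
    if umireq0 and key == (0, 0, 0):
        return "DMA read"
    return "not listed"           # assumption: combination not in the table


print(classify_access(0, 1, 0, 0))
print(classify_access(1, 0, 0, 0))
```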
---
Let's start with the basics. The CPU is executing code out of the cartridge,
and the UV201 is performing periodic DMA requests.
CPU reads (cart/RAM) and DMA reads:
UV201 DMA requests come either singly or in bursts. A burst is simply N
bytes fetched in a row by the UV201. The CPU can interrupt a burst at any
time, and data is read fast enough even with these interruptions to produce a
glitch free display.
Interestingly, the UV202 seems to be set up to handle TWO UV201's at the same
time, as described in the patent. If both UV201's request data at the same
time, the UV202 will alternate between servicing them.
CPU read requests only come every now and again, only once every 3 DMA requests
maximum. If a CPU read and a DMA request both occur simultaneously, the
CPU read is serviced first followed by the DMA.
At the start of any bus access (CPU read or DMA fetch), a 1 BRCLK penalty
occurs. If a DMA read follows a CPU read, a 1 BRCLK penalty is inserted.
Interestingly, if a CPU read follows a DMA read, no penalty is inserted.
In the event a CPU read and a DMA fetch both occur at the same time, a double
setup penalty occurs:
(all cycles are 1 BRCLK each)
1 penalty
3 cycle CPU read
1 penalty
3 cycle DMA read
3 cycle DMA read
3 cycle DMA read
Otherwise, the following type of thing occurs:
1 penalty
3 cycle DMA read
3 cycle DMA read
3 cycle CPU read // this CPU read appears to incur no penalty
1 penalty
3 cycle DMA read // but the following DMA does
3 cycle DMA read
3 cycle CPU read
1 penalty
3 cycle DMA read
The CPU read above appears to incur no setup penalty, but it IS there. The
DMA access will only be replaced by a CPU read if CPUREQ went high on the
rising edge of BRCLK leading into the last cycle of the DMA request.
This means that during DMA fetching, the CPU will see a 4, 5, or 6 BRCLK wait
while the DMA finishes. Depending on video and CPU synchronization, one of
these may be favored over another.
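The penalty rules reproduce the two access sequences listed above. Below is a
Python sketch of that bookkeeping for the "ideal" case only. It assumes the
accesses run back to back with the bus idle beforehand, covers only cart/RAM
CPU reads and DMA reads, and uses a made-up function name.

```python
# Ideal-case BRCLK cost of a back-to-back run of buffered bus accesses.
ACCESS_BRCLKS = 3  # each cart/RAM CPU read or DMA read takes 3 BRCLKs


def bus_brclks(accesses):
    """accesses is a list of "CPU" / "DMA" strings in service order."""
    total = 0
    prev = None  # None means the bus was idle before this run
    for kind in accesses:
        # 1 BRCLK setup penalty when the bus starts up from idle, or when
        # a DMA read follows a CPU read. A CPU read following a DMA read
        # shows no visible penalty.
        if prev is None or (prev == "CPU" and kind == "DMA"):
            total += 1
        total += ACCESS_BRCLKS
        prev = kind
    return total


simultaneous = bus_brclks(["CPU", "DMA", "DMA", "DMA"])
interleaved = bus_brclks(["DMA", "DMA", "CPU", "DMA", "DMA", "CPU", "DMA"])
print(simultaneous, interleaved)
```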
---
UV201 register reads:
Reads from the UV201 vary a bit. These take 6, 7, 8, or 9 BRCLKs with 1 setup
BRCLK. There appears to be a modulo 4 counter that continuously runs, clocked
off BRCLK. The read sequence is delayed after it starts by 0, 1, 2, or 3 extra
BRCLKs depending on the value of this counter.
I have confirmed this behaviour by recording the start cycle of many UV201 reads
and comparing them. When I wrote down the start MCLK cycle of each read and
divided it by 8 (to get BRCLKs) then modulo'ed by 4, the results are clear:
0: 9 BRCLKs
1: 8 BRCLKs
2: 7 BRCLKs
3: 6 BRCLKs
This is fairly definitive proof of a divide by 4 effect going on. When HBLANK
goes high, the divide by 4 contains 3.
Most likely what happens is the read sequence will delay at some point while
it waits for this divide by 4 to contain 00b or 11b.
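The table above amounts to a one-line formula. Here is a Python sketch of it,
assuming 8 MCLKs per BRCLK as in the test setup and that the MCLK count is
aligned to the counter the same way as in the captures; the function name is
illustrative.

```python
# Predicted UV201 read length from the MCLK cycle at which the read starts.
MCLKS_PER_BRCLK = 8


def uv201_read_brclks(start_mclk):
    """Return the read length in BRCLKs for a read starting at start_mclk."""
    phase = (start_mclk // MCLKS_PER_BRCLK) % 4   # modulo 4 counter value
    return 9 - phase                              # 0 -> 9 ... 3 -> 6


for mclk in (0, 8, 16, 24, 32):
    print(mclk, uv201_read_brclks(mclk))
```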
The usual conditions of the read during a DMA are in effect as before. The
only difference is the read is stretched out from 3 BRCLKs to 4, 5, 6, or 7.
Total number of BRCLKs will be 5, 6, 7, 8 (no DMA) and 9 or 10 is possible
with DMA.
---
UV201 register writes:
These follow the exact same conditions as UV201 reads, above.
---
RAM writes:
These follow the exact same conditions as cart/RAM reads, above.
---
So the above is the "ideal". What happens on the real console, however, is
slightly different because of propagation delays. When running the system
28x slower than normal speed, the CPU actually is halted less (even though the
CPU clock is also derived from this same 1MHz clock!)
The total "sunk cost", i.e. the amount of time the CPU spends halted at normal
operating speed, for my particular Videobrain is as follows:
CPUCLKs  UV201R/W  during DMA
4.0                            (4 BRCLKs)
4.5         X          X       (5 BRCLKs)
5.0         X          X       (5 BRCLKs) mine alternated between 4.5 and 5.0
5.5         X          X       (6 BRCLKs)
6.0         X          X       (7 BRCLKs)
6.5         X          X       (8 BRCLKs)
7.0                    X       (9 or 10 BRCLKs)
(the MCLK values were from falling edge of the last CPUCLK to falling edge of
the first after the halt, so values shown are minus 1 CPUCLK.)
The above is approximately correct. The amounts vary by 1/2 clock depending
on exact synchronization, and they will change plus or minus 1/2 clock
on a normal system due to the difference in frequency of the 2.0MHz CPUCLK
and the 14.318MHz UV202 clock.
This means cycle counted code is not possible on the Videobrain, due to the
random nature of the possible wait states. Every frame, the amount of waiting
will be different due to alignment of the two clocks and code relative to
these clocks. Temperature and voltage variances, phase of the moon, etc.
will conspire to make sure that these timings will never, ever perfectly line
up from one run of a program to the next.
Total "sunk cost" for RAM/cart reads and writes is 4.0 CPUCLKs, and during
DMA, the total increases in line with the above chart.
This pretty much sums up Videobrain wait states to my satisfaction. There's
not much more to it.
---
Address Space:
--------------
Now that the UV202 is adequately described, it's probably a good idea to look at
the address space. This will make things easier to grasp when it comes to
graphics rendering.
Because there are TWO busses, I will describe both in turn, starting with the
CPU bus.
CPU Bus:
--------
This is what the CPU sees. It can access any peripheral in the system with 0 to
10 BRCLK waits (0 or 4.0 to 7.0 CPUCLKs) for each access.
0000 - 07FF (2048) RES1 ROM. The BIOS ROM maps here.
0800 - 08FF (256) UV201. The UV201's registers are visible here.
0900 - 0BFF (768) Cartridge mapped (see below)
0C00 - 0FFF (1024) System RAM (8x 2102 SRAM chips)
1000 - 1FFF (4096) Cartridge ROM is mapped here, either 2K or 4K worth.
2000 - 27FF (2048) RES2 ROM. The second BIOS ROM maps here.
2800 - 28FF (256) ASIC mirror
2900 - 2BFF (768) Cartridge mapped mirror
2C00 - 2FFF (1024) System RAM mirror
3000 - 3FFF (4096) Cartridge ROM mirror
4000 - 7FFF (16384) Mirrors of 0000-3FFF
8000 - BFFF (16384) Mirrors of 0000-3FFF
C000 - FFFF (16384) Mirrors of 0000-3FFF
The upper 2 address lines from the CPU are not used, so it is not surprising
that 4000-FFFF "mirror" the mapping of 0000-3FFF. "Mirroring" is sometimes
called "aliasing", and basically means the same peripheral or device is visible
in multiple places in the address space due to incomplete decoding.
I found it highly interesting how the RES2 ROM is mapped in. Also, 8K games
could've been made for the system if BA13 were used as an upper address line
to select between the two 4K pages.
The amount of penalty cycles per read or write of the CPU space is as follows:
0000 - 07FF : 0 BRCLKs (RES1 ROM)
0800 - 0BFF : 6-10 BRCLKs (UV201 and external device)
0C00 - 1FFF : 4-6 BRCLKs (SRAM and cart ROM)
2000 - 27FF : 4-6 BRCLKs (RES2 ROM)
2800 - 2BFF : 6-10 BRCLKs (UV201 and external device)
2C00 - 3FFF : 4-6 BRCLKs (SRAM and cart ROM)
4000 - FFFF mirror the above sequence 3 more times as in the normal address
decoding.
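The CPU-side map, mirrors included, boils down to masking off the top two
address bits and checking a few ranges. This is a Python sketch of that decode;
the function name and region labels are illustrative, not taken from any
emulator.

```python
# Decode a 16-bit CPU address into the region it lands in.
def decode_cpu_address(addr):
    """Return a label for the device the CPU sees at addr."""
    a = addr & 0x3FFF          # A14/A15 are unused, so 4000-FFFF mirror
    low = a & 0x1FFF           # 2000-3FFF repeats the layout of 0000-1FFF...
    if low < 0x0800:
        # ...except that the second 8K selects the RES2 ROM instead of RES1.
        return "RES2 ROM" if a & 0x2000 else "RES1 ROM"
    if low < 0x0900:
        return "UV201 registers"
    if low < 0x0C00:
        return "cartridge mapped"
    if low < 0x1000:
        return "system RAM"
    return "cartridge ROM"


for addr in (0x0000, 0x2000, 0x08F7, 0x0C00, 0xCC00, 0x7000):
    print(f"{addr:04X} -> {decode_cpu_address(addr)}")
```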
* Cartridge mapped area:
The area at 0900-0BFF (and again at 2900-2BFF) is interesting, because it is
decoded with the UV201, and can be used to place more SRAM on the system or
even map a small ROM here. It incurs the full wrath of the wait state generator,
however, with 6-10 BRCLKs. But it is easily possible to map SRAM here. I wonder
if the APL cartridge does this? It'd be useless without more RAM otherwise I'd
think.
Pin 29 on the cartridge will go low when this range is being accessed. Pin
40 is then usable as R/W. Little more than 2 NAND gates can be used to hook up
some more SRAM on the bus, in a similar fashion to the existing 2102's.
Buffered Bus:
-------------
Only the UV201 can see this bus completely; the CPU gets a modified view of it.
The buffered bus is only 8K deep, and only uses BA0-BA12. BA13 *is* generated
but is only connected to the cartridge connector. It could be used to make 8K
sized cartridges.
0000 - 07FF (2048) RES2 ROM. The second BIOS ROM maps here.
0800 - 08FF (256) open bus *
0900 - 0BFF (768) open bus *
0C00 - 0FFF (1024) System RAM.
1000 - 1FFF (4096) Cartridge ROM.
* Open bus - nothing is mapped here, and typically the last thing on the bus
(or something else) will be present if the UV201 DMAs from here. On my perf
VB, I have pulldown resistors (this is to help the logic analyzer) so I get
00h mapped in here. On a real unit, the contents will be very random depending
on last access, bus noise, phase of the moon and other variations.
This means it is possible for a cartridge to map another 1K of graphics here,
theoretically.
Only things on this bus can be used by the UV201 during graphics rendering, and
the UV201 cannot pull graphics out of say, the RES1 ROM directly. But it CAN
pull graphics directly out of the cartridge ROM. The game Gladiator was caught
pulling graphics directly out of the cartridge ROM when I had it on the logic
analyzer.
The UV201 CANNOT read anything at 0800-0BFF, even though the UV201 and the
cartridge enable are connected here. The WACK line on the UV202 does not go
high when the DMA is in progress, and this line is used to enable the UV201
for register access and the cartridge device at 900-BFF. Thus, these two things
are not visible during graphics rendering.
---
UV201 Specifics:
----------------
First off, the colour table. When the "color" test screen is used, the
following is the order (related to the 4 video pins above) that they are
displayed:
F - black
8 - light grey
9 - dark yellow (orange)
A - dark magenta
B - dark red
C - dark cyan
D - dark green
E - dark blue
7 - dark grey
0 - white
1 - yellow
2 - magenta
3 - red
4 - cyan
5 - green
6 - light blue
8
F
E
D
C
B
A
9
0
7
6
5
4
3
2
1
I have compared this with the only screen shot that exists, and it appears to
follow this pattern, but dark yellow seems to actually be orange. The circuit
also inverts the 4 video lines before using them, so this is why black is the
all 1's condition- because then all 4 outputs of the inverters go low. Same
with white, it's the all 0's condition, which gets inverted to make all white
video.
After reading the patent, I know why this was done. Two or more(!) UV201's
can be used in parallel if you wish to have more than 16 objects on the screen
at once! The patent describes having another UV201 in a cartridge, and now
a bunch of the connections on the cartridge make sense!
When the primary UV201 is showing black pixels, the video bus is all 1's, and is
open collector, so a secondary UV201 can then pull down these pins and substitute
some new pixel data in place of the all 1's black pixels.
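Because the video pins are open collector, sharing them behaves like a bitwise
AND of whatever each chip drives. The Python sketch below models that together
with the external inverters. It is a model under that wired-AND assumption,
with made-up names, and not a measurement of two real UV201s.

```python
# Wired-AND model of the shared 4-bit video bus.
BLACK = 0xF  # all 1's on the bus: nothing is pulling the pins low


def combine_video(primary, secondary=BLACK):
    """Open collector outputs: any chip driving a 0 wins on that pin."""
    return primary & secondary & 0xF


def after_inverters(bus_value):
    """The board inverts all 4 lines before the video circuitry uses them."""
    return ~bus_value & 0xF


# Primary shows black, secondary substitutes red (3) in its place.
print(hex(combine_video(BLACK, 0x3)))
# Black on the bus becomes all 0's after the inverters, white all 1's.
print(hex(after_inverters(0xF)), hex(after_inverters(0x0)))
```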
These registers do not correspond to any particular displayed object- any of
these objects can be selected via registers 0870-088F!
----
0800-080F : Object pointer LSB
0810-081F : Object pointer MSB and colour
0820-082F : Object X size, intensity and Xcopy
0830-083F : Object Y size
0840-084F : Object X position
These registers are the true 16 objects- these objects reference the above
5 banks of registers, via the "X order" bits (see register descriptions below)
----
0850-085F : Object Y position LSB for list A
0860-086F : Object Y position LSB for list B
0870-087F : Object Y position MSB and X order for list A
0880-088F : Object Y position MSB and X order for list B
Control and status registers
----
08F0 : Y interrupt register
08F2 : Final modifier
08F5 : Background register
08F7 : Command register
08F8 : X-freeze register
08F9 : Y-freeze LSB
08FA : Y freeze MSB and odd/even
08FB : Current Y LSB
0800-080F Object pointer LSB
0810-081F Object pointer MSB and colour
----------------------------------------
These two registers together form a 13 bit pointer and 3 bits for colour.
0810     0800
if (spritex[spritenumber] > xdelta)
{
    cycles = cycles + 2;             // inserting pix count costs 2 cycles
    writefifo(xdelta);               // write it to the fifo
    xdelta = spritex[spritenumber];
    usegap = 1;
}
totalcycles = cycles;
// if sprite width is 8 pixels, and there's no blank pixels in front,
// and no waitstates, then add 2 to totalcycles
if (
    (spritewidth[spritenumber] == 1) &&
    (usegap == 0) &&
    (waitstates == 0)
   ) totalcycles += 2;
delay(cycles);                       // delay the right number of cycles
----
What happens is the cycle count ("cycles") will be 18-26 after the execution
of this code (and the fifo might have a background pixel count in it if xdelta
was smaller than the first visible xcoordinate of the sprite)
The totalcycles variable is used to determine where the UV201 changes from
dividing by 2 to dividing by 4 on the DMAs. I am not sure why it does this,
but it sure does. Maybe it uses the cycles to calculate the next set of
visible sprites on the next scanline?
I am not sure why that fixup is needed, when those three conditions are
true, but it is. It might be something inside that is ready preferentially
and no longer is ready if the DMA is not as short as possible.
After this, fetching is more regular and it occurs via a loop that checks the
rest of the sprites for visibility.
----
for (;spritenumber < 16; spritenumber++)
{
    if (spritex[spritenumber] == xdelta)
    {
        for (i = spritewidth[spritenumber]; i > 0; i--)
        {
            writefifo(spritedata);   // write fetched data
            xdelta += 8;             // update xdelta too
        }
    }
    cycles = (3 * spritewidth[spritenumber]) + waitstates + 15;
    if (cycles & 1) cycles++;        // round up
    totalcycles += cycles;           // add to our total
    if (totalcycles == 46) totalcycles += 2;   // breakover point
    // at breakover point, round cycles up by 4 instead of 2.
    offset = 0;
    if (totalcycles > 48) offset = (totalcycles & 2);
    totalcycles += offset;
    cycles += offset;
    // go to next visible sprite
    while((spritenumber < 16) && (usesprite(spritenumber) == 0)) spritenumber++;
    // if there's a gap, process it
    if (spritex[spritenumber] > xdelta)
    {
        cycles = cycles + 2;         // inserting pix count costs 2 cycles
        totalcycles += offset;
        writefifo(xdelta);           // write it to the fifo
        xdelta = spritex[spritenumber];
        usegap = 1;
    }
    delay(cycles);                   // delay the right number of cycles
}
----
Hopefully I can explain a bit more on what's happening. The amount of time
it takes to perform the DMA is calculated via the following formula:
dmacycles = (3 * spritewidth) + 1
I rolled this into the calculation above by adding 14 to this value because
there's a 14 cycle delay imposed by some internal operations of the UV201.
The UV201 seems to process stuff every 2 BRCLKs, until cycle 48 is reached.
At this point, it starts processing stuff only every 4 BRCLKs. This causes
weirdities in the cycle counts.
If the UV201 must insert blank background pixels between sprites, this costs
2 cycles, which are added on AFTER calculating the delay of a multiple of 4
cycles (if totalcycles > 48).
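The rounding rules are easier to see in isolation. This Python sketch lifts
just the cycle arithmetic out of the loop above (no FIFO, no gap handling); the
function name and the width-in-bytes parameter are assumptions for
illustration.

```python
# Cycle cost of fetching one sprite, following the pseudocode's arithmetic.
def fetch_cycles(width_bytes, waitstates, total_before):
    """Return (cycles, new_total) for one sprite fetch.

    width_bytes  - sprite width in bytes (1 = 8 pixels)
    total_before - running totalcycles before this fetch
    """
    cycles = 3 * width_bytes + waitstates + 15   # DMA time + 14 cycle delay
    if cycles & 1:
        cycles += 1                              # round up to an even count
    total = total_before + cycles
    if total == 46:
        total += 2                               # breakover point
    # Past cycle 48 the UV201 only acts every 4 BRCLKs, so pad to a
    # multiple of 4 (total is always even here, so the pad is 0 or 2).
    offset = (total & 2) if total > 48 else 0
    return cycles + offset, total + offset


print(fetch_cycles(1, 0, 0))
print(fetch_cycles(1, 0, 32))
```

As in the pseudocode, the breakover fixup at 46 bumps only the running total
and not the delay for the current fetch.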
----
I have written a QB64 (an open source QBasic replacement that runs on modern
PCs) implementation that reproduces all of the quirks seen in these tests.
It performs around 1800 tests with the above data. I will make this available
to those who wish to dork around with it.
I have left out lots of finer details above, like the FIFO stalling the fetcher
if it is full. These need to be emulated. The basics of this aren't too hard,
however. The FIFO is deemed full when it has 10 entries, and the fetching
engine stops until the FIFO has 8 or fewer entries in it, at which point it
starts running again, filling the FIFO back up to a maximum of 10 entries.
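
The stall/resume behaviour is a simple hysteresis, which could be sketched in
C roughly like this. The struct, the constant names and fetcher_may_run are
all made up for the example. Only the two levels (10 and 8) come from the
description above.

```c
#include <stdbool.h>

#define FIFO_FULL_LEVEL   10  /* fetcher stalls at this many entries */
#define FIFO_RESUME_LEVEL  8  /* fetcher resumes at this many or fewer */

struct fetcher {
    int  fifo_count;  /* entries currently sitting in the FIFO */
    bool stalled;     /* true while the fetching engine is stopped */
};

/* Re-evaluate the stall state from the current fill level.
   Returns true if the fetcher is allowed to fetch and write right now. */
bool fetcher_may_run(struct fetcher *f)
{
    if (f->fifo_count >= FIFO_FULL_LEVEL)
        f->stalled = true;        /* FIFO is full: stop fetching */
    else if (f->fifo_count <= FIFO_RESUME_LEVEL)
        f->stalled = false;       /* drained far enough: start again */
    /* at 9 entries the previous state is kept (the hysteresis) */
    return !f->stalled;
}
```

Note that with 9 entries the answer depends on history: the fetcher keeps
running if it was on its way up, and stays stopped if it was on its way down.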
If rendering "spills over" from one scanline to the next, what happens is
the FIFO is instantly cleared on the rising edge of HBLANK, and xdelta is
reset to 0. The rendering engine continues from this point, wherever it left
off.
----
Double X width:
---------------
This is purely a rendering phenomenon. The pixel clock to the rendering end
of the FIFO is halved. The result is that the screen is now only effectively
114 pixels wide instead of 228. The number of displayable pixels is less
than this, however.
What this means is that every pixel will be duplicated. The background pixels
are duplicated also. This means that if you position several 8 pixel wide
sprites like so (X coords):
00h, 10h, 18h
They will be rendered at pixels 00h, 20h, and 30h. The first sprite will
occupy Xcoords 0-fh, there will be 10h blank pixels, the second sprite will
occupy pixels 20-2fh, and the third will occupy 30-3fh.
The data fetching end has no clue this is going on and cannot "see" this
happening, other than the FIFO filling up a bit faster since it is not being
emptied as quickly.
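
Since the sprites and the background between them are all doubled, the
on-screen position works out to a plain multiply by 2. A small C sketch,
where pixel_span and doubled_span are names invented for this example:

```c
/* First and last screen pixel covered by a sprite. */
struct pixel_span {
    int first;
    int last;
};

/* Hypothetical helper: where a sprite lands when double X width is on.
   x is the sprite's X coordinate and width its width in pixels,
   both as programmed (undoubled). */
struct pixel_span doubled_span(int x, int width)
{
    struct pixel_span span;
    span.first = 2 * x;                /* everything to the left doubled too */
    span.last  = 2 * (x + width) - 1;  /* each sprite pixel is shown twice */
    return span;
}
```

For the three 8 pixel wide sprites at 00h, 10h and 18h this gives 00h-0fh,
20h-2fh and 30h-3fh, matching the positions listed above.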
----
Double Y height:
----------------
This is weird. I *think* I figured out what is going on here. It appears that
the UV201 uses a 1 bit register to help determine when it has to double the
height of a sprite. The way this seems to work is as follows:
When a sprite is marked in range, an "in range" bit is set, and the state of
the scanline's lowest bit is saved into a second "height" bit.
After the sprite is rendered and the last byte is fetched, the "height" bit is
XOR'd with the lowest bit of the scanline counter, and it is XOR'd with the
double height flag in the command register.
If the result is a 1, the memory pointer will be written back to the pointer
register for the sprite just rendered. If the result is 0, nothing is written.
So, in normal operation the following occurs:
On scanline 100 (for argument's sake), sprite 0 will be shown in single height
mode.
* On scanline 99, we determine that sprite 0 will be shown on scanline 100. The
"sprite in range" bit is set for sprite 0, for use on the NEXT scanline.
The "height" bit is set to 1, which is (99 & 1).
* On scanline 100, sprite 0 will be shown. It is fetched and stuffed into the
FIFO. After the last byte is fetched, we XOR the height bit with the lowest
bit of the scanline counter: i.e. result = (100 & 1) ^ height.
This gives us a "1" bit. We XOR this with the bit in the command register:
result = result ^ (command_height_bit).
* If the result is "1" then we update the sprite's pointer, and reload the
height bit with (scanline & 1).
Repeating the above again on the NEXT scanline...
* On scanline 101, we load sprite 0 data into the FIFO as before. We then
do: result = (scanline & 1) ^ height. This gives us again, a result of 1.
We then XOR this with the command height bit as before, which gives us
a result of 1 again, so we must write the pointer back and update the height
bit again.
This repeats for all scanlines of a displayed object.
The height bit is effectively toggled every scanline in this case.
In the case that the double height bit is set, the following occurs:
* On scanline 99, we determine sprite 0 will be shown on scanline 100. The
"sprite in range" bit is set like before. The height bit is updated too,
which will be 1.
* On scanline 100, sprite 0 is shown as before. After it is fetched, we
calculate this again: result = (scanline & 1) ^ height, which will be 1
now. We then XOR it with the command register's double height bit. We
get a "0" now.
* Since the result was 0, we DO NOT update the pointer in the sprite regs.
At this point, we have shown one line of the sprite. On the next scanline...
* On scanline 101, sprite 0 is shown. After it is fetched we calculate again:
result = (scanline & 1) ^ height ^ command_height_bit
* This time, the result will be 1, so NOW we update height with (scanline & 1)
and write the pointer back to the sprite's pointer registers, thereby
advancing to the next line of sprite data.
Every time the pointer is updated on the sprite registers, the sprite's Y
height is decremented. After decrementing, if the Y height is 0, then the
sprite's in range bit is cleared, ending rendering of the sprite.
I am pretty sure the above is how it works, because when rendering spills past
the end of the scanline, an excessively wide sprite will suddenly become
double height, because the height bit does not get checked until the NEXT
scanline, resulting in double height.
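
The height bit logic described above can be sketched in C as follows. This is
only my guess at the behaviour put into code, not a verified model of the
UV201. The struct and function names are made up, "row" stands in for the
memory pointer, and loading lines_left when the sprite goes in range is a
simplification for the example.

```c
#include <stdbool.h>

struct sprite_state {
    bool in_range;    /* "in range" bit */
    int  height_bit;  /* the 1 bit "height" register */
    int  lines_left;  /* sprite's Y height, counted down */
    int  row;         /* stand-in for the sprite's memory pointer */
};

/* Called on the scanline BEFORE the sprite's first visible line. */
void sprite_mark_in_range(struct sprite_state *s, int scanline, int y_height)
{
    s->in_range   = true;
    s->height_bit = scanline & 1;  /* save lowest bit of the scanline */
    s->lines_left = y_height;
    s->row        = 0;
}

/* Called after the last byte of the sprite is fetched on a scanline.
   double_height is the flag from the command register (0 or 1).
   Returns the sprite row that was shown on this scanline. */
int sprite_end_of_fetch(struct sprite_state *s, int scanline, int double_height)
{
    int shown  = s->row;
    int result = (scanline & 1) ^ s->height_bit ^ (double_height & 1);

    if (result) {
        s->row++;                      /* write the pointer back */
        s->height_bit = scanline & 1;  /* reload the height bit */
        if (--s->lines_left == 0)
            s->in_range = false;       /* sprite is finished */
    }
    return shown;
}
```

With a 2 line sprite marked in range on scanline 99, single height shows rows
0 and 1 on scanlines 100 and 101. With the double height flag set, the same
sprite shows rows 0, 0, 1, 1 on scanlines 100 through 103.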