This document is in a very preliminary state and is subject to change. Everything within has been tested and verified on a TurboGrafx-16 console, but please be aware that my testing methods or interpretations of results could be flawed. I can't guarantee that everything is 100% accurate.

At the moment, some parts of this document are simply a compilation of notes and test results, while others are detailed descriptions of the hardware. I'll try to get everything coordinated as time progresses.

- Block transfer instructions push Y, A, X to the stack in that order, and then pop X, A, Y from the stack in that order when finished.

- The alternating block transfer instructions (TAI and TIA) alternate the source or destination address by adding and then subtracting one, not by inverting bit 0 of the address.

- The length parameter to a block transfer instruction specifies the number of bytes to transfer. For example, $0010 will transfer 16 bytes, and $0000 will transfer 64K bytes, not zero.

- Block transfer instructions cannot be interrupted. If an interrupt is supposed to occur, it occurs once the instruction finishes.

- When using any block transfer instruction to read addresses $0800 through $1400 in the I/O page, the value zero is always returned for every address, regardless of the CPU speed. (So you can't read the joystick port, timer, or IRQ registers.) The I/O buffer is not changed either.

Writing to the same range of addresses using the block transfer instructions will work, and the I/O buffer will be modified.

- Stack and zero page operations always use logical addresses $2000-$21FF. For example, ROM data can be read by using instructions that access the zero page or stack.

- On power-up, MPR 7 is set to zero, and the other MPRs are loaded with random values.

- The TMA instruction transfers the contents of an MPR register to the accumulator. Bits 7 to 0 in the operand specify which MPR register to read from: bit 7 is MPR #7 and bit 0 is MPR #0.

If an operand of $00 is used, the accumulator is loaded with the last value that was written with TAM (only if one or more of its operand bits were set), or the last value that was read with TMA. I think the CPU treats zero as a 'no change' value and the MPR selecting logic isn't updated from the last time it was set.

If multiple bits are set in the operand to TMA, the values from several MPRs are combined together and returned. However, I have not figured out exactly how this works.

- The TAM instruction transfers the contents of the accumulator to an MPR register. Bits 7 to 0 in the operand specify which MPR register to write to: bit 7 is MPR #7 and bit 0 is MPR #0.

If an operand of $00 is used, none of the MPR registers are written to. This does not change the last MPR value that can be read by TMA #$00.

If multiple bits are set in the operand to TAM, each MPR register selected is loaded with the accumulator. For example, an operand of $FF would load all MPR registers.

- When an interrupt occurs (I've tested the timer and IRQ1), P is pushed with the current state of D and T. Within the interrupt subroutine, the CPU clears D and T and sets I, preventing further interrupts from occurring.

- The B flag is set at all times. The only exception is when an interrupt occurs (I've tested the timer and IRQ1): in this case the value of P pushed to the stack has B cleared, but B is set if P is read again within the interrupt subroutine. The BRK instruction pushes P with B set.

- BRK pushes the return address plus one to the stack; the next byte after the BRK instruction is always skipped.

- The CSL and CSH instructions change the CPU's clock speed. CSL selects low speed mode, which is 1.78 MHz, and CSH selects high speed mode, which is 7.16 MHz. On power-up the CPU is in low speed mode.

CSH and CSL take 3 cycles each, but that was tested with the CPU already set to the respective clock speed. It currently isn't known if either instruction takes more or less time when switching between different speeds.

- On power-up, the timer count is set to zero and the IRQ disable mask is set to zero.

- A branch instruction that crosses a 256-byte or 8192-byte boundary does not take any additional cycles.

- An indirect JMP instruction with the low byte of the address set to $FF will correctly read the high byte at the next address, instead of wrapping to address 0 like the 6502 does. (so jmp [$FEFF] reads the MSB from address $FF00, not $FE00)

- Illegal opcodes are treated as a NOP, and take 2 cycles each. They do not change the state of the A, X, Y, S, or P registers, with the exception that the T flag will be cleared if set prior to executing an illegal opcode (check the section on the T flag for more information).
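The TAM/TMA operand handling noted above can be summarized with a small Python model. This is only a sketch of the behavior described in these notes: the class and method names are made up for illustration, power-up values of MPR 0-6 are modeled as None (unknown), and a multi-bit TMA operand raises an error because its result has not been worked out.

```python
class MprFile:
    """Toy model of the eight MPRs and the TAM/TMA operand bits (hypothetical names)."""

    def __init__(self):
        # Power-up: MPR 7 is zero, the others are random on hardware (None here).
        self.mpr = [None] * 7 + [0]
        self.last = None  # value that TMA #$00 would return

    def tam(self, operand, a):
        """Write A to every MPR whose bit is set in the operand."""
        operand &= 0xFF
        if operand == 0:
            return  # nothing is written, and the value seen by TMA #$00 is unchanged
        for n in range(8):
            if operand & (1 << n):
                self.mpr[n] = a & 0xFF
        self.last = a & 0xFF

    def tma(self, operand):
        """Read one MPR, or the last transferred value when the operand is $00."""
        operand &= 0xFF
        if operand == 0:
            return self.last
        if operand & (operand - 1):
            # More than one bit set: the combined result is not understood yet.
            raise NotImplementedError("multi-bit TMA operand is not modeled")
        self.last = self.mpr[operand.bit_length() - 1]
        return self.last
```

For example, tam(0xFF, 0x12) loads all eight registers, and a following tam(0x00, 0x99) changes nothing, so tma(0x00) still returns $12.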

NMI - Not used. It isn't connected to anything internally nor is it available on any connector.

IRQ2 - Available on the HuCard and backplane connectors. It's used by the CD-ROM's ADPCM hardware, and the BRK instruction also uses the IRQ2 vector.

IRQ1 - Connected to the VDC.

Timer - Generated by the HuC6280's internal timer. The patents mention an external input used to test timer interrupts, but I believe this isn't used in the TurboGrafx-16.

Interrupts can be disabled through the CPU: the I flag of the P register disables interrupts (except NMI, BRK) when set. In addition, there are four registers, two of which are usable, that control interrupts:

$1400 : Writes do nothing, reads return the I/O buffer contents.

$1401 : Writes do nothing, reads return the I/O buffer contents.

$1402 : Bits 2-0 are interrupt enable/disable bits, which can be read as well as written.

The enable/disable register does not actually stop the interrupt from occurring; for example, if the VDC asserts the IRQ1 line and bit 1 of $1402 is set, then an interrupt isn't generated. But you can still read the state of the IRQ1 line through $1403, and bit 1 would be set in this case.

All interrupts need to be acknowledged. If not, the interrupt in question occurs after every instruction that is executed (unless the I flag or interrupt disable registers are used).
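The difference between masking an interrupt and hiding its request line can be sketched in Python as follows. Only the IRQ1 bit position (bit 1) is stated in the text above; the positions used here for IRQ2 (bit 0) and the timer (bit 2) are assumptions, and the function names are invented for this example.

```python
# Bit 1 = IRQ1 is taken from the text; bits 0 and 2 are assumed positions.
IRQ2, IRQ1, TIMER = 0x01, 0x02, 0x04

def read_request_status(request_lines):
    """Bits 2-0 as read from $1403: the raw request lines, regardless of $1402."""
    return request_lines & 0x07

def interrupt_taken(request_lines, disable_mask, i_flag):
    """True if the CPU would service an interrupt after the current instruction."""
    if i_flag:
        return False  # the I flag blocks every maskable interrupt
    return (request_lines & ~disable_mask & 0x07) != 0
```

With IRQ1 asserted and bit 1 of $1402 set, interrupt_taken() is False while read_request_status() still reports bit 1.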

Data written to $0C00 is copied to a 7-bit latch. When the timer is enabled, a 7-bit counter is loaded with the contents of the latch. The counter is decremented once every 1024 clock cycles, and the timer interrupt request line is asserted when the counter underflows from zero to $7F. (not when the timer goes from one to zero) However, when this occurs the counter is reloaded, so after reading zero from the timer registers you will read the value that's been reloaded into the counter, not $7F.

The interrupt can then be acknowledged by writing any value to $1403. If this is not done an interrupt will occur after every instruction.

Reading $0C00 or $0C01 returns the current value in the counter, or if the timer is disabled, then the last value the counter had prior to it being disabled. If the timer is disabled and then enabled again, it is reloaded with the last value written to $0C00.

When the timer expires, it is reloaded with the last value written to $0C00. The timer begins to count down immediately; it does not wait for the interrupt to be acknowledged. (So the timer is reloaded and counts down within the timer interrupt routine.)
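A minimal Python sketch of the latch/counter behavior described above. One call to step() stands for one decrement event (1024 timer clocks). The set_enabled() method is a hypothetical stand-in for the enable control, which these notes do not name, and reloading only on a disabled-to-enabled transition is an assumption.

```python
class Timer:
    def __init__(self):
        self.latch = 0        # 7-bit value written to $0C00
        self.counter = 0      # 7-bit down counter
        self.enabled = False
        self.irq = False      # timer interrupt request line

    def write_latch(self, value):
        self.latch = value & 0x7F

    def set_enabled(self, on):
        # Hypothetical enable control: the counter is reloaded when the
        # timer goes from disabled to enabled.
        if on and not self.enabled:
            self.counter = self.latch
        self.enabled = on

    def step(self):
        """One decrement event, i.e. 1024 timer clocks."""
        if not self.enabled:
            return
        if self.counter == 0:
            # Underflow: the request is raised and the counter is reloaded,
            # so software never reads $7F here.
            self.counter = self.latch
            self.irq = True
        else:
            self.counter -= 1

    def acknowledge(self):
        """Any write to $1403 clears the request."""
        self.irq = False

def timer_period_seconds(latch, master_hz=21_477_270):
    """Time between timer interrupts, with the timer clocked at master / 3."""
    return (latch + 1) * 1024 / (master_hz / 3)
```

Because the request is raised on the underflow after zero, a latch value of N gives an interrupt every N+1 decrements.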

The HuC6280 assigns the T flag to bit 5 of the processor status register. It allows all forms of the ADC, AND, ORA and EOR instructions to be processed differently, while the other instructions execute normally.

When the T flag is set, the accumulator is replaced with a zero page memory location indexed by the X register. The operation defined by the instruction is performed using the memory location as one operand, and the effective address as the other. The result is stored in the memory location, leaving the accumulator undisturbed.
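As an illustration, the T flag behavior for the four affected instructions might be modeled like this in Python. The function name and the dict used for memory are inventions of this sketch, decimal mode is not modeled, and the zero page is placed at logical $2000 as described in the CPU notes.

```python
ZERO_PAGE_BASE = 0x2000  # logical address of the zero page

def t_flag_op(op, memory, x, operand, carry_in=0):
    """Apply ADC/AND/ORA/EOR to the zero page byte at X instead of A.

    memory is a dict of address -> byte; returns (result, carry_out).
    Binary mode only.
    """
    addr = ZERO_PAGE_BASE + (x & 0xFF)
    value = memory.get(addr, 0)
    carry_out = carry_in  # logical operations leave carry alone
    if op == "ADC":
        total = value + operand + carry_in
        carry_out = 1 if total > 0xFF else 0
        result = total & 0xFF
    elif op == "AND":
        result = value & operand
    elif op == "ORA":
        result = value | operand
    elif op == "EOR":
        result = value ^ operand
    else:
        raise ValueError("the T flag only affects ADC, AND, ORA and EOR")
    memory[addr] = result  # the accumulator is left undisturbed
    return result, carry_out
```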

It seems that the T flag is cleared each time the CPU fetches an instruction. For example, both BRK and PHP (which save the status of P on the stack) always push it with the T flag cleared when prefixed by SET. T can be set by the SET instruction, or by pushing a byte with bit 5 set and pulling it into P via PLP (in which case the instruction after PLP will be affected as if SET came before it).

If you want to use this feature with ADC for processing BCD numbers, you must execute SED before SET, otherwise the T flag will be cleared.

Like PLP, RTI keeps the T flag set. If an interrupt occurs after a SET instruction which causes P to be pushed with T set, or if the stack is manipulated to have T set, the instruction following the return address used by RTI will be affected as if it was prefixed with SET.

The Mitsubishi 740 series of 65C02-derived CPUs also has a similar feature, with some differences. The T flag remains set until cleared with a CLT instruction or when P is modified. The 740 series also supports SBC, LDA and CMP as valid instructions to use with the T flag.

The TurboGrafx-16 has a 21.47727 MHz master clock which is used to drive several components:

- Divided by three in high speed mode (7.16 MHz) or twelve in low speed mode (1.78 MHz) to provide the CPU clock. This is controlled by the CSL and CSH instructions.

- Divided by three to run the timer. (7.16 MHz)

- Divided by six for the PSG clock. (3.58 MHz) The PSG patents say the clock is 7.16 MHz, but all of the formulas for determining frequency multiply the result by two, effectively making the clock 3.58 MHz.
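The divisions listed above work out as follows (a quick Python check; the dictionary keys are just labels chosen for this example):

```python
MASTER_HZ = 21_477_270  # 21.47727 MHz master clock

def derived_clocks(master_hz=MASTER_HZ):
    """Clock rates in Hz derived from the master clock by the listed divisors."""
    return {
        "cpu_high": master_hz / 3,   # after CSH
        "cpu_low": master_hz / 12,   # after CSL
        "timer": master_hz / 3,
        "psg": master_hz / 6,
    }
```

This gives roughly 7.159 MHz, 1.790 MHz, 7.159 MHz and 3.580 MHz respectively; the 1.78 MHz figure quoted for low speed mode is the same value truncated.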

1. The last value read from or written to $0800-$17FF is saved in an internal 8-bit buffer. Reading $0800-$17FF will return this value, though readable locations will modify certain bits in the buffer. Here are some details:

3. When accessing the VDC or VCE, an additional cycle is taken. This occurs for reads and writes (regardless of addressing mode) and instruction execution. (e.g. jsr $0002) I figured this had something to do with the VDC and VCE being external to the CPU, however the CD-ROM registers are not affected.

The 2-button controller has a four-way directional pad and four buttons: Select, Run, II and I. A multiplexer is used to determine which values (directions or buttons) are returned when D3-D0 are read. The SEL line of the I/O port selects directions when high, and buttons when low. The state of D3-D0 is inverted, so '0' means a switch is closed and '1' means a switch is open.

Games use a small delay after changing the SEL line, before the new data is read (a common sequence is PHA PLA NOP NOP). This ensures the multiplexer has had enough time to change its state and return the right data.

When the CLR line is low, the joypad can be read normally. When CLR is high, input from the joypad is disabled and D3-D0 always return '0'.

Turbo Tap details:

The Turbo Tap is a 5-player adapter that plugs into the joypad port. It allows five controllers to be read in a serial fashion, one after the other. This is handled by an internal counter that is incremented each time there is a zero-to-one transition of the SEL line while CLR is zero.

The counter can be reset by holding SEL high and doing a zero-to-one transition on CLR. At this point, you can then strobe SEL five times to read each controller. Once all five controllers have been read, the Turbo Tap will return $00 in D3-D0 until the counter is reset again. Unconnected controllers always return $0F in D3-D0.

There is also a quirk in how the data is returned during the reset sequence. When $01 is written to $1000, D3-D0 are returned as zero, even though the CLR line isn't high. When $03 is written to $1000, D3-D0 now return the direction pad data for controller #1, although now the CLR line is high and should disable joypad input.

After resetting the Turbo Tap, reading continues normally. You can set SEL high to read the directions, low to read the buttons, and the next high transition will increment the counter and return data from controller #2, and so on up to controller #5.
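The Turbo Tap counter can be sketched in Python as below. The bit assignment (SEL in bit 0, CLR in bit 1 of the value written to $1000) is inferred from the $01/$03 reset sequence described above, and the class is purely illustrative. The data quirk during the reset sequence is not modeled, only which controller is selected.

```python
class TurboTap:
    PORTS = 5

    def __init__(self):
        self.sel = 0
        self.clr = 0
        self.index = 0  # 0 selects controller #1

    def write(self, value):
        """Model a write to $1000 (assumed: bit 0 = SEL, bit 1 = CLR)."""
        sel = value & 1
        clr = (value >> 1) & 1
        if self.sel and sel and not self.clr and clr:
            # SEL held high while CLR goes from zero to one: reset.
            self.index = 0
        elif not self.sel and sel and not clr:
            # SEL goes from zero to one while CLR is zero: next controller.
            self.index = min(self.index + 1, self.PORTS)
        self.sel, self.clr = sel, clr

    def selected(self):
        """Controller number 1-5, or None once all five have been read."""
        return self.index + 1 if self.index < self.PORTS else None
```

A typical read loop after the reset writes $01 (directions), then $00 (buttons), and repeats; the second and later writes of $01 advance to the next controller.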

The Video Display Controller (VDC) manages a background layer, sprites, and display generation. It has 20 internal 16-bit registers, and can address up to 128K of video RAM (VRAM).

The TurboGrafx-16 only has 64K VRAM available, so the latter half of the 128K area mirrors the former half. Sometimes reading mirrored VRAM returns corrupted data, and writes to the mirrored half of VRAM are always ignored (this includes VWR access or through VRAM to VRAM DMA).

The VDC is mapped to addresses $0000-0003 in the hardware page, and these locations are repeatedly mirrored throughout the $0000-03FF area.

The lower five bits of $0000 select which register will be accessed at $0002 and $0003. Only registers $00-$02 and $05-$13 are valid; selecting registers $03-$04 or $14-$1F and trying to access them has no effect.

Likewise, reading $0002 or $0003 when any other register but VRR is selected will return the contents of the read buffer; but reading $0003 will not update MARR.

You can update registers in part by writing to the LSB or MSB; the new data written will immediately have an effect. Some registers have special properties when the MSB is written to. See the register reference section for more details.

Reading $0000 returns a set of status flags. The letters in parentheses are the names of the flags from the Develo Book (I think):

Bit 6 is set when the VDC is waiting to read or write data requested by the CPU when it accesses VRR/VWR. For more information, see register $09 in the register reference section.

Bits 5-0 are set when a condition occurs which would trigger an interrupt, as dictated by the corresponding interrupt enable bits in register $05 and $0F. If the interrupt enable bits are not set, the matching status bits will not be set, even if the condition occurs. Bits 5-0 are cleared after the status port is read.

Bit 5 is set on the first line after the active display area ends, which signifies the vertical blanking period has started. This will occur even if the line after the active display area is off-screen, such as within the first 14 lines of the frame (top blanking) or the bottommost 7 (bottom blanking and vertical sync).

If the active display area uses more than 261 lines (assuming VSW and VDS are zero) the interrupt will always occur on line 261, which is the next to last line in the frame. So even if VDW was set to something unusual like $1FF, the interrupt would occur on line 261 within the frame.

Bit 4 is set when a VRAM to VRAM DMA transfer has finished.

Bit 3 is set when a VRAM to SAT transfer has finished, which seems to always happen four lines after the last line in the active display period. It's not known when the transfer actually starts. The exact line affected depends on the setting of VSW, VDS and VDW.

Bit 2 is set when the current scanline matches the value in register $06. See the register reference section for more details.

Bit 1 is set when there are more than 16 sprites on the current scanline. See the sprite section for more details.

Bit 0 is set when an opaque pixel in sprite #0 overlaps another opaque pixel from any other sprite. See the sprite section for more details.

All interrupts caused by the VDC must be acknowledged by reading the status flags once within the IRQ handler. If this is not done, the IRQ line is not lowered and the interrupt occurs as soon as the handler RTIs.

$00 - Memory Address Write Register (MAWR)

Bits 15-0 select a word offset in VRAM that will be used for VRAM writes.

$01 - Memory Address Read Register (MARR)

Bits 15-0 select a word offset in VRAM that will be used for VRAM reads. After you have written the MSB of this value, a word from VRAM is read and stored in the read buffer. On power-up, the contents of the read buffer are indeterminate. (usually $FFFF)

$02 - VRAM Read Register / VRAM Write Register (VRR/VWR)

When you write to the LSB of this register, the CPU data is stored in a temporary location called the write latch. When the MSB is written, the entire 16-bit value composed of the write latch and MSB is written into the VRAM address specified by MAWR. MAWR is then incremented by the increment factor selected by register $05.

Writing to the LSB multiple times only updates the lower half of the write latch and does not change MAWR or VRAM data. Writing to the MSB multiple times will write the previously latched LSB data along with the new MSB.

Reading the LSB will return the lower byte of the read buffer, and reading the MSB will return the upper byte of the read buffer. MARR is then incremented by the increment factor selected by register $05, and a word of VRAM is read from the new address into the read buffer.
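The write latch and read buffer behavior can be modeled with a short Python sketch. The class is illustrative only: the increment is passed in directly rather than decoded from register $05, reads of the mirrored upper half simply mirror (the occasional corruption is not modeled), and it assumes that only the MSB read advances MARR.

```python
class VramPort:
    def __init__(self, increment=1):
        self.vram = [0] * 0x8000    # 64K bytes = 32K words
        self.mawr = 0
        self.marr = 0
        self.write_latch = 0
        self.read_buffer = 0xFFFF   # indeterminate on power-up, usually $FFFF
        self.increment = increment

    def _fetch(self, address):
        return self.vram[address & 0x7FFF]  # upper half mirrors the lower

    def set_marr(self, address):
        """Writing the MSB of MARR also refills the read buffer."""
        self.marr = address & 0xFFFF
        self.read_buffer = self._fetch(self.marr)

    def write_lsb(self, value):
        self.write_latch = value & 0xFF     # no VRAM access, MAWR unchanged

    def write_msb(self, value):
        word = ((value & 0xFF) << 8) | self.write_latch
        if self.mawr < 0x8000:              # writes to the mirror are ignored
            self.vram[self.mawr] = word
        self.mawr = (self.mawr + self.increment) & 0xFFFF

    def read_lsb(self):
        return self.read_buffer & 0xFF

    def read_msb(self):
        value = self.read_buffer >> 8
        self.marr = (self.marr + self.increment) & 0xFFFF
        self.read_buffer = self._fetch(self.marr)
        return value
```

Writing the MSB twice in a row stores two words that share the same latched LSB, which matches the description above.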

Bits 7 and 6 enable and disable the background and sprite layers, and can be changed at any time. Games use these to clip sprites within a region of the display, or to give a letterboxed effect to the background layer.

Within a single frame, the overscan area outside of the active display period is filled with color #0 from sprite palette #0, and the active display area is filled with color #0 from background palette #0. So even if only the sprites were enabled, they would still be drawn over color #0 from background palette #0.

If bits 7 and 6 are cleared by the time the next frame starts (1), they are locked for the duration of the frame. Changes to them will not be taken into effect until the next frame. During this time, every line in the active display area is filled with color #0 of sprite palette #0.

The VDC patent refers to this as "BURST" mode, where the VDC does not read VRAM for background and sprite rendering. The CPU has unlimited access to VRAM, and in addition VRAM to VRAM DMA can be done during this time. Simply clearing both bits during the active display area does not cause BURST mode to become enabled. It only takes effect once the active display period ends (which is incidentally when VRAM to SAT DMA occurs), and remains effective in the next frame only if both bits remain reset before the next frame starts.

1. I'm not sure if they are locked when the next frame starts, or when the next active display period starts. I'm assuming the former for now.

Bits 3-0 will, when set, allow status flags to be set when a certain condition occurs. In addition the VDC will generate an IRQ1 interrupt, though interrupts can always be disabled through the CPU's IRQ control registers or the P register's I flag.

$06 - Raster Compare Register (RCR)

The value stored in this register is compared to the current scanline. If there is a match and the raster compare interrupt enable bit in register $05 is set, then bit 2 of the status flags is set and an interrupt occurs.

The range of the RCR is 263 lines, relative to the start of the active display period. (defined by VSW, VDS, and VCR) The VDC treats the first scanline of the active display period as $0040, so the valid ranges for the RCR register are $0040 to $0146.

For example, assume VSW=$02, VDS=$17. This positions the first line of the active display period at line 25 of the frame. An RCR value of $0040 (zero) causes an interrupt at line 25, and a value of $0146 (262) causes an interrupt at line 24 of the next frame.

Any other RCR values that are out of range ($00-$3F, $147-$3FF) will never result in a successful line compare.
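The RCR mapping can be checked with a few lines of Python. The function below is a sketch: first_active_line is the frame line where the active display period starts (25 in the example above), and the function name is made up.

```python
LINES_PER_FRAME = 263

def rcr_to_frame_line(rcr, first_active_line):
    """Frame line on which a raster compare matches, or None if out of range."""
    if not 0x40 <= rcr <= 0x146:
        return None  # $00-$3F and $147-$3FF never match
    return (first_active_line + (rcr - 0x40)) % LINES_PER_FRAME
```

With first_active_line = 25, an RCR value of $0040 maps to line 25 and $0146 maps to line 24, as in the example.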

The VDC was designed to work with slower video memory. The TurboGrafx-16 happens to use the fastest kind available, but you can still set up the VDC to handle VRAM as if it was slower.

During the active display period of a scanline, the VDC can do one 16-bit access to VRAM on each cycle of the dot clock. Bits 1-0 of MWR tell the VDC how to divide this amongst several sources:

1. CPU (reading or writing a word via register $02)
2. Background character pattern generator data (one read is for bitplanes 0 and 1, another is for bitplanes 2 and 3, either one or two are needed per character)
3. BAT data (character name and palette, one fetch needed per character)

CPU - A read or write to register $02
BAT - The palette block and character name from the BAT
??? - Unknown, possibly an unused 'dummy' access
CG0 - Bitplanes 0, 1 from the character generator
CG1 - Bitplanes 2, 3 from the character generator

The default mode all games use is 0. As far as I can tell, modes 1 and 2 are identical, and mode 3 enables the CG mode bit as described later.

The background generated by the VDC is a tiled layer composed of 8x8 characters. The background can be scrolled horizontally and vertically, and its size is definable in units of 32 characters in either direction, from 32x32 up to 128x64.

The pattern data used by the characters is stored in a planar format. Because the VDC always accesses VRAM in word units, the organization of bitplanes reflect this. It takes 32 bytes of VRAM to define one tile; the first eight words are bitplanes 0 and 1 for lines 0-7, and the next eight words are bitplanes 2 and 3 for lines 0-7.

The background itself is defined by the block attribute table (BAT), which starts at address zero in VRAM. Each word-wide entry in the BAT defines a single character, and has the following layout:

MSB          LSB
ppppnnnnnnnnnnnn

p : Color palette (0-15)
n : Character name (0-4095)

Notice that there are no provisions for tile flipping or priority control.
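A BAT entry with the layout above can be unpacked as in this Python sketch. The function names are invented, and pattern_word_address() assumes that character names index 16-word (32-byte) blocks starting at VRAM address zero.

```python
def decode_bat_entry(word):
    """Split a 16-bit BAT entry into (palette, character name)."""
    palette = (word >> 12) & 0x0F
    name = word & 0x0FFF
    return palette, name

def pattern_word_address(name):
    """VRAM word address of a character's pattern data (16 words each)."""
    return name * 16
```

For example, the entry $3123 selects palette 3 and character $123, whose pattern would start at word address $1230.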

Because the TurboGrafx-16 only has 64K of VRAM, only patterns 0-2047 should be used. Patterns 2048-4095 are filled with 'garbage' data.

The color palette selects one of sixteen 16-color palettes for the character to use. The background layer always uses the first 256 colors in the 512-color VCE palette.

The BAT doesn't necessarily have to match the size of the display. If the BAT is too small (e.g. it's 32x32 and the display is 40x28), then the offset into the BAT wraps around and the graphics are repeated. In the same vein, you don't have to use up the entire BAT space if the display won't show all of it (e.g. it's 64x32 and the display is 32x16, you wouldn't need to define entries for rows 17-31).

For more information about scrolling, see registers $07 and $08 in the register reference section.

Sprites are positioned in a virtual 1024x1024 space. The active display area starts at offset (32, 64), allowing sprites to be partially shown at the left and top edges, as well as giving sprites a place to be hidden when their coordinates are set to (0, 0).

The pattern index selects one of 1024 patterns; however, the TurboGrafx-16 only has 64K of VRAM, so only the first 512 should be used. Patterns 512-1023 are filled with 'garbage' data.

Each sprite pattern takes 128 bytes, and is arranged in four groups of 16 words. Each word corresponds to one 16-pixel line, and each group corresponds to one bitplane. For example, words 0-15 define bitplane 0, words 16-31 define bitplane 1, etc.
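Reading one pixel out of a 16x16 sprite pattern might look like this in Python. The plane and line ordering follows the description above; treating bit 15 of each word as the leftmost pixel is an assumption of this sketch, as is the function name.

```python
def sprite_pixel(pattern_words, x, y):
    """Return the 4-bit pixel at (x, y) of a 16x16 pattern (64 words)."""
    shift = 15 - x  # assumption: bit 15 is the leftmost pixel
    pixel = 0
    for plane in range(4):
        word = pattern_words[plane * 16 + y]  # 16 words per bitplane
        pixel |= ((word >> shift) & 1) << plane
    return pixel
```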

The CG mode bit is only valid when the sprite dot width field of the MWR register is set to 2 or 3. When clear, bitplanes 0 and 1 are read, 2 and 3 are treated as zero. When set, bitplanes 2 and 3 are read, 0 and 1 are treated as zero.

The vertical and horizontal flip flags flip an entire sprite (not just one 16x16 pattern).

The CGX and CGY fields define the size of a sprite. Sprites larger than 16x16 use neighboring patterns to make up the rest of the sprite. Depending on the size, the lower 3 bits of the pattern index are masked out. If CGX is set, bit 0 of the pattern index is forced to zero. If CGY is 1, bit 1 of the pattern index is forced to zero. If CGY is 2 or 3, bits 2 and 1 of the pattern index are forced to zero. For example, a 16x16 sprite can use any patterns, a 32x16 sprite can use every second pattern, and a 32x64 sprite can only use every eighth pattern.
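The masking rules can be written out as a small Python function (a sketch; the name is made up, and the 10-bit width follows from the 1024 selectable patterns):

```python
def effective_pattern_index(index, cgx, cgy):
    """Apply the CGX/CGY masking to a 10-bit sprite pattern index."""
    mask = 0
    if cgx:
        mask |= 0b001          # 32 pixels wide: bit 0 forced to zero
    if cgy == 1:
        mask |= 0b010          # bit 1 forced to zero
    elif cgy in (2, 3):
        mask |= 0b110          # bits 2 and 1 forced to zero
    return index & ~mask & 0x3FF
```

With CGX set and CGY at 2 or 3, all three low bits are cleared, which is why such sprites can only start on every eighth pattern.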

The priority and palette fields are discussed later on.

Sprite attribute table parsing:

During the horizontal blanking period of each scanline, the VDC parses the SAT to collect information about what sprites will be displayed on the next line. It progresses through the SAT one sprite at a time, working from sprite #0 to #63. If a sprite is found that has the right Y coordinate and height to make it fall on the next line, the sprite is added to an internal 16-entry buffer. The VDC continues to parse the SAT until one of the following conditions occurs:

- All 64 sprite entries have been examined.
- All 16 buffer entries have been used.
- The horizontal blanking period ends. (1)

Sprites that are 32 pixels wide count as two sprites; in the event that such a sprite is found but there is only one buffer position left, then the left half of the sprite is added to the buffer, and the right half is not displayed.

If all 16 buffer entries are used but there are more sprites that fall on the line, an overflow condition occurs. If the interrupt enable bit of CR is set, the overflow bit in the status register is set and the VDC will generate an interrupt. Overflows can occur anywhere within a scanline in the active display period, even if the sprites are off-screen.
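The buffer filling described above can be sketched in Python. The sprite dictionaries and their keys are hypothetical, the early exit at the end of horizontal blanking is not modeled, and this sketch assumes that a 32-pixel sprite losing only its right half does not by itself count as an overflow.

```python
def parse_sat(sat, next_line):
    """Collect up to 16 buffer slots for the sprites that fall on next_line.

    sat is a list of dicts with hypothetical keys "y", "height" and "width".
    Returns (buffer, overflow); buffer holds (sprite number, half) tuples.
    """
    buffer = []
    overflow = False
    for number, sprite in enumerate(sat[:64]):
        if not sprite["y"] <= next_line < sprite["y"] + sprite["height"]:
            continue
        if len(buffer) >= 16:
            overflow = True  # buffer already full and another sprite is on the line
            break
        halves = 2 if sprite["width"] == 32 else 1
        for half in range(halves):
            if len(buffer) < 16:
                buffer.append((number, half))  # a right half may be dropped here
    return buffer, overflow
```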

During the next scanline, the VDC compares a counter (incremented by the dot clock) to the X position of each buffered sprite. When the X position is within range, the sprite bitplane data is shifted out serially, forming a single four bit pixel. Only the first opaque pixel is shown, pixel data from subsequent sprites is ignored. This is what defines the priority when multiple sprites overlap each other; for example if sprites 0 through 3 were transparent, but 4 and 5 were not, only pixels from sprite 4 would be shown.

At this point, collisions between sprite #0 and any other sprite are checked. If the bitplane data from sprite #0 is an opaque pixel, and any other of the 16 buffered sprites also output an opaque pixel, a collision occurs. If the interrupt enable bit of CR is set, the collision bit in the status register is set and the VDC will generate an interrupt. Collisions do not occur outside of the active display period.

The sprite pixel is then compared to the current background layer pixel (or backdrop color) at the same location. If the sprite's priority flag is set, then the sprite pixel overwrites the background pixel. If the priority flag is clear, then the sprite pixel is only shown if the background pixel is transparent.

Note that the background priority flag has no effect on inter-sprite priority. For example, if sprite #2 has its priority flag cleared, it would appear under a section of the background. If sprite #3 partially overlapped sprite #2 and had its priority flag set, its pixels which shared the same location as opaque pixels in sprite #2 would not be shown (since sprite #2 comes first in the 16-entry buffer).

This technique is used in many games (Y's, Neutopia, Dungeon Explorer) to force sections of the background to appear in front of sprites that have their priority flag set but are of a lower sprite priority.
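The two-stage priority decision (first opaque sprite pixel, then the comparison against the background) can be expressed as a short Python sketch. The function name and the tuple format are made up for this example.

```python
def compose_pixel(sprite_pixels, background_pixel):
    """Pick what is shown at one screen position.

    sprite_pixels is a list of (pixel, priority_flag) in buffer order;
    a pixel value of zero is transparent.
    """
    for pixel, priority in sprite_pixels:
        if pixel != 0:
            # Only the first opaque sprite pixel is ever considered.
            if priority or background_pixel == 0:
                return ("sprite", pixel)
            break
    if background_pixel != 0:
        return ("background", background_pixel)
    return ("backdrop", 0)
```

In the sprite #2 / sprite #3 example above, the low-priority pixel from sprite #2 is chosen first and then loses to the background, so the high-priority pixel from sprite #3 never appears.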

Notes:

1. This happens when the width of the display is modified. If the display is made smaller than 32 characters, two sprites at a time starting from sprite 64 are dropped. This is why the Image 15-in-1 Collection does not use sprites: it uses very small resolutions which reduce how many sprites are available. (Either the programmer didn't realize this was happening, or simply chose to not use sprites.) In the same vein, making the display too wide cuts out multiple ranges of sprites at a time, though the exact relation of which sprites are dropped based on the display size is unclear.

The VDC has two kinds of DMA: VRAM to VRAM copy, and VRAM to SAT transfer.

- The contents of MAWR, MARR, the read buffer, and the write latch, are not changed by doing VRAM to VRAM DMA.

- The IW bits in CR do not change the value added to the source or destination address during VRAM to VRAM DMA, this value is always one.

- LENR specifies how many words to transfer: 0 = 1 word, $FFFF = 64K words. Writing to the MSB triggers the transfer.

- During VRAM to VRAM DMA, SOUR, LENR, and DESR are modified (they act as counters which are incremented and/or decremented in the course of a transfer). At the end of the transfer, the registers retain their new states instead of the original values written. LENR is set to $FFFF, not zero, when a transfer completes.

- VRAM to VRAM DMA can only occur outside of the active display period. It seems to me that if it is still running when the active display period starts, DMA is halted (not aborted), and resumes when the active display period ends.

- Both VRAM to VRAM DMA and VRAM to SAT DMA can run at the same time. I don't know which one has priority if they both access the same range of addresses, however.

- The VRAM to SAT DMA transfer end interrupt occurs four scanlines after the end of the active display period. (e.g. if the last line of the active display is $DF, it happens at $E3)
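A rough Python model of how the VRAM to VRAM DMA registers behave as counters. It assumes LENR holds the word count minus one, uses incrementing source and destination addresses only (the decrementing case is not modeled), ignores timing, and drops writes to the mirrored upper half of VRAM as described earlier. The function name is invented.

```python
def vram_dma(vram, sour, desr, lenr):
    """Copy LENR + 1 words and return the final (SOUR, DESR, LENR) values.

    vram is a list of 0x8000 words (64K bytes).
    """
    while True:
        if desr < 0x8000:                    # writes to the mirror are ignored
            vram[desr] = vram[sour & 0x7FFF]
        sour = (sour + 1) & 0xFFFF           # the step is always one word
        desr = (desr + 1) & 0xFFFF
        lenr = (lenr - 1) & 0xFFFF
        if lenr == 0xFFFF:                   # counter has wrapped: done
            return sour, desr, lenr
```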

The VCE manages a palette used for the background and sprite layers. The palette is composed of 512 9-bit entries, each entry being divided into three groups of three bits for each of the red, green, and blue color components, giving a total range of 512 possible colors.

The first 256 colors are used for the background layer, and the remaining 256 are used for sprites. Within these two groups, the palette can be divided further into 16 groups of 16-color palettes; each palette is selected by a 4-bit field in the BAT or SAT.

A pixel in a background character or sprite pattern with a value of zero is treated as transparent. For sprites, this means the underlying background data or backdrop color is shown. For background characters, this means the underlying backdrop color is shown.

The backdrop color is displayed in the active display area if the sprites, background, or both are enabled. This color is picked from color #0 from background palette #0.

The overscan color is displayed outside of the active display area, and only inside the active display area when both the background and sprites are turned off. This color is picked from color #0 in sprite palette #0. See the register reference section for more details.

Color #0 of the remaining 15 palettes in the background and sprite sections cannot be displayed.

The VCE is mapped to $0400-0407, and these locations are repeatedly mirrored throughout the $0400-07FF range in the hardware page.

Bits 1-0 select the dot clock. This determines how many pixels are displayed on each horizontal line, but does not affect how many lines are shown per frame. If bit 0 is set while the 10 MHz dot clock is used, the color artifacting around the edges of characters is more prominent, while having the bit cleared minimizes artifacting.

Bit 2 seems to blur the edges of the sprites and background characters. This reduces artifacting between pixels, especially in the higher resolutions.

In my opinion, it almost seems that when bit 2 is cleared, every other line is offset horizontally by half a pixel. When bit 2 is set, this is applied to either odd or even lines on odd or even frames.

This is especially noticeable with vertically scrolling graphics; if a sprite is moved vertically at the rate of one line per frame, the 'interlacing' effect described above is lost, and the edges appear jagged. When the sprite is stationary, the edges look smooth.

For what it's worth, the PC-FX patents describe this exact same feature; though it's only usable for an interlaced display and is controlled through a different register of the VCE (which is loosely based on the original one used in the TurboGrafx-16).

These two registers form a 16-bit value, of which the lower 9 bits are used as an index into the color table for subsequent reads and writes by the data register. The remaining upper 7 bits are ignored.

You can update either the LSB or MSB independently and still perform color data reads and writes; the address does not have to be specified in full beforehand.

$0404 - Color table data (LSB)
$0405 - Color table data (MSB)

These two registers form a 16-bit value, of which the lower 9 bits contain color data:

MSB          LSB
-------GGGRRRBBB

G = Green component
R = Red component
B = Blue component
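Packing and unpacking a color entry with this layout could be done as follows (a Python sketch with made-up function names; each component is a 3-bit value from 0 to 7):

```python
def pack_color(r, g, b):
    """Build a 9-bit color table entry laid out as GGGRRRBBB."""
    return ((g & 7) << 6) | ((r & 7) << 3) | (b & 7)

def unpack_color(word):
    """Return (r, g, b) from a color table entry; the upper 7 bits are ignored."""
    return (word >> 3) & 7, (word >> 6) & 7, word & 7
```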

Reading $0404 returns the lower byte of the color data. Reading $0405 returns the upper byte with bits 7-1 set to '1'. When the upper byte is read, the color table address is advanced by one and will wrap when the address is at $01FF.

Writing to $0404 sends a byte of data to the LSB of the current entry in the color table. Writing to $0405 sends a byte of data to the MSB of the current entry in the color table and, in addition, advances the address by one, wrapping from $01FF back to $0000.

Writing to $0405 multiple times will only update the MSB of each color table entry; the LSB will remain undisturbed. You can also freely change either half of the address (through $0402/$0403) between writes to the color table data registers.
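The register behavior described above can be sketched as a toy model. This is only an illustration under stated assumptions: the class and method names are hypothetical, reads of $0402/$0403 are not modelled because their behavior is not described here, and the upper 7 address bits are simply dropped when the address advances (they are ignored anyway).

```python
class ColorTablePort:
    """Toy model of VCE registers $0402-$0407 as described in the text."""

    def __init__(self):
        self.address = 0        # 16-bit latch; only bits 8-0 are used
        self.table = [0] * 512  # 512 entries of 9-bit color data

    def _index(self):
        return self.address & 0x01FF

    def _advance(self):
        # Step the 9-bit index, wrapping from $01FF back to $0000.
        self.address = (self._index() + 1) & 0x01FF

    def write(self, reg, value):
        value &= 0xFF
        if reg == 0x0402:    # address LSB, can be written on its own
            self.address = (self.address & 0xFF00) | value
        elif reg == 0x0403:  # address MSB, can be written on its own
            self.address = (self.address & 0x00FF) | (value << 8)
        elif reg == 0x0404:  # data LSB, no address change
            i = self._index()
            self.table[i] = (self.table[i] & 0x100) | value
        elif reg == 0x0405:  # data MSB (bit 0 only), then advance
            i = self._index()
            self.table[i] = (self.table[i] & 0x0FF) | ((value & 1) << 8)
            self._advance()
        # Writes to $0406/$0407 do nothing.

    def read(self, reg):
        if reg == 0x0404:
            return self.table[self._index()] & 0xFF
        if reg == 0x0405:
            value = 0xFE | (self.table[self._index()] >> 8)  # bits 7-1 read as 1
            self._advance()
            return value
        if reg in (0x0406, 0x0407):
            return 0xFF
        raise ValueError("register not modelled in this sketch")
```

Writing $FF to $0402 and $01 to $0403 selects entry $01FF; a following write to $0404 and then $0405 fills that entry and leaves the address at $0000.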

$0406 - Not used (Reads return $FF, writes do nothing)

$0407 - Not used (Reads return $FF, writes do nothing)

Palette flicker

The VCE color table can only be accessed by either the CPU or the VCE at any given time, with CPU accesses taking priority. When the CPU reads or writes VCE addresses $0404 or $0405 during the active display period, the pixel currently being displayed can't look its color up in the color table, because the CPU is using the table at that moment.

Unlike other video hardware (e.g. Sega systems), where the pixel's color is replaced by the data read or written by the CPU, the VCE repeats the color of the last pixel it displayed. While this still distorts the graphics, it is mostly masked when the image being displayed at that point is a horizontal strip of the same color.

Note that this also occurs at the edges of the display. When the monitor scans the border (overscan) area to the left and right of the active display, reading or writing the color table will cause the right border color to overlap into the active display, and likewise the active display color can overlap into the left border.

One game that allows you to see this effect is Coryoon, which fades the display in and out without waiting for the vertical blank period. You can easily see on the screen where a read or write has occurred, as single pixels are stretched out into short horizontal lines due to the VCE displaying the same pixel color while the CPU is busy accessing the color table.
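The stretching effect can be sketched for a single scanline as follows. This is a simplified illustration, not a timing-accurate model: the function name is hypothetical, the set of dots during which the CPU holds the table is supplied by the caller (how many dots one access actually blocks is not covered here), and the color held before the first dot of the line is an assumed parameter.

```python
def render_line(pixel_indices, color_table, cpu_busy_dots, held_color=0):
    """Return the colors output for one line.

    pixel_indices -- color table index wanted at each dot
    cpu_busy_dots -- set of dot positions where the CPU is using the table
    held_color    -- assumed color carried in from before this line
    """
    output = []
    for dot, index in enumerate(pixel_indices):
        if dot not in cpu_busy_dots:
            held_color = color_table[index]  # normal lookup
        # While the CPU has the table, the previous color is repeated.
        output.append(held_color)
    return output
```

With a table of [10, 11, 12, 13], pixels [0, 1, 2, 3] and the CPU busy on dots 1 and 2, the output is [10, 10, 10, 13]: the first pixel is stretched into a short horizontal line.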

All aspects of the display are controlled by the VDC (which has several registers that define where graphics are shown within the display), and the VCE (which generates a dot clock, in turn defining the number of pixels displayable). I will start with discussing the vertical control fields in VDC registers $0C, $0D, and $0E.

The TurboGrafx-16 generates an NTSC display that is composed of 60 frames shown per second, with each frame divided into 263 scanlines. These scanlines are grouped as follows:

14 lines for the top blanking area (shown as light black).
242 lines for the active display area (graphics and/or overscan color).
4 lines for the bottom blanking area (shown as light black).
3 lines for the sync area (shown as pure black).

This layout is fixed, and cannot be changed by the vertical control registers. They only define where the graphics data is displayed within a single frame. If the active display area is positioned in a way that it occupies the lines that are used for blanking or sync, those lines will not be shown. The start of a frame is the first line after the vertical retrace period, which is not necessarily the first line you can see on a monitor.
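The fixed grouping can be written down as a small lookup. This is a sketch with hypothetical names; it assumes line 0 is the first line of the top blanking area and that the four areas follow in the order listed above.

```python
REGIONS = [
    ("top blanking", 14),
    ("active display", 242),
    ("bottom blanking", 4),
    ("sync", 3),
]
LINES_PER_FRAME = sum(count for _, count in REGIONS)  # 263


def region_of_line(line):
    """Return the name of the fixed area that frame line `line` belongs to."""
    if not 0 <= line < LINES_PER_FRAME:
        raise ValueError("line must be in 0..262")
    for name, count in REGIONS:
        if line < count:
            return name
        line -= count
```

Under these assumptions, lines 14 through 255 are the only ones on which graphics or the overscan color can appear.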

For the sake of discussion, assume the VDC has two counters that are reset at the start of a frame and incremented on each scanline. One is used to track the position within each frame, and the VDC checks this counter when generating the active display, top and bottom blanking areas, and sync area. I'll call this the frame counter. The other counter is used for tracking the graphics area within a frame, and can be reset multiple times. I'll call this the display counter.

The display counter is compared to an offset made by the sum of VDS and VSW. When they match, graphics data are displayed, or else the overscan color is shown. Because there are 14 lines at the beginning of a frame which make up the top border, the offset created by VDS and VSW must be at least 14 lines. In addition, most monitors cut off the edges of the display, so the offset may need to be larger.

For example, the standard 256x224 resolution used by most games has VDS=$17 and VSW=$02, giving an offset of 25 lines. This clears the top blanking area and gives 11 lines of overscan color before the 224 lines of graphics data are shown.
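The arithmetic in this example can be checked with a short sketch. The function name is hypothetical; the 14-line top blanking area is the figure given earlier.

```python
TOP_BLANKING_LINES = 14


def overscan_lines_above_graphics(vds, vsw):
    """Lines of overscan color visible between the top blanking and the graphics."""
    offset = vds + vsw
    if offset < TOP_BLANKING_LINES:
        raise ValueError("VDS + VSW must be at least 14 lines")
    return offset - TOP_BLANKING_LINES
```

For VDS=$17 and VSW=$02 the offset is 23 + 2 = 25 lines, and 25 - 14 leaves the 11 overscan lines mentioned above.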

After the display counter matches the offset created with VDS and VSW, graphics are displayed until the counter now matches the previous offset plus VDW. At this point, graphics are turned off and the overscan color is shown for the remainder of the 242 lines that make up the active display area. If VDW and/or the VSW+VDS offset are large enough that the graphics are shown past line 242, then no overscan color is displayed and these graphics are hidden by the bottom border and sync areas.

Assuming this isn't the case, and there are some lines left in the remainder of the active display area (out of the available 242), the VDC will show 3 blanked lines filled with the overscan color. It will continue to do this for as many lines as are specified in the VCR register.

After this point, the display counter resets. It will begin to show the overscan and active display area again, following the same rules as above, except that everything is positioned relative to the last line of the first pass, as given by VDS+VSW+VDW+VCR+3.

VCR is normally used to prevent this situation; it can be set to the number of lines remaining in the frame and thereby prevent the active display area from being displayed twice.

Now for an example:

VDW = $07F, VCR = $00, VDS = $0E, VSW = $00

This positions the active display area at line 14 onwards, clearing the top blanking area. The height of the active display area is 128 lines. VCR is not used.

The display will show 14 black lines, 128 lines from the active display area, and then 3 lines of overscan color. Now the display counter resets, so 14 lines of the overscan color are displayed, followed by 97 lines of the active display area. The frame counter has reached the bottom of the frame, so you see 4 lines of the bottom border and finally 3 lines of the sync area, for a total of 263 lines.
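The walk-through above can be reproduced with a small simulation of the two counters. This is a sketch of the model described in this section, not of the actual hardware logic. The function name is hypothetical, and the graphics height is taken as VDW+1 lines, which is inferred from the example (VDW=$07F giving 128 lines).

```python
from itertools import groupby

TOP_BLANK, ACTIVE, BOTTOM_BLANK, SYNC = 14, 242, 4, 3


def frame_layout(vds, vsw, vdw, vcr):
    """Return [(area, line_count), ...] for one 263-line frame."""
    offset = vds + vsw                  # lines before graphics start
    height = vdw + 1                    # assumption inferred from the example
    period = offset + height + 3 + vcr  # display counter resets after this
    labels = []
    display_counter = 0
    for frame_counter in range(TOP_BLANK + ACTIVE + BOTTOM_BLANK + SYNC):
        if frame_counter < TOP_BLANK:
            label = "top blanking"
        elif frame_counter < TOP_BLANK + ACTIVE:
            in_graphics = offset <= display_counter < offset + height
            label = "graphics" if in_graphics else "overscan"
        elif frame_counter < TOP_BLANK + ACTIVE + BOTTOM_BLANK:
            label = "bottom blanking"
        else:
            label = "sync"
        labels.append(label)
        display_counter += 1
        if display_counter == period:
            display_counter = 0
    # Collapse consecutive identical labels into (label, count) runs.
    return [(label, len(list(run))) for label, run in groupby(labels)]
```

Calling frame_layout(0x0E, 0x00, 0x7F, 0x00) gives 14 top blanking, 128 graphics, 17 overscan, 97 graphics, 4 bottom blanking and 3 sync lines. The 17 overscan lines are the 3 lines after the first pass merged with the 14 lines that follow the counter reset.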

Now for the horizontal aspects of the display:

The number of characters per scanline is determined by the dot clock used, and the horizontal parameters define which characters show graphics data and which ones show the overscan color. You can't divide the dot clock further by adjusting the horizontal parameters; for example, the 5 MHz dot clock will always show roughly 342 dots per line, so a small resolution like 128 pixels (16 characters) would have a large border on the left and right sides. Some of these dots are off-screen, since they are used for the horizontal blanking, retrace, and color burst areas.

VDC registers $0A and $0B can be modified at any time. Registers $0C, $0D, and $0E can only be modified outside of the active display period. So it's possible to change the horizontal resolution to any width at any line.

These settings were created using the Display Editor program, and are included as a reference for emulator authors to see the largest possible displays, as well as developers to use in their programs.

Key to following settings:

Overscan - Part of the display may be off-screen on some monitors.
Max - This is the largest viewable area possible. (I used a video capture card to get around the limitations of a regular monitor)
CLK - Bits 1-0 of VCE register $0400

> It's a matter of perspective. The rising edge of the hsync is arguably the beginning of the next line, in which the VDC generates the interrupt (if the line matches).

After talking with Ryphecha about it, she advised that the interrupt actually triggers near the very end of the scanline before the one you asked for. That would indeed be the Hblank area, and would give games time to do things to affect the line they were targeting.

I did tests with Ryphecha, Exophase, and Charles over the years, as well as on my own. I've lost my notes over the years, so things from memory tend to be iffy (except the things/regs I consistently use). I do know that VDC will trigger an interrupt when forced to HDS. I believe there's a delay from the pixel bus to video out on the VCE side (8 pixels or more... I don't remember). And I know there's a delay on the VDC side (at least 8 pixels). So if you're using composite output as a timing marker to something else, say for measuring when h-int is asserted by the VDC, it's going to be off. If the VDC did assert h-int before HSW (on the previous line), and I do know that BYR is read during HDS, then it might be possible to test for this *if* the interrupt is generated from the transition of HDW->HDE. But I really don't think that's the case. If anything, it's HDE->HSW or HSW->HDS (both of these scenarios kinda fit the case for when VCE asserts hsync to VDC outside the HSW window), because neither of those last two scenarios would give issues with games, while the first case potentially could.

I discovered this delay because I found out how to use VDC regs #3 and #4 (marked as reserved). They seem to be some internal test state regs. To access them, you have to set bits in the VRAM READ address reg. I forget which bits correlate to which functions (it's been too long). Reading or writing to reserved regs enables and disables. Earlier internal documentation actually gives them names, but the official doc just says they're reserved. I don't remember their names.

Quote:

> Which VDC reg?

BYR. And I just found there's a THIRD level of latching to it. The cached BYR value is latched at the start of the scanline. Without that, you'll get momentary flicker when games write the register mid-scanline. Which they do a lot because there's only one background and no windows.

That's what I mentioned; every VDC register is buffered. Only VWR/VRR have a latch mechanism though. The other regs can be written for the LSB or MSB singularly (for fast sine wave effects, I only change the LSB of BXR. And fast RCR updates to the LSB only). All regs, from what I remember testing, are updated on the next scan line (or at the start of the scanline when exiting HSW), except the ones defining the size of the vertical window. So that means scroll regs (BXR, BYR), color bits, horizontal window settings, etc -> update on a scanline basis and regs like VPR, VDW, VCR update on vblank (start of new display).
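To make the buffering idea concrete, here is a rough sketch of how an emulator might model it. It follows only what is recalled in the post above and is not verified hardware behavior: the class and method names are hypothetical, only a few registers are included, and the extra BYR latching mentioned earlier in the thread is left out.

```python
PER_SCANLINE = ("BXR", "BYR")  # said to take effect on the next scanline
PER_FRAME = ("VDW", "VCR")     # said to take effect at the start of a new display


class BufferedRegisters:
    def __init__(self):
        self.pending = {name: 0 for name in PER_SCANLINE + PER_FRAME}
        self.active = dict(self.pending)

    def write_lsb(self, name, value):
        # Either half can be written on its own; the other half is kept.
        self.pending[name] = (self.pending[name] & 0xFF00) | (value & 0xFF)

    def write_msb(self, name, value):
        self.pending[name] = (self.pending[name] & 0x00FF) | ((value & 0xFF) << 8)

    def start_scanline(self):
        for name in PER_SCANLINE:
            self.active[name] = self.pending[name]

    def start_frame(self):
        for name in PER_FRAME:
            self.active[name] = self.pending[name]
```

In this sketch a mid-line write to the LSB of BXR changes only the pending value, so the line being drawn is unaffected and the new scroll position is picked up when start_scanline() runs.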

I don't think it has any correlation to a single BG layer and absence of windows. That makes no sense. It's a buffered system specifically to avoid race conditions (i.e. the NES or gameboy BY reg) for mid screen changes.

Quote:

> I have no idea how accurate you plan to go with this, but the VDC processes BG and sprite line setup at different 8pixel dot clock segments during hblank.

I am limited by the available information. And the available information is unbelievably awful and outdated from 2007 and earlier =(

But I want to be as accurate as possible. I just emulated the display start, display length regs. But not yet the display end, sync width regs. Nor the "fun" that happens with out of bounds values (you can apparently make the VDC start drawing the NEXT scanline on the current scanline if your values are too aggressive.)

Chris Covell's screen resolution rom is a good starting point.

Quote:

Even with just that, I'm still having troubles. HDS (horizontal display start) seems to act differently depending on the $0400.d0-d1 clock frequency setting (5mhz, 7mhz, 10mhz). Unless I manually fudge the start offset by another ~24 clock cycles for 7mhz mode only, the status bar in Order of the Griffon isn't centered.

Might have something to do with how HSW works? HSW (it's defined in the HSR reg) is nothing more than a wait loop window. You define it in lengths of 8 VDC pixel units, but when the VCE asserts Hsync, it immediately transitions to HDS. I.e. it will never be the full length of the wait loop period unless the VDC times out (no external hsync), in which case it transitions to HDS as well.
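A minimal sketch of that wait-loop idea, for anyone wanting to try it in an emulator. The function and parameter names are hypothetical, and whether the programmed HSW field needs a +1 adjustment is not covered here, so the unit count is passed in directly.

```python
def hsw_dots(hsw_units, hsync_at=None):
    """Return how many VDC dots are spent in HSW before moving on to HDS.

    hsw_units -- programmed wait window, in units of 8 VDC dots
    hsync_at  -- dot (counted from the start of HSW) at which the VCE
                 asserts hsync, or None if no external hsync arrives
    """
    timeout = hsw_units * 8
    if hsync_at is not None and hsync_at < timeout:
        return hsync_at  # hsync cuts the window short
    return timeout       # window expired, move on to HDS anyway
```

With a window of 3 units, an hsync arriving at dot 10 ends the wait after 10 dots, while no hsync at all gives the full 24 dots.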

The problem is that I don't have a real PC Engine to run it on and see how the screen actually reacts to changing various values. Especially going way out of bounds.

I recently emulated the SuperGrafx (mostly), and as such the VCE's role kind of crystallized a bit for me. So now I'm splitting the VDC and VCE functionality, where the VDC has no idea about the actual CRT beam cannon, but instead just runs on "start of frame/scanline" signals from the VCE to it, off the clock rate set by the VCE. It does nothing but set a 9-bit data bus that the VCE (or VPC) uses.

This is normal behaviour. Here's a pic of Neutopia on my monitor (run through an XRGB-2). You can see the coloured borders (which really come from Sprite colour 0, but are set to be the same as BG colour 0) on either side. My old Panasonic TV even showed the full 242 lines (blue at the bottom too), which my XRGB cuts off.

By the way, feel free to post a new thread (rather than in the Z80 one...). The PCEngineFX forums have lots of technical discussion like this, but they're not searchable (through Google, etc.) so NESDev's not a bad place to have more PCE dev discussion.
