Modern cards framebuffer organization

Hi all!

Does anybody know how the framebuffer is implemented in modern cards? I've heard that in the old days dual-ported DRAM (called VRAM) was used: the RAMDAC read from one port while the GPU wrote to the other. But today, AFAIK, IHVs use DDR/GDDR, which has only a single port. Did buses become fast enough to handle both reads and writes to the framebuffer through a single port, or is there dual-ported VRAM hidden inside the GPU chip (since there are no such chips mounted on the PCB)?

- relatively wide (256-384 bits on high end GPUs vs 64 bits/channel on CPUs)
- long bursts in and out of memory to maximise transfer rate
- caches optimized for throughput (e.g. read-only caches for textures)

Once you get onto the GPU the caches, registers and local stores do have many banks/ports in order to support simultaneous access.

In "older days" feeding the display consumed a big part of the available bandwidth, so having a dedicated port and on-chip shift register really helped.

It was actually the on-chip shift register that made the biggest difference -- that allowed an entire row to be read from the DRAM array and dropped into the shift register with a single RAS/CAS cycle, then the graphics engine would have full time access to the memory interface while data was shifted out to the display. Normal memory cycles could only access a single bit from the row on each access, vs the full-row access of a VRAM.
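To put a rough number on "a big part of the available bandwidth", here is a minimal sketch in Python. The display modes and memory bandwidth figures are made-up illustrative values, not measurements of any real card, and blanking intervals are ignored.

```python
def scanout_bytes_per_second(width, height, bits_per_pixel, refresh_hz):
    """Bytes the display must be fed each second (blanking ignored)."""
    return width * height * bits_per_pixel // 8 * refresh_hz


def display_share(scanout_bps, memory_bps):
    """Fraction of peak memory bandwidth consumed by feeding the display."""
    return scanout_bps / memory_bps


# Hypothetical older card: 640x480, 8 bpp, 60 Hz, 40 MB/s memory interface.
old_scanout = scanout_bytes_per_second(640, 480, 8, 60)
old_share = display_share(old_scanout, 40_000_000)

# Hypothetical modern card: 1920x1080, 32 bpp, 60 Hz, 100 GB/s memory interface.
new_scanout = scanout_bytes_per_second(1920, 1080, 32, 60)
new_share = display_share(new_scanout, 100_000_000_000)

print(f"old: {old_share:.1%} of memory bandwidth, new: {new_share:.2%}")
```

With these assumed figures the display takes close to half of the old card's bandwidth but well under one percent of the modern card's, which is the kind of gap that made a dedicated display port worthwhile back then.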

Nowadays the same approach is still used, but rather than having a wide on-chip shift register the sequence is:

- memory controller starts a page-mode burst
- DRAM reads an entire row (DRAM always does this even for a single-bit access)
- memory controller burst-transfers the row (or 1/2, 1/4, etc.) into on-chip line buffer
- graphics engine gets full time access to memory while data is shifted from line buffer to display

Note that modern GPUs support multiple displays so you typically have multiple line buffers as well.
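The sequence above can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions: one burst fills one whole scanline, the framebuffer is a flat list, and the names `LineBuffer` and `scan_out` are invented for this example rather than taken from any real hardware or driver.

```python
from collections import deque


class LineBuffer:
    """Toy on-chip line buffer: filled by a burst, drained pixel by pixel."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pixels = deque()

    def burst_fill(self, framebuffer, start, count):
        """One memory burst: copy `count` consecutive pixels into the buffer."""
        if len(self.pixels) + count > self.capacity:
            raise ValueError("burst would overflow the line buffer")
        self.pixels.extend(framebuffer[start:start + count])

    def shift_out(self):
        """Send the next pixel to the display; no memory access is needed."""
        return self.pixels.popleft()


def scan_out(framebuffer, width, height):
    """Return (pixels in display order, number of memory bursts used)."""
    buf = LineBuffer(width)
    displayed = []
    bursts = 0
    for row in range(height):
        # The memory interface is busy only for this one burst per scanline.
        buf.burst_fill(framebuffer, row * width, width)
        bursts += 1
        # While the buffer drains, the graphics engine could use the memory.
        for _ in range(width):
            displayed.append(buf.shift_out())
    return displayed, bursts


frame = list(range(12))            # a tiny 4x3 "framebuffer"
pixels, bursts = scan_out(frame, 4, 3)
print(pixels == frame, bursts)
```

The point of the model is the count: the display receives all 12 pixels, but the memory interface is only touched 3 times, once per scanline. With several displays you would have one such buffer per display.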

Originally Posted by bridgman

In "older days" feeding the display consumed a big part of the available bandwidth, so having a dedicated port and on-chip shift register really helped.

It was actually the on-chip shift register that made the biggest difference -- that allowed an entire row to be read from the DRAM array and dropped into the shift register with a single RAS/CAS cycle, then the graphics engine would have full time access to the memory interface while data was shifted out to the display. Normal memory cycles could only access a single bit from the row on each access, vs the full-row access of a VRAM.

Does this mean that the GPU's main processor itself did the work of feeding the display from this shift register? Were both ports of the VRAM connected to the GPU itself? I thought it was an external device, the RAMDAC, that read the memory and sent colour bits to the encoder.

Originally Posted by bridgman

- memory controller burst-transfers the row (or 1/2, 1/4, etc.) into on-chip line buffer
- graphics engine gets full time access to memory while data is shifted from line buffer to display

Note that modern GPUs support multiple displays so you typically have multiple line buffers as well.

Is the on-chip line buffer just a part of on-chip memory? Is it fast enough, and is that why the shift register is no longer used?
What is the graphics engine? I know it's too many questions; could you please provide a link to any texts on the subject?

It's been a few years since I worked on graphics chips, but AFAIR the VRAM output was connected directly to the RAMDAC for the screen, so the GPU didn't have to read the data, just clock it out.

With more modern chips the GPU reads the framebuffer data just like any other memory and sends it to the display itself. It may have to read from multiple buffers (overlays, cursor, etc.) and merge them to generate the final output. Also, the framebuffer may not be linear (e.g. tiled for better render performance), which further complicates the display hardware.
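As a rough illustration of those two complications, here is a small Python sketch of reading one scanline out of a tiled framebuffer and merging a cursor on top. The 4x4 tile size, the row-major tile order and all function names are assumptions for the example; real tiling layouts are vendor-specific and usually more involved. The width and height are assumed to be multiples of the tile size.

```python
def linear_offset(x, y, width):
    """Offset of pixel (x, y) in a plain row-by-row framebuffer."""
    return y * width + x


def tiled_offset(x, y, width, tile_w=4, tile_h=4):
    """Offset of pixel (x, y) when pixels are stored tile by tile."""
    tiles_per_row = width // tile_w
    tile_index = (y // tile_h) * tiles_per_row + (x // tile_w)
    within_tile = (y % tile_h) * tile_w + (x % tile_w)
    return tile_index * tile_w * tile_h + within_tile


def build_tiled(width, height):
    """Demo helper: store each pixel's linear offset at its tiled location."""
    fb = [None] * (width * height)
    for y in range(height):
        for x in range(width):
            fb[tiled_offset(x, y, width)] = linear_offset(x, y, width)
    return fb


def compose_scanline(tiled_fb, y, width, cursor=None, cursor_pos=(0, 0)):
    """Gather scanline y from the tiled buffer, then overlay the cursor.

    `cursor` is a list of rows; None entries are transparent.
    """
    line = [tiled_fb[tiled_offset(x, y, width)] for x in range(width)]
    if cursor is not None:
        cx, cy = cursor_pos
        row = y - cy
        if 0 <= row < len(cursor):
            for i, pixel in enumerate(cursor[row]):
                if pixel is not None and 0 <= cx + i < width:
                    line[cx + i] = pixel
    return line


fb = build_tiled(8, 8)
cursor = [["C", None],
          ["C", "C"]]
print(compose_scanline(fb, 5, 8))
print(compose_scanline(fb, 5, 8, cursor, cursor_pos=(3, 5)))
```

Note how a single scanline touches two different tiles here, so the display hardware has to gather it from non-contiguous addresses before the cursor is merged in.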