AGP is a dedicated high-speed bus that allows the graphics controller to move large amounts of data directly from system memory.
It uses a Graphics Address Re-Mapping Table (GART) to present scattered pages of system memory as a physically contiguous range for DMA transfers.
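As a rough illustration of the idea (a userspace toy with invented names, not the real chipset code), the GART is just a table with one entry per aperture page: the aperture offset indexes the table, and each entry points at a scattered physical page, so the device sees one contiguous range.

```c
#include <assert.h>

#define PAGE_SIZE 4096
#define GART_ENTRIES 8

/* Hypothetical GART: one entry per aperture page, each holding the
 * "physical" address of a scattered system-memory page. */
static unsigned long gart[GART_ENTRIES];

/* Bind a scattered physical page at the given aperture page index. */
static void gart_bind(int ap_page, unsigned long phys_page)
{
    gart[ap_page] = phys_page;
}

/* Translate an aperture offset to the backing physical address,
 * the way the chipset's address remapping does: the page index
 * selects the GART entry, the low bits carry over unchanged. */
static unsigned long gart_translate(unsigned long ap_offset)
{
    return gart[ap_offset / PAGE_SIZE] + (ap_offset % PAGE_SIZE);
}
```

With pages bound at physical 0x30000 and 0x10000, aperture offsets 0 through 2*PAGE_SIZE-1 read as one contiguous region even though the backing pages are scattered.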

You have to understand that the DRI functions have a different purpose
than the ones in XFree86. The DRM has to know about AGP, so it talks to
the AGP kernel module itself. It has to be able to protect certain
regions of AGP memory from the client side 3D drivers, yet it has to
export some regions of it as well. While most of this functionality
(most, not all) can be accomplished with the /dev/agpgart interface, it
makes sense to use the DRM's current authentication mechanism. This
means that there is less complexity on the client side. If we used
/dev/agpgart, then the client would have to open two devices,
authenticate to both of them, and make half a dozen calls to agpgart,
only to care about the DRM device from then on.

As a side note, the XFree86 calls were written after the DRM functions.

Also to answer a previous question about not using XFree86 calls for
memory mapping, you have to understand that under most OSes (probably
Solaris as well), XFree86's functions will only work for root-privileged
processes. The whole point of the DRI is to allow processes that can
connect to the X server to do some form of direct-to-hardware rendering.
If we limited ourselves to using XFree86's functionality, we would not be
able to do this. We don't want everyone to be root.

Every time you update the GATT, you have to flush the cache and/or
TLBs. This is expensive. Therefore, you allocate and bind all the pages
you'll use up front, and mmap() just returns the right pages when needed.
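The cost argument can be sketched with a toy model (all names invented; the real interface is the agpgart ioctls): if every GATT update ends with a flush, binding a page at a time costs one flush per page, while binding the whole region up front costs one flush total.

```c
#include <assert.h>

static int flush_count;

/* Stand-in for the expensive cache/TLB flush after a GATT update. */
static void gatt_flush(void)
{
    flush_count++;
}

/* Bind 'npages' pages in one call: one GATT update, one flush. */
static void bind_region(int npages)
{
    (void)npages;           /* the page count doesn't change the flush cost */
    gatt_flush();
}

/* Naive alternative: bind each page as it is first touched,
 * paying a flush every time. */
static void bind_page_lazily(void)
{
    gatt_flush();
}
```

For a 64-page buffer, the up-front strategy flushes once where the lazy one flushes 64 times, which is why the DRI binds everything before handing out mappings.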

Then you need a mapping of the AGP aperture in the kernel's address
space that you can access; use ioremap() to create it.

After that you have access to the AGP memory. You probably want to make
sure that there is a write-combining MTRR covering the aperture. There is
code in mga_drv.c in our kernel directory that shows you how to do that.

All this allocation should be done by only one process. If you need
memory in the GTT you should be asking the Xserver for it (or whatever
your controlling process is). Things are implemented this way so that
the controlling process can know intimate details of how memory is laid
out. This is very important for the I810, since you want to set tiled
memory on certain regions of the aperture. If you made the kernel do
the layout, then you would have to create device specific code in the
kernel to make sure that the backbuffer/dcache are aligned for tiled
memory. This adds complexity to the kernel that doesn't need to be
there, and imposes restrictions on what you can do with AGP memory.
Also, the current Xserver implementation (4.0) actually locks out other
applications from adding to the GTT. While the Xserver is active, the
Xserver is the only one who can add memory. Only the controlling
process may add things to the GTT, and while a controlling process is
active, no other application can be the controlling process.

Microsoft's VGART does things like you are describing, I believe. I
think it's a bad design. It enforces a policy on whoever uses it, and is
not flexible. When you are designing low level system routines I think
it is very important to make sure your design has the minimum of
policy. Otherwise when you want to do something different you have to
change the interface, or create custom drivers for each application that
needs to do things differently.

Let's call it 'kernel ringbuffers'. The premise is to replace the
calls to the 'fire-vertex-buffer' ioctl with code to write to a
client-private mapping shared by the kernel (like the current sarea,
but for each client).

Starting from the beginning:

Each client has a private piece of AGP memory, into which it will
put secure commands (typically vertices and texture data). The client
may expand or shrink this region according to load.

Each client has a shared user/kernel region of cached memory.
(Per-context sarea). This is managed like a ring, with head and tail
pointers.

The client emits vertices to AGP memory (as it currently does with
DMA buffers).

When a statechange, clear, swap, flush, or other event occurs, the
client:

Grabs the hardware lock.

Re-emits any invalidated state to the head of the ring.

Emits a command to fire the portion of AGP space as vertices.

Updates the head pointer in the ring.

Releases the lock.
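The client/kernel handoff above might look roughly like this (a sketch with invented names, not the actual driver code): the per-context sarea carries head and tail pointers, the client advances the head as it queues commands, and the kernel advances the tail as it dispatches them.

```c
#include <assert.h>

#define RING_SIZE 64

/* Shared user/kernel ring (the per-context sarea): the client
 * writes 'head' as it queues commands, the kernel writes 'tail'
 * as it dispatches them to the hardware. */
struct client_ring {
    unsigned int head;           /* advanced by the client */
    unsigned int tail;           /* advanced by the kernel */
    unsigned int cmd[RING_SIZE]; /* queued commands (vertex/fire/state) */
};

/* Client side, called with the hardware lock held:
 * queue one command and advance the head. */
static void ring_emit(struct client_ring *r, unsigned int cmd)
{
    r->cmd[r->head % RING_SIZE] = cmd;
    r->head++;
}

/* Kernel side, run on a flush ioctl, timer, or interrupt:
 * dispatch everything between tail and head, return the count. */
static int ring_dispatch(struct client_ring *r)
{
    int fired = 0;
    while (r->tail != r->head) {
        /* ...program the hardware with r->cmd[r->tail % RING_SIZE]... */
        r->tail++;
        fired++;
    }
    return fired;
}
```

The point of the split is that the client's fast path is just writes to shared memory; only the kernel ever touches the hardware.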

The kernel is responsible for processing all of the rings. Several
events might cause the kernel to examine active rings for commands to
be dispatched:

A flush ioctl. (Called by impatient clients)

A periodic timer. (If this is low overhead?)

An interrupt previously emitted by the kernel. (If timers
don't work)

Additionally, for those who've been paying attention, you'll notice
that some of the assumptions that we use currently to manage hardware
state between multiple active contexts are broken if client commands
to hardware aren't executed serially in an order that is knowable to
the clients: without that ordering, a client that grabs the heavyweight
lock doesn't know what state has been invalidated or which textures
have been swapped out by other clients.

This could be solved by keeping per-context state in the kernel and
implementing a proper texture manager. That's something we need to do
anyway, but it's not a requirement for this mechanism to work.

Instead, force the kernel to fire all outstanding commands on client
ringbuffers whenever the heavyweight lock changes hands. This
provides the same serialized semantics as the current mechanism, and
also simplifies the kernel's task as it knows that only a single
context has an active ring buffer (the one last to hold the lock).

An additional mechanism is required to allow clients to know which
pieces of their AGP buffer are pending execution by the hardware, and
which pieces are available to be reused. This is also
exactly what NV_vertex_array_range requires.
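That bookkeeping might be sketched like this (invented names, and assuming a power-of-two buffer size): the client compares its own write offset against a completion offset that the kernel advances, and everything in between is still owned by the hardware.

```c
#include <assert.h>

/* Must be a power of two so that unsigned wraparound in the
 * subtraction below stays correct when the offsets wrap. */
#define AGP_BUF_SIZE 4096u

/* Bytes between the completion point and the client's write point:
 * still pending in the hardware, must not be overwritten. */
static unsigned int pending_bytes(unsigned int written, unsigned int completed)
{
    return (written - completed) % AGP_BUF_SIZE;
}

/* Everything else in the buffer is free for the client to reuse. */
static unsigned int reusable_bytes(unsigned int written, unsigned int completed)
{
    return AGP_BUF_SIZE - pending_bytes(written, completed);
}
```

A client wanting to emit N bytes would spin (or sleep in a flush ioctl) until reusable_bytes() >= N, which is essentially the contract NV_vertex_array_range asks for.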

The first step would be to check out the current mach64 branch from dri
CVS, the tag is 'mach64-0-0-2-branch.' Follow the instructions on
dri.sf.net to compile and install the tree. A couple of things you need
to know are:

1. Make sure to check out the branch, not the head (use '... co -r
mach64-0-0-2-branch xc')

2. You need libraries and headers from a full X install. I used lndir to
add symlinks from /usr/X11R6/include and /usr/X11R6/lib into
/usr/X11R6-DRI.

You'll need to have AGP support for your chipset configured in your kernel
and have the module loaded before starting X (assuming you build it as a
module). At this point, you need agpgart for the driver to load, but AGP
isn't actually used by the driver yet.

Take a look at the code, the list archives and the DRI documentation on
dri.sf.net (it's a little stale, but a good starting point). We are also
using the driver from the Utah-GLX project as a guide, so you might want
to check that out (utah-glx.sf.net). Many of us have documentation from
ATI as well, you can apply to their developer program for docs at
http://apps.ati.com/developers/devform1.asp

Our first priority right now is to get the 3D portion of the driver using
DMA transfers (GUI mastering) rather than direct register programming.
Frank Earl is currently working on this. Then we need to get the 2D
driver to honor the drm locking scheme so we can enable 2D acceleration,
which is currently disabled. Right now switching back to X from a text
console or switching modes can cause a lockup because 2D and 3D operations
are not synchronized. Also on the todo list is using AGP for texture
uploads and finishing up the Mesa stuff (e.g. getting points and lines
working, alpha blending...).

In this layout, Glide(DRI) is really a hardware abstraction
layer. The only API exposed is OpenGL, and Glide(DRI) only works with
OpenGL. It isn't useful by itself.

There are a few Glide-only games. 3dfx would like to see those work. So
the current solution, shown above, doesn't work, since the Glide API
isn't available. Instead we need:

Client -> Glide as API (DRI) -> hw

Right now Mesa does a bunch of the DRI work, and then hands that data
down to Glide. Mesa also does all the locking of the hardware. If we're
going to remove Mesa, then Glide has to do the DRI work, and we have
to do something about the locking.

The solution is actually a bit more complicated. Glide wants to use all
the memory as well. We don't want the X server to draw at all. Glide
will turn off drawing in the X server and grab the lock and never let it
go. That way no other 3D client can start up and the X server can still
process keyboard events and such for you. When the Glide app goes away
we just force a big refresh event for the whole screen.

I hope that explains it. We're really not trying to encourage people to
use the Glide API, it is just to allow those existing games to run. We
really want people to use OpenGL directly.

Another interesting project that a few people have discussed is removing
Glide from the picture altogether: just let Mesa send the actual commands
to the hardware. That's the way most of our drivers were written. It would
simplify the install process (you don't need Glide separately) and it
might improve performance a bit, and since we're only doing this for one
type of hardware (Voodoo3+), Glide isn't doing that much as a hardware
abstraction layer. It's some work: there are about 50 Glide calls we use,
and they aren't simple, but it might be a good project for a few people
to tackle.

There's not a lot we can do with S3TC because of S3's patent/license
restrictions.

Normally, OpenGL implementations would do software compression of
textures and then send them to the board. The patent seems to prevent
that, so we're staying away from it.

If an application already has compressed textures (compressed by the
application itself or compressed offline), we can download the compressed
textures to the board. Unfortunately, that's of little use, since most
applications don't work that way.