Modern Linux systems require large amounts of graphics memory to store
frame buffers, textures, vertices and other graphics-related data. Given
the very dynamic nature of much of that data, managing graphics memory
efficiently is crucial for the graphics stack, and it plays a central
role in the DRM infrastructure.

The DRM core includes two memory managers, namely Translation Table Maps
(TTM) and Graphics Execution Manager (GEM). TTM was the first DRM memory
manager to be developed and tried to be a one-size-fits-them-all
solution. It provides a single userspace API to accommodate the needs of
all hardware, supporting both Unified Memory Architecture (UMA) devices
and devices with dedicated video RAM (i.e. most discrete video cards).
This resulted in a large, complex piece of code that turned out to be
hard to use for driver development.

GEM started as an Intel-sponsored project in reaction to TTM’s
complexity. Its design philosophy is completely different: instead of
providing a solution to every graphics memory-related problem, GEM
identified common code between drivers and created a support library to
share it. GEM has simpler initialization and execution requirements than
TTM, but has no video RAM management capabilities and is thus limited to
UMA devices.

Drivers wishing to support TTM must pass a filled ttm_bo_driver
structure to ttm_bo_device_init, together with an initialized global
reference to the memory manager. The ttm_bo_driver
structure contains several fields with function pointers for
initializing the TTM, allocating and freeing memory, waiting for command
completion and fence synchronization, and memory migration.

There should be one global reference structure for your memory manager
as a whole, and there will be others for each object created by the
memory manager at runtime. Your global TTM should have a type of
TTM_GLOBAL_TTM_MEM. The size field for the global object should be
sizeof(struct ttm_mem_global), and the init and release hooks should
point at your driver-specific init and release routines, which probably
eventually call ttm_mem_global_init and ttm_mem_global_release,
respectively.

Once your global TTM accounting structure is set up and initialized by
calling ttm_global_item_ref() on it, you need to create a buffer
object TTM to provide a pool for buffer object allocation by clients and
the kernel itself. The type of this object should be
TTM_GLOBAL_TTM_BO, and its size should be sizeof(struct
ttm_bo_global). Again, driver-specific init and release functions may
be provided, likely eventually calling ttm_bo_global_ref_init() and
ttm_bo_global_ref_release(), respectively. Also, like the previous
object, ttm_global_item_ref() is used to create an initial reference
count for the TTM, which will call your initialization function.
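Putting the two global references together, an initialization sequence
might look like the sketch below. All foo_*-prefixed names are
hypothetical driver code, and the exact structure and function names for
the TTM global machinery have varied between kernel versions; this
follows the names used in the text above.

```c
/* Sketch only: foo_* names are hypothetical, and the TTM global API
 * differs between kernel versions. */
static struct ttm_global_reference mem_global_ref;
static struct ttm_global_reference bo_global_ref;

static int foo_ttm_mem_global_init(struct ttm_global_reference *ref)
{
	return ttm_mem_global_init(ref->object);
}

static void foo_ttm_mem_global_release(struct ttm_global_reference *ref)
{
	ttm_mem_global_release(ref->object);
}

static int foo_ttm_global_init(void)
{
	int ret;

	/* Global memory accounting object. */
	mem_global_ref.global_type = TTM_GLOBAL_TTM_MEM;
	mem_global_ref.size = sizeof(struct ttm_mem_global);
	mem_global_ref.init = &foo_ttm_mem_global_init;
	mem_global_ref.release = &foo_ttm_mem_global_release;
	ret = ttm_global_item_ref(&mem_global_ref);
	if (ret)
		return ret;

	/* Buffer object pool, created once accounting is in place. */
	bo_global_ref.global_type = TTM_GLOBAL_TTM_BO;
	bo_global_ref.size = sizeof(struct ttm_bo_global);
	bo_global_ref.init = &ttm_bo_global_ref_init;	/* or a driver wrapper */
	bo_global_ref.release = &ttm_bo_global_ref_release;
	ret = ttm_global_item_ref(&bo_global_ref);
	if (ret)
		ttm_global_item_unref(&mem_global_ref);
	return ret;
}
```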

The GEM design approach has resulted in a memory manager that doesn’t
provide full coverage of all (or even all common) use cases in its
userspace or kernel API. GEM exposes a set of standard memory-related
operations to userspace and a set of helper functions to drivers, and
lets drivers implement hardware-specific operations with their own
private API.

The GEM userspace API is described in the GEM - the Graphics Execution
Manager article on LWN. While
slightly outdated, the document provides a good overview of the GEM API
principles. Buffer allocation and read and write operations, described
as part of the common GEM API, are currently implemented using
driver-specific ioctls.

GEM is data-agnostic. It manages abstract buffer objects without knowing
what individual buffers contain. APIs that require knowledge of buffer
contents or purpose, such as buffer allocation or synchronization
primitives, are thus outside of the scope of GEM and must be implemented
using driver-specific ioctls.

On a fundamental level, GEM involves several operations:

- Memory allocation and freeing
- Command execution
- Aperture management at command execution time

Buffer object allocation is relatively straightforward and largely
provided by Linux’s shmem layer, which provides memory to back each
object.

Drivers that use GEM must set the DRIVER_GEM bit in the struct
drm_driver driver_features field. The DRM core will then automatically
initialize the GEM core before calling the load operation. Behind the
scenes, this will create a
DRM Memory Manager object which provides an address space pool for
object allocation.

In a KMS configuration, drivers need to allocate and initialize a
command ring buffer following core GEM initialization if required by the
hardware. UMA devices usually have what is called a “stolen” memory
region, which provides space for the initial framebuffer and large,
contiguous memory regions required by the device. This space is
typically not managed by GEM, and must be initialized separately into
its own DRM MM object.

GEM splits creation of GEM objects and allocation of the memory that
backs them in two distinct operations.

GEM objects are represented by an instance of struct drm_gem_object.
Drivers usually need to extend GEM objects with private information and
thus create a driver-specific GEM object structure type that embeds an
instance of struct drm_gem_object.

To create a GEM object, a driver allocates memory for an instance of its
specific GEM object type and initializes the embedded struct
drm_gem_object with a call to drm_gem_object_init(). The function takes
a pointer to the DRM device, a pointer to the GEM object and the buffer
object size in bytes.
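For instance, a driver-specific GEM object type and a creation function
for it could look like this sketch (all foo_* names are hypothetical):

```c
struct foo_gem_object {
	struct drm_gem_object base;
	/* driver-private data: page array, GPU address, ... */
};

static struct foo_gem_object *foo_gem_create(struct drm_device *dev,
					     size_t size)
{
	struct foo_gem_object *obj;
	int ret;

	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
	if (!obj)
		return ERR_PTR(-ENOMEM);

	ret = drm_gem_object_init(dev, &obj->base, PAGE_ALIGN(size));
	if (ret) {
		kfree(obj);
		return ERR_PTR(ret);
	}

	return obj;
}
```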

GEM uses shmem to allocate anonymous pageable memory.
drm_gem_object_init() will create an shmfs file of the requested size
and store it into the struct drm_gem_object filp field. The memory is
used as either main storage for the object when the graphics hardware
uses system memory directly or as a backing store otherwise.

Drivers are responsible for the actual physical page allocation, by
calling shmem_read_mapping_page_gfp() for each page.
Note that they can decide to allocate pages when initializing the GEM
object, or to delay allocation until the memory is needed (for instance
when a page fault occurs as a result of a userspace memory access or
when the driver needs to start a DMA transfer involving the memory).
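A minimal page-population loop might look like the sketch below, where
obj->pages is a hypothetical driver-allocated array of page pointers
(many drivers can use the drm_gem_get_pages() helper described later
instead of open-coding this):

```c
static int foo_gem_get_pages(struct foo_gem_object *obj)
{
	struct address_space *mapping = obj->base.filp->f_mapping;
	unsigned int i, npages = obj->base.size >> PAGE_SHIFT;

	for (i = 0; i < npages; i++) {
		struct page *page;

		/* Allocates (or swaps in) the backing page on demand. */
		page = shmem_read_mapping_page_gfp(mapping, i,
						   mapping_gfp_mask(mapping));
		if (IS_ERR(page))
			return PTR_ERR(page);
		obj->pages[i] = page;
	}
	return 0;
}
```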

Anonymous pageable memory allocation is not always desired, for instance
when the hardware requires physically contiguous system memory as is
often the case in embedded devices. Drivers can create GEM objects with
no shmfs backing (called private GEM objects) by initializing them with
a call to drm_gem_private_object_init() instead of
drm_gem_object_init(). Storage for private GEM objects
must be managed by drivers.
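A private GEM object only differs in its initialization and in the
driver providing the storage itself, for example (hypothetical foo_*
names, with the contiguous allocation elided):

```c
static struct foo_gem_object *foo_gem_create_private(struct drm_device *dev,
						     size_t size)
{
	struct foo_gem_object *obj;

	obj = kzalloc(sizeof(*obj), GFP_KERNEL);
	if (!obj)
		return ERR_PTR(-ENOMEM);

	/* No shmfs backing is created for private objects. */
	drm_gem_private_object_init(dev, &obj->base, PAGE_ALIGN(size));

	/* The driver must provide the storage itself, for instance by
	 * allocating physically contiguous memory here. */

	return obj;
}
```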

When the last reference to a GEM object is released the GEM core calls
the struct drm_driver gem_free_object_unlocked operation. That operation
is mandatory for GEM-enabled drivers and must free the GEM object and
all associated resources.

void (*gem_free_object) (struct drm_gem_object *obj);

Drivers are responsible for freeing all GEM object resources. This
includes the resources created by the GEM core, which need to be
released with drm_gem_object_release().
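A typical free callback therefore releases the driver-private resources
first, then the GEM core resources, then the object itself (sketch,
hypothetical foo_* names):

```c
static void foo_gem_free_object(struct drm_gem_object *gem_obj)
{
	struct foo_gem_object *obj =
		container_of(gem_obj, struct foo_gem_object, base);

	/* Release driver-private resources (pages, GPU mappings, ...). */

	/* Release the resources created by the GEM core ... */
	drm_gem_object_release(gem_obj);
	/* ... and finally the object itself. */
	kfree(obj);
}
```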

Communication between userspace and the kernel refers to GEM objects
using local handles, global names or, more recently, file descriptors.
All of those are 32-bit integer values; the usual Linux kernel limits
apply to the file descriptors.

GEM handles are local to a DRM file. Applications get a handle to a GEM
object through a driver-specific ioctl, and can use that handle to refer
to the GEM object in other standard or driver-specific ioctls. Closing a
DRM file handle frees all its GEM handles and dereferences the
associated GEM objects.

To create a handle for a GEM object drivers call
drm_gem_handle_create(). The function takes a pointer
to the DRM file and the GEM object and returns a locally unique handle.
When the handle is no longer needed drivers delete it with a call to
drm_gem_handle_delete(). Finally the GEM object
associated with a handle can be retrieved by a call to
drm_gem_object_lookup().

Handles don’t take ownership of GEM objects, they only take a reference
to the object that will be dropped when the handle is destroyed. To
avoid leaking GEM objects, drivers must make sure they drop the
reference(s) they own (such as the initial reference taken at object
creation time) as appropriate, without any special consideration for the
handle. For example, in the particular case of combined GEM object and
handle creation in the implementation of the dumb_create operation,
drivers must drop the initial reference to the GEM object before
returning the handle.
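A dumb_create implementation following this rule might be sketched as
below. foo_gem_create is a hypothetical creation function returning an
object with one reference held, and the reference-dropping helper's name
has changed across kernel versions:

```c
static int foo_dumb_create(struct drm_file *file, struct drm_device *dev,
			   struct drm_mode_create_dumb *args)
{
	struct foo_gem_object *obj;
	int ret;

	args->pitch = args->width * DIV_ROUND_UP(args->bpp, 8);
	args->size = args->pitch * args->height;

	obj = foo_gem_create(dev, args->size);
	if (IS_ERR(obj))
		return PTR_ERR(obj);

	ret = drm_gem_handle_create(file, &obj->base, &args->handle);
	/* Drop the initial reference: on success the handle now holds
	 * the only reference, on failure the object is freed. */
	drm_gem_object_put_unlocked(&obj->base);

	return ret;
}
```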

GEM names are similar in purpose to handles but are not local to DRM
files. They can be passed between processes to reference a GEM object
globally. Names can’t be used directly to refer to objects in the DRM
API, applications must convert handles to names and names to handles
using the DRM_IOCTL_GEM_FLINK and DRM_IOCTL_GEM_OPEN ioctls
respectively. The conversion is handled by the DRM core without any
driver-specific support.

GEM also supports buffer sharing with dma-buf file descriptors through
PRIME. GEM-based drivers must use the provided helper functions to
implement the exporting and importing correctly. Since sharing file
descriptors is inherently more secure than the easily guessable and
global GEM names, it is the preferred buffer sharing mechanism. Sharing
buffers through GEM names is only supported for legacy userspace.
Furthermore PRIME also allows cross-device buffer sharing since it is
based on dma-bufs.

Because mapping operations are fairly heavyweight GEM favours
read/write-like access to buffers, implemented through driver-specific
ioctls, over mapping buffers to userspace. However, when random access
to the buffer is needed (to perform software rendering for instance),
direct access to the object can be more efficient.

The mmap system call can’t be used directly to map GEM objects, as they
don’t have their own file handle. Two alternative methods currently
co-exist to map GEM objects to userspace. The first method uses a
driver-specific ioctl to perform the mapping operation, calling
do_mmap() under the hood. This is often considered
dubious, seems to be discouraged for new GEM-enabled drivers, and will
thus not be described here.

The second method uses the mmap system call on the DRM file handle.

void *mmap(void *addr, size_t length, int prot, int flags, int fd,
           off_t offset);

DRM identifies the GEM object to be mapped by a fake offset passed
through the mmap offset argument. Prior to being mapped, a GEM object
must thus be associated with a fake offset. To do so, drivers must call
drm_gem_create_mmap_offset() on the object.

Once allocated, the fake offset value must be passed to the application
in a driver-specific way and can then be used as the mmap offset
argument.
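The kernel side of this scheme can be sketched as a hypothetical driver
ioctl that looks up the object and returns the fake offset (the
two-argument drm_gem_object_lookup() signature shown here is from more
recent kernels; foo_* names and the ioctl payload are hypothetical):

```c
static int foo_gem_map_offset_ioctl(struct drm_device *dev, void *data,
				    struct drm_file *file)
{
	struct foo_gem_map_offset *args = data;	/* hypothetical payload */
	struct drm_gem_object *obj;
	int ret;

	obj = drm_gem_object_lookup(file, args->handle);
	if (!obj)
		return -ENOENT;

	ret = drm_gem_create_mmap_offset(obj);
	if (!ret)
		args->offset = drm_vma_node_offset_addr(&obj->vma_node);

	drm_gem_object_put_unlocked(obj);
	return ret;
}
```

Userspace then passes that offset straight to mmap on the DRM file
descriptor, e.g. ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
MAP_SHARED, drm_fd, offset);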

The GEM core provides a helper method drm_gem_mmap() to handle object
mapping. The method can be set directly as the mmap file operation
handler. It will look up the GEM object based on the offset value and
set the VMA operations to the struct drm_driver gem_vm_ops field. Note
that drm_gem_mmap() doesn't map memory to userspace, but relies on the
driver-provided fault handler to map pages individually.

The open and close operations must update the GEM object reference
count. Drivers can use the drm_gem_vm_open() and
drm_gem_vm_close() helper functions directly as open
and close handlers.

The fault operation handler is responsible for mapping individual pages
to userspace when a page fault occurs. Depending on the memory
allocation scheme, drivers can allocate pages at fault time, or can
decide to allocate memory for the GEM object at the time the object is
created.
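Putting the pieces together, a fault-time mapping setup might look like
this sketch. obj->pages is hypothetical driver state, and the
vm_fault_t/vmf_insert_page() API is from recent kernels (older ones use
int return codes and vm_insert_page()):

```c
static vm_fault_t foo_gem_fault(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;
	/* drm_gem_mmap() stored the GEM object in vm_private_data. */
	struct foo_gem_object *obj =
		container_of(vma->vm_private_data, struct foo_gem_object, base);
	pgoff_t pgoff = (vmf->address - vma->vm_start) >> PAGE_SHIFT;

	/* Here the pages were allocated up front; a driver could equally
	 * allocate the missing page at this point instead. */
	return vmf_insert_page(vma, vmf->address, obj->pages[pgoff]);
}

static const struct vm_operations_struct foo_gem_vm_ops = {
	.fault = foo_gem_fault,
	.open = drm_gem_vm_open,
	.close = drm_gem_vm_close,
};
```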

Drivers that want to map the GEM object upfront instead of handling page
faults can implement their own mmap file operation handler.

For platforms without MMU the GEM core provides a helper method
drm_gem_cma_get_unmapped_area(). The mmap() routines will call
this to get a proposed address for the mapping.

When mapped to the device or used in a command buffer, backing pages for
an object are flushed to memory and marked write combined so as to be
coherent with the GPU. Likewise, if the CPU accesses an object after the
GPU has finished rendering to the object, then the object must be made
coherent with the CPU’s view of memory, usually involving GPU cache
flushing of various kinds. This core CPU<->GPU coherency management is
provided by a device-specific ioctl, which evaluates an object’s current
domain and performs any necessary flushing or synchronization to put the
object into the desired coherency domain (note that the object may be
busy, i.e. an active render target; in that case, setting the domain
blocks the client and waits for rendering to complete before performing
any necessary flushing operations).

Perhaps the most important GEM function for GPU devices is providing a
command execution interface to clients. Client programs construct
command buffers containing references to previously allocated memory
objects, and then submit them to GEM. At that point, GEM takes care to
bind all the objects into the GTT, execute the buffer, and provide
necessary synchronization between clients accessing the same buffers.
This often involves evicting some objects from the GTT and re-binding
others (a fairly expensive operation), and providing relocation support
which hides fixed GTT offsets from clients. Clients must take care not
to submit command buffers that reference more objects than can fit in
the GTT; otherwise, GEM will reject them and no rendering will occur.
Similarly, if several objects in the buffer require fence registers to
be allocated for correct rendering (e.g. 2D blits on pre-965 chips),
care must be taken not to require more fence registers than are
available to the client. Such resource management should be abstracted
from the client in libdrm.

import_attach

Any foreign dma_buf imported as a GEM object has this set to the
attachment point for the device. This is invariant over the lifetime
of a GEM object.

The drm_driver.gem_free_object callback is responsible for cleaning
up the dma_buf attachment and references acquired at import time.

Note that the DRM GEM/PRIME core does not depend upon drivers setting
this field any more. So for drivers where this doesn't make sense
(e.g. virtual devices or a DisplayLink device behind a USB bus) they can
simply leave it as NULL.

resv

Pointer to the reservation object associated with this GEM object.

Normally (resv == &_resv) except for imported GEM objects.

_resv

A reservation object for this GEM object.

This is unused for imported GEM objects.

funcs

Optional GEM object functions. If this is set, it will be used instead of the
corresponding drm_driver GEM callbacks.

New drivers should use this.

Description

This structure defines the generic parts for GEM buffer objects, which are
mostly around handling mmap and userspace handles.

This macro autogenerates a suitable struct file_operations for GEM based
drivers, which can be assigned to drm_driver.fops. Note that this structure
cannot be shared between drivers, because it contains a reference to the
current module using THIS_MODULE.

Note that the declaration is already marked as static - if you need a
non-static version of this you’re probably doing it wrong and will break the
THIS_MODULE reference by accident.

Drivers should never call this directly in their code. Instead they should
wrap it up into a driver_gem_object_put(struct driver_gem_object *obj)
wrapper function, and use that. Shared code should never call this, to
avoid breaking drivers by accident which still depend upon
drm_device.struct_mutex locking.

GEM memory mapping works by handing back to userspace a fake mmap offset
it can use in a subsequent mmap(2) call. The DRM core code then looks
up the object based on the offset and sets up the various memory mapping
structures.

This reads the page-array of the shmem-backing storage of the given gem
object. An array of pages is returned. If a page is not allocated or
swapped-out, this will allocate/swap-in the required pages. Note that the
whole object is covered by the page-array and pinned in memory.

This uses the GFP-mask set on the shmem-mapping (see mapping_set_gfp_mask()).
If you require other GFP-masks, you have to do those allocations yourself.

Note that you are not allowed to change gfp-zones during runtime. That is,
shmem_read_mapping_page_gfp() must be called with the same gfp_zone(gfp) as
set during initialization. If you have special zone constraints, set them
after drm_gem_object_init() via mapping_set_gfp_mask(). shmem-core takes care
to keep pages in the required zone during swap-in.
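For example, a driver that needs its backing pages below 4 GiB could
constrain the mapping right after initialization (sketch; obj is a
hypothetical driver object embedding struct drm_gem_object):

```c
	ret = drm_gem_object_init(dev, &obj->base, size);
	if (ret)
		return ret;

	/* Must be done before any pages are allocated: keep the shmem
	 * backing store within the 32-bit DMA zone. */
	mapping_set_gfp_mask(obj->base.filp->f_mapping,
			     GFP_USER | __GFP_DMA32);
```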

Set up the VMA to prepare mapping of the GEM object using the gem_vm_ops
provided by the driver. Depending on their requirements, drivers can either
provide a fault handler in their gem_vm_ops (in which case any accesses to
the object will be trapped, to perform migration, GTT binding, surface
register allocation, or performance monitoring), or mmap the buffer memory
synchronously after calling drm_gem_mmap_obj().

This function is mainly intended to implement the DMABUF mmap operation, when
the GEM object is not looked up based on its fake offset. To implement the
DRM mmap operation, drivers should use the drm_gem_mmap() function.

drm_gem_mmap_obj() assumes the user is granted access to the buffer while
drm_gem_mmap() prevents unprivileged users from mapping random objects. So
callers must verify access restrictions before calling this helper.

Returns 0 on success or -EINVAL if the object size is smaller than the
VMA size, or if no gem_vm_ops are provided.

If a driver supports GEM object mapping, mmap calls on the DRM file
descriptor will end up here.

Look up the GEM object based on the offset passed in (vma->vm_pgoff will
contain the fake offset we created when the GTT map ioctl was called on
the object) and map it with a call to drm_gem_mmap_obj().

If the caller is not granted access to the buffer object, the mmap will fail
with EACCES. Please see the vma manager for more information.

The Contiguous Memory Allocator reserves a pool of memory at early boot
that is used to service requests for large blocks of contiguous memory.

The DRM GEM/CMA helpers use this allocator as a means to provide buffer
objects that are physically contiguous in memory. This is useful for
display drivers that are unable to map scattered buffers via an IOMMU.

This macro autogenerates a suitable struct file_operations for CMA based
drivers, which can be assigned to drm_driver.fops. Note that this structure
cannot be shared between drivers, because it contains a reference to the
current module using THIS_MODULE.

Note that the declaration is already marked as static - if you need a
non-static version of this you’re probably doing it wrong and will break the
THIS_MODULE reference by accident.

This function frees the backing memory of the CMA GEM object, cleans up the
GEM object state and frees the memory used to store the object itself.
If the buffer is imported and the virtual address is set, it is released.
Drivers using the CMA helpers should set this as their
drm_driver.gem_free_object_unlocked callback.

This aligns the pitch and size arguments to the minimum required. This is
an internal helper that can be wrapped by a driver to account for hardware
with more specific alignment requirements. It should not be used directly
by drivers as their drm_driver.dumb_create callback.

This function computes the pitch of the dumb buffer and rounds it up to an
integer number of bytes per pixel. Drivers for hardware that doesn’t have
any additional restrictions on the pitch can directly use this function as
their drm_driver.dumb_create callback.

For hardware with additional restrictions, drivers can adjust the fields
set up by userspace and pass the IOCTL data along to the
drm_gem_cma_dumb_create_internal() function.

This function implements an augmented version of the GEM DRM file mmap
operation for CMA objects: In addition to the usual GEM VMA setup it
immediately faults in the entire object instead of using on-demand
faulting. Drivers which employ the CMA helpers should use this function
as their ->mmap() handler in the DRM device file's file_operations
structure.

This function exports a scatter/gather table suitable for PRIME usage by
calling the standard DMA mapping API. Drivers using the CMA helpers should
set this as their drm_driver.gem_prime_get_sg_table callback.

Return

A pointer to the scatter/gather table of pinned pages or NULL on failure.

produce a CMA GEM object from another driver’s scatter/gather table of pinned pages

Parameters

struct drm_device *dev

device to import into

struct dma_buf_attachment *attach

DMA-BUF attachment

struct sg_table *sgt

scatter/gather table of pinned pages

Description

This function imports a scatter/gather table exported via DMA-BUF by
another driver. Imported buffers must be physically contiguous in memory
(i.e. the scatter/gather table must contain a single entry). Drivers that
use the CMA helpers should set this as their
drm_driver.gem_prime_import_sg_table callback.

Return

A pointer to a newly created GEM object or an ERR_PTR-encoded negative
error code on failure.

This function maps a buffer exported via DRM PRIME into the kernel’s
virtual address space. Since the CMA buffers are already mapped into the
kernel virtual address space this simply returns the cached virtual
address. Drivers using the CMA helpers should set this as their DRM
driver’s drm_driver.gem_prime_vmap callback.

This function removes a buffer exported via DRM PRIME from the kernel’s
virtual address space. This is a no-op because CMA buffers cannot be
unmapped from kernel space. Drivers using the CMA helpers should set this
as their drm_driver.gem_prime_vunmap callback.

PRIME import another driver’s scatter/gather table and get the virtual address of the buffer

Parameters

struct drm_device *dev

DRM device

struct dma_buf_attachment *attach

DMA-BUF attachment

struct sg_table *sgt

Scatter/gather table of pinned pages

Description

This function imports a scatter/gather table using
drm_gem_cma_prime_import_sg_table() and uses dma_buf_vmap() to get the kernel
virtual address. This ensures that a CMA GEM object always has its virtual
address set. This address is released when the object is freed.

The vma-manager is responsible for mapping arbitrary driver-dependent
memory regions into the linear user address-space. It provides offsets to
the caller which can then be used on the address_space of the drm-device.
It takes care to not overlap regions, size them appropriately and to not
confuse mm-core by inconsistent fake vm_pgoff fields. Drivers shouldn't
use this for object placement in VMEM. This manager should only be used
to manage mappings into linear user-space VMs.

We use drm_mm as backend to manage object allocations. But it is highly
optimized for alloc/free calls, not lookups. Hence, we use an rb-tree to
speed up offset lookups.

You must not use multiple offset managers on a single address_space.
Otherwise, mm-core will be unable to tear down memory mappings as the VM will
no longer be linear.

This offset manager works on page-based addresses. That is, every argument
and return code (with the exception of drm_vma_node_offset_addr()) is given
in number of pages, not number of bytes. That means, object sizes and offsets
must always be page-aligned (as usual).
If you want to get a valid byte-based user-space address for a given offset,
please see drm_vma_node_offset_addr().

Additionally to offset management, the vma offset manager also handles access
management. For every open-file context that is allowed to access a given
node, you must call drm_vma_node_allow(). Otherwise, an mmap() call on this
open-file with the offset of the node will fail with -EACCES. To revoke
access again, use drm_vma_node_revoke(). However, the caller is responsible
for destroying already existing mappings, if required.
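In a GEM driver the natural place for these calls is the per-file object
open and close hooks, for example (sketch; foo_* names are hypothetical,
and the tag argument type of drm_vma_node_allow() has changed across
kernel versions):

```c
static int foo_gem_open_object(struct drm_gem_object *obj,
			       struct drm_file *file)
{
	/* Allow this open-file to map the object's fake offset. */
	return drm_vma_node_allow(&obj->vma_node, file);
}

static void foo_gem_close_object(struct drm_gem_object *obj,
				 struct drm_file *file)
{
	/* Revoke access; existing mappings must be torn down separately. */
	drm_vma_node_revoke(&obj->vma_node, file);
}
```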

Lock VMA manager for extended lookups. Only locked VMA function calls
are allowed while holding this lock. All other contexts are blocked from
the VMA manager until the lock is released via
drm_vma_offset_unlock_lookup().

Return the start address of the given node. This can be used as offset into
the linear VM space that is provided by the VMA offset manager. Note that
this can only be used for page-based addressing. If you need a proper offset
for user-space mappings, you must apply “<< PAGE_SHIFT” or use the
drm_vma_node_offset_addr() helper instead.

Return

Start address of node for page-based addressing. 0 if the node does not
have an offset allocated.

Initialize a new offset-manager. The offset and area size available for the
manager are given as page_offset and size. Both are interpreted as
page-numbers, not bytes.

Adding/removing nodes from the manager is locked internally and protected
against concurrent access. However, node allocation and destruction is left
for the caller. While calling into the vma-manager, a given node must
always be guaranteed to be referenced.

Destroy an object manager which was previously created via
drm_vma_offset_manager_init(). The caller must remove all allocated nodes
before destroying the manager. Otherwise, drm_mm will refuse to free the
requested resources.

Find a node given a start address and object size. This returns the _best_
match for the given node. That is, start may point somewhere into a valid
region and the given node will be returned, as long as the node spans the
whole requested area (given the size in number of pages as pages).

Note that before lookup the vma offset manager lookup lock must be acquired
with drm_vma_offset_lock_lookup(). See there for an example. This can then be
used to implement weakly referenced lookups using kref_get_unless_zero().

Add a node to the offset-manager. If the node was already added, this
does nothing and returns 0. pages is the size of the object given in
number of pages.
After this call succeeds, you can access the offset of the node until it
is removed again.

If this call fails, it is safe to retry the operation or call
drm_vma_offset_remove(), anyway. However, no cleanup is required in that
case.

pages is not required to be the same size as the underlying memory object
that you want to map. It only limits the size that user-space can map into
their address space.

Similar to GEM global names, PRIME file descriptors are also used to
share buffer objects across processes. They offer additional security:
as file descriptors must be explicitly sent over UNIX domain sockets to
be shared between applications, they can’t be guessed like the globally
unique GEM names.

Drivers that support the PRIME API must set the DRIVER_PRIME bit in the
struct drm_driver driver_features field, and implement the
prime_handle_to_fd and prime_fd_to_handle operations.

While non-GEM drivers must implement the operations themselves, GEM
drivers must use the drm_gem_prime_handle_to_fd() and
drm_gem_prime_fd_to_handle() helper functions. Those
helpers rely on the driver gem_prime_export and gem_prime_import
operations to create a dma-buf instance from a GEM object (dma-buf
exporter role) and to create a GEM object from a dma-buf instance
(dma-buf importer role).

Drivers can implement gem_prime_export and gem_prime_import in terms of
simpler APIs by using the helper functions drm_gem_prime_export and
drm_gem_prime_import. These functions implement dma-buf support in terms of
six lower-level driver callbacks:

This is the PRIME export function which GEM drivers must use to ensure
correct lifetime management of the underlying GEM object. The actual
exporting from GEM object to a dma-buf is done through the
gem_prime_export driver callback.

This function sets up a userspace mapping for PRIME exported buffers using
the same codepath that is used for regular GEM buffer mapping on the DRM fd.
The fake GEM offset is added to vma->vm_pgoff and drm_driver->fops->mmap is
called to set up the mapping.

This is the PRIME import function which GEM drivers must use to ensure
correct lifetime management of the underlying GEM object. The actual
importing of a GEM object from the dma-buf is done through the
gem_prime_import driver callback.
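Wired into a hypothetical struct drm_driver, the PRIME pieces described
above fit together as in this sketch (the foo_* callbacks are
driver-specific and hypothetical):

```c
static struct drm_driver foo_driver = {
	.driver_features = DRIVER_GEM | DRIVER_PRIME | DRIVER_MODESET,

	/* Generic helpers for the handle <-> fd conversion ioctls. */
	.prime_handle_to_fd = drm_gem_prime_handle_to_fd,
	.prime_fd_to_handle = drm_gem_prime_fd_to_handle,

	/* Helper-based export and import ... */
	.gem_prime_export = drm_gem_prime_export,
	.gem_prime_import = drm_gem_prime_import,

	/* ... built on lower-level, driver-specific callbacks. */
	.gem_prime_get_sg_table = foo_gem_get_sg_table,
	.gem_prime_import_sg_table = foo_gem_import_sg_table,
	.gem_prime_vmap = foo_gem_vmap,
	.gem_prime_vunmap = foo_gem_vunmap,
	.gem_prime_mmap = drm_gem_prime_mmap,
	/* ... */
};
```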

drm_mm provides a simple range allocator. The drivers are free to use the
resource allocator from the linux core if it suits them, the upside of
drm_mm is that it's in the DRM core, which means that it's easier to
extend for some of the crazier special purpose needs of GPUs.

The main data struct is drm_mm, allocations are tracked in drm_mm_node.
Drivers are free to embed either of them into their own suitable
datastructures. drm_mm itself will not do any memory allocations of its own,
so if drivers choose not to embed nodes they need to still allocate them
themselves.

The range allocator also supports reservation of preallocated blocks. This is
useful for taking over initial mode setting configurations from the firmware,
where an object needs to be created which exactly matches the firmware’s
scanout target. As long as the range is still free it can be inserted anytime
after the allocator is initialized, which helps with avoiding looped
dependencies in the driver load sequence.

drm_mm maintains a stack of most recently freed holes, which of all
simplistic datastructures seems to be a fairly decent approach to clustering
allocations and avoiding too much fragmentation. This means free space
searches are O(num_holes). Given all the fancy features drm_mm supports,
something better would be fairly complex, and since gfx thrashing is a
fairly steep cliff this is not a real concern. Removing a node again is O(1).

drm_mm supports a few features: Alignment and range restrictions can be
supplied. Furthermore every drm_mm_node has a color value (which is just an
opaque unsigned long) which in conjunction with a driver callback can be used
to implement sophisticated placement restrictions. The i915 DRM driver uses
this to implement guard pages between incompatible caching domains in the
graphics TT.

Two behaviors are supported for searching and allocating: bottom-up and
top-down. The default is bottom-up. Top-down allocation can be used if the
memory area has different restrictions, or just to reduce fragmentation.

Finally iteration helpers to walk all nodes and all holes are provided as are
some basic allocator dumpers for debugging.

Note that this range allocator is not thread-safe, drivers need to protect
modifications with their own locking. The idea behind this is that for a full
memory manager additional data needs to be protected anyway, hence internal
locking would be fully redundant.
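A minimal usage pattern, with the driver supplying its own lock, might
look like this sketch (foo_* names are hypothetical; sizes are in
whatever units the driver chose at drm_mm_init() time):

```c
struct foo_aperture {
	struct drm_mm mm;
	struct mutex lock;	/* drm_mm is not thread-safe by itself */
};

static void foo_aperture_init(struct foo_aperture *ap, u64 start, u64 size)
{
	mutex_init(&ap->lock);
	drm_mm_init(&ap->mm, start, size);
}

static int foo_aperture_alloc(struct foo_aperture *ap,
			      struct drm_mm_node *node, u64 size)
{
	int ret;

	mutex_lock(&ap->lock);
	ret = drm_mm_insert_node(&ap->mm, node, size);
	mutex_unlock(&ap->lock);
	return ret;	/* -ENOSPC if no suitable hole was found */
}

static void foo_aperture_free(struct foo_aperture *ap,
			      struct drm_mm_node *node)
{
	mutex_lock(&ap->lock);
	drm_mm_remove_node(node);
	mutex_unlock(&ap->lock);
}
```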

Very often GPUs need to have contiguous allocations for a given object.
When evicting objects to make space for a new one it is therefore not
most efficient when we simply start to select all objects from the tail
of an LRU until there's a suitable hole: Especially for big objects or
nodes that otherwise have special allocation constraints there's a good
chance we evict lots of (smaller) objects unnecessarily.

The DRM range allocator supports this use-case through the scanning
interfaces. First a scan operation needs to be initialized with
drm_mm_scan_init() or drm_mm_scan_init_with_range(). The driver adds
objects to the roster, probably by walking an LRU list, but this can be
freely implemented. Eviction candidates are added using
drm_mm_scan_add_block() until a suitable hole is found or there are no
further evictable objects. Eviction roster metadata is tracked in
struct drm_mm_scan.

The driver must walk through all objects again in exactly the reverse
order to restore the allocator state. Note that while the allocator is used
in the scan mode no other operation is allowed.

Finally the driver evicts all objects selected (drm_mm_scan_remove_block()
reported true) in the scan, and any overlapping nodes after color adjustment
(drm_mm_scan_color_evict()). Adding and removing an object is O(1), and
since freeing a node is also O(1) the overall complexity is
O(scanned_objects). So, like the free stack which needs to be walked before a
scan operation even begins, this is linear in the number of objects. It
doesn’t seem to hurt too badly.
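The scan workflow described above can be sketched as follows. The my_obj type, its lru_link/eviction_link list heads and the my_obj_evict() helper are all hypothetical driver constructs invented for the example; only the drm_mm_scan_* calls are real API.

```c
/* Sketch of an eviction scan: build a roster from an LRU, then walk it
 * in reverse to restore the allocator state and evict selected blocks. */
static int my_evict_for(struct drm_mm *mm, struct list_head *lru,
			u64 size, u64 alignment)
{
	struct drm_mm_scan scan;
	struct my_obj *obj, *tmp;	/* my_obj is an illustrative type */
	LIST_HEAD(eviction_list);
	bool found = false;

	drm_mm_scan_init(&scan, mm, size, alignment, 0, DRM_MM_INSERT_BEST);

	/* Build the eviction roster by walking the LRU. list_add()
	 * prepends, so eviction_list ends up in reverse addition order. */
	list_for_each_entry(obj, lru, lru_link) {
		list_add(&obj->eviction_link, &eviction_list);
		if (drm_mm_scan_add_block(&scan, &obj->node)) {
			found = true;
			break;
		}
	}

	/* Walk all roster entries again, in exactly the reverse order,
	 * to restore the allocator state; evict the selected blocks. */
	list_for_each_entry_safe(obj, tmp, &eviction_list, eviction_link) {
		if (drm_mm_scan_remove_block(&scan, &obj->node))
			my_obj_evict(obj);	/* hypothetical helper */
	}

	return found ? 0 : -ENOSPC;
}
```

After the selected blocks have actually been freed, the new node would typically be inserted with DRM_MM_INSERT_EVICT so that the freshly created hole is reused.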

DRM_MM_INSERT_BEST

Search for the smallest hole (within the search range) that fits
the desired node.

Allocates the node from the bottom of the found hole.

DRM_MM_INSERT_LOW

Search for the lowest hole (address closest to 0, within the search
range) that fits the desired node.

Allocates the node from the bottom of the found hole.

DRM_MM_INSERT_HIGH

Search for the highest hole (address closest to U64_MAX, within the
search range) that fits the desired node.

Allocates the node from the top of the found hole. The specified
alignment for the node is applied to the base of the node
(drm_mm_node.start).

DRM_MM_INSERT_EVICT

Search for the most recently evicted hole (within the search range)
that fits the desired node. This is appropriate for use immediately
after performing an eviction scan (see drm_mm_scan_init()) and
removing the selected nodes to form a hole.

Allocates the node from the bottom of the found hole.

DRM_MM_INSERT_ONCE

Only check the first hole for suitability and report -ENOSPC
immediately otherwise, rather than check every hole until a
suitable one is found. Can only be used in conjunction with another
search method such as DRM_MM_INSERT_HIGH or DRM_MM_INSERT_LOW.

DRM_MM_INSERT_HIGHEST

Only check the highest hole (the hole with the largest address) and
insert the node at the top of the hole or report -ENOSPC if
unsuitable.

Does not search all holes.

DRM_MM_INSERT_LOWEST

Only check the lowest hole (the hole with the smallest address) and
insert the node at the bottom of the hole or report -ENOSPC if
unsuitable.

Does not search all holes.

Description

The struct drm_mm range manager supports finding a suitable hole using
a number of search trees. These trees are organised by size, by address and
in most-recent eviction order. This allows the user to find either the
smallest hole to reuse, the lowest or highest address to reuse, or simply
reuse the most recent eviction that fits. When allocating the drm_mm_node
from within the hole, the drm_mm_insert_mode also dictates whether to
allocate the lowest matching address or the highest.
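As a small illustration of choosing an insert mode, the call below asks for a top-down allocation within the full address space; the size, alignment and range values are arbitrary examples.

```c
/* Sketch: allocate a 64 KiB, 4 KiB-aligned node from the top of the
 * address space, e.g. to keep low addresses free for objects with
 * stricter placement requirements. Values are illustrative. */
struct drm_mm_node node = {};
int ret;

ret = drm_mm_insert_node_in_range(&mm, &node,
				  SZ_64K,		/* size */
				  SZ_4K,		/* alignment */
				  0,			/* color */
				  0, U64_MAX,		/* search range */
				  DRM_MM_INSERT_HIGH);
if (ret)
	return ret;	/* -ENOSPC if no suitable hole was found */
```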

This represents an allocated block in a drm_mm allocator. Except for
pre-reserved nodes inserted using drm_mm_reserve_node() the structure is
entirely opaque and should only be accessed through the provided functions.
Since allocation of these nodes is entirely handled by the driver they can be
embedded.

Optional driver callback to further apply restrictions on a hole. The
node argument points at the node containing the hole from which the
block would be allocated (see drm_mm_hole_follows() and friends). The
other arguments include the size of the block to be allocated. The driver
can adjust the start and end as needed to e.g. insert guard pages.
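A color_adjust hook in the spirit of i915's guard pages might look like the sketch below. GUARD_SZ and the overall policy are assumptions for the example; only the callback signature and the drm_mm_node_allocated() helper are real API.

```c
/* Sketch of a color_adjust hook that keeps a one-page guard between
 * nodes of differing colors (e.g. incompatible caching domains).
 * GUARD_SZ is an illustrative constant. */
#define GUARD_SZ SZ_4K

static void my_color_adjust(const struct drm_mm_node *node,
			    unsigned long color, u64 *start, u64 *end)
{
	/* node precedes the hole under consideration */
	if (drm_mm_node_allocated(node) && node->color != color)
		*start += GUARD_SZ;

	/* check the node following the hole as well */
	node = list_next_entry(node, node_list);
	if (drm_mm_node_allocated(node) && node->color != color)
		*end -= GUARD_SZ;
}
```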

Description

DRM range allocator with a few special functions and features geared towards
managing GPU memory. Except for the color_adjust callback the structure is
entirely opaque and should only be accessed through the provided functions
and macros. This structure can be embedded into larger driver structures.

As the drm_mm range manager hides its node_list deep within its
structure, extracting it looks painful and repetitive. This is
not expected to be used outside of the drm_mm_for_each_node()
macros and similar internal functions.

This iterator walks over all holes in the range allocator. It is implemented
with list_for_each(), so it is not safe against removal of elements. entry is
used internally and will not reflect a real drm_mm_node for the very first
hole. Hence users of this iterator may not access it.

Implementation Note:
We need to inline list_for_each_entry in order to be able to set hole_start
and hole_end on each iteration while keeping the macro sane.

This iterator walks over all nodes in the range allocator that lie
between start and end. It is implemented similarly to list_for_each(),
but using the internal interval tree to accelerate the search for the
starting node, and so not safe against removal of elements. It assumes
that end is within (or is the upper limit of) the drm_mm allocator.
If [start, end] are beyond the range of the drm_mm, the iterator may walk
over the special _unallocated_ drm_mm.head_node, and may even continue
indefinitely.
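A minimal use of the range iterator looks like the sketch below; the bounds are arbitrary example values within the allocator's range.

```c
/* Sketch: count the allocated nodes overlapping a sub-range of the
 * allocator. The range bounds are illustrative. */
struct drm_mm_node *node;
unsigned int count = 0;

drm_mm_for_each_node_in_range(node, &mm, 0x100000, 0x200000)
	count++;	/* node is only valid inside the loop body */
```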

This function inserts an already set-up drm_mm_node into the allocator,
meaning that start, size and color must be set by the caller. All other
fields must be cleared to 0. This is useful to initialize the allocator with
preallocated objects which must be set-up before the range allocator can be
set-up, e.g. when taking over a firmware framebuffer.
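The firmware-framebuffer case can be sketched as follows; fw_fb_base and fw_fb_size are hypothetical values the driver would obtain from the firmware.

```c
/* Sketch: pre-reserve the range occupied by a firmware framebuffer so
 * the allocator never hands it out. fw_fb_base/fw_fb_size are
 * illustrative, as reported by the firmware. */
struct drm_mm_node fb_node = {};	/* all other fields must be 0 */
int ret;

fb_node.start = fw_fb_base;
fb_node.size = fw_fb_size;
fb_node.color = 0;

ret = drm_mm_reserve_node(&mm, &fb_node);
if (ret)
	return ret;	/* -ENOSPC if the range is already occupied */
```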

This just removes a node from its drm_mm allocator. The node does not need to
be cleared again before it can be re-inserted into this or any other drm_mm
allocator. It is a bug to call this function on an unallocated node.

This is useful for when drivers embed the drm_mm_node structure and hence
can’t move allocations by reassigning pointers. It’s a combination of remove
and insert with the guarantee that the allocation start will match.

Nodes must be removed in exactly the reverse order from the scan list as
they have been added (e.g. using list_add() as they are added and then
list_for_each() over that eviction list to remove), otherwise the internal
state of the memory manager will be corrupted.

When the scan list is empty, the selected memory nodes can be freed. An
immediately following drm_mm_insert_node_in_range_generic() or one of the
simpler versions of that function with !DRM_MM_SEARCH_BEST will then return
the just freed block (because it’s at the top of the free_stack list).

Return

True if this block should be evicted, false otherwise. Will always
return false when no hole has been found.

The GPU scheduler provides entities which allow userspace to push jobs
into software queues which are then scheduled on a hardware run queue.
The software queues have a priority among them. The scheduler selects the
entities from the run queue using a FIFO. The scheduler provides dependency
handling features among jobs. The driver is expected to provide the scheduler
with callback functions for backend operations, such as submitting a job to
the hardware run queue and returning the dependencies of a job.

The organisation of the scheduler is the following:

Each hw run queue has one scheduler

Each scheduler has multiple run queues with different priorities
(e.g., HIGH_HW, HIGH_SW, KERNEL, NORMAL)

Each scheduler run queue has a queue of entities to schedule

Entities themselves maintain a queue of jobs that will be scheduled on
the hardware.

The jobs in an entity are always scheduled in the order in which they were pushed.
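The backend callbacks a driver hands to the scheduler can be sketched as below. The my_* functions and to_my_job() are hypothetical driver helpers, and the exact set of drm_sched_backend_ops members varies between kernel versions; only the general shape is shown.

```c
/* Sketch of a driver's scheduler backend ops. my_* names and
 * to_my_job() are illustrative driver helpers. */
static struct dma_fence *my_run_job(struct drm_sched_job *sched_job)
{
	/* push the job to the hardware run queue and return the
	 * hardware fence that signals on completion */
	return my_hw_submit(to_my_job(sched_job));
}

static enum drm_gpu_sched_stat my_timedout_job(struct drm_sched_job *sched_job)
{
	/* a job hung: reset the engine, let the scheduler recover */
	my_hw_reset(to_my_job(sched_job));
	return DRM_GPU_SCHED_STAT_NOMINAL;
}

static void my_free_job(struct drm_sched_job *sched_job)
{
	drm_sched_job_cleanup(sched_job);
	kfree(to_my_job(sched_job));
}

static const struct drm_sched_backend_ops my_sched_ops = {
	.run_job	= my_run_job,
	.timedout_job	= my_timedout_job,
	.free_job	= my_free_job,
};
```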

scheduled

this fence is what will be signaled by the scheduler
when the job is scheduled.

finished

this fence is what will be signaled by the scheduler
when the job is completed.

When setting up an out fence for the job, you should use
this, since it’s available immediately upon
drm_sched_job_init(), and the fence returned by the driver
from run_job() won’t be created until the dependencies have
resolved.
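Assuming a driver job structure that embeds struct drm_sched_job as job->base (an illustrative layout), grabbing the finished fence for use as an out fence might look like:

```c
/* Sketch: after drm_sched_job_init() has succeeded, the finished fence
 * is already available and can be exposed as the job's out fence.
 * job->base embedding struct drm_sched_job is an assumed layout. */
struct dma_fence *out_fence =
	dma_fence_get(&job->base.s_fence->finished);
```

This avoids having to wait for run_job(), whose returned fence does not exist until the job's dependencies have resolved.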

Suspend the delayed work timeout for the scheduler. This is done by
modifying the delayed work timeout to an arbitrary large value,
MAX_SCHEDULE_TIMEOUT in this case. Note that this function can be
called from an IRQ context.