Support targeting Xv or other specialized framebuffers for very low-end devices?

Support hardware-accelerated video decoding and playback?

Accelerated composition of content from multiple processes, including untrusted processes that may lack access to native accelerated graphics APIs

Requirements

Should have an efficient software fallback

In particular, should be able to be implemented as mostly immediate-mode, with minimal temporary buffers

Cross-platform API

Don't make clients implement multiple rendering code paths

Abstract over hardware acceleration APIs (e.g. D3D vs GL), especially since the D3D vs GL decision on Windows is in flux and likely to remain so for some time

Be able to use platform layer APIs (e.g. Core Animation) if we get backed into a corner with undocumented interfaces?

To support the above flexibility of implementation, and also to allow incremental implementation, we need the ability to "fall back" to software rendering of a layer subtree without the user of layers having to know about it

Support VRAM management; we'd like to be able to discard buffers and re-render them when VRAM is full, including with fine granularity via a tile cache

Support subpixel-AA of text rendered onto an opaque background

Testable. In particular, we should be able to use reftests to test every layer implementation.

Ability to render an arbitrary element or frame subtree to an arbitrary context at any time. Use cases:

drawWindow

drag-and-drop feedback drawing

-moz-image-element

Printing individual pages

Of those, drawWindow and -moz-image-element need to be fast. -moz-image-element might even want direct layer support so animated content can be reflected in sync even when animation is happening off the main thread

Open questions

Should we build the layer tree if we are not hardware accelerated?

Roc: I want code above gfx to have a single code path that works regardless of whether we're using acceleration or not. That means we want to be using the layer API all the time. However, in non-accelerated situations the layer API could permit an implementation that's mostly immediate-mode drawing --- that's the approach I took in my proposal.

Should layers always have a backing store?

Jeff is leaning toward yes. I believe Roc is leaning toward no. One thing to think about would be the idea of layers having a virtual backing store, i.e. layers that don't currently have a backing store would be "faulted in" by doing the drawing.

Jeff: One advantage of them always having a backing store is that it gives the users of the API some performance guarantees. I'd argue that without these the layer abstraction becomes less valuable and perhaps duplicates the frame tree?

Roc: assuming that "container" layers are always formed by compositing together their children, what is the point of having backing store for them? For leaf layers, I assume we would keep them in backing store (as long as clients hold references to them, which may only be as long as one paint cycle). My proposal makes no provision for "faulting in" the contents of discarded layers.

Jeff: My response to that is: what's the point of creating layers for things that we don't want a backing store for?

Roc: so we can do hardware accelerated operations on them

It seems that after IRC conversation, we currently don't have any argument for having the API support drawing directly into container layers. Whether the rendering of a container layer is cached in a surface is an implementation detail, which may vary based on speed-vs-VRAM tradeoffs.

How is layer z-order maintained?

Roc: child order, i.e. the nth child is beneath the (n+1)th child

Use case

Assume we have a webpage with a container containing some web content and a video. This container has a 3d-transform applied to it. What should the layer hierarchy look like?

Roc:

Master layer for the page or window (no drawn content of its own)

Retained content that's behind the transformed container

Transformed container layer (no drawn content of its own)

Retained content behind the video

Video (YUV)

Retained content in front of the video (not needed if there is no such content)

Retained content that's in front of the transformed container (not needed if there is no such content)

Roc

Display list processing

Taking a step back, let me write down how I think layout and display lists could/should build and maintain a layer tree.

Currently when we paint we only build a display list for the items that intersect the bounding rectangle of the invalid area. Instead let's assume we have to build a display list for the entire visible window area. This is not likely to be a performance problem. This does not mean we actually have to repaint the entire visible window every time, it just means we figure out what would be painted.

Then traversing the display list lets us (re)build the layer tree.

There are three kinds of layers:
1) Explicit container layers that we create because we need to take advantage of hardware acceleration. Example: element with transform, element with opacity.
2) Explicit leaf layers that we create because we need to take advantage of hardware acceleration. Example: video YUV layer, image element with transform or opacity.
3) Implicit "in between" layers that represent all the content rendered between two explicit layers, or before the first explicit layer of a container, or after the last explicit layer of a container. These are always leaves. All cairo-rendered content is in these layers.

Observation: An explicit layer is always associated with a single frame. (Although a single frame may render more than just its layer, e.g. an nsVideoFrame can have background, border and outline display items as well as the video YUV layer.) We can record in such a frame its explicit layer(s), if any. I think the maximum number of explicit layers a frame might need is 2: one "intrinsic" layer, e.g. the YUV layer for video, and one "extrinsic" layer, e.g. the layer induced by transforms or opacity. For example an nsVideoFrame with borders and a 3D transform would need two layers. I think each frame can have 0 or 1 explicit container layers and 0 or 1 explicit leaf layers. Any kind of frame can have an explicit container layer, but only certain types of frames have explicit leaf layers.

Observation: in-between layers can span arbitrary chunks of the frame tree. Worse, different parts of the same frame can be at different z-levels, and since an in-between layer represents a run of z-contiguous display items, different display items of the same frame can be in different in-between layers, and a single in-between layer can contain display items for frames very far apart in the frame tree. HOWEVER, every in-between layer is either the layer before some explicit layer, or the last layer of some container.

The display list system actually builds a tree of lists. A display item corresponding to an explicit container layer will have a child list with the container contents. If we treat the outermost list for the window as the children of a root container layer, then our job is, given a display list representing the children of a container layer, determine the child layers:

For any items in the list that represent the explicit container layer for a frame, that layer will be a child layer. We can retrieve this layer if it's cached in the frame. But we need to recursively descend into these display items to determine their child layers.

For any items in the list that represent the explicit leaf layer for a frame, that layer will be a child layer. We can retrieve this layer if it's cached in the frame.

For each run of consecutive items in the list that are not an explicit container or leaf layer, we have one implicit layer. It seems painful to attach this layer to any frame or another layer. However, we can keep a list of them in their container layer.
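
The three rules above can be sketched as a single pass over a container's display list. This is only an illustration under simplifying assumptions: DisplayItem, ItemKind, LayerNode and BuildChildLayers are hypothetical names invented here, not existing Gecko classes, and the sketch ignores caching layers in frames as well as the case of a non-layer item whose child list contains an explicit layer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for display items; not Gecko's real classes.
enum class ItemKind { Plain, ExplicitLeaf, ExplicitContainer };

struct DisplayItem {
  ItemKind kind;
  std::string name;
  std::vector<DisplayItem> children;  // only used by ExplicitContainer
};

enum class LayerKind { Implicit, ExplicitLeaf, Container };

struct LayerNode {
  LayerKind kind;
  std::vector<std::string> items;   // names of display items in this layer
  std::vector<LayerNode> children;  // only used by Container
};

// Given the display list for one container, compute its child layers.
std::vector<LayerNode> BuildChildLayers(const std::vector<DisplayItem>& aList) {
  std::vector<LayerNode> layers;
  for (const DisplayItem& item : aList) {
    switch (item.kind) {
      case ItemKind::ExplicitContainer:
        // Explicit container: becomes a child layer, and we recurse.
        layers.push_back(LayerNode{LayerKind::Container, {item.name},
                                   BuildChildLayers(item.children)});
        break;
      case ItemKind::ExplicitLeaf:
        // Explicit leaf (e.g. video YUV): becomes a child layer as-is.
        layers.push_back(LayerNode{LayerKind::ExplicitLeaf, {item.name}, {}});
        break;
      case ItemKind::Plain:
        // A run of consecutive plain items shares one implicit layer.
        if (layers.empty() || layers.back().kind != LayerKind::Implicit) {
          layers.push_back(LayerNode{LayerKind::Implicit, {}, {}});
        }
        layers.back().items.push_back(item.name);
        break;
    }
  }
  return layers;
}
```

Fed the "transformed container with a video" use case from earlier, this yields an implicit layer, a container with three children (implicit, video, implicit), and a trailing implicit layer, which matches the hierarchy Roc listed.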

An item in the display list might not have an explicit layer, but might contain a child list containing an item with an explicit layer, etc. This is a problem. We basically need to disable explicit layers for children in this situation, OR guarantee this never happens, by ensuring that all our display items that contain their own display lists can be implemented as explicit layers. For example, consider an element with an SVG filter applied that contains an element with a 3D transform. If we're going to accelerate the 3D transform and do it off the main thread, we also need the compositing-thread to apply the SVG filter (preferably accelerated), so SVG filters need to be supported by the layer API. What needs to be supported:

SVG foreignObject inside arbitrary SVG gunk ... hmm. Maybe if we support all the above, *and* render SVG using display lists (which we currently don't), that would be enough.

(jrmuizel: webkit doesn't have this problem because they don't support these kinds of rendering effects. -webkit-mask-image works fine, likely because it's easy to accelerate; -webkit-box-reflect doesn't seem to work but would be pretty easy to accelerate as well)

The hard remaining problem is to know how to reuse or update the implicit layers belonging to a container. The content rendered in those layers may have been invalidated. Updating the invalid rectangle in each "implicit" layer is not that hard. The hard part is knowing when to create and/or destroy implicit layers because content has changed z-index, or an explicit layer child has been inserted or removed. I think we can do this with special flags to tag invalidations caused by a frame with an explicit layer child being inserted or removed.

Thought experiment: suppose we have a regular Web page so everything can be rendered in one implicit layer. Then suppose someone inserts a small element with its own layer (say a 3D-transformed button) into the page, or it gets scrolled into view from offscreen. Suddenly the page content splits into two implicit layers, the stuff below the button and the stuff above the button. From the display list we can easily compute the bounding regions of these two layers. If they're both non-empty, then we need to split the old layer into two layers. We need to re-render at least the intersection of the two regions. I think this can be done fairly efficiently, but it will be tricky.

Implicit layers can also change size, need to be scrolled, etc.

Proposal

LayerManager

Every layer belongs to a LayerManager. The idea is that a window will provide an associated LayerManager. (Documents being rendered by their own process will probably have their own LayerManagers.) Every layer in a layer tree must have the same LayerManager.

Updates to the layer tree are performed within a transaction. Nested transactions are not needed or allowed. Only layer tree states between transactions will be rendered. Unless otherwise noted, all layer-related APIs may only be used within a transaction on the appropriate LayerManager, and only on the main thread.

LayerManager::SetRoot sets a layer to be the root layer. This is how you get a layer to be displayed.
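
A minimal sketch of the transaction rules, assuming hypothetical BeginTransaction/EndTransaction/RenderedRoot methods and std::shared_ptr in place of whatever refcounting the real implementation uses. It only models the root pointer: changes made inside a transaction become visible to rendering when the transaction ends, nesting is refused, and SetRoot is refused outside a transaction.

```cpp
#include <cassert>
#include <memory>
#include <utility>

class Layer {
 public:
  virtual ~Layer() = default;
};

// Hypothetical model; method names other than SetRoot are assumptions.
class LayerManager {
 public:
  // Returns false if a transaction is already open (no nesting).
  bool BeginTransaction() {
    if (mInTransaction) return false;
    mInTransaction = true;
    return true;
  }

  // Publishes the state built during the transaction to the renderer.
  void EndTransaction() {
    if (!mInTransaction) return;
    mRenderedRoot = mPendingRoot;
    mInTransaction = false;
  }

  // Only legal inside a transaction; returns false otherwise.
  bool SetRoot(std::shared_ptr<Layer> aLayer) {
    if (!mInTransaction) return false;
    mPendingRoot = std::move(aLayer);
    return true;
  }

  // What a compositor would see: only states between transactions.
  std::shared_ptr<Layer> RenderedRoot() const { return mRenderedRoot; }

 private:
  bool mInTransaction = false;
  std::shared_ptr<Layer> mPendingRoot;
  std::shared_ptr<Layer> mRenderedRoot;
};
```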

Layer

Layer is the superclass of all layers. A Layer could be anything that can be rendered into a destination surface. It has methods to set various properties that affect the rendering of a layer into its parent: opacity, transform, filter, etc... (For simplicity I'm only showing 'opacity' here.)

Animated properties are supported. To animate, call a setter method one or more times, passing in different timestamped values. When we render, we use the last supplied property value that is before the current time. When calling a setter method with TimeStamp T, all values for times >= T are discarded. To set a constant value, just pass a null TimeStamp, which is interpreted as being at the beginning of time. The interpolationFlag can be 'none', which means that the value changes instantly from its previous value to the new value at time T, or 'linear', which means that during the interval between the time of the previous sample and T, we linearly interpolate from the previous value to the new value.

(We could add richer easing functions, but I think it makes more sense to keep layers simple and have layout perform piecewise-linear approximation of complex functions.)
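
To make the setter semantics concrete, here is a rough sketch of one animated float property such as opacity. The names AnimatedFloat, Set, ValueAt and kBeginningOfTime are made up for illustration, time is a plain double rather than a TimeStamp, and the null TimeStamp is modelled as negative infinity. Interpolating from a sample at the beginning of time is not defined by the text, so this sketch simply holds the old value in that case.

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

enum class Interpolation { None, Linear };

// Stand-in for a null TimeStamp ("the beginning of time").
constexpr double kBeginningOfTime = -std::numeric_limits<double>::infinity();

class AnimatedFloat {
  struct Sample {
    double time;
    float value;
    Interpolation interp;  // how we arrive at this sample from the previous one
  };

 public:
  // Discards all samples at times >= aTime, then appends the new one.
  void Set(double aTime, float aValue,
           Interpolation aInterp = Interpolation::None) {
    while (!mSamples.empty() && mSamples.back().time >= aTime) {
      mSamples.pop_back();
    }
    mSamples.push_back(Sample{aTime, aValue, aInterp});
  }

  // Value to use when rendering at aNow; aDefault if nothing applies yet.
  float ValueAt(double aNow, float aDefault) const {
    const Sample* prev = nullptr;
    for (const Sample& s : mSamples) {
      if (s.time <= aNow) {
        prev = &s;
        continue;
      }
      // s is the first sample in the future.
      if (prev && s.interp == Interpolation::Linear &&
          std::isfinite(prev->time)) {
        double t = (aNow - prev->time) / (s.time - prev->time);
        return static_cast<float>(prev->value + t * (s.value - prev->value));
      }
      break;
    }
    return prev ? prev->value : aDefault;
  }

 private:
  std::vector<Sample> mSamples;  // kept sorted by time
};
```

For example, Set(0.0, 1.0f) followed by Set(10.0, 0.0f, Interpolation::Linear) fades opacity from 1 to 0 over ten time units; a later Set(5.0, 0.25f) throws away the sample at time 10.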

Layers should be reference counted. The LayerManager holds a reference to its root layer and parent layers hold references to their children.

RenderedLayer

A RenderedLayer is the basic leaf layer that you can render into using cairo/Thebes.

RenderedLayers are conceptually infinite in extent. Each RenderedLayer has an internal "valid region" which is finite. (An implementation would create a surface large enough to hold the entire valid region.) The initial valid region is empty. The implementation is allowed to discard all or part of the buffered contents of a RenderedLayer by removing areas from the valid region. Drawing into the RenderedLayer adds to the valid region.

When calling beginDraw, the caller specifies in aVisibleRegion a region that needs to be valid when drawing is done. (This is the area that will be visible to the user.) The caller can also specify aChangedRegion to indicate that content in that region has changed and will need to be repainted. The implementation returns aRegionToDraw to indicate the area that must be repainted. Typically this will be aVisibleRegion minus (the currently valid region minus aChangedRegion). aRegionToDraw must not extend outside aVisibleRegion. aRegionToDraw can be added to the valid region. The returned gfxContext is clipped to aRegionToDraw.

No layer operations are permitted between a beginDraw and endDraw; you have to finish drawing before continuing to modify the layer tree.

In beginDraw, when 'aOpaque' is true, the caller promises to fill the entire aRegionToDraw with opaque content. The implementation may be able to use this for optimizations, especially for the drawing of subpixel-antialiased text.

The content in a RenderedLayer can change size. If the size decreases, aChangedRegion will include the area of content that has gone away, and aVisibleRegion will exclude that area. The implementation may trim its buffers appropriately. If the size increases the implementation will need to increase the buffer.

An implementation may only reduce the valid region while there is no active transaction.

It is possible for aRegionToDraw to return empty, e.g. when nothing changed and the entire visible area is still buffered. The caller should optimize by skipping painting in this case.
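
The region bookkeeping in beginDraw can be sketched as follows. This is a toy model, not an implementation: a Region is just a set of unit cells (a real one would use rectangles), and RenderedLayerModel and MakeRect are names invented here. It computes aRegionToDraw as aVisibleRegion minus (valid region minus aChangedRegion) and then adds the result to the valid region.

```cpp
#include <cassert>
#include <set>
#include <utility>

using Cell = std::pair<int, int>;  // (x, y) of one unit cell
using Region = std::set<Cell>;     // toy region: a set of cells

Region MakeRect(int aX, int aY, int aWidth, int aHeight) {
  Region r;
  for (int y = aY; y < aY + aHeight; ++y) {
    for (int x = aX; x < aX + aWidth; ++x) {
      r.insert(Cell(x, y));
    }
  }
  return r;
}

Region Subtract(const Region& aA, const Region& aB) {
  Region r;
  for (const Cell& c : aA) {
    if (aB.count(c) == 0) r.insert(c);
  }
  return r;
}

// Hypothetical model of the valid-region logic of a RenderedLayer.
class RenderedLayerModel {
 public:
  // Returns the region the caller must repaint (aRegionToDraw).
  Region BeginDraw(const Region& aVisibleRegion, const Region& aChangedRegion) {
    // Changed content is no longer valid, wherever it is.
    Region stillValid = Subtract(mValid, aChangedRegion);
    // Repaint whatever is visible but not (still) valid.
    Region toDraw = Subtract(aVisibleRegion, stillValid);
    // Drawing adds to the valid region.
    mValid = stillValid;
    mValid.insert(toDraw.begin(), toDraw.end());
    return toDraw;
  }

  const Region& ValidRegion() const { return mValid; }

 private:
  Region mValid;  // initially empty
};
```

The first BeginDraw on a fresh layer returns the whole visible region; repeating it with no changes returns an empty region, which is the case where the caller can skip painting.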

For scrolling and to enable intelligent salvaging of parts of RenderedLayers by other layers, there is the copyFrom method. The source RenderedLayer must have the same LayerManager. Self-copies are allowed and must be supported somehow; they will be common. The part of aRegion that is valid in the source layer is offset by aDelta and becomes valid in the destination layer. The part of aRegion that is invalid in the source layer becomes invalid in the destination layer.
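
A sketch of how copyFrom might update the destination's valid region, using a toy region made of unit cells. It assumes aRegion is expressed in source coordinates and that both the "becomes valid" and "becomes invalid" parts land at the offset position in the destination; the text does not spell this out. CopyFromValidity is a made-up helper that tracks validity only, not pixels.

```cpp
#include <cassert>
#include <set>
#include <utility>

using Cell = std::pair<int, int>;  // (x, y) of one unit cell
using Region = std::set<Cell>;     // toy region: a set of cells

// Returns the destination's valid region after a copy.
// aDestValid is taken by value so that a self-copy (same layer as
// source and destination) reads the source state from before the copy.
Region CopyFromValidity(Region aDestValid, const Region& aSourceValid,
                        const Region& aRegion, int aDx, int aDy) {
  for (const Cell& c : aRegion) {
    Cell target(c.first + aDx, c.second + aDy);
    if (aSourceValid.count(c) != 0) {
      aDestValid.insert(target);  // valid source content becomes valid here
    } else {
      aDestValid.erase(target);   // invalid source content invalidates here
    }
  }
  return aDestValid;
}
```

Scrolling a layer up by one row is then a self-copy with a delta of (0, -1): the rows that were valid move up, and the row whose source was never painted ends up invalid, so the next beginDraw will ask for it.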

The user of layers (layout) is responsible for ensuring that at the end of a transaction, only valid areas of a RenderedLayer will end up being visible in the window.

ContainerLayer

A ContainerLayer represents a surface into which its children are composited. Whether there is persistent backing store associated with it is an implementation detail.
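
A possible shape for the child-list part of ContainerLayer, assuming DOM-like semantics for the insertBefore and removeChild methods mentioned under "Immediate Mode" (their exact signatures are not given anywhere, so the ones below are guesses). Children are stored bottom-to-top, so the nth child is beneath the (n+1)th, and std::shared_ptr stands in for layer refcounting.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

class Layer {
 public:
  virtual ~Layer() = default;
};

class ContainerLayer : public Layer {
 public:
  // Inserts aChild directly beneath aBefore. A null aBefore puts the
  // child on top of all existing children. Returns false if aBefore
  // is not a child of this container.
  bool InsertBefore(std::shared_ptr<Layer> aChild, const Layer* aBefore) {
    auto pos = mChildren.end();
    if (aBefore) {
      pos = Find(aBefore);
      if (pos == mChildren.end()) return false;
    }
    mChildren.insert(pos, std::move(aChild));
    return true;
  }

  // Drops the container's reference to aChild.
  bool RemoveChild(const Layer* aChild) {
    auto pos = Find(aChild);
    if (pos == mChildren.end()) return false;
    mChildren.erase(pos);
    return true;
  }

  // Children in z-order, bottommost first.
  const std::vector<std::shared_ptr<Layer>>& Children() const {
    return mChildren;
  }

 private:
  std::vector<std::shared_ptr<Layer>>::iterator Find(const Layer* aLayer) {
    return std::find_if(
        mChildren.begin(), mChildren.end(),
        [aLayer](const std::shared_ptr<Layer>& c) { return c.get() == aLayer; });
  }

  std::vector<std::shared_ptr<Layer>> mChildren;
};
```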

An ImageQueue represents a queue of timestamped images. ImageQueues are refcounted, thread-safe and can be used by any thread.

class ImageQueue {
void setImage(TimeStamp, aFormat, aBuffer);
};

setImage can be called multiple times to queue up many images with different timestamps. As with the set methods on Layer, images already queued with a later timestamp than the given timestamp are removed. ImageQueueLayer will take ownership of the buffer, which must not be modified again. YUV formats must be supported. Unlike layer APIs, setImage can be called on any thread. Normally we'd only write into an ImageQueue from a single video decoding thread, however.
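
A rough sketch of a thread-safe ImageQueue along these lines. The concrete types are assumptions: time is a double, the buffer is a byte vector that the queue takes ownership of and never modifies, and CurrentImage and Length are hypothetical accessors for the compositor side. Like the Layer setters, SetImage here drops images queued at times greater than or equal to the new timestamp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

enum class ImageFormat { RGB, YUV };
using Buffer = std::vector<std::uint8_t>;

class ImageQueue {
 public:
  // Callable from any thread. Takes ownership of aBuffer.
  void SetImage(double aTime, ImageFormat aFormat, Buffer aBuffer) {
    std::shared_ptr<const Buffer> data =
        std::make_shared<Buffer>(std::move(aBuffer));
    std::lock_guard<std::mutex> lock(mMutex);
    // Remove images queued at or after the new timestamp.
    while (!mImages.empty() && mImages.back().time >= aTime) {
      mImages.pop_back();
    }
    mImages.push_back(Entry{aTime, aFormat, std::move(data)});
  }

  // Latest image whose timestamp is <= aNow, or null if there is none.
  std::shared_ptr<const Buffer> CurrentImage(double aNow) const {
    std::lock_guard<std::mutex> lock(mMutex);
    for (auto it = mImages.rbegin(); it != mImages.rend(); ++it) {
      if (it->time <= aNow) return it->data;
    }
    return nullptr;
  }

  std::size_t Length() const {
    std::lock_guard<std::mutex> lock(mMutex);
    return mImages.size();
  }

 private:
  struct Entry {
    double time;
    ImageFormat format;
    std::shared_ptr<const Buffer> data;  // immutable once queued
  };

  mutable std::mutex mMutex;
  std::vector<Entry> mImages;  // kept sorted by time
};
```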

The layer implementation can choose to eagerly convert the "next" image to RGB, off the main thread, if a non-GPU-capable device is being targeted (matching what we currently do).

Immediate Mode

We need to be able to restructure our code around layers without sacrificing performance on existing non-accelerated cairo-only backends. Therefore we cannot require an implementation to actually construct a temporary surface for each layer.

For example, suppose we have some normal content, covered by an opacity:0.5 element, covered by some more normal content. Currently we'd render the bottommost content directly to the destination (actually to a backbuffer, but that's irrelevant here), then we'd create a temporary surface, render in the translucent content, composite that onto the destination, and then render the topmost content directly to the destination. A naive layers implementation would render the bottommost content directly to a temporary surface, render the translucent content to another temporary surface, render the topmost content to another temporary surface, and then composite them each onto the destination. That would be an unacceptable performance hit when compositing must be done on the CPU, especially if the buffers will not be saved and used again, because animation is not present or memory is scarce. (Also, since the topmost content layer is likely to not have a transparent background, we are likely to disable subpixel antialiasing for the topmost content, which we would like to avoid if possible.)

To enable an efficient immediate-mode implementation, we impose some constraints on the use of the layers API. First, define "mutation" as a call to any setter method on Layer, or a call to beginDraw or copyFrom on RenderedLayer, or a call to insertBefore or removeChild on ContainerLayer (when we add layer APIs, they may need to be added to this list). Then we impose the following rules:

1) After calling RenderedLayer::beginDraw, you are not allowed to mutate any layer before or equal to the RenderedLayer in a pre-order traversal of the layer tree (in this transaction).

2) The layer parameter to RenderedLayer::copyFrom must not be before this layer in a pre-order traversal of the layer tree.

Then we can have an immediate-mode layer implementation which works just like our current code. Rule 1 means that when we call beginDraw, everything before the RenderedLayer has already been drawn, and we know the final values of the properties controlling rendering of this layer and its ancestors. If conditions are right, i.e. the opacity of this layer is 1.0, transform is affine, etc, we can avoid creating a buffer for this layer, and just return a gfxContext that renders directly into some ancestor buffer. The fact that we don't have this layer's contents stored anywhere is reflected by setting the layer's valid region to empty at the end of the transaction. Rule 2 ensures that no other layer can try to use an already-drawn RenderedLayer as the source of a copy. Using a layer later in the tree as a source is fine: it has no valid region, so we just remove the copied region from the destination's valid region.

Note that rule 1 means you must add all children to a ContainerLayer before rendering into any of them, and in particular you must add a RenderedLayer to its parent before rendering into it. These constraints are easy enough for the display list subsystem to satisfy.
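
The two rules could be enforced in a debug build by something like the checker below. It is only a sketch: Node, PreOrderIndex and ImmediateModeChecker are hypothetical names, one checker is assumed per transaction, and every layer passed in is assumed to be attached to the tree already (unattached layers are simply rejected).

```cpp
#include <cassert>
#include <vector>

// Hypothetical minimal layer node: only the tree shape matters here.
struct Node {
  std::vector<Node*> children;
};

bool FindIndex(const Node* aNode, const Node* aTarget, int& aCounter) {
  if (aNode == aTarget) return true;
  ++aCounter;
  for (const Node* child : aNode->children) {
    if (FindIndex(child, aTarget, aCounter)) return true;
  }
  return false;
}

// Position of aTarget in a pre-order traversal from aRoot, or -1.
int PreOrderIndex(const Node* aRoot, const Node* aTarget) {
  int counter = 0;
  return FindIndex(aRoot, aTarget, counter) ? counter : -1;
}

class ImmediateModeChecker {
 public:
  explicit ImmediateModeChecker(const Node* aRoot) : mRoot(aRoot) {}

  // Rule 1: once some layer has been drawn, only layers strictly after
  // it in pre-order may still be mutated.
  bool MayMutate(const Node* aLayer) const {
    int index = PreOrderIndex(mRoot, aLayer);
    if (index < 0) return false;
    return !mLastDrawn || index > PreOrderIndex(mRoot, mLastDrawn);
  }

  // beginDraw is itself a mutation, so it is checked and then recorded.
  bool BeginDraw(const Node* aLayer) {
    if (!MayMutate(aLayer)) return false;
    mLastDrawn = aLayer;
    return true;
  }

  // Rule 2: the copy source must not be before the destination.
  bool MayCopyFrom(const Node* aDest, const Node* aSource) const {
    int dest = PreOrderIndex(mRoot, aDest);
    int source = PreOrderIndex(mRoot, aSource);
    return dest >= 0 && source >= dest;
  }

 private:
  const Node* mRoot;
  const Node* mLastDrawn = nullptr;
};
```

Because draws must happen in increasing pre-order, remembering only the most recently drawn layer is enough to check rule 1.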

Additional Implementation Notes

Some layer implementations might not be able to "natively" handle some features Gecko needs. (See "what needs to be supported" under "Display List Processing".) This might be because of temporary limitations in the implementation, or fundamental limitations in the design. For example, a layer implementation based on Core Animation would be unable to handle many SVG filters. With this proposal, a layer implementation can fall back to "immediate mode" non-accelerated rendering for the layer subtree of any layer it cannot handle. I think the ability to do this fallback transparently to the user of layers is essential for any workable layer API proposal.

To implement off-main-thread compositing with animation, you need to retain the contents of the visible regions of all RenderedLayers, plus the layer client (layout) must provide visible regions that include all the areas that will become visible over time.

One way to implement begin/endTransaction is for the compositing thread to retain a "lazy copy" of the layer tree. When a layer is modified by the main thread for the first time during a given transaction, it makes a copy of the layer to be used by the compositing thread in case compositing must happen during the transaction. As far as the compositing thread is concerned, its copy of the layer tree is immutable. On endTransaction, the compositing thread's copy can be thrown away (as long as compositing isn't currently happening), and it can get a new lazy copy of the layer tree.

Rendering Layers To Arbitrary Cairo Targets

To support drawWindow and printing, we need to be able to draw a layer tree to any target, with maximum possible fidelity.

For printing, it would be enough to create a LayerManager for a destination gfxContext* and build a new set of layers for it. This would use the immediate-mode implementation described above. This implies that we need to have multiple implementations compiled in. Also, even if we disable 3D rendering on non-accelerated systems, we'll still want to have a complete software fallback (using pixman transforms?) for printing. (jrmuizel: for the record webkit does not currently print 3d transformed layers)

For drawWindow, we definitely want the ability to retarget an existing set of layers to draw them to a different context. This is essential for reftests to work.

Maybe we should add to LayerManager a method

void beginTransactionWithTarget(gfxContext* aDestination);

and a way to construct a LayerManager with a device context for printing. After you end the transaction started with beginTransactionWithTarget, everything is guaranteed to have been rendered to the destination context.

Extensions

This proposal can be extended in several ways:

Additional rendering properties on Layers

New layer types to accelerate different special rendering cases, e.g. WebGL

A new layer type that renders the contents of some other layer subtree (useful for accelerated live thumbnails etc)

A new layer type that renders the contents of a layer subtree from a different process. One possible cross-process layer implementation would be: when the other process calls endTransaction, transmit the layer subtree to the master process to be grafted in and composited.

Jeff

Layers have two basic operations:

InvalidateRect()/Draw() - Gets content into a layer. Content is drawn on the main thread using cairo, a video decoder, etc.

Composite() - Composites a layer into the scene. Will perform color conversion, filter etc. as needed.

Possible example of how filter layers could work:

Assume we have 5 layers:

3 layers in the background

1 filter layer

1 layer on top

To render this scene we:

create a texture the size of the filter layer

set the render target to the texture

composite the 3 background layers into the texture

set the render target to the framebuffer

composite the 3 background layers into the framebuffer (if the background layers are completely occluded by the filter layer, we can omit this)

composite the filter layer, applying a blur kernel to the texture we rendered earlier

composite the top layer

A note about how to implement imperative animation in this world:
CoreAnimation will ask a layer to redraw its contents at a particular timestamp. The layer can choose to do so, or ignore the request.

WebKit

Here are some observations on WebKit's implementation:
Adding the style "-webkit-perspective: 800;" to a div will promote it to a layer. This seems to cause two regressions:

This content will move separately from the background. i.e. The div and the rest of the content do not move together, the div lags behind.

Having this layer also causes us to repaint all of the background content when scrolling, instead of just the newly exposed area.

It would be nice if we could avoid these problems.

For the example above it looks like webkit creates three layers:

One for the view

One for the document content (the size of the entire document)

One for the div - 100x100

Bas

// Interface implemented by 'renderers'. This interface can for example
// be implemented by Cairo(possibly with cairo-gl), OpenGL, Direct3D or
// any other framework we want. This should allow easy switching between
// different renderers, and provide needed software fallback mechanisms.
class IRenderer
{
// Set the widget this renders to.
void SetWidget(widget);
};

// The controlling class that controls the composition of frames. This
// lives on a rectangular area on the client's screen, and controls the
// composition of all layers on the compositor. This runs its own thread
// internally from which all OpenGL/D3D operations are executed. All re-
// scheduling of drawing and invalidations are run based on operations
// executed on the compositor and its layers.
class Compositor
{
// Create a layer that can be used to render to. The size here
// describes the size in pixels. The format is the format of the data;
// this can be RGB, RGBA or YUV. The compositor will know what to do
// with these layers, and how to render them properly. When the last
// reference to the layer dies there will be only one left, and it's
// ready to be destroyed. Type can be one of hardware or managed.
// Only managed layers can be drawn to directly from software.
// Any created layer can contain other layers inside, placed anywhere
// on its surface. The layer is initially locked, meaning it's not
// shown until unlocked.
Layer *CreateLayer(size, format, type);

// This sets the renderer that this compositor uses, without a renderer
// the compositor essentially does nothing.
void SetRenderer(renderer);
};

// These are operations that can be executed on all layers.
class ILayer
{
// Color by which the layer's pixels are multiplied.
// This contains an alpha value so opacity can implicitly
// be controlled.
void SetColor(color);
// Sets an affine transformation to place the layer with.
void SetTransform(matrix);
// Add a layer to this layer. This layer may be blitted onto
// this layer's hardware surface.
void AddLayer(ILayer);
// Optional pixel shader program to run on this layer. This can be
// used to apply a variety of effects to the layer when rendered.
void SetShader(shader);
// Lock the layer, this makes no changes take effect while in the
// locked state.
void Lock();
// Unlock the layer, this will cause the compositor to traverse
// past this layer in the tree when compositing.
void Unlock();
// Clone an instance of this layer, it will contain a copy-on-write
// reference to the contents of this layer. This layer will initially
// be locked.
ILayer *Clone();
};

// Layers exposing this interface allow access to the surface. Double
// buffered, this means that if it's currently being drawn to the compositor
// will simply draw the texture. This will ensure rendering of the compositor
// area doesn't stall waiting on an expensive software render.
class ILockableLayer
{
// Lock the surface of this layer. Returns a gfxContext to draw to.
gfxContext *Lock();
// Unlock the surface, this means we're done. And will signal the
// compositor to update the associated texture and redraw.
void Unlock();
};

// Layers exposing this interface can have their hardware surface accessed,
// which can then be used as a render target for other accelerated parts of
// the code.
class IHardwareLayer
{
// Return hardware surface in whatever structure we pick. Might need
// locking/unlocking logic.
HardwareSurface *Surface();
};

// This class controls animations on objects, any class can be made to
// implement it, but we'd most likely provide some standard implementations.
// Any state it wants to maintain is contained on an implementation level.
class IAnimator
{
// Called by the compositor when starting a rendering cycle, with
// the elapsed time.
virtual void AdvanceTime(double aTime);
// This assigns the animator to a frame and registers with its compositor.
void Assign(ILayer *aLayer);
};

Integration with windowing systems

WebKit/Core Animation

AppKit supports hosting a layer tree in a NSView using the following technique:

[aView setLayer:rootLayer];

[aView setWantsLayer:YES];

It seems that this method creates a transparent CGSurface the size of the viewport that all of the layers are drawn on to. The window server then takes care of compositing the background content and layered content.

If we set an animating div to have a z-index of -1, we seem to get a large texture and a bunch of tiles?

Scrolling

What should we do to scroll?

Bas: Use a tile cache.

Anholt: destination tile cache sounds pretty nice to me, and with APPLE_object_purgeable use the GPU could throw things out of the cache when appropriate (rather than you having to guess).

Tile Cache

Disadvantages

Memory usage is >= window size

More complicated

Need a mechanism to deal with tiles that aren't ready

Advantages

Easy to accelerate

Better for pixel scrolling/smooth scrolling

Traditional Scrolling

Disadvantages

Not so good for pixel scrolling/smooth scrolling

Scrolling speed limited by paint speed

No backing store, so we can't move things without repainting the damaged area

Layer tree - the model -- this is the tree that applications mostly interact with

Presentation tree - the view -- contains the current values of the animating properties

Render tree - the view

Having separate trees sort of corresponds to Roc's LayerBuilder infrastructure described above.

Classes

CALayer - CALayer is the model class for layer-tree objects. It encapsulates the position, size, and transform of a layer, which defines its coordinate system. It also encapsulates the duration and pacing of a layer and its animations by adopting the CAMediaTiming protocol, which defines a layer’s time space.

Clutter

ClutterActor - Every actor is a 2D surface positioned and optionally transformed in 3D space. The actor is positioned relative to the top left corner of its parent, with the child's origin being its anchor point (also top left by default).

ClutterGroup - A group of Actors positioned relative to the group position
ClutterStage - a place where Actors are composited