This is a bug to track the issues that need to be fixed before we turn on tiles everywhere for b2g. Note, there is a similar bug, bug 894333 that only concerns the browser. This may end up obsoleting that bug.

I'm adding some gralloc bugs as blockers, but this may depend on whether we can show that gralloc could be a significant performance increase for tiles (I would say that this isn't a given).
We may also want to consider some kind of EGLImage-backed tiles implementation to move the texture allocation away from the compositor, if that makes sense...

Created attachment 8387521[details][diff][review]
Tiling branch work v2
I've addressed everything in my comments, except:
- DrawDebugOverlay
- splitting out gfxSharedReadLock
- the compositable backend data stuff (I have a patch on another machine that I won't get to for another hour or so)*
- DumpTextureHost
- ReadUnlock warning*
- XXX Sort out this comment*
- CompositableParentManager::IsCrossProcess
- Audit GrallocTextureHost::GetAsSurface*
Stuff marked with a star is stuff I intend to fix now, everything else and any issues I missed, take your pick.

(In reply to Chris Lord [:cwiiis] from comment #9)
> Created attachment 8387565[details][diff][review]
> Restore old gralloc path for non-tiles
>
> Here's a patch that restores the old path for non-tiles...
So was looking at doing the opposite (removing the old path), and it's quite involved. We need to either remove or port GrallocDeprecatedTextureHostOGL and duplicate whatever it is that CompositableParentManager::ReturnTextureDataIfNecessary is doing (which might need some looking at anyway?).
My vote goes for applying this patch and dealing with this later.

(In reply to Chris Lord [:cwiiis] from comment #9)
> Created attachment 8387565[details][diff][review]
> Restore old gralloc path for non-tiles
>
> Here's a patch that restores the old path for non-tiles...
After even more inspection, we realised that the new path breaks the fencing stuff satoro was working on, so I don't think not restoring the old path is an option. I've pushed this to the branch.
Also fixed up the other comments I pointed out.
GetAsSurface is only used by debug code, so I think we're ok to sort that out later.

(In reply to Chris Lord [:cwiiis] from comment #24)
> I've just spotted that TextureHostOGL doesn't implement HasInternalBuffer
> (which defaults to false on TextureHost), so that's negatively impacting
> Android performance/memory use. Will push a fix after I've tested.
This wasn't correct, the problem was an uninitialised member variable in TiledLayerBufferComposite (but it had the same effect).
Here's a try push with that fixed: https://tbpl.mozilla.org/?tree=Try&rev=48735e1258b7
Here's an etherpad where we're tracking what we've pushed to try: https://etherpad.mozilla.org/tiling-talos
Playing with this on a Galaxy Nexus, I'm seeing the compositor occasionally block on the client (or at least, it looks that way). I don't think this should stop us from landing, but we need to investigate this.

Comment on attachment 8387763[details][diff][review]
Tiling branch work v5, rebased on m-c
Review of attachment 8387763[details][diff][review]:
-----------------------------------------------------------------
::: gfx/layers/client/TiledContentClient.cpp
@@ +28,5 @@
>
> +// This is the minimum area that we deem reasonable to copy from the front buffer to the
> +// back buffer on tile updates. If the valid region is smaller than this, we just
> +// redraw it and save on the copy (and requisite surface-locking involved).
> +#define MINIMUM_TILE_COPY_AREA ((TILEDLAYERBUFFER_TILE_SIZE * TILEDLAYERBUFFER_TILE_SIZE)/16)
This could be a constant, not a define.
@@ +314,5 @@
> + MOZ_COUNT_CTOR(gfxShmSharedReadLock);
> +
> + if (mAllocator) {
> +#define MOZ_ALIGN_WORD(x) (((x) + 3) & ~3)
> + if (mAllocator->AllocUnsafeShmem(MOZ_ALIGN_WORD(sizeof(ShmReadLockInfo)),
Since ShmReadLockInfo contains just a int32_t, better static_assert(sizeof==4) and then you dont need to do crazy computations and can remove MOZ_ALIGN_WORD (which btw should be an inline function, not a macro).
::: gfx/layers/client/TiledContentClient.h
@@ +121,5 @@
> +
> + ShmReadLockInfo* GetShmReadLockInfoPtr()
> + {
> + return reinterpret_cast<ShmReadLockInfo*>
> + (mShmem.get<char>() + mShmem.Size<char>() - sizeof(ShmReadLockInfo));
What assumption are you making here about alignof(ShmReadLockInfo) ? Is that assumption safe? Can you cover it by an assertion?
::: gfx/layers/opengl/GrallocTextureHost.cpp
@@ +130,5 @@
> + if (mCompositableBackendData) {
> + // There are two paths for locking/unlocking - if mCompositableBackendData is
> + // set, we use the texture on there, otherwise we use
> + // CompositorBackendSpecificData from the compositor and bind the EGLImage
> + // only in Lock().
Hang on, are we going back to using a single texture compositor-wide for all gralloc buffers? There should at least be a comment here explaining why this is wanted. Also, if you still have these bugs with client processes hanging because the compositor doesn't release a lock on a graphicbuffer, then this would be a place where to look suspisciously!
@@ +324,5 @@
>
> void
> GrallocTextureHostOGL::Unlock()
> {
> // Unlock is done internally by binding the texture to another gralloc buffer
I realize that that was there before, but maybe this is still suspicious and could explain these client process hangs you were mentioning... i.e. what if no new compositing is taking place, so this locks remains, and prevents a client process from getting a write lock.

(In reply to Benoit Jacob [:bjacob] from comment #29)
> Comment on attachment 8387763[details][diff][review]
> Tiling branch work v5, rebased on m-c
>
> Review of attachment 8387763[details][diff][review]:
> -----------------------------------------------------------------
>
> ::: gfx/layers/client/TiledContentClient.cpp
> @@ +28,5 @@
> >
> > +// This is the minimum area that we deem reasonable to copy from the front buffer to the
> > +// back buffer on tile updates. If the valid region is smaller than this, we just
> > +// redraw it and save on the copy (and requisite surface-locking involved).
> > +#define MINIMUM_TILE_COPY_AREA ((TILEDLAYERBUFFER_TILE_SIZE * TILEDLAYERBUFFER_TILE_SIZE)/16)
>
> This could be a constant, not a define.
>
> @@ +314,5 @@
> > + MOZ_COUNT_CTOR(gfxShmSharedReadLock);
> > +
> > + if (mAllocator) {
> > +#define MOZ_ALIGN_WORD(x) (((x) + 3) & ~3)
> > + if (mAllocator->AllocUnsafeShmem(MOZ_ALIGN_WORD(sizeof(ShmReadLockInfo)),
>
> Since ShmReadLockInfo contains just a int32_t, better
> static_assert(sizeof==4) and then you dont need to do crazy computations and
> can remove MOZ_ALIGN_WORD (which btw should be an inline function, not a
> macro).
>
> ::: gfx/layers/client/TiledContentClient.h
> @@ +121,5 @@
> > +
> > + ShmReadLockInfo* GetShmReadLockInfoPtr()
> > + {
> > + return reinterpret_cast<ShmReadLockInfo*>
> > + (mShmem.get<char>() + mShmem.Size<char>() - sizeof(ShmReadLockInfo));
>
> What assumption are you making here about alignof(ShmReadLockInfo) ? Is that
> assumption safe? Can you cover it by an assertion?
All of this was just blindly derived from gfxSharedImageSurface, so I'm afraid I don't really know off-hand (and am struggling to think about it right now)... If you think this can be simplified, great :)
> ::: gfx/layers/opengl/GrallocTextureHost.cpp
> @@ +130,5 @@
> > + if (mCompositableBackendData) {
> > + // There are two paths for locking/unlocking - if mCompositableBackendData is
> > + // set, we use the texture on there, otherwise we use
> > + // CompositorBackendSpecificData from the compositor and bind the EGLImage
> > + // only in Lock().
>
> Hang on, are we going back to using a single texture compositor-wide for all
> gralloc buffers? There should at least be a comment here explaining why this
> is wanted. Also, if you still have these bugs with client processes hanging
> because the compositor doesn't release a lock on a graphicbuffer, then this
> would be a place where to look suspisciously!
No, the two paths are one texture per compositable or one texture per host locked in a frame. The former pretty much equates to the latter in all cases except for tiles, where we have many textures in one compositable.
It's worth restoring the old path because satoro's fence work relies on it, but also the non-tiles gralloc paths are not well-tested with the new locking mechanism.
> @@ +324,5 @@
> >
> > void
> > GrallocTextureHostOGL::Unlock()
> > {
> > // Unlock is done internally by binding the texture to another gralloc buffer
>
> I realize that that was there before, but maybe this is still suspicious and
> could explain these client process hangs you were mentioning... i.e. what if
> no new compositing is taking place, so this locks remains, and prevents a
> client process from getting a write lock.
The hangs I'm seeing are on Android only, b2g seems fine - I would more suspect something in the TextureClient/Host (maybe a contending lock that shouldn't be?)
gralloc is unfortunately a black box (and more irritatingly, a different black box per driver revision per device) - I think we can debug these issues as follow-up and hopefully think up some more robust plan.

Comment on attachment 8387596[details][diff][review]
Tiling branch work v3, rebased on m-c, TextureClientPool added, summary text corrected
Review of attachment 8387596[details][diff][review]:
-----------------------------------------------------------------
part 1 of my review:
::: browser/devtools/profiler/cleopatra/js/parserWorker.js
@@ +1225,5 @@
> bugNumber: "772916",
> check: function(frames, symbols, meta) {
>
> return stepContains('PaintGradient', frames, symbols)
> + && stepContains('ClientTiledLayerBuffer::PaintThebesSingleBufferDraw', frames, symbols)
I say don't bother patching this.
::: gfx/layers/TiledLayerBuffer.h
@@ +199,3 @@
> * has completed uploading.
> */
> + virtual void UseTiledLayerBuffer(ISurfaceAllocator* aAllocator,
UseTiledLayerBuffer is a poor name for this. I like PaintedTiledLayerbuffer much better. Otherwise we should find a better name.
@@ +369,5 @@
> + if (oldTileCount >= tilesMissing) {
> + oldRetainedTiles[i] = AsDerived().GetPlaceholderTile();
> + AsDerived().ReleaseTile(oldTile);
> + } else {
> + oldTileCount ++;
nit: no space here.
::: gfx/layers/client/ClientLayerManager.cpp
@@ +61,5 @@
> mRoot = nullptr;
>
> MOZ_COUNT_DTOR(ClientLayerManager);
> +
> + mTexturePools.clear();
Is this clear needed? The destructor for this object is about to run.
::: gfx/layers/client/TextureClientPool.h
@@ +17,5 @@
> +namespace layers {
> +
> +class ISurfaceAllocator;
> +
> +class TextureClientPool : public RefCounted<TextureClientPool>
This class needs a comment to describe its functionality.
@@ +30,5 @@
> + * a cached client that was returned to the pool, or a newly allocated
> + * client if one isn't available.
> + *
> + * All clients retrieved by this method should be returned using the return
> + * functions, or reported lost so that the pool can manage its size correctly.
I'm not a fan of the feature that a texture client can become loss. This complicates the lifetime quite a bit. Is this *really* needed?
@@ +44,5 @@
> + /**
> + * Return a TextureClient that is not yet ready to be reused, but will be
> + * imminently.
> + */
> + void ReturnTextureClientDeferred(TextureClient *aClient);
Then it shouldn't be returned to the poll. If you have a good reason to do this then it should be tracked outside of the pool. If you return something here but it's not ready to be re-used then you're putting the responsibility on the caller to ReturnDeferredClients to know when it is. Now what happens if there's two types of lifetimes for tiles returned here and one of the calls to ReturnDeferredClients happens while some tiles still aren't ready.
The user of the texture client are responsible for tracking this.
@@ +56,5 @@
> + /**
> + * Attempt to shrink the pool so that there are no more than sMinCacheSize
> + * unused clients.
> + */
> + void ShrinkToMinimumSize();
Ideally these Shrin* would be private (use friend).
@@ +68,5 @@
> + /**
> + * Report that a client retrieved via GetTextureClient() has become
> + * unusable, so that it will no longer be tracked.
> + */
> + void ReportClientLost() { mOutstandingClients--; }
This call is gross too. What you're saying here:
- This pool will allocate the texture client
- You have to return the texture client
- ... but if you don't then you need to tell me about it
I don't think this complexity is justified for a pool. The pool should accept objects that are returned and either make it guaranteed to return or optional. We shouldn't track out much outstanding clients are made.
@@ +87,5 @@
> + static const uint32_t sMinCacheSize = 0;
> +
> + // The maximum number of texture clients managed by this pool that we want
> + // to remain active.
> + static const uint32_t sMaxTextureClients = 50;
I'm not a fan of this max. This wont work for a 4k display for example. In fact I really don't like most of these values. We seems to have done well so far this arbitrary limits so I think we should push back or have a great reason for them and why these numbers are chosen.
::: gfx/layers/composite/TextureHost.h
@@ +430,5 @@
>
> /**
> + * Indicates whether the TextureHost implementation is backed by an
> + * in-memory buffer. The consequence of this is that locking the
> + * TextureHost does not contend with locking the texture on the client side.
I don't understand this comment. This leaves me wondering:
If there's an in-memory buffer then both sides can still access it so locking is important.
I'm sure this works so please clean-up the comment/API to make it more obvious what is happening here.
::: gfx/layers/ipc/CompositableForwarder.h
@@ +222,5 @@
>
> /**
> * Returns the type of backend that is used off the main thread.
> * We only don't allow changing the backend type at runtime so this value can
> * be queried once and will not change until Gecko is restarted.
This comment is wrong now, see the other GetCompositorBackendType.
::: gfx/layers/ipc/ISurfaceAllocator.h
@@ +85,5 @@
> + * We only don't allow changing the backend type at runtime so this value can
> + * be queried once and will not change until Gecko is restarted.
> + *
> + * XXX - With e10s this may not be true anymore. we can have accelerated widgets
> + * and non-accelerated widgets (small popups, etc.)
Shouldn't we fix this by having a different surface allocator or something? It's not clear what assumptions are made on that which will now be broken.

The landing stuck, turned it on by default here:
http://hg.mozilla.org/integration/mozilla-inbound/rev/101152af314f
There seemed to be consensus since it was all green that turning it on my default right now was a good idea. If I interpreted this wrongly I apologize and do backout, but as far as I can see right now the more testing, the earlier, the better.