Actually, no OGL client has even started. This is just the X server being started by SLiM (the login manager), and that doesn't use OGL; it's really basic Xlib stuff. So it is basically a raw X server... perhaps it's the glamor acceleration stuff... but no OGL clients. :) It never got that far.

(In reply to Carsten Haitzler from comment #8)
> Actually, no OGL client has even started. This is just the X server being
> started by SLiM (the login manager), and that doesn't use OGL; it's really
> basic Xlib stuff. So it is basically a raw X server... perhaps it's the glamor
> acceleration stuff... but no OGL clients. :) It never got that far.
Yeah, it would be GL via glamor in that case.

So wouldn't that make GL a necessity, if even glamor needs it? I guess I can turn off glamor acceleration, but realistically GL is a necessity, so the problem needs to be addressed sooner or later.
The ring gfx timeout smells to me of "not a Mesa bug", in that an ioctl going to the DRM driver never returns when doing a simple query. It hangs, so something lower down is having a bad day if something as simple as querying a fence causes a hang... :)
What is this ring gfx thing exactly (it seems to be some command queue), and why would it be timing out all the way back at seq 10/11, right at the start of its use? It's almost as if some interrupt or in-memory semaphore mapped from the card is messing up. I'm looking for something more specific to investigate.

Does this patch help?
https://patchwork.freedesktop.org/patch/259364/
Does ARM support write combining? The driver uses it pretty extensively. You might try disabling GTT_USWC (uncached write-combined) support in the kernel driver and just falling back to cached memory.

(In reply to Carsten Haitzler from comment #10)
> So wouldn't that make GL a necessity, if even glamor needs it? I guess I
> can turn off glamor acceleration, but realistically GL is a necessity, so
> the problem needs to be addressed sooner or later.
>
If you are only starting a bare X server, you usually don't exercise the glamor paths nearly as much as a full desktop environment does.
> The ring gfx timeout smells to me of "not a Mesa bug", in that an ioctl
> going to the DRM driver never returns when doing a simple query. It hangs,
> so something lower down is having a bad day if something as simple as
> querying a fence causes a hang... :)
>
> What is this ring gfx thing exactly (it seems to be some command queue),
> and why would it be timing out all the way back at seq 10/11, right at the
> start of its use? It's almost as if some interrupt or in-memory semaphore
> mapped from the card is messing up. I'm looking for something more specific
> to investigate.
Each engine on the GPU (gfx, compute, video decode, encode, DMA, etc.) has a ring buffer used to feed it. The work sent to the engines is managed by a software scheduler in the kernel. The kernel driver tests the rings as part of its init sequence, and the driver won't come up if the ring tests fail, so they are working at least until you start X. Presumably X submits some work to the GPU (via glamor) that causes the GPU to hang. The fence never signals because the GPU never finishes processing the job due to the hang.
Another, simpler test would be to boot to a console (no X) and then try running some of the libdrm amdgpu tests. They are really simple (copying data around and verifying it using different engines, allocating and freeing memory, etc.).
https://cgit.freedesktop.org/mesa/drm/tree/tests/amdgpu
See if some of the simple copy or write tests work.

And lo and behold:
--- ./include/drm/drm_cache.h~ 2018-08-12 21:41:04.000000000 +0100
+++ ./include/drm/drm_cache.h 2018-11-16 11:06:16.976842816 +0000
@@ -48,7 +48,7 @@
 #elif defined(CONFIG_MIPS) && defined(CONFIG_CPU_LOONGSON3)
 	return false;
 #else
-	return true;
+	return false;
 #endif
 }
This makes it work. Of course, this isn't a brilliant patch, but there is indeed something wrong with the way write-combined memory is handled on ARM here. Disabling WC for all ARM DRM devices might be too much of a sledgehammer, though... I'm going to look for a less heavy-handed solution that might make this work more universally. I'll get back to you on that.

(In reply to Carsten Haitzler from comment #15)
> This makes it work. Of course, this isn't a brilliant patch, but there is
> indeed something wrong with the way write-combined memory is handled on ARM here.
Well, disabling WC is also a good way of reducing performance in general.
For example, it could be that disabling WC reduces performance, which in turn changes the timing enough to hide the real problem...