Created attachment 126902[details]
mpv --hwdec=vdpau --vo=opengl-hq
I noticed, that when I play videos with mpv using VDPAU I'm getting an image that looks like deinterlacing/anti-aliasing is not working, making fine structures/patterns or text look ugly/unreadable. Playing the same video with the same stack but using VA-API doesn't exhibit this issue.
Attached you'll find a screenshot taken with VDPAU when playing a video using
mpv --hwdec=vdpau --vo=opengl-hq
The only difference for the VA-API playback is using "--hwdec=vaapi". I would say that this is a regression, but I can't remember when it started. :-(
The stack showing this issue is (Debian testing as a base):
GPU: Hawaii PRO [Radeon R9 290] (ChipID = 0x67b1)
Mesa: Git:master/e4b585f009
libdrm: 2.4.70-1
LLVM: SVN:trunk/r282761 (4.0 devel)
X.Org: 2:1.18.4-1
Linux: 4.7.5
Firmware: firmware-amd-graphics/20160824-1
libclc: Git:master/88b82a6f70
DDX (amdgpu): 1.1.2-1
Let me know, if you need anything else.

Here's a bit of background:
Earlier, mpv used interop with OutputSurfaces. That is, it rendered the video into an RGB surface and then used that for its own procesing (scaling, rendering OSD on top, etc.).
Now recently, mpv has switched to interop with VideoSurfaces. That is, it maps four fields of luma/chroma planes. These are split up into top and bottom field, just as the spec for the interop extension says:
https://www.opengl.org/registry/specs/NV/vdpau_interop.txt
mpv then weaves the fields and does color space conversion, scaling, etc.
Things get wrong with the weaving, and I found some very suspicious code in Mesa:
https://cgit.freedesktop.org/mesa/mesa/tree/src/mesa/state_tracker/st_vdpau.c#n82
Here a sampler is returned based on texture name index, and the same plane sampler is used for top and bottom field. That can't be right at all. Is the sampler possibly mangled somewhere else to make it point to the right fields? It does not look like it, but maybe I'm just missing something.
This really makes me wonder whether VideoSurface interop ever worked correctly.

Sorry, it's something else. The field is set up later by pointing to the right layer. So there's something else going wrong. Andy, I can confirm that reverting to that commit fixes it. I'll keep looking.

Created attachment 126946[details]
Weave shader dumps
It looks like LLVM miscompiles the weave fragment shader due to the changes to descriptor loading. See the attached file. With these changes, LLVM generates some really odd looking and inefficient code in this case, and the two sample instructions are somehow folded together, which cannot possibly work.
Someone proficient in LLVM can probably figure out what is going wrong.

Thanks Grigori.
GLSL:
color = fract(gl_FragCoord.y / 2) < 0.5 ?
texture(texture0, texcoord0) :
texture(texture1, texcoord0);
texture0 and texture1 are SMEM loads.
LLVM (SimplifyCFG) transforms it to:
color = texture(fract(gl_FragCoord.y / 2) < 0.5 ? texture0 : texture1,
texcoord0);
That's a nice transformation. You don't have to use 2 SMEM loads, you can just use one SMEM load depending on the result of the condition.
The problem is gl_FragCoord.y is a VGPR and texture0/1 are SGPRs, therefore flat VMEM loads are used to load the descriptors. However, image_sample requires descriptors in SGPRs, so v_readfirstlane is used. That effectively uses the result of the condition from the first active lane, discarding the results from all other lanes. The result would be exactly the same if the compiler did: v_readfirstlane s0, gl_FragCoord.y;
The test case seems pretty trivial I wonder how many other apps are affected.

(In reply to Marek Olšák from comment #11)
> Thanks Grigori.
>
> GLSL:
> color = fract(gl_FragCoord.y / 2) < 0.5 ?
> texture(texture0, texcoord0) :
> texture(texture1, texcoord0);
>
> texture0 and texture1 are SMEM loads.
>
> LLVM (SimplifyCFG) transforms it to:
> color = texture(fract(gl_FragCoord.y / 2) < 0.5 ? texture0 : texture1,
> texcoord0);
>
> That's a nice transformation. You don't have to use 2 SMEM loads, you can
> just use one SMEM load depending on the result of the condition.
>
> The problem is gl_FragCoord.y is a VGPR and texture0/1 are SGPRs, therefore
> flat VMEM loads are used to load the descriptors. However, image_sample
> requires descriptors in SGPRs, so v_readfirstlane is used. That effectively
> uses the result of the condition from the first active lane, discarding the
> results from all other lanes. The result would be exactly the same if the
> compiler did: v_readfirstlane s0, gl_FragCoord.y;
>
> The test case seems pretty trivial I wonder how many other apps are affected.
I think the best solution here would be to teach the backend how to do a scalar select based on value of vccz.

(In reply to Tom Stellard from comment #12)
>
> I think the best solution here would be to teach the backend how to do a
> scalar select based on value of vccz.
You are missing the point. The condition code (VCC) isn't the same across all threads. The problem is that the conditional assignment is transformed into a form that makes descriptor load addresses non-uniform (dependent on a VGPR). There is nothing the AMDGPU backend can do about it. It's not a problem in the backend.

(In reply to Marek Olšák from comment #13)
> (In reply to Tom Stellard from comment #12)
> >
> > I think the best solution here would be to teach the backend how to do a
> > scalar select based on value of vccz.
>
> You are missing the point. The condition code (VCC) isn't the same across
> all threads. The problem is that the conditional assignment is transformed
> into a form that makes descriptor load addresses non-uniform (dependent on a
> VGPR). There is nothing the AMDGPU backend can do about it. It's not a
> problem in the backend.
Well the backend could iterate over all the variants to handle that, but I have strong doubts that this would result in optimal performance.
So I think the best solution for now would be to mark the texture intrinsic in a way which disallows such optimizations.

Created attachment 126996[details][review]
piglit test
The attached test reproduces the issue. CFGSimplificationPass is the culprit.
Note that CFGSimplificationPass also doesn't preserve the !amdgpu.uniform metadata when transforming the loads, which is a change in behavior.

Looks like we'd need the opposite of the convergent attribute. And that attribute should only apply to one argument of the texture sampling intrinsic (i.e. not to the texture coordinate). I don't know if something like that already exists in LLVM.

(In reply to Tom Stellard from comment #12)
> (In reply to Marek Olšák from comment #11)> > The test case seems pretty trivial I wonder how many other apps are affected.
I don't know the answer to that, but I've just noticed that a debug build of mesa will catch the issue with mpv -
mpv: state_tracker/st_sampler_view.c:481: st_get_texture_sampler_view_from_stobj: Assertion `stObj->base.MinLayer == view->u.tex.first_layer' failed.

(In reply to Nicolai Hähnle from comment #20)
> Created attachment 127812[details][review] [review]
> Mesa part of fix
>
> LLVM patch https://reviews.llvm.org/D26348 together with the attached patch
> for Mesa fix this bug.
Nice, will test later, but is there a actually a nice way to get a patch from Phabricator that will apply from top level with git apply?
I usually fail with "download raw diff" and have to search/sort out the paths by hand.

(In reply to Andy Furniss from comment #21)
> (In reply to Nicolai Hähnle from comment #20)
> > Created attachment 127812[details][review] [review] [review]
> > Mesa part of fix
> >
> > LLVM patch https://reviews.llvm.org/D26348 together with the attached patch
> > for Mesa fix this bug.
>
> Nice, will test later, but is there a actually a nice way to get a patch
> from Phabricator that will apply from top level with git apply?
>
> I usually fail with "download raw diff" and have to search/sort out the
> paths by hand.
Hello Andy,
how did you solved this?

(In reply to Dieter Nützel from comment #23)
> (In reply to Andy Furniss from comment #21)
> > (In reply to Nicolai Hähnle from comment #20)
> > > Created attachment 127812[details][review] [review] [review] [review]
> > > Mesa part of fix
> > >
> > > LLVM patch https://reviews.llvm.org/D26348 together with the attached patch
> > > for Mesa fix this bug.
> >
> > Nice, will test later, but is there a actually a nice way to get a patch
> > from Phabricator that will apply from top level with git apply?
> >
> > I usually fail with "download raw diff" and have to search/sort out the
> > paths by hand.
>
> Hello Andy,
>
> how did you solved this?
Turned out to be easier than I remembered from my previous few efforts when (rarely) applying patches for llvm from there.
All I needed to do, was wherever there is a +++ or --- followed by a path, add a leading / to the path.
I guess there is a cool command line way like Nicolai says - but I would probably end up hosing the whole thing if I tried like that :-)