I can confirm this on Sandybridge. (It appears to work on Ivybridge, which is strange...)
Apparently, the other thing the deleted code did was override LUMINANCE and INTENSITY floating point formats to RED.
The failing case appears to be an L32_FLOAT -> R32_FLOAT blit. I don't understand why that should matter... sampling from L32 will return <R, R, R, 1>, and writing to a single-channel R32 render target ought to ignore the other channels...
I'll keep looking.

Hey Paul,
I'm really stumped by this bug. On Sandybridge, performing a BLORP blit from either I32_FLOAT or L32_FLOAT to R32_FLOAT appears to subtly break - the rendered image looks mostly correct, but it's all blocky, and fails at the edges. (See the images attached to the bug.) The 16-bit formats appear to work fine.
Forcing the source and destination buffers to both be R32_FLOAT (via the attached patch) makes it work correctly as well. This makes no sense to me, as sampling from I32_FLOAT should just return <R, R, R, R>, and writing to an R32_FLOAT buffer should ignore all but the R component anyway...
Do you have any ideas?

(In reply to comment #5)
> Hey Paul,
>
> I'm really stumped by this bug. On Sandybridge, performing a BLORP blit
> from either I32_FLOAT or L32_FLOAT to R32_FLOAT appears to subtly break -
> the rendered image looks mostly correct, but it's all blocky, and fails at
> the edges. (See the images attached to the bug.) The 16-bit formats appear
> to work fine.
>
> Forcing the source and destination buffers to both be R32_FLOAT (via the
> attached patch) makes it work correctly as well. This makes no sense to
> me, as sampling from I32_FLOAT should just return <R, R, R, R>, and writing
> to an R32_FLOAT buffer should ignore all but the R component anyway...
>
> Do you have any ideas?
I saw these sorts of blocky artifacts when I was first developing MSAA on Sandy Bridge. They happen when something in the software (or hardware) implements the IMS multisample layout incorrectly.
IMS layout is organized into 2x2 blocks. If the texels in a 2x2 block are:
A B
C D
And 4x multisampling is in use, then the data is encoded like this:
A0 B0 A1 B1
        *
C0 D0 C1 D1
A2 B2 A3 B3
C2 D2 C3 D3
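For anyone who wants to play with this layout, here is a minimal Python sketch of the 4x interleaving. The bit formula is inferred from the diagram above (it reproduces it exactly for one 2x2 block), not copied from the hardware documentation, and the names encode_ims_4x and layout_rows are made up for illustration.

```python
def encode_ims_4x(x, y, s):
    """Map pixel (x, y) and sample index s (0..3) to interleaved texel coordinates.

    Assumption: bit 0 of s selects the horizontal half of the 4x4 block and
    bit 1 of s selects the vertical half, as the diagram suggests.
    """
    x_out = ((x & ~1) << 1) | ((s & 1) << 1) | (x & 1)
    y_out = ((y & ~1) << 1) | (s & 2) | (y & 1)
    return x_out, y_out


def layout_rows():
    """Return the interleaved layout of one 2x2 pixel block as text rows."""
    names = {(0, 0): "A", (1, 0): "B", (0, 1): "C", (1, 1): "D"}
    grid = [[None] * 4 for _ in range(4)]
    for (x, y), name in names.items():
        for s in range(4):
            gx, gy = encode_ims_4x(x, y, s)
            grid[gy][gx] = f"{name}{s}"
    return [" ".join(row) for row in grid]


if __name__ == "__main__":
    for row in layout_rows():
        print(row)
```

Running it prints the same four rows as the diagram, which is a quick sanity check that the formula and the picture agree.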
Sandy Bridge has dedicated hardware to assist in doing multisample resolves: if, for example, the "sample" message is used to sample from the position marked with a "*" in the image above (at the corner shared by samples A1, B1, C1, and D1), then instead of the sampler doing its normal linear blending (which would cause it to average together samples A1, B1, C1, and D1), some special bit twiddling logic kicks in which causes it to instead average together samples B0, B1, B2, and B3, which is exactly what is needed for a multisample resolve. This dedicated hardware doesn't exist in Ivy Bridge (presumably because this clever trick wouldn't work with Ivy Bridge's UMS and CMS formats). Blorp has special case logic to use this dedicated hardware to do multisample resolves on Gen6 (grep for single_to_blend()). On Gen7 it does the averaging manually.
I suspect what's going on is that the special bit twiddling logic isn't kicking in properly when the surface format is I32_FLOAT or L32_FLOAT, so the samples that are being averaged are A1, B1, C1, and D1, and that's causing the blocky artifacts.
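To illustrate why that would look wrong, here is a small Python sketch that compares the two averages for one 2x2 block. It is only a model of my hypothesis, not of the real sampler: the interleaving formula is inferred from the diagram, the sample values are invented, and all function names are hypothetical.

```python
def encode_ims_4x(x, y, s):
    """Pixel (x, y), sample s -> interleaved texel coords (inferred from the diagram)."""
    x_out = ((x & ~1) << 1) | ((s & 1) << 1) | (x & 1)
    y_out = ((y & ~1) << 1) | (s & 2) | (y & 1)
    return x_out, y_out


def build_grid(samples):
    """samples[(x, y)] is a list of 4 sample values; returns the 4x4 interleaved grid."""
    grid = [[0.0] * 4 for _ in range(4)]
    for (x, y), values in samples.items():
        for s, value in enumerate(values):
            gx, gy = encode_ims_4x(x, y, s)
            grid[gy][gx] = value
    return grid


def linear_blend_at_corner(grid, cx, cy):
    """Plain linear blend: average the four texels that meet at corner (cx, cy)."""
    return sum(grid[j][i] for j in (cy - 1, cy) for i in (cx - 1, cx)) / 4.0


def resolve_pixel(samples, x, y):
    """What a multisample resolve wants: the average of one pixel's own samples."""
    return sum(samples[(x, y)]) / 4.0


# Invented data: pixel B is partially covered, A is empty, C and D are full.
samples = {
    (0, 0): [0.0, 0.0, 0.0, 0.0],  # A
    (1, 0): [1.0, 0.0, 0.0, 0.0],  # B
    (0, 1): [1.0, 1.0, 1.0, 1.0],  # C
    (1, 1): [1.0, 1.0, 1.0, 1.0],  # D
}
grid = build_grid(samples)

wrong = linear_blend_at_corner(grid, 3, 1)  # averages A1, B1, C1, D1
right = resolve_pixel(samples, 1, 0)        # averages B0, B1, B2, B3

if __name__ == "__main__":
    print("plain linear blend:", wrong)
    print("resolve of pixel B:", right)
```

With this data the plain blend mixes in sample 1 of the neighbouring pixels and gives 0.5, while the resolve of pixel B gives 0.25, so edges bleed across pixel boundaries in the way the blocky screenshots suggest.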
Why the special bit twiddling logic isn't kicking in is anyone's guess. It could be a hardware bug, or it could be due to some restriction that we've never noticed in the bspec (though I spent some time digging this morning and didn't find anything). In any case, there's an easy solution: just do the blit as R32_FLOAT -> R32_FLOAT.
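As a sanity check that the workaround cannot change the result, here is a tiny Python model of the channel semantics discussed in this bug. The L32 and I32 expansions are the ones quoted above; the R32 expansion to <R, 0, 0, 1> is the usual GL rule. The function names are hypothetical and this does not model the sampler hardware at all.

```python
def sample(fmt, r):
    """Return the <R, G, B, A> value the shader sees for a texel whose stored float is r."""
    if fmt == "R32_FLOAT":
        return (r, 0.0, 0.0, 1.0)
    if fmt == "L32_FLOAT":
        return (r, r, r, 1.0)
    if fmt == "I32_FLOAT":
        return (r, r, r, r)
    raise ValueError(f"unhandled format: {fmt}")


def write_r32_float(color):
    """A single-channel R32_FLOAT render target keeps only the R component."""
    return color[0]


def blit_texel(src_fmt, r):
    """Model a one-texel blit from src_fmt to an R32_FLOAT destination."""
    return write_r32_float(sample(src_fmt, r))


if __name__ == "__main__":
    for fmt in ("R32_FLOAT", "L32_FLOAT", "I32_FLOAT"):
        print(fmt, blit_texel(fmt, 0.75))
```

All three source formats produce the same stored value, so overriding the source to R32_FLOAT only changes which sampler path is taken, not the data that ends up in the destination.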
I'll send a patch out to mesa-dev as soon as I've run it through a full piglit run on Sandy Bridge.