Recently while testing with the scsi_debug driver I was able to trick the block layer into reading random data which the block layer thought was valid ***.

Best to start with an example: say LBA ** 4660 has an unrecoverable error (aka medium error) and the block layer fires off a SCSI READ for 8 blocks (512 byte variety) at LBA 4656. The response will be a medium error with the sense buffer info field indicating LBA 4660. Now are the 4 blocks that precede it (i.e. LBA 4656 to 4659) possibly sitting in the data-in buffer and valid?
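To put a number on that example, here is a rough sketch of the arithmetic behind the "blocks before the bad LBA are valid" assumption. The function name and parameters are made up for illustration; this is not code from the block layer or the SCSI subsystem.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (illustration only): number of bytes that precede
 * the LBA reported in the sense buffer info field, i.e. the bytes the
 * block layer is assumed to treat as valid after a medium error.
 */
uint64_t bytes_before_bad_lba(uint64_t start_lba, uint32_t num_blocks,
                              uint32_t block_size, uint64_t bad_lba)
{
    /* The reported LBA must lie inside the requested range. */
    assert(bad_lba >= start_lba && bad_lba < start_lba + num_blocks);

    return (bad_lba - start_lba) * block_size;
}
```

For the READ above, bytes_before_bad_lba(4656, 8, 512, 4660) gives 2048, i.e. the 4 blocks at LBA 4656 to 4659.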

The block layer thinks they are. This is what my term "short read" in the title alludes to. So I put this question to the T10 reflector (http://www.t10.org/t10r.htm) in a post titled "sbc: reading blocks prior to a medium error". The answers were pretty clear. The one from George Penokie of LSI is interesting because the Linux block layer's assumption breaks some of LSI's equipment.

On the other hand, big array vendors and database vendors want exactly what the block layer is doing at the moment. So those guys don't want a change. [Please correct me if that is too sweeping.] Also I'm informed some other OSes do this as well.

I would like to propose a solution, at least in the SCSI subsystem context. The 'resid' field was added 11 years ago and is used by an HBA driver to indicate how many bytes less than requested were placed in the scatter gather list (i.e. the data-in buffer). It defaults to zero (meaning all requested bytes have been read). Usually for a medium error one would not bother setting resid (so resid would remain 0). Somewhat surprisingly the block layer has always ignored resid. I propose that in the case of a short read caused by a MEDIUM ERROR the block layer checks resid. If resid equals the requested number of bytes then that means no data in the scatter gather list is valid, and the block layer should act on this information.
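The proposed check can be sketched roughly as follows. This is only an illustration of the rule described above, not a patch: the function and parameter names are hypothetical, and a resid that is non-zero but less than the requested length is deliberately passed through to the current behaviour since the proposal does not cover that case.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical decision helper (illustration only).
 *
 * bufflen    : number of bytes requested by the READ
 * resid      : residual reported by the HBA driver (0 by default)
 * good_bytes : bytes preceding the failing LBA, as derived from the
 *              sense buffer info field
 *
 * Returns how many bytes should be treated as valid after a
 * MEDIUM ERROR.
 */
uint32_t valid_bytes_on_medium_error(uint32_t bufflen, uint32_t resid,
                                     uint32_t good_bytes)
{
    assert(resid <= bufflen);
    assert(good_bytes <= bufflen);

    /* Proposed rule: resid == bufflen means nothing in the buffer is valid. */
    if (resid == bufflen)
        return 0;

    /* Otherwise keep the current behaviour: trust the blocks before the bad LBA. */
    return good_bytes;
}
```

With the earlier example (bufflen 4096, 2048 bytes before LBA 4660), a driver that leaves resid at 0 still yields 2048 valid bytes, while a driver that sets resid to 4096 yields 0.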

To this end I propose to change the scsi_debug driver to set resid equal to bufflen when it simulates a medium error.

Changes in the block layer and drivers from vendors who want the strict "T10" handling of medium errors would also be required. Maybe the USB mass storage (and UAS) folks might also check if this impacts them.

Doug Gilbert

** LBA is Logical Block Address (origin 0)

*** Using 'modprobe scsi_debug opts=2' will set up a pseudo device which the example in the second paragraph is based on. Write a known pattern into the pseudo device (only 8 MB long) and use dd to read that device. Due to the 4 KB blocks used by the block layer, the read ends at LBA 4655. In my tests LBAs 4576 through to 4655 are corrupted (i.e. not what is actually on the pseudo device).