[PATCH 2/2] mmc_block: ensure all sectors that do not have errors are read - Kernel

[PATCH 2/2] mmc_block: ensure all sectors that do not have errors are read

If a card encounters an ECC error while reading a sector it will
time out. Instead of reporting the entire I/O request as having
an error, redo the I/O one sector at a time so that all readable
sectors are provided to the upper layers.
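
The idea described above can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not the driver code: `fake_card`, `fake_read()` and `read_with_fallback()` are hypothetical names invented for the example, and the "card" is just an array marking which sectors have ECC errors.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a card: bad[i] marks a sector with an ECC error. */
struct fake_card {
	const bool *bad;
	size_t nr_sectors;
};

/*
 * Hypothetical transfer routine: reads 'count' sectors starting at 'start'.
 * As in the behaviour described above, the whole transfer fails with
 * -ETIMEDOUT if any sector in the range is unreadable.
 */
static int fake_read(const struct fake_card *card, size_t start, size_t count)
{
	assert(start + count <= card->nr_sectors);
	for (size_t i = start; i < start + count; i++)
		if (card->bad[i])
			return -ETIMEDOUT;
	return 0;
}

/*
 * Try one multi-sector transfer first. If it fails (always a data timeout
 * in this model), redo the read one sector at a time. status[i] receives 0
 * for each readable sector and -EIO for each unreadable one. Returns the
 * number of sectors read successfully.
 */
static size_t read_with_fallback(const struct fake_card *card, size_t start,
				 size_t count, int *status)
{
	size_t good = 0;

	if (fake_read(card, start, count) == 0) {
		for (size_t i = 0; i < count; i++)
			status[i] = 0;
		return count;
	}

	/* Error path: simplicity matters more than speed here. */
	for (size_t i = 0; i < count; i++) {
		status[i] = fake_read(card, start + i, 1) ? -EIO : 0;
		if (status[i] == 0)
			good++;
	}
	return good;
}
```

With six sectors of which two are bad, the multi-sector attempt fails, and the fallback then reports four good sectors and -EIO for the two bad ones, rather than failing all six.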

Re: [PATCH 2/2] mmc_block: ensure all sectors that do not have errors are read

On Thu, 16 Oct 2008 16:26:57 +0300
Adrian Hunter wrote:
> If a card encounters an ECC error while reading a sector it will
> timeout. Instead of reporting the entire I/O request as having
> an error, redo the I/O one sector at a time so that all readable
> sectors are provided to the upper layers.
>
> Signed-off-by: Adrian Hunter
> ---

But you have to assume that no driver changes the 'blocks' variable e.g.
counts it down. It is not an optimisation, it is just to improve
reliability and readability. What does it obscure?
> 2. A comment here as well. Explain what this does and why it is safe
> (so people don't try to extend it to writes)

ok
> 3. You should check all errors, not just data.error and ETIMEDOUT.

No. Data timeout is a special case. The other errors are system errors.
If there is a command error or stop error (which is also a command error)
it means either there is a bug in the kernel or the controller or card
has failed to follow the specification. Under those circumstances

Data timeout on the other hand just means the data could not be retrieved
- in the case we have seen because of ECC error.
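
The distinction drawn above can be sketched as a small decision function. This is a hypothetical illustration only: `xfer_result`, `xfer_action` and `classify()` are invented names, and the order of the checks is a choice made for the example, not necessarily the order used in the patch.

```c
#include <errno.h>
#include <stdbool.h>

enum xfer_dir { XFER_READ, XFER_WRITE };

/* Hypothetical summary of the command, data and stop error fields. */
struct xfer_result {
	int cmd_error;
	int data_error;
	int stop_error;
};

enum xfer_action {
	XFER_OK,           /* transfer succeeded */
	XFER_RETRY_SINGLE, /* redo the read one sector at a time */
	XFER_FAIL_SECTOR,  /* report -EIO for this one sector, carry on */
	XFER_ABORT         /* system error: give up on the request */
};

static enum xfer_action classify(const struct xfer_result *r,
				 enum xfer_dir dir, bool multi)
{
	/* Command and stop errors are treated as system errors. */
	if (r->cmd_error || r->stop_error)
		return XFER_ABORT;

	if (!r->data_error)
		return XFER_OK;

	/* A data timeout on a read is the special case: the data is bad. */
	if (r->data_error == -ETIMEDOUT && dir == XFER_READ)
		return multi ? XFER_RETRY_SINGLE : XFER_FAIL_SECTOR;

	/* Writes and other data errors are never retried this way. */
	return XFER_ABORT;
}
```

Only a read that hits a data timeout takes the sector-by-sector path; a write with the same error aborts, which reflects the point that the retry is safe for reads only.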
> 4. You should first report the successfully transferred blocks as ok.

Re: [PATCH 2/2] mmc_block: ensure all sectors that do not have errors are read

Adrian Hunter wrote:
> Pierre Ossman wrote:
>> On Thu, 16 Oct 2008 16:26:57 +0300
>> Adrian Hunter wrote:
>>
>>> If a card encounters an ECC error while reading a sector it will
>>> timeout. Instead of reporting the entire I/O request as having
>>> an error, redo the I/O one sector at a time so that all readable
>>> sectors are provided to the upper layers.
>>>
>>> Signed-off-by: Adrian Hunter
>>> ---
>>
>> We actually had something like this on the table some time ago. It got
>> scrapped because of data integrity problems. This is just for reads
>> though, so I guess it should be safe.
>>
>>> @@ -278,6 +279,9 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
>>>  		brq.stop.flags = MMC_RSP_SPI_R1B | MMC_RSP_R1B | MMC_CMD_AC;
>>>  		brq.data.blocks = req->nr_sectors;
>>>
>>> +		if (disable_multi && brq.data.blocks > 1)
>>> +			brq.data.blocks = 1;
>>> +
>>
>> A comment here would be nice.
>
> Ok
>
>> You also need to adjust the sg list when you change the block count.
>> There was code there that did that previously, but it got removed in
>> 2.6.27-rc1.
>
> That is not necessary. It is an optimisation. In general, optimising an
> error path serves no purpose.
>
>>> @@ -312,6 +318,13 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
>>>
>>>  		mmc_queue_bounce_post(mq);
>>>
>>> +		if (multi && rq_data_dir(req) == READ &&
>>> +		    brq.data.error == -ETIMEDOUT) {
>>> +			/* Redo read one sector at a time */
>>> +			disable_multi = 1;
>>> +			continue;
>>> +		}
>>> +
>>
>> Some concerns here:
>>
>> 1. "brq.data.blocks > 1" doesn't need to be optimised into its own
>> variable. It just obscures things.
>
> But you have to assume that no driver changes the 'blocks' variable e.g.
> counts it down. It is not an optimisation, it is just to improve
> reliability and readability. What does it obscure?
>
>> 2. A comment here as well. Explain what this does and why it is safe
>> (so people don't try to extend it to writes)
>
> ok
>
>> 3. You should check all errors, not just data.error and ETIMEDOUT.
>
> No. Data timeout is a special case. The other errors are system errors.
> If there is a command error or stop error (which is also a command error)
> it means either there is a bug in the kernel or the controller or card
> has failed to follow the specification. Under those circumstances
>
> Data timeout on the other hand just means the data could not be retrieved
> - in the case we have seen because of ECC error.
>
>> 4. You should first report the successfully transferred blocks as ok.
>
> That is another optimisation of the error path i.e. not necessary. It
> is simpler to just start processing the request again - which the patch
> does.
>
>>> @@ -360,14 +373,21 @@ static int mmc_blk_issue_rq(struct mmc_queue *mq, struct request *req)
>>>  #endif
>>>  		}
>>>
>>> -		if (brq.cmd.error || brq.data.error || brq.stop.error)
>>> +		if (brq.cmd.error || brq.stop.error)
>>>  			goto cmd_err;
>>
>> Move your code to inside this if clause and you'll solve 3. and 4. in a
>> neat manner.
>
> Well, I do not agree with 3 and 4.
>
>> You might also want to print something so that it is
>> visible that the driver retried the transfer.
>
> There are already two error messages per sector (one from this function
> and one from '__blk_end_request()'), so another message is too much.
>
>>>
>>> -		/*
>>> -		 * A block was successfully transferred.
>>> -		 */
>>> +		if (brq.data.error) {
>>> +			if (brq.data.error == -ETIMEDOUT &&
>>> +			    rq_data_dir(req) == READ) {
>>> +				err = -EIO;
>>> +				brq.data.bytes_xfered = brq.data.blksz;
>>> +			} else
>>> +				goto cmd_err;
>>> +		} else
>>> +			err = 0;
>>> +
>>>  		spin_lock_irq(&md->lock);
>>> -		ret = __blk_end_request(req, 0, brq.data.bytes_xfered);
>>> +		ret = __blk_end_request(req, err, brq.data.bytes_xfered);
>>>  		spin_unlock_irq(&md->lock);
>>>  	} while (ret);
>>>
>>
>> Instead of this big song and dance routine, just have a dedicated piece
>> of code for calling __blk_end_request() for the single sector failure.
>
> Ok
>
> Amended patch follows: