On 01/06/12 13:20, Torne (Richard Coles) wrote:> On 1 June 2012 11:09, Adrian Hunter <adrian.hunter@intel.com> wrote:>> On 01/06/12 12:32, Torne (Richard Coles) wrote:>>> On 1 June 2012 10:31, Torne (Richard Coles) <torne@google.com> wrote:>>>> On 1 June 2012 09:35, Adrian Hunter <adrian.hunter@intel.com> wrote:>>>>> On 29/05/12 05:32, Ben Hutchings wrote:>>>>>> On Mon, 2012-05-28 at 18:31 +0100, Torne (Richard Coles) wrote:>>>>>>> From: "Torne (Richard Coles)" <torne@google.com>>>>>>>>>>>>>>> MMC CSD info can specify very large, ridiculous timeouts, big enough to>>>>>>> overflow timeout_ns on 32-bit machines. This can result in the card>>>>>>> timing out on every operation because the wrapped timeout value is far>>>>>>> too small.>>>>>>>>>>>>>> Fix the overflow by capping the result at 2 seconds. Cards specifying>>>>>>> longer timeouts are almost certainly insane, and host controllers>>>>>>> generally cannot support timeouts that long in any case.>>>>>>>>>>>>>> 2 seconds should be plenty of time for any card to actually function;>>>>>>> the timeout calculation code is already using 1 second as a "worst case">>>>>>> timeout for cards running in SPI mode.>>>>>>>>>>>> Needs a 'Signed-off-by'.>>>>>>>>>>>>> --->>>>>>> drivers/mmc/core/core.c | 11 ++++++++++->>>>>>> 1 files changed, 10 insertions(+), 1 deletions(-)>>>>>>>>>>>>>> diff --git a/drivers/mmc/core/core.c b/drivers/mmc/core/core.c>>>>>>> index 0b6141d..3b4a9fc 100644>>>>>>> --- a/drivers/mmc/core/core.c>>>>>>> +++ b/drivers/mmc/core/core.c>>>>>>> @@ -512,7 +512,16 @@ void mmc_set_data_timeout(struct mmc_data *data, const struct mmc_card *card)>>>>>>> if (data->flags & MMC_DATA_WRITE)>>>>>>> mult <<= card->csd.r2w_factor;>>>>>>>>>>>>>> - data->timeout_ns = card->csd.tacc_ns * mult;>>>>>>> + /*>>>>>>> + * The timeout in nanoseconds may overflow with some cards. Cap it at>>>>>>> + * two seconds both to avoid the overflow and also because host>>>>>>> + * controllers cannot generally generate timeouts that long anyway.>>>>>>> + */>>>>>>> + if (card->csd.tacc_ns <= (2 * NSEC_PER_SEC) / mult)>>>>>>> + data->timeout_ns = card->csd.tacc_ns * mult;>>>>>>> + else>>>>>>> + data->timeout_ns = 2 * NSEC_PER_SEC;>>>>>>>>>>>> We clearly need to guard against overflow here, and this is the correct>>>>>> way to clamp the multiplication. I can't speak as to whether 2 seconds>>>>>> is the right limit.>>>>>>>>>> The host controllers I have looked at have a limit of around 2.5 seconds.>>>>>>>>>> But why not just use the size of the type as the limit? e.g.>>>>>>>>>> if (card->csd.tacc_ns <= UINT_MAX / mult)>>>>> data->timeout_ns = card->csd.tacc_ns * mult;>>>>> else>>>>> data->timeout_ns = UINT_MAX;>>>>>>>> The host controller drivers don't seem to all do a very good job of>>>> preventing further overflows or handling large values correctly>>>> (though some do). sdhci takes the especially annoying additional step>>>> of printk'ing a warning for *every single MMC command* where>>>> data->timeout_ns is larger than the controller can accommodate.>>>> Capping it to a value with a sensible order of magnitude seems to make>>>> it more likely that cards with obviously bogus CSD parameters will>>>> actually work. I don't object to using a larger number for the limit,>>>> but UINT_MAX on a 64-bit system obviously doesn't limit this at all>>>> and will leave you with timeouts up to 17 minutes, which seems>>>> ridiculous :)>>>>>> Er, not 17 minutes; 102.4 seconds as I used later in my mail. SD cards>>> have their timeouts capped already, so their larger 100x multiplier is>>> not a problem; 102.4 seconds is the longest for an MMC card.>>>>>>> Linux is LP64. i.e. "int" is always 32-bit in the kernel> > Oh, sorry; didn't think that through. So, yeah, that'd be 4.29> seconds, which is still too long for many hosts :)> >>>> My original motivation for this patch is that I have a device with an>>>> eMMC that specifies a 25.5 second timeout, attached to a sdhci host>>>> whose maximum timeout is 2.8 seconds. Originally I proposed a patch to>>>> just remove the warning in sdhci, but nobody replied, and when I>>>> realised there was actually an overflow happening I opted to fix that>>>> instead.>>>>>>>> So, yeah, we could use UINT_MAX, but then at minimum I also need to>>>> kill the warning in sdhci to make my device work, and probably all the>>>> host controller drivers need to be checked to make sure they don't use>>>> timeout_ns in a way that can overflow.>>>>>>>> I've also just noticed that struct mmc_data's comment for timeout_ns>>>> says /* data timeout (in ns, max 80ms) */ which is not true (the max>>>> is 102.4 seconds if my math is correct), which may have contributed to>>>> the host drivers not being too careful :)>>>>>>>> What do you think?>>>> If you can identify the card, the you could make a new quirk in a fashion>> similar to mmc_card_long_read_time().>>>> Alternatively you could make use of SDHCI_QUIRK_BROKEN_TIMEOUT_VAL or>> introduce your own sdhci quirk to suppress the warning.> > Those would work, but it seems silly to me to suppress the warning> only for some cards, or to cap the timeout only for some cards. The> best way to identify a card that has a "broken" timeout value is.. if> the timeout value is a really big number, no? I am very skeptical that> there is a card out there anywhere that will actually take more than> two seconds (or 4.29 seconds, if you prefer) to successfully complete> a command.> > The warning itself seems to have extremely limited use; there's> nothing you can do about it other than suppress it (the driver is> already capping the timeout for you), and because timeouts are> calculated per-command the warning is absurdly noisy (in fact, the fun> part on my system was the warning being logged to klogd, being written> to logfiles on the eMMC, causing more warnings, causing more log> messages, etc) :) sdhci is the only host driver that complains about> this; it seems logical for the warning to apply to all hosts, or to> none of them...

I just noticed that from linux 3.4, the SD write timeout is now 3 secondstriggering the sdhci driver warning on every write on every SD card.So change pr_warning to DGB in sdhci_calc_timeout(). Chris?