On 12.07.2011, at 23:58, Michał Mirosław <mirq-linux@rere.qmqm.pl> wrote:
> On Tue, Jul 12, 2011 at 10:21:05PM +0800, Felix Fietkau wrote:
>> On 12.07.2011, at 21:03, Michał Mirosław <mirq-linux@rere.qmqm.pl> wrote:
>>
>>> On Tue, Jul 12, 2011 at 08:54:32PM +0800, Felix Fietkau wrote:
>>>> On 12.07.2011, at 17:55, Michał Mirosław <mirq-linux@rere.qmqm.pl> wrote:
>>>>
>>>>> On Tue, Jul 12, 2011 at 12:36:06PM +0800, Felix Fietkau wrote:
>>>>>> On 2011-07-11 8:52 AM, Michał Mirosław wrote:
>>>>>>> Also constify buf_addr for ath9k_hw_process_rxdesc_edma() to verify
>>>>>>> assumptions --- dma_sync_single_for_device() call can be removed.
>>>>>>>
>>>>>>> Signed-off-by: Michał Mirosław<mirq-linux@rere.qmqm.pl>
>>>>>>> ---
>>>>>>> drivers/net/wireless/ath/ath9k/ar9003_mac.c | 4 ++--
>>>>>>> drivers/net/wireless/ath/ath9k/ar9003_mac.h | 2 +-
>>>>>>> drivers/net/wireless/ath/ath9k/recv.c | 10 +++-------
>>>>>>> 3 files changed, 6 insertions(+), 10 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/net/wireless/ath/ath9k/recv.c
>>>>>>> b/drivers/net/wireless/ath/ath9k/recv.c
>>>>>>> index 70dc8ec..c5f46d5 100644
>>>>>>> --- a/drivers/net/wireless/ath/ath9k/recv.c
>>>>>>> +++ b/drivers/net/wireless/ath/ath9k/recv.c
>>>>>>> @@ -684,15 +684,11 @@ static bool ath_edma_get_buffers(struct ath_softc
>>>>>>> *sc,
>>>>>>> BUG_ON(!bf);
>>>>>>>
>>>>>>> dma_sync_single_for_cpu(sc->dev, bf->bf_buf_addr,
>>>>>>> - common->rx_bufsize, DMA_FROM_DEVICE);
>>>>>>> + common->rx_bufsize, DMA_BIDIRECTIONAL);
>>>>>>>
>>>>>>> ret = ath9k_hw_process_rxdesc_edma(ah, NULL, skb->data);
>>>>>>> - if (ret == -EINPROGRESS) {
>>>>>>> - /*let device gain the buffer again*/
>>>>>>> - dma_sync_single_for_device(sc->dev, bf->bf_buf_addr,
>>>>>>> - common->rx_bufsize, DMA_FROM_DEVICE);
>>>>>>> + if (ret == -EINPROGRESS)
>>>>>>> return false;
>>>>>>> - }
>>>>>>>
>>>>>>> __skb_unlink(skb,&rx_edma->rx_fifo);
>>>>>>> if (ret == -EINVAL) {
>>>>>> I have strong doubts about this change. On most MIPS devices,
>>>>>> dma_sync_single_for_cpu is a no-op, whereas
>>>>>> dma_sync_single_for_device flushes the cache range. With this
>>>>>> change, the CPU could cache the DMA status part behind skb->data and
>>>>>> that cache entry would not be flushed inbetween calls to this
>>>>>> functions on the same buffer, likely leading to rx stalls.
>>>>> You're suggesting a platform implementation bug then. If the platform is
>>>>> not
>>>>> cache-coherent, it should invalidate relevant CPU cache lines for
>>>>> sync_to_cpu
>>>>> and unmap cases. Do other devices show such symptoms on MIPS systems?
>>>>>
>>>>> I'm not familiar with the platform internals, so we should ask MIPS
>>>>> people.
>>>> I only mentioned MIPS to describe the potential side effect of this
>>>> change. From my current understanding of the DMA API, it would be wrong on
>>>> other platforms as well. I believe the _for_device function needs to be
>>>> used to transfer ownership of the buffer back to the device, before
>>>> calling _for_cpu again later for another read.
>>> What you're saying reminds the wording in DMA-API-HOWTO.txt that I find
>>> wrong (or at least misleading) compared to what DMA-API.txt describes.
>>> DMA sync calls do not transfer the ownership of the buffer - they are
>>> cache synchronization points, ownership passing is handled entirely by
>>> the driver.
>> What I meant was that the DMA sync calls reflect the ownership transfer of
>> the memory regions. In this case ownership is transferred between device and
>> CPU multiple times and the code reflects that.
>>>> This is definitely required in this case, because when the return code is
>>>> -EINPROGRESS, the driver waits for the hardware to complete this buffer,
>>>> and the next call has to fetch the memory again after the device has
>>>> updated it.
>>> Correctness of this access should be provided by sync_to_cpu() call.
>> At least in MIPS I'm sure it isn't. If I remember correctly, it also isn't
>> on ARM, so I'm pretty sure that either your understanding of the API is
>> incorrect, or arch code does not implement it properly. In either case, this
>> change (and probably also the p54 one) should not be merged.
>
> I briefly looked through DMA API implementation in MIPS, and except
> for R10k and R12k both sync_for_cpu and sync_for_device are no-ops
> (see: arch/mips/mm/dma-default.c). For R10k and R12k the syncs are
> in both points, and exactly like I described before - CPU cachelines
> are invalidated for DMA_FROM_DEVICE mappings, written back for
> DMA_TO_DEVICE, both for DMA_BIDIRECTIONAL (including redundant
> mapping+sync direction).
>
> So doing that sync_to_device you are just invalidating the same cachelines
> twice for no gain (or do nothing twice in some cases) - they are not read
> by CPU between sync_to_device -> sync_to_cpu (unless you have other bugs
> in the driver).
I think you're missing something. It works like this: In the AR9380 rx path,
the descriptor is part of the skb. The rx tasklet checks for rx frame
completion by calling the sync for cpu, reading the completion flag and (in
case of a not completed frame) flushes the cache for that location again (for
device). If you remove the for_device call, the next call to this function can
see stale data, as the for_cpu call can be a no-op.
- Felix