Commit Message

From: Thierry Reding <treding@nvidia.com>
The HOST1X_CHANNEL_DMAEND is an offset relative to the value written to
the HOST1X_CHANNEL_DMASTART register, but it is currently treated as an
absolute address. This can cause SMMU faults if the CDMA fetches past a
pushbuffer's IOMMU mapping.
Properly setting the DMAEND prevents the CDMA from fetching beyond that
address and avoid such issues. This is currently not observed because a
whole (almost) page of essentially scratch space absorbs any excessive
prefetching by CDMA. However, changing the number of slots in the push
buffer can trigger these SMMU faults.
Signed-off-by: Thierry Reding <treding@nvidia.com>
---
drivers/gpu/host1x/hw/cdma_hw.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

On Fri, Feb 01, 2019 at 04:48:56PM +0300, Dmitry Osipenko wrote:
> 01.02.2019 16:41, Thierry Reding пишет:> > On Fri, Feb 01, 2019 at 04:37:59PM +0300, Dmitry Osipenko wrote:> >> 01.02.2019 16:28, Thierry Reding пишет:> >>> From: Thierry Reding <treding@nvidia.com>> >>>> >>> The HOST1X_CHANNEL_DMAEND is an offset relative to the value written to> >>> the HOST1X_CHANNEL_DMASTART register, but it is currently treated as an> >>> absolute address. This can cause SMMU faults if the CDMA fetches past a> >>> pushbuffer's IOMMU mapping.> >>>> >>> Properly setting the DMAEND prevents the CDMA from fetching beyond that> >>> address and avoid such issues. This is currently not observed because a> >>> whole (almost) page of essentially scratch space absorbs any excessive> >>> prefetching by CDMA. However, changing the number of slots in the push> >>> buffer can trigger these SMMU faults.> >>>> >>> Signed-off-by: Thierry Reding <treding@nvidia.com>> >>> ---> >>> drivers/gpu/host1x/hw/cdma_hw.c | 4 ++--> >>> 1 file changed, 2 insertions(+), 2 deletions(-)> >>>> >>> diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c> >>> index 485aef5761af..a24c090ac96f 100644> >>> --- a/drivers/gpu/host1x/hw/cdma_hw.c> >>> +++ b/drivers/gpu/host1x/hw/cdma_hw.c> >>> @@ -75,7 +75,7 @@ static void cdma_start(struct host1x_cdma *cdma)> >>> > >>> cdma->last_pos = cdma->push_buffer.pos;> >>> start = cdma->push_buffer.dma;> >>> - end = start + cdma->push_buffer.size + 4;> >>> + end = cdma->push_buffer.size + 4;> >>> > >>> host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP,> >>> HOST1X_CHANNEL_DMACTRL);> >>> @@ -126,7 +126,7 @@ static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr)> >>> HOST1X_CHANNEL_DMACTRL);> >>> > >>> start = cdma->push_buffer.dma;> >>> - end = start + cdma->push_buffer.size + 4;> >>> + end = cdma->push_buffer.size + 4;> >>> > >>> /* set base, end pointer (all of memory) */> >>> host1x_ch_writel(ch, lower_32_bits(start), HOST1X_CHANNEL_DMASTART);> >>>> >>> >> This seems fixes problem that was added by a previous patch in this> >> series, "gpu: host1x: Support 40-bit addressing". What about to just> >> squash the patches? > > > > This actually fixes a bug that's always been there. This just happens to> > touch the same lines as an earlier patch as a result of some refactoring> > that the earlier patch did.> > Oh, wow. Indeed! That's a bit unfortunate :) Though it's quite> difficult to spot that bug by looking at the code, good that it got> caught. Was this bug triggered by trying to move IOVA allocation to> the end of the AS?
This was actually triggered because I noticed that we were using 512
slots in the push buffer, which means 4096 bytes, but then we needed
that extra 4 bytes for the RESTART opcode, which means that we're
currently allocating 8192 bytes of which only 4092 are being used.
So I thought: "Well, we should be able to live with 511 slots per push
buffer and save a full memory page. This is an easy optimization!" Boy
was I wrong... After making that change I started to see SMMU faults
for the address immediately after the push buffer mapping. I think the
same error happens regardless of where the push buffer is located. The
reason for the SMMU faults seem to be that CDMA happily keeps on pre-
fetching (a lot of) commands before it wraps around because of the
RESTART opcode. DMAEND is designed as a mechanism to prevent CDMA from
excessively fetching commands, but if you program a really large value
into it, it'll add that really large value to the DMASTART as offset
and the mechanism doesn't work anymore.
We don't currently see this because the 4092 bytes in the "scratch"
page of the push buffer allocation are enough to absorb the prefetching
that CDMA does.
We would also likely never see it happen in non-SMMU cases because the
CDMA would just keep on prefetching whatever memory happened to be after
the push buffer.
Thierry

01.02.2019 17:10, Thierry Reding пишет:
> On Fri, Feb 01, 2019 at 04:48:56PM +0300, Dmitry Osipenko wrote:>> 01.02.2019 16:41, Thierry Reding пишет:>>> On Fri, Feb 01, 2019 at 04:37:59PM +0300, Dmitry Osipenko wrote:>>>> 01.02.2019 16:28, Thierry Reding пишет:>>>>> From: Thierry Reding <treding@nvidia.com>>>>>>>>>>> The HOST1X_CHANNEL_DMAEND is an offset relative to the value written to>>>>> the HOST1X_CHANNEL_DMASTART register, but it is currently treated as an>>>>> absolute address. This can cause SMMU faults if the CDMA fetches past a>>>>> pushbuffer's IOMMU mapping.>>>>>>>>>> Properly setting the DMAEND prevents the CDMA from fetching beyond that>>>>> address and avoid such issues. This is currently not observed because a>>>>> whole (almost) page of essentially scratch space absorbs any excessive>>>>> prefetching by CDMA. However, changing the number of slots in the push>>>>> buffer can trigger these SMMU faults.>>>>>>>>>> Signed-off-by: Thierry Reding <treding@nvidia.com>>>>>> --->>>>> drivers/gpu/host1x/hw/cdma_hw.c | 4 ++-->>>>> 1 file changed, 2 insertions(+), 2 deletions(-)>>>>>>>>>> diff --git a/drivers/gpu/host1x/hw/cdma_hw.c b/drivers/gpu/host1x/hw/cdma_hw.c>>>>> index 485aef5761af..a24c090ac96f 100644>>>>> --- a/drivers/gpu/host1x/hw/cdma_hw.c>>>>> +++ b/drivers/gpu/host1x/hw/cdma_hw.c>>>>> @@ -75,7 +75,7 @@ static void cdma_start(struct host1x_cdma *cdma)>>>>> >>>>> cdma->last_pos = cdma->push_buffer.pos;>>>>> start = cdma->push_buffer.dma;>>>>> - end = start + cdma->push_buffer.size + 4;>>>>> + end = cdma->push_buffer.size + 4;>>>>> >>>>> host1x_ch_writel(ch, HOST1X_CHANNEL_DMACTRL_DMASTOP,>>>>> HOST1X_CHANNEL_DMACTRL);>>>>> @@ -126,7 +126,7 @@ static void cdma_timeout_restart(struct host1x_cdma *cdma, u32 getptr)>>>>> HOST1X_CHANNEL_DMACTRL);>>>>> >>>>> start = cdma->push_buffer.dma;>>>>> - end = start + cdma->push_buffer.size + 4;>>>>> + end = cdma->push_buffer.size + 4;>>>>> >>>>> /* set base, end pointer (all of memory) */>>>>> host1x_ch_writel(ch, lower_32_bits(start), HOST1X_CHANNEL_DMASTART);>>>>>>>>>>>>> This seems fixes problem that was added by a previous patch in this>>>> series, "gpu: host1x: Support 40-bit addressing". What about to just>>>> squash the patches? >>>>>> This actually fixes a bug that's always been there. This just happens to>>> touch the same lines as an earlier patch as a result of some refactoring>>> that the earlier patch did.>>>> Oh, wow. Indeed! That's a bit unfortunate :) Though it's quite>> difficult to spot that bug by looking at the code, good that it got>> caught. Was this bug triggered by trying to move IOVA allocation to>> the end of the AS?> > This was actually triggered because I noticed that we were using 512> slots in the push buffer, which means 4096 bytes, but then we needed> that extra 4 bytes for the RESTART opcode, which means that we're> currently allocating 8192 bytes of which only 4092 are being used.> > So I thought: "Well, we should be able to live with 511 slots per push> buffer and save a full memory page. This is an easy optimization!" Boy> was I wrong... After making that change I started to see SMMU faults> for the address immediately after the push buffer mapping. I think the> same error happens regardless of where the push buffer is located. The> reason for the SMMU faults seem to be that CDMA happily keeps on pre-> fetching (a lot of) commands before it wraps around because of the> RESTART opcode. DMAEND is designed as a mechanism to prevent CDMA from> excessively fetching commands, but if you program a really large value> into it, it'll add that really large value to the DMASTART as offset> and the mechanism doesn't work anymore.> > We don't currently see this because the 4092 bytes in the "scratch"> page of the push buffer allocation are enough to absorb the prefetching> that CDMA does.> > We would also likely never see it happen in non-SMMU cases because the> CDMA would just keep on prefetching whatever memory happened to be after> the push buffer.
Yeah, that's what usually happens when the code is getting improved. Good job! :)