Re: [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths
  2019-07-18 5:37 ` Viresh Kumar
@ 2019-07-19 4:12 ` Saravana Kannan
  2019-07-29 9:24 ` Viresh Kumar
  0 siblings, 1 reply; 47+ messages in thread
From: Saravana Kannan @ 2019-07-19 4:12 UTC (permalink / raw)
To: Viresh Kumar
Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
Sweeney, Sean, daidavid1, Rajendra Nayak, Sibi Sankar,
Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM,
open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML
On Wed, Jul 17, 2019 at 10:37 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> I know you have explained lots of things earlier as well, but they are
> available over multiple threads and I don't know where to reply now :)
>
> Let's have a proper discussion (once again) here and be done with it.
> Sorry for the trouble of explaining things again.
>
> On 17-07-19, 13:34, Saravana Kannan wrote:
> > On Wed, Jul 17, 2019 at 3:32 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > On 02-07-19, 18:10, Saravana Kannan wrote:
> > > > gpu_cache_opp_table: gpu_cache_opp_table {
> > > >         compatible = "operating-points-v2";
> > > >
> > > >         gpu_cache_3000: opp-3000 {
> > > >                 opp-peak-KBps = <3000>;
> > > >                 opp-avg-KBps = <1000>;
> > > >         };
> > > >         gpu_cache_6000: opp-6000 {
> > > >                 opp-peak-KBps = <6000>;
> > > >                 opp-avg-KBps = <2000>;
> > > >         };
> > > >         gpu_cache_9000: opp-9000 {
> > > >                 opp-peak-KBps = <9000>;
> > > >                 opp-avg-KBps = <9000>;
> > > >         };
> > > > };
> > > >
> > > > gpu_ddr_opp_table: gpu_ddr_opp_table {
> > > >         compatible = "operating-points-v2";
> > > >
> > > >         gpu_ddr_1525: opp-1525 {
> > > >                 opp-peak-KBps = <1525>;
> > > >                 opp-avg-KBps = <452>;
> > > >         };
> > > >         gpu_ddr_3051: opp-3051 {
> > > >                 opp-peak-KBps = <3051>;
> > > >                 opp-avg-KBps = <915>;
> > > >         };
> > > >         gpu_ddr_7500: opp-7500 {
> > > >                 opp-peak-KBps = <7500>;
> > > >                 opp-avg-KBps = <3000>;
> > > >         };
> > > > };
> > >
> > > Who is going to use the above tables and how ?
> >
> > In this example the GPU driver would use these. It'll go through these
> > and then decide what peak and average bw to pick based on whatever
> > criteria.
>
> Are you saying that the GPU driver will decide which bandwidth to
> choose while running at a particular frequency (say 2 GHz) ? And that
> it can choose 1525 or 3051 or 7500 from the ddr path ?
>
> Will it be possible to publicly share how we arrive at these decisions ?
GPU is just an example, so I can't really speak for how a random GPU
driver might decide which bandwidth to pick.
But one obvious way is to start at the lowest bandwidth and check the
bus port busy%. If it's > 80% busy, pick the next bandwidth level, and
so on. Something like what the cpufreq ondemand or conservative
governors used to do.
> The thing is I don't like these separate OPP tables which will not be
> used by anyone else, but just GPU (or a single device).
The BW OPP table isn't always a secondary OPP table. It can be a
primary OPP table too. For example, if you have a bandwidth monitoring
device/HW IP that can measure for a path and make requests for that
path, it'll have a BW OPP table and it'll pick from one of those BW
OPP levels based on the hardware measurements. It will have its own
device driver. This is basically no different from a device being the
only user of a freq OPP table.
> I would like
> to put this data in the GPU OPP table only. What about putting a
> range in the GPU OPP table for the Bandwidth if it can change so much
> for the same frequency.
I don't think the range is going to work. If a GPU is doing purely
computational work, it's not unreasonable for it to vote for the
lowest bandwidth for any GPU frequency.
>
> > > These are the maximum
> > > BW available over these paths, right ?
> >
> > I wouldn't call them "maximum" because there can't be multiple
> > maximums :) But yes, these are the meaningful bandwidth from the GPU's
> > perspective to use over these paths.
> >
> > >
> > > > gpu_opp_table: gpu_opp_table {
> > > >         compatible = "operating-points-v2";
> > > >         opp-shared;
> > > >
> > > >         opp-200000000 {
> > > >                 opp-hz = /bits/ 64 <200000000>;
> > > >         };
> > > >         opp-400000000 {
> > > >                 opp-hz = /bits/ 64 <400000000>;
> > > >         };
> > > > };
> > >
> > > Shouldn't this link back to the above tables via required-opp, etc ?
> > > How will we know how much BW is required by the GPU device for all the
> > > paths ?
> >
> > If that's what the GPU driver wants to do, then yes. But the GPU
> > driver could also choose to scale the bandwidth for these paths based
> > on multiple other signals. Eg: bus port busy percentage, measure
> > bandwidth, etc.
>
> Lets say that the GPU is running at 2 GHz right now and based on above
> inputs it wants to increase the bandwidth to 7500 for ddr path, now
> does it make sense to run at 4 GHz instead of 2 so we utilize the
> bandwidth to the best of our ability and waste less power ?
This is kinda hard to explain, but I'll try.
Firstly, the GPU power increase might be so high that you might not
want to do this anyway.
Also, what you are proposing *might* improve the perf/mW (efficiency)
but it doesn't decrease the actual power consumption. So, this doesn't
really work towards saving power for mobile devices.
Also, if the GPU is generating a lot of traffic to DDR and you
increase the GPU frequency, it's only going to generate even more
traffic. So you'll end up in a positive feedback loop that maxes out
the frequency and bandwidth. Definitely not something you want for a
mobile device.
> If something like that is acceptable, then what about keeping the
> bandwidth fixed for frequencies and rather scale the frequency of the
> GPU on the inputs your provided (like bus port busy percentage, etc).
I don't think it's acceptable.
> The current proposal makes me wonder on why should we try to reuse OPP
> tables for providing these bandwidth values as the OPP tables for
> interconnect paths isn't really a lot of data, only bandwidth all the
> time and there is no linking from the device's OPP table as well.
I think everyone is getting too tied up on mapping device frequency to
bandwidth requests. That's useful for a limited set of cases. But it
doesn't work for a lot of use cases.
A couple of benefits of using BW OPPs instead of listing bandwidth
values as part of frequency OPP tables:
- Works better when the interconnect path has more useful levels than
the device frequency levels. I think this might even be true on the
SDM845 for GPU and DDR. The link from freq OPP to BW OPP could list
the minimum bandwidth level to use for a particular device freq and
then let the hardware monitoring heuristic take it higher from there.
- Works even if no freq to bandwidth mapping heuristic is used but the
device needs to skip certain bandwidth levels based on the platform's
power/perf reasons.
- More scalable as more properties are added to BW OPP levels. Traffic
priority is one natural extension of the BW OPP "rows". Explicit
latency is another possibility.
- Currently devices that have use case specific bandwidth levels
(that's not computed at runtime) have no way of capturing their use
case level bandwidth needs in DT. Everyone is inventing their own
scheme. Having a BW OPP table would allow them to capture all the use
case specific bandwidth levels in DT and then pick one using the
index/phandle/etc. We could even allow naming OPP rows and picking
them that way. Not saying this is a main reason for BW OPP tables or that we
should do this, but this is a possibility to consider.
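To make the first bullet concrete, here is a hypothetical sketch of a
freq OPP to BW OPP link reusing the tables from earlier in this
thread. The use of required-opps and the specific phandle pairings are
illustrative only; the exact binding is what this series is debating:

```dts
gpu_opp_table: gpu_opp_table {
	compatible = "operating-points-v2";
	opp-shared;

	opp-200000000 {
		opp-hz = /bits/ 64 <200000000>;
		/* Minimum interconnect levels for this GPU frequency;
		 * a busy% heuristic could raise them from here. */
		required-opps = <&gpu_cache_3000>, <&gpu_ddr_1525>;
	};
	opp-400000000 {
		opp-hz = /bits/ 64 <400000000>;
		required-opps = <&gpu_cache_6000>, <&gpu_ddr_3051>;
	};
};
```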
Long story short, BW OPP tables make a lot of sense for anyone who has
actually done bandwidth scaling on a commercial platform.
If people are getting too tied up about the interconnect-opp-table we
can just drop that. I just added it to avoid having any implicit
ordering of tables between the operating-points-v2 property and the
interconnects property, and to call the mapping out more explicitly. But it's not a
hill worth dying on.
-Saravana
^ permalink raw reply [flat|nested] 47+ messages in thread

Re: [PATCH v3 1/6] dt-bindings: opp: Introduce opp-peak-KBps and opp-avg-KBps bindings
  2019-07-22 23:35 ` Rob Herring
@ 2019-07-22 23:40 ` Saravana Kannan
  2019-07-23 14:26 ` Rob Herring
  0 siblings, 1 reply; 47+ messages in thread
From: Saravana Kannan @ 2019-07-22 23:40 UTC (permalink / raw)
To: Rob Herring
Cc: Sibi Sankar, Georgi Djakov, Mark Rutland, Viresh Kumar,
Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
Sweeney, Sean, daidavid1, Rajendra Nayak, Bjorn Andersson,
Evan Green, Android Kernel Team, Linux PM,
open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML
On Mon, Jul 22, 2019 at 4:35 PM Rob Herring <robh@kernel.org> wrote:
>
> On Tue, Jul 16, 2019 at 11:58:08AM -0700, Saravana Kannan wrote:
> > On Tue, Jul 16, 2019 at 10:25 AM Sibi Sankar <sibis@codeaurora.org> wrote:
> > >
> > > Hey Saravana,
> > >
> > > https://patchwork.kernel.org/patch/10850815/
> > > There was already a discussion ^^ on how bandwidth bindings were to be
> > > named.
> >
> > Yes, I'm aware of that series. That series is trying to define a BW
> > mapping for an existing frequency OPP table. This patch is NOT about
> > adding a mapping to an existing table. This patch is about adding the
> > notion of BW OPP tables where BW is the "key" instead of "frequency".
> >
> > So let's not mix up these two series.
>
> Maybe different reasons, but in the end we'd end up with 2 bandwidth
> properties. We need to sort out how they'd overlap/coexist.
Oh, I totally agree! My point is that the other mapping isn't the
right approach because it doesn't handle a whole swath of use cases.
The one I'm proposing can act as a super set of the other (as in, can
handle that use case too).
> The same comment in that series about defining a standard unit suffix
> also applies to this one.
I thought I read that whole series and I don't remember reading about
the unit suffix. But I'll take a closer look. I've chosen to keep the
DT units at least as high a resolution as what the APIs accept
today. The APIs take KB/s. So I make sure DT can capture KB/s
differences. If we all agree that KB/s is "too accurate" then I think
we should change everything to MB/s.
-Saravana

Re: [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths
  2019-07-19 4:12 ` Saravana Kannan
@ 2019-07-29 9:24 ` Viresh Kumar
  2019-07-29 20:12 ` Saravana Kannan
  0 siblings, 1 reply; 47+ messages in thread
From: Viresh Kumar @ 2019-07-29 9:24 UTC (permalink / raw)
To: Saravana Kannan
Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
Sweeney, Sean, daidavid1, Rajendra Nayak, Sibi Sankar,
Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM,
open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML
On 18-07-19, 21:12, Saravana Kannan wrote:
> On Wed, Jul 17, 2019 at 10:37 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > I would like
> > to put this data in the GPU OPP table only. What about putting a
> > range in the GPU OPP table for the Bandwidth if it can change so much
> > for the same frequency.
>
> I don't think the range is going to work.
Any specific reason for that ?
> If a GPU is doing purely
> computational work, it's not unreasonable for it to vote for the
> lowest bandwidth for any GPU frequency.
I think that is fine, but if the GPU is able to find how much
bandwidth it needs why can't it just pass that value without needing
to have another OPP table for the path ?
The interconnect can then take all the requests and have its own OPP
table to get help on deciding what stable bandwidth to use.
--
viresh

Re: [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths
  2019-07-29 9:24 ` Viresh Kumar
@ 2019-07-29 20:12 ` Saravana Kannan
  2019-07-30 3:01 ` Viresh Kumar
  0 siblings, 1 reply; 47+ messages in thread
From: Saravana Kannan @ 2019-07-29 20:12 UTC (permalink / raw)
To: Viresh Kumar
Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
Sweeney, Sean, David Dai, Rajendra Nayak, Sibi Sankar,
Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM,
open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML
On Mon, Jul 29, 2019 at 2:24 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 18-07-19, 21:12, Saravana Kannan wrote:
> > On Wed, Jul 17, 2019 at 10:37 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > I would like
> > > to put this data in the GPU OPP table only. What about putting a
> > > range in the GPU OPP table for the Bandwidth if it can change so much
> > > for the same frequency.
> >
> > I don't think the range is going to work.
>
> Any specific reason for that ?
The next sentence was literally explaining this :) Fine to debate
that, but ignoring that and asking this question is kinda funny.
> > If a GPU is doing purely
> > computational work, it's not unreasonable for it to vote for the
> > lowest bandwidth for any GPU frequency.
>
> I think that is fine, but if the GPU is able to find how much
> bandwidth it needs why can't it just pass that value without needing
> to have another OPP table for the path ?
You were asking this question in the context of "can the GPU OPP just
list all the range of bandwidth it might use per GPU frequency". My point
is that the range would be useless because it would be the entire
available bandwidth range (because purely compute work might not need
any bandwidth).
Whereas, what the GPU's algorithm actually needs might be the list of
"useful" bandwidth levels to use.
Also, as we add more ICC request properties, this range idea will not scale.
-Saravana

Re: [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths
  2019-07-29 20:12 ` Saravana Kannan
@ 2019-07-30 3:01 ` Viresh Kumar
  2019-08-03 7:36 ` Saravana Kannan
  0 siblings, 1 reply; 47+ messages in thread
From: Viresh Kumar @ 2019-07-30 3:01 UTC (permalink / raw)
To: Saravana Kannan
Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
Sweeney, Sean, David Dai, Rajendra Nayak, Sibi Sankar,
Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM,
open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML
On 29-07-19, 13:12, Saravana Kannan wrote:
> On Mon, Jul 29, 2019 at 2:24 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> >
> > On 18-07-19, 21:12, Saravana Kannan wrote:
> > > On Wed, Jul 17, 2019 at 10:37 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > > I would like
> > > > to put this data in the GPU OPP table only. What about putting a
> > > > range in the GPU OPP table for the Bandwidth if it can change so much
> > > > for the same frequency.
> > >
> > > I don't think the range is going to work.
> >
> > Any specific reason for that ?
>
> The next sentence was literally explaining this :) Fine to debate
> that, but ignoring that and asking this question is kinda funny.
Okay, but ...
> > > If a GPU is doing purely
> > > computational work, it's not unreasonable for it to vote for the
> > > lowest bandwidth for any GPU frequency.
... it wasn't clear to me even after reading this sentence again now
:)
I understand that you may have to vote for the lowest bandwidth but
that doesn't explain why a range can't work (sorry if it was just me
who didn't understand it :)).
> > I think that is fine, but if the GPU is able to find how much
> > bandwidth it needs why can't it just pass that value without needing
> > to have another OPP table for the path ?
>
> You were asking this question in the context of "can the GPU OPP just
> list all the range of bandwidth it might use per GPU frequency". My point
> is that the range would be useless because it would be the entire
> available bandwidth range (because purely compute work might not need
> any bandwidth).
If it is useless to have the entire range here, then why bother providing
one ? Why can't the GPU request what it needs in exact terms, based on
its calculations ? And then based on these requests, let the
interconnect find what's the best/stable values it really wants to
program the path for (and for that the interconnect can use its own
OPP table, which would be fine).
> Whereas, what the GPU's algorithm actually needs might be the list of
> "useful" bandwidth levels to use.
Hmm, I am not sure the GPU's algorithm needs this table, AFAIU based on
all the conversations we have had until now. It is very capable of finding
how much bandwidth it needs; you just want the GPU driver to finally align
that with a stable bandwidth for the platform later on. And what I am
asking is that the end driver should not be required to look for
stable values; it should just request what it wants and let the interconnect
core/driver decide the stable values.
Very much like the clock framework, most of the user drivers just ask
for a clk value to be programmed and it is the clock driver which
keeps a table of the stable values and then aligns the requested value
to one of those.
> Also, as we add more ICC request properties, this range idea will not scale.
I am not sure about what kind of properties there can be and where
should they land. My feedback is completely based on what you have
presented in this patchset.
--
viresh

Re: [PATCH v3 0/6] Introduce Bandwidth OPPs for interconnect paths
  2019-07-30 3:01 ` Viresh Kumar
@ 2019-08-03 7:36 ` Saravana Kannan
  0 siblings, 0 replies; 47+ messages in thread
From: Saravana Kannan @ 2019-08-03 7:36 UTC (permalink / raw)
To: Viresh Kumar
Cc: Georgi Djakov, Rob Herring, Mark Rutland, Viresh Kumar,
Nishanth Menon, Stephen Boyd, Rafael J. Wysocki, Vincent Guittot,
Sweeney, Sean, David Dai, Rajendra Nayak, Sibi Sankar,
Bjorn Andersson, Evan Green, Android Kernel Team, Linux PM,
open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS, LKML
Resending due to HTML.
On Mon, Jul 29, 2019 at 8:02 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
>
> On 29-07-19, 13:12, Saravana Kannan wrote:
> > On Mon, Jul 29, 2019 at 2:24 AM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > >
> > > On 18-07-19, 21:12, Saravana Kannan wrote:
> > > > On Wed, Jul 17, 2019 at 10:37 PM Viresh Kumar <viresh.kumar@linaro.org> wrote:
> > > > > I would like
> > > > > to put this data in the GPU OPP table only. What about putting a
> > > > > range in the GPU OPP table for the Bandwidth if it can change so much
> > > > > for the same frequency.
> > > >
> > > > I don't think the range is going to work.
> > >
> > > Any specific reason for that ?
> >
> > The next sentence was literally explaining this :) Fine to debate
> > that, but ignoring that and asking this question is kinda funny.
>
> Okay, but ...
>
> > > > If a GPU is doing purely
> > > > computational work, it's not unreasonable for it to vote for the
> > > > lowest bandwidth for any GPU frequency.
>
> ... it wasn't clear to me even after reading this sentence again now
> :)
>
> I understand that you may have to vote for the lowest bandwidth but
> that doesn't explain why a range can't work (sorry if it was just me
> who didn't understand it :)).
Well, doesn't work as in, it doesn't give any additional info. I can
just vote for 0 or UINT_MAX if I want to stay at the lowest or highest
bandwidth. Having the actual values of the lowest or highest point
doesn't help for cases where you need to skip intermediate bandwidth
levels when going from low to high (as the need increases).
>
> > > I think that is fine, but if the GPU is able to find how much
> > > bandwidth it needs why can't it just pass that value without needing
> > > to have another OPP table for the path ?
> >
> > You were asking this question in the context of "can the GPU OPP just
> > list all the range of bandwidth it might use per GPU frequency". My point
> > is that the range would be useless because it would be the entire
> > available bandwidth range (because purely compute work might not need
> > any bandwidth).
>
> If it is useless to have the entire range here, then why bother providing
> one ? Why can't the GPU request what it needs in exact terms, based on
> its calculations ? And then based on these requests, let the
> interconnect find what's the best/stable values it really wants to
> program the path for (and for that the interconnect can use its own
> OPP table, which would be fine).
Let's say the actual path can support 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10 GB/s.
Let's say 2, 3, and 4 need the same voltage level as 5 for this path.
So, for the GPU's needs, using 2, 3 or 4 GB/s might not be good because
the power savings from the lower frequency are not worth the
performance and power impact (if you run the interconnect slower, the
GPU would run faster to achieve the same performance) compared to
running the interconnect at 5 GB/s. Similarly it might skip 6 GB/s. So
even if the GPU can somehow calculate the exact bandwidth required (or,
say, measure it), it'll need to know to skip 2, 3 and 4 because they
aren't power/perf efficient levels to use.
But all these bandwidth levels might be usable for a smaller HW IP
whose power cost isn't high. So the power savings from running the
interconnect at 3 GB/s might be worth it -- because even if the small HW IP ran
faster to achieve the performance, the power increase in the HW IP
won't be higher than the power savings from running the interconnect
slower.
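A sketch of how a client could encode that "useful levels only"
knowledge, using the illustrative 1-10 GB/s numbers from this example
(function and variable names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Per-client list of "useful" bandwidth levels (GB/s), e.g. from its
 * BW OPP table. 2, 3 and 4 GB/s are omitted for the GPU because they
 * share a voltage level with 5 GB/s and aren't power/perf efficient.
 */
static const unsigned int gpu_levels[] = { 1, 5, 7, 8, 9, 10 };

/* Round a computed or measured need up to the next useful level. */
static unsigned int pick_level(const unsigned int *levels, size_t n,
			       unsigned int need)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (levels[i] >= need)
			return levels[i];
	return levels[n - 1];	/* saturate at the highest level */
}
```

A smaller HW IP would simply carry a different table (perhaps all ten
levels) while sharing the same selection logic.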
> > Whereas, what the GPU's algorithm actually needs might be the list of
> > "useful" bandwidth levels to use.
>
> Hmm, I am not sure the GPU's algorithm needs this table, AFAIU based on
> all the conversations we have had until now. It is very capable of finding
> how much bandwidth it needs;
Not really. If you have a monitor that can actually measure the
bandwidth, yes. Most often that's not the case. If you just have a
monitor that can give you the bus port busy% then it'll have to use
this table to pick the useful ones. As in, in the example above, if
the bus is still too busy at 1 GB/s it would directly ask for 5 GB/s
instead of going through 2, 3 and 4.
> you just want the GPU driver to finally align
> that with a stable bandwidth for the platform later on. And what I am
> asking is that the end driver should not be required to look for
> stable values; it should just request what it wants and let the interconnect
> core/driver decide the stable values.
I think you've misunderstood my prior statements then.
The interconnect driver would then still have to aggregate the
requests and pick the final frequency for the interconnect. That's
where it comes in -- executing/implementing the requests of all the
clients.
> Very much like the clock framework, most of the user drivers just ask
> for a clk value to be programmed and it is the clock driver which
> keeps a table of the stable values and then aligns the requested value
> to one of those.
Think of this as similar to the clock API and the OPP tables for CPUs.
The clock API could run the CPU at many different frequencies, but the
CPU driver uses the CPU OPP table to pick a certain set of frequencies
that are "useful". If your CPU clock is shared with another block, say
L3, and there's an L3 driver that's requesting a different frequency
range (using clk_set_rate_range(x, UINT_MAX)), that's where the clock
driver aggregates their requests and sets the final clock frequency.
Similarly, the GPU driver wants to pick useful interconnect path
bandwidth levels using BW OPP tables. And the interconnect framework
aggregates the requests across multiple drivers requesting different
bandwidths.
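As a sketch of that aggregation step, here is one common
interconnect-provider convention: average bandwidths add (concurrent
traffic) while peaks take the maximum. Real providers may aggregate
differently, so treat the names and the policy as illustrative:

```c
#include <assert.h>

/* A client's path request, in KB/s, mirroring the avg/peak pair
 * used by the BW OPP rows in this series. */
struct icc_req {
	unsigned int avg_kbps;
	unsigned int peak_kbps;
};

/* Aggregate all client requests on one path: sum the averages,
 * take the max of the peaks. */
static struct icc_req aggregate(const struct icc_req *reqs, int n)
{
	struct icc_req out = { 0, 0 };
	int i;

	for (i = 0; i < n; i++) {
		out.avg_kbps += reqs[i].avg_kbps;
		if (reqs[i].peak_kbps > out.peak_kbps)
			out.peak_kbps = reqs[i].peak_kbps;
	}
	return out;
}
```

The provider would then map the aggregated pair onto its own table of
frequencies it can actually program.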
Does that make sense?
-Saravana