Date: Wed, 26 Jun 2019 11:10:31 -0700
From: Saravana Kannan <saravanak@...gle.com>
To: Viresh Kumar <viresh.kumar@...aro.org>
Cc: MyungJoo Ham <myungjoo.ham@...sung.com>,
Kyungmin Park <kyungmin.park@...sung.com>,
Chanwoo Choi <cw00.choi@...sung.com>,
Viresh Kumar <vireshk@...nel.org>, Nishanth Menon <nm@...com>,
Stephen Boyd <sboyd@...nel.org>,
"Rafael J. Wysocki" <rjw@...ysocki.net>,
Android Kernel Team <kernel-team@...roid.com>,
Linux PM <linux-pm@...r.kernel.org>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v1 0/3] Add required-opps support to devfreq passive gov
On Tue, Jun 25, 2019 at 11:32 PM Viresh Kumar <viresh.kumar@...aro.org> wrote:
>
> On 24-06-19, 22:29, Saravana Kannan wrote:
> > No, the CPUs will be the "parent" and the cache will be the "child".
> > CPU is a special case when it comes to the actual software (not DT) as
> > we'll need the devfreq governor to look at all the CPUfreq policies to
> > decide the cache frequency (max of all their requirements).
> >
> > I think "master" and "slave" would have been a better term as the
> > master device determines its frequency using whatever means and the
> > "slave" device just "follows" the master device.
>
> Okay, so to confirm again this is what we will have:
>
> - CPUs are called masters and Caches are slaves.
>
> - The devfreq governor we are talking about takes care of changing
> frequency of caches (slaves) and chooses a target frequency for
> caches based on what the masters are running at.
>
> - The CPUs OPP nodes will have required-opps property and will be
> pointing to the caches OPP nodes. The CPUs may already be using
> required-opps node for PM domain performance state thing.
>
>
> Now the problem is "required-opp" means something really *required*
> and it is not optional.
But we could interpret it as "required" for different things.
> Like a specific voltage level before we can
> switch to a particular frequency.
Required for stability.
> And this is how I have though of it
> until now. And this shouldn't be handled asynchronously, i.e. required
> OPPs must be set while configuring OPP of a device.
The users of clocks are expected to set up the voltage correctly
before changing the frequency, the drivers are expected to power up
the device before trying to access its registers, etc. So I don't
think there is one correct way. Also OPP sets the pstate only if
dev_pm_opp_set_genpd_virt_dev() has been called to set them up. So
this is not a mandatory principle in the kernel that a framework
providing an API should have all the dependencies. So I don't think
we'll be violating some golden rule.
> So, when a CPU changes frequency, we must change the performance state
> of PM domain and change frequency/bw of the cache synchronously.
I mean, it's going to be changed when we get the CPUfreq transition
notifiers. From a correctness point of view, setting it inside the OPP
framework is not any better than doing it when we get the notifiers.
> And
> in such a case "required-opps" can be a very good fit for this use
> case.
Glad you agree :)
> But with what you are trying to do it is no longer required-opp but
> good-to-have-opp. And that worries me.
I see this as "required for good performance". So I don't see it as
redefining required-opps. If someone wants good performance/power
balance they follow the "required-opps". Technically even the PM
pstates are required for good power. Otherwise, the system could leave
the voltage at max and stuff would still work.
Also, the slave device might need to get input from multiple master
devices and aggregate the request before setting the slave device
frequency. So I don't think OPP framework would be the right place to
deal with those things. For example, L3 might (will) have different
mappings for big vs little cores. So that needs to be aggregated and
set properly by the slave device driver. Also, GPU might have a
mapping for L3 too. In which case the L3 slave driver needs to take
input from even more masters before it decides its frequency. But most
importantly, we still need the ability to change governors for L3.
Again these are just examples with L3 and it can get more complicated
based on the situation.
Most importantly, instead of always going by mapping, one might decide
to scale the L3 based on some other governor (that looks at some HW
counter). Or just set it to performance governor for a use case for
which performance is more important. All of this comes for free with
devfreq and if we always set it from OPP framework we don't give this
required control to userspace.
I think going through devfreq is the right approach for this. And we
can always rewrite the software if we find problems in the future. But
as it stands today, this will help cases like exynos without the need
for a lot of changes. Hope I've convinced you.
-Saravana