Hello,
This series is a second stab at exposing hardware controls on Baseboard
Management Controllers that are hard to fit into any a coherent abstraction.
The patches introduce new devicetree bindings and sysfs attributes, along with
a platform driver to expose devicetree nodes of the former as the latter.
Obviously not having an abstract interface to these knobs and switches is not
ideal, but the proposal does have some advantages over devmem:
1. Removal of read-modify-write races, as register update is atomic
2. Reduced foot-gun, as only the defined field is accessible
3. Improved discoverability as the fields are named
The intent is that the setup should be used as a second-last resort (over
devmem). I'm interested in feedback on:
a) Is this a acceptable improvement over devmem?
b) If a), is the devicetree the best way to describe the fields?
c) If b), is directly mapping them to a sysfs attr group managable longterm?
My concern with b) and c) is that there's not a clear restriction on what
fields can be exposed using the driver, so I've tried to compensate by
explicitly documenting the recognised fields in the bindings.
Looking for feedback on all fronts.
Cheers,
Andrew
Andrew Jeffery (4):
dt-bindings: misc: Add bindings for misc. BMC control fields
Documentation: ABI: Add sysfs-devices-platform-field to testing
misc: Add bmc-misc-ctrl
dts: aspeed-g5: Describe VGA, SIO scratch and DAC mux fields
.../ABI/testing/sysfs-devices-platform-field | 95 ++++
.../bindings/misc/bmc-misc-ctrl.txt | 252 ++++++++++
MAINTAINERS | 8 +
arch/arm/boot/dts/aspeed-g5.dtsi | 192 ++++++++
drivers/misc/Kconfig | 11 +
drivers/misc/Makefile | 1 +
drivers/misc/bmc-misc-ctrl.c | 446 ++++++++++++++++++
7 files changed, 1005 insertions(+)
create mode 100644 Documentation/ABI/testing/sysfs-devices-platform-field
create mode 100644 Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt
create mode 100644 drivers/misc/bmc-misc-ctrl.c
--
2.17.1

"Fields" expose control of hardware directly to userspace where
appropriate. Examples of expected use are single bit switches or other
small masks of registers where the range of values is entirely policy
driven and the field is not part of a larger, coherent design.
These fields can be from read-only, read-write or
write-1-set/write-1-clear register sets.
Using fields to control the behaviour of hardware local to the kernel
exposing them is likely incorrect. The use-case motivating the fields
feature is for Baseboard Management Controllers (BMCs) to expose policy
controls for booting and running their host systems.
Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
---
Since RFC v1:
* Describe a 'type' attribute that determines the behaviour of the remaining
attributes
* Rework paths to point through /sys/devices/platform
* Add a description to the commit message
.../ABI/testing/sysfs-devices-platform-field | 95 +++++++++++++++++++
MAINTAINERS | 1 +
2 files changed, 96 insertions(+)
create mode 100644 Documentation/ABI/testing/sysfs-devices-platform-field
diff --git a/Documentation/ABI/testing/sysfs-devices-platform-field b/Documentation/ABI/testing/sysfs-devices-platform-field
new file mode 100644
index 000000000000..216481d8bc99
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-devices-platform-field
@@ -0,0 +1,95 @@
+This document defines the sysfs attributes provided by the bmc-misc-ctrl
+driver. See Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt for
+exhaustive list of field definitions.
+
+What: /sys/devices/platform/.../<field>/<label>/label
+Date: July, 2018
+KernelVersion: v4.19
+Contact: openbmc@lists.ozlabs.org
+Description:
+ An RO attribute providing the name of the field of interest.
+ Corresponds to the value of <label> in the path
+Users: openbmc@lists.ozlabs.org
+
+What: /sys/devices/platform/.../<field>/<label>/type
+Date: July, 2018
+KernelVersion: v4.19
+Contact: openbmc@lists.ozlabs.org
+Description:
+ An RO attribute describing the type of the field. The type
+ takes one of three values:
+
+ 'ro': The field is read-only. The 'value' attribute will be
+ read-only and neither 'set' nor 'clear' attributes will
+ be present.
+ 'rw': The field is read-write. The 'value' attribute will be
+ both readable and writable and neither 'set' nor
+ 'clear' attributes will be present. Values written to
+ the 'value' attribute will be atomically updated.
+ 'w1sc': The field uses write-1-{set,clear} semantics. The
+ 'value' attribute will be read-only, and both 'set' and
+ 'clear' attributes will be present to manipulate
+ 'value'. 'set' and 'clear' will both be write-only.
+Users: openbmc@lists.ozlabs.org
+
+What: /sys/devices/platform/.../<field>/<label>/mask
+Date: July, 2018
+KernelVersion: v4.19
+Contact: openbmc@lists.ozlabs.org
+Description:
+ An RO attribute providing the mask applied to the value
+ read/written from the 'value' attribute.
+Users: openbmc@lists.ozlabs.org
+
+What: /sys/devices/platform/.../<field>/<label>/value
+Date: July, 2018
+KernelVersion: v4.19
+Contact: openbmc@lists.ozlabs.org
+Description:
+ The value of the field of interest.
+
+ If the field is exposed from a read-modify-write register this
+ attribute will be RW, where writes will set the field to the
+ value written. Writing values that exceed the width of the
+ field will return an error.
+
+ If the field is exposed from a write-1-set/write-1-clear
+ register this attribute will be RO, and the attributes 'set'
+ and 'clear' will be present as write-only.
+Users: openbmc@lists.ozlabs.org
+
+What: /sys/devices/platform/.../<field>/<label>/set
+Users: openbmc@lists.ozlabs.org
+Date: July, 2018
+KernelVersion: v4.19
+Contact: openbmc@lists.ozlabs.org
+Description:
+ A WO attribute that when written will set bits in the backing
+ register corresponding to set bits in the value written.
+ Register bits corresponding to cleared bits in the written
+ value will remain unchanged.
+
+ This attribute is exposed when the field is identified as being
+ composed of write-1-set and write-1-clear registers.
+
+ Writing values that exceed the width of the mask value will
+ return an error.
+Users: openbmc@lists.ozlabs.org
+
+What: /sys/devices/platform/.../<field>/<label>/clear
+Users: openbmc@lists.ozlabs.org
+Date: July, 2018
+KernelVersion: v4.19
+Contact: openbmc@lists.ozlabs.org
+Description:
+ A WO attribute that when written will clear bits in the backing
+ register corresponding to set bits in the value written.
+ Register bits corresponding to cleared bits in the written
+ value will remain unchanged.
+
+ This attribute is exposed when the field is identified as being
+ composed of write-1-set and write-1-clear registers.
+
+ Writing values that exceed the width of the mask value will
+ return an error.
+Users: openbmc@lists.ozlabs.org
diff --git a/MAINTAINERS b/MAINTAINERS
index fa2033a522f2..d167f0340c11 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2741,6 +2741,7 @@ R: Andrew Jeffery <andrew@aj.id.au>
L: openbmc@lists.ozlabs.org (moderated for non-subscribers)
S: Supported
F: Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt
+F: Documentation/ABI/testing/sysfs-devices-platform-field
BPF (Safe dynamic programs and tools)
M: Alexei Starovoitov <ast@kernel.org>
--
2.17.1

On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote:
> Baseboard Management Controllers (BMCs) are embedded SoCs that exist to
> provide remote management of (primarily) server platforms. BMCs are
> often tightly coupled to the platform in terms of behaviour and provide
> many hardware features integral to booting and running the host system.
>
> Some of these hardware features are simple, for example scratch
> registers provided by the BMC that are exposed to both the host and the
> BMC. In other cases there's a single bit switch to enable or disable
> some of the provided functionality.
>
> The documentation defines bindings for fields in registers that do not
> integrate well into other driver models yet must be described to allow
> the BMC kernel to assume control of these features.
So we'll get a new binding when that happens? That will break
compatibility.
>
> Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
> ---
>
> Since RFC v1:
>
> * Add a commit message
> * Minor changes to documented labels
>
> .../bindings/misc/bmc-misc-ctrl.txt | 252 ++++++++++++++++++
> MAINTAINERS | 6 +
> 2 files changed, 258 insertions(+)
> create mode 100644 Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt
>
> diff --git a/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt b/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt
> new file mode 100644
> index 000000000000..2c869fcc7ef2
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt
> @@ -0,0 +1,252 @@
> +BMC Miscellaneous Control Interfaces
> +====================================
> +
> +Baseboard Management Controllers (BMCs) often have an array of hardware
> +features that need to be described but are awkward to sensibly expose.
> +
> +This bindings document provides a generic mechanism for describing such
> +features, covering read-only (RO), read-modify-write (RMW) and
> +write-1-set/write-1-clear (W1SC) semantics.
If we wanted a generic mechanism for single register bits/fields in DT,
we'd have one already. A node per register bit doesn't scale.
Maybe this should be modelled using GPIO binding? There's a line there
too as whether the signals are "general purpose" or not.
Rob

On Wed, 2018-07-11 at 14:04 -0600, Rob Herring wrote:
> On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote:
> > Baseboard Management Controllers (BMCs) are embedded SoCs that exist to
> > provide remote management of (primarily) server platforms. BMCs are
> > often tightly coupled to the platform in terms of behaviour and provide
> > many hardware features integral to booting and running the host system.
> >
> > Some of these hardware features are simple, for example scratch
> > registers provided by the BMC that are exposed to both the host and the
> > BMC. In other cases there's a single bit switch to enable or disable
> > some of the provided functionality.
> >
> > The documentation defines bindings for fields in registers that do not
> > integrate well into other driver models yet must be described to allow
> > the BMC kernel to assume control of these features.
>
> So we'll get a new binding when that happens? That will break
> compatibility.
What do you mean ? The binding provides a way to describe arbitrary
register fields and expose them to userspace. I'm not sure what you
mean by backward compatibility.
There is simply no way these things can be "abstracted" via some kind
of nice layered Linux subsystem of some sort. It's just a whole bunch
of knobs that control various things integral to the operation of the
host system. Andrew can provide a more precise list if you need to.
>
> >
> > Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
> > ---
> >
> > Since RFC v1:
> >
> > * Add a commit message
> > * Minor changes to documented labels
> >
> > .../bindings/misc/bmc-misc-ctrl.txt | 252 ++++++++++++++++++
> > MAINTAINERS | 6 +
> > 2 files changed, 258 insertions(+)
> > create mode 100644 Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt
> >
> > diff --git a/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt b/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt
> > new file mode 100644
> > index 000000000000..2c869fcc7ef2
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt
> > @@ -0,0 +1,252 @@
> > +BMC Miscellaneous Control Interfaces
> > +====================================
> > +
> > +Baseboard Management Controllers (BMCs) often have an array of hardware
> > +features that need to be described but are awkward to sensibly expose.
> > +
> > +This bindings document provides a generic mechanism for describing such
> > +features, covering read-only (RO), read-modify-write (RMW) and
> > +write-1-set/write-1-clear (W1SC) semantics.
>
> If we wanted a generic mechanism for single register bits/fields in DT,
> we'd have one already. A node per register bit doesn't scale.
Register *field*. I think you are looking at this from the wrong angle.
This isn't about exposing 236821643 SoC registers in an embedded
product. This is about exposing a dozen or so tunables that don't
control the SoC from the perspective of the OS running on it, but
controls how various features of the SoC are exposed to the *host*
system. Examples could be which of the SoC internal PCIe devices
exposed to the host is enabled (the SoC can appear as a GPU, a BMC misc
device or both or neither, it can have a DMA controller or not,
etc...). Other examples are scratch registers that need to be set to
system specific values by userspace, which the BIOS of the host will
read to determine some configuration information. That sort of thing.
There isn't that many, scalability isn't a problem. Both the list of
registers of interest and what needs to go in them is very much
system/vendor specific. This is a way for the kernel to act as a
"conduit" that doesn't need to know the specifics of a given
system/vendor/BIOS combination.
Your response smells like yet another case of trying to apply one of
those general "blanket" rules to something where it's completely
unapplicable.
> Maybe this should be modelled using GPIO binding? There's a line there
> too as whether the signals are "general purpose" or not.
GPIOs are binary values right ? These are arbitrary fields. We want
arbitrary fields. This is really needed Rob, otherwise we'll NEVER have
BMC support upstream.
The other option will be a dozen random ad-hoc char-devs that would
have to be updated every time a new bit needs to be added or changed.
Ben.

Hi Rob,
Thanks for the response.
On Thu, 12 Jul 2018, at 05:34, Rob Herring wrote:
> On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote:
> > Baseboard Management Controllers (BMCs) are embedded SoCs that exist to
> > provide remote management of (primarily) server platforms. BMCs are
> > often tightly coupled to the platform in terms of behaviour and provide
> > many hardware features integral to booting and running the host system.
> >
> > Some of these hardware features are simple, for example scratch
> > registers provided by the BMC that are exposed to both the host and the
> > BMC. In other cases there's a single bit switch to enable or disable
> > some of the provided functionality.
> >
> > The documentation defines bindings for fields in registers that do not
> > integrate well into other driver models yet must be described to allow
> > the BMC kernel to assume control of these features.
>
> So we'll get a new binding when that happens? That will break
> compatibility.
Can you please expand on this? I'm not following.
>
> >
> > Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
> > ---
> >
> > Since RFC v1:
> >
> > * Add a commit message
> > * Minor changes to documented labels
> >
> > .../bindings/misc/bmc-misc-ctrl.txt | 252 ++++++++++++++++++
> > MAINTAINERS | 6 +
> > 2 files changed, 258 insertions(+)
> > create mode 100644 Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt
> >
> > diff --git a/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt b/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt
> > new file mode 100644
> > index 000000000000..2c869fcc7ef2
> > --- /dev/null
> > +++ b/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt
> > @@ -0,0 +1,252 @@
> > +BMC Miscellaneous Control Interfaces
> > +====================================
> > +
> > +Baseboard Management Controllers (BMCs) often have an array of hardware
> > +features that need to be described but are awkward to sensibly expose.
> > +
> > +This bindings document provides a generic mechanism for describing such
> > +features, covering read-only (RO), read-modify-write (RMW) and
> > +write-1-set/write-1-clear (W1SC) semantics.
>
> If we wanted a generic mechanism for single register bits/fields in DT,
> we'd have one already.
I feel like this is an argument of tradition. Maybe people have been dissuaded from doing so when they don't have a reasonable use-case? I'm not saying that what I'm proposing is unquestionably reasonable, but I don't want to dismiss it out of hand.
> A node per register bit doesn't scale.
It isn't meant to scale in terms of a single system. Using it extensively is very likely wrong. Separately, register-bit-led does pretty much the same thing. Doesn't the scale argument apply there? Who is to stop me from attaching an insane number of LEDs to a system?
Obviously if there are lots of systems using it sparingly and legitimately then maybe there's a scale issue, but isn't that just a reality of different hardware designs? Whoever is implementing support for the system is going to have to describe the hardware one way or another.
>
> Maybe this should be modelled using GPIO binding? There's a line there
> too as whether the signals are "general purpose" or not.
I don't think so, mainly because some of the things it is intended to be used for are not GPIOs. For instance, take the DAC mux I've described in the patch. It doesn't directly influence anything external to the SoC (i.e. it's certainly not a traditional GPIO in any sense). However, it does *indirectly* influence the SoC's behaviour by muxing the DAC internally between:
0. VGA device exposed on the host PCIe bus
1. The "Graphics CRT" controller
2. VGA port A
3. VGA port B
Maybe this could be modelled by pinmux, but then we still need some way to expose the mux functions to userspace for selection (userspace needs to transition arbitrarily between at least options 0 and 1 at runtime), at which point we haven't achieved much beyond adding a whole heap of infrastructure in the chain.
Given 0 and 1, maybe exposing attributes in relevant drivers would be reasonable, except 0 isn't exposed on the SoC's internal bus so there is no driver on the BMC-side to do so. Taking into account 2 and 3 are also purely hardware paths further dashes the idea, as the configuration doesn't really "belong" to the Graphics CRT device more than it belongs anywhere else, except for the fact that there isn't anywhere else to expose it.
Further, the BMC's kernel can't make the decision as to when to switch the mux as it knows nothing of the host's state. The BMC userspace is controlling the host's boot state and so *does* know when to flip the switch. Finally, the mux is in separate IP to the CRT or VGA blocks: It lives in the System Control Unit.
My current point of view is the DAC mux field is effectively its own device, and we need to control it from userspace, so we need some way to describe it (i.e. not ignore it) in order for its capability to be exposed.
I'm fully aware what I'm proposing isn't awesome as it's not providing any real abstraction, but the problem(s) at hand also seem to defy abstraction, and in order to avoid a plethora of bespoke bindings I thought it was reasonable to define something generic.
All-in-all I appreciate the suggestion, but assuming you agree with my reasoning above do you have thoughts on other alternatives?
Cheers,
Andrew

On Wed, Jul 11, 2018 at 6:54 PM Andrew Jeffery <andrew@aj.id.au> wrote:
>
> Hi Rob,
>
> Thanks for the response.
>
> On Thu, 12 Jul 2018, at 05:34, Rob Herring wrote:
> > On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote:
> > > Baseboard Management Controllers (BMCs) are embedded SoCs that exist to
> > > provide remote management of (primarily) server platforms. BMCs are
> > > often tightly coupled to the platform in terms of behaviour and provide
> > > many hardware features integral to booting and running the host system.
> > >
> > > Some of these hardware features are simple, for example scratch
> > > registers provided by the BMC that are exposed to both the host and the
> > > BMC. In other cases there's a single bit switch to enable or disable
> > > some of the provided functionality.
> > >
> > > The documentation defines bindings for fields in registers that do not
> > > integrate well into other driver models yet must be described to allow
> > > the BMC kernel to assume control of these features.
> >
> > So we'll get a new binding when that happens? That will break
> > compatibility.
>
> Can you please expand on this? I'm not following.
If we have a subsystem in the future, then there would likely be an
associated binding which would be different. So if you update the DT,
then old kernels won't work with it.
> > > Signed-off-by: Andrew Jeffery <andrew@aj.id.au>
> > > ---
> > >
> > > Since RFC v1:
> > >
> > > * Add a commit message
> > > * Minor changes to documented labels
> > >
> > > .../bindings/misc/bmc-misc-ctrl.txt | 252 ++++++++++++++++++
> > > MAINTAINERS | 6 +
> > > 2 files changed, 258 insertions(+)
> > > create mode 100644 Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt
> > >
> > > diff --git a/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt b/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt
> > > new file mode 100644
> > > index 000000000000..2c869fcc7ef2
> > > --- /dev/null
> > > +++ b/Documentation/devicetree/bindings/misc/bmc-misc-ctrl.txt
> > > @@ -0,0 +1,252 @@
> > > +BMC Miscellaneous Control Interfaces
> > > +====================================
> > > +
> > > +Baseboard Management Controllers (BMCs) often have an array of hardware
> > > +features that need to be described but are awkward to sensibly expose.
> > > +
> > > +This bindings document provides a generic mechanism for describing such
> > > +features, covering read-only (RO), read-modify-write (RMW) and
> > > +write-1-set/write-1-clear (W1SC) semantics.
> >
> > If we wanted a generic mechanism for single register bits/fields in DT,
> > we'd have one already.
>
> I feel like this is an argument of tradition. Maybe people have been dissuaded from doing so when they don't have a reasonable use-case? I'm not saying that what I'm proposing is unquestionably reasonable, but I don't want to dismiss it out of hand.
One of experience. The one that stands out is clock bindings.
Initially we were doing a node per clock modelling which could end up
being 100s of nodes and is difficult to get right (with DT being an
ABI).
It comes up with system controller type blocks too that just have a
bunch of random registers. Those change in every SoC and not in any
controlled or ordered way that would make describing the individual
sub-functions in DT worthwhile.
>
> > A node per register bit doesn't scale.
>
> It isn't meant to scale in terms of a single system. Using it extensively is very likely wrong. Separately, register-bit-led does pretty much the same thing. Doesn't the scale argument apply there? Who is to stop me from attaching an insane number of LEDs to a system?
Review.
If you look, register-bit-led is rarely used outside of some ARM, Ltd.
boards. It's simply quite rare to have MMIO register bits that have a
fixed function of LED control.
> Obviously if there are lots of systems using it sparingly and legitimately then maybe there's a scale issue, but isn't that just a reality of different hardware designs? Whoever is implementing support for the system is going to have to describe the hardware one way or another.
>
> >
> > Maybe this should be modelled using GPIO binding? There's a line there
> > too as whether the signals are "general purpose" or not.
>
> I don't think so, mainly because some of the things it is intended to be used for are not GPIOs. For instance, take the DAC mux I've described in the patch. It doesn't directly influence anything external to the SoC (i.e. it's certainly not a traditional GPIO in any sense). However, it does *indirectly* influence the SoC's behaviour by muxing the DAC internally between:
>
> 0. VGA device exposed on the host PCIe bus
> 1. The "Graphics CRT" controller
> 2. VGA port A
> 3. VGA port B
And this mux control is fixed in the SoC design?
>
> Maybe this could be modelled by pinmux, but then we still need some way to expose the mux functions to userspace for selection (userspace needs to transition arbitrarily between at least options 0 and 1 at runtime), at which point we haven't achieved much beyond adding a whole heap of infrastructure in the chain.
>
> Given 0 and 1, maybe exposing attributes in relevant drivers would be reasonable, except 0 isn't exposed on the SoC's internal bus so there is no driver on the BMC-side to do so. Taking into account 2 and 3 are also purely hardware paths further dashes the idea, as the configuration doesn't really "belong" to the Graphics CRT device more than it belongs anywhere else, except for the fact that there isn't anywhere else to expose it.
>
> Further, the BMC's kernel can't make the decision as to when to switch the mux as it knows nothing of the host's state. The BMC userspace is controlling the host's boot state and so *does* know when to flip the switch. Finally, the mux is in separate IP to the CRT or VGA blocks: It lives in the System Control Unit.
>
> My current point of view is the DAC mux field is effectively its own device, and we need to control it from userspace, so we need some way to describe it (i.e. not ignore it) in order for its capability to be exposed.
>
> I'm fully aware what I'm proposing isn't awesome as it's not providing any real abstraction, but the problem(s) at hand also seem to defy abstraction, and in order to avoid a plethora of bespoke bindings I thought it was reasonable to define something generic.
>
> All-in-all I appreciate the suggestion, but assuming you agree with my reasoning above do you have thoughts on other alternatives?
Seems the controls are more fixed than I first thought. All the data
you have here could simply be within a driver. Help me understand what
functions are fixed (in the SoC) and which ones vary by board. Only
what's changing per board really needs to go into DT.
Rob

On Thu, 2018-07-12 at 09:11 -0600, Rob Herring wrote:
> On Wed, Jul 11, 2018 at 6:54 PM Andrew Jeffery <andrew@aj.id.au> wrote:
> >
> > Hi Rob,
> >
> > Thanks for the response.
> >
> > On Thu, 12 Jul 2018, at 05:34, Rob Herring wrote:
> > > On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote:
> > > > Baseboard Management Controllers (BMCs) are embedded SoCs that exist to
> > > > provide remote management of (primarily) server platforms. BMCs are
> > > > often tightly coupled to the platform in terms of behaviour and provide
> > > > many hardware features integral to booting and running the host system.
> > > >
> > > > Some of these hardware features are simple, for example scratch
> > > > registers provided by the BMC that are exposed to both the host and the
> > > > BMC. In other cases there's a single bit switch to enable or disable
> > > > some of the provided functionality.
> > > >
> > > > The documentation defines bindings for fields in registers that do not
> > > > integrate well into other driver models yet must be described to allow
> > > > the BMC kernel to assume control of these features.
> > >
> > > So we'll get a new binding when that happens? That will break
> > > compatibility.
> >
> > Can you please expand on this? I'm not following.
>
> If we have a subsystem in the future, then there would likely be an
> associated binding which would be different. So if you update the DT,
> then old kernels won't work with it.
What kind of "subsystem" ? There is almost no way there could be one
for that sort of BMC tunables. We've look at several BMC chips out
there and requirements from several vendors, BIOS and system
manufacturers and it's all over the place.
> > I feel like this is an argument of tradition. Maybe people have
> > been dissuaded from doing so when they don't have a reasonable use-
> > case? I'm not saying that what I'm proposing is unquestionably
> > reasonable, but I don't want to dismiss it out of hand.
>
> One of experience. The one that stands out is clock bindings.
> Initially we were doing a node per clock modelling which could end up
> being 100s of nodes and is difficult to get right (with DT being an
> ABI).
>
> It comes up with system controller type blocks too that just have a
> bunch of random registers. Those change in every SoC and not in any
> controlled or ordered way that would make describing the individual
> sub-functions in DT worthwhile.
So what's the alternative ? Because without something like what we
propose, what's going to happen is /dev/mem ... that's what people do
today.
> > > A node per register bit doesn't scale.
> >
> > It isn't meant to scale in terms of a single system. Using it
> > extensively is very likely wrong. Separately, register-bit-led does
> > pretty much the same thing. Doesn't the scale argument apply there?
> > Who is to stop me from attaching an insane number of LEDs to a
> > system?
>
> Review.
>
> If you look, register-bit-led is rarely used outside of some ARM, Ltd.
> boards. It's simply quite rare to have MMIO register bits that have a
> fixed function of LED control.
Well, same here, we hope to review what goes upstream to make it
reasonable. Otherwise it doens't matter. If a random vendor, let's say
IBM, chose to chip a system where they put an insane amount of cruft in
there, it will only affect those systems's BMC and the userspace stack
on it.
Thankfully that stack is OpenBMC and IBM is aiming at having their
device-tree's upstream, thus reviewed, thus it won't happen.
*Anything* can be abused. The point here is that we have a number,
thankfully rather small, maybe a dozen or two, of tunables that are
quite specific to a combination (system vendor, bmc vendor, system
model) which control a few HW features that essentially do *NOT* fit in
a subsystem.
For everything that does, we have created proper drivers (and are doing
more).
> > Obviously if there are lots of systems using it sparingly and
> > legitimately then maybe there's a scale issue, but isn't that just
> > a reality of different hardware designs? Whoever is implementing
> > support for the system is going to have to describe the hardware
> > one way or another.
> >
> > >
> > > Maybe this should be modelled using GPIO binding? There's a line there
> > > too as whether the signals are "general purpose" or not.
> >
> > I don't think so, mainly because some of the things it is intended to be used for are not GPIOs. For instance, take the DAC mux I've described in the patch. It doesn't directly influence anything external to the SoC (i.e. it's certainly not a traditional GPIO in any sense). However, it does *indirectly* influence the SoC's behaviour by muxing the DAC internally between:
> >
> > 0. VGA device exposed on the host PCIe bus
> > 1. The "Graphics CRT" controller
> > 2. VGA port A
> > 3. VGA port B
>
> And this mux control is fixed in the SoC design?
This specific family of SoC (Aspeed) support those 4 configurations.
How they need to be configured at runtime depends on the combination of
system vendor and system model, along with in some cases the need to
switch it at runtime.
This is just one example. Another one is the handful of scratch
registers that need to be populated with the "right" values for the
host system BIOS, VGA BIOS and VGA driver. (The host bits access them
via LPC IO space).
The host system BIOS will read some basic config info there before its
IPMI stack is up (and some BIOSes already rely on that). The VGA BIOS
will get some strapping info and panel info. The VGA driver (which is
already upstream, has been for a long time) will look for other things
in some of these guys, such as connector configuration.
Andrew, if it helps, we could put together a list of what we typically
need on an OpenPower system today. That would give people like Rob a
better idea of what this is all about.
> >
> > Maybe this could be modelled by pinmux, but then we still need some
> > way to expose the mux functions to userspace for selection
> > (userspace needs to transition arbitrarily between at least options
> > 0 and 1 at runtime), at which point we haven't achieved much beyond
> > adding a whole heap of infrastructure in the chain.
> >
> > Given 0 and 1, maybe exposing attributes in relevant drivers would
> > be reasonable, except 0 isn't exposed on the SoC's internal bus so
> > there is no driver on the BMC-side to do so. Taking into account 2
> > and 3 are also purely hardware paths further dashes the idea, as
> > the configuration doesn't really "belong" to the Graphics CRT
> > device more than it belongs anywhere else, except for the fact that
> > there isn't anywhere else to expose it.
> >
> > Further, the BMC's kernel can't make the decision as to when to
> > switch the mux as it knows nothing of the host's state. The BMC
> > userspace is controlling the host's boot state and so *does* know
> > when to flip the switch. Finally, the mux is in separate IP to the
> > CRT or VGA blocks: It lives in the System Control Unit.
> >
> > My current point of view is the DAC mux field is effectively its
> > own device, and we need to control it from userspace, so we need
> > some way to describe it (i.e. not ignore it) in order for its
> > capability to be exposed.
> >
> > I'm fully aware what I'm proposing isn't awesome as it's not
> > providing any real abstraction, but the problem(s) at hand also
> > seem to defy abstraction, and in order to avoid a plethora of
> > bespoke bindings I thought it was reasonable to define something
> > generic.
> >
> > All-in-all I appreciate the suggestion, but assuming you agree with
> > my reasoning above do you have thoughts on other alternatives?
>
> Seems the controls are more fixed than I first thought. All the data
> you have here could simply be within a driver. Help me understand what
> functions are fixed (in the SoC) and which ones vary by board. Only
> what's changing per board really needs to go into DT.
Most of these things is specific to a given board or may even need to
be changed at runtime.
For example the VGA mux is system specific, *and* will change at
runtime on some systems. For example, on IBM systems, the BMC will
route its internal CRT display controller to the VGA port in order to
display early boot progress information when the host hasn't
initialized its graphics driver yet, and route the host VGA to the VGA
port when the host has.
(To clarify, the BMC has 2 display controllers: one for use by the SoC
ARM itself, and one exposed to the host via PCIe, this routes which one
gets to output to the VGA port).
The scratch registers are similar. Their content tend to be specific to
a specific BIOS vendor/system manufacturer. It might need to change
from boot to boot based configuration changes the user might do via the
BMC IPMI or web interface.
Another example is that the SoC can expose a couple of PCIe devices to
the host, a given vendor will want to control whether to do that and
which ones to expose. This can be fixed or tunable. Some vendors want
the user to be able to control (via IPMI or web UI or some other
mechanism) whether the SoC internal VGA shows up or not depending on
whether they chose to use a separate discrete GPU, as some OSes get
confused when in the presence of multiple different graphic adapters.
Talking of which: Andrew, did you put "default values" in your binding
? That would be a nice way to deal with system specific immutables, so
that userspace doesn't even have to care.
So to clarify once and for all, *anything* that fits in a subsystem,
we're putting in one. All the random board control is all GPIOs and
that's fine as well. For some things that require a bit of fiddly usage
like the "MBOX" logic between BIOS and BMC we are also doing a
dedicated driver.
But there's a few stragglers here, and they tend to be so
board/system/BIOS specific that it's not sustainable to create/change
random drivers all the time just for exposing those few tunables.
Cheers,
Ben.

Hi Rob, Ben,
I've replied to you both inline below, hopefully it's clear enough from the context.
On Fri, 13 Jul 2018, at 10:25, Benjamin Herrenschmidt wrote:
> On Thu, 2018-07-12 at 09:11 -0600, Rob Herring wrote:
> > On Wed, Jul 11, 2018 at 6:54 PM Andrew Jeffery <andrew@aj.id.au> wrote:
> > >
> > > Hi Rob,
> > >
> > > Thanks for the response.
> > >
> > > On Thu, 12 Jul 2018, at 05:34, Rob Herring wrote:
> > > > On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote:
> > > > > Baseboard Management Controllers (BMCs) are embedded SoCs that exist to
> > > > > provide remote management of (primarily) server platforms. BMCs are
> > > > > often tightly coupled to the platform in terms of behaviour and provide
> > > > > many hardware features integral to booting and running the host system.
> > > > >
> > > > > Some of these hardware features are simple, for example scratch
> > > > > registers provided by the BMC that are exposed to both the host and the
> > > > > BMC. In other cases there's a single bit switch to enable or disable
> > > > > some of the provided functionality.
> > > > >
> > > > > The documentation defines bindings for fields in registers that do not
> > > > > integrate well into other driver models yet must be described to allow
> > > > > the BMC kernel to assume control of these features.
> > > >
> > > > So we'll get a new binding when that happens? That will break
> > > > compatibility.
> > >
> > > Can you please expand on this? I'm not following.
> >
> > If we have a subsystem in the future, then there would likely be an
> > associated binding which would be different. So if you update the DT,
> > then old kernels won't work with it.
>
> What kind of "subsystem" ? There is almost no way there could be one
> for that sort of BMC tunables. We've look at several BMC chips out
> there and requirements from several vendors, BIOS and system
> manufacturers and it's all over the place.
Right - This is the fundamental principle backing these patches: There will never be a coherent subsystem catering to any of what we want to describe with these bindings.
>
> > > I feel like this is an argument of tradition. Maybe people have
> > > been dissuaded from doing so when they don't have a reasonable use-
> > > case? I'm not saying that what I'm proposing is unquestionably
> > > reasonable, but I don't want to dismiss it out of hand.
> >
...
> >
> > It comes up with system controller type blocks too that just have a
> > bunch of random registers.
This matches the situation at hand.
> > Those change in every SoC and not in any
> > controlled or ordered way that would make describing the individual
> > sub-functions in DT worthwhile.
"Not worthwhile" is what I'm pushing back against for our use cases. I think they are narrow and limited enough to make it worthwhile.
Obviously we want to avoid describing these things *badly* - you mentioned the clock bindings - so I'm happy to hash out what the right representation should be. But I struggle to think the solution is not describing some of our hardware features at all.
>
> So what's the alternative ? Because without something like what we
> propose, what's going to happen is /dev/mem ... that's what people do
> today.
Yep. And I've outlined in the cover letter what I think are the advantages of what I'm proposing over /dev/mem. It's not an incredible gain, but has several of nice-to-have properties.
>
> > > > A node per register bit doesn't scale.
> > >
> > > It isn't meant to scale in terms of a single system. Using it
> > > extensively is very likely wrong. Separately, register-bit-led does
> > > pretty much the same thing. Doesn't the scale argument apply there?
> > > Who is to stop me from attaching an insane number of LEDs to a
> > > system?
> >
> > Review.
> >
> > If you look, register-bit-led is rarely used outside of some ARM, Ltd.
> > boards. It's simply quite rare to have MMIO register bits that have a
> > fixed function of LED control.
>
> Well, same here, we hope to review what goes upstream to make it
> reasonable. Otherwise it doens't matter. If a random vendor, let's say
> IBM, chose to chip a system where they put an insane amount of cruft in
> there, it will only affect those systems's BMC and the userspace stack
> on it.
>
> Thankfully that stack is OpenBMC and IBM is aiming at having their
> device-tree's upstream, thus reviewed, thus it won't happen.
>
> *Anything* can be abused. The point here is that we have a number,
> thankfully rather small, maybe a dozen or two, of tunables that are
> quite specific to a combination (system vendor, bmc vendor, system
> model) which control a few HW features that essentially do *NOT* fit in
> a subsystem.
Exactly. I tried to head off the abuse vector by requiring that uses be listed in the bindings document, and thus enforce some level of review. It might not be the most effective approach at the end of the day, but at least it is something.
>
> For everything that does, we have created proper drivers (and are doing
> more).
>
>
> > > Obviously if there are lots of systems using it sparingly and
> > > legitimately then maybe there's a scale issue, but isn't that just
> > > a reality of different hardware designs? Whoever is implementing
> > > support for the system is going to have to describe the hardware
> > > one way or another.
> > >
> > > >
> > > > Maybe this should be modelled using GPIO binding? There's a line there
> > > > too as whether the signals are "general purpose" or not.
> > >
> > > I don't think so, mainly because some of the things it is intended to be used for are not GPIOs. For instance, take the DAC mux I've described in the patch. It doesn't directly influence anything external to the SoC (i.e. it's certainly not a traditional GPIO in any sense). However, it does *indirectly* influence the SoC's behaviour by muxing the DAC internally between:
> > >
> > > 0. VGA device exposed on the host PCIe bus
> > > 1. The "Graphics CRT" controller
> > > 2. VGA port A
> > > 3. VGA port B
> >
> > And this mux control is fixed in the SoC design?
>
> This specific family of SoC (Aspeed) support those 4 configurations.
> How they need to be configured at runtime depends on the combination of
> system vendor and system model, along with in some cases the need to
> switch it at runtime.
>
> This is just one example. Another one is the handful of scratch
> registers that need to be populated with the "right" values for the
> host system BIOS, VGA BIOS and VGA driver. (The host bits access them
> via LPC IO space).
>
> The host system BIOS will read some basic config info there before its
> IPMI stack is up (and some BIOSes already rely on that). The VGA BIOS
> will get some strapping info and panel info. The VGA driver (which is
> already upstream, has been for a long time) will look for other things
> in some of these guys, such as connector configuration.
>
> Andrew, if it helps, we could put together a list of what we typically
> need on an OpenPower system today. That would give people like Rob a
> better idea of what this is all about.
It's primarily what I've outlined at the bottom of the bindings document, though the use cases aren't provided there as they are a bit out-of-scope. So the SuperIO and VGA scratch registers, plus the DAC mux. A bunch of tunable things.
OpenPOWER platforms make use of the SuperIO scratch registers to convey configuration information from the BMC to the host. Information provided includes low-level control of the host firmware initialisation process, UART and logging configuration, and the strategy for handling errors (crash vs log). This is all an "arbitrary" contract between the BMC userspace and the host firmware, i.e. different platforms/firmware could lay out the same information in different ways or communicate entirely different information altogether. The BMC kernel shouldn't care about any of it, other than provide sensible access to the hardware.
Again on OpenPOWER systems using the ASPEED BMC SoCs running OpenBMC, the BMC uses the VGA scratch registers to sense initialisation of the host graphics driver in the host's boot process. When the BMC userspace detects the host VGA driver is up we switch the DAC mux from the BMC CRT device to the host VGA device so that the host is now driving the VGA output. Non-OpenPOWER OpenBMC configurations may do something entirely different, or not do anything at all with the hardware, so as above, it's not really the job of the BMC kernel to be involved in any of this, other than to provide sensible access to userspace.
There are a number of other switches that control the availability of ASPEED BMC hardware features to the host system that also don't fit any particular subsystem and so will use these bindings, but our (OpenPOWER/OpenBMC) current uses are what's described above.
Dell also suggested they had some use-cases that aligned with the intent of the bindings, but I don't know what they had in mind. Eugene (on Cc) can elaborate.
>
> > >
> > > Maybe this could be modelled by pinmux, but then we still need some
> > > way to expose the mux functions to userspace for selection
> > > (userspace needs to transition arbitrarily between at least options
> > > 0 and 1 at runtime), at which point we haven't achieved much beyond
> > > adding a whole heap of infrastructure in the chain.
> > >
> > > Given 0 and 1, maybe exposing attributes in relevant drivers would
> > > be reasonable, except 0 isn't exposed on the SoC's internal bus so
> > > there is no driver on the BMC-side to do so. Taking into account 2
> > > and 3 are also purely hardware paths further dashes the idea, as
> > > the configuration doesn't really "belong" to the Graphics CRT
> > > device more than it belongs anywhere else, except for the fact that
> > > there isn't anywhere else to expose it.
> > >
> > > Further, the BMC's kernel can't make the decision as to when to
> > > switch the mux as it knows nothing of the host's state. The BMC
> > > userspace is controlling the host's boot state and so *does* know
> > > when to flip the switch. Finally, the mux is in separate IP to the
> > > CRT or VGA blocks: It lives in the System Control Unit.
> > >
> > > My current point of view is the DAC mux field is effectively its
> > > own device, and we need to control it from userspace, so we need
> > > some way to describe it (i.e. not ignore it) in order for its
> > > capability to be exposed.
> > >
> > > I'm fully aware what I'm proposing isn't awesome as it's not
> > > providing any real abstraction, but the problem(s) at hand also
> > > seem to defy abstraction, and in order to avoid a plethora of
> > > bespoke bindings I thought it was reasonable to define something
> > > generic.
> > >
> > > All-in-all I appreciate the suggestion, but assuming you agree with
> > > my reasoning above do you have thoughts on other alternatives?
> >
> > Seems the controls are more fixed than I first thought. All the data
> > you have here could simply be within a driver.
Rob: A driver for what though? One unique to this particular mux? That feels overly specific when we can generalise the concept to cover a wider range of use-cases.
> > Help me understand what
> > functions are fixed (in the SoC) and which ones vary by board. Only
> > what's changing per board really needs to go into DT.
I think this last sentence identifies a difference in our starting points, so I'd like to explore that. Blocks of functionality might move around inside the SoC as well, so don't we need a way to describe those functions appropriately? And from there describe how the SoC integrates the functions, and then describe how a board integrates the SoC? This all composes, and the problem at the end of the day comes down to what we want to view as a point of abstraction, right?
It seems ideal to me that the metadata about hardware features resides in the description of the relevant system (DT, for a function, a SoC or a board), otherwise don't we wind up with crazy, unfocused, monolithic drivers for things like system controllers? (There's MFD/syscon, but having used it previously I'm still grappling with the benefit over some of the weirdness it injects into devicetree - maybe I did it wrong.) Or alternatively, a generic driver that's choc full of platform-specific data covering the platforms that require it? The driver that implements the behaviour of the bindings described here turns out quite focused (even if the first attempt was a bit of a basket case, hopefully the second is better (sorry Greg)).
>
> Most of these things is specific to a given board or may even need to
> be changed at runtime.
*snip*...
>
> Talking of which: Andrew, did you put "default values" in your binding
> ? That would be a nice way to deal with system specific immutables, so
> that userspace doesn't even have to care.
Yes, I described a `default-value`property for RW fields, and `default-set` and `default-clear`properties for write-1-set/write-1-clear fields for exactly this purpose.
>
> So to clarify once and for all, *anything* that fits in a subsystem,
> we're putting in one. All the random board control is all GPIOs and
> that's fine as well. For some things that require a bit of fiddly usage
> like the "MBOX" logic between BIOS and BMC we are also doing a
> dedicated driver.
(As an aside, the "MBOX" functionality is slightly different from the scratch registers in that it has configurable interrupts each way (BMC-to-Host and Host-to-BMC) - as such it can be used to implement a dynamic protocol and so deserves its own driver. This is in contrast to the dumb scratch registers we're describing with these bindings which have no such interrupts.)
>
> But there's a few stragglers here, and they tend to be so
> board/system/BIOS specific that it's not sustainable to create/change
> random drivers all the time just for exposing those few tunables.
>
Yes, this is my feeling too.
Cheers,
Andrew

[-- Attachment #1.1: Type: text/plain, Size: 15156 bytes --]
Andrew, Ben, first of all let me thank you for bringing in this set of
patches.
From the discussion it looks to me like Rob is not familiar with
specifics of BMC-managed servers and tries to apply to them the rules
that have proven to be good for workstations and laptops.
As someone using /dev/mem these days to configure those registers in
BMCs, I'm all for this patch set as it will make BMC configuration less
obscure. Writing 1 or 0 to a named node is way clearer than writing some
magic value at some magic offset in /dev/mem. I like the idea of having
it all configurable via DT as it allows for only having exported the
nodes that are actually needed, thus reducing, as you have said, the
foot-gun.
So far I do not have any objections or constructive comments to the
architecture of the proposed patches.
So I'm writing this to support your position in this discussion and to
let Rob and other reviewers know that this feature is actually needed.
With best regards,
Alexander Amelkin
13.07.2018 09:31, Andrew Jeffery wrote:
> Hi Rob, Ben,
>
> I've replied to you both inline below, hopefully it's clear enough from the context.
>
> On Fri, 13 Jul 2018, at 10:25, Benjamin Herrenschmidt wrote:
>> On Thu, 2018-07-12 at 09:11 -0600, Rob Herring wrote:
>>> On Wed, Jul 11, 2018 at 6:54 PM Andrew Jeffery <andrew@aj.id.au> wrote:
>>>> Hi Rob,
>>>>
>>>> Thanks for the response.
>>>>
>>>> On Thu, 12 Jul 2018, at 05:34, Rob Herring wrote:
>>>>> On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote:
>>>>>> Baseboard Management Controllers (BMCs) are embedded SoCs that exist to
>>>>>> provide remote management of (primarily) server platforms. BMCs are
>>>>>> often tightly coupled to the platform in terms of behaviour and provide
>>>>>> many hardware features integral to booting and running the host system.
>>>>>>
>>>>>> Some of these hardware features are simple, for example scratch
>>>>>> registers provided by the BMC that are exposed to both the host and the
>>>>>> BMC. In other cases there's a single bit switch to enable or disable
>>>>>> some of the provided functionality.
>>>>>>
>>>>>> The documentation defines bindings for fields in registers that do not
>>>>>> integrate well into other driver models yet must be described to allow
>>>>>> the BMC kernel to assume control of these features.
>>>>> So we'll get a new binding when that happens? That will break
>>>>> compatibility.
>>>> Can you please expand on this? I'm not following.
>>> If we have a subsystem in the future, then there would likely be an
>>> associated binding which would be different. So if you update the DT,
>>> then old kernels won't work with it.
>> What kind of "subsystem" ? There is almost no way there could be one
>> for that sort of BMC tunables. We've look at several BMC chips out
>> there and requirements from several vendors, BIOS and system
>> manufacturers and it's all over the place.
> Right - This is the fundamental principle backing these patches: There will never be a coherent subsystem catering to any of what we want to describe with these bindings.
>
>>>> I feel like this is an argument of tradition. Maybe people have
>>>> been dissuaded from doing so when they don't have a reasonable use-
>>>> case? I'm not saying that what I'm proposing is unquestionably
>>>> reasonable, but I don't want to dismiss it out of hand.
> ...
>>> It comes up with system controller type blocks too that just have a
>>> bunch of random registers.
> This matches the situation at hand.
>
>>> Those change in every SoC and not in any
>>> controlled or ordered way that would make describing the individual
>>> sub-functions in DT worthwhile.
> "Not worthwhile" is what I'm pushing back against for our use cases. I think they are narrow and limited enough to make it worthwhile.
>
> Obviously we want to avoid describing these things *badly* - you mentioned the clock bindings - so I'm happy to hash out what the right representation should be. But I struggle to think the solution is not describing some of our hardware features at all.
>
>> So what's the alternative ? Because without something like what we
>> propose, what's going to happen is /dev/mem ... that's what people do
>> today.
> Yep. And I've outlined in the cover letter what I think are the advantages of what I'm proposing over /dev/mem. It's not an incredible gain, but has several of nice-to-have properties.
>
>>>>> A node per register bit doesn't scale.
>>>> It isn't meant to scale in terms of a single system. Using it
>>>> extensively is very likely wrong. Separately, register-bit-led does
>>>> pretty much the same thing. Doesn't the scale argument apply there?
>>>> Who is to stop me from attaching an insane number of LEDs to a
>>>> system?
>>> Review.
>>>
>>> If you look, register-bit-led is rarely used outside of some ARM, Ltd.
>>> boards. It's simply quite rare to have MMIO register bits that have a
>>> fixed function of LED control.
>> Well, same here, we hope to review what goes upstream to make it
>> reasonable. Otherwise it doens't matter. If a random vendor, let's say
>> IBM, chose to chip a system where they put an insane amount of cruft in
>> there, it will only affect those systems's BMC and the userspace stack
>> on it.
>>
>> Thankfully that stack is OpenBMC and IBM is aiming at having their
>> device-tree's upstream, thus reviewed, thus it won't happen.
>>
>> *Anything* can be abused. The point here is that we have a number,
>> thankfully rather small, maybe a dozen or two, of tunables that are
>> quite specific to a combination (system vendor, bmc vendor, system
>> model) which control a few HW features that essentially do *NOT* fit in
>> a subsystem.
> Exactly. I tried to head off the abuse vector by requiring that uses be listed in the bindings document, and thus enforce some level of review. It might not be the most effective approach at the end of the day, but at least it is something.
>
>> For everything that does, we have created proper drivers (and are doing
>> more).
>>
>>
>>>> Obviously if there are lots of systems using it sparingly and
>>>> legitimately then maybe there's a scale issue, but isn't that just
>>>> a reality of different hardware designs? Whoever is implementing
>>>> support for the system is going to have to describe the hardware
>>>> one way or another.
>>>>
>>>>> Maybe this should be modelled using GPIO binding? There's a line there
>>>>> too as whether the signals are "general purpose" or not.
>>>> I don't think so, mainly because some of the things it is intended to be used for are not GPIOs. For instance, take the DAC mux I've described in the patch. It doesn't directly influence anything external to the SoC (i.e. it's certainly not a traditional GPIO in any sense). However, it does *indirectly* influence the SoC's behaviour by muxing the DAC internally between:
>>>>
>>>> 0. VGA device exposed on the host PCIe bus
>>>> 1. The "Graphics CRT" controller
>>>> 2. VGA port A
>>>> 3. VGA port B
>>> And this mux control is fixed in the SoC design?
>> This specific family of SoC (Aspeed) support those 4 configurations.
>> How they need to be configured at runtime depends on the combination of
>> system vendor and system model, along with in some cases the need to
>> switch it at runtime.
>>
>> This is just one example. Another one is the handful of scratch
>> registers that need to be populated with the "right" values for the
>> host system BIOS, VGA BIOS and VGA driver. (The host bits access them
>> via LPC IO space).
>>
>> The host system BIOS will read some basic config info there before its
>> IPMI stack is up (and some BIOSes already rely on that). The VGA BIOS
>> will get some strapping info and panel info. The VGA driver (which is
>> already upstream, has been for a long time) will look for other things
>> in some of these guys, such as connector configuration.
>>
>> Andrew, if it helps, we could put together a list of what we typically
>> need on an OpenPower system today. That would give people like Rob a
>> better idea of what this is all about.
> It's primarily what I've outlined at the bottom of the bindings document, though the use cases aren't provided there as they are a bit out-of-scope. So the SuperIO and VGA scratch registers, plus the DAC mux. A bunch of tunable things.
>
> OpenPOWER platforms make use of the SuperIO scratch registers to convey configuration information from the BMC to the host. Information provided includes low-level control of the host firmware initialisation process, UART and logging configuration, and the strategy for handling errors (crash vs log). This is all an "arbitrary" contract between the BMC userspace and the host firmware, i.e. different platforms/firmware could lay out the same information in different ways or communicate entirely different information altogether. The BMC kernel shouldn't care about any of it, other than provide sensible access to the hardware.
>
> Again on OpenPOWER systems using the ASPEED BMC SoCs running OpenBMC, the BMC uses the VGA scratch registers to sense initialisation of the host graphics driver in the host's boot process. When the BMC userspace detects the host VGA driver is up we switch the DAC mux from the BMC CRT device to the host VGA device so that the host is now driving the VGA output. Non-OpenPOWER OpenBMC configurations may do something entirely different, or not do anything at all with the hardware, so as above, it's not really the job of the BMC kernel to be involved in any of this, other than to provide sensible access to userspace.
>
> There are a number of other switches that control the availability of ASPEED BMC hardware features to the host system that also don't fit any particular subsystem and so will use these bindings, but our (OpenPOWER/OpenBMC) current uses are what's described above.
>
> Dell also suggested they had some use-cases that aligned with the intent of the bindings, but I don't know what they had in mind. Eugene (on Cc) can elaborate.
>
>>>> Maybe this could be modelled by pinmux, but then we still need some
>>>> way to expose the mux functions to userspace for selection
>>>> (userspace needs to transition arbitrarily between at least options
>>>> 0 and 1 at runtime), at which point we haven't achieved much beyond
>>>> adding a whole heap of infrastructure in the chain.
>>>>
>>>> Given 0 and 1, maybe exposing attributes in relevant drivers would
>>>> be reasonable, except 0 isn't exposed on the SoC's internal bus so
>>>> there is no driver on the BMC-side to do so. Taking into account 2
>>>> and 3 are also purely hardware paths further dashes the idea, as
>>>> the configuration doesn't really "belong" to the Graphics CRT
>>>> device more than it belongs anywhere else, except for the fact that
>>>> there isn't anywhere else to expose it.
>>>>
>>>> Further, the BMC's kernel can't make the decision as to when to
>>>> switch the mux as it knows nothing of the host's state. The BMC
>>>> userspace is controlling the host's boot state and so *does* know
>>>> when to flip the switch. Finally, the mux is in separate IP to the
>>>> CRT or VGA blocks: It lives in the System Control Unit.
>>>>
>>>> My current point of view is the DAC mux field is effectively its
>>>> own device, and we need to control it from userspace, so we need
>>>> some way to describe it (i.e. not ignore it) in order for its
>>>> capability to be exposed.
>>>>
>>>> I'm fully aware what I'm proposing isn't awesome as it's not
>>>> providing any real abstraction, but the problem(s) at hand also
>>>> seem to defy abstraction, and in order to avoid a plethora of
>>>> bespoke bindings I thought it was reasonable to define something
>>>> generic.
>>>>
>>>> All-in-all I appreciate the suggestion, but assuming you agree with
>>>> my reasoning above do you have thoughts on other alternatives?
>>> Seems the controls are more fixed than I first thought. All the data
>>> you have here could simply be within a driver.
> Rob: A driver for what though? One unique to this particular mux? That feels overly specific when we can generalise the concept to cover a wider range of use-cases.
>
>>> Help me understand what
>>> functions are fixed (in the SoC) and which ones vary by board. Only
>>> what's changing per board really needs to go into DT.
> I think this last sentence identifies a difference in our starting points, so I'd like to explore that. Blocks of functionality might move around inside the SoC as well, so don't we need a way to describe those functions appropriately? And from there describe how the SoC integrates the functions, and then describe how a board integrates the SoC? This all composes, and the problem at the end of the day comes down to what we want to view as a point of abstraction, right?
>
> It seems ideal to me that the metadata about hardware features resides in the description of the relevant system (DT, for a function, a SoC or a board), otherwise don't we wind up with crazy, unfocused, monolithic drivers for things like system controllers? (There's MFD/syscon, but having used it previously I'm still grappling with the benefit over some of the weirdness it injects into devicetree - maybe I did it wrong.) Or alternatively, a generic driver that's choc full of platform-specific data covering the platforms that require it? The driver that implements the behaviour of the bindings described here turns out quite focused (even if the first attempt was a bit of a basket case, hopefully the second is better (sorry Greg)).
>
>> Most of these things is specific to a given board or may even need to
>> be changed at runtime.
> *snip*...
>
>> Talking of which: Andrew, did you put "default values" in your binding
>> ? That would be a nice way to deal with system specific immutables, so
>> that userspace doesn't even have to care.
> Yes, I described a `default-value`property for RW fields, and `default-set` and `default-clear`properties for write-1-set/write-1-clear fields for exactly this purpose.
>
>> So to clarify once and for all, *anything* that fits in a subsystem,
>> we're putting in one. All the random board control is all GPIOs and
>> that's fine as well. For some things that require a bit of fiddly usage
>> like the "MBOX" logic between BIOS and BMC we are also doing a
>> dedicated driver.
> (As an aside, the "MBOX" functionality is slightly different from the scratch registers in that it has configurable interrupts each way (BMC-to-Host and Host-to-BMC) - as such it can be used to implement a dynamic protocol and so deserves its own driver. This is in contrast to the dumb scratch registers we're describing with these bindings which have no such interrupts.)
>
>> But there's a few stragglers here, and they tend to be so
>> board/system/BIOS specific that it's not sustainable to create/change
>> random drivers all the time just for exposing those few tunables.
>>
> Yes, this is my feeling too.
>
> Cheers,
>
> Andrew
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

Dell - Internal Use - Confidential
+1 from someone using Nuvoton's BMC SoC
-----Original Message-----
From: Alexander Amelkin [mailto:a.amelkin@yadro.com]
Sent: Friday, July 13, 2018 10:14 AM
To: Andrew Jeffery; Benjamin Herrenschmidt; Rob Herring
Cc: Mark Rutland; devicetree@vger.kernel.org; Greg Kroah-Hartman; Cho, Eugene; linux-kernel@vger.kernel.org; Joel Stanley; stewart@linux.ibm.com; OpenBMC Maillist; moderated list:ARM/FREESCALE IMX / MXC ARM ARCHITECTURE
Subject: Re: [RFC PATCH v2 1/4] dt-bindings: misc: Add bindings for misc. BMC control fields
Andrew, Ben, first of all let me thank you for bringing in this set of patches.
From the discussion it looks to me like Rob is not familiar with specifics of BMC-managed servers and tries to apply to them the rules that have proven to be good for workstations and laptops.
As someone using /dev/mem these days to configure those registers in BMCs, I'm all for this patch set as it will make BMC configuration less obscure. Writing 1 or 0 to a named node is way clearer than writing some magic value at some magic offset in /dev/mem. I like the idea of having it all configurable via DT as it allows for only having exported the nodes that are actually needed, thus reducing, as you have said, the foot-gun.
So far I do not have any objections or constructive comments to the architecture of the proposed patches.
So I'm writing this to support your position in this discussion and to let Rob and other reviewers know that this feature is actually needed.
With best regards,
Alexander Amelkin
13.07.2018 09:31, Andrew Jeffery wrote:
> Hi Rob, Ben,
>
> I've replied to you both inline below, hopefully it's clear enough from the context.
>
> On Fri, 13 Jul 2018, at 10:25, Benjamin Herrenschmidt wrote:
>> On Thu, 2018-07-12 at 09:11 -0600, Rob Herring wrote:
>>> On Wed, Jul 11, 2018 at 6:54 PM Andrew Jeffery <andrew@aj.id.au> wrote:
>>>> Hi Rob,
>>>>
>>>> Thanks for the response.
>>>>
>>>> On Thu, 12 Jul 2018, at 05:34, Rob Herring wrote:
>>>>> On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote:
>>>>>> Baseboard Management Controllers (BMCs) are embedded SoCs that
>>>>>> exist to provide remote management of (primarily) server
>>>>>> platforms. BMCs are often tightly coupled to the platform in
>>>>>> terms of behaviour and provide many hardware features integral to booting and running the host system.
>>>>>>
>>>>>> Some of these hardware features are simple, for example scratch
>>>>>> registers provided by the BMC that are exposed to both the host
>>>>>> and the BMC. In other cases there's a single bit switch to enable
>>>>>> or disable some of the provided functionality.
>>>>>>
>>>>>> The documentation defines bindings for fields in registers that
>>>>>> do not integrate well into other driver models yet must be
>>>>>> described to allow the BMC kernel to assume control of these features.
>>>>> So we'll get a new binding when that happens? That will break
>>>>> compatibility.
>>>> Can you please expand on this? I'm not following.
>>> If we have a subsystem in the future, then there would likely be an
>>> associated binding which would be different. So if you update the
>>> DT, then old kernels won't work with it.
>> What kind of "subsystem" ? There is almost no way there could be one
>> for that sort of BMC tunables. We've look at several BMC chips out
>> there and requirements from several vendors, BIOS and system
>> manufacturers and it's all over the place.
> Right - This is the fundamental principle backing these patches: There will never be a coherent subsystem catering to any of what we want to describe with these bindings.
>
>>>> I feel like this is an argument of tradition. Maybe people have
>>>> been dissuaded from doing so when they don't have a reasonable use-
>>>> case? I'm not saying that what I'm proposing is unquestionably
>>>> reasonable, but I don't want to dismiss it out of hand.
> ...
>>> It comes up with system controller type blocks too that just have a
>>> bunch of random registers.
> This matches the situation at hand.
>
>>> Those change in every SoC and not in any controlled or ordered way
>>> that would make describing the individual sub-functions in DT
>>> worthwhile.
> "Not worthwhile" is what I'm pushing back against for our use cases. I think they are narrow and limited enough to make it worthwhile.
>
> Obviously we want to avoid describing these things *badly* - you mentioned the clock bindings - so I'm happy to hash out what the right representation should be. But I struggle to think the solution is not describing some of our hardware features at all.
>
>> So what's the alternative ? Because without something like what we
>> propose, what's going to happen is /dev/mem ... that's what people do
>> today.
> Yep. And I've outlined in the cover letter what I think are the advantages of what I'm proposing over /dev/mem. It's not an incredible gain, but has several of nice-to-have properties.
>
>>>>> A node per register bit doesn't scale.
>>>> It isn't meant to scale in terms of a single system. Using it
>>>> extensively is very likely wrong. Separately, register-bit-led does
>>>> pretty much the same thing. Doesn't the scale argument apply there?
>>>> Who is to stop me from attaching an insane number of LEDs to a
>>>> system?
>>> Review.
>>>
>>> If you look, register-bit-led is rarely used outside of some ARM, Ltd.
>>> boards. It's simply quite rare to have MMIO register bits that have
>>> a fixed function of LED control.
>> Well, same here, we hope to review what goes upstream to make it
>> reasonable. Otherwise it doens't matter. If a random vendor, let's
>> say IBM, chose to chip a system where they put an insane amount of
>> cruft in there, it will only affect those systems's BMC and the
>> userspace stack on it.
>>
>> Thankfully that stack is OpenBMC and IBM is aiming at having their
>> device-tree's upstream, thus reviewed, thus it won't happen.
>>
>> *Anything* can be abused. The point here is that we have a number,
>> thankfully rather small, maybe a dozen or two, of tunables that are
>> quite specific to a combination (system vendor, bmc vendor, system
>> model) which control a few HW features that essentially do *NOT* fit
>> in a subsystem.
> Exactly. I tried to head off the abuse vector by requiring that uses be listed in the bindings document, and thus enforce some level of review. It might not be the most effective approach at the end of the day, but at least it is something.
>
>> For everything that does, we have created proper drivers (and are
>> doing more).
>>
>>
>>>> Obviously if there are lots of systems using it sparingly and
>>>> legitimately then maybe there's a scale issue, but isn't that just
>>>> a reality of different hardware designs? Whoever is implementing
>>>> support for the system is going to have to describe the hardware
>>>> one way or another.
>>>>
>>>>> Maybe this should be modelled using GPIO binding? There's a line
>>>>> there too as whether the signals are "general purpose" or not.
>>>> I don't think so, mainly because some of the things it is intended to be used for are not GPIOs. For instance, take the DAC mux I've described in the patch. It doesn't directly influence anything external to the SoC (i.e. it's certainly not a traditional GPIO in any sense). However, it does *indirectly* influence the SoC's behaviour by muxing the DAC internally between:
>>>>
>>>> 0. VGA device exposed on the host PCIe bus 1. The "Graphics CRT"
>>>> controller 2. VGA port A 3. VGA port B
>>> And this mux control is fixed in the SoC design?
>> This specific family of SoC (Aspeed) support those 4 configurations.
>> How they need to be configured at runtime depends on the combination
>> of system vendor and system model, along with in some cases the need
>> to switch it at runtime.
>>
>> This is just one example. Another one is the handful of scratch
>> registers that need to be populated with the "right" values for the
>> host system BIOS, VGA BIOS and VGA driver. (The host bits access them
>> via LPC IO space).
>>
>> The host system BIOS will read some basic config info there before
>> its IPMI stack is up (and some BIOSes already rely on that). The VGA
>> BIOS will get some strapping info and panel info. The VGA driver
>> (which is already upstream, has been for a long time) will look for
>> other things in some of these guys, such as connector configuration.
>>
>> Andrew, if it helps, we could put together a list of what we
>> typically need on an OpenPower system today. That would give people
>> like Rob a better idea of what this is all about.
> It's primarily what I've outlined at the bottom of the bindings document, though the use cases aren't provided there as they are a bit out-of-scope. So the SuperIO and VGA scratch registers, plus the DAC mux. A bunch of tunable things.
>
> OpenPOWER platforms make use of the SuperIO scratch registers to convey configuration information from the BMC to the host. Information provided includes low-level control of the host firmware initialisation process, UART and logging configuration, and the strategy for handling errors (crash vs log). This is all an "arbitrary" contract between the BMC userspace and the host firmware, i.e. different platforms/firmware could lay out the same information in different ways or communicate entirely different information altogether. The BMC kernel shouldn't care about any of it, other than provide sensible access to the hardware.
>
> Again on OpenPOWER systems using the ASPEED BMC SoCs running OpenBMC, the BMC uses the VGA scratch registers to sense initialisation of the host graphics driver in the host's boot process. When the BMC userspace detects the host VGA driver is up we switch the DAC mux from the BMC CRT device to the host VGA device so that the host is now driving the VGA output. Non-OpenPOWER OpenBMC configurations may do something entirely different, or not do anything at all with the hardware, so as above, it's not really the job of the BMC kernel to be involved in any of this, other than to provide sensible access to userspace.
>
> There are a number of other switches that control the availability of ASPEED BMC hardware features to the host system that also don't fit any particular subsystem and so will use these bindings, but our (OpenPOWER/OpenBMC) current uses are what's described above.
>
> Dell also suggested they had some use-cases that aligned with the intent of the bindings, but I don't know what they had in mind. Eugene (on Cc) can elaborate.
>
>>>> Maybe this could be modelled by pinmux, but then we still need some
>>>> way to expose the mux functions to userspace for selection
>>>> (userspace needs to transition arbitrarily between at least options
>>>> 0 and 1 at runtime), at which point we haven't achieved much beyond
>>>> adding a whole heap of infrastructure in the chain.
>>>>
>>>> Given 0 and 1, maybe exposing attributes in relevant drivers would
>>>> be reasonable, except 0 isn't exposed on the SoC's internal bus so
>>>> there is no driver on the BMC-side to do so. Taking into account 2
>>>> and 3 are also purely hardware paths further dashes the idea, as
>>>> the configuration doesn't really "belong" to the Graphics CRT
>>>> device more than it belongs anywhere else, except for the fact that
>>>> there isn't anywhere else to expose it.
>>>>
>>>> Further, the BMC's kernel can't make the decision as to when to
>>>> switch the mux as it knows nothing of the host's state. The BMC
>>>> userspace is controlling the host's boot state and so *does* know
>>>> when to flip the switch. Finally, the mux is in separate IP to the
>>>> CRT or VGA blocks: It lives in the System Control Unit.
>>>>
>>>> My current point of view is the DAC mux field is effectively its
>>>> own device, and we need to control it from userspace, so we need
>>>> some way to describe it (i.e. not ignore it) in order for its
>>>> capability to be exposed.
>>>>
>>>> I'm fully aware what I'm proposing isn't awesome as it's not
>>>> providing any real abstraction, but the problem(s) at hand also
>>>> seem to defy abstraction, and in order to avoid a plethora of
>>>> bespoke bindings I thought it was reasonable to define something
>>>> generic.
>>>>
>>>> All-in-all I appreciate the suggestion, but assuming you agree with
>>>> my reasoning above do you have thoughts on other alternatives?
>>> Seems the controls are more fixed than I first thought. All the data
>>> you have here could simply be within a driver.
> Rob: A driver for what though? One unique to this particular mux? That feels overly specific when we can generalise the concept to cover a wider range of use-cases.
>
>>> Help me understand what
>>> functions are fixed (in the SoC) and which ones vary by board. Only
>>> what's changing per board really needs to go into DT.
> I think this last sentence identifies a difference in our starting points, so I'd like to explore that. Blocks of functionality might move around inside the SoC as well, so don't we need a way to describe those functions appropriately? And from there describe how the SoC integrates the functions, and then describe how a board integrates the SoC? This all composes, and the problem at the end of the day comes down to what we want to view as a point of abstraction, right?
>
> It seems ideal to me that the metadata about hardware features resides in the description of the relevant system (DT, for a function, a SoC or a board), otherwise don't we wind up with crazy, unfocused, monolithic drivers for things like system controllers? (There's MFD/syscon, but having used it previously I'm still grappling with the benefit over some of the weirdness it injects into devicetree - maybe I did it wrong.) Or alternatively, a generic driver that's choc full of platform-specific data covering the platforms that require it? The driver that implements the behaviour of the bindings described here turns out quite focused (even if the first attempt was a bit of a basket case, hopefully the second is better (sorry Greg)).
>
>> Most of these things is specific to a given board or may even need to
>> be changed at runtime.
> *snip*...
>
>> Talking of which: Andrew, did you put "default values" in your
>> binding ? That would be a nice way to deal with system specific
>> immutables, so that userspace doesn't even have to care.
> Yes, I described a `default-value`property for RW fields, and `default-set` and `default-clear`properties for write-1-set/write-1-clear fields for exactly this purpose.
>
>> So to clarify once and for all, *anything* that fits in a subsystem,
>> we're putting in one. All the random board control is all GPIOs and
>> that's fine as well. For some things that require a bit of fiddly
>> usage like the "MBOX" logic between BIOS and BMC we are also doing a
>> dedicated driver.
> (As an aside, the "MBOX" functionality is slightly different from the
> scratch registers in that it has configurable interrupts each way
> (BMC-to-Host and Host-to-BMC) - as such it can be used to implement a
> dynamic protocol and so deserves its own driver. This is in contrast
> to the dumb scratch registers we're describing with these bindings
> which have no such interrupts.)
>
>> But there's a few stragglers here, and they tend to be so
>> board/system/BIOS specific that it's not sustainable to create/change
>> random drivers all the time just for exposing those few tunables.
>>
> Yes, this is my feeling too.
>
> Cheers,
>
> Andrew

Hi Alexander,
I've rearranged your reply slightly :)
On Sat, 14 Jul 2018, at 00:44, Alexander Amelkin wrote:
>
> So I'm writing this to support your position in this discussion and to
> let Rob and other reviewers know that this feature is actually needed.
Thanks.
So to summarise some other replies so far, YADRO (Alexander) and Dell (Eugene) - platform vendors integrating BMCs - are interested in a solution, and the patches are seen useful for ASPEED and Nuvoton (Avi) BMCs.
>
> From the discussion it looks to me like Rob is not familiar with
> specifics of BMC-managed servers and tries to apply to them the rules
> that have proven to be good for workstations and laptops.
>
Whatever Rob's familiarity with BMCs or other systems, I'm keen to listen to, explore and gain from his experience. I was certainly expecting push back on these patches and in different circumstances I'd probably say no to them as well. The argument for them needs to stand up to scrutiny, and if that scrutiny leads to alternative solutions (that, ideally, are not /dev/mem) then that's fine.
Currently I think we need what I've proposed despite it looking like a violation of abstractions, because the hardware itself doesn't have a neat, abstract design. But we could be thinking about it wrong, e.g. maybe what we need instead are no devicetree bindings and simply some in-kernel helpers that allow arbitrary drivers to instantiate the sysfs interface I've proposed for these fields under their control. That removes the need for explicit description in the devicetree, though may lead to the creation of a number of unique drivers (and bindings) that each cover only the functions we're trying to generalise over here.
It would be good to have some feedback on the proposed sysfs interface as well - as explored above we could feasibly live without the bindings in their current form but taking away the sysfs interface would nuke the series.
Cheers,
Andrew

On Fri, Jul 13, 2018 at 12:31 AM Andrew Jeffery <andrew@aj.id.au> wrote:
>
> Hi Rob, Ben,
>
> I've replied to you both inline below, hopefully it's clear enough from the context.
>
> On Fri, 13 Jul 2018, at 10:25, Benjamin Herrenschmidt wrote:
> > On Thu, 2018-07-12 at 09:11 -0600, Rob Herring wrote:
> > > On Wed, Jul 11, 2018 at 6:54 PM Andrew Jeffery <andrew@aj.id.au> wrote:
> > > >
> > > > Hi Rob,
> > > >
> > > > Thanks for the response.
> > > >
> > > > On Thu, 12 Jul 2018, at 05:34, Rob Herring wrote:
> > > > > On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote:
> > > > > > Baseboard Management Controllers (BMCs) are embedded SoCs that exist to
> > > > > > provide remote management of (primarily) server platforms. BMCs are
> > > > > > often tightly coupled to the platform in terms of behaviour and provide
> > > > > > many hardware features integral to booting and running the host system.
> > > > > >
> > > > > > Some of these hardware features are simple, for example scratch
> > > > > > registers provided by the BMC that are exposed to both the host and the
> > > > > > BMC. In other cases there's a single bit switch to enable or disable
> > > > > > some of the provided functionality.
> > > > > >
> > > > > > The documentation defines bindings for fields in registers that do not
> > > > > > integrate well into other driver models yet must be described to allow
> > > > > > the BMC kernel to assume control of these features.
> > > > >
> > > > > So we'll get a new binding when that happens? That will break
> > > > > compatibility.
> > > >
> > > > Can you please expand on this? I'm not following.
> > >
> > > If we have a subsystem in the future, then there would likely be an
> > > associated binding which would be different. So if you update the DT,
> > > then old kernels won't work with it.
> >
> > What kind of "subsystem" ? There is almost no way there could be one
> > for that sort of BMC tunables. We've look at several BMC chips out
> > there and requirements from several vendors, BIOS and system
> > manufacturers and it's all over the place.
>
> Right - This is the fundamental principle backing these patches: There will never be a coherent subsystem catering to any of what we want to describe with these bindings.
I never said they would. Saying "do not integrate well into other
driver models YET" implies at some point they will. No point in
beating this any further, just remove "yet"...
> > > > I feel like this is an argument of tradition. Maybe people have
> > > > been dissuaded from doing so when they don't have a reasonable use-
> > > > case? I'm not saying that what I'm proposing is unquestionably
> > > > reasonable, but I don't want to dismiss it out of hand.
> > >
> ...
> > >
> > > It comes up with system controller type blocks too that just have a
> > > bunch of random registers.
>
> This matches the situation at hand.
>
> > > Those change in every SoC and not in any
> > > controlled or ordered way that would make describing the individual
> > > sub-functions in DT worthwhile.
>
> "Not worthwhile" is what I'm pushing back against for our use cases. I think they are narrow and limited enough to make it worthwhile.
>
> Obviously we want to avoid describing these things *badly* - you mentioned the clock bindings - so I'm happy to hash out what the right representation should be. But I struggle to think the solution is not describing some of our hardware features at all.
>
> >
> > So what's the alternative ? Because without something like what we
> > propose, what's going to happen is /dev/mem ... that's what people do
> > today.
>
> Yep. And I've outlined in the cover letter what I think are the advantages of what I'm proposing over /dev/mem. It's not an incredible gain, but has several of nice-to-have properties.
>
> >
> > > > > A node per register bit doesn't scale.
> > > >
> > > > It isn't meant to scale in terms of a single system. Using it
> > > > extensively is very likely wrong. Separately, register-bit-led does
> > > > pretty much the same thing. Doesn't the scale argument apply there?
> > > > Who is to stop me from attaching an insane number of LEDs to a
> > > > system?
> > >
> > > Review.
> > >
> > > If you look, register-bit-led is rarely used outside of some ARM, Ltd.
> > > boards. It's simply quite rare to have MMIO register bits that have a
> > > fixed function of LED control.
> >
> > Well, same here, we hope to review what goes upstream to make it
> > reasonable. Otherwise it doens't matter. If a random vendor, let's say
> > IBM, chose to chip a system where they put an insane amount of cruft in
> > there, it will only affect those systems's BMC and the userspace stack
> > on it.
> >
> > Thankfully that stack is OpenBMC and IBM is aiming at having their
> > device-tree's upstream, thus reviewed, thus it won't happen.
> >
> > *Anything* can be abused. The point here is that we have a number,
> > thankfully rather small, maybe a dozen or two, of tunables that are
> > quite specific to a combination (system vendor, bmc vendor, system
> > model) which control a few HW features that essentially do *NOT* fit in
> > a subsystem.
>
> Exactly. I tried to head off the abuse vector by requiring that uses be listed in the bindings document, and thus enforce some level of review. It might not be the most effective approach at the end of the day, but at least it is something.
>
> >
> > For everything that does, we have created proper drivers (and are doing
> > more).
> >
> >
> > > > Obviously if there are lots of systems using it sparingly and
> > > > legitimately then maybe there's a scale issue, but isn't that just
> > > > a reality of different hardware designs? Whoever is implementing
> > > > support for the system is going to have to describe the hardware
> > > > one way or another.
> > > >
> > > > >
> > > > > Maybe this should be modelled using GPIO binding? There's a line there
> > > > > too as whether the signals are "general purpose" or not.
> > > >
> > > > I don't think so, mainly because some of the things it is intended to be used for are not GPIOs. For instance, take the DAC mux I've described in the patch. It doesn't directly influence anything external to the SoC (i.e. it's certainly not a traditional GPIO in any sense). However, it does *indirectly* influence the SoC's behaviour by muxing the DAC internally between:
> > > >
> > > > 0. VGA device exposed on the host PCIe bus
> > > > 1. The "Graphics CRT" controller
> > > > 2. VGA port A
> > > > 3. VGA port B
> > >
> > > And this mux control is fixed in the SoC design?
> >
> > This specific family of SoC (Aspeed) support those 4 configurations.
> > How they need to be configured at runtime depends on the combination of
> > system vendor and system model, along with in some cases the need to
> > switch it at runtime.
> >
> > This is just one example. Another one is the handful of scratch
> > registers that need to be populated with the "right" values for the
> > host system BIOS, VGA BIOS and VGA driver. (The host bits access them
> > via LPC IO space).
> >
> > The host system BIOS will read some basic config info there before its
> > IPMI stack is up (and some BIOSes already rely on that). The VGA BIOS
> > will get some strapping info and panel info. The VGA driver (which is
> > already upstream, has been for a long time) will look for other things
> > in some of these guys, such as connector configuration.
> >
> > Andrew, if it helps, we could put together a list of what we typically
> > need on an OpenPower system today. That would give people like Rob a
> > better idea of what this is all about.
>
> It's primarily what I've outlined at the bottom of the bindings document, though the use cases aren't provided there as they are a bit out-of-scope. So the SuperIO and VGA scratch registers, plus the DAC mux. A bunch of tunable things.
>
> OpenPOWER platforms make use of the SuperIO scratch registers to convey configuration information from the BMC to the host. Information provided includes low-level control of the host firmware initialisation process, UART and logging configuration, and the strategy for handling errors (crash vs log). This is all an "arbitrary" contract between the BMC userspace and the host firmware, i.e. different platforms/firmware could lay out the same information in different ways or communicate entirely different information altogether. The BMC kernel shouldn't care about any of it, other than provide sensible access to the hardware.
>
> Again on OpenPOWER systems using the ASPEED BMC SoCs running OpenBMC, the BMC uses the VGA scratch registers to sense initialisation of the host graphics driver in the host's boot process. When the BMC userspace detects the host VGA driver is up we switch the DAC mux from the BMC CRT device to the host VGA device so that the host is now driving the VGA output. Non-OpenPOWER OpenBMC configurations may do something entirely different, or not do anything at all with the hardware, so as above, it's not really the job of the BMC kernel to be involved in any of this, other than to provide sensible access to userspace.
>
> There are a number of other switches that control the availability of ASPEED BMC hardware features to the host system that also don't fit any particular subsystem and so will use these bindings, but our (OpenPOWER/OpenBMC) current uses are what's described above.
>
> Dell also suggested they had some use-cases that aligned with the intent of the bindings, but I don't know what they had in mind. Eugene (on Cc) can elaborate.
>
> >
> > > >
> > > > Maybe this could be modelled by pinmux, but then we still need some
> > > > way to expose the mux functions to userspace for selection
> > > > (userspace needs to transition arbitrarily between at least options
> > > > 0 and 1 at runtime), at which point we haven't achieved much beyond
> > > > adding a whole heap of infrastructure in the chain.
> > > >
> > > > Given 0 and 1, maybe exposing attributes in relevant drivers would
> > > > be reasonable, except 0 isn't exposed on the SoC's internal bus so
> > > > there is no driver on the BMC-side to do so. Taking into account 2
> > > > and 3 are also purely hardware paths further dashes the idea, as
> > > > the configuration doesn't really "belong" to the Graphics CRT
> > > > device more than it belongs anywhere else, except for the fact that
> > > > there isn't anywhere else to expose it.
> > > >
> > > > Further, the BMC's kernel can't make the decision as to when to
> > > > switch the mux as it knows nothing of the host's state. The BMC
> > > > userspace is controlling the host's boot state and so *does* know
> > > > when to flip the switch. Finally, the mux is in separate IP to the
> > > > CRT or VGA blocks: It lives in the System Control Unit.
> > > >
> > > > My current point of view is the DAC mux field is effectively its
> > > > own device, and we need to control it from userspace, so we need
> > > > some way to describe it (i.e. not ignore it) in order for its
> > > > capability to be exposed.
> > > >
> > > > I'm fully aware what I'm proposing isn't awesome as it's not
> > > > providing any real abstraction, but the problem(s) at hand also
> > > > seem to defy abstraction, and in order to avoid a plethora of
> > > > bespoke bindings I thought it was reasonable to define something
> > > > generic.
> > > >
> > > > All-in-all I appreciate the suggestion, but assuming you agree with
> > > > my reasoning above do you have thoughts on other alternatives?
> > >
> > > Seems the controls are more fixed than I first thought. All the data
> > > you have here could simply be within a driver.
>
> Rob: A driver for what though? One unique to this particular mux? That feels overly specific when we can generalise the concept to cover a wider range of use-cases.
Not unique. Just instead of populating the structs you have in the
driver from DT, define them in the driver and attach them to
match->data ptr.
> > > Help me understand what
> > > functions are fixed (in the SoC) and which ones vary by board. Only
> > > what's changing per board really needs to go into DT.
>
> I think this last sentence identifies a difference in our starting points, so I'd like to explore that. Blocks of functionality might move around inside the SoC as well, so don't we need a way to describe those functions appropriately?
Yes, if the blocks have well defined boundaries and functions. Blocks
like a UART for example do. Various pieces dumped into system
controllers generally don't IME.
> And from there describe how the SoC integrates the functions, and then describe how a board integrates the SoC? This all composes, and the problem at the end of the day comes down to what we want to view as a point of abstraction, right?
Yes. It's a judgement call as to how much we try to describe in DT. To
use clocks again, a clock divider, mux, or gate all seem like well
defined functions which could be (and were) described in DT, but we
learned that doesn't really work. We're still converting platforms
that did it that way...
> It seems ideal to me that the metadata about hardware features resides in the description of the relevant system (DT, for a function, a SoC or a board), otherwise don't we wind up with crazy, unfocused, monolithic drivers for things like system controllers? (There's MFD/syscon, but having used it previously I'm still grappling with the benefit over some of the weirdness it injects into devicetree - maybe I did it wrong.) Or alternatively, a generic driver that's choc full of platform-specific data covering the platforms that require it?
If that data is one set per SoC, then i'm not that concerned having
platform-specific data in the driver. That doesn't mean the driver is
not "generic". It's still not clear to me in this thread, how much of
this is board specific, but given that you've placed all the data in
an SoC dtsi file it seems to be all per SoC.
Rob

On Mon, 16 Jul 2018, at 23:25, Rob Herring wrote:
> On Fri, Jul 13, 2018 at 12:31 AM Andrew Jeffery <andrew@aj.id.au> wrote:
> >
> > Hi Rob, Ben,
> >
> > I've replied to you both inline below, hopefully it's clear enough from the context.
> >
> > On Fri, 13 Jul 2018, at 10:25, Benjamin Herrenschmidt wrote:
> > > On Thu, 2018-07-12 at 09:11 -0600, Rob Herring wrote:
> > > > On Wed, Jul 11, 2018 at 6:54 PM Andrew Jeffery <andrew@aj.id.au> wrote:
> > > > >
> > > > > Hi Rob,
> > > > >
> > > > > Thanks for the response.
> > > > >
> > > > > On Thu, 12 Jul 2018, at 05:34, Rob Herring wrote:
> > > > > > On Wed, Jul 11, 2018 at 03:01:19PM +0930, Andrew Jeffery wrote:
> > > > > > > Baseboard Management Controllers (BMCs) are embedded SoCs that exist to
> > > > > > > provide remote management of (primarily) server platforms. BMCs are
> > > > > > > often tightly coupled to the platform in terms of behaviour and provide
> > > > > > > many hardware features integral to booting and running the host system.
> > > > > > >
> > > > > > > Some of these hardware features are simple, for example scratch
> > > > > > > registers provided by the BMC that are exposed to both the host and the
> > > > > > > BMC. In other cases there's a single bit switch to enable or disable
> > > > > > > some of the provided functionality.
> > > > > > >
> > > > > > > The documentation defines bindings for fields in registers that do not
> > > > > > > integrate well into other driver models yet must be described to allow
> > > > > > > the BMC kernel to assume control of these features.
> > > > > >
> > > > > > So we'll get a new binding when that happens? That will break
> > > > > > compatibility.
> > > > >
> > > > > Can you please expand on this? I'm not following.
> > > >
> > > > If we have a subsystem in the future, then there would likely be an
> > > > associated binding which would be different. So if you update the DT,
> > > > then old kernels won't work with it.
> > >
> > > What kind of "subsystem" ? There is almost no way there could be one
> > > for that sort of BMC tunables. We've look at several BMC chips out
> > > there and requirements from several vendors, BIOS and system
> > > manufacturers and it's all over the place.
> >
> > Right - This is the fundamental principle backing these patches: There will never be a coherent subsystem catering to any of what we want to describe with these bindings.
>
> I never said they would. Saying "do not integrate well into other
> driver models YET" implies at some point they will. No point in
> beating this any further, just remove "yet"...
Right, there should have been a comma before 'yet'. Sorry for the confusion.
*snip*
> > > > >
> > > > > Maybe this could be modelled by pinmux, but then we still need some
> > > > > way to expose the mux functions to userspace for selection
> > > > > (userspace needs to transition arbitrarily between at least options
> > > > > 0 and 1 at runtime), at which point we haven't achieved much beyond
> > > > > adding a whole heap of infrastructure in the chain.
> > > > >
> > > > > Given 0 and 1, maybe exposing attributes in relevant drivers would
> > > > > be reasonable, except 0 isn't exposed on the SoC's internal bus so
> > > > > there is no driver on the BMC-side to do so. Taking into account 2
> > > > > and 3 are also purely hardware paths further dashes the idea, as
> > > > > the configuration doesn't really "belong" to the Graphics CRT
> > > > > device more than it belongs anywhere else, except for the fact that
> > > > > there isn't anywhere else to expose it.
> > > > >
> > > > > Further, the BMC's kernel can't make the decision as to when to
> > > > > switch the mux as it knows nothing of the host's state. The BMC
> > > > > userspace is controlling the host's boot state and so *does* know
> > > > > when to flip the switch. Finally, the mux is in separate IP to the
> > > > > CRT or VGA blocks: It lives in the System Control Unit.
> > > > >
> > > > > My current point of view is the DAC mux field is effectively its
> > > > > own device, and we need to control it from userspace, so we need
> > > > > some way to describe it (i.e. not ignore it) in order for its
> > > > > capability to be exposed.
> > > > >
> > > > > I'm fully aware what I'm proposing isn't awesome as it's not
> > > > > providing any real abstraction, but the problem(s) at hand also
> > > > > seem to defy abstraction, and in order to avoid a plethora of
> > > > > bespoke bindings I thought it was reasonable to define something
> > > > > generic.
> > > > >
> > > > > All-in-all I appreciate the suggestion, but assuming you agree with
> > > > > my reasoning above do you have thoughts on other alternatives?
> > > >
> > > > Seems the controls are more fixed than I first thought. All the data
> > > > you have here could simply be within a driver.
> >
> > Rob: A driver for what though? One unique to this particular mux? That feels overly specific when we can generalise the concept to cover a wider range of use-cases.
>
> Not unique. Just instead of populating the structs you have in the
> driver from DT, define them in the driver and attach them to
> match->data ptr.
Okay, I'll prototype it.
>
> > > > Help me understand what
> > > > functions are fixed (in the SoC) and which ones vary by board. Only
> > > > what's changing per board really needs to go into DT.
> >
> > I think this last sentence identifies a difference in our starting points, so I'd like to explore that. Blocks of functionality might move around inside the SoC as well, so don't we need a way to describe those functions appropriately?
>
> Yes, if the blocks have well defined boundaries and functions. Blocks
> like a UART for example do. Various pieces dumped into system
> controllers generally don't IME.
>
> > And from there describe how the SoC integrates the functions, and then describe how a board integrates the SoC? This all composes, and the problem at the end of the day comes down to what we want to view as a point of abstraction, right?
>
> Yes. It's a judgement call as to how much we try to describe in DT. To
> use clocks again, a clock divider, mux, or gate all seem like well
> defined functions which could be (and were) described in DT, but we
> learned that doesn't really work. We're still converting platforms
> that did it that way...
>
> > It seems ideal to me that the metadata about hardware features resides in the description of the relevant system (DT, for a function, a SoC or a board), otherwise don't we wind up with crazy, unfocused, monolithic drivers for things like system controllers? (There's MFD/syscon, but having used it previously I'm still grappling with the benefit over some of the weirdness it injects into devicetree - maybe I did it wrong.) Or alternatively, a generic driver that's choc full of platform-specific data covering the platforms that require it?
>
> If that data is one set per SoC, then i'm not that concerned having
> platform-specific data in the driver. That doesn't mean the driver is
> not "generic". It's still not clear to me in this thread, how much of
> this is board specific, but given that you've placed all the data in
> an SoC dtsi file it seems to be all per SoC.
Right, the features we want to expose are part of the SoC, not the board it sits on. The distinction was that different boards (or even different software payloads on the same board) could use them in vastly different ways, though that mainly affects how they're treated inside the kernel and at the kernel/userspace interface rather than at the devicetree level.
Cheers,
Andrew

On Mon, 2018-07-16 at 07:55 -0600, Rob Herring wrote:
> If that data is one set per SoC, then i'm not that concerned having
> platform-specific data in the driver. That doesn't mean the driver is
> not "generic". It's still not clear to me in this thread, how much of
> this is board specific, but given that you've placed all the data in
> an SoC dtsi file it seems to be all per SoC.
So Rob, I think that's precisely where the disconnect is.
I think we all (well hopefully) agree that those few tunables don't fit
in any existing subystem and aren't likely to ever do (famous last
words...).
Where we disagree is we want to make this parametrized via the DT, and
you want us to hard wire the list in some kind of SoC driver for a
given SoC family/version.
The reason I think hard wiring the list in the driver is not a great
solution is that that list in itself is prone to variations, possibly
fairly often, between boards, vendors, versions of boards, etc...
We can't know for sure every SoC tunable (out of the gazillions in
those chips) are going to be needed for a given system. We know which
ones we do use for ours, and that's a couple of handfuls, but it could
be that Dell need a slightly different set, and so might Yadro, or so
might our next board revision for that matter.
Now, updating the device-tree in the board flash with whatever vendor
specific information is needed is a LOT easier than getting the kernel
driver constantly updated. The device-tree after all is there to
reflect among other things system specific ways in which the SoC is
wired and configured. This is rather close...
Cheers,
Ben.

On Tue, 17 Jul 2018, at 14:26, Benjamin Herrenschmidt wrote:
> On Mon, 2018-07-16 at 07:55 -0600, Rob Herring wrote:
> > If that data is one set per SoC, then i'm not that concerned having
> > platform-specific data in the driver. That doesn't mean the driver is
> > not "generic". It's still not clear to me in this thread, how much of
> > this is board specific, but given that you've placed all the data in
> > an SoC dtsi file it seems to be all per SoC.
>
> So Rob, I think that's precisely where the disconnect is.
>
> I think we all (well hopefully) agree that those few tunables don't fit
> in any existing subystem and aren't likely to ever do (famous last
> words...).
>
> Where we disagree is we want to make this parametrized via the DT, and
> you want us to hard wire the list in some kind of SoC driver for a
> given SoC family/version.
>
> The reason I think hard wiring the list in the driver is not a great
> solution is that that list in itself is prone to variations, possibly
> fairly often, between boards, vendors, versions of boards, etc...
>
> We can't know for sure every SoC tunable (out of the gazillions in
> those chips) are going to be needed for a given system. We know which
> ones we do use for ours, and that's a couple of handfuls, but it could
> be that Dell need a slightly different set, and so might Yadro, or so
> might our next board revision for that matter.
>
> Now, updating the device-tree in the board flash with whatever vendor
> specific information is needed is a LOT easier than getting the kernel
> driver constantly updated. The device-tree after all is there to
> reflect among other things system specific ways in which the SoC is
> wired and configured. This is rather close...
Not sure this helps, but I feel that the proposal pretty closely matches what's described in Documentation/devicetree/bindings/mfd/mfd.txt. It's intended that nodes using the bindings I'm proposing are children of a 'compatible = "syscon", "simple-mfd"' node (this is the case with the features we're hoping to describe for our SoC). I should explicitly call that out.
But to go on, "simple-mfd" is effectively an alias of "simple-bus", which means its intended to match child node compatibles to drivers provided by the kernel. If we shouldn't be describing minor features of a SoC in the devicetree, doesn't this invalidate the case for simple-mfd? What is the *correct* use of simple-mfd? When is it not used to expose minor features in set of "miscellaneous system registers"? Why doesn't this proposed case fit?
Cheers,
Andrew

On Tue, Jul 17, 2018 at 5:28 PM Andrew Jeffery <andrew@aj.id.au> wrote:
>
> On Tue, 17 Jul 2018, at 14:26, Benjamin Herrenschmidt wrote:
> > On Mon, 2018-07-16 at 07:55 -0600, Rob Herring wrote:
> > > If that data is one set per SoC, then i'm not that concerned having
> > > platform-specific data in the driver. That doesn't mean the driver is
> > > not "generic". It's still not clear to me in this thread, how much of
> > > this is board specific, but given that you've placed all the data in
> > > an SoC dtsi file it seems to be all per SoC.
> >
> > So Rob, I think that's precisely where the disconnect is.
> >
> > I think we all (well hopefully) agree that those few tunables don't fit
> > in any existing subystem and aren't likely to ever do (famous last
> > words...).
> >
> > Where we disagree is we want to make this parametrized via the DT, and
> > you want us to hard wire the list in some kind of SoC driver for a
> > given SoC family/version.
> >
> > The reason I think hard wiring the list in the driver is not a great
> > solution is that that list in itself is prone to variations, possibly
> > fairly often, between boards, vendors, versions of boards, etc...
> >
> > We can't know for sure every SoC tunable (out of the gazillions in
> > those chips) are going to be needed for a given system. We know which
> > ones we do use for ours, and that's a couple of handfuls, but it could
> > be that Dell need a slightly different set, and so might Yadro, or so
> > might our next board revision for that matter.
> >
> > Now, updating the device-tree in the board flash with whatever vendor
> > specific information is needed is a LOT easier than getting the kernel
> > driver constantly updated. The device-tree after all is there to
> > reflect among other things system specific ways in which the SoC is
> > wired and configured. This is rather close...
>
> Not sure this helps, but I feel that the proposal pretty closely matches what's described in Documentation/devicetree/bindings/mfd/mfd.txt. It's intended that nodes using the bindings I'm proposing are children of a 'compatible = "syscon", "simple-mfd"' node (this is the case with the features we're hoping to describe for our SoC). I should explicitly call that out.
IMO, any binding that has only those compatibles is not correct and a
specific compatible is needed. We should be able identify a specific
h/w block.
> But to go on, "simple-mfd" is effectively an alias of "simple-bus", which means its intended to match child node compatibles to drivers provided by the kernel. If we shouldn't be describing minor features of a SoC in the devicetree, doesn't this invalidate the case for simple-mfd? What is the *correct* use of simple-mfd? When is it not used to expose minor features in set of "miscellaneous system registers"? Why doesn't this proposed case fit?
I'm no fan of simple-mfd either. I think it is abused and often a sign
of bad binding design. The general problem with MFD bindings is people
define child nodes based on what drivers they happen to have for some
OS. DT is not the only way to instantiate drivers. Child nodes are
really only needed when you have resources per child that need to be
defined. For example, if the MFD has an interrupt controller and
interrupts to sub-blocks or sub-blocks have their own clocks.
"simple-mfd" was for when the parent node has no driver or the child
nodes have no dependency on the parent.
Rob

On Mon, Jul 16, 2018 at 10:56 PM Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> On Mon, 2018-07-16 at 07:55 -0600, Rob Herring wrote:
> > If that data is one set per SoC, then i'm not that concerned having
> > platform-specific data in the driver. That doesn't mean the driver is
> > not "generic". It's still not clear to me in this thread, how much of
> > this is board specific, but given that you've placed all the data in
> > an SoC dtsi file it seems to be all per SoC.
>
> So Rob, I think that's precisely where the disconnect is.
>
> I think we all (well hopefully) agree that those few tunables don't fit
> in any existing subystem and aren't likely to ever do (famous last
> words...).
>
> Where we disagree is we want to make this parametrized via the DT, and
> you want us to hard wire the list in some kind of SoC driver for a
> given SoC family/version.
>
> The reason I think hard wiring the list in the driver is not a great
> solution is that that list in itself is prone to variations, possibly
> fairly often, between boards, vendors, versions of boards, etc...
Can we separate the list of what's enabled from the details of the
registers? Even if we put all this into DT, we may still want to have
some separation for dts file structure. Some portion of this has to be
SoC specific because you are simply exposing SoC registers.
> We can't know for sure every SoC tunable (out of the gazillions in
> those chips) are going to be needed for a given system. We know which
> ones we do use for ours, and that's a couple of handfuls, but it could
> be that Dell need a slightly different set, and so might Yadro, or so
> might our next board revision for that matter.
That's very hand wavy. Do you have some concrete examples (i.e. dts
files) showing the differences.
One problem I'm having with this is you are still going to need per
board specifics in userspace. You may be moving some of details out,
but moving to DT is not going to completely eliminate that. I agree
that not using /dev/mem is a good thing, but there are several ways
you could do that independent of any DT binding.
> Now, updating the device-tree in the board flash with whatever vendor
> specific information is needed is a LOT easier than getting the kernel
> driver constantly updated. The device-tree after all is there to
> reflect among other things system specific ways in which the SoC is
> wired and configured. This is rather close...
Sadly, updating my kernel is easier than updating my PC firmware
(though packaged firmware on my current laptop changes that). I can
assure you that ARM boards are generally much worse in that regard.
BTW, you may want to pay attention to EBBR[1][2]. Not sure how much
you care for BMCs, but there may be some interest.
Rob
[1] https://github.com/ARM-software/ebbr
[2] https://lists.linaro.org/pipermail/boot-architecture/

On Wed, 2018-07-18 at 13:50 -0600, Rob Herring wrote:
>
> > So Rob, I think that's precisely where the disconnect is.
> >
> > I think we all (well hopefully) agree that those few tunables don't fit
> > in any existing subystem and aren't likely to ever do (famous last
> > words...).
> >
> > Where we disagree is we want to make this parametrized via the DT, and
> > you want us to hard wire the list in some kind of SoC driver for a
> > given SoC family/version.
> >
> > The reason I think hard wiring the list in the driver is not a great
> > solution is that that list in itself is prone to variations, possibly
> > fairly often, between boards, vendors, versions of boards, etc...
>
> Can we separate the list of what's enabled from the details of the
> registers? Even if we put all this into DT, we may still want to have
> some separation for dts file structure. Some portion of this has to be
> SoC specific because you are simply exposing SoC registers.
Not sure I understand what you mean by "what's enabled".
The goal here is to expose register "fields" (nor raw values, some
registers need locking for RMW etc... the kernel would handle that via
syscon locking I expect), so basically named "tunables" to userspace.
Userspace is the one that knows the values for a given system.
> > We can't know for sure every SoC tunable (out of the gazillions in
> > those chips) are going to be needed for a given system. We know which
> > ones we do use for ours, and that's a couple of handfuls, but it could
> > be that Dell need a slightly different set, and so might Yadro, or so
> > might our next board revision for that matter.
>
> That's very hand wavy. Do you have some concrete examples (i.e. dts
> files) showing the differences.
We could list what we have on the pile for some of our IBM systems
today, but we would need Dell and Yadro to chime in with what they
need.
I still, from experience with that stuff and gut feeling, am pretty
sure it's going to be a moving list, and updating the kernel driver
constantly isn't going to fly.
> One problem I'm having with this is you are still going to need per
> board specifics in userspace.
Yes. Userspace is ultimately the one that knows what needs to be done
on a given machine.
> You may be moving some of details out,
> but moving to DT is not going to completely eliminate that.
We aren't trying to either. We are trying to make sure we don't need to
change the *kernel* all the time, in part bcs we are pushing hard for
OpenBMC vendors to use upstream with, if possible, no vendor changes.
So userspace has to know the board specific tunables anyways. Today in
many systems, it does that by whacking /dev/mem. I guess you can
understand why that is bad :-)
> I agree
> that not using /dev/mem is a good thing, but there are several ways
> you could do that independent of any DT binding.
Such as ? The only other option is to have one or more kernel drivers
that have built-in the precise and complete list of those tunables that
need to be exposed.
It's not impossible, but I worry that it's not going to scale terribly
well, and that vendors will constantly "fork" that driver to add
different things to that list.
I might be wrong here, so I'd like for Eugene (Dell) and Alexander
(Yadro) to chime in, but experience with BMCs has shown that we
regularily , as we add a feature or rewrite something, need to find
another new magic SoC tunable the HW manufacturer hid somewhere that
will make our stuff work.
> > Now, updating the device-tree in the board flash with whatever vendor
> > specific information is needed is a LOT easier than getting the kernel
> > driver constantly updated. The device-tree after all is there to
> > reflect among other things system specific ways in which the SoC is
> > wired and configured. This is rather close...
>
> Sadly, updating my kernel is easier than updating my PC firmware
> (though packaged firmware on my current laptop changes that). I can
> assure you that ARM boards are generally much worse in that regard.
> BTW, you may want to pay attention to EBBR[1][2]. Not sure how much
> you care for BMCs, but there may be some interest.
You are conflating your host kernel and your BMC here. The BMC kernel
is part of the "firmware", as is its DTS and the BMC userspace.
(Again this isn't the host DTS, this is the BMC DTS).
They get all updated together. My point isn't about the ease or
difficulty for a *user* to udpate their BMC, in that case the solutions
above are equivalent.
The point is from a system vendor perspective. A system vendor using
OpenBMC will *customize* their BMC build in various ways. Typically
they *will* have a custom DT since this is what represents their
specific system and they will have some specific userspace bits.
However, we are trying very hard for them *not* to fork the kernel, and
if possible move OpenBMC towards a fully upstream kernel (still working
on getting all the SoC drivers cleaned up and pushed up but it's moving
in the right direction).
So from a vendor perspective, such as us IBM, we *alread* have custom
DTs and customized userspace, that's part of the normal flow of
deploying a BMC.
However, we are trying *NOT* to have a custom kernel (and we don't, at
the moment there is an OpenBMC kernel accross vendors, though it's not
*yet* fully upstream).
So if the solution proposed is prone to requiring frequent changes to a
kernel driver, that solution makes the above a lot more difficult and
will encourage vendors to keep forking kernels.
This is what we are trying to avoid.
Cheers,
Ben.
> Rob
>
> [1] https://github.com/ARM-software/ebbr
> [2] https://lists.linaro.org/pipermail/boot-architecture/

On Thu, 19 Jul 2018, at 04:37, Rob Herring wrote:
> On Tue, Jul 17, 2018 at 5:28 PM Andrew Jeffery <andrew@aj.id.au> wrote:
> >
> > On Tue, 17 Jul 2018, at 14:26, Benjamin Herrenschmidt wrote:
> > > On Mon, 2018-07-16 at 07:55 -0600, Rob Herring wrote:
> > > > If that data is one set per SoC, then i'm not that concerned having
> > > > platform-specific data in the driver. That doesn't mean the driver is
> > > > not "generic". It's still not clear to me in this thread, how much of
> > > > this is board specific, but given that you've placed all the data in
> > > > an SoC dtsi file it seems to be all per SoC.
> > >
> > > So Rob, I think that's precisely where the disconnect is.
> > >
> > > I think we all (well hopefully) agree that those few tunables don't fit
> > > in any existing subystem and aren't likely to ever do (famous last
> > > words...).
> > >
> > > Where we disagree is we want to make this parametrized via the DT, and
> > > you want us to hard wire the list in some kind of SoC driver for a
> > > given SoC family/version.
> > >
> > > The reason I think hard wiring the list in the driver is not a great
> > > solution is that that list in itself is prone to variations, possibly
> > > fairly often, between boards, vendors, versions of boards, etc...
> > >
> > > We can't know for sure every SoC tunable (out of the gazillions in
> > > those chips) are going to be needed for a given system. We know which
> > > ones we do use for ours, and that's a couple of handfuls, but it could
> > > be that Dell need a slightly different set, and so might Yadro, or so
> > > might our next board revision for that matter.
> > >
> > > Now, updating the device-tree in the board flash with whatever vendor
> > > specific information is needed is a LOT easier than getting the kernel
> > > driver constantly updated. The device-tree after all is there to
> > > reflect among other things system specific ways in which the SoC is
> > > wired and configured. This is rather close...
> >
> > Not sure this helps, but I feel that the proposal pretty closely matches what's described in Documentation/devicetree/bindings/mfd/mfd.txt. It's intended that nodes using the bindings I'm proposing are children of a 'compatible = "syscon", "simple-mfd"' node (this is the case with the features we're hoping to describe for our SoC). I should explicitly call that out.
>
> IMO, any binding that has only those compatibles is not correct and a
> specific compatible is needed. We should be able identify a specific
> h/w block.
I didn't intend for that to get interpreted quite as literally, so apologies for that. We do have h/w-block-specific compatibles in there too. The point was to demonstrate that we're dealing (at this point, only) with mfds/syscons.
>
> > But to go on, "simple-mfd" is effectively an alias of "simple-bus", which means its intended to match child node compatibles to drivers provided by the kernel. If we shouldn't be describing minor features of a SoC in the devicetree, doesn't this invalidate the case for simple-mfd? What is the *correct* use of simple-mfd? When is it not used to expose minor features in set of "miscellaneous system registers"? Why doesn't this proposed case fit?
>
> I'm no fan of simple-mfd either. I think it is abused and often a sign
> of bad binding design.
Ah, yes, this is a familiar feeling when reflecting on things I've done in the past. Hence trying to understand how to use it right.
> The general problem with MFD bindings is people
> define child nodes based on what drivers they happen to have for some
> OS. DT is not the only way to instantiate drivers. Child nodes are
> really only needed when you have resources per child that need to be
> defined. For example, if the MFD has an interrupt controller and
> interrupts to sub-blocks or sub-blocks have their own clocks.
> "simple-mfd" was for when the parent node has no driver or the child
> nodes have no dependency on the parent.
Thanks for the clarification, I'll keep that in mind going forward.
Andrew

> > I agree
> > that not using /dev/mem is a good thing, but there are several ways
> > you could do that independent of any DT binding.
>
> Such as ? The only other option is to have one or more kernel drivers
> that have built-in the precise and complete list of those tunables that
> need to be exposed.
>
> It's not impossible, but I worry that it's not going to scale terribly
> well, and that vendors will constantly "fork" that driver to add
> different things to that list.
Picking on that last point, "different things" doesn't necessarily mean "different fields in the SoC" (though it may). We could just need to use different initial values for the fields already described. So taking that into account, the field descriptions could vary wildly from platform to platform - where "platform" here is the combination of the BMC, its host system, and the host system's required configuration - not just referring to the BMC in isolation. That in turn may cause a fork of the driver (changes that are incompatible with other platforms), or not scale terribly well as Ben suggests.
The initial value concept can help reduce the impact on userspace as userpace may no-longer need to care about it, so I think it's a desirable property. With respect to devicetree, the field definitions will stay in the SoC dtsi, but the initial values would be described on a per-platform basis in the dts.
>
> I might be wrong here, so I'd like for Eugene (Dell) and Alexander
> (Yadro) to chime in, but experience with BMCs has shown that we
> regularily , as we add a feature or rewrite something, need to find
> another new magic SoC tunable the HW manufacturer hid somewhere that
> will make our stuff work.
>
> > > Now, updating the device-tree in the board flash with whatever vendor
> > > specific information is needed is a LOT easier than getting the kernel
> > > driver constantly updated. The device-tree after all is there to
> > > reflect among other things system specific ways in which the SoC is
> > > wired and configured. This is rather close...
> >
> > Sadly, updating my kernel is easier than updating my PC firmware
> > (though packaged firmware on my current laptop changes that). I can
> > assure you that ARM boards are generally much worse in that regard.
> > BTW, you may want to pay attention to EBBR[1][2]. Not sure how much
> > you care for BMCs, but there may be some interest.
>
> You are conflating your host kernel and your BMC here. The BMC kernel
> is part of the "firmware", as is its DTS and the BMC userspace.
>
> (Again this isn't the host DTS, this is the BMC DTS).
>
> They get all updated together. My point isn't about the ease or
> difficulty for a *user* to udpate their BMC, in that case the solutions
> above are equivalent.
>
> The point is from a system vendor perspective. A system vendor using
> OpenBMC will *customize* their BMC build in various ways. Typically
> they *will* have a custom DT since this is what represents their
> specific system and they will have some specific userspace bits.
>
> However, we are trying very hard for them *not* to fork the kernel, and
> if possible move OpenBMC towards a fully upstream kernel (still working
> on getting all the SoC drivers cleaned up and pushed up but it's moving
> in the right direction).
>
> So from a vendor perspective, such as us IBM, we *alread* have custom
> DTs and customized userspace, that's part of the normal flow of
> deploying a BMC.
>
> However, we are trying *NOT* to have a custom kernel (and we don't, at
> the moment there is an OpenBMC kernel accross vendors, though it's not
> *yet* fully upstream).
>
> So if the solution proposed is prone to requiring frequent changes to a
> kernel driver, that solution makes the above a lot more difficult and
> will encourage vendors to keep forking kernels.
>
> This is what we are trying to avoid.
Well described. Thanks Ben.
Andrew

On Thu, 2018-07-19 at 11:58 +0930, Andrew Jeffery wrote:
> > > I agree
> > > that not using /dev/mem is a good thing, but there are several ways
> > > you could do that independent of any DT binding.
> >
> > Such as ? The only other option is to have one or more kernel drivers
> > that have built-in the precise and complete list of those tunables that
> > need to be exposed.
> >
> > It's not impossible, but I worry that it's not going to scale terribly
> > well, and that vendors will constantly "fork" that driver to add
> > different things to that list.
>
> Picking on that last point, "different things" doesn't necessarily
> mean "different fields in the SoC" (though it may). We could just
> need to use different initial values for the fields already
> described.
I don't worry about initial values too much. Those could be specified
by userspace. It would be trivial to have something akin to
/etc/sysctl.conf that contains the initial values and a script blasting
them early. In fact I prefer this being in userspace to be frank. It
could be in the initramfs if we want it done early enough, maybe with a
usermode helper.
Unless we have some demonstrable reasons why some of these must be set
before the various kernel drivers initialize.
> So taking that into account, the field descriptions could vary wildly
> from platform to platform - where "platform" here is the combination
> of the BMC, its host system, and the host system's required
> configuration - not just referring to the BMC in isolation. That in
> turn may cause a fork of the driver (changes that are incompatible
> with other platforms), or not scale terribly well as Ben suggests.
I really only worry about the actual set of registers/fields. I think
folding in initial values goes a bit too far.
> The initial value concept can help reduce the impact on userspace as
> userpace may no-longer need to care about it, so I think it's a
> desirable property.
I don't worry too much about userspace containing system specific bits
and pieces such as a config file with the appropriate initial values
for the platform. Userspace will have to contain platform specific
stuff regardless (if anything, stuff like thermal control is
intrinsicly different from one platform to the next).
> With respect to devicetree, the field definitions will stay in the
> SoC dtsi, but the initial values would be described on a per-platform
> basis in the dts.
If the fields are in the SoC dtsi, then Rob has a reasonable argument
that the list of fields is rather fixed for a given SoC and thus could
be hard wired in the driver....
I think at this stage, we need to create more clarity. In order to do
that, I think we need to come up with a list, as exhaustive as we can,
of what are the fields/register we need for our platforms, and find any
reason why userspace wouldn't be a good enough location to stick the
initial values.
Then we need Dell and Yadro (and maybe others) to help with their
variant of that list for the same SoCs to begin with.
With that, we'll get an idea of whether we think we can get a
reasonable stable long term list that's appropriate for most vendors.
If that's the case, then my objections to have it in the kernel instead
of the DT fade away a bit.
If one or two of those fields absolutely must have their defaults
early, then we can consider a specific set of one or two properties in
the SoC node for that SoC family that provides those specific defaults.
But unless we have that list, I think we'll be throwing too many
hypotheticals around to convince Rob and others.
Andrew, can you start with a list that shows what you expect us to need
on our systems ?
Cheers,
Ben

>
> Andrew, can you start with a list that shows what you expect us to need
> on our systems ?
>
Okay, our Witherspoon and Romulus platforms containing the ASPEED AST2500 currently need the following tuneables exposed:
From the SCU:
- Debug UART enable
- VGA DAC mux
- VGA scratch registers 0-7
- LPC SuperIO decode enable
- VGA MMIO decode enable
From the LPC controller:
- iLPC2AHB enable
- SuperIO scratch registers 0x20-0x2f
(The LPC controller is just as much of a collection of random bits as the SCU)
Lastly, our Palmetto platform uses an AST2400 which has fewer features compared to the AST2500. Its tuneable list is the same as the above with the exception of "Debug UART enable".
Tuneables that we may need to expose in the future include:
From the SCU:
- PCI VID/DID for the BMC PCIe device
- VGA device enable (may need to be disabled if the platform contains a discrete graphics processor)
From the LPC controller:
- UART mux
Alexander, Eugene, can you chime in with your platforms' needs?
Cheers,
Andrew

On Fri, 2018-07-20 at 09:37 +0930, Andrew Jeffery wrote:
> >
> > Andrew, can you start with a list that shows what you expect us to need
> > on our systems ?
> >
>
> Okay, our Witherspoon and Romulus platforms containing the ASPEED AST2500 currently need the following tuneables exposed:
>
> > From the SCU:
>
> - Debug UART enable
> - VGA DAC mux
> - VGA scratch registers 0-7
> - LPC SuperIO decode enable
> - VGA MMIO decode enable
>
> > From the LPC controller:
>
> - iLPC2AHB enable
> - SuperIO scratch registers 0x20-0x2f
>
> (The LPC controller is just as much of a collection of random bits as the SCU)
>
> Lastly, our Palmetto platform uses an AST2400 which has fewer features compared to the AST2500. Its tuneable list is the same as the above with the exception of "Debug UART enable".
>
> Tuneables that we may need to expose in the future include:
>
> > From the SCU:
>
> - PCI VID/DID for the BMC PCIe device
> - VGA device enable (may need to be disabled if the platform contains a discrete graphics processor)
Additionally there's a bunch of resigters controlling the mapping of
various MMIO regions of the BMC PCIe device to portions of the BMC
address space. I'm not sure what's the best way to handle that.
This specific set might require a dedicated device as a subnode of
the SCU in the DT that contains all the mappings as properties...
That or we consider them static enough and just whack it in u-boot.
> > From the LPC controller:
>
> - UART mux
>
> Alexander, Eugene, can you chime in with your platforms' needs?
>
> Cheers,
>
> Andrew

Andrew, Benjamin, Rob,
Thanks for bringing up the set of patches and a great discussion.
After going through the thread I figured that I'd like to share a few
things we needed to hack when programming several BMC boards:
- Debug UART enable/mux
- Disable GPIO D/E passthrough (I think this is supported by the
current pinctrl driver)
- RMII/RGMII strapping
- iLPC2AHB control
- SPI master mux select
- Various SuperIO configurations
As for the discussion whether these belong to a platform driver or
device tree nodes, I think in an ideal world all these configurations
could be nicely grouped and abstracted in a platform kernel driver (or
drivers). However in reality think this as an "M * N" problem: there
are M variants of BMCs and N different platforms built with these
BMCs. Each platform-BMC combination is going to have its own quirks
and slightly different requirements in BMC "tunables". Were there a
kernel driver for the M BMC variants, it would inevitably have a lot
of churn due to the different needs of the platforms.
What I like about the device tree approach is the expressiveness of
the format and the ability to specify non-conflicting initial values
easily. Sometimes we need initial values for these parameters set
before running userspace, and setting such values in device tree is
easier than using #defines or kernel parameters.
On Thu, Jul 19, 2018 at 9:57 PM Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>
> On Fri, 2018-07-20 at 09:37 +0930, Andrew Jeffery wrote:
> > >
> > > Andrew, can you start with a list that shows what you expect us to need
> > > on our systems ?
> > >
> >
> > Okay, our Witherspoon and Romulus platforms containing the ASPEED AST2500 currently need the following tuneables exposed:
> >
> > > From the SCU:
> >
> > - Debug UART enable
> > - VGA DAC mux
> > - VGA scratch registers 0-7
> > - LPC SuperIO decode enable
> > - VGA MMIO decode enable
> >
> > > From the LPC controller:
> >
> > - iLPC2AHB enable
> > - SuperIO scratch registers 0x20-0x2f
> >
> > (The LPC controller is just as much of a collection of random bits as the SCU)
> >
> > Lastly, our Palmetto platform uses an AST2400 which has fewer features compared to the AST2500. Its tuneable list is the same as the above with the exception of "Debug UART enable".
> >
> > Tuneables that we may need to expose in the future include:
> >
> > > From the SCU:
> >
> > - PCI VID/DID for the BMC PCIe device
> > - VGA device enable (may need to be disabled if the platform contains a discrete graphics processor)
>
> Additionally there's a bunch of resigters controlling the mapping of
> various MMIO regions of the BMC PCIe device to portions of the BMC
> address space. I'm not sure what's the best way to handle that.
>
> This specific set might require a dedicated device as a subnode of
> the SCU in the DT that contains all the mappings as properties...
>
> That or we consider them static enough and just whack it in u-boot.
>
> > > From the LPC controller:
> >
> > - UART mux
> >
> > Alexander, Eugene, can you chime in with your platforms' needs?
> >
> > Cheers,
> >
> > Andrew
--
Regards,
Kun