*[PATCH 00/11] EDAC/ghes: Cleanup, rework and improvement of memory reporting@ 2020-03-06 15:13 Robert Richter
2020-03-06 15:13 ` [PATCH 01/11] EDAC/mc: Use int type for parameters of edac_mc_alloc() Robert Richter
` (11 more replies)0 siblings, 12 replies; 27+ messages in thread
From: Robert Richter @ 2020-03-06 15:13 UTC (permalink / raw)
To: Borislav Petkov, Mauro Carvalho Chehab, Tony Luck
Cc: James Morse, Aristeu Rozanski, Robert Richter, linux-edac, linux-kernel
This series contains a significant cleanup and rework of the ghes
driver and improves the memory reporting as follows:
* fix of DIMM label in error reports (patch #2),
* creation of multiple memory controllers to group DIMMs depending on
the physical memory array (patches #9-#11). This should reflect the
memory topology of a system in sysfs. Esp. multi-node systems show
up with one memory controller per node now.
The changes base on the remaining patches that are a general cleanup
and rework:
* small change to edac_mc, not really dependent on the rest of the
series (patch #1),
* general cleanup and rework of the ghes driver (patches #3-#8).
The implementation of multiple memory controllers bases on the
suggestion from James (see patch #11), thank you James for your
valuable input here. The patches are created newly from scratch and
obsolete the GHES part of my previous postings a while ago that have
not been accepted upstream:
https://lore.kernel.org/patchwork/cover/1093488/
Tested on a Marvell/Cavium ThunderX2 Sabre (dual socket) system.
Robert Richter (11):
EDAC/mc: Use int type for parameters of edac_mc_alloc()
EDAC/ghes: Setup DIMM label from DMI and use it in error reports
EDAC/ghes: Remove local variable rdr_mask in ghes_edac_dmidecode()
EDAC/ghes: Remove unused members of struct ghes_edac_pvt, rename it to
ghes_mci
EDAC/ghes: Cleanup struct ghes_edac_dimm_fill, rename it to
ghes_dimm_fill
EDAC/ghes: Carve out MC device handling into separate functions
EDAC/ghes: Have a separate code path for creating the fake MC
EDAC/ghes: Carve out code into ghes_edac_register_{one,fake}()
EDAC/ghes: Implement DIMM mapping table for SMBIOS handles
EDAC/ghes: Create an own device for each mci
EDAC/ghes: Create one memory controller per physical memory array
drivers/edac/edac_mc.c | 7 +-
drivers/edac/edac_mc.h | 6 +-
drivers/edac/ghes_edac.c | 514 +++++++++++++++++++++++++++++----------
include/linux/edac.h | 2 -
4 files changed, 390 insertions(+), 139 deletions(-)
--
2.20.1
^permalinkrawreply [flat|nested] 27+ messages in thread

*Re: [PATCH 09/11] EDAC/ghes: Implement DIMM mapping table for SMBIOS handles
2020-03-06 15:13 ` [PATCH 09/11] EDAC/ghes: Implement DIMM mapping table for SMBIOS handles Robert Richter
@ 2020-03-16 9:40 ` Borislav Petkov0 siblings, 0 replies; 27+ messages in thread
From: Borislav Petkov @ 2020-03-16 9:40 UTC (permalink / raw)
To: Robert Richter
Cc: Mauro Carvalho Chehab, Tony Luck, James Morse, Aristeu Rozanski,
linux-edac, linux-kernel
On Fri, Mar 06, 2020 at 04:13:16PM +0100, Robert Richter wrote:
> GHES uses SMBIOS handles to identify the DIMM that was causing a hw
> error. Currently this is stored in smbios_handle of struct dimm_info.
> This implementation has several drawbacks. The information is private
> to the ghes driver, but struct dimm_info is for general use. The DIMMs
> are tied to a *mci struct, which makes a lockup inefficient. It is
"lookup"
> hard to dynamically allocate DIMMs and also to meet locking
> constraints when adding or removing them. This becomes even more
> challenging when having multiple MC instances that group a set of
> DIMMs, so the change is needed also to later support multiple MC
> instances.
Err, I don't understand: normally a bunch of DIMMs belong to a memory
controller and that gives you the hierarchy automatically. Why is that a
problem?
And normally you allocate the DIMM representations on MC init and
deallocate them when removing the MC.
So what is the problem here?
...
...
> @@ -72,6 +79,52 @@ struct memdev_dmi_entry {
> u16 conf_mem_clk_speed;
> } __attribute__((__packed__));
>
> +/* ghes_reg_mutex must be held. */
lockdep_assert_held()
...
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette^permalinkrawreply [flat|nested] 27+ messages in thread

*Re: [PATCH 11/11] EDAC/ghes: Create one memory controller per physical memory array
2020-03-06 15:13 ` [PATCH 11/11] EDAC/ghes: Create one memory controller per physical memory array Robert Richter
@ 2020-03-16 9:51 ` Borislav Petkov
2020-03-17 16:34 ` John Garry0 siblings, 1 reply; 27+ messages in thread
From: Borislav Petkov @ 2020-03-16 9:51 UTC (permalink / raw)
To: Robert Richter
Cc: Mauro Carvalho Chehab, Tony Luck, James Morse, Aristeu Rozanski,
linux-edac, linux-kernel, Toshi Kani, John Garry
On Fri, Mar 06, 2020 at 04:13:18PM +0100, Robert Richter wrote:
> The ghes driver only creates one memory controller for the whole
> system. This does not reflect memory topology especially in multi-node
> systems. E.g. a Marvell ThunderX2 system shows:
>
> /sys/devices/system/edac/mc/mc0/dimm0
> /sys/devices/system/edac/mc/mc0/dimm1
> /sys/devices/system/edac/mc/mc0/dimm2
> /sys/devices/system/edac/mc/mc0/dimm3
> /sys/devices/system/edac/mc/mc0/dimm4
> /sys/devices/system/edac/mc/mc0/dimm5
> /sys/devices/system/edac/mc/mc0/dimm6
> /sys/devices/system/edac/mc/mc0/dimm7
> /sys/devices/system/edac/mc/mc0/dimm8
> /sys/devices/system/edac/mc/mc0/dimm9
> /sys/devices/system/edac/mc/mc0/dimm10
> /sys/devices/system/edac/mc/mc0/dimm11
> /sys/devices/system/edac/mc/mc0/dimm12
> /sys/devices/system/edac/mc/mc0/dimm13
> /sys/devices/system/edac/mc/mc0/dimm14
> /sys/devices/system/edac/mc/mc0/dimm15
>
> The DIMMs 9-15 are located on the 2nd node of the system. On
> comparable x86 systems there is one memory controller per node. The
> ghes driver should also group DIMMs depending on the topology and
> create one MC per node.
>
> There are several options to detect the topology. ARM64 systems
> retrieve the (NUMA) node information from the ACPI SRAT table (see
> acpi_table_parse_srat()). The node id is later stored in the physical
> address page. The pfn_to_nid() macro could be used for a DIMM after
> determining its physical address. The drawback of this approach is
> that there are too many subsystems involved it depends on. It could
> easily break and makes the implementation complex. E.g. pfn_to_nid()
> can only be reliable used on mapped address ranges which is not always
> granted, there are various firmware instances involved which could be
> broken, or results may vary depending on NUMA settings.
>
> Another approach that was suggested by James' is to use the DIMM's
> physical memory array handle to group DIMMs [1]. The advantage is to
> only use the information on memory devices from the SMBIOS table that
> contains a reference to the physical memory array it belongs too. This
> information is mandatory same as the use of DIMM handle references by
> GHES to provide the DIMM location of an error. There is only a single
> table to parse which eases implementation. This patch uses this
> approach for DIMM grouping.
>
> Modify the DMI decoder to also detect the physical memory array a DIMM
> is linked to and create one memory controller per array to group
> DIMMs. With the change DIMMs are grouped, e.g. a ThunderX2 system
> shows one MC per node now:
>
> # grep . /sys/devices/system/edac/mc/mc*/dimm*/dimm_label
> /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0
> /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0
> /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0
> /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0
> /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0
> /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0
> /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0
> /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0
> /sys/devices/system/edac/mc/mc1/dimm0/dimm_label:N1 DIMM_I0
> /sys/devices/system/edac/mc/mc1/dimm1/dimm_label:N1 DIMM_J0
> /sys/devices/system/edac/mc/mc1/dimm2/dimm_label:N1 DIMM_K0
> /sys/devices/system/edac/mc/mc1/dimm3/dimm_label:N1 DIMM_L0
> /sys/devices/system/edac/mc/mc1/dimm4/dimm_label:N1 DIMM_M0
> /sys/devices/system/edac/mc/mc1/dimm5/dimm_label:N1 DIMM_N0
> /sys/devices/system/edac/mc/mc1/dimm6/dimm_label:N1 DIMM_O0
> /sys/devices/system/edac/mc/mc1/dimm7/dimm_label:N1 DIMM_P0
>
> [1] https://lkml.kernel.org/r/f878201f-f8fd-0f2a-5072-ba60c64eefaf@arm.com
>
> Suggested-by: James Morse <james.morse@arm.com>
> Signed-off-by: Robert Richter <rrichter@marvell.com>
> ---
> drivers/edac/ghes_edac.c | 137 ++++++++++++++++++++++++++++++---------
> 1 file changed, 107 insertions(+), 30 deletions(-)
This is all fine and good but that change affects the one x86 platform
we support so the whole patchset should be tested there too. Adding
Toshi.
As a matter of fact, the final version of this set should be tested on
all platforms which are using this thing. Adding John Garry too who
reported issues with this driver recently on his platform.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette^permalinkrawreply [flat|nested] 27+ messages in thread

*Re: [PATCH 11/11] EDAC/ghes: Create one memory controller per physical memory array
2020-03-16 9:51 ` Borislav Petkov@ 2020-03-17 16:34 ` John Garry
2020-03-17 22:14 ` Kani, Toshi
2020-03-24 11:32 ` Xiaofei Tan0 siblings, 2 replies; 27+ messages in thread
From: John Garry @ 2020-03-17 16:34 UTC (permalink / raw)
To: Borislav Petkov, Robert Richter
Cc: Mauro Carvalho Chehab, Tony Luck, James Morse, Aristeu Rozanski,
linux-edac, linux-kernel, Toshi Kani, Xiaofei Tan, Shiju Jose
On 16/03/2020 09:51, Borislav Petkov wrote:
> On Fri, Mar 06, 2020 at 04:13:18PM +0100, Robert Richter wrote:
>> The ghes driver only creates one memory controller for the whole
>> system. This does not reflect memory topology especially in multi-node
>> systems. E.g. a Marvell ThunderX2 system shows:
>>
>> /sys/devices/system/edac/mc/mc0/dimm0
>> /sys/devices/system/edac/mc/mc0/dimm1
>> /sys/devices/system/edac/mc/mc0/dimm2
>> /sys/devices/system/edac/mc/mc0/dimm3
>> /sys/devices/system/edac/mc/mc0/dimm4
>> /sys/devices/system/edac/mc/mc0/dimm5
>> /sys/devices/system/edac/mc/mc0/dimm6
>> /sys/devices/system/edac/mc/mc0/dimm7
>> /sys/devices/system/edac/mc/mc0/dimm8
>> /sys/devices/system/edac/mc/mc0/dimm9
>> /sys/devices/system/edac/mc/mc0/dimm10
>> /sys/devices/system/edac/mc/mc0/dimm11
>> /sys/devices/system/edac/mc/mc0/dimm12
>> /sys/devices/system/edac/mc/mc0/dimm13
>> /sys/devices/system/edac/mc/mc0/dimm14
>> /sys/devices/system/edac/mc/mc0/dimm15
>>
>> The DIMMs 9-15 are located on the 2nd node of the system. On
>> comparable x86 systems there is one memory controller per node. The
>> ghes driver should also group DIMMs depending on the topology and
>> create one MC per node.
>>
>> There are several options to detect the topology. ARM64 systems
>> retrieve the (NUMA) node information from the ACPI SRAT table (see
>> acpi_table_parse_srat()). The node id is later stored in the physical
>> address page. The pfn_to_nid() macro could be used for a DIMM after
>> determining its physical address. The drawback of this approach is
>> that there are too many subsystems involved it depends on. It could
>> easily break and makes the implementation complex. E.g. pfn_to_nid()
>> can only be reliable used on mapped address ranges which is not always
>> granted, there are various firmware instances involved which could be
>> broken, or results may vary depending on NUMA settings.
>>
>> Another approach that was suggested by James' is to use the DIMM's
>> physical memory array handle to group DIMMs [1]. The advantage is to
>> only use the information on memory devices from the SMBIOS table that
>> contains a reference to the physical memory array it belongs too. This
>> information is mandatory same as the use of DIMM handle references by
>> GHES to provide the DIMM location of an error. There is only a single
>> table to parse which eases implementation. This patch uses this
>> approach for DIMM grouping.
>>
>> Modify the DMI decoder to also detect the physical memory array a DIMM
>> is linked to and create one memory controller per array to group
>> DIMMs. With the change DIMMs are grouped, e.g. a ThunderX2 system
>> shows one MC per node now:
>>
>> # grep . /sys/devices/system/edac/mc/mc*/dimm*/dimm_label
>> /sys/devices/system/edac/mc/mc0/dimm0/dimm_label:N0 DIMM_A0
>> /sys/devices/system/edac/mc/mc0/dimm1/dimm_label:N0 DIMM_B0
>> /sys/devices/system/edac/mc/mc0/dimm2/dimm_label:N0 DIMM_C0
>> /sys/devices/system/edac/mc/mc0/dimm3/dimm_label:N0 DIMM_D0
>> /sys/devices/system/edac/mc/mc0/dimm4/dimm_label:N0 DIMM_E0
>> /sys/devices/system/edac/mc/mc0/dimm5/dimm_label:N0 DIMM_F0
>> /sys/devices/system/edac/mc/mc0/dimm6/dimm_label:N0 DIMM_G0
>> /sys/devices/system/edac/mc/mc0/dimm7/dimm_label:N0 DIMM_H0
>> /sys/devices/system/edac/mc/mc1/dimm0/dimm_label:N1 DIMM_I0
>> /sys/devices/system/edac/mc/mc1/dimm1/dimm_label:N1 DIMM_J0
>> /sys/devices/system/edac/mc/mc1/dimm2/dimm_label:N1 DIMM_K0
>> /sys/devices/system/edac/mc/mc1/dimm3/dimm_label:N1 DIMM_L0
>> /sys/devices/system/edac/mc/mc1/dimm4/dimm_label:N1 DIMM_M0
>> /sys/devices/system/edac/mc/mc1/dimm5/dimm_label:N1 DIMM_N0
>> /sys/devices/system/edac/mc/mc1/dimm6/dimm_label:N1 DIMM_O0
>> /sys/devices/system/edac/mc/mc1/dimm7/dimm_label:N1 DIMM_P0
>>
>> [1] https://lkml.kernel.org/r/f878201f-f8fd-0f2a-5072-ba60c64eefaf@arm.com
>>
>> Suggested-by: James Morse <james.morse@arm.com>
>> Signed-off-by: Robert Richter <rrichter@marvell.com>
>> ---
>> drivers/edac/ghes_edac.c | 137 ++++++++++++++++++++++++++++++---------
>> 1 file changed, 107 insertions(+), 30 deletions(-)
>
> This is all fine and good but that change affects the one x86 platform
> we support so the whole patchset should be tested there too. Adding
> Toshi.
>
> As a matter of fact, the final version of this set should be tested on
> all platforms which are using this thing. Adding John Garry too who
> reported issues with this driver recently on his platform.
Adding other RAS-centric guys for H.
Cheers,
John
>
> Thx.
>
^permalinkrawreply [flat|nested] 27+ messages in thread