2990wx-based DAW & DPC Latency?

I'm building a DAW around a 2990wx, and I'm trying to iron out the last little bits of DPC latency from my system. I started with an install of Windows 10 & Cockos Reaper, and the pops and clicks were maddening. Long story short, after a lot of internet sleuthing I've downloaded Microsoft's Interrupt Affinity Policy Tool & have been setting interrupt affinities for most devices to the CPUs in NUMA node 0. The pops & clicks are almost gone, and I have a few questions that will hopefully get me through the last mile:

LatencyMon never, ever shows any DPC activity on CPUs 32 - 63 (All CPUs are selected under "options"). Even with all the manual affinities I set for 0 - 15, *some* DPC activity eventually shows up on the CPUs 16 - 31. But I've left it running for hours, and nothing ever gets recorded against the higher-numbered CPUs. Why?

This has me wondering if CPUs "0 & 32", "1 & 33" etc are the pairs, and DPCs are only counted against the lower-numbered CPU? But I feel like that wouldn't make sense, as Task Manager says CPUs 0 & 32 are in NUMA nodes 0 & 2, respectively.
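To make the numbering question concrete, here is a minimal sketch of the layout implied by the Task Manager observation above. It assumes the 64 logical CPUs are split into four contiguous blocks of 16, one per NUMA node, which matches "CPU 0 is in node 0, CPU 32 is in node 2" but is not confirmed for every Windows build or BIOS setting. The names `CPUS_PER_NODE` and `numa_node_of` are made up for illustration.

```python
# Assumption: 64 logical CPUs, 4 NUMA nodes, 16 contiguous logical CPUs per node.
TOTAL_CPUS = 64
CPUS_PER_NODE = 16


def numa_node_of(cpu: int) -> int:
    """Return the NUMA node a logical CPU would belong to under the assumed layout."""
    if not 0 <= cpu < TOTAL_CPUS:
        raise ValueError(f"CPU index out of range: {cpu}")
    return cpu // CPUS_PER_NODE


if __name__ == "__main__":
    # Candidate pairs from the question: 0 & 32, 1 & 33.
    for a, b in ((0, 32), (1, 33)):
        print(f"CPU {a} -> node {numa_node_of(a)}, CPU {b} -> node {numa_node_of(b)}")
```

Under this assumption, CPUs 0 & 32 land in different nodes, so they could not be two threads of the same physical core.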

I set interrupt affinities for nearly every peripheral & all "Generic Software Device" entries to CPUs 0 - 15 (NUMA node 0) while leaving all the CPUs, "motherboard devices", PCIe controllers, etc. unbound. Just doing this has significantly tamed the latency in the system, and almost all of the yellow & red in the latency measuring tools went away. I'm curious if there's a known explanation for why limiting the majority of interrupt handling to 1/4 of the available CPUs seems to have improved overall latency? (I'm guessing it's because 0 - 15 are directly attached to the PCIe & memory, but???)

Which NUMA nodes are directly attached, and which are indirectly attached? I'm thinking I want to bind the DAW to the CPUs with **indirect** memory access and bind device interrupts (& other processes) to the cores that are directly attached to the PCIe & memory buses.
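For anyone trying the same split, a rough sketch of how the CPU ranges translate into affinity bitmasks (bit N set means logical CPU N is allowed). The helper name `affinity_mask` is hypothetical, and which range actually belongs to a directly attached node is exactly the open question above, so the ranges here are placeholders only.

```python
def affinity_mask(first_cpu: int, last_cpu: int) -> int:
    """Build a bitmask with one bit set per logical CPU in [first_cpu, last_cpu]."""
    if first_cpu < 0 or last_cpu < first_cpu:
        raise ValueError("expected 0 <= first_cpu <= last_cpu")
    mask = 0
    for cpu in range(first_cpu, last_cpu + 1):
        mask |= 1 << cpu
    return mask


if __name__ == "__main__":
    # Placeholder ranges: interrupts on CPUs 0 - 15, DAW on CPUs 16 - 31.
    print(hex(affinity_mask(0, 15)))   # 0xffff
    print(hex(affinity_mask(16, 31)))  # 0xffff0000
```

A hex mask like this is the form that `start /affinity` expects in cmd, so it should be usable for launching the DAW on a chosen set of CPUs, though I have not verified how Reaper behaves when confined this way.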

Is there a good technical discussion anywhere of the benefits / drawbacks of SMT with regard to DAWs? Google turns up a lot of generic advice threads which can be summarized as either "hyperthreading is bad, m'kay" or "it definitely improves performance!!", but I haven't found anything that has a good technical explanation of how it either helps or hinders real-time audio performance.

You are asking for fairly technical advice. I suggest you ask your question at the AMD Developer Forums. You can work out which AMD Developer Forum your question is best suited for from here: Newcomers Start Here