So, long story short - I've designed a PCB, manufactured it and soldered on the components. At the core there's a Chipkit Max32 and a bunch of other stuff, including four MCP23S17. I'm posting this since there's limited information on the web about talking to the MCP's in code, at least information that's easy to grasp without digging too much.

The problem was that the MCP's simply wouldn't work correctly. I tried three different libraries - what ended up working for me was a modified version of Majenko's library. In that library's read and write functions, only a single data byte is transferred per SPI session (in addition to the opcode and register address). To the best of my knowledge, that should work, as the transaction is finalized when slave select is pulled high. The other libraries I didn't get to work at all. The MCP's all share the same DSPI port, with individual slave selects and individual hardware addresses as well, and the circuit design is by the book.

But.
When doing full port writes and/or several consecutive writes/reads, the MCP's would fail and the data would be garbled: it ended up on the wrong pins, or pins lit up despite sending all zeroes. Single pin writes would mostly work, which led me to do hardware "voodoo", and I was tempted to bring out the soldering iron and order new parts. Sometimes the order of the commands would make a huge difference: it could work, and then fail after a recompile. Typical uninitialized-variable symptoms - except in hardware.

But - Part 2.
I dug into the specs, double checked everything and found that the solution was simple enough - by always addressing the first port (A), using BANK=0 (paired A-B ports), and writing both A and B during the same SPI "session", the problem vanished completely. Now I can write (haven't tried reading) and the correct pins are activated every time. I can run at the full 10MHz without a glitch, both inside and outside an interrupt/core service.
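As a rough illustration of what "writing both A and B during the same session" looks like on the wire, here is a minimal sketch in C that only builds the four bytes to clock out while slave select is held low. The helper name `mcp_build_write_ab` and the macro names are made up for this example and are not from Majenko's library; the opcode layout (0100 A2 A1 A0 R/W) and the GPIOA address 12h for BANK=0 are from the datasheet. The actual SPI transfer is left out because it depends on the DSPI setup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MCP_OPCODE_BASE 0x40u /* fixed upper bits 0100 of the control byte */
#define MCP_REG_GPIOA   0x12u /* GPIOA address when IOCON.BANK = 0 */

/* Hypothetical helper: fill buf with one write frame covering ports A and B.
 * hw_addr is the A2..A0 hardware address (0..7).
 * Returns the number of bytes to send while slave select stays low. */
static size_t mcp_build_write_ab(uint8_t buf[4], uint8_t hw_addr,
                                 uint8_t port_a, uint8_t port_b)
{
    assert(hw_addr <= 7u);
    buf[0] = (uint8_t)(MCP_OPCODE_BASE | (hw_addr << 1)); /* R/W bit 0 = write */
    buf[1] = MCP_REG_GPIOA; /* start at the A register of the pair */
    buf[2] = port_a;        /* first data byte lands in GPIOA */
    buf[3] = port_b;        /* pointer moves on to 13h, so this lands in GPIOB */
    return 4;
}
```

With BANK=0 the second data byte should end up in GPIOB whether the chip is in Sequential mode (pointer increments to 13h) or in Byte mode (pointer toggles within the A/B pair), which may be why this approach is less sensitive to the IOCON settings.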

Majenko is a great coder, and the library itself is indeed very neat - so I highly doubt this error would have slipped past him, and I'm wondering if this is due to a hardware revision in the MCP's, DSPI/SPI differences, or perhaps something else? The fact that none of the other libraries worked for me is also weird. It just doesn't add up to me, but I think this section of the datasheet might be the issue:

3.2.1 BYTE MODE AND SEQUENTIAL MODE
The MCP23X17 family has the ability to operate in Byte mode or Sequential mode (IOCON.SEQOP).

Byte mode disables automatic Address Pointer incrementing. When operating in Byte mode, the MCP23X17 family does not increment its internal address counter after each byte during the data transfer. This gives the ability to continually access the same address by providing extra clocks (without additional control bytes). This is useful for polling the GPIO register for data changes or for continually writing to the output latches.

A special mode (Byte mode with IOCON.BANK = 0) causes the address pointer to toggle between associated A/B register pairs. For example, if the BANK bit is cleared and the Address Pointer is initially set to address 12h (GPIOA) or 13h (GPIOB), the pointer will toggle between GPIOA and GPIOB. Note that the Address Pointer can initially point to either address in the register pair.
Sequential mode enables automatic address pointer incrementing. When operating in Sequential mode, the MCP23X17 family increments its address counter after each byte during the data transfer. The Address Pointer automatically rolls over to address 00h after accessing the last register.

These two modes are not to be confused with single writes/reads and continuous writes/reads that are serial protocol sequences. For example, the device may be configured for Byte mode and the master may perform a continuous read. In this case, the MCP23X17 would not increment the Address Pointer and would repeatedly drive data from the same location.
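To make the quoted behaviour concrete, here is a small sketch in C that models where the address pointer goes after each data byte. It only covers BANK=0, and the function name `mcp_next_addr_bank0` is invented for this illustration. It assumes the BANK=0 register map from the datasheet, where each A register sits at an even address with its B partner right after it and OLATB (15h) is the last register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MCP_LAST_REG 0x15u /* OLATB, last register when IOCON.BANK = 0 */

/* Hypothetical model of the address pointer after one data byte (BANK = 0).
 * byte_mode == true  : pointer toggles within the A/B register pair.
 * byte_mode == false : Sequential mode, pointer increments and rolls over. */
static uint8_t mcp_next_addr_bank0(uint8_t addr, bool byte_mode)
{
    assert(addr <= MCP_LAST_REG);
    if (byte_mode) {
        /* Pairs are (even, even + 1), so flipping bit 0 swaps A and B. */
        return (uint8_t)(addr ^ 1u);
    }
    /* Sequential mode: wrap to 00h after the last register. */
    return (uint8_t)((addr == MCP_LAST_REG) ? 0x00u : (addr + 1u));
}
```

Starting at 12h (GPIOA), both modes put the next byte at 13h (GPIOB); they only differ from the third byte onward, where Byte mode returns to 12h and Sequential mode continues to 14h (OLATA).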

Majenko's library defaults to BANK=0, HAEN=1, SEQOP=1, unless I'm mistaken, so the problem might be the special mode, with the chip expecting the next port to be read immediately after?
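For reference, that combination can be written out as an IOCON value. The sketch below, in C, uses the bit positions from the datasheet (BANK is bit 7, SEQOP bit 5, HAEN bit 3) and the IOCON address 0Ah for BANK=0; the helper name `mcp_build_iocon_write` is invented here and is not part of any library. Note that SEQOP=1 means sequential operation is disabled, i.e. Byte mode, so together with BANK=0 this is the "special mode" from the excerpt. As I read the datasheet, the address pins are ignored until HAEN is set, so the first IOCON write would go out with hardware address 0.

```c
#include <assert.h>
#include <stdint.h>

#define MCP_REG_IOCON 0x0Au      /* IOCON address when IOCON.BANK = 0 */
#define IOCON_BANK    (1u << 7)  /* 1 = separate banks, 0 = paired A/B */
#define IOCON_SEQOP   (1u << 5)  /* 1 = sequential operation disabled */
#define IOCON_HAEN    (1u << 3)  /* 1 = hardware address pins enabled */

/* Hypothetical helper: build the 3-byte frame that writes IOCON.
 * Use hw_addr = 0 for the very first write, before HAEN takes effect. */
static void mcp_build_iocon_write(uint8_t buf[3], uint8_t hw_addr,
                                  uint8_t iocon_value)
{
    assert(hw_addr <= 7u);
    buf[0] = (uint8_t)(0x40u | (hw_addr << 1)); /* opcode, R/W bit 0 = write */
    buf[1] = MCP_REG_IOCON;
    buf[2] = iocon_value;
}
```

Calling it with `IOCON_HAEN | IOCON_SEQOP` yields the data byte 28h, which corresponds to BANK=0, HAEN=1, SEQOP=1.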

Anyhow - Hopefully this helps someone else who's wrestling with these IC's in the future!

Update:
Reading doesn't work, it always returns 0xFFFF or 0x0000.
I've also tried manually talking to the MCP's with SPI commands and it seems that HAEN never gets enabled.
However, my setup is a bit peculiar:

All four chips have their own CS line and unique HW address, but share the same SPI channel.
Could this be the issue, even if I init each chip?
Will investigate further.

majenko wrote: If they have a unique CS pin they have no need of a unique hardware address. The hardware address only makes any sense when you have multiple chips on a single CS pin, as in my 32-port IO expander.

Yes, and that was the plan as well. But then I thought while designing the PCB, "I'll put the HW addresses in there, just in case".
I don't know if that messed things up, but to be honest I haven't had the time to investigate further. I'll try to get ultra basic once that is done, could possibly be something else that is interfering with the SPI port/pins.

Just reviving a dead thread by letting people know what finally was the problem.

Power.

I'm not exactly sure why, since the 5V pins show 5V on anything I can measure with, but unless I connect the Chipkit 5V line directly to the 5V circuit, the MCP's won't "boot" properly and ultimately won't accept the configuration sent to them by the MCU. The DMD and the Chipkit itself run without a hitch without the jumper, which is a bit peculiar. Anyhow, the solution was simply to fix the power; after that, any library or function would have worked too. I'm currently not using a library since the interrupt function is very tight (52 µs), but everything runs super smoothly now.

Haven't investigated this in greater detail, but it's quite odd; the power lines show a proper 5V long before (i.e. several seconds before) the MCU sends its first commands. Best guess is that something in the PCB design is acting up, since I had trouble powering the board in the beginning. I was using a MOSFET to break the ground connection with a switch, since I couldn't find a switch powerful enough to break several amperes that would also fit on a PCB. I had to bypass the MOSFET completely to get it running properly...

If anyone's interested, or has any advice on what could be wrong, feel free to check out my blog post about it.

Looking at your PCB layout I can see a couple of things that could be an issue:

+5V for the Switch Matrix and Cabinet IO expanders comes from the DMD power header. It starts on the underside of the board, goes about 1mm, then comes back up to the top side through a tiny via, before heading off to the expanders. The traces there need to be thicker, and you need to pull that via - have the track start and finish on the top side.

The same happens for the ground to those chips. It comes from the underside of the DMD power header then immediately vias up to the top side, then snakes its way around to the chips.

Pretty much all your ground traces around that area are far too small. Really you want a ground pour, not measly little traces like that. I can see similar things in other areas of the board - tiny little ground and power traces. You did a reasonable job with the main distribution traces, but what comes after lets it down somewhat.

Basically: you have too much impedance in your power and ground trace layout and transients cause the power feed to lag.

Normally that wouldn't be that much of an issue, but there's another thing you seem to have completely left out of your design: decoupling capacitors. I don't see a single capacitor that isn't for timing or main power. Every chip should have 0.1uF on every power pin. Every subsection of your board that feeds off the main power bus should have (around) 10uF to feed that section.

These capacitors absorb the transients that your traces can't keep up with.