iPXE booting possibly broken on OS X Sierra Update

Hi @sysadminatelier, @Abuelika, @Seb-B, @dfuriet, @Warget, @Imperilled
I finally got to borrow a MacBookPro8,2 from a friend to test booting it with plain iPXE over the network. And guess what, it just works. I am a bit confused, so I am trying to collect all the information to hopefully get this sorted. These are the specs of the system I have:

The issue may have to do with the network equipment. I use an unmanaged, dumb mini switch. Maybe all of you want to give this a try again.

Although the Sierra version is not the very latest (I have 10.12.5 right now while 10.12.6 update came out 19th of July 2017) it’s still very recent and I doubt that mine is too old to have the bug. Possibly even one of the latest updates fixed the issue. So please upgrade your Mac and see if that helps.

Possibly the NIC in this MacBookPro8,2 is just slightly different to all the others? Maybe, but I kind of doubt this is making the difference.

I’ve been working on this on and off over the last weeks. This is what I just sent to iPXE dev Michael Brown:

Tried looking at the NIC registers as suggested by NiKiZe in the forums a while back. Added ethtool code to dump registers to iPXE but from my tests it looks like many registers are different on every reboot and I have no idea which registers mean what. For now I don’t think this is getting us anywhere. But I am open to suggestions.
Digging through the tg3*.c code I figured that there might be some auto-negotiation issue on the iMac model I have. Curtis did not seem to have that and got that sort of fixed. Now I am seeing the same thing as Curtis: the interface comes up and it tries to send packets to the wire but fails to do so. As far as I know Curtis has been onto this together with you in an IRC chat.
On another track I have tested the Linux code a fair bit. Tested 4.13.4, early 3.9 and the very first commit that added 57766 NIC support to the Linux kernel. I tested those early versions as I hoped to clarify whether the binary firmware blob that was added later plays a role, or whether some later commit in the kernel code fixed an issue with Macs. From my tests it turns out that neither is the case. They all worked like a charm on the iMac that cannot send packets within iPXE.

Hope you’ll find some time to give me some hints on where to dig next.
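One way to get something useful out of register dumps that change on every reboot might be to filter out the noisy registers first. Below is a minimal Python sketch, not something anyone in this thread has actually run: it assumes each dump has been saved as text with one `offset: value` pair per line in hex (a made-up format, the real ethtool or iPXE output would need its own parser). It keeps only registers that hold the same value in every dump of a group, for example several boots under Linux versus several boots under iPXE on the same iMac, and then reports the ones that differ between the two groups.

```python
import re

# Assumed (hypothetical) dump format: one "0x0400: 0x00e04808" pair per line.
LINE_RE = re.compile(r"^\s*(0x[0-9a-fA-F]+)\s*:\s*(0x[0-9a-fA-F]+)\s*$")


def parse_dump(text):
    """Parse one register dump into {offset: value}, skipping other lines."""
    regs = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            regs[int(m.group(1), 16)] = int(m.group(2), 16)
    return regs


def stable_registers(dumps):
    """Return {offset: value} for registers identical in every dump."""
    if not dumps:
        return {}
    common = set(dumps[0])
    for d in dumps[1:]:
        common &= set(d)
    first = dumps[0]
    return {
        off: first[off]
        for off in common
        if all(d[off] == first[off] for d in dumps)
    }


def stable_differences(group_a, group_b):
    """Registers stable within each group but different between groups."""
    a = stable_registers(group_a)
    b = stable_registers(group_b)
    return {
        off: (a[off], b[off])
        for off in sorted(a.keys() & b.keys())
        if a[off] != b[off]
    }
```

Feeding it a handful of dumps per group should shrink the list to registers that are consistently set differently, which is a much shorter list to look up in the tg3 headers. It says nothing about what those registers mean, of course.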

@Seb-B The same goes for you: if you have ideas or suggestions on how to diff the tg3 code of iPXE and the Linux kernel to figure out what’s up, please let me know. I have started to, but there seem to be too many minor differences, making it really hard to diff. Though I have the feeling that we should be able to solve this soon, as the Linux kernel is working great.
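To cut down on the minor differences, one option could be to normalize both sources before diffing. The following is only a rough Python sketch using the standard `difflib` module: it strips C comments, collapses whitespace and drops blank lines, then produces a unified diff without context lines. The labels `ipxe` and `linux` are just placeholders, and the comment stripping is naive (it would also cut a `//` inside a string literal), so treat the output as a hint on where to look rather than an exact diff.

```python
import difflib
import re


def normalize(source):
    """Strip C comments, collapse whitespace and drop empty lines.

    Naive on purpose: comment markers inside string literals are not
    handled, which is assumed to be rare enough in driver code.
    """
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.S)  # block comments
    lines = []
    for line in source.splitlines():
        line = re.sub(r"//.*$", "", line)   # line comments
        line = " ".join(line.split())       # collapse tabs and spaces
        if line:
            lines.append(line)
    return lines


def noise_reduced_diff(a_text, b_text, a_name="ipxe", b_name="linux"):
    """Unified diff of the normalized sources, without context lines."""
    return list(
        difflib.unified_diff(
            normalize(a_text),
            normalize(b_text),
            fromfile=a_name,
            tofile=b_name,
            lineterm="",
            n=0,
        )
    )
```

Running this per function (for example on the reset or PHY setup code cut out of each tree) rather than on whole files will probably give more readable results, since the two code bases are laid out differently.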

@P-Schnoeckel Thanks for testing! Would have been great if Apple had just fixed this for us. But they don’t. @fractal13 even got in contact with Apple but they refuse to give us details on what has changed in the UEFI firmware. So I think we need to figure this out ourselves…

So, we did try to revert a machine to El Capitan, but while the OS was no problem, the firmware didn’t downgrade along the way.
We looked for a way to force the firmware downgrade but couldn’t find any so far…

I’m not sure about the firmware either. It would seem strange to me if it were reverted at the same time as the OS but, as I said, we’ll try just to make sure.

“Make a FOG backup of a machine that is known to have this PXE boot issue”
Well, I’d certainly love to be able to do that, wouldn’t I ;)

Seriously though, the clients are in use in another building and have quite a load of heavy applications on them, so I would have to wait for one to be unoccupied for at least two days (one for the test, one for reverting to a usable state). I’ll check the occupation planning of these rooms and try to get this done as soon as I can.

@Seb-B Thanks for posting the link to the iPXE forum again. I hadn’t noticed that there were new posts on that topic. Though I don’t think there is much new beyond what we already know. But still it’s good to know that another person is onto that. Seems like Curtis definitely knows the bits. I will try to contact him and see what we can do.

… it has to do with the firmware installed with Sierra (which will not be reverted if we switch back to El Capitan) so I don’t have much hope for that …

Are you sure about the firmware not being reverted? I kind of hoped (not knowing enough about Macs) that installing El Capitan would also install the old firmware version but possibly I am wrong. I still think it’s worth trying just so we know. In case this is irreversible I’d find this even worse.

It’s going to take us some time to try and revert to an El Capitan OS…

Couldn’t you just test this on one single client? Make a FOG backup of a machine that is known to have this PXE boot issue, pop in the El Capitan DVD and do a fresh clean install (removing all the Sierra stuff before)? That shouldn’t take very long, right?

We just tried resetting the PRAM on the iMac; sadly, it didn’t change anything.

It’s going to take us some time to try and revert to an El Capitan OS, and I think it was established in the iPXE forum that it has to do with the firmware installed with Sierra (which will not be reverted if we switch back to El Capitan), so I don’t have much hope for that. I’ll still try as soon as I can though, and keep you informed.

@Seb-B Thanks heaps for getting back to me on this one. As you see I have worked with a MacBookPro8,2 which has a very similar NIC (BCM57765 instead of your BCM57766) and this one does not seem to cause the issue. So I still hope that we can figure this one out and get it fixed.

According to your other post using snp.efi didn’t work either. The chief iPXE developer asked me to give this a try and see what we get. Let’s see what he thinks of your video.

What I have been wondering for a while is: We have people reporting that machines PXE booted properly before upgrading to Sierra and it stopped working right after the upgrade. So I am really wondering if installing an earlier release or maybe resetting NVRAM would make this issue go away again. Would this be possible for you to test?

EDIT: I just had an even better idea. Do you still have an image with a pre-Sierra Mac OS X installed? It would be a great test to deploy this, boot once into Mac OS X and then see if PXE booting works again after that.

In our case, we already tried going through an unmanaged switch when it was thought to be a spanning tree error, but I can’t remember if it was a 100 Mbps or a 1 Gbps one. I’ll give it another try with a 100 Mbps switch ASAP just to be sure.

Found something interesting here. For this first test (all working great) I used an old 100 MBit/s switch as I don’t have much equipment here at hand. Just for the fun of it I tried connecting the MacBook straight to my PC (where FOG is running in a VM) to get a full Gigabit connection. Booting into iPXE again, it takes a lot longer waiting for the link to come up and sometimes even fails. Possibly this is just caused by my crippled setup here but I think you guys should all give it a try. Connect your Mac to the LAN using a dumb 100 MBit/s mini switch! Sure, this is not a solution to the problem, but we might figure out where this is gonna take us.