Topic: Diskless MDs not PXE booting

purps, you are a God amongst men: the driver was the f*&king culprit!! I installed the r8168 module and boom, the Jetway Ecomini pulled in an IP and started the PXE boot. Thanks man! I would not have considered the Realtek module in 0810 as being the source of the issue - at least not for a while.

So, back to the question of what's in the MD. Guess what? It's a Realtek PCIe Gb NIC. Sweet!!

I rebuilt the diskless image after installing the r8168 module, but the MD still kernel panics. From memory, the error is something about an eth0 file not being present.

What's my next step? I'm going to search the forums and wiki in the meantime.

Note: my switch is a Dell PowerConnect Gb 8port...probably 6 years old now.

Don't worry, you should be able to get the MD working. Read my user page, check out the Living Room MD, it will be a similar process for you. The various instructions mentioned here are all based on the wiki pages that Tim mentioned.

To give you a nudge in the right direction, you need to manually place a copy of the r8168 driver that you installed on your core amongst the MD gubbins also; at the moment, it is literally just on the core, and the MD can't use it.

I tried out what Murcel suggested, and it did indeed get the boot even further along. The problem now appears to be the MD pausing indefinitely after printing out something about "we've announced ourselves to the router" - I can't remember the exact error. I let the MD sit like that for 45 minutes and saw no change. So, I rebooted the core for shits and giggles, and powered the MD back up after the core reboot was complete. Unfortunately, the original error (Error: cannot connect to router: rebooting in 5 seconds.) came back.

I'm assuming I can get past this error if I run the startup-script.sh script again.

Questions:

1. Why do I have to keep running the startup-script.sh script?
2. I don't see any "diskless" script running on the core when the MD gets to the "announced ourselves to the router" message. How do I fix this?
3. Why is my install so broken? Did one bad NIC driver really introduce this many problems?

I did what was in those links. I'd already installed the r8168 driver, so I did the other stuff:

- included the r8168 module in /etc/initramfs-tools-interactor/modules
- recreated the diskless image (the way it's described on the 0810 install page)
- I don't have a /usr/pluto/diskless dir, so I searched around until I found the post about running "startup_script.sh" to get past the MD boot error.
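A quick way to double-check the first item in that list is to grep the modules file for the driver name. This is only a sketch: has_module is a made-up helper name, and the only thing taken from this thread is the file path.

```shell
#!/bin/sh
# Sketch only: has_module is a hypothetical helper, not part of LinuxMCE.
# It succeeds if the given module name is listed on a non-comment line
# of the modules file (path from this thread is the default).
has_module() {
    module="$1"
    modules_file="${2:-/etc/initramfs-tools-interactor/modules}"
    # Match the module name as the first word on a line; lines that
    # start with '#' do not match.
    grep -Eq "^[[:space:]]*${module}([[:space:]]|\$)" "$modules_file"
}
```

Usage would be something like: has_module r8168 && echo "r8168 is listed"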

So that's where I am.

BTW, how can I tell if the diskless image is actually being created? I looked on the core and can't find any running process that would indicate the MD's image is being created.

One other thing: although I can't ping or ssh to my core from the external network, I can definitely ping the external network from the core. I didn't look at the iptables rules - maybe they're messed up?

modprobe r8168
depmod -a
/usr/pluto/bin/Diskless_BuildDefaultImage.sh

The diskless image thing is already created, so I don't think that running the script from the 810 install page will do very much. The script above would be more appropriate. The depmod command should be run because you have changed some modules (that's my understanding anyway).
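To confirm the module really is loaded on the core after those commands, you can look for it in /proc/modules (the same information lsmod prints). A minimal sketch; module_loaded is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch only: module_loaded is a hypothetical helper.
# /proc/modules lists one loaded module per line, name first.
module_loaded() {
    module="$1"
    proc_file="${2:-/proc/modules}"
    # The module name is the first space-separated field on the line.
    grep -q "^${module} " "$proc_file"
}
```

For example: module_loaded r8168 && echo "r8168 loaded". Running it for r8169 as well shows whether the old driver is still loaded alongside it.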

Try another reboot once you've done that, we want a directory to appear in /usr/pluto/diskless, as I am sure you are aware.

The script Diskless_Setup.sh creates the image. If that is running, the Diskless MD filesystems are being created. If you don't see it, it's not happening. It creates the /usr/pluto/diskless directory and MD subdirectory.
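Put as commands, that check could look roughly like the sketch below. diskless_status is a hypothetical helper name; the script name and directory come from this thread, and pgrep is assumed to be installed on the core.

```shell
#!/bin/sh
# Sketch only: diskless_status is a hypothetical helper.
# It reports whether Diskless_Setup.sh is running and what exists
# under the diskless directory so far.
diskless_status() {
    diskless_dir="${1:-/usr/pluto/diskless}"
    # pgrep -f matches against the full command line of each process.
    if pgrep -f Diskless_Setup.sh >/dev/null 2>&1; then
        echo "Diskless_Setup.sh is running"
    else
        echo "Diskless_Setup.sh is not running"
    fi
    if [ -d "$diskless_dir" ]; then
        echo "MD directories under $diskless_dir:"
        ls -1 "$diskless_dir"
    else
        echo "$diskless_dir does not exist yet"
    fi
}
```

Run diskless_status on the core while the MD sits at its waiting screen; if nothing is running and the directory is missing, the image is not being built.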

If this is not the case for you, here's how it all works:

1. New MD PXE boots default boot image.
2. Default boot image connects to the Core and tells it to create a new MD device.
3. Default boot image displays "announced ourselves to the router" and waits for messages from the Core.
4. Core creates a MD device (check your device tree)
5. Core allocates IP address to new MD, tells new MD about it (you get "Allocated permanent IP" message on MD).
6. Core runs Diskless_Setup.sh, tells new MD about it (you get "Running Diskless_Setup.sh" message on MD).
7. When Diskless_Setup.sh finishes, Core tells MD about it. If it fails, the MD will display "Diskless_Setup.sh failed" message. If it succeeds, you'll get a success message and the Core will also tell the MD to reboot.
8. MD reboots into its new filesystem.
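As a rough aid for working out which step an MD is stuck at, the messages in that list can be mapped back to step numbers. This is only a sketch: md_stage is a hypothetical helper, and the message strings are the ones quoted in this thread, so the exact wording on a real MD may differ.

```shell
#!/bin/sh
# Sketch only: md_stage is a hypothetical helper. It maps the last
# message seen on the MD screen to a step in the sequence described
# in this thread. Message wording is taken from the thread and may
# not match a real MD exactly.
md_stage() {
    case "$1" in
        *"announced ourselves to the router"*)
            echo "step 3: MD is waiting for messages from the Core" ;;
        *"Allocated permanent IP"*)
            echo "step 5: Core has created the device and allocated an IP" ;;
        *"Diskless_Setup.sh failed"*)
            echo "step 7: Diskless_Setup.sh finished with an error" ;;
        *"Running Diskless_Setup.sh"*)
            echo "step 6: Core is building the MD filesystem" ;;
        *)
            echo "unknown: message not in the list" ;;
    esac
}
```

For instance, md_stage "announced ourselves to the router" reports step 3, which means the next thing to check is whether the MD device has shown up in the device tree (step 4).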

At no point should Diskless_Setup.sh die without the MD getting a message (error or success).

If you don't have the MD device in your tree after the router announcement, you have a different problem. If you do have the device in the tree, and MD says Diskless_Setup is running, but you don't see Diskless_Setup on the core, run /usr/pluto/bin/Diskless_Setup.sh yourself on the Core and see what's happening.
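If you do run it by hand, it helps to keep the output and the exit status so you can post them here. A minimal sketch, assuming the script takes no arguments and is run as root; run_diskless_setup and the log path under /tmp are made up for illustration:

```shell
#!/bin/sh
# Sketch only: run_diskless_setup is a hypothetical wrapper. It runs
# the setup script, saves stdout and stderr to a log file, and reports
# the exit status. The default log path is an arbitrary choice.
run_diskless_setup() {
    script="${1:-/usr/pluto/bin/Diskless_Setup.sh}"
    log="${2:-/tmp/Diskless_Setup.manual.log}"
    if [ ! -x "$script" ]; then
        echo "not executable: $script" >&2
        return 127
    fi
    "$script" >"$log" 2>&1
    status=$?
    echo "exit status: $status (output saved to $log)"
    return "$status"
}
```

A non-zero exit status together with the last few lines of the log is usually the most useful thing to paste into a reply.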

The first two steps were done by the install script for the r8168 module. I ran the script in the final step, rebooted - no change. The MD just sat at the same screen, and there was no diskless image being created on the core.

I'm going to reinstall LinuxMCE. Before I run the final install script from the desktop, I'm going to install the r8168 module. If things still don't work, I'll try a new DVD snapshot. If shit still fails, I'll dance on the computer.

FAIL! Reinstall of LinuxMCE and install of r8168 module right from the get go did not fix anything:

1. MD PXE boot dies after saying it can't contact the router. The ONLY way to get past this step is to run "startup_script.sh" after EVERY SINGLE CORE REBOOT.
2. The MD's diskless image never runs. The MD might say it's announced itself to the router, but the core doesn't actually do anything.

There is some seriously broken shit in the LinuxMCE snapshot I'm using.

1. Ran install
2. Appeared to complete successfully, so rebooted.
3. Logged into Kubuntu desktop
4. Tried to stop network services. Got an error about an unknown device, even though both NICs were up. eth0 had 192.168.80.1 IP, and eth1 had IP from my external DHCP server.
5. I unloaded the eth0 module, r8169.
6. Installed r8168 module. After install script finished, eth0 was back up with old IP.
7. I added the r8168 module to /etc/initramfs-tools-interactor/modules
8. Created the default diskless image
9. Ran the final install script by double clicking the icon on the desktop
10. Rebooted after the install finished.
11. Last step of install ran and completed after reboot
12. Powered up the Jetway Ecomini MD
13. After it got an IP, it began the PXE boot.
14. After eth0 apparently came up, the Jetway reported it couldn't connect to the router, so it rebooted.
15. 60 minutes later, it's still rebooting because it can't find the router.

WTF.

Began the PXE boot in what way? Does it load the "default" PXE config file, then the kernel, then the initrd.img file, or does it not get that far? If it doesn't get that far, check syslog on the Core and tell us what it says.
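One way to pull the relevant lines out of syslog is sketched below. It assumes the log is /var/log/syslog and that the DHCP and TFTP daemons tag their lines with "dhcpd" and "tftp"; both are assumptions about the Core's setup, and pxe_log_lines is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: pxe_log_lines is a hypothetical helper. It prints the
# most recent DHCP and TFTP related lines from a syslog file.
# The log path and the "dhcpd"/"tftp" tags are assumptions.
pxe_log_lines() {
    logfile="${1:-/var/log/syslog}"
    count="${2:-40}"
    # Case-insensitive match on either daemon name, newest lines last.
    grep -iE 'dhcpd|tftp' "$logfile" | tail -n "$count"
}
```

Run pxe_log_lines right after the MD attempts to boot. If there are DHCP lines but no TFTP requests for the config file, kernel, or initrd, the MD is not getting as far as loading them.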