But things didn’t work well. A plain bond device worked correctly in all my tests, but when I had a bridge running over it I had problems every time I tried pulling cables. My test for a bond is to boot the machine with a cable in eth0, then when it’s running switch the cable to eth1. This means there are a few seconds of no connectivity and then the other port becomes connected. In an ideal situation at least one port would work at all times, but redundancy features such as bonding are not for an ideal situation! When doing the cable switching test I found that the bond device would often get into a state where every two seconds (the configured ARP ping time for the bond) it would change its mind about the link status and have the link down half the time (according to the logs; according to ping results it was down all the time). This made the network unusable.

Now I have decided that Xen is more important than bonding so I’ll deploy the machine without bonding.

One thing I am considering for next time I try this is to use bridging instead of bonding. The bridge layer will handle multiple Ethernet devices, and if they are both connected to the same switch then the Spanning Tree Protocol (STP) is designed to work in this way and should handle it. So instead of having a bond of eth0 and eth1 and running a bridge over that I would just bridge eth0, eth1, and the Xen interfaces.
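
As a rough, untested sketch of that idea, the snippet below builds the list of commands that would create such a bridge with STP enabled. It assumes the brctl tool from bridge-utils and the ip tool from iproute2 are installed. The bridge name xenbr0 and the helper name bridge_commands are made up for illustration, and the Xen interfaces are left out because the Xen scripts normally add them. It only prints the commands; it does not run them.

```python
def bridge_commands(bridge, ports, stp=True):
    """Return the commands that would place `ports` in `bridge`.

    Nothing is executed here; each command is a list of arguments
    that could be reviewed and then run by hand or via subprocess.
    """
    cmds = [
        ["brctl", "addbr", bridge],                        # create the bridge
        ["brctl", "stp", bridge, "on" if stp else "off"],  # toggle STP
    ]
    for port in ports:
        cmds.append(["ip", "link", "set", port, "up"])     # bring the port up
        cmds.append(["brctl", "addif", bridge, port])      # attach it to the bridge
    cmds.append(["ip", "link", "set", bridge, "up"])       # bring the bridge up
    return cmds


if __name__ == "__main__":
    # xenbr0 is an assumed name; eth0 and eth1 are the two physical ports.
    for cmd in bridge_commands("xenbr0", ["eth0", "eth1"]):
        print(" ".join(cmd))
```

Whether STP copes with the cable switching test any better than the bond did would still need to be tested on real hardware.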

8 comments to Ethernet Bonding and a Xen Bridge

I’ve got bonding working with Xen on Red Hat 5. When I initially set up the bonding, I thought I would have to explicitly use the bond0 device in the Xen config, but Red Hat support told me otherwise. If I didn’t touch the Xen config, it just worked, with or without bonding.

Troy: Interesting. Red Hat do some unusual things with their bridging (renaming interfaces to give the same names). I’m not sure whether that impacts the results or whether something else is happening.

I’ll have to do another test of both Red Hat and Debian bonding/bridging on the same hardware.

I just posted a blog about this today, getting it working under SLES.
The bond device will fail every 2 seconds if you don’t have an IP configured on it (as you don’t in the above config), because it then has no source address for the ARP probe test.
The scripts for Xen work for bonding if you tell them your netdev is bond0, not eth0.

http://www.santaba.com/blog/archives/3
The way the scripts set it up (once you tell them to put your bond device in the bridge) is that bond0 retains its IP (unlike non bonded Xen installs, where the peth0 IP is moved to eth0, which is a netloopback device for vif0.0), the bridge does not get an IP, and the bond device is a member of the bridge.

With the bond device failing every 2 seconds, I was saying the way around that is to configure another IP on the bond interface (so it has a source for the ARP probes), but that’s not a good solution anyway.

Sorry, I mis-stated in comment 5. Confusing myself with some experiments.
With the xen scripts, both the xen bridge and the bond interface get the same IP, but there is no equivalent of peth or vif for the bond.

I’ve got bonding working on CentOS 5, but I still have some issues that I believe are related to Xen.
The problem is, I don’t know where I can put those commands in a Red Hat-like distro.
/etc/sysconfig/network-scripts/ifcfg-bond0 doesn’t seem like the right place, and /etc/sysconfig/networking/(devices,profiles) seems to be just a copy.