When running FreeBSD under Xen as a DomU guest, a PVHVM-based FreeBSD machine cannot route traffic for any other PV-based DomU guests on the same Xen Dom0.
Fix:
To work around the problem, do one of the following:
- Replace the DomU router machine with a Linux guest (not ideal!)
- Drop the DomU router machine into HVM mode (i.e. xn0 etc. are replaced by rl0 et al.)
- Drop the other DomU guests from PV/PVHVM mode down to HVM mode (this also appears to fix the problem!)
- Move the DomU router machine to a different XenServer, even if it's in the same pool (problem only happens if the DomU router machine, and the DomU guest trying to use it as a gateway are on the same physical Xen Dom0 host).
None of these solutions is ideal: this basically precludes you from running a 'gateway' machine on XenServer unless it is either sited on its own pool, or run inefficiently (i.e. in HVM mode only), which in turn makes it non-agile.
How-To-Repeat: Install XenServer 6.2.
Install FreeBSD 9.2 / 10.0 as a DomU guest in PVHVM mode (so you end up with a NIC called 'xn0' etc.).
Set this first machine up with (for example) 'gateway_enable="YES"' etc. and configure it to route or NAT traffic to the Internet.
Install another DomU guest (e.g. FreeBSD again, or Windows) on the same XenServer.
Make the default gateway of the 2nd DomU the IP of the first DomU.
Even though the first DomU machine can fetch data and route traffic to/from the Internet, the second DomU machine cannot use it as a gateway. Pings will work, and TCP sessions will initially 'connect', but they cannot exchange any traffic.
If you replace the 'router' DomU machine with, say, a Linux box (or Windows box), it works as expected. Only FreeBSD in PVHVM mode does not work as the gateway.
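For reference, a minimal NAT-gateway setup on the 'router' DomU might look like this. This is only a sketch: the interface roles (xn0 external via DHCP, xn1 internal) and the 192.168.1.0/24 internal subnet are assumptions, not taken from the report.

```shell
# /etc/rc.conf on the router DomU
gateway_enable="YES"    # sets net.inet.ip.forwarding=1 at boot
pf_enable="YES"
ifconfig_xn0="DHCP"                                    # external, Internet-facing NIC
ifconfig_xn1="inet 192.168.1.1 netmask 255.255.255.0"  # internal NIC

# /etc/pf.conf - NAT internal traffic out through the external NIC
ext_if = "xn0"
int_if = "xn1"
nat on $ext_if inet from $int_if:network to any -> ($ext_if)
pass all
```

Client DomUs would then point their default route at 192.168.1.1.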

Having set up a test system with FreeBSD 9.2-STABLE, 10-STABLE, 11-CURRENT etc., I can confirm this bug still exists on all of those, regardless of version.
For a 'Client' (i.e. a guest VM trying to route traffic through the other FreeBSD 'router' machine) you can do:
ifconfig xn0 -txcsum
This will fix that single client. No amount of option fiddling (other than restarting in HVM mode) will fix the 'router' machine - i.e. it's not possible to fix the 'router' machine so that clients don't need any changes.
I've been unable to test disabling txcsum on Windows clients running on the same XenServer as I can't see where I can do that.
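To make that client-side '-txcsum' fix survive reboots, it can be folded into the interface line in rc.conf. A sketch, with placeholder addresses (the 192.168.1.x values are assumptions):

```shell
# /etc/rc.conf on a FreeBSD client DomU
# disable TX checksum offload on the PV NIC so traffic can pass the gateway
ifconfig_xn0="inet 192.168.1.10 netmask 255.255.255.0 -txcsum"
defaultrouter="192.168.1.1"   # the FreeBSD 'router' DomU
```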

I just re-tested this with:
- XenServer 6.5
- FreeBSD 10.1 amd64
Installing FreeBSD in PVHVM mode (i.e. with an 'xn0' NIC etc.), the problem still exists (in case anyone else runs into it) - at least a couple of other people have run into this issue when setting up VMs for routing etc.

I use an HVM-based VM in the RootBSD cloud. Recently, on -CURRENT, I had to disable rxcsum and txcsum on my VM interfaces to make it "happy" with PF:
ifconfig_xn0="inet XXX.XXX.XXX.XXX netmask 0xfffffffc -rxcsum -txcsum"
Maybe try that?

Issues with checksums on XENHVM kernels and the ability to route traffic between XENHVM guests are separate. RootBSD appears to use Cisco switches - at least if the MAC address of the gateway for my RootBSD guest is to be believed. You wouldn't run the gateway for an entire cloud infrastructure off a FreeBSD VM regardless.
For the record, I've been using the OSS Xen releases for years, and have never been able to get a PVM (XENHVM) DomU to be functional as a gateway - I've either had to use HVM or set up a separate box as the router. This has been the case since at least FreeBSD 8 I think, or whenever XENHVM became an option.

(In reply to Sean Bruno from comment #4)
Following Sean's idea, I was playing with the PV network frontend options, and got one FreeBSD 10.1-RELEASE guest to have its traffic routed by another FreeBSD 10.1-RELEASE guest, both within the same XenServer 6.5 host and both with xn1 on the same host VLAN.
router0# ifconfig xn1 -txcsum -tso4 -lro
vm0# ifconfig xn1 -txcsum -tso4 -lro
Without this configuration on both domUs, I cannot do a:
# fetch http://www.google.com/
(Note: on success, the result will be stored in fetch.out; when it fails, the command hangs and needs two Ctrl-C presses to stop.)
But with the indicated configuration, the above command works like a charm. I could even SSH into vm0 from the Internet.
Best regards,
Raimundo Santos
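For persistence across reboots, the flags shown above can go into rc.conf on both domUs. A sketch, with placeholder addresses (the 10.0.0.x subnet is an assumption):

```shell
# /etc/rc.conf on router0 - disable the offending offloads on xn1
ifconfig_xn1="inet 10.0.0.1 netmask 255.255.255.0 -txcsum -tso4 -lro"

# /etc/rc.conf on vm0 - same flags, its own address, router0 as gateway
ifconfig_xn1="inet 10.0.0.2 netmask 255.255.255.0 -txcsum -tso4 -lro"
defaultrouter="10.0.0.1"
```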

(In reply to raitech from comment #6)
Hi,
I've tested those options here - and they do work *for FreeBSD* boxes.
However, if you set '-txcsum -tso4 -lro' on the FreeBSD box acting as a router, Windows machines still cannot pass traffic through it as a router :( (I haven't tested Linux, but I'd guess from past experience a Linux PV instance will behave the same.)
So whilst this is a workaround (of sorts) for FreeBSD boxes using another FreeBSD box as a router, it's not usable in mixed-platform environments.
There's obviously still some weird interaction going on with PV-to-PV network traffic involving FreeBSD when it's 'routing' things.
-Karl

Hello,
I would really like to reproduce this, but sadly my FreeBSD network knowledge is very limited, so please bear with me. When you say:
"Set this first machine up with (for example) 'gateway_enable="YES"' etc. and configure it to route or NAT traffic to the Internet."
Can you please provide examples of how to route or NAT traffic to the Internet? A very simple (reduced) use case that can be used to reproduce this issue would help me a lot.
Thanks, Roger.

(In reply to Sydney Meyer from comment #9)
Just as a side note, I haven't been able to reproduce this using a FreeBSD Dom0; I will now try with a Linux Dom0 (I guess Linux is more picky about checksums?).

Hello,
I've recently committed a bunch of netfront fixes that I think should help solve this issue. At the moment, the only reliable way to do packet forwarding on a FreeBSD DomU is to disable all the hardware offload features on both NICs (rxcsum, txcsum, tso and lro). Could someone give it a try?
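Assuming the two PV NICs are xn0 and xn1 (an assumption; adjust the names to match the actual interfaces), disabling all four offload features amounts to:

```shell
# run as root on the forwarding DomU; repeat for each PV NIC
# -tso disables both tso4 and tso6
ifconfig xn0 -rxcsum -txcsum -tso -lro
ifconfig xn1 -rxcsum -txcsum -tso -lro
```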
I'm also working on making forwarding work _without_ having to disable all those features, so that we can get optimal performance, however those patches have not yet been reviewed:
https://reviews.freebsd.org/D6656
https://reviews.freebsd.org/D6612
Roger.

(In reply to Sydney Meyer from comment #15)
Hello,
To clarify this, the current code in HEAD should work when doing packet forwarding if rxcsum, txcsum, tso and lro are disabled.
Then, the 3 patches that you mention should allow packet forwarding to work _without_ disabling those features, and yes, you need all 3.
Roger.

(In reply to kpielorz from comment #20)
Yes, I think I know what the issue is. What OS are the other DomUs on the same host using?
If you can provide me with complete tcpdump traces on both interfaces (xn0/xn1), that would help me quite a lot. The following rune should get you the traces:
# tcpdump -n -i <if> -s0 -w <output>.pcap
The resulting pcap files are going to be quite big, so you will probably have to upload them somewhere. Just 10 seconds of capture while trying to route traffic is probably fine.
Thanks, Roger.

FWIW, I have applied the three patches to r301515M, and on a Dom0 running Xen 4.4.1 with Linux 4.5.1, and on a Xen 4.5.3 / NetBSD 7.0.1 host, I was able to ping, connect via ssh, scp a file, and nc some data between two FreeBSD VMs connected through a third VM, all running this revision, on the same Linux and NetBSD Dom0 hosts respectively.

(In reply to kpielorz from comment #28)
Can you provide a little more info about your router configuration? In my tests I've only enabled net.inet.ip.forwarding=1 in the VM acting as a router and configured different subnets in two interfaces, and exchanged packets between them. Are you doing NAT/Filtering/...?
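My understanding of that minimal test setup, sketched out (the 10.0.1.0/24 and 10.0.2.0/24 subnets are assumptions):

```shell
# on the forwarding VM: two NICs on different subnets, no NAT or filtering
sysctl net.inet.ip.forwarding=1
ifconfig xn0 inet 10.0.1.1/24
ifconfig xn1 inet 10.0.2.1/24
# then, from a VM on 10.0.1.0/24 using 10.0.1.1 as its gateway,
# exchange packets with a VM on 10.0.2.0/24 using 10.0.2.1 as its gateway
```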

(In reply to kpielorz from comment #30)
Oh right, this is kind of different from my test setup, it could explain why it works in my case but not in yours. Do you see any checksum errors? 'netstat -s -f inet' should tell you if there are any checksum errors, at least from a FreeBSD point of view.
It would also be interesting to do the same on the Dom0 itself, but I'm not aware of the rune to obtain that information from Linux.
Roger.

batch change:
For bugs that match the following
- Status Is In progress
AND
- Untouched since 2018-01-01.
AND
- Affects Base System OR Documentation
DO:
Reset to open status.
Note:
I did a quick pass but if you are getting this email it might be worthwhile to double check to see if this bug ought to be closed.

(In reply to Eitan Adler from comment #33)
Hi - this issue still exists; I've just re-tested on 10.4 and 11.1. I'm not able to test 12.x at the moment, but I have no reason to believe it's been fixed in -CURRENT.
It affects anything working with 'low-level' packets - so NAT, OpenVPN, DHCP et al. E.g. with OpenVPN, it seems packet coalescing 'behind the scenes' ends up presenting packets way over 1500 bytes to OpenVPN - which it point-blank refuses to handle.
The workaround we're using here is to set 'hw.xen.disable_pv_nics=1' in /boot/loader.conf on FreeBSD, together with a small mod to 'qemu-dm-wrapper' on the XenServer and a custom field added to affected VMs in XenCenter that the wrapper 'keys off'. This turns xn0 into vtnet0 for these hosts - they do work with the above applications, are still live-migratable, and appear to perform better than re0 NICs.
-Karl

(In reply to karl from comment #36)
Yes, I assumed so. I'm currently quite busy, so I don't think I will have time to look into this ATM.
One thing I remember about reproducing this issue is that it takes a non-trivial amount of time to set up a reproduction (last time I tried I had to set up a forwarding VM).
Could you perhaps document the fastest way to reproduce it, ideally the one that involves the least setup? That would be helpful (for me at least) and maybe for others who look into the issue.
Thanks, Roger.