Bad UDP checksum on traceroute from a box running NAT

I've been trying to troubleshoot a very odd traceroute/UDP issue that I can't figure out.

I have two gateway/firewall boxes running NAT for internal hosts on two different networks (office1 and office2).
One is running 2.6.22-gentoo-r9; let's call this one gw1.
The other is running 2.6.20-gentoo-r8; let's call this one gw2.

There are two other systems here:
a system inside either of these two LANs, which we'll call lan_host1
a system out on the internet, not behind NAT, which we'll call outside1

From lan_host1, which is behind either gw1 or gw2, I can traceroute out fine. I can hit gw1 or gw2 from inside the opposite network, and I can hit any other host on the internet (including outside1).

From outside1 I can traceroute to both gw1 and gw2 no problem.

Whenever I try to traceroute from either gw1 or gw2 to any host on the internet, including each other and/or outside1, it eventually times out and fails. Running tcpdump on both hosts at the same time while doing the traceroute shows the same UDP packets at both ends, and I know that both hosts respond to traceroute otherwise. What is interesting is what tcpdump shows ( on both hosts ) of the packets in question:

With every traceroute attempt to the destination host, the INVALID counters on the INPUT chain increase. So I tried removing that rule from INPUT and not DROPping any of my INVALID traffic, but still no reply goes back to the original host. I am guessing that the kernel still drops the packet because the UDP checksum is incorrect. tcpdump shows zero replies to these traceroute attempts. I have even tried to traceroute to other networking gear, including my ISP's upstream Cisco cores, and they too drop these UDP packets (I have not contacted them to confirm or ask their thoughts; I just know traceroute fails).
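
For reference, here is a minimal Python sketch of how a UDP checksum over IPv4 is computed and checked. It is only an illustration: the addresses and ports are made-up documentation values, not taken from my boxes, and the function names are my own. The point it shows is that the checksum covers a pseudo-header containing the source and destination IP, so a packet whose source address is rewritten without the checksum being updated will fail verification. Whether that is what is happening here is not established.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit one's complement sum of data (padded to an even length)."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    # Fold any carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """IPv4 pseudo-header that is included in the UDP checksum."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, socket.IPPROTO_UDP, udp_length))


def build_udp(src_ip, dst_ip, sport, dport, payload: bytes) -> bytes:
    """Build a UDP segment (header + payload) with a valid checksum."""
    length = 8 + len(payload)
    header = struct.pack("!HHHH", sport, dport, length, 0)
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, length)
                                + header + payload)
    checksum = (~total & 0xFFFF) or 0xFFFF  # 0 is sent as 0xFFFF
    return struct.pack("!HHHH", sport, dport, length, checksum) + payload


def udp_checksum_ok(src_ip, dst_ip, segment: bytes) -> bool:
    """A segment is valid if everything, checksum included, sums to 0xFFFF."""
    total = ones_complement_sum(pseudo_header(src_ip, dst_ip, len(segment))
                                + segment)
    return total == 0xFFFF


# Hypothetical addresses from the documentation ranges.
segment = build_udp("192.0.2.10", "198.51.100.7", 40000, 33434, b"probe")
print(udp_checksum_ok("192.0.2.10", "198.51.100.7", segment))   # original source
print(udp_checksum_ok("203.0.113.1", "198.51.100.7", segment))  # rewritten source
```

The second check uses a different source address with the same segment bytes, which is what a receiver would see if the address changed in flight but the checksum field did not.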

Things I can rule out:
1. Non-NAT-related iptables rules. outside1 runs almost exactly the same rule set, minus the NAT-related statements and forwarding.
2. ISPs blocking traceroute UDP packets. I can clearly traceroute from other hosts, or from inside those networks to the opposite one.
3. Hardware. All my servers use Intel cards; if it were hardware, I believe I'd see a lot more issues with UDP packets, etc.

Oh, and a traceroute -I (for ICMP) works fine.

Googling this issue in every way that I can think of returns nothing. Both boxes, with different kernels, are creating the same packets with bad checksums. It could be something with my NAT setup, but I don't do anything that crazy with it.

Thoughts? Any help, greatly greatly appreciated. I'd also be happy to provide more info if I can.

Remove the "INVALID" rule on OUTPUT and set up the inbound rules (in INPUT) like so:

33 6737 DROP all -- br0 any anywhere anywhere state INVALID,NEW
73908 51M ACCEPT all -- any any anywhere anywhere state RELATED,ESTABLISHED

Do you really want to drop all INVALID outbound packets? When you do a traceroute (in Linux/Unix) you're not actually making a valid connection, but rather using a little trick to generate ICMP errors (time-exceeded from each hop along the way, and port-unreachable from the destination), which traceroute uses to tell how far away the hosts are. Your iptables rules are breaking this little trick...
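
To make the trick concrete, here is a rough Python sketch of the logic, not a working traceroute. It assumes the classic defaults (UDP probes starting at destination port 33434, three probes per hop, up to 30 hops); the function names are made up for illustration. Probes go out with increasing TTL, and the ICMP error that comes back tells you whether you hit a router on the way or the destination itself.

```python
# ICMP type/code values as defined for IPv4.
ICMP_DEST_UNREACH = 3       # type 3: destination unreachable
ICMP_PORT_UNREACH_CODE = 3  # code 3: port unreachable
ICMP_TIME_EXCEEDED = 11     # type 11, code 0: TTL exceeded in transit


def classify_reply(icmp_type: int, icmp_code: int) -> str:
    """Interpret the ICMP error that comes back for a UDP probe."""
    if icmp_type == ICMP_TIME_EXCEEDED and icmp_code == 0:
        return "intermediate hop"      # a router dropped the probe at TTL 0
    if icmp_type == ICMP_DEST_UNREACH and icmp_code == ICMP_PORT_UNREACH_CODE:
        return "destination reached"   # nothing listens on the probe port
    return "other"


def probe_plan(max_hops=30, probes_per_hop=3, base_port=33434):
    """Yield (ttl, destination_port) for each probe, in sending order.

    The defaults are assumed classic traceroute values, not read from
    any particular implementation.
    """
    port = base_port
    for ttl in range(1, max_hops + 1):
        for _ in range(probes_per_hop):
            yield ttl, port
            port += 1


for ttl, port in probe_plan(max_hops=2, probes_per_hop=2):
    print(f"send UDP probe: ttl={ttl} dport={port}")

print(classify_reply(11, 0))
print(classify_reply(3, 3))
```

Both kinds of reply are ICMP errors about your outgoing UDP probe, so as far as I know connection tracking should see them as RELATED if the probe itself was tracked. If the firewall throws them away, traceroute just sits there and times out.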

-- FLR or flrichar is a superfan of Linux Journal, and goofs around in the LJ IRC Channel