I believe I've achieved a table that routes packets from and to eth1/192.168.3.x through 192.168.3.1, and packets from and to eth0/192.168.1.x through 192.168.1.1 (helpful source).

Question: when doing tracepath from 192.168.3.20 (from within vserver), I'm getting kernel: [318535.927489] martian source 192.168.3.20 from 212.47.223.33, on dev eth0 at or near the target IP, while intermediary hops go without (log below).

Note that you may see packets from non-routable IP addresses when running the traceroute or tracepath commands. While packets cannot be routed to these routers, packets sent between 2 routers only need to know the address of the next hop within the local networks, which could be a non-routable address.

Can someone explain that paragraph in human language? Based on short initial trials so far, everything else seems to work without causing martians. Is this confined to the way tracepath operates, or do I have some other, bigger routing problem that will break work traffic?

Side note: is it possible to inspect a martian packet with tcpdump or wireshark or anything of the sort? I have not been able to get it to show up on my own.

3 Answers

If you receive the martian packet, wireshark should be able to show it.

I also see you've disabled loopback by setting an unreachable route for 127.0.0.0/8. This isn't standards-compliant, and probably isn't that useful to do, but I doubt it has much to do with this problem.

The documentation paragraph simply means that you're likely to see RFC1918 addresses or other unreachable things in the traceroute, since routers can use these addresses between themselves in many cases (e.g. within one AS), and such an address is what the router reports when the packet exceeds its TTL there. It doesn't mean you should expect martians. I also doubt it has anything to do with this particular packet.

The martian packet may have nothing to do with the traceroute. However, it might. It's often caused by a gateway not doing source NAT when it ought to be, but it's also possible that you have a broken NAT rule somewhere translating the destination address of packets outbound from eth1 toward the IP of eth0. This seems most likely given the source of the packet. It also might mean that you're forgetting to do source NAT on your own outbound packets at your gateway.

You should run a wireshark capture on both eth1 and eth0, then try to find the packet on eth0 and see if you can correlate it with one from eth1. Also check your NAT rules.
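As a rough sketch of the two-interface capture, something like the following prints one tcpdump command per interface with an identical filter, so the two pcap files can be compared side by side in wireshark. The helper name `capture_cmd`, the output file names and the filter are my own assumptions, not anything from your setup; the address is the vserver IP from the question.

```shell
#!/bin/sh
# Address of the vserver from the question; adjust to taste.
ADDR=192.168.3.20

# Hypothetical helper: print the tcpdump command for one interface.
# $1 = interface name; the capture is written to <iface>.pcap.
capture_cmd() {
    echo "tcpdump -ni $1 -w $1.pcap host $ADDR or icmp"
}

# Show the command for each interface; run them as root in two terminals.
for iface in eth0 eth1; do
    capture_cmd "$iface"
done
```

Start both captures, run the tracepath, stop them, and then look for the packet in eth0.pcap that lines up in time with a probe in eth1.pcap.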

I believe that the "problem" is that both your interfaces are connected to the same network. At some point you're getting packets with source IP 192.168.3.20 on your eth0 interface, which causes the log_martians config entry to come into action.

I'm pretty sure that you have rp_filter enabled and that this will go away if you disable it (e.g. /proc/sys/net/ipv4/conf/all/rp_filter), but read below:
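A minimal sketch for checking the setting, assuming a Linux host and the interface names eth0 and eth1 from the question. One thing to keep in mind: the kernel uses the higher of the `all` value and the per-interface value, so clearing only `all` does nothing if the interface's own value is still 1. The function names here are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: the kernel applies max(conf/all, conf/<iface>)
# when validating the source address on <iface>.
effective_rp_filter() {
    all_value=$1
    iface_value=$2
    if [ "$all_value" -gt "$iface_value" ]; then
        echo "$all_value"
    else
        echo "$iface_value"
    fi
}

# Read-only report of the current values for the given interfaces.
show_rp_filter() {
    conf=/proc/sys/net/ipv4/conf
    for iface in "$@"; do
        if [ ! -r "$conf/$iface/rp_filter" ]; then
            echo "$iface: no rp_filter entry found"
            continue
        fi
        all_value=$(cat "$conf/all/rp_filter")
        iface_value=$(cat "$conf/$iface/rp_filter")
        echo "$iface: all=$all_value iface=$iface_value effective=$(effective_rp_filter "$all_value" "$iface_value")"
    done
}

# To change it (as root), both values have to be 0 to turn the check off:
#   sysctl -w net.ipv4.conf.all.rp_filter=0
#   sysctl -w net.ipv4.conf.eth0.rp_filter=0
# Setting either one to 2 selects loose mode instead of disabling it.
```

Calling `show_rp_filter eth0 eth1` prints one line per interface. Values are 0 (off), 1 (strict) and 2 (loose).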

This can be because of two reasons:

You're receiving legitimate packets from your own network on this (i.e. the wrong) interface. In your case, in theory, all packets for subnet 192.168.3.0/27 should be arriving on eth1 unless you have multipath routing, in which case you need to disable rp_filter.

During your tracepath, one of the intermediate links uses IP addresses from the subnet 192.168.3.0/27. That is, imagine that one of the intermediate ISPs between you and the destination uses IP addresses from that subnet on its routers. Because of the way traceroute works, you'll be receiving ICMP TTL EXCEEDED packets with that source IP (since the router will be sending them). And since your default route is via eth0, the kernel will complain (because, with rp_filter, it expects everything from 192.168.3.0/27 to arrive on eth1).

There are multiple ways to troubleshoot this and yes you can tcpdump on that (e.g. tcpdump -ni eth0 'host 192.168.3.20').
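To make the capture filter tighter, it can help to pull the addresses straight out of the log line first. Below is a small sketch; the function name `parse_martian` and the `dst=`/`src=` labels are my own. As far as I can tell, the kernel prints the packet's destination first and its source second in this message, so treat the labels as an assumption and confirm them against what the capture actually shows.

```shell
#!/bin/sh
# Hypothetical helper: extract both addresses and the interface from
# "martian source" kernel log lines read on stdin.
# Assumption: the first address is the destination, the second the source.
parse_martian() {
    sed -n 's/.*martian source \([0-9.]*\) from \([0-9.]*\), on dev \([^ ]*\).*/dst=\1 src=\2 dev=\3/p'
}

# Typical use (reading the kernel log may need root):
#   dmesg | parse_martian
#
# Then capture on the reported interface; -e also prints the MAC
# addresses, which shows which neighbour handed you the packet:
#   tcpdump -eni eth0 'host 192.168.3.20'
```

Feeding it the line from the question gives `dst=192.168.3.20 src=212.47.223.33 dev=eth0`.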