Adding Networks to Exadata: Fun with Policy Routing

I’ve noticed that Exadata servers are now configured to use Linux policy routing. A peek at My Oracle Support shows that note 1306154.1 goes into a bit more detail about this configuration. It’s apparently delivered by default with factory build images 11.2.2.3.0 and later. The note explains that this configuration was implemented because of asymmetric routing problems associated with the management network:

Database servers are deployed with 3 logical network interfaces configured: management network (typically eth0), client access network (typically bond1 or bondeth0), and private network (typically bond0 or bondib0). The default route for the system uses the client access network and the gateway for that network. All outbound traffic that is not destined for an IP address on the management or private networks is sent out via the client access network. This poses a problem for some connections to the management network in some customer environments.

This configuration directs traffic originating from the 10.10.10.93 IP (the management interface IP on this particular machine), as well as traffic destined for that address, away from the regular system routing table to a special routing table, table 220. route-eth0 populates table 220 with two routes: one for the local network, and a default route through a router on the 10.10.10.1 network.
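For reference, the delivered files would look roughly like this. This is a sketch based on the description above, not the factory files themselves: the table number and addresses are the ones from this machine, and exact contents vary by image version:

[root@exa1db01 network-scripts]# cat rule-eth0
from 10.10.10.93 table 220
to 10.10.10.93 table 220

[root@exa1db01 network-scripts]# cat route-eth0
10.10.10.0/24 dev eth0 table 220
default via 10.10.10.1 dev eth0 table 220

The rule- file creates the policy rules (what `ip rule list` shows), and the route- file fills in table 220 itself (visible with `ip route show table 220`).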

The difference between this type of policy routing and regular routing is that traffic with the _source_ address of 10.10.10.93 will automatically go through default gateway 10.10.10.1, regardless of the destination. (The bible for Linux routing configuration is the Linux Advanced Routing and Traffic Control HOWTO, for those looking for more details.)

I ran into an issue with this configuration when adding a second external network on the bondeth1 interface. I set up the additional interface configuration for a network, 10.50.52.0/24:
[root@exa1db01 network-scripts]# cat ifcfg-bondeth1
DEVICE=bondeth1
USERCTL=no
BOOTPROTO=none
ONBOOT=yes
IPADDR=10.50.52.104
NETMASK=255.255.255.0
NETWORK=10.50.52.0
BROADCAST=10.50.52.255
BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000 num_grat_arp=100"
IPV6INIT=no
GATEWAY=10.50.52.1

This was a dedicated data guard network to a remote server, IP 10.100.52.10.

The problem with this configuration was that it simply didn’t work. Using tcpdump, I could see incoming requests arriving on the bondeth1 interface, but the replies went out the system default route on bondeth0 and never reached their destination. After some digging, I found the cause of the problem: in order to determine the packet source IP, the kernel was looking up the destination in the main routing table (table 254). The route for the 10.100.52.0 network, however, was in non-default table 211. So the packet followed the default route instead, picked up a source address on the client-access network, and never matched any of the routing rules for the data guard network.
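The fix was to give the data guard network its own routing table via a route-bondeth1 file. The exact file isn’t reproduced here, but based on the summary steps at the end of this post it would look something like the sketch below, with the local subnet route, a default route via the 10.50.52.1 gateway, and the static route to the remote data guard network left without a table qualifier so it lands in the main table where source-address selection can see it:

[root@exa1db01 network-scripts]# cat route-bondeth1
10.50.52.0/24 dev bondeth1 table 211
default via 10.50.52.1 dev bondeth1 table 211
10.100.52.0/24 via 10.50.52.1 dev bondeth1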

And then we ran into a second issue: the main interface IP could now be reached, but not the virtual IP (VIP). This was because the rule configuration, taken from the samples, didn’t list the VIP address at all. To avoid this issue, and to handle VIP addresses migrating over from other cluster nodes, we used a subnet mask in the rule file, making every address in the data guard network use this routing rule:
[root@exa1db01 network-scripts]# cat rule-bondeth1
from 10.50.52.0/24 table 211
to 10.50.52.0/24 table 211

So to sum up, when setting up interfaces in a policy-routed Exadata system remember to:

• Set up the interface itself and any bonds using ifcfg- files.

• Create a rule- file for the interface, encompassing every possible address the interface could have. I added the entire IP subnet. Add “from” and “to” lines with a unique routing table number.

• Create a route- file for the interface, listing a local network route and a default route with the default router of the subnet, all using the table number defined in the previous step.

• Add to the route- file any static routes required on this interface, but don’t add a table qualifier.
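Once the interface is up, the result of these steps can be sanity-checked with the ip utility. A sketch using the addresses and table number from this setup (run as root on the database server; I haven’t shown the output, which varies by system):

[root@exa1db01 ~]# ip rule list
[root@exa1db01 ~]# ip route show table 211
[root@exa1db01 ~]# ip route get 10.100.52.10 from 10.50.52.104

The first command should show the “from” and “to” rules for 10.50.52.0/24 pointing at table 211, the second should show the local subnet and default routes in that table, and the third should resolve the remote data guard address via bondeth1 rather than the system default route on bondeth0.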


4 Responses to “Adding Networks to Exadata: Fun with Policy Routing”

[…] basically specify that traffic go back out whatever interface it came in on. For more info, see http://www.pythian.com/news/36747/ad…olicy-routing/ Good luck and I hope this helps someone. # cat rule-bondeth0 from 10.22.102.0/23 table 210 to […]

Sorry to bother you, but an attempt to configure Linux advanced (policy) routing brought me here after reading hundreds of other sites and opening an SR in MOS. Three months have passed without any valid info from their side, but this is not surprising based on my experience with MOS. The key issue here is where you point out (among other things) the following:
“…•Add to the route- file any static routes required on this interface, but DON’T ADD A TABLE QUALIFIER…”.
This specific notification is included only in your blog (as I’ve seen so far).

The problem is that although your way of adding extra static routes to the route-ethX file works just fine (the route goes into the main routing table), according to the relevant Red Hat knowledgebase article “How to make routing rules persistent, when I want packets to leave the same interface they came in?”, this is a wrong setup, in the sense that you must add the table qualifier to the static route entry! I quote from the knowledgebase article (example from the article’s route-eth0 setup):
…..
# cat /etc/sysconfig/network-scripts/route-eth0
default via dev eth0 table 1
#to add additional static routes
# via dev eth0 table 1
…
As you can see, on the last line he adds the table qualifier to the additional static route, and, by the way, he removes the DEFAULT GATEWAY from all relevant files (/etc/sysconfig/network, ifcfg-eth* files). The problem, of course, is that when I add the table qualifier to my static route, although it shows up in that specific table’s routes, it’s totally ignored, so I cannot access the corresponding network for outgoing-initiated connections (I get “network unreachable”).
So, bottom line, if you are not bored already: besides the very important fact that your way works, do you have any official article/document from Oracle or Red Hat that backs this type of setup for extra static routes in advanced routing configurations?
I hope you will find the time to clarify things if possible.

I assume you’re asking if you can run two different RAC clusters in the same physical Exadata rack. I don’t see it done often, but it’s possible to physically partition your rack between two clusters, each with their own compute and storage servers. This so-called “split rack” configuration would logically separate everything but the InfiniBand and power distribution infrastructure. To maintain high availability and quorum, you would want a minimum of 2 compute and 3 storage servers for each part of your split rack.

This isn’t particularly well documented, so I’d suggest engaging professional services for assistance if you’re considering this type of configuration.