While digging through the archives of the Cisco IronPort support knowledge base, I came across a pretty slick solution for load balancing client web traffic between two or more proxy servers in a PAC file-based deployment scenario. So what is a PAC file? A PAC file is a text file containing policy information, written in JavaScript and interpreted by a web browser each time an HTTP request is made. The policy defines what web traffic should be sent where: either to a proxy server or directly, bypassing the proxy. Typically, this is a fairly vanilla policy. For example (see comments for detail):
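A vanilla PAC file along these lines would fit the description. This is a minimal sketch; the proxy port (3128) and the bypassed internal ranges are illustrative assumptions:

```javascript
// Minimal example PAC file. Intranet shortnames and RFC 1918 space go
// direct; everything else goes to iproxy01, failing over to iproxy02.
function FindProxyForURL(url, host) {
    // Hosts without a domain suffix (intranet shortnames) bypass the proxy
    if (isPlainHostName(host))
        return "DIRECT";
    // Internal RFC 1918 address space bypasses the proxy
    if (isInNet(host, "10.0.0.0", "255.0.0.0") ||
        isInNet(host, "172.16.0.0", "255.240.0.0") ||
        isInNet(host, "192.168.0.0", "255.255.0.0"))
        return "DIRECT";
    // Everything else: primary proxy first, failover second
    return "PROXY iproxy01:3128; PROXY iproxy02:3128";
}
```

The `isPlainHostName()` and `isInNet()` helpers are supplied by the browser's PAC runtime.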

Using the PAC file above, all web traffic not matching a conditional statement resulting in a “direct” action will be sent only to iproxy01 in a steady state. Upon failure of iproxy01, traffic will be sent to iproxy02. What if we have 40 Mb/s of internet traffic we would like to load balance between the two? We could deploy WCCP and move to transparent redirection, but are there any options with a PAC file? Absolutely!

Since a PAC file is JavaScript-based, JavaScript's built-in objects are at your disposal to manipulate policy as you see fit. We'll need to instruct the web browser to send connections to either web proxy based on the result of some sort of algorithm. To accomplish this, we can write a JavaScript function using a couple of methods from the Math object:
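A sketch of that function, using the iproxy01/iproxy02 hostnames from above and a hypothetical proxy port of 3128:

```javascript
// Randomly select one of two proxies with roughly equal probability.
function selectRandomProxy() {
    // Math.random() yields a value in [0.0, 1.0); multiplied by two and
    // floored, that becomes the integer 0 or 1 with equal probability.
    switch (Math.floor(Math.random() * 2)) {
        case 0:
            return "PROXY iproxy01:3128";
        case 1:
            return "PROXY iproxy02:3128";
    }
}
```

The PAC file's FindProxyForURL() would simply return selectRandomProxy() for any traffic not matching a “DIRECT” condition.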

This function (selectRandomProxy()) will randomly select either case 0, which sends web traffic to iproxy01, or case 1, which sends web traffic to iproxy02. Using Math.random(), a random value will be selected between 0.0 (inclusive) and 1.0 (exclusive), e.g. 0.7234213. This value is then multiplied by 2. Math.floor() will then round the result down to the largest integer no greater than the original result. For example, if Math.random() generates a random value of 0.25, which is then multiplied by two (0.50), Math.floor() would round this down to an integer of zero. If Math.random() generates a random value of 0.75, which is then multiplied by two (1.50), Math.floor() would round this down to an integer of one. A switch statement then evaluates the resulting integer value against a list of cases and returns the case matching the integer.

The web browser will evaluate the configured PAC file prior to every new HTTP request and, over time, this will result in nearly a 50/50 distribution of traffic between both web proxies.

A word of caution: There is no intelligence or session tracking behind the load balancing decision. It's completely stateless. During a single HTTP session, objects will be fetched using both web proxies. While this isn't necessarily an issue from the perspective of the web proxies, it may wreak some havoc on web apps behind a load balancer relying on session stickiness by source IP address. As HTTP objects belonging to the same session are fetched from two different source IP addresses (two web proxies), this will look like a new session to the destination load balancer and may not be “stuck” to the same real server. As long as both proxies are PAT'd to the same address, this shouldn't cause an issue. Also, if you are doing any type of SSL termination on your web proxies for content inspection, this will cause you some problems.

Over the past nine months since I changed jobs, I've had a ton of opportunities to work with the Cisco ASA. Up until this point, our only clients utilizing the IPSec VPN client have been internal employees authenticated against Active Directory (via ACS) and a handful of vendors with locally created accounts on the ASA. However, a recent security policy change has dictated that all users, both employees and vendors, must be authenticated against Active Directory, but RADIUS must be utilized for accounting. No big deal, right? Move the local accounts on the ASA to AD, create new group mappings in ACS, and communicate with the vendors. Not so fast. As it turns out, this will all work fine, but under the covers, a security problem has been created. Let's look at the problem at a high level.

Here’s the configuration:

In the ASA, there are two tunnel-groups created: Employee and Vendor. The Employee tunnel-group allows full access to the internal network. The Vendor tunnel-group allows restricted access to the internal network.
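A rough sketch of the ASA side (the group-policy names, AAA server group, and vpn-filter ACL are all illustrative assumptions):

```
! Employee tunnel-group: full access
tunnel-group Employee type remote-access
tunnel-group Employee general-attributes
 authentication-server-group ACS
 default-group-policy GP-EMPLOYEE
!
! Vendor tunnel-group: restricted access via a vpn-filter ACL
tunnel-group Vendor type remote-access
tunnel-group Vendor general-attributes
 authentication-server-group ACS
 default-group-policy GP-VENDOR
!
group-policy GP-VENDOR internal
group-policy GP-VENDOR attributes
 vpn-filter value VENDOR-RESTRICTED-ACL
```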

In Active Directory, there are two groups/OUs created: Employee and Vendor.

In RADIUS, there are two groups created: Employee and Vendor. These are mapped to their respective groups in AD.

The Good

Vendor JoeyNT is successfully authenticated! Nice job, Joey. As you can see:

The client's credentials are passed from the ASA to RADIUS to Active Directory (AD).

When RADIUS passes the client's credentials to AD, it also asks which groups (OUs) the user belongs to via Microsoft Netlogon. These are then passed back to RADIUS upon successful authentication.

The client is placed into the AD-mapped group in RADIUS.

RADIUS finally sends an authentication success/failure message back to the ASA.

In this case, the ASA receives an auth success and permits the client.

The one important item to note is that tunnel-group membership is not conveyed in any way between RADIUS and the ASA by default. RADIUS simply sends a pass/fail message. This becomes the root of our problem.

The Bad

Now vendor JoeyNT is dissatisfied with his level of access and is feeling a bit ambitious. He has acquired the “Employee” PCF file from a developer within our organization to help with “support issues”. He logs into his VPN client with the Employee tunnel-group specified:

As you can see, JoeyNT is authenticated successfully and is placed into the Employee tunnel-group, allowing full access to the internal network, even though he is part of the Vendor OU in AD and placed in the Vendor group in RADIUS. Uh oh.

The Solution

What we need is a way for RADIUS to tell the ASA which tunnel-group the client is allowed to use, along with the auth success message.

Enter vendor-specific attribute (VSA) 3076/85 – Tunnel-Group-Lock.

By enabling this, RADIUS will send the locally configured tunnel-group name to the ASA, based on the AD-to-RADIUS group mapping. If this does not match the tunnel-group the client is attempting to join, the client will be denied access. With Tunnel-Group-Lock in use, here is our oh-so-adventurous vendor JoeyNT attempting to authenticate to the Employee tunnel-group:

The Configuration

In Cisco ACS, this first needs to be enabled under Interface Configuration–>RADIUS (Cisco VPN 3000/ASA/PIX 7.x+):

Then check the box under [026/3076/085] Tunnel-Group-Lock and click submit:

Now under Group Setup, each group will have the following under the Cisco VPN 3000/ASA/PIX v7.x+ RADIUS Attributes section. The value specified here MUST match a configured tunnel-group on the ASA:

That’s it!

Footnote: I believe this can also be done using the IETF Class attribute (25), but I have not tested this.

While working on the design for a 20+ site DMVPN migration, I realized something often overlooked in the documentation for an internet-based DMVPN deployment. To maintain a zero (or minimal) touch deployment model in an internet-based DMVPN, default routing is a must for dynamic tunnel establishment between hubs and spokes. The public addressing of spoke routers is typically at the mercy of one or more service providers, and even if you have been allocated a static address per the service contract, these addresses still have a tendency to change for reasons outside the customer's control. This is especially true in teleworker-type deployments with a broadband service provider. To deal with this issue, an engineer has two options: maintain a list of static routes on every hub/spoke router covering every public next-hop address in the DMVPN environment, or use a static default route pointing out the public interface.

Tough decision, huh? Not so fast.

What happens when you have a transparent proxy deployed in your network at the hub site? No problem, just have the spoke routers carry a default route advertised into the IGP from the hub site. Wait…we are already using a default route to handle DMVPN tunnel establishment between spoke routers. To resolve this issue, we need two default routes: one for clients within the VPN and one for establishing spoke-to-spoke tunnels. We could add two defaults to the same routing table with the same administrative distance, but load balancing is not the behavior we want, and our tunnels would throw a fuss due to route recursion. How about policy-based routing with the local policy command configured for router-initiated traffic? Pretty ugly. Enter FVRF, or Front-door VRF.

Front-door VRF takes advantage of the VRF-aware features of IPSec. While touted in the scant Cisco documentation as a security feature, isolating your private routing table from your public address space, it also provides an ideal solution for maintaining separate routing topologies for DMVPN control-plane traffic and user data-plane traffic.

So how does all this work? It's pretty simple if you are familiar with the VRF concept. First, on your spoke routers, create a VRF to be used for resolving tunnel endpoints:
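A configuration sketch along these lines (the VRF name matches the static route shown later; the keyring name, key, and addressing are placeholders):

```
! FVRF holds the public-facing interface and tunnel endpoint resolution
ip vrf FVRF
!
! The keyring is the only VRF-specific crypto configuration
crypto keyring DMVPN-KEYRING vrf FVRF
 pre-shared-key address 0.0.0.0 0.0.0.0 key PLACEHOLDER_KEY
!
crypto isakmp policy 10
 encr aes 256
 authentication pre-share
 group 2
!
crypto ipsec transform-set DMVPN-TS esp-aes 256 esp-sha-hmac
 mode transport
!
crypto ipsec profile DMVPN-PROFILE
 set transform-set DMVPN-TS
!
! The public-facing interface lives in the FVRF
interface FastEthernet0/0
 ip vrf forwarding FVRF
 ip address 10.1.1.1 255.255.255.252
```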

Note the only VRF-specific configuration is the crypto keyring statement. Both the ISAKMP policy and IPSec transform-set configurations are no different than in a typical deployment. GET VPN could be used instead, if your security posture calls for it.

Configuring the tunnel interface is standard fare except for the “tunnel vrf” argument. This command forces the far-side tunnel endpoint to be resolved in the specified VRF. By default, tunnel endpoint resolution takes place in the global table, which is obviously not the behavior we want. Also, notice the “ip nhrp shortcut” and “ip nhrp redirect” arguments. These two commands mean we are using DMVPN Phase 3 and its fancy CEF rewrite capability for spoke-to-spoke tunnel creation.
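A spoke tunnel interface illustrating these commands might look like this (tunnel/NBMA addresses, the NHRP key, and the IPSec profile name are placeholders):

```
interface Tunnel1
 ip address 172.16.0.11 255.255.255.0
 ip nhrp authentication cisco
 ip nhrp map 172.16.0.1 192.0.2.1
 ip nhrp map multicast 192.0.2.1
 ip nhrp network-id 1
 ip nhrp nhs 172.16.0.1
 ! DMVPN Phase 3: CEF rewrite for spoke-to-spoke tunnels
 ip nhrp shortcut
 ip nhrp redirect
 tunnel source FastEthernet0/0
 tunnel mode gre multipoint
 tunnel key 1
 ! Resolve the far-side tunnel endpoint in the FVRF, not the global table
 tunnel vrf FVRF
 tunnel protection ipsec profile DMVPN-PROFILE
```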

Last, let's add our default route within the VRF:

ip route vrf FVRF 0.0.0.0 0.0.0.0 10.1.1.2 name DEFAULT_FOR_FVRF

And we’re done! At this point, assuming your hub site configuration is correct, you should have a working DMVPN tunnel.

In the output below, notice the “fvrf” and “ivrf” sections under tunnel interface 1. The concept of IVRF is the exact opposite of FVRF: tunnel control-plane traffic operates in the global routing table, and your private side operates in a VRF. IVRF can be tricky in that, if your spoke routers are managed over the tunnel, all management functionality (SNMP, SSH, etc.) must be VRF-aware. Recent IOS releases have been much better with VRF-aware features but YMMV:

You can now configure your favorite flavor of IGP as you normally would (globally, that is) without impacting DMVPN control-plane traffic. In this scenario, OSPF is used with the tunnel interfaces configured as a point-to-multipoint network type. The static default route in the FVRF table handles tunnel establishment, while the dynamically learned default via OSPF handles the user data plane within the VPN:
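On a spoke, that OSPF configuration might be as simple as the following (process ID, prefixes, and area are placeholders):

```
interface Tunnel1
 ip ospf network point-to-multipoint
!
router ospf 1
 ! Tunnel subnet plus the private LAN(s); the default route is learned
 ! dynamically from the hub site via OSPF
 network 172.16.0.0 0.0.0.255 area 0
 network 192.168.10.0 0.0.0.255 area 0
```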

Front-door VRF works best when used on both hub and spoke routers. Why? Well, anytime a new spoke is to be provisioned, you have to do zero configuration on the hub site. Configure the spoke router, ship it out the door, and have the field plug it in at their convenience.

This post could also be titled “How to build a healthy, long-lasting relationship with your system administration team”. One of the most important (and overlooked) pieces of deploying VMware ESX in a network is handling an upstream network failure. Because larger organizations have segregated network and system administration teams, the switchport tends to be the demarcation of responsibility. Where this particularly fails is in how each team perceives a network component failure, be it an upstream switch or router.

With the increased push towards server consolidation and deployment of VMware, the “routed is better” mantra has become muted by the layer 2 requirements of virtual machine mobility. A virtualized server can also present cable density issues, with each server possibly needing six NICs (2 x Production, 2 x VMkernel, 1 x Backup, 1 x iLO). From a network design perspective, a VMware deployment screams for a top-of-rack switching model. Top-of-rack switching and VMware ESX physical NIC (pNIC) failure detection methods can present some interesting challenges.

VMware ESX allows for two options to detect an upstream network failure: Beacon Probing and Link Status. Here is an in-depth summary of both methods:

Basically, beacon probing is pretty awful if you're a network admin. It will send broadcasts out each physical interface of the ESX server for EACH VLAN configured (if using dot1q tagging, which you should be). So that is:

p number of physical servers x n number of pNICs per server x v number of vlans = broadcast storm
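To put hypothetical numbers to that formula (all values below are made up for illustration):

```javascript
// Beacon-probe broadcast math: physical servers x pNICs per server x VLANs.
function beaconBroadcasts(servers, pnicsPerServer, vlans) {
    return servers * pnicsPerServer * vlans;
}

// e.g. 40 ESX hosts x 2 pNICs x 30 tagged VLANs = 2400 beacon frames
// per probing interval
console.log(beaconBroadcasts(40, 2, 30)); // → 2400
```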

Link status is the preferred failure detection method, but it only tracks the state of the local link (between the ESX server and the switch). This tells the ESX server nothing about the switch's ability to forward frames. This is where link-state tracking comes in. Link-state tracking will convey the switch's upstream link state to the local link of the ESX server by creating a logic gate between upstream and downstream links.

Suppose you have the following loop-free network topology deployed in your data center:

The network failure detection method configured on the ESX server is link status. Most likely your ESX server is sending frames out both interfaces due to the particular load balancing configuration, but in this case we are only interested in frames sent to the switch on the left. In the event the left switch's uplink fails, we will experience a black-hole situation for some of our traffic leaving the ESX server:

By utilizing link status as our ESX failure detection method, the ESX server merely tracks physical link state at layer 1; the ability of the upstream switch to forward frames is not taken into account:

Link-state tracking configured on the switch will convey this uplink failure to the link directly connected to the ESX server. Let’s get our switch configured correctly (which is stupidly simple):

First, define your link state group globally:

Switch(config)#link state track 1

Then define your upstream links within the link state group:

interface GigabitEthernet1/0/1

link state group 1 upstream

Lastly, define your downstream links:

interface GigabitEthernet1/0/2

link state group 1 downstream

Now the upstream link state will be conveyed to the downstream links, which will cause the link to the ESX server to be shut down in the event the upstream switch link goes down. The interfaces are now coupled within the link-state group:

Once the upstream link failure occurs and the interface is marked as down, the resulting action created by link state tracking is to bring down all downstream interfaces:

By bringing down the physical state of the interfaces to the ESX servers, the action by ESX link status tracking will be to initiate a pNIC failover event:

This will in turn create a long and happy relationship between network and system administrators and eliminate another instance of finger pointing when redundancy fails to function correctly.

If you can tell me of a more understated topic on the CCIE Routing and Switching v4.0 lab blueprint than Optimized Edge Routing (OER), I'll buy you a beer. This was quietly snuck into the blueprint between policy-based routing and redistribution, both fairly straightforward topics. Should be no big deal, right? False.

OER removes the rigidity of standard IP routing, where typical routing metrics are derived from physical-layer measurements and, in turn, dictate a generic routing policy for all traffic. OER does this by gathering higher-level performance metrics through IP SLA and NetFlow information and using them to determine the optimal exit point for certain destination prefixes or traffic classes. Once the ideal exit point has been decided, routing policy is dynamically updated to influence the specific traffic class.

Navigating the configuration guide for OER can be daunting, but configuration can be broken down into five steps:

1. Profile

The selection of a subset of traffic to optimize performance

Learns the flows passing through the router with the highest delay or throughput

Statically configure a class of traffic to performance route

2. Measure

Once traffic has been profiled, metrics need to be generated against it. This is done through:

Passive monitoring – measuring performance of a traffic flow as the flow is traversing the data path

“Avoiding these types of problems is really quite simple: never announce the information originally received from routing process X back into routing process X.”

And it truly is that simple. Always mark/tag/color routes based on their source routing domain and when redistributing, select which routes to redistribute. After all, routes are merely destination information. It’s all about who needs to know and from whom they need to know it.

IPv6 unique local unicast addresses are the equivalent of IP version 4 RFC 1918 space in most ways and are formatted in the following fashion:

7-bit Prefix – FC00::/7

1-bit Local bit (position 8) – always set to “1”…for now

40-bit “kinda-almost-unique” Global ID

16-bit Subnet-ID

64-bit Interface ID

The intended scope of these addresses is unicast-based intra- and inter-site communication. The definition of a “site” within the plethora of IPv6 RFCs is slightly ambiguous, but in the case of RFC 4193, the demarcation of a “site” is between ISP and customer. According to the RFC, unique local unicast addresses are permitted to be used between “sites”, e.g. customer-to-customer VPN communication, but the FC00::/7 prefix is to be filtered by default at any site-border router. This space is not intended to be advertised to any portion of the internet.

Now the interesting portion of this RFC is the recommended algorithm for generating a realistically unique, though not guaranteed unique, 40-bit Global ID for your local unicast addresses. Section 3.2.2 recommends the following:

Obtain the current time of day in 64-bit NTP format

i.e. reference time is C029789C.45564D4E

Obtain an EUI-64 identifier from the system running this algorithm

i.e. bia of C201.0DC8.0000

Concatenate the time of day with the system-specific identifier in order to create a key

Also included in the RFC are sample probabilities of IPv6 address prefix uniqueness depending on the number of peer connections to a site. It's safe to say that if you experience an overlap using this method to assign Global IDs, play the damn lottery. While this method all but eliminates any overlap possibility between sites, the Global IDs it generates are hardly “pretty” numbers, and there will undoubtedly be folks assigning Global IDs of ::1/40. If you have ever gone through a merger/acquisition with IPv4, do yourself a favor and follow the RFC's algorithm for assigning your Global IDs.

I never thought I would come across the opportunity to use an OSPF virtual link in a production environment, but sure enough, yesterday was the day. The Maryland area had dual links from a 6500 running VSS into our OSPF backbone. Because of a fiber cut, both adjacencies into area 0 were lost (the interfaces stayed up). This 6500 also had interfaces in area 18 for redundancy. In theory, the area 0 links would be lost, the 6500 would no longer be an ABR, and traffic would re-route back through area 18 to the other ABR.

False.

Why? The 6500 still believed it was an ABR and the loop prevention rules of OSPF ABRs kicked in:

A type-3 LSA learned via a non-backbone area will not be forwarded back into the backbone area. This is similar to split horizon in distance-vector routing protocols.

ABRs will ignore LSAs advertised by other ABRs when calculating least cost paths. An ABR must not select a path through a non-backbone area to reach the backbone area.

Rule #2 applies to this particular situation. Summary LSAs from area 0 were in the OSPF database of the 6500. However, the LSAs were not being considered for SPF calculation because they were learned via a non-backbone area.

A couple reasons why this failed:

The interfaces in area 0 on the 6500 stayed up; however, the neighbor adjacencies were lost, so the 6500 still considered itself an ABR. This caused area 0 to become partitioned relative to this ABR. In certain IOS releases (I believe), if the adjacency is lost, the interface will be marked as “down” from an OSPF standpoint. Here's an example from 12.4 mainline code:

*Mar 1 00:55:28.407: %OSPF-5-ADJCHG: Process 100, Nbr 10.4.4.2 on FastEthernet1/0 from FULL to DOWN, Neighbor Down: Dead timer expired
!
R4#sh int fast 1/0
FastEthernet1/0 is up, line protocol is up
!
R4#sh ip ospf
Area BACKBONE(0) (Inactive)
Number of interfaces in this area is 1
Area has no authentication
SPF algorithm last executed 00:00:17.356 ago
SPF algorithm executed 5 times
Area ranges are
Number of LSA 14. Checksum Sum 0x085E01
Number of opaque link LSA 0. Checksum Sum 0x000000
Number of DCbitless LSA 0
Number of indication LSA 0
Number of DoNotAge LSA 9
Flood list length 0

While digging around during the outage, I found the 6500 chassis had two port-channel interfaces in area 0 pointing back into the location. I have no idea why. So even if the IOS version running behaved as described above, because of these port-channels, the 6500 would have still considered itself an ABR and not considered the summary LSAs learned via area 18 for SPF calculation.
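For reference, a virtual link through area 18 between the two ABRs looks something like this (the router IDs are placeholders; the process ID matches the log output above):

```
! On the 6500 (the ABR with the partitioned area 0)
router ospf 100
 area 18 virtual-link 10.0.0.2
!
! On the other ABR
router ospf 100
 area 18 virtual-link 10.0.0.1
```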

Now, what this will do is defeat the loop prevention rules mentioned above, specifically #2. As you can see, the virtual link between the ABRs is essentially the same as having a link in area 0. This allows each ABR to learn LSAs via area 0 and consider them for SPF calculation. This is a nice alternative to a tunnel because traffic is natively routed: if you look at the routing table, an intra-area router will be the next hop for a route learned via the opposing ABR. There is a caveat in that virtual links cannot be used if the underlying transit area is a stub. Why? Intra-area stub routers lack full OSPF databases, which means they lack forwarding information, which means there is a possibility of loops. As mentioned before, virtual-link traffic is not tunneled but natively routed, so the intra-area routers must have complete forwarding information if they are to be used as next-hop routers for an ABR. This is similar to the iBGP full-mesh requirement.

We have a ton of remote sites (payment centers, small offices, etc.) with a single router on-site, typically an 871 series hanging off a cable modem with a static IP. These routers are terminated via point-to-point GRE tunnels back to a pair of central hub sites. Because we run OSPF and are poorly summarized, these routers can carry up to 7,500 prefixes depending on the area type of the market. All these routers really need is a default route back towards the two hubs in an active/standby model, with return traffic to the spokes preferring the active hub site for symmetry. Secondly, provisioning new remotes requires touching both the hub and spoke routers to build tunnel interfaces. The configs on the hub sites can be fairly long and annoying to troubleshoot, especially with static IPs of the remote sites changing over time and a lack of cleanup of old configs.

To address the size of the RIB on the spokes, we could use statics and redistribution on the hubs and floating statics on the spoke routers. That would reduce the RIB on the spokes, but it's fairly high maintenance on both of the hub sites. With statics for a loopback plus voice and data VLANs, you're looking at at least 90 statics on some of the hub routers. That's not helping the config complexity problem. Distribute lists do not work with OSPF in the outbound direction, and the interface-level “ip ospf database-filter all out” won't help us leak a default to the spokes. We need distance vector. EIGRP stub flags and a 0.0.0.0/0 summary towards the spokes would be perfect.

To cut down on the tunnel interfaces, the obvious choice is DMVPN. There is little to no spoke-to-spoke traffic, so DMVPN will serve purely as a tool for configuration simplification.

Here's the DMVPN configuration for the two hub sites. First, notice there are no static unicast maps, multicast maps, or NHS configuration pointing to the opposite hub site. Basically, I don't want a tunnel, and ultimately an EIGRP neighbor relationship, built from Hub site 1 to Hub site 2. There would be no reason to have this in place, and it would only cause issues since both hubs are advertising only a default route out their tunnel interfaces. Secondly, Hub site 1 has its tunnel interface delay set to 100 so all spokes will prefer the default route via Hub site 1 after calculating their feasible distance. Lastly, the default route being generated to the spokes is being set with an administrative distance of 254. The reason for this is that when you manually summarize, a summary route is generated in the routing table pointing to Null0 with an administrative distance of 5. While this is not necessarily a problem for CIDR blocks where more specific prefixes exist in the RIB, it can cause traffic following a default route to be black-holed. We want this null route's administrative distance set higher than any IGP-learned default route so it is not preferred. Oh, and notice split horizon and next-hop-self for EIGRP are not being disabled on the tunnel interface. We are not interested in spoke-to-spoke tunnels, nor are we interested in spokes having all routes within the DMVPN. Disabling split horizon would allow the spoke prefixes to be advertised back out the tunnel interfaces to the other spokes. Disabling next-hop-self would allow these prefixes to be advertised with a next hop of the advertising spoke router (which is where the NHRP query would come into play for a spoke-to-spoke tunnel).
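A sketch of what that hub configuration looks like (the ASN, addressing, and tunnel key are placeholders; Hub site 2 would be identical minus the delay statement):

```
! Hub site 1 mGRE tunnel: no static maps or NHS toward Hub site 2
interface Tunnel0
 ip address 172.16.0.1 255.255.255.0
 ip nhrp authentication cisco
 ip nhrp map multicast dynamic
 ip nhrp network-id 1
 ! Advertise only a default to the spokes; the resulting Null0 summary
 ! route is installed at AD 254 so an IGP-learned default still wins
 ip summary-address eigrp 100 0.0.0.0 0.0.0.0 254
 ! Hub site 1 only: lower delay so spokes prefer this hub's default
 delay 100
 tunnel source FastEthernet0/0
 tunnel mode gre multipoint
 tunnel key 1
!
! Split horizon and next-hop-self are deliberately left enabled
router eigrp 100
 network 172.16.0.0 0.0.0.255
 no auto-summary
```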

Here's the DMVPN configuration for the spoke sites. It's pretty straightforward. I found the “ip nhrp registration timeout” command had to be added on the spokes. When testing failure of a hub site, there were issues with EIGRP adjacencies re-forming with the failed hub router when it came back online. Because we don't want Hub site 1 and Hub site 2 to be NHRP peers, the hub sites will not query one another for NHRP mappings. So instead, the spokes periodically re-register with the hubs every 5 seconds. When the hub comes back online, it will receive the registration message from the spoke, rebuild its mGRE tunnel, and re-form its EIGRP adjacency. The spoke routers do not necessarily need to be configured as stubs, as they should never be queried, but it is good practice.
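And the corresponding spoke sketch (the hub tunnel/public addresses and ASN are placeholders):

```
interface Tunnel0
 ip address 172.16.0.11 255.255.255.0
 ip nhrp authentication cisco
 ip nhrp map 172.16.0.1 192.0.2.1
 ip nhrp map multicast 192.0.2.1
 ip nhrp map 172.16.0.2 198.51.100.1
 ip nhrp map multicast 198.51.100.1
 ip nhrp nhs 172.16.0.1
 ip nhrp nhs 172.16.0.2
 ip nhrp network-id 1
 ! Re-register every 5 seconds so a recovered hub rebuilds its mGRE
 ! tunnel and EIGRP adjacency promptly
 ip nhrp registration timeout 5
 tunnel source FastEthernet0/0
 tunnel mode gre multipoint
 tunnel key 1
!
router eigrp 100
 network 172.16.0.0 0.0.0.255
 eigrp stub connected summary
 no auto-summary
```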

So that takes care of getting a dynamic default to the spokes, but now we need to advertise the routes from the spokes back into the rest of the network. Remember, we set the delay of the tunnel interface at Hub site 1 so all spokes would prefer its default route. When redistributing EIGRP into OSPF, we will be injecting these routes as E2s with a metric of 100 from Hub site 1 and a metric of 200 from Hub site 2. This should give us traffic symmetry. Also, we are only permitting specific blocks for redistribution. We don't want anyone routing any prefix they damn please.
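The redistribution on Hub site 1 might look like the following (the prefix list, block, and OSPF process ID are placeholders; Hub site 2 would be identical except for metric 200):

```
! Only permit our known remote-site blocks into OSPF
ip prefix-list SPOKE-BLOCKS seq 5 permit 10.20.0.0/16 le 24
!
route-map EIGRP-TO-OSPF permit 10
 match ip address prefix-list SPOKE-BLOCKS
!
router ospf 1
 redistribute eigrp 100 metric 100 metric-type 2 subnets route-map EIGRP-TO-OSPF
```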