Introduction

This document describes how to troubleshoot issues with Optimal Gateway Selection (OGS). OGS is a feature that can be used in order to determine which gateway has the lowest Round Trip Time (RTT) and connect to that gateway. One can use the OGS feature in order to minimize latency for Internet traffic without user intervention. With OGS, Cisco AnyConnect Secure Mobility Client (AnyConnect) identifies and selects which secure gateway is best for connection or reconnection. OGS begins upon first connection or upon a reconnection at least four hours after the previous disconnection. More information can be found in the Administrator's guide.

Tip: OGS works best with the latest AnyConnect client and ASA software Version 9.1(3)* or later.

How does OGS work?

A simple Internet Control Message Protocol (ICMP) ping request does not work because many Cisco Adaptive Security Appliance (ASA) firewalls are configured to block ICMP packets in order to prevent discovery. Instead, the client sends three HTTP/443 requests to each headend that appears in a merge of all profiles. These HTTP probes are referred to as OGS pings in the logs, but, as explained earlier, they are not ICMP pings. In order to ensure that a (re)connection does not take too long, OGS selects the previous gateway by default if it does not receive any OGS ping results within seven seconds. (Look for OGS ping results in the log.)
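
The seven-second fallback described above can be sketched as follows. This is a minimal illustration, not AnyConnect code: the function name `pick_gateway` and the shape of `ping_results` (gateway name mapped to RTT in milliseconds, containing only results that arrived within the timeout) are assumptions made for the example.

```python
from typing import Mapping, Optional

OGS_RESULT_TIMEOUT_S = 7.0  # timeout quoted in the text


def pick_gateway(ping_results: Mapping[str, float],
                 previous_gateway: Optional[str]) -> Optional[str]:
    """Return the gateway to use for this (re)connection.

    ping_results holds only the OGS ping results that arrived within
    OGS_RESULT_TIMEOUT_S (hypothetical structure: gateway -> RTT in ms).
    """
    if not ping_results:
        # No OGS ping results in time: fall back to the previous gateway.
        return previous_gateway
    # Otherwise prefer the gateway with the lowest RTT.
    return min(ping_results, key=ping_results.get)
```

With an empty result set, the sketch returns the previous gateway; with results, it returns the gateway that has the lowest RTT.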

Note: AnyConnect should send a plain HTTP request to port 443, because what matters is that a response is received, not that the response is successful. Unfortunately, the fix for proxy handling sends all requests as HTTPS. See Cisco bug ID CSCtg38672 - OGS should ping with HTTP requests.

Note: If there are no headends in the cache, AnyConnect first sends one HTTP request in order to determine if there is an authentication proxy, and if it can handle the request. It is only after this initial request that it begins the OGS pings in order to probe the server.

OGS determines the user location based on the network information, such as the Domain Name System (DNS) suffix and the DNS server IP address. The RTT results, along with this location, are stored in the OGS cache.

OGS location entries are cached for 14 days. Cisco bug ID CSCtk66531 was filed to make these settings user-configurable.
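
As a rough illustration of the 14-day lifetime, the check below treats a location entry as usable until 14 days have passed since it was first cached. The function name and the use of `datetime` values are assumptions for the example; the real client stores its cache in the preferences file.

```python
from datetime import datetime, timedelta

OGS_CACHE_LIFETIME = timedelta(days=14)  # lifetime quoted in the text


def cache_entry_is_fresh(first_cached: datetime, now: datetime) -> bool:
    """True while the location entry can still be used instead of re-probing."""
    return now - first_cached < OGS_CACHE_LIFETIME
```

For example, an entry cached on January 1 is still used on January 10, but OGS runs again on January 16.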

OGS is not run again from this location until 14 days after the location entry is first cached. During this time, it uses the cached entry and the RTTs determined for that location. This means that when AnyConnect starts again, it does not perform OGS again; instead, it uses the optimal gateway order in the cache for that location. In the Diagnostic AnyConnect Reporting Tool (DART) logs, this message is seen:

RTT is determined with a TCP exchange to the Secure Sockets Layer (SSL) port of the gateway to which the user will try to connect as specified by the host entry in the AnyConnect profile.

Note: Unlike the HTTP-ping, which does a simple HTTP post and then displays the RTT and the result, OGS computations are slightly more complicated. AnyConnect sends three probes for each server, and calculates the delay between the HTTP SYN that it sends out and the FIN/ACK for each of these probes. It then uses the lowest of the deltas in order to compare the servers and make its selection. So, even though HTTP-pings are a fairly good indication of which server AnyConnect will choose, the two results might not necessarily match. There is more information about this in the rest of the document.
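
The selection rule in the note (take the lowest of the three deltas per server, then compare servers) can be sketched as shown here. The function name `rank_gateways` and the input layout are hypothetical; the probe values are made-up sample numbers.

```python
from typing import Dict, List, Tuple


def rank_gateways(probe_deltas_ms: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Rank gateways by the lowest SYN-to-FIN/ACK delta among their probes.

    probe_deltas_ms maps a gateway to the deltas (ms) measured for its probes.
    Gateways without any completed probe are left out of the ranking.
    """
    best = {gw: min(deltas) for gw, deltas in probe_deltas_ms.items() if deltas}
    return sorted(best.items(), key=lambda item: item[1])


ranking = rank_gateways({
    "gw1.cisco.com": [310, 302, 350],
    "gw2.cisco.com": [140, 132, 200],
})
optimal = ranking[0][0]  # gateway with the lowest best-of-three delta
```

Because only the best of the three probes counts, a single slow probe does not penalize a gateway, which is one reason a one-off HTTP-ping can disagree with the OGS result.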

Currently, OGS only runs the checks if the user comes out of a suspend, and the threshold has been exceeded. OGS does not connect to a different ASA if the ASA the user is connected to crashes or becomes unavailable. OGS contacts only the primary servers in the profile in order to determine the optimal one.

Once the OGS client profile is downloaded, when the user restarts the AnyConnect client, the option to select other profiles will be grayed out as shown here:

Even if the user machine has multiple other profiles, the user cannot select any of them until OGS is disabled.

OGS Cache

Once the calculation is finished, the results are stored in the preferences_global file. In the past, there have been issues in which this data was not stored in the file.

Location Determination

OGS caching works on a combination of the DNS domain and the individual DNS server IP addresses. It works as follows:

Location A has a DNS domain of locationa.com, and two DNS server IP addresses - ip1 and ip2. Each domain/IP combination creates a cache key that points to an OGS cache entry. For example:

locationa.com|ip1 -> ogscache1

locationa.com|ip2 -> ogscache1

If AnyConnect then connects to a physically-different network, the same buildup of domain/IP combinations is created and checked against the cached list. If there are any matches at all, that OGS cache value is used, and the client is still considered to be at location A.
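
The location matching described above can be sketched as follows. The `domain|ip` key format follows the example in the text; the dictionary-based cache and the function names are assumptions for illustration only.

```python
from typing import Dict, Iterable, List, Optional


def cache_keys(dns_domain: str, dns_servers: Iterable[str]) -> List[str]:
    """Build one cache key per domain/DNS-server combination."""
    return [f"{dns_domain}|{ip}" for ip in dns_servers]


def find_location(cache: Dict[str, str], dns_domain: str,
                  dns_servers: Iterable[str]) -> Optional[str]:
    """Return the cached OGS entry if any domain/IP combination matches."""
    for key in cache_keys(dns_domain, dns_servers):
        if key in cache:
            return cache[key]
    return None


cache = {
    "locationa.com|ip1": "ogscache1",
    "locationa.com|ip2": "ogscache1",
}
```

A network that reports `locationa.com` with DNS servers `ip2` and `ip9` still matches `ogscache1`, because a single matching combination is enough.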

Failure Scenarios

Here are some failure scenarios users might encounter:

When Connectivity to the Gateway is Lost

When OGS is used, if connectivity to the gateway to which the users are connected is lost, then AnyConnect connects to the servers in the backup server list and not to the next OGS host. The order of operations is as follows:

OGS contacts only the primary servers in order to determine the optimal one.

Once determined, the connection algorithm is:

Attempt to connect to the optimal server.

If that fails, try the optimal server’s backup server list.

If that fails, try each server that remains in the OGS selection list, ordered by its selection results.
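
The connection order above can be expressed as a short sketch. The function name, the ranked list of primary servers, and the per-server backup mapping are hypothetical structures chosen for the example.

```python
from typing import Dict, List


def connection_order(ranked_primaries: List[str],
                     backup_lists: Dict[str, List[str]]) -> List[str]:
    """Return the servers in the order in which connections are attempted.

    ranked_primaries: primary servers, best OGS result first.
    backup_lists: backup server list for each primary server (may be missing).
    """
    if not ranked_primaries:
        return []
    optimal, remaining = ranked_primaries[0], ranked_primaries[1:]
    # 1. optimal server, 2. its backup list, 3. the rest of the OGS ranking.
    return [optimal] + list(backup_lists.get(optimal, [])) + remaining


order = connection_order(
    ["gw2.cisco.com", "gw1.cisco.com", "gw3.cisco.com"],
    {"gw2.cisco.com": ["gw2-backup.cisco.com"]},
)
```

Note that the backup list of the optimal server is tried before the second-best OGS host, which matches the behavior described for a lost gateway.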

Note: When the administrator configures the backup server list, the current profile editor only allows the administrator to enter the Fully Qualified Domain Name (FQDN) for the backup server, and not the user-group as is possible for the primary server. Cisco bug ID CSCud84778 has been filed in order to correct this. As a workaround, enter the complete URL in the host address field for the backup server, which should work: https://<ip-address>/usergroup.

Resume After a Suspend

In order for OGS to run after a resume, AnyConnect must have had a connection established when the machine was put to sleep. OGS after a resume is only performed after the network environment test occurs, which is meant to confirm that network connectivity is available. This test includes a DNS connectivity subtest.

However, if the DNS server drops type A requests with an IP address in the query field, as opposed to replying with "name not found" (the more common case, always encountered during tests), then Cisco bug ID CSCti20768 "DNS query of type A for IP address, should be PTR to avoid timeout" applies.

TCP Delayed-ACK Window Size Selects Incorrect Gateway

When ASA versions earlier than Version 9.1(3) are used, the captures on the client show a persistent delay in the SSL handshake. What is noticed is that the client sends its ClientHello, then the ASA sends its ServerHello. This is normally followed by a Certificate message (optional Certificate Request) and ServerHelloDone message. The anomaly is two-fold:

The ASA does not immediately send the Certificate message after the ServerHello. The client window size is 64,860 bytes, which is more than enough to hold the entire response from the ASA.

The client does not ACK the ServerHello immediately, so the ASA retransmits the ServerHello after ~120ms, at which point the client ACKs the data. Then the Certificate message is sent. It is almost as though the client waits for more data.

This happens because of the interaction between TCP slow-start and TCP delayed-ACK. Prior to ASA Version 9.1(3), the ASA uses a slow-start window size of 1, whereas the Windows client uses a delayed-ACK value of 2. This means that the ASA only sends one data packet until it gets an ACK, but it also means that the client does not send an ACK until it receives two data packets. The ASA times out after 120ms and retransmits the ServerHello, after which the client ACKs the data and the connection continues. This behavior was changed by Cisco bug ID CSCug98113 so that the ASA uses a slow start window size of 2 by default instead of 1.

This can impact OGS calculation when:

Different gateways run different ASA versions.

Clients have different delayed-ACK window sizes.

In such situations, the delay introduced by the delayed-ACK could be sufficient to cause the client to select the wrong ASA. If this value differs between the client and the ASA, there could still be problems. In such situations, the workaround is to adjust the Delayed Acknowledgements window size.
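
To see how this can skew the selection, consider the simplified model below. It assumes that the full 120 ms retransmission delay is added whenever the ASA slow-start window is smaller than the client delayed-ACK value, and nothing otherwise. The function names and sample RTTs are illustrative assumptions, not measured values.

```python
RETRANSMIT_DELAY_MS = 120  # retransmission delay quoted in the text


def handshake_penalty_ms(server_slow_start_window: int,
                         client_ack_frequency: int) -> int:
    """Extra delay when the server sends fewer segments than the client
    waits for before it ACKs (simplified model)."""
    if server_slow_start_window < client_ack_frequency:
        return RETRANSMIT_DELAY_MS
    return 0


def observed_rtt_ms(network_rtt_ms: int, server_slow_start_window: int,
                    client_ack_frequency: int) -> int:
    """RTT as the probe would see it under this model."""
    return network_rtt_ms + handshake_penalty_ms(
        server_slow_start_window, client_ack_frequency)


# A nearer gateway with window size 1 versus a farther one with window size 2,
# both probed by a client that ACKs every second segment.
near_old = observed_rtt_ms(100, server_slow_start_window=1, client_ack_frequency=2)
far_new = observed_rtt_ms(150, server_slow_start_window=2, client_ack_frequency=2)
```

Under these assumptions the nearer gateway appears slower (220 ms versus 150 ms), so the farther gateway would be selected. Setting the client value to 1 removes the penalty in this model.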

Windows

Start the Registry Editor.

Identify the GUID of the interface on which you want to disable the delayed-ACK. In order to do this, navigate to: HKEY_LOCAL_MACHINE > SOFTWARE > Microsoft > Windows NT > CurrentVersion > NetworkCards > (number). Look at each number listed under NetworkCards. On the right-hand side, the Description should list the Interface (for example, Intel(R) Wireless WiFi Link 5100AGN) and the ServiceName should list the corresponding GUID.

Locate and then click this registry subkey: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\<Interface GUID>

On the Edit menu, point to New, and then click DWORD Value.

Name the new value TcpAckFrequency, and assign it a value of 1.

Quit Registry Editor.

Restart Windows for this change to take effect.
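
As an alternative to manual edits in the Registry Editor, the same value could be prepared as a .reg file. The sketch below only builds the file text; the function name and the GUID shown are placeholders, and the interface GUID must still be identified as described in the steps above.

```python
def tcp_ack_frequency_reg(interface_guid: str) -> str:
    """Return .reg file text that sets TcpAckFrequency to 1 for one interface.

    interface_guid is the ServiceName value found under NetworkCards,
    braces included (placeholder in the example below).
    """
    key = ("HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services"
           "\\Tcpip\\Parameters\\Interfaces\\" + interface_guid)
    return ("Windows Registry Editor Version 5.00\r\n"
            "\r\n"
            f"[{key}]\r\n"
            '"TcpAckFrequency"=dword:00000001\r\n')


# Placeholder GUID for illustration only.
reg_text = tcp_ack_frequency_reg("{00000000-0000-0000-0000-000000000000}")
```

The resulting text can be saved to a file with a .reg extension and imported on the client. A restart of Windows is still required for the change to take effect.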

Note: Cisco bug ID CSCum19065 has been filed to make TCP tuning parameters configurable on the ASA.

Typical User Example

The most common use case is a user at home. When OGS runs for the first time, it records the DNS settings and the OGS ping results in the cache (which defaults to a 14-day timeout). When the user returns home the next evening, OGS detects the same DNS settings, finds them in the cache, and skips the OGS ping test. Later, when the user goes to a hotel or restaurant that offers Internet service, OGS detects different DNS settings, runs the OGS ping tests, selects the best gateway, and records the results in the cache.

The processing is identical when it resumes from a suspended or hibernated state, if the OGS and AnyConnect resume settings allow for it.

Troubleshoot OGS

Step 1. Clear the OGS Cache in Order to Force a Reevaluation

In order to clear the OGS cache and reevaluate the RTT for available gateways, simply delete the Global AnyConnect Preferences file from the PC. The location of the file varies based on the Operating System (OS):

Windows Vista and Windows 7

C:\ProgramData\Cisco\Cisco AnyConnect Secure Mobility Client\preferences_global.xml

Note: In older client versions, it used to be stored in C:\ProgramData\Cisco\Cisco AnyConnect VPN Client


Linux

/opt/cisco/anyconnect/.anyconnect_global

Note: With older versions of the client, it used to be /opt/cisco/vpn..
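
Where the file must be removed on several test machines, the deletion can be scripted. This is a minimal sketch: the helper name `clear_ogs_cache` is hypothetical, the paths are the defaults quoted above, and the script assumes that it runs with sufficient privileges to delete the file.

```python
from pathlib import Path

# Default locations quoted in the text; older client versions used other paths.
WINDOWS_PREFS = Path(
    r"C:\ProgramData\Cisco\Cisco AnyConnect Secure Mobility Client"
    r"\preferences_global.xml")
LINUX_PREFS = Path("/opt/cisco/anyconnect/.anyconnect_global")


def clear_ogs_cache(preferences_file: Path) -> bool:
    """Delete the global preferences file.

    Returns True if the file was removed and False if it did not exist.
    """
    try:
        preferences_file.unlink()
        return True
    except FileNotFoundError:
        return False
```

For example, `clear_ogs_cache(LINUX_PREFS)` removes the file on a Linux client. Because the whole global preferences file is deleted, any other settings stored in it are lost as well.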

Step 2. Capture the Server Probes During the Connection Attempt

Start Wireshark on the test machine.

Start a connection attempt on AnyConnect.

Stop the Wireshark capture once the connection is complete.

Tip: Since the capture is only used in order to test OGS, it is best to stop the capture as soon as AnyConnect selects a gateway. It is best to not go through a complete connection attempt, because that can cloud the packet capture.

Step 3. Verify the Gateway Selected by OGS

In order to verify why OGS selected a particular gateway, complete these steps:

Initiate a new connection.

Run AnyConnect DART:

Launch AnyConnect, and click Advanced.

Click Diagnostics.

Click Next.

Click Next.

Examine the DART results found in the newly created DartBundle_XXXX_XXXX.zip file on the desktop.

Navigate to Cisco AnyConnect Secure Mobility Client > AnyConnect.txt.

Note the time the OGS probes started for a particular server from this DART log:

Server Address RTT (ms)
gw1.cisco.com 302
gw2.cisco.com 132 <========= As seen, 132 was the lowest delay of the three probes from the previous DART log
gw3.cisco.com 506
gw4.cisco.com 877

Selected 'gw2.cisco.com' as the optimal server.

******************************************
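
When many DART bundles must be checked, the RTT table can be extracted with a small script. The sketch below assumes that each table row has the form "server RTT" on its own line, as in the excerpt above; the exact layout of AnyConnect.txt can differ between client versions, so treat the pattern as an assumption.

```python
import re
from typing import Dict

# A row is a server name followed by an integer RTT in milliseconds.
ROW = re.compile(r"^(\S+)\s+(\d+)\b")


def parse_rtt_table(log_text: str) -> Dict[str, int]:
    """Collect server -> RTT (ms) from the OGS result table in a DART log."""
    results = {}
    for line in log_text.splitlines():
        match = ROW.match(line.strip())
        if match:
            results[match.group(1)] = int(match.group(2))
    return results


sample = """Server Address RTT (ms)
gw1.cisco.com 302
gw2.cisco.com 132
gw3.cisco.com 506
gw4.cisco.com 877
Selected 'gw2.cisco.com' as the optimal server.
"""
rtts = parse_rtt_table(sample)
lowest = min(rtts, key=rtts.get)
```

The server with the lowest parsed RTT should agree with the "Selected ... as the optimal server" line in the same log.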

Step 4. Validate the OGS Calculations Run by AnyConnect

Inspect the capture for the TCP/SSL probes used in order to calculate RTT. See how long the HTTPS request takes over a single TCP connection. Each probe request should use a different TCP connection. In order to do this, open the capture in Wireshark, and repeat these steps for each of the servers:

Use the ip.addr filter in order to isolate the packets sent to each of the servers into their own capture. In order to do this, navigate to Edit, and select Mark All Displayed Packets. Then navigate to File > Save As, select the Marked packets only option, and click Save:

In this new capture, navigate to View > Time Display Format > Date and Time of Day:

Identify the first HTTP SYN packet in this capture that was sent when the OGS probe was sent based on the DART logs as identified in Step 3.3.2. It is important to remember that, for the first server, the first HTTP request is not a server probe. It is easy to mistake the first request for a server probe, and thus arrive at values completely different from what OGS reports. This problem is highlighted here:

In order to more easily identify each of the probes, right-click the HTTP SYN for the first probe, and then select Colorize Conversation as shown here:

Repeat this process for the SYNs on all of the probes. As shown in the previous image, the first two probes are depicted in different colors. The advantage of colorizing the TCP conversations is to easily spot retransmissions or other such oddities per probe.

In order to change the time display, navigate to View > Time Display Format > Seconds Since Epoch:

Select Milliseconds, because that is the level of precision that OGS uses.

Calculate the time difference between the HTTP SYN and the FIN/ACK, as shown in the diagram of Step 4. Repeat this process for each of the three probes, and compare the values to those shown in the DART logs in Step 3.3.3.
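
The arithmetic in the last step can be sketched as follows. The timestamps are passed as strings copied from the Wireshark time column (Seconds Since Epoch), which avoids floating-point rounding. The function names, the sample timestamps, and the 1 ms tolerance are assumptions for the example.

```python
from decimal import Decimal
from typing import List


def probe_rtt_ms(syn_epoch: str, fin_ack_epoch: str) -> int:
    """Delta between the SYN and the FIN/ACK of one probe, in whole ms."""
    return int((Decimal(fin_ack_epoch) - Decimal(syn_epoch)) * 1000)


def matches_dart(probe_rtts_ms: List[int], dart_rtt_ms: int,
                 tolerance_ms: int = 1) -> bool:
    """Compare the lowest of the probe deltas with the RTT in the DART log."""
    return abs(min(probe_rtts_ms) - dart_rtt_ms) <= tolerance_ms


# Made-up timestamps for the three probes sent to one server.
deltas = [
    probe_rtt_ms("1700000000.100", "1700000000.240"),
    probe_rtt_ms("1700000000.300", "1700000000.432"),
    probe_rtt_ms("1700000000.500", "1700000000.700"),
]
```

In this example the deltas are 140, 132, and 200 ms, so the lowest value (132 ms) is the one to compare against the DART log entry for that server.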

Analysis

If the RTT values calculated from the captures match the values seen in the DART logs, but it still seems like the wrong gateway is being selected, then it is due to one of two problems:

There is an issue on the headend. If this is the case, there might be too many retransmissions from one particular headend, or any other such oddities seen in the probes. A closer analysis of the exchange is required.

There is a problem with the Internet Service Provider (ISP). If this is the case, there might be fragmentation or large delays seen for one particular headend.

Q&A

Q: Does OGS work with load-balancing?

A: Yes. OGS is only aware of the cluster master name, and uses that in order to judge the nearest headend.

Q: Does OGS work with the proxy settings defined in the browser?

A: OGS does not support auto proxy or Proxy Auto-Config (PAC) files, but does support a hard-coded proxy server. When auto proxy or a PAC file is configured, OGS operation does not occur. The relevant log message is: "OGS will not be performed because automatic proxy detection is configured."