Advanced Network Troubleshooting: Using traceroute

This discussion is a continuation of our series about network troubleshooting. You can read our feature on Basic Network Troubleshooting here. In this article, we focus on troubleshooting connectivity problems by examining the output produced by the traceroute command.

The traceroute command lists all the router hops between your server and the target server. Checking this list helps you verify whether the routing over the networks in between is correct. All operating systems carry some form of router-path tracing utility. Linux distributions, for example, have tracepath and traceroute6 (for IPv6; equivalent to traceroute -6), while Windows has tracert and PathPing (Windows NT).

This is how traceroute works:

It sends an ICMP or UDP packet with a time-to-live (TTL) of “1” to the target server.

The first router on the path decrements the TTL to “0,” recognizes that the TTL has expired, and drops the packet. At the same time, this router sends an Internet Control Message Protocol (ICMP) time-exceeded message back to the source.

traceroute then records the IP address of the router that sent the ICMP message, as this is the first “hop” on the path to the final server destination.

traceroute repeats the same action but uses a TTL of “2” this time. The first hop reads this packet, decrements its TTL to “1,” and forwards it to the next hop on the path. The second router then decrements the TTL to “0,” drops the packet, and returns its own time-exceeded message, just as the first router did earlier.

This continues until the final or target server is reached.
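The TTL mechanism described above can be illustrated with a small simulation. This is only a sketch: it does not send any packets, the function name simulate_traceroute is made up for this example, and the path is a hypothetical list of addresses whose last entry stands for the target server.

```python
from typing import List


def simulate_traceroute(path: List[str], max_hops: int = 30) -> List[str]:
    """Return the addresses that would answer probes sent with TTL 1, 2, 3, ...

    `path` is a hypothetical ordered list of devices; the last entry is the
    target server, every entry before it is a router.
    """
    if not path:
        return []
    routers, target = path[:-1], path[-1]
    discovered = []
    for ttl in range(1, max_hops + 1):
        remaining = ttl
        responder = target  # the target answers if no router expires the probe
        for router in routers:
            remaining -= 1  # each router decrements the TTL by 1
            if remaining == 0:
                responder = router  # this router sends ICMP time-exceeded
                break
        discovered.append(responder)
        if responder == target:
            break  # the trace is complete
    return discovered


if __name__ == "__main__":
    # Made-up addresses from documentation and private ranges.
    for hop, address in enumerate(
        simulate_traceroute(["10.0.0.1", "192.0.2.1", "198.51.100.7"]), start=1
    ):
        print(hop, address)
```

Each pass of the outer loop corresponds to one probe: the probe with TTL 1 is answered by the first router, the probe with TTL 2 by the second, and so on until the target itself replies.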

You will, of course, only receive responses from functioning machines. Simply put, if a device responds during your troubleshooting, it is not likely to be the source of the connectivity problem.

Use the following syntax to generate traceroute reports:

# traceroute [destination_host]
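If you want to collect traces from a script, the same command can be launched from Python. The sketch below assumes a Linux-style traceroute binary is on the PATH; the helper names build_traceroute_command and run_traceroute are invented for this example, and the optional -n (numeric output) and -m (maximum hops) flags are the only ones it uses.

```python
import shutil
import subprocess
from typing import List, Optional


def build_traceroute_command(
    host: str, max_hops: Optional[int] = None, numeric: bool = False
) -> List[str]:
    """Build the argument list for a traceroute call (hypothetical helper)."""
    command = ["traceroute"]
    if numeric:
        command.append("-n")  # print addresses without DNS lookups
    if max_hops is not None:
        command += ["-m", str(max_hops)]  # stop after this many hops
    command.append(host)
    return command


def run_traceroute(host: str, **options) -> str:
    """Run traceroute and return its raw text output."""
    if shutil.which("traceroute") is None:
        raise FileNotFoundError("traceroute is not installed or not on PATH")
    result = subprocess.run(
        build_traceroute_command(host, **options),
        capture_output=True,
        text=True,
        check=False,  # an incomplete trace is still useful output
    )
    return result.stdout
```

Passing the arguments as a list rather than a single shell string avoids quoting problems if the host name comes from user input.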

Below is an example of a traceroute output for a query on google.com. Note that all the hop times are less than 50 milliseconds (ms), which is generally considered an acceptable response time.

* * *

The expected 5-second response time was exceeded. The delay could be caused by one of the following:

A router on the path is not sending back the ICMP time-exceeded messages.

A router or firewall in the path is blocking the ICMP time-exceeded messages.

The target IP address is not responding.

!H, !N, or !P

The host, network, or protocol is not reachable.

!X or !A

An administrator-imposed setting is blocking the traceroute packets, which means that either a router Access Control List (ACL) or a firewall is in the way.

!S

The source route has failed as traceroute attempts to use a certain path. A certain router security setting might be causing this failure.
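When many traces need to be reviewed, the symbols listed above can be looked up automatically. The following is a minimal sketch: the ANNOTATIONS mapping simply restates the list above, explain_hop_line is a hypothetical helper, and the sample lines in the usage part are invented for illustration rather than taken from a real trace.

```python
import re
from typing import Dict, List

# Meanings restated from the list above.
ANNOTATIONS: Dict[str, str] = {
    "*": "no reply within the timeout",
    "!H": "host unreachable",
    "!N": "network unreachable",
    "!P": "protocol unreachable",
    "!X": "administratively prohibited",
    "!A": "administratively prohibited",
    "!S": "source route failed",
}

# Match a symbol only when it stands alone between whitespace.
TOKEN = re.compile(r"(?<!\S)(\*|![HNPXAS])(?!\S)")


def explain_hop_line(line: str) -> List[str]:
    """Return the distinct meanings of the symbols found on one hop line."""
    meanings: List[str] = []
    for token in TOKEN.findall(line):
        meaning = ANNOTATIONS[token]
        if meaning not in meanings:
            meanings.append(meaning)
    return meanings


if __name__ == "__main__":
    # Invented sample lines, not real output.
    print(explain_hop_line(" 7  * * *"))
    print(explain_hop_line(" 9  192.0.2.1  12.3 ms !H  11.9 ms !H"))
```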

Performing bidirectional traces

Always trace from both directions: from the source IP to the target IP, and from the target IP to the source IP. Routes are often asymmetric, which means they take one path in one direction and a different path in the return direction. Trace the route both ways to pinpoint a problem more accurately.

Tracing via looking glass

A lot of Internet service providers (ISPs) provide a facility to do a traceroute from dedicated servers called looking glasses. As these looking glasses are in various locations, you can trace whether the connectivity issue you are experiencing stems from your web server or from the ISP being used.

You can do a quick web search for the term “Internet looking glass” to get a long list of alternatives. You can also go to traceroute.org, which already lists looking glasses by country.

Time-exceeded false alarm

If traceroute does not get a response within a 5-second timeout interval, three asterisks (see table above) appear beside that hop:

Note that some devices block the UDP probes that traceroute sends by default on Linux but allow ICMP packets. To get around this, add the -I flag to the traceroute command so that it uses ICMP echo packets instead. See the change below after the -I flag was used:

This is not an outright indication of congestion, latency, or packet loss. If those issues were really happening, then all the hops past 7 would have been problematic as well. What the trace result above actually says is that the devices on hops 6 and 7 were just slow to respond with ICMP TTL-exceeded messages. Remember that many Internet routing devices give very low priority to packets related to trace utilities so they can give more bandwidth to other, more lucrative traffic.
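The reasoning above (a slow or silent hop only matters if the hops after it are also affected) can be expressed as a simple rule. This is a rough sketch under stated assumptions: classify_hops is a hypothetical helper, each list entry is one hop's round-trip time in milliseconds with None for a timeout, the 50 ms threshold is borrowed from the rule of thumb earlier in this article, and the sample values are invented.

```python
from typing import List, Optional


def classify_hops(
    rtts_ms: List[Optional[float]], threshold_ms: float = 50.0
) -> List[str]:
    """Label each hop as 'ok', 'likely false alarm', or 'investigate'.

    A slow or silent hop is treated as a false alarm when at least one
    later hop still answers within the threshold.
    """
    labels = []
    for index, rtt in enumerate(rtts_ms):
        if rtt is not None and rtt <= threshold_ms:
            labels.append("ok")
            continue
        later_ok = any(
            later is not None and later <= threshold_ms
            for later in rtts_ms[index + 1:]
        )
        labels.append("likely false alarm" if later_ok else "investigate")
    return labels


if __name__ == "__main__":
    # Invented values: hops 6 and 7 look bad, but hops 8 and 9 are fine.
    sample = [5.0, 8.0, 12.0, 15.0, 18.0, 210.0, None, 22.0, 25.0]
    for hop, label in enumerate(classify_hops(sample), start=1):
        print(hop, label)
```

A hop labeled "investigate" is not proof of a fault; it only means that nothing further along the path contradicts the bad reading.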

Request timeout before reaching target server

If the trace times out before the target server is reached, the cause may be one of the following scenarios:

A server has a bad default gateway.

The server is running a firewall that blocks traceroute.

The server is either shut down, disconnected from the network, or has an incorrectly configured network interface controller (NIC).

In the example below, the last device that responded to traceroute is a router that acts as the default gateway of the server. Remember that the problem, in this instance, is not with the router but with the server as traceroute only receives responses from functioning devices.

A ping to 162.219.X.X gave a TTL timeout message. Usually, this event only happens if there is a routing loop wherein the packet bounces between two routers on the way to the target server. Each bounce makes the TTL decrease by a count of “1” until it reaches “0,” at which point the ping request times out.

The mentioned routing loop was confirmed when a traceroute was done and the packet was seen bouncing between routers 12.34.56.78 and 12.34.56.79:

The routers with IPs 12.34.56.78 and 12.34.56.79 had their routing processes reset to solve the problem. Further investigation showed that the issue was set off by an unstable network link that caused frequent routing recalculations. The constant activity eventually corrupted the routing tables of one of the routers.
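A loop like this one shows up in a trace as the same addresses repeating. The sketch below checks a list of hop addresses for the first repetition; find_routing_loop is a hypothetical helper, None stands for a hop that timed out, and the first address in the usage example is invented (the other two are the routers from the case above). It assumes that any repeated address indicates a loop, which is a simplification.

```python
from typing import Dict, List, Optional


def find_routing_loop(hops: List[Optional[str]]) -> Optional[List[str]]:
    """Return the cycle of addresses that repeats, or None if none repeats."""
    first_seen: Dict[str, int] = {}
    for index, address in enumerate(hops):
        if address is None:
            continue  # timed-out hop, nothing to compare
        if address in first_seen:
            start = first_seen[address]
            # Everything between the two sightings forms the loop.
            return [a for a in hops[start:index] if a is not None]
        first_seen[address] = index
    return None


if __name__ == "__main__":
    trace = ["10.0.0.1", "12.34.56.78", "12.34.56.79", "12.34.56.78", "12.34.56.79"]
    print(find_routing_loop(trace))
```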

Reasons for failed traceroutes

There are several possible reasons a traceroute fails to reach the target server:

The traceroute packets are blocked or rejected by a router in the path. Usually, the router immediately after the last visible hop is the one causing the blockage. Check the routing table and the status of this device.

The target server does not exist on the network, which means it is either disconnected or turned off. Note that !H or !N messages are likely to appear.

The network where you are expecting the target host to be in does not exist in the routing table of one of the routers in the path. Note that !H or !N messages are likely to appear.

The wrong IP address is used for the target server.

There is a routing loop where packets bounce between two routers and never reach the target destination.

The packets do not have a proper return path to your server. The router immediately after the last visible hop is where the routing changes. If this occurs, do the following steps:

Log on to the last visible router.

Look at the routing table to know where the next hop should be.

Log on to this next hop router.

Do a traceroute from this router to your target server.

If the trace completes – The routing to the target server is working fine. Trace back to your source server and traceroute will probably fail at the bad router on the return path.

If the trace fails – Check the routing table and the status of all the hops between this router and your target destination.

Essentially, if nothing is blocking your traceroute packets, then the last visible router of an incomplete trace is either the last good router on the path or the last router with a valid return path to the server that issued the traceroute.
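Finding the last visible router is the starting point for the steps above, and it is easy to pick out of a parsed trace. This is a minimal sketch: summarize_trace is a hypothetical helper, the hop list uses None for hops that timed out, and the addresses in the usage example are invented.

```python
from typing import List, Optional, Tuple


def summarize_trace(
    hops: List[Optional[str]], target: str
) -> Tuple[bool, Optional[str]]:
    """Return (target_reached, last_visible_address) for one trace."""
    reached = target in hops
    # Walk backwards to find the last hop that actually answered.
    last_visible = next(
        (address for address in reversed(hops) if address is not None), None
    )
    return reached, last_visible


if __name__ == "__main__":
    # Invented trace that dies after the second hop.
    print(summarize_trace(["10.0.0.1", "192.0.2.1", None, None], "198.51.100.7"))
```

When the first value is False, the second value is the router to log on to first.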

The traceroute command is a very handy tool when troubleshooting network connectivity problems. Understanding it is crucial for every network administrator.