Tools for Troubleshooting IP Communication Issues

In the previous two Network Troubleshooter installments, we've demonstrated troubleshooting techniques for TCP/IP networks using various utilities. Let's look briefly at each of these utilities and what they can be used for when troubleshooting IP networking issues.

The PING.NLM and TPING.NLM Utilities

These utilities simply generate an ICMP Echo request to verify that the address you are "pinging" is alive. Use them to check whether local or remote hosts are reachable by typing the following at the server console prompt:

PING <IP address> <Enter> or TPING <IP address> <Enter>

The difference between PING and TPING is that the PING utility can send a continuous stream of ICMP Echo requests. With PING you can also modify the size of the requests (useful for troubleshooting fragmentation issues) and the timeout between sends.
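To make the role of the request size concrete, here is a minimal Python sketch of what an ICMP Echo request looks like on the wire. It is not how PING.NLM is implemented; the function names and the payload pattern are illustrative assumptions. The header layout (type 8, code 0, checksum, identifier, sequence) and the checksum method follow the ICMP specification.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_echo_request(identifier: int, sequence: int, payload_size: int) -> bytes:
    """Build an ICMP Echo request (type 8, code 0) with a payload of the given size."""
    payload = bytes(i & 0xFF for i in range(payload_size))  # arbitrary filler pattern
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence)
    return header + payload

# A 1473-byte payload plus 8 bytes of ICMP header and 20 bytes of IP header
# gives a 1501-byte datagram, one byte too large for a 1500-byte MTU.
packet = build_echo_request(identifier=0x1234, sequence=1, payload_size=1473)
print(len(packet))
```

Raising the payload size one step past what the path can carry is what forces fragmentation, which is why an adjustable request size is useful when you suspect a fragmentation problem.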

There is also a PING command that you can run from the DOS prompt of a Windows workstation to ping servers on the network. To run the command, type the following:

PING <IP address> <Enter>

at the C:\ prompt. You will see information similar to what you see when pinging at the server console.

The IPTRACE.NLM Utility

This NLM traces the route a packet takes from its source to its destination, reporting the address of each intermediate router. When communication fails, IPTRACE can indicate the router at which the problem occurs. Once you have identified the problem router, check its routing table for more information. Chances are there is a missing entry for either the source or destination network, or for a default route.
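The reasoning behind a route trace can be sketched with a small simulation. The following Python example does not send any packets and is not how IPTRACE works internally; the router names, the "DIRECT" marker, and the routing tables are all hypothetical. It follows a destination hop by hop and reports the router whose table has no matching entry.

```python
import ipaddress

def next_hop(table, dest):
    """Longest-prefix match; returns the next hop name, or None if no route matches."""
    addr = ipaddress.ip_address(dest)
    best = None
    for prefix, hop in table.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, hop)
    return None if best is None else best[1]

def trace(routers, start, dest, max_hops=30):
    """Return (path, failing_router); failing_router is None when delivery succeeds."""
    path = [start]
    current = start
    for _ in range(max_hops):
        hop = next_hop(routers[current], dest)
        if hop is None:
            return path, current  # this router has no route and no default route
        if hop == "DIRECT":
            return path, None     # destination network is locally attached
        path.append(hop)
        current = hop
    return path, current          # hop limit reached, which suggests a routing loop

# Hypothetical three-router network. R3 has no default route and no route to 10.1.0.0/16.
ROUTERS = {
    "R1": {"10.1.0.0/16": "DIRECT", "0.0.0.0/0": "R2"},
    "R2": {"10.2.0.0/16": "DIRECT", "10.1.0.0/16": "R1", "10.3.0.0/16": "R3"},
    "R3": {"10.3.0.0/16": "DIRECT", "10.2.0.0/16": "R2"},
}

print(trace(ROUTERS, "R1", "10.3.0.5"))  # request reaches the 10.3.0.0/16 network
print(trace(ROUTERS, "R3", "10.1.0.9"))  # reply fails at R3: no route back to the source
```

In this made-up network the request arrives but the reply cannot return, which illustrates why a missing entry for the source network matters as much as one for the destination network.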

The TCPCON.NLM Utility

The TCPCON utility provides a large amount of information that may help you troubleshoot communication and routing problems (see Figure 1).

Figure 1:
The TCPCON utility's initial screen.

The following examples give you an idea of what to look for.

IP Address Translation (ARP Cache Table)

Verify that the MAC entries corresponding to IP addresses are correct. Note that the ARP cache timeout for entries in this table is 5 minutes, so entries may disappear from the table after that time. Incorrect entries here will usually result in connectivity problems on the local subnet.

Refrain from using static (permanent) ARP entries.
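As an illustration of the two checks above (entries age out after 5 minutes, and live entries should match the real hardware addresses), here is a small Python model of an ARP cache. The class, method names, and addresses are hypothetical and do not reflect how the server stores its table; timestamps are passed in explicitly so the behavior is easy to follow.

```python
ARP_TIMEOUT_SECONDS = 5 * 60  # the 5-minute timeout described above

class ArpCache:
    """Toy ARP cache: maps an IP address to (MAC address, time learned)."""

    def __init__(self, timeout=ARP_TIMEOUT_SECONDS):
        self.timeout = timeout
        self._entries = {}

    def learn(self, ip, mac, now):
        self._entries[ip] = (mac.lower(), now)

    def lookup(self, ip, now):
        """Return the cached MAC, or None if the entry is missing or has expired."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, learned_at = entry
        if now - learned_at >= self.timeout:
            del self._entries[ip]  # aged out, so a new ARP request would be needed
            return None
        return mac

def find_mismatches(cache, expected, now):
    """Compare live cache entries against a known-good IP-to-MAC list."""
    wrong = {}
    for ip, good_mac in expected.items():
        cached = cache.lookup(ip, now)
        if cached is not None and cached != good_mac.lower():
            wrong[ip] = (cached, good_mac.lower())
    return wrong

cache = ArpCache()
cache.learn("10.1.0.5", "00:A0:C9:11:22:33", now=0)
cache.learn("10.1.0.6", "00:A0:C9:44:55:66", now=0)
known_good = {"10.1.0.5": "00:a0:c9:11:22:33", "10.1.0.6": "00:a0:c9:aa:bb:cc"}
print(find_mismatches(cache, known_good, now=60))   # 10.1.0.6 is reported
print(cache.lookup("10.1.0.5", now=301))            # expired after 5 minutes
```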

Routing Table

Make sure that a default route exists. Without a default route, no communication outside the local subnet is likely to happen.

Check for host (not network) entries in the table. If they exist, check whether the protocol associated with these entries is ICMP. These ICMP-generated entries are likely to be temporary and a direct result of an ICMP discard or path MTU algorithm event; they may point to possible routing problems on the network. If you see such ICMP routing entries, look at the destination and the next hop and verify whether the routes are correct by referencing a map of the network.
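The two routing table checks above can be expressed as a short Python sketch. It assumes the table has been copied by hand into a list of records; the field names ("destination", "next_hop", "protocol") and the sample routes are hypothetical and are not a TCPCON export format.

```python
import ipaddress

def audit_routes(routes):
    """Return (has_default_route, icmp_host_routes) for a list of route records."""
    has_default = False
    icmp_host_routes = []
    for route in routes:
        net = ipaddress.ip_network(route["destination"])
        if net.prefixlen == 0:
            has_default = True  # 0.0.0.0/0 is the default route
        elif net.prefixlen == net.max_prefixlen and route["protocol"].upper() == "ICMP":
            icmp_host_routes.append(route)  # host entry learned through ICMP
    return has_default, icmp_host_routes

# Hypothetical routing table with one ICMP-generated host entry and no default route.
table = [
    {"destination": "10.1.0.0/16", "next_hop": "10.1.0.1", "protocol": "LOCAL"},
    {"destination": "10.2.0.0/16", "next_hop": "10.1.0.254", "protocol": "RIP"},
    {"destination": "10.2.7.9/32", "next_hop": "10.1.0.253", "protocol": "ICMP"},
]

has_default, suspicious = audit_routes(table)
print("default route present:", has_default)
for route in suspicious:
    print("verify against the network map:", route["destination"], "via", route["next_hop"])
```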

IP Statistics

In the Available Options window, select the Statistics/IP options, then use the "More IP Statistics" field to verify whether fragmentation and reassembly are occurring on a regular basis. The most important field to note is the fragmentation/reassembly detection failures field, as such fragmentation will impact performance.
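To see why regular fragmentation costs performance, it helps to work out how a single datagram is split. The Python sketch below applies the standard IP fragmentation rule that every fragment except the last carries a payload that is a multiple of 8 bytes. The function name is illustrative, and the 20-byte header is an assumption (an IP header with no options).

```python
def fragment_sizes(total_length, mtu, header_len=20):
    """Return a list of (offset_in_8_byte_units, payload_bytes, more_fragments)."""
    payload = total_length - header_len
    if payload < 0 or mtu < header_len + 8:
        raise ValueError("datagram or MTU too small")
    if total_length <= mtu:
        return [(0, payload, False)]  # fits in one piece, no fragmentation
    max_payload = (mtu - header_len) // 8 * 8  # round down to a multiple of 8
    fragments = []
    offset = 0
    while offset < payload:
        size = min(max_payload, payload - offset)
        more = offset + size < payload
        fragments.append((offset // 8, size, more))
        offset += size
    return fragments

# A 4000-byte datagram crossing a link with a 1500-byte MTU becomes three fragments.
for frag in fragment_sizes(4000, 1500):
    print(frag)
```

Each fragment carries its own header and must be reassembled at the destination, so a steady stream of oversized datagrams multiplies both the packet count and the work done at the receiving end.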

Check the 'local errors' entries in the Incoming and Outgoing Discarded Datagram fields to make sure that the values are not incrementing when the communication problems occur. Incrementing values may point to filtering issues, a lack of resources available to process packets, or routing loops (check the ICMP Time Exceeded Messages entry as well). If the values continue to increment, use the "Set Tcp Ip Debug = 1" command together with TID 10023725 to identify where the problem may be.
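What matters with these counters is whether they change while the problem is happening, not their absolute values. A minimal way to track that is to note the values twice and compare them, as in this Python sketch. The counter names are hypothetical labels for values copied by hand from the statistics screen.

```python
def incrementing_counters(before, after):
    """Return {name: increase} for every counter that grew between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in before
        if name in after and after[name] > before[name]
    }

# Hypothetical snapshots taken a few minutes apart while the problem is occurring.
snapshot_1 = {"incoming_discards_local_errors": 12, "outgoing_discards_local_errors": 3,
              "icmp_time_exceeded": 40}
snapshot_2 = {"incoming_discards_local_errors": 12, "outgoing_discards_local_errors": 57,
              "icmp_time_exceeded": 95}

print(incrementing_counters(snapshot_1, snapshot_2))
```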

ICMP Statistics

The Destination Unreachable Messages entry.
Under the Available Options window, select the Statistics/ICMP options, then look at the Destination Unreachable Messages entry. This is a generic message type that breaks down into host, network, or port unreachable messages. If this value is incrementing consistently, take a LANalyzer trace to verify which type of unreachable message is being generated. From this, look at the host, network, or port on the remote host to see why the problem exists.

If it is a "Port unreachable" message type (see Figure 2), figure out what application it is trying to communicate with, and why it is not listening to the incoming packets.

Figure 2:
Viewing the Destination Unreachable Message entry.

If it is a "Network unreachable" message type, look at the routing table of the host generating this packet and find out why it doesn't have an entry for the remote network.

If it is a "Host unreachable" message type, the chances are that the local router has issued an ARP request for the IP address of the node but no response to that request was received. It could simply be that the IP address you are trying to communicate with is down.
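In a trace, the three cases above are distinguished by the code field of the ICMP header: Destination Unreachable is ICMP type 3, with code 0 for network, 1 for host, and 3 for port unreachable. The Python sketch below reads those two bytes from a captured ICMP message. The function name and the sample bytes are illustrative; the type and code values come from the ICMP specification.

```python
import struct

UNREACHABLE_CODES = {
    0: "Network unreachable",
    1: "Host unreachable",
    2: "Protocol unreachable",
    3: "Port unreachable",
    4: "Fragmentation needed and DF set",
}

def classify_unreachable(icmp_message: bytes):
    """Return a description for a Destination Unreachable message, else None."""
    if len(icmp_message) < 8:
        raise ValueError("ICMP header is at least 8 bytes")
    icmp_type, code = struct.unpack("!BB", icmp_message[:2])
    if icmp_type != 3:
        return None  # not a Destination Unreachable message
    return UNREACHABLE_CODES.get(code, f"Other (code {code})")

# Sample header bytes: type 3, code 3, then checksum and unused fields zeroed for brevity.
sample = bytes([3, 3, 0, 0, 0, 0, 0, 0])
print(classify_unreachable(sample))
```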

ICMP Redirects.
Servers send ICMP redirect packets to notify other IP hosts that they should not use a specific gateway to route IP traffic to certain destinations. The server receives ICMP redirect packets from other IP hosts that detect that the server is sending packets over inappropriate gateways. If the redirect value received is high, it may indicate that the default router you're pointing to is not the correct one, or that you have a routing problem in your network.

The Time Exceeded Messages entry.
This entry increases for two reasons. First, the TTL (Time To Live) field in the IP header has dropped to 0 and the packet was discarded.

With the default TTL setting of 128, it's likely that the packet has been "ping-ponged" between different hosts. If this ICMP message count is high, take a LANalyzer trace to help you locate the ICMP message and see which host generated it and why.

Note: Running the IPTRACE.NLM (mentioned above) can cause the Time Exceeded field to increment. When troubleshooting IP routing problems, ensure the value increments on its own without IPTRACE running.

Second, a large IP packet has been fragmented and not rebuilt within a certain period of time. Again, use a LANalyzer trace to verify the reason for the Time Exceeded message. If the messages result from fragmentation problems, use the Set Always Allow IP Fragmentation=OFF parameter to turn fragmentation off and enable the path MTU algorithm.
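The first cause, a TTL that runs down to 0, can be illustrated with a short Python sketch of a packet caught between routers that keep forwarding it to each other. The router names and the function are hypothetical; the only rule modeled is that each router decrements the TTL by one and discards the packet when it reaches 0.

```python
def hops_until_expiry(initial_ttl, loop):
    """Return (hops_taken, discarding_router) for a packet circulating in a loop."""
    if not loop:
        raise ValueError("loop must name at least one router")
    ttl = initial_ttl
    hops = 0
    while True:
        router = loop[hops % len(loop)]
        ttl -= 1  # each router decrements the TTL before forwarding
        if ttl <= 0:
            # this router discards the packet and reports Time Exceeded
            return hops + 1, router
        hops += 1

# With a starting TTL of 128, a packet bouncing between two routers
# is handled 128 times before it is discarded.
print(hops_until_expiry(128, ["R1", "R2"]))
```

A packet on a healthy path rarely crosses more than a few dozen routers, so a starting TTL of 128 that still expires is a strong hint that the packet is circulating rather than travelling a long route.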

The Source Quench Messages entry.
If a system is overloaded and cannot process an IP packet due to lack of resources, it will send back a Source Quench ICMP message. This may occur occasionally, in which case the messages may be ignored. If the Source Quench count is high, however, and continuously increasing, take a LANalyzer trace to figure out which host is sending these ICMP messages, and consider adding memory to the servers and increasing the minimum packet receive buffers for the LAN card.

* Originally published in Novell AppNotes
