After writing about being prepared for an interview, I started covering the aspects of a technical interview. I also shared with you how I conduct my interviews and how important it is to have great troubleshooting skills if you want to work in IT.

So, starting today, I am going to share with you what I consider to be the basics of troubleshooting. I am going to start with a very simple scenario, and in the following blog posts I am going to explore more complex troubleshooting scenarios and technologies.

Disclaimer

The scenarios I am going to use for these posts will vary in complexity, and they may not exactly match what you come across in your organization.

If you are a seasoned network engineer, systems administrator or help desk engineer you may find these troubleshooting scenarios very simple.

Keep in mind that the sole objective of the examples I use here is to expose you to the basic methodology of troubleshooting.

In each scenario, I am going to emphasize the logical steps that I expect a candidate or support engineer to take when troubleshooting a particular problem. In most cases there will be more than one logical path that will help you identify the problem. That is OK. I just want to expose you to the tools, concepts, naming conventions and technologies that you should know to be successful in this career.

It’s Troubleshooting Time!

Welcome to Acme Corporation

I am going to use a fictitious company for our troubleshooting scenarios. Acme Corporation is a small company and it has a very basic IT infrastructure. The diagram below will be used to illustrate our troubleshooting scenarios.

1. Acme has a few users connected to their Local Area Network (LAN). For this scenario we are going to work with a laptop that has a fixed IP address of 192.168.1.200.

2. Acme has a Windows file server. Its IP address is 192.168.1.10 and its Fully Qualified Domain Name (FQDN) is app.acme.com.

3. In this very simplistic network, all devices are connected to a single network switch.

4. Acme’s router’s IP address is 192.168.1.1 and its FQDN is router.acme.com.

5. There is a firewall between Acme’s LAN and the Internet.

6. We are going to connect to Google’s servers. The IP address of the server we are going to use here is 173.194.115.18. Its FQDN is www.google.com.

7. The Internet is represented by a cloud.

8. One of Acme’s employees works from home.

Assumptions

For this first troubleshooting scenario, we are not going to take the firewall (5) into consideration. We are going to assume that all traffic from the LAN can reach the Internet (7) and vice versa.

We are not going to worry about protocols, PAT, NAT, etc.

Acme LAN users should be able to ping all devices on the LAN and WAN (Wide Area Network, in this case the Internet).

You should be able to troubleshoot all the issues on your own.

We are just going to identify where the problem may be located.

You are working from the same office as Acme’s user. Your PC, running Windows, is connected to the same switch.

For these scenarios, we are going to assume Google has only one server (173.194.115.18) and there are no routing issues on the WAN. I am not going to require you to know the traceroute command (tracert on Windows) yet.

Troubleshooting Case

For this case, assume that you are working as a Help Desk support engineer. You are at your desk, the telephone rings and you need to help the user on the other end. Here we go…

John, one of Acme’s employees, explains that last night he was working on a report for his boss. He was using Google to research data about widgets and everything was working great, but this morning, when he got to the office, he could not connect to Google’s search page at http://www.google.com. He tells you that he can connect to the XYZ application, which runs off the server app.acme.com. It’s 8 AM and he needs to get the report ready for a meeting at 10 AM. He needs your help.

So, what are you going to do first?

Before you start troubleshooting any issue, do the following:

Make sure you truly understand what the user told you. If necessary, ask more questions about the problem.

Write down all the important details, such as IP addresses, URLs, application and server names, and times when the events took place.

Let’s get started…

In this case, the user gave you some important information:

He was able to access www.google.com the night before.

He cannot reach the same URL this morning.

However, he can connect to Acme’s server on the LAN.
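One way to keep these details in front of you is to write them into a small structured note. The sketch below is only an illustration: the TroubleNotes class and its field names are hypothetical, and the values are the ones John gave us in this scenario.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TroubleNotes:
    """Hypothetical note-taking structure for a help desk call."""
    user: str
    symptom: str
    works: List[str] = field(default_factory=list)     # what the user can still reach
    fails: List[str] = field(default_factory=list)     # what the user cannot reach
    timeline: List[str] = field(default_factory=list)  # when each event took place


# Details taken from John's call in this scenario.
notes = TroubleNotes(
    user="John",
    symptom="Cannot connect to http://www.google.com",
    works=["app.acme.com (XYZ application)"],
    fails=["www.google.com"],
    timeline=[
        "Last night: www.google.com was working",
        "This morning: www.google.com cannot be reached",
        "10 AM: report is due for a meeting",
    ],
)

print(notes)
```

Having the working and failing targets side by side already hints at where to look: the LAN server answers, the Internet site does not.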

I would expect you to do the following:

Open a Command Prompt window (what many people still call a DOS window).

Type the following command: ping www.google.com. Ping is a great networking utility that can be used to test if a device can be reached.
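If you ever need to repeat this check for several hosts, the same ping command can be driven from a script. The following is a minimal sketch, not a required part of the exercise: the function names build_ping_command and ping are my own, and the script assumes the ping utility is available on the machine. It uses the -n count flag on Windows and -c elsewhere.

```python
import platform
import subprocess
from typing import List, Optional


def build_ping_command(host: str, count: int = 4,
                       system: Optional[str] = None) -> List[str]:
    """Build the ping command line for the current (or given) operating system."""
    system = system or platform.system()
    # Windows ping takes the packet count with -n; Linux and macOS use -c.
    count_flag = "-n" if system == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def ping(host: str, count: int = 4) -> bool:
    """Return True if the ping utility exits with status 0.

    The exit status is only a rough signal. Reading the actual
    replies, as we do by hand in this post, is more reliable.
    """
    try:
        result = subprocess.run(
            build_ping_command(host, count),
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ping is missing from the system, or it took too long to finish.
        return False
    return result.returncode == 0
```

Calling ping("www.google.com") from your PC would then return True or False, which is the scripted equivalent of reading the replies in the command window.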

Based on the picture above, what have we learned?

We know that Domain Name System (DNS) resolution is working. DNS translated the hostname www.google.com to the IP address 173.194.115.18.

We know that we cannot ping that IP address, as we are repeatedly getting the Request Timed Out message.
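The first point, name resolution, can also be checked on its own, separately from ping. Here is a small sketch using Python's standard socket module. The helper name resolve is hypothetical, and the address you get back on a real network will depend on your DNS server, so it may not match the one used in this post.

```python
import socket
from typing import Optional


def resolve(hostname: str) -> Optional[str]:
    """Return an IPv4 address for hostname, or None if it cannot be resolved."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        # socket.gaierror (a subclass of OSError) is raised when the lookup fails.
        return None


if __name__ == "__main__":
    address = resolve("www.google.com")
    if address is None:
        print("DNS resolution failed")
    else:
        print("DNS resolution works:", address)
```

If resolve returns None, you would look at DNS before anything else. If it returns an address, as it did in our case, name resolution is not the problem and you move on to reachability.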

What you can do next is ping another website or IP address on the Internet. For example, you could try to ping www.yahoo.com.

That worked. We can see the reply messages from 98.138.252.30.

We have just confirmed that we can reach the WAN; however, we cannot get to Google’s server at 173.194.115.18.

The conclusion for this first scenario should be the following:

The server we are trying to reach is unavailable at this time, or at least it is not answering our pings.

We confirmed that we could get to the Internet and reach another server (www.yahoo.com).
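The reasoning we just followed can be written down as a short decision function. This is only a sketch of the logic for this simplified scenario, under the assumptions listed earlier (no firewall, no routing issues). The function name locate_problem and the wording of the messages are my own.

```python
def locate_problem(dns_ok: bool, target_replies: bool, other_site_replies: bool) -> str:
    """Suggest where the problem may be located, based on three simple checks."""
    if not dns_ok:
        return "Name resolution failed: start with DNS"
    if target_replies:
        return "Target answers ping: basic reachability is not the problem"
    if other_site_replies:
        return "Internet is reachable: the problem is most likely at the target server"
    return "Nothing on the Internet answers: check the local connection and the router"


# This scenario: DNS worked, www.google.com timed out, www.yahoo.com replied.
print(locate_problem(dns_ok=True, target_replies=False, other_site_replies=True))
```

Note that the function only tells you where to look next. It does not replace the judgment you build by working through cases like this one.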

What’s Next?

Next week I am going to present you with a variation of the scenario detailed above. You may want to read about Traceroute and Nslookup, as I am going to use these two networking utilities in the next troubleshooting case.