How to Calculate Network Availability?

Is the application down or not available? Is the root cause of the problem the network or the backend systems?

I am sure that everybody understands the difference between uptime and availability; however, I still see these terms used incorrectly as synonyms. Think about troubleshooting a remote application issue: you have to determine whether the application itself is down or whether the network is rendering the service inaccessible to remote users.

Uptime

Uptime refers to the amount of time that a server, cloud service, or other machine has been powered on and working properly. This metric is expressed in units of time, from seconds and minutes up to days, months, and years. For example, Unix-like systems provide the uptime command, and most network equipment offers an equivalent; typical output looks like this:

user@unix# uptime
 10:28:24 up 16 days,  1:24,  1 user,  load average: 0.16, 0.03, 0.01

switch# show version | include uptime
switch uptime is 2 weeks, 2 days, 2 hours, 30 minutes

Availability

Availability is the percentage of time, within a specific time interval, during which a server, cloud service, or other machine can be used for the purpose it was designed and built for. The formula most commonly used to calculate availability is the following:

Availability (%) = (Uptime / Total Time) * 100

Where

Total Time = Downtime + Uptime
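
As a quick sanity check, the formula can be expressed in a few lines of Python. This is only a minimal sketch: the function name availability_percent and the sample figures (a 30-day month with 43.2 minutes of downtime) are illustrative assumptions, not output from any monitoring tool.

```python
def availability_percent(uptime_seconds: float, downtime_seconds: float) -> float:
    """Return availability as a percentage of the total observed time."""
    # Total Time = Downtime + Uptime
    total_seconds = uptime_seconds + downtime_seconds
    if total_seconds <= 0:
        raise ValueError("total observed time must be positive")
    return uptime_seconds / total_seconds * 100


# Illustrative figures (assumed): a 30-day month with 43.2 minutes of downtime.
month_seconds = 30 * 24 * 3600
downtime = 43.2 * 60
print(f"{availability_percent(month_seconds - downtime, downtime):.3f}%")  # prints 99.900%
```

Taking uptime and downtime as separate arguments mirrors the definition of Total Time above and avoids any ambiguity about the length of the measurement interval.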

With this formula we can derive the maximum amount of downtime that a service can suffer and still meet its Service Level Agreement (SLA):

Availability    Downtime per year    Downtime per month (30 days)
99.999%         5.26 minutes         25.9 seconds
99.995%         26.28 minutes        2.16 minutes
99.99%          52.56 minutes        4.32 minutes
99.95%          4.38 hours           21.6 minutes
99.9%           8.76 hours           43.2 minutes
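
Downtime budgets like these follow directly from rearranging the formula: allowed downtime = (1 - availability) * period. The sketch below shows the arithmetic, assuming a 365-day year and a 30-day month; the period lengths and the name downtime_budget_seconds are assumptions made for illustration.

```python
def downtime_budget_seconds(availability_percent: float, period_seconds: float) -> float:
    """Return the maximum downtime, in seconds, allowed within the period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return (1 - availability_percent / 100) * period_seconds


YEAR_SECONDS = 365 * 24 * 3600   # assumption: 365-day year
MONTH_SECONDS = 30 * 24 * 3600   # assumption: 30-day month

for sla in (99.999, 99.995, 99.99, 99.95, 99.9):
    per_year = downtime_budget_seconds(sla, YEAR_SECONDS) / 60
    per_month = downtime_budget_seconds(sla, MONTH_SECONDS) / 60
    print(f"{sla}%: {per_year:.2f} min/year, {per_month:.2f} min/month")
```

A different month length (for example 365/12 days) shifts the monthly figures slightly, so it is worth stating which convention an SLA uses.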

Ideally, most enterprises (and cloud service providers) aim to achieve five nines (99.999%); in reality, few of them meet this goal. Keep in mind that five nines leaves only about 26 seconds of downtime per month, so generating five-nines reports of network availability requires a tool that measures with one-second accuracy. NetBeez can monitor a specific resource at intervals as short as one second with PING. The central server also generates availability reports for each agent.
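
To see why the probe interval matters, consider a rough sketch that computes availability from a series of probe results. This is not how NetBeez or any particular tool works internally; the function availability_from_probes and both sample series are hypothetical, and each probe is simply assumed to stand for one equal slice of time.

```python
def availability_from_probes(results: list[bool]) -> float:
    """Percentage of probes that succeeded; each probe stands for one interval."""
    if not results:
        raise ValueError("at least one probe result is required")
    return sum(results) / len(results) * 100


# Hypothetical day of one-second probes that catches a 3-second outage.
one_second = [True] * (86_400 - 3) + [False] * 3
# The same day polled once a minute, where no probe lands inside the outage.
one_minute = [True] * 1_440

print(f"1 s probes:  {availability_from_probes(one_second):.4f}%")
print(f"60 s probes: {availability_from_probes(one_minute):.4f}%")
```

In this example the one-minute series reports 100% because the short outage falls between probes, while the one-second series records it.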

Network availability

While the formula to calculate network availability is fairly simple, it can be tricky to determine what should be included in the calculation.

Let’s take the case of a medium-size enterprise that has several remote offices, each with its own Internet connection. Each office connects back to the headquarters via a VPN tunnel. Users at each remote office need to access a mix of internal applications across the VPN tunnel and external services via the local Internet connection.

What is the network availability value if the VPN tunnel is down, but not the Internet connection? How does this value change if the Internet connection goes down?

We could calculate network availability based on the uptime of the Internet connection alone, because if the Internet goes down, both internal and external applications are affected. However, this is not realistic: a failure of the VPN tunnel alone would still affect access to internal applications, but not to external ones (a partial outage). In this case, network availability can be calculated as the weighted average of the availability of the Internet connection and the availability of the VPN tunnel:

Network Availability = Weight(Internet) * Availability(Internet) + Weight(VPN) * Availability(VPN)

The two weights in the formula should be based on the percentage of work that depends on each specific resource. These values should be discussed and agreed on with the business architect.

For example, let’s assume that 80% of the business applications used by the users are external (Internet) and that the remaining 20% are internal (VPN). If, over a one-year interval, the Internet connection never failed while the VPN tunnel was unavailable for 1 day, then the overall network availability for that location is:

Network Availability = 80% * 100% + 20% * (99.726%) = 99.945%
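
The same calculation can be reproduced in code, which makes it easy to try other weights or to add more links. This is a minimal sketch under the assumptions of the example above (80/20 split, a 365-day year, one day of VPN downtime); the function name weighted_availability is illustrative.

```python
def weighted_availability(components: list[tuple[float, float]]) -> float:
    """components: (weight, availability_percent) pairs; weights must sum to 1."""
    total_weight = sum(weight for weight, _ in components)
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weight * availability for weight, availability in components)


internet = 100.0               # Internet connection never failed during the year
vpn = (365 - 1) / 365 * 100    # VPN tunnel down for 1 day out of 365

result = weighted_availability([(0.8, internet), (0.2, vpn)])
print(f"{result:.3f}%")  # prints 99.945%
```

Rejecting weights that do not sum to 1 guards against accidentally counting a share of the workload twice or leaving part of it out.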

I hope this article has not caused too much of a headache. I would really like to hear from you about how you measure and monitor your network availability.