I can see from the 5 minute output rate that I'm transmitting 140Mbps of traffic, but that's averaged over the last five minutes. So it doesn't tell me what's happening right now, and it's no better than Cacti et al.

I could enter the interface command load-interval 30 to bring the load calculation interval down to 30 seconds, at which point the txload and rxload values become more accurate.
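For reference, the configuration is a one-liner (a sketch assuming classic IOS and a hypothetical interface name; on most IOS platforms 30 seconds is the minimum load-interval):

```
interface GigabitEthernet0/1
 load-interval 30
```

After this, "show interface GigabitEthernet0/1" reports txload/rxload and the input/output rates averaged over 30 seconds instead of the default 5 minutes.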

Again though, if I need to find out which link is maxed out right now, I find it difficult to believe that, for all the amazing things Cisco routers and switches can do, they can't tell me the current tx/rx rate for an interface at this very moment.

I understand a sampling period may be required for such a figure to be attainable, but what's wrong with 1 second? Surely the CPU demand isn't that high; it's just counting the number of packets that pass every second and their sizes?

Has anyone developed their own way of working this out? Are others haunted by the same conundrum?

UPDATE: I should have given better context for this, so I will try to do so now.

A typical situation where I need to know an interface's throughput right now is when a customer who has a 100Mbps port rings up and says "Hey, I'm downloading X from your mirror server but it's only transferring at 20Mbps". I want to be able to see their switch port throughput right now, whilst they're on the phone to me, to confirm this (non-tech customers often report incorrect values in my experience).

So, in this scenario I can confirm whether the customer's port is already receiving 100Mbps, in which case they only get 20Mbps because their port is at capacity, or whether they are even transferring at anything like the speed they claim (whether above or below it). Next, I will want to see the throughput of the router that switch is terminated on for all those customers, as this is another potential bottleneck. I can also check the switch port of the mirror server whilst the transfer is in progress.

I don't want to respond to the customer with "OK, well make sure you continue to download for another 5 minutes so I can wait for $NMS_OF_CHOICE to poll the interfaces"; that's not an acceptable answer in the customer's eyes. I can give many more scenarios, but essentially, complaining customers are the top priority :)

How is the throughput at any given second going to be of any use to you? Network traffic tends to be variable. If you have it set to 1 second, and you miss the one-second interval in which a particular bit of traffic happened, then what have you learned?
– Zoredache, Feb 14 '12 at 22:03

Hi Zoredache, I have updated my post. I hope this will make it clear now :)
– jwbensley, Feb 17 '12 at 9:50

4 Answers

I'm not a switch designer, so I can't say what the cost would be of monitoring at this frequency.

What I can say is that I have run into situations where even a second isn't always enough, because you can overrun the buffers in periods shorter than a second. So if you want to know whether a link is maxed out, I recommend looking at the dropped packets (you can monitor these via SNMP as well). If you are dropping packets (a few here and there is okay, but a lot is not good) then you are demanding more than the interface can handle. This can also happen on your servers before the packets even hit the switch. The precise rate of dropped packets generally isn't important, but if they keep increasing with each "show interface" you are likely in a bad place.
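As a minimal sketch of that "keep increasing" check (assuming you are already collecting a discard counter such as SNMP ifOutDiscards, however you obtain it):

```python
def drops_increasing(samples, threshold=0):
    """Return True if the drop counter grew by more than `threshold`
    packets in every interval between successive samples.

    `samples` is a list of (timestamp, discard_counter) tuples, e.g.
    polled from ifOutDiscards. A counter that climbs on every poll is
    the "keep increasing" pattern that signals an over-driven link.
    """
    deltas = [b[1] - a[1] for a, b in zip(samples, samples[1:])]
    return bool(deltas) and all(d > threshold for d in deltas)
```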

In terms of Cacti, that is not a limit of the switch or of SNMP. SNMP records the octets (bytes) sent and received as ever-increasing counters, so if you poll every second, you get per-second resolution. The way it works is that the timestamp of each sample is taken along with the current count; the difference is then expressed in units of "per second", but with 5-minute polling it is really a per-5-minute rate converted to, or expressed as, a per-second figure.
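That counter-to-rate conversion can be sketched as follows (a hypothetical helper; ifInOctets/ifOutOctets are 32-bit counters on older gear, so a wrap between polls has to be handled):

```python
COUNTER32_MAX = 2**32  # 32-bit SNMP octet counters wrap at this value

def bits_per_second(prev_octets, prev_time, curr_octets, curr_time,
                    counter_max=COUNTER32_MAX):
    """Convert two SNMP octet-counter samples into an average bit rate.

    The resolution is whatever the polling interval is: poll every
    300 s and you get a 5-minute average expressed per second; poll
    every second and you get per-second resolution.
    """
    delta = curr_octets - prev_octets
    if delta < 0:              # counter wrapped between samples
        delta += counter_max
    elapsed = curr_time - prev_time
    return delta * 8 / elapsed
```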

If you try polling SNMP every second though, you had better watch your CPU.

Your interface is transmitting and receiving at full gig speed. It doesn't actually transmit anything at 140Mbps; that's just what it averages out to over the interval. Real-time traffic utilization would be useless to you as a human reader because it would constantly flip back and forth between sending/receiving at 100% and 0%. If your concern is how quickly you can identify a network problem then I would suggest what @Kyle-Brandt has said above: dropped packets are the best indicator of an over-utilized link.

Primarily, I want to identify bottlenecks when troubleshooting reports of slow speeds between points X and Y in the network. It's not a continuous measurement; I understand how that wouldn't be useful. Thanks for your insight though, and good idea regarding dropped packets for over-utilized links, but I don't just want to know when a link is at capacity; I want to know its current utilization level.
– jwbensley, Feb 17 '12 at 9:53

Polling network interfaces via SNMP every 5 seconds is not something administrators should often be doing as part of a permanent monitoring solution. However, for ad-hoc, point-in-time monitoring, there are a few cases where a sub-60-second polling interval can be useful.

Understanding polling intervals -- as they go both up and down -- is paramount to being able to interpret the data coming out of the tools.

As a fictitious example (but a concept seen many times), an interface that registers 90% utilization over a 5 second interval may not lead to end-user perceived problems -- however, that same interface at 50% utilization over a 60 second interval may in fact lead to end-user perceived problems.

The error in thinking most administrators make is assuming that 50% utilization over a 60 second interval is somehow "less than" 90% utilization over a 5 second interval. It is not "less than" -- and it is not "greater than". The short answer is that the utilization figures cannot be compared as if they were equivalent, because the intervals are different.

To dive a little deeper -- and show how extremes can affect the math -- it would be possible for the interface to operate at 100% utilization for a full 30 seconds -- then go silent for 30 seconds -- and the utilization over a 60 second interval would still be 50%. During the 30 seconds of 100% utilization, end-user applications may have experienced enough packet loss and/or delay to time out or break, displaying an error message.

Compare that with 90% utilization over a 5 second interval: a case where, even if the interface were at 100% utilization for 4.5 seconds and then silent for 0.5 seconds -- resulting in 90% utilization over a 5 second interval -- the packet loss and/or delay may not be enough to cause the end-user application to react just yet.

The above are completely fictitious examples -- however, the concept has been witnessed many times. Cogently assessing over-subscription/over-utilization of an interface relies on knowing the monitoring tools, understanding the monitoring/polling intervals, interpreting the output of those tools, and understanding the behavior of the applications in use.
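The arithmetic behind those interval examples is easy to check: utilization over an interval is simply the fraction of the interval the link spent busy, which is why very different burst patterns can yield the same headline figure.

```python
def utilization(busy_seconds, interval_seconds):
    """Fraction of a polling interval the link spent at line rate."""
    return busy_seconds / interval_seconds

# 30 s flat-out then 30 s idle reads as 50% over a 60 s interval,
# while 4.5 s flat-out in a 5 s interval reads as 90% -- yet the
# first pattern is far more likely to break end-user applications.
```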

Thanks Weaver, some really good points here that I hadn't thought of! I have posted my own answer below; perhaps I will factor these statistics into my script as well, so that rather than just giving throughput, it gives drops and discards etc., for stats that more accurately reflect what's going on right now. Good idea! :)
– jwbensley, Feb 17 '12 at 9:56

I have written an expect script to poll the bytes sent/received every second for me, from the output of the "show int xxx" command. The difference between each second's samples is the traffic through the interface.
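For comparison, here is a rough Python sketch of the same idea (the regex and the sample figures assume the classic IOS "show interface" counter lines, e.g. "300091 packets input, 27211215 bytes"; how you fetch the output -- expect, SSH, SNMP -- is up to you):

```python
import re

# Cumulative byte counters from "show interface" output; unlike the
# "5 minute input rate" lines, these are raw totals that suit
# per-second diffing.
BYTES_RE = re.compile(r"\d+ packets (input|output), (\d+) bytes")

def parse_counters(show_int_output):
    """Return (input_bytes, output_bytes) scraped from show interface text."""
    counters = {m.group(1): int(m.group(2))
                for m in BYTES_RE.finditer(show_int_output)}
    return counters["input"], counters["output"]

def rates_bps(prev, curr, elapsed=1.0):
    """Average (rx, tx) bit rates between two (in, out) byte samples."""
    return tuple((c - p) * 8 / elapsed for p, c in zip(prev, curr))
```

Capture the command output once per second, run it through parse_counters, and feed successive pairs of results to rates_bps to get the interface's rx/tx rates in bits per second.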