Load Average versus CPU Utilization explained

We support customers on both Unix and Windows based systems. We like to put metrics for these systems into our Cacti monitoring system, especially performance based values. Here is an explanation I provided to a customer after we recently deployed a Linux based system for their MySQL database alongside their ASP.NET based web app:

—

On unix based systems, the Load Average metric is based on the number of processes that have asked the kernel for cpu time and are either running or waiting for a cpu to become available. (Linux also counts processes stuck in uninterruptible waits, typically on disk I/O.)

Ideally, you want your hardware to be powerful enough that your Load Average stays below 1.0 per cpu, meaning that when a process asks for cpu time it gets it without having to wait. It’s called an average because the kernel reports it averaged over the past 1 minute, past 5 minutes and past 15 minutes, and those three values are what you are seeing in that graph.
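
As a rough illustration of those three values, here is a minimal sketch that parses a line in the layout Linux exposes at /proc/loadavg. The sample line is made up, and `parse_loadavg` and `LoadAverage` are names I picked for the example, not part of Cacti or any other tool.

```python
from typing import NamedTuple


class LoadAverage(NamedTuple):
    one_min: float
    five_min: float
    fifteen_min: float


def parse_loadavg(line: str) -> LoadAverage:
    """Parse the first three fields of a /proc/loadavg-style line."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 fields, got: {line!r}")
    return LoadAverage(float(fields[0]), float(fields[1]), float(fields[2]))


# Made-up sample in the /proc/loadavg layout:
# 1-min, 5-min and 15-min averages, running/total tasks, last PID.
sample = "0.42 0.80 1.35 2/345 12345"
load = parse_loadavg(sample)
print(f"1 min: {load.one_min}  5 min: {load.five_min}  15 min: {load.fifteen_min}")
```

On a live Unix system, Python's `os.getloadavg()` returns the same three numbers without any parsing.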

This is a completely different perspective from the graph of percent CPU utilization that we have set up for the Windows system.

It is possible to have 100% cpu utilization on a system but a load average of 1.0 or less. In this scenario, there is only one process asking for time and it is using all the cpu it can. The Load Avg stays at 1.0 or less because no other threads/processes need the cpu during that period.

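The more cpus a system has, the more processes can concurrently have access to cpu time before any of them has to wait, even if those processes are maxing out their individual cpus. The Load Average itself is not divided by the number of cpus, so a load of 4.0 on a four-cpu system is roughly comparable to 1.0 on a single-cpu system.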

The more powerful the cpus, the quicker a process completes its task before the next one gets its turn, so the “run queue” empties sooner, keeping the Load Avg lower.
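
To hold the number up against the "below 1.0" rule of thumb on a multi-cpu box, one option is to divide the load average by the cpu count. This is a minimal sketch: `load_per_cpu` and `is_overloaded` are names made up for the example, and the 1.0 cutoff is an assumed threshold rather than a hard limit.

```python
import os


def load_per_cpu(load_avg: float, cpu_count: int) -> float:
    """Divide a load average by the number of cpus."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg / cpu_count


def is_overloaded(load_avg: float, cpu_count: int, threshold: float = 1.0) -> bool:
    """True when the per-cpu load exceeds the (assumed) threshold."""
    return load_per_cpu(load_avg, cpu_count) > threshold


# Same raw load, very different meaning depending on the cpu count.
print(load_per_cpu(4.0, 4), is_overloaded(4.0, 4))
print(load_per_cpu(4.0, 1), is_overloaded(4.0, 1))

# On a live Unix system the inputs could come from the standard library.
if hasattr(os, "getloadavg"):
    try:
        one_min, _, _ = os.getloadavg()
        cpus = os.cpu_count() or 1  # cpu_count() can return None
        print(f"1-min load per cpu: {load_per_cpu(one_min, cpus):.2f}")
    except OSError:
        print("load average not available on this system")
```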

Remote Desktop Authentication controls

I was setting up a Mac mini running Leopard for a client and I was tweaking some of the options and controls for “Remote Management”. I had connected in with Remote Desktop Admin using an admin level account and wanted to tighten up some of the controls for who was authorized to connect. Using this screen here:

I clicked on the “Only these users” button, without having put the account I was connected as into the list, and got dropped from the connection. Be warned that all of these controls take IMMEDIATE effect. Thanks Apple.

Re: Service Reliability

An email exchange I had with a very smart colleague regarding how one defines “Reliability”, specifically in relation to Active Directory, of which I admit to knowing very little, so the discussion mostly centers on philosophical perspectives.

On 8/18/07 12:48 AM, “Wm.” wrote:

>
> So what you are saying is the protocol definition makes provision for
> what the client is supposed to do in the attempt to hide server
> outages from the user?

Yes. If that is what the protocol is designed to do.

> Do the following protocols define the service
> reliability as a function of the client’s ability to find a working
> server? (HTTP, NTP, WebDAV, AFP, SMB, FTP, IKE, VNC, IPP, ARD, IMAP,
> POP, SMTP, LDAP, Kerberos)

Not that I know of:
WebDAV, AFP, SMB, FTP, VNC, IPP, ARD, IMAP, POP

Yes:
NTP

Maybe:
LDAP, Kerberos

I don’t know:
IKE

Kind of:
HTTP, SMTP

> AD is a summation of LDAP, Kerberos and some proprietary mechanisms
> that Microsoft stacks on top. The actual ‘service’ is either LDAP or
> Kerberos depending upon the request.

OK. That’s more than I knew before, having never looked at what AD is.

> It is kinda like going to Avis and renting a car. They put 30 on the
> lot and it is the client’s job to find the car ‘service’ that works.
> This alleviates the Avis ‘server’ from responsibility for having
> working ‘services’. As long as one car starts, no problem. Half of
> them may not start but it is not a problem unless all of them don’t
> start?

Not a very good analogy, or merely one that just fits to bolster your point.

You are trying to take your particular definition of “service reliability”
and apply it in a one-size-fits-all manner.

The true answer (like most things in life) is: It Depends

Some protocols are merely information request oriented protocols, like DNS
or NTP. These protocols tend to have methods for dealing with inaccessible
sources of data.

Some are data access/transaction oriented protocols (any file sharing
protocol, mail). These protocols tend not to have “redundancy”, as
mirroring a data exchange transaction across physical servers is
difficult: it is harder to replicate that data across those physical
systems.

Some protocols have a pseudo redundancy in them. HTTP can redirect a
requestor to a different source to complete the data exchange. SMTP can give
a temporary deferment for data exchange.

As for AD: From what little I know of it, it is mostly an information request
system (where am I, can I get an auth token using these credentials, is this
system authorized to access me, etc) and that data can, for the most part,
be replicated easily across systems, much like DNS is replicated across
servers to serve out the same answers for a single question.

So a better analogy for the Avis world would be something like:

Are there multiple sources for me to get an answer to the Ultimate Question
of Life, The Universe and Everything:

“Dude, Where’s My Car?”

If none of the agents can answer the question, then yeah, the service sucks.

One thing I sent to Bill offline that I’ll include here for the benefit of
anyone else that is still reading:

—

If the protocol in question has to use BROADCAST traffic in order to
discover redundant systems to query then I consider that to be a poor design
choice for redundancy.

—

> On Aug 6, 2007, at 3:08 PM, Brian Blood wrote:
>
>> On 8/6/07 12:20 PM, “Wm.” wrote:
>>
>>> Does anyone recall coming across a white paper relative to measuring
>>> service reliability and collecting metrics on such.
>>>
>>> I am in a discussion with Active Directory admins who insist that if
>>> an AD client can root around and find a working server, then their
>>> service reliability metric is 100%. My stance is that service
>>> reliability is measured not by the workaround that the client
>>> performs but the availability of the service at the server’s point of
>>> presence (aka domain name).
>>
>>
>> I think you are dealing in semantics here.
>>
>> Look at DNS for an example.
>>
>> With most systems, a domain name is handled by two dns servers.
>>
>> If one of these is down, then the other covers traffic that would
>> have been down.
>>
>> This redundancy is part of the dns protocol.
>>
>> While the DNS Service as a whole would have 100% reliability,
>> because of how the protocol is designed, the reliability of the
>> specific server would not be 100%.
>>
>>
>> So, the answer, as usual, is: it depends on what system you are
>> analyzing.
>>
>> If an AD client as part of the AD protocol can look for multiple
>> servers to auth against, then the reliability of the AD SERVICE on
>> that network will be measured as a whole.
>>
>>
>> In short, I think you are wrong.
>> 🙂
>>
>>
>> Brian
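
Looking back at the DNS example, the gap between the reliability of one server and the reliability of the service can be put into rough numbers. This is a minimal sketch, assuming each server fails independently of the others and that the client tries every server before giving up. Both are simplifying assumptions, and the 99% figure and the function name are made up for illustration.

```python
def service_availability(server_availability: float, servers: int) -> float:
    """Chance that at least one of `servers` independent servers is up."""
    if not 0.0 <= server_availability <= 1.0:
        raise ValueError("server_availability must be between 0 and 1")
    if servers < 1:
        raise ValueError("servers must be at least 1")
    # The service is only down when every server is down at the same time.
    return 1.0 - (1.0 - server_availability) ** servers


# Hypothetical figure: each server is reachable 99% of the time.
for count in (1, 2, 3):
    print(f"{count} server(s): {service_availability(0.99, count):.4%}")
```

Under those assumptions, two servers that are each up 99% of the time give a service that is up about 99.99% of the time. That is close to 100% for the service as a whole while each individual server still sits at 99%, which is the distinction the thread was arguing about.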

WordPress Post mangling – quick change to keep 2 past revs of a post

I had been working really hard on my post on our super duper mail server and at some point I started having some really weird interactions with the TinyMCE editor. I was switching back and forth between it and the raw HTML editor when all of a sudden I only had the middle 60% of my post. Stupidly, I hit Save and lost a good chunk of my valuable words of wisdom. I was able to recover most of the text from the original email, but I was a bit perturbed there wasn’t a revert feature.

Slow laptops – Drive speed vs Drive size, RAM, VM

Does it cause pain and/or destruction to our machines in any way to
force quit applications?

Andy-san and I are both too impatient to wait for laptops to close
3 programs so we can start up two others.
Force quit much faster. It is very snap-snap, but we worry about
snap-snap ohhh… you break it.

My reply:

Basic question:
Why are you closing the applications in the first place?

The OS will reallocate RAM to your active applications on an as-needed basis.

Also, you don’t HAVE to wait for an application to finish quitting before doing something else. Use Cmd-Tab to bring up the application switcher, hit Tab to scoot over to the app you want to quit (still holding down the Cmd key), then hit the Q key, which sends a quit command to the app. You can do all of this while staying in your current application.

In general, I would NOT do what you are doing in force quitting. The most immediate thing I’d be concerned about is corrupting documents, pref files, etc.

They were doing tests of the nominally sized 100GB 7200 rpm drives and comparing them to the bigger and slower (5400 rpm) drives. What they found was very interesting. In general, yes, the 7200 rpm drives did usually beat out the slower 5400 rpm drives, but they then redid the same tests with the drives loaded with 74GB of STUFF. The 100GB 7200 rpm drive at 74% full showed a considerable drop in performance compared to the 160GB 5400 rpm drive, which was less than 50% full. Enough of a drop to make the drives perform about the same.

Now, to bring the relevance back to your situation...

As you get more and more applications/data open on your system, the OS will actually save out sections of RAM onto your disk and read them back in as necessary. So, the more RAM, the less likely you are to need to swap out to disk. Ultimately, over time, as you use your computer it will almost assuredly need to swap things in and out. (That’s what the spinning beach ball is when you flip back to an app that’s been sitting in the background for a while: it’s swapping data back into RAM from the hard drive.)

So, the FASTER the hard drive is, the faster it can SWAP. Again, you want to avoid swap, and that is where more RAM comes in. (See the vicious cycle here?)

If you have older laptops, it’s likely they have even slower hard drives in them (4200 rpm), further aggravating your performance issues.