Posted
by
Roblimo
on Thursday December 30, 1999 @05:47AM
from the you-can-prove-anything-with-statistics dept.

Bex writes "Ever wonder how your server uptime is when compared to others with
different operating systems? Ever wanted some hard numbers for the
Linux vs NT or FreeBSD debates? Check out uptime.hexon.cx
for a list of servers and some interesting numbers on uptimes. It looks like FreeBSD stomps all over everybody else, with a whopping 1994 days of uptime for one server, and a 138-day average uptime. NetBSD is second for max uptime, but better on average. Windows 2000 is second to last, just barely beating out BeOS, with a paltry record of 49 days. Its average uptime was under a week!
Remember, downtime doesn't always mean a crash, but it is a good indication of how often a machine needs maintenance." Update: 12/30 10:33 by R: There's a new version of the uptime page here.

C'mon, folks, this post isn't insightful, IMHO it's just plain off topic. (I think, anyway -- there might be some point I just haven't grasped in there). Uptime is measured for servers -- not home machines, so no, it isn't natural that home machines would lower the average.

Why not? Because NOT THAT MANY home machines are hosting websites, because Joe AHU (Average Home User) doesn't have a static IP address -- he's got a dial-up connection. So virtually every point made in this post doesn't add up to a lower uptime for Windows.

The uptime counter bug mentioned by other posters and the BSOD (blue screen of death or black screen of death) are what account for it. That and the fact that most of the Windows operating systems have historically had serious memory leakage problems... [memory leak: when the OS loses track of which RAM has been allocated or deallocated to or by a specific program]

Given that IBM's OS/2 and Microsoft's various Windows operating systems shared a similar code base, doesn't it seem weird that OS/2 never seemed to suffer this problem?

In fact, it did.

In 1997, our team at Ford Motor Company noticed that, after 6 to 8 weeks of operation, our OS/2 v2.11 machines would begin executing once-per-day tasks several times a day, and after several such executions, the computers would crash. Further analysis showed that those tasks were being executed every 1h09m54s. I spent a day trying to manipulate that number into something meaningful, and gave up in frustration. We assumed that our own code was to blame, and rewrote it several times (to no avail).

After rewriting our code seemed to have no effect, we decided to install the latest set of O/S patches on our machines. One Sunday, we moved between machines that were scattered over a several-hundred-acre manufacturing facility.

Black Monday

Seven weeks and one day later, the facility started building units. Within hours, the OS/2 systems started showing the symptoms that led to the crash, in the exact order in which we upgraded them!

The coincidence was too much to escape notice, so we called IBM Technical support. Their Level 1 guy spent about five minutes talking to us before he realized this was a deep O/S problem, and we were kicked to Level 2 support. The Level 2 person heard our version of the events ("The machine flakes out and every 1h09m54s executes a task that should only happen once a day"), and asked, "are the machines crashing 49.7 days after being rebooted?"

BINGO!

Apparently, someone inside IBM had noticed this problem a few weeks before we did, and had the patch in final testing when we called. (I think the patch was #XR2011 or #XR2014). However, since we were a customer, our bug report took priority over theirs.

The Problem

Someone used an unsigned 32-bit integer to record the number of milliseconds since the O/S was booted. That number rolls over after 49 days, 17 hours, 2 minutes, 47.296 seconds. The symptoms we saw began the day before crash day, due to the rollover that occurred when our code scheduled a task for "tomorrow".
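The arithmetic is easy to check; here's a minimal sketch, assuming nothing beyond the unsigned 32-bit millisecond counter described above:

```python
# An unsigned 32-bit millisecond counter wraps at 2**32 ms.
WRAP_MS = 2**32

days, rem = divmod(WRAP_MS, 86_400_000)   # ms per day
hours, rem = divmod(rem, 3_600_000)       # ms per hour
minutes, rem = divmod(rem, 60_000)        # ms per minute
seconds, millis = divmod(rem, 1000)

print(days, hours, minutes, seconds, millis)  # 49 17 2 47 296
```

Incidentally, the mysterious 1h09m54s period (4194 seconds) is within half a second of 2**22 ms (4194304 ms), which suggests the same counter was involved in the repeated-execution symptom too -- though that last part is pure speculation on my part.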

The Moral

It's too bad that Microsoft and IBM were not on speaking terms at this point. If they had still been working together, MS would have had a fix two years ago.

You think that the average workstation has the same uptime as the average server? I'm not referring to possibilities, I am referring to real-world usage. Which is what I am questioning in this survey.

This "kid" doesn't need his 10+ years in electronics/computing to know the above.

So by your definition, a Workstation can instantly become a Server by just leaving it on. So I don't have to enable any services, no ftp or http daemons? No SMB or DECNet? Not even a little Appletalk or IPX? Nothing but an untouched power switch eh?

So are you gonna give me more info on these 2 incredible OSes you have written, or are you going to remain just another AC kiddie?

No MacOS - I know it's not great, but I bet a machine running appleshare server would do pretty well...

Only machines which have either binaries available (SunOS, Linux, Windows, FreeBSD) or Perl are eligible to participate.

Would someone running OpenBSD be running Perl, for instance? (Is it even included???) Or run an unknown binary?

How about Irix boxes that just sit and calculate all day long... Why would they want Perl on their system, let alone an unneeded program?

This project is a JOKE. There is no way in the world that this can be objective... Notice the heavy skewing in the number of Linux boxes on the list vs others... Now look at the real world... All those NT boxes are probably admin'd by people who have no business setting foot near the console of one of those boxes.

> I think the answer is that the average Solaris admin comes from an NT background

I'd like to know where you came up with that.

Also -- if, as you assert, certain OSes attract various levels of competence, then wouldn't an IS decision maker who wants the best uptime still choose the OS with the best track record? After all, if BSD sysadmins are so much better than Solaris admins (who are clearly just washed-out NT admins), then why not deploy BSD and hire the technically superior BSD admins?

Quite recently, my FreeBSD-based development box was rebooted. Violently. The machine had been up for about 320 days when somebody accidentally removed its power cord in a co-location facility. I had an ssh session to the box at the moment, and pretty damn near shit my pants when the machine stopped answering. I got to sigh in relief a few minutes later, after the machine booted and got all its services running.

Which gets us to the subject of the day. The reason I was so worried was that after 320 days of tinkering, new IP numbers, new system software, new services, old services scrapped, etc, I had no idea if the server would come up on a reboot or not.

Frequent, comfortably scheduled reboots tell you whether your machine will come up after an unexpected interruption or not.

That's pretty much the only direct problem with long uptimes. However, glorifying long uptimes has other, well-known drawbacks. The most important of them is, of course, putting off important kernel security updates in pursuit of long uptime.

It measures nothing. In particular I really doubt that most Solaris people come from NT backgrounds. Rather, most people running serious machines will simply not bother to install some random daemon to let other people know information that they don't particularly want to give away. I can see Solaris machines with 285 day uptimes from here, and they aren't particularly special.

I also must take issue with the "uptime being a point of pride" thing. If the machine doesn't have any particular state (say it's an NFS server), and it's going to take some time to work out what's wrong with it, the professional thing to do is just reboot it: only some idiot who isn't accounting for their time properly is going to spend half a day trying to work out what's wrong with it without rebooting first to make sure it's not something transient.

And anyway, no sysadmin worth his salt will place high uptime ahead of keeping up to date with kernel patches.

HUH?!?! Any sysadmin worth his/her salt would know that you should only bring down a production server if NEEDED. If you don't need the latest kernel patch -- say it fixes a bug in a driver for card XYZ, but you don't have card XYZ -- do you need to patch? Heck no. Keep the baby running, and your clients/users happy, by NOT patching just for the sake of patching.

Maybe when you say "up to date patches" you mean patches actually needed, but if you really need good uptime you could design the system around drivers that are very stable and unchanging thus reducing the chance that you will need to patch.

IIRC you can still find some pre-1.0 linux kernels out there running. Why? Because the system requirements haven't changed, if it is stable and secure and does what it is supposed to do, don't mess with it.

First of all, I have been using Linux for 2 years as a hobby and DecUnix and DecVMS for more than 4 years whilst I worked as a PC/Telecomms tech programmer for my local Stock Exchange. I also used Microsoft products.

When I state service, I am referring to a service that is served back and/or forth between the server, other servers and their client machines via some transmission line, Ethernet or whatever.

Sure, a Mac can be serving a user Photoshop, but this is not what I am referring to as being a server.

The common definition of a workstation is one that has its CPU and disks used by the person in front of that particular computer, usually by the same person and usually for 8 hours a day. For the other 16 hours of the day, these machines are usually turned off to conserve power. Unless of course they have shared file systems or printers that are required to be up 24/7. But for the most part, this is not the case. Any decent sysadmin will have this type of task delegated to a machine that is specifically set up to serve these tasks, which is good for backup ease.

The servers on the other hand usually stay up for obvious reasons. Mail serving, file, print, web, db, whatever.

"All computers are nothing but mere terminals jacking into the great global network."? You've been reading too much Cyberpunk fiction buddy.

But if it impresses you, my videocard-less OpenBSD firewall/NAT gateway is accessed through our network via its serial port through a DECServer700, from anywhere in the network which has an ssh client. (Whoops, I'd better be careful with my usage of the word client there, I wouldn't want to confuse another AC.)

Wow. This sort of surprises me. I know some of you out there did lots of the set-it-and-forget-it Netware 3.12 systems like I did. ;) You never heard back from the client - the server just didn't go down. I have a Netware 5 box with 400 days of uptime back at work, and another 4.11 that would be up there too if it weren't for a 7-hour blackout. Ah, but alas, this is /.; Linux rules the day. :)

Someone hasn't screwed up... as you can see, IF you read the page, there's absolutely nothing scientific about this uptimes project. It's a project for fun, not for research, but most slashdot readers seem to be missing the point... COMPLETELY. Tgm just wrote something that wasn't there yet, and a lot of people have joined his small project. If the slashdot readers would read the pages before commenting on them, then the world would be a better place... I think! :)

Windows 2000 is second to last, just barely beating out BeOS with a paltry record of 49 days.

Let's at least be fair here. Windows 2000 has not even been released yet. 49 days ago they were at beta 2 or something like that. Let's at least wait until it's released before we start bitching about how much it sucks. I mean, we don't want to sink to their level and start spreading FUD now do we??

I don't know about others, but I've never had to reboot my BeOS system because of a crash. That being said, what exactly is uptime? I mean, I have had the network server crash on Be, but, because of the microkernel design, I just restarted the network server. No reboot. Is my machine still "up"?

Also, I imagine most people running Be also dual or triple boot. This might explain the short uptimes listed. I'll usually leave my system in BeOS unless I'm banging out some Java or playing some games with a friend.

well the best test for BeOS uptime would be it doing real work such as maintaining a theme park http://www.lcsaudio.com/installations.html (sidenote: wow! I didn't know Hayden Planetarium will use BeOS. My brother wanted to go there on a date only to find that it was closed for renovations)

I don't think it is really fair to bag Windows 2000 for having an average uptime of 5 days. Don't forget this is a cutting edge MS operating system, and you are going to need reboots for upgrades. It would be fairer to judge that six months or so after the release.

I do wonder about the 49.2-day max uptime for Win2000 & 95, though. There was a bug in Win95 that would crash it after roughly that amount of time (can't remember the exact number of days) - the Win2000 uptime looks suspiciously close to that, too.

I was a little surprised about the BeOS stats, too, until I realised there were only 4 BeOS machines in the survey. No Macs, either.

There is also no way to compare what the machines were doing. A hardcore development or games machine is much more likely to crash (or reboot) often than a machine doing nothing.

Conclusion? Interesting, but don't read too much into the results. It is nice to see some of the really high uptimes, though.

It's a great idea. We need solid MTBF numbers for operating systems. It's just that their data collection strategy needs work. Some people have argued that it's not meaningful to collect data of this type. That's wrong. The mainframe community has done this for decades. It's a basic management tool for any serious computing operation. The "uptimes.net" people are on track; the implementation just needs some fine tuning.

Here's what I see as the technical problems with the "uptimes.net" approach:

Their current reports show uptime, but not why an "up" period ended. An uptime client should report why the system went down. NT systems log this in the event log. UNIX systems have it in the crash dump. Systems with UPSs attached have UPS status info available, so you can report power problems explicitly. There will still be some "reason unknown" entries, but they should be the exception, not the norm.

Reports should distinguish between requested shutdowns, power failures, and system crashes. Reasons for system crashes need to be logged; that's valuable information. High-quality logging of what went wrong is the beginning of quality control.
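As a sketch of what such logging buys you, distinguishing why each up period ended lets you compute a real MTBF instead of a raw average uptime. The record format below is hypothetical -- nothing like it exists in the current uptimes client:

```python
# Hypothetical (uptime_in_days, reason_the_up_period_ended) records.
records = [
    (120, "requested"),  # planned reboot, e.g. a kernel upgrade
    (45,  "power"),      # UPS reported a power failure
    (200, "crash"),      # reason pulled from the crash dump / event log
    (30,  "crash"),
]

def mtbf_days(recs):
    """Mean time between failures: total observed time over crash count."""
    total = sum(up for up, _ in recs)
    crashes = sum(1 for _, why in recs if why == "crash")
    return total / crashes if crashes else float("inf")

print(mtbf_days(records))  # 395 days observed / 2 crashes = 197.5
```

Note how a planned reboot shortens the raw "average uptime" but, correctly, does not count against MTBF at all.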

These are all fixable, and fixing them would answer most of the criticisms given in previous postings.

I suppose you are right (well, you just are :) but what I meant to say was that I think it would be great if the community would work together on a project which would do this as scientifically as possible.

The operating system is only one (small!) part of what makes a reliable system. Staff, procedures, and methodology are all more important. The most reliable OS in the world will still crash if, in the process of rushing a system into production, someone decides to run the power cord across the floor and someone trips over it (don't laugh - this happened at one of the places I worked before). Don't get me wrong, it's cute and fun to compare uptimes (my NT PDC and BDC have been up for 169 and 183 days respectively), just don't expect those numbers to correlate to the real world when it actually comes to getting work done. If you plan, test, and implement properly and consistently, any modern server OS will perform adequately. Always remember - the right tool for the right job.

VMS is by far the most stable general purpose operating system, and puts Unix and Windows to shame.

At DECUS '97 some power company announced that they had a VAX which was up since 1985, which easily stomps out FreeBSD and the other little Unix computers on this little survey. Show me a Unix machine which has been up for 12 years!

You are correct that Novell 3.12 + 5 Page Long Patch List is a very stable configuration.

However, back in the day when Novell 3.x was a current product, I don't quite remember it that way. Our servers seemed to abend about once a month - more often if they were running non-standard stuff like CD drivers or backup software.

Just because the 49-day thing does not happen every time (or at all for your boxes) does not mean it does not exist. I had it happen on my SMP workstation (NT 4 SP5, at the time). To be fair, I've also had a UP Linux 2.0.34 box crash after 497 days when its jiffies wrapped around, and that does not happen for every 2.0 box, either. (NT uses 1024 Hz, Linux 100 Hz; both have 32-bit counters.)
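Taking the tick rates above at face value, the wrap periods fall out directly. A small sketch (the 1024 Hz figure for NT is the poster's claim, not verified here):

```python
def wrap_days(hz, bits=32):
    """Days until an unsigned `bits`-bit tick counter at `hz` ticks/sec wraps."""
    return 2**bits / hz / 86400  # 86400 seconds per day

print(round(wrap_days(100), 1))   # Linux 2.0 jiffies at 100 Hz: ~497.1 days
print(round(wrap_days(1000), 1))  # a plain millisecond counter: ~49.7 days
print(round(wrap_days(1024), 1))  # 1024 Hz: ~48.5 days
```

Interestingly, a 1024 Hz counter would wrap after about 48.5 days, not 49.7; the widely reported 49.7-day Windows figure matches a millisecond (effectively 1000 Hz) counter.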

...is the Mac. I know they are not great for high-end stuff (but just wait for MOSX! MWAHAHHAA!!!!) but they're really good for a cheap low-to-medium-load server, be it mail, web, print, whatever. Set it and forget it. Almost no maintenance. I have an old 7100, on System 8, that has been up for at least 5 months, running my website. Last time I had to reboot it was, like so many others, a power outage. When I get my G4 in January, my G3 will replace the 7100, which will be cool.

Let's at least be fair here. Windows 2000 has not even been released yet. 49 days ago they were at beta 2 or something like that. Let's at least wait until it's released before we start bitching about how much it sucks. I mean, we don't want to sink to their level and start spreading FUD now do we??

While this is true, it points to another criticism of MS operating systems. They seem to require upgrades. Whereas one can install FreeBSD or Linux on a machine and rely on it for years, ignoring new versions, one is apparently compelled to always have the latest-and-greatest version of Windows. This is particularly interesting in light of the fact that acquiring a new version of FreeBSD or Linux need not cost a thing, while there's a price associated with every new version of Windows.

I'd just like to point out that BeOS hasn't been in existence for 1994 days yet. About 1990 days ago, it was running on the Hobbit processor, and people were talking about the soon to appear PowerPC version.

That, plus the tiny number of registered systems, the dual-booting nature of a brand-new OS, and the fact few use it as a server.

Sure it is. It's fair to show that those losers are too stupid to think of anything interesting to do with their computers. :-)

Because of course you simply wouldn't want to accept that these people might be getting almost everything they need or want out of their machines; obviously, they need to be doing something "interesting" in your eyes to validate such users and their OS.

For some people, Win95 is more than they need. So what? It's their money and their box.

What OS have you written, AC? It doesn't take an OS coder to know the difference.

You can't rope me into YOUR argument, AC. Obviously to everyone but yourself, when I state "server" I am speaking of a computer which has the role of providing services 24/7, and when I state "workstation" I am speaking of a machine that is used by individuals on a daily basis and is usually switched off at night.

The relevance in what I have said, is not in the choice of OS but the application of it.

Keep in mind that "kill" doesn't kill a process. It tells a process to commit suicide. A process that's stuck waiting on I/O (which sounds like what you're describing) will not be killable on any Unix or Unix-like OS. A process must run to die.
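"A process must run to die" can be demonstrated directly. A Unix-only sketch (SIGKILL works the same way; it just can't be caught or ignored):

```python
import os, signal, time

# Install the handler before forking so the child inherits it and there
# is no race between kill() and handler installation.
signal.signal(signal.SIGTERM, lambda signum, frame: os._exit(0))

pid = os.fork()
if pid == 0:
    while True:            # child: idles; a process stuck in uninterruptible
        time.sleep(0.1)    # I/O would never get back here to act on the signal
else:
    os.kill(pid, signal.SIGTERM)    # returns immediately; only queues the signal
    _, status = os.waitpid(pid, 0)  # the child exits only once it is scheduled
    print(os.WIFEXITED(status))     # True: it ran its handler and exited
```

The point is that kill() succeeds the moment the signal is queued; whether the target ever acts on it is up to the scheduler, which is exactly why a process blocked in uninterruptible I/O appears "unkillable".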

You missed the point. When you fly off to a convention, I assume you take your computer with you (a laptop, presumably). While you and others might be in the fractional minority, most home machines are not in that category. For example, I'm temporarily stuck on a WinXX machine at work, and have both a Linux machine and a WinXX machine at home. (Gotta take care of wife and little kids, and there's not that much good Linux educational software for seven-year-olds -- yet.) The Linux machine stays up because it is hosting the site(s). The WinXX machines get shut down at the end of the day or session. But since they are not serving websites, they don't count against the uptime average for WinXX machines.

Also, one has to look at the sheer number of clients on uptime.hexon.cx for each OS. Linux has such a low average time because there are 200+ clients on uptime, instead of the, what, 30-40 for the BSDs? This gives a much better sampling, where a few aberrant ones don't skew the data too far off.
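The effect of sample size on outlier skew is easy to illustrate. The numbers below are made up, chosen only to echo the figures in the article:

```python
from statistics import mean

outlier = 1994                        # one box with an extreme uptime
small_pool = [30, 40, outlier]        # a handful of clients, like the BSDs
large_pool = [30, 40, outlier] + [35] * 200   # 200 more ordinary boxes

print(round(mean(small_pool)))  # 688: the single outlier dominates
print(round(mean(large_pool)))  # 45: the same outlier barely registers
```

So a 2:1 average-uptime gap between a 40-machine pool and a 200-machine pool may say more about the sampling than about either OS.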

As for the top dogs I've heard that one of the two is just copying his uptime between reboots since the current version of his OS wasn't even out when his uptime supposedly started.

Finally, as for Windows, you won't find any windows above 49 days, 17 hours and some odd minutes. Remember, Windows has the 49 day bug!

First off, I'd like to apologize for this slightly off-topic comment. However, a few months back, a friend of mine and I were discussing how misleading the term 'uptime' is. Say I am running a corporate server on Linux or BSD. I change a setting on it and suddenly no one can get to webpages outside of the firewall. The computer is still on, but it's not doing jack. I don't consider that up. I personally feel that uptime should be redefined to calculate not just how long the computer itself has been on, but how long the server has been able to provide the services it's designed to provide.
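That idea can be sketched as a probe-based availability measure -- hypothetical, not anything the uptime clients actually implement: the fraction of periodic checks the service answered, independent of whether the box was powered on:

```python
def availability(probes):
    """probes: booleans, True when a periodic service check succeeded."""
    return sum(probes) / len(probes)

# A box that never rebooted, but whose firewall change broke the web
# service for 2 of 10 checks: 100% "uptime", 80% actual availability.
checks = [True] * 8 + [False] * 2
print(availability(checks))  # 0.8
```

This is essentially how uptime is measured in service-level terms today: probe the service, not the power switch.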

The glorious BeOS has never crashed on anyone I know of (perhaps 10 hardcore BeOS users). Besides the fact that only 4 BeOS boxen were in the survey, most people who use BeOS also use another OS. I for one have Winblows 98 on here so I can play some phat FPSs. My uptime is very seldom more than 24 hours; often it's 19 just before I reboot. Do any other BeOS users feel this is why the BeOS uptimes are pretty low? I know if Q3A were out for BeOS (I've seen it running on BeOS at PC Expo, just waiting for it to be released) then I would almost never need to reboot, except to play Worms Armageddon.

Nothing bleeding edge, fairly competent box for any OS. I installed Windows 98 on it, and it crashed MULTIPLE times EVERY day. At one point I removed just about every PCI card (except video) and tested that - got about 2 days worth of uptime before it barfed on me again. Installed RH6.0 on the SAME hardware and it NEVER, not ONCE dumped on me.

So, yes, take the statistics with a grain of salt but DO remember that there are REASONS why the Windows systems uptimes SUCK - and those of Linux and *BSD don't.

While this is true, it points to another criticism of MS operating systems. They seem to require upgrades. Whereas one can install FreeBSD or Linux on a machine and rely on it for years, ignoring new versions, one is apparently compelled to always have the latest-and-greatest version of Windows. This is particularly interesting in light of the fact that aquiring a new version of FreeBSD or Linux need not cost a thing, while there's a price associated with every new version of Windows.

Right, so taking your comment to the extreme, why aren't we seeing linux machines with an uptime of something like 10-20 years? Upgrades? we don't need no stinking upgrades! Give us linux 0.001 over linux 2.1 any day!

See the point here? Uptime ISN'T the be-all and end-all. Shit happens. The power fails. Someone stumbles over the power cord. What IS important is

1) Functionality 2) Security 3) Stability

Even linux has to be rebooted to replace the kernel when there's a WEAKNESS in it, otherwise it's not really a good server anymore, is it?

I can see it now, the server admin of a linux box going "ooh! i can't fix this security hole, i'll fuck up my uptime!".

For what it's worth, I personally think Linux IS better than Win9x, and NT is irking me a bit. However, I do NOT automatically regard uptime as a measure of how good an OS is. I look at functionality, security AND stability... I've had to reboot my Linux server quite a lot lately simply due to my wish to keep my server secure. My server isn't automatically an unstable server, nor is the OS automatically unstable. I do this willingly, because I wish to have a secure system. Then again, if I *really* wanted a completely secure server out of the box, I'd run OpenBSD, not Linux.

-m

99 little bugs in the code, 99 bugs in the code, fix one bug, compile it again...

I think I remember reading somewhere that the engineers eliminated 200 instances in which a reboot was necessary in Windows 2000 (from firsthand experience, stuff like changing network settings no longer requires a reboot).

And there are now ways for software developers to dynamically replace DLLs without requiring a reboot. If a software package requires a reboot in Windows 2000, then it wasn't designed properly.

No MacOS - I know it's not great, but I bet a machine running appleshare server would do pretty well...

I agree here. I know of one Macintosh server (my old high school's SE/30, running System 6) that has been going continuously for at least 9 years. (I went back a few months ago, and it STILL hadn't been rebooted since my freshman year.) Assuming it is exactly 9 years (possibly more), that means it has been up for about 3,287 days!

As for Windows 2000 only being up for 49 days? IT'S BETA!!! I'm sure that 50 days after it is officially released, the record will be 50 days. Of course it has low uptimes now; it gets a new major release every three weeks. How many of you have a release of Mozilla running for 49 days? (Or even have the same version installed for that long?)

But, I completely understand why BSD is the king. I've worked with BSD for years, and it is by far the most stable OS I have ever worked with. Yes, Linux is good, but BSD just seems to be more inherently able to stay up for that long. Whether it's due to downing for maintenance, or a crash, Windows will never last 1000+ days. Linux may last a long time, but every few months you usually have to down it for SOME maintenance. Of course, the 1994-day record equates to about 5.5 years. Any Linux/Windows box that has been running that long is hopelessly out of date for today's uses. A Windows box that old wouldn't have the processing power to do anything useful, or even the capability to serve modern Windows systems. A Linux box that old would have such an old kernel that I wouldn't dare run it.

A U*x admin that has managed to avoid reboots on a production system for so long most likely has not left any known or even suspected h0lez for R00ting...

I have no idea how any admin would do that when it comes to kernel vulnerabilities. Crystal ball, perhaps? Most exploit principles are not new, but the techniques certainly are.

Most holes are at the application level, which is all well and good, and as it should be. They can be easily fixed without a reboot. Exploits of kernel services (tcp stack, for example), require patching and rebooting, and they're not unknown - not to mention that fixing them is generally outside the job description and specialized knowledge of a sysadmin.

Now, it's possible to secure your boxes from outside attack through firewalls and the like. Perhaps even to the point that the weakness is unexploitable from the outside. The feeling following such work is known as hubris, the pride that goeth before the fall. Are you willing to make the claim of total security for every machine within your network, or every employee you have or have had? All these things are potential vulnerabilities; a machine secured to the outside, as you can see, isn't really secure at all.

Having admined a few Solaris boxen for 5 years, one thing I found irritating was the way an errant TCP/IP application--say, Netscape Enterprise Server--could get stuck in the middle of handling a request and end up unkillable. In order to release the port, the only remedy--I swear, ask Sun--is to reboot. Nothing you can do with kill, with proc tools, or by restarting networking services will kill a process in such a state, at least through Solaris 2.6.

These statistics are about as good as slashdot polls. (I.e., if you really believe them, you need help.) The big point being, it's only those who actually submit their times that are counted, so, like magazine polls, that makes it heavily biased. Case in point: a certain magazine in the US a while back predicted Franklin Roosevelt wouldn't win the election. But that was based on their magazine poll, and their magazine was targeted towards the (for lack of a better word) upper class.

These are hardly what I would call "hard numbers". They are even worse than a Microsoft-sponsored comparison.

Don't get me wrong, I love Linux, and I firmly believe that yes, Windows IS miserable for uptime, and yes, FreeBSD does kick ass with uptime. (Linux has more frequent kernel releases, and we haven't yet figured out a way to upgrade your kernel w/o rebooting.)

I'm surprised that an article phrased in such a way that it sounds as if it's supposed to be serious would be posted on slashdot.

That's enough ranting; I'll probably get moderated down as a troll. But seriously, tune out anyone who quotes these numbers as reliable.

You might think, at first, that this measures the reliability of the OS. However there are some other factors here besides the OS, the main one being the competence of the administrator.

The clue here is that Solaris has a much worse uptime than the other Unixes. Yet we all know that Solaris is a damn fine product, and I've seen some Solaris boxen with amazing uptimes.

So why does it perform so poorly here?

I think the answer is that the average Solaris admin comes from an NT background and believes that reboots solve a problem. You get some of these people in the Linux stats too.

Now look at BSD. Who runs BSD? Old-guard Unix people, who generally have their sh*t together, and know what the hell they're doing. These are the kind of people for whom uptime is a point of pride, who take it as a grave personal failing if they have to reboot to solve a software problem.

So while I don't doubt that BSD is a robust and stable OS, I think that to some extent the uptime stats reflect the average level of experience of the admins, and not just the robustness of their OS.

I would guess Solaris makes a much better showing if you can eliminate this effect. BSD would still presumably edge out Linux (since uptime is what BSD developers and users strive for, I think the OS provides it), but not, I think, by a 2:1 margin.

I've never whined before, but I possibly see how I was "redundant" with that remark... I wouldn't have said it if someone else had pointed this out...

Anyways...

1 - who cares how much power you have? For most people, today's computers are just overkill for their needs... (Gamers, programmers, and other power-user types not included here)

2 - To show how dedicated you are? By installing a piece of software and then not shutting down your computer? WOW... that's such dedication...:)

3 - You're not figuring anything... your computer is dumbly trying one key after another... If you win, it shows nothing about your skill or your computing power, just that the keyserver happened to dole out the winning key to you.

4 - Yeah... I'll agree with that one... it's cool to actually know your CPU is doing something... rather than how it is right now, probably around 0% usage as I type on slashdot....

If you have two FreeBSD boxes, sitting in the closet since 1995, they have significant uptime even averaged between themselves. If you have 5,002 Linux boxes, two sitting in the closet since 1995 and 5,000 rebooted at every odd chance, you have a heavily skewed bias.

Without knowing why the machines are up or down, and factoring that into the averages, it's merely pretty pictures.

I have Linux boxes filtered and firewalled that have been up for years. Due to denial of service attacks to the vulnerable kernels they are running, I can't and won't post them. I will however say that two of these boxes were listed as #1 and #3 on the previous uptime site a couple of years ago. #1 had an excess of 500 days when the site disappeared.

Depending on a box's job, it may be rebooted several times a day or it may be up for months at a time. I do a lot of code development and testing in and out of the kernel, so I have a lot of boxes that get rebooted. I also have a lot of boxes that gather dusty electrons month by month. A few of the boxes I build kernels for crash; dev kernels do that once in a while. By and large, however, the systems are completely stable.

All of my machines that lost large uptimes lost them entirely due to power loss.

Let's try to view these figures with an understanding that the Linux boxes outnumber all the others combined by a large margin. I'm willing to bet that most of these Linux boxes are personal machines rather than black box setups.

Who would have thought anyone would ever use BeOS for a server? The comparison between Linux and FreeBSD is really interesting, considering how much publicity Linux has been getting lately as being "ultra stable". Statistically, FBSD is more than twice as stable as Linux. Realistically this probably has more to do with the huge number of different Linux kernels used in the tests and the comparatively few different FBSD kernels used. It is interesting to think about, though: are the half dozen different Linux distros more or less stable than one another? I personally think that FBSD's development model (Open and Net also) makes for a somewhat more stable OS.

You know, looking at those stats a thought struck me: despite BeOS managing a high of 35 days of uptime, the average of four machines was just five days, implying there were a number of instant deaths. Pretty bad, yeah?

Now compare this to the 49-day high of Win NT (a 40% improvement) achieved from a pool of seventy-two NT servers (eighteen times as many machines), and BeOS is pretty much doing as well as NT. Especially when you consider the amount of NT troubleshooting being done by both MS and the community in general: when was the last time you saw BeOS mentioned on BugTraq? NT does manage to improve on BeOS's average uptime, though: twelve days... Must be why NT admins get uptight every fortnight...:)

I feel the need to bitch about some of the numbers I see. I'd like to know what job those systems are doing. BSD computers are more often than not network servers or something similar; you don't just turn those off. On the other hand, I turn my Win NT machine at work off every evening (uptime roughly 8.5 hours?). Or does uptime count the number of hours the system runs between crashes? In that case my BeOS machine at home must now be somewhere around 500 days or so: I upgraded it a few times, but it never ever crashed on me.

Putting Win2k in this statistic is of course ridiculous. The OS that has been out for 50 days has a maximum uptime of 49 days. Well... *duh*.

To make a statistically valid comparison of uptimes you'd have to use the same number of systems for each OS, not well over 500 for one and just under 20 for several others. In a larger population you are naturally going to see more extremes; I bet the record for shortest uptime can also be found in either the Linux or FreeBSD group. The averages of course tell us something, but in the really small populations they too are irrelevant. I'd like to see this uptime project grow amongst users of less uptime-centered operating systems so that the statistics become a bit more valid.
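That "larger population, more extremes" point can be made precise. Assume, purely for illustration, that every OS had the exact same uptime distribution, say exponential with a 30-day mean; the expected record uptime still grows with the number of boxes reporting:

```python
# For n i.i.d. exponential uptimes with mean m, the expected maximum
# is m * H_n, where H_n = 1 + 1/2 + ... + 1/n (a standard result).
def expected_max(n, mean_days=30):
    return mean_days * sum(1 / k for k in range(1, n + 1))

print(round(expected_max(20)))    # ~108 days from a 20-box pool
print(round(expected_max(500)))   # ~204 days from a 500-box pool
```

Same hypothetical OS, nearly double the record uptime, just from having 25 times as many boxes in the sample.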

The only thing this chart tells me is that *BSD and Linux users are more concerned with statistics like this than users of other operating systems. What would psychologists make of that?

I have a few debian-slinks upgraded to 2.2.7+ip-patch for Alpha and 2.2.13 for x86. They do not crash. Period. The uptime so far has been determined by power and hardware upgrades. BTW: they usually run at a load average above 1 (some of them above 3) and handle insanities like a full Internet BGP routing table (more than 64K routes).

My recommendation: never run a stock kernel, especially one with the bugfixes for bogus IDE controllers compiled in (unless you actually have bogus controllers ;-). Some of these can really mess you up. If you trim your hardware driver list to match your actual config, your uptime will increase.

That's not quite true. Windows 2000 has gone gold recently and should be released sometime next February. 49 days ago, RC3 was released, and that's close to the final release... except for the lack of a few drivers and such.

Home users turn off their computers at night. Most of the Windows users I know aren't running mail servers or FTP servers that require constant uptime, so they power down at night to save some pennies on the juice bill.

Home users don't have uninterruptible power supplies. If the power goes out, the last thing they want to be doing is sitting in front of their computer. The $100 investment just doesn't make sense for them, and thus, they experience downtime with every power drop.

Home computers are used by children. Your spiffy FreeBSD machine is probably locked in a wiring closet somewhere, well away from six year olds with a penchant for DirectX games and dripping their Cokes on the keyboard.

Home computers are moved around. It might sound odd, but you're much more likely to shut down and pull the plug on a home system than a server just to move it over a few feet or to clean underneath it.

I'm not meaning to slam Windows as a home operating system, but isn't it fair to say that Windows (all flavors, even NT) has more home users than FreeBSD? Isn't it therefore safe to assume that if you really had an accurate survey of uptime, Windows would naturally be lower? Just something to keep in mind.

Show me a general-purpose UNIX/PC box that's been up for 2 years and I'll show you a box full of unpatched security holes.

The vast majority of exploits used to gain access to a system happen in userland, and can be fixed in userland. No rebooting required, high uptime maintained.

The vast majority of exploits in the kernel are denial of service bugs, which cause the system to reboot and/or hang. In my experience the main problem is oddly formed packets crashing the TCP/IP stack. In the case of these high-uptime boxes, obviously they have some sort of firewall protecting them from bad packets, or they would simply not have the uptime that they do.

So your assumption that long-uptime boxes are Swiss cheese is not necessarily true.

That said, I'd bet none of the really high uptime machines are running as shell servers for hostile users / script kiddies. That's the ultimate test of the quality of the sysadmin(s) and OS.

Think about this: some person is amazingly proud of the fact that they are running kernel 2.0.18. How many vulnerabilities does this server have? What is the 2.0 kernel up to now, 2.0.37? How many DoS attacks is this thing vulnerable to?

I have no idea about FreeBSD, but I am guessing that either it has less time between kernel updates, or the version our leader is running is the last in the stable series and FreeBSD has moved on to a new series, or it too is vulnerable to any bugs or attacks that have been fixed in newer kernels.

I am proud of the fact that my servers have an uptime of only 30 days or so, because I know that I am performing regular maintenance on them. They crash rarely, usually due to hardware failure, but I reboot them frequently to make sure they are running all the latest fixes, e.g., a new kernel install.

This is like saying "WooHoo, my bog-standard RedHat 5.0 box has been up for 2 years!!" Crackers ahoy! Vulnerable target sighted!! A quick search of any crack DB will give you root access in less time than it takes to make a cup of coffee.

I would expect any NT box to have a maximum uptime dating back to the release of Service Pack 5. (Don't know about 6; a few admins I know are avoiding that one like the plague.)

The same applies to Linux and/or FreeBSD or whatever. If you fail to apply critical patches to a system that is most likely in production use, why in god's name should you get kudos from the hacker/admin community for a job well done?

I read the first few high-rated comments before reading the story. Unless it REALLY interests me, I don't read a story until there are 50-100 comments. I never run into the problem you describe. So what if the first comment (i.e., the posting by the /. staff) has errors? The community fixes them.

While I think there should be some effort to avoid errors (and I'm sure there is), I don't know that the /. staff need to try to rectify it all; the posting and moderation system see to it that errors are fixed fairly quickly.

A process that's stuck waiting on I/O (which sounds like what you're describing) will not be killable on any Unix or Unix-like OS.

There is an interruptible flag in the arguments to the sleep() kernel routine that determines whether or not a process can be killed while it is blocked on an event. That makes it driver-dependent.

I used to have problems with an old 68000 Unix system's tty driver: the serial chip would get confused and processes would go into an unkillable state waiting for certain events. Later releases of the kernel fixed the problem by adding the interruptible flag to the sleep() calls in the tty driver.

I used to enjoy reading the source code for DEC device drivers. The programmers always seemed to assume the worst from the hardware. They would set up timers so that if the controller went catatonic, they could reset it and retry the request. Some of the Unix source code that I have read took the opposite approach, error recovery consisted of calling panic(). The rumor was that DEC used to burn in VAX systems by running Unix, since it would crash on systems with hardware glitches that wouldn't bother VMS.

Not only has Win2k been gold for just a couple of weeks; the website statistics haven't been updated for about 6 months, either. Even if they had been, the average uptime for Win2k would still reflect the even earlier betas from back in the day. :P

There's lots of [justified] griping above, pointing out how you can't draw conclusions about any OS's stability based on the longest runtimes.

While that's true, this kind of survey does give us maximum runtimes, and I don't think that's available anywhere but here.

For example, maybe a few posters could close their blathering pieholes long enough to see that the 49-day figure applies to Windows _anything_, not just Win2k. For a startling revelation, go to The List [hexon.cx] and click "All" under "Alltime".

There are dozens of 49-day, 17h02m uptimes for Win32, and none longer. Obviously, either the OS or some popular [driver|service|screensaver] is broken [insert dumb "already knew that" joke here]. I dimly recall Microsoft claiming this was fixed in an NT service pack; obviously that's not the case.
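The suspicious uniformity of that figure is just 32-bit arithmetic. If, as other posters suggest, the Win32 tick counter is a 32-bit millisecond count, it tops out at exactly the value in the list:

```python
# A 32-bit millisecond counter wraps after 2**32 ms.
wrap_ms = 2**32

days = wrap_ms // (1000 * 60 * 60 * 24)        # whole days
hours = (wrap_ms // (1000 * 60 * 60)) % 24     # leftover hours
minutes = (wrap_ms // (1000 * 60)) % 60        # leftover minutes

print(days, hours, minutes)   # 49 17 2, i.e. 49 days, 17h02m
```

That matches the list's 49-day, 17h02m ceiling to the minute, which strongly suggests a counter limit rather than dozens of coincidental crashes.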

For a more subtle trend, you can see a clump of Linux boxes topping out at 497 days, 02h27m. This is 2**32 Intel jiffies (100ths of a second; Alpha jiffies are 1024ths of a second) -- if you're running a module that assumes the jiffy count is always increasing, you'll get weird happenings when the counter rolls over. Again, I dimly recall one of the kernel people suggesting the jiffy counter be initialized (at boot) to MAX_JIFFIES - 3600, so that every module author writes code that will handle a rollover.
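The same arithmetic pins down the 497-day clump, and the rollover-safe comparison can be sketched too (the `time_after` helper below mimics the signed-difference trick described above; it is an illustration, not the kernel's actual macro):

```python
HZ = 100                      # Intel jiffies: 100 ticks per second
wrap_s = 2**32 // HZ          # seconds until the 32-bit counter wraps
print(wrap_s // 86400)        # 497 days, matching the clump in the list

# A rollover-safe comparison treats the unsigned difference as a
# signed 32-bit number, so "a is after b" survives the wrap:
def time_after(a, b):
    return 0 < ((a - b) & 0xFFFFFFFF) < 0x80000000

# 5 ticks after the wrap still counts as "after" 5 ticks before it:
print(time_after(5, 2**32 - 5))   # True
print(5 > 2**32 - 5)              # False: the naive test breaks
```

Initializing the counter near its maximum at boot, as suggested above, would force every module through this rollover path within an hour instead of after 497 days.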

Faults that only appear after a long runtime are typically easy to fix, but almost impossible to detect. Right now, the survey doesn't filter out shutdowns for known reasons, or collect enough info from the client (what modules are running, etc.).

If that changed, it could be a real goldmine, both to software maintainers, and to those who want to know when their system is due for its next crash.

To a certain extent this also measures the frequency of major kernel upgrades. While Unices can stay up with virtually any piece of software being installed or upgraded, everybody reboots for a new kernel. I don't think that installing a new kernel a couple times a year is necessarily a sign of weakness in an OS. There will be security patches, bug fixes and the occasional new feature.

A comparison of the maximum uptime with the length of time that particular OS version (down to the kernel rev in the case of Linux) has been available is useful. If an OS can consistently stay up until the system administrator wants to upgrade the kernel, it is stable. Of course, one other number is needed for that. What percentage of the systems running that OS and running 24x7 actually approach that maximum uptime?

Unix sys admins have learned to like statistics like this. They are a decent indication of how often the pager will go off. Back in my sys admin days I loved the fact that our uptime was good. It meant I got more sleep.

This tells me that they aren't using a random sample; rather, you have to actively register your box to have your uptimes scored, which doesn't suggest a very diverse sample of people in the first place. Also, the sample size for some of the OSes is horrible. A few examples: Windows NT, 71; Windows 95, 30; BeOS, 5. And people are actually making remarks about BeOS's performance when only 5 people have contributed their uptimes to this study? As far as I'm concerned, the only samples worth jack are Linux and FreeBSD (590 and 137, respectively), and even then I don't trust the results because of the first point I brought up.

..but the point is, at the time we didn't know any better. Nowadays I would put a Linux solution in place for that, but back then my geek credentials were a bit thin on the ground. The point is, I had never used NT before, but after a couple of weeks learning it by playing around I managed to put together a (by NT standards) stable server that did what they wanted and continues to do so. Most of those two weeks were spent going through the entire system learning how everything worked; I could actually have produced a server in a couple of days, but I wouldn't have been confident about it. For a small firm, ease of use is all-important.

Come on, Slashdot people, research a story for 5 seconds before you post it.

Approximately 17 minutes after the story was posted with an incorrect URL, a correction was posted (i.e., the post I'm replying to). Over the next few hours, it was moderated up. Now it's at the top of the list*. It's an amazing thing, the power of many eyeballs.

While I agree that accuracy is important, I can't help but be impressed with the self-correcting nature of the slashdot community as a whole.

The client gets the uptime from the GetTickCount API call, which doesn't return more than 49 days, so this doesn't mean NT and W2K crash after 49 days like Win9x does [microsoft.com]. (Information from the maintainer of the project.)

If you sort the complete list [hexon.cx] by OS, several NT entries with 49 days of uptime show up, indicating that it's not unusual to reach those 49 days and more; the counter just can't show anything higher.

Remember also that NT has to be rebooted more often because of silly things, like changing the default gateway. On the other hand, those intentional reboots might lower the probability of a crash.

I think the stats on Windows 2000 and BeOS should be looked at very carefully. These two OSes will go down not necessarily due to instability but due to sysadmins playing with them.

With Windows 2000, the system people have isn't the finished product, and as patches come out they get applied, and each time they're applied the OS must restart (it IS Windows).

Now, for Be, there are very few people I know who would run it as a standard server. Plus, on top of that, there are lots of new programs coming out. I don't think they require a reboot, but, well, we'll see. At any rate, BeOS isn't really designed to be a server (yet).

The shocking part is how poorly Novell servers manage to stay up. How many government institutions do I know that just bought those? That's pathetic. I knew that NetWare was a pain to run, but I didn't know it was that tough to maintain.

That bug was only in 95/98, not NT. Our company is extremely pro-Linux, but a properly administered NT machine -- which includes, IMHO, not overloading it with too many different simultaneous load-intensive tasks, unlike Linux -- can have decent uptimes (6 months+). No reason to spread FUD when the truth is so much more powerful.

That's really not true at all. I don't recall if that problem even existed in NT; if so, it was fixed long ago. The problem now is that GetTickCount() will only report 49 days of uptime, I believe, but you can easily use Microsoft's uptime utility to gather the uptime from the reboot entries in the system log. I have several NT servers well over 150 days of uptime.

Aside from Win98, which strangely managed an uptime of 61 days, 13h52m (I'd really like to know how; the best I could get out of it was 4 days on at least 6 different systems), they all have the exact same max uptime. Now, I remember something like this being mentioned a while back and Microsoft acknowledging the mistake, but then why is Windows 2000 in the same boat as 95 and NT?

"It's the Win98 box I worry about her messing up. Only thing odd I've had to do is tape over the power switches." Absolutely true... I've had to tape a piece of cardboard over the front of my computers to protect not only the power switch but the floppy and CD drives as well. I run Linux on my system, and Win98 on my wife's system. I am constantly having to fix the 98 system, mainly due to my 3-year-old reconfiguring it. I have yet to have either child come close to harming the Linux machine. They have their own accounts, and I don't have to worry about them messing with things I don't want them to mess with in Linux. I expect to have to reinstall Win98 within a few weeks if the pattern holds up.

I would like to see more advanced statistical data with respect to this. I would suspect that the uptime of a Linux box follows a bimodal distribution, with hobbyists representing the first, larger mode with shorter uptimes, and professional administrators representing most of the second, smaller mode with longer uptimes. The first mode would dominate the other, and the arithmetic mean and median would be pulled towards it through no fault in the operating system.
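A toy version of that bimodal shape (all numbers invented for illustration) shows how completely the larger mode swamps both summary statistics:

```python
from statistics import mean, median

# Invented sample: 100 hobbyist boxes with short uptimes and 4
# professionally run boxes with long ones (days of uptime).
hobbyists = [3, 5, 7, 10, 14] * 20
professionals = [200, 300, 400, 500]
uptimes = hobbyists + professionals

print(round(mean(uptimes), 1))   # 21.0, nowhere near the pro mode
print(median(uptimes))           # 7.0, the pros are invisible here
```

Neither number says anything about how stable the professionally run boxes are, which is exactly the complaint above.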

It is interesting, however. I can see the smaller mode being constrained by the release of security fixes (and I wonder about the very long-lived FreeBSD boxes as well). I do think, however, that the best purpose of an uptimes study would be to find artificial constraints on uptimes, as represented by abnormal distributions.

This could have pointed out the 49-day limit in MS operating systems well in advance of it being reported by MS proper, for example. If a kernel bug in Linux (or any other operating system) caused frequent crashes over time, it would reveal itself in the distribution of uptimes. Like I said above, the point should be to improve our (and by "our" I mean everyone's, from BSD to Linux to Windows) OS reliability, not merely to dick-size about ludicrously long uptimes.

Perhaps this calls for more advanced massaging of the data from the uptimes people. :) I wouldn't mind helping; it's a project well worth the effort.