Lack of Linux System Admins hurting adoption

by Tom Adelstein

I recently got a call from a friend in Amman, Jordan who said the very large web site he manages needs a Linux system administrator. But guess what, they can't find a single Linux sysadmin in the entire country. With plenty of Linux desktop users around, I'm starting to wonder if anyone wants to learn how to use chkconfig.

The next problem that I'm starting to sense deals with a lack of standards. A client company in Dallas-Ft. Worth (my hometown) demonstrates a global condition. A Linux system administrator goes to work and builds a nice application for this media company. He sets up the system and then like many Linux sysadmins decides he can't stand seeing Windows XP on the desktops and in film editing bays. So he leaves.

The server needs maintenance and someone to show the users how to back up the considerable data on the server. So, we get a system admin over to the client to help out. He finds a kluge of workarounds, experiments with software RAID, deprecated libraries and so on. We can't figure out how the previous Linux guy did what he did. Of course, our predecessor left no documentation.

You got it: We mounted the file system, compressed the data files and moved them to a Win32 server. Then we installed a new Linux operating system, built the data management application again and transported the data back to the server. This time, the system has documentation and a record of what we did.

Plenty of companies experience similar outcomes. Linux sounds like a good idea (and it is) but Linux slobs leave their IT departments and leave behind a mess no one can figure out. It's not giving us a good rap, friends.

We even found a system where the sysadmin rewrote the entire init system. Try coming along after that.

Hey, I love Linux and will always advocate its use. But, I hate slobs and the arrogance in the community.

Now, I expect lots of trash in the comment section from people in denial. All I can say is to stay in denial, remember to voice your opinion that Microsoft will fail and Linux already rules. How about this: "in your dreams".

46 Comments

Matthew Sporleder
2006-08-17 06:09:58

Maybe your old admin was just using a distro your new admin didn't like.

William
2006-08-17 07:08:51

This is just another example of the Freedom vs. Conformity dichotomy. Linux provides sysadmins tremendous freedom, and few well-defined standards or best practices.

Part of the problem is that the Linux landscape is changing much faster than the old proprietary UNIX landscape did, because of commodity hardware and Linux's use exploding into all manner of devices at once. The other part of the problem is many Linux users feeling like they are coming from the enlightened hinterlands, with a chip on their shoulders and something to prove.

The solution is to continue to harp on the qualities that make Linux work in the first place - communication, openness, knowledge-sharing - and make sure that everyone is constantly reminded that these qualities don't only apply to the creation of software, but also to its use and maintenance.

WilCo
2006-08-17 09:47:01

I don't think it's strictly a Linux phenomenon; I've seen it in places with proprietary UNIX systems too. The problem seems to be that UNIX systems give people a lot of flexibility and raw capabilities, so people are often tempted to do things on their own where otherwise they'd have paid for software--either COTS or custom from a professional. As they said, UNIX gives you enough rope to shoot yourself in the foot. Linux and free OSes lowered the barrier of entry that would have traditionally required movement through some sort of organizational program, which would have at least instilled a consistent set of bad habits, if not good ones.

I just recently discovered on some of my AIX boxes that, prior to my arrival, developers had managed to convince one of the junior admins to replace the stock /usr/bin/perl with a symlink to the locally-built /usr/local/bin/perl and were using #!/usr/bin/perl in their scripts. They're going to be hurting when I apply OS patches that replace the symlink with an updated binary and the locally-installed Perl modules suddenly go missing.
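A minimal sketch of how one might spot this kind of replacement before patching, assuming only that the interpreter path is known. The function name `describe_interpreter` is hypothetical and not part of any AIX or Perl tooling; it simply reports whether a path is a symlink and where it resolves.

```python
import os


def describe_interpreter(path):
    """Report whether `path` is a symlink and, if so, where it resolves."""
    # lexists() is true even for a dangling symlink, unlike exists().
    if not os.path.lexists(path):
        return {"path": path, "exists": False, "is_symlink": False, "target": None}
    is_link = os.path.islink(path)
    # realpath() follows the whole chain of links to the final file.
    target = os.path.realpath(path) if is_link else None
    return {"path": path, "exists": True, "is_symlink": is_link, "target": target}


if __name__ == "__main__":
    # /usr/bin/perl is the path from the comment above; adjust as needed.
    info = describe_interpreter("/usr/bin/perl")
    if info["is_symlink"]:
        print(f"{info['path']} is a symlink to {info['target']}")
    elif info["exists"]:
        print(f"{info['path']} is a regular file")
    else:
        print(f"{info['path']} does not exist")
```

Running a check like this across a fleet before applying OS patches would at least flag the boxes where scripts depend on a locally built interpreter.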

Wilco
2006-08-17 10:27:49

What you read above is strictly a Linux phenomenon. I didn't say anything about UNIX. People mess up all the time in places like the one you described. My short discourse relates only to Linux as a widespread problem.

Roger Weeks
2006-08-17 12:00:23

This isn't a Linux or a Unix problem. This is a system administration problem, coupled with bad management.

When you have admins that are busy, you get no documentation. When you have bad management, that never insists on documentation, you don't get any. I've seen this happen many times. Many times I have been the admin brought in afterwards to figure out what's going on, and make things work again.

Personally, I have had to puzzle out Linux, AIX, Solaris, BSD, Mac OS X, Windows NT 3.51/4.0, and Windows 2000 systems. Not to mention Cisco LAN and WAN networks using RIP, OSPF, EIGRP, BGP... all undocumented. Applications uncounted, from sendmail to Microsoft Exchange, IIS to Apache. All undocumented.

Lazy administrators are part of the problem, but bad management is also part of the problem.

Tom
2006-08-17 12:26:53

Roger Weeks: I was pretty sure we'd get comments like yours: Totally unrelated to Linux. It's not the human condition. It's specific to Linux. Linux is documentation hell.

Roger Weeks
2006-08-17 13:50:11

Well that's where my experience and yours must differ, because my experience is that all systems are documentation hell. Linux is no different than any other Unix in that regard.

If you've never come into a big complicated network with multiple routing protocols, none of it documented, or a huge AIX installation running a Progress 4GL database, none of it documented... etc etc etc.

I have direct experience with all of these and Linux wasn't anywhere nearby.

Roger Weeks
2006-08-17 13:50:50

Frankly, I think blaming this problem on Linux is short-sighted.

Tom
2006-08-17 13:58:45

Roger, give me a break. I'm not some guy in a diner. You want to compare IBM docs to what Linux has out there? Maybe your shops haven't paid for the proper "vendor documentation" but it does exist. How do I know that? Like I said, give me a break.

Roger Weeks
2006-08-17 14:01:23

One more post, and I'll shut up, I promise.

If your client company had been using AIX, or Solaris, instead of Linux, how would this situation be any different? The server would still need maintenance, your admin would still have had kludges, experiments and deprecated libraries, and you'd still have to figure out how the previous guy did what he did.

And I'm not in denial about this, it is a huge problem at nearly every place that I've ever worked: Windows and Non-Windows alike.

Tom
2006-08-17 14:20:33

Roger, I don't disagree with you at all. We don't have any disagreement.

Life is a conversation, some existentialist said. And we're having two different conversations. I'm saying that a massive gap exists - not just in experience and training - but in the documentation, standards of work processes, etc. I believe (note believe) that has caused Linux guys to be at a disadvantage.

Do you get tired of orphaned posts on mailing lists asking for help? How about the standard RTFM response? The problem with that: no FM exists. I think people should write the FM.

Next, Linux sysadmins are missing in action (MIA). Why? I believe (believe again) because of the lack of straight old Linux work.

Now, Red Hat today is a different story. They do have a tight grip on their niche. Good docs, training, standards, etc. But, have you seen the terrible job that exists in the Fedora community?

Know any Fedora sysadmins?

So, in conclusion, this is not a conversation about the human condition unless the bull of the Freedom vs Conformity argument somehow gets factored in here. Personally, I think the conversation is about the ability to find solutions when the documentation dumped in /usr/share/doc area has dates of 2000, 2001, 2002 and 2003.

On another subject, a psychiatrist says that all mental health professionals in Austin are extremely busy. He suggests that the young professionals seek help, whereas the bubbas handle their problems in the bar. Wanna beer?

Roger Weeks
2006-08-17 15:21:34

I'm a Fedora Sysadmin. Have been since FC1 was released. But I've used RedHat since at least version 5, probably before. We still use RedHat too, on servers where it's necessary. Fedora runs some of our less critical systems.

There are lots of best practices out there for Linux sysadmins. They are the SAME best practices for Unix. The OS details differ, yes, but an experienced Unix sysadmin makes a wonderful Linux sysadmin.

I agree that Linux documentation is patchy at best. But frankly, having had to troubleshoot oddities in Solaris and AIX and Mac OS X too, with bizarre library versions, and compiler issues and network card drivers, I don't feel any different about Linux than I do about any of the commercial Unix distributions. IBM documentation is supposed to be legendary, and the BULK of it is certainly legendary, but it almost never helped me in troubleshooting AIX.

The places where I generally get help on mailing lists? I don't have orphaned responses. But I'm looking at my list subscriptions here and not one of them is for an operating system - they're all for various applications.

I'm looking at an IRC channel where I stay logged in to keep in touch with friends, and there are at least 5 folks on there who are full-time sysadmins. Not all of them are full-time Linux. Most places don't run specifically on Linux, at least any places I know. There's almost always a hodge-podge of systems.

I have probably a hundred people in my circle of contacts that all have extensive experience in Linux systems administration. I don't think a single one of them works somewhere at the moment that is Linux-only.

So to me, a lack of Linux sysadmins really means that most people have Unix and Linux skills, and use them both.

Tom
2006-08-17 17:14:25

Roger: Thanks for sharing your experience.

Caitlyn Martin
2006-08-17 23:07:14

One of my little specialties is coming along and cleaning up a mess. It pays well. I did it when I was a management type (IT Director) in two places and I've done it as a consultant (read: back as a technical type) on more occasions than I care to recall.

Tom, Roger Weeks hit the nail on the head. I've also seen lots of developers and sysadmins in the Windows world leave an undocumented mess and all sorts of weirdness. Ditto on commercial UNIXes, including AIX, Solaris, HP-UX, and Irix. Ever tried to migrate off a legacy Digital UNIX or QNix system that was in a similar state? Worlds of fun, let me tell you.

Most Linux sysadmins are also UNIX sysadmins. The two skillsets are so close and there is so rarely a job for just one or the other that I have yet to run into someone who does Linux professionally who doesn't also do UNIX. There are some wonderfully skilled and disciplined sysadmins who keep to standards and document well. There are many who do not.

Also, don't just blame the sysadmins. Blame management who cut corners and leave IT departments grossly understaffed. Given a choice between doing it right and doing it fast the sysadmins are often forced to choose the latter approach. Then you or I come along and get to clean up their mess -- a mess they were often forced to make.

Tom
2006-08-18 05:53:03

Caitlyn: Your statements have a ring of truth about them. Yet, IT management can live with the mess and have you clean it up because they can look at a document or a knowledge base and say "hey this is the way he was supposed to do it. OK, fix it."

It ain't so with Linux.

Also, UNIX admins appear less likely to fall in the Linux area. Most working Linux sysadmins have had to learn the craft on the fly and come from the Microsoft-Netware population. Ooops. IT management considers MCSEs more reliable and will pay for them to learn like at the Red Hat academy.

I expected that we'd see comments like yours. The Linux community remains out of touch with reality and in a state of wishful thinking. That's not a productive approach to a widespread problem systemic in the community and it will not further the action toward a solution.

Caitlyn Martin
2006-08-18 10:09:04

Tom, you are forgetting that I am an IT professional and a Linux consultant just like you are. You forget that I have worked as a consultant for Red Hat visiting Linux customers. I also used to be IT management. I think I am very much in touch with reality, thankyouverymuch, as in every bit as much as you are. Claiming I and others are not is simply a way of dismissing those that disagree without considering that they may have a point. C'mon, Tom, you're better and brighter than that.

Let's start with the documentation issue: 91% of the Linux server market in the United States is owned by Red Hat and as you have already admitted Red Hat has excellent documentation. Novell/SuSe's isn't half bad either. Then there are lots of O'Reilly books that effectively fill in gaps. There isn't a central Linux repository of knowledge out there and NEVER WILL BE because Linux isn't one monolithic project run by one company. What you seem to be demanding is that Linux morph into Solaris or Windows. It can't and won't.

Having said that, a central source doesn't guarantee good documentation, as Roger pointed out. Back in the day of HP-UX 10.02 there was this shelf full of HP books for the OS. That doesn't exist for 11i, does it? You have to go online and dig. It's cheaper for HP and their customers that way, isn't it? The net result is that in terms of useful, easy to parse through documentation, HP-UX is way, way behind where it once was. Roger hit a nail on the head when he pointed out that AIX, Solaris, the BSDs, et al are all in the same boat. The situation is hardly unique to Linux and, thanks largely to Red Hat and that 91% server market share, it may actually be better in the Linux world than in much of the commercial UNIX world.

What you describe is generally customized scripting, coding, and hacking at the OS that some (most?) Linux/UNIX admins do. You described an extreme example where some overzealous admin rewrote init. Would it surprise you to know that I've seen the same thing? Would it also surprise you to know that I walked into a shop that still had legacy Solaris 1.x and SunOS 4.x servers and that exactly the same had been done on those boxes? It is a failure to document, to write clean code in the first place, or just some very strange technical decision making that is the problem here. There are lots of reasons why this happens: lack of proper training, lack of discipline, lack of time, perceived cost savings, lack of technically qualified management which translates into a lack of supervision of processes and procedures, and lack of making documentation a priority. It is not and never has been a lack of qualified sysadmins. Let's face it, you have to know quite a bit to rewrite the init process even if it is a stupid thing to do. It happens every bit as much in UNIX as in Linux.

IME the Linux admins that come over from Windows are the minority. Thanks to IT downsizing and outsourcing there is a glut of qualified people rather than a shortage. Granted, most of the cream of the crop is still working and always has been but there are still talented and experienced people out there. It is also true that a bright person within a department can move into new areas and we do get people who come over from Windows that way and, failing proper training, do make mistakes. Some companies still try to save a buck by grabbing a smart kid straight out of college. There is absolutely nothing wrong with that if you mentor and direct them. If not, and if they treat the corporate server farm as their playground, well... we've both seen the results. How can you call all these issues a failing in Linux?

Again, what you describe seeing at customer sites is real. I've seen it too. However, claiming it is somehow uniquely a Linux problem rather than a wider IT problem is not.

One of the reasons I have toyed with looking for a management position in a corporate IT department again is that I could at least do it the right way in one company. Recruiters approach me about such positions regularly. Then I think about it and promptly come back to my senses.

Simon Hibbs
2006-08-18 10:11:38

One problem here is we're talking about two different things, so no wonder a lot of this thread is at cross purposes.

2. Undisciplined sysadmins can really make a mess of a system - it'll probably work fine, but will be unmanageable by anyone else.

The reason for a lot of the cross-talk is that the original article gave an example of point 2 in the specific case of a Linux admin, but was really talking about point 1 because there isn't much official documentation of the right way to do things that would mitigate point 2 in this case.

I hope I've got that straight?

The solution to point 1 is better documentation from Linux vendors, and I think this is happening. The solution to point 2 is better professionalism in admins, but also crucially better management.

Any manager responsible for IT should be aware of this problem, and take steps to mitigate it. This means pay for professional admins. Allow admins time and resources to document what they do. Finally, insist that they do so, and take steps to verify it. Modern documentation tools such as wikis make ad-hoc documentation and note taking a cinch, and make the results broadly accessible to the team, making the documentation process both easy and transparent.
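As a rough illustration of how little tooling ad-hoc note taking needs, here is a minimal sketch of a timestamped change log, assuming a plain text file stands in for the wiki. The names `format_entry` and `log_change` are hypothetical, as is the entry layout; nothing here comes from a particular wiki or logging product.

```python
import datetime
import getpass


def format_entry(who, what, why, when=None):
    """Build one log line: timestamp, author, the change, and the reason."""
    when = when or datetime.datetime.now()
    stamp = when.strftime("%Y-%m-%d %H:%M")
    return f"{stamp} [{who}] {what} (reason: {why})\n"


def log_change(logfile, what, why, who=None):
    """Append an entry to `logfile` and return the line that was written."""
    # Fall back to the current login name when no author is given.
    entry = format_entry(who or getpass.getuser(), what, why)
    # Append mode keeps earlier entries intact; the log is never rewritten.
    with open(logfile, "a", encoding="utf-8") as fh:
        fh.write(entry)
    return entry
```

Recording the reason alongside the change is the design choice that matters here: the next admin can usually see what was done from the system itself, but not why.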

Matej Cepl
2006-08-18 19:39:08

Just a question -- I am just trying to switch from academia (non-IT-related one) to Linux administration -- where are those wonderful documentation techniques documented in the first place? What am I supposed to do so that I won't be hated as the mentioned admin? Is /root/log.txt enough? Probably not, so how to do it?

Thanks,

Matej

Tom
2006-08-18 20:28:46

Matej: Simon did an excellent job of summarizing the issue. It seems like Linux documentation or the lack thereof is a problem - a really bad one. I don't know how to lead you to material that can help you get started learning the standard body of knowledge. To solve problems and do my job I usually have to dig and dig to find answers.

If you are on your own, you should give yourself about two years to turn the corner if you have a lab of servers on a public network. Start learning local administration and then move to the Internet side. Also pick a specialty like mail servers and directories like Postfix and OpenLDAP. Or Database administration.

You'll have to learn the operating system inside and out and work with standard services and their config files. You just need to know it's a big learning curve. It's not an easy transition but it's really worth the time and effort.

David Levine
2006-08-18 20:44:55

Some interesting comments earlier in this thread comparing available (formal) Linux documentation and IBM documentation... The thing is - there is a TON of formal IBM documentation for any number of IBM systems and platforms - but I'll be damned if I can find any of it. I can spend days... even weeks attempting to find good information on i5/OS and never get anywhere... (redbooks are great - but are not real world). However in about 15 minutes I can find some extremely helpful documentation on any variety of task or application on Linux.

The availability of information on Linux is vast - and will continue to grow as people adopt it...

Tom
2006-08-18 21:04:06

David: People get IBM documentation when they purchase an IBM product. If you purchased AIX for the pSeries then you would have more documentation than one could imagine. But, you won't find much in the public domain. Redbooks have only limited use. IBM is a proprietary company and even with all their support of Linux, they don't give their other products away.

I don't know a diplomatic way to respond to someone who would say "the availability of information on Linux is vast - and will continue to grow as people adopt it..."

I think of words like naive optimism, lack of practical experience, fanaticism, etc. Those things probably don't apply to you and I'm not clairvoyant so any character assessment on my part would be pure speculation.

I will say that my assertion about the problems with the poor quality of Linux documentation and what it does to its adoption is well supported. The evidence is clear and if you have ever worked on a serious sysadmin problem you should know that.

As someone who writes books, howtos and other documentation for the Linux community, I assert that we need more people on projects working as traditional technical writers.

K C Ramakrishna
2006-08-19 01:32:32

I totally agree with this problem. I am a Linux consultant and have faced this problem when I have been called in for fire-fighting. I once went in and managed Linux servers for a 1000 people co. armed with just the root passwords.
There is definitely a problem with the documentation by the sysadmin but the community really needs to address this problem. I would suggest a tool for the sysadmin where he records all the changes made.
Or maybe automatically version all configuration files (svn anyone?)
It's unbelievably frustrating, and guess which OS the co. migrated to? - yup Win2003 server. And of all the galling things, I had to help as I was the only one who knew what was happening on the servers.
Non-Availability of quality Linux Admins is definitely killing Linux adoption in India.
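A minimal sketch of the change-recording idea suggested above, assuming a simple hash manifest rather than a real version control system such as svn. The function names `snapshot` and `diff_snapshots` are hypothetical; the approach only reports which files were added, removed, or changed between two points in time, not what the change was.

```python
import hashlib
import os


def snapshot(root):
    """Map each regular file under `root` to the SHA-256 of its contents."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and special files; only plain files are hashed.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            try:
                with open(full, "rb") as fh:
                    digest = hashlib.sha256(fh.read()).hexdigest()
            except OSError:
                continue  # unreadable file: leave it out of the manifest
            manifest[os.path.relpath(full, root)] = digest
    return manifest


def diff_snapshots(old, new):
    """Compare two manifests and list added, removed, and changed paths."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(p for p in set(old) & set(new) if old[p] != new[p])
    return {"added": added, "removed": removed, "changed": changed}
```

One possible use is to take a snapshot of a configuration directory such as /etc before and after a maintenance session and keep the diff with the session notes. Committing the directory to a version control system would capture the content of each change as well.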

SP Kasam
2006-08-19 01:37:33

I think part of the problem of sysadmins not properly documenting Linux configurations is down to the high reliability/security/stability of Linux/Unix compared to Windows.

On Windows servers you expect crashes/instability/worms/viruses etc. and you expect to have to re-install from time to time and also to patch regularly in crisis situations to deal with worms etc. You therefore need to document the configuration properly and avoid customising anything if you can avoid it because it just makes re-installation more time consuming.

With Linux/Unix, because of rock steady stability and because much better security is achievable, sysadmins take shortcuts in documenting the installation, and are tempted to do extreme customisation and leave the configuration undocumented. I do this myself. Linux sysadmins just don't look ahead to the need to document because the need doesn't arise - re-installation is rarely required, there aren't the periodic worm attack crises and patching required with Windows, and the configuration in /etc is self-documenting to an extent that the Windows registry can never hope to get close to. Linux sysadmins can get away with this level of minimal documentation without any problems - until they leave, that is - then documentation and, even worse, extreme customisation become a real problem.

Trevor Lane
2006-08-19 03:11:26

Linux sysadmins don't need to document their work did you say?

AAArgh; Please don't apply for a job with me. Unfortunately you have just corroborated the earlier point that quality sysadmins are few and far between.

Volker Hett
2006-08-19 04:27:57

I've seen this, too. A guy set up small servers for small companies, homegrown Celerons with ATA drives running the then current SuSE distribution. The users don't know anything about it and don't do backups, since they don't know how to or with what. Then the guy died in an accident.
OTOH, his clients are pretty cheap and don't want to, or can't, pay for a part time administrator. After salvaging a corrupted netatalk share on a Linux box at one of his former customers for a bottle of Sherry, and reconstructing files from a broken hard disk of another customer's laptop for 100 Euro, I made a proposition for a reworked system with documentation, training and a backup plan. It was turned down in both cases.

macker
2006-08-19 05:22:30

Pay me now, or pay later??
This semi-typical anecdote may generalize to more than Linux, Unix, or OSnnnnn networks and database systems: it is just one more specific instance of a more fundamental paradox, constructed as an unintended consequence of our technology-dependent society.

Those who don't know how to deal with the details must hire those who *do* know how, and thereby lose control, in an at-will employment scenario. It's really simple. If you don't have the skill and/or inclination to do your own network design, programming, and administration, but you still want a network, you are forever at the mercy of the people who are willing and able to handle the technology.
Learn how to do it yourself, and DO IT!! or quit whining about how that terrible technology person "left you in the lurch". You get what you pay for, ( and pay, and pay, and pay.... ) Somebody just nickel-and-dimed themself into a problem, and I have no sympathy for them at all.

Chris Bergeron
2006-08-19 06:14:08

I've got to agree with you on that one. As tedious as it might be, any custom work you need to do to get a system going needs to be thoroughly documented. *Especially* if that's harder than actually doing the work.

Some people seem to view obscurity as some kind of job security.

I'll grant you that it *currently* happens more often on Linux. But I've got a whole chain of SCO Openserver using stores out there that'll argue the point that it's strictly Linux.

I've got docs on all of my SCO stuff for the customers. Unfortunately this chain had a hot-shot local guy (no longer around) do the installs at all their stores with all these custom "tweaks". I have enough trouble finding *any* documentation on that version of SCO, all their scripts and kludges (which are necessary in the OS at times) are totally undocumented.

Mandriva's actually a bit better documented and using 5 different releases in the field has eased off the headache of managing a *single* SCO release for me. Although I can't vouch for the other UNIX(s), I have an easier time administering Linux (researching flaws, patching up bugs, getting stuff to "just work" remotely) than SCO and I've found it to be equiv. to what the Windows shop guys I know have to put up with.

You're definitely right about your cases, and possibly even in general. It just doesn't apply to my shop. The machine labeled FILE SERVER (backed up) contains a directory (pretty clearly labeled, don't recall as what ATM) that has all the oddball documentation I/we used and created over the years.

My 2 cents. Hopefully it doesn't get run through a penny press...

Alex Chejlyk
2006-08-19 06:19:53

I, too, am an IT person. I see very little sysadmin documentation at all the companies I've "taken over". Recently I took over a small firm with MS SBS 2003. The only documentation was the password list. The security on certain shared directories was set up improperly (no one including administrators could access the shares), router passwords were missing, etc.

My documentation isn't the greatest mainly due to time. I comment the files I modify and keep a log of work performed but I could do better. I recommend not to modify the distro structure as there are many ways to cook an egg, some ways are quicker but not necessarily better. I see lack of docs more of a sysadmin problem than a specific OS issue.

Cheers,

Alex

Scott Alan Miller
2006-08-19 06:34:22

I am a Senior Linux Systems Admin for a large multi-national and know scores of other admins who would be more than willing to work with companies in Jordan needing administration, engineering, etc. Are these companies able to use offsite administration? That is what most American companies do. Even in house staff is seldom located at the data center.

SP Kasam
2006-08-19 06:56:03

Linux sysadmins don't need to document their work did you say?

AAArgh; Please don't apply for a job with me. Unfortunately you have just corroborated the earlier point that quality sysadmins are few and far between.
Trevor Lane

Read my post - I didn't say that sysadmins don't need to document their system configuration - I was simply giving the reason why it often doesn't happen. I said that because of the superior stability and security of Linux and Unix compared to Windows, and the ability to embed comments into the config files in /etc, Linux and Unix sysadmins can get away with documenting configuration with comments in the various config files and customised scripts plus some external notes in a note pad as to which scripts are customised. This is no problem provided the sysadmin doesn't leave. When he does though it is difficult for another sysadmin to pick things up.

A Windows administrator has to deal with crisis patching and reinstallation far more frequently, and Windows GUI config tools and Windows registry aren't as conducive to embedded documentation as editing Linux config files. The result is that Windows has to be documented externally, and the Windows sysadmin will do the documentation simply to save himself work because he knows he will definitely be required to do reinstallation/crisis patching at regular intervals.

Rather than being a problem with poor quality sysadmins, this is a problem with poor quality IT managers. It is human nature to act in self interest. Why do you expect a Linux/Unix sysadmin to do documentation which goes beyond what he requires to run and maintain the system himself, and into facilitating his replacement? Get real! It is the sloppy/incompetent IT manager that fails to ensure ease of continuity of maintenance if a sysadmin leaves, by ensuring installation/configuration documentation is done by all sysadmins under him in a manner that can easily be taken over by someone else.

alucinor
2006-08-19 07:12:14

I think all the Linux OSes for general desktop/server usage should just standardize on Debian. That's probably what's going to happen in the future anyways.

jp
2006-08-19 07:21:21

I've been managing systems for 10+ years and it's nonsense to say this is only a problem for Linux. Vendor documentation is slightly better for some OS's like MS, providing you can find it in their knowledgebase. The community support is only slightly better, simply because there is such a large user base. On the other hand, when it comes to very difficult complex problems, I find most OS's are similar when it comes to the difficulty in finding support from the vendor or the user community.

But, as far as documentation by system administrators, there's certainly no difference. I have seen the same things this article mentions with Windows admins. They write all kinds of bizarre scripts to do things, or set up databases or other services in very strange ways, and it's the same scenario where you just start from scratch rather than trying to figure out what they did or why. It's poor systems administration for whatever reason, but it has nothing to do with the OS.

James
2006-08-19 08:03:23

I work with 2 other SysAdmins at an ISP/IPTV/Fiber Transport shop. We have complex systems and networks ... When one of us starts to toot our own horn about the super cool solution we came up with to solve some issue or need, the other two always say "Is it in Wiki?" Because it's vaporware if it isn't in Wiki. All three of us have over 10 years of experience and we all can share stories about that situation you described but it has nothing to do with Linux.

Oh, and on our front page of our wiki .... We Don't Do Windows.

dietrich
2006-08-19 08:30:09

Don't be a litter bug--deposit your trash here:(/dev/null)
;)

Dale Wilcox
2006-08-19 09:29:04

Documentation is one of the tasks that most if not all admins put off. This is the biggest problem of all the organizations that I have worked for over the past 30 years.
It is not a Linux problem. It is management's and admins' lack of follow-through, period.

Tom
2006-08-19 09:44:32

Dale: You can stomp your feet and say period all you want. I've done this for 29 years and I've seen some great shops. Additionally, I've consulted on very high levels for more than half of my career. I've been a partner in some big and some small firms. I have seen and worked in many more shops than I have managed. But the shops I managed were clean, well documented and top notch.

It still boils down to you having a different conversation than me. You just didn't get it.

Bpechter
2006-08-19 11:34:05

In general -- Linux documentation sucks. This is one of my major rants.

RedHat and Suse documentation sucks less than no documentation but more than commercial Unix docs.

I've been doing Unix Sysadmin for about 20 years, since I left DEC's VAX/VMS and hardware support. I've been a sysadmin trainer, training doc developer, sysadmin and product support engineer.

I've been a sysadmin in a commercial IT department and for a lot of development labs (inside computer companies and in Bell Labs).

The man pages for the standard commands are less well maintained for Linux than the ones for *BSD, Solaris, SysVRel x...

Fact.

See the number of utilities that refer the user to the gnu info docs. See the lack of good examples in the manual sections.

But that's not the problem. It's a symptom of the nature of the GNU tools. Linux vendors need man pages at least of the level of Solaris and *BSD.

Linux is just another Unix. It should be able to be administered like any other Unix-like system. I've done almost 30 different Unix versions and do multiple Linux versions today including Ubuntu, RHEL3/4 and clones, RH6.x->9, Corel 1.1.2, Suse7...

There's no way a competent Unix sysadmin can't run any Linux environment/distribution/network.

Let's get real. A main thing that professional sysadmins know is that rewriting system standard startup scripts and the normal system programs to incorporate special functions is a bad thing.

Local modifications need to be done in a way that will be transparent to normal sys admin and documented. In addition normal system upgrades should not break functionality.

I've run into the problems with stuff I did with a custom OpenSSH port in Solaris which gets into issues when the Sun OpenSSH patches get installed.

Any mods like this need documents on the rationale, method and maintenance.

The problem is small shops (we've got three people for a couple of hundred systems on three continents with telecomm and mail and web services running 24x7.)

Docs get done in between emergency updates, customer demanded new features, product updates and roll-outs.
Internal docs for the sysadmin staff are often just a quick man page and an email on how it's used.

If only we had the staffing. We do operations, some development assistance, hardware maintenance, Windows support, hardware engineering and platform test, and Cisco networking. This is all done with just three people -- down to two or one during vacation crunches.

Most big shops would have at least a dozen doing what we're holding together. I'd love to have a job where I could do this correctly.

With the minimum staffing in IT these days in startups and small companies you get the kind of engineering you pay for.

I'd love to get a job where I'd be able to do it right. I've got my resume out there looking but the current IT management wants certifications before experience and experience on EXACTLY the same stuff they're running so they can drop someone into their environment on the fly with no learning time.

I've done AIX, Linux, Solaris, HP-UX, SCO... a good sysadmin who UNDERSTANDS HOW SYSTEMS WORK can sysadmin any of these without a huge amount of retraining...

About 10 years ago I was looking for a job that wanted five years of commercial Linux experience. The only person who could claim that was maybe Linus...

The OS was at the 0.99.3 kernel when I started with it. Anyone who did commercial Linux in those days had to be a kernel hacker level guy or a liar. Would I bet everything on 0.99.3 in a business then? Nope.
I'd have run Solaris or SCO or maybe *BSD.

It's either the self-trained so-called Uber-geek or the untrained rookie that makes the messes.

The software RAID experiments mentioned sound like a case where the hardware RAID either didn't work with the distribution or was too expensive.

The stuff I was forced to implement used a RAID driver from Adaptec that only worked with a certain release of RHEL... The RHEL kernel can't be updated without breaking the RAID.

Whose fault was this -- the engineers that "spec'd" the beast, the Intel guys who said the on-board controller worked with Linux, the Adaptec lack of ongoing support for the chipset under Linux, or the management who wouldn't spend the $$$ for upgrading to another RAID card?

My job was to get it running. Doing it the right way was not an option.

Blaming the prior sysadmin for the mess is often something that happens when the next guy comes on the job.

I've done it. I'm not proud of it but I'm honest. I admit I lost my cool and said "What #$%^ did this $#%^&*." I've done it more than once starting in 1986 on my first sysadmin job. I do it a lot less now.

I've been ashamed of my attitude when I worked with the guy at another job. The Central NJ Unix job market was pretty small. Computer companies like Concurrent, the electronics command at Fort Monmouth, Bell Labs and the local colleges are about it. People rotate between them.

The guy I bitched about was good, probably better than I was. He also worked with constraints I didn't know he had. Sometimes the second guy into a site gets a different set of priorities than the guy that had to make a demonstration proof of concept work. Then that "proof of concept" goes live without the redundant hardware and backup networking... poof. A disaster.

Remember that old joke about a corporate CIO who replaces the one the CEO dumped after a failed system roll-out.

The old CIO gives him three envelopes. "Here," he said. "Take these three envelopes and when you hit a crisis open them in the numbered order."

His plan to reorganize IT runs into a delay.
He opens envelope number one and it says, "Blame the outsource partner and bring everything in house."

He does that and survives the delay. Crisis number two hits. He opens envelope number two and it says, "Blame the hardware vendors for delivering underperforming new hardware platforms late."

When his third delay occurs he opens the last envelope and it says, "Get three envelopes."

I use Linux for my desktop and 75% of my work machines are Linux -- with 20 percent Solaris and the rest Windows2000/XP.

One day I'll get that chance to run one of these projects with enough funding and staff to do it up right. These days in the trenches we just try to keep ahead of the next wave of incoming features requests.

Perhaps it's time for me to start consulting. There seems to be much more of a buy-in by management when someone outside states hardware requirements and asks for a stable list of software features before starting projects.

Richard Steven Hack
2006-08-19 12:23:47

And can you guess how many Windows MCSE "slobs" there are who leave their systems totally messed up as well? Over at Slashdot, we've had this discussion before. Linux (and UNIX) sys admins in general tend to have more general competence than Windows sys admins. I HAVE seen poorly installed Linux systems. I've seen MORE poorly maintained Windows systems. Don't even start on crap like Access database systems designed by people with zero knowledge of database design best practices.

Face it - 90% of the people who set up computer systems are not particularly competent. This is true for Linux, UNIX, and Windows. Get used to it. And it's an opportunity for anybody who IS competent to get rich.

Woody Allen, the greatest philosopher of the 20th Century, summed up the human condition - and the IT industry - in five words: "Nothing works and nobody cares."

Ashe
2006-08-19 12:49:30

As a comment, there are probably more people out there who are quite capable of performing as a Linux sysadmin than you may be aware of. However, the problem you will find is that many of them are the home-grown kind, the ones that haven't gone through MCSE/etc certification, the ones who don't have x years in the industry, the ones who got into it off their own backs. Policies that demand a degree, insane amounts of commercial experience or other numbers on paper (and there is a distinct lack of these for Linux right now) mean that these people slip through the net. Maybe this is as much employment practise (I have never seen a job vacancy that mentioned the word Linux that didn't want at least 5 years experience, for example) as it is lack of qualifications or perceptions.

John Fanghorn
2006-08-19 13:22:13

Badly implemented and documented IT systems are certainly not a Linux problem per se, and Linux is no more predisposed to producing a bad setup than any other given platform technology. There's a great book by Robert L Glass entitled "Computing Projects which Failed" which was published in the 70's, long before Linux was around, which provides a humorous catalog of failed IT projects, some with huge budgets; it's a great read if you can find a copy, and the author makes some very pertinent observations on the nature of the technology business. In more recent times, some of the large government IT systems that have been widely publicised in the media as failures don't use Linux at all. In 30+ years of professional software development, using Windows, various mainframes and commercial Unix based systems in the financial sector, I've seen many badly implemented and documented systems; it's a problem common to all platforms, having complex causes, with its roots mired in the intrinsic difficulty of engineering well-designed complex systems and in the commercial nature of the business.

l1nux
2006-08-19 15:37:26

First someone needs to document a few best practices:
1. Operating System will be vendor supported (e.g. RedHat, Suse)
2. Vendor packages will be installed whenever possible (e.g. rpm install vs. source install)
3. Every source install must be documented.
4. Modifications to any system file must be documented. (System can actually be recreated from scratch)
5. Document system configurations (e.g. cfg2html)
6. Ensure all sysadmins read system documentation written by co-workers.
7. Actually cross train sysadmins between areas (e.g. Windows, Linux, Solaris)
8. Ensure each sysadmin has a backup sysadmin.
9. Have each sysadmin take a week vacation during the year. See if others can solve all issues.
10. Manager checks on progress during the handoff of systems when a sysadmin departs.
11. Maintain customized step-by-step OS installation guide per system type.
12. All custom scripts are configuration controlled. (e.g. cvs, svn)
13. Maintain a central repository for all the documentation. (svn)
...
Geez I could go on all day.

l1nux
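
As a rough illustration of points 4, 5 and 12 in the list above, here is a minimal Python sketch that records checksums of a set of system files and reports what has changed since the last recorded baseline. It is only one possible approach, not something the commenter described. The function names (`snapshot`, `diff_snapshots`, `save_snapshot`, `load_snapshot`) are hypothetical, and the list of files to watch is left to the site.

```python
import hashlib
import json
from pathlib import Path


def snapshot(paths):
    """Return {path: sha256 hex digest} for each path that is an existing file."""
    result = {}
    for p in map(Path, paths):
        if p.is_file():
            result[str(p)] = hashlib.sha256(p.read_bytes()).hexdigest()
    return result


def diff_snapshots(baseline, current):
    """Compare two snapshots and list files added, removed or changed."""
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    changed = sorted(
        p for p in set(baseline) & set(current) if baseline[p] != current[p]
    )
    return {"added": added, "removed": removed, "changed": changed}


def save_snapshot(snap, out_file):
    """Write a snapshot as sorted JSON so it diffs cleanly under cvs or svn."""
    Path(out_file).write_text(json.dumps(snap, indent=2, sort_keys=True) + "\n")


def load_snapshot(in_file):
    """Read back a snapshot written by save_snapshot."""
    return json.loads(Path(in_file).read_text())
```

A sysadmin might call `snapshot()` on the files the site customizes, commit the saved JSON next to the written notes, and run `diff_snapshots()` before a handoff. Any file listed under "changed" with no matching note is an undocumented modification.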

Tom
2006-08-20 13:07:18

FYI: I do get to sysop this blog. Any vitriolic posts go. I consider personal attacks off-topic. Take it where someone else can appreciate it.

TheSmeg
2006-08-22 04:52:56

"Badly implemented and documented IT systems are certainly not a Linux problem per se, and Linux is no more predisposed to producing a bad setup than any other given platform technology."

I think I agree with the parent blog entry in saying Linux itself isn't more predisposed to these problems, but rather that Linux Sysadmins are. While my evidence is all from my personal experience of working with sysadmins who primarily use Solaris and those that primarily use various Linux distros, I'd have to say that the prevalence of a compile-from-source-without-documenting-what-I've-done mentality is far, far greater in Linux-land. My guess would be this is more because the cost of proprietary UNIX hardware in the past has meant more proprietary sysadmins learn their craft in a big money corporate environment where hacking and experimenting in production is a sackable offence. I know some big enterprises would be an exception to this but in general, they pay for stability and tend to be more conservative.

I've watched 4 introductions of Linux servers into investment banking environments (where instability of the platform is really not tolerated), two where Linux was implemented by specially hired Linux admins and two where there was a mix of Solaris and Linux admins working together, with the older Solaris heads taking the lead. Guess which banks actually got Linux into their production environments?

Tom
2006-08-22 05:08:46

TheSmeg: I'm usually amused by someone who projects his or her personal experience out and then generalizes as if the whole world was constructed the way they see it. That was something I had to break early in my career.

Guess what, your snapshot of life has little to do with reality other than your own. I'm not impressed by anyone who feels they have to vote on everything whether they're competent to do so or not. Four investment banks: wow. I guess the entire investment banking community is constructed the way you see it. Boy, am I impressed with your take on life.

Graham Bentley
2006-11-21 09:58:34

I disagree about documentation being similar for *nixes - I have found the FreeBSD doco superb. Coming from Linux, nearly every question I had was answered within it or within the community lists.

learning
2006-12-03 14:28:00

I have been using Linux for 2-3 years. I like Linux because most of the time you can use good software without any cost, and it really helps. But the problem with Linux is that there are many Linux flavours: if you just consider Fedora there are six, plus Debian, FreeBSD and others. It creates problems when you want to install software because it may require a different lib/version.
For example, my postfix rpm for f4 does not install on f5, and sometimes installing from source is quite impossible without the help of Google!!!
If the admin keeps a document, that does not help that much. Personally, I found the IP and major installed software list helpful, because no document will help unless you run the system for 1-6 weeks. That is what I believe.
In Windows there are 98/XP/2k/2003 and you do not need to worry too much about the software; it is the software developer's headache.
Different flavours will kill Linux one day!
