Which changes that you have implemented had the biggest impact on saving time in your daily sysadmin workload? What are your tricks for working more efficiently and getting more done, or working less for the same results?

This question exists because it has historical significance, but it is not considered a good, on-topic question for this site, so please do not use it as evidence that you can ask similar questions here. This question and its answers are frozen and cannot be changed. More info: help center.

22 Answers

Monitoring + alerting is a great safety net. Just as developers write unit tests to make sure things don't get messed up when they update code, I rely on monitoring as an additional safety net in case I screw something up (say, disconnecting a server, or denying production traffic on the firewall). It gives me peace of mind: if things break, I will know before customers call.
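That safety net is usually built from lots of small checks. As a sketch, Nagios-style plugins just print one status line and exit 0/1/2 for OK/WARNING/CRITICAL; the thresholds and the usage check here are invented for illustration:

```shell
#!/bin/sh
# Hypothetical Nagios-style check: print one status line and use the
# conventional exit codes (0=OK, 1=WARNING, 2=CRITICAL).
check_pct() {
  pct="$1"; warn="$2"; crit="$3"
  if [ "$pct" -ge "$crit" ]; then
    echo "CRITICAL - usage ${pct}%"; return 2
  elif [ "$pct" -ge "$warn" ]; then
    echo "WARNING - usage ${pct}%"; return 1
  else
    echo "OK - usage ${pct}%"; return 0
  fi
}

check_pct 93 80 90 || true   # prints "CRITICAL - usage 93%", exit code 2
```

In a real plugin the percentage would come from `df`, a queue depth, or whatever you care about; the monitoring server only looks at the exit code and the first line of output.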

Plus: we have a huge TV on the wall showing rotating diagrams of our network (done with Nagios and NagVis). It creates great visibility, and the latest version of NagVis is a stunner. It gives your boss and your users the feeling that you are in control (which you are, once you have this).
– wolfgangsz Sep 5 '10 at 12:16

I got to the point where I was administering 40 or so Slackware machines, and each one had local authentication PLUS local Samba authentication. I also had a VPN solution where each account needed to be set up separately, plus an internal Jabber server and an internal email server. Everything had its own account. MACs (Moves, Adds, Changes) were insane.

So I switched from Slackware to CentOS, created an Active Directory infrastructure, and used Likewise Open to authenticate all of my Linux machines against AD. It probably saved me 20 hours a month without joking even a little.

Now, I've got everything authenticated through AD that I can, and it works tremendously. I can't recommend centralized authentication enough if you're still doing things the bad old way.

chmeee: I don't know, as my infrastructure isn't as complicated as yours. I suggest you try it on a spare box. Likewise Open is free, and it makes no domain changes at all, other than adding the machine to the AD computers OU.
– Matt Simmons Jul 6 '09 at 12:48

Why AD and not OpenLDAP or the Red Hat/Netscape one? Is it predominantly a Windows network?
– David Gardner Jul 15 '09 at 14:03

Because I was familiar with AD and not with OpenLDAP or Red Hat Directory Server.
– Matt Simmons Jul 15 '09 at 15:06

I agree with the obvious choices here: automation and central authentication. However, it appears that I have to be the guy to mention documentation.

By documenting as many problems, workflows, installations, and guides as possible, people were able to work through some of their issues without needing to get our department involved.

Another great time saver is issue tracking.

Being able to prioritize tasks, assign them to team members, and get rid of the clutter of people sending in requests by email, MSN, or simply coming by the office. This also helps our good friends, the managers, see how efficient you are (if you want).

Then of course, the icing on the cake would be my 'RTFM' (Read the Fine Manual) mug that gets raised a lot.

Monitoring + alerting, IMHO, is way better than documentation: it is implicit documentation (not that I dislike written docs). Agreed on issue tracking; couldn't do without it.
– Server Horror Jul 6 '09 at 14:55


Monitoring + alerting are the "what". Documentation is the "why".
– David Mackintosh Jul 6 '09 at 17:58

You can have all the monitoring and alerting you like but if you don't have any documentation about what to do when the pager goes off or notes on how to extend the monitoring then you'll be stuck answering all the pages and maintaining all of the monitoring yourself. Documentation allows knowledge transfer so you can build your team and let others step in to share the work.
– dannyman Sep 1 '11 at 4:12

Infrastructure automation with a tool like Chef or Puppet is the best thing I've implemented on the systems I manage. Monitoring is great and all, but often, getting the various bits to play nicely with the rest of the infrastructure requires a lot of work. Chef and Puppet are both great at automating the entire infrastructure, providing a lot of glue that used to be written by hand, particularly piecing together which servers provide which services.

Chef has, built in, the ability to query the server for attributes and recipes applied on other nodes, so you can ask who the production web servers are, or who the database master is, making automation much easier. Puppet can do this too, but it requires an external node classification tool like iClassify.

This should have obvious implications for monitoring and trending tools like Nagios and Munin. It can also, for example, provide automated configuration of load-balanced environments, so the LBs can query all the web servers that need to be covered for a particular app.
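A sketch of that last idea: feed whatever your node query returns (here a hard-coded host list standing in for the output of something like a knife search) into a template for the load balancer. The nginx-style upstream block, the host names, and the pool name are all invented for illustration:

```shell
#!/bin/sh
# Hypothetical glue: turn a list of web-server hostnames (as a node
# query tool might return them) into an nginx-style upstream block.
make_upstream() {
  name="$1"; shift
  echo "upstream ${name} {"
  for host in "$@"; do
    echo "    server ${host}:80;"
  done
  echo "}"
}

# In real use the host list would come from the config management server.
make_upstream app_web web01.example.com web02.example.com
```

The point is that nobody hand-edits the LB config when a web server is added; the list is regenerated from what the infrastructure already knows.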

The other big time saver I've implemented in a variety of environments is automated builds, like kickstart (Red Hat/CentOS) and preseed (Debian/Ubuntu). This should be obvious to most people, but it can be surprising how many sites still build systems by hand off a CD. It's even better if the automated build gets the system ready to run Chef or Puppet to set up all the other goods.
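On the Debian/Ubuntu side, a preseed file is just prepared debconf answers fed to the installer. A minimal sketch (exact keys vary between releases, and the values below are examples only):

```
# Minimal preseed sketch; keys vary by Debian/Ubuntu release,
# values are illustrative only.
d-i debian-installer/locale string en_US
d-i partman-auto/method string regular
d-i partman/confirm boolean true
d-i pkgsel/include string openssh-server puppet
d-i finish-install/reboot_in_progress note
```

The `pkgsel/include` line is where the build can pull in the configuration management agent, so the box hands itself over to Chef or Puppet on first boot.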

Configuration management (I used Puppet) plus a PXE server (Cobbler) were great time savers for me. But the biggest time savings have come from time management: I found Tom Limoncelli's book 'Time Management for System Administrators' invaluable. Now that my day is more structured and planned, I spend less time 'planning' and procrastinating and more time just doing what's relevant.
– aussielunix Jul 6 '09 at 5:19

Nice automation tools, but do you know of any tools similar to Chef or Puppet that are not based on Ruby?
– Andrioid Jul 6 '09 at 7:11

@Andrioid - CFEngine, but Chef and Puppet are nicer to work with, and so is Ruby :D.
– jtimberman Jul 6 '09 at 14:24

Check out Bcfg2. It's similar in capability to Puppet but written in Python.
– Kamil Kisiel Jul 8 '09 at 22:15

Monitoring is great, of course, but I'm not sure it's a time saver. For my money it was centralized logging, with a viewing system that filtered out the mundane, highlighted the dangerous (disk failures, virus scanner finds), and displayed everything else for categorization.

syslog (and perl) for the win.

It basically allowed me to read the event logs of all the computers on the network while eating my bagel; at least a cursory check to look for anything scary. Huge time savings.
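A sketch of that kind of filter in plain shell. The "scary" patterns, the noise filter, and the sample log lines are all made up; a real pattern list grows over time as you classify what your systems emit:

```shell
#!/bin/sh
# Hypothetical log triage: keep lines matching danger words,
# then drop known-harmless noise.
scan_log() {
  grep -Ei 'error|fail|critical|smart|virus' "$1" | grep -Ev 'print job'
}

# Fake syslog-format sample for the example.
cat > /tmp/sample.log <<'EOF'
Jan 10 09:01:02 ws01 spooler: Fred's print job completed
Jan 10 09:02:10 dns01 named: zone transfer of example.com successful
Jan 10 09:03:33 fs01 smartd: Device /dev/sda reports SMART failure
EOF

scan_log /tmp/sample.log   # only the smartd line survives
```

Run it from cron against the aggregated syslog and mail yourself the output; an empty email is the goal.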

This has had the effect that I no longer have to deal with any of the "how do I get X application to install" (because you are not allowed to now), "my computer has a virus/spyware", or "my computer is running slow" complaints, or pretty much anything related to that.

I never realized how stable this made the workstations until I came across one workstation that had been completely missed in the audits, Windows updates, etc. It had been running for about 4 years without a single update. I think it was on Windows SP1. That site never once complained about any issues with it, and when I did discover it, I found it running great.

This is really good advice for a small company making the awkward transition into a medium-sized company.
– staticsan Jul 8 '09 at 23:25


Let me just add that it doesn't work to bundle all users together under a policy like this. If there are legitimate power users (e.g. developers), they need to be treated differently. If not, a) they can't do their jobs, and b) they'll subvert it anyway, causing other problems. Thus, the policy needs to be aware of many types of users.
– jplindstrom Sep 9 '09 at 14:33

Don't assume developers need admin access. If they don't have it, maybe they will actually develop programs that work properly as limited users (for a change). Users who genuinely need it can be given a second account for installing software, etc., so their daily account is still a limited user, much like how the Linux world works. If they subvert it, that's easy: fire the first person, and the rest will get in line real quick. If they need something installed to do their job, they should request what they need, not admin access.
– SpaceManSpiff Sep 11 '09 at 14:58

Between deploying servers from templates, managing servers from a single interface, and the detailed hardware monitoring built into the infrastructure client, it has really changed how we administer our infrastructure.

And the impact it has had on how we think of our "hardware" has really made it a game changer. Clusters are no longer "too expensive" because we can deploy them virtually. Need more Citrix servers? Clone one. As long as our physical hardware farm keeps providing adequate resources (and those servers are truly commodities now), everything is peachy.

Puppet. The idea of changing one place and having all the affected systems updated is fantastic.

Couple that with standard installs and it's very fast to bring a new system up: you netboot, run a stock install, and then Puppet takes over and everything is configured.

Finally, standardize. No, you really don't want 35 different Linux distros and 4 different Solaris versions. Work toward one standard install. Each unique system that you turn off saves you loads of time.
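"Changing one place" looks roughly like this in Puppet; the module name, file source, and service name below are assumptions for illustration, not taken from the answer:

```
# Minimal Puppet sketch: one class, applied to every node, keeps
# the package, config file, and service in sync everywhere.
class ssh {
  package { 'openssh-server':
    ensure => installed,
  }
  file { '/etc/ssh/sshd_config':
    ensure => file,
    source => 'puppet:///modules/ssh/sshd_config',
    notify => Service['sshd'],
  }
  service { 'sshd':
    ensure => running,
    enable => true,
  }
}
```

Edit the one `sshd_config` in the module, and every node that includes the class picks it up and restarts the daemon on its next run.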

Add another vote for monitoring. The principle is quite simple: I want to know what's happening before the users are affected. System Administration should ideally be a transparent role. Users should neither know nor care about what you're doing. From their perspective it should just simply work. Happy and satisfied users should equal happy and satisfied admins.

One thing that is often overlooked in IT is that the computers are there to work for us, not the other way around. Nevertheless, I know admins who spend a significant part of their day manually checking their servers and the logs. Why? Computers can monitor each other, and with a little scripting you can have just the interesting parts of the logs delivered to you. Really, you don't need to wade through a few million informational entries saying that Fred's print job or the DNS transfer was successful. Just tell me when they're not.

The biggest time saver I have implemented was Disk Imaging of our production workstations. They are all the same and no one stores anything locally, so if there is a problem I just re-image the machine and it's all set to go, good as new.

I documented all support contract info in standardized text files in a standardized directory structure. I kept one central copy, plus more than one backup.

Each bit of information (web portal, phone number, point of contact, expiry date, contract number, phone menu shortcuts, etc.) was preceded by a standard tag in old .ini format (tag: data).

Finding a phone number was as simple as going to the top level directory and running:

grep Phone */*support.txt | more

Where the first wildcard expanded to the vendor and/or product name.

I did not use Excel, Word, OpenOffice, a database, etc., simply because when something is down, that something might be the very thing holding your support information. Also, those formats are not easily viewable from a text-mode console screen.
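Put together, the layout looks like this; the vendor names, numbers, and the /tmp paths are invented for the example:

```shell
#!/bin/sh
# Recreate the described layout: one tagged support.txt per vendor,
# then grep across all of them from the top level.
mkdir -p /tmp/support/acme /tmp/support/widgetco

cat > /tmp/support/acme/acme-support.txt <<'EOF'
Phone: 1-800-555-0100
Contact: Jane Doe
Contract: ACME-12345
EOF

cat > /tmp/support/widgetco/widgetco-support.txt <<'EOF'
Phone: 1-800-555-0199
Contract: WC-98765
EOF

# One command answers "what's the support number?" for every vendor.
(cd /tmp/support && grep Phone */*support.txt)
```

Because grep prefixes each match with the file path, the vendor name comes along for free.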

I implemented an IT department wiki (using MediaWiki, for those interested) several years ago. Once we started getting comfortable using it, the reply to many questions asked around the office became "Did you check the wiki?" It took a little time for us to get used to checking the wiki for specific information, but once we did, we realized its great potential. We have all the information we need right at our fingertips, and if something isn't in there, we can add or change a page quickly.

The last job I had was at a custom vehicle manufacturer. The assemblers were minimally proficient computer users and managed to crash the program they had to use to enter which job they were on. Every day, several times a day, I had to go around to 15+ workstations in 3 different buildings, kill the crashed program, relaunch it, and get it back to the data entry screen. I eventually installed VNC so I could do it remotely, which cut down on travel time but still meant remembering to go in and reset the machines every so often. When I found AutoIt, I realized I could set it to watch the computer and, if there hadn't been any input for 5 minutes, reset the program and type and click everything needed to get it back to the input screen. Doing this saved me at least an hour a day and made finance very happy, since fewer people complained about the computers being down and more people were entering their job data.

++ for central auth and account management, including account creation AND termination handling. We have AD (two forests) and LDAP (and, until recently, NDS) with various groups accessing resources in either directory. The time we put into getting the directories in sync and getting all resources managed in one directory or the other has paid off in spades.

The next-biggest win has been any amount of automation whether it's account cleanups, config centralization, or what have you.

I'm not sure how much time monitoring actually saves, but it's essential. It doesn't take much for an environment to get big or complex enough for "manual checking" to be impossible and ineffective. Plus, it's nice to sleep sometimes. ;)

This may be a bit out of the main vein of thinking here, but we also had a huge win when we standardized our hardware platform. We picked a server platform that works for every OS we run in-house and have stuck with it for several years. We learned the hardware, and we learned its remote management, and it saves time and energy in various ways:

No more supporting half a dozen or more wonky types of servers, each with its own quirks.

Cross-team support: when it comes to hardware, the Unix folks know it, the Windows folks know it, and even the network folks are familiar enough with it to pitch in as needed, since several of their appliances run on the same hardware.

Spare parts!

The same goes for standard, documented, and reviewed OS builds. It may seem basic, but I frequently bump into shops where the builds aren't standard, and there's no end of messing around to see whether this tool or that is present, or whether a particular server has its settings right. That kind of chaos can turn even the most basic tasks and problems into fire drills.

Learning to delegate and trust my colleagues - once you know that you can hand off bits of work to other people life becomes much more relaxed. And not because I'm lazy and have everybody else do my work for me; it's the peace of mind that comes with knowing you have good backup. And, of course, well-monitored, standardized OS configurations on standard hardware. Goes without saying.

My goal for automation has always been that every now and again I get an email saying "Such-and-such broke on server foo. It's been fixed.", and then, once I've sent the bug to the developers, I can go back to reading the paper and drinking coffee. We're not there yet, but we've come a long way from the reactive chaos we used to fight through every day.
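That "broke, and it's been fixed" email implies checks that repair first and report afterwards. A minimal sketch of the pattern, with a marker file standing in for a real service health check and made-up paths throughout:

```shell
#!/bin/sh
# Hypothetical self-healing check: if the health marker is gone,
# "repair" the service and record what happened for the report email.
heal() {
  marker="$1"; log="$2"
  if [ ! -e "$marker" ]; then
    touch "$marker"    # stand-in for restarting the real service
    echo "$(date '+%F %T') service broke on $(uname -n); fixed" >> "$log"
  fi
}

# Simulate a broken service, then run the check from "cron".
rm -f /tmp/app.marker /tmp/heal.log
heal /tmp/app.marker /tmp/heal.log
```

A real version would replace the marker test with a service probe (port check, PID check) and mail the log instead of appending to it; the shape of the loop is the same.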

I recently implemented AntHill Pro at work and now have all our builds and deployments for a number of projects completely automated and tracked. This included creating a shared Tomcat deployment Ant library that all the projects use, simplifying the maintenance of those projects in AHP. Next up is creating a similar library for site CDA deployments.

While it doesn't save me time personally, it saves the time of our developers and our operations staff. I enjoy being the oil can for other people's wheels. :-)

I'm hoping to look at Chef and Puppet next to help out on the infrastructure side of automating things.

Oh, and documentation is a HUGE help. It saves a lot of time to just point people to a well-written document rather than answer the same question 20 times.

My biggest time saver was preseed scripts for installing our Linux workstations. We have contractors coming and going all the time, so we have a pool of workstations that get reused regularly. When one comes back to IT, we shove the install CD in, add the preseed file to the install command, and within 20 minutes (with no further keystrokes from any of us) the box is back to a fresh, working base install, with all the tools, pre-configured to run on our network. Plug'n'play.