I've done an rm -rf / while going for a full path but hitting enter by mistake. /bin was blown clean away. rm didn't stop running despite being deleted; I suspect this is because the process maintains a handle on the file. Deleting /dev, however, took enough time for me to hit ctrl+c. YMMV.

That site might be a better place to post these kinds of stories though. I think this post would get pretty cluttered with everybody's wtf and especially the ensuing arguments over whether it truly is a wtf or not.

Breaking the build on one's very first submit when working as an outside contributor on a sub-team at work when they have a big deadline looming and then breaking it again with the fix! I got a variety of "EXTERNAL BUILD BREAKER!!!" emails :).

The same thing happened to me with some standalone program designed to add email-sending ability to PHP on Windows. Luckily there was also a DNS problem that meant all DNS queries took 30 seconds, so that slowed down the amount of mail sent.

AspEmail, let me guess? Guy at my work did that. Easily done. Also a guy at that same work had a mail sending app that had a bug which went into an infinite loop sending the same email. He was using our office Exchange server as the relay, which got hammered, eventually ran out of disk space, pretty totally f***ed out and went into the foetal position, so no email for a day. People were not best pleased...

I did the same thing about 10 years ago with a large database of email addresses. It slowed the network to a crawl for hours and the "to" field got so large the emails were huge. Many recipients called to complain that it crashed their mail clients.

Errant PHP script that generated 5,000 emails before Apache took a poop, and until our ISP blacklisted us. It's been about nine years, and it taught me to enable sendmail on my local box to test mail generating scripts from then on. Or else, you know. Fail.

When I first started programming in VB, I didn't know how to use the DAO object correctly and ended up writing a routine that would query each field one by one from a given recordset. Yeah, that app was real slow.

What command did you run? If you "rm" (with no options), you get back "rm: cannot remove `foo': Is a directory". If you "rmdir", you get back "rmdir: failed to remove `foo': Directory not empty". But "rm -rf" is entirely unnecessary ...

Connected through remote desktop to a production Windows box in another city to make a change. When I was done making my change I did a 'shutdown' instead of 'logoff'!!! Since it was at a remote location and not a local box I had to own up to my stupid mistake and call to get the machine turned back on by someone at that site. I now pay extra attention to 'logoff' and 'shutdown'.

2008-09-15 15:47:18

I had a situation like this before - that's why I never use the start bar, I always go Start -> Run -> "logoff"

We had some servers that sometimes wouldn't restart when you told them to. It just logged you off the RDP session, then stopped doing very much. So I used shutdown -i to tell it to restart. Sadly, I forgot to change it to restart instead of shutdown. Sadly, as a dev I don't have physical access to the equipment room and had to ask one of our server team if they could please go and press the power button...

During testing of my prototype, I needed more data than what I had available. I connected the prototype to the production data feed, watched it for a bit to make sure nothing was going wrong, then let it accumulate data for a while.

I checked on it two hours or so later and found that my program had quit. Not a big deal, but I couldn't connect to the production feed again. Oops, it's down! During my program's long processing, the data backed up on the production server side and eventually crashed the feed. They had to manually reload the data from hard-to-access tapes.

My main saving grace was that I was the one who reported it first. The person who usually monitored the feed was out, and his replacement hadn't noticed it was down for over an hour. I got a little talking to, but nothing too serious.

We did however add a buffer process so that if my system bogged down again, it wouldn't take everyone else down with us.

2008-09-15 15:47:55

+17 A:

I accidentally deleted a database of over 100,000 customers.... I learned at that exact moment, one and only one lesson. BACKUP BACKUP BACKUP.

While removing old entries from our development server crontab, I accidentally copied the pared-down dev crontab onto our production server. Our flagship app uses about a dozen cron jobs to poll a database every 5 minutes for job processing. Within an hour the cron error emails going out to our dev team shut down the corporate email server.

Oops!

2008-09-15 15:55:49

+76 A:

I rewrote a whole module that was working perfectly but that looked "messy" to me. I had managed to convince my boss that it was the Right Thing To Do, and the rewrite took me 3 weeks.

I still remember the pearls of sweat running from my armpits as my boss, looking over my shoulder, was commenting on bug after bug in my new shiny super-clean module...

I can totally see that happening; I find it very probable that it will happen to me. Some say: "Just write a new one. It'll be clean and familiar". Others say otherwise: http://www.joelonsoftware.com/articles/fog0000000069.html

Most of the time, rewriting something is a very bad idea, especially when it's because it's "messy". Here in our company there is a reason why it's messy: dozens of exceptions to the rules. But they are there for a reason, as I found out... No rewriting since then.

Moved instead of copied a svn repository for use in a new project. The old project was in maintenance mode and so I didn't realize my error until several weeks later (and after we'd long since overwritten the backups). Luckily, I was able to export the first revision of the new project, but we'd lost all the revision history from the old project. :(

When I was just starting out I didn't understand how to get an identity back off a seeded primary key in the database. Instead I did a full select on the table and looped through until I found an unused key.
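For anyone who hasn't met it yet: most database drivers hand the generated key straight back after an insert, so there is no need to scan the table. A minimal sketch using Python's standard-library sqlite3 module; the `customers` table and the `insert_customer` helper are made up for illustration, and other databases expose the same idea differently (SQL Server has SCOPE_IDENTITY(), for example).

```python
import sqlite3

def insert_customer(conn, name):
    """Insert a row and return the key the database generated for it."""
    cur = conn.execute("INSERT INTO customers (name) VALUES (?)", (name,))
    conn.commit()
    # lastrowid is the row id assigned to the row this cursor just inserted.
    return cur.lastrowid

# Hypothetical schema, in-memory so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT)"
)
first_id = insert_customer(conn, "Alice")
second_id = insert_customer(conn, "Bob")
```

The point is only that the key comes back from the insert itself rather than from a second query.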

lol... when it happened to me, I truncated the production table instead of the test table I was working on... my face got all red, but, lucky me... The query ran at 10:04 am and the backup system had fired at 10:00 am

This is always one of my questions when I interview somebody... it can be very informative about a candidate's honesty, problem management, and ability to learn from mistakes.

So some beauties:

Implementing password aging as a risk "0" (i.e. did not show on change reports) on the financial systems of a Fortune 100 company. Who would have guessed that the many, many Oracle and application accounts wouldn't be able to specify a new password when their old ones immediately expired. 6 hours downtime.

Never ever use the console of one server to connect through to another. A newbie, first day out of college and on the job, was asked to reboot a test server. No biggie. Connects to console, and reboots. Little did he know someone else had used the console to connect to the production server. Brought down a factory, but at least this company wasn't in the top 100 of the Fortune list :-)

So your manufacturing lines are run by some old HPUX machines. You have some vague knowledge of Unix, and you are annoyed that people keep changing settings on the server. I know, make the files read only, from root, with no backups:

When running an UPDATE/FROM statement on a large million+ record table, don't forget to comment out the SELECT line you just used to test the FROM clause (i.e. I updated the entire table rather than the 10 records I was looking for...)
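One cheap guard against this kind of slip is to run the statement inside a transaction and only commit when the number of affected rows matches what you expected. A rough sketch with Python's sqlite3 module; `guarded_update` and the `orders` table are hypothetical names, and this assumes the driver reports an affected-row count (sqlite3 does via `rowcount`).

```python
import sqlite3

def guarded_update(conn, sql, params, expected_rows):
    """Run an UPDATE and commit only if it touched the expected number of rows."""
    cur = conn.execute(sql, params)
    if cur.rowcount != expected_rows:
        conn.rollback()
        raise RuntimeError(
            f"expected {expected_rows} rows, statement touched {cur.rowcount}"
        )
    conn.commit()
    return cur.rowcount

# Hypothetical table with 100 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (id, status) VALUES (?, 'open')",
    [(i,) for i in range(1, 101)],
)
conn.commit()

# The filter went missing, so all 100 rows match instead of the intended 10.
outcome = ""
try:
    guarded_update(conn, "UPDATE orders SET status = 'void'", (), expected_rows=10)
except RuntimeError as exc:
    outcome = str(exc)

voided = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE status = 'void'"
).fetchone()[0]
```

Here the runaway update is rolled back, so no rows end up voided.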

I almost hit a zero bug count for a complete upgrade for a particular client. One of the last things I did was ensure that all exceptions were caught and recorded. Only issue was that I added exception handling into the exception logging code. This had a bug in it and that was that if there was no existing log file it would throw an exception which it would then try to log. Stackoverflow!

I blue-screened the client's laptop. :(

Still after that was fixed it was zero-bug (or rather 1-bug) so my blushes were somewhat spared/limited.
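The trap above is a logger that reports its own failures by calling itself. A small sketch of the usual way out, in Python rather than whatever the original app used: the logger catches its own I/O errors and falls back to stderr instead of re-entering. The `log_exception` name and the file paths are invented for the example.

```python
import os
import sys
import tempfile

def log_exception(exc, path):
    """Append an exception to a log file; never raise from the logger itself."""
    try:
        # Mode "a" creates the file if it does not exist yet.
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(f"{type(exc).__name__}: {exc}\n")
        return True
    except OSError as log_error:
        # Last resort: report on stderr instead of calling log_exception again.
        print(f"logging failed: {log_error}", file=sys.stderr)
        return False

with tempfile.TemporaryDirectory() as tmp:
    # No log file exists yet: it is created on first write.
    ok = log_exception(ValueError("bad input"), os.path.join(tmp, "app.log"))
    # The directory is missing: the logger gives up quietly rather than recursing.
    failed = log_exception(
        ValueError("bad input"), os.path.join(tmp, "missing", "app.log")
    )
```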

This isn't my bug but it still made me lol. The Dev manager was responsible for developing one component for a really important release. The week before the release the Dev Manager went on holiday and people tried to use his component. It worked for about 10 minutes and then fell over, the office was in panic. The very best devs on the team were assigned to find out the problem.

Eventually one of my colleagues burst into laughter and I swiveled my chair to see the following C# code (or thereabouts) on his screen.

The problem (aside from the REAL WTF) was due to the fact that he didn't dispose the DataReader. After about 10 minutes of the app executing a ridiculous number of round trips to the database the database refused to give out any new readers (or the app ran out of memory, I forget) and the whole thing fell over.

@Andrei Rinea ...because the public function GetNewGuid() is in all likelihood being called from outside sources... which may or may not have access to Guid.NewGuid().ToString(). GetNewGuid() is an abstracted interface for the implementation.

@Mohit Ranka: System.Guid is in mscorlib; it's highly unlikely that something won't have access to mscorlib. GetNewGuid() returning string implies the implementation. An abstracted interface would be GetNewPatientId() where it chooses a different type of Id based on how the application is configured/deployed (not that is necessarily a good idea)

No, really... I changed some value in sql 6.5 that told it how much memory to use. But the box was for physical memory, and I put in an amount more than the machine actually had installed. "You must restart the service for this change to take effect" which I did. At 1am, on a production server, and I didn't check the backups, which had been failing btw... anyway, the service wouldn't restart because it couldn't allocate the memory, and I couldn't change the setting because I couldn't connect to the server object because it wasn't running because it couldn't start because the value was wrong and I couldn't change it because, well, you can see the loop here...

$249 to mss and an hour on the phone with a Guru and we found the registry setting. I finally got to bed at about 4am...

I did almost the exact same thing - told SQL to use more RAM than was in the machine. Rebooted Windows NT and it blue-screened. Only solution: Add more RAM. The server was in a customer's data centre 300 miles away.

Programming a data synchronization system through FTP, I didn't think about what would happen if the FTP client wasn't able to CD into a directory. Well, it wasn't able to, so it stayed in the root... and after finding a lot of files that didn't match the synchronized system, the script started to delete everything on the server..... everything.

I realized half an hour later when the script was around the WINDOWS\System32 folder....
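The missing step in that story is checking that the directory change actually worked before deleting anything. A hedged sketch against Python's ftplib, where `cwd` raises `ftplib.error_perm` on a permanent failure; `purge_directory` and the `FakeFTP` stand-in are made up so the example runs without a server, and it assumes absolute paths so that `pwd()` can be compared directly.

```python
import ftplib

def purge_directory(ftp, directory, stale_names):
    """Delete stale files only after confirming we are in the intended directory."""
    try:
        ftp.cwd(directory)
    except ftplib.error_perm:
        return 0  # could not enter the directory, so delete nothing
    if ftp.pwd() != directory:
        return 0  # ended up somewhere unexpected, so delete nothing
    for name in stale_names:
        ftp.delete(name)
    return len(stale_names)

class FakeFTP:
    """Stand-in for ftplib.FTP so the sketch runs without a server."""

    def __init__(self, existing_dirs):
        self.existing_dirs = existing_dirs
        self.current = "/"
        self.deleted = []

    def cwd(self, directory):
        if directory not in self.existing_dirs:
            raise ftplib.error_perm("550 No such directory")
        self.current = directory

    def pwd(self):
        return self.current

    def delete(self, name):
        self.deleted.append(self.current + "/" + name)

fake = FakeFTP(existing_dirs={"/sync/data"})
skipped = purge_directory(fake, "/sync/missing", ["a.txt"])
removed = purge_directory(fake, "/sync/data", ["old.txt"])
```

With the check in place, the failed directory change deletes nothing instead of working its way through the root.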

You weren't running on NTFS? Or were your permissions too permissive? That sounds like something I've had to happen on my laptop with ViceVersa; user was owner of everything, added C:\ by mistake. You can guess what happened next.

I set up a new VMware instance in the evening and went home, leaving it running. What I didn't notice was that I had used the IP of the nameserver as the IP of the VMware instance. Suddenly all hosts in our building tried to connect to the VMware instance for DNS lookups.

Our whole network was practically down.

Since this was a VMware instance, our admins were not able to track down the MAC address. So they had to unplug every single computer in our office (~500) until the problem was gone.

The next day I found a letter on my desk: "Who dares to switch this computer on will die".

I once forgot that the id column in a table was not auto_increment, and was in fact a foreign key to a single row in another table. I made the mistake of saying, "Oh, I'll just manually update this row in the live database" and ended up 'fixing' about 250 rows. Luckily it only caused about 10 minutes of downtime for the users and no one flipped, but I think I'll just ask the DBAs to do it next time.

I was working on an iPhone version of a site. I had just woken up and didn't check what directory I ftp'ed to. Before I knew it, everyone visiting the site was seeing the iPhone interface (which had administration links and the works, also not pretty). I didn't have a backup of the index.php file and panicked. Luckily, the site uses WordPress as the frontend and was easy to fix. Now I double check my paths and never use index.php when testing things.

I started a project without a specification. When I asked if I could work out what was wanted I was told not to speak to the client.

It's a meal ticket for life.

2008-09-15 16:48:54

Then there was the time I was forbidden to talk to the project engineer (long story, and not my fault), and got a bug report: "It runs for half an hour, and then crashes." I informed the person I was supposed to talk to that I needed more details.

I had multiple windows open... what's this stuff doing in D:\winnt? Ah, just old crap... I'll delete it... except it wasn't the local machine... it was our main server at our Internet Service, 25 miles away.

I once wrote a symbolic assembler in itself. I actually punched the source card deck and brought it into the machine room before realizing that I had no way to translate it the first time. Not a great public embarrassment but I did feel awfully stupid standing there trying to figure out which binary executable to load.

My new language WTF can create any compiler with the help of the sole statement BuildTheDarnThing <language>, so it should create itself by just BuildTheDarnThing WTF. Now I only need to find out which executable to load ...

A while ago I had the first live purchase on a software shopping cart module that I coded in ASP.NET/C# for a popular CMS. We had tested and retested all aspects of the cart (except the live purchase, which we avoided because of the credit card bills).

So the purchase goes through for $32.95 in all database records. For all intents and purposes, everything looks correct. The problem being that the Authorize.Net receipt shows another story. In reality, the customer paid $2.00.

OMG Hacks! was my first thought. I retrace all the code to ensure nothing can influence the price besides the items in the cart. The item price is correct. The quantities are correct. The totals and subtotals are correct.

Finally I trace the total to the last stage of the purchase process, wherein I format the System.Decimal type to a string for insertion into the authorization transaction via HTTPS.

I see:

x_amount.ToString("D2")

And now I see the source of the "$2.00". I rack my brain, trying to remember why on earth I would think this would work. I run a test and the string returned is "D2". I cry and wail. Evidently Authorize.Net thinks "D2" is close enough to the requisite "2.00" to charge $2.00. Finally I remember that I saw this as a forum post suggestion. (Where was Stackoverflow.com then?)

When originally coding, I had planned to use the "C" (currency) formatting. This does everything correctly except for pre-pending a "$" to the string. The Authorize.NET API docs say they want it in decimal format without "$" or any other monetary symbol. So I went to the collective wisdom of other .NET developers for a quick workaround. I didn't like the idea of formatting as currency and then stripping the first character, so I saw an off-hand post about "D2" formatting causing essentially the same format but without the monetary symbol. I believed it and did not verify its output. Gah!

Not to mention that this had been extensively tested in test mode. But for some reason we thought nothing of having Authorize.Net return transactions in test mode that had a purchase price of $2.00. I (and unnamed others) thought it was just a quirk of the test mode...

The morals of the story are:

It's not a valid approach until it works for the end user.

Make sure a "found solution" does what it says on the tin.

$30.95 is a bit spendy for a bug report, but it got the job done.

If you are going to "mess up the mundane details", make sure it isn't going to cost your customers all but $2.00 of each sale.
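To make the second moral concrete, here is a sketch (in Python, not the .NET code from the story) of formatting an amount as plain digits with two decimals and then checking the result before it goes anywhere near a payment gateway. `format_amount` and the pattern are illustrative only. As far as I know, the .NET specifier that does this is "F2", while "D" is documented for integer types only.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# Plain digits, a dot, exactly two decimals; no currency symbol.
AMOUNT_PATTERN = re.compile(r"\d+\.\d{2}")

def format_amount(amount):
    """Render a Decimal as plain digits with exactly two decimal places."""
    text = str(amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))
    # Verify the result instead of trusting the format string.
    if not AMOUNT_PATTERN.fullmatch(text):
        raise ValueError(f"unexpected amount format: {text!r}")
    return text

cart_total = format_amount(Decimal("32.95"))
round_total = format_amount(Decimal("5"))
```

A check like this would have rejected a string such as "D2" long before a customer was charged.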

Happened at my place, too, when someone was looking for the "oddly shaped light switch" in the dark. The light switches were little cubes sticking out of the wall and they were so hard to press that some people believed they wouldn't move at all.

At my previous work, a guy had the computer, the tower, on the floor, and the button for resetting it was prominently out and at the bottom of the machine. The result was that half the time someone sat with him to talk or pair, they would reset his machine. The problem was solved with a coke top and scotch tape.

I worked at a place with a computer in a training room. Once, while giving a training session, the instructor, a tall woman, hit the power switch with her knee. The next day, there was a piece of cardboard box covering the power switch with "KNEE GUARD" written on it.

I've worked in my company's computer room for over two years and I've noticed that some manufacturers appear to design the emergency power-off button in a way that *encourages* accidents. A lot of the ones on our arrays and tape libraries stick out several inches from the unit. Only one or two of them have plastic covers. They're almost as attractive as shiny objects to some people.

My girlfriend (at the time) just finished writing up a lengthy essay for an assignment she was doing and asked me if I'd read through it for mistakes. I wheeled over on my office chair and whacked my knee into the power button and she lost the lot. Eee, things were a bit tense for a couple of days.

Heh, I did something similar meaning to remove some .tmp files and accidentally typed "rm -f *" instead. Unfortunate side effect of late night hacking is you get tired ;) Thank god, the main source code file was still open in vim or I'd have lost everything I was working on!

A GUI might be safer, but it doesn't offer the same speed or options. Once you get used to the command line again there normally isn't a problem, and you can work much faster without the need for GUI programs.

This wasn't my biggest WTF moment but it was the biggest I have heard of to date.

A decent sized grocery store chain that is based here in Grand Rapids had a developer that wrote a system for promotions. The system would take in SKUs and you could set rules like buy SKU "XYZ" and get SKU "ASDF" half off. This system was in production for a while and one of the "features" was that you could leave the SKU blank and it would apply to all SKUs.

Someone on the business end didn't realize this and they accidentally set up buy one get one half off rule but without any SKU's.

So basically anyone who checked out at any of the stores could buy a pack of gum and get the rest of their cart half off!!!!

It made the news but they never really mention what really happened behind the scenes....

I added a new feature to update status on table with over a million records. The SQL update query was "UPDATE table ... WHERE id = id" instead of "WHERE id = :id". Thankfully this was infrequently used and the database server only crashed a half-dozen times before the problem was found and fixed.
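The difference between the two WHERE clauses is easy to see on a toy table. A minimal sketch with Python's sqlite3 module, which happens to accept the same `:id` named-placeholder style; the `jobs` table is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO jobs (id, status) VALUES (?, 'new')",
    [(i,) for i in range(1, 6)],
)
conn.commit()

# "WHERE id = id" compares the column with itself, so every row matches.
all_rows = conn.execute(
    "UPDATE jobs SET status = 'done' WHERE id = id"
).rowcount
conn.rollback()  # undo the accidental full-table update

# "WHERE id = :id" binds the placeholder to a value supplied by the caller.
one_row = conn.execute(
    "UPDATE jobs SET status = 'done' WHERE id = :id", {"id": 3}
).rowcount
conn.commit()
```

On five rows the first statement touches all five and the second touches one; on a million-row table the first is the server-crasher.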

This one is more of a dumbass IT moment that was catalyzed by development, but here it is anyway.

I was putting together a simple console app to interact with a SAM filesystem on one of our Solaris servers and help with file management/restoration. I needed to update some of the library files in /lib, so my first instinct was to back up the files in another directory in case I needed to go back after overwriting them. Made a copy of the old files, put in the new ones, didn't fix the problem I was looking to fix. So I go to restore from my copy, start by deleting the current installed libraries without thinking, then tried to 'cp' the backed up files...problem was that 'cp' is DYNAMICALLY linked against those same files...so it threw up.

I had deleted the dependencies for pretty much every fundamental utility on the Unix box...on a production server with a few hundred connected users, all hourly 'data-manufacturing' employees who need the server to do their jobs...

Panic.

Luckily we had another server running the same version of Solaris, so I hopped over to that one, wrote a quick and dirty 'cp_oops' C app in about 1.5 minutes, compiled it to STATICALLY link, pushed it onto an existing share to the broken server from the non-broken one, ran back and copied the libraries back before anything threw up in a noticeable way (to the production staff at least :p)

2008-09-15 18:21:05

+4 A:

Some badly coded linux applications require insane permissions to be set on every directory they use. One in particular, a perl-based music streaming server that shall remain unnamed, required execute permissions on all files in my shared music directory.

As I frequently add new music, typing the command 'chmod -R 777 ./' in the data directory became routine. Worse yet, as several user accounts used this directory, the command was executed as root.

Being a very l33t fast-typing keyboardist, and as most of my peers have probably experienced, there is a quantum effect where keys change places for a split second, which causes the characters to be input out of order. So, one night fate would have it that the order of characters be 'chmod -R 777 /.'. For the uninitiated, this means that everyone will be granted full access to every single file in the entire filesystem!

Fortunately, I quickly discovered my error and managed to abort the operation after a few seconds. It still took me a couple of days to clean up the mess though.

I don't do it this way anymore. And I'm glad my job title does not include "UNIX admin".

2008-09-15 18:21:57

For extra fun, that command removes the SUID bit from certain executables, like /usr/bin/sudo or /usr/bin/X.

Wrote a plug-in for MS's IIS web server that returned an XML-formatted dump of one of our application's databases, without taking into account the amount of data involved. Turns out IIS wasn't happy trying to return 10-15 MB in a single request, and would routinely drop the connection. Worked much better when we fixed the plug-in to send data in 64K chunks; even better when we came up with querying semantics more sophisticated than 'gimme all yer data'. :-)

Not making backups, and not using a source control system. Fortunately I learned that lesson about 15 years ago.

2008-09-15 18:48:18

A:

My worst was made years ago when importing a bunch of stored procedures to a live server.

I didn't notice I had said to drop the existing data on import. (Why is that a default setting?)

The backups failed because they were spread over two rather large files we couldn't get to download from the backup location.

The site, a statewide system, was down for 3 days. And I made the mistake on Easter Sunday.

2008-09-15 19:12:09

A:

rm -rf is not something you want to get in the habit of using often. Leave off the -f unless you have a good reason to include it. Especially after you start a new job, it stinks to have to inquire--during your first week--about what automatic backups are in place.

I used to work on laptops where that wouldn't work. In fact, the users kept spilling coffee on the keyboards, then being shocked when they found us in the bathroom running water over the laptop to clean it off. Panasonic CF-10s were great :D

Our product was used by police forces to input data about people that are arrested and what they are charged with. It would also store digital mugshots and fingerprints, and electronically submit the fingerprints to the FBI. While testing, we would routinely use our own fingerprints for fake bookings that got inserted into the test database. Except for the time that I "temporarily" switched the test machines over to the production database and forgot to switch them back...

Cleaning up our production database was easy, but it took a court order signed by the superintendent of the Boston Police Department to remove my colleague's fingerprints from the FBI database -- she had booked herself under the name "Elroy Jetson".

This story is 100% true. At the time, the security was terrible. We could walk into the police HQ, by the front door guard who didn't know us, and get into the server room with just a 3-digit door lock. They have since improved it, and I doubt we'd have direct access to the DB now.

When saving reports to disk, I took the date and time and computed the MD5 hash and used this as the filename. I think I thought the MD5 would make the name more unique! They were displayed in alphabetical order, which meant every time a new report was created it popped into a seemingly random place in the list.

I have absolutely no idea what was going through my mind that day. Once my colleagues spotted it, it was swiftly removed and the standard response to any question became "MD5 it". Doh!
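For contrast, a timestamp written most-significant-field-first already gives unique-enough names that also sort in creation order. A tiny sketch; `report_filename` and the "report-" prefix are made up, and it assumes at most one report per second.

```python
from datetime import datetime

def report_filename(created):
    """Build a name that sorts alphabetically in creation order."""
    # Year, month, day, then time: lexical order matches chronological order.
    return created.strftime("report-%Y%m%d-%H%M%S.txt")

older = report_filename(datetime(2008, 9, 15, 18, 48, 18))
newer = report_filename(datetime(2008, 9, 16, 7, 5, 0))
listing = sorted([newer, older])
```

In an alphabetical listing the newest report always lands at the end instead of at a seemingly random position.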

Removing safety checks in a sales application for a tradeshow and deploying it immediately is a bad idea :)

A few years ago, I was responsible for slapping together an ad-hoc sales program for a company that was selling its products at NAB. The system was to be run on laptops connected to USB card readers out front, all connected to a "server" in the back part of the display. We ran a bunch of tests to make sure the card readers worked and that we could properly charge credit cards, and everything looked great.

The first morning we started off with pretty brisk sales, and it looked like the system was performing as expected. Then at about 11am, a guy gets shown into the back room and says that he went to use his credit card at another booth and it was declined; after calling his company they said he had reached his limit and the only other purchases today were listed as being from us.

This is what had happened: the salespeople were complaining that they had to press {ENTER} every time after they swiped the card to verify the amount and send the credit charge through. Figuring that everything was ok, I circumvented the dialog and had the app just send the charge directly. What I didn't realise was that the USB card readers could actually send the "swipe" message several times in a row, and now the program was merrily charging people far more than they expected.

I spent the remainder of that morning crawling through the hundreds of credit card transactions voiding all the duplicate/triplicate/.... charges we had made. Never, never again :)
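If the confirmation dialog really has to go, something else needs to absorb repeated swipe messages. A rough sketch of one option, a per-card debounce window; `SwipeDeduplicator`, the 5-second window and the card tokens are all assumptions for illustration, not how the original app worked.

```python
class SwipeDeduplicator:
    """Ignore repeated swipe messages for the same card within a short window."""

    def __init__(self, window_seconds=5.0):
        self.window_seconds = window_seconds
        self._last_seen = {}  # card token -> time of the most recent swipe

    def should_charge(self, card_token, now):
        last = self._last_seen.get(card_token)
        # Record every swipe, so a burst of repeats keeps extending the window.
        self._last_seen[card_token] = now
        return last is None or (now - last) > self.window_seconds

dedup = SwipeDeduplicator(window_seconds=5.0)
decisions = [
    dedup.should_charge("card-A", 0.0),   # first swipe
    dedup.should_charge("card-A", 0.2),   # reader repeats the message
    dedup.should_charge("card-A", 0.4),   # and again
    dedup.should_charge("card-B", 0.5),   # a different card
    dedup.should_charge("card-A", 10.0),  # a real second purchase later on
]
```

Only the first swipe in each burst results in a charge; a later, separate purchase still goes through.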

Didn't happen to me personally, but a little while ago someone in my company renamed a cronjob file to cornjob, which caused mass confusion. For several days. Almost had a client drop us because of it.

I was logged into my new dedicated server box and configuring some firewall rules over ssh. The first thing I did was set it to not accept any connections from anyone. Then I saved it to test that before going through and adding the various ports I wanted to allow.

This is why, whenever I'm making *any* changes to my Cisco router's config, I start off with a "reload in 10" command that forces the router to restart itself in 10 minutes unless I cancel it. That way, until I verify the changes work and save them, the router will automatically reset its config...

One friend of mine did something similar. He installed on his PC a "paranoid ultra secure OpenBSD variant"... Forgot to enable an account... root disabled by default... HD encrypted away... Ah, the joy of reformatting!!!

I am not too proud to admit a similar fudge-up... only difference being that I repeated the mistake 3 times (the result, not the approach ;-) in 10 days... with the server located on the other side of the country... Learning through extreme humiliation is, to understate it, effective...

Like the time when, instead of moving a new library on top of an old one, I deleted the old one first. Turned out it was used in user command processing. Fortunately, this was before I'd actually done anything with the system, so re-installing the OS worked just fine.

deploying another developer's code straight into production without proper code review or testing:

this guy wrote a simple java servlet to handle 404s and general 500 errors. it was supposed to just kick the user off into a simple "this page cannot be found" or somesuch error page.

the problem was, his servlet made a database connection (which may or may not have even been used; i don't remember).

so the first time there's a database error ( probably something to do with a temporary lack of database connections in the pool ) - the error servlet gets called up, tries to access the database, which throws an error to the error servlet, which tries to access the database...and so on into infinity.

over the next 24 hours, our site gets close to 5 million hits, and all of our servers grind to a halt.

lessons learned:

don't push unproven or worse yet, unseen code straight to production because a suit says so.

On a DG-UX system, pressing TAB while typing the first letters of a command (probably 'shar') as root - of course - and getting 'shutdown' unexpectedly. But it was fine, because I had typed '--help' as the parameter. Except that its shutdown didn't seem to care about that parameter, so it shut down.

Tools down around the office, me watching it boot back up (it took about half an hour) with management laughing on the other side of the server room window.

I learned quickly about sudo - and about checking man pages on any flavour of UNIX I wasn't intimately familiar with.

I was working on a database project for a client and had written all my DDL for creating tables and such in scripts for deployment from environment to environment. The first thing the script did was drop the tables before recreating them. I did NOT however code a prompt for what database to hit, it just worked in your current session. Well as you have probably already guessed, I was in the wrong environment when I executed it! And this was after they had hired temps the week before to enter production data. Luckily we had just created a backup that morning. This was quite early in my career and I learned a valuable lesson from it!

Big press gathering.
Have coded for four months straight.
About to demo interactive game for kids on HIFI you wouldn't believe, 3 4x3 meter screens, as well as 30 client computers in an auditorium.
Starts show.
50 seconds into the (beautifully synced across the 3 screens) intro, power goes down.
5 sweaty minutes later, power back up. Ok.
Another 2 minutes later, past the intro, players begin hitting the database with answers to questions.
For the real WTF, everything slows down to a crawl, audio/video out of sync, client screens dead, all waiting for.... The g¤#! d"!#¤d Access database the client insisted on!!!

I was prepping some data for a manual email marketing blast to some 2,800 users. I forgot to edit out the loop to send to all customers during my initial test run. So, when the loop over 2,800 customers ran, it sent every message to my test address (my email address).

To make matters worse, my browser crashed due to the POST, and when I brought it back up, re-triggered the action, sending a total of 4600 emails to my inbox.

I'm the admin of my email server, so thankfully I was able to do some cleanup on the box, but not before it took me over-quota and nearly killed Outlook.

When I was messing with the Intel 8086 instruction set, I found that its MOV instruction has

'MOV target source'

but linux has

'mv source target'

Just imagine what could happen when you get confused....

2008-09-15 22:00:39


Always check your web.config or app.config files before uploading them or checking them into source control. You don't want to leave your passwords and localized settings in there. I've done this more than once.

First day on the job I was asked to set up a new machine for development. So I log onto the machine, sudo to root and start by covering all the basics... like the firewall. Of course I previously had only configured such basics from the console, so it took me a few moments to figure out why I lost my connection right after the

I once left in some testing code that caused a thread to sleep for 10 seconds during an important loop. Ground the system to a halt, and it took a lot longer to find than it should have. It's become the canonical example of screw-ups at my company.

My first job was working at an ISP back in the 90s. We all brought our computers in from home to play Doom during my first month of work. When it came time to leave, I disconnected my computer from the network... and apparently I took the network terminator with me (not realizing what a network terminator was at the time). Doh!

I got a new version of an operating system and was all giddy to install it. I went through my files and thought I had backed everything up. Turned out I backed up everything BUT my programming projects. Lost LOTS of work to say the least.

Once, our credit card processing system was down, so we just collected charges to be put through once it came up. When it was available, we wrote a script to collect the charges and input them into the credit card processing app. To make the coding and rounding easier, we treated the numbers in the script as cents (i.e. we multiplied by 100). Of course, we forgot to divide by 100 before submitting the charges. Even worse, we realized the problem after only a few charges went through, but the application would not let us remove the items or even void the transactions until it cleared its queue, so we had to wait for it to finish mischarging about a hundred people before we could void any of the transactions.

Lesson #1: Make sure that your data is presentable at all times.
Lesson #2: Make sure that your data doesn't ever look misleadingly correct.
Lesson #3: None of the credit card companies called us to ask why we were submitting so many transactions beyond people's credit limits.
Lesson #4: This is a good way to get a list of your customers with large credit limits.
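One way to avoid the forgot-to-divide mistake above is to keep a single internal unit (integer cents) and convert only at the boundary, through one named function each way. A minimal sketch in Python; the function names are hypothetical and have nothing to do with the original script:

```python
from decimal import Decimal, ROUND_HALF_UP


def dollars_to_cents(amount: str) -> int:
    """Parse a dollar amount such as '19.99' into integer cents."""
    cents = (Decimal(amount) * 100).quantize(Decimal("1"), rounding=ROUND_HALF_UP)
    return int(cents)


def cents_to_dollars(cents: int) -> Decimal:
    """Convert integer cents back to a two-decimal dollar amount for submission."""
    return (Decimal(cents) / 100).quantize(Decimal("0.01"))


charge = dollars_to_cents("19.99")    # internal representation, in cents
submitted = cents_to_dollars(charge)  # what would be handed to the processor
```

Anything leaving the script goes through `cents_to_dollars`, so a raw cents value never reaches the card processor by accident.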

I was in the lab making some last minute fixes prior to a customer trial, if I recall correctly. It was crunch time anyway, and I was behind the eightball. It was a windows dev machine with mingw installed.

I did this: grep texttofind *.txt > output.txt

Oh, power of recursion!

It filled the C drive in seconds. "Out of system resources" message boxes were popping up all over the place. I was frantically trying to clear them so I could kill the grep command when an ill-conceived Windows message came up:

"Drive C is full do you wish to format? Ok / Cancel"

Before I realized what the message read, I clicked "Yes".

That was all she wrote, I was staring face to face with "non-system disk error".

Simultaneously both stunned and pissed, I tossed the dead machine aside, grabbed my personal machine from my cubicle, pulled a copy of the trunk from subversion and I was back in action in a couple of hours. Thankfully, I did a commit earlier that morning.

What version of Windows is this? No out of disk space message in my memory has asked if you want to format the drive. I need proof, because if true, whoever wrote that particular dialog must have been hitting _something_ damn hard.

I once drove two hours to our datacenter to install a new production box. After getting back and spending the next day configuring the server and installing our software, the last thing I needed to do was change the password.

Somehow I managed to enter the new password twice with the same typo. After trying a couple of dozen permutations to no avail, I finally gave up and had to explain to the team why I would be spending the next day back at the datacenter.

My experience was last year, fall semester. It wasn't so much as a WTF on my part, rather a WTF on the instructor's part. She made us comment every line of code (including whitespace) the entire semester. She'd dock points too if you didn't comment properly. It was somewhat overbearing.

Formatting a floppy disk - but instead of "format a:" I typed "format c: " and didn't notice until I got past all the prompts ...

2008-09-16 05:59:03

I once did a dd if=/dev/zero of=/dev/sda1 when I wanted to wipe the USB disk at /dev/sdb1.... bye bye Linux root partition. The fun part was that I managed to restore everything from a second Linux server running the same OS version, and avoided a complete reinstall.

I was once dropped into a classic ASP to ASP.NET conversion project. [1.5 months in total, with no contact with the client during that time; he just gave us the classic ASP site code.]
The login module was done by a fellow developer. After a lot of hard work, including working from home, I finished my part a week and a half ahead of schedule. Then I checked the login pages of the old application and found that the other developer had hardcoded the sign-up for a single role. [There were several roles, each with its own set of pages.]
This hadn't even been caught by the project manager who had handed out the task sheet. The pages left out were as many as the ones I had already developed. In the end I went through all those classic ASP pages again, and thanks to the fully OOP approach I had used in the project, and to working from home, I was able to complete the project a day early.
And thank God, the project was very happily accepted by the client, with better than expected performance and reliability.

2008-09-16 06:57:18


Boy!

We're talking about 15 years ago, working on a 64 user DEC VAX running VMS. I'm debugging a program I wrote to scan through a bunch of files because it is hanging in an infinite loop. I made a small fix (moving the loop brackets), ran it and then... the whole system crashed, not just my login but the whole multi-user system, and left about 50+ people twiddling their thumbs.

Two hours later, the system is back up, I log in, I get back to my program and run it... and the whole system crashes again.

Late afternoon, the system comes up and I am just logging in when my phone rings. It's the sys admin, shouting abuse and telling me not to run that $^&*ing program again!

Turns out I had made an infinite loop that continuously opened file handles. There were no quotas set on the DEC VAX I was using so when the system ran out of file handles it just crashed and burned taking down everyone else logged in as well (most of my division).
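The general pattern that avoids this kind of leak is to tie each handle's lifetime to the loop iteration that opened it. A small illustration in Python (obviously not the original VMS program; `count_lines` is a made-up example):

```python
def count_lines(paths):
    """Return the total number of lines across `paths`, one open file at a time."""
    total = 0
    for path in paths:
        # The `with` block closes the handle on exit, even if reading raises,
        # so the number of open handles never grows with the number of files.
        with open(path, "r", encoding="utf-8", errors="replace") as handle:
            total += sum(1 for _ in handle)
    return total
```

Even if the loop itself never terminates, it holds at most one handle at any moment instead of eating the system's whole supply.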

I managed to cause a subtle bug that lasted dozens of revisions, caused random errors in output, and took days to track down... by misspelling the name of a preprocessor #define. It was worse because I made the assumption that if the code was broken it would be immediately obvious, but for extremely subtle reasons the broken code would only cause problems in unbelievably rare circumstances, but just often enough to be a major issue.

This also makes the record of my shortest bugfix ever, a bugfix which added two characters.

That last '*' is a doozy. I found out that our organizational restore procedures are somewhat lacking.

2008-09-16 12:47:57


Back in the DOS 6 days, I was playing on my Dad's work computer (he's an accountant) and I discovered the deltree C:*.* command. To complicate matters, he was using Stacker to gain some hard disk space. It took a long time to recover from that, and it was quite a while before I got to use the computer again.

Near the end of the dot com bubble, my company was doing research on a sector of the market. We had been given a database with company names in that sector, and we weren't sure if many of them were still viable companies. I wrote a quick and dirty app which looped through the database and tried the URLs to see if it got a valid response... the assumption being that a 404 would be a failed company. The app used the IE browser COM component and actually displayed the pages while it processed. I split the database into three sections and set it to run on three machines beginning at the close of business and running overnight. My cube was extremely proximate to the CEO and CFO.

Upon arrival the next day, I discovered that the database was not at all accurate. Apparently it was open to the public for update, and numerous spammers and porn companies had inserted records and URLs of their own. This, in itself, was not terrible. What was terrible is that many of the pages when loaded, spawned pop-up windows of extremely explicit details and while the program moved on to the next page, the pop-up windows were orphaned and visible for all to see.

Are you sure the database "was open to the public for update"? I'd guess the porn sites had just registered the domains of the expired companies, hoping to get some traffic from old bookmarks etc. That happens a lot.

I once worked on a project with a multi-architecture build, spread over several machines.

Occasionally, a build would fail on one machine, and leave files lying around. To fix this, I would log onto the offending machine and clear down the directory - conveniently named /tmp/build.

I found I could do this remotely in a nice simple commandline:

rsh -l user "cd /tmp/build;rm -rf *"

This seemed to work fine. But one day the build failed for a slightly different reason, and the /tmp/build directory wasn't there. Just to add to the fun, the user was root, and the default home directory was /.
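The root of the problem is that `cd /tmp/build;rm -rf *` runs the `rm` whether or not the `cd` succeeded; in the shell, `cd /tmp/build && rm -rf *` only runs the second command if the first one worked. The same guard written out as a Python sketch (the function name is made up for illustration):

```python
import shutil
from pathlib import Path


def clear_build_dir(path: str) -> bool:
    """Delete the contents of `path`, but only if it exists and is a directory.

    Returns False (and deletes nothing) when the directory is missing, rather
    than falling through to whatever the current directory happens to be.
    """
    build_dir = Path(path)
    if not build_dir.is_dir():
        return False
    for entry in build_dir.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # real subdirectory: remove it recursively
        else:
            entry.unlink()        # plain file or symlink: remove just the entry
    return True
```

Because the function only ever walks the directory it was given, a missing /tmp/build means nothing gets deleted at all.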

2008-09-16 15:30:58


Using a workstation's MAC address as the "register number" on a (DOS) networked point-of-sale system. Hello, register number 08AC00007AEC0991!

In my first year as a developer, I accidentally pointed the production app to the testing DB. Nine days later I cleared out the testing db. One day after that, I realized what I'd done, but it was too late. Nine days of registrations and financial transactions were gone.

I was given a GPS system that was used on ships and given the task of getting a program to interface with it and collect co-ordinates. We didn't have a manual for the device, but once I got it powered on, I found a big help button. I thought that might at least get us started with how to use it.

After pressing the button, it beeped a couple of times, and then the screen started flashing: Sending S.O.S. signal

Gosh damn - I unplugged the power cable, hoping the thing would turn off, but it must have had an emergency battery inside, because it carried on going, and there was no stopping it.

I waited, very anxiously, expecting a Sea King helicopter to appear outside the office at any moment, wondering how I was going to explain what had happened.

Fortunately - either because I was indoors and the signal didn't get through or because the receiver of the signal realized an S.O.S. originating 100 miles inland probably wasn't a real shipping incident - no sea king turned up. Phew.

Say you work for a company. Say your company is providing software for another company. Say you also provide them with a server to host said software. Now, say this company is far far away. Ok. Say when you telnet in to the remote company, you don't have a user account. Alright... now say you ask your manager "who should I log in as?"... if your manager says "just login as root, it's an empty server anyway" - Here be Dragons! Within 20 minutes we got a broadcast message saying "Ha ha ha! Got you!" followed by a sudden drop of our telnet sessions.

The best was when I was remotely connected to our build machine, and we would sometimes have issues where DNS would drop on our internal network so the easy solution was to bring up a command prompt and do:

ipconfig /release
ipconfig /flushdns
ipconfig /renew

As soon as I brought up the command prompt and typed ipconfig /release and hit enter, I was wondering why I lost my connection to the server...

Then it hit me, ahhh doy! Remote Desktop Connection! LOL

So I had to go down to the server room and directly connect to the server.

Working with about a thousand blogs and making an uninstallation script for one of them...
Well, hrm.. The script took about 999 too many.
My Boss noticed that he couldn't access one of the blogs and he asked me, I tried to come up with a good excuse but I couldn't...

He then asked me if I had fixed the backup system as I promised... but well, there were complications. -.-

2008-09-16 19:23:12


I was young. My manager didn't like the numbering scheme of the backup tapes and told me to recreate the backup tapes with better numbers. He told me to use [some specific command syntax I've forgotten] and I used that verbatim, as instructed. It released the backup tapes for reuse instead of renumbering them. We caught some of them, but some were overwritten. For the next eight years I was afraid that this major metropolitan hospital would be audited and have no financial data. I think most of my major mistakes have come from trusting someone who said "do it this way" and not researching the method myself.

ok, even if you will not believe me when I'll say this, I DID NOT DO what follows.

years ago we were working on a very big web site, for a very big customer.

a colleague of mine, while working on html pages, for some very obscure reason decided to launch a search and replace command using his VERY powerful editor, directly on the production server's file system.

that was - needless to say - after a long session of updates on the pages contents.

well, the command was to REPLACE EVERY SINGLE SPACE ON EVERY SINGLE FILE WITH AN EMPTY STRING. every single little space. disappeared.

the site started to implode. the command caused a sort of zipping of everything, and it was not reversible.

after a few seconds, when he understood what was happening, he even tried to launch himself on the network cable, but it was too late!!!

so: don't work directly on production pages, backup often, and turn on brains!!!

I ran a batch script I'd found on some blog on our server to delete some files.

I just wanted it to delete some specific files (older than 7 days) recursively in a folder, but the script found some files with names like %temp%file and began to replace them with the Windows variable values.

It ended up in the C:\ dir and began to delete everything.

Luckily I was looking at the screen and hit Ctrl + C as soon as I could, but sadly the server was a bit fast and it was able to delete 2 databases in the meantime.

The worst part is that we only found out a month later that it had had time to delete the databases, and the backups were kept for just 7 days.

Lessons learned:

Never run a script you've found on the internet if you don't understand its code.

In my first device programming job, my co-worker and I were trying to solve a problem involving the force that needed to be set as a parameter to allow our $25,000 device component to extract itself from an injection point. Unfortunately we made the assumption that increasing the value of the force parameter in the function we were calling on the device would do the trick. What we didn't realize (not having read the manual) was that the force was in a range from a negative value to a positive value, with the positive value being downward force. We needed upward. Our device got jammed deeper in the vessel it was injecting into... and of course then the next step in our process was a very forceful shaking of that vessel thus destroying our injecting device. First month on the job and I'd already cost them a $25k part...

2008-09-16 20:23:33


Wasn't a programming WTF but I drove almost 2 hours to install an important software update to one of our store computers. When I got there I realized that I had forgotten the 4 or 5 floppy disks that were required for the update. I ended up using a remote connection and very slowly downloading each individual disk for the installation. I never told anyone there about it...I was too embarrassed.

It was the first ASP.NET 2.0 application on a server running 1.1. Someone else ran some tests on their machine and said that running asp.net 2.0 and 1.1 on the same machine was fine.

Their machine was XP running IIS 5. The production server was running IIS 6.

Application pools? What is that?

So I deployed the application to the production environment with the same app pool and it didn't run. I changed the framework version to 2.0 and it ran.

I go to another site on the server and it was dead. I get some error message about the .net framework version. I googled the problem and added the application pool and set it up properly.

Phone call from the boss who is on a business trip because she was giving a demo in front of people. It turns out I crashed the server during the demonstration. When she refreshed the page it was fixed.

I once wrote a VB6 DLL that read from a database and auto-generated an HTML page.

It gets worse.

The page was a bunch of values that the user needed to enter and save, so the DLL also auto-generated an embedded javascript function that iterated through the controls on the page and composed something with the values; this was sent back to the server and saved in the database.

It gets worse.

The thing that the javascript function composed and saved was itself another javascript function. This saved function was embedded in the page and called when the page was subsequently reloaded; the function iterated through the controls on the page and set the values of each control to what they had been when the user saved.

So: I had VB6 code generating javascript code which generated more javascript code.

It gets worse: I didn't know how to use any of the escape functions in VB6 or javascript at the time.

Worked on network test software at NASA; I wrote an app that could generate arbitrary packets including poisonously malformed ones to stress test network equipment. I had 2 NICs in my development computer: one connected to my private test network. The other connected to the Official Government Network which is super-secure and managed by a third-party contractor who is paranoid about everything that goes on on their network. Guess which NIC I accidentally had selected when I sent 10,000 malformed ping packets from a sham IP address of "0.0.0.0"? It took my boss a week to get access turned back on for me; I'm told the network admins had come to the building's utility room and physically removed my ethernet cable from the patch panel. But I was just a summer intern, so really, what could they do to me?

2008-09-17 02:40:37

The answer to that last question could have been a lot worse if you were working for NSA instead. ;)

I was cleaning up in a source repository (StarTeam) and went to delete a bunch of files marked Missing. For some reason I was thinking, oh they're missing from StarTeam because I never checked them in. Only later did I find out I had deleted 70ish files, which wasn't a joy to recover.

I once wrote a script that would zip up a bunch of python sources into a single zip file (organized into separate directories). All these files came from a subversion repository, so to cut back the size of the redistributable, I copied all the relevant files to a temporary folder within my working copy and then deleted all the .svn directories within it.

I spent the rest of the day trying to figure out why I couldn't commit the changes I just made. As it turns out, the script was overzealous and deleted the .svn directories not only in the source files, but the directory the zip was built in and the .svns two levels above that.

I now realize that it's better to either move the files somewhere OUTSIDE the repository before building or to not copy the .svn files in the first place.
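The "don't copy the .svn files in the first place" option is a one-liner with the standard library. A minimal sketch, assuming Python; `stage_sources` is just an illustrative name, and the destination should be a path outside the working copy that does not exist yet:

```python
import shutil


def stage_sources(src: str, dest: str) -> None:
    """Copy `src` to `dest`, skipping .svn metadata directories at every level."""
    # ignore_patterns filters by name in each directory visited, so nested
    # .svn folders are skipped too; nothing under `src` is ever deleted.
    shutil.copytree(src, dest, ignore=shutil.ignore_patterns(".svn"))
```

Since nothing is deleted at any point, there is no deletion step left to get overzealous.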

Next story: I was trying to delete a package from my site-packages directory. I had grown tired of typing in long file paths and got to where I would just drag and drop files to the command line window. Here's what I meant to do:

Quite a while back a colleague and I were working on data conversions from one OS to another. This is back in the days when there were scads of competing non-standard computers (Eagle, Apricot, DEC Rainbow, Heathstar, etc.) and PC-DOS was just becoming more than a minor entry. I was standing behind the colleague, dictating actions, and he was at the keyboard entering them in. I was leading him through the cleanup phase of the 30 minute process of disc creation, changing directories constantly, and while still in the root, told him to type in del *. After he hit Enter, he looked up at me, and we both groaned at the same time.

It was the first computer I owned, paid for with my own money. Windows 2000 suddenly decided it would not power the computer down anymore, but would give me the "It is now safe to shut your computer off." This annoyed me so I started googling a solution, and someone recommended flashing my bios. Not even knowing what bios were, I thought this was a brilliant idea. Unfortunately, my floppy drive died in the middle of the update on the computer I had owned for only 2 months. My next computer had recoverable bios.

First day on a new job and I was tasked with helping convert a CVS repo over to SVN. While I was working I lost track of which server I was connected to in which terminal windows and wiped out the entire CVS repo.

I was in charge of a studio for a live TV breakfast show. We ran the closing credits etc... 5 minutes later the presenter stopped checking his email on the PC at the studio desk, looked around, then nonchalantly wandered into the control room to tell me he was still on air. I looked at the televisions around the office and he was, indeed, correct. I had forgotten to switch the transmission feed back to the network feed.

As a vendor, I was once working inside the data center of a private ATM (automated teller machine) network. One customer's PIN had to be reset as part of our maintenance work. I knew the encrypted PIN block of 1234 and wrote something like the following in SQL Query Analyzer:

update atm_card set pin = 'BA3452318689A190'
where card_id = 5

and somehow I selected the first line and pressed F5!! I didn't realize my mistake till the call center started getting calls from customers that their PINs were not working. There were around 10 calls in 5 minutes. When somebody from the call center approached me, I realized the mistake and temporarily delayed breaking the catastrophic news by saying that the PINs will work when the maintenance work was over.

I saved the day by looking for any backups the data center had taken that day; restoring the database with a separate name and running another update query referencing the external DB!

Lesson learnt: Always disconnect production servers and take a database backup before making any changes.
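Another guard that would have caught this: run the update inside a transaction and only commit if it touched exactly the number of rows you expected. A sketch using Python's built-in sqlite3 as a stand-in for the real server; the helper name and the PIN values are made up, only the table and column names follow the query above:

```python
import sqlite3


def update_exactly_one(conn, sql, params):
    """Run an UPDATE and commit only if it touched exactly one row."""
    cur = conn.execute(sql, params)
    if cur.rowcount != 1:
        conn.rollback()
        raise RuntimeError(f"expected 1 row, statement touched {cur.rowcount}; rolled back")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atm_card (card_id INTEGER PRIMARY KEY, pin TEXT)")
conn.executemany("INSERT INTO atm_card VALUES (?, ?)", [(i, f"PIN{i}") for i in range(1, 11)])
conn.commit()

# With the WHERE clause: one row changes and the change is committed.
update_exactly_one(conn, "UPDATE atm_card SET pin = ? WHERE card_id = ?", ("NEWPIN", 5))

# Without it: all ten rows would change, so the helper rolls back instead.
try:
    update_exactly_one(conn, "UPDATE atm_card SET pin = ?", ("NEWPIN",))
except RuntimeError as err:
    print(err)
```

The same idea works by hand in most SQL tools: begin a transaction, run the statement, look at the "rows affected" count, and only then commit.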

Wow, this is a good reminder to be careful about having a selection while a query is being run in sqldeveloper or SSMS. I'm actually of the opinion that those tools should NOT have this functionality; better to be slightly less efficient and have to comment out sections of queries manually than run into a situation like this!

@Exception Obviously that's where orip came up with the figure, but it assumes there's an exactly even distribution of pins, which is unlikely since banks let people choose their pins -- a lot of people are going to choose common patterns like 1234

I've made the mistake, as a contractor, of assuming functions in bank code actually return a meaningful result. Really annoyed the boss when he went to demonstrate my changes to his existing code and nothing would commit (his function always returned an error code).

This would qualify as my FIRST WTF in my programming life about 18yrs ago at this point. Just started a new job as a programmer working in a language called MUMPS. I'm learning the ins and outs of it. It stores data in global references, designated like ^A, ^B, etc. So I was using ^CTK, which matches my initials but also happened to be used for a system 'caretaker' process which governed the whole database. KILL ^CTK wasn't appreciated by the users or my new boss.

Wrote a little utility for a friend who was running a mailbox system at his home.

Because his machine crashed regularly he asked me to write a little watchdog that simply reboots the machine after an hour of hard disk inactivity. That was way back in the DOS days, and I was an assembler coding fanatic. So I started to write a little TSR program (does anyone here remember those?).

I hooked myself into all DOS interrupts and just forwarded the data to the original interrupt handler. To check if it works so far I flashed the VGA border color register.

Started my program - everything seemed to work well.

Typed DIR . /s to make some disk activity ... Screen border flashed for a moment, then silence. Dead silence. System hung.

I rebooted, but the system didn't come up anymore. After a long recovery session I was able to boot to DOS again. It turned out that I forgot to save the registers in the interrupt handlers around my border color flash code. That did all kinds of nasty things like turning read requests into writes and vice versa.

I messed up my hard drive so badly that I lost most of my content. Guess who hadn't made any backup of his non-toy project? Yep. That was me.

I lost 3 months of work that way.

Never again will I hook into critical interrupts on a production machine.

@Andrei Riena the memories still hurt.. I had a Turbo Pascal project with 100k lines of code in the works at the time. I didn't lose everything, but I lost the essential three months of productive work that made the difference between a hobby project and something serious.

I erroneously put my home phone number instead of the company phone number in a licensing error message for a product which we released a "free" version of on CompuServe. I did not discover this error until I received a phone call at 2am requesting to purchase the product. Doh!

:) Always works better when it's plugged in! I recently pulled apart our gas-top oven after the piezoelectric starter stopped working...turns out my better half had accidentally turned off the switch to the starter, which is hidden at the back of the cupboard next to the oven.

@Byron Whitlock: You Are Not Gonna Need It. A phrase indicating some added constructions that 'may come in handy' in the future (but after a thought or two, are probably never going to be needed). But actually it's YAGNI - corrected for that.

I had a block oriented data file class once that was absolutely central to an important sensor processing application at a long-term client. It had an internal structure a lot like a simplified FAT filesystem: allocation table, subdirectories, files, etc.

I took pride in being somewhat performance oriented... maybe not to extremes, but at least I was making sure the design was such that any serious bottlenecks were avoided.

Back to this sensor processing application. Once the files started to get above about 50-100 meg, a simple data read was taking up to a few hundred milliseconds, and I was ignoring it for years as just an IO bottleneck.

It turns out the initial block offset lookup was not getting copied into the file read properly, resulting in the read function reading from the BEGINNING OF THE FILE every single read, until it got to the blocks it wanted and copied them into the waiting buffer. Every graph on the screen called this disk read function 4-8 times a second, so you can imagine the effect this had.

Due to file caching, most of the disk read was in memory and so it came back very quickly.

This bug existed for MANY years, and once I fixed it the entire application became about 10 times more responsive.
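To make the difference concrete: the fixed read jumps straight to the block it wants instead of walking there from byte zero. A toy sketch in Python, with an assumed block size and an in-memory stream standing in for the real file class (whose layout I'm obviously not reproducing here):

```python
import io

BLOCK_SIZE = 512  # assumed block size, for illustration only


def read_blocks(stream, first_block: int, count: int) -> bytes:
    """Read `count` blocks starting at `first_block` by seeking straight to them."""
    stream.seek(first_block * BLOCK_SIZE)  # the offset lookup result is actually used
    return stream.read(count * BLOCK_SIZE)


# Eight blocks, block i filled with byte value i, so each block is easy to recognise.
data = io.BytesIO(b"".join(bytes([i]) * BLOCK_SIZE for i in range(8)))
chunk = read_blocks(data, first_block=6, count=2)
```

The buggy version amounted to seeking to 0 and reading everything up to the wanted blocks on every call, which is why the cost grew with file size.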

During my first year at university I worked on a game project in a course. When the course ended we figured the game was good enough to be used in a game competition for Swedish students. So we submitted it and made it to the finals and therefore went to Stockholm (the capital of Sweden) for a dinner and party. We didn't win but some people's eyes were caught by the game and we got the opportunity to upload the game to one of Sweden's largest game sites. The problem was that I screwed the released version up. I used a Swedish letter in a settings file and because of that the version that could be downloaded had no enemies in it! The worst part is that I used the Swedish letter because it formed a really bad joke compared to using the real letter. At least I learned a lesson never to release untested software. :-)

I was REALLY green, and I was working on a web application for network security alert analysis and response. Since I was new, I was tasked with a large amount of testing. One part of the testing was to analyze the captured data for the alert and send out a message to the offending party's ISP. Well, for one such intruder, I noted the offending IP address, cobbled together the warning message, looked up the whois record, and fired off the stern warning message. Oh, the kicker.....the IP address was somewhere in the range of 192.168.x.x and I sent the message to IANA. Someone responded. Humiliation followed.
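A check like this before the whois lookup would have spared the embarrassment. Python's standard `ipaddress` module knows about the reserved ranges; the function name below is made up:

```python
import ipaddress


def is_reportable(ip: str) -> bool:
    """Return True only for globally routable addresses worth a whois lookup."""
    # Private (e.g. 192.168.0.0/16), loopback and other reserved ranges are not global.
    return ipaddress.ip_address(ip).is_global
```

Anything in 192.168.x.x comes back False, so no stern letter goes out for a machine on your own LAN.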

We had two prototypes of the new hardware. I was working late trying to get them to boot. This was in the days before Flash chips: the EEPROMs had to be removed from the board and inserted into a programming device to erase and rewrite them, a process I had done several dozen times that night.

I thought I'd fixed the problem, and was sure it would boot. My fingers must have been tired, it seemed like it was harder to push the EEPROM into its socket than it had been, but whatever.

Powered it on... nothing. What could be wrong? I started poring over the changes I'd just made, until I smelled it. That horrible melting plastic smell.

I had put the EEPROM in backwards, shorting power to ground and ruining one of the only two prototypes. My colleagues did not allow me unsupervised access to the other one until the production boards came in.

Added the Debian testing repository to test out a single program, then a while later typed "apt-get dist-upgrade" without removing the testing entry.

First time I saw a kernel SEGFAULT on boot up. It was pretty cool until I realized all the kernels did that.

That wasn't too bad. Linux can be fixed.

Then in my hurry to fix Linux I started reinstalling Debian, only to realize I had just destroyed the boot partition and wouldn't be able to boot into Windows and work on finishing my paper until Debian had finished installing. :)

Building an "installer" that inadvertently disabled the update functionality ... permanently.

The application was for generally non-technical users (mortgage brokers) and they would never notice, it was also essentially impossible to tell who received that build of the installer. So we had in the vicinity of 500 users who'll never get another update unless they ask. DOH!

I remember being in tech support for an ERP when one of my colleagues did this :) . I wrote a script which generated random passwords for everyone and sent out an email informing them that this was a security exercise: they were in breach of the security policy requiring them to use strong passwords and(/or) change their password at least every 3 months... We were doing this so that they would be forced to change their passwords at least this once. Not one of them complained.

Our product had a voice engine to notify users via phone (e.g. a library notifies you that a requested book is available). Unfortunately there was a problem where the software called an old woman several times in the late evening and would promptly hang up. The poor lady ultimately called the police because she thought she was being stalked.

The problem was probably a combination of thread-safety and time change (daylight savings). I don't know what the fix was but hopefully it at least involved (a) do not repeat a call within 24 hours and (b) check the duration of the call and auto-email sys admin if it is too short.
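For what it's worth, the two safeguards (a) and (b) could look something like the sketch below. This is not the product's actual fix (I never saw it); the class name, the 5-second threshold and the `alerts` list standing in for the sysadmin email are all made up for illustration. Passing timestamps in UTC would also sidestep the daylight-saving angle.

```python
from datetime import datetime, timedelta

MIN_GAP = timedelta(hours=24)        # assumed policy: at most one call per number per day
MIN_DURATION = timedelta(seconds=5)  # assumed threshold for a suspiciously short call


class CallGuard:
    """Remembers the last call per phone number and flags very short calls."""

    def __init__(self):
        self.last_call = {}  # phone number -> start time of the last call
        self.alerts = []     # hypothetical stand-in for emailing the sysadmin

    def may_call(self, number, now):
        """(a) Allow a call only if the number was not called in the last 24 hours."""
        last = self.last_call.get(number)
        return last is None or now - last >= MIN_GAP

    def record_call(self, number, started, ended):
        """(b) Record the call and raise an alert if it ended too quickly."""
        self.last_call[number] = started
        if ended - started < MIN_DURATION:
            self.alerts.append((number, ended - started))
```

The dialer would ask `may_call()` before dialing and call `record_call()` once the line drops.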

A table named WorkOrders had triggers on its update and insert events. I didn't know that these triggers did important things like sending an email. After I ran 10,000 insert queries, our customer's system admin went crazy because the Exchange server was down.

Overkill OOP (OOOP?). Several years ago an external contract programmer was tasked to create a visual screen editor for us. He was a die-hard OOP fan from what I heard.

The end result? Down to the smallest bit, everything was a class. Yes, he actually had a class "CBit" in it! And since this was a Windows application, it relied on messaging to get things done. The absolute horror was revealed when we finally removed him from the project, got the source code from him and took over the development internally because we weren't happy with the project's progress.

Because the framework sent out each message to ALL fracking objects and since each object checked whether it needed to do anything with this message, the data export of this tool was slow as hell (not to mention the numerous bugs we had because this thing was so hard and painful to code in). Remember, every one of the tens of thousands of CBit objects in a typical project processed each message. The data export took about 90 minutes with a full project and required a computer with 1 GB RAM so it didn't thrash the swap file too much. This was back when a "good" computer setup had 256 MB of RAM.

Over a year later, some of our coders hacked in some caching and filtering mechanism and lo and behold, the data export took only 90 seconds instead of 90 minutes.

Though similar to this post regarding issuing an UPDATE without a WHERE clause, I've issued a DELETE on a production web membership database without a WHERE clause ... and the backup was out-of-date!! It took me 8 hours to restore the data using manual queries from a staging database that luckily had just been updated from the production database.
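One habit that would have caught this: run the DELETE inside a transaction and check how many rows it touched before committing. A rough sketch using Python's built-in sqlite3 module; the helper name `guarded_delete`, the `max_rows` limit and the `members` table in the usage note are made up for illustration, and other databases and drivers will differ in the details.

```python
import sqlite3


def guarded_delete(conn, sql, params=(), max_rows=1):
    """Run a DELETE, but roll back if it touched more rows than expected."""
    cur = conn.cursor()
    cur.execute(sql, params)  # sqlite3 opens a transaction implicitly for DML
    if cur.rowcount > max_rows:
        conn.rollback()       # undo the delete before anything is committed
        raise RuntimeError(
            f"refusing to delete {cur.rowcount} rows (limit {max_rows})"
        )
    conn.commit()
    return cur.rowcount
```

With this, `guarded_delete(conn, "DELETE FROM members")` on a table holding more than one row raises and leaves the data alone, while `guarded_delete(conn, "DELETE FROM members WHERE id = ?", (1,))` goes through.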

I once accidentally misconfigured the firewall to do source NAT on port 25, not just destination NAT... as a result every incoming connection on port 25 showed up as coming from the trusted 192.168.0.x network in our mail server. Spammers discovered that within a week. Oops!

I'd have to say it was forgetting to switch the 110/220V selector on the back of the disk drive enclosure (many years ago, obviously) before flipping the power switch, then watching the white smoke that drives all electronics leak out the back of the power supply.

We had a system in the field that used a combination of encoders and photo eyes to track packages moving down a conveyor. There was a database entry listing the position of each photo eye in terms of encoder counts. The original values were calculated from the layout drawings. Once installed the actual values in the field could be measured and the entries could be corrected.

About once a week we would download a new version of the code. It would always fail miserably and they would revert to the original version. The code would pass all of our in-house tests and simulations. We looked for weeks trying to figure out what was wrong. Finally, after over a month, we realized that we had never copied the working values of the database from the field. We were still using the initial values in our copy of the database. When we downloaded, we overwrote everything, including the database with the correct values!

I once worked on some test software for a computer manufacturing center. The computers had sequentially numbered, six digit, Base 36 serial numbers that currently started with 'H1' and we tracked systems by those numbers. I needed a dummy serial number to unit test the software, so I made up one that wouldn't be hit in the normal sequence but would be recognizable, so I used:

H0RSHT

Of course, one reference to it was left in an error check so after a few weeks of production, I got a call from the manufacturing floor because an error box had popped up that said:

When Windows 95 first came out, my parents got a new machine with it preinstalled. I was well-known in the family as the primary suspect for whenever the computer wasn't working (and also the primary repair person). I downloaded WinZip to C:\ and accidentally unzipped its files to the same directory. Obviously, this would not do and I was already at a command prompt, so I just figured I'd move them all manually to the directory I wanted. Here's the command I used:

C:\> move win*.* c:\winzip

Apparently, Microsoft decided to change the move command between DOS and Windows so that the command could be applied to directories, too! My whole brand new (as in 3-days brand new) Windows directory was moved to C:\Winzip. If I remember correctly (only 12 at the time, sorry for fuzzy memories), there was some sort of issue with simply moving the files back from whence they came via the command line. Naturally, ALL shortcuts to windows files were borked (so much for that new fancy drag-and-drop feature for moving directories since Windows Explorer couldn't be found and I had no idea how shortcuts, etc worked). After a $50 repair bill and a very heated lecture about safe experimentation/personal responsibility ("undocumented changes to commands aren't my fault!"), the system was back in working condition...as working as Win95 could be, anyway.

Been there. Done That. It really stinks to have killed the family computer.

I consider this as one of my 'divine intervention' moments (Ref: Samuel Jackson in Pulp Fiction)

I was setting up my machine for a live technical demonstration of a database migration. I was too early to finish my preparations. So I thought it was good time to get rid of all the files in a temporary folder. Here is what I did roughly:

C:\> del /s *.* c:\temp\demo

The console started showing the list of files the system was deleting. I was thinking "dah! I should have used /q or something similar to make it silent". After a few idle minutes, I paid a little more attention to my console: it was actually deleting all the files and folders under the C drive!! The last parameter c:\temp\demo was completely ignored by Windows, and it started happily deleting the stuff under the current path, which was C:\

I pressed all the control+break+c+printscreen and what not! But too late. I had already lost many things under 'program files'. My MSSQL Enterprise Manager would not open up. The del command was deleting files and folders in alphabetical order. I was pretty glad that C:\Windows is alphabetically at the end.

After plenty of un-deletes, re-installations, system restore, and simply copying installed files from other machines (!), it is still unbelievable to me, that the demonstration (to really big folks) went through just fine. I clicked buttons, moved my mouse, etc only if it is absolutely required. I didn't even dare to open notepad.

My first job out of college, I was a server admin at a boutique financial firm. The company ran all sorts of simulations on expensive SPARCServers that were stacked up in a small machine room -- slightly bigger than a storage closet.

The really powerful AC was installed with the vent blowing straight down from the ceiling to the machines. If you were working on the machines for more than a minute, your neck and shoulders were chilled more than your favorite after-work beverage!

I came one weekend for a long set of server patches/upgrades, and spent a lot of time at the console terminal (single user mode) -- after rubbing my shoulders and neck one too many times, I turned off the AC switch on the thermostat.

Two hours later, I was done, verified everything was running well, and then left to enjoy the rest of the weekend.

That evening, I received a page that the system was down. I came in to work and the first thing I heard were several thermal warning alarms from the RAID boxes.

Fortunately, most of the systems survived. But one CPU module on the SPARCServer had to be replaced. I think that module cost the equivalent of two of my paychecks then...

This is a bit of an embarrassing WTF. I was connected over RDP to a remote Windows 2000 server, and I was taking note of the IP address as well as the network interface of the server in the control panel. So to open the properties window I went to right click, and clicked properties, or so I thought. Because of the slow connection my mouse cursor did not move as fast as it would have done in real life. So instead I clicked "Disable". Yes, the machine was now unreachable.

The first time I installed Win95 I noticed a system.dat and a system.da0 in the Windows dir. Thought it would be nice to save this 1 meg on expensive HD space... ...so I started to reinstall Win95 the very same day.

Soon after joining a cross-platform open source project, I thought it would be a good idea to do some clean up. It had mostly been developed by Unix dudes and I was working on the Windows port. When I built it with Visual Studio, there were thousands of warnings complaining about potential data loss and suggesting types be cast correctly.

Away I went, changing 100s of files correcting this 'problem'. It still worked afterwards for me so no problems! As so many files had been changed, I checked it in by batches of 10 files at a time.

End result: everyone else on the project was spammed by CVS commit e-mail messages, and the build was completely broken for every non-Windows OS!!

I developed a diagnostic system for machines that was to be used by several hundred people all over the world. The application also showed some pictures of the machines, which could be changed by the technical staff of the machine producer.
While testing we didn't have all the images of the coffee machines and used some "bikini" pictures instead.

Needless to say, quality management was nonexistent and the test database became the production DB.

Thanks to automatic software updating we only had to wait 20 minutes until the first customers called and asked why their "coffee machine" looked like a D-cup brunette.

A few years back, I was modifying a Java application for a customer. I decided to be real professional, and whipped up a fancy installer as well, using InstallShield, I think.

I found, of course, that the installer could display a splash graphic while loading its resources. I thought that would be cool and, just temporarily put a picture in there which I had lying around in My Documents - displaying a very scantily clad female.

Needless to say, I forgot to replace the splash graphic before I shipped the installer off. Actually, the customer never complained, probably because then the person in charge was a middle-aged man who probably didn't mind at all. My luck.

Years ago, my old man wanted me to reinstall Windows for him. He had Windows 2000 on drive C: (NTFS) and data such as family photos on drive D: (FAT).

I placed all his documents and personal files on drive D before booting the Windows installation CD. Booting from CD didn't work for some reason, so I created a boot disk and typed

"format C:/q" ENTER

y ENTER

Copied boot files and cd driver to C: so I was able to reboot from hard disk again and start the installation process from there.

So, I rebooted and to my surprise I saw the Windows 2000 screen come up.

"Hmm, the format process went fine, what happened?" were my thoughts.

And then it hit me. It really felt as if the ground underneath the chair disappeared as I realized that my boot disk couldn't see an NTFS drive, and as such it had mapped drive D: to drive C:, and I had happily formatted away all my dad's family photos and months of work on descratching his vinyl albums. Together with his documents and such.

Something similar happened when I was toying around with Linux about 10 years ago, wanting to install it on a spare partition of my main Windows box. I was young and didn't know jack about Linux, and `hda`, `hdb`, `hdc` and `A:`, `C:`, `D:` are so easy to confuse... :)

Yeah, been there and lost my entire music collection, game saves, and everything else that I'd spent about 4 hours backing up to my FAT disk from my NTFS disk. Now I unplug it if it's not being formatted.

Not quite a programming problem, but was caused by one so I'll include it.

Having worked for over 2 weeks on a particular bug in one of our modules that had been hanging around in various forms for over 6 months, I was particularly happy to finally find the cause and resolve it.

Being very pleased with myself I did what any programmer does at times like this.

At one job, for a short time my desktop HP (running HPUX) was being used as a print server. I soon found out that having something heavy in front of the plug was a good idea, after kicking it out of the outlet a couple of times. I used my old TRS-80 4P, doubtless the slowest computer ever to perform a critical function at that company.

Thread is probably lost in antiquity, but my biggest WTF happened at my very first programming job, of course. I was in charge of the installation. This is back when Win95 was new, and I knew jack about installations. I was using InstallShield, and having a blast installing, testing, wiping the machine, installing Win3.1, upgrading to Win95, installing, etc.

Anyway, I had that installation SOLID man! And we were going to a big testing thing in St. Louis, and I was going to have to install this on 50 computers there. Right before we got in the car to start driving to St. Louis, I decided to use InstallShield's "Package on Demand" feature or whatever it was called. Recompiled the setup, copied it onto the 10 floppies that it needed, and we took off. Without testing.

I then went and installed it on all 50 machines when we got there, one painful floppy at a time -- start one machine with floppy 1, then when it was doing start another on floppy1, while putting floppy2 in the first machine...this took HOURS.

Still didn't test it.

After I get them ALL done, my boss goes to a machine and says, quietly, but my guts wrenched and adrenaline was not so much shot into my veins as hosed into them..."Matt...why won't ICS run? I get an error about a missing library?"

Yeah -- all the needed files were on the disk, but there was nothing in the installation program that would actually PUT THEM ON THE MACHINES. They all had nice icons, but no working program.

Remember, this is in the days before good reliable iNet from hotels. Hell, none of us even had any laptops, nor a way to remote into our work machines. We had to go back to an old version of the program that had tons of bugs, but at least we knew where they were and how to work around them. I didn't get to sleep that night as I was up REinstalling the app on all of those machines. Again.

Moral of the story -- no matter how trivial you think a change is, GO BACK AND FARKING TEST IT.

But I hate to admit it, even to myself, which is why it's so hard to even write this reply.

Probably the most recent one that comes to mind was at a demo this year. It was with a potential new customer, and we'd planned a demo where I would meet my boss with my laptop to show the newest latest and greatest version of my GUI. Somehow I remembered the meeting time wrong, and thought I had an extra hour. Also, I had left my phone at the office.

Luckily some intuition caused me to cut lunch short, and I came into the demo 10 minutes late, and it did go quite smoothly and well, although very stressful for my boss (and me!).

Yep, it's painful to write this. I did learn from it though -- Be calm, and be more organised.

MySQL master servers need the option enabled to do Binary Logging, as that is what the replication slave reads. They also have options to log changes on some tables. Well, if you want to remove all those filters, you actually need to remove all the replicate-do-db entries. Leaving one blank will result in nothing being logged for replication. :-/

This was a database that could not be taken down long enough to take a binary copy of the data for re-creating the slaves. It took the company 3 weeks to get everything back up to date again and yes, it helped lose me my job some months later.

It may have started there, but the walking directions feature is still relatively new and not always accurate. In my area a major foot ferry isn't registered in Google Maps, so it advises you to walk 12 miles around the island instead.

A previous boss of mine once set up a monitoring system based on mail. If something went wrong, it would send a mail.

Well, something went wrong and roughly 30'000 mails were sent.

That wasn't the problem. The first problem was that the Exchange admins came complaining that he was "abusing" their server. After they were gone, he tried to delete the mails and found out that Outlook couldn't. Apparently, no one had ever tested Outlook with a big inbox.

All he could do was select the mails 100 at a time and then delete them. After that Outlook would allow him to select another set of 100 mails.

In a former life where I still had to do tech support, and given that I still don't understand digital phones, I had two very angry customers with different problems about different bits of software, neither of whom spoke English as a first language, and I managed to connect them on a party line by themselves. I can only imagine how that played out.

Oh back in 1989 in my local community college (a 2.5 hour daily commute in a rickety bus on rural Irish roads). Money was tight – hence the commute – disks for my dual-floppy PC were prohibitively expensive, and backups weren’t an option. The lesson learned was drink related :)

@lagerdalek: It made him shut up and go away, which is exactly what I was aiming for (between you and me, I don't think I'm dumber and slower on the uptake, I definitely reckon I'm less arrogant though)

I was going to send a newsletter to all of our members (10.000+) announcing that we were about to release a new version of our application. I quickly wrote a small piece of code to send the newsletter and ran it. But unfortunately I forgot to loop over the names, so everyone got the newsletter addressed to the first record in the database, which was also me :(

I was coding an ecommerce site and was testing the credit card charging system. I was very bored of typing the same card number over and over again, so I decided to hard-code my own card, and pass the credit card screens. Well, at the end of the month, I saw that I'd spent over $1.000

1 or 2 times data loss (learned the "backup lesson" very well)

I decided to use a remote control service to manage one of our servers. After installing the application, the first thing I did was to restrict all IP addresses, but forgot to enable my own. I went home to continue to write my code. At home, I realized that I couldn't connect to server, so at around 1am I went to the office and enabled my home IP.

I had a lot of data on an external drive which needed backing up. So I got another external drive, and plugged them in ready to copy from one to the other. There was a funny smell, followed by a few whiffs of smoke, as I watched four years of photos burn up, not to mention several hundred dollars of equipment.

The two drives, both made by the same manufacturer, had identical power supplies. Or should I say, apparently identical power supplies, but with the 12V and 5V lines swapped. There was NO way to tell which power supply belonged to which drive!

I did manage to recover the data though - I ordered an identical hard drive and swapped the controller boards. It worked. Score: Seagate 1, Akasa 0.

Well, there was the time where I was SSH'ed into a remote server (3 hour drive, one way) changing some configuration files for my customer. Once done, I used 'init 1' to stop the services so I could reload them...

Luckily this was a backup server; we had a service call to that location scheduled the next day.

Another (earlier) time in my IT career, I was in the Army. Another tech was helping me run some Cat-5 cable to some new workstations we were installing in the Headquarters tent. Once done, I went over to the switch and plugged in all the loose cable ends.

What I didn't know at the time was one of those loose cable ends was actually the workstation end of a cable already plugged into the switch. The auto-sensing switch. And no, we didn't have Spanning Tree enabled at the time.

The Commanding General was very curious why his network was down during his daily briefing...

In the old desktop database Paradox you could link to tables hosted on a Database Server pretty much as you can from Access to any other db using ODBC ... the subtle difference was if you deleted the link to the table then the table got dropped as well ...

The feeling I got after realising that I'd dropped some enormous hospital patient data tables was one I'll never forget ...

My first job after college I worked for a company that would send alert messages to pagers (remember those?) for different events that happened in a hospital. One of my first tasks was to write a new alert format for the pagers when a patient changed beds in a hospital. I was given the test pager and went on my merry way.

To fully test out the new messages I set it up to fire anytime anything changed on a patient. This gave me plenty of sample data to play with and I soon had the messages working up to par.

You can see where this is going right? I didn't ever turn off the test alert and it was sent to a large hospital with around 400 beds, which all had patients, who were all having their data changed often.

I didn't find out about it until my manager walked over to my desk with a stack of papers that came with the pager bill that month. $600 later and a lot wiser, I now triple check where my code is going to end up.

I have done something similar. We had two different SIM cards. One for testing datacall, and one for testing SMS. Guess who got the cards confused. Luckily it was near the end of the billing cycle and the bill came quickly 0_0

A company I was working for once used a bug in IE to access the clients' machines from within a website to "increase usability" and make a seamless web/desktop integration.
A few months after deployment the bugfix was rolled out with Windows Update and the customer wasn't able to use the features anymore. They were not able to work around this.
Needless to say, it was the last thing this customer paid the company for.

This application allows a bank to track charges that customers pay for their bills. At the end of the day, a bank worker "ends the day" and the charges are transferred to, say, the electric company. For ending the day one must:

Click "day endings" from menu

Choose "end day".

Click "Continue" on "This will end the day and blah blah" dialog.

Click "Yes" on "Are you sure?" dialog.

The customer requested a more error-proof interface for this procedure. We could not understand how they could end the day accidentally, but they told us they could. So, we added these extra steps at the end of the process:

Type your password again on "password required to end the day" dialog.

Click "Yes" on an additional "Are you sure?" dialog.

(OK, this step was included to express how irritated we were) Show a full screen dialog with a red background and with the message "Are you sure you want to end the day? This is an irreversible action. Are you really, really sure do you want to continue?".

After that last step, the customer could end the day. After the code went into production, we called the bank to check if everything was fine.

-- Did you see the updates?

-- Yes, we were looking at that.

While talking on the phone we heard a panicking sound at the background:

I worked for a startup doing some Sega Dreamcast development. The devkits were pretty expensive and we only had 3 or 4 that we shared around. They had a nasty habit of frying if you plugged a cable into the RCA video-out jack while the system was powered on. I found this out because a senior dev on the team did this one day. BZZZT! A few weeks later we got our replacement dev kit. Same senior dev fired it up... plugged in the video cable... BZZZT.

Wow... I think the people that developed the devkit need to learn about electronics... sounds like a few resistors and capacitors were missing. Of course it is a good way for them to keep bringing in money.

The support techs have a special "back door" for when users forget their administrative password. One of the company's oldest customers called and said that they couldn't login as an admin. For some reason, the remote assistance service wasn't working, so the support tech gave them the "back door" to try for themselves. Luckily, the customer was able to login and reset their password and all was well.

Then, about two years later, that same customer calls back. This time, however, it's a different person. They say that a suspended employee account is still showing up in the access logs. Guess who that suspended employee was... that's right, the person who got the administrative back door.

So, basically, they were let go a few months before that and got a new job at a competing company. Using their administrative superpowers, the employee was able to log into his old account and steal customer information even though all outward appearances indicated that his account was suspended.

There wasn't anything they could do except delete the account completely (which they should have done 2 years ago). That employee is still out there, though... lurking.

Anyway, the moral is: if your application has a secret back door to a fully-privileged administrative account, don't give it to a customer just because your remote assistance service is offline.

Actually, a better moral would probably be: don't include a secret back door to a fully-privileged administrative account.

Was logged on to an Apple server thing (whatever those are called again) through remote desktop. Something with the network wasn't working correctly, so I thought I would give it a fresh start. So I turned the network interface off, waited a second or two and turned it on again.

Well, I was going to turn it on again, but of course... that didn't work so well... :p

Of course, this would have been less of a problem if the computer I was connected to had a screen and a keyboard and, you know... anything... but it didn't have anything, so I had to fancy boot it through FireWire on another computer and stuff... not what I wanted to do... cause I was already tired...

In other words: Go to bed when tired. Don't experiment with crucial things :p

I was demoing some software our team developed, for a major credit card company, to the management. It was my first project and first demo, so I was a little bit nervous.

The software would automate incoming calls, from customers, after they had keyed in their credit card details. Once the call reached the correct hunt group (correct team/department), the software would retrieve all their details, including account history, faxes and scanned letters and images. This was quite leading edge at the time (early nineties).

Okay, I was sitting at the computer, with all the executives, managers and consultants, explaining how you just simply dial a number and the software just kicks in. I made a fictitious call typing 9999 on the office phone connected to the system, which invoked a dummy customer. All was well, I demonstrated the software and its functions. Everyone seemed happy.

Towards the end of the demo a call came from ground floor reception, saying the police were downstairs, having rushed in to answer a 999 call from me. WTF! I forgot that dialing the first nine would get an outside line, and then the next three nines called emergency services.

The demo came to a premature halt and I had to go and convince the police it was all my fault!

In setting up a phone system at my work I left the default setting of allowing you to directly dial 911. Many of the teachers apparently got a kick out of trying to dial two 1s before the area code. Needless to say, I removed the quick 911 and changed the dial-out number from 9 to 8.

I was setting up a modem on my PC years ago, and just put a string of 9s in to test it was dialing properly. Had to very quickly find and plug in a phone to explain to the emergency services operator what had happened.

In the old days, 999 was a reasonable choice because the old dial phones made dialing 9 very difficult to do by accident. On more modern phones, it's easy to do in all sorts of ways, some more embarrassing than others.

Since I haven't been programming long I don't have any real WTFs to my name, I did discover busy loops right back at the beginning of my university course when I was practicing on the side; didn't take me long to realise that was a bad idea.

Went to the customer's site to upgrade the production version of our software, which was running on Unix.

Logged in as root & just ran the upgrade script without looking through it first.

The script assumed it was logged in under the application account and the first thing it did was rm -F $approot/bin/*

Of course $approot was not defined since I was logged on as root and not as the application. And root can delete anything!

It took the customer's IT guys much of the rest of the day to figure out how to get the system back.

I now always look through install scripts before I run them and I never run them as root unless there is no other way.
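The script itself could also have defended against this. A minimal sketch of the idea, written in Python rather than as a patch to the original shell script (which I no longer have): refuse to build a delete path when the variable is missing, relative, or just "/". The function name `resolve_app_bin` is made up; `approot` is the variable from the story.

```python
import os
from pathlib import PurePosixPath


def resolve_app_bin(env=None):
    """Return <approot>/bin, or raise if approot is unset or obviously unsafe."""
    env = os.environ if env is None else env
    approot = env.get("approot", "").strip()
    if not approot:
        # This is the case from the story: an unset variable silently
        # turns "$approot/bin/*" into "/bin/*".
        raise RuntimeError("approot is not set; refusing to build a delete path")
    root = PurePosixPath(approot)
    if not root.is_absolute() or root == root.parent:
        # A relative path or the filesystem root is never a valid app root.
        raise RuntimeError(f"approot looks unsafe: {approot!r}")
    return root / "bin"
```

In a shell script, `set -u` or writing `${approot:?}` gives similar protection against the unset case.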

I once noticed a bunch of files uploaded to the production server by a colleague had file permissions that wouldn't let others modify them.

I did a chmod -R 774 * (or something similar to that) within the main web site directory. Unfortunately, losing that 001 (world execute) bit was really bad for directories, because it meant they could no longer be traversed by the Apache process.

It sent me into panic mode for a little while. Luckily it was an easy fix. In hindsight I should have used something like g+w to only update the group bits. I also find the practice of having Apache run as nobody a bit dubious, as it requires all the files in your site to have "world" permissions. I realise it is common practice though.
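To spell out the g+w idea: the point is to OR one bit into whatever mode each file already has, instead of overwriting the whole mode with 774. Here is a small sketch of that in Python; the function name `add_group_write` is made up, and on the command line `chmod -R g+w .` does the same job.

```python
import os
import stat


def add_group_write(top):
    """Recursively add the group-write bit, leaving every other mode bit alone."""
    paths = [top]
    for dirpath, dirnames, filenames in os.walk(top):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    for path in paths:
        if os.path.islink(path):
            continue  # skip symlinks rather than changing their targets
        mode = stat.S_IMODE(os.stat(path).st_mode)
        os.chmod(path, mode | stat.S_IWGRP)  # e.g. 0o755 -> 0o775, 0o644 -> 0o664
```

Directories keep their execute bits this way, so the web server can still traverse them.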

Long ago, I was testing some phone number formatting code in a very basic search screen. Instead of using something like 123-456-7890, I used my own home phone number, and forgot to take it out of there as the default text for the label caption. If an account did not have a phone number when displayed on the search screen it would display mine instead. Oh the phone calls and messages I would get about delinquent accounts for utility bills in other states....

I was working as an intern in my college's IT department and was asked to vacuum off the dust from the intake of the servers. The server room was on a platform and all of the power supplies were under the floor. I removed a panel to find a plug and when I went to unplug the vacuum I instead unplugged an entire rack of servers. All of those little green lights weren't green anymore and I almost singlehandedly brought down the entire campus computer network. I went and told my sup and we laughed about it. To this day (I have some friends that still work there), they still call it a CronJob (because of my last name).

One of the things we do at my company is instant win web sites, you know, the kind where you enter your info and you automatically get a winner/loser message? Yeah. Well our instant win system was designed to be completely random, but allowed for the ability to manually make the next X entries winners (for test purposes). Unfortunately, due to bad design, this ability was available in production too.

Well, one day, I'm logged into my DEV SQL box (I think) and testing something, so I decide to make the next 5 entries winners. 3 minutes later I realize I was logged into production.
The prizes were already claimed and the users notified. The prizes were worth approx. $5,000 each.

Similar to Graeme Perrow, my company was working on a system that linked the county court to the local Sheriff's office. I accidentally deleted all of the arrest warrant data in the court system which then cheerfully started to tell the Sheriff's systems that every warrant issued in the past five years or so was still active.

Fortunately, I was able to take the court side of the communication link off line until we restored the data.

A foreign key has saved my hide in a case where I did pretty much the same thing:

Delete from users
Foreign key violation on (user_profile for instance)
*Breathes sigh of relief*, *kicks self*, *adds where clause*
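
For anyone who wants to see that safety net in action, here is a minimal sketch in Python. It assumes SQLite as a stand-in for whatever database was actually involved, and the table and column names are made up to mirror the story. Note that SQLite only enforces foreign keys once the pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE user_profile ("
    " user_id INTEGER NOT NULL REFERENCES users(id),"
    " bio TEXT)"
)
conn.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")
conn.execute("INSERT INTO user_profile VALUES (1, 'hello')")
conn.commit()


def delete_all_users(connection):
    """Run the accidental unqualified DELETE; return True if it went through."""
    try:
        with connection:  # commits on success, rolls back on error
            connection.execute("DELETE FROM users")
    except sqlite3.IntegrityError:
        # A profile row still references a user, so the database refuses.
        return False
    return True


blocked = not delete_all_users(conn)
remaining = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
```

The constraint has no ON DELETE CASCADE on purpose: with cascading deletes the same typo would have taken the profiles with it.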

My boss called me panicked one day and told me to get into the office now. When I got there, I was told all of the customers data was deleted. After research, we found that a tech support person deleting one account did the following:

rm -rf /home /username/

Beware of where you put your spaces kids. The only backup was a hard drive from months ago since the tape backup was being worked on by my boss.
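
A hedged sketch of how a support tool could guard against that stray space, in Python. The function name and the "home root plus username" layout are assumptions for illustration, not what that company used. The idea is simply to refuse any target that is not a direct child of the home root.

```python
import shutil
import tempfile
from pathlib import Path


def remove_account_dir(home_root: Path, username: str) -> Path:
    """Delete home_root/username, refusing anything that is not a direct child."""
    # Reject empty names, leading/trailing whitespace, separators and dot entries.
    if (
        not username
        or username != username.strip()
        or "/" in username
        or username in (".", "..")
    ):
        raise ValueError(f"refusing suspicious account name: {username!r}")
    root = home_root.resolve()
    target = (root / username).resolve()
    # The resolved target must sit directly under the home root.
    if target.parent != root:
        raise ValueError(f"{target} is not directly inside {root}")
    if not target.is_dir():
        raise FileNotFoundError(target)
    shutil.rmtree(target)
    return target


# Demo against a throwaway directory tree, never the real /home.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "alice").mkdir()
(demo_root / "bob").mkdir()
remove_account_dir(demo_root, "alice")

try:
    remove_account_dir(demo_root, " bob")  # stray space, like the story
    rejected = False
except ValueError:
    rejected = True
```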

I had an automatic update utility that was working fine. But sometimes a download would fail, and the update would fail. So I added a CRC check. If the download failed, it would fail the CRC check and download it again. Unfortunately there was an overflow issue in the CRC calculation, which meant that after about 3 months of normal operation, a file was uploaded that mis-calculated and the downloads could never work. So it sat there downloading again and again and again.

The $40/month server that hosted the downloads went over its allocation, and the ISP billed for an extra $4,000. They wouldn't budge on reducing it either.
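
In hindsight, the missing piece was an upper bound on retries. A minimal sketch in Python, assuming a hypothetical `fetch` callable that returns the downloaded bytes (the real updater's download code is not shown in the story):

```python
import zlib


class DownloadFailed(Exception):
    """Raised when no attempt produced data matching the expected checksum."""


def fetch_verified(fetch, expected_crc, max_attempts=3):
    """Call fetch() until the CRC-32 matches, but never more than max_attempts times."""
    for _ in range(max_attempts):
        data = fetch()
        # Mask to 32 bits so the comparison is always on an unsigned value.
        if (zlib.crc32(data) & 0xFFFFFFFF) == expected_crc:
            return data
    raise DownloadFailed(f"checksum mismatch after {max_attempts} attempts")


# Demo: a fake download that is always corrupt, as in the story.
calls = []


def corrupt_fetch():
    calls.append(1)
    return b"corrupted payload"


expected = zlib.crc32(b"good payload") & 0xFFFFFFFF
try:
    fetch_verified(corrupt_fetch, expected)
    gave_up = False
except DownloadFailed:
    gave_up = True
```

With a cap like this a bad checksum costs a handful of downloads per client rather than an open-ended stream; a delay between attempts would reduce the load further.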

I was testing a new linux server.
I had a permissions problem, and after some struggling I found the right directory. It had a bunch of subdirectories, and just for testing I decided to set everything to 777. I don't know why (maybe I was tired), but I decided that adding a "/" was the best way to select the current directory.
I issued this:

chmod 777 -R /

I immediately realized what I had just done, but that damn command is fast! It had chmodded half of the system while I was finding Ctrl+C.
Needless to say I had to start everything from scratch.

In an attempt to avoid screwing up a delete statement I did a select to verify my where clause, and then wrote the delete. Unfortunately I didn't comment out my select, deleting the whole table (even though there is a where clause)

Same thing happened in my company, except the employee's email was in the FROM field. After he quit and his email account was deleted, the email form stopped working. Lesson learned - use more permanent email accounts in production.

While I was working on my co-op (paid internship in college) I was developing a small testing framework which queried a database and generated an HTML page with test results. One day my manager had planned to show some of our work to others at the company. I forget exactly what it was, but I apparently did something to screw up the current tests immediately before the meeting.

So, during the meeting when the time came to show everyone the test results page, instead of showing a bunch of 'green' test results (results were color coded) everything was red which meant that all the tests were failing, and very quickly (the test programs were failing to execute).

Needless to say I learned a hard lesson that you should never make any changes to a system before a demonstration without verifying that everything is still working correctly. I was quite embarrassed.

thankfully there was an easily parsed registration log file that allowed me to restore user avatars.

2009-06-22 07:49:37

A:

I once worked on a web application with, err, 'interesting' self-made error handling logic. From the outside it was nothing fancy: it just logged errors to a file. The error handling was tied together with the logging.

I was assigned to fix some bugs. Before I could do anything, I had to deploy our app (it had no automatic build, just a manual one with a lot of configuration tasks).

Everything went smoothly until I suddenly saw something new. Actually, it wasn't anything new: it looked like the standard "Server application unavailable" thrown by IIS, but nothing helped to get rid of it.

I got completely desperate and tried to reinstall .NET (you know, you can't just uninstall MS products). After that failed, I tried to reinstall Visual Studio. After the next failure, I had to reinstall my whole workstation, because .NET was somehow corrupted. All of this just because I was completely sure that the application logic must be fine and that there was something wrong with the infrastructure.

The next day we did some pair programming (because my PC still wasn't ready). After some debugging we finally found the reason: a co-worker had changed the error handling. Before, it was one big try{}catch{} where the catch didn't throw anything further. Everyone knows that empty catches are bad (more like stupid), so now, every time the application wanted to log something and couldn't create a log file, it threw an error, caught it and tried to log an error, couldn't create the log file, threw an error, caught it and tried to log an error...
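
A small sketch of the way out of that loop, in Python rather than the .NET of the original app (so the names here are purely illustrative): when the log file cannot be written, report once through a different channel and stop, instead of handing the failure back to the same logger.

```python
import os
import sys
import tempfile


def safe_log(message, log_path):
    """Append message to log_path; on failure, fall back to stderr and stop."""
    try:
        with open(log_path, "a", encoding="utf-8") as handle:
            handle.write(message + "\n")
        return True
    except OSError as exc:
        # Do NOT call safe_log() here: reporting a logging failure through
        # the same logger is what produced the endless loop in the story.
        print(f"logging unavailable ({exc}); message was: {message}", file=sys.stderr)
        return False


# Demo: the parent directory does not exist, so the log file cannot be created.
unwritable = os.path.join(tempfile.mkdtemp(), "missing", "app.log")
logged = safe_log("something went wrong", unwritable)
```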

A colleague at my current company used to write unit tests (NUnit) that did their testing under "c:\". Although he nicely introduced a TestDirectory property which got initialized by the test suite's SetUp method, he never thought of using a dedicated temporary test directory.

Guess what happened after another colleague went through all test suites and added the TearDown method that recursively deleted the path pointed to by the TestDirectory property...

This killed one developer's machine as well as our build server (TeamCity). It took me a few hours to figure out what happened and a day to set up a new build server.
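
For comparison, this is roughly what a dedicated temporary test directory looks like. The sketch uses Python's unittest rather than NUnit, and the class and attribute names are made up; the point is only that the directory removed in teardown is one the test itself created.

```python
import os
import shutil
import tempfile
import unittest


class ScratchDirTest(unittest.TestCase):
    def setUp(self):
        # A fresh, uniquely named directory under the system temp location.
        self.test_directory = tempfile.mkdtemp(prefix="unit-test-")

    def tearDown(self):
        # Recursive delete is harmless here: the path only ever holds test files.
        shutil.rmtree(self.test_directory)

    def test_writes_file(self):
        path = os.path.join(self.test_directory, "out.txt")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("data")
        self.assertTrue(os.path.exists(path))


# Run the suite programmatically so the result can be inspected.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ScratchDirTest)
result = unittest.TestResult()
suite.run(result)
```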

At a previous company we managed an application for a local university managing the dining halls. Part of the application allowed users to send in comments/suggestions. Anyway, our company's generic support email was on the list of recipients. Long story short a girl had written in complaining about how the dining hall doesn't cater to her during her "time of the month" and the food would aid in giving her bad cramps, mood swings, etc. Also she had some kind of STD and couldn't do something within the dining hall. She came off as a real crazy #####. A guy I worked with recently got divorced and I "forwarded" him this email telling him "Hey man! this girl is a real catch!" Little did I know I hit reply-all and the rest of the support team and the girl got my response. I kid you not she called the office looking for me (glad my signature had my phone # in it!) and yelled at me for a little bit. At the end she asked to be forwarded to the guy I had sent the email to. Five minutes later he came to my office and said "dude... she just asked me if I was up for a challenge."

I think he actually went out on a date with her considering he was in his 30s and she was a college student. Guess it all worked out for the best.

2009-06-29 18:38:10

+3 A:

I've got a few of these moments, but this one was one of the first. I was getting increasingly annoyed at having to connect to the database server to run Perl's $dbh->quote() function to properly sanitize the records about to be inserted. Additionally, I thought it was too slow.

So, I decided to write my own quote() function and roll it into production, after some testing. Well, there was a little tiny corner case (ok, it was huge), I forgot to escape the escape symbol - \. In an unfortunate series of events, one of my UPDATE statements was ending in a \ and this escape symbol ended up escaping the closing single quote and perhaps some other random event + 3 saturn moons aligning, and my WHERE clause ended up being completely ignored.

All 50 million rows in the table were overwritten with one completely insane looking value (insane enough to end in \ in the first place) and provided me with hours of research and blaming all kinds of MySQL voodoo for it.

To add insult to injury, the backups were failing for about a month, and there wasn't a single full copy of the database anywhere.

Additionally, my own quote() function was actually slower, even though it didn't connect to a remote server. Perl is slower than CPP, after all.

Lessons learned: test more, think more, and make sure your backups work. We now have a slave running that is purposely delayed by 12 hours, which makes it a very effective rolling backup.
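
To make the corner case concrete, here is a tiny sketch in Python (not the original Perl) of a quoting function that forgets the backslash, next to one that escapes it first. Both functions are illustrative only and deliberately incomplete; the driver's own quoting or placeholders remain the better choice.

```python
def naive_quote(value):
    """Escapes single quotes only, which is the mistake described above."""
    return "'" + value.replace("'", "\\'") + "'"


def fixed_quote(value):
    """Escapes the escape character first, then single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


trailing_backslash = "abc\\"  # the three letters followed by one backslash

# The naive result ends in \' so the closing quote is swallowed by the escape.
broken = naive_quote(trailing_backslash)

# The fixed result ends in \\' so the backslash is data and the quote closes.
safe = fixed_quote(trailing_backslash)
```

Order matters in the fixed version: escaping quotes before backslashes would double the backslashes that the quote escaping had just added.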

In the early days of the commercial internet (around 1995), I spent about 4 hours doing tech support for an ISP and ran away screaming.

A couple of months later, I get a job as an IT intern at an advertising agency on the floor above the ISP. Said ad agency has started an ISP of its own, jumping on the booming Internet bandwagon. All is cool until I get assigned to go unplug our company's T1 line in the phone closet in the basement.

I go down and find two lines coming through the wall. I unplug the one I'm sure was to our office (of course, neither were marked).

I find out later that it was the ISP's line and they are now suing the ad agency, claiming it was done maliciously (the one and only time I met one of the partners at the agency :/). Amazingly, the dumb intern was not fired, but I did quit shortly thereafter to take up independent consulting.

My girlfriend was suffering from an embarrassing women's problem. I Googled it at work, and found an informative page which I thought she would like to read. I used Remote Desktop to get into my home PC, fired up a browser, put in the URL, and sent the web page to my home printer.

What I didn't know, was Remote Desktop attaches the local printer as the default device, and sent it to the office printer in the next room where the young ladies work!

I was running Windows and Linux with dual-boot. Windows because I actually needed it for work, Linux just for testing it. Under Linux, I had my Windows drive mounted under /mnt/C.

At some point, I got bored running Linux and thought that it would be exciting to see what happens if I delete all files from the Linux system. How long will the operating system keep running when all the files are gone? So I did a cd /, rm * -rfv (or something similar to it). After a few seconds, I saw C:/Windows flashing by.

In Windows ME, I somehow turned off the "Don't show hidden or system files" option (on by default), among others, in Windows Explorer. Later I browsed through C:\, noticing some new files I had never seen before.

I tried deleting those in C:\, later on I rebooted, but at the next reboot I figured out I should probably have left those semi-transparent icons alone.

I had just started working for a large Golf Magazine that had a department that ran an electronic tee sheet/POS system for regional golf destinations. One of our RS6000 servers in the Myrtle Beach area was not letting a customer dial in via modem so, following the troubleshooting procedures I logged in...grep'ed for the user's processes...su to root...and then kill -9 all of them...including process 1.

At the time, mid-July, there were 200+ courses in the area using our system that no longer had tee sheet access. There was confusion as to why a fairly new server crashed in the middle of the day for no reason...

My sysadmin at the University made himself famous the day I went to his office to request an owner change. I don't remember why, but some files in my /home/users/mylogin directory appeared to be owned by root, so I went to his desk to ask him to run a "chown" on those files.

Oh, wait, I forgot to mention, those files were "hidden files" (their filenames start with .). Oh, and one of those files is a folder... So, the sysadmin, after a few moments of thinking, said to me: "Ok, this could be done very fast with just one command", and thinking "look at me, I'm such a clever kind of sysadmin" he quickly typed

chown -R mylogin .*

My eyes caught fire as soon as I saw the "clever command line" this guy had just typed, and even more so as I watched the sysadmin's face turn red, then fade to greenish blue, and finally white, as he started to think: "Hey, why has the whole filesystem started to scroll up the screen? Hey, is that the /etc folder I just saw? Did I make some kind of mistake in the command, or is it just my imagination? What could happen if I hit Ctrl-C now? HOLY F*** I'VE JUST MADE THE WHOLE SERVER OWNED BY THIS USER!!!!!!"

Ok, I'm guilty: I was looking over the sysadmin's shoulder while he typed that command and instantly realized what would happen, but I kept quiet just to watch the poor guy's reaction as he realized what a mess he had just made.
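
For anyone wondering why that pattern is so dangerous: the glob .* also matches the entries "." and "..", so a recursive command climbs into the parent directory and keeps going. A rough model of the matching in Python, using fnmatch as a stand-in for the shell's expansion (an assumption; some newer shells can be configured to skip the dot entries):

```python
import fnmatch

# What a directory listing with -a would show in a typical home directory.
entries = [".", "..", ".bashrc", ".ssh", "notes.txt"]

# Everything the pattern .* matches, including the two dot entries.
matched = [name for name in entries if fnmatch.fnmatchcase(name, ".*")]

# What the sysadmin actually wanted: hidden files, minus "." and "..".
hidden_only = [name for name in matched if name not in (".", "..")]
```

On the command line, a pattern such as .[!.]* is the usual way to get hidden entries without "..", though it misses names that begin with two dots.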

The Sybase Powerbuilder IDE doesn't have background compilation or anything like that, but it uses CTRL-L to compile the current object. It took me a long time after I switched to Visual Studio to stop periodically using CTRL-L (which deletes the current line).

Actually, I did something really stupid once: I was logged in on a live UNIX-based system as root and executed rm * -f (or whatever the syntax is) forgetting that I had just changed to / moments earlier! Luckily the system had been backed up.

Some years ago, a partner and I set out to write a program to do high speed formatting of floppy disks. After spending several days writing this on our PC Clone, we had it working perfectly. Spent a couple more days reviewing the code to make sure that it was perfect. Ok, time to beta test. Gave the program to a customer that had an IBM XT. The customer ran the program and it seemed to work just fine. Until he closed the program--got the famous "Abort, Retry, Ignore" message. It seems that we had failed to save some values before calling the BIOS to format a track on the disk. The net result was that on XTs, when we wrote the new directory structure, instead of going to the floppy, it went to the HD. Of course, a blank root directory was just as good as formatting the HD.

I wound up spending the entire night rebuilding his computer.

Lesson learned: Always assume that anything that isn't documented (like registers being saved) will function in the manner least likely to allow you to sleep that night.

Years and years ago, I had written a database migration tool that was going to be used once and then get discarded. The tool was tested on the staging server and proved to be doing its job. It was ready to be used in production.

I sent the binary to the server admins and then realized that I had accidentally wiped the source code, without committing the latest version to source control. I was just left with the binary.

Then, I was told that I needed to change a hard-coded numeric parameter that the tool was going to use (it could have been a port number, or some threshold value -- I don't remember.) I fired up the hex editor, guessed which occurrence would be that parameter and changed it and sent the updated binary to the server guys. The tool did its job and nobody learned about the source code screw up.

This one belongs to my boss, but I don't think he's on here and it's a good story so I'll share it. He used to have a habit of putting nasty words in as debug output. He thought he was pretty good about cleaning them up until one day one of our clients sent an email with a screen shot of a pop-up window saying, "F*(*& You!" after one of the values they entered didn't validate. Needless to say, he quit that habit very quickly.

I was working on an online movie ticket sales system. We had kiosks with credit card readers in the pilot movie theater. To unlock the kiosk to perform administrative operations, one would have to swipe a special card registered in the system.

I had registered an old debit card that I wasn't using anymore and left it with the people at the ticket booth so that anyone from the development team could come into the theater to unlock the kiosk during emergencies. I advised the booth denizens to stash the card somewhere easy to locate.

One night, in another emergency where the kiosk bugged out and hijacked a majority of the theater's seats, I rushed to the theater and retrieved the card from the ticket booth. Swipe, swipe, swipe, rub, rub, swipe, swipe... Nothing. The card didn't register. It was dead. I had to call a teammate to register my credit card in the system and ended up unlocking the kiosk after a lot of delay.

The location that they had chosen for the unlock card was on top of a CRT monitor.

Many years ago I was swapping my 286 with my aunt's 386. Armed with a copy of pkzip I backed up everything I needed to keep onto a large stack of floppies and we switched machines.

After copying the contents of all the floppies onto the new machine it was time to unzip them. Where did I put pkunzip again? Oh, there it is, pkunzip.zip. CRAP!

With the old 286 reformatted and not being able to find a friend that had pkunzip.exe on a floppy I was SOL. Being only 12 years old at the time and well before Al Gore invented the internet in our home, I had no means to replace it.

Not me, thank (Deity). I used to work for a television company that covers a world-wide motor racing season (yeah, that one). The entire TV complex was powered by 4 generators.

One day someone walked past a generator and accidentally bumped the emergency stop button. I was in the media centre and saw all the screens go blank. I looked outside at the generators, all with their exhaust raincaps resting in the down position.

The ensuing debate was whether or not to disable the emergency stop button stopping all the generators or just the local one. Cover plate anyone?

I once had to write a piece of software that gathered data from a local Access DB and loaded it up to a central point once a week.

It had to run at several unattended locations where there was no technical support. Accordingly, this was written as a very resilient application (full Windows NT service, implemented according to Microsoft's best practice, three network connections to three central sites defined etc. etc. etc.).

The program would try all three network connections, wait an hour and try again for two days before it would finally give up. The whole thing worked really well in test and it was actually very hard to get it to fail -- so I left the error message "Bugger Me I just give up" in place thinking no one would ever see this.

Everything worked just fine (I got over five years' uptime on one of the Windows NT processes) except for one site. The network administrator for this region decided the naming standards were not good enough and implemented an alternative scheme; then, after head office found out, was forced to revert to the original naming scheme. This meant that for a period of several months the network was reconfigured nearly every week, causing half the support team to receive SMSs, emails and various alerts with the "B***r me ..." message in the text every Sunday morning.

To make matters worse, these were not native English speakers, and this is not the sort of thing that gets covered in a respectable English language course. I was asked "What does 'B****r' mean?" by earnest coworkers hoping to improve their English on many occasions, once in the middle of a management presentation with about 50 people attending.

The time I was demoing some software and did a software-initiated wipe of a production CD/R jukebox used for providing online access to archived scanned deal documents for an Investment Bank where I was working.

Everything was in backup, but I had to give up a weekend to build, initialise and re-populate the thing again:

80 CDs loaded via a cartridge/case & post-slot servo-mechanism (so no hopper to feed them), manually using the machine's console to identify/fill a slot with each disk, index writing (again manually from the console), then software initialisation and data-write. I was sick of the bank's server rooms by the time I'd finished :-(

Late one evening, I logged into a Linux box and noticed "You have new mail." Checked it, and the account had 60,000+ messages, all STDERR from a cron job. Well I thought it would be funny to forward all of those messages to the inbox of the guy who wrote the cron job and was supposed to be monitoring it.

So a little procmail recipe later and I decided to go ahead and call it a day and go home.

When I came to work late the next morning, the mail admin guys were running around with their hair on fire.

What I failed to consider is the company used Lotus Notes for email. And Lotus doesn't like a flood of email. My little stunt brought down all of the Lotus servers, which, besides email, were trying to replicate data for other important systems, some in Korea, some in Germany.

The system would crash every few thousand emails. The admins would clear the box, reset the server, and then three-thousand-plus messages later it would crash again. And the Linux box doing the mailing was on the same LAN as the server, and the admins couldn't figure out where the mail was coming from, etc., etc.

Lessons Learned:

Lotus sucks more than I already thought.

Don't send messages en masse.

Don't try to do something funny when you are getting ready to go home.

Picture a networked machine, with a cron job that runs every 5 minutes, that gets unplugged for a day (because a visitor needs to plug in his laptop, and there are no spare cables/switch ports nearby). Each cron job blocks on an NFS-mounted filesystem. Next day the machine gets plugged back in and every cron job decides to send an email. That was when I first saw a load average above two thousand.
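
One common way to stop runs from piling up like that is a non-blocking lock taken at the start of the job, so a new run exits at once if the previous one is still stuck. A minimal Unix-only sketch in Python; the lock path is hypothetical and should live on a local disk, not on the NFS mount that is causing the hang.

```python
import fcntl
import os
import tempfile


def try_acquire(lock_path):
    """Return an open descriptor holding an exclusive lock, or None if busy."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB makes the call fail immediately instead of queueing up.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


# Demo: the second "run" finds the lock taken and would exit at once.
lock_path = os.path.join(tempfile.mkdtemp(), "job.lock")
first = try_acquire(lock_path)
second = try_acquire(lock_path)
```

The lock is released when the descriptor is closed or the process exits, so a crashed run does not leave a stale lock behind.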

Some of my earliest functional programs, written in FutureBASIC on my old Performa, were meant to annoy the user. One was pretty simple, and ran in the background detecting any disk insert events, promptly spitting the floppy right back out whenever it got a hit. The one that's most relevant to this thread, however, was a simple, one-line program:

SHUTDOWN

Yes, that's a valid FBASIC command. Now, plant that in the Startup Items folder. Restart the computer, wait two minutes for all of your extensions to load and the system to boot up, then your desktop appears and...BOOM! Computer goes bye-bye.

Suffice to say I quickly learned that one major weakness to the program was that it (thankfully) didn't override the ability to disable startup items.

This happened when I was giving a live demo of an application to business users. Because test data and a client environment were unavailable on the demo machine, I was connected to the application's hosting server over a remote desktop connection. The server was located at a remote site far away, which I only found out later :(...

To demonstrate the application's fail-safe mechanism for when the link between servers goes down, there was a use case where I had to disable the network and then show that the application didn't crash... It was only when I pressed "Disable" that I realized I was connected to a remote server, and that disabling the network on a remote server would not just disconnect everyone including me, but no one would be able to reconnect either...

Later on I got to know that it was located at a remote site, when someone had to go there and enable the network. The worst part is that it was running on a virtual machine, so it took a while to identify which one it was...

Lesson learnt: Investigate the physical location and availability of the machines, and always prepare a demo environment on a physically accessible machine.

Some years back (last century!) I was working for a railway company that was busy putting together a bid for a multi-million pound contract. All the bid data was on one PC. Although I was a developer, the boss asked me to take a backup of the data.

This bid was worth a lot of money, had taken the bid team months to put together, and was considered very important.

I rang the bid team, asked them to log out of the system on this single workstation, mapped a drive to the workstation in one explorer window, mapped a drive to the backup location in another explorer window and started the copy.

After about 60s, the copy failed with a "file in use" error - they hadn't logged out.

I phoned them up and asked them, somewhat testily, to log out; they said sorry, and said they were logging out now.

The partial backup already taken was incomplete, so I highlighted the files and hit delete.
And as I watched the files disappear I realised, in horror, that I was deleting the wrong set of files - I was trashing the bid team's workstation.

I froze, while an evolved "fight or flight" response pumped my body full of adrenalin.
The files were disappearing before my eyes. This was a remote network share, so no recycle bin.

And then, as I reached forward to click cancel, and see what utter devastation I'd caused, the delete juddered to a halt with a "file in use" error. The bid team still hadn't logged out.

And through the fog of panic it started occurring to me that the files I had just deleted, those 100s of files, were exactly the same set of files that I had a backup of, from my first attempt.
So silently, without letting on to anyone (apart from the visible signs of sweat and panic), I copied the previously backed up files to the bid workstation, and then backed the whole lot up.

No-one ever complained that there were any problems. I got away with it.

The moral of the story:

1) Always plan for your own stupidity - in this case, don't create a read/write share if you only want to read.

2) PAY ATTENTION TO WHAT YOU ARE DOING!!!

Even typing out this story has raised my heart rate in memory of that day!

It happened a few years back when I joined my first company. I had never used version control before. Anyway, I was working on a feature and spent 10 days developing it. I created a separate branch to check in my code during the development. I didn't check in for the last 5-6 days.

When the development was completed I was supposed to merge my changes with the Trunk. I thought it was going to be easy and I would simply SWITCH the current working branch to the Trunk and then check in the code.

You must have guessed by now what happened :)

Lessons learnt :

check in the code daily ( if required multiple times a day). That's what a version control is for.

Learn what a command does before using it.

Fortunately as I was new to svn, I was taking local backup of the code and only lost 1-2 days of work.

It didn't happen to me but to a friend and co-worker. He was taking some screenshots from a web application we were developing and he took a break. He had to send the screenshots to our client (a government organization) for approval. He was talking and joking with a couple of fellow workers and then they googled for "vagina en lata" (NSFW - search for it yourself). Of course he forgot to clear the top right search box in Firefox before taking a few more screenshots and sending them to the client.

He was in a panic, but as the days went on, surprisingly, no one appeared to have noticed.

I remember working on an election project for the Mauritius government a couple of years ago. There were a number of constituencies where the election was going on, and it was all online and in real time that day. Our software was almost complete, with a few fixes remaining for one particular constituency where things were a little different from the others.

I was just a junior programmer and was told that the result for the "Rodrigues" constituency (the particular one) would come last (for some reason I was not aware of), so I could continue fixing the software. BUT my luck ran out, and the first constituency where the result was declared was "Rodrigues".
Phew... I had to quickly create a patch application in 10 mins (for which the Mauritius people waited)... Sorry guys !!!

I wanted to make the Ogres properly set the angle on their grenades so they would land at the target. (In the original code, Ogres always fired at 14.3333 degrees - even if you were below them).

Was sitting with a friend at my PC for a few hours as we worked out how to do it, sorted out the cos/sin lookup tables, etc, etc.

Made the last change, ran it and it worked!

Quit Quake and we're sitting there, big grins all round. He's looking at me, and can see the screen and I'm looking at him, away from the screen. With my fingers on the keyboard, I renamed the file (from the original temporary name).

Only, I actually typed "delete ogre_new.c"

About a second or so later, my fingers informed my brain what they'd actually typed.

He kept looking at me and I kept looking at him and he said, "I don't think you wanted to do that."

I was writing some automated steps for our build server, we'd been having trouble with stray files staying checked out on the machine after previous failed runs so I added a simple "revert all checked out files" step at the beginning of the build process.

At the point where I saw "Failed to remove /proc/something..." I noticed the execution of my script was taking an unusual amount of time: half a second later, I understood what was going on and manically pressed Ctrl-C until it died.

The bad news? This was shortly before going into production.
The good news? The web applications and databases we needed were all under /var and since rm -rf worked alphabetically, it didn't get there yet. There followed a long night of reinstalling the server, but it was only a single night. ;)

Takeaway?
1.) Linux will not stop you from shooting yourself in the foot
2.) always check your variables before a volatile operation is executed :)

It took place quite some time ago in a non-trivial shell script: I might have missed a detail or two...but `rm -rf $nosuchvariable` does not return the message you see on my machine (ubuntu 10.4)...I guess bash (or a different shell?) behaviour may be configured differently.
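
A sketch of the "check your variables" takeaway, in Python. It assumes the build step derived its target directory from a variable that turned out to be empty; the original command isn't shown, so the variable name BUILD_WORKSPACE and the depth rule are purely illustrative, and the path check assumes a POSIX-style filesystem.

```python
import tempfile
from pathlib import Path


def workspace_from_env(env, name="BUILD_WORKSPACE"):
    """Return the directory named by env[name], refusing empty or root-like values."""
    raw = env.get(name, "")
    if not raw.strip():
        raise ValueError(f"{name} is unset or empty; refusing to continue")
    path = Path(raw).resolve()
    # Fewer than three parts means "/" itself or a top-level directory like "/var".
    if len(path.parts) < 3:
        raise ValueError(f"{path} is too close to the filesystem root")
    return path


def is_rejected(env):
    try:
        workspace_from_env(env)
    except ValueError:
        return True
    return False


# Unset, empty and "/" are all refused before anything destructive can run.
results = [
    is_rejected({}),
    is_rejected({"BUILD_WORKSPACE": ""}),
    is_rejected({"BUILD_WORKSPACE": "/"}),
]

scratch = tempfile.mkdtemp()
accepted = workspace_from_env({"BUILD_WORKSPACE": scratch})
```

In a shell script the rough equivalent is to fail on unset variables (set -u) or to expand with ${VAR:?} so that an empty value aborts the command.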

The company makes a system for monitoring patients in residential care facilities. Patients can push a button to call a nurse for assistance. Any number of automated devices can also send in a call. Calls are displayed on central or area monitors, and the nurse responsible for that area gets a message via a paging system. If there is no response to a call within a set period of time (“call canceled”), the page is repeated, and is escalated to a supervisor, who is also paged. If still no response after a further time period, a higher level manager is paged as well as the first and second level responders. All pages are repeated until there is finally a response. This is a proven system which guarantees that no one will be left without assistance for too long, and has worked well in practice for years.

We enhanced the system to not just send pages, but to be able to use an email address as well, and I tested this sending to my own email address at our company. Worked fine.

Then, to stress test another part of the system, I ran a script which simulated sending calls repeatedly from thousands of patients without also automatically canceling the original calls, so that they continued to escalate over the weekend...

...Without thinking to disable the 'send email' function first.

Over the weekend I sent myself an email from home to work as a reminder of something, and got a “mailbox is full” response, which puzzled me, since I get so little email at work.

When I came in Monday morning no one else's email was working either. Of course our ISP had shut us down as a spammer part way through the weekend when the geometrically increasing number of calls hit some magic number.

->Complete a test and start from a clearly defined state before testing something else....
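
One way a stress test like that could have been defused is an outbound mail wrapper with an off switch and a hard cap. The sketch below is Python and entirely hypothetical (the class, its parameters and the address are invented for illustration; the real paging system is not shown in the story).

```python
class CappedMailer:
    """Wraps a send function with an on/off switch and a hard message cap."""

    def __init__(self, send, max_messages, enabled=True):
        self._send = send
        self.max_messages = max_messages
        self.enabled = enabled
        self.sent = 0
        self.suppressed = 0

    def notify(self, recipient, body):
        # Count, but do not deliver, anything beyond the cap or while disabled.
        if not self.enabled or self.sent >= self.max_messages:
            self.suppressed += 1
            return False
        self._send(recipient, body)
        self.sent += 1
        return True


# Demo: a simulated escalation storm of 1000 notifications with a cap of 50.
outbox = []
mailer = CappedMailer(lambda to, body: outbox.append((to, body)), max_messages=50)
for call_id in range(1000):
    mailer.notify("me@example.com", f"escalation for call {call_id}")
```

For a stress run, constructing the mailer with enabled=False keeps the escalation logic exercised while nothing leaves the building.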