Note that both versions of the code fragment have the same race condition:
If the file desktop.ini initially exists but gets deleted
before you get around to creating a new Stream­Reader,
you will get a File­Not­Found­Exception.

I bet these people hate going to the library to get a book by title,
because it takes so darn long:
They go up to the librarian and say,
"Please give me all the books you have,"
and then they fill up their cart with thousands of books,
then sit in the corner saying,
"Nope, the title of this book is wrong.
Nope, not that one either.
Still the wrong title.
How about this book?
Nope, not this one either.
Man, this is taking forever..."

I've seen the first 'for' loop in the last few months, except with maybe three if(iterator == constant)'s in the loop, with longer but mostly repeated code in each 'if' body… not sure how the person got into writing that, though there were also variables declared that were never used, so probably poor comprehension combined with cut-and-paste from some other source.

The best part about the library analogy is that once they find the right book they don't break out! No no no – they still keep on looping through and checking titles until the last book in the library.

Granted, this is a terrible coding pattern, but who would write such code in the first place? Has such code ever been seen "in the wild"? I know I haven't…

[I remember one customer who used this anti-pattern heavily. You also see this anti-pattern used in real life: "What flavors do you have?" and then after the list of flavors is recited, "I was hoping you had raspberry." -Raymond]

Thanks for the library analogy, it was amusing as well as accurate. I take the principle of avoiding code like this even further. I avoid any code that does anything I cannot justify as necessary. Like performing the same calculation twice, no matter how apparently simple. Every time it is executed it wastes time. If you get a different answer the second time, it's a hard to find bug. If everyone was more careful about the code they write, there would be a lot fewer applications that either cripple the performance of a computer more than 3 years old or exhibit hard to find bugs.

[I remember one customer who used this anti-pattern heavily. You also see this anti-pattern used in real life: "What flavors do you have?" and then after the list of flavors is recited, "I was hoping you had raspberry." -Raymond]

Yes, but at least in that case its just checking to see if they had something better. Like if you ask for chocolate, you miss the fudge brownie. But if you just ask for the fudge brownie you miss the double fudge brownie. Or perhaps a good party cake ice cream is best, just because they have it.

Your example of Hashtables is a nice one: depending on the implementation, you might get NULL from the "return hashtable["target"];" statement, or an exception. If you positively don't want to cause an exception you might first try and check if the element you're looking for is indeed there. Of course you can always put "try..catch" around it but then there's not much of a difference in LOCs between

In other words, don't use a linear search if a constant-time operation is available. But "for-if antipattern" is couched in terms that people who've never heard of "linear search" would understand.

I usually present this as "hash tables are your friends", because the most common instance I see is someone building simple arrays or lists and then using linear searches to find items — not uncommonly there are multiple levels, leading to loops in loops — instead of using hash tables, thus turning an operation that should take a negligible amount of time into a quadratic beast of burden. The pernicious thing is that the faster machines become, the less noticeable their initial poor choice of data structures is, until the size of the collection reaches a crucial tipping point that causes cache/page misses and performance suddenly takes a dramatic nosedive (instead of gradually worsening).

I haven't yet seen someone not understanding how hash tables work and using Raymond's second example, but this month I've had no less than three occasions of hash tables not being used when they should have been, so 'tis the season. And one of these was (of course) in code I myself wrote a few months ago, because "it's a hundred elements at most, no need to get fancy". That *was* true when I wrote the code, but the crucial follow-up question "and how is this number expected to change in the future" was not asked. There are several morals in this that I won't bother to spell out…

@Code Priorities: Aside from defence-in-depth, something is either secure or it isn't. As such security isn't on the list of things to optimise for, but something you have to keep in mind when deciding what the software must do and how it does it.

It isn't one of those things you start to look at when you have a working system, but rather one of those things you start thinking about before your hands even touch the keyboard, along with required functionality and such.

Trying to tack security on later is a long and painful process, which is a shame since coding securely isn't any worse than not when you do it from the start.

Have you asked any of these programmers why they used those anti-patterns? I have a feeling the reasons they give might actually be legitimate and something you'd never guess in a million years if they never told you.

I've written code that's sort of like that in the past, but there were legitimate reasons for doing so.

Consider that first for loop. Maybe it's possible that the code has gone through many revisions, and that for loop used to do other things besides "doing something" when i equaled 42. At a later point in time, maybe that programmer was asked to remove some of that code. Any time you make changes to code, you want to minimize how much code you change in order to minimize the probability of introducing bugs. Maybe that's what that programmer was thinking.

It's also possible that said programmer was under pressure to make many code changes in a short period of time, so he was hurrying to get the code changes done as fast as possible while being careful not to introduce bugs. Maybe he didn't feel "safe" refactoring that changed for loop into just "do_something(42)" and he didn't have the time to think about it.

Another possibility is that maybe the programmer originally wanted to do other things in that for loop, but decided later not to. When he made that later decision, he thought "Well maybe I might change my mind again, so I'll leave the code in just in case and comment it out. That way, if I do change my mind later, I can just uncomment all that code instead of having to retype it all." Then six months later, a different programmer decides to remove all commented out code from the programmer so that the source code looks cleaner. This second programmer is scared of making code changes (because he doesn't want to introduce bugs by mistake).

Things like this just happen for weird reasons like that.

[I had the opportunity to ask the programmer who wrote the Directory.GetFiles loop why they didn't do it the more direct way. "Oh, that's a neat trick," they said. (The "if (i == 42)" example did not come from a real program. It was an extreme example.) -Raymond]

I've *written* something even more stupid, but related. I started with some code that walked some variable length data structures. One of the thing it did was count the total length. Another thing it did was do something with each and every item.

As the program developed, the "do something" bit became less complex and after a lot (but not quite enough) refactoring, I was left with the moral equivalent of:

I know this is technically risking a nitpicker's corner entry, but doesn't the File.Exists() code introduce a problem with 8.3 filename collisions? I think the Directory.GetFiles() version is immune to that problem, inefficient as it may be.

Regarding the question of using a dictionary to look up a value. That's all well and good if you already have a dictionary, but if all you have available is an array, then iterating through the array to find the one element you need to do something to will be quicker than building a dictionary in order to do one single lookup on it. (Of course, if you will later need to do other lookups and can cache the dictionary, then building a dictionary becomes more advantageous).

"Any time you make changes to code, you want to minimize how much code you change in order to minimize the probability of introducing bugs. Maybe that's what that programmer was thinking."

This is a general principal that leads to a concept I've seen referred to as "bit rot". Horrendous code that is a result of myriads of minor changes (small fixes and enhancements). When code goes into maintenance mode this approach is often taken, but it is a blinkered approach. I have seen bug after bug appear in areas of code, where the root cause of the problem was a bad design decision. A refactor then becomes the "correct" bug fix. Too often I see developers trying to solve for the local minima, while failing to see the bigger picture.

Thanks for the comments @Neil and @Danny, I guess I need to figure out the cost of throwing the exception (the only thing I really remember is the old "Exceptions are expensive" trope) vs File.Exists then return new StreamReader.

(I guess I'm defensive about it because I do if(File.Exists(whatever)){return new StreamReader(whatever);} as a matter of course, all the time, and while I'm used to not knowing what I am talking about half the time, I thought I was doing the right thing.)

No body is forcing you to care about this particular race condition. Your manager most definitely will not sleep more comfortably if you handle it somehow, but the next maintenance programmer will have one less thing to worry about if you do it right.

As for your next point, in managed world there is no way to avoid the race condition without watching for FileNotFoundException. So you may as well open the file directly rather than checking for it first.

In native domain, CreateFile would behave similarly with an error code INVALID_HANDLE_VALUE for any potential error and then you have to call GetLastError for details. There is a LockFile/Ex function but that: a) locks byte ranges, b) and requires a file handle in the first place to begin with which brings us back to CreateFile and cousins.

In addition to the race, the BCL's File.Exists() method is of limited value as it can only confirm that a file exists at a location, not that it does not exist. (It's the same with Directory.Exists().)

This is because File.Exists() does not discriminate between a file not existing and an underlying error condition that prevents the existence of a file being determined. It returns false in both cases.

What has been implemented is not Exists() semantics, it is IsAccessible() semantics, and only in the instant that you call it. You cannot assume that it is still accessible after due to the race.

So… just try and open the file and handle the FileNotFoundException exception.

"Regarding the question of using a dictionary to look up a value. That's all well and good if you already have a dictionary, but if all you have available is an array, then iterating through the array to find the one element you need to do something to will be quicker than building a dictionary in order to do one single lookup on it."

That goes without saying, so I didn't say it. Don't make me bring back the nitpicker's corner.

Well, I can think of an Ada example where the for-case pattern makes sense. Since a case must include all values of a given type, doing a for over the full range of a type and then a case inside it would make sure to handle all values in that range.

"You also see this anti-pattern used in real life: 'What flavors do you have?' and then after the list of flavors is recited, 'I was hoping you had raspberry.'"

For-if might be a reasonable method in this case. The customer may believe that the odds of the provider having raspberry are low, and must therefore choose an alternative. If the provider's options are unknown and the odds are high that the entire list will need to be recited, it may not optimize the scenario to ask if raspberry is available before asking for the list. The customer may also be in a situation where raspberry sounds good now, but might hear something that sounds better (I can't think of a situation where this would happen in computing, but it works in this analogy).

You could also imagine scenarios where the customer had multiple preferred options; something like "I was hoping for raspberry, blueberry, or strawberry." Depending on what options are available to make that determination, it may be more efficient to just ask for the list and see if one of the preferred options is there. Doing so may even free the provider to service other requests while the customer is making a decision. This is fairly close to what happens when a server at a restaurant hands you a menu.

The questions I would try to answer before deciding on the best approach are:

How long are the customer and provider's lists?

How often do they change?

How often will each list need to be queried/compared?

What is the cost of communication between the customer and provider?

What methods does the provider support to answer the customer's questions?

I am not so sure that i can write this off as an absolute anti pattern. of course, if you are not breaking from the loop, it is a problem. But consider the coding / real life scenarios:

1. What if you are given a list and are asked to operate on a specific element. I cannot assume that the list is always hashed. Depending on the size of the list and / or frequency of execution of my logic, i may simply decide to iterate till i find the right item. I would not blindly assume that i need to hash the list every time because, hashing comes at the expense of additional memory and one time processing.

2. Going to your flavor logic, lets assume that all flavors are on the display and there is no server [this is a true representation of the concept]. Are you not going to go through the list till you see the flavour you want? NOW, lets assume the person being present at the counter. I do not have to spend time looking at an extensive menu display. i can simply ask the server. HERE, the server IS my hash in a way.

So, in sort, this is a bad practice but only if

1. if you own the list in your code

2. frequently need to operate on a specific element

3. Do not have member contract considerations to implement hashing

[Obviously not every enumeration falls into for-if anti-pattern. Only the ones where the container provides direct addressing. -Raymond]

Phaeron above is right. Calling GetFiles() avoids short-name collisions as it won't return any short names.

Too bad there's no option to the wildcard expansion to FindFirstFile to avoid them. However, the way GetFiles() works, is it returns the long name for any file matching by shortname to wildcard (I don't know if Find*File always returns the long name or if .NET

makes another call) so simply repeating the match in the code fixes the problem.

Surely this is another problem with the for-if anti-pattern, rather than something it does right? I mean, if the filename is "longfi~1.txt" instead of "desktop.ini", presumably the intended behaviour would be that the routine succeeds, which the for-if anti-pattern doesn't. (And again, in this case it would be even better to "just go ahead and do it, and catch the exception if it doesn't work".)

Having said that, it does prove that there's a difference in behaviour, and so it's possible in principle that the for-if anti-pattern would have better results in some situations.

[Trust me on this — the code that used the for-if anti-pattern was not doing so because it was worried about short names. (The actual example was not a file system query but a database query.) -Raymond]

[Trust me on this — the code that used the for-if anti-pattern was not doing so because it was worried about short names. (The actual example was not a file system query but a database query.) -Raymond]

Filed under: completely missing the point (not Raymond, but everybody else). This changes just about everything.