Some quick estimation shows that using slow mode enumeration would be 200 times slower than fast mode. Does it really matter whether the speed-up is 195.1231 times or even 103.4761 times? Even if the estimate were off by an order of magnitude, a 20-fold speed-up is still worth it.

Imagine if people had to carry out experiments for every possible optimization (both before and after) to prove that it was worthwhile.

"Could you please deliver these letters?"

"Sure, here let me grab them."

"Why are you doing that? Why don't you take one letter at a time?"

"Um, because it'll be faster to take all of them at once so I don't have to keep coming back here."

"Do you have any hard data to support that? I'm not going to believe you until you show me some hard data."

Sounds like you wouldn’t make it as a scientist. I find scientists are unwilling to believe anything unless it is backed up with hard data that has been published in a respected peer reviewed journal and subsequently reproduced.

"In theory grabbing all the letters at once is faster, but without actual measured results how do you know there’s any real-world benefit?"–

–a variation on the joke where a mathematician starts to present a proof of a new theorem, when a mathematician in the audience calls out, "I have a counterexample!" "That’s OK," says the first mathematician, "I have another proof."

Or, you say to a scientist (or engineer, apparently), "Your argument contains a fallacy; your conclusion doesn’t follow from your assumptions." "Well," he replies, "I’ll have to see some hard data to support your claim." (Or, alternatively, "I’ll have to see some real-world, actual measured results to support your claim.")

To the man with a hammer, everything looks like a nail, and similarly for the man with "hard data that has been published in a respected peer reviewed journal and subsequently reproduced" or "actual measured results."

Actually, he would make a great engineer. Engineering requires great skill at knowing what’s worth measuring and what’s not. That’s why an engineer doesn’t usually commission a finite-element analysis to verify that a two-story house will hold up. That’s why an engineer will not usually calculate torsion stresses on an ordinary beam, unless s/he has a reason to believe it might be an issue.

Remember the adage about an engineer being the guy who can do for $50 what any fool can do for $1000? Well, measuring costs money.

(I graduated as a Civil Engineer abroad and am not a licensed Civil Engineer anywhere in the US, so this is not advice: always consult your personal Civil Engineer.)

You guys are tense… this is a blog: someone talking about stuff he loves in order to interact with other people. Blogs are practically at the level of water-fountain conversations, if water fountains came with a blackboard to scribble some stuff on.

If this were a published book you paid for, no doubt there would be some extra time and verification on top of what already happens. From my personal experience, I figure Raymond probably spends more than an hour verifying a post (and just imagine how many people will put him back in his place if he gets the details wrong!). Deeper testing would probably take several days.

I’ve been maintaining a file browser that uses IShellFolder, which has gone through several iterations based on client complaints about its network speed, so I’m interested in the topic. We have had these problems.

IMHO the topic is about the general idea behind it, not about the specific speed-up figure quoted.

Also, there are a lot of times when making something 200 times faster in theory has absolutely no real-world benefit: using a quicksort instead of a bubble sort on a 10-item list, giving your CPU a 200-entry issue queue, things like that.
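For what it’s worth, here’s a tiny sketch of that point (the list contents are made up purely for illustration): on ten items the asymptotically better algorithm buys you essentially nothing.

```cpp
#include <algorithm>
#include <cstdio>
#include <utility>
#include <vector>

// O(n^2) bubble sort: hopeless on large inputs, perfectly fine on 10 items.
static void bubble_sort(std::vector<int>& v)
{
    for (std::size_t i = 0; i + 1 < v.size(); ++i)
        for (std::size_t j = 0; j + 1 < v.size() - i; ++j)
            if (v[j] > v[j + 1])
                std::swap(v[j], v[j + 1]);
}

int main()
{
    std::vector<int> a = {9, 3, 7, 1, 8, 2, 6, 0, 5, 4};
    std::vector<int> b = a;

    bubble_sort(a);                 // the "slow" algorithm
    std::sort(b.begin(), b.end());  // the "fast" algorithm

    // Both finish in a tiny fraction of a millisecond on a 10-item list;
    // the asymptotic difference is irrelevant at this size.
    std::printf("same result: %s\n", a == b ? "yes" : "no");
}
```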

I agree. That’s why agreements and disagreements get into the comments, and that’s why conversations fragment along several tangents.

Also, consider what would happen if a water-fountain conversation had one speaker and multiple listeners. Of course some kinds of announcements are designed to operate that way, but they’re still a lot less popular than typical water-fountain conversations.

Oops sorry, this tangent went on longer than expected. I could prove this assertion but don’t have an envelope at the moment.

And you’re a civil engineer? Remind me not to drive across any bridge you design.

Then I suggest you don’t drive across ANY bridge, because that’s how they are designed. A bunch of "people who know about bridges" sit for a while doing math, until they are convinced that the bridge will hold.

Of course, the "hard-data approach" may seem better, but it’s just too expensive… before I can say how much weight my bridge is going to hold, I have to spend millions of dollars to build one and start putting weight on it until it breaks. Then I’ll have "hard data" on what the bridge can hold. Except that maybe I was lucky, so I really need to build 100 bridges and see if they all hold the same weight. Only then can I build the bridge I actually wanted to build.

The problem is that in general by enhancing performance we degrade some other aspect of the software, typically maintainability.

Degrading maintainability will have a longer-term impact on the speed and performance of the application, so it’s important to analyse the improvement in short-term performance before making this sacrifice.

That’s the reason behind the oft-heard, "premature optimisation is the root of all evil".

> …finite-element analysis to verify that a two-story house will hold up. That’s why an engineer
> will not usually calculate torsion stresses on an ordinary beam, unless s/he has a reason to
> believe it might be an issue.

And you’re a civil engineer? Remind me not to drive across any bridge you design.

People making decisions because of back-of-the-envelope calculations and not actually testing things give us things like the ceiling of the Big Dig collapsing.

The whole issue, if you go back to the original blog entry, was that the "200 times speedup" given was calculated from a really contrived scenario. Some of us were like "nice theory, but we’d like to see some actual numbers before believing that’s the real-world result".

From experience, I’ve seen many times that people claim you’ll get big improvements by doing something a different way, only for it to turn out to be the same, or even *worse*.

Estimating, back-of-the-enveloping, and brainstorming are great, and good for a discussion. They are never a substitute for actual implementation and testing. And some of us prefer our software to actually run faster in real life, not just on the back of some envelope.

> The whole issue, if you go back to the original blog entry, was that the "200 times speedup" given was
> calculated from a really contrived scenario. Some of us were like "nice theory, but we’d like to see
> some actual numbers before believing that’s the real-world result".

Why would you need that? It’s trivial to see. If you have to do 201 requests the "slow" way and only 1 the "fast" way, then it’s *by definition* 200x faster. Each individual request takes approximately the same amount of time (that’s what’s great about networks – a 1KB request takes about the same time as a 100KB request).
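To put rough numbers on it (these are illustrative assumptions, not measurements from the original post), say each round trip costs about 50 ms:

```cpp
#include <cstdio>

int main()
{
    const double round_trip_ms = 50.0;  // assumed network latency per request (illustrative)
    const int items = 201;              // hypothetical folder size

    double slow_ms = items * round_trip_ms;  // one request per item ("slow mode")
    double fast_ms = 1 * round_trip_ms;      // one bulk request ("fast mode")

    std::printf("slow: %.0f ms, fast: %.0f ms, ratio: %.0fx\n",
                slow_ms, fast_ms, slow_ms / fast_ms);
}
```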

"While I’ve not seen the code here, it would appear to me that this optimisation is both obviously faster and (presumably) more readable/maintainable.

How is GetAllFiles() harder than While(FilesExist);GetNextFile();loop"

While it might not be harder directly, things like caching, swapping, and a slow read medium can make getting a listing of more files than is actually necessary detrimental, both in performance and in memory management. So, when one is searching for a specific file in a directory on a remote server, it’s not necessarily clear that the slow method is actually slower. And it can actually be harder, if one repeatedly allocates and deallocates a large memory chunk (which might actually be proper design) for a whole file listing, when a few reads and a simple loop would have done.
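To illustrate just the shape of that trade-off, here is a rough sketch. It uses plain FindFirstFile/FindNextFile to stay self-contained (not the shell interfaces from the original post) and says nothing about what actually goes over the wire: stop at the first match versus buffer the whole listing first.

```cpp
#include <windows.h>
#include <string>
#include <vector>

// "A few reads and a simple loop": walk the directory and stop at the first match.
bool FindOneFile(const std::wstring& dir, const std::wstring& target)
{
    WIN32_FIND_DATAW fd;
    HANDLE h = FindFirstFileW((dir + L"\\*").c_str(), &fd);
    if (h == INVALID_HANDLE_VALUE) return false;
    bool found = false;
    do {
        if (target == fd.cFileName) { found = true; break; }  // early exit
    } while (FindNextFileW(h, &fd));
    FindClose(h);
    return found;
}

// "Whole file listing": buffer every name first, then search the buffer.
// Costs memory proportional to the folder, even if the target is the first entry.
bool FindOneFileViaFullListing(const std::wstring& dir, const std::wstring& target)
{
    std::vector<std::wstring> names;
    WIN32_FIND_DATAW fd;
    HANDLE h = FindFirstFileW((dir + L"\\*").c_str(), &fd);
    if (h == INVALID_HANDLE_VALUE) return false;
    do {
        names.push_back(fd.cFileName);
    } while (FindNextFileW(h, &fd));
    FindClose(h);

    for (const auto& name : names)
        if (name == target) return true;
    return false;
}
```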

The general thrust of the anti-envelope comments reminds me of a Feynman observation about results presented at a gravity conference – they were either obviously wrong or convoluted demonstrations of obvious truths.

> While it might not be harder directly, things like caching, swapping, and a slow
> read medium can make getting a listing of more files than is actually necessary
> detrimental, both in performance and in memory management.

So what? We’re not trying to speed up getting only partial directory listings here. We’re speeding up the SPECIFIC case of Explorer listing a folder’s contents. The ENTIRE folder’s contents. If you only want one file, the "slow" method is still there.

I think Jade Philosopher is right. Either that, or people only read half-way before commenting.

Dean: alternatively, Explorer could optimise by working out that it will be displaying, say, nine items, and fetching only those nine. (Depending on sort order and what information is needed for the status bar, obviously.)

Just opening my i386 directory on a network server takes around 12 seconds to enumerate the 5,941 objects within – and it only has to display the first 12; the rest are out of sight! (Come to think of it, displaying the first 12 straight away and then fetching the rest asynchronously would be nice too…)

[So you’re saying that Explorer should not sort the results? How else can it tell which are the “first 12”? -Raymond]

Seems to me that some people don’t understand the difference between a reasonable estimate and a wild guess. And those people feel that they know what makes a good engineer. That’s rather ironic.

On the other hand, the concept of "falsifiability" in science has been grossly misinterpreted here as removing the burden of proof from the one making a claim. Falsifiability means that a hypothesis *can* be disproven, and a claim is unscientific if it is not falsifiable. It certainly does not mean that a claim is scientifically *true* unless *disproven*!

Software has strong roots in both the science and engineering fields. I think what we’re seeing in here is a bizarre argument between two groups who are firmly on their own side of the fence and perhaps don’t realize that the other side even exists.

I’m pretty sure that Raymond has a good grasp of both sides. The fact is, he’s neither proposing a scientific theory nor building a bridge. It’s not necessary to have reams of hard data as long as the logic is sound (which it is) and the estimate is reasonable (which it is).

“So you’re saying that Explorer should not sort the results? How else can it tell which are the ‘first 12’? -Raymond”

In the Explorer window I just opened, there are four sort order options: Name, Size, Type and Modified. Since none is ticked, no order has been specified; the first 12 returned by the server would fulfil this requirement as well as anything else. Getting any 12 straight away while the other few thousand get retrieved beats an animated flashlight and nothing else until it’s possible to apply an unspecified and unsolicited order, IMO! (If/when I actually ask for the results to be sorted in some order, that’s different – but I haven’t.)

In this case, the meaning of ‘first’ is as obvious as in ‘findfirstfile’: the first one returned, whatever the source order might happen to be. Apart from anything else, it’s the only meaning which works; to query it is like asking which timezone someone is asking about when they ask what time it is…

[I think if you look more closely, your “unsorted” view is still sorted. -Raymond]

Couldn’t you ask the server for all the names (but not the handles) in large chunks, sort the names, then go back and get the handles for the first 15 files synchronously, then display the first 15 files, and only *then* go back and get the rest of the handles asynchronously?

(Now this won’t work if the redirector’s implementation of NtQueryDirectoryFile doesn’t return all the possible attributes that the folder may be sorted by. However, FindFirstFile’s return structure has members for both size and the various dates, so it should be possible to get that data the first time through.)

And certainly the "fast mode" would still be faster than this. But at least this would alleviate some of the problems with slow mode, because the slowness would be happening in the background.
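Here’s a rough sketch of that staged approach. QueryNamesInChunks, QueryDetails, and ShowRow are hypothetical stand-ins, not actual shell or redirector APIs; the point is only the order in which the work happens.

```cpp
#include <algorithm>
#include <chrono>
#include <string>
#include <thread>
#include <vector>

struct FileDetails { std::wstring name; unsigned long long size; };

// Hypothetical stand-ins so the sketch is self-contained; real code would
// query the server and add rows to the list view.
std::vector<std::wstring> QueryNamesInChunks(const std::wstring&)
{
    return {L"readme.txt", L"setup.exe", L"data.bin"};
}
FileDetails QueryDetails(const std::wstring&, const std::wstring& name) { return {name, 0}; }
void ShowRow(const FileDetails&) { /* add a row to the view */ }

void PopulateView(const std::wstring& folder)
{
    // 1. Get all the names cheaply, in bulk (no per-file round trips).
    std::vector<std::wstring> names = QueryNamesInChunks(folder);

    // 2. Sort so we know which items come first in the view.
    std::sort(names.begin(), names.end());

    // 3. Fetch details for the first screenful synchronously and show them.
    const std::size_t visible = std::min<std::size_t>(15, names.size());
    for (std::size_t i = 0; i < visible; ++i)
        ShowRow(QueryDetails(folder, names[i]));

    // 4. Fetch the rest in the background; the user already has something to look at.
    //    (Real code would marshal results back to the UI thread and manage the
    //    thread's lifetime properly; detaching is just to keep the sketch short.)
    std::thread([folder, names, visible] {
        for (std::size_t i = visible; i < names.size(); ++i)
            ShowRow(QueryDetails(folder, names[i]));
    }).detach();
}

int main()
{
    PopulateView(L"\\\\server\\share");
    std::this_thread::sleep_for(std::chrono::milliseconds(100));  // let the toy background fetch finish
}
```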

Windows Explorer is showing a small ascending triangle next to the column heading for the file name. Most likely, if I want to, I can find a statement of how this is sorted, but is it alphabetical order? Sorted according to which language?

Sure, sorting isn’t a bad thing, and I wouldn’t mind having more options. I’d like to set things so that Windows Explorer will sort lists in the same order on local drives as it does in FTP views. Lower priority, I might want to set Outlook Express 6 to sort things in the same order as Outlook Express 6. But sometimes maybe it would be OK to say it’s listing things as it finds them while showing that the search is still under way, like a dog does.

(By the way, I didn’t really capture a screenshot of Internet Explorer downloading Vista RC2 at the 3% point; I only created a file with that name to check how Windows Explorer would sort it. At the 0%, -3%, and 221% points, I captured screenshots.)

Guys, if you don’t think fast mode is faster in enough cases to make it useful then you don’t have to use it.

Personally I think Raymond is spot-on here with his estimate, but that’ll only affect code that *I* write.

Now I need to get back and benchmark whether cout or printf is the faster way to display the help text for my app. Does anyone here have data on whether processor architecture / SMP affects the performance of stdout? :P

“[I think if you look more closely, your ‘unsorted’ view is still sorted. -Raymond]”

I know it *is* sorted – my point is that I didn’t *ask* for it to be sorted, the relevant menu indicates no sorting order, and sorting it is causing problems (a non-trivial delay opening the folder, to achieve a result I didn’t request in the first place). Explorer not indicating that it is sorted is, arguably, a bug; the fact that the implicit ‘no sorting’ option doesn’t actually deliver is just an irritating drawback.

FWIW, Unix’s (GNU’s?) ‘ls’ behaves as I’d want: for ‘small’ numbers of files they’re sorted by name; for anything higher it would take too long, so they’re unsorted. Explorer could go one better and display results as they come in, as it already does for search operations…

TCLIU: For real performance gains, just #define them both as noops :-)

[I can see the bug reports already. “Explorer shows files in scrambled order… sometimes.” If you don’t pick an explicit sort order, then the folder’s “natural” sort order is used. -Raymond]

“[I can see the bug reports already. ‘Explorer shows files in scrambled order… sometimes.’ If you don’t pick an explicit sort order, then the folder’s ‘natural’ sort order is used. -Raymond]”

I can’t really see anyone regarding ‘Explorer shows files in the order I requested’ (besides, if ‘natural’ order actually matters to anyone, why is there no option to specify it?!) as more of a bug than ‘Explorer sits twiddling its thumbs for long periods of time’ – and displaying incrementally rather than waiting for the whole lot to arrive would preserve whatever order is being imposed, at the expense of a bit of CPU time. Sort and display the results as they arrive, everyone’s happy – make them wait as long as possible, nobody gains except perhaps the CPU temperature or battery life.

Web browsers have been doing it for years, for faster operations than that!

(On a related note, it would be really nice if XP’s Notepad didn’t stop the message pump while trying to open files, too…)

[This creates the different problem of “I was about to click on something and it moved and I ended up clicking on the wrong thing.” As for the related note, edit controls (like all window controls) are single-threaded. There is no special treatment for the WM_SETTEXT message. -Raymond]

However, I think James’s idea was that Windows Explorer would list files as they’re found (and my idea was that Windows Explorer would show its status as still in the middle of reading), adding to the list as they come in. The user would click to sort when the user is ready for the display to be reordered.
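A minimal sketch of the "add to the list as they come in" idea, keeping the visible list ordered as each name arrives (OnFileFound is a hypothetical callback, not an actual Explorer interface):

```cpp
#include <algorithm>
#include <initializer_list>
#include <string>
#include <vector>

std::vector<std::wstring> g_view;  // names currently shown, kept sorted

void OnFileFound(const std::wstring& name)
{
    // Binary-search for the insertion point, then splice the new item in,
    // so the display is always ordered without waiting for the full enumeration.
    auto pos = std::lower_bound(g_view.begin(), g_view.end(), name);
    g_view.insert(pos, name);
    // Real code would also insert a row into the list view at the same index
    // (and would have to cope with items shifting under the mouse, as
    // Raymond's reply points out).
}

int main()
{
    for (const wchar_t* n : {L"gamma.txt", L"alpha.txt", L"beta.txt"})
        OnFileFound(n);  // arrives unsorted, shown sorted
}
```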