I often find myself saying,
"I bet somebody got a really nice bonus for that feature."

"That feature" is something aggressively user-hostile,
like forcing a shortcut into the Quick Launch bar or
the Favorites menu,
like automatically turning on a taskbar toolbar,
like adding an icon to the
notification area
that conveys no useful information but merely adds to the clutter,
or (my favorite)
like adding an extra item to the desktop context menu that
takes several seconds to initialize and gives the user the ability
to change some obscure feature of their video card.

Allow me to summarize the guidance:

The Quick Launch bar and Favorites menu belong to the user.
There is intentionally
no interface to manipulate shortcuts in the Quick Launch bar.
We saw what happened to the Favorites menu and
learned our lesson:
Providing a programmatic interface to high-value visual real
estate results in widespread abuse.
Of course, this doesn't stop people from hard-coding the path
to the Quick Launch directory—too bad the name of the
directory isn't always "Quick Launch"; the name can change
based on what language the user is running.
But that's okay, I mean,
everybody speaks English, right?

There is no programmatic interface to turn on a taskbar toolbar.
Again, that's because the taskbar is a high-value piece of
the screen and creating a programmatic interface can lead to no good.
Either somebody is going to go in and force their toolbar on,
or they're going to go in and force a rival's toolbar off.
Since there's no programmatic interface to do this,
these programs pull stunts like generating artificial user input
to simulate the right-click on the taskbar, mousing to the
"Toolbars" menu item, and then selecting the desired toolbar.
The taskbar context menu will never change, right?
Everybody speaks English, right?

The rule for taskbar notifications is that they are there to,
well, notify the user of something.
Your print job is done.
Your new hardware device is ready to use.
A wireless network has come into range.
You do not use a notification icon to say "Everything is just like it
was a moment ago; nothing has changed."
If nothing has changed, then say nothing.

Many people use the notification area to provide quick access to
a running program,
which runs counter to the guidance above.
If you want to provide access to a program, put a shortcut on
the Start menu.
Doesn't matter whether the program is running already or not.
(If it's not running, the Start menu shortcut runs it.
If it is already running, the Start menu shortcut runs the program,
which recognizes that it's already running and merely activates the
already-running copy.)

While I'm here, I may as well remind you of the guidance for
notification balloons:
A notification balloon should only appear if there is something
you want the user to do.
It must be actionable.

Balloon                                    Action
Your print job is complete.                Go pick it up.
Your new hardware device is ready to use.  Start using it.
A wireless network has come into range.    Connect to it.

The really good balloons will tell the user what the expected action is.
"A wireless network has come into range.
Click here to connect to it." (Emphasis mine.)

Here are some bad balloons:

Bad Balloon                                                     Action?
Your screen settings have been restored.                        So what do you want me to do about it?
Your virtual memory swap file has been automatically adjusted.  If it's automatic, what do I need to do?
Your clock has been adjusted for daylight saving time.          Do you want me to change it back?
Updates are ready for you to install.                           So?

One of my colleagues got a phone call from his mother asking
him what she should do about a new error message that wouldn't go away.
It was the "Updates are ready for you to install" balloon.
The balloon didn't say what she should do next.

The desktop context menu extensions are the worst,
since the ones I've seen come from video card manufacturers
that provide access to something you do maybe once when you set up
the card and then don't touch thereafter.
I mean, do normal users spend a significant portion of their day
changing their screen resolution and color warmth?
(Who on a laptop would even want to change their screen resolution?)
What's worse is that one very popular such extension adds an annoying two
second delay to the appearance of the desktop context menu,
consuming 100% CPU during that time.
If you have a laptop with a variable-speed fan, you can hear it
going nuts for a few seconds each time you right-click the desktop.
Always good to chew up battery life initializing a context menu
that nobody on a laptop would use anyway.

The thing is, all of these bad features were probably justified
by some manager somewhere because it's the only way their feature
would get noticed.
They have to justify their salary by pushing all these stupid ideas
in users' faces.
"Hey, look at me! I'm so cool!"
After all, when the boss asks,
"So, what did you accomplish in the past six months?"
a manager can't say,
"Um, a bunch of stuff you can't see.
It just works better."
They have to say,
"Oh, check out this feature, and that icon, and this dialog box."
Even if it's a stupid feature.

As my colleague
Michael Grier put it,
"Not many people have gotten a raise and a promotion
for stopping
features from shipping."

A lot of people simply don't care to learn the difference between
the search box and the address bar.
"If I type what I want into this box here, I sometimes
get a strange error message.
But if I type it into that box there, then I get what I want.
Therefore, I'll use that box there for everything."
And you know what?
It doesn't bother me that they don't care.
In fact, I think it's good that they don't care.
Computers should adapt to people, not the other way around.

You can try to explain to these people,
"You see, this is a URL, so you type it into the address box.
But that is a search phrase, so you type it into the search box."

"You-are-what?
Look, I don't care about your fancy propeller-beanie acronyms.
You computer types are always talking about how computers are so easy
to use, and then you make up these arbitrary rules about where I'm supposed
to type things.
If I want something, I type into this box and click 'Search'.
And it finds it.
Watch.
I want Yahoo, so I type 'yahoo' into the box,
and boom, there it is.
I have a system that works.
Why are you trying to make my life more confusing?"

I remember attending a presentation by the MSN Explorer team
on what they learned about how people use a web browser.
They found many situations where people failed to accomplish
their desired task because they typed
the right thing into the wrong box.
But instead of trying to teach people which
box to type it in, they just
expanded the definition of "right".
You typed your query into the wrong box?
No problem, we'll just pretend you typed it into the correct box.
In fact, let's just get rid of all these special-purpose boxes.
Whatever you want, just type it into this box, and we'll get it for you.

I wish the phone company would learn this.
Sometimes I'll dial a telephone number and I'll get an automated
recording that says, "I'm sorry.
You must dial a '1' before the number.
Please hang up and try again."
Or "I'm sorry. You must not dial a '1' before the number.
Please hang up and try again."
That's because in the state of Washington,
there are complicated rules about when you have to dial a "1"
in front of the number and when you don't.
(Fortunately, the rule on when you have to dial the area code is
easier to remember: If the area code you are calling
is the same as the area code
you are dialing from, then you can omit the area code.)
For example, suppose your home number is 425-882-xxxx.
Here's how you have to dial the following numbers:

To call this number    you dial
425-202-xxxx           202-xxxx
425-203-xxxx           1-203-xxxx
206-346-xxxx           206-346-xxxx
206-347-xxxx           1-206-347-xxxx

If you get it wrong, the voice comes on the line to tell you.
Hey, since you know what I did wrong and you know what I meant to do,
why not just fix it?
If I dial a number and forget the "1", just insert the 1 and connect
the call.
If I dial a number and include the "1" when I didn't need to,
just delete the 1 and connect the call.
Don't make me have to look up in the book whether I need a 1 or not.
(In the front of the phone book are tables showing which numbers
need a "1" and which don't.
I hate those tables.)
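
The "just fix it" idea can be sketched in a few lines of C. This is a minimal illustration, not any phone company's actual logic: normalize_dialed is a hypothetical helper, and it assumes that the only things a caller can get wrong are the leading "1" and the optional area code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: reduce whatever was dialed to a canonical
 * 10-digit number, given the 3-digit area code of the calling line.
 * Returns 1 and writes 10 digits plus a NUL into out on success;
 * returns 0 if the digit count matches none of the four dialing forms. */
int normalize_dialed(const char *dialed, const char *home_area, char out[11])
{
    char digits[12];
    size_t n = 0;

    assert(dialed != NULL && home_area != NULL && out != NULL);

    /* Keep digits only, so punctuation does not matter. */
    for (const char *p = dialed; *p != '\0'; ++p) {
        if (isdigit((unsigned char)*p)) {
            if (n == 11) return 0;          /* too many digits */
            digits[n++] = *p;
        }
    }
    digits[n] = '\0';

    const char *rest = digits;
    /* A leading "1" on an 8- or 11-digit string is the long-distance
     * prefix. Needed or not, drop it instead of scolding the caller. */
    if ((n == 8 || n == 11) && digits[0] == '1') {
        ++rest;
        --n;
    }

    if (n == 7) {                           /* area code omitted: use the caller's own */
        if (strlen(home_area) != 3) return 0;
        memcpy(out, home_area, 3);
        memcpy(out + 3, rest, 7);
    } else if (n == 10) {                   /* area code supplied */
        memcpy(out, rest, 10);
    } else {
        return 0;
    }
    out[10] = '\0';
    return 1;
}
```

All four dialing forms from the table collapse to the same kind of 10-digit result, so the switch would have what it needs to connect the call either way.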

(Yes, I know there are weird technical/legal reasons for why I have
to dial the phone in four different ways depending on whom I want to call.
But it's still wrong that these technical/legal reasons mean that
the rules for dialing a telephone are impossibly complicated.)

Representatives from the IT department of a major worldwide
corporation came to Redmond and took time out of their busy
schedule to give a talk on how their operations are set up.
I was phenomenally impressed.
These people know their stuff.
Definitely a world-class operation.

One of the tidbits of information they shared with us is
some numbers about the programs they have to support.
Their operations division is responsible for 9,000
different install scripts for their employees around the world.

That was not a typo.

Nine thousand.

This highlighted for me the fact that backwards compatibility is
crucial for adoption in the corporate world.
Do the math.
Suppose they could install, test and debug ten programs each business day
(in my opinion, a very optimistic estimate).
Even at that rate, it would take them three years
to get through all their scripts.
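
For the record, here is the arithmetic as a small C function. The figure of 260 business days per year is my assumption (52 weeks of 5 days, no holidays); the script count and daily rate come from the text.

```c
#include <assert.h>

/* Years needed to retest every script at a fixed daily throughput.
 * business_days_per_year is an assumption supplied by the caller. */
double years_to_retest(int scripts, int per_day, int business_days_per_year)
{
    assert(scripts >= 0 && per_day > 0 && business_days_per_year > 0);

    /* Round up: a partly used day still counts as a day. */
    int days = (scripts + per_day - 1) / per_day;
    return (double)days / business_days_per_year;
}
```

With 9,000 scripts at ten per day, that is 900 business days, which at 260 days per year is a bit more than three years.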

This isn't a company that bought some software ten years ago
and doesn't have the source code.
They have the source code for all of their scripts.
They have people who understand how the scripts work.
They are not just on the ball; they are all over the ball.
And even then, it would take them three years to go through and
check (and possibly fix) each one.

The little sliver at the top is the mapping of zero to zero.
The big white box at the bottom is the mapping of all negative
numbers to corresponding negative numbers.
And the rainbow represents the mapping of all the positive
values, mod 65536, into the range 0x80070000 through 0x8007FFFF.
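
The three regions described above match the behavior of the Win32 HRESULT_FROM_WIN32 macro. The following is a portable sketch of that mapping rather than the SDK's own definition: map_error is a made-up name, and int32_t stands in for the result type.

```c
#include <assert.h>
#include <stdint.h>

/* Portable sketch of the three-way mapping (hypothetical name). */
int32_t map_error(int32_t x)
{
    if (x <= 0) {
        return x;               /* zero -> zero, negative -> same negative */
    }

    /* Positive: keep the low 16 bits (x mod 65536) and drop them
     * into the range 0x80070000..0x8007FFFF. */
    uint32_t mapped = UINT32_C(0x80070000) | ((uint32_t)x & UINT32_C(0xFFFF));
    assert(mapped >= UINT32_C(0x80070000) && mapped <= UINT32_C(0x8007FFFF));

    /* Reinterpreting as signed assumes the usual two's complement wrap. */
    return (int32_t)mapped;
}
```

Note that every positive input ends up negative, and that inputs differing by a multiple of 65536 land on the same output.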

Now let's take a look at that puzzle I left behind:

Sometimes, when I import data from a scanner, I get the error
"The directory cannot be removed."
What does this mean?

My psychic powers told me that the customer was doing something
like this (error checking deleted):

Remember the
Ten Immutable Laws of Security.
Today, we're going to talk about number three:
If a bad guy has unrestricted physical access to your computer,
it's not your computer any more.

There was a bug which floated past my field of vision many months ago
that went something like this:
"I found a critical security bug in the USB stack.
If somebody plugs in a USB device which emits a specific
type of malformed packet during a specific step in the protocol,
then the USB driver crashes.
This is a denial of service that should be accorded critical
security status."

Now, it's indeed the case that the driver should not crash
when handed a malformed USB packet,
and the bug should certainly be fixed.
(That said, I'm sure some people will manage to
interpret this article as advocating not fixing the bug.)
But let's look at the prerequisites for this bug
to manifest itself:
The attacker needs to build a USB device that is intentionally
out of specification in one particular way
and plug that device into a vulnerable machine.
While that's certainly possible, it's a lot of work for
your typical hacker to burn a custom EEPROM with USB firmware
that manages to hit the precise conditions necessary to trigger
the driver bug.

It's much easier just to grab a fork.

You see, since this attack requires physical access to a USB port,
you may as well attack the machine in a much more direct manner
that doesn't require you to spend hours with a soldering gun
and a circuit board:
Just grab a fork and jam it into the USB port.
I haven't tried it, but I suspect that will crash the
machine pretty effectively, too.
If you can't get the fork to work,
pouring a glass of water into the USB port
will probably seal the deal.

Doron tells me that some companies address this problem by
removing physical access:
They fill the USB ports on all their machines with epoxy.

Update: Randy Aull tells me that the USB 2.0 specification
anticipated the fork attack and requires that all transceivers
be able to withstand short circuits "of D+ and/or D- to VBUS,
GND, other data lines, or the cable shield at the connector,
for a minimum of 24 hours."
(Though I'm not sure if that also covers shorting VBUS to GND.)
I wonder if they also have a paragraph specifying that USB devices
must also withstand water immersion...
Of course, you could still use that fork to push the power button or
jam it into an outlet on the same circuit as the computer you want
to take down in order to blow a fuse.

Just say no to DIY overclocking and let us do it for you!
We'll factory overclock your Intel®
quad-core processor.
Yep, you read that right: factory overclock,
which is something that most other major PC manufacturers don't do.

We discussed earlier the history behind
the return value of the ShellExecute function,
and why its value in Win32 is meaningless aside from testing it
against the value 32 to determine whether an error occurred.

Let's turn the question around.
How would you, the implementor of the
ShellExecute function, report success?
The ShellExecute function is very popular,
so you have to be prepared for the ways people check the return
code incorrectly yet manage to work in spite of themselves.
The goal, therefore, is to report success in a manner that breaks
as few programs as possible.

(Now, there may be those of you who say, "Hang compatibility.
If programs checked the return value incorrectly, then they
deserve to stop working!"
If you choose to go in that direction,
then be prepared for the deluge of compatibility bugs to be
assigned to you to fix.
And they're going to come from a grumpy compatibility testing team
because they will have spent a long time just finding out that
the problem was that the program was checking the return
value of ShellExecute incorrectly.)

Since there is still 16-bit code out there that may thunk
up to 32-bit code, you probably don't want to return a value
greater than 0xFFFF.
Otherwise, when that value gets truncated to a 16-bit
HINSTANCE, it will lose the high word.
If you returned a value like 0x00010001,
this would truncate to 0x0001, which would
be treated as an error code.

For similar reasons, the 64-bit implementation of the
ShellExecute function had better not use the
upper 32 bits of the return value.
Code that casts the return value to int
will lose the high 32 bits.

Furthermore, you shouldn't return a value that, when cast
to an integer, results in a negative number.
Some people will use a signed comparison against 32;
others will use an unsigned comparison.
If you returned a value like -5, then the
people who used a signed comparison would think the
function failed, whereas those who used an unsigned comparison
would think it succeeded.

By the same logic, the value you choose as the return value
should not result in a negative number when cast to
a 16-bit integer.
If the return value is passed to a 16-bit caller that
casts the result to an integer and compares against 32,
you want consistent results independent of whether the
16-bit caller used a signed or unsigned comparison.

Edge conditions are tricky, so you don't want to return
the value 32 exactly.
If you look at code that checks the return value from
ShellExecute, you'll probably find that
the world is split as to whether 32 is an error code or not.
So it'd be in your best interest not to return the value 32
exactly but rather a value larger than 32.

So far, you're constrained to choosing
a value in the range 33–32767.
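
If you don't trust the derivation, you can brute-force it. The sketch below models each style of caller described so far (16-bit truncation, signed and unsigned comparisons, and disagreement over 32 itself) and scans for values that all of them accept. The function names are hypothetical, and the scan stops at 0x1FFFF because anything above 0xFFFF is rejected by the truncation rule anyway.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if every style of caller agrees that v means success.
 * The signed casts assume the usual two's complement wrap. */
int all_callers_agree_success(uint32_t v)
{
    uint16_t lo = (uint16_t)(v & 0xFFFFu);  /* what a 16-bit caller receives */

    if (v != lo) return 0;                  /* high word would be lost */
    if ((int32_t)v <= 32) return 0;         /* signed 32-bit check; 32 itself is disputed */
    if ((int16_t)lo <= 32) return 0;        /* signed 16-bit check */
    /* The unsigned checks (v > 32, lo > 32) follow from the tests above. */
    return 1;
}

/* Scans 0..0x1FFFF, stores the lowest and highest passing values,
 * and returns how many values passed. */
uint32_t find_safe_range(uint32_t *lowest, uint32_t *highest)
{
    uint32_t count = 0;

    assert(lowest != 0 && highest != 0);
    *lowest = 0;
    *highest = 0;
    for (uint32_t v = 0; v <= UINT32_C(0x1FFFF); ++v) {
        if (all_callers_agree_success(v)) {
            if (count == 0) *lowest = v;
            *highest = v;
            ++count;
        }
    }
    return count;
}
```

The scan reports 33 as the lowest survivor and 32767 as the highest, with no gaps in between.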

Finally, you might be a fan of Douglas Adams.
(Most geeks are.)
The all-important number 42 fits into this range.
Your choice of return value, therefore, might be
(HINSTANCE)42.

Going back to the original question:
How should I check the return value of ShellExecute
for errors?
MSDN says you can cast the result to an integer and compare the
result against 32.
That'll work fine.
You could cast in the other direction, comparing the return
value against (HINSTANCE)32.
That'll work fine, too.
Or you could cast the result to an INT_PTR and
compare the result against 32.
That's fine, too.
They'll all work, because the implementor of the ShellExecute
function had to plan ahead for you and all the other people who call
the ShellExecute function.
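
Here are the three styles of check side by side, written so that they compile without the Windows headers. The type fake_hinstance is a stand-in I made up for HINSTANCE, and intptr_t plays the role of INT_PTR; in real code you would include windows.h and use the real types on the actual return value.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the Win32 handle type (hypothetical, for illustration). */
typedef struct fake_instance_ *fake_hinstance;

/* Style 1: cast the result to an integer and compare against 32. */
int ok_as_int(fake_hinstance h)
{
    return (int)(intptr_t)h > 32;
}

/* Style 2: cast 32 to the handle type and compare handles.
 * This leans on flat address comparison, as Win32 code routinely does. */
int ok_as_handle(fake_hinstance h)
{
    return h > (fake_hinstance)(intptr_t)32;
}

/* Style 3: cast the result to a pointer-sized integer and compare. */
int ok_as_intptr(fake_hinstance h)
{
    return (intptr_t)h > 32;
}

/* Usage: a success value such as 42 and a small error code such as 2
 * get the same verdict from all three styles. */
void demo_checks(void)
{
    fake_hinstance success = (fake_hinstance)(intptr_t)42;
    fake_hinstance failure = (fake_hinstance)(intptr_t)2;

    assert(ok_as_int(success) && ok_as_handle(success) && ok_as_intptr(success));
    assert(!ok_as_int(failure) && !ok_as_handle(failure) && !ok_as_intptr(failure));
}
```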

Back in the Windows 95 days,
people swore that increasing the value of
MaxBPs in the system.ini file
fixed application errors.
People usually made up some pseudo-scientific explanation
for why this fixed crashes.
These explanations were complete rot.

These breakpoints had nothing to do with Windows applications.
They were used by 32-bit device drivers to communicate with code
in MS-DOS boxes,
typically the 16-bit driver they are trying to take over from
or are otherwise coordinating their activities with.
A bunch of these are allocated at system startup when
drivers settle themselves in, and on occasion, a driver
might patch a breakpoint temporarily into DOS memory,
removing it when the breakpoint is hit (or when the
breakpoint is no longer needed).
Increasing this value had no effect on Windows applications.

I fantasized about adding a "Performance" page to Tweak UI
with an option to increase the number of
"PlaceBOs".
I would make up some nonsense text about this setting controlling
how high in memory the system should place its "breakpoint
opcodes". Placing them higher will free up memory for other
purposes and reduce the frequency of "Out of memory" errors.
Or something like that.

I was reminded of this story by my pals in product support who were
trying to come up with a polite way of explaining to their customer
that
there is no /7GB boot.ini switch.
In other situations, they sometimes dream of shipping
placebo.dll to a customer to solve their problem.

(And by the way, the technical reason why the user-mode address
space is limited to eight terabytes
was given by commenter darwou:
The absence of a 16-byte atomic compare-and-exchange instruction
means that bits need to be sacrificed to encode the sequence number
which avoids the ABA problem.)
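
To see how sacrificing bits caps the address space, here is a sketch of packing a pointer and a sequence number into one 64-bit word, so that an ordinary 8-byte compare-and-exchange can update both at once. The field widths are an assumption picked to reproduce the eight-terabyte figure (39 stored pointer bits plus 4 bits implied by 16-byte alignment); they are not taken from the text, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative layout only; the exact widths are an assumption. */
#define PTR_BITS   39u               /* pointer bits actually stored */
#define ALIGN_BITS 4u                /* entries assumed 16-byte aligned */
#define SEQ_BITS   (64u - PTR_BITS)  /* whatever is left for the sequence number */

/* Largest address space this layout can describe: 2^(39+4) bytes. */
uint64_t addressable_bytes(void)
{
    return UINT64_C(1) << (PTR_BITS + ALIGN_BITS);
}

/* Pack an aligned address and a sequence number into one 64-bit word. */
uint64_t pack_head(uint64_t address, uint64_t sequence)
{
    assert((address & ((UINT64_C(1) << ALIGN_BITS) - 1)) == 0);
    assert(address < addressable_bytes());

    uint64_t ptr_field = address >> ALIGN_BITS;
    uint64_t seq_field = sequence & ((UINT64_C(1) << SEQ_BITS) - 1);
    return (seq_field << PTR_BITS) | ptr_field;
}

uint64_t unpack_address(uint64_t packed)
{
    return (packed & ((UINT64_C(1) << PTR_BITS) - 1)) << ALIGN_BITS;
}

uint64_t unpack_sequence(uint64_t packed)
{
    return packed >> PTR_BITS;
}
```

Two words holding the same address but different sequence numbers compare unequal, which is what defeats the ABA problem. The price is that no address at or above 2^43 bytes (eight terabytes) fits in the pointer field.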

You can sometimes narrow down the source of a problem just by looking at the
screen and moving the mouse.

When you move the mouse, the cursor on the screen moves to match.
This work is done in the window manager in kernel mode.
The mouse hardware notifies the window manager,
"Hey, I moved left twenty units."
The window manager takes this value,
accelerates or decelerates it according to your mouse
acceleration settings,
calls any low-level mouse hooks that are installed,
and then tells the display driver,
"Move that sprite left about thirty pixels" (say).
It then
sets the "the mouse moved" flag
so that the program that owns the window under the new mouse
position will get a WM_MOUSEMOVE message.
The window manager also sets the cursor to the "virtual cursor state"
corresponding to the window beneath the cursor.
The "virtual cursor state" remembers the cursor that the thread
(or threads, if input has been attached) responsible for the window
most recently set.
Maintaining the virtual cursor state is important,
for if a thread calls SetCursor to change the
cursor to an hourglass and then stops processing messages
(because it is busy), you really want the cursor to change
back to an hourglass when it moves over the thread's windows.

What does it mean if
the cursor doesn't move at all when you move the mouse?
Could it be caused by an application?
If you read through the flowchart I described above,
the only place applications get involved in the "move the mouse cursor"
code flow is if they are filtering out the mouse motion in a
low-level mouse hook.
(Another way an application can "lock up" the mouse is by
calling the ClipCursor function, but vanishingly few
applications do this.
I'm assuming you aren't the victim of malicious software
but instead are trying to figure out what program, if any,
is accidentally freezing the mouse.)
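
The flow described above can be modeled in a few lines of C. This is a toy model and not how the window manager is written: every name here is hypothetical, acceleration is reduced to a simple ratio, and a hook "filters" motion by returning nonzero.

```c
#include <assert.h>
#include <stddef.h>

/* In this toy model, a low-level hook returns nonzero to swallow the motion. */
typedef int (*mouse_hook_fn)(int dx_pixels);

typedef struct {
    int cursor_x;     /* where the display driver has drawn the sprite */
    int mouse_moved;  /* flag that later produces a mouse-move message */
} toy_window_manager;

/* Example hooks: one that lets everything through, one that eats everything. */
int hook_pass_through(int dx_pixels) { (void)dx_pixels; return 0; }
int hook_swallow_all(int dx_pixels)  { (void)dx_pixels; return 1; }

/* Process one horizontal motion report. Returns 1 if the cursor moved,
 * 0 if a hook filtered the motion out. */
int toy_mouse_input(toy_window_manager *wm, int dx_units,
                    int accel_num, int accel_den,
                    const mouse_hook_fn *hooks, size_t hook_count)
{
    assert(wm != NULL && accel_den != 0);

    /* Step 1: apply the acceleration setting (here, a plain ratio). */
    int dx_pixels = dx_units * accel_num / accel_den;

    /* Step 2: call the low-level hooks. This is the only point at which
     * application code can stop the cursor from moving. */
    for (size_t i = 0; i < hook_count; ++i) {
        if (hooks[i](dx_pixels)) {
            return 0;
        }
    }

    /* Step 3: move the sprite and set the "mouse moved" flag. */
    wm->cursor_x += dx_pixels;
    wm->mouse_moved = 1;
    return 1;
}
```

With a 3/2 ratio, a report of twenty units to the left moves the sprite thirty pixels to the left, unless a hook such as hook_swallow_all is installed, in which case the cursor stays put.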

Low-level mouse hooks are comparatively uncommon since they
exact a high performance penalty on the system.
If you're moving your mouse and don't see the cursor move around
on the screen, my guess is that there is a problem in the kernel-mode
side of the equation.
If you're seeing the entire system freeze up, then it's probably
a device driver that has started acting up and held a lock for too
long.

A flaky hard drive can have the same effect.
If the window manager itself takes a page fault, it has to wait
for the hard drive to page in the data,
and if the window manager happened to be holding
a lock when this happened, that lock is held across the entire I/O
operation.
If your hard drive is flaky and, say, takes ten seconds to produce
a sector of data instead of several milliseconds,
then it will look like the system has frozen for ten seconds,
since the window manager is stuck waiting on your disk,
which is in turn grunting and recalibrating in a desperate attempt
to produce the data the memory manager requested.

In other words:
If the cursor won't move,
it's likely a driver or hardware problem.
(Figuring out which driver/hardware will require hooking up
a kernel debugger and poking around.
Not for the faint of heart.)