Some people have noticed that certain programs cause the
Add or Remove Programs control panel to create an enormous
amount of blank space. What's going on?

These are programs that have bad custom uninstall icon registrations.

If you go to the registry key
HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Uninstall,
you'll find a list of programs that have registered for appearing
in the Add or Remove Programs control panel.
Some of them might have been so kind as to provide a
"DisplayIcon" value,
thereby saving the control panel the indignity of
having to guess at an appropriate icon.

Unfortunately, if they put a bad icon registration in that registry
value, the result is a bunch of blank space since the control panel
is trying to reserve space for a bogus icon.

The format of the icon registration is a filename, optionally
followed by a comma and a decimal number.

C:\full\path\to\icon\file.dll
C:\full\path\to\icon\file.dll,123

Since this is not a command line, quotation marks are not necessary
(although they are tolerated).
Furthermore, the number
can be any value except for -1.
Why is -1 forbidden?
Because
the ExtractIcon function
treats the value -1 specially.
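To make the format concrete, here is a hypothetical helper (not any Windows API, just an illustration) that splits such a registration into its file and icon-index parts, tolerating quotation marks and rejecting the forbidden index -1:

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Illustrative only: parse "filename" or "filename,123" as described above.
struct IconSpec {
    std::string file;
    int index;      // 0 if no ",number" suffix was given
    bool valid;
};

IconSpec ParseIconSpec(const std::string& spec)
{
    IconSpec result = { "", 0, false };

    // Quotation marks are not necessary but are tolerated, so strip them.
    std::string cleaned;
    for (char ch : spec) if (ch != '"') cleaned += ch;

    // The optional icon number follows the last comma.
    std::size_t comma = cleaned.rfind(',');
    if (comma == std::string::npos) {
        result.file = cleaned;
    } else {
        result.file = cleaned.substr(0, comma);
        result.index = std::atoi(cleaned.substr(comma + 1).c_str());
    }

    // -1 is reserved by ExtractIcon, so it is not a valid icon index.
    result.valid = !result.file.empty() && result.index != -1;
    return result;
}
```

(A real path can of course contain commas; this sketch takes the last comma as the separator and doesn't worry about such cases.)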

If the icon file does not exist,
or if the specified icon does not exist in the file,
or if the icon number is -1,
then the icon specification is invalid and the
Add or Remove Programs control panel will reserve an odd amount of space
for an icon that doesn't exist.

Perhaps the Add or Remove Programs control panel should be more
tolerant of invalid icon registrations?
Or should it stay the way it is,
adhering to the
"Don't bend over backwards to fix buggy
programs; force the program authors to fix their own bugs" policy
that so many of my readers advocate?
(Noting furthermore that refusing to accommodate invalid icon registrations
makes it look like Add or Remove Programs is the buggy one.)

Some people attempt to simulate keyboard input to an application
by posting keyboard input messages, but this is not reliable for
many reasons.

First of all, keyboard input is a more complicated matter than
those who imprinted on the English keyboard realize.
Languages with accent marks have dead keys,
Far East languages have a variety of Input Method Editors,
and I have no idea how complex script languages handle input.
There's more to typing a character than just pressing a key.

Second, even if you manage to post the input messages into
the target window's queue, that doesn't update the keyboard
shift states. When the code behind the window calls
the GetKeyState function
or
the GetAsyncKeyState function,
it's going to see the "real" shift state and not the fake
state that your posted messages have generated.

The SendInput function
was designed for injecting input into Windows.
If you use that function, then at least the shift states will be
reported correctly. (I can't help you with the complex input problem,
though.)
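As a rough illustration (an untested Win32 sketch, not code from the original post; note that SendInput injects into whatever window currently has focus, so you cannot aim it at a specific window), typing a capital "A" with correct shift state might look like this:

```cpp
#include <windows.h>

// Sketch: press Shift, press and release 'A', release Shift.
void TypeCapitalA()
{
    INPUT inputs[4] = {};
    inputs[0].type = INPUT_KEYBOARD; inputs[0].ki.wVk = VK_SHIFT;
    inputs[1].type = INPUT_KEYBOARD; inputs[1].ki.wVk = 'A';
    inputs[2].type = INPUT_KEYBOARD; inputs[2].ki.wVk = 'A';
    inputs[2].ki.dwFlags = KEYEVENTF_KEYUP;
    inputs[3].type = INPUT_KEYBOARD; inputs[3].ki.wVk = VK_SHIFT;
    inputs[3].ki.dwFlags = KEYEVENTF_KEYUP;
    SendInput(4, inputs, sizeof(INPUT));
}
```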

For some reason,
people think too hard.
If you want to create
a fullscreen window that covers the taskbar, just create a fullscreen
window and the taskbar will automatically get out of the way.
Don't go around hunting for the taskbar and poking it; let it do its thing.

Note that this sample program doesn't worry about destroying that
fullscreen window or preventing the user from creating more than one.
It's just a sample.
The point is seeing how the CreateFullScreenWindow function
is written.

We use
the MonitorFromWindow function
to figure out which monitor we should go fullscreen to.
Note that in a multiple monitor system, this might not be the same
monitor that the taskbar is on. Fortunately, we don't have to worry
about that; the taskbar figures it out.
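The sample code itself didn't survive in this copy, but a reconstruction of the kind of function the text describes (Win32; the window class, title, and the g_hinst module handle are placeholders) would go something like this:

```cpp
#include <windows.h>

extern HINSTANCE g_hinst; // assumed: the module's instance handle

void CreateFullScreenWindow(HWND hwnd)
{
    // Ask which monitor the window is on; that's where we go fullscreen.
    HMONITOR hmon = MonitorFromWindow(hwnd, MONITOR_DEFAULTTONEAREST);
    MONITORINFO mi = { sizeof(mi) };
    if (hmon && GetMonitorInfo(hmon, &mi)) {
        // A plain WS_POPUP window sized to the monitor; the taskbar
        // notices a fullscreen window and gets out of the way on its own.
        CreateWindow(TEXT("static"), TEXT("Fullscreen"),
                     WS_POPUP | WS_VISIBLE,
                     mi.rcMonitor.left, mi.rcMonitor.top,
                     mi.rcMonitor.right - mi.rcMonitor.left,
                     mi.rcMonitor.bottom - mi.rcMonitor.top,
                     hwnd, NULL, g_hinst, 0);
    }
}
```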

I've seen people hunt for the taskbar window and then do
a ShowWindow(hwndTaskbar, SW_HIDE) on it.
This is nuts for many reasons.

First is a mental exercise you should always use when evaluating
tricks like this: "What if two programs tried this trick?"
Now you have two programs both of which think they are in charge
of hiding and showing the taskbar, neither of which is coordinating
with the other. The result is a mess. One program hides
the taskbar, then the other does, then the first decides it's finished
so it unhides the taskbar, but the second program wasn't finished yet
and gets a visible taskbar when it thought it should be hidden.
Things only go downhill from there.

Second, what if your program crashes before it gets a chance
to unhide the taskbar?
The taskbar is now permanently hidden
and the user has to log off and back on to get their taskbar back.
That's not very nice.

Third, what if there is no taskbar at all?
It is common in Terminal Server scenarios to
run programs by themselves without Explorer.
In this configuration, there is no Explorer, no taskbar.
Or maybe you're running on a future version of Windows
that doesn't have a taskbar,
it having been replaced by some other mechanism.
What will your program do now?

Don't do any of this messing with the taskbar.
Just create your fullscreen window and let the taskbar do its thing
automatically.

Today's dead computer is my Sony Vaio PCG-Z505LE laptop,
with a 600MHz processor and 192MB of RAM.
Certainly a big step up from that 486/50 with 12MB of RAM.

Laptop computers have a comparatively short lifetime.
(Hardware vendors must love them.)
I've learned that the best value comes from buying used laptops.
You have to accept being a bit behind the curve,
but the way I use my laptop, it needs to do only a small number
of things:

- surf the web,
- read e-mail,
- use Remote Desktop Connection to access my desktop machine,
- download pictures from my digital camera, and
- compile the occasional program.

Only that last operation requires a hefty processor, and I do it so
rarely that it doesn't bother me that it's kind of slow.
(I just run the command line version of the compiler, so that at
least takes the IDE overhead out of the picture.)

I bought this laptop two years ago, used, and it ran just fine
until a couple months ago when the internal power supply burnt out.
I was ready to abandon the model line and give away the accessories
I had bought, including a $200+ double-capacity battery.

Allow me to digress on laptop batteries.
Observe that batteries for old-model laptops
cost almost as much as the laptops themselves.
That's because the battery is the only real consumable in a
laptop computer.
The other components will run practically
indefinitely if you don't drop them or douse them in soda,
but batteries just plain wear out.
That's where the money is.

This means that many ads for used laptops will mention
"needs new battery" at the end.
And those are the ones I sought out.
Because I have a working battery!
Most prospective buyers would be turned off by a dead battery,
but that didn't bother me one bit.

The replacement laptop arrived a few days ago, and it runs great.
I wiped the drive and reinstalled Windows XP from scratch.
(Challenging because the laptop doesn't come with a bootable
CD-ROM drive.
I had to use floppies!)
I may install a handful of programs but that's all.
I don't like installing software on my computer.
The more programs you install, the more likely there's going to
be a conflict somewhere.

The old laptop has already started being scavenged for parts.
A friend of mine needed a replacement laptop hard drive,
so I gave him my old one.
The battery and power brick can of course be used by
the new laptop.
The memory from the old Vaio is no use, since the Vaio has
only one memory expansion slot.
The other parts of the old laptop aren't much use for
anything aside from spares.
Perhaps I should put the old laptop on concrete
blocks on my front lawn.

Next time (if there is a next time), the story of the dead AlphaServer.

After our latest round of optimization, the 100ms barrier teased us,
just milliseconds away.
Profiling the resulting program reveals that
60% of the CPU is spent in operator new.
Is there anything we can do about that?

Indeed, we can.
Notice that the memory allocation pattern for the strings in our
dictionary is quite special:
Once a string is allocated into the dictionary,
it is never modified or freed while the dictionary is in use.
When the dictionary is freed, all the strings are deleted at once.
This means that we can design an allocator tailored to this
usage pattern.

I don't know whether there is a standard name for this thing,
so I'm just going to call it a StringPool.
A string pool has the following characteristics:

Once you allocate a string, you can't modify or free it
as long as the pool remains in existence.

If you destroy the string pool, all the strings in it are destroyed.

We implement it by using the same type of fast allocator that
the CLR uses: A single pointer.

    [ allocated ............ | free ............ ]
                             ↑
                             p

To allocate memory, we just increment p by the number
of bytes we need.
If we run out of memory, we just allocate a new block, point p
to its start, and carve the memory out of the new block.
Destroying the pool consists of freeing all the blocks.

Note also that this memory arrangement has very good locality.
Instead of scattering the strings all over the heap, they are
collected into one location. Furthermore, they are stored in
memory in exactly the order we're going to access them,
which means no wasted page faults or cache lines.
(Well, you don't know that's the order we're going to access them,
but it's true.
This is one of those
"performance-guided designs"
I mentioned a little while ago.)

Each block of memory we allocate begins with a
StringPool::HEADER structure, which we use
to maintain a linked list of blocks as well as providing enough
information for us to free the block when we're done.

Exercise: Why is HEADER a union
containing a structure rather than just being a structure?
What is the significance of the alignment member?

At construction, we compute the size of our chunks.
We base it on the system allocation granularity, choosing
the next multiple of the system allocation granularity
that is at least sizeof(HEADER) + MIN_CBCHUNK in size.
Since a chunk is supposed to be a comfortably large block of
memory, we need to enforce a minimum chunk size to avoid having
an enormous number of tiny chunks if we happen to be running on
a machine with a very fine allocation granularity.
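That "next multiple that is at least a minimum size" computation is ordinary round-up arithmetic. A sketch (the parameters stand in for the system allocation granularity and sizeof(HEADER) + MIN_CBCHUNK from the text):

```cpp
#include <cassert>
#include <cstddef>

// Smallest multiple of `granularity` that is at least `cbMin`.
std::size_t ChooseChunkSize(std::size_t granularity, std::size_t cbMin)
{
    return (cbMin + granularity - 1) / granularity * granularity;
}
```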

To allocate a string, we first try to carve it out of the
remainder of the current chunk. This nearly always succeeds.

If the string doesn't fit in the chunk, we allocate a new chunk
based on our allocation granularity.
To avoid integer overflow in the computation of the desired
chunk size, we check against a fixed "maximum allocation" and
go straight to the out-of-memory handler if it's too big.

Once we have a new chunk, we link it into our list of
HEADERs and abandon the old chunk.
(Yes, this wastes some memory, but in our usage pattern,
it's not much, and trying to squeeze out those last few bytes
isn't worth the added complexity.)
Once that's done, we try to allocate again; this second time
will certainly succeed since we made sure the new chunk was big
enough. (And any decent compiler will detect this as a tail
recursion and turn it into a "goto".)

There is a subtlety here. Notice that we do not update
m_pchNext until after we're sure we either
satisfied the allocation or allocated a new chunk.
This ensures that our member variables are stable at the points
where exceptions can be thrown.
Writing exception-safe code is hard, and
seeing the difference between code that is and isn't exception
safe is often quite difficult.
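The code itself isn't reproduced in this copy, so here is a portable sketch of the design. The member names follow the text; the fixed 32000-byte chunk size and malloc are stand-ins for the granularity-based allocation described above, and the HEADER union illustrates the alignment trick asked about in the exercise:

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <new>

class StringPool {
public:
    StringPool() : m_pchNext(nullptr), m_pchLimit(nullptr), m_phdrCur(nullptr) {}
    ~StringPool()
    {
        // Destroying the pool frees every chunk (and every string) at once.
        HEADER* phdr = m_phdrCur;
        while (phdr) {
            HEADER* phdrPrev = phdr->s.phdrPrev;
            std::free(phdr);
            phdr = phdrPrev;
        }
    }

    // Copy a string into the pool; the copy lives until the pool dies.
    char* DupStr(const char* psz)
    {
        std::size_t cch = std::strlen(psz) + 1;
        char* pch = Alloc(cch);
        std::memcpy(pch, psz, cch);
        return pch;
    }

private:
    union HEADER {
        struct {
            HEADER* phdrPrev;   // linked list of chunks
            std::size_t cb;     // size of this chunk
        } s;
        // The union pads HEADER to a strict alignment boundary so the
        // bytes carved out right after it are safely aligned too.
        long double alignment;
    };
    enum { CHUNK_SIZE = 32000 };

    char* Alloc(std::size_t cb)
    {
        if (!m_phdrCur ||
            cb > static_cast<std::size_t>(m_pchLimit - m_pchNext)) {
            // Doesn't fit: guard against absurd requests, then abandon
            // the tail of the current chunk and start a new one.
            if (cb > CHUNK_SIZE - sizeof(HEADER)) throw std::bad_alloc();
            HEADER* phdr = static_cast<HEADER*>(std::malloc(CHUNK_SIZE));
            if (!phdr) throw std::bad_alloc();
            phdr->s.phdrPrev = m_phdrCur;
            phdr->s.cb = CHUNK_SIZE;
            // Members are updated only after nothing more can throw,
            // keeping the pool stable in the face of exceptions.
            m_phdrCur = phdr;
            m_pchNext = reinterpret_cast<char*>(phdr + 1);
            m_pchLimit = reinterpret_cast<char*>(phdr) + CHUNK_SIZE;
        }
        // Allocation is just a pointer bump.
        char* pch = m_pchNext;
        m_pchNext += cb;
        return pch;
    }

    char* m_pchNext;    // next free byte
    char* m_pchLimit;   // one past the last free byte
    HEADER* m_phdrCur;  // current chunk
};
```

(The original retries the allocation after grabbing a new chunk, relying on tail-call optimization; this sketch simply falls through to the bump, which amounts to the same thing.)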

And finally, we pass our string pool to
DictionaryEntry::Parse so it knows where
to get memory for its strings from.

With these changes, the dictionary loads in 70ms
(or 80ms if you include the time it takes to destroy the
dictionary).
This is 70% faster than the previous version,
and over three times as fast if you include the destruction time.

And now that we've reached our 100ms goal, it's a good time to stop.
We've gotten the running time of dictionary loading down from
an uncomfortable 2080ms to a peppier 70ms, a nearly 30-fold improvement,
by repeatedly profiling and focusing on where the most time is
being spent.
(I have some more simple tricks that shave a few
more milliseconds off the startup time.
Perhaps I'll bring them into play if other changes to startup
push us over the 100ms boundary.
As things stand, the largest CPU consumers are
MultiByteToWideChar and lstrcpynW,
so that's where I would focus next.)

That's the end of the first stage. The next stage will be
displaying the dictionary in an owner-data listview, but you'll
have to wait until next month.

One of the tests performed by
Windows Hardware Quality Labs (WHQL)
was the NCT packet stress test
which had the nickname "Hell".
The purpose of the test was to flood a network card
with an insane number of packets, in order to see how it
handled extreme conditions.
It uncovered packet-dropping bugs, timing problems, all sorts
of great stuff.
Network card vendors used it to determine what size internal
hardware buffers should be in order to cover "all reasonable
network traffic scenarios".

It so happened that at the time this test had currency (1996 era),
the traffic on the Microsoft corporate network was approximately
1.7 times worse than the NCT packet stress test.
A card could pass the Hell test with flying colors,
yet drop 90% of its packets when installed on a computer
at Microsoft because the card simply couldn't keep up
with the traffic.

The open secret among network card vendors was,
"If you want your card to work with Windows, submit one card
to WHQL and send another to a developer on the
Windows team."

(This rule applied to hardware other than network cards.
I was "gifted" a sound card from a major manufacturer
and installed it on my main machine.
It wasn't long before I found and fixed a
crashing bug in their driver.)

Since it was the Big5 dictionary we downloaded,
the Chinese characters are in Big5 format,
known to Windows as code page 950.
Our program will be Unicode, so we'll have to convert it as we load
the dictionary. Yes, I could've used the Unicode version of the
dictionary, but it so happens that when I set out to write this program,
there was no Unicode version available.
Fortunately, this oversight opened up the opportunity to illustrate
some other programming decisions and techniques.

The first stage in our series of exercises will be loading the dictionary
into memory.

Our dictionary is just a list of words with their English definitions.
The Chinese words are written in three forms
(traditional Chinese,
simplified Chinese, and
Pinyin romanization).
For those who are curious, there are two writing systems
for the Mandarin Chinese language and two phonetic systems.
Which one a particular Mandarin-speaking population follows depends
on whether they fell under the influence of China's
language reform of 1956.
Traditional Chinese characters and the Bopomo
phonetic system
(also called Bopomofo)
are used on Taiwan; simplified Chinese characters
and the Pinyin system are used in China.
Converting Pinyin to Bopomo isn't interesting,
so I've removed that part from the program I'm presenting here.

(The schism in the spelling of the English language follows a similar
pattern.
Under the leadership of Noah Webster,
the United States underwent its own spelling reform,
but countries which were under the influence of the British crown
retained the traditional spellings.
Spelling reform continues in other languages even today,
and the subject is almost always highly contentious,
with traditionalists and reformists pitted against each other
in a battle over a language's—and by proxy,
a culture's—identity.)

The program itself is fairly straightforward.
It creates a Unicode file stream wifstream
and "imbues" it with code page 950 (Big5).
This instructs the runtime to interpret the bytes of the file
according to the specified code page.
We read strings out of the file, ignore the comments,
and parse the rest, appending them to our vector
of dictionary entries.

Parsing the line consists of finding the spaces, brackets,
and slashes, and splitting the line into the traditional Chinese,
Pinyin, and English components. (We'll deal with simplified
Chinese later.)
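A simplified illustration of that split (not the program's actual DictionaryEntry::Parse; it treats the line as raw bytes, so it works on the Big5 or any other encoding, and it keeps multiple slash-separated definitions as one string):

```cpp
#include <cassert>
#include <string>

// Illustrative: split "trad [pinyin] /english/" at the spaces,
// brackets, and slashes.
struct Entry {
    std::string trad;
    std::string pinyin;
    std::string english;
};

bool ParseLine(const std::string& line, Entry& e)
{
    std::size_t open  = line.find('[');
    if (open == std::string::npos) return false;          // comment or junk
    std::size_t close = line.find(']', open);
    if (close == std::string::npos) return false;
    std::size_t slash1 = line.find('/', close);
    std::size_t slash2 = line.rfind('/');
    if (slash1 == std::string::npos || slash2 <= slash1) return false;

    e.trad    = line.substr(0, line.find(' '));
    e.pinyin  = line.substr(open + 1, close - open - 1);
    e.english = line.substr(slash1 + 1, slash2 - slash1 - 1);
    return true;
}
```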

When I run this program on my machine, the dictionary loads in 2080ms
(or 2140ms if you include the time to run the destructor).
This is an unacceptably long startup time, so the first order
of business is to make startup faster. That will be the focus of
this stage.

Notice that as a sanity check, I print the total number of words in the
dictionary. The number should match the number of lines in the
cedict.b5 file (minus the one comment line).
If not, then I know that something went wrong.
This is an important sanity check:
You might make a performance optimization that looks great
when you run it past a stopwatch,
only to discover that your "optimization" actually introduced a
bug. For example, one of my attempted optimizations of this program
resulted in a phenomenal tenfold speedup,
but only because of a bug that caused it to think it was finished
when it had in reality processed only 10% of the dictionary!

As my colleague
Rico Mariani is fond of saying,
"It's easy to make it fast if it doesn't have to work!"

The other day, one of my colleagues
mentioned that his English name "Ben" means
"stupid" in Chinese: 笨/bèn/ㄅㄣˋ.
(His wife is Chinese; that's why he knows this in the first place.)
Knowing that the Chinese language is rich in homophones, I fired
up my Chinese/English dictionary program to see if we could find
anything better.
(Unfortunately, the best I could come up with was
賁/贲/bēn/ㄅㄣ,
which means "energetic".)

Ben seemed to take his appellative fate in stride; he seemed
much more interested in the little dictionary program I had written.
So, as an experiment, instead of developing tiny
samples that illustrate a very focused topic,
I'll develop a somewhat larger-scale program (though still small by
modern standards) so you can see how multiple techniques
come together.
The task will take many stages, some of which may take only a day
or two, others of which can take much longer.
If a particular stage is more than two or three days long,
I'll break it up with other articles,
and I'll try to leave some breathing room between stages.

If you're going to play along at home, beware that you're going
to have to install Chinese fonts to see the program as it evolves,
and when you're done, you'll have a Chinese/English dictionary program,
which probably won't be very useful unless
you're actually studying Chinese...

If you're not into Win32 programming at all, then, well,
my first comment to you is, "So what are you doing here?"
And my second comment is,
"I guess you're going to be bored for a while."
You may want to
go read another blog during
those boring stretches, or just turn off the computer and go outside
for some fresh air and exercise.

When a program starts or when a DLL is loaded,
the loader builds a dependency tree of all the DLLs
referenced by that program/DLL, that DLL's dependents, and so on.
It then determines the correct order in which to initialize
those DLLs so that no DLL is initialized until after all the
DLLs upon which it is dependent have been initialized.
(Of course, if you have a circular dependency, then this falls apart.
And as you well know, calling
the LoadLibrary function
or
the LoadLibraryEx function
from inside a DLL's DLL_PROCESS_ATTACH notification also messes up
these dependency computations.)

Similarly, when you unload a DLL or when the program terminates,
the de-initialization occurs
so that a DLL is de-initialized after all its dependents.

But when you load a DLL manually,
crucial information is lost: Namely that the DLL that is calling
LoadLibrary depends on the DLL being loaded.
Consequently, if A.DLL manually loads B.DLL, then there is no
guarantee that A.DLL will be unloaded before B.DLL.
This means, for example, that code like the following is
not reliable:
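(The code sample from the original post didn't survive in this copy; the following is a reconstruction of the shape of code meant, with illustrative names.)

```cpp
// A.DLL - this pattern is not reliable
#include <windows.h>

HINSTANCE g_hinstB;
typedef void (*BFUNC)();
BFUNC g_pfnB;

void UseB()
{
    g_hinstB = LoadLibrary(TEXT("B.DLL"));
    if (g_hinstB)
        g_pfnB = (BFUNC)GetProcAddress(g_hinstB, "SomeFunction");
}

BOOL WINAPI DllMain(HINSTANCE hinst, DWORD dwReason, LPVOID lpvReserved)
{
    if (dwReason == DLL_PROCESS_DETACH && g_pfnB) {
        g_pfnB(); // oops
    }
    return TRUE;
}
```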

At the line marked "oops", there is no guarantee that
B.DLL is still in memory because B.DLL
does not appear in the dependency list of A.DLL,
even though there is a runtime-generated dependency caused by
the call to LoadLibrary.

Why can't the loader keep track of this dynamic dependency?
In other words
when A.DLL calls LoadLibrary(TEXT("B.DLL")),
why can't the loader automatically say "Okay, now A.DLL depends
on B.DLL"?
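The code illustrating why it can't is also missing from this copy; a minimal reconstruction of the scenario, in which A.DLL loads B.DLL through a helper exported from a hypothetical MIDDLE.DLL:

```cpp
// A.DLL
void LoadB()
{
    MiddleFunction(TEXT("B.DLL"));
}

// MIDDLE.DLL
HINSTANCE MiddleFunction(LPCTSTR pszDll)
{
    // The loader sees MIDDLE.DLL making this call, not A.DLL.
    return LoadLibrary(pszDll);
}
```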

In this scenario, the load of B.DLL happens
not directly from A.DLL, but rather through
an intermediary (in this case, MiddleFunction).
Even if you could trust the return address, the dependency
would be assigned to MIDDLE.DLL instead of
A.DLL.

Sometimes people ask for features that are such blatant security
holes I don't know what they were thinking.

Is there a way to get the current user's password?
I have a program that does some stuff, then reboots the system,
and I want to have the current user's password so I can log
that user back in when I'm done, then my program can resume
its operation.

Imagine the fantastic security hole if this were possible.
Anybody could write a program that steals your password
without even having to trick you into typing it.
They would just call the imaginary
GetPasswordOfCurrentUser function and bingo!
they have your password.

Even if you didn't want the password itself but merely some
sort of "cookie" that could be used to log the user
on later, you still have a security hole.
Let's call this imaginary function
GetPasswordCookieOfCurrentUser;
it returns a "cookie" that can be used to log the user on
instead of using their password.

This is just a thinly-disguised GetPasswordOfCurrentUser
because that "cookie" is equivalent to a password.
Log on with the cookie and you are now that person.