Issue 3-31, August 5, 1998

Be Engineering Insights: "Threads Don't Like Playing Ping Pong," Part II

By Pierre Raynaud-Richard

Last week, in the first part of this article
Be Engineering Insights: "Threads Don't Like Playing Ping Pong," Part I
(which we strongly encourage you to read or read again before pursuing),
we started studying the behavior of various systems of threads sharing a
critical section through two different locking mechanisms: semaphores
(TEST 1—An Ideal World), and benaphores (TEST 2—An Idealer World).

We then discovered that when two of those threads are in fair
competition, a lot of unnecessary context switches are generated (TEST 3
—A Small Dose of Reality) as the threads continuously exchange control
of the CPU (that's what we call playing ping pong), and benaphores lose
most of their advantage (TEST 4—The Benaphore Curiosity). This ping
pong game clearly affected the efficiency of the code, but as all tests
were run in an ideal world (those threads were the only serious
consumers of CPU cycles), the speed hit was only in the range of 1.4 to
2.5 times slower.

In this second article, we try to come closer to reality by adding more
noise (more consumers of CPU cycles) to our test environment, but in a
controlled fashion that still gives us a chance to understand what's
really happening.

TEST 5—Introducing the Dummy

It's time to introduce our third player, the dummy thread. Its only goal
in life is to stay around and use whatever CPU cycles are available, if
any. It never executes the same critical section as the other threads, so
it never conflicts with them: it's just an independent consumer of CPU
cycles.

As in test 1, we have a single B_NORMAL_PRIORITY thread passing through a
semaphore. To disturb the environment a little, we introduce an unrelated
B_LOW_PRIORITY "dummy" thread. This dummy thread doesn't contend for the
semaphore, and it's running at a lower priority, so it shouldn't really
affect the graph of our "real" thread. Right?

And we're not disappointed. The main spike and "echoes" are almost
identical to those in test 1 (A:99.64% & 3.54us; B:0.31% & 8.62us;
C:0.02% & 30.68us).

The only small difference (D) is the appearance of an infrequent (0.01%)
hit at about 3020us. This represents the dummy thread being scheduled
for its full 3000us scheduler quantum.

Analogously, the unlocking graph (not shown) looks nearly identical to
the unlocking graph in test 1, but with the addition of an infrequent
3020us hit. So in the case of a single thread, adding an unrelated low
priority thread doesn't significantly disturb the system, which is just
what we hoped for.

TEST 6—We Go to Hell

Now let's add a second real thread to contend for the semaphore. Just as
test 5 was similar to test 1, here we hope to see results similar to test
3 (two threads with no dummy), with the additional infrequent 3020us hit.

Holy moly, what the hell happened? We wanted order—we got a mess. Not
only are the spikes all over the place, but there aren't enough samples.
Until now, we'd been getting on the order of 100,000 samples per test.
This time we only got about 2,000 (1,000 per thread).

Rather than analyze the spikes, let's look at this test from the CPU's
perspective. We see that in the tests without the dummy thread, a single
real thread spends about 45% of its time in the critical section—the
rest is used for locking. With two real threads and no dummy, the "real
work" percentage drops to 34% because of the additional context switch
overhead. But with two threads and a dummy, the real work plummets to
1.5%!

Now let's consider what happens when the scheduler runs. Here's what it
sees:

one B_NORMAL_PRIORITY thread is blocked (it can't be rescheduled).

the other B_NORMAL_PRIORITY thread is ready to run.

the dummy B_LOW_PRIORITY thread is (always) ready to run.

Most of the time, the scheduler chooses the available normal priority
thread, which runs through the critical section and blocks soon after,
causing the scheduler to run again. This doesn't take long—as we've
seen, it takes less than 13us.

Less often, the scheduler chooses the low priority thread. Since this
thread has nothing to block on, it uses up the entire 3000us scheduling
quantum. In other words, although the low priority thread is chosen much
less often, it runs more than 200 times *longer* than the high priority
threads.

Intermission

Here we are in the middle of Act II. There are two bloody threads on the
floor and it looks like the dummy did it. Or was it the kernel scheduler?

No, it's not the scheduler's fault. The scheduler only guarantees that a
thread of higher priority will be *scheduled* more often than a thread of
lower priority as long as they are both ready to run. It can't guarantee
that the higher priority thread will *run* longer than the low priority
thread. If the higher priority thread is "stupid" enough to block itself
quickly, it will starve, higher priority or not. That's exactly what
happened in this case.

In some cases, you can avoid this starvation problem by staying in the
critical section as long as you can—as long as is fair to other
competing threads. Let's try it and see what happens.

TEST 7—We're Rescued

This is similar to test 6 (two real threads, one dummy), but with a
change to the locking loop.

We set KEEP_IT_DELAY to 2000us, a little less than the standard scheduler
quantum. In other words, once a thread gets ownership of the critical
section, it keeps using it longer than in the previous tests. Also notice
that we're using a benaphore in this test.

Furthermore, the test records the latency for each execution of the
do_critical_section() function, not just once between each pair of
locking and unlocking operations. For consistency with the other graphs,
we timed the do while() loop mechanism as an equivalent of
locking/unlocking, even though that's not really what it is, so we
expect to get a lot of hits close to 0 latency.

We notice two things immediately: we got our 100,000 samples back (don't
forget that there are twice as many samples as what we see here, since we
have two working threads), and the latency is drastically improved.

(A) This is a count of the inner loop iterations that ran during the
2000us "free time." It's close to 0, as we predicted. The probability
is extremely high (99.65%), but mostly that's because we're counting
calls to do_critical_section() rather than real, entire critical
sections (it isn't an unfair count—it's just no longer meaningful for
comparison with the other tests).

(B) This is the ping pong case. The latency (2036us) is about what we
expect; it means this thread is blocked waiting for the other real
thread to use up its 2000us KEEP_IT_DELAY.

(C) This latency, at about 5000us, is what happens when the thread has
to wait for both the other real thread AND the dummy thread. Its
frequency is lower than that of B, which means the dummy runs less
often than the real threads, as expected.

(D) "scheduler echo" for A.

But the real test is the CPU usage. Here we see that 73% of the global
CPU time is spent doing real work. This is an overly idealized number,
since our inner loop is unrealistically efficient (it can ALWAYS consume
the entire 2000us we give it).

TEST 8—Apples and Oranges

How much you actually gain by using the "saturate the quantum" solution
compared to the plain one-lock/one-inner-loop method depends on the
duration of the inner loop. Here are some measurements comparing the
semaphore, benaphore, and benaphore+saturation methods for two normal
priority threads, with and without a low priority dummy thread, for
various lengths of the inner loop in microseconds (column "Duration").
The numbers in the body of the table are the fraction (in percent) of
the CPU time really used for processing the do_critical_section()
function.

Here we see the advantage of the saturation method even with a long inner
loop. Also, notice that the benaphore is still slightly more efficient
than the semaphore.

IN THE LOBBY

The problem we identified (and fixed!) in the app server is exactly what
we've been pointing out here. The names have been changed (our real
threads were running at B_DISPLAY_PRIORITY and the "dummies" at
B_NORMAL_PRIORITY), but the behaviour was the same.

And please, don't have any illusions about the numbers and statistics
given in this article. The tests were run in unrealistically ideal
circumstances: The system noise was reduced to a very low level, we never
used more than three threads and one lock, and the memory system of the
machine was not stressed at all. Real life is, of course, very very
different.

But beyond the numbers, this article pointed out some general tendencies
about the dynamics of synchronization systems, some problems that can
occur and some of the bad side-effects they can generate.

We won't pretend that this was all-inclusive (we didn't even speak about
a multiple-reader/ one-writer mechanism for example) or perfectly
accurate, but we hope that it will give you hints about how to design
more efficient and more stress-proof synchronization mechanisms. Luckily
enough, most of you will never have to care about such issues. As for the
other ones, please remember at least one thing:

"Threads don't like to play ping pong."

Post Script: For those who followed everything carefully, here's a puzzle
for you: which test configuration does this locking graph come from?

For the sake of our mail server, I won't be including the entire source
code in these articles anymore. You can download the source code for this
week's project from the Be FTP site:

ftp://ftp.be.com/pub/samples/intro/TextEditor.zip

Now that you're caught up, and have this week's source code, let's get
started. Note that the TextApp and
TextWindow classes and their related
functions have been separated into two source files, and that there are
now separate header files for each class. This makes it easier to follow
the code.

This week, we'll add open and save file panels to our text editor, along
with the code needed to actually save and load document files.

Look at the TextApp.h file, in
the TextApp class definition. There are a
few new items in here: a RefsReceived() function, which is a standard
member of the BApplication class; and the
BFilePanel pointer, which will
be used for the open document panel.

The TextApp constructor now includes the following line, which creates
the open file panel (the panel isn't made visible, it's just created):

openPanel = new BFilePanel;

Note also that the windowRect is now a global variable and is no longer
messed around with in TextApp's constructor. Since you can now create
documents from multiple sources, we need more flexibility with this
rectangle, so it can't remain local. The WINDOW_REGISTRY_ADD handler in
TextApp::MessageReceived() has changed so that a reply is always sent
(instead of just when the window is untitled), and the reply includes a
new B_RECT_TYPE field, rect, that specifies the frame rectangle to use
for the new window.

MessageReceived() has one addition to the switch statement that
dispatches messages:

case MENU_FILE_OPEN:
openPanel->Show();
break;

When the Open option is selected from a window's File menu, the request
is dispatched here, to the TextApp::MessageReceived() function (we'll
show how that's done when we get to it). This makes the open file panel
visible, so the user can choose a file (or files) to open.

The TextApp::RefsReceived() function receives as an argument a pointer to
a message that indicates what file or files should be opened. This
message contains a single field, refs, which contains one or more
entry_ref structures, each indicating a file to open.

RefsReceived() is called whenever a user selects files to open in a file
panel, drags files onto the application's icon, or double-clicks them in
the Tracker.

The code is fairly simple. A do loop iterates through all the items in
the refs list until an error occurs and the loop ends. For each
entry_ref found, a new window is created:

new TextWindow(windowRect, &ref);

This uses a new TextWindow constructor that accepts an additional
argument—an entry_ref indicating what file to load up into the window.
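Sketched out (BeOS-specific code, shown for illustration only; the article's do loop is written here as a while loop, with the loop condition doing the error check), RefsReceived() looks something like this:

```cpp
void TextApp::RefsReceived(BMessage *message) {
    int32 index = 0;
    entry_ref ref;

    // Walk the "refs" array until FindRef() returns an error.
    while (message->FindRef("refs", index++, &ref) == B_OK) {
        new TextWindow(windowRect, &ref);
    }
}
```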

Let's look at the changes to the TextWindow class, starting with the
class definition itself.

There's a lot of new stuff in here, including the second TextWindow
constructor, the private _InitWindow() function, and the saveitem,
savemessage, window_id, and savePanel fields.

The _InitWindow() function (take a look, don't be shy) now handles most
of the user interface setup for a new window. This lets us consolidate
the code that's shared by both versions of the TextWindow constructor.
The first constructor now calls _InitWindow()
and Show() to set up the
window and make it visible.

The second constructor is called when the new window needs to have a file
loaded into it. _InitWindow() is called to create the window, then the
BFile object, file, is set to the
entry_ref of the file to open. If that
succeeds, we can read in the file by getting its length (via the
BFile::GetSize() call), allocating enough memory to contain the file's
data, and reading the file into the newly allocated block.

A little trick follows that will make more sense later: we create a
B_SAVE_REQUESTED message referring to this file. This is the message
that's sent to the TextWindow::MessageReceived() function when it's time
to save the file, and is how we keep track of where the file is located
on disk. This message contains an entry_ref named directory that
indicates the directory in which the file is located, and a string called
name that shows the name of the file itself. This message is kept in
the savemessage field of the TextWindow object.

Finally, we use BTextView::SetText() to set the window's text, set its
title to the name of the file, and then free the allocated text buffer,
before showing the window and returning. Once this is done, a new
document window is created, with the name of the file that's been opened;
the text in the window is the contents of the file. The window won't be
visible until it's un-Minimize()d, as described below.

The _InitWindow() function's code is mostly the same as the user
interface setup code from Part 4's TextWindow constructor, with these
exceptions:

First, the following line is added so we know there isn't currently a
savemessage available (which means the file has never been saved, and
wasn't loaded from disk, so we don't know where to save it without asking
the user):

savemessage = NULL; // No saved path yet

Second, the File menu's Open
item's target is changed to point to the
TextApp object, so it will handle open requests.

Third, we keep a pointer to the Save item, so we can enable it later
when the document has been saved for the first time using the
Save as
option.

Fourth, to support the Undo option, we add the following line:

textview->SetDoesUndo(true);

Fifth, we need to create the Edit menu and populate it with the options
we want. This is similar to adding the File menu. Each item is added, and
its target is set to be the BTextView, so the text view can handle the
commands Undo, Cut, Copy, Paste, and Select All.

Sixth, the savePanel needs to be created.
This is a B_SAVE_PANEL-type
BFilePanel object.

The target for messages sent by the file panel is the TextWindow, so the
TextWindow::MessageReceived() function needs to be augmented to handle
them.

Finally, _InitWindow() calls
Minimize() to minimize the new window. This
is a nifty trick that means that even when BWindow::Show() is called, the
window won't appear on screen, because it's minimized. This lets us play
around with the window's title, position, and size before making the
window visible. We have to Show() the window to do these things, because
it won't receive any messages until it's shown, and we have to receive
the WINDOW_REGISTRY_ADDED message so we know what name and position to
use for the new window. Show() is called by the constructors, so it's not
included in _InitWindow().

The TextWindow destructor has been augmented to delete
the savemessage
(if it exists) and the savePanel.

The MessageReceived() function has been augmented as follows:

The WINDOW_REGISTRY_ADDED handler has been changed so that the window's
title is set to "Untitled #" only if there isn't a savemessage attached;
this lets the name we've specified when opening a new document actually
stick (otherwise all windows would be given the name "Untitled#").

In addition, the rect parameter is checked, and if it exists, the
window is repositioned to occupy the area of the screen indicated. The
registry then determines where each window should be drawn. This allows
easy custom window placement later (for instance, the current code
doesn't handle the possibility that eventually you'll have enough windows
that they'll wind up off screen—you should be able to fix this easily
by correcting the WINDOW_REGISTRY_ADD code in
the TextApp class. I'll
leave that as an exercise).

Handlers have also been added for the MENU_FILE_SAVEAS,
MENU_FILE_SAVE,
and B_SAVE_REQUESTED commands. The BFilePanel is shown when
MENU_FILE_SAVEAS is received. When
MENU_FILE_SAVE is received, the Save()
function, which we'll examine next, is called, with a NULL argument
passed. This indicates to Save() that the savemessage already attached to
the TextWindow should be used to determine where to save the file.
B_SAVE_REQUESTED indicates that the document should be saved at the
location specified by the message, which is passed through to Save().

The Save() function handles saving the document's text. When we opened
the file, we used a BFile object to handle the file access. Here, just to
show that it can be done, we'll use standard POSIX file commands.

Save() accepts a single argument: a pointer
to a BMessage that should be
formatted in B_SAVE_REQUESTED style, indicating where to save the file.
If NULL is specified, the savemessage
field in the TextWindow will be
used.

Save() begins by checking to be sure
that if NULL is specified, the
savemessage has actually been initialized. If it
hasn't been, B_ERROR is returned.

Then the entry_ref of the save directory and the name of the file to be
saved are peeled out of the message. A BEntry is set to the save
directory, and BEntry::GetPath() is called to
obtain a BPath object
representing that directory. BPath::Append() is then called to append the
filename to the path.
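In outline, that unpacking goes something like this (BeOS-specific code inside Save(), shown for illustration only, with error handling abbreviated):

```cpp
entry_ref dirRef;
const char *name;
// Pull the directory and file name out of the B_SAVE_REQUESTED message.
if (message->FindRef("directory", &dirRef) != B_OK)
    return B_ERROR;
if (message->FindString("name", &name) != B_OK)
    return B_ERROR;

BEntry entry(&dirRef);        // the save directory...
BPath path;
entry.GetPath(&path);         // ...as an absolute path...
path.Append(name);            // ...with the file name appended
```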

Now that we have a BPath representing the location of the file to be
saved, we call fopen() to open the file—the path name, as a standard
string, is extracted by calling BPath::Path().

Then we just use standard POSIX file I/O calls to write the file and
close it.

The BTextView::Text() function returns the
contents of the BTextView, and
BTextView::TextLength() returns the size of the text in bytes.

If the write was successful, the window's title is set to the document
name (in case it's been changed), the Save item in the File menu is
enabled, and the savemessage is changed to indicate the new location of
the file by deleting the existing message (if there is one) and replacing
the pointer with a pointer to the message passed to Save(). If the
original message and the new message are the same, this is unnecessary
(and dangerous) so we don't do it.

That's all for this time. This application is essentially complete; you
might want to add code to present alerts to the user if an error occurs
(errors are, currently, blissfully ignored). Check out the BAlert class
in the Be Developer's Guide. Try to fix the registry to guarantee that
windows will always appear on screen, instead of eventually disappearing
off the bottom-right corner.

Next time, we'll start a new project, focusing on areas of the BeOS we
haven't investigated yet (there are plenty of them!).

Relationships vs. Transactions

By Jean-Louis Gassée

This week's topic suggested itself as I was attempting to reorganize my
e-mail inbox. We regularly get requests for two items: a downloadable
version of the BeOS for Intel Architecture processors, and a trial
version of the BeOS. I hope my discussion of these two topics will shed
some light on what I mean by "Relationships vs. Transactions."

Let me start with the first item. Why don't we have a downloadable
version of the IA (Intel Architecture) BeOS? Are we deaf or blind to what
our customers want? Wouldn't it make more sense for would-Be users and
for Be to deliver the BeOS electronically? Why acquire StarCode and chant
the merits of SoftwareValet technology, why invest in BeDepot.com
e-commerce and not offer an Intel download similar to the PowerPC
download?

Similar is the operative word. Dated is another—our PowerPC download
is not current. There isn't a version of the BeOS for G3 machines,
because Apple declined to supply us with the technical data required to
make sure the BeOS works on their new hardware. For our part, we declined
to engage in arguments. It's their right to withhold such information
and, in any case, we assume they're busy enough turning the ship around,
with good results lately.

But that's no excuse for the lack of a PC download. Let's take another
look at the facts. First, assuming you don't already have partitioning
and boot manager software, you would have to download site preparation
software and run it—that is, create a partition for the BeOS and
install a boot manager.

Then, using your Windows browser, you'd have to download the BeOS and
another Windows application that would take the BeOS download in your
Windows partition and write it onto the BeOS partition you just created.
Finally, you'd tell the boot manager the BeOS has landed and you're in
business.

Contrast this with the current physical distribution method for the BeOS.
You get two CDs and one floppy. One CD is for the PowerPC version of 3.1;
it works on older, pre-G3 PowerPCs. The other is for the Intel-based 3.1;
it contains Windows partitioning tools and BeOS files.

Once you've partitioned your hard disk, the floppy boots you into the
BeOS. From that point installation proceeds from the CD onto the BeOS
partition, without having to be stored in the Windows partition first
before being transferred into the BeOS space.

We realize that the industry has already moved into a phase where
software is increasingly available in a downloadable form. This creates
an entirely normal expectation of being able to download the BeOS, just
as you can the latest version of Netscape Communicator or the "free"
version of Outlook 98.

What this expectation omits is the fact these products are downloaded
from Windows for Windows. In our case, we have to download from Windows
and install in a different world. Now add two more complications:
partitioning and hardware support. In our constantly recalculated
opinion, this makes the risk of starting our relationship with a bad
experience too great.

You'll recall some of the humbling stories I've told here, such as the
time one of our investors, and member of our board of directors, saw the
data on his spouse's PC disappear because of a problem involving the
switch between the Windows FAT 16 and FAT 32 file systems. It's not that we
wouldn't love the broader reach and higher volume of e-commerce
transactions we'd generate if the BeOS were delivered electronically
today. But...we don't think we've developed the knowledge and tools to
meet the standards and expectations created by the now customary practice
of downloading and installing into the same environment.

Looping back to the apparent inconsistency in our behavior—promoting
BeDepot.com e-commerce but not offering a BeOS for Intel download—
SoftwareValet does indeed promote the positive experience of downloading
and installing into the same environment, once you've installed the BeOS.
I'm aware that this may make our explanation of the situation sound
flimsy. Frustration at being unable to download the BeOS resurfaces,
because instant access is such a simple, powerful idea that it won't go
away; we can brush it off, but it keeps coming back.

All I can say for now is that we're very aware of that and feel bad about
not offering more than the explanation of our calculus of risk vs.
reward. We apologize for the frustration and we hope to offer a simpler,
safer solution in the future.

Which brings me to the trial version. "The Intel version isn't available
for download and, compounding your problems, you have the nerve to ask
people to buy your BeOS, sight unseen, without offering a trial version."
Add or subtract a few choice words, that's how some correspondents
question our personal or corporate sanity.

This is a little easier to explain—I mean answer. Please buy the
product and return it for a full refund if you're not satisfied, no
questions asked. We believe demands for a trial version to be entirely
justified. Unfortunately, we don't have a solution today, but we can get
around the problem with our offer of a straight refund.

The absence of a trial version isn't the only reason for the refund
offer: We can't afford even one unhappy customer. In our business,
word-of-mouth is still the most potent marketing weapon. At the very
least, we want someone who tries the BeOS to say, "Well, it wasn't for
me, but the Be folks treated me decently." At the very best, we want this
pioneer to buttonhole others and persuade them to share his bliss. So,
our offer of a refund would hold even if we had a trial version—or
perhaps I should say *when* we have a trial version.

We'd like to do just that. PCs have been bootable from the now standard
CD-ROM drive for a while, and we're wondering if it would be a good idea
to put together a version of the BeOS that boots from the CD-ROM. This
has many built-in limitations, starting with performance. On the other
hand, it requires no alteration of your hard disk, no partitioning, no
boot manager. To be continued when we have more data.

In any event, we have another motivation to do better than provide
logical but sometimes frustrating explanations—the threat of silence.
By this I refer to a simple fact: arguments keep us alive, silence kills.
I don't say this just because I was born and raised in Paris, where cab
drivers and café waiters love to argue about anything.

No, what I mean is this—as long as our beloved customer complains,
argues, and otherwise seeks to alter our behavior, she or he is doing us
two favors. First, we get a chance to put the relationship back on track
and, second, we get an earful of information we can put to good use in
other transactions. When the customer doesn't even bother to tell us
what's wrong, we're dead—we have no information and no opportunity to
apologize and correct a problem.

To all our honorable correspondents who care enough and hold us in high
enough regard to offer energetic feedback, thank you. To the others who
think we don't care or can't do anything about their problem, thank you
for reconsidering and giving us a chance to do the right thing.