It's a tremendous result for the museum; hopefully they will do a good job of curating the materials and will make them available for ongoing study, as there is still lots about this period of history that is not well known.

Saturday, February 26, 2011

If you haven't been paying a lot of attention to the strange case of HBGary, let me draw your attention to this essay by Paul Roberts of Kaspersky Labs: RSA 2011: Winning the War But Losing Our Soul. It's the most thoughtful and reflective analysis I've seen yet, and focuses the discussion where I think it is most useful: on the ethics and morality of the entire episode. Here's the crucial quote:

Focus on cyber crime and hacking in the same way as you focus on other types of crimes: as long term problems that must be managed within the "context of normal life," rather than "wars" that pose an existential threat to those involved and must be won at all costs.

Monday, February 21, 2011

In 45 minutes that will just fly by (because he's such a good speaker, and organizes his material so well), this wonderful speech by Vint Cerf, given as the keynote address at the annual Linux conference in Australia, manages to cover pretty much everything there is to know about the current state of the Internet.

Mr. Montgomery and his associates received more than $20 million in government contracts by claiming that software he had developed could help stop Al Qaeda's next attack on the United States. But the technology appears to have been a hoax, and a series of government agencies, including the Central Intelligence Agency and the Air Force, repeatedly missed the warning signs, the records and interviews show.

During the heyday of the anti-terrorist hysteria, Montgomery realized that he could capitalize on the activity:

the company won the attention of intelligence officials in Washington. It did so with a remarkable claim: Mr. Montgomery had found coded messages hidden in broadcasts by Al Jazeera, and his technology would decipher them to identify specific threats.

I remember the media reporting on such "messages", and it indeed had just enough plausibility that I can see how people were willing to believe it. There was, according to the article, just one problem: there were no messages; there was no such technology; Montgomery had simply made it all up.

Much more significant than the wasted money, which in truth is minute in the long litany of Homeland Security waste over the last decade, is the truly shocking realization that, based just on this, foolish, even tragic actions were almost taken:

In December 2003, Mr. Montgomery reported alarming news: hidden in the crawl bars broadcast by Al Jazeera, someone had planted information about specific American-bound flights from Britain, France and Mexico that were hijacking targets.

C.I.A. officials rushed the information to Mr. Bush, who ordered those flights to be turned around or grounded before they could enter American airspace.

Senior administration officials even talked about shooting down planes identified as targets because they feared that supposed hijackers would use the planes to attack the United States.

You should read the entire article: it's chilling and infuriating.

UPDATE: As several people have pointed out, you should also read the excellent article by Aram Roston, written 18 months ago: The Man Who Conned The Pentagon, published in (yes) Playboy Magazine. Separately, Mike Masnick points out that not only did the government spend millions of taxpayer dollars on this non-existent software, they also allowed Montgomery to patent it!

Sadly, this last decade has been one tragic story after another along these lines. The gigantic money trough that is the Department of Homeland Security has resulted in all too many fraudsters and con-men showing up to siphon off what they could. Just last month we learned that what appeared to be a respectable security consulting company was in fact headed by a CEO who faked his evidence and manufactured claims that he knew would excite interest and therefore lucrative contracts. Wired's Threat Level blog, together with Ars Technica, another well-respected technology site, did a wonderful job of reporting on the scandal in a pair of articles:

Of course, charlatans are nothing new in the security field -- there's a reason that Bruce Schneier never seems to lack for material to fill the "Doghouse" section of his monthly security newsletter.

But it's certainly dispiriting to have it once again confirmed that even the people who really truly ought to know better, such as the CIA and the Air Force, fall victim to these scams again and again. Here's a big "thank you" to the New York Times, to Ars Technica, to Wired, to people like Brian Krebs and Bruce Schneier, and to all those hard-working and dedicated journalists who refuse to be bluffed by the schemers and fakers and work to educate us about what is real and what is not.

The three projects all look at similar problems, but all are pointed in different directions:

Aether is concerned with single-system multi-core parallelism, and proposes the use of sophisticated high-concurrency non-blocking data structures and algorithms.

Megastore is concerned with offering traditional transaction-processing ACID guarantees on top of a massively-replicated infrastructure, and proposes a Paxos algorithm using timestamping for implementing a single logical log across the many-node systems.

Hyder, like Megastore, is concerned with multi-system scale-out, but proposes the use of multi-version database concurrency control and a melding protocol to merge the individual system transaction histories into a totally ordered outcome.

The melding protocol proposed by Hyder is the heart of the work, and is described in great detail. Here's the overview of the idea, from the abstract:

Each transaction executes on a snapshot, logs its updates in one record, and broadcasts the log record to all servers. Each server rolls forward the log against its locally-cached partial-copy of the last committed state, using optimistic concurrency control to determine whether each transaction commits.

Another very interesting aspect of the Hyder work is the tradeoff between traditional B-tree architecture and replication protocols, and in particular that there can be significant benefit in minimizing log record size:

An updated tree is logged in a transaction's intention. For good performance, it is important to minimize its size. For this reason, binary trees are a good choice. A binary tree over a billion keys has depth 30. A similar number of keys can be indexed by a four-layer B-tree with 200-key pages. But an update of that Btree creates a new version of four pages comprising the root-to-leaf path, which consumes much more space than a 30-node path.

The paper is somewhat indefinite with regard to this, observing that: "A comparative evaluation of these tradeoffs would be a worthwhile investigation." Sort of a details-are-left-as-an-exercise observation, unfortunately. However, this is active research and I'm pleased that they are sharing their thoughts and ideas; I'd always rather read about the road not taken, and why, than get everything all nicely wrapped up in a package, without the background information that helps understand how they arrived at these conclusions and what other ideas they considered.

All three papers are fascinating and worth reading, and I'm certainly enjoying this renaissance in transactional processing implementation of late. All of a sudden, the only thing I'm short of is time to read all these ideas!

Sadly, the article's headline promises much more than the story actually delivers; I thought the article was interesting, but ultimately frustrating.

Although the writer is able to secure interviews with Matt Cutts of Google, and with a mysterious "Mark Stevens" of an unnamed company (neither the company's name, nor "Mr. Stevens"'s name is revealed in the article, though it's hard to see why), the article ends up mostly teasing, and exposing very little about this seamy under-belly of the web search world.

We never find out exactly which company was gaming the Google results, nor exactly how.

We never find out whether Penney was aware of that company's techniques, or how they selected that company, or much of anything about Penney's relationship with the unnamed vendor that they used.

We never find out exactly what Google did about the situation, other than that "manual action" was taken.

The only concrete and specific piece of evidence that the article seems able to unearth is a bit about a web site in Switzerland which posted a strange and completely unrelated link to the Penney's web site, apparently via a company called TNX.net, but, as the article says: "Efforts to reach TNX itself last week via e-mail were not successful."

Clearly there is a thick curtain here, and the Times apparently decided that they needed to publish what they had, now, even though they had very little to actually write about, because the story, such as it was, was nearly over: "On Feb. 1, the average Penney position for 59 search terms was 1.3. On Feb. 8, when the algorithm was changing, it was 4. By Feb. 10, it was 52." The Times writer ultimately fails to provoke either Google or Penney to talk about the details of what happened; Google's Cutts flat-out refuses to say: "Mr. Cutts said he did not plan to write about the situation ... because Google's goal is to preserve the integrity of results, not to embarrass people." So, instead, the article wanders around, venturing into what the author himself notes is just "another hypothesis, this time for the conspiracy-minded."

This is clearly an important topic, and serious. There is real money at stake, and real questions about legality, and ethics, and transparency. It's frustrating that even as powerful an institution as the New York Times can't break through and bring some real sunlight into these hidden corners.

Saturday, February 12, 2011

I'm not a fanatic Test First Development developer, though I think there's a lot of value in that practice. I'd say I:

occasionally write my tests first,

usually write my tests simultaneously with my code (often with two windows open on the same screen), jumping back-and-forth, adding each new test case as soon as I think of it,

rarely write my tests afterwards.

I'm comfortable with that distribution; it feels about right to me.

Interestingly, my behavior changes dramatically when I'm fixing a bug, as opposed to working on new feature development. When I embark on a bug fix, I nearly always write the test first; I think the proportion may be as high as 95% of the time. I do this because:

it's extremely comforting to make a bug fix, and watch the test case flip from "failing" to "passing", while all the other test cases continue to pass,

but more importantly, I've found, over the years, that writing and refining the test case for the bug is just about the best process for isolating and refining my ideas about what precisely is wrong with the code, and how exactly the code should be fixed.

So, anyway, I've been writing a lot of tests recently, and as part of that effort I've been spending some time studying code coverage reports. I've been using the built-in gcov toolset that is part of the GNU Compiler Collection, and also using the nifty lcov tools that build upon gcov to provide useful and easy-to-read reports and displays.

I take a very pragmatic view when it comes to tests:

Testing is a tool; writing and running tests is one way to help ensure that you are building great software, which should be the only sort you even try to build.

Code coverage is a tool for helping you write tests. If you care about writing tests (and you should), then you should care about writing the best tests you can. Code coverage is something that can help you improve your tests.

I don't have any sort of religion about code coverage. I don't think that tests with higher coverage are mandatory; I don't think that there is some magic level of coverage that you must achieve. I think that anybody who is spending any time thinking about code coverage should immediately go read Brian Marick's excellent essay on how to use code coverage appropriately: How To Misuse Code Coverage.

However, I do think that, all things being equal, higher code coverage is better than lower code coverage, to wit:

If I add a new test case, or suite of cases, and overall code coverage goes up, I am pleased. The test suite is more comprehensive, and therefore more useful.

If, however, code coverage tells me that I've already got a lot of coverage in this area, then I need to think about other ways to improve my tests.

In my experience, there are often large gaps in test coverage, and there is often a lot of low-hanging fruit: writing a small number of simple cheap-to-run tests can quickly ensure that your tests are covering a much larger portion of your code.

Furthermore, studying your code coverage reports can help you think about new test cases to write. A good coverage tool (like lcov) will show you not just line coverage, but branch coverage, function coverage, and many other ways to think about how your tests are driving your code. Just sitting down and staring at code coverage reports, I always find that ideas for new tests just seem to leap off the screen.

And that's what I'm really looking for when I pull up the code coverage tool: inspiration. Writing tests is hard, but there are always more tests to write, and always ways to make my tests better, so any tool which helps me do that is a tool which will have a prominent place on my shelf.

Friday, February 11, 2011

I had a great time this morning; or, at least, about as much fun as a programmer can have.

I was debugging some new code I'd written, in a networked server process which contains a master loop that more-or-less looks something like the following:

... various code to set up variables ...
do {
    ... accept a new connection from a client ...
    ... fork a new process to handle that client ...
} while ( ! done );

The code is massively simplified, but for the purposes of this article it doesn't matter.

What does matter is the behavior that I saw, which was most puzzling:

I placed a printf() statement above the top of the do ... while loop, in the "set up variables" section.

That printf was executed each time I accepted a new connection!

Well, OK, I've sort of given it away, but I admit I was sorely puzzled: how could the printf statement be executed on each new connection, when that code wasn't even inside the loop?

OK, here is a reasonable place to stop and think for a second, to avoid spoiling the fun too much.

Have you figured out the answer?

Here it is: the printf statement actually wasn't executed each time through the loop. It was only executed once. But the printf statement was buffering its output, and the buffered output was sitting in the process's memory space. When the process forked a new child process, the fork system call naturally duplicated the buffered printf output; then, when the child process executed a completely unrelated printf call of its own, it also flushed (and hence apparently re-executed) the buffered output from the parent!

So, everything was fine, it was just that the interaction of a buffered and un-flushed bit of printf output, and the thorough duplication of process state by the fork API, made me think that the code was being run twice, when in fact it was just the data that was being duplicated.

When something stops working, you begin developing theories for why it doesn't work, and normally, you start with simple theories that involve things close to you, and only after you exhaust those possibilities do you expand your scope. Typically, you don't consider that there is a global conspiracy against you, or at least that's not usually your first theory.

When programming, and debugging, it's so easy to convince yourself that your theory is correct, and to find ways to force the evidence to match your theory. So when something seems impossible, stop and think: it's unlikely there is a global conspiracy against you; you're just looking in the wrong place!

Thursday, February 10, 2011

Where do I go to learn more about the implementation and behavior of the Mac OS X firewall?

I had a very strange situation recently; it will take a bit of time to explain, but maybe somebody can shed some light on what was happening?

Here's the situation:

I run Mac OS X 10.6.6, with all the latest patches

I also run VMWare Fusion version 3.1.2 on my Mac

I have a variety of guest operating systems that I run in VMWare

I was running a suite of client-server networking tests, with the server on a VMWare guest, and the clients on my Mac host. The test harness is a script with lots and lots of client invocations; during a test case, we generally run a client which initiates a connection to the server, does some work, and then exits.

The behavior I saw was as follows:

The tests would occasionally run to completion, but usually they would run partway, then the network connection between the host and the VMWare guest would be disrupted.

When the disruption occurred, the host could continue to talk to other networked machines, both near and far.

And, the guest could continue to talk to other networked machines, both near and far.

But the host and guest were having trouble talking to each other.

If I simply waited for an hour or two, the network connection between the two machines would appear to "magically" repair itself. Alternatively, re-booting the host would repair the connection (rebooting the guest was not enough).

After lots of configuration and experimentation, I discovered that disabling the Mac's built-in firewall software stopped this problem from occurring.

But what I don't understand is: why? The firewall, as I understand it, is supposed to be controlling in-bound connections from other machines into my Mac. But in my test suite, all of the network connections that I was making were out-bound, from my Mac to my VMWare guest. So why was the firewall involved in that network processing at all?

For now, it's a mystery, although happily one that I care much less about since I've figured out this workaround.

But it does leave me with that initial question: where do I go to learn more about the implementation and operation of the built-in firewall on Mac OS X 10.6.6?

Tuesday, February 8, 2011

Recently I've been spending a little bit of time learning about FUSE, CUSE, and UIO, which are related technologies for user space device driver implementations in Linux.

What ties these various technologies together is that they are Userspace techniques for implementing functionality that previously required kernel-mode programming. As the Userspace I/O HOWTO says:

For many types of devices, creating a Linux kernel driver is overkill. All that is really needed is some way to handle an interrupt and provide access to the memory space of the device. The logic of controlling the device does not necessarily have to be within the kernel, as the device does not need to take advantage of any of other resources that the kernel provides. One such common class of devices that are like this are for industrial I/O cards.

To address this situation, the userspace I/O system (UIO) was designed. For typical industrial I/O cards, only a very small kernel module is needed. The main part of the driver will run in user space. This simplifies development and reduces the risk of serious bugs within a kernel module.

CUSE, which is an extension of FUSE to handle character devices, is younger and still evolving. There is still considerable controversy about how and when to use CUSE effectively.

Apparently there is even a BUSE, for Block device User Space drivers, although this appears to be sketchier still than the CUSE work. And you can certainly find people who still find the whole world of Linux device driver writing controversial, though you'd think that this program would have resolved most, if not all, of those complaints.

Luckily I don't have to write device drivers or filesystems myself, though I find them fascinating to study. I wandered down this path because I was trying to get a clearer understanding of the difference between the f_bsize and f_iosize fields in the statfs structure, and how those fields relate to the f_frsize and f_bsize fields in the statvfs structure. I'm still engaged in that investigation, but at least my side-trip into Linux user-space driver implementations was interesting.

I was a heavy user of DEC systems for many years. I worked on a VT100 emulator for IBM PCs back in 1988 at CCA in Boston. I had a VAXStation in 1990, and learned to program DBMS storage and recovery systems on it; we used to read the VMS microfiche to understand how the cluster lock manager worked. I once interviewed with a DEC team which was building an early workflow product to run on departmental mini-computers; that product eventually became Forte Fusion and was my first introduction to XML.

Later, I had an Alpha box, and ran Digital Unix; I remember that box as being the one where I learned about the various different techniques for synchronizing access to shared memory.

DEC rose, foundered, and then was bought by Compaq, which in turn was bought by HP.

Many brilliant engineers worked at DEC during its time; Mr Olsen, you certainly ran an influential and fascinating company.

Monday, February 7, 2011

Recently this has been a very interesting conference, with lots of information about the latest topics. Here's the current list of sessions, but I think it's just the tip of the iceberg; there will be more sessions than these.

Saturday, February 5, 2011

It will surprise nobody to hear that, for me, it's all about the code. Clear requirements statements are a must, crisply-written design specifications are crucial, architecture diagrams can convey lots of information, but when it comes to truly understanding a system, service, or API, I want code.

So I find myself really surprised that, after all these decades, no major computer system comes close to MSDN when it comes to recognizing the importance of code to students. Microsoft does provide plenty of those other materials, but just wander around for a while in the MSDN library and you'll immediately see what I mean about MSDN and code. For instance, here's a page I pretty much randomly brought up: FindFirstFile. There is plenty of descriptive text, of course, but there, big and clear and right up front is the Examples section.

And Microsoft's examples are, almost always, excellent: they are short and clear, but realistic; they include just enough complexity to show things like error-handling, parameter values, possible gotchas you should be prepared for. And if that example wasn't enough, look just after it:

Heck, their HTML even includes a convenient JavaScript "Copy" button which nicely copies all the source code into your system clipboard to paste into your editor.

Furthermore, there's just no substitute for source code for understanding complex examples. Try this: Enter overlapped into the MSDN search box. You'll find yourself guided to Synchronization and Overlapped Input and Output, which is a fine page by itself, but (gasp!) no source code! Yet fear not, for the MSDN tech writers know what they are doing; within another click or two you'll be checking out GetOverlappedResult, and what does it say under the Examples section? That's right, "see Testing for the End of a File".

Simple things are simple, complicated things are possible, but, everywhere you go, and everywhere you look, source code. It's what programmers eat, drink, and breathe, and Microsoft is fully aware of that, and gives you lots and lots and lots of code.

So why don't other systems do this? It's been 30 years now, guys! Why is it that when I try the same thing for (say) Linux, and I want to learn how to read all the files in a directory, I can pretty quickly find my way to the Linux opendir API and its manual page, and that of readdir, but not to any source code?

Yes, there are 80 jazillion web sites around that collect this sort of thing, and try to make it available, and if you search for "linux opendir readdir" you'll soon enough find something tolerable, albeit short, poorly-formatted, and cursory.

So please, guys: come into this century, and learn from Microsoft! Your developer documentation is targeted at developers, so show them what they want: source code. When I type

man readdir

in my terminal window, it's great that I get a quick summary of the API and its arguments and return values, and it's nice that you've accumulated some text over the years about some of the complexities of calling the function and dealing with the situations that may arise.

But please, include the Examples! It matters, it really does.

And it's not just operating systems that suffer from these problems. Try learning to program a DBMS in SQL: you know you'll be needing to write a SELECT statement, so let's see what some manual pages look like. Here's one; pages and pages and pages of descriptive text, "railroad"-style syntax diagrams, and, buried down at the bottom, a handful of examples. But here's Microsoft's page; it's short, and what's that down at the bottom of the short and simple page? YES IT'S THE SAMPLE CODE!

Or suppose you want to do some network programming, and you're working in Java, so you look up the Javadoc for the socket object. Groan: text, text, text, and more text. Yes, it's all hyperlinked, but it's just more stupid text! How does Microsoft do here? Well, here it is: yes, there's a lot of text this time (boo on you Microsoft), but there, again, at the bottom: Yay! Example code! Simple, clear, ready to compile and run!

So please, everybody: I know it's hard to admit that Microsoft just has the whole world beat here, but can't all you other folk just get over that, and couldn't you just start putting those examples into the developer docs? Please?

Friday, February 4, 2011

A co-worker recently showed me another (yes, yet another) place in Windows 7 that I hadn't visited before:

Bring up Network and Sharing Center, then find your Local Area Connection

Click on Local Area Connection to bring up Local Area Connection Status, then click on the Properties button.

In the Local Area Connection Properties, click the button near the top labeled Configure...

You'll be taken to a new box labeled (something) Network Connection Properties.

Bwaa-hah-hah-hah-HA! You thought you were working with your connection properties, but now you have found your connection properties!

You are in a maze full of network properties, all alike...

In the Network Connection Properties dialog, you will now see a tab labeled Power Management

In the Power Management tab, you will find a checkbox:

Allow the computer to turn off this device to save power

On my machine, at least, this box was checked by default, which surprised me. I have a desktop computer! Why does Windows think it would be convenient or useful to sometimes turn off this device? Does it use a significant amount of power to keep my LAN network adapter running? I guess so...

Anyway, the point of this post, other than possibly letting you know about Yet Another Corner Of Windows 7 That You Haven't Visited Before, is to note that, although I found this checkbox, and I unchecked the checkbox, I am still a bit puzzled: how can I tell if this has had any effect?

That is, how can I tell when Windows 7 has turned off my network connection device to save power, and how can I tell when Windows 7 has turned my device back on?

Wednesday, February 2, 2011

The Securities and Exchange Commission today voted unanimously to propose rules defining security-based swap execution facilities (SEFs) and establishing their registration requirements, as well as their duties and core principles.

Dodd-Frank requires security-based swap transactions that are required to be cleared through a clearing agency to be executed on an exchange or on a new trading system called a security-based swap execution facility.

The Dodd-Frank Act further requires security-based SEFs to be registered with the Commission and specifies that such a registered security-based SEF, among other things, must comply with 14 core principles.

The SEC press release calls out the 14 core principles in detail. I'm neither a lawyer nor a banker, but they seem like good solid principles to me.

It's great to see that this is finally occurring, but my, it sure takes a long time:

He also wants new government oversight of the arcane world of credit default swaps, a business with a notional value and risk of $50 trillion. “Everyone is missing the elephant in the room,” he said.

It was the interlocking relationships between thousands of investors and banks over credit default swaps that pushed the Fed to help rescue Bear Stearns. In particular, Mr. Griffin wants the government to require the use of exchanges and clearing houses for credit default swaps and derivatives.

That way, instead of investment banks playing matchmaker between parties, an exchange will do it with strict rules in place, eliminating billions of dollars in exposure and creating more transparency.

“It’s not sexy, but it’s simple, it’s cost forward, its straightforward, and it’s what we should have done after 1998,” referring to the collapse of Long-Term Capital Management, a big hedge fund. He added that it “is a very sad commentary on where we are from a regulatory perspective” that such a move hasn’t happened already.

The world is a complicated place and I understand that these rules and regulations are complex and intricate. I'm pleased that progress is being made, just (slightly) dazed that it takes such an incredibly long time.