Friday, January 31, 2014

I had a great time at the ex-AmberPoint reunion this week. Many thanks to Tom and Becky for organizing and hosting it! It's fascinating how all my former co-workers look about a year older, yet I haven't aged at all...

Most other things, like the terrain generation seeds and entity locations, use 64-bit doubles, and they break in much subtler ways. For example, at extreme distances the player may move slower than near the center of the world, due to rounding errors (the position has a huge mantissa, the movement delta a tiny one, so the delta gets truncated faster). The terrain generator can also start generating weird structures, such as huge blocks of solid material, but I haven’t seen this lately nor examined exactly what behavior causes it to happen. One major problem at long distances is that the physics starts bugging out, so the player can randomly fall into ground blocks or get stuck while walking along a wall.
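The rounding error is easy to demonstrate directly. A quick sketch (the specific numbers are my own illustration, not Minecraft's actual values): an IEEE 754 double has a 52-bit mantissa, so once a coordinate grows large enough, a small per-tick movement delta rounds away to nothing.

```python
# Demonstrate the precision loss described above: a small movement delta
# applied to a huge position gets rounded away, because doubles near 2**53
# are spaced too far apart to represent the sum.
near_origin = 10.0
far_out = 2.0 ** 53   # an extreme distance from the world origin
delta = 0.1           # a small per-tick movement

print(near_origin + delta > near_origin)  # the delta is representable here
print(far_out + delta > far_out)          # here it vanishes entirely
```

The same effect, at smaller magnitudes, is what makes movement merely *slower* rather than impossible: some fraction of each delta survives the rounding, some does not.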

But, at extreme distances from a player’s starting point, a glitch in the underlying mathematics causes the landscape to fracture into illogical shapes and patterns. “Pretty early on, when implementing the ‘infinite’ worlds, I knew the game would start to bug out at long distances,” Persson told me. “But I did the math on how likely it was people would ever reach it, and I decided it was far away enough that the bugs didn’t matter.”

Ostensibly, this fight was for control of a single system. To maintain one's sovereignty over a system requires that one pay certain maintenance fees, and Pandemic Legion (PL) messed this up. The system affected? Their staging system for the current war, B-R5RB in Immensea.

EVE Online has its own economics, politics, and trade systems, built almost entirely by players in the 10 years the game has been running. It also has its own wars, as huge alliances vie for control of tracts of space in the massively multiplayer online game. One such conflict came to a head yesterday in the biggest battle in the game's decade-long history. More than 2,200 of the game's players, members of EVE's largest alliances, came together to shoot each other out of the sky. The resultant damage was valued at more than $200,000 of real-world money.

It’s hard to keep up with the changing names in the news. H1Nwhat? Bird flu. Pig flu. MERS. SARS. Here is a quick overview of this dizzying, dyslexia-inducing array, with what you need to worry about, even if some aren’t yet in your backyard.

Crossrail, the largest construction project in Europe, is tunneling under the British capital to provide a new underground rail link across the city, and has encountered not only a maze of existing modern infrastructure, but historic finds including mammoth bone fragments, Roman roads (with ancient horseshoes embedded in the ruts), Black Plague burial grounds, and 16th century jewelry.

Proof systems for NP let an untrusted prover convince a verifier that “x ∈ L” where L is some fixed NP-complete language. Proof systems for NP that satisfy the zero knowledge and proof of knowledge properties are a powerful tool that enables a party to prove that he or she “knows” a secret satisfying certain properties, without revealing anything about the secret itself.

The existence of LLVM is a terrible setback for our community precisely because it is not copylefted and can be used as the basis for nonfree compilers -- so that all contribution to LLVM directly helps proprietary software as much as it helps us.

Ultimately, however, the disagreement between Stallman and O’Reilly—and the latter soon became the most visible cheerleader of the open source paradigm—probably had to do with their very different roles and aspirations. Stallman the social reformer could wait for decades until his ethical argument for free software prevailed in the public debate. O’Reilly the savvy businessman had a much shorter timeline: a quick embrace of open source software by the business community guaranteed steady demand for O’Reilly books and events, especially at a time when some analysts were beginning to worry—and for good reason, as it turned out—that the tech industry was about to collapse.

My favorite Stevie Wonder jam remains "Superstition" (1972) due in part to the epic clavinet riff. Listen how complex that iconic bit o' funk is by watching YouTube user Funkscribe dissect its intricacies by isolating the multiple parts in the multitrack recording files.

A skiplist is an ordered data structure providing expected O(log n) lookup, insertion, and deletion complexity. It provides this level of efficiency without the complex tree balancing or page splitting required by B-trees, red-black trees, or AVL trees. As a result, it’s a much simpler and more concise data structure to implement.
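To make that concreteness claim tangible, here is a minimal skiplist sketch in Python (the class names and the 1/2 promotion probability are my own choices, not from any particular library). Each node is randomly promoted to higher levels, so searches can skip ahead in large strides without any rebalancing logic:

```python
import random

class SkipNode:
    def __init__(self, key, level):
        self.key = key
        self.forward = [None] * level  # forward[i]: next node at level i

class SkipList:
    """Minimal skiplist: search and insert only, expected O(log n)."""
    MAX_LEVEL = 16

    def __init__(self):
        self.head = SkipNode(None, self.MAX_LEVEL)  # sentinel head node
        self.level = 1

    def _random_level(self):
        # Promote with probability 1/2 per level, like flipping a coin.
        level = 1
        while random.random() < 0.5 and level < self.MAX_LEVEL:
            level += 1
        return level

    def search(self, key):
        node = self.head
        for i in range(self.level - 1, -1, -1):  # descend level by level
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
        node = node.forward[0]
        return node is not None and node.key == key

    def insert(self, key):
        # Record, at each level, the last node before the insertion point.
        update = [self.head] * self.MAX_LEVEL
        node = self.head
        for i in range(self.level - 1, -1, -1):
            while node.forward[i] and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        level = self._random_level()
        self.level = max(self.level, level)
        new = SkipNode(key, level)
        for i in range(level):  # splice the new node in at each of its levels
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new
```

The entire structure fits in a few dozen lines; the equivalent red-black tree insert, with its rotation cases, is several times longer.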

In this paper we show that the two fundamental approaches to storage reclamation, namely tracing and reference counting, are algorithmic duals of each other. Intuitively, one can think of tracing as operating upon live objects or “matter”, while reference counting operates upon dead objects or “anti-matter”. For every operation performed by the tracing collector, there is a corresponding “anti-operation” performed by the reference counting collector.

Over the course of the season, I've discovered lots of different ways to hack Madden NFL 25 into a thing that no longer resembles football as we know it. I've played around with rules, injury settings, all manner of player ratings, player dimensions, and anything else the game's developers have made available to us.

This time is special, though, because I'm pulling out every single one of the stops at the same time. No other scenario I've built in Madden has been so abjectly cruel or unfair; no other scenario has even been close.

New files tend to be added to a Perforce server in batches. It's relatively rare to see a changelist with just a single file in it; changelists often have dozens or hundreds of files, and changelists with thousands of new files are not uncommon.

As we near the end of 2013, it's exciting to look back at this year. You can make a strong argument (and I do!) that this has been the busiest and most eventful year in the history of the Perforce server.

During the 'p4 sync' command, the server and the client are both busy: the server is sending files to the client as fast as the network connection will allow, and the client is writing those files to its filesystem as fast as the workstation will allow. And each time the client finishes one of those files, it sends an acknowledgement back to the server so that the server will know.

I think that Grigorik's editors at O'Reilly, in an excess of caution, did him a bit of a disservice. The book ought to be titled:

High Performance Networking
What every developer should know about networking performance

My point is just that Grigorik's book is far broader and more generally useful than its title might indicate.

The book is broadly divided into four sections:

Networking 101

Performance of Wireless Networks

HTTP

Browser APIs and Protocols

This is a nice order in which to present the material, moving smoothly from the most fundamental and broadest topics to specific guidance on particular recent developments in the browser networking world.

If I could fault the book, it's that it is weak on tools. There are many powerful tools for collecting and studying network performance issues, so many, in fact, that they deserve a book all of their own. Still, it would have been nice to get some general pointers and advice about the most common and necessary of those tools (netstat, tcpdump, traceroute, nmap, wireshark, etc.).

As with many O'Reilly books, you can find nearly all of this material elsewhere, but O'Reilly have once again done the world a service by (a) finding an author who is a world class expert in the material (Grigorik works at Google on the networking libraries in the Chrome browser), and (b) gathering together all the relevant material in a single well-presented book.

It helps that Grigorik is a good writer and organizes and presents the material well.

It's hard to imagine a serious software engineer who wouldn't benefit from studying this book. All modern applications are network-aware, and all modern applications must consider performance if they are to be useful and well-accepted.

So: if you have made a career in software development, it's simple. This book belongs on your bookshelf. Better, it belongs on your desk, open, often-consulted.

Thursday, January 23, 2014

the early cast at Danger was largely from Apple, WebTV, General Magic, and Be. I was one of those Apple people. Joe had invited me to lunch one day—sushi, as I recall—and when we were finished eating he casually mentioned that he and Andy had started a new company and asked if I wanted a job. I said, “yes,” without even asking what the company was going to do. I figured that it would probably be cool.

Depending on whom you ask, Evgeny Morozov is either the most astute, feared, loathed, or useless writer about digital technology working today. Just 29 years old, from an industrial town in Belarus, he appeared as if out of nowhere in the late aughts, amid the conference-goers and problem solvers working to shape our digital futures, a hostile messenger from a faraway land brashly declaring the age of big ideas and interconnected bliss to be, well, bullshit.

The default language choice for Operating Systems courses is C. Nearly all (at least 90% from my cursory survey, the remainder using Java; if anyone knows of any others, please post in the comments!) current and recent OS courses at major US universities use C for all programming assignments. C is an obvious choice for teaching an Operating Systems course since it is the main implementation language of the most widely used operating systems today (including Unix and all its derivatives), and is also the language used by nearly every operating systems textbook (with the exception of the Silberschatz, Galvin, and Gagne textbook, which does come in a Java version). To paraphrase one syllabus, "We use C, because C is the language for programming operating systems."

A housing shortage that has been building up for the past thirty years is reaching the point of crisis. The party in power, whose late 20th-century figurehead, Margaret Thatcher, did so much to create the problem, is responding by separating off the economically least powerful and squeezing them into the smallest, meanest, most insecure possible living space. In effect, if not in explicit intention, it is a let-the-poor-be-poor crusade, a Campaign for Real Poverty. The government has stopped short of explicitly declaring war on the poor. But how different would the situation be if it had?

The Ottomans in their extreme arrogance will only leave a token force of between 2K and 3K to besiege your capital. This is your saving grace. The smaller army will take longer to defeat the garrison and, unless dice rolls perpetually go against you as they are wont to do in critical battles, you should be able to defeat the army with your starting 3K and reset the siege.

But both De Rhoodes and the Irish coast guard already tried to find the ocean liner a slew of times last year (paywall), to no avail, as the New Scientist reports. The New Scientist explains that surveillance equipment has some pretty big deficiencies when pitted against the ocean’s vastness.

According to one buoy northwest of the island of Kauai, the surf heading toward Hawaii was at its highest level since 1986, Tom Birchard, a senior forecaster for the National Weather Service in Honolulu, told the Los Angeles Times.

In recent years Formula 1 has made a big push toward efficiency, and to make the technology propelling guys like Sebastian Vettel and Fernando Alonso around the track at least somewhat relevant to the cars the rest of us drive. They’ve experimented with kinetic energy recovery systems and even toyed with the idea of making the cars run only on electricity in the pits. Beginning this year, teams are downsizing from 2.4-liter V8s to 1.6-liter V6s that feature direct injection, turbocharging, and a pair of energy recovery systems that pull in juice from exhaust pressure and braking.

The piers of the cantilever truss aren’t holding the bridge up. They’re holding it down. “This is like a highly strung bow,” says senior bridge engineer Brian Maroney. (A bow made of 50 million pounds of steel.) “You don’t want to just cut the bow because the thing will fly off in all directions.” So crews will first remove the pavement on the upper deck to lighten the bridge’s load and reduce the tension. Next they’ll isolate steel supports, jacking them out of tension until they can be cut without whipping apart. Then they’ll slowly release the jacks.

All told, demolition crews will remove 58,209 tons of steel and 245,470 tons of concrete that make up the 1.97-mile eastern span. The contractors will determine where the crushed and twisted remains of the bridge end up, Gordon said. Most will probably be either recycled or reused. Some pieces may be saved for a park planned at the eastern end of the bridge so people have something from the old span to remember.

Interestingly, even though this is a 300 million dollar project (at least) and will take 3 years (at least), it is a lot harder to find up-to-date status information about the demolition project, beyond high level summary documents.

I guess people aren't as interested anymore. Just cut it down and get it out of there.

A number of older bridges across the San Francisco Bay have been removed in recent years, most notably the old Carquinez Straits bridge.

At some points in the game, it is these things within a surprisingly short period of time.

The overall story involves the adventures of two brothers, who are traveling far away to attempt to get medicine for their desperately ill father.

You "play" both brothers simultaneously, operating one brother with your left hand and the other brother with your right hand. It's a bit challenging to get the hang of this, but it isn't as hard as it sounds.

The game auto-saves often, and it mostly guides you where it wants you to go. The joy of the game is in solving the puzzles, and in following the story as it develops.

I'm very glad I played this game, and in many ways I don't want to say anything else about it because if you decide to play it, you should have the same experience I did.

Our hotel is functional and clean. We luck into the top floor corner room, which has an extended balcony that wraps around to a secluded area all our own. Nice!

The hotel is the staging area for this weekend's regional Mock Trial competition, so there are dozens of high schools with their teams. In the morning, the Mock Trial participants are all dressed up, looking smart, enjoying the hotel buffet.

After breakfast, we make the short 5 minute drive to Pt Lobos. Arriving at 9:00, we find the refuge is already packed, and are glad we got an early start.

Many years ago, the Tasmanian painter Francis McComas declared that Point Lobos is "the greatest meeting of land and water in the world," and there's no denying the correctness of his assessment.

We park at the Sea Lion Point trailhead, one of our favorites, and spend 45 minutes walking around the point. There are cormorants everywhere, and a few odd black birds with bright red beaks. When the birds settle down to rest, they tuck those long red beaks back up into the shoulder feathers.

The rocks at Sea Lion Point are as majestic as ever, with crashing surf and pounding waves. You can hear the sea lions before you see them, but once you see them you realize the rocks are carpeted with the giant animals, with dozens more swimming in the water. A docent estimates there are 400 within sight from where we stand.

By 10:15, it's already quite warm, so we shed our fleece pullovers and head out for a longer walk.

We take the beautiful loop trail through the cypress grove, remarking as we walk about how much drier it is than last year. As we round the corner to the rocks of North Point, others on the trail are pointing and talking excitedly.

We get to the point and focus our binoculars. Down in the cove is a churning activity that clarifies with the strong lenses of the binoculars: dolphins! Dozens of them, leaping and diving and swirling around.

And then, within the blur of dolphin fins, the unmistakable broad gray shape, then the spout, and the fluke: whales!

Mesmerized, we spend 20 minutes watching this group of at least 6 gray whales, with dozens of dolphins and other hangers-on, contentedly feeding and exploring and playing, before they proceed back out of the cove and resume swimming south. Their spouts and flukes can be just barely spotted in the distance until they are several miles away.

Back on our feet, we take the North Shore Trail to Whalers Cove. Shortly before we reach the cove, a clear spot on the trail offers a gorgeous view down into the crystal clear waters of the preserve, 75 feet below us.

At first, we think we are watching a plastic grocery bag, floating in the surge, but with the clear view of the binoculars there is no doubt: there is a large whitish-brown jellyfish, floating in the surf. It must be 18-24 inches in diameter. We watch it carefully and marvel: it swells and contracts, floating its tentacles in the current, calm and peaceful in the gentle waters near the kelp.

We reach Whaler's Cove just in time to meet and talk with the divers. One has found a "decorator crab", a rather unusual creature whose shell is covered in grasses and algae.

We retrace our steps back to where we left the car. We stop again at the spot where we saw the jellyfish. This time, at the same location, we are struck by an amusing spectacle: a large log is floating in the cove and 3 or 4 sea lions are climbing up on it, taking the sun, then slipping back into the water with nary a sidelong glance at their companions. At times, the log has several sea lions atop it, but they don't remain perched for long.

It is lunchtime, so we head over to the picnic area at the Bird Island trailhead, perhaps the most scenic picnic area within a 150 mile radius of our home (and in Northern California, that's strong praise!). A shady and quiet table provides the perfect 45 minute break.

Sunday morning, the weather is glorious, so we decide to return to Pt Lobos. We park on Route 1 and take the pedestrian entrance. We follow the Carmelo Meadows Trail and the Moss Cove Trail out to Granite Point.

From the Granite Point overlook, the coves and points and bays spread out below us. The visibility is superb and we spend 45 minutes just watching.

In addition to 6 sea otters, peacefully floating on the large kelp beds of Whaler's Cove, and dozens of sea lions and seals swimming and bobbing in the surf, the most remarkable sight unfolds slightly further from shore. Leaping and diving and swimming, there are dolphins everywhere! In any direction, once we focus the binoculars, we can see dolphins swimming along. These are big animals, perhaps 10 feet long (probably Pacific white-sided dolphins), and we can clearly, if faintly, see their fins, bodies, and flukes as they jump out of the water and dive back in, perhaps a quarter mile from shore. Amazing.

Thursday, January 16, 2014

HTTP, if you're somehow not aware of it, is the Hypertext Transfer Protocol, the application protocol that supports the entire World Wide Web. It's what browsers use to talk to web servers, and it has been built upon by so many other applications that basically any application or device on the Internet uses HTTP to communicate with the other networked entities in the world.

HTTP 1.1 has been around for 20 years, and has worked remarkably well, but it has many problems, and we need to do something to fix them.

But with the universality of HTTP 1.1, replacing it is a delicate process. The new protocol must be extremely carefully engineered, learning from all of the experience that we've gained over those 20 years, to include all the good bits of HTTP 1.1, fix its problems, and yet somehow not introduce new problems.

Furthermore, rolling out HTTP 2.0 is very complex, because we can't just

Shut down the web

Upgrade everything

Start it back up again

Instead, the new protocol must be precisely built to be upgradable and highly compatible with HTTP 1.1, with version identifiers and protocol negotiation features so that devices can support both HTTP versions, in parallel, for as long as it takes (years? decades?).
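One concrete mechanism for this kind of side-by-side negotiation in the TLS case is ALPN (Application-Layer Protocol Negotiation), where the client advertises every protocol it speaks and the server picks one during the handshake. A minimal sketch using Python's standard `ssl` module (the protocol identifiers "h2" and "http/1.1" are the standard ALPN tokens; everything else here is my own illustration):

```python
import ssl

# The client offers both HTTP versions; the server selects one during the
# TLS handshake, so both protocols can coexist on the same port for years.
ctx = ssl.create_default_context()
ctx.set_alpn_protocols(["h2", "http/1.1"])  # prefer HTTP/2, fall back to 1.1

# After wrapping a socket and completing the handshake, the negotiated
# result is available via ssl_sock.selected_alpn_protocol().
```

A client built this way never has to guess: if the server is old, negotiation simply lands on "http/1.1", and nothing breaks.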

All in all, it's arguably the biggest computing project the world has undertaken, and it will affect all seven billion of us, so it's worth understanding both why it's so hard, and what the people working on it are doing.

In his epic, book-length blog post, Chan takes us carefully and thoroughly through the HTTP 2 work, describes the major problems that are being worked on, and points out where there is consensus and where there is still controversy, all in a clear and easy-to-follow style.

Even better, as befits an article about the basic protocols of the web, Chan's essay is simply riddled with hyperlinks. If there's information about HTTP 2 that's worth reading, it's linked to from Chan's article.

You're surely not going to follow every hyperlink, and you're surely not going to want to deeply understand every detail that Chan covers. Even Chan, who is one of the core engineers building the new standard, leaves certain areas to others, and when he marks a topic as "TL;DR", it undoubtedly is.

But you almost certainly want to skim this article, so that at least you're aware of what's going on, where the major battlegrounds are, and why it matters.

And, if you're like me, and building Internet-friendly networked software is the core of your professional existence, you're probably going to be busy studying Chan's essay for a while to come...

Wednesday, January 15, 2014

I'm playing as Castile, and getting a pitifully low income from my trading operations.

Among other mistakes, I think I'm:

Placing a merchant in the trade node where my capital resides (Sevilla). According to this webpage, that's wholly unnecessary and just wastes my merchant.

Placing a merchant in Bordeaux, which is indeed a trade node where I own provinces, but I've told my merchant to collect revenue, when instead he should be steering it.

Placing a merchant in trade nodes like Frankfurt, Antwerp, or Venice, which are indeed valuable nodes, but are regions where I have absolutely no presence whatsoever.

Next time I find some time to play the game, I really must try to learn more about how the trade functionality works, because I have this feeling I could be getting five times as much value out of my merchants and trade as I currently do.

There are lots of things I want to try out in the game, but for the time being I can't do much of anything, because I have absolutely no money at all, which I suspect is largely due to my ineptitude at trading.

As the authors describe, they started out thinking about how cloud computing has changed the needs of filesystem users, and ended up blending ideas from distributed systems and from version control to arrive at their project:

Ori is a file system that manages user data in a modern setting where users have multiple devices and wish to access files everywhere, synchronize data, recover from disk failure, access old versions, and share data. The key to satisfying these needs is keeping and replicating file system history across devices, which is now practical as storage space has outpaced both wide-area network (WAN) bandwidth and the size of managed data. Replication provides access to files from multiple devices. History provides synchronization and offline access. Replication and history together subsume backup by providing snapshots and avoiding any single point of failure.

The paper is fascinating, but rather chaotic: they cover a lot of ground, and do so in a hurry.

Of course, the ideas that they are building on are not new: distributed filesystems have been around for decades, version control has been around for decades, and even the idea of including version control in filesystems has been around for decades (Microsoft shipped Volume Shadow Copy in Windows Server 2003, and surfaced it to end users as the Previous Versions feature in Windows Vista, if I recall correctly).

They do have some interesting new ideas about how to "graft" filesystems together, and what that might mean:

Ori is designed to facilitate file sharing. Using a novel feature called grafts, one can copy a subtree of one file system to another file system in such a way as to preserve the file history and relationship of the two directories. Grafts can be explicitly re-synchronized in either direction, providing a facility similar to a distributed version control system (DVCS) such as Git. However, with one big difference: in a DVCS, one must decide ahead of time that a particular directory will be a repository; while in Ori, any directory can be grafted at any time. By grafting instead of copying, one can later determine whether one copy of a file contains all changes in another (a common question when files have been copied across file systems and edited in multiple places).

This idea of analyzing history and looking for evidence about the pedigree of a change is a classic Version Control topic, the sort of thing that practitioners spend all their waking hours contemplating. It's enormously powerful, but devilishly intricate in the details.

With a project like this, defined mostly as new ways to interconnect older ideas, it's all about the execution of the ideas.

The Ori team have decided to develop their implementation in the open, and have open sourced the project from their project page.

It will be interesting to follow this project and see how it develops.

Monday, January 13, 2014

I don't watch many Sci Fi movies, but I happened to check out last summer's Europa Report, now available on streaming services like Netflix.

I thought this review did a super job of describing the movie, including this:

The film is inspired by the real-life 2011 discovery of water beneath the ice of Europa, one of Jupiter’s moons. In the film, directed with a combination of technical flair and great calm by Sebastian Cordero, a private firm run by Dr. Samantha Unger (Embeth Davidtz) sends a crew to Europa, to see if there is life in that water.

A group of scientists blasts off for Jupiter, a trip we see through found footage and the many cameras that operate inside the vessel. To say that the trip is low-key is probably an understatement, but the patience with which Cordero paces this part of the film leads to a more realistic experience. Between the scientific jargon — just enough, without being off-putting — and the way the film is structured, it plays like the documentary it purports to be.

The last 5 minutes of the movie are kind of a mess, but overall it was quite enjoyable.

It appears that the Netflix suggestion engine actually suggested a movie I liked!

Maphead instantly grabbed me, perhaps because Jennings and I could be a pair of those "separated at birth?" twins. Here Jennings is describing himself, but ends up describing me almost to a "T":

Today, I will still cheerfully cop to being a bit of a geography wonk. I know my state capitals -- hey, I even know my Australian state capitals. The first thing I do in any hotel room is break out the tourist magazine with the crappy city map in it. My "bucket list" of secret travel ambitions isn't made up of boring places like Athens or Tahiti -- I want to visit off-the-beaten-path oddities like Weirton, West Virginia (the only town in the United States that borders two different states on opposite sides) or Victoria Island in the Canadian territory of Nunavut (home to the world's largest "triple island" -- that is, the world's largest island in a lake on an island in a lake on an island).

Jennings is a cheerful writer, and his book flows very nicely. As with many nonfiction works, each chapter of the book is mostly self-contained, and describes some map-related topic or another:

The visit Jennings makes to the Map Library of the Library of Congress

Jennings visiting the "Century Club," composed of travelers who have visited at least 100 countries.

A fun group who compete in mental "Rally Races," which have the form and structure of typical Road Rally races, but which are conducted entirely from your armchair, with an atlas in your lap. (In some ways, these rallies are considerably more challenging than the In Real Life sort; Jennings confesses that he barely scores 50% on the rally.)

The modern game of Geocaching, made possible by the opening up of the military's GPS satellites.

And so forth.

If these are the sorts of topics that fascinate you, you'll enjoy Maphead very much. Myself, my first full-time job in college was in basement level B, in a windowless room next door to the Map Library of the Regenstein Library, so you know where I fall on that spectrum.

I'm done with Maphead for now, but I'm surely not done with maps: I'm heading back to my Europa Universalis game to see if I can figure out whether it's better to concentrate my trade flows through Bordeaux or through Genoa; if that's not a map-related activity, what is?

Thursday, January 9, 2014

On many Unix systems, the marvelous lsof utility provides vast amounts of information about what your processes are doing.

One part of that information is supposed to include details about the locks that are held on the open files:

The mode character is followed by one of these lock characters, describing the type of lock applied to the file:

N for a Solaris NFS lock of unknown type;
r for read lock on part of the file;
R for a read lock on the entire file;
w for a write lock on part of the file;
W for a write lock on the entire file;
u for a read and write lock of any length;
U for a lock of unknown type;
x for an SCO OpenServer Xenix lock on part of the file;
X for an SCO OpenServer Xenix lock on the entire file;
space if there is no lock.

On Linux, at least, this works great; I see the lock characters as I expect.
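It's simple to produce one of those lock characters on demand. A sketch (the file path is my own throwaway example): take a whole-file exclusive lock via `fcntl.lockf`, which on Linux lsof reports with a 'W' appended to the file descriptor's mode character.

```python
import fcntl
import os
import tempfile

# Create a scratch file and take a write lock on the entire file.
# While the lock is held, `lsof -p <pid>` on Linux shows this file's
# FD column with a 'W' lock character (e.g. "3wW").
path = os.path.join(tempfile.mkdtemp(), "demo.lock")
f = open(path, "w")
fcntl.lockf(f, fcntl.LOCK_EX)   # write lock, whole file

# ... this is the window in which lsof would report the lock ...

fcntl.lockf(f, fcntl.LOCK_UN)   # release the lock
f.close()
```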

Lsof has code to test for locks defined with lockf() or
fcntl(F_SETLK) under NEXTSTEP 3.1, but that code has never
been tested. I couldn't test it, because my NEXTSTEP 3.1
lockf() and fcntl(F_SETLK) functions return "Invalid
argument" every way I have tried to invoke them.

I'm not sure if this section applies to the Mac OS X version of lsof, or not.

For yucks, I tried looking at the source, and it sure looks like the implementation retrieves the access mode and turns it into 'r', 'w', or 'u', appropriately, and also retrieves the file size and type information, but there doesn't appear to be any code to fetch the lock information.

Searching the web for more information found a random web page or two where others seemed to say that they couldn't get this feature of lsof to work on their Mac.

And, of course, I can't get it to work on my Mac (I appear to be running lsof 4.82 on Mac OS X 10.6.8, if that's relevant).

As you can see from the map, there are entire states that are covered by snow right now: North Dakota, Minnesota, Wisconsin, Maine.

But California?

Nada.

Well, OK, there is a small ellipse of snow.

And when you look at the numeric data, you can see that, above 7,000 feet or so, there is in fact about a foot of snow on the ground, and by the time you get to 9,000 feet, perhaps two feet of snow on the ground.

But, at this time of year, those numbers would more typically be 4 feet of snow at the lower elevations, and 10 feet of snow at the higher elevations.

The nightly news contains hopeful weather forecasters who stare at the blank maps of weather fronts, mumble things about "a persistent ridge of high pressure over the Western Pacific," and speculate wildly about "a 5% chance of some precipitation a week from now."

As shown in Table 2, only three out of 18 databases provided serializability by default, and eight did not provide serializability as an option at all. This is particularly surprising when we consider the widespread deployment of many of these nonserializable databases, like Oracle 11g, which are known to power major businesses and product functionality. Given that these weak transactional models are frequently used, our inability to provide serializability in arbitrary HATs appears non-fatal for practical applications. If application writers and database vendors have already decided that the benefits of weak isolation outweigh potential application inconsistencies, then, in a highly available environment that prohibits serializability, similar decisions may be tenable.

Let's see if I can restate that in plain English:

Lots of major businesses use Oracle 11g, and therefore that's good enough.

It might seem that this is just a sort of throwaway observation, but the authors return to it several times later in the paper. In Section 5.3, at the end of the meat of the paper, which is a long and detailed discussion of transaction theory, the authors summarize their work, and then comment upon the summary:

In light of the current practice of deploying weak isolation levels (Section 3), it is perhaps surprising that so many weak isolation levels are achievable as HATs. Indeed, isolation levels such as Read Committed expose and are defined in terms of end-user anomalies that could not arise during serializable execution. However, the prevalence of these models suggests that, in many cases, applications can tolerate these associated anomalies. Given our HAT compliance results, this in turn hints that–despite idiosyncrasies relating to concurrent updates and data recency–highly available database systems can provide sufficiently strong semantics for many applications.

And, still later, they reflect on this observation once more, in the conclusion to the paper:

Despite these limitations, and somewhat surprisingly, many of the default (and sometimes strongest) semantics provided by today’s traditional database systems are achievable as HATs, hinting that distributed databases need not compromise availability, low latency, or scalability in order to serve many existing applications.

In this paper, we have largely focused on previously defined isolation and data consistency models from the database and distributed systems communities. Their previous definitions and, in many cases, widespread adoption hints at their utility to end-users.
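The "end-user anomalies" the authors mention can be made concrete with a tiny sketch. This is not any particular database's engine, just a deterministic illustration of the classic "lost update" anomaly, which weak isolation levels permit but which could never arise under serializable execution:

```python
# Two "transactions" each intend to add 100 to a balance. Under a weak
# isolation level, their reads and writes can interleave so that one
# update is silently lost; any serializable order would apply both.

def run_interleaved(balance):
    t1_read = balance          # T1 reads the committed value
    t2_read = balance          # T2 reads the same committed value
    balance = t1_read + 100    # T1 commits its write
    balance = t2_read + 100    # T2 commits, clobbering T1's update
    return balance

def run_serial(balance):
    balance = balance + 100    # T1 runs to completion...
    balance = balance + 100    # ...then T2
    return balance

print(run_interleaved(0))  # 100 -- one update was lost
print(run_serial(0))       # 200 -- what any serializable order yields
```

The paper's point, restated: an enormous number of deployed applications are running atop systems that permit exactly this.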

About half-a-dozen years ago, I was privileged to hear a high-placed executive from the largest telecommunications company on the planet describe the state of their internal systems. During the talk, he made a statement that can be loosely paraphrased as follows:

Based on our internal data, we've estimated that nearly 70% of the orders that enter our order processing systems get lost at some point in the overall workflow, and manual intervention is required to correct the damaged order and restart the processing.

I'd like to say that this is an outlier, but in fact, the state of correctness in enterprise application software is horrific. As a practical matter, "many existing applications" are complete junk. Have you ever interacted with a call center? Tried to correct personal information that a company holds about you? Attempted to track down the status of a work request? Our existing applications simply don't work, and an army of humans spends all of its waking hours dealing with and resolving the mess that these applications leave behind.

So, from the perspective of one of the poor users of the existing applications that litter the planet, may I politely suggest that AGAOIGE is not a sufficient standard for software quality, and that, when designing the systems of the future, we should do everything possible to hold ourselves to a more rigorous measurement?

Kreps starts out with a history of how database implementations have used logs for decades to record the detailed changes performed by a database transaction, so that those changes can be undone in case of failure or application abort, redone in case of media crash, or simply analyzed and audited for additional information.

Then, in an insightful observation, he shows the duality between understanding logs, and the "distributed state machine" approach to distributed systems design:

You can reduce the problem of making multiple machines all do the same thing to the problem of implementing a distributed consistent log to feed these processes input. The purpose of the log here is to squeeze all the non-determinism out of the input stream to ensure that each replica processing this input stays in sync.

I love the phrase "squeeze all the non-determinism out of the input stream;" I may have to make up some T-shirts with that!
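The state-machine duality Kreps describes can be sketched in a few lines. Once the replicas agree on an ordered log of inputs, replaying that log deterministically drives every replica to the same state; the names here are mine, purely for illustration:

```python
# Replaying an agreed-upon, ordered log of writes deterministically
# produces the same state on every replica -- all the non-determinism
# has been squeezed out of the input stream.

def apply_log(log):
    """Replay a log of (key, value) writes against an empty state."""
    state = {}
    for key, value in log:
        state[key] = value
    return state

log = [("x", 1), ("y", 2), ("x", 3)]   # the consistent, ordered input
replica_a = apply_log(log)
replica_b = apply_log(log)
assert replica_a == replica_b == {"x": 3, "y": 2}
```

Everything hard in a real system is in agreeing on the log; once that's done, keeping replicas in sync is just replay.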

Later, ruminating on the wide variety of uses of historical information in systems, Kreps brings up a point dear to my heart:

This might remind you of source code version control. There is a close relationship between source control and databases. Version control solves a very similar problem to what distributed data systems have to solve—managing distributed, concurrent changes in state. A version control system usually models the sequence of patches, which is in effect a log. You interact directly with a checked out "snapshot" of the current code which is analogous to the table. You will note that in version control systems, as in other distributed stateful systems, replication happens via the log: when you update, you pull down just the patches and apply them to your current snapshot.

Data about the history of a system is incredibly valuable, and it's so great to see an author focus on this point, because it's often overlooked.

One difference between log processing systems and version control systems, of course, is that log processing systems record only a single timeline, while version control systems allow you to have multiple lines of history. Version control systems provide this with their fundamental branching and merging features, which makes them simultaneously more complex and more powerful than database systems.

Also, of course, it makes them slower; no one is suggesting that you would actually use a version control system to replace your database, but it is very valuable, as Kreps observes, to contemplate the various ways that the history of changes provides you with great insight into the workings of your system and the behavior of your data. That is why many users of major version control systems (such as the one I build in my day job) use them for many types of objects other than source code: art, music, circuit layouts, clothing patterns, bridge designs, legal codes, the list goes on and on.

I'll return to this point later.

But back to Kreps's article. He suggests that there are three ways to use the basic concept of a log to build larger systems:

Data Integration—Making all of an organization's data easily available in all its storage and processing systems.

Real-time data processing—Computing derived data streams.

Distributed system design—How practical systems can be simplified with a log-centric design.

Data Integration involves the common problem of getting more value out of the data that you manage by arranging to export it out of one system and import it into another, so that the data can be re-used by those other systems in the manner that best suits them. Says Kreps about his own experience at LinkedIn:

New computation was possible on the data that would have been hard to do before. Many new products and analysis just came from putting together multiple pieces of data that had previously been locked up in specialized systems.

As Kreps observes, modeling this process using an explicit log reduces the coupling between those systems, which is crucial in building a reliable and maintainable collection of such systems.

The log also acts as a buffer that makes data production asynchronous from data consumption. This is important for a lot of reasons, but particularly when there are multiple subscribers that may consume at different rates. This means a subscribing system can crash or go down for maintenance and catch up when it comes back: the subscriber consumes at a pace it controls. A batch system such as Hadoop or a data warehouse may consume only hourly or daily, whereas a real-time query system may need to be up-to-the-second. Neither the originating data source nor the log has knowledge of the various data destination systems, so consumer systems can be added and removed with no change in the pipeline.

This is just the producer-consumer pattern writ large, yet too often I see this basic idea missed in practice, so it's great to see Kreps reinforce it.
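To see the buffering role of the log in miniature: one append-only log, multiple subscribers, each consuming at its own pace by tracking its own offset. These class names are illustrative, not Kafka's actual API:

```python
# A single append-only log decouples producers from consumers: each
# subscriber owns its read position, so a slow (or crashed) consumer
# simply catches up later, with no coordination with the producer.

class Log:
    def __init__(self):
        self.entries = []

    def append(self, record):
        self.entries.append(record)

    def read_from(self, offset):
        return self.entries[offset:]

class Subscriber:
    def __init__(self, log):
        self.log = log
        self.offset = 0        # each consumer owns its position
        self.seen = []

    def poll(self):
        for record in self.log.read_from(self.offset):
            self.seen.append(record)
            self.offset += 1

log = Log()
fast, slow = Subscriber(log), Subscriber(log)
log.append("r1"); log.append("r2")
fast.poll()                    # the real-time consumer keeps up
log.append("r3")
fast.poll()
slow.poll()                    # the batch consumer catches up in one gulp
assert fast.seen == slow.seen == ["r1", "r2", "r3"]
```

Note that the log never knows who its consumers are; adding or removing a subscriber changes nothing upstream.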

Often, proposals like this bog down in the "it'll be too slow" paranoia, so it's also great to see Kreps address that concern:

A log, like a filesystem, is easy to optimize for linear read and write patterns. The log can group small reads and writes together into larger, high-throughput operations. Kafka pursues this optimization aggressively. Batching occurs from client to server when sending data, in writes to disk, in replication between servers, in data transfer to consumers, and in acknowledging committed data.

I can wholeheartedly support this view. The sequential write and sequential read behavior of logs is indeed astoundingly efficient, and a moderate amount of effort in implementing your log will provide you with a logging facility that can support a tremendous throughput. I've seen production systems handle terabytes of daily logging activity with ease.
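The batching optimization Kreps describes is simple to sketch: accumulate many small appends and issue a few large writes. The flush threshold here is arbitrary, and this models only the grouping, not Kafka's actual implementation:

```python
# Group many small appends into a few large sequential writes. Each
# entry in `flushes` models one physical, high-throughput write.

class BatchingWriter:
    def __init__(self, batch_size=3):
        self.batch_size = batch_size
        self.pending = []
        self.flushes = []      # each flush = one large sequential write

    def append(self, record):
        self.pending.append(record)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.flushes.append(list(self.pending))
            self.pending.clear()

w = BatchingWriter(batch_size=3)
for i in range(7):
    w.append(i)
w.flush()
# Seven logical appends became only three physical writes.
assert w.flushes == [[0, 1, 2], [3, 4, 5], [6]]
```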

After discussing the traditional uses of logs, for system reliability and for system integration, Kreps moves into some less common areas with his discussion of "Real-time Stream Processing".

The key concept that Kreps focuses on here involves the structuring of the consumers of log data:

The real driver for the processing model is the method of data collection. Data which is collected in batch is naturally processed in batch. When data is collected continuously, it is naturally processed continuously.

This is actually not such a new idea as it might seem. When database systems were first being invented, in the 1970's, they were initially built with entirely separate update mechanisms: you had your "master file," which was the most recent complete and consistent set of data, against which queries were run by your users during the day, and your "transaction file," or "batch file," to which changes to that data were appended during the day. Each night, the updates accumulated in the transaction file were applied to the master file by the nightly batch processing jobs (written in Job Control Language, natch), so that the master file was available with the updated data when you opened for business the next morning.

During the 1980's, there was a worldwide transition to more powerful systems, which could process the updates in "real time," supporting both queries and updates simultaneously, so that the results of your change were visible to all as soon as you committed the change to the system.

This was a huge breakthrough, and so, at the time, you would frequently see references to "online" DBMS systems, to distinguish these newer systems from the older master file systems, but that terminology has been essentially jettisoned in the last 15 years, as all current Relational Database Management Systems are "online" in this sense.

Nowadays, those bad old days of stale data are mostly forgotten, as almost all database systems support a single view of accurate data, but you still hear echoes of that history in the natterings-on about NoSQL databases, and "eventual consistency", and the like.

Because, of course, building distributed systems is hard, and requires tradeoffs. But since there's no reason why any application designer would prefer to run queries that get the wrong answer, other than "it's too expensive for us right now to build a system that always gives you the right answer," I predict that eventually "eventual consistency" will go the way of "master file" in computing, and somebody 30 years from now will look at these times and wistfully remember what it was like to be a young programmer when we old greybeards were struggling with such issues.

Anyway, back to Kreps and his discussion of "Real-time Stream Processing". After describing the need for consuming data in a continuous fashion, and relating tales of woe involving a procession of commercial vendors who promised to solve his problems but actually only provided weaker systems, Kreps notes that LinkedIn now uses its logging infrastructure to keep all of its various systems up to date in, essentially, real time.

The only natural way to process a bulk dump is with a batch process. But as these processes are replaced with continuous feeds, one naturally starts to move towards continuous processing to smooth out the processing resources needed and reduce latency.

LinkedIn, for example, has almost no batch data collection at all. The majority of our data is either activity data or database changes, both of which occur continuously.

They've also done a great job in providing lots of additional documentation and design information. For example, one of the basic complications about maintaining transaction logs is that you have to deal with the "garbage collection problem", so it's great to see that the Kafka team have documented their approach.
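The essence of the key-based compaction strategy the Kafka documentation describes can be sketched briefly: retain only the most recent record for each key, which preserves the final state of every key while bounding the log's growth. This is a simplification of the real mechanism:

```python
# Key-based log compaction: keep only the latest record per key,
# emitting the survivors in their original log order.

def compact(log):
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)    # later records win
    survivors = sorted(latest.items(), key=lambda kv: kv[1][0])
    return [(key, value) for key, (offset, value) in survivors]

log = [("user1", "a"), ("user2", "b"), ("user1", "c")]
# user1's first record is garbage-collected; final state is preserved.
assert compact(log) == [("user2", "b"), ("user1", "c")]
```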

Kreps closes his article with some speculation about systems of the future, noting that the trends are clear:

if you squint a bit, you can see the whole of your organization's systems and data flows as a single distributed database. You can view all the individual query-oriented systems (Redis, SOLR, Hive tables, and so on) as just particular indexes on your data. You can view the stream processing systems like Storm or Samza as just a very well-developed trigger and view materialization mechanism. Classical database people, I have noticed, like this view very much because it finally explains to them what on earth people are doing with all these different data systems—they are just different index types!

Yes, exactly so! This observation that many different data types can be handled by a single database system is indeed one of the key insights of the DBMS technology, while the related observation that many different kinds of data (finance data, order processing data, multimedia data, design documents) can benefit from change management, configuration tracking, and historical auditing is one of the key insights from the realm of version control systems, and the idea that both abstractions find their basis in the underlying system transaction log is one of the key insights of Kreps's paper.

Kreps suggests that this perspective offers the system architect a powerful tool for organizing their massive distributed systems:

Here is how this works. The system is divided into two logical pieces: the log and the serving layer. The log captures the state changes in sequential order. The serving nodes store whatever index is required to serve queries (for example a key-value store might have something like a btree or sstable, a search system would have an inverted index). Writes may either go directly to the log, though they may be proxied by the serving layer. Writing to the log yields a logical timestamp (say the index in the log). If the system is partitioned, and I assume it is, then the log and the serving nodes will have the same number of partitions, though they may have very different numbers of machines.

The serving nodes subscribe to the log and apply writes as quickly as possible to its local index in the order the log has stored them.

The client can get read-your-write semantics from any node by providing the timestamp of a write as part of its query—a serving node receiving such a query will compare the desired timestamp to its own index point and if necessary delay the request until it has indexed up to at least that time to avoid serving stale data.

The serving nodes may or may not need to have any notion of "mastership" or "leader election". For many simple use cases, the serving nodes can be completely without leaders, since the log is the source of truth.

You'll have to spend a lot of time thinking about this, as this short four-paragraph description covers an immense amount of ground. But I think it's an observation with a great amount of validity, and Kreps backs it up with the weight of his years of experience building such systems.
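One piece of that description is concrete enough to sketch: the read-your-writes mechanism, where a write yields its log index as a logical timestamp and a serving node refuses to answer a query until it has applied the log at least that far. All the names here are mine, and a real node would delay or redirect rather than synchronously catch up:

```python
# A serving node subscribes to the log and applies writes to its local
# index. A client passes the logical timestamp of its write with each
# query, so the node never serves data staler than that write.

class ServingNode:
    def __init__(self, log):
        self.log = log
        self.index = {}        # the node's local materialized view
        self.applied = 0       # how far into the log we have applied

    def catch_up(self):
        for key, value in self.log[self.applied:]:
            self.index[key] = value
            self.applied += 1

    def query(self, key, min_timestamp):
        if self.applied < min_timestamp:
            self.catch_up()    # a real node might block or delay instead
        return self.index.get(key)

log = []
log.append(("k", "v1"))        # the write...
timestamp = len(log)           # ...yields logical timestamp 1

node = ServingNode(log)        # this node has applied nothing yet
assert node.query("k", timestamp) == "v1"   # yet it cannot serve stale data
```

Because the log is the source of truth, the serving nodes need no leader election of their own; they are, as Kreps says, just indexes.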

I would suggest that Kreps's system architecture can be further distilled, to this basic precept:

Focus on the data, not on the logic. The logic will emerge when you understand the data.

That, in the end, is what 50 years of building database systems, version control systems, object-oriented programming languages, and the like, teaches us.

Finally, Kreps closes his article with a really super set of "further reading" references. I wish more authors would do this, and I really thank Kreps for taking the time to put together these links, since there were several here I hadn't seen before.

Although I began my career in database systems, where I worked with traditional recovery logs in great detail, designing and implementing several such systems, I've spent the last third of my career in distributed systems, where I spend most of my time working with replicated services, kept consistent using asynchronous log shipping.

This means I've spent more than three decades working with, implementing, and using logs.

So I guess that means I was the perfect target audience for Kreps's article; Kreps and I could be a pair of those "separated at birth" twins, since it sounds like he spends all his time thinking about, designing, and building the same set of distributed system infrastructure facilities that I do.

But it also means that it's an article with real practical merit.

People devote their careers to this.

Entire organizations are formed to implement software like this.

If you're a software engineer, and you hear developers in your organization talking about logs, replication, log-shipping, or similar topics, and you want to learn more, you should spend some serious time with Kreps's meaty article: read and re-read it; follow all the references; dissect the diagrams; discuss it with your colleagues.