About: A collection of links and short commentary, published weekly. In theory the inclusion criteria are that it's something I read last week (duh) and that it was either particularly interesting or something I might want to refer to later.
Subscribe with RSS
or see my main blog for
long-form writing.

Why POSIX filesystem semantics aren't a good fit for large scale systems. (The funny thing is that it never occurred to me that anyone would use POSIX APIs in a modern distributed context. But apparently in supercomputing they do).

Mison: A Fast JSON Parser for Data Analytics
Execute queries on a JSON file without parsing the file proper. Do a quick SIMD-based pass that gets the field/value locations. Do a full parse of a small sample of the file. Use the sample to predict the structural locations of the fields the user is interested in. Combine these bits of information to speculatively find the physical locations of the relevant fields.
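A minimal Python sketch of the speculative-lookup step (the offset prediction and field names here are hypothetical; the real parser builds SIMD structural bitmaps and handles nesting and escaped strings):

```python
import json

def speculative_get(record: str, field: str, guess_pos: int):
    """Probe the predicted byte offset for the quoted field name;
    fall back to a full parse on a miss. In Mison, guess_pos would
    come from parsing a small sample of records."""
    probe = f'"{field}"'
    if record.startswith(probe, guess_pos):
        # Value starts after the colon. This simplified slice assumes
        # a flat object with scalar values (no nesting, no commas
        # inside string values).
        start = record.index(":", guess_pos) + 1
        end = record.find(",", start)
        if end == -1:
            end = record.rfind("}")
        return json.loads(record[start:end])
    return json.loads(record)[field]  # slow path: full parse
```

Records with a stable schema hit the fast path almost every time, so most of the file is never fully parsed.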

COST in the land of databases
> In the past two years the database community has gotten substantially worse at doing graph processing with SQL. I wouldn't recommend it, really. They've gone from a 16-core machine being only 10x slower than a single-threaded for-loop, to being up to 1000x slower but only on graphs up to 500MB in size.

Rendezvous Hashing: My Baseline “Consistent” Distribution Method
> Regardless of the reason for consistent hashing’s popularity, I feel the go-to technique should instead be rendezvous hashing. Its basic form is simple enough to remember without really trying (one of those desert island algorithms), it is more memory efficient than consistent hashing in practice, and its downside–a simple implementation assigns a location in time linear in the number of hosts–is not a problem for small deployments, or even medium (a couple racks) scale ones if you actually think about failure domains.
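The basic form really is a desert-island algorithm. A sketch in Python (md5 is just a stand-in; a real deployment would use a fast non-cryptographic hash):

```python
import hashlib

def rendezvous_owner(key: str, nodes: list) -> str:
    """Return the node with the highest hash(node, key) score.
    Linear in the number of nodes, per the quoted downside."""
    def score(node):
        digest = hashlib.md5(f"{node}:{key}".encode()).digest()
        return int.from_bytes(digest, "big")
    return max(nodes, key=score)
```

The consistency property falls out for free: removing a node only remaps the keys that node owned, since every other key keeps its highest-scoring node.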

How do you enjoy the process of creating a new language,
when you've been writing compilers for a long time? By adding artificial restrictions. Only assembly; no libraries, programming languages, or code generators.

Implement a minimal BCPL-like language in assembly. Then use that to implement a Lisp interpreter, and use the interpreter to build a reasonably featured VM (e.g. garbage collector, delimited continuations). Write a compiler, assembler, disassembler, and linker targeting the VM. Then use these tools to write the language you originally wanted with objects, pattern matching, and non-sexp macros.

Why Command And Vector Processors Rock
> You see, the Amiga documented its command processor. The designers wanted you to write programs that ran on it. They wanted you to use it for doing all sorts of clever things. They recognized that the power to operate the underlying horsepower directly was something that could amplify the capabilities of a system way past the limits of its original design.

Read this as part of some archaeology into numeric representation in early Lisp systems. But it actually turned out to be a pretty neat systems paper in general. One thing that's striking is how readable this 50-year-old paper still is. The vocabulary of systems programming has changed surprisingly little (we just switched from words to bytes), and even the problems being solved are at the core the same. It's all about memory hierarchies, even at the dawn of computing.

Paper describes an early version of BBN Lisp for a machine with 16K words of core memory, and 88K words of absurdly slow drum memory. The hardware has no paging support. How do you make efficient use of the drum memory to fit meaningful programs? You need to do the paging in software, and reorganize the data layouts to minimize pointer chasing and page faults. (The latter bit is what I was really interested in, while looking at the history of tagged pointers).

Slides with anecdotes on game optimization in general, but on the Jaguar CPU in particular. E.g. didn't realize you really have to use SIMD on those CPUs, or you can't even use the full cache bandwidth. Neat example of a custom spatial database near the end.

The painful step-by-step journey of implementing a seemingly trivial optimization in a production compiler. The "Lessons Learned" part is especially great; I'm fighting the temptation to just quote all of it here.

> I switched to a 12” MacBook before I started working on my swiftc PR. It was so slow that I was only able to iterate on the code once a day, because a single compile and test run would take all night. I ended up buying a top-of-the-line 15” MacBook Pro because it was the only way to iterate on the codebase more than once a day.

> It’s really easy to break swiftc because of how complex it is. My original pull request was approved and merged in a month. Despite only having about 200 lines of changes, I received 125 comments from six reviewers. Even after that much scrutiny, it was reverted almost immediately because it introduced a memory leak that a seventh person found after running a four hour long standard library integration test.

A chip reverse engineering story with the best digressions. It's not just about figuring out that the supposed RAM chip is actually a touch tone dialtone generator; it's also figuring out the maths on every dialtone generator on the market to exactly identify this one. And then going into some semiconductor physics for good measure.

Slava Pestov reads through The NeWS Book: An Introduction to the Network/Extensible Window System from 1989. I never knew anything about NeWS, except from the Unix Haters Handbook X11 rant, so it was nice to fill it in with some more facts.

> Specifically, what I needed was mostly like a tree diff but I wasn’t optimizing for the same thing as other algorithms, what I wanted to optimize for was resulting file size, including indentation.

Many people don't appreciate how complicated handling configuration data is in the real world. (Pretty much every one of my jobs has at some point turned into a configuration handling nightmare). This is a good story on exactly that. There's a need for a seemingly very simple config manipulation operation, but a couple of weeks later you find yourself doing dynamic programming.

(Also, this is not just a good story, but a great example of how to present an algorithm).

How to make a practical web search system using bloom filters rather than an inverted index. I especially like the notes on how the classical problems of signature-based indexing don't really matter in this domain. E.g. a modest amount of false positives is not a problem, since the full result set needs to be scored no matter what. Or how sharding the index by number-of-unique-terms was impractical in the past due to excessive disk seeks, but is no problem when the index needs to be sharded across hundreds of machines anyway.
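A toy Python version of the idea (one filter per document; the sizes and hash counts are arbitrary): query terms are tested against every document's filter, and an occasional false positive just adds one more document to the scoring pass.

```python
import hashlib

class BloomIndex:
    """One bloom filter per document, as in the signature-file approach."""
    def __init__(self, nbits=1024, nhashes=3):
        self.nbits, self.nhashes, self.bits = nbits, nhashes, 0

    def _positions(self, term):
        # Derive nhashes bit positions per term from salted digests.
        for i in range(self.nhashes):
            h = hashlib.md5(f"{i}:{term}".encode()).digest()
            yield int.from_bytes(h, "big") % self.nbits

    def add(self, term):
        for p in self._positions(term):
            self.bits |= 1 << p

    def might_contain(self, term):
        return all(self.bits >> p & 1 for p in self._positions(term))

def candidates(query_terms, docs):
    """docs: list of (doc_id, BloomIndex) pairs. False positives are
    acceptable, since every candidate gets scored anyway."""
    return [doc_id for doc_id, f in docs
            if all(f.might_contain(t) for t in query_terms)]
```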

Reverse engineering the microcode in Athlons and Phenoms. Half of this work was done by mutating existing microcode update files, and probing the behavior of various instructions in a minimal operating system. The other half was done by delayering a CPU and using an electron microscope to find and read the microcode ROM.

Another trip to crazytown. How Windows Vista would artificially limit network throughput if any sound was playing. (With an effect that would be magnified linearly as more NICs were added to the machine).
Brought up in the HN discussion of my PS4 download speed post.

A case study in how not to change defaults when evolving a program from one use case to another. (Any blog platform will inevitably try to transform into a general purpose CMS and call a dystopian hellscape of ecommerce plugins an "ecosystem"). But I can't understand how anyone would think that changing the default RSS feed item count from 10 (which sounds pretty standard) to infinite could be the right thing.

A good discussion on the problems with transparent huge pages. (I turn them off at work for our data analysis machines, due to some absolutely crippling throughput issues they cause. Really need to check whether that server is already running on a 4.6+ kernel, with the supposedly improved THP behavior mentioned in this thread.)

Why does Linux load average include processes that are blocked on swapping? (Never realized it did; thought it used the classical definition). You know it's good software archaeology when it's dealing with something that's still relevant today, and the search bottoms out in MACRO-10 code.

The thesis here is that the Linux kernel isn't a monorepo. Instead it's a monotree with multiple repositories. There are multiple repositories, e.g. the main one by Linus, subsystem specific ones, etc. Hence not a monorepo. But all of those repositories are rooted in the same tree, with changes flowing between the repos arbitrarily (so they're not polyrepos, which would generally need to be totally independent of each other). Hence the need for the new term.

> But CSS wouldn’t be introduced for five years, and wouldn’t be fully implemented for ten. This was a period of intense work and innovation which resulted in more than a few competing styling methods that just as easily could have become the standard.

Mike Hearn on the hard lessons about user account authentication learned at Google. I think I disagree with the ultimate conclusion that it's futile to implement your own system, and that you should just use OAuth to piggyback on Google/FB auth. Or that the only good alternative is email links that generate session tokens. As a user I don't think I'd like either of those.
But it's still super important to be aware of the actual issues.

On the social implications of rating systems in games.
What happens when the output of a rating system stops being used as a prediction, and instead becomes a status symbol? (And an argument for keeping MMRs purely hidden, while making the public "ranks" something you can advance on by sufficient grinding).

Packing functions in memory such that caller/callee are more likely to be on same cache line / same page. (Surprising to see the "same cache line" part actually happens 5% of the time; the ITLB improvements make a lot more intuitive sense).
Do this using callgraph information collected continuously from production machines.

Then use same mechanism for keeping the very hottest code in huge pages. Can't do this universally, due to the tiny number of hugepage I-TLB entries.

(Excluding FreeBSD on their CDN servers, of course). Asked in the context of Gregg being an ex-Solaris hacker.

It's very easy for people to underestimate how big the cumulative effect of 20 years of even a slightly faster rate of improvement ends up being. E.g. were there any major enhancements to the Illumos TCP stack in this decade? If there were, it's at least not obvious. Or (since I dug this post out due to a "Why would people run Linux instead of OpenBSD" discussion), anyone wanting to run a major Internet service on OpenBSD would probably need to hire 1-2 fulltime hackers to modernize the TCP implementation.

That's just the bit of operating systems I'm familiar with. But it's hard to believe it would somehow be a unique problem area.

The Gamecube had a GPU with some programmable parts, rather than being purely fixed-function. For Dolphin to emulate that, they need to compile the Gamecube GPU programs to modern GPU shaders. But this compilation takes time, and they don't know the set of needed shaders up front (it's fully dynamic). How do you solve that?

> But what if we don't have to rely on specialized shaders? The crazy idea was born to emulate the rendering pipeline itself with an interpreter that runs directly on the GPU as a set of monsterous flexible shaders.

The great thing about Dolphin updates is that they don't just explain what a new feature is; they explain what other solutions have been tried or proposed, and why those solutions don't actually work.

Branch Prediction
How compilers decide when to use branches vs. when to use conditional moves.

Bad Record Mac
A user reports that loading web pages from a MirageOS server occasionally trips an "impossible" TLS error condition. Turns out the Xen network driver for Mirage had a bug that corrupted packets under sufficient load. The problem wasn't caught at the transport layer, since the router accepted TCP packets with invalid checksums (no validation), and then updated the checksums by recomputing them from scratch rather than doing an incremental update.
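For contrast, here's what the incremental update would look like, per the RFC 1624 formula HC' = ~(~HC + ~m + m') in 16-bit one's complement arithmetic (a sketch over lists of 16-bit words; real routers apply it to the changed header words):

```python
def cksum16(words):
    """Internet checksum over a list of 16-bit words (RFC 1071 style)."""
    s = 0
    for w in words:
        s += w
        s = (s & 0xFFFF) + (s >> 16)  # fold in the end-around carry
    return ~s & 0xFFFF

def incremental_update(old_cksum, old_word, new_word):
    """RFC 1624: HC' = ~(~HC + ~m + m'). Touches only the changed word."""
    s = (~old_cksum & 0xFFFF) + (~old_word & 0xFFFF) + new_word
    s = (s & 0xFFFF) + (s >> 16)
    s = (s & 0xFFFF) + (s >> 16)
    return ~s & 0xFFFF
```

A router doing the incremental update preserves an already-bad checksum (garbage in, garbage out); recomputing from scratch launders the corruption, which is exactly what hid the Mirage bug from the transport layer.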

Design problems of Solidity
> Solidity has far worse problems than not being an advanced research language. Just being a sanely designed normal language would be a big step up. Solidity is so riddled with bizarre design errors it makes PHP 4 look like a work of genius.

It's the contract programming language for Ethereum, where bugs in contracts lead to $10M cyber-heists. And if you read the details, the quoted bit is actually a fair summary...