March 03, 2015

As soon as Rust 1.0 Alpha 2 was released, I downloaded it and sat down to finally play with Rust. With the final release of 1.0 on the horizon, I decided that the language was mature enough to try writing some real code. My first test of the language would be to implement a program with deep UNIX heritage: a shell.

Writing a shell exercises a lot in a language—string manipulation, subprocesses, input/output, other parts of file handling, data structures, memory management, and more. Rust can call any function in libc, but the goal is to test out Rust, not UNIX, so anything in libc::* is off-limits. Additionally, using any function from libc is inherently unsafe. Rust's type system, memory-safety, and compile-time checks can only validate Rust code. A Rust program that uses features provided by a C library is only slightly less likely to crash due to a dangling pointer than a pure C program. Therefore, we limit ourselves to Rust's native libraries. The std crate is of particular interest. If std is sufficient to write a shell, then Rust is ready for production.

First, let's decide on the features the shell will support. The POSIX shell grammar is ridiculously complicated, and would require the use of a parser-generator to reasonably implement. There's preliminary ragel support for Rust, but it looks like it might not yet be updated to be compatible with the upcoming 1.0-final. ragel is also limited to regular languages, but the shell grammar can't be parsed with regular expressions—it's context-free, since balancing the parentheses in "$((...))" calls for a pushdown automaton. There's a PEG plugin for rustc, which looks functional, and a yacc-like tool, but it's not usable yet.

With parser-generators still under active development for Rust, I decided to scale back the shell to something a bit more modest: UNIX v6 compatibility. To start with, I decided to match the Xv6 shell, which is fairly simple. The Xv6 shell supports only command execution, file redirection (for stdin and stdout only), and pipelines. There's no support for quoting, subshells, or command lists—those are left as challenge problems for students.

After some time staring at documentation, I managed to write the bare bones of the shell: it was able to read a line of user input, split the command on spaces, and process the builtins cd and pwd. The recommendation to use std::old_io instead of std::io was confusing, but didn't make things any more difficult. Writing cd proved to be a bit harder: $HOME is provided by std::env::home_dir(), but std::env still uses std::old_path rather than std::path, which was not immediately obvious from the documentation.

Finally, it was time to add command invocation. It turns out that this is currently impossible without resorting to libc::* calls. Rust has a very rich API for spawning subprocesses under development, std::process::Command, but unfortunately it lacks support for stdin/stdout redirection other than redirecting to /dev/null, inheriting from the current process, or capturing the output. I filed a bug against the Rust RFCs requesting this feature, and I hope it gets considered before the final release. Until then, there are many common systems programming patterns and tasks that are impossible without resorting to non-Rust libraries.

The shell-in-Rust project is currently on hold. Although this attempt was unsuccessful, I'm still really excited about Rust. I'll be keeping an eye on the Rust std::io and std::process RFCs and implementation, eagerly waiting for the necessary features to land.

January 31, 2015

tl;dr: A non-nameless term equipped with a map specifying a de Bruijn numbering can support an efficient equality without needing a helper function. More abstractly, quotients are not just for proofs: they can help efficiency of programs too.

The cut. You're writing a small compiler, which defines expressions as follows:

type Var = Int

data Expr = Var Var
          | App Expr Expr
          | Lam Var Expr

where Var is provided from some globally unique supply. But while working on a common sub-expression eliminator, you find yourself needing to define equality over expressions.

You know the default instance won’t work, since it will not say that Lam 0 (Var 0) is equal to Lam 1 (Var 1). Your colleague Nicolaas teases you that the default instance would have worked if you used a nameless representation, but de Bruijn levels make your head hurt, so you decide to try to write an instance that does the right thing by yourself. However, you run into a quandary:
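
(The attempted instance isn't reproduced in this excerpt; the following sketch of it is mine rather than the original post's, but it shows where you get stuck.)

instance Eq Expr where
    Var v     == Var v'      = v == v'
    App e1 e2 == App e1' e2' = e1 == e1' && e2 == e2'
    Lam v e   == Lam v' e'
        | v == v'            = e == e'
        | otherwise          = undefined -- now what?
    _         == _           = False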

If v == v', things are simple enough: just check if e == e'. But if they're not... something needs to be done. One possibility is to rename e' before proceeding, but this results in an equality which takes quadratic time. You crack open the source of one famous compiler, and you find that in fact: (1) there is no Eq instance for terms, and (2) an equality function has been defined with this type signature:

eqTypeX :: RnEnv2 -> Type -> Type -> Bool

Where RnEnv2 is a data structure containing renaming information: the compiler has avoided the quadratic blow-up by deferring any renaming until we need to test variables for equality.

“Well that’s great,” you think, “But I want my Eq instance, and I don’t want to convert to de Bruijn levels.” Is there anything to do?

Perhaps a change of perspective is in order:

The turn. Nicolaas has the right idea: a nameless term representation has a very natural equality, but the type you've defined is too big: it contains many expressions which should be equal but structurally are not. But in another sense, it is also too small.

Here is an example. Consider the term x, which is a subterm of λx. λy. x. The x in this term is free; it is only through the context λx. λy. x that we know it is bound. However, in the analogous situation with de Bruijn levels (not indexes—as it turns out, levels are more convenient in this case) we have 0, which is a subterm of λ λ 0. Not only do we know that 0 is a free variable, but we also know that it binds to the outermost enclosing lambda, no matter the context. With just x, we don’t have enough information!

If you know you don’t know something, you should learn it. If your terms don’t know enough about their free variables, you should equip them with the necessary knowledge:
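
(The post's own definition is not reproduced in this excerpt; what follows is a sketch of one way to equip terms, assuming the map assigns free variables levels that can never collide with the levels handed out to binders, say negative ones.)

import qualified Data.Map as Map

type Level = Int

-- An expression together with a de Bruijn numbering of its free variables.
data NExpr = N (Map.Map Var Level) Expr

instance Eq NExpr where
    N m e == N m' e' = go 0 m e m' e'
      where
        -- d is the level assigned to the next binder we pass under; the
        -- equipped maps are assumed to give free variables levels that a
        -- binder level can never equal.
        go _ m1 (Var v)   m2 (Var v')    = Map.lookup v m1 == Map.lookup v' m2
        go d m1 (App a b) m2 (App a' b') = go d m1 a m2 a' && go d m1 b m2 b'
        go d m1 (Lam v a) m2 (Lam v' a') =
            go (d + 1) (Map.insert v d m1) a (Map.insert v' d m2) a'
        go _ _ _ _ _                     = False

With this representation, N Map.empty (Lam 0 (Var 0)) and N Map.empty (Lam 1 (Var 1)) compare equal in a single linear pass.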

The conclusion. What have we done here? We have quotiented a type—made it smaller—by adding more information. In doing so, we recovered a simple way of defining equality over the type, without needing to define a helper function, do extra conversions, or suffer quadratically worse performance.

Sometimes, adding information is the only way to get the minimal definition. This situation occurs in homotopy type theory, where equivalences must be equipped with an extra piece of information, or else it is not a mere proposition (has the wrong homotopy type). If you, gentle reader, have more examples, I would love to hear about them in the comments. We are frequently told that “less is more”, that the route to minimalism lies in removing things: but sometimes, the true path lies in adding constraints.

Postscript. In Haskell, we haven’t truly made the type smaller: I can distinguish two expressions which should be equivalent by, for example, projecting out the underlying Expr. A proper type system which supports quotients would oblige me to demonstrate that if two elements are equivalent under the quotienting equivalence relation, my elimination function can't observe it.

Postscript 2. This technique has its limitations. Here is one situation where I have not been able to figure out the right quotient: suppose that the type of my expressions is such that all free variables are implicitly universally quantified. That is to say, there exists some ordering of quantifiers on a and b such that a b is equivalent to b a. Is there a way to get the quantifiers in order on the fly, without requiring a pre-pass on the expressions using this quotienting technique? I don’t know!

January 08, 2015

It happens every so often that a program needs to figure out where its
executable file is located. A common situation is when you are writing a unit
test, and your test data is located in the same directory as your binary.

Unfortunately, POSIX does not provide any interface to do that, so if you
are writing portable C/C++ code, you are in for an adventure. Below are short
code snippets which do that on various platforms. Try to guess which ones work
with which platform.

Disclaimer: those are the snippets which I have collected from
various API references, StackOverflow answers and SDL source code. I have not
tested them, and you should not assume they actually always work correctly.

Variant 1

GetModuleFileName(NULL,path,sizeof(path));

Variant 2

path=getexecname();

Variant 3

/* This is, of course, an oversimplification. In practice, you would want
 * to do the link expansion recursively, or at least one level deeper than
 * this. */
readlink("/proc/self/exe",path,sizeof(path));

The answers here are (highlight to show): #1 is
Windows, #2 is Solaris, #3 is Linux, #4 is FreeBSD, #5 is OS X, #6 could be
Python, Perl, PHP, Ruby (and probably others as well), #7 is C# and #8 is
Java.

In practice, many frameworks (like SDL or Qt) have already solved that
problem once, and you can just call SDL_GetBasePath or
QCoreApplication::applicationFilePath() to get the path you
need.

In the post, I explained that the reason this occurs is that unsafe FFI calls are not preemptible, so when unsafeBottom loops forever, the Haskell thread can't proceed.
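
(The program itself isn't reproduced in this excerpt. A minimal sketch of its shape, assuming a hypothetical C function loop_forever that never returns, might look like this; the original code may differ in details.)

{-# LANGUAGE ForeignFunctionInterface #-}
import Control.Concurrent

-- A C function that never returns, imported with an *unsafe* call, so the
-- RTS cannot preempt it once it starts running.
foreign import ccall unsafe "loop_forever"
    unsafeBottom :: IO ()

main :: IO ()
main = do
    _ <- forkIO $ print "Pass (expected)"
    _ <- forkIO unsafeBottom
    -- If the unsafe call ends up monopolizing the OS thread that main is
    -- bound to, this line may never be reached, even with -threaded.
    print "Pass (not expected)"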

This explanation would make perfect sense except for one problem: the code also hangs even when you run with the multi-threaded runtime system, with multiple operating system threads. David Barbour wrote in wondering if my claim that unsafe calls blocked the entire system was out of date. But the code example definitely does hang on versions of GHC as recent as 7.8.3. Based on the title of this post, can you guess the reason? If you think you know, what do these variants of the program do?

Change main = to main = runInUnboundThread

Change the second forkIO to forkOn 2

Add a yield before unsafeBottom, and another yield before print "Pass (not expected)"

The reason why the code blocks, or, more specifically, why the main thread blocks, is because the unsafe FFI call is unpreemptibly running on the operating system thread which the main thread is bound to. Recall, by default, the main thread runs in a bound operating system thread. This means that there is a specific operating system thread which must be used to run code in main. If that thread is blocked by an FFI call, the main thread cannot run, even if there are other worker threads available.

We can thus explain the variants:

main is run in an unbound thread, no blocking occurs, and thus the second print runs.

By default, a forked thread is run on the same capability as the thread that spawned it (this is good, because it means no synchronization is necessary) so forcing the bad FFI call to run on a different worker prevents it from blocking main.

Alternately, if a thread yields, it might get rescheduled on a different worker thread, which also prevents main from getting blocked.

So, perhaps the real moral of the story is this: be careful about unsafe FFI calls if you have bound threads. And note: every Haskell program has a bound thread: main!

December 04, 2014

I finally got around to upgrading to Utopic. A year ago I reported that gnome-settings-daemon no longer provided keygrabbing support. This was eventually reverted for Trusty, which kept everyone's media keys working.

I'm sorry to report that in Ubuntu Utopic, the legacy keygrabber is no more:

It appears that the Unity team has forked gnome-settings-daemon into unity-settings-daemon (actually this fork happened in Trusty), and as of Utopic gnome-settings-daemon and gnome-control-center have been gutted in favor of unity-settings-daemon and unity-control-center. Which puts us back in the same situation as a year ago.

I don't currently have a solution for this (pretty big) problem. However, I have solutions for some minor issues which did pop up on the upgrade:

November 21, 2014

We have changed the MySQL default storage engine on sql.mit.edu from MyISAM to InnoDB. This change only affects newly created tables. Existing tables are unaffected, with the exception noted below of Trac databases. No user action is required.

InnoDB offers many improvements over MyISAM and has become the default engine upstream as of MySQL 5.5. Even though sql.mit.edu still runs MySQL 5.1, we made this change because it is required by Trac 1.0.2. We have also converted all Trac databases to InnoDB to let them continue working with Trac 1.0.2.

If you wish to take advantage of the new features provided by InnoDB, you can convert existing tables using the ALTER TABLE command. You can still create new MyISAM tables by passing the ENGINE=MyISAM parameter to CREATE TABLE. You can also convert InnoDB tables back to MyISAM with ALTER TABLE.

Note that the version of InnoDB running on sql.mit.edu does not yet support FULLTEXT indexes. We expect this to be eventually fixed when we upgrade to MySQL 5.6 or later. Until then, tables that use FULLTEXT indexes should be left on the MyISAM engine.

November 15, 2014

Subtyping is one of those concepts that seems to make sense when you first learn it (“Sure, convertibles are a subtype of vehicles, because all convertibles are vehicles but not all vehicles are convertibles”) but can quickly become confusing when function types are thrown into the mix. For example, if a is a subtype of b, is (a -> r) -> r a subtype of (b -> r) -> r? (If you know the answer to this question, this blog post is not for you!) When we asked our students this question, invariably some were led astray. True, you can mechanically work it out using the rules, but what’s the intuition?

Maybe this example will help. Let a be tomatoes, and b be vegetables. a is a subtype of b if we can use an a in any context where we were expecting a b: since tomatoes are (culinary) vegetables, tomatoes are a subtype of vegetables.

What about a -> r? Let r be soup: then we can think of Tomato -> Soup as recipes for tomato soup (taking tomatoes and turning them into soup) and Vegetable -> Soup as recipes for vegetable soup (taking vegetables—any kind of vegetable—and turning them into soup). As a simplifying assumption, let's say that all we care about is that the result is soup, and not what type of soup it is.

What is the subtype relationship between these two types of recipes? A vegetable soup recipe is more flexible: you can use it as a recipe to make soup from tomatoes, since tomatoes are just vegetables. But you can’t use a tomato soup recipe on an eggplant. Thus, vegetable soup recipes are a subtype of tomato soup recipes.

This brings us to the final type: (a -> r) -> r. What is (Vegetable -> Soup) -> Soup? Well, imagine the following situation...

One night, Bob calls you up on the phone. He says, “Hey, I’ve got some vegetables left in the fridge, and I know your Dad was a genius when it came to inventing recipes. Do you know if he had a good soup recipe?”

“I don’t know...” you say slowly, “What kind of vegetables?”

“Oh, it’s just vegetables. Look, I’ll pay you back with some soup, just come over with the recipe!” You hear a click on the receiver.

You pore over your Dad’s cookbook and find a tomato soup recipe. Argh! You can’t bring this recipe, because Bob might not actually have tomatoes. As if on cue, the phone rings again. Alice is on the line: “The beef casserole recipe was lovely; I’ve got some tomatoes and was thinking of making some soup with them, do you have a recipe for that too?” Apparently, this happens to you a lot.

“In fact I do!” you turn back to your cookbook, but to your astonishment, you can’t find your tomato soup recipe any more. But you do find a vegetable soup recipe. “Will a vegetable soup recipe work?”

“Sure—I’m not a botanist: to me, tomatoes are vegetables too. Thanks a lot!”

You feel relieved too, because you now have a recipe for Bob as well.

Bob is a person who takes vegetable soup recipes and turns them into soup: he’s (Vegetable -> Soup) -> Soup. Alice, on the other hand, is a person who takes tomato soup recipes and turns them into soup: she’s (Tomato -> Soup) -> Soup. You could give Alice either a tomato soup recipe or a vegetable soup recipe, since you knew she had tomatoes, but Bob’s vague description of the ingredients he had on hand meant you could only bring a recipe that worked on all vegetables. Callers like Alice are easier to accommodate: (Tomato -> Soup) -> Soup is a subtype of (Vegetable -> Soup) -> Soup.
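
Haskell has no built-in subtyping, but we can simulate the story with explicit coercion functions. Here is a small sketch (the types and names are mine, not from the post):

data Tomato    = Tomato
data Vegetable = FromTomato Tomato | Eggplant
data Soup      = Soup

-- Tomato is a "subtype" of Vegetable: every tomato can be seen as a vegetable.
asVegetable :: Tomato -> Vegetable
asVegetable = FromTomato

-- Contravariance in the argument: a vegetable soup recipe can be used as a
-- tomato soup recipe, i.e. Vegetable -> Soup is a "subtype" of Tomato -> Soup.
onTomatoes :: (Vegetable -> Soup) -> (Tomato -> Soup)
onTomatoes recipe = recipe . asVegetable

-- Flipping once more: Alice, who consumes tomato soup recipes, can stand in
-- for Bob, who consumes vegetable soup recipes, i.e.
-- (Tomato -> Soup) -> Soup is a "subtype" of (Vegetable -> Soup) -> Soup.
asBob :: ((Tomato -> Soup) -> Soup) -> ((Vegetable -> Soup) -> Soup)
asBob alice vegetableRecipe = alice (vegetableRecipe . asVegetable)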

In practice, it is probably faster to formally reason out the subtyping relationship than it is to intuit it out; however, hopefully this scenario has painted a picture of why the rules look the way they do.

November 05, 2014

Last week, I found myself trying to update the software on a server that was still running Red Hat Enterprise Linux 5. My goal was to run an ASAN build of zephyrd, the Zephyr server, to catch a heap corruption bug. ASAN is a feature of the clang compiler, which unfortunately wasn't available.

RHEL5, as well as the free derivative distributions that offer compatibility, was first released in 2007 with kernel 2.6.18, gcc 4.1.2, and glibc 2.5. Although RHEL5 is still "in support", receiving bug and security updates, it's not receiving new versions of the software it shipped with.

clang provides binary downloads---getting a new version of clang onto this system should be easy! The first problem strikes: glibc is too old to support the clang binaries; they depend on features from a newer version. Let's try compiling clang from source. Off goes ./configure...gcc 4.7 or newer required?! Oh no. Building clang would first require building gcc just to get a new enough compiler. That's not worth the time.

At this point, I stepped back to re-evaluate my options. Although I had root access to the server, it wasn't a machine that I had full control over; the operations team owned it. On one hand, I could ask the team to reinstall it with something a bit more modern, but it would probably not happen until fleet-wide upgrades start. On the other, I could try to press onward with a larger hammer: install a chroot with a newer Enterprise Linux myself. Since the chroot was the more timely option, I decided to go try that first:

which promptly fails because the version of rpm in EL5 is too old. This can easily be fixed by using another host to build the chroot, as well as running yum install yum to get a minimal base image. Copying the base image over and running chroot . results in the dreaded FATAL: kernel too old message. No chroot tricks are going to work now; the version of glibc in EL7 assumes that the kernel is going to be much newer than the 2.6.18 that's installed.

There's just one last thing to do before giving up and reinstalling the whole server: recompiling glibc for the older kernel. The glibc spec file around line 400 sets the minimum kernel version it supports:

%define enablekernel 2.6.32

By changing that to

%define enablekernel 2.6.18

we can build an EL7 glibc package that will make a base system suitable for building chroots that work on EL5 systems! The resulting base system with this glibc is the "Enterprise Linux Emergency Modernization Kit" because it makes it easy to get new software on ancient EL5 hosts. You can easily build your own by following these instructions, or you could download a CentOS7-based one now!

November 02, 2014

A few days ago, I was installing Fedora 18 on a brand-new machine, which had two 1TB SATA hard drives which I wanted to put into a RAID1 (mirroring) configuration. mdadm makes managing software RAID on Linux easy, and the Fedora Installer (anaconda) has always had extremely powerful tools for setting up RAID.

Anaconda also supports Logical Volume Manager (LVM), which is part of the recommended configuration for Fedora, because it adds lots of cool features, like being able to have non-contiguous partitions, moving partitions around the disk, between disks, and even spanning disks!

Combining the data redundancy of mdadm with the partitioning features of LVM is a common strategy, but unfortunately one that's no longer supported in Fedora 18's Anaconda. In fact, there's an open bug on the subject, and a very heated discussion on the mailing lists.

The official solution is to forego the graphical installer entirely and instead use a kickstart script -- Anaconda's automated install mode. After a lot of trial and error, I managed to get a kickstart that works, but only when the installer was booted with noefi. The kickstart was copied onto a FAT32-formatted flash drive as the file ks.cfg, and loaded with the boot parameter ks=hd:/sdc1/ks.cfg

This kickstart has one downside: it formats the raw disks entirely, so it's not suitable for adding a Fedora installation to an existing LVM-with-mdadm configuration. I'm still investigating how to get anaconda to cooperate when the RAID and LVM are already configured.

October 27, 2014

On Linux and other UNIX systems, a domain name is translated to an IP address using the resolver(3) family of functions. Usually, these are provided by libresolv, but on some systems they are available directly in libc.

Over the last couple of days, I've been working to overhaul the code for Hesiod, a system for querying users, groups, and other useful information in a networked computer environment through DNS. I've been testing the code on Linux, FreeBSD, Solaris, and OS X, all of which were able to successfully compile the library.

To my surprise, on OS X, the configure script determined that the resolver functions exist in libc, and didn't include libresolv. When I went to run the Hesiod test suite, all I got was a linker error: _res_9_init was not found. What.

Let's write a simple program that calls res_init() and see what happens.

Compile it, no linker errors; run it, it works. What happened? Hold on, where's #include <resolv.h>? This code shouldn't have even compiled; let's add that back in and test again... now it breaks. Great.

Some digging around resolv.h shows that the functions are all #define'd to different symbol names, which are only found in libresolv, but libsystem_info, one of the libraries libSystem depends on, provides most of the resolver(3) functions. Interestingly, it also seems to have implementations for dn_skipname, but the symbol name is __dn_skipname instead of _dn_skipname, so the linker can't find it.

Let's find out if these functions even work. I've written a simple C program that looks up an A record for the domain name you specify on the command line. We'll compile it twice, once without the resolv.h include and no libresolv, and once with, and observe the results.

Since all of the documentation on OS X says you're supposed to use libresolv, I wouldn't recommend depending on this behavior. Unfortunately, it means that if you're writing a configure script, you have to be careful to not just test if the resolver(3) functions exist in libc, but do so while including resolv.h.

October 18, 2014

I've wanted to do that for quite a while; the issue was that I could not
find a platform I liked. WordPress seemed like overkill, and many of the more
minimalist solutions were written in Ruby, which I can't speak... In short, I
ended up writing my own blogging engine using Flask.

I am baffled by how well-developed web frameworks for Python are
these days. You see, even though I've spent quite a while improving various
parts of MediaWiki (which is decidedly a webapp), I've never actually worked
with any real web frameworks, even for PHP. MediaWiki was written
in vanilla PHP, back when it was an actual improvement over Perl. Over
time it grew its very own supporting framework-library; it includes
some quite crazy features, like a reverse-engineered IE6 content sniffer and a
Lua sandbox. At the same time it is missing some things which you would
normally consider vital, like an HTML templating system. So when I discovered
that I made a blog in just 300 lines of code, only ~100 of which were Python, I
was very confused and tried hard to figure out which important part I was
missing.

Most of the time I spent on this was debugging CSS to make sure the website
is responsive and works well on some reasonable subset of browsers. I still
have a feeling that I actually have no idea how CSS works (that is apparently
a somewhat widespread sentiment),
so if you find that something is broken for a browser/screen size combination
you use, please tell me.

October 03, 2014

systemd, to some Linux users, is a curse word. Dramatic change always is.

I was quite excited when it was first announced; features like proper dependency ordering and using cgroups to track service status meant I wouldn't have to write complex shell scripts for sysvinit. I even went out of my way to port the legacy boot scripts I had to systemd service units when I installed a Fedora 15 server. systemd as an init system quickly proved itself extremely capable and much more pleasant than Upstart.

systemd quickly began to expand in scope and started to provide replacement daemons for much of the traditional UNIX stack. Unification of these services is against the "UNIX way", which promotes the philosophy of doing one thing, but doing it well. This upset many system administrators, developers, and users; breaking away from a traditional philosophy quickly made it a polarizing technology.

No systemd component has been more polarizing than the Journal. journald is a replacement for the traditional syslog, but with one major twist: it's a binary file format. System Administrators everywhere were in an instant uproar—grep won't work, you now have to worry about the journals being corrupted, and traditional log analytics tools won't work on the files. I was skeptical as well. Log files are a sysadmin's primary debugging tool, and if they're not available or corrupt, they're worse than useless. journald has gotten in my way on a few occasions, swallowing messages intended for log-processing scripts.

A log like no other

Nonetheless, I continued to run journald on my Fedora systems, to keep an eye on features and improvements. One feature that recently caught my eye was the per-user journal.

A web host I help maintain, Scripts, had a long-standing bug to improve user access to error logs from Apache. Scripts has over 4000 users, but limited disk space, and an aggressive log rotation policy. We had a command called logview that was effectively a setuid binary that ran grep $USER error_log, which worked great if your application always prefixed its logs with the username. When the majority of users ran PHP applications, we included a custom PHP module to do exactly that. This works less well for custom Django or Rails applications, since there wasn't a way for us to drop in a global plugin. The Scripts Team was forced to inspect the error log by hand to identify issues when users got the dreaded 500 Internal Server Error.

When we deployed Scripts on Fedora 20, I decided to dig into the systemd journal. Since Scripts uses suexec for privilege separation, I was hopeful I could configure the webapps to log to the Journal and rely on it to record the User ID. Initially, I ran into the same problem that caused me to think per-user journals were a broken feature: I couldn't cause anything to be logged. That turned out to be a configuration problem—journald defaulted to SplitMode=login—I just couldn't see the entries that I was logging as a normal user. Changing journald.conf to include SplitMode=uid fixed that issue.

I then modified suexec to redirect stderr, ignoring the Apache-provided handle to the error_log, to instead use a file descriptor provided by the systemd APIs. The patches to suexec are available. Modifying suexec is generally frowned upon, as it's an extremely critical and easy-to-get-wrong piece of security software. The Scripts Team discussed at length what the security implications of this patch were before we were comfortable deploying it into production. The primary thing we were concerned with was that arbitrary code would be able to log to the journal. We made sure to open the connection to journald only after all privileges had been dropped and to disable level-prefix parsing to reduce the attack surface.

The end result of these changes was spectacular: for the first time in Scripts' 10 year history, we could reliably catch every byte printed to stderr by a user's application. A simple invocation of journalctl --user, or our legacy logview wrapper, presented the user with a timestamped, readable, searchable error log.

It's a lot better for the system administrators as well. Being able to reliably filter out an individual user's logs while handling a support request removes a substantial portion of the guesswork that the fuzzy grep used to involve.

The best part is that this is the only change we had to make to provide effective logging—no more language- or framework-specific hacks. journald natively files the messages in the correct location.

Change

The systemd journal definitely takes some getting used to. I wouldn't yet recommend running a system without either rsyslog or syslog-ng in addition to the journal. That said, systemd is coming to a system near you: Fedora and Arch already ship systemd, and Debian has selected systemd as the default init system for Debian 8.0 (jessie). I think Debian made the right choice, and I look forward to having a unified, powerful init system across all of the distributions.

September 07, 2014

This year at ICFP, we had some blockbuster attendance at the Haskell Implementors' Workshop (at times, it was standing room only). I had the pleasure of presenting the work I had done over the summer on Backpack.

Michael Adams: Extending "Optimize your SYB" (lightning talk). In the original paper, they suggest speedups can be gained by aggressively evaluating expressions of type TypeRep, TyCon, Data and Typeable at compile time. Michael was wondering if there were any other types which should receive similar treatment. One audience member suggested Int (i.e. to get rid of boxing), but I don’t find that very convincing.

September 04, 2014

One of the major open problems for building a module system in Haskell
is the treatment of type classes, which I have discussed previously
on this blog. I've noted how the current mode of use of type classes in
Haskell assumes “global uniqueness”, which is inherently anti-modular;
breaking this assumption risks violating the encapsulation of many
existing data types.

As if we have a choice.

In fact, our hand is forced by the presence of open type families
in Haskell, which feature many properties similar to type classes,
but with the added property that global uniqueness is required for
type safety. We don't have a choice (unless we want type classes
with associated types to behave differently from type classes): we
have to figure out how to reconcile the inherent non-modularity of
type families with the Backpack module system.

In this blog post, I want to carefully lay out why open type families
are inherently unmodular and propose some solutions for managing this
unmodularity. If you know what the problem is, you can skip the first
two sections and go straight to the proposed solutions section.

Before we talk about open type family instances, it's first worth
emphasizing the (intuitive) fact that a signature of a module is supposed
to be able to hide information about its implementation. Here's a simple
example:

module A where
    x :: Int

module B where
    import A
    y = 0
    z = x + y

Here, A is a signature, while B is a module which imports the
signature. One of the points of a module system is that we should be
able to type check B with respect to A, without knowing anything
about what module we actually use as the implementation. Furthermore,
if this type checking succeeds, then for any implementation which
provides the interface of A, the combined program should also type
check. This should hold even if the implementation of A defines other
identifiers not mentioned in the signature:

module A where
    x = 1
    y = 2

If B had directly imported this implementation, the identifier y
would be ambiguous; but the signature filtered out the declarations so
that B only sees the identifiers in the signature.

With this in mind, let's now consider the analogous situation with
open type families. Assuming that we have some type family F defined
in the prelude, we have the same example:
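
(The example is missing from this excerpt; here is a reconstruction
consistent with the discussion below. The exact definitions in the
original may differ; F is assumed to be a one-parameter open type
family from the prelude.)

-- The signature A, which B is type checked against:
module A where
    x :: F Bool

-- An implementation of A, with an extra instance hidden by the signature:
module A where
    type instance F Bool = Int
    x = 42

-- Module B, which picks its own instance for F Bool:
module B where
    import A
    type instance F Bool = Int -> Bool
    y = x 0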

If we view this example with the glasses off, we might conclude that it
is a permissible implementation. After all, the implementation of A
provides an extra type instance, yes, but when this happened previously
with a (value-level) declaration, it was hidden by the signature.

But if we put our glasses on and look at the example as a whole, something
bad has happened: we're attempting to use the integer 42 as a function
from integers to booleans. The trouble is that F Bool has been
given different types in the module A and module B, and this is
unsound... like, segfault unsound. And if we think about it some
more, this should not be surprising: we already knew it was unsound to
have overlapping type families (and eagerly check for this), and
signature-style hiding is an easy way to allow overlap to sneak in.

The distressing conclusion: open type families are not modular.

So, what does this mean? Should we throw our hands up and give up
giving Haskell a new module system? Obviously, we’re not going
to go without a fight. Here are some ways to counter the problem.

The basic proposal: require all instances in the signature

The simplest and most straightforward way to solve the unsoundness is to
require that a signature mention all of the family instances that are
transitively exported by the module. So, in our previous example, the
implementation of A does not satisfy the signature because it has an
instance which is not mentioned in the signature, but would satisfy this
signature:

module A where
    type instance F Int
    type instance F Bool

While at first glance this might not seem too onerous, it's important to
note that this requirement is transitive. If A happens to import
another module Internal, which itself has its own type family
instances, those must be represented in the signature as well. (It's
easy to imagine this spinning out of control for type classes, where any
of the forty imports at the top of your file may be bringing in any
manner of type classes into scope.) There are two major user-visible
consequences:

Module imports are not an implementation detail—you need to replicate
this structure in the signature file, and

Adding instances is always a backwards-incompatible change (there
is no weakening).

Of course, as Richard pointed out to me, this is already the case for
Haskell programs (and you just hoped that adding that one extra instance
was "OK").

Despite its unfriendliness, this proposal serves as the basis for the
rest of the proposals, which you can conceptualize as trying to characterize,
“When can I avoid having to write all of the instances in my signature?”

Extension 1: The orphan restriction

While it is true that these two type instances are overlapping and
rightly rejected, they are not equally at fault:
in particular, the instance in module B is an orphan. An orphan
instance is an instance for type class/family F and data type T
(it just needs to occur anywhere on the left-hand side) which lives in a
module that defines neither. (A is not an orphan since the instance
lives in the same module as the definition of data type T).

What we might wonder is, “If we disallowed all orphan instances, could
this rule out the possibility of overlap?” The answer is, “Yes! (...with
some technicalities).” Here are the rules:

The signature must mention all of what we will call ragamuffin instances transitively
exported by implementations being considered. An instance of a
family F is a ragamuffin if it is not defined with the family
definition, or with the type constructor at the head in the first
parameter. (Or some specific parameter, decided on a per-family basis.)
All orphan instances are ragamuffins, but not all ragamuffins are
orphans.

A signature exporting a type family must mention all instances which
are defined in the same module as the definition of the type family.

It is strictly optional to mention non-ragamuffin instances in
a signature.

(Aside: I don't think this is the most flexible version of the rule
that is safe, but I do believe it is the most straightforward.)
The whole point of these rules is to make it impossible to write an
overlapping instance, while only requiring local checking when an
instance is being written. Why did we need to strengthen the orphan
condition into a ragamuffin condition to get this non-overlap? The
answer is that absence of orphans does not imply absence of overlap, as
this simple example shows:
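
(The example is not shown in this excerpt; here is a sketch consistent
with the description, taking F to be a two-parameter open family
defined elsewhere. The module and type names are hypothetical.)

module A where
    data A = A
    type instance F A b = Int   -- A heads the first parameter:
                                -- neither an orphan nor a ragamuffin

module B where
    import A
    data B = B
    type instance F a B = Bool  -- not an orphan (B is defined here), but a
                                -- ragamuffin: B is not in the first parameter

-- The two instances overlap at F A B.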

Here, the two instances of F are overlapping, but neither are
orphans (since their left-hand sides mention a data type which was
defined in the module.) However, the B instance is a ragamuffin
instance, because B is not mentioned in the first argument of
F. (Of course, it doesn't really matter if you check the first
argument or the second argument, as long as you're consistent.)

Another way to think about this rule is that open type family instances
are not standalone instances but rather metadata that is associated with
a type constructor when it is constructed. In this way,
non-ragamuffin type family instances are modular!

A major downside of this technique, however, is that it doesn't really
do anything for the legitimate uses of orphan instances in the Haskell
ecosystem: when third-parties defined both the type family (or type
class) and the data type, and you need the instance for your own purposes.

Extension 2: Orphan resolution

This proposal is based off of one that Edward Kmett has been floating
around, but which I've refined. The motivation is to give a better
story for offering the functionality of orphan instances without gunking
up the module system. The gist of the proposal is to allow the package
manager to selectively enable/disable orphan definitions; however, to
properly explain it, what I'd like to do first is describe a few situations
involving orphan type class instances. (The examples use type classes
rather than type families because the use-cases are more clear. If
you imagine that the type classes in question have associated types,
then the situation is the same as that for open type families.)

The story begins with a third-party library which defined a data type T
but did not provide an instance that you needed:
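
The code isn't included in this excerpt, but the setup might look like
the following sketch (the class involved is my choice, purely for
illustration).

-- The third-party library defines the type, but not the instance you need:
module Data.Foo where
    data T = T

-- So your application defines it as an orphan:
module MyApp where
    import Data.Foo
    instance Show T where
        show T = "T"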

Morally, we'd like to hide the orphan instance when the real instance
is available: there are two variations of MyApp which we want to
transparently switch between: one which defines the orphan instance,
and one which does not and uses the non-orphan instance defined in
Data.Foo. The choice depends on which foo was chosen,
a decision made by the package manager.

Let's mix things up a little. There is no reason the instance has to be
a non-orphan coming from Data.Foo. Another library might have
defined its own orphan instance:

It's a bit awful to get this to work with preprocessor macros, but
there are two ways we can manually resolve the overlap: we can erase the
orphan instance from MyOtherApp, or we can erase the orphan instance
from MyApp. A priori, there is no reason to prefer one or the
other. However, depending on which one is erased, Main may have to
be compiled differently (if the code in the instances is different).
Furthermore, we need to set up a new (instance-only) import between the
module that defines the instance and the module whose instance was erased.

There are a few takeaways from these examples. First, the most natural
way of resolving overlapping orphan instances is to simply “delete” the
overlapping instances; however, which instance to delete is a global
decision. Second, which overlapping orphan instances are enabled
affects compilation: you may need to add module dependencies to be able
to compile your modules. Thus, we might imagine that a solution allows
us to do both of these, without modifying source code.

Here is the game plan: as before, packages can define orphan instances.
However, the list of orphan instances a package defines is part of the
metadata of the package, and the instance itself may or may not be used
when we actually compile the package (or its dependencies). When we do
dependency resolution on a set of packages, we have to consider the set
of orphan instances being provided and only enable a set which is
non-overlapping, the so-called orphan resolution. Furthermore, we
need to add an extra dependency from packages whose instances were
disabled to the package which is the sole definer of an instance (this
might constrain which orphan instance we can actually pick as the
canonical instance).

The nice thing about this proposal is that it solves an already existing
pain point for type class users, namely defining an orphan type class
instance without breaking when upstream adds a proper instance. But you
might also think of it as a big hack, and it requires cooperation from
the package manager (or some other tool which manages the orphan resolution).

The extensions to the basic proposal are not mutually exclusive, but
it's an open question whether or not the complexity they incur is
worth the benefits they bring to existing uses of orphan instances.
And of course, there may be other ways of solving the problem
which I have not described here, but this smorgasbord seems to be the
most plausible at the moment.

At ICFP, I had an interesting conversation with Derek Dreyer, where he
mentioned that when open type families were originally going into GHC,
he had warned Simon that they were not going to be modular. With the
recent addition of closed type families, many of the major use-cases for
open type families stated in the original paper have been superseded.
However, even if open type families had never been added to Haskell, we
still might have needed to adopt these solutions: the global uniqueness
of instances is deeply ingrained in the Haskell community, and even if
in some cases we are lax about enforcing this constraint, it doesn't
mean we should actively encourage people to break it.

I have a parting remark for the ML community, as type classes make their
way in from Haskell: when you do get type classes in your language,
don’t make the same mistake as the Haskell community and start using
them to enforce invariants in APIs. This way leads to the global
uniqueness of instances, and the loss of modularity may be too steep a
price to pay.

Postscript. One natural thing to wonder is whether overlapping type family instances are OK if one of the instances “is not externally visible.” Of course, the devil is in the details: what do we mean by external visibility of type family instances of F?

For some definitions of visibility, we can find an equivalent, local transformation which has the same effect. For example, if we never use the instance at all, it is certainly OK to have overlap. In that case, it would also have been fine to delete the instance altogether. As another example, we could require that there are no (transitive) mentions of the type family F in the signature of the module. However, eliminating the mention of the type family requires knowing enough parameters and equations to reduce: in which case the type family could have been replaced with a local, closed type family.

One definition that definitely does not work is if F can be mentioned with some unspecified type variables. Here is a function which coerces an Int into a function:
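
(The function is not reproduced in this excerpt; the following is a sketch of the kind of definition being described, with hypothetical names. It assumes two modules give F Int conflicting definitions, say Int in one and Int -> Bool in this one, which is precisely the overlap GHC normally rejects.)

{-# LANGUAGE GADTs, TypeFamilies #-}

data G a where
    IsInt :: G Int

-- The signature only mentions F at an unspecified variable a...
coerce :: G a -> F a -> Int -> Bool
coerce IsInt f n = f n
-- ...but the match on IsInt refines a to Int, so the body uses this module's
-- instance F Int = Int -> Bool. A caller whose own F Int is Int could hand
-- us a plain integer, which would then be applied as a function.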

...the point being that, even if a signature doesn't directly mention the overlapping instance F Int, type refinement (usually by some GADT-like structure) can mean that an offending instance can be used internally.

August 26, 2014

So perhaps you've bought into modules and modularity and want to get to using Backpack straightaway. How can you do it? In this blog post, I want to give a tutorial-style taste of how to program Cabal in the Backpack style. None of these examples are executable, because only some of this system is in GHC HEAD--the rest is on branches awaiting code review or is complete vaporware. However, we've got a pretty good idea how the overall design and user experience should go, and so the purpose of this blog post is to communicate that idea. Comments and suggestions would be much appreciated; while the design here is theoretically well-founded, for obvious reasons, we don't have much on-the-ground programmer feedback yet.

A simple package in today's Cabal

To start, let's briefly review how Haskell modules and Cabal packages work today. Our running example will be the bytestring package, although I'll inline, simplify and omit definitions to enhance clarity.

Let's suppose that you are writing a library, and you want to use efficient, packed strings for some binary processing you are doing. Fortunately for you, the venerable Don Stewart has already written a bytestring package which implements this functionality for you. This package consists of a few modules: an implementation of strict ByteStrings...

These modules are packaged up into a package which is specified using a Cabal file (for now, we'll ignore the ability to define libraries/executables in the same Cabal file and assume everything is in a library):

It's worth noting a few things about this completely standard module setup:

It's not possible to switch Utils from using lazy ByteStrings to strict ByteStrings without literally editing the Utils module. And even if you do that, you can't have Utils depending on strict ByteString, and Utils depending on lazy ByteString, in the same program, without copying the entire module text. (This is not too surprising, since the code really is different.)

Nevertheless, there is some amount of indirection here: while Utils includes a specific ByteString module, it is unspecified which version of ByteString it will be. If (hypothetically) the bytestring library released a new version where lazy byte-strings were actually strict, the functionality of Utils would change accordingly when the user re-ran dependency resolution.

I used a qualified import to refer to identifiers in Data.ByteString.Lazy. This is a pretty common pattern when developing Haskell code: we think of B as an alias to the actual module. Textually, this is also helpful, because it means I only have to edit the import statement to change which ByteString I refer to.

Generalizing Utils with a signature

To generalize Utils with some Backpack magic, we need to create a signature for ByteString, which specifies what the interface of the module providing ByteStrings is. Here is one such signature, which is placed in the file Data/ByteString.hsig inside the utilities package:
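
(The signature file itself is not included in this excerpt; the following sketch shows roughly what it might contain. The operation names are my guesses at what Utils needs, apart from singleton and the Data.Word import, which are mentioned later in the post, and the exact .hsig syntax of the era may differ.)

-- Data/ByteString.hsig
module Data.ByteString where

import Data.Word (Word8)

data ByteString

empty     :: ByteString
singleton :: Word8 -> ByteString
append    :: ByteString -> ByteString -> ByteString
length    :: ByteString -> Int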

Notice that there have been three changes: (1) We've removed the direct dependency on the bytestring package, (2) we've added a new field indefinite, which indicates that this indefinite package has signatures and cannot be compiled until those signatures are filled in with implementations (this field is strictly redundant, but is useful for documentation purposes, as we will see later), and (3) we have a new field required-signatures which simply lists the names of the signature files (also known as holes) that we need filled in.

How do we actually use the utilities package, then? Let's suppose our goal is to produce a new module, Utils.Strict, which is Utils but using strict ByteStrings (which is exported by the bytestring package under the module name Data.ByteString). To do this, we'll need to create a new package:

That's it! strict-utilities exports a single module Utils.Strict which is utilities using Data.ByteString from bytestring (which is the strict implementation). This is called a mix-in: in the same dependency list, we simply mix together:

utilities, which requires a module named Data.ByteString, and

bytestring, which supplies a module named Data.ByteString.

Cabal automatically figures out how to instantiate the utilities package by matching together module names. Specifically, the two packages above are connected through the module name Data.ByteString. This makes for a very convenient (and as it turns out, expressive) mode of package instantiation. By the way, reexported-modules is a new (orthogonal) feature which lets us reexport a module from the current package or a dependency to the outside world under a different name. The modules that are exported by the package are the exposed-modules and the reexported-modules. The reason we distinguish them is to make clear which modules have source code in the package (exposed-modules).

Unusually, strict-utilities is a package that contains no code! Its sole purpose is to mix existing packages.

Now, you might be wondering: how do we instantiate utilities with the lazy ByteString implementation? That implementation was put in Data.ByteString.Lazy, so the names don't match up. In this case, we can use another new feature, module thinning and renaming:

The utilities dependency is business as usual, but bytestring has a little parenthesized expression next to it. This expression is the thinning and renaming applied to the package import: it controls what modules are brought into the scope of the current package from a dependency, possibly renaming them to different names. When I write build-depends: bytestring (Data.ByteString.Lazy as Data.ByteString), I am saying "I depend on the bytestring package, but please only make the Data.ByteString.Lazy module available under the name Data.ByteString when considering module imports, and ignore all the other exposed modules." In strict-utilities, you could have also written bytestring (Data.ByteString), because this is the only module that utilities uses from bytestring.

Instead of renaming the implementation, I renamed the hole! It's equivalent: the thing that matters is that the signature and implementation need to be mixed under the same name in order for linking (the instantiation of the signature with the implementation) to occur.

There are a few things to note about signature usage:

If you are using a signature, there's not much point in also specifying an explicit import list when you import it: you are guaranteed to only see types and definitions that are in the signature (modulo type classes... a topic for another day). Signature files act like a type-safe import list which you can share across modules.

A signature can, and indeed often must, import other modules. In the type signature for singleton in Data/ByteString.hsig, we needed to refer to a type Word8, so we must bring it into scope by importing Data.Word.

Now, when we compile the signature in the utilities package, we need to know where Data.Word came from. It could have come from another signature, but in this case, it's provided by the definite package base: it's a proper concrete module with an implementation! Signatures can depend on implementations: since we can only refer to types from those modules, we are saying, in effect: any implementation of the singleton function and any representation of the ByteString type is acceptable, but regarding Word8 you must use the specific type from Data.Word in prelude.

What happens if, independently of my package strict-utilities, someone else also instantiates utilities with Data.ByteString? Backpack is clever enough to reuse the instantiation of utilities: this property is called applicativity of the module system. The specific rule that we use to decide if the instantiation is the same is to look at how all of the holes needed by a package are instantiated, and if they are instantiated with precisely the same modules, the instantiated packages are considered type equal. So there is no need to actually create strict-utilities or lazy-utilities: you can just instantiate utilities on the fly.

Sharing signatures

It's all very nice to be able to explicitly write a signature for Data.ByteString in my package, but this could get old if I have to do this for every single package I depend on. It would be much nicer if I could just put all my signatures in a package and include that when I want to share it. I want all of the Hackage mechanisms to apply to my signatures as well as my normal packages (e.g. versioning). Well, you can!

The author of bytestring can write a bytestring-sig package which contains only signatures:

The implements field is purely advisory: it offers a proactive check to library authors to make sure they aren't breaking compatibility with signatures, and it also helps Cabal offer suggestions for how to provide implementations for signatures.

Now, utilities can include this package to indicate its dependence on the signature:

Utils/Extra.hs defined in this package can import Utils (because it's exposed by utilities) but can't import Data.ByteString (because it's not exposed). Had we said reexported-modules: Data.ByteString in utilities, then Data.ByteString would have been accessible.

Do note, however, that the package is still indefinite (since it depends on an indefinite package). Despite Data.ByteString being "private" to utilities (not importable), a client may still refer to it in a renaming clause in order to instantiate the module:

Summary

We've covered a lot of ground, but when it comes down to it, Backpack really comes together because of a set of orthogonal features which interact in a good way:

Module signatures (mostly implemented but needs lots of testing): the heart of a module system, giving us the ability to write indefinite packages and mix together implementations,

Module reexports (fully implemented and in HEAD): the ability to take locally available modules and reexport them under a different name, and

Module thinning and renaming (fully implemented and in code review): the ability to selectively make available modules from a dependency.

To compile a Backpack package, we first run the traditional version dependency solving, getting exact versions for all packages involved, and then we calculate how to link the packages together. That's it! In a future blog post, I plan to more comprehensively describe the semantics of these new features, especially module signatures, which can be subtle at times. Also, note that I've said nothing about how to type-check against just a signature, without having any implementation in mind. As of right now, this functionality is vaporware; in a future blog post, I also plan on saying why this is so challenging.

On Wednesday, August 27th, 2014, all of the scripts.mit.edu servers will be upgraded from Fedora 17 to Fedora 20, which was released on December 17. We strongly encourage you to test your website as soon as possible, and to contact us at scripts@mit.edu or come to our office in W20-557 if you experience any problems. The easiest way to test your site is to run the following commands at an Athena workstation and then visit your website in the browser that opens, but see this page for more details and important information:

August 21, 2014

Why are there so many goddamn package managers? They sprawl across operating systems (apt, yum, pacman, Homebrew) as well as programming languages (Bundler, Cabal, Composer, CPAN, CRAN, CTAN, EasyInstall, Go Get, Maven, npm, NuGet, OPAM, PEAR, pip, RubyGems, etc etc etc). "It is a truth universally acknowledged that a programming language must be in want of a package manager." What is the fatal attraction of package management that makes programming language after programming language jump off this cliff? Why can't we just, you know, reuse an existing package manager?

You can probably think of a few reasons why trying to use apt to manage your Ruby gems would end in tears. "System and language package managers are completely different! Distributions are vetted, but that's completely unreasonable for most libraries tossed up on GitHub. Distributions move too slowly. Every programming language is different. The different communities don't talk to each other. Distributions install packages globally. I want control over what libraries are used." These reasons are all right, but they are missing the essence of the problem.

The fundamental problem is that programming language package management is decentralized.

This decentralization starts with the central premise of a package manager: that is, to install software and libraries that would otherwise not be locally available. Even with an idealized, centralized distribution curating the packages, there are still two parties involved: the distribution and the programmer who is building applications locally on top of these libraries. In real life, however, the library ecosystem is further fragmented, composed of packages provided by a huge variety of developers. Sure, the packages may all be uploaded and indexed in one place, but that doesn't mean that any given author knows about any other given package. And then there's what the Perl world calls DarkPAN: the uncountable lines of code which probably exist, but which we have no insight into because they are locked away on proprietary servers and source code repositories. Decentralization can only be avoided when you control absolutely all of the lines of code in your application... but in that case, you hardly need a package manager, do you? (By the way, my industry friends tell me this is basically mandatory for software projects beyond a certain size, like the Windows operating system or the Google Chrome browser.)

Decentralized systems are hard. Really, really hard. Unless you design your package manager accordingly, your developers will fall into dependency hell. Nor is there a one "right" way to solve this problem: I can identify at least three distinct approaches to the problem among the emerging generation of package managers, each of which has their benefits and downsides.

Pinned versions. Perhaps the most popular school of thought is that developers should aggressively pin package versions; this approach is advocated by Ruby's Bundler, PHP's Composer, Python's virtualenv and pip, and generally any package manager which describes itself as inspired by the Ruby/node.js communities (e.g. Java's Gradle, Rust's Cargo). Reproducibility of builds is king: these package managers solve the decentralization problem by simply pretending the ecosystem doesn't exist once you have pinned the versions. The primary benefit of this approach is that you are always in control of the code you are running. Of course, the downside of this approach is that you are always in control of the code you are running. An all-too-common occurrence is for dependencies to be pinned, and then forgotten about, even if there are important security updates to the libraries involved. Keeping bundled dependencies up-to-date requires developer cycles--cycles that more often than not are spent on other things (like new features).

A stable distribution. If bundling requires every individual application developer to spend effort keeping dependencies up-to-date and testing if they keep working with their application, we might wonder if there is a way to centralize this effort. This leads to the second school of thought: to centralize the package repository, creating a blessed distribution of packages which are known to play well together, and which will receive bug fixes and security fixes while maintaining backwards compatibility. In programming languages, this is much less common: the two I am aware of are Anaconda for Python and Stackage for Haskell. But if we look closely, this model is exactly the same as the model of most operating system distributions. As a system administrator, I often recommend my users use libraries that are provided by the operating system as much as possible. They won't take backwards incompatible changes until we do a release upgrade, and at the same time you'll still get bugfixes and security updates for your code. (You won't get the new hotness, but that's essentially contradictory with stability!)

Embracing decentralization. Up until now, both of these approaches have thrown out decentralization, requiring a central authority, either the application developer or the distribution manager, for updates. Is this throwing out the baby with the bathwater? The primary downside of centralization is the huge amount of work it takes to maintain a stable distribution or keep an individual application up-to-date. Furthermore, one might not expect the entirety of the universe to be compatible with one another, but this doesn't stop subsets of packages from being useful together. An ideal decentralized ecosystem distributes the problem of identifying what subsets of packages work across everyone participating in the system. Which brings us to the fundamental, unanswered question of programming languages package management:

How can we create a decentralized package ecosystem that works?

Here are a few things that can help:

Stronger encapsulation for dependencies. One of the reasons why dependency hell is so insidious is that the dependencies of a package are often an inextricable part of its outward-facing API: thus, the choice of a dependency is not a local choice, but rather a global choice which affects the entire application. Of course, if a library uses some library internally, but this choice is entirely an implementation detail, this shouldn't result in any sort of global constraint. Node.js's NPM takes this choice to its logical extreme: by default, it doesn't deduplicate dependencies at all, giving each library its own copy of each of its dependencies. While I'm a little dubious about duplicating everything (it certainly occurs in the Java/Maven ecosystem), I certainly agree that keeping dependency constraints local improves composability.

Advancing semantic versioning. In a decentralized system, it's especially important that library writers give accurate information, so that tools and users can make informed decisions. Wishful, invented version ranges and artistic version number bumps simply exacerbate an already hard problem (as I mentioned in my previous post). If you can enforce semantic versioning, or better yet, ditch semantic versions and record the true, type-level dependency on interfaces, our tools can make better choices. The gold standard of information in a decentralized system is, "Is package A compatible with package B", and this information is often difficult (or impossible, for dynamically typed systems) to calculate.

Centralization as a special-case. The point of a decentralized system is that every participant can make policy choices which are appropriate for them. This includes maintaining their own central authority, or deferring to someone else's central authority: centralization is a special-case. If we suspect users are going to attempt to create their own, operating system style stable distributions, we need to give them the tools to do so... and make them easy to use!

For a long time, the source control management ecosystem was completely focused on centralized systems. Distributed version control systems such as Git fundamentally changed the landscape: although Git may be more difficult to use than Subversion for a non-technical user, the benefits of decentralization are diverse. The Git of package management doesn't exist yet: if someone tells you that package management is solved ("just reimplement Bundler"), I entreat you: think about decentralization as well!

August 09, 2014

This summer, I've been working at Microsoft Research implementing Backpack, a module system for Haskell. Interestingly, Backpack is not really a single monolithic feature, but, rather, an agglomeration of small, infrastructural changes which combine together in an interesting way. In this series of blog posts, I want to talk about what these individual features are, as well as how the whole is greater than the sum of the parts.

But first, there's an important question that I need to answer: What's a module system good for anyway? Why should you, an average Haskell programmer, care about such nebulous things as module systems and modularity? At the end of the day, you want your tools to solve specific problems you have, and it is sometimes difficult to understand what problem a module system like Backpack solves. As tomejaguar puts it: "Can someone explain clearly the precise problem that Backpack addresses? I've read the paper and I know the problem is 'modularity' but I fear I am lacking the imagination to really grasp what the issue is."

Look no further. In this blog post, I want to talk concretely about problems Haskellers have today, explain what the underlying causes of these problems are, and say why a module system could help you out.

The String, Text, ByteString problem

As experienced Haskellers are well aware, there are a multitude of string types in Haskell: String, ByteString (both lazy and strict), Text (also both lazy and strict). To make matters worse, there is no one "correct" choice of a string type: different types are appropriate in different cases. String is convenient and native to Haskell'98, but very slow; ByteString is fast, but is simply an array of bytes; Text is slower but Unicode aware.

In an ideal world, a programmer might choose the string representation most appropriate for their application, and write all their code accordingly. However, this is little solace for library writers, who don't know what string type their users are using! What's a library writer to do? There are only a few choices:

They "commit" to one particular string representation, leaving users to manually convert from one representation to another when there is a mismatch. Or, more likely, the library writer used the default because it was easy. Examples: base (uses Strings because it completely predates the other representations), diagrams (uses Strings because it doesn't really do heavy string manipulation).

They can provide separate functions for each variant, perhaps identically named but placed in separate modules. This pattern is frequently employed to support both the strict and lazy variants of Text and ByteString. Examples: aeson (providing decode/decodeStrict for lazy/strict ByteString), attoparsec (providing Data.Attoparsec.ByteString/Data.Attoparsec.ByteString.Lazy), lens (providing Data.ByteString.Lazy.Lens/Data.ByteString.Strict.Lens).

They can use type-classes to overload functions to work with multiple representations. The particular type class used hugely varies: there is ListLike, which is used by a handful of packages, but a large portion of packages simply roll their own. Examples: SqlValue in HDBC, an internal StringLike in tagsoup, and yet another internal StringLike in web-encodings.
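As a rough illustration of approach (3), a minimal hand-rolled class might look like the following (a sketch only, not any particular library's actual class):

{-# LANGUAGE FlexibleInstances #-}
import qualified Data.Text as T
import qualified Data.ByteString.Char8 as B

-- A minimal, hand-rolled string class; real libraries usually add many more
-- methods, which is exactly the bloat problem discussed below.
class StringLike s where
  toString   :: s -> String
  fromString :: String -> s

instance StringLike String where
  toString   = id
  fromString = id

instance StringLike T.Text where
  toString   = T.unpack
  fromString = T.pack

instance StringLike B.ByteString where
  toString   = B.unpack
  fromString = B.pack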

The last two methods have different trade offs. Defining separate functions as in (2) is a straightforward and easy to understand approach, but you are still saying no to modularity: the ability to support multiple string representations. Despite providing implementations for each representation, the user still has to commit to particular representation when they do an import. If they want to change their string representation, they have to go through all of their modules and rename their imports; and if they want to support multiple representations, they'll still have to write separate modules for each of them.

Using type classes (3) to regain modularity may seem like an attractive approach. But this approach has both practical and theoretical problems. First and foremost, how do you choose which methods go into the type class? Ideally, you'd pick a minimal set, from which all other operations could be derived. However, many operations are most efficient when directly implemented, which leads to a bloated type class, and a rough time for other people who have their own string types and need to write their own instances. Second, type classes make your type signatures uglier (from String -> String to StringLike s => s -> s) and can make type inference more difficult (for example, by introducing ambiguity). Finally, the type class StringLike has a very different character from the type class Monad, which has a minimal set of operations and laws governing their operation. It is difficult (or impossible) to characterize what the laws of an interface like this should be. All-in-all, it's much less pleasant to program against type classes than concrete implementations.

Wouldn't it be nice if I could import String, giving me the type String and operations on it, but then later decide which concrete implementation I want to instantiate it with? This is something a module system can do for you! This Reddit thread describes a number of other situations where an ML-style module would come in handy.

(PS: Why can't you just write a pile of preprocessor macros to swap in the implementation you want? The answer is, "Yes, you can; but how are you going to type check the thing, without trying it against every single implementation?")
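To make the idea concrete, here is a rough sketch of what such a module signature could look like (the syntax follows GHC's eventual hsig implementation of Backpack, and the names are illustrative):

-- Str.hsig: a signature, not an implementation.  Any module that provides
-- a type Str and these operations can be used to instantiate it later.
signature Str where

data Str

empty    :: Str
append   :: Str -> Str -> Str
toString :: Str -> String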

Destructive package reinstalls

Have you ever gotten this error message when attempting to install a
new package?

$ cabal install hakyll
cabal: The following packages are likely to be broken by the reinstalls:
pandoc-1.9.4.5
Graphalyze-0.14.0.0
Use --force-reinstalls if you want to install anyway.

Somehow, Cabal has concluded that the only way to install hakyll is to reinstall some dependency. Here's one way a situation like this could come about:

pandoc and Graphalyze are compiled against the latest unordered-containers-0.2.5.0, which itself was compiled against the latest hashable-1.2.2.0.

hakyll also has a dependency on unordered-containers and hashable, but it has an upper bound restriction on hashable which excludes the latest hashable version. Cabal decides we need to install an old version of hashable, say hashable-0.1.4.5.

If hashable-0.1.4.5 is installed, we also need to build unordered-containers against this older version for Hakyll to see consistent types. However, the resulting version is the same as the preexisting version: thus, reinstall!

The root cause of this error is an invariant Cabal
currently enforces on a package database: there can only be one instance
of a package for any given package name and version. In particular,
this means that it is not possible to install a package multiple times,
compiled against different dependencies. This is a bit troublesome,
because sometimes you really do want the same package installed multiple
times with different dependencies: as seen above, it may be the only way to fulfill
the version bounds of all packages involved.
Currently, the only way to work around this problem is to use a Cabal sandbox (or blow away your package database and reinstall everything, which is basically the same thing).

You might be wondering, however, how could a module system possibly help with this? It doesn't... at least, not directly. Rather, nondestructive reinstalls of a package are a critical feature for implementing a module system like Backpack (a package may be installed multiple times with different concrete implementations of modules). Implementing Backpack necessitates fixing this problem, moving Haskell's package management a lot closer to that of Nix or NPM.

Version bounds and the neglected PVP

While we're on the subject of cabal-install giving errors, have you ever gotten this error attempting to install a new package?

There are a number of possible reasons why this could occur, but usually
it's because some of the packages involved have over-constrained version
bounds (especially upper bounds), resulting in an unsatisfiable set of constraints. To
add insult to injury, often these bounds have no grounding in reality (the package author
simply guessed the range) and removing
it would result in a working compilation. This situation is
so common that Cabal has a flag --allow-newer which lets you
override the upper bounds of packages. The annoyance of managing bounds has led to the development of tools like cabal-bounds, which try to make it less tedious to keep upper bounds up-to-date.

But as much as we like to rag on them, version bounds have a very important function: they prevent you from attempting to compile packages against dependencies which don't work at all! An under-constrained set of version bounds can easily result in compiling against a version of a dependency which doesn't type check.

How can a module system help? At the end of the day, version numbers are trying to capture something about the API exported by a package, described by the package versioning policy. But the current state-of-the-art requires a user to manually translate changes to the API into version numbers: an error prone process, even when assisted by various tools. A module system, on the other hand, turns the API into a first-class entity understood by the compiler itself: a module signature. Wouldn't it be great if packages depended upon signatures rather than versions: then you would never have to worry about version numbers being inaccurate with respect to type checking. (Of course, versions would still be useful for recording changes to semantics not seen in the types, but their role here would be secondary in importance.) Some full disclosure is warranted here: I am not going to have this implemented by the end of my internship, but I'm hoping to make some good infrastructural contributions toward it.

Conclusion

If you skimmed the introduction to the Backpack paper, you might have come away with the impression that Backpack is something about random number generators, recursive linking and applicative semantics. While these are all true "facts" about Backpack, they understate the impact a good module system can have on the day-to-day problems of a working programmer. In this post, I hope I've elucidated some of these problems, even if I haven't convinced you that a module system like Backpack actually goes about solving these problems: that's for the next series of posts. Stay tuned!

July 26, 2014

Hello loyal readers: Inside 206-105 has a new theme! I’m retiring Manifest, which was a pretty nice theme, but (1) the text size was too small and (2) I decided I didn’t really like the fonts. I’ve reskinned my blog with a theme based on Brent Jackson’s Ashley, ported to work on WordPress. I hope you like it, and please report any rendering snafus you might notice on older pages. Thanks!

July 11, 2014

Today, I'd like to talk about some of the core design principles behind type classes, a wildly successful feature in Haskell. The discussion here is closely motivated by the work we are doing at MSRC to support type classes in Backpack. While I was doing background reading, I was flummoxed to discover widespread misuse of the terms "confluence" and "coherence" with respect to type classes. So in this blog post, I want to settle the distinction, and propose a new term, "global uniqueness of instances", for the property that people have been colloquially referring to as confluence and coherence.

Let's start with the definitions of the two terms. Confluence is a property that comes from term-rewriting: a set of instances is confluent if, no matter what order
constraint solving is performed, GHC will terminate with a canonical set
of constraints that must be satisfied for any given use of a type class.
In other words, confluence says that we won't conclude that a program
doesn't type check just because we swapped in a different constraint
solving algorithm.

Confluence's closely related twin is coherence (defined in the paper "Type
classes: exploring the design space"). This property states that
every different valid typing derivation of a program leads to a
resulting program that has the same dynamic semantics. Why could
differing typing derivations result in different dynamic semantics? The
answer is that context reduction, which picks out type class instances,
elaborates into concrete choices of dictionaries in the generated code.
Confluence is a prerequisite for coherence, since one
can hardly talk about the dynamic semantics of a program that doesn't
type check.

So, what is it that people often refer to when they compare Scala type classes to Haskell type classes? I am going to refer to this as global uniqueness of instances, defined as follows: in a fully compiled program, for any type, there is at most one
instance resolution for a given type class. Languages with local type
class instances such as Scala generally do not have this property, and
this assumption is a very convenient one when building abstractions like sets.

So, what properties does GHC enforce, in practice?
In the absence of any type system extensions, GHC employs a set of
rules to ensure that type
class resolution is confluent and coherent. Intuitively, it achieves
this by having a very simple constraint solving algorithm (generate
wanted constraints and solve wanted constraints) and then requiring the
set of instances to be nonoverlapping, ensuring there is only
ever one way to solve a wanted constraint. Overlap is a
more stringent restriction than either confluence or coherence, and
via the OverlappingInstances and IncoherentInstances extensions, GHC
allows a user to relax this restriction "if they know what they're doing."

Surprisingly, however, GHC does not enforce global uniqueness of
instances. Imported instances are not checked for overlap until we
attempt to use them for instance resolution. Consider the following program:
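(A hedged reconstruction of the shape of such a program, with illustrative module and instance bodies; only the import structure matters:)

-- T.hs
module T where
data T = T

-- A.hs
module A where
import T
instance Eq T where
  _ == _ = True

-- B.hs
module B where
import T
instance Eq T where
  _ == _ = False

-- C.hs
module C where
import T
import A
import B
-- Nothing in this module uses Eq T, so it compiles; uncommenting the next
-- line forces instance resolution, and only then is the overlap reported:
-- oops = T == T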

When compiled with one-shot compilation, C will not report
overlapping instances unless we actually attempt to use the Eq
instance in C. This is by
design:
ensuring that there are no overlapping instances eagerly requires
eagerly reading all the interface files a module may depend on.

We might summarize these three properties in the following manner.
Culturally, the Haskell community expects global uniqueness of instances
to hold: the implicit global database of instances should be
confluent and coherent. GHC, however, does not enforce uniqueness of
instances: instead, it merely guarantees that the subset of the
instance database it uses when it compiles any given module is confluent and coherent. GHC does do some
tests when an instance is declared to see if it would result in overlap
with visible instances, but the check is by no means
perfect;
truly, type-class constraint resolution has the final word. One
mitigating factor is that in the absence of orphan instances, GHC is
guaranteed to eagerly notice when the instance database has overlap (assuming that the instance declaration checks actually worked...)

Clearly, the fact that GHC's lazy behavior is surprising to most
Haskellers means that the lazy check is mostly good enough: a user
is likely to discover overlapping instances one way or another.
However, it is relatively simple to construct example programs which
violate global uniqueness of instances in an observable way:
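For instance (a hedged reconstruction using the names ins, ins' and D referred to below; the data type and instance bodies are illustrative):

-- A.hs
module A where
data U = X | Y deriving (Eq, Show)

-- B.hs
module B where
import Data.Set (Set, insert)
import A
instance Ord U where
  compare X X = EQ
  compare X Y = LT
  compare Y X = GT
  compare Y Y = EQ
ins :: U -> Set U -> Set U
ins = insert

-- C.hs
module C where
import Data.Set (Set, insert)
import A
instance Ord U where   -- the opposite ordering from B's instance
  compare X X = EQ
  compare X Y = GT
  compare Y X = LT
  compare Y Y = EQ
ins' :: U -> Set U -> Set U
ins' = insert

-- D.hs
module D where
import Data.Set (empty, toList)
import A
import B
import C
-- ins and ins' already carry their Ord U dictionaries, so no resolution
-- happens here, the overlap is never reported, and the Set below is built
-- using two different orderings.
test :: [U]
test = toList (ins' X (ins X (ins Y empty)))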

Locally, all type class resolution was coherent: in the subset of
instances each module had visible, type class resolution could be done
unambiguously. Furthermore, the types of ins and ins'
discharge the type class constraints, so that in D, where the database
is now overlapping, no resolution occurs and the error is never found.

It is easy to dismiss this example as an implementation wart in GHC, and
continue pretending that global uniqueness of instances holds. However,
the problem with global uniqueness of instances is that it is
inherently nonmodular: you might find yourself unable to compose two
components because they accidentally defined the same type class
instance, even though these instances are plumbed deep in the
implementation details of the components. This is a big problem for Backpack, or really any module system, whose mantra of separate modular development seeks to guarantee that linking will succeed if the library writer and the application writer develop to a common signature.

May 18, 2014

tl;dr The scope of backtracking try should be minimized, usually by placing it inside the definition of a parser.

Have you ever written a Parsec parser and gotten a really uninformative error message?

"test.txt" (line 15, column 7):
unexpected 'A'
expecting end of input

The line and the column are randomly somewhere in your document, and you're pretty sure you should be in the middle of some stack of parser combinators. But wait! Parsec has somehow concluded that the document should be ending immediately. You noodle around and furthermore discover that the true error is some ways after the actually reported line and column.

You think, “No wonder Parsec gets such a bad rep about its error handling.”

Assuming that your grammar in question is not too weird, there is usually a simple explanation for an error message like this: the programmer sprinkled their code with too many backtracking try statements, and the backtracking has destroyed useful error state. In effect, at some point the parser failed for the reason we wanted to report to the user, but an enclosing try statement forced the parser to backtrack and try another (futile) possibility.

This can be illustrated by way of an example. A Haskeller is playing around with parser combinators and decides to test out their parsing skills by writing a parser for Haskell module imports:

stmt ::= import qualified A as B
| import A

Piggy-backing off of Parsec’s built in token combinators (and the sample code), their first version might look something like this:
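Something along these lines (a hedged reconstruction; the Stmt type and the keyword/identifier helpers below are simplified stand-ins rather than Parsec's actual token combinators):

import Text.Parsec
import Text.Parsec.String (Parser)

data Stmt = QualifiedImport String String | Import String
  deriving Show

keyword :: String -> Parser ()
keyword s = string s >> spaces

identifier :: Parser String
identifier = do { i <- many1 letter; spaces; return i }

pStmt :: Parser Stmt
pStmt = try pQualifiedImport <|> pImport

pQualifiedImport :: Parser Stmt
pQualifiedImport = do
  keyword "import"
  keyword "qualified"
  i <- identifier
  keyword "as"
  i' <- identifier
  return (QualifiedImport i i')

pImport :: Parser Stmt
pImport = do
  keyword "import"
  i <- identifier
  return (Import i)

parseStmt :: String -> Either ParseError Stmt
parseStmt = parse (pStmt <* eof) "test.txt"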

Wait a second! The error we wanted was that there was an unexpected identifier s, when we were expecting as. But instead of reporting an error when this occurred, Parsec instead backtracked, and attempted to match the pImport rule, only failing once that rule failed. By then, the knowledge that one of our choice branches failed had been forever lost.

How can we fix it? The problem is that our code backtracks when we, the developer, know it will be futile. In particular, once we have parsed import qualified, we know that the statement is, in fact, a qualified import, and we shouldn’t backtrack anymore. How can we get Parsec to understand this? Simple: reduce the scope of the try backtracking operator:
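A hedged sketch of the fix, reusing the helpers from the sketch above (try now covers only the import qualified prefix, so once that prefix parses we are committed to this branch):

pStmt :: Parser Stmt
pStmt = pQualifiedImport <|> pImport

pQualifiedImport :: Parser Stmt
pQualifiedImport = do
  try $ do
    keyword "import"
    keyword "qualified"
  i <- identifier
  keyword "as"
  i' <- identifier
  return (QualifiedImport i i')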

Here, we have moved the try from pStmt into pQualifiedImport, and we only backtrack if import qualified fails to parse. Once it parses, we consume those tokens and we are now committed to the choice of a qualified import. The error messages get correspondingly better:

The moral of the story: The scope of backtracking try should be minimized, usually by placing it inside the definition of a parser. Some amount of cleverness is required: you have to be able to identify how much lookahead is necessary to commit to a branch, which generally depends on how the parser is used. Fortunately, many languages are constructed specifically so that the necessary lookahead is not too large, and for the types of projects I might use Parsec for, I’d be happy to sacrifice this modularity.

Another way of looking at this fiasco is that Parsec is at fault: it shouldn’t offer an API that makes it so easy to mess up error messages—why can’t it automatically figure out what the necessary lookahead is? While a traditional parser generator can achieve this (and improve efficiency by avoiding backtracking altogether in our earlier example), there are some fundamental reasons why Parsec (and monadic parser combinator libraries like it) cannot automatically determine what the lookahead needs to be. This is one of the reasons (among many) why many Haskellers prefer faster parsers which simply don’t try to do any error handling at all.

Why, then, did I write this post in the first place? There is still a substantial amount of documentation recommending the use of Parsec, and a beginning Haskeller is more likely than not going to implement their first parser in Parsec. And if someone is going to write a Parsec parser, you might as well spend a little time to limit your backtracking: it can make working with Parsec parsers a lot more pleasant.

May 09, 2014

Brandon Simmon recently made a post to the glasgow-haskell-users mailing list asking the following question:

I've been looking into an issue in a library in which as more mutable arrays are allocated, GC dominates (I think I verified this?) and all code gets slower in proportion to the number of mutable arrays that are hanging around.

...to which I replied:

In the current GC design, mutable arrays of pointers are always placed on the mutable list. The mutable list of generations which are not being collected are always traversed; thus, the number of pointer arrays corresponds to a linear overhead for minor GCs.

If you’re coming from a traditional, imperative language, you might find this very surprising: if you paid linear overhead per GC in Java for all the mutable arrays in your system... well, you probably wouldn't use Java ever, for anything. But most Haskell users seem to get by fine; mostly because Haskell encourages immutability, making it rare for one to need lots of mutable pointer arrays.

Of course, when you do need it, it can be a bit painful. We have a GHC bug tracking the issue, and there is some low hanging fruit (a variant of mutable pointer arrays which has a more expensive write operation, but which only gets put on the mutable list when you write to it) as well as some promising directions for how to implement card-marking for the heap, which is the strategy that GCs like the JVM's use.

On a more meta-level, implementing a performant generational garbage collector for an immutable language is far, far easier than implementing one for a mutable language. This is my personal hypothesis for why Go doesn’t have a generational collector yet, and why GHC has such terrible behavior on certain classes of mutation.

Postscript. The title is a pun on the fact that “DIRTY” is used to describe mutable objects which have been written to since the last GC. These objects are part of the remembered set and must be traversed during garbage collection even if they are in an older generation.

May 08, 2014

Elimination rules play an important role in computations over datatypes in proof assistants like Coq. In his paper "Elimination with a Motive", Conor McBride argued that "we should exploit a hypothesis not in terms of its immediate consequences, but in terms of the leverage it exerts on an arbitrary goal: we should give elimination a motive." In other words, proofs in a refinement setting (backwards reasoning) should use their goals to guide elimination.

I recently had the opportunity to reread this historical paper, and in the process, I thought it would be nice to port the examples to Coq. Here is the result:

It's basically a short tutorial motivating John Major equality (also known as heterogeneous equality.) The linked text is essentially an annotated version of the first part of the paper—I reused most of the text, adding comments here and there as necessary. The source is also available at:

May 04, 2014

Weak pointers and finalizers are a very convenient feature for many types of programs. Weak pointers are useful for implementing memo tables and solving certain classes of memory leaks, while finalizers are useful for fitting "allocate/deallocate" memory models into a garbage-collected language. Of course, these features don’t come for free, and so one might wonder what the cost of utilizing these two (closely related) features is in GHC. In this blog post, I want to explain how weak pointers and finalizers are implemented in the GHC runtime system and characterize what extra overheads you incur by using them. This post assumes some basic knowledge about how the runtime system and copying garbage collection work.

The userland API

The API for weak pointers is in System.Mem.Weak; in its full generality, a weak pointer consists of a key and a value, with the property that if the key is alive, then the value is considered alive. (A "simple" weak reference is simply one where the key and value are the same.) A weak pointer can also optionally be associated with a finalizer, which is run when the object is garbage collected. Haskell finalizers are not guaranteed to run.
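A minimal usage sketch (illustrative, not from the original post), using an IORef as its own key:

import Data.IORef (newIORef, readIORef, mkWeakIORef)
import System.Mem.Weak (deRefWeak)
import System.Mem (performGC)

main :: IO ()
main = do
  ref <- newIORef (42 :: Int)
  -- The IORef is both key and value (a "simple" weak reference); the
  -- finalizer may run when the IORef is garbage collected, but is not
  -- guaranteed to.
  w <- mkWeakIORef ref (putStrLn "finalized")
  performGC
  mref <- deRefWeak w
  case mref of
    Just r  -> readIORef r >>= print
    Nothing -> putStrLn "already collected"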

Foreign pointers in Foreign.ForeignPtr also have the capability to attach a C finalizer; i.e. a function pointer that might get run during garbage collection. As it turns out, these finalizers are also implemented using weak pointers, but C finalizers are treated differently from Haskell finalizers.

The StgWeak heap object contains pointers to the key and value, as well as separate pointers for a single Haskell finalizer (just a normal closure) and for C finalizers (which have the type StgCFinalizerList). There is also a link field for linking weak pointers together. In fact, when the weak pointer is created, it is added to the nursery's list of weak pointers (aptly named weak_ptr_list). As of GHC 7.8, this list is global, so we do have to take out a global lock when a new weak pointer is allocated; however, the lock has been removed in HEAD.

Garbage collecting weak pointers

Pop quiz! When we do a (minor) garbage collection on weak pointers, which of the fields in StgWeak are considered pointers, and which fields are considered non-pointers? The correct answer is: only the first field is considered a “pointer”; the rest are treated as non-pointers by normal GC. This is actually what you would expect: if we handled the key and value fields as normal pointer fields during GC, then they wouldn’t be weak at all.

Once garbage collection has been completed (modulo all of the weak references), we then go through the weak pointer list and check if the keys are alive. If they are, then the values and finalizers should be considered alive, so we mark them as live, and head back and do more garbage collection. This process will continue as long as we keep discovering new weak pointers to process; however, this will only occur when the key and the value are different (if they are the same, then the key must have already been processed by the GC). Live weak pointers are removed from the "old" list and placed into the new list of live weak pointers, for the next time.

Once there are no more newly discovered live pointers, the list of dead pointers is collected together, and the finalizers are scheduled (scheduleFinalizers). C finalizers are run on the spot during GC, while Haskell finalizers are batched together into a list and then shunted off to a freshly created thread to be run.

That's it! There are some details for how to handle liveness of finalizers (which are heap objects too, so even if an object is dead we have to keep the finalizer alive for one more GC) and threads (a finalizer for a weak pointer can keep a thread alive).

Tallying up the costs

To summarize, here are the extra costs of a weak pointer:

Allocating a weak pointer requires taking a global lock (will be fixed in GHC 7.10) and costs six words (fairly hefty as far as Haskell heap objects tend to go.)

During each minor GC, processing weak pointers takes time linear to the size of the weak pointer lists for all of the generations being collected. Furthermore, this process involves traversing a linked list, so data locality will not be very good. This process may happen more than once, although once it is determined that a weak pointer is live, it is not processed again. The cost of redoing GC when a weak pointer is found to be live is simply the cost of synchronizing all parallel GC threads together.

The number of times you have to switch between GC'ing and processing weak pointers depends on the structure of the heap. Take a heap and add a special "weak link" from a key to its dependent weak value. Then we can classify objects by the minimum number of weak links we must traverse from a root to reach the object: call this the "weak distance". Supposing that a given weak pointer's weak distance is n, then we spend O(n) time processing that weak pointer during minor GC. The maximum weak distance constitutes how many times we need to redo the GC.

In short, weak pointers are reasonably cheap when they are not deeply nested: you only pay the cost of traversing a linked list of all of the pointers you have allocated once per garbage collection. In the pessimal case (a chain of weak links, where the value of each weak pointer was not considered reachable until we discovered its key is live in the previous iteration), we can spend quadratic time processing weak pointers.

April 01, 2014

Move aside, poker! While the probabilities of various poker hands are well understood and tabulated, the Chinese game of chance Mahjong [1] enjoys a far more intricate structure of expected values and probabilities. [2] This is due in large part to the much larger variety of tiles available (136 tiles, as opposed to the standard playing card deck size of 52), as well as the turn-by-turn game play, which means there is quite a lot of strategy involved with what is ostensibly a game of chance. In fact, the subject is so intricate, I’ve decided to write my PhD thesis on it. This blog post is a condensed version of one chapter of my thesis, considering the calculation of shanten, which we will define below. I’ll be using Japanese terms, since my favorite variant of mahjong is Riichi Mahjong; you can consult the Wikipedia article on the subject if you need to translate.

Calculating Shanten

The basic gameplay of Mahjong involves drawing a tile into a hand of thirteen tiles, and then discarding another tile. The goal is to form a hand of fourteen tiles (that is, after drawing, but before discarding a tile) which is a winning configuration. There are a number of different winning configurations, but most winning configurations share a similar pattern: the fourteen tiles must be grouped into four triples and a single pair. Triples are either three of the same tile, or three tiles in a sequence (there are three “suits” which can be used to form sequences); the pair is two of the same tiles. Here is an example:

One interesting quantity that is useful to calculate given a mahjong hand is the shanten number, that is, the number of tiles away from winning you are. This can be used to give you the most crude heuristic of how to play: discard tiles that get you closer to tenpai. The most widely known shanten calculator is this one on Tenhou’s website [3]; unfortunately, the source code for this calculator is not available. There is another StackOverflow question on the subject, but the “best” answer offers only a heuristic approach with no proof of correctness! Can we do better?

Naïvely, the shanten number can be computed with a breadth first search on the permutations of a hand. When a winning hand is found, the algorithm terminates and indicates the depth the search had gotten to. Such an algorithm is obviously correct; unfortunately, with 136 tiles, the number of hands one would have to traverse while searching for a winning hand that is n-shanten away grows as (choices of new tile times choices of discard) to the nth power. If you are four tiles away, you will have to traverse over six trillion hands. We can reduce this number by avoiding redundant work if we memoize the shanten associated with hands: however, the total number of possible hands is roughly 136 choose 13, or 59 bits. Though we can fit (via a combinatorial number system) a hand into a 64-bit integer, the resulting table is still far too large to hope to fit in memory.

The trick is to observe that the shanten calculation for each of the suits is symmetric; thus, we can dynamically program over a much smaller space of the tiles 1 through 9 for some generic suit, and then reuse these results when assembling the final calculation. The naïve encoding of this single-suit space is still rather large, so we can take advantage of the fact that, because there are four copies of each tile, an equivalent representation is a 9-vector of the numbers zero to four, with the constraint that the sum of these numbers is 13. Even without the constraint, the count is only two million, which is quite tractable. At a byte per entry, that’s 2MB of memory; less than your browser is using to view this webpage. (In fact, we want the constraint to actually be that the sum is less than or equal to 13, since not all hands are single-suited, so the number of tiles from any one suit may be smaller.)

The breadth-first search for solving a single suit proceeds as follows (a generic sketch in Haskell follows the list):

Initialize a table A indexed by tile configuration (a 9-vector of 0..4).

Initialize a todo queue Q of tile configurations.

Initialize all winning configurations in table A with shanten zero (this can be done by enumeration), recording these configurations in Q.

While the todo queue Q is not empty, pop the front element, mark the shanten of all adjacent uninitialized nodes as one greater than that node, and push those nodes onto the todo queue.
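Phrased generically in Haskell, the table-filling procedure above might look like this (a sketch; the mahjong-specific pieces, namely the enumeration of winning configurations and the adjacency relation, are taken as assumed inputs):

import qualified Data.Map.Strict as M
import qualified Data.Sequence as Seq
import Data.Sequence (ViewL(..), viewl, (|>))

-- A tile configuration for one suit: a 9-vector of counts in 0..4.
type Config = [Int]

-- 'winning' enumerates the shanten-zero configurations; 'neighbors' gives
-- the configurations reachable by adding or removing one tile.
shantenTable :: [Config] -> (Config -> [Config]) -> M.Map Config Int
shantenTable winning neighbors = go table0 queue0
  where
    table0 = M.fromList [ (c, 0) | c <- winning ]
    queue0 = Seq.fromList winning
    go table queue = case viewl queue of
      EmptyL    -> table
      c :< rest ->
        let d      = table M.! c
            new    = [ c' | c' <- neighbors c, not (M.member c' table) ]
            table' = foldr (\c' t -> M.insert c' (d + 1) t) table new
            queue' = foldl (|>) rest new
        in  go table' queue'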

With this information in hand, we can assemble the overall shanten of a hand. It suffices to try every distribution of the triples and the pair over the four types of tiles (also including null tiles), consulting the shanten of the requested shape, and return the minimum over all of these configurations. There are (by stars and bars) 140 such configurations in total. Computing the shanten of each configuration is a constant-time lookup into the table generated by the per-suit calculation. A true shanten calculator must also accommodate the rare other hands which do not follow this configuration, but those winning configurations are usually highly constrained, and quite easy to (separately) compute the shanten of.

With a shanten calculator, there are a number of other quantities which can be calculated. Uke-ire refers to the number of possible draws which can reduce the shanten of your hand: one strives for high uke-ire because it means a higher probability that you will draw a tile which moves your hand closer to winning. Given a hand, it's very easy to calculate its uke-ire: just look at all adjacent hands and count the number of hands which have lower shanten.

Further extensions

Suppose that you are trying to design an AI which can play Mahjong. Would the above shanten calculator provide a good evaluation metric for your hand? Not really: it has a major drawback, in that it does not consider the fact that some tiles are simply unavailable (they were discarded). For example, if all four “nine stick” tiles are visible on the table, then no hand configuration containing a nine stick is actually reachable. Adjusting for this situation is actually quite difficult, for two reasons: first, we can no longer precompute a shanten table, since we need to adjust at runtime what the reachability metric is; second, the various suits are no longer symmetric, so we have to do three times as much work. (We can avoid an exponential blowup, however, since there is no inter-suit interaction.)

Another downside of the shanten and uke-ire metrics is that they are not direct measures of “tile efficiency”: that is, they do not directly dictate a strategy for discards which minimizes the expected time before you get a winning hand. Consider, for example, a situation where you have the tiles 233, and only need to make another triple in order to win. You have two possible discards: you can discard a 2 or a 3. In both cases, your shanten is zero, but discarding a 2, you can only win by drawing a 3, whereas discarding a 3, you can win by drawing a 1 or a 4. Maximizing efficiency requires considering the lifetime uke-ire of your hands.

Even then, perfect tile efficiency is not enough to see victory: every winning hand is associated with a point-score, and so in many cases it may make sense to go for a lower-probability hand that has higher expected value. Our decomposition method completely falls apart here, as while the space of winning configurations can be partitioned, scoring has nonlocal effects, so the entire hand has to be considered as a whole. In such cases, one might try for a Monte Carlo approach, since the probability space is too difficult to directly characterize. However, in the Japanese Mahjong scoring system, there is yet another difficulty with this approach: the scoring system is exponential. Thus, we are in a situation where the majority of samples will be low scoring, but an exponentially small number of samples have exponential payoff. In such cases, it’s difficult to say if random sampling will actually give a good result, since it is likely to miscalculate the payoff, unless exponentially many samples are taken. (On the other hand, because these hands are so rare, an AI might do considerably well simply ignoring them.)

To summarize, Mahjong is a fascinating game, whose large state space makes it difficult to accurately characterize the probabilities involved. In my thesis, I attempt to tackle some of these questions; please check it out if you are interested in more.

[1] No, I am not talking about the travesty that is mahjong solitaire.

[2] To be clear, I am not saying that poker strategy is simple—betting strategy is probably one of the most interesting parts of the game—I am simply saying that the basic game is rather simple, from a probability perspective.

[3] Tenhou is a popular Japanese online mahjong client. The input format for the Tenhou calculator is 123m123p123s123z, where numbers before m indicate man tiles, p pin tiles, s sou tiles, and z honors (in order, they are: east, south, west, north, white, green, red). Each entry indicates which tile you can discard to move closer to tenpai; the next list is of uke-ire (and the number of tiles which move the hand further).

March 17, 2014

So you may have heard about this popular new programming language called Haskell. What's Haskell? Haskell is a non-dependently typed programming language, sporting general recursion, type inference and built-in side-effects. It is true that dependent types are considered an essential component of modern, expressive type systems. However, giving up dependence can result in certain benefits for other aspects of software engineering, and in this article, we'd like to talk about the omissions that Haskell makes to support these changes.

Syntax

There are a number of syntactic differences between Coq and Haskell, which we will point out as we proceed in this article. To start with, we note that in Coq, typing is denoted using a single colon (false : Bool); in Haskell, a double colon is used (False :: Bool). Additionally, Haskell has a syntactic restriction, where constructors must be capitalized, while variables must be lower-case.

Universes

In Coq, types are themselves classified by an infinite hierarchy of universes (Type (* 0 *), Type (* 1 *), and so on). Given this, it is tempting to draw an analogy between universes and Haskell’s kind of types * (pronounced “star”), which classifies types in the same way Type (* 0 *) classifies primitive types in Coq. Furthermore, the sort BOX classifies kinds (* :: BOX, although this sort is strictly internal and cannot be written in the source language). However, the resemblance here is only superficial: it is misleading to think of Haskell as a language with only two universes. The differences can be summarized as follows:

In Coq, universes are used purely as a sizing mechanism, to prevent the creation of types which are too big. In Haskell, types and kinds do double duty to enforce the phase distinction: if a has kind *, then x :: a is guaranteed to be a runtime value; likewise, if k has sort box, then a :: k is guaranteed to be a compile-time value. This structuring is a common pattern in traditional programming languages, although knowledgeable folks like Conor McBride think that ultimately this is a design error, since one doesn’t really need a kinding system to have type erasure.

In Coq, universes are cumulative: a term which has type Type (* 0 *) also has type Type (* 1 *). In Haskell, there is no cumulativity between types and kinds: if Nat is a type (i.e. has the type *), it is not automatically a kind. However, in some cases, partial cumulativity can be achieved using datatype promotion, which constructs a separate kind-level replica of a type, where the data constructors are now type-level constructors. Promotion is also capable of promoting type constructors to kind constructors.

In Coq, a common term language is used at all levels of universes. In Haskell, there are three distinct languages: a language for handling base terms (the runtime values), a language for handling type-level terms (e.g. types and type constructors) and a language for handling kind-level terms. In some cases this syntax is overloaded, but in later sections, we will often need to say how a construct is formulated separately at each level of the kinding system.

One further remark: Type in Coq is predicative; in Haskell, * is impredicative, following the tradition of System F and other languages in the lambda cube, where kinding systems of this style are easy to model.

Function types

In Coq, given two types A and B, we can construct the type A -> B denoting functions from A to B (for A and B of any universe). As in Coq, functions with multiple arguments are natively supported using currying. Haskell supports function types for both types (Int -> Int) and kinds (* -> *, often called type constructors) and application by juxtaposition (e.g. f x). (Function types are subsumed by pi types; however, we defer this discussion until later.) However, Haskell has some restrictions on how one may construct functions, and utilizes different syntax when handling types and kinds:

For expressions (with type a -> b where a, b :: *), both direct definitions and lambdas are supported. A direct definition is written in an equational style:

Definition f x := x + x.

f x = x + x

while a lambda is represented using a backslash:

fun x => x + x

\x -> x + x

For type families (with kind k1 -> k2, where k1 and k2 are kinds), the lambda syntax is not supported. In fact, no higher-order behavior is permitted at the type-level; while we can directly define appropriately kinded type functions, at the end of the day, these functions must be fully applied or they will be rejected by the type-checker. From an implementation perspective, the omission of type lambdas makes type inference and checking much easier.

Type synonyms:

Definition Endo A := A -> A.

type Endo a = a -> a

Type synonyms are judgmentally equal to their expansions. As mentioned in the introduction, they cannot be partially applied. They were originally intended as a limited syntactic mechanism for making type signatures more readable.
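For comparison, a closed type family (discussed next) is defined with a fixed block of equations; a small illustrative example (the equations here are mine, and require a GHC with closed type family support):

{-# LANGUAGE TypeFamilies #-}

type family F a where
  F Int  = Char
  F Char = Int
  F a    = Bool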

While closed type families look like the addition of typecase (and would violate parametricity in that case), this is not the case, as closed type families can only return types. In fact, closed type families correspond to a well-known design pattern in Coq, where one writes an inductive data type representing codes of types, and then writes an interpretation function which interprets the codes as actual types. As we have stated earlier, Haskell has no direct mechanism for defining functions on types, so this useful pattern had to be supported directly in the type families functionality. Once again, closed type families cannot be partially applied.

In fact, the closed type family functionality is a bit more expressive than an inductive code. In particular, closed type families support non-linear pattern matches (F a a = Int) and can sometimes reduce a term when no iota reductions are available, because some of the inputs are not known. The reason for this is because closed type families are “evaluated” using unification and constraint-solving, rather than ordinary term reduction as would be the case with codes in Coq. Indeed, nearly all of the “type level computation” one may perform in Haskell, is really just constraint solving. Closed type families are not available in a released version of GHC (yet), but there is a Haskell wiki page describing closed type families in more detail.

Open type (synonym) families:

(* Not directly supported in Coq *)

type family F a
type instance F Int = Char
type instance F Char = Int

Unlike closed type families, open type families operate under an open universe, and have no analogue in Coq. Open type families do not support nonlinear matching, and must completely unify to reduce. Additionally, there are a number of restrictions on the left-hand side and right-hand side of such families in order to maintain decidable type inference. The section of the GHC manual Type instance declarations expands on these limitations.

Both closed and open type families can be used to implement computation at the type level over data constructors which were lifted to the type level via promotion. Unfortunately, any such algorithm must be implemented twice: once at the expression level, and once at the type level. Use of metaprogramming can alleviate some of the boilerplate necessary; see, for example, the singletons library.

Dependent function types (Π-types)

A Π-type is a function type whose codomain type can vary depending on the element of the domain to which the function is applied. Haskell does not have Π-types in any meaningful sense. However, if you only want to use a Π-type for polymorphism, Haskell does have support. For polymorphism over types (e.g. with type forall a : k, a -> a, where k is a kind), Haskell has a twist:

Definition id : forall (A : Type), A -> A := fun A => fun x => x.

id :: a -> a
id = \x -> x

In particular, the standard notation in Haskell is to omit both the type-lambda (at the expression level) and the quantification (at the type level). The quantification at the type level can be recovered using the explicit universal quantification extension:

id :: forall a. a -> a

However, there is no way to directly explicitly state the type-lambda. When the quantification is not at the top-level, Haskell requires an explicit type signature with the quantification put in the right place. This requires the rank-2 (or rank-n, depending on the nesting) polymorphism extension:

Definition f : (forall A, A -> A) -> bool := fun g => g bool true.

f :: (forall a. a -> a) -> Bool
f g = g True

Polymorphism is also supported at the kind-level using the kind polymorphism extension. However, there is no explicit forall for kind variables; you must simply mention a kind variable in a kind signature.

Proper dependent types cannot be supported directly, but they can be simulated by first promoting data types from the expression level to the type-level. A runtime data-structure called a singleton is then used to refine the result of a runtime pattern-match into type information. This pattern of programming in Haskell is not standard, though there are recent academic papers describing how to employ it. One particularly good one is Hasochism: The Pleasure and Pain of Dependently Typed Haskell Programming, by Sam Lindley and Conor McBride.

Product types

Coq supports cartesian product over types, as well as a nullary product type called unit. Very similar constructs are also implemented in the Haskell standard library:

(true, false) : bool * bool
(True, False) :: (Bool, Bool)

tt : unit
() :: ()

Pairs can be destructed using pattern-matching:

match p with
| (x, y) => ...
end

case p of
(x, y) -> ...

Red-blooded type theorists may take issue with this identification: in particular, Haskell’s default pair type is what is considered a negative type, as it is lazy in its values. (See more on polarity.) As Coq’s pair is defined inductively, i.e. positively, a more accurate identification would be with a strict pair, defined as data SPair a b = SPair !a !b; i.e. upon construction, both arguments are evaluated. This distinction is difficult to see in Coq, since positive and negative pairs are logically equivalent, and Coq does not distinguish between them. (As a total language, it is indifferent to choice of evaluation strategy.) Furthermore, it's relatively common practice
to extract pairs into their lazy variants when doing code extraction.

Dependent pair types (Σ-types)

Dependent pair types are the generalization of product types to be dependent. As before, Σ-types cannot be directly expressed, except in the case where the first component is a type. In this case, there is an encoding trick utilizing data types which can be used to express so-called existential types:

Definition p : { A : Type & A -> bool } := existT (fun A : Type => A -> bool) bool negb.

data Ex = forall a. Ex (a -> Bool)
p = Ex not

As was the case with polymorphism, the type argument to the dependent pair is implicit. It can be specified explicitly by way of an appropriately placed type annotation.

Recursion

In Coq, all recursive functions must have a structurally decreasing argument, in order to ensure that all functions terminate. In Haskell, this restriction is lifted for the expression level; as a result, expression level functions may not terminate. At the type-level, by default, Haskell enforces that type level computation is decidable. However, this restriction can be lifted using the UndecidableInstances flag. It is generally believed that undecidable instances cannot be used to cause a violation of type safety, as nonterminating instances would simply cause the compiler to loop infinitely, and due to the fact that in Haskell, types cannot (directly) cause a change in runtime behavior.
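As a small illustration (my example, not the post's): with UndecidableInstances, GHC accepts type-family equations it cannot prove terminating, and actually reducing them may never produce a result.

{-# LANGUAGE TypeFamilies, UndecidableInstances #-}

type family Loop a
-- Rejected without UndecidableInstances (the right-hand side is no smaller
-- than the instance head); accepted with it.  Attempting to reduce Loop Int
-- then either loops or is cut off by the compiler's reduction limit.
type instance Loop a = Loop a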

Inductive types/Recursive types

In Coq, one has the capacity to define inductive data types. Haskell has a similar-looking mechanism for defining data types, but there are a number of important differences which lead many to avoid using the moniker inductive data types for Haskell data types (although it’s fairly common for Haskellers to use the term anyway.)

Basic types like boolean can be defined with ease in both languages (in all cases, we will use the GADT syntax for Haskell data-types, as it is closer in form to Coq’s syntax and strictly more powerful):

Inductive bool : Type :=
| true : bool
| false : bool.

data Bool :: * where
True :: Bool
False :: Bool

Both also support recursive occurrences of the type being defined:

Inductive nat : Type :=
| z : nat
| s : nat -> nat.

data Nat :: * where
Z :: Nat
S :: Nat -> Nat

One has to be careful though: our definition of Nat in Haskell admits one more term: infinity (an infinite chain of successors). This is similar to the situation with products, and stems from the fact that Haskell is lazy.

Haskell’s data types support parameters, but these parameters may only be types, and not values. (Though, recall that data types can be promoted to the type level). Thus, the standard type family of vectors may be defined, assuming an appropriate type-level nat (as usual, explicit forall has been omitted):
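
(* A sketch of how the definition might look, reusing the nat defined above: *)
Inductive vec (A : Type) : nat -> Type :=
| vnil : vec A z
| vcons : forall n, A -> vec A n -> vec A (s n).

-- The Haskell side, assuming the DataKinds, GADTs and KindSignatures
-- extensions and the Nat defined above:
data Vec :: * -> Nat -> * where
  VNil :: Vec a Z
  VCons :: a -> Vec a n -> Vec a (S n)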

As type-level lambda is not supported but partial application of data types is (in contrast to type families), the order of arguments in the type must be chosen with care. (One could define a type-level flip, but they would not be able to partially apply it.)

Haskell data type definitions do not have the strict positivity requirement, since we are not requiring termination; thus, peculiar data types that would not be allowed in Coq can be written:
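
-- A representative example (not necessarily the one from the original post):
-- the negative occurrence of the type being defined would run afoul of Coq's
-- strict positivity checker.
data NonStrictlyPositive :: * where
  Wrap :: (NonStrictlyPositive -> Bool) -> NonStrictlyPositive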

Inference

Coq has support for requesting that a term be inferred by the unification engine, either by placing an underscore in a context or by designating an argument as implicit (this is how one might implement, in Coq, the omission of type arguments of polymorphic functions as seen in Haskell). Generally, one cannot expect all inference problems in a dependently typed language to be solvable, and the inner-workings of Coq’s unification engines (plural!) are considered a black art (not to worry, as the trusted kernel will verify that the inferred arguments are well-typed).

Haskell as specified in Haskell'98 enjoys principal types and full type inference under Hindley-Milner. However, to recover many of the advanced features enjoyed by Coq, Haskell has added numerous extensions which cannot be easily accommodated by Hindley-Milner, including type-class constraints, multiparameter type classes, GADTs and type families. The current state-of-the-art is an algorithm called OutsideIn(X). With these features, there is no completeness guarantee. However, if the inference algorithm accepts a definition, then that definition has a principal type and that type is the type the algorithm found.

Conclusion

This article started as a joke over in OPLSS'13, where I found myself explaining some of the hairier aspects of Haskell’s type system to Jason Gross, who had internalized Coq before he had learned much Haskell. Its construction was iced for a while, but later I realized that I could pattern the post off of the first chapter of the homotopy type theory book. While I am not sure how useful this document will be for learning Haskell, I think it suggests a very interesting way of mentally organizing many of Haskell’s more intricate type-system features. Are proper dependent types simpler? Hell yes. But it’s also worth thinking about where Haskell goes further than most existing dependently typed languages...

Postscript

Bob Harper complained over Twitter that this post suggested misleading analogies in some situations. I've tried to address some of his comments, but in some cases I wasn't able to divine their full content. I invite readers to see if they can answer these questions:

Because of the phase distinction, Haskell’s type families are not actually type families, in the style of Coq, Nuprl or Agda. Why?

This post is confused about the distinction between elaboration (type inference) and semantics (type structure). Where is this confusion?

Quantification over kinds is not the same as quantification over types. Why?

March 04, 2014

We've all been there---a code base so horrible the only coherent thought is "This has to be re-written. Now. It's beyond saving." One of those days where you find yourself staring at your favorite text editor wishing you were instead reading The Daily WTF or that this is all just a dream. But alas, it's real. Unfortunately, rewriting this component is just the tip of the yak stack. Due to legacy "best practices", the component in question is so tightly integrated into the system it would take thousands of engineer-hours to extract it. Is it worth it?

Learning from Project Athena

MIT's Project Athena created a distributed campus-wide computing environment. The project was a phenomenal success, inventing Kerberos and X; both are now ubiquitous. Other parts of Athena continued to be available in the monolithic "Linux-Athena" distribution that ran on computers across campus. As laptops became the preferred form factor, the monolithic structure of Athena was broken up into conveniently-sized packages for your Debian/Ubuntu system as Debathena. There are even a few users that have gotten it to work on Linux Mint; a daring few have ported over packages to Fedora and even OS X. Debathena is a huge success for SIPB, as it became the official Linux distribution used on campus and is supported by MIT IS&T.

Despite the innovations by the original Debathena team to modularize Athena, Debathena inherited almost all of the technical debt from the original Project Athena. Over 25 years of code doesn't come for free, and this is obvious throughout the legacy support libraries. Faced with the choice of making a system that worked or a system that was correct, the team made the hard call to encapsulate the legacy baggage and made a promise to come back to it later.

Five years later (2013), an entirely new team of Debathena contributors started a new effort: athfsck. The mandate of athfsck was simple: replace as much old, K&R C with modern Python 2.7 as reasonable; make reproducible builds possible; repopulate the long ignored Athena Distribution FTP Server---in essence, finally pay down the technical debt that had accumulated since 1983. athfsck is still on-going, but has already yielded huge wins: code that is finally easy to understand, easy to test, portable, and most importantly, secure.

A code health checklist

How did Debathena finally decide to invest time in a substantial rewrite of legacy components? When the majority of the developers were so sick of the hacks in the code base that "there's got to be a better way" became a common phrase. We created a checklist and used it to evaluate the code health of the individual components. Each component was prioritized based on how many of these faults it contained. Although this list contains examples from Debathena, the basic principles apply to all code.

Needless complexity

As code grows, so does complexity. Debathena contained thousands of lines of unsafe C, much of it written in K&R-style, to do tasks that take just 5 lines of Python. All of this legacy code read as complete gibberish and turned off contributors, even those fluent in C. Strive to eliminate as much complexity as possible, and make the code understandable to all contributors.

Inappropriate design (code duplication and global state)

To cope with the complexities of modern systems, old code is frequently patched with similar fixes throughout. Without proper encapsulation of responsibilities and appropriate segmentation into libraries, the design of tools throughout systems crumbles. The use of global variables in libraries makes it extremely difficult to write code without invoking hidden interactions; avoid it at all costs. For network protocols, use of UDP is almost always inappropriate.

Gaping security flaws

Debathena specifically contained a significant portion of legacy code that continued to rely on Kerberos 4 (deprecated) or DES encryption (brute-force takes just 23 hours for $200 on Cloudcracker). Use of weak or custom cryptography is a serious security vulnerability.

Confusing APIs

The Debathena Team's preferred programming language is currently Python, chosen for its portability and ease-of-understanding. The underlying libraries were all written in C, which can be loaded with Python's FFI features, but the interfaces exposed were extremely awkward and difficult to use correctly, even with ample documentation. Strive to make APIs that are clear and consistent; a good API isolates complexity.

Unstable build systems

Software that can't be built can't be debugged. For Debathena, almost every directory contained a configure.in and a Makefile.in, but without a correct configuration. No auto-detection of platform features was being utilized, resulting in breakages with every new Ubuntu release. To make matters worse, make -j randomly broke due to incorrectly specified dependencies and makefile race conditions. Make sure that make distcheck (or the appropriate equivalent for the build system in use) passes cleanly.

Fragility (assumptions about system makeup)

Refactoring projects throughout the years separates the various components of a once-monolithic system, exposing error handling code paths that were previously not exercised. Ensure that these code paths are exercised, or you may find that they are extremely fragile, causing breakage on new systems with otherwise valid configurations.

Version control system limitations

Does the VCS support the project? Debathena's use of Subversion made contributions by developers outside of the core team a laborious process due to the lack of cheap branching and publishing as offered by a DVCS like git.

License issues

Old open source projects may find that some contributions are under a different license than expected. Debathena found that some code was released under 4-Clause BSD and had to be rewritten clean-room.

The faults must be weighed against the importance of the components. Rewrite energy should be targeted at the components that would result in the most visible benefits.

Don't just rewrite; redesign

If the code in question has sufficiently many faults, then it is a good candidate for a rewrite. However, there is one caveat: rewrites for the sake of language translation alone are almost always a mistake; the exception being when it's impossible to integrate with the rest of the system otherwise. Debathena did not choose to rewrite components into Python simply because Python was the preferred language. The rewrites only occurred when it was clear the old code was no longer capable of effectively handling the modern use cases. While rewriting, keep these principles in mind:

Avoid the pitfalls of the past generation. New code should encourage modularity, code re-use, consistent error handling, and have a focus on correct implementations of the core features. Drop as many vestigial features as possible, especially those that are discovered to have been broken for a long time. However, don't break compatibility unless absolutely necessary. An effective redesign-and-rewrite should act as a drop-in replacement for (at least) the 90% use case.

Don't just do a mechanical translation. At best, a mechanical translation is bug-for-bug compatible with the original implementation, and has all of the same faults. At worst, it introduces additional bugs while reducing code clarity. The original source is a wonderful reference, frequently full of useful information about edge cases and past strategies. However, the original source is also a wonderful reference of past mistakes. Take the opportunity to rethink how the same features could be delivered with modern practices. Especially focus on clearing up logic and ensuring internal consistency. Question all parts of the original design and convince yourself of correctness before proceeding.

Test as you rewrite. Continually verify that the new code passes old test cases. Write new test cases to validate assumptions about system state. Expose the new code to users as early as possible; beta testers are still one of the most effective ways of finding bugs. If available, make use of lint tools to ensure the code style is consistent and avoid common mistakes.

Document behavior. Digging into the bowels of the old system will almost certainly uncover unexpected behavior and new features. Document them as much as possible, particularly noting if the behavior is available in the rewrite with an explanation if not.

Iterate. Whenever possible, rewrite small components rather than whole systems. Iteration enforces compatibility and provides a transition path, while also making the project more manageable. Take every iteration cycle as an opportunity to re-consider the design thus far. Refactor as new features are added; promote code health throughout the process.

Commit often. A rapid iteration cycle means lots of code churn. It's a lot better to have too many commits than to lose work.

Set finite goals. Don't let the re-write suffer from scope creep. Set reasonable milestones and make a plan to reach them. Be hesitant about adding to your yak-stack.

Be willing to backtrack. In rare cases, the original code is hairy because there isn't a better way. Consider the opportunity cost of continuing with a rewrite before progressing.

Tending to the garden requires balance

Not all code that fails the Code Health checklist is a good candidate for a rewrite. Some code, like Athena's Discuss, needs not just a redesign but a completely new architecture. In such scenarios, it's often better to keep the old code on life support while the new system is built.

Sufficiently static code, even with a low Code Health score, is also a poor candidate for a rewrite. There's no reason to re-engineer something that works just because it's not written in the language in vogue, or happens to be ugly.

Most importantly, cultivating code requires finding a balance between the rewrite, the long revitalization, and the clever fix. Don't declare a project ripe for rewrite just because adding a particular feature is hard---the rewrite is much, much harder. Instead, work to add the feature while improving code health. Small updates throughout the project's life will keep it from becoming a sad, wilted branch in need of heavy revitalization.

The repository is like a garden: if tended to properly, it flourishes. If ignored, it bitrots. Cultivate your code carefully; rip out the weeds of bad code when it becomes necessary.

During our Homotopy Type Theory reading group, Jeremy pointed out that the difference between these two principles is exactly the difference between path induction (eq0) and based path induction (eq1). (This is covered in the Homotopy Type Theory book in section 1.12) So, Coq uses the slightly weirder definition because it happens to be a bit more convenient. (I’m sure this is folklore, but I sure didn’t notice this until now! For more reading, check out this excellent blog post by Dan Licata.)

January 21, 2014

etckeeper is a pretty good tool for keeping your /etc under version control, but one thing that it won’t tell you is the diff between your current configuration and a pristine version of your configuration (what you would have if you installed the same packages on the system, but didn’t change any configuration). People have wanted this, but I couldn’t find anything that actually did this. A month ago, I figured out a nice, easy way to achieve this under etckeeper with a Git repository. The idea is to maintain a pristine branch, and when an upgrade occurs, automatically apply the generated patch to the pristine branch as well. This procedure works best on a fresh install, since I don’t have a good way of reconstructing history if you haven’t been tracking the pristine from the start.

Here’s how it goes:

Install etckeeper. It is best if you are using etckeeper 1.10 or later, but if not, you should replace 30store-metadata with a copy from the latest version. This is important, because pre-1.10, the metadata store included files that were ignored, which means you’ll get lots of spurious conflicts.

Initialize the Git repository using etckeeper init and make an initial commit with git commit.

Permit pushes to the checked-out /etc by running git config receive.denyCurrentBranch warn.

All done! Try installing a package that has some configuration and then running sudo gitk in /etc to view the results. You can run a diff by running sudo git diff pristine master.

So, what’s going on under the hood? The big problem that blocked me from a setup like this in the past is that you would like the package manager to apply its changes into the pristine etc, so that you can merge in the changes yourself on the production version, but it’s not obvious how to convince dpkg that /etc lives somewhere else. Nor do you want to revert your system configuration to the pristine version, apply the update, and then revert back: this is just asking for trouble. So the idea is to apply the (generated) patch as normal, but then reapply the patch (using a cherry-pick) to the pristine branch, and then rewrite history so the parent pointers are correct. All of this happens outside of /etc, so the production copy of the configuration files never gets touched.

Of course, sometimes the cherry-pick might fail. In that case, you’ll get an error like this:

Branch pristine set up to track remote branch pristine from origin.
Switched to a new branch 'pristine'
error: could not apply 4fed9ce... committing changes in /etc after apt run
hint: after resolving the conflicts, mark the corrected paths
hint: with 'git add <paths>' or 'git rm <paths>'
hint: and commit the result with 'git commit'
Failed to import changes to pristine
TMPREPO = /tmp/etckeeper-gitrepo.CUCpBEuVXg
TREEID = 8c2fbef8a8f3a4bcc4d66d996c5362c7ba8b17df
PARENTID = 94037457fa47eb130d8adfbb4d67a80232ddd214

Do not fret: all that has happened is that the pristine branch is not up-to-date. You can resolve this problem by looking at $TMPREPO/etc, where you will see some sort of merge conflict. Resolve the conflict and commit. Now you will need to manually complete the rest of the script; roughly, this can be done with:
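
# A sketch only; the exact steps depend on your etckeeper version. Finish the
# interrupted cherry-pick in the temporary repository, then push the repaired
# pristine branch back into /etc.
cd $TMPREPO/etc
git add -A
git commit
git push origin pristine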

To make sure you did it right, go back to /etc and run git status: it should report the working directory as clean. Otherwise, there are discrepancies and you may not have done the merges correctly.

I’ve been testing this setup for a month now, and it has proceeded very smoothly (though I’ve never attempted to do a full release upgrade with this setup). Unfortunately, as I’ve said previously, I don’t have a method for constructing a pristine branch from scratch, if you have an existing system you’d like to apply this trick to. There’s nothing stopping you, though: you can always decide to start, in which case you will record just the diffs from the time you started recording pristine. Give it a spin!

January 17, 2014

POPL is almost upon us! I’ll be live-Tumblr-ing it when the conference comes upon us proper, but in the meantime, I thought I’d write a little bit about one paper in the colocated PEPM'14 program: The HERMIT in the Stream, by Andrew Farmer, Christian Höner zu Siederdissen and Andy Gill. This paper presents an implementation of an optimization scheme for fusing away use of the concatMap combinator in the stream fusion framework, which was developed using the HERMIT optimization framework. The HERMIT project has been chugging along for some time now, and a stream of papers on various applications of the framework has been trickling out (as anyone who was at the Haskell implementors workshop can attest.)

“But wait,” you may ask, “don’t we already have stream fusion?” You’d be right: but while stream fusion is available as a library, it has not replaced the default fusion system that ships with GHC: foldr/build fusion. What makes a fusion scheme good? One important metric is the number of list combinators it supports. Stream fusion nearly dominates foldr/build fusion, except for the case of concatMap, a problem which has resisted resolution for seven years and has prevented GHC from switching to using stream fusion as its default.
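
For readers who haven’t seen it, the core representation used by stream fusion (roughly following Coutts et al.; a sketch assuming the ExistentialQuantification extension) is a step function plus a seed; lists are converted into and out of this representation, and fusion succeeds when the intermediate Stream is eliminated:

data Stream a = forall s. Stream (s -> Step s a) s
data Step s a = Done | Yield a s | Skip s

-- Converting a list to a stream. The troublesome combinator has the type
-- concatMap :: (a -> Stream b) -> Stream a -> Stream b: every element of the
-- outer stream gives rise to an entire inner stream, which is what makes it
-- hard to fuse.
stream :: [a] -> Stream a
stream xs0 = Stream next xs0
  where
    next []     = Done
    next (x:xs) = Yield x xs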

As it turns out, we’ve known how to optimize concatMap for a long time; Duncan Coutts gave a basic outline in his thesis. The primary contribution of this paper was a prototype implementation of this optimization, including an elucidation of the important technical details (increasing the applicability of the original rule, necessary modifications to the simplifier, and rules for desugaring list comprehensions). The paper also offers some microbenchmarks and real-world benchmarks arguing for the importance of optimizing concatMap.

I was glad to see this paper, since it is an important milestone on the way to replacing foldr/build fusion with stream fusion in the GHC standard libraries. It also seems the development of this optimization was greatly assisted by the use of HERMIT, which seems like a good validation for HERMIT (though the paper does not go into very much detail about how HERMIT assisted in the process of developing this optimization).

There is something slightly unsatisfying with the optimization as stated in the paper, which can be best articulated by considering the paper from the perspective of a prospective implementor of stream fusion. She has two choices:

She can try to use the HERMIT system directly. However, HERMIT induces a 5-20x compilation slowdown, which is quite discouraging for real use. This slowdown is probably not fundamental, and will be erased in due time, but that is certainly not the case today. The limited implementation of stream fusion in the prototype (they don’t implement all of the combinators, just enough so they could run their numbers) also recommends against direct use of the system.

She can directly incorporate the rules as stated into a compiler. This would require special-case code to apply the non-semantics preserving simplifications only to streams, and essentially would require a reimplementation of the system, with the guidance offered by this paper. But this special-case code is of limited applicability beyond its utility for concatMap, which is a negative mark.

So, it seems, at least from the perspective of an average GHC user, we will have to wait a bit longer before stream fusion is in our hands. Still, I agree that the microbenchmarks and ADPFusion case study show the viability of the approach, and the general principle of the novel simplification rules seems reasonable, if a little ad hoc.

One note if you’re reading the nofib performance section: the experiment was done comparing their system to foldr/build, so the delta is mostly indicative of the benefit of stream fusion (in the text, they point out which benchmarks benefitted the most from concatMap fusion). Regardless, it’s a pretty cool paper: check it out!

January 14, 2014

Ott and PLT Redex are a pair of complementary tools for the working semanticist. Ott is a tool for writing definitions of programming languages in a nice ASCII notation, which then can be typeset in LaTeX or used to generate definitions for a theorem prover (e.g. Coq). PLT Redex is a tool for specifying and debugging operational semantics. Both tools are easy to install, which is a big plus. Since the tools are quite similar, I thought it might be interesting to do a comparison of how various common tasks are done in both languages. (Also, I think the Redex manual is pretty terrible.)

Variables. In Ott, variables are defined by way of metavariables (metavar x), which then serve as variables (by either using the metavariable alone, or suffixing it with a number, index variable or tick).

In Redex, there is no notion of a metavariable; a variable is just another production. There are a few different ways to say that a production is a variable: the simplest method is to use variable-not-otherwise-mentioned, which automatically prevents keywords from acting as variables. There are also several other variable patterns variable, variable-except and variable-prefix, which afford more control over what symbols are considered variables. side-condition may also be useful if you have a function which classifies variables.

Grammar. Both Ott and Redex can identify ambiguous matches. Ott will error when it encounters an ambiguous parse. Redex, on the other hand, will produce all valid parses; while this is not so useful when parsing terms, it is quite useful when specifying non-deterministic operational semantics (although this can have bad performance implications). check-redundancy may be useful to identify ambiguous patterns.

Binders. In Ott, binders are explicitly declared in the grammar using bind x in t; there is also a binding language for collecting binders for pattern-matching. Ott can also generate substitution/free variable functions for the semantics. In Redex, binders are not stated in the grammar; instead, they are implemented solely in the reduction language, usually using substitution (Redex provides a workhorse substitution function for this purpose), and explicitly requiring a variable to be fresh. Redex does have a special-form in the metalanguage for doing let-binding (term-let), which substitutes immediately.

Lists. Ott supports two forms of lists: dot forms and list comprehensions. A dot form looks like x1 , .. , xn and requires an upper bound. A list comprehension looks like </ xi // i IN 1 .. n />; the bounds can be omitted. A current limitation of Ott is that it doesn’t understand how to deal with nested dot forms; this can be worked around by doing a comprehension over a production, and then elsewhere stating the appropriate equalities the production satisfies.

Redex supports lists using ellipsis patterns, which look like (e ...). There is no semantic content here: the ellipsis simply matches zero or more copies of e, which can lead to nondeterministic matches when there are multiple ellipses. Nested ellipses are supported, and simply result in nested lists. Bounds can be specified using side-conditions; however, Redex supports a limited form of bounding using named ellipses (e.g. ..._1), where all ellipses with the same name must have the same length.

Semantics. Ott is agnostic to whatever semantics you want to define; arbitrary judgments can be specified. One can also define judgments as usual in Redex, but Redex provides special support for evaluation semantics, in which a semantics is given in terms of evaluation contexts, thus allowing you to avoid the use of structural rules. So a usual use-case is to define a normal expression language, extend the language to have evaluation contexts, and then define a reduction-relation using in-hole to do context decomposition. The limitation is that if you need to do anything fancy (e.g. multi-hole evaluation contexts), you will have to fall back to judgment forms.

Type-setting. Ott supports type-setting by translation into LaTeX. Productions can have custom LaTeX associated with them, which is used to generate their output. Redex has a pict library for directly typesetting into PDF or Postscript; it doesn’t seem like customized typesetting is an intended use-case for PLT Redex, though it can generate reasonable Lisp-like output.

Conclusion. If I had to say what the biggest difference between Ott and PLT Redex was, it is that Ott is primarily concerned with the abstract semantic meaning of your definitions, whereas PLT Redex is primarily concerned with how you would go about matching against syntax (running it). One way to see this is in the fact that in Ott, your grammar is a BNF, which is fed into a CFG parser; whereas in PLT Redex, your grammar is a pattern language for the pattern-matching machine. This should not be surprising: one would expect each tool’s design philosophy to hew towards their intended usage.

January 07, 2014

MVars are an amazingly flexible synchronization primitive, which can serve as locks, one-place channels, barriers, etc. or be used to form higher-level abstractions. As far as flexibility is concerned, MVars are the superior choice of primitive for the runtime system to implement—as opposed to just implementing, say, a lock.

However, I was recently thinking about GHC's BlockedIndefinitelyOnMVar exception, and it occurred to me that a native implementation of locks could allow perfect deadlock detection, as opposed to the approximate detection for MVars we currently provide. (I must emphasize, however, that here, I define deadlock to mean a circular waits-for graph, and not “thread cannot progress further.”)

Here is how the new primitive would behave:

There would be a new type Lock, with only one function withLock :: Lock -> IO a -> IO a. (For brevity, we do not consider the generalization of Lock to also contain a value.) A sketch of what this interface might look like is given after this list.

At runtime, the lock is represented as two closure types, indicating locked and unlocked states. The locked closure contains a waiting queue, containing threads which are waiting for the lock.

When a thread takes out a free lock, it adds the lock to a (GC'd) held locks set associated with the thread. When it returns the lock, the lock is removed from this set.

When a thread attempts to take a busy lock, it blocks itself (waiting for a lock) and adds itself to the waiting queue of the locked closure.

Critically, references to the lock are treated as weak pointers when the closure is locked. (Only the pointer from the held lock set is strong.) Intuitively, just because you have a pointer to the lock doesn’t mean you can unlock it; the only person who can unlock it is the thread who has the lock in their held locks set.

If a thread attempts to take out a lock on a dead weak pointer, it is deadlocked.
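
Here is a rough sketch of the proposed interface, approximated on top of an ordinary MVar (the proposal itself is for a dedicated runtime-level representation, not an MVar):

import Control.Concurrent.MVar

-- A sketch only: an MVar stands in for the proposed runtime primitive,
-- just to pin down the interface.
newtype Lock = Lock (MVar ())

newLock :: IO Lock
newLock = fmap Lock (newMVar ())

withLock :: Lock -> IO a -> IO a
withLock (Lock m) action = withMVar m (\_ -> action)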

Theorem. Any set of threads in a waits-for cycle is unreachable, if there are no other pointers to the threads besides the pointer from the waiting queue of the locks in the cycle.

Proof. Consider a single thread in the cycle: we show that the only (strong) pointer to it is from the previous thread in the cycle. When a thread is blocked, it is removed from the run queue (which counts as a GC root). Given the assumption, the only pointer to the thread is from the waiting queue of the lock it is blocked on. We now consider pointers to the lock it is blocked on. As this lock is busy, all pointers to it are weak, except for the pointer from the thread which is holding the lock. But this is exactly the previous thread in the cycle. ■

At the cost of a weak-pointer dereference when a lock is taken out, we can now achieve perfect deadlock detection. Deadlock will be detected as soon as a garbage collection runs that detects the dead cycle of threads. (At worst, this will be the next major GC.)

Why might this be of interest? After all, normally, it is difficult to recover from a deadlock, so while accurate deadlock reporting might be nice-to-have, it is by no means necessary. One clue comes from a sentence in Koskinen and Herlihy's paper Dreadlocks: Efficient Deadlock Detection: “an application that is inherently capable of dealing with abortable lock requests...is software transactional memory (STM).” If you are in an STM transaction, deadlock is no problem at all; just rollback one transaction, breaking the cycle. Normally, one does not take out locks in ordinary use of STM, but this can occur when you are using a technique like transactional boosting (from the same authors; the relationship between the two papers is no coincidence!)

Exercise for the reader: formulate a similar GC scheme for MVars restricted to be 1-place channels. (Hint: split the MVar into a write end and a read end.)

January 01, 2014

One of the appealing things about GHC is that the compiler is surprisingly hackable, even when you don’t want to patch the compiler itself. This hackability comes from compiler plugins, which let you write custom optimization passes on Core, as well as foreign primops, which let you embed low-level C-- to manipulate the low-level representation of various primitives. These hooks let people implement and distribute features that would otherwise be too unstable or speculative to put into the compiler proper.

A particular use-case that has garnered some amount of interest recently is that of concurrency primitives. We engineers like to joke that, in the name of performance, we are willing to take on nearly unbounded levels of complexity: but this is almost certainly true when it comes to concurrency primitives, where the use of ever more exotic memory barriers and concurrent data structures can lead to significant performance boosts (just ask the Linux kernel developers). It’s very tempting to look at this situation and think, “Hey, we could implement this stuff in GHC too, using the provided compiler hooks!” But there are a lot of caveats involved here.

After answering a few questions related to this subject on the ghc-devs list and noticing that many of the other responses were a bit garbled, I figured I ought to expand on my responses a bit in a proper blog post. I want to answer the following questions:

What does it mean to have a memory model for a high-level language like Haskell? (You can safely skip this section if you know what a memory model is.)

What is (GHC) Haskell’s memory model?

How would I go about implementing a (fast) memory barrier in GHC Haskell?

Memory models are semantics

What is a memory model? If you ask a hardware person, they might tell you, “A memory model is a description of how a multi-processor CPU interacts with its memory, e.g. under what circumstances a write by one processor is guaranteed to be visible by another.” If you ask a compiler person, they might tell you, “A memory model says what kind of compiler optimizations I’m allowed to do on operations which modify shared variables.” A memory model must fulfill both purposes (a common misconception is that it is only one or the other). To be explicit, we define a memory model as follows (adapted from Adve-Boehm):

A memory model is a semantics for shared variables, i.e. the set of values that a read in a program is allowed to return.

That’s right: a memory model defines the behavior of one of the most basic operations in your programming language. Without it, you can’t really say what your program is supposed to do.

Why, then, are memory models so rarely discussed, even in a language community that is so crazy about semantics? In the absence of concurrency, the memory model is irrelevant: the obvious semantics apply. In the absence of data races, the memory model can be described quite simply. For example, a Haskell program which utilizes only MVars for inter-thread communication can have its behavior described completely using a relatively simple nondeterministic operational semantics (see Concurrent Haskell paper (PS)); software transactional memory offers high-level guarantees of atomicity with respect to reads of transactional variables. Where a memory model becomes essential is when programs contain data races: when you have multiple threads writing and reading IORefs without any synchronization, a memory model is responsible for defining the behavior of this program. With modern processors, this behavior can be quite complex: we refer to these models as relaxed memory models. Sophisticated synchronization primitives will often take advantage of a relaxed memory model to avoid expensive synchronizations and squeeze out extra performance.
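
To make this concrete, here is a toy store-buffering example (a sketch, not drawn from any specification): each thread writes one IORef and then reads the other, and it is precisely the memory model's job to say which pairs of values may be printed, in particular whether both reads may see the original zeros.

import Control.Concurrent
import Data.IORef

main :: IO ()
main = do
  x <- newIORef (0 :: Int)
  y <- newIORef (0 :: Int)
  done <- newEmptyMVar
  _ <- forkIO $ do
    writeIORef x 1
    r2 <- readIORef y
    putMVar done r2
  writeIORef y 1
  r1 <- readIORef x
  r2 <- takeMVar done
  -- Under sequential consistency (0, 0) is impossible; under a relaxed
  -- memory model it may be allowed.
  print (r1, r2)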

GHC Haskell’s memory (non) model

One might say the Haskell tradition is one that emphasizes the importance of semantics... except for a number of notable blind spots. The memory model is one of those blind spots. The original Haskell98 specification did not contain any specification of concurrency. Concurrent Haskell paper (PS) gave a description of semantics for how concurrency might be added to the language, but the paper posits only the existence of MVars, and is silent on how MVars ought to interact with IORefs.

The upshot is that, as far as Haskell the standardized language goes, the behavior here is completely undefined. To really be able to say anything, we’ll have to pick an implementation (GHC Haskell), and we’ll have to infer which aspects of the implementation are specified behavior, as opposed to things that just accidentally happen to hold. Notably, memory models have implications for all levels of your stack (it is a common misconception that a memory barrier can be used without any cooperation from your compiler), so to do this analysis we’ll need to look at all of the phases of the GHC compilation chain. Furthermore, we’ll restrict ourselves to monadic reads/writes, to avoid having to wrangle with the can of worms that is laziness.

Here’s GHC’s compilation pipeline in a nutshell:

At the very top of the compiler pipeline lie the intermediate languages Core and STG. These will preserve sequential consistency with no trouble, as the ordering of reads and writes is fixed by the use of monads, and preserved throughout the desugaring and optimization passes: as far as the optimizer is concerned, the primitive operations which implement read/write are complete black boxes. In fact, monads will over-sequentialize in many cases! (It is worth remarking that rewrite rules and GHC plugins could apply optimizations which do not preserve the ordering imposed by monads. Of course, both of these facilities can be used to also change the meaning of your program entirely; when considering a memory model, these rules merely have a higher burden of correctness.)

The next step of the pipeline is a translation into C--, a high-level assembly language. Here, calls to primitive operations like readMutVar# and writeMutVar# are translated into actual memory reads and writes in C--. Importantly, the monadic structure that was present in Core and STG is now eliminated, and GHC may now apply optimizations which reorder reads and writes. What actually occurs is highly dependent on the C-- that is generated, as well as the optimizations that GHC applies, and C-- has no memory model, so we cannot appeal to even that.

This being said, a few things can be inferred from a study of the optimization passes that GHC does implement:

GHC reserves the right to reorder stores: the WriteBarrier mach-op (NB: not available from Haskell!) is defined to prevent future stores from occurring before preceding stores. In practice, GHC has not implemented any C-- optimizations which reorder stores, so if you have a story for dealing with the later stages of the pipeline, you can dangerously assume that stores will not be reordered in this phase.

GHC reserves the right to reorder loads, and does so extensively. One of the most important optimizations we perform is a sinking pass, where assignments to local variables are floated as close to their use-sites as possible. As of writing, there is no support for read barrier, which would prevent this floating from occurring.

There are a few situations where we happen to avoid read reordering (which may be dangerously assumed):

Reads don’t seem to be reordered across foreign primops (primops defined using the foreign prim keywords). This is because foreign primops are implemented as a jump to another procedure (the primop), and there are no inter-procedural C-- optimizations at present.

Heap reads don’t seem to be reordered across heap writes. This is because we currently don’t do any aliasing analysis and conservatively assume the write would have clobbered the read. (This is especially dangerous to assume, since you could easily imagine getting some aliasing information from the frontend.)

Finally, the C-- is translated into either assembly (via the NCG—N for native) or to LLVM. During translation, we convert the write-barrier mach-op into an appropriate assembly instruction (no-op on x86) or LLVM intrinsic (sequential consistency barrier); at this point, the behavior is up to the memory model defined by the processor and/or by LLVM.

It is worth summarizing the discussion here by comparing it to the documentation at Data.IORef, which gives an informal description of the IORef memory model:

In a concurrent program, IORef operations may appear out-of-order to another thread, depending on the memory model of the underlying processor architecture...The implementation is required to ensure that reordering of memory operations cannot cause type-correct code to go wrong. In particular, when inspecting the value read from an IORef, the memory writes that created that value must have occurred from the point of view of the current thread.

In other words, “We give no guarantees about reordering, except that you will not have any type-safety violations.” This behavior can easily occur as a result of reordering stores or loads. However, the type-safety guarantee is an interesting one: the last sentence remarks that an IORef is not allowed to point to uninitialized memory; that is, we’re not allowed to reorder the write to the IORef with the write that initializes a value. This holds easily on x86, due to the fact that C-- does not reorder stores; I am honestly skeptical that we are doing the right thing on the new code generator for ARM (but no one has submitted a bug yet!)

What does it all mean?

This dive into the gory internals of GHC is all fine and nice, but what does it mean for you, the prospective implementor of a snazzy new concurrent data structure? There are three main points:

Without inline foreign primops, you will not be able to convince GHC to emit the fast-path assembly code you are looking for. As we mentioned earlier, foreign primops currently always compile into out-of-line jumps, which will result in a bit of extra cost if the branch predictor is unable to figure out the control flow. On the plus side, any foreign primop call will accidentally enforce the compiler-side write/read barrier you are looking for.

With inline foreign primops, you will still need to make modifications to GHC in order to ensure that optimization passes respect your snazzy new memory barriers. For example, John Lato’s desire for a load-load barrier (the email which kicked off this post) will be fulfilled with no compiler changes by an out-of-line foreign primop, but not by the hypothetical inline foreign primop.

This stuff is really subtle; see the position paper Relaxed memory models must be rigorous, which argues that informal descriptions of memory models (like this blog post!) are far too vague to be useful: if you want to have any hope of being correct, you must formalize it! Which suggests an immediate first step: give C-- a memory model. (This should be a modest innovation over the memory models that C and C++ have recently received.)

For the rest of us, we’ll use STM instead, and be in a slow but compositional and deadlock-free nirvana.

December 17, 2013

Anyone who has done some coding in Rust may be familiar with the dreaded borrow checker, famous for obstructing the compilation of otherwise “perfectly reasonable code.” In many cases, the borrow checker is right: you’re writing your code wrong, and there is another, clearer way to write your code that will appease the borrow checker. But sometimes, even after you’ve skimmed the tutorial, memorized the mantra “a &mut pointer is the only way to mutate the thing that it points at” and re-read the borrowed pointers tutorial, the borrow-checker might still stubbornly refuse to accept your code.

If that’s the case, you may have run into one of the two (in)famous bugs in the borrow-checker. In this post, I want to describe these two bugs, give situations where they show up and describe some workarounds. This is the kind of post which I hope becomes obsolete quickly, but the fixes for them are pretty nontrivial, and you are inevitably going to run into these bugs if you try to program in Rust today.

Mutable borrows are too eager (#6268)

Summary. When you use &mut (either explicitly or implicitly), Rust immediately treats the lvalue as borrowed and imposes its restrictions (e.g. the lvalue can’t be borrowed again). However, in many cases, the borrowed pointer is not used until later, so imposing the restrictions immediately results in spurious errors. This situation is most likely to occur when there is an implicit use of &mut. (Bug #6268)

Symptoms. You are getting the error “cannot borrow `foo` as immutable because it is also borrowed as mutable”, but the reported second borrow is an object dispatching a method call, or doesn’t seem like it should have been borrowed at the time the flagged borrow occurred.

Examples. The original bug report describes the situation for nested method calls, where the outer method call has &mut self in its signature:
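
// Roughly the code from the bug report, reconstructed in the Rust syntax of
// the time (not the exact original):
use std::hashmap::HashMap;

fn main() {
    let mut map = HashMap::new();
    map.insert(1, 2);
    // error: cannot borrow `map` as immutable because it is also borrowed as mutable
    map.insert(2, *map.get(&1));
}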

This code would like to retrieve the value at key 1 and store it in key 2. Why does it fail? Consider the signature fn insert(&mut self, key: K, value: V) -> bool: the insert method invocation immediately takes out a &mut borrow on map before attempting to evaluate its argument. If we desugar the method invocation, the order becomes clear: HashMap::insert(&mut map, 2, *map.get(&1)) (NB: this syntax is not implemented yet). Because Rust evaluates arguments left to right, this is equivalent to:
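
// A hypothetical desugaring, to make the evaluation order explicit:
let arg0 = &mut map;        // `map` is mutably borrowed here...
let arg1 = *map.get(&1);    // ...so this immutable borrow of `map` is rejected
HashMap::insert(arg0, 2, arg1);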

Discussion. Fortunately, this bug is pretty easy to work around, if a little annoying: move all of your sub-expressions to let-bindings before the ill-fated mutable borrow (see examples for a worked example). Note: the borrows that occur in these sub-expressions really do have to be temporary; otherwise, you have a legitimate “cannot borrow mutable twice” error on your hands.

Borrow scopes should not always be lexical (#6393)

Summary. When you borrow a pointer, Rust assigns it a lexical scope that constitutes its lifetime. This scope can be as small as a single statement, or as big as an entire function body. However, Rust is unable to calculate lifetimes that are not lexical, e.g. a borrowed pointer is only live until halfway through a function. As a result, borrows may live longer than users might expect, causing the borrow checker to reject some statements. (Bug #6393)

Symptoms. You are getting a “cannot borrow foo as immutable/mutable because it is also borrowed as immutable/mutable”, but you think the previous borrow should have already expired.

Examples. This problem shows up in a variety of situations. The very simplest example which tickles this bug can be seen here:
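
// A reconstruction of the kind of code involved (again, period syntax, not
// the exact original):
use std::hashmap::HashMap;

fn main() {
    let mut table: HashMap<int, ~[int]> = HashMap::new();
    let key = 1;
    match table.find_mut(&key) {
        Some(v) => v.push(1),
        // error: cannot borrow `table` as mutable more than once at a time
        None => { table.insert(key, ~[1]); }
    }
}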

table is a map of integer keys to vectors. The code performs an insert at key: if the map has no entry, then we create a new singleton vector and insert it in that location; otherwise, it just pushes the value 1 onto the existing vector. Why is table borrowed in the None branch? Intuitively, the borrow for table.find_mut should be dead, since we no longer are using any of the results; however, to Rust, the only lexical scope it can assign the borrowed pointer encompasses the entire match statement, since the borrowed pointer continues to be used in the Some branch (note that if the Some branch is removed, this borrow checks). Unfortunately, it’s not possible to insert a new lexical scope, as was possible in the previous example. (At press time, I wasn’t able to find a small example that only used if.)

Sometimes, the lifetime associated with a variable can force it to be assigned to a lexical scope that is larger than you would expect. Issue #9113 offers a good example of this (code excerpted below):

This code is attempting to perform a database lookup; it first consults the cache and returns a cached entry if available. Otherwise, it looks for the value in the database, caching the value in the process. Ordinarily, you would expect the borrow of self.cache in the first match to extend only for the first expression. However, the return statement throws a spanner in the works: it forces the lifetime of data to be 'a, which encompasses the entire function body. The borrow checker then concludes that there is a borrow everywhere in the function, even though the function immediately returns if it takes out this borrow.

Discussion. The workaround depends on the nature of the scope that is causing trouble. When match is involved, you can usually arrange for the misbehaving borrow to be performed outside of the match statement, in a new, non-overlapping lexical scope. This is easy when the relevant branch does not rely on any variables from the pattern-match by using short-circuiting control operators:
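
// A sketch of the idea, here using an explicit boolean computed in its own
// statement rather than a short-circuiting operator:
use std::hashmap::HashMap;

fn main() {
    let mut table: HashMap<int, ~[int]> = HashMap::new();
    let key = 1;
    // The mutable borrow of `table` now lives only for this one statement.
    let missing = match table.find_mut(&key) {
        Some(v) => { v.push(1); false }
        None => true
    };
    if missing {
        table.insert(key, ~[1]);
    }
}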

The boolean can be elaborated into an enum that holds any non-references from the pattern-match you might need. Note that this will not work for borrowed references; but in that case, the borrow truly was still live!

It is a bit more difficult to work around problems regarding lifetimes, since there is nowhere in the function the pointer is not “borrowed”. One trick which can work in some situations is to convert the function to continuation passing style: that is, instead of returning the borrowed pointer, accept a function argument which gets invoked with the borrowed pointer. pnkfelix describes how you might go about fixing the third example. This removes the lifetime constraint on the variable and resolves the problem.

The lexical scope assigned to a borrow can be quite sensitive to code perturbation, since removing a use of a borrow can result in Rust assigning a (much) smaller lexical scope to the borrow, which can eliminate the error. Sometimes, you can avoid the problem altogether by just avoiding a borrow.

Conclusion

To sum up:

Bug #6268 can cause borrows to start too early (e.g. in method invocations), work around it by performing temporary borrows before you do the actual borrow.

Bug #6393 can cause borrows to end too late (e.g. in match statements), work around it by deferring operations that need to re-borrow until the original lexical scope ends.

Keep these in mind, and you should be able to beat the borrow checker into submission. That is, until Niko fixes these bugs.

October 31, 2013

GHC’s block allocator is a pretty nifty piece of low-level infrastructure. It offers a much more flexible way of managing a heap, rather than trying to jam it all in one contiguous block of memory, and is probably something that should be of general interest to anyone who is implementing low-level code like a runtime. The core idea behind it is quite old (BIBOP: Big Bag of Pages), and is useful for any situation where you have a number of objects that are tagged with the same descriptor, and you don’t want to pay the cost of the tag on each object.

Managing objects larger than pages is a bit tricky, however, and so I wrote a document visualizing the situation to help explain it to myself. I figured it might be of general interest, so you can get it here: http://web.mit.edu/~ezyang/Public/blocks.pdf

Some day I’ll convert it into wikiable form, but I don’t feel like Gimp'ing the images today...

June 21, 2013

"We just passed 10 years of cumulative uptime!" an XVM maintainer announced, celebrating how reliable the 8 servers that power the SIPB XVM Virtual Machine Service have been. Just an hour later, at 4:38 PM, Nagios alerted "Host DOWN PROBLEM alert for xvm!"

xvm.mit.edu is a virtual machine running on the XVM hosts itself, tasked with running the web interface, DNS, and DHCP server for the project. xvm.mit.edu, like most servers at MIT, also has OpenAFS installed, which is known to trigger kernel bugs that frequently require a reboot to correct. We attempted to get access to the host running xvm.mit.edu, babylon-four, only to find that it, too, was not accessible! Past experience suggested this was a catastrophic crash of the VM host itself, and that a full power cycle would be needed to bring it on line.

At 4:52 PM, a maintainer resorted to logging into babylon-four's IPMI console, and was surprised to find the machine responsive, with all VMs (including xvm.mit.edu) still running. Investigation of the network interfaces showed the backend networks (which provide connectivity to the RAIDs that have the hard drive images of XVM) were fully functional. The frontend network interface was properly configured, but was unable to handle any traffic, including ping. However, tcpdump showed that some network traffic was able to pass, including ARP traffic inbound and outbound. By 5:05 PM, we had determined that the network card believed it had link with ethtool and mii-tool, and asked the NIC to re-negotiate its link with ethtool --renegotiate. However, we were able to receive responses to arping. These symptoms directly match those of a bad switch in other buildings at MIT, such as the Student Center (W20), leading a few maintainers to hypothesize that the issue was a bad switch or bad switch port, while others pointed out that the manufacturers of the switches in question are entirely different. The maintainers tried various debugging techniques with arping and tcpdump to attempt to get the NIC to pass traffic.

At 6:05 PM, we paged the IS&T Network Team, having concluded that there's nothing wrong with babylon-four, and asked them to investigate the issue. Per the Network Team's request, we also submitted a ticket to their queue on help.mit.edu at 6:20 PM.

At 6:35 PM, the entire SIPB network, 18.181/16, became inaccessible. This includes, but is not limited to, Scripts, Linerva, SIPB AFS, the SIPB IPv6 tunnel, XVM, and all user VMs. Mirrors was not affected. The IM client being used by the maintainers to communicate, Barnowl, crashed on numerous machines when the SIPB network became inaccessible. We believe this was caused by a bad interaction involving the Barnowl client being unable to spawn a helper program called zcrypt, which lives in the SIPB AFS space. The maintainers resorted to communicating with an older IM client for the same protocol, zwgc, until 6:48 PM, when the SIPB network once again became accessible. The cause of this 13-minute total outage is currently unknown.

The Scripts maintainers noticed at 6:55 PM that although the individual Scripts web servers were accessible, the scripts.mit.edu address was not. We inspected the primary Scripts Director (load balancer), stanley-kubrick, and found it believed it had the scripts.mit.edu address, and was prepared to handle traffic for it. The secondary Scripts Director (rack-backward) in the cluster agreed this was the state of the cluster, but the tertiary Scripts Director (rack-forward) was not accessible. A maintainer went to reboot the tertiary Scripts Director, but found it fully responsive to the console at 7:05 PM, and it regained network connectivity at 7:06 PM. At 7:11 PM, the Scripts load balancing service was migrated from stanley-kubrick to rack-backward, and then unmigrated, restoring access to Scripts.

We then noticed an email timestamped 6:57 PM from the Network Team with the results of the diagnostic on the switch that babylon-four was connected to: no issues detected. Although access to most SIPB network services was restored at 6:48 PM, and Scripts at 7:11 PM, XVM was still partially inaccessible.

At 7:22 PM, we noticed the Scripts web server cats-whiskers was exhibiting the same symptoms as babylon-four, and performed an experiment of ifdown and then ifup on the network interface. Network connectivity was restored, leading the maintainers to try to repeat the procedure on babylon-four. To simulate a link change, we attempted to force the interface into 100Mbit mode, and then allow it to re-negotiate. When this failed to yield any useful results, we removed the physical network device from the bridge device, and restarted it. After an initial spurt of traffic which seemed to indicate progress, the interface returned to its earlier, non-functional state. The same procedure was then applied, with the additional step of restarting the network card driver, but still yielded no change. We did not want to run ifdown and ifup on the interface, as the VM host had a version of Xen installed that did interface renaming, meaning these tools would be unable to recognize the device.

After trying one last diagnostic requested by the Network Team, we responded to their email at 8:03 PM, with the result that we were unable to ping the local interface that had been created on the switch from the affected host. At 8:44 PM, we received a reply that we should try moving to a different port on the switch. At 8:55 PM, a maintainer started walking towards the W91 data center that houses XVM to perform the port swap. At 9:17 PM, the frontend network on babylon-four was moved to a different switch port and the XVM service became fully accessible.

We are still waiting on additional information from the Network Team about the outage, including detailed diagnostics on the original port babylon-four was connected to, as well as information on the 13-minute total outage.

June 03, 2013

Every year, hundreds of prospective students come to MIT for Campus Preview Weekend, to explore the campus and learn what life is like. Departments, living groups, dorms, and even student groups all put on events to demonstrate all of the awesome things that happen at MIT.

SIPB decided that, in addition to our normal events of the fsck and inode block party, the Everything You Wanted To Know About Computers (But Were Afraid To Ask) panel, and SIPB Machine Room Tours, we were going to put up a Retrocomputing Exhibit. We worked to bring back a bunch of different computers that once powered MIT's Project Athena, including the Blue Toaster (an SGI O2), a Digital Personal Workstation 500au, a VT100 (with a Raspberry Pi inside, to simulate a mainframe) dubbed "Maxberry Pi", and a VAXstation 3100 named "Binkley".

Each of these computers was a challenge to set up, from finding compatible spare parts to get them into working condition, to finding the right versions of old operating systems and software, to making the systems work as originally intended but against modern Athena. Unfortunately, the VAXstation predates the rest of modern Athena by so much that it couldn't query the Athena servers for usernames or the locations of critical system files, because it was speaking an ancient dialect of the DNS protocol. Binkley was running a version of 4.3BSD modified for use at Athena, and queried the Athena servers for information using the Hesiod DNS class, HS.

The HS DNS class is long deprecated, and the Athena DNS servers no longer know how to respond to HS queries. Recompiling Hesiod for the VAXstation would be difficult, because it is statically linked into many binaries, some of which live in AFS rather than on local disk. Instead, what was needed was a translator between HS and standard IN records: Binkley could be re-configured to talk to a new DNS server that performed the translation, forwarding queries to the actual Athena DNS servers.
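To see the class mismatch concretely, something like the following dig invocations would show it; the Hesiod-style name and the name server here are placeholders, not necessarily the exact records Binkley asks for:

$ dig @bitsy.mit.edu -c HS -t TXT someuser.passwd.ns.athena.mit.edu   # what Binkley speaks: class HS, no answer any more
$ dig @bitsy.mit.edu -c IN -t TXT someuser.passwd.ns.athena.mit.edu   # the same data as modern Athena serves it, in class IN

The translator just sits in the middle: it accepts HS-class queries from Binkley, rewrites the class to IN, forwards them to the real Athena DNS servers, and rewrites the class back to HS in the responses.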

A friend, Geoffrey Thomas, jokingly suggested I write the translator in node.js, instead of Python as I originally intended. About 30 minutes later, the translator was completed! With a one-line configuration change to re-point to the new name server, Binkley was able to fully boot for the first time in roughly a decade. Check out the code!

January 15, 2013

Today, at around 2:30 PM EST, Scripts went down. For most MIT students, this means a few class websites are inaccessible, a few friends' blogs are down, and web development is a bit annoying. For me, it means panic mode. I'm a member of the Scripts Maintainer Team, so the availability and security of the service is my responsibility.

The MIT network has been particularly unreliable recently, leading to speculation that there was a DDoS attack mounted by Anonymous. While we're still waiting for an official diagnosis from IS&T, I'm going to go ahead and say this: if Anonymous did take down the MIT network, and is continuing to attack it, please stop. You're just making students' lives worse.

The current pattern of "hacktivism" is supposed to get people's attention, and it is getting attention, just not from anyone who can actually do anything. Scripts is MIT's largest web host, but it's run entirely by student volunteers. Any and all attacks that are supposed to get the attention of the administration are instead being handled by SIPB members.

For those of you that want to do something in Aaron Swartz's memory, please find a more positive and productive task than upsetting MIT students that had nothing to do with the administration's decision.

November 18, 2012

Most of the time when programmers write code, they're writing in userspace. This "normal" code doesn't have to worry about the underlying hardware; it interacts only with the operating system's abstractions.

I'm currently taking 6.828 Operating System Engineering, which teaches how to write a 32-bit multitasking operating system, using the exokernel design, for the Intel x86 platform. This means dealing with all sorts of low-level x86 things, like booting, interrupts, faults, protection modes, rings, virtual memory management, IO; and also dealing with some less x86-specific things, like implementing a kernel monitor/debugger, syscalls, and a userspace library.

The kernel—called JOS—has a lot of limitations on memory. First of all, it's only 32-bit, so it can only address 4 gigabytes of RAM. Second, the kernel maps all of physical memory at KERNBASE, the location at which the kernel itself is loaded, currently 0xF0000000, meaning that JOS can only utilize 256 megabytes of RAM. To make matters worse, JOS is actually incapable of detecting more than 64 megabytes of RAM because it queries non-volatile memory on the MC146818, which only returns a 16-bit value representing the kilobytes of RAM.

Changing KERNBASE to 0x80000000 would allow JOS to utilize up to 2 gigabytes of memory, but it reduces the address space available for userspace mappings. Even then, it still doesn't get us the 4 gigabytes that x86 promises us.
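A quick bash sanity check on that arithmetic (the constants are the ones from the last two paragraphs):

$ echo $(( ((1 << 32) - 0xF0000000) / 1024 / 1024 ))   # 256  MB of physical memory mappable above KERNBASE=0xF0000000
$ echo $(( ((1 << 32) - 0x80000000) / 1024 / 1024 ))   # 2048 MB mappable above KERNBASE=0x80000000
$ echo $(( ((1 << 16) * 1024) / 1024 / 1024 ))         # 64   MB, the most a 16-bit count of kilobytes can describe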

No, to make JOS more memory aware, there's only one thing to do, short of redesigning how it deals with memory: upgrade to 64-bit. The 64-bit x86, known variously as x86-64, AMD64, EM64T, IA-32e, and Intel 64, makes the whole virtual memory business even more complicated. Not only are there page tables and page directories, there's also the page-directory-pointer table and the page map level 4. Jumping straight from 32-bit to 64-bit is a lot of work even before counting the massive changes to the page-management portion of the kernel. Luckily, Physical Address Extension (PAE) looks a lot like 64-bit paging. In fact, 64-bit paging is a strict superset of PAE, adding the page map level 4 and more entries to the page-directory-pointer table.

Implementing PAE looks fairly straightforward:

Set CR4.PAE = 1

Load a page-directory-pointer table (PDPT) into CR3 instead of a page directory

Change the page table entries to be 64 bits wide, rather than 32 bits.

Add in a little bit more code to support the 3-level hierarchy, and we're all done!

Yet, when I made these changes, the first thing that happened was a scary error message from gcc: error: initializer element is not computable at load time. It turns out that the IA-32 (x86) ELF ABI has no way to represent a 64-bit relocation, so changing the initial page table and page directory arrays from 32-bit to 64-bit entries broke the build. I then decided to keep the initial page tables and directory 32-bit, but interleave 32-bit zeroes with the relevant entries. This compiled, and the resulting code promptly crashed with a triple fault.

JOS was being tested by running it in qemu, so I promptly attached a debugger and single-stepped the boot sequence; of course, the crash came right after loading CR3 with the page-directory-pointer table. I then discovered that the version of qemu the course staff had provided had an info pg command, which showed what the software MMU thought the page tables looked like. Entering the virtual machine monitor and typing the command yielded... nothing. The MMU thought there were no page tables, despite my having loaded them correctly. My clever 32-bit page table entry hack from before, inserting the zeroes, was actually the mistake: I had forgotten that x86 is little-endian, and had inserted the zero entries on the wrong side of the actual entries! With that small correction, the kernel was able to boot and switch to its own, managed, PAE-enabled page table structure.

The story doesn't end here—JOS exposes the page table mappings to userspace by mapping the page directory into itself, a holographic mapping. Now, there are four page directories to map, which adds a bit of complexity. Every userspace program that previously assumed there were only 1024 entries would crash, as there were now 2048. Mapping all of the page directories and updating userspace to be aware that the entries are now 64-bit were just part of the changes needed to fix the code that made 32-bit assumptions.

Now, with a fully-working PAE JOS and userspace, there was much rejoicing, and code pushing.

Except that PAE JOS crashed when booted on VMware, VirtualBox, or bochs. Two hours of debugging later, I noticed I had accidentally set one of the reserved bits in the page-directory-pointer table, which according to the Intel Architecture Manual should result in a general-protection fault. A bug in JOS led to finding a bug in qemu! It's one of the few times that a failure to crash is itself a bug.

It goes without saying, when writing your own operating system, you'll run into strange bugs. Adding PAE support to JOS already brought out some of the stranger bugs I'd seen, taking almost 18 hours of debugging and implementation time. Who knows what other strange problems I'll run into as I try to widen the bus to 64 bits?

June 10, 2012

On Tuesday, June 26th, 2012, all of the scripts.mit.edu servers will be upgraded from Fedora 15 to Fedora 17, which was released on May 29. We strongly encourage you to test your website as soon as possible, and to contact us at scripts@mit.edu or come to our office in W20-557 if you experience any problems. The easiest way to test your site is to run the following commands at an Athena workstation and then visit your website in the browser that opens, but see this page for more details and important information:

November 06, 2011

On Monday, November 21, 2011, all of the scripts.mit.edu servers will be upgraded from Fedora 13 to Fedora 15, which was released on May 25. We strongly encourage you to test your website as soon as possible, and to contact us at scripts@mit.edu or come to our office in W20-557 if you experience any problems. The easiest way to test your site is to run the following commands at an Athena workstation and then visit your website in the browser that opens, but see this page for more details and important information:

June 09, 2011

Our whitelist of file types served directly to the web has for a long time included .doc, .xls, and .ppt. With the advent of new XML-based Microsoft Office formats, and with the popularity of LibreOffice and OpenOffice.org, there have been requests for whitelisting these additional file types. As of yesterday, the new Office XML filetypes — .docx, .xlsx, .pptx, etc. — as well as ODF file types — .odt, .ods, .odp, etc. — will also be served directly to the web.

In addition to files you place in your locker, this also affects files uploaded to your website via the standard upload feature of apps such as MediaWiki and WordPress.

September 18, 2010

On Sunday, September 26, 2010, all of the scripts.mit.edu servers will be upgraded from Fedora 11 to Fedora 13, which was released on May 25. We strongly encourage you to test your website as soon as possible, and to contact us at scripts@mit.edu or come to our office in W20-557 if you experience any problems. The easiest way to test your site is to run the following commands at an Athena workstation and then visit your website in the browser that opens, but see this page for more details and important information:

August 16, 2010

I wrote another post last week for the Ksplice blog: Strace — The Sysadmin’s Microscope. If you’re running a Linux system, or just writing or maintaining a complex program, sometimes strace is indispensable — it’s the tool that tells you what a program is really doing. My post explains why strace is so good at showing the interesting events in a program (hint: it gets to sit between the program and everything else in the universe), describes some of its key options, and shows a few ways you can use it to solve problems.

Unfortunately there’s only so much you can say in a blog post of reasonable length, so I had to cut some of my favorite uses down to bullet points. Here’s one such use, which I can’t bear to keep off of the Web, just because I thought I was so clever when I came up with it in real life a couple of months ago.

(If you haven’t already, I encourage you to go read the main post first. I’ll be here when you come back.)

Strace As A Progress Bar

Sometimes you start a command, and it turns out to take forever. It’s been three hours, and you don’t know if it’s going to be another three hours, or ten minutes, or a day.

This is what progress bars were invented for. But you didn’t know this command was going to need a progress bar when you started it.

Strace to the rescue. What work is your program doing? If it’s touching anything in the filesystem while it works, or anything on the network, then strace will tell you exactly what it’s up to. And in a lot of cases, you can deduce how far into its job it’s gotten.

For example, suppose our program is walking a big directory tree and doing something slow. Let’s simulate that with a synthetic directory tree and a find that just sleeps for each directory:
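(a sketch of that setup; the mktemp template and the one-second sleep are arbitrary choices)

$ mkdir bigtree && cd bigtree
$ for i in $(seq 1000); do mktemp -d tmp.XXXXXXXXXX > /dev/null; done
$ find . -type d -exec sleep 1 \; &
$ strace -p $! -e trace=file

Each directory's name scrolls by in the file-related syscalls as find works its way into it.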

The find just looked at tmp.HvbzfbbWSa, and now it’s going into tmp.MiHDWiBURu. How far is that into the total? ls will tell us the list of directories that the find is working from; we just have to tell it to give them to us in the raw, unsorted order that the parent directory lists them in, with the -U flag. And then grep -n will tell us where in that list the entry tmp.HvbzfbbWSa appears:
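(a sketch of that invocation, run from inside the big directory; the 258 is the figure the next paragraph works from)

$ ls -U | grep -n tmp.HvbzfbbWSa
258:tmp.HvbzfbbWSa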

So tmp.HvbzfbbWSa is entry 258 out of 1000 entries in this directory — we’re 25.8% of the way there. If it’s been four minutes so far, then we should expect about twelve more minutes to go.

(But With The Benefit Of Foresight…)

I’d be remiss if I taught you this hackish approach without mentioning that if you realize you want a progress bar before you start the command, you can do it much better — after all, the ‘progress bar’ above doesn’t even have a bar, except in your head.

Check out pv, the pipe viewer. In my little example, you’d have the command itself print out where it is, like so:
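(a sketch, reusing the sleeping-find example from above; the pv flags are explained below)

$ find . -mindepth 1 -maxdepth 1 -type d -exec sleep 1 \; -print | pv --line-mode --size 1000 > /dev/null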

Here we’ve passed --line-mode to make pv count lines instead of its default of bytes, and --size with an argument to tell it how many lines to expect in total. Even if you can’t estimate the size, pv will cheerfully tell you how far you’ve gone, how long it’s been, and how fast it’s moving right now, which can still be handy. pv is a pretty versatile tool in its own right — explaining all the ways to use it could be another whole blog post. But the pv man page is a good start.

That’s Just One

There’s lots of other ways to use strace — starting with the two I described in my main post, and the three more, besides this one, that I only mentioned there. I don’t really know anymore how I used to manage without it.

That kills the query. But notice what the mysql program is telling you after you hit that C-c: it’s sending a separate command, namely the “query” “KILL QUERY 183”, to the server.

In fact, that KILL query is the only way to get the MySQL server to stop running a query. In particular, the MySQL server is really bad at noticing when a client goes away. Suppose instead of hitting C-c, which the mysql program traps and handles in a smart way, I simply kill the program by hitting C-\:

mysql> SELECT COUNT(DISTINCT status) FROM artifacts;
^\Aborted

Then in fact the server keeps running the query. If I fire up the MySQL client anew and issue the query SHOW PROCESSLIST, I can see the query still chugging away:

Now, why would you or I care? After all, nobody in their right mind goes about hitting control-backslash or employing equally messy means to kill their MySQL clients. And control-C behaves just as you’d hope — so long as you are using the mysql command-line client.

Where the story isn’t so good is on a typical other client program. The KILL behavior on control-C is a feature of the mysql program, not of the MySQL C API. (If you think about it, it involves installing a signal handler — not something a well-behaved library will just do.) And because it’s not a feature of the MySQL C API, it’s probably not a feature of your favorite language’s MySQL bindings, which wrap that API. In particular, I know it’s not a feature of MySQLdb, the leading Python bindings.

So suppose you write a Python script to do some MySQL queries… and you have a big honking table in your database, and you write an inefficient query… and the query planner resorts to copying most of the table to a temporary table… and after a couple of hours you kill the Python script with control-C or kill or some other means because it’s taking forever. The query will keep running. And the next day maybe it’s copied enough that it fills up your disk, and the database has an outage.

I wish that were a hypothetical. Fortunately, the MySQL server will then remove the temporary table and the disk will have space again. If you’re lucky, the server will even come back up.

Lesson: when you want to kill a MySQL query, make sure it dies. Use SHOW PROCESSLIST to check and KILL QUERY to kill.
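In shell form, that check-and-kill is just two commands (183 is the id from the example above; yours will differ, and you may need the usual -u/-p/-h options for your setup):

$ mysql -e 'SHOW PROCESSLIST'    # find the Id of the runaway query
$ mysql -e 'KILL QUERY 183'      # stop the query without killing the whole connection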

June 13, 2010

I upgraded my laptop to Snow Leopard yesterday, and one thing I’m still reeling from is the changes to Kerberos. I’m not usually one to fault developers for wanting to move forward at the cost of compatibility, especially for rarely used features, so when I found that Apple made substantial changes to the user-facing side of Kerberos, I started updating my scripts and configuration to catch up.

I still have a few more bugs to track down, so if you know how to solve either of these, let me know:

Automatically renewing tickets.

/System/Library/LaunchAgents/com.apple.Kerberos.renew.plist appears to be a launchd job to do this, but I can’t figure out what’s supposed to trigger it.

Triggering code on ticket acquisition/renewal.

On older versions of OS X, you could set the libdefaults.login_logout_notification option in /Library/Preferences/edu.mit.Kerberos and cause Kerberos to call into a bundle in /Library/Kerberos Plug-Ins, but that doesn’t seem to work on Snow Leopard – the Login and Logout Notification API appears to be gone.

In the meantime, after two hours of source diving, I have solved one of my major bugs with Kerberos: de-stickifying options passed to kinit.

kinit seems to have gotten a lot of TLC in Snow Leopard—it appears to have been substantially re-written to take advantage of the Kerberos Identity Management API, which looks to be an attempt to genericize everything that made Kerberos on OS X special (multiple credential caches, system Keychain integration, etc.).

Unfortunately, one of the features it gained through this API was the ability to remember Kerberos ticket settings. In Leopard, if you changed ticket parameters in the “New Tickets” window of Kerberos.app (such as the duration of the tickets or the flags on them), those changes would be remembered the next time you used Kerberos.app to get tickets. But if you changed ticket parameters by passing flags to kinit, they would only work for one invocation.

But now, if I pass a flag like -l1m (get tickets that last one minute), that sticks across kinit invocations:
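(roughly; the principal and realm here are placeholders, and the point is the missing -l on the second kinit)

$ kinit -l1m someuser@ATHENA.MIT.EDU    # explicitly ask for one-minute tickets
$ kdestroy
$ kinit someuser@ATHENA.MIT.EDU         # no lifetime flag this time...
$ klist                                 # ...yet the new tickets still expire one minute after they were issued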

I found this undesirable, because when I pass flags to kinit, I want them to be for that invocation only, and any other time, I want my “defaults” – i.e. tickets that last as long as possible.

It turns out that the KIM API implementation on OS X is backed by the standard OS X preferences API, and writes its settings to ~/Library/Preferences/edu.mit.Kerberos.IdentityManagement.plist. In particular, it checks the RememberCredentialAttributes key to determine whether or not to store the preferences that are passed in. So just run
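(presumably, given the domain and key it checks, something along the lines of)

$ defaults write edu.mit.Kerberos.IdentityManagement RememberCredentialAttributes -bool false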

to disable this feature. (Replace false with true if you want to undo the change.)

If you want to follow the maze of twisty passages yourself, you can grab KerberosLibraries-81.46.1.tar.gz from Apple Open Source. Here’s the relevant call chain (except for kinit, which is in KerberosClients/kinit/Sources, all paths are relative to KerberosFramework/Kerberos5/Sources):

Around here the linearity of the call chain starts to break down, but kim_os_preferences_get_boolean_for_key gets called with kim_preference_key_remember_options, which gets passed to kim_os_preferences_cfstring_for_key inside kim_os_preferences_copy_value, returning CFSTR ("RememberCredentialAttributes").

Eventually CFPreferencesCopyValue gets called asking for the “RememberCredentialAttributes” key in “edu.mit.Kerberos.IdentityManagement”, checking current-host/current-user, any-host/current-user, current-host/any-user, and any-host/any-user configuration settings, in that order.

March 25, 2010

For today’s post, I thought I’d survey what software projects related to virtualization name themselves.

It turns out that most projects have cute, but relatively unoriginal, names, formed by finding a word with “vert” and changing it to “virt”.

To collect today’s list, I started with sed -ne 's/vert/virt/p' /usr/share/dict/words. That gives me 433 words.

To pare down the list a bit, I used Google’s web search API to ignore words that had no Google results. That got the list down to 125 words.

Finally, I filtered the list down to things that I actually thought were reasonable project names (for instance, “advirt” might be a reasonable project; “incontrovirtible” and “ovirtalkative”, not so much). This admittedly subjective sampling got me down to 27 words that seemed worth exploring further.

And so, without further ado, here is a sample of names for today’s vertvirtualization projects:

ConVirt: ConVirt has seen a lot of evolution. It started out as a Linux desktop graphical Xen management application, a la virt-manager, originally called XenMan (you can still see some evidence of this at their old website). Since then, it’s been renamed to ConVirt, enhanced to support KVM (using libvirt), and moved into a rich web application. It’s still available under the GPL, but Convirture, the company supporting ConVirt development, is selling some advanced features along with their paid support.

Divirt: Divirt is a project to create virtual networks, allowing geographically disparate virtual machines to act as if they’re all connected by a LAN. Doesn’t really look like it ever got off the ground, though.

ExtraVirt: ExtraVirt was a project from UMich to detect processor errors by running the same system synchronously in multiple VMs, regulating non-deterministic inputs, and comparing the output. All I can find is a single two-page brief on the project.

IntroVirt: Another project from the same group at UMich, IntroVirt was a system for both active intrusion detection and post-facto intrusion analysis. It used a bunch of tests from the host to monitor suspicious activity.

Invirt: The super awesome project from the MIT SIPB. Invirt is a full-stack, multi-host, Xen-based management platform targeted at semi-public deployments with a web-based control interface. It has per-machine access control and quotas, and supports creation, deletion, installation, and general administration through the web interface, including an autoinstaller for Debian and Ubuntu. The primary deployment of Invirt, the XVM service, provides free VMs for the MIT community. XVM is currently running 246 separate VMs on 4 physical servers.

oVirt: Similar to ConVirt (at least in its current form), oVirt is a Red Hat-sponsored web-based virtualization management platform. In contrast to something like Invirt, which is designed for building a “public cloud”, oVirt is designed for building more of a “private cloud”, where all of the VMs are managed by the same person. oVirt was one of the first projects to heavily utilize libvirt, and both projects come from the same group at Red Hat.

ReVirt: From the group that brought you ExtraVirt and IntroVirt, ReVirt is a trustworthy execution logger. It logs a VM’s execution for later replay. Like IntroVirt, it seems to be primarily designed for intrusion detection and analysis.

SubVirt: Yet another project from UMich, SubVirt was somewhat groundbreaking research into malicious virtualization technology. The SubVirt project developed proof-of-concept “VMBRs” (virtual-machine based rootkits), which installed themselves as a hypervisor on machines, transparently turning the OS previously running on bare metal into a virtual machine.

Xilinx Virtex: While not virtualizing the same layer as the other projects and products here, FPGAs are basically virtualization for silicon, letting you literally create new digital logic on the fly, and Xilinx’s Virtex series of FPGAs is the top of the line.

Virtigo: I’m sort of cheating here, because Google won’t help you find this one. When I left my internship at Google, I decided to pull the virtualization testing framework I was working on out from the larger body of work it was originally included in, and Virtigo is what I decided to call it. If I ever have the time to pull the project back together, it’ll live at virtigo.org.

In addition to those names, though, there are actually some names that haven’t yet been taken:

advirt

ambivirt

antevirt

chetvirt

controvirt

covirt

culvirt

discovirt

evirt

obvirt

pervirt

povirty

retrovirt

virtebra

virtical

So – what are you going to name your next virtualization-related project or product?

March 21, 2010

Magic SysRqs

One of the most powerful ways to debug or recover Linux is through the Magic SysRq keys.

Normally, if you’re using a serial console, you send BREAK, followed by the command you want. However, this doesn’t work with the Xen paravirtualized serial console driver. Instead, you have to use Ctrl+o, followed by the command you want.

For instance, to do an emergency sync, press Ctrl+o, then s.
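If the key sequence appears to do nothing, it is worth checking first that SysRq is enabled in the guest at all; a minimal check and fix, run as root:

$ cat /proc/sys/kernel/sysrq      # 0 means SysRq is disabled
$ echo 1 > /proc/sys/kernel/sysrq # enable it; then Ctrl+o s syncs, and Ctrl+o h prints the list of commands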

Note that you use Ctrl+o instead of BREAK for both dom0 and domU serial consoles – they both are using the Xen paravirtualized driver.

March 15, 2010

This week I wrote a post on the Ksplice blog, our first substantive post, following an intro post by Waseem. As I mentioned last month, we swelled from 8 to 20 people this January with interns, and were triumphant in making the whole scheme work productively. If you want to know how we did it, read the post. In fact, just go read it. I’ll wait.

The crackerjack Ksplice PR team (*) got my post to show prominently all day Wednesday on Reddit and Hacker News, and then it went up on Slashdot all Wednesday evening and Thursday during the day. Traffic numbers were much, much higher than for anything else I’ve ever written, except YouTomb.

Naturally, we learned some things about interacting with your average comment-leaving reader on the Internet. The first wave of comments, a few both on the link aggregators and on the post itself, were vicious denunciations of us for the (apparently) illegal practice of employing unpaid interns to do real work. These commenters were of course wrong—you can’t get any intern in software for free, let alone the kind of people we wanted, and we paid as much as or more than they could make with their skills in research jobs on campus. I and others replied to clarify that, and the comments shifted to mostly positive. Then, when we landed on Slashdot, the submission text was a classic opposite-of-the-article Slashdot item: it said we had claimed to “bust” Fred Brooks’ pioneering observations on software project management. Dozens of commenters poured in to grouch that we hadn’t disproved his law, only sidestepped it—which was of course our point.

Fortunately, not all commenters are just being wrong. We had several good comments, but this afternoon came one last comment from a source far beyond any response I had imagined. I feel a twinge of regret now for comparing the OS/360 project to Windows Vista, apt though it was. Prof. Brooks, of course, did far better than the Vista managers in the end, in that he learned lessons from the experience and put them in a book that the whole profession learned from.

How we’re going to top that comment in our next post, I don’t know—it might be tough, for example, to get a comment from a man who hasn’t used email since before blogging was invented.

(*) Namely, us and our friends on zephyr/twitter lending a few upvotes to our posts. Several others at Ksplice made substantial comments and edits before the post was published, too, which greatly improved it.

[Update, 2010-03-18: there is now a straight-up newspaper-style article about... the comment threads on my post. The Internet never ceases to amaze me.]

March 09, 2010

I live in a really awesome apartment. I’m living with really awesome people. And we tend to err on the side of awesome when it comes to buying stuff for the apartment. Specifically, we’re big fans of communalism – we do communal groceries, communal furniture, whatever.

But when all four of us are paying for stuff for all four of us, keeping track of money gets a little tricky. The traditional solution is for everybody to stuff their receipts into a drawer, and every month you all sit down and slowly work your way through the receipts.

“Each of us owes you $50 for your grocery run, and you owe me $20 for my grocery run.”

It’s a painful process, and the difficulty scales super-linearly as you add more people. Even with four, it would be pretty bad.

Now, it turns out that programmers hate tedious tasks like that, so there’s a long history amongst my friends of programmatically solving this in various ways. When we moved into this apartment, I figured I’d try my hand at it, and BlueChips is the result.

Since we set it up, BlueChips has been used by us and by other roommate setups to manage their expenses. We use it for tracking everything – rent, utilities, groceries, furniture, when all of us go out for dinner…

BlueChips has a very simple data model. There are users. A user can move money in two ways: expenditures and transfers. In an expenditure, one person spends money on behalf of a bunch of people. As a result of the expenditure, each of those people owes the spender some amount of money. BlueChips lets different people owe different amounts as the result of a single expenditure. For example, when we pay rent, each of us pays a different percentage, and BlueChips can follow that.

BlueChips’ biggest feature, though, is its ability to calculate the transfers necessary to settle the books. When it makes this calculation it also does something we call “pushing transfers”. Let’s say Larry owes Moe $1, and Moe owes Curly $1. BlueChips can “push” the $1 through, and will tell you that, to settle the books, Larry should give Curly $1.

If you’re still confused, or just want to see what the app looks like, I have a demo instance running at http://demo.bluechi.ps.

The software’s been around for a year, and it’s been open-source for most of that time, but I’ve never quite gotten around to putting a finishing coat of polish on it and getting it into a form that other people can use.

When I lived with Scott Torborg and some other friends over the summer, we used BlueChips again for handling finances. Scott decided to put some of that polishing effort into BlueChips, and I have him to thank for all of the styling, excellent test coverage, and the iPhone interface, along with innumerable other tweaks.

I finally coded up the last big feature that BlueChips was missing: the ability to add new users without directly interacting with the database.

And so today I’m pleased to announce that I’ve tagged and released a version 1.0.0 of BlueChips.

BUT WAIT! THERE’S MORE!

For those of you MIT folks, I’ve worked with the scripts.mit.edu team to provide a Scripts autoinstaller. To install BlueChips, you can run the following commands from any Athena workstation:

dr-wily:~ broder$ add blue-sun
dr-wily:~ broder$ scripts-bluechips

Please remember that this is not a Scripts-managed autoinstaller. If you run into any problems, like it says, please let me know at bluechips@mit.edu.

And if you find BlueChips to be missing a feature you want, please feel free to write it yourself! In general, I don’t expect to have a lot of time going forward for new feature development, but I’m more than willing to review contributions from others. It’s my hope that the community can pick up my slack and keep BlueChips moving forward.

March 01, 2010

As part of my involvement with SIPB, one of the biggest problems we run into is getting people started. As much as we emphatically insist that you don’t need to know anything about computers coming in (just be interested in them), it’s hard to implement that in practice.

One area that I think we do a particularly bad job of spinning people up on is how to use Unix-like environments. We’re a very Linux-heavy organization, and without some amount of *nix (and, in particular, *nix command line) comfort, it’s hard to figure out where to start.

When I’ve tried to teach people this sort of thing in the past, one thing I’ve always found is that you can’t use a system you don’t understand. You might be able to apply formulas to it (i.e. you might know “ls” or “blanche“), but without understanding the system, you can’t do things like building awesome complicated pipelines of 12 different commands, or whatever. So in the last 6 months or so, whenever I’m trying to teach somebody something, I take the time to teach it to them from the ground up. But I still didn’t have a good answer for teaching Unix.

I realized last night that I really learned how to think about Unix in 6.033, when we read the Unix Paper. In particular, sections 3, 5, and 6 are a pretty concise explanation of open, read, write, pipe, fork, exec, wait, exit, not to mention how input/output redirection, file descriptors, and shell fork+exec loops work.

And, modulo some slight naming changes, all of the information still applies to modern Linux. Not bad for a paper that’s 36 years old!

In any case, I’ve decided that pointing people at those three sections of that paper is my new answer for how to go from a formulaic understanding of Unix to actually being able to work with it. But it’s still only a start.

When did Unix click for you, and what actually did it? How do you help it click for other people? What other good beginner material is there, not just for Unix but other technical topics as well?

Every time I look for the definition of Vcs-Browser, Vcs-bzr, Vcs-cvs, Vcs-git, Vcs-hg, and Vcs-svn, it takes me forever to track it down.

In particular, searching for debian vcs-svn doesn’t actually find what I’m looking for. It took me running through a series of blog posts linking to forum posts linking to online list archives linking to Debian bugs to finally get the hint.

February 28, 2010

After Kevin’s post on commenting, I realized that I tend to be really bad about following through with blog comment conversations.

Kevin pointed out that he’s more likely to take the discussion to zephyr, the mostly-MIT-internal chat server. In fact, Nelson started the Iron Blogger event as a way to combat the fact that we tend to have all our interesting discussions on zephyr, instead of with the rest of the world. So blogging openly but replacing “commenting” with zephyr really defeats a lot of the point.

I know that, for me, the biggest reason I like having discussions on zephyr is that it’s easy. I don’t have to go seek out replies to my commentary – they show up automatically.

On the other hand, I read blogs through an RSS reader. I don’t tend to visit sites directly. And certainly I don’t go back through a blog’s history looking for replies to my replies. This means that it’s far too easy to make a comment and never look at the comment site again.

To try and combat this, at least for my blog, I’ve installed the “Subscribe to Comments” plugin. It was really easy – the plugin automatically adds the subscription checkbox to the comments form, although I decided to move it above the comment textarea.

I’d encourage the rest of you to do the same – let’s bring the discussion, as well as the blogs, out of the MIT bubble.

February 27, 2010

for oo in $(cd .git/objects/ && ls ??/*); do
o=${oo%/*}${oo#*/}
# do something horrible with the Git object $o, which is in the file $oo
done

It doesn’t matter now exactly what the code was for. But a collaborator wrote back to me:

> > o=${oo%/*}${oo#*/}
> How does this line work/what is it supposed to accomplish? In
> particular not sure what the %foo and #foo do.

Stop for a moment: do you know how that line works? I wouldn’t have in my first years writing shell scripts.

This line demonstrates one of a repertoire of tricks I’ve picked up to get some things done in bash that might otherwise require invoking a separate program. None of these will be news to shell-programming experts, but I sure didn’t know all of them when I started writing in shell. Here’s a little braindump on one of my favorite tricks, and where to read about more.

The best documentation for Bash is the info page—the specific pages I find myself referring to most often are under “info bash” -> “Basic Shell Features” -> “Shell Expansions”. (If you’ve never tried it, you’ve been missing out! Type “info bash” at your favorite prompt. But not on a Debian or Ubuntu machine, where the info page is missing due to a stupid licensing dispute. Info is the home of the best documentation available for Bash, GCC, GDB, Emacs, miscellaneous GNU utilities, and Info itself.)

This feature is under “Shell Parameter Expansion” there.

`${PARAMETER#WORD}'
`${PARAMETER##WORD}'
The WORD is expanded to produce a pattern just as in filename
expansion (*note Filename Expansion::). If the pattern matches
the beginning of the expanded value of PARAMETER, then the result
of the expansion is the expanded value of PARAMETER with the
shortest matching pattern (the `#' case) or the longest matching
pattern (the `##' case) deleted.

The % and %% features work similarly, with “beginning” substituted with “end”.

My mnemonic for # versus % is that $ is for the variable; # is to the left of $, so it strips from the left, and % is to the right, so it strips from the right. I suspect this is the actual motivation for the choice of # and %, though I’m curious to see evidence to confirm or refute that thought.

So after my line o=${oo%/*}${oo#*/}, o consists of the part of oo to the left of the last slash, and then the part of oo to the right of the first slash. Since there should be just one slash in oo, it has the effect of making o be everything but the slash.
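A quick demonstration with a made-up object name (two hex characters of directory, the rest the file name, as in .git/objects):

$ oo=ab/cdef0123456789
$ echo "${oo%/*}"   # strip the shortest suffix matching /* : ab
$ echo "${oo#*/}"   # strip the shortest prefix matching */ : cdef0123456789
$ o=${oo%/*}${oo#*/}; echo "$o"
abcdef0123456789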

That makes one trick I use all the time. There’s plenty more, and those Info pages explain many of them. I’m not sure all these tricks are a good thing on balance—they serve as a crutch to make the shell go further, when maybe I should just be quicker to switch to a real programming language. But they sure come in handy.

February 22, 2010

If you’ve tried to use our recommended configuration for authenticating users using MIT certificates, you’ve probably discovered that Safari users are not offered the opportunity to select a certificate. This is due to a bug in Safari’s SSL implementation where it will never present a certificate unless the server requires that it present one (we do not require that a certificate be presented, so that we can show a page saying “you need certificates”).

Starting today, we’ve added some additional code that will force Safari to show the certificate selection dialog. If you are using the recommended configuration for certificate authentication, this will take effect for your site automatically. (Specifically, what we now do is force an SSL renegotiation if we detect the Safari browser.)

If you are using any other configuration than our recommended configuration, the behavior should not change.

February 21, 2010

Paravirtualized Clocks

In theory, Xen dom0s are supposed to forcibly sync the domUs’ system clocks to their own. In practice, due to some incompatibility in Ubuntu’s dom0 or domU patches, that doesn’t work even though the feature is enabled, which leads to clock drift and occasionally to weird clock lockup bugs.

The easiest way to fix this is to disable the Xen clock syncing entirely, and rely on the standard Linux clock mechanism. You can do that by adding these two lines just before exit 0 in /etc/rc.local:
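(a plausible pair, assuming the independent_wallclock sysctl that Xen-patched kernels of that era expose; substitute your own NTP server)

echo 1 > /proc/sys/xen/independent_wallclock   # keep our own clock rather than tracking Xen's wallclock
ntpdate -b pool.ntp.org                        # then step the clock once at boot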

February 15, 2010

If you work on computing in school, on the side, or in industry and you’ve been paying attention to the people around you, you’ve probably wondered why so many fewer women than men enter our field and stay in it.

This is no immutable law. In fact, the proportion of women in computer science in the United States was once much higher. Of people receiving bachelor’s degrees in computer science, women made up nearly 40% in the mid-1980s, declining to 20% in 2006. (graphs, NSF data.) And it varies among cultures, too—in Malaysia, women actually outnumber men in computer science. (data, analysis)

So the natural way to ask the question is in this form: What are we doing in computer science that causes so many fewer women than men to enter our field and to stay in it? And what can we do differently to change that?

Recently I picked up a book on this subject. Unlocking the Clubhouse is the product of a collaboration between Jane Margolis, a social scientist studying gender and education, and Allan Fisher, the founding dean of the undergraduate program in computer science at Carnegie Mellon University.

The authors gathered scores of previous studies, and they did their own work from a privileged position at the helm of CMU’s undergraduate program. Their success at answering these questions may be indicated by the reversal of national trends they achieved there in the five years of their research:

Before the authors’ work, the proportion of women among entering freshmen ranged from 5% to 7% over the five years 1991-1995. At the conclusion of their project in 2000, this proportion had reached 42%.

Of students entering the program at the start of the project in 1995, only 42% of women remained after two years. This rate rose to 80% for women entering in 1996, and stabilized at nearly 90%. The rate among men was steady around 90%.

With that kind of success in practice, it’s clear their scientific findings and their recommendations have earned serious consideration. In a future post I’ll say more about those, and I’ll also look at what some other people have found on the subject. Ironically, it turns out one result of Margolis and Fisher’s success may have been to invalidate some of their findings in the new environment they created.

SIPB has two priorities: people and projects. Each active project has its own organizers, maintainers, and/or developers who move it forward and make its decisions, so the role of the chair and vice-chair is about keeping track of how things go, helping connect the project to outside resources and connect new contributors to the project, mediating shared resources like the machine room, and making sure that key projects get passed on from year to year.

It makes sense, then, that we spent most of our time talking about people—bringing people in the door at SIPB, making the office a welcoming place for them, drawing them into our community, and electing them as members. We hear in almost every membership election about how the organization could do better at this. Here’s a quick version of why it’s so important:

Every year, about 1/3 of student SIPB members graduate.

Put another way, in steady state:

Size of SIPB = 3 * (# new members / year)

For example, right now SIPB has 26 student members, and by my count 9 are planning to leave MIT in June. So the only way SIPB can stay as strong as it is is to get 9 new members this year, and about as many again the next year, and the next year, and so on. Fewer new members ⇒ fewer members ⇒ fewer awesome projects, fewer people to learn from, fewer people to hire away to Ksplice (ahem, maybe not everyone shares that motivation).

From those numbers in the last five years, it’s not hard to see how we got the organization to the point where three strong candidates stood at the last election for chair, and where the office is full to crowding at nearly every Monday’s meeting. It’s also clear how it wasn’t always this way—the numbers from the 2004 and 2003 academic years led directly to the election of 2005, in which the nine-member EC comprised every student member of the SIPB.

But my favorite aspect of these numbers is in the column on the right. When I was the chair in 2008-9, I put an emphasis on getting people involved in SIPB in their first and second years. I’ve heard a lot of people’s stories over the years of showing up at SIPB as a freshman or sophomore, going away for a variety of reasons, and finally coming back two or three or more years later and becoming members. Some of them went on to become highly active and valued contributors, and it’s too bad for everyone that we didn’t succeed in bringing them in the first time around. With the record 6 freshman and sophomore new members in the 2008 academic year, I think we succeeded in turning a lot of those stories around into members who will be active students for a long time. Edward and Evan have gotten this 2009 year to outpace 2008 so far, so the new team of Jess and Greg have the chance to finish it at another record. 2010 will be theirs to create, and I wish them the best of luck in outdoing 2008 and 2009 both.