About 15 years ago, I was hanging out at the MIT AI Lab, and there was an
ongoing seminar on the Coq proof assistant. The idea was that you
wouldn't have to guess whether your programs were correct; you could
prove that they worked correctly.

There were just two little problems:

It looked ridiculously intimidating.

Rumor said that it took a grad student all summer to implement and prove
the greatest common divisor algorithm, which sounded rather
impractical.

So I decided to stick to Lispy languages, which is what I was officially
supposed to be hacking on, anyway, and I never did try to sit in on the
seminar.

Taking another look

I should have taken a look much sooner. This stuff provides even more
twisted fun than Haskell! Also, projects like the
CompCert C compiler are impressive: Imagine a C compiler where
every optimization has been proven correct.

Even better, we can write code in Coq, prove it correct, then export it to
Haskell or several other functional languages.
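To give a flavor of what that export step looks like, here's a minimal sketch of Coq's extraction mechanism (the function double is a made-up example, not something from this article; on older Coq versions the Require line is unnecessary):

```coq
(* The prove-then-export workflow in miniature: define something... *)
Definition double (n : nat) : nat := n + n.

(* ...then ask Coq to emit a Haskell version of it. *)
Require Import Extraction.
Extraction Language Haskell.
Extraction "Double.hs" double.
```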

Here's an example Coq proof. Let's start with a basic theorem that says
"If we know A is true, and we know B is true, then we know A /\ B
(both A and B) is true."

Theorem basic_conj : forall (A B : Prop), A -> B -> A /\ B.
Proof.
  (* Give names to our inputs. *)
  intros A B H_A_True H_B_True.
  (* Specify that we want to prove each half of /\ separately. *)
  split.
  - apply H_A_True. (* Prove the left half. *)
  - apply H_B_True. (* Prove the right half. *)
Qed.

But Coq proofs are intended to be read interactively, using a tool like
CoqIDE or Emacs Proof General. Let me walk you through how this
proof would really look.

Proof.

At this point, the right-hand pane will show the theorem that we're trying
to prove:
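Something like this (the exact formatting varies between Coq versions, but the goal is the full theorem statement):

```
1 subgoal

______________________________________(1/1)

forall A B : Prop, A -> B -> A /\ B
```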

I've been fooling around with some natural language data from OPUS, the
“open parallel corpus.” This contains many gigabytes of movie subtitles,
UN documents and other text, much of it tagged by part-of-speech and aligned
across multiple languages. In total, there's over 50 GB of data, compressed.

I've long been a huge fan of Heroku. They've made it super easy to deploy
and scale web applications without getting bogged down in server
administration. Also, their free tier has been very generous, which made
Heroku a perfect place to run weekend projects. (And my clients have
happily paid plenty of money to Heroku over the years, so nobody's been
losing out.)

Heroku's costs and limitations

Lately, the costs of using Heroku for weekend projects have been creeping
upwards:

Over the years, I've learned to be cautious with C++ pointers. In
particular, I'm always very careful about who owns a given pointer, and
who's in charge of calling delete on it. But my caution often forces me
to write deliberately inefficient functions. For example:

vector<string> tokenize_string(const string& text);

Here, we have a large string text, and we want to split it into a vector
of tokens. This function is nice and safe, but it allocates one string
for every token in the input. Now, if we were feeling reckless, we could
avoid these allocations by returning a vector of pointers into text:
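A sketch of what that reckless version might look like (the token-splitting rule here is my assumption; only the signatures matter for the argument that follows):

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

using namespace std;

// Reckless version: return pointers into `text` instead of copies.
// Every pointer aliases text's internal buffer, so `text` must
// outlive the returned vector.
vector<const char*> tokenize_string2(const string& text) {
    vector<const char*> tokens;
    bool in_token = false;
    for (const char& c : text) {
        if (!isspace(static_cast<unsigned char>(c))) {
            if (!in_token)
                tokens.push_back(&c);
            in_token = true;
        } else {
            in_token = false;
        }
    }
    return tokens;
}

string get_input_string() { return "some input text"; }

// The fatal call: the temporary returned by get_input_string()
// dies at the end of this full expression, leaving v full of
// dangling pointers.
// vector<const char*> v(tokenize_string2(get_input_string()));
```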

Why does this fail? The function get_input_string returns a temporary
string, and tokenize_string2 builds an array of pointers into that
string. Unfortunately, the temporary string only lives until the end of
the current expression, and then the underlying memory is released. And so
all our pointers in v now point into oblivion—and our program just wound
up getting featured in a CERT advisory. So personally, I'm going to prefer
the inefficient tokenize_string function almost every time.

Rust lifetimes to the rescue!

Going back to our original design, let's declare a type Token. Each
token is either a Word or an Other, and each token contains pointers
into a pre-existing string. In Rust, we can declare this as follows:
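Here's a sketch of such a declaration. The lifetime parameter 'a is the interesting part: it records that every Token points into a string that must outlive it. (The derives and the small tokenizer are my additions for illustration; the word/other rule — alphabetic runs versus everything else — is an assumption.)

```rust
/// A token borrowing from a pre-existing string: the lifetime 'a
/// ties each Token to the string it points into.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Word(&'a str),
    Other(&'a str),
}

/// A minimal tokenizer: alphabetic runs become Word,
/// everything else becomes Other. No characters are copied.
fn tokenize(text: &str) -> Vec<Token<'_>> {
    let mut tokens = vec![];
    let mut rest = text;
    while !rest.is_empty() {
        let is_word = rest.chars().next().unwrap().is_alphabetic();
        // Find where the current run of word (or non-word) chars ends.
        let end = rest
            .find(|c: char| c.is_alphabetic() != is_word)
            .unwrap_or(rest.len());
        let (tok, tail) = rest.split_at(end);
        tokens.push(if is_word { Token::Word(tok) } else { Token::Other(tok) });
        rest = tail;
    }
    tokens
}
```

If a caller tried to keep the tokens around after dropping the string, the borrow checker would reject the program at compile time — exactly the bug the C++ version let slip through to runtime.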

Rust is a systems programming language designed around speed and
safety. It sits roughly halfway between Go and Haskell. In
particular, it combines precise, safe control over memory with
high-level functional programming. Haskell programmers, for example, will
notice that Rust's and_then works much like bind in Haskell's Maybe
monad:

use std::os::getenv;
use std::io::net::ip::Port;

/// Look up our server port number in PORT.
fn get_server_port() -> Port {
    getenv("PORT")
        .and_then(|s| from_str::<Port>(s.as_slice()))
        .unwrap_or(8080)
}
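(That snippet uses pre-1.0 APIs. A sketch of the same and_then pattern in today's Rust, with std::env::var in place of getenv — the port-parsing helper is my own factoring:)

```rust
use std::env;

/// Parse a port out of an optional environment value, falling back
/// to 8080. `and_then` chains lookup and parse just like `bind` in
/// Haskell's Maybe monad.
fn port_from(raw: Option<String>) -> u16 {
    raw.and_then(|s| s.parse::<u16>().ok()).unwrap_or(8080)
}

/// Look up our server port number in PORT.
fn get_server_port() -> u16 {
    port_from(env::var("PORT").ok())
}
```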

Anyway, I spent this morning trying to get Rust working on Ubuntu 10.04
Lucid, as part of a larger effort to deploy a Rust application on Heroku.
(More on that soon.) On Ubuntu 10.04, rustc fails looking for
libstdc++.so.6:

Today, it's possible to build rich, sophisticated applications in the
browser. Everybody's familiar with GMail and Google Maps, of course, but
have you seen stuff like Mozilla's PopcornMaker?

This just blows me away. And applications like this are appearing all
over, and they're well within the reach of any tech startup.

Of course, the major problem with building sophisticated applications is
that now you need to maintain them. And just hacking everything together
with jQuery is probably going to make a mess.

I've built a few rich applications during the last twelve months, both for
clients and for myself. Based on that experience, here's a list of tools
that have worked well, and tools that look promising for the coming year.
This list is shamelessly subjective, and probably already obsolete.
Please feel free to contact me with better suggestions; I'll add
a bunch of them to the article.

Although I don't usually mention it here, one of my hobbies is learning
languages. French is my strongest by far, but I've been experimenting with
seeing just how slowly I can learn Middle Egyptian. Normally,
I need to reach a certain minimum degree of obsession to actually make
progress, but it turns out that software can help a bit, as I explain in
this post on the Beeminder blog.

But when I decided to learn Egyptian, I was faced with a dilemma: I
couldn't justify spending more than an hour per week on it. Hieroglyphs
are cool, but come on—it's a dead language. Unfortunately, it's
hard to learn a language in slow motion, because two things always go
wrong:

I get distracted, and I never actually put in that hour per week…

I forget everything I learn between lessons…

Of course, one key tool here is Anki, which cleverly exploits the
spacing effect of human memory. To oversimplify, if I'm forced
to recall something shortly before I would have otherwise forgotten it,
I'll remember it at least twice as long the next time. This lets me
remember things for O(2^N) time with only O(N) effort, which is a nice
trick.

Hierogloss

On a related note, I have a new toy up on GitHub: hierogloss, which
extends Markdown with support for interlinear glosses rendered using
JSesh:

Fitocracy is a great site for tracking exercise, one which manages
to have both a very friendly culture and an impressively gung-ho attitude.
But they've never gotten around to implementing any kind of official API.
If you want to look up your Fitocracy score from inside a script, you need
to jump through a surprising number of hoops.

What we need is a generic web scraping tool like mechanize, but with
the ability to deal with a rich JavaScript UI. It turns out the easiest
way to do this is to use a headless web browser.

First, let's create a Ruby Gemfile. We'll use capybara-webkit, which is
normally used for testing Ruby websites:

source "https://rubygems.org"

gem "capybara"
gem "capybara-webkit"

# Optional, if you want to debug using save_and_open_page.
# gem 'launchy'