Emil Sit

Gina Trapani and ExpertLabs have put out ThinkUp, a cool tool for tracking replies to your posts on Twitter. As of September 2010, ThinkUp has a nifty drop-in web-based installer, much like WordPress. Simply grab ThinkUp 0.007 or later, unzip it somewhere your PHP/MySQL-enabled web server can get at it, and it’ll prompt you through the installation.

After you unzip the ThinkUp dist, run chgrp -R web _lib/view/compiled_view and chmod -R g+w _lib/view/compiled_view so that the templating engine can cache its views.

Make sure you have a MySQL process enabled in your NearlyFreeSpeech control panel. A basic MySQL process costs $0.02/day, but you can share the process with your WordPress database. Spin up phpMyAdmin from the right-hand sidebar, create a user called thinkup, and check the option to create a database with the same name and grant all rights to that user. Generate a random password and copy it.

In the ThinkUp database configuration section, enter thinkup as the user name and database name and paste your generated password. Open the advanced section and change the database host from localhost to your database host name. It’ll be something like username.db; mine, for example, is sit.db.

ThinkUp will fail to write the configuration file due to permissions, but it helpfully offers the configuration text for you to copy and paste into a file yourself. Select the text in the config text box and go back to the terminal where you unzipped the dist. In the thinkup directory, run cat > config.inc.php, paste, and then press Ctrl-D to save the file.

Check your e-mail for the activation link and configure your account. You’ll need to register your installation as a Twitter application and paste in the consumer key and consumer secret. The config page will send you to the Twitter registration page and tell you the callback URL to provide. Leave ThinkUp as a read-only application and leave the ‘Use Twitter for login’ option unchecked.

That should do it!

For more details and up-to-date instructions, check out the ThinkUp wiki.

According to their homepage, SpringSource is in the business of “eliminating enterprise Java complexity” and is a leader in Java application infrastructure and management. That’s not very concrete, so I don’t find it particularly helpful, especially if you are not a J2EE/JEE (Java Enterprise Edition) developer. In this post, I’ll talk about SpringSource in general and focus on the Spring Framework. Note that while I work for VMware (which owns SpringSource) and use the Spring Framework (commonly referred to as Spring) at work, I am not part of our SpringSource division, nor do I have any particularly special access to the innards of SpringSource. I did get to take the Core Spring training for free, but it is only after 5 months of programming with Spring that I’ve started to understand the SpringSource philosophy.

SpringSource products let you write code that focuses as much as possible on the needs of your application, and as little as possible on the boilerplate or hassle of dealing with different underlying environments (e.g., dev, test, and production may have different database backends) or infrastructures (e.g., GAE, vCloud). This is the core value that underlies SpringSource, but it is only explored indirectly, via its various instantiations in the SpringSource literature and product line.

The Spring Framework (aka Spring) provides glue. Spring provides glue in a relatively uniform manner, so that once you understand the basic approach(es), you can apply them to interfacing with different components. From the documentation, Spring seems to do everything, but at the same time, when you try to use it, you may feel that it does almost nothing. It may be useful to compare the Spring Framework to the Debian Linux distribution: Debian provides a nice out-of-the-box experience with a uniform mechanism for managing software, and in particular, alternative software packages that can provide a common service. But to get at the power of the underlying packages, you must learn how to configure and use them. Likewise, Spring does not actually provide many services on its own. It does not free you from having to learn how to write a unit test, access a database, manage a messaging system, or implement security. Instead, it makes it possible for you to write the code that does these things in a uniform manner, so that your code can stay as generic as possible.

Understanding these two key points will help you make sense of the variety of things written about Spring.

The core glue provided by Spring is its dependency injection support, also known as the “inversion of control” or IoC container. This means that, instead of class Foo explicitly instantiating an object implementing interface Bar, Foo will have a constructor argument or setter that accepts a Bar. The inversion of control container lets you specify the right kind of Bar for Foo in a given environment and handles constructing that Bar and injecting it via the constructor or setter. This makes code less fragile because it no longer needs special-casing for testing (e.g., a stub Bar) or anything else. The mapping of a particular Bar to Foo becomes part of a configuration file that also captures all of Foo’s other dependencies.
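
To make that concrete, here is a minimal sketch of the idea in plain Java; the Foo, Bar, and stub names are only illustrative and no Spring APIs are involved:

    // Foo declares its dependency on the Bar interface; it never instantiates one itself.
    public class DependencyInjectionSketch {
        interface Bar {
            String fetch();
        }

        static class Foo {
            private final Bar bar;

            Foo(Bar bar) {          // the container (or a test) decides which Bar to pass in
                this.bar = bar;
            }

            String doWork() {
                return "got: " + bar.fetch();
            }
        }

        public static void main(String[] args) {
            // In a unit test, no container is needed; just hand Foo a stub Bar.
            Foo foo = new Foo(new Bar() {
                public String fetch() { return "stubbed"; }
            });
            System.out.println(foo.doWork());
        }
    }

In production, the container would construct the appropriate Bar and hand it to Foo, with that choice recorded in configuration rather than in code.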

For example, your application can use the Spring Framework with straight JDBC, with a generic object-relational mapping (ORM) API like JPA, or with specific frameworks like iBatis or Hibernate. You can then configure your application to talk to a variety of database back-ends with minimal changes to the configuration files, and write minimal code for setting up database connections and handling error cases. Spring provides wrappers and translators that unify provider-specific method names (such as the one that causes an ORM system to generate database tables) and provider-specific exceptions into more generic expressions of those concepts. This means you might be able to switch between JPA providers, for example, without changing too much configuration. However, you still have to configure your JPA provider correctly.
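
As a rough illustration of the JDBC case (the users table and the UserCounter class are made-up names, and the DataSource is assumed to be configured elsewhere), Spring’s JdbcTemplate handles the connection management and translates driver-specific SQLExceptions into its generic, unchecked DataAccessException hierarchy:

    import javax.sql.DataSource;
    import org.springframework.jdbc.core.JdbcTemplate;

    // Hypothetical DAO: the code says nothing about which database it talks to.
    public class UserCounter {
        private final JdbcTemplate jdbc;

        public UserCounter(DataSource dataSource) {   // injected by the container
            this.jdbc = new JdbcTemplate(dataSource);
        }

        public int countUsers() {
            // No Connection/Statement boilerplate, no checked SQLException;
            // failures surface as Spring's generic DataAccessException.
            return jdbc.queryForObject("SELECT COUNT(*) FROM users", Integer.class);
        }
    }

Pointing the DataSource at a different back-end then becomes a configuration change rather than a code change.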

In line with insulating you from the underlying infrastructure, Spring also provides flexibility of mechanism, so that the code and configuration you write to integrate with Spring’s services are at a level you are comfortable with. You can configure the IoC container with XML or with Java; you can use annotations or you can use explicit configuration. To specify which of your business methods should run in a database transaction, you can annotate them with @Transactional in your source code, or you can use an aspect-oriented programming filter to tag the relevant methods in an external configuration file.
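
The annotation route might look roughly like this; TransferService and AccountRepository are hypothetical names for illustration, not Spring APIs:

    import org.springframework.transaction.annotation.Transactional;

    // Hypothetical repository interface; the implementation is wired in by the container.
    interface AccountRepository {
        void debit(long accountId, long amountCents);
        void credit(long accountId, long amountCents);
    }

    public class TransferService {
        private final AccountRepository accounts;

        public TransferService(AccountRepository accounts) {
            this.accounts = accounts;
        }

        // Spring wraps this method in a transaction: commit on success,
        // roll back if a runtime exception escapes.
        @Transactional
        public void transfer(long fromId, long toId, long amountCents) {
            accounts.debit(fromId, amountCents);
            accounts.credit(toId, amountCents);
        }
    }

With the AOP route, the same method could instead be selected by a pointcut in an external configuration file, leaving the source untouched.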

All of this glue and flexibility of mechanism contribute to making Spring hard to understand; however, their presence reinforces the idea that Spring wants to get out of your way so that you can focus on application development. Other SpringSource tools such as Roo and Insight work similarly: they simplify development and debugging (respectively) without requiring extensive changes to your source, while respecting current best practices.

When I was growing up, we would listen to classical music stations in the car and try to figure out the composer and sometimes even the performer. Both musicians and composers often have their own distinctive style: you can hear the mathematical precision of Gould, or the clarity of Horowitz, whether they are interpreting Bach or Mozart. My last post started me thinking about a musician or composer’s style and drawing a parallel in the context of computer programming.

When thinking about music, one’s style is a matter of personal expression, but if you say “coding style” to a programmer (or really, to Google), you’ll get rules about whitespace and variable naming, plus some proverbs about how to write maintainable code (e.g., “avoid global variables”). Overall, I don’t think these are particularly relevant to the art of programming.

For example, formatting and naming conventions are important in a codebase only insofar as a properly followed convention becomes invisible—just as your nose becomes acclimated to a smell, your brain quickly learns to recognize a formatting convention and ignore it. Having a convention (any convention!) allows you to focus on what the source code is really doing. Following a convention is good for everyone reading your code, even you. Automate your coding conventions and forget about them. (In the extreme, check out what the Go Language formatter can do.)

Similarly, coding style proverbs, like “write tests before code” or “keep code in a function at one level of abstraction”, are like any other proverb: these statements capture an element of experience from programmers past, but are often blindly followed by people new to the practice. It takes significant time before a programmer can truly internalize the reasons for and nuance behind any proverb. (Incidentally, if you are interested in studying proverbs, I highly recommend you examine the game of Go.)

What I am interested in exploring is personal expression and style in programming, outside of language/library/tools, proverbs or code formatting. Having a personal style is not a concept that we as computer programmers are generally exposed to. School focuses almost exclusively on the technical, ignoring both the practice (i.e., the stuff of proverbs) and the art (the subject of this post). Indeed, I am only beginning to be able to express what my personal style might be.

So, how do you express yourself in code? To begin exploring our artistic programming style, let’s continue to draw an analogy from the arts—instead of music, let’s look at the process of establishing a personal photographic style. There, what matters are the choices—the choice of equipment (camera), of subject matter, and of the approach to making a picture. You come upon a way of doing things that you believe is right, that supports your personal values. Looking at myself:

Equipment: Linux/Vim/Mutt/Xmonad/Git.

Subject matter: I care strongly about the process with which you build programs, and so I wind up working a lot on tools and scripting, but I’m also interested in distributed systems problems.

Approach: I like (re)using what is present; I strive for consistency, simplicity, elegance. I am somewhat inclined towards using functional constructs in imperative languages (though I never did like OCaml’s syntax). I always look at a diff of my code to ensure it is minimal before committing it, and I like to write verbose log messages.

Examples: Some things I’ve worked on that you can look at include, of course, Chord, and some contributions to Gina Trapani’s todo.txt tool.

I’m not entirely happy with this “approach” because it feels like mostly a list of platitudes. But some of my difficulty, I think, lies in not having thought about this specifically as I look at other people’s code, and in not even having the words to compare and contrast my approach with that of others. So, I’d like this post to be a start for each of us to explore our own style.

Spend a few minutes thinking about what you value in your own code and how you define yourself as an artistic programmer, and write about it in a comment, perhaps using the template I’ve set for myself above. I hope we’ll each learn something!

I encourage you to watch this (and all 6 parts), even if you know nothing about classical music.

What stood out to me in viewing this series of videos was the fluidity with which Gould is able to discuss a piece of music, in its historical context, and to simply jump in to play a phrase from a different piece to call out a point for discussion. This demonstrates an incredible mastery of the subject matter, unifying history and context, theory, and practical implementation.

From my experience at the Manhattan School of Music, musical training seeks precisely to instill this unification in its students. As you study, you learn to play a variety of pieces and styles from different time periods. You are taught some history, to be able to understand the evolution of styles; you are taught the underlying theory, to be able to discuss this evolution in precise terms; and then you practice the mechanics needed to actually play the pieces.

How does this compare to computer programming? Computer “science”, as you may know, has a fair amount of artistry to it.

We are simply not trained to have these kinds of discussions. How many people do you know who, during a code or design discussion, might say, “Oh, this is very similar to System X, in contrast to how System Y did things,” and then pull up the source code (or architecture diagram) for System X and Y and compare them with the relevant pieces under discussion?

At MIT, the classes are (were?) organized around ideas and then around technical implementation (leading, hopefully, to understanding). Little emphasis is placed on the practice of being a programmer, such as prototyping, testing, revision control, or code review; that is, these things rarely factor significantly into your grade. Even less emphasis is given to the ability to discuss ideas in context, even at the graduate level. Undergraduate classes at most schools seem to focus on learning best practices for a particular programming language. As graduate students, only the extremely motivated would explore beyond the papers presented in the course syllabus, or the immediate related work for a given project; tracking down the source code of other systems is almost never done (perhaps simply because it is not often available).

In the professional world, no one teaches you how to do code or design reviews at this level either. Just like in school, professional programmers are constantly subject to deadlines which override just about any extra-curricular work. Reviews are often focused on mechanics or on vague, unsubstantiated worries. Again, extreme personal motivation is required to move beyond this.

How can we improve this situation? Is there room at the undergraduate level for more capstone projects that unify the theory, the history, and the mechanics with the craft of programming? What about at the graduate level? How about in a professional environment? What has been your experience?

I’d like to find programmers who work this way, who are excited and passionate about the craft of programming. Are you one? Get in touch.

At PDOS, Frans Kaashoek and Robert Morris definitely encourage us to build real systems and make them available. I like this approach and never found that publishing the full source to Chord, including our work-in-progress/submission code, caused any problems. It has also meant that a lot of people still play with Chord, even though I no longer actively maintain it.

Once you get a job at a company, you move from one side of the interview table to the other. My ideal candidate for just about any engineering position:

has the ability to present technical ideas on the fly;

has practical Unix knowledge;

can write clearly and concisely in English and in code;

has a strong technical background.

Knowledge of particular technologies or programming languages is generally not interesting. Rather, the candidate should be smart and passionate.

One way I’ve started looking for passion is to see if the candidate has been involved in any volunteer work or open source projects. But it can be hard to assess the other qualities, even in an hour-long interview. Typically, an interview assesses your ability to solve a particular problem, possibly in code, but reveals little about what it would actually be like to work with you.

As we hire some more people for MVP, I’m considering changing up our standard “bring the candidate in for a series of 45-minute one-on-ones” routine to include some ways to probe for my desired qualities before the interview. I’d like to have candidates perhaps:

Send in a dot-file of some sort (e.g., .bashrc, .vimrc, .emacs, httpd.conf). That is: does the candidate use Unix and customize it? Does the candidate comment their dot-files?

Prepare and deliver, for the interview panel, a five-minute presentation on some (any!) technical topic. This ensures the ability to communicate ideas clearly and answer questions.

Provide some samples of bug reports the candidate has filed or technical discussions that the candidate has had on a mailing list. (Say, 3 from the past 3 years, ideally from an open source project.) Alternately, provide a pointer to the candidate’s blog. That is, can this person write cogently? “Excellent communication skills” anyone?

Provide a code sample, something the candidate has had primary responsibility for developing, on the order of 100–500 lines of code. Two interviewers will review the code with the candidate.

Provide a commit, i.e., a diff to existing code (perhaps the code sample provided) and a commit message. This would demonstrate the ability to provide a clean functional change and document it for the team.

Most companies I interviewed with (at the PhD level) required a presentation, but only one asked for code samples. I’ve not seen any requests personally for anything else. Have you?

Incidentally, the Mobile Virtualization Platform team at VMware is hiring (mostly for our Cambridge office). Get in touch if you’re interested.

Because of our work with the Linux kernel and with Android, we have started using Git more extensively at work, and my colleagues often have questions about how to get things done with Git. While the everyday command lists are helpful, most of the time people would benefit more from a fundamental understanding of how Git works.
Here is a brief list of useful resources to help achieve that:

Like presentations? Slides from an MIT SIPB Understanding Git class [PDF, 1MB], if you’re in a hurry, or Getting Git from Scott Chacon if you’re not.

Use Mark Lodato’s Visual Git Reference to help you understand how commands interact with history, the staging area and your working directory.

For me, the core difficulty is that people have to explicitly think about the history of their code and how they would like to share it with others. Git gives you many options where tools like Subversion and Perforce don’t, and this plethora of options can make things confusing. In fact, it can lead to very philosophically different approaches for all aspects of your development process, ranging from shared-repository vs. integrator workflows to whether or not to merge frequently (yes? or no!). Here are a few useful readings on how people can actually use Git:

The afternoon of Boston DevDays 2009 was, in my opinion, not as broadly appealing as the morning sessions (see my writeup of the morning here). However, there was still a lot of interesting material presented.

Joel welcomed us back from lunch by plugging
StackExchange and how it’ll mean the end of “crappy
old copies of Usenet” (by which he meant phpBB). He showed a pretty graph of
StackOverflow edging out ExpertsExchange in traffic. He also announced a new job search
site called careers.stackoverflow.com that
charges job seekers some money and asks you what your favorite editor is.
There was also a video ad for the FogCreek training videos. This man knows how
to monetize.

Patrick Hynds and Chris Bowen

The first technical session of the afternoon was on ASP.NET MVC.
Patrick started the session with an explanation of ASP.NET MVC’s history relative to
ASP Classic and ASP.NET, and why one might want to use a model-view-controller (MVC)
architecture for a website: for example, much finer control over generated HTML compared to
traditional ASP, test-driven development, and better URLs for SEO.

The rest of the talk was a demo of creating a hello-world MVC
application in Visual Studio. The presenters walked through
updating models, views, and controllers, and setting up some basic
routing. It seems that ASP.NET MVC is a fine re-implementation of
Ruby on Rails or Django for the Microsoft world. One concrete
tip I learned was that in Visual Studio, Control-. will offer you some
completions or other shortcuts.

Reception to this talk was somewhat mixed, at least as far as I
can tell from the blogs and Tweets about it. The talk itself could have been
improved, of course; for example it
would have helped for Patrick to have explained what MVC
stood for (with a few architecture diagrams) before plugging its
advantages for ten minutes. My take is that if
you knew nothing about MVC, it was a straightforward talk that
gave an introduction to the concepts and the implementation in
.NET. If you were already familiar with MVC, I think you
would have thought it pretty content-free as there wasn’t a tremendous
amount of focus on the ASP.NET side of things.

John Resig

John Resig, the creator and lead developer
of jQuery, a very popular JavaScript library,
next took the stage to talk about JavaScript testing.

“Developing for JavaScript is a lot like whack-a-mole,” John
reported. The large space of operating systems, browsers,
browser versions, JavaScript engines, and browser plugins means
that typically if you fix one thing you’re more than likely
to break something else. And so, in some informal studies,
John found that people just don’t test. This is something
John would like to change.

A unit test suite for JavaScript apparently isn’t that hard to
write. John threw up a bunch of increasingly feature-rich test
harnesses—with asserts, grouping by role, and a test runner
web-page—using a few dozen lines of JavaScript. The hardest
part of writing a test suite is likely to be adding support for
asynchronous events (e.g. XMLHttpRequests). Fortunately, there
are several pre-built suites such as
QUnit,
JSUnit, YUI
Test, and
Selenium. John spent a bit of time
talking about the differences between these frameworks, and
particularly plugged QUnit and YUITest.

Selenium is of particular interest since, unlike the others,
it is not just a unit test framework. It also has plugins
to allow recording and scripting events to a browser, so you
can do whole site testing. It even comes with Selenium
Grid which will let
you distribute and automate testing. This seems like a big win.

There are also stand-alone JavaScript engines like
Rhino. To help test code
in such a browserless environment, John wrote env.js, which is a
browser-like environment that runs in pure JavaScript. He talked about how this could also be used for
screen scraping.

Finally, John introduced testswarm.com.
This is a SETI@Home-style site: anyone can visit,
download some tests to run, and report back the results.
This should give very broad coverage and allow developers to
get feedback from a wide range of real environments (e.g.
mobile!).

Overall, John’s talk was a rapid-fire overview of JavaScript
testing resources from a JavaScript ninja. It was very practical,
easy to follow and probably great for anyone who does JavaScript
development. However, it lacked the “Python is awesome!” feel of
Ned Batchelder’s morning talk, and so for a non-JavaScript
developer such as myself, it was not as appealing.

Miguel De Icaza

Miguel De Icaza closed out the day
with a talk on Mono. Miguel explained that he wasn’t really sure
what he should talk about—Mono is a giant universe, and
explaining “Mono” is “kind of like explaining God”—and his
informal survey didn’t really provide a mandate. He wound up
giving some nice technical demonstrations of some recent Mono
developments, with a fair amount of personal flair to keep the
audience engaged.

The core of Mono is an implementation of the Common Language
Runtime for Linux. One of the goals of this project was
to bring the best development tools to Linux.

The first demonstration Miguel gave was an impressive combination
of tools. Using a plugin to Visual Studio, Miguel demonstrated
that you could develop a Linux port of a .NET application entirely
in Visual Studio on Windows, and seamlessly test and package it
on a Linux machine (or VM); he walked through a live example
of porting BlogEngine.NET. This made use of Bonjour to
dynamically discover the Linux machines, pushed execution
to the selected machine, and displayed the debugging results
in Visual Studio.

Miguel then decided that a developer might want to publish their
application as a software
appliance, so he
walked through a complete demonstration of using SUSE
Studio. He seamlessly built an RPM on
his Linux box from Visual Studio and pushed it into “the cloud” of
SUSE Studio. From his browser, he configured an appliance with
that RPM, baked it the way they do on cooking shows, booted the
virtual machine in the cloud, accessed it using a Flash-based console in
the web browser, and browsed to the port of BlogEngine.NET that he
had just booted.

For his second major demonstration, Miguel moved over to
MonoTouch. He showed MonoDevelop, an IDE for Mono
developers, running on a Mac and working with the iPhone
interface builder application to build a simple flashlight
application (i.e., a giant white button) for the iPhone
simulator. He also talked a little about the technical work
involved here, which was to compile the developer’s Mono code
into ARM assembly and link it into the Mono runtime to create
an iPhone application. This gets around Apple’s “no
interpreters” rule.

Miguel’s talk was easily the most entertaining one of the day. It
was perhaps most entertaining because, in addition to his wry
humor (check out the pictures from Ian Robinson’s
Tweet, for
example), as he performed his live demonstrations, things would
break, whereupon Miguel would think, realize what was wrong,
pop open a Terminal and fix it. That’s not something you
see in the usual carefully scripted demos at most shows.
Of course, Miguel was also demonstrating some interesting
technical features and giving an advertisement for a wide range
of Mono-related tools as well, so there was something for
everyone.

Wrap-up

After Miguel’s talk, Joel suggested that the audience break
up into informal groups and get dinner, loosely organized
around seven topic areas that he proposed. I hadn’t planned for
that and had to get home; I’m not sure how many people went.

Overall, I think DevDays was well worth attending. On the networking side, I got
to meet a few local developers (of whom I’ve posted a few pictures on Flickr)
and catch up briefly with some acquaintances from school. On the technical side, I got a broad overview of several popular technical areas from leading
figures in those areas.

Boston DevDays kicked
off a month-long tour of technical talks aimed at programmers, organized
by StackOverflow and Carsonified.
I had the good fortune to attend, meet a few interesting people and see
some fun talks. I tried to write a bit in real time (search Twitter here),
but the WiFi was pretty oversubscribed and there was no cell coverage
to speak of, so eventually I gave up. Here are some more detailed notes, starting with
the morning sessions.

The day ran very much on schedule, and after some very loud music, DevDays opened with a funny video demonstrating the
competent technical leadership of the keynote speaker…

Joel Spolsky

Joel opened the day by giving a demonstration of the tyranny of computers:
you are constantly interrupted with questions asking for decisions, like “Do you want to install these
10 updates to Windows?” or “Enable caret browsing?”, that can be really hard to answer.
He argued that one of the reasons that computers ask us so many questions is that programmers want to give
users the power (and control) to do what they want. But that’s another way of
saying that programmers don’t want to make the decisions to keep things simple. The
rest of Joel’s somewhat meandering but always entertaining talk was a discussion
of how programmers (us!) should approach decision making, framed as a trade-off
between simplicity and power.

Decisions are ultimately hard to make—there have been many studies that
demonstrate that when people have too many choices, they freeze up and choose
nothing. Thus, we’ve seen a strong push towards simplicity in recent years; one
clear example of that has been the Getting Real book from 37signals. Joel
(jokingly?) points out that three of the four 37signals apps are in fact just
one big textarea tag that you type into. Other examples, given later, include
of course Apple’s products and Google.

But why do we wind up with programs with lots of options? Well, if you have a
simple program, you find that most people won’t buy your product if it doesn’t
have feature X (or Y or Z or …). So you wind up adding features over
time, as you get more customers and more experience. Thus, simplicity is
often lost as a side effect of making money.

How to balance these? If we don’t want to take away all decisions from the
user, we need a rule to guide us in what to remove and what to keep. One rule
to follow is that the computer does not get to set the agenda. Good decision
points are those that help the user achieve what they want to do. Bad
decision points are those that interrupt the user, that the user really isn’t
equipped to answer (e.g., should Gmail automatically display inline images in
HTML email?), or that only the programmer cares about.

To decide what is good or bad, developers need a good model to understand what the user is trying
to do—Joel says every user is ultimately trying to replicate their DNA, but
you may have some more refined model. Joel gave the example of Amazon’s
1-Click purchasing where the user should just be able to buy something with
a single click. Apparently, early drafts of 1-Click really weren’t
one click: programmers kept wanting to put in things like confirmation pages.
Eventually, they arrived at just one click—by not immediately starting the
order processing, but holding it for a few minutes to consolidate related
1-Click orders and allow for cancellation of errors. This was more work for
the developers, but simpler for the user. This is what we want to see happen.

Overall, I think Joel’s talk set a nice tone for how we should think as developers
but didn’t offer anything particularly ground-breaking.

Ned Batchelder on Python

Ned Batchelder presented Python as a “Clean noise free environment to type your ideas.”
Or, alternatively, “Python is awesome.” With a few bullets to lead off (e.g.,
the REPL, duck typing, and the “batteries included” nature of Python), we dove into code to
really understand what Python is capable of doing.
Ned’s slides are online if you want to
take a look. I won’t cover his talk in much detail since it was largely
explaining the Python language but will list some high points.

The second example was a custom build-up of a simple Python
templating engine, loosely based on Django’s template styles.
This example demonstrated simple formatting with the percent operator, but then quickly moved into more advanced features
like duck typing (by implementing __getitem__) and callable objects.

One of the nice things about Ned’s presentation is that he demonstrated the
power of Python in two short but compelling examples that left people
who didn’t know Python thinking, “Wow! That is really amazing!”

Dan Pilone

Dan Pilone’s talk was an overview of iPhone development,
first giving a quick market overview, then giving a broad overview
of the nitty-gritty of developing in Objective C, and finally diving
into the practical economics and realities of selling iPhone apps.
The iPhone app market (as of Apple’s September numbers) consists of
something like 85,000 applications, of which 75% are paid
applications. Some 35% of applications are games, whereas 6% fall
into more social categories. There are 5 or 6 different
iPhone/iPod touch hardware platforms, and something like 20 million
plus devices sold. The iPhone has a great user
experience and comes with a great delivery model (the iTunes app
store). This, combined with some of the numbers, makes it a
good platform to develop for, with significant amounts of money
that can be earned. (Dan emphasized that you really have to work
to develop and market the application, hence the “earned”.)

The iPhone development environment is very shiny. I’m sure
Ars Technica
has a much better overview, but in short Dan demonstrated some
tools like Xcode, Core Data (a graphical SQLite data modeler),
reference counting support (aka retain), and Instruments (a memory
profiler). Dan suggested that this shininess is to make up
for some of the oddities of Objective C
that you’ll have to live with. He also demonstrated some of
the interface builder tools and how they link up.

Testing turns out to be quite interesting; the simulator
is okay but limited and often your app will work in the simulator
but fail on real devices. For example, your app on a real device
might “run so slow you wish it had crashed”. The simulator also doesn’t
enforce sandboxing as strictly as real devices, where
each app has its own uid and single directory where it
can store data. There are also many different hardware
variants that you have to support, and they limit you in different ways: for example, early iPhones only give you
40MB of memory to play with whereas newer ones give you almost 120MB. This is not reflected well in the simulator either.

Shipping an iPhone app on the app store requires approval,
a process that can take two weeks per round-trip.
There’s no way to get around it, so you must budget time
for that in development. The approval process helps guarantee
a minimal level of quality of apps—they will verify that
your app indeed works on all different hardware, and they will
(eventually) catch any licensing violations, but they’re
overall pretty reasonable.

Once you get approval, you show up in the recent releases section
of the app store, and there you have about 24 hours to get popular or
else you will fade into the long tail of oblivion. In fact, if you are in the top 50 apps,
you will easily get double the sales (presumably relative to the 51st app); if you
are in the top 10, you’ll be getting an order of magnitude more sales. So, make
sure you get your approval/release-date lined up with your marketing blitz.
The alternative is to charge a bit more than $0.99, and go for slow but steady sales.

As a Blackberry owner and Linux user, I found Dan’s talk to be a great introduction
to iPhone development. Presumably his new ORA book, Head First iPhone Development, would be a good buy if you are into that sort of thing.

Joel on FogBugz

Before lunch, Joel took the opportunity to give a pitch for his company’s FogBugz product,
and announce some new features. He gave us a walk-through of its capabilities, from
organizing documentation, to project planning, to bug tracking, to user support. The new features announced
were a rich plugin architecture, plus support for Mercurial and code reviews
in a new hosted plug-in to FogBugz called Kiln. He spent a fair amount of time
on that, demonstrating calling hg push from the command line. He also demonstrated the evidence-based scheduling features of FogBugz.

Nothing too exciting for me working in a big company using Perforce, but a good marketing opportunity for FogCreek
and a nice chance to see how some other people do scheduling and bug tracking. I was a bit disappointed that
there’s no direct way to do pre-commit (i.e. pre-push) reviews a la Gerrit, but @jasonrr says you can set up a branch repo, push to that and then review there before merging to main. I expect this means
that GitHub will be getting code review support soon.

Lunch!

With that, we broke for lunch. @jldio has the scoop on lunch, and his own write-up of the day too.
More to come later, thanks for reading.

Mike Freedman and I have known each other since we were master’s students at MIT, working on things like the Tarzan anonymizing network (a parallel precursor to Tor). He went on to build the hugely successful (“as seen on Slashdot”) Coral content distribution network, which figured largely in his dissertation. It’s a great treat to have him talk here about how Coral was built and deployed. Be sure also to check out his research group blog for more interesting thoughts from him and his students!

What did you build?
CoralCDN is a semi-open content distribution network. Our stated goal with CoralCDN was to
“democratize content publication”: namely, allow websites to scale by
demand, without requiring special provisioning or commercial CDNs to
provide the “heavy lifting” of serving their bits. Publishing with
CoralCDN is as easy as slightly modifying a URL to include .nyud.net
in its hostname (e.g., http://www.cnn.com.nyud.net/),
and the content is subsequently requested from and served by
CoralCDN’s network of caching web proxies.
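
As a toy sketch of that URL rewriting (in Java; the class and method names here are made up purely for illustration), the transformation is just an edit to the hostname:

    import java.net.URI;
    import java.net.URISyntaxException;

    public class Coralize {
        // Append .nyud.net to the hostname, leaving the rest of the URL intact.
        static String coralize(String url) throws URISyntaxException {
            URI u = new URI(url);
            URI rewritten = new URI(u.getScheme(), u.getUserInfo(),
                    u.getHost() + ".nyud.net", u.getPort(),
                    u.getPath(), u.getQuery(), u.getFragment());
            return rewritten.toString();
        }

        public static void main(String[] args) throws URISyntaxException {
            // Prints: http://www.cnn.com.nyud.net/
            System.out.println(coralize("http://www.cnn.com/"));
        }
    }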

Our initial goal for deploying CoralCDN was a network of volunteer
sites that would cooperate to provide such “automated mirroring”
functionality, much like sites do somewhat manually with open-source
software distribution. As we progressed, I also imagined that small
groups of users could cooperate in a form of time-sharing for
network bandwidth: they each would provide some relatively constant
amount of upload capacity, with the goal of being able to then handle
any sudden spikes (from the Slashdot effect, for example) to any
participant. This model fits well with how 95th-percentile billing
works for hosting and network providers, as it then becomes very
important to flatten out bandwidth spikes. We started a deployment of
CoralCDN on PlanetLab, although it never really migrated off that
network. (We did have several hundred users, and even some major
Internet exchange points, initially contact us to run CoralCDN nodes,
but we didn’t go down that path, both for manageability and security
reasons.)

CoralCDN consists of three main components, all written from scratch:
a special-purpose web proxy, nameserver, and distributed hash table
(DHT) indexing node. CoralCDN’s proxy and nameserver are what they
sound like, although they have some differences given that they are
specially designed for our setting. The proxy has a number of design
choices and optimizations well-suited for interacting with websites
that are on their last legs—CoralCDN is designed for dealing with
“Slashdotted” websites, after all—as well as being part of a big
cooperative caching network. The nameserver, on the other hand, is
designed to dynamically synthesize DNS names (of the form .nyud.net),
provide some locality and load balancing properties when selecting
proxies (address records it returns), and ensure that the returned
proxies are actually alive (as the proxy network itself is comprised
of unreliable servers). The indexing node forms a DHT-based
structured routing and lookup structure that exposes a put/get
interface for finding other proxies caching a particular web object.
Coral’s indexing layer differs from traditional DHTs (such as
MIT’s Chord/DHash) in that it creates a hierarchy of locality-based
clusters, each of which maintains a separate DHT routing structure and
put/get table, and it provides weaker consistency properties within
each DHT structure. These weaker guarantees are possible because
Coral only needs to find some proxies (preferably nearby ones)
caching a particular piece of content, not all such proxies.
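
To make the put/get idea concrete, here is a rough, purely illustrative sketch (in Java) of the kind of interface such an indexing layer exposes to the proxies; this is not Coral’s actual API:

    import java.net.InetSocketAddress;
    import java.util.List;

    // Illustrative sketch of a DHT-style index for cooperative caching.
    interface ProxyIndex {
        // Announce that this proxy now caches the object identified by key
        // (e.g., a hash of the URL), for a limited time.
        void put(byte[] key, InetSocketAddress proxy, int ttlSeconds);

        // Return *some* proxies believed to cache the object, preferably nearby
        // ones; finding every such proxy is unnecessary, which is what allows
        // the weaker consistency described above.
        List<InetSocketAddress> get(byte[] key, int maxResults);
    }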

Tell us about what you built it with.
CoralCDN is built in C++ using David Mazieres’
libasync and libarpc
libraries, originally built for the Self-certifying File System (SFS).
This came out of my own academic roots in MIT’s PDOS group, where SFS
was developed by David and its libraries are widely used. (David was
my PhD advisor at NYU/Stanford, and I got my MEng degree in PDOS.)
Some of the HTTP parsing libraries used by CoralCDN’s web proxy came
from OKWS, Max Krohn’s webserver, which is also written using the SFS libraries.
Max was research staff with David at NYU during part of the time I was
there. It’s always great to use libraries written by people you know
and can bother when you find a bug (although for those two, that was a
rare occurrence indeed!).

When I started building CoralCDN in late 2002, I initially attempted
to build its hierarchical indexing layer on top of the MIT Chord/DHash
implementation, which also used SFS libraries. This turned out to be
a mistake (dare I say nightmare?), as there was a layering mismatch
between the two systems: I wanted to build distinct, localized DHT
clusters in a certain way, while Chord/DHash sought to build a single,
robust, global system. It was thus rather promiscuous in maintaining
group membership, and I was really fighting the way it was designed.
Plus, MIT Chord was still research-quality code at the time, so bugs
naturally existed, and it was really difficult to debug the resulting
system with large portions of complex, distributed systems code that I
hadn’t written myself. Finally, we initially thought that the “web
proxy” part of the system would be really simple, so our original
proxy implementation was just in Python. CoralCDN’s first
implementation was scrapped after about 6 months of work, and I
restarted by writing my own DHT layer and proxy (in C++ now) from
scratch. It turns out that the web proxy has actually become the
largest code base of the three, continually expanded during the
system’s deployment to add security, bandwidth shaping and
fair-sharing, and various other robustness mechanisms.

Anyway, back to development libraries. I think the SFS libraries
provide a powerful foundation that makes it easy to build
flexible, robust, fast distributed services…provided that one spends
the time to overcome their higher learning curve. Once you learn them,
however, they make it really easy to program in an event-based style,
and the RPC libraries prevent many of the silly bugs normally
associated with writing your own networking protocols. I think Max’s
tame libraries significantly improve the readability and (hopefully)
lessen the learning curve of doing such event-based programming, as
tame removes the “stack-ripping” that one normally sees associated
with events. Perhaps I’ll use tame in future projects, but as I’ve
already climbed the learning curve of libasync myself, I haven’t yet.

That said, one of my PhD students at Princeton, Jeff Terrace, is
building a high-throughput, strongly-consistent object-based
(key/value) storage system called CRAQ using tame. He seems to
really like it.

How did you test your system for correctness?
I deploy it? In all seriousness, it’s very difficult to test web proxies,
especially ones deployed in chaotic environments and interacting with
poorly-behaving clients and servers.

I did most of my testing during initial closed experiments on about
150-300 PlanetLab servers; PlanetLab is a distributed testbed deployed at
a few hundred universities and other institutions that each operate
two or more servers. Testing that the DHT “basically” worked was
relatively easy: see if you actually get() what you put(). There are
a lot of corner cases here, however, especially when one encounters
weird network conditions, some of which only became apparent after we
moved Coral from the network-friendly North American and European
nodes to those PlanetLab servers in China, India, and Australia.
Always be suspicious of systems papers that describe the authors’
“wide deployment” on “selected” (i.e., cherry-picked) U.S. PlanetLab
servers.

Much of the testing was just writing the appropriate level of debug
information so we could trace requests through the system. I got
really tired of staring at routing table dumps at that time. Last
year I worked with Rodrigo Fonseca to integrate X-Trace into CoralCDN, which would have
made it significantly easier to trace transactions through the DHT
and the proxy network. I’m pretty excited about such tools for
debugging and monitoring distributed systems in a fine-grained
fashion.

Testing all the corner cases for the proxy turned out to be another
level of frustration. There’s really no good way to completely debug
these systems without rolling them out into production deployments,
because there’s no good suite of possible test cases: The potential
“space” of inputs is effectively unlimited. You constantly run into
clients and servers which completely break the HTTP spec, and you just
need to write your server to deal with these appropriately. Writing a
proxy thus becomes a bit of an exercise in learning to “guess” what developers
mean. I think this has actually become worse with time. Your basic
browser (Firefox, IE) or standard webserver (Apache) is going to be
quite spec-compliant. The problem is that you now have random
developers writing client software (like podcast clients, RSS readers,
etc.) or generating Ajax-y XMLHttpRequests. Or casual developers
dynamically generating HTTP on the server side via some scripting
language like PHP. Because who needs to generate vaguely
spec-compliant HTTP if you are writing both the client and server?
(Hint: there might be a middlebox on the path.) And as it continues to
become even easier to write Web services, you’ll probably continue to
see lots of messy inputs and outputs from both sides.

So while I originally tested CoralCDN using its own controlled
PlanetLab experiments, after the system went live, I started testing
new versions by just pushing them out to one or a few nodes in the
live deployment. Then I just monitor these new versions carefully
and, if things seemed to work, slowly push them out across the entire
network. Coral nodes include a shared secret in their packet headers,
which excludes random people from joining our deployment. I also use
these shared secrets to deploy new (non-backwards-compatible) versions
of the software, as the new version (with a new secret nonce) won’t
link up with DHT nodes belonging to previous versions.

How did you deploy your system? How big of a deployment?
CoralCDN has been running 24/7 on 200-400 PlanetLab servers since
March 2004. I manage the network using AppManager, built by Ryan
Huebsch from Berkeley, which provides a SQL server that keeps a record
of current node run state, requested run state, install state, etc.
So AppManager gives me a Web interface to control the desired runstate
of nodes; all nodes then “call home” to the AppManager server to
determine their updated runstate. You write a bunch of shell scripts to
actually use these run states to start or stop nodes, manage logs,
etc. This “bunch of shell scripts” eventually grew to be about 3000
lines of bash, which was somewhat unexpected. While AppManager is a
single server (although nodes are configured with a backup host for
failover), CoralCDN’s scripts are designed for nodes to “fail same”.
That is, requested runstate is stored durably on each node, so if the
management server is offline or returns erroneous data (which it has
in the past), the nodes will maintain their last requested runstate
until the management server comes back online and provides a valid
status update.

How did you evaluate your system?
We performed all the experiments one might expect in an academic
evaluation on an initial test deployment on PlanetLab. Our NSDI
‘04 paper discusses these experiments.

After that stage, CoralCDN just runs – people continue to use it, so
it provides some useful functionality. My interest transitioned from
providing great service to just keeping it running (while I moved on to
other research).

I probably spend about 10 minutes a week “keeping CoralCDN running”,
which is typically spent answering abuse complaints, rather than
actually managing the system. This is largely because the system’s
algorithms were designed to be completely self-organizing – as we
initially thought of CoralCDN as a peer-to-peer system – as opposed
to a centrally-managed system designed for PlanetLab. System
membership, fault detection and recovery, etc., are all completely
automated.

Unfortunately, dynamic membership and failover don’t extend to the
primary nameservers we have registered for .nyud.net with the .net
gTLD servers. These 10-12 nameservers also run on PlanetLab servers,
so if one of these servers goes offline, our users experience bad DNS
timeouts until I manually remove that server from the list registered
with Network Solutions. (PlanetLab doesn’t provide any IP-layer
virtualization that would allow us to fail over to alternate physical
servers without modifying the registered IP addresses.) And I have to
admit I’m pretty lazy about updating the DNS registry, especially
given the rather painful web UI that Network Solutions provides. (In
fairness, the UIs for GoDaddy and other registrars I’ve used are
similarly painful.) I think registrars should really provide a
programmatic API for updating entries, but haven’t found one for
low-cost registrars yet. Anyway, offline nameservers are probably the
biggest performance problem with CoralCDN, and probably the main
reason it seems slow at times. This is partly a choice I made,
however, not to become a vigilant system operator for whom managing
CoralCDN becomes a full-time job.

There’s a lesson to be had here, I think, for academic systems that
somehow “escape the lab” but don’t become commercial services: either
promote lessened expectations for your users (and accept that reality
yourself), build up a full-time developer/operations staff (a funding
quandary), or expect the project to soon die off after its initial
developers lose interest or incentives.

Anything you’d like to add?
My research group actually just launched a blog. In the next few
weeks, I’ll be writing a series about some of the lessons I’ve learned
from building and deploying CoralCDN. I want to invite all your
readers to look out for those posts and really welcome any comments or
discussion around them.