Robert asserted that "when you are trying to build a business you don't have the luxury of digging through a ton of source code and then try to figure out the where the problems are and make changes." I disagree. You don't have the luxury of doing so, you have the requirement to do so. To put it simply, if you rely on a particular piece of software, and you need something changed, then you need it to be changed, and that is that. It is up to you.

Now, just because it is up to you doesn't mean you need to go implement the fix. Heck, you probably cannot even do it yourself unless the library is both open source and openly developed. If development is closed, your "option" is generally to throw money at the people who wrote the bug to make said bug go away.

If on the other hand you do have the option of fixing it yourself, you still rarely have to. An awful lot of the time merely pointing out the problem to folks already involved with the project will lead to it getting resolved. If just pointing it out won't get it fixed fast enough for you, it is downright shocking how many people will let you pay them to write software, so you can just pay someone to do it. If you don't have the funds or access to the information you need, you can play weird PR games to "force" them to do it (this actually works for security related stuff, frequently). If all else fails, you just fix the damned thing. That is about it: 1) ask nicely, 2) ask nicely with dollars on top, 3) threaten or bluster, 4) get sued by Nike, er, I mean, just do it.

The fact of the matter is that you need something changed in a library you use. You use this library, presumably, because it saves you a bunch of work. So you do a little work now to keep saving yourself a bunch more work. (You might, alternately, decide that the amount of work required to fix it is greater than the amount you saved, in which case you accept that you made a bad call, and go do the right thing.)

Now, a key to understanding this is to realize that the people who wrote this library that is saving you a bunch of work did it to save themselves some work. Frequently others chipped in as well. Shockingly, over time, you wind up with something like this blog, for example, where a lot of folks contributed to apache, mod_gzip, mod_auth_passthrough, mod_log_bytes, mod_bwlimited, mod_ssl, OpenSSL, and PHP, and the combined efforts have enabled a blog. Almost all of them said "oh shit, I need [foo], and [bar] doesn't do it" and went and did it. For the economics of it, go read Yochai Benkler. The gist is "you get out more than you put in."

But, you say, "Gee, I would love to write acts_as_infinitely_scalable but that pesky DHH guy won't let me change things in Rails, only the 'core team' get to do that." First off, that ain't true: they accept patches and have a good record of applying them. Second off, damned straight it is easier for them to do it -- they earned that trust. (Actually, acts_as_infinitely_scalable would probably be declined in favor of making it a plugin.)

They earned that trust for a number of reasons, but I will bet you pennies to bricks of gold that the majority got there because they relied on the project and needed more control over it. Maybe they rely on it because they are consultants who use it as advertising, possibly they need control over it in order to build up their personal ego, not infrequently they just needed increased responsibility in order to become better programmers, but most likely (I am not going to go find the citations, this is a blog not a journal) they were building products which relied on it. I doubt a single one thought about it in terms of "I need more control over [baz]," but in the end, that is what it was.

If you rely on [something], and you see it needs [something else], you do a quick calculation in your head to decide if the value of [something] is greater than the cost of [something else] and either do [something else] for [something] or ditch [something] in favor of [yet another thing] that already does [something else].

Figuring out the value and/or cost of [something], [something else], and [yet another thing] can be a pain sometimes, but that is why we get paid the big bucks :-)

So, the recent rash of controversial benchmarks has inspired me to do some microbenchmarking of my own. I want to test web servers, and specifically *slightly* dynamic stuff: basically, how long does it take to set up and tear down the request processing machinery without doing anything significant with the request (like talking to a database)? I want to isolate and benchmark the actual web part, not anything else. Luckily, there is a perfect test for this: Hello World! All tests were performed on the same machine, and I won't provide the code or configurations (as then they might be considered useful). I did tune each one to the best of my ability for the environment it is in (4 core machine, plenty of ram, etc). So, without further ado, results of a dynamic Hello World on a number of different webapp-ish servers, sorted by performance:

Apache HTTPD (Worker) via mod_hello_world
  Requests per second: 18823.58 [#/sec] (mean)
  A minimal httpd module which just prints hello world to the response.

Apache HTTPD (Worker) via mod_wombat
  Requests per second: 17856.76 [#/sec] (mean)
  Uses a server-scoped mod_wombat handler with a default pool of fifty Lua virtual machines.

Apache Tomcat 5.5.20
  Requests per second: 17644.40 [#/sec] (mean)
  Used a JSP, with a few hundred thousand requests before the benchmark to let the JVM warm up.

Jetty 6.1.1
  Requests per second: 12449.36 [#/sec] (mean)
  Used a JSP, with a few hundred thousand requests before the benchmark to let the JVM warm up.

Mongrel, accessed directly
  Requests per second: 2378.05 [#/sec] (mean)
  Done via an HttpHandler rather than Rails in order to cut down on overhead.

Mongrel, four instances, proxied through Pen
  Requests per second: 2109.91 [#/sec] (mean)
  Basically a test to make sure that ruby's single-threaded nature wasn't the bottleneck. It seems the additional proxying was much more expensive than not using the extra four cores available.

TwistedWeb
  Requests per second: 2089.55 [#/sec] (mean)
  This is the current TwistedWeb, not TwistedWeb 2.

Mongrel (four instances) proxied through LightTPD
  Requests per second: n/a
  A very unstable configuration: I was unable to run more than a thousand or so requests before lighttpd would start losing track of mongrel instances and start returning errors. LightTPD returns errors very quickly, however.

Now, to complicate matters further: Mongrel doesn't support HTTP keep-alive, so I re-ran all the other benchmarks with keep-alive disabled in the client, and the numbers dropped by about half in every case. The TCP three-way handshake seems to matter a lot for micro-benchmarks.
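For reference, the keep-alive comparison is the kind of thing ApacheBench does out of the box. The author published neither commands nor configs, so the URL, request count, and concurrency below are invented; only the flags are real:

```shell
# ab ships with Apache httpd; -n is total requests, -c is concurrency,
# and -k turns HTTP keep-alive on (it is off by default).
ab -k -n 100000 -c 10 http://localhost:8080/hello   # with keep-alive
ab    -n 100000 -c 10 http://localhost:8080/hello   # without: a fresh TCP handshake per request
```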

Finally, this is a wholly unscientific set of benchmarks which is basically useless, so please don't read anything important into it. Any real system is almost certainly not bound by request handling in the application server :-)

Mongrel is getting a lot of good (and deserved, in my opinion) attention lately as an app server for ruby. One of the things that bothered me about it, for a good while, was this decision, explained in a comment in mongrel.rb:

# A design decision was made to force the client to not pipeline requests. HTTP/1.1
# pipelining really kills the performance due to how it has to be handled and how
# unclear the standard is. To fix this the HttpResponse gives a "Connection: close"
# header which forces the client to close right away. The bonus for this is that it
# gives a pretty nice speed boost to most clients since they can close their connection
# immediately.

Interestingly, the whole HTTP status line and first couple of headers are a constant, frozen string -- short of patching mongrel or using your own TCP connection handling in your Handler, it *will* close the connection a la HTTP 1.0.

I know Zed is an awfully good programmer, so this decision really irked me. I recently asked why this was so, and the answer amounted to ~"because it fits the use case for which mongrel is intended, and makes life easier," which is valid. So, how does it fit this use case?

If you think of Mongrel as being designed to run fairly big sites with one dynamic element and mostly static elements, then this decision works. Basically you have Mongrel serve the dynamic page (possibly from Rails) and go ahead and close the connection, because you know the same server isn't going to receive a followup resource request immediately; those are handled by servers optimized for that, or by a content distribution network. In this case the Connection: close on the initial request makes sense: the browser is going to be opening additional connections to a different host (or hosts, for a CDN or round-robined static setup) which will pipeline requests for resources.

Yahoo! is a good example of this: the initial response headers for the front page, fetched from www.yahoo.com, include the Connection: close header.

Mongrel is not designed to be a general HTTP server. However, put Apache 2.2 with the worker mpm and mod_proxy in front of it (making sure to strip out the Connection: close header) and you have a pretty decent setup for a high-load system. Just make sure static resources (including page caching) get served up by apache, not Mongrel :-) This will work best when Apache and Mongrel are on the same machine to reduce the overhead for mod_proxy's connection establishment, but given a fast network, the local connect will be far from the bottlenecks for dynamic pages (and Apache is serving the statics directly).
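One possible shape for that front end, sketched from memory rather than taken from any published config: the directive names are real Apache 2.2 ones, but the paths, port, and module file locations are hypothetical.

```apache
# Hypothetical sketch of an Apache 2.2 front end for a single local Mongrel.
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
LoadModule headers_module modules/mod_headers.so

# Static files (including Rails page caches) come straight off disk.
DocumentRoot /var/www/app/public

# Dynamic requests go to the local Mongrel instance.
ProxyPass /app/ http://127.0.0.1:3000/
ProxyPassReverse /app/ http://127.0.0.1:3000/

# Mongrel always answers "Connection: close"; strip it so Apache can
# keep the client-side connection alive.
Header unset Connection
```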

I have been corrupted by JavaScript. Ruby really annoyed me when I
could not just add properties and functions totally willy-nilly to
instances. Ruby makes it easy to reopen classes, or even instances,
but it is not so obvious how to do so without creating a new scope
(and hence losing your closure).

Aside from the fact that this is a vicious hack, it relies on the
fact that send lets you invoke private methods
(define_method) on the singleton class (more recently
going by the alias of eigenclass) of foo. This is pretty
much a bug and a flagrant violation of the object's encapsulation.
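The hack itself did not survive in this copy of the post, but from the description it had roughly this shape (names here are made up; the trick is send-ing the private define_method to the singleton class):

```ruby
foo = Object.new
greeting = "hello"  # a local we want the new method to close over

# Grab foo's singleton class (a.k.a. eigenclass)...
singleton = class << foo; self; end

# ...and use send to smuggle a call to the *private* define_method past
# the access check. The block keeps the surrounding closure.
singleton.send(:define_method, :greet) { "#{greeting}, world" }

foo.greet  # => "hello, world"
```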

A nicer way to do it, to my mind, is to use anonymous modules. The
really nice part of using anonymous modules is that the
Module constructor takes a block which is evaluated in
the context of the new module, letting you legally call
private methods, like define_method.
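A sketch of that approach (again reconstructed, not the original code): the block given to Module.new is evaluated in the new module's own context, so the private define_method is a legal call, and extend mixes the result into just the one instance.

```ruby
foo = Object.new
greeting = "hello"

# Inside Module.new's block, self is the new module, so define_method
# can be called directly -- no send needed, and the closure survives.
mixin = Module.new do
  define_method(:greet) { "#{greeting}, world" }
end

# Mix it into this one instance only.
foo.extend(mixin)

foo.greet  # => "hello, world"
```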

After the Silicon Valley Ruby Conference I wanted
to hack some at a declarative language for web service
orchestration. Tests pass, but the web service bindings
don't exist yet =( Right now it can be used as a nice
declarative process for... er, ruby?

annoys me. It's needed to lazily evaluate the construct; using a
real if there evaluates it at the wrong time. Astute
code readers will also notice I am not testing the transform. Haven't
done that yet -- I will probably remove it as a first-class concept
if I do anything more with the code. A transform is just a local
service, and services are all wrapped in ruby methods, so instead
of importing the service, just provide the transform and voila, your
toast is burnt.
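The evaluation-time point can be shown with a toy example (entirely hypothetical, not the orchestration code): a bare if runs while the declaration is being built, while a lambda defers the check until the declared process actually runs.

```ruby
steps = []
ready = false

# Evaluated right now, while we are still *declaring* the process --
# ready is false at this point, so the step silently never makes it in.
steps << :early_step if ready

# Deferred: the lambda is not called until the process is run.
steps << -> { ready ? :late_step : :skipped }

ready = true  # by run time, the world has changed

# "Run" the process: call anything callable.
results = steps.map { |s| s.respond_to?(:call) ? s.call : s }
results  # => [:late_step]
```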

Copies of my Rails presentation from ApacheCon US '05 are up
(finally). My apologies for having the wrong one on the CD from the
conference. I think the CD makes up for it though by including a
presentation on Drools under the Rails talk demonstration, along
with the older version (and probably more useful to a non-live
audience) of the Rails slides.

This was a fun, though somewhat scary, talk to give. It was fun
because the subject is dear to my heart (I do believe that Ruby (and
Rails) is better for a lot (not all) of the development being done
right now). It was scary because Rails is such a hot topic that it
led to a really big audience and probably high expectations. Then
Craig and Craig started asking questions -- Craig and Craig being
rather eminent people in their respective fields, those fields being
object/relational mapping and web frameworks (both rather relevant
to Rails).

Feedback afterward indicated that the presentation was useful and
enjoyable, though. Phew!

If I can figure out a way to post the presentation complete with the
undead (not live, but it looks live) code I will. Right now that
would mean posting a 70 meg keynote file though and I don't want to
inflict that on my bandwidth consumption =(

ps: an interesting conversation occurred right before the talk. It
was interesting for two reasons. First, the guy I was talking to (at
ApacheCon) responds to an @microsoft.com email
address. Second, he was looking for someone to give a talk about
Rails (not for Microsoft directly, though).

I saw Bruce's post about (the incredibly useful) pbcopy/pbpaste today. They rock! That said, they are not quite what I need a lot of times, so here are a couple quick tools I whipped together for ye shell bangers.

get and put: These are basically a clipboard for the shell. get
takes input from standard in and stashes it, like yank or copy in a
gui. put does the opposite: it takes whatever get stashed and spews
it to standard out (paste). No biggie, but useful.
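The tools themselves did not survive in this copy of the post, but from the description they could be as small as this (the stash file location and the implementation are my guess at a reconstruction):

```shell
#!/bin/sh
# A shell clipboard: get stashes stdin in a file, put replays it.
CLIP="${TMPDIR:-/tmp}/shell-clip-$(id -u)"

get() { cat > "$CLIP"; }
put() { cat "$CLIP"; }
```

So `ls | get` stashes a listing, and `put | grep foo` digs through it later.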

Okay, so this is like kicking a dog with three legs, but I cannot
resist =) Apparently, Java is good
enough. Michael has some very good points -- the biggest being
that languages have become dominant in the past when they have
ridden a new application paradigm. Java rode the move from
client/server to... a different client/server with a web client
backed by a gargantuan app server playing the same role as the
mainframe used to, or as he points out, with the mainframe still
right behind it.

Apparently Java is good enough -- specifically in the arena of web
app development. I think he misses Java's real strength which is
that there is a huge pool of mediocre developers available who can
do good enough work when given tools indistinguishable from magic =)
This is a huge benefit, which is not to be underestimated,
and no, my tongue is not in my cheek when I say that, really. The
productivity level achievable with IDEA (or Eclipse, or NetBeans)
and Java for the Corporate Developer (Sun's marketing term for who
they aim Java at, at least the one they say to people) is immense.

So, because we have tools which make the language irrelevant, is it
really Java we care about? I think we need to stop calling ourselves
Java Developers and start being honest about it, we are IDEA
Developers, Eclipse Developers, or, to be completely honest, XMLSpy
Developers.

You master your tool and can work wonders with it. I have macros,
er, I mean Live Templates, for probably a third or more of the code
I actually write. tnuo[tab], test[tab], set[tab], tear[tab],
xsde[tab], puts[tab], itar[tab], fore[tab], itli[tab],
etc[tab]. Nice. Now, consider for a second an alternate
approach. Consider a tool which made it really easy to make those
macros, by say, interpreting them at runtime. You'd have to pick
better combos than tnuo (throw new
UnsupportedOperationException("come back and implement
me!");) but funny how it works. Now, that could be kind of
handy. What if you can do the same thing for how the tool
works?

Sure, Java is good enough. So are two penny nails. Unless you are
trying to eat ice cream. Then a spoon helps.

Ruby isn't going to replace Java. Java is a marketing term for
Gnome. Ruby isn't a window manager. Well, er, it does play nicely
with them though, not too
bad, actually. Sorry, back off the tangent. Ruby isn't going to
replace Java like Java replaced C++. Java replaced VB (which is an
awesome tool around a ... hey, wait a minute!).

Java is definitely good enough. The key thing is for it to evolve to
stay good enough. Will it? Well, the language is under the control
of a committee of committees of people who dislike each
other. Luckily, we aren't actually Java Developers, we are IDEA
Developers, and IDEA is under the control of some very big-thinking
Russians. I think us IDEA Developers have some potential, not sure
about the Eclipse folks though, they have the whole open source
thing going, but it seems like most of the control over Eclipse is
by people who sell competition to it and use it as a testbed for new
research ideas ;-) So it will either be an awesome amalgamation of
cutting edge software engineering research and heart warming open
source community goodness, or a bike shed. Overall, not bad. I think
us IDEA and Eclipse, and even the NetBeans Programmers, have a
future.

Seriously now, IDEA is a fantastic, er, language (?) when you have
big complex systems with a significant number of developers. Static
typing, good garbage collection, a good compiler, excellent code
navigation, and medium-level performance for the actual
runtime (note that performance != scalability). It has fantastic
libraries via its Java runtime system, and bindings to most every
information system around. IDEA has legs.

I still prefer Ruby, personally, but I do more work in IDEA than I
do in Ruby, so take that as you want.

Side note: Fortran still being used because it is "good enough" is
bollocks. Fortran is still used because nothing is faster when you
need hardcore number crunching. Part of this is the immense
optimization that has gone into compilers, the very mature vector
processing libraries, the harsh limitations in what you can actually
describe in the language in order to help the compiler optimize the
code better (us IDEA developers have complained about non-reentrant
EJB's, how about non-reentrant *functions*) etc. Most scientists I
have met (this is not a scientific survey, I don't know that many)
prefer Perl. Go figure.

Hmm, that is nifty. So what though? Well, this will work
transparently between ruby and Java (via the JMS API) --
right now just on the most performant and easiest to use open source
JMS implementation around (just my opinion) -- but with just a
couple hours work, any JMS implementation.

I was going to wait to post much more about this until I'd had a
chance to push together Perl, Python, PHP, Bash, PLT Scheme, and
maybe an SBCL implementation of the client -- but comments on the TSS
kinda pushed me over the edge, so I'll just post now =)

The TTMP protocol has changed some since my last post, but the
basics are the same. It will be changing some more, but a solid 1.0
protocol spec should be available after this coming weekend (unless
I have too much fun up in NYC with Patrick). The
implementation for ActiveMQ is in
subversion now and should be available with the upcoming 3.1 release
-- you are welcome to grab the snapshots, or build one to play. Once
I am happy I'll put a tarball up with a default ttmp handler,
alongside a default optimized binary (for the Java and C# clients).

Ruby isn't the threat to
Java; vendors jockeying for advantage at the expense of their
users in the standards game is the threat to Java. I wholly
agree with Jason that Java is being disrupted though.

Ruby is a fantastic language, but not one which will "supplant" Java
(I still
believe that whatever the next dominant language is, it will
look and smell like Scala). I
have a sneaking suspicion that language diversity is picking
up. Sure, something will dominate like Java, C++, Fortran, Pascal,
COBOL, etc have -- but for a while there won't be. Ruby is one
option -- it has been my preferred language for a few years -- and I
use it where I can and where it is appropriate. That is actually
more and fewer places than might be expected. I cannot think of the
last significant Java project I have worked on which didn't have at
least some one-off ruby code generators, for instance. Will I stop
writing Java? Heck no -- I like Java, for all its foibles and flaws
(just as I do Ruby, for all its foibles and flaws). Pick the best
technology for the job -- sometimes that is even Scheme (which I
used most heavily when my primary role was systems admin stuff, go
figure) =)

So, I wanted to post some trackbacks to blogs.apachecon.com and, well, I don't use a fancy blog entry editor thing which can do trackbacks for me. In case anyone else is in the same situation, here is a little command line tool for posting trackbacks: trackback. Unlike the actual spec, all fields are required for this client -- sorry, it is 10 minutes of coding between sessions =)

brianm@kite:~$ trackback -h
Usage: trackback [options] excerpt
Options:
-b, --trackback-url VALUE The url to post the trackback to
-u, --url VALUE URL of the response post
-n, --blog_name VALUE The blog name for the trackback
-t, --title VALUE Title of the response post
-h, --help Show this message
brianm@kite:~$

If you don't specify a field on the command line it will prompt you for it -- should be friendly enough.
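For anyone rolling their own instead, the wire format is simple: per the TrackBack spec, a ping is just a form-encoded POST. A minimal Ruby sketch (the function names are mine, not from the tool above):

```ruby
require "net/http"
require "uri"

# Build the form-encoded body the TrackBack spec expects.
def trackback_body(url:, title:, excerpt:, blog_name:)
  URI.encode_www_form(url: url, title: title,
                      excerpt: excerpt, blog_name: blog_name)
end

# POST the ping to the target's trackback URL.
def post_trackback(trackback_url, **fields)
  Net::HTTP.post(URI(trackback_url), trackback_body(**fields),
                 "Content-Type" => "application/x-www-form-urlencoded")
end
```

Usage would look like `post_trackback("http://example.com/tb/42", url: "http://myblog.example/entry", title: "Re: sessions", excerpt: "...", blog_name: "my blog")`, with example.com standing in for a real trackback endpoint.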