Ryan "zenspider" Davis, Hardcore Ruby Hacker


Bio Ryan Davis is an object-oriented languages bigot. He cut his teeth on Smalltalk, which made him a really bad C programmer. He got into Ruby when it came around in 2000 because "it just felt right." Ryan is a hacker in the true, positive sense: He spends a lot of time writing code and the Ruby community is all the better for that fact.

I've always been an object-oriented languages bigot. I cut my teeth on Smalltalk originally, and it fell in naturally. It changed the way I thought, and I became a really bad C programmer very quickly because I started thinking in objects about everything. When Ruby came around in 2000 it just felt right. It let me code in ways that my brain was already thinking, and it really had an impact on the way I was able to code. At the time I was doing some Perl work, some Java, some C/C++, and there was always something in the language fighting against me. Ruby didn't do that. It got out of my way and allowed me to express myself.

I spend a lot of time writing code. I kind of bleed between regular software engineering and QA development, QA tools development, so I write code that hurts code and I write code that hurts systems pretty naturally. At Amazon we were working on systems to make QA for non-programmers easier: web-site interaction, stuff like that. We were doing DSLs for web testing, and we were doing that in a combination of Perl and Java. The engineering was OK, but it got to the point where we weren't able to scale it up anymore and it started to really hurt.

It was tough. We were working on some really core stuff and the best we had was the Ruby Hacking Guide coupled with Excite's Japanese/English translator, which is amusing. They don't have 'parsers' apparently in Japan (they have got 'pursers', for example).

I've always been a language bigot. Languages are a hobby of mine. I like writing little languages to describe problems, and that's basically called a DSL now. Language analysis for Ruby itself is actually something that I like to focus on. While we were working on a number of projects, it kept coming up that we needed to be able to analyze the language more, so we started working on a project called ParseTree. It allows us to extract the parser information from Ruby and make it digestible to regular Ruby programs.

It led to a number of things. I became friends with INGY, Brian Ingerson, who is the author of Perl's Inline module, through a common friend of ours. He basically nonchalantly asked "Why can't we do that in Ruby?" I said "We can, but how much will it take?" and it turns out that it takes about 63 lines in order to inline C code in your Ruby and have it automatically compiled. It's become a more complex business with a lot more stuff that we've added to it. The nice thing was that it existed as a collaboration between us. They took some ideas away from our project and reintegrated them into Inline.
On top of that we used it as a foundation to build a lot of other tools. We've got ParseTree, which extracts ASTs, and built on top of that a bunch of language analysis tools that allow us to do type inference or just lexical analysis. We were able to do dynamic translation really, really cleanly with a nice architecture. We've written things like Ruby to C, Ruby to Ruby... so instrumentation and injection of code is really a simple thing for us to do.

You'd do that for performance reasons, or for binding to external libraries that you haven't written, or for getting into Ruby's internals. So we used Ruby Inline in order to do ParseTree because it was the simplest way to get to the Ruby internals. We've also used it to attach to third-party C libraries, although sometimes things like DL are a more natural fit for that. DL is the dynamic linker library that ships with Ruby. It allows you to point to a dynamic object library (an SO, or a DLL on Windows), specify in fairly Ruby terms what the interface is, and call it as natural Ruby and have it linked through. It's all done through dynamic linking with the interfaces that C provides, so it doesn't require any compilation or anything. You just have to have the binaries available. We take a different approach: we're actually writing C files, seeing if we need to recompile them, and recompiling them and linking them back in on the fly.

Basically where I see languages going is you're getting to the point where either you have a purely dynamic language with things like Smalltalk and LISP falling in that area or a fully compiled language like C or C++.
Then there's stuff in the middle basically called scripting languages. It's no longer about whether they're compiled or interpreted, but how open or closed they are in what they provide. Scripting languages are in-between purely static and purely dynamic languages.
Where Perl, Python, and Ruby fall down and drop the ball is by not having a purely open meta-object model where you can manipulate how the language works at any level. Smalltalk and LISP allow you to say how method dispatch works because that stuff is available to you at every level. In Smalltalk it's because it is Smalltalk. I haven't done as much intricate work with LISP, but as far as I can tell it's because it's fully meta-circular.

It's a question I get a lot. I love those languages, but they are not broken. They don't need as much work and I work with things that are broken, because that's just how I'm wired. I see where red flags are raised and I go attack them.

ZenTest has really grown up in the last year. We've extended it a lot. It originally started off as just an auditing and code generation tool, where it would use Ruby's really strong reflection capabilities to look at your implementation and look at your tests, whether they existed or not, and compare the two using some naming rules. It could look at your test and say: it's got test blah but it hasn't implemented blah yet. So you generate that on the fly and just have it raise an exception saying "I'm not implemented yet". That lets you, say, take the TDD approach.
Or if you adopt a new project and you're going to try to backfill it or audit its tests to see how strong they are, you can compare the two and see that you've got a method on your implementation that doesn't have any corresponding test, and it will generate those on the fly such that they fail.
That lets you really quickly ramp up and audit your code against some fairly flexible naming schemes. Some people don't like the flavor, but it's pretty flexible in that you can do many tests against a single implementation and test all your edge cases separately, which I think is a very good practice.

It supports both models and it's supposed to be bidirectional and pretty open to anyone's programming practices. I use it on the TDD side so that I can bypass the first initial steps of even creating an implementation file. I can start writing tests, get my first test case in place and instead of seeing "I don't know how to require that, I don't know what that class is, I don't know what that method is", we can skip all those.

It generates the implementation method against the test name. So if you have a test, test_do_something, it expects to see do_something on the implementation side also, and the class name is mapped as well, so you've got a one-to-one correspondence between a test class and an implementation class, and then a many-to-one mapping between test names and implementation names. We can see that we have a particular test and we generate the implementation on the fly, including the class definition if needed, which is fine because with Ruby's classes being open you can keep reopening them over and over. So it will generate you a new implementation method that raises an exception saying "I'm not here yet", but that lets you focus on your test for the time being and get that part done with. And when it comes time to implement it you just fill in the blanks.
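
The name-based audit described above can be sketched with plain reflection. This is a minimal illustration, not ZenTest's own code: the Account and TestAccount classes and the audit and stub_missing helpers are hypothetical names, and the matching rule (test_<method> or test_<method>_<variant>) is an assumption about the naming scheme.

```ruby
# Hypothetical implementation and test classes, used only for illustration.
class Account
  def deposit(amount); end
  def withdraw(amount); end
end

class TestAccount
  def test_deposit; end
  def test_deposit_negative; end   # many tests may map to one method
  def test_close; end              # no Account#close exists yet
end

# True when the test name is test_<method> or test_<method>_<variant>.
def covers?(test_name, method_name)
  test_name == "test_#{method_name}" || test_name.start_with?("test_#{method_name}_")
end

# Compare a test class against its implementation class by name.
def audit(impl, test)
  impl_methods = impl.instance_methods(false).map(&:to_s)
  test_methods = test.instance_methods(false).map(&:to_s).grep(/\Atest_/)

  # Implementation methods that no test name refers to.
  untested = impl_methods.reject { |m| test_methods.any? { |t| covers?(t, m) } }

  # Tests whose target method does not exist on the implementation.
  orphans = test_methods.reject { |t| impl_methods.any? { |m| covers?(t, m) } }

  { untested: untested.sort,
    unimplemented: orphans.map { |t| t.sub(/\Atest_/, "") }.sort }
end

# Reopen the class and add stubs that raise until they are filled in.
def stub_missing(impl, names)
  names.each do |name|
    impl.class_eval do
      define_method(name) { |*_args| raise NotImplementedError, "#{name} is not implemented yet" }
    end
  end
end

report = audit(Account, TestAccount)
stub_missing(Account, report[:unimplemented])
```

Here the first audit reports withdraw as untested and close as unimplemented, and the stub makes Account#close raise until someone writes it.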

ZenTest is my oldest product. It is actually very useable. The test-auditing aspect of it might not be for everyone, but there are a lot of other components that are available in ZenTest that are useable for everyone in every context.

Probably the two neatest things that it has are called autotest and unit_diff. Unit_diff is the easiest thing to convey, so I'll start with that. Basically it is a pipe filter: you run your tests piped through unit_diff, and when you get a failure, instead of getting a very large blob of expected output against a very large blob of actual output, it runs those blobs through a diff and shows you exactly what's different. It lets you focus on your error almost immediately, instead of searching through a bunch of text in order to figure out what's wrong.
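
To illustrate the idea of diffing the expected and actual blobs, here is a small pure-Ruby sketch. It is not how unit_diff itself is implemented; line_diff is a hypothetical helper that uses a textbook longest-common-subsequence table so that the example needs no external diff program.

```ruby
# Line-based diff using a longest-common-subsequence (LCS) table.
# Returns only the lines that differ, prefixed with "- " or "+ ".
def line_diff(expected, actual)
  a = expected.lines.map(&:chomp)
  b = actual.lines.map(&:chomp)

  # lcs[i][j] is the LCS length of a[i..-1] and b[j..-1].
  lcs = Array.new(a.size + 1) { Array.new(b.size + 1, 0) }
  (a.size - 1).downto(0) do |i|
    (b.size - 1).downto(0) do |j|
      lcs[i][j] = if a[i] == b[j]
                    lcs[i + 1][j + 1] + 1
                  else
                    [lcs[i + 1][j], lcs[i][j + 1]].max
                  end
    end
  end

  out = []
  i = j = 0
  while i < a.size && j < b.size
    if a[i] == b[j]
      i += 1
      j += 1                       # unchanged line: not reported
    elsif lcs[i + 1][j] >= lcs[i][j + 1]
      out << "- #{a[i]}"           # present only in the expected text
      i += 1
    else
      out << "+ #{b[j]}"           # present only in the actual text
      j += 1
    end
  end
  out.concat(a[i..-1].map { |line| "- #{line}" })
  out.concat(b[j..-1].map { |line| "+ #{line}" })
  out
end

puts line_diff("alpha\nbeta\ngamma\n", "alpha\nBETA\ngamma\n")
```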
Autotest, which I'm falling in love with, was written by Eric Hodel. It understands Rails layouts especially, but also standard Ruby file system layouts, and it understands naming conventions in various formats, such that it can see that you have made a modification in a particular implementation file, or test file, view file, controller, model, whatever, and it'll go run the corresponding tests automatically in a continuous feedback loop. Basically what it is, is Tinderbox or CruiseControl before you've committed. It's continual and always there. So you just bring up a terminal, go to the right directory, fire up autotest and let it go. It runs your unit tests, your functional tests, and your integration tests and then just sits there and waits. If you go and modify a test such that you get a failure, it knows how to get that down so that you only rerun what you need to rerun in order to get those things green again. Once you're green it will rerun everything again to make sure you didn't break anything behind the scenes, automatically. It is the tightest feedback loop I've ever seen. It is great.
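
The core of that loop is a mapping from a changed file to the tests worth rerunning. The sketch below shows one way such a mapping could look. The directory layout and file-naming rules are assumptions chosen for illustration and are not autotest's actual rule set; tests_for and tests_to_run are hypothetical names.

```ruby
# Map one changed file to the test files that should be rerun.
# All path conventions here are assumed for the sake of the example.
def tests_for(path)
  case path
  when %r{\Atest/.+\.rb\z}
    [path]                                  # a test changed: rerun it
  when %r{\Alib/(.+/)?(\w+)\.rb\z}
    ["test/#{$1}test_#{$2}.rb"]             # lib/foo.rb -> test/test_foo.rb
  when %r{\Aapp/models/(\w+)\.rb\z}
    ["test/unit/#{$1}_test.rb"]
  when %r{\Aapp/controllers/(\w+)\.rb\z}
    ["test/functional/#{$1}_test.rb"]
  else
    []                                      # nothing known to rerun
  end
end

# Collapse a batch of changed files into a unique list of tests.
def tests_to_run(changed_paths)
  changed_paths.flat_map { |path| tests_for(path) }.uniq
end

puts tests_to_run(["lib/account.rb", "test/test_account.rb", "README"])
```

A real watcher would wrap this in a loop that polls file modification times, runs the selected tests, and falls back to the full suite once everything is green.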

No, you still do have that. What we didn't want to do at the time was accidentally infect our own results, so we're shelling out and invoking the right test files, sometimes with -n in order to name the particular test case we want, but we're still invoking that, so we still get that cost. But we're working on tools to address that as well. We're building a tool that will be hooked in automatically from autotest and that will load up config/environment.rb, which is the bootstrap mechanism for Rails and incurs the most cost for loading much of anything. It will load all that stuff up, get everything initialized, set up the paths and everything, and then open up a socket with an interface to interact with it, saying "I want you to run this test class with this test method." It will fork, so that you've copied your cost of initializing, and run just those things, so you don't have that cost anymore. Basically we've seen a 10x improvement in speed.
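
The load-once-then-fork idea can be sketched in a few lines. This is only an illustration under stated assumptions: expensive_setup stands in for loading config/environment.rb, a pipe replaces the socket interface mentioned above, run_in_fork is a hypothetical helper, and fork is only available on Unix-like platforms.

```ruby
# Stand-in for an expensive bootstrap such as loading a Rails environment.
def expensive_setup
  sleep 0.2
  { loaded_at: Time.now }
end

# The cost is paid once, in the parent process.
ENVIRONMENT = expensive_setup

# Run the given block in a forked child, which inherits the already
# initialized state, and return what the child reports as a String.
def run_in_fork
  reader, writer = IO.pipe
  pid = fork do
    reader.close
    result = yield
    writer.write(result.to_s)
    writer.close
    exit!(0)              # leave without running the parent's at_exit hooks
  end
  writer.close
  output = reader.read
  reader.close
  Process.wait(pid)
  output
end

# Each run starts from the preloaded state instead of booting again.
puts run_in_fork { "environment loaded at #{ENVIRONMENT[:loaded_at]}" }
```

Because every child starts from a copy of the initialized parent, a test that mutates global state cannot leak into the next run.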

ZenHacks is my playground. It's where all the stuff goes that is dependent upon these packages: ZenTest, Ruby Inline, ParseTree, RubyToC. It's the nice and shiny and really interesting stuff; they are the fun projects. They go in my playground, and that's where we've got things like Ruby2Ruby, which translates Ruby into an AST and then back into Ruby.
An AST is the internal representation of what the parser sees when it looks at your code. If you have a method with an "if true" in it, then you're going to have a tree structure that actually represents that if; a tree structure that is easy to evaluate.
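
To see what such a tree looks like, the sketch below uses Ripper, the parser library bundled with Ruby, rather than ParseTree. The node names Ripper produces differ from ParseTree's, so treat this only as an illustration of code represented as nested arrays.

```ruby
require 'ripper'
require 'pp'

# A tiny conditional to parse.
SOURCE = "if true then 1 else 2 end"

# Ripper.sexp returns the syntax tree as nested arrays (an S-expression).
TREE = Ripper.sexp(SOURCE)

# Pretty-print the tree; the conditional shows up as a node tagged :if.
pp TREE
```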

To the application developer it doesn't matter as much. What we were able to do, though, as tool developers, was use that to provide them with tools that will give them more insight into their code. For example, we can take that if statement, the conditional, and we can instrument it automatically by converting it into an AST, analysing it, and dealing with the conditionals, injecting code so that we can do things like measure for coverage, measure for number of hits, do basic profiling on subsections of code, or trace at a conditional level instead of a method level. We were able to do a lot of stuff that we can provide them, that they can use in order to get insight and improve their own code.

One of the nice things in Ruby is that it has an amazing runtime with a lot of reflection and dynamic abilities. For example, the debugger is written in Ruby. The profiler is written in 53 lines of Ruby! I think it's amazing that we've got a language available to us where we can write really heavy tools in something that prints on a single page of code.
The debugger is a much bigger product. I think it falls into the 300-400 line range, but both of them suffer from some of the costs of being so dynamic, and they're actually pretty slow when you're trying to do particular things. The profiler is just dirt-slow in general, because it's hitting everything. The debugger's slow when you start doing evaluated conditionals and breakpoints.
What I've been able to do is use things like Ruby Inline and my understanding of the Ruby internals to bypass the costly mechanisms, but still get all the dynamism from it. I was able to convert the 53 lines of slow profiler into a 150-line translation, where almost all the profiler code is exactly the same and the only thing that really differed was how we dispatched that into the Ruby code. We bypassed the slow mechanism, and we got some Ruby Inline code that lets us go directly around the costly parts and hit that. The nice thing was that we were able to abstract that, since the invocation cost is the same between the profiler and the debugger. So we now have the exact same thing being reused in the debugger, and we were able to do it in such a way that we're maintaining on our side about 10 lines of the debugger in order to stay completely compatible with the old code.

Not that much. The reason is not the traditional reason. I don't litter my code with puts in order to debug and stuff like that. I don't use a debugger nearly as much, because I write a lot of tests. I do a lot of TDD, I look at my output and if I see something wrong, if I'm running a web app or Rails app and I see something wrong, I say "What test didn't I write?", and I go replicate it there. That lets me nail it down in such a way that I can figure out what the problem is and I can prevent it from ever regressing. That's a double win.

Rubyholic is a Rails app that I wrote to meet the need of Ruby user groups wanting a consolidated place for people to figure out information about their cities' Ruby groups, locations etc. It's a site. It's a directory site. It's MeetUp for Ruby only.

I wrote it using what I now call "No-Peek TDD". I implemented the thing from scratch, from line one, using TDD, and at no point from initial development until launch did I load it in a browser and look at it.

We've implemented the entire thing in a little under a week and a half and most of that was done starting at MindCamp last year in Seattle, over a 24-hour period, and then I did some more development on my vacation.
We wound up doing a private beta mid-November and found some glaring holes, and the holes always wound up being in places where we didn't write any tests whatsoever, where we refactored the view side early and didn't write any tests to replicate various sections of the website. So, over-eager code reuse was where we got bit. The nice thing was that we did a private beta and we got reports back that various actions were not saving, which is basically what it always came down to.
As we refactored our controllers, well, we were able to reuse a lot of code through helpers. We just put these things in and we didn't have the corresponding tests, and didn't have the corresponding action that actually saves. What wound up happening was they said that such and such section wasn't saving anything when we did the Ajax callback, and when I looked, sure enough there was no action to do that, and no test to say we didn't do it right.

Yes, it's missing functionality. It was stubbed out properly; we started off with paper and crayon, we moved that into skeletons and never plugged them in. The nice thing was that we had such well-factored tests that we could get a description saying "you didn't do this"; we'd write the test and the implementation and it was instantaneous, we'd push it back out and it would be plugged in right.
The only other thing that we had beyond that, and this was done as a social experiment on purpose, was that we didn't use validates or anything; we wanted to go completely TDD and see how well that worked. We had places where people would put in, I think on purpose, bad data in order to see what happens, and that would go through.

I am a dynamic languages proponent, thus far, with the languages I do have experience with. I have yet to go with languages like Haskell or Erlang, where there is static typing that tends to get in the developer's way with how they think. I think on the fly. I would like to be able to optimize later and express myself first. I want to be able to get into a text editor and get a language that's going to let me think in the code and brain dump how I see the code and make it work. Static languages make me stop and deal with extraneous data that usually doesn't help me.

I think they're certainly getting played. We have plenty of examples with UnCommon Web, with SeaSide, with a bunch of dynamic language web frameworks that work perfectly well without static typing, Rails obviously being an example of that. I think they are certainly getting played by typing proponents. There are many examples of statically typed programs that compile perfectly and are incorrect. Typing does not prove your program is correct. Typing proves your types are correct, and that's about it. You still have a lot of other aspects that go wrong, left and right.

No, it doesn't mean that at all. All that it means is that things line up with regard to minor details, actually. Nine times out of ten the fact that my long matches with a long rather than a char doesn't really matter. Languages like C which are not actually strongly typed, in that they are actually dynamically typed, wind up being just as bad and you wind up masking that. It is almost like waterwings, where you have this false sense of safety. It doesn't really help, and you wind up spending a lot of extra time dealing with that type of stuff. I think the 80-20 rule applies.
I would rather spend less time dealing with typing and more time getting my code done, writing unit tests and functional tests to ensure that my code works at a top level, where it should be, than having a compiler get in my way and tell me that my types don't line up when it doesn't matter in the first place.
All the supposed benefits of static typing are thrown out when you start doing really dynamic programming in the first place. As soon as you start doing meta-programming, as soon as you start generating code on the fly, you either need the compiler built into your system, working for you, which kind of defeats the purpose in the first place, or you need to be able to express yourself in the most dynamic way possible and have the compiler or static type checker get out of your way.

In the simplest sense, meta-programming is writing code to write code for you. It is generating code on the fly such that you can express yourself succinctly. You can say, for example, that a Rails model has many addresses, and it's going to generate all the accessors, the generators, the validation checks, and the consistency checks for you, so that you can express yourself in one line of code and convey a lot.
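
A stripped-down version of that kind of one-line declaration can be built with define_method. This sketch is not Rails' has_many; the HasMany module, the Customer class, and the three generated methods are hypothetical and only show how one declaration can expand into several methods at class-definition time.

```ruby
# A toy class-level macro that writes methods for you.
module HasMany
  def has_many(name)
    ivar = "@#{name}"

    # Reader: creates the collection lazily on first use.
    define_method(name) do
      instance_variable_get(ivar) || instance_variable_set(ivar, [])
    end

    # Writer with a simple consistency check.
    define_method("#{name}=") do |items|
      raise ArgumentError, "#{name} must be an Array" unless items.is_a?(Array)
      instance_variable_set(ivar, items)
    end

    # Predicate: true when the collection is non-empty.
    define_method("has_#{name}?") { !public_send(name).empty? }
  end
end

class Customer
  extend HasMany
  has_many :addresses    # one line generates three methods
end

customer = Customer.new
customer.addresses << "1 Main St"
puts customer.has_addresses?
```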

Being able to speak in the vocabulary of the domain. The fact that you're going to be writing something succinctly, more or less in an English manner, so that domain experts can understand what you're doing. Being able to say that a customer has many addresses means something to the business analyst. Showing them reams and reams of code and saying "this means this" doesn't mean anything. So bringing the interaction between the programmer and a domain expert to a higher level, so that they can converse in English and have a common vocabulary, means a lot and it cuts down a lot of miscommunication. It also cuts down a lot of maintenance.

Ruby provides a pretty straightforward manner in order to describe things at a DSL level and have them read in a fairly English-like way. Ruby uses funny syntax words like "if", "then", "do", "end", "begin", and that conveys. It also has a very strong reflection mechanism and dynamic programming system, so that you can have it generate a lot of code behind the scenes that you don't see, such that it is doing a lot of work for you but is expressed in the highest-level terminology possible.

No, it's just a natural fit and you get a lot for free. You can use language-parsing tools in order to write things in a completely custom grammar, but the fact that Ruby's syntax and semantics are as clean as they are lends itself to DSLs very naturally. It's "bang for the buck!"
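
As a small illustration of how little machinery an internal DSL needs, the sketch below evaluates a block against a builder object with instance_eval. The SiteCheck class and its visit and audit words are hypothetical, loosely modelled on the web-testing domain mentioned earlier in the interview.

```ruby
# A tiny internal DSL: the block's bare words become method calls
# on the SiteCheck instance, which records them as steps.
class SiteCheck
  attr_reader :steps

  def initialize(&block)
    @steps = []
    instance_eval(&block)   # run the block with self set to this object
  end

  def visit(url)
    @steps << [:visit, url]
  end

  def audit(what)
    @steps << [:audit, what]
  end
end

CHECK = SiteCheck.new do
  visit "http://example.com/"
  audit :images
  audit :links
end

CHECK.steps.each { |verb, target| puts "#{verb} #{target}" }
```

No custom grammar or parser is involved; optional parentheses and blocks are what make the description read like prose.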

Yeah, I've done both. Basically it really depends on your problem domain and which one will benefit you the most. When I was writing systems for doing QA on websites, we were doing high-level languages for website interaction at the browser level. We wanted to be able to go to a particular web page and audit the images and links and make sure they are valid, queue up extra tasks, etc. Delaying that action instead of making it immediate had the benefit of being able to distribute our work, being able to queue it up for later use, and being able to have a lot of things going in parallel in different realms. We'd customize various workers to do a particular type of task and put them wherever we want.
Compared to that, with the direct worker approach sometimes you just want immediate results, and the implementation difference is actually pretty trivial. You're walking over these descriptions of high-level events of various kinds, like data, actions, etc., and you're either storing them off into a model and then providing a run command that will go and do those things, or you're doing those things straight up.
Basically the code is always in the same place; it's just whether you're storing up a bunch of objects and then calling run, versus doing whatever immediately. We use the thread model and the Queue class a lot, and it does a really good job of it.
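
The difference between doing the work immediately and storing it up for workers can be sketched with Ruby's built-in Queue and threads. Task, run_immediately, and run_queued are hypothetical names, and the lambdas stand in for real work such as fetching a page.

```ruby
# A described unit of work: a name plus something callable.
Task = Struct.new(:name, :work)

# Immediate mode: do each thing as soon as it is described.
def run_immediately(tasks)
  tasks.map { |task| [task.name, task.work.call] }
end

# Deferred mode: store the tasks in a queue and let worker threads
# drain it. Results arrive in whatever order the workers finish.
def run_queued(tasks, workers: 2)
  queue   = Queue.new
  results = Queue.new
  tasks.each { |task| queue << task }
  workers.times { queue << :done }          # one stop marker per worker

  threads = Array.new(workers) do
    Thread.new do
      while (task = queue.pop) != :done
        results << [task.name, task.work.call]
      end
    end
  end
  threads.each(&:join)

  Array.new(results.size) { results.pop }
end

TASKS = [
  Task.new(:images, -> { "checked images" }),
  Task.new(:links,  -> { "checked links" })
]

p run_immediately(TASKS)
p run_queued(TASKS).sort
```

The task descriptions are identical in both modes; only the runner changes, which matches the point that the implementation difference is small.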

Dunlavey wrote a book called "Building Better Applications" that was basically focused on DSLs before the TLA was coined. Basically it comes down to not letting the programmer define the terminology for the DSL, and having the business analyst, the domain expert, your customer, whoever they are, the ones who know the problem set, express it such that they help define the language side by side with you. It trains them in the few peculiarities of the Ruby language, and it really couples them to the problem set and gets them really engaged. The programmer benefits in that they get side-by-side knowledge with the domain expert and learn what the vocabulary really means. As a result you wind up with a more solid result, such that it conveys the information at a very high level that the domain experts are interested in, and it gets the job done well.

No, I don't think we're another Lisp or Smalltalk in the making. I think the only reason these two didn't succeed as well as they could have was timing, more than anything else. They were ahead of the game and they didn't get the critical mass to cross the chasm.

I think a lot is timing. We have a fairly English-like syntax that allows for very good approachability. Smalltalk has that in many respects, but LISP does not. We have great timing on Web 2.0-type implementation, and Rails did a great job of picking up the ball and getting a good implementation that allowed people to express web development and web interaction at a really high level and have the language get out of your way. We have a lot of developers picking up Rails that don't have any Ruby experience, and it's not getting in their way for quite a while, and that says something. That says that it is an amazing DSL for writing websites, and that alone can bring us the critical mass that we need for Ruby and Rails to really go and stay in the forefront for a long time to come.
I'd like to see Ruby go through some periods of contraction and cleanup. The implementation itself is hard to approach. We've got a lot of really interesting problems out there and really smart people that are working on really neat stuff that can be integrated into Ruby, but it's hard to do that because you gotta take the mental context switch of working high level in Ruby to working low-level in C. I'd like to see Ruby actually get cleaned up so that you can work on core language issues in Ruby itself and have that expressed in such a way that we can lower the barrier, the cost of learning how to work in core, such that those people can walk into the Ruby aspect language and do what they want to do at the level they want to do it.

I see Ruby kind of sneaking into the skunkworks projects of the more mainstream corporate environment. When something needs to get done fast and done well, I see Ruby having a really good chance of being chosen for that stuff.
You don't see skunkworks projects getting off the ground and done in 6 weeks, 12 weeks, 18 weeks at a stretch in C++ anymore. It's just not something that you want to express highly fluctuating problem sets in, because it's not highly fluctuating for you. It works against you nine times out of ten. I've worked with some really amazing C++ programmers, and as smart and capable as they are, the language is working against them when it comes down from above that they need to turn 90 degrees and do so in a week. It's a lot of work and stress, whereas with Ruby, if you're working at a high level, if you're doing meta-programming and you're working at a DSL level, that's nothing.

Over the next year or two we're going to see the availability of more tools for developers to have them express themselves even more fluently. I think the IDEs and the editors are going to get what they need in order to provide the refactoring tools that we currently don't have, and to provide the analysis tools, the profiling tools, and the coverage tools (that we have in a more alpha state). I think we'll get to the point where expressing even more complex DSLs is going to get easier as we get more libraries focusing on clean definition of syntax and semantics. We'll be able to bring ourselves to a wider array of problem sets, at a really high level, such that the engineering on those things gets trivial.

Yes, I am. I am working on Scheme, actually. It's something that I learned before at a more academic level, but never really wrote any real code in. So I'm going back to fill that out. I left Amazon feeling really burnt out and spent 8 weeks in a coffee shop working on my interpreter-a-week project, where I had the goal of writing one language per week. It could be the same every week, but the implementation and the architecture had to differ drastically every time. It wound up that time and time again I kept coming back to a LISP or a Scheme core that I was re-implementing every time. I implemented one by hand, one in C, I did one with YACC, I did one with ANTLR, a really nice high-level language generator, and time and time again I kept coming back to the meta-circularity problem, wanting to do Scheme code. I've written more Scheme-like code than I've written lines of Scheme, and it's time to write lines of Scheme. I'm going back to start learning that properly.

Actually both LISP and Smalltalk have got really powerful web frameworks already. They just don't have the marketing behind them that Rails has. They actually do a really good job. SeaSide is amazing; you can do some really powerful stuff with continuation based web programming.

"What is your perspective on static versus dynamic or non-static typing?"I am a dynamic languages proponent, thus far with languages I do have experience with. I have yet to go with languages like Haskell or Erlange, where there is static tying that they tend to get in the developers way with how they think. ...