Sunday, March 23, 2008

Monkey Patching

Earlier this month I spoke at the International Computer Science Festival in Krakow. Krakow is a beautiful city with several universities, and it is becoming a high tech center, with branches of companies like IBM and Google. The CS festival draws well over a thousand participants; the whole thing is organized by students. While much of the program was in Polish, there were quite a few talks in English.

Among these was Chad Fowler’s talk on Ruby. Chad is a very good speaker, who did an excellent job of conveying the sense of working in a dynamic language like Ruby. Almost everything he said would apply to Smalltalk as well.

One of the points that came up was the custom, prevalent in Ruby and in Smalltalk, of extending existing classes with new methods in the service of some other application or library. Such methods are often referred to as extension methods in Smalltalk, and the practice is supported by tools such as the Monticello source control system.

As an example, I’ll use my parser combinator library, which I’ve described in previous posts. To define a production for an if statement, you might write:

assuming that productions for expression and statement already exist. The purpose of the rules if and then is to produce a tokenizing parser that accepts the symbols #if and #then respectively. It might be nicer to just write:

ifStat:: #if, expression, #then, statement.

and have the system figure out that, in this context, you want to denote a parser for a symbol by the symbol directly, much as you would when writing the BNF.

One way of achieving this would be to actually go and add the necessary methods to the class Symbol, so that symbols could behave like parsers. I know otherwise intelligent people who are prepared to argue for this approach. As I said, Smalltalkers would call these additions extension methods, but I find the more informal term monkey patching conveys a better intuition.

Typically, one wants to deliver a set of such methods as a unit, to be installed when a certain class or library gets loaded. So these changes are often provided as a patch that is applied dynamically. Not a problem in Smalltalk or Ruby or Python (though I gathered from the Pythoners in Krakow that they, to their credit, frown on the practice).

Apparently, there is a need to explain why monkey patching is a really bad idea. For starters, the methods in one monkey’s patch might conflict with those in some other monkey’s patch. In our example, the sequencing operator for parsers conflicts with that for symbols.
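To make the conflict concrete, here is a minimal Python sketch. It is not the parser library from the post: `Sym` is a hypothetical stand-in for a shared class like Symbol, and the two `install_*` functions stand in for patches shipped by two unrelated libraries that happen to pick the same method name.

```python
class Sym:
    """Hypothetical stand-in for a built-in symbol class shared by all libraries."""
    def __init__(self, name):
        self.name = name


def install_parser_patch():
    # Patch from a hypothetical parser library: sequencing builds a parser description.
    def seq(self, other):
        return ("seq-parser", self.name, other.name)
    Sym.seq = seq


def install_text_patch():
    # Patch from an unrelated hypothetical library: same name, different meaning.
    def seq(self, other):
        return self.name + other.name
    Sym.seq = seq


install_parser_patch()
install_text_patch()   # loaded later; silently replaces the first patch

result = Sym("if").seq(Sym("then"))
print(result)          # prints "ifthen": the parser library's method is gone
```

Nothing warns either author; whichever patch loads last wins, and the parser library now gets a string where it expected a parser.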

A mere flesh wound, says our programming primate: I usually don’t get conflicts, so I’ll pretend they won’t happen. The thing is, as things scale up, rare occurrences get more frequent, and the costs can be very high.

Another problem is API bloat. You can see this in Squeak images, where a lot of monkeying about has taken place over the years. Classes like Object and String are polluted with dozens of methods contributed by enterprising individuals who felt that their favorite convenience method is something the whole world needs to benefit from.

Even in your own code, you need to exercise restraint lest your API become obese with convenience methods. Big APIs eat up memory for both people and machinery, reducing responsiveness as well as learnability.

Then there is the small matter of security. If anyone is free to patch the definition of a class like String (typically on the fly when their code gets loaded), what’s to stop malicious macaques from replacing critical methods with really damaging stuff?

The counter argument is that in many cases (though not in this example), the patch is designed to avoid the use of typecase/switch/instance-of constructs, which bring their own set of evils to the table.

Extractors are a new approach to pattern matching developed by Martin Odersky for Scala. They overcome the usual difficulty with pattern matching, which is that it violates encapsulation by exposing the implementation type of data, just like instance-of. They may be part of the answer here as well.

However, many monkey patches are motivated by a desire for syntactic sugar, as the example shows. Extractors won’t help here.

A variety of language constructs have been devised to deal with this and related situations. Class boxes and selector namespaces in Smalltalk dialects, context oriented programming in Lisp and Smalltalk, static extension methods in C# and even Haskell type classes are related. These mechanisms don’t all provide the same functionality of course. I confess that I find none of them attractive. Each comes at a price that is too high for what it provides.

For example, C# extension methods rely on mandatory typing. Furthermore, they would not address the example above, because we need the literal symbols we use in the grammar to behave like parsers when passed into the parser combinator library code, not just in the lexical scope of the grammar.

Haskell type classes are much better. They would work for this problem (and many others), but also rely crucially on mandatory typing.

Class boxes are dynamic, but again only affect the immediate lexical scope. The same is true of simple formulations of selector namespaces. Richer versions let you import the desired selectors elsewhere, but I find this gets rather baroque. I'm not sure how COP meshes with security; so far it seems too complex for me to consider.

I’ve contemplated a change to the Newspeak semantics that would accommodate the above example, but it hasn’t been implemented, and I have mixed feelings about it. If a literal like #if is interpreted as an invocation of a factory method on Symbol, then we can override Symbol so that it supports the parser combinators. This only affects symbols created in a given scope, but isn’t just syntactic sugar like the C# extension methods suggested above.

Of course, this can be horribly abused; one shudders to think what a band of baboons might make of the freedom to redefine the language’s literals. On the other hand, used judiciously, it is great for supporting domain specific languages.

So far, I have no firm conclusions about how to best address the problems monkey patching is trying to solve. I don’t deny that it is expedient and tempting. Much of the appeal of dynamic languages is of course the freedom to do such things. The contrast with a language like Java is instructive. Adding a method to String is pretty much impossible. One has to sacrifice one’s first-born to the gods of the JCP and wait seven years for them to decide whether to add the method or not. I’m not endorsing that model either: I know it only too well.

Regardless, given my flattering portrayals of primate practices, you may deduce that my main comment on monkey patching is “just say no”. The problems it induces far outweigh its benefits. If you feel tempted, think hard about design alternatives. One can do better.

21 comments:

The way I understood the term "Monkey Patching" is that it does not refer to what in Smalltalk is called a class extension (adding a method to a class) but to an override (replacing a method of a class with another one). These are two very different pairs of shoes. The latter is the path to the abyss.

I may be misinterpreting the term. However, I don't see the two "shoes" as all that different. In a deployment scenario, how does one know one isn't stepping on an existing method? And even if one knew, what would one do when one found out about such a conflict?

So, I consider both of these a path to the abyss. It only works in a closed world scenario, where you know exactly what software is going to be used.

Isn't it a suitable solution to replace methods with generic functions with a specified piece of behavior that's scoped to a specific module or namespace? This is what Factor and Common Lisp do, I believe. It seems like, in the first motivating example, generic dispatch isn't relied on. In cases where it is (say you have a function that should do a particular thing with strings and something else with numbers), you never need to invade the encapsulation around any datatypes. The only conflicts, then, are between implementations of generic words that are expected to do the same thing. This should eliminate at least half of the conflicts from monkey patching, unless I misunderstand the problem.
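As a rough illustration of the generic-function idea, here is a Python sketch using the standard library's `functools.singledispatch`. It is not Factor or Common Lisp, and the `Parser` class and `as_parser` function are hypothetical names. The point is only that the new behavior lives in a function owned by the parser module, and the `str` class (standing in for Symbol) is never modified.

```python
from functools import singledispatch


class Parser:
    """Hypothetical parser type; it only records the token it expects."""
    def __init__(self, expected):
        self.expected = expected


@singledispatch
def as_parser(value):
    # Default case: this module knows no conversion for the given type.
    raise TypeError(f"cannot treat {type(value).__name__} as a parser")


@as_parser.register(Parser)
def _(value):
    # Parsers are already parsers.
    return value


@as_parser.register(str)
def _(value):
    # Strings stand in for Smalltalk symbols in this sketch.
    return Parser(value)


keyword = as_parser("if")
print(keyword.expected)             # prints "if"
print(hasattr(str, "as_parser"))    # prints "False": str itself is untouched
```

Another module can define its own `as_parser` generic function without clashing with this one, since the two live in different namespaces.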

Multimethods usually suffer from encapsulation problems of their own. Adding a new generic function can break existing code - for example, a legal call can become ambiguous.

One can avoid ambiguity with schemes that give priority according to the order of the parameters. This still can cause problems, as unrelated definitions of a generic function can still interfere with each other.

So altogether, I'm not too keen on multimethods, despite their many attractions.

Multimethods and module-scoped generic functions are orthogonal. Slate has multimethods without this scoping, and Factor has only module-scoped generic functions (for now at least). The advantages and disadvantages are separate.

This is an interesting discussion. It reminds me of the problems often encountered in extending objects and prototypes in JavaScript without inadvertently stepping on someone’s feet. I’ve thought a bit about the problem you presented, and come up with my own characterization and proposed solution. I ended up writing a lot more than I'd intended, so I've put the full description on my own website.

Here are the requirements that I gathered from your post:

1. The ability to change the way some objects behave, from the user’s perspective (here the user is itself a program),
2. without interfering with implementation details that the object internally depends upon,
3. without inadvertently changing the behavior or interfaces of other objects, and
4. without introducing security holes.

In other words: We want to change how some objects work without causing collateral damage. The collateral damage comes from extending all objects of a particular class throughout the system, rather than only those (parts of) objects that you have access to in a particular context. As I see it, your if:: rule that wraps symbols as parsers is safer precisely because it can only modify the behavior of publicly accessible parts of those objects to which it has access. So the question is: how do we make it easier to follow a more general version of that pattern?

Maybe I misunderstand your example, but it seems to me that you try to achieve "monkey coercion" rather than "monkey patching". As far as I understood, you want to coerce a mixed list of symbols and parsers into a list of parsers only at the very moment when they are passed into the parser combinator library code.

IMHO both problems, i.e. monkey coercion and monkey patching, can be solved with the same lexically scoped technique.

Of course, in a dynamic language, such a coercion cannot happen automagically. In a static language, ifStat:: would be typed as, let's say, Parsers[], and whenever we pass some argument of another type, the language looks for a constructor of Parsers that accepts that other type and then uses this constructor to coerce the parameters. In a dynamic language, however, we must do that manually. I suggest as a solution: In the first line of the ifStat:: method we apply

args.collect! { |x| x.as_parser }

and we monkey-patch this lexical scope with a method extension of Symbol#as_parser and Parser#as_parser. Hence, we use lexically-scoped monkey-patching to solve the problem of monkey coercion: outside of the library you are free to pass around Symbols, inside it's all Parsers.
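For comparison, here is a sketch of the same "coerce at the library boundary" idea in Python. Python has no lexically scoped method extensions, so this swaps in a plain library-private function, `to_parser`, for the scoped `as_parser` methods; all names here (`Parser`, `token`, `to_parser`, `seq`) are hypothetical, and strings stand in for symbols.

```python
class Parser:
    """Hypothetical parser: run(tokens, pos) returns the new position or None."""
    def __init__(self, run):
        self.run = run


def token(expected):
    # Parser that accepts exactly one token equal to `expected`.
    def run(tokens, pos):
        if pos < len(tokens) and tokens[pos] == expected:
            return pos + 1
        return None
    return Parser(run)


def to_parser(x):
    # The coercion lives inside the library; the str class is not touched.
    if isinstance(x, Parser):
        return x
    if isinstance(x, str):
        return token(x)
    raise TypeError(f"cannot coerce {x!r} to a parser")


def seq(*args):
    # Coerce once, at the boundary; the code below sees only Parsers.
    parsers = [to_parser(a) for a in args]

    def run(tokens, pos):
        for p in parsers:
            pos = p.run(tokens, pos)
            if pos is None:
                return None
        return pos
    return Parser(run)


expression = token("x")   # placeholder productions for the sketch
statement = token("y")
if_stat = seq("if", expression, "then", statement)

accepted = if_stat.run(["if", "x", "then", "y"], 0)
rejected = if_stat.run(["if", "x", "else", "y"], 0)
print(accepted, rejected)   # prints "4 None"
```

Note that `to_parser` relies on an explicit type test, which is exactly the kind of construct the post says brings its own evils; the sketch only shows where the coercion would sit.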

On the other hand, the actual "monkey patching" problem (e.g. String#pluralize), in my experience, only arises in lexically controlled scopes, i.e. in code I write, I want to use my pluralize method, and in the code someone else writes, he wants to use his pluralize method.

Maybe there are strong examples that motivate the need to monkey patch someone else's lexical scope. I don't know of any.

PS at least none which are not better to be solved with instance-specific methods or a full-fledged dependency injection mechanism... :)

Probably more relevant for this problem than Scala's extractors are Scala's implicit conversions. They also "rely crucially on mandatory typing", so probably you won't find them sufficiently compelling. The idea is introduced (to an audience of working programmers) here.

This approach also mitigates the problem of conflicts, because conversions are brought into scope using established mechanisms, and can be overridden manually. If more than one conversion applies in some lexical scope, the compiler will provide an error, and you'll need to specify the conversion by hand. This may be annoying, but it seems to be just about the best you could hope for in the presence of such conflicting monkey patches...

Thanks for your excellent comments. I don't know if you realize this (probably not) but you are well on your way to re-inventing class boxes. As I noted in the post, I'm not enthused about that approach.

It adds considerable runtime overhead, and significant complexity. In my view, the problem is not so acute as to justify the costs. That is of course a judgement call that is hard to quantify. Your mileage may vary.

I am familiar with Scala's approach to the problem, and with the Sparsec library as well (discussed in older posts about parser combinators).

As I said in the post, mandatory typing isn't of interest to me. I've done my share of work on typing, and I don't believe the price is worth paying. My parser combinator library is actually a nice example of that; see the paper on Executable Grammars in Newspeak if you really care.

Runtime efficiency is always a good thing to keep in mind, but I think you need a more precise description of the semantics that you want and don't want before you can make a fair comparison of the efficiency of different solutions. My intention was not to describe how a particular implementation should work, but rather an idea about how something might work, from the user's perspective.

I think it would be useful to take a step back from particular concrete solutions. It seems to me what you're looking for is (an efficient algorithm implementing) a function that maps a program context, an object, and a method name to a method body:

C |- o.m = b

where o is the address of an object, and the expression b is some function of C, o and m. In static typing, o would instead be an unevaluated program term. In standard dynamic typing, the context C would be structured as a memory (a function from object addresses to object values), an object value would be a function from method names to method bodies (as well as field names to field values--but that's not too important here), and b would be defined as:

C |- o.m = C(o)(m)

Now what it seems like you're looking for is a dynamically-typed system where there is additional information in C that affects the choice of b. There are a lot of different ways that this could be implemented, but let's not worry about that for now.
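The two lookup rules can be modelled in a few lines of Python. This is only a toy under stated assumptions: object addresses are integers, method bodies are plain functions, and the "additional information" is guessed to be a table of per-context extras consulted before the object itself. That last choice is purely hypothetical, one of the many possible implementations.

```python
def lookup(memory, o, m):
    """Standard dynamic lookup: C |- o.m = C(o)(m), with C modelled as `memory`."""
    return memory[o][m]


def lookup_with_context(memory, extras, o, m):
    """Hypothetical variant: C = (memory, extras); extras are consulted first."""
    if (o, m) in extras:
        return extras[(o, m)]
    return memory[o][m]


# memory maps object addresses to object values (method name -> method body).
memory = {
    1: {"name": lambda: "if"},
}

# Extra information that exists only in one particular context.
extras = {
    (1, "as_parser"): lambda: "parser for if",
}

plain = lookup(memory, 1, "name")()
extended = lookup_with_context(memory, extras, 1, "as_parser")()
fallback = lookup_with_context(memory, extras, 1, "name")()
print(plain, "|", extended, "|", fallback)   # prints "if | parser for if | if"
```

Outside the context that supplies `extras`, looking up `as_parser` on object 1 simply fails, which is the scoping behavior being asked about.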

So the questions are:
What is that additional information?
How does it affect the choice of b?
How did the information get into the context? (i.e., language constructs that change how objects behave)
And at what points is that information filtered out of the context? (i.e., scoping rules)

It's nice to see a principled approach. As you note, figuring out what the desired semantics are is the first task. Monkey patching is handy in a variety of scenarios: it's not clear they all require the same answer.

One set of problems involves syntactic sugar. This motivated C# extension methods (to support LINQ). Another involves a desire to avoid explicit type testing. This is where extractors can help: type tests no longer need to violate encapsulation.

While coercion seems to be a common thread through all this, I'm not a fan of implicit coercions. They can cause unforeseen interactions. This is why they have been largely discredited from PL/1 on.

Scala gives the programmer the power to decide how much to use them; if used wisely, they are quite nice. Otherwise ... So I have very mixed feelings about coercions as a basis for addressing these issues.

Ultimately, one has to decide if the cost of the solution is justified by the problem.

One also has to think how well any proposed solution fits in with the rest of the language. This is "conceptual cost", and is quite separate from any analysis of implementation effort or performance impact. Both are part of the overall cost however.

Of course, applying double dispatch is perhaps not a cure for every possible monkey patch, but it has properties which are desirable in software engineering: for example, the library maintainer (of Symbol, in your example) can be asked to support it from now on for #, messages. Another one is its independence of scope.
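A small Python sketch of what double dispatch for sequencing might look like. Everything here is hypothetical: `Sym` stands in for Symbol, `then` stands in for the `,` selector, and the `sequence_after_*` hooks are the methods the Symbol maintainer would have to agree to carry.

```python
class Parser:
    """Hypothetical parser; it only records the sequence of parts it expects."""
    def __init__(self, parts):
        self.parts = parts

    def then(self, other):
        # First dispatch is on the receiver; the second is on the argument.
        return other.sequence_after_parser(self)

    def sequence_after_parser(self, left):
        return Parser(left.parts + self.parts)

    def sequence_after_symbol(self, left):
        return Parser([left.name] + self.parts)


class Sym:
    """Stand-in for Symbol, with hooks added by its (cooperating) maintainer."""
    def __init__(self, name):
        self.name = name

    def then(self, other):
        return other.sequence_after_symbol(self)

    def sequence_after_parser(self, left):
        return Parser(left.parts + [self.name])

    def sequence_after_symbol(self, left):
        # symbol followed by symbol keeps its old meaning: concatenation.
        return Sym(left.name + self.name)


expression = Parser(["expression"])
statement = Parser(["statement"])
if_stat = Sym("if").then(expression).then(Sym("then")).then(statement)
joined = Sym("a").then(Sym("b"))

print(if_stat.parts)   # prints ['if', 'expression', 'then', 'statement']
print(joined.name)     # prints "ab"
```

The dispatch works the same in every scope, but it does depend on the `sequence_after_*` methods already existing on both classes.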

If you're lucky, the binary operators all use double dispatch already. In practice, using the existing Squeak libraries, this isn't the case.

So now you're back to modifying someone else's code, which you may not have the permission to do in a secure system. Even if you do, if someone else decided to make a similar modification, and named their method "concatenateFromSymbol:", one of you is hosed.

You'd also need to make similar arrangements for other parser combinators, like |, star, plus and more. Some of these don't exist for Symbol. So again you have the potential for conflict and the security issue.

Concerning security, if someone doesn't have permission to modify that piece of code in class Symbol, then I'd say the same applies to Class Boxing attempts and also to any solution which is similar to class boxing (no permission must be no permission regardless of the trick used).

So all that remains is to provide a facility (in Symbol) which allows Symbol to be adaptable to the unforeseeable (in the sense of inventing all double dispatches in advance).

Thanks for your post Gilad. As one of the authors of Classboxes, I do agree with you. Having a simple model was our main goal, but a complex situation may be easily created.

There is one important dimension that hasn't been raised when considering "monkey patching". Will I allow legacy code to benefit from my extension? I called this property "reentrance". This is the difference between Classboxes and Selector Namespaces. Note that this property is different from having a global visibility for class extensions. See Section 6 of Analyzing Module Diversity, Paragraph Selector namespaces are non reentrant (do not look at the formulas, they only help get the paper accepted :-).

Klaus Witzel asked me whether and how monkey patching can be addressed without classboxes. Solely using class inheritance introduces a significant amount of code duplication and anomalies in the class and type hierarchy. People interested in this may read the analysis I did (Section 2 of this) on AWT and Swing, the latter being an unanticipated evolution of the former.

Maybe you can have a dynamically scoped exception handler that handles a "doesNotUnderstand" exception. When the runtime sees a misunderstood method it raises an exception. When the exception handler sees the "," method invoked on a Symbol, it returns tokenFromSymbol instead of re-raising the exception.

This is a slightly different mechanism from the standard Smalltalk-80 doesNotUnderstand as far as I understand it (however, that could be simulated by having a top-level handler that calls the doesNotUnderstand method on the receiver as usual). It requires exception handlers to be called before the stack is unwound (but this is usual in Smalltalk - and Common Lisp - AFAIK).

This provides dynamically scoped "monkey patching" (actually "monkey adding"). Handlers could be nested like normal exception handlers to add more methods. Probably not very efficient for many patches and smells of implementing your own dispatch (once you have a set of methods that you want to add).

That said I quite like the idea of selector namespaces, assuming that's what I think it is.