Wednesday, March 7, 2012

Wednesday, February 15, 2012

In Java's more than 15 years no language has repeated Java's experiment with checked exceptions, other than some languages designed as Java extensions (e.g. MultiJava and GJ). Certainly no mainstream or even nearly mainstream languages have. Notable languages that don't bother with checked exceptions include C#, which started as very nearly a clone of Java, and Scala, which borrows many Java concepts and then beefs up their static type checking.

Even within the Java community checked exceptions have been at least somewhat deprecated. Spring and Hibernate, for instance, moved strongly away from checked exceptions. Bruce Eckel, author of Thinking in Java, considers them a mistake and Joshua Bloch, author of Effective Java, cautions against overuse.

Now, I've seen lots of arguments that you should "use checked exceptions when the caller must somehow recover." But that argument assumes that Java is basically a procedural language. It's not. It's an object oriented language where reusable abstractions are common. Just a couple of examples from the standard Java library reveal exactly what's wrong with Java's checked exceptions.

The java.util.Iterator methods, for instance, aren't declared as throwing any checked exceptions. If you create your own Iterator that calls a throwing method then the implementation must do a try {...} catch (CheckedException e) {throw new UncheckedException(e)}. Or worse, you swallow the exception. Either way it's boilerplate.
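Here's a minimal sketch of that dance, assuming an iterator over the lines of a BufferedReader. LineIterator and readAll are names I made up for illustration, and I'm using java.io.UncheckedIOException as the unchecked wrapper, though any RuntimeException would do.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical example class; the name is illustrative, not from any library.
final class LineIterator implements Iterator<String> {
    private final BufferedReader reader;
    private String next;

    LineIterator(BufferedReader reader) {
        this.reader = reader;
        advance();
    }

    // hasNext() and next() declare no checked exceptions, so the IOException
    // from readLine() has to be caught and rethrown as an unchecked one.
    private void advance() {
        try {
            next = reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public boolean hasNext() {
        return next != null;
    }

    @Override
    public String next() {
        if (next == null) {
            throw new NoSuchElementException();
        }
        String current = next;
        advance();
        return current;
    }

    // Demo helper (also hypothetical): drains the iterator into a list.
    static List<String> readAll(String text) {
        List<String> lines = new ArrayList<>();
        Iterator<String> it = new LineIterator(new BufferedReader(new StringReader(text)));
        while (it.hasNext()) {
            lines.add(it.next());
        }
        return lines;
    }

    public static void main(String[] args) {
        System.out.println(readAll("first\nsecond")); // prints [first, second]
    }
}
```

Nothing in the try/catch adds information. It exists only to get the exception past an interface that has no room for it.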

On the flip side, to avoid that problem the call method on the java.util.concurrent.Callable interface declares that it might throw "java.lang.Exception". But Exception tells you nothing about what might fail, making it exactly equivalent to the unchecked RuntimeException except that you must "handle" it by either redeclaring it in the throws clause (which just pushes the problem upstream) or doing the try/catch/rethrow-unchecked dance. Again more boilerplate. Or, again, more opportunity to swallow.
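As a rough sketch of what the caller ends up writing, assume a small helper that runs a Callable and rethrows whatever comes out as unchecked. CallableDance and callUnchecked are hypothetical names, not part of any library.

```java
import java.util.concurrent.Callable;

// Hypothetical example class; the names are illustrative only.
final class CallableDance {
    // call() only promises "Exception", so the caller learns nothing about
    // what can actually fail and just converts everything to unchecked.
    static <V> V callUnchecked(Callable<V> task) {
        try {
            return task.call();
        } catch (RuntimeException e) {
            throw e; // already unchecked, pass it through untouched
        } catch (Exception e) {
            throw new RuntimeException(e); // the rethrow-unchecked dance
        }
    }

    public static void main(String[] args) {
        Callable<Integer> answer = () -> 42;
        System.out.println(callUnchecked(answer)); // prints 42
    }
}
```

The alternative is to put "throws Exception" on every method between the Callable and whoever finally deals with it.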

You might think generics give a way out of the quagmire, but Java has a fatal flaw here. The throws clause is the only point in the entire Java language that allows union types. You can tack "throws A,B,C" onto a method signature meaning it might throw A or B or C, but outside of the throws clause you cannot say "type A or B or C" in Java. So if you have "interface MyInterface<T extends Exception> {void mightThrow() throws T...}" then T must be bound to a single exception type for any given instantiation of MyInterface. And, as a special bonus, with that structure you can't say some particular implementation doesn't throw at all. Which means that in practice it's little better than the java.util.concurrent.Callable "throws Exception" solution.
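To make the single-binding limitation concrete, here's a sketch of that interface with one implementation. MightThrow, FailingRead, and describe are hypothetical names I picked for illustration.

```java
import java.io.IOException;

// The interface from the text, under a hypothetical name.
interface MightThrow<T extends Exception> {
    void mightThrow() throws T;
}

final class GenericThrowsDemo {
    // T is bound to exactly one type here. There is no syntax for something
    // like MightThrow<IOException | TimeoutException>, so an implementation
    // that can fail in two unrelated ways has to fall back to a common
    // supertype, which in practice tends to be Exception.
    static final class FailingRead implements MightThrow<IOException> {
        @Override
        public void mightThrow() throws IOException {
            throw new IOException("disk unplugged");
        }
    }

    // The caller sees precisely the one bound type.
    static String describe(MightThrow<IOException> action) {
        try {
            action.mightThrow();
            return "ok";
        } catch (IOException e) {
            return "failed: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new FailingRead())); // prints failed: disk unplugged
    }
}
```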

The team working on lambdas for Java has found checked exceptions a major stumbling block essentially for the reasons outlined here. A significant amount of work is going into easing that pain.

In short, as it stands the design of the Java language requires you to either avoid reusable abstractions or wrap useless checked exceptions in boilerplate. And if I want to avoid reusable abstractions then I know where to find Pascal. Could checked exceptions be made workable? Perhaps with some careful language design. But Java isn't that design.

With this post I want to ask a question: what does it mean for code to be "too dense"? This question has implications for everything from languages to APIs to coding style.

I've seen debaters defending Java's verbosity precisely because it isn't "too dense." They say the sparsity of the code makes it easy to understand what's going on. Similarly it's common to bash programmers for playing "golf" when their code is dense. But if we're allergic to density then why do programmers seem to prefer to use tools that create code density when there are fairly straightforward ways to create less dense code?

For an example I'm going to use regular expressions(1) since just about every programmer knows what they are, they're very dense, they exist in direct or library form for every general purpose programming language, and they are easy to replace with "normal" code.

Regexes are tight little strings that have very little in the way of redundancy. They're frequently accused of being "write only" - impossible to read and maintain once written. They are the poster children for "too dense" if anything is.

Given the modern-ish focus on refactoring and the understanding that code is read far more often than it is written, you'd think that if regexes were too dense programmers would be eager to replace those dense strings with more standard code just to improve readability. After all, a regex encodes a simple state machine, or perhaps something a bit stronger if the common Perl-ish extensions are used, so replacing one is easy.

Yet it doesn't happen, at least not much. Regexes remain a mainstay. New regexes are continually written and old ones aren't ripped out and rewritten as loops and if statements just to gain some more readability. They're expanded for performance reasons or when the logic needed exceeds the power of regexes, but they almost never get replaced with an explicit state machine just to improve maintainability.

Why is that? We can't blame a few bad programmers. Regexes are far too widely used for that simple cop out.

What regexes and our use of them suggest is that we're not allergic to density in information per character but to something else. One culprit is simply unfamiliarity. Regexes are okay because we're familiar with them; other forms of density are bad because we're not familiar with them.

But maybe it's even stronger than that. Perhaps the familiarity with regexes makes us aware of a different kind of density/sparsity trade-off. A regex's information density may make it slower to read in terms of characters per minute but we know that expanded code would be slower to read in terms of concepts per minute.
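For a concrete feel of that trade-off, here's a sketch that recognizes an optionally signed run of ASCII digits twice: once as a regex and once as the explicit loop-and-if version. The pattern and the names DensityDemo, matchesRegex, and matchesByHand are my own choices for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical example class; the names are illustrative only.
final class DensityDemo {
    // Dense version: one concept, sixteen characters of pattern.
    private static final Pattern SIGNED_INT = Pattern.compile("[+-]?[0-9]+");

    static boolean matchesRegex(String s) {
        return SIGNED_INT.matcher(s).matches();
    }

    // Sparse version: the same language written as an explicit state machine.
    static boolean matchesByHand(String s) {
        int i = 0;
        // Optional leading sign.
        if (i < s.length() && (s.charAt(i) == '+' || s.charAt(i) == '-')) {
            i++;
        }
        // One or more ASCII digits.
        int digitsStart = i;
        while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') {
            i++;
        }
        // At least one digit was seen and nothing is left over.
        return i > digitsStart && i == s.length();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"-12", "12a", "+", ""}) {
            System.out.println("\"" + s + "\" " + matchesRegex(s) + " " + matchesByHand(s));
        }
    }
}
```

Every character of the hand-written version is easy to read, but the reader has to reassemble "optional sign, then digits, then end" from the control flow.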

In this post, I picked on regular expressions because they're so widely known and used but the bigger question is in the design of languages, APIs, and coding conventions. This article started with a question and will end with more. Are regular expressions outliers, unusual in creating value out of density? Is there some optimum relationship between frequency of use and density where something becomes too dense if we don't use it often enough? If we create dense languages, APIs, or coding conventions are we creating impenetrable barriers to entry for newbies? If we don't create dense notations are we providing a disservice to those who will use the notation often? Is there any hope that a designer of a language, API, or coding convention can find a near optimum density for his or her target audience that remains near optimal for a long time over patterns of changing usage?

Footnotes

Thursday, January 5, 2012

Paul Snively and I are having a bit of disagreement about the value of treating type errors as warnings during development. It's hard to make anything like a cohesive point on Twitter so I thought I'd write a quickie post on what I mean and why I think it's valuable.

What I'm talking about is not revolutionary. Eclipse JDT does it and I'm sure others do as well. The idea in a nutshell is that when the compiler encounters a type problem then, in addition to reporting the problem and instead of stopping, it should optionally elide the offending code and replace it with code that will throw an exception if executed. As a trivial example the Java code

int foo(String x) {
    return x*2;
}

would get an error report and be replaced by the equivalent of

int foo(String x) {
    throw new TypeError("Expected an int but got a String at line 42 of Bar.java.");
}

That kind of loose handling of type errors obviously shouldn't be enabled for production builds or even continuous integration builds - otherwise I might as well use a dynamically typed language that does it better anyway. But for development I find that kind of behavior very useful. And while my toy example was in Java I have the same desire when working on any large program in a statically typed language.

My main use case is modifying a data structure definition that is used several places in a large program. As one concrete example, if I modify a language AST definition I may not want to bother fixing up the optimizing code path until I've ironed out the kinks in the non-optimizing code path. Perhaps the whole idea is rubbish and any work I do on the optimizing path would be wasted. Or if I had a Boolean field but realized I should have used something more meaningful or with more options then I could break code everywhere but want to fix and unit test parts of the program incrementally, allowing me to think about more manageably sized chunks than "everything this one change breaks."

When a program gets changed it may very well pass through stages where parts of it are nonsense, or at least not provably sensible. Rather than having to fix everything up before exploring the consequences of my change I find that it is sometimes handy to work in a more piecemeal fashion, restoring sense to some parts and exploring the consequences. Compilers that can treat type errors as warnings support that work style. Without such support I frequently end up manually peppering my code with exceptions and TODOs. Why not let the compiler do that bookkeeping for me?
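The manual version of that bookkeeping looks something like the sketch below. Everything in it is hypothetical: a toy AST whose Kind enum recently replaced a boolean field, an evaluate path that has already been fixed up, and an optimize path stubbed out by hand so the rest can be compiled and tested.

```java
// Hypothetical example class; all names here are illustrative only.
final class PiecemealDemo {
    // Imagine this enum recently replaced a boolean "negated" field.
    enum Kind { LITERAL, NEGATION }

    static final class Node {
        final Kind kind;
        final int value;

        Node(Kind kind, int value) {
            this.kind = kind;
            this.value = value;
        }
    }

    // Non-optimizing path: already updated, so it can be unit tested now.
    static int evaluate(Node node) {
        switch (node.kind) {
            case LITERAL:
                return node.value;
            case NEGATION:
                return -node.value;
            default:
                throw new AssertionError(node.kind);
        }
    }

    // Optimizing path: not updated yet. The old body no longer type checks,
    // so it has been replaced by hand with an exception and a TODO.
    static Node optimize(Node node) {
        // TODO: port to the Kind enum once evaluate() has settled.
        throw new UnsupportedOperationException("optimize() not yet ported to Kind");
    }

    public static void main(String[] args) {
        System.out.println(evaluate(new Node(Kind.NEGATION, 5))); // prints -5
        try {
            optimize(new Node(Kind.LITERAL, 1));
        } catch (UnsupportedOperationException e) {
            System.out.println("stubbed: " + e.getMessage());
        }
    }
}
```

The stub in optimize is exactly the kind of code a compiler could emit on its own, along with a diagnostic pointing at the original error.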

Edit: Clarifications and Rebuttals

I'm not talking about optional typing where you can turn off type checking. Nor am I talking about gradual typing where you can turn type checking on or off for various parts of your program. Optional and gradual typing might (or might not) be nice, but they're orthogonal to what I'm talking about. All I'm suggesting is that when a static type checker (optional or not, gradual or not) finds a problem I always want the error report but during development I don't necessarily want the errors to prevent code generation. And while there might be sophisticated ways to generate code around type errors the most straightforward is to emit code for an exception or program termination plus some diagnostics.

There are suggestions in the comments that the result will be something like the wars over turning on -Wall (warn for all known potential problems). But -Wall isn't the right comparison, since I'm not suggesting that any type checks can be turned off. What I'm proposing is more nearly the equivalent of turning off -Werror (error on warnings). The difference is that code will often work (or at least "work" with scare quotes) in the presence of warnings. The temptation to ignore warnings can be quite strong. But in the case of type errors my suggestion would produce an executable that absolutely can't work if an offending code path is executed. Thus the temptation to turn type errors into warnings on production builds should be minimal.

If a compiler writer is seriously concerned that this behavior would be abused for production builds then the answer might be to only expose it via an API that can be used by IDEs, Emacs SLIME style modes, etc., but which isn't available in the supplied command line batch compiler. That sounds like overkill to me, but whatever.