There’s an argument that static typing prevents errors by detecting them at compile and/or edit time. In a trivial sense, this is absolutely true. You can, for example, write a Java program with such an error and watch Eclipse highlight the offence.

The interesting thing I’ve noticed is that many of the people in favour of static typing are arguing from a position of “Do as I say, not as I do.”

I don’t mean that they program in Python. I mean that when I ask them whether they would have typing troubles in a dynamic language, their answer is often “well, I wouldn’t, but every business needs hordes of monkey/offshore/intern/new graduate programmers who do make these errors.”

“People shouldn’t be able to open my classes.” “They need compile time type checking so the app doesn’t blow up.” “Extending system libraries is bad.”

Concerns like these all seem to boil down to one major theme: The people I work with are stupid.

It’s not really stupid people, it’s people who’ve accepted a stupid idea. Java’s design is based on the idea that a language can prevent misuse by making bad things hard to do. It’s defensive thinking.

Here’s an interesting question: what sorts of typing errors do experienced, intelligent programmers actually make? And what sorts of typing errors sneak through unit tests and even QA and into production? And most especially, what sorts of typing errors have catastrophic consequences in production?

Now those are interesting errors. Those are worth worrying about. I’ll go further: those are worth static typing.

Here’s one from my actual, hands-on experience: distinguishing escaped from unescaped strings. I don’t know if I’m using the right words here: I’m thinking of a typical XML or XHTML application where some of the time a string is just a string, but some of the time it has a bunch of its characters replaced or escaped with special entities.

Another case of escaped and unescaped strings concerns safely composing SQL queries and updates (another solved problem in other languages). The argument is always in favour of using library functions that do the conversion for you, like PreparedStatement. If you think about it, that’s no damn good. What that does is treat everything like an unsafe String and only convert it at the very last second.

If you’re going to do any fancy SQL composition, you can only do it with stuff that isn’t a user value, so you have to keep track of escaped and unescaped strings anyways. And finally, your libraries still have all the unsafe APIs that don’t perform the conversion for you, so you are relying on your iron will and self-discipline to prevent errors, rather than having the compiler perform what is really a rather trivial check.

I have no problem with relying on iron will and self-discipline to eliminate errors. But if your argument is that iron will is appropriate for preventing SQL injection attacks, why isn’t iron will appropriate for preventing trivial type errors that would result in a MethodNotImplemented exception?

I’m not alone in considering this a problem. Web applications that screw this up are vulnerable to cross site scripting (XSS) attacks. This is very bad, and if static typing could help I’d eagerly embrace it.


How could static typing help? Well, imagine if you designate some strings as escaped and some as unescaped. So our type hierarchy is that there is String, UnescapedString extends String, and EscapedString extends String. (Actually, I’d prefer interfaces if I were designing a Java-like language, but that’s by the by).

Now there are certain critical places where we would need to harden our application. The first is everywhere we get strings from users. These strings, just like tainted or unsafe variables in scripting languages like PHP, need to be UnescapedStrings. We would type our methods accordingly (in Java, this could be accomplished with annotations). For example, anything snarfed from the HttpRequest object is an UnescapedString.

Then when we present strings, we type the methods as taking EscapedStrings only. If we try to pass a POS (Plain Old String) or UnescapedString to a method parameterized by an EscapedString, we get a compile time error.

To get around the errors, we need to escape our strings. We do that by writing a conversion method somewhere that, you guessed it, takes an UnescapedString as a parameter and returns an EscapedString.
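Here’s a minimal sketch of what I mean. Since java.lang.String is final, this version uses wrapper classes rather than subclasses, and every name in it is hypothetical:

```java
// Raw user input: the only thing you can do with it is escape it.
final class UnescapedString {
    private final String value;
    UnescapedString(String value) { this.value = value; }
    String raw() { return value; }
}

// A string that has already been escaped and is safe to present.
final class EscapedString {
    private final String value;
    EscapedString(String value) { this.value = value; }
    @Override public String toString() { return value; }
}

final class Escaper {
    // The only way to obtain an EscapedString is through this conversion,
    // so the compiler rejects any attempt to render raw user input.
    static EscapedString escape(UnescapedString s) {
        String escaped = s.raw()
            .replace("&", "&amp;")
            .replace("<", "&lt;")
            .replace(">", "&gt;")
            .replace("\"", "&quot;");
        return new EscapedString(escaped);
    }
}
```

A presentation method typed as taking an EscapedString now cannot be handed raw request data by accident: the call simply doesn’t compile until you route it through the converter.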

Naturally our application would be full of bookkeeping annotations as we keep track of which strings are escaped and which aren’t. But my gut feeling is that catching this kind of error at compile time would be worth it.

Here’s another error that I think is worth the effort of static typing. The bane of my existence when maintaining legacy Java code: NullPointerExceptions.

This is actually a solved problem in languages like Haskell. Static typing can easily distinguish between methods that might return a null (like getting a column from a database row) and variables that must not contain a null. The compiler can and should force you to write code that handles the null case.
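To sketch the idea in Java terms: java.util.Optional (a later addition to the language) plays the role of Haskell’s Maybe. The Row class and its data below are hypothetical:

```java
import java.util.Optional;

final class Row {
    // A column fetch that may find nothing returns Optional<String>
    // instead of a possibly-null String, so the caller must say what
    // happens in the empty case before it can use the value.
    static Optional<String> getColumn(String name) {
        if ("email".equals(name)) {
            return Optional.of("alice@example.com"); // hypothetical data
        }
        return Optional.empty(); // no such column, and no null in sight
    }

    static String describe(String name) {
        // The compiler won't let us treat the Optional as a String;
        // we are forced to handle the missing case explicitly.
        return getColumn(name).orElse("(no value)");
    }
}
```

The point is that “might be absent” is visible in the method signature, so forgetting the null check becomes a compile-time error rather than a production surprise.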


With ML and OCaml you can design rich types that fit the domain model, and everything is checked at compile time, with most of the annotations supplied by type inference.

Here’s my question to my fellow Java programmers: why do we tolerate a compiler that forces us to type some things as BigDecimals and some things as Integers, but we don’t insist that the compiler catch places where we aren’t checking for null?

These are just two places where static typing could help experienced programmers solve problems that plague real, production code. I’m all for static typing, if it can help me with the errors I actually encounter.

That being said, there is a lot of work being done in this area, although obviously not by Sun or Microsoft (to be specific, not by their C# team). As mentioned, Haskell and several other languages provide static typing that is sophisticated enough to prevent errors like this.

I’m not even close to being the first person to notice the problem:

Joel Spolsky described using naming conventions to highlight errors like this. It’s interesting that Hungarian Notation was invented for this kind of thing, but somehow a Cargo Cult has arisen around Systems Hungarian where programmers use it for things like marking integers, a fact that the compiler and IDE already know, but don’t use it for domain-specific things like whether a string is safe.

Tim Sweeney of Epic Games wrote an incredibly lucid wish list for The Next Mainstream Programming Language, where he discusses in detail the exact issues his team has faced building and maintaining Unreal. His presentation is available in PowerPoint and PDF formats. He has given a lot of thought to how static typing could help build and maintain huge, complex, commercial applications.

Agitate for language features that can help us solve these important problems. If the next version of javac can do nullness analysis, it can identify potential null pointer exceptions, possibly through inference, so we don’t even have to type more code.

Educate ourselves about the bleeding edge of language development. No, that isn’t C# 3.0, Common Lisp, or Ruby 2.0. It’s ML, Haskell, Erlang, and a bunch of other things I need to learn. We may not be able to use Haskell to build Yet Another Boring Web Commerce Application, but we might learn enough to use a new naming convention, or possibly to write a string container that enforces escaping safety.

* The escaped and unescaped strings... While explicit static typing may help you somewhat there, dynamically typed languages don't "lose" either. Check Ruby: Ruby objects have a taintedness flag that you can check for, and with a pair of commands you can change Ruby's level of security so that it forbids more and more operations on tainted objects. Problem solved, no static typing used.

* NPE issue: check the Nice language. It runs on the JVM and it has two kinds of "types": nullable and not nullable. I guess you know how they behave already.

* Whatever you'd like to see, Lisp is still the cutting edge; it always has been and it probably always will be. If only because anything you create for another language can be trivially added to Lisp and Lisp dialects via the use of macros.

Very nice. Yes, typing should be higher level than just some machine architecture detail.

For example... perhaps typing variables not as just a number, but as 'meters' or 'feet' might have saved the Mars Climate Orbiter. Also: subtracting two dates makes sense, adding two dates doesn't. So, put physical units of measurement in a type system.

Tagging strings with the character set could make internationalization easier. "utf-8" vs. "iso-latin1"

To the point: a static type system as you describe should be layered, to allow for more than one interpretation of the actual variable.

The Sun crew is working on a language called Fortress. One of its features is relations between types, allowing you to declare one variable as containing 'miles' and another as containing 'kilometers', and to set up a conversion routine.

As far as non-nullable goes, Java 1.6 should have the features for an annotation to cover it.

And as far as your main argument goes: You're forgetting one extremely major benefit of static typing:

It makes picking up new libraries an absolute breeze, because your IDE knows exactly what you can call.

Yes, Java should have nullable types like Nice, but there's a lot you can do without switching languages. For example, every web app I've seen has to manipulate lots of identifiers but few programs have an Id class, let alone EmployeeId. They often just use ints and strings. Introduce the right low-level types and move some methods around, and you'd be surprised how much clearer your code gets. String escaping can also be a solved problem if you do it right.

More than one person has said that a benefit of static typing is that IDEs provide some content assistance.

This is (a) off-topic. It's as if I said "trucks consume more gasoline than station wagons" and someone says "trucks provide a higher viewpoint for the driver." True, but off topic.

I'm not writing a post that says dynamic typing is better than static typing under every circumstance, for every programmer, on every project.

(b) it only seems to be true to you, only because no-one has written AND POPULARIZED a really good content assist IDE for languages like Ruby and Python.

Here's an exercise for everyone who likes Eclipse's support for Java: explain why you cannot provide the same or better support for a Ruby, Smalltalk, or Python program that has the same structure as a Java program.

About the tainted flag, and every equivalent feature in languages like PHP:

This is not the same thing as static analysis, although it's better than crossing your fingers and praying. The point of static analysis is to prove that it will never happen, not just catch it at run time.

If it is acceptable to catch it at run time, then I jump to a previous post ("A fair and balanced look at the static vs. dynamic typing schism") and say let's just stick with run time type checking.

What I was trying to say in this post (perhaps poorly) was that if I had to put up with all the noise of static typing, then at least help me with issues I think are serious: that extra signal would make up for the incredible amount of noise!

languages with pattern matching, like ML, make you handle the null pointer case. When you write down your patterns, if you leave one out (corresponding to forgetting about the null pointer), you get a warning/error.

In a dynamically-typed language like Smalltalk, there is a technique for handling null reference errors called the Null Object Pattern [1]. It offers a way to deal with null exceptions without needing static typing.
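For readers who haven't seen it, here is a minimal sketch of the Null Object Pattern, written in Java for familiarity (the Logger interface and factory are invented names):

```java
// Null Object sketch: instead of returning null and forcing every
// caller to check, return a do-nothing implementation.
interface Logger {
    void log(String message);
}

final class ConsoleLogger implements Logger {
    public void log(String message) { System.out.println(message); }
}

final class NullLogger implements Logger {
    // Deliberately does nothing: a safe stand-in for "no logger".
    public void log(String message) { }
}

final class LoggerFactory {
    static Logger forName(String name) {
        return "console".equals(name) ? new ConsoleLogger() : new NullLogger();
    }
}
```

Callers can invoke log() without ever checking for null; the worst case is a silent no-op rather than a crash.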

Some people might disagree and say that this technique will only sweep the problems under the rug, but maybe with some real-world experiment, we can truly find out the pros & cons to a proposition like this.

coding in ml (the ocaml variant) is a joy for certain kinds of problems -- when it compiles, it almost always works, and when it doesn't work, it's usually a problem with the algorithm -- not a coding bug.

the other advantage of coding in ocaml is that you can express more algorithm per line of code, i.e., you can be very productive.

that said, there are many problems for which ocaml is a misfit (those that involve i/o).

> It makes picking up new libraries an absolute breeze, because your IDE knows exactly what you can call.

That doesn't have a leg to stand on. Check Smalltalk: it's a dynamically typed language, and yet its various IDEs are some of the most advanced ever.

In fact, Eclipse (for example) was started by Smalltalk "refugees" who wanted the power of the Smalltalk IDEs back. Smalltalk systems had autocompletion, refactoring (hell, the Smalltalk guys invented refactoring), doits (basically, executing only part of what you coded: select a snippet of code and execute it alone), and since Smalltalk IDEs are Smalltalk systems you can even edit them on the fly (think Eclipse plugins, but without the need for a specific environment and without the need to restart every time you change something).

And even then, the current Smalltalk IDEs pale compared to the ones that were running in the 80s at Xerox PARC on Alto and Dorado machines.

Static typing isn't better for IDEs, it doesn't bring the IDE any more potential power (if anything, it gives the IDE much less potential), it just makes coding the IDE easier.

I think you're missing the point by saying "Unit tests will catch the trivial errors that involve static types". I prefer to think of the static type system as a way to write unit tests really conveniently and briefly.

Type declarations should be optional. If you're like me, you want your code to be "right" (that means clean, readable, etc., and not only working). To get right code, you'll have to rewrite a program many times, so most code will be thrown away. Do you really want to type all those type declarations just to throw them away later?

There are many solutions:

1. The CL way: you can add type declarations if you want, but they are not required

2. The Ruby/Python way: no type declarations at all

3. The Haskell way: (nearly) no type declarations, these will be inferred by the compiler

And about escaped/unescaped strings: if you add these subclasses to your program, you're aware of the problem anyway, so it would be trivial to solve without these subclasses (by tainting objects, for example).

I prefer to think of the static type system as a way to write unit tests really conveniently and briefly.

I agree in theory. In practice, the statement hinges on your perception of what is "brief" and how much syntactic overhead the language forces down your throat.

For example, ML and Haskell use type inference to minimize the number of declarations you have to make. That's good.

M$'s latest vapourware, C# 3.0, promises a very limited inference facility using the var keyword. James Gosling said he considered this facility for Java using two new assignment operations, "assign and type" and "assign final and type".

But whether we like or dislike the static typing enforced by Blubs like Java or C#, all I'm saying is that I think catching certain semantic problems like nulls and unescaped strings is more important.

And about escaped/unescaped strings: if you add these subclasses to your program, you're aware of the problem anyway, so it would be trivial to solve without these subclasses (by tainting objects, for example).

The problem with tainting objects is that you require 100% unit test branch coverage to be sure that the runtime checking has caught every possible problem.

I'm not claiming that I don't like runtime checking, or that the benefits of a dynamic language when used properly don't outweigh the edge cases where static checking would help. (Nor am I claiming the reverse).

My simple statement is that I would consider static typing as practiced in high-noise type systems like Java or C# worthwhile if it could solve these types of problems for me.

Here is a factorial function:

fac 1 = 1
fac x = x * fac (x - 1)

GHC (a Haskell compiler) infers this type:

fac :: (Num a) => a -> a

This means that it takes a number and returns a number.

To make the code better, you could add this declaration:

fac :: (Integral a) => a -> a

If you do, fac will only accept integral numbers, so:

fac 0.2

Will give you a compile time error. (Note that this makes sense, because fac 0.2 would never reach the base case fac 1 = 1.)

This is very nice, not only because the type declaration is optional, but because it has a nice and readable syntax too.

But even Haskell's type system has limitations:

fac (-5)

will loop forever (until you get a stack overflow; fac is not tail recursive).

To prevent this, you have to add a guard:

fac x | x > 0 = x * fac (x - 1)

This makes the factorial function more difficult to understand. You have to decide how far you want to go.

In a system where most types can be inferred, static typing is nice. But I would rather use Ruby and unit test everything than C# with verbose type declarations (at least for prototyping; if you need more speed, you have no choice).

> I think you're missing the point by saying "Unit tests will catch the trivial errors that involve static types".
> I prefer to think of the static type system as a way to write unit tests really conveniently and briefly.

But unit tests catch so much more than static typing. So it's like you're just happy with settling for less safety. If you take the route of writing unit tests for statically typed programs, you actually pay twice for the bugs that can be caught by either unit tests or static typing.

I don't believe that there exists a programmer who *only* writes bugs that can be caught by static typing. So if you like to take the "let's pray my program doesn't crash" route, static typing may be a good choice for you; at least the language then forces you to add some simple tests to your code. But if you take the route of testing everything to get greater safety, static typing will give no benefit -- only additional work.

"I mean that when I ask them whether they would have typing troubles in a dynamic language, their answer is often 'well, I wouldn't, but every business needs hordes of monkey/offshore/intern/new graduate programmers who do make these errors.'"

Weird. When I use static typing it is to catch errors I'm afraid I'm going to make. Usually they aren't just typos, though. It's forgetting that I need a conversion or just plain using the wrong variable.

That said, I'm always surprised by how few type mismatch errors I have when I use a dynamically typed language.

I would say that unit tests catch a different set of error than static typing.

Your unit tests tell you that in the execution path tested, everything works as expected: good for checking that the normal case works, not so good for protecting yourself against an attacker who will try to use uncommon execution paths to abuse your program.

Static typing gives you some (small) assurance that holds true in any case.

With local type inference being fashionable (removing a lot of clutter from variable declarations), the only downside I can think of to static typing is that it is often used as an excuse to avoid unit testing.

"I mean that when I ask them whether they would have typing troubles in a dynamic language, their answer is often 'well, I wouldn't, but every business needs hordes of monkey/offshore/intern/new graduate programmers who do make these errors.'"

Dude. I make simple spelling typos *all the time*. I **constantly** get NameErrors and AttributeErrors and arguments-in-the-wrong-order errors when I program in Python (which is pretty common; it's my language of choice).

I'm working on a JavaScript project right now, and I often find that my code is calling frob.asyncDoStuff() when the method is actually called doStuffAsync(); or that I'm doing "throw FrobError(msg)" when I mean "throw new FrobError(msg)".

I actually do have a test suite with some 300 tests. I run it frequently. But it doesn't cover every path through every method. It certainly doesn't cover "this should never happen, build a comprehensive error message so Jason can usefully debug this" paths, arguably the most important ones to get right the first time. And it already takes a lot longer to run the tests than it would take to compile a whole C# project of comparable complexity.

These aren't deep bugs, but they are so very, very common (and such understandable mistakes) that I have a hard time believing they're only hit by monkeys, interns, and me.

Anonymous wrote: 'Tagging strings with the character set could make internationalization easier. "utf-8" vs. "iso-latin1"'

Using static typing for this seems impossible. In applications where character sets matter, the encoding is often known only at runtime.

If you *do* know your input is always going to be latin-1 or whatever, you should just convert it to unicode on input, e.g. using a FileReader or an XML parser or any of a dozen other methods. Thereafter, as long as it's in memory, character set is not an issue.
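That boundary conversion might look like this in Java (the Boundary class is a hypothetical name; the point is that the encoding is chosen at runtime and then disappears from the program):

```java
import java.nio.charset.StandardCharsets;

final class Boundary {
    // Decode bytes at the input boundary. Once the bytes become a
    // String, the character-set question is settled: in-memory Java
    // strings have no encoding ambiguity.
    static String decode(byte[] input, String charsetName) {
        if ("iso-8859-1".equalsIgnoreCase(charsetName)
                || "iso-latin1".equalsIgnoreCase(charsetName)) {
            return new String(input, StandardCharsets.ISO_8859_1);
        }
        return new String(input, StandardCharsets.UTF_8);
    }
}
```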

I'm not sure if this is fundamentally important, but anyone actually implementing the EscapedString/UnescapedString idea on a website would need to consider which method of escaping or encoding is needed. Escape quotation marks with backslashes? Double up the single quotes? URL encoding? HTML encoding? Trying to insert a string into single- or double-quoted javascript string? Perhaps you could make each of these into different subclasses of String.

I can see how using classes like UntrustedUserInputString and HopefullyDefangedString help you keep tabs on whether dangerous input has been considered, but I don't see much reduction in overall complexity. This may create new ways for me to shoot myself in the foot.

This is just off the top of my head, but I'd rather add methods like .getURLEncodedString() and .getJavascriptSingleQuotedString() to the String class and have the respective code aware of what kind of protection is needed.

> I can see how using classes like UntrustedUserInputString and HopefullyDefangedString help you keep tabs on whether dangerous input has been considered, but I don't see much reduction in overall complexity.

I agree, you won't see a reduction in overall complexity: it's a trade-off. You increase the program noise in exchange for reducing or eliminating a bug you may consider serious.

This is, IMO, the same trade-off you make when using strong, static typing. Extra noise in exchange for safety.

I was trying to say that for me, the trade-off is not particularly beneficial for the current types offered by languages such as Java, but I would reconsider the value of static typing if I could discover safety and null pointer bugs at compile time.

Where can I find an IDE for Smalltalk that has every feature of Eclipse? Why aren't there IDEs like that for Ruby and/or Python? I don't really care why, just give it to me and I'll start coding in the language, 'cause Java sure isn't perfect. But until then I'll use Java+Eclipse.

I guess you could write an equally brilliant IDE for C/C++, but where the hell is it? Well, I guess C++ is too complex for such an IDE, so "it just makes coding the IDE easier" IS important after all, isn't it?

So I LOVE Java! But not for Java as a language ('cause it's crap) and not for Java as a VM ('cause I don't like VMs in general) and not for Java as a religion (I'm not religious) but Java's tools sure do the job and I'm lovin' it as I would love any other language for such a toolset.

Yeah, okay, but where can I find an IDE that I could use, or at least try without first buying it (and even that's not an option with IBM's VisualAge)? And although Eclipse might be derived from a Smalltalk IDE, Eclipse before 3.0 was quite crap, frankly. So I'm not sure VisualAge would be worth it...

I've used both VisualAge Smalltalk and Eclipse for several years, and worked closely on other projects with the people who developed both IDEs. Here are some random impressions:

(1) Eclipse is more powerful than VA Smalltalk. It has stronger browsing support, better refactorings, and better task management (perspectives/editor pinning/fast outline/Open Type/and so on).

(2) Smalltalk is considered a better language by most of the original Smalltalkers who moved on to do the Java work.

(3) The static typing helps Eclipse know things about your code that it would not otherwise be able to know (like when you invoke code assist in the parameter list of a SetParent(Foo) method, it knows to propose variables whose types are derived from Foo, but not variables of some other type Bar).

(4) Eclipse takes away some of the pain of doing Java. It manages imports for you. You can override methods easily using code assist. You can Refactor>Move or Refactor>Rename just about anything. It can extract interfaces, modify method signatures up and down the hierarchy, search for all declarations/references/implementors in a project or hierarchy, perform quick fixes for errors, and so on. It really is less painful than most statically typed languages. It's not as good as a REPL loop, but it's about half way there from the other extreme of manually compiling your C++ code.

I don't make a lot of typos (thanks to Emacs autocompletion), and I unit test heavily, so I have very little need for type-checking when working in Ruby or Lisp. I think Ruby metaprogramming, in particular, hits a sweet spot in the design space, and it wouldn't work with stronger types.

But when programming in Haskell, I've come to love two aspects of (very) strong typing:

1) Hoogle, the Haskell search engine, allows me to search the standard libraries using approximate type signatures. This is really useful, because Haskell has 15 bazillion higher-order library functions, and I often need to find one that solves a particular problem. I especially miss Hoogle when I'm working in Scheme, for some reason.

2) Haskell's type system provides a safety net when I'm trying to do insane higher-order stuff, such as puzzling out what comonads actually do, or trying to translate bits of category theory into running code. This doesn't really apply to production code (where there's little need for such crazy abstractions), but it's nice when the compiler can keep all the meta-levels straight, and explain to me what some higher-order function actually does. Haskell lets me experiment with ideas that I can't even keep straight in Lisp.

And as for Java: Eclipse refactoring support really is awesome, and it's hard to make something like that 100% accurate without fairly strong types. I know how to get it 90% right for a language like Ruby using type inference, but there's a big difference between 90% and 100% in terms of maintaining "flow." So if killer refactoring support is more important to you than, say, Ruby-style metaprogramming, then you might favor Java for this reason.

For the feet/metres issue, the people on the Boost mailing list are busy developing a template system (there are 2 different alpha implementations already) that does compile-time dimensional analysis.

It means that if you accidentally do quantity<length> pos = myvelocity / mytime;, you get a compile-time error. (You should have done quantity<length> pos = myvelocity * mytime;.)

They also handle units, so quantity<length> = 2 * metres; will give a different value from quantity<length> = 2 * feet;, and will either add conversion factors as necessary or force you to use some sort of unit_cast to make the conversion explicit.

And thanks to templates, the checking is done entirely at compile-time, so there's zero runtime cost.
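Java's type system can't express the Boost templates' full dimensional analysis, but even plain wrapper types catch the metres-vs-feet mix-up at compile time: "metres plus feet" simply has no overload. A hypothetical sketch (all names invented, conversion factor 0.3048 m/ft):

```java
final class Metres {
    final double value;
    Metres(double value) { this.value = value; }
    // Addition is only defined for like units; Metres.plus(Feet)
    // does not exist, so mixing units is a compile-time error.
    Metres plus(Metres other) { return new Metres(value + other.value); }
}

final class Feet {
    final double value;
    Feet(double value) { this.value = value; }
    // The only way to combine feet with metres: an explicit conversion.
    Metres toMetres() { return new Metres(value * 0.3048); }
}
```

Unlike the Boost approach, this doesn't derive compound units (velocity, acceleration) automatically, and each wrapper costs an object allocation, but it would have been enough to flag the Mars Climate Orbiter bug.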

"To get around the errors, we need to escape our strings. We do that by writing a conversion method somewhere that, you guessed it, takes an UnescapedString as a parameter and returns an EscapedString.

Naturally our application would be full of bookkeeping annotations as we keep track of which strings are escaped and which aren’t."

Or not. Why not, instead of having the compiler throw an error, have it fix the problem for you? Declare a converter function from UnescapedString to EscapedString (or any other type pair of your choice) and have it used implicitly whenever you pass Type A to something that is expecting Type B.

In fact, doesn't nearly every language already do something like this when you pass a number to print()?

Clearly this is not safe in all cases. For cases where you don't want it, you just don't define a converter function, or you flag the function(s) where you don't want it to happen; then you still get the error.