Monday, October 10, 2011

Feeble Typing (a thought on Dart)

Update: A Word About The Title. If you came here from Twitter, the line I used - "not unsound, incomplete" - referred to the original title of the article "Incomplete by design," which was based on my misunderstanding of the dominant terminology in static analysis (see updates, and thanks David and Sam in the comments for pointing out my wrongness). When I realized this, I renamed the article "Deliberate decisions," but that wasn't an interesting title. When I read Rafaël Garcia-Suarez's take on Dart this morning, Why Dart is not the language of the future, I decided to support his proposal of calling types as a "lint-type development aid, not a language feature" feeble typing, and renamed the article again. You should read Rafaël's post and this one; it is possibly the most agreement you will ever find between a Perl blogger and a Carnegie Mellon University programming languages graduate student.

...anyway...

There's a lot of feedback, and a non-trivial amount of snark, going around the internet based on the release of Dart, a proposed Javascript-killer by Google. My primary experience with Javascript is that people tell me it's the worst compiler target language that is widely used as a compiler target language, so I basically have nothing invested in the language, but was interested by the discussions it brought up.

The snark about Dart has centered around the following line on Page 72 of the language spec, which I believe was pointed out to the Twitterverse by Debasish Ghosh.

The type system is unsound, due to the covariance of generic types. This is
a deliberate choice (and undoubtedly controversial). Experience has shown that
sound type rules for generics fly in the face of programmer intuition. It is easy
for tools to provide a sound type analysis if they choose, which may be useful
for tasks like refactoring.

But what does it mean for a type system to be unsound? I really think that the most illustrative snippet about Dart types came on the following page of the language spec was not the one that Debasish retweeted, but one that came on the next page:

A Dart implementation must provide a static checker that detects and reports exactly those situations this specification identifies as static warnings.
However:

Running the static checker on a program P is not required for compiling and running P.

Running the static checker on a program P must not prevent successful compilation of P nor may it prevent the execution of P, regardless of whether any static warnings occur

This, for me, clarified what was going on substantially. Let me tell you a parable.

How to anger students in an undergraduate PL course

In four easy steps!

Step 1

Tell students to implement the following dynamic semantics of a programming language. Here's an example of a very simple language:

The implementation was to be an ML function step with type expr -> expr option, and the specification was that step e = SOME e' if there existed an e' such that e \(\mapsto\) e', and that step e = NONE otherwise (for instance, \(\tt true\) obviously can't take a step according to these rules).

Step 2

Describe how they can type-checking the language, by defining a type system like this. Have them implement this type checker as an ML function with type expr -> typ option, same idea.

Step 3

Have students prove the theorem that this type system does something. The theorem statement goes as follows, and the proof is by the by-now standard technique of safety-via-progress-and-preservation.Theorem (Safety): If \(e : \tau\) and \(e \mapsto \ldots \mapsto e'\), then \(e' : \tau\) and also there either exists some \(e''\) such that \(e' \mapsto e''\) or else \(e'\) is of the form \(\tt true\), \(\tt false\), or \({\tt num}(n)\).

Step 4

Test their ML code from Step 1 on expressions like if 4 then true else 9, breaking many of the students implementations of the dynamic semantics which were prepared only to handle well-typed inputs.

Analysis: is this fair?

Think about the perspective of the student who complained about the fact that their interpreter either crashed (or maybe returned SOME(Num 9)!) after being handed if 4 then true else 9. On one hand, they clearly violated the spirit of the assignment: Step 1 was a perfectly well-defined assignment all on its own, and they didn't fulfill the specification of that particular assignment. But on the other hand, they proved the theorem in Step 3, and perhaps feel as if they should get something out of the fact that they proved that theorem: the ability to only have to reason about the behavior of well-typed programs: why should it be surprising that garbage-in produced garbage-out?

Compiler writers actually get to think like that; indeed it's almost essential that they be allowed to. On a 32-bit machine, most values are compiled to plain ol' 32-bit words, and so the representation of \(\tt false\) might have the same in-memory representation as, say, \({\tt num}(0)\). Or it might have the same in-memory representation as \({\tt num}(1)\)! It doesn't matter, because, for the compiler writer, the safety theorem has already given a guarantee that the language has canonical forms - that if \(e : {\tt bool}\) and \(e\) eventually steps to an irreducible expression, then that irreducible expression must either be \(\tt true\) or \(\tt false\).

This means that the compiler writer need not worry about how if 1 then true else 9 and if 0 then true else 9 might behave - they may raise an exception, return (the memory representation of) true, or return (the memory representation of) false. The only programs upon which the compiler promises to behave the same way as the language definition are those that pass the typechecker, and the type safety theorem is a critical component of that promise.

In this way of looking at the world, the representation independence given by a type system is really quite important, and it means that an unsound type system could cause very very bad things to happen: if you're allowed to mess around with the representation of, say, a pointer, by adding things to it, then you have introduced buffer overflow errors to a language, which would be pretty awful if you allowed code written in this language to execute in a privileged way in your browser. That (hopefully!) is not at all what Dart means when they mean their programming language is unsound.

Theorems versus bugs

I think the problem here is that, as a broad overgeneralization, there are two ways to look at what people are doing with types in the first place. On one hand, there is the view that types are a tool to provably preclude certain classes of errors - like the possibility that you might end up with the expresion if 1 then true else 9 which is "stuck" according to the defined operational semantics. On the other hand, there is the idea that types are a language feature that is aimed at helping document code and reduce the number of bugs in a program without necessarily precluding any particular errors. In the academic circles I travel in, a type system is understood as a technical definition about the former: if you can't prove a theorem about what kind of errors your type system precludes, then it is something else besides a type system. I think it's quite fair to both type systems and static analysis researchers to call the latter notion of types a simple form of static analysis.

[Updated] There's nothing wrong, per se, with such a static analysis, though I think it's fair to call it an unsound static analysis instead of an unsound type system. To use the language of Ciera Jaspan's recent thesis, start with a particular class of error (gets-stuck, raises-a-certain-exception, divides by zero, whatever). An analysis is sound if it never passes a program with a particular sort of error (permitting a safety theorem about that analysis!) and complete if it fails only programs that will actually manifest the error at runtime.1 A sound but incomplete analysis is called conservative; the type checkers of ML and Java represent such analyses. An analysis that is neither sound nor complete is called pragmatic by Jaspan, as there aren't any theorems to be proved about such an analysis: they can be justified only by their utility in practice.

I can certainly think of situations where I'd want a pragmatic analysis. In the past, I have had occasion to write Python, which I will admit I have a certain fondness for. However, I have also complained about how my complicated Python program ran for half an hour and then failed with some cast exception that, upon inspection of the code, was always, statically, evidently going to happen no matter what the complicated bits of the code did and why couldn't it have warned me that it was going to do that before running for half an hour. Even if I implemented an analysis to generate such a warning, Python is and would remain a (hopefully) safe, strongly-typed programming language with exactly one type - the type of tagged data that causes an exception if you try to use an object as an integer (or whatever). The static analysis is trying to prove a different kind of theorem - one that says "you have indicated that raising ClassCastException is undesirable, and here's a proof that your current program will raise ClassCastException". If the static analysis can't prove that theorem (thus demonstrating a bug), I'm no worse off than I was when I used Python without that static analysis. A type safety theorem, however, would have the form "if the program passes the typechecker, then ClassCastExceptionwill not be raised."

And with my current understanding, the "unsound type system" of Dart is just such a "pragmatic analysis" as described by Jaspan. I hope my examples explain what might still be wrong with such a language - if you can't static preclude certain classes of errors, you must either allow "unspecified behavior" (and that way lies buffer overruns and security violations) or else you must be able and willing to check, at execution time, for the occurrence of those errors, which is not efficient (and for some classes of errors may be impossible). You're back in the world of our hypothetical angry student: you've got to be able to handle all the ill-formed programs and obey the language definition on a wider class of programs.

[Updated] ...you could argue, of course, that you're no worse off than you were when you finished Step 1. On some level, you're certainly right; my hypothetical Python+tool-to-catch-a-couple-of-errors is better, in my humble opinion, than Python without (and this view has the virtue of honesty). If you want to make that argument, however, I encourage you to read Chung-chieh Shan's related blog post about covariant generics, which argues from a less practical-compiler-optimizations and more philosophical point of view that I also find quite convincing. The point he makes is that the traditional view of types is important because types should mean things - unless you want to choke on pencils!

Conclusion

In summary: type systems are useful because of type safety theorems: a type safety theorem means that certain things can't happen. One reason this is nice is because the complier writer, the programmer, and the language designer needn't worry about what happens in the case that such an impossible thing happens.

Types are also a form of documentation, and they're a regular form of documentation that a complier can then take, generating warnings or errors about certain classes of bugs without actually promising to preclude those bugs. A static analysis that uses type information to preclude some (but not all) errors of a particular type is probably better termed an "pragmatic type-based analysis" than an "unsound type system." Garcia-Suarez called it "feeble typing" and I renamed my post accordingly. It's a bit more pejorative than the tone I was originally trying to take in the article, but I like it anyway.

I think that's a fair way of looking at things, and it puts decidable type systems, on a practical level, as a member of a larger class of type based-static analyses.2 In a type system, we must be able to ask a question about what sort of theorem is proved about programs that are well-typed, and if there's no such theorem, then the analysis is still within the class of type-based static analyses, but isn't so much a type system. At the end of the day, of course, English has no central authority, so asking people to distinguish "type-based analyses" from "type systems" may be a fool's errand,3 but I think it's worthwhile to delineate the difference, and I don't think my delineation significantly departs from current usage (apart from "feeble typing" which was, I think, coined the day after I originally posted this).

Something I started off wanting to talk about before this post got too long was why it is the case that "sound type rules for generics fly in the face of programmer intuition" (which is pretty clearly, in my opinion, missing the addendum "in languages with subtyping"), because two of the reasons why I think this is the case are quite interesting on their own. One of them has to do with polynomial data types and persistent data, and the other has to do with base types and refinement types as explored by William Lovas in his thesis. Neither of these ideas are adequately represented in existing programming languages, though they are more-or-less theoretically understood at this point. Another day, perhaps.

Last word

I noticed that one of the principal language designers was quoted as follows

You have to understand that the types are interface types, not implementation types – so the runtime cannot use the type information for anything. The types are simply thrown away during compilation.

That is to say, insofar as this article goes, I think I've only said factual things that the language designers would essentially agree with; in particular, they seem to recognize that their use of the word "types" seems bound to confuse (or troll) others. But calling a "feeble type" an interface types and a "type" an implementation type seems to just be making up words. And, as made-up-words go, I really dislike "interface types" as a neologism (certainly it has nothing to do with Java interface or anything that comes up when I search for "interface type"). The theorist's critique of Dart is precisely that things you call "type" should define inviolate interfaces and not mere suggestions that are neither enforced nor checked. Calling them interface types makes them sound like contracts, which are not thrown away by the compiler. "Suggestion type" might be a less pejorative version of "feeble type," perhaps? And "implementation types" is a terrible term to use to describe types in a a type system, types that (by way of a theorem about canonical forms) can be relied upon by both the programmer and the implementation.1 Godefroid, Nori, Rajamani, and Tetal call the sound analyses may analyses (though it should perhaps be may not, as a sound analysis precludes a certain behavior) and call the complete analyses must analyses (the error must happen) in the paper "Compositional may-must analysis."2 This view really isn't fair to the reason that we are interested in type systems and believe they're supposed to be useful, but that has to do with Curry-Howard and the unreasonable effectiveness of mathematics and with other things Shan talked about in his linked post.3 Honestly, it probably won't help the fool's errand if I try to call the other group's type based analyses "feeble typing," but here I am, doing it anyway.

Even a sound type system is an "incomplete static analysis" (if it's computable). ML type inference, for example, does not capture the class of programs that are either values or can make progress. No computable analysis could. Both type systems and static analyses (is there a difference?) are interesting for what they say *doesn't* happen (ML: proves the program doesn't go wrong; 0CFA: proves that lambda doesn't show up at this call site). My simplistic understanding of what's going on with Dart is the system doesn't say anything about what doesn't happen. It's both unsound and incomplete. Maybe?

ML is a conservative static analysis, in that it rejects programs that would indeed pass the conditions laid forth by the conclusion of the type safety theorem. Whatever Dart is, it goes in the other direction: it passes programs that would fail the conditions laid forth by a purported statement of "type safety."

You have worried me that I got "incomplete" versus "unsound" exactly backwards in the context of static analysis; I've edited the post to reflect this and will check with Ciera.

I think the correct way to look at Dart is that there are two classes of errors. Some errors - hopefully - like manipulating the memory representation of objects are - hopefully - precluded in any well-formed program; if you can cast object pointers back and forth to integers by going through arrays that's a huge problem, obviously, so I hope that's not the case. If I'm correct, then Dart programs are all statically well-typed in the sense of that Python programs are.

There are also other errors that raise well-behaved exceptions at runtime; it's possible to preclude some, but not all, of them, using the type system, so they're more like checked documentation than what is understood as a type system.

WRT your update, soundness/unsoundness means the same thing for type systems as for static analyses. In fact, type systems just *are* a particular kind of static analyses. The Dart "type system" is both incomplete and unsound -- it's more accurately described as a hybrid of Lint and JavaDoc.

There are interesting attempts at creating complete (but necessarily unsound) static analyzers. Success typing, a strategy pursued for Erlang static analysis, is a example of this.

Any object-oriented language which allows down-casting is unsound right out of the gate--the fact that a ClassCastException can be raised at runtime is an admission of this fact. Most runtimes are quite resigned to cleaning up messes that the static type system has failed to prevent.

Tom: It's still possible to differentiate the statement "ClassCastException will be raised only when you have a cast, so cast-free programs will not raise ClassCastException" from a language where any assignment, projection, or numeric operation might raise ClassCastException. And the runtime has a rather different burden depending on which statement you prove.

An analysis is sound if it never passes a program with a particular sort of error (permitting a safety theorem about that analysis!) and complete if it never fails only programs that will actually manifest the error at runtime.

I think the second "never" shouldn't be in that sentence: sound would mean that no bad programs get through, and complete would mean that all good programs get through, yes?

I almost disagreed with you on "all good programs get through," but that was because I'm so used to types that I can't think of 5 + (if true then 9 else false) as a "good" program. Yes - a complete analysis means that every non-buggy program (and maybe some more!) get through.

I would define a type system as something sufficient to ensure that the semantics of a program is defined.

If the semantics is intended to be defined regardless of the type system then I wouldn't call it a type system - instead the actual type system is a "dynamic" type system that accepts all programs that parse correctly, and assigns everything the same monotype at compile time.

If what's being called the "type system" works similarly to a type system in the way it reports errors then it might properly be called a "type refinement" system - with the extreme situation where all refinements refine the same monotype (sometimes called "soft typing"). But, even type refinements are usually expected to soundly express properties of programs.

Personally I'll be taking the invitation regarding "tools to provide a sound type analysis" seriously, and thinking about how to design a sound system of type refinements that can be used in place of the unsound system. (Of course, it's a shame that sound static types aren't part of the foundation of the language.)

Rob: you're taking an overly-narrow view of types if you think that "if true then 9 else false" doesn't type. There are lots of type systems where that will type. Anything with a common supertype of 9 and false, or anything that can tell that "true" is always true.

Rowan: your definition seems to make the definition of "type system" hinge on how the operational semantics of your program is modeled. What about a big-step semantics where divergent programs have no semantics? Or an operational semantics with a distinguished wrong value?

Sam: I did say "almost disagreed" - I corrected myself! I should have said "I am so used to the conservative view of types," perhaps. Obviously Python or Dart do have a common supertype ("The Type" or "The Dynamic Type") in what I would consider the "real" type system (the one that permits a type safety theorem and suggests that memory safety is preserved at runtime) - and there exist type systems that inhabit the vast territory in the middle.

The idea that a type system might say "oh, because if true then 9 else false is just definitely 9, I'll treat it like a number" is unfamiliar to me, but it's not wrong. To use Shan's analogy, it's the type system permitting me to put a pencil in a bowl of apples if I promise to take it out before I hand it to anyone expecting a bowl of apples - that's a a legitimate meaning.

Sam and Rowan: I think what you're discussing is a point that I didn't address and that Chung-chieh Shan really did address in a way I find really pleasing.

I didn't address the idea that types are connected to meaning, and he argues that they should be. You're right, Sam, when we don't handle divergence in our language's static semantics, meaning is defined somewhat differently between the big step and small step semantics. I believe that's the point Shan made, that I was originally quite confused by, when he said that "unsoundness in Dart is no different than Type:Type" on Twitter. His point was that both weaken the meaning interpretation of types.

In his post, Shan evaluates the idea that these two ways of weakening the types are different. In my view, the guarantee of canonical forms given my "plain ol'" type systems in nonterminating languages to be an important guarantee. In someone else's view, you could make the claim that it's the modern return of Hoare's billion dollar mistake of null pointers, because every expression of type "apple" either denotes an apple or nontermination, and there's no way to denote in the type system "no, really, apple." That's not a discussion I'm interested in having in this thread (it's too bad Shan's blog appears to have no comments).

Rowan: as for "If the semantics is intended to be defined regardless of the type system then I wouldn't call it a type system," that concisely states the point I was verbosely trying to get across all along.

It's more like a refinement system in practice, but again, there's no theorem, so it's a member of a broader family of static analyses. I'm retitling the article "feeble typing," which Rafaël Garcia-Suarez, a Perl guy, coined in an article making basically the same points I've made here.

Wait a second -- Rob, you said, "I'm so used to types that I can't think of 5 + (if true then 9 else false) as a "good" program." Nothing I'm proposing would require you to think of that program as "good".

Oh, I think I know what's wrong -- maybe we're defining "good" differently. When I said that "sound" and "complete" can be understood as "no bad programs get through" and "all good programs get through", respectively, I meant that "good" (and its opposite, "bad") should be instantiated with whatever property our analysis is trying to prove (or disprove). So, for a type analysis, "good" would be well-typed and "bad" would be ill-typed. Maybe you were thinking of "good" as "doesn't get stuck"?

I think I agree with you on what I was using as a definition of "good." I also think the definition you're using is actually closer to the objectively correct one for thinking about type systems. That is, the reason we use types is in order to prove something - to make sure our programs don't get stuck (or, even more strongly to make sure that our programs denote things), and "5 + (if true then 9 else false)" is an actual, if unusual, way of denoting 14.

My original, informal way of thinking about things, is fine, informally, as far as it goes, but the discussion in the comments reminded me that I shouldn't (indeed can't) think too rigidly along those lines. Indeed, elevating the idea "I have a static analysis called type checking that defines programs that are good because I say they're good" over the idea "I have a property that I want to prove about my program, like that it means something or won't crash, so I use types" is kind of what I'm saying the Dart designers did.

An even simpler example would be my hypothetical flow-based Python analysis, that tries to notice a situation where all paths through the program will lead to a type error and flag those for me; if there's any path through the program that might avoid disaster, then the analysis can assume I know what I'm doing and stay silent.

Rob: I think there are two issues getting mixed up here. Regardless of Dart or soundness or static analysis vs type systems or meaning, the program "if true then 9 else false" is a well-typed Scala program (modulo syntax). If the claim is that all type systems with subtyping are outside the "conservative" view of types, I would say that's a mighty odd definition of conservative. (As an aside, (+ 5 (if #true 9 #false)) is a well-typed Typed Racket program.)

Sam: Yep - what you're seeing is precisely that I'm mixing up two issues. In particular, I'm having trouble disengaging the type systems I'm used to (which generate, in my brain, a particular use of the word "conservative" which is by no means canonical) from the actual critical point, which is what is the theorem.

It is a bias I was not fully aware of, and I appreciate you and Lindsey Kuper making me aware of it!

Subscribe To

Request for Contact Info

Disclaimer

I discuss slightly-, half-, and mostly-baked research ideas here, but I treat the blog like an infinitely large office whiteboard. In particular, I usually edit posts post-publication; while I don't generally edit months-old posts, I might.

Therefore, if for some reason you want to cite something from this blog that's not otherwise written down, shoot me an email. I'll say "Whoohoo!" and then I'll throw the relevant post on Arxiv or somewhere similarly archival.