Data typing might be important for someone

Excellent post and interesting discussion at Neil Brown’s blog, on the question of the role of types for professional software developers and for students. I agree with his points — I see why professional software developers find types valuable, but I see little value in them for novice programmers or end-user programmers. I have yet to use a typing system that I found useful, that wasn’t just making me specify details (int vs Integer vs Double vs Float) that were far lower level than I cared about nor wanted to care about.

Broadly, what I’m wondering is: are dynamically/flexibly typed systems a benefit to learners by hiding complexity, or are they a hindrance because they hide the types that are there underneath? (Aside from the lambda calculus and basic assembly language, I can’t immediately think of any programming languages that are truly untyped. Python, Javascript et al do have types; they are just less apparent and more flexible.) Oddly, I haven’t found any research into these specific issues, which I suspect is because these variations tend to be per-language, and there are too many other confounds in comparing, say, Python and Java — they have many more differences than their type system.

I just started working in Scala in July, after years of Java, Perl, and Ruby. Scala ranks as my #1 Worst Language For Beginners right now: the type system is useful, but there’s little chance of using it casually. The time I would have spent on unit tests in Ruby I now mostly spend on types in Scala; they can be onerous for intricate things, but mostly they save code you’d otherwise write for safety checks, especially with Scala’s type inference. You can sort of see what Haskell fanatics are on about, and you can see that Java’s type system isn’t worth much.

But for beginners? No way. They need to focus on basic computational thinking. Let them learn for loops and ADTs first before we hit them with a Turing-complete type system.

Might you be throwing the baby out with the bathwater here? I fully agree that int vs float vs double is too low level for novices and end-users. But what if your type system instead let you capture ideas such as “the input list/array must be sorted”, or other characteristics of your data? Then you’d be getting the real benefits of types: helping you articulate assumptions and catching errors at the point of creation rather than observation. I’ve certainly seen value in asking novices to articulate these assumptions (and document them in comments).
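To make that concrete: even without type-system support, such an assumption can be stated and checked at the function boundary. A hypothetical Python sketch (the function and names are mine, purely for illustration):

```python
def binary_search(xs, target):
    """Return the index of target in xs, or -1. Assumes xs is sorted ascending."""
    # The assumption is both documented and checked, so a violation fails
    # at the point of creation rather than at the point of observation.
    assert all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1)), \
        "binary_search requires a sorted list"
    lo, hi = 0, len(xs)
    while lo < hi:
        mid = (lo + hi) // 2
        if xs[mid] < target:
            lo = mid + 1
        elif xs[mid] > target:
            hi = mid
        else:
            return mid
    return -1
```

A type system that could capture “the input must be sorted” would move that run-time assert into the function’s declared interface.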

The mainstream languages that we’ve taught programming with don’t support such types. But some languages do. This is one of the features built into Pyret (pyret.org), the new language out of Brown designed around pedagogy with testing and annotations. Pyret makes type annotations optional, so students can ease into using them over time, and you can mix annotated and unannotated functions in the same program.

Types are a prime example of where pedagogic concerns should be influencing language design. The problem lies in the lack of influence of pedagogic concerns on language design in this area, not in the concept of types itself.

Kathi, you are designing a new kind of type system in a new language to support student learning. Of course, that might work well, and I wouldn’t be so presumptuous as to say that it wouldn’t. I look forward to seeing the results of you using Pyret with students! The two questions I would have are:

(1) What are the learning goals? What would you expect students to learn when you have those kinds of types built into the system?

(2) Does the increased cognitive load (of having to figure out types, too, in addition to everything else) lead to decreased retention?

I think it’s important to distinguish between two use cases. You use the phrase “data typing” in the title of the post, and that nicely subsumes both concepts I think about here.

The distinction is between a type system—with its own language of types and semantics for checking them—and constructs for structuring data.

I believe (and this one is hard for me to see fault with, but still warrants testing), that showing beginners how to structure data well, and having the language facilitate it, is absolutely crucial. If we only teach students to program with weak structures like lists and dictionaries, we’re missing out on a key part of programming: defining the shape of data that fits the problem. HtDP with Scheme/Racket has always focused on this, creating records with named fields to organize data and define recursive structure, and building up algorithms from there (lists are the simplest recursive data structure we use).
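For instance, the record-with-named-fields idea, sketched in Python rather than Racket (a hypothetical example, not HtDP code):

```python
from dataclasses import dataclass
from typing import Optional

# A record with named fields, instead of a dict of loose keys.
@dataclass
class Point:
    x: float
    y: float

# A recursive structure: the shape of the data drives the shape of the code.
@dataclass
class Tree:
    value: int
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None

def size(t: Optional[Tree]) -> int:
    # One case per way the data can be built: empty, or a node.
    if t is None:
        return 0
    return 1 + size(t.left) + size(t.right)
```

No type checker is involved here; the payoff is that the algorithm over the data follows directly from the data’s definition.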

In this case, there is no type system checking anything, but there are programmer-defined *data types*. Pyret does this with “data”, which is modelled after algebraic datatypes (see the examples at pyret.org). Understanding the relationship between a real-world problem and a computational representation of it, and being able to design such representations, are two really important skills that are explicit teaching goals for us.

Type annotations and type checking are a distinct language feature, and neither our historical curricula nor Pyret require their use initially. All the annotation positions are completely optional. Types introduce an entire second language, distinct from the programming language that defines behavior, and I believe you’re right (and this could also use testing) that trying to learn both at once simply confuses things.

However, another explicit learning goal we have is that students learn multiple different ways of describing the behavior of programs. One of these is simply writing the program. Another is to describe it in English. A third is to write test cases for it. A fourth is to describe it in terms of the data it works with: “this is a function that takes binary trees and produces numbers”. A fifth is to enumerate invariants over the computation: “this binary tree is balanced and ordered”.

The fourth learning goal I listed (and a bit of the fifth) is where types play an important role. Once students have gotten used to writing tests, defining data, and writing functions, it’s possible to start asking more abstract questions about their functions, like what a more formal description of the input and output are. The language of types is designed for just this, and learning how its descriptive power relates to the behavior of programs is an outcome in itself.
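A small sketch of how several of those descriptions can coexist on one function (Python stands in for Pyret here; the example is mine):

```python
def total(prices: list) -> float:             # (4) the data it works with
    """Add up all the prices in the list."""  # (2) an English description
    result = 0.0
    for p in prices:                          # (1) the program itself
        result = result + p
    return result

# (3) test cases describing the behavior
assert total([]) == 0.0
assert total([1.5, 2.5]) == 4.0
```

Each description is redundant with the others by design; the learning goal is seeing how they relate.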

Where exactly to start putting annotations is worth some serious consideration. For beginning students, I believe that unit tests, especially those written by the student, are much more effective at catching bugs, describing computation, and giving students actionable feedback than types and type errors. Then, at some point of programmer development, or perhaps just program size and complexity, types become not only useful as a concept in their own right, but also help structure programs, catch bugs, and make development easier, for all the reasons that software engineers like types.

To summarize:

1. Structuring data is valuable regardless of whether there is a type language or type-checker. Describing the data we’re working with well is a cornerstone of how we teach programming.

2. Types are a language for describing data and computation in the abstract. Being able to describe programs in many ways is an explicit learning goal, and types are an excellent descriptive language that students should learn to use.

3. The main question is *when* to introduce types, with their associated second sub-language and new vocabulary. Does this work early on, after the first few functions a student writes? Or should it be left until they’ve had more experience with defining and testing functions over structured data, even for large programs?

I don’t think you even need to resort to fancy types like list sortedness. I think the “documentation of assumptions and have them automatically checked” applies even in the simple case of documenting simple interfaces between functions. For example, if you decompose a problem into a function that uses some numeric data from a data file, does that function expect a string (i.e., raw data from the file) or a list of integers (the parsed data)?
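In Python-ish terms (a hypothetical decomposition, purely illustrative), that interface question is exactly what an annotation pins down:

```python
def parse_readings(raw: str) -> list:
    """Takes the raw text of the data file; returns the parsed integers."""
    return [int(field) for field in raw.split()]

def average(readings: list) -> float:
    """Expects the *parsed* list of numbers, not the raw string."""
    return sum(readings) / len(readings)

# The annotations document which function owns the parsing step.
print(average(parse_readings("3 4 5")))   # 4.0
```

Passing the raw string straight to `average` fails, and the annotations make the mistake visible in the signatures rather than in a stack trace.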

Peter-Michael, you don’t “need” to do any of these things, including have run-time checks that enforce types. However, we do do these things, both to improve communication and to reduce error. What Kathi is pointing out is that this also plays into the design process. [Longform below for those who don’t know these concepts, which I know you do.]

To concretize (let’s say we stick to sorting numbers in ascending order), we have

sort :: List -> List

and let’s say we want to do it with insertion sort, which requires this helper function:

insert :: Number * List -> List

Somewhere I have got to write down that the list given to insert is already sorted, otherwise I have no hope of writing this code at all (because insert, far from being a “helper”, becomes at least as hard as sorting itself). So I could write that down informally, in a comment next to insert’s signature.

What Kathi’s pointing out is that with language support, you can also write

sort :: List -> List(is-sorted)

insert :: Number * List(is-sorted) -> List(is-sorted)

which means this is actually enforced, rather than just being in comments and misleading the programmer (student) in cases when it’s violated. That is, it does exactly the same job as writing down List did in the first place.

[The above “refinement” annotations can’t be written in most languages, but they can be written, and are checked, in the language Kathi referred to (Pyret: pyret.org).]
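In a language without refinement annotations, the closest one can get is to check the property dynamically. A Python sketch (my code, approximating what the List(is-sorted) annotations enforce):

```python
def is_sorted(xs):
    return all(xs[i] <= xs[i + 1] for i in range(len(xs) - 1))

def insert(n, xs):
    # Contract: xs is already sorted; the result is sorted too.
    assert is_sorted(xs), "insert requires an already-sorted list"
    for i, x in enumerate(xs):
        if n <= x:
            return xs[:i] + [n] + xs[i:]
    return xs + [n]

def sort(xs):
    result = []
    for n in xs:
        result = insert(n, result)   # the invariant holds at every call
    assert is_sorted(result)
    return result
```

The annotations in the refinement-typed version do the same job, but as part of the declared interface rather than as asserts the programmer must remember to write.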

I very much disagree. My experiences teaching shaky programmers using an untyped language have been disastrous. In a language with strong typing, many errors can be picked up by the compiler. For example, my students tend to endlessly confuse collections of items with single items. They have trouble understanding whether a function returns an array of things or a single thing. In a typed language, the compiler spots this and gives them an error message so they know something is wrong and can get to fixing it right away. In the untyped language, this isn’t picked up until runtime, when the program blows up. Time after time, my students would spend hours hunting for the error when this happened. Because they were beginners, they did not have good skills for debugging runtime errors and would end up looking randomly all over the program. I think beginning programmers need all the compile-time support they can get. Strong typing increases the odds that the compiler will pick up fundamental mistakes.
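The collection-versus-item confusion is precisely the kind of thing a checker surfaces at once. A hypothetical Python sketch (with these annotations, a static checker such as mypy flags code that treats the list as a single word; an unannotated version only fails at run time):

```python
def matches(words: list, letter: str) -> list:
    """Returns a *list* of matching words, not a single word."""
    return [w for w in words if w.startswith(letter)]

found = matches(["apple", "avocado", "banana"], "a")
# found.startswith("a") would be the classic beginner mistake: a static
# checker reports it immediately; without one, it crashes at run time.
first = found[0] if found else None
print(found, first)
```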

I would love to see a study of whether or not the students get what you’re saying, Bonnie. We know from lots of work (e.g., the Commonsense Computing work on sorting) that students don’t understand types well. But do they get what their purpose is? Do they see that it reduces errors at runtime? Do they value that?

The “Unlocking the Clubhouse” and our MediaComp work both found that students find computer science classes mostly “irrelevant.” They don’t know why they’re doing all that we make them do. Do types make that even worse? The CMU folks suggested that that sense of irrelevance increased attrition, especially among female students. Do types contribute to that?

I suspect they value spending only 15 minutes fixing a compiler error that takes them to the exact line that is wrong vs spending 4 or 5 hours staring at a program that crashes, trying to figure out how to solve the problem!!!

You might be right. The work on students’ responses to compiler errors suggests that they find them off-putting, particularly because compiler error messages are not designed well for novices. Which impacts self-efficacy more — an error that “yells” at them (the verb that my students use when describing errors) and so won’t run at all, or a program that runs but with wrong behavior or that crashes during the run? It’s an interesting research question.

I agree that standard compiler error messages are not the greatest, though it is pretty easy to teach CS1 students what the common ones mean, especially the ones related to type checking. And it is still way better to know, via the little red dot in Eclipse or whatever environment, that there is something wrong on THIS line, than to be staring at a program that has mysteriously died or done something very odd, without any idea where to look.

If I had time to design a teaching language, I doubt I would do anything fancier than to strip the language down to a small set of constructs and add friendly compiler messages. Oh, and my textbook, instead of being a 500-page tome that covers every nook and cranny of a language, would be small and sleek and friendly in tone.

The sorts of issues we’re discussing are at the heart of my dissertation research. I look at electrical engineering students learning to design programs in C, where typing is explicitly enforced by the compiler. I wanted to react in particular to Mark’s comment:

> I have yet to use a typing system that I found useful, that wasn’t just making me specify details (int vs Integer vs Double vs Float) that were far lower level than I cared about nor wanted to care about.

Comments like that make me wonder how our work connects to findings from language acquisition in general and second language acquisition in particular. Take the following example statements, where I think the issue of “typing” in computational analogues has strong connections to grammatical rules in human languages. The cascade I’ll show uses conjugation, gendered nouns, and declension as constructs comparable to the cascade of type enforcement. The thing to consider at each stage is Mark’s question: is the language making its speakers specify details that are at a far lower level than they care about, or want to care about?

> We went to the shop. (English)

In English, the verb “to go” is conjugated identically no matter what the subject is. So, if one wanted to perform a subject replacement, changing “we” to “he” or “I,” none of the rest of the statement has to change.

> Nosotros fuimos a la tienda. (Spanish)

In Spanish, the verb “to go” has distinct conjugations. In this case, the subject is “we,” for which “to go” has a one-to-one conjugation. The ending “-imos” is only ever used when the subject is “we,” which actually makes the subject of the sentence redundant. We could just as easily write “Fuimos a la tienda” without any loss of meaning. Note also that “tienda” is grammatically feminine, which is why we need “la” (as opposed to “el,” the masculine definite article). So, here, an advantage of having to specify detail in verb conjugation is that we encode information in the verb that makes the sentence robust to lost information. If we dropped the explicit subject, the phrase “fuimos a la tienda” would still parse to an unambiguous meaning.

> Wir gingen in den Laden. (German)

In German, “they” and “we” don’t have distinct preterite conjugations for the verb “to go.” Instead, they share one, “gingen.” But the verb would still have to be reconjugated if we changed the subject to “I.”

“Shop,” meanwhile, is a masculine noun in German, and German nouns must be declined to indicate their grammatical role. Here it takes the accusative, so “den Laden” is appropriate. Were it the subject of a phrase, say, “the shop was closed,” the very same noun gets a new definite article: “den Laden” becomes “der Laden.”

In summary, with respect to this phrase, English is like the Python of these languages. I can change the subject at will, even inverting the subject and object of the sentence. “The shop went to us” is a simple copy-paste operation that produces a grammatically correct (if hard to interpret) result. No additional “code change” needed.

Spanish makes it trickier. If I invert the subject and the object, I have to reconjugate the verb: “La tienda se fue a nosotros.” I get to keep shop and us as is, because they’re the same word whether they occupy the subject or object positions in the sentence. So, this refactor is a bit tougher and requires typing, as opposed to just copying and pasting. But, most of the entities don’t change.

German is the most stringent. To invert subject and object, I have to write “Der Laden ging zu uns.” This refactor requires *three* major changes, all of which necessitate typing:

– the subject-object switch means “den Laden” becomes “der Laden.” Same entity; different grammatical function.
– the subject-object switch also means “wir” becomes “uns”
– since “to go” is effectively a function taking a new type of argument (a singular thing as opposed to the plural “wir”), it gets modified from “gingen” to “ging”

We might conclude that German is a lousy language for rapid prototyping, because refactors are expensive if we want to make changes to the schema of entities. Or, we might conclude that English is a lousy first language for beginners because they won’t understand the stricter rules of conjugation, declension, and gender that are necessary to write valid statements in other languages.

But, Mark’s original statement, ported to my examples, becomes “do languages like Spanish and German require me to specify detail at a lower level than I care about, or want to care about?” My point is that in the acquisition of human language, we rarely (if ever) have a choice about which language is acquired first. So, Mark’s statement becomes peculiar (in a good way) if you think about it from a human-language standpoint.

A response I’m expecting is “but, see, that’s the point. We can choose which artificial language we expose students to, which is the center of this debate.” I understand that viewpoint, but I think it sidesteps the issue. If students display fluency in being able to read, write, speak, and listen to *human* languages, what cognitive (or interactional) resources are they making use of to do that, and how might we tap them for computation instruction? And, why do we not (or, if we do, I would love someone to point me to where we do it) actively pursue theories and findings from second language acquisition for understanding students’ struggles to grasp things like type constraints?

C does not enforce “types” in the compiler. There are two distinct type systems in C, one that the static system enforces and a completely different one that the run-time system supports. This is the source of much confusion. In addition, C’s type system is extraordinarily weak from any modern perspective. Put these two together and a study based on C’s types has no generalizability, nor even validity to any mainstream understanding of what types are about. So, study C by all means if you want, but don’t say it’s about types: say it’s about C.

Then you say there’s a static system of type-checking that runs prior to a program’s execution. I assume the distinction you’re making is that type-checking isn’t part of the C compiler, but nonetheless types are checked prior to run-time in C. Is that right? If that’s not right, could you elaborate?

Brian Danielak: C is very confusing to talk about in this regard because, as I said, it has two type systems. To pick a well-understood example for contrast, Java has one: the types at run-time correspond to those that the compiler understands. Furthermore, Java has a safe run-time, so these types are enforced. Thus you can have a soundness theorem for Java. (Ditto for Scala or ML or Haskell or ….) But C’s run-time is (a) defined over word sizes, not the static types that the language offers; and (b) these are not enforced. Thus, (b) tells us it isn’t sound, but (a) tells us it isn’t even meaningful to talk about soundness. [NB: I’m using “sound” in the technical sense. If you’re not sure what it means, please ask.]

The “compiler” is irrelevant to this discussion, so I shouldn’t have mentioned it (I was just copying your phrase). That’s merely an implementation strategy. If you had a C interpreter instead (and such things have existed), the issues would remain the same.

Unfortunately (in my opinion), you can’t get completely away from types. In Python, for example, students have to understand that 5 is not the same as ‘5’ and that 6 + 5 yields a different result than ‘6’ + ‘5’ and that 6 + ‘5’ causes an error. Don’t get me started on what happens when you try to multiply something by 1/3 – thank goodness Python 3 finally fixed that problem.
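For the record, here is that behavior in Python 3:

```python
assert 5 != '5'            # a number and a string are distinct values
assert 6 + 5 == 11         # numeric addition
assert '6' + '5' == '65'   # string concatenation
try:
    6 + '5'                # mixing the two is a TypeError at run time
except TypeError:
    print("6 + '5' raises TypeError")

assert 1 / 3 != 0          # Python 3: true division, about 0.333...
assert 1 // 3 == 0         # floor division must now be asked for explicitly
```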

Hi Mark, I think you’ve mixed up a whole bunch of related but distinct concepts, and some of the comments here reveal it.

– Are you referring to *static* types, i.e., types that are checked before a program’s execution (and prevent it if the checker does not pass)?

– Are you referring to the ability to create new, truly distinct, forms of data?

– Are you referring to the ability to create new abstract datatypes whose representations are hidden (i.e., ADTs — distinct from classes, objects, etc.)?

– Are you referring to the fact that a run-time might check values for integrity according to a partitioning of data? (This is what Lloyd Smith in comment 11 refers to, but this is entirely distinct from static types. Neil Brown’s quoted paragraph is also similarly confused between types and tags. But this is a common misunderstanding.)

– Are you referring to the *annotations* that are required for static types? (And I presume you’re aware they are not required in the presence of type inference?)

These are not even all orthogonal issues. So you have to be careful to define what it is you’re talking about.

Assuming you are referring to static types, let me point out that you still have two unjustified (and incorrect) assumptions in your message:

1. You write as if languages must be statically typed or not — it’s one or the other. That is not true. For instance, in Pyret (http://www.pyret.org/), you can write either a typed or untyped program. In particular, you can start with an untyped program, and then gradually add types until (if you wish) the program is fully typed.

2. You haven’t given an exhaustive enumeration of what you consider irrelevant details, but Pyret, for instance, does not have meaningless distinctions between “int”s and “float”s. There is a type Number, and the corresponding values behave like numbers.

Shriram, I’m concerned with how students perceive issues of types. My particular research interest is in non-majors, the vast majority of programmers who are not and never plan to be professional software developers. Differences that students don’t perceive don’t really exist. Unless you can show that students can use something and understand it, you don’t yet have evidence of success.

– I don’t know of evidence that students perceive a difference between run-time and compile-time type errors. They see errors, and don’t really think about where they come from.

– I do know of evidence that (a) students don’t understand most type systems and errors associated with types and (b) students find the specification of static types (what I think you mean by annotations) onerous. The best work that I know in this space is Andreas Stefik’s work with Quorum where he’s shown how student-centered design of a type system leads to a more usable and more understandable type system.

– From what I know of Pyret (and please do correct me if I’m wrong, because I’d love to read it), you have not yet conducted studies of students using Pyret (let alone having them vetted through publication in a peer-reviewed forum). I know that you have created Pyret as a language that can have either, but defining a language isn’t the same as finding that students can use the facilities you’ve designed and understand it the way that you meant. Do you know of other evidence that students can successfully use a language where they start with untyped and gradually add types?

Mark, I too am deeply concerned with how students perceive these things; I’m not really that interested in what academics and language designers think (partly because I already know, and know which of these opinions truly matter and which are merely viewpoints conventional to the community). To address your bullets:

I’ve found that even students who have quite some programming experience don’t have much understanding of static vs dynamic errors. That said, part of my goal and job as a computer science educator is to get students to understand these distinctions. They really do matter as one goes further into the subject. And Pyret is being designed for introductory _computer science_ education. That said, because it can be used without any static types at all, it doesn’t force anyone into a typed setting.

I have seen Quorum. The studies are more interesting than persuasive, and the materials on the language are really weak. That said, I fully support the goals of the project. I’d love to see more work in this space.

We haven’t yet done any peer-reviewed research on Pyret. You need a language first to do research on it! And you need an environment, a performant runtime, etc., to avoid confounding factors. This is the first semester Pyret has been public. But we are gathering data on it. We already have lots of student programs, written without forcible guidelines one way or another, in which we can see to what extent students will write annotations “in a state of nature”. We also have these for two quite different levels of students, so we can perform comparisons between them, too. Of course, we need to do both formal studies and get data from a broader range of students.

But I do think you’re speaking a bit too loosely about what are complex topics in their own right, and saying you only care about student perception is no more useful than saying one only cares about mathematical truths about programming languages: both are extreme positions and somewhere between useless and dangerously misleading in isolation. Some of the distinctions I’m raising are matters that users of a language will face, some maybe even observable to novices.

I also think this business about “novices” is too easy a dodge. But that’s perhaps a discussion for another day.

Points well-taken, Shriram. Barb is preparing for her qualifying examinations in Human-Centered Computing, and she’s been reading Latour’s work on the sociology of science and Kuhn’s concept of paradigms. It’s led to discussions about how different communities define their issues and view evidence differently. I think the programming languages and education researchers have different definitions of what’s a problem and what constitutes evidence.

I find static types checked by the compiler to be very valuable in large projects (multiple programmers or multiple years). I find them to be mainly a nuisance in small programs (one programmer for a week or less). I like strong typing, but not static typing, for small programs, and I now do most of my prototyping in Python. Larger projects I do in C++ (not because it is “best”, but because I’ve used it for decades and am more comfortable with its idiosyncrasies than with those of languages I’ve used less).

I do wonder what shriramkrishnamurthi means by “There is a type Number, and the corresponding values behave like numbers.” The axiomatic bases for integers, rationals, reals, complex numbers, and quaternions are all different. Some operations (like square root) are functions in the complex numbers, but not in the reals or rationals (though a limited form of square root is possible if you restrict the domain to the non-negative reals). Other operators (like <) are not definable in the complex numbers, but are in the reals, rationals, and integers. Modular arithmetic is well defined for integers and can be extended fairly cleanly to the rationals and reals, but the extensions I’ve seen to the complex numbers are unsatisfying.

Telling me something is a "number" does not tell me what operations I can do on it.

Our axiomatic stance is simple: it’s an alias for rationals, the largest type we intrinsically support. We could talk more about these things but this isn’t the forum. To me, the much more interesting question is how Mark reconciles your remark.

You have rightly pointed out that these notions are not interchangeable. Mark has said, “details (int vs Integer vs Double vs Float) that were far lower level than I cared about nor wanted to care about”. We know, for instance, that it’s painfully easy to see that floats are not associative under +. So I wonder whether Mark believes that these properties simply don’t matter at all — e.g., that a student would never ever have a situation where A + (B + C) turns into (A + B) + C while, say, cleaning up a program.
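The point is easy to demonstrate with IEEE-754 doubles; for example, in Python:

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c     # 0.6000000000000001
right = a + (b + c)    # 0.6

# The two groupings round differently, so reassociating + while
# "cleaning up" a program can silently change the answer.
assert left != right
```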

The non-associativity of floating-point + came up in my bioinformatics lecture today, as an off-hand remark that one shouldn’t do equality tests in the traceback algorithm if using floating point, though it was fine when doing integer arithmetic.

Sorry, it’s been a while since I studied floats so I could be quite off, but I don’t see the connection. The problem with floating-point equality does not mean you should not test (you should!) — it’s that your check should be up to some tolerance, not for the exact answer. (This is also related to reading in floating-point numbers from source files.) But the lack of associativity means you can’t refactor code, because the answer may end up arbitrarily far (in float space) from the intended answer. [Though, then, all the more reason to have written tests to make sure you didn’t fudge things up completely.]

The traceback algorithm looks for which of several prior places got the result of a max operation, to which another number was added.
If you do exact arithmetic, you can stop as soon as you get a match, but with floating point, (a+b)-b does not necessarily result in a, and so figuring out which predecessor was the max requires looking at them all. (Sorry, this comment is too short to explain in detail the dynamic programming problem, the traceback, and the optimization that can be done with exact arithmetic but not with non-associative addition—I spend about 4–5 hours on the algorithm and its variations in lecture.)
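A tiny Python illustration of why the exact-match optimization breaks with floats (the values are mine, not from the lecture):

```python
import math

a, b = 0.1, 0.2
# With floats, adding b and then subtracting it back need not return a ...
assert (a + b) - b != a
# ... so the traceback must compare up to a tolerance, or check all predecessors.
assert math.isclose((a + b) - b, a)

# With exact integer arithmetic the identity holds, and the traceback
# can stop at the first predecessor that matches exactly.
ai, bi = 1, 2
assert (ai + bi) - bi == ai
```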

@gasstationwithoutpumps, got it; that makes perfect sense now. Great example to keep in mind. By a fun coincidence, one of my tasks for the flight I’m leaving for in an hour was to write up my notes on Smith-Waterman; what I didn’t know is that this step has a name. Now I can incorporate not only the name of the traceback but also this comment about floats. Thanks!

I’ve exceeded the comment depth, and am getting way off topic, but the algorithm where the floating-point non-associativity came up was not Smith-Waterman (which we’ll get to on Monday), but alignment with arbitrary gap cost. Because I want students to learn how to construct the recursions and dynamic programming for new problems, we go through many alignment algorithms: gapless, arbitrary gap cost, linear gap cost, and affine gap cost. We do global and local alignment for each. At the end, I have them implement both global and local alignment with affine gap costs (in Python). The most common bugs are in the initial conditions and in failing to do the same computation on traceback as in the forward computation (generally forgetting to include the gap costs on the traceback).

I have to think about this but the argument that types are only important for experienced programmers makes me uncomfortable. It has to do with clear, logical thinking and the mistakes that can come about when one fails to use such thinking. Lack of precise data types has been the source of many of my programming problems over the years and failure to instill their importance makes me fear fuzzy thinking and bad logic.

This all stems from my early education. I can still recall taking physics and learning over and over how important knowing your units is for correct calculations and, indeed, correct thinking when it comes to solving physics problems. If you aren’t sure what your units are, you can so easily go wrong, because the numbers are only a means of symbolizing physical quantities. In my later career I found this to be true well beyond the realm of physics. For example, throughout my career I’ve observed people misusing percentages because they lost track of the units, leading me to come up with the “Frailey percent question”: when you see someone claim that a certain percentage applies, ask yourself what it is a percentage of. For example, recall a recent presidential candidate’s claim that “47 percent pay no income taxes”. Is this 47 percent of the people, 47 percent of the families, 47 percent of the wage earners, or 47 percent of what? The answer can make a huge difference in the implications. For example, if it were 47 percent of the people, that would be perfectly reasonable, considering that many of those 47 percent are children.

Data types are not the same as units but they are pretty similar. Imprecise data types may be helpful at certain primordial stages in teaching people about programming, but I’m inclined to think that precise data typing should be burned into programmers’ minds at an early stage, just as the importance of units was burned into mine. And while non-programmers may not need to understand data types the way programmers do, when they fail to understand the units being bandied about, they can reach grossly inaccurate conclusions.

I don’t think that anyone here is arguing that knowledge of types is only for experienced programmers, but rather that the very restrictive declaration of the type for each variable is a lot of overhead for beginning programmers, with relatively little payoff.

Part of the problem has been already identified—types in most languages don’t provide the sort of useful information that units do in physics.

From what I’ve seen, students struggle more with assigning consistent meaning to their data than with anything the structural checks of a compiler would catch. Units checking in physics provides a more semantic check that uncovers more misunderstandings.

It sounds like Pyret is a step towards trying to provide a more useful type declaration—I’ve not looked at it, so I’ve no idea how reasonable the system is, nor whether it would help beginning programmers.

In small programs, static type checking (of the type in C++ or Java) usually only catches errors that would not have been errors except for the requirement to declare the types, so does not help in debugging. In larger programs, the consistency checks provided by static type checking at interfaces are more valuable.

The assumption in the first para here is that the types have to be annotated and can’t be inferred, which is false. In fact, there are inference mechanisms that can, in a certain provable sense, “beat” any human. But this is unfamiliar to people who’ve only programmed in C, Java, and the like.

Inference matters because it dramatically changes both sides of the cost-benefit equation.

I’m not sure that type inference would help beginning programmers at all. They would do just as well with entirely run-time type checking (as in Python). Type inference is mainly a way to reduce run time costs (and to provide checking on code branches that are rarely executed, so may be missed in run-time tests).

I’m not assuming that types have to be annotated—I do most of my prototype programming in Python, which has strong types at runtime, but no static types. This results in a fairly large interpreter overhead (which type inference could reduce), but type inference is not needed for running programs with types.
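As a small illustration of what strong runtime typing means in practice (the function name here is invented for the example): Python enforces types, it just does so when the offending operation executes rather than ahead of time.

```python
# Python is strongly typed at runtime: no static declarations, but also
# no silent coercion between incompatible types.

def add(a, b):
    return a + b          # fine for two ints or two strings

print(add(2, 3))          # 5
print(add("2", "3"))      # 23  (string concatenation)

try:
    add("2", 3)           # mixing types fails fast, at runtime
except TypeError as err:
    print("TypeError:", err)
```

The catch, of course, is that the `add("2", 3)` error only surfaces if that branch actually runs, which is exactly the rarely-executed-branch problem mentioned above.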

I like the idea of optional type definition to provide checking where programmers see the advantage, and I like the idea of types with checkable properties, but I don’t see type inference as particularly helpful to novices.

Incidentally, I see one HUGE problem coming for Pyret with novice programmers: “x-y” and “x – y” are totally different, since Pyret apparently allows hyphenated ids, but still uses “-” for subtraction.

As I said, my goal is to eventually migrate students to understanding languages with static types. Right now, curricula have to change languages entirely, and often with that the entire ecosystem and mindset. But you’re really talking about a policy decision — when should this shift happen — and I think that will depend entirely on individual instructors and curricula.

The syntax issue you point out was an intentional experiment. I prefer to study this through user studies. Of course, I conjecture the opposite as you (which is why we put this in). Naturally, if the evidence points out that you were right and I was wrong, it would be straightforward to change this. But we already have some evidence that this isn’t confusing.

I very much agree with your points. My experience with teaching novices over many years has been that type mismatch errors almost always trace back to real mistakes in the logic, usually due to fuzzy thinking about the problem. Again, the common mistake I see with novices is confusing a collection of things with one thing. When I help students who have made that mistake, I can see that it isn’t just a mistake in syntax – it is always a mistake in understanding the problem. Typically, their variable names will show the same confusion, and when I ask them to describe the problem they are solving, the confusion shows up there too. Keep in mind, novice programmers also have a strong tendency to type their entire program in before running it. So getting a compiler error message, even a cryptic one, early in the process helps them stay on track.

When you are discussing type inference, would you include the type inference system in PHP? I had to teach PHP to a group of shaky programmers a couple of years ago, and discovered that they found type inference to be MUCH harder to understand than static typing. They hated it and made numerous mistakes.

Hi Bonnie — PHP does not have type inference in any conventional sense. I think PHP is a train-wreck in almost every possible sense, so your experience is unsurprising.

When I say type inference I mean along the lines of what ML languages (Standard ML, OCaml) and Haskell do. However, their algorithms also have a bad problem in them: the error messages on type-*incorrect* programs can be very bad.

Many modern languages (like Scala) therefore use something called “local” type inference that (roughly) works within a method body but not across them (so you still need some annotations). This then gives far better errors (really not much worse than regular type-checking errors). [NB: Scala has lots of other type system complexity that makes errors worse than they can be.]

In Pyret, we’re working on ways to do type inference that don’t have these problems — and we’re doing it by redefining the problem (because these problems are inherent to the way the problem is defined). We hope to have user studies on these topics once the algorithm has been implemented and field-tested on more sophisticated users. There are two things to study: how effective is it at inferring types, and how bad are the errors when things go wrong. The first of these we may be able to study by the end of this semester; the latter we will do as controlled studies later (hopefully as early as the spring).

The “novice” programmers I teach are not really novices—they have had anywhere from 2 programming courses in Java to a full BS in computer science (sometimes with years of programming experience on top of that). I don’t (at this level) often see confusion about single objects vs. containers (either because containers like lists and dicts are pretty simple in Python or because students with that sort of confusion have been weeded out in prior courses).

Many of them (even those with a full BS in CS) have trouble with the names of their variables and semantic confusion about what the variables mean. They use useless names like “flag” that tell the type of the variable rather than the meaning of the variable.

Another big problem is students not having any clue how to subdivide a problem into smaller parts with a clean interface—they keep I/O formats in internal data structures (storing a list of numbers as a long string of space-separated decimal representations, for example), or mix command-line argument parsing, input, and processing in a single unreusable routine, and so forth. I think that so much time is spent in the beginning courses on type syntax that not enough attention is being paid to programming. This may be a problem just with the way the courses are taught here, where the beginning students do essentially all their programming in highly scaffolded ways, so that they haven’t ever really had an opportunity to decompose the problem themselves—they’ve just learned to code bits of someone else’s design.
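A tiny sketch of the decomposition being described (the function names are invented for the example): convert the external format once, at the boundary, so the internal code works on real data structures rather than on strings.

```python
# Separate the I/O format from the internal representation.

def parse_numbers(line):
    """Boundary code: convert the external text format exactly once."""
    return [float(tok) for tok in line.split()]

def mean(numbers):
    """Internal code: operates on a real list of numbers, not a string."""
    return sum(numbers) / len(numbers)

line = "1.5 2.5 3.5\n"          # external, space-separated representation
print(mean(parse_numbers(line)))  # 2.5
```

Keeping the parsing in one place means `mean` (and everything else downstream) is reusable and testable without any I/O at all.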

A heavy emphasis on i/o, plus languages in which strings automatically turn into other objects without explicit conversion, would very much lead to the phenomena in your last para; unfortunately, too many curricula do the former, and too many languages these days do the latter. You then are stuck with telling students, “Yes, you can do this, and yes, it works, but this isn’t a Good Idea”, which usually falls on deaf ears.

Wow, lots of discussion on this issue! My conclusion is mainly open-ended because I have no data on the matter. My personal belief is types all the way from the start, but I can see the logic behind starting with less overhead, and it seems there has been little useful study of this typing issue in education. (Despite it being an obvious question; perhaps it is too hard to design and conduct a powerful enough study?)

I have been doing bits of R recently, and as a knowledgeable end-user programmer, the lack of types makes me swear repeatedly. “What do I have, and what can I do with it?” is a fundamental question in programming, especially in a language and system you are not familiar with. Static types help answer that question, and dynamic languages all too often leave it unclear.

Let me suggest two metrics. Given a program, we count:
1. The number of explicit decisions required
2. The number of implicit decisions made

Note that I’m ignoring type inference because it blurs the lines, and I think (this is an unsubstantiated opinion) that in non-trivial cases one must have a fair working knowledge of their language’s type inference in order to actually use it. For purposes here I’d argue that inferred types are explicit decisions made by the programmer that are not obviously apparent in the source code.

Explicit decisions have an immediate cognitive cost, especially to a novice. For example, a programmer may want to create a variable for a number. In, say, Java, the programmer has to decide among byte, short, int, long, float, and double. So they have to pick between six types. They also have to remember the casing conventions (Java object types usually but not always begin with a capital and are camel case, primitives are all lower case, and then there’s String, which is an object but has its own syntax in the language). I imagine this could be very confusing to a novice. Never mind the fact that none of these really behave the way someone who learned decimal arithmetic would expect.

The more explicit decisions are required, the more a novice will struggle to create a small program that appears to work.

But explicit decisions can also be checked by a compiler or interpreter.

Explicit decisions can also be somewhat generalized to explicit distinctions, where “int versus long” is in the same class as “compile time versus runtime.”

Implicit decisions are ones made that are not captured in the source code. They have no immediate cognitive cost. However, the compiler/interpreter has no way of checking them, so any inconsistencies will only show up at runtime and quite likely only in certain cases. This can be quite frustrating, for novices and journeymen alike, because once program size exceeds a certain point they can no longer keep it all in their head. This is compounded by team size and by time.

Given some knowledge of types, it’s possible for the programmer to offload a significant amount of cognitive effort onto the compiler. This goes far beyond “what operations can I perform on this.” Recently, I had a relatively complex algorithm written in Scala that I just couldn’t get working. The processing involved a lot of “names” that represented distinct types of things but of course came in as strings, much like a compiler (actually, what I was writing could be described as a compiler, just not general purpose). There were a lot of maps keyed on strings and sets of strings. I replaced a number of the strings with distinct types that were just wrappers around strings, then proceeded to battle compiler errors for the better part of a day, maybe more.

When it compiled, it worked. I had offloaded a mental task that exceeded my mental abilities for holding simultaneous details in my head onto the compiler. I could conceive the architecture of the program, but I could not keep track of the details of enforcing it, so I encoded the architecture in the type system and let the compiler enforce it.
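A rough, fail-fast Python miniature of the wrapper-type idea described above (all names here are invented): wrapping distinct kinds of strings in distinct types turns a silent mix-up into an immediate error.

```python
# Distinct wrapper types for distinct kinds of "names" that would
# otherwise all be plain strings.

from dataclasses import dataclass

@dataclass(frozen=True)       # frozen => hashable, so usable as dict keys
class VarName:
    value: str

@dataclass(frozen=True)
class FuncName:
    value: str

symbol_table = {VarName("x"): 42}

def lookup_var(name):
    # Fail fast: reject anything that isn't a VarName, including a bare
    # string or a FuncName that happens to wrap the same text.
    if not isinstance(name, VarName):
        raise TypeError(f"expected a VarName, got {type(name).__name__}")
    return symbol_table[name]

print(lookup_var(VarName("x")))   # 42
# lookup_var(FuncName("x"))       # would raise TypeError immediately
# lookup_var("x")                 # bare strings are rejected too
```

In Scala the compiler does this enforcement statically, before the program ever runs; the runtime check here is the dynamic-language approximation of the same architectural constraint.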

……….

On one side we have the necessity of explicitly representing distinctions increasing the cognitive load on the programmer, because it can greatly increase the branching factor of writing a line of code. On the other side we have explicitly representing distinctions significantly decreasing the cognitive load on the programmer, because the compiler can effortlessly keep track of orders of magnitude more details than even an expert programmer can.

I think the key in computing education is to teach students to move from one to the other; from the sketch to the blueprints.

……..

P.S. The examples here emphasized the utility of static typing. The real utility is in non-leaky, fail-fast abstractions. I could describe examples where refactoring Python or Matlab code led to the exact same benefits as the type checking in Scala. The trick is always making it so flaws surface quickly without specially crafted tests to make them fail.

Erik, this is a very cogent and persuasive argument. This is why I am not a believer in introducing types at the very beginning. As long as primitives check their arguments, programs will fail-fast, and students get many of the same advantages without the “and now I have to learn two distinct languages, which are intertwined with each other into one” problem. It isn’t until you can start to encode rich architecture (as in your Scala example) in the type system that you start to see real benefits.

However, you are also making a strong assumption about inference, and one that I think is neither fully true of modern systems nor strictly necessary. The long line of research into type error message reporting takes the former tack (but the fact that it’s still a research problem says everything I need to hear about it [incidentally, the latest strong result on it having been done by my own post-doc, so I’m not ignorant of the area]). But in Pyret we’re trying to take a very different tack on the inference question, with the goal of avoiding these points of overhead. This blog is probably not the right place to get into a detailed technical discussion about how we are trying to do so.

FYI, I tried an experiment where I slowly turned typing on for a CS2 Java-based course, and it did not go well. We use (in Calico) the Java parser from DrJava which has the ability to turn on and off typing requirements. With it off, Java looks a bit more like Python (eg, no need to define variables with types). With it on, it is regular Java. The problem was that you can ignore types in some places, but can’t in others. The worst of all possible scenarios. In the future, I will use Calico Python or Calico Java with types on… the middle ground was too confusing.

Doug, the problem is that you’re looking at an ad hoc mechanism for removing types from a typed language, which is a process inherently likely to fail. You need to be able to _add_ types as you go along, and the _language_ needs to support it — without both these requirements, sooner or later you will run into incoherent corners, and they will make no sense to anyone because what you’re really seeing is the ad hoc interaction of two random tools. That’s why Pyret’s design is for you to start with no annotations and add them where you want to, incrementally.