Some Random Thoughts on Haskell

It seems like all my Python friends are playing around with Haskell lately, so it's been on my mind a lot. Since I'm a language guy, Haskell has always fascinated me. I wrote my first article on Haskell in 2006 (Everything Your Professor Failed to Tell You About Functional Programming), and I think it's a beautiful, interesting, awe-inspiring language. However, I've never really achieved full mastery of it. Since it's been on my mind, I thought I'd share some of my thoughts:

Haskell is well known for being very concise. The functionality / lines of code ratio is very good. However, if I compare Haskell code to Python code, one thing is clear to me: Haskell squeezes a lot more function calls into each line of code than Python does. We have this idea in programming that the number of lines of code a programmer can write in a given time period is relatively constant regardless of language. However, in my experience, I can write lines of code in Python, Ruby, and Java much more quickly than I can in Haskell, and part of the reason is that you can squeeze a heck of a lot more function calls into a single line of Haskell.
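To illustrate the density, here's a tiny made-up example (the function name is my own invention): five function applications chained onto one line, something that would typically take several statements in Python:

```haskell
import Data.Char (toUpper)
import Data.List (sort)

-- words, sort, two maps, and unwords: five calls on one line
shout :: String -> String
shout = unwords . map (map toUpper) . sort . words
```

In GHCi, `shout "banana apple"` gives `"APPLE BANANA"`. Each of those five calls would plausibly be its own line of Python.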

It's pretty common to compare languages based on how many lines of code it takes to implement a particular algorithm. I think it'd be really interesting to compare languages based on how many symbols (such as words, operators, punctuation, significant whitespace, etc.) it takes to implement a particular algorithm. It'd also be interesting to compare languages based on their symbol / line ratio. Haskell and APL have very high symbol / line ratios. Assembly has a pretty low symbol / line ratio.

Some languages are easier for newbies to understand than other languages. Python and Java are easy to understand even if you don't know them. APL and Forth are impossible to understand if you don't know them. I think there are a lot of things that factor into whether a language is easy for newbies to understand. For instance, how close is the language to English? COBOL tries to be like English, so it's easy for newbies to understand. How many unusual symbols are used? APL uses a lot, so it's hard for newbies to understand. Does it follow normal, mathematical, infix notation? Forth doesn't, so it's difficult for newbies to understand. It can be difficult for newbies to understand Haskell code due to the liberal sprinkling of things like $, ., >>=, etc.

There's also something to be said for languages that tend to use overly concise names. Consider the name creat() in C--it hurts my brain not to put an e at the end of it! Slightly more verbose names can be very helpful for newbies. If I read some Python code, and I see "threading", I have a decent idea what that is about. If I read some Haskell code, and I see "TVar", I have no clue what that is about. TVars are so incredibly interesting, but I certainly wish they had a more newbie-friendly name!
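For what it's worth, a TVar ("transactional variable") is a mutable cell used with Haskell's software transactional memory; reads and writes can only happen inside an atomic transaction. A minimal sketch, using the standard stm package (the transfer function and amounts are my own invention):

```haskell
import Control.Concurrent.STM

-- Move money between two TVar-backed accounts. Because this runs
-- inside STM, other threads can never observe a half-done transfer.
transfer :: TVar Int -> TVar Int -> Int -> STM ()
transfer from to amt = do
  modifyTVar' from (subtract amt)
  modifyTVar' to (+ amt)
```

Usage: create the cells with `newTVarIO`, then run `atomically (transfer a b 30)` from IO. The name "TVar" makes sense once you know the T stands for "transactional" -- but you have to know that first.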

Haskell has fascinated me for years. However, as soon as someone mentions Category Theory to me, my eyes start to glaze over, because when it comes to programming, I am a linguist, not a mathematician. I remember Larry Wall said that he approached Perl from a linguistics point of view, and that he really liked the fact that Perl was a little messy, contextual, redundant, etc. After all, so is English!

When I learn a new language, the questions that come to my mind are what's the syntax and what does the syntax do? Only after I truly appreciate the syntax and semantics can I appreciate the underlying model. For instance, I've always liked Python because of its syntax, but it took years for me to realize how cool Python was because of its "everything is dicts and functions, and you can hack stuff all you want" nature.

The last thing I want to cover is composability. Some languages and language features are more composable than others. For instance, with structs, objects, algebraic data types, etc., it's really easy to write something like a.b.c.d. This means an instance of A has a reference to an instance of B, which has a reference to an instance of C, etc. It's trivial to connect an object graph in this way. Functions are also very composable. In Haskell, a . b . c . d is function composition: applied to some x, it means a(b(c(d(x)))). However, monads aren't quite as trivially composable. It has taken a non-trivial amount of time for my buddy John Chee to try to explain liftM and monad transformers to me. It'll be interesting to see how all that stuff turns out.
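Here's a small sketch of the contrast (the function names are invented for illustration): plain functions compose directly with (.), but once a monad like Maybe is involved, you need liftM (or fmap) to get a pure function to apply inside it:

```haskell
import Control.Monad (liftM)

-- Plain composition: (f . g) x = f (g x)
addThenDouble :: Int -> Int
addThenDouble = (* 2) . (+ 1)

-- A partial function wrapped in Maybe
safeRecip :: Double -> Maybe Double
safeRecip 0 = Nothing
safeRecip x = Just (1 / x)

-- (* 2) :: Double -> Double doesn't compose with safeRecip directly;
-- liftM lifts it so it applies inside the Maybe
doubledRecip :: Double -> Maybe Double
doubledRecip = liftM (* 2) . safeRecip
```

So `doubledRecip 4` is `Just 0.5` and `doubledRecip 0` is `Nothing`. That extra lifting step is exactly the friction the plain a.b.c.d style doesn't have, and stacking several monads (the job of monad transformers) adds more of it.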

Of course, Haskell isn't the only language to have to worry about composability. I remember seeing a talk by one of the Scala compiler developers who said that the biggest source of bugs he had seen was from people mixing features of Scala together in ways that he had never considered before. Compiling each feature one at a time to work on the JVM is easy--getting all the features to play nicely with one another when they're used at the same time in unexpected ways is quite a bit harder.

So as I watch all of my friends learn Haskell, these are the things that have been on my mind. It's a fascinating language, so I always enjoy reading other people's perspectives on it!

By the way, thanks go to John Chee and Neuman Vong for patiently answering all my Haskell questions.

Comments

Interesting point of view. You forgot a crucial point: learning curves. Haskell's learning curve is by far the worst (even worse than C++'s), because the language exposes plenty of complex mathematical concepts (like Category, composition, Kleisli, Arrow, and so on).

The thing is, Haskell doesn’t tend to be user-friendly. You have to get into it to actually be able to understand it. You say it’s clearer to use a function that has “thread” in it than “TVar”, but when you know how to use Haskell, it’s pretty clear what a function does – plus the signature is far more legible than a regular C-like language prototype. The same goes for monadic functions, which end with an M, applicative ones, which end with an A, and so on. Once you know what a monad is, it’s better to use a function called >>= than a function called bind.

Finally, about composability, what you said is totally true! But you should have dug into Haskell a bit more to get to… lenses! Lenses give you instances for (.) so that you can use the dot notation exactly the same way you use it in C-like languages, but with extra features you don’t have in those languages – like functor mapping on the fly. In the end, you can use the dot notation in Haskell with “lensed” data :)
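For readers curious what this comment is getting at: the commenter means the lens library, but the core trick can be sketched by hand using only base (all names below are invented for this sketch; the real library is far richer):

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- A van Laarhoven lens: a function, so ordinary (.) composes lenses
type Lens s a = forall f. Functor f => (a -> f a) -> s -> f s

view :: Lens s a -> s -> a
view l = getConst . l Const

set :: Lens s a -> a -> s -> s
set l x = runIdentity . l (const (Identity x))

data Address = Address { _city :: String } deriving Show
data Person  = Person  { _addr :: Address } deriving Show

city :: Lens Address String
city f (Address c) = Address <$> f c

addr :: Lens Person Address
addr f (Person a) = Person <$> f a

-- Plain function composition chains the lenses, C-style a.b.c
personCity :: Lens Person String
personCity = addr . city
```

With this, `view personCity (Person (Address "Oslo"))` drills down just like `person.addr.city` would in a C-like language, and `set personCity` writes back through the same path.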

I'd just like to say that I disagree strongly that Haskell's learning curve is worse than C++'s. Haskell has a fraction of the complexity of C++ in several metrics. (The Haskell Report and GHC User Guide could fit inside the C++ Standard about 5 times over, the grammar is much simpler, and so on.)

The only reason it would be harder to learn Haskell than C++ is that perhaps C++ is closer to languages that you already know. If you've been programming in imperative OO languages, you can expect Haskell to be about as challenging as your first programming language (plus maybe a little extra cost for unlearning bad habits). All the stuff you know about algorithms will eventually be useful again though, once you relearn how to express basic things, so not all is lost.

If you were starting from a blank slate, I don't think Haskell is harder than average at all, and C++ would definitely be harder than average there.

The fact that certain things in the Haskell libraries are inspired by mathematical concepts also says nothing about how hard it is to learn to use them for programming. In fact, if you'd just like to get on with writing programs, studying the category theory and other mathematics involved, while it might be interesting, is for the most part a distraction.

The fact that the libraries use the standard mathematical names for abstractions means that when you perhaps want to study them in more depth later on, you'll know what to search for. But category theory is by no means any kind of prerequisite.

Also a point in favor of Haskell's learning curve over C++, is that you can be a novice and still write decent Haskell code. Whereas C++ has so many subtle traps and pitfalls that some C++ experts are saying that C++ is an expert-only language.

Yes, SLOC is only a so-so measure of a programming language's expressiveness. Number of tokens is a much more precise measure, though some rather illegible languages like Perl and APL would always win.

The notion of 'user friendly' systems depends somewhat on who you expect your new users to be. First, let's define 'user friendly' to be somewhat close to the common definition: Some system is user friendly if it is easy for a user not trained in the system to use it.

People who are not trained in a particular programming language nevertheless have some kind of relevant experience that they bring to the table. Languages like Python and Java that have verbose, descriptive identifiers take advantage of the training that many novices have in the English language to help them out.

Haskell, on the other hand, was put together to be user-friendly for an entirely different class of users: those who are familiar with basic mathematical notation, but not necessarily with programming. It could be argued that Haskell is far more user-friendly than Python to people who have training in mathematical notation but not in programming, because it retains both the basic notation of mathematics and the essential meaning of words like 'function' and 'variable', which are re-defined to mean something different in languages like Python.

Basic Haskell doesn't even rely on particularly fancy mathematical concepts. Sets, functions, variables, and their associated notation should be familiar to most high-school kids. If they've already learned another language and gone through the re-learning of expectations about what things in programming languages mean, Haskell will maybe not seem very friendly to them, but if they've just walked out of an algebra class and have no programming training at all, it will probably seem pretty intuitive!
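To make that concrete, here's a small invented example: Haskell's list comprehensions are nearly a transliteration of the set-builder notation taught in algebra class, and a function definition reads much like one on a blackboard:

```haskell
-- Set-builder notation, almost verbatim:
-- { x^2 | x in {1..10}, x even }
evenSquares :: [Int]
evenSquares = [ x ^ 2 | x <- [1 .. 10], even x ]

-- f(x) = x^2 + 1, spelled nearly the same way
f :: Int -> Int
f x = x ^ 2 + 1
```

`evenSquares` is `[4,16,36,64,100]` -- exactly what the algebra student would expect the set to contain.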

Of course, most people who comment on the user-friendliness of programming languages are not actually untrained programmers. They have most likely been programming for years and have had the basic model of imperative programming drilled into them until it's second nature. Of course Haskell will seem unfriendly to them at first.

As to exposing complex mathematical ideas; have you considered the mathematical ideas involved in the semantics of object-orientation? They are not as trivial as you might think, and people often produce poor code due to not sufficiently understanding what a sub-typing relationship in a class hierarchy ought to imply.

To be fair, '.' (or '∘') is the symbol for function composition used in mathematics. I don't know about other countries education systems, but that's been a very familiar notation to me for very many years, so I would expect people new to programming to be able to recognise what it means.

When a programming language seems hard, it's often because people are looking at it from one perspective when they should be looking at it from another. If you come to Haskell from Ruby, you may not like it. If you come to Haskell from C, it is likely that you may love it. Today, languages also have communities surrounding them, and these communities have values and preferences. In Ruby I see that many prefer conciseness over everything else. I even see people in-lining well-factored functions to make the code look more concise, as if that's cool or something. Show a dude like that Haskell, go ahead.

Thanks for this article - I think it follows some of my feelings on Haskell at a language level, although using it in practice there are much more significant issues and suggestions I might give. For example, if you're coming from a Python background (as I did), the module system will drive you up the wall - import * is considered very bad in Python, but in Haskell the equivalent is just what people do. You want to know where this type is defined, where this constructor is created or matched on? Etags (or ctags) aren't really going to help you. The number of nonstandard language extensions is a bit unusual, and build systems typically set up environments that are difficult to emulate if you want to use a module interactively - without building it all - in GHCi or the like.

On your specific points, I find my python code to be slightly more concise and significantly more readable than my Haskell. I want to think this is mostly because of the type/value distinction, but my Idris code doesn't seem to be that much less verbose. I think it's really a consequence of language-enforced purity.

I am a mathematician, so I really appreciate the value of abstract algebra and the categorical machinery that manipulates it. Of course, it can be introduced as you come up with a need for it. So I hope that you (and everyone!) come to appreciate the composability that some of that machinery brings!