
Saturday, January 06, 2007

What makes a programming language 'more productive'

First: What does 'productive' mean? I would loosely define it in the context of programming as "Given two equally skilled programmers or teams of programmers, the less time they need to create and maintain a program for a given problem, the more productive the implementation language is".

This definition may have some holes in it, but I first want to make clear that productivity for me is not an abstract thing like "code beauty" or "conciseness", it's simply outcome oriented. If you're able to create a program in a consistently shorter time by using a certain language or framework compared to another, I think it's a good idea to consider it 'more productive'.

What's the secret of productivity? In fact it's utterly simple: Code reuse.

If you have to solve a problem you have to think about it first. This takes a lot of time. You also have to think about it while you're implementing a solution and later while debugging or maintaining it. Thinking about the problem generally takes up most of the time in software development. There's only one way a programming language can help with that: By providing an existing solution. Sure, there are simple cases where you only have to type a simple solution in without thinking about it much. But even then: If you don't have to write the code yourself and can reuse existing code instead, that's simply the fastest way to a working result.

It doesn't even matter how powerful a language is: If you can build your solution on existing code with only small and easy modifications, you will be faster than with the most powerful language. Also, existing code is already debugged and tested. And if it's from a 3rd party developer you don't even have to maintain it.

But code reuse isn't always identical to 'using a framework' or 'importing a lib'. Sometimes it's built right into the language: When I switched from C++ to Java I experienced a huge productivity gain. A big contributor to this was garbage collection. Instead of thinking about memory allocation, when to release an object, how to write a fast custom allocator, whether to allocate data on the stack or on the heap etc., I could now allocate objects without thinking much about it. I could simply consider the problem as 'solved' by the language.

Something like this happens on many more occasions: I can create classes in C, but by having the language do it like in C++ or Java, I can consider the problem 'solved'. In assembler I have to create function calls and local variables myself - in a higher level language this problem is again solved. All this is code reuse: Somebody has looked at common programming situations ('patterns') and created a universal solution for them. Sometimes this solution is directly incorporated into the language, more often it's put into a library. But nearly always both ways are possible, so it doesn't make sense to call code reuse which is part of a language an 'abstraction' and code reuse from a library simply 'code reuse'.

So the reason why certain languages are more 'productive' than others is code reuse. C is more productive than assembler because we don't have to think about allocating variables or building stack-frames. By having the problem solved we don't need to think about it, we don't need to implement it again and again and we don't need to search for bugs which result from making mistakes by doing it ourselves.

Now we can try to identify the reasons why certain languages are more productive than others, and why sometimes even more powerful looking languages don't deliver.

Let's first look at Java. I mentioned that my productivity increased a lot after switching from C++ to Java. Besides the already mentioned garbage collection, the huge number of available libraries is another part of the reason. An interesting question is: Why are there so many libs for Java? There are other languages which had huge commercial support but never had as much reusable code as Java. Why?

If I want to reuse code, the code has to fit into my existing code. But more important: It has to fit into the code of other vendors. If I create a program which uses 5 different libs and some 1000 lines of my own code, writing my own code in a way that fits the libs is possible - but this doesn't solve the problem of fitting those 5 libs together if all of them were written independently. To make code 'fit' it has to use the same conventions, similar interfaces and standards. This works much better if the language enforces it.

One example is garbage collection: When I used C++ there were multiple methods of doing memory allocation. Many used reference counting, but since there was no standard implementation, each lib used its own. But how can you combine two libs which both have their own ref-counting scheme? In Java this was a non-problem because the language has solved the gc problem.

But there are other examples. Like having standard libs for most of the basic stuff. If two libs use their own array implementations you can't simply use an array from one lib in the other. But if the standard libs provide something it's unlikely that every vendor creates its own solution.
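To make this a bit more concrete, here is a minimal sketch in Java. The two 'vendor' classes are purely hypothetical and only stand in for independently written libs; the point is that both speak `java.util.List` from the standard library, so the result of one can be handed to the other without any conversion code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical "vendor A" library: returns its result as a standard List.
class VendorAParser {
    static List<String> splitWords(String text) {
        List<String> words = new ArrayList<>();
        for (String w : text.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                words.add(w);
            }
        }
        return words;
    }
}

// Hypothetical "vendor B" library: accepts any standard List of strings.
class VendorBStats {
    static int longestLength(List<String> items) {
        int longest = 0;
        for (String item : items) {
            longest = Math.max(longest, item.length());
        }
        return longest;
    }
}

class StandardListDemo {
    public static void main(String[] args) {
        // No adapter needed: both libs agree on the standard collection type.
        List<String> words = VendorAParser.splitWords("code reuse needs shared types");
        System.out.println(VendorBStats.longestLength(words));
    }
}
```

If each vendor had shipped its own array class instead, the application would need a copy loop between the two calls.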

But it continues on a higher level: If two libraries use different methodologies to solve a similar problem you will get an impedance mismatch if you try to use them together. Maybe one is written in a more procedural and the other in a more object oriented way: Now you need lots of boilerplate code to make such code work together which in turn makes reuse harder.
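A small sketch of what such boilerplate can look like, again in Java and with entirely made-up libraries: one hypothetical lib works in a procedural style on parallel arrays, the other in an object oriented style on point objects. The `RouteAdapter` class is the glue the application author has to write and maintain just to make them work together.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical procedural-style library: operates on parallel coordinate arrays.
class ProceduralGeometry {
    static double pathLength(double[] xs, double[] ys) {
        double total = 0.0;
        for (int i = 1; i < xs.length; i++) {
            total += Math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1]);
        }
        return total;
    }
}

// Hypothetical object-oriented library: models a route as a list of point objects.
class RoutePoint {
    final double x;
    final double y;

    RoutePoint(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

class Route {
    private final List<RoutePoint> points = new ArrayList<>();

    Route add(double x, double y) {
        points.add(new RoutePoint(x, y));
        return this;
    }

    List<RoutePoint> points() {
        return points;
    }
}

// Glue code: unpacks the objects into the arrays the other library expects.
class RouteAdapter {
    static double length(Route route) {
        List<RoutePoint> pts = route.points();
        double[] xs = new double[pts.size()];
        double[] ys = new double[pts.size()];
        for (int i = 0; i < pts.size(); i++) {
            xs[i] = pts.get(i).x;
            ys[i] = pts.get(i).y;
        }
        return ProceduralGeometry.pathLength(xs, ys);
    }
}
```

The adapter is only a dozen lines here, but it has to be repeated for every pair of libs which disagree on how data is represented.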

Java tackled this problem by removing abilities from the language to enforce a certain way of programming. If there is only one sensible way to do something (because other ways are artificially made much more difficult) the programmer may curse the language for this in the moment, but at a later time he may be happy about it because it allowed him to reuse the code in a different application. And that is possible because the language enforced a certain way of solving things.

And if independent programmers create libraries without even knowing at that moment who will use them later, it can really help if they are forced to use a certain methodology, even if it may hurt at the moment.

So while Java really has its downsides, in regard to code reuse it made a lot of progress compared to many earlier languages. So if we want to create better languages we always have to consider this lesson we learned from Java: Having a powerful language alone isn't enough to gain productivity if the language neglects code reuse. Even a less powerful language can pull ahead of a more powerful one if it encourages and eases the creation of reusable code.

But time has moved on and many ask whether it's possible to reach the goal of better code reuse AND have a more powerful and more concise language than Java. I'm sure it is, but only as long as we keep the idea of code reuse in mind. Creating a language with only 'clarity', 'conciseness' or 'power' in mind isn't enough, we always have to think about how it's possible to enforce the creation of reusable code in this language. Yes, we need to enforce it. Not because programmers are stupid or deliberately write un-reusable code, but because only by enforcing it can we be sure that two teams of developers who don't know about each other can create code which will fit together later if reused by a third team of developers. We simply need rules to make their work fit together.

But this leads immediately to a conclusion: Multi-paradigm-languages won't work. While it looks like a good idea to give programmers more freedom to express their ideas, this in turn leads to code which is too different to fit together.

(I suspect that this is the prime reason why Lisp never made a real breakthrough - but also why there are some success stories with Lisp. If you don't need to reuse code (for example if you work in a new field where simply no code exists) Lisp can give you a big productivity gain and create a nice success story. But if a language like Java can play its code-reuse card then the gains are irrelevant because the Java programmer simply puts some libs together while the Lisp developer is still thinking about which libs there are on the market and whether it's possible to integrate them into one design or do a new implementation instead).

But multi-paradigm isn't the only 'no-no'. Making languages too customizable is another one: If a language can be easily customized by macros, reflection, template meta-programming etc., this can also reduce code reuse: If we want to integrate two libraries which both use and rely on lots of those customizations, it's quite probable that those customizations won't fit together. It can work but often it won't.

This is not only true for 'real' meta-programming like macros or reflection, it can also happen with 'too flexible abstractions'. Let's take a short look at Haskell's monads: They are very powerful - but this leads to problems. If you have code which uses an A-monad and want to fit it together with code which runs in a B-monad, you get problems. And if some 'monad-free' code needs to be run in some monad later you may have to rewrite it completely, even if only a very small portion needs access to the monad. This can be quite annoying if you have to rewrite your own code - but if you have to reuse 3rd party code it can even make reuse impossible.

The problem here is not the monad concept itself, it's the choice of whether to use it or not. The choice creates the possibility to go different ways - and going the wrong way can lead to lots of rewrites or reimplementations you have to do instead of simply reusing existing code.

So the secret of successful code reuse is "removing choice". But that's something which seems unswallowable for many programmers, especially those who consider themselves 'hackers'.

If you are one of those, let me ask you a question: Would you like a game like chess more if there were no fixed rules and you could do everything? I doubt it, the fixed rules are just the reason why chess is interesting: You have to solve problems in the context of a fixed and rather limited set of rules. If you could simply win by hitting your opponent over the head with a club, chess would lose lots of its appeal, wouldn't it? So just look at the 'choice problem' in a different way: If there is a limited set of ways to solve a problem, couldn't this even make it more interesting to solve?

And to the language designers: Isn't it an interesting problem to create features which are expressive AND lead the programmer in a certain direction? Simply putting everything into a language is not difficult (think of Homer in the Simpsons episode where he designed a car). But it's also simple just to remove things: Creating a language based on a single easy concept is simple too. And a dynamically typed language is much simpler to design than a language with a good static type system. If you really like the challenge then design something new, something different, something difficult.

A language which allows for good code reuse doesn't have to be simple, it has to force the user to solve problems in a certain way without limiting him too much. This sounds like a contradiction and yes it is, but that's the difference between theory and practice: We always have to make compromises or we create things which are good in theory but unusable in practice.

12 comments:

Well, you are missing one important fact: each program is written because it's different. Do you see what I am saying? If you already have the "main" parts for something, then you are not really programming anything, you are just putting things together. Programming is about making those parts. And no matter how many parts you make, we will still have problems that need more parts, a different kind of parts.

I'm not talking about the 'perfect programming language'. I'm talking about ways to improve productivity. Like finishing a project in 5 man years instead of 20. Or maybe even in 2. Or 1.

If a project is equal to a previous one, code reuse is easy: It's called 'buying software from stock'. Most people have used this way of code reuse.

But if the problem is different it's also more difficult to reuse code. Is it even possible? Yes, for sure. Just think about a SQL based database server: Instead of having to write code to read and write data from/to disk, creating and maintaining indices and writing complex code to query data, you simply reuse the code of the RDBMS and can solve your problem in a much shorter way. Instead of all that code, all you need now are some simple SQL queries. Or don't you believe that most of today's webapps would take much longer to develop if the developers had to write all this stuff themselves?

Sure, the webapp itself is different, but the database access is similar and so we can reuse code. But that's also true for parts of the webapp itself: Many use a 3rd party ajax lib instead of writing a new one themselves, some use PHP, others Java, others Python, Ruby etc. to make their lives easier instead of coding everything directly in assembler. And they don't only use the language, they also use lots of libs. From simple string handling to big frameworks: Code reuse everywhere.

If you were right all this would be useless. But it isn't. And we need to find ways to make it even easier if we want to be able to build bigger and more solid software systems.

Yes, it may be possible to overcome the problem by writing all code in a way that it uses monadic return values ('use do everywhere'). But this would as a result make Haskell a different language. And it's also not done in the Haskell standard libs. Please take a look at the reddit discussion of my 'experiences with Haskell' article here

@Weapon Liu:

Maintenance costs can be most efficiently lowered by code reuse. If you can't reuse your code and have to write multiple, slightly different implementations, all of them have to be maintained separately which would in turn massively increase the maintenance costs.

And if you use 3rd party code you outsource the maintenance which can also lower your costs. Because the costs of maintaining this 3rd party code are now shared between all the users, you only have to pay a part of them.

So reusing code is the best way to lower maintenance costs.

@Anonymous:

The chess metaphor was only directed at those 'hackers' who seem to see it as a personal insult to be forced by a language to use a certain way of solving a problem, because this would limit the fun. If your only goal is productivity this doesn't address you.

You've hit the nail on the head with regards to code reuse and limiting options. Why can't we express ourselves in all the ways we want? Because we are programmers, not expressionist painters. I remember how I've been frustrated by this with the theorem prover HOL, where the moment you stepped out of the standard proofs, people would begin by defining the fixities of their own operators, and in effect devising their own little language which looked nothing like regular HOL. It's almost like defining { as BEGIN and } as END in C (as I think I have seen an old C text suggest as a use for the preprocessor!)

Code reuse as you describe it isn't everything, though. There is also code adaption, which is the form of code reuse where you have to reuse your own code by modifying it slightly to fit it to new or more precise requirements.

Java has shown itself to work great for team programming, because it's a language that comes with a standard way of doing things, both in the form of language constraints and in a culture of OO design, design patterns + specialised frameworks for things that are outside the language's (most obvious) strong points.

Haskell could perhaps, perhaps, become as good, or even better, because it isolates potential reuse problem areas so aggressively. But I'm not sure how well it works for team programming and code adaption. They say that GHC's innards aren't easy to understand even for experienced Haskell programmers.

These problems will probably improve when some "best practices" and design patterns are soundly established, and when Haskellers learn to be careful with the more powerful features, just as C programmers learned to be careful with the powerful preprocessor... BTW, for the monad problem you cited, I believe there is a common pattern which works around it, namely to write everything as purely as possible, and only lift it into the monad(s) when needed.

I can't be sure whether that works, because I haven't written all that much Haskell yet, but I think it was in haskell-cafe that a similar case came up.

After reading (and participating in) such discussions for years, I am beginning to question some of the fundamental assumptions.

1) The concept of "equally skilled" is often used in the sense of "having an equal level of skill" rather than "possessing exactly the same skillset". I don't think it's possible to put programming competence in a single linear order.

2) In the "real world" languages don't exist in a vacuum, but rather exist as part of a community or culture, comprising people with shared concepts and experiences. So it is with programming languages. Think of the differences in approach which would be taken by different highly-skilled programmers who worked in LISP, Java, Haskell, and SQL.

I certainly believe that one can find specific cases of two languages where one facilitates better programming for a particular class of problems than the other. But even then, one can probably find another class of problems/tasks for which the second has advantages. In addition, the environment/context of the programming task is relevant. One programmer, working alone for pure exploratory purposes, might find language X more suitable than the same programmer working in a team setting to prepare production code which has to run stably 24x7 (in which case the same programmer might prefer language Y).

However, such discussions often omit even the most general boundaries over what types of problems and environments are being considered. This leaves me with the sense that the discussion is about whether a plumber with a monkey wrench can build a house more quickly than an electrician with a voltmeter.

I agree that code re-use is a very good measure of productivity, but I strongly disagree with your take on Haskell. I've read the discussion on reddit you point to - and it seems that you're still thinking too much in the imperative style. It does take much longer to become proficient enough in Haskell to write good code than it does in most other languages I've tried, but once you do then you (or at least I :) find that the level of reuse outstrips most other languages.

I don't think that we should continue to distinguish between programming languages and programming environments. The usability and productivity of a system depends on both. And the days of command line compilers and primitive editors should be history.

@Harald Korneliussen:

I don't see a reason to distinguish between code reuse and code adaptation. In fact code adaptation is nothing other than 'imperfect' code reuse. If only adaptation instead of reuse is possible, this is a sign of a weakness in the language.

I'm following the development in the Haskell community and think that there is lots of room for improvement. I don't know for sure which pattern you mean, but I suspect I know what you're talking about. But even this doesn't help with the transformation of 'pure' code into code which is 'monad compatible'. It can only help to make code which is generically adaptable to a wide range of monads. That's the reason why I thought about having the compiler do 'do everywhere', which could maybe solve the problem. But until now I've only heard a lot of objections against it - and maybe those are right, but I'm still a bit skeptical about them.

@joel.neely:

My intent with the article was to look at the reasons for productivity. While there are lots of discussions which are about code beauty, conciseness and powerful abstractions, I think that code reuse is underrepresented in those discussions. I'm sure that there are a lot of people who disagree, but from my experience in the end it always boils down to how much code you have to write yourself and how much you can reuse.

But code reuse is also about language specific features which reduce boilerplate code, so some languages can be more productive even without big libraries - not because of some esoteric 'code beauty' but simply because the compiler can create more code by itself than the compiler of the other language.

About the 'equally skilled': Sure, it is a bit problematic, but I wanted to simply express that we shouldn't compare a newbie programmer in language A with an expert in language B, because we want to compare the languages, not the abilities of certain programmers.

@Kris Shannon:

I am no 'functional programming newbie'. I'm very accustomed to writing code in functional style, but it's not always easy. Not because I am not skilled enough, but simply because there are problems with inherent imperativeness. In Haskell those problems can be solved in three ways:

- creating a new functional algorithm. This often works, but it costs much time and often leads to slower code. If you look at the Haskell examples in the shootout, you can often see that those examples are less readable and still perform worse than their counterparts in imperative languages.

- using the state chaining approach: This is a quite functional approach which is really natural, but it requires lots of boilerplate code: Every function gets an additional parameter and has to return it and chain it through calls. This makes programs less clear and it also has the problem that you have to rewrite all your code if you want to add this style of programming later. The 'Clean' language uses this approach for mutation and if you look at the code you see that it reduces readability and creates lots of redundancy.

- use monads. I've written a lot about it and I still think it holds: Having to rewrite huge amounts of code the moment you see that some function deep in the call hierarchy requires state is no real solution. And the rewriting isn't even trivial and can take some time. This totally eliminates the advantage pure Haskell code has for me, because you never know if you can really use it or if you have to rewrite it later. Also normal state monads can cost some performance and often increase memory consumption. This can be solved with MVars or IORefs but I simply don't like this because it's totally against the concept of Haskell (like using casts in Java - which also was a big pet peeve of mine but was fortunately solved in Java 5 to a large extent).

Do you have experience with Haskell in a bigger project? How do you cope with those problems?
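To illustrate the state chaining point from the list above, here is a rough sketch. It is written in Java rather than Haskell, so it is only an analogy, and all names in it are made up. The first version has no state anywhere in the call chain. In the second version the innermost function needs a counter, and because nothing may be mutated, every function on the way up gets an additional parameter and has to return the new state together with its result.

```java
import java.util.Arrays;
import java.util.List;

// Version 1: no state anywhere in the call chain.
class PurePricing {
    static int price(String item) {
        return item.length() * 10;
    }

    static int total(List<String> items) {
        int sum = 0;
        for (String item : items) {
            sum += price(item);
        }
        return sum;
    }
}

// Immutable pair of "result" and "state after the call".
final class Step {
    final int value;
    final int lookups;

    Step(int value, int lookups) {
        this.value = value;
        this.lookups = lookups;
    }
}

// Version 2: price() now counts its calls, so the counter is chained
// through every signature instead of living in a mutable field.
class ChainedPricing {
    static Step price(String item, int lookups) {
        return new Step(item.length() * 10, lookups + 1);
    }

    static Step total(List<String> items, int lookups) {
        int sum = 0;
        for (String item : items) {
            Step step = price(item, lookups);
            sum += step.value;
            lookups = step.lookups; // hand the updated state to the next call
        }
        return new Step(sum, lookups);
    }
}

class StateChainingDemo {
    public static void main(String[] args) {
        List<String> items = Arrays.asList("ab", "cde");
        System.out.println(PurePricing.total(items));
        Step result = ChainedPricing.total(items, 0);
        System.out.println(result.value + " after " + result.lookups + " lookups");
    }
}
```

Note that `total` had to change its signature and its body although only `price` needed the counter. That is the kind of rewrite cascade meant above, here in miniature.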