Large codebases are more difficult to maintain when they are written in dynamic languages. At least that's what Yevgeniy Brikman, the lead developer bringing the Play Framework to LinkedIn, says in a video presentation recorded at JaxConf 2013 (minute 44).

Have you cross checked with other sources?
– mouviciel Dec 17 '13 at 10:16

I'm the author of the Play Framework talk mentioned in the question. I was going to write a reply, but Eric Lippert's answer below says it better than I could have, so I upvoted it instead and recommend everyone read it.
– Yevgeniy Brikman Feb 9 '14 at 21:00

@Zubair Static does not mean boilerplate code. Have you checked out Scala? You experimented with Clojure; that's why your view is biased.
– Jus12 Nov 26 '14 at 11:43

Does this comparison account for the fact that a large codebase tends to become larger when written in a statically typed language? In other words, comparing a 1 million line codebase in Java vs Ruby is a biased comparison, since the Ruby codebase probably does a lot more. The correct comparison would perhaps be a 1 million line Ruby codebase vs a 5 million line Java codebase. Is the Java codebase still more maintainable? I suppose not.
– Vaddadi Kartick Apr 23 '16 at 11:12

6 Answers

I have been on the design committees for JavaScript (a very dynamic language), C# (a mostly static language) and Visual Basic (which is both static and dynamic), so I have a number of thoughts on this subject; too many to easily fit into an answer here.

Let me begin by saying that it is hard to maintain a large codebase, period. Big code is hard to write no matter what tools you have at your disposal. Your question does not imply that maintaining a large codebase in a statically-typed language is "easy"; rather the question presupposes merely that it is an even harder problem to maintain a large codebase in a dynamic language than in a static language. That said, there are reasons why the effort expended in maintaining a large codebase in a dynamic language is somewhat larger than the effort expended for statically typed languages. I'll explore a few of those in this post.

But we are getting ahead of ourselves. We should clearly define what we mean by a "dynamic" language; by "dynamic" language I mean the opposite of a "static" language.

A "statically-typed" language is a language designed to facilitate automatic correctness checking by a tool that has access to only the source code, not the running state of the program. The facts that are deduced by the tool are called "types". The language designers produce a set of rules about what makes a program "type safe", and the tool seeks to prove that the program follows those rules; if it does not then it produces a type error.

A "dynamically-typed" language by contrast is one not designed to facilitate this kind of checking. The meaning of the data stored in any particular location can only be easily determined by inspection while the program is running.
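To make the distinction concrete, here is a minimal Python sketch. Python is dynamically typed, but it accepts optional annotations that an external checker can inspect without running the program (a tool such as mypy is assumed here; it is not part of the language itself). The function name `total_length` is hypothetical.

```python
from typing import List


def total_length(names: List[str]) -> int:
    """Sum the lengths of the given names."""
    return sum(len(name) for name in names)


# Well-typed call: a static checker can verify this from the source alone.
print(total_length(["Ada", "Grace"]))

# Ill-typed call: a checker would flag the int element before the program runs.
# Without a checker, the mistake only surfaces when this line executes.
try:
    total_length(["Ada", 42])
except TypeError as error:
    print("runtime failure:", error)
```

The annotations play the role of the "facts" described above: they are visible in the source text, so a tool can reason about them without any running state.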

(We could also make a distinction between dynamically scoped and lexically scoped languages, but let's not go there for the purposes of this discussion. A dynamically typed language need not be dynamically scoped and a statically typed language need not be lexically scoped, but there is often a correlation between the two.)

So now that we have our terms straight let's talk about large codebases. Large codebases tend to have some common characteristics:

They are too large for any one person to understand every detail.

They are often worked on by large teams whose personnel changes over time.

They are often worked on for a long time, with multiple versions.

All these characteristics present impediments to understanding the code, and therefore present impediments to correctly changing the code. In short: time is money; making correct changes to a large codebase is expensive due to the nature of these impediments to understanding.

Since budgets are finite and we want to do as much as we can with the resources we have, the maintainers of large codebases seek to lower the cost of making correct changes by mitigating these impediments. Some of the ways that large teams mitigate these impediments are:

Modularization: Code is factored into "modules" of some sort where each module has a clear responsibility. The action of the code can be documented and understood without a user having to understand its implementation details.

Encapsulation: Modules make a distinction between their "public" surface area and their "private" implementation details so that the latter can be improved without affecting the correctness of the program as a whole.

Re-use: When a problem is solved correctly once, it is solved for all time; the solution can be re-used in the creation of new solutions. Techniques such as making a library of utility functions, or making functionality in a base class that can be extended by a derived class, or architectures that encourage composition, are all techniques for code re-use. Again, the point is to lower costs.

Annotation: Code is annotated to describe the valid values that might go into a variable, for instance.

Automatic detection of errors: A team working on a large program is wise to build a device which determines early when a programming error has been made and tells you about it so that it can be fixed quickly, before the error is compounded with more errors. Techniques such as writing a test suite, or running a static analyzer fall into this category.

A statically typed language is an example of the latter; you get in the compiler itself a device which looks for type errors and informs you of them before you check the broken code change into the repository. A manifestly typed language requires that storage locations be annotated with facts about what can go into them.

So for that reason alone, dynamically typed languages make it harder to maintain a large codebase, because the work that is done by the compiler "for free" is now work that you must do in the form of writing test suites. If you want to annotate the meaning of your variables, you must come up with a system for doing so, and if a new team member accidentally violates it, that must be caught in code review, not by the compiler.
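As a rough illustration of that extra work, the Python sketch below writes by hand two checks that a compiler for a statically typed language would typically perform on its own. The function `apply_discount` and both tests are hypothetical.

```python
def apply_discount(price, percent):
    """Return price reduced by percent; both are expected to be numbers."""
    return price - price * percent / 100


def test_apply_discount_returns_number():
    result = apply_discount(200, 10)
    # In a statically typed language the return type would be checked for free.
    assert isinstance(result, (int, float))
    assert result == 180


def test_apply_discount_rejects_text():
    # Nothing stops a caller from passing a string; only a test reveals it.
    try:
        apply_discount("200", 10)
    except TypeError:
        return
    raise AssertionError("expected a TypeError for a string price")


test_apply_discount_returns_number()
test_apply_discount_rejects_text()
```

Neither test says anything about business logic; both exist only to pin down the kinds of values involved.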

Now here is the key point I have been building up to: there is a strong correlation between a language being dynamically typed and a language also lacking all the other facilities that make lowering the cost of maintaining a large codebase easier, and that is the key reason why it is more difficult to maintain a large codebase in a dynamic language. And similarly there is a correlation between a language being statically typed and having facilities that make programming in the large easier.

Let's take JavaScript for example. (I worked on the original versions of JScript at Microsoft from 1996 through 2001.) The by-design purpose of JavaScript was to make the monkey dance when you moused over it. Scripts were often a single line. We considered ten line scripts to be pretty normal, hundred line scripts to be huge, and thousand line scripts were unheard of. The language was absolutely not designed for programming in the large, and our implementation decisions, performance targets, and so on, were based on that assumption.

Since JavaScript was specifically designed for programs where one person could see the whole thing on a single page, JavaScript is not only dynamically typed, but it also lacks a great many other facilities that are commonly used when programming in the large:

There is no modularization system; there are no classes, interfaces, or even namespaces. These elements are in other languages to help organize large codebases.

The inheritance system -- prototype inheritance -- is both weak and poorly understood. It is by no means obvious how to correctly build prototypes for deep hierarchies (a captain is a kind of pirate, a pirate is a kind of person, a person is a kind of thing...) in out-of-the-box JavaScript.

There is no encapsulation whatsoever; every property of every object is yielded up to the for-in construct, and is modifiable at will by any part of the program.

There is no way to annotate any restriction on storage; any variable may hold any value.
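The last two points, no enforced encapsulation and no restriction on storage, are not unique to JavaScript. The sketch below shows the same openness in Python, chosen only for illustration; the `Account` class is hypothetical, and Python does have modules and classes, so the first two points do not carry over.

```python
class Account:
    def __init__(self, balance):
        self._balance = balance  # leading underscore: private by convention only


account = Account(100)

# No enforced encapsulation: any code can overwrite the "private" field.
account._balance = -1

# No storage restriction: the same field can later hold a different kind of value.
account._balance = "a lot"

# Every attribute is visible to introspection, much like for-in over an object.
print(vars(account))
```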

But it's not just the lack of facilities that make programming in the large easier. There are also features that make it harder.

JavaScript's error management system is designed with the assumption that the script is running on a web page, that failure is likely, that the cost of failure is low, and that the user who sees the failure is the person least able to fix it: the browser user, not the code's author. Therefore as many errors as possible fail silently and the program keeps trying to muddle on through. This is a reasonable characteristic given the goals of the language, but it surely makes programming in the large harder because it increases the difficulty of writing test cases. If nothing ever fails it is harder to write tests that detect failure!

Code can modify itself based on user input via facilities such as eval or adding new script blocks to the browser DOM dynamically. Any static analysis tool might not even know what code makes up the program!

And so on.

Clearly it is possible to overcome these impediments and build a large program in JavaScript; many multiple-million-line JavaScript programs now exist. But the large teams who build those programs use tools and have discipline to overcome the impediments that JavaScript throws in your way:

They write test cases for every identifier ever used in the program. In a world where misspellings are silently ignored, this is necessary. This is a cost.

They write code in type-checked languages and compile that to JavaScript, such as TypeScript.

They use frameworks that encourage programming in a style more amenable to analysis, more amenable to modularization, and less likely to produce common errors.

They have good discipline about naming conventions, about division of responsibilities, about what the public surface of a given object is, and so on. Again, this is a cost; those tasks would be performed by a compiler in a typical statically-typed language.
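The "misspellings are silently ignored" problem is easy to reproduce in other dynamic languages too. The Python sketch below is an analogue, not JavaScript; the `Settings` class and the misspelled attribute are hypothetical.

```python
class Settings:
    def __init__(self):
        self.timeout = 30


settings = Settings()

# Typo: "timeuot" instead of "timeout". No error is raised; Python simply
# creates a new attribute, and the real setting keeps its old value.
settings.timeuot = 60

print(settings.timeout)  # 30

# The kind of check a team has to write by hand to catch such slips.
unexpected = set(vars(settings)) - {"timeout"}
print(unexpected)  # {'timeuot'}
```

A compiler for a typical statically typed language would reject the assignment to an undeclared member, so the hand-written check would not be needed.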

In conclusion, it is not merely the dynamic nature of typing that increases the cost of maintaining a large codebase. That alone does increase costs, but that is far from the whole story. I could design you a language that was dynamically typed but also had namespaces, modules, inheritance, libraries, private members, and so on -- in fact, C# 4 is such a language -- and such a language would be both dynamic and highly suited for programming in the large.

Rather it is also everything else that is frequently missing from a dynamic language that increases costs in a large codebase. Dynamic languages which also include facilities for good testing, for modularization, reuse, encapsulation, and so on, can indeed decrease costs when programming in the large, but many frequently-used dynamic languages do not have these facilities built in. Someone has to build them, and that adds cost.

@ThiagoSilva: Languages are purpose-built. If you are building a language for programming in the large you are highly likely to add all the features that make programming in the large cheaper, which then entails a lot of "ceremony" and restrictions in what you can write. If you are building a language for, say, making the monkey dance when you mouse over it then you want a one-line program to be one line. Dynamic typing is natural for such a scenario because it gives a lot of flexibility to the developer.
– Eric Lippert Dec 18 '13 at 15:53

While I generally agree with your opinions, you're way off here. Modern JavaScript is very different from what Microsoft did in 1996-2001. You have module systems (AMD, CommonJS), you have encapsulation by convention like in Python (or closures, but I don't find that necessary), there are ways to annotate storage on variables (by using getters/setters for example) and the inheritance system is a lot better understood than it was 13 years ago. It's trivial to build strong robust applications in JavaScript today. Your example should read "let's take JavaScript from 2001 for example".
– Benjamin Gruenbaum Jan 28 '14 at 8:22

@BenjaminGruenbaum: Your criticisms are warranted; however, I would suggest to you that a close look at many large modern JS codebases finds many examples of the sorts of problems I cite; just because disciplines exist does not imply that everyone knows about them and uses them. And you might also be surprised at the number of times a JS loop contains a braek; or cotninue; statement -- perfectly legal! Doesn't do what was intended. Still got checked in.
– Eric Lippert Jan 28 '14 at 14:57

Benjamin, if you are talking about a mature development team that uses an IDE, a static analysis tool and a CI server, you can definitely work around the limitations of a dynamically typed language. But in a world where there are teams who haven't moved to DVCS and CI server tools, having the language enforce the rules while you are writing code is tremendously useful. And jslint is useful, but it can NEVER be as powerful as a static analysis tool targeting a statically typed language simply because there is not enough type information to analyze.
– SolutionYogi Jan 28 '14 at 16:45

I think this post left out a major benefit of static typing: the ability to refactor. Try renaming a variable in a million line JS application or finding all references of a variable. Without static analysis your IDE cannot possibly do an effective job with such an operation.
– MgSam Jun 29 '14 at 13:39

Because they deliberately abandon some of the tools that programming languages offer to assert things you know about the code.

The best-known and most obvious example of this is strict/strong/mandatory/explicit typing (note that the terminology is very much disputed, but most people agree that some languages are stricter than others). When used well, it acts as a permanent assertion about the kind of values you're expecting to occur in a particular place, which can make reasoning about the possible behaviour of a line, routine or module easier, simply because there are fewer possible cases. If you're only ever going to treat someone's name as a string, many coders are therefore willing to type a declaration, to not make exceptions to this rule, and to accept the occasional compilation error when they have made a slip of the finger (forgot quotes) or of the brain (forgot that this rating is not supposed to allow fractions).

Others think that this restricts their creative expressivity, slows down development and introduces work that the compiler should do (e.g. via type inference) or that isn't necessary at all (they'll just remember to stick to strings). One problem with this is that people are quite bad at predicting what kind of errors they will make: almost everybody overestimates their own ability, often grossly. More insidiously, the problem becomes gradually worse the larger your code base gets. Most people can, in fact, remember that the customer name is a string, but add 78 other entities to the mix, all with IDs, some with names and some with serial 'numbers', some of which really are numeric (they require computation to be done on them) but others of which require letters to be stored, and after a while it can become pretty hard to remember whether the field you're reading is actually guaranteed to evaluate to an int or not.
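A small Python sketch of the serial 'number' trap, using optional annotations to record which fields are text. The `Device` entity and its sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    serial_number: str  # a "number" that may contain letters, so it is text
    owner_id: int       # genuinely numeric


devices = [Device("10", 1), Device("9", 1), Device("A-7", 2)]

# Sorting serial "numbers" sorts text, not numbers: "10" comes before "9".
print(sorted(d.serial_number for d in devices))  # ['10', '9', 'A-7']
```

With the declaration in place, nobody has to remember which of the many ID-like fields is an int; the answer is written next to the field, and a checker can hold callers to it.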

Therefore, many decisions that suit a quick prototype project well work much less well in a huge production project, often without anyone noticing the tipping point. This is why there is no one-size-fits-all language, paradigm or framework (and why arguing about which language is better is silly).

Both restrictive languages and non-restrictive languages have their high and low points, I suppose. However in my experience, having more degrees of freedom is like programming in assembly in that you can do everything, but it is difficult to do any complex programs. Nice answer.
– Neil Dec 17 '13 at 11:24

The word you're looking for is static typing. Statically typed languages can be strict or lax, they can be strong or weak, they can be mandatory or optional, and they can be explicit or implicit, but what they all have in common is that types are facts that can be deduced from the text of the program without actually running it. "Dynamic" languages are so called because facts about the program can sometimes not be known until the program is actually running.
– Eric Lippert Dec 17 '13 at 15:49

I'd like to add that there is a hell of a difference between the types in Java (which allow easy and dangerous cast-to-Object violations of the type system) and Haskell (which requires on average one explicit type signature per five or six functions, the rest is inferred, but will spank you if you try anything funny).
– Karl Damgaard Asmussen Dec 17 '13 at 16:10

"Others think that this restricts their creative expressivity, slows down development ..." -- I would add "introduces artificial complexity into the design" to the list. Two strong examples of different kinds of complexity that a "static typed" language can force you to cope with are (a) monads as present in Haskell; and (b) Peter Norvig showing that 16 of the 23 design patterns of that popular book are "invisible" or simpler in "dynamic typed" languages whereas in other languages they are mostly bloat working around static checking limitations: norvig.com/design-patterns
– Thiago Silva Dec 17 '13 at 16:24

@ThiagoSilva: Monads are not an example of complexity per se. Many people find them hard to learn, but as abstractions go they are quite simple--the difficulty is that they are also quite abstract. In fact, monads often simplify a design by making it more explicit: they just highlight things which are magical and unacknowledged in other languages. And Norvig's design-pattern article is not relevant to statically typed functional languages at all; it's not a comment about static typing in general but rather about Java-style type systems (which we can all agree are a mess).
– Tikhon Jelvis Jan 27 '14 at 21:51

Why don't you ask the author of that presentation? It's his claim, after all, he should back it up.

There are plenty of very large, very complex, very successful projects developed in dynamic languages. And there are plenty of spectacular failures of projects written in statically typed languages (e.g. the FBI Virtual Case File).

It is probably true that projects written in dynamic languages tend to be smaller than projects written in statically typed languages, but that is a red herring: most projects written in statically typed languages tend to be written in languages like Java or C, which are not very expressive. Whereas most projects written in dynamic languages tend to be written in very expressive languages like Scheme, CommonLisp, Clojure, Smalltalk, Ruby, Python.

So, the reason why those projects are smaller is not because you can't write large projects in dynamic languages, it's because you don't need to write large projects in expressive languages … it simply takes much fewer lines of code, much less complexity to do the same thing in a more expressive language.

Projects written in Haskell, for example, also tend to be pretty small. Not because you can't write large systems in Haskell, but simply because you don't have to.

But let's at least take a look at what a static type system has to offer for writing large systems: a type system prevents you from writing certain programs. That's its job. You write a program, present it to the type checker, and the type checker says: "No, you can't write that, sorry." And in particular, type systems are designed in such a way that the type checker prevents you from writing "bad" programs. Programs that have errors. So, in that sense, yes, a static type system helps in developing large systems.

However, there is a problem: we have the Halting Problem, Rice's Theorem and many other Incompleteness Theorems which basically tell us one thing: it is impossible to write a type checker which can always determine whether a program is type-safe or not. There will always be an infinite number of programs for which the type checker can't decide whether they are type-safe or not. And there is only one sane thing to do for the type checker: reject these programs as not type-safe. And an infinite number of those programs will, in fact, not be type-safe. However, also an infinite number of those programs will be type-safe! And some of those will even be useful! So, the type checker has just prevented us from writing a useful, type-safe program, just because it cannot prove its type-safety.

IOW: the purpose of a type system is to limit expressiveness.

But, what if one of those rejected programs actually solves our problem in an elegant, easy to maintain manner? Then we cannot write that program.

I'd say it's basically a give-and-take: statically typed languages restrict you from writing bad programs at the expense of occasionally also preventing you from writing good programs. Dynamic languages don't prevent you from writing good programs at the expense of also not preventing you from writing bad programs.
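The trade-off can be seen in a few lines of Python. The hypothetical function below never fails at runtime, yet a typical checker (mypy or a similar tool is assumed) would be expected to reject it, because it does not track the connection between `flag` and the kind of value stored in `value`.

```python
def describe(flag: bool) -> int:
    # value is an int on one path and a str on the other.
    value = 1 if flag else "one"
    if flag:
        # Safe at runtime (value is 1 here), but a checker that does not
        # relate flag to value cannot prove it and reports an error.
        return value + 1
    return len(value)


print(describe(True), describe(False))  # 2 3
```

The program is correct, but to get it past a checker you would have to restructure it, which is the "occasionally prevented from writing good programs" cost in miniature.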

The more important aspect for maintainability of large systems is expressiveness, simply because you don't need to create as large and complex a system in the first place.

You should also include Scala, which has awesome type inference.
– Jus12 Dec 17 '13 at 12:38

An interesting way of looking at it. Personally, I find that for large projects, type-checkers can be a life saver. They implicitly provide an extremely basic kind of unit-testing (if we can call it that - it's simply testing whether the structures agree or not). This is the kind of testing I have no time to write manually, but needs to happen when the project grows beyond what you can easily hold in your head. I suspect this happens fairly rapidly for most systems, regardless of expressiveness; how is this problem typically solved in the dynamic world?
– Daniel B Dec 17 '13 at 12:47

Just to add to the above comment, I mean: the nature of many (medium / large) problems is such that you will need a couple of hundred entities to model it, expressive language or not. An expressive one might cut down the code by a factor of 10x or more, but it will still not be manageable without additional tooling; I'm wondering what this tooling is.
– Daniel B Dec 17 '13 at 12:49

@DanielB: Depends on the language. Remember, dynamic != weak. In python, for example, "1" != 1, and if you try to use them interchangeably you'll get type errors at runtime. Mostly the closest you get is duck typing, though--if you have the wrong type and call a method, runtime exception. It's not anywhere near as robust as a proper static type system, but it's not untyped.
– Phoshi Dec 17 '13 at 14:02
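The behaviour described in the comment above can be checked directly. This is a minimal sketch; the `Duck` and `Robot` classes and the `make_it_speak` helper are hypothetical.

```python
print("1" == 1)  # False: a string and an int never compare equal

try:
    "1" + 1  # mixing the two is an error, not a silent conversion
except TypeError as error:
    print("TypeError:", error)


class Duck:
    def speak(self):
        return "Quack"


class Robot:
    pass


def make_it_speak(thing):
    # Duck typing: anything with a speak() method is accepted.
    return thing.speak()


print(make_it_speak(Duck()))
try:
    make_it_speak(Robot())  # wrong type: fails only when the call executes
except AttributeError as error:
    print("AttributeError:", error)
```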

I'm the author of the Play Framework talk mentioned in the question. I was going to write a reply, but Eric Lippert's answer says it better than I could have, so I upvoted it instead and recommend everyone read it. Also, remember that a "large codebase" can be "large" across multiple dimensions, including lines of code, how many people work on it simultaneously, and how long it has been around. All of these factors increase "code rot"; static typing is not a magic bullet, but rather one tool to decrease code rot.
– Yevgeniy Brikman Feb 9 '14 at 21:06

Explicit static types are a universally-understood and guaranteed correct form of documentation which is not available in dynamic languages. If this is not compensated for, your dynamic code will simply be harder to read and understand.

Python has inline unit tests called doc tests that are automatically tested. Doc tests go further in providing documentation than types do as they give you an example of code usage.
– aoeu256 May 26 '18 at 14:54
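For readers who have not seen doctests, here is a minimal sketch using Python's standard `doctest` module. The `slugify` function is hypothetical; the examples in its docstring serve as both documentation and tests.

```python
import doctest


def slugify(title):
    """Turn a title into a lowercase, hyphen-separated slug.

    >>> slugify("Static vs Dynamic")
    'static-vs-dynamic'
    >>> slugify("  Large   Codebases ")
    'large-codebases'
    """
    return "-".join(title.lower().split())


# Parse the examples out of the docstring and run them.
parser = doctest.DocTestParser()
test = parser.get_doctest(slugify.__doc__, {"slugify": slugify}, "slugify", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)
print(results.failed, "failed out of", results.attempted)
```

In everyday use one would more likely run `python -m doctest` on a file; the explicit parser and runner are used here only to keep the sketch self-contained.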

Modern IDEs for Python (PyDev, PyCharm) can use type inference to tell you the type of things without you having to type it explicitly. There are also ways of logging previous calls to a function or method, although it's not mainstream. If you set a breakpoint in a function you can access locals() in the Python REPL (PyDev and PyCharm connect the REPL to the current context), and developing your application while it's still running gives you access not only to all the types but also to the values.
– aoeu256 May 26 '18 at 15:39

Consider a large codebase including database bindings and a rich
test suite, and let me highlight a few advantages of static languages
over dynamic languages. (Some examples may be idiosyncratic and not
apply to every static or dynamic language.)

The general idea, as others have pointed out, is that the type system is
a “dimension” of your program which exposes some information to
automated tools processing your program (compiler, code analysis
tools, etc.). With a dynamic language, this information is basically
stripped and therefore not available. With a static language, this
information can be used to help write correct programs.

When you fix a bug, you start with a program that looks good to your
compiler but has faulty logic. The fix is typically an edit that
corrects the logic locally (e.g. within a class) but may break it
at other places (e.g. in classes collaborating with the previous
one). Since a program written in a static language exposes much more
information to the compiler¹ than a program written in a dynamic
language, the compiler does more to help you locate those other
places than a compiler for a dynamic language can. This is because
a local modification will often break the type correctness of the
program at other places, thus forcing you to fix the type
correctness globally before having a chance to run the program
again.

A static language enforces type-correctness of a program, and you can
assume that all type errors you encounter when working on the program
would correspond to a runtime failure in a hypothetical translation of
the program into a dynamic language; thus the former has fewer bugs
than the latter. As a consequence, it requires fewer coverage tests,
fewer unit tests and fewer bugfixes; in short, it is easier to
maintain.

Of course, there is a tradeoff: while it is possible to expose a lot
of information in the type system and thus take the chance to write
reliable programs, it might be difficult to combine this with a
flexible API.

Here are a few examples of information that one can encode in the type
system:

- Const correctness: the compiler can guarantee that a value is
passed “read-only” to a procedure.²

- Database schema: the compiler can guarantee that code binding the
program to a database corresponds to the database definition. This
is very useful when this definition changes. (Maintenance!)

- System resources: the compiler can guarantee that code using a
system resource only does so when the resource is in the correct
state. For instance, it is possible to encode the closed or open
state of a file in the type system.
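The last item can be sketched as follows, using Python with optional annotations. This is an in-memory model rather than real file I/O, and the `ClosedFile` and `OpenFile` classes are hypothetical; a checker such as mypy is assumed to be what enforces the rule.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClosedFile:
    path: str

    def open(self) -> "OpenFile":
        return OpenFile(self.path, [])


@dataclass(frozen=True)
class OpenFile:
    path: str
    lines: List[str]

    def write(self, line: str) -> "OpenFile":
        return OpenFile(self.path, self.lines + [line])

    def close(self) -> ClosedFile:
        return ClosedFile(self.path)


log = ClosedFile("app.log").open().write("started")
closed = log.close()

# closed.write("oops") would be rejected by a checker: ClosedFile has no write.
print(hasattr(closed, "write"))  # False
```

Because the open and closed states are different types, "write after close" stops being a runtime condition to test for and becomes a call that does not type-check.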

¹ It is not useful to distinguish between a compiler and an
interpreter here, if such a difference exists.

“all type errors you encounter when working on the program would correspond to a runtime failure”: this isn't true (which is a big argument of dynamic typing proponents). However, if there is no runtime failure, this is for a nonobvious reason that needs to be documented. Rather than document in the form of a comment, you might as well document it in a way the compiler understands and checks. I would say that all type errors you encounter when working on the program would correspond to a runtime failure or maintenance nightmare.
– Gilles Dec 17 '13 at 16:07

Because static typing enables better tooling, which improves the productivity of a programmer when he tries to understand, refactor or extend a large existing code base.

In a large program, we'll likely have several methods with the same name. For instance, we might have an add method that adds something to a set, another that adds two integers, another that deposits money into a bank account, and so on. In small programs, such name collisions are unlikely to occur. In large programs worked on by several people, they occur naturally.

In a statically typed language, such methods can be distinguished by the types they operate on. In particular, a development environment can discover, for each method invocation expression, which method is being invoked, enabling it to show a tooltip with that method's documentation, find all call sites for a method, or to support refactorings (such as method inlining, method renaming, modifying the parameter list, ...).
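A minimal Python sketch of the three `add` methods, with optional annotations standing in for static types. All class and function names are hypothetical.

```python
from typing import Set


class TagSet:
    def __init__(self) -> None:
        self.tags: Set[str] = set()

    def add(self, tag: str) -> None:
        """Add a tag to the set."""
        self.tags.add(tag)


class Calculator:
    def add(self, left: int, right: int) -> int:
        """Add two integers."""
        return left + right


class BankAccount:
    def __init__(self) -> None:
        self.balance = 0

    def add(self, amount: int) -> None:
        """Deposit money into the account."""
        self.balance += amount


def pay_salary(account: BankAccount, amount: int) -> None:
    # The annotation tells a tool that this is BankAccount.add, so
    # "find usages" or "rename" can ignore the other two add methods.
    account.add(amount)


def pay_salary_untyped(account, amount):
    # Without the annotation, a tool sees only the name "add".
    account.add(amount)
```

Both `pay_salary` functions behave identically at runtime; the difference lies only in what a development environment can know before the program runs.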

Making everything global? That's not a necessity of dynamic languages, I'd call that a badly-written codebase...
– Izkata Dec 17 '13 at 16:42

Perhaps I should have mentioned I am talking about object oriented programming languages with dynamic dispatch. Such methods are not global, but figuring out which implementation is going to be called requires knowledge about the type of the receiver.
– meriton Dec 17 '13 at 16:47

Showing tooltips on mouse-over was a standard feature of dynamic language IDEs, long before programmers in static languages had IDEs or even mice. Automated refactoring tools were invented in dynamic languages; heck, IDEs were invented there. Refactoring tools for dynamic languages can still do things that e.g. Eclipse, IDEA or Visual Studio can't, such as refactoring code that has already been deployed or refactoring code that hasn't been written yet.
– Jörg W Mittag Dec 17 '13 at 17:27

I didn't claim that tooltips, IDEs or mice were invented in statically typed languages. I only claim that in an object oriented language, a function's name is in general insufficient to identify the function, and hence tooling cannot know which function is being called, display the right tooltip, inline the right function, and so on, at least not without asking the user.
– meriton Dec 17 '13 at 18:43

Modern IDEs for dynamic languages can use type inference to generate this information when the program is used like a static language program. Optional types can also help tip the IDE. In theory, in a dynamic language you can log the arguments and return values of previous function calls, and use this information for type inference. If you keep the program running, stopped at a breakpoint, it can tell you not only the types but also the values of all objects. PyDev and PyCharm let the Python REPL access the local scope.
– aoeu256 May 26 '18 at 16:00
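The idea of logging calls to recover type information can be sketched with a small decorator. This is an illustration of the principle only, not how any particular IDE works; `record_types`, `observed_types` and `scale` are hypothetical names.

```python
import functools

observed_types = {}


def record_types(func):
    """Remember the argument and return types seen for each call to func."""

    @functools.wraps(func)
    def wrapper(*args):
        result = func(*args)
        signature = tuple(type(arg).__name__ for arg in args)
        observed_types.setdefault(func.__name__, set()).add(
            (signature, type(result).__name__)
        )
        return result

    return wrapper


@record_types
def scale(value, factor):
    return value * factor


scale(2, 3)
scale("ab", 2)
print(observed_types)
```

Note the limitation: the log only covers the calls that actually happened, so it describes observed behaviour rather than proving anything about every possible call.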