Scalability: Dynamic and Static Programming Languages

In the wake of the demise of the Chandler personal information management project, a discussion arose on TSS about the scalability potential of dynamic languages. Ted Neward attempted to go beyond the language quarrel and provide some structured insights on this issue.

First of all, Neward emphasizes that a language's scalability can be understood in terms of "size of project, as in lines-of-code" or in terms of "capacity handling, as in 'it needs to scale to 100,000 requests per second'". The two need to be distinguished because, even though both are important, they do not always go hand in hand: assembly languages or C, for instance, are conducive to capacity scalability but not to size scalability.

Neward defines size scalability in terms of “a language’s ability to extend or enhance the complexity budget of a project”. He refers to Mike Clark's idea that “every project has a fixed complexity budget, and the more you spend on infrastructure and tools, the less you have to spend on the actual business logic.” According to Neward, this point is at the core of the debate about the scalability of static and dynamic languages.

The adherents of static languages argue that type-safety checking results in less work for the programmer “as an automated tool now runs through a series of tests that the programmer doesn't have to write by hand”. Moreover, the IDE support that exists for these languages provides powerful refactoring tools that are “widely believed to be impossible on dynamic language platforms.” Dynamic language proponents, however, argue that “the dynamic nature of these languages means less work during the creation and maintenance of the codebase, resulting in a far fewer lines-of-code count […] thus implicitly improving the scale of a dynamic language”.

According to Ted Neward, it is true that a “dynamic language programmer can typically slam out more work-per-line-of-code than his statically-typed compatriot”, but that programmer will most probably need to write more unit tests, given that dynamic languages lack a compiler that, in effect, performs a certain number of systematic checks.
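To make this trade-off concrete, here is a minimal Python sketch. The function `total_price` and the checks around it are hypothetical illustrations, not taken from Neward's post. A compiler for a statically typed language would reject the second call outright; in a dynamic language the mistake only shows up if a hand-written test exercises it.

```python
def total_price(quantity, unit_price):
    """Return the total cost; both arguments are meant to be numbers."""
    return quantity * unit_price


def check_total_price():
    # The intended use: two numbers.
    assert total_price(3, 2.5) == 7.5

    # Nothing stops a caller from passing a string. str * int is legal
    # Python and repeats the string, so the bug is silent until a test
    # like this one pins the behaviour down.
    assert total_price("3", 2) == "33"


check_total_price()
```

The second assertion documents the surprising result rather than endorsing it; a real test suite would more likely add an explicit type check to `total_price` and assert that it raises.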

As for IDE refactoring, Neward references Dave Thomas, who admits that the refactoring support of Eclipse, for example, is limited for dynamic language platforms because type information is missing until runtime. Thomas highlights, however, that “simple search-and-replace across files, something any non-trivial editor supports, will do many of the same refactorings as Eclipse or IntelliJ provides, since type is no longer an issue.” Neward adds that he expects IDE vendors to develop tooling specifically designed for dynamic languages:

[…] it's relatively easy to imagine that the IDE could be actively "running" the code as it is being typed, in much the same way that Eclipse is doing constant compiles, tracking type information throughout the editing process.
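The idea of an editor "running" the code to learn about types can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the helper `complete` is hypothetical, and a real tool would need to sandbox the evaluation and cope with code that does not run.

```python
def complete(obj, prefix):
    """Return public attribute names of a live object that start with prefix."""
    return sorted(
        name
        for name in dir(obj)
        if name.startswith(prefix) and not name.startswith("_")
    )


# The "editor" evaluates what the user has typed so far ...
namespace = {}
exec("greeting = 'hello'", namespace)

# ... and inspects the resulting live object to offer completions
# for the partially typed expression "greeting.up".
suggestions = complete(namespace["greeting"], "up")
print(suggestions)  # ['upper']
```

Because the object is inspected at runtime, no type declaration is needed to know that `greeting` is a string.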

Moreover, one should not forget that “the original refactoring browser was an implementation built for (and into) Smalltalk, one of the world's first dynamic languages”, as was highlighted in the TSS debate.
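The search-and-replace style of refactoring that Thomas describes can be sketched as a whole-word rename across several files. The helper `rename_identifier` and the file contents below are hypothetical, and the sketch works on an in-memory mapping rather than on real files.

```python
import re


def rename_identifier(sources, old, new):
    """Rename whole-word occurrences of `old` in a {filename: text} mapping."""
    pattern = re.compile(r"\b" + re.escape(old) + r"\b")
    # A function is used as the replacement so that `new` is inserted literally.
    return {
        name: pattern.sub(lambda match: new, text)
        for name, text in sources.items()
    }


sources = {
    "model.py": "def fetch_user(id):\n    return db[id]\n",
    "view.py": "user = fetch_user(42)\nprefetch_user_cache()\n",
}
renamed = rename_identifier(sources, "fetch_user", "load_user")
print(renamed["view.py"])
```

The word boundaries keep `prefetch_user_cache` untouched, but a purely textual rename like this one still cannot tell two unrelated methods with the same name apart, and it will also rewrite matches inside strings and comments.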

With respect to capacity handling scalability, Ted Neward stresses its importance because “a project that cannot handle the expected user load during peak usage times will have effectively failed just as surely as if the project had never shipped in the first place.”

Dynamic language opponents argue that these languages cannot scale in terms of capacity handling, because they are “built on top of their own runtimes, which are arguably vastly inferior to the engineering effort that goes into the garbage collection facilities found in the JVM Hotspot or CLR implementations.” Dynamic language supporters respond that “there are plenty of web applications and web sites that scale "well enough" on top of the MRV (Matz's Ruby VM?) interpreter that comes "out of the box" with Ruby”.

Ted Neward, in turn, points out that “with the release of JRuby, and the work on projects like IronRuby and Ruby.NET, it's entirely reasonable to assume that these dynamic languages can and will now run on top of modern virtual machines like the JVM and the CLR”:

While a dynamic language will usually take some kind of performance and memory hit when running on top of VMs that were designed for statically-typed languages, work on the DLR and the MLVM, as well as enhancements to the underlying platform that will be more beneficial to these dynamic language scenarios, will reduce that.

Ted Neward seems to believe that there is a window of opportunity for improving the scalability of dynamic languages, in terms of both project size and capacity handling, by adapting tools and optimizing runtime environments to their specific needs. More generally, he is rather opposed to the dichotomy between static and dynamic languages. He highlights the fact that some applications, Excel for instance, successfully combine the two “by creating a core engine and surrounding it with a scripting engine that non-programmers use to exercise the engine in meaningful ways.” Neward sums up by paraphrasing Karl Marx: "From each language, according to its abilities, to each project, according to its needs."

With languages supporting both static and dynamic approaches, you can easily scale. The Boo programming language is one of them: built on the CLI (.NET / Mono), it has Python-inspired syntax with the choice of static typing (with performance comparable to C#) OR dynamic typing (still quite fast) for each variable or member, as _you_ decide.

So easy to imagine, it's almost like it already exists....
by
Francois Ward

it's relatively easy to imagine that the IDE could be actively "running" the code as it is being typed(...)

That's exactly how VS2008's JavaScript IntelliSense works. It's not completely perfect, however: if you do something too weird, or the code isn't compiling, everything goes down... maybe in the years to come it will be better.

After many years of Smalltalk, I find it gratifying that, as a 'mainstream' language, C# is adopting some of the features of dynamic languages such as closures; extension methods (add methods to base classes); type inferencing; metaprogramming (attributes/annotations); etc. Granted, these still require too much typing when compared to pure dynamic languages, and some truly dynamic features (e.g. the ability to define instance-specific behavior, or Monads [the ability to carry state ‘on the stack’]) are probably very hard to replicate.

However, one of the most useful features for me is Partial Classes. This is a very simple, almost overlooked feature where the source of one class can be split into multiple files. The most useful effect is that it allows generated and handcrafted code to co-exist peacefully. For example, if I want to add my own GetHash() method to a generated class (such as one produced by processing a WSDL or XML Schema), I can simply do that in a separate file from the generated one. The generated file may be overwritten many times through the course of evolving the interface, model, UI, etc. without affecting the handcrafted code. Visual Studio exploits this feature extensively.

I would put type inferencing and partial classes on top of the list of features that Java should add support for.

Well, Visual Basic has both dynamic and static typing too. However, I am not sure how practical this can be. I guess it is like security: you can't have a small security hole. Either a system is secured, or not! A mixed model of typing can break the type inference model. I like C#'s dynamic block, I guess; at least it offers a nicer syntax for doing dynamic programming and reflection (without passing method names as strings, for example). However, it doesn't break, as far as I know, the strongly statically typed system and the type inference.

Whew! Those discussions are mostly just folks tossing dirt, as Ted Neward writes:

I find it deeply ironic that the news piece TSS cited at the top of the discussion claims that the Chandler project failed due to mismanagement, not its choice of implementation language. It doesn't even mention what language was used to build Chandler, leading me to wonder if anybody even read the piece before choosing up their sides and throwing dirt at one another.

Chandler has both a Python client and a Java server side component! Then the TSS article says in the introduction:

Using the Eclipse project as its baseline and Plone for comparison, the author begins to lay the foundations for discussions on issues projects face as they grow: code evolution, modularization, refactoring, etc.

The linked article in this case contains muddled nonsense such as, "Plone, Zope and Python doesn't have a sophisticated module system (this may have been fixed with the Archetypes system)".

There are of course fascinating stories from all of these projects with regard to size-of-project scaling: Eclipse, Chandler, Zope and Plone. Some interesting technical comparisons could be made: a comparison of Eclipse's use of Interfaces and Adapters to those used in the Zope Component Architecture would make for interesting discussions of size scalability and comparisons of dynamic/static languages.

But perhaps more interesting are comparisons of project management. Eclipse came from a focused team of experienced developers with a well-designed architecture and goal. Chandler faltered because it had "no objective basis for decision making" (dirtsimple.org/2008/01/programming.html). Zope's interesting project management story is about how you do (and don't!) migrate a large user base from a web framework designed in 1996 that has to re-invent itself to stay relevant in 2008. And Plone is the story of a large project with a very wide scope (Content Management), made entirely by contributors working in their spare time, and of how the leaders of the project do (and don't!) attempt to herd the community in productive directions (plone.org/events/2008-summit).

As long as one paradigm or the other is the dominant one, and not some bizarre hybrid, I think they probably can coexist in a language. However, to beat a dead horse further, all those checks you need to do in dynamic languages can really get in the way of performance. I mean REALLY get in the way of performance. I've found VB.NET apps to run as much as 20% slower than C# apps that did the same thing on a line-by-line basis (admittedly, these are smallish apps I wrote simply to do the comparison). Why? All those type checks make for more code that needs to be executed. The type checking gets significant when you start dealing with multiple layers of properties (properties of properties of properties). Just decompile the IL of equivalent C# and VB apps... you only need to look at the length of the files produced to see what I mean. For that reason, I think I'd prefer static as the default with the ability to do dynamic when needed.

One thing I think we should avoid is making any one language so complex that no one can possibly understand all the implications of all the features. If that happens, I see no reason to not just use C++.

If your argument against static typing is the amount of code you have to type (fair enough! though not good enough for me personally), then it is worth giving Scala a try - there is really very minimal type information you need to type - and it is completely statically type checked.

With the advent of static typing with type inferencing now shown to work very well, I personally don't see the point of creating more dynamic languages...

Note to Faisal Waris: You might have known this, but all of your desired features are present in Scala including partial classes (well, kind of, they are quite a clean way of doing that, called Traits).