About ten years ago, I chose to develop a commercial automated software testing tool with Tcl and Tk. This post explains that decision and its consequences.

Although some testing tools tout “visual programming” interfaces, the primary user interaction with a software testing tool is through its test scripting (programming) language. If this isn’t a fully featured programming language, users cannot create effective, scalable, and maintainable test suites. Many built-in testing functions are also needed. There are essentially two ways to provide them: as a framework or library added to an existing language, or as elements of a domain-specific language (DSL).

As a bootstrapping startup, we couldn’t afford to develop yet another test scripting language (a DSL), and we thought that would be ridiculous anyway, given how easy it is to simply add a framework to an existing language. Also, testing DSLs are well-known lock-ins and will be resisted by any savvy customer, which our target customers certainly were.

The platforms of our target customers were a very mixed bag, so being able to run on and/or interface with multiple platforms was important.

We wanted a language that could produce binaries for small embedded processors as well as other targets, supported distributed testing, was easy to learn, and could easily be extended to reach into the physical layer of the system under test (this is necessary for test control and observation).

So which base language to use? Knowing the intense tribalism of languages and platforms, I wanted something that was neutral. If I used Java, I could kiss the Microsoft world goodbye. If I used C#, I could kiss everything else goodbye, and so on.

Tcl was already in use for many kinds of test automation. Its open source development cadre appeared to be stable and competent. It had been incorporated to support scripting in other successful products. It certainly met all of the technical criteria. Tcl also had a very large and attractive library of platform-abstracted features, so it could achieve multi-platform coverage out of the box. Cisco had something like 15 MLOC of Tcl running in commercial release in its IOS, so it was not an unproven lab rat. Lots of good books and training were available, and there was an active Usenet discussion group. The BSD-style license meant we did not have to be married to the doctrinaire impositions of the GPL and its assorted poison pills for commercial licensing.

As part of my technical due diligence, I talked to a lot of Tcl users to find out what it was like to live with. I learned that Tcl code quickly and often turned into a big ball of mud. Why? The common factors were the perception that “scripts” were throw-aways for which good design was over-engineering, Tcl’s Lisp-ish flexibility, and its primitive modularity. However, as our app would support test suite management and implement the test object pattern (as in JUnit), I didn’t see that as a problem. We experimented with Incr Tcl and found it provided good support for the basic elements of object-oriented programming, following the familiar C++ object idiom. With Incr Tcl, our tool could support test objects, so I concluded we could achieve a scalable and maintainable testing platform with Tcl.

So, we chose Tcl. To make our product platform-agnostic, we then went all-in and developed the product front end in Tk and Tcl. I personally did a lot of Tk and Tcl programming.

Over about four years of this, things went from bad to worse.

The syntactic and conceptual wormholes that give Tcl much of its power are confusing, distracting, and error-prone if you’re not a devotee. As Tcl was a means to an end for us, not a mission, this was a constant source of vexation for us and our customers. But you don’t get any sympathy about this from the Tcl community, who take it as a badge of honor (they aren’t much different from other language tribes in this respect). The sophisticated support now routinely available in development environments like Eclipse or Visual Studio isn’t available for Tcl. When I first saw automated refactoring and commenting in Visual Studio after years of using a 1990s text editor for Tcl coding, I was depressed.

But as the core technology of a commercial tool, Tcl was the wrong choice.

In any complex app, sooner or later, you need reliable, efficient, and controllable support for concurrent processing. Ideally, this is provided by your platform and seamlessly integrated into your programming language. It has taken the Java world and Microsoft the better part of fifteen years to get this right. Tcl’s support for this was okay for toy applications but didn’t scale for us. The only alternative was to instantiate multiple interpreter processes, which meant we then had to develop our own IPC protocol. We did that, but it was ugly and inefficient.

As we pushed the Tcl interpreter to its limits (often in concurrency-related areas), it started to break in ways that cost us sales and customers. Although we were paying a third party for support and made it abundantly clear to them that these bugs were an existential problem, all we got were partial answers and some rude innuendo about our intellectual limitations. Our problems simply weren’t of interest to the maintainers. Finally, we were told to fix it ourselves because, after all, the Tcl interpreter is open source. We were maxed out on product development, had no such expertise, and branching the core tree would have been insanity. We were simply stuck and had to devise work-arounds for customers.

The Tcl talent pool is very small. When I tried to hire Tcl devs, instead of getting inundated with resumes, I ended up seeing the same several dozen names. The back end of our system was implemented in C++ and C. We tried to cross-train our C/C++ developers in Tcl, but that didn’t work. For a small startup team, this was the source of many, many headaches. Had the product taken off, it would have been a show-stopper.

Finally, requiring customers to learn yet another programming language, and one they routinely perceived as weird, never helped. Customers didn’t care that it was platform-agnostic: they wanted to use what they knew.

I’ve taken many lessons from all of this. In the main: for a system of any criticality and complexity, a niche language like Tcl is a very bad choice. I’ll never forget what one VC said when I mentioned that we were using it: “Tcl? Sounds like a bad acid trip.” I nearly boiled over at the time, but in retrospect, he was spot-on.

If I were starting today, I would develop all core functionality in C++, with a remote procedure call (RPC) API. I would provide an execution framework and API that allowed test objects to be written in C/C++, Java, or C#.
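To make the test-object half of that proposal more concrete, here is a minimal C++ sketch of what such an execution framework might look like. It assumes a JUnit-style life cycle (setUp, run, tearDown) and, like JUnit, distinguishes a test failure from an unexpected error. Every name in it (TestObject, TestFailure, TestSuite, Verdict, and the three sample tests) is hypothetical, and the RPC layer and the Java/C# bindings are deliberately left out.

```cpp
#include <cassert>
#include <exception>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical base class for test objects; concrete tests override run().
class TestObject {
public:
    explicit TestObject(std::string name) : name_(std::move(name)) {}
    virtual ~TestObject() = default;
    const std::string& name() const { return name_; }
    virtual void setUp() {}     // acquire fixtures (optional)
    virtual void run() = 0;     // the test body; throw to report a problem
    virtual void tearDown() {}  // release fixtures (optional)
private:
    std::string name_;
};

// Thrown by a test to signal a failed check, as opposed to an unexpected error.
class TestFailure : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

enum class Verdict { Pass, Fail, Error };

struct TestResult {
    std::string name;
    Verdict verdict;
    std::string message;
};

// Owns test objects and runs them in registration order.
class TestSuite {
public:
    void add(std::unique_ptr<TestObject> test) {
        assert(test != nullptr);
        tests_.push_back(std::move(test));
    }

    std::vector<TestResult> runAll() {
        std::vector<TestResult> results;
        for (auto& t : tests_) {
            TestResult r{t->name(), Verdict::Pass, ""};
            try {
                t->setUp();
                try {
                    t->run();
                } catch (...) {
                    t->tearDown();  // clean up even when run() throws,
                    throw;          // then rethrow so the verdict is classified below
                }
                t->tearDown();
            } catch (const TestFailure& e) {
                r.verdict = Verdict::Fail;
                r.message = e.what();
            } catch (const std::exception& e) {
                r.verdict = Verdict::Error;
                r.message = e.what();
            } catch (...) {
                r.verdict = Verdict::Error;
                r.message = "unknown exception";
            }
            results.push_back(r);
        }
        return results;
    }

private:
    std::vector<std::unique_ptr<TestObject>> tests_;
};

// Illustrative tests: one passes, one fails a check, one hits an unexpected error.
class PassingTest : public TestObject {
public:
    PassingTest() : TestObject("passing") {}
    void run() override {
        if (2 + 2 != 4) throw TestFailure("2 + 2 != 4");
    }
};

class FailingTest : public TestObject {
public:
    FailingTest() : TestObject("failing") {}
    void run() override { throw TestFailure("expected 5, got 4"); }
};

class CrashingTest : public TestObject {
public:
    CrashingTest() : TestObject("crashing") {}
    void run() override { throw std::runtime_error("device not responding"); }
};
```

A caller would register tests with suite.add(std::make_unique<PassingTest>()) and inspect the vector returned by runAll(). In a real product, an RPC API of the kind described above would sit in front of TestSuite so that test objects written in other languages could be registered and run remotely; that part is not sketched here.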

There are similar stories about Lisp. Today there is threading support in Tcl (not only “events”). There is also Tcl support in Eclipse, but why is it needed when there is Vim/Emacs? 🙂
IMHO Tcl is a “glue” language, really. But today, in 2012, I don’t know how well it handles massive multiprocessing, and I have no such experience. Robert, what were the problems with “C-in-core + Tcl-as-glue”?

I came across this post 5 years late but am compelled to comment as an admitted Tcl fan 🙂 I ran engineering at two startups where Tcl was used extensively along with C/C++ for the “hot spots”. Tcl drove our native user interface at the first and our web back end at the second. In both places, we used Tcl extensively in testing. Being network focused, our test processes involved distributed testing with coordinated agents across Windows, Linux, Solaris, and HP-UX. We were hugely happy with the results in terms of productivity.

Even in terms of scale, there are historically many examples of Tcl in production. AOL’s web farms were Tcl-powered, as were NBC’s GEnesis broadcast network studios. In modern times, Argonne National Laboratory’s Swift/T framework for petascale computing has Tcl underpinnings. None of these are “toy” applications.

Having said that, I agree with the other points that Tcl programmers are not easy to find and that cross-training is not necessarily simple (Lisp programmers would be more comfortable with Tcl programming idioms, but they are not easy to find either!)