This blog tracks development of the open source accounting and ERP software LedgerSMB. I also offer some perspectives on PostgreSQL including new features which we may find useful. Brought to you by Metatron Technology Consulting.

Thursday, February 27, 2014

In Praise of Perl 5

I have decided to do a two-part series here praising Perl in part 1 and PostgreSQL in part 2. I expect that PostgreSQL folks will get a lot out of the Perl post and vice versa. The real power of both programming environments is in the relationship between domain-specific languages and general purpose languages.

These posts are aimed at software engineers more than developers and they make the case for building frameworks on these platforms. The common thread is flexibility and productivity.

This first post is about Perl 5. Perl 6 is a different language, more a little sister to Perl 5 than a successor. The basic point is that Perl 5 gives you a way to build domain-specific languages (DSLs) that can be seamlessly worked into a general-purpose programming environment. This is almost the exact inverse of PostgreSQL, which offers, as a development environment, a DSL into which you can work almost any general-purpose development tool. This combination is extremely powerful, as I will show.

All code in this post is from the LedgerSMB codebase in different eras (before the fork, during the early rewrite, and now planned code for 1.5). All code in this post may be used under the GNU General Public License version 2 or at your option any later version.

You can see in the code samples below our evolution in how we use Perl.

This is (bad) Perl

Perl is a language many people love to hate. Here's an example of bad Perl from early in the LedgerSMB codebase. It is offered as an example of sloppy coding generally and why maintaining Perl code can be difficult at times. Note the module makes no use of strict or warnings, and almost all variables are globally package-scoped.

This is not a good piece of maintainable code. It is very hard to modify safely. Due to global scoping, unit tests are not possible. There are many other problems as well. One of our major goals in LedgerSMB is to rewrite all this code as quickly as we can without making the application unusably unstable.
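To make the global-scoping problem concrete, here is a minimal sketch. It is not taken from the LedgerSMB codebase; the package names (Legacy::Totals, Safer::Totals) and the add_line function are hypothetical and exist only for illustration. The first package imitates code written without strict, so its running total lives in a package global shared by every caller. The second passes its state in explicitly.

```perl
use strict;
use warnings;

{
    package Legacy::Totals;
    no strict 'vars';    # imitate a module that never enabled strict

    # $total is never declared, so it is the package global
    # $Legacy::Totals::total, shared by every caller.
    sub add_line {
        my ($amount) = @_;
        $total += $amount;
        return $total;
    }
}

{
    package Safer::Totals;

    # State is passed in, so each caller (or unit test) owns its own total.
    sub add_line {
        my ($state, $amount) = @_;
        $state->{total} += $amount;
        return $state->{total};
    }
}

# Two logically separate invoices leak into each other through the global.
my $invoice_a = Legacy::Totals::add_line(10);
my $invoice_b = Legacy::Totals::add_line(5);

my %state_a = (total => 0);
my %state_b = (total => 0);
my $safe_a = Safer::Totals::add_line(\%state_a, 10);
my $safe_b = Safer::Totals::add_line(\%state_b, 5);

print "legacy: $invoice_a, $invoice_b\n";    # legacy: 10, 15
print "safer:  $safe_a, $safe_b\n";          # safer:  10, 5
```

The legacy version cannot be tested in isolation because the result of each call depends on every call made before it anywhere in the program.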

So nobody can argue that it is impossible to create unmaintainable messes in Perl. But it is possible to do this sort of thing in any language. One can't judge a language solely on how easy it is to write bad code in it.

This is (better) Perl

So what does better Perl look like? Let's try this newer Perl code, which was added late in the LedgerSMB 1.3 development cycle, and handles asset depreciation:

This function depreciates all asset classes to a point at a specific date. There's a fair bit of logic here but it does many times more work than the previous example, is easier to maintain, and is easier to understand.

This is also Perl!

The above two examples are pretty straightforward Perl code, but neither one really shows what Perl is capable of in terms of writing good-quality, maintainable code.

The fact is that Perl itself is a highly malleable language and this malleability allows you to define domain-specific languages for parts of your program and use them there.

Here's a small class for handling currency records. POD and comments have been removed.

    package LedgerSMB::Currency;
    use Moose;
    with 'LedgerSMB::PGOSimple::Role', 'LedgerSMB::MooseTypes';

Now the code above sets up a whole class including properties, accessors, and methods delegated to database stored procedures. The class is effectively entirely declarative. The same amount of work in a similarly simple module from the 1.3 iteration (TaxForm.pm) requires around 50 lines of code, so more than double, and that's without accessor support. The 1.4-framework module for handling contact information (phone numbers and email addresses) is around 65 lines of code, with not much more complexity (so around triple). The simpler Bank.pm (for tracking bank account information) is around 36 lines, so nearly double.

What differentiates the examples though is not only line count but readability, testability, and maintainability. The LedgerSMB::Currency module is more concise, more readable, and has much better testing and maintenance characteristics than the longer modules from the previous frameworks. Even without comments or POD, if you read the Moose and PGObject::Util::DBMethod documentation, you know immediately what the module does. And in such a module, comments may not be appropriate, but POD would likely not only be appropriate but take up significantly more space than the code.

How does that work?

Perl is a very flexible and mutable language. While you can't add keywords, you can add functions that behave more or less like keywords. Functions can be exported from one module to another and, used judiciously, these exported functions can form domain-specific languages which in fact run on generated Perl code.
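Here is a minimal sketch of a keyword-like function, using only core Perl. The module My::Ledger::DSL, its account function, and the account data are all hypothetical and are not part of LedgerSMB. The module is defined inline in a BEGIN block so the sketch runs as a single file; normally it would live in its own .pm file.

```perl
use strict;
use warnings;

BEGIN {
    package My::Ledger::DSL;
    use Exporter 'import';
    our @EXPORT = qw(account);
    our %registry;

    # Reads like a declaration at the call site, but is an ordinary function.
    sub account {
        my ($number, %spec) = @_;
        $registry{$number} = {%spec};
        return;
    }

    sub lookup {
        my ($class, $number) = @_;
        return $registry{$number};
    }

    # Mark the module as loaded so "use" does not search the filesystem.
    $INC{'My/Ledger/DSL.pm'} = __FILE__;
}

use My::Ledger::DSL;    # imports account() into this package at compile time

# Because account() is already known to the parser, no parentheses are needed.
account 1200 => (description => 'Accounts Receivable', category => 'A');
account 2100 => (description => 'Accounts Payable',    category => 'L');

print My::Ledger::DSL->lookup(1200)->{description}, "\n";    # Accounts Receivable
```

The import happens at compile time, which is what lets the later calls parse as list operators and read like declarations rather than function calls.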

The example here uses two modules which provide DSLs for specific purposes. The first is Moose, which has a long history as an extremely important contributor to current Perl object-oriented programming practices. This module provides the functions "with" and "has" used above.

Moose, in this case, works with a PGObject::Simple::Role module which provides a framework for interacting with PostgreSQL databases. This is extended by LedgerSMB::PGOSimple::Role, which provides handling of database connections and the like.

The second is PGObject::Util::DBMethod, which provides the dbmethod function. It's worth noting that both has and dbmethod are code generators. When they run, they create functions which they attach to the package. Used in this way, has creates the accessors, while dbmethod creates the delegated methods.
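The mechanism is easy to sketch in core Perl. What follows is not how Moose or PGObject::Util::DBMethod are actually implemented, and it does not use their APIs. The names simple_has, simple_dbmethod, Sketch::Currency, call_procedure, and the currency__save function name are all hypothetical, and the database call is replaced by a stub that only reports what it would run.

```perl
use strict;
use warnings;

BEGIN {
    package Sketch::Generators;
    use Exporter 'import';
    our @EXPORT = qw(simple_has simple_dbmethod);

    # Generate a read/write accessor and attach it to the calling package.
    sub simple_has {
        my ($name) = @_;
        my $target = caller;
        no strict 'refs';
        *{"${target}::${name}"} = sub {
            my $self = shift;
            $self->{$name} = shift if @_;
            return $self->{$name};
        };
        return;
    }

    # Generate a method that delegates to a named stored procedure,
    # passing the object's properties as arguments.
    sub simple_dbmethod {
        my ($name, %opts) = @_;
        my $target = caller;
        no strict 'refs';
        *{"${target}::${name}"} = sub {
            my $self = shift;
            return $self->call_procedure($opts{funcname}, %$self);
        };
        return;
    }

    $INC{'Sketch/Generators.pm'} = __FILE__;
}

package Sketch::Currency;
use Sketch::Generators;

simple_has 'curr';
simple_has 'description';
simple_dbmethod save => (funcname => 'currency__save');

sub new { my ($class, %args) = @_; return bless {%args}, $class }

# Stub standing in for a real database call (assumption for this sketch).
sub call_procedure {
    my ($self, $funcname, %args) = @_;
    my $arglist = join ', ', map { "$_ := '$args{$_}'" } sort keys %args;
    return "SELECT * FROM $funcname($arglist)";
}

package main;

my $usd = Sketch::Currency->new(curr => 'USD', description => 'US Dollar');
print $usd->curr, "\n";    # USD
print $usd->save, "\n";
```

Nothing named curr, description, or save is written by hand in Sketch::Currency. All three are created when the declarations run, which is why the class body reads as a specification.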

Why is this Powerful and Productive?

The use of robust code generation here at run-time allows you to effectively build modules and classes from specifications of modules and classes rather than implementing that specification by hand. Virtually all object-oriented frameworks in Perl effectively offer some form of this code generation.

A specification-to-code language provides a general tradeoff between clarity, expressiveness (in its domain), and robustness on one hand, and inflexibility on the other. This is the fundamental tradeoff of domain-specific languages generally. When you merge a domain-specific language into a general-purpose one, however, you gain the freedom to compensate for the lack of flexibility by falling back on more general tools when you need to. This flexibility is where the productivity gains are found.
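A small sketch of that fallback, again in core Perl with hypothetical names (Sketch::Invoice, subtotal, tax_rate, total): the accessors come from a declarative list, while the one piece of logic the list cannot express is written as an ordinary method alongside them.

```perl
use strict;
use warnings;

package Sketch::Invoice;

# Declarative part: one generated read-only accessor per listed field.
for my $field (qw(subtotal tax_rate)) {
    no strict 'refs';
    *{ __PACKAGE__ . "::$field" } = sub { $_[0]{$field} };
}

sub new { my ($class, %args) = @_; return bless {%args}, $class }

# General-purpose fallback: logic the declarative list cannot express.
sub total {
    my ($self) = @_;
    return sprintf '%.2f', $self->subtotal * (1 + $self->tax_rate);
}

package main;

my $invoice = Sketch::Invoice->new(subtotal => 100, tax_rate => 0.05);
print $invoice->total, "\n";    # 105.00
```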

Compare a framework built as a mini-DSL specification language to one built as an object model. In an object-model framework, one effectively has to juggle object-oriented design (SOLID principles, etc.) with the desire for greater flexibility. Here, however, the DSLs are orthogonal to the object model. They allow you to define the object model independently of the framework, while re-using the DSL framework however you want. Of course these are not mutually exclusive, and it is quite possible to have both in a large and powerful application framework.

Other Similarly Powerful Languages

Perl is not the only language of this kind. The first example that comes to mind, naturally, is Lisp. However, other Lispy languages are also worth mentioning. Most prominent among these are Rebol and Red, whose open source implementations are still very immature. These languages are extremely mutable and the syntax can be easily extended or even rewritten.

Metaprogramming goes some way toward the same goal, and it is the common approach in Ruby and Python, but it makes it much harder to build a framework that is truly orthogonal to the object model.

A major aspect of the power of Perl 5 here lies in the very things which often cause beginning and intermediate programmers headaches. Perl allows, to a remarkable extent, manipulation of its own internals (perhaps only Rebol, Red, and Lisp take this further). This allows one to rewrite much of the language, but it also allows for the development of contexts which support these sorts of extensions.

The key feature I am looking at here is the mutability of the language. And there are few languages which are themselves relatively mutable. Perl isn't just a programming language, but a toolkit for building programming languages inside it.

4 comments:

I love perl myself, but I would never do OOP on it. Want enterprise-class app - use something more relevant, like strong-typed languages with proper OOP support in first place, and/or frameworks and libs, but not perl. Perl is super cool for small tasks, when you want to prototype an algo, or do a log-parser server, or do some web-crowler. But OOP/Enterprise - never!

I would actually disagree with you here. The more I use it for larger-scale business tools, the happier I am with it. However, there are a couple caveats that may be obvious if you look at that last example closely.

1. We put a lot of logic in the database.

2. Perl acts largely as something which gives you an OO-like interface to the db logic in a very non-ormlike way. We're heading towards, we hope, maybe three times as much SQL as Perl.

3. Perl's primary OOP frameworks are almost, but not quite, exactly unlike classical OOP. It's very hard to map the concepts in Moose to the concepts in Python or Java for example (it's more inspired by CLOS as I understand it). While this leads to a lot of flexibility, it also means that what you are doing is using it as a sort of glue layer between the middleware server and the db. It works well primarily because you can keep that glue thin.

yes, strong typed then cast values back and forth (if you can remember what you can or should cast them) all the time :), or drown in xml

why not let the compiler take care of that (and the Perl compiler can do that very well), and only add checks in the rare spots where it might get confused ?

Perl is used for enterprise applications with millions of users and hundreds of requests per second :), and compared with other more enterprisy strongly-typed languages is a lot more maintainable, refactorable and piecemeal upgradable