This blog tracks development of the open source accounting and ERP software LedgerSMB. I also offer some perspectives on PostgreSQL including new features which we may find useful. Brought to you by Metatron Technology Consulting.

Saturday, March 17, 2012

Ok, so this post is going to be about the ERP market, and why LedgerSMB is headed for greatness. We are a long-standing open source project and are not going anywhere soon. There are significant strategic reasons to choose LedgerSMB over others.

We have made a lot of progress since we started, and that progress is expected to continue to accelerate. Here's why....

First, we are great because of PostgreSQL. There is no getting around that. PostgreSQL is an awesome database management system for ERP's. While there are some things it doesn't do yet as well as I would like, no other RDBMS does these either (see my recent post about this).

However, what PostgreSQL offers is an extremely powerful set of tools for addressing an ERP environment, and without per connection license costs. The latter is especially important because it allows for additional utility programs to connect to the database and perform tasks without this requiring additional client access licenses and there is a lot that can be done with the flexibility that results.

Currently we take heavy advantage of PL/PGSQL and various built-in types and functions. While there has been some talk of using PL/Perl and other stored procedure languages, we haven't yet found a case where this makes sense.

It's worth remembering that most of the work in an ERP system is in storing, retrieving, and displaying information. The database is the ideal place to put all of the logic regarding storing and retrieving the information, as well as some of the display logic. Consequently we are becoming more database-heavy and lighter-weight on the Perl side.

Less is More

In LedgerSMB we focus aggressively on overall architecture, with the idea of flexible, re-usable modules which get the logic right the first time around. For 1.4, we are removing the project and department system and replacing it with a more general "business reporting unit" system. This allows us to offer project accounting, department reports, manufacturing job costing, and more, without dedicated code to each task.

Differences are then handled by workflow scripts rather than by the model. This ensure that the application can eventually be a simple yet powerful ERP development environment rather than an application which must be everything to everybody.

Each major release of LedgerSMB has had more features but a smaller code base than the previous version.

This approach is good for a number of reasons. First it places emphasis on where the emphasis needs to be: on LedgerSMB as a platform. Secondly it encourages flexibility in implementation while focusing on getting the details regarding accounting right. It also gives us fewer maintenance points and hence a better, higher quality codebase. There are many more reasons why that is the case, but in general we try hard to do more with less.

LedgerSMB is a Multi-Vendor Project

Most open source ERP's are single-vendor products that make money selling enterprise versions. This means that the community version is basically a sales tool and many features will not be added for fear of hurting the vendor who sponsors the tool.

LedgerSMB began as a joint project between Metatron Technology Consulting and Command Prompt Inc, with a number of other consultants joining in early. While the ecosystem is constantly in flux as new individuals enter the project and old ones leave for one reason or another, one theme remains the same: you can get top-notch support from a variety of sources, and no company retains a monopoly on commit privileges.

This changes the dynamics of the project. Instead of releasing a moderately functional tool open source and then charging for advanced features, we release most things open source and only where that model breaks down do we look at subscriptions. This allows us to do things that other open source projects in the area have not yet been able to really try to do in a significant way, such as good working payroll systems with maintained rules for various tax jurisdictions (forthcoming in 1.4 at least for a couple of jurisdictions with more coming). Instead of paying for fixed asset management, you can save that money and put it towards things that matter, like maintaining payroll logic. The logic "sold" under subscription can even remain open source, since the major problem is just that of keeping the code up to compliance with the latest regulations.

LedgerSMB is a Development Platform

LedgerSMB 1.3 and higher isn't intended to be just an ERP application, but rather a development platform for helping businesses implement ERP applications in accordance to their workflows. The framework while minimalist is designed to facilitate rather than get in the way.

The framework is designed to get your teams of developers up and running as quickly as possible. The object model is designed to be clean, and most of the actual work is shipped off the the database via stored procedures, or handed off to modules to parse templates, or handed off to various CPAN modules. The actual Perl code is basically glue that holds all the rest together.

One advantage to this sort of platform is that reference implementations can be created in other platforms as well, and the basic framework can be set up such that it can be used for non-interactive batch jobs or thick clients, not just web applications. In fact that has been a major guiding feature of our framework.

Secure by Default

LedgerSMB has made security a top priority since the beginning of the project. By default users cannot do anything in LedgerSMB, but beyond that even if they find and exploit holes in the web application, the security is still further enforced on the backend.

We are trying hard to provide industry-leading security, and such that can be enforced even when integration with other software is at issue. We have plans to support Kerberos 5 authentication as soon as it is needed, as well.

Comparisons to xTuple

LedgerSMB and xTuple share a common approach to the database. We are both heavily stored-procedure centric. Both of us focus heavily on the database as an engineering discipline in itself. This means that both of us focus heavily on interoperability with other applications and a good API.

There are however two important differences. xTuple is a C++/QT application while LedgerSMB is written in Perl. This means that LedgerSMB can deliver rapid application development possibilities in a way that xTuple cannot. Rapid application around xTuple would require using other existing frameworks. Rapid application around LedgerSMB can be done in within the LedgerSMB framework. This first difference is critical because larger businesses fundamentally buy ERP frameworks as development platforms, while smaller businesses haven't yet found the value in doing this (something LedgerSMB hopes to change).

Secondly, xTuple is a single-vendor solution. It is offered by a single company, which owns the codebase, and sells add-ons. This means that the incentive for the company is to release as little as they can and remain viable while selling licenses to as much as they can. Since LedgerSMB is multi-vendor, the incentive is to provide as much open source as possible and sell as few addons as possible. One can see already that LedgerSMB comes with fixed asset management for which xTuple charges about $700. As we continue to develop, those areas where xTuple is currently ahead (such as manufacturing) will probably disappear, and the areas where we offer features that xTuple sells for profit will likely increase.

The above paragraph has been stricken because, as Ned Lilly points out it is inaccurate. In fact the fixed asset tool was developed by a third party, and this is one area they cultivate around their market. He points out further that this is an example of rapid application development in Postbooks and therefore I am incorrect there. He further asks me to suggest that their incentives are not to reduce a minimalistic product. If their marketplace materials are not mostly made by them, then I would agree on all these points. Points accepted.

I will however stand by the point that generally single-vendor solutions tend to have incentives to release minimalistic solutions to encourage folks to buy licenses. Kudos to xTuple for not falling in that trap.

I will also have to look a lot more closely at how this Javascript automation is doable. It may be possible in essence to do a fair bit more than I had considered here.

Comparisons to OpenERP

OpenERP is an open source, single-vendor ERP solution written in Python. It features a pleasing interface, a modular structure, and so forth. However the same problem occurs relating to single vendor management.

OpenERP offers rapid application development possibilities both in Python using the framework provided or in it's "application builder." However, what OpenERP as a business is really interested in selling you is a software license for the full version, which gives you exemptions from the requirements of the AGPL v3 as well as technical support. This makes it harder, I think, to leverage a larger development community around the software.

Comparisons to OpenTAPS

OpenTAPS is an ERP application and framework that is multi-vendor, built in Java, and supports multiple databases. It is based on the Apache Open for Business ERP framework.

OpenTAPS is impressive, features-wise, and it is used by some major corporations, but the choice of Java as a language means that development is less rapid than it would be in Perl or Python, and therefore the framework capabilities are more within the reach of large businesses, which likely have internal development teams familiar with Java, than small ones. This means that small businesses are still faced with the necessity of building business processes around the application rather than the other way around. It is therefore more of an upper-market product, while LedgerSMB currently shoots primarily for the midrange.

Tuesday, March 13, 2012

There are a few times when I work with PostgreSQL when I wish it was able to do some things that neither it nor other relational databases are able to do at present. In my view (and I will defend each of these feature requests), these all fit well within a relational db centric world.

3: XML/JSON to tuple type

One of the problems that one runs into in a stored-procedure centric application is managing input and output of an application. An ability to convert XML or JSON to a tuple type (remember that tuples can have members that are tuples, or arrays of tuples). The formats are semantically equivalent, so why not allow for conversion?

2: Nested Namespaces

Managing hundreds or thousands of stored procedures can be a problem in a flat namespace. We end up having to semantically create hierarchical function names or hierarchically named namespaces in a flat namespace world. Even if one cannot do it for tables, there ought to be some way to do this for functions.

Part of the problem here is that the SQL standards give semantic value to different namespace lengths, so this would have to be solved. It is not an easy problem to solve and it may not have a solution. Perhaps a package delimiter within a namespace would be helpful? Maybe a character like# or $?

1: Rich Declarative Constraints for Accounting Applications

It seems strange to me that accounting applications have been one of the primary uses of RDBMS's since their inception, and yet there is no type of declarative constraint to handle ensuring that transactions are balanced.

Whether a transaction is balanced or not is fundamentally a set-based operation. A transaction is balanced when the sum of the debits of the rows in each transaction is equal to the sum of hte credits of the rows in each transaction. In LedgerSMB, debits are negative amounts, and credits are positive amounts (they are just presented to the user as fundamentally different). So I would like the ability to do something like:

CHECK FOR EACH TRANSACTION (SUM(amount) = 0) GROUP BY trans_id;

Part of the problem I suppose is that getting this right in a row-locking environment is hard. However, in an MVCC environment it should be possible to check if this constraint matches the deleted and inserted rows, and if it does (0 + 0 = 0), we know we are balanced (deletions would only happen for unapproved transactions, i.e. ones that have not yet hit the books). Such a check would only fire once per transaction, and only check rows modified by the transaction. It therefore shouldn't be problematic in the way an aggregate check would be over the entire table.

The other two are nice, but if PostgreSQL could do this very well, it would be the king of databases for accounting software, and I am sure they feature would find new uses elsewhere.

Saturday, March 10, 2012

There as a recent email to the development list from Erik Huelssman describing his experience extending LedgerSMB. In my view this demonstrates better than anything else that we are getting right what an ERP framework needs to do.

The email is reposted here with permission.

Hi all,

For one of my businesses, I needed to create an application to store some data on the services and products we provide. Basically, there would be one central table with some references to a few other tables, mostly to provide (short) selection lists. Also, there's one bigger complexity, because we wanted to link to our customer data.

The customer data is all stored in LSMB, so, developing this database inside LSMB's database seemed like a logical choice.

Since my colleague and I were completely unfamiliar with the web development framework for LSMB, we first had a start with the tools we did know. (Internally developed stuff, nothing particularly fancy.) However, due to other priorities, things didn't quite roll forward as we wanted; it was simply too much work to get it done.

Last week we scrapped our own development. As a replacement I started looking at the tools that LSMB provides to get the same job done. Again, without prior knowledge how this should be done, I simply studied the budgeting module. I figured that if budgeting was an extension, we might consider our own administration as an extension and that it should probably work (roughly) the same way.

And it did: in 1,5 day I had our administration up and running, featuring a data entry screen and a search screen. The current version of the data entry screen searches and links to a customer only if it finds an exact match. [Chris and I are working on functionality every module can use to look up cross table references like these, so it'll be replaced by something better soon.]

* Create a top level handler file by copying ./file.pl to ./<mymodule>.pl

* Add menu items

Basically, the out of the 1,5 days I've spent, most time was spent learning what I had to do. The actual time spent coding was negligeable.

In summary: adding to or changing LedgerSMB can be extremely easy. It was a refreshing surprise to me how easy. We'll definitely develop the rest of our application using LedgerSMB's own framework and when it's done, we might be able to donate it to the project.

Anyway, I hope the above inspires others to try it out themselves. If you have any questions, both Chris and me will be glad to help you out!

Friday, March 2, 2012

It is no secret that I think of NoSQL as a kind of cool set of technologies applicable to a fairly narrow niche market. My view is that relational databases are still (and will be for the long term) the bread and butter of data management. NoSQL provides a useful set of tools for cases which have very different data requirements. There are edge cases where many NoSQL techniques make sense in a relational environment though.

Data in an ERP application can be conceptualized as sets of sections of tables, each of which has various write vs read load balances. Some sections may be updated only rarely, others may prevent updates and deletes, but allow inserts and selects, and receive data constantly. Data may migrate from one area to another over time, such as when the books are closed.

LedgerSMB tends to currently use a variety of models for different sections of the application. Settings and menu attributes (what a menu item does) are modelled using key-value modelling because that best semantically fits the sort of data being stored. The menu structure is modelled hierarchical, as (in 1.4) are projects, departments, and account classifications. Hierarchical modelling is done currently through a simple self-join.

Virtually all of the rest of the data is modelled using simple relations with no self-joins. I don't see that changing. However, for the sections that either do self-joins or key-value modelling, I am always interested in seeing how these can be optimized, and so when I read a blog post on NoSQL data modelling, it occurred to me that as time goes on some of the models discussed there may eventually become more applicable to relational systems. And so I will go over my thoughts as to how these may be helpful or not.

As PostgreSQL gets native JSON support, some parts may in fact be moved to that.

Key-Value and Schemaless Modelling

Key-value modelling is generally something to be avoided in relational design. There are, however, times when key-value pairs are in fact what one is modelling, and hence this is a good way forward. The best example in LedgerSMB is the settings table (called "defaults"). Here the data is semantically keys and values and hence there is no other clean way to model the data.

There are some other cases where data can be modelled well using key-value modelling, but where this is not quite so semantically clear cut. Consider the LedgerSMB menu, for example:

The second table effectively models the keys and values submitted back to the application when link on the menu is clicked. While this works ok with KVM, it is not really clean. Ideally any common attributes would be moved to menu_node and free-form ones would be added to this table.

But here's a case where JSON would be cleaner, per node. The data here is not getting updated frequently (unlike defaults). Indeed it is typically written once unless a fix needs to be made. The fixes would be cleaner if it was all in one field. So here, a NoSQL model would work very well, albeit incorporated into a relational database.

Hierarchical Modelling

Since LedgerSMB 1.3, we have started to do hierarchical modelling in the database. See the above menu_node table as an example. 1.4 adds hierarchical business reporting units (for projects and departments) as well.

Currently these are entirely handled through simple self-joins and no clustered indexes are added. When the menu is generated in 1.3, we use connectby() in tablefunc (we support PostgreSQL 8.3 and above), but for trunk (will be 1.4) we have gone to WITH RECURSIVE which has shown a significant speed improvement (2-70 times depending on various factors).

Initially when I read the article, I thought maybe some of the ideas might be applicable to my work with LedgerSMB, but as I think about it I am not sure. It may be that PostgreSQL's approach may perform well enough not to ever have to move to something else. WITH RECURSIVE performs very well, and so the major options for improving performance (namely clustered indexes) may be sufficient.

Outside of clustering on indexes, I am hard-pressed to think of ways to optimize hierarchical lookups in ways that would significantly beat other forms of hierarchical storage. Perhaps someone could explain what the benefits are to storing child node id's etc in the table. I sure don't see it.

OLAP

In additional to standard relational operations on a well-normalized database, there are also cases where analytics need to be run relatively quickly and in semi-flexible manners, over increasingly large datasets, and they must perform well.

Unfortunately in an ERP application, there is rarely a hard distinction between analytic and transactional workloads. Bank account reconciliation may enter data, but the workload depends on getting good data regarding reconciled transactions from the past, and that means aggregating potentially large datasets, and those datasets could increase in size over time.

The first forays into this area for LedgerSMB in fact were aimed to solve mixed transactional/reporting workflows. Our current approach uses "checkpoints" which effectively close the books on all previous dates. When one closes the books the checkpoints are created, and from that point forward, no transactions can be entered for previous dates.

This allows us eventually to tie a number of other information on to checkpoints. Various reporting views could be materialized on closing (asynchronously even), and thus provide better performance for all kinds of reports on these static data sets.

These areas have not really been fully explored in LedgerSMB yet. However for some upcoming features (such as graphs in reports), I think it will be necessary to offer this sort of thing. Referential integrity constraints can tie a set of reports to a specific checkpoint so that the reports cannot get out of sync when books are re-opened (which happens on rare occasion).