This is a repost of an article from my old blog about CodaServer. It's being published here for posterity.

Most software developers work in the trenches, the corporate IT departments, writing code to help automate a company's various processes. While the Googles get the glory and Facebooks find the fame, our intrepid warriors make life better for one knowledge worker at a time.

One problem corporate developers have is that business processes, once in code, are difficult to change. This is compounded when multiple languages or development platforms are in use, as the changes need to happen at the same time in multiple places.

The Old Way

Let's start with an example taken from a simple warehousing operation.

Pallets are delivered by UPS or Federal Express to the receiving department.

They are then inspected for damage by the staff.

If there is damage, they are rejected and sent back to the supplier.

If they are good, they are "broken down" and the boxes on them are stocked in the warehouse.

Using a traditional SQL database, there may be a table for "pallets" with a column called "status" having each of the highlighted words above as possible values. (There would be other stuff in there too, but I'm keeping it simple.) There would also be a (say) PHP class called "Pallet" with methods corresponding to each of these actions, as well as business logic handling the validation of various fields and making sure that the specified pallet was in the right state to have the action performed. Then there is a web tool that let users record this stuff online, and it would also do some form validation as the user tried to do various things. There is also a Java-based reporting engine that runs against the database, and it needs to know about all of the statuses for some of its ad hoc querying capabilities.

Now, say management decided to change this. They add a staging status between delivery and inspection, since the warehouse is receiving so many shipments and the staff can't inspect them quickly enough. The staging area will act as a short-term storage area, and the status will give management greater visibility into where their boxes are. What needs to change?

Well, the database is simple. "Staging" becomes a new status field value. Just gotta make sure to add it to all of the environments. The Pallet class gets a new method, and most of the other methods may need to change too to make sure they are checking for the new status. The front end has some new conditional logic to take care of, and it needs a new button and status made available. And the Java reporting system needs to know about staging.

The CodaServer Way

CodaServer handles "tables with status" using its FORM entity. Instead of making status a data value in a table, it lets developers define the state diagram for the entity. This is done by defining a series of "leads to" relationships:

Delivered leads to Inspected.

Inspected leads to Rejected or Broken.

Rejected leads to Nothing.

Broken leads to Nothing.

To add a new status to the list involves simply declaring the new status through an ALTER FORM command and rewiring Delivered to point to it via another ALTER FORM.

CodaServer lets you specify both an adjective and a verb for each status.

Delivered/Deliver

Inspected/Inspect

Rejected/Reject

Broken/Break

These verbs are what happens to the object to make it enter the adjective state. Incidently, these verbs replace the traditional INSERT, UPDATE and DELETE operations in SQL databases, although you can still use UPDATE to modify an object without changing its state.

CodaServer handles all of the busy work of determining if Pallet 4 is in the Delivered status, without your having to do anything. It also fires off the BEFORE INSPECT and AFTER INSPECT triggers if you have them specified to handle any validation or cleanup that needs to be done.

Applications can be written in such a way that they know about your business entities, their statuses, actions, and workflow without you needing to write any additional application code. Think about this, because it's a pretty big deal. All application metadata is available through a cross-platform web services API, a business rules engine enforces your workflow and data integrity, and your workflow can change without you having to write more application code.

This is a repost of an article from my old blog about CodaServer. It's being published here for posterity.

Most business systems implement the concept of "users," those crazy scamps who login and actually use the software. They're pretty important; each has different capabilities within the system and their access levels need to be made available to application code so that it can render a suitable interface.

The Old Way

This is one of the greatest antipatterns in enterprise development. Due to the impedance mismatch between object-oriented application code and relational database tables, most application frameworks find it expedient to use a single database user with global permissions on the schema and implement a separate user management piece using database tables. This is bad on several fronts.

Database permissions are pretty bulletproof. If you don't have INSERT on a table, you can't stick anything into it.

Custom user management code is error-prone. All of the thinking needs to be done by developers on a timeline.

Custom user management code reinvents the wheel, poorly. I have seen a number of half-baked permission schemes in my day, and I'm not really that old. There is no consistancy from one company to the next and the inner workings/caveats of these systems tend to live in the head of one or two developers.

That said, developers have no choice but to do it this way. The CRUD model used in SQL simply isn't dynamic enough for a good permissions architecture.

Everything is an UPDATE.

Once you grant UPDATE on a table, you pretty much give a user access to do anything they want with it.

The CodaServer Way

CodaServer has a much different approach. It is role-based, meaning that permissions are always assigned to roles rather than assigned to users directly. Being the Business Rules Engine, it isn't limited to having CRUD as a model. You can create a process map that goes something this:

This plurality of permission schemes is meant to obviate the need for roll-it-yourself user management code and remove a major headache from developement.

Another way that CodaServer differs from traditional SQL databases is that the role-based permissions system is actually part of the schema for each application. It is deployed with the application as it moves between different environments and the roles themselves are managed by developers rather than database administrators. (The administrators have their own domain of responsibility covered in an upcoming article)

The logic behind this is that saying which role has what access to the system is very much a part of the business rules. Is there any reason why a shipping clerk should not be able receive a shipment in the development instance but should in production? Probably not. You may want to say that Charlie should be stripped of his shipping clerk role because things he signs for tend to disappear, but the role itself is still a valid entity and shouldn't change.

A user in CodaServer can have any number of roles in an application environment, and their permissions in the system are a union of all the permissions of the roles they have been granted. Metadata about a user's roles and permissions can be gleaned through the SHOW ROLES and SHOW PERMISSIONS commands. No more trolling around in the information_schema.

Finally, a user management system that works with you rather than against you.

This is a repost of an article from my old blog about CodaServer. It's being published here for posterity.

Databases have always been a little awkward in the versioning department.

A database server is an environment all its own. Different databases can have identical schemas but much different data in them. Some of this data is very much part of the "structure" of the schema: Tables that are used to populate important application drop down boxes, for instance, or metadata tables for various reporting functions.

This data is programmer data instead of user data, and the application can't work without it.

How does one keep both schema and programmer data in sync across different environments?

The Old Way

SQL databases are specialists. They do their thing handling queries and update statements flying in from all directions. They maintain data integrity by following the rules of the schema. They handle enormous loads. But that's about it.

Database servers are often known as "instances," and for good reason: They are entirely self-contained. They generally don't know about each other and don't involve themselves in the rest of the software ecosystem. If you want to set up a testing environment, you "fire up a new instance" and load your schema onto it. As you roll out new versions of application code, you run the diffs of the schema against the new environment and backfill the data.

The developer (or system administrator) does the thinking while the database server just runs its queries and twiddles its hard drive platters.

We should be so lucky.

The CodaServer Way

CodaServer has a much different architecture. One CodaServer manages your development, testing, and production environments. If you need more capacity, you can fire up multiple CodaServer instances, point them at the same system datasource, and put a load balancer in front of them. Since all the communication with CodaServer is over HTTP-based Web Services, it has an architecture very similar to that of the web, and can leverage all the same techniques used to scale general purpose web sites.

(For more detail on CodaServer's architecture, you can look at the manual's architecture page)

Having one server (or load balanced group of servers) manage all of your environments has another benefit: Simple version control.

As you build up your CodaServer application in the DEV environment, CodaServer keeps track of all Data Definition Language commands (like CREATE TABLE) in its transaction log. When you want to move your application to TEST, simply run the PROMOTE command. When it's ready for PROD, PROMOTE from test. Your application is always constant in each environment because deployment always follows the same pattern. There cannot physically be a schema in production that was not in development first.

But wait, there's more.

CodaServer also has the concept of REF TABLES, tables that remain constant at all times between environments. All schema changes and INSERTs, UPDATEs, and DELETEs to REF TABLES occur immediately in DEV, TEST, and PROD. This is ideal for most types of programmer data.

Some might think it's limiting that CodaServer doesn't allow developers to create tables directly on different application instances. This isn't really as big a problem as it seems.

CodaServer's only concern is protecting the data model of your applications. Since the applications use standard SQL databases, you are free to add any additional tables you want for reporting, ad hoc denormalization, data transformations, etc. The only caveat is that CodaServer based applications won't know they are there, which is perfectly fine since they don't affect the datamodel.

(It is highly recommended that you adopt a naming convention for these tables that is different than that of your CodaServer tables to prevent collisions.)

This is a repost of an article from my old blog about CodaServer. It's being published here for posterity.

Benjamin Franklin once said "Lost time is never found again." That is doubly true if that time is when your company's batch processing was supposed to run, and now none of your suppliers are going to get their checks. And they cancel your accounts, and you can't get more supplies. Your customers rebel and go to your competitors, your dog leaves you to find a master who can remember not to screw up the mundane details.

That time is the most precious of all.

In programming, we constantly deal with events. These can be user-generated (mouse clicks, form submissions) or machine-generated (a web service call) and we can easily build systems that capture the events and respond to them. Less easily captured, however, are events generated by time.

How important are these events for a data model? Pretty darned important. There are many tasks which have a heavy data component that simply cannot be run on real-time production systems; rebuilding search indexes over millions of rows data, for instance, or moving millions of transactions from the denormalized transaction schema into the general ledger. There are also operations that are done at a particular time of day as part of a business rule. Electronic voting systems need to stop accepting votes at a particular time, and student registration systems need to open their course catalogs to certain students at certain times.

It's a big deal.

The Old Way

Traditional, time-based events are the domain of the venerable cron daemon. This little guy reads cron tab files which specify times for which certain scripts are to be run. Scripts can be set to run every month, on the 10th, at 3:34AM, or every Wednesday for each minute of the day. It's a remarkably flexible system with a few drawbacks.

The scripts are not necessarily part of the application's main codebase. If you have a PHP web app based on CakePHP, it's difficult to reuse that system's configuration to drive your cron script, which needs to run via the command line. It also makes class libraries more difficult to share.

The cron daemon can die, or the server it is on can die, or the script can die or time out. All of these could leave the database in a inconsistent state and can be difficult to troubleshoot after the fact.

Cron is generally handled as an outlier to the rest of the system. It is difficult to version control, often has an owner who is not a developer (usually a sysadmin), and requires special consideration during code pushes.

The CodaServer Way

CodaServer's solution is to bring cron inside the database.

Instead of scripts, CodaServer's CRON runs stored procedures which have been previously defined by developers. The syntax should be familiar.

CRON check_cron ON accounting.dev IS 0 0 1 * * process_checks;

This will create a new cron called "check_cron" in the development instance of the accounting application that runs the "process_checks" procedure on midnight on the first of each month. By placing cron inside the business rules engine, CodaServer addresses each of the problems above.

The cron isn't separate from the rest of the code. In fact it doesn't require anything in application code at all. They become part of the application's business rules.

The cron is guaranteed to run as long as the business rules engine is up. There are fewer moving parts.

The crons can be created and maintained by developers with the MANAGE_CRONS permission on the server.

Crons are unique amongst CodaServer objects in that they are not part of the application's transaction log. They need to be created explicitly for each instance under which they should run. This was meant to ease system administration and prevent the situation in which unintended events are pushed from environment to environment during code promotion with no way to turn them off. The procedures they call are part of the transaction log, however, so these will be moved between environments automatically.

Cron has been a great friend for many years, and I'm really happy we can take him along and give him a new home.

This is a repost of an article from my old blog about CodaServer. It's being published here for posterity.

Validation means many things to many people. It's that feeling you get when you are unconditionally accepted by the one you love, when all the toil and tears finally...

Uh...

For most of us programmers, validation comes in two forms:

Format-level Validation: Does this piece of data look the way it should?

Data-level Validation: Does this properly-formatted piece of data represent an appropriate value for where I'm seeing it?

This article will focus on the former.

The Old Way

Using conventional technologies, format validation has been something of an annoyance for us. Every layer of the application stack makes a token effort at addressing it, but none of them actually complete the job, and we are left negotiating between them.

The database has database types, such as VARCHAR, NUMERIC, and BLOB. As the final resting place for data, these place the most basic constraints on what it can look like and how much space it can occupy. Of course, each database has slightly different constraints here. A CLOB in Oracle is a TEXT in MySQL, VARCHARS can be 4000 characters in Oracle (I think, it's been a while) but only 255 in MySQL.

Application languages have their own types, like ints and Strings, which kind of map to SQL types. Except when they don't, when they are arrays, or classes. Adaption code must be written to handle this conversion and ensure the application values fit into the database.

Application frameworks often include their own validation layers for processing form input and making sure it conforms to some higher level structure. Is that string an email address?Â Does that purchase order number look like one of ours? How about the ZIP code? What about a Canadian postal code? Usually the frameworks include limited number of these validators and leave it to the developers to round out the set.

Sometimes Javascript is employed to stop bad data from ever leaving the web browser. Just in case.

And this is only one application language on one database. Perl libraries for validation can't be used in Ruby. Let's rewrite 'em! All this format validation hassle adds virtually no value to the final application, but it needs to be done anyway.

The CodaServer Way

Naturally, CodaServer has an answer for this problem:

An Extensible Type System!

Natively, CodaServer has 8 datatypes as well as an Array specifier. These correspond closely to the SQL types and are pretty generic containers for data. The real power comes from the ability to define your own types using the CREATE TYPE command.

CodaServer utilizes regular expressions to give you virtually unlimited ability to describe what your data should look like. (If you are unfamiliar with regular expressions, there are many great resources to get you started online. I use this site a lot to get ideas, and this site to test them out. CodaServer uses the Perl type of regular expression.)

The validation_mask value determines whether or not CodaServer throws an error, and the save_mask determines how the validated result is saved to the database. This uses a facility called regular expression substitution which is a beyond the scope of this article, although there are resources online that can give more details.

That's all there is to it! Now you have a new type that is available server-wide. You can use it in procedure argument lists, table definitions, procedural code blocks... where ever you would like.

On the application code side, you can simply pass in whatever the user submits in a form and CodaServer will tell you whether or not it validates. And (there's more!) CodaServer's ability to return multiple error messages with each statement means that you can submit your whole form, and you will get back all of the errors at once. No more of the "submit and see what else is wrong" that other database systems make you do.