Watching the Delivery Man

A Perl script turns the distributed Git version control system into a distributed database that helps users track Internet orders. When the goods arrive, purchasers update their stock counts, wherever they may be at the time.

If you are like me, and really enjoy buying cheap goodies on the Internet, you might feel uncertain at times as to whether the things you ordered really will arrive just three days later. The thrill of spending can cause bargain hunters to lose track. It seems only natural to store the orders you place in a database and update it when the goods arrive. Of course, you need the database to be available from wherever you spend your money.

And that could be anywhere: in the office, at home, or maybe on your laptop in a cheap motel room. Maybe you don't even have Internet access, and after recovering from the shock of being inundated by mysterious parcels, you might have to update the database locally, only to synchronize it later when access to the net has been re-established.

The Git distributed version control system [2] seems like a perfect choice for the job. Put together in no more than two weeks by the kernel guru Linus Torvalds to replace the proprietary BitKeeper product, Git manages the Linux kernel, patching and merging thousands of files in the blink of an eye. Of course, speed isn't an issue for the application I have in mind, but it's good to know that Git can synchronize various distributed filesystem trees without breaking a sweat, thanks to its integrated push/pull replication mechanism (Figure 1).

Figure 1: The central Git repository on a hosted server is the point of exchange between the individual local repositories at home, at work, and on the road.

Being Prepared

Information concerning Internet purchases and their estimated times of arrival is cached on the local hard disk. The CPAN module used for this creates a separate file for each item. Git versions this information in a local repository, which is the typical approach for a distributed versioning system. This approach gives developers full functionality without Internet access and without burdening the central repositories with temporary developments. They can check in new versions, check out old ones, create parallel development branches, merge branches with others, and much more.

To synchronize the local repository, the local user issues a "push" to another instance someplace else. I've chosen a "central" repository on a hosting service as the point of contact for all clients, to which they push their changes and from which they pull their updates. In reality, of course, no such thing as a centralized Git repository exists, and it is up to you which instance you contact to download patches or new features.

If you are working on a new laptop that doesn't know about the marvels of this tracking system yet, you can create a clone of the centralized instance with shop clone. After doing so, you do not need Internet access to query the clone or feed new data to it; instead, the changes are simply synchronized later with the centralized instance once you have reestablished the connection.

By Your Own Bootstraps

If you are interested in implementing this solution, you will need to create an empty Git repository on a hosting service with SSH access. As you can see in Figure 2, you need git init for this. Just create a new directory, change to it, and run the git init command. Now you might think that the client could simply clone this repository locally, but you need to think again: For some obscure reason, it has to jump through a burning hoop first.

Figure 2: Creating an empty repository, dubbed buy.git, server-side.
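Because Figure 2 is a screenshot, here is a rough text version of the server-side step. It is only a sketch: A scratch directory stands in for the hosting account, and the --bare option is my own assumption (it is the usual choice for a repository that only receives pushes), not something the figure confirms.

```shell
# Scratch directory standing in for the hosting account (assumption;
# on a real server you would log in over SSH and pick the path yourself).
server_root=$(mktemp -d)

# Create the directory, change to it, and initialize an empty repository.
# --bare (no working tree) is an assumption, see the note above.
mkdir "$server_root/buy.git"
cd "$server_root/buy.git"
git init --quiet --bare

# A bare repository keeps HEAD, objects/, and refs/ at the top level.
ls
```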

As you can see in Figure 3, the client also needs to run git init to create an empty repository. Next, create a file for test purposes, run git add to insert it, and complete the process by running git commit. Then, with git remote add, define a remote with the alias origin that points to the central repository on the server. The git push origin master command then synchronizes the master (default) branch on the client with the similarly named branch on the server. It is a good idea to have the client's public key in the server's ~/.ssh/authorized_keys file to avoid having to type the password each time you access the repository via the network.

Figure 3: A remote named "origin" is added to a local repository to point to the repository on the server. "git push" then feeds local changes to the remote repository.
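The client-side steps might look roughly like the following sketch. To keep it runnable on a single machine, a bare repository in a scratch directory plays the role of the server; with a real host, the remote URL would be an SSH address instead. The commit identity and the commit message are placeholders.

```shell
# Stand-ins for server and client in one scratch directory (assumption).
work=$(mktemp -d)
git init --quiet --bare "$work/buy.git"

# Client side: an empty repository plus one test file.
mkdir "$work/client"
cd "$work/client"
git init --quiet
echo "test" > testfile
git add testfile
# Placeholder identity so the commit also works on an unconfigured machine.
git -c user.name="Shopper" -c user.email="shopper@example.com" \
    commit --quiet -m "Initial test file"
# Newer Git versions may pick a different default branch name.
git branch -M master

# Point the alias "origin" at the central repository, then push.
git remote add origin "$work/buy.git"
git push --quiet origin master
```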

If another client wants to retrieve the data from the server-based repository, it just clones it, as shown in Figure 4. The clone on the local machine is a full copy of the server repository, and git push sends changes checked in locally back to the server.

Figure 4: Other clients can now clone the remote repository and then use "git push" to upload their changes to the server.
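As a sketch of the round trip, the following commands set up a central repository with one commit, clone it as a second client, push a change from there, and pull that change into the first client. All paths, the commit identity, and the commit messages are placeholders; the scratch directory again stands in for the server.

```shell
work=$(mktemp -d)
# Placeholder identity so commits also work on an unconfigured machine.
id='-c user.name=Shopper -c user.email=shopper@example.com'

# Setup: a central repository that already holds one commit.
git init --quiet --bare "$work/buy.git"
git --git-dir="$work/buy.git" symbolic-ref HEAD refs/heads/master
git init --quiet "$work/first"
cd "$work/first"
echo "test" > testfile
git add testfile
git $id commit --quiet -m "Initial test file"
git branch -M master
git push --quiet "$work/buy.git" master

# Second client: clone, change something, and push it back.
git clone --quiet "$work/buy.git" "$work/second"
cd "$work/second"
echo "more" >> testfile
git $id commit --quiet -a -m "Change from the second client"
git push --quiet origin master

# First client: pick up the change from the central repository.
cd "$work/first"
git pull --quiet "$work/buy.git" master
```

Because the pull is a plain fast-forward, no merge commit is created on the first client.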

Wrapped in Perl

Listing 1 shows the Perl script, which accepts the commands listed in Table 1 and issues the corresponding Git commands. It uses Sysadm::Install from CPAN to jump quickly back and forth between various directories (cd and cdback) and run various Git commands at the command line.

The order data is stored in a cache implemented by the Cache::FileCache CPAN module. The value 0 used for cache_depth in line 32 sends every entry to a file in the local ~/data/shop directory. Line 24 uses mkd from Sysadm::Install to create the directory if it does not already exist. In contrast to Perl's mkdir() function, mkd does some error checking and logs its activities, assuming you enabled Log4perl.

The cache's set() and get() methods accept the product name (e.g., "iPod") as a key and create or retrieve entries in the format defined by the record_new() function (line 120). Besides the product name, a record also includes two date fields of the DateTime type. The first field, bought, stores the order date and uses the today() method in line 132 to set this to the current date.

Users can specify the expected arrival date of an item with a buy command,

shop buy 'dell netbook' 30

which specifies a delivery period of 30 days for a netbook ordered from Dell. Lines 133ff. convert this day value into a DateTime::Duration type object, which, with a bit of operator magic, can later be added to a DateTime object to calculate the expected delivery date. The latter is then stored in the second DateTime field, aptly named expected.
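The script does this arithmetic with DateTime objects; for a quick sanity check of an expected date at the command line, a sketch like the following could do, assuming GNU date (the -d option is not portable to BSD or Mac OS X). The order date is invented for the example.

```shell
# Assumptions: GNU date; the order date is made up.
bought="2009-03-01"
days=30

# Add the delivery period to the order date and print it as YYYY-MM-DD,
# the same %F format the script uses for its date fields.
expected=$(date -u -d "$bought + $days days" +%F)
echo "bought=$bought expected=$expected"
```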

Both DateTime objects contain a formatter, DateTime::Format::Strptime, which defines the expected date format as "%F", thus expecting the object to be represented as YYYY-MM-DD in a string context.

Cache::FileCache has no trouble storing this deeply nested data structure in a file; it flattens the structure internally before doing so, then, when reading it later, converts it back into Perl objects. After the cache file has made its way into the local repository workspace, the shop script makes the changes permanent by running git add and git commit.
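Done by hand, that final step comes down to something like the following sketch. The scratch repository stands in for the local repository workspace, and the file name and its content are placeholders: The real files are written by Cache::FileCache in its own serialized format.

```shell
# Scratch repository standing in for the local workspace (assumption).
repo=$(mktemp -d)
cd "$repo"
git init --quiet

# Placeholder for a cache file; name and content are hypothetical.
echo "placeholder record" > ipod.record
git add ipod.record
# Placeholder identity so the commit also works on an unconfigured machine.
git -c user.name="Shopper" -c user.email="shopper@example.com" \
    commit --quiet -m "Bought: iPod"

# Prints nothing if every change has been committed locally.
git status --porcelain
```

No network access is needed up to this point; the commit simply waits in the local repository until the next push.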