Application Development using Catalyst, Moose, Plack, DBIx::Class and other Modern Perl software!

06/03/2009

RFC: Non Code Data Directories and Standards for Catalyst

Introduction

This post is a proposal and request for comments on adopting the XDG Filesystem Hierarchy (published by freedesktop.org as the XDG Base Directory Specification) as an option for managing all the non-code data that makes up a Catalyst application.

The Problem

Right now, when you create a new Catalyst application, the non-code data goes by default to either {home}/root (for templates and static files) or {home} (for configuration files), where {home} is the root directory of the application. So you get a directory structure like:

Now, this {home} directory is something of a hack, since we use Catalyst::Utils::home() to try to figure it out based on certain expectations. Perl doesn't have this idea of {home} built into it. If your application is 'installed' (via cpan or make install), we guess the location based on where the application module (whichever class inherits from Catalyst) sits on disk. If it's not installed (the common case when you are developing and just running the development server or tests), it walks up the directory structure looking for a Makefile.PL or a Build.PL and then decides that's good enough to call {home}.
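To make the "not installed" case concrete, here is a minimal sketch of that kind of upward walk. It is not the actual Catalyst::Utils::home() code; guess_home() is a hypothetical name, and the only thing it checks for is the two marker files mentioned above.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Spec;

# Hypothetical helper, NOT the real Catalyst::Utils::home():
# walk upward from $start until a directory holding Makefile.PL
# or Build.PL is found, and call that directory {home}.
sub guess_home {
    my ($start) = @_;
    my $dir = abs_path($start);
    return unless defined $dir;

    while (1) {
        for my $marker (qw(Makefile.PL Build.PL)) {
            return $dir if -f File::Spec->catfile($dir, $marker);
        }
        my $parent = abs_path(File::Spec->catdir($dir, File::Spec->updir));
        # At the filesystem root the parent is the directory itself: give up.
        return if !defined $parent || $parent eq $dir;
        $dir = $parent;
    }
}
```

The weakness is visible in the sketch: any stray Makefile.PL higher up the tree would be accepted as {home}, and an installed application has no marker file at all.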

Since this method can be a bit flaky, a lot of people recommend that you use File::ShareDir (see here for a good overview). This module integrates well with Module::Install and leverages the fact that a Perl module can have a share directory associated with it. Using this, you might create a directory structure like:

I also modified the directory hierarchy a bit to reflect the growing consensus that your Catalyst application should ideally live one level further down from your application root. In this case I chose 'MyApp/Web.pm', which seems to be the most popular choice and one that is semantically meaningful. This represents the idea that your MVC layer should be the thinnest possible layer over your true domain and interface logic, which sits in the MyApp directory. I also moved the configuration files to {home}/etc, since that makes sense for people used to finding configuration in /etc.

Although this is an improvement, it still suffers from several issues. First, File::ShareDir can only find the share directory for installed applications. For the common case where you are actively developing or running tests, you still need some code like Catalyst::Utils::home() to guess the directory for you. In this respect it's not much better than what Catalyst::Utils::home() provides out of the box.
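In practice this means wrapping the File::ShareDir call in a fallback. The following is a rough sketch under a few assumptions: share_dir() is a hypothetical helper, 'MyApp-Web' is just the example distribution name used in this post, and the development fallback path is something you would still have to guess yourself.

```perl
use strict;
use warnings;

# Hypothetical helper: ask File::ShareDir for the installed share
# directory and fall back to a caller-supplied development path.
sub share_dir {
    my ($dist_name, $dev_fallback) = @_;

    my $dir = do {
        local $@;
        eval {
            require File::ShareDir;
            # dist_dir() throws when the distribution is not installed.
            File::ShareDir::dist_dir($dist_name);
        };
    };
    return defined $dir ? $dir : $dev_fallback;
}

# Example call; './share' stands in for whatever {home} guessing gave you.
my $share = share_dir('MyApp-Web', './share');
```

The fallback argument is exactly the part File::ShareDir does not solve for you.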

Also, when your share data is installed into the Perl library path, your application server (or the user running Apache mod_perl or FastCGI) needs the correct level of access to that path. This complicates configuration. This setup is also not what most Unix administrators will expect: there are reasonably well-defined norms for where your configuration should go (/etc or ~/.config), as well as for where the logs go and so on.

Although you can override the {home} directory with environment variables, this is not ideal if our goal is to minimize installation hassle and make everything work well out of the box. It complicates installation for users, as well as the configuration of the web servers that will run the code.

It also complicates customization. For example, let's say I am using the MojoMojo wiki and want to run three instances of it. Each instance will have a unique configuration, and I want to slightly modify the theme files for each. Right now, the only way I can do this is by overriding the home environment variable for each running instance. Although this works, it is a 'roll my own' approach that is likely to vary from administrator to administrator, making it more difficult to onboard new admins because each deployment is unique. I strongly feel that we should have clear standards for the most common deployment issues, since this reduces errors, speeds deployment and counters the argument I often hear that Perl is hard to maintain. A standard will also help grow a set of best practices surrounding deployment, which we can document and promote.

Proposal

This is a case where Perl is not leveraging existing norms well, which really goes against the grain for us, considering CPAN with its "reuse, recycle" mantra is one of our primary claims to fame. My recommendation is that we adopt an existing standard and make it available as a plugin or set of roles for Catalyst. The most relevant standard is the XDG Filesystem Hierarchy, which exists specifically as a standard for where installed applications put configuration and data files, covering both local files that users can override and global files that only admins should touch.

Although this standard is aimed at Linux, it's fairly straightforward, and similar methods are employed by Windows Server and Mac OS X Server, so it should be possible to create a pluggable support mechanism that is broadly applicable.

The XDG Filesystem Hierarchy defines some environment variables and defaults for the most common types of non-code data, and offers a system for separating user configuration from global configuration.

I recommend you review the standard, since it's very short, but here's a summary. The standard defines four environment variables useful to us:

XDG_DATA_HOME

This is the location of data-oriented files that a user running the application should be able to customize (or that will be customized during installation or use of the application). By default these go into "~/.local/share".

XDG_CONFIG_HOME

Similar to XDG_DATA_HOME but specifically for configuration files. Defaults to "~/.config".

XDG_DATA_DIRS

Takes a string of paths (delimited by ":") in which to look for system-wide data. These could be things like templates or static assets that shouldn't be changed by users and that would be shared by all instances of the application. The default is "/usr/local/share/:/usr/share/".

XDG_CONFIG_DIRS

Like XDG_DATA_DIRS but for configuration. Defaults to "/etc/xdg".
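As a rough illustration of how little code the four variables need, here is a sketch that resolves them using only the defaults listed above. xdg_paths() and the keys of the hash it returns are hypothetical names for this post, not an existing API; the environment is passed in as a plain hash so the function is easy to test.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: resolve the four XDG variables, falling back to
# the documented defaults when a variable is unset or empty.
sub xdg_paths {
    my (%env) = @_;    # normally %ENV
    my $home = defined $env{HOME} ? $env{HOME} : '';

    my $value_or = sub {
        my ($name, $default) = @_;
        my $value = $env{$name};
        return (defined $value && length $value) ? $value : $default;
    };

    my $data_dirs   = $value_or->('XDG_DATA_DIRS',   '/usr/local/share/:/usr/share/');
    my $config_dirs = $value_or->('XDG_CONFIG_DIRS', '/etc/xdg');

    return {
        data_home   => $value_or->('XDG_DATA_HOME',   File::Spec->catdir($home, '.local', 'share')),
        config_home => $value_or->('XDG_CONFIG_HOME', File::Spec->catdir($home, '.config')),
        # The *_DIRS variables are ":"-delimited search paths.
        data_dirs   => [ grep { length } split(/:/, $data_dirs) ],
        config_dirs => [ grep { length } split(/:/, $config_dirs) ],
    };
}

my $paths = xdg_paths(%ENV);
print "user config lives under $paths->{config_home}\n";
```

The split on ":" is Unix-specific, which is one reason a pluggable, per-platform mechanism would be needed.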

The way I'd see this working is that if the application were being run in development mode, we'd first look for files local to the application file path, and then fall back to looking in the XDG-defined directories. Additionally, we'd probably need some boilerplate install scripts that authors can use to prompt for the desired path information (with rational defaults). So our application distribution would possibly look like:

MyApp-Web
    /t
    /etc
        myapp-web.conf
    /share
        /local
    /lib
        /MyApp
            Web.pm
            /Web
                /Model
                /View
                /Controller
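The lookup order described before this listing (local files first in development mode, then the XDG directories) could be sketched roughly as follows. find_data_file() and all of its argument names are hypothetical, and the XDG paths are assumed to have been resolved already.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the first existing copy of a data file.
# In development mode the distribution's own share/local and share
# directories are tried first; the XDG data directories come after,
# with the per-user directory ahead of the system-wide ones.
sub find_data_file {
    my (%args) = @_;
    my @candidates;

    if ($args{dev_mode} && defined $args{dist_root}) {
        push @candidates,
            File::Spec->catdir($args{dist_root}, 'share', 'local'),
            File::Spec->catdir($args{dist_root}, 'share');
    }

    push @candidates,
        map  { File::Spec->catdir($_, $args{app_name}) }
        grep { defined }
        ($args{data_home}, @{ $args{data_dirs} || [] });

    for my $dir (@candidates) {
        my $file = File::Spec->catfile($dir, $args{relative_path});
        return $file if -e $file;
    }
    return;    # nothing found anywhere
}
```

A call would pass something like app_name => 'myapp-web' and relative_path => 'templates/index.tt', plus the resolved data_home and data_dirs.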

And during installation we'd copy "MyApp-Web/share/local" to "$XDG_DATA_HOME/myapp-web" and "MyApp-Web/share/" to "$XDG_DATA_DIRS/myapp-web" (we'd either just copy to the first one in the path or prompt at install time). Handling configuration would be a bit trickier. My thought here is that we'd copy "MyApp-Web/etc/*" to "$XDG_CONFIG_DIRS/myapp-web", but when running the application we would look in both XDG_CONFIG_DIRS and XDG_CONFIG_HOME, merging the two so that the global configuration can be overridden locally.
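The merge step might look something like the sketch below. It assumes both configuration files have already been parsed into plain Perl hashes; merge_config() is a hypothetical name, and the example keys are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: overlay $override on top of $base. Nested hashes
# are merged key by key; any other value in $override simply replaces
# the one in $base. Neither input is modified.
sub merge_config {
    my ($base, $override) = @_;
    my %merged = %$base;

    for my $key (keys %$override) {
        if (ref($merged{$key}) eq 'HASH' && ref($override->{$key}) eq 'HASH') {
            $merged{$key} = merge_config($merged{$key}, $override->{$key});
        }
        else {
            $merged{$key} = $override->{$key};
        }
    }
    return \%merged;
}

# Made-up example data: system-wide settings (from XDG_CONFIG_DIRS) and
# one instance's local overrides (from XDG_CONFIG_HOME).
my $global = { name => 'MyApp', 'View::TT' => { WRAPPER => 'site.tt', ENCODING => 'utf-8' } };
my $local  = { 'View::TT' => { WRAPPER => 'custom.tt' } };
my $config = merge_config($global, $local);
```

With this shape, each of the three MojoMojo instances from the earlier example would only need a small local file listing what differs from the global defaults.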

Overall I believe this will give us a smoother and more professional installation experience, make it easier to administer Catalyst applications and help start a best practices dialog.

Comments

I'm not sure I get the MyApp::Web thing. MyApp::Model is, IMO, just a dumb convention. I prefer to just have MyApp::User, MyApp::Group, and so on. I figure if I'm going to use this convenient second-level namespace, I might as well use it for my core classes. I find MyApp::Web::Model really confusing. Do you have a non-web model as well? The model is the model, and is not web-specific (at least if you're doing it right). Yes, I'm familiar with the Reaction concepts of Interface vs Domain model, and that makes sense too, but that's not what you're proposing, AFAICT. I can sort of see MyApp::Web::Controller, although it's somewhat hard for me to imagine a non-web controller (do I need a full-blown set of controllers for a CLI?). That leaves MyApp::Web::View, which seems reasonable. I do love the use of share. I already do that myself, and also do the etc thing. I hate the Catalyst convention of root, which is completely arbitrary and bears no relation to any standard I know of.

Yeah, I think the real meat of the model stuff should go into the MyApp layer. Right now the default for Catalyst is to create these M/V/C directories, and although they are probably needed for now (I see ...Web::Model as an adapter or facade over the real models, just to decouple the website side), my hope is that eventually we can ditch this in favor of something a bit simpler and more meaningful. That is for another thought experiment, though; right now I am trying to work through my deployment pain and organization, and wondering if we could do this better. There might be an easier solution to this than the one I have proposed.

I completely agree (with the shared data thing, not the Catalyst stuff since I don't use it) and this is needed for more than just Catalyst applications. I ran into lots of these problems with Smolder. File::ShareDir helped some, but it didn't get me all the way and I still needed to do a lot of work on my own. If you put some work into this, please think about how it can be abstracted outside of Catalyst to be useful to everyone.

Some of the guts for this already exist on CPAN; look up File::BaseDir. So I'm not completely certain which additional bits I'd write would also be good as standalone modules. Maybe the helpers for building install scripts and so forth, and some plugins for Module::Install. Making this work properly on Windows and Mac OS is also important to me, so I'd need a sort of base API with plugins to mimic the XDG locations.