Given my devotion to code tidying in general, this ought to have been written a long time ago, but the perfect became the enemy of the good. I always imagined a tidier that would reformat the HTML content simultaneously, so that for example the <li> would be indented inside the <ul> above.

This turns out to be difficult and fraught with edge cases. What if an HTML tag is generated inside Perl code and can’t be seen by the tidier? What about embedded JavaScript and CSS? Every time I encountered these problems I’d shelve the project.

In the end I decided half a solution is better than none. The current masontidy doesn’t attempt to tidy the HTML or non-Perl content. Perhaps someday I’ll figure out how to do it, but the current tool is still a big improvement.

Tidying also eliminates useless stylistic differences between revisions of files.

Using hooks

The latest tidyall distribution contains hooks for running tidyall whenever you commit/push to svn or git. If a file has not been tidied or is deemed invalid, then
the operation is aborted and you must fix the problem before retrying.

In each case, you should commit a tidyall.ini file at the top of your project specifying
which tidiers/validators to apply to which files.

This hook must be explicitly placed in every copy of the repo, although you can partially automate this process. There is (unlike the other two hooks here) no way to require or enforce that the hook is in place, so it may not be ideal for large groups.
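One way to partially automate the installation is a tiny setup step each developer runs once per clone; a sketch, assuming the hook script is kept in the work tree at hooks/pre-commit.sh (that path is an invention for this example):

```shell
# sketch: link a repo-tracked hook script into this clone's .git/hooks
# (assumes the hook lives at hooks/pre-commit.sh in the work tree)
mkdir -p .git/hooks
ln -sf ../../hooks/pre-commit.sh .git/hooks/pre-commit
```

The symlink target is relative to .git/hooks, so it resolves back to the work-tree copy; updating the tracked script updates every clone that has run the setup.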

Unlike the git pre-commit hook above, this one can be enforced; in fact there is currently no way to skip the check short of disabling the hook itself. I’d like to add a skip flag, but I’m not sure how it would get passed to the hook; advice welcome.

Using a commit alias

A problem with all of the hooks above is that they won’t actually tidy your files for you. They’ll simply tell you what hasn’t been tidied, then send you off to fix things. It’s all a bit tedious.

Unfortunately, modifying your code from svn/git hooks is a no-no; see here, here and here for explanations. (How did we manage before Stack Overflow?)

So what I like to do is create an alias for my commit commands, like so:
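The aliases themselves didn’t survive in this copy of the post; a minimal sketch, with invented names (per the next paragraph, the git one runs tidyall --git and only commits if that succeeds):

```shell
# in your shell rc file -- names are invented; the svn variant is analogous
alias gci='tidyall --git && git commit'
alias sci='tidyall --svn && svn commit'
```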

It will run tidyall --git and only proceed with the commit if that succeeds. The --git and --svn flags mean “process all files that have been added/modified according to git/svn status”. This might be overkill if you’re only committing some of the files, but it’s more efficient than using --all.

As long as I use these aliases, my files should always be tidied by the time the hooks are checking them. But the hooks are still useful as a double-check, and an enforcement layer
that’s harder to skip accidentally.

Counter-arguments

Some will argue that commits should never be blocked by correctness checks. The “commit early and often” philosophy suggests that a commit might be valuable even if the code is currently untidy or invalid; this is especially true in the case of git, where commits are not shared and are designed to be performed often.

Moreover, if you’re in a technical emergency and need to commit code to deploy a fix, it would be unfortunate to be delayed by a nagging validator. (Jeff Thalhammer, perlcritic author, has said that he dislikes running perlcritic on commit for this reason.)

My responses to these arguments are (1) I’ve never personally seen a situation where it was important to commit untidy or invalid code, and (2) the escape hatches built into the first two hooks (“NO TIDYALL” and --no-verify) will hopefully let you proceed during an emergency. But we can agree to disagree.

An alternative to running tidyall on commit is to run it during unit tests via Test::Code::TidyAll. In fact, if you’ve got a smoke tester that runs after every commit, this might end up being about the same.
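Hooking this into a test suite is a one-liner; per the Test::Code::TidyAll synopsis:

```perl
# t/tidyall.t -- fails if any file covered by tidyall.ini is untidy or invalid
use Test::Code::TidyAll;

tidyall_ok();
```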

In a web site project I might work with half a dozen of these tools, each with their own syntax and applicable only to certain files. I want to apply some of them while editing, some when I commit, and some only when I run tests. There must be a better way!

Enter tidyall

tidyall is a unifier for code tidiers and validators. You can run it on a single file or an entire project hierarchy, and configure which tidiers/validators are applied to which files. Features include:

A cache to only process files that have changed

A standard backup mechanism with auto-pruning

A plugin API that makes it trivial to add new tidiers, validators, and pre/post processors

Support for multiple modes (e.g. editor, commit, test, dzil), with different plugins running in each mode

Configuration

To use tidyall in a project, simply put a tidyall.ini file at the top of it. Here’s the tidyall.ini that I’m using for CHI:
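The actual file isn’t reproduced in this copy of the post; a representative sketch of the format (the plugin arguments here are illustrative, not CHI’s real settings):

```ini
[PerlTidy]
select = {bin,lib,t}/**/*.{pl,pm,t}
argv = -noll -it=2

[PerlCritic]
select = lib/**/*.pm
argv = --severity 4

[PodTidy]
select = lib/**/*.pm
```

Each section names a plugin; select says which files it applies to, and argv is passed through to the underlying tool.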

Ways of using tidyall

In your code editor

I like having a single keystroke (ctrl-t) to process the file I’m working on. The distribution contains an Emacs implementation of this command. Its effects are fully undoable and it reports any errors in a separate window.

This is the only editor I know how to program, so others will have to be contributed.

From the command line

Of course tidyall can be run manually, against a specific file:

% tidyall file [file...]

or against all the files in the project (skipping those that haven’t changed):

% tidyall -a

or against all the files you’ve added or modified according to svn:

% tidyall --svn

In svn and git commit hooks

The distribution includes an SVN precommit hook that checks if all files are tidied and valid according to tidyall, and rejects the commit if not. e.g.
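Following the Code::TidyAll::SVN::Precommit synopsis, the hook body is along these lines:

```perl
#!/usr/bin/perl
# repo/hooks/pre-commit -- rejects commits containing untidy or invalid files
use strict;
use warnings;
use Code::TidyAll::SVN::Precommit;

Code::TidyAll::SVN::Precommit->check();
```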

Next steps

Lately I’ve become a big fan of the Nginx + Starman combination for Perl-based web development. mod_perl has served me well for fifteen years, but with Plack/PSGI replacing the mod_perl API, it makes less sense to configure and install Apache just to invoke a PSGI handler. Starman has near-zero configuration, and Nginx provides a perfect complement for HTTP acceleration and serving static files.
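A minimal sketch of that split (port and paths invented): Nginx serves static files from disk and proxies everything else to a locally listening Starman.

```nginx
# nginx server block: static assets from disk, dynamic requests to Starman
server {
    listen 80;
    location /static/ {
        root /var/www/myapp;                # serve static files directly
    }
    location / {
        proxy_pass http://127.0.0.1:5000;   # Starman listening locally
        proxy_set_header Host $host;
    }
}
```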

So what’s Server::Control?

Server::Control is a set of libraries for controlling servers, where a server is any background process that listens on a port and has a pid file. Think apachectl on steroids (and not just for Apache).

In the happy case, controlling a pid-file server is simple: run the command to start it, and run kill `cat /path/to/pidfile` to stop it. Where Server::Control comes in is handling all the little unhappy cases.

For example, accidentally starting a server that’s already running or stopping a server that isn’t:
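The post originally showed an example here; as a sketch of the API (the constructor parameter below is an assumption; check the Server::Control::Apache docs):

```perl
use Server::Control::Apache;

# root_dir is an assumed parameter name for this sketch
my $ctl = Server::Control::Apache->new( root_dir => '/opt/myapp/apache' );

$ctl->start();   # reports that the server is already running, rather than
                 # blindly launching a second copy
$ctl->stop();    # reports that the server isn't running, rather than dying
```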

So I always take the extra time to set up Server::Control, and it usually pays off in reduced frustration in the end. For convenience I have aliases like this set up to start, stop, restart and ping (check the status of) each server on a machine:
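Mine look roughly like this, where myapp_ctl is a hypothetical wrapper script built on Server::Control:

```shell
# hypothetical control-script wrapper; one set of these per server
alias web-start='myapp_ctl web start'
alias web-stop='myapp_ctl web stop'
alias web-restart='myapp_ctl web restart'
alias web-ping='myapp_ctl web ping'
```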

I’ve been searching for the best way to normalize an argument list to a string, such that two argument lists convert to the same string iff they are equivalent. My ideal algorithm would

Compare embedded hashes and lists deeply, rather than by reference

Ignore hash key order

Ignore the difference between 3 and “3”

Generate a relatively readable string

Perform well (XS preferred over Perl)

This is necessary for memoizing a function, or for caching a web page with query arguments.

As a strawman example, Memoize uses this as its default normalizer, which fails #1 and #3:

$argstr = join chr(28), @_;

The best candidate I’ve found to date is

JSON::XS->new->utf8->canonical

as it is fast, readable, and hash-key-order agnostic. CHI uses this to generate keys from arbitrary references.
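For example, with canonical ordering, two equivalent hashes built in different key orders normalize to the same string:

```perl
use JSON::XS;

my $enc = JSON::XS->new->utf8->canonical;

# canonical mode sorts hash keys, so key order doesn't affect the output
my $key1 = $enc->encode( { x => 1, y => [ 2, 3 ] } );
my $key2 = $enc->encode( { y => [ 2, 3 ], x => 1 } );
print $key1 eq $key2 ? "same key\n" : "different keys\n";    # same key
```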

However, JSON::XS treats the number 3 and the string “3” differently, based on how the scalar was used recently. This can generate different strings for essentially equivalent argument lists and reduce the memoization effect. (The vast majority of functions won’t know or care whether they get 3 or “3”.)
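A quick demonstration, following the example in the JSON::XS mapping docs: merely using a numeric scalar in string context changes how it encodes.

```perl
use JSON::XS;

my $enc = JSON::XS->new->canonical;
my $num = 3;
print $enc->encode( [$num] ), "\n";    # [3]   -- only used as a number so far
my $str = "$num";    # stringifying makes Perl remember a string form
print $enc->encode( [$num] ), "\n";    # ["3"] -- same variable, now a string
```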

For fun I looked at a bunch of serializers to see which ones differentiate 3 and “3”:

It seems in general that the more sophisticated modules make this differentiation, perhaps because it is more “correct”, though it is the opposite of what I want in this case. Of the ones that report “equal”, I’m not sure how to get them to ignore hash key order.

I could walk the argument list beforehand and stringify all numbers, but this would require making a deep copy and would violate #5.

If I find a great result that requires more than a few lines of code, I’ll stick it in CPAN, e.g. Params::Normalize.

Memoization is a technique for optimizing a function over repeated calls. When you call the function, the return value is cached (based on the arguments passed) before being returned to you. Next time you call the function with the same arguments, you’ll get the value back immediately.

Memoize is the standard Perl memoization solution and after twelve+ years still works well in the common case. However, since Perl caching support has come a long way, and memoization is just a specific form of caching, I wanted to try pairing memoization with modern cache features. Hence, CHI::Memoize.

A better key normalizer. Memoize just joins the arguments into a string, which doesn’t work for references or undef and can generate multiple keys for the same hash. In contrast, CHI::Memoize relies on CHI’s automatic serialization of non-scalar keys. So these will be memoized together:
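The example that followed is missing from this copy; per the CHI::Memoize docs, usage is along these lines (the function name is invented):

```perl
use CHI::Memoize qw(:all);

memoize('My::Package::slow_func');

# structurally identical arguments serialize to the same cache key,
# so the second call here is served from the cache
My::Package::slow_func( 1, [ 2, 3 ], { foo => 5 } );
My::Package::slow_func( 1, [ 2, 3 ], { foo => 5 } );
```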

Poet was designed and developed over the past six years at Hearst Digital Media. Today it is used to generate Hearst’s magazine websites (including Cosmopolitan, Esquire, and Good Housekeeping) as well as associated content management, subscription management and ad rotation systems. I’m very grateful to Hearst for agreeing to this open source release (though they bear no responsibility for its support or maintenance).

Why another Perl web framework?

To answer this requires a bit of history.

HTML::Mason was one of the early Perl “web frameworks”. Like its JSP/ASP/PHP contemporaries, its main trick was embedding code in HTML, but it contained enough web-specific goodies to serve as a one-stop solution. It relied heavily on mod_perl and had mailing lists filled with web-related discussions having nothing to do with templating.

Over time, a new breed of web framework emerged – Catalyst and Jifty and Mojolicious and Dancer in the Perl world, Rails and Sinatra and Django elsewhere. In these frameworks the templates moved from center stage to become just one piece of a large system.

HTML::Mason faced an identity dilemma; should it be a pure templating framework, or try to expand and better serve its traditional web development audience? In the end, and with coaxing from co-author Dave Rolsky, Mason 2 shifted decisively towards the former. It shed most of its web-specific code, thanks in large part to Plack/PSGI, and became more of a generic templating system (albeit still destined to spend much of its time generating HTML).

But for me, and for some others, Mason remains a great way to handle the whole web request – to dispatch URLs to components and process HTTP arguments and implement common behaviors for sets of pages. I prefer my page logic right next to my page view, rather than flipping between a controller and view that are often annoyingly coupled.

Moreover, fifteen-plus years had left me with a pile of useful ideas, techniques, and conventions for web development. Mason wasn’t the appropriate place for them any more (if it ever was), but I needed to collect them somewhere.

This is where Poet comes in. Poet doesn’t need a controller layer; it turns web requests into Mason requests, and happily lets Mason handle the rest of the work. Poet doesn’t have Mason’s identity crisis; it is proudly web-centric, the place to put all the web-related goodness that Mason developers want nearby.

There’s much more to come than I could put in this initial release, and I’m looking forward to pressing on with it! I hope it makes at least a few of your lives easier, and as always I welcome feedback.

At work we have over 200 modules and Mason components that use CHI to cache some data or HTML. Each has its own distinct namespace to prevent collisions.

Each namespace uses one of several different storage types (memcached, local file, NFS file), depending on its usage characteristics. Each storage type has a set of default parameters (e.g. root_dir for file) that rarely change. Finally, there are some defaults we want to use across all of our caches.

To maintain a coherent cache strategy — and our sanity — we need a single place to
see and adjust all this configuration.
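The configuration itself didn’t survive in this copy of the post. A sketch of the shape being described, assuming My::CHI is our application’s CHI subclass (namespaces, servers, and paths are invented; see the CHI docs for the exact keys):

```perl
# defaults apply everywhere; storage defines named storage types with their
# own defaults; namespace assigns each namespace a storage type and expiration
My::CHI->config({
    defaults => {
        expires_variance => 0.2,
    },
    storage => {
        local_file => { driver => 'File', root_dir => '/data/cache' },
        memcached  => {
            driver  => 'Memcached::libmemcached',
            servers => [ '10.0.0.10:11211', '10.0.0.11:11211' ],
        },
    },
    namespace => {
        'Foo::Homepage' => { storage => 'memcached',  expires_in => '5 min' },
        'Foo::Reports'  => { storage => 'local_file', expires_in => '1 hour' },
    },
});
```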

In the first paragraph we define overall defaults. In the second we define a set of
storage types, each with their own defaults. In the third we assign each namespace to a
storage type and an expiration time. Each level can override the defaults of previous
levels, and arguments passed in CHI->new override anything in configuration.

Support for this kind of configuration is available as of CHI 0.52. You should first
create a CHI subclass for your application, so as not to interfere with other CHI users in
the same process:
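Per the CHI docs, the subclass is trivial:

```perl
package My::CHI;
use base qw(CHI);

1;
```

Application code then calls My::CHI->new( namespace => ... ) everywhere instead of CHI->new, and the configuration attaches to the subclass alone.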

chromatic shows users how to run tests faster on cpanm and perlbrew installs. This strikes me as well-meaning advice that misses a much more basic point:

cpanm and perlbrew should not run tests by default.

This may sound heretical. Perl has always had a strong testing culture, and end-user testing may have once played a valuable role in testing a distribution under many systems. But we now have a CPAN Testers network which will run tests on countless systems and Perl versions, and report failures back to the author promptly and automatically. Distributions can be sent through the Testers’ gauntlet before ever being officially released. In this environment, it’s hard to see much additional value in ad hoc end-user testing.

None of this would matter if end-user testing was free. But it is not.

The costs of end-user testing

Slower installs. On my system, a fresh install of Moose and its dependencies takes three times longer with tests (2 minutes versus 41 seconds). A fresh install of Catalyst and its dependencies takes nearly four times longer with tests (9.5 minutes versus 2.5 minutes).

How many new Perl users find CPAN installs much slower than they need to be? How many would choose a 3-4 times speedup if they knew it was an option? It’s like having a turbo button and leaving it unpressed by default.

False positives. The more tests CPAN authors write, the more likely an occasional false-positive failure sneaks through (as in “Failed 1/1746 tests.”) In most such cases the module will still work for the user’s purposes. But the default behavior is to prevent the module from being installed at all. If your module depends on other modules, then any failure up the dependency chain likewise prevents your module from being installed, even if the failure has no bearing on your module’s efficacy.

How many new Perl users have unnecessarily failed to install a module like Moose or Catalyst because of an obscure, temporary failure deep in the dependency chain?

Fear of dependencies and code reuse. Slower installs and false positives are the main reasons why people complain about distributions having “too many dependencies”. (If the dependencies installed quickly and reliably, as they do with --notest and apt-get and yum, would anyone complain or even notice?) These complaints in turn encourage module authors to reduce or eliminate their dependencies, thus reinventing where they could be reusing.

It’s the wrong default for new users

Some veteran Perl folks may like tests to run on every install. That’s fine. But I suspect new Perl users just want things to install quickly and reliably, and in any event don’t have the experience to evaluate or take action on a test failure (especially one in an obscure dependency). For these users running tests is simply the wrong default.

I turn on --notest on each system I administer and preach it enthusiastically to every new Perl user I encounter. But I wish I didn’t have to mention it at all.
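For reference, both forms (PERL_CPANM_OPT is cpanm’s documented environment variable for default options):

```shell
# one-off:
#   cpanm --notest Moose
# or make it the default for every cpanm run:
export PERL_CPANM_OPT="--notest"
```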

Despite its lofty martial-arts name, Tie::CHI is a simple module that allows you to tie a hash to a persistent CHI cache, using any of CHI’s backends.

I hardly ever choose Tie interfaces — too much magic — but occasionally they do produce pretty code. In this case, we have a watchdog script that sends USR2 signals to httpd processes that grow too large (so that they’ll log their call stack). Sometimes these processes stick around for a while, so I only want to send a certain number of signals per process.

Then it occurred to me that the watchdog restarts frequently, so I need to keep the kill counts persistent; and I should only limit on an hourly basis, because the same pid will eventually come around again. We already have a custom CHI subclass that we use for caching all over our application, so it was easy to plug it in:
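The snippet that followed is missing here; a sketch using Tie::CHI’s documented hash-of-options form (the names, limit, and generic File driver are inventions for this example, standing in for our custom subclass):

```perl
use Tie::CHI;

my $MAX_SIGNALS_PER_HOUR = 3;
my @too_big_pids = ();    # filled in by the watchdog's process scan

# each count expires an hour after it's first written, so the same pid
# becomes eligible for signals again later
tie my %kill_count, 'Tie::CHI',
    {
    driver     => 'File',
    root_dir   => '/var/cache/watchdog',
    expires_in => '1 hour',
    };

for my $pid (@too_big_pids) {
    next if ++$kill_count{$pid} > $MAX_SIGNALS_PER_HOUR;
    kill 'USR2', $pid;    # ask the httpd to log its call stack
}
```

Because the counts live in a persistent CHI backend rather than in memory, watchdog restarts don’t reset them.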