Babysitting A CPAN Update

A few years ago in this column (specifically, November 2002’s “Perl of Wisdom,” available online at http://www.linux-mag.com/2002-11/perl_01.html), I wrote a tool to build a “mini-CPAN mirror” on my laptop, letting me carry just the latest and greatest version of each installable module in the CPAN. I’m happy to say that my mini-CPAN mirroring program got quite a bit of attention, and was even turned into a CPAN module of its own, CPAN::Mini.

I’ve also been a happy user, mirroring the mini-CPAN to my laptop as often as hourly. Each update is only a light touch on the source server, so I don’t feel bad about running it that frequently. Afterward, I typically bounce into a CPAN.pm shell and enter the r command, which lists the installed modules for which the CPAN has a newer version.

But there are two problems with the r listing. The first is that it’s merely a listing: I have to either retype the out-of-date package names as arguments to the install command, or cut and paste very carefully, making sure to put spaces between the names. Ugh! The second problem is that some modules are broken for update: although version 1.67 installed just fine, version 1.68 refuses to work on my box for any number of reasons. So when I try to install it, the install fails and the module is still out of date, and an hour later I waste my time trying the exact same thing again.

Now, although there’s a programmatic interface to everything that the r and install commands do, I found it easier to treat the CPAN shell’s command line as my application programming interface (API). What I needed was a script on top of this API that would issue the r command, capture its output, and build the appropriate install command, carefully omitting the modules that had failed on the previous run.

This kind of interactive-command babysitting is best handled by the CPAN’s Expect module. I had never used this module before, so I had to read the documentation very carefully. This is ironic, because I wrote the original chat2.pl to provide a similar function for Perl version 3, and Expect was inspired by the chat2.pl package (as mentioned in the documentation).

The basic notion of Expect is that you have a filehandle open on a process (or perhaps a socket or stdin), and you give that process some length of time to produce a string that matches one of the regular expressions you provide. (This matching is a bit expensive, because Perl doesn’t have streaming regular expressions yet: as characters arrive in chunks on the handle, they’re appended to a buffer, and the entire buffer is checked against each of the regular expressions in turn.)

Once the buffer matches one of the regular expressions, everything up to the end of the match is removed from the buffer. By default, a match also ends the current expect() call, but each regular expression can have an associated action subroutine. That subroutine can do whatever work is needed and can also ask for the expect() call to keep going.
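To make the buffer-and-rescan behavior concrete, here is a toy model in plain Perl. It is only a sketch, not how Expect.pm is implemented: toy_expect() is a hypothetical function, the CONTINUE constant stands in for Expect’s exp_continue, the list of chunks stands in for output arriving on the handle, and running out of chunks plays the role of a timeout. Like expect() in a scalar context, it returns the 1-based number of the trigger that ended the call.

```perl
use strict;
use warnings;

# Toy model only: NOT Expect.pm's implementation or API.
# CONTINUE stands in for Expect's exp_continue return value.
use constant CONTINUE => 'exp_continue';

# $chunks:   arrayref of strings, in the order they "arrive" on the handle
# @triggers: arrayrefs of [ qr/pattern/, optional action coderef ]
# Returns the 1-based number of the trigger that ended the call, or undef
# when the input runs out (the stand-in for a timeout).
sub toy_expect {
    my ($chunks, @triggers) = @_;
    my $buffer = '';
    for my $chunk (@$chunks) {
        $buffer .= $chunk;                 # no streaming regexes: just append
        my $rescan = 1;
        while ($rescan) {
            $rescan = 0;
            # Check the ENTIRE buffer against each pattern, in order.
            for my $i (0 .. $#triggers) {
                my ($re, $action) = @{ $triggers[$i] };
                next unless $buffer =~ $re;
                substr($buffer, 0, $+[0], '');   # drop text through end of match
                my $result = $action ? $action->() : '';
                return $i + 1
                    unless defined $result && $result eq CONTINUE;
                $rescan = 1;               # action asked to keep going
                last;
            }
        }
    }
    return;
}
```

Each new chunk causes the whole buffer to be scanned from the start, which is the cost mentioned above. And because matched text is discarded, a question that has already been answered won’t be matched a second time.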

Expect can also watch multiple handles, triggering actions such as sending the output from one handle to the input of another. With carefully constructed regular expressions, you can sit “in the middle” between a process and the terminal, intercepting input or output as it passes through. As a convenience, the most common case (running a command interactively until an escape sequence is typed at the terminal) is provided as a single method, interact().

As I was designing this program, I remembered that some CPAN installs are evil, in that they require interaction from the user. So while the install command is running, any keyboard input is passed directly to the CPAN shell, and the program doesn’t take control back until it sees the CPAN shell prompt. As an added feature, if there’s no output for fifteen seconds, the program rings the terminal bell to alert me that the install needs my attention. Now I can truly “fire and forget,” and just wait for either the shell prompt or a series of bells.

So, let’s get right to the program, shown in its entirety in Listing One.

Lines 1 and 2 start nearly every program I write, enabling warnings and compile-time best practices.

Line 4 brings in the Expect module from the CPAN. Although Expect doesn’t require IO::Stty, I highly recommend installing that module as well.

Lines 7 and 8 define two of the configuration constants. The $LOSERS file lists the packages that couldn’t be installed on the previous run of the program and should be skipped on this run. $BELL is the number of seconds without output during the install phase before the program rings the bell. The bell repeats at that interval for as long as the silence lasts, so make sure you don’t set it too low!

Line 9 is the regular expression for the CPAN shell prompt, defined here because I use it repeatedly throughout the program.
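The configuration section might look something like the following sketch. Listing One’s actual values aren’t reproduced here, so the losers filename and the exact prompt pattern are assumptions; the 15-second bell matches the interval mentioned earlier. The optional [N] in the pattern allows for later CPAN.pm releases that number their prompts (cpan[1]>), and the \z anchor insists that the prompt be the last thing in the buffer.

```perl
use strict;
use warnings;

# Hypothetical values; Listing One's actual settings may differ.
my $LOSERS = ($ENV{HOME} || '.') . '/.cpan-losers';  # failures from last run
my $BELL   = 15;                          # idle seconds before ringing the bell
my $PROMPT = qr/cpan(?:\[\d+\])?> \z/;    # "cpan> " or a numbered "cpan[1]> "
```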

Line 11 sets the terminal type to dumb so that the CPAN shell doesn’t get too smart, such as by enabling the readline interface or underlining parts of its output.

Lines 14 to 16 create the CPAN shell job as an Expect object. The command to launch is given as the argument to spawn(). Setting restart_timeout_upon_receive means that timeouts are counted from the last output seen, not from the beginning of the expect cycle.

Line 19 similarly creates an Expect object on Perl’s standard input. This object is needed for the interaction during the install phase.

Lines 21 to 30 get to a CPAN shell prompt, using an expect() call on the $cpan object. The 10 on line 22 means that the script waits at most 10 seconds for any of the patterns to match before giving up with a timeout (which triggers the die in line 30).

Line 23 is an array reference around one of the possible triggers, namely the matching of the CPAN prompt. If that’s a match, all of the characters up to and including that match are removed from the buffer and expect() returns the value 1 in a scalar context, indicating that the first trigger was hit (numbered starting at 1).

Lines 24 to 29 define another trigger. If the previous CPAN shell was terminated abruptly (for example, when I accidentally close the window it was running in, which happens all too frequently), CPAN.pm notices the orphaned lockfile left by that job and asks whether I want to remove it. The regular expression in line 24 matches this question. The second element of the array reference is a coderef to call when the pattern matches; it receives the $cpan object as its first parameter, as if it were a method call.

Inside the subroutine, I first clear out whatever remains in the buffer after the match (line 26); normally, only the match and the text before it are removed. Then I send a y to answer the prompt (line 27). Because the child process’s terminal is in cooked mode, I follow the y with a carriage return, the character my keyboard’s Return key sends, rather than a linefeed.

Finally, the subroutine returns the constant exp_continue (which conveniently evaluates to the string exp_continue). This special return value tells the expect() method to keep waiting rather than return (here, it would otherwise have returned 2, because this is the second trigger). The script then goes back to looking for the CPAN prompt.

Once the CPAN prompt reappears, the script makes sure the index is up to date by sending reload index to the process (line 33) and then waiting for the CPAN prompt (line 34). If 20 seconds pass with no output at all, the script aborts instead.

Line 37 fetches the out-of-date packages by calling the subroutine defined in lines 103 to 108, so let’s look there for a second.

Line 104 sends the now often-referenced r command. Line 105 waits for the banner at the top of the r report. This has the side-effect of flushing all output up to and including the banner, which is important for the next two steps.

Line 106 waits for the CPAN prompt. Line 107 extracts all of the text before the CPAN prompt using the before() method, then splits that into lines, then looks for package names at the beginning of each line. The result is a list of all packages that are out-of-date, which is returned from the subroutine in a list context. (In a scalar context, map returns the count of items, not very useful here.)
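Here is a sketch of that parsing step as a standalone function, assuming its input is whatever before() returned. packages_from_report() is a hypothetical name, and the pattern is an assumption: it keeps the leading word of each line that starts with a letter or underscore and is followed by at least two more fields (installed and latest versions), so summary lines that begin with a count are skipped.

```perl
use strict;
use warnings;

# Hypothetical helper: extract package names from the text that before()
# returned (everything between the r banner and the next prompt).
sub packages_from_report {
    my ($text) = @_;
    # Output from a pty usually ends lines with "\r\n", so split on both.
    # Keep the leading word of lines shaped like "Name  installed  latest ...";
    # summary lines that start with a count are skipped.
    return map { /^([A-Za-z_]\w*(?:::\w+)*)\s+\S+\s+\S+/ ? $1 : () }
           split /\r?\n/, $text;
}
```

Because the function ends in return map, calling it in a scalar context yields the number of packages rather than their names, which is exactly the context pitfall noted above.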

So, back to line 37. The script now has the list of modules that need updating. The next step is to subtract the ones that failed on the previous try: lines 40 to 43 read those names from the losers file, and line 45 turns them into a hash for easy filtering. Line 46 then rips the losers out of the list of out-of-date packages.
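The read-and-filter steps can be sketched like this, assuming the losers file holds one package name per line (the format is an assumption, since the listing isn’t reproduced here). read_losers() and without_losers() are hypothetical helper names; a missing losers file is treated as an empty list, as it would be on the very first run.

```perl
use strict;
use warnings;

# Hypothetical helper: return the package names recorded as failures on the
# previous run, one per line. A missing file means there are no losers.
sub read_losers {
    my ($file) = @_;
    open my $fh, '<', $file or return ();
    chomp(my @names = <$fh>);
    close $fh;
    return grep { length } @names;        # ignore blank lines
}

# Hypothetical helper: drop every loser from the out-of-date list, using a
# hash for constant-time lookups.
sub without_losers {
    my ($losers, @outdated) = @_;         # $losers: arrayref of names to skip
    my %skip = map { $_ => 1 } @$losers;
    return grep { !$skip{$_} } @outdated;
}
```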

Lines 49 to 52 post an alert that some of the outdated modules are going to be skipped, pointing me at a file I can edit if I want to retry them anyway.

If there are things to do, the big if starting in line 55 does them.

First, lines 58 and 59 put the CPAN shell in “follow” mode, so that installing prerequisites won’t prompt for confirmation. (I normally leave my CPAN shell configured in “ask” mode so that it doesn’t go off the deep end without giving me a chance to say no.)

Then, line 62 does the deed, asking the CPAN shell to install all of the out-of-date modules.
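Building these commands is mostly string assembly. The sketch below is an assumption about how they could be formed, not the listing’s code: prerequisites_policy is the CPAN.pm configuration option that chooses between follow, ask, and ignore, and the hypothetical install_commands() helper joins the package names with single spaces (the chore that made cut-and-paste so tedious) and ends each line with the carriage return that a real keyboard would send.

```perl
use strict;
use warnings;

# Hypothetical helper: the lines a script could send to the CPAN shell to
# switch to "follow" mode and install every package in one command.
# Each line ends in "\r", the character the Return key produces.
sub install_commands {
    my @packages = @_;
    return unless @packages;                       # nothing to do
    return (
        "o conf prerequisites_policy follow\r",
        join(' ', 'install', @packages) . "\r",
    );
}
```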

Lines 65 to 83 handle the installation phase. First, line 65 puts the terminal into raw mode, so that characters reach this program one at a time as they are typed. Echo is also disabled to prevent double echoing, because the pseudo-terminal running the CPAN shell echoes the input anyway.

Line 67 sets the timeout to the $BELL interval defined earlier. Lines 69 to 73 define the timeout handler, using the special string timeout in place of a pattern. When a timeout occurs, the handler prints a control-G (the bell character) to the terminal and continues the expect loop.
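Combining restart_timeout_upon_receive with a timeout handler that returns exp_continue produces a repeating alarm. The toy function below (a hypothetical helper, not part of Expect) computes when the bell would ring, assuming that every piece of output restarts the idle timer and that each exp_continue re-arms a full $BELL interval.

```perl
use strict;
use warnings;

# Toy timeline model: given the bell interval, the time the prompt finally
# appears, and the times at which output arrived (all in seconds, sorted),
# return the times at which the idle bell would ring.
sub bell_times {
    my ($bell, $done, @arrivals) = @_;
    my @rings;
    my $last = 0;                         # the timer starts at time 0
    for my $t (@arrivals, $done) {
        # Ring once per full interval of silence before the next output.
        my $next = $last + $bell;
        while ($next < $t) {
            push @rings, $next;
            $next += $bell;               # exp_continue re-arms the timer
        }
        $last = $t;                       # output restarts the timer
    }
    return @rings;
}
```

For example, with a 15-second interval, output at 5 and 10 seconds, and the prompt appearing at 50 seconds, the bell rings at 25 and 40 seconds.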

Line 74 says that if a CPAN prompt appears, everything is done. This will also cause the expect() method to return 2, although we’re not testing that, because there’s really no unexpected way out of this expect loop.

Line 76 brings in “other” Expect objects to watch. The -i parameter can be followed by either a single Expect object (here $stdin) or an arrayref of Expect objects. The patterns that follow it apply to that object (or list of objects) instead of the original one. Multiple -i options can be included, allowing expect() to watch many different Expect objects, each with its own set of patterns.

The $stdin Expect object watching standard input looks for only one pattern: any non-empty string (line 77). When that matches, the match() method returns the matched text (line 79), which is sent to the CPAN process immediately. Again, the special value exp_continue is returned so that the loop doesn’t exit (line 80).

Once the install phase is complete, the script needs to see whether any progress was made. Line 86 runs the r command yet again, and if any packages are still listed, reports them (lines 87 to 90). Lines 92 to 94 then rewrite the losers file with those packages, which empties it if everything is now current.
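A sketch of that update, assuming the same one-name-per-line format (again an assumption, since the listing isn’t shown). write_losers() is a hypothetical helper; opening the file for writing truncates it, which is what makes the “empty it when everything is current” case fall out for free.

```perl
use strict;
use warnings;

# Hypothetical helper: replace the losers file with the packages that are
# still out of date after the install attempt. Opening with '>' truncates
# the file, so an empty list leaves an empty file behind.
sub write_losers {
    my ($file, @still_outdated) = @_;
    open my $fh, '>', $file or die "Cannot write $file: $!";
    print {$fh} "$_\n" for @still_outdated;
    close $fh or die "Cannot close $file: $!";
    return scalar @still_outdated;        # how many losers were recorded
}
```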

Whether anything had to be installed or not, lines 98 and 99 now shut down the CPAN shell process cleanly.

And that’s all there is to it. The program captures the series of steps I was performing manually and reduces them to a single program invocation. Expect can be used for some very cool things, and there are many examples to be found on the Internet.

Also look for TCL-based expect examples, since the syntax is very similar, although you’ll have to understand both TCL and Perl to translate them.