This week, another in the modules-I-plan-to-use series. For my current CPAN Twitter-bot, I essentially wrote the infrastructure that Timeout::Queue would have provided, had I known about it at the time. So I plan to use it in my ongoing rewrite (which isn’t much further along than the last time I mentioned it).

What it does, in a nutshell, is manage a queue ordered by when each item is due. When you enqueue an element, you specify how far in the future it should “time out”, relative to the current moment. The object can then tell you how long to sleep until the next element’s time-out occurs, at which point you retrieve all the items that are currently “timed out”.

In my current bot, I poll the RDF feed from search.cpan.org every 15 minutes. When there are new items to post to the Twitter stream, I try to space them out over the next 15 minutes so that the bot doesn’t spew too many updates at once. I do this by dividing the 15-minute interval by the number of updates to post, then queuing them up with appropriate gaps between them. I use the same queue to schedule the next poll of the feed as well, to check for changes and updates.
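The spacing math itself is simple enough to sketch. This is an illustration of the idea, not the bot’s actual code; the sub name and structure are my own for the example:

```perl
use strict;
use warnings;

# Given a polling interval (in seconds) and a count of pending updates,
# return the per-item delays so the posts spread evenly across the
# interval rather than clumping at the start.
sub spread_updates {
    my ($interval, $count) = @_;
    return () unless $count;
    my $gap = $interval / $count;
    # Item N fires after (N+1) * gap seconds, so the last one lands
    # right at the time of the next poll.
    return map { ($_ + 1) * $gap } 0 .. $count - 1;
}

my @delays = spread_updates(15 * 60, 5);   # 5 updates over 15 minutes
# @delays is (180, 360, 540, 720, 900): one post every 3 minutes
```

Each delay would then be handed to the queue as that item’s time-out; get the offsets wrong and you see exactly the clumping described below.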

The code isn’t overly complex, but it does lend itself to some subtle errors. In the early stages, I would often see updates come in “clumps” because I had mismanaged the offset calculations. Had I known about this module, I could have saved myself some work. It does everything my code does, plus a few things I didn’t think to write.

If I could change anything about the module, I’d probably have it offer a sleep() method, to avoid having to ask for the current wait time and then do the sleep myself. That seems like it will always be the usage pattern, so it makes sense as a built-in method. Then again, if it’s a good OO citizen and can be easily sub-classed, maybe I’ll just sub-class it and add the method myself! Then I can make the other change: the name. Call me pedantic, but I feel that “Queue” should have been the first element of the namespace, and I’m not really keen on “Timeout”, since the items don’t really “time out” in the sense of waiting for an alarm signal or anything. But these are minor nits.
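The sub-class would only be a few lines. Since I can’t vouch for Timeout::Queue’s internals here, this sketch uses a stand-in parent class with just the one method I care about (a timeout() that reports seconds until the next item is due); the class names are mine, not the module’s:

```perl
use strict;
use warnings;

# Stand-in for the real parent: timeout() reports how many seconds
# remain until the next queued item is due. The real module's
# interface may differ; this only shows the shape of the sub-class.
package My::TimedQueue::Base;
sub new     { my ($class, %args) = @_; bless { next_due => $args{next_due} // 0 }, $class }
sub timeout { my $self = shift; my $wait = $self->{next_due} - time; $wait > 0 ? $wait : 0 }

# The sub-class adds the sleep() convenience I wish existed:
# ask for the current wait and do the sleeping, in one call.
package My::TimedQueue;
our @ISA = ('My::TimedQueue::Base');
sub sleep {
    my $self = shift;
    my $wait = $self->timeout;
    CORE::sleep($wait) if $wait;   # CORE:: since we shadow the builtin's name
    return $wait;
}

package main;
my $q = My::TimedQueue->new(next_due => time + 1);
$q->sleep;   # blocks until the next item is due, then returns
```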

This will be Yet Another piece of code that makes my coding task easier. (Once I get enough tuits to get back to that project.)

(If I keep covering multiple modules in a post, I’m going to have to change the title and tag I use…)

I generally try to use these posts to highlight lesser-known modules, and I imagine that Net::Twitter is considerably higher-profile than most of my previous choices. But are you familiar with Net::Twitter::Lite as well?

It’s not unusual for CPAN to offer more than one solution to a given problem; the wide range of XML parsers is a testament to that. And when a subject is popular, the odds are even greater that people will “roll their own” rather than contribute to an existing effort. Fortunately, the interface to the social messaging service Twitter has largely been spared this fragmentation. Maybe it’s because the source code is hosted on GitHub, making it easier for people to contribute. Whatever the reason, the only real competition to Net::Twitter for basic Twitter API usage is Net::Twitter::Lite. And it’s not actually a competitor in the usual sense.

Rather than representing a competing implementation, Net::Twitter::Lite came about as an (almost completely) interface-compatible alternative to Net::Twitter after it was refactored to use Moose internally. While it doesn’t have 100% of the features that Net::Twitter has, both modules strive for 100% coverage of Twitter’s API. Where N::T::Lite runs without the additional requirement of Moose, N::T gives you finer-grained control over which parts of the API are loaded and made available to connection objects.

I’ve used both modules, and can attest that the interface is kept consistent between them. At $DAY_JOB I wrote a tool to echo data to a Twitter stream; N::T::Lite was the best choice there, as it has the fewest dependencies and our needs didn’t call for the extra functionality of N::T. My Twitter-bot (cpan_linked) was written with N::T in the pre-Moose days, and hasn’t had a single problem since I seamlessly upgraded N::T to the Moose-based version. As I work on the next-generation CPAN-bot, I’ll be using the OAuth support, and possibly the search API. Since it will be a long-running daemon, I’ll stick with the more-featureful N::T for it. But thanks to the diligence of the modules’ authors, I could just as easily swap between them at will.

If you’re planning to interface to Twitter from Perl, these two modules should be your starting point. But be sure to look at the other Twitter-oriented modules, just to be sure. There’s a lot of activity around this API, and Perl developers have kept on top of it.

When Higher Order Perl came out, one of the first concepts from it that I was able to put to immediate use was iterators. Wonderful things, iterators, when suited to the task at hand. I once used an iterator class to hide, from the user level, the fact that a DBI-style statement handle was actually four separate handles on four separate hosts. So any time I see a stream interface converted to an iterator, I at least give it a fair looking-over.

The File::Find::Object module is an excellent example of this. It takes the concept of File::Find from Perl’s core and makes it into an iterative, object-oriented interface. Two features sell me on it over vanilla File::Find:

1. You can instantiate more than one finder at a time, since there is no global-variable usage to cause problems. This allows side-by-side comparison of finds run in different directories, sub-finds that execute based on interim results from the current find, and so on.

2. Once initialized, it acts as an iterator. This has two benefits: first, you can stop whenever you want, without tricks such as die-ing or forcing $File::Find::prune. The second benefit is less apparent until you run a find over a huge set of directories and files: as an iterator, the finder only moves forward as you call it. It doesn’t immediately sprint full steam ahead over the whole search-space.
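The appeal is easier to see in miniature. Here’s a core-Perl sketch of the same idea, a closure-based iterator that walks a directory tree one entry per call; this shows the pattern, not File::Find::Object’s actual API:

```perl
use strict;
use warnings;

# Returns a closure that yields one path per call, undef when done.
# Directories go onto a work-list, so the walk only advances as
# fast as the caller pulls on it.
sub make_finder {
    my @queue = @_;
    return sub {
        while (@queue) {
            my $path = shift @queue;
            if (-d $path) {
                opendir my $dh, $path or next;
                push @queue, map { "$path/$_" }
                             grep { $_ ne '.' && $_ ne '..' } readdir $dh;
                closedir $dh;
            }
            return $path;
        }
        return;   # search-space exhausted
    };
}

# Two independent finders can run side by side, and stopping early
# is just a matter of not calling the closure again:
my $find = make_finder('.');
while (defined(my $path = $find->())) {
    last if $path =~ /\.bak\z/;   # bail out: no die(), no $File::Find::prune
}
```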

Shlomi Fish has taken over most of the maintenance of the module. His main write-up on it is here, with links to CPAN, Kobesearch and Freshmeat. That page also links to File::Find::Object::Rule, a port of File::Find::Rule to FFO. Shlomi has also written about the module more extensively, under the heading, “What you can do with File-Find-Object (that you can’t with File::Find)“. This second posting has some very useful examples of FFO in action, and I highly recommend reading it and then giving FFO a try.

For this week’s Module Monday, I’m going to break form a little bit and actually look at three modules. All of these address the same basic problem, which I wrote about yesterday: parsing HTTP messages.

Right after writing the previous post, I discovered (by means of my CPAN Twitter-bot) two other solutions to this problem, both using linked C/C++ code for speed. So let’s have a look at all of them:

HTTP::Parser is the first one I discovered, and the one I’ve stepped up to help maintain. It has a pretty straightforward interface, but requires that the content be passed to it as strings (though it can handle incremental chunks). Unlike the code in HTTP::Daemon that I hope to eventually replace with this, it does not read directly from a socket or any other file-handle-like source. It uses integer return codes to signal when it has finished parsing a message, at which point you can retrieve a ready-to-use object: either an HTTP::Request or an HTTP::Response, depending on the message.
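The feed-chunks-and-check-the-return-code pattern is easy to sketch in pure Perl. This toy version is mine, purely to illustrate the shape of the interface; HTTP::Parser’s actual codes and methods are documented in its POD:

```perl
use strict;
use warnings;

# A toy incremental parser in the same style: accumulate chunks and
# return a status code from add(). Here 1 means "need more data" and
# 0 means "headers complete". (Illustrative only; the real module's
# codes differ in the details.)
package Toy::HTTP::Parser;
sub new { bless { buf => '' }, shift }
sub add {
    my ($self, $chunk) = @_;
    $self->{buf} .= $chunk;
    # A blank line ends the header block of an HTTP message.
    return $self->{buf} =~ /\r?\n\r?\n/ ? 0 : 1;
}
sub request_line {
    my $self = shift;
    my ($line) = split /\r?\n/, $self->{buf};
    return $line;
}

package main;
my $p = Toy::HTTP::Parser->new;
$p->add("GET /index.html HTTP/1.1\r\n");          # returns 1: incomplete
my $done = $p->add("Host: example.com\r\n\r\n");  # returns 0: complete
```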

HTTP::Parser::XS is the one I discovered via the Twitter-bot, and is also the newest of the pack. Tatsuhiko Miyagawa took it, wrote a pure-Perl fallback, and integrated the pair into Plack (more on the overall Plack progress in this blog post). The interface is a little unusual compared to the more minimal approach of the previous option: it stuffs most of the information into environment variables in accordance with the PSGI specification (though in this case “environment” means a hash passed by reference, rather than actual environment variables). That’s great for projects (like Plack) that are built specifically around PSGI, but may not be as great for lighter-weight parsing needs. Also, being very new, the documentation is sparse. It, too, uses integer return codes to signal progress, and the codes are very similar to those used by HTTP::Parser (though the meaning of -1 seems to differ).

HTTP::HeaderParser::XS is the third of the set, and the one I discovered most recently, via a reference in the POD of the previous module. It is over a year old, but seems to have had just the one release. It is based on a C++ state-machine, and also offers only sparse documentation.

So, as I move forward with making HTTP::Parser a more generally useful piece of code, these are my competition and, I hope, my inspiration. I’d like to reach the speed of XS code eventually, but would prefer to make PSGI support optional, so that the code is useful in more contexts.

This will be a slightly unusual installment of PMM, as I want to look at a module so new that it isn’t actually on CPAN yet, just GitHub: Plack. (When it makes it to CPAN, it should be here.)

Plack is a reference implementation of the burgeoning PSGI initiative. What is PSGI? If you follow that link you’ll get the complete explanation, but the short form is that it is Perl’s answer to Python’s WSGI (Web Server Gateway Interface) and Ruby’s Rack: a specification layer that decouples web applications from the specifics of how they’re being run, whether that’s CGI, FastCGI, Apache with mod_perl, etc.
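The heart of the spec is tiny: an application is just a code reference that takes an environment hash and returns a three-element array of status, headers, and body. A minimal app, as I understand the draft spec:

```perl
use strict;
use warnings;

# A PSGI application: one coderef, no framework. The server (CGI,
# FastCGI, mod_perl, ...) builds the $env hash, so the app never
# needs to know which server is running it.
my $app = sub {
    my $env = shift;
    my $body = "Hello from " . ($env->{PATH_INFO} // '/');
    return [
        200,
        [ 'Content-Type' => 'text/plain' ],
        [ $body ],
    ];
};

# Any PSGI-speaking server can run it; calling it by hand works too:
my $res = $app->({ PATH_INFO => '/test' });
# $res->[0] is 200, $res->[2][0] is "Hello from /test"
```

That callable-by-hand quality is also what makes PSGI apps so easy to test.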

Back to Plack: it is the first reference implementation of the PSGI spec, and already it can pass all of the Catalyst tests. And as of this commit, the plackup script can coerce an app written for Catalyst, CGI, etc. into running under different environments, thanks to the magic of PSGI.

I’ll be watching Plack very closely. I see a PSGI connector for my XML-RPC server in the not-too-distant future.

I was on vacation most of last week, so this week’s installment of PMM is going to be both short and self-serving. For this week, I’m going to “cheat” and talk about one of my own modules: Test::Formats. (I promise to not make a regular habit of using this feature to promote my own projects.)

This is a pretty simple concept: rather than using lengthy, confusing regular expressions to test the validity of generated XML documents, why not use the validation already built into the parser itself? The module isn’t for use on snippets, but those can usually be tested with much simpler, easier-to-read regexps.

The tests you would write with this module are tests of the XML your Perl generates, not necessarily the Perl itself. Alas, time constrains me from any useful examples, so I hope you’ll check out the module itself on CPAN. Next week will be better, I promise!

For this week’s Module Monday, I’m looking at a recent discovery: IPC::Run3.

I came across this one while looking for best-practice tools for executing a sub-process while manipulating all of the file-handles, not just the input or just the resulting output. I’m going to need this for an upcoming project, one that is needed at $DAY_JOB but which I’ve been cleared to develop as a CPAN module rather than an internal one.

What sets this module apart, in my estimation, is the ease with which it allows you to manipulate the input and capture the output. IPC::Open3 does very much the same sort of thing, and has the benefit of already being part of the core. But it uses only open file-handles as its currency, which leaves me doing much of the same open/write-or-read/close logic over and over. This module, in contrast, is very Perl-ish in how it regards each of the parameters for STDIN, STDOUT and STDERR. You can use file-handles, of course, but you can also pass the content for STDIN directly, save the results from the output streams directly, redirect them from/to /dev/null, etc.
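For contrast, here is the core IPC::Open3 version of “feed this string in, capture stdout”; it’s exactly the file-handle churn that IPC::Run3’s single run3(\@cmd, \$in, \$out, \$err) call saves you from. The helper sub is my own wrapper, and I use $^X so the child process is just another perl, keeping the example self-contained:

```perl
use strict;
use warnings;
use IPC::Open3 qw(open3);
use Symbol qw(gensym);

# The open/write/read/close dance that IPC::Run3 hides: wire up all
# three handles, feed stdin, close it so the child sees EOF, slurp
# stdout, then reap the child.
sub run_with_open3 {
    my ($input, @cmd) = @_;
    my $err = gensym;   # stderr handle is not autovivified by open3
    my $pid = open3(my $in, my $out, $err, @cmd);
    print {$in} $input;
    close $in;
    my $stdout = do { local $/; <$out> };
    waitpid $pid, 0;
    return $stdout;
}

my $upper = run_with_open3("hello\n", $^X, '-pe', '$_ = uc');
# $upper is "HELLO\n"
```

Note that even this simplified version only covers one of the stream arrangements; each new combination of redirections means re-plumbing the handles by hand.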

Time and tuits permitting, I should have my new work on CPAN within the next 3-4 weeks. And when I do, IPC::Run3 will figure prominently in how it functions.

(This kicks off what I hope will be a regular, weekly series on my blog: focusing on a Perl module that’s unsung, or at least under-sung, and hopefully in doing so drawing some extra attention to a tool I feel can help other Perl developers.)

For my first “Perl Module Monday” post, I would like to introduce you to Adam Kennedy’s Test::XT. This module has been around for several months, but I only recently took the time to look at it, and see how I could utilize it.

When I first discovered the CPANTS effort, and the enormous amount of work its creators had put into it, I immediately set about improving my scoreboard. In CPAN circles, this was known as “gaming CPANTS”. And for good reason: a high score indicates nothing more than the fact that your modules pass those particular metrics, none of which measure actual code quality; they only measure the quality of your distribution. I argued (which is almost too strong a word, as the discussion never really got that heated) that as more authors took the CPANTS guidelines to heart, the end result would be worthy in and of itself: a different sort of quality that stood on its own. Think of Ruby’s “gems” and the perception of how effortless they are to install; many people have the (mistaken) impression that Perl modules are difficult, an impression that most likely came from one or two isolated incidents (whether personal or related anecdotally). And, at least in my case, attention to CPANTS has led to better overall module development. I no longer release even the initial version of a module unless I’m pretty confident it will meet at least the “required” metrics, if not the optional ones as well.

This dedication, though I pat myself on the back so publicly for it, has its price: a fair amount of duplicated effort. One example is the author-tests, or maintainer-tests if you prefer.

These are the tests that are really meant to be run only by us, the authors, on our own modules. You, the user, have nothing to gain from watching them run, because if any of them fail you don’t really have a stake in it. These are the tests of the cleanness of the POD structure, the integrity of the YAML metadata file, and so on. If META.yml doesn’t pass its test, that’s a lot less meaningful to you than if the test script for the actual functionality has one or more failures.

This is where Adam K. stepped in with Test::XT: it generates these boilerplate author/maintainer tests for you, which handily beats my old practice of copying them from an existing project when creating a new one. The test-files it generates include checks, based on documented environment variables, that prevent the test-suites from running unless you (as the author or maintainer) have specified that you want them run. It looks at two variables, in fact, letting you choose whether they run during author-initiated builds, during designated “integration” (nightly, hourly, etc.) builds, or both. The logic is set up so that the dependent modules (Test::Pod, Perl::Critic, etc.) don’t get loaded even for the “can-we-run-these-tests” check, which helps avoid failing the “list of prereqs does not match actual use” metric on CPANTS. (And yes, I still have some modules that fail that, as I haven’t back-ported this to everything yet!)
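The gating pattern in such generated tests looks roughly like this. The variable names here are the widely used AUTOMATED_TESTING / RELEASE_TESTING conventions, and the sub is my own illustration; check the Test::XT docs for exactly what it emits:

```perl
use strict;
use warnings;

# Decide whether an author-test should run at all, *before* loading
# any of its heavyweight dependencies (Test::Pod, Perl::Critic, ...).
# Accepts an explicit hash for testability; defaults to %ENV.
sub author_tests_enabled {
    my %env = @_ ? @_ : %ENV;
    return ($env{RELEASE_TESTING} || $env{AUTOMATED_TESTING}) ? 1 : 0;
}

# In a generated .t file, the check precedes the require, so a plain
# "make test" on an end-user machine never even loads Test::Pod:
#
#   unless (author_tests_enabled()) {
#       print "1..0 # SKIP author tests not requested\n";
#       exit 0;
#   }
#   require Test::Pod;
#   Test::Pod::all_pod_files_ok();
```

Keeping the `require` behind the check is the key move: it is what keeps the optional test modules out of the distribution’s effective prerequisite list.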

It’s a simple module, not at all complex. I hope to offer some extensions or patches in the future, as it has been greatly helpful to me and I want to help make it even more so. So check it out: even if you aren’t a CPAN author, you may find it useful for the tests you develop in your day-to-day work!