Wednesday, September 22, 2010

I'm too sick to be here, but I came into the office tonight to run the meeting. Mad props to Tommy for driving down from Westlake to present tonight. Aran is also sick, so he's bailing on presenting. His presentation "12 cpan modules in 12 penta minutes" may well be cursed.

Monday, September 20, 2010

I woke up to find a bevy of "Mail Delivery Failure" messages in my inbox. It seems the cpan-test reports I emailed in bounced back because CPAN Testers 2.0 dropped support for incoming email reports in favor of HTTP. I'm excited to hear about this HTTP switch, as I hated not being able to send test reports from machines that lacked email configurations.

This message was created automatically by the mail system (ecelerity).

A message that you sent could not be delivered to one or more of its
recipients. This is a permanent error. The following address(es) failed:

Monday, September 6, 2010

I've pushed a new release of Hadoop::Streaming to CPAN. It should be available in a couple of hours, depending on how long it takes your CPAN mirror to do the mirror update dance.

The release includes expanded documentation in the base Hadoop::Streaming placeholder file. Also included is a Hadoop::Streaming::Combiner role, for creating combiners. Combiners are like reducers that run post-map, per-merge. One can reuse the reducer as a combiner, if the reducer produces the same key/value format on output as it takes on input.
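To illustrate why a reducer with matching input and output formats can double as a combiner, here is a minimal sketch in plain Python. It does not use Hadoop or the Hadoop::Streaming API; the function names (map_words, sum_counts, run_job) and the sample input are made up for illustration, and the "shuffle" is just an in-memory list.

```python
from itertools import groupby
from operator import itemgetter


def map_words(line):
    """Mapper: emit a (word, 1) pair for each word in a line."""
    return [(word, 1) for word in line.split()]


def sum_counts(key, values):
    """Reducer: sum the counts for one key.

    The output pair (key, count) has the same shape as the input
    pairs, which is what lets this function also act as a combiner.
    """
    return (key, sum(values))


def group_and_apply(pairs, func):
    """Sort pairs by key, group them, and call func(key, values) per group."""
    ordered = sorted(pairs, key=itemgetter(0))
    return [
        func(key, [value for _, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]


def run_job(splits, use_combiner):
    """Simulate a job; return (final counts, records sent to the reduce step)."""
    shuffled = []
    for split in splits:
        # Each split stands in for the input of one map task.
        mapped = [pair for line in split for pair in map_words(line)]
        if use_combiner:
            # Hypothetical combiner step: pre-aggregate this task's output.
            mapped = group_and_apply(mapped, sum_counts)
        shuffled.extend(mapped)
    return dict(group_and_apply(shuffled, sum_counts)), len(shuffled)


splits = [["a b a", "b a"], ["a c"]]
plain, plain_records = run_job(splits, use_combiner=False)
combined, combined_records = run_job(splits, use_combiner=True)
print(plain == combined, plain_records, combined_records)
```

Both runs produce the same counts, but the combiner run hands fewer records to the reduce step. This only works because summing is associative and the reducer's output can be fed back in as input; a reducer that emitted, say, a formatted report line could not be reused this way.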

Thursday, September 2, 2010

Gitpan is a clone of all the modules on CPAN in git form, nearly twenty-two thousand public repositories. This is not a place for development of modules. Instead it is a place to easily pull the current source for a module, make a patch, and send it to the maintainer, without having to find where she maintains her golden copy.

I read about gitpan a while ago, but when I wanted to find it last week, I couldn't come up with the correct search terms. Searching for [github cpan] produces a list that doesn't include gitpan on the first page. It is crowded out by the many perl modules developed on github for release to CPAN, by things like Net::GitHub and GitHub::Import, and by an interesting discussion at perlmonks on the (informal) perl naming convention for github projects.

Now that I know the name, it is still hard to find information! From the FAQ section of the readme:

What is gitPAN?
---------------
gitPAN is a project to import the entire history of CPAN (known as BackPAN) into a set of git repositories, one per distribution.

Why is gitPAN?
--------------
CPAN (and thus BackPAN) is a pile of tarballs organized by author. It
is difficult to get the complete history of a distribution, especially
one that has changed authors or is released by multiple authors (for
example, Moose). Because releases are regularly deleted from CPAN
even sites like search.cpan.org provide an incomplete history. Having
the complete history of each distribution in its own repository makes
the full distribution history easy to access.

gitPAN also hopes to make patching CPAN modules easier. Ideally you
simply clone the gitPAN repository and work. New releases can be
pulled and merged from gitPAN.

gitPAN hopes to showcase using a repository as an archive format,
rather than a pile of tarballs. A repository is far more useful than
a pile of tarballs, and contrary to many people's expectations, the
repository is turning out smaller.

Finally, gitPAN is being created in the hope that "if you build it
they will come". Getting data out of CPAN in an automated fashion has
traditionally been difficult.

Where is gitPAN?
----------------
The repositories are on github.com at http://github.com/gitpan
(watch out, it may overload your browser).

Code, discussion, and issues can be had at http://github.com/schwern/gitpan.