Announcing the "CPANTS Heavy 100" index

With the success of ORLite and ORDB::CPANTS, I've finally managed to achieve something I've wanted for years: a cheap and well-encapsulated way to screw around with CPAN graph data.

This has been possible for a long time, but I think I've finally found a solution that does it in a Closed Problem way, with each piece being a separate working, completed and published module.

This means I don't have to look after some random script on a website somewhere, and it lets everyone else take my work and maintain it for me :)

By combining ORDB::CPANTS with Algorithm::Dependency, I can now finally build a dependency-weighting engine for the CPAN dataset that is self-updating and requires basically no maintenance.
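The heart of such an engine is resolving a distribution's full dependency chain: its direct dependencies, their dependencies, and so on, counting shared ones once. Here is a minimal sketch of that idea in Python. It does not use the ORDB::CPANTS or Algorithm::Dependency APIs; it assumes the CPANTS data has already been loaded into a plain dict of distribution name to direct dependencies, and the distribution names are made up for illustration.

```python
from typing import Dict, List, Set

# Hypothetical direct-dependency map (distribution -> distributions it requires).
# Real data would come from CPANTS; these names are invented for the example.
DIRECT_DEPS: Dict[str, List[str]] = {
    "App-Example": ["Web-Framework", "DB-Layer"],
    "Web-Framework": ["HTTP-Tools", "Template-Lite"],
    "DB-Layer": ["HTTP-Tools"],
    "HTTP-Tools": [],
    "Template-Lite": [],
}


def dependency_chain(dist: str, deps: Dict[str, List[str]]) -> Set[str]:
    """Return every distribution `dist` needs, directly or indirectly.

    Walks the graph depth-first with an explicit stack. The `seen` set means
    shared dependencies are counted once and circular dependencies cannot
    cause an infinite loop; `dist` itself is never counted as its own dependency.
    """
    seen: Set[str] = set()
    stack = list(deps.get(dist, []))
    while stack:
        current = stack.pop()
        if current in seen or current == dist:
            continue
        seen.add(current)
        stack.extend(deps.get(current, []))
    return seen


print(sorted(dependency_chain("App-Example", DIRECT_DEPS)))
# ['DB-Layer', 'HTTP-Tools', 'Template-Lite', 'Web-Framework']
```

Because the chain is recomputed from whatever the current data says, rerunning it against a fresh CPANTS snapshot is all the "maintenance" it needs.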

This in turn gives me the opportunity to fix one of the CPAN artifacts that I've disliked for a long time, the Phalanx 100. What I dislike about it the most is that it is just so arbitrary.

It's in the right solution area, but it is ultimately edited by humans, and it isn't updated in real-time (so it doesn't respond to CPAN usage trends).

So my plan is to "upgrade" the Phalanx 100 into a range of "Top 100" indexes that are automatically generated, updated daily, and can be used as the basis for optimising and prioritising QA work.

I hope to release one of these new indexes every few days, with supporting code released to CPAN shortly after. As this list of lists starts to grow, I'd like to create a dedicated website ( which I'll notionally call http://top100.cpan.org/ ) to hold all the indexes.

To kick off the indexes, I'll start with the "CPANTS Heavy 100".

This is an index containing the 100 CPAN distributions with the largest dependency chains. These represent excellent sample cases for testing scenarios relating to typical large scale Perl applications in the wild.
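As a rough illustration of how a "heaviest 100" list could be derived, the sketch below computes each distribution's dependency-chain size and keeps the largest. It is a Python sketch under assumed inputs, not the actual code behind the index: the dict layout, the `heavy_index` name and the sample distributions are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical direct-dependency map; real data would come from CPANTS.
DIRECT_DEPS: Dict[str, List[str]] = {
    "App-Example": ["Web-Framework", "DB-Layer"],
    "Web-Framework": ["HTTP-Tools", "Template-Lite"],
    "DB-Layer": ["HTTP-Tools"],
    "HTTP-Tools": [],
    "Template-Lite": [],
}


def dependency_chain(dist: str, deps: Dict[str, List[str]]) -> Set[str]:
    """All direct and indirect dependencies of `dist` (cycle-safe)."""
    seen: Set[str] = set()
    stack = list(deps.get(dist, []))
    while stack:
        current = stack.pop()
        if current not in seen and current != dist:
            seen.add(current)
            stack.extend(deps.get(current, []))
    return seen


def heavy_index(deps: Dict[str, List[str]], top_n: int = 100) -> List[Tuple[str, int]]:
    """Rank distributions by the size of their full dependency chain.

    Only distributions that appear as keys in `deps` are ranked. Ties are
    broken alphabetically so the list stays stable between daily runs.
    """
    sizes = [(dist, len(dependency_chain(dist, deps))) for dist in deps]
    sizes.sort(key=lambda item: (-item[1], item[0]))
    return sizes[:top_n]


print(heavy_index(DIRECT_DEPS, top_n=3))
# [('App-Example', 4), ('Web-Framework', 2), ('DB-Layer', 1)]
```

The distributions at the top are the ones that pull in the most of CPAN, which is what makes them useful stand-ins for large real-world applications.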


First, I'm glad to see you doing this sort of thing. Automated CPAN analysis is good to have.
I'd like to correct a few points about the Phalanx 100, though.
Consider why the Phalanx 100 was created. The Phalanx project was an attempt to increase test coverage in the most-used modules on CPAN, so that Ponie would have a good test base to work with.
The Phalanx 100 was created by analysis of CPAN download logs for a one-month period from one mirror. We figured that would be a good enough estimate of

At the time that the Phalanx 100 was created, my specific beef was that it didn't appear to factor in dependencies.

So while we got a list of 100 modules, they weren't ACTUALLY the 100 most used, just the top 100 by some other measure.

I do, however, appreciate that they were based on usage data, as opposed to dependency data. And I totally plan to start factoring that into some of the indexes, once I've got the basic naive ones working.

I guess I take issue with your "beef" because it was never intended for your use. We didn't make any assertions as to how the data should be used, so it's not fair for you to say it's not what you want.

Our feeling on dependencies was that dependencies would have to get downloaded, too, and so those downloads would show that traffic. So you get dependencies in that data, but not weighted by the number of other modules that use the dependency. A single-use dependency would get as much weight as, say, HTM

I have some equivalent tools that crawl the packages themselves. I just ran my dependency chain tool for MojoMojo and came up with 239 deps rather than your 266. I'd be very curious to know what accounts for the discrepancy.

It would be good to have a few different measures of module popularity. Personally, I think a list of the "most depended on" modules would be really useful. I think the fact that other module authors choose to use a module is a pretty good vote of confidence in its usefulness and quality. Such rankings would especially help when trying to choose between roughly equivalent modules for a project.
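A "most depended on" list is essentially the dependency graph read backwards: count how many other distributions list each one as a direct requirement. Here is a small Python sketch of that, assuming the same kind of hypothetical name-to-dependencies dict as input (the names and the `most_depended_on` helper are illustrative, not an existing tool). It counts direct dependents only; a transitive version would count every distribution whose full chain includes the module.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical direct-dependency map; the names are invented for the example.
DIRECT_DEPS: Dict[str, List[str]] = {
    "App-Example": ["Web-Framework", "DB-Layer"],
    "Web-Framework": ["HTTP-Tools", "Template-Lite"],
    "DB-Layer": ["HTTP-Tools"],
    "HTTP-Tools": [],
    "Template-Lite": [],
}


def most_depended_on(deps: Dict[str, List[str]], top_n: int = 100) -> List[Tuple[str, int]]:
    """Rank distributions by how many others directly depend on them."""
    counts: Counter = Counter()
    for dist, requires in deps.items():
        # set() so a distribution that lists a dependency twice only votes once;
        # self-references are ignored.
        for dep in set(requires):
            if dep != dist:
                counts[dep] += 1
    # Most dependents first, ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


print(most_depended_on(DIRECT_DEPS, top_n=2))
# [('HTTP-Tools', 2), ('DB-Layer', 1)]
```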