CPAN Testers Report Status

After asking several times, Andreas thought he finally understood what the dates mean on the Status page for the CPAN Testers Reports. He started watching and making page requests to see whether his requests were actioned. On Day 3 he pointed out that the date went backwards! Once he'd shown me, I understood why the first date is confusing. And for anyone else who has been confused by it, you can blame Amazon. SimpleDB sucks. It's why the Metabase is moving to another NoSQL DB.

The date references the update date of the report as it entered the Metabase. The last processed report is the last report that was extracted from the Metabase and entered into the cpanstats DB. Unfortunately, SimpleDB has a broken concept of searching. It will return results from before the date requested, and regularly returns results out of order even when the request asks for them sorted. As such the dates you see on the Status page may go backwards in time! I'm not going to try and fix this, as it will all work as intended with the new system.
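As a rough illustration of the two problems described above (results from before the requested date, and results arriving out of order), here is a minimal Python sketch of a client-side clean-up. It is not the Generator code; the record layout (a `guid` and an `updated` timestamp) and the function name are assumptions made purely for the example.

```python
from datetime import datetime


def clean_results(records, since):
    """Drop records older than `since`, remove duplicate GUIDs,
    and return what is left sorted by update date.

    `records` is a list of dicts with hypothetical keys
    'guid' (str) and 'updated' (datetime).
    """
    seen = set()
    kept = []
    for rec in records:
        # Skip anything the search should never have returned,
        # and anything already seen in this batch.
        if rec["updated"] < since or rec["guid"] in seen:
            continue
        seen.add(rec["guid"])
        kept.append(rec)
    # Restore the ordering that the search request asked for.
    return sorted(kept, key=lambda r: r["updated"])
```

With the results cleaned this way, the "last processed" date taken from the final record can never be earlier than the date that was requested.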

Missing Reports

There have been several questions relating to missing reports over the past few years. Sometimes it just needs me to refresh the indices, but in other cases it may be due to the fact that SimpleDB omits reports from a request. Did I mention SimpleDB sucks? In a request to the Metabase, I will ask for all the reports from a given date. The results are limited to 2500, due to Amazon's own restriction. In the returned list it will often omit entries, due to its ignorance of sorting in the search request. I have gone through the Metabase code on several occasions and can verify it does the right thing. SimpleDB just chooses to ignore the complete search request and returns what it *thinks* you want to know.

Ribasushi questioned me about one of his modules that had been released recently, which still had no Cygwin reports listed, even though he sent a few himself. Further investigation revealed that they are indeed missing from the cpanstats DB. Although they did enter the Metabase, they never came out again.

To resolve this I have been revisiting the Generator code to rework the reparse and regenerate code to enable search requests for missing periods, in the hope that this will retrieve most of the missing results. If it doesn't, then I will be asking David to produce a definitive list for me, and I will make specific requests for any missing reports. The Generator code has been updated in GitHub to include all the performance improvements that have been live for some time too.
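One way to picture "search requests for missing periods" is to split a suspect period into small windows and request each one separately, so that no single request comes near the 2500 result limit. The Python sketch below shows only that windowing step; the one hour default and the function name are assumptions, not what the Generator actually does.

```python
from datetime import datetime, timedelta


def split_period(start, end, step=timedelta(hours=1)):
    """Yield (window_start, window_end) pairs covering start..end.

    Each window is at most `step` long; the final window is
    shortened so that it finishes exactly at `end`.
    """
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        upper = min(cursor + step, end)
        yield cursor, upper
        cursor = upper
```

Each window can then be requested on its own, and any window that still comes back suspiciously empty can be split again with a smaller step.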

Erroneously Parsed Reports

Every so often the parsing mechanism fails and stores the wrong data within the cpanstats DB. These days it seems to only affect the platform, OS version and OS name. I'm not quite sure what is happening, as reparsing the report locally again produces the correct results. This uses the same routine to parse the report, so why it occasionally fails remains a mystery. However, to combat this, I now have a script that can run periodically, search for this erroneous data and attempt to reparse the results. It can then alert me when it can't fix it and I can investigate manually. There have been occasions where the report can't be parsed due to the output being corrupted on the test machine, which unfortunately we can't always resolve. Sometimes there are enough clues within other parts of the report that point to a particular OS, but sometimes we just have to leave it blank.
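The shape of such a repair script might look something like the Python sketch below. It is only an outline under stated assumptions: bad records are recognised here by an empty platform, OS name or OS version field, the field names are hypothetical, and `reparse` stands in for whatever routine parses the stored report again.

```python
def repair_records(records, reparse):
    """Try to reparse records with missing platform details.

    `records` is a list of dicts with hypothetical keys 'id',
    'platform', 'osname' and 'osvers'. `reparse` is a callable
    taking a record id and returning a dict with those three
    fields, or None if the report could not be parsed.

    Returns (fixed, unresolved): two lists of record ids.
    """
    fields = ("platform", "osname", "osvers")
    fixed, unresolved = [], []
    for rec in records:
        if all(rec.get(f) for f in fields):
            continue  # nothing wrong with this record
        parsed = reparse(rec["id"]) or {}
        if all(parsed.get(f) for f in fields):
            rec.update({f: parsed[f] for f in fields})
            fixed.append(rec["id"])
        else:
            # Left untouched; these are the ones to alert about.
            unresolved.append(rec["id"])
    return fixed, unresolved
```

Anything that ends up in the `unresolved` list is what would trigger an alert for manual investigation.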

It seems in putting some of this code live before leaving the hackathon, I accidentally reintroduced a bug. Slaven was quick to spot it and tell me about it, but unfortunately it was too late for me to fix it, as I needed to leave and catch my flight home. It should be fixed by the time you read this though, so all should be back to your regular viewing pleasure :) With the new script I've written, it should hopefully find and fix these errors in the future, as well as alerting me to fix the bug again!

Thanks Again

So that was the 2012 QA Hackathon. The show ended with a group photo, although a few were missing due to their early departures home, but I think we got most of us in. Including Miyagawa, who was taking the picture. The traditional thank yous and goodbyes ensued and then Andreas and I headed off to begin our adventure getting to the airport! The next hackathon, the 2013 QA Hackathon, will be in London. We'll have the domain pointed to the right place just as soon as Andy gets the website up and running. I look forward to a lot more involvement for next year, as we have been steadily growing in numbers each year. There has already been some significant output, but the event is much more than that. It's a chance to talk to people face to face, discuss ideas and plan for the future. Expect more news for CPAN Testers soon.

I'm currently at the 2012 QA Hackathon working on CPAN Testers servers, sites, databases and code. It has already been very productive, and already I have two new module releases.

CPAN::Testers::WWW::Reports::Query::AJAX

This module was originally written in response to a question by Leo Lapworth about how the summary information is produced. As a consequence he wrote CPAN::Testers::WWW::Reports::Query::JSON, which takes the data from the stored JSON file. In most cases this data is sufficient, but the module requires parsing the JSON file, which may be slow for distributions with a large number of reports. On the CPAN Testers Reports site, in the side panel on the distribution page, you will see the temperature graphs measuring the percentage of PASS, FAIL, NA and UNKNOWN reports a particular release has. This is gleaned from an AJAX call to the server.

But what if you don't want an HTML/Javascript styled response? What if you wanted the results in plain text or XML? Enter CPAN::Testers::WWW::Reports::Query::AJAX. Now you can use this to query the live data for a particular distribution, and optionally a specific version, and get all the result values back as a simple hash to do with as you please.

I anticipate this might be most useful to project websites that wish to display their latest results from CPAN Testers in some way. They can now get the data, and present it however they wish.
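As an example of what a project website could do once it has the counts, the Python sketch below turns a hash of result totals into the kind of percentages the temperature graphs show. It does not call the module; the lower-case key names are an assumption about how such a hash might look.

```python
def grade_percentages(counts):
    """Convert result counts into percentages, to one decimal place.

    `counts` is a dict with hypothetical keys 'pass', 'fail',
    'na' and 'unknown'; missing keys are treated as zero.
    """
    grades = ("pass", "fail", "na", "unknown")
    total = sum(counts.get(g, 0) for g in grades)
    if total == 0:
        # No reports yet, so avoid dividing by zero.
        return {g: 0.0 for g in grades}
    return {g: round(100.0 * counts.get(g, 0) / total, 1) for g in grades}
```

A site could feed these values straight into its own templates, whether that is plain text, XML or something more graphical.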

CPAN::Testers::WWW::Reports::Query::Reports

Now we get to perhaps the bigger module, even though it's smaller than the one above. This module is perhaps most useful to all those who are trying to maintain a version of the cpanstats metadata from the SQLite database. As mentioned previously the SQLite database has been giving us grief over the past year, and we haven't gotten to the bottom of it. Andreas suspects there is some unusual textual data in some reports that is causing SQLite problems when it tries to store it. I'm not quite convinced by this, but as I'm only inserting records, I'm at a loss as to what else could be the cause.

The SQLite file now clocks in at over 1GB compressed and over 8GB uncompressed, and is starting to take a notable amount of disk space (though considerably smaller than the 250GB+ Metabase database ;) ). It is also a significant bandwidth consumer each day, which can slow processing and page displays, as disk access is our limiting factor now.

Enter CPAN::Testers::WWW::Reports::Query::Reports. This module uses the same principles as the AJAX module above, but now accesses a new API on the CPAN Testers Reports site to enable consumers to get either a specific record or a whole range of report metadata records. Currently the maximum number of records that can be returned in a single request is 2500, but this may be increased once the system has been proven to work well. Typically we have around 30,000 reports submitted each day, so to allow consumers to make best use of this API, I will look to increasing the limit to maybe 50,000 or 100,000. I want to impose a limit as I don't want accidental requests being sent to consume the full database in one go, as again this would put a strain on disk access.
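To put the current limit in perspective: at around 30,000 reports a day and 2500 records per request, a consumer needs about 12 requests to cover a single day. The Python sketch below shows how a range of record ids could be split into batches that respect the limit. The function name and the idea of requesting by inclusive id range are assumptions for illustration; it does not call the real API.

```python
def id_batches(first_id, last_id, limit=2500):
    """Yield inclusive (start, end) id ranges of at most `limit` ids."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    start = first_id
    while start <= last_id:
        end = min(start + limit - 1, last_id)
        yield start, end
        start = end + 1
```

If the limit is raised later, only the `limit` argument needs to change.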

The aim of the module is to allow those that currently consume the SQLite database, to more regularly request smaller updates and store the results in any database they so choose. Even into a NoSQL style database. It will ultimately reduce the bandwidth, data stored and processing to gzip and bzip2, which then means we can reallocate effort to more useful tasks.
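A consumer working this way might keep a local table and top it up with each small batch, remembering the highest id stored so that the next request can start from there. The Python sketch below uses the standard `sqlite3` module only as a convenient stand-in for "any database"; the table and column names are hypothetical and are not the real cpanstats schema.

```python
import sqlite3


def store_updates(conn, records):
    """Store a batch of report metadata and return the highest id held.

    `records` is a list of dicts with hypothetical keys
    'id', 'guid', 'state', 'dist' and 'version'.
    Returns None if the table is still empty.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cpanstats ("
        "id INTEGER PRIMARY KEY, guid TEXT, state TEXT, "
        "dist TEXT, version TEXT)"
    )
    # INSERT OR REPLACE makes a repeated batch harmless.
    conn.executemany(
        "INSERT OR REPLACE INTO cpanstats (id, guid, state, dist, version) "
        "VALUES (:id, :guid, :state, :dist, :version)",
        records,
    )
    conn.commit()
    return conn.execute("SELECT MAX(id) FROM cpanstats").fetchone()[0]
```

The returned id plus one is then the natural starting point for the next request.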

If you currently consume the SQLite database, please take a look at this module and see how you can use it. I plan to include some example scripts that could be drop-in replacements for your current processes, but if you get there first, please feel free to submit them to me too, and I will include them with full credit. If you spot any issues or improvements, please also let me know.

CPAN Testers Platform Metabase Facts

This morning we had a CPAN Testers presentation and discussion hosted by David Golden. As there is plenty of interest from a variety of parties about CPAN Testers, it was a good opportunity to highlight an area that needs work, but which David and myself, as well as other key developers in the CPAN Testers community, just don't have time to do. Breno de Oliveira (garu on IRC) has very kindly stepped forward to look at one particular task, which we have been wanting to write since the QA Hackathon in Birmingham, back in 2009!

Breno has written a CPAN Testers client for cpanminus. At the moment it's a stand-alone application, but it may well be included within cpanminus in the future. As part of writing the application, Breno asked David and myself about how the clients for CPAN::Reporter and CPANPLUS::YACSmoke create the report. Due to the legacy system we came from (email and NNTP) we still use an email style presentation of the reports. However, it has always been our intention to produce structured data. A CPAN Testers Report currently has only two facts that are required, a Legacy Report and a Test Summary. However there are other facts that we have already scoped, except they are just not implemented.

Back last year the Birmingham Perl Mongers produced the CPAN::Testers::Fact::PlatformInfo fact, that consumes the data from Devel::Platform::Info (which we'd written the previous year). The problem with the way test reports are currently created, is that we don't always know the definite platform information for the platform the test suite was run on. Reports, particularly in the Perl Config section, can lie. Not big lies necessarily, but enough that it can disguise why a particular OS may have problems with a particular distribution.

Breno is now looking to produce a module that firstly abstracts all the metadata creation parts from CPAN::Reporter, CPANPLUS::YACSmoke, Test::Reporter as well as his own new application, and puts them into a single library that can then create all the appropriate facts before submitting the report to the Metabase. Hopefully he can get this done during the Hackathon, but even if he doesn't, we're hopeful that he will get enough done to make it easy to complete soon after. Once we then patch the respective clients to use the new library, we will then start to be able to do interesting things with how we present reports.

The CPAN Testers Reports site only displays the legacy style report, which for most is sufficient, but it really would be nice to have some specially styled presentations for particular sections, or even allow user preferences to show/hide sections automatically when a user reads a report.

CPAN Testers Admin site

This is a site that I have been working on, on and off, for about 4 years, before we even had a Metabase. As a consequence it has been promised at various points and I've always failed to deliver. Now that I have released the modules above, and there have been several comments already about having such functionality, I think I need to put some focus on it again. I have shown Breno the site running on my laptop and he has given me some more ideas to make it even more useful. It'll still be a while before it's released, but this will likely be down to running with some beta testers first before a major launch, just so it doesn't break the eco-system too badly!

Essentially the site was written to help authors and testers to highlight dubious reports and have them deleted from the system. Although the reports won't actually be deleted, they will be marked to ignore, so that they can be removed from JSON files and summary requests, as well as on the CPAN Testers Report site. This will hopefully enable us to get more accurate data, and bogus reports about running out of memory or disk space can be disregarded.
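The "marked to ignore" idea is a soft delete: the report stays in the system, but anything that summarises reports skips it. A minimal Python sketch of that filtering follows; the record keys and the set of ignored ids are assumptions for the example, not the Admin site's actual data model.

```python
def summarise(reports, ignored_ids):
    """Count reports per state, skipping any that are marked to ignore.

    `reports` is a list of dicts with hypothetical keys 'id' and
    'state'; `ignored_ids` is a set of flagged report ids.
    """
    summary = {}
    for report in reports:
        if report["id"] in ignored_ids:
            continue  # flagged as dubious, but still stored
        state = report["state"]
        summary[state] = summary.get(state, 0) + 1
    return summary
```

Because nothing is removed, clearing a flag later simply brings the report back into the counts.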

However, following Breno's suggestions, I will look to making the site more public, so that authors can more easily see the reporting patterns without having to log in. The log in aspect will still be needed to flag reports, but the alternate browsing of reports by testers will be much more accessible.

Thanks

I would like to thank a few people who have helped to get me here, and have enabled these QA projects, not just CPAN Testers, to advance further.

Firstly I would like to single out ShadowCat Systems, who have very kindly paid for my flight here. Thanks to BooK and Laurent for organising the event, and to all the sponsors and Perl community who have provided the funding for the venue, accommodation and food for the event. It has already been very much appreciated, and hopefully the significant submissions to GitHub and PAUSE are evidence of just how worthwhile this event is.

Thanks also to all those who are here, and are helping out in all shapes and forms to help Perl QA be even better than it already is.

Some time ago I wrote Test-YAML-Meta. At the time the name was given as a complement to Test-YAML-Valid, which validates YAML files in terms of the formatting, rather than the data. Test-YAML-Meta took that a step further and validated the content data for META.yml files included with CPAN distributions against the evolving CPAN META Specification.

With the release of Parse-CPAN-Meta I wrote Test-CPAN-Meta, which dropped the sometimes complex dependency of the more verbose YAML parsers, for the one that was specifically aimed at CPAN META.yml files. With the emergence of JSON, there was a move to encourage authors to release META.json files too. Although considered a subset of the full YAML specification, JSON has a much better defined structure that has more complete parser support. Coinciding with this move was the desire by David Golden to properly define a specification for the CPAN Meta files. It was agreed that v2.0 of the CPAN Meta Specification should use JSON as the default implementation. As a consequence I then released Test-JSON-Meta.

Although the initial naming structure seemed the right thing at the time, it is becoming clearer that really the names need to be revised. As such I am looking to change two of the distributions to better fit the implementations. So in the coming weeks expect to see some updates. The name changes I'm planning are:

Underneath these current namespaces is the Version module that describes the data structures of the various specifications. In the short term these will also move, but will be replaced by a dependency on the main CPAN-Meta distribution in the future. There will be final releases for Test-YAML-Meta and Test-JSON-Meta, which will act as wrapper distributions to re-point the respective distributions to their new identities.

Unless otherwise expressly stated, all original material of whatever nature created by Barbie and included in the
Memories Of A Roadie website and any related pages, including the website's archives, is licensed under a
Creative Commons by Attribution Non-Commercial License.
If you wish to use material for commercial purposes, please contact me
for further assistance regarding commercial licensing.