Steve Rowe
added a comment - 19/Dec/11 05:30
Patch against smokeTestRelease.py.
I have not yet run the full script against a release with this patch; I commented out the other checks and ran just the Maven checks.
The additional Maven artifact checks reveal two classes of problems with the final 3.5.0 release candidate ( http://people.apache.org/~simonw/staging_area/lucene-solr-3.5.0-RC2-rev1204988 ):
1. The Solr war Maven artifact was not being signed. (I have fixed this issue with the build.)
2. The Lucene (and probably Solr too) javadoc and binary jars are being built twice: once for the non-Maven artifacts, and again for the Maven artifacts. This results in timestamp differences in the jar manifests and in some javadoc HTML files. I think these should only be built once.
I guess the good news is that these two are the only problems uncovered by the new Maven artifact checks.
This is my first attempt at Python scripting, so I welcome Pythonic style critiques.

Michael McCandless
added a comment - 19/Dec/11 13:50
The Python style looks great! You don't need to do the C-battle-scars "if -1 == XXX:" (i.e. Python will catch you with a syntax error if you do "if XXX = -1:" by accident), but no need to change that.
libxml2 isn't always available (at least on my OS X box it isn't installed); is it possible to use the "xml" module instead? Or does it not have the features you need? (And is it always installed?)
If not, maybe we can make the Maven checking optional, i.e. if the import fails then you get a warning that Maven checking was not done...
I don't feel qualified to judge whether the functions are doing the right thing... but that sure is a LOT of Python code. Maven requires a lot of verifying, I guess... It's awesome that it catches the two problems from 3.5.0.
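
The "optional Maven checking" idea above could look roughly like the following. This is only a sketch: `load_optional_module` and `run_maven_checks` are hypothetical names, not functions from smokeTestRelease.py, and the warning text is made up for illustration.

```python
import importlib
import warnings


def load_optional_module(name):
    """Return the imported module, or None (after a warning) if it is missing."""
    try:
        return importlib.import_module(name)
    except ImportError:
        warnings.warn("%s is not installed; Maven checks will be skipped" % name)
        return None


def run_maven_checks(xml_module):
    # Hypothetical stand-in for the real Maven verification functions.
    if xml_module is None:
        return "skipped"
    return "checked"


# xml.etree.ElementTree ships with Python, so this import succeeds;
# a module that is not installed would take the warning path instead.
print(run_maven_checks(load_optional_module("xml.etree.ElementTree")))
```

The trade-off is that a missing module silently weakens the smoke test, so the warning needs to be hard to overlook.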

Steve Rowe
added a comment - 19/Dec/11 16:44
Thanks for the review!
> libxml2 isn't always available (at least on my OS X box it isn't installed); is it possible to use the "xml" module instead? Or does it not have the features you need....? (And, is it always installed...?).
libxml2 enables XPath queries, which simplify the POM content checks. I thought libxml2 was generally installed (it is in my Cygwin installation), but I guess not. I tried the "lxml" module, since it also supports XPath and is said by several random Internet denizens to have a more Pythonic API than libxml2, but "lxml" is not installed in my Cygwin distribution, and my (admittedly low-effort) attempt to install it wasn't successful.
Mike, do you know of any surveys of Python modules' inclusion in different distributions?
I'll look into switching to the "xml" module and using DOM rather than XPath queries.
> that sure is a LOT of Python code. Maven requires a lot of verifying, I guess...
Three sources of code volume here:
1. I tried to minimize changes to existing parts of the script, so there is duplication in several places (e.g. signature and hash checks).
2. I attempted to isolate each type of check to minimize function length and simplify maintenance; as a result, setup code is duplicated.
3. As you say, there's lots of verifying to do:
   - The Maven release artifacts are deployed separately in deep directory hierarchies, unlike the Lucene/Solr release packages, so a recursive crawl is required to collect them.
   - Each artifact has detached metadata (the POM), source, and javadoc jars that need to be validated.
   - The deployed POMs can't tell me whether anything is missing, so to figure out what should be deployed I have to crawl the Subversion release branch recursively to collect the POM templates.
   - Most of the Maven artifacts are copies of those in the Lucene/Solr distributions, so in contrast to the regular binary distributions' case, the Maven copies have to be verified as identical to their sources. In the case of the non-Mavenized dependencies that are published as Lucene and Solr artifacts, the deployed Maven .jar names differ from their sources, so a map has to be created to track the Maven artifact copies back to their sources.
The first of these could be addressed by refactoring. The second could be addressed without creating huge function bodies by merging functions with the same setup code, then making new functions that are called from inner loops. And the third is just the nature of the beast. I guess we could do less verifying, but that direction wouldn't get my vote.
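
To make the crawl-and-compare step concrete, here is a minimal sketch. It assumes both the Maven artifacts and the unpacked binary distribution are available as local directories (the real script may well fetch them differently), and the function names and the `renamed` map are hypothetical, not taken from the patch.

```python
import hashlib
import os


def collect_artifacts(root, suffix=".jar"):
    """Recursively collect files ending in `suffix`, keyed by file name."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                found[name] = os.path.join(dirpath, name)
    return found


def sha1_of(path):
    """Return the hex SHA-1 digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(maven_root, dist_root, renamed=None):
    """Return Maven jar names whose source jar is missing or differs.

    `renamed` maps a Maven jar name to the differently named source jar
    (the case of non-Mavenized dependencies described above).
    """
    renamed = renamed or {}
    maven_jars = collect_artifacts(maven_root)
    dist_jars = collect_artifacts(dist_root)
    mismatches = []
    for name, maven_path in sorted(maven_jars.items()):
        source_path = dist_jars.get(renamed.get(name, name))
        if source_path is None or sha1_of(maven_path) != sha1_of(source_path):
            mismatches.append(name)
    return mismatches
```

Comparing digests rather than file names is what would surface the double-build problem from 3.5.0, since jars built twice differ in their manifest timestamps.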

Michael McCandless
added a comment - 19/Dec/11 17:54
I think it's OK to stick w/ libxml2 then? Since you have it working already...
I don't know of any surveys about what's included and what isn't in Python. There's the docs for Python's "standard library" ( http://docs.python.org/library/ ); I guess that's the lowest common denominator.
The large volume of code is perfectly fine; we can refactor later. And I completely agree we want all the verification we can get...
And it's great that now we won't repeat the Maven problems from 3.5.0... nice work!

Steve Rowe
added a comment - 29/Dec/11 05:26
> I don't know of any surveys about what's included and what isn't in Python. There's the docs for Python's "standard library" ( http://docs.python.org/library/ ) - I guess that's the LCD.
The list of modules distributed with Python is here: http://docs.python.org/modindex.html . libxml2 isn't on this list. xml.etree.ElementTree is, though, so I've rewritten the patch to use it instead of libxml2.
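
For readers unfamiliar with xml.etree.ElementTree, reading the basic coordinates out of a POM might look like this. It is an illustrative sketch only: the sample POM and the `pom_coordinates` helper are made up, and the patch's actual checks are not shown in this thread.

```python
import xml.etree.ElementTree as ET

# Maven POM elements live in this namespace, which ElementTree
# expects in "{uri}tag" form.
POM_NS = "{http://maven.apache.org/POM/4.0.0}"

# Hypothetical sample POM, for illustration only.
SAMPLE_POM = """<?xml version="1.0"?>
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <groupId>org.example</groupId>
  <artifactId>example-core</artifactId>
  <version>1.0</version>
  <packaging>jar</packaging>
</project>
"""


def pom_coordinates(pom_text):
    """Return (groupId, artifactId, version, packaging) from POM text."""
    root = ET.fromstring(pom_text)
    tags = ("groupId", "artifactId", "version", "packaging")
    return tuple(root.findtext(POM_NS + tag) for tag in tags)


print(pom_coordinates(SAMPLE_POM))
```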

Steve Rowe
added a comment - 29/Dec/11 05:37
Modified the patch to use the xml.etree.ElementTree module, which is part of the base Python distribution, instead of the libxml2 module, which is not.
On Cygwin, which I use, Python is at v2.6, which doesn't include xml.etree.ElementTree v1.3, so the XPath support doesn't include attribute predicates; as a result, I had to split up XPath queries where attribute checks are needed and perform those checks in code.
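
A sketch of that workaround: select elements with a plain path, then filter on the attribute in Python instead of using an XPath predicate. The XML snippet and the `find_by_attribute` helper are hypothetical and only illustrate the technique.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML, for illustration only.
SAMPLE = """<artifacts>
  <artifact type="jar" name="core"/>
  <artifact type="war" name="webapp"/>
  <artifact type="jar" name="analyzers"/>
</artifacts>"""


def find_by_attribute(root, path, attr, value):
    """Emulate the XPath predicate path[@attr='value'] with a plain filter."""
    return [el for el in root.findall(path) if el.get(attr) == value]


root = ET.fromstring(SAMPLE)
jars = find_by_attribute(root, "artifact", "type", "jar")
print([el.get("name") for el in jars])  # ['core', 'analyzers']
```

With ElementTree 1.3 or later the same selection can be written as `root.findall("artifact[@type='jar']")`; the filter form also works on older versions.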

Steve Rowe
added a comment - 30/Dec/11 00:42
In this version of the patch: under Cygwin, the classpath separator is a semicolon (to allow Windows JVMs to function).
With this change, I successfully ran the whole script on Cygwin under Windows 7 (after commenting out the Maven checks that fail for the 3.5.0 release: the Solr .war signature check, and the check that the Maven artifacts are identical to their counterparts in the binary release).
I'll commit this shortly.
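
The separator choice can be sketched as follows. This is not the code from the patch; `classpath_separator` and `join_classpath` are hypothetical names, and the sketch assumes that Cygwin's Python reports `sys.platform` as "cygwin" while the JVM it launches is a native Windows one.

```python
import sys


def classpath_separator(platform=None):
    """Return the separator the JVM expects for the given platform string."""
    platform = sys.platform if platform is None else platform
    # Cygwin's Python uses POSIX conventions (os.pathsep is ":"), but the
    # JVM it launches is a Windows program that expects ";".
    if platform.startswith("cygwin") or platform.startswith("win"):
        return ";"
    return ":"


def join_classpath(entries, platform=None):
    """Join classpath entries with the platform-appropriate separator."""
    return classpath_separator(platform).join(entries)


print(join_classpath(["build/classes", "lib/junit.jar"], platform="cygwin"))
```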