poll-mirrors.pl needs fixed

Details

Description

i just noticed that poll-mirrors.pl is set up to look for the KEYS file in the release dir on each mirror. Infra (wisely) tweaked the way mirroring happens recently to ensure that KEYS files are not mirrored anymore (presumably to help catch bad links advising people to download untrusted KEYS files).

we're going to need to update poll-mirrors.pl to look for something else in each release dir ... changes/Changes.html perhaps?

Activity

Hoss Man
added a comment - 12/Jul/13 02:20 While we're at it, it would be nice if it was possible to run poll-mirrors.pl with an arbitrary "apache_url_suffix" instead of a "version" so it could also be used to check for things like the Solr Ref Guide (SOLR-4618) ... but i can look into that in a separate issue down the road if someone just wants to get the minimum working here ASAP.

Steve Rowe
added a comment - 12/Jul/13 04:08 Good catch Hoss. I'd rather not use Changes.html, because the script downloads the file, and that file is almost 500K right now - maybe one of the .css files in that same directory, e.g. http://www.globalish.com/am/lucene/java/4.3.1/changes/ChangesSimpleStyle.css

Steve Rowe
added a comment - 12/Jul/13 04:21 "apache_url_suffix"
There's a separate (but maybe related to what you want to do here) issue with poll-mirrors.pl - when Shalin did the 4.3.1 release, he didn't upload all of the artifacts at once, and as a result, the script reported that "the release" was on all mirrors, even though some parts weren't there yet, rendering the information useless.
Maybe the script could take one or more suffixes, so that it could find any number of things on each mirror, and report how many mirrors have all of them?

Hoss Man
added a comment - 12/Jul/13 04:30 I'd rather not use Changes.html, because the script downloads the file, and that file is almost 500K right now ...
Yeah, i don't understand that either ... why aren't we using head() instead of get() ? ... LWP does the right thing for FTP servers and everything.
Here's a patch that...
- switches to using HEAD
- uses Changes.html instead of KEYS
- adds support for a "-path" option that can be used mutually exclusively with the "-version" option
- if path is specified, maven is skipped
- changes the existing "...." output to print an "X" for URLs that fail (so you can get a quick sense of how many mirror URLs are failing w/o waiting for the summary at the end of all 240 requests)
- adds a "-details" option that will print the failing URLs instead of the "X".
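
To make the HEAD-plus-"X" idea above concrete, here is a minimal sketch. It is written in Python purely for illustration (poll-mirrors.pl itself is Perl and uses LWP), and the function names, the `FAIL:` line format, and the injectable `opener` parameter are all assumptions, not anything taken from the patch. Unlike LWP, Python's urllib only supports HEAD for HTTP(S) URLs, so this sketch does not cover FTP mirrors.

```python
import urllib.error
import urllib.request


def head_ok(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if a HEAD request for url succeeds.

    HEAD asks only for the response headers, so a large file
    (e.g. a 5MB PDF) is never downloaded just to prove it exists.
    `opener` is a hypothetical hook so the check can be faked in tests.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except OSError:
        # urllib.error.URLError and HTTPError are OSError subclasses,
        # so 404s, DNS failures and timeouts all count as "not there".
        return False


def progress_marks(urls, check=head_ok, details=False):
    """Build the progress output: "." per success, "X" per failure.

    With details=True the failing URL is printed on its own line
    instead of the "X" (the exact line format is an assumption).
    """
    pieces = []
    for url in urls:
        if check(url):
            pieces.append(".")
        elif details:
            pieces.append("\nFAIL: " + url + "\n")
        else:
            pieces.append("X")
    return "".join(pieces)
```

Passing the checker in as a parameter keeps the output logic separate from the network call, which is one possible way to exercise the "X" and details behavior without hitting 240 mirrors.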

Hoss Man
added a comment - 12/Jul/13 04:36 why aren't we using head() instead of get()
Just to be clear, yes we could use that small CSS file you mentioned, but that won't help my previously stated secondary goal of being able to use this on the ref guide (it's a single 5MB PDF file)
Maybe the script could take one or more suffixes, so that it could find any number of things on each mirror, and report how many mirrors have all of them?
Hmm... i guess, but the stats would get kind of confusing ... wouldn't it be easier just to run multiple invocations in separate terminals with each of the paths you are interested in? In a situation like you're describing, does it really matter "what percentage have X and Y?" or just "what percentage have X? what percentage have Y?"

Hoss Man
added a comment - 12/Jul/13 04:44 minor improvement to patch ... the summary output (Just before the sleep) was misleadingly claiming the file was available on maven central when using "-path" – fixed that.

Steve Rowe
added a comment - 12/Jul/13 05:11 Here's a patch that...
+1 for all the changes.
One thing I noticed: your $usage includes "-V" but doesn't mention "-details" - I'm guessing you renamed the option but didn't change the $usage?
why aren't we using head() instead of get()
Just to be clear, yes we could use that small CSS file you mentioned, but that won't help my previously stated secondary goal of being able to use this on the ref guide (it's a single 5MB PDF file)
We aren't using head() because I didn't think of it. It's a good idea.
does it really matter "what percentage have X and Y?" or just "what percentage have X? what percentage have Y?"
The way I've used that script, the question has been: Can I announce that the release is available? This is answered when all parts of the release are downloadable from some threshold percentage of mirrors, thus "what percentage have X AND Y". As you say, though, this could be performed by running the script in multiple terminals with different paths. One goal of the script, though, was having just one place to go to get the answer to the question (thus lumping Maven in there too). Maybe the script could be (eventually - shouldn't block the nice changes you've made here) changed to allow multiple -path options, and print a number instead of a "." for presence or "X" for absence, representing how many of the files are downloadable at each mirror: "0", "3", etc.
Thanks for working on this, Hoss!
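
The multiple-path idea above could look roughly like the following. This is a Python sketch for illustration only (the real script is Perl), and the function name, the `is_available` callback, the space-separated output, and the 0.9 default threshold are all hypothetical choices, not anything proposed in this issue.

```python
def poll_summary(mirrors, paths, is_available, threshold=0.9):
    """Summarize how many of `paths` each mirror serves.

    is_available(mirror, path) is a caller-supplied check (for example
    a HEAD request). Returns a tuple of:
      - one count per mirror, space separated ("2 1 2 0"),
      - the fraction of mirrors that have ALL paths,
      - whether that fraction meets the (assumed) threshold.
    """
    counts = [
        sum(1 for path in paths if is_available(mirror, path))
        for mirror in mirrors
    ]
    complete = sum(1 for count in counts if count == len(paths))
    fraction = complete / len(mirrors) if mirrors else 0.0
    marks = " ".join(str(count) for count in counts)
    return marks, fraction, fraction >= threshold
```

Counting per mirror answers "what percentage have X AND Y" in one run, which is the question behind "can I announce the release?"; running one invocation per path only answers each question separately.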

Hoss Man
added a comment - 12/Jul/13 19:22 One thing I noticed: your $usage includes "-V" but doesn't mention "-details" - I'm guessing you renamed the option but didn't change the $usage?
Good eye ... yeah, originally i named "-details" "-Verbose", but the way we are using GetOpt it does case-insensitive short-form args, so trying to use "-v 4.3.1" would error that -v was too vague (it didn't know if you wanted -verbose or -version), and i was too lazy to completely revamp the arg parsing to use something more sophisticated.
The way I've used that script, the question has been: Can I announce that the release is available? This is answered when all parts of the release are downloadable ...
I hear you, and i have some ideas of how to make it work better for what you're describing, but it means gutting most of how poll-mirrors.pl works now, so i split that off into LUCENE-5108
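
As a rough analogy for the option clash described above, Python's argparse shows the same kind of ambiguity when two long options share a prefix. This is only an illustration: poll-mirrors.pl is Perl, argparse matching is case-sensitive (unlike the behavior described), and the double-dash option names and `build_parser` below are assumptions made for the sketch.

```python
import argparse


def build_parser():
    """Build a parser with --version/--path mutually exclusive."""
    parser = argparse.ArgumentParser(prog="poll-mirrors-sketch")
    # Exactly one of --version or --path must be given.
    target = parser.add_mutually_exclusive_group(required=True)
    target.add_argument("--version", help="release version, e.g. 4.3.1")
    target.add_argument("--path", help="arbitrary path to look for")
    # A --verbose flag shares the prefix "--ver" with --version,
    # so any abbreviation that short cannot be resolved.
    parser.add_argument("--verbose", action="store_true")
    return parser
```

With this parser, `--v 4.3.1` is rejected as ambiguous, while `--vers 4.3.1` resolves to `--version`. Renaming the flag to something with a distinct first letter, as was done with "-details", sidesteps the clash without reworking the parsing.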