[Bug-wget] download page-requisites with spanning hosts

From:

Jake b

Subject:

[Bug-wget] download page-requisites with spanning hosts

Date:

Wed, 29 Apr 2009 18:50:11 -0500

I'm trying to download multiple pages from the sijun speedpaint thread
so I can use their images for my random desktop folder. I can download
each page by hand using Firefox, but this becomes unwieldy,
especially since the prev button has a bit of a delay. (So I want to
automate it, with delays and/or speed caps to be friendly to the
server.)
The wget command I am using:
wget.exe -p -k -w 15
"http://forums.sijun.com/viewtopic.php?t=29807&postdays=0&postorder=asc&start=27330"
It has 2 problems:
1) Rename file:
Instead of creating something like "912.html" or "index.html", the file
becomes: "address@hidden&postdays=0&postorder=asc&start=27330"
2) Images that span hosts are failing.
I have page-requisites on, but since some images are hosted on tinypic,
imageshack, etc., it is not downloading them. Meaning the layout looks
like this:
sijun/page912.php
imageshack.com/1.png
tinypic.com/2.png
randomguyshost.com/3.png
Because of this, I cannot simply list all domains to span. I don't
know all the domains, since people have personal servers.
How do I make wget download all images on the page? I don't want to
recurse into other hosts, or even sijun; I just want to download this
page and all images needed to display it.
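
A possible approach to problems 1 and 2, as a sketch rather than a tested recipe: the wget manual pairs -p (--page-requisites) with -H (--span-hosts), so that requisites stored on other hosts are fetched as well. Without -r, nothing is recursed. -E appends ".html" to saved HTML pages whose names do not already end that way, which may help with the naming issue, although the result would still be the long query-string name, not "912.html". The helper names below (build_wget_command, download_page) are made up for illustration. The arguments are passed as a list so that the "&" characters in the URL need no shell quoting.

```python
import subprocess

# URL taken from the command line quoted in the post.
PAGE_URL = ("http://forums.sijun.com/viewtopic.php"
            "?t=29807&postdays=0&postorder=asc&start=27330")


def build_wget_command(url, wait_seconds=15, wget="wget"):
    """Return the wget argument list for one page plus its images.

    Hypothetical helper: it only builds the list, it does not run wget.
    """
    return [
        wget,
        "-p",                    # --page-requisites: images, CSS, etc.
        "-H",                    # --span-hosts: allow requisites on other hosts
        "-k",                    # --convert-links: point links at local copies
        "-E",                    # append .html to saved HTML pages
        "-w", str(wait_seconds),  # wait between retrievals
        url,
    ]


def download_page(url):
    """Run wget for a single page; raises CalledProcessError on failure."""
    subprocess.run(build_wget_command(url), check=True)
```

Calling download_page(PAGE_URL) would run the equivalent of "wget -p -H -k -E -w 15 URL". On Windows, wget="wget.exe" can be passed to build_wget_command if needed.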
[ This one is a lower priority, but someone might already know how to
solve this ]
3) After this is done, I want to loop to download multiple pages. It
would be cool if I downloaded pages 900 to 912 and each page's next
link worked correctly, linking to the local versions.
I'm not sure if I can use wget's -k option, or if that won't work
because recursion on forums can be weird.
Either way, I have a simple script that can convert 900 to 912 into
the correct URLs, pausing in between each request.
Maybe I will have to manually modify links using regexes, unless you
know a shortcut?
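
The page-to-URL conversion for problem 3 could look roughly like the sketch below. It assumes 30 posts per page, which is only inferred from start=27330 corresponding to page 912 (27330 = 911 * 30). The function names and the POSTS_PER_PAGE constant are illustrative, not part of wget or the forum.

```python
BASE_URL = "http://forums.sijun.com/viewtopic.php"
POSTS_PER_PAGE = 30  # assumption: inferred from start=27330 on page 912


def page_url(page, topic=29807):
    """Build the thread URL for a 1-based page number."""
    start = (page - 1) * POSTS_PER_PAGE
    return f"{BASE_URL}?t={topic}&postdays=0&postorder=asc&start={start}"


def page_urls(first, last):
    """Return the URLs for pages first..last, inclusive."""
    return [page_url(p) for p in range(first, last + 1)]


def write_url_list(path, first, last):
    """Write one URL per line, suitable for wget's -i option."""
    with open(path, "w") as f:
        for url in page_urls(first, last):
            f.write(url + "\n")
```

After write_url_list("urls.txt", 900, 912), a single run such as "wget -i urls.txt -p -H -k -E -w 15" would fetch all pages with the wait applied between retrievals. Because -k converts links after the whole run completes, links between pages fetched in that one run can end up pointing at the local copies. This may not hold if the forum's "next" links carry session IDs or order their parameters differently from the URLs generated here.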
thanks!
--
Jake