Anyone know of a way to save one page of a thread, which will also save the externally linked images hosted on sites such as imagevenue? If this thread were to be viewed offline the full-size images would load after clicking on a thumbnail.

There are a lot of ways to do it, but first, here's what won't work: "save web page" -- that will only have the thumbnails.

You need a spider -- something that will pull down the full-size resources linked in the page.
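To make the idea concrete, here's a minimal Python sketch of what a spider does for this job: parse the page, collect the full-size image targets that each thumbnail links to, and fetch them. This is an illustration, not any particular tool -- the function names, the image-extension list, and the folder name are all my own assumptions.

```python
# Sketch: collect <a href> targets that point at image files, then fetch them.
# IMAGE_EXTS, LinkCollector, extract_image_links, and save_images are all
# assumed names for this example, not part of any real ripper tool.
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlretrieve
import os

IMAGE_EXTS = (".jpg", ".jpeg", ".png", ".gif")

class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag that links to an image file."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.lower().endswith(IMAGE_EXTS):
                    # Resolve relative links against the page's URL
                    self.links.append(urljoin(self.base_url, value))

def extract_image_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    return parser.links

def save_images(html, base_url, dest="saved_images"):
    """Download every full-size image the page links to into `dest`."""
    os.makedirs(dest, exist_ok=True)
    for url in extract_image_links(html, base_url):
        urlretrieve(url, os.path.join(dest, os.path.basename(url)))
```

Note that "save web page" in a browser only grabs the `<img src>` thumbnails; the trick here is following the `<a href>` around each thumbnail instead.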

Adobe Acrobat will do that, but you need the full version . . . and there are a zillion web spiders and crawlers that will do this. What OS are you on?

Google "website ripper" and you'll find a lot of choices.

My personal preference, Teleport Pro, has been around forever . . . I'm sure there are better options out there by now.

Could you provide a template you use? I've looked at HTTrack in the past but found it too complex. I'm trying a program called Cyotek WebCopy at the moment, but I think I'll have to spend a lot of time figuring it out. I did a test run on one page and it downloaded so much extra material that I had to stop it. There are a lot of things that would have to be excluded from the crawl.

I don't use that particular program, so I can't give any specific advice about it, but here are the general issues with web spiders:

1) How deep will they search?
2) Will they stay on the same server, or follow links to other servers?

Each spider will have some kind of config panel where you set these options -- and they're important. Additionally, most spiders default to "polite" behavior, obeying robots.txt exclusion rules; you usually have to override robot exclusion to rip a site.
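The two settings above are easier to see in code than in a config panel. Here's a tiny breadth-first crawler sketch in Python showing exactly where the depth limit and the same-server check kick in. The function name, parameters, and example URLs are assumptions for illustration; a real spider would also fetch pages, parse links, and (if polite) consult robots.txt via `urllib.robotparser`.

```python
# Sketch of a depth- and host-limited crawl plan. `get_links(url)` stands in
# for "fetch the page and extract its links"; crawl_plan itself does no I/O.
from urllib.parse import urlparse
from collections import deque

def crawl_plan(start_url, get_links, max_depth=1, same_host_only=True):
    """Return the URLs a crawler would visit, in breadth-first order."""
    start_host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([(start_url, 0)])
    order = []
    while queue:
        url, depth = queue.popleft()
        order.append(url)
        if depth >= max_depth:
            continue  # option 1: how deep to search
        for link in get_links(url):
            if same_host_only and urlparse(link).netloc != start_host:
                continue  # option 2: stay on the same server
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order
```

For a forum-thread-with-offsite-images job, you'd want `same_host_only=False` (the full-size images live on imagevenue, not the forum) but a small depth, or the crawl explodes the way the test run above did.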

As web technology has become more sophisticated, all sorts of techniques have emerged to defeat web scraping. If you can see it in your browser it can almost certainly be downloaded somehow, but it may take some doing, and you'll have to understand your particular application.
