Save a complete webpage *and the pages it links to*

I'm a member of another forum, and my PM inbox is nearly full. It uses outdated phpBB software with no "export" option, so backing up these messages would mean saving every single one by hand, and the result would be an unorganized mess.

I've been wondering if there's a way to save a webpage *and the pages it links to*, so that when I click "file>save" it actually saves all of those linked pages too. Of course, technically it probably links to every page on the internet if we follow each link, then each link on that page, and so forth. But I could limit it to just one level, or to a single directory, or whatever.

Another way of phrasing this is whether it's possible to save all files in a certain directory via HTTP rather than FTP, assuming all of the links are known/knowable. (I'm not trying to access hidden files.) Technically, this would include PHP-generated dynamic pages, such as those at a forum.

Of course this could potentially be used for bad purposes (stealing an entire forum), but let's focus on the good uses.

I know I could write this myself in PHP, but it would be a pain, and I'd probably be better off just saving all of the messages manually.

I think you can set something like that up on a server using PHP and an IE browser, but it's complicated.

The way it works: if your server is Windows-based, it should have IE installed on it (or you can install it if it's not). You can then use PHP to drop to the OS and run the browser, navigating to the links one by one by 'reading' them off of the page and capturing each page to a file as you go. You could even concatenate everything into one file.

But it just occurs to me while typing this that it might be easier to get the links by running file_get_contents on the main index page for your mailbox and parsing the result for the links you're interested in, then looping over those and running file_get_contents on each, writing them all to a single file, something like (for inside the loop):
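The fetch-index, parse-links, loop-and-append idea could be sketched roughly like this. It's in Python rather than PHP (urlopen plays the role of file_get_contents), and it's only a sketch under assumptions: the names extract_links and save_all, and the must_contain filter (something like "privmsg.php"), are made up for illustration, and it doesn't handle logins at all.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url, must_contain=""):
    """Return absolute links found in html whose URL contains must_contain."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return [link for link in collector.links if must_contain in link]


def save_all(index_url, out_path, must_contain=""):
    """Fetch the index page, then append every matching linked page to one file."""
    index_html = urlopen(index_url).read().decode("utf-8", errors="replace")
    with open(out_path, "w", encoding="utf-8") as out:
        for link in extract_links(index_html, index_url, must_contain):
            page = urlopen(link).read().decode("utf-8", errors="replace")
            # Mark where each page starts so the single file stays navigable.
            out.write(f"<!-- {link} -->\n{page}\n")
```

Filtering on a piece of the URL keeps the loop from wandering off to the FAQ, member list and so on; the index page would still need to be the mailbox listing itself.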

I could write it myself (using file_get_contents(), or something along those lines), but this would be somewhat complicated because:
1. I'd rather just have a program that does it for me (if such a program exists-- I've found a couple that claim to do things like that on Google, although I'm not sure how well they work. I'm still looking around).
2. This particular case involves cookies (I need to be logged in), so that makes using PHP a little trickier. Not impossible, just trickier.
3. I'd like the links between the saved pages to be set up automatically. It wouldn't be too hard to just loop through and save everything, but I'd like, for example, the timestamps from the private messages to stay useful for organizing them.
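For the cookie part (point 2), one common approach is to copy the session cookie out of a browser that is already logged in and send it along with each request. Here is a minimal Python sketch of that, not PHP. The function names, the User-Agent string, and the example cookie are all made up, and whether this particular forum accepts a copied cookie is an assumption I can't verify.

```python
from urllib.request import Request, urlopen


def make_request(url, cookie_header):
    """Build a request that carries the browser's session cookie."""
    request = Request(url)
    # cookie_header is copied by hand from a logged-in browser session,
    # e.g. "sid=abc123" (a made-up name and value).
    request.add_header("Cookie", cookie_header)
    # Hypothetical User-Agent; some forums reject requests without one.
    request.add_header("User-Agent", "Mozilla/5.0 (personal PM backup)")
    return request


def fetch_logged_in(url, cookie_header):
    """Fetch one page as the logged-in user and return it as text."""
    with urlopen(make_request(url, cookie_header)) as response:
        return response.read().decode("utf-8", errors="replace")
```

Some forums tie the session to the IP address or browser as well, in which case the cookie alone may not be enough.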

My internet used to go out frequently at my last residence so I used it to download a website or two so that I could get my internet fix when the internet went out for a day or two. I have not used it for a while since the internet has been better at my current location.

ajfmrf, that's almost perfect. I actually already had that for other reasons (to manage downloads), but I hadn't realized there was an option like it. Basically, it automates the saving-each-page process, which is close to what I want. The only problem is that it doesn't update all the hyperlinks so that the pages are linked together. That would be the most desirable feature, although for the moment that's sufficient to back up the information. (There's also a small problem that it only downloads the actual .html files, not external .css and so forth, so the formatting is a bit off. But in this case it doesn't matter too much for me.)
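The missing step (updating the hyperlinks so the saved pages point at each other) could in principle be done as a pass over the files after they're downloaded. A rough Python sketch, assuming you already have a mapping from each original URL to the filename it was saved under; rewrite_links and saved_files are hypothetical names, and the regex only handles plain quoted href attributes, not escaped entities like &amp;amp; or unquoted values:

```python
import re
from urllib.parse import urljoin


def rewrite_links(html, page_url, saved_files):
    """Point href attributes at local copies where a copy exists.

    saved_files maps absolute URL -> local filename.
    Links with no local copy are left exactly as they were.
    """

    def replace(match):
        quote, target = match.group(1), match.group(2)
        absolute = urljoin(page_url, target)
        local = saved_files.get(absolute)
        if local is None:
            return match.group(0)
        return f"href={quote}{local}{quote}"

    return re.sub(r"""href=(["'])(.*?)\1""", replace, html)
```

Since stylesheets are also pulled in through href, the same pass would cover a saved .css file if its URL is in the mapping.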

James, that's really cool. It works well. But in this case I must be logged in to view my PMs, and the repeated automated requests appear to cause the forum to log me out automatically (not sure why-- maybe to prevent what I'm trying to do, or maybe "security"). If I can work around that, I'll be happy with it. If not, it'll be useful for other (non-login) things.
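If the logout is triggered by requests arriving too quickly (only a guess, since the cause isn't known), spacing them out and stopping as soon as a page looks like the login screen might help. A small Python sketch of that idea follows. The function name, the five-second default, and the 'name="password"' marker are all assumptions rather than anything specific to this forum.

```python
import time


def fetch_slowly(urls, fetch, delay_seconds=5.0, logged_out_marker='name="password"'):
    """Fetch urls one at a time, pausing between requests.

    fetch is any function that takes a URL and returns the page text.
    Stops early if a page contains logged_out_marker, on the assumption
    that a password field means the forum showed the login form instead.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index > 0:
            time.sleep(delay_seconds)  # be gentle with the server
        text = fetch(url)
        if logged_out_marker in text:
            break  # session was dropped; keep what was saved so far
        pages[url] = text
    return pages
```

Stopping early at least avoids saving a pile of login pages in place of the actual messages.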

I was trying to download a site to my hard drive, but ran into the same problem you came across, djr33. I used the DownThemAll! Firefox plugin; however, when I checked the saved .html files, it was obviously not saving the session, i.e. it was logging me out.

No update. I didn't figure out any way around the login problem, but for the moment my PM box on the other forum isn't full any more, so it's not a concern. I'm still casually looking for a solution, but mostly working on other projects. I'll reply here if I happen to come across anything though.

LOL yeah, I just figured what the hell as well. I think creating a script is a bit extreme, as there must be a simple way to do this that's already been done.