I frequently like to save news articles from the web for later perusal, since so many seem to disappear, as we all know. Up until now, I've used two methods for doing this. One, of course, is to print the page to a PDF. The upside to this is that all the images and formatting are preserved, but you lose the hyperlinks. The other, which I use for primarily text-only pages, is to copy the text into TextEdit. Upside? Smaller files than PDF. Downside? Still no hyperlinks.

So, I just got my copy of iWork, and I decided I'd give Pages a shot at this. I called up an article, went to the "Print Friendly Version," and copied the text and images into a blank document. Not only did it preserve the text formatting, with a little image placement tweaking, but the hyperlinks were still present and functional! Where TextEdit just treated them as styled text, and the same for PDF export, Pages correctly recognized them as hyperlinks, and kept them intact. Now, how cool is that?

[robg adds: In 10.3 at least, TextEdit can retain hyperlinks in pasted text -- just make sure the document is set to RTF and not plain text. Still, this is an interesting use for Pages, as it gives you more control over the image placement of the page.]

This is interesting. I wonder if it can be combined with yesterday's hint about printing the current page URL as a header, so that you have a saved extract of a webpage with working links as well as the source URL... wishful thinking?

Regarding saving web pages: more often than not, I need only a few bits of text or graphics from a web page, which I copy and paste into Hog Bay Notebook, where I keep all my little snippets. Often I have to do a second round of copy and paste just so I can get a record of the original URL. It would be nice to streamline it into one step. About wget: my alias file is full of wget aliases, and it is one of my most-used tools. The only advantage curl has is the ability to do some limited pattern matching on the URL.
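For what it's worth, the "one step" idea could look something like the sketch below. It is only an illustration, not how any particular notebook app works: `save_snippet` and the "Source:"/"Saved:" header layout are made up for this example, and it assumes you already have the copied text and the page URL in hand.

```python
from datetime import date
from pathlib import Path


def save_snippet(text, source_url, out_path):
    """Write a snippet to out_path, preceded by its source URL and the date saved.

    Hypothetical helper: the header layout is an assumption, not a standard.
    """
    header = f"Source: {source_url}\nSaved: {date.today().isoformat()}\n\n"
    Path(out_path).write_text(header + text, encoding="utf-8")
    return out_path
```

Calling `save_snippet(copied_text, page_url, "story.txt")` keeps the text and its origin together in one file, so no second copy and paste is needed for the URL.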

And Firefox does it for you!
Authored by: pnutslab on Feb 11, '05 01:33:59AM

I would also suggest you have a look at the Firefox Scrapbook extension. This enables you to capture a complete local copy of the webpage (while retaining all external links). I have a library of all of my archived webpages which I can access quickly through the Firefox scrapbook sidebar.

I strongly recommend the wonderful app "Webstractor." Not only does it allow you to save web pages, but you can *edit* them (for removing ads, for example), and it will build a table of contents for different saved pages as well.

There should be no need for a hint to save something. If you can't do it like in every other app (Save: cmd-S), then the browser sucks.

Well, most do.

Every major browser basically destroys what it "saves", and/or does something stupid while at it (IE re-downloads what is currently displayed, already held in RAM, and already stored in the disk cache; others think it's cool to alter the content and relocate linked files, effectively destroying the page from a page author's perspective; etc.)

ALL browsers' authors except one should be ashamed of themselves after all these years. It should have been obvious from day one that the proper way to do this was to "save" to a non-proprietary "archive" file format.

Which is precisely what iCab has been doing all that time: a zip archive containing the exact hierarchy of files from that page.

It lets you save the current page with absolutely no alteration, meaning I can "save" a page from my own website and use that as, well, an archive of that page, to use or modify later, which no other browser allows without ridiculous wizardry. (Your text processor saves your documents unaltered, since that is most likely how you will want them later; if you want dumbed-down versions of them, that's an option, not the other way around.)

This also means that pages saved using older versions of my browser, which didn't render properly, DISPLAY PERFECTLY IN LATER VERSIONS with better CSS support, for example.

Yet I don't suggest switching to iCab, because I'm fed up with all the crap I'm hearing about "incomplete CSS2, blahblahblah, useless, blahblah" (which hopefully will end when preview 3 is released, as beta 3.0 has caught up on that front already). Instead I suggest you write to your browser's authors to ask for some basic iCab features like saving. How ridiculous is that?

And since no other browser seems to originate from such a brilliant individual as the single developer of iCab, you'll have to tell them how to do it cleverly. The single trick needed to save to standard, unmodified zip archives, retain the full original paths, keep relative and absolute links working for data inside AND outside the archive, and tell the browser where to find the saved page's source (which may be buried in a deep hierarchy) is to save that file as the first one in the archive. The rest should be obvious from exploring or decompressing some iCab archives using Zipit or Stuffit Expander.
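To make the "first file in the archive" trick concrete, here is a rough sketch using Python's standard zipfile module. It only illustrates the idea as described above and is not iCab's actual code or a guaranteed match for its archive layout; the function names `save_page_archive` and `main_page_of` are invented for the example, and `files` is assumed to be a dict mapping each original path to its bytes.

```python
import io
import zipfile


def save_page_archive(archive, page_path, files):
    """Write a plain zip archive with the page's source as the first entry.

    archive   -- filename or file-like object to write to
    page_path -- original path of the page's source (assumed to be a key of files)
    files     -- dict mapping original paths to file contents as bytes
    """
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        # The page's source goes in first, so a reader knows where to start.
        zf.writestr(page_path, files[page_path])
        # Everything else is stored under its original, unmodified path.
        for path, data in files.items():
            if path != page_path:
                zf.writestr(path, data)


def main_page_of(archive):
    """Return the path of the first entry, taken to be the saved page's source."""
    with zipfile.ZipFile(archive) as zf:
        return zf.infolist()[0].filename
```

Because nothing is renamed or rewritten, any unzip tool can still open the result, and a browser that knows the convention only has to look at the first entry to find the page, however deep in the hierarchy it sits.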

Also, iCab has a tool to convert IE's useless uncompressed proprietary archives to plain zip iCab archives. Of course other browsers require you to decompress those, hunt for the proper page's file, and lose relative links to online data, but at least it can be done. Don't expect this from others, especially M$, which leaves IE users with no future way to read "saved" pages. How safe is saving to a proprietary file format? What's the point of saving universal cross-platform web pages to a single-app file format?

iCab has been my default browser from the days of NS 3, in part because it was and still is the only browser that saves properly. So it's of course perfectly usable, even if it requires a secondary browser at hand just in case, which isn't really different from others anyway. At least I'm not constantly *censored*ing about how stupid and crappy my browser is.

Hey! funny thing, that "censor" filter in the comments ;-)

By the way, don't get me started on filtering: I see close to zero ads in iCab. With all the developers' not-so-cleverness in browsers and all the ads on the web, browsing has become torture for me without iCab ;-)