LOL.. now I understand bobince's persistence in MY post about regex vs HTML: http://stackoverflow.com/questions/3951485/regex-extracting-only-the-visible-page-text-from-a-html-source-document
(..and maybe some of you would be amused by my own persistence, too :)
However, as I stated numerous times in my comments, I wasn't out to parse the HTML per se, but "merely" interested in a much coarser extraction. And for my purposes, the regex approach works - it's a tradeoff between efficiency and total robustness. But the outcome is surprisingly solid. The final implementation can be found here: http://www.martinwardener.com/regex/
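To give an idea of what I mean by "coarse extraction", here is a minimal sketch in Python. It is not the implementation linked above, just an illustration of the general approach under simplifying assumptions: the function name `visible_text` is made up, and the patterns assume reasonably well-formed script, style and comment blocks.

```python
import html
import re


def visible_text(source: str) -> str:
    """Coarsely reduce an HTML document to its visible text (hypothetical helper)."""
    # Drop script and style blocks together with their contents.
    text = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", source)
    # Drop HTML comments.
    text = re.sub(r"(?s)<!--.*?-->", " ", text)
    # Drop whatever tags remain, keeping the text between them.
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    # Decode entities such as &amp; and collapse runs of whitespace.
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    page = (
        "<html><head><style>p{color:red}</style>"
        "<script>var a = '<b>';</script></head>"
        "<body><p>Hello &amp; <b>world</b></p><!-- hidden --></body></html>"
    )
    print(visible_text(page))
```

This will certainly trip over pathological markup (a `>` inside an attribute value, for instance), which is exactly the tradeoff mentioned above.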
Mind you, regarding the "secondary" issue (extracting all links/URLs from an HTML document), it is of no concern that this implementation is over-eager (by design, btw) and picks out a few invalid URLs (mostly pertaining to script blocks) - those will be filtered out during the subsequent URL validation anyway.
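The two-stage idea (grab candidates greedily, validate afterwards) could look roughly like the sketch below. Again, this is not the linked implementation: the pattern, the names `extract_candidates` and `is_valid_url`, and the rule "absolute http/https URLs only" are assumptions made for the sake of the example.

```python
import re
from urllib.parse import urlparse

# Deliberately over-eager: matches href/src attributes anywhere in the source,
# including inside script blocks, and does not insist on matching quote types.
CANDIDATE = re.compile(r"""(?:href|src)\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def extract_candidates(source: str) -> list[str]:
    """Return every string that looks like a link target (hypothetical helper)."""
    return CANDIDATE.findall(source)


def is_valid_url(candidate: str) -> bool:
    """Accept only absolute http/https URLs without whitespace (assumed rule)."""
    if any(ch.isspace() for ch in candidate):
        return False
    try:
        parts = urlparse(candidate)
    except ValueError:
        # urlparse rejects some malformed inputs, e.g. a broken IPv6 literal.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


if __name__ == "__main__":
    page = (
        '<a href="http://example.com/a">x</a>'
        "<script>var t = \"href='{{ not a url }}'\";</script>"
        '<img src="https://example.com/i.png">'
    )
    candidates = extract_candidates(page)
    print(candidates)
    print([c for c in candidates if is_valid_url(c)])
```

The junk picked up from the script block survives the first stage and is simply discarded by the second, so the extraction regex can stay simple.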

Among programmers of any experience, it is generally regarded as A Bad Idea™ to attempt to parse HTML with regular expressions. How bad of an idea? It apparently drove one Stack Overflow user to the brink of madness: You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. ...