The tagsoup package

TagSoup is a library for extracting information from unstructured HTML,
sometimes known as tag soup. The HTML does not have to be well-formed or render
properly within any particular framework. This library is for situations where
the author of the HTML is not cooperating with the person trying to extract the
information, but is also not trying to hide the information.
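
As a rough illustration of the kind of extraction described above, the
following minimal sketch pulls link targets and plain text out of a badly
formed fragment. It assumes the Text.HTML.TagSoup module exported by the
package; the sample input and the helper names links and textOnly are
hypothetical and exist only for this example.

```haskell
import Text.HTML.TagSoup (Tag (TagOpen), fromAttrib, innerText, parseTags)

-- Hypothetical input, deliberately malformed: neither <p> is closed,
-- and <b> is still open when </a> appears.
sample :: String
sample = "<p>See <a href=\"docs.html\">the <b>docs</a><p>Second paragraph"

-- Collect the href attribute of every opening <a> tag.
-- parseTags returns a flat list of tags and builds no tree,
-- so unbalanced markup is not an obstacle.
links :: String -> [String]
links html = [fromAttrib "href" t | t@(TagOpen "a" _) <- parseTags html]

-- Concatenate all text nodes, ignoring the tags themselves.
textOnly :: String -> String
textOnly = innerText . parseTags

main :: IO ()
main = do
  print (links sample)
  putStrLn (textOnly sample)
```

Because the input is treated as a flat sequence of tags rather than a tree,
the sketch never has to decide how the missing closing tags should be
repaired.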