There are a number of examples of web page scrapers written using pyparsing, such as this one (extracts all URL links from yahoo.com) and this one (for extracting the NIST NTP server addresses). Be sure to use the pyparsing helper method makeHTMLTags, instead of just hand coding "<" + Literal(tagname) + ">" - makeHTMLTags creates a very robust parser, with accommodation for extra spaces, upper/lower case inconsistencies, unexpected attributes, attribute values with various quoting styles, and so on. Pyparsing will also give you more control over special syntax issues, such as custom entities. Also it is pure Python, liberally licensed, and small footprint (a single source module), so it is easy to drop into your GAE app right in with your other application code.