Parsing tables and XML with Beautiful Soup 4

Welcome to part 3 of the web scraping with Beautiful Soup 4 tutorial mini-series. In this tutorial, we're going to talk more about pulling out just the data you want, first with a table example and then with an XML document.

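When you loop over the table's rows (tr tags) and collect the table data (td) cells from each one, the first row comes back empty, since it has table header (th) tags, not table data (td) tags.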
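Here is a minimal sketch of that row-by-row approach. The HTML string is a made-up stand-in for a downloaded page, and I'm using the built-in html.parser so nothing extra needs to be installed. Your real page and parser choice may differ.

```python
import bs4 as bs

# Hypothetical table markup standing in for a page you have already downloaded
html = """
<table>
  <tr><th>Program Name</th><th>Internet Points</th><th>Kittens?</th></tr>
  <tr><td>Python</td><td>932914021</td><td>Definitely</td></tr>
  <tr><td>Pascal</td><td>532</td><td>Unlikely</td></tr>
</table>
"""

soup = bs.BeautifulSoup(html, "html.parser")
table = soup.find("table")

rows = []
for tr in table.find_all("tr"):
    # Only td cells are collected, so the header row (th cells) yields an empty list
    cells = [td.text for td in tr.find_all("td")]
    rows.append(cells)
    print(cells)
```

If you would rather keep the header text too, one option is to search for both tag names at once with tr.find_all(["th", "td"]).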

While this works just fine, since the topic is scraping tables, I will also show a non-Beautiful Soup method, using Pandas. If you don't have it, you can run pip install pandas, though the install may take some time.
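A rough sketch of the Pandas route, using pandas.read_html, which returns a list of DataFrames, one per table it finds. The HTML string here is again a made-up example, and I'm assuming an HTML parser that Pandas can use (lxml, or bs4 with html5lib) is installed. The string is wrapped in StringIO because recent Pandas versions prefer a file-like object over a literal HTML string.

```python
from io import StringIO

import pandas as pd

# Hypothetical table markup; in practice this could be a URL or a downloaded page
html = """
<table>
  <tr><th>Program Name</th><th>Internet Points</th><th>Kittens?</th></tr>
  <tr><td>Python</td><td>932914021</td><td>Definitely</td></tr>
  <tr><td>Pascal</td><td>532</td><td>Unlikely</td></tr>
</table>
"""

# read_html returns a list of DataFrames, one for each table in the markup
dfs = pd.read_html(StringIO(html))
df = dfs[0]
print(df)
```

Notice that the row of th cells is used for the column names here, rather than showing up as an empty row.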

Pandas is a data analysis library, and is better suited for working with table data in many cases, especially if you're planning to do any sort of analysis with it. If you are interested in Pandas and data analysis, you can check out the Pandas for Data Analysis tutorial series.

Finally, let's talk about parsing XML. XML uses tags much like HTML, but the tag names are chosen by whoever writes the document rather than coming from a fixed set, and the syntax is stricter. We can use a variety of libraries to parse XML, including standard library options, but, since this is a Beautiful Soup 4 tutorial, let's talk about how to do it with BS4.

One of the most common reasons that you might deal with an XML document is to scrape a website's sitemap. PythonProgramming.net has a sitemap.xml, so we'll use that.
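As a minimal sketch of what sitemap parsing can look like, here is a small inline sitemap with made-up example.com URLs rather than the real file, so it runs without a network connection. I'm assuming lxml is installed, since that is what Beautiful Soup's "xml" parser relies on.

```python
import bs4 as bs

# Hypothetical sitemap content; a real one would be downloaded from the site
sitemap = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about/</loc></url>
</urlset>"""

# The "xml" feature tells Beautiful Soup to treat the input as XML (requires lxml)
soup = bs.BeautifulSoup(sitemap, "xml")

# Each page address in a sitemap lives inside a loc tag
urls = [loc.text for loc in soup.find_all("loc")]
for url in urls:
    print(url)
```

From here, the list of URLs could be fed back into your scraper to visit each page in turn.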