Extracting Maximum Number of Pages

Let's say you're building an extractor and you want to generate a list of URLs using the URL Generator for a website with many pages (reviews, products, etc.). How do you determine the number of pages to generate? If it's just a single webpage, you can look at the site and note the total number of pages.

This gets a lot harder and more time-consuming as the number of webpages grows. What about 100 webpages? You'd have to open every webpage and note its total number of pages by hand.

This article walks through, step by step, the tips and tricks needed to build an extractor that grabs the maximum number of pages for websites.

What You'll Need

A simple understanding of HTML and XPath

A browser with developer tools (almost all have them!)

For this example, we'll be going through the process of grabbing the maximum number of reviews for a Walmart product page.

Locate The Page Numbers Using XPath

If you're not familiar with using a browser's developer tools, this may look very confusing, but it's not as bad as it looks! I'll be using Chrome, but this will work with other browsers as well.

There's one page of reviews, but there are no links for pages! Most websites will not show the number of pages if there is only a single page. The XPath query we made in Step 5 would return a blank value for that row. That's not good!

6. Set a default value of 1 if there are no XPath results

The XPath snippet below sets a default value if the XPath result is empty. Take note of it, because it's really useful for other use cases too!
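
Separately from the XPath snippet, the same fallback idea can be sketched in Python as a rough illustration. This is not the extractor's own syntax. The markup, the "paginator" class name, and the max_page helper are all made up for the example, and it assumes the fragment is well-formed enough for Python's built-in XML parser, which real pages often are not.

```python
import xml.etree.ElementTree as ET

# Hypothetical pagination markup; a real product page will look different.
MULTI_PAGE = """
<div>
  <ul class="paginator">
    <li><a>1</a></li>
    <li><a>2</a></li>
    <li><a>14</a></li>
  </ul>
</div>
""".strip()

# A page with a single page of reviews: no pagination links at all.
SINGLE_PAGE = "<div><p>Only one page of reviews.</p></div>"


def max_page(fragment: str, default: int = 1) -> int:
    """Return the largest page number in the pagination links, or `default` if none exist."""
    root = ET.fromstring(fragment)
    # ElementTree supports a limited XPath subset, which is enough for this lookup.
    links = root.findall(".//ul[@class='paginator']/li/a")
    # Keep only links whose text is a plain number (skips "Next", arrows, etc.).
    numbers = [int(a.text) for a in links if a.text and a.text.strip().isdigit()]
    # An empty result means the site showed no page links, so fall back to the default.
    return max(numbers) if numbers else default


print(max_page(MULTI_PAGE))   # 14
print(max_page(SINGLE_PAGE))  # 1
```

The key point is the last line of the helper: an empty result is treated as "one page" instead of being passed along as a blank value.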