Remember to use File >> Make a copy to get your own version to work with.

Grabbing the top 100 results

Google allows URL parameters that modify various aspects of the results returned, including one that sets the number of results per page. This parameter is "num", with a maximum value of 100. So, our desired URL looks like this:

https://www.google.co.uk/search?q=[QUERY]&num=100
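In a spreadsheet, this URL can be assembled from a query cell. A minimal sketch, assuming the query lives in cell B1 (the cell reference is our choice, not part of the original; ENCODEURL handles spaces and special characters):

="https://www.google.co.uk/search?q=" & ENCODEURL(B1) & "&num=100"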

Note that this will not give exactly the same results as retrieving ten results page by page, because Google is more likely to "group" multiple results from the same site together on smaller results pages. So, our top 100 will contain more repetition of results from the same sites.

However, Google often block queries from Google Docs, so minimising the number of requests we make is key to avoiding frustrations like errors and empty results. For this reason, it's ideal to make a single request that retrieves the entire top 100, even if this affects data quality slightly.

Xpath to scrape Google results

Next, we need to use the IMPORTXML function to grab the right section of a Google results page. This function returns the content from a remote URL that matches an XPath query.

Google mark up the clickable result links with an H3 heading with a class of "r". So, we can retrieve the URL of each result with a simple query:

//h3[@class='r']//a/@href
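Plugged into IMPORTXML, this query returns one URL per result. A sketch, again assuming the results-page URL sits in cell B1 (an assumption for illustration):

=IMPORTXML(B1, "//h3[@class='r']//a/@href")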

However, it would be ideal to grab the clickable text (the result title) as well, which can be retrieved with the following query:

//h3[@class='r']

Ideally, we want to grab both items without the need to perform an extra query. Fortunately, XPath allows for multiple queries to be joined together, like so:
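One way to combine the two queries is XPath's union operator, |, which returns the matches of both in a single call:

//h3[@class='r'] | //h3[@class='r']//a/@href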

Tidying up

Because we're making two queries at once, Google will put the data into two separate rows. This makes the results display a little awkward, because we'd ideally see the title and URL in adjacent columns. This is fixed by hiding our actual results and then using the OFFSET function to reference the correct cells:

=OFFSET(B4, +C2, 0)

Google uses tracking URLs in results (of the form /url?q=...&sa=...) which we do not want to appear in our spreadsheet. So, we use REGEXREPLACE to tidy up the display:

=REGEXREPLACE(RESULTURL, "/url\?q=(.*)&sa.*", "$1")

Finally, to make the spreadsheet a little easier for users, we moved the actual Google domain queried into a separate 'configuration' worksheet, and also allowed for additional custom parameters in case people want to restrict results to particular countries or similar.

So, there we have it - scrape the top 100 Google results using Google Docs. Get your spreadsheet here; we hope you find it useful, and let us know any questions or problems in the comments below.

Posted by SEOThing

3 Comments - Scrape top 100 search results with Google Docs

Great tool. Started using this last week. Is there some kind of limit on how many queries can be used per day from a paid google apps account? Today, it seemed to stop working, despite changing the Google domain.

Andy Langton, March 06, 2017

Unfortunately, Google periodically block "themselves" from accessing search results. What happens is that the IP of Google Sheets gets served a CAPTCHA/"I'm sorry" page by Google. Of course, there's no way to complete the CAPTCHA to access the results, so no results will be returned.

While the spreadsheet method is ideal for a very quick analysis, unfortunately, it has reliability issues which can't be fixed within Google Sheets. The only way around it while still using Google Sheets would be to route the IMPORTXML requests through a proxy or rotating proxies. By that point, the tech investment means you may as well build something custom rather than use a spreadsheet.