Thursday, January 5, 2017

How to scrape an HTTPS website with proxies

Hi all,

My last post about scraping with proxies is quite old, so I decided to write a newer version of it. In particular, today I will focus on how to scrape an HTTPS website through proxies.

There is also good news about the requests library. For a long time requests did not support SOCKS proxies, but a release in 2016 (version 2.10.0) added that support. So now requests supports both HTTP and SOCKS proxies.

So let's get started. Below I will show you 4 different examples of how to scrape a single HTTPS page. First, we will scrape it with requests using SOCKS and HTTP proxies. Second, we will do the same using the urllib3 library.
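To give a rough idea of the shape of these four cases, here is a minimal sketch. The helper names (build_proxies, is_socks, fetch_with_requests, fetch_with_urllib3) and the proxy addresses in the comments are my own placeholders, so swap in your real proxies. It also assumes the SOCKS extras are installed for the SOCKS cases.

```python
from urllib.parse import urlsplit


def build_proxies(proxy_url):
    """Build a requests-style mapping that sends both schemes through one proxy."""
    return {"http": proxy_url, "https": proxy_url}


def is_socks(proxy_url):
    """True for socks4://, socks5:// and socks5h:// style proxy URLs."""
    return urlsplit(proxy_url).scheme.startswith("socks")


def fetch_with_requests(url, proxy_url, timeout=10):
    # Imported here so the pure helpers above work without requests installed.
    import requests

    response = requests.get(url, proxies=build_proxies(proxy_url), timeout=timeout)
    response.raise_for_status()
    return response.text


def fetch_with_urllib3(url, proxy_url, timeout=10):
    import urllib3

    if is_socks(proxy_url):
        # SOCKS support lives in urllib3.contrib and needs PySocks.
        from urllib3.contrib.socks import SOCKSProxyManager

        manager = SOCKSProxyManager(proxy_url)
    else:
        manager = urllib3.ProxyManager(proxy_url)

    response = manager.request("GET", url, timeout=timeout)
    return response.data.decode("utf-8", errors="replace")


# Hypothetical usage (placeholder addresses, replace with your own proxies):
#   fetch_with_requests("https://example.com/", "http://127.0.0.1:8080")
#   fetch_with_requests("https://example.com/", "socks5://127.0.0.1:1080")
#   fetch_with_urllib3("https://example.com/", "http://127.0.0.1:8080")
#   fetch_with_urllib3("https://example.com/", "socks5://127.0.0.1:1080")
```

Note that the "https" key in the proxies mapping refers to the scheme of the page you are requesting, not to the protocol spoken to the proxy itself.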

Requirements

To use requests and urllib3, you need to install them first, together with their SOCKS dependencies.
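If you want a quick way to see whether everything is in place, something like the sketch below should do. The function name missing_packages is my own, and the module list is an assumption about what the examples need: requests, urllib3, and socks (the module that PySocks provides for SOCKS support).

```python
import importlib.util

# Assumed install command (run it in your shell, not in Python):
#   pip install "requests[socks]" "urllib3[socks]"

# Top-level modules the examples are assumed to import.
REQUIRED_MODULES = ["requests", "urllib3", "socks"]


def missing_packages(modules=REQUIRED_MODULES):
    """Return the module names from the list that cannot be found."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


# Hypothetical usage:
#   missing = missing_packages()
#   if missing:
#       print("Please install:", ", ".join(missing))
```

An empty list means all of the modules can be imported.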

I tested all 4 cases using HTTP, HTTPS and SOCKS proxies. I checked the nginx log on a real server to make sure that the IP hitting the HTTPS website was the proxy's. The only thing I didn't try was proxies that require authentication.