Anonymous Web Scraping using Python and Tor

Requirements

Python

Requests is an HTTP library written in Python. It is a wrapper over urllib* and is very Pythonic to use.

$ sudo pip install requests

The socks module imported in the interpreter session further down is not part of the standard library. It is assumed here to come from the PySocks package, which provides setdefaultproxy and PROXY_TYPE_SOCKS5.

$ sudo pip install PySocks

Fedora

$ sudo yum install tor

Ubuntu

$ sudo apt-get install tor

After completing the required installations, start Tor and let it run in the background with the following command.

$ tor &

By default, Tor listens for SOCKS connections on port 9050 unless configured otherwise. You can check that the process is listening with the netstat command.

$ netstat -tupln

Look for a process listening on port 9050.
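
The same check can be sketched from Python, which is handy when a scraper should refuse to start if Tor is not running. This is a minimal sketch, not part of the original walkthrough: the helper name is_port_open is hypothetical, and it assumes Tor is on 127.0.0.1 at the default port 9050. It only confirms that something accepts TCP connections on that port, not that the listener is actually Tor.

```python
import socket


def is_port_open(host="127.0.0.1", port=9050, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError (refused, timed out, unreachable)
        # when nothing accepts the connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Assumption: Tor's SOCKS listener is on the default local port 9050.
    print("Tor SOCKS port reachable:", is_port_open())
```
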

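Now for the programming part, open the Python interpreter and run the following commands. The default proxy only takes effect once socket.socket is replaced with socks.socksocket; the final request to http://icanhazip.com is assumed from the description that follows.

>>> import socks
>>> import socket
>>> import requests
>>> socks.setdefaultproxy(proxy_type=socks.PROXY_TYPE_SOCKS5, addr="127.0.0.1", port=9050)
>>> socket.socket = socks.socksocket
>>> print(requests.get("http://icanhazip.com").text)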

Try opening http://icanhazip.com in your browser as well. This website shows your public IP address. You should see a different IP address in the browser than in the program output, because the program's request leaves through a Tor exit node. You can now adapt the script above and build your web scraping or web crawling program around it, so that your Python program runs anonymously on the internet.
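
As an alternative to patching the socket module for the whole process, Requests can be pointed at Tor one request at a time through its proxies argument. The sketch below makes several assumptions: Requests is installed with SOCKS support (pip install "requests[socks]"), Tor is listening on 127.0.0.1:9050, and the helper names tor_proxies and fetch_public_ip are hypothetical. The socks5h scheme asks the proxy to resolve hostnames, so DNS lookups go through Tor as well.

```python
def tor_proxies(host="127.0.0.1", port=9050):
    """Build a Requests-style proxies mapping for a local Tor SOCKS port."""
    # socks5h (rather than socks5) makes the proxy resolve hostnames.
    url = "socks5h://{}:{}".format(host, port)
    return {"http": url, "https": url}


def fetch_public_ip(proxies=None, timeout=30):
    """Return the IP address that icanhazip.com reports for this request."""
    # Imported here so tor_proxies() can be used without Requests installed.
    import requests

    response = requests.get("http://icanhazip.com", proxies=proxies, timeout=timeout)
    response.raise_for_status()
    return response.text.strip()


if __name__ == "__main__":
    # The two addresses should differ when Tor is running.
    print("Direct :", fetch_public_ip())
    print("Via Tor:", fetch_public_ip(tor_proxies()))
```

Passing proxies per call keeps the rest of the program's network traffic untouched, which may or may not be what you want for a scraper. Patching socket.socket, as in the interpreter session, affects every connection made afterwards.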