Category: Python

I was curious how one could scrape Google’s cache to recover a website that was recently taken down. Say, for instance, you’re a real estate agent and your website was terminated by your previous hosting company.

I fired up an Amazon EC2 instance, placed the Python script in ~/python, and let it run for about an hour. Again, I am not sure if Amazon or Google will rage, but eventually Google will block the IP and you’ll get a 503 error. Keep an eye on this so you don’t keep hammering Google once the block kicks in. You can always run the script again later, after the IP block is removed, and it will resume where it left off.
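To make the "stop on 503, resume later" behaviour concrete, here is a minimal sketch of how such a loop could look. This is not the script from this post: the names `save_pages`, `filename_for`, and `default_fetch` are made up for illustration, and I'm assuming the simplest possible resume rule, which is "skip any page whose file already exists on disk".

```python
import hashlib
import os
import urllib.error
import urllib.request


def filename_for(url):
    # Hypothetical naming scheme: a stable hash of the URL, so a rerun
    # maps the same URL to the same file.
    return hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"


def default_fetch(url):
    # Plain standard-library download; raises urllib.error.HTTPError
    # for error responses such as 503.
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def save_pages(urls, dest_dir, fetch=default_fetch):
    """Save each URL into dest_dir, skipping pages already on disk.

    Returns the URLs still pending if a 503 stops the run, or an
    empty list when everything has been saved.
    """
    urls = list(urls)
    os.makedirs(dest_dir, exist_ok=True)
    for i, url in enumerate(urls):
        path = os.path.join(dest_dir, filename_for(url))
        if os.path.exists(path):
            continue  # saved by an earlier run, so resume past it
        try:
            body = fetch(url)
        except urllib.error.HTTPError as err:
            if err.code == 503:
                # Assumed to mean "IP blocked": stop instead of retrying.
                return urls[i:]
            raise
        with open(path, "wb") as fh:
            fh.write(body)
    return []
```

Calling `save_pages` a second time with the same list and the same directory only fetches what is still missing, so rerunning after the block lifts picks up where the previous run stopped. The `fetch` parameter is only there so the loop can be tried out without touching the network.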

TL;DR: On line 19, change search_site to your target site. Then go to line 48 and change ‘\’ to the destination directory; I used ‘/’.