Harvesting Google profiles

A few minutes ago, I saw an interesting tweet from Mikko H. Hypponen saying he had found out that all (yes, as in ALL, 35,513,445) Google profile addresses can be retrieved from a single XML file. I looked through it and, yep, he was quite right.

Well, all this information is going to be useful somehow, right? Right. In case it gets removed, here is a simple way to harvest the addresses before that happens:
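What follows is only a minimal sketch of the idea, not a definitive script. It assumes the XML file is a standard sitemaps.org sitemap index whose `<loc>` entries point to further XML sitemaps listing the profile URLs. `INDEX_URL` is a hypothetical placeholder, and the function names are made up for illustration. The comments mention BeautifulSoup; this sketch uses the standard library's `xml.etree.ElementTree` instead to stay dependency-free, and it is written for Python 3.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical placeholder: replace with the address of the real XML index.
INDEX_URL = "http://example.com/profiles-sitemap.xml"

# Namespace used by the sitemaps.org protocol (assumed to apply here).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def extract_locs(xml_data):
    """Return the text of every <loc> element in a sitemap or sitemap index."""
    root = ET.fromstring(xml_data)
    return [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]


def fetch_bytes(url):
    """Download a URL and return the raw response body."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def iter_profile_urls(index_url, fetch=fetch_bytes):
    """Yield every profile URL listed in the sitemaps the index points to."""
    for sitemap_url in extract_locs(fetch(index_url)):
        # Assumption: each sub-sitemap is itself plain (uncompressed) XML.
        for profile_url in extract_locs(fetch(sitemap_url)):
            yield profile_url


def harvest(index_url, out_path, fetch=fetch_bytes):
    """Write one profile URL per line to out_path and return the count."""
    count = 0
    with open(out_path, "w") as out:
        for profile_url in iter_profile_urls(index_url, fetch):
            out.write(profile_url + "\n")
            count += 1
    return count


# Example call (performs real network requests, so it is left commented out):
# harvest(INDEX_URL, "profiles.txt")
```

The `fetch` parameter is there so the parsing logic can be tried against local data before pointing it at a real server. If the sub-sitemaps turn out to be gzipped or plain text rather than XML, `iter_profile_urls` would need to be adapted.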

That’s it: save it, run it and wait 🙂 Not that I have used it, but I estimate that you would get around 1.7 GB worth of profile links.
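As a rough sanity check on that figure, the two numbers quoted in the post can be divided to get an average size per link. This assumes 1 GB means 10**9 bytes, and the variable names are only illustrative.

```python
TOTAL_PROFILES = 35_513_445   # profile count quoted in the post
ESTIMATED_BYTES = 1.7e9       # "around 1.7 GB", assuming 1 GB = 10**9 bytes

# Average number of bytes each stored link would take up.
bytes_per_link = ESTIMATED_BYTES / TOTAL_PROFILES
print(f"{bytes_per_link:.1f} bytes per link")
```

That works out to a little under 50 bytes per link, which seems plausible for a short profile URL stored one per line.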

Well, the juicy part is obviously harvesting the information from the profiles themselves. People are mentioning on Twitter that Google has been aware of this for a long time, or at least should have been. Thoughts on the potential implications of that harvesting will follow in a blog post to come.

Sorry, I had somehow missed your comment. The instructions from daneelrsixth are valid. You will of course need to have Python installed, along with BeautifulSoup (either via easy_install or via your distribution's package manager).

KR, copy the source code, open a text editor, paste it in and save the file as “google.py”. Then open the terminal, go to the directory where you saved the file and type “python google.py”. (PS: I hope you are using a *nix system.)