Email extraction is a popular task in web scraping. Here I review GSA Email Spider, a useful program designed for collecting emails, phone numbers and fax numbers from the web.

Some useful features of Email Spider

Extracts emails starting from a URL as well as from search results for a given keyword

Phone and fax numbers are collected too

Automated email sender

Harvests emails with the help of search engines (300+ included)

Supports https web sites

Supports SSL-only email providers (like Google Mail)

Allows using a proxy during crawling

Can send emails directly using an internal SMTP server

Analyzes JavaScript code to find hidden email addresses

Can bypass anti-spider protection (e.g. by using a random user agent string)

Collects emails with related extra information (e.g. addresses)

Has many filters for conditional extraction (like specifying keywords or excluding some domain names)
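To make the extraction and filtering features above more concrete, here is a minimal Python sketch of the general technique. It is not GSA Email Spider's own code: the function name `extract_emails`, its parameters and the sample text are all assumptions made for illustration, and the regular expression is deliberately simpler than full email address syntax.

```python
import re

# A deliberately simple pattern; real-world address syntax is broader than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text, excluded_domains=(), required_keyword=None):
    """Return unique addresses found in text, in order of first appearance.

    Hypothetical helper: the two filter parameters mimic the kind of
    conditional extraction described in the feature list.
    """
    # Keyword filter: skip the page entirely if the keyword is absent.
    if required_keyword and required_keyword.lower() not in text.lower():
        return []

    excluded = {domain.lower() for domain in excluded_domains}
    seen = set()
    result = []
    for match in EMAIL_RE.findall(text):
        address = match.lower()
        domain = address.rsplit("@", 1)[1]
        # Domain filter and de-duplication.
        if domain in excluded or address in seen:
            continue
        seen.add(address)
        result.append(address)
    return result


# Made-up sample page text.
page = "PHP help: contact Bob@Example.com or sales@spam.test, also bob@example.com."
print(extract_emails(page, excluded_domains=["spam.test"], required_keyword="php"))
```

A real spider would also have to fetch pages, follow links and handle addresses hidden in JavaScript, none of which this sketch attempts.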

How it works

The program has a simple dialog-based interface. First, as I mentioned earlier, you choose between starting with a keyword or with a URL. Then you can tune the extraction process with dozens of settings in the Options tab:

For example, to narrow your email search you can set up an additional filter on which emails to scrape:

After everything is set up, press the Start button to begin the extraction. When I ran the demo version I used the keywords “php”, “scrape” and “cookie”, and the results were as follows:

Extraction time with 1,000 results per search engine was approx. 28 hours.

227,555 URLs were searched

49,071 emails and phone numbers were gathered
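To put these figures in perspective, the short Python sketch below derives rough rates from them. The variable names are my own, and since the run time is only approximate, the results should be read as ballpark values.

```python
# Figures reported for the demo run (the duration is approximate).
hours = 28
urls_searched = 227_555
items_found = 49_071  # emails and phone numbers combined

urls_per_hour = urls_searched / hours
urls_per_second = urls_per_hour / 3600
items_per_url = items_found / urls_searched

print(f"{urls_per_hour:.0f} URLs per hour")
print(f"{urls_per_second:.2f} URLs per second")
print(f"{items_per_url:.2f} items per URL")
```

In other words, the spider processed a little over two URLs per second and found roughly one email or phone number for every four to five URLs.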

Though the demo version is limited to only 1,000 search results per search engine, I was still impressed with the total number of emails the spider could extract.

Auto mailer

Email Spider not only extracts emails from the web but can also automatically send messages to the extracted addresses (this feature is available in the full version only). The settings for this feature are shown in the picture below:

Conclusion

GSA Email Spider is a really good helper for email and phone number extraction. Though simple, it is smart enough (thanks to its large number of options) to sift out only the relevant information. As an additional feature, the built-in auto mailer allows you to easily send several emails based on a single template.
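For readers curious how sending several emails from a single template works in principle, here is a minimal Python sketch using only the standard library. It does not reflect how GSA Email Spider implements its mailer: the template text, the `build_messages` helper and all addresses are hypothetical, and the sketch only builds the messages without sending them.

```python
from email.message import EmailMessage
from string import Template

# Hypothetical template; $name and $topic are filled in per recipient.
TEMPLATE = Template("Hello $name,\n\nThis is a note about $topic.\n")


def build_messages(sender, recipients, topic):
    """Build one EmailMessage per (name, address) pair from the template."""
    messages = []
    for name, address in recipients:
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = address
        msg["Subject"] = f"About {topic}"
        msg.set_content(TEMPLATE.substitute(name=name, topic=topic))
        messages.append(msg)
    return messages


# Made-up sender and recipients.
recipients = [("Ann", "ann@example.org"), ("Raj", "raj@example.org")]
for message in build_messages("me@example.com", recipients, "web scraping"):
    print(message["To"], "|", message["Subject"])
```

Actually delivering such messages would require an SMTP connection, for example through Python's `smtplib`, which is left out here.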