theHarvester is a neat information-gathering tool used by both ethical and non-ethical hackers to scrape up emails, subdomains, hosts, employee names, open ports, and banners from public sources such as popular search engines, PGP key servers, and the Shodan database. It is particularly useful during the reconnaissance phase for gathering Open Source Intelligence (OSINT).

The information provided on the cybersecurityman is for educational purposes only. I am in no way responsible for any misuse of the information provided. All the information here is meant to provide the reader with the knowledge to defend against hackers and prevent the attacks discussed here. At no time should any reader attempt to use this information for illegal purposes.

This program comes pre-installed in Kali Linux and was created by Christian Martorella. The current version is 3.0 (edit: I realized after I completed this post that I was using version 2.7.2 the whole time, so if you need to update theHarvester, you can find it here: https://github.com/laramies/theHarvester). Here is a short list of some of the options theHarvester has to offer.

This isn’t an extensive list of options, which makes the tool easy to use. Notice all the data sources available through the -b argument, such as Baidu, Bing, Google, GoogleCSE, LinkedIn, PGP, Twitter, vhost, VirusTotal, Netcraft, Yahoo, and so forth. We can also perform active techniques, including DNS brute force attacks, DNS reverse lookups, and DNS Top-Level Domain (TLD) expansions. Additionally, theHarvester comes with some examples to assist users in crafting useful commands.

In the first example, theharvester -d microsoft.com -l 500 -b google -h myresults.html tells theHarvester to target microsoft.com, search for any information it can find using the Google search engine, and discover available hosts by querying the Shodan database. The -l argument limits the Google search to 500 results, and the myresults.html at the end of the command saves the results to an HTML file.

The second command, theharvester -d microsoft.com -b pgp, searches for e-mail accounts for the domain microsoft.com in a PGP server.

The third command on the list, theharvester -d microsoft -l 200 -b linkedin, tells theHarvester to search through the first 200 results of a Microsoft search on LinkedIn. This would identify a list of employees who either currently or previously worked for Microsoft.

And the final command, theharvester -d apple.com -b googleCSE -l 500 -s 300, limits the search results for apple.com to 500 using Google’s Custom Search Engine, but starts at result 300 due to the -s argument.
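Since theHarvester is written in Python, the -l (limit) and -s (start) arguments essentially define a window of paged search results. Here is a rough, hypothetical sketch of that logic (the function name is mine, not theHarvester's actual code):

```python
# Hypothetical sketch: how a limit (-l) and starting offset (-s)
# might translate into paged requests against a search engine.
def result_pages(limit, start=0, page_size=100):
    """Yield (offset, count) pairs covering results [start, start + limit)."""
    offset = start
    end = start + limit
    while offset < end:
        count = min(page_size, end - offset)
        yield offset, count
        offset += count

# theharvester -d apple.com -b googleCSE -l 500 -s 300 would roughly
# correspond to fetching these windows of results:
pages = list(result_pages(limit=500, start=300))
```

So a limit of 500 starting at 300 means the tool walks results 300 through 799, one page at a time.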

Hopefully, this has made theHarvester syntax a little easier to understand. So, let’s work through a couple of examples on our own. I won’t be able to cover everything theHarvester can do, but I will try to cover most of its features.

Gathering LinkedIn Users

Assume I am a penetration tester authorized to work for Apple. To a penetration tester gathering OSINT, a job site or social media site is a treasure trove. On job sites such as LinkedIn, users voluntarily and publicly submit all types of information about themselves: personal data, professional work history, education, contact information, interests, hobbies, and so forth.

Open up a terminal and use the command theharvester -d apple -l 100 -b linkedin.

This command searches for LinkedIn users who are affiliated with Apple, Inc.

These are all LinkedIn users affiliated with Apple in some way. Keep in mind that this is all publicly-available information. In this list of LinkedIn users, we could have Apple data scientists, Apple data engineers, Apple managers, or even people just interested in Apple.

Gathering E-mail Addresses

Or, let’s say I’m working for The Guardian and want to gather email addresses of journalists. I can do this using the command theharvester -d theguardian.com -b pgp.

This command tells theHarvester to search a PGP key server, which stores the public keys used for email encryption, for email accounts with the domain name “theguardian.com”.
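Under the hood, this kind of harvesting boils down to pulling text from a public source and extracting every address that matches the target domain. A minimal, hypothetical sketch of that step (not theHarvester's actual parser, and the sample text below is made up):

```python
import re

def extract_emails(text, domain):
    """Return unique email addresses ending in @<domain>, in order found."""
    pattern = re.compile(r"[\w.+-]+@" + re.escape(domain))
    found = []
    for match in pattern.findall(text):
        if match not in found:
            found.append(match)
    return found

# Fabricated text standing in for a PGP key-server search page
sample = """
pub  2048R/ABCD1234  jane.doe@theguardian.com
pub  4096R/EF567890  politics.desk@theguardian.com
pub  2048R/11112222  someone@example.org
"""
emails = extract_emails(sample, "theguardian.com")
```

Note that the off-domain address is ignored; filtering on the target domain is what keeps the results relevant.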

The output is cut, but this command provides a very long list of email addresses.

Using Microsoft’s Bing for E-mails and Hostnames

Many users like to use Google, but you can also use other popular search engines, such as Bing. Let’s get a list of email addresses and hostnames for UMD using a Bing search.

We can use the command theharvester -d umd.edu -l 200 -b bing. The -l argument limits the search to 200 results.

The results give us quite a few emails to work with, but also several domain names and their corresponding IP addresses.

I had a lot of success with this feature until it came time to take a few snapshots. Unfortunately, theHarvester stopped working. Maybe I’ll update this section if it starts working again.

DNS Brute Force Attacks

Users also have the ability to conduct DNS brute force attacks, which query the target domain using a wordlist file. For example, we can use the command theharvester -d google.com -c -b google.

The -c argument is used to conduct a DNS brute force attack against google.com. With this command, we get the following subdomains.

By default, this command uses the “dns-names.txt” wordlist, which is found in /usr/share/theHarvester/. If an error occurs during this attack, it’s likely because of a small error in the configuration. Open the “dnssearch.py” file in /usr/share/theHarvester/discovery/ and locate where it says “class dns_force()”. Use the “find” tool to make this easier. After you locate this section, change the line “self.file = dns-names.txt” to “self.file = /usr/share/theHarvester/dns-names.txt”. This should fix the error.
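The brute force itself is conceptually simple: prepend each word from the wordlist to the target domain and keep whatever resolves. A simplified sketch of that idea (loosely mirroring what the -c option does, but not theHarvester's actual code):

```python
import socket

def brute_force_dns(domain, wordlist):
    """Try each candidate subdomain; keep the ones that resolve to an IP.

    Simplified sketch of a wordlist-driven DNS brute force, the
    technique behind theHarvester's -c option and dns-names.txt.
    """
    found = {}
    for word in wordlist:
        candidate = f"{word}.{domain}"
        try:
            found[candidate] = socket.gethostbyname(candidate)
        except socket.gaierror:
            pass  # candidate does not resolve; move on
    return found

# Example (requires network access and DNS):
# brute_force_dns("google.com", ["www", "mail", "nosuchsub123"])
```

The longer and better-targeted the wordlist, the more subdomains this turns up, which is exactly why the default dns-names.txt matters.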

Helpful Tip

If you really want to get the most out of theHarvester, you can use the -b all argument to use all the data sources available when gathering information on your target.
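Conceptually, -b all just runs the same harvest against every configured data source and merges the results. A hypothetical sketch of that loop (the source list below is an illustrative subset, and fake_harvest is a stub standing in for real network queries):

```python
# Illustrative subset of theHarvester's data sources
SOURCES = ["baidu", "bing", "google", "linkedin", "pgp", "twitter"]

def harvest_all(domain, harvest_one):
    """Run harvest_one(domain, source) for each source and dedup results."""
    merged = []
    for source in SOURCES:
        for item in harvest_one(domain, source):
            if item not in merged:
                merged.append(item)
    return merged

# Stub harvester so the sketch runs without network access
def fake_harvest(domain, source):
    return [f"admin@{domain}", f"{source}-contact@{domain}"]

results = harvest_all("example.com", fake_harvest)
```

Deduplication matters here: many sources will surface the same addresses, so merging across all of them yields broader coverage without repeated entries.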