This is a project that I began about a year ago when I started learning Perl. Since then, it has been rotting in a folder until today, when I fixed up some of the code, added comments, and so on.

The web crawler will start from whatever URL you feed it. After that, you can grab a cup of coffee and sit back, because it will run until there are no more links left to explore on the World Wide Web... or at least that's the idea. It scans each web page for e-mail addresses and puts them in a neat little file for you (because why the fuck not?). I'm unsure how useful it is unless you sell e-mail lists, but I'm posting it mostly so that people who know better can critique my code. And who knows? Maybe someone here will find a use for it.
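To make the "run until there are no more links left" idea concrete, here is a minimal sketch of that kind of loop. It is not the crawler's actual code: the %links_on hash is a made-up stand-in for the web, and crawl_order is a hypothetical helper name. It only shows how a queue plus a "seen" hash keeps the loop from visiting the same page twice.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the web: each page maps to the links it contains.
my %links_on = (
    'a.example' => [ 'b.example', 'c.example' ],
    'b.example' => [ 'a.example', 'c.example' ],
    'c.example' => [],
);

# Visit every page reachable from $start exactly once, breadth-first.
sub crawl_order {
    my ($start, $graph) = @_;
    my @queue = ($start);          # pages still waiting to be crawled
    my %seen  = ($start => 1);     # pages already queued at some point
    my @order;
    while (@queue) {
        my $url = shift @queue;
        push @order, $url;
        for my $link (@{ $graph->{$url} || [] }) {
            # Queue a link only the first time it is encountered.
            push @queue, $link unless $seen{$link}++;
        }
    }
    return @order;
}

print join(' ', crawl_order('a.example', \%links_on)), "\n";
```

In a real crawl the inner loop would fetch the page and pull the links out of its HTML; the queue and the "seen" hash stay the same.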

########### -- STARTING THE MAIN LOOP -- ##########
MAIN: while (@urls) {
    # Set the number of microseconds to sleep between each page crawled.
    # This is randomized to a value between 12 and 17 seconds by default.
    $sleep = rand(5_000_000) + 12_000_000;

            # Print to both the email database file and the console.
            select(STDOUT);
            print $&."\n";
            open $emailfile, ">>", "emails.txt" or die "Cannot open emails.txt: $!";
            select($emailfile);
            if ($firstmail == 0) {
                # Write a timestamp header before the first address of this run.
                print "\n-------------------- $curtime --------------------\n";
                $firstmail = 1;
            }
            print $&."\n";
            close $emailfile;
            # Remember the address so it is not recorded twice.
            $oldemail{$&} = 1;
        }
    }

    # The loop is almost done. After a successful crawl through a link, it sleeps
    # for the amount of time set at the start of the loop.
    select(STDOUT);
    print "\nProgram waits for ". $sleep/1000000 ." seconds before next request.\nThis is to prevent blacklisting.\n";
    usleep($sleep);
}
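For anyone who wants to play with the e-mail recording step on its own, here is a rough sketch of the same idea using lexical filehandles instead of select(). It is an illustration under assumptions, not the code above: new_emails is a hypothetical helper, the regex is a deliberately simple pattern that will miss some valid addresses, and the "file" is an in-memory string so the snippet runs without touching disk.

```perl
use strict;
use warnings;

my %seen;   # plays the role of %oldemail: addresses already recorded

# Hypothetical helper: return the addresses in $text not seen on earlier calls.
sub new_emails {
    my ($text) = @_;
    my @new;
    while ($text =~ /\b([A-Za-z0-9._%+-]+\@[A-Za-z0-9.-]+\.[A-Za-z]{2,})\b/g) {
        my $email = lc $1;                       # treat Bob@ and bob@ as the same
        push @new, $email unless $seen{$email}++;
    }
    return @new;
}

# Append to an in-memory "file" here; a real run would open emails.txt
# in append mode instead.
my $log = '';
open my $out, '>>', \$log or die "Cannot open log: $!";
for my $page ('Mail bob@example.com', 'Mail Bob@Example.com or alice@example.org.') {
    for my $email (new_emails($page)) {
        print "$email\n";          # console
        print {$out} "$email\n";   # file, without switching the default handle
    }
}
close $out;
```

Printing to an explicit handle avoids having to remember to select(STDOUT) again afterwards.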

# Function to be called if the user does not use the terminal flags or there is a mistake in the URL
sub getInput {
    # Get the URL to start crawling
    print "\nPlease enter a URL to start crawling. \n(Example: 'http://google.com' or 'yahoo.com')\n\n";
    print "http://";
    # Read from STDIN explicitly; a bare <> would try to read any
    # command-line arguments as files.
    $startingURL = <STDIN>;
    chomp $startingURL;

    print "\n\nDo you want the links that appear to be only the ones in the same domain that you typed in?\n".
          "This is useful to avoid following links to advertising sites as these usually do not contain e-mail addresses\n\n".
          "Domain = $domain \n\n".
          "1) Yes\n".
          "2) No\n".
          "3) Exit Program\n\n";