Thursday, March 28, 2013

Introducing dumpmon: A Twitter-bot that Monitors Paste-Sites for Account/Database Dumps and Other Interesting Content

TL;DR

I created a Twitter-bot which monitors multiple paste sites for different types of content (account/database dumps, network device configuration files, etc.). You can find it on Twitter and on Github.

Introduction

Paste-sites such as Pastebin, Pastie, Slexy, and many others offer users (often anonymously) the ability to upload raw text of their choice. This is helpful in many scenarios, such as sending a crash report to someone or pasting temporary code. However, in addition to some people not being careful with what they upload (leaving passwords and other sensitive data in the text), attackers have started using these sites to share post-compromise data, including user account data, database dumps, URLs of compromised sites, and more.

Since there are so many users uploading text to these sites, it's often difficult to find these interesting files manually. While techniques such as Google Alerts can be applied, the results are often a day or two old and are sometimes deleted. This prompted me to create a tool which monitors these sites in "real-time" (less than a minute of delay for the slowest sites) for specific expressions, and then automatically ranks, aggregates, and posts the results to Twitter for further analysis. I call this tool DumpMon.

Similar Tools

There are a couple of similar tools available which do essentially the same thing as dumpmon - with just a few key differences:

@PastebinLeaks - with its last tweet on December 16, 2011, PastebinLeaks no longer appears to provide pastebin monitoring. However, I really like how it integrated quite a few different expressions, such as one for HTTP passwords, Cisco and Juniper configuration files, etc. Unfortunately, as far as I can tell PastebinLeaks is closed-source.

@PastebinDorks - This bot (intentionally closed-source, still in "alpha") is still active and posts a few tweets per day. This bot appears to be primarily concerned with account credential dumps. I think the idea of assigning a numerical rank to a tweet could help determine the usefulness of a paste, but it makes the actual data found unclear.

My goal with dumpmon is to create the "next step" of paste site monitoring with the following key features:

Open-Source. I'm always open to contributions via Github. I'm working on creating all the documentation, which should be up soon.

In the future, I would like to look into implementing the following features:

Automatically run found hashes through large wordlists and post the results

Allow users to tweet a regular expression they want monitored to the bot. The bot will then tweet them the paste once it finds a match

Search for interesting details from other sources of information (such as popular forums, etc.) instead of just paste sites

Allow caching of "most interesting" results to prevent deletion

Create daily/monthly reports that show the amount of detected data for aiding in password research

With those features outlined - let me quickly show you how I built the bot. Don't care? Just go straight to the bot here.

Bot Architecture

Here is the general architecture of the bot that's currently running:

As you can see, each site runs in its own separate thread, which monitors for new pastes, downloads each one, and matches it against a series of regular expressions. Then, if it finds a match, it builds and posts a tweet that looks like the following:
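The per-site loop described above can be sketched roughly like this. Note that the regexes, helper names, and output format are illustrative stand-ins, not dumpmon's actual code:

```python
import re
import time

# Hypothetical patterns standing in for dumpmon's real regex set.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
MD5_RE = re.compile(r"\b[0-9a-f]{32}\b", re.IGNORECASE)

def analyze_paste(raw_text):
    """Match a downloaded paste against the regexes; return match counts or None."""
    emails = set(EMAIL_RE.findall(raw_text))   # emails are de-duplicated
    hashes = MD5_RE.findall(raw_text)
    if not emails and not hashes:
        return None
    return {"emails": len(emails), "hashes": len(hashes)}

def monitor_site(name, fetch_new_pastes, delay=30):
    """One thread per site: poll for new pastes, analyze each, report hits."""
    while True:
        for paste_id, raw in fetch_new_pastes():
            result = analyze_paste(raw)
            if result:
                print(f"[{name}] {paste_id}: {result}")  # stand-in for posting a tweet
        time.sleep(delay)

# Each site would run in its own thread, e.g.:
# threading.Thread(target=monitor_site, args=("pastebin", fetch_pastebin),
#                  daemon=True).start()
```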

If hashes are found, it will also include the number of hashes as well as the ratio of emails to hashes. The "Keywords" attribute gives an approximate ratio of "positive keywords" found out of a given list (such as "Target: ", "available dbs", "member_id", "hacked by", "database: ", etc.), subtracting value for each regex matched from a blacklist. It's just another metric to help determine whether a paste is "interesting." It should also be noted that the emails found are unique.
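As a rough illustration, the keyword metric might be computed like this. Only the positive keywords come from the list above; the blacklist entries and the exact scoring arithmetic are assumptions:

```python
import re

# Positive keywords are the ones listed above; the blacklist entries and the
# scoring formula itself are assumed for illustration.
POSITIVE = ["target: ", "available dbs", "member_id", "hacked by", "database: "]
BLACKLIST = [r"lorem ipsum", r"stack trace"]

def keyword_score(text):
    """Fraction of positive keywords present, reduced per blacklist match."""
    lowered = text.lower()
    score = sum(kw in lowered for kw in POSITIVE) / len(POSITIVE)
    score -= sum(bool(re.search(p, lowered)) for p in BLACKLIST) / len(BLACKLIST)
    return max(score, 0.0)
```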

Don't Bite the Hand that Feeds

It's commonly known that the most time-expensive part of web scraping is actually fetching the content. While I could speed up this process by using an event-driven framework such as Gevent, Twisted, or others, I wanted to do my best to respect the sites hosting the content. Also, I didn't want the tool to get temporarily blocked... for a third time (my bad, Pastebin). With this in mind, my bot uses the following algorithm to fetch only new pastes under polite time constraints.
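In that spirit, here is a minimal sketch of a polite polling loop; the function name, delay values, and back-off strategy are illustrative assumptions, not dumpmon's actual algorithm:

```python
import time

def polite_poll(fetch_listing, sleep=time.sleep, base_delay=30, max_delay=300):
    """Yield only never-before-seen paste IDs, backing off politely when idle."""
    seen = set()
    delay = base_delay
    while True:
        new_ids = [pid for pid in fetch_listing() if pid not in seen]
        if new_ids:
            seen.update(new_ids)
            delay = base_delay                 # fresh content: poll at the base rate
            yield from new_ids
        else:
            delay = min(delay * 2, max_delay)  # nothing new: double the wait
        sleep(delay)
```

The back-off keeps requests sparse when a site is quiet, while the `seen` set avoids re-downloading pastes the bot has already processed.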

Comments

Thanks for the comment! The tools certainly do look similar - I can't believe I hadn't seen your tool before I started developing dumpmon! Looks like "great minds think alike".

And I like a lot of what you've done with pystemon! Unfortunately, I think the structures of our solutions are so different that it would be difficult to simply combine the two into one product. Also, I looked at the other sites, and all of them had around one paste every four hours, mostly "garbage pastes". I'll keep watch and see if it'd be worth the effort of making a quick module for them.

Although, with all that being said - I'm always open to contributions! I try to give credit where credit is due (as will be seen in a blog post shortly as well as on the Github "contributors" section). If you have any ideas to make dumpmon better, please don't hesitate to let me know!

You actually found something I've been meaning to do. I had added the DB settings to my actual settings.py file but never updated the example settings file. I've done that now, and you should see an update.

If you do not want to use a Mongo database, just set "USE_DB" to False, and you should be good to go! Adding DB support is a new feature, and I will be updating the readme shortly.
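For reference, the relevant part of settings.py might look like this. `USE_DB` is the setting named above; the Mongo connection names below are hypothetical placeholders, not necessarily dumpmon's actual names:

```python
# settings.py (sketch)
USE_DB = False            # set to True to store pastes in MongoDB

# Only needed when USE_DB is True:
MONGO_HOST = "localhost"  # hypothetical setting name
MONGO_PORT = 27017        # hypothetical setting name
```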

I think the error may be due to the Twitter Python library you have installed. There are two competing ones, and I started using one - but switched to the other when maxme and I discussed Python 3 compatibility. The library I am using now (which is also Python 3 compatible) is here: https://github.com/sixohsix/twitter

Let me know if that helps fix your problem! If you have the other one installed, you may need to remove it first, since I believe they are imported with the same name.

I have had issues configuring it for a few reasons... I have a question: how long before it starts outputting to Twitter? When I run sudo python dumpmon.py, I get output that looks like email addresses, but that is all I am getting.

I've been to Twitter and created an app with my tokens, secrets, etc.

If you start seeing output of email addresses, then the script is likely working properly. It's really difficult to debug issues without more information, but my best guess would be that there is an issue with the Twitter configuration in use. Can you use those same oauth creds to send a test tweet and see if it shows up? If it does, then you can assume it is an issue with the script, and I can look into it further.
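If it helps, a test tweet with the sixohsix/twitter library (the one the script uses) can be sent roughly like this; the function wrapper and placeholder keys are illustrative:

```python
# Minimal credential check using the sixohsix/twitter library. The try/except
# lets this module load even where the library is not installed.
try:
    from twitter import Twitter, OAuth
except ImportError:
    Twitter = OAuth = None

def send_test_tweet(token, token_secret, consumer_key, consumer_secret):
    """Post a one-off status update to verify the OAuth credentials."""
    t = Twitter(auth=OAuth(token, token_secret, consumer_key, consumer_secret))
    return t.statuses.update(status="dumpmon credential test")

# send_test_tweet("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET",
#                 "CONSUMER_KEY", "CONSUMER_SECRET")  # fill in your app's keys
```

If the tweet shows up, the credentials are fine and any remaining problem is in the script's configuration.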

Hello there! Recently I came across this wonderful tool and installed it on my Linux box immediately. Everything is perfect, but I don't know how to save the pastes as they are identified (you said earlier to edit the code in helper.py, but I don't know how). Any feedback will be appreciated. Thanks!