Entries in tektip
(15)

UPDATE: @kylemaxwell has accepted the pull of this script into the main maltrieve repo!

*Note: For starters, we need to say thanks as usual to technoskald and point you in the right direction to the Maltrieve Code on GitHub.

Overview

We have posted Maltrieve articles a couple of times in the past, but the capabilities of this application continue to amaze us, so we thought we'd add to our past contributions. During our initial build of a malware collection box (malware zoo creation) we utilized the standard approach of running Maltrieve throughout the day via a cron job. As most simple things do, this became rather complex, because Maltrieve's output is not categorized in any way, so finding what you're looking for is, shall we say, difficult at best. This article discusses a categorization method to help you organize your malware zoo so that it is manageable.

If you would prefer this article in video format, it is provided as well:

Getting started

The box containing the malware repository is a standard Precise Pangolin Ubuntu distro (12.04 LTS), so no big tricks or hooks here. Maltrieve is installed in the standard fashion, but a 1TB drive is used to store the retrieved malware. The box has 3TB of space for later use, but for now we'll deal with just the 1TB drive. The malware repository is mounted at /media/malware/maltrievepulls. All scripts utilized (including the Maltrieve python scripts) are located at /opt/maltrieve. Again, nothing flashy in any of this, so it should be easy for you to get your box set up quickly if you'd like.

Running Maltrieve Consistently

To begin the build of the malware repository, we wanted to run the maltrieve scripts hourly so that the directory would fill with new and interesting malware consistently and quickly. This screamed “crontab”, so we fired up a terminal and ran sudo crontab -l and then sudo crontab -e so that we could edit the crontab. Our initial entry was as follows:
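The entry itself was shown as a screenshot in the original post; a hypothetical reconstruction of the two crontab lines described below is sketched here (the maltrieve.py option and the stamp-file path are assumptions, not the original entry):

```
# m h dom mon dow  command
0 * * * * python /opt/maltrieve/maltrieve.py -d /media/malware/maltrievepulls
0 * * * * date >> /home/user/maltrievecron.log
```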

This simply tells the system to run the maltrieve.py python script on an hourly basis and send the results to the /media/malware/maltrievepulls directory for safe storage. The second entry adds a little stamp to a file in my home directory so I can ensure the cron job is running every hour – you can obviously leave this statement out if you see fit. In any case, we quickly noticed that the Maltrieve app was doing its job, and we went about our business allowing the box to do what we asked. We were soon swimming in malware and were ready to start analyzing to our hearts' delight when we ran into the problem!

The Problem

Maltrieve does exactly what it's told and it does it well – find malware from specific sites and put it in a directory of your liking. And it finds LOTS OF MALWARE if you keep running it as we did in hopes of having a massive store. However, the files are given a hashed name that has very little use to the human eye, and they are just plopped merrily into the directory you choose when you run the maltrieve.py python script. It became quite tedious to run the file command on files that just “looked” interesting based on a hashed filename that gave little meaning to what the file might be in terms of format, or even payload. A quick look could allow you to do some judging by filesize, but basic command line sorting, grepping, awking, and loads of other tools were needed to try and fix the problem. These methods were simply tedious, and once we had hundreds of GBs of malware, it became downright no fun any more. The picture below will show you a glimpse of the problem.

Hardly the beacon of light for finding what you're looking for from your malware repository.

Running the file command on a few of these things starts to show some potential, though, because the output looks like:

So here we find that we have two pieces of malware: one is a Portable Executable for a Windows box and the other is a Zip archive. This is a very nice start, but these were just two needles in a large and growing haystack, and the manual effort was laborious and downright daunting.

Bash to the Rescue

As coders love to do, our answer was to take the awesome product Maltrieve and throw some more code at it. My initial thought was to extend the python script, but since I pulled this from a GitHub repository I didn't want to modify the code and then have to “re-modify” it later if things were ever changed or upgraded. My answer was to create a small Bash shell script and run it to help categorize our malware repository. The requirements we set upon ourselves were to categorize the code into multiple directories based on the first word output from the file command and then further categorize that by separating the code by size. We decided that files from 0–50KB would be considered “small”, 51KB–1MB “medium”, 1MB–6MB “large”, and anything larger “xlarge”. It's a rather brutish method but it's something and it seems to work nicely. So in the end, we would want to see a directory tree that looked something like this:

--PE32

----small

----medium

----large

----xlarge

--Zip

----small

----medium

----large

----xlarge

and so on and so on.

Since we set up our maltrieve pulls to run hourly, we set the bash script – which we so obviously named maltrievecategorizer.sh – to run on every half hour, which allows Maltrieve to finish and then categorizes the latest findings. To make this happen, we cracked open crontab again with sudo crontab -e and added the following to the end of the file:

30 * * * * bash /opt/maltrieve/maltrievecategorizer.sh

which just says to run our bash script on the half hour of every day of the year, plain and simple.

The Bash Script

The maltrievecategorizer.sh bash script can be seen below. An explanation follows the script.

#!/bin/bash
# note: bash, not sh -- the script uses arrays and [[ ]]

smallstr="/small"
mediumstr="/medium"
largestr="/large"
xlargestr="/xlarge"

smallfile=50001
mediumfile=1000001
largefile=6000001

root_dir="/media/malware/maltrievepulls/"
all_files="$root_dir*"

for file in $all_files
do
  if [ -f "$file" ]; then
    outstring=($(file "$file"))
    stringsubone="${outstring[1]}"
    case $stringsubone in
      "a") stringsubone="PerlScript";;
      "very") stringsubone="VeryShortFile";;
      "empty") rm "$file"
               continue;;
      *) ;;
    esac
    if [ ! -d "$root_dir$stringsubone" ]; then
      mkdir -p "$root_dir$stringsubone"
      mkdir -p "$root_dir$stringsubone$smallstr"
      mkdir -p "$root_dir$stringsubone$mediumstr"
      mkdir -p "$root_dir$stringsubone$largestr"
      mkdir -p "$root_dir$stringsubone$xlargestr"
    fi
    filesize=$(stat -c %s "$file")
    if [[ "$filesize" -le "$smallfile" ]]; then
      mv "$file" "$root_dir$stringsubone$smallstr/"
    elif [[ "$filesize" -le "$mediumfile" ]]; then
      mv "$file" "$root_dir$stringsubone$mediumstr/"
    elif [[ "$filesize" -le "$largefile" ]]; then
      mv "$file" "$root_dir$stringsubone$largestr/"
    else
      mv "$file" "$root_dir$stringsubone$xlargestr/"
    fi
  fi
done

The first several lines simply create string literals for “small”, “medium”, “large”, and “xlarge” so we can use them later in the script, and then we create three variables, “smallfile”, “mediumfile”, and “largefile”, so we can compare file sizes later in the script. So far so good! The lines containing:

root_dir="/media/malware/maltrievepulls/"
all_files="$root_dir*"

for file in $all_files
do
  if [ -f "$file" ]; then

do nothing more than set our root directory to the Maltrieve storage location and then loop over every file in that directory.

outstring=($(file "$file"))

Creates a variable called outstring that is an array of words representing the output of the file command. So using the file command output from above, the outstring array would hold 818fc882dab3e682d83aabf3cb8b453b: PE32 executable (GUI) Intel 80386, for MS Windows. Each array element is separated by the spaces in the statement, so outstring[0] would store 818fc882dab3e682d83aabf3cb8b453b:, outstring[1] would store PE32, outstring[2] would store executable, and so on. We are only interested in outstring[1] to make our categorization possible.
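That word-splitting behavior is easy to see in isolation; a minimal sketch using the sample string from above:

```shell
# Bash splits an unquoted expansion on whitespace when building an array.
out="818fc882dab3e682d83aabf3cb8b453b: PE32 executable (GUI) Intel 80386, for MS Windows"
outstring=($out)
echo "${outstring[0]}"   # the hash plus trailing colon
echo "${outstring[1]}"   # PE32
echo "${outstring[2]}"   # executable
```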

Our next line in the script

stringsubone="${outstring[1]}"

creates a variable named stringsubone that contains just the string held in outstring[1] so using the example above, stringsubone would now hold PE32.

The case statement you see next

case $stringsubone in
  "a") stringsubone="PerlScript";;
  "very") stringsubone="VeryShortFile";;
  "empty") rm "$file"
           continue;;
  *) ;;
esac

fixes a couple of problems with the file command's output. For a piece of malware that is a Perl script, the file command outputs: a /usr/bin/perl\015 script. This may be helpful for a human, but it leaves our stringsubone variable holding the letter “a”, which means we would later be creating a categorization directory called “a”, which is LESS THAN USEFUL. The same problem occurs with very short files, where the file command outputs: very short file (no magic), so stringsubone would hold the word “very”, which isn't a great name for a directory either. The case statement takes care of these two cases and allows for better directory names. It also removes any empty files that are found.

The next lines

if [ ! -d "$root_dir$stringsubone" ]; then
  mkdir -p "$root_dir$stringsubone"
  mkdir -p "$root_dir$stringsubone$smallstr"
  mkdir -p "$root_dir$stringsubone$mediumstr"
  mkdir -p "$root_dir$stringsubone$largestr"
  mkdir -p "$root_dir$stringsubone$xlargestr"
fi

simply tell the script to look in the directory and if a directory that has the same name as stringsubone does not exist then create it. Then create the directory small, medium, large, and xlarge within that directory for further categorization. Using the PE32 example from above, basically this says “if there's no PE32 directory in this root directory, create one and create the sub-directories small, medium, large, and xlarge within that directory. If the PE32 directory already exists then do nothing”.
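The guard-plus-mkdir pattern can be tried on its own in a scratch directory (paths here are throwaway, not the real repository):

```shell
# Build the category tree only if it doesn't exist yet; mkdir -p is a no-op
# for directories that are already present.
root_dir="$(mktemp -d)/"
stringsubone="PE32"
if [ ! -d "$root_dir$stringsubone" ]; then
  mkdir -p "$root_dir$stringsubone/small"
  mkdir -p "$root_dir$stringsubone/medium"
  mkdir -p "$root_dir$stringsubone/large"
  mkdir -p "$root_dir$stringsubone/xlarge"
fi
ls "$root_dir$stringsubone"   # large  medium  small  xlarge
```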

The remaining lines look difficult but are simple:

filesize=$(stat -c %s "$file")
if [[ "$filesize" -le "$smallfile" ]]; then
  mv "$file" "$root_dir$stringsubone$smallstr/"
elif [[ "$filesize" -le "$mediumfile" ]]; then
  mv "$file" "$root_dir$stringsubone$mediumstr/"
elif [[ "$filesize" -le "$largefile" ]]; then
  mv "$file" "$root_dir$stringsubone$largestr/"
else
  mv "$file" "$root_dir$stringsubone$xlargestr/"
fi
fi

first we create a variable called filesize and, using the stat command, store the file's size in bytes. Then we determine whether the file counts as small, medium, large, or xlarge using the if and elif comparisons, and move the file into whichever directory matches.
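The bucketing can be walked through on a scratch file to confirm the thresholds behave as described (the thresholds are the ones from the script; the 1 KB sample file is just for illustration):

```shell
# Classify a 1 KB scratch file with the same size thresholds the script uses.
smallfile=50001; mediumfile=1000001; largefile=6000001
f="$(mktemp)"
head -c 1024 /dev/zero > "$f"          # write exactly 1024 bytes
filesize=$(stat -c %s "$f")            # GNU stat, as on the Ubuntu box
if   [ "$filesize" -le "$smallfile" ];  then bucket=small
elif [ "$filesize" -le "$mediumfile" ]; then bucket=medium
elif [ "$filesize" -le "$largefile" ];  then bucket=large
else                                         bucket=xlarge
fi
echo "$bucket"   # prints small
rm -f "$f"
```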

The results of this solution are in the picture below.

Conclusion

As you can plainly see, we now have the ability to quickly look for specific files in an easier fashion. If I am looking for a piece of malware that I know to be in HTML format that was over 50KB but less than 1MB, I can easily roam to HTML->medium and, with a one-liner file command and some grepping, find what I am looking for. I'm certain there are other methods to go about this process and probably WAY better methods of categorizing this directory, so if you have some ideas please shoot them our way and we'll give them a try and see if we can help the community.

In this episode of TekTip we take a look at a new tool I created called hashMonitor. hashMonitor will monitor specific Twitter and web resources for database dumps that include MD5, SHA1, or SHA256 hashes. Once found, hashMonitor will store the hashes in a local database which can then be used for cracking purposes.

Before getting into Moloch, I wanted to take a moment to say thank you to everyone who has been putting the word out there about Automater. Automater has gotten a lot of recognition lately (thanks, Reddit), which has been very motivating.

In this episode of Tektip, we take a closer look at one of the most exciting projects shown at Shmoocon 2013: Moloch.

"Moloch is an open source, large scale IPv4 packet capturing (PCAP), indexing and database system. A simple web interface is provided for PCAP browsing, searching, and exporting. APIs are exposed that allow PCAP data and JSON-formatted session data to be downloaded directly. Simple security is implemented by using HTTPS and HTTP digest password support or by using apache in front. Moloch is not meant to replace IDS engines but instead work along side them to store and index all the network traffic in standard PCAP format, providing fast access. Moloch is built to be deployed across many systems and can scale to handle multiple gigabits/sec of traffic."

"Andy Wick and Eoin Miller are members of AOL’s Computer Emergency Response Team. Andy Wick has more than 15 years of development experience at AOL. He has recently come into the CERT group and has begun developing tools for defense and forensics. Eoin Miller specializes in using IDS and full packet capture systems to identify drive by exploit kits and the traffic that feeds them (malvertising in particular). He regularly contributes the developed signatures to EmergingThreats/OISF and other groups."

Now I have put a lot of time into MASTIFF lately and haven't had a chance to get Moloch installed and configured properly quite yet. Luckily, the securabit.com team has given me access to their lab, where they have Moloch built out, along with many other products. A huge thank you to them, especially Mike Bailey (@mpbailey1911) who took the time to get Moloch installed and configured, with a decent amount of traffic pumping through it.

The version of Moloch I am using for this video is 0.7.3. Moloch gives the user an efficient method of browsing, querying, exporting, and visualizing packet data. Some commercial products I would say are similar in function are NetScout, NetWitness, and Cascade.

The power of Moloch, at least for what I will be using it for, is the ability to have immediate access to traffic data and pcaps that match custom filters on fields that are not normally queryable, such as HTTP header information. As Moloch uses a filter syntax very similar to Wireshark's, network analysts will quickly adapt to the product. On the visualization side, there is a Maltego-like feel: it shows how IP addresses and ports relate to each other based on the data you have filtered on.

As Moloch is still early in development I expect the product will evolve to incorporate even more features. My current Moloch wish list is:

Groups: Have the ability to create groups of IPs, services, and tags, and then query on those groups. An example: create a group for all of your DNS servers, then write a filter to the effect of "IP Source of Not 'DNS Servers' to External on UDP/53".

Save Filters: Would be nice to be able to save filters for future use.

Share saved filters: Share filters with other users.

Enjoy the screenshots, and check out the video for a more in-depth look.

1. First you will need a password dump to play with. There are several out in the wild. You can find some here:

http://www.skullsecurity.org/wiki/index.php/Passwords

For my demo I will use the recent (kinda) Yahoo dump.

2. Get the file ready for pipal:

You only want the passwords in the file you feed to Pipal; cut out the rest.

cat yahoousersandpass.txt | cut -d: -f 3 > yahoopassesonly.txt
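The cut invocation assumes colon-delimited dump lines with the password in the third field; a self-contained sketch with made-up records shows what it keeps:

```shell
# The dump format is assumed to be "id:email:password"; -d: sets the
# delimiter and -f3 keeps only the third field from each line.
printf '1:alice@example.com:123456\n2:bob@example.com:password\n' > users.txt
cut -d: -f3 users.txt > passesonly.txt
cat passesonly.txt
# prints:
# 123456
# password
rm -f users.txt passesonly.txt
```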

3. Run Pipal:

./pipal.rb ~/leakedpasswords/yahoopassesonly.txt -o yahoodemo

4. Analyze results

We analyzed 442837 passwords in this dump!

Total entries = 442837

Total unique entries = 342509

Here we see some pretty standard bad passwords:

Top 10 passwords

123456 = 1667 (0.38%)

password = 780 (0.18%)

welcome = 437 (0.1%)

ninja = 333 (0.08%)

abc123 = 250 (0.06%)

123456789 = 222 (0.05%)

12345678 = 208 (0.05%)

sunshine = 205 (0.05%)

princess = 202 (0.05%)

qwerty = 172 (0.04%)

Base words are the words found inside passwords that contain more than just that word:

Top 10 base words

password = 1374 (0.31%)

welcome = 535 (0.12%)

qwerty = 464 (0.1%)

monkey = 430 (0.1%)

jesus = 429 (0.1%)

love = 421 (0.1%)

money = 407 (0.09%)

freedom = 385 (0.09%)

ninja = 380 (0.09%)

sunshine = 367 (0.08%)

As we see in most password dumps, most people go with 8-character passwords. This is a common requirement and has been drilled into people for a while now, so no surprise there. 116 people had a one-character password, though? I usually don't try passwords of fewer than 4 characters when I password crack; guess I might need to bring them back in.

Password length (length ordered)

1 = 116 (0.03%)

2 = 70 (0.02%)

3 = 302 (0.07%)

4 = 2748 (0.62%)

5 = 5324 (1.2%)

6 = 79629 (17.98%)

7 = 65610 (14.82%)

8 = 119133 (26.9%)

9 = 65964 (14.9%)

10 = 54759 (12.37%)

11 = 21218 (4.79%)

12 = 21729 (4.91%)

13 = 2657 (0.6%)

14 = 1492 (0.34%)

15 = 837 (0.19%)

16 = 568 (0.13%)

17 = 262 (0.06%)

18 = 125 (0.03%)

19 = 88 (0.02%)

20 = 177 (0.04%)

21 = 10 (0.0%)

22 = 7 (0.0%)

23 = 2 (0.0%)

24 = 2 (0.0%)

27 = 1 (0.0%)

28 = 4 (0.0%)

29 = 2 (0.0%)

30 = 1 (0.0%)

Password length (count ordered)

8 = 119133 (26.9%)

6 = 79629 (17.98%)

9 = 65964 (14.9%)

7 = 65610 (14.82%)

10 = 54759 (12.37%)

12 = 21729 (4.91%)

11 = 21218 (4.79%)

5 = 5324 (1.2%)

4 = 2748 (0.62%)

13 = 2657 (0.6%)

14 = 1492 (0.34%)

15 = 837 (0.19%)

16 = 568 (0.13%)

3 = 302 (0.07%)

17 = 262 (0.06%)

20 = 177 (0.04%)

18 = 125 (0.03%)

1 = 116 (0.03%)

19 = 88 (0.02%)

2 = 70 (0.02%)

21 = 10 (0.0%)

22 = 7 (0.0%)

28 = 4 (0.0%)

23 = 2 (0.0%)

24 = 2 (0.0%)

29 = 2 (0.0%)

30 = 1 (0.0%)

27 = 1 (0.0%)

(Pipal also prints an ASCII bar chart of the length distribution here; its alignment was lost in formatting, but the counts are in the tables above.)

One to six characters = 88189 (19.91%)

One to eight characters = 272932 (61.63%)

More than eight characters = 169905 (38.37%)

About 39% only used lowercase alpha characters or only used numbers.

Only lowercase alpha = 146516 (33.09%)

Only uppercase alpha = 1778 (0.4%)

Only alpha = 148294 (33.49%)

Only numeric = 26081 (5.89%)

A common trend is for people to capitalize the first character, or add a number or special character to the end of a password.

First capital last symbol = 1259 (0.28%)

First capital last number = 17467 (3.94%)

While months were used in passwords a decent amount in this dump, it doesn't look like days made up many of them.

Months

january = 106 (0.02%)

february = 30 (0.01%)

march = 192 (0.04%)

april = 284 (0.06%)

may = 725 (0.16%)

june = 386 (0.09%)

july = 245 (0.06%)

august = 238 (0.05%)

september = 68 (0.02%)

october = 182 (0.04%)

november = 154 (0.03%)

december = 130 (0.03%)

Days

monday = 48 (0.01%)

tuesday = 15 (0.0%)

wednesday = 9 (0.0%)

thursday = 18 (0.0%)

friday = 47 (0.01%)

saturday = 6 (0.0%)

sunday = 30 (0.01%)

Months (Abbreviated)

jan = 1007 (0.23%)

feb = 172 (0.04%)

mar = 4719 (1.07%)

apr = 472 (0.11%)

may = 725 (0.16%)

jun = 798 (0.18%)

jul = 656 (0.15%)

aug = 504 (0.11%)

sept = 184 (0.04%)

oct = 425 (0.1%)

nov = 519 (0.12%)

dec = 404 (0.09%)

Days (Abbreviated)

mon = 4431 (1.0%)

tues = 16 (0.0%)

wed = 212 (0.05%)

thurs = 29 (0.01%)

fri = 479 (0.11%)

sat = 365 (0.08%)

sun = 1237 (0.28%)

Another common trend is for users to add the year of their birth, or wedding, or the current year to their password. While it may be surprising that 2010, 2011, and 2012 didn't have many hits, it makes sense if you take the source into account. The Yahoo dump comes from an old database that was used as part of a migration for a company Yahoo bought called Associated Content; that purchase occurred in 2010.

Includes years

1975 = 255 (0.06%)

1976 = 266 (0.06%)

1977 = 278 (0.06%)

1978 = 332 (0.07%)

1979 = 339 (0.08%)

1980 = 353 (0.08%)

1981 = 331 (0.07%)

1982 = 359 (0.08%)

1983 = 338 (0.08%)

1984 = 392 (0.09%)

1985 = 367 (0.08%)

1986 = 361 (0.08%)

1987 = 413 (0.09%)

1988 = 360 (0.08%)

1989 = 401 (0.09%)

1990 = 304 (0.07%)

1991 = 276 (0.06%)

1992 = 251 (0.06%)

1993 = 218 (0.05%)

1994 = 202 (0.05%)

1995 = 147 (0.03%)

1996 = 171 (0.04%)

1997 = 140 (0.03%)

1998 = 155 (0.04%)

1999 = 189 (0.04%)

2000 = 617 (0.14%)

2001 = 404 (0.09%)

2002 = 404 (0.09%)

2003 = 345 (0.08%)

2004 = 424 (0.1%)

2005 = 496 (0.11%)

2006 = 572 (0.13%)

2007 = 765 (0.17%)

2008 = 1145 (0.26%)

2009 = 1052 (0.24%)

2010 = 339 (0.08%)

2011 = 92 (0.02%)

2012 = 130 (0.03%)

2013 = 50 (0.01%)

2014 = 28 (0.01%)

2015 = 24 (0.01%)

2016 = 25 (0.01%)

2017 = 26 (0.01%)

2018 = 33 (0.01%)

2019 = 84 (0.02%)

2020 = 163 (0.04%)

Years (Top 10)

2008 = 1145 (0.26%)

2009 = 1052 (0.24%)

2007 = 765 (0.17%)

2000 = 617 (0.14%)

2006 = 572 (0.13%)

2005 = 496 (0.11%)

2004 = 424 (0.1%)

1987 = 413 (0.09%)

2001 = 404 (0.09%)

2002 = 404 (0.09%)

Red and blue are the most common colors found in the passwords.

Colours

black = 706 (0.16%)

blue = 1143 (0.26%)

brown = 221 (0.05%)

gray = 76 (0.02%)

green = 655 (0.15%)

orange = 250 (0.06%)

pink = 357 (0.08%)

purple = 346 (0.08%)

red = 2202 (0.5%)

white = 244 (0.06%)

yellow = 228 (0.05%)

violet = 66 (0.01%)

indigo = 35 (0.01%)

As stated previously, people tend to tack numbers and special characters onto the end of passwords. These statistics support that theory.

Single digit on the end = 47391 (10.7%)

Two digits on the end = 73640 (16.63%)

Three digits on the end = 31095 (7.02%)

Last number

0 = 17553 (3.96%)

1 = 46694 (10.54%)

2 = 24623 (5.56%)

3 = 29232 (6.6%)

4 = 17692 (4.0%)

5 = 17405 (3.93%)

6 = 17885 (4.04%)

7 = 20402 (4.61%)

8 = 17847 (4.03%)

9 = 19919 (4.5%)

(Pipal's ASCII bar chart of last-digit frequency appeared here; its alignment was lost in formatting, but the counts follow below.)

Last digit

1 = 46694 (10.54%)

3 = 29232 (6.6%)

2 = 24623 (5.56%)

7 = 20402 (4.61%)

9 = 19919 (4.5%)

6 = 17885 (4.04%)

8 = 17847 (4.03%)

4 = 17692 (4.0%)

0 = 17553 (3.96%)

5 = 17405 (3.93%)

Last 2 digits (Top 10)

23 = 12364 (2.79%)

12 = 6416 (1.45%)

11 = 5476 (1.24%)

01 = 5097 (1.15%)

00 = 4098 (0.93%)

21 = 3669 (0.83%)

08 = 3627 (0.82%)

07 = 3598 (0.81%)

22 = 3587 (0.81%)

13 = 3548 (0.8%)

Last 3 digits (Top 10)

123 = 9446 (2.13%)

456 = 2443 (0.55%)

234 = 2160 (0.49%)

007 = 1477 (0.33%)

000 = 1268 (0.29%)

008 = 1150 (0.26%)

009 = 1086 (0.25%)

111 = 1056 (0.24%)

777 = 980 (0.22%)

101 = 895 (0.2%)

Last 4 digits (Top 10)

3456 = 2151 (0.49%)

1234 = 1968 (0.44%)

2008 = 1033 (0.23%)

2009 = 927 (0.21%)

2345 = 750 (0.17%)

2007 = 674 (0.15%)

2000 = 535 (0.12%)

2006 = 502 (0.11%)

1111 = 436 (0.1%)

2005 = 436 (0.1%)

Last 5 digits (Top 10)

23456 = 2121 (0.48%)

12345 = 724 (0.16%)

56789 = 316 (0.07%)

45678 = 305 (0.07%)

11111 = 269 (0.06%)

34567 = 231 (0.05%)

54321 = 197 (0.04%)

00000 = 162 (0.04%)

99999 = 150 (0.03%)

23123 = 132 (0.03%)

Most popular area codes based on the three-digit numbers found.

US Area Codes

456 = Inbound International (--)

234 = NE Ohio: Canton, Akron (OH)

Now here is some data that can be directly applied to password cracking.