At this point, you can go ahead and close the two parent processes (since we are not interested in their
functionality, for the sake of simply finding the DGA). So we know that we are
interested in discovering how this traffic is generated. So let’s try to find
out where it originates.
Earlier, using API Monitor, we saw that explorer was using several functions within WinINet.dll:

So let’s set a breakpoint (F2) on a
function that deals with Internet traffic in WinINet.dll. For example,
InternetConnect():

HINTERNET
InternetConnect(

_In_HINTERNET hInternet,

_In_LPCTSTR lpszServerName,

_In_INTERNET_PORT nServerPort,

_In_LPCTSTR lpszUsername,

_In_LPCTSTR lpszPassword,

_In_DWORD dwService,

_In_DWORD dwFlags,

_In_DWORD_PTR dwContext

);

Hit ctrl+g to get this handy search
window. Then type InternetConnectA (A for ANSI as opposed to Wide) to find the
matching function and its address:

Click "Follow expression" and breakpoint,
hit play and see what happens in Wireshark until your breakpoint is hit.

We see several attempts to send UDP
traffic (again, outside of the scope of this post), but just wait about 45
seconds until you hit your breakpoint. It looks like it is querying Google.
This is a common thing for malware to do in order to test connectivity before
functioning.

Hit CTRL+F9 to return to the caller
in our main executable.

Look at this address in IDA to
figure out what’s going on around this function. Also you may want to rebase
your program in IDA to the offset injected into Explorer:

Note that this offset will be
different each time this program is injected. Let’s examine the calling
function address to see what’s going on.

In order to see what called this function, hit ‘x’ on the function name to bring up a list of callers. Hit Enter on
the first function and again on this function. I am doing this for the
sake of some brevity, but this segment really just calls InternetConnect() and
the HttpOpenandSendRequest() function, so move on up to the caller. I named
this function Check_Connectivity_to_Google because that’s exactly what it does
(Note: I’ve also seen it query Bing.com, the whole point is that it is a
connectivity check). It actually will try twice before failing out:

Following that function, another function that is immediately following references a SystemTime structure,
this is indicative of a DGA function. It actually is our DGA function, hence
the name I gave it (DGA in Yellow below):

I annotated the register values and
some instructions above the call to help shorten the post. Note the push of a
value I named “remainder”. This value is very important and it is used
throughout the DGA function. Let’s go ahead and see how that value is
generated.

The instruction ‘div’ (executes
EDX:EAX/ECX. EAX) in this function is “randomly” generated by a function that
uses QueryPerformanceCounter() or GetTickCount() as a value passed to a
function that performs a few shift and xor calls of its own:

Interestingly enough, if we Google
the hex value 6C078965h, we come across a post where someone describes a
“random” number generator they wrote that is very similar (just different
registers) to this function, used to generate dice rolls for a DND game. (Notice in the loop above how var_PerformanceCount is at 31E3748. The
DWORD 31E4104 is used as a maximum
value for the loop counter. It is exactly 9BCh or 2492 (623*4) bytes off).
Which aligns with this post as well:

I also see similar algorithms used
for various other functionalities in other programs. Just Google it if you are
curious.

Either way, that value is
reasonably unpredictable if you aren’t on the machine that is generating the
number. This value is then sent through an XOR function:

In which the result is divided by
1000 (or mod 1000, eventually meaning 1000 possible seed values) and stored in EDX as we call the main DGA function:

There are two parts to this DGA,
one where the domain name is generated and another where the Top Level Domain or TLD (for short) is generated. Before any of this occurs though, 7 bytes of data are
generated to be passed to an MD5 function. The division by 1000 earlier is
really just mod 1000, leaving 1000 possible domains.

These 7 bytes are stored as
follows:

Last byte of year + 30h

Month

Day/7 *7

First Random Byte

Second Random Byte

00

00

For Example:

byte_array[]{0x0E,0x01,0x0E,0xAC,0x00,0x00,0x00};

First_Byte: Last Byte of year +30h

2014=07DE-> DE +30h=0E

Second_Byte: Month

January =01

Third_Byte:( Day / 7)*7

([17 / 7]*7=14->0E)

Fourth_Byte
and Fifth_Byte: “Random” seed (0-1000) that is
incremented by one

The low byte value is used for the TLD as well

Sixth_Byte
and Seventh_Byte:0000

C Version:

int genDate(unsignedchar*dateData){

unsignedchar year;

unsignedchar month;

unsignedchar day;

time_t currTime = time(NULL);

struct tm *localTime;

localTime = localtime(&currTime);

year =((localTime->tm_year +1900)&0xff)+0x30;// Add 30h to last byte of
year

month = localTime->tm_mon +1;// Just plain month -
Number of months since January + 1

day =(localTime->tm_mday /7)*7;

unsignedchar combined[]={ year, month, day };

memcpy(dateData,&combined,4);

return1;

}

That data is then sent to an MD5 function at 0x31C54A7.
That result is used to generate the domain name and the lower byte of the
random number is used to generate the TLD:

I have annotated the instructions and written it in C:

#define HASHLEN 16

int genURL(unsignedchar*hashValue,char*URL){

// Grab each byte of hash

unsignedint j;

unsignedint currentIndex =0;// Pointer to current index
in URL array

for(j =0; j < HASHLEN ; j++){

unsignedchar cl = hashValue[j];

unsignedchar dl = cl;

dl =(dl &0x1F)+0x61;

cl =(cl >>3)+0x61;

if(cl != dl){

if(dl <=0x7A){

URL[currentIndex]= dl;

currentIndex++;

}

if(cl <=0x7A){

URL[currentIndex]= cl;

currentIndex++;

}

}

}

return1;

}

So now that we have the domain name, it’s time to grab the
TLD, if we look just below the domain-name code, we can see the TLD-generation
section. It is really simple actually, it just checks if the last byte of the
random number against a few math functions. Depending on the result, we get one
of six possible TLDs. Note the “shape” of the the overall function, this is
usually indicative of a case statement or if/then sections:

C Version:

int genTLD (unsignedshort rand,char*TLD){

unsignedchar lobyte =(rand &0xff);

if((rand %6)==0){

strncpy(TLD,".ru",4);

}

elseif((rand %5)==0){

strncpy(TLD,".biz",5);

}

elseif((lobyte &3)==00){

strncpy(TLD,".info",6);

}

elseif((rand %3)==0){

strncpy(TLD,".org",5);

}

elseif((lobyte &1)==0){

strncpy(TLD,".net",5);

}

else{

strncpy(TLD,".com",5);

}

return1;

}

All of this is done over and over
again, generating 1000 possible domains. That “random” seed value is really
just a starting point and all of the URLs are called eventually at a rate of
just about 1 per second.

Putting it all together

After figuring out how this DGA is assembled, it is clear that it only changes once every 7 days or the beginning
of a new year or month. There are still 1000 possible URLs over
this period since two bytes are unknown and incremented/used one-by-one. That’s not bad if you are the attacker. If we do the math at just 1
domain per second, it takes roughly 16 minutes to call each domain. This means
that the attacker needs to generate a list of 1000 domains for a particular day
and pick one, then the attacker has to wait at most 17 minutes for a call out
back to his/her server. Not bad.

In the end, a list can be generated
in C for any date we want pretty easily:

/*

gcc -Wall dga.c -o dga -lcrypto -lssl

Use without args for todays date.

To generate domains for another date, simply run with:

./dga mmddyyyy

*/

#include <stdio.h>

#include <stdlib.h>

#include <time.h>

#include <string.h>

#include <openssl/md5.h>

#define HASHLEN 16

int GetMD5Hash (unsignedchar*pbData,int size,unsignedchar*hashValue);

int genURL (unsignedchar*hashValue,char*URL);

int genTLD (unsignedshort rand,char*TLD);

int genDate(unsignedchar*dateData);

int genSuppliedDate(char*argv[],unsignedchar*dateData){

int year;

unsignedchar month;

unsignedchar day;

longint a = strtol(argv[1],NULL,10);// grab int without atoi

month = a /1000000;

day = a /10000%100;

year = a %10000;

year =(year &0xff)+0x30;// Add 30h to last byte of year

day =(day /7)*7;

unsignedchar combined[]={ year, month, day };

memcpy(dateData,&combined,4);

return1;

}

int genDate(unsignedchar*dateData){

unsignedchar year;

unsignedchar month;

unsignedchar day;

time_t currTime = time(NULL);

struct tm *localTime;

localTime = localtime(&currTime);

year =((localTime->tm_year +1900)&0xff)+0x30;// Add 30h to last byte of
year

month = localTime->tm_mon +1;// Just plain ol' month -
Number of months since January + 1