We recently had a red team where we had a lot of RDP endpoints, but not many other endpoints. We had some time pressure, so we looked to see if nmap had a script (we didn’t see one) and wrote a python script that grabbed the cert names. This is a good way to guess at internal hostnames. With our python script, it was also slow.

Anyway after the engagement I was thinking about writing this up as an NSE and looked more carefully at the existing nmap scripts. It turns out it’s already there with ssl-cert. I couldn’t find a command line switch to force nmap to run a script on a port, but it’s easy enough to edit the scripts themselves.

If you want port 3389 to check out the cert, edit shortport.lua (path on my box is /usr/share/nmap/nseLib/) and add it

I came across a pop esp in real code, which I think is a kind of confusing instruction. i.e. What is the value of esp after the following?

push 0x0400
pop esp

What does pop esp do? People who look at x86 asm more than me no doubt know. But to understand let’s look at what pop eax basically does.

add esp, 4 ;"instruction" 1
mov eax, [esp] ;"instruction" 2

What is the order of operations here? There could be two solutions – it could be 0x0400 (if it increments esp first – instruction 1 then 2) – or esp could be 0x0404 (if the increment is second – instruction 2 then 1).

It turns out the second is what happens. So if you guessed that esp is 0x0400, you’re right!

A use after free bug is when an application uses memory (usually on the heap) after it has been freed. In various scenarios, attackers can influence the values in that memory, and code at a later point will use it with a broken reference.

This is an introductory post to use after free – walking through an exploit. Although there are a million posts about the class of bug, not many are hands on (and this one is). I’ve been spending some free time over the past month looking into use after free type bugs. I’m a noob in this space so please call out/forgive mistakes.

Based on the crash, this is most likely either a use after free where ecx could be a pointer to a table of function pointers (although for me at this point it is difficult to tell the difference between this and a null ptr dereference).

Investigation

Let’s take a second to analyze.

The first step is to turn on pageheap and user mode stack tracing for iexplore.exe using gflags. pageheap will monitor all heap memory operations, allowing us to see when our application is trying to access the freed memory immediately (it will crash sooner – a good writeup on what’s going on is here) and also see some additional info, such as the sizes of the allocations and stack traces involved.

This is just a few instructions before our call [ecx]. So to get EIP, we need to point eax to a valid value that points to where we want to start executing. We should be able to do this with a heap spray (maybe not ideal, but easy), then a stack pivot to this address where we can execute our ROP. Because we are modifying a metasploit payload, let’s just do everything the metasploit way, which I’ll cover in the next section.

Metasploit browser Detection, Heap Spray, and ROP

Because we are modifying a metasploit module, let’s just use all their builtin stuff and do this the metasploit way.

First thing we need to do is detect the browser, which is described here. In the msf module, there was existing parameters to detect windows 7 IE9, so we have to configure it to also accept windows XP with IE8.

Heap spraying is the process of throwing a bunch of crap on the heap in a predictable way. Again, metasploit can do this for us. This is described here. We can also get rid of our lfh allocation at the beginning because js_property_spray will take care of it for us. The docs say a reliable address is 0x20302020, so we’ll just use that.

At this point you should be able to have eip control (eip=0x414141) using something like this

Finally, we just need rop, which msf also has (if you’re interested in learning how to ROP, this is a good tutorial). ROP with heap sprays are generally really easy (as long as you know a base address). This is not like a stack overflow where we need to do shenanigans, we just put stuff in order on our heap spray and we don’t have to worry about null bytes, etc etc. Nevertheless, metasploit has builtin RopDB for Windows XP where the addresses are constant, using msvcrt. https://dev.metasploit.com/api/Rex/Exploitation/RopDb.html.

To connect everything, we just need a stack pivot, meaning we need esp to point to our heap that we control. right now eax points to our heap of course, so we can look for something that moves eax to esp (there are several ways to do this, mov esp, eax | xchg eax, esp | push eax then pop esp, etc.). Let’s just look in msvcrt since we’re using that dll to virtualprotect later anyway. I usually use rp++ for this although I had mona loaded so I did search through that (unfortunately not able to find something I could use in msvcrt with mona). Searching for output with rp++ I found

0x77c3868a
xchg eax, esp
rcr dword [ebx-0x75], 0xFFFFFFC1
pop ebp
ret

This will work great. Now just put them in the right order, and we have our exploit!

I put in a pull request for this into metasploit so you can see the “final” version here, although it hasn’t yet been reviewed or accepted at the time I’m writing this. Of course this is largely useless for practical purposes, but it is a pretty good way to learn the basics of use after free IMO. Many vulns affect multiple versions of a browser, so it’s a fun exercise to port the bugs to different versions.

I recently wrote a burp plugin for common crypto attacks in web apps. Check out the code on github (I also submitted to BApp store a couple days ago). I hope to add more modules as time goes on, but to start with, here is what it has:

Active Scanning to detect padding Oracle attacks

Active Scanning capabilities to detect input being encrypted with ECB and reflected back (can be slow)

Attack tab to encrypt/decrypt padding oracles

Attack tab to decrypt ECB where you control part of the request

Here are some slides about it, and giving some background on the attacks it’s doing:

The local challenges can be grabbed from here and various other writeups are online. I was off on the timing for this one, so I only dove into most the challenges on Sunday morning… right before codegate ended and after it ended. I did pretty terrible rank-wise, but I thought the challenges were fun anyway and solved some after the codegate was over.

I learn new things almost every CTF, even if it’s sometimes just things that make sense intuitively but I’ve just never thought about before or forgotten. Here are some lessons learned from codegate.

stack canaries on linux stay constant and you can get them with an info leak. On windows there is more entropy and this wouldn’t have been as straightforward (see Pwn 250)

ulimit -s pretty much defeats ASLR if you’re local. Also, there are areas you can overwrite in this space, like in the syscall mappings, to control the instruction pointer (See Pwn 350 and Pwn 400)

To control the number of bytes output locally, you can use non-blocking sockets (see Pwn 400)

gdb really can’t find gs: segments very well, but there are usually workarounds (see Pwn 350)

You can’t call openprocess with all access on a process being debugged (i.e. you can’t open two debuggers on the same process, even if it’s been forked) but you can openprocess with PROCESS_VM_READ. (reversing 250 – although I ended up going a different route)

I wrote a Pykd script that can be called on a windbg breakpoint and continue based on python stuff e.g. you could do a conditional break on a regex which seems tough in windbg without writing a script. (see reversing 250)

it looks like they’re doing syscalls for every character written and read

Most of the binary is garbage, but you can see clusters of syscalls where the output and input happen. Looking in IDA, there are four clusters of syscalls. One outputs the fail, one outputs “Enter root password”, one is a huge loop that inputs the password, and one outputs winner.

With gdb, I recorded executions and executed backworkds until I was in the giant input loop. At that point I started searching for the password was as an offset to rbp, since this is how it outputs strings as syscalls. Sure enough, I found it pretty quickly.

Reversing 250 Clone Technique

This one has on the description: “Limited processes will be generated. Which one has the flag?”

We can see in the main function that it will call createProcessW in a loop that lasts 0x190 iterations, so that’s our number of processes.

Each process is called with three args, #randomlookingnumber# #randomlookingnumber# #counter#, where counter is 0,1,2,…

One thing I tried to do throughout this was put a breakpoint on memory access. For example,

ba w4 00409754

The value would change, but the breakpoint is never hit. Crazy! After some investigation with Joe, we eventually figured out this is because ba works by putting a break in one of four registers per processes. In our case, the memory is written to using a call to WriteProcessMemory, and the kernel is writing the memory, so our breakpoint is never hit.

Investigating this led to the cinit function, which is called before main and contains the calls to writeprocessmemory. It starts with code that grabs the command line args if they exist (and if not sets them to the first value. It then pushes them along with this interesting data string to a function, messes with a string it returns, and then 0s out that string. Even without looking at what the decode_func is doing, it looks like a likely place for a key!

My strategy was then to attach windbg to every one of these forked processes by turning childdbg on, changing the filters to control when it breaks, and set a breakpoint right before the rep stosd. I then wrote a python script to see if this is a candidate for the key.

Ok, so we might have to fix the pcap-ng file to view this correctly. I don’t know much about the format, but I opened it in a hex editor and searched for DWORD hex(4270407998), which is \x3e \x41 \x89 \xfe. I replaced this with DWORD 96 and got a similar error. Then I kind of brute forced – set it to 0x30 and got a different error. 0x40 gave a too big error. It took about 10 tries of manual binary searching, and then I got a format I could open in wireshark and follow the tcp stream.

Then just save the pdf part to a file and we get the key

Pwn 250 Angy Doraemon

This was a fun, relatively straightforward challenge. It’s a remote exploit. The bug is in the mouse function. When it reads the answer to “are you sure?” it reads 0x6E bytes into a buffer only a few bytes big.

The hard part is you have to bypass a non-executable stack and a stack canary. This is possible via an info leak. Because we can write over the end of our string, we can see other values on the stack, such as the canary and the fd. One interesting thing I learned is how on Linux the stack canary seems to be constant per process (on windows, as Matt Miller said, Windows cookies are basically an XOR’d combination of the current system time, process identifier, thread identifier, tick count, and performance counter.)

So you leak the canary, then you have to rop a payload. execl is in the .got, and I used some fancy redirecting to send the output right back through the socket. You need the file descriptor, but it’s on the stack and you need it anyway I think to do a read from the socket.

Pwn 350 4stone

This is a binary that plays a game locally, sort of like connect 4. All the interesting things happen if you win the game with 0 time (which you can do with a constant strategy – this took me longer than it should have). If you do win, it grabs some input from stdin.

As you can see from the following snippet, you get an “arbitrary” write (marked in red). There is a catch though. You can write anywhere, as long as it doesn’t start with 0x804, or 0x0b.

The stack is executable, so if we could overwrite anything, then we’d be sitting good. Unfortunately, most everything by default starts at 0x804 or 0x0b.

hmmm, we can write to the heap, but that’s tough. What can we do locally? My initial thought was to make the stack big enough so it wasn’t in the 0x0b range, then potentially overwrite a return pointer. There’s aslr on, but we might be able to brute force it. So I started filling up the stack with an execve call, but hit a limit. Ok, we can set this with ulimit. I tried this, and set it to unlimited, and something interesting happened. The mapping then looks something like this:

And a lot of these addresses, like libc, don’t change, even with ASLR! (apparently this is well known, but this is the first I’ve seen it). So where can we write in one of methods? The only function call after our arbitrary write is to exit, which makes a system call. Maybe we can overwrite that?

We can control edx and ecx with the gadget at 0x40000425, and another interesting thing is we may be able to control ebx indirectly if we can write to unk_8049150

.text:080480B4 lea ebx, unk_8049150 ; start
.text:080480BA int 80h ;

We have almost all we need for a system call, except we don’t have eax anywhere. After some research, one thing that’s promising is both “read” and “write” will return eax to the value of bytes read or written. My first thought was if I could set ecx or edx with the gadget at 0x40000425 then I might be able to use one of the existing reads or writes to control eax. This is close to working, like if they would mov eax, 4 (syswrite) immmediately before the int 80, it would have worked. But unfortunately there’s no flow like that. They all move eax, 4 at the beginning, then set all the args afterward.

I was stuck here, but luckily the good thing about trying this one late is there are solutions posted. This is an excellent writeup, and I used it to cheat: http://mslc.ctf.su/wp/codegate-2014-quals-minibomb-pwn-400/. They set a nonblocking socket to control how much could be written. Cool! I never would’ve though of that.

The sqli is relatively straightforward. You can have a True/false query like this.

password='or(1=2)and'1
password='or(1=1)and'1

But we have only 120 guesses*, and the password is quite long at 30 chars. This returns true

password='or(CHAR_LENGTH(password)=30)and'1

so that means we have only an average of four queries per letter, or 120 guesses for the whole 30 character password. They are lower case at least, so that can reduce our queries by quite a bit, but if we query all the bits, that’s 5 per letter and is too many with a true/false query (which would need 5 per letter). And if you do a straight binary search of the 30 char password, that’s on the order of a few hundred per session*

But, we really have more than true/false – we have an arbitrary number of states. true, false, and timing. We can sleep different amounts of times, etc. For example, we’re only lacking one bit, so we could ask, is the next bit 1 (return true), 0 (return false), or are all the bits 1 (sleep 10 seconds).
The query ended up pretty complicated, and I had to add some code that checked sanity to make sure letters were being decoded correctly.*

Pass the hash is dead. Just kidding. Although Windows 8.1/2012R2 has some good improvements to help slow down lateral movement on a Windows network, pass the hash style attacks are still obviously a good way to spread out as a pentester/attacker. Here’s the scenario to keep in mind: you’re a local admin on a domain joined Server 2012R2 box and want to spread out.

Let me expand a little on why you’d ever want to look at MS Cache.

In our scenario above, first you might think mimikatz. LSASS is a protected process now, but that might not matter much. Mimikatz has a legitimately signed driver. Okay, this is great. But maybe the ops team has a rule looking for drivers being loaded and you don’t want to load a driver. Or worse, maybe everyone has logged off of the box and they’re only logging in with Network login. With 2008R2, credentials seemed to usually be cached in LSASS the next reboot, but this has changed in 2012R2. Now when users log out, their credentials are no longer cached (afaik). As an aside, this might be a good thing to keep in mind with disconnected RDP sessions since the users are still logged in.

Without touching LSASS, you can also use token impersonation to get a similar effect. But this still requires a user to be logged in to take their tokens. There are also some nice things about eventually getting a hash or cleartext password rather than a token. Like even if a token can be used on that domain, a cleartext hash allows you to check for password reuse (or look for password variants if you can get the cleartext).

There are a lot of ways to go even if nobody’s logged in. You could be a pirate, trigger AV and see if an ops user logs in interactively. You could setup an NTLM relayer to pass credentials of something, maybe the Qualys box. But one thing that’s often overlooked (at least by old me) is MS cache.

An Overview of MS Cache

The terminology can be confusing. Although the MS cache hash is a hash of the user’s password, it’s a distinct value from the user’s hash that you’d use to directly pass the hash. You can’t expect to forward it and for things to work. A good overview is on the jtr wiki here.

What happens when you are in front of a Windows machine, which has a domain account and you can’t access the domain (due to network outage or domain server shutdown)? Microsoft solved this problem by saving the hash(es) of the last user(s) that logged into the local machine. These hashes are stored in the Windows registry, by default the last 10 hashes.

Anyway. The only source of entropy for these MS cache hashes is the username. To be clear, the salt does not include the domain, the computer, etc. This is better than a normal hash which does not include the username either, but it is still not great. To illustrate this concept:

With the normal NT hashes, the hash is always the same given a password (i.e. This is why pass the hash works across domains). With MS cache hashes it takes the username as entropy, but usernames aren’t random. If the “Administrator” account has the same password across domains, this MS cache hash is constant. Besides builtins like Administrator, this is also interesting in organizations that have several separate domains, but usernames are constant between them.

Another thing to hammer in again. The NT Hash seems to be cached in server2012R2 only when the user is logged in, and in server 2008R2 until the next reboot. But (if configured to cache things, like it does by default) the MSCache hash is stored in a registry hive and will persist across reboots.

Extracting and using the Cache hashes

These hashes are stored in an obfuscated way in the registry, and getting at them is comparable to getting at local SAM accounts. Cachedump is one tool, although the source code seems to 404. Quarkspwdump is another tool, use “QuarkspwDump.exe -dhdc”. Metasploit also has a post exploit module, cachedump, that does this.

This theoretically is put in a format that john the ripper understands, but unfortunately John on Kali doesn’t seem to understand the format (it will run, but even with a wordlist the password doesn’t crack). For better luck, simply put it in “username:hash”

Other crackers seem to support this format also, like hashcat (untested) and Cain.

Slowness (update)

I’ve seen some twitter activity around how this is “new pass the hash on windows 8.1”. This isn’t the case, although I do poke fun at the “pass the hash is dead” stuff. When I first published this, I should have expanded on how slow this cracking can be. JTR wiki says

The far from optimized MSCash2 algorithm provided in the sample code below and used in the corresponding MSCash2 JtR patch generates about 330 DCC2 hashes/sec (MSCash2) on an Intel Core2 Quad CPU Q6700, compared to 58.8 millon DCC1 hashes/sec (MSCash). In other words, incremental brute-force attacking for different search spaces, depending on the character set and the password length, will take ages. So it is a good idea to do some intelligent password guessing when attacking DCC2 hashes, i.e. rule-based dictionary and probabilistic attacks.

By default a lot of these could also be pre computed and put into rainbow tables. Although the iteration count is apparently configurable and could make this not practical.

</update>
The end, thanks for reading! Are greetz still a thing? Thanks to my friend Dave for help on this.

There are a lot of ways to modify the execution of a program, including at least using Windows Compatibility Toolkit (a good reference is Mark Baggett’s Derbycon talk), modifying the environment, manual patching the binary before it runs, and function hooking. Function hooking generally refers to any method where you’re able to intercept and modify function calls of a running process. A simple example of a function hook might be “every time the program calls AESEncrypt, first save the plaintext to a file and then call AESEncrypt”.

There are also many different ways to function hook, and in my opinion there isn’t really a “best” way – it just depends on what you’re trying to do. For example, if you’re doing something to try to be sneaky, one of the best ways may be like Joe outlines here:

In your DLL, write a C function that contains the functionality to execute. Optionally, return control to the original function

Overwrite the first bytes of the function to jump to your DLL

However, if your goal is to change the behavior of a program and you don’t care about stealth (e.g. you’re just using hooking as an aid to testing) there are easier ways to accomplish the same goal. “soft” function hooking usually refers to attaching a debugger to a program and using the debugger’s functionality to modify the behavior. I’ve seen this approach elsewhere – in gray hat python, they use this technique with pydbg and immunitydbg.

I learned about pykd because of mona for windb. I messed with pykd last week, and I like it quite a bit (at least more than windbg plugin alternatives I’ve used like powerdbg). There are pluses and minuses when compared with something like immunity debugger. Pykd doesn’t currently have nearly the number of convenience functions immunitydbg has (for example, you have to store your strings in memory manually). UPDATE: In this case I was looking for something like remotevirtualalloc and didn’t see it. But @corelanc0d3r pointed me at windbglib, which has these exact convenience functions. But Windbg is just a more powerful debugger. For example, immunitydbg is awesome, but it doesn’t work with 64 bit processes, following children processes, kernel debugging, etc.

Here is a simple example. I ran into a situation where a team’s test box had a hard coded a test server to listen only on localhost. This can be a pain to debug, because a lot of my tools are on other boxes and plus I can’t do things like see what’s actually going on with wireshark. This is a quick script that modifies the behavior of inet_addr, which is where this binary passed the hard coded localhost to (if you’re wondering why I didn’t just patch it – that was an option too, but there was some other important stuff right next to it in .data and ‘localhost’ was too small to fit my IP). So this hook simply grabs the current IP and passes it as the arg to inet_addr instead of “localhost”

Some things I got a bit stuck on

Use the second argument with setBP to have a callback function on the breakpoints, and then use this to modify things. Note you can’t mess with execution within the function itself. Before going this route I tried to use the EventHandlers (like onBreakPoint) and ended up with weird errors.

Within your callback function, if you return True (or nothing), execution will halt, and if you return False then execution will continue

If you know windbg basics and python, this should be really familiar – I have a tiny bit of python to grab the IP, and then inside the handlers I’m pretty much just running windbg commands sequentially. I ran this in the debugger itself. Here’s a side tip. You can run put commands in a textfile for them to run in windbg (similar to gdb’s -x arg). So you can do this to load this pykd script automatically.