@zer0mem
http://www.zer0mem.sk
The opinions expressed in this blog are my own and do not represent those of my current employer.
Wed, 16 Jul 2014 08:24:20 +0000

ALPC monitoring
http://www.zer0mem.sk/?p=542
Tue, 15 Jul 2014 13:05:21 +0000

Microsoft did nice work on the callback mechanism, which avoids nasty patching across the kernel and supports monitoring in a clean way. Currently we can use, among others, callbacks on image loading, process and thread creation, opening & duplicating handles, dropping files, etc. For monitoring network communication you can attach to some device drivers, which is cleaner than hooking but still does not cover as much as I want. And here comes ALPC, because even resolving a host name goes through it, and when you are able to recognize it ..

-> everyone likes ALPC, especially applications with network communication, because, as was said at the training, even gethostbyname ends up calling some ALPC! So I think an object responsible for communication is a really good point to start.

OB_OPERATION_HANDLE_CREATE – A new process handle or thread handle was or will be opened.

OB_OPERATION_HANDLE_DUPLICATE – A process handle or thread handle was or will be duplicated.

That basically means we are theoretically able to get called on the two mentioned HANDLE operations. That is good, but I want more .. after some digging in nt!_ALPC_PORT it is possible to spot a nice structure :

One good candidate for a deeper look is NtAlpcSetInformation, which calls AlpcpInitializeCompletionList and ends by calling IoAllocateMiniCompletionPacket – and this last routine may sound pretty familiar now!

OK, but what is happening there ? It is another callback mechanism – *CompletionIo*, already described in Windows Internals, 6th edition, Part 2 (I/O Completion Ports). And, as you have already seen, this callback mechanism is set up by default to call nt!AlpcpLookasidePacketCallbackRoutine.

It is obvious that it is possible to intercept the mechanism by rewriting this callback, but this is not what we want to do … When we look at this default function, we can see how this callback mechanism works.

nt!IoSetIoCompletionEx2 ends in nt!IoSetIoCompletionEx, and nt!AlpcpDeferredFreeCompletionPacketLookaside ends by calling nt!IoFreeMiniCompletionPacket for each packet in the queue.

So now we are almost done, but one essential thing is missing – the ALPC port itself to attach to .. and there are some approaches to finding it :

!alpc /lpp

kdexts.dll does it somehow, so here is the approach :

… unfortunately nt!AlpcpPortList is not an exported symbol, but its location is inside this ‘structure’ :

One member of this structure that can be found quite easily is nt!AlpcPortObjectType, which is not directly exported, but fortunately for us nt!LpcPortObjectType is an alias for it!
And there is also another way to get it (not so comfortable) – querying it :

To successfully locate this structure, and the port list itself, inside the ntoskrnl image, just add additional checks of predictable values of some members of the structure alongside the equality check against the value of nt!AlpcPortObjectType.
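The scan described above can be sketched in portable C++ as follows. This is only a rough illustration: the `Candidate` layout, its field names and the range check are hypothetical stand-ins, because the real structure is not reproduced in this post and must be taken from the target ntoskrnl build.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the unnamed 'structure'. The real member layout
// differs per build; only the matching idea is illustrated here.
struct Candidate {
    std::uint64_t list_flink;   // assumed: list head, forward link
    std::uint64_t list_blink;   // assumed: list head, backward link
    std::uint64_t object_type;  // assumed: equals the known object type pointer
};

// Walk a copy of the image at pointer-aligned offsets. A hit needs the known
// type pointer in place AND both list links inside the range [lo, hi).
std::vector<std::size_t> find_candidates(const std::vector<std::uint8_t>& image,
                                         std::uint64_t known_type,
                                         std::uint64_t lo, std::uint64_t hi) {
    assert(lo <= hi);
    std::vector<std::size_t> hits;
    for (std::size_t off = 0; off + sizeof(Candidate) <= image.size();
         off += sizeof(std::uint64_t)) {
        Candidate c;
        std::memcpy(&c, image.data() + off, sizeof c);  // avoids unaligned reads
        const bool links_ok = c.list_flink >= lo && c.list_flink < hi &&
                              c.list_blink >= lo && c.list_blink < hi;
        if (c.object_type == known_type && links_ok) hits.push_back(off);
    }
    return hits;
}
```

The extra range check is what filters out places where the type pointer appears by coincidence; more predictable members mean fewer false hits.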

ObFiltering

Another option *should* be an official one, but in reality …

Ob Filters, and registering on nt!AlpcPortObjectType: the mechanism is ready to use and it is already implemented in the kernel! But you have some obstacles :

And you can be even more specific while monitoring, because communication with svchost, or another generic service, is not much of an information bomb by itself, but you can look at it through service names, which can be far more useful!

service : PlugPlay
service : Power
service : DcomLaunch
service : RpcEptMapper
service : RpcSs
service : eventlog
service : AudioEndpointBuilder
service : MMCSS
service : AudioSrv
service : CscService
service : gpsvc
service : ProfSvc
service : Themes
service : EventSystem
service : SENS
service : UxSms
service : SamSs
service : lmhosts
service : nsi
service : Dhcp
service : Dnscache
service : ShellHWDetection
service : Schedule
service : Spooler
service : BFE
service : MpsSvc
service : LanmanWorkstation
service : CryptSvc
service : DPS
service : FDResPub
service : NlaSvc
service : PcaSvc
service : SysMain
service : TrkWks
service : Winmgmt
service : iphlpsvc
service : LanmanServer
service : netprofm
service : WdiServiceHost
service : WPDBusEnum
service : WdiSystemHost
service : WinHttpAutoProxySvc
service : Browser
service : WSearch
service : Netman
service : WMPNetworkSvc
service : fdPHost
service : HomeGroupProvider
service : SSDPSRV
service : BITS

An article about resolving a service name by its id (SubProcessTag) can be found here, a more concrete example of an implementation is written up here, and you can find it in Process Hacker as well. This method was designed for user mode, but in the kernel you are present at creation, so let's say it is more straightforward to resolve this information.

It seems that IoCompletion callbacks can be a really helpful mechanism. It works only on ports that use the I/O completion port type, not on all ALPC ports, but for network monitoring purposes it seems fair enough :)

Another limitation is the ‘limited’ usage of Ob Filters on ALPC ports. It is quite a nice feature, but limited so much … I hope filtering will support at least nt!AlpcPortObjectType soon!

At the end of this post, I would like to thank Alex Ionescu for reviewing this article, and for that nice SyScan win-internals training!

C++ in Kernel Drivers (c++, boost, std)
http://www.zer0mem.sk/?p=517
Tue, 06 May 2014 10:57:58 +0000

Recently I was looking for other approaches to using C++ features in kernel mode drivers. I found some references, but none of them fulfilled my need & desire to also use boost & std (at least partially).

Some time ago a friend showed me a way to add the mentioned libraries to kernel code, so I decided to do it from scratch, as a minimalistic approach with some kind of ‘manual’ and a PoC, and maybe it can be useful for someone besides myself.

Part I. – force c++ to cooperate

First of all, it is no big deal to use C++ in kernel mode. For example, some of the references that I was able to find :

OK, now it is good enough for using C++ in kernel mode, including the nice std::unique_ptr. But when you attempt to use boost::intrusive or std::shared_ptr you may encounter some problems! And these problems are C++ E X C E P T I O N S; on Code Project there is a very good article about C++ exceptions vs. the kernel.

In the libc.git project, I use $(DDK_LIB_PATH)\libcntpr.lib; to include some additional functions (_hypot and friends…). The next version of VS may need other functions as well; for those you should first look for a .lib in $(DDK_LIB_PATH) and try to link it to the project, and only if nothing is found implement them yourself …

But the important thing here is to avoid linking a user mode .dll into your kernel mode driver.

I also pushed to github KernelProject, a PoC of using libc.git in a new WDM driver for Visual Studio 2013 – (shared_ptr, unique_ptr, boost::intrusive::avltree). (I also used the Common repo, to demonstrate on Vads and to use the CCppDriver class as the “main” of the driver.) Windows 8.1 Release is the configuration – platform that is set up, so you can try to set up Windows 7.1 Release to also run the Vad test successfully (because the constants in undoc are set up just for win7sp1 and win7, both x64).

Boost your VadRoot iterator!
http://www.zer0mem.sk/?p=439
Thu, 27 Mar 2014 23:54:57 +0000

When it comes to working with the memory of a process, it comes in handy to have information about the whole address space of the process: to not touch PAGE_GUARD pages, to know the executable and writable pages, etc.

For that purpose I already implemented VadWalker in my kernel common repo, and I also use it in the DbiFuzz framework. But recently I came across some ideas for improving my previous approach and doing it more efficiently and kind of smarter.

http://www.codemachine.com/figure_protopte_2.png

VadRoot is in fact a simple AVL tree, and the AVL struct is commonly used across the Windows kernel. For working in AVL style in the kernel, MSDN documents some support :

RtlInsertElementGenericTableAvl

RtlDeleteElementGenericTableAvl

RtlNumberGenericTableElementsAvl

RtlGetElementGenericTableAvl

RtlLookupFirstMatchingElementGenericTableAvl

…

For iterating this RTL_AVL_TABLE struct, I implemented simple bstree methods, which I then used in AVL.hpp (LockedContainers.hpp), and at the same time I used these methods for walking through VadRoot itself. But this approach leads to implementing more and more logic for iterating through VadRoot, so I started thinking about another method, to not over-engineer it too much..

First, some words about the VadRoot AVL-tree struct. It seems to be some kind of intrusive mechanism, which inserts { parent, left, right } links into MMVAD_SHORT. At the same time the parent link has to be sanitized, because additional info about the node is stored in the last unused part of the pointer [ sanitization mask = ~sizeof(void*) ].
And it sounds familiar, doesn't it ? boost::intrusive::avltree already does the same job!
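To make the parent-link sanitization concrete, here is a minimal user-mode sketch of a parent pointer that carries a small tag in its unused low bits. The `Node` type and helper names are hypothetical, not the real MMVAD_SHORT layout, and the mask is an assumption of this sketch: it clears all low alignment bits, `~(alignof(Node) - 1)`, which may differ from the mask the kernel actually uses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical intrusive node, not the real MMVAD_SHORT.
struct Node {
    Node* left = nullptr;
    Node* right = nullptr;
    std::uintptr_t parent_bits = 0;  // parent pointer OR-ed with a small tag
};

// Assumption: nodes are aligned to alignof(Node), so these low address bits
// are always zero in a real pointer and are free to carry extra info.
constexpr std::uintptr_t kTagMask = alignof(Node) - 1;

inline void set_parent(Node& n, Node* parent, unsigned tag) {
    assert((reinterpret_cast<std::uintptr_t>(parent) & kTagMask) == 0);
    assert(tag <= kTagMask);
    n.parent_bits = reinterpret_cast<std::uintptr_t>(parent) | tag;
}

// Sanitize: strip the tag before using the value as a pointer.
inline Node* parent_of(const Node& n) {
    return reinterpret_cast<Node*>(n.parent_bits & ~kTagMask);
}

inline unsigned tag_of(const Node& n) {
    return static_cast<unsigned>(n.parent_bits & kTagMask);
}
```

Any code that walks upward in the tree has to go through `parent_of`; following `parent_bits` directly would land on a misaligned address whenever the tag is non-zero.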

A problem can be that in different m$ versions the node pointers can be stored at different offsets,

Now the problem of iterating through VadRoot (getnext, getprev) is *almost* solved, together with the classic functionality a.k.a. find, lower_bound…

‘Almost’ consists of two parts.

1. There is still a problem with creating a header for the avltree, as a start point for the iterating / finding algos

It can be solved, but not in a clean way; in fact it is a bit of a hack. But on the other side, usage of VadRoot itself is a hack too … So the key point is to create a temporary boost::intrusive::avltree and insert a dummy node with NULL Vpn’s into it, and redirect its pointer links to VadRoot itself!
Here is a bit of a sample :

2. The next issue is locking. To access VadRoot itself safely, it is necessary to lock the AddressSpace & WorkingSet of the process and mark the thread which is holding the locks. In addition, the MMVAD_SHORT nodes themselves use a locking mechanism for working with them, so go deeper

In my PoC I provide a simple locked wrapper around VadRoot, which wraps the functions { Contains, GetMemory, Find } and provides all the necessary locking.
In addition, Find returns an iterator that represents the targeted MMVAD_SHORT and holds the locks while you work with this object.

When I have time I will merge it into my kernel common repo, but meanwhile you can find some classes here

Callgate to user : nt!KeUserModeCallback & ROP / MDL
http://www.zer0mem.sk/?p=410
Sun, 22 Dec 2013 16:14:19 +0000

Sometimes in kernel development you need to process some user mode data. But some of these data structs are internal and not so well documented; there are functions available which work with these structures, but they are often exported for user mode only. What are the options in that case ?

user mode component – service / application

find kernel mode alternative function – often not exported

reverse structure – parse it by yourself

nt!KeUserModeCallback

In this blog post I will describe the last mentioned method; you do not need additional resources or reversing of undocumented structures. Some articles related to the nt!KeUserModeCallback ring0 – ring3 – ring0 gate :

So the data are copied onto the stack, nice! And when you take a closer look at the user32!_fnDWORD function, you can see that the 5 parameters which are passed to the nice looking call are stored in inputBuffer. And in addition, the address of this call is stored in inputBuffer as well!

It seems everything works in our favor, but there is one more thing left. How to provide a cpl3 code address for the call ?

ROP – all you need you already have!

MDL & PTE – get fewer privileges for your code

ROP technique :

I personally like this technique because it is fun to play with, but for developers it is most probably not such a cool solution, because it needs to keep up with various OS versions for compatibility, and at the same time optionally have a ROP gadget tool (OptiRop) / compiler (ROPC) to spare your time when finding an appropriate ROP sled.

I was lucky enough and relatively easily found a sufficient ROP sled for ruling over the control flow after the user32!_fnDWORD magic call was invoked. Indeed, that was due to the low complexity of the needed code

It seems ROP comes in handy not just for exploits; its main pro is that it is a non-invasive method, where you use ring3 code that is already present. This method is transparent, but on the other side, when it comes to development, you need to keep an eye on different versions of the OS, where the binaries change, to implement correct ROP gadgets. This can be a pain in the ass – maybe a ROP tool available at runtime could solve this problem more generically.

MDL & PTE :

Another option for reaching our goal is a more developer-friendly method – sharing kernel mode code with user mode. This can be done by documented methods – a memory descriptor list (MDL).

* other APIs also include a call to an address stored in inputBuffer, but this API has almost no processing, is really straightforward and does not move RSP too far from the original, so inputBuffer is within reach for the ROP technique!

Appendum to “How Safe is your Link ?” – how to fool LFH
http://www.zer0mem.sk/?p=187
Thu, 26 Sep 2013 11:40:05 +0000

A heap overflow bug can potentially alter the heap in such a way that you can rule over its allocation / deallocation mechanism. Nowadays it is a little bit harder, because you need to fulfill some subset of prerequisites for the chosen technique, and more often than not that is not possible.

This post will describe how to break even LFH through a plugin, a custom PoC for IE10 on win8 CP, vulnerable to the winXP-8CP backend attack.

A prerequisite for this blog post is to read the details about the introduced exploitation technique :

Due to the heap overflow bug present, a resizable chunk, and the nature of the file buffer used in the plugin code, it is possible to do a small leak & reuse already used memory – for details of this technique read the mentioned materials

When LFH is triggered, some additional pre-allocation is performed and _HEAP_LIST_LOOKUP.ArraySize is updated (depending on whether ListHints contains a bigger chunk than LFH tries to alloc).

In this state of the heap another search is performed -> by using ListsInUseUlong!

The search by walking through ListHints is a validation-free approach, and in that case we have no problem

But the search by walking through ListsInUseUlong is kind of another approach: again no validation, which counts for us, but a memory chunk is inserted into ListsInUseUlong when it is freed and also cleared when it is allocated. The problem is that we have already allocated our memory chunk, so it is cleared from ListsInUseUlong, and ListsInUseUlong is what is used in the attempt to find a chunk for the LFH userdata block… So how to insert it back ?

ListsInUseUlong is just a bitmap and that's it! The bitmap is able to cover just one deputy per size, and due to this there has to be another option for linking something back even if it is already used…

In other words, if _HEAP_ENTRY(FLink).Size is the same size then the bit is not cleared and
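The bitmap rule described above (set on free, cleared on alloc, but kept when the forward-linked chunk has the same size) can be modelled in a few lines. This is a deliberately simplified sketch, not the real heap manager code: `FreeListModel`, the single size-sorted list and the 32-bit bitmap are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <list>

// Simplified model (assumption): one size-sorted free list plus a 32-bit
// bitmap with one bit per chunk size, standing in for ListHints and
// ListsInUseUlong.
class FreeListModel {
public:
    // Freeing a chunk links it into the list and sets the bit for its size.
    void free_chunk(std::size_t size) {
        std::list<std::size_t>::iterator it = chunks_.begin();
        while (it != chunks_.end() && *it < size) ++it;
        chunks_.insert(it, size);
        in_use_ |= bit(size);
    }

    // Allocating unlinks the first chunk of that size. The bit is cleared
    // only when the next chunk (FLink) does NOT have the same size.
    bool alloc_chunk(std::size_t size) {
        for (std::list<std::size_t>::iterator it = chunks_.begin();
             it != chunks_.end(); ++it) {
            if (*it != size) continue;
            std::list<std::size_t>::iterator flink = std::next(it);
            const bool flink_same = flink != chunks_.end() && *flink == size;
            chunks_.erase(it);
            if (!flink_same) in_use_ &= ~bit(size);
            return true;
        }
        return false;
    }

    bool bit_set(std::size_t size) const { return (in_use_ & bit(size)) != 0; }

private:
    static std::uint32_t bit(std::size_t size) {
        return std::uint32_t(1) << (size % 32);
    }
    std::list<std::size_t> chunks_;
    std::uint32_t in_use_ = 0;
};
```

With two free chunks of the same size, the first allocation leaves the bit set and only the second one clears it, which is the "one deputy per size" behaviour the bitmap has to cope with.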

Conclusions : As was mentioned in the presentation How Safe is your Link ?, security needs to be implemented without shortcuts.
As you can see, the usage and logic of the ListsInUseUlong bitmap are implemented correctly*. Setting and clearing the bitmap ensures that _HEAP_LIST_LOOKUP.ListHints contains only valid memory chunks, and that implies a secure alloc / free. But the ‘non-secure’ FreeListSearch algo introduced in the previous post allows bypassing the ListsInUseUlong safety mechanism.
So keep in mind that even a small security hole can cause trouble and break otherwise secure processing…

The vulnerable plugin and the .py script for crafting data for this plugin are both just illustrative and not too important, so I did not include them in the sources on github. They were used just to illustrate the idea itself – but if you want to see them, I will provide them to you, just ping me on my email

* AGAIN except for the missed validation check when checking whether FLink is the same size.

DBI framework for fuzzing on the board, part I.
http://www.zer0mem.sk/?p=331
Sun, 04 Aug 2013 13:30:39 +0000

I started researching a bit around fuzzers, fuzzing techniques and practices. As I studied materials about fuzzing, the code (node / edge) coverage approach quickly impressed me. But for this method it is essential to have a good DBI. Pin or Valgrind are good solutions, but I will try to make it in a lightweight way – specialized for further fuzzing needs.

Already implemented features :

BTF – Hypervisor based

PageTable walker

VAD walker

full Process control [images, threads, memory]

Syscall monitoring – implemented process virtual memory monitor

FEATURES

BTF – Hypervisor based

Branch tracing is a known method already implemented in some tracers, but the known (*known to me) methods implement it just under a debugger. When it comes to playing with binary code (tracing, unpacking, monitoring) – I don't like a simulated environment like a debugger, because it is too slow… it can be a little crappy to set up BTF in the MSR, in ring0, wait for exception processing and finally for your exception handler to execute and handle branch tracing, and then, to keep tracking, set up BTF in the MSR == switch to ring0 again! This seems like a solid performance killer.

But in the previous post I mentioned the possibility of using Intel VT-x technology to gain some advantages for a reasonable performance penalty. After playing a bit with the documentation and some debugging, I came to an easy way to extend the hypervisor that was introduced there with handling TRAPS and keeping an eye on BTF!

In other words, when the Trap flag is set then each trap exception will cause a VM_EXIT. So tracing on branches will be handled not by system exception handling but by our monitor! And with performance penalty == our BTF processing and the VM_EXIT cost!
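As a rough illustration of which bits are involved, the sketch below computes the values a monitor would want for branch tracing. The `GuestState` struct and function names are hypothetical and nothing here touches real MSRs or a VMCS; the bit positions (RFLAGS.TF = bit 8, IA32_DEBUGCTL.BTF = bit 1, #DB = vector 1 in the exception bitmap) follow the Intel SDM.

```cpp
#include <cstdint>

constexpr std::uint64_t kRflagsTrapFlag = 1ull << 8;  // RFLAGS.TF
constexpr std::uint64_t kDebugCtlBtf = 1ull << 1;     // IA32_DEBUGCTL.BTF
constexpr std::uint32_t kDebugExceptionBit = 1u << 1; // #DB is vector 1

// Hypothetical snapshot of the guest fields the monitor cares about.
struct GuestState {
    std::uint64_t rflags;
    std::uint64_t debugctl;
    std::uint32_t exception_bitmap;
};

// TF + BTF make the CPU raise #DB on branches instead of on every
// instruction; the exception bitmap bit turns that #DB into a VM exit.
inline GuestState enable_branch_tracing(GuestState g) {
    g.rflags |= kRflagsTrapFlag;
    g.debugctl |= kDebugCtlBtf;
    g.exception_bitmap |= kDebugExceptionBit;
    return g;
}

// True when a guest #DB is delivered to the monitor and not to the guest OS.
inline bool traps_to_monitor(const GuestState& g) {
    return (g.exception_bitmap & kDebugExceptionBit) != 0;
}
```

In a real hypervisor these values would be written with VMWRITE into the guest-state and control fields; the sketch only shows the bit arithmetic.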

For effective fuzzing it is necessary to fuzz the application from a particular state (or you can just kill performance by re-running the application per fuzz test case), and this is related to saving context -> memory address space, thread context (registers / stack). It is no big deal to just enumerate the memory address space and save the context, and you have it .. but it needs a lot of memory resources and it is time consuming as well …

Handling memory write attempts from the app to protected memory is done via a hook on PageFault, in which the memory is temporarily updated with the original protection mask.

VAD walker

But it has some issues! .. first of all, I tried to disable write by unsetting this flag in the PTE for the address when the memory is allocated, buut … at this moment PTE(addr).Valid == 0 … the magic is that, for performance reasons, m$ does not create the PTE per allocation request, but instead at the first access (== pagefault) to this memory.

It can be overcome by handling it after the PTE is created for the given memory range, but a simpler option comes up here. How does m$ code know the flags of the memory, and even whether that memory is allocated at all, so that access should be granted ? The answer is the VAD ! Some interesting reading can be found in Windows Internals, 6th edition [Chapter 10 Memory Management] .

So let's update the VAD instead of the PTE per alloc . The PTE should in exchange unlock (write enable) the memory in the PageFault caused by the application's attempt to write to its own memory – but we also get a callback that particular bytes are likely to change.
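The PTE side of this scheme can be sketched as plain bit manipulation. The helper names and the `FaultResult` struct are hypothetical; the bit positions are the hardware x64 ones (bit 0 = present/valid, bit 1 = read/write), and the `monitored` flag stands in for whatever VAD-based bookkeeping decides that a page belongs to us.

```cpp
#include <cstdint>

constexpr std::uint64_t kPteValid = 1ull << 0;  // present bit
constexpr std::uint64_t kPteWrite = 1ull << 1;  // read/write bit

inline bool pte_valid(std::uint64_t pte) { return (pte & kPteValid) != 0; }
inline bool pte_writable(std::uint64_t pte) { return (pte & kPteWrite) != 0; }

// Protect: drop the write bit so the next write raises a page fault.
inline std::uint64_t write_protect(std::uint64_t pte) { return pte & ~kPteWrite; }

struct FaultResult {
    std::uint64_t pte;  // PTE value after handling
    bool notify;        // true: tell the monitor the bytes are about to change
};

// Page-fault side: for a valid PTE of a monitored page, re-enable write and
// report the event. Anything else is left for the OS handler.
inline FaultResult on_write_fault(std::uint64_t pte, bool monitored) {
    if (!pte_valid(pte) || !monitored) return {pte, false};
    return {pte | kPteWrite, true};
}
```

The invalid-PTE branch is the case described above: right after allocation there is nothing to patch yet, which is why the protection is recorded in the VAD instead.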

The VAD can be found in the EPROCESS structure, and I am not proud of it, but it needs some system dependent constants (to avoid rebuilding the whole project when it should be shipped for another version of Windows, some TODOs are mentioned at the end of the blog). Also, a great source of internal knowledge of m$ code (besides the ntoskrnl binary itself ) is the ReactOS project.

Under a debugger you get events about everything, but if you set it up correctly in your on-the-fly (> debugger free) monitor you can get the same results as callbacks – which ensures a speed-up of the whole processing

“System calls provide an essential interface between a process and the operating system.” – and so it is a nice point to hook and monitor a process. For now just a virtual memory monitor is implemented, to keep an eye on the memory address space – the protection of memory pages

windows version dependent constants => the constants should be provided by the user app using this framework. These constants can be obtained manually from windbg, ida, by windbg + script – pykd, or by playing with SymLoadModuleEx

.. as a demo this concept was written: at alloc it sets the allocated memory unwritable, and at the first access it ignores it – exception handling in the try block is invoked – but at the second access, access is granted by setting PTE(address).Write = 1 in PageFault

00000000BADF00D0 is the marker of BTF; before it the source instruction (which changed the control flow) is printed, and after it follows the destination address (current rip). @VirtualMemoryCallback + @Prologue + @Epilogue is the implementation of the current state of SYSCALL + PageFault cooperating to handle memory writes – using PTE and VAD.

Monitor everything you want (in Intel vt-x style)
http://www.zer0mem.sk/?p=302
Sat, 22 Jun 2013 14:06:18 +0000

Virtualization can be utilized to reach various goals, such as monitoring the system, system resources and applications as well. It can be used for full system virtualization, but I also like the approach of using it just as a tool . This post will shortly cover the implementation of a mini-hypervisor (which is now available on github) for Intel VT-x on the x64 platform, and demonstrate the concept of how to use it.

You can monitor & play with the system / application on your own. You can use EPT for monitoring memory access, combine it with other CPU features … if you set up your hypervisor right, you can have a callback at hypervisor level (and transferred to non-root mode for free if you want) on any event you want. The VM-exit switch is not so cheap, but it is no tragedy either, and the goal can easily outweigh it

How to boost PatchGuard : it’s all about gong fu!
http://www.zer0mem.sk/?p=271
Tue, 04 Jun 2013 07:21:22 +0000

In this post I will take a look at PatchGuard, at the classic scenario of bypassing this protection and also at a little bit different one. I will also examine a new way (but most probably not new, just reinvented, because it is too obvious and quite effective) to locate & abuse the PatchGuard context and its behaviour.

With this background, I started looking for the PatchGuard context and the way it is invoked in win8 (how easy it is to find, I will describe in the next blog post). Here are some snapshots of the context assembly code (the whole dump is located on github along with the PoC sources) :

The classic approach to defeating PatchGuard is to terminate it as soon as you can, but I think that is a waste of its potential . The inspiration for what to do with PatchGuard comes from the writers of the malware known as Goblin / Xpaj (btw. a very well written piece of metamorphic file infector – in the new era also with bootkit functionality). They do not terminate PatchGuard during boot; instead they abuse its behaviour for protecting their own hooks! Which is quite brilliant, because no one is more suitable for protecting your piece of code than the Windows protection mechanism – PatchGuard itself

So let's do it with a minimal touch of the system and a fine gain => let's hook SYSCALL! As you already know, PatchGuard reads the MSR (Intel – IA32_LSTAR – 0xC0000082) and checks it for consistency. But before that you need to handle some issues :

locate PatchGuard Context

locate Syscall pointer saved in PatchGuard Context

exchange it without drawing the attention of KeBugCheck ;)

It seems it is no big deal, and really it is not ! First of all, take a look at the 3 exit points which invoke the PatchGuard routine :

In my PoC I focused just on hooking the first method (nt!KiProcessExpiredTimerList+0x1fc) – and ofc there can be (and probably are) more such exit points which need to be hooked for a proper implementation of abusing PatchGuard. OK, so let's observe some behaviour :

As you can see, there is no huge number of different routines, except the PatchGuard DeferredRoutine with a bogus pointer… And this is the point: when you are able to detect that it is now the PatchGuard routine's turn, then with a little research there is a very high probability that you are able to own it! And even if detecting the PatchGuard routine becomes a problem in the future (a large number of routines with bogus pointers, or a more stealthy way of invoking the PatchGuard routine), I guess there will always be a way to detect it – generic or through a pattern …

LOCATION of PatchGuard Context

OK, so the main point is that, when you look deeper at the final DPC routines which are supposed to resolve the original pointer to the PatchGuard context and its code, you can see interesting things calculated here :

As I said, this is probably not a new ‘discovery’ – I did not find it anywhere, but that probably just means I did not search deep enough …; So from the code you have a straight way to locate the PatchGuard context :

So as you can see, when you are able to filter all exit points, then in the current state of PatchGuard it should not be a problem to recognize whether a DeferredRoutine is related to PatchGuard and to locate its context.

I post the nt!KiDispatchCallout method snapshot here mainly because it clearly describes how to pwn PatchGuard. The first DWORD is always patched to the same value – probably as an attempt to handle one of the bypass techniques ? – and because of the nature of the decryptor you also know what the first QWORD looks like (0x085131481131482e), and that means disclosure of the XOR_KEY! And when you take a deeper look at the PatchGuard decryption mechanism, you can play with it on your own :

(even without xoring it would be possible to gather this xor_key – xoring is just a pretty shortcut –> just keep in mind you are at the same level as PatchGuard, in your hook you have the same data available, and you can debug – disassemble – just play with the code which calculates this xor_key)
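The known-plaintext shortcut is simple XOR arithmetic: if the decrypted first QWORD is known, XOR-ing it with the encrypted first QWORD yields the key. The sketch below shows only that step. The function names are hypothetical, and `xor_with_key` assumes the same key is applied to every QWORD, which is a simplification; the real decryptor may transform the key between QWORDs.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Known plaintext of the first QWORD, taken from the post.
constexpr std::uint64_t kKnownFirstQword = 0x085131481131482eULL;

// cipher = plain ^ key  =>  key = cipher ^ plain
inline std::uint64_t recover_xor_key(std::uint64_t encrypted_first_qword) {
    return encrypted_first_qword ^ kKnownFirstQword;
}

// Assumption of this sketch: one fixed key for all QWORDs. XOR is its own
// inverse, so the same routine encrypts and decrypts.
inline std::vector<std::uint64_t> xor_with_key(std::vector<std::uint64_t> data,
                                               std::uint64_t key) {
    for (std::size_t i = 0; i < data.size(); ++i) data[i] ^= key;
    return data;
}
```

Usage would be `xor_with_key(encrypted, recover_xor_key(encrypted[0]))`, valid only as long as the fixed-key assumption holds for the build at hand.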

The position of the Syscall ptr is relative to the main function of PatchGuard, and the position of this main function is stored in the structure describing its context :

But you know the encryption mechanism, so you just need to implement the encrypt / decrypt mechanism; also, just the code and PAGEGUARD_STRUCT are encrypted, the data are plain. The second problem can be solved by disassembling the checksum mechanism and re-calculating the checksum on the current data .. crc snapshot :

Problems & points to research:

PatchGuard is not single-threaded, so it is necessary to deal with all threads

Find out all exit points

Conclusions : PatchGuard is a very nice piece of code, really challenging :) In my opinion there should always be a way to ‘defeat’ it (when you are at the same privilege level it is all just about gongfu ), but I guess that's not the point. The main reason for PatchGuard, as I see it, is to force software vendors to avoid intercepting the kernel with hooks and altering key structures in undocumented ways. But looking at the problem this way, I could not come up with a reason for so much hardening. Still, I believe it is an interesting part of the job for m$ developers: hiding code, implementing obfuscation, responding to new bypass techniques etc. – but it sounds familiar, doesn't it ?

On the other side, if m$ does it also for anti-malware reasons, I am not sure that they do it effectively. Malware writers do not come to them with this kind of findings; instead they implement and sell them. And when a new bypass technique invented by malware writers is uncovered, what would m$ do next ?

… when they just update PatchGuard it causes a BSOD on infected machines => users don't like BSODs. With legitimate software vendors it is the 3rd party's fault and the user will blame the crappy software. But in the case of malware m$ will need to handle disinfection of the system first, otherwise they are screwed by the malware writers …

So I think PatchGuard is perfect for ruling over software vendors, so that they cannot patch the kernel, but not as malware protection. I think if m$ wanted to improve this code more efficiently they should create some kind of competition like Pwnium, but related to breaking PatchGuard; it would give a reason for hardening PatchGuard and it would be more fun

Uint8Array pwn them all!
http://www.zer0mem.sk/?p=233
Wed, 08 May 2013 13:06:44 +0000

This week I took a look at the research blog post by @vupen, Advanced Exploitation of Mozilla Firefox Use-after-free CVE-2012-0469 . It is a one year old vulnerability, but thanks to it, a simple idea came to my mind…

Exploitation of this vulnerability, as presented on vupen’s blog, was not easy, because it has in its arsenal just a controlled OR for a certain location, and it uses nothing more – which is quite interesting ! Because of that, the exploitation grows in complexity, and the first step was logically expanding the length of a string object, for a memory leak. As the next move an OR was performed on one tag object, targeting its VTable, which ends in arbitrary code execution.

but … in the generation of HTML5 and its new features, an easier and more general method could be used for this exploitation

In the previous blog post I focused on some HTML5 features, and one of them was a container with byte-level access : Uint8Array . Logically this container also has a length, and when we are able to expand it, an effect similar to the patched string object is achieved – but keep in mind that Uint8Array is far more effective -> byte-level access and, more importantly, direct access to data!

So it seems there is no problem here … but, when you start spraying Uint8Array in this way

you will find that nothing happens! The magic is that the original length, which should be patched to achieve the overflow effect, is stored not in the object with the data buffer, but in its container…

So one option is to start spraying Uint8Array, which will effectively spray 2 objects in 2 different parts of the heap, which can be a problem, because the length member is in the smaller object, so it can be a little dizzying to hit this one …

But direct access for Uint8Array made available a method like subarray, which brings special functionality :

The range specified by the begin and end values is clamped to the valid index range for the current array.

And when you take a closer look at this subarray object, you recognize how it is achieved. Simply, the subarray object defines the length of the subarray and a pointer to the original buffer.
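A subarray of this kind can be modelled as a (pointer, length) view over the original buffer. The sketch below is a hypothetical C++ model, not the browser's implementation: `ByteView`, `subarray` and `in_bounds` are illustrative names. It shows the clamping rule quoted above and why the view's own length field is the only thing standing between an index and the rest of memory.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model of a subarray object: no copy, just a pointer into the
// original buffer plus its own length.
struct ByteView {
    std::uint8_t* data;
    std::size_t length;
};

// Clamp one index to [0, len]; negative values count from the end.
inline std::size_t clamp_index(long long i, std::size_t len) {
    if (i < 0) {
        i += static_cast<long long>(len);
        if (i < 0) i = 0;
    }
    if (static_cast<unsigned long long>(i) > len) return len;
    return static_cast<std::size_t>(i);
}

// begin/end are clamped to the valid index range of the current view.
inline ByteView subarray(ByteView v, long long begin, long long end) {
    const std::size_t b = clamp_index(begin, v.length);
    std::size_t e = clamp_index(end, v.length);
    if (e < b) e = b;  // empty view when the range is inverted
    return ByteView{v.data + b, e - b};
}

// Every element access is validated only against the view's length field.
inline bool in_bounds(const ByteView& v, std::size_t index) {
    return index < v.length;
}
```

Once the `length` field of such a view is overwritten with a huge value, `in_bounds` accepts any index, while the clamping at creation time no longer matters.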

… and that's it! … now it is totally sufficient to look and patch at just one place -> subarray.length! We know that the size of the subarray object is nicely aligned, 0x50b, so why not heapspray it ? .. with this heapspray, when you are able to patch subarray.length you win! To be exact, when you are able to patch the length to native ~0x0, then you fully pwn the app (the whole address space of the current app); you can patch everything you want, and more easily than ever before

but a little drawback is present .. I need to have one of the values { 0x10, 0x11, 0x12 } at address 0x218ff00c, but the bytes at this address are unused and zeroed by default … BUT! …

due to the specifics of this vulnerability, you are able to perform 2 OR operations, so let's craft a second value, which will patch our 0x218ff00c byte to 0x10!

0x17a44 0x10 0x218ff00c 0x218ff044 [ 0x218bd00c ] > 0x218bd220

it sounds nice, but now I still need { 0x10, 0x11, 0x12 } at address 0x218bd00c, but when you look at |PAGE + x * 0x100C| you should see the value 0x6, which is related to the subarray object.. so what next ? … after a few experiments, try to allocate substrings along with the subarrays :)

OK, so subarrays and substrings are collected on the same ‘page’ of the heap, and if you spray correctly, at the bottom of the page (especially at 0x219000c0 too :) ) there will be a subarray!

Conclusions : Typed arrays are very powerful from the view of a developer, but also in the hands of an attacker. Direct access and byte-level access make them ideal support for an attack, and make the day for hard-to-exploit vulnerabilities … maybe some checks of overlapping lengths would not be a bad idea…

* FF11 was used for testing, but it works similarly (with different offsets) in newer FF, GC ..

Due to the missing validation check on _HEAP_USERDATA_HEADER, we can play with this structure and find a way to alter it, obviously without anyone (even the heap manager) caring, into a shape that will look cool for managing the heap manager in our own way

Two Pointers are good enough

Just take a look at how the members of _HEAP_USERDATA_HEADER are aligned

The first two pointers (SubSegment, Reserved) are not interesting; the last ‘pointer’ BitmapData and the third ‘pointer’ (SizeIndexAndPadding, Signature) are also not interesting. In BusyBitmap you have to place a valid ptr to Buffer and a reasonable SizeOfBitmap, and also crucial for the calculation formula are the members of the fourth pointer, FirstAllocationOffset and BlockStride.

The question is : are we able to cover BusyBitmapBuffer with the same pointer that covers FirstAllocationOffset and BlockStride ? .

target memory

Due to the sizeof(pointer) based randomization, it is no problem to allocate the memory (relative to your userdata block) you want! Just calculate (depending on FirstAllocationOffset, BlockStride, and the position of _HEAP_USERDATA_HEADER) how many members of bitmapdata of size sizeof(pointer) you need to reserve (value == -1), in the particular bitmapdata member set just one bit to 0, and set the next member of bitmapdata as free (value == 0). And the randomization is nullified

But keep in mind that the memory that should be targeted has to fulfill, in its fake _HEAP_ENTRY.UnusedBytes, the condition !(UnusedBytes & 0x3F) to prevent RtlpLogHeapFailure()
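The calculation hinted at above can be sketched as follows. This is a simplified model with hypothetical function names. It assumes the returned address is header + FirstAllocationOffset + index * BlockStride, with index being the first clear bit of the busy bitmap, and it ignores the random starting position the real allocator uses, since the crafted bitmap described above is meant to leave only one acceptable slot anyway.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Index of the first clear (free) bit in the bitmap, or -1 when it is full.
// Words holding ~0 are the "reserved" members (value == -1) from the text.
inline long first_free_index(const std::vector<std::uint64_t>& bitmap) {
    for (std::size_t w = 0; w < bitmap.size(); ++w) {
        if (bitmap[w] == ~0ULL) continue;
        for (unsigned b = 0; b < 64; ++b) {
            if (((bitmap[w] >> b) & 1ULL) == 0)
                return static_cast<long>(w * 64 + b);
        }
    }
    return -1;
}

// Assumed formula: every value here comes from the (attacker-writable)
// userdata header, so the result can be steered.
inline std::uint64_t chunk_address(std::uint64_t header,
                                   std::uint16_t first_allocation_offset,
                                   std::uint16_t block_stride,
                                   std::size_t index) {
    return header + first_allocation_offset + index * block_stride;
}

// Condition from the text: the fake _HEAP_ENTRY at the target must satisfy
// !(UnusedBytes & 0x3F), otherwise RtlpLogHeapFailure() is reached.
inline bool unused_bytes_ok(std::uint8_t unused_bytes) {
    return (unused_bytes & 0x3F) == 0;
}
```

Filling the leading bitmap words with ~0 and clearing a single bit in the chosen word pins `first_free_index` to one value, which is how the crafted header makes the allocation land on a predictable address.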

Rewritten header :

Consequences :

You are able to target (malloc will return an address you specify) every address which you can cover with FirstAllocationOffset, BlockStride, and the position of _HEAP_USERDATA_HEADER

More simply, you can force malloc to return the same address 2 times. The first time for the app, which can place some vtable – or sensitive data – there, and the second time for your writable buffer .

Conclusions : This post was only playing with the heap, and showing what can be done with the current state of the heap manager logic. A lot of improvements were made in win8 compared to winXP, but validation checks are really essential for keeping the heap in a proper state (or at least detecting corruption). The problem with missing validation checks on the LFH UserData block has been present since win7 and is not fixed in the current win8; I hope the next version of Windows will also add a validation check on the _HEAP_USERDATA_HEADER header per allocation (or maybe something better ) instead of just renaming members and changing the calculation formula.