I will not introduce myself as an expert in security research. In fact, I would never call myself an expert (at most, a professional) in any field. Every research field expands every year, a little or a lot, so it is not possible to call anyone an expert in it. I consider myself an advanced software and security researcher, so a professional in this field, with his or her dedication and deep thinking, would be an inspiration to me. The year 2018 was a disappointment to me. People who could have been a great inspiration to a newbie like me (which I was, before 2018) completely failed to show the impartiality and encouragement needed to move the field forward. Hence, in 2019 we are still asking the same question: why can we still not prevent decades-old security issues? We have failed to resolve old issues while continuously facing new, advanced ones (e.g., online fake-news propagation driven by the latest machine-learning technology). This is very frustrating, but at the same time it should be an encouragement for newbie and advanced researchers alike to row against the tide (the negative view). I hope this new generation of security researchers will realize how much more important it is to resolve issues in practice than to daydream over theoretical talks. I hope so because this generation sees how badly the negligence of professional research hurts us today. Our every day starts with doubt about whatever we see, hear, and read online. A current trend is to cut people off from the virtual world. This is dangerous for the human race and a disgrace for us. We cannot let it happen; people have to stay connected the way they are now, but without any fear of being exiled. I am sure our professionals may change their views, but they cannot contribute much, so it is all upon us. I would like to follow up with a few points from my observations:

Technology is moving a lot faster than our ability to ensure its stability. We are aiming to travel to space while we have not yet secured driverless vehicle systems. We should not slow down our evolution, but more people have to focus on ensuring the stability of the systems we build. Every year, maybe 5% of graduate students choose security research. It is not because the field lacks funding, but because prospective researchers see it as a harder job than other research. I do not know how to draw more people into this field, because their assessment is fair: it is a challenging field to work in. Maybe other fields should impose stricter requirements on their research. For example, machine-learning research could be required to provide a 90% security guarantee in practice. This would both reduce the pressure on security and level the challenge bar across fields.

Traditionally, security researchers consider both the performance and the scalability of their system before deploying it in practice. I believe this is not the right order to follow. First of all, the implementation has to be 100% correct. The system may offer only a 90% security guarantee because of the complexity of the environment it works in; by this I mean a system can have limitations, but the part it does cover should be 100% correct. It is always possible to overcome limitations later, but once a system grows up, it is no longer possible to fix its correctness. Performance should be the third factor. I acknowledge that if a security feature causes 100% performance overhead, it is useless. But if it causes 50% overhead today, in five years that may be only 20%, thanks to advances in computing machinery. So let us accept such systems and move forward with them. Only with this strategy can we avoid questions like why decades-old issues are still around. Overall: correctness first, then scalability, then performance.

Changing our view is the most urgent task. We know static analysis is an overapproximation, not 100% correct. It is meaningless to stick to it. Let us use it as a secondary tool and focus on more precise dynamic and symbolic analysis. These tools need to be improved, considering they are the real future for resolving our issues in practice.

Usually, in research, we target an issue and theoretically design a system to prevent it. As we get deeper into the initial design, we find more complicated design problems. Hence, sometimes we design a subsystem that actually creates another issue, stupidly, even the very issue we are trying to resolve. If you are trying to provide a security guarantee for indirect jumps, have a design in mind, but hit a technical design problem, do not build a sub-system that creates more indirect jumps when you cannot ensure 100% code coverage. That is complete stupidity. If you cannot see a way out, kill your idea and move on to another.

In security research, we all see two camps: academics and hackers. This division is extremely visible (they even attend different conferences of their own). Researchers on both sides are responsible for this situation, and the tradition needs to change. The blame game is not good for anyone. Both sides have to fix their shortcomings and move forward together. Academic researchers should be impartial regardless of the institution they represent. Hackers have to stay dedicated, because a system can only survive through years of dedication.

HexType: Efficient Detection of Type Confusion Errors for C++

Summary: In a typed programming language, typecasting is a common phenomenon. With the object-oriented programming paradigm, this feature turns into a dangerous attack surface. When a derived-class object is cast to a parent-class pointer (upcast), it is usually safe, since the parent class is a subset of the derived class. On the other hand, casting a parent-class object to a derived-class pointer (downcast) can raise a serious issue (if the derived class has members beyond those of the parent). Moreover, these casts are often indirect (an object is passed to a function as a parent-class pointer). The paper attempts to resolve the issue with runtime checking (similar to the check dynamic_cast performs, but that is expensive and limited to classes with virtual functions). The work expands coverage with a more efficient runtime check built on its own data structures.
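To make the hazard concrete, here is a minimal, hypothetical C++ sketch (the class names are mine, not from the paper) showing why an unchecked static_cast downcast is dangerous while a runtime-checked cast catches it:

```cpp
#include <cassert>

// Hypothetical class hierarchy; the names are illustrative, not HexType's.
struct Parent {
    int base_field = 1;
    virtual ~Parent() = default;   // virtual dtor so dynamic_cast works here
};

struct Derived : Parent {
    int extra_field = 2;           // member the parent class lacks
};

// An "indirect" cast site: the callee only ever sees a Parent*.
// dynamic_cast performs a runtime type check and returns nullptr when the
// object is not really a Derived; static_cast would do no check at all,
// and reading extra_field through the bogus pointer would access memory
// past the end of the Parent object (type confusion).
bool is_safe_downcast(Parent* p) {
    return dynamic_cast<Derived*>(p) != nullptr;
}
```

The upcast of a `Derived` to `Parent*` is always fine; downcasting a plain `Parent` is the bug. HexType's point is to make this kind of check cheap and to cover classes without virtual functions, which dynamic_cast cannot handle.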

Design: The design can be divided into three parts. 1) Type relationship information: this information is available to the compiler at compile time to verify static casts, but it has to be instrumented into the binary to be usable at runtime. 2) Object type tracing: whenever an object is allocated, a table is updated with the object's address and its true class type; to do that, they instrument every allocation site. The table is a combination of a hash map in which each entry points to an RB-tree. The hash-map check serves as the fast-path check, while the RB-tree traversal is the slow-path check; the hash map is frequently updated with recently verified objects. 3) Type casting verification: at every type-cast location, instrumentation requests runtime verification of the object being cast. The verification collects runtime information and checks it against (1) (whether the casting path is allowed according to the type hierarchy) and (2) (the original type of the object being cast). They extend this coverage to dynamic_cast as well by replacing it with their own check. They also optimize performance by excluding safe objects from tracking and safe casts from verification (safe objects and casts are detected through static analysis).
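The two-level table in part (2) can be sketched roughly as below. This is my simplification, not the paper's layout: I use `std::unordered_map` for the fast path and `std::map` (an RB-tree in every mainstream implementation) for the slow path, with an integer type id standing in for the class type.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <unordered_map>

// Sketch of HexType-style object type tracing (illustrative only).
// Each allocation site records the object's true type id.
class ObjectTypeTable {
    std::unordered_map<uintptr_t, int> fast;  // fast path: recently seen objects
    std::map<uintptr_t, int> slow;            // slow path: std::map is an RB-tree
public:
    void on_alloc(uintptr_t addr, int type_id) {
        slow[addr] = type_id;                 // authoritative record
    }
    void on_free(uintptr_t addr) {
        slow.erase(addr);
        fast.erase(addr);
    }
    // Returns the recorded type id, or -1 if the object is untracked.
    int lookup(uintptr_t addr) {
        auto hit = fast.find(addr);           // fast path first
        if (hit != fast.end()) return hit->second;
        auto it = slow.find(addr);            // slow path: tree traversal
        if (it == slow.end()) return -1;
        fast[addr] = it->second;              // cache for the next lookup
        return it->second;
    }
};
```

A real cast check would then compare `lookup(addr)` against the instrumented type-relationship table to decide whether the requested downcast path is allowed.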

Advantage: The design is simple enough to be scalable. They also detected a couple of type-confusion bugs with it. Performance overhead is better than CaVer but worse than TypeSan. They also achieve better code coverage.

Disadvantage: The static-analysis-based optimization is not described completely, and traditional analyses are not reliable. The handling of reinterpret_cast is also dubious.

Discussion: Overall, type confusion is an interesting issue to cover, and detection through runtime verification is promising. C can have similar bugs too; they should all be covered.

Stack Bounds Protection with Low Fat Pointers

Summary: This work is an extension of the authors' earlier work on heap bounds protection with low-fat pointers. The concept of low-fat pointers originated in that previous paper, and this one explains it in good detail too. The basic idea of low-fat pointers is to use the pointer value itself to calculate its object's boundaries, instead of widening the pointer or creating shadow memory. Additionally, the address space is split into different virtual regions based on object size for performance. However, handling heap memory explicitly is easier than handling the program-managed stack, for which they propose a stack allocator that handles all stack memory operations.

Design: When a pointer needs its object information, it divides its address by 32GB (the size of a virtual region) and uses the quotient as an index into a lookup table to find the object size for that region. Next, rounding the pointer address down to a multiple of the object size yields the base address of the object. With both the base address and the object size, the bounds can be validated every time the pointer accesses a different location in that object; the bounds checks run over an instrumented binary. For heap allocations (e.g., malloc), the allocator can be modified to place objects into size-specific virtual regions. But stack objects would all be allocated in the same virtual region dedicated to the stack, which keeps physical memory non-fragmented and simplifies memory management. For stack objects, handling memory efficiently is important: fitting them into the same design as the heap, without hurting regular stack operation, is critical. To compute the lookup-table index they use the lzcnt instruction; for alignment they use a masking operation (another lookup table); and they allocate stack objects into size-partitioned 32GB virtual regions just like the heap, but with separate regions for heap and stack. Everything is fine except that physical memory now becomes more fragmented due to the redesigned stack allocator, so they propose memory aliasing: every virtual stack region is mapped to the same physical memory. They also apply optimizations to decide where to check bounds and where not to, but those are not a major contribution.
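The core address arithmetic can be shown in a few lines. This is an arithmetic-only sketch under scaled-down assumptions: I use a 1MB region size instead of 32GB and a small hard-coded size table, and I skip the lzcnt/masking tricks the paper uses for speed.

```cpp
#include <cassert>
#include <cstdint>

// Scaled-down stand-ins; the paper uses 32GB regions and per-region
// allocation sizes chosen by the allocator.
constexpr uintptr_t REGION_SIZE = 1 << 20;          // 1MB instead of 32GB
constexpr uintptr_t SIZES[] = {16, 32, 64, 128};    // object size per region

// Which size-partitioned region does this pointer live in?
uintptr_t region_index(uintptr_t p) { return p / REGION_SIZE; }

// Allocation size for the pointer's region (table lookup).
uintptr_t alloc_size(uintptr_t p) { return SIZES[region_index(p)]; }

// Base of the enclosing object: round down to a multiple of the size.
uintptr_t object_base(uintptr_t p) {
    uintptr_t s = alloc_size(p);
    return (p / s) * s;
}

// Bounds check: is the access [p, p+len) inside its object?
bool in_bounds(uintptr_t p, uintptr_t len) {
    return p + len <= object_base(p) + alloc_size(p);
}
```

The point of the design is that no metadata travels with the pointer: region index, size, and base are all recomputed from the raw address.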

Advantage: Low-fat pointers are memory efficient, although when it comes to performance, the design still has to sacrifice a little more memory than strictly necessary.

Disadvantage: First of all, it is now easier to predict where an object is located in physical memory just by knowing its size. Although pointer integrity is not part of the bounds check, this can cause greater disaster if left unchecked. The memory-aliasing design for virtual stack regions is not convincing, and its description is obscure.

Discussion: The design and evaluation do not give much hope for a practical implementation of low-fat pointers for stack objects. Overall, I think low-fat pointers cannot displace shadow-memory designs.

If your research area is system and software security, nothing is worse than counting how many different conferences you should follow. Due to the nature of the work, you surely have to follow security, systems, and software engineering conferences. Besides those, since security research is very practical, you also have to follow hacker conferences. Sometimes, for literature review, you may also have to follow second-tier conferences (but they are not strictly necessary). Here I list what I follow from time to time:

Hacker Conference

* BlackHat
* DefCon

2nd-tier Security Conference

* International Symposium on Research in Attacks, Intrusions and Defenses (RAID)
* European Symposium on Research in Computer Security (ESORICS)
* IEEE European Symposium on Security and Privacy (EuroS&P)

It is impossible to write a complicated system without hidden vulnerabilities residing in it. While researchers continuously work on tools to detect these hidden vulnerabilities, attackers keep coming up with new kinds. This race seems never-ending, so researchers also concentrate on designing defense systems that can at least reduce the impact of a vulnerability. These defense systems restrict an attacker's ability to develop an effective exploit for the vulnerable program. This, in turn, opens another door for attack researchers: developing tools that generate exploits automatically.

Block Oriented Programming: Automating Data-Only Attacks

Summary: Vulnerable software with active defenses (e.g., Control-Flow Integrity, shadow stacks, address-space randomization) is hard to exploit. Control-Flow Integrity (CFI) restricts execution to valid control flows, although, because of weak control-flow graphs (CFGs), coarse-grained CFI systems allow overapproximated control transfers. This leaves the exploit door open for an attacker to drive execution to a certain state (e.g., spawning a shell or leaking information). In this work, the authors propose a high-level, Turing-complete language for writing exploits, plus a new approach that generates exploits using basic blocks as gadgets and validates that their control flow follows valid paths in the CFG. This approach can defeat a shadow stack, which ROP gadgets cannot.

Design: When a vulnerable system is built with CFI and a shadow stack, it is neither possible to jump to an arbitrary location in the program nor to return just anywhere. The major contribution here is that, by using basic blocks as gadgets instead of ROP gadgets, an attacker can make the program follow valid control transfers while still achieving his goal. But first the attacker has to express what he hopes the exploit will do. For this, they propose a language for writing exploit payloads (SPL) that is architecture-independent and Turing-complete. An attacker can use abstract registers as variables and a subset of C library calls (e.g., execve). An SPL payload can contain multiple statements, and for each of them, block-oriented programming (BOP) finds a set of basic blocks that satisfies that statement. For example, if a statement expects a register to hold the value 7, it finds the basic blocks in the vulnerable program that end with some register containing 7. These are the candidate blocks. Next, the system combines the per-statement sets so that no chosen block clobbers another block's purpose; these are the functional blocks. Then it is time to identify their dispatcher blocks: blocks that connect one functional block to the next without clobbering anything. Searching for dispatcher blocks is NP-hard, and even approximation algorithms do not work (because of the large code space of real-world programs), so they propose a backtracking search with a threshold on search depth. This may overlook many possible dispatcher paths, but it is important for scalability. From the functional blocks and their dispatcher blocks they build the delta graph, through which a context-sensitive shortest path leads from the vulnerable entry point to the exploit.
Although the exploit is valid in the presence of CFI and a shadow stack, it still has to be validated by solving the constraints along the various control transfers. A concolic execution system is used to prove that the subgraph of the CFG chosen for the exploit is actually executable. Now, if an attacker finds a vulnerability in a system, he can use this exploit-generation system to figure out what exploit could be useful.

Advantage: Their exploit-generation technique is a forward approach, which makes it easy to follow. Block-oriented gadgets are a realistic direction (ROP gadgets are a hopeless idea against a hardened system).

Disadvantage: Too many thresholds limit its power. The SPL idea seems unnecessary; it is still essentially C code. The evaluation shows the system mostly fails to generate exploits for major goals like spawning a shell.

Discussion: Using basic blocks and the control-flow graph as the basis for finding a subgraph that is itself valid is a good concept. But, due to the heuristics, the results fall well short of the expectation. A forward approach also limits discovering what else a vulnerable system could be made to deliver.

SVF is a static analysis framework implemented in LLVM that allows value-flow construction and pointer analysis to be performed iteratively (sparse analysis: the analysis is conducted in stages, from an overapproximate analysis to a precise, expensive one). By default it uses points-to information from Andersen's analysis and constructs an interprocedural memory SSA (static single assignment) form in which the def-use chains of both top-level and address-taken variables are included. From a design perspective, SVF can be seen as two modules: pointer analysis and memory SSA. To allow points-to analyses other than Andersen's, the pointer analysis is split into three parts: Graph, Rules, and Solver. To give control over scalability and precision, the memory SSA lets users partition memory into a set of regions. Compared to previous work, SVF takes a big leap by constructing an interprocedural value-flow graph; to identify bugs, being able to analyze a program across its procedure boundaries is an important requirement.

The latest version of SVF is implemented on LLVM 7.0. SVF takes LLVM IR (a bitcode file) as input and can analyze both C and C++ programs. To use SVF, the source needs to be compiled with Clang to generate bitcode (clang -emit-llvm -c source.c -o source.bc), and then all the .bc files are linked into a single large .bc file (llvm-link source1.bc source2.bc -o source.bc).

As mentioned before, the pointer analysis consists of three loosely coupled components: Graph, Rules, and Solver. The graph is a high-level abstraction that shows where a pointer analysis should take place. A rule defines how to derive points-to information from a statement in the graph; the rules are where anyone can implement a points-to analysis other than Andersen's. Since the pointer analysis is interprocedural, it is important to have a completely separate component that decides in which order the analysis works over the graph, i.e., the solver. For example, Andersen's points-to analysis uses transitive-closure rules with a solver named Wave, while a flow-sensitive pointer analysis uses flow-sensitive strong/weak update rules with a points-to propagation solver on a sparse value-flow graph.
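To illustrate what "rules plus a fixpoint solver" means, here is a toy Andersen-style inclusion-constraint solver. This is my own illustration, far simpler than SVF's Graph/Rules/Solver split: it handles only address-of facts (`p = &x` gives `pts(p) ⊇ {x}`) and copy constraints (`p = q` gives `pts(p) ⊇ pts(q)`), iterated to a fixpoint.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using PtsMap = std::map<std::string, std::set<std::string>>;

// Solve inclusion constraints by naive iteration until nothing changes.
// addr_of: pairs (p, x) meaning p = &x; copy: pairs (p, q) meaning p = q.
PtsMap solve(const std::vector<std::pair<std::string, std::string>>& addr_of,
             const std::vector<std::pair<std::string, std::string>>& copy) {
    PtsMap pts;
    for (auto& [p, x] : addr_of) pts[p].insert(x);   // base facts
    bool changed = true;
    while (changed) {                                // iterate to fixpoint
        changed = false;
        for (auto& [p, q] : copy) {
            auto src = pts[q];                       // copy to iterate safely
            for (auto& obj : src)
                if (pts[p].insert(obj).second) changed = true;
        }
    }
    return pts;
}
```

Swapping in a different points-to analysis means swapping the rules (how a statement grows the sets) and the solver (the iteration order), which is exactly the separation SVF's design aims for.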

There are four steps in the value-flow graph construction. First, a 'Mod-Ref Analysis' captures the interprocedural reference and modification side effects for each variable (a set of indirect defs and uses). 'Mem Region Partitioning' divides memory into regions, giving the user control over the scalability/precision trade-off. 'Memory SSA' takes place once the indirect uses and defs are known; it follows a standard SSA conversion algorithm to put the program into memory SSA form. Finally, the 'VFG Construction' module links the uses of a variable to its defs and constructs the value-flow graph.

The following statement kinds are important for constructing a complete value-flow graph for address-taken variables:

ADDROF is an allocation site: a stack object (an alloca instruction), a global object (a global allocation site or a program function name), or a dynamically created object at its allocation site. Note: for flow-sensitive pointer analysis, the initializations of global objects take place at the entry of main().

COPY denotes either a casting instruction or a value assignment from a PHI instruction in LLVM.

PHI is introduced at a confluence point in the CFG to select the value of a variable from different control-flow branches.

LOAD is a memory accessing instruction that reads a value from an address-taken object.

STORE is a memory accessing instruction that writes a value into an address-taken object.

CALL denotes a call instruction, where the target can be either a global variable (for a direct call) or a stack virtual register (for an indirect call). At a call site, a return value, or at the entry of a procedure, a parameter can be defined directly or indirectly.
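The statement kinds above can be matched against a small piece of source code. The annotations below are my own reading of how generic LLVM IR for this function would classify, not SVF's actual output:

```cpp
#include <cassert>

// A small function annotated with the statement kinds its IR would
// roughly lower to (illustrative mapping, not SVF output).
int pick(bool flag) {
    int a = 1;                 // stack object via alloca: ADDROF
    int b = 2;                 // stack object via alloca: ADDROF
    int* p = &a;               // ADDROF: p points to a
    int* q = &b;               // ADDROF: q points to b
    int* r = flag ? p : q;     // PHI at the confluence point, then COPY into r
    *r = 42;                   // STORE: write through an address-taken object
    return *r;                 // LOAD: read from an address-taken object
}
```

Once the mod-ref analysis knows that the STORE indirectly defines `a` or `b`, the value-flow graph can link `*r = 42` to the `return *r` LOAD through the memory SSA form, across either branch of the PHI.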

One featured client of SVF is source-sink analysis (detecting security bugs). An example is a memory-leak detector, where it is important to find out whether a memory allocation (source) reaches a free site (sink). SVF's value-flow graph can be used by a source-sink analysis client to achieve this. Another client could be selective instrumentation, which lets a dynamic analysis tool look only at suspicious sites. It could also be used to build a debugging tool that guides the developer.

Besides the 40K lines of code of SVF itself, the group also open-sourced a pointer-analysis micro-benchmark suite, PTABen.

K-Hunt: Pinpointing Insecure Cryptographic Keys from Execution Traces

Summary: It is useful for attackers to identify the memory locations where an application stores its cryptographic keys. It is even more useful to follow up with taint analysis for various purposes (e.g., to determine whether a key is insecure). This research uses an online dynamic analysis system to identify cryptographic key buffers in memory and verify whether they are secure.

Design: The key challenges in identifying a cryptographic key are: 1) how to identify cryptographic operations without signatures (because proprietary algorithms must be covered too), and 2) how to identify the memory buffer holding the key. Additionally, to detect an insecure key in a complex program, it is important to design an optimized taint mechanism. For each challenge they present key insights. For (1), they use three heuristics: identify basic blocks that involve arithmetic operations, are executed repeatedly, and whose input-output relationship exhibits randomness. For (2), they use two additional heuristics: the key size is usually smaller than the data buffer size, and keys are usually derived, with randomness, just before the cryptographic operation executes. Once the key is identified, in the next phase they rerun the dynamic analysis to detect insecure keys. A key is tagged insecure if: 1) it is generated with inadequate randomness, 2) in a client-server environment, only one of the two parties participates in key derivation, or 3) the key is not cleaned up immediately after use.
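One of the randomness heuristics can be sketched concretely. This is my simplification, not K-Hunt's implementation: flag a buffer as a key candidate when its byte-level Shannon entropy is high, since key material is derived from randomness while ordinary program data is far more regular. The 3.5-bit threshold is purely illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Shannon entropy of a byte buffer, in bits per byte (0.0 .. 8.0).
double shannon_entropy(const std::vector<uint8_t>& buf) {
    if (buf.empty()) return 0.0;
    double count[256] = {0};
    for (uint8_t b : buf) count[b] += 1.0;
    double h = 0.0;
    for (int i = 0; i < 256; i++) {
        if (count[i] == 0) continue;
        double p = count[i] / buf.size();
        h -= p * std::log2(p);               // entropy contribution of byte i
    }
    return h;
}

// Illustrative threshold; a real tool would calibrate it empirically.
bool looks_like_key(const std::vector<uint8_t>& buf) {
    return shannon_entropy(buf) > 3.5;
}
```

A detector would apply such a test to the inputs and outputs of the repeatedly executed arithmetic blocks identified by the first set of heuristics.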

Advantage: The design is simple and modular. It depends on only one specific technology, dynamic binary analysis. The evaluation is convincing, and the cause is a worthwhile one.

Disadvantage: The overhead is high for online verification. There is still the possibility of a communication cut. The rerun for insecure-key detection could fail because of ASLR.

Discussion: Overall, the research makes a significant contribution. More insecure-key cases could be defined in the future and implemented on top of it.

Due to memory vulnerabilities in popular programming languages (C, C++), hackers can overwrite other memory and take control of a vulnerable system. For an attacker, control-flow hijacking is one of the first things to try once a vulnerability is found. Control-flow hijacking means diverting the regular program execution path to the attacker's desired target. For example, an attacker can spawn a shell by redirecting execution to a system call site with appropriate arguments set up in memory; if the vulnerable program has root access, the attacker gains it too. Control-flow integrity (CFI) keeps an active eye on the program at execution time so that it cannot depart from its expected paths. Two things matter most in CFI: 1) a complete control-flow graph, and 2) a strong enforcement policy. If either loses a little accuracy, it harms the other. Both features depend on each other, so every CFI system is expected to explain them properly.
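A forward-edge CFI check boils down to the sketch below. This is a minimal illustration of the enforcement idea only (my construction): real CFI instruments the binary and derives the allowed-target set from a CFG, rather than using a hand-written whitelist.

```cpp
#include <cassert>
#include <set>

// Minimal sketch of a forward-edge CFI check (illustrative only).
static int greet()  { return 1; }
static int depart() { return 2; }
static int shell()  { return 99; }   // stands in for an attacker's target

using Fn = int (*)();

// The "policy": targets the CFG says this call site may reach.
static const std::set<Fn> kAllowedTargets = {greet, depart};

// Guarded indirect call: transfer control only to whitelisted targets.
int checked_call(Fn target) {
    if (kAllowedTargets.count(target) == 0)
        return -1;                   // CFI violation: refuse the transfer
    return target();                 // valid edge in the CFG
}
```

The two failure modes named above map directly onto this sketch: an incomplete CFG makes `kAllowedTargets` too small (breaking the program) or too large (letting `shell` in), and weak enforcement means the `count` check can be skipped or raced.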

Summary: The project attempts an ambitious goal: based on execution history, enforce a CFI policy that allows only one valid target per indirect jump/call. For decades, researchers have tried to design strict enforcement of a strong CFI policy, but performance overhead and complex real-world programs fail them every time. This project is no exception. It fails to evaluate SPEC benchmarks like gcc, xalancbmk, and omnetpp, which are the most important ones for evaluating a CFI defense given their complexity (they also have the largest numbers of indirect calls). The work depends heavily on Intel PT hardware support, which tends to fail to capture complete information when packets are generated frequently. Intel PT is basically designed for offline debugging, so there is no real way to fix this issue for an online analysis.

Design: The project can be divided into three parts: 1) constraining-data identification, 2) arbitrary data collection, and 3) online analysis. Constraining data is a program variable that is not a function pointer but has a significant impact on where a function pointer will target. For example, if there is an array of function pointers and a variable i is used as the index into that array for an indirect call, then i is constraining data. They use compile-time analysis to identify it: first the instructions involved in computing the function pointer, and second, the operands of those instructions that are variables. Once the constraining data is known, it has to be collected at runtime (2), for which they use compile-time instrumentation. Since they plan to use Intel PT to track execution history, they have to figure out how to force the instrumentation to inject an arbitrary value into the Intel PT trace. Their trick: they call their own write function with the constraining-data value as the parameter, and inside it they perform a series of indirect calls to return-only stubs, where each target is selected by a chunk of the passed value. To reduce the number of trace packets, they do the same for sensitive basic blocks, assigning each a unique id. With all of this, they can skip checking TNT packets (conditional control transfers) and check only TIP packets (indirect control transfers). For the online analysis (3), they run it in a separate process, in parallel with the original program. A read method reconstructs the arbitrary data from the Intel PT packet trace and validates it against a mapping.
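The encode-data-as-indirect-calls trick can be mimicked in plain C++. This is my reconstruction of the idea, not the paper's code: a trace that records only indirect-branch targets (like Intel PT TIP packets) can be made to carry an arbitrary value by splitting the value into 4-bit chunks and letting each chunk select a call target; a vector of recorded ids stands in for the hardware trace.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

static std::vector<int> g_trace;     // stands in for the Intel PT TIP log

// Sixteen return-only stubs; which one is called encodes 4 bits.
template <int N> static void sink() { g_trace.push_back(N); }

using Fn = void (*)();
static Fn kTable[16] = {
    sink<0>, sink<1>, sink<2>,  sink<3>,  sink<4>,  sink<5>,  sink<6>,  sink<7>,
    sink<8>, sink<9>, sink<10>, sink<11>, sink<12>, sink<13>, sink<14>, sink<15>,
};

// "Write" a 32-bit value into the trace, high nibble first: each
// indirect call produces one recorded target per 4-bit chunk.
void trace_write(uint32_t value) {
    for (int shift = 28; shift >= 0; shift -= 4)
        kTable[(value >> shift) & 0xF]();
}

// The monitor reconstructs the value from the recorded targets.
uint32_t trace_read() {
    uint32_t v = 0;
    for (int n : g_trace) v = (v << 4) | (uint32_t)n;
    g_trace.clear();
    return v;
}
```

Note the irony the Disadvantage section points at: this machinery itself injects extra indirect calls into the protected program.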

Advantage: If you can perfectly identify every piece of constraining data for every sensitive indirect control transfer, and you have perfectly working hardware to trace the history, and the separate validation process does not harm your security guarantee, then it is a very good system.

Disadvantage: Unfortunately, none of these can work perfectly. Static analysis of complex real-world applications will never succeed with such a simple algorithm; there will always be special cases, and the analysis will have to be updated time and again. Next, the imperfection of the hardware is reported by the authors themselves. And everyone knows you need to know before you jump: pausing the program at sensitive system-call points can easily enable a DoS attack, by building up a long pending list of validation checks through frequently executed indirect control transfers. Regarding performance, they claim 10% average overhead, but they report 47% for perlbench and 24% for h264ref, while not evaluating the similarly complex benchmarks gcc, omnetpp, and xalancbmk. If we assume even 25% overhead for each of those, the overall average rises to at least 15%. They evaluated nginx and vsftpd but do not mention which test suites they used. Moreover, their system itself creates more indirect calls, maybe even more than exist in the target program.

Discussion: To emphasize their security improvement, they present a case study from the h264ref SPEC benchmark. There is a structure with a member function pointer, and that structure is later used to create an array of size 1200. From that, they simply assume there are 1200 valid targets for that member function pointer (a unique one per element) and claim to have reduced them to 1. But the truth is that this member takes only three distinct functions in the entire source code. The project is too ambitious and hence not practical.

Summary: Intel PT is Intel hardware support for offline debugging. It can capture compressed data packets for indirect control-flow transfers, conditional branches taken/not taken, etc. PT-CFI attempts to use this hardware feature to protect backward edges by enforcing shadow-stack-style protection. It leaves forward indirect control transfers out of scope due to the unavailability of a complete points-to analysis to generate the required CFI policy. It also limits verification to critical syscalls to rein in the performance overhead.

Design: PT-CFI has four components: 1) a packet parser generates the TIP sequence and feeds it to 2) TIP graph matching (the graph is built through training). If there is no match, it invokes 3) deep inspection to decode the trace and construct a shadow stack. If that matches the shadow stack, the new TIP sequence is added to the TIP graph; otherwise, 4) the syscall hooker is informed to terminate the program.

Advantage: The paper describes the background on Intel PT well.

Disadvantage: The paper fails to clarify crucial parts. For example, in deep inspection they construct a shadow stack from Intel PT traces and claim to match it against the shadow stack (which shadow stack?). They leave forward indirect control transfers out of consideration. The performance overhead is still very high (e.g., 67% on SPEC gcc).

Discussion: Intel PT was introduced for offline analysis; using it for online validation is overall not a good idea. The training-based CFI policy generation is not well described, and the deep inspection in particular is hugely misleading.

GRIFFIN: Guarding Control Flows Using Intel Processor Trace

Summary: The authors mainly attempt to optimize the performance overhead of using Intel PT for online verification. They claim to verify enforcement policies for both backward and forward indirect control transfers at different levels of strictness, while completely discarding any discussion of how they obtain the policies they verify against.

Design: To efficiently extract information from the Intel PT trace log, they propose a fast lookup design; the basic-block-to-dereferenced-pointer lookup is basically a cache-like mechanism. They describe hardly anything about their coarse-grained and fine-grained policies, but since they only trace TIP packets, at most they can enforce policies based on indirect-control-transfer information. They simply omit any real discussion of the stateful policy for forward control transfers, mentioning only that they use an approach similar to PathArmor.

Advantage: They cover advanced topics such as handling context switches, process forks, etc., even while omitting much-needed basic discussion.

Disadvantage: The writing is a complete hide-and-seek game. It omits much of the basic discussion and says nothing about how the policies are obtained. Moreover, they put enforcement in a parallel process and apply strict restrictions only at sensitive syscall entries, which overall waters down the whole idea of control-flow integrity. Hence, whatever performance overhead they achieve is not meaningful.

Discussion: Hardware designed for offline debugging should not be considered for online verification; it is simply a waste of time.

Transparent and Efficient CFI Enforcement with Intel Processor Trace

Summary: The project aims to protect indirect control transfers through a coarse-grained indirect control-flow graph, checked with Intel PT only at security-sensitive system calls. The system uses a hybrid of fast and slow checks to achieve efficiency. The fast check does not require decoding the trace and is only available if the dynamically trained, fuzzing-based CFG contains the exact trace. Otherwise, a slow path has to decode the trace and verify it against a CFG built by static analysis (TypeArmor).

Design: The design consists of four components. First, there is a CFG built by static analysis. Then the CFG edges are ranked based on a dynamically trained fuzzing CFG. A system-call interceptor in the kernel flags security-sensitive system calls. Finally, the flow checker collects the trace and, based on it, takes either the fast path or the slow path. The CFG edges come only from indirect control transfers, since the runtime uses only Intel PT TIP packets.
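The fast/slow hybrid can be sketched as follows: a raw, undecoded trace seen during fuzzing-based training is accepted immediately; anything else is decoded into edges and checked against the over-approximate static CFG. The byte string, addresses, and the injected `decode` callable are illustrative assumptions, not the paper's actual formats:

```python
# Hedged sketch of the fast-path / slow-path check.

trained_traces = {b'\x02\x43\x00\x10\x40'}                 # raw traces from training
static_cfg = {(0x401000, 0x402000), (0x402000, 0x403000)}  # static-analysis edges

def slow_path(edges):
    """Verify every decoded (src, dst) edge against the static CFG."""
    return all(edge in static_cfg for edge in edges)

def check(raw_trace, decode):
    """decode: a callable standing in for a real Intel PT decoder."""
    if raw_trace in trained_traces:      # fast path: exact match, no decoding
        return True
    return slow_path(decode(raw_trace))  # slow path: decode, then verify
```

Note that on the fast path `decode` is never invoked, which is where the claimed efficiency comes from; the cost is that coverage of `trained_traces` depends entirely on the fuzzing run.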

Advantage: The performance overhead is acceptable (although the security guarantee is very low), and the paper tries to describe every step.

Disadvantage: The design is very limited. First, they only monitor security-sensitive indirect control transfers. The fast path is only available for edges present in the fuzzed CFG, which depends on code coverage. The slow path uses the static CFG, which is over-approximate. The CFG itself consists only of indirect control-transfer edges, since Intel PT is used with TIP packets only.

Discussion: This is the third paper that tries to use an offline, debugging-oriented hardware feature, Intel PT, for online verification. Their limiting of monitoring to security-sensitive system calls once again shows that this hardware is not a good choice for this purpose.

The Linux kernel is a complicated code base written in the C programming language and is therefore vulnerable to memory corruption. Generally, the kernel is believed to be trusted. The only way the kernel acts on behalf of a user is through system calls from user programs. So researchers are focusing on new fuzzing techniques that can generate random syscall sequences (with correct arguments). A number of vulnerabilities, such as race conditions, data races, and use-after-free bugs, can be surfaced by such techniques.

RAZZER: Finding Kernel Race Bugs through Fuzzing

Summary: Existing kernel fuzzers can be divided into two categories: 1) those that generate different combinations of syscalls across different multi-threaded setups, and 2) those that impose different thread interleavings in the kernel. A system combining both features was expected to appear, and RAZZER is that system. On top of it they add one more layer, a static analysis tool. They found 30 new races in the kernel, 16 of which are confirmed.

Design: The system consists of a static analyzer, a single-thread fuzzer, and a multi-thread fuzzer. The static analyzer uses a points-to analysis to select pairs of memory-access instructions as candidate race pairs (over-approximate). Next, the single-thread fuzzer consists of a generator and an executor: the generator produces a sequence of random syscalls, and the executor observes whether a candidate race pair gets executed. If so, that syscall sequence is passed on to the multi-thread fuzzer. It also has two components: its generator splits the syscalls across different threads, and its executor observes whether the instructions of the selected candidate race pair access the same memory. If they do, that candidate is tagged as a true race pair.
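The two-stage filtering above can be sketched as follows. The instruction and access-log representations are made-up assumptions; the real system instruments the kernel with breakpoints on the candidate instructions:

```python
# Illustrative sketch of RAZZER-style two-stage filtering of candidate race pairs.

def single_thread_stage(executed_insns, candidates):
    """Keep candidate pairs whose instructions both ran in one execution."""
    return [(a, b) for (a, b) in candidates
            if a in executed_insns and b in executed_insns]

def multi_thread_stage(pair, access_log):
    """access_log: instruction -> memory address it touched.
    A candidate becomes a true race pair when both sides hit the same address."""
    addr_a, addr_b = access_log.get(pair[0]), access_log.get(pair[1])
    return addr_a is not None and addr_a == addr_b

# Only the pair whose instructions both executed survives the first stage...
survivors = single_thread_stage({0x10, 0x20}, [(0x10, 0x20), (0x30, 0x40)])
assert survivors == [(0x10, 0x20)]
# ...and it is tagged a true race only when both sides access the same memory.
assert multi_thread_stage((0x10, 0x20), {0x10: 0xffff8800, 0x20: 0xffff8800})
assert not multi_thread_stage((0x10, 0x20), {0x10: 0xffff8800, 0x20: 0xffff9900})
```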

Advantage: Thread interleaving is an important issue for detecting data races. Checking random syscall sequences is equally important to cover most of the kernel code and thereby detect more race conditions. Combining race-condition and data-race detection techniques helps find more data races. The system design is simple and hence practical.

Disadvantage: Instead of using random syscall sequences from syzkaller, they could use MoonShine to generate logically ordered syscall sequences; that might reduce overhead and increase the number of findings. Also, the multi-thread fuzzer's generator assigns syscalls to threads randomly. A deterministic assignment could be useful here, if it were possible to detect which syscall causes the execution of an instruction in the candidate race pair.

Discussion: While matching two instructions that access the same memory concurrently, at least one of them writing, they completely discard one major aspect: exploitability. Only 16 of their 30 true race pairs being confirmed indicates that not every true race pair is exploitable. I believe a more accurate implementation of this design could find more true race pairs, and possibly more than 50% of them would turn out to be non-exploitable.

Summary: Syzkaller is one of the most popular kernel fuzzers. It generates sequences of random system calls, but due to the randomness most of them are unrealistic; efficiency is lost because the dependencies (both implicit and explicit) among system calls are not considered. Some other kernel fuzzers use a large number of benchmark programs and feed their system-call sequences to the fuzzer. In this project, a filtering algorithm identifies dependent syscalls from real-world applications, and these syscall blocks are fed into the fuzzer, confining the randomness to the block level.

Design: The project basically extends syzkaller, replacing its random syscall-sequence generator with a random syscall-block-sequence generator. In the first step, they trace real-world applications and observe their explicit and implicit syscall dependencies. One example is a file read: a file must first be opened, then read, and usually also closed. Thinking deeper, before a file read the application also usually allocates some memory as the buffer to store the read data. From the kernel's perspective, these dependencies can be explicit or implicit. A dependency is explicit if the return value of one syscall is used as an argument to another. It is implicit if two different syscalls use the same data structure in the kernel. While an explicit dependency is straightforward to resolve, detecting implicit dependencies requires a kernel-level static analysis that determines whether two different syscalls may use the same global variable in the kernel, one writing and the other reading.
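Explicit-dependency detection, at least, is simple enough to sketch: a later call depends on an earlier one if one of its arguments equals the earlier call's return value (e.g., the file descriptor returned by open reused by read and close). The trace format below is an assumption for illustration:

```python
# Sketch of explicit-dependency detection over a recorded syscall trace.
# trace: list of (name, args_tuple, retval) entries, in execution order.

def explicit_deps(trace):
    """Return (earlier_idx, later_idx) pairs where a return value is reused."""
    deps = []
    for i, (_, _, ret) in enumerate(trace):
        for j in range(i + 1, len(trace)):
            if ret in trace[j][1]:          # return value reused as an argument
                deps.append((i, j))
    return deps

trace = [('open', ('/tmp/f',), 3),   # open returns fd 3
         ('read', (3, 4096), 4096),  # fd 3 reused -> depends on open
         ('close', (3,), 0)]         # fd 3 reused -> depends on open
assert explicit_deps(trace) == [(0, 1), (0, 2)]
```

Implicit dependencies have no such trace-level signature, which is why the project needs the kernel-level static analysis over shared global variables.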

Advantage: The motivation is realistic and the design is simple, so the project is significant and practical. They have found 17 new vulnerabilities, 7 of which can also be found through the explicit dependency check alone.

Disadvantage: Like most others, the system does not try to prove exploitability. But since its goal is to generate more logical fuzzing input than pure randomness, the probability of exploitability is higher. They have confirmed that 9 out of the 17 have already been fixed.

Discussion: The motivation is genuine. Although the workload is not heavy, it is the impact that makes this a significant work.

About Me

I am a graduate student (Ph.D.) in Computer Science at Florida State University. I am glad to be supervised by Dr. Zhi Wang. My research interest is in system and software security. Specifically, my research focuses on low-level code bug detection and defense mechanisms through static and dynamic binary analysis with compiler support. Besides that, I have studied heap exploitation, virtualization, and crash recovery techniques.

I participate in algorithmic, programming, and capture-the-flag contests. I also run a blog with tutorials on various tools, mostly related to security research. I completed my bachelor's degree in Computer Science and Engineering at Chittagong University of Engineering and Technology (2012) and later joined Samsung R&D Bangladesh (2012) as an Android application developer. I started the Ph.D. track program at Florida State University in Fall 2015 and passed the qualification exam in 2017.