Return Instruction Identification in Binary Code with Machine Learning

Jing Qiu and Guanglu Sun

School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, 150080, China

(Submitted on November 12, 2018; Revised on December 15, 2018; Accepted on January 16, 2019)

Abstract:

Binary code analysis is the main method for malware analysis. In this paper, the analysis starts by identifying return instructions as a first step toward disassembling binary code. The return instruction identification problem is converted into a binary classification problem: is a given byte in the binary code the first byte of a return instruction? The 32 bytes surrounding a byte are used as the feature of that byte. A multilayer perceptron is employed to build the classification model, which is trained on 1,383 binaries from Windows XP SP3. Evaluation results on several open-source software packages show that the approach is feasible and achieves high accuracy.
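To make the feature construction concrete, the following minimal sketch shows one way the 32-byte context of a byte could be extracted. The abstract does not specify the window layout or the boundary handling, so the split into 16 bytes before and 16 bytes after the byte, the exclusion of the byte itself, the zero padding, and the names `context_features` and `all_features` are all assumptions made for illustration, not the authors' implementation.

```python
from typing import List

HALF_WINDOW = 16  # assumption: 16 bytes on each side, 32 in total
PAD = 0           # assumption: positions outside the buffer are zero-padded


def context_features(code: bytes, offset: int, half: int = HALF_WINDOW) -> List[int]:
    """Return the `half` bytes before and the `half` bytes after `offset`.

    The byte at `offset` itself is excluded in this sketch, so the
    result always has 2 * half entries.
    """
    if not 0 <= offset < len(code):
        raise IndexError("offset outside code buffer")
    features = []
    for pos in range(offset - half, offset + half + 1):
        if pos == offset:
            continue  # skip the candidate byte itself
        # Use the real byte when available, otherwise the padding value.
        features.append(code[pos] if 0 <= pos < len(code) else PAD)
    return features


def all_features(code: bytes) -> List[List[int]]:
    """Build one feature vector per byte offset in `code`."""
    return [context_features(code, i) for i in range(len(code))]


if __name__ == "__main__":
    # push ebp; mov ebp, esp; pop ebp; ret  (0xC3 is the x86 near return)
    sample = bytes([0x55, 0x8B, 0xEC, 0x5D, 0xC3])
    vector = context_features(sample, 4)
    print(len(vector), vector[12:16])
```

Each vector produced this way would be paired with a 0/1 label (whether the byte at that offset begins a return instruction) and passed to a classifier such as a multilayer perceptron.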