Malware that exploits security vulnerabilities in Internet applications has become a significant threat to information security. It is increasingly used as a tool for sophisticated criminal activity, endangering network services and seriously invading Internet users' privacy. After years of competition with security measures, the anti-analysis and anti-detection techniques used in malware have evolved dramatically, which makes detecting and analyzing malware a major challenge for current malware prevention systems.

Analysis of malware attack mechanisms and generation of attack signatures are both important directions for improving existing malware prevention systems. Automatic fine-grained analysis based on dynamic taint propagation has become an active topic in malicious code analysis research. To meet the demands of malware analysis, this paper studies the key methods and techniques of dynamic taint propagation, and designs and implements a malware analysis system based on it. On this basis, the paper proposes solutions to two typical problems in malware analysis: reverse engineering of network malware communication protocols, and generation of malicious attack signatures. The main contributions of this paper are as follows:

1) To address the difficulties encountered in current malicious code analysis, this paper proposes a new analysis architecture based on dynamic taint propagation. Its key features include independence from source code, transparency to the executing malware, and fine-grained analysis capability. The architecture can effectively analyze malware that uses dynamic code generation, code obfuscation, anti-debugging, and similar techniques.

3) For reverse engineering of malware communication protocols, recent work identifies protocol fields with limited accuracy and completeness, and is especially weak at recovering the semantics of those fields. This paper proposes a behavior-based analysis method. We build the ETPG and use it to identify how each protocol element is manipulated by the malicious process, which lets us divide the protocol data into distinct syntax fields. We then analyze the API function calls related to each syntax field and infer the field's semantics from the semantics of those functions. We implemented a prototype system and evaluated it on malware samples. The experimental results show that our method achieves accurate and effective syntax field division and semantic extraction.

4) For malicious attack signature generation, this paper proposes a method based on traceable dynamic behavior analysis. By monitoring the instruction-level execution of the vulnerable process, we extract the execution trace and the constraint conditions that relate directly to the input data exploiting the vulnerability. We then restore the execution context and supplement the decision statements to obtain an executable Turing machine signature. We implemented a prototype system and evaluated it on different attack samples. The results show that our method can quickly generate accurate attack signatures without access to program source code.

5) Finally, this paper designs and implements a malware analysis system based on dynamic taint propagation that meets current demands for malware analysis. The system supports flexible analysis configuration, with selectable taint sources and abnormal events, and can extract data associations and behavior associations based on sensitive tainted data. With this system, a dedicated analysis of a specific malware sample can be deployed very quickly.
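
As a rough illustration of the idea behind contributions 1) and 3), the following minimal sketch propagates byte-level taint labels through a toy instruction trace and then groups adjacent input bytes into candidate syntax fields. It is not the system described in this paper: the trace format, the names propagate, split_fields, TRACE and INPUT_TAINT, and the "adjacent bytes used by the same instruction belong to one field" heuristic are all assumptions made for this example.

```python
# Toy byte-level taint propagation (illustrative assumptions only).
# Each trace entry is (destination, [sources]); a location's taint label
# is the set of input byte offsets its current value was derived from.

def propagate(trace, input_taint):
    """Return the final taint map and, per instruction, the offsets it used."""
    taint = {loc: set(offs) for loc, offs in input_taint.items()}
    uses = []
    for dst, srcs in trace:
        label = set()
        for src in srcs:
            label |= taint.get(src, set())  # union of all source labels
        if label:
            taint[dst] = label
        else:
            taint.pop(dst, None)  # overwritten by untainted data
        uses.append(frozenset(label))
    return taint, uses


def split_fields(uses, n_bytes):
    """Hypothetical heuristic: adjacent input offsets consumed together by
    one instruction form a single field. Returns inclusive (start, end)."""
    joined = set()  # offsets i such that i and i + 1 were used together
    for label in uses:
        for off in label:
            if off + 1 in label:
                joined.add(off)
    fields, start = [], 0
    for off in range(n_bytes):
        if off not in joined:
            fields.append((start, off))
            start = off + 1
    return fields


# A 4-byte received message stored at buf0..buf3 (the taint source).
INPUT_TAINT = {"buf0": {0}, "buf1": {1}, "buf2": {2}, "buf3": {3}}
TRACE = [
    ("r1", ["buf0", "buf1"]),  # 16-bit load: bytes 0 and 1 used together
    ("r2", ["buf2"]),          # single-byte load
    ("r3", ["buf3"]),          # single-byte load
    ("r4", ["r1", "const"]),   # arithmetic on r1 keeps its taint
    ("r2", ["const"]),         # r2 overwritten with a constant: taint cleared
]

taint, uses = propagate(TRACE, INPUT_TAINT)
fields = split_fields(uses, 4)  # [(0, 1), (2, 2), (3, 3)]
```

In this toy trace, bytes 0 and 1 are loaded by one instruction and therefore end up in one candidate field, while bytes 2 and 3 are handled separately. A real analysis would operate on machine instructions and would also need to consider control dependencies and the API calls that consume each field.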