Data Mining Methods for Malware Detection using Instruction Sequences

Keywords

Abstract

Malicious programs pose a serious threat to computer security. Traditional signature-based approaches to detecting malicious programs offer little protection against new and unseen malicious programs whose signatures are not yet available. The focus of research is therefore shifting from using signature patterns to identify a specific malicious program and/or its variants toward discovering the general malicious behavior in programs. This paper presents a novel idea of automatically identifying critical instruction sequences that can distinguish between malicious and clean programs using data mining techniques. Based upon general statistics gathered from these instruction sequences, we formulated the problem as a binary classification problem and built logistic regression, neural network, and decision tree models. Our approach showed a 98.4% detection rate on new programs whose data was not used in the model building process.
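
To make the idea concrete, the following is a minimal sketch of the kind of pipeline summarized above. It is not the implementation used in this paper. It assumes that each program has already been disassembled into a list of opcode mnemonics. The toy programs, the sequence length n = 2, the frequency-gap threshold, and all function names are hypothetical choices made for illustration. The sketch keeps instruction sequences whose occurrence rate differs strongly between malicious and clean programs, encodes each program as a binary feature vector over those sequences, and fits a small logistic regression model by stochastic gradient descent.

```python
import math
from collections import Counter

def instruction_sequences(opcodes, n=2):
    """Return the set of length-n opcode sequences found in one program."""
    return {tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)}

def select_sequences(programs, labels, n=2, min_gap=0.75):
    """Keep sequences whose per-class occurrence rates differ by >= min_gap.

    The selection rule is an assumption made for this sketch.
    """
    mal, clean = Counter(), Counter()
    n_mal = sum(labels)
    n_clean = len(labels) - n_mal
    for ops, y in zip(programs, labels):
        (mal if y else clean).update(instruction_sequences(ops, n))
    candidates = set(mal) | set(clean)
    return sorted(s for s in candidates
                  if abs(mal[s] / n_mal - clean[s] / n_clean) >= min_gap)

def featurize(opcodes, vocab, n=2):
    """Binary vector: 1.0 if the selected sequence occurs in the program."""
    present = instruction_sequences(opcodes, n)
    return [1.0 if s in present else 0.0 for s in vocab]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.5, epochs=200):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b

def predict(opcodes, vocab, w, b):
    """Estimated probability that a program is malicious."""
    x = featurize(opcodes, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical toy data: label 1 = malicious, label 0 = clean.
programs = [
    ["push", "call", "xor", "jmp", "xor", "jmp"],
    ["mov", "xor", "jmp", "call", "ret"],
    ["push", "mov", "call", "ret"],
    ["mov", "add", "call", "ret", "pop"],
]
labels = [1, 1, 0, 0]

vocab = select_sequences(programs, labels)
X = [featurize(p, vocab) for p in programs]
w, b = train_logistic(X, labels)

# Score two programs that were not used for training.
p_suspicious = predict(["push", "xor", "jmp", "ret"], vocab, w, b)
p_benign = predict(["mov", "call", "ret"], vocab, w, b)
```

On real data the same feature vectors could equally be passed to a neural network or a decision tree learner. Logistic regression is shown here only because it fits in a few lines.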