The underground malware-based economy is flourishing, and it is evident
that classical ad hoc signature-based detection methods are becoming
insufficient. Malware authors seem to share some source code and
malware samples often feature similar behaviors, but such commonalities
are difficult to detect with signature-based methods because of an
increasing use of numerous freely available randomized obfuscation
tools. To address this problem, the security community is actively
researching behavioral detection methods that commonly attempt to
understand and differentiate how malware behaves, as opposed to just
detecting syntactic patterns. Continuing that line of research, in
this talk I will explore how grammatical inference and tools of the
verification trade could be used for malware detection and analysis. I
will present a new approach to learning and generalizing from observed
malware behaviors based on tree automata inference. In particular, I
will show how one can infer k-testable tree automata from system call
dataflow dependency graphs and discuss the use of inferred automata in
malware recognition and classification. At the end, I will briefly
survey some other related work I have done in the recent past, as well as
hint at future research directions.
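
To make the idea of k-testable tree automata inference more concrete, here is a minimal sketch in Python. It is not the implementation presented in the talk. It assumes the textbook characterization of a strictly k-testable tree language by three finite sets: the roots of the samples truncated to k-1 levels, the "forks" (every subtree with at least k levels, truncated to k levels), and the small subtrees (those with fewer than k levels). The tuple encoding of trees, the function names, and the system call labels "send", "read" and "open" are hypothetical and chosen only for illustration. The sketch also assumes that each dependency graph has already been unfolded into a tree in which a node's children are the calls whose outputs it consumes.

```python
from typing import Iterable, Iterator, NamedTuple

# A tree is a tuple (label, child_1, ..., child_n); a leaf is (label,).

def levels(t: tuple) -> int:
    """Number of levels in t (a single node has 1 level)."""
    return 1 + max((levels(c) for c in t[1:]), default=0)

def root(t: tuple, k: int) -> tuple:
    """Truncate t to its top k levels (k >= 1)."""
    if k == 1:
        return (t[0],)
    return (t[0],) + tuple(root(c, k - 1) for c in t[1:])

def subtrees(t: tuple) -> Iterator[tuple]:
    """Yield t and every subtree below it."""
    yield t
    for c in t[1:]:
        yield from subtrees(c)

def features(t: tuple, k: int):
    """Return the (k-1)-root, the set of k-forks and the set of small subtrees of t."""
    forks = {root(s, k) for s in subtrees(t) if levels(s) >= k}
    small = {s for s in subtrees(t) if levels(s) < k}
    return root(t, k - 1), forks, small

class KTestable(NamedTuple):
    k: int
    roots: frozenset
    forks: frozenset
    small: frozenset

def infer(samples: Iterable[tuple], k: int) -> KTestable:
    """Collect the three defining sets from positive samples only."""
    if k < 2:
        raise ValueError("this sketch assumes k >= 2")
    roots, forks, small = set(), set(), set()
    for t in samples:
        r, f, s = features(t, k)
        roots.add(r)
        forks |= f
        small |= s
    return KTestable(k, frozenset(roots), frozenset(forks), frozenset(small))

def accepts(model: KTestable, t: tuple) -> bool:
    """A tree is accepted if all of its local patterns were seen in training."""
    r, f, s = features(t, model.k)
    return r in model.roots and f <= model.forks and s <= model.small

# Hypothetical dependency trees: "send" consumes the output of a chain of reads.
a = ("send", ("read", ("open",)))
b = ("send", ("read", ("read", ("open",))))
c = ("send", ("read", ("read", ("read", ("open",)))))  # longer, unseen chain
d = ("send", ("open",))                                # unseen pattern

model2 = infer([a, b], k=2)
model3 = infer([a, b], k=3)
```

With k = 2, the model inferred from a and b also accepts the unseen tree c, because every two-level pattern in c already occurs in the samples, while d is rejected because "send" directly over "open" was never observed. With k = 3 the same samples no longer accept c, which shows how k trades generalization against precision.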