Title: Not all bytes are equal: Neural byte sieve for fuzzing

Abstract: Fuzzing is a popular dynamic program analysis technique used to find
vulnerabilities in complex software. Fuzzing involves presenting a target
program with crafted malicious input designed to cause crashes, buffer
overflows, memory errors, and exceptions. Crafting malicious inputs efficiently
is a difficult open problem, and the best approach to generating such inputs is
often to apply uniform random mutations to pre-existing valid inputs (seed
files). We present a learning technique that
uses neural networks to learn patterns in the input files from past fuzzing
explorations to guide future fuzzing explorations. In particular, the neural
models learn a function that predicts good (and bad) locations in input files
at which to perform fuzzing mutations, based on past mutations and the corresponding code
coverage information. We implement several neural models, including LSTMs and
sequence-to-sequence models, that can encode variable-length input files. We
incorporate our models in the state-of-the-art AFL (American Fuzzy Lop) fuzzer
and show significant improvements in terms of code coverage, unique code paths,
and crashes for various input formats including ELF, PNG, PDF, and XML.
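The idea of a byte sieve (use past mutations plus coverage feedback to decide which input offsets are worth mutating) can be illustrated with a minimal sketch. It is not the authors' implementation. The record format `(mutated offsets, produced new coverage?)` is a hypothetical simplification. The per-offset success rate in `offset_scores` is a plain frequency-count stand-in for the paper's neural models (LSTMs, sequence-to-sequence). The `threshold` parameter and the neutral score of 0.5 for never-mutated offsets are illustrative assumptions. The mutation step is a single bit flip, not AFL's actual mutation stages.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical record of one past fuzzing step:
# (byte offsets that were mutated, whether the mutant reached new coverage).
Record = Tuple[Sequence[int], bool]


def offset_scores(records: List[Record], length: int) -> List[float]:
    """Per-offset fraction of past mutations that produced new coverage.

    A frequency-count stand-in for a learned neural model; it only shows
    how a per-byte score can drive the sieve.
    """
    hits = [0] * length
    tries = [0] * length
    for offsets, new_coverage in records:
        for pos in offsets:
            if 0 <= pos < length:
                tries[pos] += 1
                hits[pos] += int(new_coverage)
    # Offsets never mutated get a neutral 0.5 so they are not ruled out (assumption).
    return [h / t if t else 0.5 for h, t in zip(hits, tries)]


def sieve_mutate(seed: bytes, scores: List[float], threshold: float,
                 num_mutations: int, rng: random.Random) -> bytes:
    """Flip random bits, but only at offsets whose score passes the threshold."""
    allowed = [i for i, s in enumerate(scores[:len(seed)]) if s >= threshold]
    data = bytearray(seed)
    if not allowed:  # nothing passes the sieve: leave the seed unchanged
        return bytes(data)
    for _ in range(num_mutations):
        pos = rng.choice(allowed)           # uniform over "good" offsets only
        data[pos] ^= 1 << rng.randrange(8)  # flip one bit at that offset
    return bytes(data)


if __name__ == "__main__":
    rng = random.Random(0)
    seed = b"\x7fELF" + bytes(12)  # toy 16-byte input, not a real ELF file
    records = [([0, 1], False), ([4], True), ([5], True), ([4, 6], True), ([2], False)]
    scores = offset_scores(records, len(seed))
    mutant = sieve_mutate(seed, scores, threshold=0.6, num_mutations=3, rng=rng)
    print(mutant.hex())
```

With these toy records, offsets 4 to 6 score 1.0, offsets 0 to 2 score 0.0, and the rest score 0.5. A threshold of 0.6 therefore confines mutations to offsets 4 to 6, and the magic bytes at offsets 0 to 3 survive. Setting the threshold to 0 recovers uniform random mutation over the whole file, the baseline that the abstract describes.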