De-anonymizing Programmers

Large Scale Authorship Attribution from Executable Binaries of Compiled Code and Source Code

Last year I presented research showing how to de-anonymize programmers based on their coding style. This is of immediate concern to open source software developers who would like to remain anonymous. On the other hand, being able to de-anonymize programmers can help in forensic investigations, or in resolving plagiarism claims or copyright disputes.

I will report on our new research findings in the past year. We were able to increase the scale and accuracy of our methods dramatically and can now handle 1,600 programmers, reaching 94% de-anonymization accuracy. In ongoing research, we are tackling the much harder problem of de-anonymizing programmers from binaries of compiled code. This can help identify the author of a suspicious executable file and can potentially aid malware forensics. We demonstrate the efficacy of our techniques using a dataset collected from GitHub.