Machine Learning Could Be Used To Identify Anonymous Code

Just like cooking, drawing, or writing, coding is shaped by personal preference: every programmer has their own way of laying out an algorithm, stringing pieces of code together, and so on, which ultimately creates a “signature” of sorts. Now researchers have found that machine learning can use that signature to help identify who wrote a piece of code, even if it was written anonymously.

The AI works by being fed examples of a programmer’s work, from which it studies the coding structure. It then trains itself to spot that programmer’s work in the future. In testing based on Google’s Code Jam, the researchers’ AI appeared relatively adept, identifying the programmers 83% of the time.
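
To give a rough feel for the train-then-identify process described above, here is a minimal Python sketch. It is not the researchers' system: the five style features, the nearest-profile matching, and the authors "alice" and "bob" with their code samples are all assumptions made up for illustration, and a real tool would likely use far richer features and a proper classifier.

```python
import math
import re


def style_features(code: str) -> list:
    """Turn a code snippet into a small vector of layout and naming habits.

    These five features are illustrative assumptions, not the features
    used in the research.
    """
    lines = [ln for ln in code.splitlines() if ln.strip()]
    n = max(len(lines), 1)
    idents = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", code)
    m = max(len(idents), 1)
    return [
        sum(ln.startswith("\t") for ln in lines) / n,                # tab-indented lines
        sum(ln.strip() == "{" for ln in lines) / n,                  # brace on its own line
        sum("_" in w for w in idents) / m,                           # snake_case names
        sum(bool(re.search(r"[a-z][A-Z]", w)) for w in idents) / m,  # camelCase names
        len(re.findall(r"\S [=+\-*/<>] \S", code)) / n,              # spaced operators
    ]


def train(samples: dict) -> dict:
    """Build one average style profile per author from known samples."""
    profiles = {}
    for author, snippets in samples.items():
        vectors = [style_features(s) for s in snippets]
        profiles[author] = [sum(col) / len(col) for col in zip(*vectors)]
    return profiles


def identify(profiles: dict, code: str) -> str:
    """Return the author whose profile is closest to the snippet's style."""
    target = style_features(code)
    return min(profiles, key=lambda author: math.dist(profiles[author], target))


# Hypothetical authors and samples, invented for this example.
samples = {
    "alice": [
        "int add_two(int a_val) {\n    int sum_val = a_val + 2;\n    return sum_val;\n}\n",
        "int max_of(int x_val, int y_val) {\n    return x_val > y_val ? x_val : y_val;\n}\n",
    ],
    "bob": [
        "int addTwo(int aVal)\n{\n\tint sumVal=aVal+2;\n\treturn sumVal;\n}\n",
        "int maxOf(int xVal, int yVal)\n{\n\treturn xVal>yVal?xVal:yVal;\n}\n",
    ],
}

profiles = train(samples)
unknown = "int mulThree(int inVal)\n{\n\tint outVal=inVal*3;\n\treturn outVal;\n}\n"
print(identify(profiles, unknown))  # prints: bob
```

The anonymous snippet is attributed to "bob" because it shares his tabs, camelCase names, and unspaced operators, even though the function itself is new. That is the basic idea of a coding "signature", just at toy scale.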

So why is this useful? It could help in investigating hacks, or in identifying who created certain pieces of malware, which generally tend to be anonymous. It could also be used in legal disputes where one developer accuses another of copying their code. The downside, however, is privacy: programmers sometimes choose to remain anonymous for their own reasons, and being identifiable is not necessarily a good thing.