Botnets are networks of infected machines controlled by an external entity, the botmaster, who uses this infrastructure for malicious activities (e.g., spamming and Distributed Denial of Service attacks). The botmaster employs a machine, the Command and Control Server (C&C), to send commands to, and gather information from, the bots. The communication between the bots and the C&C is established through a variety of protocols that change from botnet to botnet. In DGA-based botnets, the bots find the rendezvous point with the botmaster by means of a Domain Generation Algorithm (DGA). Botnet mitigation is widely covered in the literature, but several proposed systems suffer from one of two major shortcomings: they either use a supervised approach, which means the system needs some a priori knowledge, or leverage DNS data that contain information on the infected machines, which raises privacy issues for users.
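To make the rendezvous idea concrete, the following is a minimal illustrative sketch of how a DGA can work in principle. It is not the algorithm of any specific botnet: the function name `generate_domains`, the shared `seed`, the use of SHA-256, and the reserved `.example` suffix are all assumptions made for illustration. Because bots and botmaster run the same deterministic procedure on the same inputs, they obtain the same candidate domains without exchanging any message.

```python
import hashlib
from datetime import date


def generate_domains(seed: str, day: date, count: int = 5,
                     tld: str = ".example") -> list[str]:
    """Toy DGA (hypothetical): derive `count` pseudo-random domains from a
    shared seed and the current date."""
    domains = []
    for i in range(count):
        # Same seed + same date + same index always gives the same digest.
        material = f"{seed}:{day.isoformat()}:{i}".encode()
        digest = hashlib.sha256(material).hexdigest()
        # Map the first 12 hex digits (0-15) to the letters 'a'..'p'.
        label = "".join(chr(ord("a") + int(c, 16)) for c in digest[:12])
        domains.append(label + tld)
    return domains


# Both sides compute the same list for the same day.
for name in generate_domains("shared-secret", date(2014, 1, 1)):
    print(name)
```

Domains produced this way look random and change every day, which is why they tend to stand out in DNS traffic and why static blacklists are ineffective against them.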
We have concentrated on CERBERUS, an automated system based on machine learning that overcomes these shortcomings thanks to an unsupervised approach: the system does not need any a priori knowledge, and it analyzes passive DNS data, which are free of privacy issues. CERBERUS has proven to be an effective system for discovering botnets. We have turned CERBERUS, until now a proof of concept, into a working system able to operate on its own, and we have deployed it in the real world. To do so, we added a new module that feeds the system with data collected from a passive DNS sensor, and we partially reorganized the program flow to meet our goals in terms of performance and robustness. The new system is multiprocess, so it can make full use of the available computing resources, and it is scalable. Moreover, it can recover from errors without compromising the integrity of the system.
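The combination of multiprocessing and error recovery can be sketched as follows. This is only an illustration under our own assumptions, not the actual CERBERUS code: the functions `analyze_batch`, `analyze_batch_safely`, and `run_pipeline` are hypothetical, and the "analysis" is a placeholder that merely rejects malformed names. The point is that a failure inside one batch is turned into a result instead of an exception, so the remaining batches are still processed.

```python
from concurrent.futures import ProcessPoolExecutor


def analyze_batch(batch: list[str]) -> int:
    """Placeholder analysis (hypothetical): count the names in a batch,
    raising on a malformed one."""
    for name in batch:
        if not name or " " in name:
            raise ValueError(f"malformed domain name: {name!r}")
    return len(batch)


def analyze_batch_safely(batch: list[str]) -> tuple[str, object]:
    """Wrap the analysis so that an error is reported, not propagated."""
    try:
        return ("ok", analyze_batch(batch))
    except Exception as exc:  # a failed batch must not stop the others
        return ("failed", str(exc))


def run_pipeline(batches: list[list[str]], workers: int = 2):
    """Distribute the batches over a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_batch_safely, batches))
```

When used as a script, `run_pipeline` should be called under an `if __name__ == "__main__":` guard, as is usual for process pools in Python. Batches reported as `"failed"` can then be logged or retried while the results of the other batches are kept.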
CERBERUS is now working around the clock, analyzing DNS data and classifying malicious domains to discover new threats. We monitored CERBERUS continuously for two weeks, during which it analyzed 13,506,000 domains and classified 144 of them as malicious. We then let CERBERUS work on its own for two more weeks, after which we verified that the system was still working, which allowed us to show that the system is autonomous.