A botnet is a network of compromised machines (bots) under the control a an entity (the botmaster), which uses them to perform illegal activities.
Modern botnets rely on domain generation algorithms, also known as DGA, to build resilient C&C infrastructures. Recently, researchers proposed approaches to recognize automatically-generated domains from DNS traffic to infiltrate into such C&C infrastructures and cause the masters to lose control of their bots.
Unfortunately, such approaches require access to DNS sensors whose deployment poses practical issues that render their adoption problematic. Instead, we propose a novel way to combine publicly-available and privacy-preserving databases of historical DNS traffic together with linguistic-based models of the suspicious domains. From this, we find automatically-generated domain names, characterize the generation algorithms, isolate logical groups of domains that represent the respective botnets, and produce novel knowledge about the evolving behavior of each tracked botnet.
We evaluated our approach on millions of real-world domains. Overall, it correctly flags 81.4 to 94.8% of the domains as being automatically generated. More important, it isolates families of domains that belong to different DGAs. We were also able to verify the validity of our findings against live botnets (e.g., Conficker.B).