Selected Research Projects

Despite great advances in machine learning, especially deep learning, current learning systems have severe limitations. Even if a learner performs well in the typical scenario, in which it is trained and tested on the same or a similar data distribution, it can fail in new scenarios and can be fooled or misled by attacks at inference time (adversarial examples) or at training time (data poisoning attacks). As learning systems become pervasive, safeguarding their security and privacy is critical.

In particular, recent studies have shown that current learning systems are vulnerable to evasion attacks such as adversarial examples, for which the perturbation can be of very small magnitude. For example, our work has shown that a learner can be easily fooled by physical perturbations such as printed color stickers placed on road signs. This is one of the first works to generate robust physical adversarial perturbations that remain effective under various conditions and viewpoints. Moreover, a model may be trained on a poisoned data set, causing it to give wrong predictions in certain scenarios. Our recent work has demonstrated that attackers can embed “backdoors” in a learner by poisoning the training data of real-world applications such as face recognition.
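To make the idea of a small-magnitude evasion attack concrete, here is a minimal sketch of the fast gradient sign method (FGSM), a standard textbook attack, applied to a toy logistic-regression classifier. It is not the physical sticker attack described above; the weights, input, and perturbation budget `eps` are made-up values chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Move each feature by eps in the direction that increases the loss.

    For logistic regression with cross-entropy loss, the gradient of the
    loss with respect to the input x is (p - y) * w.
    """
    p = predict_proba(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


# Hypothetical toy model and input (assumptions, not from any real system).
w = [2.0, -3.0, 1.0]
b = 0.0
x = [0.2, -0.1, 0.1]
y = 1  # true label

x_adv = fgsm(w, b, x, y, eps=0.15)
p_clean = predict_proba(w, b, x)
p_adv = predict_proba(w, b, x_adv)

print(f"clean input:     P(class 1) = {p_clean:.3f}")
print(f"perturbed input: P(class 1) = {p_adv:.3f}")
```

In this toy setting no feature changes by more than `eps`, yet the predicted probability for the true class drops from above 0.5 to below it, so the predicted label flips.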

Several solutions to these threats have been proposed, but they are not resilient against intelligent adversaries who respond dynamically to the deployed defenses. Generalization is a key challenge for deep learning systems. How do we know whether a deep learning system such as a neural program, a robot, or a self-driving car will behave as intended in a new environment and remain safe and secure against attacks such as adversarial perturbations? How do we specify security properties for deep learning systems? How do we test and verify the desired security properties? Is it possible to provide provable guarantees? The question of how to improve the robustness of machine learning models against advanced adversaries therefore remains largely unanswered. We aim to explore practical, novel attack strategies against real-world machine learning models and, building on them, to develop robust learning algorithms that are resilient against strong adaptive attackers.

Machine learning has recently been widely adopted in various applications. This success is largely due to a combination of algorithmic breakthroughs, improvements in computational resources, and especially access to large amounts of diverse training data. However, such massive data sets usually contain privacy-sensitive information, such as individuals' medical and financial records. With the rise of ubiquitous sensing, personalization, and virtual assistants, users' privacy is at ever-increasing risk. Can we enable the power and utility of machine learning and data analytics while still ensuring users' privacy? What is the relationship between privacy preservation, generalization, and robustness in machine learning? Can we design privacy-preserving learning algorithms that both ensure privacy and guarantee high data utility?

We will explore novel techniques, including differential privacy, to enable privacy-preserving machine learning and data analytics in practice. Our long-term goal is both to provide practical, real-world solutions for privacy-preserving machine learning and data analytics and to deepen the theoretical understanding of privacy in the big data era.
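As a small illustration of what differential privacy looks like in code, the sketch below applies the Laplace mechanism, one standard building block, to a counting query. It is not a description of the techniques developed in this project; the data set, the predicate, and the privacy budget `epsilon` are hypothetical, and the floating-point sampling used here is not hardened for production use.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(records, predicate, epsilon, rng):
    """Release a count with epsilon-differential privacy.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is sensitivity / epsilon.
    """
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical data: ages of eight individuals (made up for illustration).
ages = [23, 35, 41, 29, 52, 67, 38, 45]
rng = random.Random(0)  # seeded only to make the sketch repeatable

noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng)
print(f"noisy count of people aged 40 or over: {noisy:.2f}")
```

A smaller `epsilon` gives stronger privacy but noisier answers, which is the privacy-versus-utility trade-off raised in the questions above.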