Document Type

Master of Science

Computer Science

First Adviser

Wang, Ting

In this thesis, we present two novel approaches to defending neural networks against adversarial examples: an attention-based defense against pixel-based attacks and a certificate-based defense against spatially transformed attacks. We first discuss the vulnerability of neural networks to adversarial examples, which significantly hinders their deployment in security-critical domains. We detail several popular pixel-based methods of attacking a model, then walk through current defense methods and note that they can often be circumvented by adaptive adversaries. For the first contribution, we take a completely different route by leveraging the defining property of adversarial inputs: while deceiving to deep neural networks, they are barely discernible to human vision. Building on recent advances in interpretable models, we construct a new detection framework that contrasts an input's interpretation against its classification. We validate the efficacy of this framework through extensive experiments on benchmark datasets and attacks, and we believe this work opens a new direction for designing adversarial input detection methods. For the second contribution, we examine a fundamentally different way of generating adversarial examples, based on spatial transformations of an input image. We then extend a recently proposed certification framework to this setting and show that the resulting certificate can improve a network's resilience against adversarial spatial transformations.
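To make the notion of a pixel-based attack concrete, the following is a minimal, self-contained sketch of a gradient-sign (FGSM-style) perturbation applied to a toy logistic-regression classifier. The weights, inputs, and step size here are illustrative assumptions, not models or settings from this thesis; real attacks operate on deep networks over images in exactly the same spirit.

```python
import numpy as np

# Toy "network": logistic regression p(y=1|x) = sigmoid(w.x + b).
# Weights and input below are random illustrative values.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """Perturb x by eps * sign of the loss gradient w.r.t. the input.

    For logistic regression with cross-entropy loss, the input
    gradient has the closed form (p - y) * w.
    """
    p = sigmoid(w @ x + b)     # model's predicted probability of class 1
    grad_x = (p - y) * w       # d(loss)/dx
    return x + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
w = rng.normal(size=8)
b = 0.0
x = rng.normal(size=8)
y = 1.0                        # assume the true label is class 1

x_adv = fgsm(x, y, w, b, eps=0.3)
print("clean prediction:", sigmoid(w @ x + b))
print("adversarial prediction:", sigmoid(w @ x_adv + b))
```

Each coordinate of the input moves by at most `eps`, so the perturbation is small in the max-norm sense, yet it pushes the prediction away from the true label; this "barely discernible but deceiving" behavior is what the detection framework above is designed to catch.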