
A list of papers in NeurIPS 2022 related to adversarial attack and defense / AI security.

Language: EN | CN

Background:

  1. Source of Keywords and TL;DR: Original Author/Review Comments/Paper Abstract/Personal Summary

  2. Academia and industry are trying to build a robustness benchmark:

If there are any additions, omissions or errors, please point them out!

Table of Contents


Adversarial Attacks


Black-box Attacks (7)

Shengming Yuan, Qilong Zhang, Lianli Gao, Yaya Cheng, Jingkuan Song

Keywords: unrestricted color attack, transferability, flexible, natural, semantic-based, unrestricted attack

TL;DR: we propose a Natural Color Fool (NCF), which fully exploits color distributions of semantic classes in an image to craft human-imperceptible, flexible, and highly transferable adversarial examples.

Yucheng Shi, Yahong Han, Yu-an Tan, Xiaohui Kuang

Keywords: Adversarial attack, Black-box attack, Decision-based attack, Vision transformer

TL;DR: This paper proposes a new decision-based black-box adversarial attack against ViTs with theoretical analysis that divides images into patches through a coarse-to-fine search process and compresses the noise on each patch separately.

Zeyu Qin, Yanbo Fan, Yi Liu, Li Shen, Yong Zhang, Jue Wang, Baoyuan Wu

Keywords: Adversarial Transferability, Black-Box Attacks, Adversarial Examples

TL;DR: The authors claim that the adversarial examples should be in a flat local region for better transferability. To boost the transferability, they propose a simple yet effective method named Reverse Adversarial Perturbation (RAP). RAP adds an inner optimization to help the attack escape sharp local minima, which is general to other attacks. Experimental results demonstrate the high effectiveness of RAP.
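
The flatness-seeking min-max structure described above can be illustrated with a short sketch (PyTorch-style; the L∞ budgets, step sizes, and cross-entropy objective are illustrative assumptions, not the paper's exact settings): the inner loop looks for a "reverse" perturbation that pulls the loss back down, and the outer update is taken at that worst-case nearby point.

```python
import torch
import torch.nn.functional as F

def rap_style_attack(model, x, y, eps=8/255, alpha=2/255, steps=10,
                     inner_eps=4/255, inner_alpha=2/255, inner_steps=3):
    """Sketch of a flatness-seeking transfer attack in the spirit of RAP.

    Outer loop: maximize the classification loss w.r.t. delta.
    Inner loop: find a reverse perturbation n that *minimizes* the loss,
    so the outer update is evaluated at the worst (sharpest) nearby point.
    """
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        # Inner minimization: reverse adversarial perturbation n.
        n = torch.zeros_like(x, requires_grad=True)
        for _ in range(inner_steps):
            loss = F.cross_entropy(model(x + delta + n), y)
            grad_n, = torch.autograd.grad(loss, n)
            n = (n - inner_alpha * grad_n.sign()).clamp(-inner_eps, inner_eps)
            n = n.detach().requires_grad_(True)
        # Outer maximization, evaluated at the reverse-perturbed point.
        loss = F.cross_entropy(model(x + delta + n.detach()), y)
        grad_d, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad_d.sign()).clamp(-eps, eps)
        delta = (x + delta).clamp(0, 1) - x          # keep the image in [0, 1]
        delta = delta.detach().requires_grad_(True)
    return (x + delta).detach()
```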

Zikui Cai, Chengyu Song, Srikanth Krishnamurthy, Amit Roy-Chowdhury, M. Salman Asif

Keywords: surrogate ensemble, bilevel optimization, limited query attacks, hard-label attacks, query attack

TL;DR: Optimizing a weighted loss function over surrogate ensemble provides highly successful and query efficient blackbox (targeted and untargeted) attacks. In this paper, we propose a novel method for Blackbox Attacks via Surrogate Ensemble Search (BASES) that can generate highly successful blackbox attacks using an extremely small number of queries.

Chenghao Sun, Yonggang Zhang, Wan Chaoqun, Qizhou Wang, Ya Li, Tongliang Liu, Bo Han, Xinmei Tian

Keywords: Adversarial examples, Adversarial attacks, Black-box attack, No-box attack

TL;DR: This paper enables black-box attacks to generate powerful adversarial examples even when only one sample per category is available to the adversary.

Yunrui Yu, Xitong Gao, Cheng-zhong Xu

Keywords: adversarial attack, ensemble adversarial defense, model robustness, model ensemble, ensemble attack

TL;DR: Re-weighting sub-models in various ensemble defenses can lead to attacks with much faster convergence and higher success rates. The main idea is to assign different weights to different models.
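
As a rough illustration of the idea (not the paper's exact weighting rule), one can run PGD on a weighted sum of per-sub-model losses and shift weight toward the sub-models that are currently hardest to fool:

```python
import torch
import torch.nn.functional as F

def reweighted_ensemble_pgd(models, x, y, eps=8/255, alpha=2/255, steps=20, tau=1.0):
    """PGD against an ensemble where sub-model weights are adapted each step.

    Sub-models with the lowest current loss (i.e. still classifying correctly)
    receive the largest weight, so the attack focuses on the hardest members.
    This weighting rule is illustrative, not the one from the paper.
    """
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        losses = torch.stack([F.cross_entropy(m(x + delta), y) for m in models])
        weights = torch.softmax(-losses.detach() / tau, dim=0)  # low loss => high weight
        loss = (weights * losses).sum()
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps)
        delta = ((x + delta).clamp(0, 1) - x).detach().requires_grad_(True)
    return (x + delta).detach()
```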

Abhishek Aich, Calvin-Khang Ta, Akash A Gupta, Chengyu Song, Srikanth Krishnamurthy, M. Salman Asif, Amit Roy-Chowdhury

Keywords: adversarial machine learning, image classification, generative adversarial attack, Generator-based, multi-label

TL;DR: Vision-language models can be used by attackers to create potent perturbations on multi-object scenes to fool diverse classifiers.


Backdoor, Data Poisoning (9)

Pedro Sandoval-Segura, Vasu Singla, Jonas Geiping, Micah Goldblum, Tom Goldstein, David W. Jacobs

Keywords: autoregressive processes, poisons, data poisoning, data protection, imperceptible perturbations, adversarial machine learning

TL;DR: This paper proposes a new data poisoning attack to prevent data scraping. The proposed method adds class conditional autoregressive (AR) noise to training data to prevent people from using the data for training, and the method is data and model independent, which means that the same noise can be used to poison different datasets and models of different architectures.

Khoa D Doan, Yingjie Lao, Ping Li

Keywords: arbitrary trigger, backdoor attacks, generative models

TL;DR: Backdoor Attacks with Arbitrary Target Class

Lue Tao, Lei Feng, Hongxin Wei, Jinfeng Yi, Sheng-Jun Huang, Songcan Chen

Keywords: Adversarial Training, Adversarial Robustness, Availability Attacks, Hypocritical Perturbations

TL;DR: Adversarial training may fail to provide test robustness under stability attacks, and thus an adaptive defense is necessary to resolve this issue. This paper introduces a novel data poisoning attack against adversarial training called stability attacks. The goal is to tamper with the training data such that the robust performance of adversarial training over this manipulated dataset is degraded. To construct this attack, a hypocritical perturbation is built: unlike adversarial perturbations, the aim of hypocritical perturbations is to reinforce the non-robust features in the training data. These perturbations can be generated by negating adversarial example generation objectives.
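
Since hypocritical perturbations are described as negating the adversarial objective, a minimal sketch (assuming an L∞ budget and PGD-style updates) simply descends the loss of the true label instead of ascending it:

```python
import torch
import torch.nn.functional as F

def hypocritical_perturbation(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft a perturbation that *reinforces* the true-label prediction.

    This is the negated adversarial objective described in the TL;DR:
    instead of maximizing the loss of the correct label, we minimize it,
    strengthening the non-robust features associated with y.
    """
    delta = torch.zeros_like(x, requires_grad=True)
    for _ in range(steps):
        loss = F.cross_entropy(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta - alpha * grad.sign()).clamp(-eps, eps)   # descend, not ascend
        delta = ((x + delta).clamp(0, 1) - x).detach().requires_grad_(True)
    return (x + delta).detach()
```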

Hossein Souri, Liam H Fowl, Rama Chellappa, Micah Goldblum, Tom Goldstein

Keywords: Backdoor attacks, data poisoning, clean labels, adversarial examples, security

TL;DR: We introduce the first scalable hidden trigger backdoor attack to be effective against neural networks trained from scratch.

Xiangrui Cai, Haidong Xu, Sihan Xu, Ying Zhang, Xiaojie Yuan

Keywords: Continuous prompt, backdoor, few-shot learning

TL;DR: This paper conducts a study on the vulnerability of the continuous prompt learning algorithm to backdoor attacks. The authors make a few interesting observations, such as that the few-shot scenario poses a challenge to backdoor attacks. The authors then propose BadPrompt for backdoor attacks on continuous prompts.

Yufei Chen, Chao Shen, Yun Shen, Cong Wang, Yang Zhang

Keywords: Data poisoning, membership inference, data privacy

TL;DR: We demonstrate how to use data poisoning attacks to amplify the membership exposure of the targeted class.

Sanghyun Hong, Nicholas Carlini, Alexey Kurakin

Keywords: Backdoor attacks, handcrafting model parameters, neural networks, supply-chain attack

TL;DR: We show that the backdoor attacker, originally presented as a supply-chain adversary, can handcraft model parameters to inject backdoors into deep neural networks.

Yiming Li, Yang Bai, Yong Jiang, Yong Yang, Shu-Tao Xia, Bo Li

Keywords: Ownership Verification, Dataset Protection, Copyright Protection, Backdoor Attack, AI Security

TL;DR: We explore how to design the untargeted backdoor watermark and how to use it for harmless and stealthy dataset copyright protection.

Wenxiao Wang, Alexander Levine, Soheil Feizi

Keywords: data poisoning, robustness, theory, security

TL;DR: We propose the Lethal Dose Conjecture, which characterizes the largest amount of poisoned samples any defense can tolerate for a given task, and showcase its implications, including better/easy ways to improve robustness against data poisoning. (This paper suggests a hypothesis for the necessary and sufficient amount of malicious samples needed asymptotically for successful data poisoning. Most notably, the amount is inversely proportional to the minimum amount of data needed to learn the concept for the chosen model class.)


Adversarial Reprogramming (3)

Matthias Englert, Ranko Lazic

Keywords: adversarial reprogramming, adversarial examples, adversarial robustness, random networks, implicit bias

TL;DR: We show that neural networks with random weights are susceptible to adversarial reprogramming, and that in some settings training the network can cause its adversarial reprogramming to fail.

Qizhou Wang, Feng Liu, Yonggang Zhang, Jing Zhang, Chen Gong, Tongliang Liu, Bo Han

Keywords: OOD Detection, model reprogramming, adversarial reprogramming

TL;DR: boosting classification-based OOD detection via model reprogramming

Guanhua Zhang, Yihua Zhang, Yang Zhang, Wenqi Fan, Qing Li, Sijia Liu, Shiyu Chang

Keywords: Fairness, Model Reprogramming, NLP, CV

TL;DR: We introduce a novel model reprogramming based post-processing fairness promoting method for machine learning models.


Physical Attacks (2)

Yibo Miao, Yinpeng Dong, Jun Zhu, Xiao-Shan Gao

Keywords: 3D adversarial examples, physical attacks, isometry

TL;DR: A novel method to generate nature 3D adversarial examples in the physical world.

Stephen Casper, Max Nadeau, Dylan Hadfield-Menell, Gabriel Kreiman

Keywords: Interpretability, Explainability, Adversarial Attacks, Feature-Level

TL;DR: We produce feature-level adversarial attacks using a deep image generator. They have a wide range of capabilities, and they are effective for studying feature/class (mis)associations in networks.


Data security (9)

Federated Learning Attacks

Henger Li, Xiaolin Sun, Zizhan Zheng

Keywords: Federated Learning, Adversarial Attacks, Reinforcement Learning, Data Poisoning Attacks

TL;DR: We propose a model-based reinforcement learning framework to derive untargeted poisoning attacks against federated learning (FL) systems.

Membership Inference Attacks

Jasper Tan, Blake Mason, Hamid Javadi, Richard Baraniuk

Keywords: Membership inference, Privacy, Linear Regression, Overparameterization

TL;DR: We prove that increasing the number of parameters of a linear model increases its vulnerability to membership inference attacks.

Yaxin Xiao, Qingqing Ye, Haibo Hu, Huadi Zheng, Chengfang Fang, Jie Shi

Keywords: AI Safety, Model Extraction attack, Membership Inference attack

TL;DR: This work explores a chained and iterative reaction where model extraction and membership inference advance each other. (This paper studies the problem of securing a model once it is published as a service. Most prior studies focus either on protecting the model from extraction (ME) attacks or on preventing identification of the data used to train it (membership inference, MI). The authors propose that a simultaneous attack on both surfaces is even more powerful, since the MI attack provides more information to the ME attack.)

Pingyi Hu, Zihan Wang, Ruoxi Sun, Hu Wang, Minhui Xue

Keywords: Membership inference attack, Data privacy leakage, Multimodality

TL;DR: The paper studies the privacy leakage of multi-modal models, proposing a membership inference attack against multi-modal models. Two attack methods are introduced, the metric-based M4I and the feature-based M4I. In metric-based M4I, the adversary can score the data and use a threshold or a binary classifier to distinguish between the scores of member data and non-member data; while in feature-based M4I, a pre-trained shadow multi-modal feature extractor is used to conduct data inference attack.

Model Inversion Attacks

Mengda Yang, Ziang Li, Juan Wang, Hongxin Hu, Ao Ren, Xiaoyang Xu, Wenzhe Yi

Keywords: model inversion attack, edge-cloud learning

TL;DR: The paper presents a new model inversion attack for edge-cloud learning, which is shown to overcome existing defenses against model inversion in that scenario. The attack is based on Sensitive Feature Distillation (SFD), which simultaneously uses two existing techniques, shadow models and feature-based knowledge distillation. The authors claim that the proposed attack can effectively extract the purified feature map from the intentionally obfuscated one and recover the private image for popular image datasets such as MNIST, CIFAR10, and CelebA.

Niv Haim, Gal Vardi, Gilad Yehudai, Michal Irani, Ohad Shamir

Keywords: implicit bias, dataset reconstruction, privacy attacks, model inversion attack

TL;DR: We provide a novel scheme for reconstructing large portions of the actual training samples from a trained neural network. Our scheme is inspired by recent theoretical results of the implicit bias in training neural networks.

Model Steganography

Xudong Pan, Shengyao Zhang, Mi Zhang, Yifan Yan, Min Yang

Keywords: data stealing, deep learning privacy, AI security

TL;DR: This paper presents a method called Cans for encoding secret datasets into deep neural networks (DNNs) and transmitting them in an openly shared "carrier" DNN. In contrast to existing steganography methods that encode information into least significant bits, the authors encode the secret dataset into a trained, publicly shared DNN such that the public model predicts weights for secret-key inputs (known to covert operatives); these weights populate a secret DNN model, which in turn predicts the secret dataset from noisy inputs (also known to covert operatives). The main advantage of the Cans encoding is that it can covertly transmit over 10,000 real-world data samples within a carrier model that has 100× fewer parameters than the total size of the stolen data, and can simultaneously transmit multiple heterogeneous datasets within a single carrier model, with a trivial distortion rate (< 10⁻⁵) and almost no utility loss on the carrier model (< 1%).

Differential Privacy

Jiaqi Wang, Roei Schuster, Ilia Shumailov, David Lie, Nicolas Papernot

Keywords: adversarial, differential privacy, privacy, attacks

TL;DR: We show that the differential privacy mechanism used to protect training sets in ensemble-based decentralized learning in fact causes leakage of sensitive information.

Spurious Correlations

Qi Tian, Kun Kuang, Kelu Jiang, Furui Liu, Zhihua Wang, Fei Wu

Keywords: data privacy, generative adversarial network, causal confounder

TL;DR: We propose a causality-inspired method, named ConfounderGAN, a generative adversarial network (GAN) that makes personal image data unlearnable, protecting the data privacy of its owners.


Downstream Tasks Attacks

Pre-trained Model Attacks

Yuanhao Ban, Yinpeng Dong

Keywords: Adversarial samples, pre-trained models, security

TL;DR: We design a novel algorithm to generate adversarial samples using pre-trained models which can fool the corresponding fine-tuned ones, and thus reveal the safety problem of fine-tuning pre-trained models on downstream tasks.

Image Quality Assessment, IQA

Weixia Zhang, Dingquan Li, Xiongkuo Min, Guangtao Zhai, Guodong Guo, Xiaokang Yang, Kede Ma

Keywords: Adversarial attack, Image Quality

TL;DR: This paper builds upon work from several fields (adversarial attacks, MAD competition, and Eigen-distortion analysis), which are analysis-by-synthesis methods used to generate small-magnitude perturbations (by some pixel measure) that cause a large change in model response (either in classification for adversarial attacks, or in perceptual distance for MAD and Eigen-distortion). The paper extends this to the field of no-reference image quality metrics and compares its contribution with these other methods. The authors successfully use this approach to synthesize images that change the NR-IQA quality score significantly but remain below human detection thresholds (verified with humans in the loop).

Face Recognition

Shuai Jia, Bangjie Yin, Taiping Yao, Shouhong Ding, Chunhua Shen, Xiaokang Yang, Chao Ma

Keywords: adversarial attack, face recognition, unrestricted attack

TL;DR: This paper studies inconspicuous and transferable adversarial attacks on face recognition models. Unlike previous works that consider norm-bounded perturbations, it introduces a framework of adversarial attributes, which generates noise in the attribute space of face images based on StyleGAN. An importance-aware attribute selection approach is proposed to ensure both stealthiness and attack performance. Experiments on various face models show the effectiveness of the method.


Other Models

Diffusion Models

Maximilian Augustin, Valentyn Boreiko, Francesco Croce, Matthias Hein

Keywords: Diffusion Models, Counterfactual, Visual Counterfactual Explanations

TL;DR: This paper focuses on generating Visual Counterfactual Explanations (VCEs) for an arbitrary classifier using diffusion models. The work leverages diffusion models to generate realistic and minimally semantically altered VCE examples.

Recommender Systems

Haoyang LI, Shimin Di, Lei Chen

Keywords: Recommender system, Injective Attacks, Poisoning Attack

TL;DR: We first revisit current injective attackers on recommender systems and then propose a difficulty-aware and diversity-aware attacker.

Speaker Recognition

Patrick O'Reilly, Andreas Bugler, Keshav Bhandari, Max Morrison, Bryan Pardo

Keywords: privacy, adversarial examples, speech, speaker recognition

TL;DR: This paper presents VoiceBox, an adversarial audio attack which (1) dramatically lowers the accuracy of speaker recognition systems, (2) is mostly imperceptible to humans, and (3) can operate in real-time on live audio streams. The authors further demonstrate that their proposed model may also transfer to speaker recognition systems that it was not explicitly trained to fool.

Graph Neural Networks, GNN

Felix Mujkanovic, Simon Geisler, Stephan Günnemann, Aleksandar Bojchevski

Keywords: Adversarial Robustness, Graph Neural Networks, Adaptive Attacks

TL;DR: Adaptive evaluation reveals that most examined adversarial defenses for GNNs show no or only marginal improvement in robustness. The paper suggests that non-adaptive attacks lead to an overestimate of adversarial robustness, and thus the authors recommend using adaptive attacks as a gold standard.

Hongwei Jin, Zishun Yu, Xinhua Zhang

Keywords: certification of robustness, Gromov-Wasserstein distance, convex relaxation

TL;DR: We develop convex relaxations to certify the robustness of graph convolution networks under Gromov-Wasserstein style threat models.

Clustering

Anshuman Chhabra, Ashwin Sekhari, Prasant Mohapatra

Keywords: Deep Clustering, Adversarial Attacks, Visual Learning, Robust Learning

TL;DR: We show that state-of-the-art deep clustering models (even "robust" variants and a production-level MLaaS API) are susceptible to adversarial attacks that significantly reduce performance. Natural defense approaches are unable to mitigate our attack.

Neural Tangent Kernel, NTK

Nikolaos Tsilivis, Julia Kempe

Keywords: Neural Tangent Kernel, Adversarial Examples, Non Robust Features, Linearised Networks

TL;DR: We study adversarial examples through the lens of the NTK, introduce a new set of induced features to uncover the role of robust/non-robust features in classification, and study the kernel dynamics during adversarial training.

Spatiotemporal Traffic Forecasting Models

Fan Liu, Hao Liu, Wenzhao Jiang

Keywords: Spatiotemporal traffic forecasting, Adversarial attack

TL;DR: This paper presents a novel adversarial attack for spatiotemporal traffic forecasting models. Moreover, a theoretical analysis is conducted to demonstrate the worst-case performance bound of the attack. Comprehensive experiments on real-world datasets show the effectiveness of the attack, and how the robustness of the models improves when combined with the corresponding adversarial training.

Reinforcement Learning, RL

Li-Cheng Lan, Huan Zhang, Ti-Rong Wu, Meng-Yu Tsai, I-Chen Wu, Cho-Jui Hsieh

Keywords: Reinforcement Learning, AlphaGo, AlphaZero, Robustness, adversarial attack, discrete observation space sequential decision making problem

TL;DR: We found adversarial states that will let AlphaZero-trained agents make beginner's mistakes on the game of Go.

Dong-Sig Han, Hyunseo Kim, Hyundo Lee, JeHwan Ryu, Byoung-Tak Zhang

Keywords: inverse reinforcement learning, regularized Markov decision processes, imitation learning, learning theory

TL;DR: we present a novel algorithm that provides rewards as iterative optimization targets for an imitation learning agent. (Inverse reinforcement learning (IRL) is an algorithm for learning ground-truth rewards from expert demonstrations where the expert acts optimally with respect to an unknown reward function.)

Lazy Training

Yunjuan Wang, Enayat Ullah, Poorya Mianjy, Raman Arora

Keywords: Lazy Training, Adversarial Robustness

TL;DR: The paper takes a solid step towards explaining adversarial sensitivity of neural networks based on random networks. The paper shows that networks can be attacked by a single gradient descent step in the lazy regime where parameters are close to initialization.


Others

Ruochen Wang, Yuanhao Xiong, Minhao Cheng, Cho-Jui Hsieh

Keywords: AutoML, Optimizer Search, Optimization, Adversarial Robustness, Graph Neural Networks, BERT

TL;DR: This paper designs a new search space over optimizers and a search algorithm combining rejection sampling with Monte Carlo search. They evaluate the optimizer search on image classification, adversarial attacks, training GNNs, and BERT fine-tuning.


Adversarial Defense


Old task

input transformation

Yue Gao, Ilia Shumailov, Kassem Fawaz, Nicolas Papernot

Keywords: adversarial examples, randomized defenses, preprocessing defenses, input transformation, limitations

TL;DR: We demonstrate the limitations of using stochastic input transformations to provide adversarial robustness.

Chih-Hui Ho, Nuno Vasconcelos

Keywords: Adversarial Defense, Adversarial Attack, Implicit Functions, Local Implicit Function, Denoising

TL;DR: A novel adversarial defense for image classification is proposed with the use of local implicit functions. (This paper provides a way, named aDversarIal defenSe with local impliCit functiOns (DISCO), to protect classifiers from being attacked by adversarial examples. DISCO is composed of two parts: an encoder and a local implicit module. For inference, DISCO takes an image (either clean or adversarially perturbed) and a query pixel location as input and outputs an RGB value that is as clean as possible. After this process, the output image is expected to be free of adversarial perturbation, making the classifier predict with high accuracy. In summary, I think that DISCO is one type of denoising model that aims to be adversarially robust.)

query attack defense

Sizhe Chen, Zhehao Huang, Qinghua Tao, Yingwen Wu, Cihang Xie, Xiaolin Huang

Keywords: adversarial defense, black-box attack, model calibration, score-based query attack

TL;DR: We propose a novel defense against score-based query attacks, which post-processes model outputs to effectively confound attackers without hurting accuracy and calibration.

model ensemble

Sen Cui, Jingfeng Zhang, Jian Liang, Bo Han, Masashi Sugiyama, Changshui Zhang

Keywords: adversarial defense, collaboration, model ensemble.

TL;DR: This paper further improves the ensemble's adversarial robustness through a collaboration scheme.

Normalization

Minjing Dong, Xinghao Chen, Yunhe Wang, Chang Xu

Keywords: Adversarial Robustness, Adversarial Transferability, Normalization

TL;DR: We introduce a Random Normalization Aggregation module to achieve defense capability via adversarial transferability reduction.


Adversarial Training

Gaurang Sriramanan, Maharshi Gor, Soheil Feizi

Keywords: Adversarial Robustness, Adversarial Defense, Adversarial Training, Multiple Threat Models, Fast Adversarial Training, Efficient Adversarial Training, Single-Step Adversarial Training

TL;DR: In this work, we show that by carefully choosing the objective function used for robust training, it is possible to achieve similar, or improved worst-case performance over a union of threat models while utilizing only single-step attacks, thereby achieving a significant reduction in computational resources necessary for training.

Pau de Jorge, Adel Bibi, Riccardo Volpi, Amartya Sanyal, Philip Torr, Grégory Rogez, Puneet K. Dokania

Keywords: single-step adversarial training, catastrophic overfitting, FGSM, efficient adversarial training, fast adversarial training

TL;DR: We introduce a novel single-step attack for adversarial training that can prevent catastrophic overfitting while obtaining a 3x speed-up.

Mazda Moayeri, Kiarash Banihashem, Soheil Feizi

Keywords: robustness, adversarial, distributional, spurious correlations

TL;DR: adversarial training can increase model reliance on spurious correlations, reducing distributional robustness

Qixun Wang, Yifei Wang, Hong Zhu, Yisen Wang

Keywords: robustness, adversarial training, Out-of-Distribution

TL;DR: This paper proposes two modified adversarial training methods to improve robustness. The added perturbation is constrained to a low-dimensional subspace, which is further refined with more delicate constraints.

Chengyu Dong, Liyuan Liu, Jingbo Shang

Keywords: Adversarial training, Label noise, Robust overfitting, Double descent

TL;DR: We show that label noise exists in adversarial training and can explain robust overfitting as well as its intriguing behaviors.

Avrim Blum, Omar Montasser, Greg Shakhnarovich, Hongyang Zhang

Keywords: boosting, adversarial robustness, sample complexity, oracle complexity

TL;DR: We present an oracle-efficient algorithm for boosting robustness to adversarial examples. (The paper studies a new direction in adversarial training, on the distinctions between barely robust learners and strongly robust learners. The paper provides theoretical results on whether we can make a barely robust classifier into a strongly robust classifier.)

Sihui Dai, Saeed Mahloujifar, Prateek Mittal

Keywords: adversarial robustness, training method

TL;DR: This paper studies the model robustness extension to unforeseen perturbations. The main contribution of the paper is a generalization bound for unforeseen threat models with the knowledge of a source model. The paper further provides a training method for achieving better unforeseen robustness using the generalization bound.

Jiancong Xiao, Yanbo Fan, Ruoyu Sun, Jue Wang, Zhi-Quan Luo

Keywords: Generalization, Adversarial Training, Uniform Stability

TL;DR: This paper uses uniform stability to study generalization in networks under adversarial training. The authors use their framework to understand robust overfitting: the phenomenon in which a network is adversarially robust on the training set but generalizes poorly. The theoretical results in this paper are backed up by experiments and applications, providing theoretical clarity to a number of common methods for reducing overfitting.

Yiting Chen, Qibing Ren, Junchi Yan

Keywords: Convolutional Neural Network, adversarial robustness, frequency domain, Shapley value

TL;DR: This paper proposes to quantify the impact of frequency components of images on CNNs and investigates adversarial training (AT) and adversarial attacks in the frequency domain.

Xinsong Ma, Zekai Wang, Weiwei Liu

Keywords: Adversarial training, robust fairness

TL;DR: This paper studies an important and interesting topic in adversarial training, i.e., robust fairness issue. The robust fairness issue indicates that the adversarially trained model tends to perform well in some classes and poorly in other classes, creating a disparity in the robust accuracies for different classes. This paper focuses on the influence of perturbation radii on the robust fairness problem and finds that there is a tradeoff between average robust accuracy and robust fairness. To mitigate this tradeoff, the authors present a method to correct the lack of fairness in adversarial training.

Sravanti Addepalli, Samyak Jain, Venkatesh Babu Radhakrishnan

Keywords: Adversarial Training, Data Augmentation, Adversarial Robustness

TL;DR: We propose an effective augmentation strategy for Adversarial Training that can be integrated with several Adversarial Training algorithms and data augmentations.

Yichuan Mo, Dongxian Wu, Yifei Wang, Yiwen Guo, Yisen Wang

Keywords: Vision Transformer, Adversarial Training, Robustness

TL;DR: This paper investigates training techniques and utilizes the unique architecture of Vision Transformers to improve their adversarial robustness.

Xiaofeng Mao, YueFeng Chen, Ranjie Duan, Yao Zhu, Gege Qi, Shaokai Ye, Xiaodan Li, Rong Zhang, Hui Xue'

Keywords: Adversarial Training, Discrete Visual Representation, Robustness, Generalization

TL;DR: We propose Discrete Adversarial Training (DAT) which transfers the merit of NLP-style adversarial training to vision models, for improving robustness and generalization simultaneously.

Julia Grabinski, Paul Gavrikov, Janis Keuper, Margret Keuper

Keywords: Computer Vision, Adversarial Robustness, Model Calibration, Adversarial Training

TL;DR: We empirically show that adversarially robust models are less over-confident than their non-robust counterparts.

Jianan Zhou, Jianing Zhu, Jingfeng Zhang, Tongliang Liu, Gang Niu, Bo Han, Masashi Sugiyama

Keywords: adversarial training, weakly supervised learning, complementary label

TL;DR: How to equip machine learning models with adversarial robustness when all given labels in a dataset are wrong (i.e., complementary labels)?

Christian Cianfarani, Arjun Nitin Bhagoji, Vikash Sehwag, Ben Zhao, Haitao Zheng, Prateek Mittal

Keywords: representation similarities, robust training, visualizations, adversarial training

TL;DR: Using representation similarities to establish salient differences between robust and non-robust representations. The authors primarily use the CKA metric on CIFAR10 and subsets of ImageNet2012 to provide several novel insights into "salient pitfalls" in robust networks: robust representations are less specialized, with a weaker block structure; early layers in robust networks are largely unaffected by adversarial examples, as the representations are similar for benign and perturbed inputs; deeper layers overfit during robust learning; and models trained to be robust to different threat models have similar representations.
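
For reference, the linear CKA similarity used in this kind of representation analysis can be computed in a few lines (a standard formulation, not code from the paper):

```python
import torch

def linear_cka(X, Y):
    """Linear CKA between two representation matrices of shape (n, d1) and (n, d2).

    CKA(X, Y) = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F),
    computed on column-centered features. Values lie in [0, 1].
    """
    X = X - X.mean(dim=0, keepdim=True)
    Y = Y - Y.mean(dim=0, keepdim=True)
    cross = (Y.T @ X).norm(p="fro") ** 2
    normalizer = (X.T @ X).norm(p="fro") * (Y.T @ Y).norm(p="fro")
    return (cross / normalizer).item()

# Example: compare a robust and a non-robust model's activations at one layer,
# where acts_robust and acts_std are tensors of shape (num_inputs, feature_dim):
# similarity = linear_cka(acts_robust, acts_std)
```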

Yue Xing, Qifan Song, Guang Cheng

Keywords: adversarial robustness, unlabeled data, semi-supervised learning, adversarial training

TL;DR: Our theory in simple models shows the effectiveness of using unlabeled data to help adversarial training. (This paper investigates the phenomenon of using simulated data to improve adversarial robustness and provides a theoretical analysis. Specifically, the authors decompose the adversarial risk used in adversarial training to explain why and how unlabeled data can help and how its quality affects the resulting robustness.)

Zhuoer Xu, Guanghui Zhu, Changhua Meng, Shiwen Cui, Zhenzhe Ying, Weiqiang Wang, Ming Gu, Yihua Huang

Keywords: Adversarial Training, Automated Machine Learning

TL;DR: propose a method called A2 to improve the robustness by constructing the optimal perturbations on-the-fly during training. (Based on the idea of AutoML, this paper proposes an attack method that efficiently generates strong adversarial perturbations. The main idea is to use an attention mechanism to score possible attacks in the attacker space, then sample the attack to perform based on the assigned scores. The experimental results show that the proposed method can increase the attack power and improve the adversarial training performance without too much overhead.)

Yue Xing, Qifan Song, Guang Cheng

Keywords: Adversarial robustness, Adversarial training

TL;DR: The paper studies how to mitigate the gap between adversarial training and clean training by finding the optimal epsilon. This paper investigates the critical point of the magnitude of the adversarial perturbations with which training trajectories of adversarial training become significantly different from those of standard training.


Backdoor Defense

Ruisi Cai, Zhenyu Zhang, Tianlong Chen, Xiaohan Chen, Zhangyang Wang

Keywords: Backdoor Attack Detection, Random Shuffling

TL;DR: This paper proposes a backdoor defense that does not require a clean data set. Specifically, after randomly shuffling the filters and comparing the changes in DNN output, the authors find an obvious difference between backdoored DNNs and benign DNNs. They further propose metrics to detect whether a given model is backdoored.
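
A hedged sketch of such a shuffling-based probe (the choice of layer, the distance between output distributions, and the decision threshold are illustrative assumptions, not the paper's exact procedure):

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

def output_shift_under_filter_shuffle(model, layer_name, inputs, n_trials=10):
    """Randomly permute the output filters of one conv layer and measure how much
    the predictive distribution changes; the TL;DR reports that backdoored models
    react very differently from benign ones under such shuffling."""
    with torch.no_grad():
        ref = F.softmax(model(inputs), dim=1)
    shifts = []
    for _ in range(n_trials):
        shuffled = copy.deepcopy(model)
        conv = dict(shuffled.named_modules())[layer_name]
        assert isinstance(conv, nn.Conv2d)
        perm = torch.randperm(conv.out_channels)
        conv.weight.data = conv.weight.data[perm]
        if conv.bias is not None:
            conv.bias.data = conv.bias.data[perm]
        with torch.no_grad():
            out = F.softmax(shuffled(inputs), dim=1)
        # Average L1 distance between predictive distributions.
        shifts.append((out - ref).abs().sum(dim=1).mean().item())
    return sum(shifts) / len(shifts)   # compare against benign models / a threshold
```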

Steve Hanneke, Amin Karbasi, Mohammad Mahmoody, Idan Mehalel, Shay Moran

Keywords: Data Poisoning, the smallest achievable error

TL;DR: The paper information-theoretically studies the possibility and impossibility of learning from poisoned data, in a supervised learning setting with deterministic model predictions, against a relatively powerful attacker who knows the data x on which the model will be tested (or deployed).

Zhenting Wang, Hailun Ding, Juan Zhai, Shiqing Ma

Keywords: Natural Backdoors, Robustness

TL;DR: This paper introduces a new algorithm to mitigate backdoors in neural network models.

Zhenting Wang, Kai Mei, Hailun Ding, Juan Zhai, Shiqing Ma

Keywords: Backdoor/Trojan defense

TL;DR: Traditionally, most trojan detection methods focus on reverse-engineering a fixed trojan pattern on the input pictures. This paper proposes to reverse-engineer the trojan in the feature space of DNNs. The proposed method can be used to detect feature-based trojan or dynamic trojan that is input dependent.

Tian Yu Liu, Yu Yang, Baharan Mirzasoleiman

Keywords: Data Poisoning, Friendly Noise, different settings

TL;DR: This paper proposes a poisoning defense that unlike existing methods breaks various types of poisoning attacks with a small drop in the generalization. The key claim is that attacks exploit sharp loss regions to craft adversarial perturbations which can substantially alter examples' gradient or representations under small perturbations. The authors then propose to generate noise patterns which maximally perturb examples while minimizing performance degradation.

Runkai Zheng, Rongjun Tang, Jianze Li, Li Liu

Keywords: Backdoor Defense, Backdoor Attack, Adversarial Learning

TL;DR: In this paper, we demonstrate two defense strategies against backdoor attack with the observed property that the backdoor neurons in an infected neural network have a mixture of two distributions with significantly different moments.

Yuhao Zhang, Aws Albarghouthi, Loris D'Antoni

Keywords: data poisoning, certified defense, backdoor attack, trigger-less attack

TL;DR: We present BagFlip, a model-agnostic certified approach that can effectively defend against both trigger-less and backdoor attacks.

Shuwen Chai, Jinghui Chen

Keywords: Backdoor Defense, Adversarial Machine Learning, One-shot Learning

TL;DR: The paper presents a method for defending deep neural networks against backdoor attacks, i.e., attacks that inject “triggered” samples into the training set. The method can be seen as an improvement on Adversarial Neuron Pruning (ANP) that uses (i) soft weight masking (SWM), (ii) adversarial trigger recovery (ATR) and (iii) sparsity regularization (SR). The main focus of the paper is in the low-data regime, especially in the one-shot setting and when the network size is small.

Weixin Chen, Baoyuan Wu, Haoqian Wang

Keywords: backdoor defense, backdoor learning, trustworthy AI, AI security

TL;DR: Based on the observation that clean and backdoored data have dissimilar feature representations after data transformations (e.g., rotation, scaling), the authors propose a sensitivity metric, feature consistency towards transformations (FCT), to detect potential backdoor samples. They further propose two backdoor-removal modules inspired by existing semi-supervised learning defenses and backdoor unlearning.
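
A minimal sketch of a feature-consistency score along these lines (the transformation set, the feature layer, and the distance measure are assumptions made for illustration):

```python
import torch
import torchvision.transforms.functional as TF

def feature_consistency_score(feature_extractor, x):
    """Distance between features of an image batch and its transformed versions.

    Per the TL;DR, poisoned samples tend to show unusually large feature shifts
    under simple transformations such as rotation and scaling, so a high score
    flags a sample as a potential backdoor sample.
    """
    transforms = [
        lambda im: TF.rotate(im, 30),
        lambda im: TF.resize(TF.center_crop(im, [im.shape[-2] // 2, im.shape[-1] // 2]),
                             [im.shape[-2], im.shape[-1]]),   # crude zoom-in "scaling"
    ]
    with torch.no_grad():
        base = feature_extractor(x)
        dists = [(feature_extractor(t(x)) - base).flatten(1).norm(dim=1)
                 for t in transforms]
    return torch.stack(dists).mean(dim=0)   # one score per sample in the batch
```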

Haotao Wang, Junyuan Hong, Aston Zhang, Jiayu Zhou, Zhangyang Wang

Keywords: backdoor defense, AI security

TL;DR: In this paper, we propose a brand-new backdoor defense strategy, which makes it much easier to remove the harmful influence of backdoor samples from the model.


certified robustness / provable defenses

Randomized Smoothing

Jan Schuchardt, Stephan Günnemann

Keywords: Robustness certification, Verification, Randomized smoothing, Invariances, Equivariances

TL;DR: We derive tight, invariance-aware robustness certificates that augment black-box randomized smoothing with white-box knowledge about model invariances.

Miklós Z. Horváth, Mark Niklas Mueller, Marc Fischer, Martin Vechev

Keywords: adversarial robustness, certified robustness, randomized smoothing, tree-based models

TL;DR: We propose a (De-)Randomized Smoothing approach for decision stump ensembles, which i) significantly improves SOTA certified Lp-norm robustness for tree-based models and ii) enables joint certificates of numerical & categorical perturbations.

Huan Zhang, Shiqi Wang, Kaidi Xu, Linyi Li, Bo Li, Suman Jana, Cho-Jui Hsieh, J Zico Kolter

Keywords: neural network, formal verification, adversarial robustness

TL;DR: We propose GCP-CROWN, which extends bound-propagation-based neural network verifiers with general cutting planes constraints to strengthen the convex relaxations. GCP-CROWN is part of α,β-CROWN (alpha-beta-CROWN), the VNN-COMP 2022 winner.

Mikhail Pautov, Olesya Kuznetsova, Nurislam Tursynbek, Aleksandr Petiushko, Ivan Oseledets

Keywords: Certified robustness, randomized smoothing, few-shot learning

TL;DR: This paper proposes a certified robustness method for few-shot learning classification based on randomized smoothing.

Jinyuan Jia, Wenjie Qu, Neil Zhenqiang Gong

Keywords: Adversarial examples, multi-label classification, certified defense, randomized smoothing

TL;DR: This paper proposes MultiGuard, which provides multi-label classification with provable guarantees against adversarial perturbations. The method is based on randomized smoothing, where randomization with Gaussian noise is utilized to provide a smoothed classifier with provable guarantees; this work generalizes that to multi-label classification, with claims adjusted to suit the multi-label setting.
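
For context, the basic randomized-smoothing prediction underlying most entries in this subsection can be sketched as follows (a standard Monte-Carlo formulation in the spirit of Cohen et al., not code from any of the papers above; the certification step with its confidence bounds is omitted):

```python
import torch

def smoothed_predict(model, x, num_classes, sigma=0.25, n_samples=1000, batch=100):
    """Monte-Carlo estimate of the smoothed classifier g(x) = argmax_c P[f(x + e) = c],
    with e ~ N(0, sigma^2 I) and x a single image of shape (C, H, W).

    The full certification procedure additionally lower-bounds the top-class
    probability p_A and returns a certified L2 radius R = sigma * Phi^{-1}(p_A);
    that part is omitted here."""
    counts = torch.zeros(num_classes, dtype=torch.long, device=x.device)
    remaining = n_samples
    with torch.no_grad():
        while remaining > 0:
            m = min(batch, remaining)
            noise = torch.randn(m, *x.shape, device=x.device) * sigma
            preds = model(x.unsqueeze(0) + noise).argmax(dim=1)
            counts += torch.bincount(preds, minlength=num_classes)
            remaining -= m
    return counts.argmax().item()
```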

Lipschitz Neural Networks

Sahil Singla, Soheil Feizi

Keywords: provable defenses, adversarial examples, Lipschitz CNNs, formal guarantees

TL;DR: we introduce several new techniques that lead to significant improvements on CIFAR-10 and CIFAR-100 for both standard and provable robust accuracy and establish a new state-of-the-art.

Xiaojun Xu, Linyi Li, Bo Li

Keywords: Adversarial Robustness, Lipschitz-bounded Models, certified robustness

TL;DR: We propose a 1-Lipschitz CNN which achieves state-of-the-art deterministic certified robustness.

Bohang Zhang, Du Jiang, Di He, Liwei Wang

Keywords: Adversarial Robustness, Certified Defense, Lipschitz Neural Network, Expressive Power

TL;DR: We study certified robustness from a novel perspective of representing Boolean functions, providing deep insights into how recently proposed Lipschitz networks work and guiding the design of better Lipschitz networks.

Louis Béthune, Thibaut Boissin, Mathieu Serrurier, Franck Mamalet, Corentin Friedrich, Alberto Gonzalez Sanz

Keywords: robustness, lipschitz, certificate, orthogonal, generalization, loss, provably robust, certified robustness

TL;DR: Lipschitz neural networks are good classifiers: they are expressive, they are provably robust, and they generalize.

Others

Zhuolin Yang, Zhikuan Zhao, Boxin Wang, Jiawei Zhang, Linyi Li, Hengzhi Pei, Bojan Karlaš, Ji Liu, Heng Guo, Ce Zhang, Bo Li

Keywords: certified robustness, logical reasoning, Markov logic networks (MLN)

TL;DR: We propose the sensing-reasoning pipeline with knowledge based logical reasoning and provide the first certified robustness analysis for this pipeline. Results show it outperforms the current state-of-the-art in terms of certified robustness.

Yizhen Wang, Mohannad Alhanahnah, Xiaozhu Meng, Ke Wang, Mihai Christodorescu, Somesh Jha

Keywords: adversarial machine learning, relational adversaries, input normalization, input transformation, defense mechanism with guarantee, malware detection

TL;DR: This paper studies relational adversaries, a general threat model in which the adversary can manipulate test data via transformations specified by a logical relation. Inspired by the conditions for robustness against relational adversaries and the sources of the robustness-accuracy trade-off, the authors propose a learning framework called normalize-and-predict, which leverages input normalization to achieve provable robustness.

Jing Liu, Chulin Xie, Oluwasanmi O Koyejo, Bo Li

Keywords: Robust Collaborative Inference, Feature Purification, Adversarial Machine Learning, Vertical Federated Inference, Robust Decomposition

TL;DR: This paper focuses on improving the robustness of the model for collaborative inference. A pre-processing-based defense method is proposed against inference phase attacks. Both theoretical analyses and empirical evaluations are provided to demonstrate the effectiveness of the proposed method.

Idan Attias, Steve Hanneke, Yishay Mansour

Keywords: Semi-Supervised Learning, Adversarial Robustness, PAC Learning, Sample Complexity, Combinatorial Dimensions, Partial Concept Classes

TL;DR: The authors study semi-supervised learning for adversarially robust classification and claim upper bounds on labelled and unlabelled sample complexity of their learner.


Data Security

model stealing defense

Jiyang Guan, Jian Liang, Ran He

Keywords: NN fingerprinting, model stealing attack, sample correlation

TL;DR: We propose a novel correlation-based fingerprinting method SAC, which robustly detects different kinds of model stealing attacks.

Xuanli He, Qiongkai Xu, Yi Zeng, Lingjuan Lyu, Fangzhao Wu, Jiwei Li, Ruoxi Jia

Keywords: natural language generation, conditional lexical watermarks, IP protection

TL;DR: We propose a novel Conditional wATERmarking framework (CATER) for protecting the IP of text generation APIs against imitation attacks.

model inversion defense

Hoyong Jeong, Suyoung Lee, Sung Ju Hwang, Sooel Son

Keywords: model inversion defense, model explanation, explainable AI

TL;DR: We propose the first defense framework that mitigates explanation-aware model inversion attacks by teaching a model to suppress inversion-critical features in a given explanation while preserving its functionality.

federated learning

Chen Chen, Yuchen Liu, Xingjun Ma, Lingjuan Lyu

Keywords: federated learning, adversarial training

TL;DR: A novel calibrated federated adversarial training method that can handle label skewness.

Adversarial Removal

Abhinav Kumar, Chenhao Tan, Amit Sharma

Keywords: Probing, Null-Space Removal, Adversarial Removal, Spurious Correlation, Fairness

TL;DR: We theoretically and experimentally demonstrate that even under favorable conditions, probing-based null-space and adversarial removal methods fail to remove the sensitive attribute from latent representation.

Ryutaro Tanno, Melanie F. Pradier, Aditya Nori, Yingzhen Li

Keywords: model repairment, interpretability, continual learning, data deletion, debugging, interpretability

TL;DR: We develop a framework for repairing machine learning models by identifying detrimental training datapoints and erasing their memories

Nicholas Carlini, Matthew Jagielski, Chiyuan Zhang, Nicolas Papernot, Andreas Terzis, Florian Tramer

Keywords: memorization, privacy, auditing, membership inference attack

TL;DR: Removing the layer of outlier points that are most vulnerable to a privacy attack exposes a new layer of previously-safe points to the attack. This paper examines the impact of removing records that are most vulnerable to a membership inference attack on the privacy of the remaining records. Given a machine learning model trained on a private dataset, one way to make it more "privacy-preserving" without randomizing the training algorithm could be to identify records at risk, remove them from the dataset, then re-train a model on the remaining records. The paper shows that some of the remaining records become more at risk and investigates potential causes for this phenomenon.

Vinith Menon Suriyakumar, Ashia Camage Wilson

Keywords: Online Algorithms, Data Deletion

TL;DR: In this paper, we use the infinitesimal jackknife to develop an efficient approximate unlearning algorithm for online delete requests.


Downstream tasks

Out-of-Distribution, OOD

Mohammad Azizmalayeri, Arshia Soltani Moakar, Arman Zarei, Reihaneh Zohrabi, Mohammad Taghi Manzuri, Mohammad Hossein Rohban

Keywords: Out-of-distribution Detection, Adversarial Robustness, Attack

TL;DR: Here we benchmark current robust OOD detection methods against strong attacks, and propose a novel effective defense.

Alexander Meinke, Julian Bitterwolf, Matthias Hein

Keywords: adversarial robustness, out-of-distribution detection

TL;DR: We slightly modify the architecture of neural network classifiers such that one can obtain provable guarantees on adversarially robust OOD detection without any loss in accuracy.

Stereo Matching

Kelvin Cheng, Tianfu Wu, Zhebin Zhang, Hongyu Sun, Christopher G. Healey

Keywords: Stereo Matching, Contextualized Non-Parametric Cost Volume, Adversarial Robustness, Simulation-to-Real Generalizability

TL;DR: The integration of DNN-contextualized binary-pattern-driven non-parametric cost volume and DNN cost aggregation leads to more robust and more generalizable stereo matching.


Other Models

VITs

Yao Qin, Chiyuan Zhang, Ting Chen, Balaji Lakshminarayanan, Alex Beutel, Xuezhi Wang

Keywords: Robustness, Distributional shift, Vision transformers

TL;DR: vision transformers (ViTs) are known to be non-sensitive to patch shuffling, in contrast to humans. Could we make ViTs more robust by making them sensitive to patch shuffling (and other patch-based transformations)?

VAE

Anna Kuzina, Max Welling, Jakub Mikolaj Tomczak

Keywords: VAE, MCMC, Adversarial Attack

TL;DR: We show that MCMC can be used to fix the latent code of the VAE which was corrupted by an adversarial attack. (The paper presents the hypothesis that adversarial attacks on VAEs push the latent code into low probability areas. According to this hypothesis, the attack can be mitigated by pushing back the latent code into more probable region of the latent space. To do so, MCMC is applied in inference time.)
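
A heavily simplified stand-in for the idea (the paper's actual MCMC procedure is not reproduced here, and the interface names `vae.encode` and `vae.decode_log_prob` are assumptions): run a few noisy gradient-ascent (Langevin-style) steps on log p(x, z) starting from the encoder's output, pushing the latent back toward high-probability regions.

```python
import torch

def refine_latent_with_langevin(vae, x, n_steps=50, step_size=1e-2):
    """Push a (possibly attack-corrupted) latent code back toward high-probability
    regions by ascending log p(x, z) = log p(x|z) + log p(z) with noisy gradient
    steps, assuming a standard-normal prior on z.

    This is only a simplified Langevin stand-in for the inference-time MCMC
    correction described in the TL;DR; `vae.encode` (returning, say, the mean of
    q(z|x)) and `vae.decode_log_prob` are assumed interfaces."""
    z = vae.encode(x).detach().requires_grad_(True)
    for _ in range(n_steps):
        log_joint = vae.decode_log_prob(x, z).sum() - 0.5 * (z ** 2).sum()
        grad, = torch.autograd.grad(log_joint, z)
        noise = torch.randn_like(z) * (2 * step_size) ** 0.5
        z = (z + step_size * grad + noise).detach().requires_grad_(True)
    return z.detach()
```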

Đorđe Miladinović, Kumar Shridhar, Kushal Jain, Max B. Paulus, Joachim M. Buhmann, Carl Allen

Keywords: VAE, Dropout, posterior collapse, adversarial training

TL;DR: We propose an adversarial training strategy to achieve information-based stochastic dropout.

Amrutha Saseendran, Kathrin Skubch, Stefan Falkner, Margret Keuper

Keywords: Adversarial robustness, Generative models, Deterministic autoencoder

TL;DR: An adversarially robust deterministic autoencoder with superior performance in terms of both generation and robustness of the learned representations

Binary Neural Networks, BNN

Chen Liu, Ziqi Zhao, Sabine Süsstrunk, Mathieu Salzmann

Keywords: Adversarial Robustness, Model Compression, BNN

TL;DR: We introduce a framework to find robust sub-networks from randomly-initialized binary networks without updating the model parameters.

Natural Language Processing, NLP

Daniel Ziegler, Seraphina Nix, Lawrence Chan, Tim Bauman, Peter Schmidt-Nielsen, Tao Lin, Adam Scherlis, Noa Nabeshima, Benjamin Weinstein-Raun, Daniel de Haas, Buck Shlegeris, Nate Thomas

Keywords: adversarial training, language model, redteaming, human adversaries, tool assisted

TL;DR: We used a safe language generation task ("avoid injuries") as a testbed for achieving high reliability through adversarial training.

Biru Zhu, Yujia Qin, Ganqu Cui, Yangyi Chen, Weilin Zhao, Chong Fu, Yangdong Deng, Zhiyuan Liu, Jingang Wang, Wei Wu, Maosong Sun, Ming Gu

Keywords: Backdoor Defense, Pre-trained Language Models

TL;DR: In this paper, the authors investigate an interesting phenomenon that when the models are doing a moderate fitting with parameter-efficient training methods, the models are likely to ignore the backdoored features, as those features are ill-trained. Based on this observation, the authors suggest restricting the language model fine-tuning to the moderate-fitting stage to naturally improve the robustness of language models against backdoor triggers. Furthermore, the authors find that (1) parameter capacity, (2) training epochs, and (3) learning rate are key factors that can impact the models’ vulnerability to backdoors. Reducing those hyper-parameters can help models fail to adapt to backdoor features.

Deep Equilibrium Models, DEQs

Zonghan Yang, Tianyu Pang, Yang Liu

Keywords: Deep Equilibrium Models, Adversarial Attack

TL;DR: This paper evaluated the robustness of the general deep equilibrium model (DEQ) in the traditional white-box attack-defense setting. The authors first pointed out the challenges of training robust DEQs. Then they developed a method to estimate the intermediate gradients of DEQs and integrate them into the adversarial attack pipelines.

Spiking Neural Network

Ling Liang, Kaidi Xu, Xing Hu, Lei Deng, Yuan Xie

Keywords: Spiking Neural Network, Certified Training, Adversarial Attack

TL;DR: The first work that applies certification-based techniques to spiking neural networks.

Jianhao Ding, Tong Bu, Zhaofei Yu, Tiejun Huang, Jian K Liu

Keywords: Spiking Neural Networks, Neural Coding, Perturbation Analysis, Adversarial Training

TL;DR: Experimental and theoretical insights about the robustness of spiking neural networks motivate a robust training scheme.

Stochastic Neural Networks

Sina Däubener, Asja Fischer

Keywords: Stochastic neural network, robustness, adversarial attacks

TL;DR: This paper derives a sufficient condition for the robustness of stochastic neural networks (SNNs).

Regression

Fan Zheyi, Zhaohui Li, Qingpei Hu

Keywords: Robust regression, Hard thresholding, Bayesian reweighting, Variational inference

TL;DR: By combining the hard thresholding method and prior information, we propose two robust regression algorithms, TRIP and BRHT, which can effectively resist adaptive adversarial attacks.

Neural Tangent Kernel, NTK

Noel Loo, Ramin Hasani, Alexander Amini, Daniela Rus

Keywords: Neural Tangent Kernel, adversarial training

TL;DR: We empirically study the evolution of the NTK under adversarial training.

Polynomial Neural Networks

Elias Abad Rocamora, Mehmet Fatih Sahin, Fanghui Liu, Grigorios Chrysos, Volkan Cevher

Keywords: branch and bound, adversarial robustness, adversarial examples, certified robustness, polynomial network verification

TL;DR: We propose a branch and bound algorithm for polynomial network verification.

Mixture of Expert models, MoEs

Joan Puigcerver, Rodolphe Jenatton, Carlos Riquelme Ruiz, Pranjal Awasthi, Srinadh Bhojanapalli

Keywords: mixture of experts, moe, adversarial, robustness, Lipschitz constants

TL;DR: We analyze conditions under which Mixture of Expert models (MoEs) are more adversarially robust than dense models.

Graph Neural Networks, GNN

Yang Song, Qiyu Kang, Sijie Wang, Zhao Kai, Wee Peng Tay

Keywords: GNN, Robustness, Topology perturbation

TL;DR: This paper studies the robustness properties of graph neural partial differential equations and empirically demonstrates that graph neural PDEs are intrinsically more robust against topology perturbation compared to other graph neural networks.

Runlin Lei, Zhen WANG, Yaliang Li, Bolin Ding, Zhewei Wei

Keywords: Graph Neural Networks, Homophily, Robustness

TL;DR: A spectral GNN without odd-order terms generalizes better across graphs of different homophily. (This paper proposes a simple and effective idea of only using even-order neighbors to improve the robustness and generalization ability of spectral GNNs. It is based on the intuition that a friend's friend is a friend, and an enemy's enemy is also a friend; thus, only using even-order neighbors improves generalization across different homophily/heterophily levels.)

Yan Scholten, Jan Schuchardt, Simon Geisler, Aleksandar Bojchevski, Stephan Günnemann

Keywords: Robustness Certificates, Adversarial Robustness, Randomized Smoothing, Graph Neural Networks

TL;DR: Exploiting the message-passing principle of Graph Neural Networks to certify robustness against strong adversaries that can arbitrarily perturb all features of multiple nodes in the graph.

Reinforcement Learning, RL

Haonan Yu, Wei Xu, Haichao Zhang

Keywords: Safe RL, constrained Markov decision process, safety layer, first-order safety method, model-free RL

TL;DR: Learning a safety editor policy that transforms potentially unsafe actions proposed by a utility maximizer policy into safe ones, achieving extremely low constraint violation rates on 14 challenging safe RL tasks.

Aivar Sootla, Alexander Imani Cowen-Rivers, Jun Wang, Haitham Bou Ammar

Keywords: safe reinforcement learning, safety during training

TL;DR: We aim at improving safe exploration by augmenting safety information into the state-space and by developing ways to control the safety state

Yongyuan Liang, Yanchao Sun, Ruijie Zheng, Furong Huang

Keywords: Reinforcement Learning, Robustness, Worst-case Aware, Adversarial Learning

TL;DR: We propose a strong and efficient robust training framework for RL, WocaR-RL, that directly estimates and optimizes the worst-case reward of a policy under bounded attacks without requiring extra samples for learning an attacker.

Marc Rigter, Bruno Lacerda, Nick Hawes

Keywords: offline reinforcement learning, model-based reinforcement learning, deep reinforcement learning, robust reinforcement learning, adversarial learning, Adversarial training

TL;DR: Adversarial training of the dynamics model to prevent model exploitation in model-based offline RL.

Shubham Kumar Bharti, Xuezhou Zhang, Adish Singla, Jerry Zhu

Keywords: Adversarial Learning, Reinforcement Learning, Backdoor defense

TL;DR: We propose a provable defense mechanism against backdoor policies in reinforcement learning.


Analysis

Overparameterization

Sanjam Garg, Somesh Jha, Saeed Mahloujifar, Mohammad Mahmoody, Mingyuan Wang

Keywords: Large models, Robustness, Adversarial examples, Computational hardness, over-parameterization

TL;DR: We show that efficient (robust) learning could provably need more parameters than inefficient (robust) learning.

Binghui Li, Jikai Jin, Han Zhong, John E. Hopcroft, Liwei Wang

Keywords: deep learning theory, adversarial robustness, robust generalization gap, expressive power, over-parameterization

TL;DR: We provide a theoretical understanding of robust generalization gap from the perspective of expressive power for deep neural networks.

Zhenyu Zhu, Fanghui Liu, Grigorios Chrysos, Volkan Cevher

Keywords: over-parameterized model, robustness, perturbation stability, initialization scheme

TL;DR: We explore the interplay of the width, the depth and the initialization(s) on the average robustness of neural networks with new theoretical bounds in an effort to address the apparent contradiction in the literature.

Dynamic System

Xiyuan Li, Xin Zou, Weiwei Liu

Keywords: ordinary differential equation (ODE), Adversarial Defense

TL;DR: The paper studies the problem of enhancing neural network robustness from a dynamic system perspective. To this end, the authors propose a nonautonomous neural ordinary differential equation (ASODE) that makes clean instances asymptotically stable equilibrium points. In this way, asymptotic stability reduces the adversarial noise, bringing nearby adversarial examples close to the clean instance. Empirical studies show that the proposed method can defend against existing attacks and outperforms SOTA defense methods.

Counterfactual Explanations

Jing Ma, Ruocheng Guo, Saumitra Mishra, Aidong Zhang, Jundong Li

Keywords: Counterfactual explanations, graph, explainability

TL;DR: This paper proposes a model-agnostic framework for counterfactual explanations on graphs, facilitating the optimization, generalization, and causality in counterfactual explanation generation.

Others

Gal Vardi, Gilad Yehudai, Ohad Shamir

Keywords: implicit bias, deep learning theory, robustness, provably non-robust

TL;DR: We show that depth-2 neural networks trained under a natural setting are provably non-robust, even when robust networks on the same dataset exist. (The paper presents an interesting and novel analysis for an important problem of understanding why standard gradient-based training can lead to non-robust models, although robust models provably exist. The paper nicely points out that margin maximization in the parameter space is not aligned in general with the L2 robustness. Moreover, I think the paper is an interesting contribution towards the discussion of adversarial examples being “bugs vs. features” (https://arxiv.org/abs/1905.02175) where the authors of the current paper give theoretical evidence that adversarial examples can also arise as “bugs” from the particular optimization algorithm we use for training.)

Aniket Das, Bernhard Schölkopf, Michael Muehlebach

Keywords: Minimax Optimization, Smooth Games, Nonconvex-Nonconcave Minimax Optimization, Sampling without Replacement, Random Reshuffling, Shuffle Once, Incremental Gradient, Gradient Descent Ascent, Proximal Point Method, Alternating Gradient Descent Ascent

TL;DR: The paper shows the convergence of stochastic GDA with random reshuffling (RR), shuffle once (SO) and adversarial shuffling (AS) for strongly-convex-strongly-concave min-max problems. It also extends to the two-sided PL condition with alternating GDA.
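
As a toy illustration of the shuffling schemes studied here (not the paper's algorithm or its rates), the sketch below runs alternating gradient descent-ascent with Random Reshuffling on a small strongly-convex-strongly-concave quadratic; the components f_i, the step size, and the epoch count are invented for the example.

```python
# Toy illustration: alternating gradient descent-ascent with Random Reshuffling
# on min_x max_y sum_i f_i(x, y), where f_i(x, y) = 0.5*x^2 + a_i*x*y - 0.5*y^2
# is strongly-convex in x and strongly-concave in y. The saddle point is (0, 0).
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=8)                     # one coupling coefficient per component f_i
x, y, lr = 1.0, -1.0, 0.05

for epoch in range(200):
    order = rng.permutation(len(a))        # Random Reshuffling: new order each epoch
    for i in order:
        x -= lr * (x + a[i] * y)           # descent step on x using grad_x f_i
        y += lr * (a[i] * x - y)           # ascent step on y using the updated x (alternating)

print(f"final iterate: x={x:.4f}, y={y:.4f}")   # approaches the saddle point (0, 0)
```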

Laurent Meunier, Raphael Ettedgui, Rafael Pinot, Yann Chevaleyre, Jamal Atif

Keywords: adversarial, consistency, calibration

TL;DR: We study calibration and consistency of losses in the adversarial setting. Rating: 796

Omar Montasser, Steve Hanneke, Nathan Srebro

Keywords: adversarially robust PAC learning, sample complexity, theoretical analysis

TL;DR: We present a minimax optimal learner for the problem of learning predictors robust to adversarial examples at test-time. Rating: 987

Improving task performance with adversarial learning

Amrith Setlur, Benjamin Eysenbach, Virginia Smith, Sergey Levine

Keywords: supervised learning, overfitting, regularization, adversarial examples, spurious correlations, out-of-distribution

TL;DR: The method, which we call RCAD (Reducing Confidence along Adversarial Directions), aims to reduce confidence on out-of-distribution examples lying along directions adversarially chosen to increase training loss. (Training the model to make unconfident predictions on self-generated examples along the adversarial direction can improve generalization.)
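
A hedged sketch of the core RCAD recipe as described above (the exact loss, step size, and weighting used in the paper may differ): take one large step along the input-gradient direction and penalize confident predictions on the resulting off-manifold examples.

```python
# Sketch of the RCAD idea (illustrative, not the authors' exact recipe):
# generate examples far along the adversarial direction and penalize confident
# predictions on them by maximizing predictive entropy.
import torch
import torch.nn.functional as F

def rcad_loss(model, x, y, step_size=1.0, lam=0.1):
    # 1) standard cross-entropy on clean data
    ce = F.cross_entropy(model(x), y)

    # 2) one large step along the gradient of the loss w.r.t. the input
    x_adv = x.detach().clone().requires_grad_(True)
    loss_dir = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss_dir, x_adv)[0]
    x_far = (x_adv + step_size * grad.sign()).detach()    # self-generated off-manifold example

    # 3) entropy maximization (confidence reduction) on the generated example
    p = F.softmax(model(x_far), dim=1)
    entropy = -(p * p.clamp_min(1e-8).log()).sum(dim=1).mean()

    return ce - lam * entropy    # minimizing this raises entropy on x_far
```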

Zhun Zhong, Yuyang Zhao, Gim Hee Lee, Nicu Sebe

Keywords: Domain Generalization, Semantic Segmentation, Adversarial Style Augmentation

TL;DR: We propose a novel adversarial style augmentation approach for domain generalization in semantic segmentation, which is easy to implement and can effectively improve the model performance on unseen real domains.
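
The sketch below illustrates one plausible reading of adversarial style augmentation (it is not the authors' code): the per-channel mean and standard deviation of an image are treated as its style and updated by a gradient step that increases the segmentation loss. Where the statistics are computed and the step size are assumptions.

```python
# Hedged sketch of adversarial style augmentation: perturb only the style
# statistics (channel-wise mean/std) to make the task loss larger, then train
# on the re-styled image.
import torch
import torch.nn.functional as F

def adv_style_augment(model, images, labels, lr_style=0.1):
    # images: (N, C, H, W); labels: (N, H, W) segmentation targets
    mu = images.mean(dim=(2, 3), keepdim=True)
    sigma = images.std(dim=(2, 3), keepdim=True) + 1e-6
    content = (images - mu) / sigma                          # style-normalized content

    adv_mu = mu.clone().detach().requires_grad_(True)
    adv_sigma = sigma.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(content * adv_sigma + adv_mu), labels)
    g_mu, g_sigma = torch.autograd.grad(loss, [adv_mu, adv_sigma])

    # gradient ascent on the style statistics only
    adv_mu = (adv_mu + lr_style * g_mu.sign()).detach()
    adv_sigma = (adv_sigma + lr_style * g_sigma.sign()).detach()
    return (content * adv_sigma + adv_mu).detach()           # adversarially re-styled images
```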

Shangquan Sun, Wenqi Ren, Tao Wang, Xiaochun Cao

Keywords: Image Restoration, Object Detection, Image Dehazing, Low Light Enhancement, Targeted Adversarial Attack

TL;DR: We propose a training pipeline for image restoration models that generates pseudo ground truth with a targeted adversarial attack, so that a downstream detector predicts better on the recovered images.

Yang Li, Yichuan Mo, Liangliang Shi, Junchi Yan

Keywords: Generative Adversarial Networks, Adversarial Learning, Latent Space

TL;DR: This paper proposes to improve GANs in terms of generative quality and diversity by mining the latent space with adversarial learning (I-FGSM).
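
An illustrative sketch of latent-space mining with I-FGSM (not the authors' exact procedure): latent codes are iteratively perturbed to increase the generator's loss against the discriminator, and the resulting hard codes are fed back into the generator update. The loss choice and step sizes are assumptions.

```python
# Illustrative I-FGSM in latent space: find codes on which the generator
# currently fools the discriminator poorly, then train the generator on them.
import torch
import torch.nn.functional as F

def mine_latents(generator, discriminator, z, steps=5, alpha=0.05):
    z_adv = z.clone().detach()
    for _ in range(steps):
        z_adv.requires_grad_(True)
        d_out = discriminator(generator(z_adv))
        # non-saturating generator loss: large when D is confident the sample is fake
        g_loss = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
        grad = torch.autograd.grad(g_loss, z_adv)[0]
        z_adv = (z_adv + alpha * grad.sign()).detach()   # I-FGSM step toward harder codes
    return z_adv

# Usage idea: z_hard = mine_latents(G, D, torch.randn(64, latent_dim)); compute the
# generator loss on G(z_hard), optionally mixed with G(z), for the generator update.
```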

Lilly Kumari, Shengjie Wang, Tianyi Zhou, Jeff Bilmes

Keywords: continual learning, adversarial perturbations, class incremental learning, boundary samples, catastrophic forgetting

TL;DR: We develop an adversarial augmentation based method that combines new task samples with memory buffer samples for continual learning, and it can be applied with general continual learning methods such as ER and MIR to achieve improved performance. (The paper proposes a novel approach to tackle the classifier bias problem in continual learning. For this purpose, the approach (called RAR) focuses on the exemplars that are close to the forgetting border. RAR perturbs the selected exemplars towards the closest sample from the current task in the latent space. By replaying such samples, RAR refines the boundary between previous and current tasks, hence combating forgetting and reducing bias towards the current task. Moreover, the authors propose to combine RAR with the mix-up technique, which significantly improves continual learning in the small buffer regime. RAR is a generic approach and can be combined with any experience replay method.)
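
A hedged sketch of the boundary-refinement step described above (not the RAR implementation): a buffer exemplar is nudged in input space so that its latent representation moves toward the nearest current-task sample, and the perturbed exemplar is replayed with its original (old-task) label. The feature metric, step size, and perturbation budget are assumptions.

```python
# Sketch: perturb replay exemplars toward the current task in latent space.
# Assumes `feature_extractor` is an nn.Module returning flattened (N, D) features.
import torch

def perturb_exemplars(feature_extractor, x_buf, x_cur, step=2/255, budget=8/255, iters=3):
    with torch.no_grad():
        f_cur = feature_extractor(x_cur)                      # current-task features
    x_adv = x_buf.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        f_buf = feature_extractor(x_adv)
        idx = torch.cdist(f_buf, f_cur).argmin(dim=1)         # nearest current-task sample
        dist = (f_buf - f_cur[idx]).pow(2).sum(dim=1).mean()
        grad = torch.autograd.grad(dist, x_adv)[0]
        x_adv = x_adv - step * grad.sign()                    # move toward the current task
        x_adv = torch.max(torch.min(x_adv, x_buf + budget), x_buf - budget).detach()
    return x_adv   # replay (x_adv, original buffer labels) together with the current batch
```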


Evaluations

3D Simulation

Guillaume Leclerc, Hadi Salman, Andrew Ilyas, Sai Vemprala, Logan Engstrom, Vibhav Vineet, Kai Yuanqing Xiao, Pengchuan Zhang, Shibani Santurkar, Greg Yang, Ashish Kapoor, Aleksander Madry

Keywords: robustness, debugging, simulation, computer vision, deep learning

TL;DR: We introduce 3DB: an extendable, unified framework for testing and debugging vision models using photorealistic simulation.

Nataniel Ruiz, Sarah Adel Bargal, Cihang Xie, Kate Saenko, Stan Sclaroff

Keywords: convolutional neural networks, vision transformers, robustness, testing, simulation, synthetic data, out of distribution, generalization, domain shift

TL;DR: A framework to compare CNNs and Vision Transformers by answering counterfactual questions using realistic simulated scenes.

Robustness Evaluations

Maura Pintor, Luca Demetrio, Angelo Sotgiu, Ambra Demontis, Nicholas Carlini, Battista Biggio, Fabio Roli

Keywords: Debugging, machine learning, adversarial machine learning

TL;DR: Analysis of failures in the optimization of adversarial attacks, indicators that reveal when they happen, and a systematic framework to avoid them.

Dennis Wei, Rahul Nair, Amit Dhurandhar, Kush R. Varshney, Elizabeth M. Daly, Moninder Singh

Keywords: safety, interpretability, explainability, Evaluations

TL;DR: We assess the safety of a predictive model through its maximum deviation from a reference model and show how interpretability facilitates this safety assessment.

Yinpeng Dong, Shouwei Ruan, Hang Su, Caixin Kang, Xingxing Wei, Jun Zhu

Keywords: Visual Recognition, Robustness, Viewpoint Changes, OOD Generalization, Robustness Evaluations, realistic 3D models

TL;DR: A novel method to evaluate viewpoint robustness of visual recognition models in the physical world. (In this work, the authors constrain adversarial perturbations to the space of object and camera poses that lead to poor visual recognition performance. Importantly, the space considered is constrained to be physically plausible. They leverage recent advances in Neural Rendering (NeRF) to generate realistic 3D models of objects and optimize for non-canonical poses that lead to poor predictive performance. Finally, the authors propose a new benchmark for evaluating viewpoint robustness (ImageNet-V), which may be used to assess the general quality of any image recognition system.)
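
Purely as a conceptual sketch of worst-case viewpoint search (the paper optimizes poses through a NeRF renderer; here `render_view` is a hypothetical stub so the snippet runs end to end), random search over plausible poses already illustrates the evaluation protocol.

```python
# Conceptual sketch only: search over physically plausible viewpoints for the
# pose that most degrades the classifier on the true label. `render_view` is a
# stand-in for a NeRF/graphics renderer, not a real API.
import torch

def render_view(yaw, pitch, roll):
    # placeholder renderer: returns a dummy image tensor of ImageNet size
    return torch.rand(1, 3, 224, 224)

def worst_case_viewpoint(model, true_label, n_trials=200):
    worst_conf, worst_pose = 1.0, None
    for _ in range(n_trials):
        yaw, pitch, roll = (torch.rand(3) * torch.tensor([360.0, 90.0, 30.0])).tolist()
        with torch.no_grad():
            probs = model(render_view(yaw, pitch, roll)).softmax(dim=1)
        conf = probs[0, true_label].item()
        if conf < worst_conf:               # keep the pose with the lowest true-class confidence
            worst_conf, worst_pose = conf, (yaw, pitch, roll)
    return worst_pose, worst_conf
```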

Florian Wenzel, Andrea Dittadi, Peter Vincent Gehler, Carl-Johann Simon-Gabriel, Max Horn, Dominik Zietlow, David Kernert, Chris Russell, Thomas Brox, Bernt Schiele, Bernhard Schölkopf, Francesco Locatello

Keywords: Out-of-distribution generalization, robustness, distribution shifts, large-scale empirical study, Robustness Evaluations

TL;DR: We perform a large scale empirical study of out-of-distribution generalization. (This paper presents a large-scale empirical study of deep neural networks’ Out-Of-Distribution (OOD) generalization on visual recognition tasks. The main motivation is that previous studies for OOD generalization consider specific types of distributional shifts and are evaluated with specific datasets and architectures, and thus the conclusion drawn from one setting may not generalize to another. To overcome such limitations, the authors perform large-scale experiments under diverse types of OOD data on 172 ID/OOD datasets with multiple architectures. The experimental results show that some of the conclusions drawn from previous small-scale works do not hold in general and may depend on the target tasks, and that the factor that is consistently correlated with the OOD performance is the model’s ID performance.)

Roland S. Zimmermann, Wieland Brendel, Florian Tramer, Nicholas Carlini

Keywords: adversarial robustness, robustness, adversarial attack, Adversarial Robustness Evaluations

TL;DR: We propose a test that enables researchers to find flawed adversarial robustness evaluations. Passing our test produces compelling evidence that the attacks used have sufficient power to evaluate the model's robustness. (This paper proposes a binarization test to identify weak attacks against adversarial example defenses. The proposed test replaces the model's prediction layer with a binary classifier and fine-tunes it on a small dataset crafted for each benign example. As a result, the original attack, if sufficiently strong, should be able to find adversarial examples when applied to the modified model. This test serves as an active robustness check that complements existing passive tests for weak attacks.)
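
A highly simplified sketch of the binarization-test idea (the paper constructs the per-example dataset and the pass/fail criterion far more carefully): fit a fresh binary head on a tiny crafted dataset around a benign input and check whether the attack under test still finds an adversarial example for this deliberately easy model. The `attack_fn` interface, the sampling scheme, and the assumption that `feature_extractor` returns flattened features are all illustrative.

```python
# Simplified binarization-style check. Assumptions: `feature_extractor` is an
# nn.Module returning flattened (N, D) features; `attack_fn(model, x, target_label, eps)`
# is a hypothetical interface for the attack under test; x has shape (1, C, H, W).
import torch
import torch.nn as nn
import torch.nn.functional as F

def binarization_test(feature_extractor, attack_fn, x, eps=8/255, n_neg=32, n_pos=8):
    with torch.no_grad():
        # negatives: small random perturbations of x; positives: points on the eps-ball shell
        negs = x + 0.25 * eps * (2 * torch.rand_like(x.repeat(n_neg, 1, 1, 1)) - 1)
        poss = x + eps * torch.sign(torch.randn_like(x.repeat(n_pos, 1, 1, 1)))
        feats = feature_extractor(torch.cat([negs, poss]).clamp(0, 1))
    labels = torch.cat([torch.zeros(n_neg), torch.ones(n_pos)]).long()

    head = nn.Linear(feats.shape[1], 2)                   # fresh binary prediction layer
    opt = torch.optim.Adam(head.parameters(), lr=1e-2)
    for _ in range(200):                                  # fit the head on frozen features
        opt.zero_grad()
        F.cross_entropy(head(feats), labels).backward()
        opt.step()

    binarized = nn.Sequential(feature_extractor, head)
    x_adv = attack_fn(binarized, x, target_label=1, eps=eps)
    # a sufficiently strong attack should flip this deliberately vulnerable model
    return binarized(x_adv).argmax(dim=1).item() == 1
```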