Papers on instance-based interpretability (e.g., influence functions and prototypes), measures computed from training dynamics, and memorization and forgetting.
Basu et al. Influence Functions in Deep Learning Are Fragile. In ICLR 2021.
Chen et al. HyDRA: Hypergradient Data Relevance Analysis for Interpreting Deep Neural Networks. In AAAI 2021.
D'souza et al. A Tale Of Two Long Tails. In UDL-ICML Workshop 2021.
Hanawa et al. Evaluation of Similarity-based Explanations. In ICLR 2021.
Harutyunyan et al. Estimating informativeness of samples with Smooth Unique Information. In ICLR 2021.
Jiang et al. Characterizing Structural Regularities of Labeled Data in Overparameterized Models. In ICML 2021.
Kong and Chaudhuri. Understanding Instance-based Interpretability of Variational Auto-Encoders. In NeurIPS 2021.
Paul et al. Deep Learning on a Data Diet: Finding Important Examples Early in Training. In NeurIPS 2021.
Sui et al. Representer Point Selection via Local Jacobian Expansion for Post-hoc Classifier Explanation of Deep Neural Networks and Ensemble Models. In NeurIPS 2021.
Terashita et al. Influence Estimation for Generative Adversarial Networks. In ICLR 2021.
Zhang et al. On Sample Based Explanation Methods for NLP: Faithfulness, Efficiency and Semantic Evaluation. In ACL 2021.
Barshan et al. RelatIF: Identifying Explanatory Training Samples via Relative Influence. In AISTATS 2020.
Basu et al. On Second-Order Group Influence Functions for Black-Box Predictions. In ICML 2020.
Brophy and Lowd. TREX: Tree-Ensemble Representer-Point Explanations. In XXAI-ICML Workshop 2020.
Chen et al. Multi-Stage Influence Function. In NeurIPS 2020.
Feldman and Zhang. What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation. In NeurIPS 2020.
Feldman. Does Learning Require Memorization? A Short Tale about a Long Tail. In STOC 2020.
Agarwal et al. Estimating Example Difficulty Using Variance of Gradients. In WHI-ICML Workshop 2020.
Jacovi and Goldberg. Towards Faithfully Interpretable NLP Systems: How Should We Define and Evaluate Faithfulness? In ACL 2020.
Pleiss et al. Identifying Mislabeled Data using the Area Under the Margin Ranking. In NeurIPS 2020.
Pruthi et al. Estimating Training Data Influence by Tracing Gradient Descent. In NeurIPS 2020.
Swayamdipta et al. Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics. In EMNLP 2020.
Yoon et al. Data Valuation using Reinforcement Learning. In ICML 2020.
Brunet et al. Understanding the Origins of Bias in Word Embeddings. In ICML 2019.
Charpiat et al. Input Similarity from the Neural Network Perspective. In NeurIPS 2019.
Chen et al. This Looks Like That: Deep Learning for Interpretable Image Recognition. In NeurIPS 2019.
Ghorbani and Zou. Data Shapley: Equitable Valuation of Data for Machine Learning. In ICML 2019.
Hara et al. Data Cleansing for Models Trained with SGD. In NeurIPS 2019.
Jia et al. Towards Efficient Data Valuation Based on the Shapley Value. In AISTATS 2019.
Khanna et al. Interpreting Black Box Predictions using Fisher Kernels. In AISTATS 2019.
Koh et al. On the Accuracy of Influence Functions for Measuring Group Effects. In NeurIPS 2019.
Sharchilev et al. Finding Influential Training Samples for Gradient Boosted Decision Trees. In ICML 2018.
Toneva et al. An Empirical Study of Example Forgetting during Deep Neural Network Learning. In ICLR 2019.
Yeh et al. Representer Point Selection for Explaining Deep Neural Networks. In NeurIPS 2018.
Koh and Liang. Understanding Black-box Predictions via Influence Functions. In ICML 2017.
Cook and Weisberg. Characterizations of an Empirical Influence Function for Detecting Influential Cases in Regression. In Technometrics 1980.
Kim et al. Examples are not enough, learn to criticize! Criticism for Interpretability. In NeurIPS 2016.