In machine learning, a question of great interest is understanding which examples are challenging for a model to classify. Identifying challenging examples helps inform safe deployment of models, isolates examples that require further human inspection, and provides interpretability into model behavior. We start with a simple hypothesis: examples that a model has difficulty learning will exhibit higher variance in gradient updates over the course of training. Conversely, we expect the backpropagated gradients of samples that are relatively easy to learn to have lower variance.

In this work, we propose Variance of Gradients (VOG) as a valuable and efficient proxy metric for detecting outliers in the data distribution. We provide quantitative and qualitative support that VOG is a meaningful way to rank data by difficulty and to surface a tractable subset of the most challenging examples for human-in-the-loop auditing. Data points with high VOG scores are far more difficult for the model to learn and over-index on corrupted or memorized examples.

VOG offers an efficient method to rank the global difficulty of examples and automatically surface a possible subset to aid human interpretability. VOG can be computed using checkpoints stored over the course of training and is model agnostic. Alternatively, VOG can be computed using the predicted label, which makes it an unsupervised auditing tool at test time.
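The checkpoint-based computation above can be sketched as follows. This is a minimal illustration, not the paper's reference implementation: it assumes the per-checkpoint input gradients have already been extracted (here they are synthetic random arrays), and the helper names `vog_scores` and `class_normalize` are hypothetical.

```python
import numpy as np

def vog_scores(grads):
    """grads: array of shape (K, N, H, W) -- K checkpoints, N examples.

    Returns one raw VOG score per example: the per-pixel standard deviation
    of the gradient across checkpoints, averaged over pixels.
    """
    mu = grads.mean(axis=0)                  # (N, H, W): mean gradient over checkpoints
    var = ((grads - mu) ** 2).mean(axis=0)   # per-pixel variance over checkpoints
    return np.sqrt(var).mean(axis=(1, 2))    # average per-pixel std -> shape (N,)

def class_normalize(scores, labels):
    """Normalize raw scores within each class (zero mean, unit variance),
    so examples are ranked relative to others of the same label."""
    out = np.empty_like(scores)
    for c in np.unique(labels):
        mask = labels == c
        out[mask] = (scores[mask] - scores[mask].mean()) / (scores[mask].std() + 1e-12)
    return out

# Toy usage: random "gradients" for 4 checkpoints, 6 examples, 8x8 inputs.
# In practice each slice of `grads` would be the gradient of the pre-softmax
# output for the true (or, at test time, predicted) class w.r.t. the input.
rng = np.random.default_rng(0)
grads = rng.normal(size=(4, 6, 8, 8))
labels = np.array([0, 0, 0, 1, 1, 1])
raw = vog_scores(grads)
normalized = class_normalize(raw, labels)
```

Using the predicted rather than the true label in the gradient computation is what makes the same procedure applicable as an unsupervised audit at test time.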

The primary contributions of our work can be summarized as follows:

1. We propose Variance of Gradients (VOG) – a class-normalized gradient-variance score for estimating the relative ease of learning data samples within a given class.

2. We show that VOG is an effective auditing tool for ranking high-dimensional datasets by difficulty.

3. VOG identifies clusters of images with clearly distinct semantic properties.

4. VOG effectively surfaces out-of-distribution (OOD) and memorized examples.

VOG can be an effective tool to audit high-dimensional datasets. Below, we plot images from late-stage training for randomly selected classes from CIFAR-10 and CIFAR-100. We observe consistent differences between low-VOG and high-VOG samples across both datasets.

Pre-computed VOG scores for MNIST/CIFAR-10/CIFAR-100 are available here.

We welcome additional discussion and code contributions on the topic of this work. A comprehensive introduction to the methodology, experimental framework, and results can be found in our paper and open-source code.

If you use this software, please consider citing:

@article{agarwal2020estimating,
  title={Estimating Example Difficulty using Variance of Gradients},
  author={Agarwal, Chirag and D'souza, Daniel and Hooker, Sara},
  journal={arXiv preprint arXiv:2008.11600},
  year={2020}
}