[Paper Read] Learning From Noisy Large-Scale Datasets With Minimal Supervision

tags: course AMMAI CVPR_17

Paper
Andreas Veit, Neil Alldrin, Gal Chechik, Ivan Krasin, Abhinav Gupta, Serge Belongie

Problem Definition

To leverage a small set of clean labels in the presence of a massive dataset with noisy labels.

  • Problem: large-scale training data with noisy annotations
  • Dataset: Open Images (~9 million images), multi-labeled images with over 6,000 unique classes
  • Application: object classification
  • Assumption (limitation): data with a large number of classes and a wide range of annotation noise

Contribution

  • Introduces a semi-supervised learning framework that produces both a cleaned version of the dataset and a robust multi-label image classifier, by combining a small set of clean annotations with a large amount of noisy data.

  • Outperforms direct fine-tuning approaches across all major categories in the Open Images dataset.

  • Improves performance across the full range of label noise levels (even when rated data are limited), and is most effective for classes with 20% to 80% false-positive annotations.

Method

[Figure: high-level overview of the method]

[Figure: network architecture]


Use the small clean dataset to learn a mapping between noisy and clean annotations.

  • There are two supervised networks in this model.

  • The first is the “label cleaning network”, which takes as input the set of noisy labels and image features extracted by a CNN (Inception V3), and outputs a cleaned label set that supervises the second network.

  • The label cleaning network is a residual model that learns the difference between clean and noisy labels. It uses an identity skip-connection structure (inspired by ResNet V1).

  • The second is the “multi-label classifier”, which takes the labels predicted by the first network as ground truth when an image does not have a clean label; a minimal sketch of this two-network setup follows below.
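
To make this concrete, here is a minimal PyTorch sketch of the two networks and their supervision. This is my own reconstruction, not the authors' code: the shared Inception V3 feature extractor is omitted, and the layer sizes, the clamp, and the unweighted loss sum are illustrative assumptions.

```python
# Minimal sketch of the two-network setup (assumed PyTorch implementation).
# Layer sizes, the clamp, and the loss weighting are illustrative
# assumptions, not the authors' exact implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelCleaningNetwork(nn.Module):
    """Residual model: cleaned labels = noisy labels + learned correction,
    conditioned on image features (identity skip connection)."""
    def __init__(self, num_classes, feature_dim, hidden_dim=512):
        super().__init__()
        self.correction = nn.Sequential(
            nn.Linear(num_classes + feature_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, noisy_labels, image_features):
        # Thanks to the skip connection, the network only has to learn
        # the *difference* between noisy and clean labels.
        delta = self.correction(torch.cat([noisy_labels, image_features], dim=1))
        return torch.clamp(noisy_labels + delta, 0.0, 1.0)

class MultiLabelClassifier(nn.Module):
    """Predicts the label set directly from image features."""
    def __init__(self, num_classes, feature_dim):
        super().__init__()
        self.head = nn.Linear(feature_dim, num_classes)

    def forward(self, image_features):
        return self.head(image_features)  # logits

def joint_loss(cleaner, classifier, noisy_labels, features, verified_labels=None):
    """Loss for one batch. `verified_labels` is None for images that
    only have noisy annotations."""
    cleaned = cleaner(noisy_labels, features)
    logits = classifier(features)
    if verified_labels is not None:
        # Clean subset: supervise the cleaner with the human-verified
        # labels (the paper uses an absolute-difference cleaning loss)
        # and use them as the classifier target as well.
        clean_loss = (cleaned - verified_labels).abs().mean()
        target = verified_labels
    else:
        # Noisy-only images: the cleaned labels act as ground truth
        # for the classifier; no cleaning loss is available.
        clean_loss = 0.0
        target = cleaned.detach()
    cls_loss = F.binary_cross_entropy_with_logits(logits, target)
    return cls_loss + clean_loss
```

Because of the identity skip connection, the cleaning network defaults to passing the noisy labels through unchanged, so it only needs to learn corrections where the verified subset shows the annotations are wrong.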

Result