Saturday, March 14, 2015

Fine-Grained Crowdsourcing for Fine-Grained Recognition

Jia Deng, Jonathan Krause, Li Fei-Fei 
Computer Science Department, Stanford University 


Summary

This paper presents a game that collects the features people rely on most when telling two bird species apart, and an algorithm that uses the collected information (here, "bubbles") to improve classification accuracy.

During the game, players are shown two clear example pictures, one for each species, and a blurred image that they must classify as one of the two species. Players can uncover parts of the blurred picture to help them decide. However, each reveal lowers the total score, so players reveal only the truly essential parts, and those reveals yield the most informative "bubbles".
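The scoring incentive described above can be sketched roughly as follows. This is only an illustration: the class names, the starting score of 100, the cost of 10 per bubble, and the rule that a wrong guess scores zero and contributes no bubbles are all my own assumptions, not values taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Bubble:
    # Center and radius of one revealed circle (hypothetical units: pixels).
    x: int
    y: int
    radius: int


@dataclass
class GameRound:
    true_label: str
    start_score: int = 100   # assumed value, not from the paper
    bubble_cost: int = 10    # assumed value, not from the paper
    bubbles: list = field(default_factory=list)

    def reveal(self, x, y, radius):
        """Uncover one circular region of the blurred image."""
        self.bubbles.append(Bubble(x, y, radius))

    def finish(self, guess):
        """Return (score, bubbles kept as annotations).

        Assumption: a wrong guess earns nothing and its bubbles are dropped.
        """
        if guess != self.true_label:
            return 0, []
        score = max(0, self.start_score - self.bubble_cost * len(self.bubbles))
        return score, list(self.bubbles)
```

Because every call to `reveal` lowers the final score, a player who wants a high score has a reason to click only on the regions that actually separate the two species.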

Given the bubbles collected from all the games played (the "training phase"), which mark the most discriminative features between pairs of species, the authors build a detector for each bubble, represented by one or more descriptors, and apply these detectors to the test images to classify them. They assume a spatial prior when applying the detectors: because a given feature tends to appear in roughly the same region across pictures of the same species, the detector only needs to be applied to that area.
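To make the detector step more concrete for myself, here is a minimal sketch of how I read it: slide a bubble's template over patches near the location where the bubble was drawn, score each patch, and keep the maximum response. Raw pixel values and a negative sum of squared differences stand in for the paper's actual descriptors, and the function names, `search_radius`, and `stride` are hypothetical.

```python
def patch_response(image, template, top, left):
    """Negative sum of squared differences; higher means a closer match."""
    total = 0.0
    for dy, row in enumerate(template):
        for dx, t in enumerate(row):
            diff = image[top + dy][left + dx] - t
            total += diff * diff
    return -total


def max_response(image, template, center, search_radius, stride=1):
    """Score patches whose top-left corner lies near `center`; keep the best.

    `center` is the (row, col) where the bubble is expected (the spatial
    prior). Returns (score, (row, col)), or None if the window is empty.
    """
    th, tw = len(template), len(template[0])
    ih, iw = len(image), len(image[0])
    cy, cx = center
    # Clamp the search window so every patch stays inside the image.
    top_min, top_max = max(0, cy - search_radius), min(ih - th, cy + search_radius)
    left_min, left_max = max(0, cx - search_radius), min(iw - tw, cx + search_radius)
    best = None
    for top in range(top_min, top_max + 1, stride):
        for left in range(left_min, left_max + 1, stride):
            score = patch_response(image, template, top, left)
            if best is None or score > best[0]:
                best = (score, (top, left))
    return best
```

Under this reading, each bubble contributes a single number per test image (its best match score within the window), and those numbers could then serve as features for a classifier.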

Contributions and Novelties
  1. The paper presents an interactive game that not only gathers the features that matter when telling two species apart, but is also entertaining to play.
  2. The game is domain agnostic: it can be applied to many different fine-grained classification problems, and the quality of the results is backed by the mechanism and design of the game itself.
Confusions
  1. Why is it OK to assume the spatial prior? That is, why can we be sure that the bubbles will appear in roughly the same location in different pictures, even of the same species?
  2. What does it mean to convolve the descriptor with densely sampled patches and take the maximum response? Does it mean extracting many patches from the test image, applying the detector to each of them, and keeping the maximum score?
