Santosh K. Divvala∗,†, Ali Farhadi†, Carlos Guestrin
University of Washington, The Allen Institute for AI
Abstract
Recognition has matured enough to be used in real-world applications. However, how scalable and exhaustive can it be in covering all the aspects of a single concept? And how far can human involvement be reduced? The authors propose a weakly supervised learning system that harnesses web data.
To learn a model of a concept, the visual space of its variations must first be retrieved; a model is then trained to handle the intra-concept variance. To retrieve the visual space, the authors use Google Books Ngrams to obtain the possible variations of a concept, say, 'horse'. To handle the unavoidable noise in the data, a weak classifier is trained for each variation, with the intuition that a meaningful aspect possesses some visual saliency that a model can recognize. A model trained on a noisy variation will therefore score relatively low and be filtered out.
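The filtering step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each weak classifier has already been scored on some held-out images, that the score used is average precision (AP), and that aspects below a cutoff are dropped. The function names, the `min_ap` value, and the toy data are all hypothetical.

```python
def average_precision(scores, labels):
    """AP of a ranked list: mean of the precision measured at each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, precisions = 0, []
    for rank, (_, is_positive) in enumerate(ranked, start=1):
        if is_positive:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def prune_aspects(validation, min_ap=0.5):
    """Keep aspects whose weak classifier reaches at least min_ap.

    validation maps an aspect name to (scores, labels) produced by that
    aspect's weak classifier on held-out images (an assumed setup).
    """
    return {
        aspect
        for aspect, (scores, labels) in validation.items()
        if average_precision(scores, labels) >= min_ap
    }


# Toy data (made up): one visually consistent aspect, one noisy one.
validation = {
    "jumping horse": ([0.9, 0.8, 0.1], [1, 0, 1]),
    "last horse": ([0.2, 0.9, 0.8, 0.7], [1, 0, 0, 0]),
}
kept = prune_aspects(validation)
```

In this toy case the classifier for "jumping horse" ranks its positives near the top and survives, while the one for "last horse" ranks its only positive last and is filtered out.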
Among the remaining aspects, some are visually similar, so training a separate model for each would be wasteful. The authors therefore construct a graph in which each node represents an aspect and each edge encodes the similarity between the aspects it links. The edge weight Eij is the AP obtained when the weak model trained on aspect j is used to classify aspect i. This procedure yields several superngrams, and a strong model is trained for each of them.
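One simple way to turn such a graph into groups is sketched below. The grouping rule here is a stand-in, not necessarily the procedure used in the paper: it assumes two aspects are merged when the cross-classification AP is high in both directions, and it takes connected components of the resulting thresholded graph. The `min_sim` cutoff and the weights are made up for illustration.

```python
def group_aspects(aspects, E, min_sim=0.6):
    """Group aspects via connected components of a thresholded graph.

    E[(i, j)] is the AP of the weak model trained on j when classifying i.
    Two aspects are linked only if both directions reach min_sim
    (an assumed symmetrization, chosen for this sketch).
    """
    parent = {a: a for a in aspects}

    def find(a):
        # Union-find root lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i in aspects:
        for j in aspects:
            if i < j:  # visit each unordered pair once
                sim = min(E.get((i, j), 0.0), E.get((j, i), 0.0))
                if sim >= min_sim:
                    parent[find(i)] = find(j)

    groups = {}
    for a in aspects:
        groups.setdefault(find(a), []).append(a)
    return sorted(sorted(g) for g in groups.values())


# Toy weights (made up): two near-synonymous aspects and one distinct aspect.
aspects = ["jumping horse", "horse jumping", "horse head"]
E = {
    ("jumping horse", "horse jumping"): 0.8,
    ("horse jumping", "jumping horse"): 0.7,
    ("jumping horse", "horse head"): 0.2,
    ("horse head", "jumping horse"): 0.3,
}
groups = group_aspects(aspects, E)
```

Each resulting group plays the role of a superngram: its images would be pooled to train a single stronger model.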
Contributions
- Proposes a system that can learn every aspect of any given concept with almost no human involvement. To date, models for 50,000 variations of 150 concepts are available.
- Performance is close to that of fully supervised methods.
- Could be extended to help with major NLP problems such as coreference resolution, where two textual mentions actually refer to the same entity.


