Monday, April 13, 2015

To aggregate or not to aggregate: Selective match kernels for image search

Giorgos Tolias, Yannis Avrithis, Hervé Jégou

  Abstract

This paper focuses on improving the accuracy of visual recognition, whether the target is an object, a location, or a scene. The prevailing approach to this problem relies on local descriptors, and most such methods derive from bag-of-words (BoW): the extracted descriptors are quantized into a discrete set of visual words, so that each image is represented as a vector. That vector can then be combined with machine learning techniques such as SVM to achieve fair results.
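
To make the quantization step concrete, here is a minimal sketch in Python with NumPy. It is an illustration of plain BoW, not the authors' code: the names `bow_histogram`, `descriptors`, and `codebook` are made up for this example, and the codebook is assumed to have been learned beforehand (for instance with k-means).

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Quantize local descriptors to their nearest visual word and
    return (word ids, L2-normalized word histogram).

    descriptors: (n, d) array of local descriptors (hypothetical input)
    codebook:    (k, d) array of visual words, assumed precomputed
    """
    # Squared Euclidean distance from every descriptor to every visual word.
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    # Hard assignment: each descriptor goes to its single nearest word.
    words = d2.argmin(axis=1)
    # Count how many descriptors fell on each word.
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    norm = np.linalg.norm(hist)
    if norm > 0:
        hist = hist / norm
    return words, hist

# Toy data: two visual words in 2-D, three descriptors.
codebook = np.array([[0.0, 0.0], [10.0, 10.0]])
descriptors = np.array([[1.0, 0.0], [0.0, 1.0], [9.0, 10.0]])
words, hist = bow_histogram(descriptors, codebook)
print(words, hist)
```

The resulting histogram is the per-image vector that could be fed to a classifier or compared between images.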

These methods can, however, be divided into two families: matching-based approaches and aggregated approaches. The authors analyze both and propose a new method that combines the merits of the two, reporting performance higher than that of state-of-the-art approaches.

The matching-based methods, in some sense, use the descriptors independently, whereas the aggregated approaches, as the name indicates, combine several descriptors into one representation. If properly configured, aggregation not only preserves some of the individual information but also has the advantage of using less memory.
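
The difference can be sketched for a single visual word as follows. This is a simplified illustration under my own assumptions, not the paper's exact formulation: each descriptor is represented by its residual (descriptor minus the visual word's centroid), similarities are plain dot products of L2-normalized vectors, and the function names are hypothetical.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit length (eps guards against division by zero)."""
    return v / max(np.linalg.norm(v), eps)

def matching_score(res_x, res_y):
    """Matching-based: every residual of image X is compared with every
    residual of image Y, and the pairwise similarities are summed."""
    xn = np.array([l2_normalize(r) for r in res_x])
    yn = np.array([l2_normalize(r) for r in res_y])
    return float((xn @ yn.T).sum())

def aggregated_score(res_x, res_y):
    """Aggregated: residuals are summed per image first, so each image
    stores one vector for this visual word and only one comparison is made."""
    vx = l2_normalize(res_x.sum(axis=0))
    vy = l2_normalize(res_y.sum(axis=0))
    return float(vx @ vy)

# Toy residuals for one visual word: X has two identical descriptors, Y has one.
res_x = np.array([[1.0, 0.0], [1.0, 0.0]])
res_y = np.array([[1.0, 0.0]])
print(matching_score(res_x, res_y), aggregated_score(res_x, res_y))
```

In the matching version, memory grows with the number of descriptors per word, while the aggregated version keeps a single vector per word and image, which is where the memory saving comes from.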

In the experiments, by varying the parameters, the authors show that it is worth combining matching-based and aggregated approaches to obtain superior results.

Contribution
  1. Analyzes the differences between these two seemingly different approaches.
  2. Combines the two seamlessly, proposing a method that is not only efficient in terms of memory consumption but also yields better results.

Questions
  1. What does it mean to have multiple assignments? And what does it mean to "replicate each descriptor vector and assign each instance to a different visual word"?
  2. What do selectivity and the threshold mean? Are they parameters that dictate how local descriptors are chosen or rendered?
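
One possible reading of both questions, sketched in Python. This is my interpretation and has not been checked against the authors' implementation. Multiple assignment is taken to mean that a descriptor is assigned to its k nearest visual words instead of only the nearest one, so the same descriptor shows up under several words. The selectivity function is taken to be a function applied to each similarity score u, returning sign(u)·|u|^α when u exceeds the threshold τ and 0 otherwise, so α and τ would control how much each match contributes rather than which descriptors are extracted. The function names and the default values of `k`, `alpha`, and `tau` are placeholders chosen for this example.

```python
import numpy as np

def multiple_assignment(descriptor, codebook, k=3):
    """Return the indices of the k nearest visual words for one descriptor.
    With k=1 this reduces to ordinary hard assignment."""
    d2 = ((codebook - descriptor) ** 2).sum(axis=1)
    return np.argsort(d2)[:k]

def selectivity(u, alpha=3.0, tau=0.0):
    """Assumed selectivity function: keep similarities above tau and
    raise them to the power alpha, set everything else to zero."""
    u = np.asarray(u, dtype=float)
    return np.where(u > tau, np.sign(u) * np.abs(u) ** alpha, 0.0)

# Toy codebook of four visual words in 2-D and one descriptor.
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [0.0, 2.0]])
descriptor = np.array([0.9, 0.0])
print(multiple_assignment(descriptor, codebook, k=2))

# Strong matches are kept, weak ones are down-weighted, negatives are dropped.
print(selectivity([1.0, 0.5, -0.5, 0.0]))
```

Under this reading, a larger α makes the kernel more selective by shrinking mediocre similarities, and a larger τ discards more of them outright.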
