2015年3月24日 星期二

Iterative Quantization: A Procrustean Approach to Learning Binary Codes for Large-scale Image Retrieval

Yunchao Gong, Svetlana Lazebnik, Albert Gordo, Florent Perronnin 

Abstract

This paper proposes a method to learn similarity-preserving binary codes which is largely used in similar images search in large-scale dataset. When applying binary encoding on images, one aims to minimizes the error or quantization error between the transformed and the original point representing the image. This paper presents a method through which the transformed high-dimensional data points and the axises are iteratively "rotated" to achieve the optimum binary representation. 

With no big difference to commonly applied methods, data points are first applied upon PCA, which serves to reduce the dimensionality of the data points. After PCA operation, initial axises are obtained, which quantized each data point into a binary code. Then, an novel algorithm called "Iterative Quantization" are applied on the incipient binary codes to minimize the error, or distance to the original point. 

Conceptually, what ITQ does is to try find a transformation matrix that will transform the binary codes to achieve the minimum error given the axises are fixed. After acquiring the new points, the next step comes into play, in which those points are fixed while axises are shifted to find, again, the optimum configuration. By iteratively conducting such operation, according to the paper, for about 50 rounds, the performance will be close to that in the convergent state, which is shown to take, though, much time compared to other methods, still applicable in real settings. 

Novelties and Discoveries
  1. These methods performs well especially when the reduced dimensionality is small, that is, when number of bits used to represent the original data points are rather small. 
  2. It could be combined with supervised method such as Canonical Correlation Analysis(CCA) to achieve better performance if the ground-truth labels are provided.
Questions
  1. Say, the target is to maximize the variances of each bits(According to the paper, exactly half of the data points are on one side while the others the other side), then how come Figure 3. that of ITQ is the lowest compared to RR and even PCA?

沒有留言:

張貼留言