Saturday, April 11, 2015

Nonlinear Dimensionality Reduction by Locally Linear Embedding

Sam T. Roweis and Lawrence K. Saul

Abstract

In this paper, the authors propose a method for dimensionality reduction. They begin by noting that practitioners in many fields routinely face large volumes of data which, without further processing, typically have hundreds if not thousands of dimensions, making them hard to work with efficiently. For this reason, the importance of dimensionality reduction can hardly be overstated. During reduction, however, similarities, distances, or other characteristics of the data are apt to be lost or wrongly preserved. The method proposed in this paper therefore uses overlapping local neighborhoods to preserve, in the lower-dimensional space, the structure present in the original one.

First, several common dimensionality-reduction methods are discussed. The authors point out that many of them rely simply on Euclidean distance, which can produce misleading results when the data lie on a curved manifold. Some methods, such as one that uses shortest-path distances, avoid this problem to some extent, but at the cost of the time-consuming dynamic programming required to compute those distances.
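To make the contrast concrete, here is a minimal sketch, assuming points sampled on a unit semicircle and a k-nearest-neighbor graph searched with Dijkstra's algorithm. The helper names (`semicircle`, `knn_graph`, `shortest_path_length`) and the choice of Dijkstra are illustrative assumptions, not the procedure from the paper.

```python
import heapq
import numpy as np

def semicircle(n):
    """Sample n evenly spaced points on the upper half of the unit circle."""
    theta = np.linspace(0.0, np.pi, n)
    return np.column_stack([np.cos(theta), np.sin(theta)])

def knn_graph(X, k):
    """Undirected graph linking each point to its k nearest neighbors."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    graph = {i: {} for i in range(len(X))}
    for i in range(len(X)):
        # argsort position 0 is the point itself (distance 0), so skip it
        for j in np.argsort(d[i])[1:k + 1]:
            graph[i][int(j)] = float(d[i, j])
            graph[int(j)][i] = float(d[i, j])
    return graph

def shortest_path_length(graph, source, target):
    """Dijkstra's algorithm; returns the length of the shortest path."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        du, u = heapq.heappop(heap)
        if u == target:
            return du
        if du > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            alt = du + w
            if alt < dist.get(v, float("inf")):
                dist[v] = alt
                heapq.heappush(heap, (alt, v))
    return float("inf")

X = semicircle(100)
euclid = float(np.linalg.norm(X[0] - X[-1]))              # straight line through the "hole"
geodesic = shortest_path_length(knn_graph(X, 2), 0, 99)   # path that follows the curve
```

The two endpoints are 2 apart in a straight line, while the path along the curve is close to π, so a method relying on Euclidean distance alone would treat them as nearer than they are on the manifold.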

The authors therefore propose a method that uses local information to capture global structure. Each data point is represented by its K nearest neighbors, each with a weight that controls how much that neighbor contributes when reconstructing the point; in this way the relationships between points and their neighbors are preserved after dimensionality reduction. Moreover, by using overlapping local information, the method not only avoids the rather heavy computation of shortest-path distances but also models the nonlinear structure of the whole distribution through these locally linear representations.
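The procedure described above can be sketched in a few lines of NumPy. This is a minimal sketch rather than the authors' implementation: the function name `lle`, the brute-force neighbor search, the regularization constant `reg`, and the dense eigendecomposition are all assumptions made for brevity. It finds weights that minimize the error of reconstructing each point from its K neighbors (with each row of weights summing to one), then finds low-dimensional coordinates that are best reconstructed by those same weights.

```python
import numpy as np

def lle(X, n_neighbors=10, n_components=2, reg=1e-3):
    """Locally linear embedding sketch. Returns (Y, W):
    Y is the (n, n_components) embedding, W the (n, n) weight matrix."""
    n = X.shape[0]

    # Step 1: K nearest neighbors by brute-force squared Euclidean distance.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)            # a point is not its own neighbor
    nbrs = np.argsort(d2, axis=1)[:, :n_neighbors]

    # Step 2: reconstruction weights, one small linear system per point.
    W = np.zeros((n, n))
    for i in range(n):
        Z = X[nbrs[i]] - X[i]               # neighbors centered on x_i
        C = Z @ Z.T                         # local Gram matrix, K x K
        # Assumed regularizer: keeps C invertible when K exceeds the input dimension.
        C += np.eye(n_neighbors) * (reg * np.trace(C) + 1e-12)
        w = np.linalg.solve(C, np.ones(n_neighbors))
        W[i, nbrs[i]] = w / w.sum()         # enforce sum-to-one constraint

    # Step 3: embedding from the bottom eigenvectors of (I - W)^T (I - W).
    I_minus_W = np.eye(n) - W
    M = I_minus_W.T @ I_minus_W
    _, vecs = np.linalg.eigh(M)             # eigenvalues in ascending order
    # Skip the first eigenvector (eigenvalue near zero, constant for a connected graph).
    Y = vecs[:, 1:n_components + 1]
    return Y, W
```

For example, calling `lle(X, n_neighbors=10, n_components=2)` on an `(n, 3)` array of points sampled from a curved surface returns an `(n, 2)` embedding. Only small K-by-K systems and one eigenproblem are solved, and no pairwise shortest paths are needed.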

Contributions

  1. Proposing a method that preserves meaningful information along each axis of the reduced space, and that is both efficient and effective.
  2. Combined with other feature-processing and learning schemes, the method can yield even better performance.

