Saturday, May 30, 2015


ImageNet Classification with Deep Convolutional Neural Networks 

Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton 

Abstract
This paper documents the origin of a topic that has recently become extremely popular: by training a large-scale neural network with the help of fast-growing GPU computing power, Alex Krizhevsky achieved an error rate far below that of the best previous models in ILSVRC (a top-5 test error of 15.3% in ILSVRC-2012, against 26.2% for the second-best entry).

The authors then go through the important features of this net. The first is ReLU, which, unlike traditional activation functions such as tanh, is non-saturating: its gradient does not shrink toward zero for large positive inputs, so training runs several times faster. The second is local response normalization, inspired by the lateral inhibition observed in real neurons: each activation is divided by a term that depends on the activations of neighboring kernels at the same spatial position, which aids generalization and lowers the error rate by about 1%. The third is overlapping pooling: instead of tiling the pooling windows without overlap, the authors move a window of size z by a stride s smaller than z, and they observe that this makes the model slightly less prone to overfitting.
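To make these three ideas concrete, here is a minimal pure-Python sketch. The function names are made up for these notes, the pooling is shown in 1-D for brevity, and the default constants (k, n, alpha, beta for the normalization; z = 3 and s = 2 for pooling) follow the values given in the paper. It is an illustration, not the authors' GPU implementation.

```python
import math

def relu_grad(x):
    # Gradient of max(0, x): stays at 1 for any positive input, however large.
    return 1.0 if x > 0 else 0.0

def tanh_grad(x):
    # Gradient of tanh: shrinks toward 0 as |x| grows, i.e. the unit saturates.
    return 1.0 - math.tanh(x) ** 2

def local_response_norm(acts, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Normalize the activations of all kernels at ONE spatial position.

    acts[i] is the ReLU output of kernel i. Each value is divided by a term
    that grows with the squared activity of up to n adjacent kernels.
    """
    total = len(acts)
    out = []
    for i, a in enumerate(acts):
        lo = max(0, i - n // 2)
        hi = min(total - 1, i + n // 2)
        energy = sum(acts[j] ** 2 for j in range(lo, hi + 1))
        out.append(a / (k + alpha * energy) ** beta)
    return out

def max_pool_1d(values, z=3, s=2):
    """Max pooling with window size z and stride s; windows overlap when s < z."""
    return [max(values[i:i + z]) for i in range(0, len(values) - z + 1, s)]
```

For example, `max_pool_1d([1, 5, 2, 4, 3, 0, 6], z=3, s=2)` gives `[5, 4, 6]` from three overlapping windows, while the non-overlapping setting `z=3, s=3` gives `[5, 4]`. Likewise `relu_grad(50.0)` is still 1.0, whereas `tanh_grad(50.0)` is practically zero.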

The authors then present the overall structure: five convolutional layers followed by three fully connected layers, the last of which maps the 4096-dimensional features to 1000 outputs, one per category, and feeds them to a softmax. They also describe how to further reduce overfitting with data augmentation (producing more training data by cropping and flipping a given image) and dropout (randomly setting the output of a neuron to zero with a given probability, which is much like ensembling several models).
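The two anti-overfitting tricks are simple enough to sketch in a few lines. The code below is a toy version on nested Python lists, with hypothetical function names; the test-time scaling by (1 - p_drop) mirrors the paper's description of multiplying outputs by 0.5 when the drop probability is 0.5. It only illustrates the idea and leaves out the rest of the training loop.

```python
import random

def random_crop_and_flip(image, crop, rng):
    """Return a random crop x crop patch of a 2-D image, mirrored half the time."""
    height, width = len(image), len(image[0])
    top = rng.randrange(height - crop + 1)
    left = rng.randrange(width - crop + 1)
    patch = [row[left:left + crop] for row in image[top:top + crop]]
    if rng.random() < 0.5:
        patch = [row[::-1] for row in patch]  # horizontal reflection
    return patch

def dropout(activations, p_drop, rng, training=True):
    """Zero each activation with probability p_drop while training.

    At test time every unit is kept and scaled by (1 - p_drop), so the
    expected input to the next layer matches what it saw during training.
    """
    if not training:
        return [a * (1.0 - p_drop) for a in activations]
    return [0.0 if rng.random() < p_drop else a for a in activations]
```

Every call to `random_crop_and_flip` yields a slightly different view of the same image, and every call to `dropout` in training mode samples a different "thinned" network, which is where the ensemble analogy comes from.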

Finally, the authors present experimental results on ILSVRC-2010 and ILSVRC-2012, where the network achieves significantly lower error rates than previous methods.

Contributions
  1. Trained one of the largest neural networks at the time and applied it to image classification
  2. Proposed several methods to counter the overfitting that such a large network is prone to
  3. Introduced several new features into the network that greatly accelerate training


Tuesday, May 12, 2015

Story-Driven Summarization for Egocentric Video 
Zheng Lu and Kristen Grauman 

Abstract

As cameras and media storage keep improving, the authors state, recording video will become more and more ubiquitous. With this flood of ever longer recordings, it becomes impossible for a human to watch every video from start to end, and techniques to analyze and summarize them become more important than ever.

This paper aims at summarizing egocentric videos taken by wearable devices or, as the authors mention, robots: given a long video, it produces a shortened clip without losing much information or context. Previous work on this problem mostly focuses on selecting high-quality subshots and puts little effort into inter-shot relationships. As a result, it sometimes loses the context of how one shot leads to the next and often includes too many redundant subshots.

This paper therefore focuses on "telling the story" of the clip by selecting the chain of subshots that maximizes a three-part objective function consisting of story, importance and diversity. Of the three parts, story is the essence of this paper: it uses the relationships between objects to measure how closely subshots are connected. Briefly, objects are first detected in the subshots, and a directed bipartite graph is constructed connecting subshots and objects, with weights denoting the probability of an object given a subshot or vice versa. A random walk on this graph then gives the closeness of a pair of subshots, based on the intuitive assumption that if two subshots A and B are highly correlated, walks starting from A are highly likely to end at B.
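The random-walk intuition can be sketched as follows. This is only an assumption-laden toy: `counts` is a hypothetical table of object detection scores per subshot, and the walk used here is a random walk with restart computed by power iteration, which may differ from the exact formulation in the paper.

```python
def normalize(row):
    total = sum(row)
    return [v / total for v in row] if total > 0 else [0.0] * len(row)

def subshot_affinity(counts, restart=0.15, iters=100):
    """counts[s][o]: how strongly object o appears in subshot s (made-up scores).

    Returns affinity[a][b]: the probability that a walk restarting at
    subshot a is found at subshot b.
    """
    n_s, n_o = len(counts), len(counts[0])
    # P(object | subshot): normalize each row of the table.
    s_to_o = [normalize(r) for r in counts]
    # P(subshot | object): normalize each column of the table.
    o_to_s = [normalize([counts[s][o] for s in range(n_s)]) for o in range(n_o)]
    # One hop subshot -> object -> subshot on the bipartite graph.
    step = [[sum(s_to_o[a][o] * o_to_s[o][b] for o in range(n_o))
             for b in range(n_s)] for a in range(n_s)]
    affinity = []
    for start in range(n_s):
        p = [1.0 if i == start else 0.0 for i in range(n_s)]
        for _ in range(iters):
            moved = [sum(p[a] * step[a][b] for a in range(n_s)) for b in range(n_s)]
            # With probability `restart`, jump back to the starting subshot.
            p = [(1 - restart) * m + (restart if b == start else 0.0)
                 for b, m in enumerate(moved)]
        affinity.append(p)
    return affinity
```

With `counts = [[3, 1, 0], [2, 2, 0], [0, 0, 4]]`, subshots 0 and 1 share objects while subshot 2 does not, so walks from subshot 0 often land on subshot 1 and never on subshot 2.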

With the story term and the other two, the best chain of subshots can be retrieved. In the experiments, the authors show that subjects preferred their summaries to those of three baselines in a blind test, where each subject was shown summaries produced by the different methods.
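As a rough illustration of how a three-part objective picks a chain, the sketch below scores every temporally ordered chain of a fixed length by brute force. Everything here is a stand-in: `link`, `importance` and `dissimilarity` are hypothetical lookup tables, the story term is taken as the weakest link between consecutive subshots (an assumption of these notes), and the exhaustive search is not the authors' optimization procedure.

```python
from itertools import combinations

def chain_score(chain, link, importance, dissimilarity, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of story, importance and diversity for one chain (length >= 2)."""
    w_story, w_imp, w_div = weights
    pairs = list(zip(chain, chain[1:]))
    story = min(link[a][b] for a, b in pairs)          # a chain is as strong as its weakest link
    imp = sum(importance[s] for s in chain)            # prefer important subshots
    div = sum(dissimilarity[a][b] for a, b in pairs)   # prefer visually different neighbors
    return w_story * story + w_imp * imp + w_div * div

def best_chain(n_subshots, length, link, importance, dissimilarity,
               weights=(1.0, 1.0, 1.0)):
    # combinations() yields index tuples in increasing order, so time order is kept.
    candidates = combinations(range(n_subshots), length)
    return max(candidates,
               key=lambda c: chain_score(c, link, importance, dissimilarity, weights))
```

Changing `weights` shifts the summary between a coherent story, the most important moments and maximal visual variety.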

Contributions
  1. Inspired by previous work on summarizing news from text, the authors successfully adapt the concept to egocentric video summarization.
  2. Proposed an objective function that accounts for how well a generated summary preserves the context of the video.
  3. Proposed a segmentation method tailored to egocentric videos, which copes with the lack of the sharp shot boundaries usually used to obtain subshots.

Tuesday, May 5, 2015

Probabilistic Latent Semantic Analysis

Thomas Hofmann

Abstract
Learning from text is both one of the most challenging tasks and of crucial significance in machine learning and AI; any breakthrough would be a huge leap for the field. To extract information from text or natural language, understanding the actual, semantic meaning is a must. Conventionally, Latent Semantic Analysis (LSA) is applied to do the job: it uses SVD to reduce the dimensionality of the data, mapping it to a new feature space in which, hopefully, the axes carry semantic meaning.

This paper proposes a new way of handling the issue. The author looks at the problem from a statistical point of view and tackles it with a probabilistic model. The method is called Probabilistic Latent Semantic Analysis (PLSA). Unlike LSA, it assumes that, given the hidden latent topic, a document and a word are conditionally independent.
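Written out, the conditional independence assumption gives the joint probability of a document and a word as a mixture over latent topics. The symbols d (document), w (word) and z (latent topic) are notation chosen for these notes:

```latex
P(d, w) = \sum_{z} P(z)\, P(d \mid z)\, P(w \mid z)
        = P(d) \sum_{z} P(w \mid z)\, P(z \mid d)
```

The second form follows from Bayes' rule, since P(z) P(d | z) = P(d) P(z | d).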

Using a standard EM procedure, the new space can be "learnt", and hence a lower-dimensional representation of the data can be obtained. Conceptually, a document is represented by its coordinates over the latent topics, that is, the probability of each latent topic given the document. Combining these with the probability of a word given a topic yields the probability of a word given a document.
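A compact way to see the EM procedure is the pure-Python sketch below. It assumes a small document-word count matrix, uses the parametrization P(z|d) and P(w|z), and runs plain EM without the annealing variant; the function names, the random initialization and the iteration count are choices made for these notes.

```python
import random

def normalize(values):
    total = sum(values)
    if total == 0:
        return [1.0 / len(values)] * len(values)  # fall back to uniform
    return [v / total for v in values]

def plsa(counts, n_topics, n_iters=50, seed=0):
    """Fit P(z|d) and P(w|z) to counts[d][w] with plain EM."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_words)])
             for _ in range(n_topics)]
    for _ in range(n_iters):
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior over topics for this (document, word) pair.
                post = normalize([p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)])
                # Accumulate expected counts for the M-step.
                for z in range(n_topics):
                    new_z_d[d][z] += counts[d][w] * post[z]
                    new_w_z[z][w] += counts[d][w] * post[z]
        # M-step: renormalize the expected counts into probabilities.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z

def word_given_doc(p_z_d, p_w_z, d, w):
    # P(w|d) = sum over topics of P(w|z) * P(z|d).
    return sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z)))
```

Each row of `p_z_d` is the low-dimensional representation of one document, and `word_given_doc` combines it with the topic-word probabilities exactly as described above.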



Contributions 
  1. Proposed a new method to extract semantic components from text and natural language.
  2. The new method is not only more compact than LSA but also more accurate.
  3. Combined with annealing (tempered EM), a technique often used in machine learning, the performance can be further improved.