Tuesday, June 2, 2015


Rich feature hierarchies for accurate object detection and semantic segmentation

Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik
UC Berkeley 


Abstract

Conventionally, visual recognition, whether object detection, classification, or a related task, has relied primarily on hand-crafted features such as SIFT and HOG, which describe distinctive local image structure in the hope that it helps the task at hand. Biological vision, however, behaves more like a sequential, hierarchical process. This inspires the authors to apply a convolutional neural network (CNN) to object detection; the resulting system is called R-CNN (Regions with CNN features).

The proposed system is composed of several components. First, given an image, around 2000 object proposals are generated with selective search. Next, each proposal is fed into a convolutional neural network to extract features, which are then used as input to several SVMs, each trained to recognize one specific object class, say, airplane. Combining the outputs of all SVMs, each proposal is classified as one of the object classes or as background.
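The flow of this pipeline can be sketched roughly as follows. This is only a minimal illustration, not the authors' implementation: `toy_propose` and `toy_extract` are hypothetical stand-ins for selective search and the CNN, the SVM weights are made up, and the rule "no positive SVM score means background" is an assumption chosen for the example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2), end-exclusive
Features = List[float]


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def classify_proposals(
    image,
    propose: Callable[[object], List[Box]],
    extract: Callable[[object, Box], Features],
    svms: Dict[str, Tuple[List[float], float]],
) -> List[Tuple[Box, str, float]]:
    """Label every proposal with its best-scoring class, or 'background'."""
    results = []
    for box in propose(image):
        feats = extract(image, box)
        # One linear SVM per class: score = w . x + b
        scores = {name: dot(w, feats) + b for name, (w, b) in svms.items()}
        best = max(scores, key=scores.get)
        # Assumption: if no SVM gives a positive score, call it background.
        label = best if scores[best] > 0 else "background"
        results.append((box, label, scores[best]))
    return results


# Hypothetical stand-in for selective search: two fixed boxes.
def toy_propose(image) -> List[Box]:
    return [(0, 0, 2, 2), (2, 2, 4, 4)]


# Hypothetical stand-in for the CNN: mean intensity and its complement.
def toy_extract(image, box: Box) -> Features:
    x1, y1, x2, y2 = box
    pixels = [image[r][c] for r in range(y1, y2) for c in range(x1, x2)]
    mean = sum(pixels) / len(pixels)
    return [mean, 1.0 - mean]


toy_image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
# Made-up (weights, bias) pairs, one per class.
toy_svms = {"airplane": ([1.0, -1.0], 0.0), "bird": ([0.5, -1.0], 0.0)}

detections = classify_proposals(toy_image, toy_propose, toy_extract, toy_svms)
for box, label, score in detections:
    print(box, label, score)
```

In a real system the proposal and feature functions would be replaced by selective search and a trained network, and overlapping detections of the same class would still need to be merged.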


After introducing the overall design, the authors present experimental results. They show that when training data are scarce, a network pre-trained on a similar domain can be fine-tuned on the scarce data, yielding a significant performance boost. One can even use the pre-trained network without fine-tuning: taking the output of the last pooling layer after the convolutions as the SVM input features works nearly as well as taking the output of the fully connected (fc) layers, suggesting that the conv layers act as general feature extractors and the fc layers as classifiers. Fine-tuning then teaches the network to apply these general conv features to the target task.
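The idea of keeping pre-trained layers fixed and training only a linear classifier on their output can be illustrated with a toy sketch. Everything here is an assumption made for illustration: `frozen_features` is a hypothetical hand-written function standing in for pre-trained conv layers, and a simple perceptron is swapped in for the SVM to keep the code short.

```python
from typing import List, Sequence, Tuple


def frozen_features(x: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for pre-trained conv layers; never updated."""
    a, b = x
    # Two ReLU-style responses plus a constant bias feature.
    return [max(0.0, a - b), max(0.0, b - a), 1.0]


def score(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * fi for wi, fi in zip(w, frozen_features(x)))


def train_linear(
    samples: List[Tuple[Sequence[float], int]], epochs: int = 20
) -> List[float]:
    """Perceptron (not an SVM) on top of the frozen features; labels are +1 / -1."""
    w = [0.0, 0.0, 0.0]
    for _ in range(epochs):
        for x, y in samples:
            if y * score(w, x) <= 0:  # misclassified: nudge the weights
                w = [wi + y * fi for wi, fi in zip(w, frozen_features(x))]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    return 1 if score(w, x) > 0 else -1


# Only four labeled examples, mimicking a scarce target dataset.
scarce_data = [
    ((2.0, 0.0), 1),
    ((3.0, 1.0), 1),
    ((0.0, 2.0), -1),
    ((1.0, 3.0), -1),
]
weights = train_linear(scarce_data)
print(predict(weights, (5.0, 1.0)), predict(weights, (0.0, 4.0)))
```

Only the classifier weights change during training. Fine-tuning would correspond to also updating the parameters inside the feature function, which this sketch deliberately leaves out.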

The experiments show a higher mean average precision (about 54%) than a comparable system using hand-designed features (around 35%). In addition, the more compact feature representation, which the paper reports as about two orders of magnitude lower-dimensional than that of the hand-designed-feature system, greatly reduces storage and speeds up the per-class computation.


Contributions
  1. Show that a CNN can be applied to object proposals to accomplish object detection and segmentation.
  2. Show that when training data are scarce, fine-tuning a model pre-trained on an auxiliary task yields a significant performance gain.
