ImageNet Classification with Deep Convolutional Neural Networks
Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
Abstract
This paper documents how a now extremely popular line of work took off: by training a large-scale neural network with the help of rapidly growing GPU computing power, the authors achieved a top-5 error rate of 15.3% in ILSVRC-2012, roughly 11 percentage points lower than the 26.2% of the second-best entry.
The authors then go through the important features of the network. The first is ReLU, which, unlike traditional activation functions such as tanh or the sigmoid, is non-saturating: its gradient does not shrink toward zero for large positive inputs, so training runs several times faster. The second is local response normalization, inspired by lateral inhibition observed in real neurons, which rescales each activation using the activations of neighboring kernels at the same spatial position; this aids generalization and lowers the error rate by roughly 1%. The third is overlapping pooling: instead of tiling the pooling windows without overlap, the authors move a window of size z by a stride s that is smaller than z, which they observed to make the network slightly less prone to overfitting.
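The three ingredients above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' GPU implementation: the function names are made up for this note, the pooling uses no padding, and the default values (z=3, s=2 for pooling; n=5, k=2, alpha=1e-4, beta=0.75 for normalization) are the settings I recall from the paper and should be treated as assumptions.

```python
import numpy as np

def relu_grad(x):
    # Gradient of ReLU: 1 for positive inputs, 0 otherwise. It never shrinks
    # for large positive x, which is what "non-saturating" means.
    return (x > 0).astype(x.dtype)

def tanh_grad(x):
    # Gradient of tanh: approaches 0 as |x| grows (saturation).
    return 1.0 - np.tanh(x) ** 2

def local_response_norm(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    # a has shape (channels, height, width). Each activation is divided by a
    # term built from the squared activations of up to n adjacent channels
    # at the same spatial position.
    num_channels = a.shape[0]
    out = np.empty_like(a)
    for i in range(num_channels):
        lo = max(0, i - n // 2)
        hi = min(num_channels - 1, i + n // 2)
        denom = (k + alpha * (a[lo:hi + 1] ** 2).sum(axis=0)) ** beta
        out[i] = a[i] / denom
    return out

def max_pool2d(x, z=3, s=2):
    # Max pooling over a 2D array with a z-by-z window and stride s.
    # Windows overlap whenever s < z.
    h, w = x.shape
    out_h = (h - z) // s + 1
    out_w = (w - z) // s + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = x[i * s:i * s + z, j * s:j * s + z].max()
    return out
```

For example, on a 5x5 input `max_pool2d(x, z=3, s=2)` produces a 2x2 output whose windows share one row or column with their neighbors, while `tanh_grad` is already close to zero at x = 5 and `relu_grad` is still 1 there.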
The authors then present the overall architecture: five convolutional layers followed by three fully connected layers, the last of which maps the 4096-dimensional features to 1000 outputs, one per category, and feeds a softmax. They also describe two ways to further reduce overfitting: data augmentation (producing more training data by cropping and flipping a given image) and dropout (randomly setting the output of a neuron to zero with a given probability, which resembles averaging an ensemble of several models).
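The two regularization techniques above can be illustrated with a minimal NumPy sketch. The function names and signatures are hypothetical, the 224-pixel crop size and the drop probability of 0.5 are the values I recall from the paper rather than from this summary, and the test-time rescaling shown here is one common way to match the expected activation, not necessarily the exact code the authors ran.

```python
import numpy as np

def random_crop_and_flip(image, crop=224, rng=None):
    # image has shape (height, width, channels). Take a random crop-by-crop
    # patch and mirror it horizontally with probability 0.5.
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = image[top:top + crop, left:left + crop]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]
    return patch

def dropout(activations, p=0.5, train=True, rng=None):
    # During training, zero each activation independently with probability p.
    # At test time, keep every unit and scale by (1 - p) so the expected
    # output matches what the next layer saw during training.
    if not train:
        return activations * (1.0 - p)
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(activations.shape) >= p
    return activations * mask
```

Because a different random mask is drawn on every call, each training step effectively updates a different thinned network that shares weights with all the others, which is where the ensemble analogy comes from.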
Finally, the authors present experimental results on ILSVRC-2010 and ILSVRC-2012, where the network achieves significantly lower error rates than previous approaches.
Contributions
- Trained one of the largest neural networks at the time and applied it to image classification
- Proposed several methods for dealing with the overfitting that such a large network is prone to
- Introduced several new features in the network that greatly accelerate training


