AMMAI

Sunday, June 7, 2015

Deep Neural Networks for Acoustic Modeling in Speech Recognition

Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, et al.

Abstract

This paper discusses how artificial neural networks can be, and have been, used in speech-related tasks. It shows that in many cases they already outperform the conventional GMM-HMM approach by a large margin.

After briefly reviewing the traditional GMM approach, the authors point out its main shortcoming: it is inefficient at modeling data that lie on or near a nonlinear manifold. Researchers believe that neural networks are better at modeling this kind of data space.

The paper then introduces how such networks are trained. Each layer is first pre-trained, using either a Restricted Boltzmann Machine (RBM) or a denoising (noise-tolerant) autoencoder. The pre-trained layers are then stacked to form a deep network.
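The layer-wise pre-training described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It makes the following assumptions:

- Visible and hidden units are binary (Bernoulli).
- Each RBM is trained with one-step contrastive divergence (CD-1).
- The function names `train_rbm` and `pretrain_stack` and all hyperparameters are hypothetical.

For real-valued acoustic features such as MFCCs, the first layer would normally use Gaussian visible units instead; this sketch omits that variant. It also leaves out the later fine-tuning of the stacked network with backpropagation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=10, lr=0.1, seed=0):
    """Train a Bernoulli-Bernoulli RBM with CD-1 (hypothetical helper).
    data: array of shape (n_samples, n_visible) with values in [0, 1]."""
    rng = np.random.default_rng(seed)
    n_samples, n_visible = data.shape
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
    b_vis = np.zeros(n_visible)
    b_hid = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities driven by the data.
        h_prob = sigmoid(data @ W + b_hid)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one Gibbs step back to the visible layer and up again.
        v_recon = sigmoid(h_sample @ W.T + b_vis)
        h_recon = sigmoid(v_recon @ W + b_hid)
        # CD-1 update: data statistics minus reconstruction statistics.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / n_samples
        b_vis += lr * (data - v_recon).mean(axis=0)
        b_hid += lr * (h_prob - h_recon).mean(axis=0)
    # Visible biases are only needed for reconstruction, so only the
    # weights and hidden biases are kept for the feed-forward network.
    return W, b_hid

def pretrain_stack(data, layer_sizes, **kwargs):
    """Greedy layer-wise pre-training: each RBM is trained on the
    hidden activations produced by the RBM below it."""
    layers = []
    x = data
    for n_hidden in layer_sizes:
        W, b_hid = train_rbm(x, n_hidden, **kwargs)
        layers.append((W, b_hid))
        x = sigmoid(x @ W + b_hid)  # input to the next RBM
    return layers

# Usage on toy binary data (12 visible units -> 8 -> 4 hidden units).
rng = np.random.default_rng(1)
toy = (rng.random((100, 12)) < 0.3).astype(float)
layers = pretrain_stack(toy, [8, 4], epochs=5)
```

Each RBM only sees the outputs of the layer below it, which is what makes the procedure greedy. The returned weights would then serve as the initialization of a deep feed-forward network.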

The authors then describe two ways a DNN can be combined with an HMM:

- Its outputs can be used as a new kind of input feature for the HMM.
- Its outputs can be treated as the posterior probabilities of HMM states given the input features. Those input features may be MFCCs, a commonly used feature, or something else.

Furthermore, some groups have even applied convolutional neural networks directly to spectrograms or to the outputs of mel filter banks.
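In the second (hybrid) approach, the network's state posteriors p(s|x) must be turned into emission scores that can stand in for the GMM likelihoods p(x|s). By Bayes' rule, p(x|s) = p(s|x) p(x) / p(s). Since p(x) is the same for every state in a given frame, dividing the posterior by the state prior p(s) yields a scaled likelihood. In the log domain this is a subtraction.

The sketch below makes the following assumptions:

- The DNN produces one logit per HMM state.
- The priors are estimated as relative state frequencies in a forced alignment of the training data.
- All function names are hypothetical, and the block is not any toolkit's actual API.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def estimate_priors(alignment, n_states):
    """Estimate p(s) as the relative frequency of each state label
    in a forced alignment (one integer state label per frame)."""
    counts = np.bincount(alignment, minlength=n_states).astype(float)
    return counts / counts.sum()

def scaled_log_likelihoods(logits, state_priors, floor=1e-8):
    """Hybrid DNN-HMM emission scores: log p(s|x) - log p(s).
    The floor guards against states never seen in the alignment."""
    log_post = log_softmax(logits)
    return log_post - np.log(np.maximum(state_priors, floor))

# Usage: 3 HMM states, one frame of DNN output.
alignment = np.array([0, 0, 1, 2, 2, 2])
priors = estimate_priors(alignment, 3)
scores = scaled_log_likelihoods(np.array([[2.0, 0.5, 1.0]]), priors)
```

Dividing by the prior keeps frequent states (such as silence) from dominating decoding just because they are common in the training data. For example, with uniform posteriors, the rarest state receives the highest score.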

Next, the authors review results reported by several research groups. They state that across a wide range of tasks, neural networks have achieved state-of-the-art performance and could be exploited further for future leaps.

However, the authors also list several obstacles that must be overcome before the full power of NNs can be used. One of these is the difficulty of parallelizing their training.

Contributions

1. Summarizes the methods currently in wide use for applying neural networks to speech-related tasks

2. Discusses the merits and shortcomings of NNs when applied to these tasks, and the obstacles that must be solved to further exploit their power
