Context Enhancement of Recurrent Neural Network Language Models for Automatic Speech Recognition

Michael Hentschel (1661025)


Language models are a key component of automatic speech recognition systems. In recent years, language models based on neural networks, and in particular recurrent neural networks, have shown significant performance improvements over traditional count-based language models. Language models calculate the probability of the next word given the word history. In the task of automatic speech recognition, various kinds of context information besides words are available. This context can come from different sources, such as the acoustic signal, the recent word history, or a global text topic.
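As a rough illustration of how a recurrent neural network language model predicts the next word from the word history, the following is a minimal sketch; the vocabulary size, layer sizes, and toy word indices are my own assumptions and not the models discussed in the presentation.

```python
# Minimal sketch of a recurrent neural network language model
# (hypothetical layer sizes, not the model from the presentation).
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=256, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_ids, state=None):
        # word_ids: (batch, time) indices of the word history
        emb = self.embed(word_ids)
        hidden, state = self.lstm(emb, state)
        # distribution over the next word at every position of the history
        logits = self.out(hidden)
        return torch.log_softmax(logits, dim=-1), state

# Probability of the next word given a toy three-word history
model = RNNLM()
log_probs, _ = model(torch.tensor([[4, 17, 256]]))
next_word_log_probs = log_probs[0, -1]   # log P(w4 | w1, w2, w3)
```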

In this presentation, I will specifically focus on how topic information can be used for the task of domain adaptation for language models. I will introduce hidden layer factorisation as an effective adaptation architecture. Hidden layer factorisation showed significantly better performance than state-of-the-art neural network language models using long short-term memory and outperformed other feature-based adaptation architectures. Subsequently, I will introduce a unified framework for context extraction and language model adaptation. This framework allows a context representation to be learned by a neural network and used for feature-based domain adaptation. In contrast to conventional methods for domain adaptation, the proposed method can be trained in a single training step and learned jointly with the language model. All presented methods achieved improvements over state-of-the-art recurrent neural network language models in terms of model perplexity and word error rate in rescoring.
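To make the idea of feature-based adaptation concrete, the sketch below feeds a topic or context vector into an LSTM language model alongside the word embedding. This is only an illustration under my own assumptions (simple concatenation, arbitrary dimensions); the hidden layer factorisation and the jointly trained context-extraction network presented in the talk differ in their details.

```python
# Sketch of feature-based adaptation: a context (e.g. topic) vector is fed
# into the language model together with the word embedding. The way the
# context enters the network and all sizes are illustrative assumptions.
import torch
import torch.nn as nn

class ContextAdaptedLM(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=256,
                 context_dim=50, hidden_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # context vector is concatenated with the word embedding at each step
        self.lstm = nn.LSTM(embed_dim + context_dim, hidden_dim,
                            batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_ids, context, state=None):
        # word_ids: (batch, time); context: (batch, context_dim)
        emb = self.embed(word_ids)
        ctx = context.unsqueeze(1).expand(-1, emb.size(1), -1)
        hidden, state = self.lstm(torch.cat([emb, ctx], dim=-1), state)
        return torch.log_softmax(self.out(hidden), dim=-1), state

# The context vector could be a topic feature (e.g. a topic posterior) or,
# as in the unified framework, the output of a context-extraction network
# trained jointly with the language model.
model = ContextAdaptedLM()
log_probs, _ = model(torch.tensor([[4, 17, 256]]), torch.rand(1, 50))
```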