Deeper Time Delay Neural Networks for Effective Acoustic Modelling

Time delay neural networks (TDNNs) have been shown to be an efficient network architecture for modelling long temporal contexts in speech recognition. Moreover, TDNNs train much faster than other long-temporal-context models based on recurrent neural networks. In this paper, we propose deeper architectures to improve the modelling power of TDNNs. At each TDNN layer that requires spliced input, we increase the number of transforms so that the lower layers can provide more salient features to the upper layers. Dropout is found to be an effective way to prevent overfitting once the depth of the model is substantially increased. The proposed architectures significantly improve recognition accuracy on Switchboard and AMI.
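
To make the idea of a spliced TDNN layer with additional transforms concrete, the following is a minimal forward-pass sketch in plain Python. It is one possible reading of the abstract, not the authors' implementation: the placement of the extra transforms (unspliced affine plus ReLU layers stacked after the spliced one), the edge-frame padding, the random weight initialisation, and every name used here (splice, affine_relu, dropout, tdnn_block, and their parameters) are assumptions made for illustration.

```python
import random


def splice(frames, offsets):
    """Concatenate each frame with its neighbours at the given time offsets.

    Assumption: offsets that fall outside the utterance reuse the nearest
    edge frame.
    """
    last = len(frames) - 1
    spliced = []
    for t in range(len(frames)):
        row = []
        for off in offsets:
            idx = min(max(t + off, 0), last)
            row.extend(frames[idx])
        spliced.append(row)
    return spliced


def affine_relu(frames, weights, bias):
    """Apply y = relu(W x + b) to every frame independently."""
    out = []
    for x in frames:
        y = [max(0.0, sum(w * v for w, v in zip(w_row, x)) + b)
             for w_row, b in zip(weights, bias)]
        out.append(y)
    return out


def dropout(frames, rate, rng, training=True):
    """Inverted dropout: zero units with probability `rate`, rescale the rest."""
    if not training or rate == 0.0:
        return frames
    keep = 1.0 - rate
    return [[v / keep if rng.random() < keep else 0.0 for v in x]
            for x in frames]


def random_matrix(rows, cols, rng):
    """Hypothetical initialiser: small uniform weights, for shape checking only."""
    scale = 1.0 / cols ** 0.5
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


def tdnn_block(frames, offsets, hidden_dim, extra_transforms, rate, rng,
               training=True):
    """One spliced layer, then `extra_transforms` unspliced layers, then dropout.

    Weights are drawn fresh on each call, so this only illustrates the data
    flow and shapes; a trainable model would store its parameters.
    """
    x = splice(frames, offsets)
    x = affine_relu(x, random_matrix(hidden_dim, len(x[0]), rng),
                    [0.0] * hidden_dim)
    for _ in range(extra_transforms):
        x = affine_relu(x, random_matrix(hidden_dim, hidden_dim, rng),
                        [0.0] * hidden_dim)
    return dropout(x, rate, rng, training)


if __name__ == "__main__":
    rng = random.Random(0)
    # Ten frames of 3-dimensional features (made-up input).
    utterance = [[rng.random() for _ in range(3)] for _ in range(10)]
    hidden = tdnn_block(utterance, offsets=[-2, 0, 2], hidden_dim=4,
                        extra_transforms=2, rate=0.1, rng=rng)
    print(len(hidden), len(hidden[0]))
```

In this sketch the splicing step is what gives a layer its temporal context, while the extra transforms add depth without widening that context; dropout is applied only when training is true.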


Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
