ISCA Archive Interspeech 2016

Overcoming Data Sparsity in Acoustic Modeling of Low-Resource Language by Borrowing Data and Model Parameters from High-Resource Languages

Basil Abraham, S. Umesh, Neethu Mariam Joy

In this paper, we propose two techniques to improve the acoustic model of a low-resource language: (i) pooling data from closely related languages using a phoneme mapping algorithm to build acoustic models such as the subspace Gaussian mixture model (SGMM), phone cluster adaptive training (Phone-CAT), deep neural network (DNN) and convolutional neural network (CNN), and then adapting the aforementioned models towards the low-resource language using its data; (ii) borrowing model parameters from high-resource languages, namely the subspace parameters of SGMM/Phone-CAT or the hidden layers of DNN/CNN, and then estimating the language-specific parameters using the low-resource language data. The experiments were performed on four Indian languages, namely Assamese, Bengali, Hindi and Tamil. Relative improvements of 10 to 30% were obtained over the corresponding monolingual models in each case.
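
To make the second technique concrete for the DNN case, the following is a minimal sketch, assuming a PyTorch-style feed-forward network; the feature dimension, layer sizes, senone counts, checkpoint name and training step are illustrative assumptions, not details taken from the paper. It shows hidden layers borrowed from a high-resource DNN while a fresh, language-specific output layer is estimated on low-resource data.

import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not from the paper).
FEAT_DIM, HIDDEN_DIM, NUM_HIDDEN = 440, 1024, 4   # spliced acoustic features, hidden layers
HR_TARGETS, LR_TARGETS = 3000, 1200               # hypothetical senone counts

def make_dnn(num_targets):
    # Feed-forward DNN: shared hidden layers plus a language-specific output layer.
    layers, in_dim = [], FEAT_DIM
    for _ in range(NUM_HIDDEN):
        layers += [nn.Linear(in_dim, HIDDEN_DIM), nn.ReLU()]
        in_dim = HIDDEN_DIM
    layers.append(nn.Linear(in_dim, num_targets))
    return nn.Sequential(*layers)

# 1. DNN trained on the high-resource language (weights assumed available).
hr_dnn = make_dnn(HR_TARGETS)
# hr_dnn.load_state_dict(torch.load("hr_dnn.pt"))  # hypothetical checkpoint

# 2. Borrow the hidden layers; attach a fresh output layer for the
#    low-resource language's context-dependent states.
lr_dnn = make_dnn(LR_TARGETS)
lr_dnn[:-1].load_state_dict(hr_dnn[:-1].state_dict())

# 3. Re-estimate only the language-specific output layer on low-resource data
#    (the borrowed hidden layers are frozen here).
for p in lr_dnn[:-1].parameters():
    p.requires_grad = False
optimizer = torch.optim.SGD(lr_dnn[-1].parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

feats = torch.randn(32, FEAT_DIM)              # stand-in minibatch of low-resource features
targets = torch.randint(0, LR_TARGETS, (32,))  # stand-in senone labels
loss = criterion(lr_dnn(feats), targets)
loss.backward()
optimizer.step()

Instead of freezing them, the borrowed hidden layers could also be fine-tuned with a small learning rate, trading stronger adaptation against the risk of overfitting the limited low-resource data.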


doi: 10.21437/Interspeech.2016-963

Cite as: Abraham, B., Umesh, S., Joy, N.M. (2016) Overcoming Data Sparsity in Acoustic Modeling of Low-Resource Language by Borrowing Data and Model Parameters from High-Resource Languages. Proc. Interspeech 2016, 3037-3041, doi: 10.21437/Interspeech.2016-963

@inproceedings{abraham16b_interspeech,
  author={Basil Abraham and S. Umesh and Neethu Mariam Joy},
  title={{Overcoming Data Sparsity in Acoustic Modeling of Low-Resource Language by Borrowing Data and Model Parameters from High-Resource Languages}},
  year=2016,
  booktitle={Proc. Interspeech 2016},
  pages={3037--3041},
  doi={10.21437/Interspeech.2016-963}
}