A Comparative Study of Classifier Based Mispronunciation Detection System for Confusing


  • M. Maqsood Department of Software Engineering, UET, Taxila
  • H. A. Habib Department of Computer Science, UET, Taxila
  • S. M. Anwar Department of Software Engineering, UET, Taxila
  • M. A. Ghazanfar Department of Software Engineering, UET, Taxila
  • T. Nawaz Department of Software Engineering, UET, Taxila


Pronunciation training systems detect mispronunciations in language learners' speech and provide useful feedback. Mispronunciation detection systems can be built either on Confidence Measures (CM) or on classifiers trained with Acoustic Phonetic Features (APF). This paper presents an APF-based computer-assisted pronunciation training (CAPT) system for the most confusing Arabic phoneme pairs (/ ? / vs / ? /) and (/ ? / vs / ? / or / ? /), developed for speakers of Pakistani origin. A super-vector is formed from the APF, consisting of Mel-frequency cepstral coefficients (MFCCs) with their first and second derivatives, energy, zero-crossing rate, spectral features and pitch. A large dataset was recorded from 200 speakers of Pakistani origin learning Arabic as a second language. Four machine learning classifiers (Random Forest, Naïve Bayes, AdaBoost and k-NN) are used for mispronunciation detection and are compared with the standard Goodness of Pronunciation (GOP) method. The results show that Random Forest outperforms all the other methods by a significant margin.
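As a rough illustration of how a fixed-length feature vector can be assembled from frame-level acoustic measurements, the sketch below computes two of the features named in the abstract (short-time energy and zero-crossing rate), adds first and second differences of the energy track, and averages each track over the utterance. It is a minimal sketch under stated assumptions, not the authors' pipeline: the frame length, hop size, the averaging step, and all function names are hypothetical, and MFCCs, spectral features and pitch are left out to keep the example dependency-free.

```python
def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def deltas(track):
    """Two-point central difference of a per-frame track, with clamped edges."""
    n = len(track)
    return [(track[min(i + 1, n - 1)] - track[max(i - 1, 0)]) / 2.0
            for i in range(n)]


def super_vector(samples, frame_len=400, hop=160):
    """Average each feature track over all frames (assumed aggregation).

    Returns [mean energy, mean delta energy, mean delta-delta energy, mean ZCR].
    The defaults correspond to 25 ms frames with a 10 ms hop at 16 kHz,
    which is an assumption and not a value taken from the paper.
    """
    frames = frame_signal(samples, frame_len, hop)
    if not frames:
        raise ValueError("signal is shorter than one frame")
    energy = [short_time_energy(f) for f in frames]
    zcr = [zero_crossing_rate(f) for f in frames]
    tracks = [energy, deltas(energy), deltas(deltas(energy)), zcr]
    return [sum(t) / len(t) for t in tracks]
```

A vector of this kind, one per phoneme segment, is what a classifier such as Random Forest or k-NN would take as input; a real system would append MFCC, spectral and pitch tracks in the same way.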


M. Maqsood et al. / The Nucleus 54, No. 2 (2017) 114-120





How to Cite

M. Maqsood, H. A. Habib, S. M. Anwar, M. A. Ghazanfar, and T. Nawaz, “A Comparative Study of Classifier Based Mispronunciation Detection System for Confusing”, The Nucleus, vol. 54, no. 2, pp. 114–120, Jun. 2017.