Advances and Applications in Statistics
Volume 64, Issue 2, Pages 267 - 276
(October 2020) http://dx.doi.org/10.17654/AS064020267
K-MEANS CLUSTERING TO TTR BASED LEXICAL DIVERSITY ANALYSIS
Yanhui Zhang
Abstract: Differentiating automatically between the language proficiency of native speakers and that of second language learners is a challenging question in language studies. This paper demonstrates an innovative method for addressing the problem using K-means clustering. The analysis rests on selected lexical diversity measures, including the basic Type-to-Token Ratio (TTR) and several other measures derived from it. The performance of the selected measures was assessed using appropriate clustering statistics, including Silhouette scores. Implications and further directions are provided in the concluding remarks.
Keywords and phrases: K-means clustering, lexical diversity, TTR, language proficiency.
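
As a rough illustration of the pipeline the abstract outlines, the sketch below computes TTR for a few texts, clusters the values with a plain K-means, and reports a mean Silhouette score. It is not the paper's implementation. The toy sentences, the function names, the one-dimensional Lloyd's algorithm with a deterministic start, and the choice of root TTR (Guiraud's index) as an example of a TTR-derived measure are all assumptions made for illustration.

```python
import math
import re


def lexical_diversity(text):
    """Return TTR and one TTR-derived measure for a single text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n_tokens, n_types = len(tokens), len(set(tokens))
    return {
        "ttr": n_types / n_tokens,
        # Root TTR (Guiraud's index): an assumed example of a derived measure.
        "root_ttr": n_types / math.sqrt(n_tokens),
    }


def assign(values, centers):
    """Label each value with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: abs(v - centers[j]))
            for v in values]


def kmeans_1d(values, k=2, max_iter=100):
    """Lloyd's algorithm on scalars; requires k >= 2."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(max_iter):
        labels = assign(values, centers)
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return assign(values, centers), centers


def mean_silhouette(values, labels):
    """Mean Silhouette score; assumes at least two clusters are present."""
    scores = []
    for i, (v, lab) in enumerate(zip(values, labels)):
        same = [abs(v - w) for j, (w, l) in enumerate(zip(values, labels))
                if l == lab and j != i]
        if not same:  # singleton cluster scores 0 by convention
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(abs(v - w) for w, l in zip(values, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy texts (invented for this sketch, not data from the paper).
texts = [
    "the cat sat on the mat the cat sat on the mat",
    "the dog and the dog and the dog ran and ran",
    "a quick brown fox jumps over one lazy sleeping dog",
    "bright rivers carve deep valleys through ancient silent mountains",
]
ttr_values = [lexical_diversity(t)["ttr"] for t in texts]
labels, centers = kmeans_1d(ttr_values, k=2)
score = mean_silhouette(ttr_values, labels)
print(labels, round(score, 3))
```

On these toy texts the two repetitive sentences fall into one cluster and the two varied ones into the other. Real learner and native-speaker corpora would need length-controlled samples, since raw TTR falls as texts get longer.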