IIAE CONFERENCE SYSTEM, The 3rd International Conference on Industrial Application Engineering 2015 (ICIAE2015)

The Clustering Validity with Silhouette and Sum of Squared Errors
Tippaya Thinsungnoen, Nuntawut Kaoungku, Pongsakorn Durongdumronchai, Kittisak Kerdprasop, Nittaya Kerdprasop

Last modified: 2015-02-05

Abstract


Data clustering with automatic programs such as k-means is a popular technique used in many general applications. This paper studies two sub-activities of the clustering process: selecting the number of clusters and analyzing the clustering result. The research aims to study clustering validation as a way to find an appropriate number of clusters for the k-means method. The experimental data take three shapes, with four datasets (100 items) per shape, whose dispersion is generated from a Gaussian (normal) distribution. Two techniques are used for clustering validation: Silhouette and Sum of Squared Errors (SSE). The research compares clustering results for k from 2 to 10. The results of Silhouette and SSE are consistent in the sense that both indicate an appropriate number of clusters at the same k value (for Silhouette, the maximum average value; for SSE, the knee point).
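The comparison described above can be illustrated with a minimal sketch in plain Python. It is not the authors' implementation and does not reproduce their datasets. The synthetic data (four Gaussian blobs of 25 points each), the deterministic farthest-point seeding, and the "largest second difference" rule for locating the SSE knee are all assumptions made for this illustration.

```python
import math
import random


def farthest_point_init(points, k):
    """Deterministic seeding (an assumption of this sketch): start at
    points[0], then repeatedly add the point farthest from the chosen seeds."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds


def kmeans(points, k, max_iter=100):
    """Plain Lloyd iterations; returns (labels, centroids)."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


def sse(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def mean_silhouette(points, labels):
    """Average silhouette s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # s(i) is defined as 0 for a singleton cluster
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, points[j]) for j in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, points[j]) for j in idx) / len(idx)
                for lab, idx in clusters.items() if lab != labels[i])
        total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical data: 4 well-separated Gaussian blobs, 25 points each.
rng = random.Random(0)
centers = [(0, 0), (20, 0), (0, 20), (20, 20)]
points = [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
          for cx, cy in centers for _ in range(25)]

sse_by_k, sil_by_k = {}, {}
for k in range(2, 11):  # k from 2 to 10, as in the abstract
    labels, centroids = kmeans(points, k)
    sse_by_k[k] = sse(points, labels, centroids)
    sil_by_k[k] = mean_silhouette(points, labels)

# Silhouette criterion: the k with the maximum average silhouette.
best_k_silhouette = max(sil_by_k, key=sil_by_k.get)
# SSE criterion: the knee, approximated here by the largest second difference.
best_k_knee = max(range(3, 10),
                  key=lambda k: sse_by_k[k - 1] - 2 * sse_by_k[k] + sse_by_k[k + 1])

for k in range(2, 11):
    print(f"k={k:2d}  SSE={sse_by_k[k]:10.2f}  silhouette={sil_by_k[k]:.3f}")
print("silhouette choice:", best_k_silhouette, " SSE knee choice:", best_k_knee)
```

On clearly separated data like this, both criteria would be expected to point to the same k. A knee read off a plotted SSE curve by eye may differ from the second-difference approximation when the clusters overlap.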

Keywords


Clustering Validity; Silhouette Measure; Sum of Squared Errors; k-means Algorithm
