EM-Based Clustering Algorithm for Uncertain Data

Kinoshita, Naohiko; Endo, Yasunori

doi:10.1007/978-3-319-02821-7_8

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 245)

Abstract

In recent years, there has been growing demand for advanced data analysis techniques that use today's computing power to extract valuable knowledge from data. Clustering is one of the unsupervised classification techniques used in data analysis: information in a real space is transformed into data in a pattern space, which is then analyzed. However, because of uncertainty, a datum often has to be represented not by a point but by a set. Sources of such uncertainty include measurement error margins, data that cannot be regarded as a single point, and missing values.

These uncertainties have typically been represented as intervals, and many clustering algorithms for interval data have been constructed. However, no guideline for selecting an appropriate distance in each case has been given, so this selection is difficult. Methods that calculate the dissimilarity between such uncertain data without introducing a particular distance, such as the nearest-neighbor distance, have therefore been strongly desired. From this viewpoint, we proposed the concept of tolerance, which represents an uncertain datum not as an interval but as a point with a tolerance vector. However, the uncertainty represented by a tolerance is uniformly distributed. Within the framework of tolerance with hard c-means (HCM) or fuzzy c-means (FCM), it is difficult to handle other distributions of uncertainty, such as the Gaussian distribution.
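For readers unfamiliar with the FCM baseline named above, the following is a minimal sketch of standard fuzzy c-means on ordinary point data, that is, without tolerance. It is not the tolerance-based variant and not the authors' code. The fuzzifier m = 2, the random initialization, the stopping rule, and the function name fcm are assumptions chosen for illustration.

```python
import numpy as np

def fcm(X, c, m=2.0, n_iter=300, tol=1e-6, seed=0):
    """Standard fuzzy c-means on point data X (n x p); returns (centers, memberships).

    Hypothetical helper for illustration: point data only, no tolerance vectors.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Random initial memberships; each row sums to 1 over the c clusters.
    U = rng.random((n, c))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Center update: v_i = sum_k u_ki^m x_k / sum_k u_ki^m
        V = (W.T @ X) / W.sum(axis=0)[:, None]
        # Squared Euclidean distances d_ki^2, shape (n, c).
        D2 = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        D2 = np.fmax(D2, 1e-12)  # avoid division by zero if x_k sits on a center
        # Membership update: u_ki = 1 / sum_j (d_ki^2 / d_kj^2)^(1/(m-1))
        inv = D2 ** (-1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # Recompute centers from the final memberships so V and U are consistent.
    W = U ** m
    V = (W.T @ X) / W.sum(axis=0)[:, None]
    return V, U
```

Replacing the fuzzy memberships with 0/1 nearest-center assignments gives HCM (k-means); the tolerance approach extends these objective functions rather than the update code shown here.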

In this paper, we construct a clustering algorithm based on the EM algorithm that handles uncertain data represented by Gaussian distributions, derived by solving an optimization problem. We also verify the effectiveness of the proposed algorithm.
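The abstract does not give the update equations, so the following is only a sketch of one standard way to fit a Gaussian mixture to data whose uncertainty is itself Gaussian. It is not the authors' algorithm. Assumptions: the data are one-dimensional; each datum x_k is observed with a known uncertainty variance s2[k]; and x_k is a noisy observation of a latent true value drawn from the mixture, so under cluster i, x_k has variance var_i + s2[k]. The updates follow the latent-variable EM used in "extreme deconvolution" (Bovy, Hogg and Roweis). The function name em_uncertain_1d and the quantile initialization are hypothetical.

```python
import numpy as np

def em_uncertain_1d(x, s2, c, n_iter=500, tol=1e-10):
    """Hypothetical sketch: EM for a 1-D Gaussian mixture where datum k has
    known Gaussian uncertainty variance s2[k]. Returns (pi, mu, var, r)."""
    x = np.asarray(x, dtype=float)
    s2 = np.asarray(s2, dtype=float)
    n = x.size
    # Deterministic initialization: means at evenly spaced quantiles of x.
    mu = np.quantile(x, (np.arange(c) + 0.5) / c)
    var = np.full(c, x.var() + 1e-6)
    pi = np.full(c, 1.0 / c)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # E-step: marginal variance of x_k under cluster i is var_i + s2_k.
        tot = var[None, :] + s2[:, None]                      # shape (n, c)
        log_p = (np.log(pi)[None, :]
                 - 0.5 * np.log(2.0 * np.pi * tot)
                 - 0.5 * (x[:, None] - mu[None, :]) ** 2 / tot)
        log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
        r = np.exp(log_p - log_norm)                           # responsibilities r_ki
        ll = log_norm.sum()                                    # log-likelihood at current params
        # Posterior mean b_ki and variance B_ki of the latent true value given cluster i.
        gain = var[None, :] / tot
        b = mu[None, :] + gain * (x[:, None] - mu[None, :])
        B = var[None, :] * s2[:, None] / tot
        # M-step: weights, means, and deconvolved cluster variances.
        nk = r.sum(axis=0)
        pi = nk / n
        mu = (r * b).sum(axis=0) / nk
        var = (r * ((b - mu[None, :]) ** 2 + B)).sum(axis=0) / nk
        var = np.fmax(var, 1e-9)  # guard against a collapsing cluster
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return pi, mu, var, r
```

Setting every s2[k] to zero makes gain equal to 1 and B equal to 0, so the updates reduce to ordinary EM for a Gaussian mixture; a larger s2[k] lets an uncertain datum pull its cluster's mean and variance less strongly.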

Author information

Authors and Affiliations

Graduate School of Systems and Information Engineering, University of Tsukuba, Tennodai 1-1-1, Tsukuba, Ibaraki, 305-8573, Japan
Naohiko Kinoshita
Faculty of Engineering, Information and Systems, University of Tsukuba, Tennodai 1-1-1, Tsukuba, Ibaraki, 305-8573, Japan
Yasunori Endo

Corresponding author

Correspondence to Naohiko Kinoshita.

Editor information

Editors and Affiliations

School of Knowledge Science, Japan Advanced Institute of Science and Technology, Ishikawa, Japan
Van Nam Huynh
UMR CNRS 7253 Heudiasyc, University of Technology of Compiegne, Compiegne Cedex, France
Thierry Denoeux
Faculty of Information Technology, Hanoi National University of Education, Hanoi, Vietnam
Dang Hung Tran
Faculty of Information Technology, University of Engineering and Technology, Hanoi, Vietnam
Anh Cuong Le
Faculty of Information Technology, University of Engineering and Technology, Hanoi, Vietnam
Son Bao Pham

About this paper

Cite this paper

Kinoshita, N., Endo, Y. (2014). EM-Based Clustering Algorithm for Uncertain Data. In: Huynh, V., Denoeux, T., Tran, D., Le, A., Pham, S. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 245. Springer, Cham. https://doi.org/10.1007/978-3-319-02821-7_8

DOI: https://doi.org/10.1007/978-3-319-02821-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02820-0
Online ISBN: 978-3-319-02821-7
eBook Packages: Engineering, Engineering (R0)
