A Multi-objectives Genetic Algorithm Clustering Ensembles Based Approach to Summarize Relational Data

Alfred, Rayner; Chiye, Gabriel Jong; Lim, Yuto; On, Chin Kim; Obit, Joe Henry

doi:10.1007/978-981-10-2777-2_10

Part of the book series: Communications in Computer and Information Science (CCIS, volume 652)

Included in the following conference series:

International Conference on Soft Computing in Data Science

Abstract

In learning relational data, the Dynamic Aggregation of Relational Attributes algorithm can transform a multi-relational database into a vector space representation, to which a traditional clustering algorithm can then be applied directly to summarize the relational data. However, the performance of the algorithm depends heavily on the quality of the clusters produced. A small change in the initialization of the clustering algorithm's parameters may adversely affect the quality of the resulting clusters. To optimize cluster quality, a Genetic Algorithm is used to find the combination of initializations that produces the best clusters. The proposed method involves finding the best initialization with respect to the number of clusters, proximity distance measurements, fitness functions, and classifiers used for the evaluation. Based on the results obtained, clustering coupled with Euclidean distance performs better in the classification stage than clustering coupled with Cosine similarity. The findings also indicate that cluster entropy is the best fitness function for the genetic algorithm, followed by the multi-objective fitness function. This is most probably because an external measurement that takes the class label into consideration is involved in optimizing the structure of the clusters. In short, this paper shows the influence of varying the initialization values on predictive performance.
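To make the notion of a class-aware fitness function concrete, the following is a minimal sketch of cluster entropy. It assumes the common definition: the size-weighted average of the Shannon entropy of class labels within each cluster. The abstract does not state the paper's exact formula, so the function name `cluster_entropy`, its signature, and the weighting scheme are illustrative assumptions rather than the authors' implementation.

```python
import math
from collections import Counter, defaultdict


def cluster_entropy(cluster_ids, class_labels):
    """Size-weighted average entropy of class labels per cluster.

    Assumed definition (not taken from the paper): lower values mean
    purer clusters, with 0.0 when every cluster holds a single class.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")

    # Group the class labels by the cluster each record was assigned to.
    members = defaultdict(list)
    for cluster, label in zip(cluster_ids, class_labels):
        members[cluster].append(label)

    n = len(cluster_ids)
    total = 0.0
    for labels in members.values():
        size = len(labels)
        counts = Counter(labels)
        # Shannon entropy (base 2) of the class distribution in this cluster.
        h = -sum((c / size) * math.log2(c / size) for c in counts.values())
        total += (size / n) * h
    return total


# Hypothetical toy data: two clusters of two records each.
pure = cluster_entropy([0, 0, 1, 1], ["a", "a", "b", "b"])
mixed = cluster_entropy([0, 0, 1, 1], ["a", "b", "a", "b"])
```

In a genetic algorithm of the kind the abstract describes, a candidate initialization (for example, a number of clusters paired with a distance measure) could be scored by running the clustering and minimizing this value. Because the score depends on class labels, it is an external measure, in contrast to internal measures that use only the geometry of the clusters.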

Author information

Authors and Affiliations

Faculty of Computing and Informatics, Universiti Malaysia Sabah, Kota Kinabalu, Malaysia
Rayner Alfred, Gabriel Jong Chiye, Chin Kim On & Joe Henry Obit
School of Information Science, Japan Advanced Institute of Science and Technology, Nomi, Japan
Yuto Lim

Corresponding author

Correspondence to Rayner Alfred.

Editor information

Editors and Affiliations

University of Tennessee, Knoxville, Tennessee, USA
Michael W. Berry
Universiti Teknologi MARA, Shah Alam, Malaysia
Azlinah Hj. Mohamed
Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA, Shah Alam, Malaysia
Bee Wah Yap

About this paper

Cite this paper

Alfred, R., Chiye, G.J., Lim, Y., On, C.K., Obit, J.H. (2016). A Multi-objectives Genetic Algorithm Clustering Ensembles Based Approach to Summarize Relational Data. In: Berry, M., Hj. Mohamed, A., Yap, B. (eds) Soft Computing in Data Science. SCDS 2016. Communications in Computer and Information Science, vol 652. Springer, Singapore. https://doi.org/10.1007/978-981-10-2777-2_10

DOI: https://doi.org/10.1007/978-981-10-2777-2_10
Published: 18 September 2016
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2776-5
Online ISBN: 978-981-10-2777-2
eBook Packages: Computer Science, Computer Science (R0)
