Abstract
In learning relational data, the Dynamic Aggregation of Relational Attributes algorithm can transform a multi-relational database into a vector space representation, to which a traditional clustering algorithm can then be applied directly to summarize the relational data. However, the performance of the algorithm is highly dependent on the quality of the clusters produced. A small change in the initialization of the clustering algorithm's parameters may adversely affect the quality of those clusters. To optimize cluster quality, a Genetic Algorithm is used to find the combination of initializations that produces the best clusters. The proposed method searches for the best initialization with respect to the number of clusters, proximity distance measurements, fitness functions, and classifiers used for the evaluation. Based on the results obtained, clustering coupled with Euclidean distance performs better in the classification stage than clustering coupled with Cosine similarity. Cluster entropy is found to be the best fitness function for the genetic algorithm, followed by the multi-objective fitness function. This is most probably due to the involvement of an external measurement that takes the class label into consideration in optimizing the structure of the clusters. In short, this paper shows the influence of varying the initialization values on predictive performance.
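The abstract does not state how cluster entropy is computed. As an illustration only, the sketch below assumes a commonly used definition: the Shannon entropy of the class labels inside each cluster, averaged over clusters with weights proportional to cluster size. The function name `cluster_entropy` and the toy data are hypothetical and are not taken from the paper. Under this assumption a lower value indicates purer clusters, so a genetic algorithm using it as a fitness function would favour initializations that minimize it.

```python
import math
from collections import Counter


def cluster_entropy(cluster_ids, class_labels):
    """Size-weighted average of the class-label entropy within each cluster.

    Assumed definition (not confirmed by the paper). Lower is better:
    0.0 means every cluster contains items of a single class.
    """
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have equal length")
    n = len(cluster_ids)
    if n == 0:
        raise ValueError("at least one item is required")

    # Group class labels by the cluster they were assigned to.
    members = {}
    for cid, label in zip(cluster_ids, class_labels):
        members.setdefault(cid, []).append(label)

    total = 0.0
    for labels in members.values():
        size = len(labels)
        # Shannon entropy (base 2) of the class distribution in this cluster.
        h = -sum((c / size) * math.log2(c / size)
                 for c in Counter(labels).values())
        total += (size / n) * h
    return total


# Hypothetical toy data: two candidate clusterings of the same six items.
labels = ["a", "a", "a", "b", "b", "b"]
pure = [0, 0, 0, 1, 1, 1]    # clusters coincide with the classes
mixed = [0, 1, 0, 1, 0, 1]   # each cluster mixes both classes

print(cluster_entropy(pure, labels))
print(cluster_entropy(mixed, labels))
```

In this toy case the clustering that coincides with the classes scores lower than the mixed one, which is the ordering a minimizing fitness function would reward.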
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Alfred, R., Chiye, G.J., Lim, Y., On, C.K., Obit, J.H. (2016). A Multi-objectives Genetic Algorithm Clustering Ensembles Based Approach to Summarize Relational Data. In: Berry, M., Hj. Mohamed, A., Yap, B. (eds) Soft Computing in Data Science. SCDS 2016. Communications in Computer and Information Science, vol 652. Springer, Singapore. https://doi.org/10.1007/978-981-10-2777-2_10
DOI: https://doi.org/10.1007/978-981-10-2777-2_10
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2776-5
Online ISBN: 978-981-10-2777-2
eBook Packages: Computer Science, Computer Science (R0)