A Multi-objectives Genetic Algorithm Clustering Ensembles Based Approach to Summarize Relational Data

  • Conference paper
Soft Computing in Data Science (SCDS 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 652)

Abstract

In learning relational data, the Dynamic Aggregation of Relational Attributes algorithm transforms a multi-relational database into a vector space representation, to which a traditional clustering algorithm can then be applied directly to summarize the relational data. However, the performance of the algorithm depends heavily on the quality of the clusters produced, and a small change in the initialization of the clustering parameters may degrade that quality. To optimize cluster quality, a Genetic Algorithm is used to search for the combination of initializations that produces the best clusters. The proposed method searches for the best initialization with respect to the number of clusters, the proximity measure, the fitness function, and the classifier used for evaluation. In the results obtained, clustering with Euclidean distance performs better in the classification stage than clustering with Cosine similarity. Cluster entropy is the best fitness function, followed by the multi-objective fitness function used in the genetic algorithm. This is most probably because cluster entropy is an external measure that takes the class label into account when optimizing the structure of the clusters. In short, this paper shows how varying the initialization values influences predictive performance.
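The search described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`cluster_entropy`, `kmeans`, `evolve`), the truncation selection, the mutation rate, and the use of k-means as the clustering algorithm are all assumptions made for illustration, and the entropy used here is the common weighted per-cluster class-label entropy, which may differ in detail from the paper's fitness function. Each individual encodes only two of the initializations named above, the number of clusters and the proximity measure.

```python
import math
import random
from collections import Counter

def cluster_entropy(assignments, labels):
    """Size-weighted average of per-cluster class-label entropy (lower is better)."""
    n = len(labels)
    total = 0.0
    for c in set(assignments):
        members = [lab for a, lab in zip(assignments, labels) if a == c]
        probs = [m / len(members) for m in Counter(members).values()]
        total += (len(members) / n) * -sum(p * math.log2(p) for p in probs)
    return total

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb) if na and nb else 1.0

# Hypothetical registry of the two proximity measures compared in the abstract.
DISTANCES = {"euclidean": euclidean, "cosine": cosine_distance}

def kmeans(points, k, dist, rng, iters=20):
    """Plain k-means with randomly sampled initial centroids (assumed clusterer)."""
    centroids = rng.sample(points, k)
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign

def evolve(points, labels, k_range=(2, 6), pop_size=6, generations=5, seed=0):
    """Toy GA over (k, metric) pairs, minimizing cluster entropy."""
    rng = random.Random(seed)

    def fitness(ind):
        k, metric = ind
        return cluster_entropy(kmeans(points, k, DISTANCES[metric], rng), labels)

    pop = [(rng.randint(*k_range), rng.choice(list(DISTANCES))) for _ in range(pop_size)]
    for _ in range(generations):
        parents = sorted(pop, key=fitness)[: pop_size // 2]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            k = a[0]  # crossover: k from one parent, metric from the other
            if rng.random() < 0.3:  # mutation: nudge k, clamped to the allowed range
                k = min(k_range[1], max(k_range[0], k + rng.choice((-1, 1))))
            children.append((k, b[1]))
        pop = parents + children
    best = min(pop, key=fitness)
    return best, fitness(best)

if __name__ == "__main__":
    pts = [[1.0, 0.1], [1.1, 0.0], [0.9, 0.2], [1.0, 0.0],
           [0.1, 1.0], [0.0, 1.1], [0.2, 0.9], [0.0, 1.0]]
    labs = ["a"] * 4 + ["b"] * 4
    print(evolve(pts, labs))
```

Because entropy alone tends to reward larger numbers of clusters, a sketch like this would normally be paired with a second objective or a cap on k, which is one plausible motivation for a multi-objective fitness function.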

Corresponding author

Correspondence to Rayner Alfred.

Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Alfred, R., Chiye, G.J., Lim, Y., On, C.K., Obit, J.H. (2016). A Multi-objectives Genetic Algorithm Clustering Ensembles Based Approach to Summarize Relational Data. In: Berry, M., Hj. Mohamed, A., Yap, B. (eds) Soft Computing in Data Science. SCDS 2016. Communications in Computer and Information Science, vol 652. Springer, Singapore. https://doi.org/10.1007/978-981-10-2777-2_10

  • DOI: https://doi.org/10.1007/978-981-10-2777-2_10

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-2776-5

  • Online ISBN: 978-981-10-2777-2

  • eBook Packages: Computer Science, Computer Science (R0)
