Dolphins Inspired Novel Approach for Data Clustering

Advisor: 黃有評

Abstract


We are on the verge of a data explosion, and extensive measures are urgently needed to manage this enormous volume of data. Data mining is one of the emerging fields on which such efforts rely. This thesis serves the same purpose and introduces a novel technique for handling data. The first part of the thesis applies an existing technique to a large data set, and the second part proposes a novel technique. To demonstrate its validity, the proposed technique is applied to different types of data and compared with a well-established algorithm; finally, association rules are derived to provide cross comparisons.
The first part concerns depression, one of the emerging fatal diseases on the world's health care radar. Data mining techniques, namely association analysis combined with the FP-tree, are applied to a depression database. The results of the analysis identify the most common symptoms of depressed patients and also save time and effort.
In the second part, a novel algorithm is proposed that imitates herding, i.e., the way dolphins catch their prey. This nature-inspired method introduces a new concept for clustering data. It is an intelligent technique in that all data points are considered for every possible solution, yielding clusters and ideal centroids in a short time. The algorithm is demonstrated on two types of data sets: artificially generated data sets and real medical data sets. Results from different types of data are expected to offer greater diversity and to show the validity of the proposed algorithm. The results are then compared with those of the well-known fuzzy c-means (FCM) algorithm in terms of the clusters formed (the number of data items in each cluster), CPU processing time, simplicity, and coverage, and the proposed algorithm proves to be better than FCM on these measures.
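
For readers unfamiliar with the FCM baseline named above, the following is a minimal sketch of standard fuzzy c-means in plain Python. It is not the thesis's implementation and it is not the dolphin-inspired algorithm, whose details the abstract does not give. The function name `fuzzy_c_means`, the fuzzifier default `m=2.0`, the stopping tolerance, and the seed are all assumptions chosen for illustration.

```python
import random


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means on a list of equal-length numeric tuples.

    Returns (centroids, memberships). All defaults are illustrative
    assumptions, not settings taken from the thesis.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(row)
        u.append([v / total for v in row])

    centroids = []
    for _ in range(iters):
        # Centroid j is the mean of all points weighted by u[i][j] ** m.
        centroids = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            w_sum = sum(w)
            centroids.append(tuple(
                sum(w[i] * points[i][d] for i in range(n)) / w_sum
                for d in range(dim)
            ))

        # Membership update: closer centroids receive higher membership.
        shift = 0.0
        for i in range(n):
            dist = [
                max(sum((points[i][d] - ctr[d]) ** 2
                        for d in range(dim)) ** 0.5, 1e-12)
                for ctr in centroids
            ]
            for j in range(c):
                new = 1.0 / sum(
                    (dist[j] / dist[k]) ** (2.0 / (m - 1.0))
                    for k in range(c)
                )
                shift = max(shift, abs(new - u[i][j]))
                u[i][j] = new

        # Stop once no membership changed by more than tol.
        if shift < tol:
            break

    return centroids, u
```

Each data item can be assigned to the cluster with its highest membership, which gives the per-cluster item counts used as one of the comparison criteria; CPU time could be measured around the call with `time.perf_counter()`.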
After the comparison with the existing technique, association rules are derived from both the raw, unprocessed data and the clustered data, i.e., before and after clustering. This reveals how the results differ and what new information clustering brings out. In addition, the same technique is applied to the resulting clusters to examine the bonds between clusters and within clusters, and the findings suggest that the clusters fulfill the principle of clustering. The novel technique is exercised in every way so that no reservations remain about its efficiency and performance.
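
Deriving association rules before and after clustering can be illustrated with a small sketch. The code below enumerates frequent itemsets by brute force and then derives rules with their support and confidence; it deliberately does not build an FP-tree, which the thesis uses to avoid this enumeration on large data. The function names, the thresholds, and the idea of passing each cluster's records separately are assumptions for illustration, not details from the thesis.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force frequent itemsets over a list of sets (toy-scale only)."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            support = sum(1 for t in transactions if set(combo) <= t) / n
            if support >= min_support:
                result[frozenset(combo)] = support
                found = True
        # No frequent itemset of this size means none exists at larger sizes.
        if not found:
            break
    return result


def association_rules(transactions, min_support, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    freq = frequent_itemsets(transactions, min_support)
    rules = []
    for itemset, support in freq.items():
        for k in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), k):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so
                # freq[lhs] is always present.
                confidence = support / freq[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules
```

Under these assumptions, calling `association_rules` once on all records and once on the records of each cluster would give the two rule sets to compare, for example by checking which rules appear only after clustering.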

Keywords

Clustering; dolphin algorithm; FCM; association rules
