Documents Clustering Using K-Means Algorithm

R.B Wahyu, Arnold Vito

Abstract


In the digital era, people can easily access a wide range of information through the Internet and store it as documents. Given the huge number of unstructured documents holding various types of information in digital storage, people need an application that helps them organize and classify those documents automatically. Documents Clustering Using K-Means Algorithm is a desktop-based document clustering application that implements the K-Means algorithm to cluster documents by content similarity, with up to 85% accuracy measured against user expectations.
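As a rough illustration of clustering documents by content similarity, the sketch below runs K-Means over simple TF-IDF vectors. It is not the authors' implementation: the abstract does not state the weighting scheme, distance measure, or initialization, so TF-IDF weights, Euclidean distance, and a deterministic farthest-point initialization are all assumptions here, and the function names `tfidf_vectors`, `euclidean`, and `kmeans` are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn raw strings into TF-IDF vectors over a shared vocabulary."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty documents
        vectors.append([
            (counts[term] / total) * math.log(n_docs / df[term])
            for term in vocab
        ])
    return vectors


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(vectors, k, iterations=100):
    """Return one cluster label per vector.

    Assumption: centroids start from the first vector, then repeatedly add
    the vector farthest from the centroids chosen so far. The source does
    not describe its initialization; this choice just keeps runs repeatable.
    """
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(euclidean(v, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = None
    for _ in range(iterations):
        # Assignment step: each document joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: euclidean(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "football match goal team",
    "football team goal score",
    "stock market price trade",
    "stock price market bank",
]
labels = kmeans(tfidf_vectors(docs), k=2)
print(labels)
```

On this toy input the two sports documents share one label and the two finance documents share the other. A real application would also need tokenization that handles punctuation, stop-word removal, and a way to choose the number of clusters.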







DOI: http://dx.doi.org/10.33021/itfs.v3i02.589



Copyright (c) 2019 IT for Society





This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.