Category Classification of the Training Set Combined with Sentence Multiplication for Semantic Data Extraction Using GENI Algorithm

Saravana Kumar Coimbatore Shanmugam; Santhosh Rajendran; Amudhavalli Padmanabhan; Kalaiarasan Chellan

Abstract

Background: The growth of internet data has made the accuracy of data extraction a higher priority. Accuracy here means how closely the retrieved data matches what the user requested. The large data sets that must be analyzed make retrieving the required information a challenging task.

Objective: To propose a new algorithm that improves on traditional methods for classifying the category or group to which each training sentence belongs.

Method: The category to which an input sentence belongs is identified by analyzing the nouns and verbs of each training sentence. Natural language processing (NLP) is applied to each training sentence, and group or category classification is performed with the proposed GENI algorithm, so that the classifier is trained efficiently to extract the information the user requested.
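
The idea of grouping sentences by their nouns and verbs can be illustrated with a minimal sketch. This is not the GENI algorithm, whose details are not given in this abstract; it is a simple keyword-overlap stand-in. The lexicon `TOY_POS`, the function names, the category labels, and the example sentences are all hypothetical, and the lexicon replaces a real part-of-speech tagger purely for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon standing in for a real part-of-speech tagger.
TOY_POS = {
    "doctor": "NOUN", "patient": "NOUN", "medicine": "NOUN",
    "team": "NOUN", "match": "NOUN", "goal": "NOUN",
    "treats": "VERB", "prescribes": "VERB",
    "wins": "VERB", "scores": "VERB",
}


def nouns_and_verbs(sentence):
    """Return the lowercased tokens that the toy lexicon tags as noun or verb."""
    tokens = [t.strip(".,;:!?").lower() for t in sentence.split()]
    return [t for t in tokens if TOY_POS.get(t) in ("NOUN", "VERB")]


def build_profiles(training):
    """Count noun/verb occurrences per category from (sentence, category) pairs."""
    profiles = {}
    for sentence, category in training:
        profiles.setdefault(category, Counter()).update(nouns_and_verbs(sentence))
    return profiles


def categorize(sentence, profiles):
    """Pick the category whose profile overlaps most with the sentence's nouns and verbs.

    Returns None when no noun or verb of the sentence appears in any profile.
    """
    words = nouns_and_verbs(sentence)
    scores = {cat: sum(counts[w] for w in words) for cat, counts in profiles.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Hypothetical training sentences with hand-assigned categories.
training = [
    ("The doctor treats the patient.", "health"),
    ("The doctor prescribes medicine.", "health"),
    ("The team wins the match.", "sport"),
    ("The team scores a goal.", "sport"),
]
profiles = build_profiles(training)
```

With these toy profiles, `categorize("The patient needs medicine.", profiles)` selects the category whose training sentences share the most nouns and verbs with the query. A realistic setup would replace `TOY_POS` with an actual NLP tagger and a much larger training set.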

Results: The input sentences are transformed into a data table by applying the GENI algorithm for group categorization. Graphs plotted with the R tool show that the classifier using the GENI approach extracts groups with higher accuracy than Naive Bayes and Decision Trees.

Conclusion: Extracting user-requested data remains a challenging task when the user query is complex. Existing techniques rely mainly on fixed attributes, and working from those fixed attributes makes it too complex or impossible to determine the common group from the base sentence. Existing techniques are better suited to smaller data sets, whereas the proposed GENI algorithm places no restrictions on group categorization of larger data sets.

Keywords: Text classification, semantic association, supervised learning, Naive Bayes, decision trees, GENI algorithm.

DOI: https://dx.doi.org/10.2174/2213275912666190218153753
Print ISSN: 2666-2558
Online ISSN: 2666-2566
Publisher Name: Bentham Science Publisher

Recent Advances in Computer Science and Communications
