Topic Discovery Using Frequent Subgraph Mining Approach

Nguyen, Tri; Do, Phuc

doi:10.1007/978-981-10-8276-4_41

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 488)

Included in the following conference series:

International Conference on Computational Science and Technology

Abstract

Topic modeling has long been used to explore the content of documents in a dataset by searching for the hidden topics within them. Over the years, many algorithms have been developed on this basis, with major approaches such as “bag-of-words” and vector space models. These approaches mainly count how often words related to a document's topic occur and extract the topic model from those frequencies. However, they often ignore the structure of the sentence, namely the order of words, which affects the meaning of the document. In this paper, we propose a new approach to discovering the hidden topics of documents in a dataset using a co-occurrence graph. A frequent subgraph mining algorithm is then applied to model the topics. Our goal is to overcome the problem of ignored word order, which affects the meaning and topic of the document. Furthermore, we implemented this model on a large-scale distributed data processing system to speed up the complex graph computations, which are time-consuming to execute.
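
The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `window` and `min_support` parameters, and the toy documents are all assumptions made for illustration. Each document becomes a directed co-occurrence graph whose edges preserve word order, and only the simplest frequent subgraphs (single edges) are mined. A full miner such as gSpan, which the chapter's reference list mentions, would grow these edges into larger patterns.

```python
from collections import Counter


def cooccurrence_edges(text, window=1):
    """Build a directed co-occurrence graph for one document.

    Returns the set of edges (w1, w2) such that w2 appears within
    `window` tokens after w1. The direction keeps the word order.
    Tokenization is a plain whitespace split (an assumption).
    """
    tokens = text.lower().split()
    edges = set()
    for i, w1 in enumerate(tokens):
        for w2 in tokens[i + 1 : i + 1 + window]:
            if w1 != w2:
                edges.add((w1, w2))
    return edges


def frequent_edges(docs, min_support=2, window=1):
    """Return the edges that occur in at least `min_support` documents.

    Support is counted once per document, because each document graph
    is a set of edges. Single edges are the smallest frequent subgraphs.
    """
    graphs = [cooccurrence_edges(doc, window) for doc in docs]
    support = Counter(edge for graph in graphs for edge in graph)
    return {edge: count for edge, count in support.items() if count >= min_support}


# Hypothetical toy corpus, used only to exercise the functions.
docs = [
    "graph mining finds topics",
    "frequent graph mining scales",
    "topics mining graph",
]
patterns = frequent_edges(docs, min_support=2)
```

In this toy corpus the edge ("graph", "mining") is supported by two documents, while the reversed edge ("mining", "graph") from the third document is counted separately. A bag-of-words model would treat the two word pairs as identical, which is the distinction the directed graph is meant to keep.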

Acknowledgments

This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCMC) under the grant number B2017-26-02.

Author information

Authors and Affiliations

University of Information Technology, VNU-HCM, Ho Chi Minh City, Vietnam
Tri Nguyen & Phuc Do

Corresponding author

Correspondence to Phuc Do.

Editor information

Editors and Affiliations

Knowledge Technology Research Unit, Faculty of Computing and Informatics, Universiti Malaysia Sabah, Kota Kinabalu, Malaysia
Rayner Alfred
School of Information Science, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Hiroyuki Iida
Faculty of Computing and Informatics, Universiti Malaysia Sabah, Kota Kinabalu, Malaysia
Ag. Asri Ag. Ibrahim
School of Information Science, Security and Networks Area, Japan Advanced Institute of Science and Technology, Nomi, Ishikawa, Japan
Yuto Lim

About this paper

Cite this paper

Nguyen, T., Do, P. (2018). Topic Discovery Using Frequent Subgraph Mining Approach. In: Alfred, R., Iida, H., Ag. Ibrahim, A., Lim, Y. (eds) Computational Science and Technology. ICCST 2017. Lecture Notes in Electrical Engineering, vol 488. Springer, Singapore. https://doi.org/10.1007/978-981-10-8276-4_41

DOI: https://doi.org/10.1007/978-981-10-8276-4_41
Published: 24 February 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-8275-7
Online ISBN: 978-981-10-8276-4
eBook Packages: Engineering, Engineering (R0)
