Topic Discovery Using Frequent Subgraph Mining Approach

  • Conference paper
Computational Science and Technology (ICCST 2017)

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 488)

Abstract

Topic modeling has long been used to examine and explore the content of documents in a dataset by searching for the hidden topics within them. Over the years, many algorithms have been built on this idea, most of them based on the “bag-of-words” and vector space representations. These approaches mainly count how often words related to a document's topic occur and extract the topic model from those frequencies. However, they often ignore sentence structure, namely the order of words, which affects the meaning of the document. In this paper, we propose a new approach that represents the documents in a dataset as co-occurrence graphs and then applies a frequent subgraph mining algorithm to model the topics. Our goal is to keep the loss of word order from distorting the meaning and topic of the document. Furthermore, we implemented this model on a large distributed data processing system to speed up the graph computations, which are complex and time-consuming.
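
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the window size, the toy documents, and the level-wise (Apriori-style) search are all assumptions made for illustration. The sketch relies on the simplification that each word appears as one uniquely labeled node per graph, so a subgraph test reduces to an edge-subset test; a general-purpose miner such as gSpan does not depend on that assumption.

```python
from collections import Counter


def cooccurrence_edges(tokens, window=1):
    """Directed edges (earlier_word, later_word) for words at most `window` positions apart."""
    edges = set()
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            if word != tokens[j]:
                edges.add((word, tokens[j]))
    return frozenset(edges)


def frequent_subgraphs(graphs, min_support, max_edges=3):
    """Connected edge sets contained in at least `min_support` graphs.

    Simplified level-wise search (an assumption of this sketch, not gSpan):
    node labels are unique, so "is a subgraph of" is a subset test on edges.
    """
    edge_counts = Counter(edge for graph in graphs for edge in graph)
    frequent_edges = {e for e, c in edge_counts.items() if c >= min_support}

    # Level 1: every frequent single edge is a pattern.
    level = {frozenset([e]) for e in frequent_edges}
    result = {frozenset([e]): edge_counts[e] for e in frequent_edges}

    for _ in range(max_edges - 1):
        # Grow each pattern by one frequent edge that touches an existing node.
        candidates = set()
        for pattern in level:
            nodes = {node for edge in pattern for node in edge}
            for edge in frequent_edges - pattern:
                if edge[0] in nodes or edge[1] in nodes:
                    candidates.add(pattern | {edge})
        # Keep only candidates that are still frequent.
        level = set()
        for candidate in candidates:
            support = sum(1 for graph in graphs if candidate <= graph)
            if support >= min_support:
                level.add(candidate)
                result[candidate] = support
        if not level:
            break
    return result


# Toy corpus (hypothetical): the third document reverses the word order.
docs = [
    "frequent subgraph mining discovers topics",
    "we apply frequent subgraph mining",
    "mining subgraph frequent",
]
graphs = [cooccurrence_edges(doc.split()) for doc in docs]
patterns = frequent_subgraphs(graphs, min_support=2)

for pattern, support in sorted(patterns.items(), key=lambda item: len(item[0])):
    print(sorted(pattern), support)
```

Because the edges are directed, the reversed third document contributes nothing to the support of the path frequent → subgraph → mining, which is the kind of word-order sensitivity that a bag-of-words count cannot express.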

Acknowledgments

This research is funded by Vietnam National University Ho Chi Minh City (VNU-HCMC) under grant number B2017-26-02.

Author information

Correspondence to Phuc Do.

Copyright information

© 2018 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Nguyen, T., Do, P. (2018). Topic Discovery Using Frequent Subgraph Mining Approach. In: Alfred, R., Iida, H., Ag. Ibrahim, A., Lim, Y. (eds) Computational Science and Technology. ICCST 2017. Lecture Notes in Electrical Engineering, vol 488. Springer, Singapore. https://doi.org/10.1007/978-981-10-8276-4_41

  • DOI: https://doi.org/10.1007/978-981-10-8276-4_41

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-8275-7

  • Online ISBN: 978-981-10-8276-4

  • eBook Packages: Engineering, Engineering (R0)
