Using Text Mining and Link Analysis for Software Mining

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4944)

Abstract

Many data mining techniques are now in use for ontology learning: text mining, Web mining, graph mining, link analysis, relational data mining, and so on. This state-of-the-art bundle, however, lacks “software mining” techniques, a term that denotes the process of extracting knowledge from source code. In this paper we approach the software mining task with a combination of text mining and link analysis techniques. We discuss how each instance (i.e., a programming construct such as a class or a method) can be converted into a feature vector that combines information about how the instance is interlinked with other instances with information about its textual content. The resulting feature vectors serve as the basis for constructing the domain ontology with OntoGen, an existing system for semi-automatic, data-driven ontology construction.
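
The abstract does not spell out how the link information and the textual content are merged, so the following is only a minimal sketch of one plausible scheme, not the authors' method. It assumes TF-IDF weights for the text part, a neighbourhood indicator vector (including the instance itself) for the link part, and a hypothetical mixing weight `alpha`. All names (`tokenize`, `text_vectors`, `link_vectors`, `combine`) and the three example classes are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split source text into lowercase words, breaking camelCase identifiers."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", text)
    return [p.lower() for p in parts]


def normalize(vec):
    """Scale a sparse vector (dict) to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {k: w / norm for k, w in vec.items()} if norm else dict(vec)


def text_vectors(contents):
    """Map each instance name to a unit-length TF-IDF vector of its text."""
    counts = {name: Counter(tokenize(text)) for name, text in contents.items()}
    df = Counter(term for c in counts.values() for term in c)
    n = len(contents)
    return {
        name: normalize({t: tf * math.log(n / df[t]) for t, tf in c.items()})
        for name, c in counts.items()
    }


def link_vectors(names, links):
    """Map each instance to a unit-length indicator vector of itself and its
    direct neighbours (assumption: links are treated as undirected)."""
    vecs = {name: {name: 1.0} for name in names}
    for a, b in links:
        vecs[a][b] = 1.0
        vecs[b][a] = 1.0
    return {name: normalize(v) for name, v in vecs.items()}


def combine(text_vec, link_vec, alpha=0.5):
    """Concatenate both parts; square-root scaling makes the dot product of two
    combined vectors equal alpha * text_cosine + (1 - alpha) * link_cosine."""
    combined = {("text", t): math.sqrt(alpha) * w for t, w in text_vec.items()}
    combined.update(
        {("link", n): math.sqrt(1 - alpha) * w for n, w in link_vec.items()}
    )
    return combined


def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    if len(u) > len(v):
        u, v = v, u
    return sum(w * v.get(k, 0.0) for k, w in u.items())


# Hypothetical instances: class names with the identifiers found in them.
contents = {
    "Parser": "class Parser parseDocument tokenStream",
    "Tokenizer": "class Tokenizer nextToken tokenStream",
    "Logger": "class Logger writeMessage logFile",
}
links = [("Parser", "Tokenizer")]  # e.g. Parser calls Tokenizer

tv = text_vectors(contents)
lv = link_vectors(contents, links)
features = {name: combine(tv[name], lv[name]) for name in contents}
```

In this toy setup, `dot(features["Parser"], features["Tokenizer"])` exceeds `dot(features["Parser"], features["Logger"])`, because the first pair shares both vocabulary and a link. Vectors of this kind could then be handed to a clustering-based tool; how OntoGen actually consumes them is not covered by the abstract.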

Editor information

Zbigniew W. Raś, Shusaku Tsumoto, Djamel Zighed

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Grcar, M., Grobelnik, M., Mladenic, D. (2008). Using Text Mining and Link Analysis for Software Mining. In: Raś, Z.W., Tsumoto, S., Zighed, D. (eds) Mining Complex Data. MCD 2007. Lecture Notes in Computer Science, vol 4944. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68416-9_1

  • DOI: https://doi.org/10.1007/978-3-540-68416-9_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68415-2

  • Online ISBN: 978-3-540-68416-9

  • eBook Packages: Computer Science, Computer Science (R0)
