Abstract
Many data mining techniques are now used for ontology learning: text mining, Web mining, graph mining, link analysis, relational data mining, and so on. The current state-of-the-art toolkit, however, lacks “software mining” techniques, a term that denotes the process of extracting knowledge from source code. In this paper we approach the software mining task with a combination of text mining and link analysis techniques. We discuss how each instance (i.e. a programming construct such as a class or a method) can be converted into a feature vector that combines information about how the instance is interlinked with other instances with information about its (textual) content. The resulting feature vectors serve as the basis for constructing the domain ontology with OntoGen, an existing system for semi-automatic data-driven ontology construction.
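To make the idea of a combined feature vector concrete, here is a minimal sketch in Python. It is not the method described in the paper, whose details are not given in this abstract. It assumes TF-IDF weights for the textual content, a binary neighborhood vector (the instance itself plus its directly linked instances) for the link structure, and a hypothetical mixing weight `alpha`. All function names and the toy classes are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Split text into lowercase word tokens, breaking camelCase
    # identifiers such as "OrderList" into "order" and "list".
    return [w.lower() for w in re.findall(r"[A-Z]?[a-z]+", text)]


def tfidf_vectors(docs):
    # docs maps an instance name to its textual content.
    # Returns a sparse TF-IDF vector (dict) for every instance.
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    doc_freq = Counter(term for c in counts.values() for term in c)
    n = len(docs)
    return {
        name: {t: tf * math.log(n / doc_freq[t]) for t, tf in c.items()}
        for name, c in counts.items()
    }


def link_vectors(links, names):
    # Assumption: links are undirected, and every instance is described
    # by itself plus its direct neighbors, so linked instances share features.
    vecs = {name: {"link:" + name: 1.0} for name in names}
    for a, b in links:
        vecs[a]["link:" + b] = 1.0
        vecs[b]["link:" + a] = 1.0
    return vecs


def normalize(vec):
    # Scale a sparse vector to unit Euclidean length.
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {k: w / norm for k, w in vec.items()} if norm > 0 else {}


def build_feature_vectors(docs, links, alpha=0.5):
    # alpha (hypothetical parameter) weighs content against link structure.
    text = tfidf_vectors(docs)
    link = link_vectors(links, docs.keys())
    features = {}
    for name in docs:
        combined = {k: alpha * w for k, w in normalize(text[name]).items()}
        for k, w in normalize(link[name]).items():
            combined[k] = (1 - alpha) * w
        features[name] = combined
    return features


def cosine(u, v):
    # Cosine similarity between two sparse vectors.
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


# Toy input: three classes with comment-like text and one link.
docs = {
    "Order": "class Order stores order items and total price",
    "OrderList": "class OrderList keeps a list of Order objects",
    "Logger": "class Logger writes messages to a log file",
}
links = [("OrderList", "Order")]
features = build_feature_vectors(docs, links)
```

In this toy input, `Order` and `OrderList` share both a term and a link, so their vectors end up closer to each other than either is to `Logger`. Prefixing link features with `link:` keeps them from colliding with text terms in the combined vector.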
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Grcar, M., Grobelnik, M., Mladenic, D. (2008). Using Text Mining and Link Analysis for Software Mining. In: Raś, Z.W., Tsumoto, S., Zighed, D. (eds) Mining Complex Data. MCD 2007. Lecture Notes in Computer Science, vol 4944. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68416-9_1
DOI: https://doi.org/10.1007/978-3-540-68416-9_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68415-2
Online ISBN: 978-3-540-68416-9
eBook Packages: Computer Science, Computer Science (R0)