Abstract
To support the domain modeling process in model-based software development, we automatically create large networks of semantically related terms from natural language. Using part-of-speech tagging, lexical patterns, co-occurrence analysis, and several semantic improvement algorithms, we construct SemNet, a network of approximately 2.7 million single and multi-word terms and 37 million relations denoting the degree of semantic relatedness. This paper gives a comprehensive description of the construction of SemNet, provides examples of the analysis process, and compares the network to other knowledge bases. We demonstrate the application of the network within the Eclipse/Ecore modeling tools by adding semantically enhanced class name autocompletion and other semantic support facilities such as concept similarity.
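As a rough illustration of two of the techniques named in the abstract (lexical patterns and co-occurrence analysis), the following minimal sketch extracts term pairs from a toy corpus. It is not the SemNet pipeline: the corpus, the stop list (a crude stand-in for part-of-speech filtering), and the function names `hearst_pairs` and `cooccurrence_pmi` are all assumptions made for this example, and pointwise mutual information is used only as one common relatedness score, since the abstract does not say which measure the authors apply.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus; the paper works on far larger natural-language sources.
SENTENCES = [
    "vehicles such as cars and trucks need insurance",
    "a car has an engine and a driver",
    "the driver of the car pays insurance",
    "a truck has an engine",
]

# Illustrative stop list (assumption): stands in for real part-of-speech filtering.
STOPWORDS = {"a", "an", "the", "of", "and", "such", "as", "has", "need", "pays"}


def hearst_pairs(sentence):
    """Extract (hypernym, hyponym) pairs from an 'X such as Y and Z' pattern."""
    match = re.search(r"(\w+) such as (\w+)(?: and (\w+))?", sentence)
    if not match:
        return []
    hypernym, first, second = match.groups()
    return [(hypernym, hyponym) for hyponym in (first, second) if hyponym]


def cooccurrence_pmi(sentences):
    """Score term pairs by pointwise mutual information over sentence co-occurrence."""
    term_counts, pair_counts = Counter(), Counter()
    for sentence in sentences:
        # Sorting makes every pair key an alphabetically ordered tuple.
        terms = sorted({w for w in sentence.split() if w not in STOPWORDS})
        term_counts.update(terms)
        pair_counts.update(combinations(terms, 2))
    n = len(sentences)
    return {
        (a, b): math.log2((count / n) / ((term_counts[a] / n) * (term_counts[b] / n)))
        for (a, b), count in pair_counts.items()
    }


if __name__ == "__main__":
    print(hearst_pairs(SENTENCES[0]))
    for pair, score in sorted(cooccurrence_pmi(SENTENCES).items()):
        print(pair, round(score, 2))
```

In this sketch, pairs that share sentences more often than their individual frequencies would predict receive a positive score, which is the intuition behind weighting relations by degree of relatedness. A real system would replace the stop list with a tagger and apply many more patterns.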
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Agt, H., Kutsche, R.-D. (2013). Automated Construction of a Large Semantic Network of Related Terms for Domain-Specific Modeling. In: Salinesi, C., Norrie, M.C., Pastor, Ó. (eds) Advanced Information Systems Engineering. CAiSE 2013. Lecture Notes in Computer Science, vol 7908. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38709-8_39
DOI: https://doi.org/10.1007/978-3-642-38709-8_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38708-1
Online ISBN: 978-3-642-38709-8
eBook Packages: Computer Science (R0)