Published May 1, 2018 | Version v1
Conference paper Open

An Integrated Formal Representation for Terminological and Lexical Data included in Classification Schemes

  • 1. DFKI GmbH
  • 2. Saint Petersburg State University of Architecture and Civil Engineering

Description

This paper presents our work on a potential application in e-lexicography: the automated creation of specialized multilingual dictionaries from structured data available in the form of comparable multilingual classification schemes or taxonomies. As starting examples, we use comparable industry classification schemes, which frequently occur in the context of stock exchanges and business reports. Initially, we planned to follow an approach based on cross-taxonomy and cross-language string mapping to automatically detect candidate multilingual dictionary entries for this specific domain. However, it soon became apparent that the comparable classification schemes first had to be transformed into a shared formal representation language, so that their components could be properly aligned before the algorithms for multilingual lexicon extraction were implemented. We opted for the SKOS-XL vocabulary to model the multilingual terminological part of the comparable taxonomies and for OntoLex-Lemon to model the multilingual lexical entries that can be extracted from the original data. In this paper, we present the suggested modelling architecture, which demonstrates how terminological elements and lexical items can be formally integrated and explicitly cross-linked in the context of Linguistic Linked Open Data (LLOD).
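As a rough illustration of the kind of integration described above, the following minimal sketch builds a small set of RDF-style triples in plain Python. It is not the modelling architecture from the paper: the `http://example.org/industry/` namespace, the "Banks"/"Banken" labels, the helper names `build_graph` and `translation_candidates`, and the choice of `ontolex:denotes` as the cross-link are all assumptions made for this example. Only the SKOS, SKOS-XL and OntoLex-Lemon namespace IRIs and property names are taken from the published vocabularies.

```python
# Namespace IRIs of the published vocabularies.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
ONTOLEX = "http://www.w3.org/ns/lemon/ontolex#"
# Hypothetical namespace for the example data (an assumption).
EX = "http://example.org/industry/"


def build_graph(concept_id, labels):
    """Return a set of (subject, predicate, object) triples.

    `labels` maps a language tag to the category name in that language.
    Literals are represented as (text, language) tuples.
    """
    concept = EX + concept_id
    triples = {(concept, RDF + "type", SKOS + "Concept")}
    for lang, text in labels.items():
        # Terminological side: a reified SKOS-XL label per language.
        label = f"{concept}/label/{lang}"
        triples.add((concept, SKOSXL + "prefLabel", label))
        triples.add((label, RDF + "type", SKOSXL + "Label"))
        triples.add((label, SKOSXL + "literalForm", (text, lang)))

        # Lexical side: an OntoLex-Lemon entry with a canonical form.
        slug = text.lower().replace(" ", "-")
        entry = f"{EX}lex/{lang}/{slug}"
        form = entry + "/form"
        triples.add((entry, RDF + "type", ONTOLEX + "LexicalEntry"))
        triples.add((entry, ONTOLEX + "canonicalForm", form))
        triples.add((form, RDF + "type", ONTOLEX + "Form"))
        triples.add((form, ONTOLEX + "writtenRep", (text, lang)))

        # Cross-link (assumed here): the entry denotes the concept.
        triples.add((entry, ONTOLEX + "denotes", concept))
    return triples


def translation_candidates(triples, concept):
    """Collect written forms, keyed by language, of entries denoting `concept`."""
    entries = {s for s, p, o in triples
               if p == ONTOLEX + "denotes" and o == concept}
    forms = {o for s, p, o in triples
             if p == ONTOLEX + "canonicalForm" and s in entries}
    return {o[1]: o[0] for s, p, o in triples
            if p == ONTOLEX + "writtenRep" and s in forms}


# Made-up example category with an English and a German label.
graph = build_graph("banks", {"en": "Banks", "de": "Banken"})
print(translation_candidates(graph, EX + "banks"))
```

Under these assumptions, lexical entries in different languages that denote the same aligned concept can be read off as candidate dictionary entries, which is why a shared representation is useful before any string-based mapping is attempted.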

Notes

Supported by the BMBF project "DeepLee - Tiefes Lernen fuer End-to-End-Anwendungen in der Sprachtechnologie" (grant number 01W17001).

Files

integrated-formal-representation-2(1).pdf

358.4 kB
md5:ac690acc4b3c2251e39291869d292f54

Additional details

Funding

ELEXIS – European Lexicographic Infrastructure 731015
European Commission