Cross-lingual link discovery with TR-ESA
Section snippets
Introduction and motivations
The Linked Data paradigm has been proposed as a way to publish structured data on the web so that it can be easily consumed by third-party applications [18]. Several tools can be used to transform data into the Resource Description Framework (RDF), a format compliant with the Linked Data principles. However, publishing an RDF dataset on the web is not sufficient to realize the vision of Linked Data. To interconnect two datasets, a data linking task has to be performed. The task
A semantic matching function for short textual descriptions
Matching two or more texts is essential for several artificial-intelligence tasks, such as classification, clustering, filtering, and retrieval. Text matching can be implemented as simple string matching, which analyzes the lexical overlap between two texts, or it can also take their semantics into account.
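As a minimal illustration of the purely lexical case, the sketch below scores two texts by the Jaccard overlap of their word sets. Jaccard, the tokenizer, and the example strings are our own assumptions, not taken from the paper. The point is that an English service description and its Italian equivalent share no tokens, so a lexical matcher cannot relate them.

```python
from __future__ import annotations

import re


def tokens(text: str) -> set[str]:
    """Lowercased word tokens; \\w is Unicode-aware, so accented letters survive."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Lexical overlap |A & B| / |A | B| between the token sets of two texts."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


en = "Waste collection service"
it = "Servizio di raccolta rifiuti"  # hypothetical Italian description of the same service
print(jaccard(en, it))
print(jaccard(en, "Garden waste collection"))
```

Here the English/Italian pair scores 0.0 while the English paraphrase scores 0.5 (two shared tokens out of four distinct ones), which is exactly the gap a semantic, cross-lingual matching function has to close.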
In this section, we present a semantic matching function able to handle short textual content in different languages. The data linking strategy adopted in the paper is based on this
Interactive cross-lingual data linking
TR-ESA is used as the feature-generation and matching method in the interactive approach to cross-lingual data linking [16], [26], [27] proposed in this paper.
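The translate-then-represent idea behind TR-ESA can be sketched in a few lines under heavy simplifying assumptions: a hypothetical word-by-word lexicon (`IT_EN`) stands in for machine translation into English, a three-article toy corpus (`ARTICLES`) stands in for Wikipedia, concept weights are plain tf-idf, and the resulting concept vectors are compared with cosine similarity. All names and data are illustrative; this is not the paper's implementation.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict

# Hypothetical stand-in for machine translation: a tiny Italian -> English
# lexicon. Unknown words are passed through unchanged.
IT_EN = {"servizio": "service", "raccolta": "collection", "rifiuti": "waste",
         "iscrizione": "enrolment", "scuola": "school"}

# Hypothetical toy "Wikipedia": concept title -> article text.
ARTICLES = {
    "Waste collection": "waste collection bins refuse collection service",
    "School": "school pupils teachers enrolment education",
    "Recycling": "recycling waste materials",
}


def words(text: str) -> list[str]:
    return re.findall(r"\w+", text.lower())


def translate(text: str, lexicon: dict[str, str]) -> str:
    """Word-by-word translation (assumption: replaces a real MT engine)."""
    return " ".join(lexicon.get(w, w) for w in words(text))


def build_index(articles: dict[str, str]) -> dict[str, dict[str, float]]:
    """Inverted index: word -> {concept: tf-idf of the word in that article}."""
    n = len(articles)
    tf = {c: Counter(words(t)) for c, t in articles.items()}
    df = Counter(w for counts in tf.values() for w in counts)
    index: dict[str, dict[str, float]] = defaultdict(dict)
    for concept, counts in tf.items():
        for w, f in counts.items():
            index[w][concept] = f * math.log(n / df[w])
    return index


def esa_vector(text: str, index) -> Counter:
    """Concept vector: each word adds its concept weights, scaled by its count."""
    vec: Counter = Counter()
    for w, f in Counter(words(text)).items():
        for concept, weight in index.get(w, {}).items():
            vec[concept] += f * weight
    return vec


def cosine(u: Counter, v: Counter) -> float:
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def tr_esa_similarity(src: str, tgt: str, lexicon, index) -> float:
    """Translate the source text, then compare the two ESA concept vectors."""
    return cosine(esa_vector(translate(src, lexicon), index),
                  esa_vector(tgt, index))


idx = build_index(ARTICLES)
print(tr_esa_similarity("Servizio di raccolta rifiuti", "Waste collection service", IT_EN, idx))
print(tr_esa_similarity("Servizio di raccolta rifiuti", "School enrolment", IT_EN, idx))
```

With this toy data, "Servizio di raccolta rifiuti" reaches a cosine of about 1.0 against "Waste collection service" and 0.0 against "School enrolment", although its lexical overlap with either English text is zero. Untranslated words such as "di" simply match no concept and contribute nothing.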
Definition 1 (Cross-lingual data linking) Let S and T be two sets of resources, called the source (S) and target (T) datasets, described in two different languages L1 and L2, respectively. Let R be a set of relations between resources in S and T. A cross-lingual data linking task is a partial function l: S × T → R, defined as follows:
CroSeR for cross-lingual linking of e-gov services
The cross-lingual link discovery approach described in Section 3 was implemented in a system named CroSeR (Cross-lingual Service Retrieval). CroSeR supports users in the specific task of linking e-gov services described in different languages. In this domain, S is the source service catalog, T is the target service catalog, and R is the set of relations {owl:sameAs, skos:narrowMatch, skos:broadMatch}. The target catalog T is the European Local Government Service List (LGSL).
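How CroSeR stores and proposes links is not detailed in this snippet, so the following is only a minimal sketch under our own assumptions. `LinkSet` models the partial function l of Definition 1 as a dictionary restricted to the relation set R, and `recommend` ranks candidate targets with any pairwise similarity function so that a user can confirm one of them. The function names, the identifiers `lgsl:1` and `it:service-42`, and the word-overlap similarity are hypothetical placeholders.

```python
from __future__ import annotations

from typing import Callable, Optional

# The relation set R used by CroSeR, as listed in the text.
RELATIONS = {"owl:sameAs", "skos:narrowMatch", "skos:broadMatch"}


def recommend(source: str, targets: dict[str, str],
              sim: Callable[[str, str], float], k: int = 3) -> list[tuple[str, float]]:
    """Rank target resources by similarity to the source description, keep top k."""
    scored = [(tid, sim(source, desc)) for tid, desc in targets.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # stable: ties keep input order
    return scored[:k]


class LinkSet:
    """The partial function l: S x T -> R; pairs without a link are simply absent."""

    def __init__(self) -> None:
        self._links: dict[tuple[str, str], str] = {}

    def add(self, s: str, t: str, relation: str) -> None:
        """Record a link confirmed by the user; only relations in R are accepted."""
        if relation not in RELATIONS:
            raise ValueError(f"relation not in R: {relation}")
        self._links[(s, t)] = relation

    def __call__(self, s: str, t: str) -> Optional[str]:
        """Return l(s, t), or None where l is undefined."""
        return self._links.get((s, t))


def overlap(a: str, b: str) -> float:
    """Placeholder similarity (shared words); a TR-ESA scorer would go here."""
    return float(len(set(a.split()) & set(b.split())))


# Hypothetical target entries and an already-translated source description.
targets = {"lgsl:1": "waste collection", "lgsl:2": "school enrolment",
           "lgsl:3": "garden waste collection"}
top = recommend("bulky waste collection", targets, overlap, k=2)
links = LinkSet()
links.add("it:service-42", top[0][0], "owl:sameAs")  # the user confirms the first candidate
print(top)
```

Keeping the similarity function as a parameter separates candidate generation from link storage, so a cross-lingual scorer can replace the placeholder without changing the rest.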
Experimental evaluation
We carried out two experimental sessions: an in-vitro experiment to identify the best system configuration, and an in-vivo experiment in which CroSeR was used to help human experts link an Italian catalog of e-gov services to the LGSL.
We tested our approach in the e-gov domain for several reasons:
- First, linking public service descriptions is a real-world problem of interest to many governments involved in Open Data initiatives. Linking public services is an objective of
Related work
To better scope the problem addressed in this paper, we adopt the distinction between multi-language information access (MLIA) and cross-lingual information access (CLIA) proposed in the literature. MLIA is the problem of accessing, querying, and retrieving information from collections in any language and at any level of specificity [43]. In this sense, MLIA subsumes CLIA, which is the problem of accessing a data collection in a target language L′ by using a source language L, where L ≠ L′.
Conclusions and future work
In this paper, we presented a cross-lingual link discovery approach based on an effective method for matching short textual descriptions written in different languages. Our matching method is based on TR-ESA, a translation-based version of Explicit Semantic Analysis that machine-translates the input text and generates a Wikipedia-based representation of it. This matching method is used to recommend potential cross-lingual links to users of a web application by
References (53)
- et al., Using semantic data to improve cross-lingual linking of article clusters, Web Semant. (2015).
- et al., Challenges for the multilingual web of data, Web Semant. (2012).
- et al., Concept-based item representations for a cross-lingual content-based recommendation process, Inf. Sci. (2016).
- On link discovery using a hybrid approach, J. Data Semant. (2012).
- WeSeE-Match results for OEAI 2012, Proceedings of the 7th International Workshop on Ontology Matching (OM 2012) (2012).
- et al., Overview of the NTCIR-10 cross-lingual link discovery task, Proceedings of the Tenth NTCIR Workshop Meeting, page to appear, NII, Tokyo (2013).
- et al., Bridging the gap between citizens and local administrations with knowledge-based service bundle recommendations, 24th International Workshop on Database and Expert Systems Applications, DEXA 2013, Prague, Czech Republic, August 26–29, 2013 (2013).
- et al., Latent Dirichlet allocation, J. Mach. Learn. Res. (2003).
- et al., Analysis and refinement of cross-lingual entity linking, Information Access Evaluation. Multilinguality, Multimodality, and Visual Analytics (2012).
- et al., Quality-based model for effective and robust multi-user pay-as-you-go ontology matching, Semant. Web (2015).
- Indexing by latent semantic analysis, JASIS.
- Large-scale linked data integration using probabilistic reasoning and crowdsourcing, VLDB J.
- Item-based top-n recommendation algorithms, ACM Trans. Inf. Syst.
- Comparing taxonomies for organising collections of documents.
- TAGME: On-the-fly annotation of short text fragments (by Wikipedia entities), Proceedings of the 19th ACM International Conference on Information and Knowledge Management.
- Wikipedia-based semantic interpretation for natural language processing, J. Artif. Intell. Res.
- Monolingual and cross-lingual ontology matching with CIDER-CL: evaluation report for OAEI 2013.
- When owl:sameAs isn’t the same: an analysis of identity in linked data, The Semantic Web–ISWC 2010.
- Linked Data: Evolving the Web into a Global Data Space.
- Dictionary-based cross-language information retrieval: learning experiences from CLEF 2000–2002, Inf. Retr.
- Cross-lingual lexical matching with word translation and local similarity optimization, Proceedings of the 10th International Conference on Semantic Systems, SEMANTiCS 2015, Vienna, Austria, September.
- Effectiveness of automatic translations for cross-lingual ontology mapping, J. Artif. Intell. Res.
- WikiMatch - using Wikipedia for ontology matching, Proceedings of the 7th International Workshop on Ontology Matching (OM 2012).