research-article

RDF shape induction using knowledge base profiling

Authors:
Nandana Mihindukulasooriya, Universidad Politécnica de Madrid, Spain
Mohammad Rifat Ahmmad Rashid, Politecnico di Torino, Italy
Giuseppe Rizzo, Istituto Superiore Mario Boella, Italy
Raúl García-Castro, Universidad Politécnica de Madrid, Spain
Oscar Corcho, Universidad Politécnica de Madrid, Spain
Marco Torchiano, Politecnico di Torino, Italy

SAC '18: Proceedings of the 33rd Annual ACM Symposium on Applied Computing, April 2018, Pages 1952–1959, https://doi.org/10.1145/3167132.3167341

Published: 9 April 2018

ABSTRACT

Knowledge Graphs (KGs) are becoming the core of most artificial intelligence and cognitive applications. Popular KGs such as DBpedia and Wikidata have chosen the RDF data model to represent their data. Despite its advantages, RDF data poses challenges, for example, for data validation. Ontologies that specify domain conceptualizations for RDF data are designed for entailment rather than validation, and most of them lack the granular information needed to validate constraints. Recent work on RDF Shapes and the standardization of languages such as SHACL and ShEx provide better mechanisms for representing integrity constraints for RDF data. However, manually creating constraints for large KGs is still a tedious task. In this paper, we present a data-driven approach for inducing integrity constraints for RDF data using data profiling. These constraints can be combined into RDF Shapes and used to validate RDF graphs. Our method uses machine learning techniques to automatically generate RDF Shapes, with profiled RDF data as features. In the experiments, the proposed approach achieved 97% precision in deriving RDF Shapes with cardinality constraints for a subset of DBpedia data.
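The abstract describes a pipeline in which profiling features feed a learner that predicts cardinality constraints, which are then assembled into RDF Shapes. The Python sketch that follows illustrates that pipeline under stated assumptions; it is not the authors' implementation. Triples are assumed to be plain (subject, predicate, object) strings using prefixed names. The profile features (`coverage`, `single_ratio`, `max`) are illustrative choices, and the machine-learning classifiers described in the paper are replaced by a hypothetical fixed threshold rule (`threshold=0.95`) so the example stays dependency-free. The function names `profile`, `induce_cardinality` and `to_shacl`, the toy data and the `ex:` namespace are likewise hypothetical.

```python
"""Sketch: profile RDF triples and induce SHACL cardinality constraints.

Assumption: a fixed threshold rule stands in for the trained
machine-learning classifier described in the paper.
"""
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"

# Hypothetical toy data: (subject, predicate, object) using prefixed names.
EXAMPLE_TRIPLES = [
    ("ex:alice", RDF_TYPE, "ex:Person"),
    ("ex:bob", RDF_TYPE, "ex:Person"),
    ("ex:carol", RDF_TYPE, "ex:Person"),
    ("ex:alice", "ex:name", '"Alice"'),
    ("ex:bob", "ex:name", '"Bob"'),
    ("ex:carol", "ex:name", '"Carol"'),
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:alice", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:knows", "ex:alice"),
]


def profile(triples, cls):
    """Per-property profile features for the instances of `cls`.

    coverage:     fraction of instances with at least one value
    single_ratio: among instances using the property, fraction with exactly one value
    max:          largest number of values observed on one instance
    """
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    counts = defaultdict(Counter)  # counts[prop][subject] -> number of values
    for s, p, _ in triples:
        if s in instances and p != RDF_TYPE:
            counts[p][s] += 1
    features = {}
    for prop, per_subject in counts.items():
        used = list(per_subject.values())  # every entry is >= 1
        features[prop] = {
            "coverage": len(used) / len(instances),
            "single_ratio": used.count(1) / len(used),
            "max": max(used),
        }
    return features


def induce_cardinality(features, threshold=0.95):
    """Hypothetical threshold rule standing in for a trained classifier.

    Only coverage and single_ratio are used; a learned model could use all features.
    Returns {property: (minCount or None, maxCount or None)}.
    """
    constraints = {}
    for prop, f in features.items():
        min_count = 1 if f["coverage"] >= threshold else None
        max_count = 1 if f["single_ratio"] >= threshold else None
        if min_count is not None or max_count is not None:
            constraints[prop] = (min_count, max_count)
    return constraints


def to_shacl(cls, constraints):
    """Serialise the induced constraints as a SHACL NodeShape in Turtle."""
    parts = [f"sh:targetClass {cls}"]
    for prop in sorted(constraints):
        min_count, max_count = constraints[prop]
        body = [f"sh:path {prop}"]
        if min_count is not None:
            body.append(f"sh:minCount {min_count}")
        if max_count is not None:
            body.append(f"sh:maxCount {max_count}")
        parts.append("sh:property [ " + " ; ".join(body) + " ]")
    header = (
        "@prefix sh: <http://www.w3.org/ns/shacl#> .\n"
        "@prefix ex: <http://example.org/> .\n\n"
    )
    return header + f"{cls}Shape a sh:NodeShape ;\n    " + " ;\n    ".join(parts) + " .\n"


if __name__ == "__main__":
    cls = "ex:Person"
    print(to_shacl(cls, induce_cardinality(profile(EXAMPLE_TRIPLES, cls))))
```

Run as a script, the sketch prints a SHACL shape requiring exactly one `ex:name` per `ex:Person`. `ex:knows` receives no constraint, because only two of the three instances use it and one of them has two values. In a learning setting, the same per-property features would instead be passed to a classifier trained on labelled examples rather than compared against a fixed threshold.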

Index Terms

- Computing methodologies → Artificial intelligence → Knowledge representation and reasoning
- Information systems → Data management systems → Information integration → Data cleaning

Published in
SAC '18: Proceedings of the 33rd Annual ACM Symposium on Applied Computing
April 2018
2327 pages
ISBN: 9781450351911
DOI: 10.1145/3167132
Conference Chairs:
Hisham M. Haddad, Kennesaw State University
Roger L. Wainwright, University of Tulsa
Richard Chbeir, University of Pau & Pays Adour, France
Copyright © 2018 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
RDF shape
data quality
knowledge base
machine learning

Acceptance Rates
Overall acceptance rate: 1,650 of 6,669 submissions, 25%
Article Metrics
- Total citations: 15
- Total downloads: 176
- Downloads (last 12 months): 21
- Downloads (last 6 weeks): 1