DOI: 10.1145/3167132.3167341
research-article

RDF shape induction using knowledge base profiling

Published: 09 April 2018

ABSTRACT

Knowledge Graphs (KGs) are becoming the core of most artificial intelligence and cognitive applications. Popular KGs such as DBpedia and Wikidata have chosen the RDF data model to represent their data. Despite its advantages, RDF data poses challenges, for example in data validation. Ontologies for specifying domain conceptualizations in RDF data are designed for entailment rather than validation, and most lack the granular information needed for validating constraints. Recent work on RDF Shapes and the standardization of languages such as SHACL and ShEx provide better mechanisms for representing integrity constraints for RDF data. However, manually creating constraints for large KGs is still a tedious task. In this paper, we present a data-driven approach for inducing integrity constraints for RDF data using data profiling. These constraints can be combined into RDF Shapes and used to validate RDF graphs. Our method uses machine learning techniques to automatically generate RDF Shapes, with profiled RDF data as features. In our experiments, the proposed approach achieved 97% precision in deriving RDF Shapes with cardinality constraints for a subset of DBpedia data.
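To make the idea of inducing cardinality constraints from profiled data concrete, here is a minimal Python sketch. It is not the method of the paper: the abstract describes machine learning over profiled features, whereas this sketch swaps in a fixed-threshold heuristic purely for illustration. The function names, the 0.95 threshold, the example.org IRIs, and the toy triples are all assumptions made up for this example.

```python
from collections import Counter

SH_PREFIX = "@prefix sh: <http://www.w3.org/ns/shacl#> ."

# Toy data invented for this sketch (not taken from DBpedia).
PERSON = "http://example.org/Person"
BIRTH_DATE = "http://example.org/birthDate"
NAME = "http://example.org/name"
TOY_SUBJECTS = ["ex:a", "ex:b", "ex:c", "ex:d"]
TOY_TRIPLES = [
    ("ex:a", BIRTH_DATE, "1901-01-01"),
    ("ex:b", BIRTH_DATE, "1902-02-02"),
    ("ex:c", BIRTH_DATE, "1903-03-03"),
    ("ex:d", BIRTH_DATE, "1904-04-04"),
    ("ex:a", NAME, "A"),
    ("ex:a", NAME, "Alpha"),
    ("ex:b", NAME, "B"),
    ("ex:d", NAME, "D"),
]


def profile_cardinalities(triples, subjects, prop):
    """Profile step: count how many values each subject has for `prop`."""
    counts = Counter({s: 0 for s in subjects})  # keep subjects with zero values
    for s, p, _ in triples:
        if p == prop and s in counts:
            counts[s] += 1
    return counts


def induce_cardinality(counts, threshold=0.95):
    """Hypothetical heuristic (stand-in for a trained classifier).

    Returns (min_count, max_count); max_count is None when unbounded.
    """
    total = len(counts)
    if total == 0:
        return 0, None
    at_least_one = sum(1 for c in counts.values() if c >= 1) / total
    at_most_one = sum(1 for c in counts.values() if c <= 1) / total
    min_count = 1 if at_least_one >= threshold else 0
    max_count = 1 if at_most_one >= threshold else None
    return min_count, max_count


def to_shacl(shape_iri, target_class, prop, min_count, max_count):
    """Serialize the induced bounds as a SHACL node shape in Turtle."""
    body = [f"sh:path <{prop}>"]
    if min_count > 0:
        body.append(f"sh:minCount {min_count}")
    if max_count is not None:
        body.append(f"sh:maxCount {max_count}")
    return (
        f"{SH_PREFIX}\n"
        f"<{shape_iri}> a sh:NodeShape ;\n"
        f"    sh:targetClass <{target_class}> ;\n"
        f"    sh:property [ {' ; '.join(body)} ] ."
    )


if __name__ == "__main__":
    counts = profile_cardinalities(TOY_TRIPLES, TOY_SUBJECTS, BIRTH_DATE)
    lo, hi = induce_cardinality(counts)
    print(to_shacl("http://example.org/PersonShape", PERSON, BIRTH_DATE, lo, hi))
```

On the toy data, every subject has exactly one birth date, so the heuristic proposes a minimum and maximum cardinality of one. The name property has zero, one, or two values per subject and is left unconstrained. A learned model, as described in the abstract, would replace the fixed threshold in `induce_cardinality`.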


Published in

SAC '18: Proceedings of the 33rd Annual ACM Symposium on Applied Computing
April 2018
2327 pages
ISBN: 9781450351911
DOI: 10.1145/3167132

Copyright © 2018 ACM


Publisher

Association for Computing Machinery, New York, NY, United States


Acceptance Rates

Overall Acceptance Rate: 1,650 of 6,669 submissions, 25%
