DOI: 10.1145/2660517.2660525
research-article

Representing dataset quality metadata using multi-dimensional views

Published: 04 September 2014

ABSTRACT

Data quality is commonly defined as fitness for use. Many data consumers face the problem of identifying the quality of data, and data publishers often lack the means to identify quality problems in their data. To make the task easier for both stakeholders, we have developed the Dataset Quality Ontology (daQ). daQ is a core vocabulary for representing the results of quality benchmarking of a linked dataset. It represents quality metadata as multi-dimensional and statistical observations using the Data Cube vocabulary. Quality metadata are organised as a self-contained graph, which can, e.g., be embedded into linked open datasets. We discuss the design considerations, give examples for extending daQ by custom quality metrics, and present use cases such as analysing data versions, browsing datasets by quality, and link identification. Finally, we discuss how data cube visualisation tools enable data publishers and consumers to better analyse the quality of their data.
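To illustrate the idea of quality metadata as multi-dimensional observations, the following is a minimal sketch in plain Python, not the authors' implementation. Only the Data Cube namespace and its `Observation` class come from the W3C vocabulary; the `http://example.org/quality#` namespace, the property names (`computedOn`, `metric`, `version`, `value`), the metric names, and the sample values are all hypothetical placeholders and should not be read as daQ's actual terms.

```python
from collections import defaultdict

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"   # W3C RDF Data Cube namespace
EX = "http://example.org/quality#"         # hypothetical namespace (assumption)


def observation(obs_id, dataset, metric, version, value):
    """Return the triples describing one quality observation.

    The dataset, metric and version act as dimensions; the value is the measure.
    All EX property names are illustrative assumptions.
    """
    subject = EX + obs_id
    return [
        (subject, RDF_TYPE, QB + "Observation"),
        (subject, EX + "computedOn", dataset),
        (subject, EX + "metric", metric),
        (subject, EX + "version", version),
        (subject, EX + "value", value),
    ]


def slice_by(triples, dimension):
    """Group the measured values by the given dimension property."""
    dims, values = {}, {}
    for subject, predicate, obj in triples:
        if predicate == dimension:
            dims[subject] = obj
        elif predicate == EX + "value":
            values[subject] = obj
    view = defaultdict(list)
    for subject, dim_value in dims.items():
        view[dim_value].append(values[subject])
    return dict(view)


# A small, self-contained "quality graph" with made-up sample values.
graph = []
graph += observation("o1", EX + "dataset", EX + "MetricA", "v1", 0.8)
graph += observation("o2", EX + "dataset", EX + "MetricB", "v1", 0.6)
graph += observation("o3", EX + "dataset", EX + "MetricA", "v2", 0.9)

by_version = slice_by(graph, EX + "version")
by_metric = slice_by(graph, EX + "metric")
```

Slicing the same observations by `version` or by `metric` gives two views of one graph, which is the kind of analysis (for example, comparing data versions) that a multi-dimensional representation is meant to support.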


    Published in

      SEM '14: Proceedings of the 10th International Conference on Semantic Systems
      September 2014
      161 pages
      ISBN: 9781450329279
      DOI: 10.1145/2660517

      Copyright © 2014 ACM


      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Acceptance Rates

      SEM '14 paper acceptance rate: 22 of 59 submissions, 37%. Overall acceptance rate: 22 of 59 submissions, 37%.
