ABSTRACT
Data quality is commonly defined as fitness for use. Many data consumers face the problem of identifying the quality of data, and data publishers often do not have the means to identify quality problems in their data. To make the task easier for both stakeholders, we have developed the Dataset Quality Ontology (daQ). daQ is a core vocabulary for representing the results of quality benchmarking of a linked dataset. It represents quality metadata as multi-dimensional and statistical observations using the Data Cube vocabulary. Quality metadata are organised as a self-contained graph, which can, e.g., be embedded into linked open datasets. We discuss the design considerations, give examples for extending daQ with custom quality metrics, and present use cases such as analysing data versions, browsing datasets by quality, and link identification. We finally discuss how data cube visualisation tools enable data publishers and consumers to better analyse the quality of their data.