Abstract
With increased utilization of data within their operational and strategic processes, enterprises need to ensure data quality and accuracy. Data curation is a process that can ensure the quality of data and its fitness for use. Traditional approaches to curation are struggling with increased data volumes and near real-time demands for curated data. In response, curation teams have turned to community crowd-sourcing and semi-automated metadata tools for assistance. This chapter provides an overview of data curation, discusses the business motivations for curating data, and investigates the role of community-based data curation, focusing on internal communities and pre-competitive data collaborations. The chapter is supported by case studies from Wikipedia, The New York Times, Thomson Reuters, Protein Data Bank and ChemSpider, from which best practices for both the social and technical aspects of community-driven data curation are drawn.
Copyright information
© 2010 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Curry, E., Freitas, A., O’Riáin, S. (2010). The Role of Community-Driven Data Curation for Enterprises. In: Wood, D. (eds) Linking Enterprise Data. Springer, Boston, MA. https://doi.org/10.1007/978-1-4419-7665-9_2
DOI: https://doi.org/10.1007/978-1-4419-7665-9_2
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4419-7664-2
Online ISBN: 978-1-4419-7665-9
eBook Packages: Computer Science, Computer Science (R0)