Abstract
Sharing structured data today requires standardizing on a single schema, then mapping and cleaning all of the data, which yields a single queryable mediated data instance. However, in settings where structured data is collaboratively authored by a large community, e.g., in the sciences, there is often a lack of consensus about how it should be represented, what is correct, and which sources are authoritative. Moreover, such data is seldom static: it is frequently updated, cleaned, and annotated. The ORCHESTRA collaborative data sharing system develops a new architecture and consistency model for such settings, based on the needs of data sharing in the life sciences. In this paper we describe the basic architecture and implementation of the ORCHESTRA system, and summarize some of the open challenges that arise in this setting.
Index Terms
- The ORCHESTRA Collaborative Data Sharing System