ABSTRACT
In a number of application areas, distributed database systems can be used to provide persistent storage of data while providing efficient access to both local and remote data. As the number of sites (computers) involved in a query increases, so does the probability of failure at query time. Recovery has previously focused only on database updates, while query failures have been handled by completely restarting the query. This technique is not always applicable in the context of large queries and queries with deadlines. In this paper we present an approach for partial restart of queries that incurs minimal extra network traffic during query recovery. Based on results from experiments on an implementation of the partial restart technique in a distributed database system, we demonstrate its applicability and a significant reduction of query cost in the presence of failures.
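The claim that failure probability grows with the number of participating sites can be illustrated with a minimal sketch. It assumes, purely for illustration, that each site fails independently with the same probability during the query; the function name and the 1% per-site figure are hypothetical and are not taken from the paper.

```python
def prob_any_failure(p_site: float, n_sites: int) -> float:
    """Probability that at least one of n_sites fails during a query.

    Assumes independent, identically likely site failures (an
    illustrative simplification, not the paper's failure model).
    """
    return 1.0 - (1.0 - p_site) ** n_sites


if __name__ == "__main__":
    # Hypothetical per-site failure probability of 1% per query.
    for n in (1, 10, 100):
        print(n, prob_any_failure(0.01, n))
```

Under these assumptions the probability of at least one failure rises steadily as sites are added, which is why restarting a large query from scratch becomes increasingly costly.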
Index Terms
- PROQID: partial restarts of queries in distributed databases