Abstract
This paper describes a parallel database load prototype for Digital's Rdb database product. The prototype takes a dataflow approach to database parallelism. It includes an explorer that discovers the cluster configuration and records it in a database, a client CUI interface that gathers the load job description from the user and from the Rdb catalogs, and an optimizer that picks the best parallel execution plan and records it in a web data structure. The web describes the data operators, the dataflow rivers among them, and the binding of operators to processes, processes to processors, and files to discs and tapes. This paper describes the optimizer's cost-based hierarchical optimization strategy in some detail. The prototype executes the web's plan by spawning a web manager process at each node of the cluster. The managers create the local executor processes and orchestrate startup, phasing, checkpoint, and shutdown. Each executor process performs one or more operators. Data flows among the operators via memory-to-memory streams within a node, and via web-manager multiplexed TCP/IP streams among nodes. The design of the transaction and checkpoint/restart mechanisms is also described. Preliminary measurements indicate that this design will give excellent scaleups.
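To make the web concrete, the following is a minimal sketch of how such a structure might be represented. It is not the prototype's actual data layout: the class names, field names, node names, and file names are all hypothetical, chosen only to mirror the bindings listed in the abstract (operators to processes, processes to processors, files to discs and tapes) and the rule that rivers use memory streams within a node and TCP/IP streams between nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Operator:
    name: str       # hypothetical operator label, e.g. "scan" or "sort"
    process: str    # id of the executor process that runs this operator


@dataclass
class Process:
    pid: str        # hypothetical process id
    node: str       # cluster node hosting the process
    processor: int  # processor on that node the process is bound to


@dataclass
class River:
    source: str     # name of the producing operator
    sink: str       # name of the consuming operator


@dataclass
class Web:
    operators: dict = field(default_factory=dict)  # operator name -> Operator
    processes: dict = field(default_factory=dict)  # process id -> Process
    rivers: list = field(default_factory=list)     # dataflow edges
    files: dict = field(default_factory=dict)      # file name -> disc or tape

    def node_of(self, op_name):
        """Follow operator -> process -> node."""
        return self.processes[self.operators[op_name].process].node

    def transport(self, river):
        """Memory stream within a node, TCP/IP stream across nodes."""
        same_node = self.node_of(river.source) == self.node_of(river.sink)
        return "memory" if same_node else "tcp/ip"

    def processes_by_node(self):
        """Group process ids by node, as a per-node manager would need."""
        plan = {}
        for proc in self.processes.values():
            plan.setdefault(proc.node, []).append(proc.pid)
        return plan


def example_web():
    """Build a tiny three-operator load plan spread over two nodes."""
    web = Web()
    web.processes = {
        "p1": Process("p1", "nodeA", 0),
        "p2": Process("p2", "nodeA", 1),
        "p3": Process("p3", "nodeB", 0),
    }
    web.operators = {
        "scan": Operator("scan", "p1"),
        "sort": Operator("sort", "p2"),
        "insert": Operator("insert", "p3"),
    }
    web.rivers = [River("scan", "sort"), River("sort", "insert")]
    web.files = {"input.dat": "tape0", "table.rdb": "disc1"}
    return web
```

In this sketch, the river from `scan` to `sort` stays on one node and would use a memory stream, while the river from `sort` to `insert` crosses nodes and would use TCP/IP. `processes_by_node()` gives the list of executor processes that each node's web manager would have to create.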
Index Terms
- Loading databases using dataflow parallelism