
Loading databases using dataflow parallelism

Published: 01 December 1994

Abstract

This paper describes a parallel database load prototype for Digital's Rdb database product. The prototype takes a dataflow approach to database parallelism. It includes an explorer that discovers the cluster configuration and records it in a database, a client CUI interface that gathers the load job description from the user and from the Rdb catalogs, and an optimizer that picks the best parallel execution plan and records it in a web data structure. The web describes the data operators, the dataflow rivers among them, and the bindings of operators to processes, processes to processors, and files to discs and tapes. The paper describes the optimizer's cost-based hierarchical optimization strategy in some detail. The prototype executes the web's plan by spawning a web manager process at each node of the cluster. The managers create the local executor processes and orchestrate startup, phasing, checkpoint, and shutdown. Each executor process performs one or more operators. Within a node, data flows among operators via memory-to-memory streams; between nodes, it flows via TCP/IP streams multiplexed by the web managers. The paper also describes the design of the transaction and checkpoint/restart mechanisms. Preliminary measurements indicate that this design will give excellent scaleups.
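The web described in the abstract can be pictured as a small data structure. What follows is a minimal, hypothetical Python model, not the prototype's actual representation: the class name `Web`, its field and method names, and the operator, process, and node names in the usage are all illustrative assumptions. It records the operators, the rivers between them, and the three bindings (operator to process, process to processor, file to disc or tape), and derives which transport a river would use under the rule the abstract states: memory-to-memory streams within a node, TCP/IP streams between nodes.

```python
from dataclasses import dataclass, field


@dataclass
class Web:
    """Hypothetical model of the 'web' plan; all names here are illustrative."""
    operators: list[str]                  # data operators (e.g. read, convert, store)
    rivers: list[tuple[str, str]]         # dataflow edges as (producer, consumer)
    op_to_process: dict[str, str]         # binding: operator -> executor process
    process_to_node: dict[str, str]       # binding: process -> processor (cluster node)
    file_to_device: dict[str, str] = field(default_factory=dict)  # file -> disc or tape

    def validate(self) -> None:
        """Check that rivers join known operators and every operator is bound to a node."""
        known = set(self.operators)
        for producer, consumer in self.rivers:
            if producer not in known or consumer not in known:
                raise ValueError(f"river {producer}->{consumer} uses an unknown operator")
        for op in self.operators:
            proc = self.op_to_process.get(op)
            if proc is None or proc not in self.process_to_node:
                raise ValueError(f"operator {op} is not bound to a process on a node")

    def node_of(self, op: str) -> str:
        """Follow the operator -> process -> processor bindings."""
        return self.process_to_node[self.op_to_process[op]]

    def transport(self, producer: str, consumer: str) -> str:
        """Assumed rule from the abstract: memory stream on one node, TCP/IP across nodes."""
        if self.node_of(producer) == self.node_of(consumer):
            return "memory"
        return "tcp/ip"


if __name__ == "__main__":
    # Hypothetical load job: read from tape, convert records, store into Rdb.
    web = Web(
        operators=["read_tape", "convert", "store"],
        rivers=[("read_tape", "convert"), ("convert", "store")],
        op_to_process={"read_tape": "p1", "convert": "p1", "store": "p2"},
        process_to_node={"p1": "nodeA", "p2": "nodeB"},
        file_to_device={"input.dat": "tape0", "table.rdb": "disc3"},
    )
    web.validate()
    for producer, consumer in web.rivers:
        print(f"{producer} -> {consumer}: {web.transport(producer, consumer)}")
    # read_tape -> convert: memory
    # convert -> store: tcp/ip
```

Keeping each binding as a separate map mirrors the layering in the abstract, so in this sketch a process could be rebound to a different node without touching the operator graph; how the prototype's optimizer actually chooses those bindings is not modeled here.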


Published in
ACM SIGMOD Record, Volume 23, Issue 4, Dec. 1994
98 pages
ISSN: 0163-5808
DOI: 10.1145/190627
Editor: Arie Segev

          Copyright © 1994 Authors

Publisher: Association for Computing Machinery, New York, NY, United States
