DOI: 10.1145/1383422.1383461
poster

DataLab: transactional data-parallel computing on an active storage cloud

Published: 23 June 2008

ABSTRACT

Active storage clouds are an attractive platform for executing the large, data-intensive workloads found in many fields of science. However, active storage presents new system management challenges. A large system of fault-prone machines with local persistent state can easily degenerate into a mess of unreferenced data and runaway computations. Our solution to this problem is DataLab, a software framework for running data-parallel workloads on active storage clusters. DataLab provides a simple language for expressing workloads, works with legacy application codes, and achieves robustness through the use of distributed transactions. Our prototype implementation scales to 250 nodes on a large biometric image processing workload.
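The abstract does not show DataLab's language or its transaction protocol, so the following is only a single-machine analogy of the all-or-nothing idea, not DataLab's implementation. It applies a function to every input, stages the results in a hidden directory, and publishes them with one atomic rename; on failure the staging area is discarded, so no unreferenced partial output is left behind. The names `run_job`, `out_root`, and `job_name` are hypothetical and chosen for illustration.

```python
import os
import shutil
import tempfile
from typing import Callable, Dict


def run_job(inputs: Dict[str, bytes],
            func: Callable[[bytes], bytes],
            out_root: str,
            job_name: str) -> str:
    """Apply func to each input; publish all outputs or none.

    Hypothetical sketch: a local stand-in for a transactional
    data-parallel job, not the DataLab API.
    """
    final_dir = os.path.join(out_root, job_name)
    # Stage results where readers of final_dir cannot see them yet.
    staging = tempfile.mkdtemp(dir=out_root, prefix=".staging-")
    try:
        for name, data in inputs.items():
            with open(os.path.join(staging, name), "wb") as f:
                f.write(func(data))
        # Commit: a single rename makes every output visible at once.
        os.rename(staging, final_dir)
    except Exception:
        # Abort: remove the staged files so nothing is left dangling.
        shutil.rmtree(staging, ignore_errors=True)
        raise
    return final_dir
```

A real active storage cluster would need to coordinate the commit across many machines, which is where distributed transactions come in; the sketch only shows the property a caller would observe.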

      • Published in

        HPDC '08: Proceedings of the 17th international symposium on High performance distributed computing
        June 2008
        252 pages
        ISBN: 9781595939975
        DOI: 10.1145/1383422

        Copyright © 2008 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 166 of 966 submissions, 17%
