DOI: 10.1145/2110217.2110226
short-paper

An architecture for a data-intensive computer

Published: 14 November 2011

ABSTRACT

Scientific instruments, as well as simulations, generate increasingly large datasets, changing the way we do science. We propose a system, which we call the data-intensive computer, for computing with petascale datasets. The data-intensive computer consists of an HPC cluster, a massively parallel database, and a set of computing servers running the data-intensive operating system, which turns the database into a layer in the memory hierarchy of the data-intensive computer.
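To make the memory-hierarchy idea concrete, here is a minimal sketch under loud assumptions: a small in-memory LRU cache stands in for the memory of the HPC cluster nodes, and a single SQLite table (Python's built-in `sqlite3`) stands in for the massively parallel database. The class `DatabaseBackedMemory` and its `read`/`write` methods are hypothetical; this is not the paper's design, only an illustration of a database acting as the next layer below RAM.

```python
import sqlite3
from collections import OrderedDict


class DatabaseBackedMemory:
    """Hypothetical two-level hierarchy (illustration only).

    Fast layer: a small LRU cache, standing in for cluster node RAM.
    Slow layer: a database table, standing in for the parallel database.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.cache: "OrderedDict[int, float]" = OrderedDict()
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE pages (addr INTEGER PRIMARY KEY, value REAL)")
        self.hits = 0
        self.misses = 0

    def _evict_if_needed(self) -> None:
        # Write the least recently used entries back to the database layer.
        while len(self.cache) > self.capacity:
            addr, value = self.cache.popitem(last=False)
            self.db.execute(
                "INSERT OR REPLACE INTO pages (addr, value) VALUES (?, ?)",
                (addr, value),
            )

    def write(self, addr: int, value: float) -> None:
        self.cache[addr] = value
        self.cache.move_to_end(addr)
        self._evict_if_needed()

    def read(self, addr: int) -> float:
        if addr in self.cache:
            self.hits += 1
            self.cache.move_to_end(addr)
            return self.cache[addr]
        self.misses += 1
        row = self.db.execute(
            "SELECT value FROM pages WHERE addr = ?", (addr,)
        ).fetchone()
        if row is None:
            raise KeyError(addr)
        # Promote the value back into the fast layer.
        self.cache[addr] = row[0]
        self._evict_if_needed()
        return row[0]


mem = DatabaseBackedMemory(capacity=2)
for addr in range(5):
    mem.write(addr, addr * 1.5)  # addresses 0, 1 and 2 are evicted to the database
value = mem.read(0)  # miss: fetched from the database layer and promoted
```

The cache uses write-back on eviction, so the database always holds the latest value of anything no longer in the fast layer; a real system would move large chunks rather than single values.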

The data-intensive operating system is data-object-oriented: the abstract programming model of a sequential file, central to traditional computer operating systems, is replaced with system-level support for high-level data objects such as multi-dimensional arrays, graphs, and sparse arrays. User application programs will be compiled into code that is executed both on the HPC cluster and inside the database. The data-intensive operating system is, however, non-local: it allows remote applications to execute code inside the database. This model supports collaborative environments, in which a large dataset is typically created and processed by a large group of users.
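The contrast between a sequential file and a high-level data object can be sketched as follows. In this hypothetical illustration, a 2-D array is stored as `(i, j, v)` rows in SQLite (a stand-in for the parallel database); applications address it by index ranges rather than byte offsets, and a reduction such as `block_sum` runs as SQL inside the database so that only the result travels to the client. The class `DbArray2D` and its methods are assumptions for illustration, not MPI-DB's API.

```python
import sqlite3


class DbArray2D:
    """Hypothetical 2-D array object stored as (i, j, v) rows in a database."""

    def __init__(self, conn: sqlite3.Connection, name: str, rows: int, cols: int):
        # The table name is interpolated into SQL, so restrict it to identifiers.
        if not name.isidentifier():
            raise ValueError("array name must be a plain identifier")
        self.conn, self.name, self.shape = conn, name, (rows, cols)
        conn.execute(
            f"CREATE TABLE {name} (i INTEGER, j INTEGER, v REAL, PRIMARY KEY (i, j))"
        )

    def store(self, data) -> None:
        """Write a full rows x cols nested list into the database."""
        rows, cols = self.shape
        self.conn.executemany(
            f"INSERT OR REPLACE INTO {self.name} (i, j, v) VALUES (?, ?, ?)",
            ((i, j, data[i][j]) for i in range(rows) for j in range(cols)),
        )

    def subarray(self, i0: int, i1: int, j0: int, j1: int):
        """Copy the half-open block [i0, i1) x [j0, j1) out to the client."""
        block = [[0.0] * (j1 - j0) for _ in range(i1 - i0)]
        for i, j, v in self.conn.execute(
            f"SELECT i, j, v FROM {self.name} "
            "WHERE i >= ? AND i < ? AND j >= ? AND j < ?",
            (i0, i1, j0, j1),
        ):
            block[i - i0][j - j0] = v
        return block

    def block_sum(self, i0: int, i1: int, j0: int, j1: int) -> float:
        """Reduce the block inside the database; only one number is returned."""
        (total,) = self.conn.execute(
            f"SELECT COALESCE(SUM(v), 0.0) FROM {self.name} "
            "WHERE i >= ? AND i < ? AND j >= ? AND j < ?",
            (i0, i1, j0, j1),
        ).fetchone()
        return total


conn = sqlite3.connect(":memory:")
velocity = DbArray2D(conn, "velocity", 4, 4)
velocity.store([[float(4 * i + j) for j in range(4)] for i in range(4)])
total = velocity.block_sum(0, 2, 0, 2)  # evaluated by the database engine
block = velocity.subarray(1, 3, 2, 4)  # data copied out to the client
```

The two access paths show the trade-off the abstract points at: `subarray` moves data to the computation, while `block_sum` moves the computation to the data.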

We are developing a software library, MPI-DB, as a prototype of the data-intensive operating system. It is currently used by the Turbulence group at Johns Hopkins University (JHU) to store simulation output in the database and to run simulations that refine previously stored results.
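The store-then-refine workflow mentioned above might look roughly like the following toy sketch. Everything here is an assumption: the functions `save_snapshot`, `load_latest`, `refine`, and `diffuse` are hypothetical, the "simulation" is a 1-D explicit diffusion step rather than a turbulence solver, and SQLite with JSON-encoded fields stands in for MPI-DB and the database. MPI-DB's actual interface is not described in this abstract.

```python
import json
import sqlite3


def open_store() -> sqlite3.Connection:
    # Hypothetical snapshot table keyed by run name and time step.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE snapshots (run TEXT, step INTEGER, field TEXT, "
        "PRIMARY KEY (run, step))"
    )
    return conn


def save_snapshot(conn, run: str, step: int, field: list) -> None:
    conn.execute(
        "INSERT OR REPLACE INTO snapshots VALUES (?, ?, ?)",
        (run, step, json.dumps(field)),
    )


def load_latest(conn, run: str):
    """Return (step, field) for the most recent stored step of a run."""
    row = conn.execute(
        "SELECT step, field FROM snapshots WHERE run = ? ORDER BY step DESC LIMIT 1",
        (run,),
    ).fetchone()
    if row is None:
        raise LookupError(f"no snapshots for run {run!r}")
    return row[0], json.loads(row[1])


def refine(field: list) -> list:
    """Double the resolution by inserting linear midpoints."""
    fine = []
    for a, b in zip(field, field[1:]):
        fine.extend([a, (a + b) / 2])
    fine.append(field[-1])
    return fine


def diffuse(field: list, alpha: float = 0.25) -> list:
    """One explicit diffusion step with fixed end values (toy physics)."""
    interior = [
        field[k] + alpha * (field[k - 1] - 2 * field[k] + field[k + 1])
        for k in range(1, len(field) - 1)
    ]
    return [field[0]] + interior + [field[-1]]


conn = open_store()
coarse = [0.0, 4.0, 0.0]
save_snapshot(conn, "coarse", 0, coarse)
save_snapshot(conn, "coarse", 1, diffuse(coarse))

# A later run picks up the stored result and continues at higher resolution.
step, field = load_latest(conn, "coarse")
fine = refine(field)
save_snapshot(conn, "fine", step, diffuse(fine))
```

Because the coarse run's output lives in the shared store rather than in a private file, a different user or job can start the refined run without coordinating file paths with the original producer.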

Published in

ACM Conferences
NDM '11: Proceedings of the first international workshop on Network-aware data management
November 2011
84 pages
ISBN: 9781450311328
DOI: 10.1145/2110217

Copyright © 2011 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 14 of 23 submissions, 61%