DOI: 10.1145/3603166.3632136
research-article

NSDF-Services: Integrating Networking, Storage, and Computing Services into a Testbed for Democratization of Data Delivery

Published: 04 April 2024

ABSTRACT

The lack of a readily accessible, tightly integrated data fabric connecting high-speed networking, storage, and computing services remains a critical barrier to the democratization of scientific discovery. To address this challenge, we are building the National Science Data Fabric (NSDF), a holistic ecosystem that supports domain scientists in their daily research. NSDF comprises networking, storage, and computing services, as well as outreach initiatives. In this paper, we present a testbed that integrates the three services (networking, storage, and computing) and evaluate their performance. Specifically, we study the networking services, measuring throughput and latency with a focus on academic cloud providers; the storage services, measuring data-movement performance through file system mappers on both academic and commercial clouds; and the computing orchestration services, with a focus on commercial cloud providers. We discuss NSDF's potential to increase scalability and usability while decreasing time-to-discovery across scientific domains.

    • Published in

      UCC '23: Proceedings of the IEEE/ACM 16th International Conference on Utility and Cloud Computing
      December 2023
      502 pages
      ISBN: 9798400702341
      DOI: 10.1145/3603166

      Copyright © 2023 Copyright is held by the owner/author(s). Publication rights licensed to ACM.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Acceptance Rates

      Overall Acceptance Rate: 38 of 125 submissions, 30%