ABSTRACT
The lack of a readily accessible, tightly integrated data fabric connecting high-speed networking, storage, and computing services remains a critical barrier to the democratization of scientific discovery. To address this challenge, we are building the National Science Data Fabric (NSDF), a holistic ecosystem that supports domain scientists in their daily research. NSDF comprises networking, storage, and computing services, as well as outreach initiatives. In this paper, we present a testbed that integrates the three services (networking, storage, and computing) and evaluate their performance. Specifically, we study the throughput and latency of the networking services, with a focus on academic cloud providers; the performance of the storage services, with a focus on data movement using file system mappers for both academic and commercial clouds; and the computing orchestration services, with a focus on commercial cloud providers. We discuss NSDF's potential to increase scalability and usability while decreasing time-to-discovery across scientific domains.
Index Terms
- NSDF-Services: Integrating Networking, Storage, and Computing Services into a Testbed for Democratization of Data Delivery