ABSTRACT
Data access throughput is one of the key performance metrics in scientific computing, particularly for distributed data-intensive applications. While a body of prior work has focused on elephant connections, which consume a significant fraction of network bandwidth, this study focuses on predicting slow connections that create bottlenecks in distributed workflows. We analyze network traffic logs collected between January 2019 and May 2021 at the National Energy Research Scientific Computing Center (NERSC). Based on the patterns observed in this data collection, we define a set of features for identifying low-performing data transfers. Through extensive feature engineering and feature selection, we identify a number of new features that significantly enhance prediction performance. With these new features, even a relatively simple decision tree model can predict slow connections with an F1 score as high as 0.945.
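For readers unfamiliar with the reported metric, the F1 score is the harmonic mean of precision and recall on the positive ("slow") class. The following is a minimal sketch of that computation only. The labels are made up for illustration and are not drawn from the NERSC logs, and the function name `f1_score` is a hypothetical helper, not the authors' code.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (1 = slow connection, 0 = normal)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # slow, predicted slow
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # normal, predicted slow
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # slow, missed
    if tp == 0:
        return 0.0  # no correct positive predictions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative labels only (assumption): 4 slow transfers out of 8.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)
print(f"F1 = {score:.3f}")
```

Because slow connections are typically the minority class, F1 is a more informative summary than plain accuracy, which a classifier could inflate by always predicting "normal".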
Index Terms
- Predicting Slow Network Transfers in Scientific Computing