Abstract
Researchers in the area of grid/cloud computing perform many of their experiments using simulations that must capture network behavior. In this context, packet-level simulations, which are widely used to study network protocols, are too costly given the typical large scales of simulated systems and applications. An alternative is to implement network simulations with less costly flow-level models. Several flow-level models have been proposed and implemented in grid/cloud simulators. Surprisingly, published validations of these models, if any, consist of verifications for only a few simple cases. Consequently, even when they have been used to obtain published results, the ability of these simulators to produce scientifically meaningful results is in doubt. This work evaluates these state-of-the-art flow-level network models of TCP communication via comparison to packet-level simulation. While it is straightforward to show cases in which previously proposed models lead to good results, we instead follow the critical method, which places model refutation at the center of the scientific activity, and we systematically seek cases that lead to invalid results. Careful analysis of these cases reveals fundamental flaws and also suggests improvements. One contribution of this work is that these improvements lead to a new model that, while far from perfect, improves upon all previously proposed models in the context of simulation of grids or clouds. A more important contribution, perhaps, is provided by the pitfalls and unexpected behaviors encountered in this work, which lead to a number of enlightening lessons. In particular, this work shows that model validation cannot be achieved solely by exhibiting (possibly many) “good cases.” Confidence in the quality of a model can only be strengthened through an invalidation approach that attempts to prove the model wrong.
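To make the notion of a flow-level model concrete, the following is a minimal sketch of the kind of computation such a model performs in place of simulating individual packets. It is not the model evaluated or proposed in this work. It assumes a max-min fair sharing objective solved by progressive filling, and the function name `max_min_share`, the link names, and the capacities are all hypothetical.

```python
from typing import Dict, List


def max_min_share(capacity: Dict[str, float],
                  routes: Dict[str, List[str]]) -> Dict[str, float]:
    """Assign each flow a steady-state rate by progressive filling.

    capacity: link name -> bandwidth (any consistent unit)
    routes:   flow name -> list of links the flow crosses
    """
    for flow, links in routes.items():
        if not links:
            raise ValueError(f"flow {flow!r} must cross at least one link")

    remaining = dict(capacity)                      # unallocated bandwidth per link
    active = {f: set(links) for f, links in routes.items()}
    rates: Dict[str, float] = {}

    while active:
        # Count how many still-unconstrained flows use each link.
        load: Dict[str, int] = {}
        for links in active.values():
            for link in links:
                load[link] = load.get(link, 0) + 1

        # The bottleneck is the link offering the smallest equal share.
        bottleneck = min(load, key=lambda l: remaining[l] / load[l])
        share = remaining[bottleneck] / load[bottleneck]

        # Freeze every flow crossing the bottleneck at that share and
        # subtract its rate from all links on its route.
        for flow in [f for f, links in active.items() if bottleneck in links]:
            rates[flow] = share
            for link in active[flow]:
                remaining[link] -= share
            del active[flow]

    return rates


if __name__ == "__main__":
    # Hypothetical two-link platform with three flows.
    links = {"A": 10.0, "B": 4.0}
    flows = {"f1": ["A"], "f2": ["A", "B"], "f3": ["B"]}
    print(max_min_share(links, flows))
```

In this hypothetical setup, link `B` is the first bottleneck, so `f2` and `f3` are each limited to half of its capacity, and `f1` then receives whatever remains on link `A`. A simulator built on such a model would estimate a transfer's duration from the assigned rate rather than from per-packet events, which is what makes it cheap and also why it needs validation against packet-level simulation.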
Index Terms
- On the validity of flow-level TCP network models for grid and cloud simulations