ReNIC: Architectural extension to SR-IOV I/O virtualization for efficient replication

Published: 26 January 2012

Abstract

Virtualization is gaining popularity in cloud computing and has become the key enabling technology in cloud infrastructure. By replicating the virtual server state to multiple independent platforms, virtualization improves the reliability and availability of cloud systems. Unfortunately, existing Virtual Machine (VM) replication solutions were designed only for software-virtualized I/O, which suffers from large performance and scalability overheads. Although hardware-assisted I/O virtualization (such as SR-IOV) can achieve close-to-native performance and very good scalability, devices virtualized this way cannot be properly replicated across different physical machines due to architectural limitations (such as the lack of efficient device state read/write and of outbound packet buffering, among others).

In this paper, we address those architectural limitations by proposing ReNIC, an architectural extension to SR-IOV I/O virtualization for efficient I/O replication. We have extended the Xen hypervisor and the Remus rapid checkpoint solution to support this new architectural extension. We developed a system simulator on multi-core systems to extensively evaluate ReNIC. The experimental results demonstrate that ReNIC achieves up to a 54% reduction in CPU usage compared to software-based I/O virtualization at runtime, and up to a 16.2% performance advantage over software-based I/O virtualization in rapid checkpointing. During migration, ReNIC reduces service shutdown time by about 50% compared to device emulation and paravirtualized I/O, and by over 71% compared to the teaming driver.



            • Published in

              ACM Transactions on Architecture and Code Optimization  Volume 8, Issue 4
              Special Issue on High-Performance Embedded Architectures and Compilers
              January 2012
              765 pages
              ISSN:1544-3566
              EISSN:1544-3973
              DOI:10.1145/2086696

              Copyright © 2012 ACM

              Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 26 January 2012
              • Accepted: 1 December 2011
              • Revised: 1 October 2011
              • Received: 1 July 2011

              Qualifiers

              • research-article
              • Research
              • Refereed
