System-level virtualization research at Oak Ridge National Laboratory☆
Introduction to system-level virtualization
System-level virtualization is used for a number of reasons, but the three major justifications are [1], [2], [3]: (i) isolation, (ii) consolidation, and (iii) migration. We describe these points in the following sections after a brief description of the terminology used in this article.
Terminology. The execution of a virtual machine (VM) implies that one or more virtual systems are running concurrently on top of the same hardware, each having its own view of available resources. The operating …
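The terminology above can be pictured with a toy sketch: a hypervisor partitions physical resources among guests, and each VM receives its own private view of memory and CPUs. This is a conceptual illustration only, not any real virtualization API; all names here are invented for the example.

```python
# Toy illustration of the VM terminology: a "hypervisor" that partitions
# physical resources among concurrently running guests. Conceptual sketch
# only -- not a real hypervisor interface.
from dataclasses import dataclass, field
from typing import List

@dataclass
class VirtualMachine:
    name: str
    vcpus: int        # virtual CPUs as seen by this guest
    memory_mb: int    # this guest's private view of memory

@dataclass
class Hypervisor:
    total_cpus: int
    total_memory_mb: int
    guests: List[VirtualMachine] = field(default_factory=list)

    def launch(self, name: str, vcpus: int, memory_mb: int) -> VirtualMachine:
        # Physical memory is finite even though each guest sees only its own share.
        used_mem = sum(vm.memory_mb for vm in self.guests)
        if used_mem + memory_mb > self.total_memory_mb:
            raise RuntimeError("insufficient physical memory")
        vm = VirtualMachine(name, vcpus, memory_mb)
        self.guests.append(vm)  # guests run concurrently on the same hardware
        return vm

host = Hypervisor(total_cpus=8, total_memory_mb=16384)
host.launch("compute-os", vcpus=4, memory_mb=8192)
host.launch("mgmt-os", vcpus=2, memory_mb=4096)
print([vm.name for vm in host.guests])
```

The point of the sketch is only the resource-view separation: each `VirtualMachine` carries its own allocation, while the `Hypervisor` enforces the shared physical limit.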
Why system-level virtualization for high-performance computing?
Today, high-performance computing (HPC) centers need to support multiple execution platforms. For example, at Oak Ridge National Laboratory (ORNL), massively parallel processing (MPP) platforms, such as the Cray XT4, and Beowulf-type clusters, like the ORNL Institutional Cluster (OIC), are available to users. Each of these systems targets a specific OS, requiring users to port their applications before execution. On the other hand, users' requirements may also differ. For instance, some users …
System-level virtualization and system availability
Modern high-performance computing platforms are composed of thousands or even hundreds of thousands of nodes. Because each node can be subject to a failure, the availability of the full system decreases rapidly as the node count grows. Therefore, applications for this environment must be fault tolerant or resilient, able to operate successfully in the face of failure, and the systems should exhibit high-availability traits.
System-level virtualization provides three interesting capabilities …
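The scaling effect described in this section can be made concrete with a back-of-the-envelope calculation (an illustration, not from the article): under the simplifying assumption of independent, identical node failures, a job that needs all N nodes simultaneously sees system availability of roughly A^N, where A is single-node availability.

```python
# Back-of-the-envelope sketch: system availability under the simplifying
# assumption of independent, identically available nodes.
def system_availability(node_availability: float, num_nodes: int) -> float:
    """Probability that every one of num_nodes nodes is up at once."""
    return node_availability ** num_nodes

# Even a node that is up 99.9% of the time yields a fragile system at scale.
for n in (100, 1_000, 10_000):
    print(f"{n:>6} nodes -> availability {system_availability(0.999, n):.6f}")
```

With 99.9% node availability, availability falls below 0.001 well before 10,000 nodes, which is why the mechanisms virtualization offers (such as migrating a VM away from a degrading node) matter at this scale.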
System-level virtualization and system management
The usage of VMs creates several challenges, including: (i) how can we support multiple virtualization solutions? (ii) how can we easily manage both the host OS and the VMs? and (iii) is it possible to abstract the complexity of virtualization?
Several recent studies have addressed these issues, leading to the implementation of OSCAR-V [9], an extension of the OSCAR system installation/management suite for the management of VMs. It integrates several prototypes developed by …
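One way to approach challenges (i) and (iii) is a thin abstraction layer that hides solution-specific tooling behind a common interface. The sketch below is hypothetical and does not reproduce OSCAR-V's design; the class names and command lines are illustrative assumptions.

```python
# Hypothetical sketch of abstracting multiple virtualization solutions:
# management tooling boots VMs through one interface while each backend
# supplies its solution-specific command line. Illustrative only.
from abc import ABC, abstractmethod
from typing import List

class VMBackend(ABC):
    @abstractmethod
    def start_command(self, image: str, memory_mb: int) -> List[str]:
        """Return the command line that boots a VM from the given image."""

class QemuBackend(VMBackend):
    def start_command(self, image: str, memory_mb: int) -> List[str]:
        return ["qemu-system-x86_64", "-m", str(memory_mb), "-hda", image]

class XenBackend(VMBackend):
    def start_command(self, image: str, memory_mb: int) -> List[str]:
        # The classic Xen toolstack booted guests from a config file;
        # "guest.cfg" here is a placeholder.
        return ["xm", "create", "guest.cfg", f"memory={memory_mb}"]

def boot_vm(backend: VMBackend, image: str, memory_mb: int = 1024) -> List[str]:
    # Callers never see which virtualization solution is underneath.
    return backend.start_command(image, memory_mb)

print(boot_vm(QemuBackend(), "node.img"))
```

Swapping `QemuBackend` for `XenBackend` changes nothing for the caller, which is the essence of abstracting the virtualization layer.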
Conclusion
System-level virtualization provides several advantages for HPC that may change the way modern HPC systems are currently used: plug-and-play computing, system environment customization, computing on demand, and transparent application resilience through system-provided fault tolerance.
However, the usage of virtual machines also creates several challenges, including: (i) the development of a virtualization solution suitable for HPC, and (ii) the development of tools and methods for the …
References (9)
- J. Liu, W. Huang, B. Abali, D.K. Panda, High performance VMM-bypass I/O in virtual machines, ...
- P. Barham, B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, R. Neugebauer, I. Pratt, A. Warfield, Xen and the art of...
- C. Clark, K. Fraser, S. Hand, J.G. Hansen, E. Jul, C. Limpach, I. Pratt, A. Warfield, Live migration of virtual...
- A. Whitaker, M. Shaw, S.D. Gribble, Denali: Lightweight virtual machines for distributed and networked applications,...
Stephen L. Scott is a Senior Research Scientist in the Computer Science Group of the Computer Science and Mathematics Division at the Oak Ridge National Laboratory (ORNL), Oak Ridge, USA. Dr. Scott’s research interest is in experimental systems with a focus on high performance distributed, heterogeneous, and parallel computing. He is a founding member of the Open Cluster Group (OCG) and Open Source Cluster Application Resources (OSCAR). Within this organization, he has served as the OCG steering committee chair, as the OSCAR release manager, and as working group chair. Dr. Scott is the project lead principal investigator for the Modular Linux and Adaptive Runtime support for HEC OS/R research (MOLAR) research team. This multi-institution research effort, funded by the Department of Energy - Office of Science, concentrates on adaptive, reliable, and efficient operating and runtime system solutions for ultra-scale scientific high-end computing (HEC) as part of the Forum to Address Scalable Technology for Runtime and Operating Systems (FAST-OS). Dr. Scott is also principal investigator of a project investigating techniques in virtualized system environments for petascale computing and is involved with a related storage effort that is investigating the advantages of storage virtualization in petascale computing environments. Dr. Scott is the chair of the international Scientific Advisory Committee for the European Commission’s XtreemOS project. Stephen has published numerous papers on cluster and distributed computing and has both a Ph.D. and M.S. in computer science. He is also a member of ACM, IEEE Computer, and IEEE Task Force on Cluster Computing.
Geoffroy Vallée is an R&D Associate in the Network and Cluster Computing Group of the Computer Science and Mathematics Division of Oak Ridge National Laboratory (ORNL), USA. He received his Ph.D. from the University of Rennes, France, in the framework of a French industrial collaboration between the University of Rennes, INRIA, and EDF. Geoffroy completed his master's degree at the University of Saint-Quentin-en-Yvelines, France. His research interests include operating systems and high availability. Geoffroy is one of the initial developers and designers of the Kerrighed Single System Image (http://www.kerrighed.org/) for clusters. He is also one of the core team members of the Open Source Cluster Application Resource (OSCAR) software (http://www.openclustergroup.org/). Geoffroy is currently doing research on operating systems for petascale computing, focusing on high performance and high availability.
Thomas Naughton is an R&D associate working in the area of high-performance system software. He has been involved in the Open Source Cluster Application Resources (OSCAR) project for several years, serving as a developer, working group chair, and co-chair for the annual OSCAR Symposium. His current efforts are focused on the areas of system-level virtualization and system resilience. Prior to starting at Oak Ridge National Laboratory, Thomas received an M.S. degree in Computer Science from Middle Tennessee State University and a B.S. in Computer Science and a B.A. in Philosophy from the University of Tennessee at Martin. He is currently pursuing a Ph.D. from the University of Reading, England.
Anand Tikotekar is currently working as a Post Master's research associate at Oak Ridge National Laboratory. His research interests include fault-tolerant computing, OS-level virtualization, and cluster survivability. He received his master's degree in computer science from Louisiana Tech University and his B.E. in computer science from Pune University, India.
Christian Engelmann is an R&D Staff Member at ORNL. He holds an M.Sc. from the University of Reading and a German engineering diploma from the Technical College for Engineering and Economics (FHTW) Berlin. As part of his research at ORNL, Christian is currently pursuing a Ph.D. at the University of Reading. His research aims at high-level reliability, availability, and serviceability for next-generation supercomputers to improve their resiliency (and ultimately efficiency) with novel high availability and fault tolerance system software solutions. Another research area concentrates on "plug-and-play" supercomputing, where transparent portability eliminates most of the software modifications caused by diverse platforms and system upgrades. His past research included a pluggable lightweight heterogeneous Distributed Virtual Machine (DVM) environment, the successor of the Parallel Virtual Machine (PVM), and a new generation of superscalable scientific algorithms that address the challenges in scalability and fault tolerance for extreme-scale supercomputers.
Hong Ong is a research staff member in the Computer Science and Mathematics Division at Oak Ridge National Laboratory (ORNL). He earned his Ph.D. from the University of Portsmouth, UK in 2004, under the supervision of Professor Mark Baker. His research interests are in the areas of operating systems, middleware for parallel and distributed systems, and system-level performance evaluation. Hong has in-depth working knowledge of technologies for clusters and the Grid. Prior to joining ORNL, he worked on machine evaluation, studying the factors that affect the performance of large-scale scientific applications and analyzing the interaction between network protocols and applications. He has additionally worked on a number of grid-related projects, including the UK e-Science OGSA Testbed project and work evaluating security and firewall issues. Hong currently focuses on several Department of Energy (DOE) Office of Science projects, including scalable operating systems, system virtualization, and dependability middleware. Hong also serves on various program committees of international conferences.
☆ Research sponsored by the Laboratory Directed Research and Development Program of Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC for the US Department of Energy under Contract No. DE-AC05-00OR22725.