ABSTRACT
To design efficient deduplication, caching, and other management mechanisms for virtual machine (VM) images in Infrastructure as a Service (IaaS) clouds, it is essential to understand the level and pattern of similarity among VM images in real-world IaaS environments. This paper empirically analyzes the similarity within and between 525 VM images from a production IaaS cloud. Besides presenting the overall level of content similarity, we identify several factors that affect the similarity pattern, including the image creation time and the location within the image's address space. Moreover, we find that similarity between pairs of images exhibits high variance, and that an image is very likely to be more similar to a small subset of images than to all other images in the repository. Groups of data chunks also often appear together in the same image. These image and chunk "clusters" can help predict future data accesses and therefore provide important hints for cache placement, eviction, and prefetching.
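The abstract does not say how similarity is measured, so the following is only a hedged illustration of the kind of analysis it describes. It assumes fixed-size 4 KiB chunks fingerprinted with SHA-1. It measures within-image similarity as the fraction of duplicate chunks and between-image similarity as the Jaccard index of the images' unique-chunk sets. It also treats chunks that occur in exactly the same set of images as one "cluster". The chunk size, hash, metrics, and every function name here (`chunk_fingerprints`, `intra_image_similarity`, `pairwise_similarity`, `most_similar`, `chunk_clusters`) are hypothetical choices for the sketch, not the paper's methodology.

```python
import hashlib
from collections import defaultdict

CHUNK_SIZE = 4096  # assumed fixed chunk size in bytes (illustrative choice)


def chunk_fingerprints(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split an image into fixed-size chunks; return one SHA-1 hex digest per chunk."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def intra_image_similarity(fps: list[str]) -> float:
    """Fraction of an image's chunks that duplicate another chunk in the same image."""
    if not fps:
        return 0.0
    return 1.0 - len(set(fps)) / len(fps)


def pairwise_similarity(a: list[str], b: list[str]) -> float:
    """Jaccard index of the unique-chunk sets of two images."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def most_similar(images: dict[str, list[str]], name: str, k: int = 2) -> list[tuple[str, float]]:
    """Rank the other images by similarity to `name`; return the top k."""
    scores = [
        (other, pairwise_similarity(images[name], fps))
        for other, fps in images.items()
        if other != name
    ]
    return sorted(scores, key=lambda s: s[1], reverse=True)[:k]


def chunk_clusters(images: dict[str, list[str]]) -> dict[frozenset, set[str]]:
    """Group chunks that occur in exactly the same set of images (hypothetical cluster definition)."""
    owners: dict[str, set[str]] = defaultdict(set)
    for name, fps in images.items():
        for fp in fps:
            owners[fp].add(name)
    clusters: dict[frozenset, set[str]] = defaultdict(set)
    for fp, imgs in owners.items():
        clusters[frozenset(imgs)].add(fp)
    return dict(clusters)


# Toy example: two versions of a web image share a base; a database image shares nothing.
base = bytes(CHUNK_SIZE) * 3 + b"A" * CHUNK_SIZE
images = {
    "web-v1": chunk_fingerprints(base + b"B" * CHUNK_SIZE),
    "web-v2": chunk_fingerprints(base + b"C" * CHUNK_SIZE),
    "db": chunk_fingerprints(b"D" * CHUNK_SIZE * 2),
}
print(intra_image_similarity(images["web-v1"]))  # 3 zero chunks collapse to 1 of 5 -> 0.4
print(most_similar(images, "web-v1"))  # [('web-v2', 0.5), ('db', 0.0)]
```

In this toy data, the shared base chunks of `web-v1` and `web-v2` form one cluster. A cache could use such a cluster as a prefetch unit and fetch the remaining chunks of the cluster on a miss for one of them. This policy is an assumption made for the example, in the spirit of the prefetching hints the abstract mentions.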