DOI: 10.1145/2090181.2090187
research-article

An empirical analysis of similarity in virtual machine images

Published: 12 December 2011

ABSTRACT

To design efficient deduplication, caching, and other management mechanisms for virtual machine (VM) images in Infrastructure as a Service (IaaS) clouds, it is essential to understand the level and pattern of similarity among VM images in real-world IaaS environments. This paper empirically analyzes the similarity within and between 525 VM images from a production IaaS cloud. Besides reporting the overall level of content similarity, we identify several factors that affect the similarity pattern, including the image creation time and the location within the image's address space. Moreover, we find that similarity between pairs of images exhibits high variance: an image is very likely to be more similar to a small subset of images than to the rest of the images in the repository. Likewise, groups of data chunks often appear together in the same image. These image and chunk "clusters" can help predict future data accesses and therefore provide important hints for cache placement, eviction, and prefetching.
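The abstract does not say how content similarity is measured. One common way to quantify it for disk images is to split each image into chunks, hash every chunk, and compare the resulting hash sets. The Python sketch below assumes fixed-size 4 KiB chunks, SHA-1 chunk digests, and the Jaccard index as the pairwise metric. These choices, and the helper names `chunk_hashes`, `self_similarity`, `jaccard`, and `pairwise_similarity`, are illustrative assumptions, not the paper's method.

```python
import hashlib
from itertools import combinations

# Assumption: fixed-size 4 KiB chunks; the paper's actual chunking scheme may differ.
CHUNK_SIZE = 4096


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split an image into fixed-size chunks and return one SHA-1 digest per chunk."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def self_similarity(hashes: list[str]) -> float:
    """Fraction of an image's chunks that duplicate another chunk of the same image."""
    if not hashes:
        return 0.0
    return 1 - len(set(hashes)) / len(hashes)


def jaccard(a: list[str], b: list[str]) -> float:
    """Share of unique chunks two images have in common (Jaccard index of hash sets)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def pairwise_similarity(images: dict[str, bytes]) -> dict[tuple[str, str], float]:
    """Jaccard similarity for every unordered pair of images, keyed by sorted names."""
    hashed = {name: chunk_hashes(data) for name, data in images.items()}
    return {
        (x, y): jaccard(hashed[x], hashed[y])
        for x, y in combinations(sorted(hashed), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy "images", each made of 4 KiB runs of one repeated byte.
    images = {
        "base": b"A" * 4096 + b"B" * 4096,
        "web": b"A" * 4096 + b"C" * 4096,
        "db": b"D" * 8192,
    }
    for pair, score in pairwise_similarity(images).items():
        print(pair, round(score, 3))
    print("db self-similarity:", self_similarity(chunk_hashes(images["db"])))
```

In this toy run, `base` and `web` share one of three distinct chunks (similarity 1/3), while `db` shares nothing with either. Its two identical chunks give it a self-similarity of 0.5. Ranking each image's row of pairwise scores is one simple way to surface the small "most similar" subset the abstract describes.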


Published in
Middleware '11: Proceedings of the Middleware 2011 Industry Track Workshop
December 2011
44 pages
ISBN: 9781450310741
DOI: 10.1145/2090181

Copyright © 2011 ACM


Publisher: Association for Computing Machinery, New York, NY, United States


Overall acceptance rate: 203 of 948 submissions, 21%
