A new degree of freedom for memory allocation in clusters

Montaner, Héctor; Silla, Federico; Fröning, Holger; Duato, José

doi:10.1007/s10586-010-0150-7

Published: 07 February 2011

Cluster Computing, volume 15, pages 101–123 (2012)

Héctor Montaner¹, Federico Silla¹, Holger Fröning² & José Duato¹

Abstract

Improvements in parallel computing hardware usually involve increasing the resources available to a given application, such as the number of computing cores and the amount of memory. In shared-memory computers, this increase is usually constrained by the coherency protocol, whose overhead rises with system size and thus limits the scalability of the final system. In this paper we propose an efficient and cost-effective way to increase the memory available to a given application by leveraging free memory in other computers in the cluster.

Our proposal is based on the observation that many applications benefit from more memory but do not require more computing cores. This reduces the requirements for cache coherency and allows a simpler implementation and better scalability.

Simulation results show that, when additional mechanisms intended to hide remote memory latency are used, the execution time of applications that use our proposal is similar to the time required to execute them on a computer populated with enough local memory, which supports the feasibility of our proposal. We are currently building a prototype that implements our ideas. The first results from real executions on this prototype show not only that our proposal works but also that it can efficiently execute applications that make use of remote memory resources.

Author information

Authors and Affiliations

Departament d’Informàtica de Sistemes i Computadors, Universitat Politècnica de València, 46022, Valencia, Spain
Héctor Montaner, Federico Silla & José Duato
Computer Architecture Group, University of Heidelberg, 68131, Mannheim, Germany
Holger Fröning

Corresponding author

Correspondence to Héctor Montaner.

About this article

Cite this article

Montaner, H., Silla, F., Fröning, H. et al. A new degree of freedom for memory allocation in clusters. Cluster Comput 15, 101–123 (2012). https://doi.org/10.1007/s10586-010-0150-7

Received: 17 September 2010
Accepted: 29 December 2010
Published: 07 February 2011
Issue Date: June 2012
DOI: https://doi.org/10.1007/s10586-010-0150-7
