Modeling Communication Costs in Blade Servers

ABSTRACT
Datacenters demand big memory servers for big data. For blade servers, which disaggregate memory across multiple blades, we derive technology and architectural models to estimate communication delay and energy. These models permit new case studies in refusal scheduling to mitigate NUMA and improve the energy efficiency of data movement. Preliminary results show that our model helps researchers coordinate NUMA mitigation and queueing dynamics. We find that judiciously permitting NUMA reduces queueing time, benefiting throughput, latency and energy efficiency for datacenter workloads like Spark. These findings highlight blade servers' strengths and opportunities when building distributed shared memory machines for data analytics.
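The abstract describes technology models that estimate the delay and energy of moving data between blades. As a rough illustration of what such a model computes (this is a minimal sketch with assumed parameters, not the paper's actual model), a remote memory access over a serialized inter-blade link can be costed as a fixed link latency plus serialization time, with energy proportional to bits moved:

```python
# Hypothetical sketch of a blade-to-blade communication cost model.
# All parameter values (link latency, bandwidth, pJ/bit) are illustrative
# assumptions, not figures from the paper.

def remote_access_delay_ns(bytes_moved: int,
                           link_latency_ns: float = 100.0,
                           link_gbps: float = 16.0) -> float:
    """Delay = fixed link traversal latency + serialization time.

    bits / (Gb/s) yields nanoseconds directly.
    """
    serialization_ns = bytes_moved * 8 / link_gbps
    return link_latency_ns + serialization_ns

def remote_access_energy_pj(bytes_moved: int,
                            pj_per_bit: float = 2.0) -> float:
    """Energy = bits moved x assumed per-bit SerDes/link energy."""
    return bytes_moved * 8 * pj_per_bit

# Example: one 64-byte cache line fetched from a remote blade.
print(remote_access_delay_ns(64))   # 100 ns latency + 32 ns serialization
print(remote_access_energy_pj(64))  # 512 bits x 2 pJ/bit
```

A scheduler weighing NUMA mitigation against queueing delay, as the abstract suggests, would compare such a remote-access cost against the expected wait for a task slot with local memory.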