ABSTRACT
CPU-FPGA heterogeneous acceleration platforms have shown great potential for continued performance and energy-efficiency improvement in modern data centers, and have attracted considerable attention from both academia and industry. However, it is nontrivial for users to choose the right platform among the various PCIe-based and QPI-based CPU-FPGA platforms from different vendors. This paper aims to identify which microarchitectural characteristics affect performance, and how. We conduct a quantitative comparison and in-depth analysis of two representative platforms: the QPI-based Intel-Altera HARP with coherent shared memory, and the PCIe-based Alpha Data board with private device memory. We provide multiple insights for both application developers and platform designers.