DOI: 10.1145/3373376.3378482

A Hypervisor for Shared-Memory FPGA Platforms

Published: 13 March 2020

ABSTRACT

Cloud providers widely deploy FPGAs as application-specific accelerators for customer use. These providers seek to multiplex their FPGAs among customers via virtualization, thereby reducing running costs. Unfortunately, most virtualization support is confined to FPGAs that expose a restrictive, host-centric programming model in which accelerators cannot issue direct memory accesses (DMAs). The host-centric model incurs high runtime overhead for workloads that exhibit pointer chasing. Thus, FPGAs are beginning to support a shared-memory programming model in which accelerators can issue DMAs. However, virtualization support for shared-memory FPGAs is limited. This paper presents Optimus, the first hypervisor that supports scalable shared-memory FPGA virtualization. Optimus offers both spatial multiplexing and temporal multiplexing to provide efficient and flexible sharing of each accelerator on an FPGA. To share the FPGA-CPU interconnect at a high clock frequency, Optimus implements a multiplexer tree. To isolate each guest's address space, Optimus introduces the technique of page table slicing as a hardware-software co-design. To support preemptive temporal multiplexing, Optimus provides an accelerator preemption interface. We show that Optimus supports eight physical accelerators on a single FPGA and improves the aggregate throughput of twelve real-world benchmarks by 1.98x to 7x.
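The abstract names page table slicing but does not describe its mechanics. As a rough illustration only, the sketch below assumes a simple scheme: one shared I/O virtual address space is split into equal, fixed-size slices, one per guest, and every DMA address an accelerator issues is bounds-checked and then relocated into its guest's slice. The class `SliceTable`, the exception `IsolationFault`, the equal slice size, and the relocate-by-base policy are all hypothetical assumptions for illustration, not Optimus's documented design.

```python
class IsolationFault(Exception):
    """Hypothetical fault: a guest-issued DMA address lies outside the guest's slice."""


class SliceTable:
    """Hypothetical model: one shared address space split into equal per-guest slices."""

    def __init__(self, num_guests: int, slice_bytes: int) -> None:
        if num_guests <= 0 or slice_bytes <= 0:
            raise ValueError("num_guests and slice_bytes must be positive")
        self.num_guests = num_guests
        self.slice_bytes = slice_bytes

    def slice_base(self, guest_id: int) -> int:
        # Guest i owns the contiguous range [i * slice_bytes, (i + 1) * slice_bytes).
        if not 0 <= guest_id < self.num_guests:
            raise ValueError(f"unknown guest {guest_id}")
        return guest_id * self.slice_bytes

    def relocate(self, guest_id: int, guest_addr: int) -> int:
        # Reject any address the guest could use to reach another guest's slice,
        # then shift the in-bounds address by the guest's slice base.
        if not 0 <= guest_addr < self.slice_bytes:
            raise IsolationFault(
                f"guest {guest_id}: address {guest_addr:#x} is outside its slice"
            )
        return self.slice_base(guest_id) + guest_addr


if __name__ == "__main__":
    # Assumed configuration: 8 guests (matching the eight accelerators), 1 GiB slices.
    table = SliceTable(num_guests=8, slice_bytes=1 << 30)
    print(hex(table.relocate(3, 0x1000)))  # 0xc0001000
```

In this model, isolating one guest from another costs a single comparison and a single addition per DMA request; the abstract does not say whether Optimus's hardware performs exactly these steps.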

      • Published in

        ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems
        March 2020
        1412 pages
        ISBN: 9781450371025
        DOI: 10.1145/3373376

        Copyright © 2020 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Qualifiers

        • research-article

        Acceptance Rates

        Overall Acceptance Rate: 535 of 2,713 submissions, 20%
