Research article · Open access
ASPLOS Conference Proceedings · https://doi.org/10.1145/3582016.3582048

Reconfigurable Virtual Memory for FPGA-Driven I/O

Published: 25 March 2023

Abstract

FPGAs are increasingly used to accelerate modern applications, and cloud providers offer FPGA platforms on-demand with a variety of FPGAs, I/O peripherals, and memory options. FPGA vendors expose I/O with low-level interfaces that limit application portability. Current approaches to abstracting these interfaces trade level of abstraction against performance.
We present FSRF, File System for Reconfigurable Fabrics, which abstracts FPGA I/O at a high level without sacrificing performance. Rather than exposing platform-specific I/O interfaces, FSRF enables files to be mapped directly into FPGA virtual memory from the host. On the FPGA, powerful OS-managed virtual memory hardware provides performant access to FPGA-local resources and helps coordinate access to remote data. FSRF leverages reconfigurability to specialize its virtual memory implementation to applications, including selecting between SRAM and DRAM TLBs, adapting FPGA DRAM striping, and tuning DMA I/O. On Amazon F1 FPGAs, FSRF outperforms techniques from FPGA OSes such as Coyote and AmorphOS with improvements of up to 64× and 2.3×, respectively (+75% and +27% geometric mean), and performance close to that of physical addressing (90% geometric mean).
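The abstract mentions adapting FPGA DRAM striping to the application. As a rough illustration of what a stripe-size choice changes, the sketch below maps a linear address to a DRAM channel and an offset using plain round-robin striping. It is not FSRF's implementation: the function names, the four-channel default, and the 4 KiB default stripe size are all assumptions made for this example.

```python
from typing import NamedTuple, Set


class DramLocation(NamedTuple):
    channel: int  # DRAM channel that services the access
    offset: int   # byte offset within that channel


def stripe(addr: int, num_channels: int = 4, stripe_bytes: int = 4096) -> DramLocation:
    """Map a linear address to (channel, offset) with round-robin striping.

    num_channels and stripe_bytes are hypothetical tuning knobs, not FSRF parameters.
    """
    if addr < 0 or num_channels <= 0 or stripe_bytes <= 0:
        raise ValueError("addr must be >= 0; num_channels and stripe_bytes must be > 0")
    # Which stripe the address falls in, and where inside that stripe.
    stripe_index, within = divmod(addr, stripe_bytes)
    # Consecutive stripes rotate across channels; 'row' counts full rotations.
    row, channel = divmod(stripe_index, num_channels)
    return DramLocation(channel, row * stripe_bytes + within)


def channels_touched(start: int, length: int,
                     num_channels: int = 4, stripe_bytes: int = 4096) -> Set[int]:
    """Return the set of channels a contiguous access [start, start + length) uses."""
    if length <= 0:
        raise ValueError("length must be > 0")
    first = start // stripe_bytes
    last = (start + length - 1) // stripe_bytes
    return {s % num_channels for s in range(first, last + 1)}


if __name__ == "__main__":
    # A 16 KiB sequential read spreads over every channel with fine striping,
    # but stays on a single channel with a coarse 1 MiB stripe.
    print(channels_touched(0, 16 * 1024))
    print(channels_touched(0, 16 * 1024, stripe_bytes=1 << 20))
```

Under these assumptions, a fine stripe spreads a sequential stream over all channels for bandwidth, while a coarse stripe keeps each region on one channel, which is the kind of trade-off an application-specific striping policy would tune.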


Cited By

  • (2024) Architectural Support for Sharing, Isolating and Virtualizing FPGA Resources. ACM Transactions on Architecture and Code Optimization 21:2 (1-26). DOI: 10.1145/3648475. Online publication date: 21-May-2024
  • (2024) Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA) (834-847). DOI: 10.1109/ISCA59077.2024.00065. Online publication date: 29-Jun-2024
  • (2024) FlexiMem: Flexible Shared Virtual Memory for PCIe-attached FPGAs. 2024 34th International Conference on Field-Programmable Logic and Applications (FPL) (78-86). DOI: 10.1109/FPL64840.2024.00020. Online publication date: 2-Sep-2024

Information & Contributors

Information

Published In

ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3
March 2023
820 pages
ISBN:9781450399180
DOI:10.1145/3582016
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 March 2023

Author Tags

  1. FPGAs
  2. Operating Systems
  3. Virtual Memory
  4. Virtualization

Qualifiers

  • Research-article

Conference

ASPLOS '23

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%


