research-article

Software-defined floating-point number formats and their application to graph processing

Author:

H. Vandierendonck

ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing

Article No.: 8, Pages 1 - 17

https://doi.org/10.1145/3524059.3532360

Published: 28 June 2022

Abstract

This paper proposes software-defined floating-point number formats for graph processing workloads, which can improve performance in irregular workloads by reducing cache misses. Efficient arithmetic on software-defined number formats is challenging, even when based on conversion to wider, hardware-supported formats. We derive efficient conversion schemes that are tuned to the IA64 and AVX512 instruction sets. We demonstrate that: (i) reduced-precision number formats can be applied to graph processing without loss of accuracy; (ii) conversion of floating-point values is possible with minimal instructions; (iii) conversions are most efficient when utilizing vectorized instruction sets, specifically on IA64 processors. Experiments on twelve real-world graph data sets demonstrate that our techniques result in speedups up to 89% for PageRank and Accelerated PageRank, and up to 35% for Single-Source Shortest Paths. The same techniques help to accelerate the integer-based maximal independent set problem by up to 262%.
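To make the idea of converting a narrow storage format to a wider, hardware-supported one concrete, here is a minimal sketch. It is not the paper's conversion scheme: it assumes a hypothetical 16-bit format that simply keeps the upper half of an IEEE 754 binary32 value (1 sign bit, 8 exponent bits, 7 mantissa bits, the same layout as bfloat16) and narrows by truncation rather than rounding. The function names `encode16` and `decode16` are illustrative only.

```python
import struct


def f32_bits(x: float) -> int:
    """Return the 32-bit pattern of x after rounding it to binary32."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(b: int) -> float:
    """Reinterpret a 32-bit pattern as a binary32 value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def encode16(x: float) -> int:
    """Narrow to 16 bits by dropping the low 16 mantissa bits (truncation).

    Assumption: the storage format is the top half of binary32.
    """
    return f32_bits(x) >> 16


def decode16(h: int) -> float:
    """Widen back to binary32 by shifting the 16 stored bits into place."""
    return bits_f32((h & 0xFFFF) << 16)


if __name__ == "__main__":
    rank = 0.15  # e.g. a PageRank-like value
    stored = encode16(rank)
    print(hex(stored), decode16(stored))
```

Under these assumptions, widening is a single shift and narrowing is a single shift, and values are stored in 16 bits instead of 32, which halves the memory footprint of a value array. With 7 mantissa bits and truncation, the relative error for normal values stays below 2^-7.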


Published In

ICS '22: Proceedings of the 36th ACM International Conference on Supercomputing

June 2022

514 pages

ISBN:9781450392815

DOI:10.1145/3524059

General Chairs: Lawrence Rauchwerger (University of Illinois at Urbana-Champaign), Kirk Cameron (Virginia Tech)

Program Chairs: Dimitrios S. Nikolopoulos (Virginia Tech), Dionisios Pnevmatikatos (National Technical University of Athens)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

ICS '22

Sponsor:

SIGARCH

ICS '22: 2022 International Conference on Supercomputing

June 28 - 30, 2022

Virtual Event

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%
