research-article
DOI: 10.1145/3470496.3527437

NDMiner: accelerating graph pattern mining using near data processing

Published: 11 June 2022

Abstract

Graph Pattern Mining (GPM) algorithms mine structural patterns in graphs. The performance of GPM workloads is bottlenecked by control-flow and memory stalls, because the set intersection and set difference operations that dominate execution time rely on data-dependent branches.
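To make the bottleneck concrete, here is a minimal software sketch of merge-based set intersection and difference over sorted neighbor lists. It is an illustration only, not code from the paper; the function names and the sorted-list representation are assumptions. Every comparison outcome depends on the data just loaded, which is why these loops are hard for branch predictors and sensitive to memory latency.

```python
def intersect_sorted(a, b):
    """Merge-based intersection of two sorted, duplicate-free lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        # Each branch below is data-dependent: its direction is only
        # known once a[i] and b[j] have been loaded and compared.
        if a[i] < b[j]:
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            out.append(a[i])
            i += 1
            j += 1
    return out


def difference_sorted(a, b):
    """Elements of sorted list a that do not appear in sorted list b."""
    i = j = 0
    out = []
    while i < len(a):
        if j == len(b) or a[i] < b[j]:
            out.append(a[i])  # a[i] cannot appear later in b
            i += 1
        elif a[i] > b[j]:
            j += 1
        else:
            i += 1  # common element: skip it in both lists
            j += 1
    return out


if __name__ == "__main__":
    print(intersect_sorted([1, 3, 5, 7], [3, 4, 5, 8]))
    print(difference_sorted([1, 3, 5, 7], [3, 4, 5, 8]))
```

In a GPM workload the two inputs would typically be adjacency lists of two different vertices, so the loads feeding these branches often come from unrelated memory locations.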
This paper first conducts a systematic GPM workload analysis and uncovers four new observations to inform the optimization effort. First, GPM workloads mostly fetch the inputs of costly set operations from different memory banks. Second, to avoid redundant computation, modern GPM workloads employ symmetry breaking, which discards some of the data that has been read, polluting the cache and wasting DRAM bandwidth. Third, sparse pattern mining algorithms perform redundant memory reads and computations. Fourth, GPM workloads do not fully utilize in-DRAM data parallelism.
Based on these observations, this paper presents NDMiner, a Near Data Processing (NDP) architecture that improves the performance of GPM workloads. To reduce the in-memory data transfers incurred by fetching data from different memory banks, NDMiner integrates compute units in the buffer chip of DRAM and offloads set operations to them. To alleviate the memory bandwidth wasted by symmetry breaking, NDMiner integrates a hardware load elision unit that detects the satisfiability of symmetry breaking constraints and terminates unnecessary loads. To optimize the performance of sparse pattern mining, NDMiner employs compiler optimizations and maps the reduced reads and composite computations to NDP hardware, improving the algorithmic efficiency of sparse GPM. Finally, NDMiner proposes a new graph remapping scheme in memory and a hardware-based set operation reordering technique to better exploit bank-, rank-, and channel-level parallelism in DRAM.
To orchestrate NDP computation, this paper presents design modifications to the host ISA, compiler, and memory controller. We compare the performance of NDMiner with state-of-the-art software and hardware baselines using a mix of dense and sparse GPM algorithms. Our evaluation shows that NDMiner outperforms the software and hardware baselines by 6.4× and 2.5× on average, respectively, while incurring a negligible area overhead on the CPU and DRAM.
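The idea behind terminating loads once a symmetry breaking constraint can no longer be satisfied can be sketched in software with triangle counting. This is a hedged analogy, not NDMiner's hardware design: the ordering constraint v0 > v1 > v2, the function names, and the dictionary-of-sorted-lists graph format are all assumptions made for illustration. Because the neighbor lists are sorted, the intersection can stop as soon as either list reaches the bound, so the remaining elements are never read.

```python
def bounded_intersect(a, b, bound):
    """Intersect sorted lists a and b, keeping only elements < bound.

    Stops at the first element >= bound: since both lists are sorted,
    nothing after that point can satisfy the constraint.
    """
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        x, y = a[i], b[j]
        if x >= bound or y >= bound:
            break  # constraint unsatisfiable from here on: skip the rest
        if x < y:
            i += 1
        elif x > y:
            j += 1
        else:
            out.append(x)
            i += 1
            j += 1
    return out


def count_triangles(adj):
    """Count each triangle once by enforcing v0 > v1 > v2.

    adj maps a vertex id to its sorted list of neighbor ids (undirected).
    """
    total = 0
    for v0 in adj:
        for v1 in adj[v0]:
            if v1 >= v0:
                break  # symmetry breaking on the second vertex
            total += len(bounded_intersect(adj[v0], adj[v1], v1))
    return total


if __name__ == "__main__":
    k4 = {0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2]}
    print(count_triangles(k4))
```

Without the ordering constraint, each triangle would be found six times (once per vertex permutation); with it, each is found once, and the early `break` avoids touching list elements that would be discarded anyway.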

    Published In

    ISCA '22: Proceedings of the 49th Annual International Symposium on Computer Architecture
    June 2022
    1097 pages
    ISBN:9781450386104
    DOI:10.1145/3470496
    In-Cooperation

    • IEEE CS TCCA: IEEE Computer Society Technical Committee on Computer Architecture

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. graph pattern mining
    2. hardware-software co-design
    3. near data processing

    Qualifiers

    • Research-article

    Funding Sources

    • Air Force Research Laboratory (AFRL) and Defense Advanced Research Projects Agency (DARPA)

    Conference

    ISCA '22

    Acceptance Rates

    ISCA '22 Paper Acceptance Rate 67 of 400 submissions, 17%;
    Overall Acceptance Rate 543 of 3,203 submissions, 17%

    Article Metrics

    • Downloads (last 12 months): 180
    • Downloads (last 6 weeks): 21
    Reflects downloads up to 04 Jan 2025

    Cited By

    • (2024) PimPam: Efficient Graph Pattern Matching on Real Processing-in-Memory Hardware. Proceedings of the ACM on Management of Data 2:3 (1-25). DOI: 10.1145/3654964. Online publication date: 30-May-2024
    • (2024) TMiner: A Vertex-Based Task Scheduling Architecture for Graph Pattern Mining. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO) (1295-1308). DOI: 10.1109/MICRO61859.2024.00096. Online publication date: 2-Nov-2024
    • (2024) AceMiner: Accelerating Graph Pattern Matching using PIM with Optimized Cache System. 2024 IEEE 42nd International Conference on Computer Design (ICCD) (558-565). DOI: 10.1109/ICCD63220.2024.00091. Online publication date: 18-Nov-2024
    • (2023) A Tensor Marshaling Unit for Sparse Tensor Algebra on General-Purpose Processors. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (1332-1346). DOI: 10.1145/3613424.3614284. Online publication date: 28-Oct-2023
    • (2023) GraphSet: High Performance Graph Mining through Equivalent Set Transformations. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-14). DOI: 10.1145/3581784.3613213. Online publication date: 12-Nov-2023
    • (2023) Shogun: A Task Scheduling Framework for Graph Mining Accelerators. Proceedings of the 50th Annual International Symposium on Computer Architecture (1-15). DOI: 10.1145/3579371.3589086. Online publication date: 17-Jun-2023
    • (2023) Exploiting the Potential of Flexible Processing Units. 2023 IEEE 35th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) (34-45). DOI: 10.1109/SBAC-PAD59825.2023.00013. Online publication date: 17-Oct-2023
    • (2023) DIMM-Link: Enabling Efficient Inter-DIMM Communication for Near-Memory Processing. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (302-316). DOI: 10.1109/HPCA56546.2023.10071005. Online publication date: Feb-2023
    • (2023) PSMiner: A Pattern-Aware Accelerator for High-Performance Streaming Graph Pattern Mining. 2023 60th ACM/IEEE Design Automation Conference (DAC) (1-6). DOI: 10.1109/DAC56929.2023.10247902. Online publication date: 9-Jul-2023
    • (2022) Software Systems Implementation and Domain-Specific Architectures towards Graph Analytics. Intelligent Computing 2022. DOI: 10.34133/2022/9806758. Online publication date: 29-Oct-2022
