- research-article, June 2024
Load Balanced PIM-Based Graph Processing
ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 29, Issue 4, Article No.: 61, Pages 1–22. https://doi.org/10.1145/3659951
Graph processing is widely used for many modern applications, such as social networks, recommendation systems, and knowledge graphs. However, processing large-scale graphs on traditional Von Neumann architectures is challenging due to the irregular graph ...
- research-article, June 2023
Decoupled SSD: Rethinking SSD Architecture through Network-based Flash Controllers
ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture, Article No.: 61, Pages 1–13. https://doi.org/10.1145/3579371.3589096
Modern NAND Flash memory-based Solid State Drives (SSDs) are designed to provide high-bandwidth for I/O requests through high-speed NVMe interface and increased internal flash memory bandwidth. In addition to providing high performance for incoming I/...
- research-article, June 2022
MI2D: Accelerating Matrix Inversion with 2-Dimensional Tile Manipulations
GLSVLSI '22: Proceedings of the Great Lakes Symposium on VLSI 2022, Pages 423–429. https://doi.org/10.1145/3526241.3530314
Matrix inversion is critical in mathematics and scientific applications. Large-scale dense matrix inversion is especially challenging for modern computers due to its heavy dependency of matrix elements and the poor temporal data locality. In this paper, ...
- research-article, February 2021
HBM Connect: High-Performance HLS Interconnect for FPGA HBM
FPGA '21: The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, Pages 116–126. https://doi.org/10.1145/3431920.3439301
With the recent release of High Bandwidth Memory (HBM) based FPGA boards, developers can now exploit unprecedented external memory bandwidth. This allows more memory-bounded applications to benefit from FPGA acceleration. However, fully utilizing the ...
- research-article, April 2019
pLock: A Fast Lock for Architectures with Explicit Inter-core Message Passing
ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, Pages 765–778. https://doi.org/10.1145/3297858.3304030
Synchronization is a significant issue for multi-threaded programs. Mutex lock, as a classic solution, is widely used in legacy programs and is still popular for its intuition. The SW26010 architecture, deployed on the supercomputer Sunway Taihulight, ...
- research-article, January 2016
A Filtering Mechanism to Reduce Network Bandwidth Utilization of Transaction Execution
ACM Transactions on Architecture and Code Optimization (TACO), Volume 12, Issue 4, Article No.: 51, Pages 1–26. https://doi.org/10.1145/2837028
Hardware Transactional Memory (HTM) relies heavily on the on-chip network for intertransaction communication. However, the network bandwidth utilization of transactions has been largely neglected in HTM designs. In this work, we propose a cost model to ...
- Article, September 2015
Implementation and Modeling for High-performance I/O Hub Used in SPARC M7 Processor-Based Servers
MCSOC '15: Proceedings of the 2015 IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip, Pages 275–282. https://doi.org/10.1109/MCSoC.2015.29
The I/O Hub (IOH) for SPARC M7 processor-based servers is an ASIC providing high performance, flexible, and virtualized access to multiple Gen3 PCIe devices. The IOH's top-level interconnect, connecting multiple PCIe Root Complexes to a set of SPARC M7 ...
- research-article, August 2014
Consolidated conflict detection for hardware transactional memory
PACT '14: Proceedings of the 23rd international conference on Parallel architectures and compilation, Pages 201–212. https://doi.org/10.1145/2628071.2628076
Hardware Transactional Memory (HTM) promises to ease multithreaded parallel programming with uncompromised performance. Microprocessors supporting HTM implement a conflict detection mechanism to detect data access conflicts between transactions. ...
- research-article, October 2013
Adaptive virtual channel partitioning for network-on-chip in heterogeneous architectures
ACM Transactions on Design Automation of Electronic Systems (TODAES), Volume 18, Issue 4, Article No.: 48, Pages 1–28. https://doi.org/10.1145/2504906
Current heterogeneous chip-multiprocessors (CMPs) integrate a GPU architecture on a die. However, the heterogeneity of this architecture inevitably exerts different pressures on shared resource management due to differing characteristics of CPU and GPU ...
- opinion, October 2013
Toward a Coherent Multicore Memory Model
With exascale multicores, the question of how to efficiently support a shared memory model is of paramount importance. As programmers demand the convenience of coherent shared memory, ever-growing core counts place higher demands on memory subsystems, and ...
- Article, November 2012
Performance Modeling and Analysis of On-chip Networks for Real-Time Applications
PRDC '12: Proceedings of the 2012 IEEE 18th Pacific Rim International Symposium on Dependable Computing, Pages 111–120. https://doi.org/10.1109/PRDC.2012.18
Network-on-Chip (NoC) is now considered to be a promising approach to implementing many-core systems and some real-time applications are executed on them. However, it has not yet been proven that on-chip networks can theoretically satisfy the hard real-...
- poster, September 2012
TMNOC: a case of HTM and NoC co-design for increased energy efficiency and concurrency
PACT '12: Proceedings of the 21st international conference on Parallel architectures and compilation techniques, Pages 439–440. https://doi.org/10.1145/2370816.2370885
Hardware Transactional Memory (HTM) designs must implement conflict detection to guarantee the correctness of transaction execution. A conflict occurs when more than one transaction access the same data and at least one of them attempts to modify the ...
- research-article, July 2012
Power-aware performance increase via core/uncore reinforcement control for chip-multiprocessors
ISLPED '12: Proceedings of the 2012 ACM/IEEE international symposium on Low power electronics and design, Pages 97–102. https://doi.org/10.1145/2333660.2333686
Network-on-Chips (NoCs) have emerged as the backbone for the inter-core communication of a chip-multiprocessor (CMP). This paper evaluates and analyzes the advantages of managing the processing cores and the on-chip communication fabric in synergy for ...
- Article, May 2012
Efficient Timing Channel Protection for On-Chip Networks
NOCS '12: Proceedings of the 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip, Pages 142–151. https://doi.org/10.1109/NOCS.2012.24
On-chip network is often dynamically shared among applications that are concurrently running on a chip-multiprocessor (CMP). In general, such shared resources imply that applications can affect each other's timing characteristics through interference in ...
- research-article, December 2010
Thread criticality support in on-chip networks
NoCArc '10: Proceedings of the Third International Workshop on Network on Chip Architectures, Pages 5–10. https://doi.org/10.1145/1921249.1921253
Multicore computing is becoming the mainstream approach in computer system designs to effectively use growing transistor budgets for harnessing performance and energy-efficiency. Increasing the parallelism with more cores requires careful management, ...
- Article, December 2010
Probabilistic Distance-Based Arbitration: Providing Equality of Service for Many-Core CMPs
MICRO '43: Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, Pages 509–519. https://doi.org/10.1109/MICRO.2010.18
Emerging many-core chip multiprocessors will integrate dozens of small processing cores with an on-chip interconnect consisting of point-to-point links. The interconnect enables the processing cores to not only communicate, but to share common resources ...
- poster, September 2010
Approximating age-based arbitration in on-chip networks
PACT '10: Proceedings of the 19th international conference on Parallel architectures and compilation techniques, Pages 575–576. https://doi.org/10.1145/1854273.1854359
The on-chip network of emerging many-core CMPs enables the sharing of numerous on-chip components. This on-chip network needs to ensure fairness when accessing the shared resources. In this work, we propose providing equality of service (EoS) in future ...
- research-article, March 2010
A power-efficient all-optical on-chip interconnect using wavelength-based oblivious routing
ASPLOS XV: Proceedings of the fifteenth International Conference on Architectural support for programming languages and operating systems, Pages 15–28. https://doi.org/10.1145/1736020.1736024
We present an all-optical approach to constructing data networks on chip that combines the following key features: (1) Wavelength-based routing, where the route followed by a packet depends solely on the wavelength of its carrier signal, and not on ...
Also Published in: ACM SIGARCH Computer Architecture News: Volume 38 Issue 1; ACM SIGPLAN Notices: Volume 45 Issue 3
- research-article, January 2010
Performability/energy tradeoff in error-control schemes for on-chip networks
IEEE Transactions on Very Large Scale Integration (VLSI) Systems (ITVL), Volume 18, Issue 1, Pages 1–14. https://doi.org/10.1109/TVLSI.2008.2000994
High reliability against noise, high performance, and low energy consumption are key objectives in the design of on-chip networks. Recently some researchers have considered the impact of various error-control schemes on these objectives and on the ...
- research-article, December 2009
Low-cost router microarchitecture for on-chip networks
MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, Pages 255–266. https://doi.org/10.1145/1669112.1669145
On-chip networks are critical to the scaling of future multi-core processors. The challenge for on-chip network is to reduce the cost including power consumption and area while providing high performance such as low latency and high bandwidth. Although ...