- research-article, December 2024
PyGim: An Efficient Graph Neural Network Library for Real Processing-In-Memory Architectures
- Christina Giannoula,
- Peiming Yang,
- Ivan Fernandez,
- Jiacheng Yang,
- Sankeerth Durvasula,
- Yu Xin Li,
- Mohammad Sadrosadati,
- Juan Gomez Luna,
- Onur Mutlu,
- Gennady Pekhimenko
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 8, Issue 3, Article No.: 43, Pages 1–36
https://doi.org/10.1145/3700434
Graph Neural Networks (GNNs) are emerging models to analyze graph-structure data. GNN execution involves both compute-intensive and memory-intensive kernels. The latter kernels dominate execution time, because they are significantly bottlenecked by data ...
- research-article, June 2024
Sylva: Sparse Embedded Adapters via Hierarchical Approximate Second-Order Information
ICS '24: Proceedings of the 38th ACM International Conference on Supercomputing, Pages 485–497
https://doi.org/10.1145/3650200.3656619
Fine-tuning is the gateway to transferring learned knowledge in a pre-trained Large Language Model (LLM) on many downstream applications. To make LLM fine-tuning more affordable, prior works follow two paths: i) adapters freeze the pre-trained LLM ...
- research-article, April 2024
Marple: Scalable Spike Sorting for Untethered Brain-Machine Interfacing
- Eugene Sha,
- Andy Liu,
- Kareem Ibrahim,
- Mostafa Mahmoud,
- Christina Giannoula,
- Ameer Abdelhadi,
- Andreas Moshovos
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, Pages 666–682
https://doi.org/10.1145/3620665.3640357
Spike sorting is the process of parsing electrophysiological signals from neurons to identify if, when, and which particular neurons fire. Spike sorting is a particularly difficult task in computational neuroscience due to the growing scale of recording ...
- research-article, April 2024
Atalanta: A Bit is Worth a “Thousand” Tensor Values
- Alberto Delmas Lascorz,
- Mostafa Mahmoud,
- Ali Hadi Zadeh,
- Milos Nikolic,
- Kareem Ibrahim,
- Christina Giannoula,
- Ameer Abdelhadi,
- Andreas Moshovos
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, Pages 85–102
https://doi.org/10.1145/3620665.3640356
Atalanta is a lossless, hardware/software co-designed compression technique for the tensors of fixed-point quantized deep neural networks. Atalanta increases effective memory capacity, reduces off-die traffic, and/or helps to achieve the desired ...
- research-article, April 2024
Minuet: Accelerating 3D Sparse Convolutions on GPUs
EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems, Pages 786–802
https://doi.org/10.1145/3627703.3629560
Sparse Convolution (SC) is widely used for processing 3D point clouds that are inherently sparse. Different from dense convolution, SC preserves the sparsity of the input point cloud by only allowing outputs to specific locations. To efficiently compute ...
- abstract, June 2023
Architectural Support for Efficient Data Movement in Fully Disaggregated Systems
- Christina Giannoula,
- Kailong Huang,
- Jonathan Tang,
- Nectarios Koziris,
- Georgios Goumas,
- Zeshan Chishti,
- Nandita Vijaykumar
SIGMETRICS '23: Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Pages 5–6
https://doi.org/10.1145/3578338.3593533
Traditional data centers include monolithic servers that tightly integrate CPU, memory and disk (Figure 1a). Instead, Disaggregated Systems (DSs) [8, 13, 18, 27] organize multiple compute (CC), memory (MC) and storage devices as independent, failure-...
Also Published in:
ACM SIGMETRICS Performance Evaluation Review: Volume 51 Issue 1
- research-article, March 2023
DaeMon: Architectural Support for Efficient Data Movement in Fully Disaggregated Systems
- Christina Giannoula,
- Kailong Huang,
- Jonathan Tang,
- Nectarios Koziris,
- Georgios Goumas,
- Zeshan Chishti,
- Nandita Vijaykumar
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 7, Issue 1, Article No.: 16, Pages 1–36
https://doi.org/10.1145/3579445
Resource disaggregation offers a cost effective solution to resource scaling, utilization, and failure-handling in data centers by physically separating hardware devices in a server. Servers are architected as pools of processor, memory, and storage ...
- research-article, November 2022
- abstract, June 2022
Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures
SIGMETRICS/PERFORMANCE '22: Abstract Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, Pages 33–34
https://doi.org/10.1145/3489048.3522661
Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures, after decades of research efforts. Near-bank PIM architectures place simple cores close to DRAM banks. Recent research demonstrates that they ...
Also Published in:
ACM SIGMETRICS Performance Evaluation Review: Volume 50 Issue 1
- research-article, February 2022
SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures
Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 6, Issue 1, Article No.: 21, Pages 1–49
https://doi.org/10.1145/3508041
Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures, after decades of research efforts. Near-bank PIM architectures place simple cores close to DRAM banks. Recent research demonstrates that they ...
- research-article, October 2019
SMASH: Co-designing Software Compression and Hardware-Accelerated Indexing for Efficient Sparse Matrix Operations
- Konstantinos Kanellopoulos,
- Nandita Vijaykumar,
- Christina Giannoula,
- Roknoddin Azizi,
- Skanda Koppula,
- Nika Mansouri Ghiasi,
- Taha Shahroodi,
- Juan Gomez Luna,
- Onur Mutlu
MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, Pages 600–614
https://doi.org/10.1145/3352460.3358286
Important workloads, such as machine learning and graph analytics applications, heavily involve sparse linear algebra operations. These operations use sparse matrix compression as an effective means to avoid storing zeros and performing unnecessary ...
- research-article, April 2019
An adaptive concurrent priority queue for NUMA architectures
CF '19: Proceedings of the 16th ACM International Conference on Computing Frontiers, Pages 135–144
https://doi.org/10.1145/3310273.3323164
Designing scalable concurrent priority queues for contemporary NUMA servers is challenging. Several NUMA-unaware implementations can scale up to a high number of threads exploiting the potential parallelism of the insert operations. In contrast, in ...