Author: Giannoula, Christina : Search

research-article

PyGim: An Efficient Graph Neural Network Library for Real Processing-In-Memory Architectures

Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 8, Issue 3, Article No.: 43, Pages 1–36, https://doi.org/10.1145/3700434

Graph Neural Networks (GNNs) are emerging models to analyze graph-structure data. GNN execution involves both compute-intensive and memory-intensive kernels. The latter kernels dominate execution time, because they are significantly bottlenecked by data ...

research-article

Sylva: Sparse Embedded Adapters via Hierarchical Approximate Second-Order Information

ICS '24: Proceedings of the 38th ACM International Conference on Supercomputing, Pages 485–497, https://doi.org/10.1145/3650200.3656619

Fine-tuning is the gateway to transferring learned knowledge in a pre-trained Large Language Model (LLM) on many downstream applications. To make LLM fine-tuning more affordable, prior works follow two paths: i) adapters freeze the pre-trained LLM ...

research-article

Marple: Scalable Spike Sorting for Untethered Brain-Machine Interfacing

ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, Pages 666–682, https://doi.org/10.1145/3620665.3640357

Spike sorting is the process of parsing electrophysiological signals from neurons to identify if, when, and which particular neurons fire. Spike sorting is a particularly difficult task in computational neuroscience due to the growing scale of recording ...

research-article

Atalanta: A Bit is Worth a “Thousand” Tensor Values

ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, Pages 85–102, https://doi.org/10.1145/3620665.3640356

Atalanta is a lossless, hardware/software co-designed compression technique for the tensors of fixed-point quantized deep neural networks. Atalanta increases effective memory capacity, reduces off-die traffic, and/or helps to achieve the desired ...

research-article

Minuet: Accelerating 3D Sparse Convolutions on GPUs

EuroSys '24: Proceedings of the Nineteenth European Conference on Computer Systems, Pages 786–802, https://doi.org/10.1145/3627703.3629560

Sparse Convolution (SC) is widely used for processing 3D point clouds that are inherently sparse. Different from dense convolution, SC preserves the sparsity of the input point cloud by only allowing outputs to specific locations. To efficiently compute ...

abstract

Architectural Support for Efficient Data Movement in Fully Disaggregated Systems

SIGMETRICS '23: Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Pages 5–6, https://doi.org/10.1145/3578338.3593533

Traditional data centers include monolithic servers that tightly integrate CPU, memory and disk (Figure 1a). Instead, Disaggregated Systems (DSs) [8, 13, 18, 27] organize multiple compute (CC), memory (MC) and storage devices as independent, failure-...

Also Published in:

ACM SIGMETRICS Performance Evaluation Review: Volume 51 Issue 1

research-article

DaeMon: Architectural Support for Efficient Data Movement in Fully Disaggregated Systems

Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 7, Issue 1, Article No.: 16, Pages 1–36, https://doi.org/10.1145/3579445

Resource disaggregation offers a cost-effective solution to resource scaling, utilization, and failure-handling in data centers by physically separating hardware devices in a server. Servers are architected as pools of processor, memory, and storage ...

research-article

High-performance and balanced parallel graph coloring on multicore platforms

The Journal of Supercomputing (JSCO), Volume 79, Issue 6, Pages 6373–6421, https://doi.org/10.1007/s11227-022-04894-6

Graph coloring is widely used to parallelize scientific applications by identifying subsets of independent tasks that can be executed simultaneously. Graph coloring assigns colors to the vertices of a graph, such that no adjacent vertices have the ...

abstract

Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures

SIGMETRICS/PERFORMANCE '22: Abstract Proceedings of the 2022 ACM SIGMETRICS/IFIP PERFORMANCE Joint International Conference on Measurement and Modeling of Computer Systems, Pages 33–34, https://doi.org/10.1145/3489048.3522661

Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures, after decades of research efforts. Near-bank PIM architectures place simple cores close to DRAM banks. Recent research demonstrates that they ...

Also Published in:

ACM SIGMETRICS Performance Evaluation Review: Volume 50 Issue 1

research-article

SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Architectures

Proceedings of the ACM on Measurement and Analysis of Computing Systems (POMACS), Volume 6, Issue 1, Article No.: 21, Pages 1–49, https://doi.org/10.1145/3508041

Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures, after decades of research efforts. Near-bank PIM architectures place simple cores close to DRAM banks. Recent research demonstrates that they ...

research-article

SMASH: Co-designing Software Compression and Hardware-Accelerated Indexing for Efficient Sparse Matrix Operations

MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, Pages 600–614, https://doi.org/10.1145/3352460.3358286

Important workloads, such as machine learning and graph analytics applications, heavily involve sparse linear algebra operations. These operations use sparse matrix compression as an effective means to avoid storing zeros and performing unnecessary ...

research-article

An adaptive concurrent priority queue for NUMA architectures

CF '19: Proceedings of the 16th ACM International Conference on Computing Frontiers, Pages 135–144, https://doi.org/10.1145/3310273.3323164

Designing scalable concurrent priority queues for contemporary NUMA servers is challenging. Several NUMA-unaware implementations can scale up to a high number of threads exploiting the potential parallelism of the insert operations. In contrast, in ...
