- research-article, November 2024
SURE: Secure Unikernels Make Serverless Computing Rapid and Efficient
SoCC '24: Proceedings of the 2024 ACM Symposium on Cloud Computing, Pages 668–688, https://doi.org/10.1145/3698038.3698558
Current serverless platforms introduce non-trivial overheads when chaining and orchestrating loosely-coupled microservices. Containerized function runtimes are also constrained by insufficient isolation and excessive startup time. This motivates our ...
- research-article, July 2024
SHREG: Mitigating register redundancy in GPUs
Journal of Systems Architecture: the EUROMICRO Journal (JOSA), Volume 152, Issue C, https://doi.org/10.1016/j.sysarc.2024.103152
Abstract: Graphics Processing Units (GPUs) have become dominant accelerators for Machine Learning (ML) and High-Performance Computing (HPC) applications due to their massive parallelism capabilities, through the utilization of general matrix-to-matrix ...
- Article, April 2024
A Denotational Approach to Release/Acquire Concurrency
Abstract: We present a compositional denotational semantics for a functional language with first-class parallel composition and shared-memory operations whose operational semantics follows the Release/Acquire weak memory model (RA). The semantics is ...
- research-article, February 2024
Access Interval Prediction by Partial Matching for Tightly Coupled Memory Systems
International Journal of Parallel Programming (IJPP), Volume 52, Issue 1-2, Pages 3–19, https://doi.org/10.1007/s10766-024-00764-1
Abstract: In embedded systems, tightly coupled memories (TCMs) are usually shared between multiple masters for the purpose of hardware efficiency and software flexibility. On the one hand, memory sharing improves area utilization, but on the other hand, ...
- research-article, August 2023
Re-Cache: Mitigating cache contention by exploiting locality characteristics with reconfigurable memory hierarchy for GPGPUs
Abstract: Modern GPGPUs have employed multi-threading to hide the long off-chip memory access latency caused by frequent cache misses. However, the limited cache capacity shared by thousands of concurrently running warps will introduce serious cache ...
- research-article, March 2023
Approximate Toom–Cook FFT with sparsity aware error tuning in a shared memory architecture
Integration, the VLSI Journal (INTG), Volume 89, Issue C, Pages 94–105, https://doi.org/10.1016/j.vlsi.2022.11.009
Abstract: Approximate Computing techniques are finding a central role in modern applications, by optimizing architectures to relax some computation but with a constrained inaccuracy. In many applications, the FFT algorithm is invariably applied ...
Highlights: Approximate Computing can exploit error resilience of the applications to reduce chip resources.
- research-article, December 2022
Hybrid parallelization of Euler–Lagrange simulations based on MPI-3 shared memory
Advances in Engineering Software (ADES), Volume 174, Issue C, https://doi.org/10.1016/j.advengsoft.2022.103291
Abstract: The use of Euler–Lagrange methods on unstructured grids extends their application area to more versatile setups. However, the lack of a regular topology limits the scalability of distributed parallel methods, especially for routines ...
Highlights: A novel method to identify halo elements for unstructured Euler–Lagrange solvers.
- Article, August 2022
The Shared Memory Based Cryptographic Card Virtualization
Abstract: Driven by cloud computing technology, traditional cryptography is transforming into cloud cryptographic service. Cryptographic cards must be virtualized if they are to be used in cloud. Hardware virtualization is the most commonly used ...
- Article, July 2022
Tagged Geometric History Length Access Interval Prediction for Tightly Coupled Memory Systems
Embedded Computer Systems: Architectures, Modeling, and Simulation, Pages 90–100, https://doi.org/10.1007/978-3-031-15074-6_6
Abstract: In embedded systems, tightly coupled memories (TCMs) are usually shared between multiple masters for the purpose of performance scalability, hardware efficiency and software flexibility. On the one hand, memory sharing improves area utilization, ...
- research-article, June 2022
DIFFUSE: A DIstributed and decentralized platForm enabling Function composition in Serverless Environments
Computer Networks: The International Journal of Computer and Telecommunications Networking (CNTW), Volume 210, Issue C, https://doi.org/10.1016/j.comnet.2022.108993
Abstract: Serverless computing is an emerging proposition in the cloud offering landscape that promotes a higher level of abstraction, further decoupling software operations from the underlying hardware. Often recognized as an economically ...
- research-article, June 2022
Symbolic identification of shared memory based bank conflicts for GPUs
Journal of Systems Architecture: the EUROMICRO Journal (JOSA), Volume 127, Issue C, https://doi.org/10.1016/j.sysarc.2022.102518
Abstract: Graphic processing units (GPUs) are routinely used for general purpose computations to improve performance. To achieve the sought performance gains, care must be invested in fine tuning the way GPU programs interact with the underlying ...
- research-article, May 2022
Implementing three exchange read operations for distributed atomic storage
Journal of Parallel and Distributed Computing (JPDC), Volume 163, Issue C, Pages 97–113, https://doi.org/10.1016/j.jpdc.2022.01.024
Abstract: Communication latency typically dominates the performance of message-passing systems, and consequently defines the efficiency of operations of algorithms implementing atomic read/write objects in asynchronous, crash-prone, message-...
Highlights: Reduce the communication demands of read operations for distributed atomic memory.
- Article, March 2022
On the Difference Between Shared Memory and Shared Address Space in HPC Communication
Abstract: Shared memory mechanisms, e.g., POSIX shmem or XPMEM, are widely used to implement efficient intra-node communication among processes running on the same node. While POSIX shmem allows other processes to access only newly allocated memory, XPMEM ...
- rapid-communication, March 2022
Randomized consensus with regular registers
Highlights: The randomized consensus algorithm of Aspnes and Herlihy, which was shown to work with atomic registers, works even with regular registers.
Abstract: The well-known randomized consensus algorithm by Aspnes and Herlihy (1990) for asynchronous shared-memory systems was proved to work, even against a strong adversary, under the assumption that the registers that it uses are atomic. ...
- research-article, February 2022
On atomic registers and randomized consensus in M&M systems
Distributed Computing (DICO), Volume 35, Issue 1, Pages 81–103, https://doi.org/10.1007/s00446-021-00405-7
Abstract: Motivated by recent distributed systems technology, Aguilera et al. introduced a hybrid model of distributed computing, called the message-and-memory model or m&m model for short. In this model, processes can communicate by message passing and ...
- research-article, December 2021
Developing parallel programming and soft skills: A project based learning approach
Journal of Parallel and Distributed Computing (JPDC), Volume 158, Issue C, Pages 151–163, https://doi.org/10.1016/j.jpdc.2021.07.015
Abstract: Upon graduation, a computer science student should have a good understanding of the current technology and have the soft skills necessary to secure a position in industry. Considering that typical computers and even the common ...
Highlights: Hands-on Project Based Learning is an effective means to teach parallel programming.
- research-article, December 2021
A novel simulated annealing-based optimization approach for cluster-based task scheduling
Cluster Computing (KLU-CLUS), Volume 24, Issue 4, Pages 2927–2956, https://doi.org/10.1007/s10586-021-03275-7
Abstract: Rapidly advancing technology brings a huge volume of data along the way that grows at a staggering pace and cannot be processed with traditional algorithms/hardware. Therefore, storing, processing, and analyzing this data in a timely manner ...