COAXIAL: A CXL-Centric Memory System for Scalable Servers
SC '24: Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, Article No.: 95, Pages 1–15
https://doi.org/10.1109/SC41406.2024.00101
The memory system is a major performance determinant for server processors. Ever-growing core counts and datasets demand higher memory bandwidth and capacity. DDR, the dominant processor interface to memory, requires a large number of on-chip pins, ...
- research-article, November 2023
Filtering Wasteful Vertex Visits in Breadth-First Search
SC-W '23: Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, Pages 770–773
https://doi.org/10.1145/3624062.3625133
Breadth-First Search (BFS) is a common building block for several graph processing algorithms today. In this work, we highlight that a large fraction of vertex visits across the network in distributed BFS results in wasteful work. We investigate methods ...
- research-article, December 2022
Cooperative Concurrency Control for Write-Intensive Key-Value Workloads
ASPLOS 2023: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, Pages 30–46
https://doi.org/10.1145/3567955.3567957
Key-Value Stores (KVS) are foundational infrastructure components for online services. Due to their latency-critical nature, today’s best-performing KVS contain a plethora of full-stack optimizations commonly targeting read-mostly, popularity-skewed ...
- research-article, December 2023
Patching up Network Data Leaks with Sweeper
MICRO '22: Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 464–479
https://doi.org/10.1109/MICRO56248.2022.00041
Datacenters have witnessed a staggering evolution in networking technologies, driven by insatiable application demands for larger datasets and inter-server data transfers. Modern NICs can already handle 100s of Gbps of traffic, a bandwidth capability ...
- research-article, November 2021
OneEdge: An Efficient Control Plane for Geo-Distributed Infrastructures
SoCC '21: Proceedings of the ACM Symposium on Cloud Computing, Pages 182–196
https://doi.org/10.1145/3472883.3487008
Resource management for geo-distributed infrastructures is challenging due to the scarcity and non-uniformity of edge resources, as well as the high client mobility and workload surges inherent to situation awareness applications. Due to their ...
- research-article, October 2021
COSPlay: Leveraging Task-Level Parallelism for High-Throughput Synchronous Persistence
MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 86–99
https://doi.org/10.1145/3466752.3480075
A key challenge in programming crash-consistent applications for Persistent Memory (PM) is achieving high performance while controlling the order of PM updates. Managing persist ordering from the CPU typically requires frequent synchronization points, ...
- research-article, October 2021
Cerebros: Evading the RPC Tax in Datacenters
MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 407–420
https://doi.org/10.1145/3466752.3480055
The emerging paradigm of microservices decomposes online services into fine-grained software modules frequently communicating over the datacenter network, often using Remote Procedure Calls (RPCs). Ongoing advancements in the network stack have exposed ...
- research-article, January 2021
IDIO: Orchestrating Inbound Network Data on Server Processors
IEEE Computer Architecture Letters (ICAL), Volume 20, Issue 1, Pages 30–33
https://doi.org/10.1109/LCA.2020.3044923
Network bandwidth demand in datacenters is doubling every 12 to 15 months. In response to this demand, high-bandwidth network interface cards, each capable of transferring 100s of Gigabits of data per second, are making inroads into the servers of next-...
- survey, June 2020
Exploiting Errors for Efficiency: A Survey from Circuits to Applications
- Phillip Stanley-Marbell,
- Armin Alaghi,
- Michael Carbin,
- Eva Darulova,
- Lara Dolecek,
- Andreas Gerstlauer,
- Ghayoor Gillani,
- Djordje Jevdjic,
- Thierry Moreau,
- Mattia Cacciotti,
- Alexandros Daglis,
- Natalie Enright Jerger,
- Babak Falsafi,
- Sasa Misailovic,
- Adrian Sampson,
- Damien Zufferey
ACM Computing Surveys (CSUR), Volume 53, Issue 3, Article No.: 51, Pages 1–39
https://doi.org/10.1145/3394898
When a computational task tolerates a relaxation of its specification or when an algorithm tolerates the effects of noise in its execution, hardware, system software, and programming language compilers or their runtime systems can trade deviations from ...
- research-article, September 2020
The NeBuLa RPC-optimized architecture
- Mark Sutherland,
- Siddharth Gupta,
- Babak Falsafi,
- Virendra Marathe,
- Dionisios Pnevmatikatos,
- Alexandros Daglis
ISCA '20: Proceedings of the ACM/IEEE 47th Annual International Symposium on Computer Architecture, Pages 199–212
https://doi.org/10.1109/ISCA45697.2020.00027
Large-scale online services are commonly structured as a network of software tiers, which communicate over the datacenter network using RPCs. Ongoing trends towards software decomposition have led to the prevalence of tiers receiving and generating RPCs ...
- research-article, October 2019
Distributed Logless Atomic Durability with Persistent Memory
MICRO '52: Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, Pages 466–478
https://doi.org/10.1145/3352460.3358321
Datacenter operators have started deploying Persistent Memory (PM), leveraging its combination of fast access and persistence for significant performance gains. A key challenge for PM-aware software is to maintain high performance while achieving atomic ...
- research-article, April 2019
Mitigating Load Imbalance in Distributed Data Serving with Rack-Scale Memory Pooling
ACM Transactions on Computer Systems (TOCS), Volume 36, Issue 2, Article No.: 6, Pages 1–37
https://doi.org/10.1145/3309986
To provide low-latency and high-throughput guarantees, most large key-value stores keep the data in the memory of many servers. Despite the natural parallelism across lookups, the load imbalance, introduced by heavy skew in the popularity distribution ...
- research-article, April 2019
RPCValet: NI-Driven Tail-Aware Balancing of µs-Scale RPCs
ASPLOS '19: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, Pages 35–48
https://doi.org/10.1145/3297858.3304070
Modern online services come with stringent quality requirements in terms of response time tail latency. Because of their decomposition into fine-grained communicating software layers, a single user request fans out into a plethora of short, μs-scale ...
- research-article, October 2018
Design guidelines for high-performance SCM hierarchies
- Dmitrii Ustiugov,
- Alexandros Daglis,
- Javier Picorel,
- Mark Sutherland,
- Edouard Bugnion,
- Babak Falsafi,
- Dionisios Pnevmatikatos
MEMSYS '18: Proceedings of the International Symposium on Memory Systems, Pages 3–16
https://doi.org/10.1145/3240302.3240310
With emerging storage-class memory (SCM) nearing commercialization, there is evidence that it will deliver the much-anticipated high density and access latencies within only a few factors of DRAM. Nevertheless, the latency-sensitive nature of memory-...
- research-article, August 2018
Algorithm/Architecture Co-Design for Near-Memory Processing
- Mario Drumond,
- Alexandros Daglis,
- Nooshin Mirzadeh,
- Dmitrii Ustiugov,
- Javier Picorel,
- Babak Falsafi,
- Boris Grot,
- Dionisios Pnevmatikatos
ACM SIGOPS Operating Systems Review (SIGOPS), Volume 52, Issue 1, Pages 109–122
https://doi.org/10.1145/3273982.3273992
With mainstream technologies to couple logic tightly with memory on the horizon, near-memory processing has re-emerged as a promising approach to improving performance and energy for data-centric computing. DRAM, however, is primarily designed for ...
- research-article, June 2017
The Mondrian Data Engine
- Mario Drumond,
- Alexandros Daglis,
- Nooshin Mirzadeh,
- Dmitrii Ustiugov,
- Javier Picorel,
- Babak Falsafi,
- Boris Grot,
- Dionisios Pnevmatikatos
ISCA '17: Proceedings of the 44th Annual International Symposium on Computer Architecture, Pages 639–651
https://doi.org/10.1145/3079856.3080233
The increasing demand for extracting value out of ever-growing data poses an ongoing challenge to system designers, a task only made trickier by the end of Dennard scaling. As the performance density of traditional CPU-centric architectures stagnates, ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 45 Issue 2
- research-article, October 2016
SABRes: atomic object reads for in-memory rack-scale computing
MICRO-49: The 49th Annual IEEE/ACM International Symposium on Microarchitecture, Article No.: 6, Pages 1–13
Modern in-memory services rely on large distributed object stores to achieve the high scalability essential to service thousands of requests concurrently. The independent and unpredictable nature of incoming requests results in random accesses to the ...
- research-article, October 2016
The Case for RackOut: Scalable Data Serving Using Rack-Scale Systems
SoCC '16: Proceedings of the Seventh ACM Symposium on Cloud Computing, Pages 182–195
https://doi.org/10.1145/2987550.2987577
To provide low latency and high throughput guarantees, most large key-value stores keep the data in the memory of many servers. Despite the natural parallelism across lookups, the load imbalance, introduced by heavy skew in the popularity distribution ...
- poster, June 2016
An Analysis of Load Imbalance in Scale-out Data Serving
SIGMETRICS '16: Proceedings of the 2016 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Science, Pages 367–368
https://doi.org/10.1145/2896377.2901501
Despite the natural parallelism across lookups, performance of distributed key-value stores is often limited due to load imbalance induced by heavy skew in the popularity distribution of the dataset. To avoid violating service level objectives expressed ...
Also Published in:
ACM SIGMETRICS Performance Evaluation Review: Volume 44 Issue 1
- research-article, June 2015
Manycore network interfaces for in-memory rack-scale computing
ISCA '15: Proceedings of the 42nd Annual International Symposium on Computer Architecture, Pages 567–579
https://doi.org/10.1145/2749469.2750415
Datacenter operators rely on low-cost, high-density technologies to maximize throughput for data-intensive services with tight tail latencies. In-memory rack-scale computing is emerging as a promising paradigm in scale-out datacenters capitalizing on ...
Also Published in:
ACM SIGARCH Computer Architecture News: Volume 43 Issue 3S