Author: Naithani, Ajeya: Search

8 Results for: Author: Naithani, Ajeya

Searched The ACM Guide to Computing Literature (3,856,353 records)

research-article
July 2024
Decoupled Vector Runahead for Prefetching Nested Memory-Access Chains
IEEE Micro (IMIC), Volume 44, Issue 4, Pages 20–26, https://doi.org/10.1109/MM.2024.3406891
Decoupled vector runahead (DVR) exploits massive amounts of memory-level parallelism to improve the performance of applications that feature indirect memory accesses by dynamically inferring loop bounds at runtime, recognizing striding loads, and ...
Total Citations: 0
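The abstract mentions recognizing striding loads. One conventional way hardware detects them is a small table indexed by the load's program counter (PC) that remembers the last address and the last stride. The Python sketch below models that generic idea only; the class names, the confidence threshold of 2, and the example addresses are assumptions for illustration, not details of the DVR design.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StrideEntry:
    last_addr: int
    stride: int = 0
    confidence: int = 0


class StrideDetector:
    """Generic per-PC stride table (illustrative sketch, not DVR's hardware)."""

    def __init__(self, threshold: int = 2) -> None:
        # Assumed policy: a load counts as striding after `threshold`
        # consecutive repeats of the same non-zero stride.
        self.threshold = threshold
        self.table: Dict[int, StrideEntry] = {}

    def observe(self, pc: int, addr: int) -> bool:
        """Record one execution of the load at `pc`; return True if it is now striding."""
        entry = self.table.get(pc)
        if entry is None:
            self.table[pc] = StrideEntry(last_addr=addr)
            return False
        stride = addr - entry.last_addr
        if stride != 0 and stride == entry.stride:
            entry.confidence = min(entry.confidence + 1, self.threshold)
        else:
            # New or changed stride: retrain from scratch.
            entry.stride = stride
            entry.confidence = 0
        entry.last_addr = addr
        return entry.confidence >= self.threshold

    def future_addresses(self, pc: int, count: int) -> List[int]:
        """Addresses of the next `count` instances of a striding load."""
        entry = self.table[pc]
        return [entry.last_addr + entry.stride * k for k in range(1, count + 1)]


detector = StrideDetector()
for addr in (0x1000, 0x1008, 0x1010, 0x1018):
    striding = detector.observe(pc=0x40, addr=addr)
print(striding, [hex(a) for a in detector.future_addresses(0x40, 4)])
```

Run as written, this prints `True ['0x1020', '0x1028', '0x1030', '0x1038']`. A list of future addresses like this is the kind of batch, covering many upcoming loop iterations, that a vectorized runahead engine could in principle issue together.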
research-article
December 2023
Decoupled Vector Runahead
MICRO '23: Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, Pages 17–31, https://doi.org/10.1145/3613424.3614255

We present Decoupled Vector Runahead (DVR), an in-core prefetching technique, executing separately to the main application thread, that exploits massive amounts of memory-level parallelism to improve the performance of applications featuring indirect ...
Total Citations: 4
Total Downloads: 632
Last 12 Months: 381
Last 6 weeks: 32
research-article
July 2022
Vector Runahead for Indirect Memory Accesses
IEEE Micro (IMIC), Volume 42, Issue 4, Pages 116–123, https://doi.org/10.1109/MM.2022.3163132
Vector runahead delivers extremely high memory-level parallelism even for the chains of dependent memory accesses with complex intermediate address computation, which conventional runahead techniques fundamentally cannot handle and, therefore, have ...
Total Citations: 1
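The abstract refers to chains of dependent memory accesses with intermediate address computation. As a purely software analogy (not the vector runahead hardware), the sketch below contrasts walking a hypothetical chain `c[f(b[a[i]])]` one iteration at a time with advancing many iterations level by level. Within one level, the loads of different iterations do not depend on each other, so they could all be outstanding at once. The arrays and the helper `address_step` are made up for illustration.

```python
from typing import List, Sequence


def address_step(value: int, size: int) -> int:
    # Hypothetical intermediate address computation between two dependent loads.
    return (value * 3 + 1) % size


def scalar_chains(a: Sequence[int], b: Sequence[int], c: Sequence[int],
                  start: int, count: int) -> List[int]:
    """Walk a[i] -> b[...] -> c[...] one iteration at a time."""
    out = []
    for i in range(start, start + count):
        j = a[i]                          # level 1: striding load
        k = address_step(b[j], len(c))    # level 2: indirect load plus computation
        out.append(c[k])                  # level 3: indirect load
    return out


def lockstep_chains(a: Sequence[int], b: Sequence[int], c: Sequence[int],
                    start: int, count: int) -> List[int]:
    """Advance all `count` iterations one level at a time, like vector lanes."""
    lanes = range(start, start + count)
    level1 = [a[i] for i in lanes]                         # mutually independent loads
    level2 = [address_step(b[j], len(c)) for j in level1]  # mutually independent loads
    return [c[k] for k in level2]                          # mutually independent loads
```

Both functions compute the same values; the lock-step version only changes the order in which loads become ready, which is where the extra memory-level parallelism would come from.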
research-article
June 2022
VMT: Virtualized Multi-Threading for Accelerating Graph Workloads on Commodity Processors
IEEE Transactions on Computers (ITCO), Volume 71, Issue 6, Pages 1386–1398, https://doi.org/10.1109/TC.2021.3086069
Modern-day graph workloads operate on huge graphs through pointer chasing which leads to high last-level cache (LLC) miss rates and limited memory-level parallelism (MLP). Simultaneous Multi-Threading (SMT) effectively hides the memory access latencies ...
Total Citations: 1
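The abstract links pointer chasing to limited memory-level parallelism and notes that SMT hides memory access latency. The sketch below illustrates the reason in an idealized way; it is not VMT's mechanism. A single chain exposes only one load at a time, whereas interleaving several independent chains, as separate hardware threads would, exposes one load per chain in each round. The `next_node` array is a hypothetical stand-in for a linked data structure.

```python
from typing import List, Sequence, Tuple


def chase(next_node: Sequence[int], head: int, hops: int) -> List[int]:
    """Follow one linked chain; each hop needs the previous hop's loaded value."""
    visited = []
    node = head
    for _ in range(hops):
        visited.append(node)
        node = next_node[node]   # the next address comes from this load
    return visited


def interleaved_chase(next_node: Sequence[int], heads: Sequence[int],
                      hops: int) -> List[Tuple[int, ...]]:
    """Advance several independent chains together, one hop per round.

    Loads within a round belong to different chains and are independent;
    only loads in successive rounds are serialized.
    """
    nodes = list(heads)
    rounds = []
    for _ in range(hops):
        rounds.append(tuple(nodes))
        nodes = [next_node[n] for n in nodes]
    return rounds
```

In this idealized model, k chains of n hops cost k times n serialized load latencies when walked one after another, but only n rounds when interleaved, assuming every load within a round overlaps fully.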
research-article
Open Access
January 2022
The Forward Slice Core: A High-Performance, Yet Low-Complexity Microarchitecture
ACM Transactions on Architecture and Code Optimization (TACO), Volume 19, Issue 2, Article No. 17, Pages 1–25, https://doi.org/10.1145/3499424
Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architecture should ideally ...
Total Citations: 0
Total Downloads: 2,032
Last 12 Months: 629
Last 6 weeks: 141
research-article
November 2021
Vector runahead
ISCA '21: Proceedings of the 48th Annual International Symposium on Computer Architecture, Pages 195–208, https://doi.org/10.1109/ISCA52012.2021.00024

The memory wall places a significant limit on performance for many modern workloads. These applications feature complex chains of dependent, indirect memory accesses, which cannot be picked up by even the most advanced microarchitectural prefetchers. ...
Total Citations: 7
Total Downloads: 169
Last 12 Months: 34
Last 6 weeks: 5
research-article
September 2020
The Forward Slice Core Microarchitecture
PACT '20: Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques, Pages 361–372, https://doi.org/10.1145/3410463.3414629

Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architecture should ...
Total Citations: 10
Total Downloads: 466
Last 12 Months: 46
Last 6 weeks: 3
research-article
January 2019
Precise Runahead Execution
IEEE Computer Architecture Letters (ICAL), Volume 18, Issue 1, Pages 71–74, https://doi.org/10.1109/LCA.2019.2910518
Runahead execution improves processor performance by accurately prefetching long-latency memory accesses. When a long-latency load causes the instruction window to fill up and halt the pipeline, the processor enters runahead mode and keeps speculatively ...
Total Citations: 2
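The abstract summarizes runahead execution: when a long-latency load fills the instruction window, the processor keeps executing speculatively to uncover further misses. The toy model below sketches that generic idea as it is commonly described. Results of missing loads are marked invalid, dependents inherit the invalid mark, and independent misses become prefetches. It is not the Precise Runahead design; the `Load` record, the trace, and the `budget` parameter are assumptions for illustration.

```python
from typing import List, NamedTuple, Optional, Sequence


class Load(NamedTuple):
    addr: int                  # address this load would access
    misses: bool               # would it miss in the cache?
    depends_on: Optional[int]  # index of an earlier load that produces its address


def runahead_prefetches(trace: Sequence[Load], blocking: int, budget: int) -> List[int]:
    """Return addresses one runahead episode could prefetch (toy model).

    `blocking` is the index of the long-latency load that triggered runahead;
    `budget` (assumed) is how many later loads are examined before it returns.
    """
    invalid = {blocking}   # the blocking load's value is not yet available
    prefetches = []
    for i in range(blocking + 1, min(len(trace), blocking + 1 + budget)):
        load = trace[i]
        if load.depends_on in invalid:
            invalid.add(i)           # address unknown: cannot prefetch, poison dependents
        elif load.misses:
            prefetches.append(load.addr)
            invalid.add(i)           # assumed: its value does not arrive during this episode
    return prefetches
```

A load whose address depends on the blocking miss, directly or through another invalid load, is skipped, while an independent miss further ahead is still turned into a prefetch.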