Ham T, Aragón J and Martonosi M. (2019). Efficient Data Supply for Parallel Heterogeneous Architectures. ACM Transactions on Architecture and Code Optimization. 16:2. (1-23). Online publication date: 30-Jun-2019.

Kondguli S and Huang M. Bootstrapping. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems. (687-700).

Rengasamy P, Zhang H, Zhao S, Nachiappan N, Sivasubramaniam A, Kandemir M and Das C. CritICs critiquing criticality in mobile apps. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture. (867-880).

https://doi.org/10.1109/MICRO.2018.00075

Jayaraman P and Parthasarathi R. (2017). A Survey on Post-Silicon Functional Validation for Multicore Architectures. ACM Computing Surveys. 50:4. (1-30). Online publication date: 31-Jul-2018.

https://doi.org/10.1145/3107615

Kondguli S and Huang M. (2018). A Case for a More Effective, Power-Efficient Turbo Boosting. ACM Transactions on Architecture and Code Optimization. 15:1. (1-22). Online publication date: 2-Apr-2018.

https://doi.org/10.1145/3170433

Ham T, Aragón J and Martonosi M. (2017). Decoupling Data Supply from Computation for Latency-Tolerant Communication in Heterogeneous Architectures. ACM Transactions on Architecture and Code Optimization. 14:2. (1-27). Online publication date: 30-Jun-2017.

https://doi.org/10.1145/3075620

Hashemi M, Mutlu O and Patt Y. Continuous runahead. The 49th Annual IEEE/ACM International Symposium on Microarchitecture. (1-12).

https://dl.acm.org/doi/10.5555/3195638.3195712

Ntafam P, Paire E, Clouard A and Petrot F. Simulation driven insertion of data prefetching instructions for early software-on-SoC optimization. Proceedings of the 27th International Symposium on Rapid System Prototyping: Shortening the Path from Specification to Prototype. (93-99).

https://doi.org/10.1145/2990299.2990315

Atta I, Tong X, Srinivasan V, Baldini I and Moshovos A. Self-contained, accurate precomputation prefetching. Proceedings of the 48th International Symposium on Microarchitecture. (153-165).

https://doi.org/10.1145/2830772.2830816

Sembrant A, Carlson T, Hagersten E, Black-Shaffer D, Perais A, Seznec A and Michaud P. Long term parking (LTP). Proceedings of the 48th International Symposium on Microarchitecture. (334-346).

https://doi.org/10.1145/2830772.2830815

Ham T, Aragón J and Martonosi M. DeSC. Proceedings of the 48th International Symposium on Microarchitecture. (191-203).

https://doi.org/10.1145/2830772.2830800

Patsilaras G, Choudhary N and Tuck J. (2012). Efficiently exploiting memory level parallelism on asymmetric coupled cores in the dark silicon era. ACM Transactions on Architecture and Code Optimization. 8:4. (1-21). Online publication date: 1-Jan-2012.

https://doi.org/10.1145/2086696.2086707

Mameesh R and Franklin M. Speculative-aware execution. Proceedings of the 19th international conference on Parallel architectures and compilation techniques. (421-430).

https://doi.org/10.1145/1854273.1854326

Sancho J, Kerbyson D and Lang M. Characterizing the impact of using spare-cores on application performance. Proceedings of the 16th international Euro-Par conference on Parallel processing: Part I. (74-85).

https://dl.acm.org/doi/10.5555/1887695.1887704

Ansari A, Feng S, Gupta S and Mahlke S. (2010). Necromancer. ACM SIGARCH Computer Architecture News. 38:3. (473-484). Online publication date: 19-Jun-2010.

https://doi.org/10.1145/1816038.1816024

Ansari A, Feng S, Gupta S and Mahlke S. Necromancer. Proceedings of the 37th annual international symposium on Computer architecture. (473-484).

https://doi.org/10.1145/1815961.1816024

Chen Y, Zhu H and Sun X. An Adaptive Data Prefetcher for High-Performance Processors. Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing. (155-164).

https://doi.org/10.1109/CCGRID.2010.61

Ro W and Gaudiot J. (2008). A low-complexity microprocessor design with speculative pre-execution. Journal of Systems Architecture: the EUROMICRO Journal. 54:12. (1101-1112). Online publication date: 1-Dec-2008.

https://doi.org/10.1016/j.sysarc.2008.05.003

Garg A and Huang M. A performance-correctness explicitly-decoupled architecture. Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture. (306-317).

https://doi.org/10.1109/MICRO.2008.4771800

Bell G and Lipasti M. Skewed redundancy. Proceedings of the 17th international conference on Parallel architectures and compilation techniques. (62-71).

https://doi.org/10.1145/1454115.1454126

Rangan R, Vachharajani N, Ottoni G and August D. (2008). Performance scalability of decoupled software pipelining. ACM Transactions on Architecture and Code Optimization. 5:2. (1-25). Online publication date: 1-Aug-2008.

https://doi.org/10.1145/1400112.1400113

Zhou P and Önder S. Improving single-thread performance with fine-grain state maintenance. Proceedings of the 5th conference on Computing frontiers. (251-260).

https://doi.org/10.1145/1366230.1366274

Chen Y, Byna S and Sun X. Data access history cache and associated data prefetching mechanisms. Proceedings of the 2007 ACM/IEEE conference on Supercomputing. (1-12).

https://doi.org/10.1145/1362622.1362651

Ganusov I and Burtscher M. Efficient emulation of hardware prefetchers via event-driven helper threading. Proceedings of the 15th international conference on Parallel architectures and compilation techniques. (144-153).

https://doi.org/10.1145/1152154.1152178

Rui H, Zhang L and Hu W. A hybrid hardware/software generated prefetching thread mechanism on chip multiprocessors. Proceedings of the 12th international conference on Parallel Processing. (506-516).

https://doi.org/10.1007/11823285_52

Qureshi M, Lynch D, Mutlu O and Patt Y. A Case for MLP-Aware Cache Replacement. Proceedings of the 33rd annual international symposium on Computer Architecture. (167-178).

https://doi.org/10.1109/ISCA.2006.5

Qureshi M, Lynch D, Mutlu O and Patt Y. (2006). A Case for MLP-Aware Cache Replacement. ACM SIGARCH Computer Architecture News. 34:2. (167-178). Online publication date: 1-May-2006.

https://doi.org/10.1145/1150019.1136501

Gandhi A, Akkary H, Rajwar R, Srinivasan S and Lai K. (2006). Scalable Load and Store Processing in Latency-Tolerant Processors. IEEE Micro. 26:1. (30-39). Online publication date: 1-Jan-2006.

https://dl.acm.org/doi/10.5555/1116644.1116667

Mutlu O, Kim H and Patt Y. (2006). Efficient Runahead Execution. IEEE Micro. 26:1. (10-20). Online publication date: 1-Jan-2006.

https://dl.acm.org/doi/10.5555/1116644.1116665