Abstract
This is the age of specialization. Today’s server, mobile, and desktop processors contain not only conventional CPUs but also various flavors of accelerators. The most prominent of these are Graphics Processing Units (GPUs). Other types of accelerators, including Digital Signal Processors (DSPs), AI accelerators (e.g., Apple’s Neural Engine, Google’s TPU), cryptographic accelerators, and field-programmable gate arrays (FPGAs), are also becoming common.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Nagarajan, V., Sorin, D.J., Hill, M.D., Wood, D.A. (2020). Consistency and Coherence for Heterogeneous Systems. In: A Primer on Memory Consistency and Cache Coherence. Synthesis Lectures on Computer Architecture. Springer, Cham. https://doi.org/10.1007/978-3-031-01764-3_10
DOI: https://doi.org/10.1007/978-3-031-01764-3_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-00636-4
Online ISBN: 978-3-031-01764-3
eBook Packages: Synthesis Collection of Technology (R0), eBColl Synthesis Collection 3, eBColl Synthesis Collection 9