Article

Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support

Authors:

Naveen Muralimanohar,

Norman P. Jouppi

SC '10: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis

Pages 1 - 11

https://doi.org/10.1109/SC.2010.50

Published: 13 November 2010

Abstract

System-in-Package (SiP) and 3D integration are promising technologies for bringing more memory onto a microprocessor package to mitigate the "memory wall" problem. In this paper, instead of using them to build caches, we study a heterogeneous main memory that uses both on- and off-package memories, providing fast, high-bandwidth on-package accesses together with expandable, low-cost commodity off-package memory capacity. We introduce another layer of address translation coupled with an on-chip memory controller that can dynamically migrate data between on-package and off-package memory, either in hardware or with operating system assistance, depending on the migration granularity. Our experimental results demonstrate that such a design can achieve, on average, 83% of the effectiveness of the ideal case in which all memory can be placed in high-speed on-package memory for our simulated benchmarks.
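
As a rough illustration of the idea in the abstract, the Python sketch below models an extra translation layer that maps physical pages to either on-package or off-package memory and migrates frequently accessed pages on-package. Everything in it is an assumption made for illustration and is not taken from the paper: the class name RemapTable, the page-level granularity, the access-count threshold, and the evict-the-coldest-page policy are all hypothetical.

```python
from collections import Counter


class RemapTable:
    """Toy extra translation layer: physical page -> on- or off-package location.

    Hypothetical model for illustration only; not the paper's mechanism.
    """

    def __init__(self, on_package_pages, threshold=4):
        self.threshold = threshold              # assumed hotness threshold
        self.on_package = {}                    # physical page -> on-package slot
        self.free_slots = list(range(on_package_pages))
        self.hits = Counter()                   # access count per physical page

    def translate(self, page):
        """Return ('on', slot) if the page is on-package, else ('off', page)."""
        if page in self.on_package:
            return ("on", self.on_package[page])
        return ("off", page)

    def access(self, page):
        """Record one access and return the location that served it.

        Migration, if triggered, only affects later accesses.
        """
        location = self.translate(page)
        self.hits[page] += 1
        if location[0] == "off" and self.hits[page] >= self.threshold:
            self._migrate_in(page)
        return location

    def _migrate_in(self, page):
        """Move a hot page on-package, evicting the coldest resident if full."""
        if not self.free_slots:
            victim = min(self.on_package, key=lambda p: self.hits[p])
            if self.hits[victim] >= self.hits[page]:
                return                          # resident pages are at least as hot
            self.free_slots.append(self.on_package.pop(victim))
            self.hits[victim] = 0               # victim restarts as a cold page
        self.on_package[page] = self.free_slots.pop()


table = RemapTable(on_package_pages=1, threshold=2)
first = table.access(7)    # served off-package, count becomes 1
second = table.access(7)   # served off-package, then migrated on-package
third = table.access(7)    # now served from on-package memory
```

In this toy model the migration decision is made on the access path, which loosely corresponds to a hardware-managed scheme; an operating-system-assisted variant might instead scan the counters periodically and migrate pages in batches.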

Information & Contributors

Information

Published In

SC '10: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis

November 2010

634 pages

ISBN:9781424475599

Conference Chair:
Barry V. Hess

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture
IEEE-CS: Computer Society

Publisher

IEEE Computer Society

United States

Publication History

Published: 13 November 2010

Qualifiers

Article

Conference

SC '10

Sponsor:

SIGARCH
IEEE-CS

SC '10: International Conference for High Performance Computing, Networking, Storage and Analysis

November 13 - 19, 2010

Acceptance Rates

SC '10 Paper Acceptance Rate 51 of 253 submissions, 20%;

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 64
Total Downloads: 627
Downloads (Last 12 months): 2
Downloads (Last 6 weeks): 1

Reflects downloads up to 10 Dec 2024

Citations

Cited By

Jin W, Jang W, Park H, Lee J, Kim S, Lee J, Solihin Y, Heinrich M (2023). DRAM Translation Layer: Software-Transparent DRAM Power Savings for Disaggregated Memory. Proceedings of the 50th Annual International Symposium on Computer Architecture, pp. 1-13. DOI: 10.1145/3579371.3589051. Online publication date: 17-Jun-2023.
https://dl.acm.org/doi/10.1145/3579371.3589051

Chen L, Zhao J, Wang C, Cao T, Zigman J, Volos H, Mutlu O, Lv F, Feng X, Xu G, Cui H (2022). Unified Holistic Memory Management Supporting Multiple Big Data Processing Frameworks over Hybrid Memories. ACM Transactions on Computer Systems, 39:1-4, pp. 1-38. DOI: 10.1145/3511211. Online publication date: 5-Jul-2022.
https://dl.acm.org/doi/10.1145/3511211

Shin D, Jang H, Oh K, Lee J (2022). An Energy-Efficient DRAM Cache Architecture for Mobile Platforms With PCM-Based Main Memory. ACM Transactions on Embedded Computing Systems, 21:1, pp. 1-22. DOI: 10.1145/3451995. Online publication date: 14-Jan-2022.
https://dl.acm.org/doi/10.1145/3451995

Kannan S, Ren Y, Bhattacharjee A, Sherwood T, Berger E, Kozyrakis C (2021). KLOCs: kernel-level object contexts for heterogeneous memory systems. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 65-78. DOI: 10.1145/3445814.3446745. Online publication date: 19-Apr-2021.
https://dl.acm.org/doi/10.1145/3445814.3446745

Wang C, Cui H, Cao T, Zigman J, Volos H, Mutlu O, Lv F, Feng X, Xu G, McKinley K, Fisher K (2019). Panthera: holistic memory management for big data processing over hybrid memories. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 347-362. DOI: 10.1145/3314221.3314650. Online publication date: 8-Jun-2019.
https://dl.acm.org/doi/10.1145/3314221.3314650

Yan Z, Lustig D, Nellans D, Bhattacharjee A, Bahar I, Herlihy M, Witchel E, Lebeck A (2019). Nimble Page Management for Tiered Memory Systems. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 331-345. DOI: 10.1145/3297858.3304024. Online publication date: 4-Apr-2019.
https://dl.acm.org/doi/10.1145/3297858.3304024

Kannan S, Gavrilovska A, Gupta V, Schwan K (2018). HeteroOS. ACM SIGOPS Operating Systems Review, 52:1, pp. 13-26. DOI: 10.1145/3273982.3273985. Online publication date: 28-Aug-2018.
https://dl.acm.org/doi/10.1145/3273982.3273985

Lee S, Lee K, Sung M, Alian M, Kim C, Cho W, Oh R, O S, Ahn J, Kim N, Evripidou S, Stenström P, O'Boyle M (2018). 3D-Xpath. Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, pp. 1-12. DOI: 10.1145/3243176.3243191. Online publication date: 1-Nov-2018.
https://dl.acm.org/doi/10.1145/3243176.3243191

Teran E, Chishti Z, Wang Z, Wilkerson C, Jiménez D, Kaeli D, Pericàs M (2018). Flexible associativity for DRAM caches. Proceedings of the 15th ACM International Conference on Computing Frontiers, pp. 88-96. DOI: 10.1145/3203217.3203283. Online publication date: 8-May-2018.
https://dl.acm.org/doi/10.1145/3203217.3203283

Chen X, Pourbakhsh S, Fu J, Gong N, Wang J (2018). A Novel Hybrid Delay Unit Based on Dummy TSVs for 3-D On-Chip Memory. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 26:7, pp. 1277-1289. DOI: 10.1109/TVLSI.2018.2809961. Online publication date: 1-Jul-2018.
https://dl.acm.org/doi/10.1109/TVLSI.2018.2809961
