DOI: 10.1109/SC.2010.50
Article

Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support

Published: 13 November 2010

Abstract

System-in-Package (SiP) and 3D integration are promising technologies for bringing more memory onto a microprocessor package to mitigate the "memory wall" problem. In this paper, instead of using them to build caches, we study a heterogeneous main memory that uses both on- and off-package memories, providing fast, high-bandwidth on-package accesses together with expandable, low-cost commodity off-package memory capacity. We introduce another layer of address translation coupled with an on-chip memory controller that can dynamically migrate data between on-package and off-package memory, either in hardware or with operating system assistance, depending on the migration granularity. Our experimental results demonstrate that such a design achieves, on average, 83% of the effectiveness of the ideal case in which all memory can be placed in high-speed on-package memory, for our simulated benchmarks.
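
The extra translation layer and the migration it enables can be pictured with a small software model. The sketch below is only an illustration under stated assumptions, not the mechanism evaluated in the paper: the class name `HeterogeneousMemory`, the per-page access counters, the `hot_threshold` trigger, and the coldest-page victim choice are all hypothetical. It shows one idea only: a remap table between physical pages and machine frames lets a hot page swap places with a cold on-package page without changing the physical address that software sees.

```python
class HeterogeneousMemory:
    """Toy model of a remap layer in front of on- and off-package memory.

    Machine frames 0 .. on_package_frames-1 stand for fast on-package memory;
    the remaining frames stand for off-package memory. All policy details
    here (counters, threshold, victim choice) are assumptions for illustration.
    """

    def __init__(self, on_package_frames, total_frames, hot_threshold=4):
        self.on_package_frames = on_package_frames
        self.hot_threshold = hot_threshold
        # Extra translation layer: physical page -> machine frame (identity at start).
        self.remap = {page: page for page in range(total_frames)}
        # Hypothetical per-page access counters used to detect hot pages.
        self.counts = {page: 0 for page in range(total_frames)}

    def is_on_package(self, page):
        return self.remap[page] < self.on_package_frames

    def access(self, page):
        """Record an access, migrate if the page became hot, return its frame."""
        self.counts[page] += 1
        if not self.is_on_package(page) and self.counts[page] >= self.hot_threshold:
            self._migrate(page)
        return self.remap[page]

    def _migrate(self, hot_page):
        # Victim: the least-accessed page currently held in on-package memory.
        victim = min(
            (p for p in self.remap if self.is_on_package(p)),
            key=lambda p: self.counts[p],
        )
        # Swap the two frames; the physical page numbers stay unchanged.
        self.remap[hot_page], self.remap[victim] = (
            self.remap[victim],
            self.remap[hot_page],
        )
```

In this model a page that starts off-package is served from its original frame until it has been accessed `hot_threshold` times, after which `access` returns an on-package frame for it. A real controller would also have to account for the cost of copying the data, which this sketch ignores.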

Cited By

  • (2023) DRAM Translation Layer: Software-Transparent DRAM Power Savings for Disaggregated Memory. Proceedings of the 50th Annual International Symposium on Computer Architecture, pp. 1-13. DOI: 10.1145/3579371.3589051. Online publication date: 17-Jun-2023.
  • (2022) Unified Holistic Memory Management Supporting Multiple Big Data Processing Frameworks over Hybrid Memories. ACM Transactions on Computer Systems 39:1-4, pp. 1-38. DOI: 10.1145/3511211. Online publication date: 5-Jul-2022.
  • (2022) An Energy-Efficient DRAM Cache Architecture for Mobile Platforms With PCM-Based Main Memory. ACM Transactions on Embedded Computing Systems 21:1, pp. 1-22. DOI: 10.1145/3451995. Online publication date: 14-Jan-2022.

Information & Contributors

Information

Published In

SC '10: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
November 2010
634 pages
ISBN:9781424475599

Publisher

IEEE Computer Society

United States

Publication History

Published: 13 November 2010

Qualifiers

  • Article

Conference

SC '10

Acceptance Rates

SC '10 Paper Acceptance Rate 51 of 253 submissions, 20%;
Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 2
  • Downloads (last 6 weeks): 1
Reflects downloads up to 10 Dec 2024.

Citations

  • (2021) KLOCs: kernel-level object contexts for heterogeneous memory systems. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 65-78. DOI: 10.1145/3445814.3446745. Online publication date: 19-Apr-2021.
  • (2019) Panthera: holistic memory management for big data processing over hybrid memories. Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 347-362. DOI: 10.1145/3314221.3314650. Online publication date: 8-Jun-2019.
  • (2019) Nimble Page Management for Tiered Memory Systems. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 331-345. DOI: 10.1145/3297858.3304024. Online publication date: 4-Apr-2019.
  • (2018) HeteroOS. ACM SIGOPS Operating Systems Review 52:1, pp. 13-26. DOI: 10.1145/3273982.3273985. Online publication date: 28-Aug-2018.
  • (2018) 3D-Xpath. Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques, pp. 1-12. DOI: 10.1145/3243176.3243191. Online publication date: 1-Nov-2018.
  • (2018) Flexible associativity for DRAM caches. Proceedings of the 15th ACM International Conference on Computing Frontiers, pp. 88-96. DOI: 10.1145/3203217.3203283. Online publication date: 8-May-2018.
  • (2018) A Novel Hybrid Delay Unit Based on Dummy TSVs for 3-D On-Chip Memory. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 26:7, pp. 1277-1289. DOI: 10.1109/TVLSI.2018.2809961. Online publication date: 1-Jul-2018.
