DOI: 10.5555/3195638.3195709
research-article

Delegated persist ordering

Published: 15 October 2016

Abstract

Systems featuring a load-store interface to persistent memory (PM) are expected soon, making in-memory persistent data structures feasible. Ensuring that a persistent data structure is recoverable requires constraints on the order in which PM writes become persistent. Current memory systems, however, reorder writes and provide no such guarantees. To complement its upcoming 3D XPoint memory, Intel has announced new instructions that give programmers control over data persistence. We describe the semantics implied by these instructions, an ordering model we call synchronous ordering.
Synchronous ordering (SO) enforces order by stalling execution whenever PM write ordering is required, which exposes PM write latency on the execution critical path. In PM-write-intensive benchmarks, it incurs an average slowdown of 7.21x over volatile execution without ordering. SO tightly couples enforcing order with flushing writes to PM, but many recoverable software systems do not need this coupling. Instead, we propose delegated ordering, wherein ordering requirements are communicated explicitly to the PM controller, fully decoupling PM write ordering from volatile execution and cache management. We demonstrate that delegated ordering can bring performance within 1.93x of volatile execution, improving over SO by 3.73x.
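
The contrast between the two models can be illustrated with a toy cost model. The sketch below is not the paper's mechanism or its evaluation; it only assumes that execution is divided into epochs separated by ordering points, that SO stalls the core for one PM write latency at each ordering point, and that delegated ordering hands each epoch to a PM controller that persists epochs in order while the core keeps running, stalling only when a bounded buffer is full. The names `Epoch`, `synchronous_time`, `delegated_time`, `pm_latency`, and `buffer_epochs` are all hypothetical.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Epoch:
    compute_cycles: int  # volatile work executed before the ordering point
    has_pm_writes: bool  # whether this epoch issues PM writes that must be ordered


def volatile_time(epochs):
    """Execution time with no persist ordering at all (the baseline)."""
    return sum(e.compute_cycles for e in epochs)


def synchronous_time(epochs, pm_latency):
    """Assumed SO model: the core stalls at every ordering point until the
    epoch's PM writes are persistent, so PM latency is on the critical path."""
    t = 0
    for e in epochs:
        t += e.compute_cycles
        if e.has_pm_writes:
            t += pm_latency
    return t


def delegated_time(epochs, pm_latency, buffer_epochs):
    """Assumed delegated model: the controller persists epochs in order, one
    after another; the core stalls only when `buffer_epochs` are pending."""
    if buffer_epochs < 1:
        raise ValueError("buffer_epochs must be at least 1")
    t = 0
    last_persist = 0   # time at which the most recent epoch becomes persistent
    pending = deque()  # persist-completion times of buffered epochs, oldest first
    for e in epochs:
        t += e.compute_cycles
        if not e.has_pm_writes:
            continue
        # Drop epochs that have already become persistent.
        while pending and pending[0] <= t:
            pending.popleft()
        # Buffer full: wait for the oldest pending epoch to persist.
        if len(pending) == buffer_epochs:
            t = pending.popleft()
        # Order is enforced at the controller, not by stalling the core.
        last_persist = max(t, last_persist) + pm_latency
        pending.append(last_persist)
    return t


if __name__ == "__main__":
    workload = [Epoch(100, True)] * 4
    print("volatile: ", volatile_time(workload))
    print("SO:       ", synchronous_time(workload, pm_latency=300))
    print("delegated:", delegated_time(workload, pm_latency=300, buffer_epochs=8))
    print("delegated, 1-epoch buffer:",
          delegated_time(workload, pm_latency=300, buffer_epochs=1))
```

With these made-up parameters the model gives 400 cycles for volatile execution, 1600 for SO, 400 for delegated ordering with an eight-epoch buffer, and 1000 with a one-epoch buffer. The numbers are illustrative only and are unrelated to the slowdowns reported in the abstract; the model also returns core execution time, not the later time at which the last write actually persists.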

Information

Published In

MICRO-49: The 49th Annual IEEE/ACM International Symposium on Microarchitecture
October 2016
816 pages

Publisher

IEEE Press

Author Tags

  1. delegated ordering
  2. memory persistency
  3. persistent memory
  4. relaxed consistency

Qualifiers

  • Research-article

Conference

MICRO-49
Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
