[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ Skip to main content
Log in

Extending hardware transactional memory capacity via rollback-only transactions and suspend/resume

POWER8 TM

  • Published:
Distributed Computing Aims and scope Submit manuscript

Abstract

Transactional memory (TM) aims at simplifying concurrent programming via the familiar abstraction of atomic transactions. Recently, Intel and IBM have integrated hardware based TM (HTM) implementations in commodity processors, paving the way for the mainstream adoption of the TM paradigm. Yet, existing HTM implementations suffer from a crucial limitation, which hampers the adoption of HTM as a general technique for regulating concurrent access to shared memory: the inability to execute transactions whose working sets exceed the capacity of CPU caches. In this article we propose P8TM, a novel approach that mitigates this limitation on IBM’s POWER8 architecture by leveraging a key combination of hardware and software techniques to support different execution paths. P8TM also relies on self-tuning mechanisms aimed at dynamically switching between different execution modes to best adapt to the workload characteristics. In-depth evaluation with several benchmarks indicates that P8TM can achieve striking performance gains in workloads that stress the capacity limitations of HTM, while achieving performance on par with HTM even in unfavourable workloads.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
£29.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (United Kingdom)

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13

Similar content being viewed by others

Notes

  1. The list of restrictions is actually longer, including the lack of support for system calls and other non-undoable instructions, context switches and ring transitions.

  2. Two transactions are concurrent if the begin call of either of them happens in real time between the begin and commit calls of the other.

  3. Logarithmic bounds on the cumulative error, called regret, from playing non-optimal levers even in finite time horizons [3].

  4. To minimize instrumentation overhead, latency measurements were only performed on a single thread. Since in this benchmarks all threads execute all type of transactions, this approach does not introduce any bias in the results we gathered.

References

  1. Afek, Y., Levy, A., Morrison, A.: Programming with hardware lock elision. In: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’13, pp. 295–296. ACM, New York (2013). https://doi.org/10.1145/2442516.2442552

  2. Arbel, M., Attiya, H.: Concurrent updates with RCU: Search tree as an example. In: Proceedings of the 2014 ACM Symposium on Principles of Distributed Computing, PODC ’14, pp. 196–205. ACM, New York (2014). https://doi.org/10.1145/2611462.2611471

  3. Auer, P.: Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res. 3, 397–422 (2003)

    MathSciNet  MATH  Google Scholar 

  4. Berry, D., Fristedt, B.: Bandit Problems. Chapman and Hall, London (1985)

    Book  Google Scholar 

  5. Boehm, H., Gottschlich, J., Luchangco, V., Michael, M., Moir, M., Nelson, C., Riegel, T., Shpeisman, T., Wong, M.: Transactional language constructs for C++. ISO/IEC JTC1/SC22 WG21 (C++) (2012)

  6. Cain, H.W., Michael, M.M., Frey, B., May, C., Williams, D., Le, H.: Robust architectural support for transactional memory in the power architecture. In: Proceedings of the 40th Annual International Symposium on Computer Architecture, ISCA ’13, pp. 225–236. ACM, New York (2013). https://doi.org/10.1145/2485922.2485942

  7. Calciu, I., Shpeisman, T., Pokam, G., Herlihy, M.: Improved single global lock fallback for best-effort hardware transactional memory. In: 9th ACM SIGPLAN Wkshp. on Transactional Computing (2014)

  8. Clements, A.T., Kaashoek, M.F., Zeldovich, N.: Scalable address spaces using RCU balanced trees. In: Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVII, pp. 199–210. ACM, New York (2012). https://doi.org/10.1145/2150976.2150998

  9. Dalessandro, L., Carouge, F., White, S., Lev, Y., Moir, M., Scott, M.L., Spear, M.F.: Hybrid NOrec: A case study in the effectiveness of best effort hardware transactional memory. In: Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVI, pp. 39–52. ACM, New York (2011). https://doi.org/10.1145/1950365.1950373

  10. Damron, P., Fedorova, A., Lev, Y., Luchangco, V., Moir, M., Nussbaum, D.: Hybrid transactional memory. In: Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XII, pp. 336–346. ACM, New York (2006). https://doi.org/10.1145/1168857.1168900

  11. Di Sanzo, P., Ciciani, B., Palmieri, R., Quaglia, F., Romano, P.: On the analytical modeling of concurrency control algorithms for software transactional memories: the case of commit-time-locking. Perform. Eval. 69(5), 187–205 (2012). https://doi.org/10.1016/j.peva.2011.05.002

    Article  Google Scholar 

  12. di Sanzo, P., Quaglia, F., Palmieri, R.: Analytical modelling of commit-time-locking algorithms for software transactional memories. In: Int. CMG Conference (2010)

  13. Dice, D., Harris, T.L., Kogan, A., Lev, Y., Moir, M.: Hardware extensions to make lazy subscription safe. CoRR arXiv:1407.6968 (2014)

  14. Dice, D., Lev, Y., Moir, M., Nussbaum, D.: Early experience with a commercial hardware transactional memory implementation. In: Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XIV, pp. 157–168. ACM, New York (2009). https://doi.org/10.1145/1508244.1508263

  15. Dice, D., Shavit, N.: Understanding tradeoffs in software transactional memory. In: Proceedings of the International Symposium on Code Generation and Optimization, CGO ’07, pp. 21–33. IEEE Computer Society, Washington, DC (2007). https://doi.org/10.1109/CGO.2007.38

  16. Diegues, N., Romano, P.: Self-tuning intel transactional synchronization extensions. In: 11th International Conference on Autonomic Computing (ICAC 14), pp. 209–219. USENIX Association, Philadelphia (2014). https://www.usenix.org/conference/icac14/technical-sessions/presentation/diegues

  17. Diegues, N., Romano, P., Garbatov, S.: Seer: probabilistic scheduling for hardware transactional memory. In: Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’15, pp. 224–233. ACM, New York (2015). https://doi.org/10.1145/2755573.2755578. http://doi.acm.org.focus.lib.kth.se/10.1145/2755573.2755578

  18. Diegues, N., Romano, P., Rodrigues, L.: Virtues and limitations of commodity hardware transactional memory. In: Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, PACT ’14, pp. 3–14. ACM, New York (2014). https://doi.org/10.1145/2628071.2628080

  19. Evan Jones: tpccbench. https://github.com/evanj/tpccbench (2007). Accessed 23 Sept 2019

  20. Felber, P., Issa, S., Matveev, A., Romano, P.: Hardware read-write lock elision. In: Proceedings of the Eleventh European Conference on Computer Systems, EuroSys ’16, pp. 34:1–34:15. ACM, New York (2016). https://doi.org/10.1145/2901318.2901346

  21. Garivier, A., Moulines, E.: On upper-confidence bound policies for switching bandit problems. In: Proceedings of the 22nd International Conference on Algorithmic Learning Theory, ALT’11, pp. 174–188. Springer, Berlin (2011). http://dl.acm.org/citation.cfm?id=2050345.2050365

  22. Goel, B., Titos-Gil, R., Negi, A., McKee, S.A., Stenstrom, P.: Performance and energy analysis of the restricted transactional memory implementation on haswell. In: 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pp. 615–624 (2014). https://doi.org/10.1109/IPDPS.2014.70

  23. Guerraoui, R., Kapalka, M.: On the correctness of transactional memory. In: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’08, pp. 175–184. ACM, New York (2008). https://doi.org/10.1145/1345206.1345233

  24. Harris, T.L.: A pragmatic implementation of non-blocking linked-lists. In: Proceedings of the 15th International Conference on Distributed Computing, DISC ’01, pp. 300–314. Springer, London (2001). http://dl.acm.org/citation.cfm?id=645958.676105

  25. Hart, T.E., McKenney, P.E., Brown, A.D., Walpole, J.: Performance of memory reclamation for lockless synchronization. J. Parallel Distrib. Comput. 67(12), 1270–1285 (2007). https://doi.org/10.1016/j.jpdc.2007.04.010

    Article  MATH  Google Scholar 

  26. Issa, S.: P8TM source code and benchmarks. https://github.com/shadyalaa/POWER8TM (2017). Accessed 23 Sept 2019

  27. Jacobi, C., Slegel, T., Greiner, D.: Transactional memory architecture and implementation for IBM System Z. In: Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-45, pp. 25–36. IEEE Computer Society, Washington, DC (2012). https://doi.org/10.1109/MICRO.2012.12

  28. Java docs: ReentrantReadWriteLock. https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReentrantReadWriteLock.html (2018). Accessed 23 Sept 2019

  29. Kumar, S., Chu, M., Hughes, C.J., Kundu, P., Nguyen, A.: Hybrid transactional memory. In: Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’06, pp. 209–220. ACM, New York (2006). https://doi.org/10.1145/1122971.1123003

  30. Lai, T., Robbins, H.: Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6, 4–22 (1985)

    Article  MathSciNet  Google Scholar 

  31. Le, H.Q., Guthrie, G.L., Williams, D.E., Michael, M.M., Frey, B.G., Starke, W.J., May, C., Odaira, R., Nakaike, T.: Transactional memory support in the IBM POWER8 processor. IBM J. Res. Dev. 59(1), 8:1–8:14 (2015). https://doi.org/10.1147/JRD.2014.2380199

    Article  Google Scholar 

  32. Marathe, V.J., Scherer, W.N., Scott, M.L.: Adaptive software transactional memory. In: Proceedings of the 19th International Conference on Distributed Computing, DISC’05, pp. 354–368. Springer, Berlin (2005). https://doi.org/10.1007/11561927_26

  33. Matveev, A., Shavit, N.: Reduced Hardware NOrec: a safe and scalable hybrid transactional memory. In: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’15, pp. 59–71. ACM, New York (2015). https://doi.org/10.1145/2694344.2694393

  34. Mckenney, P.E., Slingwine, J.D.: Read-copy update: using execution history to solve concurrency problems. In: Parallel and Distributed Computing and Systems, pp. 509–518. Las Vegas, NV (1998)

  35. McKenney, P.E., Walpole, J.: What is RCU, fundamentally? https://lwn.net/Articles/262464/ (2007). Accessed 23 Sept 2019

  36. Minh, C.C., Chung, J., Kozyrakis, C., Olukotun, K.: Stamp: Stanford transactional applications for multi-processing. In: IEEE International Symposium on Workload Characterization, 2008. IISWC 2008, pp. 35–46. IEEE (2008)

  37. Nakaike, T., Odaira, R., Gaudet, M., Michael, M.M., Tomari, H.: Quantitative comparison of hardware transactional memory for Blue Gene/Q, zEnterprise EC12, Intel Core, and POWER8. In: Proceedings of the 42nd Annual International Symposium on Computer Architecture, ISCA ’15, pp. 144–157. ACM, New York (2015). https://doi.org/10.1145/2749469.2750403

  38. Nussbaum, D., Lev, Y., Moir, M.: PhTM: Phased transactional memory. In: The Second ACM SIGPLAN Workshop on Transactional Computing, TRANSACT ’07. ACM, New York (2007). https://urresearch.rochester.edu/institutionalPublicationPublicView.action?institutionalItemId=4058

  39. Ortner, R., Ryabko, D., Auer, P., Munos, R.: Regret bounds for restless markov bandits. In: Proceedings of the 23rd International Conference on Algorithmic Learning Theory, ALT’12, pp. 214–228. Springer, Berlin (2012). https://doi.org/10.1007/978-3-642-34106-9_19

  40. Raminhas, P., Issa, T., Romano, P.: Enhancing efficiency of hybrid transactional memory via dynamic data partitioning schemes. In: Proceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid ’18, pp. 1–11 (2018). https://doi.org/10.1109/CCGRID.2018.94

  41. Rossbach, C.J., Hofmann, O.S., Witchel, E.: Is transactional programming actually easier? In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’10, pp. 47–56. ACM, New York (2010). https://doi.org/10.1145/1693453.1693462

  42. Stonebraker, M., Madden, S., Abadi, D.J., Harizopoulos, S., Hachem, N., Helland, P.: The end of an architectural era: (it’s time for a complete rewrite). In: Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB ’07, pp. 1150–1160. VLDB Endowment (2007). http://dl.acm.org/citation.cfm?id=1325851.1325981

  43. TPC Council: TPC-C Benchmark. http://www.tpc.org/tpcc (2011). Accessed 23 Sept 2019

  44. Wang, A., Gaudet, M., Wu, P., Amaral, J.N., Ohmacht, M., Barton, C., Silvera, R., Michael, M.: Evaluation of Blue Gene/Q hardware support for transactional memories. In: Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, PACT ’12, pp. 127–136. ACM, New York (2012). https://doi.org/10.1145/2370816.2370836

  45. Wang, Q., Kulkarni, S., Cavazos, J., Spear, M.: A transactional memory with automatic performance tuning. ACM Trans. Archit. Code Optim. 8(4), 54:1–54:23 (2012). https://doi.org/10.1145/2086696.2086733

    Article  Google Scholar 

  46. Yoo, R.M., Hughes, C.J., Lai, K., Rajwar, R.: Performance evaluation of Intel® transactional synchronization extensions for high-performance computing. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC ’13, pp. 19:1–19:11. ACM, New York (2013). https://doi.org/10.1145/2503210.2503232

  47. Yu, J.Y., Mannor, S.: Piecewise-stationary bandit problems with side observations. In: Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, pp. 1177–1184. ACM, New York (2009). https://doi.org/10.1145/1553374.1553524

Download references

Acknowledgements

This work was supported by Portuguese funds through Fundação para a Ciência e Tecnologia via projects UID/CEC/50021/2019 and PTDC/EEISCR/1743/2014.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Pascal Felber.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Issa, S., Felber, P., Matveev, A. et al. Extending hardware transactional memory capacity via rollback-only transactions and suspend/resume. Distrib. Comput. 33, 327–348 (2020). https://doi.org/10.1007/s00446-019-00363-1

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00446-019-00363-1

Keywords