Abstract
Transactional memory (TM) aims at simplifying concurrent programming via the familiar abstraction of atomic transactions. Recently, Intel and IBM have integrated hardware based TM (HTM) implementations in commodity processors, paving the way for the mainstream adoption of the TM paradigm. Yet, existing HTM implementations suffer from a crucial limitation, which hampers the adoption of HTM as a general technique for regulating concurrent access to shared memory: the inability to execute transactions whose working sets exceed the capacity of CPU caches. In this article we propose P8TM, a novel approach that mitigates this limitation on IBM’s POWER8 architecture by leveraging a key combination of hardware and software techniques to support different execution paths. P8TM also relies on self-tuning mechanisms aimed at dynamically switching between different execution modes to best adapt to the workload characteristics. In-depth evaluation with several benchmarks indicates that P8TM can achieve striking performance gains in workloads that stress the capacity limitations of HTM, while achieving performance on par with HTM even in unfavourable workloads.
Similar content being viewed by others
Notes
The list of restrictions is actually longer, including the lack of support for system calls and other non-undoable instructions, context switches and ring transitions.
Two transactions are concurrent if the begin call of either of them happens in real time between the begin and commit calls of the other.
Logarithmic bounds on the cumulative error, called regret, from playing non-optimal levers even in finite time horizons [3].
To minimize instrumentation overhead, latency measurements were only performed on a single thread. Since in this benchmarks all threads execute all type of transactions, this approach does not introduce any bias in the results we gathered.
References
Afek, Y., Levy, A., Morrison, A.: Programming with hardware lock elision. In: Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’13, pp. 295–296. ACM, New York (2013). https://doi.org/10.1145/2442516.2442552
Arbel, M., Attiya, H.: Concurrent updates with RCU: Search tree as an example. In: Proceedings of the 2014 ACM Symposium on Principles of Distributed Computing, PODC ’14, pp. 196–205. ACM, New York (2014). https://doi.org/10.1145/2611462.2611471
Auer, P.: Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res. 3, 397–422 (2003)
Berry, D., Fristedt, B.: Bandit Problems. Chapman and Hall, London (1985)
Boehm, H., Gottschlich, J., Luchangco, V., Michael, M., Moir, M., Nelson, C., Riegel, T., Shpeisman, T., Wong, M.: Transactional language constructs for C++. ISO/IEC JTC1/SC22 WG21 (C++) (2012)
Cain, H.W., Michael, M.M., Frey, B., May, C., Williams, D., Le, H.: Robust architectural support for transactional memory in the power architecture. In: Proceedings of the 40th Annual International Symposium on Computer Architecture, ISCA ’13, pp. 225–236. ACM, New York (2013). https://doi.org/10.1145/2485922.2485942
Calciu, I., Shpeisman, T., Pokam, G., Herlihy, M.: Improved single global lock fallback for best-effort hardware transactional memory. In: 9th ACM SIGPLAN Wkshp. on Transactional Computing (2014)
Clements, A.T., Kaashoek, M.F., Zeldovich, N.: Scalable address spaces using RCU balanced trees. In: Proceedings of the Seventeenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVII, pp. 199–210. ACM, New York (2012). https://doi.org/10.1145/2150976.2150998
Dalessandro, L., Carouge, F., White, S., Lev, Y., Moir, M., Scott, M.L., Spear, M.F.: Hybrid NOrec: A case study in the effectiveness of best effort hardware transactional memory. In: Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XVI, pp. 39–52. ACM, New York (2011). https://doi.org/10.1145/1950365.1950373
Damron, P., Fedorova, A., Lev, Y., Luchangco, V., Moir, M., Nussbaum, D.: Hybrid transactional memory. In: Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XII, pp. 336–346. ACM, New York (2006). https://doi.org/10.1145/1168857.1168900
Di Sanzo, P., Ciciani, B., Palmieri, R., Quaglia, F., Romano, P.: On the analytical modeling of concurrency control algorithms for software transactional memories: the case of commit-time-locking. Perform. Eval. 69(5), 187–205 (2012). https://doi.org/10.1016/j.peva.2011.05.002
di Sanzo, P., Quaglia, F., Palmieri, R.: Analytical modelling of commit-time-locking algorithms for software transactional memories. In: Int. CMG Conference (2010)
Dice, D., Harris, T.L., Kogan, A., Lev, Y., Moir, M.: Hardware extensions to make lazy subscription safe. CoRR arXiv:1407.6968 (2014)
Dice, D., Lev, Y., Moir, M., Nussbaum, D.: Early experience with a commercial hardware transactional memory implementation. In: Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS XIV, pp. 157–168. ACM, New York (2009). https://doi.org/10.1145/1508244.1508263
Dice, D., Shavit, N.: Understanding tradeoffs in software transactional memory. In: Proceedings of the International Symposium on Code Generation and Optimization, CGO ’07, pp. 21–33. IEEE Computer Society, Washington, DC (2007). https://doi.org/10.1109/CGO.2007.38
Diegues, N., Romano, P.: Self-tuning intel transactional synchronization extensions. In: 11th International Conference on Autonomic Computing (ICAC 14), pp. 209–219. USENIX Association, Philadelphia (2014). https://www.usenix.org/conference/icac14/technical-sessions/presentation/diegues
Diegues, N., Romano, P., Garbatov, S.: Seer: probabilistic scheduling for hardware transactional memory. In: Proceedings of the 27th ACM Symposium on Parallelism in Algorithms and Architectures, SPAA ’15, pp. 224–233. ACM, New York (2015). https://doi.org/10.1145/2755573.2755578. http://doi.acm.org.focus.lib.kth.se/10.1145/2755573.2755578
Diegues, N., Romano, P., Rodrigues, L.: Virtues and limitations of commodity hardware transactional memory. In: Proceedings of the 23rd International Conference on Parallel Architectures and Compilation, PACT ’14, pp. 3–14. ACM, New York (2014). https://doi.org/10.1145/2628071.2628080
Evan Jones: tpccbench. https://github.com/evanj/tpccbench (2007). Accessed 23 Sept 2019
Felber, P., Issa, S., Matveev, A., Romano, P.: Hardware read-write lock elision. In: Proceedings of the Eleventh European Conference on Computer Systems, EuroSys ’16, pp. 34:1–34:15. ACM, New York (2016). https://doi.org/10.1145/2901318.2901346
Garivier, A., Moulines, E.: On upper-confidence bound policies for switching bandit problems. In: Proceedings of the 22nd International Conference on Algorithmic Learning Theory, ALT’11, pp. 174–188. Springer, Berlin (2011). http://dl.acm.org/citation.cfm?id=2050345.2050365
Goel, B., Titos-Gil, R., Negi, A., McKee, S.A., Stenstrom, P.: Performance and energy analysis of the restricted transactional memory implementation on haswell. In: 2014 IEEE 28th International Parallel and Distributed Processing Symposium, pp. 615–624 (2014). https://doi.org/10.1109/IPDPS.2014.70
Guerraoui, R., Kapalka, M.: On the correctness of transactional memory. In: Proceedings of the 13th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’08, pp. 175–184. ACM, New York (2008). https://doi.org/10.1145/1345206.1345233
Harris, T.L.: A pragmatic implementation of non-blocking linked-lists. In: Proceedings of the 15th International Conference on Distributed Computing, DISC ’01, pp. 300–314. Springer, London (2001). http://dl.acm.org/citation.cfm?id=645958.676105
Hart, T.E., McKenney, P.E., Brown, A.D., Walpole, J.: Performance of memory reclamation for lockless synchronization. J. Parallel Distrib. Comput. 67(12), 1270–1285 (2007). https://doi.org/10.1016/j.jpdc.2007.04.010
Issa, S.: P8TM source code and benchmarks. https://github.com/shadyalaa/POWER8TM (2017). Accessed 23 Sept 2019
Jacobi, C., Slegel, T., Greiner, D.: Transactional memory architecture and implementation for IBM System Z. In: Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, MICRO-45, pp. 25–36. IEEE Computer Society, Washington, DC (2012). https://doi.org/10.1109/MICRO.2012.12
Java docs: ReentrantReadWriteLock. https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/ReentrantReadWriteLock.html (2018). Accessed 23 Sept 2019
Kumar, S., Chu, M., Hughes, C.J., Kundu, P., Nguyen, A.: Hybrid transactional memory. In: Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’06, pp. 209–220. ACM, New York (2006). https://doi.org/10.1145/1122971.1123003
Lai, T., Robbins, H.: Asymptotically efficient adaptive allocation rules. Adv. Appl. Math. 6, 4–22 (1985)
Le, H.Q., Guthrie, G.L., Williams, D.E., Michael, M.M., Frey, B.G., Starke, W.J., May, C., Odaira, R., Nakaike, T.: Transactional memory support in the IBM POWER8 processor. IBM J. Res. Dev. 59(1), 8:1–8:14 (2015). https://doi.org/10.1147/JRD.2014.2380199
Marathe, V.J., Scherer, W.N., Scott, M.L.: Adaptive software transactional memory. In: Proceedings of the 19th International Conference on Distributed Computing, DISC’05, pp. 354–368. Springer, Berlin (2005). https://doi.org/10.1007/11561927_26
Matveev, A., Shavit, N.: Reduced Hardware NOrec: a safe and scalable hybrid transactional memory. In: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’15, pp. 59–71. ACM, New York (2015). https://doi.org/10.1145/2694344.2694393
Mckenney, P.E., Slingwine, J.D.: Read-copy update: using execution history to solve concurrency problems. In: Parallel and Distributed Computing and Systems, pp. 509–518. Las Vegas, NV (1998)
McKenney, P.E., Walpole, J.: What is RCU, fundamentally? https://lwn.net/Articles/262464/ (2007). Accessed 23 Sept 2019
Minh, C.C., Chung, J., Kozyrakis, C., Olukotun, K.: Stamp: Stanford transactional applications for multi-processing. In: IEEE International Symposium on Workload Characterization, 2008. IISWC 2008, pp. 35–46. IEEE (2008)
Nakaike, T., Odaira, R., Gaudet, M., Michael, M.M., Tomari, H.: Quantitative comparison of hardware transactional memory for Blue Gene/Q, zEnterprise EC12, Intel Core, and POWER8. In: Proceedings of the 42nd Annual International Symposium on Computer Architecture, ISCA ’15, pp. 144–157. ACM, New York (2015). https://doi.org/10.1145/2749469.2750403
Nussbaum, D., Lev, Y., Moir, M.: PhTM: Phased transactional memory. In: The Second ACM SIGPLAN Workshop on Transactional Computing, TRANSACT ’07. ACM, New York (2007). https://urresearch.rochester.edu/institutionalPublicationPublicView.action?institutionalItemId=4058
Ortner, R., Ryabko, D., Auer, P., Munos, R.: Regret bounds for restless markov bandits. In: Proceedings of the 23rd International Conference on Algorithmic Learning Theory, ALT’12, pp. 214–228. Springer, Berlin (2012). https://doi.org/10.1007/978-3-642-34106-9_19
Raminhas, P., Issa, T., Romano, P.: Enhancing efficiency of hybrid transactional memory via dynamic data partitioning schemes. In: Proceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid ’18, pp. 1–11 (2018). https://doi.org/10.1109/CCGRID.2018.94
Rossbach, C.J., Hofmann, O.S., Witchel, E.: Is transactional programming actually easier? In: Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’10, pp. 47–56. ACM, New York (2010). https://doi.org/10.1145/1693453.1693462
Stonebraker, M., Madden, S., Abadi, D.J., Harizopoulos, S., Hachem, N., Helland, P.: The end of an architectural era: (it’s time for a complete rewrite). In: Proceedings of the 33rd International Conference on Very Large Data Bases, VLDB ’07, pp. 1150–1160. VLDB Endowment (2007). http://dl.acm.org/citation.cfm?id=1325851.1325981
TPC Council: TPC-C Benchmark. http://www.tpc.org/tpcc (2011). Accessed 23 Sept 2019
Wang, A., Gaudet, M., Wu, P., Amaral, J.N., Ohmacht, M., Barton, C., Silvera, R., Michael, M.: Evaluation of Blue Gene/Q hardware support for transactional memories. In: Proceedings of the 21st International Conference on Parallel Architectures and Compilation Techniques, PACT ’12, pp. 127–136. ACM, New York (2012). https://doi.org/10.1145/2370816.2370836
Wang, Q., Kulkarni, S., Cavazos, J., Spear, M.: A transactional memory with automatic performance tuning. ACM Trans. Archit. Code Optim. 8(4), 54:1–54:23 (2012). https://doi.org/10.1145/2086696.2086733
Yoo, R.M., Hughes, C.J., Lai, K., Rajwar, R.: Performance evaluation of Intel® transactional synchronization extensions for high-performance computing. In: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, SC ’13, pp. 19:1–19:11. ACM, New York (2013). https://doi.org/10.1145/2503210.2503232
Yu, J.Y., Mannor, S.: Piecewise-stationary bandit problems with side observations. In: Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, pp. 1177–1184. ACM, New York (2009). https://doi.org/10.1145/1553374.1553524
Acknowledgements
This work was supported by Portuguese funds through Fundação para a Ciência e Tecnologia via projects UID/CEC/50021/2019 and PTDC/EEISCR/1743/2014.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Issa, S., Felber, P., Matveev, A. et al. Extending hardware transactional memory capacity via rollback-only transactions and suspend/resume. Distrib. Comput. 33, 327–348 (2020). https://doi.org/10.1007/s00446-019-00363-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s00446-019-00363-1