[go: up one dir, main page]
More Web Proxy on the site http://driver.im/ skip to main content
research-article

Collection skeletons: : Declarative abstractions for data collections

Published: 17 July 2024 Publication History

Abstract

Modern programming languages provide programmers with rich abstractions for data collections as part of their standard libraries, e.g., Containers in the C++ STL, the Java Collections Framework, or the Scala Collections API. Typically, these collections frameworks are organised as hierarchies that provide programmers with common abstract data types (ADTs) like lists, queues, and stacks. While convenient, this approach introduces problems which ultimately affect application performance due to users over-specifying collection data types limiting implementation flexibility. In this article, we develop Collection Skeletons which provide a novel, declarative approach to data collections. Using our framework, programmers explicitly select properties for their collections, thereby truly decoupling specification from implementation. By making collection properties explicit, immediate benefits materialise in forms of reduced risk of over-specification and increased implementation flexibility. We have prototyped our declarative abstractions for collections as a C++ library, and demonstrate that benchmark applications rewritten to use Collection Skeletons incur little or no overhead. We also show how Collection Skeletons help shielding the application developer from parallel implementation details, either by encapsulating implicit parallelism or through explicit properties that capture the requirements of parallel algorithmic skeletons. We observe performance improvements across most of the 17 benchmarks resulting from the use of Collection Skeletons before trying to parallelise those benchmarks, while also enhancing performance portability across three different hardware platforms.

Highlights

A novel, declarative approach to data collections based on their properties.
Reduces risk of over-specification and increases implementation flexibility.
Introduces minimal overhead while maximising performance enhancement opportunities.
Shields the application developer from the details of parallel implementation.

References

[1]
Agrawal C., Kruskal disjoint, 2021, https://www.geeksforgeeks.org/kruskals-minimum-spanning-tree-using-stl-in-c/, [Accessed 20-Feb-2023].
[2]
Anon C., GitHub - rafa32/quantum-shor: Implementation of a quantum computer simulator together with shor’s algorithm. — github.com, 2017, https://github.com/rafa32/Quantum-Shor, [Accessed 20-Feb-2023].
[3]
Anon C., GitHub - khizmax/libcds: A C++ library of concurrent data structures — github.com, 2023, https://github.com/khizmax/libcds, [Accessed 21-Feb-2023].
[4]
Anon C., Specifications: oneAPI, 2023, https://www.oneapi.io/spec/, [Accessed 20-Feb-2023].
[5]
Augonnet C., Thibault S., Namyst R., Wacrenier P.-A., StarPU: A unified platform for task scheduling on heterogeneous multicore architectures, in: Sips H., Epema D., Lin H.-X. (Eds.), Euro-Par 2009 Parallel Processing, Springer Berlin Heidelberg, Berlin, Heidelberg, 2009, pp. 863–874,.
[6]
Basios M., Li L., Wu F., Kanthan L., Barr E.T., Darwinian data structure selection, in: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, in: ESEC/FSE 2018, Association for Computing Machinery, New York, NY, USA, 2018, pp. 118–128,.
[7]
Benoit A., Cole M., Gilmore S., Hillston J., Flexible skeletal programming with eskel, in: Cunha J.C., Medeiros P.D. (Eds.), Euro-Par 2005 Parallel Processing, Springer Berlin Heidelberg, Berlin, Heidelberg, 2005, pp. 761–770.
[8]
Bensoussan C., Schöttle M., Kienzle J., Associations in MDE: a concern-oriented, reusable solution, in: European Conference on Modelling Foundations and Applications, Springer, 2016, pp. 121–137,.
[10]
Buss A., Harshvardhan M., Papadopoulos I., Pearce O., Smith T., Tanase G., Thomas N., Xu X., Bianco M., Amato N.M., Rauchwerger L., STAPL: Standard template adaptive parallel library, in: Proceedings of the 3rd Annual Haifa Experimental Systems Conference, SYSTOR ’10, Association for Computing Machinery, New York, NY, USA, 2010,.
[11]
Carlisle M.C., Olden: Parallelizing Programs with Dynamic Data Structures on Distributed-Memory Machines, (Ph.D. thesis) Princeton University, USA, 1996, UMI Order No. GAX96-27387.
[12]
Che S., Boyer M., Meng J., Tarjan D., Sheaffer J.W., Lee S.-H., Skadron K., Rodinia: A benchmark suite for heterogeneous computing, in: 2009 IEEE International Symposium on Workload Characterization, IISWC, IEEE, 2009, pp. 44–54,.
[13]
Chen L., Wu D., Ma W., Zhou Y., Xu B., Leung H., How C++ templates are used for generic programming: an empirical study on 50 open source systems, ACM Trans. Softw. Eng. Methodol. (TOSEM) 29 (1) (2020) 1–49,.
[14]
Ciechanowicz P., Kuchen H., Enhancing muesli’s data parallel skeletons for multi-core computer architectures, in: 2010 IEEE 12th International Conference on High Performance Computing and Communications, HPCC, 2010, pp. 108–113,.
[15]
Cole M.I., Algorithmic Skeletons: Structured Management of Parallel Computation, Pitman London, 1989.
[16]
Cormen T.H., Leiserson C.E., Rivest R.L., Stein C., Introduction to Algorithms, second ed., The MIT Press, 2001.
[17]
Costa D., Andrzejak A., CollectionSwitch: A framework for efficient and dynamic collection selection, in: Proceedings of the 2018 International Symposium on Code Generation and Optimization, 2018, pp. 16–26,.
[18]
Costa D., Andrzejak A., Seboek J., Lo D., Empirical study of usage and performance of Java collections, in: Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, 2017, pp. 389–400,.
[19]
Danelutto M., Mencagli G., Torquati M., González-Vélez H., Kilpatrick P., Algorithmic skeletons and parallel design patterns in mainstream parallel programming, Int. J. Parallel Program. 49 (2) (2021) 177–198,.
[20]
Danelutto M., Teti P., Lithium: A structured parallel programming environment in java, in: Sloot P.M.A., Hoekstra A.G., Tan C.J.K., Dongarra J.J. (Eds.), Computational Science — ICCS 2002, Springer Berlin Heidelberg, Berlin, Heidelberg, 2002, pp. 844–853,.
[21]
De Wael M., Marr S., De Koster J., Sartor J.B., De Meuter W., Just-in-time data structures, in: 2015 ACM International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software (Onward!), in: Onward! 2015, Association for Computing Machinery, New York, NY, USA, 2015, pp. 61–75,.
[22]
Diego M., GitHub - dlb04/infix-to-postfix ubinfix to postfix. — github.com, 2021, https://github.com/dlb04/infix-to-postfix, [Accessed 20-Feb-2023].
[23]
Doyle J., Rivest R.L., Linear expected time of a simple union-find algorithm, Inform. Process. Lett. 5 (5) (1976) 146–148,. URL https://www.sciencedirect.com/science/article/pii/0020019076900612.
[24]
Edwards H.C., Trott C.R., Kokkos: Enabling performance portability across manycore architectures, in: 2013 Extreme Scaling Workshop (Xsw 2013), 2013, pp. 18–24,.
[25]
Ernstsson A., Ahlqvist J., Zouzoula S., Kessler C., SkePU 3: Portable high-level programming of heterogeneous systems and HPC clusters, Int. J. Parallel Program. 49 (6) (2021) 846–866,.
[26]
Fisher J., GitHub - airplug/libactor: Actor model library for C — github.com, 2011, https://github.com/airplug/libactor, [Accessed 20-Feb-2023].
[27]
Franke B., Li Z., Morton M., Steuwer M., Collection Skeletons: Declarative abstractions for data collections, in: Proceedings of the 15th ACM SIGPLAN International Conference on Software Language Engineering, in: SLE 2022, Association for Computing Machinery, New York, NY, USA, 2022, pp. 189–201,.
[28]
González-Vélez H., Leyton M., A survey of algorithmic skeleton frameworks: high-level structured parallel programming enablers, Softw. - Pract. Exp. 40 (12) (2010) 1135–1160,.
[29]
Grelck C., Shared memory multiprocessor support for functional array processing in SAC, J. Funct. Programming 15 (3) (2005) 353–401,.
[30]
Hermann T., GitHub - Dobiasd/FunctionalPlus: Functional Programming Library for C++. Write concise and readable C++ code. — github.com, 2018, https://github.com/Dobiasd/FunctionalPlus, [Accessed 20-Feb-2023].
[31]
Hornung R.D., Keasler J.A., The RAJA portability layer: Overview and status, 2014,.
[32]
Huang T.-W., Lin D.-L., Lin C.-X., Lin Y., Taskflow: A lightweight parallel and heterogeneous task graph computing system, IEEE Trans. Parallel Distrib. Syst. 33 (6) (2022) 1303–1320,.
[33]
Jarek D., GitHub - djarek/md5lamacz: A multithreaded brute-force MD5 hashed password cracker. — github.com, 2015, https://github.com/djarek/md5lamacz, [Accessed 20-Feb-2023].
[34]
Jung C., Rus S., Railing B.P., Clark N., Pande S., Brainy: Effective selection of data structures, ACM SIGPLAN Notices 46 (6) (2011) 86–97,.
[35]
Koranne S., Boost C++ libraries, in: Handbook of Open Source Tools, Springer US, Boston, MA, 2011, pp. 127–143,.
[36]
Lattner C., Venancio L., Test-suite/SingleSource/Benchmarks/Shootout/hash.c at master ⋅ llvm-mirror/test-suite — github.com, 2021, https://github.com/llvm-mirror/test-suite/blob/master/SingleSource/Benchmarks/Shootout/hash.c, [Accessed 20-Feb-2023].
[37]
Leyton M., Piquer J.M., Skandium: Multi-core programming with algorithmic skeletons, in: 2010 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing, 2010, pp. 289–296,.
[38]
Liskov B., Zilles S., Programming with abstract data types, in: Proceedings of the ACM SIGPLAN Symposium on Very High Level Languages, Association for Computing Machinery, New York, NY, USA, 1974, pp. 50–59,.
[39]
Loidl H.-W., Jones S.P., Algorithm+ strategy=parallelism, J. Functional Program. 8 (1) (1998) 23–60,.
[40]
Louw G., GitHub - glouw/tinn: A tiny neural network library — github.com, 2020, https://github.com/glouw/tinn, [Accessed 20-Feb-2023].
[41]
Majidi A., Thomas N., Smith T., Amato N., Rauchwerger L., Nested parallelism with Algorithmic Skeletons, in: Hall M., Sundar H. (Eds.), Languages and Compilers for Parallel Computing, Springer International Publishing, Cham, 2019, pp. 159–175,.
[42]
Marcell J., GitHub - jasmarc/scheduler: CPU scheduling simulator — github.com, 2009, https://github.com/jasmarc/scheduler.
[43]
Marr S., Daloze B., Few versatile vs. many specialized collections: how to design a collection library for exploratory programming?, in: Conference Companion of the 2nd International Conference on Art, Science, and Engineering of Programming, 2018, pp. 135–143,.
[44]
McCool M., Reinders J., Robison A., Structured Parallel Programming: Patterns for Efficient Computation, first ed., Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 2012.
[45]
Naftalin M., Wadler P., Java Generics and Collections: Speed Up the Java Development Process, “O’Reilly Media, Inc.”, 2006.
[46]
Odersky M., Spoon L., The architecture of scala collections, 2019, https://docs.scala-lang.org/overviews/core/architecture-of-scala-collections.html.
[47]
Pataki N., Testing by C++ template metaprograms, 2010,. arXiv e-prints.
[48]
Poldner M., Kuchen H., Algorithmic skeletons for branch and bound, in: Filipe J., Shishkov B., Helfert M. (Eds.), Software and Data Technologies, Springer Berlin Heidelberg, Berlin, Heidelberg, 2008, pp. 204–219,.
[49]
Rosseta Code Contributors M., Rosetta code — Rosetta code, 2022, https://rosettacode.org/wiki/Rosetta_Code, [Accessed 20-Feb-2023].
[50]
Schöttle M., Kienzle J., On the difficulties of raising the level of abstraction and facilitating reuse in software modelling: The case for signature extension, in: 2019 IEEE/ACM 11th International Workshop on Modelling in Software Engineering (MiSE), 2019, pp. 71–77,.
[51]
Stepanov A., Meng L., The standard template library, HP Laboratories, 1995.
[52]
Stratton J.A., Rodrigues C., Sung I.-J., Obeid N., Chang L.-W., Anssari N., Liu G.D., Hwu W.-m.W., Parboil: A revised benchmark suite for scientific and commercial throughput computing, in: Center for Reliable and High-Performance Computing, 127, 2012, p. 27.
[53]
Thoman P., Tischler F., Salzmann P., Fahringer T., The celerity high-level API: C++20 for accelerator clusters, Int. J. Parallel Program. 50 (3–4) (2022) 341–359,.
[54]
Vasiladiotis C., Mischung-suite/programs/ising/data/original_source/ising.c at master ⋅ compor/mischung-suite — github.com, 2020, https://github.com/compor/mischung-suite/blob/master/programs/ising/data/original_source/ising.c, [Accessed 20-Feb-2023].
[55]
von Koch T.J.K.E., Manilov S., Vasiladiotis C., Cole M., Franke B., Towards a compiler analysis for parallel algorithmic skeletons, in: Proceedings of the 27th International Conference on Compiler Construction, in: CC 2018, Association for Computing Machinery, New York, NY, USA, 2018, pp. 174–184,.
[56]
Walker D.W., Dongarra J.J., MPI: a standard message passing interface, Supercomputer 12 (1996) 56–68.
[57]
Wang C., Yao P., Tang W., Shi Q., Zhang C., Complexity-guided container replacement synthesis, Proc. ACM Program. Lang. 6 (OOPSLA1) (2022),.
[58]
[59]
Xu G., CoCo: Sound and adaptive replacement of Java collections, in: European Conference on Object-Oriented Programming, Springer, 2013, pp. 1–26,.

Recommendations

Comments

Please enable JavaScript to view thecomments powered by Disqus.

Information & Contributors

Information

Published In

cover image Journal of Systems and Software
Journal of Systems and Software  Volume 213, Issue C
Jul 2024
376 pages

Publisher

Elsevier Science Inc.

United States

Publication History

Published: 17 July 2024

Author Tags

  1. Containers
  2. Collections
  3. Data structures
  4. Properties

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • 0
    Total Citations
  • 0
    Total Downloads
  • Downloads (Last 12 months)0
  • Downloads (Last 6 weeks)0
Reflects downloads up to 31 Dec 2024

Other Metrics

Citations

View Options

View options

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media