Abstract
Applications in industry often have grown and improved over many years. Since their performance demands increase, they also need to benefit from the availability of multi-core processors. However, a reimplementation from scratch and even a restructuring of these industrial applications is very expensive, often due to high certification efforts. Therefore, a strategy for a systematic parallelization of legacy code is needed. We present a parallelization approach for hard real-time systems, which ensures a high reusage of legacy code and preserves timing analysability. To show its applicability, we apply it on the core algorithm of an avionics application as well as on the control program of a large construction machine. We create models of the legacy programs showing the potential of parallelism, optimize them and change the source codes accordingly. The parallelized applications are placed on a predictable multi-core processor with up to 18 cores. For evaluation, we compare the worst case execution times and their speedups. Furthermore, we analyse limitations coming up at the parallelization process.
Similar content being viewed by others
Notes
Due to e. g. synchronization overheads, some parts of the program might take longer. Therefore, it is important to keep an eye on periods and deadlines and to check if everything still works fine.
For example, the meta-pattern in OPL (see http://parlab.eecs.berkeley.edu/wiki/patterns/patterntemplate) requires the specification of name, problem, context, forces, solution, invariants, an example, known uses, related patterns, references, and author.
See online version: http://parlab.eecs.berkeley.edu/wiki/patterns/patterns.
Homepage: www.cscope.sf.net.
However, the XML example input file of our speedup approximation and parameter optimization tool gives a clue of the complete APD. It can be found at https://github.com/parmerasa-uau/parallelism-optimization/tree/master/ParallelismAnalysisJMetal.
Open-source, Homepage: www.cscope.sf.net.
Part of the Rapita Verification Suite (RVS), Homepage: www.rapitasystems.com.
A configuration specifies which program parts should be executed in parallel, how many cores are utilized and which variables have to be synchronized.
The tool is open source and can be downloaded at https://github.com/parmerasa-uau/parallelism-optimization.
Our Timing-analyzable Algorithmic Skeleton (TAS) library is open source (LGPLv3 licence) and can be downloaded at https://github.com/parmerasa-uau/tas.
Utilizing our PDPs, this can be realized with the Data Parallelism PDP, e. g. two threads, each doing SUM on half of the matrices. Finally, one thread would have to do the final SUM of the two resulting matrices.
Original Homepage: http://www.soclib.fr parMERASA simulator (open source under BSD licence): http://www.parmerasa.eu/files/open_source/soclib_parmerasa.zip.
For more than 1 core, it always has to be assumed that all memory requests of all other cores are processed by the memory controller before the own request is handled. This results in worst case memory access times of 54 cycles for 4 cores, 96 cycles for 8 cores and 138 cycles for 12 cores.
At the pipeline PDP, data is moved to the next stage when all stages have finished their work. Therefore, every time the largest stage is finished, one result matrix comes out of the pipeline.
In the sequential version, all activities have to be processed to get one result matrix. Then the next set of input data can be processed.
Many components check the armrest switch because for security reasons they are not allowed to run when there is no driver sitting in the cab.
Interrupts take place every 1 ms because this is the smallest period of periodic tasks.
They have a high degree of independence–however, the control application of the foundation crane contains no components which are completely independent of all others since they all share the same data structures, e. g. for accessing sensors and actuators.
Homepage: www.cscope.sf.net.
Homepage: www.rapitasystems.com.
Unfortunately, it was not possible to determine WCETs for tasks in the scheduler since RapiTime supports only analyzing functions in the program flow and OTAWA did not work on the TriCore platform because no detailed timing model is available.
Nevertheless, we are aware that the OETs may be different on the target platform. Our results show that two cores are nearly filled by the periodic tasks now because of lower clock frequency and synchronization overheads.
Download link: https://github.com/parmerasa-uau/parallelism-optimization/.
Fetch and increment barriers.
Alternatively a get method can return a copy of the structure which can be kept locally for reading operations. However, if the structure is modified, consistency can be an issue when the local copy is written back.
Available as open-source software: http://www.otawa.fr.
Our tool can be downloaded at https://www.github.com/parmerasa-uau/tas2otawa.
The real WCET cannot be estimated, only a safe upper bound, see [59].
Configurations with more cores usually have more parallel parts requiring more shared variables to be synchronized.
There is also one speedup of 5.0—Kempf et al. describe that this is a benchmark testing different components of the system. The parallelized version tests all components simultanously instead of successively.
References
Abella, J., Hardy, D., Puaut, I., Quiñones, E., Cazorla, F.: On the comparison of deterministic and probabilistic WCET estimation techniques. In: 26th Euromicro Conference on Real-Time Systems (ECRTS), pp. 266–275 (2014). doi:10.1109/ECRTS.2014.16
Altmeyer, S., Cucu-Grosjean, L., Davis, R.: Static probabilistic timing analysis for real-time systems using random replacement caches. Real Time Syst. 51(1), 77–123 (2015). doi:10.1007/s11241-014-9218-4
Amdahl, G.M.: Validity of the single processor approach to achieving large-scale computing capabilities. AFIPS Conference Proceedings, vol 30, pp. 483–485 (1967). doi:10.1145/1465482.1465560
Audsley, N., Tindell, K., Burns, A.: The end of the line for static cyclic scheduling? In: Fifth Euromicro Workshop on Real-Time Systems. Proceedings, pp. 36–41 (1993). doi:10.1109/EMWRT.1993.639042
Ballabriga, C., Cassé, H., Rochange, C., Sainrat, P.: OTAWA: an open toolbox for adaptive WCET analysis. In: Software Technologies for Embedded and Ubiquitous Systems (Lecture Notes in Computer Science), vol 6399, pp. 35–46. Springer, Berlin (2011). doi:10.1007/978-3-642-16256-5_6
BAUER Maschinen GmbH: MC 128 foundation crane datasheet (2013). https://www.bauer.de/export/shared/pdf/bma/products/foundation_crane/905-659-2.pdf
Benoit, A., Cole, M., Gilmore, S., Hillston, J.: Flexible skeletal programming with eSkel. In: Euro-Par 2005 Parallel Processing (Lecture Notes in Computer Science), vol 3648, pp. 761–770. Springer, Berlin (2005). doi:10.1007/11549468_83
Blumofe, R.D., Leiserson, C.E.: Scheduling multithreaded computations by work stealing. J. ACM 46(5), 720–748 (1999). doi:10.1145/324133.324234
Bonenfant, A., Broster, I., Ballabriga, C., Bernat, G., Cassé, H., Houston, M., Merriam, N., de Michiel, M., Rochange, C., Sainrat, P.: Coding Guidelines for WCET Analysis Using Measurement-Based and Static Analysis Techniques. Technical Report IRIT/RR-2010-8-FR, IRIT-Institut de recherche en informatique de Toulouse (2010)
Cole, M.: Bringing skeletons out of the closet: a pragmatic manifesto for skeletal parallel programming. Parallel Comput. 30(3), 389–406 (2004). doi:10.1016/j.parco.2003.12.002
Cordes, D., Marwedel, P.: Multi-objective aware extraction of task-level parallelism using genetic algorithms. In: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012, pp. 394–399 (2012). doi:10.1109/DATE.2012.6176503
Cordes, D., Marwedel, P., Mallik, A.: Automatic parallelization of embedded software using hierarchical task graphs and integer linear programming. In: 2010 IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), pp. 267–276 (2010). doi:10.1145/1878961.1879009
Cucu-Grosjean, L., Santinelli, L., Houston, M., Lo, C., Vardanega, T., Kosmidis, L., Abella, J., Mezzetti, E., Quiñones, E., Cazorla, F.: Measurement-based probabilistic timing analysis for multi-path programs. In: 24th Euromicro Conference on Real-Time Systems (ECRTS), pp. 91–101 (2012). doi:10.1109/ECRTS.2012.31
Falcou, J., Sérot, J., Chateau, T., Lapresté, J.T.: QUAFF: efficient C++ design for parallel skeletons. Parallel Comput. 32(7), 604–615 (2006). doi:10.1016/j.parco.2006.06.001
Foster, I.: Designing and Building Parallel Programs: Concepts and Tools for Parallel Software Engineering. Addison-Wesley Longman Publishing Co., Inc., Boston (1995)
Fraboulet, A., Risset, T., Scherrer, A.: Computer systems: architectures, modeling, and simulation: third and fourth international workshops, SAMOS 2004, Samos, Greece, July 21–23, 2004 and July 19–21, 2004. In: Proceedings, Chap. Cycle Accurate Simulation Model Generation for SoC Prototyping, pp. 453–462. Springer, Berlin (2004). doi:10.1007/978-3-540-27776-7_47
Gebhard, G., Cullmann, C., Heckmann, R.: Software structure and WCET predictability. In: Bringing Theory to Practice: Predictability and Performance in Embedded Systems, vol 18, pp. 1–10. Dagstuhl, Germany (2011). doi:10.4230/OASIcs.PPES.2011.1
Gerdes, M., Jahr, R., Ungerer, T.: parMERASA Pattern Catalogue: Timing Predictable Parallel Design Patterns. Technical Report 2013-11, University of Augsburg (2013)
Gerdes, M., Kluge, F., Ungerer, T., Rochange, C., Sainrat, P.: Time analysable synchronisation techniques for parallelised hard real-time applications. In: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2012, pp. 671–676 (2012). doi:10.1109/DATE.2012.6176555
Gerdes, M., Wolf, J., Guliashvili, I., Ungerer, T., Houston, M., Bernat, G., Schnitzler, S., Regler, H.: Large drilling machine control code—Parallelisation and WCET speedup. In: 6th IEEE International Symposium on Industrial Embedded Systems (SIES), pp. 91–94 (2011). doi:10.1109/SIES.2011.5953688
González-Vélez, H., Leyton, M.: A survey of algorithmic skeleton frameworks: high-level structured parallel programming enablers. Software: Practice and Experience 40(12), 1135–1160 (2010). doi:10.1002/spe.1026
Gustavsson, A., Gustafsson, J., Lisper, B.: Toward static timing analysis of parallel software. In: 12th International Workshop on Worst-Case Execution Time Analysis, vol 23, pp. 38–47. Dagstuhl, Germany (2012). doi:10.4230/OASIcs.WCET.2012.38
Herlihy, M.: Wait-free synchronization. ACM Trans. Progr. Lang. Syst.: TOPLAS 13(1), 124–149 (1991). doi:10.1145/114005.102808
Herlihy, M., Luchangco, V., Moir, M.: Obstruction-free synchronization: double-ended queues as an example. In: Proceedings of the 23rd International Conference on Distributed Computing Systems, pp. 522–529. IEEE (2003). doi:10.1109/ICDCS.2003.1203503
Infineon: AURIX—TC27x B-Step, 32-bit Single-Chip Micro-controller. User’s Manual, v14.1
Jahr, R., Frieb, M., Gerdes, M., Ungerer, T.: Model-based parallelization and optimization of an industrial control code. In: Dagstuhl-Workshop MBEES: Modellbasierte Entwicklung eingebetteter Systeme X, Schloss Dagstuhl, Germany, 2014, Tagungsband Modellbasierte Entwicklung eingebetteter Systeme, pp. 63–72. fortiss GmbH, München, Schloss Dagstuhl (2014). http://www4.in.tum.de/~schaetz/papers/MBEES2014.pdf
Jahr, R., Gerdes, M., Ungerer, T.: On efficient and effective model-based parallelization of hard real-time applications. In: Dagstuhl-Workshop MBEES: Modellbasierte Entwicklung eingebetteter Systeme IX, Schloss Dagstuhl, Germany, 2013, Tagungsband Modellbasierte Entwicklung eingebetteter Systeme, pp. 50–59. Fortiss GmbH, München, Schloss Dagstuhl (2013). http://www4.in.tum.de/~schaetz/papers/MBEES2013.pdf
Jahr, R., Gerdes, M., Ungerer, T.: A pattern-supported parallelization approach. In: Proceedings of the 2013 International Workshop on Programming Models and Applications for Multicores and Manycores, PMAM ’13, pp. 53–62. ACM, New York (2013). doi:10.1145/2442992.2442998
Jahr, R., Gerdes, M., Ungerer, T., Ozaktas, H., Rochange, C., Zaykov, P.: Effects of structured parallelism by parallel design patterns on embedded hard real-time systems. In: IEEE 20th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), pp. 1–10 (2014). doi:10.1109/RTCSA.2014.6910546
Jahr, R., Stegmeier, A., Kiefhaber, R., Frieb, M., Ungerer, T.: User Manual for the Optimization and WCET Analysis of Software with Timing Analyzable Algorithmic Skeletons. Technical Report no. 2014-05, University of Augsburg (2014)
Kehr, S., Quiñones, E., Böddeker, B., Schäfer, G.: Parallel execution of AUTOSAR legacy applications on multicore ECUs with timed implicit communication. In: 52nd ACM/EDAC/IEEE Design Automation Conference (DAC). San Francisco, USA (2015). doi:10.1145/2744769.2744889
Kempf, S., Veldema, R., Philippsen, M.: Is there hope for automatic parallelization of legacy industry automation applications? In: Gesellschaft für Informatik e.V. (ed.) Proceedings of the 24th Workshop on Parallel Systems and Algorithms (PARS), pp. 80–89 (2011). https://www2.informatik.uni-erlangen.de/publication/download/PARS2011.pdf
Keutzer, K., Massingill, B.L., Mattson, T.G., Sanders, B.A.: A design pattern language for engineering (parallel) software: merging the PLPP and OPL projects. In: Proceedings of the 2010 Workshop on Parallel Programming Patterns, ParaPLoP ’10, pp. 9:1–9:8. ACM, New York (2010). doi:10.1145/1953611.1953620
Liu, X., Zhou, J., Zhang, D., Shen, Y., Guo, M.: A parallel skeleton library for embedded multicores. In: 39th International Conference on Parallel Processing Workshops (ICPPW), pp. 65–73. IEEE (2010). doi:10.1109/ICPPW.2010.21
Lukas, R.G.: Dynamic compaction. Geotechnical Engineering Circular No. 1(FHWA-SA-95-037), 1–97 (1995). http://isddc.dot.gov/OLPFiles/FHWA/009754.pdf
Massingill, B.L., Mattson, T.G., Sanders, B.A.: Patterns for parallel application programs. In: Proceedings of the Sixth Pattern Languages of Programs Workshop (PLoP), Allerton Park in Monticello, IL (1999). http://www.hillside.net/plop/plop99/proceedings/massingill/massingill.pdf
Mattson, T.G., Sanders, B.A., Massingill, B.L.: Patterns for Parallel Programming, 1st edn. Addison-Wesley Professional, Boston, MA (2004)
Meade, A., Buckley, J., Collins, J.J.: Challenges of evolving sequential to parallel code: an exploratory review. In: Proceedings of the 12th International Workshop on Principles of Software Evolution and the 7th Annual ERCIM Workshop on Software Evolution, IWPSE-EVOL ’11, pp. 1–5. ACM, New York (2011). doi:10.1145/2024445.2024447
Metzlaff, S., Mische, J., Ungerer, T.: A real-time capable many-core model. In: Proceedings of 32nd IEEE Real-Time Systems Symposium: Work-in-Progress Session, pp. 21–24. Vienna, Austria (2011). http://www.cs.wayne.edu/~fishern/Meetings/wip-rtss2011/WiP-RTSS-2011-Proceedings-Post.pdf
OMG Unified Modeling Language™(OMG UML), Version 2.5. Standardization Document (2015). http://www.omg.org/spec/UML/2.5/
Ozaktas, H., Rochange, C., Sainrat, P.: Automatic WCET analysis of real-time parallel applications. In: 13th International Workshop on Worst-Case Execution Time Analysis, vol 30, pp. 11–20. Dagstuhl, Germany (2013). doi:10.4230/OASIcs.WCET.2013.11
Panić, M., Kehr, S., Quiñones, E., Böddecker, B., Abella, J., Cazorla, F.J.: RunPar: an allocation algorithm for automotive applications exploiting runnable parallelism in multicores. In: 2014 ACM International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS). New Delhi, India (2014). doi:10.1145/2656075.2656096
Predictable parMERASA Multicore Processor. Deliverable 5.3 of the parMERASA Project (2013). http://www.parmerasa.eu/files/deliverables/Deliverable_5_3.pdf
Puschner, P., Schoeberl, M.: On composable system timing, task timing, and WCET analysis. In: R. Kirner (ed.) 8th International Workshop on Worst-Case Execution Time Analysis (WCET’08) (OpenAccess Series in Informatics (OASIcs)), vol 8. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl (2008). doi:10.4230/OASIcs.WCET.2008.1662. Also published in print by Austrian Computer Society (OCG) with ISBN 978-3-85403-237-3
Quinton, S., Bone, T.T., Hennig, J., Neukirchner, M., Negrean, M., Ernst, R.: Typical worst case response-time analysis and its use in automotive network design. In: Proceedings of the 51st Annual Design Automation Conference, DAC ’14, pp. 44:1–44:6. ACM, New York (2014). doi:10.1145/2593069.2602977
Quinton, S., Hanke, M., Ernst, R.: Formal analysis of sporadic overload in real-time systems. In: Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 515–520 (2012). doi:10.1109/DATE.2012.6176523
Rapita Systems: RapiTime explained. White Paper MC-WP-001-17, http://www.rapitasystems.com/downloads/white-papers/rapitime-explained
Report on support of tools for case studies. Deliverable 3.12 of the parMERASA Project (2014). http://www.parmerasa.eu/files/deliverables/Deliverable_3_12.pdf
Rochange, C., Bonenfant, A., Sainrat, P., Gerdes, M., Wolf, J., Ungerer, T., Petrov, Z., Mikulu, F.: WCET analysis of a parallel 3D multigrid solver executed on the MERASA multi-core. In: 10th International Workshop on Worst-Case Execution Time Analysis (WCET 2010), vol 15, pp. 90–100. Dagstuhl, Germany (2010). doi:10.4230/OASIcs.WCET.2010.90
Saifullah, A., Agrawal, K., Lu, C., Gill, C.: Multi-core real-time scheduling for generalized parallel task models. In: IEEE 32nd Real-Time Systems Symposium (RTSS), pp. 217–226 (2011). doi:10.1109/RTSS.2011.27
Schlingmann, S., Garbade, A., Weis, S., Ungerer, T.: Connectivity-sensitive algorithm for task placement on a many-core considering faulty regions. In: 19th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP), pp. 417–422 (2011). doi:10.1109/PDP.2011.58
Sensortechnik Wiedemann GmbH: ESX-3XL. Data Sheet. http://www.sensor-technik.de/images/stories/pdf/download/esx-3xl_datenblatt_en.pdf (2014)
Sérot, J., Ginhac, D.: Skeletons for parallel image processing: an overview of the SKIPPER project. Parallel Comput. 28(12), 1685–1708 (2002). doi:10.1016/S0167-8191(02)00189-8
Stegmeier, A., Frieb, M., Jahr, R., Ungerer, T.: Algorithmic skeletons for parallelization of embedded real-time systems. In: 3rd Workshop on High-Performance and Real-time Embedded Systems (HiRES) (2015). http://www.cister.isep.ipp.pt/hires2015/Algorithmic_Skeletons_for_Parallelization_of_Embedded_Real-time_Systems.pdf
Ungerer, T., Bradatsch, C., Frieb, M., Kluge, F., Mische, J., Stegmeier, A., Jahr, R., Gerdes, M., Zaykov, P., Matusova, L., Li, Z.J.J., Petrov, Z., Böddeker, B., Kehr, S., Regler, H., Hugl, A., Rochange, C., Ozaktas, H., Cassé, H., Bonenfant, A., Sainrat, P., Lay, N., George, D., Broster, I., Quiñones, E., Panić, M., Abella, J., Hernandez, C., Cazorla, F., Uhrig, S., Rohde, M., Pyka, A.: Parallelizing industrial hard real-time applications for the parMERASA multi-core. Trans. Embed. Comput. Syst.: TECS (2016) (To appear)
Ungerer, T., Bradatsch, C., Gerdes, M., Kluge, F., Jahr, R., Mische, J., Fernandes, J., Zaykov, P., Petrov, Z., Böddeker, B., Kehr, S., Regler, H., Hugl, A., Rochange, C., Ozaktas, H., Casse, H., Bonenfant, A., Sainrat, P., Broster, I., Lay, N., George, D., Quiñones, E., Panić, M., Abella, J., Cazorla, F., Uhrig, S., Rohde, M., Pyka, A.: parMERASA—multi-core execution of parallelised hard real-time applications supporting analysability. In: 2013 Euromicro Conference on Digital System Design (DSD), pp. 363–370 (2013). doi:10.1109/DSD.2013.46
Ungerer, T., Bradatsch, C., Gerdes, M., Kluge, F., Jahr, R., Mische, J., Stegmeier, A., Frieb, M., Fernandes, J., Zaykov, P., Petrov, Z., Böddeker, B., Kehr, S., Regler, H., Hugl, A., Rochange, C., Ozaktas, H., Casse, H., Bonenfant, A., Sainrat, P., Broster, I., Lay, N., George, D., Quiñones, E., Panić, M., Abella, J., Cazorla, F., Uhrig, S., Rohde, M., Pyka, A.: Experiences and results of parallelisation of industrial hard real-time applications for the parMERASA multi-core. In: 3rd Workshop on High-Performance and Real-Time Embedded Systems (HiRES) (2015). http://www.cister.isep.ipp.pt/hires2015/Experiences_and_Results_of_Parallelisation_of_Industrial_Hard_Real-time_Applications_for_the_parMERASA_Multi-core.pdf
Wang, Z., O’Boyle, M.F.: Mapping parallelism to multi-cores: a machine learning based approach. In: Proceedings of the 14th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP ’09, pp. 75–84. ACM, New York (2009). doi:10.1145/1504176.1504189
Wilhelm, R., Engblom, J., Aandreas, E., Holsti, N., Thesing, S., Whalley, D., Bernat, G., Ferdinand, C., Heckmann, R., Mitra, T., Mueller, F., Puaut, I., Puschner, P., Staschulat, J., Stenström, P.: The worst-case execution time problem-overview of methods and survey of tools. ACM Trans. Embed. Comput. Syst. 7(3), 36:1–36:53 (2008). doi:10.1145/1347375.1347389
Acknowledgments
The research leading to these results has received funding from the European Union Seventh Framework Programme under the name parMERASA and Grant Agreement No. 287519.
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Frieb, M., Jahr, R., Ozaktas, H. et al. A Parallelization Approach for Hard Real-Time Systems and Its Application on Two Industrial Programs. Int J Parallel Prog 44, 1296–1336 (2016). https://doi.org/10.1007/s10766-016-0432-7
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10766-016-0432-7