Applications in industry have often grown and improved over many years. As their performance demands increase, they also need to benefit from the availability of multi-core processors. However, reimplementing these industrial applications from scratch, or even restructuring them, is very expensive, often because of high certification efforts. Therefore, a strategy for the systematic parallelization of legacy code is needed. We present a parallelization approach for hard real-time systems that ensures a high degree of legacy code reuse and preserves timing analysability. To show its applicability, we apply it to the core algorithm of an avionics application as well as to the control program of a large construction machine. We create models of the legacy programs that show the potential for parallelism, optimize them, and change the source code accordingly. The parallelized applications are placed on a predictable multi-core processor with up to 18 cores. For evaluation, we compare the worst-case execution times and their speedups. Furthermore, we analyse limitations that arise during the parallelization process.
Due to, e.g., synchronization overheads, some parts of the program might take longer than before. Therefore, it is important to check that all periods and deadlines are still met after parallelization.
For example, the meta-pattern in OPL (see http://parlab.eecs.berkeley.edu/wiki/patterns/patterntemplate) requires the specification of name, problem, context, forces, solution, invariants, an example, known uses, related patterns, references, and author.
See online version: http://parlab.eecs.berkeley.edu/wiki/patterns/patterns.
Homepage: www.cscope.sf.net.
However, the XML example input file of our speedup approximation and parameter optimization tool gives an impression of the complete APD. It can be found at https://github.com/parmerasa-uau/parallelism-optimization/tree/master/ParallelismAnalysisJMetal.
Open-source, Homepage: www.cscope.sf.net.
Part of the Rapita Verification Suite (RVS), Homepage: www.rapitasystems.com.
A configuration specifies which program parts should be executed in parallel, how many cores are utilized and which variables have to be synchronized.
The tool is open source and can be downloaded at https://github.com/parmerasa-uau/parallelism-optimization.
Our Timing-analyzable Algorithmic Skeleton (TAS) library is open source (LGPLv3 licence) and can be downloaded at https://github.com/parmerasa-uau/tas.
Utilizing our PDPs, this can be realized with the Data Parallelism PDP, e.g., with two threads, each doing SUM on half of the matrices. Finally, one thread has to do the final SUM of the two resulting matrices.
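The split-and-combine structure described here can be sketched as follows. This is a minimal illustration in Python, not the API of the TAS library: the function names `matrix_sum` and `parallel_matrix_sum` and the sample data are hypothetical, and Python threads only show the structure, not a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def matrix_sum(matrices):
    """Element-wise sum of a non-empty list of equally sized matrices."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    total = [[0] * cols for _ in range(rows)]
    for m in matrices:
        for i in range(rows):
            for j in range(cols):
                total[i][j] += m[i][j]
    return total


def parallel_matrix_sum(matrices, workers=2):
    """Sum each chunk of the input in its own thread, then combine the partial sums.

    Hypothetical helper: assumes a non-empty input list.
    """
    chunk = (len(matrices) + workers - 1) // workers
    parts = [matrices[k:k + chunk] for k in range(0, len(matrices), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(matrix_sum, parts))
    # Final SUM of the partial results, done by a single thread.
    return matrix_sum(partial)


# Hypothetical input: four 2x2 matrices, so each of the two threads sums two of them.
matrices = [[[n, 2 * n], [3 * n, 4 * n]] for n in range(1, 5)]
result = parallel_matrix_sum(matrices)
```

The final combination step is sequential, so it adds to the execution time of the parallel version and has to be included in any timing estimate.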
Original homepage: http://www.soclib.fr. parMERASA simulator (open source under BSD licence): http://www.parmerasa.eu/files/open_source/soclib_parmerasa.zip.
For more than one core, it always has to be assumed that the memory requests of all other cores are processed by the memory controller before a core's own request is handled. This results in worst-case memory access times of 54 cycles for 4 cores, 96 cycles for 8 cores and 138 cycles for 12 cores.
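The quoted figures grow by the same amount for every added core, which fits the assumption that each other core contributes one pending request to the worst case. The short check below uses only the three values from the text; the helper name `cycles_per_extra_core` is hypothetical, and no values for other core counts are derived.

```python
# Worst-case memory access times quoted in the text (number of cores -> cycles).
wc_access_cycles = {4: 54, 8: 96, 12: 138}


def cycles_per_extra_core(table):
    """Increase in worst-case access cycles per additional core,
    computed between neighbouring entries of the table."""
    cores = sorted(table)
    return [
        (table[b] - table[a]) / (b - a)
        for a, b in zip(cores, cores[1:])
    ]


# Both steps (4 -> 8 cores and 8 -> 12 cores) add 42 cycles over 4 cores.
increments = cycles_per_extra_core(wc_access_cycles)
```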
In the Pipeline PDP, data is moved to the next stage only when all stages have finished their work. Therefore, every time the longest stage has finished, one result matrix comes out of the pipeline.
In the sequential version, all activities have to be processed to get one result matrix. Then the next set of input data can be processed.
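The difference between the two versions can be made concrete with a small calculation. This is a sketch under assumed numbers: the stage times are hypothetical and not taken from the evaluated application, and synchronization overhead is ignored.

```python
def sequential_interval(stage_times):
    """Time between two results when all activities run one after another."""
    return sum(stage_times)


def pipeline_interval(stage_times):
    """Time between two results when stages advance in lock-step:
    data moves on only after the longest stage has finished."""
    return max(stage_times)


def pipeline_first_result(stage_times):
    """Time until the first result leaves a lock-step pipeline:
    one step of the longest stage per pipeline stage."""
    return len(stage_times) * max(stage_times)


# Hypothetical stage times in arbitrary time units.
stage_times = [30, 50, 20]

seq = sequential_interval(stage_times)          # 100
pipe = pipeline_interval(stage_times)           # 50
first = pipeline_first_result(stage_times)      # 150
throughput_gain = seq / pipe                    # 2.0
```

With these assumed numbers the pipeline delivers results twice as often, but the first result arrives later than in the sequential version. Balanced stages therefore matter more than the number of stages.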
Many components check the armrest switch because, for safety reasons, they are not allowed to run when no driver is sitting in the cab.
Interrupts take place every 1 ms because this is the smallest period of periodic tasks.
They have a high degree of independence. However, the control application of the foundation crane contains no components that are completely independent of all others, since they all share the same data structures, e.g., for accessing sensors and actuators.
Homepage: www.rapitasystems.com.
Unfortunately, it was not possible to determine WCETs for tasks in the scheduler, since RapiTime only supports analyzing functions in the program flow and OTAWA did not work on the TriCore platform because no detailed timing model is available.
Nevertheless, we are aware that the OETs may be different on the target platform. Our results show that two cores are now nearly filled by the periodic tasks because of the lower clock frequency and synchronization overheads.
Download link: https://github.com/parmerasa-uau/parallelism-optimization/.
Fetch and increment barriers.
Alternatively, a get method can return a copy of the structure, which can be kept locally for read operations. However, if the structure is modified, consistency can become an issue when the local copy is written back.
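The consistency issue can be illustrated with a small sketch. The class `SharedStructure` and its `get` and `set` methods are hypothetical names chosen for this example and are not part of the described library. Each access is protected by a lock, yet an update is still lost because two local copies are written back one after the other.

```python
import copy
import threading


class SharedStructure:
    """Hypothetical shared data structure with copy-based access."""

    def __init__(self, data):
        self._lock = threading.Lock()
        self._data = data

    def get(self):
        # Return a private copy that can be read without holding the lock.
        with self._lock:
            return copy.deepcopy(self._data)

    def set(self, data):
        # Write a complete local copy back, replacing the shared state.
        with self._lock:
            self._data = copy.deepcopy(data)


shared = SharedStructure({"a": 0, "b": 0})

# Two users fetch their local copies before either writes back.
copy_1 = shared.get()
copy_2 = shared.get()

copy_1["a"] = 1   # first user changes field "a"
copy_2["b"] = 2   # second user changes field "b"

shared.set(copy_1)
shared.set(copy_2)  # overwrites the whole structure, including the stale "a"

final_state = shared.get()  # {"a": 0, "b": 2}: the update to "a" is lost
```

Local copies are therefore safe for reading, while modifications need either a lock held across the whole read-modify-write sequence or field-wise write-back.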
Available as open-source software: http://www.otawa.fr.
Our tool can be downloaded at https://www.github.com/parmerasa-uau/tas2otawa.
The real WCET cannot be estimated; only a safe upper bound can be determined, see [59].
Configurations with more cores usually have more parallel parts requiring more shared variables to be synchronized.
There is also one speedup of 5.0; Kempf et al. explain that this is a benchmark testing different components of the system. The parallelized version tests all components simultaneously instead of successively.
The research leading to these results has received funding from the European Union Seventh Framework Programme under the name parMERASA and Grant Agreement No. 287519.
