Abstract
The embedded system industry is facing an increasing pressure for migrating from single-core to multi- and many-core platforms for size, performance and cost purposes. Real-time embedded system design follows this trend by integrating multiple applications with different safety criticality levels into a common platform. Scheduling mixed-criticality applications on today’s multi/many-core platforms and providing safe worst-case response time bounds for the real-time applications is challenging given the shared platform resources. For instance, sharing of memory buses introduces delays due to contention, which are non-negligible. Bounding these delays is not trivial, as one needs to model all possible interference scenarios. In this work, we introduce a combined analysis of computing, memory and communication scheduling in a mixed-criticality setting. In particular, we propose: (1) a mixed-criticality scheduling policy for cluster-based many-core systems with two shared resource classes, i.e., a shared multi-bank memory within each cluster, and a network-on-chip for inter-cluster communication and access to external memories; (2) a response time analysis for the proposed scheduling policy, which takes into account the interferences from the two classes of shared resources; and (3) a design exploration framework and algorithms for optimizing the resource utilizations under mixed-criticality timing constraints. The considered cluster-based architecture model describes closely state-of-the-art many-core platforms, such as the Kalray MPPA®-256. The applicability of the approach is demonstrated with a real-world avionics application. Also, the scheduling policy is compared against state-of-the-art scheduling policies based on extensive simulations with synthetic task sets.
Similar content being viewed by others
Notes
Given the definition of \(\delta \), Rx cannot perform more than \(\delta \) high-priority memory accesses within one single frame. Therefore, it is too pessimistic to increase \(CWCRT_{p,k'}(f',\ell )\) for several sub-frames \(k'\) of the same frame \(f'\). This would lead to a potential increase of \(barriers(f',\ell )_L\) by multiples of \(\delta \cdot T_{acc}\).
References
Alur R, Dill DL (1994) A theory of timed automata. Theor Comput Sci 126(2):183–235
Anderson J, Baruah S, Brandenburg B (2009) Multicore operating-system support for mixed criticality. In: Workshop on mixed criticality: roadmap to evolving UAV certification
ARINC (2003) ARINC 653–1 avionics application software standard interface. Technical report
Baruah S, Bonifaci V, D’Angelo G, Li H, Marchetti-Spaccamela A, Van der Ster S, Stougie L (2012) The preemptive uniprocessor scheduling of mixed-criticality implicit-deadline sporadic task systems. In: ECRTS, pp. 145–154
Baruah S, Chattopadhyay B, Li H, Shin I (2014) Mixed-criticality scheduling on multiprocessors. Real-Time Syst 50(1):142–177
Baruah S, Fohler G (2011) Certification-cognizant time-triggered scheduling of mixed-criticality systems. In: RTSS, pp. 3–12
Baruah S, Li H, Stougie L (2010) Towards the design of certifiable mixed-criticality systems. In: RTAS, pp. 13–22
Burns A, Davis R (2013) Mixed criticality systems: a review. http://www-users.cs.york.ac.uk/burns/review.pdf
Certainty. D8.3—validation results. Technical report (2014)
Chang C-S (2000) Performance guarantees in communication networks. Springer, New York
Cruz RL (1991) A calculus for network delay. I. Network elements in isolation. IEEE Trans Inf Theory 37(1):114–131
de Dinechin B, Ayrignac R, Beaucamps P-E, Couvert P, Ganne B, de Massas P, Jacquet F, Jones S, Chaisemartin N, Riss F, Strudel T (2013) A clustered manycore processor architecture for embedded and accelerated applications. In: HPEC, pp. 1–6
de Dinechin B, van Amstel D, Poulhies M, Lager G (2014) Time-critical computing on a single-chip massively parallel processor. In: DATE, pp. 1–6
Diemer J, Ernst R (2010) Back suction: service guarantees for latency-sensitive on-chip networks. In: NOCS, pp. 155–162
European commission’s 7th framework programme: Certification of real-time applications designed for mixed criticality (certainty). www.certainty-project.eu
Flodin J, Lampka K, Yi W (2014) Dynamic budgeting for settling dram contention of co-running hard and soft real-time tasks. In: SIES, pp. 151–159
Giannopoulou G, Lampka K, Stoimenov N, Thiele L (2012) Timed model checking with abstractions: towards worst-case response time analysis in resource-sharing manycore systems. In: EMSOFT, pp. 63–72
Giannopoulou G, Stoimenov N, Huang P, Thiele L (2013) Scheduling of mixed-criticality applications on resource-sharing multicore systems. In: EMSOFT, pp. 17:1–17:15
Giannopoulou G, Stoimenov N, Huang P, Thiele L (2014) Mapping mixed-criticality applications on multi-core architectures. In: DATE, pp. 1–6
Goossens S, Akesson B, Goossens K (2013) Conservative open-page policy for mixed time-criticality memory controllers. In: DATE, pp. 525–530
Hahn S, Reineke J, Wilhelm R (2013) Towards compositionality in execution time analysis-definition and challenges. In: Workshop on compositional theory and technology for real-time embedded systems
Kim H, de Niz D, Andersson B, Klein M, Mutlu O, Rajkumar RR (2014) Bounding memory interference delay in cots-based multi-core systems. In: RTAS, pp. 145–154
Kim Y, Lee J, Shrivastava A, Paek Y (2010) Operation and data mapping for cgras with multi-bank memory. In: LCTES, pp. 17–26
Kirkpatrick S, Gelatt CD, Vecchi MP (1983) Optimization by simulated annealing. Science 220:671–680
Le Boudec J-Y, Thiran P (2001) Network calculus: a theory of deterministic queuing systems for the internet, vol 2050. Springer, New York
Li H, Baruah S (2012) Global mixed-criticality scheduling on multiprocessors. In: ECRTS, pp. 166–175
Liu L, Cui Z, Xing M, Bao Y, Chen M, Wu C (2012) A software memory partition approach for eliminating bank-level interference in multicore systems. In: PACT, pp. 367–376
Lu Z, Millberg M, Jantsch A, Bruce A, van der Wolf P, Henriksson T (2009) Flow regulation for on-chip communication. In: DATE, pp. 578–581
Melpignano D, Benini L, Flamand E, Jego B, Lepley T, Haugou G, Clermidy F, Dutoit D (2012) Platform 2012, a many-core computing accelerator for embedded socs: performance evaluation of visual analytics applications. In: DAC, pp. 1137–1142
Mi W, Feng X, Xue J, Jia Y (2010) Software-hardware cooperative dram bank partitioning for chip multiprocessors. In: Network and parallel computing, vol 6289. Lecture note on computer science, pp. 329–343
Mollison M, Erickson J, Anderson J, Baruah S, Scoredos J, et al (2010) Mixed-criticality real-time scheduling for multicore systems. In: ICCIT, pp. 1864–1871
Paolieri M, Qui nones E, Cazorla FJ, Bernat G, Valero M (2009) Hardware support for wcet analysis of hard real-time multicore systems. In: ISCA, pp. 57–68
Pathan R (2012) Schedulability analysis of mixed-criticality systems on multiprocessors. In: ECRTS, pp. 309–320
Pellizzoni R, Bui BD, Caccamo M, Sha L (2008) Coscheduling of cpu and i/o transactions in cots-based embedded systems. In: RTSS, pp. 221–231
Qian Y, Lu Z, Dou W (2009) Analysis of communication delay bounds for network on chips. In: NOCS
Qian Y, Lu Z, Dou W (2010) Analysis of worst-case delay bounds for on-chip packet-switching networks. IEEE Trans Comput Aided Design Integr Circ Syst 29(5):802–815
Reineke J, Liu I, Patel HD, Kim S, Lee EA (2011) Pret dram controller: bank privatization for predictability and temporal isolation. In: CODES+ISSS, pp. 99–108
RTCA/DO-178B (1992) Software considerations in airborne systems and equipment certification
Tamas-Selicean D, Pop P (2011) Design optimization of mixed-criticality real-time applications on cost-constrained partitioned architectures. In: RTSS, pp. 24–33
Thiele L, Chakraborty S, Naedele M (2000) Real-time calculus for scheduling hard real-time systems. In: ISCAS, pp. 101–104
The dol-critical framework for mixed-criticality applications on multicores. https://www.tik.ee.ethz.ch/certainty/download.html
Thiele L, Stoimenov N (2009) Modular performance analysis of cyclic dataflow graphs. In: EMSOFT, pp. 127–136
Tobuschat S, Axer P, Ernst R, Diemer J (2013) Idamc: a noc for mixed criticality systems. In: RTCSA, pp. 149–156
Vestal S (2007) Preemptive scheduling of multi-criticality systems with varying degrees of execution time assurance. In: RTSS, pp. 239–243
Wandeler E, Maxiaguine A, Thiele L (2006) Performance analysis of greedy shapers in real-time systems. In: DATE, pp. 444–449, Munich, Germany
Wandeler E, Thiele L, Verhoef M, Lieverse P (2006) System architecture evaluation using modular performance analysis—a case study. Int J Softw Tools Technol Transf 8(6):649–667
Wilhelm R, Grund D, Reineke J, Schlickling M, Pister M, Ferdinand C (2009) Memory hierarchies, pipelines, and buses for future architectures in time-critical embedded systems. IEEE Trans Comput Aided Design Integr Circ Syst 28(7):966–978
Wu ZP, Krish Y, Pellizzoni R (2013) Worst case analysis of dram latency in multi-requestor systems. In: RTSS, pp. 372–383
Yun H, Mancuso R, Wu Z-P, Pellizzoni R (2014) Palloc: dram bank-aware memory allocator for performance isolation on multicore platforms. In: RTAS, pp. 155–166
Yun H, Yao G, Pellizzoni R, Caccamo M, Sha L (2012) Memory access control in multiprocessor for real-time systems with mixed criticality. In: ECRTS, pp. 299–308
Zhan J, Stoimenov N, Ouyang J, Thiele L, Narayanan V, Xie Y (2013) Designing energy-efficient noc for real-time embedded systems through slack optimization. In: DAC, pp. 1–6
Acknowledgments
The authors would like to thank the anonymous reviewers for their valuable feedback. This work has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) project CERTAINTY under grant agreement number 288175.
Author information
Authors and Affiliations
Corresponding author
Notation summary
Notation summary
Notation | Meaning | Source |
---|---|---|
System model—Sect. 3 | ||
\(\tau \) | Task set | Fixed |
\(L\) | Number of criticality levels | Fixed |
\(W_i\), \(\chi _i\) | Period and criticality level of task \(\tau _i\) | Fixed |
\(C_i(\ell )\) | Execution profile with lower and upper bounds on execution time | Fixed |
(\(e_i\)) and number of memory accesses (\(\mu _i\)) of \(\tau _i\) at level \(\ell \le \chi _i\) | Fixed | |
\(C_{i,deg}\) | Execution profile of \(\tau _i\) at level of assurance \(\ell > \chi _i\) | Fixed |
\(\mathcal {D} ep \) | Dependency graph | The edges (dependencies) are known, |
but their weight (minimum distance constraint) | ||
is determined from NoC analysis (Sect. 5.2) | ||
\(\mathcal {P}\) | Set of processing cores on a target cluster | Fixed |
\(T_{acc}\) | Memory access latency | Fixed |
\(\mathcal {M}_{\tau }\) | Mapping of tasks of \(\tau \) to cores of \(\mathcal {P}\) | Optimized in Sect. 6 |
Remote fetch protocol—Sect. 3.2 | ||
\(\tau _{init,i} \rightarrow \tau _i\) | Initiating task for remote data fetch, task using fetched data | Fixed |
WCNT | Worst-case time for transfer of notification from \(\tau _{init,i}\) | |
to listener in remote cluster | ||
WCRST | Worst-case time for set-up of DMA transfer in remote cluster | Based on measurements |
WCDFT | Worst-case time for complete transfer of data from remote cluster | |
Flexible time-triggered and synchronization based scheduling—Sect. 4 | ||
\(h\) | FTTS cycle | Computed as hyper-period of \(\tau \) |
\(\mathcal {F}\) | Set of FTTS frames | Computed after selecting frame lengths |
\(\mathcal {L}_f\) | Length of FTTS frame \(f\) | Selected manually |
\(barriers(f,l)_k\) | Worst-case length of \(k\)-th sub-frame in frame \(f\) at level \(\ell \) | Computed for a given \(\mathcal {M}_{\tau }, \mathcal {M}_{mem}\) in Sect. 5.1 (Eq. 6) |
Response time analysis—Sect. 5 | ||
\(\mathcal {I}\) | Memory interference graph | Consisting of \(\mathcal {E}_1, \mathcal {E}_2\) |
\(\mathcal {E}_1\) | Mapping of tasks to memory blocks | Fixed |
\(\mathcal {E}_2\) | Mapping of memory blocks to memory banks | Optimized in Sect. 6 |
\(\mathcal {M}_{mem}\) | Equivalent to \(\mathcal {E}_2\) | Optimized in Sect. 6 |
\(D\) | Mutual delay matrix | |
Rx | Special node in \(\mathcal {I}\) indicating a high-priority NoC Rx access | Added to \(\mathcal {I}\) in Sect. 5.1.2 |
\(\delta \) | Weight of edge between Rx and accessed memory block(s) | |
\(WCRT_i(f,l)\) | Worst-case response time of \(\tau _i\) in frame \(f\) at level \(\ell \) | |
\(CWCRT_{p,k}(f,l)\) | Worst-case response time of tasks executing on core \(p\) in the \(k\)-th sub-frame of frame \(f\) at level \(\ell \) | |
Design optimization—Sect. 6 | ||
\(\Vert barriers\Vert _3\) | 3rd norm of \(barriers\) for all \(f\in \mathcal {F}, \ell \in \{1,\ldots ,L\}\) | Computed for each candidate \(\mathcal {M}_{\tau }\) (Eq. 6) |
\(T_0\) | Initial temperature | Parameter of SA algorithm |
\(a\) | Temperature decreasing factor | Parameter of SA algorithm |
\(T_{final}\) | Final temperature | Parameter of SA algorithm |
\(time_{max}\) | Time budget | Parameter of SA algorithm |
Rights and permissions
About this article
Cite this article
Giannopoulou, G., Stoimenov, N., Huang, P. et al. Mixed-criticality scheduling on cluster-based manycores with shared communication and storage resources. Real-Time Syst 52, 399–449 (2016). https://doi.org/10.1007/s11241-015-9227-y
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11241-015-9227-y