Theoretical Validation and Hardware Implementation of Dynamic Adaptive Scheduling for Heterogeneous Systems on Chip
Figure 1. An example of the relationship of the (a) execution time and (b) energy-delay product (EDP) between simple low-overhead (lookup table, or LUT) and sophisticated high-overhead schedulers.
Figure 2. Flowchart describing the construction of an Oracle to dynamically choose the best-performing scheduler at runtime.
Figure 3. ETF scheduling overhead and fitted quadratic curve.
Figure 4. A comparison of the average execution time (a–c) and energy-delay product (EDP) (d–f) between DAS, the lookup table (LUT), earliest task first (ETF), and ETF-ideal for three different workloads.
Figure 5. A comparison of the average job slowdown of DAS, LUT, and ETF for twenty-five workloads.
Figure 6. A comparison of the average execution times of DAS, LUT, ETF, ETF-ideal, and the heuristic approach.
Figure 7. Decisions made by the DAS framework represented as bar plots and total scheduling energy overheads of LUT, ETF, and DAS as line plots.
Figure 8. A comparison of the average execution time (a–c) and energy-delay product (EDP) (d–f) of DAS, LUT, and ETF on a hardware platform for three different workloads.
Figure A1. Possible regions that cover the comparison between DAS and the fast and slow schedulers.
Figure A2. Experimental results showing the average execution times for different $p_{sf}/p_{ss}$ ratios. The results of the fast and slow schedulers are indicated by straight lines, and the DAS preselection classifier model used in Section 4 by a red star. The shaded region is where the expected trained-model results should lie.
Figure A3. The distribution of the number and type of application instances in the 40 workloads used for the evaluation of the DAS framework in the DSSoC simulator.
Figure A4. The distribution of the number and type of application instances in the 15 workloads used for the evaluation of the DAS framework in the runtime framework.
Abstract
1. Introduction
- Theoretical proof of the DAS framework and its experimental validation using a DSSoC simulator;
- Integration of the DAS framework with an open-source runtime environment and its training and deployment on a Xilinx Zynq ZCU102 SoC;
- Extensive performance evaluation in the trade space of the execution time, energy, and scheduling overhead over the Xilinx Zynq ZCU102 SoC based on workload scenarios composed of real-life applications.
2. Related Works
3. Dynamic Adaptive Scheduling Framework
3.1. Overview and Preliminaries
3.2. DAS Preselection Classifier
- Offline Classifier Design: The first step in the design process of the preselection classifier is to generate the training data based on the domain applications known at design time. Each scenario in the training data consists of concurrent applications and their respective data rates. For example, a combination of WiFi transmitter and receiver chains at a specific upload and download speed could be one such scenario. To this end, we run each scenario twice on an instrumented hardware platform or a simulator (see Figure 2).
- First Execution: The instrumentation enables us to run both the fast and slow schedulers each time a task-scheduling decision is made. We note that the scheduler is invoked whenever a task is to be scheduled. If the decisions of the fast and slow schedulers for a task are identical, then we label that task with F (i.e., the fast scheduler) and store a snapshot of the performance counters. The label F implies that the fast scheduler can reach the same scheduling decision as the sophisticated algorithm under the system states captured by the performance counters. If the decisions of the schedulers differ, the label is left as pending and the execution continues by following the decision of the fast scheduler, as described in Figure 2. At the end of the execution, the training data contain a mix of labeled (F) and pending decisions.
- Second Execution: The same scenario is executed a second time, now always following the decisions of the slow scheduler. At the end of the execution, we analyze the target metric, such as the average execution time or the total energy consumption. If the slow scheduler achieves a better result, the pending labels are replaced with S to indicate that the slow scheduler is preferred despite its larger overhead. Otherwise, we conclude that the lower overhead of the fast scheduler pays off, and the pending labels are replaced with F (this two-pass flow is sketched in the code example after this list). An alternative approach would be to evaluate each pending instance individually; however, this would not offer a scalable solution, since scheduling is a sequential decision-making problem and a decision at any given time affects the remaining execution.
- Online Use of the Classifier: The final step is deployment at runtime (last block in Figure 2). At runtime, a background process periodically updates a pre-allocated local memory with the subset of performance counters that the classifier requires for scheduler selection. After each update, the classifier determines whether the fast (F) or slow (S) scheduler should be used for the next available task. When a new ready task becomes available, the features are already loaded and we already know which scheduler is the better choice for that task. Therefore, the DAS framework does not incur any extra delay on the critical path. Moreover, it has a negligible energy overhead, as demonstrated in Section 4, which is critical for the performance and applicability of such an algorithm.
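The two-pass labeling flow above can be summarized in a short sketch. This is a minimal illustration, assuming a `simulate` helper that re-runs a scenario under a chosen scheduler and reports, for every scheduling decision, both schedulers' choices plus a performance-counter snapshot, along with the whole-run value of the target metric; all names here are our assumptions, not part of the DAS implementation.

```python
def label_scenario(scenario, metric="avg_exec_time"):
    """Sketch of DAS training-data generation (Figure 2).

    Pass 1 follows the fast scheduler but evaluates both schedulers at
    every decision point; pass 2 follows the slow scheduler so the two
    whole-run metrics can be compared.
    """
    samples = []

    # Pass 1: label agreements immediately, mark disagreements as pending.
    decisions, metric_fast = simulate(scenario, follow="fast", metric=metric)
    for fast_choice, slow_choice, counters in decisions:
        samples.append((counters, "F" if fast_choice == slow_choice else None))

    # Pass 2: if the slow scheduler improves the target metric despite its
    # overhead, every pending sample is labeled "S"; otherwise "F".
    _, metric_slow = simulate(scenario, follow="slow", metric=metric)
    pending = "S" if metric_slow < metric_fast else "F"

    return [(counters, label or pending) for counters, label in samples]
```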
3.3. Fast and Slow (F and S) Scheduler Selection
Algorithm 1 ETF Scheduler
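The body of the listing did not survive extraction. As a stand-in, here is a minimal sketch of the ETF heuristic [38]: among all ready tasks, it commits the (task, PE) pair with the globally earliest start time and repeats. The `PE` and `Task` structures and the `earliest_start` cost model are our illustrative assumptions, not the paper's implementation.

```python
from dataclasses import dataclass

@dataclass
class PE:
    name: str
    free_at: float = 0.0            # time at which this PE becomes idle

@dataclass
class Task:
    name: str
    exec_time: dict                 # PE name -> execution time on that PE
    data_ready: float = 0.0         # when all predecessor data has arrived

def earliest_start(task: Task, pe: PE) -> float:
    # A task can start once its inputs have arrived and the PE is idle.
    return max(pe.free_at, task.data_ready)

def etf_schedule(ready: list[Task], pes: list[PE]):
    """Earliest task first: repeatedly commit the (task, PE) pair with the
    globally minimum earliest start time. The doubly nested search over
    ready tasks and PEs is the source of ETF's quadratic overhead
    (cf. Figure 3)."""
    schedule, ready = [], list(ready)
    while ready:
        start, task, pe = min(
            ((earliest_start(t, p), t, p)
             for t in ready for p in pes if p.name in t.exec_time),
            key=lambda cand: cand[0])
        pe.free_at = start + task.exec_time[pe.name]    # commit the pair
        schedule.append((task.name, pe.name, start))
        ready.remove(task)
    return schedule
```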
4. Evaluation of DAS Using Simulations
4.1. Simulation Setup
- Domain Applications: The DAS framework is evaluated using five real-world streaming applications: range detection, temporal mitigation, WiFi transmitter (WiFi-TX), WiFi receiver (WiFi-RX), and a proprietary industrial application (App-1), as summarized in Table 2. We then construct 40 different workloads for the runtime analysis of the schedulers used in this paper. All workloads are run in streaming mode, and for each data point in the figures in Section 4.3, approximately 10,000 tasks are scheduled. Details of the workload mixes are given in Appendix B.
- Emulation Environment: Conducting a realistic runtime overhead and energy analysis is one of our key goals in this study. For this purpose, we leverage a Compiler-integrated, Extensible DSSoC Runtime (CEDR) framework introduced by Mack et al. [5]. The CEDR has been validated extensively on x86 and Arm-based platforms. It enables pre-silicon performance evaluations of heterogeneous hardware configurations composed of mixtures of CPU cores and accelerators based on dynamically arriving workload scenarios. Compared to other emulation frameworks (e.g., ZeBu [39] and Veloce [40]), this portable and open-source environment offers distinct plug-and-play integration points, where developers can individually integrate and evaluate their applications, scheduling heuristics, and accelerator IPs in a realistic system.
- Simulation Environment: We use DS3 [43], an open-source domain-specific system-on-chip simulation framework, to perform detailed evaluations of our proposed scheduling approach. DS3 is a high-level simulation tool that includes built-in scheduling algorithms and models for PEs, interconnect, and memory systems. The framework has been validated on the Xilinx Zynq ZCU102 and Odroid-XU3 (with a Samsung Exynos 5422 SoC) platforms, making it a robust and reliable tool for a detailed exploration and evaluation of our proposed scheduling approach. DS3 supports the execution of streaming applications represented as directed flow graphs (DFGs) and allows multiple modes for injecting new applications at runtime, including fixed-interval and exponential distribution-based job injection. The inputs to the tool are the execution profiles of the processing elements in the SoC, the application DFGs, and the interconnect configuration. At the end of the simulation, DS3 provides various workload statistics, including the injection rate, execution time, throughput, and PE utilization. We use these metrics in our extensive evaluation to demonstrate the benefits of the DAS scheduling methodology.
- DSSoC Configuration: We construct a DSSoC configuration that comprises clusters of general-purpose cores and hardware accelerators. The hardware accelerators include fixed-function designs and a multi-function systolic array processor (SAP). The application domains used in this study are wireless communications and radar systems; hence, accelerators that expedite the execution of relevant tasks are included.
- Performance Metrics: We use the average execution time per application instance as our main performance metric. It is computed by summing the execution times of all application instances and dividing by the number of instances. Another metric used for performance comparison is the average job slowdown, calculated as the average execution time of a scheduler divided by the average execution time of the idealized version of the ETF scheduler; it shows how far each scheduler's performance is from an ideal scenario. Furthermore, we use the energy-delay product (EDP) metric to analyze the effects on energy consumption. The EDP is calculated as the product of the total execution time (makespan) and the energy consumption. These metrics are written out explicitly after this list.
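For a workload of $N$ application instances, with instance $j$ injected at time $t_j^{\text{inj}}$ and finishing at time $t_j^{\text{fin}}$, the three metrics reduce to the following (the symbols are our shorthand, not the paper's notation):

$$
\text{Avg. execution time} = \frac{1}{N}\sum_{j=1}^{N}\left(t_j^{\text{fin}} - t_j^{\text{inj}}\right),
\qquad
\text{Slowdown} = \frac{\overline{T}_{\text{scheduler}}}{\overline{T}_{\text{ETF-ideal}}},
\qquad
\text{EDP} = T_{\text{makespan}} \times E_{\text{total}},
$$

where $\overline{T}$ denotes the average execution time under the named scheduler, $T_{\text{makespan}}$ the total execution time of the workload, and $E_{\text{total}}$ its total energy consumption.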
4.2. Exploration of Machine Learning Techniques and Feature Space for DAS
- Machine Learning Technique Exploration: The DT classifiers achieved similar or higher accuracies compared to the LR classifiers with lower storage overheads. While a DT with a depth of 16, which uses all the features, achieved the best classification accuracy, there was a significant impact on the storage overhead, which, in turn, affected the latency and energy consumption of the classifier. In comparison, DTs with tree depths of 2 and 4 had negligible storage overheads with competitive accuracies (>85%). Hence, we adopted the DT classifier with a depth of 2 for the DAS framework.
- Feature Space Exploration: We collected 62 performance counters in our training data. Selecting a subset of these counters as the DAS classifier features is crucial for minimizing the energy and performance overheads. We performed a systematic feature space exploration using feature selection and importance methods, such as analysis of variance (ANOVA) F-values and Chi-squared statistics [37]. Among the top six features, growing the feature list from a single feature (the input data rate) to two features, by adding the earliest availability time of the Arm big cluster, increased the accuracy from 63.66% to 85.48%. We used an 8-entry × 16-bit shift register to track the data rate at runtime. Therefore, we selected the two most important features, the data rate and the earliest availability time of the Arm big cluster, to design the DAS classifier model with a decision tree of depth 2; a minimal training sketch follows this list. We note that these two features contain no task-related information, which allows DAS to handle diverse task scenarios without incurring additional overheads, since the classifier is not on the critical path.
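A minimal sketch of training such a classifier, assuming the labeled snapshots from Section 3.2 have been reduced to the two selected features; the synthetic data below merely stands in for real counter traces.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: column 0 = input data rate, column 1 = earliest
# availability time of the Arm big cluster. Real training data would
# come from the labeled performance-counter snapshots of Section 3.2.
rng = np.random.default_rng(0)
X = rng.random((10_000, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0.9, "S", "F")   # synthetic labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)

clf = DecisionTreeClassifier(max_depth=2)   # depth 2 keeps storage ~0.01 KB
clf.fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2%}")
```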
4.3. Performance Analysis for Different Workloads
4.4. Scheduling Overhead and Energy Consumption Analysis
5. Evaluation of DAS Using FPGA Emulation
5.1. Experimental Setup
- DSSoC Configuration: The domain applications presented in Section 4.1 frequently perform FFT and matrix multiplication operations. To this end, we constructed a hardware platform comprising hardware accelerators for FFT and matrix multiplication. Additionally, we included three general-purpose cores to execute the other tasks. The full-system hardware that integrated the cores and accelerators was implemented on a Xilinx Zynq UltraScale+ ZCU102 FPGA [42].
- Runtime Environment: This study utilized the CEDR runtime framework [5] to implement DAS and evaluate its benefits for a DSSoC. CEDR allows users to compile and execute applications on heterogeneous SoC architectures. Furthermore, CEDR launches the execution of workloads comprising a combination of applications, each streaming with user-specified injection intervals. It offers a suite of schedulers and allows users to plug and play custom scheduling algorithms, making it a highly suitable environment for evaluating DAS. Therefore, we integrated DAS into CEDR and executed the workloads on customized domain-specific hardware. Furthermore, we implemented the LUT in software using inline assembly code and filled the task-PE assignments using the profiling information of domain applications on the target hardware.
- Training Setup: We utilized four real-world applications from the telecommunication and radar domains (WiFi-TX, Temporal Mitigation, Range Detection, and Pulse Doppler) to generate the training data for the DAS preselection classifier. Fifteen different workloads were generated from these four applications by varying the number of jobs and their injection intervals to represent a variety of data rates. For example, one workload has 80 Temporal Mitigation and 20 WiFi-TX instances, whereas another has 100 Range Detection instances. Details of the workload combinations are provided in Appendix C. Each workload was run in streaming mode, and each of the twelve data points was repeated for one hundred trials on the FPGA to mitigate runtime variations caused by the operating system and memory. Consequently, each data point in Figure 8 represents approximately 150,000 scheduled tasks for a given scheduler (100 trials × 100 jobs per trial × an average of 15 tasks per job). We used the same subset of performance counters described in Section 4.2 to train the DAS preselection classifier model on the hardware platform. The model employed a decision tree classifier with a maximum depth of 2; the choice of decision trees as the machine learning technique and of the tree depth was discussed in Section 4.2, and a sketch of the deployed classifier follows this list. The model achieved a classification accuracy of 82.02% when choosing between the slow and fast schedulers at runtime. The accuracy on the hardware platform was lower than that observed with the system-level simulator (85.48%) due to runtime variations from the operating system and memory.
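Once trained, a depth-2 tree collapses to at most two comparisons per invocation. The deployed classifier could look like the sketch below; the split features and thresholds are hypothetical placeholders, not the values actually learned on the hardware platform.

```python
# Hypothetical learned thresholds -- placeholders, not the trained values.
DATA_RATE_SPLIT = 9.4        # Mbps
BIG_AVAIL_SPLIT_LO = 120.0   # microseconds
BIG_AVAIL_SPLIT_HI = 310.0   # microseconds

def select_scheduler(data_rate: float, big_cluster_avail: float) -> str:
    """Depth-2 decision tree: two comparisons per call, so scheduler
    selection adds negligible latency and energy at runtime."""
    if data_rate <= DATA_RATE_SPLIT:
        return "F" if big_cluster_avail <= BIG_AVAIL_SPLIT_LO else "S"
    return "S" if big_cluster_avail <= BIG_AVAIL_SPLIT_HI else "F"
```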
5.2. Performance Results
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A. Theoretical Proof of the DAS Framework
Appendix A.1. Necessary Conditions for the Superiority of DAS
| Notation | Definition |
|---|---|
| $D_I(i)$ | Ideal scheduler decision for task $i$ |
| $D_{\text{DAS}}(i)$ | Decision of the DAS scheduler for task $i$ |
| $X \mid Y$ | Selecting scheduler $X$ when the ideal selection is $Y$ |
| $\Delta_i^{X \mid Y}$ | Execution time difference for task $i$ with respect to the fast scheduler if selecting scheduler $X$ when the ideal selection is $Y$ |
| $\Delta_{\text{total}}$ | Total execution time difference for all tasks |
| $T_n$ | Execution time for the $n$th simulation |
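Using this notation, the quantity the proof reasons about can be written as the total execution-time difference accumulated over all $N$ scheduled tasks, measured relative to always using the fast scheduler (our rendering of the definitions above):

$$
\Delta_{\text{total}} = \sum_{i=1}^{N} \Delta_i^{\,D_{\text{DAS}}(i)\,\mid\,D_I(i)},
$$

so DAS outperforms the standalone fast scheduler whenever $\Delta_{\text{total}} < 0$, i.e., whenever the time gained on tasks for which the slow scheduler is correctly selected outweighs the time lost to mispredictions.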
Appendix A.2. Experimental Validation of the Proof
Appendix A.2.1. Finding the Empirical Values for ΔL and ΔG
Algorithm A1 Algorithm to find the ideal decisions, $D_I(i)$, and the $\Delta$ values
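The listing body was lost in extraction. Under the notation above, a plausible minimal sketch of what such an algorithm computes is given below; `run_with` is an assumed helper that measures a task's contribution to the execution time under a fixed scheduler, and the whole sketch is our reconstruction rather than the paper's listing.

```python
def find_ideal_decisions(tasks):
    """Sketch of Algorithm A1: derive the ideal per-task decision D_I(i)
    and the empirical execution-time differences (Delta values) by
    measuring each task under both schedulers."""
    ideal, delta = {}, {}
    for i, task in enumerate(tasks):
        t_fast = run_with(task, scheduler="fast")   # assumed helper
        t_slow = run_with(task, scheduler="slow")
        ideal[i] = "F" if t_fast <= t_slow else "S"
        delta[i] = t_slow - t_fast    # difference w.r.t. the fast scheduler
    return ideal, delta
```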
Appendix A.2.2. Validating the DAS Framework Superiority
Appendix B. DSSoC Simulator Workload Mixes
Appendix C. Runtime Framework Workload Mixes
References
- Hennessy, J.L.; Patterson, D.A. A New Golden Age for Computer Architecture. Commun. ACM 2019, 62, 48–60. [Google Scholar] [CrossRef]
- Green, D.; Hamilton, B.A. Heterogeneous Integration at DARPA: Pathfinding and Progress in Assembly Approaches. In Proceedings of the 68th IEEE Electronic Components and Technology Conference, San Diego, CA, USA, 29 May–1 June 2018. [Google Scholar]
- RF Convergence: From the Signals to the Computer by Dr. Tom Rondeau (Microsystems Technology Office, DARPA). Available online: https://futurenetworks.ieee.org/images/files/pdf/FirstResponder/Tom-Rondeau-DARPA.pdf (accessed on 19 March 2023).
- Moazzemi, K.; Maity, B.; Yi, S.; Rahmani, A.M.; Dutt, N. HESSLE-FREE: Heterogeneous Systems Leveraging Fuzzy Control for Runtime Resource Management. ACM Trans. Embed. Comput. Syst. (TECS) 2019, 18, 1–19. [Google Scholar] [CrossRef]
- Mack, J.; Hassan, S.; Kumbhare, N.; Gonzalez, M.C.; Akoglu, A. CEDR-A Compiler-integrated, Extensible DSSoC Runtime. ACM Trans. Embed. Comput. Syst. (TECS) 2022, 22, 1–34. [Google Scholar] [CrossRef]
- Magarshack, P.; Paulin, P.G. System-on-chip Beyond the Nanometer Wall. In Proceedings of the Design Automation Conference, Anaheim, CA, USA, 2–6 June 2003; pp. 419–424. [Google Scholar]
- Choi, Y.K.; Cong, J.; Fang, Z.; Hao, Y.; Reinman, G.; Wei, P. In-depth Analysis on Microarchitectures of Modern Heterogeneous CPU-FPGA Platforms. ACM Trans. Reconfig. Technol. Syst. 2019, 12, 1–20. [Google Scholar] [CrossRef]
- Krishnakumar, A.; Ogras, U.; Marculescu, R.; Kishinevsky, M.; Mudge, T. Domain-Specific Architectures: Research Problems and Promising Approaches. ACM Trans. Embed. Comput. Syst. 2023, 22, 1–26. [Google Scholar] [CrossRef]
- Krishnakumar, A.; Arda, S.E.; Goksoy, A.A.; Mandal, S.K.; Ogras, U.Y.; Sartor, A.L.; Marculescu, R. Runtime Task Scheduling using Imitation Learning for Heterogeneous Many-core Systems. IEEE Trans. CAD Integr. Circuits Syst. 2020, 39, 4064–4077. [Google Scholar] [CrossRef]
- Pabla, C.S. Completely Fair Scheduler. Linux J. 2009, 184, 1–4. [Google Scholar]
- Beisel, T.; Wiersema, T.; Plessl, C.; Brinkmann, A. Cooperative Multitasking for Heterogeneous Accelerators in the Linux Completely Fair Scheduler. In Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors, Santa Monica, CA, USA, 11–14 September 2011; pp. 223–226. [Google Scholar]
- Goksoy, A.A.; Krishnakumar, A.; Hassan, M.S.; Farcas, A.J.; Akoglu, A.; Marculescu, R.; Ogras, U.Y. DAS: Dynamic adaptive scheduling for energy-efficient heterogeneous SoCs. IEEE Embed. Syst. Lett. 2021, 14, 51–54. [Google Scholar] [CrossRef]
- Topcuoglu, H.; Hariri, S.; Wu, M.Y. Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing. IEEE Trans. Parallel Distrib. Syst. 2002, 13, 260–274. [Google Scholar] [CrossRef]
- Bittencourt, L.F.; Sakellariou, R.; Madeira, E.R. DAG Scheduling Using a Lookahead Variant of the Heterogeneous Earliest Finish Time Algorithm. In Proceedings of the 18th Euromicro Conference on Parallel, Distributed and Network-Based Processing, Pisa, Italy, 17–19 February 2010; pp. 27–34. [Google Scholar]
- Vasile, M.A.; Pop, F.; Tutueanu, R.I.; Cristea, V.; Kołodziej, J. Resource-Aware Hybrid Scheduling Algorithm in Heterogeneous Distributed Computing. Future Gener. Comput. Syst. 2015, 51, 61–71. [Google Scholar] [CrossRef]
- Yang, H.; Ha, S. ILP based data parallel multi-task mapping/scheduling technique for MPSoC. In Proceedings of the 2008 International SoC Design Conference, Busan, Republic of Korea, 24–25 November 2008; Volume 1, pp. 1–134. [Google Scholar]
- Benini, L.; Bertozzi, D.; Milano, M. Resource Management Policy Handling Multiple Use-Cases in MpSoC Platforms using Constraint Programming. In Proceedings of the Logic Programming: 24th International Conference, ICLP 2008, Udine, Italy, 9–13 December 2008; Springer: Berlin/Heidelberg, Germany, 2008; pp. 470–484. [Google Scholar]
- Yoo, A.B.; Jette, M.A.; Grondona, M. Slurm: Simple Linux Utility for Resource Management. In Job Scheduling Strategies for Parallel Processing, Proceedings of the 9th International Workshop, JSSPP 2003, Seattle, WA, USA, 24 June 2003; Springer: Cham, Switzerland, 2003; pp. 44–60. [Google Scholar]
- Thain, D.; Tannenbaum, T.; Livny, M. Distributed computing in practice: The Condor experience. Concurr. Comput. Pract. Exp. 2005, 17, 323–356. [Google Scholar] [CrossRef]
- Chronaki, K.; Rico, A.; Casas, M.; Moretó, M.; Badia, R.M.; Ayguadé, E.; Labarta, J.; Valero, M. Task Scheduling Techniques for Asymmetric Multi-core Systems. IEEE Trans. Parallel Distrib. Syst. 2016, 28, 2074–2087. [Google Scholar] [CrossRef]
- Zhou, J. Real-time Task Scheduling and Network Device Security for Complex Embedded Systems based on Deep Learning Networks. Microprocess. Microsyst. 2020, 79, 103282. [Google Scholar] [CrossRef]
- Namazi, A.; Safari, S.; Mohammadi, S. CMV: Clustered Majority Voting Reliability-aware Task Scheduling for Multicore Real-time Systems. IEEE Trans. Reliab. 2018, 68, 187–200. [Google Scholar] [CrossRef]
- Xie, G.; Zeng, G.; Liu, L.; Li, R.; Li, K. Mixed Real-Time Scheduling of Multiple DAGs-based Applications on Heterogeneous Multi-core Processors. Microprocess. Microsyst. 2016, 47, 93–103. [Google Scholar] [CrossRef]
- Xiaoyong, T.; Li, K.; Zeng, Z.; Veeravalli, B. A Novel Security-Driven Scheduling Algorithm for Precedence-Constrained Tasks in Heterogeneous Distributed Systems. IEEE Trans. Comput. 2010, 60, 1017–1029. [Google Scholar] [CrossRef]
- Kwok, Y.K.; Ahmad, I. Dynamic Critical-Path Scheduling: An Effective Technique for Allocating Task Graphs to Multiprocessors. IEEE Trans. Parallel Distrib. Syst. 1996, 7, 506–521. [Google Scholar] [CrossRef]
- Sakellariou, R.; Zhao, H. A Hybrid Heuristic for DAG Scheduling on Heterogeneous Systems. In Proceedings of the 18th International Parallel and Distributed Processing Symposium, Santa Fe, NM, USA, 26–30 April 2004; p. 111. [Google Scholar]
- Prodromou, A.; Venkat, A.; Tullsen, D.M. Agon: A Scalable Competitive Scheduler for Large Heterogeneous Systems. arXiv 2021, arXiv:2109.00665. [Google Scholar]
- Jejurikar, R.; Gupta, R. Energy-aware Task Scheduling with Task Synchronization for Embedded Real-time Systems. IEEE Trans. CAD Integr. Circuits Syst. 2006, 25, 1024–1037. [Google Scholar] [CrossRef]
- Azad, P.; Navimipour, N.J. An Energy-aware Task Scheduling in the Cloud Computing using a Hybrid Cultural and Ant Colony Optimization Algorithm. Int. J. Cloud Appl. Comput. 2017, 7, 20–40. [Google Scholar] [CrossRef]
- Baskiyar, S.; Abdel-Kader, R. Energy Aware DAG Scheduling on Heterogeneous Systems. Clust. Comput. 2010, 13, 373–383. [Google Scholar] [CrossRef]
- Swaminathan, V.; Chakrabarty, K. Real-Time Task Scheduling for Energy-Aware Embedded Systems. J. Frankl. Inst. 2001, 338, 729–750. [Google Scholar] [CrossRef]
- Tomoutzoglou, O.; Mbakoyiannis, D.; Kornaros, G.; Coppola, M. Efficient Job Offloading in Heterogeneous Systems through Hardware-Assisted Packet-Based Dispatching and User-Level Runtime Infrastructure. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2019, 39, 1017–1030. [Google Scholar] [CrossRef]
- Mbakoyiannis, D.; Tomoutzoglou, O.; Kornaros, G. Energy-performance considerations for data offloading to FPGA-based accelerators over PCIe. ACM Trans. Archit. Code Optim. (TACO) 2018, 15, 1–24. [Google Scholar] [CrossRef]
- Streit, A. A Self-tuning Job Scheduler Family with Dynamic Policy Switching. In Job Scheduling Strategies for Parallel Processing, Proceedings of the 8th International Workshop, JSSPP 2002, Edinburgh, Scotland, UK, 24 July 2002; Springer: Cham, Switzerland, 2002; pp. 1–23. [Google Scholar]
- Daoud, M.I.; Kharma, N. A Hybrid Heuristic–genetic Algorithm for Task Scheduling in Heterogeneous Processor Networks. J. Parallel Distrib. Comput. 2011, 71, 1518–1531. [Google Scholar] [CrossRef]
- Boeres, C.; Lima, A.; Rebello, V.E. Hybrid Task Scheduling: Integrating Static and Dynamic Heuristics. In Proceedings of the 15th Symposium on Computer Architecture and High Performance Computing, Sao Paulo, Brazil, 12 November 2003; pp. 199–206. [Google Scholar]
- McHugh, M.L. The Chi-square Test of Independence. Biochem. Med. 2013, 23, 143–149. [Google Scholar] [CrossRef]
- Hwang, J.J.; Chow, Y.C.; Anger, F.D.; Lee, C.Y. Scheduling Precedence Graphs in Systems with Interprocessor Communication Times. SIAM J. Comput. 1989, 18, 244–257. [Google Scholar] [CrossRef]
- ZeBu Server 4. Available online: https://www.synopsys.com/verification/emulation/zebu-server.html (accessed on 2 January 2020).
- Veloce2 Emulator. Available online: https://www.mentor.com/products/fv/emulation-systems/veloce (accessed on 2 January 2020).
- Mack, J.; Kumbhare, N.; NK, A.; Ogras, U.Y.; Akoglu, A. User-Space Emulation Framework for Domain-Specific SoC Design. In Proceedings of the IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), New Orleans, LA, USA, 18–22 May 2020; pp. 44–53. [Google Scholar]
- Zynq ZCU102 Evaluation Kit. Available online: https://www.xilinx.com/products/boards-and-kits/ek-u1-zcu102-g.html (accessed on 10 March 2023).
- Arda, S.E.; Krishnakumar, A.; Goksoy, A.A.; Kumbhare, N.; Mack, J.; Sartor, A.L.; Akoglu, A.; Marculescu, R.; Ogras, U.Y. DS3: A System-Level Domain-Specific System-on-Chip Simulation Framework. IEEE Trans. Comput. 2020, 69, 1248–1262. [Google Scholar] [CrossRef]
- Xilinx-Accurate Power Measurement. Available online: https://www.xilinx.com/developer/articles/accurate-design-power-measurement.html (accessed on 11 February 2023).
- Sysfs Interface in Linux. Available online: https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface (accessed on 11 February 2023).
Type | Features |
---|---|
Task | Task ID, Execution time, Power consumption, Depth of task in DFG, Application ID, Predecessor task ID and cluster IDs, Application type |
Processing Element (PE) | Earliest time when PE is ready to execute, Earliest availability time of each cluster, PE utilization, Communication cost |
System | Input data rate |
Application | Number of Tasks | Supported Clusters |
---|---|---|
Range Detection | 7 | big, LITTLE, FFT, SAP |
Temporal Mitigation | 10 | big, LITTLE, FIR, SAP |
WiFi-TX | 27 | big, LITTLE, FFT, SAP |
WiFi-RX | 34 | big, LITTLE, FFT, FEC, FIR, SAP |
App-1 | 10 | LITTLE, FIR, SAP |
Processing Cluster | No. of Cores | Functionality |
---|---|---|
LITTLE | 4 | General purpose |
big | 4 | General purpose |
FFT | 4 | Acceleration of FFT |
FEC | 1 | Acceleration of encoding and decoding operations |
FIR | 4 | Acceleration of FIR |
SAP | 2 | Multi-function acceleration |
TOTAL | 19 |
Classifier | Tree Depth | Number of Features | Classification Accuracy (%) | Storage (KB) |
---|---|---|---|---|
LR | - | 2 | 79.23 | 0.01 |
LR | - | 62 | 83.1 | 0.24 |
DT | 2 | 1 | 63.66 | 0.01 |
DT | 2 | 2 | 85.48 | 0.01 |
DT | 3 | 6 | 85.51 | 0.03 |
DT | 2 | 62 | 85.9 | 0.01 |
DT | 16 | 62 | 91.65 | 256 |