Abstract
Today's state-of-the-art signal processing systems often rely on heterogeneous computing and special-purpose accelerators to deliver highly efficient performance for mixed application workloads. These workloads include not only traditional signal processing algorithms but also smart applications built on data analytics and machine learning, as well as applications that interact with both the physical and cyber worlds via sensors and networks. As a result, the complexity of such systems has been increasing, and the focus of design has shifted toward exploring a design space that mixes processing cores, accelerators, and the interconnection networks between them, in order to optimize performance and efficiency at the system level. Traditional simulation tools may offer accurate performance estimation at the microarchitectural level, but combining simulators for various components to run complex applications is highly complicated, and these tools fall short in their ability to profile application workloads. Furthermore, such complex simulations run unacceptably slowly in a traditional system-level simulation framework such as SystemC. To solve this problem, we develop a rapid hybrid emulation/simulation framework that allows the user to execute full-blown system and application software; plug in emulators, simulators, and timing models for the various components of the prototype system; switch timing models dynamically with our just-in-time model selection mechanism; and connect the emulated/simulated components with scalable communication channels, so that the framework can be accelerated effectively on a multicore host. Our just-in-time model selection mechanism detects and skips regular program patterns, which reduces simulation time dramatically.
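The abstract does not spell out how just-in-time model selection recognizes a regular program pattern. As a rough illustration only, the Python sketch below assumes that a program region is identified by its sequence of basic-block addresses, and that a region counts as "regular" once the detailed model has returned the same cycle count for it `warmup` times in a row; from then on the cached estimate is reused and the detailed model is skipped. `JITModelSelector`, `Region`, and `warmup` are hypothetical names, not the framework's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a "region" is the tuple of basic-block start
# addresses executed in one iteration of a loop body.
Region = Tuple[int, ...]


@dataclass
class JITModelSelector:
    """Switch between a detailed timing model and a cached estimate per region."""

    detailed_model: Callable[[Region], int]  # slow but accurate cycle estimate
    warmup: int = 3  # identical consecutive estimates required before skipping
    detailed_calls: int = 0
    _history: Dict[Region, List[int]] = field(default_factory=dict, init=False, repr=False)

    def estimate(self, region: Region) -> int:
        history = self._history.setdefault(region, [])
        recent = history[-self.warmup:]
        if len(recent) == self.warmup and len(set(recent)) == 1:
            # Regular pattern detected: reuse the stable estimate, skip simulation.
            return recent[-1]
        cycles = self.detailed_model(region)
        self.detailed_calls += 1
        history.append(cycles)
        return cycles


if __name__ == "__main__":
    # Stand-in for a cycle-accurate model: 10 cycles per basic block.
    selector = JITModelSelector(lambda region: 10 * len(region), warmup=3)
    loop_body = (0x1000, 0x1010, 0x1020)
    total = sum(selector.estimate(loop_body) for _ in range(1000))
    print(total, selector.detailed_calls)  # 30000 3
```

Reusing a cached estimate is an approximation: in a real system the cost of a region also depends on microarchitectural state such as cache contents, so a practical selector would need some policy for re-validating its cached estimates.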
In addition, our framework can estimate the performance of different system configurations with multiple concurrent timing models, which further reduces the time needed to traverse the design space. Our experimental results show that, for cache simulation, dynamic model selection and the multi-model approach together speed up design space exploration by 13.4 times on a quad-core host.
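To illustrate why several timing models running side by side can save time, the Python sketch below feeds one memory-address trace to several candidate cache configurations in a single pass, so the trace is produced once rather than once per configuration. The LRU set-associative model, `CacheConfig`, and `simulate_all` are illustrative assumptions, not the framework's implementation; the framework runs its models concurrently on a multicore host, whereas this sketch loops over them sequentially for clarity.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class CacheConfig:
    size_bytes: int
    line_bytes: int
    ways: int


class LRUCache:
    """Set-associative cache with LRU replacement, counting hits and misses."""

    def __init__(self, cfg: CacheConfig) -> None:
        self.cfg = cfg
        self.num_sets = cfg.size_bytes // (cfg.line_bytes * cfg.ways)
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, addr: int) -> None:
        line = addr // self.cfg.line_bytes
        cache_set = self.sets[line % self.num_sets]
        tag = line // self.num_sets
        if tag in cache_set:
            cache_set.move_to_end(tag)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            if len(cache_set) >= self.cfg.ways:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[tag] = True


def simulate_all(trace: Iterable[int], configs: List[CacheConfig]) -> List[LRUCache]:
    """Drive every candidate configuration from a single pass over the trace."""
    models = [LRUCache(cfg) for cfg in configs]
    for addr in trace:
        for model in models:
            model.access(addr)
    return models


if __name__ == "__main__":
    # Two sequential sweeps over an 8 KiB array with a 4-byte stride.
    trace = [a for _ in range(2) for a in range(0, 8192, 4)]
    small, large = simulate_all(
        trace, [CacheConfig(4096, 32, 2), CacheConfig(16384, 32, 2)]
    )
    print(small.misses, large.misses)  # 512 256
```

Because each model only reads the shared trace and updates its own state, the per-configuration work is independent, which is the kind of parallelism a multicore host can exploit by assigning models to separate workers.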
Notes
We have shared this work as an open-source project at https://bitbucket.org/paslab/qemu_vpmu_opensource and https://github.com/snippits/qemu_vpmu.
In this case study, we explored cache designs for ARM-based smartphone systems by considering execution time, die area, and power consumption.
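The note does not say how the three metrics were traded off against each other. One common approach, shown here purely as an assumption and not as the method used in the case study, is to keep only the designs that no other design matches or beats on all three metrics (a Pareto filter). `DesignPoint`, the metric fields, and the numbers below are hypothetical placeholders, not data from the study.

```python
from typing import List, NamedTuple


class DesignPoint(NamedTuple):
    name: str
    exec_time_ms: float
    area_mm2: float
    power_mw: float


def metrics(p: DesignPoint) -> tuple:
    return (p.exec_time_ms, p.area_mm2, p.power_mw)


def dominates(a: DesignPoint, b: DesignPoint) -> bool:
    """True if a is no worse than b in every metric and strictly better in one."""
    pairs = list(zip(metrics(a), metrics(b)))
    return all(x <= y for x, y in pairs) and any(x < y for x, y in pairs)


def pareto_front(points: List[DesignPoint]) -> List[DesignPoint]:
    """Keep the design points that no other point dominates (lower is better)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Placeholder numbers for illustration only; not results from the case study.
    candidates = [
        DesignPoint("16KB-2way", 120.0, 0.5, 40.0),
        DesignPoint("32KB-2way", 105.0, 0.9, 60.0),
        DesignPoint("32KB-4way", 100.0, 0.9, 55.0),
        DesignPoint("64KB-4way", 98.0, 1.7, 90.0),
    ]
    print([p.name for p in pareto_front(candidates)])
    # ['16KB-2way', '32KB-4way', '64KB-4way']
```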
Acknowledgements
This work was financially supported by the Ministry of Science and Technology of Taiwan under Grant MOST 105-2622-8-002-002 and sponsored by MediaTek Inc., Hsin-chu, Taiwan. We especially thank our colleagues, Tsung-Han Chiang and Jen-Chieh Wu, for their proof-of-concept work on this project.
About this article
Cite this article
Yeh, CW., Tu, CH. & Hung, SH. Rapid Hybrid Simulation Methods for Exploring the Design Space of Signal Processors with Dynamic and Scalable Timing Models. J Sign Process Syst 91, 247–259 (2019). https://doi.org/10.1007/s11265-017-1285-z
DOI: https://doi.org/10.1007/s11265-017-1285-z