Research article · DOI: 10.1145/1687399.1687436

Operating system scheduling for efficient online self-test in robust systems

Published: 02 November 2009

Abstract

Very thorough online self-test is essential for overcoming major reliability challenges such as early-life failures and transistor aging in advanced technologies. This paper demonstrates the need for operating system (OS) support to efficiently orchestrate online self-test in future robust systems. Experimental data from an actual dual quad-core system demonstrate that, without software support, online self-test can significantly degrade performance of soft real-time and computation-intensive applications (by up to 190%), and can result in perceptible delays for interactive applications. To mitigate these problems, we develop OS scheduling techniques that are aware of online self-test, and schedule/migrate tasks in multi-core systems by taking into account the unavailability of one or more cores undergoing online self-test. These techniques eliminate any performance degradation and perceptible delays in soft real-time and interactive applications (otherwise introduced by online self-test), and significantly reduce the impact of online self-test on the performance of computation-intensive applications. Our techniques require minor modifications to existing OS schedulers, thereby enabling practical and efficient online self-test in real systems.
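
The test-aware scheduling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Core` class, the function names, and the least-loaded placement policy are all hypothetical choices made for illustration. The sketch only shows the general principle of treating a core under self-test as unavailable and migrating its queued tasks to the remaining cores.

```python
from dataclasses import dataclass, field


@dataclass
class Core:
    """Hypothetical model of one core and its run queue."""
    core_id: int
    under_test: bool = False
    run_queue: list = field(default_factory=list)


def least_loaded_available(cores):
    """Return the core not under test with the shortest run queue.

    Ties are broken by the lower core_id (an arbitrary assumption).
    """
    available = [c for c in cores if not c.under_test]
    if not available:
        raise RuntimeError("no core is available for scheduling")
    return min(available, key=lambda c: (len(c.run_queue), c.core_id))


def enqueue(cores, task):
    """Place a new task on an available core and return that core's id."""
    core = least_loaded_available(cores)
    core.run_queue.append(task)
    return core.core_id


def begin_self_test(cores, core_id):
    """Mark a core as under test and migrate its queued tasks elsewhere."""
    target = next(c for c in cores if c.core_id == core_id)
    # Refuse to start the test if no other core could absorb the tasks.
    if not any(c is not target and not c.under_test for c in cores):
        raise RuntimeError("cannot test the last available core")
    target.under_test = True
    pending, target.run_queue = target.run_queue, []
    for task in pending:
        least_loaded_available(cores).run_queue.append(task)


def end_self_test(cores, core_id):
    """Return a core to the pool of schedulable cores."""
    next(c for c in cores if c.core_id == core_id).under_test = False


if __name__ == "__main__":
    cores = [Core(0), Core(1), Core(2)]
    for name in ["a", "b", "c", "d"]:
        enqueue(cores, name)
    begin_self_test(cores, 0)   # tasks "a" and "d" leave core 0
    print([c.run_queue for c in cores])
    end_self_test(cores, 0)
```

A real scheduler would also account for task priorities, deadlines of soft real-time tasks, and cache affinity when choosing a migration target; the sketch deliberately ignores these.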

Published In

ICCAD '09: Proceedings of the 2009 International Conference on Computer-Aided Design
November 2009
803 pages
ISBN: 9781605588001
DOI: 10.1145/1687399

Publisher

Association for Computing Machinery

New York, NY, United States

Acceptance Rates

Overall Acceptance Rate 457 of 1,762 submissions, 26%
