DOI: 10.1109/DSN.2014.18
Article

Failure Analysis of Virtual and Physical Machines: Patterns, Causes and Characteristics

Published: 23 June 2014

Abstract

In today's commercial data centers, computation density grows continuously as the number of hardware components and of workloads, measured in units of virtual machines, increases. The service availability guaranteed by data centers depends heavily on the reliability of the physical and virtual servers. In this study, we analyze 10K virtual and physical machines hosted in five commercial data centers over an observation period of one year. Our objective is to establish a sound understanding of the differences and similarities between failures of physical and virtual machines. We first capture their failure patterns, i.e., the failure rates, the distributions of times between failures and of repair times, as well as the time and space dependency of failures. Moreover, we correlate failures with resource capacity and run-time usage to identify the characteristics of failing servers. Finally, we discuss how virtual machine management actions, i.e., consolidation and on/off frequency, impact virtual machine failures.



    Published In

    DSN '14: Proceedings of the 2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks
    June 2014
    801 pages
    ISBN: 9781479922338

    Publisher

    IEEE Computer Society

    United States

    Author Tags

    • Datacenters
    • VM failures
    • Failure root causes

    Qualifiers

    • Article


    Cited By

    • (2023) Advanced Machine Learning for Runtime Data Generation. Proceedings of the 12th Latin-American Symposium on Dependable and Secure Computing, pp. 182-187. DOI: 10.1145/3615366.3622793. Online publication date: 16-Oct-2023
    • (2023) Resilient Baseband Processing in Virtualized RANs with Slingshot. Proceedings of the ACM SIGCOMM 2023 Conference, pp. 654-667. DOI: 10.1145/3603269.3604841. Online publication date: 10-Sep-2023
    • (2023) Predicting GPU Failures With High Precision Under Deep Learning Workloads. Proceedings of the 16th ACM International Conference on Systems and Storage, pp. 124-135. DOI: 10.1145/3579370.3594777. Online publication date: 5-Jun-2023
    • (2023) Partial Network Partitioning. ACM Transactions on Computer Systems 41:1-4, pp. 1-34. DOI: 10.1145/3576192. Online publication date: 18-Dec-2023
    • (2023) Enabling Resilience in Virtualized RANs with Atlas. Proceedings of the 29th Annual International Conference on Mobile Computing and Networking, pp. 1-15. DOI: 10.1145/3570361.3613276. Online publication date: 2-Oct-2023
    • (2021) LineFS. Proceedings of the ACM SIGOPS 28th Symposium on Operating Systems Principles, pp. 756-771. DOI: 10.1145/3477132.3483565. Online publication date: 26-Oct-2021
    • (2020) Assise. Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation, pp. 1011-1027. DOI: 10.5555/3488766.3488823. Online publication date: 4-Nov-2020
    • (2020) Toward a generic fault tolerance technique for partial network partitioning. Proceedings of the 14th USENIX Conference on Operating Systems Design and Implementation, pp. 351-368. DOI: 10.5555/3488766.3488786. Online publication date: 4-Nov-2020
    • (2018) An analysis of network-partitioning failures in cloud systems. Proceedings of the 13th USENIX conference on Operating Systems Design and Implementation, pp. 51-68. DOI: 10.5555/3291168.3291173. Online publication date: 8-Oct-2018
    • (2018) ECHO. Proceedings of the 24th Annual International Conference on Mobile Computing and Networking, pp. 163-178. DOI: 10.1145/3241539.3241564. Online publication date: 15-Oct-2018
