DOI: 10.5555/3291168.3291171
Article

REPT: reverse debugging of failures in deployed software

Published: 08 October 2018

Abstract

Debugging software failures in deployed systems is important because they impact real users and customers. However, debugging such failures is notoriously hard in practice because developers have to rely on limited information such as memory dumps. The execution history is usually unavailable because high-fidelity program tracing is not affordable in deployed systems.
In this paper, we present REPT, a practical system that enables reverse debugging of software failures in deployed systems. REPT reconstructs the execution history with high fidelity by combining online lightweight hardware tracing of a program's control flow with offline binary analysis that recovers its data flow. It is seemingly impossible to recover data values thousands of instructions before the failure due to information loss and concurrent execution. REPT tackles these challenges by constructing a partial execution order based on timestamps logged by hardware and iteratively performing forward and backward execution with error correction.
We design and implement REPT, deploy it on Microsoft Windows, and integrate it into WinDbg. We evaluate REPT on 16 real-world bugs and show that it can recover data values accurately (92% on average) and efficiently (in less than 20 seconds) for these bugs. We also show that it enables effective reverse debugging for 14 bugs.
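
As a rough illustration of the idea of combining a recorded control-flow trace with a final dump, the following toy sketch recovers register values by alternating backward and forward passes until nothing new is learned. It is not REPT's algorithm: the three-instruction register machine, the names `forward`, `backward`, `merge`, and `recover`, and the example trace are all assumptions made for illustration, and the sketch ignores memory, concurrency, timestamps, and the error correction the abstract describes.

```python
MASK = 0xFFFFFFFF  # assumption: model 32-bit registers

def forward(ins, pre):
    """Values after `ins`, given (possibly partial) values before it."""
    op, dst, src = ins
    post = dict(pre)
    if op == "mov_imm":          # dst = constant
        post[dst] = src
    elif op == "mov":            # dst = src
        post[dst] = pre.get(src)
    elif op == "add":            # dst = dst + src
        a, b = pre.get(dst), pre.get(src)
        post[dst] = None if a is None or b is None else (a + b) & MASK
    return post

def backward(ins, post):
    """Values before `ins`, given (possibly partial) values after it."""
    op, dst, src = ins
    pre = dict(post)
    if op == "mov" and src == dst:
        return pre               # no-op, nothing was lost
    pre[dst] = None              # the old value of dst was overwritten
    if op == "mov":
        # After `mov dst, src` both registers hold the old value of src.
        if pre[src] is None:
            pre[src] = post.get(dst)
    elif op == "add" and src != dst:
        a, b = post.get(dst), post.get(src)
        if a is not None and b is not None:
            pre[dst] = (a - b) & MASK   # addition is reversible
    return pre

def merge(known, inferred):
    """Fill unknown (None) entries of `known`; report whether anything changed.
    Conflicting values are ignored here; a real system would have to resolve them."""
    changed = False
    for reg, val in inferred.items():
        if known.get(reg) is None and val is not None:
            known[reg] = val
            changed = True
    return changed

def recover(trace, dump):
    """states[i] holds register values just before trace[i]; states[-1] is the dump."""
    n = len(trace)
    states = [dict.fromkeys(dump) for _ in range(n)] + [dict(dump)]
    changed = True
    while changed:               # iterate until a fixed point is reached
        changed = False
        for i in reversed(range(n)):
            changed |= merge(states[i], backward(trace[i], states[i + 1]))
        for i in range(n):
            changed |= merge(states[i + 1], forward(trace[i], states[i]))
    return states

# Hypothetical trace (control flow) and final register dump.
TRACE = [
    ("mov_imm", "rax", 5),
    ("mov", "rbx", "rax"),
    ("add", "rbx", "rcx"),
    ("mov_imm", "rax", 0),       # destroys the earlier value of rax
]
DUMP = {"rax": 0, "rbx": 12, "rcx": 7}

states = recover(TRACE, DUMP)
print(states[3]["rax"])  # 5
```

In this example the backward pass alone cannot tell what `rax` held before the final `mov_imm` overwrote it. The value only becomes known once the backward pass has recovered `rax` at an earlier point and the forward pass has carried it ahead, which is why the two directions are iterated together.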




Published In

OSDI'18: Proceedings of the 13th USENIX conference on Operating Systems Design and Implementation
October 2018
815 pages
ISBN: 9781931971478

Sponsors

  • NetApp
  • Google Inc.
  • NSF
  • Microsoft
  • Facebook

Publisher

USENIX Association

United States

Cited By

  • (2021) Hindsight logging for model training. Proceedings of the VLDB Endowment 14:4 (682-693). DOI: 10.14778/3436905.3436925. Online publication date: 22-Feb-2021
  • (2021) Postmortem accurate IR-level state recovery for deployed concurrent programs. ACM SIGAPP Applied Computing Review 21:3 (33-48). DOI: 10.1145/3493499.3493502. Online publication date: 20-Oct-2021
  • (2021) Ripple. Proceedings of the 48th Annual International Symposium on Computer Architecture (734-747). DOI: 10.1109/ISCA52012.2021.00063. Online publication date: 14-Jun-2021
  • (2019) DEEPVSA. Proceedings of the 28th USENIX Conference on Security Symposium (1787-1804). DOI: 10.5555/3361338.3361462. Online publication date: 14-Aug-2019
  • (2019) You can't debug what you can't see. Proceedings of the Workshop on Hot Topics in Operating Systems (163-169). DOI: 10.1145/3317550.3321428. Online publication date: 13-May-2019
