psc2code: Denoising Code Extraction from Programming Screencasts

Published: 01 June 2020

Abstract

Programming screencasts have become a pervasive resource on the Internet, helping developers learn new programming technologies and skills. The source code in programming screencasts is important and valuable information for developers. But the streaming nature of programming screencasts (i.e., a sequence of screen-captured images) limits the ways that developers can interact with the source code in the screencasts. Many studies use Optical Character Recognition (OCR) to convert screen images (also referred to as video frames) into textual content, which can then be indexed and searched easily. However, noise in screen images significantly degrades the quality of the source code extracted by OCR. Sources of noise include non-code frames (e.g., PowerPoint slides, web pages of API specifications), non-code regions (e.g., Package Explorer view, Console view), and noisy code regions that contain completion suggestion popups. Furthermore, due to the characteristics of code (e.g., long compound identifiers like ItemListener), even professional OCR tools cannot extract source code from screen images without errors. The noisy OCRed source code negatively affects downstream applications, such as effective search and navigation of the source code content in programming screencasts.
In this article, we propose an approach named psc2code to denoise the process of extracting source code from programming screencasts. First, psc2code uses Convolutional Neural Network (CNN)-based image classification to remove non-code and noisy-code frames. Then, psc2code performs edge detection and clustering-based image segmentation to detect sub-windows in a code frame and, based on the detected sub-windows, identifies and crops the screen region that is most likely to be a code editor. Finally, psc2code calls the API of a professional OCR tool to extract source code from the cropped code regions, and then corrects errors in the OCRed source code using two sources of information: the OCRed cross-frame information in the programming screencast and a statistical language model of a large corpus of source code.
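As a rough illustration of the final correction step, the sketch below replaces an OCRed token with the most frequent similar token seen in other frames and breaks ties with corpus frequencies. It is not the psc2code implementation: the function names, the 0.8 similarity threshold, and the use of plain word counts in place of a statistical language model are all assumptions made for this example.

```python
from collections import Counter
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two tokens as OCR variants of one word if they are nearly identical.

    The threshold is an assumed value chosen for illustration.
    """
    return SequenceMatcher(None, a, b).ratio() >= threshold


def correct_token(token, cross_frame_tokens, corpus_counts):
    """Pick a replacement for `token` among similar tokens from other frames.

    Candidates are ranked by how often they occur across frames; the
    (hypothetical) corpus frequency table is used only to break ties.
    """
    candidates = Counter(t for t in cross_frame_tokens if similar(token, t))
    if not candidates:
        return token  # nothing similar was seen, so keep the OCR output
    return max(candidates, key=lambda t: (candidates[t], corpus_counts.get(t, 0)))


# The same identifier OCRed in several frames; one frame confuses "I" with "l".
frames = ["ItemListener", "ItemListener", "ltemListener", "ActionEvent"]
fixed = correct_token("ltemListener", frames, corpus_counts={})
```

Here the misread `ltemListener` is outvoted by the two frames that read `ItemListener`, while the unrelated `ActionEvent` is ignored because it falls below the similarity threshold.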
We conduct an experiment on 1,142 programming screencasts from YouTube. We find that our CNN-based image classification technique can effectively remove the non-code and noisy-code frames, achieving an F1-score of 0.95 on the valid code frames. We also find that psc2code can significantly improve the quality of the OCRed source code by correctly fixing about half of the incorrectly OCRed words. Based on the source code denoised by psc2code, we implement two applications: (1) a programming screencast search engine and (2) an interaction-enhanced programming screencast watching tool. Based on the source code extracted from the 1,142 collected programming screencasts, our experiments show that our programming screencast search engine achieves precision@5, precision@10, and precision@20 of 0.93, 0.81, and 0.63, respectively. We also conduct a user study of our interaction-enhanced programming screencast watching tool with 10 participants. This user study shows that our interaction-enhanced watching tool can help participants learn the knowledge in the programming video more efficiently and effectively.
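For readers unfamiliar with the two metrics reported above, the following minimal sketch shows how precision@k and the F1-score are conventionally computed. The function names and the sample relevance judgments are hypothetical and are not taken from the article's evaluation data.

```python
def precision_at_k(relevance, k):
    """Fraction of the top-k ranked results that were judged relevant.

    `relevance` is a ranked list of booleans, best result first.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(relevance[:k]) / k


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical relevance judgments for one query's top five results.
judged = [True, True, False, True, False]
p_at_2 = precision_at_k(judged, 2)
p_at_5 = precision_at_k(judged, 5)
```

Because precision@k divides by k, it tends to fall as k grows once the relevant results near the top are exhausted, which matches the downward trend from precision@5 to precision@20 reported above.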

    Information & Contributors

    Information

    Published In

    ACM Transactions on Software Engineering and Methodology, Volume 29, Issue 3
    July 2020
    292 pages
    ISSN:1049-331X
    EISSN:1557-7392
    DOI:10.1145/3403667
    • Editor: Mauro Pezzè
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 June 2020
    Online AM: 07 May 2020
    Accepted: 01 April 2020
    Revised: 01 February 2020
    Received: 01 January 2019
    Published in TOSEM Volume 29, Issue 3

    Author Tags

    1. Programming videos
    2. code search
    3. deep learning

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • NSFC
    • Australian Research Council’s Discovery Early Career Researcher Award (DECRA)
    • ANU-Data61 Collaborative Research
    • National Key Research and Development Program of China
