DOI: 10.1145/3210459.3210473
short-paper

An Inception Architecture-Based Model for Improving Code Readability Classification

Published: 28 June 2018

Abstract

Code readability classification is the task of classifying a piece of source code as Readable or Unreadable. To build accurate classification models, existing studies handcraft features from different aspects that intuitively seem to correlate with code readability, and then explore various machine learning algorithms based on the newly proposed features. In contrast, our work opens up a new way to tackle the problem by using deep learning. Specifically, we propose IncepCRM, a novel model based on the Inception architecture that learns multi-scale features automatically from source code with little manual intervention. We use information about the human annotators as an auxiliary input for training IncepCRM and empirically evaluate its performance on three publicly available datasets. The results show that: 1) annotator information is beneficial for model performance, as confirmed by robust statistical tests (i.e., the Brunner-Munzel test and Cliff's delta); 2) IncepCRM achieves higher accuracy than previously reported models across all datasets. These findings confirm the feasibility and effectiveness of deep learning for code readability classification.
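
As a rough illustration of the effect-size measure named in the abstract, Cliff's delta for two samples can be computed as the proportion of cross-sample pairs in which the first value is larger, minus the proportion in which it is smaller. The sketch below is a minimal standard-library version, not the authors' evaluation code. The function name `cliffs_delta` and the two score lists are hypothetical, and the numbers are made up purely to show the calculation.

```python
from itertools import product


def cliffs_delta(xs, ys):
    """Return Cliff's delta: P(x > y) - P(x < y) over all (x, y) pairs."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    # Count pairs where the first sample dominates, and where it is dominated.
    greater = sum(1 for x, y in product(xs, ys) if x > y)
    less = sum(1 for x, y in product(xs, ys) if x < y)
    # Ties contribute to neither count, so the result lies in [-1, 1].
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical accuracy scores (illustrative values only, not from the paper).
with_annotator_info = [0.84, 0.86, 0.85, 0.88, 0.87]
without_annotator_info = [0.80, 0.83, 0.82, 0.85, 0.81]

delta = cliffs_delta(with_annotator_info, without_annotator_info)
print(f"Cliff's delta = {delta:.2f}")
```

A value near 1 means the first sample almost always exceeds the second, a value near -1 means the reverse, and 0 means neither sample dominates. This version compares every pair, so it runs in O(n·m) time, which is adequate for small samples.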

Published In

EASE '18: Proceedings of the 22nd International Conference on Evaluation and Assessment in Software Engineering 2018
June 2018
223 pages
ISBN:9781450364034
DOI:10.1145/3210459
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • The University of Canterbury

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Code Readability Classification
  2. Deep Learning
  3. Empirical Software Engineering
  4. Inception Architecture

Qualifiers

  • Short-paper
  • Research
  • Refereed limited

Conference

EASE'18

Acceptance Rates

Overall Acceptance Rate 71 of 232 submissions, 31%

Article Metrics

  • Downloads (Last 12 months): 26
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 13 Dec 2024

Cited By

  • (2024) Exploring the Impact of Vocabulary Techniques on Code Completion: A Comparative Approach. International Journal of Software Engineering and Knowledge Engineering 34:05 (705-727). DOI: 10.1142/S0218194023500687. Online publication date: 13-Jan-2024
  • (2024) Classification of Jengkol (Archidendron Pauciflorum) Varieties using Deep Learning. 2024 20th IEEE International Colloquium on Signal Processing & Its Applications (CSPA) (137-142). DOI: 10.1109/CSPA60979.2024.10525630. Online publication date: 1-Mar-2024
  • (2024) Classification of Ficus Carica Variants using Transfer Learning. 2024 20th IEEE International Colloquium on Signal Processing & Its Applications (CSPA) (143-148). DOI: 10.1109/CSPA60979.2024.10525305. Online publication date: 1-Mar-2024
  • (2024) An eye tracking study assessing source code readability rules for program comprehension. Empirical Software Engineering 29:6. DOI: 10.1007/s10664-024-10532-x. Online publication date: 5-Oct-2024
  • (2023) Classification of Sweet Potato Leaf Variants using Transfer Learning. 2023 9th International Conference on Wireless and Telematics (ICWT) (1-6). DOI: 10.1109/ICWT58823.2023.10335271. Online publication date: 6-Jul-2023
  • (2023) Raw Coffee Bean Classification for Roasting Suitability Assessment Using Transfer Learning. 2023 IEEE 11th Conference on Systems, Process & Control (ICSPC) (1-6). DOI: 10.1109/ICSPC59664.2023.10419990. Online publication date: 16-Dec-2023
  • (2023) Classification of Lettuce Leaf Variants Using Transfer Learning. 2023 3rd International Conference on Electronic and Electrical Engineering and Intelligent System (ICE3IS) (349-353). DOI: 10.1109/ICE3IS59323.2023.10335452. Online publication date: 9-Aug-2023
  • (2023) Identification of Sukun (Artocarpus altilis) and Kluwih (Artocarpus camansi) Leaves using Transfer Learning. 2023 IEEE 9th International Conference on Computing, Engineering and Design (ICCED) (1-6). DOI: 10.1109/ICCED60214.2023.10425317. Online publication date: 7-Nov-2023
  • (2023) Leaf Classification of Jackfruit (Artocarpus heterophyllus) and Cempedak (Radermachera integra) Using Deep Learning. 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT) (1-6). DOI: 10.1109/ICCCNT56998.2023.10308171. Online publication date: 6-Jul-2023
  • (2023) Boosting source code suggestion with self-supervised Transformer Gated Highway. Journal of Systems and Software 196:C. DOI: 10.1016/j.jss.2022.111553. Online publication date: 1-Feb-2023
