short-paper

An Inception Architecture-Based Model for Improving Code Readability Classification

Authors:

Solomon Mensah,

Xiupei Mei

EASE '18: Proceedings of the 22nd International Conference on Evaluation and Assessment in Software Engineering 2018

Pages 139 - 144

https://doi.org/10.1145/3210459.3210473

Published: 28 June 2018

Abstract

Code Readability Classification is the task of classifying a piece of source code as Readable or Unreadable. To build accurate classification models, existing studies focus on handcrafting features that intuitively seem to correlate with code readability, and then exploring various machine learning algorithms based on the newly proposed features. In contrast, our work opens up a new way to tackle the problem by using deep learning. Specifically, we propose IncepCRM, a novel model based on the Inception architecture that can learn multi-scale features automatically from source code with little manual intervention. We use information about the human annotators as an auxiliary input for training IncepCRM and empirically verify the performance of IncepCRM on three publicly available datasets. The results show that: 1) annotator information is beneficial for model performance, as confirmed by robust statistical tests (i.e., the Brunner-Munzel test and Cliff's delta); 2) IncepCRM achieves higher accuracy than previously reported models across all datasets. The findings of our study confirm the feasibility and effectiveness of deep learning for code readability classification.
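
The abstract names Cliff's delta as one of the robust statistics used to check whether annotator information helps. As a rough illustration only, the sketch below computes Cliff's delta with its standard definition (the share of pairs where a value from the first sample exceeds one from the second, minus the share where it is smaller). The function name `cliffs_delta` and the accuracy values are hypothetical and are not taken from the paper; the authors' actual evaluation procedure is not described on this page.

```python
from itertools import product


def cliffs_delta(xs, ys):
    """Return Cliff's delta for two samples: P(x > y) - P(x < y)."""
    if not xs or not ys:
        raise ValueError("both samples must be non-empty")
    # Compare every value in xs against every value in ys.
    greater = sum(1 for x, y in product(xs, ys) if x > y)
    less = sum(1 for x, y in product(xs, ys) if x < y)
    return (greater - less) / (len(xs) * len(ys))


# Hypothetical accuracy scores from repeated runs (illustrative only,
# NOT results reported in the paper).
with_annotators = [0.84, 0.86, 0.85, 0.88, 0.87]
without_annotators = [0.80, 0.83, 0.82, 0.85, 0.81]

delta = cliffs_delta(with_annotators, without_annotators)
print(f"Cliff's delta = {delta:.2f}")
```

The result ranges from -1 to 1; values near 1 mean the first sample almost always dominates the second, and values near 0 mean the two samples overlap heavily. This pairwise version is quadratic in the sample sizes, which is acceptable for the small number of repeated runs typical of such comparisons.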

Index Terms

An Inception Architecture-Based Model for Improving Code Readability Classification
1. Computing methodologies
  1. Machine learning
    1. Learning paradigms
      1. Supervised learning
        Supervised learning by classification
    2. Machine learning approaches
      1. Neural networks
2. Software and its engineering
  1. Software creation and management
    1. Software post-development issues
      1. Maintaining software
  2. Software organization and properties
    1. Extra-functional properties
      1. Software reliability

Information & Contributors

Information

Published In

EASE '18: Proceedings of the 22nd International Conference on Evaluation and Assessment in Software Engineering 2018

June 2018

223 pages

ISBN:9781450364034

DOI:10.1145/3210459

General Chair:
Austen Rainer,
Program Chairs:
Stephen G. MacDonell,
Jacky Keung

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

The University of Canterbury

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Short-paper
Research
Refereed limited

Conference

EASE'18

EASE'18: 22nd International Conference on Evaluation and Assessment in Software Engineering 2018

June 28 - 29, 2018

Christchurch, New Zealand

Acceptance Rates

Overall Acceptance Rate 71 of 232 submissions, 31%

Bibliometrics & Citations

Bibliometrics

Total Citations: 11
Total Downloads: 178
Downloads (Last 12 months): 26
Downloads (Last 6 weeks): 1

Reflects downloads up to 13 Dec 2024

Citations

Cited By

Hussain Y, Huang Z, Zhou Y, Khan I (2024). Exploring the Impact of Vocabulary Techniques on Code Completion: A Comparative Approach. International Journal of Software Engineering and Knowledge Engineering 34:05 (705-727). Online publication date: 13-Jan-2024.
https://doi.org/10.1142/S0218194023500687
Pratondo A, Nashrullah A, Ibrahim F, Wirabudi A (2024). Classification of Jengkol (Archidendron Pauciflorum) Varieties using Deep Learning. 2024 20th IEEE International Colloquium on Signal Processing & Its Applications (CSPA) (137-142). Online publication date: 1-Mar-2024.
https://doi.org/10.1109/CSPA60979.2024.10525630
Pratondo A, Pasha M, Zhafran M, Firdaus M, Syah A (2024). Classification of Ficus Carica Variants using Transfer Learning. 2024 20th IEEE International Colloquium on Signal Processing & Its Applications (CSPA) (143-148). Online publication date: 1-Mar-2024.
https://doi.org/10.1109/CSPA60979.2024.10525305
Park K, Johnson J, Peterson C, Yedla N, Baysinger I, Aponte J, Sharif B (2024). An eye tracking study assessing source code readability rules for program comprehension. Empirical Software Engineering 29:6. Online publication date: 5-Oct-2024.
https://doi.org/10.1007/s10664-024-10532-x
Pratondo A, Ismail Sujana A, Susanti F, Roedavan R, Budianto A (2023). Classification of Sweet Potato Leaf Variants using Transfer Learning. 2023 9th International Conference on Wireless and Telematics (ICWT) (1-6). Online publication date: 6-Jul-2023.
https://doi.org/10.1109/ICWT58823.2023.10335271
Pratondo A, Zani T, Novianty A, Pudjoatmodjo B (2023). Raw Coffee Bean Classification for Roasting Suitability Assessment Using Transfer Learning. 2023 IEEE 11th Conference on Systems, Process & Control (ICSPC) (1-6). Online publication date: 16-Dec-2023.
https://doi.org/10.1109/ICSPC59664.2023.10419990
Pratondo A, Novianty A, Fauzi H (2023). Classification of Lettuce Leaf Variants Using Transfer Learning. 2023 3rd International Conference on Electronic and Electrical Engineering and Intelligent System (ICE3IS) (349-353). Online publication date: 9-Aug-2023.
https://doi.org/10.1109/ICE3IS59323.2023.10335452
Pratondo A, Ismail N, Sujana A, Novianty A (2023). Identification of Sukun (Artocarpus altilis) and Kluwih (Artocarpus camansi) Leaves using Transfer Learning. 2023 IEEE 9th International Conference on Computing, Engineering and Design (ICCED) (1-6). Online publication date: 7-Nov-2023.
https://doi.org/10.1109/ICCED60214.2023.10425317
Pratondo A, Budiman G, Utoro R, Rizqyawan M, Rudawan R, Novianty A (2023). Leaf Classification of Jackfruit (Artocarpus heterophyllus) and Cempedak (Radermachera integra) Using Deep Learning. 2023 14th International Conference on Computing Communication and Networking Technologies (ICCCNT) (1-6). Online publication date: 6-Jul-2023.
https://doi.org/10.1109/ICCCNT56998.2023.10308171
Hussain Y, Huang Z, Zhou Y, Wang S (2023). Boosting source code suggestion with self-supervised Transformer Gated Highway. Journal of Systems and Software 196:C. Online publication date: 1-Feb-2023.
https://dl.acm.org/doi/10.1016/j.jss.2022.111553
