DOI: 10.1145/3510454.3517068
research-article

Topology of the documentation landscape

Published: 19 October 2022

Abstract

Every software system (ideally) comes with one or more forms of documentation. Besides source code comments, other structured and unstructured sources (e.g., design documents, API references, wikis, usage examples, tutorials) constitute critical assets. Cloud-based repositories for collaborative development (e.g., GitHub, Bitbucket, GitLab) provide many functionalities to create, persist, and version documentation artifacts. At the same time, the last decade has seen the rise of rich instant messaging clients used as global software community platforms (e.g., Slack, Discord). Although completely detached from any specific versioning system or development workflow, they allow developers to discuss implementation issues, report bugs, and, in general, interact with one another.
We refer to this evolving, heterogeneous collection of information sources and documentation artifacts as the documentation landscape. Tools are needed to extract information from these sources and integrate it into a topological visualization that eases comprehension of a software system. How can we automatically generate this topology? How can we link elements in the topology back to the source code they refer to?
The goal of this PhD research is to automatically mine the documentation landscape of a system, disclosing pieces of information that aid, for example, program maintenance tasks. We present our classification of possible documentation sources. The long-term vision is to provide a domain model of the documentation landscape, to build, visualize, and explore its instances for real software systems, and to evaluate the usefulness of the metaphor we propose.


Cited By

  • (2024) An exploratory study of software artifacts on GitHub from the lens of documentation. Information and Software Technology 169:C. DOI: 10.1016/j.infsof.2024.107425. Online publication date: 2-Jul-2024.
  • (2024) Richen: Automated enrichment of Git documentation with usage examples and scenarios. Journal of Software: Evolution and Process. DOI: 10.1002/smr.2662. Online publication date: 13-Mar-2024.
  • (2023) Towards identifying and minimizing customer-facing documentation debt. 2023 ACM/IEEE International Conference on Technical Debt (TechDebt), 72-81. DOI: 10.1109/TechDebt59074.2023.00015. Online publication date: May-2023.

Published In

ICSE '22: Proceedings of the ACM/IEEE 44th International Conference on Software Engineering: Companion Proceedings
May 2022
394 pages
ISBN:9781450392235
DOI:10.1145/3510454

Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. communication platforms
  2. software documentation
  3. visualization

Qualifiers

  • Research-article

Conference

ICSE '22

Acceptance Rates

Overall Acceptance Rate 276 of 1,856 submissions, 15%
