research-article

Applying Bayesian data analysis for causal inference about requirements quality: a controlled experiment

Published: 22 November 2024

Abstract

It is commonly accepted that the quality of requirements specifications impacts subsequent software engineering activities. However, we still lack empirical evidence to support organizations in deciding whether their requirements are good enough or impede subsequent activities. We aim to contribute empirical evidence on the effect that requirements quality defects have on a software engineering activity that depends on the affected requirements. We conduct a controlled experiment in which 25 participants from industry and university generate domain models from four natural language requirements containing different quality defects. We evaluate the resulting models using both frequentist and Bayesian data analysis. Contrary to our expectations, our results show that the use of passive voice has only a minor impact on the resulting domain models. The use of ambiguous pronouns, however, shows a strong effect on various properties of the resulting domain models. Most notably, ambiguous pronouns lead to incorrect associations in domain models. Although literature and frequentist methods advise against both defects equally, the Bayesian data analysis shows that the two investigated quality defects have vastly different impacts on software engineering activities and, hence, deserve different levels of attention. Researchers can reuse our method to strengthen the body of reliable, detailed empirical evidence on requirements quality.

Published In

Empirical Software Engineering, Volume 30, Issue 1
Feb 2025
1462 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 22 November 2024
Accepted: 24 October 2024

Author Tags

  1. Requirements engineering
  2. Requirements quality
  3. Experiment
  4. Replication
  5. Bayesian data analysis

Qualifiers

  • Research-article

Funding Sources

  • Blekinge Institute of Technology
