User story clustering in agile development: a framework and an empirical study

Bo Yang^1,2,3,
Xiuyin Ma^3,
Chunhui Wang^4,
Haoran Guo^3,
Huai Liu^5 &
Zhi Jin^6,7


Abstract

Agile development aims at rapidly developing software while embracing the continuous evolution of user requirements throughout the development process. User stories are the primary means of requirements collection and elicitation in agile development. A project can involve a large number of user stories, which should be clustered into groups according to their functional similarity to support systematic requirements analysis, effective mapping to developed features, and efficient maintenance. Nevertheless, user story clustering is currently performed mainly by hand, which is time-consuming and subject to human bias. In this paper, we propose a novel approach for clustering user stories automatically on the basis of natural language processing. Specifically, the sentence patterns of each component of a user story are first analysed and determined, so that the critical structure of the representative tasks can be automatically extracted based on the user story meta-model. The similarity between user stories is then calculated and used to generate a connected graph, which serves as the basis of automatic user story clustering. We evaluate the approach on thirteen datasets against ten baseline techniques. Experimental results show that our clustering approach achieves higher accuracy, recall and F1-score than these baselines. The results demonstrate that the proposed approach can significantly improve the efficacy of user story clustering and thus enhance the overall performance of agile development. The study also highlights promising research directions for more accurate requirements elicitation.
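The abstract outlines a pipeline: extract the components of each story, compute pairwise similarity, link similar stories in a graph, and derive clusters from that graph. The Python sketch below illustrates this general idea under simplifying assumptions that are not taken from the paper. It assumes stories follow the common "As a <role>, I want <means>, so that <ends>" template, replaces the paper's NLP-based task extraction with a regular expression and a small stopword list, scores similarity as a weighted token Jaccard over the role and means parts, and treats each connected component of the thresholded similarity graph as one cluster. All names (`parse_story`, `similarity`, `cluster_stories`), the weights and the 0.4 threshold are hypothetical.

```python
import re
from itertools import combinations

# Hypothetical pattern for the "As a <role>, I want <means>, so that <ends>" template;
# the paper's own sentence patterns are richer than this single regex.
STORY_PATTERN = re.compile(
    r"^As an? (?P<role>.+?), I want(?: to)? (?P<means>.+?)(?:,? so that (?P<ends>.+?))?\.?$",
    re.IGNORECASE,
)

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "to", "of", "my", "i", "be", "can", "and", "for", "in", "on"}


def parse_story(text):
    """Split a story into role / means / ends; return None if it does not fit the template."""
    match = STORY_PATTERN.match(text.strip())
    if match is None:
        return None
    return {key: (value or "") for key, value in match.groupdict().items()}


def tokens(phrase):
    """Lower-cased content words of a phrase (a crude stand-in for NLP task extraction)."""
    return {w for w in re.findall(r"[a-z]+", phrase.lower()) if w not in STOPWORDS}


def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similarity(s1, s2, w_role=0.3, w_means=0.7):
    """Weighted similarity of two parsed stories; the weights are illustrative assumptions."""
    return (w_role * jaccard(tokens(s1["role"]), tokens(s2["role"]))
            + w_means * jaccard(tokens(s1["means"]), tokens(s2["means"])))


def cluster_stories(texts, threshold=0.4):
    """Link stories whose similarity is >= threshold; return the graph's connected components."""
    stories = [parse_story(t) for t in texts]
    n = len(texts)
    adjacency = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        # Stories that fail to parse get no edges and end up as singleton clusters.
        if stories[i] and stories[j] and similarity(stories[i], stories[j]) >= threshold:
            adjacency[i].add(j)
            adjacency[j].add(i)
    # Iterative depth-first search: each connected component becomes one cluster.
    seen, clusters = set(), []
    for start in range(n):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return clusters


if __name__ == "__main__":
    example = [
        "As a customer, I want to search products by name, so that I find items quickly.",
        "As a customer, I want to search products by category, so that I can browse.",
        "As an admin, I want to export sales reports, so that I can audit revenue.",
        "As an admin, I want to export user reports.",
    ]
    print(cluster_stories(example))  # [[0, 1], [2, 3]]
```

With these four toy stories, the two customer search stories and the two admin export stories fall into separate components. A single global threshold is the simplest way to turn pairwise similarities into a graph; the similarity measure and graph construction used in the paper may differ.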



Acknowledgements

We thank the anonymous reviewers for their thoughtful comments. This work was sponsored by the National Natural Science Foundation of China (Grant Nos. 62192731, 62192730, 62162051), the Australian Research Council Discovery Project (DP210102447), and the Fundamental Research Funds for the Central Universities (BLX202003).

Author information

Authors and Affiliations

School of Information Science and Technology, Beijing Forestry University, Beijing, 100083, China
Bo Yang
Engineering Research Center for Forestry Oriented Intelligent Information Processing, National Forestry and Grassland Administration, Beijing, 100083, China
Bo Yang
School of Information Science and Technology, North China University of Technology, Beijing, 100144, China
Bo Yang, Xiuyin Ma & Haoran Guo
College of Computer Science and Technology, Inner Mongolia Normal University, Hohhot, 010020, China
Chunhui Wang
Department of Computer Science and Software Engineering, Swinburne University of Technology, Hawthorn, VIC, 3122, Australia
Huai Liu
Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, Beijing, 100871, China
Zhi Jin
Institute of Software, School of Computer Science, Peking University, Beijing, 100871, China
Zhi Jin


Corresponding authors

Correspondence to Bo Yang or Zhi Jin.

Additional information

Bo Yang received the PhD degree in computer software and theory from Beihang University, China. He is an associate professor at the School of Information Science and Technology, Beijing Forestry University, China. His research interests include deep learning, software testing, software fault localization, and software requirements analysis. He is a member of CCF.

Xiuyin Ma received the BEng degree from the North China University of Technology, China. Her research interests include software requirements analysis and software testing.

Chunhui Wang received her PhD degree in computer science from the School of Electronics Engineering and Computer Science, Peking University, China, in 2020. She is currently an associate professor at the School of Computer Science, Inner Mongolia Normal University, China. Her research interests include requirements engineering and collective intelligence based software engineering. She is a member of CCF.

Haoran Guo received the BEng degree from the North China University of Technology, China. His research interests include software fault localization and software testing.

Huai Liu received the PhD degree in software engineering from the Swinburne University of Technology, Australia. He is a senior lecturer in the Department of Computing Technologies, Swinburne University of Technology, Australia. He has previously worked as a lecturer at Victoria University and as a research fellow at RMIT University, Australia. His current research interests include software testing, cloud computing, and end-user software engineering.

Zhi Jin received her BSc from Zhejiang University, China, in 1984 and her PhD from the National University of Defense Technology, China, in 1992. She is a professor in the School of Computer Science, Peking University (PKU), China, and has served as Deputy Director of the Key Laboratory of High Confidence Software Technologies (PKU), Ministry of Education, China, since 2009. Her research interests include requirements engineering, knowledge engineering, and knowledge-based software engineering.

Electronic supplementary material