DOI: 10.1145/2566486.2568036
research-article

Chaff from the wheat: characterization and modeling of deleted questions on Stack Overflow

Published: 07 April 2014

Abstract

Stack Overflow is the most popular community-based question answering (CQA) website for programmers, with 2.05M users, 5.1M questions and 9.4M answers. Stack Overflow has explicit, detailed guidelines on how to post questions and an ebullient moderation community. Despite these guidelines and safeguards, questions posted on Stack Overflow can be extremely off topic or very poor in quality. Such questions can be deleted from Stack Overflow at the discretion of experienced community members and moderators. We present the first study of deleted questions on Stack Overflow. We divide our study into two parts: (i) characterization of deleted questions over approximately five years (2008-2013) of data, and (ii) prediction of deletion at the time of question creation. Our characterization study reveals multiple insights into the question deletion phenomenon. We find that it takes substantial time for a question to be voted for deletion, but once it is, the community takes swift action. We also see that question authors delete their own questions to salvage reputation points. We notice some instances of accidental deletion of good-quality questions, but such questions are quickly voted back to be undeleted. We discover a pyramidal structure of question quality on Stack Overflow and find that deleted questions lie at the bottom (lowest quality) of the pyramid. We also build a predictive model to detect, at creation time, whether a question will be deleted. We experiment with 47 features (based on user profile, community-generated, question content and syntactic style) and report an accuracy of 66%. Our findings yield important suggestions for content quality maintenance on community-based question answering websites. To the best of our knowledge, this is the first large-scale study of poor-quality (deleted) questions on Stack Overflow.
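To make the prediction task concrete, the sketch below shows how features from the four families named in the abstract might be computed when a question is created, and how accuracy is scored. It is an illustration only: the `Question` fields, the six feature names, and the `<code>` marker check are assumptions made for this example, not the paper's 47 features or its classifier.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Question:
    # All fields are hypothetical stand-ins for data known at creation time.
    title: str
    body: str
    author_reputation: int      # user-profile signal
    author_prior_deleted: int   # community-generated signal


def extract_features(q: Question) -> Dict[str, float]:
    """Compute a few illustrative features, one or two per family."""
    words = re.findall(r"[A-Za-z']+", q.body)
    letters = [c for c in q.body if c.isalpha()]
    lowercase_ratio = (
        sum(c.islower() for c in letters) / len(letters) if letters else 0.0
    )
    return {
        # User profile
        "author_reputation": float(q.author_reputation),
        # Community generated
        "author_prior_deleted": float(q.author_prior_deleted),
        # Question content
        "body_word_count": float(len(words)),
        "has_code_block": 1.0 if "<code>" in q.body else 0.0,
        # Syntactic style
        "title_ends_with_question_mark": (
            1.0 if q.title.rstrip().endswith("?") else 0.0
        ),
        "lowercase_ratio": lowercase_ratio,
    }


def accuracy(predicted: List[int], actual: List[int]) -> float:
    """Fraction of questions whose predicted label matches the true label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("label lists must be non-empty and equal in length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

A feature dictionary like this would be fed to whatever classifier is chosen; `accuracy` then compares its deleted/not-deleted predictions against the true labels, which is the kind of figure the abstract reports as 66%.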

Information & Contributors

Information

Published In

WWW '14: Proceedings of the 23rd international conference on World wide web
April 2014
926 pages
ISBN: 9781450327442
DOI: 10.1145/2566486

Sponsors

  • IW3C2: International World Wide Web Conference Committee

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. question quality
  2. question-answering
  3. stack overflow

Qualifiers

  • Research-article

Conference

WWW '14
Sponsor:
  • IW3C2

Acceptance Rates

WWW '14 Paper Acceptance Rate 84 of 645 submissions, 13%;
Overall Acceptance Rate 1,899 of 8,196 submissions, 23%

Article Metrics

  • Downloads (last 12 months): 20
  • Downloads (last 6 weeks): 3
Reflects downloads up to 13 Dec 2024

Cited By

  • (2024) Semantic Web Approaches in Stack Overflow. International Journal on Semantic Web and Information Systems 20:1 (1-61). DOI: 10.4018/IJSWIS.358617. Online publication date: 9-Nov-2024
  • (2024) How to Analyze and Enhance Participation in Electronic Networks of Practice. Foundations of Management 16:1 (103-126). DOI: 10.2478/fman-2024-0007. Online publication date: 2-Aug-2024
  • (2024) What Does a Downvote Do? Performing Complementary and Competing Knowledge Practices on an Online Platform. Proceedings of the ACM on Human-Computer Interaction 8:CSCW1 (1-28). DOI: 10.1145/3653692. Online publication date: 26-Apr-2024
  • (2024) On the Helpfulness of Answering Developer Questions on Discord with Similar Conversations and Posts from the Past. Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (1-13). DOI: 10.1145/3597503.3623341. Online publication date: 20-May-2024
  • (2024) An Empirical Study of Unanswered Python-Related Questions on Stack Overflow. 2024 International Conference on Information Technology Research and Innovation (ICITRI) (230-235). DOI: 10.1109/ICITRI62858.2024.10699159. Online publication date: 5-Sep-2024
  • (2023) Understanding the Role of Images on Stack Overflow. 2023 IEEE/ACM 20th International Conference on Mining Software Repositories (MSR) (377-388). DOI: 10.1109/MSR59073.2023.00059. Online publication date: May-2023
  • (2022) Generating High Quality Titles in StackOverflow via Data Denoising Method. 2022 IEEE 13th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP) (1-6). DOI: 10.1109/PAAP56126.2022.10010656. Online publication date: 25-Nov-2022
  • (2022) Ask It Right! Identifying Low-Quality questions on Community Question Answering Services. 2022 International Joint Conference on Neural Networks (IJCNN) (1-8). DOI: 10.1109/IJCNN55064.2022.9892454. Online publication date: 18-Jul-2022
  • (2022) Analysis of community question‐answering issues via machine learning and deep learning. CAAI Transactions on Intelligence Technology 8:1 (95-117). DOI: 10.1049/cit2.12081. Online publication date: 4-May-2022
  • (2022) Code samples summarization for knowledge exchange in developer community. Software: Practice and Experience 53:2 (347-365). DOI: 10.1002/spe.3151. Online publication date: 20-Sep-2022
