DOI: 10.1145/3598579.3689378
research-article
Open access

Where's the Data? Finding and Reusing Datasets in Computing Education

Published: 24 September 2024

Abstract

Computing education research (CER) is a rapidly advancing discipline, offering vast potential for data-driven, secondary research or replication studies. Although gathering and analyzing data for research seem straightforward, making research data publicly available to the community remains a challenge. Likewise, finding and reusing high-quality, prominent, and well-documented research data proves to be a daunting task. In this working group paper, the authors present their search for available datasets in the CER context (e.g., in databases and repositories). The available datasets are further analyzed using a newly developed metadata scheme and presented to the community as a resource. The second component of this work is a summary of the community's perspective and concerns on publishing their research data, which has been gathered through a survey among 52 computing education researchers. Based on this status quo, this report presents recommendations for measures and future steps for the community to become more accessible and establish open data practices. We thus emphasize the potential of making research data available to enhance productivity, transparency, and reproducibility in the CER community.

References

[1]
ACM. 2023. ACM Digital Library. https://dl.acm.org/
[2]
David Aha and University of California. 2023. UC Irvine Machine Learning Repository. http://archive.ics.uci.edu/
[3]
Alireza Ahadi, Arto Hellas, Petri Ihantola, Ari Korhonen, and Andrew Petersen. 2016. Replication in computing education research: researcher attitudes and experiences. In Proceedings of the 16th Koli Calling International Conference on Computing Education Research (Koli, Finland) (Koli Calling). Association for Computing Machinery, New York, NY, USA, 2--11. https://doi.org/10.1145/ 2999541.2999554
[4]
Carlos Alario-Hoyos. 2021. Dataset MOOC Forum edX. Universidad Carlos III de Madrid. https://doi.org/10.5281/zenodo.5115573
[5]
Farid Anvari and Daniël Lakens. 2018. The replicability crisis and public trust in psychological science. Comprehensive Results in Social Psychology 3, 3 (2018), 266--286.
[6]
Mikko Apiola, Sonsoles López-Pernas, and Mohammed Saqr. 2023. Past, Present and Future of Computing Education Research: A Global Perspective. Springer Nature, Cham. https://doi.org/10.1007/978--3-031--25336--2
[7]
FAIR Aware. 2021. FAIRaware. https://fairaware.dans.knaw.nl/ Last access: 2023--10-08.
[8]
RSJD Baker et al. 2010. Data mining for education. International encyclopedia of education 7, 3 (2010), 112--118.
[9]
Austin Cory Bart, Ryan Whitcomb, Jason Riddle, Omar Saleem, Eli Tilevich, Clifford A. Shaffer, and Dennis Kafura. 2023. CORGIS. The Collection of Really Great, Interesting, Situated Datasets. https://corgis-edu.github.io/corgis/
[10]
Katarzyna Biernacka, Ron Dockhorn, Claudia Engelhardt, Kerstin Helbig, Juliane Jacob, Tereza Kalová, Adienne Karsten, Kristin Meier, Andreas Mühlichen, Janna Neumann, Britta Petersen, Benjamin Slowig, Ute Trautwein-Bruns, Jeanne Wilbrandt, and Cord Wiljes. 2023. Train-the-Trainer-Konzept zum Thema Forschungsdatenmanagement. Zenodo. https://doi.org/10.5281/zenodo.10122153
[11]
Katarzyna Biernacka, Adrian Mulligan, Jonathan Zimmermann, and Rudi Rudiak. 2023. Research Data Sharing and Reuse 2020. Online. https://doi.org/10.17632/ nr9n75cpv2.1 Mendeley Data, V1.
[12]
Katarzyna Biernacka and Niels Pinkwart. 2021. Opportunities for adopting open research data in Learning Analytics. In Advancing the Power of Learning Analytics and Big Data in Education. IGI Global, Hershey, PA, 29--60.
[13]
Katarzyna Biernacka and Sandra Schulz. 2022. Forschungsdatenmanagement in der Informatik. Logos Verlag. https://doi.org/10.30819/5490
[14]
Jeremiah Blanchard, John R. Hott, Vincent Berry, Rebecca Carroll, Bob Edmison, Richard Glassey, Oscar Karnalim, Brian Plancher, and Seán Russell. 2022. Stop Reinventing the Wheel! Promoting Community Software in Computing Education. In Proceedings of the 2022 Working Group Reports on Innovation and Technology in Computer Science Education (, Dublin, Ireland,) (ITiCSEWGR '22). Association for Computing Machinery, New York, NY, USA, 261--292. https://doi.org/10.1145/3571785.3574129
[15]
Christine L. Borgman and Irene V. Pasquetto. 2017. Why Data Sharing and Reuse Are Hard To Do. https://escholarship.org/uc/item/0jj17309
[16]
Neil C. C. Brown, Amjad Altadmri, Sue Sentance, and Michael Kölling. 2018. Blackbox, Five Years On: An Evaluation of a Large-Scale Programming Data Collection Project. In Proceedings of the 2018 ACM Conference on International Computing Education Research (Espoo, Finland) (ICER '18). ACM, New York, 196--204. https://doi.org/10.1145/3230977.3230991
[17]
Neil C. C. Brown and Mark Guzdial. 2024. Confidence vs Insight: Big and Rich Data in Computing Education Research. In Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1 (Portland, OR, USA) (SIGCSE 2024). Association for Computing Machinery, New York, NY, USA, 158--164. https://doi.org/10.1145/3626252.3630813
[18]
Neil C. C. Brown and Mark Guzdial. 2024. Confidence vs Insight: Big and Rich Data in Computing Education Research. In Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1 (, Portland, OR, USA,) (SIGCSE 2024). Association for Computing Machinery, New York, NY, USA, 158--164. https://doi.org/10.1145/3626252.3630813
[19]
Neil Christopher Charles Brown, Michael Kölling, Davin McCall, and Ian Utting. 2014. Blackbox: A Large Scale Repository of Novice Programmers? Activity. In Proceedings of the ACM Technical Symposium on Computer Science Education (SIGCSE). ACM, New York, 000000--000000. https://doi.org/10.1145/2538862. 2538924
[20]
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language Models are Few-Shot Learners. arXiv:2005.14165 [cs.CL]
[21]
Peter Brusilovsky, Ken Koedinger, David A. Joyner, and Thomas W. Price. 2020. Building an Infrastructure for Computer Science Education Research and Practice at Scale. In Proceedings of the Seventh ACM Conference on Learning @ Scale (Virtual Event, USA) (L@S '20). Association for Computing Machinery, New York, NY, USA, 211--213. https://doi.org/10.1145/3386527.3405936
[22]
Canadian Institute for Cybersecurity (CIC). 2023. CIC Datasets. https://www. unb.ca/cic/datasets/index.html.
[23]
Canadian Institute for Cybersecurity (CIC). 2023. University of New Brunswick. https://www.unb.ca/cic/datasets/index.html.
[24]
Carnegie Mellon University. 2023. Datashop@Carnegie Mellon University. https://pslcdatashop.web. cmu.edu
[25]
Arturo Casadevall and Ferric C. Fang. 2010. Reproducible Science. Infection and Immunity 78, 12 (2010), 4972--4975. https://doi.org/10.1128/IAI.00908--10 arXiv:https://journals.asm.org/doi/pdf/10.1128/IAI.00908--10
[26]
Lillian N. Cassel, Gordon Davies, William Fone, Anneke Hacquebard, John Impagliazzo, Richard LeBlanc, Joyce Currie Little, Andrew McGettrick, and Michela Pedrona. 2007. The Computing Ontology: Application in Education. In Working Group Reports on ITiCSE on Innovation and Technology in Computer Science Education (Dundee, Scotland) (ITiCSE-WGR '07). Association for Computing Machinery, New York, 171--183. https://doi.org/10.1145/1345443.1345439
[27]
Center for open Science. 2023. Open Sciene Framework. https://osf.io/
[28]
Centre for Science and Technology Studies, Elsevier and Leiden University. 2017. Open Data. The researcher perspective. Technical Report. Centre for Science and Technology Studies, Elsevier and Leiden University. https://www.elsevier.com/ open-science/research-data/open-data-report
[29]
CERN. 2023. Zenodo. https://zenodo.org/
[30]
Lee Chaw. 2022. Dataset related to MOOCs. UCSI University. https://doi.org/10. 17632/v398vj34h6.1 V1.
[31]
Juan Chen, Sheikh Ghafoor, and John Impagliazzo. 2022. Producing competent HPC graduates. Commun. ACM 65, 12 (2022), 56--65.
[32]
Victoria Clarke and Virginia Braun. 2014. Thematic Analysis. Springer New York, New York, NY, 1947--1952. https://doi.org/10.1007/978--1--4614--5583--7_311
[33]
Andy Cockburn, Pierre Dragicevic, Lonni Besançon, and Carl Gutwin. 2020. Threats of a Replication Crisis in Empirical Computer Science. Commun. ACM 63, 8 (jul 2020), 70--79. https://doi.org/10.1145/3360311
[34]
Code.Org. 2023. Code.org's Annual State of Computer Science Education Report. https://code.org.
[35]
Code.Org. 2023. Code.org's Annual State of Computer Science Education Report. https://code.org/research.
[36]
European Commission, Directorate-General for Research, and Innovation. 2018. Cost-benefit analysis for FAIR research data -- Cost of not having FAIR research data. Publications Office. https://doi.org/10.2777/02999
[37]
PREMIS Editorial Committee. 2015. PREMIS Data Dictionary for Preservation Metadata. http://www.loc.gov/standards/premis
[38]
Computing Technology Industry Association (CompTIA). 2023. Tech Industry Workforce Report. https://www.comptia.org/
[39]
Edward M Corrado. 2021. Artificial intelligence: The possibilities for metadata creation. Technical Services Quarterly 38, 4 (2021), 395--405.
[40]
Creative Commons Corporation. 2023. Creative Commons. https:// creativecommons.org
[41]
DataCite MetadataWorking Group. 2021. DataCite Metadata Schema Documentation for the Publication and Citation of Research Data and Other Research Outputs v4.4. https://doi.org/10.14454/3W3Z-SA82
[42]
Carnegie Mellon University DataLab. 2023. What is Educational Data Mining (EDM)' https://www.cmu.edu/datalab/getting-started/what-is-edm.html Last access: 2023--11--10.
[43]
Anusuriya Devaraju, Robert Huber, Mustapha Mokrane, Patricia Herterich, Linas Cepinskas, Jerry de Vries, Herve L'Hours, Joy Davidson, and Angus White. 2022. FAIRsFAIR Data Object Assessment Metrics. https://doi.org/10. 5281/zenodo.6461229
[44]
Hendrik Drachsler and Wolfgang Greller. 2016. Privacy and analytics. In Proceedings of the Sixth International Conference on Learning Analytics & Knowledge - LAK '16. ACM, New York, 89--98. https://doi.org/10.1145/2883851.2883893
[45]
Dublin Core? Metadata Initiative (DCMI). 2023. DCMI Metadata Terms. https://www.dublincore.org/specifications/dublin-core/dcmi-terms/
[46]
Florian Echtler and Maximilian Häußler. 2018. Open Source, Open Science, and the Replication Crisis in HCI. In Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems (Montreal QC, Canada) (CHI EA '18). Association for Computing Machinery, New York, NY, USA, 1--8. https://doi.org/10.1145/3170427.3188395
[47]
John Edwards. 2022. 2021 CS1 Keystroke Data. Utah State University. https://doi.org/10.7910/DVN/BVOF7S
[48]
John Edwards, Kaden Hart, and Raj Shrestha. 2023. Review of CSEDM Data and Introduction of Two Public CS1 Keystroke Datasets. Journal of Educational Data Mining 15, 1 (March 2023), 1--31. https://doi.org/10.5281/zenodo.7646659
[49]
Elsevier. 2023. Mendeley Data. https://data.mendeley.com/
[50]
Claudia Engelhardt, Raisa Barthauer, Katarzyna Biernacka, Aoife Coffey, Ronald Cornet, Alina Danciu, Yuri Demchenko, Stephen Downes, Christopher Erdmann, Federica Garbuglia, Kerstin Germer, Kerstin Helbig, et al. 2022. How to be FAIR with your data. Universitätsverlag Göttingen, Göttingen. https://doi.org/10. 17875/gup2022--1915
[51]
European Union (EU). 2023. European Data. https://data.europa.eu/en/
[52]
European Union. 2023. European Open Science Cloud. https://eosc-portal.eu/
[53]
Expanding Computing Education Pathways (ECEP) Alliance. 2023. National Data Resources. https://ecepalliance.org/cs-data/national-data-resources/
[54]
FAIRsFAIR. 2023. FAIRsFAIR Research Data Object Assessment Service. https: //www.f-uji.net/ Last access: 2023--10--20.
[55]
R. A. Fisher. 1988. Iris. UCI Machine Learning Repository.
[56]
National Center for Education Statistics (NCES). 2023. Integrated Postsecondary Education Data System (IPEDS). https://nces.ed.gov/ipeds/
[57]
Erin D Foster and Ariel Deardorff. 2017. Open science framework (OSF). Journal of the Medical Library Association: JMLA 105, 2 (2017), 203.
[58]
Dolores Frias-Navarro, Juan Pascual-Llobell, Marcos Pascual-Soler, Jose Perezgonzalez, and Jose Berrios-Riquelme. 2020. Replication crisis or an opportunity to improve scientific production? European Journal of Education 55, 4 (2020), 618--631.
[59]
Ge Gao, Samiha Marwan, and ThomasWPrice. 2021. Early performance prediction using interpretable patterns in programming process data. In Proceedings of the 52nd ACM technical symposium on computer science education. ACM, New York, 342--348.
[60]
German Centre for Higher Education Research and Science Studies (DZHW). 2022. Find Higher Education and Science Research Data Packages. https: //metadata.fdz.dzhw.eu/en/start
[61]
German Research Foundation (DFG). 2023. Handling of Research Data. https://www.dfg.de/en/research_funding/principles_dfg_funding/ research_data/index.html
[62]
GitHub. 2023. GitHub Archive. https://github.com/datasets.
[63]
Jeremy Goecks, Anton Nekrutenko, and James Taylor. 2010. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome biology 11, 8 (2010), 1--13.
[64]
Goeller, Sandra and Soltau, Kerstin. 2022. Dokumentation RADAR Metadatenschema. https://radar.products.fiz-karlsruhe.de/sites/default/files/radar/docs/ info/RADAR_Metadaten_Dokumentation_v9.1.pdf
[65]
Alejandra González-Beltrán, Peter Li, Jun Zhao, Maria Susana Avila-Garcia, Marco Roos, Mark Thompson, Eelke van der Horst, Rajaram Kaliyaperumal, Ruibang Luo, Tin-Lap Lee, et al. 2015. From peer-reviewed to peer-reproduced in scholarly publishing: the complementary roles of data models and workflows in bioinformatics. PLOS one 10, 7 (2015), e0127612.
[66]
Google. 2023. Dataset Search. https://datasetsearch.research.google.com/
[67]
Virginia Grande, Päivi Kinnunen, Anne-Kathrin Peters, Matthew Barr, Åsa Cajander, Mats Daniels, Amari N. Lewis, Mihaela Sabin, Matilde Sánchez-Peña, and Neena Thota. 2022. Role Modeling as a Computing Educator in Higher Education: A Focus on Care, Emotions and Professional Competencies. In Proceedings of the 2022 Working Group Reports on Innovation and Technology in Computer Science Education (Dublin, Ireland) (ITiCSE-WGR '22). ACM, New York, 37--63. https://doi.org/10.1145/3571785.3574122
[68]
Wouter Groeneveld, Brett A. Becker, and Joost Vennekens. 2021. How Creatively Are We Teaching and Assessing Creativity in Computing Education: A Systematic Literature Review. https://doi.org/10.5281/zenodo.5752559
[69]
Rahul Gupta, Soham Pal, Aditya Kanade, and Shirish Shevade. 2017. DeepFix: Fixing Common C Language Errors by Deep Learning. Proceedings of the AAAI Conference on Artificial Intelligence 31, 1 (Feb. 2017), 7 pages. https: //doi.org/10.1609/aaai.v31i1.10742
[70]
Qiang Hao, David H. Smith IV, Naitra Iriumi, Michail Tsikerdekis, and Amy J. Ko. 2019. A Systematic Investigation of Replications in Computing Education Research. ACM Trans. Comput. Educ. 19, 4, Article 42 (aug 2019), 18 pages. https://doi.org/10.1145/3345328
[71]
F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Trans. Interact. Intell. Syst. 5, 4, Article 19 (Dec. 2015), 19 pages. https://doi.org/10.1145/2827872
[72]
Harvard University Library. 2023. Dataverse Project. https://dataverse.org/
[73]
David Hovemeyer, Arto Hellas, Andrew Petersen, and Jaime Spacco. 2017. Progsnap: Sharing Programming Snapshots for Research (Abstract Only). In Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education (Seattle, Washington, USA) (SIGCSE '17). Association for Computing Machinery, New York, NY, USA, 709. https://doi.org/10.1145/3017680.3022418
[74]
Jeff Huang. 2022. Computer Science Open Data ? jeffhuang.com. https:// jeffhuang.com/computer-science-open-data/. [Accessed 22--11--2023].
[75]
Hugging Face. 2023. Hugging Face. https://huggingface.co/
[76]
IEEE. 2023. IEEE DataPort: Dataset Storage and Dataset Search Platform. https: //ieee-dataport.org/
[77]
IEEE. 2023. IEEE Xplore. https://ieeexplore.ieee.org/
[78]
IEEE Computer Society. 2020. IEEE Standard for Learning Object Metadata., 50 pages. https://doi.org/10.1109/IEEESTD.2020.9262118
[79]
Petri Ihantola, Arto Vihavainen, Alireza Ahadi, Matthew Butler, Jürgen Börstler, Stephen H. Edwards, Essi Isohanni, Ari Korhonen, Andrew Petersen, Kelly Rivers, Miguel Ángel Rubio, Judy Sheard, Bronius Skupas, Jaime Spacco, Claudia Szabo, and Daniel Toll. 2015. Educational Data Mining and Learning Analytics in Programming: Literature Review and Case Studies. In Proceedings of the 2015 ITiCSE on Working Group Reports (ITICSE-WGR '15). ACM, New York, 41--63.
[80]
Darrel Ince. 2011. The Duke University scandal - what can be done? Significance 8, 3 (Aug. 2011), 113--115. https://doi.org/10.1111/j.1740--9713.2011.00505.x
[81]
The NLP Index. 2023. The NLP Index. https://index.quantumstat.com/
[82]
International Association for the Evaluation of Educational Achievement (IEA). 2023. International Computer and Information Literacy Study (ICILS) 2018 Dataset. https://www.iea.nl/studies/iea/icils-2018.
[83]
Matti Järvisalo, Daniel Le Berre, Olivier Roussel, and Laurent Simon. 2012. The international SAT solver competitions. Ai Magazine 33, 1 (2012), 89--92.
[84]
Johan Jeuring, Hieke Keuning, Samiha Marwan, Dennis Bouvier, Cruz Izu, Natalie Kiesler, Teemu Lehtinen, Dominic Lohr, Andrew Petersen, and Sami Sarsa. 2022. Steps Learners Take When Solving Programming Tasks, and How Learning Environments (Should) Respond to Them. In Proceedings of the 27th ACM Conference on on Innovation and Technology in Computer Science Education Vol. 2 (Dublin, Ireland) (ITiCSE '22). ACM, New York, 570--571. https://doi.org/ 10.1145/3502717.3532168
[85]
Johan Jeuring, Hieke Keuning, Samiha Marwan, Dennis Bouvier, Cruz Izu, Natalie Kiesler, Teemu Lehtinen, Dominic Lohr, Andrew Peterson, and Sami Sarsa. 2022. Towards Giving Timely Formative Feedback and Hints to Novice Programmers. In Proceedings of the 2022 Working Group Reports on Innovation and Technology in Computer Science Education (Dublin, Ireland) (ITiCSE-WGR '22). ACM, New York, 95--115. https://doi.org/10.1145/3571785.3574124
[86]
Kaggle. 2023. Kaggle. https://www.kaggle.com/
[87]
Daniel S. Katz, Morane Gruenpeter, and Tom Honeyman. 2021. Taking a fresh look at FAIR for research software. Patterns 2, 3 (2021), 100222. https://doi.org/ 10.1016/j.patter.2021.100222
[88]
Daniel S. Katz, Fotis E. Psomopoulos, and Leyla Jael Castro. 2021. Working Towards Understanding the Role of FAIR for Machine Learning. In 2ndWorkshop on Data and Research Objects Management for Linked Open Science. ZB MEDPublikationsportal Lebenswissenschaften, Online, 1--6. https://doi.org/10.4126/ FRL01-006429415
[89]
Hieke Keuning. 2024. The interplay between rich and big data in programming education research. In 22. Fachtagung Bildungstechnologien (DELFI), Sandra Schulz and Natalie Kiesler (Eds.). Gesellschaft für Informatik e.V., Bonn, 19--21. https://doi.org/10.18420/delfi2024_01
[90]
Natalie Kiesler. 2022. Dataset: Recursive problem solving in the online learning environment CodingBat by computer science students. Online. https://doi. org/10.21249/DZHW:studentsteps:1.0.0 Datenerhebung: 2017. Version: 1.0.0. Datenpaketzugangsweg: Download-SUF. Hannover: FDZ-DZHW.
[91]
Natalie Kiesler. 2022. Daten- und Methodenbericht Rekursive Problemlösung in der Online Lernumgebung CodingBat durch Informatik-Studierende. Technical Report. DZHW. https://metadata.fdz.dzhw.eu/public/files/data-packages/stustudentsteps$/ attachments/studentsteps_Data_Methods_Report_de.pdf
[92]
Natalie Kiesler, John Impagliazzo, Katarzyna Biernacka, Amanpreet Kapoor, Zain Kazmi, Sujeeth Goud Ramagoni, Aamod Sane, Keith Tran, Shubbhi Taneja, and Zihan Wu. 2023. Where?s the Data? Exploring Datasets in Computing Education. In Proceedings of the ACM Conference on Global Computing Education Vol 2 (Hyderabad, India) (CompEd 2023). Association for Computing Machinery, New York, NY, USA, 209--210. https://doi.org/10.1145/3617650.3624951
[93]
Natalie Kiesler, John Impagliazzo, Katarzyna Biernacka, Amanpreet Kapoor, Zain Kazmi, Sujeeth G Ramagoni, Aamod Sane, Keith Tran, Shubbi Taneja, and Zihan Wu. 2024. CompEd Working Group 2023 - Supplementary Material. https://doi.org/10.17605/OSF.IO/R83S5
[94]
Natalie Kiesler, Simone Opel, and Carsten Thorbrügge. 2024. With Great Power Comes Great Responsibility: Integrating Data Ethics into Computing Education. In Proceedings of the 2024 Conference on Innovation and Technology in Computer Science Education V. 2 (Milan, Italy) (ITiCSE 2024). ACM, New York. https: //doi.org/10.1145/3649217.3653637
[95]
Natalie Kiesler and Benedikt Pfülb. 2023. Higher Education Programming Competencies: A Novel Dataset. In Artificial Neural Networks and Machine Learning -- ICANN 2023, Lazaros Iliadis, Antonios Papaleonidas, Plamen Angelov, and Chrisina Jayne (Eds.). Springer Nature Switzerland, Cham, 319--330. https://doi.org/10.1007/978--3-031--44198--1_27
[96]
Natalie Kiesler, René Röpke, Daniel Schiffner, Sandra Schulz, Sven Strickroth, Matthias Ehlenz, Birte Heinemann, and Arno Wilhelm-Weidner. 2024. Towards Open Science at the DELFI Conference. In 22. Fachtagung Bildungstechnologien (DELFI), Sandra Schulz and Natalie Kiesler (Eds.). Gesellschaft für Informatik e.V., Bonn, 251--265. https://doi.org/10.18420/delfi2024_22
[97]
Natalie Kiesler and Daniel Schiffner. 2022. On the Lack of Recognition of Software Artifacts and IT Infrastructure in Educational Technology Research. In 20. Fachtagung Bildungstechnologien (DELFI), Peter A. Henning, Michael Striewe, and Matthias Wölfel (Eds.). Gesellschaft für Informatik e.V., Bonn, 201--206. https://doi.org/10.18420/delfi2022-034
[98]
Natalie Kiesler and Daniel Schiffner. 2023. Exploring and Improving Workflows for the Donation and Curation of Research Data. In 1st Conference on Research Data Infrastructure - Connecting Communities, CoRDI 2023, Karlsruhe, Germany, September 12--14, 2023, York Sure-Vetter and Carole A. Goble (Eds.). TIB Open Publishing, Karlsruhe (Germany), 1--4. https://doi.org/10.52825/CORDI.V1I.284
[99]
Natalie Kiesler and Daniel Schiffner. 2023. Open Science in den Bildungstechnologien: Zur Publikation und Begutachtung von Forschungsdaten inklusive Software im Rahmen der DELFI. In Workshops der 21. Fachtagung Bildungstechnologien (DELFI). Gesellschaft für Informatik e.V., Bonn, 159--168. https://doi.org/10.18420/wsdelfi2023--40
[100]
Natalie Kiesler and Daniel Schiffner. 2023. WhyWe Need Open Data in Computer Science Education Research. In Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education Vol. 1 (Turku, Finland) (ITiCSE 2023). ACM, New York, 348--353. https://doi.org/10.1145/3587102.3588860
[101]
Natalie Kiesler and Daniel Schiffner. 2024. Conferences are Exclusive by Nature. In Proceedings of the 2024 RESPECT Annual Conference (Atlanta, GA, USA) (RESPECT ?24). ACM, New York, 5 pages. https://doi.org/10.1145/3653666.3656077
[102]
Natalie Kiesler, Daniel Schiffner, and Axel Nieder-Vahrenholz. 2023. Adapting RDMO for the Efficient Management of Educational Research Data. In DELFI 2023, Die 21. Fachtagung Bildungstechnologien der Gesellschaft für Informatik e.V., 11.-13. September 2023, Aachen (LNI, Vol. P-338), René Röpke and Ulrik Schroeder (Eds.). Gesellschaft für Informatik e.V., Bonn, 271--272. https://doi.org/10.18420/ DELFI2023--51
[103]
Natalie Kiesler and Carsten Thorbrügge. 2023. Socially Responsible Programming in Computing Education and Expectations in the Profession. In Proceedings of the 2023 Conference on Innovation and Technology in Computer Science Education V. 1 (Turku, Finland) (ITiCSE 2023). ACM, New York, 443--449. https://doi.org/10.1145/3587102.3588839
[104]
Kenneth J Knapp, Christopher Maurer, and Miloslava Plachkinova. 2017. Maintaining a cybersecurity curriculum: Professional certifications as valuable guidance. Journal of Information Systems Education 28, 2 (2017), 101.
[105]
Michael Kölling, Bruce Quig, Andrew Patterson, and John Rosenberg. 2003. The BlueJ system and its pedagogy. Computer Science Education 13, 4 (2003), 249--268.
[106]
Michael Kölling and Ian Utting. 2012. Building an Open, Large-Scale Research Data Repository of Initial Programming Student Behaviour. In Proceedings of the 43rd ACM Technical Symposium on Computer Science Education (SIGCSE '12). ACM, New York, 323--324. https://doi.org/10.1145/2157136.2157234
[107]
Sumith Kulal, Panupong Pasupat, Kartik Chandra, Mina Lee, Oded Padon, Alex Aiken, and Percy S Liang. 2019. SPoC: Search-based Pseudocode to Code. In Advances in Neural Information Processing Systems, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.), Vol. 32. Curran Associates, Inc., Red Hook, NY, USA. https://proceedings.neurips.cc/paper_ files/paper/2019/file/7298332f04ac004a0ca44cc69ecf6f6b-Paper.pdf
[108]
Jakub Kuzilek, Martin Hlosta, and Zdenek Zdrahal. 2017. Open University Learning Analytics dataset. https://doi.org/10.1038/sdata.2017.171
[109]
Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradientbased learning applied to document recognition. Proc. IEEE 86, 11 (1998), 2278-- 2324.
[110]
Carnegie Mellon University Libraries. 2023. Carnegie Mellon University Libraries. https://guides.library.cmu.edu/az.php
[111]
Thong Chee Ling, Yusmadi Yah Jusoh, Rusli Adbullah, and Nor Hayati Alwi. 2013. An Ontology for Software Engineering Education.
[112]
LinkedIn. 2023. LinkedIn Economic Graph. https://economicgraph.linkedin. com/
[113]
Monica M. McGill. 2019. Discovering Empirically-Based Best Practices in Computing Education Through Replication, Reproducibility, and Meta-Analysis Studies. In Proceedings of the 19th Koli Calling International Conference on Computing Education Research (Koli, Finland) (Koli Calling '19). Association for Computing Machinery, New York, NY, USA, Article 7, 5 pages. https: //doi.org/10.1145/3364510.3364528
[114]
MDPI. 2023. MDPI Publisher of Open Access Journals. https://www.mdpi.com/
[115]
Meta. 2023. The latest on Machine Learning | Papers with Code. https: //paperswithcode.com/
[116]
Metadata Standards Catalog. 2023. Index of subjects. https://rdamsc.bath.ac. uk/subject-index
[117]
Barend Mons, Herman van Haagen, Christine Chichester, Peter-Bram?t Hoen, Johan T den Dunnen, Gertjan van Ommen, Erik van Mulligen, Bharat Singh, Rob Hooft, Marco Roos, et al. 2011. The value of data. Nature genetics 43, 4 (2011), 281--283.
[118]
N.A. 2019. Degrees in computer and information sciences conferred by postsecondary institutions, by level of degree and sex of student: 1970--71 through 2017--18 ' nces.ed.gov. https://nces.ed.gov/programs/digest/d19/tables/dt19_ 325.35.asp. [Accessed 22--11--2023].
[119]
National Center for Education Statistics (NCES). 2023. National Center for Education Statistics (NCES) Datasets. https://nces.ed.gov/datalab/index.aspx.
[120]
National Science Foundation. 2023. Open Data at NSF. https://www.nsf.gov/ data/
[121]
n.d. 2023. Learning engineering. https://groups.google.com/g/learningengineering/ about. [Accessed 06--12--2023].
[122]
Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y Ng. 2011. Reading digits in natural images with unsupervised feature learning. https://storage.googleapis.com/pub-tools-public-publicationdata/ pdf/37648.pdf
[123]
Brian A Nosek, Tom E Hardwicke, Hannah Moshontz, Aurélien Allard, Katherine S Corker, Anna Dreber, Fiona Fidler, Joe Hilgard, Melissa Kline Struhl, Michèle B Nuijten, et al. 2022. Replicability, robustness, and reproducibility in psychological science. Annual review of psychology 73 (2022), 719--748.
[124]
Bureau of Labor Statistics (BLS). 2023. Technical Workforce Report. https: //www.bls.gov/
[125]
Institute of Computing (IComp of the Federal University of Amazonas. 2023. CodeBench. https://codebench.icomp.ufam.edu.br/dataset/
[126]
Open AI. 2023. OpenAI's GPT-3 Playground Usage Data. https://github.com/ openai/gpt-3.
[127]
Open Knowledge Foundation. 2015. Open Definition. http://opendefinition. org/od/2.1/en/
[128]
Open Source Initiative. 2023. OSI Approved Licenses. https://opensource.org/ licenses/
[129]
Organisation for Economic Co-operation and Development (OECD). 2018. PISA 2018 Dataset. https://www.oecd.org/pisa/data/2018database/.
[130]
Benjamin Paaßen. 2020. Python Programming Dataset. https://doi.org/10.4119/ unibi/2941052 Bielefeld University.
[131]
James Paterson, Joshua Adams, Laurie White, Andrew Csizmadia, D Cenk Erdil, Derek Foster, Mark Hills, Zain Kazmi, Karthik Kuber, Sajid Nazir, et al. 2021. Designing dissemination and validation of a framework for teaching cloud fundamentals. In Proceedings of the 2021 Working Group Reports on Innovation and Technology in Computer Science Education. ACM, New York, 163--181.
[132]
Michael Quinn Patton. 2002. Qualitative Research & Evaluation Methods. Sage, Thousand Oaks.
[133]
Ana Persic, Fernanda Beigel, Simon Hodson, and Peggy Oti-Boateng. 2021. The time for open science is now. UNESCO Science Report: The race against time for smarter development 2021 (2021), 12.
[134]
Dirk Pilat and Yukiko Fukasaku. 2007. OECD principles and guidelines for access to research data from public funding. Data Science Journal 6 (2007), OD4--OD11.
[135]
Leo Porter, Daniel Zingaro, Soohyun Nam Liao, Cynthia Taylor, Kevin C Webb, Cynthia Lee, and Michael Clancy. 2019. BDSI: A validated concept inventory for basic data structures. In Proceedings of the 2019 ACM Conference on International Computing Education Research. 111--119.
[136]
James Prather, Paul Denny, Juho Leinonen, Brett A. Becker, Ibrahim Albluwi, Michelle Craig, Hieke Keuning, Natalie Kiesler, Tobias Kohn, Andrew Luxton- Reilly, Stephen MacNeil, Andrew Petersen, Raymond Pettit, Brent N. Reeves, and Jaromir Savelka. 2023. The Robots Are Here: Navigating the Generative AI Revolution in Computing Education. In Proceedings of the 2023 Working Group Reports on Innovation and Technology in Computer Science Education (Turku, Finland) (ITiCSE-WGR '23). Association for Computing Machinery, New York, NY, USA, 108--159. https://doi.org/10.1145/3623762.3633499
[137]
Thomas W Price, David Hovemeyer, Kelly Rivers, Ge Gao, Austin Cory Bart, Ayaan M Kazerouni, Brett A Becker, Andrew Petersen, Luke Gusukuma, Stephen H Edwards, et al. 2020. Progsnap2: A flexible format for programming process data. In Proceedings of the 2020 ACM Conference on Innovation and Technology in Computer Science Education. ACM, New York, 356--362.
[138]
Keith Quille and Keith Nolan. 2022. Predicting Success in CS1-An Open Access Data Project. In Proceedings of the 53rd ACM Technical Symposium on Computer Science Education V. 2. ACM, New York, 1126. https://doi.org/10.1145/3478432. 3499092
[139]
Rajendra Raj, Mihaela Sabin, John Impagliazzo, David Bowers, Mats Daniels, Felienne Hermans, Natalie Kiesler, Amruth N. Kumar, Bonnie MacKellar, Renée McCauley, Syed Waqar Nabi, and Michael Oudshoorn. 2021. Professional Competencies in Computing Education: Pedagogies and Assessment. In Proceedings of the 2021 Working Group Report on Innovation and Technology in Computer Science Education (Virtual Event, Germany) (ITiCSE-WGR '21). ACM, New York, 133--161. https://doi.org/10.1145/3502870.3506570
[140]
Rajendra K Raj, Carol J Romanowski, John Impagliazzo, Sherif G Aly, Brett A Becker, Juan Chen, Sheikh Ghafoor, Nasser Giacaman, Steven I Gordon, Cruz Izu, et al. 2020. High performance computing education: Current challenges and future directions. In Proceedings of the Working Group Reports on Innovation and Technology in Computer Science Education. Association for Computing Machinery, New York, NY, USA, 51--74. https://doi.org/10.1145/3437800.3439203
[141]
Joel R Reidenberg and Florian Schaub. 2018. Achieving big data privacy in education. Theory and Research in Education 16, 3 (2018), 263--279.
[142]
UC Irvine Machine Learning Repository. 2023. Datasets-UCI Machine Learning Repository. https://archive.ics.uci.edu/datasets
[143]
Bernat Romagosa, Michael Ball, Jens Mönig, Brian Harvey, and Jadga Hügle. 2023. Snap! Build Your Own Blocks - cloud.snap.berkeley.edu. https://cloud.snap.berkeley.edu/. [Accessed 06-12-2023].
[144]
Sage. 2023. Sage. https://us.sagepub.com
[145]
Sandra Schulz, Sarah Berndt, and Anja Hawlitschek. 2023. Exploring students' and lecturers' views on collaboration and cooperation in computer science courses - a qualitative analysis. Computer Science Education 33, 3 (2023), 318--341. https://doi.org/10.1080/08993408.2021.2022361
[146]
Geir Kjetil Sandve, Anton Nekrutenko, James Taylor, and Eivind Hovig. 2013. Ten simple rules for reproducible computational research. PLoS computational biology 9, 10 (2013), e1003285.
[147]
Schloss Dagstuhl. Leibniz-Zentrum für Informatik. 2023. dblp computer science bibliography. https://dblp.org/
[148]
Andreas Scholl and Natalie Kiesler. 2024. Data: Analyzing Chat Protocols of Novice Programmers Solving Introductory Programming Tasks with ChatGPT. https://doi.org/10.17605/OSF.IO/WBKQV
[149]
Andreas Scholl and Natalie Kiesler. 2024. Data: How Novice Programmers Use and Experience ChatGPT when Solving Programming Exercises in an Introductory Course. https://doi.org/10.17605/OSF.IO/6EN4Z
[150]
Sandra Schulz, Sarah Berndt, and Anja Hawlitschek. 2023. Gruppenarbeit beim Programmieren lernen (GAPL). Datenerhebung: 2020. Version: 1.0.0. Datenpaketzugangsweg: SUF: Download. https://doi.org/10.21249/DZHW:dipit2020:1.0.0
[151]
Sue Sentance, Ethel Tshukudu, and Keith Quille. 2022. METRECC Africa 2020 data. University of Cambridge. https://doi.org/10.17863/CAM.87121
[152]
Otto Seppälä, Petri Ihantola, Essi Isohanni, Juha Sorva, and Arto Vihavainen. 2015. Do we know how difficult the rainfall problem is?. In Proceedings of the 15th Koli Calling Conference on Computing Education Research. 87--96.
[153]
International Educational Data Mining Society. 2023. Educational Data Mining. https://educationaldatamining.org/ Last access: 2023-11-10.
[154]
Daniel Spikol, Olga Viberg, Alejandra Martinez-Mones, and Philip Guo (Eds.). 2023. L@S '23: Proceedings of the Tenth ACM Conference on Learning @ Scale (Copenhagen, Denmark). ACM, New York.
[155]
Kristian Stancin, Patrizia Poscic, and Danijela Jaksic. 2020. Ontologies in education--state of the art. Education and Information Technologies 25, 6 (2020), 5301--5320.
[156]
Stanford Vision Lab, Stanford University, Princeton University. 2021. ImageNet. https://www.image-net.org/
[157]
Anna Stepanova, Alexis Weaver, Joanna Lahey, Gerianne Alexander, and Tracy Hammond. 2021. Hiring CS Graduates: What We Learned from Employers. ACM Trans. Comput. Educ. 22, 1, Article 5 (oct 2021), 20 pages. https://doi.org/10.1145/3474623
[158]
Cynthia Taylor, Daniel Zingaro, Leo Porter, Kevin C Webb, Cynthia Bailey Lee, and Mike Clancy. 2014. Computer science concept inventories: past and future. Computer Science Education 24, 4 (2014), 253--276.
[159]
Taylor and Francis. 2023. Taylor & Francis Online. https://www.tandfonline.com/
[160]
The Open Knowledge Foundation. 2023. Conformant Licenses. https://opendefinition.org/licenses/
[161]
Keith Tran. 2023. Systematic-Analysis of Open Access CSed dataset. https://go.ncsu.edu/csed-dataset Last access: 2023-11-04.
[162]
Ethel Tshukudu, Sue Sentance, Oluwatoyin Adelakun-Adeyemo, Brenda Nyaringita, Keith Quille, and Ziling Zhong. 2023. Investigating K-12 Computing Education in Four African Countries (Botswana, Kenya, Nigeria, and Uganda). ACM Trans. Comput. Educ. 23, 1, Article 9 (jan 2023), 29 pages. https://doi.org/10.1145/3554924
[163]
Antony Unwin and Kim Kleinman. 2021. The iris data set: In search of the source of virginica. Significance 18 (2021), 4 pages. https://api.semanticscholar.org/CorpusID:244763032
[164]
Zeeshan-Ul-Hassan Usmani and Hussain Shahbaz Khawaja. 2021. Pakistan Intellectual Capital - kaggle.com. https://www.kaggle.com/datasets/zusmani/pakistanintellectualcapitalcs. [Accessed 22-11-2023].
[165]
Aline Valente, Maristela Holanda, Ari Melo Mariano, Richard Furuta, and Dilma Da Silva. 2022. Analysis of Academic Databases for Literature Review in the Computer Science Education Field. In 2022 IEEE Frontiers in Education Conference (FIE). IEEE, Uppsala, Sweden, 1--7. https://doi.org/10.1109/FIE56618.2022.9962393
[166]
Tim van der Zee and Justin Reich. 2018. Open education science. AERA Open 4, 3 (2018), 2332858418787466.
[167]
Laurens Versluis, Mehmet Cetin, Caspar Greeven, Kristian Laursen, Damian Podareanu, Valeriu Codreanu, Alexandru Uta, and Alexandru Iosup. 2023. Less is not more: We need rich datasets to explore. Future Generation Computer Systems 142 (2023), 117--130.
[168]
VisualData. 2023. VisualData Discovery. https://visualdata.io/discovery
[169]
Jan Vykopal. 2020. What Are Cybersecurity Education Papers About? A Systematic Literature Review of SIGCSE and ITiCSE Conferences. In Proceedings of the 51st ACM Technical Symposium on Computer Science Education (Portland, OR, USA) (SIGCSE '20). ACM, New York, 2--8. https://doi.org/10.1145/3328778.3366816
[170]
Thomas Way, Mary-Angela Papalaskari, Lillian Cassel, Paula Matuszek, Carol Weiss, and Yamini Praveena Tella. 2017. Machine Learning Modules for All Disciplines. In Proceedings of the 2017 ACM Conference on Innovation and Technology in Computer Science Education (Bologna, Italy) (ITiCSE '17). ACM, New York, 84--85. https://doi.org/10.1145/3059009.3072979
[171]
Mark D Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E Bourne, et al. 2016. The FAIR Guiding Principles for scientific data management and stewardship. Scientific data 3, 1 (2016), 1--9.
[172]
Mark D. Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E. Bourne, Jildau Bouwman, Anthony J. Brookes, et al. 2016. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3 (2016), 9. https://doi.org/10.1038/sdata.2016.18
[173]
Ian Wolff, David Broneske, and Veit Köppen. 2022. Towards a Learning Analytics Metadata Model. In Companion Proceedings, Alyssa Friend Wise, Roberto Martinez-Maldonado, and Isabel Hilliger (Eds.). 12th International Learning Analytics and Knowledge Conference (LAK'22), Online, 51--53. https://www.solaresearch.org/wp-content/uploads/2022/03/LAK22_CompanionProceedings.pdf
[174]
Mustafa Yagci. 2022. Educational data mining: prediction of students' academic performance using machine learning algorithms. Smart Learning Environments 9, 1 (2022), 11.

Published In

CompEd 2023: Working Group Reports on 2023 ACM Conference on Global Computing Education
September 2024
37 pages
ISBN:9798400702228
DOI:10.1145/3598579
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 September 2024

Author Tags

  1. computing education
  2. datasets
  3. educational data mining
  4. open data
  5. open science
  6. programming process data
  7. reusing data
  8. secondary research

Qualifiers

  • Research-article

Funding Sources

  • CCRI funding
  • National Science Foundation of the United States
  • HEADT Centre
  • Female Promotion of the Computer Science Department of the Humboldt-Universität zu Berlin

Conference

CompEd 2023

Acceptance Rates

Overall Acceptance Rate 33 of 100 submissions, 33%
