Abstract
Automated classifiers for coding discourse data enable researchers to analyze large datasets. This paper presents a detailed example of applying the training, validation, and test sets commonly used in machine learning to develop automated classifiers for quantitative ethnography research. The method was applied to two dispositional constructs. Within one cycle of the process, reliable and valid automated classifiers were developed for Social Disposition. However, the automated coding scheme for Inclusive Disposition was rejected during the validation stage because of overfitting. Nonetheless, the results demonstrate the potential of preclassified datasets to make the automation process more efficient and effective.
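The three-set workflow described above can be illustrated with a minimal sketch. It is not the authors' toolchain: the function names, the 60/20/20 split, the keyword-style classifier, and the use of Cohen's kappa as the agreement statistic are all assumptions chosen for illustration.

```python
import random
import re


def split_three_ways(items, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle a copy of items and cut it into training, validation, and test sets.

    The 60/20/20 default is an illustrative assumption, not a value from the paper.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )


def regex_classifier(patterns):
    """Return a function that codes a line 1 if any pattern matches, else 0."""
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    return lambda text: int(any(c.search(text) for c in compiled))


def cohens_kappa(a, b):
    """Cohen's kappa for two equal-length lists of binary (0/1) codes."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a, p_b = sum(a) / n, sum(b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if expected == 1.0:
        # Degenerate case: both raters used one identical code for every line.
        return 1.0
    return (observed - expected) / (1 - expected)


def agreement(classifier, coded_lines):
    """Kappa between the classifier and human codes on (text, code) pairs."""
    human = [code for _, code in coded_lines]
    machine = [classifier(text) for text, _ in coded_lines]
    return cohens_kappa(human, machine)
```

In this sketch, the patterns would be refined only against the training set, `agreement` would then be checked on the validation set, and the test set would be held back for a single final check. A classifier that agrees well with human codes on the training set but poorly on the validation set shows the overfitting pattern that led to the rejection of the Inclusive Disposition scheme.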
Acknowledgements
The authors gratefully acknowledge funding support from the US National Science Foundation for the work this paper reports. Views appearing in this paper do not reflect those of the funding agency.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Lee, S.B., Gui, X., Manquen, M., Hamilton, E.R. (2019). Use of Training, Validation, and Test Sets for Developing Automated Classifiers in Quantitative Ethnography. In: Eagan, B., Misfeldt, M., Siebert-Evenstone, A. (eds) Advances in Quantitative Ethnography. ICQE 2019. Communications in Computer and Information Science, vol 1112. Springer, Cham. https://doi.org/10.1007/978-3-030-33232-7_10
DOI: https://doi.org/10.1007/978-3-030-33232-7_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33231-0
Online ISBN: 978-3-030-33232-7
eBook Packages: Computer Science, Computer Science (R0)