Tackling the Credit Assignment Problem in Reinforcement Learning-Induced Pedagogical Policies with Neural Networks

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12748)

Included in the following conference series:

International Conference on Artificial Intelligence in Education

Abstract

Intelligent Tutoring Systems (ITS) provide a powerful tool for students to learn in an adaptive, personalized, and goal-oriented manner. In recent years, Reinforcement Learning (RL) has been shown to be capable of leveraging previous student data to induce effective pedagogical policies for future students. One of the most desirable goals of these policies is to maximize student learning gains while minimizing training time. However, this metric is often not available until a student has completed the entire tutor, so the reinforcement signal for the tutor's effectiveness is delayed. Assigning credit to each intermediate action based on a delayed reward is a challenging problem known as the temporal Credit Assignment Problem (CAP), and it makes it difficult for most RL algorithms to learn which actions were responsible for the outcome. In this work, we develop a general Neural Network-based algorithm that tackles the CAP by inferring immediate rewards from delayed rewards. We perform two empirical classroom studies, and the results show that this algorithm, in combination with a Deep RL agent, can improve student learning performance while reducing training time.
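
The idea of inferring immediate rewards from delayed rewards can be illustrated with a minimal sketch. This is not the algorithm from the paper: a linear model stands in for the neural network, the data are synthetic, and every name below (infer_step_reward_weights, episodes, delayed, true_w) is a hypothetical chosen for illustration. The only assumption carried over from the abstract is that each episode yields a single delayed reward; the sketch additionally assumes that the per-step rewards should sum to it.

```python
import numpy as np


def infer_step_reward_weights(episodes, delayed_rewards, lr=0.01, epochs=200):
    """Fit weights w so that, for each episode, the per-step predictions
    (steps @ w) sum to that episode's single delayed reward.

    A linear model is an assumption made for brevity; it stands in for
    the neural network mentioned in the abstract.
    """
    n_features = episodes[0].shape[1]
    w = np.zeros(n_features)
    for _ in range(epochs):
        for steps, total in zip(episodes, delayed_rewards):
            # Predicted immediate reward for every step of the episode.
            step_rewards = steps @ w
            # Only the episode-level sum is supervised.
            error = step_rewards.sum() - total
            # Gradient of 0.5 * error**2 with respect to w.
            w -= lr * error * steps.sum(axis=0)
    return w


# Synthetic data (made up): 20 episodes, 5 steps each, 3 state-action features.
rng = np.random.default_rng(0)
true_w = np.array([0.5, -0.2, 0.1])  # hidden per-step reward weights
episodes = [rng.normal(size=(5, 3)) for _ in range(20)]
# Only the sum of the hidden per-step rewards is observed per episode.
delayed = [float((ep @ true_w).sum()) for ep in episodes]

w = infer_step_reward_weights(episodes, delayed)
# Inferred immediate rewards, one array of length 5 per episode.
inferred = [ep @ w for ep in episodes]
```

The inferred per-step rewards could then replace the single delayed reward when training an RL agent, which is the role the abstract describes for the inferred immediate rewards.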

Acknowledgements

This research was supported by NSF grants #1726550, #1651909, #1937037, and #2013502.

Author information

Authors and Affiliations

North Carolina State University, Raleigh, NC, 27695, USA
Markel Sanz Ausin, Mehak Maniktala, Tiffany Barnes & Min Chi

Corresponding author

Correspondence to Markel Sanz Ausin.

Editor information

Editors and Affiliations

Technion – Israel Institute of Technology, Haifa, Israel
Ido Roll
Arizona State University, Tempe, AZ, USA
Danielle McNamara
Utrecht University, Utrecht, The Netherlands
Sergey Sosnovsky
London Knowledge Lab, London, UK
Rose Luckin
University of Leeds, Leeds, UK
Vania Dimitrova

Cite this paper

Ausin, M.S., Maniktala, M., Barnes, T., Chi, M. (2021). Tackling the Credit Assignment Problem in Reinforcement Learning-Induced Pedagogical Policies with Neural Networks. In: Roll, I., McNamara, D., Sosnovsky, S., Luckin, R., Dimitrova, V. (eds) Artificial Intelligence in Education. AIED 2021. Lecture Notes in Computer Science, vol 12748. Springer, Cham. https://doi.org/10.1007/978-3-030-78292-4_29

DOI: https://doi.org/10.1007/978-3-030-78292-4_29
Published: 11 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-78291-7
Online ISBN: 978-3-030-78292-4
eBook Packages: Computer Science, Computer Science (R0)
