CN116680477A - Personalized problem recommendation method based on reinforcement learning - Google Patents
- Publication number: CN116680477A (application CN202310703313.2A)
- Authority: CN (China)
- Prior art keywords: learner, model, reinforcement learning, personalized, network
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/9535 - Search customisation based on user profiles and personalisation
- G06N3/0442 - Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- G06N3/047 - Probabilistic or stochastic networks
- G06N3/048 - Activation functions
- G06N3/092 - Reinforcement learning
- Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a personalized problem recommendation method based on reinforcement learning, and relates to the technical field of educational data mining. The method first obtains the learning records of a learner and estimates the learner's potential knowledge level with a knowledge tracking model; this knowledge level is used as part of the learner's features, making the learner's feature modeling more accurate. A reinforcement learning algorithm then deletes from the problem records those unsatisfactory problems that the learner selected by mistake, which improves recommendation accuracy. Finally, problems are recommended to the learner through the personalized recommendation model. The invention combines personalized recommendation, knowledge tracking and reinforcement learning, takes the learner's potential knowledge level into account, removes the influence of mistakenly selected problems in the learning process, and has important theoretical and practical application value.
Description
Technical Field
The invention relates to the technical field of educational data mining, and in particular to a personalized problem recommendation method based on reinforcement learning.
Background
The development of emerging information and communication technologies such as mobile communications, the Internet of Things, cloud computing, big data and artificial intelligence is changing the way humans think, produce, live and learn. Education is now developing toward networked, digital, personalized, ubiquitous and intelligent forms, and a large number of novel education modes such as mobile learning, ubiquitous learning, smart learning and blended learning are emerging.
In recent years, online learning has emerged as a personalized learning mode. By virtue of its convenience, openness and rich learning resources, it has attracted a large number of learners to register and use it. In the new generation of Internet-based learning environments, learning time is more flexible, learning methods are more diverse and learning resources are more abundant. Learners can arrange their own learning time, learning mode and learning resources according to their learning situation and learning goals.
However, unlike a conventional classroom, an online education platform cannot supervise and guide learners in real time, which gives rise to the problems of "information overload" and "knowledge disorientation". These problems mainly manifest as follows: when facing a large number of high-quality learning resources, learners often need a great deal of time to find the resources they are interested in, do not know how to plan their learning, and sometimes cannot complete their learning effectively even after spending a great deal of time. This can reduce learning efficiency, learning quality and learning enthusiasm, and increase the risk of learning failure. These problems have drawn the attention of many educators and researchers, and how to use computers, in place of teachers, to guide and assist learners has gradually become a popular research direction.
Solving the problem that online learners struggle to find problem resources of interest among massive learning resources, by providing a feasible personalized problem recommendation method that greatly improves learning efficiency, is an urgent need. The following three issues must be considered:
first, how to accurately construct the learner's features.
Conventional personalized recommendation models, whether matrix factorization models, recurrent neural network models or attention-mechanism models, model the learner's features from the learner's problem records when addressing problem recommendation, and do not consider the learner's performance on the practiced problems. The following situation may therefore occur: suppose learner i and learner j have essentially the same problem record but perform differently on the problems, with learner i answering most exercises correctly and learner j answering most of them incorrectly; the exercises they select at the next moment are then likely to be different.
It can be seen that building a learner's features based only on the problems the learner has done is not accurate enough. How to take the learner's potential knowledge level into account when modeling the learner is the first issue to consider.
Second, how to eliminate the influence of problems selected by mistake during the learning process.
Learners often select unsatisfactory problems, for example problems of unsuitable difficulty or category, but problem records do not include the learner's satisfaction with each problem, so these mistakenly selected problems become interference terms when modeling the learner's interest features. Although researchers have tried to distinguish the importance of problems by assigning a different attention coefficient to each of the learner's historical problems through an attention mechanism, the influence of these mistakenly selected problems still cannot be completely eliminated. How to remove the influence of mistakenly selected problems is therefore a necessary issue to consider.
Thirdly, how to accurately conduct problem recommendation.
After considering the learner's potential knowledge level and removing the influence of mistakenly selected problems, what remains is how to recommend problems to the learner accurately. Which personalized recommendation algorithm to choose is therefore an important issue to consider.
Handling the problems encountered in online education with reinforcement-learning-related algorithms is a research hotspot in current educational data mining. Combining a knowledge tracking model, a personalized recommendation model and a reinforcement learning model, taking the learner's potential knowledge level into account and removing the influence of mistakenly selected problems effectively alleviates the information overload problem in online education. Personalized problem recommendation through reinforcement learning is thus a good way to improve learners' learning efficiency in online education.
Disclosure of Invention
The technical problem to be solved by the invention is to provide, in view of the shortcomings of the prior art, a personalized problem recommendation method based on reinforcement learning that combines a knowledge tracking model and a personalized recommendation model, so as to solve the real problem that learners in online education find it difficult to locate learning resources of interest.
In order to solve the technical problems, the invention adopts the following technical scheme:
a personalized problem recommendation method based on reinforcement learning comprises the following steps:
step 1: calculating potential knowledge level of the learner by using the knowledge tracking model, and adding the potential knowledge level into the characteristic construction of the personalized recommendation model and the state representation of the problem record modification model;
step 2: constructing and training a personalized recommendation model for problem recommendation;
step 3: designing and training a problem record modification model based on the reinforcement learning Deep Q-Learning algorithm to remove disliked or unsatisfactory problems selected by mistake during the learning process;
step 4: performing joint training on the personalized recommendation model and the problem record modification model;
step 5: modifying the problem records of the learner using the problem record modification model obtained from the joint training in step 4, and recommending problems to the learner using the personalized recommendation model obtained from the joint training in step 4, so as to obtain a problem recommendation list.
Further, in step 1 the knowledge tracking model is the deep knowledge tracking model DKT. Using a long short-term memory network LSTM, the DKT model exploits the temporal relationship in the learner's historical learning record to predict the learner's score on the next question. The DKT model first converts the learner's historical performance into one-hot vectors through one-hot encoding and feeds them into the LSTM network; features are extracted by the LSTM layer, the extracted features are passed to a hidden layer, and the prediction is produced by the output layer. The output of the DKT model represents the probability that the learner answers each problem correctly, i.e. the learner's performance on the next answer. The output of the LSTM layer is taken as the learner's potential knowledge level and is added to the feature construction of the personalized recommendation model and the state representation of the problem record modification model. The input of the DKT model is the learner's exercise record X_i = {x_1^i, x_2^i, ..., x_t^i}; the exercise record of learner i at time t is denoted x_t^i = (e_t^i, a_t^i), where e_t^i indicates the problem selected by learner i at time t and a_t^i indicates the answer result of learner i at time t. The problem record E_i = {e_1^i, e_2^i, ..., e_t^i} contains only the problems that learner i selected to learn, whereas the exercise record X_i also records learner i's answer results.
Further, the personalized recommendation model in step 2 comprises three parts: an Embedding layer, a GRU layer and a fully connected layer. The Embedding layer maps the one-hot vectors of the problems the learner has done to a low-dimensional vector space for encoding. The GRU layer is a gated recurrent unit layer, an improved recurrent neural network, used to extract the sequence features of the problem record. The fully connected layer computes, from the learner's features, the probability that the learner selects each problem, and problems are recommended to the learner according to the magnitude of these selection probabilities.
Further, the specific method of the step 2 is as follows:
step 2-1: through the Embedding layer, mapping the one-hot vector of each problem e_t^i in the problem record E_i of learner i to a low-dimensional vector space for encoding; the output is the low-dimensional vector v_t^i;
Step 2-2: extracting sequence features of problem records through the GRU layer;
the update gate of the GRU determines the amount by which the state information at the previous time and the state information at the current time continue to be transferred into the future, and the calculation formula is as follows:
wherein ,low representing problem done by learner i at time tDimension vector representation, h t-1 Hidden state information indicating time t-1, W z The weight coefficient representing the update gate, σ (·) is the sigmod activation function;
the reset gate of the GRU layer determines the amount by which the state information of the previous time is to be forgotten, and the calculation formula is as follows:
wherein ,Wr A weight coefficient representing a reset gate;
the calculation formula of the current memory content is shown as follows:
wherein ,Wh Is another weight coefficient of the reset gate, reset gate r t And hidden state information h t-1 The corresponding element product of (a) determines the information to be preserved at the previous moment, which is an operator representing the dot product of the matrix;
the final memorized calculation formula of the current time step is shown as follows:
wherein, (1-z t )*h t-1 Information representing the previous time is retained to the amount that is ultimately remembered at the current time,representing the amount of final memory of the current memory content reserved to the current moment; h finally obtained t The sequence characteristic of the problem records of the learner;
step 2-3: the probability of each problem selected by the learner is calculated according to the characteristics of the learner through the full connection layer, and the following formula is shown:
y=softmax(W j ·[K i ,h t ]+b j )
wherein ,Wj Is the weight coefficient of the full connection layer, b j Is the bias factor of the fully connected layer,is the potential knowledge level of learner i calculated by the DKT model; [ K ] i ,h t ]Is the sequence characteristic h of the learner problem record obtained by combining the potential knowledge level of the learner i with the GRU layer t Splicing; softmax (·) is an activation function, limiting the output value between 0 and 1;
step 2-4: the personalized recommendation model uses cross entropy as the loss function for training and updating, computed as:
L = -Σ_{i=1}^{M} p_i · log(q_i)
where M is the number of learners, p_i is the true probability distribution of the problem selected by learner i at the next moment, and q_i is the predicted probability distribution, given by the personalized recommendation model, of the problem selected by learner i at the next moment;
the cross entropy loss function is an index for measuring the difference between the real probability distribution p and the model predictive probability distribution q;
step 2-5: sorting the probabilities, computed by the personalized recommendation model, of learner i selecting each problem in descending order, and recommending the top K problems to learner i to form the problem recommendation list.
Further, the problem record modification model in step 3 adopts reinforcement-learning-related algorithms; its design comprises the action representation, the state representation, the reward function and the reinforcement learning algorithm, specifically as follows:
in order to delete problems that the learner dislikes or is unsatisfied with from the learning process, the action a_t of each step takes only two values: a_t = 0 means the problem is deleted from the problem record, and a_t = 1 means the problem is kept in the problem record;
the state of the learner is represented as:
S = [k_1, k_2, ..., k_N, p_1, p_2, ..., p_N]
where k_1, k_2, ..., k_N represent the potential knowledge level of the learner, which for the i-th learner is K_i = (k_1^i, k_2^i, ..., k_N^i), given by the knowledge tracking model; p_1, p_2, ..., p_N are the low-dimensional vector representation of the learner's problem record together with a position identifier whose function is to record the position being modified;
the reward function of the reinforcement learning module is given by the personalized recommendation model: it compares p(e_target | Ê_i), the probability of selecting the target problem based on the modified problem record Ê_i, with p(e_target | E_i), the probability of selecting the target problem based on the original problem record E_i, where e_target is the problem actually selected by the learner at the next moment; the reinforcement learning module adopts an episodic update strategy, obtaining the reward only after the modification of a learner's entire learning record is completed, and the reward is 0 at all other times;
the reinforcement learning algorithm is the deep Q-network algorithm DQN, which combines a neural network with the Q-learning algorithm of traditional reinforcement learning;
the reinforcement learning module takes the square of the difference between the true value and the predicted value as the loss function to train and update the parameters of the DQN model:
L(θ) = (y_t^DQN - Q_θ(s_t, a_t))^2
where Q_θ(s_t, a_t) is the predicted reward value of selecting action a_t in state s_t, computed by the prediction Q network whose network parameters are θ; y_t^DQN is the true value of the reward obtainable by selecting action a_t in state s_t, computed by the target Q network as y_t^DQN = r_t + γ · max_{a'} Q_{θ'}(s_{t+1}, a'), where max_{a'} Q_{θ'}(s_{t+1}, a') is the maximum reward value obtainable in the next state s_{t+1}, θ' are the network parameters of the target Q network, γ is the discount factor, and r_t is the currently obtainable reward value given by the reward function;
the gradient of the loss function is:
∇_θ L(θ) = -2 · (y_t^DQN - Q_θ(s_t, a_t)) · ∇_θ Q_θ(s_t, a_t)
and the network parameters are updated by gradient descent.
Further, the specific procedure for modifying the learner's problem record in step 3 is as follows:
step 3-1: initializing the model, including the parameters of the prediction Q network and the target Q network; initializing the experience replay pool with capacity N; initializing the set Ê of learner-modified problem records as empty, the learner index i = 1 and the time t = 0;
step 3-2: obtaining the problem record E_i of learner i and the initial state s_0;
step 3-3: taking the feature vector φ(s_t) of state s_t as the input of the prediction Q network and obtaining the Q values corresponding to the actions in the current state;
step 3-4: selecting the action a_t from the current Q values using the ε-greedy strategy;
step 3-5: if a_t = 0, deleting the problem e_t^i at the current position from the problem record E_i;
step 3-6: in state s_t, executing the current action a_t to obtain the next state s_{t+1} and the reward r_t;
step 3-7: storing the quadruple {s_t, a_t, r_t, s_{t+1}} in the experience replay pool;
step 3-8: updating the state s_t = s_{t+1};
step 3-9: sampling m samples {s_j, a_j, r_j, s_{j+1}}, j = 1, 2, ..., m, from the experience replay pool and computing the current target Q value y_j = r_j + γ · max_{a'} Q_{θ'}(φ(s_{j+1}), a');
step 3-10: updating the parameters of the prediction Q network using the mean square error loss function (1/m) Σ_{j=1}^{m} (y_j - Q_θ(φ(s_j), a_j))^2;
step 3-11: every C steps, updating the parameters of the target Q network to the current parameters of the prediction Q network;
step 3-12: judging whether the time has reached the set value T; if not, returning to step 3-3; if so, executing the next step;
step 3-13: denoting the modified problem record E_i as Ê_i and adding Ê_i to the set Ê of learner-modified problem records;
step 3-14: judging whether the problem records of all learners have been modified; if not, returning to step 3-2 and continuing with the modification of the next learner's problem record; if so, ending this step.
Further, the joint training process of step 4 is specifically as follows:
step 4-1: initializing the parameters α = α_0 of the personalized recommendation model, the parameters β = β_0 of the knowledge tracking model and the parameters θ = θ_0 of the reinforcement learning module;
step 4-2: training the knowledge tracking model using the learners' exercise records X;
step 4-3: training the personalized recommendation model using the learners' problem records E and the knowledge tracking model;
step 4-4: fixing the parameters α = α_1 of the personalized recommendation model and the parameters β = β_1 of the knowledge tracking model, and pre-training the reinforcement learning module; the specific method is:
step 4-4-1: the reinforcement learning algorithm selects an action at each step of the problem record E_i;
step 4-4-2: computing the reward function Reward according to the selected action;
step 4-4-3: updating the parameters of the reinforcement learning module according to the loss function of the Deep Q-Learning algorithm;
step 4-4-4: executing steps 4-4-1 to 4-4-3 in a loop until the whole problem record E_i has been traversed;
step 4-4-5: repeating steps 4-4-1 to 4-4-4 until the parameters of the reinforcement learning module reach their optimum;
step 4-5: fixing the parameters β = β_1 of the knowledge tracking model and jointly training the personalized recommendation model and the reinforcement learning module; the specific method is:
step 4-5-1: the reinforcement learning algorithm selects an action at each step of the problem record E_i;
step 4-5-2: computing the reward function Reward according to the selected action;
step 4-5-3: updating the parameters of the reinforcement learning module according to the loss function of the Deep Q-Learning algorithm;
step 4-5-4: executing steps 4-5-1 to 4-5-3 in a loop until the whole problem record E_i has been traversed;
step 4-5-5: updating the parameters of the recommendation model according to the loss function of the recommendation model;
step 4-5-6: repeatedly executing steps 4-5-1 to 4-5-5 in a loop until the parameters of the personalized recommendation model and the reinforcement learning module reach their optimum.
The beneficial effects of the above technical solution are as follows. In the personalized problem recommendation method based on reinforcement learning, the learning record of the learner is first obtained, the learner's potential knowledge level is estimated by the knowledge tracking model, and this potential knowledge level is used as part of the learner's features, making the learner's feature modeling more accurate. The method then deletes, through a reinforcement learning algorithm, the unsatisfactory problems that the learner selected by mistake from the problem records, which improves recommendation accuracy. Finally, problems are recommended to the learner through the personalized recommendation model. The method combines personalized recommendation, knowledge tracking and reinforcement learning algorithms, takes the learner's potential knowledge level into account, removes the influence of mistakenly selected problems in the learning process, and has important theoretical and practical application value.
Drawings
FIG. 1 is a diagram of a personalized problem recommendation model provided by an embodiment of the present invention;
FIG. 2 is a flowchart of a personalized problem recommendation method based on reinforcement learning according to an embodiment of the present invention;
FIG. 3 is a block diagram of the knowledge tracking model DKT provided by an embodiment of the present invention;
FIG. 4 is a block diagram of the long short-term memory network LSTM provided by an embodiment of the present invention;
FIG. 5 is a block diagram of the personalized recommendation model provided by an embodiment of the present invention;
FIG. 6 is a block diagram of the deep Q network DQN provided by an embodiment of the present invention.
Detailed Description
The following describes in further detail the embodiments of the present invention with reference to the drawings and examples. The following examples are illustrative of the invention and are not intended to limit the scope of the invention.
This embodiment provides a personalized problem recommendation method based on reinforcement learning. As shown in FIG. 1, the model constructed by the method of this embodiment consists of three parts: a knowledge tracking model, a personalized recommendation model and a problem record modification model. The knowledge tracking model computes the learner's potential knowledge level and adds it to the feature construction of the personalized recommendation model and the state representation of the problem record modification model. The personalized recommendation model provides the reward function for the problem record modification model and recommends problems to learners. The problem record modification model modifies the learner's historical problem record and evaluates and updates the modifications according to the reward function provided by the personalized recommendation model, thereby improving the accuracy of problem recommendation. The flow of the method is shown in FIG. 2; the specific steps are as follows.
Step 1: compute the learner's potential knowledge level with the knowledge tracking model and add it to the feature construction of the personalized recommendation model and the state representation of the problem record modification model.
The knowledge tracking model employed in this embodiment is the deep knowledge tracking model (Deep Knowledge Tracing, DKT). The DKT model predicts the learner's performance on the next question from the learner's historical learning record by exploiting the temporal relationship through a recurrent neural network or a long short-term memory network LSTM; the recurrent network used in this embodiment is the LSTM. The DKT model first converts the learner's historical performance into one-hot vectors through one-hot encoding and feeds them into the LSTM network; features are extracted by the LSTM layer and passed to a hidden layer, and the prediction is produced by the output layer. The output of DKT represents the probability that the learner answers each problem correctly, i.e. the learner's performance on the next answer.
The structure of the DKT model is shown in FIG. 3. The model is a knowledge tracking model based on a Long Short-Term Memory (LSTM) network, and the learner's potential knowledge level can be determined from the learner's performance in the learning record. The input of the DKT model is the exercise record of learner i, X_i = {x_1^i, x_2^i, ..., x_t^i}; the exercise record of learner i at time t is denoted x_t^i = (e_t^i, a_t^i), where e_t^i is the number of the problem selected by learner i at time t and a_t^i is learner i's performance on that problem at time t, with 1 indicating the problem was answered correctly and 0 indicating it was answered incorrectly. Each x_t^i is first converted into a one-hot vector through one-hot encoding and then input into the LSTM network.
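As an illustration of this input encoding, the sketch below builds the one-hot vector for a single interaction. It assumes the common DKT convention of a 2M-dimensional vector for M distinct problems, with the hot index depending on both the problem id and the correctness; the function name and this index scheme are illustrative assumptions, not values taken from the patent text.

```python
import numpy as np

def encode_interaction(problem_id: int, correct: int, num_problems: int) -> np.ndarray:
    """One-hot encode a single exercise record x_t = (e_t, a_t).

    Assumed convention: a 2*M-dimensional vector where index e_t encodes an
    incorrect answer and index M + e_t encodes a correct answer.
    """
    x = np.zeros(2 * num_problems, dtype=np.float32)
    x[problem_id + correct * num_problems] = 1.0
    return x

# Example: learner answered problem 3 correctly, out of M = 10 problems.
print(encode_interaction(problem_id=3, correct=1, num_problems=10))
```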
The LSTM network is an improved recurrent neural network that solves the problem that ordinary RNNs cannot handle long-distance dependencies; the LSTM structure is shown in FIG. 4.
Unlike an ordinary recurrent neural network, the long short-term memory network introduces a memory cell state and controls the stored information through three gating units, so that the cell state of the neuron retains information over the whole long sequence.
The forget gate in the LSTM network controls how much of the state at the previous time is retained, and is computed as:
f_t = σ(W_f · [h_{t-1}, x_t^i] + b_f)
where W_f is the weight matrix of the forget gate, x_t^i is the input of the forget gate at time t, here the one-hot encoded exercise record of learner i at time t, [h_{t-1}, x_t^i] denotes the concatenation of the two vectors, h_{t-1} is the output at time t-1, b_f is the bias term of the forget gate, and σ(·) is the sigmoid activation function.
The input gate in the LSTM network controls how much of the current input enters the long-term state, and is computed as:
I_t = σ(W_I · [h_{t-1}, x_t^i] + b_I)
where W_I is the weight matrix of the input gate and b_I is the bias term of the input gate.
The cell state of the current input is:
C̃_t = tanh(W_c · [h_{t-1}, x_t^i] + b_c)
where W_c is the weight matrix of the cell state, b_c is the bias term of the cell state, and tanh is the activation function.
Combining the above three formulas with the cell state C_{t-1} at the previous time, the cell state at the current time is obtained as:
C_t = f_t * C_{t-1} + I_t * C̃_t
where * is the operator denoting the element-wise product.
The output gate in the LSTM network controls whether the long-term state is taken as the current output, and is expressed as:
o_t = σ(W_o · [h_{t-1}, x_t^i] + b_o)
where W_o is the weight matrix of the output gate and b_o is the bias term of the output gate.
Finally, the output state is obtained by:
h_t = o_t * tanh(C_t)
the DKT model can comprehensively consider the exercise performance of the learner for a long time and the recent exercise performance, therebyThe potential knowledge level of the learner is determined. And wherein the design of the forgetting gate conforms to the feature that the learner will decrease over time, with a gradual decrease in the level of mastery of previously learned knowledge. The present embodiment marks the output of the LSTM layer as the knowledge level at the potential N knowledge points of learner i asWhich is used as part of the learner's profile to enhance the performance of the recommendation.
Step 2: build and train the personalized recommendation model, which comprises three parts: an Embedding layer, a GRU layer and a fully connected layer. The Embedding layer maps the one-hot vectors of the problems done by learner i to a low-dimensional vector space for encoding. The GRU layer is a gated recurrent unit layer, an improved recurrent neural network, used to extract the sequence features of the problem record. The fully connected layer computes, from the features of learner i, the probability that the learner selects each problem, and problems are recommended to the learner according to the magnitude of these selection probabilities. The personalized recommendation model has two functions: first, it provides the reward function for the problem record modification model; second, it recommends problems for learners. The structure of the personalized recommendation model is shown in FIG. 5; the specific steps are as follows.
Step 2-1: through the Embedding layer, map the one-hot vector of each problem e_t^i in the problem record E_i of learner i to a low-dimensional vector space for encoding; the output is the low-dimensional vector v_t^i.
Step 2-2: and extracting sequence features of the problem records through the GRU layer.
The GRU layer has only two gates, an update gate and a reset gate. The GRU layer computes the outputs of the reset gate and the update gate from the input at the current time and the hidden state of the network at the previous time, computes the candidate hidden state from the current input and the output of the reset gate, obtains the final hidden state from the candidate hidden state and the output of the update gate, and obtains the output at the current time from the hidden state.
The update gate of the GRU determines how much of the state information at the previous time and at the current time continues to be passed into the future, and is computed as:
z_t = σ(W_z · [h_{t-1}, v_t^i])
where v_t^i is the low-dimensional vector representation of the problem done by learner i at time t, h_{t-1} is the hidden state information at time t-1, W_z is the weight coefficient of the update gate, and σ(·) is the sigmoid activation function.
The reset gate of the GRU layer determines how much of the state information of the previous time is to be forgotten, and is computed as:
r_t = σ(W_r · [h_{t-1}, v_t^i])
where W_r is the weight coefficient of the reset gate.
The current memory content is computed as:
h̃_t = tanh(W_h · [r_t * h_{t-1}, v_t^i])
where W_h is the weight coefficient of the reset-gate branch; the element-wise product of the reset gate r_t and the hidden state information h_{t-1} determines the information retained from the previous time, and * is the operator denoting the element-wise product.
The final memory of the current time step is computed as:
h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t
where (1 - z_t) * h_{t-1} represents how much of the information from the previous time is retained in the final memory at the current time, and z_t * h̃_t represents how much of the current memory content is retained in the final memory at the current time. The resulting h_t is the sequence feature of the learner's problem record.
Step 2-3: through the fully connected layer, compute the probability that the learner selects each problem from the learner's features, as follows:
y = softmax(W_j · [K_i, h_t] + b_j)
where W_j is the weight coefficient of the fully connected layer, b_j is the bias coefficient of the fully connected layer, and K_i = (k_1^i, k_2^i, ..., k_N^i) is the potential knowledge level of learner i computed by the DKT model; [K_i, h_t] is the concatenation of the potential knowledge level of learner i with the sequence feature h_t of the learner's problem record obtained by the GRU layer; softmax(·) is an activation function that limits the output values to between 0 and 1.
Step 2-4: the personalized recommendation model uses cross entropy as the loss function for training and updating, computed as:
L = -Σ_{i=1}^{M} p_i · log(q_i)
where M is the number of learners, p_i is the true probability distribution of the problem selected by learner i at the next moment, and q_i is the predicted probability distribution, given by the personalized recommendation model, of the problem selected by learner i at the next moment. The cross entropy loss function measures the difference between the true probability distribution p and the model's predicted probability distribution q.
Step 2-5: sort the probabilities, computed by the personalized recommendation model, of learner i selecting each problem in descending order, and recommend the top K problems to learner i to form the problem recommendation list.
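As a concrete reading of steps 2-1 to 2-5, the following is a minimal PyTorch sketch of the Embedding, GRU and fully connected recommendation model, with the DKT knowledge level concatenated before the final layer. The class name, layer sizes and the top-K helper are illustrative assumptions, not taken from the patent text.

```python
import torch
import torch.nn as nn

class PersonalizedRecommender(nn.Module):
    """Sketch of the Embedding -> GRU -> fully connected recommendation model."""

    def __init__(self, num_problems: int, embed_dim: int = 64,
                 hidden_size: int = 128, knowledge_dim: int = 128):
        super().__init__()
        self.embedding = nn.Embedding(num_problems, embed_dim)          # step 2-1
        self.gru = nn.GRU(embed_dim, hidden_size, batch_first=True)     # step 2-2
        self.fc = nn.Linear(hidden_size + knowledge_dim, num_problems)  # step 2-3

    def forward(self, problem_ids, knowledge_level):
        # problem_ids: (batch, seq_len) problem record E_i
        # knowledge_level: (batch, knowledge_dim) potential knowledge level K_i from DKT
        v = self.embedding(problem_ids)          # low-dimensional vectors v_t^i
        h, _ = self.gru(v)
        h_t = h[:, -1, :]                        # sequence feature of the problem record
        logits = self.fc(torch.cat([knowledge_level, h_t], dim=-1))
        return torch.softmax(logits, dim=-1)     # probability of selecting each problem

def recommend_top_k(probs: torch.Tensor, k: int = 10):
    """Step 2-5: rank problems by selection probability and return the top K."""
    return torch.topk(probs, k, dim=-1).indices

# Usage sketch: 4 learners, problem records of length 20, M = 50 problems.
model = PersonalizedRecommender(num_problems=50)
probs = model(torch.randint(0, 50, (4, 20)), torch.zeros(4, 128))
print(recommend_top_k(probs, k=5))
```

Training this sketch with the cross-entropy loss of step 2-4 would compare these output probabilities against the problem the learner actually selected at the next moment.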
Step 3: build and train the problem record modification model to remove disliked or unsatisfactory problems that the learner selected by mistake during the learning process, so that problems can be recommended to the learner more accurately. Since the problem record modification model adopts reinforcement-learning-related algorithms, its action representation, state representation, reward function and reinforcement learning algorithm are described in detail following the general development flow of reinforcement learning.
(1) Action representation
The problem record modification model is used to delete the problems that the learner dislikes or is unsatisfied with, so the action a_t of each step takes only two values: a_t = 0 means the problem is deleted from the problem record, and a_t = 1 means the problem is retained in the problem record.
(2) State representation
The status of the learner is represented by the following formula:
S = [k_1, k_2, ..., k_N, p_1, p_2, ..., p_N]
where k_1, k_2, ..., k_N represent the potential knowledge level of the learner, given by the knowledge tracking model; p_1, p_2, ..., p_N are the low-dimensional vector representation of the learner's problem record together with a position identifier whose function is to record the position being modified.
(3) Reward function
The reward function of the reinforcement learning module is given by the personalized recommendation model: it compares p(e_target | Ê_i), the probability of selecting the target problem based on the modified problem record Ê_i, with p(e_target | E_i), the probability of selecting the target problem based on the original problem record E_i, where e_target is the problem actually selected by the learner at the next moment. The reinforcement learning module adopts an episodic update strategy: the reward is obtained only after the modification of a learner's entire learning record is completed, and is 0 at all other times.
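The sketch below illustrates one way such a reward could be computed from the recommendation model, rewarding modifications that raise the probability of the problem the learner actually chose next. The difference form and the function signature are illustrative assumptions, since the patent text only states that the two probabilities are compared.

```python
import torch

def episode_reward(model, original_record, modified_record, knowledge_level,
                   target_problem: int) -> float:
    """Reward for one finished episode of problem-record modification.

    Assumption: reward = p(e_target | modified record) - p(e_target | original record),
    i.e. positive when the modification makes the true next problem more likely.
    `model` is a recommender like the sketch above, returning per-problem probabilities.
    """
    with torch.no_grad():
        p_modified = model(modified_record, knowledge_level)[0, target_problem]
        p_original = model(original_record, knowledge_level)[0, target_problem]
    return float(p_modified - p_original)

# Per-step rewards within the episode are 0; only the final step receives this value.
```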
(4) Reinforcement learning algorithm
This embodiment adopts the Deep Q-Network (DQN) algorithm, which combines a neural network with the Q-learning algorithm of traditional reinforcement learning. The structure of the DQN is shown in FIG. 6.
The reinforcement learning module takes the square of the difference between the true value and the predicted value as the loss function to train and update the parameters of the DQN model:
L(θ) = (y_t^DQN - Q_θ(s_t, a_t))^2
where Q_θ(s_t, a_t) is the predicted reward value of selecting action a_t in state s_t, computed by the prediction Q network whose network parameters are θ, and y_t^DQN is the true value of the reward obtainable by selecting action a_t in state s_t, computed by the target Q network as y_t^DQN = r_t + γ · max_{a'} Q_{θ'}(s_{t+1}, a'), where max_{a'} Q_{θ'}(s_{t+1}, a') is the maximum reward value obtainable in the next state s_{t+1}, θ' are the network parameters of the target Q network, γ is the discount factor, and r_t is the currently obtainable reward value given by the reward function.
The gradient of the loss function is
∇_θ L(θ) = -2 · (y_t^DQN - Q_θ(s_t, a_t)) · ∇_θ Q_θ(s_t, a_t)
and the network parameters are updated by gradient descent.
The specific procedure of the learner problem record modification is as follows:
Step 3-1: initialize the model, including the parameters of the prediction Q network and the target Q network; initialize the experience replay pool with capacity N; initialize the set Ê of learner-modified problem records as empty, the learner index i = 1 and the time t = 0;
Step 3-2: obtain the problem record E_i of learner i and the initial state s_0;
Step 3-3: take the feature vector φ(s_t) of state s_t as the input of the prediction Q network and obtain the Q values corresponding to the actions in the current state;
Step 3-4: select the action a_t from the current Q values using the ε-greedy strategy;
Step 3-5: if a_t = 0, delete the problem e_t^i at the current position from the problem record E_i;
Step 3-6: in state s_t, execute the current action a_t to obtain the next state s_{t+1} and the reward r_t;
Step 3-7: store the quadruple {s_t, a_t, r_t, s_{t+1}} in the experience replay pool;
Step 3-8: update the state s_t = s_{t+1};
Step 3-9: sample m samples {s_j, a_j, r_j, s_{j+1}}, j = 1, 2, ..., m, from the experience replay pool and compute the current target Q value y_j = r_j + γ · max_{a'} Q_{θ'}(φ(s_{j+1}), a');
Step 3-10: update the parameters of the prediction Q network using the mean square error loss function (1/m) Σ_{j=1}^{m} (y_j - Q_θ(φ(s_j), a_j))^2;
Step 3-11: every C steps, update the parameters of the target Q network to the current parameters of the prediction Q network;
Step 3-12: judge whether the time has reached the set value T; if not, return to Step 3-3; if so, execute the next step;
Step 3-13: denote the modified problem record E_i as Ê_i and add Ê_i to the set Ê of learner-modified problem records;
Step 3-14: judge whether the problem records of all learners have been modified; if not, return to Step 3-2 and continue with the modification of the next learner's problem record; if so, end this step.
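The following PyTorch sketch ties Steps 3-1 to 3-13 together for a single learner's episode: a small prediction/target Q-network pair, ε-greedy selection over the keep/delete actions, an experience replay buffer, and the mean-square-error update. Network sizes, hyperparameters and the environment helper `env` are illustrative assumptions, not values from the patent.

```python
import random
from collections import deque

import torch
import torch.nn as nn

class QNet(nn.Module):
    """Prediction / target Q network over the state feature vector phi(s_t)."""
    def __init__(self, state_dim: int, num_actions: int = 2):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                                 nn.Linear(64, num_actions))

    def forward(self, phi_s):
        return self.net(phi_s)

def modify_record_episode(env, q_net, target_net, buffer: deque, optimizer,
                          horizon: int, gamma: float = 0.9, eps: float = 0.1,
                          batch_size: int = 32, sync_every: int = 50):
    """One episode of problem-record modification for a single learner.

    `env` is an assumed helper exposing reset() -> s_0, step(a_t) -> (s_{t+1}, r_t)
    and modified_record(); a_t = 0 deletes the current problem, a_t = 1 keeps it.
    """
    s = env.reset()                                         # Step 3-2
    for t in range(horizon):                                # until time T (Step 3-12)
        q_values = q_net(s)                                 # Step 3-3
        if random.random() < eps:                           # Step 3-4: epsilon-greedy
            a = random.randrange(2)
        else:
            a = int(torch.argmax(q_values).item())
        s_next, r = env.step(a)                             # Steps 3-5 / 3-6
        buffer.append((s, a, r, s_next))                    # Step 3-7
        s = s_next                                          # Step 3-8

        if len(buffer) >= batch_size:                       # Step 3-9
            batch = random.sample(list(buffer), batch_size)
            states = torch.stack([b[0] for b in batch])
            actions = torch.tensor([b[1] for b in batch])
            rewards = torch.tensor([b[2] for b in batch], dtype=torch.float32)
            next_states = torch.stack([b[3] for b in batch])
            with torch.no_grad():
                y = rewards + gamma * target_net(next_states).max(dim=1).values
            q = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
            loss = ((y - q) ** 2).mean()                    # Step 3-10: MSE loss
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        if t % sync_every == 0:                             # Step 3-11
            target_net.load_state_dict(q_net.state_dict())
    return env.modified_record()                            # Step 3-13

# Usage sketch (with an assumed env): q = QNet(state_dim); tgt = QNet(state_dim)
# tgt.load_state_dict(q.state_dict()); opt = torch.optim.Adam(q.parameters(), lr=1e-3)
# buf = deque(maxlen=10000); modified = modify_record_episode(env, q, tgt, buf, opt, horizon=100)
```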
Step 4: jointly train the personalized recommendation model and the problem record modification model to obtain optimal model parameters and improve the accuracy of problem recommendation. The joint training process of the personalized problem recommendation model based on the reinforcement learning algorithm provided in this embodiment is as follows.
Step 4-1: initialize the parameters α = α_0 of the personalized recommendation model, the parameters β = β_0 of the knowledge tracking model and the parameters θ = θ_0 of the reinforcement learning module;
Step 4-2: train the knowledge tracking model using the learners' exercise records X;
Step 4-3: train the personalized recommendation model using the learners' problem records E and the knowledge tracking model;
Step 4-4: fix the parameters α = α_1 of the personalized recommendation model and the parameters β = β_1 of the knowledge tracking model, and pre-train the reinforcement learning module; the specific method is:
Step 4-4-1: the reinforcement learning algorithm selects an action at each step of the problem record E_i;
Step 4-4-2: compute the reward function Reward according to the selected action;
Step 4-4-3: update the parameters of the reinforcement learning module according to the loss function of the Deep Q-Learning algorithm;
Step 4-4-4: execute Steps 4-4-1 to 4-4-3 in a loop until the whole problem record E_i has been traversed;
Step 4-4-5: repeat Steps 4-4-1 to 4-4-4 until the parameters of the reinforcement learning module reach their optimum;
Step 4-5: fix the parameters β = β_1 of the knowledge tracking model and jointly train the personalized recommendation model and the reinforcement learning module; the specific method is:
Step 4-5-1: the reinforcement learning algorithm selects an action at each step of the problem record E_i;
Step 4-5-2: compute the reward function Reward according to the selected action;
Step 4-5-3: update the parameters of the reinforcement learning module according to the loss function of the Deep Q-Learning algorithm;
Step 4-5-4: execute Steps 4-5-1 to 4-5-3 in a loop until the whole problem record E_i has been traversed;
Step 4-5-5: update the parameters of the recommendation model according to the loss function of the recommendation model;
Step 4-5-6: repeatedly execute Steps 4-5-1 to 4-5-5 in a loop until the parameters of the personalized recommendation model and the reinforcement learning module reach their optimum.
Step 5: modify the learner's problem records using the problem record modification model obtained from the joint training in Step 4, and recommend problems to the learner using the personalized recommendation model obtained from the joint training in Step 4, so as to obtain the problem recommendation list.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the corresponding technical solutions, which are defined by the scope of the appended claims.
Claims (7)
1. A personalized problem recommendation method based on reinforcement learning is characterized in that: the method comprises the following steps:
step 1: calculating potential knowledge level of the learner by using the knowledge tracking model, and adding the potential knowledge level into the characteristic construction of the personalized recommendation model and the state representation of the problem record modification model;
step 2: constructing and training a personalized recommendation model for problem recommendation;
step 3: designing and training a problem record modification model based on the reinforcement learning Deep Q-Learning algorithm to remove disliked or unsatisfactory problems selected by mistake during the learning process;
step 4: performing joint training on the personalized recommendation model and the problem record modification model;
step 5: modifying the problem records of the learner using the problem record modification model obtained from the joint training in step 4, and recommending problems to the learner using the personalized recommendation model obtained from the joint training in step 4, so as to obtain a problem recommendation list.
2. The reinforcement learning-based personalized problem recommendation method of claim 1, wherein: in step 1, the knowledge tracking model is the deep knowledge tracking model DKT; using a long short-term memory network LSTM, the DKT model exploits the temporal relationship in the learner's historical learning record to predict the learner's score on the next question; the DKT model first converts the learner's historical performance into one-hot vectors through one-hot encoding and feeds them into the LSTM network, features are extracted by the LSTM layer, the extracted features are passed to a hidden layer, and the prediction is produced by the output layer; the output of the DKT model represents the probability that the learner answers each problem correctly, i.e. the learner's performance on the next answer; the output of the LSTM layer is taken as the learner's potential knowledge level and is added to the feature construction of the personalized recommendation model and the state representation of the problem record modification model; the input of the DKT model is the learner's exercise record X_i = {x_1^i, x_2^i, ..., x_t^i}, where the exercise record of learner i at time t is denoted x_t^i = (e_t^i, a_t^i), e_t^i indicates the problem selected by learner i at time t, and a_t^i indicates the answer result of learner i at time t; the problem record E_i = {e_1^i, e_2^i, ..., e_t^i} contains only the problems that learner i selected to learn, whereas the exercise record X_i also records learner i's answer results.
3. The reinforcement learning-based personalized problem recommendation method of claim 1, wherein: the personalized recommendation model in step 2 comprises three parts: an Embedding layer, a GRU layer and a fully connected layer; the Embedding layer maps the one-hot vectors of the problems the learner has done to a low-dimensional vector space for encoding; the GRU layer is a gated recurrent unit layer, an improved recurrent neural network, used to extract the sequence features of the problem record; the fully connected layer computes, from the learner's features, the probability that the learner selects each problem, and problems are recommended to the learner according to the magnitude of these selection probabilities.
4. The reinforcement learning-based personalized problem recommendation method of claim 3, wherein: the specific method of step 2 is as follows:
step 2-1: through the Embedding layer, mapping the one-hot vector of each problem e_t^i in the problem record E_i of learner i to a low-dimensional vector space for encoding; the output is the low-dimensional vector v_t^i;
step 2-2: extracting the sequence features of the problem record through the GRU layer;
the update gate of the GRU determines how much of the state information at the previous time and at the current time continues to be passed into the future, and is computed as:
z_t = σ(W_z · [h_{t-1}, v_t^i])
where v_t^i is the low-dimensional vector representation of the problem done by learner i at time t, h_{t-1} is the hidden state information at time t-1, W_z is the weight coefficient of the update gate, and σ(·) is the sigmoid activation function;
the reset gate of the GRU layer determines how much of the state information of the previous time is to be forgotten, and is computed as:
r_t = σ(W_r · [h_{t-1}, v_t^i])
where W_r is the weight coefficient of the reset gate;
the current memory content is computed as:
h̃_t = tanh(W_h · [r_t * h_{t-1}, v_t^i])
where W_h is another weight coefficient of the reset gate; the element-wise product of the reset gate r_t and the hidden state information h_{t-1} determines the information retained from the previous time, and * is the operator denoting the element-wise product;
the final memory of the current time step is computed as:
h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t
where (1 - z_t) * h_{t-1} represents how much of the information from the previous time is retained in the final memory at the current time, and z_t * h̃_t represents how much of the current memory content is retained in the final memory at the current time; the resulting h_t is the sequence feature of the learner's problem record;
step 2-3: computing, through the fully connected layer, the probability that the learner selects each problem from the learner's features, as follows:
y = softmax(W_j · [K_i, h_t] + b_j)
where W_j is the weight coefficient of the fully connected layer, b_j is the bias coefficient of the fully connected layer, and K_i = (k_1^i, k_2^i, ..., k_N^i) is the potential knowledge level of learner i computed by the DKT model; [K_i, h_t] is the concatenation of the potential knowledge level of learner i with the sequence feature h_t of the learner's problem record obtained by the GRU layer; softmax(·) is an activation function that limits the output values to between 0 and 1;
step 2-4: the personalized recommendation model uses cross entropy as the loss function for training and updating, computed as:
L = -Σ_{i=1}^{M} p_i · log(q_i)
where M is the number of learners, p_i is the true probability distribution of the problem selected by learner i at the next moment, and q_i is the predicted probability distribution, given by the personalized recommendation model, of the problem selected by learner i at the next moment;
the cross entropy loss function measures the difference between the true probability distribution p and the model's predicted probability distribution q;
step 2-5: sorting the probabilities, computed by the personalized recommendation model, of learner i selecting each problem in descending order, and recommending the top K problems to learner i to form the problem recommendation list.
5. The reinforcement learning-based personalized problem recommendation method of claim 4, wherein: the problem record modification model in the step 3 adopts a reinforcement learning related algorithm, comprising action representation, state representation, rewarding function of the model and reinforcement learning algorithm, and specifically comprises the following steps:
in order to delete problems that are disliked or unsatisfied in the learning process of the learner, the action a of each step t With only two values, a t =0 means that the problem is deleted in the problem record, a t =1 means that the problem is retained in the problem record;
the status of the learner is represented by the following formula:
S=[k 1 ,k 2 ,…,k N ,p 1 ,p 2 ,…,p N ]
wherein ,k1 ,k 2 ,…,k N Representing potential knowledge levels of learners, specifically to the ith learner as Given by a knowledge tracking model; p is p 1 ,p 2 ,…,p N Is a low-dimensional vector representation of learner problem records and location identifiers that function to record the location of modifications;
the reward function of the reinforcement learning module is given by the personalized recommendation model and is defined in terms of p(e_target | Ê_i) and p(e_target | E_i), wherein e_target is the problem actually selected by the learner at the next moment, p(e_target | Ê_i) represents the probability of selecting the target problem based on the modified problem record, and p(e_target | E_i) represents the probability of selecting the target problem based on the original problem record; the reinforcement learning module adopts an episodic (round-update) strategy: the reward is obtained only after the modification of one learner's entire learning record is completed, and is 0 at all other times;
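The exact functional form of the reward is not reproduced here; a common choice consistent with the description is a log-ratio of the two probabilities, sketched below purely as an assumption (positive when the modified record raises the probability of the problem the learner actually chose):

```python
import math

def episode_reward(p_target_modified, p_target_original, eps=1e-12):
    """End-of-episode reward; intermediate steps receive 0 under the round-update strategy."""
    return math.log(p_target_modified + eps) - math.log(p_target_original + eps)
```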
the reinforcement learning algorithm adopts the deep Q-network algorithm DQN, which combines a neural network with the Q-Learning algorithm from traditional reinforcement learning;
the reinforcement learning module takes the square of the difference between the true value and the predicted value as the loss function to train and update the parameters of the DQN model; the specific formula of the loss function is as follows:

L(θ) = (y_t - Q_θ(s_t, a_t))²

wherein Q_θ(s_t, a_t) represents the predicted value of the reward obtained by selecting action a_t in state s_t, calculated by the prediction Q network, whose network parameters are θ; y_t = r_t + γ · max_{a'} Q_{θ'}(s_{t+1}, a') represents the true value of the reward obtainable by selecting action a_t in state s_t, where max_{a'} Q_{θ'}(s_{t+1}, a') is calculated by the target Q network and represents the maximum reward value obtainable in the next state s_{t+1}, the network parameters of the target Q network are θ', γ is the discount factor, and r_t is the currently available reward value, given by the reward function;
the gradient of the loss function is as follows:

∇_θ L(θ) = -2 · (y_t - Q_θ(s_t, a_t)) · ∇_θ Q_θ(s_t, a_t)

and the network parameters are updated according to gradient descent.
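A minimal PyTorch sketch of this loss, assuming pred_net and target_net map a batch of states to per-action Q values and that gamma is the discount factor (illustrative only):

```python
import torch
import torch.nn.functional as F

def dqn_loss(pred_net, target_net, s_t, a_t, r_t, s_next, gamma=0.99):
    """Squared error between the target value y_t and the predicted Q value."""
    q_pred = pred_net(s_t).gather(1, a_t.unsqueeze(1)).squeeze(1)    # Q_theta(s_t, a_t)
    with torch.no_grad():
        q_next = target_net(s_next).max(dim=1).values                # max_a' Q_theta'(s_{t+1}, a')
        y_t = r_t + gamma * q_next                                   # target ("true") value
    return F.mse_loss(q_pred, y_t)                                   # minimized by gradient descent on theta
```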
6. The reinforcement learning-based personalized problem recommendation method of claim 5, wherein: the specific process of modifying the learners' problem records in step 3 is as follows:
step 3-1: initialize the model, including the parameters of the prediction Q network and the target Q network; initialize the experience replay pool with capacity N; initialize the set of learner-modified problem records as empty; set the learner index i = 1 and the time t = 0;
step 3-2: obtain the problem record E_i of learner i and the initial state s_0;
step 3-3: take the feature vector φ(s_t) of state s_t as the input of the prediction Q network, and obtain the Q value corresponding to each action in the current state;
step 3-4: select action a_t from the current Q values using the ε-greedy strategy;
step 3-5: if a_t = 0, delete the current problem from the problem record E_i of learner i;
step 3-6: execute the current action a_t in state s_t to obtain the next state s_{t+1} and the reward r_t;
step 3-7: store the quadruple {s_t, a_t, r_t, s_{t+1}} in the experience replay pool;
step 3-8: update the state: s_t = s_{t+1};
Step 3-9: sampling m samples { s } from an empirical playback pool j ,a j ,r j ,s j+1 J=1, 2, …, m, calculating the current target Q value y j :
Step 3-10: using a mean square error loss functionUpdating parameters of the predictive Q network;
step 3-11: updating parameters of the target Q network after each step C, wherein the parameter value is the parameter value of the current predicted Q network;
step 3-12: judging whether the moment reaches a set value T or not; if not, returning to the step 3-3; if so, executing the next step;
step 3-13: record E of the problem after modification i Is marked asWill->Add to learner modified problem record set +.>
Step 3-14: judging whether all the problem records of the learners are modified, if not, returning to the step 3-2, continuing the modification of the problem records of the next learner, and if so, ending the step.
7. The reinforcement learning-based personalized problem recommendation method of claim 6, wherein: the process of the joint training in step 4 is specifically as follows:
step 4-1: initialize the parameters of the personalized recommendation model α = α_0, the parameters of the knowledge tracking model β = β_0, and the parameters of the reinforcement learning module θ = θ_0;
step 4-2: train the knowledge tracking model using the learners' exercise records;
step 4-3: train the personalized recommendation model using the learners' problem records and the knowledge tracking model;
step 4-4: fix the parameters of the personalized recommendation model α = α_1 and the parameters of the knowledge tracking model β = β_1, and pre-train the reinforcement learning module; the specific method is as follows:
step 4-4-1: the reinforcement learning algorithm selects actions step by step on the problem records;
step 4-4-2: calculate the reward function Reward according to the selected actions;
step 4-4-3: update the parameters of the reinforcement learning module according to the loss function of the Deep Q-Learning algorithm;
step 4-4-4: cyclically execute steps 4-4-1 to 4-4-3 until all problem records have been processed;
step 4-4-5: repeat steps 4-4-1 to 4-4-4 until the parameters of the reinforcement learning module reach their optimal values;
step 4-5: fix the parameters of the knowledge tracking model β = β_1, and jointly train the personalized recommendation model and the reinforcement learning module; the specific method is as follows:
step 4-5-1: the reinforcement learning algorithm selects actions step by step on the problem records;
step 4-5-2: calculate the reward function Reward according to the selected actions;
step 4-5-3: update the parameters of the reinforcement learning module according to the loss function of the Deep Q-Learning algorithm;
step 4-5-4: cyclically execute steps 4-5-1 to 4-5-3 until all problem records have been processed;
step 4-5-5: update the parameters of the personalized recommendation model according to its loss function;
step 4-5-6: repeatedly execute steps 4-5-1 to 4-5-5 until the parameters of the personalized recommendation model and the reinforcement learning module reach their optimal values.
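A high-level sketch of the step 4 training schedule; every callable (train_kt, train_rec, pretrain_rl, rl_update, rec_update) is an assumed interface wrapping the corresponding model update, and the epoch counts are arbitrary placeholders:

```python
def joint_training(records, train_kt, train_rec, pretrain_rl, rl_update, rec_update,
                   rl_pretrain_epochs=10, joint_epochs=10):
    """Step 4: train the knowledge tracking and recommendation models, pre-train the RL
    module with alpha and beta fixed, then jointly train the recommender and RL module."""
    train_kt(records)                           # step 4-2
    train_rec(records)                          # step 4-3
    for _ in range(rl_pretrain_epochs):         # step 4-4: alpha, beta fixed
        for record in records:
            pretrain_rl(record)                 # select actions, compute Reward, DQN update
    for _ in range(joint_epochs):               # step 4-5: beta fixed
        for record in records:
            rl_update(record)                   # steps 4-5-1 to 4-5-3
        rec_update(records)                     # step 4-5-5: recommendation-model loss update
```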
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310703313.2A CN116680477A (en) | 2023-06-14 | 2023-06-14 | Personalized problem recommendation method based on reinforcement learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116680477A true CN116680477A (en) | 2023-09-01 |
Family
ID=87787013
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310703313.2A Pending CN116680477A (en) | 2023-06-14 | 2023-06-14 | Personalized problem recommendation method based on reinforcement learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116680477A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116720007A (en) * | 2023-08-11 | 2023-09-08 | 河北工业大学 | Online learning resource recommendation method based on multidimensional learner state and joint rewards |
CN116720007B (en) * | 2023-08-11 | 2023-11-28 | 河北工业大学 | Online learning resource recommendation method based on multidimensional learner state and joint rewards |
CN118313975A (en) * | 2024-04-18 | 2024-07-09 | 华南师范大学 | Exercise path recommending method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||