WO2021068638A1 - Interactive reinforcement learning method combining the TAMER framework and facial expression feedback - Google Patents

Interactive reinforcement learning method combining the TAMER framework and facial expression feedback

Info

Publication number
WO2021068638A1
Authority
WO
WIPO (PCT)
Prior art keywords
tamer
agent
feedback
reward
facial expression
Prior art date
Application number
PCT/CN2020/108156
Other languages
English (en)
Chinese (zh)
Inventor
李光亮
林金莹
张期磊
何波
冯晨
Original Assignee
中国海洋大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国海洋大学
Publication of WO2021068638A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2178 Validation; Performance evaluation; Active pattern learning techniques based on feedback of a supervisor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition

Definitions

  • the present invention relates to the field of artificial intelligence technology, in particular to an interactive reinforcement learning method combining TAMER framework and facial expression feedback.
  • TAMER is a typical interactive reinforcement learning method.
  • the system can learn a prediction model of human user rewards. This model can successfully train TAMER agents even when human rewards are delayed or inconsistent.
  • TAMER agents can understand the intentions of human users and adapt to their preferences.
  • human user preferences are conveyed through clear instructions or expensive corrective feedback, such as through predefined words or sentences, buttons, mouse clicks, etc.
  • These feedback methods increase the cognitive load of human users.
  • The problem in the prior art is that adjusting the behavior of the TAMER agent through explicit feedback, such as predefined keyboard feedback, increases the cognitive burden on human users, and strategy updates require a large number of interactions, which increases the cost of learning.
  • The direct benefit is a reduction in the amount of explicit feedback required during TAMER agent training, which reduces the cognitive burden on human users.
  • the purpose of the present invention is to provide an interactive reinforcement learning method that combines the TAMER framework and facial expression feedback, and combines explicit feedback and facial expression feedback to perform learning on the TAMER framework.
  • an interactive reinforcement learning method combining TAMER framework and facial expression feedback including:
  • combining the TAMER framework and facial expression evaluation to form a Face Valuing-TAMER agent; the Face Valuing-TAMER agent anticipates future rewards by learning a value function from human feedback;
  • the combination of the TAMER framework and facial expression evaluation to form the Face Valuing-TAMER agent is specifically: the trainer trains the TAMER agent under the TAMER framework, uses keyboard key feedback to determine the keyboard reward signal, and trains the TAMER agent to obtain an initial executable strategy;
  • the trainer determines the facial reward signal through facial expression feedback to adjust the behavior strategy of the TAMER agent.
  • the trainer trains the TAMER agent under the TAMER framework, determines keyboard reward signals through keyboard key feedback, and trains the TAMER agent to obtain an initial executable strategy, which specifically includes:
  • the trainer observes the current action of the TAMER agent, and feeds it back through a keyboard interface, obtains a keyboard feedback signal, and determines a keyboard reward signal according to the keyboard feedback signal;
  • An initial executable strategy is determined according to the keyboard feedback signal and the keyboard reward signal.
  • enabling the trainer to determine the facial reward signal through facial expression feedback to adjust the behavior strategy of the TAMER agent specifically includes:
  • the updated value function includes a state value function and an action value function;
  • the trainer obtains a facial feedback signal through facial expression feedback, and determines a facial reward signal according to the facial feedback signal;
  • the updated value function is:
  • the Face Valuing-TAMER agent predicts future rewards by learning a value function from human feedback, which specifically includes:
  • a TAMER agent learns a reward model
  • the reward model is defined as the expected human reward for the current state and action, where the reward is the signal the agent receives after taking action A_t in any state S_t;
  • the TAMER agent chooses the action with the maximum expected reward:
  • the trainer observes and evaluates the behavior the TAMER agent chose for its maximum expected reward, and rewards it.
  • An information data processing terminal applied to the interactive reinforcement learning method combining TAMER framework and facial expression feedback.
  • Compared with the prior art, the present invention has the following advantages: it introduces the human user's facial expression feedback into the TAMER framework. The human user first provides feedback through the keyboard or other interactive interfaces to train the TAMER agent. After the agent learns an initial strategy, the human user adjusts the agent's behavior through facial expression feedback. This process reduces the cognitive burden on the human user and frees the user from the heavy feedback task. The method supplements existing interactive machine learning methods and helps to further improve the efficiency of interaction between the agent and the human user.
  • the interactive reinforcement learning method of the present invention combining the TAMER framework and facial expression feedback can reduce the cognitive burden of the human user in the process of training the agent, so that the agent can better understand human preferences, and can effectively learn from human rewards.
  • Figure 5 shows the number of time steps required in each episode during training, for keyboard feedback training and for facial expression feedback training.
  • The histogram shows the average value and standard deviation for each episode, and the table shows the average value;
  • Figure 6 shows the number of feedback signals required in each episode during training, for keyboard feedback training and for training with facial expression feedback introduced.
  • The histogram shows the average value and standard deviation for each episode, and the table shows the average value.
  • FIG. 1 is a flowchart of an interactive reinforcement learning method combining TAMER framework and facial expression feedback provided by an embodiment of the present invention
  • FIG. 2 is an implementation flowchart of an interactive reinforcement learning method combining TAMER framework and facial expression feedback provided by an embodiment of the present invention
  • FIG. 3 is a screenshot of an example of the training interface and a Grid World environment task provided by an embodiment of the present invention
  • FIG. 4 is a schematic block diagram of agent interactive reinforcement learning combining TAMER framework and facial expression feedback provided by an embodiment of the present invention
  • FIG. 5 is a comparison diagram of required time steps for keyboard feedback training and facial expression feedback training provided by an embodiment of the present invention
  • FIG. 6 is a comparison diagram of the number of feedback signals required by keyboard feedback training and by training with facial expression feedback introduced, provided by an embodiment of the present invention.
  • the present invention provides an interactive reinforcement learning method combining the TAMER framework and facial expression feedback.
  • the present invention will be described in detail below with reference to the accompanying drawings.
  • the interactive reinforcement learning method combining TAMER framework and facial expression feedback includes the following steps:
  • S101: Face Valuing-TAMER lets the human trainer first train the agent under the TAMER framework; the agent chooses actions according to the current state.
  • S102: The human trainer observes the agent and provides explicit feedback as a reward signal through keyboard keys or other interfaces.
  • S105: The agent obtains an initial executable strategy by learning from keyboard feedback.
  • S106: The human trainer provides rewards through facial expression feedback to adjust the agent's behavior, then checks whether the adjusted strategy is satisfactory; if it is, training ends, and if it is not, the strategy is adjusted again through facial expression feedback.
  • the algorithm for the agent to learn from human feedback includes:
  • TAMER learns a value function through a predictive reward model learned from human feedback:
  • R_{t+i} is the reward obtained by the TAMER agent performing action a in state s at time t+i
  • G_t is the expected return at time t, defined as the discounted sum of rewards after time t
  • v_π(s) is the state value function corresponding to each strategy π, which maps each state s ∈ S to the expected return G_t of that state when following the strategy π;
  • q_π(s, a) is the action value function corresponding to each strategy π, which gives the expected return G_t when executing action a in state s and following the strategy π.
  • the state value function is very important. Conversely, if the given task is a control task, it is very important to use the action value function q_π(s, a). Human trainers can provide reward feedback through keyboard keys or facial expressions to adjust the behavior of the agent.
  • The present invention introduces facial expression recognition feedback on the basis of the TAMER framework, a typical method for an agent to learn from human rewards. The hypothesis is that, once the TAMER agent has learned an initial strategy through keyboard feedback, the amount of explicit feedback needed to adjust it through facial expression feedback is less than the amount the agent needs when learning from keyboard feedback alone. The algorithm is tested in the Grid World task domain and compared with agent learning through the TAMER framework under different discount factors on human rewards. The results show that although training the agent directly through facial expression feedback cannot quickly produce an effective strategy, the method can capture the facial features of human users in real time and adjust the agent's strategy online according to user preferences without changing the model.
  • Figure 4 is a schematic block diagram of agent interactive reinforcement learning combining the TAMER framework and facial expression feedback.
  • The TAMER framework is built on a variant of the Markov decision process (MDP), a model of sequential decision-making that is solved through dynamic programming and reinforcement learning.
  • An agent learns in an MDP without an explicitly defined reward function, denoted MDP\R, and instead learns a reward model.
  • the TAMER agent learns from the real-time evaluation of its behavior by human trainers.
  • the agent interprets this evaluation as a human reward, creates a predictive model, and selects the behavior that it predicts will receive the most human rewards. It strives to maximize the immediate rewards caused by behavior, which is in sharp contrast with traditional reinforcement learning. In traditional reinforcement learning, the agent seeks the largest future reward.
  • First, human rewards can be delivered with a small delay. This delay is the time the trainer needs to evaluate the behavior of the agent and deliver feedback.
  • Second, the assessment provided by the human trainer judges the behavior itself while taking into account a model of its long-term consequences.
  • A TAMER agent learns a reward model that approximates the expected human reward for the current state and action. Given a state s, the agent myopically chooses the action with the largest expected reward. The trainer can then observe and evaluate the behavior of the agent and give rewards.
  • TAMER feedback is given through keyboard input and is attributed to the agent’s recent actions.
  • Each feedback button press is marked as a scalar reward signal (-1 or +1). This signal can also be enhanced by pressing the button multiple times.
  • The label of each sample is the delay-weighted total reward, which is calculated from the probability that a human reward signal targets the specific time step.
  • The TAMER learning algorithm continuously repeats the process of taking actions, perceiving rewards, and updating the reward model.
  • VI-TAMER (a TAMER variant)
  • the agent learns from discounted human rewards using a planning algorithm, value iteration.
  • A VI-TAMER agent learns a value function from the reward function most recently learned by TAMER, and uses that value function to choose the next action.
  • reinforcement learning (RL)
  • The discount factor γ (0 ≤ γ ≤ 1) determines how far the agent looks into the future. Since the discount factor γ of the original TAMER is 0 (myopic), TAMER can be regarded as a special case of VI-TAMER. Therefore, from here on in the present invention, TAMER is used as the general method for agents to learn from human rewards, and γ_TAMER denotes the discount factor on human rewards.
  • the Grid World task contains 30 states. In each state, the agent's movement can be selected from four actions in the action space: move up, down, left or right. The agent cannot pass through the wall, and attempts to pass through the wall will not change the current state of the agent.
  • the task performance index is the time step required to reach the target position from the initial position, that is, the number of actions. As shown in the middle of the screenshot of Figure 3, the small dark gray square is the agent, and the cross indicates the direction of the agent's next movement. In this task, the agent tries to learn a strategy so that it can reach the goal state and minimize the number of time steps. The optimal strategy from the starting state to the target state requires 20 time steps.
  • the current position of the agent is the initial state of the agent, and the target state is the position where the elliptical square in the upper right corner is located.
  • The black line and the light gray squares both indicate walls, and the agent cannot pass directly through this area.
  • a radial basis function effectively creates a pseudo-table centered on each square cell of Grid World, which can be slightly generalized among nearby cells.
  • With Face Valuing-TAMER, the trainer first provides feedback through the keyboard to train the agent to an initial strategy, and then uses facial expression feedback to adjust the strategy the agent has obtained. The performance of the proposed method is to be tested by analyzing data averaged over 20 experiments.
  • γ_TAMER = 0, 0.2, 0.5, 0.8, 1. Note that the value of γ_TAMER is the same in the TAMER module of Face Valuing-TAMER and in the TAMER framework.
  • Using Face Valuing-TAMER to train an agent requires less feedback, especially explicit feedback, than using the TAMER framework to train an agent.
  • The number of time steps at which feedback is received can be counted to compare the total, positive, and negative feedback received by the agent under different discount factors for Face Valuing-TAMER and TAMER. It is expected that the Face Valuing-TAMER agent will receive much less feedback than the TAMER agent. Such a result would show that providing evaluative feedback through facial expressions can reduce the amount of feedback needed to train the agent and effectively reduce the cognitive burden on the humans training it.
  • The research results can show that, compared with learning under the TAMER framework, although facial expression feedback cannot effectively reduce the amount of explicit feedback required (because current facial expression recognition accuracy is only a little over 60%), it can still obtain the same optimal strategy as learning from keyboard feedback. Further improving the accuracy of facial expression recognition can effectively reduce the amount of explicit feedback required to obtain an optimal strategy.
  • the total number of time steps required to train the agent to obtain the optimal strategy can be used as the performance measurement in the experiment.
  • The experiment compares the total number of time steps required to train the Face Valuing-TAMER and TAMER agents to the best strategy using different discount factors on human rewards. It is expected that the total number of time steps required to train a Face Valuing-TAMER agent is much smaller than for the TAMER agent.
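
The equations that the definitions above refer to did not survive in this text. As a reading aid only, the block below writes out the standard textbook forms of the quantities the text names. The reward indexing (R_{t+i+1}, following the usual Sutton and Barto convention) and the symbol \hat{R}_H for the learned human-reward model are assumptions, not notation recovered from the patent.

```latex
\begin{align}
  % Discounted return after time t (indexing convention assumed)
  G_t &= \sum_{i=0}^{\infty} \gamma^{i} R_{t+i+1}, \qquad 0 \le \gamma \le 1 \\
  % State value function of strategy \pi
  v_\pi(s) &= \mathbb{E}_\pi\left[ G_t \mid S_t = s \right] \\
  % Action value function of strategy \pi
  q_\pi(s, a) &= \mathbb{E}_\pi\left[ G_t \mid S_t = s, A_t = a \right] \\
  % Learned human-reward model (symbol assumed) and myopic action choice
  \hat{R}_H(s, a) &\approx \mathbb{E}\left[ R_{t+1} \mid S_t = s, A_t = a \right] \\
  a_t &= \arg\max_{a} \hat{R}_H(s_t, a)
\end{align}
```

With \gamma = 0 the return reduces to the immediate reward, which matches the statement that the original TAMER is the myopic special case of VI-TAMER.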
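
The description of a radial-basis "pseudo-table" reward model with myopic action choice can be sketched as follows. This is a minimal illustration, not the patented implementation: the 5 x 6 grid layout (chosen only because it gives 30 states), the Gaussian width, the learning rate, and all class and function names are assumptions.

```python
import numpy as np

def grid_centers(rows, cols):
    """One RBF center per grid cell, as (row, col) coordinates."""
    return np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)

def rbf_features(state, centers, width=0.5):
    """Gaussian activations of a (row, col) state; nearby cells overlap slightly."""
    diff = centers - np.asarray(state, dtype=float)
    return np.exp(-np.sum(diff ** 2, axis=1) / (2.0 * width ** 2))

class RewardModel:
    """Linear model of human reward with one weight vector per action (assumed form)."""

    def __init__(self, centers, n_actions, lr=0.2):
        self.centers = centers
        self.w = np.zeros((n_actions, len(centers)))
        self.lr = lr

    def predict(self, state):
        """Predicted human reward for every action in this state."""
        return self.w @ rbf_features(state, self.centers)

    def update(self, state, action, human_reward):
        """One gradient step toward the scalar feedback (-1 or +1)."""
        phi = rbf_features(state, self.centers)
        error = human_reward - self.w[action] @ phi
        self.w[action] += self.lr * error * phi

    def greedy_action(self, state):
        """Myopic choice: the action with the largest predicted human reward."""
        return int(np.argmax(self.predict(state)))
```

For example, after a few `update((2, 3), 3, +1.0)` calls the model prefers action 3 in cell (2, 3), and the prediction for the neighboring cell (2, 4) also rises slightly, which is the limited generalization the text attributes to the radial basis functions.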
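
The relationship between TAMER and VI-TAMER described above (value iteration over a learned human-reward model, with γ_TAMER = 0 recovering the myopic case) can be illustrated with a small sketch. The four-state chain, the reward values in `REWARD`, and the deterministic transition table `NEXT_STATE` are invented for the example and do not come from the patent.

```python
import numpy as np

def value_iteration(reward, next_state, gamma, sweeps=200):
    """Action values from a learned reward model under deterministic transitions.

    reward[s, a]     : predicted human reward for action a in state s
    next_state[s, a] : index of the state reached by action a in state s
    """
    q = np.zeros(reward.shape)
    for _ in range(sweeps):
        v = q.max(axis=1)                    # state values under greedy choice
        q = reward + gamma * v[next_state]   # one Bellman backup for every (s, a)
    return q

# Hypothetical chain 0-1-2-3 with actions 0 = left, 1 = right; state 3 is an
# absorbing goal. The simulated trainer rewards the final step into the goal
# and gives a small misleading reward for moving left in state 0.
NEXT_STATE = np.array([[0, 1], [0, 2], [1, 3], [3, 3]])
REWARD = np.array([[0.1, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]])

q_myopic = value_iteration(REWARD, NEXT_STATE, gamma=0.0)   # equals REWARD
q_planned = value_iteration(REWARD, NEXT_STATE, gamma=0.8)
```

With `gamma=0.0` the agent in state 0 follows the immediate reward and moves left; with `gamma=0.8` the discounted reward at the goal dominates and the greedy action in state 0 becomes right.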
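
The two-phase procedure (keyboard feedback to reach an initial strategy, then facial expression feedback to adjust it) can be sketched as a single training loop that accepts either feedback channel. Everything concrete here is an assumption made for illustration: the key bindings, the expression labels and their reward values, the tabular reward model, and the five-cell corridor standing in for Grid World. A real facial expression recognizer would also be noisy, which this sketch does not simulate.

```python
KEY_REWARD = {"+": 1.0, "-": -1.0}                             # assumed key bindings
FACE_REWARD = {"happy": 1.0, "neutral": 0.0, "unhappy": -1.0}  # assumed labels

class TabularRewardModel:
    """Per (state, action) estimate of human reward, updated incrementally."""

    def __init__(self, actions, lr=0.5):
        self.actions = list(actions)
        self.lr = lr
        self.table = {}

    def predict(self, state, action):
        return self.table.get((state, action), 0.0)

    def update(self, state, action, reward):
        old = self.predict(state, action)
        self.table[(state, action)] = old + self.lr * (reward - old)

    def greedy(self, state):
        # Ties resolve to the first listed action.
        return max(self.actions, key=lambda a: self.predict(state, a))

def corridor_step(state, action, length=5):
    """Hypothetical 1-D environment: move left or right, clamped at the ends."""
    delta = 1 if action == "right" else -1
    return min(max(state + delta, 0), length - 1)

def run_phase(model, step, start, goal, feedback, reward_map, episodes, max_steps=50):
    """Run episodes in which feedback(state, action) returns a key or expression label."""
    for _ in range(episodes):
        state = start
        for _ in range(max_steps):
            if state == goal:
                break
            action = model.greedy(state)
            signal = feedback(state, action)
            if signal in reward_map:          # ignore missing or unknown signals
                model.update(state, action, reward_map[signal])
            state = step(state, action)

def keyboard_trainer(state, action):
    """Simulated trainer for phase 1: approves moves toward the goal."""
    return "+" if action == "right" else "-"

def face_trainer(state, action):
    """Simulated trainer for phase 2: same preference, expressed facially."""
    return "happy" if action == "right" else "unhappy"
```

A typical run calls `run_phase` once with `keyboard_trainer` and `KEY_REWARD`, then again with `face_trainer` and `FACE_REWARD` on the same model, repeating the second phase until the trainer is satisfied with the behavior.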

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

Disclosed is an interactive reinforcement learning method combining the Training an Agent Manually via Evaluative Reinforcement (TAMER) framework and facial expression feedback, comprising: forming a Face Valuing-TAMER agent by combining the TAMER framework with facial expression evaluation; and the Face Valuing-TAMER agent anticipating future rewards by learning a value function from human feedback. A human trainer first trains an agent under the TAMER framework, providing a reward signal through keyboard key feedback; once the trained agent has acquired an initial executable strategy, the human trainer provides rewards through facial expression feedback to adjust the agent's behavior. The interactive reinforcement learning method based on facial expression feedback can reduce the cognitive burden on a human user during agent training, so that the agent can better understand human preferences and learn effectively from human rewards.
PCT/CN2020/108156 2019-10-12 2020-08-10 Interactive reinforcement learning method combining the TAMER framework and facial expression feedback WO2021068638A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910967991.3A CN110826723A (zh) 2019-10-12 2019-10-12 Interactive reinforcement learning method combining the TAMER framework and facial expression feedback
CN201910967991.3 2019-10-12

Publications (1)

Publication Number Publication Date
WO2021068638A1 (fr) 2021-04-15

Family

ID=69548992

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/108156 WO2021068638A1 (fr) 2019-10-12 2020-08-10 Interactive reinforcement learning method combining the TAMER framework and facial expression feedback

Country Status (3)

Country Link
CN (1) CN110826723A (fr)
LU (1) LU500028B1 (fr)
WO (1) WO2021068638A1 (fr)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
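CN113657583A (zh) * 2021-08-24 2021-11-16 广州市香港科大霍英东研究院 Big data feature extraction method and system based on reinforcement learning
CN114003121A (zh) * 2021-09-30 2022-02-01 中国科学院计算技术研究所 Data center server energy efficiency optimization method and apparatus, electronic device, and storage medium
CN114371728A (zh) * 2021-12-14 2022-04-19 河南大学 UAV resource scheduling method based on multi-agent collaborative optimization
CN114710792A (zh) * 2022-03-30 2022-07-05 合肥工业大学 Optimized placement method for 5G distribution network distributed protection devices based on reinforcement learning
CN115250156A (zh) * 2021-09-09 2022-10-28 李枫 Wireless network multi-channel spectrum access method based on federated learning
CN115361717A (zh) * 2022-07-12 2022-11-18 华中科技大学 Millimeter-wave access point selection method and system based on VR user viewpoint trajectories
CN116307241A (zh) * 2023-04-04 2023-06-23 暨南大学 Distributed job shop scheduling method based on constrained multi-agent reinforcement learning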

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
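CN110826723A (zh) * 2019-10-12 2020-02-21 中国海洋大学 Interactive reinforcement learning method combining the TAMER framework and facial expression feedback
CN114118434A (zh) * 2020-08-27 2022-03-01 朱宝 Intelligent robot and learning method thereof
CN112859591B (zh) * 2020-12-23 2022-10-21 华电电力科学研究院有限公司 Reinforcement learning control system for energy system operation optimization
CN112818672A (zh) * 2021-01-26 2021-05-18 山西三友和智慧信息技术股份有限公司 Reinforcement learning sentiment analysis system based on text games
CN114355786A (zh) * 2022-01-17 2022-04-15 北京三月雨文化传播有限责任公司 Big-data-based regulation cloud system for multimedia digital exhibition halls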

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
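CN105105771A (zh) * 2015-08-07 2015-12-02 北京环度智慧智能技术研究所有限公司 Cognitive index analysis method for potential value tests
CN105759677A (zh) * 2015-03-30 2016-07-13 公安部第研究所 Multimodal behavior analysis and monitoring system and method suitable for visual terminal workstations
US20190179893A1 (en) * 2017-12-08 2019-06-13 General Electric Company Systems and methods for learning to extract relations from text via user feedback
CN110070185A (zh) * 2019-04-09 2019-07-30 中国海洋大学 Method for interactive reinforcement learning from demonstrations and human evaluative feedback
CN110826723A (zh) * 2019-10-12 2020-02-21 中国海洋大学 Interactive reinforcement learning method combining the TAMER framework and facial expression feedback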

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
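CN109978012A (zh) * 2019-03-05 2019-07-05 北京工业大学 Improved Bayesian inverse reinforcement learning method based on combined feedback
CN110070188B (zh) * 2019-04-30 2021-03-30 山东大学 Incremental cognitive development system and method incorporating interactive reinforcement learning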

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
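CN105759677A (zh) * 2015-03-30 2016-07-13 公安部第研究所 Multimodal behavior analysis and monitoring system and method suitable for visual terminal workstations
CN105105771A (zh) * 2015-08-07 2015-12-02 北京环度智慧智能技术研究所有限公司 Cognitive index analysis method for potential value tests
US20190179893A1 (en) * 2017-12-08 2019-06-13 General Electric Company Systems and methods for learning to extract relations from text via user feedback
CN110070185A (zh) * 2019-04-09 2019-07-30 中国海洋大学 Method for interactive reinforcement learning from demonstrations and human evaluative feedback
CN110826723A (zh) * 2019-10-12 2020-02-21 中国海洋大学 Interactive reinforcement learning method combining the TAMER framework and facial expression feedback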

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
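CN113657583A (zh) * 2021-08-24 2021-11-16 广州市香港科大霍英东研究院 Big data feature extraction method and system based on reinforcement learning
CN115250156A (zh) * 2021-09-09 2022-10-28 李枫 Wireless network multi-channel spectrum access method based on federated learning
CN114003121A (zh) * 2021-09-30 2022-02-01 中国科学院计算技术研究所 Data center server energy efficiency optimization method and apparatus, electronic device, and storage medium
CN114003121B (zh) * 2021-09-30 2023-10-31 中国科学院计算技术研究所 Data center server energy efficiency optimization method and apparatus, electronic device, and storage medium
CN114371728A (zh) * 2021-12-14 2022-04-19 河南大学 UAV resource scheduling method based on multi-agent collaborative optimization
CN114371728B (zh) * 2021-12-14 2023-06-30 河南大学 UAV resource scheduling method based on multi-agent collaborative optimization
CN114710792A (zh) * 2022-03-30 2022-07-05 合肥工业大学 Optimized placement method for 5G distribution network distributed protection devices based on reinforcement learning
CN114710792B (zh) * 2022-03-30 2024-09-06 合肥工业大学 Optimized placement method for 5G distribution network distributed protection devices based on reinforcement learning
CN115361717A (zh) * 2022-07-12 2022-11-18 华中科技大学 Millimeter-wave access point selection method and system based on VR user viewpoint trajectories
CN115361717B (zh) * 2022-07-12 2024-04-19 华中科技大学 Millimeter-wave access point selection method and system based on VR user viewpoint trajectories
CN116307241A (zh) * 2023-04-04 2023-06-23 暨南大学 Distributed job shop scheduling method based on constrained multi-agent reinforcement learning
CN116307241B (zh) * 2023-04-04 2024-01-05 暨南大学 Distributed job shop scheduling method based on constrained multi-agent reinforcement learning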

Also Published As

Publication number Publication date
LU500028B1 (en) 2021-04-23
CN110826723A (zh) 2020-02-21

Similar Documents

Publication Publication Date Title
WO2021068638A1 (fr) Interactive reinforcement learning method combining the TAMER framework and facial expression feedback
Kaufmann et al. A survey of reinforcement learning from human feedback
Bernard et al. Learning style Identifier: Improving the precision of learning style identification through computational intelligence algorithms
CN108415923B (zh) Closed-domain intelligent human-machine dialogue system
CN108647233B (zh) Answer ranking method for question answering systems
US12130603B2 (en) Method and apparatus for controlling smart home
CN108664589A (zh) Text information extraction method, apparatus, system, and medium based on domain adaptation
CN114999610B (zh) Method for constructing a deep-learning-based emotion-aware and supportive dialogue system
CN111274438A (zh) Language-description-guided video temporal localization method
CN108765228A (zh) Computer-adaptive private tutoring learning method
Voskuilen et al. Modeling confidence and response time in associative recognition
Yang et al. [Retracted] Research on Students’ Adaptive Learning System Based on Deep Learning Model
CN111191722B (zh) Method and apparatus for training a prediction model by computer
Franke et al. The softmax function: Properties, motivation, and interpretation
Latona et al. The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates
Lin et al. A comprehensive survey on deep learning techniques in educational data mining
JP2021140749A (ja) Electronic device for precise behavior profiling for transplanting human intelligence into artificial intelligence, and operating method thereof
US12052183B1 (en) Resource allocation discovery and optimization service
Wu et al. A Tutorial-Generating Method for Autonomous Online Learning
Wu et al. A Generative Approach for Proactive Assistance Forecasting in Intelligent Tutoring Environments
Zhang et al. A novel action decision method of deep reinforcement learning based on a neural network and confidence bound
Dai et al. DMH-CL: Dynamic Model Hardness Based Curriculum Learning for Complex Pose Estimation
Ren et al. Long-term student performance prediction using learning ability self-adaptive algorithm
CN113642804B (zh) Multi-component-enhanced multi-task method and system for predicting and recommending undergraduate post-graduation destinations
Lerch Beyond Bounded Rationality: Towards a Computationally Rational Theory of Motor Control

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20874225

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20874225

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05.06.2023)