
CN113657573A - Robot skill acquisition method based on meta-learning under guidance of contextual memory - Google Patents


Info

Publication number
CN113657573A
CN113657573A (application CN202110740838.4A; granted as CN113657573B)
Authority
CN
China
Prior art keywords
memory
scene
robot
event
learning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110740838.4A
Other languages
Chinese (zh)
Other versions
CN113657573B (en)
Inventor
刘冬
于洪华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Research Institute Co Ltd of Dalian University of Technology
Original Assignee
Jiangsu Research Institute Co Ltd of Dalian University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Research Institute Co Ltd of Dalian University of Technology
Priority to CN202110740838.4A
Publication of CN113657573A
Application granted
Publication of CN113657573B
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/004Artificial life, i.e. computing arrangements simulating life
    • G06N3/008Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Robotics (AREA)
  • Manipulator (AREA)
  • Feedback Control In General (AREA)

Abstract

The invention provides a robot skill acquisition method based on meta-learning under the guidance of contextual memory. The method first establishes a contextual memory model for the robot learning system and constructs a similarity measurement algorithm between robot perception and memory, enabling retrieval and matching of events against scene information as well as updating and recall of events in the contextual memory. It then constructs a contextual-memory-guided meta-learning algorithm for robot operation skills that acquires knowledge both within each single task and across all tasks. By guiding the robot to learn new skills from existing experience, the method improves the efficiency of operation-skill learning and addresses the problems that data requirements are excessive and that similar tasks must be trained repeatedly.

Description

Robot skill acquisition method based on meta-learning under guidance of contextual memory
Technical Field
The invention belongs to the technical field of intelligent service robots, and relates to a robot operation-skill learning method based on contextual memory and meta-learning.
Background
In recent years, in fields such as industrial production, medical treatment, commerce and household service, current learning methods for intelligent robots have proven competent at precise, repetitive tasks, but they lack the ability to learn new tasks: similar task scenes require repeated training, and accumulated experience cannot be used to guide rapid learning of new tasks. In patent CN108333941A, inventors at South China University of Technology disclose a cloud-robot collaborative learning method based on hybrid enhanced intelligence: a general task is decomposed into simple subtasks by a neural-task-programming meta-learning method, the robot learns the subtasks through learning from demonstration, and the subtasks are then aggregated and shared. Song Rui, Li Fengming and others at Shandong University disclose, in patent CN111618862A, a robot operation-skill learning system and method guided by prior knowledge: the robot learning system is modularized into physics, evaluation and strategy-learning modules, a state-action mapping set of the robot is established, and the difficulty of robot skill learning is reduced. However, these methods have a limited range of applicability. Above all, none of them reuses experience, and they pay little attention to biological learning systems. Second, they are suited only to specific tasks: extended learning of robot operation skills is not possible, the robot lacks the ability to learn and explore autonomously, lacks adaptability to the task environment, and cannot learn in real time in practical applications, so the requirement that a robot continuously encounter new tasks and learn new skills is difficult to satisfy. Finally, such robot learning systems have complex frameworks that are difficult to design and build.
Therefore, existing methods cannot meet the requirements of rapid learning and generalization of operation skills for intelligent robots.
Disclosure of Invention
The invention mainly addresses how an intelligent robot can use learned knowledge and experience to solve new tasks encountered during operation and adapt to new task targets. Aiming at the current problems in robot skill learning, namely that training requires large amounts of data, that similar task scenes must be trained repeatedly, and that experience cannot be accumulated to guide rapid learning of new tasks, the invention provides a meta-learning robot skill learning method combined with contextual memory. First, during learning, each task is learned by a meta-learning method, and the scene observations, trained network weights and other experience information are stored in the contextual memory model. Second, memory matching and reading are performed using the cosine distance as the similarity measure between scenes, and memory is written and updated with the LRUA algorithm. Finally, the robot perception and planning module combines environment perception, target detection and path planning to interact with a target object and complete the task, realizing memory-guided rapid learning of robot operation skills. The method specifically comprises the following steps:
step 1: establishing a robot learning system memory model;
a skill-based event modeling method is used for establishing a robot scenario memory mathematical model M, wherein M is a memory set formed by a plurality of scenario memories M, and the scenario memories M mainly comprise the following components: a time-varying scenario event sequence combination E, empirical knowledge G learned by a meta-learning network belonging to the scenario, and a key-value eigenvector K for retrieving a matching similar event, i.e., m ═ E, G, K }. The event sequence combination E is composed of i events, i.e., E ═ E1,e2,…eiAnd each event stores information such as environment observation values and actions related to the scene, and experience knowledge is acquired through event matching so as to guide decision-making behaviors.
Step 2: constructing a similarity measurement algorithm for robot perception and memory;
the more similar the new task and the trained task are in the meta-learning training stage, the more available scenes are, the more the task encoder encodes the event information at each time t into a key-value feature vector Kst. When retrieving and matching the scenes, selecting proper scene memory by calculating the similarity of the current event and the key value characteristic vector of the event stored in the scene memory. In the application stage, the task encoder encodes the scene information transmitted by the perception system to generate a key value eigenvector Kt(i) And searching and matching by calculating the similarity metric value of the scene information of the current event and the event information stored in the scene memory.
And step 3: writing the real-time experience into a memory model according to a scene memory writing mechanism;
and judging whether the current scene is a new event or not, if so, recording the event, and if not, updating the existing event in the scene memory. When the number of the stored scene memories reaches 20 set maximum numbers, the memory storage area only remains the reserved memory storage buffer area, at the moment, the current task memory is temporarily stored in the buffer area, and the memory is updated by analogy of an LRUA algorithm after the task is finished.
And 4, step 4: constructing a robot operation skill element learning algorithm guided by scene memory;
Meta-learning operates on two levels: first, knowledge is acquired quickly within each individual task; second, information is extracted slowly across all tasks. The robot learns skills from the training tasks through the training-set data. A training task is first split into events, with each action executed by the robot corresponding to one event. During training, the robot packages each event together with the executed strategy (skill) through the scenario memory module and establishes the relation between events and skills; in addition, the robot learns across all training tasks through the meta-learning network and packages the network weights and related information into experience knowledge.
And 5: and constructing a generalized learning algorithm aiming at the new task based on the scene memory.
The robot is guided to learn new tasks appearing in the working environment according to the memories obtained in steps 2, 3 and 4. First, the perception module obtains environment state information; similarity measurement is then performed between the current perception and the events already in the memory base, and suitable events are selected from memory to guide the current task.
The invention has the advantages that:
the invention effectively solves the problems that the operation skill learning of the intelligent robot needs a large amount of data training, similar task scenes need repeated training and experience cannot be accumulated to guide the new task to realize quick learning and the like at present, introduces the humanoid scene memory into the meta-learning method, and can guide the skill learning of the robot by using the experience when the robot faces the new task to realize the multiplexing of the skills. The invention can learn in a small amount of samples, complete complex and various tasks by learning and memorizing simple tasks, can utilize the prior experience knowledge to quickly master skills through a small amount of training to complete the learning tasks, and effectively improves the learning efficiency and the execution success rate of the robot skill learning.
Drawings
FIG. 1 is an overall flow diagram of the process of the present invention;
FIG. 2 is a diagram of a scenario memory model architecture;
FIG. 3 is a diagram of a scene memory update process;
FIG. 4 is a schematic diagram of an LSTM network structure;
Detailed Description
The following further describes a specific embodiment of the present invention with reference to the drawings and technical solutions.
The flow diagram of the contextual-memory-guided meta-learning robot skill acquisition method provided in this example is shown in Fig. 1. The method constructs a perception and planning module that localizes and identifies objects through target detection. During establishment and recall of the contextual memory model, interaction between the contextual memory and the meta-learning network is realized through a task encoder and a task decoder: the encoder encodes a single task of the meta-learning network into an addressable label, and the task decoder decodes contextual experience into usable information delivered to the meta-learning network. In the meta-learning process, the meta-learner learns and masters the current task at a low level for each task; at a high level it learns across all tasks, stores the experience knowledge in the contextual memory model, and uses it to guide the meta-learner in subsequent tasks.
In this embodiment, learning the skill of stacking wooden blocks on a tabletop platform is taken as the example. The learning method comprises the following steps:
step 1: and establishing a memory model of the robot operation skill learning system. Establishing a robot scenario memory mathematical model M, which is formed as shown in fig. 2, wherein each scenario memory M is { E, G, K }, M comprises a time-varying event sequence combination E and empirical knowledge G learned by a meta-learning network belonging to the scenario, so as to obtain a robot scenario memory mathematical model MAnd a key-value feature vector K for retrieving matching similar events. In which the event sequence combination E is formed by i events, i.e. E ═ E1,e2,…,eiEach event stores information such as environment observation values and actions related to scenes and represents scenes and action sequences which the robot has experienced in the task; the empirical knowledge G is empirical knowledge such as skills learned in the task. The robot continuously accumulates experience in learning, and simultaneously stores important scene information in a task in events, wherein each event e is composed of four tuples<o,pe,a,pt>A composition in which o is a state perception of the environment obtained by a sensor, including distribution of objects in an image, a positional relationship between each other, joint information of a robot, and the like; p is a radical ofeIs the three-dimensional coordinates of the end effector of the mechanical arm; a is the action executed by the mechanical arm, and represents the action sequence of the robot at the current task in the time dimension; p is a radical oftThe three-dimensional coordinates of the target object for the mechanical arm to perform interactive operation are shown in the overall structure of fig. 2.
Step 2: and carrying out similarity measurement on robot perception and memory. In the learning process, the task encoder encodes the event information at each time t to generate a key value feature vector Kst. When retrieving and matching the scenes, selecting proper scene memory by calculating the similarity of the current event and the key value characteristic vector of the event stored in the scene memory. The scene memory update process is shown in fig. 3.
And step 3: and writing the real-time experience into the memory model according to the scene memory writing mechanism. When the number of the stored scene memories reaches 20 set maximum numbers, the storage area only remains a reserved memory storage buffer area, at the moment, the current task memory is temporarily stored in the buffer area, and after the task is finished, the memory is updated by analogy with an LRUA (least recent Used Access) algorithm. LRUA: the least recently used method is to store information to a memory location with a small number of uses to protect the recently written information, or to write to a memory location that has just been read, so as to avoid repeatedly storing similar memories. When the memory is updated, the softmax function is used for memorizing each time event in the buffer scene memory and the scene memoryEvent cosine distance converted into write weight
Figure BDA0003141329930000051
Figure BDA0003141329930000052
Wherein D (K)s,Mt(i) Is the cosine distance of the scene from the memory event at time t, KsKey-value feature vectors of memory events in a context memory for the state at time t, Mt(i) And memorizing the key value characteristic vector of each time event in the scene memory in the buffer area.
The write weights w_t^w(i) of the events belonging to the same scenario memory are then summed and averaged to obtain the coverage weight:

    w̄_t^w = (1/i) · Σ_i w_t^w(i)

According to the value of w̄_t^w, the new memory is written in one of the following two ways:
a. When two scenes are highly similar, i.e. w̄_t^w ≥ δ for a threshold δ, the scene is written to the location of the most frequently recalled scene in the buffer.

b. If w̄_t^w < δ, i.e. the scene in the buffer is not particularly similar to any scene in the memory storage area, the location of the scenario memory with the lowest use weight is selected and overwritten, ensuring efficient use of the storage area. The use weight w_t^u(i) is defined as the number of times the scenario memory has been matched in the storage area; each time it is matched, its use weight is incremented by 1:

    w_t^u(i) = w_{t-1}^u(i) + 1
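Assuming a similarity threshold δ and per-slot use weights as just described, the two overwrite cases can be sketched as follows (function names are illustrative assumptions):

```python
import numpy as np

def lrua_slot(coverage_weight, use_weights, delta):
    """Choose which memory slot to overwrite, per the LRUA-style rule:
    high similarity  -> overwrite the most frequently recalled slot (case a);
    low similarity   -> overwrite the slot with the lowest use weight (case b)."""
    if coverage_weight >= delta:
        return int(np.argmax(use_weights))
    return int(np.argmin(use_weights))

def bump_use_weight(use_weights, matched_i):
    """Each time a scenario memory is matched, its use weight increases by 1."""
    use_weights = np.array(use_weights, dtype=float)
    use_weights[matched_i] += 1.0
    return use_weights
```

Writing similar scenes over the most-used slot avoids storing near-duplicates, while writing dissimilar scenes over the least-used slot preserves the most valuable memories.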
And 4, step 4: and (5) performing robot operation skill training by using a meta-learning method. Because the gradient-based updating mechanism in the back propagation has similarities with the updating of the cell state in the LSTM, and the long-term memory structure of the LSTM network is very similar to the idea of meta-learning, the LSTM is adopted to replace the back propagation meta-learning network, and the network structure is shown in FIG. 4, wherein X is XtAs input of the current unit cell, htFor hidden layer output, σ is sigmoid activation function, tanh is tanh activation function,
Figure BDA00031413299300000510
in order to carry out the multiplication,
Figure BDA00031413299300000511
is an addition.
With the learning rate at time t set to α_t, the learner parameters are updated as:

    θ_t = θ_{t-1} − α_t · ∇_{θ_{t-1}} L_t

where θ_t is the parameter after the t-th update iteration, α_t is the learning rate at time t, ∇_{θ_{t-1}} L_t is the gradient of the loss function with respect to θ_{t-1}, and the subscript t of the loss function L_t denotes the loss at the t-th update; the loss and its gradient are computed with respect to the parameter θ_{t-1} from the previous iteration.
This process has the same form as the update of the cell state in the LSTM:

    c_t = f_t ⊗ c_{t-1} ⊕ i_t ⊗ c̃_t

Setting the forget gate f_t = 1, the cell state c_{t-1} = θ_{t-1}, the learning rate i_t = α_t, and c̃_t = −∇_{θ_{t-1}} L_t completes the correspondence. When the network parameters fall into a "saddle point", the current parameters need to shrink and the previous parameters θ_{t-1} need to be forgotten, so the learning rate i_t and the forget gate f_t are redefined as:

    i_t = σ( W_I · [∇_{θ_{t-1}} L_t, L_t, θ_{t-1}, i_{t-1}] + b_I )

    f_t = σ( W_F · [∇_{θ_{t-1}} L_t, L_t, θ_{t-1}, f_{t-1}] + b_F )

where σ is the sigmoid function, W_I and W_F are the update matrices of the input gate and forget gate respectively, b_I and b_F are the bias parameters of the input gate and forget gate respectively, θ_{t-1} is the learner parameter at time t−1, L_t is the loss function after t updates, and ∇_{θ_{t-1}} L_t is the gradient of the loss with respect to θ_{t-1}.
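The correspondence between the gradient step and the LSTM cell-state update can be checked numerically. The sketch below uses toy vectors, not the patent's network, and omits the gate parameterization:

```python
import numpy as np

def lstm_cell_update(c_prev, f_t, i_t, c_tilde):
    """c_t = f_t * c_{t-1} + i_t * c~_t (element-wise)."""
    return f_t * c_prev + i_t * c_tilde

def sgd_step(theta_prev, grad, alpha):
    """theta_t = theta_{t-1} - alpha * grad."""
    return theta_prev - alpha * grad

theta = np.array([0.5, -1.0])
grad = np.array([0.2, 0.4])
alpha = 0.1
# With f_t = 1, c_{t-1} = theta_{t-1}, i_t = alpha, c~_t = -grad,
# the LSTM cell update reproduces the gradient step exactly.
same = np.allclose(lstm_cell_update(theta, 1.0, alpha, -grad),
                   sgd_step(theta, grad, alpha))
```

This identity is what allows the LSTM meta-learner to express, and then generalize beyond, plain gradient descent by letting the gates vary.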
the meta-learner updates the LSTM cell state through the two steps, and the meta-learner can quickly train while avoiding divergence. In the training process, a training task is firstly split into events, each action executed by the robot corresponds to one event, in the training process, the events and executed strategies (skills) are packaged by the robot through a scenario memory module, the relation between the events and the skills is established, in addition, the robot learns all the training tasks through a meta-learning network, and information such as network weight is packaged into experience knowledge.
The mean and variance are collected on each meta-test data set: during meta-training, the batch statistics of the training set and test set are used, while during the meta-test phase the batch statistics of the training set and the running averages of the test set are used during classifier testing, which avoids information loss. For each feature channel of each layer, the corresponding inputs of all samples in the current batch are collected and their mean and variance computed; the mean and variance are then used to normalize the corresponding input of each sample. After normalization, all input features have mean 0 and standard deviation 1. Meanwhile, to prevent normalization from causing loss of feature information, learnable parameters γ and β are introduced per feature to recover the original input distribution. With x_i the input and y_i the output, BN_{γ,β}(x_i) denotes the batch-normalization process:

    x̂_i = (x_i − μ_B) / √(σ_B² + ε)
    y_i = BN_{γ,β}(x_i) = γ · x̂_i + β

where μ_B and σ_B² are the mean and variance of the current batch and ε is a small constant for numerical stability.
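A sketch of this batch-normalization step with the learnable γ and β — the stability constant ε is the usual convention, assumed here rather than stated in the text:

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """y_i = gamma * x_hat_i + beta, with x_hat_i standardized per feature channel."""
    mu = x.mean(axis=0)                    # per-channel batch mean
    var = x.var(axis=0)                    # per-channel batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # normalize to mean 0, std 1
    return gamma * x_hat + beta            # recover expressiveness via gamma, beta
```

With γ = 1 and β = 0 the output of each channel has mean 0 and standard deviation 1, as the text describes; training then adjusts γ and β to restore whatever feature scale the layer needs.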
A SeLU activation function is adopted in the convolutional layers, overcoming the drawback of the ReLU activation function that an overly large input gradient can leave some neurons permanently inactive after a parameter update. When the post-activation values differ too much, the variance is reduced, preventing gradient explosion; on the positive half-axis the gradient is greater than 1, so the gradient is amplified when the variance is too small, preventing vanishing gradients, and the output of each layer of the neural network is driven toward mean 0 and variance 1. The expression is:

    selu(x) = λ · x,                x > 0
    selu(x) = λ · α · (e^x − 1),    x ≤ 0

where λ ≈ 1.05 and α ≈ 1.67.
And 5: learning new robot operating skills based on trained contextual memory
During application, when a perception is similar to a previously encoded event, or a new event differs from previously perceived events, the task encoder encodes the scene information delivered by the perception system into a key-value feature vector K_t(i). The cosine distance is used as the similarity measurement function, and matching scenes are retrieved by computing the similarity between the scene information of the current event and the event information stored in the scenario memory:

    D(K_t(i), M(i)) = ( K_t(i) · M(i) ) / ( ||K_t(i)|| · ||M(i)|| )
then read the weight by weighting calculation
Figure BDA0003141329930000073
Figure BDA0003141329930000074
Where xi is the attenuation coefficient, the bigger value of xi represents the larger influence of the previous event on the current state, and when t is 1, xi is 0,
Figure BDA0003141329930000075
and storing cosine measurement of the event information for the current event scene information and the scene memory at the moment t. According to the read weight
Figure BDA0003141329930000076
The calculation result selects one of the following two decoding scenes for guiding the learning of the new task:
(1) when the read weight value is larger than a given threshold value, extracting experience information in the scene to which the event belongs, and guiding the learning of a new task by taking the scene as the experience of the new task;
(2) and if the weight average of the event reading weights in the scenes stored in the air during traversal is smaller than a given threshold value, defining the current event as a new event, establishing a new scene for the current task, and selecting the scene with the highest reading weight value to guide the new task to learn.
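A sketch of the read-weight recurrence and the threshold-based decoding choice — the exact recurrence form is reconstructed from the description of the attenuation coefficient ξ, so treat it as an illustration rather than the definitive formula:

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()

def read_weights(prev_w, sims, xi):
    """w_t^r(i) = xi * w_{t-1}^r(i) + softmax(D(K_t(i), M(i))); xi = 0 at t = 1."""
    return xi * np.asarray(prev_w) + softmax(np.asarray(sims))

def select_scene(read_w, threshold):
    """Decoding rule: reuse the best-matching scene if its read weight clears the
    threshold (case 1); otherwise treat the event as new, while still picking the
    highest-weight scene to guide learning (case 2)."""
    best = int(np.argmax(read_w))
    return best, bool(read_w[best] >= threshold)
```

Because ξ = 0 at t = 1, the first step depends only on the current similarities; later steps blend in the history of earlier matches.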
Let the event matched from the scene at the current time step be e_i. At the low level, the scene action information in the matched event is delivered to the meta-learning network to help the robot make decisions; at the high level, the experience information, such as the meta-learning network weights corresponding to the event's scenario memory, is decoded by the task decoder and delivered to the meta-learner, giving the meta-learner better-optimized network weights and accelerating convergence.
Whether the task is complete is judged from the context perception o_i of the current task: if o_i matches the context perception o_f at task completion, the current task ends; if they differ, the next event is matched and recalled, the skills corresponding to the events in the scene are combined, and the robot keeps interacting with the environment through closed-loop feedback until the task target is achieved.
The above description of exemplary embodiments has been presented only to illustrate the technical solution of the invention and is not intended to be exhaustive or to limit the invention to the precise form described. Obviously, many modifications and variations are possible in light of the above teaching to those skilled in the art. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to thereby enable others skilled in the art to understand, implement and utilize the invention in various exemplary embodiments and with various alternatives and modifications. It is intended that the scope of the invention be defined by the following claims and their equivalents.

Claims (3)

1. A robot skill acquisition method based on meta-learning under the guidance of contextual memory, characterized in that a contextual memory module is added on the basis of a meta-learning method to store the empirical knowledge learned by the robot in a task, the method comprising the following steps:
step 1: establishing a robot learning system memory model;
establishing a robot contextual memory mathematical model M, wherein M is a memory set formed by several scenario memories m, and each scenario memory m mainly comprises: a time-varying scenario event sequence combination E, the empirical knowledge G learned by the meta-learning network belonging to the scenario, and a key-value feature vector K used to retrieve matching similar events, i.e. m = {E, G, K}; the event sequence combination E is composed of several events, i.e. E = {e_1, e_2, …, e_i}; each event stores information related to the scenario, and experience knowledge is acquired through event matching to guide decision-making behavior;
step 2: constructing a robot event perception similarity measurement algorithm;
the task encoder encodes the event information at each time t into a key-value feature vector K_st; when retrieving and matching scenario memories, the scenario memory is selected by computing the similarity between the key-value feature vectors of the current event and of the events stored in the scenario memory; in the application stage, the task encoder encodes the scene information delivered by the perception system into a key-value feature vector K_t(i), and a suitable scenario memory is selected by computing the similarity of the key-value feature vectors of the current event and the stored events, using the cosine distance as the similarity measurement function:

    D(K_st, M_t(i)) = ( K_st · M_t(i) ) / ( ||K_st|| · ||M_t(i)|| )

wherein the subscript st denotes the state information at time t;
and step 3: writing the real-time experience into a memory model according to a scene memory writing mechanism;
judging whether the current scene is a new event; if so, recording the event, otherwise updating the existing event in the scenario memory; when the amount of stored scenario memory reaches the set maximum of 20, only the reserved memory buffer remains in the memory storage area, the current task memory is temporarily stored in the buffer, and the memory is updated with the LRUA algorithm after the task finishes; LRUA: the least-recently-used method stores information to a memory location with a low use count to protect recently written information, or writes to the memory location that was just read, to avoid storing similar memories repeatedly; when updating the memory, the softmax function converts the cosine distance between each time event in the buffered scenario memory and the memory events in the scenario memory into a write weight w_t^w(i):

    w_t^w(i) = softmax( D(K_s, M_t(i)) )

wherein D(K_s, M_t(i)) is the cosine distance between the scene and the memory event at time t, K_s is the key-value feature vector of the memory event in the contextual memory for the state at time t, and M_t(i) is the key-value feature vector of each time event in the buffered scenario memory;
the write weights w_t^w(i) of the events belonging to the same scenario memory are then summed and averaged to obtain the coverage weight:

    w̄_t^w = (1/i) · Σ_i w_t^w(i)

according to the value of w̄_t^w, the new memory is written either to the location of the most similar scenario memory in the memory area or to the location of the least frequently recalled scenario memory;
Step 4: construct the scene-memory-guided robot motor-skill meta-learning algorithm;
meta-learning operates on two levels: the first level rapidly acquires knowledge within each individual task, while the second level slowly extracts information across all tasks. The robot learns skills from the training tasks using the training-set data. First, each training task is split into subtasks, with each action executed by the robot corresponding to an event. During training, the robot encapsulates event perception and behavior through the scene memory module and establishes the relation between events and behaviors; in addition, the robot learns all training tasks through the meta-learning network and encapsulates the network weight information as experience knowledge;
the meta-learning network uses an LSTM in place of the back-propagation learning network; with the learning rate at time t set to \alpha_t, the learner parameters are updated as:

\theta_t = \theta_{t-1} - \alpha_t \nabla_{\theta_{t-1}} L_t

This learner parameter update has the same form as the cell-state update in an LSTM:

c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t

Setting the forget gate f_t = 1, the cell state c_{t-1} = \theta_{t-1}, the learning rate i_t = \alpha_t, and the candidate cell state \tilde{c}_t = -\nabla_{\theta_{t-1}} L_t gives c_t = \theta_t, so the two updates coincide. When the network parameters fall into a saddle point, the current parameters need to be shrunk and the previous parameters \theta_{t-1} forgotten; the learning rate i_t and forget gate f_t are therefore redefined as:

i_t = \sigma\big(W_I \cdot [\nabla_{\theta_{t-1}} L_t, L_t, \theta_{t-1}, i_{t-1}] + b_I\big)
f_t = \sigma\big(W_F \cdot [\nabla_{\theta_{t-1}} L_t, L_t, \theta_{t-1}, f_{t-1}] + b_F\big)

where \sigma is the sigmoid function, W_I and W_F are the update weights of the input gate and forget gate respectively, b_I and b_F are the bias parameters of the input gate and forget gate respectively, \theta_{t-1} is the learner parameter at time t-1, L_t is the loss function after t updates, and \nabla_{\theta_{t-1}} L_t is the gradient of the loss function with respect to \theta_{t-1};
the meta-learner updates the LSTM cell state through these two steps, which lets it train quickly while avoiding divergence;
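The gated parameter update of step 4 can be sketched in plain NumPy as follows. This mirrors the cell-state analogy above; representing each gate as a scalar produced by a dot product over a fixed feature vector is a simplifying assumption, not the patent's exact parameterization:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def meta_update(theta_prev, grad, loss, W_I, b_I, W_F, b_F, i_prev, f_prev):
    """One LSTM-style learner update: theta_t = f_t * theta_prev - i_t * grad."""
    # Gate inputs: gradient, loss, previous parameters, previous gate value.
    feat_i = np.concatenate([grad, [loss], theta_prev, [i_prev]])
    feat_f = np.concatenate([grad, [loss], theta_prev, [f_prev]])
    i_t = sigmoid(W_I @ feat_i + b_I)  # learned learning rate
    f_t = sigmoid(W_F @ feat_f + b_F)  # learned forget (shrink) gate
    theta_t = f_t * theta_prev - i_t * grad
    return theta_t, i_t, f_t
```

With f_t driven toward 1 and i_t toward \alpha_t, this reduces to ordinary gradient descent; near a saddle point the gates can shrink \theta_{t-1} instead.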
Step 5: construct a scene-memory-based generalization learning algorithm for new tasks;
the robot memories obtained in steps 2, 3 and 4 guide the robot to learn new tasks appearing in the working environment. First, the perception module obtains the environment state information, and the similarity between the current perception information and the events in the memory bank is measured with the cosine distance as the similarity measurement function; matching scenes are retrieved by computing the similarity between the scene information of the current event and the event information stored in the scene memory:

D(K_t, M_t(i)) = \frac{K_t \cdot M_t(i)}{\|K_t\| \, \|M_t(i)\|}

The read weight is then obtained by a weighted calculation:

w_t^r(i) = \xi \, w_{t-1}^r(i) + D(K_t, M_t(i))

where \xi is the attenuation coefficient: the larger \xi is, the larger the influence of previous events on the current state, and \xi = 0 when t = 1; D(K_t, M_t(i)) is the cosine measure between the scene information of the current event at time t and the event information stored in the scene memory;
next, a suitable scene memory is selected to guide the current task. Based on the computed read weight w_t^r(i), the guiding experience is selected: if the read weight is larger than a given threshold, the experience information of the scene to which the event belongs is extracted and used as experience to guide learning of the new task; if no event in memory has a read weight above the threshold, the current event is defined as a new event, a new scene is established for the current task, and the scene with the highest read weight is selected to guide learning of the new task.
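A minimal sketch of the step-5 retrieval rule, assuming the decaying read weight takes the recursive form described above; the threshold value and function names are illustrative assumptions:

```python
import numpy as np

def update_read_weights(prev_weights, similarities, xi):
    """w_t^r(i) = xi * w_{t-1}^r(i) + D(K_t, M_t(i)); xi = 0 at t = 1."""
    return (xi * np.asarray(prev_weights, dtype=float)
            + np.asarray(similarities, dtype=float))

def select_guidance(read_weights, threshold):
    """Index of the guiding scene memory, or None if a new scene is needed."""
    best = int(np.argmax(read_weights))
    return best if read_weights[best] > threshold else None
```

When `select_guidance` returns None, the caller would establish a new scene for the current task and still use the highest-weight scene as a fallback guide, as the claim describes.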
2. The scene-memory-guided meta-learning-based robot skill acquisition method according to claim 1, characterized in that an event e_i is composed of a quadruple ⟨o, p_e, a, p_t⟩, where o is the state perception of the environment obtained through the sensors, comprising the distribution of objects in the image, the positional relations among the objects, and the joint information of the robot; p_e is the three-dimensional coordinates of the end effector of the mechanical arm; a is the action executed by the mechanical arm, representing the robot's action sequence for the current task in the time dimension; and p_t is the three-dimensional coordinates of the target object with which the mechanical arm interacts.
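The event quadruple of claim 2 could be represented as a simple data structure; the field types here are illustrative assumptions, not specified by the patent:

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class Event:
    """Event e_i = <o, p_e, a, p_t> from claim 2."""
    o: dict                          # environment state perception from sensors
    p_e: Tuple[float, float, float]  # 3-D coordinates of the end effector
    a: str                           # action executed by the mechanical arm
    p_t: Tuple[float, float, float]  # 3-D coordinates of the target object

e = Event(o={"objects": []}, p_e=(0.0, 0.0, 0.0), a="grasp", p_t=(0.1, 0.2, 0.3))
```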
3. The scene-memory-guided meta-learning-based robot skill acquisition method according to claim 1 or 2, characterized in that the new memory is overwritten either to the location of the most similar scene memory in the memory area or to the location of the least frequently called scene memory:
(1) when two scenes are highly similar, i.e. when the coverage weight \bar{w}^w(i) exceeds the similarity threshold, the scene is written to the location of the most similar scene memory in the memory storage area;
(2) when \bar{w}^w(i) does not exceed the threshold, indicating that the scene in the buffer is not particularly similar to any scene in the memory storage area, the location of the scene memory with the lowest use weight is selected and overwritten, ensuring efficient utilization of the storage area. The use weight w^u(i) is defined as the number of times scene memory i has been matched in the memory storage area: each time a scene memory is matched, its use weight is incremented by 1.
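The two-branch overwrite rule of claim 3 can be sketched as follows; the threshold and argument names are illustrative assumptions:

```python
import numpy as np

def overwrite_location(coverage_weights, use_weights, threshold):
    """Pick the slot to overwrite: the most similar scene memory when
    similarity is high enough, otherwise the least-used scene memory."""
    best = int(np.argmax(coverage_weights))
    if coverage_weights[best] > threshold:
        return best                        # case (1): most similar scene
    return int(np.argmin(use_weights))     # case (2): lowest use weight
```

This keeps frequently matched memories in place while reclaiming slots that are rarely retrieved.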
CN202110740838.4A 2021-06-30 2021-06-30 Robot skill acquisition method based on meta learning under scene memory guidance Active CN113657573B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110740838.4A CN113657573B (en) 2021-06-30 2021-06-30 Robot skill acquisition method based on meta learning under scene memory guidance


Publications (2)

Publication Number Publication Date
CN113657573A true CN113657573A (en) 2021-11-16
CN113657573B CN113657573B (en) 2024-06-21

Family

ID=78477833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110740838.4A Active CN113657573B (en) 2021-06-30 2021-06-30 Robot skill acquisition method based on meta learning under scene memory guidance

Country Status (1)

Country Link
CN (1) CN113657573B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114161419A (en) * 2021-12-13 2022-03-11 大连理工大学 Robot operation skill efficient learning method guided by scene memory
CN115082717A (en) * 2022-08-22 2022-09-20 成都不烦智能科技有限责任公司 Dynamic target identification and context memory cognition method and system based on visual perception
CN116563638A (en) * 2023-05-19 2023-08-08 广东石油化工学院 Image classification model optimization method and system based on scene memory
CN118536545B (en) * 2024-06-13 2024-11-15 Northeast Electric Power University Scene memory network design method based on synaptic remodeling model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180210939A1 (en) * 2017-01-26 2018-07-26 Hrl Laboratories, Llc Scalable and efficient episodic memory in cognitive processing for automated systems
CN109668566A (en) * 2018-12-05 2019-04-23 大连理工大学 Robot scene cognition map construction and navigation method based on mouse brain positioning cells
CN111474932A (en) * 2020-04-23 2020-07-31 大连理工大学 Mobile robot mapping and navigation method integrating scene experience
CN112231489A (en) * 2020-10-19 2021-01-15 中国科学技术大学 Knowledge learning and transferring method and system for epidemic prevention robot



Also Published As

Publication number Publication date
CN113657573B (en) 2024-06-21

Similar Documents

Publication Publication Date Title
CN111203878B (en) Robot sequence task learning method based on visual simulation
CN112605973B (en) Robot motor skill learning method and system
Zhu An adaptive agent decision model based on deep reinforcement learning and autonomous learning
Paxton et al. Prospection: Interpretable plans from language by predicting the future
CN113657573A (en) Robot skill acquisition method based on meta-learning under guidance of contextual memory
CN111300390B (en) Intelligent mechanical arm control system based on reservoir sampling and double-channel inspection pool
CN109940614B (en) Mechanical arm multi-scene rapid motion planning method integrating memory mechanism
EP4121256A1 (en) Training and/or utilizing machine learning model(s) for use in natural language based robotic control
CN112183188B (en) Method for simulating learning of mechanical arm based on task embedded network
Lippi et al. Enabling visual action planning for object manipulation through latent space roadmap
CN115860107B (en) Multi-machine searching method and system based on multi-agent deep reinforcement learning
Waytowich et al. A narration-based reward shaping approach using grounded natural language commands
CN112509392B (en) Robot behavior teaching method based on meta-learning
Li et al. Curiosity-driven exploration for off-policy reinforcement learning methods
CN114161419B (en) Efficient learning method for robot operation skills guided by scene memory
CN118365099B (en) Multi-AGV scheduling method, device, equipment and storage medium
CN117332366A (en) Information processing method, task execution method, device, equipment and medium
Reinhart Reservoir computing with output feedback
CN115016499A (en) Path planning method based on SCA-QL
US20220305647A1 (en) Future prediction, using stochastic adversarial based sampling, for robotic control and/or other purpose(s)
Zhou et al. Humanoid action imitation learning via boosting sample DQN in virtual demonstrator environment
Xiong et al. Primitives generation policy learning without catastrophic forgetting for robotic manipulation
Yu et al. LSTM learn policy from dynamical system of demonstration motions for robot imitation learning
CN117590756B (en) Motion control method, device, equipment and storage medium for underwater robot
Chen et al. Distributed continuous control with meta learning on robotic arms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant