CN113657573A - Robot skill acquisition method based on meta-learning under guidance of contextual memory - Google Patents
- Publication number
- CN113657573A CN113657573A CN202110740838.4A CN202110740838A CN113657573A CN 113657573 A CN113657573 A CN 113657573A CN 202110740838 A CN202110740838 A CN 202110740838A CN 113657573 A CN113657573 A CN 113657573A
- Authority
- CN
- China
- Prior art keywords
- memory
- scene
- robot
- event
- learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06N3/008 — Artificial life, i.e. computing arrangements simulating life based on physical entities controlled by simulated intelligence so as to replicate intelligent life forms, e.g. based on robots replicating pets or humans in their appearance or behaviour
- G06F18/22 — Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
- G06N3/044 — Neural networks; Architecture; Recurrent networks, e.g. Hopfield networks
- G06N3/045 — Neural networks; Architecture; Combinations of networks
- G06N3/084 — Neural networks; Learning methods; Backpropagation, e.g. using gradient descent
Abstract
The invention provides a robot skill acquisition method based on meta-learning under the guidance of contextual memory. First, a contextual memory model of the robot learning system is established and a similarity measurement algorithm between robot perception and memory is constructed, enabling retrieval and matching of events against stored scene information as well as updating and recall of events in contextual memory. Then, a contextual-memory-guided meta-learning algorithm for robot operation skills is constructed, which acquires knowledge both from each individual task and from all tasks taken together. The method uses existing experience to guide the robot in learning new skills, improves the efficiency of robot operation-skill learning, and alleviates the problems of excessive data requirements and repeated training on similar tasks during robot operation-skill learning.
Description
Technical Field
The invention belongs to the technical field of intelligent service robots, and relates to a robot operation-skill learning method based on contextual memory and meta-learning.
Background
In recent years, in fields such as industrial production, medical treatment, commerce and household service, current learning methods for intelligent robots have been competent at precise, repetitive tasks but lack the ability to learn new tasks: similar task scenarios require repeated training, and accumulated experience cannot be reused to accelerate the learning of new tasks. In invention patent CN108333941A, Durolon, Zhangleing et al. of South China University disclose a cloud-robot collaborative learning method based on hybrid enhanced intelligence: a general task is decomposed into simple subtasks by a neural task programming meta-learning method, the robot learns the subtasks through learning from demonstration, and the subtasks are then aggregated and shared. Song Rui, Li Fengming et al. of Shandong University, in patent CN111618862A, disclose a robot operation-skill learning system and method guided by prior knowledge: the robot learning system is modularized into physics, evaluation, policy-learning and other modules, a state-action mapping set of the robot is established, and the difficulty of robot skill learning is reduced. However, the above methods have a limited range of applicability. Above all, none of them reuses experience, and they pay little attention to biological learning systems. Second, they are only suitable for learning specific tasks: extended learning of robot operation skills is not possible, the robot lacks the capabilities of autonomous learning and exploration as well as adaptability to the task environment, real-time learning in practical applications cannot be achieved, and the requirement that a robot continuously encounter new tasks and learn new skills is difficult to satisfy. Finally, such robot learning systems have complex frameworks that are difficult to design and build.
Therefore, these methods cannot meet the requirements of rapid learning and generalization of intelligent-robot operation skills.
Disclosure of Invention
The invention mainly addresses the problem of how an intelligent robot can use learned knowledge and experience to solve new tasks encountered during operation and adapt to new task goals. Aiming at the current problems in robot skill learning, namely that large amounts of training data are required, that similar task scenarios must be trained repeatedly, and that experience cannot be accumulated to accelerate the learning of new tasks, the invention provides a meta-learning robot skill-learning method combined with contextual memory. First, in the learning process, a task is learned by a meta-learning method, and the scene observations, the trained network weights and related information are stored as experience in a contextual memory model. Second, memory matching and reading are performed by measuring the similarity between scenes with the cosine distance, and memory writing and updating use the LRUA algorithm. Finally, the robot's perception-and-planning module combines environment perception, target detection and path planning to interact with the target object and complete the task, realizing memory-guided rapid learning of robot operation skills. The method specifically comprises the following steps:
Step 1: establishing a memory model of the robot learning system;
A skill-based event modeling method is used to establish a robot contextual-memory mathematical model M, where M is the memory set formed by a number of contextual memories m. Each contextual memory m mainly comprises: a time-varying sequence of scenario events E, the experience knowledge G learned by the meta-learning network belonging to that scenario, and a key-value feature vector K used to retrieve matching similar events, i.e. m = {E, G, K}. The event sequence E consists of i events, i.e. E = {e1, e2, …, ei}; each event stores information such as the environment observations and actions related to the scenario, and experience knowledge is acquired through event matching in order to guide decision-making behavior.
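The structures of step 1 can be sketched as plain data containers. This is a minimal illustration only; all class and field names are hypothetical choices, not taken from the patent:

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    """One scenario event e = <o, p_e, a, p_t> (fields per the patent's quadruple)."""
    o: list    # environment state perception from sensors
    p_e: tuple # 3-D coordinates of the end effector
    a: int     # action executed by the manipulator
    p_t: tuple # 3-D coordinates of the target object

@dataclass
class ContextualMemory:
    """One contextual memory m = {E, G, K}."""
    E: list = field(default_factory=list)  # time-varying event sequence
    G: dict = field(default_factory=dict)  # experience knowledge (e.g. network weights)
    K: list = field(default_factory=list)  # key-value feature vector for retrieval

# M is the set (here: a list) of contextual memories
M = []
M.append(ContextualMemory(E=[Event(o=[0.1], p_e=(0, 0, 0), a=0, p_t=(1, 1, 0))]))
```

A real implementation would store tensors rather than lists, but the nesting (M contains m, m contains E, G, K, and E contains events) follows the model as described.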
Step 2: constructing a similarity measurement algorithm for robot perception and memory;
The more similar a new task is to the tasks trained in the meta-learning training stage, the more contextual memories are available. The task encoder encodes the event information at each time t into a key-value feature vector K_st. When retrieving and matching scenarios, an appropriate contextual memory is selected by computing the similarity between the key-value feature vector of the current event and those of the events stored in contextual memory. In the application stage, the task encoder encodes the scene information delivered by the perception system to generate a key-value feature vector K_t(i), and retrieval and matching are performed by computing the similarity measure between the current event's scene information and the event information stored in contextual memory.
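The similarity measure named in the method is the cosine distance between key-value feature vectors. A self-contained sketch of retrieval by key similarity (function names are illustrative, not from the patent):

```python
import math

def cosine_similarity(k1, k2):
    """Cosine similarity between two key-value feature vectors."""
    dot = sum(a * b for a, b in zip(k1, k2))
    n1 = math.sqrt(sum(a * a for a in k1))
    n2 = math.sqrt(sum(b * b for b in k2))
    return dot / (n1 * n2)

def retrieve(memory_keys, query_key):
    """Return the index of the stored event key most similar to the query."""
    scores = [cosine_similarity(query_key, k) for k in memory_keys]
    return max(range(len(scores)), key=scores.__getitem__)

# the query is closest to the first stored key
best = retrieve([[1.0, 0.0], [0.0, 1.0]], [0.9, 0.1])
```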
Step 3: writing the real-time experience into the memory model according to the contextual-memory writing mechanism;
Judge whether the current scene is a new event; if so, record the event, otherwise update the existing event in contextual memory. When the number of stored contextual memories reaches the set maximum of 20, only the reserved memory-storage buffer remains in the memory storage area; the current task memory is then temporarily stored in the buffer, and after the task is completed the memory is updated with the LRUA algorithm.
Step 4: constructing a contextual-memory-guided meta-learning algorithm for robot operation skills;
Meta-learning operates on two levels: the first acquires knowledge quickly within each individual task, and the second extracts information slowly across all tasks. The robot learns skills from the training tasks through the training-set data. A training task is first split into events, with each action executed by the robot corresponding to one event. During training, the robot packages the events and the executed strategies (skills) through the contextual-memory module and establishes the relation between events and skills; in addition, the robot learns across all training tasks through the meta-learning network and packages information such as the network weights into experience knowledge.
Step 5: constructing a generalized learning algorithm for new tasks based on contextual memory.
The robot is guided to learn new tasks appearing in the working environment according to the memories obtained in steps 2, 3 and 4. First, the perception module obtains the environment-state information, similarity measurement is performed between the current perception and the events existing in the memory bank, and appropriate events are selected from memory to guide the current task.
The invention has the advantages that:
The invention effectively addresses the current problems that intelligent-robot operation-skill learning requires large amounts of training data, that similar task scenarios require repeated training, and that experience cannot be accumulated to accelerate the learning of new tasks. By introducing human-like contextual memory into the meta-learning method, experience can guide the robot's skill learning when it faces a new task, realizing skill reuse. The invention can learn from a small number of samples, complete complex and varied tasks by learning and memorizing simple ones, and use prior experience knowledge to master skills quickly with little training, effectively improving the learning efficiency and execution success rate of robot skill learning.
Drawings
FIG. 1 is an overall flow diagram of the process of the present invention;
FIG. 2 is a diagram of a scenario memory model architecture;
FIG. 3 is a diagram of a scene memory update process;
FIG. 4 is a schematic diagram of an LSTM network structure;
Detailed Description
The following further describes a specific embodiment of the present invention with reference to the drawings and technical solutions.
The flow of the contextual-memory-guided meta-learning-based robot skill acquisition provided in this example is shown in fig. 1. A perception-and-planning module is constructed, and objects are located and recognized through target detection. In the process of establishing and recalling the contextual-memory model, the interaction between contextual memory and the meta-learning network is realized through a task encoder and a task decoder: the encoder encodes a single task of the meta-learning network into an addressable label, and the task decoder decodes contextual experience into usable information that is passed to the meta-learning network. In the meta-learning process, the meta-learner learns and masters the current task at a low level for each task, learns across all tasks at a high level, stores the experience knowledge through the contextual-memory model, and guides the meta-learner on subsequent tasks.
In this embodiment, learning the skill of stacking wooden blocks on a tabletop platform is taken as an example; the block-stacking learning method comprises the following steps:
Step 1: establish the memory model of the robot operation-skill learning system. A robot contextual-memory mathematical model M is established, structured as shown in fig. 2, where each contextual memory m = {E, G, K}: m comprises a time-varying event-sequence combination E, the experience knowledge G learned by the meta-learning network belonging to that scenario, and a key-value feature vector K used to retrieve matching similar events. The event sequence E consists of i events, i.e. E = {e1, e2, …, ei}; each event stores information such as the environment observations and actions related to the scenario and represents the scenes and action sequences the robot has experienced in the task, while the experience knowledge G consists of knowledge such as the skills learned in the task. The robot continuously accumulates experience during learning while storing the important scene information of a task in events. Each event e consists of a quadruple <o, p_e, a, p_t>, where o is the state perception of the environment obtained by the sensors, including the distribution of objects in the image, their positional relations, the joint information of the robot and so on; p_e is the three-dimensional coordinates of the end effector of the manipulator; a is the action executed by the manipulator, representing the robot's action sequence for the current task in the time dimension; and p_t is the three-dimensional coordinates of the target object with which the manipulator interacts, as shown in the overall structure of fig. 2.
Step 2: perform similarity measurement between robot perception and memory. In the learning process, the task encoder encodes the event information at each time t to generate a key-value feature vector K_st. When retrieving and matching scenarios, an appropriate contextual memory is selected by computing the similarity between the key-value feature vectors of the current event and of the events stored in contextual memory. The memory-update process is shown in fig. 3.
Step 3: write the real-time experience into the memory model according to the contextual-memory writing mechanism. When the number of stored contextual memories reaches the set maximum of 20, only the reserved memory-storage buffer remains in the storage area; the current task memory is then temporarily stored in the buffer, and after the task is completed the memory is updated with the LRUA (Least Recently Used Access) algorithm. LRUA, the least-recently-used method, stores information to the memory location with the fewest uses, to protect recently written information, or writes to the memory location that has just been read, to avoid repeatedly storing similar memories. When the memory is updated, the softmax function converts the cosine distance between each time event in the buffered contextual memory and the events in contextual memory into a write weight:

w_t^w(i) = softmax( D(K_s, M_t(i)) )

where D(K_s, M_t(i)) is the cosine distance between the scene and the memorized event at time t, K_s is the key-value feature vector of the state at time t, and M_t(i) is the key-value feature vector of each time event in the buffered contextual memory.
The write weights of the events belonging to the same contextual memory are then summed and averaged to give the coverage weight. According to this result, the new memory is written in one of the following two ways:
a. When there is high similarity between the two scenarios, i.e. the coverage weight is sufficiently large, the scenario is written to the position of the most frequently recalled scenario in the buffer.
b. Otherwise, when the scenario in the buffer is not particularly similar to those in the memory storage area, the position of the contextual memory with the lowest use weight is selected and overwritten, ensuring efficient utilization of the storage area. The use weight is defined as the number of times a contextual memory in the storage area has been matched; each time it is matched, its use weight is incremented by 1.
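The buffered-write decision above can be sketched as a single slot-selection function. Two assumptions are made where the patent elides the exact formula: the write weight is taken as a softmax over cosine similarities (similarity rather than distance, so that larger means closer), and the coverage threshold is a hypothetical constant:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def write_slot(query_key, stored_keys, use_counts, threshold=0.7):
    """Choose a memory slot to overwrite, LRUA-style.

    Case a: if the best (coverage) weight is high, reuse the most similar
    slot. Case b: otherwise evict the least-used slot.
    """
    weights = softmax([cosine(query_key, k) for k in stored_keys])
    best = max(range(len(weights)), key=weights.__getitem__)
    if weights[best] >= threshold:
        return best  # case a: overwrite the most similar slot
    return min(range(len(use_counts)), key=use_counts.__getitem__)  # case b

slot = write_slot([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], use_counts=[3, 1])
```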
Step 4: perform robot operation-skill training with the meta-learning method. Because the gradient-based update mechanism of backpropagation resembles the update of the cell state in an LSTM, and the long-term memory structure of the LSTM network is very close to the idea of meta-learning, an LSTM is adopted to replace the backpropagation meta-learning network. The network structure is shown in fig. 4, where x_t is the input of the current cell, h_t is the hidden-layer output, σ is the sigmoid activation function, tanh is the tanh activation function, ⊗ denotes element-wise multiplication, and ⊕ denotes addition.
Setting the learning rate at time t to α_t, the learner parameters are updated as:

θ_t = θ_{t-1} − α_t ∇_{θ_{t-1}} L_t

where θ_t is the parameter after the t-th update iteration, α_t is the learning rate at time t, ∇_{θ_{t-1}} L_t is the gradient of the loss function with respect to θ_{t-1}, and the subscript t on the loss function denotes the t-th update; the loss and its gradient are taken with respect to the parameter θ_{t-1} from the previous iteration.
This process has the same form as the update of the cell state in the LSTM:

c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t

Let the forget gate f_t = 1, the cell state c_{t-1} = θ_{t-1}, the learning rate i_t = α_t and c̃_t = −∇_{θ_{t-1}} L_t; the two updates then coincide. When the network parameters fall into a 'saddle point', the current parameters need to be shrunk and the previous parameters θ_{t-1} forgotten, so the learning rate i_t and the forget gate f_t are redefined as:

i_t = σ( W_I · [∇_{θ_{t-1}} L_t, L_t, θ_{t-1}, i_{t-1}] + b_I )
f_t = σ( W_F · [∇_{θ_{t-1}} L_t, L_t, θ_{t-1}, f_{t-1}] + b_F )

where σ is the sigmoid function, W_I and W_F are the update weights of the input gate and the forget gate respectively, b_I and b_F are the bias parameters of the input gate and the forget gate respectively, θ_{t-1} is the learner parameter at time t-1, L_t is the loss function after t updates, and ∇_{θ_{t-1}} L_t is the gradient of the loss at time t with respect to θ_{t-1}.
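The gated update above can be sketched numerically with scalar parameters. The gate values here are hand-set constants purely for illustration; in the method itself they are produced by the LSTM from the loss and gradient:

```python
def meta_update(theta_prev, grad, forget_gate=1.0, input_gate=0.1):
    """One LSTM-style parameter update: theta_t = f_t * theta_{t-1} - i_t * grad.

    With forget_gate = 1 and input_gate = alpha this reduces to plain
    gradient descent; near a saddle point the meta-learner can shrink
    forget_gate below 1 to partially forget theta_{t-1}.
    """
    return forget_gate * theta_prev - input_gate * grad

theta = meta_update(2.0, 4.0)                      # plain SGD step: 2.0 - 0.1*4.0 = 1.6
theta_saddle = meta_update(2.0, 0.0, forget_gate=0.5)  # shrink parameters: 1.0
```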
the meta-learner updates the LSTM cell state through the two steps, and the meta-learner can quickly train while avoiding divergence. In the training process, a training task is firstly split into events, each action executed by the robot corresponds to one event, in the training process, the events and executed strategies (skills) are packaged by the robot through a scenario memory module, the relation between the events and the skills is established, in addition, the robot learns all the training tasks through a meta-learning network, and information such as network weight is packaged into experience knowledge.
The mean and variance are collected on each meta-test dataset, so that during meta-training the batch statistics of the training set and the test set are used, while during the meta-test phase the batch statistics of the training set and the running averages of the test set are used during classifier testing, which avoids information loss. For each feature channel of each layer, the corresponding inputs of all samples within the current batch are collected and their mean and variance computed; the mean and variance are then used to normalize the corresponding input of each sample. After normalization, all input features have mean 0 and standard deviation 1. To prevent the normalization from losing feature information, learnable per-feature parameters γ and β are introduced to recover the original input features; with x_i the input and y_i the output, BN_{γ,β}(x_i) denotes the batch-normalization process:

y_i = γ x̂_i + β,  where x̂_i = (x_i − μ_B) / sqrt(σ_B² + ε)
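A per-channel sketch of the normalization just described, in pure Python with the biased batch variance; `gamma` and `beta` stand for the learnable γ and β:

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature channel over a batch, then rescale with gamma/beta."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

out = batch_norm([1.0, 2.0, 3.0])
# the normalized channel has mean ~0 and approximately unit spread
```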
A SeLU activation function is adopted in the convolutional-neural-network layers, overcoming the ReLU drawback that, when the gradient of the input is too large, some neurons become inactive after a network-parameter update and stop working. When the differences after activation are too large, the variance is reduced, preventing gradient explosion; on the positive half-axis the gradient is greater than 1, so the variance is increased when it is too small, preventing the gradient from vanishing and keeping the output of each network layer at mean 0 and variance 1. The expression is:

SELU(x) = λ x, for x > 0;  SELU(x) = λ α (e^x − 1), for x ≤ 0

where λ ≈ 1.05 and α ≈ 1.67.
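A direct sketch of the stated expression. The four-decimal constants below are the values commonly used for SELU, an assumption beyond the patent's two decimals:

```python
import math

LAMBDA = 1.0507  # lambda ≈ 1.05
ALPHA = 1.6733   # alpha ≈ 1.67

def selu(x):
    """Scaled exponential linear unit: lambda*x for x > 0, lambda*alpha*(e^x - 1) otherwise."""
    if x > 0:
        return LAMBDA * x
    return LAMBDA * ALPHA * (math.exp(x) - 1.0)

# positive inputs are scaled linearly; negative inputs saturate toward -lambda*alpha
y_pos = selu(1.0)
y_neg = selu(-10.0)
```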
Step 5: learn new robot operation skills based on the trained contextual memory.
In the application process, whether the current perception resembles a previously encoded event or constitutes a new event unlike anything perceived before, the task encoder encodes the scene information delivered by the perception system to generate a key-value feature vector K_t(i). The cosine distance serves as the similarity measurement function, and matching scenarios are retrieved by computing the similarity between the current event's scene information at time t and the event information stored in contextual memory. In the read weight, ξ is an attenuation coefficient: a larger ξ gives the previous event a larger influence on the current state, and ξ = 0 when t = 1. According to the computed read weight, one of the following two scenario-decoding cases is selected to guide the learning of the new task:
(1) When the read weight exceeds a given threshold, the experience information of the scenario to which the matched event belongs is extracted and used as experience for the new task to guide its learning.
(2) If, after traversing memory, the read weights of the events in every stored scenario are all below the given threshold, the current event is defined as a new event, a new scenario is established for the current task, and the scenario with the highest read-weight value is selected to guide the learning of the new task.
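The two read-weight cases can be sketched as a single decision function. The function name, the tuple return and the threshold handling are illustrative choices, not from the patent:

```python
def guide_new_task(read_weights, threshold):
    """Decide how contextual memory guides a new task.

    Case 1: an above-threshold weight reuses the matched scenario's
    experience directly. Case 2: all weights below threshold means the
    event is new; a fresh scenario is created, but the best-scoring
    stored scenario still serves as a soft guide.
    """
    best = max(range(len(read_weights)), key=read_weights.__getitem__)
    if read_weights[best] > threshold:
        return False, best  # case 1: reuse experience of the matched scenario
    return True, best       # case 2: new event; guide with the closest scenario

is_new, idx = guide_new_task([0.2, 0.9, 0.4], threshold=0.5)
```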
Let the event matched from the scenario at the current time step be e_i. At the low level, the scene and action information of the matched event is extracted and passed to the meta-learning network to help the robot make decisions; at the high level, experience information such as the meta-learning-network weights in the contextual memory to which the event belongs is decoded by the task decoder and passed to the meta-learner, giving the meta-learner better-optimized network weights and accelerating convergence.
The contextual perception o_i of the current task is used to judge whether the task is complete: if o_i is the same as the contextual perception at task completion, the current task ends; if they differ, matching continues and the next event is recalled, the skills corresponding to the events in the scenario are combined, and the robot keeps interacting with the environment through closed-loop feedback until the task goal is achieved.
The above description of exemplary embodiments has been presented only to illustrate the technical solution of the invention and is not intended to be exhaustive or to limit the invention to the precise form described. Obviously, many modifications and variations are possible in light of the above teaching to those skilled in the art. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to thereby enable others skilled in the art to understand, implement and utilize the invention in various exemplary embodiments and with various alternatives and modifications. It is intended that the scope of the invention be defined by the following claims and their equivalents.
Claims (3)
1. A robot skill acquisition method based on meta-learning under the guidance of contextual memory is characterized in that a contextual memory module is added on the basis of a meta-learning method, and empirical knowledge learned by a robot in a task is stored, and the method comprises the following steps:
step 1: establishing a robot learning system memory model;
establishing a robot contextual-memory mathematical model M, wherein M is a memory set formed by a number of contextual memories m, and each contextual memory m mainly comprises: a time-varying scenario event-sequence combination E, experience knowledge G learned by the meta-learning network belonging to the scenario, and a key-value feature vector K for retrieving matching similar events, namely m = {E, G, K}; the event-sequence combination E consists of a number of events, i.e. E = {e1, e2, …, ei}; information related to the situation is stored in each event, and experience knowledge is acquired through event matching so as to guide decision-making behavior;
step 2: constructing a robot event perception similarity measurement algorithm;
the task encoder encodes the event information at each time t to generate a key-value feature vector K_st; when retrieving and matching contextual memories, the contextual memory is selected by computing the similarity of the key-value feature vectors of the current event and of the events stored in contextual memory; in the application stage, the task encoder encodes the scene information delivered by the perception system to generate a key-value feature vector K_t(i), and an appropriate contextual memory is selected by computing, with the cosine distance as the similarity measurement function, the similarity between the key-value feature vectors of the current event and of the events stored in contextual memory;
wherein st denotes the state information at time t;
and step 3: writing the real-time experience into a memory model according to a scene memory writing mechanism;
judging whether the current scene is a new event or not, if so, recording the event, and if not, updating the existing event in the scene memory; when the stored scene memory amount reaches the set maximum amount of 20, the memory storage area only remains the reserved memory storage buffer area, the current task memory is temporarily stored in the buffer area at the moment, the memory is updated by utilizing an LRUA algorithm after the task is finished, and the LRUA: the least recently used method is that information is stored to a memory position with less use times to protect the recently written information or the memory position which is just read is written to avoid repeatedly storing similar memory; while updating the memoryConverting cosine distance between each moment event in the buffer scene memory and memory event in the scene memory into writing weight by using softmax function
Wherein, D (K)s,Mt(i) Is the cosine distance of the scene from the memory event at time t, KsKey-value feature vectors of memory events in a context memory for the state at time t, Mt(i) Key value feature vectors of events at each moment in the scene memory in the buffer area;
the write weights of the events belonging to the same scene memory are then summed and averaged to obtain the coverage weight $\bar{w}(i) = \frac{1}{T}\sum_{t=1}^{T} w_t(i)$; according to the calculated $\bar{w}$, the new memory is written either to the location in the memory area of the most similar scene memory or to the location of the least frequently recalled scene memory;
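As a hedged sketch of the write-weight computation described above (the array shapes and names are my assumptions): each buffered event's cosine distances to the stored scene memories pass through a softmax, and the per-scene weights are averaged into a coverage weight:

```python
import numpy as np

def write_weights(sims):
    """Row-wise softmax: sims[t, i] = D(K_s, M_t(i)) for buffered event t and
    stored scene memory i; returns write weights that sum to 1 per event."""
    e = np.exp(sims - sims.max(axis=1, keepdims=True))  # numerically stable
    return e / e.sum(axis=1, keepdims=True)

def coverage_weight(w):
    """Sum and average the write weights of the events belonging to the same
    scene memory, giving one coverage weight per stored memory."""
    return w.mean(axis=0)
```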
Step 4: constructing the scene-memory-guided robot motor skill meta-learning algorithm;
meta-learning operates on two levels: the first level rapidly acquires knowledge within each individual task, and the second level slowly extracts information across all tasks, enabling the robot to learn skills from the training tasks using the training-set data. First, each training task is split into subtasks, with each action executed by the robot corresponding to an event. During training, the robot encapsulates event perception and behavior through the scene memory module and establishes the relation between events and behaviors; in addition, the robot learns all the training tasks through the meta-learning network and encapsulates the network weight information as experiential knowledge;
the meta-learning network is constructed with an LSTM in place of the back-propagation learning network; with the learning rate at time t set to $\alpha_t$, the learner parameters are updated as:

$$\theta_t = \theta_{t-1} - \alpha_t \nabla_{\theta_{t-1}} L_t$$
this learner parameter update has the same form as the LSTM cell-state update:

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
setting the forget gate $f_t = 1$, the cell state $c_{t-1} = \theta_{t-1}$, the learning rate $i_t = \alpha_t$, and $\tilde{c}_t = -\nabla_{\theta_{t-1}} L_t$ makes the two updates identical. When the network parameters fall into a "saddle point", the current parameters must be shrunk and the previous parameters $\theta_{t-1}$ partially forgotten; the learning rate $i_t$ and forget gate $f_t$ are therefore redefined as:

$$i_t = \sigma\big(W_I \cdot [\nabla_{\theta_{t-1}} L_t,\; L_t,\; \theta_{t-1},\; i_{t-1}] + b_I\big)$$
$$f_t = \sigma\big(W_F \cdot [\nabla_{\theta_{t-1}} L_t,\; L_t,\; \theta_{t-1},\; f_{t-1}] + b_F\big)$$
where $\sigma$ is the sigmoid function, $W_I$ and $W_F$ are the update weights of the input gate and forget gate respectively, $b_I$ and $b_F$ are the bias parameters of the input gate and forget gate respectively, $\theta_{t-1}$ is the learner parameter at time t−1, $L_t$ is the loss function after t updates, and $\nabla_{\theta_{t-1}} L_t$ is the gradient of the time t−1 loss function with respect to $\theta_{t-1}$;
the meta-learner updates the LSTM cell state through these two steps, allowing it to train quickly while avoiding divergence;
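A minimal per-parameter sketch of the gated update in step 4, following the standard LSTM-optimizer formulation; the gate input features and weight shapes are my assumptions, not specified by the patent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def meta_lstm_step(theta_prev, grad, loss, i_prev, f_prev, W_I, b_I, W_F, b_F):
    """One learner update treated as an LSTM cell-state update:
    theta_t = f_t * theta_{t-1} + i_t * (-grad).
    Gates are computed per parameter from [grad, loss, theta_{t-1}, gate_{t-1}]."""
    loss_vec = np.full_like(theta_prev, loss)
    i_t = sigmoid(np.stack([grad, loss_vec, theta_prev, i_prev], axis=-1) @ W_I + b_I)
    f_t = sigmoid(np.stack([grad, loss_vec, theta_prev, f_prev], axis=-1) @ W_F + b_F)
    theta_t = f_t * theta_prev - i_t * grad  # shrink theta_{t-1}, step along -grad
    return theta_t, i_t, f_t
```

With zero gate weights the input gate is a constant 0.5 and a large forget bias keeps the previous parameters intact, recovering plain gradient descent with a fixed learning rate.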
Step 5: constructing a generalized learning algorithm for new tasks based on the scene memory;
the robot memories obtained in steps 2, 3 and 4 guide the robot in learning new tasks that appear in the working environment. First, the perception module obtains the environment state information, the similarity between the current perception information and the events in the memory bank is measured using the cosine distance as the similarity measurement function, and matching scenes are retrieved by calculating the similarity measurement value between the scene information of the current event and the event information stored in the scene memory:

$$w_t^r(i) = \xi\, w_{t-1}^r(i) + D\big(K_t, M_t(i)\big)$$
where $\xi$ is the attenuation coefficient; a larger $\xi$ means previous events have a greater influence on the current state, and $\xi = 0$ when t = 1; $D(K_t, M_t(i))$ is the cosine measurement between the current event's scene information and the event information stored in the scene memory at time t;
secondly, a suitable scene memory is selected to guide the current task. According to the calculated read weight $w^r$, a guiding experience is selected: if a read weight exceeds the given threshold, the experience information of the scene to which that event belongs is extracted and used as experience to guide learning of the new task; if no event in memory has a read weight above the threshold, the current event is defined as a new event, a new scene is established for the current task, and the scene with the highest read weight is selected to guide learning of the new task.
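The step-5 retrieval logic can be sketched as follows; the exact recurrence for accumulating read weights with the attenuation coefficient ξ is my assumption based on the description above:

```python
import numpy as np

def cos(k, m):
    """Cosine similarity between two key vectors."""
    return float(np.dot(k, m) / (np.linalg.norm(k) * np.linalg.norm(m) + 1e-8))

def read_weights(event_keys, memory_keys, xi=0.5):
    """Accumulate read weights over the event sequence; xi controls how much
    previous events influence the current state (treated as 0 at t = 1)."""
    w = np.zeros(len(memory_keys))
    for t, k in enumerate(event_keys):
        sims = np.array([cos(k, m) for m in memory_keys])
        w = sims if t == 0 else xi * w + sims
    return w

def select_memory(w, threshold):
    """Return (index, is_new_event): guide from an above-threshold memory if
    one exists; otherwise flag a new event and use the best scene as a hint."""
    if w.max() > threshold:
        return int(np.argmax(w)), False
    return int(np.argmax(w)), True
```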
2. The method for robot skill acquisition based on meta-learning under the guidance of scene memory according to claim 1, characterized in that an event $e_i$ consists of the quadruple $\langle o, p_e, a, p_t \rangle$, where $o$ is the state perception of the environment obtained through sensors, including the distribution of objects in the image, the positional relations among objects, and the joint information of the robot; $p_e$ is the three-dimensional coordinates of the end effector of the mechanical arm; $a$ is the action executed by the mechanical arm, representing the robot's action sequence for the current task in the time dimension; and $p_t$ is the three-dimensional coordinates of the target object with which the mechanical arm interacts.
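The event quadruple of claim 2 could be represented as a simple record; the field types here are illustrative assumptions, not specified by the patent:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Event:
    """Event e_i = <o, p_e, a, p_t> as described in claim 2."""
    o: List[float]                    # environment state perception (sensors)
    p_e: Tuple[float, float, float]   # 3-D coordinates of the arm end effector
    a: int                            # action executed at this step
    p_t: Tuple[float, float, float]   # 3-D coordinates of the target object
```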
3. The scene-memory-guided meta-learning-based robot skill acquisition method according to claim 1 or 2, characterized in that the new memory is overwritten either to the location of the most similar scene memory or to the location of the least frequently recalled scene memory in the memory area:
(1) when there is high similarity between the two scenes, i.e. when the coverage weight $\bar{w}$ exceeds the set threshold, the scene is written to the location of the most similar scene memory in the memory area;
(2) if the coverage weight does not exceed the threshold, indicating that the scene in the buffer is not particularly similar to the scenes in the memory storage area, the location of the scene memory with the lowest use weight is selected and overwritten, ensuring efficient utilization of the storage area. The use weight is defined as the number of times a scene memory has been matched in the memory storage area; each time the scene memory is matched, its use weight is incremented by 1.
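The two overwrite cases of claim 3 can be sketched as one decision function; the threshold name `gamma` and the use-count representation are assumptions:

```python
import numpy as np

def overwrite_index(coverage_w, use_counts, gamma):
    """Where a buffered scene memory overwrites the store: case (1) replaces
    the most similar scene when its coverage weight exceeds gamma; case (2)
    replaces the least-used scene otherwise. use_counts[i] is incremented by 1
    each time memory i is matched."""
    best = int(np.argmax(coverage_w))
    if coverage_w[best] > gamma:
        return best
    return int(np.argmin(use_counts))
```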
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110740838.4A CN113657573B (en) | 2021-06-30 | 2021-06-30 | Robot skill acquisition method based on meta learning under scene memory guidance |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113657573A true CN113657573A (en) | 2021-11-16 |
CN113657573B CN113657573B (en) | 2024-06-21 |
Family
ID=78477833
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110740838.4A Active CN113657573B (en) | 2021-06-30 | 2021-06-30 | Robot skill acquisition method based on meta learning under scene memory guidance |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113657573B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180210939A1 (en) * | 2017-01-26 | 2018-07-26 | Hrl Laboratories, Llc | Scalable and efficient episodic memory in cognitive processing for automated systems |
CN109668566A (en) * | 2018-12-05 | 2019-04-23 | 大连理工大学 | Robot scene cognition map construction and navigation method based on mouse brain positioning cells |
CN111474932A (en) * | 2020-04-23 | 2020-07-31 | 大连理工大学 | Mobile robot mapping and navigation method integrating scene experience |
CN112231489A (en) * | 2020-10-19 | 2021-01-15 | 中国科学技术大学 | Knowledge learning and transferring method and system for epidemic prevention robot |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114161419A (en) * | 2021-12-13 | 2022-03-11 | 大连理工大学 | Robot operation skill efficient learning method guided by scene memory |
CN114161419B (en) * | 2021-12-13 | 2023-09-15 | 大连理工大学 | Efficient learning method for robot operation skills guided by scene memory |
CN115082717A (en) * | 2022-08-22 | 2022-09-20 | 成都不烦智能科技有限责任公司 | Dynamic target identification and context memory cognition method and system based on visual perception |
CN115082717B (en) * | 2022-08-22 | 2022-11-08 | 成都不烦智能科技有限责任公司 | Dynamic target identification and context memory cognition method and system based on visual perception |
CN116563638A (en) * | 2023-05-19 | 2023-08-08 | 广东石油化工学院 | Image classification model optimization method and system based on scene memory |
CN116563638B (en) * | 2023-05-19 | 2023-12-05 | 广东石油化工学院 | Image classification model optimization method and system based on scene memory |
CN118536545B (en) * | 2024-06-13 | 2024-11-15 | 东北电力大学 | Scene memory network design method based on synaptic remodeling model |
Also Published As
Publication number | Publication date |
---|---|
CN113657573B (en) | 2024-06-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111203878B (en) | Robot sequence task learning method based on visual simulation | |
CN112605973B (en) | Robot motor skill learning method and system | |
Zhu | An adaptive agent decision model based on deep reinforcement learning and autonomous learning | |
Paxton et al. | Prospection: Interpretable plans from language by predicting the future | |
CN113657573A (en) | Robot skill acquisition method based on meta-learning under guidance of contextual memory | |
CN111300390B (en) | Intelligent mechanical arm control system based on reservoir sampling and double-channel inspection pool | |
CN109940614B (en) | Mechanical arm multi-scene rapid motion planning method integrating memory mechanism | |
EP4121256A1 (en) | Training and/or utilizing machine learning model(s) for use in natural language based robotic control | |
CN112183188B (en) | Method for simulating learning of mechanical arm based on task embedded network | |
Lippi et al. | Enabling visual action planning for object manipulation through latent space roadmap | |
CN115860107B (en) | Multi-machine searching method and system based on multi-agent deep reinforcement learning | |
Waytowich et al. | A narration-based reward shaping approach using grounded natural language commands | |
CN112509392B (en) | Robot behavior teaching method based on meta-learning | |
Li et al. | Curiosity-driven exploration for off-policy reinforcement learning methods | |
CN114161419B (en) | Efficient learning method for robot operation skills guided by scene memory | |
CN118365099B (en) | Multi-AGV scheduling method, device, equipment and storage medium | |
CN117332366A (en) | Information processing method, task execution method, device, equipment and medium | |
Reinhart | Reservoir computing with output feedback | |
CN115016499A (en) | Path planning method based on SCA-QL | |
US20220305647A1 (en) | Future prediction, using stochastic adversarial based sampling, for robotic control and/or other purpose(s) | |
Zhou et al. | Humanoid action imitation learning via boosting sample DQN in virtual demonstrator environment | |
Xiong et al. | Primitives generation policy learning without catastrophic forgetting for robotic manipulation | |
Yu et al. | LSTM learn policy from dynamical system of demonstration motions for robot imitation learning | |
CN117590756B (en) | Motion control method, device, equipment and storage medium for underwater robot | |
Chen et al. | Distributed continuous control with meta learning on robotic arms |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||