
CN111401557B - Agent decision making method, AI model training method, server and medium - Google Patents

Agent decision making method, AI model training method, server and medium Download PDF

Info

Publication number
CN111401557B
CN111401557B CN202010492473.3A
Authority
CN
China
Prior art keywords
information
agent
current frame
frame
map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010492473.3A
Other languages
Chinese (zh)
Other versions
CN111401557A (en)
Inventor
张弛
郭仁杰
王宇舟
武建芳
杨木
杨正云
李宏亮
刘永升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Super Parameter Technology Shenzhen Co ltd
Original Assignee
Super Parameter Technology Shenzhen Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Super Parameter Technology Shenzhen Co ltd filed Critical Super Parameter Technology Shenzhen Co ltd
Priority to CN202010492473.3A priority Critical patent/CN111401557B/en
Publication of CN111401557A publication Critical patent/CN111401557A/en
Application granted granted Critical
Publication of CN111401557B publication Critical patent/CN111401557B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/55Controlling game characters or game objects based on the game progress
    • A63F13/58Controlling game characters or game objects based on the game progress by computing conditions of game characters, e.g. stamina, strength, motivation or energy level
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/60Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor
    • A63F13/67Generating or modifying game content before or while executing the game program, e.g. authoring tools specially adapted for game development or game-integrated level editor adaptively or by learning from player actions, e.g. skill level adjustment or by storing successful combat sequences for re-use
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60Methods for processing data by generating or executing the game program
    • A63F2300/6027Methods for processing data by generating or executing the game program using adaptive systems learning from user actions, e.g. for skill level adjustment
    • AHUMAN NECESSITIES
    • A63SPORTS; GAMES; AMUSEMENTS
    • A63FCARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F2300/00Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
    • A63F2300/60Methods for processing data by generating or executing the game program
    • A63F2300/65Methods for processing data by generating or executing the game program for computing the condition of a game character

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The application discloses an AI-model-based agent decision making method, an AI model training method, a server, and a medium. The method comprises the following steps: acquiring current frame state information of an agent and current frame 3D map information in a 3D virtual environment; outputting, through a temporal feature extraction module of an AI model, current frame action output information corresponding to the agent based on the current frame state information and the current frame 3D map information; obtaining next frame state information of the agent according to the current frame action output information; acquiring historical position information of the agent and generating next frame 3D map information according to the historical position information; and outputting next frame action output information corresponding to the agent according to the next frame state information and the next frame 3D map information. Reliable and efficient AI simulation is thereby achieved.

Description

Agent decision making method, AI model training method, server and medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to an agent decision making method, an AI model training method, a server, and a medium.
Background
With the rapid development of Artificial Intelligence (AI) technology, the AI technology is widely applied to various fields such as 3D games, virtual traffic, automatic driving simulation, robot trajectory planning, etc., and AI simulation in a 3D virtual space has a great commercial value.
At present, the correct decisions that an agent needs to make at different positions in an AI simulation are generally learned through the memory capacity of a neural network, and a soft-attention mechanism is used to perform decision analysis on all state information, including dynamically changing information and statically unchanging information, such as the continuous movement of teammates and enemies in a 3D game and the positions of material points. This approach can handle scenes where the environmental information changes simply, but it is not suitable for scenes where the environmental information changes rapidly, and the agent has difficulty making long-term decisions. Therefore, how to realize reliable and efficient AI simulation is an urgent problem to be solved.
Disclosure of Invention
The embodiment of the application provides an agent decision making method, an AI model training method, a server and a medium, which can realize reliable and efficient AI simulation.
In a first aspect, an embodiment of the present application provides an agent decision making method based on an AI model, including:
acquiring current frame state information of an agent and current frame 3D map information in a 3D virtual environment;
outputting current frame action output information corresponding to the agent based on the current frame state information and the current frame 3D map information through a time sequence feature extraction module of an AI model;
obtaining next frame state information of the agent according to the current frame action output information;
acquiring historical position information of the agent, and generating next frame 3D map information according to the historical position information;
and outputting next frame action output information corresponding to the agent according to the next frame state information and the next frame 3D map information.
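The five steps above form a closed per-frame loop. Below is a minimal sketch of that loop, with a toy policy, toy environment, and single-channel map builder standing in for the AI model and the 3D virtual environment; every name and behavior here is an illustrative assumption, not taken from the patent.

```python
def act(state, map_3d):
    # stand-in for the AI model's temporal feature extraction module
    return "move_forward"

def apply_action(state, action):
    # toy 3D virtual environment: the action advances the agent one cell
    x, y = state["position"]
    return {"position": (x, y + 1)}

def build_map(history, size=9):
    # first-layer channel only: 1 where the agent has been, else 0
    grid = [[0] * size for _ in range(size)]
    for (x, y) in history:
        if 0 <= x < size and 0 <= y < size:
            grid[x][y] = 1
    return grid

def decision_loop(state, map_3d, history, n_frames=3):
    for _ in range(n_frames):
        action = act(state, map_3d)          # steps 1-2: model outputs the action
        state = apply_action(state, action)  # step 3: next frame state
        history.append(state["position"])    # step 4: record position history
        map_3d = build_map(history)          # step 4: next frame 3D map
    return state, map_3d                     # step 5 repeats the cycle per frame
```

Each iteration feeds the pair (next frame state, next frame 3D map) back into the policy, which is what lets the map channels accumulate the agent's movement history across frames.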
In a second aspect, an embodiment of the present application further provides a method for training an AI model, including:
acquiring a sample data set, wherein the sample data set comprises multi-frame state information and multi-frame 3D map information of an agent;
outputting multi-frame fusion state vector information corresponding to the agent through a timing sequence feature extraction module of an AI model to be trained based on the multi-frame state information and the multi-frame 3D map information;
constructing a loss function according to the multi-frame fusion state vector information;
and performing multi-step iteration on the loss function to train and update the AI model.
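A hedged sketch of this training loop follows. The excerpt does not specify the loss, so a placeholder mean-squared loss over the multi-frame fusion state vectors and a finite-difference gradient step stand in for the real construction; every name and dimension is an illustrative assumption.

```python
import numpy as np

def fusion_vectors(model_w, states, maps):
    # stand-in for the temporal feature extraction module: one fused
    # state vector per frame, from that frame's (state, 3D map) pair
    return [np.tanh(model_w @ np.concatenate([s, m.ravel()]))
            for s, m in zip(states, maps)]

def loss(model_w, states, maps, targets):
    # placeholder loss built from the multi-frame fusion state vectors
    vs = fusion_vectors(model_w, states, maps)
    return sum(np.mean((v - t) ** 2) for v, t in zip(vs, targets)) / len(vs)

def train(states, maps, targets, steps=50, lr=0.1, eps=1e-4):
    # multi-step iteration on the loss, here via a finite-difference gradient
    rng = np.random.default_rng(0)
    w = rng.standard_normal((4, states[0].size + maps[0].size)) * 0.1
    for _ in range(steps):
        g = np.zeros_like(w)
        base = loss(w, states, maps, targets)
        for idx in np.ndindex(*w.shape):
            w2 = w.copy()
            w2[idx] += eps
            g[idx] = (loss(w2, states, maps, targets) - base) / eps
        w -= lr * g
    return w
```

In practice the gradient would come from backpropagation through the real network; the finite-difference step is only there to keep the sketch dependency-free.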
In a third aspect, an embodiment of the present application further provides a server, where the server includes a processor, a memory, and a computer program stored on the memory and executable by the processor, where the memory stores an AI model, and where the computer program, when executed by the processor, implements the AI model-based agent decision making method as described above; alternatively, a training method of the AI model as described above is implemented.
In a fourth aspect, the present application further provides a computer-readable storage medium for storing a computer program, which when executed by a processor, causes the processor to implement the above-mentioned AI model-based agent decision-making method; alternatively, the above-described training method of the AI model is implemented.
The embodiments of the application provide an AI-model-based agent decision making method, an AI model training method, a server, and a computer-readable storage medium. Based on the current frame state information of an agent and the current frame 3D map information in a 3D virtual environment, a temporal feature extraction module of the AI model outputs current frame action output information corresponding to the agent; next frame state information of the agent is obtained according to the current frame action output information; next frame 3D map information is generated according to the historical position information of the agent; and next frame action output information of the agent is then obtained according to the next frame state information and the next frame 3D map information. Obtaining each frame's action output information in this manner enables long-term decision making, so that reliable and efficient AI simulation is achieved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings based on these drawings without creative effort.
FIG. 1 is a flow chart illustrating steps of an AI model based agent decision-making method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a first-level channel of a 3D map provided by an embodiment of the present application;
FIG. 3 is a schematic diagram of a second level channel of a 3D map provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a third level channel of a 3D map provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of a fourth-layer channel of a 3D map provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of an AI model based agent action output provided by an embodiment of the application;
FIG. 7 is a flowchart illustrating steps of a method for training an AI model according to an embodiment of the present application;
FIG. 8 is a schematic diagram of AI model training provided by an embodiment of the present application;
fig. 9 is a schematic block diagram of a server provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The flow diagrams depicted in the figures are merely illustrative and do not necessarily include all of the elements and operations/steps, nor do they necessarily have to be performed in the order depicted. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It is to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
At present, in AI simulation of a 3D virtual space, the memory capacity of a neural network is generally used to learn the correct decisions that an agent needs to make at different positions, and a soft-attention mechanism is used to analyze all state information, including dynamically changing information and statically unchanging information, such as the continuous movement of teammates and enemies in a 3D game and the positions of material points. Such AI simulation can handle scenes where the environmental information changes simply, but it is not suitable for scenes where the environmental information changes rapidly, and the agent has difficulty making long-term decisions.
In order to solve the above problems, embodiments of the present application provide an AI model-based agent decision making method, an AI model training method, a server, and a computer-readable storage medium, which are used to implement reliable and efficient AI simulation. The AI model-based agent decision making method and the AI model training method can be applied to a server, and the server can be a single server or a server cluster consisting of a plurality of servers.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating an AI model-based agent decision-making method according to an embodiment of the present disclosure.
As shown in fig. 1, the AI model-based agent decision-making method specifically includes steps S101 to S105.
S101, obtaining current frame state information of an agent and current frame 3D map information in a 3D virtual environment.
For example, in various application scenarios such as robot simulation, mechanical arm control, unmanned driving, and virtual traffic simulation in a 3D virtual environment, or in game AI for 3D games, in order to implement fast and efficient simulation, a correct decision is made for an Agent in the 3D virtual environment by acquiring the agent's current frame state information and the current frame 3D map information in the 3D virtual environment. An agent is an entity hosted in a complex dynamic environment that autonomously senses environment information, autonomously takes action, and achieves a series of preset targets or tasks. The state information of the agent includes, but is not limited to, its position information, motion information, combat power information, etc.
Illustratively, the 3D map information is relative map information within a preset range centered on the agent's current position, rather than the global map information of the 3D virtual environment, for example, relative map information within a 90 m × 90 m range centered on the agent's current position.
In an embodiment, the 3D map of the 3D virtual environment comprises multiple layers of channels, each layer of channels being composed of n × n grids; for example, with n = 9, each layer of channels is composed of 9 × 9 grids. Each grid has a side length of L m; for example, with L = 10, each grid is 10 m on a side.
It should be noted that the number of grids in each layer of channels and the grid size can be flexibly set according to the actual situation and are not specifically limited herein. By performing grid segmentation on the local 3D map, the problem of an excessively large map information dimension caused by an excessively large map size is avoided.
Each layer of channels records a different type of information, including but not limited to whether the agent has moved to the position of a grid, the number of times the agent has moved to the position of the grid, the order in which the agent moved to the position of the grid, the number of material points at the position of the grid, the state information of the agent when moving to the position of the grid, etc.
Optionally, the multiple layers of channels of the 3D map include a first-layer channel, a second-layer channel, a third-layer channel, and a fourth-layer channel, where a grid of the first-layer channel records whether the agent has moved to the position of the grid, a grid of the second-layer channel records the number of times the agent has moved to the position of the grid, a grid of the third-layer channel records the order in which the agent moved to the position of the grid, and a grid of the fourth-layer channel records the number of material points at the position of the grid.
Illustratively, a corresponding grid of the first-layer channel is filled with first identification information to represent that the agent has moved to the position of the grid, and with second identification information to represent that the agent has not moved to the position of the grid.
For example, as shown in fig. 2, the first identification information is set to be a value 1, the second identification information is set to be a value 0, and the grid of the first-layer channel stores the value 0 or the value 1 to represent whether the agent has moved to the location of the grid, where the grid storing the value 0 represents that the agent has not moved to the location of the grid, and the grid storing the value 1 represents that the agent has moved to the location of the grid.
Illustratively, the grids of the second-layer channel are filled with integers representing the number of times the agent has moved to the position of each grid. For example, as shown in fig. 3, a grid of the second-layer channel filled with 0 represents that the agent has not moved to the position of that grid, a grid filled with 1 represents that the agent has moved there 1 time, a grid filled with 2 represents 2 times, a grid filled with 3 represents 3 times, and so on.
Illustratively, the grids of the third-layer channel are filled with numbers of different sizes, characterizing the order in which the agent moved to the positions of the grids. For example, as shown in fig. 4, the smaller the number stored in a grid of the third-layer channel, the later the agent moved to the position of that grid. It should be noted that the representation may also be reversed, so that the earlier the agent moved to the position of a grid, the smaller the stored number.
Illustratively, the grids of the fourth-layer channel are filled with different values representing the number of material points at the position of each grid. For example, as shown in fig. 5, a grid of the fourth-layer channel filled with 0 represents that there are no material points at the position of that grid, a grid filled with 1 represents 1 material point, a grid filled with 2 represents 2 material points, a grid filled with 3 represents 3 material points, and so on.
Based on the 3D map information, information such as whether the agent has moved to the position of a corresponding grid of the 3D map, the number of times the agent has moved to the position of the grid, the order in which the agent moved to the position of the grid, and the number of material points at the position of the grid is obtained.
In an embodiment, the 3D map records, over a preset number of historical records, information such as whether the agent has moved to the position of each grid, the number of times the agent has moved to the position of the grid, the order in which the agent moved to the position of the grid, and the number of material points at the position of the grid. For example, with the preset number set to 20: a first-layer channel grid storing 0 indicates that the agent has not moved to the position of that grid in the 20 historical records, while a stored value of 1 indicates that it has; a second-layer channel grid filled with 1 represents that the agent reached the position of that grid 1 time in the 20 historical records, and a grid filled with 2 represents 2 times; the grids of the third-layer channel store the order in which the agent arrived at the grids in the 20 historical records, numbered from 0 to 19, with later arrival times corresponding to smaller stored numbers.
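Under the encodings above (0/1 visit flags, visit counts, recency ranks numbered so that later arrivals get smaller numbers, and material-point counts), the four channels can be sketched as follows; the 9 × 9 grid, the data layout, and all function names are illustrative assumptions.

```python
import numpy as np

def build_channels(history, supplies, n=9):
    """history: recent (row, col) agent positions, oldest first (up to 20).
    supplies: dict mapping (row, col) -> number of material points there."""
    ch = np.zeros((4, n, n), dtype=np.int32)
    for rank, (r, c) in enumerate(history):
        ch[0, r, c] = 1                    # channel 1: visited at all
        ch[1, r, c] += 1                   # channel 2: number of visits
        # channel 3: arrival order; later arrivals get SMALLER numbers
        ch[2, r, c] = len(history) - 1 - rank
    for (r, c), k in supplies.items():
        ch[3, r, c] = k                    # channel 4: material point count
    return ch
```

A revisited cell keeps its flag and accumulates its count, while its order entry is overwritten by the most recent visit, matching the per-channel descriptions in the text.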
By embedding the agent's corresponding position information into the channels of the 3D map and adding the material point information to the channels, position information recognition in the AI simulation is improved, further improving the generalization of the AI model network.
And S102, outputting current frame action output information corresponding to the agent through a time sequence feature extraction module of an AI model based on the current frame state information and the current frame 3D map information.
In this embodiment, the AI model is provided with a corresponding temporal feature extraction module, which includes, but is not limited to, an LSTM (Long Short-Term Memory) module, a GRU (Gated Recurrent Unit) module, a Transformer module, and the like.
The AI model is called, and the temporal feature extraction module of the AI model takes the agent's current frame state information and current frame 3D map information as input information, processes the input information, performs temporal feature extraction, and outputs the current frame action output information corresponding to the agent.
In an embodiment, the agent's current frame state information and the current frame 3D map information are first fused by CONCAT and then input to the temporal feature extraction module for processing. Specifically, the state embedding vector feature S_t is first extracted from the agent's current frame state information, and the map vector feature M_t is obtained from the current frame 3D map information; S_t and M_t are then merged and input to a fully connected neural network for processing, yielding the fusion information corresponding to S_t and M_t. The fusion information is input to the temporal feature extraction module for processing, which outputs the current frame action output information corresponding to the agent.
In one embodiment, with different types of information recorded in the multi-layer channels of the 3D map, the map vector feature M_t is obtained from the current frame 3D map information by performing multi-layer convolution on the different types of information. For example, for a 3D map comprising the four layers of channels described above, the current frame 3D map information is passed through 4 layers of convolution, with a flattening operation after the last convolution layer, to obtain the map vector feature M_t.
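A shape-level sketch of this fusion step, replacing the 4-layer CNN with a single random projection purely to keep the example short; the dimensions here (4 × 9 × 9 map, 128-dim state embedding S_t, 64-dim M_t, 256-dim fusion vector) are illustrative assumptions, not from the patent.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv_flatten(map_3d, out_dim=64):
    # stand-in for the 4-layer CNN plus flatten: project the flattened
    # multi-channel map down to the map vector feature M_t
    w = rng.standard_normal((map_3d.size, out_dim)) * 0.01
    return map_3d.reshape(-1) @ w

def fuse(s_t, m_t, out_dim=256):
    # CONCAT of S_t and M_t, then a fully connected layer, as in the text
    x = np.concatenate([s_t, m_t])
    w = rng.standard_normal((x.size, out_dim)) * 0.01
    return np.tanh(x @ w)          # fusion information fed to the LSTM

s_t = rng.standard_normal(128)                      # state embedding S_t
m_t = conv_flatten(rng.standard_normal((4, 9, 9)))  # map vector feature M_t
x_t = fuse(s_t, m_t)                                # fusion information x_t
```

The random projections only illustrate the data flow and tensor shapes; a trained CNN and trained fully connected weights would replace them in the real model.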
And S103, obtaining next frame state information of the agent according to the current frame action output information.
And S104, acquiring historical position information of the agent, and generating next frame 3D map information according to the historical position information.
Based on the output current frame action output information, the agent is controlled to execute the corresponding action and interact with the 3D virtual environment; the agent's state information is updated to obtain the agent's next frame state information. Meanwhile, the agent's position information is recorded, and each recorded position is stored as the agent's historical position information, either locally on the server or in a storage device other than the server, which is not specifically limited herein.
The stored historical position information of the agent is queried and acquired, and the next frame 3D map information is constructed from it. For example, a preset number of historical position records used to construct the 3D map information is set in advance; that number of historical position records is acquired, and the next frame 3D map information is constructed based on them. Optionally, the preset number is set to 20, i.e., the next frame 3D map information is constructed from 20 sets of historical position information. The preset number can be flexibly set according to the actual situation and is not specifically limited herein.
In one embodiment, to save storage space, only a preset number of historical position records is kept. Each time a new position is recorded and stored, the earliest record is deleted from the stored historical position information, so that the number of stored records is maintained at the preset number. Specifically, each time the agent's current position information is recorded, it is determined whether the number of stored historical position records has reached the preset number. If it has not, the current position information is stored directly; if it has, the current position information is stored and the earliest stored record is deleted, keeping the number of stored records at the preset number.
In one embodiment, the agent's position is not recorded every frame; instead, a preset interval is set, and the agent's position information is recorded and stored once per interval as the agent's historical position information. The preset interval can be flexibly set according to the actual situation; for example, with the interval set to 10 s, the agent's position is recorded and stored every 10 s.
Combining this with the preset number above, assume the preset number is 20 and the preset interval is 10 s, i.e., the agent's position is recorded every 10 s and a total of 20 sets of historical position information are kept, which corresponds to saving the historical position information within a 200 s time span. The next frame 3D map information is then constructed from these 20 sets of historical position information within the 200 s span.
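The record-every-interval, keep-the-latest-N scheme maps naturally onto a fixed-length deque. A minimal sketch, with the integer-second sampling and all names as illustrative assumptions:

```python
from collections import deque

MAX_RECORDS = 20   # preset number of historical position records
PERIOD = 10        # preset interval in seconds between records

# a bounded deque drops the earliest record automatically once full,
# which is exactly the deletion rule described above
history = deque(maxlen=MAX_RECORDS)

def record(t, position):
    # store the agent's position only on each sampling boundary
    if t % PERIOD == 0:
        history.append(position)

for t in range(400):           # simulate 400 s of play, one call per second
    record(t, (t, t))          # toy position that encodes the timestamp
```

After 400 s, only the 20 most recent samples (from t = 200 s to t = 390 s) survive, i.e., roughly the 200 s span discussed above.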
For example, taking the temporal feature extraction module as an LSTM module, the LSTM module serves as an independent feature extraction unit that accepts the previous frame hidden state information and the current frame state information as inputs and outputs the corresponding current frame hidden state information, where the hidden state information comprises the hidden state and the cell state, and the current frame hidden state information serves as an input for the next frame. Based on the agent's current frame state information, the current frame 3D map information, and the previous frame hidden state information, the current frame state information and the current frame 3D map information are first fused by CONCAT; the fusion information and the previous frame hidden state information are then input to the LSTM module, which outputs the corresponding current frame hidden state information. The current frame action output information corresponding to the agent is then obtained from the current frame hidden state information.
For example, as shown in fig. 6, the current frame 3D map information is subjected to 4-layer convolution calculation to obtain the current frame map vector feature M_t. The current frame state embedding vector feature S_t corresponding to the current frame state information of the agent is merged (CONCAT) with the current frame map vector feature M_t and input into the fully-connected neural network for processing to obtain the corresponding fusion information. The fusion information, the previous frame hidden information h_{t-1}, and the previous frame cell state information C_{t-1} are then input into the LSTM module for processing, and the current frame action output information corresponding to the agent is output.
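As a rough illustration of the convolution step (not the actual 4-layer network, whose kernel sizes and weights are not given in the text), a single valid-padding 2D convolution over one map channel can be written as:

```python
# Minimal single-channel 2D convolution with "valid" padding; the real
# model stacks four such layers over the multi-layer channel map to
# produce the map vector feature M_t (layer shapes here are assumptions).
def conv2d(grid, kernel):
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(grid) - kh + 1):
        row = []
        for j in range(len(grid[0]) - kw + 1):
            row.append(sum(grid[i + di][j + dj] * kernel[di][dj]
                           for di in range(kh)
                           for dj in range(kw)))
        out.append(row)
    return out

# a 3x3 map channel convolved with a 2x2 kernel yields a 2x2 feature map
feature = conv2d([[1, 0, 2],
                  [0, 1, 0],
                  [3, 0, 1]],
                 [[1, 0],
                  [0, 1]])
```

Stacking several such layers (with learned kernels and nonlinearities) progressively reduces the grid into the compact map vector feature that is concatenated with the state embedding.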
Three gates are designed in the LSTM module: a forget gate, an input gate, and an output gate, and the three gates perform different processing on the input information. The inputs are the previous frame hidden state information, comprising the previous frame hidden information h_{t-1} and the previous frame cell state information C_{t-1}, together with the fusion information x_t of the current frame state information and the current frame 3D map information of the agent; the outputs are the current frame hidden information h_t and the current frame cell state information C_t. Through the forget gate, the previous frame hidden information h_{t-1} and the fusion information x_t are merged (CONCAT), passed through a forward network, and the forgetting probability f_t (a value between 0 and 1) is output through a Sigmoid function. Through the input gate, the previous frame hidden information h_{t-1} and the fusion information x_t are merged (CONCAT), passed through a forward network, and the corresponding input probability i_t (a value between 0 and 1) is output through a Sigmoid function; meanwhile, another forward network processes the fusion information x_t and outputs the candidate state C̃_t through a tanh function. f_t is multiplied by the previous frame cell state information C_{t-1}, i_t is multiplied by C̃_t, the two products are added, and the sum is used to update the output current frame cell state information C_t as follows:

C_t = f_t · C_{t-1} + i_t · C̃_t
The output gate controls the output information of the LSTM unit. The output current frame hidden information h_t integrates the previous frame hidden information h_{t-1}, the previous frame cell state information C_{t-1}, and the fusion information x_t. The output probability O_t of the fusion information x_t is calculated through a Sigmoid function; meanwhile, the current frame cell state information C_t is processed by a tanh function and multiplied by O_t to obtain the current frame hidden information h_t as follows:
h_t = O_t · tanh(C_t)
The current frame hidden information h_t includes the fusion state vector information corresponding to the agent. Based on the current frame hidden information h_t in the output current frame hidden state information, the fusion state vector information corresponding to the agent is acquired, where the fusion state vector information incorporates the multi-frame state information of the agent. The current frame action output information of the agent is then obtained according to the fusion state vector information.
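The gate equations above can be checked with a minimal scalar sketch; the weights below are illustrative numbers rather than trained parameters, and the real module operates on vectors, not scalars.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W):
    """One scalar LSTM step.

    x_t    : fusion information of current frame state and 3D map features
    h_prev : previous frame hidden information h_{t-1}
    c_prev : previous frame cell state information C_{t-1}
    W      : per-gate (weight_x, weight_h, bias) tuples (illustrative)
    """
    def gate(name, act):
        wx, wh, b = W[name]
        return act(wx * x_t + wh * h_prev + b)

    f_t = gate("forget", sigmoid)          # forgetting probability f_t
    i_t = gate("input", sigmoid)           # input probability i_t
    c_cand = gate("candidate", math.tanh)  # candidate state C~_t
    c_t = f_t * c_prev + i_t * c_cand      # C_t = f_t*C_{t-1} + i_t*C~_t
    o_t = gate("output", sigmoid)          # output probability O_t
    h_t = o_t * math.tanh(c_t)             # h_t = O_t*tanh(C_t)
    return h_t, c_t

W = {"forget": (0.5, 0.5, 0.0), "input": (0.5, 0.5, 0.0),
     "candidate": (1.0, 1.0, 0.0), "output": (0.5, 0.5, 0.0)}
h, c = 0.0, 0.0
for x in [1.0, -0.5, 0.25]:   # a short sequence of fused features
    h, c = lstm_step(x, h, c, W)
```

Because the Sigmoid outputs lie between 0 and 1, f_t and i_t act as soft switches over how much old cell state is kept and how much new candidate state is written, which is what gives the agent its memory across frames.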
And S105, outputting next frame action output information corresponding to the intelligent agent according to the next frame state information and the next frame 3D map information.
After the next frame state information and the next frame 3D map information of the agent are obtained, the next frame action output information corresponding to the agent is output, following the operation in step S102, based on the next frame state information and the next frame 3D map information through the time sequence feature extraction module of the AI model. For the specific operation process, reference may be made to the process described in step S102, which is not repeated here.
Therefore, based on the state information and 3D map information of each frame of the agent, the action output information of each frame corresponding to the agent can be output, and the agent can make efficient and reliable long-term decisions according to that per-frame action output information. That is, by combining the time sequence feature extraction module, such as the LSTM module, with the 3D map, the agent can form a good memory in the 3D virtual environment and make long-term decisions.
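The frame-by-frame flow of steps S101 to S105 can be sketched as a loop in which each frame's hidden state is carried into the next frame; `decide_per_frame` and the toy step function are hypothetical names for illustration only, standing in for the AI model's time sequence feature extraction module.

```python
# Hypothetical per-frame decision loop: the hidden state output for the
# current frame is fed back as the input of the next frame, which is how
# the agent accumulates memory across frames.
def decide_per_frame(frames, step_fn):
    """frames  : list of (state_info, map_info) pairs, one per frame
    step_fn    : returns (action, hidden, cell) given the frame inputs
    """
    hidden = cell = None
    actions = []
    for state, map3d in frames:
        action, hidden, cell = step_fn(state, map3d, hidden, cell)
        actions.append(action)
    return actions

# Toy step function standing in for the LSTM module: the "action" is just
# a running sum, which makes the carried-over memory visible.
def toy_step(state, map3d, hidden, cell):
    hidden = (hidden or 0.0) + state + map3d
    return hidden, hidden, cell

actions = decide_per_frame([(1, 2), (3, 4), (5, 6)], toy_step)
# each action depends on all earlier frames through the carried hidden state
```

Replacing `toy_step` with a real LSTM step would reproduce the long-term decision behavior described above without changing the loop structure.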
According to the AI model-based agent decision making method of the foregoing embodiments, the current frame state information of the agent and the current frame 3D map information in the 3D virtual environment are acquired; the current frame action output information corresponding to the agent is output through the time sequence feature extraction module of the AI model based on the current frame state information and the current frame 3D map information; the next frame state information of the agent is obtained according to the current frame action output information; the next frame 3D map information is generated according to the historical position information of the agent; and the next frame action output information of the agent is then obtained according to the next frame state information and the next frame 3D map information. The action output information of each frame of the agent is obtained in this way, thereby implementing long-term decision making and thus reliable and efficient AI simulation.
The embodiment of the application also provides a training method of the AI model. The training method of the AI model can be applied to a server, so that reliable and efficient AI simulation can be realized by calling the trained AI model. The server may be a single server or a server cluster composed of a plurality of servers.
Referring to fig. 7, fig. 7 is a flowchart illustrating a method for training an AI model according to an embodiment of the present disclosure.
As shown in fig. 7, the AI model training method includes steps S201 to S204.
S201, obtaining a sample data set, wherein the sample data set comprises multi-frame state information and multi-frame 3D map information of the intelligent agent.
Illustratively, a sample data set corresponding to AI model training is stored in a redis (Remote Dictionary Server) database and used for training the AI model, where the sample data set includes, but is not limited to, the multi-frame state information and multi-frame 3D map information of the agent. The sample data set corresponding to AI model training is acquired by querying the redis database.
S202, outputting multi-frame fusion state vector information corresponding to the intelligent agent through a time sequence feature extraction module of the AI model to be trained on the basis of the multi-frame state information and the multi-frame 3D map information.
As described in the embodiment of the AI model-based agent decision making method, the AI model is provided with a corresponding time sequence feature extraction module, wherein the time sequence feature extraction module includes, but is not limited to, an LSTM module, a GRU module, a Transformer module, and the like.
The time sequence feature extraction module of the AI model takes the multi-frame state information and multi-frame 3D map information of the agent as input information, processes the input information, extracts the time sequence features, and outputs the multi-frame fusion state vector information corresponding to the agent. Specifically, the state embedding vector features S_i in the multi-frame state information and the map vector features M_i corresponding to the multi-frame 3D map information are extracted, and the multi-frame state embedding vector features S_i and map vector features M_i are input into the time sequence feature extraction module, which processes them and outputs the multi-frame fusion state vector information corresponding to the agent.
For example, taking the LSTM module as the time sequence feature extraction module, as shown in fig. 8, multi-layer convolution calculation is performed on each of the multiple frames of 3D map information to obtain the map vector feature M_i corresponding to each frame of 3D map information, including M_t, M_{t+1}, and so on, and the state embedding vector feature S_i corresponding to each frame of state information is obtained, including S_t, S_{t+1}, and so on. S_t and M_t, together with the previous frame hidden information h_{t-1} and previous frame cell state information C_{t-1}, are input into the LSTM module for processing, which outputs the fusion state vector information corresponding to the current frame hidden information h_t and current frame cell state information C_t; S_{t+1} and M_{t+1}, together with the current frame hidden information h_t and current frame cell state information C_t, are input into the LSTM module for processing, which outputs the fusion state vector information corresponding to the next frame hidden information h_{t+1} and next frame cell state information C_{t+1}; and the multi-frame fusion state vector information is obtained from the fusion state vector information of each frame.
S203, constructing a loss function according to the multi-frame fusion state vector information.
The loss function includes a value loss, a policy gradient loss, an entropy loss, and the like.
In an embodiment, for the multi-frame fusion state vector information, the action output information corresponding to each frame of fusion state vector information and the value function output value corresponding to that action output information are acquired based on each frame of fusion state vector information. The value function output value is used to evaluate the action output information: if the value function output value is high, the relevant action instruction of the corresponding action output information can be controlled to be executed; if the value function output value is low, the relevant action instruction of the corresponding action output information is not executed. A corresponding loss function is constructed based on the obtained multi-frame action output information and the value function output values corresponding to the multi-frame action output information.
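A hedged sketch of how the three loss terms named above (value loss, policy gradient loss, entropy loss) are commonly combined in actor-critic training; the coefficients and the per-action entropy approximation below are illustrative assumptions, not values from the patent.

```python
import math

def actor_critic_loss(log_probs, advantages, values, returns,
                      value_coef=0.5, entropy_coef=0.01):
    """Combine policy gradient loss, value loss, and entropy loss.

    log_probs  : log-probabilities of the actions that were taken
    advantages : advantage estimates for those actions
    values     : value function output values per frame
    returns    : empirical returns used as value targets
    """
    n = len(log_probs)
    # policy gradient loss: raise probability of high-advantage actions
    policy_loss = -sum(lp * a for lp, a in zip(log_probs, advantages)) / n
    # value loss: squared error between value outputs and returns
    value_loss = sum((v - r) ** 2 for v, r in zip(values, returns)) / n
    # entropy bonus (approximated from the taken actions' probabilities)
    entropy = -sum(lp * math.exp(lp) for lp in log_probs) / n
    return policy_loss + value_coef * value_loss - entropy_coef * entropy

loss = actor_critic_loss(log_probs=[math.log(0.5)],
                         advantages=[1.0],
                         values=[0.0],
                         returns=[1.0])
```

The entropy term is subtracted so that minimizing the total loss keeps the action distribution from collapsing too early, while the value term trains the evaluator used to gate action execution.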
And S204, carrying out multi-step iteration on the loss function to train and update the AI model.
Optionally, as shown in fig. 8, the loss function is sent to a GPU (Graphics Processing Unit) for multi-step iterative optimization to obtain the iterated AI model parameters, where the AI model parameters include, but are not limited to, the parameters of the time sequence feature extraction module, the parameters of the value function, and so on. The iterated AI model parameters are then updated to the AI model to complete the training and updating of the AI model.
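The multi-step iteration itself can be illustrated with plain gradient descent on a toy scalar loss; the learning rate and step count are arbitrary, and the actual training would update the full set of AI model parameters on the GPU rather than a single scalar.

```python
# Toy multi-step iteration: gradient descent on loss(t) = (t - 3)^2,
# whose gradient is 2*(t - 3); the parameter converges toward 3.
def iterate(grad_fn, theta, lr=0.1, steps=50):
    for _ in range(steps):
        theta = theta - lr * grad_fn(theta)
    return theta

theta = iterate(lambda t: 2.0 * (t - 3.0), theta=10.0)
```

Each iteration corresponds to one optimization step over the loss; after enough steps, the updated parameters are written back to the AI model.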
Meanwhile, various information generated by continuous interaction with the 3D virtual environment, such as the state information and 3D map information of the agent, is stored in a data storage system, such as redis, and used as data in the sample data set for iterative training of the AI model.
Referring to fig. 9, fig. 9 is a schematic block diagram of a server according to an embodiment of the present disclosure.
As shown in fig. 9, the server may include a processor, memory, and a network interface. The processor, memory, and network interface are connected by a system bus, such as an I2C (Inter-integrated Circuit) bus.
Specifically, the Processor may be a Micro-controller Unit (MCU), a Central Processing Unit (CPU), a Digital Signal Processor (DSP), or the like.
Specifically, the Memory may be a Flash chip, a Read-Only Memory (ROM), a magnetic disk, an optical disk, a USB disk, a removable hard disk, or the like.
The network interface is used for network communication, such as sending assigned tasks. Those skilled in the art will appreciate that the architecture shown in fig. 9 is a block diagram of only a portion of the architecture associated with the present application and does not constitute a limitation on the servers to which the present application applies; a particular server may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
Wherein the processor is configured to run a computer program stored in the memory and to implement the following steps when executing the computer program:
acquiring current frame state information of an agent and current frame 3D map information in a 3D virtual environment;
outputting current frame action output information corresponding to the agent based on the current frame state information and the current frame 3D map information through a time sequence feature extraction module of an AI model;
obtaining the next frame state information of the intelligent agent according to the current frame action output information;
acquiring historical position information of the intelligent agent, and generating the next frame of 3D map information according to the historical position information;
and outputting next frame action output information corresponding to the intelligent agent according to the next frame state information and the next frame 3D map information.
In some embodiments, before implementing the timing feature extraction module through the AI model to output the current frame action output information corresponding to the agent based on the current frame state information and the current frame 3D map information, the processor further implements:
extracting state embedding vector features in the current frame state information, and acquiring map vector features according to the current frame 3D map information;
merging the state embedding vector features and the map vector features and inputting the merged state embedding vector features and the map vector features into a fully-connected neural network to obtain corresponding fusion information;
when the processor outputs the current frame action output information corresponding to the agent based on the current frame state information and the current frame 3D map information through the timing characteristic extraction module of the AI model, the following steps are specifically implemented:
and inputting the fusion information into the time sequence feature extraction module, and outputting the current frame action output information corresponding to the agent.
In some embodiments, the 3D map of the 3D virtual environment comprises a plurality of layers of channels, each layer of channels consisting of a plurality of meshes, the plurality of layers of channels each recording different types of information.
In some embodiments, the multi-layer channels include at least two layers of channels among a first layer of channels, a second layer of channels, a third layer of channels, and a fourth layer of channels, where a grid of the first layer of channels records whether the agent moves to a position where the grid is located, a grid of the second layer of channels records a frequency of the agent moving to the position where the grid is located, a grid of the third layer of channels records a sequence of the agent moving to the position where the grid is located, and a grid of the fourth layer of channels records a number of material points existing at the position where the grid is located.
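A minimal sketch of the four-layer channel grid described above; the grid size and the class and method names are assumptions for illustration, not the original implementation.

```python
class ChannelMap:
    """Four channel layers over the same grid, as in the embodiment:
    visited   - whether the agent has moved to the cell (first layer)
    freq      - how many times the agent moved there (second layer)
    order     - sequence in which cells were first visited (third layer)
    materials - number of material points at the cell (fourth layer)
    """
    def __init__(self, size=8):
        self.visited = [[0] * size for _ in range(size)]
        self.freq = [[0] * size for _ in range(size)]
        self.order = [[0] * size for _ in range(size)]
        self.materials = [[0] * size for _ in range(size)]
        self._visit_seq = 0

    def record_move(self, x, y):
        if not self.visited[y][x]:
            self._visit_seq += 1
            self.order[y][x] = self._visit_seq  # order of first visit
        self.visited[y][x] = 1
        self.freq[y][x] += 1

m = ChannelMap()
for x, y in [(1, 1), (2, 1), (1, 1)]:   # agent path revisiting (1, 1)
    m.record_move(x, y)
```

Stacking the four layers gives the multi-channel input that the convolution layers reduce to the map vector feature.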
In some embodiments, the current frame 3D map information includes different types of information recorded in multiple channels of a 3D map, and the processor specifically implements, when implementing the obtaining of the map vector feature according to the current frame 3D map information:
and carrying out multilayer convolution calculation on the different types of information to obtain the map vector characteristics.
In some embodiments, the current frame 3D map information is relative map information within a preset range centered on a current location of the agent.
In some embodiments, the processor, when executing the computer program, further implements:
and recording and storing the position information of the intelligent agent every preset time, wherein the historical position information of the intelligent agent is a plurality of stored position information.
In some embodiments, when the processor implements the recording and storing of the location information of the agent, the processor implements:
determining whether the quantity of the stored historical position information reaches a preset quantity every time the current position information of the intelligent agent is recorded;
if the quantity of the stored historical position information does not reach the preset quantity, storing the current position information;
and if the quantity of the stored historical position information reaches the preset quantity, storing the current position information, and deleting the historical position information recorded earliest in the stored historical position information.
In some embodiments, the temporal feature extraction module comprises an LSTM module, the processor, when executing the computer program, further implementing:
acquiring previous frame hidden state information corresponding to the LSTM module;
when the processor outputs the current frame action output information corresponding to the agent based on the current frame state information and the current frame 3D map information through the timing characteristic extraction module of the AI model, the following steps are specifically implemented:
outputting, by the LSTM module, current frame hidden state information corresponding to the LSTM module based on the current frame state information, the current frame 3D map information, and the previous frame hidden state information;
and acquiring the current frame action output information corresponding to the agent according to the current frame hidden state information.
In some embodiments, when the processor obtains the current frame action output information corresponding to the agent according to the current frame hidden state information, the following is specifically implemented:
acquiring fusion state vector information corresponding to the agent according to the current frame hidden state information;
and acquiring the current frame action output information according to the fusion state vector information.
In some embodiments, the processor, when executing the computer program, further implements:
acquiring a sample data set, wherein the sample data set comprises multi-frame state information and multi-frame 3D map information of an agent;
outputting multi-frame fusion state vector information corresponding to the agent through a timing sequence feature extraction module of an AI model to be trained based on the multi-frame state information and the multi-frame 3D map information;
constructing a loss function according to the multi-frame fusion state vector information;
and performing multi-step iteration on the loss function to train and update the AI model.
In some embodiments, the time-series feature extraction module includes an LSTM module, the sample data set further includes hidden state information corresponding to the LSTM module, and the processor is specifically configured to, when implementing that the time-series feature extraction module passing through the AI model to be trained outputs multi-frame fusion state vector information corresponding to the agent based on the multi-frame state information and the multi-frame 3D map information:
outputting the multi-frame fusion state vector information based on the hidden state information, the multi-frame state information and the multi-frame 3D map information through the LSTM module.
In some embodiments, when the processor implements the constructing of the loss function according to the multi-frame fusion state vector information, the following is specifically implemented:
acquiring multiframe action output information and a value function output value corresponding to the multiframe action output information according to the multiframe fusion state vector information;
and constructing the loss function according to the multi-frame action output information and the value function output value corresponding to the multi-frame action output information.
It should be noted that, as will be clearly understood by those skilled in the art, for convenience and brevity of description, the specific working process of the server described above may refer to the corresponding process in the foregoing embodiment of the AI model-based intelligent agent decision making method and/or the AI model training method, and details are not repeated herein.
Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, where the computer program includes program instructions, and the processor executes the program instructions to implement the steps of the AI model-based agent decision-making method and/or the AI model training method provided in the foregoing embodiments. For example, the computer program is loaded by a processor and may perform the following steps:
acquiring current frame state information of an agent and current frame 3D map information in a 3D virtual environment;
calling an AI model, and outputting current frame action output information corresponding to the agent through a time sequence feature extraction module of the AI model based on the current frame state information and the current frame 3D map information;
obtaining the next frame state information of the intelligent agent according to the current frame action output information;
and acquiring historical position information of the intelligent agent, and generating the next frame of 3D map information according to the historical position information.
For the specific implementation of the above operations, reference may be made to the foregoing embodiments, and details are not repeated herein.
The computer readable storage medium may be an internal storage unit of the server in the foregoing embodiment, for example, a hard disk or a memory of the server. The computer readable storage medium may also be an external storage device of the server, such as a plug-in hard disk provided on the server, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like.
Since the computer program stored in the computer-readable storage medium can execute any one of the AI model-based agent decision making methods and/or AI model training methods provided in the embodiments of the present application, beneficial effects that can be achieved by any one of the AI model-based agent decision making methods and/or AI model training methods provided in the embodiments of the present application can be achieved, which are detailed in the foregoing embodiments and will not be described herein again.
It is to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments. While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and various equivalent modifications or substitutions can be easily made by those skilled in the art within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (12)

1. An AI model-based agent decision-making method, comprising:
acquiring current frame state information of an agent and current frame 3D map information in a 3D virtual environment; the current frame 3D map information is relative map information in a preset range taking the current position of the intelligent agent as a center;
outputting current frame action output information corresponding to the agent based on the current frame state information and the current frame 3D map information through a time sequence feature extraction module of an AI model;
obtaining the next frame state information of the intelligent agent according to the current frame action output information;
acquiring historical position information of the intelligent agent, and generating the next frame of 3D map information according to the historical position information;
outputting next frame action output information corresponding to the intelligent agent according to the next frame state information and the next frame 3D map information;
before the outputting of the current frame action output information corresponding to the agent by the timing characteristic extraction module of the AI model based on the current frame state information and the current frame 3D map information, the method includes:
extracting state embedding vector features in the current frame state information, and acquiring map vector features according to the current frame 3D map information;
merging the state embedding vector features and the map vector features and inputting the merged state embedding vector features and the map vector features into a fully-connected neural network to obtain corresponding fusion information;
the outputting of the current frame action output information corresponding to the agent by the timing characteristic extraction module of the AI model based on the current frame state information and the current frame 3D map information includes:
and inputting the fusion information into the time sequence feature extraction module, and outputting the current frame action output information corresponding to the agent.
2. The method of claim 1, wherein the 3D map of the 3D virtual environment comprises a plurality of layers of channels, each layer of channels being composed of a plurality of meshes, the plurality of layers of channels each recording different types of information.
3. The method of claim 2, wherein the plurality of layers of channels includes at least two layers of channels selected from a first layer of channels, a second layer of channels, a third layer of channels, and a fourth layer of channels, wherein a grid of the first layer of channels records whether the agent moves to a location of the grid, wherein a grid of the second layer of channels records a frequency of the agent moving to the location of the grid, wherein a grid of the third layer of channels records an order of the agent moving to the location of the grid, and wherein a grid of the fourth layer of channels records a number of material points at the location of the grid.
4. The method of claim 1, wherein the current frame 3D map information includes different types of information recorded in a multi-layer channel of a 3D map, and the obtaining map vector features according to the current frame 3D map information includes:
and carrying out multilayer convolution calculation on the different types of information to obtain the map vector characteristics.
5. The method of claim 1, further comprising:
and recording and storing the position information of the intelligent agent every preset time, wherein the historical position information of the intelligent agent is a plurality of stored position information.
6. The method of claim 5, wherein recording and storing the location information of the agent comprises:
determining whether the quantity of the stored historical position information reaches a preset quantity every time the current position information of the intelligent agent is recorded;
if the quantity of the stored historical position information does not reach the preset quantity, storing the current position information;
and if the quantity of the stored historical position information reaches the preset quantity, storing the current position information, and deleting the historical position information recorded earliest in the stored historical position information.
7. The method of any of claims 1 to 6, wherein the temporal feature extraction module comprises an LSTM module, the method further comprising:
acquiring previous frame hidden state information corresponding to the LSTM module;
the outputting, by the timing feature extraction module of the AI model, current frame action output information corresponding to the agent based on the current frame state information and the current frame 3D map information includes:
outputting, by the LSTM module, current frame hidden state information corresponding to the LSTM module based on the current frame state information, the current frame 3D map information, and the previous frame hidden state information;
and acquiring the current frame action output information corresponding to the agent according to the current frame hidden state information.
8. The method according to claim 7, wherein the obtaining the current frame action output information corresponding to the agent according to the current frame hidden state information includes:
acquiring fusion state vector information corresponding to the agent according to the current frame hidden state information;
and acquiring the current frame action output information according to the fusion state vector information.
9. A method for training an AI model, comprising:
acquiring a sample data set, wherein the sample data set comprises multi-frame state information and multi-frame 3D map information of an agent;
outputting multi-frame fusion state vector information corresponding to the agent through a timing sequence feature extraction module of an AI model to be trained based on the multi-frame state information and the multi-frame 3D map information;
constructing a loss function according to the multi-frame fusion state vector information;
performing multi-step iteration on the loss function to train and update the AI model;
wherein, the constructing a loss function according to the multi-frame fusion state vector information comprises:
acquiring multiframe action output information and a value function output value corresponding to the multiframe action output information according to the multiframe fusion state vector information;
and constructing the loss function according to the multi-frame action output information and the value function output value corresponding to the multi-frame action output information.
10. The method of claim 9, wherein the temporal feature extraction module comprises an LSTM module, the sample data set further includes hidden state information corresponding to the LSTM module, and the outputting, by the temporal feature extraction module of the AI model to be trained, multi-frame fusion state vector information corresponding to the agent based on the multi-frame state information and the multi-frame 3D map information comprises:
outputting the multi-frame fusion state vector information based on the hidden state information, the multi-frame state information and the multi-frame 3D map information through the LSTM module.
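Claim 10 has the LSTM module carry hidden state across frames while fusing per-frame agent state with 3D-map features. A toy sketch of that data flow is below; a single tanh recurrence stands in for a full LSTM cell, and every dimension and coefficient is an illustrative assumption.

```python
# Hedged sketch of claim 10: a recurrent (LSTM-style) temporal feature
# extractor that carries hidden state across frames and emits one fusion
# state vector per frame. A simplified tanh update replaces real LSTM gates.
import math

def fuse_frames(states, map_feats, hidden):
    """For each frame, concatenate agent state + 3D-map features, mix them
    with the carried hidden state, and emit a fusion state vector."""
    fusion_vectors = []
    for s, m in zip(states, map_feats):
        x = s + m                                  # per-frame input: state ++ map
        hidden = [math.tanh(0.5 * h + 0.5 * xi)    # recurrent hidden update
                  for h, xi in zip(hidden, x)]
        fusion_vectors.append(hidden)
    return fusion_vectors, hidden

states = [[0.1, 0.2], [0.3, 0.4]]      # two frames of agent state information
map_feats = [[0.0, 1.0], [1.0, 0.0]]   # two frames of 3D-map features
hidden = [0.0, 0.0, 0.0, 0.0]          # initial hidden state from the sample set
fusion, final_hidden = fuse_frames(states, map_feats, hidden)
print(fusion)
```

The returned `final_hidden` plays the role of the hidden state information stored in the sample data set: it lets training resume the recurrence from where the previous frame batch left off.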
11. A server, characterized in that the server comprises a processor, a memory, and a computer program stored on the memory and executable by the processor, the memory storing an AI model, wherein the computer program, when executed by the processor, implements the AI model-based agent decision-making method according to any one of claims 1 to 8, or implements the training method of an AI model according to any one of claims 9 to 10.
12. A computer-readable storage medium, characterized in that a computer program is stored thereon which, when executed by a processor, causes the processor to carry out the AI model-based agent decision-making method of any one of claims 1 to 8, or the training method of an AI model of any one of claims 9 to 10.
CN202010492473.3A 2020-06-03 2020-06-03 Agent decision making method, AI model training method, server and medium Active CN111401557B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010492473.3A CN111401557B (en) 2020-06-03 2020-06-03 Agent decision making method, AI model training method, server and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010492473.3A CN111401557B (en) 2020-06-03 2020-06-03 Agent decision making method, AI model training method, server and medium

Publications (2)

Publication Number Publication Date
CN111401557A CN111401557A (en) 2020-07-10
CN111401557B CN111401557B (en) 2020-09-18

Family

ID=71435720

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010492473.3A Active CN111401557B (en) 2020-06-03 2020-06-03 Agent decision making method, AI model training method, server and medium

Country Status (1)

Country Link
CN (1) CN111401557B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111738372B (en) * 2020-08-26 2020-11-17 中国科学院自动化研究所 Distributed multi-agent space-time feature extraction method and behavior decision method
CN112494949B (en) * 2020-11-20 2023-10-31 超参数科技(深圳)有限公司 Intelligent body action policy making method, server and storage medium
CN112295232B (en) * 2020-11-23 2024-01-23 超参数科技(深圳)有限公司 Navigation decision making method, AI model training method, server and medium
CN114627981A (en) * 2020-12-14 2022-06-14 阿里巴巴集团控股有限公司 Method and apparatus for generating molecular structure of compound, and nonvolatile storage medium
WO2023206532A1 (en) * 2022-04-29 2023-11-02 Oppo广东移动通信有限公司 Prediction method and apparatus, electronic device and computer-readable storage medium
CN118378094B (en) * 2024-06-25 2024-09-17 武汉人工智能研究院 Chip layout model training and application method and device, electronic equipment and storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN102737097A (en) * 2012-03-30 2012-10-17 北京峰盛博远科技有限公司 Three-dimensional vector real-time dynamic stacking technique based on LOD (Level of Detail) transparent textures
CN107679618A (en) * 2017-07-28 2018-02-09 北京深鉴科技有限公司 A kind of static policies fixed point training method and device
CN109241291A (en) * 2018-07-18 2019-01-18 华南师范大学 Knowledge mapping optimal path inquiry system and method based on deeply study

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
KR102146398B1 (en) * 2015-07-14 2020-08-20 삼성전자주식회사 Three dimensional content producing apparatus and three dimensional content producing method thereof
CA3115898C (en) * 2017-10-11 2023-09-26 Aquifi, Inc. Systems and methods for object identification
CN108427989B (en) * 2018-06-12 2019-10-11 中国人民解放军国防科技大学 Deep space-time prediction neural network training method for radar echo extrapolation
CN109464803B (en) * 2018-11-05 2022-03-04 腾讯科技(深圳)有限公司 Virtual object control method, virtual object control device, model training device, storage medium and equipment
CN109711529B (en) * 2018-11-13 2022-11-08 中山大学 Cross-domain federated learning model and method based on value iterative network
CN110827320B (en) * 2019-09-17 2022-05-20 北京邮电大学 Target tracking method and device based on time sequence prediction

Non-Patent Citations (1)

Title
Survey of model-based reinforcement learning (模型化强化学习研究综述); Zhao Tingting et al.; Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》); 2020-04-01; pp. 918-927 *

Also Published As

Publication number Publication date
CN111401557A (en) 2020-07-10

Similar Documents

Publication Publication Date Title
CN111401557B (en) Agent decision making method, AI model training method, server and medium
Seff et al. Continual learning in generative adversarial nets
CN108073981B (en) Method and apparatus for processing convolutional neural network
CN111295675B (en) Apparatus and method for processing convolution operations using kernels
CN112199190B (en) Memory allocation method and device, storage medium and electronic equipment
CN110728317A (en) Training method and system of decision tree model, storage medium and prediction method
CN114915630B (en) Task allocation method, network training method and device based on Internet of Things equipment
US20200327409A1 (en) Method and device for hierarchical learning of neural network, based on weakly supervised learning
CN111325664B (en) Style migration method and device, storage medium and electronic equipment
CN110781893B (en) Feature map processing method, image processing method, device and storage medium
CN112163601B (en) Image classification method, system, computer device and storage medium
CN108510058B (en) Weight storage method in neural network and processor based on method
CN111783937A (en) Neural network construction method and system
CN110132282A (en) Unmanned plane paths planning method and device
CN113177470B (en) Pedestrian trajectory prediction method, device, equipment and storage medium
CN111125519A (en) User behavior prediction method and device, electronic equipment and storage medium
CN111709493A (en) Object classification method, training method, device, equipment and storage medium
CN111967271A (en) Analysis result generation method, device, equipment and readable storage medium
CN112597217B (en) Intelligent decision platform driven by historical decision data and implementation method thereof
CN114757362A (en) Multi-agent system communication method based on edge enhancement and related device
CN113625753A (en) Method for guiding neural network to learn maneuvering flight of unmanned aerial vehicle by expert rules
CN114445684A (en) Method, device and equipment for training lane line segmentation model and storage medium
CN112200310A (en) Intelligent processor, data processing method and storage medium
CN117993443A (en) Model processing method, apparatus, computer device, storage medium, and program product
EP4246375A1 (en) Model processing method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant