
CN109109863B - Intelligent device and control method and device thereof - Google Patents

Intelligent device and control method and device thereof

Info

Publication number
CN109109863B
Authority
CN
China
Prior art keywords
data
control
model
target
sample data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810850160.3A
Other languages
Chinese (zh)
Other versions
CN109109863A (en)
Inventor
袁庭球
黄韬
黄永兵
刘兵
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201810850160.3A priority Critical patent/CN109109863B/en
Publication of CN109109863A publication Critical patent/CN109109863A/en
Application granted granted Critical
Publication of CN109109863B publication Critical patent/CN109109863B/en
Legal status: Active

Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W40/00 Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models
    • B60W40/02 Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models related to ambient conditions
    • B60W40/10 Estimation or calculation of non-directly measurable driving parameters for road vehicle drive control systems not related to the control of a particular sub unit, e.g. by using mathematical models related to vehicle motion

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Automation & Control Theory (AREA)
  • Mathematical Physics (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • Feedback Control In General (AREA)

Abstract

The application provides a smart device and a control method and apparatus for it, and belongs to the field of machine learning. In the method, after an execution instruction for a target task is received, detection data are acquired; the detection data and the target task are input into a perception model to obtain representative detection data associated with the target task. The target task and the representative detection data can then be input into a planning model to obtain target state data. The target state data and part or all of the representative detection data can then be input into a control model to obtain control parameters for controlling the smart device, and the smart device is controlled based on those parameters. This resolves the prior-art problems of heavy dependence on training samples and unsatisfactory training results in smart-device control, and achieves better control of the smart device.

Description

Intelligent device and control method and device thereof
Technical Field
The application relates to the field of machine learning, and in particular to a smart device and a control method and apparatus for it.
Background
Smart devices, also known as Intelligent Agents (IA), are autonomous entities (autonomous entities). Smart devices are able to sense the surrounding environment through sensors and may perform operations through actuators (actuators). Common smart devices typically include robots and autonomous vehicles, among others.
In the related art, the control apparatus of a smart device is typically equipped with a control model trained with a machine learning algorithm. The control model takes the data collected by the sensors as input and, after processing it, generates control parameters for the actuators, which instruct them to perform corresponding operations. For example, for an autonomous vehicle, the control model may generate control parameters for at least one of the throttle, the brakes, and the steering wheel based on road images captured by a camera.
However, the control effect of such a model depends on the amount of sample data used during model training, and the model performs poorly when the sample size is small.
Disclosure of Invention
Embodiments of the invention provide a smart device and a control method and apparatus for it, which can solve the problem of the poor control effect of control models in the related art. The technical solutions are as follows:
In one aspect, a method for controlling a smart device is provided, applicable to a control apparatus of the smart device. The method may include the following steps. After receiving an execution instruction for a target task, detection data are acquired; the detection data may include environmental data of the smart device's surroundings and status data of the smart device itself. The control apparatus then inputs the detection data and the target task into a perception model to obtain representative detection data associated with the target task; the target task and the representative detection data are input into a planning model to obtain target state data; and the target state data, together with part or all of the representative detection data, are input into a control model to obtain control parameters, based on which the smart device is controlled to execute the target task. The control model is initialized based on control-theory data.
Because the control model is initialized from control-theory data, it depends less on training samples during training and trains more effectively, so the method achieves a good control effect when controlling the smart device.
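As a rough illustration, the perception → planning → control flow described above can be sketched as a three-stage pipeline. The function names and the toy models in the usage example are illustrative assumptions, not part of the patent:

```python
def run_pipeline(perception_model, planning_model, control_model,
                 detection_data, target_task):
    """Return control parameters for one control step of the smart device."""
    # Stage 1: extract the detection data that is representative of the task.
    representative = perception_model(detection_data, target_task)
    # Stage 2: plan the target state the device should reach.
    target_state = planning_model(representative, target_task)
    # Stage 3: derive control parameters from the target state and
    # some or all of the representative detection data.
    return control_model(target_state, representative)
```

Any concrete perception, planning, and control models with these call signatures could be plugged in; in the patent the three models are learned (and the control model is additionally initialized from control-theory data) rather than simple functions.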
Optionally, the perception model may be trained with deep learning, and the planning model and the control model with reinforcement learning.
Of course, the perception model may instead be trained with reinforcement learning or deep reinforcement learning, and the planning model and the control model with deep learning or deep reinforcement learning.
Optionally, before receiving the execution instruction for the target task, the method may further include:
detection sample data and representative detection sample data associated with a specified task are acquired, where the detection sample data include environmental sample data of the smart device's surroundings while it executes the specified task, together with state sample data of the smart device; the perception model is then obtained by training on the detection sample data, the specified task, and the representative detection sample data with deep learning.
During this deep-learning-based training, the detection sample data and the specified task may be input into an initial perception model, and the parameters of the initial model adjusted based on the difference between the representative detection data it outputs and the representative detection sample data, yielding the perception model.
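The parameter-adjustment step above can be sketched, under heavy simplification, as gradient descent on the gap between the model's output and the labelled representative samples. The one-parameter linear "model" below is a hypothetical stand-in for the perception network; nothing here is the patent's concrete training procedure:

```python
def train_perception(samples, lr=0.1, epochs=200):
    """samples: list of (x, y) pairs; fit y ~ w * x by squared-error descent."""
    w = 0.0                              # initial perception-model parameter
    for _ in range(epochs):
        for x, y in samples:
            pred = w * x                 # forward pass of the toy "model"
            grad = 2 * (pred - y) * x    # gradient of (pred - y)**2 w.r.t. w
            w -= lr * grad               # adjust parameters by the difference
    return w
```

Training on samples that follow y = 3x drives the parameter toward 3, mirroring how the initial perception model's parameters are pulled toward reproducing the representative detection sample data.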
Optionally, before receiving the execution instruction for the target task, the method further includes:
representative detection sample data and effect-value sample data associated with a specified task are obtained, and the initial planning model is trained on the representative detection sample data, the specified task, and the effect-value sample data with reinforcement learning to obtain the planning model.
During this reinforcement-learning-based training, the representative detection sample data and the specified task may be input into the initial planning model, and the parameters of the initial model adjusted based on the effect-value sample data, yielding the planning model.
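As a minimal illustration of adjusting a model's parameters from effect-value (reward) feedback, the sketch below uses a simple deterministic reward-driven parameter search in place of a full reinforcement-learning algorithm; the function and its update rule are illustrative, not from the patent:

```python
def train_planning(effect_value, param=0.0, step=0.1, iters=50):
    """Nudge a scalar parameter in whichever direction raises the effect value."""
    for _ in range(iters):
        current = effect_value(param)
        if effect_value(param + step) > current:    # reward improves upward
            param += step
        elif effect_value(param - step) > current:  # reward improves downward
            param -= step
        # otherwise keep the current parameter (a local optimum at this step size)
    return param
```

With an effect value that peaks at param = 1.0 (e.g. the negative squared distance to 1.0), the search climbs to that peak, which is the same feedback loop the planning model's training follows in spirit.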
Optionally, before receiving the execution instruction for the target task, the method may further include:
an initial control model is initialized based on the control-theory data; part or all of the representative detection sample data, together with target state sample data and effect-value sample data associated with the specified task, are acquired; and the initial control model is trained on the acquired data with reinforcement learning to obtain the control model.
Because control-theory data directly reflect and express the control laws and principles of the smart device, initializing the control model from them effectively reduces the sample size required in subsequent training, improves training efficiency, and lowers training cost.
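One plausible reading of "initializing from control-theory data" is to start the control model from a classical controller, for example PID gains, and fine-tune from there. The PID sketch below is an illustrative assumption, not the patent's concrete initialization:

```python
def make_pid_controller(kp, ki, kd):
    """Return a stateful PID controller built from classical control-theory gains."""
    state = {"integral": 0.0, "prev_error": 0.0}

    def control(target, current, dt=0.1):
        error = target - current
        state["integral"] += error * dt                        # accumulated error
        derivative = (error - state["prev_error"]) / dt        # error rate of change
        state["prev_error"] = error
        return kp * error + ki * state["integral"] + kd * derivative

    return control
```

Reinforcement-learning training would then start from these theory-derived gains instead of random parameters, which is one concrete way the required sample size could be reduced.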
Optionally, the control model may include: a control submodel for calculating the weight, and one or more calculation submodels for calculating the control parameter; before receiving an execution instruction for a target task, the method further comprises:
acquiring partial or all representative detection sample data, target state sample data and effect value sample data associated with the specified task; training an initial control submodel by adopting the obtained representative detection sample data, the target state sample data and the effect value sample data based on a reinforcement learning mode to obtain the control submodel; each of the calculation submodels is determined based on the control theory data.
The calculation submodel for calculating the control parameters is determined based on the control theory data, so that the training efficiency of the control model is effectively improved, and the training cost is reduced.
Optionally, the control model may include: a control submodel for calculating the weight, and one or more calculation submodels for calculating the control parameter;
inputting the target state data and part or all of the representative detection data into a control model, and obtaining a control parameter for controlling the intelligent device may include:
a group of target input data corresponding to each calculation sub-model is acquired from the target state data and part or all of the representative detection data; each group of target input data is input into its corresponding calculation sub-model to obtain a value of the control parameter for that group; the target state data and part or all of the representative detection data are input into the control sub-model to obtain a group of weights; and the target value of the control parameter is determined from the group of weights and the control-parameter values corresponding to the groups of target input data.
For example, the control-parameter values corresponding to the groups of target input data may be weighted and summed with the group of weights to obtain the target value of the control parameter.
Determining the target value of the control parameter from the weights output by the control sub-model and the values computed by the calculation sub-models lets the weights constrain the target value: they bound the range within which the final value can fall, which ensures that the control parameters output by the control model are reasonable and that controlling the smart device with them is safe and reliable.
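The weighted combination above can be sketched in a few lines; the function name is illustrative:

```python
def combine_control_parameters(values, weights):
    """Weighted sum of the calculation sub-models' control-parameter values."""
    if len(values) != len(weights):
        raise ValueError("one weight is needed per calculation sub-model")
    return sum(v * w for v, w in zip(values, weights))
```

If the weights form a convex combination (non-negative and summing to 1), the target value is guaranteed to lie between the smallest and largest sub-model outputs, which is one concrete sense in which the weights constrain the target value's range.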
Optionally, the method may further include:
after the intelligent device is controlled based on the control parameters, acquiring new state data of the intelligent device; determining a control effect according to the new state data and the target task; and adjusting parameters of one or more of the perception model, the planning model and the control model according to the control effect.
During control of the smart device, the control effect is evaluated from the device's new state data, and the models' parameters are adjusted based on that effect. The models are thereby adjusted and refined online, so the control effect on the smart device can be continuously improved.
Optionally, the process of determining the control effect according to the new state data and the target task may include:
the new state data and the target task are input into an evaluation model to obtain the control effect of the control parameters. The evaluation model can store evaluation algorithms corresponding to different tasks; after receiving the target task, it selects the corresponding evaluation algorithm and processes the new state data with it to determine the control effect. Using different evaluation algorithms for different tasks effectively improves the flexibility and reliability of the evaluation.
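The per-task dispatch performed by the evaluation model can be sketched as a lookup from task to evaluation algorithm; the task names and scoring functions below are invented for illustration:

```python
# Hypothetical per-task evaluation algorithms, keyed by task name.
EVALUATORS = {
    # how closely the vehicle's speed matches the lead vehicle's speed
    "follow": lambda state: 1.0 / (1.0 + abs(state["speed"] - state["lead_speed"])),
    # whether the vehicle came to a complete stop
    "park":   lambda state: 1.0 if state["speed"] == 0 else 0.0,
}

def evaluate(target_task, new_state):
    """Select the task-specific evaluator and return its control-effect score."""
    evaluator = EVALUATORS[target_task]
    return evaluator(new_state)
```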
Alternatively, the smart device may be an autonomous vehicle or a smart robot.
In another aspect, an apparatus for controlling a smart device is provided, where the apparatus may include at least one module, and the at least one module may be configured to implement the method for controlling a smart device provided in the above aspect.
In still another aspect, a control apparatus for a smart device is provided, which may include a memory, a processor, and a computer program stored on the memory and runnable on the processor; when executing the computer program, the processor implements the control method of the smart device provided in the above aspect.
In still another aspect, a computer-readable storage medium storing instructions is provided; when the instructions are run on a computer, they cause the computer to execute the control method of the smart device provided in the above aspect.
In still another aspect, a smart device is provided, which may include the control apparatus of the smart device provided in the above aspect.
In a further aspect, there is provided a computer program product containing instructions which, when run on a computer, cause the computer to perform the method of controlling a smart device as provided in the above aspect.
The beneficial effects of the technical solutions provided in this application include at least the following:
according to the scheme, the acquired detection data and the target task can be input into a sensing model, and representative detection data associated with the target task are obtained. The target tasks and the representative test data may then be input to a planning model, resulting in target state data. The target state data and the representative detection data may then be input to a control model to derive control parameters for controlling the smart device. Finally, the smart device may be controlled based on the control parameter. Because the control model is obtained based on the initialization of the control theoretical data, the control theoretical data can directly express and reflect the control rule and principle of the intelligent equipment, and compared with the method of directly adopting the training sample to train in the related technology, the control model not only reduces the dependence of the control model on the training sample, improves the training efficiency, but also can ensure the control effect on the intelligent equipment.
Drawings
Fig. 1 is a schematic diagram of an intelligent device provided by an embodiment of the present invention;
fig. 2 is a flowchart of a control method for an intelligent device according to an embodiment of the present invention;
FIG. 3 is an architecture diagram of a control system provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a perceptual model according to an embodiment of the present invention;
FIG. 5 is a flowchart of a method for obtaining perception data associated with a target task by a perception model according to an embodiment of the present invention;
FIG. 6 is an architecture diagram of a perceptual model provided by an embodiment of the present invention;
FIG. 7 is an architecture diagram of a planning model provided by an embodiment of the present invention;
FIG. 8 is an architecture diagram of a control model provided by an embodiment of the present invention;
FIG. 9 is an architecture diagram of a control submodel according to an embodiment of the present invention;
FIG. 10 is a flowchart of a method for adjusting parameters of models in a control system according to an embodiment of the present invention;
FIG. 11 is an architecture diagram of an evaluation algorithm model provided by an embodiment of the present invention;
FIG. 12 is a partial block diagram of a control system according to an embodiment of the present invention;
FIG. 13 is a flowchart of a method for training a perceptual model according to an embodiment of the present invention;
FIG. 14 is a flowchart of a method for training a planning model according to an embodiment of the present invention;
FIG. 15 is a flowchart of a method for training a control model according to an embodiment of the present invention;
fig. 16 is a schematic structural diagram of a control apparatus of an intelligent device according to an embodiment of the present invention;
fig. 17 is a schematic structural diagram of a control apparatus of another intelligent device according to an embodiment of the present invention;
fig. 18 is a schematic structural diagram of a control apparatus of another intelligent device according to an embodiment of the present invention;
fig. 19 is a schematic structural diagram of a control apparatus of another smart device according to an embodiment of the present invention.
Detailed Description
With the development and maturation of artificial-intelligence technology, industries related to smart devices such as intelligent robots and autonomous vehicles have advanced greatly, and the requirements on the control effect of smart devices keep rising. The embodiment of the invention provides a control method for a smart device, which can be applied to a control apparatus of the smart device. The control apparatus may be configured inside the smart device, or inside a control device that establishes a communication connection with the smart device. The control device can communicate with each sensor and the drive mechanism in the smart device, acquire the detection data collected by each sensor, and control the smart device's drive mechanism according to the detection data. Referring to fig. 1, the smart device 00 may be an autonomous vehicle or an intelligent robot, etc. The control device may be a server: a single server, a server cluster composed of several servers, or a cloud computing service center.
In the embodiment of the present invention, the smart device 00 may be provided with a plurality of sensors for detecting environmental data of the environment around the smart device, and a plurality of sensors for detecting status data of the smart device itself. Since different types of sensors have respective advantages in terms of detection range, detection accuracy, detection conditions, and the like, a plurality of types of sensors are generally provided in the smart device, and the functions of the plurality of types of sensors can be complemented with each other. For example, the sensors for detecting environmental data may include a vision sensor, a laser radar, an ultrasonic sensor, a millimeter wave radar, and the like; the sensors for detecting the state data may include a Global Positioning System (GPS) sensor, a speed sensor, a steering sensor, and the like. The control device of the intelligent device can acquire detection data (namely environment data and state data) acquired by the sensors, and can analyze and process the detection data according to received target tasks (such as going straight, backing or automatically adjusting temperature and the like) to obtain control parameters for controlling the intelligent device.
Taking the autonomous vehicle shown in fig. 1 as an example, the sensors provided on it for detecting environmental data may include laser radars (at the roof, front, and rear of the vehicle), cameras (at the front, rear, and sides), and millimeter-wave radars (at the front and rear), while the sensors for detecting status data may be provided inside the vehicle and are not shown in the drawings.
The sensors provided on the autonomous vehicle can detect ambient environmental data as well as the vehicle's own status data. After acquiring the detection data, the control device can generate control parameters according to the target task issued by the user and the detection data, then send them over the vehicle's control bus to its transmission system, thereby controlling the vehicle's speed and steering and ensuring that it travels safely and reliably on the road.
It should be noted that, for different types of intelligent devices, the type, number, and setting position of the sensors arranged on the intelligent devices may be adjusted according to actual situations, which is not limited in the embodiment of the present invention.
Fig. 2 is a flowchart of a control method for an intelligent device according to an embodiment of the present invention, where the method may be applied to a control apparatus for an intelligent device. Referring to fig. 2, the method may include:
step 101, after receiving an execution instruction for a target task, acquiring detection data.
In the embodiment of the invention, when a user wants the control device to control the smart device automatically, the execution instruction can be triggered through a preset trigger operation. For example, a task icon may be displayed in the smart device's display interface; the trigger operation may be the user tapping the icon of a target task, and the smart device generates an execution instruction for that task upon detecting the operation. The target task can be, for example, automatically driving to a specified destination, automatically following a vehicle, automatically reversing, or automatically adjusting the in-vehicle temperature and humidity. Optionally, the trigger operation may also be a voice operation, a sliding operation, or pressing a designated key, which the embodiment of the present invention does not limit.
After receiving the execution instruction, the control device may acquire detection data collected by a sensor (e.g., an image sensor, a laser radar sensor, a millimeter wave radar sensor, a GPS sensor, etc.) of the smart device. The detection data may include environmental data of the environment surrounding the smart device, as well as status data of the smart device. The environment data may include attribute data of different objects in the environment around the smart device, and may include, for example, road data (e.g., width of a road, number of lanes, etc.), obstacle data (e.g., size, position, moving speed, etc. of an obstacle), indicator light data (e.g., color of an indicator light), and the like. The status data may include behavioral status data of the smart device, which may include, for example, movement speed and steering angle, among other data.
Optionally, the environmental data may further include temperature, humidity, and air-pressure data, and the status data may further include the smart device's remaining battery charge, remaining fuel, tire pressure, in-vehicle temperature, in-vehicle humidity, and other data that reflect its operating state or performance. The data types included in the environmental data and the status data can be flexibly adjusted according to the types of sensors installed in the smart device, which the embodiment of the invention does not limit.
Optionally, the detection data acquired by the control device may be raw data acquired by a sensor, for example, point cloud data acquired by a laser radar, or may be data subjected to preliminary processing by the sensor, for example, data such as size and distance of an object analyzed from the point cloud data by the laser radar.
For an autonomous vehicle, the environmental data among the detection data can include the lane-line curvature k1, the current vehicle speed v1, the type b of an obstacle ahead along with its speed v2 and its distance d2 from the vehicle, the temperature t1, and the wind speed v3; the status data can include the autonomous vehicle's own speed v0 and steering angle α.
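For concreteness, the detection data enumerated above might be encoded as nested dictionaries, with field names mirroring the variables in the text; the numeric values and units are invented for illustration:

```python
detection_data = {
    "environment": {
        "lane_curvature_k1": 0.02,
        "vehicle_speed_v1": 15.0,       # m/s
        "obstacle_type_b": "vehicle",   # type of the obstacle ahead
        "obstacle_speed_v2": 12.0,      # m/s
        "obstacle_distance_d2": 30.0,   # m, distance from the vehicle
        "temperature_t1": 21.5,         # deg C
        "wind_speed_v3": 3.0,           # m/s
    },
    "state": {
        "vehicle_speed_v0": 15.0,       # m/s, the autonomous vehicle itself
        "steering_angle_alpha": 0.0,    # rad
    },
}
```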
Step 102, inputting the detection data and the target task into the perception model to obtain representative detection data associated with the target task.
In the embodiment of the present invention, a control system may be configured in the control device, and the control device may control the intelligent device through the control system. Fig. 3 is an architecture diagram of a control system according to an embodiment of the present invention. As shown in fig. 3, the control system may include a perception model 01, a planning model 02, and a control model 03. The perception model 01 is used for obtaining representative detection data associated with a target task from the detection data. The planning model 02 is used to determine target status data of the smart device based on the representative detection data and the target task. The control model 03 is used for obtaining control parameters for controlling the intelligent device according to part or all of the representative detection data and the target state data.
After determining the target task and acquiring the detection data, the control device may input the detection data and the target task to the sensing model 01. The perception model 01 can obtain representative detection data associated with the target task according to the input data.
Optionally, in order to improve the data processing efficiency, the sensing model 01 may perform preprocessing on the input detection data, where the preprocessing may include: at least one of extracting, classifying, and fusing. For example, the perception model 01 may sequentially perform extraction, classification, and fusion processes on input detection data. After preprocessing, the perception model 01 can filter out useless data, and acquire attribute data of each object in the surrounding environment of the intelligent device and attribute data of the intelligent device.
Fig. 4 is a schematic structural diagram of a perceptual model provided by an embodiment of the present invention, and as shown in fig. 4, the perceptual model 01 may include a perceptual fusion sub-model 011 and a feature extraction sub-model 012. The perception fusion submodel 011 can be used for preprocessing the detection data and sending the preprocessed detection data to the feature extraction submodel 012. The feature extraction submodel 012 is configured to acquire representative detection data associated with a target task from the preprocessed detection data.
The detection data y_t output by the perception fusion sub-model 011 at a time t can be understood as the result of fusing the large amount of detection data z_t output by the plurality of sensors at time t. The output of the sub-model at time t can therefore be expressed as the conditional probability P(y_t | z_t). While preprocessing the detection data, the perception fusion sub-model 011 can compute the probabilities of different candidate outputs y_t and select the y_t with the highest probability as its actual output.
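The selection rule above, choosing the candidate y_t with the highest probability P(y_t | z_t), amounts to a simple argmax over candidates; the function name and candidate encoding are illustrative:

```python
def select_fused_output(candidates):
    """candidates: list of (y_t, probability) pairs; return the most probable y_t."""
    best_y, _ = max(candidates, key=lambda pair: pair[1])
    return best_y
```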
For example, taking an autonomous vehicle, the perception fusion sub-model 011 can identify static and dynamic objects in the vehicle's surroundings from the input environmental data. For static objects it can detect data such as classification (i.e., type) and size; for dynamic objects it can detect speed and intent (i.e., predicted-trajectory) data. The sub-model can then classify and fuse the raw data that different sensors output for the same object, obtaining attribute data for the different objects. For example, the raw data output by a radar are point clouds consisting of many points, and the raw data output by a camera are image data; neither contains semantic information about any object. The perception fusion sub-model 011 can classify the raw data output by each sensor using algorithms such as Kalman filtering and, after fusing the raw data from the various sensors, generate attribute data (also called feature data) representing the features of the surroundings and of the autonomous vehicle. If each such feature is expressed in one dimension of data, the feature data may exceed 100 dimensions.
Further, the feature extraction sub-model 012 may obtain representative detection data associated with the target task from the preprocessed detection data, where the representative detection data may also be referred to as representative feature (representational state) data. The representative detection data, which is a key input of the planning model 02, directly affects and determines the execution result of the planning model 02, so that the accuracy of the representative detection data selected by the sensing model 01 directly affects the control effect of the control system.
Optionally, referring to fig. 3, the control system may further include a knowledge base 04, and the knowledge base 04 may store data (also referred to as knowledge) for assisting the operation of each model. By way of example, the data may include perception data for assisting the perception model 01 in acquiring representative detection data, planning data for assisting the planning model 02 in determining target state data, and control data for assisting the control model 03 in generating control parameters. The knowledge base 04 may store the data in the form of a table or a matrix. Alternatively, the knowledge base 04 may also adopt other, more complex forms, such as storing the data in combination with geometric structures, which is not limited in the embodiment of the present invention.
The sensing data stored in the knowledge base 04 may include sensing data corresponding to different tasks, and after the feature extraction submodel 012 acquires the input target task and the detection data, the sensing data related to the target task may be acquired from the knowledge base 04. The perception data may include data that can assist the perception model in obtaining representative detection data. For example, the perception data may include a type of representative detection data. When the perceptual model is a model trained based on a machine Learning method such as Deep Learning (DL) or Reinforcement Learning (RL), the perceptual data may further include parameters of the perceptual model corresponding to the target task. The parameters may include model parameters, input parameters, and output parameters of the perceptual model.
As shown in fig. 5, the process of the perception model acquiring the perception data associated with the target task may include the following steps:
Step 1021, determining the current scene of the intelligent device according to the detection data.
The perception model can determine a scene identifier corresponding to the currently acquired detection data according to the corresponding relation between the detection data and the scene identifier, and determine the scene indicated by the scene identifier as the scene where the intelligent device is currently located.
For example, assuming that the sensing model queries that the scene identifier corresponding to the currently acquired detection data is 1 from the corresponding relationship, the scene indicated by the scene identifier 1 may be determined as the scene where the autonomous vehicle is currently located.
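The lookup in step 1021 can be sketched as a simple keyed correspondence. The keys and scene identifiers below are illustrative assumptions; how detection data is reduced to a lookup key is not specified by the patent:

```python
# Sketch: map freshly acquired detection data to a scene identifier via a
# stored correspondence. Reducing detection data to a hashable key is a
# simplifying assumption for illustration.
detection_to_scene = {
    ("straight_lane", "clear"): 1,
    ("curved_lane", "rain"): 2,
}

def current_scene(detection_key):
    # Returns the scene identifier, or None when the scene is new
    # (in which case step 1024's similar-scene fallback applies).
    return detection_to_scene.get(detection_key)

print(current_scene(("straight_lane", "clear")))  # -> 1
```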
Step 1022, detecting whether sensing data corresponding to the current scene of the intelligent device is recorded in the corresponding relation between the task, the scene and the sensing data.
In the embodiment of the invention, the control system can store the corresponding relation between tasks, scenes and perception data. Since the scenes where the intelligent device can be located are various, after the current scene where the intelligent device is located is determined, the sensing model can detect whether sensing data corresponding to the current scene is recorded in the corresponding relationship.
When the corresponding relationship records perception data corresponding to the current scene of the intelligent device, the perception model may execute step 1023; when the sensing model does not acquire the scene identifier corresponding to the detection data in step 1021, or the scene identifier acquired by the sensing model is not recorded in the corresponding relationship between the task, the scene and the sensing data, the sensing model may determine that the sensing data corresponding to the current scene where the smart device is located is not recorded in the corresponding relationship, and may execute step 1024.
For example, assuming that the smart device is an autonomous vehicle, the correspondence between tasks, scenes, and perception data stored in the control system of the smart device may be as shown in table 1. Referring to table 1, the perception data corresponding to the task with task identifier 10 and the scene with scene identifier 1 may include the types of representative detection data: lane line curvature, the current speed of the host vehicle, the distance between the host vehicle and the lane center line, and the type and speed of the front obstacle and its distance from the host vehicle. The perception data corresponding to the task with task identifier 20 and the scene with scene identifier 3 may include the model parameters of the perception model, the types of input parameters (lane line curvature, and the type and speed of the front obstacle and its distance from the host vehicle), and the types of output parameters (abstract feature 1 and abstract feature 2). Because the representative detection data output by a perception model trained based on machine learning methods such as deep learning or reinforcement learning is not directly selected from the detection data but obtained by processing the input detection data, the output representative detection data may be called abstract features or implicit features.
TABLE 1
Task ID | Scene ID | Perception data
10 | 1 | Types of representative detection data: lane line curvature; current speed of the host vehicle; distance between the host vehicle and the lane center line; type, speed, and distance from the host vehicle of the front obstacle
20 | 3 | Model parameters of the perception model; input parameter types: lane line curvature; type, speed, and distance from the host vehicle of the front obstacle; output parameter types: abstract feature 1, abstract feature 2
Step 1023, acquiring perception data corresponding to the target task and the current scene of the intelligent device.
When the corresponding relationship records the perception data corresponding to the current scene of the intelligent device, the perception model can directly acquire the corresponding perception data from the corresponding relationship according to the task identifier of the target task and the scene identifier of the current scene of the intelligent device. The task identifier of the target task may be carried in the execution instruction, or the perception model may determine the task identifier of the target task according to a pre-stored correspondence between the task and the identifier.
For example, assuming that the task identifier of the target task is 10 and the scene identifier of the scene where the autonomous vehicle is currently located is 1, the sensing data acquired by the sensing model may be the type of the representative detection data according to the corresponding relationship shown in table 1: lane line curvature, current speed of the vehicle, distance of the vehicle from the center line of the lane, type of obstacle ahead, speed, and distance from the vehicle.
Step 1024, determining a similar scene similar to the current scene from the corresponding relation, and acquiring perception data corresponding to the target task and the similar scene.
When the scene in which the intelligent device is located is a new scene which is not recorded in the corresponding relationship between the task and the scene and the perception data, the perception model may determine a similar scene similar to the current scene in which the intelligent device is located from the scenes recorded in the corresponding relationship, and acquire the perception data corresponding to the target task and the similar scene.
Optionally, when determining a similar scene similar to the scene where the intelligent device is currently located, the perceptual model may respectively calculate the similarity between the detection data corresponding to each scene in the correspondence and the currently acquired detection data, and determine the scene corresponding to the detection data with the highest similarity as the similar scene.
For example, assume that the scene identification of the scene in which the autonomous vehicle is currently located, as determined by the perceptual model at step 1021 above, is 5. Then, since the scene identifier is not recorded in the correspondence relationship shown in table 1, the perception model may respectively calculate the similarity between the detection data corresponding to each of the scene identifiers 1 to 3 and the currently acquired detection data. If the similarity between the detection data corresponding to the scene identifier 2 and the currently acquired detection data is the highest, the perception model may determine the scene indicated by the scene identifier 2 as a similar scene similar to the scene indicated by the scene identifier 5. Further, the perception model may obtain perception data corresponding to the task identifier 10 and the scene identifier 2 from the corresponding relationship shown in table 1.
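The similar-scene fallback of step 1024 can be sketched as a nearest-neighbor search over the detection data recorded for each scene. Cosine similarity and the feature vectors below are illustrative choices; the patent does not fix a particular similarity metric:

```python
# Sketch of step 1024: when the current scene is unrecorded, find the
# recorded scene whose detection data is most similar to the current data.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def most_similar_scene(current, recorded):
    """recorded: {scene_id: detection feature vector}"""
    return max(recorded, key=lambda sid: cosine(current, recorded[sid]))

recorded = {1: [1.0, 0.0, 0.0], 2: [0.9, 0.1, 0.3], 3: [0.0, 1.0, 0.0]}
print(most_similar_scene([0.9, 0.1, 0.3], recorded))  # -> 2
```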
When perception data corresponding to the current scene is not recorded in the correspondence, the control system can quickly adapt to the new scene by acquiring the perception data of a similar scene. The control system is therefore highly adaptable, its application scenarios are not limited to those covered by the training sample data, and the application flexibility and scalability of the control system are effectively improved.
In the embodiment of the present invention, the perception model 01 may further store a part of recently acquired perception data, and the corresponding relationship between the task, the scene and the perception data may be stored in the knowledge base 04. Therefore, after determining the current scene of the intelligent device, the perception model 01 may first determine whether the perception data corresponding to the target task and the scene is stored locally. If so, the perception model 01 can directly acquire the corresponding perception data. Otherwise, the perception model 01 may send the preprocessed detection data or the scene identifier of the current scene where the smart device is located to the knowledge base 04. After receiving the data sent by the sensing model 01, the knowledge base 04 may obtain sensing data corresponding to a current scene where the intelligent device is located, and feed the sensing data back to the sensing model 01.
In an alternative implementation manner, if the sensing data acquired by the sensing model 01 includes a type of representative detection data associated with a target task, the feature extraction sub-model 012 in the sensing model 01 may directly extract the type of detection data from the preprocessed detection data, so as to obtain the representative detection data.
For example, it is assumed that the preprocessed detection data includes the lane line curvature k1, the current speed v1 of the host vehicle, the distance d1 between the host vehicle and the lane center line, the type b, speed v2, and distance d2 from the host vehicle of the front obstacle, the temperature t1, the wind speed v3, and the speed v0 and steering angle α0 of the autonomous vehicle. The types of representative detection data associated with the target task in the perception data acquired by the perception model 01 include: lane line curvature, current speed of the host vehicle, distance between the host vehicle and the lane center line, and the type and speed of the front obstacle and its distance from the host vehicle. The representative detection data extracted by the feature extraction submodel 012 from the preprocessed detection data may then include: lane line curvature k1, current speed v1 of the host vehicle, distance d1 between the host vehicle and the lane center line, and the type b, speed v2, and distance d2 from the host vehicle of the front obstacle.
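The type-based extraction described here reduces to selecting the specified keys from the preprocessed data. The field names below are illustrative stand-ins for the example's k1, v1, d1, etc.:

```python
# Sketch: when the perception data specifies the *types* of representative
# detection data, feature extraction selects those fields from the
# preprocessed detection data and drops the rest.
preprocessed = {
    "lane_curvature": 0.02, "ego_speed": 15.0, "dist_to_centerline": 0.3,
    "obstacle_type": "car", "obstacle_speed": 12.0, "obstacle_dist": 40.0,
    "temperature": 21.0, "wind_speed": 3.0,
}
wanted = ["lane_curvature", "ego_speed", "dist_to_centerline",
          "obstacle_type", "obstacle_speed", "obstacle_dist"]

representative = {k: preprocessed[k] for k in wanted}
print(sorted(representative))  # temperature and wind_speed are dropped
```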
In another optional implementation manner, if the perceptual model is a model trained based on machine learning such as deep learning or reinforcement learning, the perceptual data may include: parameters of a model corresponding to the target task. When the representative detection data associated with the target task is extracted through the perception model, the detection data and the target task can be directly input into the perception model adopting the parameters, and the output of the perception model is the representative detection data associated with the target task.
For example, the perception model may be a neural network model trained based on deep learning, such as a Recurrent Neural Network (RNN) model or a Convolutional Neural Network (CNN) model. Fig. 6 is an architecture diagram of a perception model provided by an embodiment of the present invention. As shown in fig. 6, the perception model may be a multi-layer interconnected neural network model, where each layer of the neural network is composed of a plurality of neurons. The weight of each neuron may be included in the model parameters in the perception data acquired by the perception model. After obtaining the parameters of the model, the control device may configure the corresponding weight for each neuron in the perception model, then input the preprocessed detection data and the target task to the perception model, and determine the output of the perception model as the representative detection data.
Optionally, as described above, the sensing data acquired by the sensing model may further include a type of an input parameter of the sensing model. The control device may select, according to the type of the input parameter, detection data of a corresponding type from the preprocessed detection data and input the detection data to the perception model, so as to obtain representative detection data output by the perception model and associated with the target task.
By way of example, assume that the types of input parameters of the perception model in the perception data include feature 1, feature 2, and feature 3. The control device may then select the three corresponding types of detection data from the preprocessed detection data and input them to the perception model. The abstract features 1 and 2 output by the perception model may then be provided to the planning model as representative detection data.
In the embodiment of the present invention, the feature extraction submodel 012 needs to extract, from the detection data, representative detection data that can accurately reflect the environment where the smart device is located and the current state of the smart device, for processing by the planning model 02. The representative detection data needs to be complete on the one hand, and cannot be specified a priori by a human on the other hand. The probability that the feature extraction submodel 012 extracts the representative detection data h from the preprocessed detection data y may satisfy the following mathematical model:

P(y_{1:N}, z_{1:N}, h_{1:N}) = ∏_{t=1}^{N} p(y_t|z_t) · p(h_t|y_t) · p(h_t|h_{t-1})

where N is the number of times at which the probability is calculated; p(y_t|z_t) is the probability that the perception fusion submodel 011 selects y_t for output given the detection data z_t output by the sensors at time t; p(h_t|y_t) is the probability that the feature extraction submodel 012 selects h_t for output given the data y_t output by the perception fusion submodel 011 at time t; and p(h_t|h_{t-1}) is the probability that the feature extraction submodel 012 selects h_t for output at time t given that it selected h_{t-1} at time t−1. ∏ is the product symbol, denoting the product of the N values obtained by evaluating p(y_t|z_t)·p(h_t|y_t)·p(h_t|h_{t-1}) as t ranges from 1 to N.
The working principle of the perception model can be understood as follows: the perception model calculates the probability of outputting different representative detection data h through the mathematical model based on the input detection data, and takes the representative detection data h with the highest probability as the actual output.
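For a given candidate sequence, the factorized model above is just a product of per-time-step terms. The probabilities below are hand-picked illustrative numbers for N = 3, not values from the patent:

```python
# Toy evaluation of the factorized probability model
#   P = prod_t p(y_t|z_t) * p(h_t|y_t) * p(h_t|h_{t-1})
# for one candidate sequence with N = 3 (all numbers are illustrative).
p_y_given_z = [0.9, 0.8, 0.9]   # p(y_t | z_t) for t = 1..3
p_h_given_y = [0.7, 0.9, 0.8]   # p(h_t | y_t)
p_h_given_h = [1.0, 0.6, 0.7]   # p(h_t | h_{t-1}); t = 1 has no predecessor

P = 1.0
for pyz, phy, phh in zip(p_y_given_z, p_h_given_y, p_h_given_h):
    P *= pyz * phy * phh

print(round(P, 6))
```

In practice the perception model would evaluate this product for each candidate sequence h and output the one with the highest P.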
In yet another alternative implementation, if the perceptual model is a model trained based on machine learning such as deep learning or reinforcement learning, the perceptual model may further include a plurality of perceptual submodels corresponding to different tasks. After the control device inputs the preprocessed detection data and the target task into the perception model, the perception model can determine a target perception sub-model corresponding to the target task, and inputs the preprocessed detection data into the target perception sub-model. The output of the target perception submodel is representative detection data associated with the target task. The architecture of each sensing submodel may be similar to the architecture of the sensing model shown in fig. 6, and is not described herein again.
Optionally, in the embodiment of the present invention, the sensing data acquired by the sensing model and associated with the target task may further include environmental experience data obtained by summarizing and analyzing historical environmental data of the intelligent device. For example, for an autonomous vehicle or a smart robot, the environmental experience data may include: at least one of weather experience data, road experience data, and obstacle experience data. The weather experience data may include weather forecast knowledge data and weather forecast data. The road experience data may include previously acquired attribute data (e.g., width, number of lanes, lane center line curvature, etc.) of different roads. The obstacle experience data may be general attribute data (e.g., average size and average moving speed, etc.) for different types of obstacles, including static obstacles and dynamic obstacles.
For example, assuming that the target task is automatic following, the perception model may obtain perception data associated with the automatic following task. For example, the perception data may include: the type of representative sensed data, attribute data of the road on which the autonomous vehicle is currently located, and obstacle experience data. Or, if the target task is to automatically adjust the in-vehicle temperature, the sensing data acquired by the sensing model and associated with the task of automatically adjusting the in-vehicle temperature may include: the type of representative test data, and weather experience data.
The perception model can also perfect the acquired representative detection data based on the environmental experience data so as to ensure the completeness and reliability of the representative detection data. For example, when a certain type of data (e.g., temperature, size or speed of an obstacle, etc.) specified in the sensing data is not included in the sensing data, the sensing model may extract the type of data from the environmental experience data as representative sensing data. Or, when the value of a certain representative detection data extracted from the detection data by the perception model exceeds the theoretical range, the perception model may correct the value of the representative detection data according to the same type of data in the environmental experience data. For example, assuming that the temperature extracted from the detection data by the sensing model is 100 ℃, the sensing model may determine the current temperature from the weather empirical data in the environmental empirical data since the temperature far exceeds the theoretical temperature range, and use the determined temperature as the representative detection data.
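The completion and correction logic just described can be sketched as follows. The field names, theoretical ranges, and fallback policy are illustrative assumptions:

```python
# Sketch: complete/correct representative detection data using environmental
# experience data. Missing required fields are filled in, and values outside
# a theoretical range are replaced by the experience value.
RANGES = {"temperature": (-60.0, 60.0)}  # illustrative theoretical range

def refine(representative, experience):
    out = dict(representative)
    for key, (lo, hi) in RANGES.items():
        bad = key not in out or not (lo <= out[key] <= hi)
        if bad and key in experience:
            out[key] = experience[key]  # fall back to experience data
    return out

# 100 °C far exceeds the theoretical range, so the weather experience
# value is used instead (mirroring the example in the text above).
fixed = refine({"temperature": 100.0}, {"temperature": 24.0})
print(fixed["temperature"])  # -> 24.0
```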
Step 103, inputting the target task and the representative detection data into a planning model to obtain target state data.
In the embodiment of the present invention, after the planning model acquires the representative detection data sent by the perception model and the target task, the behavior of the intelligent device may be planned to determine the target state data of the intelligent device. The target status data is used to indicate the status that the smart device needs to reach. For example, for an autonomous vehicle or a smart robot, the target state data may include the position of a target point that the smart device needs to reach, and data such as the speed and steering angle at the target point.
Optionally, the planning model may obtain planning data associated with the target task, and determine the target state data under the guidance of the planning data, so as to ensure reliability and accuracy of the finally determined target state data. The planning data may include control experience data obtained by summarizing historical control experiences of the intelligent device, or may also include control theory data (for example, kinetic theory data and some common sense physical knowledge), and the planning data may be used for assisting the planning model in determining the intention of the intelligent device.
For autonomous vehicles or intelligent robots, the control experience data may include: at least one of driving experience data and driving regulation data. The driving experience data may include: empirical data (such as accident rate, congestion rate, barrier situation, traffic flow and potential accident point data obtained by big data analysis) of a plurality of roads frequently driven by the intelligent equipment; the driving rule data may include driving rules for several roads that the smart device frequently drives on (e.g., driving directions for a one-way road). When the planning model includes a model trained based on machine learning, the planning data may also include parameters of the model corresponding to the target task.
For example, for an autonomous vehicle, assume the target task is an automatic following task, the planning data includes the accident rate of the current road, and the representative detection data includes the current speed of the host vehicle, the speed of the preceding vehicle, and the distance to the preceding vehicle. The planning model may determine the speed that the autonomous vehicle needs to maintain based on the planning data and the representative detection data. With the representative detection data unchanged, the higher the accident rate in the planning data, the lower the speed the planning model determines the autonomous vehicle needs to maintain.
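One way this accident-rate dependence could look is sketched below. The scaling rule, speed cap, and function signature are assumptions for illustration; the patent only states that a higher accident rate yields a lower maintained speed:

```python
# Illustrative sketch: the planned following speed decreases as the road's
# accident rate rises (scaling rule is an assumption, not from the patent).
def target_follow_speed(ego_speed, lead_speed, accident_rate, v_max=33.0):
    base = min(lead_speed, v_max)                  # never plan to outrun the leader
    return base * (1.0 - min(accident_rate, 0.5))  # higher risk -> lower speed

print(target_follow_speed(15.0, 20.0, 0.1))  # -> 18.0
print(target_follow_speed(15.0, 20.0, 0.4))  # -> 12.0 (riskier road, same data)
```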
Optionally, part of the planning data that has been acquired recently may be stored in the planning model 02, and the corresponding relationship between the tasks and the planning data may be stored in the knowledge base 04. Therefore, after receiving the representative detection data and the target task sent by the perception model 01, the planning model 02 may first determine whether planning data corresponding to the target task is stored locally, and if so, may directly obtain the corresponding planning data; otherwise, the planning model 02 may obtain planning data corresponding to the target task from the knowledge base 04.
Fig. 7 is an architecture diagram of a planning model provided by an embodiment of the present invention, and as shown in fig. 7, the planning model 02 may include an intent prediction submodel 021, an intent decomposition submodel 022, and an intent execution submodel 023. The intention prediction submodel 021 may predict the intention of the smart device based on the target task and the environmental data in the acquired representative detection data. The intent decomposition submodel 022 may then decompose the intent, resulting in one or more subtasks. The intent execution submodel 023 may determine target state data corresponding to each subtask based on the planning data and the representative detection data.
The intention predicted by the intention predictor model 021 may include a global intention and a local intention, and each subtask obtained by decomposing the intention by the intention decomposition sub model 022 may also be referred to as an atomic intention. The global intention refers to a macroscopic target that the intelligent device needs to achieve, for example, in an automatic driving scene, if the target task is to drive from point a to point B, the global intention may be a driving track (i.e., navigation information) of the automatic driving vehicle to drive from point a to point B. The local intent may be an intent that is factored into the global intent in conjunction with environmental data, for example in an autonomous driving scenario, the local intent may include: and keeping the intention of driving on the current lane or changing lanes on a certain road section from the point A to the point B. The atomic intent may be a minimum intent for generating a control parameter by decomposing the global intent and the local intent, and may include, for example, an intent to accelerate or brake.
In an alternative implementation, the planning data obtained by the planning model may include intention prediction data, intention decomposition data, and intention execution data. The intent prediction data can assist the intent prediction submodel 021 in intent prediction. In an autonomous driving scenario, the intent prediction data may include driving experience data and driving rule data, which may include, for example, accident rates and congestion rates for several roads. The intention prediction submodel 021 can determine a driving track of the automatic driving vehicle from the point A to the point B according to the accident rate and the congestion rate of each road section from the point A to the point B.
The intent decomposition data may be rules for decomposing an intent into subtasks, and may include the one or more subtasks corresponding to each intent. For example, assuming that the intent is to turn right 100 meters ahead, the subtasks corresponding to the intent may include: changing to the rightmost lane, and turning right. The intent decomposition submodel 022 can decompose the intent output by the intent prediction submodel 021 based on the intent decomposition data, resulting in one or more subtasks. For example, assuming that the target task is overtaking, the intent decomposition data associated with the overtaking task may include: accelerating, changing to the left lane, changing to the right lane, and decelerating. Suppose the environment data in the representative detection data acquired by the planning model includes the lane line curvature k1, the current speed v1 of the host vehicle, the distance d1 between the host vehicle and the lane center line, and the type b, speed v2, and distance d2 from the host vehicle of the front obstacle; based on the lane line curvature k1, the current lane can be determined to be a straight lane, and the speed v2 of the front obstacle is less than a preset threshold. The intent decomposition submodel 022 may then decompose the overtaking task into four subtasks: accelerating, changing to the left lane, changing to the right lane, and decelerating.
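Intent decomposition data of this kind can be sketched as a rule table mapping each intent to its ordered subtasks (atomic intents). The entries and the unknown-intent fallback are illustrative assumptions:

```python
# Sketch: intent decomposition data as a rule table mapping an intent to
# its ordered subtasks (atomic intents). Entries mirror the text's examples.
INTENT_DECOMPOSITION = {
    "overtake": ["accelerate", "change_lane_left",
                 "change_lane_right", "decelerate"],
    "turn_right_ahead": ["change_to_rightmost_lane", "turn_right"],
}

def decompose(intent):
    # An intent with no decomposition rule is treated as already atomic
    # (an assumption made for this sketch).
    return INTENT_DECOMPOSITION.get(intent, [intent])

print(decompose("overtake"))
```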
The intent execution data may be rules for determining the target state data based on the subtasks and the representative detection data; a rule may be a correspondence, or a physical or mathematical formula. The intent execution submodel 023 may process the representative detection data according to the rules to obtain the target state data corresponding to each subtask. For example, for the subtask of changing to the left lane, the intent execution submodel 023 may apply dynamics and mathematical formulas to data in the representative detection data, such as the distance between the host vehicle and the lane center line and the distance between the front obstacle and the host vehicle, to obtain the position of the target point on the left lane to which the host vehicle needs to move.
Optionally, the intention predictor submodel 021 and the intention execution submodel 023 may also be models trained based on a machine learning manner. Accordingly, the intent prediction data may be model parameters of an intent prediction submodel 021, and the intent execution data may be model parameters of an intent execution submodel 023.
In another alternative implementation, the planning model 02 may be a model trained based on machine learning, such as deep learning or reinforcement learning. The planning data may include parameters of a model corresponding to the target task. When the planning model obtains the target state data, the parameter configuration can be performed by adopting the parameter corresponding to the target task, then the input representative detection data and the target task can be processed, and the target state data can be output.
In yet another alternative implementation manner, the planning model may be a model trained based on machine learning such as deep learning or reinforcement learning, and the planning model may further include a plurality of planning submodels corresponding to different tasks. After the control device inputs the representative detection data and the target task into the planning model, the planning model may determine a target planning sub-model corresponding to the target task, and may input the representative detection data into the target planning sub-model. The output of the goal planning submodel is the goal state data.
Optionally, if the intelligent device is an autonomous vehicle or an intelligent robot, after the sensing model obtains the representative detection data, the movement trajectory of the obstacle may be predicted based on the obtained attribute data of the obstacle in the surrounding environment, and the prediction result is sent to the planning model. The planning model may make a reasonable decision on the behavior of the intelligent device in combination with the prediction result, i.e., determine the target state data. If the target task is a driving task, the planning model further needs to determine the target state data by combining the path planning information and the current position of the intelligent device.
Step 104, inputting the target state data and part or all of the representative detection data into a control model to obtain control parameters for controlling the intelligent equipment.
In the embodiment of the present invention, the control model may be a model initialized based on control theory data. The control theory data may include: kinetic theoretical data (such as a law of mechanics) and some common knowledge of physics (such as a friction coefficient of a road surface).
Optionally, after receiving the target state data sent by the planning model, the control model may first obtain control data associated with the target task. The control data may be used to assist the control model in generating the control parameters, and may include control theory data. When the control model is a model trained based on machine learning such as deep learning or reinforcement learning, the control data may further include parameters of the model corresponding to the target task. Under the guidance of the control data, the control model can generate the control parameters based on the representative detection data output by the perception model and the target state data output by the planning model.
Optionally, part of the control data that has been recently acquired may be stored in the control model 03, and the corresponding relationship between the task and the control data may be stored in the knowledge base 04. Therefore, after receiving the target state data sent by the planning model, the control model 03 may first determine whether control data corresponding to the target task is stored locally, and if so, may directly obtain the corresponding control data; otherwise, the control model 03 may obtain the control data corresponding to the target task from the knowledge base 04.
If the target state data output by the planning model includes target state data corresponding to each subtask, the control device may input the target state data corresponding to the current subtask to be processed and the representative detection data in the one or more subtasks to the control model, so as to obtain a control parameter corresponding to the current subtask to be processed.
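The per-subtask flow just described can be sketched as a loop that feeds each subtask's target state, together with the representative detection data, to the control model. The control model below is a hypothetical stand-in, not the patent's trained model:

```python
# Sketch of step 104 with per-subtask target states: each subtask's target
# state is passed to the control model with the representative data.
def run_subtasks(subtask_targets, representative, control_model):
    """subtask_targets: list of (subtask_name, target_state) pairs."""
    params = []
    for name, target in subtask_targets:
        params.append((name, control_model(target, representative)))
    return params

# Toy control model: command the speed delta needed to reach the target.
ctrl = lambda target, rep: {"throttle_delta": target["speed"] - rep["ego_speed"]}
out = run_subtasks([("accelerate", {"speed": 20.0})], {"ego_speed": 15.0}, ctrl)
print(out)  # -> [('accelerate', {'throttle_delta': 5.0})]
```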
As an alternative implementation, the control model may be a model trained based on machine learning. Accordingly, the control data may include model parameters of the control model corresponding to the target task, and the types of representative detection data (i.e., the types of input parameters) required to be input to the control model. That is, for different tasks, the model parameters of the control model are different, and the types of representative detection data to be input are also different. Correspondingly, the step 104 may include:
Step 1041a, obtaining representative detection data of the corresponding type from the representative detection data.
For example, assume that, corresponding to the automatic following task, the types of input parameters of the control model include: lane line curvature and speed of the front obstacle. If the representative detection data output by the perception model include: lane line curvature k1, vehicle current speed v1, distance d1 between the vehicle and the lane center line, type b of the front obstacle, its speed v2, and its distance d2 from the vehicle, then the representative detection data of the corresponding types obtained by the control model from the input representative detection data may include: lane line curvature k1 and speed v2 of the front obstacle.
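As a minimal sketch of this type-based selection (the field names, task identifiers, and values below are hypothetical, not from the patent):

```python
# Representative detection data output by the perception model (invented values)
representative_data = {
    "lane_line_curvature": 0.02,        # k1
    "vehicle_speed": 12.5,              # v1
    "distance_to_lane_center": 0.3,     # d1
    "front_obstacle_type": "car",       # b
    "front_obstacle_speed": 11.0,       # v2
    "front_obstacle_distance": 25.0,    # d2
}

# Types of input parameters per task, as might be recorded in the control data
required_input_types = {
    "auto_follow": ["lane_line_curvature", "front_obstacle_speed"],
    "auto_reverse": ["rear_obstacle_distance", "left_obstacle_distance",
                     "right_obstacle_distance"],
}

def select_inputs(task, data):
    """Return only the representative detection data the control model needs."""
    return {t: data[t] for t in required_input_types[task] if t in data}

selected = select_inputs("auto_follow", representative_data)
```

For the automatic following task this keeps only the lane line curvature and the speed of the front obstacle, discarding the other representative detection data.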
Optionally, the type of the input parameter of the control model corresponding to the automatic reverse task may include: the distance to the rear obstacle, the distance to the left obstacle, and the distance to the right obstacle. The types of input parameters of the control model, corresponding to a task of automatically driving to a specified destination, may include: lane line curvature, distance to the front obstacle, and speed of the front obstacle.
As can be seen from the above analysis, for different tasks, the representative detection data required to be input to the control model may be all of the representative detection data output by the perception model, or only part of it. Optionally, the types of input parameters of the control model corresponding to each task may be set in advance by a developer according to experience.
Step 1042a, processing the target state data and the obtained representative detection data by using the control model configured with the model parameters, to obtain control parameters for controlling the intelligent device.
The control model may be a neural network model trained based on machine learning, and may be, for example, an RNN model or a CNN model. The model parameters in the control data may include a weight for each neuron in the neural network model. After the control model obtains the model parameters, corresponding weights can be configured for each neuron. And then, processing the input target state data and the acquired representative detection data to obtain control parameters for controlling the intelligent equipment.
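As a minimal sketch of configuring per-task model parameters before processing the inputs (a single linear neuron with invented weights stands in for the trained RNN/CNN; all numbers are hypothetical):

```python
# Hypothetical per-task model parameters, as might be carried in the control data
per_task_params = {
    "auto_follow": {"weights": [0.5, -0.2, 0.1], "bias": 0.05},
}

def control_model(task, target_state, representative_inputs):
    # Configure the model with the parameters corresponding to the task,
    # then process the target state data and the representative detection data.
    p = per_task_params[task]
    x = [target_state] + list(representative_inputs)
    return sum(w * xi for w, xi in zip(p["weights"], x)) + p["bias"]

# Target state 1.0, representative inputs: curvature 0.02, obstacle speed 11.0
acceleration = control_model("auto_follow", 1.0, [0.02, 11.0])
```

Swapping the task key swaps the whole parameter set, mirroring the idea that the same control model processes different tasks with different weights.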
If the target state data includes target state data corresponding to each subtask, for each subtask, the control device may process the target state data corresponding to the subtask and the representative detection data using the control model to determine a control parameter corresponding to each subtask.
Taking an automatic driving scenario as an example, as shown in fig. 8, assume that the output parameters of the control model include the acceleration and the steering angle of the steering wheel, and that the representative detection data input to the control model are representative detection data 1 (lane line curvature k1) and representative detection data 2 (speed v2 of the front obstacle). The control model can process the target state data output by the planning model together with these two pieces of representative detection data. If the acceleration output by the control model is a1 and the steering angle of the steering wheel is α1, the control device may control the transmission and power device of the autonomous vehicle via the control bus of the autonomous vehicle, such that the acceleration of the autonomous vehicle is a1 and the steering angle of the steering wheel is α1.
As another optional implementation manner, the control model may further include a plurality of control submodels corresponding to different tasks, and each control submodel may be trained based on machine learning manners such as deep learning, reinforcement learning, or deep reinforcement learning. After the control device inputs the target state data and part or all of the representative detection data into the control model, the control model can determine a target control sub-model corresponding to the target task, and input the representative detection data and the target state data into the target control sub-model. The output of the target control submodel is the control parameter.
The types of representative detection data input to the control model may be determined by the control device based on the target task. For example, the control device may acquire control data associated with the target task, and the types of representative detection data required to be input to the control model may be included in that control data. That is, for different tasks, the types of representative detection data that the control model requires as input may be different. Based on the types specified in the control data associated with the target task, the control device may acquire representative detection data of the corresponding types from the representative detection data output by the perception model, and input them to the control model.
As yet another alternative implementation, the control data may include a first rule, corresponding to the target task, for generating the control parameter, and the types of representative detection data used for generating the control parameter. That is, for different tasks, the rules for generating the control parameter are different, and the types of representative detection data used to generate the control parameter are also different. Correspondingly, the step 104 may include:
Step 1041b, acquiring representative detection data of the corresponding type from the representative detection data.
For example, assume that the types of representative detection data for generating the control parameter, corresponding to the automatic following task, include: lane line curvature and the vehicle current speed. The representative detection data output by the perception model include: lane line curvature k1, vehicle current speed v1, distance d1 between the vehicle and the lane center line, type b of the front obstacle, its speed v2, and its distance d2 from the vehicle. The representative detection data of the corresponding types obtained by the control model from the input representative detection data may include: lane line curvature k1 and vehicle current speed v1.
Step 1042b, processing the target state data and the obtained representative detection data by using the first rule, to obtain a control parameter for controlling the intelligent device.
In the embodiment of the present invention, the first rule may be a correspondence relationship between the target state data, the representative detection data, and the control parameter. After the control model acquires the target state data sent by the planning model and acquires the representative detection data of the corresponding type, the corresponding control parameters can be directly acquired from the corresponding relation.
Alternatively, the first rule may also be a formula (e.g., a physical formula or a mathematical formula) between the target state data, the representative sensed data, and the control parameter. The control model may bring the acquired target state data sent by the planning model and the acquired representative detection data of the corresponding type into the formula, thereby obtaining the control parameter by calculation.
For example, assume that, corresponding to the automatic following task, the first rule for generating the control parameter (the acceleration a) is a mathematical formula f1. The expression for the acceleration a may then be: a = f1(s, k, v), where s is the target state data, k is the lane line curvature, and v is the current speed of the vehicle. If the target state data received by the control model is s1, and the acquired representative detection data of the corresponding types are the lane line curvature k1 and the vehicle current speed v1, the control model may substitute these parameters into the mathematical formula f1 used to generate the acceleration, thereby obtaining an acceleration a2 for controlling the autonomous vehicle, where a2 = f1(s1, k1, v1).
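A minimal sketch of such a formula-based first rule; the function f1 below and all numeric values are invented placeholders, not the actual formula from the control data:

```python
def f1(s, k, v):
    # Hypothetical rule: accelerate toward the target state,
    # slow down for curvature and for current speed.
    return 0.8 * s - 2.0 * k - 0.1 * v

# Target state s1, lane line curvature k1, vehicle current speed v1 (invented)
s1, k1, v1 = 2.0, 0.05, 10.0
a2 = f1(s1, k1, v1)   # acceleration used to control the autonomous vehicle
```

Because the rule is an explicit formula rather than a trained model, the control parameter is obtained by direct substitution, with no training samples involved.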
As still another alternative implementation, the control model may include a control submodel for calculating the weights, and one or more calculation submodels for calculating the control parameters. The control submodel may be a model trained based on machine learning such as deep learning or reinforcement learning, and the calculation submodel may be a calculation formula (e.g., a physical formula or a mathematical formula) determined after initialization based on control theory data. The control data may include: a type of a set of input data corresponding to each computation submodel. Alternatively, the control data may include the control theory data, that is, the one or more calculation submodels for calculating the control parameter may also be obtained by the control model from the control data. Correspondingly, the step 104 may include:
Step 1041c, acquiring a group of target input data corresponding to each calculation submodel from the representative detection data and the target state data.
The set of target input data for each calculation submodel may include: at least one type of representative detection data, and/or at least one type of target state data. The data types included in the target input data corresponding to any two calculation submodels may be completely different or partially the same, which is not limited in the embodiment of the present invention.
By way of example, assume that the control model includes two calculation submodels for calculating the control parameter (the steering angle of the steering wheel). The set of target input data corresponding to the first calculation submodel includes: lane line curvature; and the set of input data corresponding to the second calculation submodel includes: the distance between the vehicle and the lane center line. If the representative detection data output by the perception model include: lane line curvature k1, vehicle current speed v1, distance d1 between the vehicle and the lane center line, type b of the front obstacle, its speed v2, and its distance d2 from the vehicle, then the set of target input data corresponding to the first calculation submodel, which the control model obtains from the input representative detection data, may include: lane line curvature k1; and the set of target input data corresponding to the second calculation submodel may include: the distance d1 between the vehicle and the lane center line.
Step 1042c, respectively inputting each group of target input data to the corresponding calculation submodel, to obtain the value of the control parameter corresponding to each group of target input data.
Each calculation submodel may be a formula (e.g., a physical formula or a mathematical formula) between a corresponding set of input data and values of the control parameters. The control model can bring the acquired target input data of each group into the corresponding formula respectively, so as to calculate and obtain the value of the control parameter corresponding to the target input data of each group.
For example, assume that the two calculation submodels, included in the control data corresponding to the automatic following task, for calculating the control parameter (the steering angle α of the steering wheel) are mathematical formulas: the first calculation submodel is α = f2(k), and the second calculation submodel is α = f3(d), where k is the lane line curvature and d is the distance between the vehicle and the lane center line. If the set of target input data corresponding to the first calculation submodel acquired by the control model is the lane line curvature k1, and the set corresponding to the second calculation submodel is the distance d1 between the vehicle and the lane center line, the control model may substitute the lane line curvature k1 into the corresponding mathematical formula f2 to obtain the steering angle α2 = f2(k1) corresponding to that set of target input data. Similarly, the control model may substitute the distance d1 between the vehicle and the lane center line into the corresponding mathematical formula f3 to obtain the steering angle α3 = f3(d1) corresponding to that set of target input data.
Step 1043c, inputting the target state data and part or all of the representative detection data to the control submodel to obtain a set of weights.
Model parameters of the control submodel corresponding to the target task may be included in the control data. The control model can configure the control submodel according to the obtained model parameters. The target state data, as well as some or all of the representative detection data, may then be input to the control submodel that employs the model parameters, to derive a set of weights; the set may include a plurality of weights. The part or all of the representative detection data input to the control submodel may be determined according to the target task, for example, according to the types of representative detection data recorded in the control data associated with the target task.
For example, fig. 9 is an architecture diagram of a control submodel according to an embodiment of the present invention. As shown in fig. 9, assume that the part of the representative detection data acquired by the control model is: representative detection data 3 (lane line curvature k1) and representative detection data 4 (distance d1 between the vehicle and the lane center line). The control model may input the target state data output by the planning model and these two pieces of representative detection data to the control submodel. The set of weights output by the control submodel may include: a weight w1 for the set of target input data corresponding to the first calculation submodel, and a weight w2 for the set of target input data corresponding to the second calculation submodel.
Optionally, in this embodiment of the present invention, the control data associated with the target task may further include a reference value of the control parameter, where the reference value may be a constant, and is used to reflect an influence of other implicit relevant data on the control parameter. Accordingly, the output parameter of the control submodel may further include a weight corresponding to the reference value. For example, as shown in fig. 9, the control submodel may output a weight w3 corresponding to a reference value of the steering angle of the steering wheel.
Step 1044c, determining a target value of the control parameter according to the group of weights and the values of the control parameter corresponding to each group of target input data.
In an optional implementation manner of the embodiment of the present invention, the control model may multiply the weight of each set of target input data by the value of the control parameter corresponding to that set, to obtain a product corresponding to each set of target input data, and then add the products to obtain the target value of the control parameter. That is, the control model may perform a weighted summation of the values of the control parameter corresponding to each group of target input data, according to the weight of each group, so as to obtain the target value of the control parameter.
For example, assume that the value of the steering angle of the steering wheel obtained based on the lane line curvature k1 is α2, the value obtained based on the distance d1 between the vehicle and the lane center line is α3, and the reference value of the steering angle of the steering wheel is α0. The lane line curvature k1 corresponds to the weight w1, the distance d1 between the vehicle and the lane center line corresponds to the weight w2, and the reference value of the steering angle corresponds to the weight w3. The control model performs a weighted summation of these values of the steering angle, and the obtained target value αav of the steering angle of the steering wheel may satisfy:
αav = w1 × α2 + w2 × α3 + w3 × α0
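For illustration, assuming invented stand-in formulas for the two calculation submodels and invented weights, the weighted summation can be sketched as:

```python
import math

# Invented stand-ins for the two calculation submodels (not the real formulas)
def f2(k):                 # steering angle from lane line curvature
    return math.atan(5.0 * k)

def f3(d):                 # steering angle from distance to lane center line
    return math.asin(d / 20.0)

k1, d1 = 0.02, 0.3
alpha2, alpha3 = f2(k1), f3(d1)   # candidate values of the control parameter
alpha0 = 0.0                       # reference value of the steering angle
w1, w2, w3 = 0.5, 0.3, 0.2         # weights output by the control submodel

# Weighted summation: target value of the steering angle
alpha_av = w1 * alpha2 + w2 * alpha3 + w3 * alpha0
```

The weights let the control submodel decide, per situation, how much each candidate value (and the reference value) contributes to the final steering angle.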
In another optional implementation manner of the embodiment of the present invention, the control model may multiply the weight corresponding to each set of target input data by the value of the control parameter corresponding to that set, to obtain the product corresponding to each set. The control model may then take the product with the maximum or minimum value among these products as the target value of the control parameter. Alternatively, the control model may select the set of target input data corresponding to the product with the largest or smallest value, and take the value of the control parameter corresponding to that selected set as the target value of the control parameter.
In another optional implementation manner of the embodiment of the present invention, a weighted summation algorithm for calculating the target value may also be stored in the control model in advance, and the control model may perform weighted summation on the values of the control parameters corresponding to each group of target input data by using the group of weights based on the weighted summation algorithm, so as to obtain the target value of the control parameter.
Optionally, the step 1043c may also be executed before the step 1042c; that is, the control model may first obtain the set of weights, and then calculate the values of the control parameter corresponding to each set of target input data. Some of the weights in the set may also be used as input parameters of the calculation submodels when calculating the values of the control parameter.
For example, assume that the set of weights output by the control submodel includes w1, w2, and w3, and that two calculation submodels compute the steering angle α of the steering wheel. The set of target input data corresponding to the first calculation submodel may include the vehicle wheel base W, the steering radius R1 at the current position, the steering radius R2 at the next target point position, the distance d from the lane center line, and the weight w1. The first calculation submodel may be:
α2 = asin(W / ((R1 + R2 - d) × 0.5 × w1)), where asin is the arcsine function. The steering radius may refer to the distance between the longitudinal (i.e., lengthwise) symmetry plane of the vehicle and the instantaneous turning center O.
The set of target input data corresponding to the second calculation submodel may include the distance d from the lane center line and the farthest recognition distance A_d of the vehicle. The second calculation submodel may be α3 = asin(d / A_d).
The weighted summation algorithm stored in the control model for calculating the target value αav of the steering angle may be:
αav = w2 × α2 + (1 - w2) × (w3 × α3 + (1 - w3) × α0), where α0 is the reference value of the steering angle of the steering wheel.
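As an illustrative sketch (all numeric values here are invented), the two asin-based calculation submodels and the nested weighted-summation algorithm can be computed as:

```python
import math

# Invented illustrative values: wheel base, steering radii, distances, weights
W, R1, R2, d, A_d = 2.8, 40.0, 38.0, 0.5, 60.0
w1, w2, w3 = 1.0, 0.6, 0.7
alpha0 = 0.0                                          # reference steering angle

alpha2 = math.asin(W / ((R1 + R2 - d) * 0.5 * w1))    # first calculation submodel
alpha3 = math.asin(d / A_d)                           # second calculation submodel

# Nested weighted-summation algorithm for the target steering angle
alpha_av = w2 * alpha2 + (1 - w2) * (w3 * alpha3 + (1 - w3) * alpha0)
```

Note that the weight w1 enters the first submodel as an input parameter, while w2 and w3 weight the candidate values afterward, matching the two roles of weights described above.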
Step 105, controlling the intelligent device to execute the target task based on the control parameter.
In an embodiment of the present invention, the control device may be connected to an underlying drive module (e.g., a transmission and power plant of an autonomous vehicle) of the smart device via a control bus of the smart device. After the control device obtains the control parameter, an operation instruction may be generated based on the control parameter, and the operation instruction may be sent to a bottom layer driving module of the intelligent device, where the operation instruction may be used to instruct the bottom layer driving module to drive the intelligent device to execute a corresponding operation, that is, to execute the target task. In the case of an autonomous vehicle, the operation is generally to adjust the steering angle of a steering wheel, adjust acceleration, step on an accelerator, or brake.
For example, assume that the control parameter includes a steering angle αav. The control device may then control the autonomous vehicle to adjust the steering angle of the steering wheel to αav.
According to the control method of the intelligent device provided by the embodiment of the present invention, the acquired detection data and the target task can be input to the perception model to obtain representative detection data associated with the target task. The target task and the representative detection data may then be input to the planning model to obtain target state data. Next, the target state data and the representative detection data may be input to the control model to derive control parameters for controlling the intelligent device. Finally, the intelligent device may be controlled based on the control parameters. Because the control model is initialized based on control theory data, which directly expresses the control rules and principles of the intelligent device, this approach, compared with training directly on samples as in the related art, not only reduces the control model's dependence on training samples and improves training efficiency, but also ensures the control effect on the intelligent device.
In the embodiment of the present invention, in order to further improve the control effect on the intelligent device, the control device may further evaluate the control effect on the control parameters, and adjust the parameters of one or more models in the control system based on the evaluation result. Fig. 10 is a flowchart of a method for adjusting parameters of models in a control system according to an embodiment of the present invention, and referring to fig. 10, the method may include:
Step 106, acquiring new state data of the intelligent device after the intelligent device is controlled based on the control parameters.
The control device may acquire new state data of the smart device after controlling the smart device to perform a corresponding operation based on the control parameter. Similar to the sensed data, the new status data may be collected by a sensor provided on the smart device. In an embodiment of the present invention, the type of the new state data may be the same as the type of the detection data; or, may be the same type as the representative detection data extracted by the perception model; alternatively, the control device may further store a correspondence between the task and the new status data type, and the control device may determine the new status data type corresponding to the target task based on the correspondence, and acquire the new status data of the corresponding type.
Step 107, determining a control effect according to the new state data and the target task.
Taking an automatic driving scenario as an example, assuming that the target task is driving along a lane center line (i.e. in an ideal case, the distance between the vehicle and the lane center line is 0), the control device may determine the control effect of the control parameter according to the difference between the distance d between the vehicle and the lane center line in the new state data and 0, that is, the size of the distance d between the vehicle and the lane center line. And, the smaller the distance d, the better the control device can determine the control effect.
Because the types of control parameters generated by the control device differ for different tasks, and the complexity of controlling the intelligent device also differs, the control device may adopt different evaluation algorithms to determine the control effect for different tasks.
Referring to fig. 3, the control system in the control device may further include an evaluation model 05, and the evaluation model 05 or the knowledge base 04 may store the corresponding relationship between the task and the evaluation algorithm. After the evaluation model 05 obtains the new state data of the intelligent device and the target task, an evaluation algorithm corresponding to the target task may be obtained from the corresponding relationship, and the obtained evaluation algorithm corresponding to the target task may be used to determine the control effect.
For simpler tasks (e.g., automatic following or driving along the lane center line), the evaluation algorithm may be a calculation formula between the new state data and the evaluation result. After the control system acquires the new state data, it can substitute the new state data directly into the calculation formula, so as to obtain an evaluation result reflecting the control effect.
For example, in the case of the automatic following task, the target of the task is to keep a certain distance from the preceding vehicle while keeping the host vehicle within the lane lines. After controlling the intelligent device to execute the corresponding operation based on the received execution instruction for automatic following, the control system can determine the new state data types corresponding to the automatic following task according to the correspondence between tasks and new state data types, and acquire the new state data of the corresponding types. Assume that the new state data acquired by the evaluation model 05 include: the distance D1 between the vehicle and the lane center line, and the distance D2 between the vehicle and the preceding vehicle, and that the evaluation algorithm corresponding to the automatic following task is a formula f0. The evaluation result s calculated according to the formula f0 may satisfy:
s = f0(b × D1, (1 - b) × D2), where b is a preset coefficient greater than or equal to 0 and less than or equal to 1.
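As a sketch with an assumed shape for f0 (the real formula is task-specific; this one simply rewards small weighted distances with a score near 1), the evaluation result can be computed as:

```python
def f0(x, y):
    # Hypothetical evaluation formula: score in (0, 1], larger is better
    return 1.0 / (1.0 + x + y)

# Invented coefficient and measured distances after executing the operation
b, D1, D2 = 0.5, 0.2, 1.0
score = f0(b * D1, (1 - b) * D2)
```

The coefficient b trades off the two goals of the automatic following task: staying near the lane center line (D1) versus holding the gap to the preceding vehicle (D2).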
For complex tasks, the evaluation algorithm can be an evaluation algorithm model trained in a machine learning-based mode. For example, the evaluation algorithm model may be trained based on reinforcement learning (for example, reference may be made to an implementation manner of a value network in reinforcement learning), or may be trained by using a deep learning method based on supervised learning. In addition, the training process of the evaluation algorithm model may be offline training or online training, which is not limited in the embodiments of the present invention.
Referring to fig. 11, the input parameters of the evaluation algorithm model may include, in addition to the new state data, the representative detection data output by the perception model. The evaluation model may input the new state data and the representative detection data to the evaluation algorithm model to obtain an evaluation result. The evaluation result can be a value greater than or equal to 0 and less than or equal to 1, positively correlated with the control effect: the larger the value, the better the control effect. If the value is less than a threshold, the evaluation result is poor, that is, the control effect of the control device does not meet expectations. If a statistic (e.g., the average) of the evaluation results over a period of time stays below a certain threshold, the control device may determine that the operating state of a certain model in the control system, or of the whole control system, is poor, and that the parameters of each model need to be adjusted.
Step 108, adjusting the parameters of the control system according to the control effect.
The parameters may include at least one of model parameters, input parameters, and output parameters of the control system. Optionally, as shown in fig. 3, since the control system may include a perception model, a planning model and a control model, after the evaluation model determines the control effect, as an optional implementation manner, the evaluation model may send the control effect to each model. One or more of the perception model, the planning model, and the control model may adjust its own parameters according to the control effect, and the adjusted parameters of each model may include at least one of model parameters, input parameters, and output parameters. As another optional implementation manner, the control device may respectively adjust parameters of each model according to the evaluation effect, or randomly adjust parameters of several models, or may adjust parameters of a model of a corresponding type according to a type of a preset model.
For example, the evaluation model 05 may send evaluation results for reflecting the control effect to the perception model 01, the planning model 02, and the control model 03, respectively. Each model can adjust its own parameters when the value of the evaluation result is smaller than a preset threshold value. Taking the sensing model 01 as an example, if the sensing model 01 detects that the value of the control effect is smaller than the preset threshold, it may be determined that the type of the currently selected representative detection data is not suitable for the target task and the current scene, and thus the type of the representative detection data corresponding to the target task and the current scene may be adjusted. Alternatively, if the perceptual model 01 is a model trained based on deep learning, the perceptual model 01 may adjust the weight of each neuron or the type of representative detection data output by the perceptual model 01.
Optionally, in the embodiment of the present invention, the data associated with the target task acquired by each model may further include a constraint parameter for defining an adjustment range of a parameter of the model. Accordingly, when adjusting the parameters of each model according to the control effect, each model can be adjusted within the range defined by the constraint parameters. Therefore, the output of the control system can be ensured to meet the actual condition requirement, and the safety and the reliability of the intelligent device in the control process can be further ensured.
Referring to fig. 3, it can be seen that the sensing model 01, the planning model 02, and the control model 03 are closely coupled, and adjustment of the input or output parameters of one model may affect the adjacent models. Therefore, when a model adjusts its input or output parameters, the parameters of the adjacent models also need to be adjusted accordingly. For example, if the perception model adjusts its output parameters, i.e., the types of representative detection data, the input parameters of the planning model are adjusted accordingly.
In the embodiment of the invention, the control device can continuously adjust the parameters on line according to the control effect generated by the evaluation model each time, thereby continuously perfecting the model per se and improving the control effect. The feedback control effect of the evaluation model is more direct, so that the adjustment direction of the control device during parameter adjustment is more accurate.
Taking the perception model 01 as an example: for complex tasks or complex scenes, the selection of representative detection data is itself complex, and it is difficult to extract appropriate representative detection data through experience or simple algorithms. In the embodiment of the invention, the types of the selected representative detection data can be continuously adjusted through the online evaluation and feedback of the evaluation model 05, so that the performance of the perception model can be continuously improved and more appropriate representative detection data can be extracted subsequently. For example, the dimension of the representative detection data extracted by the perception model 01 might be reduced from 100 dimensions to 5 dimensions through continuous adjustment.
Optionally, when controlling the intelligent device, the control device may evaluate the control effect of the entire control system at an initial stage and adjust the parameters of each model in the control system based on the control effect. Thereafter, the control device may perform effect evaluation and parameter adjustment only for a specific model (for example, the control model) in the control system. When evaluating the effect of a specific model, the parameters of the other models can be kept unchanged and only the parameters of the specific model are adjusted; the control effect of the control parameters output by the control system is then evaluated, and the obtained evaluation result can be used as the evaluation result after the parameters of the specific model are adjusted. For the method of evaluating the control effect of the specific model and the method of adjusting its parameters, reference may be made to steps 106 to 108 above, and details are not described herein again.
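The per-model adjustment described above — freezing the other models and adjusting only one — can be sketched as follows. Here `system`, `evaluate`, and the candidate values are hypothetical stand-ins for illustration, not the patent's actual interfaces:

```python
def tune_specific_model(system, model_name, candidates, evaluate):
    """Keep every other model's parameters fixed, try candidate
    parameter values for one model, and keep the candidate whose
    overall control effect evaluates best."""
    best_params, best_score = system[model_name], float("-inf")
    for params in candidates:
        system[model_name] = params
        score = evaluate(system)  # control effect of the whole system
        if score > best_score:
            best_params, best_score = params, score
    system[model_name] = best_params
    return best_params, best_score

# Toy usage: only the "control" model's gain is tuned; the others stay fixed.
system = {"perception": 1.0, "planning": 1.0, "control": 0.2}
evaluate = lambda s: 1.0 - abs(s["control"] - 0.5)  # best effect at gain 0.5
best, score = tune_specific_model(system, "control", [0.1, 0.5, 0.9], evaluate)
print(best, score)  # 0.5 1.0
```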
In the embodiment of the present invention, each time after controlling the intelligent device to perform an operation, the control device may evaluate the control effect on the intelligent device through the methods shown in steps 106 to 108 above and adjust the parameters of the control system. Alternatively, the control device may control the intelligent device to perform operations several times and then execute the methods shown in steps 106 to 108. Alternatively, the control device may execute the methods shown in steps 106 to 108 after receiving an adjustment instruction, where the adjustment instruction may be triggered by the user.
And step 109, updating the data stored in the knowledge base according to the adjusted parameters of each model.
In the embodiment of the present invention, after each model in the control system completes the adjustment of its parameters, the data stored in the knowledge base can be updated according to the adjusted parameters. For example, the sensing model, the planning model and the control model may respectively send their adjusted parameters to the knowledge base 04, and the learning submodel 041 in the knowledge base 04 may update the sensing data stored in the knowledge base submodel 042 according to the adjusted parameters of the sensing model 01, update the planning data stored in the knowledge base submodel 042 according to the adjusted parameters of the planning model 02, and update the control data stored in the knowledge base submodel 042 according to the adjusted parameters of the control model 03.
The learning submodel 041 is directly related to the algorithms of the sensing model 01, the planning model 02 and the control model 03, and the learning submodel 041 may be a part of the knowledge base 04 or a part of the sensing model 01, the planning model 02 and the control model 03, that is, each of the sensing model 01, the planning model 02 and the control model 03 may have a learning submodel 041.
Optionally, the learning submodel 041 may also include a neural network model for learning and extracting data, and the knowledge base 04 may input the adjusted parameters sent by each model to the neural network model, and update the data corresponding to each model stored in the knowledge base submodel 042 based on the output of the neural network model.
For example, taking the feature extraction submodel 012 in the perception model 01 as an example, referring to fig. 12, when the feature extraction submodel 012 adjusts, according to the control effect, the type of the representative detection data that it extracts, the feature extraction submodel 012 can transmit the adjusted type of the representative detection data to the learning submodel 041 in the knowledge base 04. The learning submodel 041 may then update the type of representative detection data stored in the knowledge base submodel 042 corresponding to the target task and the current scene based on the received type. In this way, online updating and adjustment of the perception data can be realized, and the reliability of the representative detection data extracted based on the perception data is ensured.
It should be noted that the order of the steps of the control method for the intelligent device provided by the embodiment of the present invention may be appropriately adjusted, and steps may also be added or omitted as the situation requires. For example, steps 106 to 109 may be omitted as appropriate. Any variation readily conceivable by those skilled in the art within the technical scope of the present disclosure falls within the protection scope of the present disclosure, and is therefore not described in detail.
In summary, the embodiments of the present invention provide a control method for an intelligent device. The method may input the acquired detection data and the target task into a sensing model to obtain representative detection data associated with the target task. The target task and the representative detection data may then be input into a planning model to obtain target state data. The target state data and the representative detection data may then be input into a control model to obtain control parameters for controlling the smart device. Finally, the smart device may be controlled based on the control parameters. Because the control model is initialized based on control theory data, which can directly express and reflect the control rules and principles of the intelligent device, compared with training directly with training samples as in the related art, this not only reduces the dependence of the control model on training samples and improves the training efficiency, but also ensures the control effect on the intelligent device.
Furthermore, the method provided by the embodiment of the invention can also evaluate the control effect of the control device and adjust the parameters of each model in the control device according to the control effect, so that the performance of the control device can be continuously improved in the use process of the control device, and the control effect of the intelligent equipment is improved.
The embodiment of the present invention further provides a training method for a control system of an intelligent device, which can be used to train the perception model, the planning model and the control model included in the control system in the above method embodiment. The training method can be applied to a training device. The training device and the control device of the intelligent device may be the same device, or both may be separately configured in the same device; for example, both may be configured in the intelligent device. Alternatively, the training device and the control device may be disposed in different devices; for example, the training device may be disposed in a training server, and the control device may be disposed in the smart device. After the training device completes the training of each model in the control system, each trained model can be sent to the control device.
The perception model can be obtained by training based on a deep learning mode, for example, the perception model can be obtained by training based on a deep learning method of supervised learning; the planning model and the control model can be trained based on reinforcement learning. Of course, the perception model may also be trained based on reinforcement learning or deep reinforcement learning, and the planning model and the control model may also be trained based on deep learning or deep reinforcement learning. The embodiment of the invention does not limit the type of the machine learning method based on which each model is trained.
Optionally, the reinforcement learning manner may include a Q-learning method or a state-action-reward-state-action (SARSA) method, etc. The deep reinforcement learning manner may include a deep Q-network (DQN) or a deep deterministic policy gradient (DDPG) method, etc.
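As a point of reference for the Q-learning method mentioned above, a minimal tabular update can be sketched as follows; the states, actions, learning rate and discount factor are illustrative, not taken from the embodiment:

```python
def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """One tabular Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return Q

Q = {}
q_learning_update(Q, state=0, action="left", reward=1.0,
                  next_state=1, actions=["left", "right"])
print(Q[(0, "left")])  # 0.1
```

A SARSA variant would replace the `max` over next actions with the value of the action actually taken next, which is the main difference between the two methods.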
As an alternative implementation, referring to fig. 13, the training process of the perception model may include:
step 201a, obtaining detection sample data and representative detection sample data associated with the specified task.
The detection sample data may include environmental sample data of the surrounding environment of the intelligent device when it performs the specified task, as well as state sample data of the intelligent device. The representative detection sample data associated with the specified task may be obtained from a sample database.
Step 202a, train an initial perception model with the detection sample data, the specified task and the representative detection sample data based on deep learning to obtain the perception model.
In the process of training based on the deep learning method, the training device may input the detection sample data and the designated task to an initial perceptual model, and obtain representative detection data associated with the designated task and output by the initial perceptual model. Then, the training device may continuously adjust parameters (e.g., at least one of model parameters, input parameters, and output parameters) of the initial perceptual model according to a difference between the representative detection data output by the initial perceptual model and the representative detection sample data, so as to obtain the perceptual model.
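The supervised adjustment described above — reducing the difference between the model's output and the representative detection sample data — can be sketched with a toy linear model standing in for the deep perception model; all data and hyperparameters here are illustrative assumptions:

```python
def train_perception_sketch(samples, labels, lr=0.1, epochs=2000):
    """Fit y = w * x + b by gradient descent on squared error, a
    stand-in for the deep learning step that adjusts the initial
    perception model until its output matches the representative
    detection sample data."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = sum((w * x + b - y) * x for x, y in zip(samples, labels)) / n
        grad_b = sum((w * x + b - y) for x, y in zip(samples, labels)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Detection samples and their "representative" targets (generated by y = 2x + 1).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = train_perception_sketch(xs, ys)
print(round(w, 2), round(b, 2))
```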
Optionally, in the embodiment of the present invention, the training device may train the initial perception model by using detection sample data of different tasks and representative detection sample data, so as to obtain parameters of the perception models corresponding to the different tasks, and may store the parameters of the perception models corresponding to the different tasks as perception data in the knowledge base 04. Or, the training device may train the initial perception model by using different task detection sample data and representative detection sample data, so as to obtain perception submodels corresponding to different tasks.
As an alternative implementation, referring to fig. 14, the training process of the planning model may include:
step 201b, obtaining representative detection sample data and effect value sample data associated with the specified task.
The representative detection sample data associated with the specified task, as well as the target state sample data used to determine the effect values, may both be obtained from a sample database. Taking an autonomous vehicle as an example, the effect value sample data may be determined according to the difference between the target state data of the vehicle during manual driving and the target state data output by the initial planning model in the same scene. The same scene means that the executed task is the same and the acquired representative detection data are the same.
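The effect-value construction described above — a value derived from the gap between the model's output and the manual-driving reference — might be sketched as follows; the absolute-difference distance measure is an assumption for illustration:

```python
def effect_value(manual_state, model_state):
    """Hypothetical effect-value (reward) sketch: the smaller the gap
    between the target state output by the model and the manual-driving
    reference state in the same scene, the higher the value."""
    gap = sum(abs(m - p) for m, p in zip(manual_state, model_state))
    return -gap

# Identical states give the best (highest) effect value.
print(effect_value([1.0, 2.0], [1.0, 2.0]))  # -0.0 or 0.0
print(effect_value([1.0, 2.0], [1.5, 2.0]))  # -0.5
```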
And 202b, training the initial planning model by adopting the representative detection sample data, the specified task and the effect value sample data based on a reinforcement learning mode to obtain the planning model.
Further, the training device may train the initial planning model based on reinforcement learning. During training, the representative detection sample data and the specified task may be input into the initial planning model, and the parameters of the initial planning model may be adjusted based on the effect value (Q value) sample data to obtain the planning model. The reinforcement learning method may include a Q-learning method or a SARSA method.
Optionally, in the embodiment of the present invention, the training device may train the initial planning model by using representative detection sample data and effect value sample data of different tasks, so as to obtain parameters of the planning model corresponding to the different tasks, and may store the parameters of the planning model corresponding to the different tasks as planning data in the knowledge base 04. Or, the training device may train the initial planning model by using representative detection sample data and effect value sample data of different tasks, so as to obtain planning sub-models corresponding to the different tasks.
As an alternative implementation, referring to fig. 15, the training process of the control model may include:
and step 201c, initializing the initial control model by using the control theory data.
The training device can configure initial values for the initial control model according to the control theory data, thereby initializing the initial control model. For example, if the control model is trained based on the Q-learning method, the training device may initialize the Q-table of the initial control model according to the control theory data. The control theory data may include dynamics theory (such as the laws of mechanics) and common-sense physical knowledge (such as the friction coefficient of a road surface). Because the control theory data can directly express the control rules and principles of the intelligent device, there is no need to collect a large amount of training sample data for machine learning; this can effectively reduce the amount of machine learning training, increase the training speed (for example, by about 100 times), lower the training cost, and improve the training effect.
Step 202c, acquiring partial or all representative detection sample data, target state sample data and effect value sample data associated with the specified task.
The representative detection sample data, target state sample data and effect value sample data associated with the specified task may all be obtained from a sample database. In the case of an automatically driven vehicle, the effect value sample data may be determined according to a difference between a control parameter of the vehicle during manual driving and a control parameter output by the initial control model in the same scene. The same scene may mean that the acquired target state data and the representative detection data are the same.
And 203c, training the initial control model by adopting the acquired representative detection sample data, the target state sample data and the effect value sample data based on a reinforcement learning mode to obtain the control model.
The reinforcement learning method may include a Q-learning method or a SARSA method. During training based on reinforcement learning, the obtained representative detection sample data and target state sample data can be input into the initial control model, and the parameters of the initial control model can be continuously adjusted according to the effect value sample data, so as to continuously improve the performance of the initial control model and finally obtain the control model.
Compared with training directly with training sample data, the training method provided by the embodiment of the present invention, by initializing the control model with control theory data, achieves higher training efficiency, lower training cost and less dependence on training samples. For example, suppose the control theory data includes a formula for calculating the steering angle α of the steering wheel from the curvature k of the lane line, α = f2(k), and a formula for calculating the steering angle α of the steering wheel from the distance d between the vehicle and the lane center line, α = f3(d). After the initial control model is initialized with this control theory data, the training device no longer needs to learn, from a large amount of training sample data, the relationship between the curvature k of the lane line, the distance d between the vehicle and the lane center line, and the steering angle α of the steering wheel, which effectively improves the training efficiency and reduces the sample size required for training.
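The sample-saving initialization illustrated above can be sketched concretely for a tabular Q-learning control model; the formula `f2`, the discretization, and the gain are hypothetical stand-ins for the control theory data:

```python
def f2(curvature):
    """Hypothetical control-theory formula: steering angle from
    lane-line curvature (stand-in for alpha = f2(k) above)."""
    return 10.0 * curvature  # illustrative gain, in degrees

def init_q_table(curvatures, steering_actions, bonus=1.0):
    """Initialize the Q-table so that, for each discretized curvature
    state, the action closest to the theoretical angle f2(k) starts
    with the highest value -- the greedy policy then already matches
    control theory before any training sample has been seen."""
    Q = {}
    for k in curvatures:
        target = f2(k)
        for a in steering_actions:
            Q[(k, a)] = bonus - abs(a - target)  # closer action, higher value
    return Q

curvatures = [0.0, 0.1, 0.2]
actions = [-2.0, -1.0, 0.0, 1.0, 2.0]
Q = init_q_table(curvatures, actions)
# Greedy action for curvature 0.1 is the one nearest f2(0.1) = 1.0.
best = max(actions, key=lambda a: Q[(0.1, a)])
print(best)  # 1.0
```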
Optionally, in the embodiment of the present invention, the training device may train the initial control model by using representative detection sample data, target state sample data, and effect value sample data of different tasks, so as to obtain parameters of the control models corresponding to the different tasks, and may store the parameters of the control models corresponding to the different tasks as control data in the knowledge base 04. Or, the training device may train the initial control model by using representative detection sample data, target state sample data, and effect value sample data of different tasks, so as to obtain control submodels corresponding to the different tasks.
As can be seen from the above steps 1041c to 1044c, the control model may include a control submodel and one or more calculation submodels. Therefore, when training the control model, as another alternative implementation, in step 201c the training device may initialize the one or more calculation submodels based on the control theory data, that is, determine the calculation formula of each calculation submodel. Accordingly, in step 203c, the initial control submodel may be trained with the obtained representative detection sample data, target state sample data and effect value sample data based on reinforcement learning, so as to obtain the control submodel for calculating the weights.
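The split into a weight-producing control submodel and formula-based calculation submodels can be sketched as follows; `f2`, `f3`, their gains, and the weight values are hypothetical illustrations:

```python
def f2(k):
    """Hypothetical calculation submodel: steering angle from lane curvature."""
    return 10.0 * k

def f3(d):
    """Hypothetical calculation submodel: steering angle from the
    offset d between the vehicle and the lane center line."""
    return -4.0 * d

def combine_control(k, d, weights):
    """Each calculation submodel proposes a value of the control
    parameter; the control submodel's weights blend the proposals
    into the target value of the control parameter."""
    proposals = [f2(k), f3(d)]
    return sum(w * p for w, p in zip(weights, proposals))

# Weights as the learned control submodel might output them (assumed values).
alpha = combine_control(k=0.1, d=0.5, weights=[0.7, 0.3])
print(round(alpha, 2))  # 0.1
```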
Optionally, in the embodiment of the present invention, after the training device completes training of each model, the data corresponding to each model stored in the knowledge base may also be updated according to the parameters of each model.
In the embodiment of the present invention, when training each model in the control system, the training device may acquire a large amount of training sample data (e.g., detection sample data, representative detection sample data, and target state sample data) for different tasks, train each model by using the above method for each training sample data, and update the data stored in the knowledge base, thereby continuously improving the data stored in the knowledge base and the operation effect of each model in the control system.
Optionally, in the embodiment of the present invention, the data stored in the knowledge base may further include constraint parameters for defining an adjustment range of parameters of each model in the control system. Accordingly, when each model is trained, the parameters of the model need to be adjusted within the range defined by the constraint parameters.
Therefore, the data stored in the knowledge base can form constraints for training of each model in the control system, the training method based on the constraints can reduce the training amount of machine learning, can ensure that the output of the model meets the requirements of actual conditions, and can ensure the safety and reliability when the intelligent equipment is controlled. For example, the steering angle of the steering wheel output by the control system can be ensured within a certain range, and the safety and the stability of automatic driving are ensured.
In summary, the embodiment of the present invention provides a method for training each model in a control system of an intelligent device. The method may initialize the control model with control theory data, and the initialized control model requires fewer samples during training, achieving higher training efficiency and lower training cost.
Fig. 16 is a schematic structural diagram of a control apparatus of an intelligent device according to an embodiment of the present invention, where the control apparatus may be configured in the intelligent device, or may also be configured in a control device that establishes a communication connection with the intelligent device. The control device can be used for realizing the control method of the intelligent equipment provided by the method embodiment. As shown in fig. 16, the apparatus may include:
the first obtaining module 301 may be configured to implement the method shown in step 101 in the foregoing method embodiment.
The first processing module 302 may be configured to implement the method shown in step 102 in the foregoing method embodiment.
The second processing module 303 may be configured to implement the method shown in step 103 in the foregoing method embodiment.
The third processing module 304 may be configured to implement the method shown in step 104 in the foregoing method embodiment.
The control module 305 may be configured to implement the method shown in step 105 in the above method embodiment.
Wherein the control model is initialized based on the control theory data.
Fig. 17 is a schematic structural diagram of another control apparatus for an intelligent device according to an embodiment of the present invention, and as shown in fig. 17, the apparatus may further include:
the second obtaining module 306 may be configured to implement the method shown in step 201a in the foregoing method embodiment.
The first training module 307 may be configured to implement the method shown in step 202a in the foregoing method embodiment.
Optionally, as shown in fig. 17, the apparatus may further include:
the third obtaining module 308 may be configured to implement the method shown in step 201b in the foregoing method embodiment.
The second training module 309 may be configured to implement the method shown in step 202b in the above method embodiment.
Optionally, as shown in fig. 17, the apparatus may further include:
the initialization module 310 may be configured to implement the method shown in step 201c in the foregoing method embodiment.
The fourth obtaining module 311 may be configured to implement the method shown in step 202c in the foregoing method embodiment.
The third training module 312 may be configured to implement the method shown in step 203c in the above method embodiment.
Fig. 18 is a schematic structural diagram of a control apparatus of another intelligent device according to an embodiment of the present invention, and referring to fig. 18, the apparatus may further include:
the fifth obtaining module 313 may be configured to implement the method shown in step 106 in the foregoing method embodiment.
The determining module 314 may be configured to implement the method shown in step 107 in the above method embodiment.
The adjusting module 315 may be configured to implement the method shown in step 108 in the foregoing method embodiment.
Optionally, the control model may include: a control submodel for calculating the weight, and one or more calculation submodels for calculating the control parameter; the third processing module 304 may be configured to implement the methods shown in steps 1041c to 1044c in the foregoing method embodiments.
Optionally, the adjusting module 315 may be configured to: and inputting the new state data and the target task into an evaluation model to obtain the control effect of the control parameter.
Optionally, the intelligent device is an autonomous vehicle or an intelligent robot.
In summary, embodiments of the present invention provide a control device for an intelligent device. The device may input the acquired detection data and the target task into a sensing model to obtain representative detection data associated with the target task. The target task and the representative detection data may then be input into a planning model to obtain target state data. The target state data and the representative detection data may then be input into a control model to obtain control parameters for controlling the smart device. Finally, the smart device may be controlled based on the control parameters. Because the control model is initialized based on control theory data, which can directly express and reflect the control rules and principles of the intelligent device, compared with training directly with training samples as in the related art, this not only reduces the dependence of the control model on training samples and improves the training efficiency, but also ensures the control effect on the intelligent device.
The embodiment of the invention also provides a control device of the intelligent equipment. As shown in fig. 19, the control device may include: a processor 1201 (e.g., a CPU), a memory 1202, a network interface 1203, and a bus 1204. The bus 1204 connects the processor 1201, the memory 1202 and the network interface 1203. The memory 1202 may include a random access memory (RAM) or a non-volatile memory, for example, at least one disk memory. The control device establishes a communication connection with other devices through the network interface 1203 (which may be wired or wireless). The memory 1202 stores a computer program 12021 for realizing various application functions. The processor 1201 may be configured to execute the computer program 12021 stored in the memory 1202 to implement the control method of the smart device provided by the above method embodiment.
The embodiment of the present invention further provides a computer-readable storage medium having instructions stored therein, and when the instructions are run on a computer, the computer is enabled to execute the method for controlling an intelligent device provided by the above method embodiment.
The embodiment of the present invention further provides a computer program product containing instructions, which, when running on a computer, causes the computer to execute the method for controlling an intelligent device provided in the above method embodiment.
An embodiment of the present invention further provides an intelligent device, which may include a control apparatus as shown in any one of fig. 16 to 19.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When software is used, the implementation may be wholly or partially in the form of a computer program product comprising one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave, etc.). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or data center, integrating one or more usable media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk or a magnetic tape), an optical medium, or a semiconductor medium (e.g., a solid state disk), among others.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (22)

1. A control method of an intelligent device, the method comprising:
after receiving an execution instruction for a target task, acquiring detection data, wherein the detection data comprises environmental data of the surrounding environment of the intelligent equipment and state data of the intelligent equipment;
inputting the detection data and the target task into a sensing model to obtain representative detection data associated with the target task;
inputting the target task and the representative detection data into a planning model to obtain target state data, wherein the target state data is used for indicating the state required to be reached by the intelligent equipment;
inputting the target state data and part or all of the representative detection data into a control model to obtain control parameters for controlling the intelligent equipment;
controlling the intelligent device to execute the target task based on the control parameter;
and the control model is obtained by initialization based on control theory data.
2. The method of claim 1,
the perception model is obtained by training based on a deep learning mode.
3. The method of claim 1,
the planning model is obtained by training based on a reinforcement learning mode.
4. The method of claim 1,
the control model is obtained by training based on a reinforcement learning mode.
5. The method of claim 2, wherein prior to receiving the execution instruction for the target task, the method further comprises:
acquiring detection sample data and representative detection sample data associated with a specified task, wherein the detection sample data comprises environmental sample data of the surrounding environment of the intelligent equipment when the intelligent equipment executes the specified task and state sample data of the intelligent equipment;
and training an initial perception model by adopting the detection sample data, the designated task and the representative detection sample data based on a deep learning mode to obtain the perception model.
6. The method of claim 3, wherein prior to receiving the execution instruction for the target task, the method further comprises:
obtaining representative detection sample data and effect value sample data associated with a specified task;
and training an initial planning model by adopting the representative detection sample data, the specified task and the effect value sample data based on a reinforcement learning mode to obtain the planning model.
7. The method of claim 4, wherein prior to receiving the execution instruction for the target task, the method further comprises:
initializing an initial control model based on the control theory data;
acquiring partial or all representative detection sample data, target state sample data and effect value sample data associated with the specified task;
and training the initial control model by adopting the obtained representative detection sample data, the target state sample data and the effect value sample data based on a reinforcement learning mode to obtain the control model.
8. The method of claim 4, wherein the control model comprises: a control submodel for calculating weights, and one or more calculation submodels for calculating the control parameters; before receiving an execution instruction for a target task, the method further comprises:
acquiring partial or all representative detection sample data, target state sample data and effect value sample data associated with the specified task;
training an initial control submodel by adopting the obtained representative detection sample data, the target state sample data and the effect value sample data based on a reinforcement learning mode to obtain the control submodel;
determining each of the calculation submodels based on the control theory data.
9. The method of claim 8, wherein inputting the target state data and some or all of the representative detection data into the control model to obtain the control parameters for controlling the smart device comprises:
obtaining, from the target state data and some or all of the representative detection data, a set of target input data corresponding to each calculation submodel;
inputting each set of target input data into the corresponding calculation submodel to obtain a value of the control parameter for that set of target input data;
inputting the target state data and some or all of the representative detection data into the control submodel to obtain a set of weights;
and determining the target value of the control parameter from the set of weights and the values of the control parameter obtained for the sets of target input data.
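The weighted fusion described in claim 9 reduces to: each calculation submodel proposes a control parameter value, the control submodel supplies a set of weights, and the target value is the weighted sum. The submodels below are hypothetical stand-ins (the patent does not disclose concrete model forms).

```python
# Hedged sketch of claim 9's fusion step; all rules and gains are invented.

def calc_submodel_a(target_state, detection):
    # Illustrative proportional-style rule on the state error.
    return 0.5 * (target_state - detection)

def calc_submodel_b(target_state, detection):
    # Illustrative fixed-gain rule on the target state alone.
    return 0.2 * target_state

def control_submodel(target_state, detection):
    # A trained network would output these weights; fixed here for clarity.
    return [0.7, 0.3]

def fuse_control_parameter(target_state, detection):
    values = [calc_submodel_a(target_state, detection),
              calc_submodel_b(target_state, detection)]
    weights = control_submodel(target_state, detection)
    # Target value of the control parameter: weighted sum of submodel values.
    return sum(w * v for w, v in zip(weights, values))
```

Because the weights are produced per input, the learned control submodel can blend the classical controllers differently in different situations.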
10. The method according to any one of claims 1 to 9, further comprising:
acquiring new state data of the smart device after the smart device has been controlled based on the control parameters;
determining a control effect from the new state data and the target task;
and adjusting parameters of one or more of the perception model, the planning model, and the control model according to the control effect.
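The feedback step in claim 10 can be sketched as: act, observe the new state, map the tracking error to a control effect, and adjust a model parameter when the effect is poor. The error-to-effect mapping, the threshold, and the update rule below are all invented for illustration.

```python
# Hedged sketch of claim 10's closed loop; thresholds and the update
# rule are hypothetical, not taken from the patent.

def control_effect(new_state, target_state):
    # Smaller tracking error maps to a control effect closer to 1.
    return 1.0 / (1.0 + abs(new_state - target_state))

def adjust_gain(gain, effect, lr=0.1, threshold=0.5):
    # Only adjust when the observed control effect falls below the threshold.
    if effect < threshold:
        gain *= 1.0 + lr
    return gain
```

In the patent's terms, `control_effect` plays the role of the evaluation step and `adjust_gain` the role of updating one parameter of the perception, planning, or control model.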
11. The method according to any one of claims 1 to 9, wherein the smart device is an autonomous vehicle or a smart robot.
12. An apparatus for controlling a smart device, the apparatus comprising:
a first acquisition module, configured to acquire detection data after an execution instruction for a target task is received, the detection data comprising environmental data of the surroundings of the smart device and state data of the smart device;
a first processing module, configured to input the detection data and the target task into a perception model to obtain representative detection data associated with the target task;
a second processing module, configured to input the target task and the representative detection data into a planning model to obtain target state data, the target state data indicating the state that the smart device is required to reach;
a third processing module, configured to input the target state data and some or all of the representative detection data into a control model to obtain control parameters for controlling the smart device;
and a control module, configured to control the smart device to execute the target task based on the control parameters;
wherein the control model is initialized based on control theory data.
13. The apparatus of claim 12, further comprising:
a second acquisition module, configured to acquire, before the execution instruction for the target task is received, detection sample data and representative detection sample data associated with a specified task, the detection sample data comprising environment sample data of the surroundings of the smart device when the specified task is executed and state sample data of the smart device;
and a first training module, configured to train an initial perception model with the detection sample data, the specified task, and the representative detection sample data in a deep-learning manner to obtain the perception model.
14. The apparatus of claim 12, further comprising:
a third acquisition module, configured to acquire, before the execution instruction for the target task is received, representative detection sample data and effect value sample data associated with the specified task;
and a second training module, configured to train an initial planning model with the representative detection sample data, the specified task, and the effect value sample data in a reinforcement-learning manner to obtain the planning model.
15. The apparatus of claim 12, further comprising:
an initialization module, configured to initialize an initial control model based on the control theory data before the execution instruction for the target task is received;
a fourth acquisition module, configured to acquire some or all of the representative detection sample data, target state sample data, and effect value sample data associated with the specified task;
and a third training module, configured to train the initial control model with the obtained representative detection sample data, target state sample data, and effect value sample data in a reinforcement-learning manner to obtain the control model.
16. The apparatus of claim 12, wherein the control model comprises a control submodel for calculating weights and one or more calculation submodels for calculating the control parameters; and the apparatus further comprises:
a fourth acquisition module, configured to acquire some or all of the representative detection sample data, target state sample data, and effect value sample data associated with the specified task;
a third training module, configured to train an initial control submodel with the obtained representative detection sample data, target state sample data, and effect value sample data in a reinforcement-learning manner to obtain the control submodel;
and an initialization module, configured to determine each calculation submodel based on the control theory data.
17. The apparatus of claim 16, wherein the third processing module is configured to:
obtain, from the target state data and some or all of the representative detection data, a set of target input data corresponding to each calculation submodel;
input each set of target input data into the corresponding calculation submodel to obtain a value of the control parameter for that set of target input data;
input the target state data and some or all of the representative detection data into the control submodel to obtain a set of weights;
and determine the target value of the control parameter from the set of weights and the values of the control parameter obtained for the sets of target input data.
18. The apparatus of any one of claims 12 to 17, further comprising:
a fifth acquisition module, configured to acquire new state data of the smart device after the smart device has been controlled based on the control parameters;
a determining module, configured to determine a control effect from the new state data and the target task;
and an adjusting module, configured to adjust parameters of one or more of the perception model, the planning model, and the control model according to the control effect.
19. The apparatus of claim 18, wherein the adjusting module is configured to:
input the new state data and the target task into an evaluation model to obtain the control effect of the control parameters.
20. An apparatus for controlling a smart device, the apparatus comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method for controlling a smart device according to any one of claims 1 to 11.
21. A computer-readable storage medium having stored therein instructions which, when run on a computer, cause the computer to perform the method for controlling a smart device according to any one of claims 1 to 11.
22. A smart device, characterized in that it comprises an apparatus according to any one of claims 12 to 20.
CN201810850160.3A 2018-07-28 2018-07-28 Intelligent device and control method and device thereof Active CN109109863B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810850160.3A CN109109863B (en) 2018-07-28 2018-07-28 Intelligent device and control method and device thereof


Publications (2)

Publication Number Publication Date
CN109109863A CN109109863A (en) 2019-01-01
CN109109863B (en) 2020-06-16

Family

ID=64863520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810850160.3A Active CN109109863B (en) 2018-07-28 2018-07-28 Intelligent device and control method and device thereof

Country Status (1)

Country Link
CN (1) CN109109863B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109976726A (en) * 2019-03-20 2019-07-05 深圳市赛梅斯凯科技有限公司 Vehicle-mounted Edge intelligence computing architecture, method, system and storage medium
CN110187727B (en) * 2019-06-17 2021-08-03 武汉理工大学 A glass melting furnace temperature control method based on deep learning and reinforcement learning
JP7346980B2 (en) * 2019-07-30 2023-09-20 マツダ株式会社 vehicle control system
CN110737260B (en) * 2019-08-29 2022-02-11 南京智慧光信息科技研究院有限公司 Automatic operation method based on big data and artificial intelligence and robot system
CN111123952B (en) * 2019-12-31 2021-12-31 华为技术有限公司 Trajectory planning method and device
CN111694973B (en) * 2020-06-09 2023-10-13 阿波罗智能技术(北京)有限公司 Model training method and device for automatic driving scene and electronic equipment
CN113954858A (en) * 2020-07-20 2022-01-21 华为技术有限公司 A method for planning a driving route of a vehicle and a smart car
CN113077641B (en) * 2021-03-24 2022-06-14 中南大学 A decision-mapping method, device and storage medium for bus in-transit control
JP7248053B2 (en) * 2021-06-14 2023-03-29 株式会社明電舎 Control device and control method
CN113468307B (en) * 2021-06-30 2023-06-30 网易(杭州)网络有限公司 Text processing method, device, electronic equipment and storage medium
CN114998988B (en) * 2022-05-23 2024-12-10 大卓智能科技有限公司 Pedestrian feature analysis method and device
CN115384495A (en) * 2022-09-22 2022-11-25 中国银行股份有限公司 A vehicle control method, device, equipment and medium
CN118928462A (en) * 2024-07-22 2024-11-12 深圳职业技术大学 A multi-dimensional perception automatic driving avoidance method and device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105059288B (en) * 2015-08-11 2017-10-20 奇瑞汽车股份有限公司 A kind of system for lane-keeping control and method
CN105109483A (en) * 2015-08-24 2015-12-02 奇瑞汽车股份有限公司 Driving method and driving system
US10139823B2 (en) * 2016-09-13 2018-11-27 Toyota Motor Engineering & Manufacturing North America, Inc. Method and device for producing vehicle operational data based on deep learning techniques
CN107270923A (en) * 2017-06-16 2017-10-20 广东欧珀移动通信有限公司 Method, terminal and storage medium for route push
CN107390682B (en) * 2017-07-04 2020-08-07 安徽省现代农业装备产业技术研究院有限公司 Automatic driving path following method and system for agricultural vehicle
CN107907886A (en) * 2017-11-07 2018-04-13 广东欧珀移动通信有限公司 Travel conditions recognition methods, device, storage medium and terminal device
CN108297864A (en) * 2018-01-25 2018-07-20 广州大学 The control method and control system of driver and the linkage of vehicle active safety technologies


Similar Documents

Publication Publication Date Title
CN109109863B (en) Intelligent device and control method and device thereof
US11899411B2 (en) Hybrid reinforcement learning for autonomous driving
US11726477B2 (en) Methods and systems for trajectory forecasting with recurrent neural networks using inertial behavioral rollout
US11537134B1 (en) Generating environmental input encoding for training neural networks
US11472444B2 (en) Method and system for dynamically updating an environmental representation of an autonomous agent
WO2021178909A1 (en) Learning point cloud augmentation policies
KR102043142B1 (en) Method and apparatus for learning artificial neural network for driving control of automated guided vehicle
EP2591443A1 (en) Assisting vehicle guidance over terrain
US12141235B2 (en) Systems and methods for dataset and model management for multi-modal auto-labeling and active learning
WO2023021208A1 (en) Support tools for av testing
US20220326714A1 (en) Unmapped u-turn behavior prediction using machine learning
WO2023187117A1 (en) Simulation-based testing for robotic systems
Wheeler et al. A probabilistic framework for microscopic traffic propagation
Dong et al. An enhanced motion planning approach by integrating driving heterogeneity and long-term trajectory prediction for automated driving systems: A highway merging case study
Worrall et al. A context-based approach to vehicle behavior prediction
WO2023187121A1 (en) Simulation-based testing for robotic systems
CN111930117B (en) Steering-based lateral control method, device, equipment and storage medium
CN116882122A (en) Method and device for constructing simulation environment for automatic driving
CN115880651A (en) Scene recognition method and device
US12039008B1 (en) Data generation and storage system
RU2800694C2 (en) Method for predicting the trajectory of an agent near an unmanned vehicle based on the ranking
CN119131758B (en) Large model application method, device, equipment and medium based on autonomous driving scenario
CN111891132B (en) Acceleration and deceleration-based service processing method, device, equipment and storage medium
EP4325251A1 (en) High throughput point cloud processing
CN118551806A (en) Automatic driving model, automatic driving method and device based on state node prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant