CN115830399A

CN115830399A - Classification model training method, apparatus, device, storage medium, and program product

Info

Publication number: CN115830399A
Application number: CN202211722251.1A
Authority: CN
Inventors: 王梦琪; 李果; 张璐; 吴广力; 方涵
Original assignee: Guangzhou Woya Technology Co ltd
Current assignee: Guangzhou Woya Technology Co ltd
Priority date: 2022-12-30
Filing date: 2022-12-30
Publication date: 2023-03-21
Anticipated expiration: 2042-12-30
Also published as: CN115830399B

Abstract

The present application relates to a classification model training method, apparatus, device, storage medium, and program product. The method comprises: recognizing, based on a long-tail recognition model, target traffic images corresponding to a variety of traffic scenes to obtain a recognition result; determining long-tail traffic images from the target traffic images based on the recognition result, where a long-tail traffic image is an image of a long-tail traffic scene; and training an initial classification model based on the long-tail traffic images to obtain a target classification model used for object classification. Because the long-tail recognition model picks out the long-tail traffic images from the images of many different traffic scenes, and the target classification model is trained on those images, the target classification model can quickly classify objects in long-tail traffic scenes, which improves classification efficiency in such scenes.

Description

Classification model training method, apparatus, device, storage medium and program product

Technical Field

The present application relates to the field of autonomous driving technology, and in particular to a classification model training method, apparatus, device, storage medium, and program product.

Background

With the development of autonomous driving technology, the related art can already handle most situations that arise in everyday driving. There remain, however, long-tail traffic scenarios that are rarely encountered, such as fragmented scenarios, extreme situations, and unpredictable human behavior. Each such scenario is rare, but because of the long-tail effect their cumulative total poses a significant threat to the safety of autonomous driving. Object classification is a key link in autonomous driving: downstream tasks such as vehicle behavior prediction and trajectory planning depend on its result, and its job is to accurately predict the type of each object detected on the road, such as a vehicle, pedestrian, or bicycle. How to solve object classification in long-tail traffic scenes is therefore an important research direction for autonomous driving.

In the conventional technology, different rule conditions are set for different long-tail traffic scenes, and object classification is performed separately for each. However, long-tail traffic scenes are numerous and highly varied. Although this approach can effectively solve object classification in a specific long-tail traffic scene, it is inefficient and cannot quickly address object classification across many different long-tail traffic scenes.

Disclosure of Invention

In view of the above technical problems, it is necessary to provide a classification model training method, apparatus, device, storage medium, and program product capable of improving classification efficiency in long-tail traffic scenes.

In a first aspect, the present application provides a classification model training method. The method comprises: recognizing, based on a long-tail recognition model, target traffic images corresponding to a variety of traffic scenes to obtain a recognition result; determining long-tail traffic images from the target traffic images based on the recognition result, where a long-tail traffic image is an image of a long-tail traffic scene; and training an initial classification model based on the long-tail traffic images to obtain a target classification model, where the target classification model is used for object classification.

In one embodiment, the long-tail recognition model is used to recognize images of long-tail traffic scenes in which an object belongs to a target category. Determining the long-tail traffic images from the target traffic images based on the recognition result comprises: acquiring, from the target traffic images, the long-tail traffic images for which the recognition result indicates that an object in the image belongs to the target category.

In one embodiment, training the initial classification model based on the long-tail traffic images to obtain the target classification model comprises: performing target image processing on the original traffic images corresponding to the long-tail traffic images, where the target image processing is the image processing performed by the upstream tasks of the object classification task; and training the initial classification model based on the image features obtained by the target image processing to obtain the target classification model.

In one embodiment, before the target image processing is performed on the original traffic images corresponding to the long-tail traffic images, the method further comprises: splicing the long-tail traffic images belonging to the same traffic scene to obtain at least one long-tail image sequence; and, for each long-tail image sequence, determining an upstream task according to the traffic scene to which the sequence belongs and determining an image processing category according to the upstream task. Correspondingly, performing the target image processing on the original traffic images comprises: for each long-tail image sequence, performing the target image processing on the original traffic images corresponding to that sequence based on its image processing category.

In one embodiment, training the initial classification model based on the image features obtained by the target image processing to obtain the target classification model comprises: determining an object category label for each image feature based on the image feature and the recognition result; and training the initial classification model based on the image features and their object category labels to obtain the target classification model.

In one embodiment, the image features include feature detection boxes, each identifying an object in an original traffic image, and the recognition result includes recognition detection boxes, each identifying an object of the target category in a long-tail traffic image. Determining the object category label for an image feature based on the image feature and the recognition result comprises: matching the feature detection boxes against the recognition detection boxes; and, if a feature detection box matches a recognition detection box, taking the target category as the object category label of that feature detection box.

In one embodiment, matching the feature detection boxes against the recognition detection boxes comprises: mapping each feature detection box into the coordinate system of the recognition detection boxes using a coordinate transformation matrix; and matching according to the degree of overlap between the mapped feature detection box and the recognition detection box.

In one embodiment, before the target traffic images corresponding to the various traffic scenes are recognized with the long-tail recognition model to obtain the recognition result, the method further comprises: acquiring long-tail sample data, which comprises long-tail sample images and sample detection boxes, where a long-tail sample image is a sample image of a long-tail traffic scene whose sample object belongs to the target category, and the sample detection box identifies the sample object in the long-tail sample image; and training an initial recognition model on the long-tail sample data to obtain the long-tail recognition model.

In a second aspect, the present application further provides a classification model training apparatus. The apparatus comprises: a recognition module, configured to recognize, based on a long-tail recognition model, target traffic images corresponding to a variety of traffic scenes to obtain a recognition result; a determining module, configured to determine long-tail traffic images from the target traffic images based on the recognition result, where a long-tail traffic image is an image of a long-tail traffic scene; and a training module, configured to train an initial classification model based on the long-tail traffic images to obtain a target classification model, where the target classification model is used for object classification.

In one embodiment, the long-tail recognition model is used to recognize images of long-tail traffic scenes in which an object belongs to a target category; the determining module is specifically configured to acquire, from the target traffic images, the long-tail traffic images for which the recognition result indicates that an object in the image belongs to the target category.

In one embodiment, the training module is specifically configured to perform target image processing on the original traffic images corresponding to the long-tail traffic images, where the target image processing is the image processing performed by the upstream tasks of the object classification task, and to train the initial classification model based on the image features obtained by the target image processing to obtain the target classification model.

In one embodiment, the apparatus further comprises: a splicing module, configured to splice the long-tail traffic images belonging to the same traffic scene to obtain at least one long-tail image sequence; and an image processing category determining module, configured to determine, for each long-tail image sequence, an upstream task according to the traffic scene to which the sequence belongs, and an image processing category according to the upstream task. Correspondingly, the training module is further configured to perform, for each long-tail image sequence, the target image processing on the original traffic images corresponding to that sequence based on its image processing category.

In one embodiment, the training module is further configured to determine an object category label for each image feature based on the image feature and the recognition result, and to train the initial classification model based on the image features and their object category labels to obtain the target classification model.

In one embodiment, the image features include feature detection boxes, each identifying an object in an original traffic image, and the recognition result includes recognition detection boxes, each identifying an object of the target category in a long-tail traffic image; the training module is further configured to match the feature detection boxes against the recognition detection boxes and, if a feature detection box matches a recognition detection box, to take the target category as the object category label of that feature detection box.

In one embodiment, the training module is further configured to map each feature detection box into the coordinate system of the recognition detection boxes using a coordinate transformation matrix, and to match according to the degree of overlap between the mapped feature detection box and the recognition detection box.

In one embodiment, the apparatus further comprises a long-tail recognition model training module, configured to acquire long-tail sample data, which comprises long-tail sample images and sample detection boxes, where a long-tail sample image is a sample image of a long-tail traffic scene whose sample object belongs to the target category, and the sample detection box identifies the sample object in the long-tail sample image; and to train an initial recognition model on the long-tail sample data to obtain the long-tail recognition model.

In a third aspect, the present application further provides a computer device. The computer device comprises a memory storing a computer program and a processor that, when executing the computer program, implements the steps of the method of any embodiment of the first aspect.

In a fourth aspect, the present application further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method of any embodiment of the first aspect.

In a fifth aspect, the present application further provides a computer program product comprising a computer program that, when executed by a processor, implements the steps of the method of any embodiment of the first aspect.

With the classification model training method, apparatus, device, storage medium, and program product described above, a long-tail recognition model recognizes target traffic images corresponding to a variety of traffic scenes to obtain a recognition result; long-tail traffic images, that is, images of long-tail traffic scenes, are determined from the target traffic images based on that result; and an initial classification model is trained on the long-tail traffic images to obtain a target classification model used for object classification. Objects in long-tail traffic scenes can thus be classified quickly with a single trained model, which improves classification efficiency in long-tail traffic scenes.

Drawings

FIG. 1 is a schematic flowchart of a classification model training method in one embodiment;

FIG. 2 is a schematic flowchart of another classification model training method in one embodiment;

FIG. 3 is a structural block diagram of a classification model training apparatus in one embodiment;

FIG. 4 is a structural block diagram of another classification model training apparatus in one embodiment;

FIG. 5 is a structural block diagram of yet another classification model training apparatus in one embodiment;

FIG. 6 is a diagram of the internal structure of a computer device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.

Object classification is a key link in autonomous driving, because tasks such as vehicle behavior prediction and trajectory planning are closely related to its result. Object classification in common traffic scenes is now mature, but object classification in long-tail traffic scenes still needs research. Two approaches currently exist for long-tail traffic scenes. The first sets different rule conditions for different long-tail traffic scenes and performs object classification separately for each; because long-tail traffic scenes are numerous and highly varied, this approach, although effective for a specific long-tail traffic scene, is inefficient and cannot quickly address object classification across many long-tail traffic scenes. The second handles long-tail traffic scenes through vehicle-road cooperation, but vehicle-road cooperation places high demands on infrastructure hardware and introduces considerable latency, so it remains difficult to deploy. A technical means is therefore needed that classifies objects in long-tail traffic scenes efficiently, effectively, and at low cost.

In one embodiment, as shown in FIG. 1, a classification model training method is provided. This embodiment is described using the example of the method being applied to a server. The method comprises the following steps:

Step 101: recognize, based on a long-tail recognition model, target traffic images corresponding to a variety of traffic scenes to obtain a recognition result.

The long-tail recognition model is a neural network classification model trained on long-tail sample data; specifically, it may be a trained LSTM (Long Short-Term Memory) model. The long-tail recognition model can accurately recognize images of a target category among many images, where the target category is a category the model learned during training. A single long-tail recognition model may recognize images of one category or of multiple categories. If each long-tail recognition model recognizes only one category, the embodiments of the present application may use several long-tail recognition models; if a long-tail recognition model recognizes multiple categories, a single long-tail recognition model may suffice.

The various traffic scenes include common traffic scenes and long-tail traffic scenes. Examples of long-tail traffic scenes include a pedestrian walking with an umbrella, a person moving boxes behind a vehicle, and a tree fallen in the middle of the road. A target traffic image is any image frame of the video corresponding to a traffic scene, so one traffic scene corresponds to multiple target traffic images.

Optionally, each traffic scene has a unique scene ID (identifier). The server inputs the target traffic images corresponding to the various traffic scenes into the long-tail recognition model. For each target traffic image, the long-tail recognition model determines whether an object it has learned is present. If so, the model marks the object with a recognition detection box and outputs the target traffic image with the recognition detection box, together with the target category of the object in the box; that is, the recognition result comprises the target traffic image with the recognition detection box and the target category. If not, no recognition detection box is marked, and neither an annotated image nor a target category is output for that image.
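The recognition step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: each long-tail recognition model is represented by a hypothetical callable that returns (recognition detection box, target category) pairs, so either one multi-category model or several single-category models can be passed in the same list, and a frame with no detections produces no output. The names `Frame`, `RecognitionResult`, and `recognize_frames` are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

# (x_min, y_min, x_max, y_max) in image pixel coordinates (assumed box format)
Box = Tuple[float, float, float, float]
Detection = Tuple[Box, str]  # (recognition detection box, target category)


@dataclass
class Frame:
    scene_id: str      # unique traffic-scene ID shared by all frames of one scene
    timestamp: float   # capture time of the frame
    image: object      # placeholder for the pixel data


@dataclass
class RecognitionResult:
    frame: Frame
    detections: List[Detection] = field(default_factory=list)


# A recognizer stands in for one trained long-tail recognition model (hypothetical interface).
Recognizer = Callable[[Frame], List[Detection]]


def recognize_frames(frames: Sequence[Frame],
                     recognizers: Sequence[Recognizer]) -> List[RecognitionResult]:
    """Run every recognizer on every frame and keep only frames with at least one detection."""
    results: List[RecognitionResult] = []
    for frame in frames:
        detections: List[Detection] = []
        for recognizer in recognizers:
            detections.extend(recognizer(frame))
        if detections:  # no learned object found -> nothing is output for this frame
            results.append(RecognitionResult(frame, detections))
    return results
```

For example, passing two hypothetical single-category recognizers, one for umbrellas and one for fallen trees, runs both over every frame and merges their detections per frame.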

During road testing, an autonomous vehicle stores a large amount of traffic scene data, but only a small number of long-tail traffic scenes can be manually labeled and recorded; labeling the rest would take a large amount of work, so they tend to be ignored. The long-tail recognition model is therefore trained on the long-tail sample data from the small number of manually labeled long-tail traffic scenes, and is then used to recognize the target traffic images of the various traffic scenes. In this way, the target traffic images of long-tail traffic scenes are mined from the target traffic images of all the traffic scenes, which increases the amount of data available for training the target classification model and thereby improves its accuracy.

Step 102: determine long-tail traffic images from the target traffic images based on the recognition result, where a long-tail traffic image is an image of a long-tail traffic scene.

Optionally, as described in step 101, the recognition result comprises target traffic images with recognition detection boxes and the target categories. Because the long-tail recognition model is trained on long-tail sample data, it recognizes only long-tail traffic scenes among the various traffic scenes, so the target traffic images it outputs with recognition detection boxes are the long-tail traffic images.

Step 103: train an initial classification model based on the long-tail traffic images to obtain a target classification model, where the target classification model is used for object classification.

The initial classification model is a neural network classification model, for example an LSTM model; correspondingly, the target classification model is the trained neural network classification model, for example a trained LSTM model.

Optionally, once the long-tail recognition model has recognized the long-tail traffic images of the target categories from the target traffic images of the various traffic scenes, the long-tail traffic images and their target categories are input together into the initial classification model for training, so that the resulting target classification model can classify the long-tail traffic images of long-tail traffic scenes.

Optionally, the initial classification model may be trained on both the long-tail traffic images and the common traffic images of common traffic scenes to obtain the target classification model. Specifically, the long-tail traffic images with their target categories and the common traffic images with their target categories are input together into the initial classification model for training, so that the resulting target classification model can classify traffic images from all kinds of traffic scenes. The common traffic images and their target categories may be obtained from the historical traffic data repository.

In addition, the target classification model may be deployed on an autonomous vehicle to classify objects in front of the vehicle.

In summary, long-tail traffic images are determined from the target traffic images of different traffic scenes using the long-tail recognition model, and the initial classification model is trained on the long-tail traffic images to obtain the target classification model for object classification, so that objects in long-tail traffic scenes can be classified quickly and classification efficiency in those scenes is improved.
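Assembling the mixed training set described above might look like the following minimal sketch. It assumes each sample is an (image or image features, category) pair and that the common-scene samples have already been loaded from the historical traffic data repository; the function names and the seeded shuffle are illustrative choices, not part of the described method, and the classifier itself (for example the LSTM mentioned above) is out of scope here.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# (image or image features, object category label); the payload type is left open here
Sample = Tuple[object, str]


def build_training_set(long_tail: Sequence[Sample],
                       common: Sequence[Sample],
                       seed: int = 0) -> List[Sample]:
    """Merge mined long-tail samples with common-scene samples and shuffle them reproducibly."""
    samples = list(long_tail) + list(common)
    random.Random(seed).shuffle(samples)  # fixed seed so the training order can be reproduced
    return samples


def label_histogram(samples: Sequence[Sample]) -> Counter:
    """Count samples per object category, e.g. to check how rare the long-tail classes remain."""
    return Counter(label for _, label in samples)
```

Checking the histogram before training makes it visible how much the mined long-tail data actually shifts the class balance.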

In one embodiment, the long-tail recognition model is used to recognize images of long-tail traffic scenes in which an object belongs to a target category. Determining the long-tail traffic images from the target traffic images based on the recognition result comprises: acquiring, from the target traffic images, the long-tail traffic images for which the recognition result indicates that an object in the image belongs to the target category.

In the training stage of the long-tail recognition model, each long-tail traffic image input to the model is annotated with a detection box and the target category of the object in that box. The trained long-tail recognition model can therefore recognize images of long-tail traffic scenes that contain an object of the target category.

Optionally, after the long-tail recognition model has recognized the target traffic images of the various traffic scenes and obtained a category for each target traffic image, the target traffic images whose category is a target category are selected; the selected images are the long-tail traffic images.
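The selection rule of this embodiment amounts to a filter over the recognition results. In the minimal sketch below, the recognition results are assumed to be a dictionary from a hypothetical image ID to its list of (recognition detection box, category) pairs; images with no detections, or with only non-target detections, are dropped.

```python
from typing import Dict, List, Set, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed format


def select_long_tail_images(recognition: Dict[str, List[Tuple[Box, str]]],
                            target_categories: Set[str]) -> List[str]:
    """Return IDs of images with at least one recognized object whose category is a target category."""
    return [
        image_id
        for image_id, detections in recognition.items()
        if any(category in target_categories for _, category in detections)
    ]
```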

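In one embodiment, training the initial classification model based on the long-tail traffic images to obtain the target classification model comprises: performing target image processing on the original traffic images corresponding to the long-tail traffic images, where the target image processing is the image processing performed by the upstream tasks of the object classification task; and training the initial classification model based on the image features obtained by the target image processing to obtain the target classification model.

An original traffic image is an image retrieved from the historical traffic data repository according to the traffic scene ID of the long-tail traffic image. The image processing performed by the upstream tasks includes, for example, detection and segmentation. The image features include the feature detection box, target speed, target pose, size, point cloud coordinates, and so on; each original traffic image corresponds to one set of image features.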

Optionally, the upstream image processing, such as detection and segmentation, is performed using the original traffic image, map information, the position of the ego vehicle, the intrinsic and extrinsic parameters of the sensors, and so on, to obtain the image features. The long-tail traffic images obtained in step 102 and the image features of this embodiment are then input into the initial classification model for training to obtain the target classification model.

In an autonomous driving system, the input to the object classification task does not come directly from raw scene data, such as radar and camera observations, but from the outputs of the upstream tasks. In this embodiment, the target image processing performed on the original traffic images therefore reproduces the image processing of the upstream tasks, so that the resulting image features are consistent with the features the classification task receives in practice. A target classification model trained on these features can then classify objects accurately, which improves the accuracy of object classification in long-tail traffic scenes.
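One way to hold the image features listed above is a small record type that can be flattened into a numeric vector for the classifier. This is a minimal sketch: the field layouts (an axis-aligned 3D box in the global frame, a pose of (x, y, yaw)) and the flattening scheme (point count plus point-cloud centroid) are assumptions made for illustration, not the encoding used in the application.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point3 = Tuple[float, float, float]


@dataclass
class ImageFeature:
    scene_id: str
    timestamp: float
    # Feature detection box as an axis-aligned 3D box in the global frame (assumed layout):
    # (x_min, y_min, z_min, x_max, y_max, z_max)
    feature_box: Tuple[float, float, float, float, float, float]
    speed: float                       # target speed, m/s
    pose: Tuple[float, float, float]   # target pose as (x, y, yaw), assumed layout
    size: Point3                       # (length, width, height), m
    points: List[Point3]               # point-cloud coordinates belonging to the target

    def to_vector(self) -> List[float]:
        """Flatten into a fixed-length vector: box, speed, pose, size, point count, point centroid."""
        n = len(self.points)
        centroid = [sum(p[i] for p in self.points) / n if n else 0.0 for i in range(3)]
        return [*self.feature_box, self.speed, *self.pose, *self.size, float(n), *centroid]
```

A fixed-length vector (17 values in this layout) keeps the classifier input shape constant even though the number of points per object varies.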

In one embodiment, before the target image processing is performed on the original traffic images corresponding to the long-tail traffic images, the method further comprises: splicing the long-tail traffic images belonging to the same traffic scene to obtain at least one long-tail image sequence; and, for each long-tail image sequence, determining an upstream task according to the traffic scene to which the sequence belongs and determining an image processing category according to the upstream task. Correspondingly, performing the target image processing on the original traffic images comprises: for each long-tail image sequence, performing the target image processing on the original traffic images corresponding to that sequence based on its image processing category.

Optionally, in the training stage of the long-tail recognition model, each long-tail traffic image input to the model is annotated with a detection box and the target category of the object in that box. After the target traffic images of the various traffic scenes are input into the long-tail recognition model, the model can therefore output the long-tail traffic images together with the recognition detection box and target category of each object in them. Every target traffic image carries an ID, and target traffic images belonging to the same traffic scene share the same ID. Once the long-tail traffic images and their recognition detection boxes and target categories are obtained, the long-tail traffic images with the same ID are spliced in timestamp order to obtain at least one initial long-tail image sequence, each containing the ID, timestamps, recognition detection boxes, target categories, and so on. The initial long-tail image sequences are then reviewed manually, and any sequence spliced from common traffic images that the long-tail recognition model mistakenly recognized as long-tail traffic images is removed. This yields at least one long-tail image sequence, each of which likewise contains the ID, timestamps, recognition detection boxes, target categories, and so on.
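The splicing and review steps can be sketched as grouping by scene ID, sorting by timestamp, and then applying a review decision. In the minimal sketch below, the manual review described above is represented by a hypothetical callback `is_genuine_long_tail`; in practice that decision is made by a person, and all names here are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class LongTailFrame:
    scene_id: str                                        # shared by all frames of one traffic scene
    timestamp: float
    recognition_box: Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)
    target_category: str


Sequences = Dict[str, List[LongTailFrame]]


def splice_sequences(frames: List[LongTailFrame]) -> Sequences:
    """Group long-tail frames by scene ID and order each group by timestamp."""
    groups: Dict[str, List[LongTailFrame]] = defaultdict(list)
    for frame in frames:
        groups[frame.scene_id].append(frame)
    return {sid: sorted(group, key=lambda f: f.timestamp) for sid, group in groups.items()}


def review_sequences(sequences: Sequences,
                     is_genuine_long_tail: Callable[[str, List[LongTailFrame]], bool]) -> Sequences:
    """Keep only sequences the reviewer confirms; drops sequences built from misrecognized common images."""
    return {sid: seq for sid, seq in sequences.items() if is_genuine_long_tail(sid, seq)}
```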

Optionally, after each long-tail image sequence is obtained, the original traffic images of the traffic scene to which the sequence belongs, together with the data records of the upstream tasks that processed those images, are retrieved from the historical traffic data repository according to the sequence's ID. The image processing category, such as detection or segmentation, is then determined from the data records, and the original traffic images corresponding to each long-tail image sequence are processed according to that image processing category to obtain image features consistent with the output of the upstream tasks.
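Choosing and applying the image processing category per sequence can be sketched as a lookup into the upstream-task records followed by a dispatch table of processing functions. The record format (scene ID to a list of categories), the category names, and the processors are hypothetical placeholders; real detection or segmentation would be performed by the upstream models themselves.

```python
from typing import Any, Callable, Dict, List, Sequence

# Hypothetical upstream-task records, as they might be read from the historical traffic data repository.
UPSTREAM_RECORDS: Dict[str, List[str]] = {
    "scene-001": ["detection"],
    "scene-002": ["detection", "segmentation"],
}

# A processor turns one original traffic image into a partial feature dictionary.
Processor = Callable[[Any], Dict[str, Any]]


def run_target_processing(scene_id: str,
                          original_images: Sequence[Any],
                          records: Dict[str, List[str]],
                          processors: Dict[str, Processor]) -> List[Dict[str, Any]]:
    """Apply, to every original image of a sequence, the processing categories its upstream record lists."""
    categories = records.get(scene_id, [])
    missing = [c for c in categories if c not in processors]
    if missing:
        raise KeyError(f"no processor registered for: {missing}")
    features = []
    for image in original_images:
        feature: Dict[str, Any] = {}
        for category in categories:  # merge outputs of detection, segmentation, ...
            feature.update(processors[category](image))
        features.append(feature)
    return features
```

Failing loudly on an unregistered category avoids silently producing features that differ from what the upstream tasks would have output.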

In one embodiment, matching the feature detection boxes against the recognition detection boxes comprises: mapping each feature detection box into the coordinate system of the recognition detection boxes using a coordinate transformation matrix; and matching according to the degree of overlap between the mapped feature detection box and the recognition detection box.

The degree of overlap (Intersection over Union, IoU) is a standard measure of how accurately an object is detected on a given dataset. The degree of overlap between a mapped feature detection box and a recognition detection box is obtained by computing the area of their intersection and the area of their union, dividing the intersection area by the union area, and multiplying by 100%.
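For axis-aligned 2D boxes, IoU = area(A ∩ B) / area(A ∪ B), where area(A ∪ B) = area(A) + area(B) - area(A ∩ B). A minimal Python version, assuming both boxes are given as (x_min, y_min, x_max, y_max) in the same image coordinate system, is:

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate or inverted boxes count as zero."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes, in [0, 1]."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0
```

For example, boxes (0, 0, 2, 2) and (1, 1, 3, 3) overlap in a 1 x 1 square, so IoU = 1 / (4 + 4 - 1) = 1/7, about 14.3%.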

Optionally, the image characteristics obtained by performing the target image processing on the original traffic image include: the method comprises the following steps of detecting a characteristic detection frame corresponding to an object in an original traffic image, and identifying and processing target traffic images corresponding to various traffic scenes based on a long-tail identification model to obtain identification results, wherein the identification results comprise the following steps: the method comprises the steps of obtaining a long-tail traffic image, and identifying and detecting frames and target classes corresponding to objects in the long-tail traffic image. And calculating a coordinate transformation matrix according to vehicle information and camera information, wherein the vehicle information comprises the pose of the vehicle and the like, and the camera information comprises parameters of the camera, the position of the camera relative to the vehicle and the like. And then, converting the coordinates of the feature detection frame from the global coordinate system to an image coordinate system according to the coordinate conversion matrix, so that the feature detection frame and the identification detection frame are positioned in the same coordinate system. And calculating the overlapping degree of the characteristic detection frame and the identification detection frame, if the overlapping degree is greater than or equal to a specific threshold value, indicating that the characteristic detection frame is matched with the identification detection frame, and if the overlapping degree is less than the specific threshold value, indicating that the characteristic detection frame is not matched with the identification detection frame. And regarding the matched feature detection frame and the recognition detection frame, taking the target class as an object class label corresponding to the feature detection frame. 
Finally, the image features and their corresponding object class labels, together with the common image features of common traffic scenes and their corresponding object class labels, are input into the initial classification model for training to obtain the target classification model. The common image features and their object class labels can be obtained from a historical traffic data repository.
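The coordinate conversion step can be sketched as follows. The patent does not fix any conventions, so this sketch assumes that a feature detection frame is a 3D box in the global coordinate system given by its eight corners, that the vehicle pose and the camera mounting position are 4x4 rigid transforms, and that the camera follows a pinhole model with intrinsic matrix K. Enclosing the projected corners in an axis-aligned 2D box is likewise an assumption chosen so the result can be compared with an identification detection frame by IoU; all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]
Point3 = Tuple[float, float, float]
Box2D = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python product of an n x m matrix and an m x p matrix."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def invert_rigid(t: Matrix) -> Matrix:
    """Invert a 4x4 rigid transform [R | p] as [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    q = [-sum(r_t[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [r_t[i] + [q[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def global_to_image_matrix(vehicle_pose: Matrix, camera_to_vehicle: Matrix,
                           intrinsic: Matrix) -> Matrix:
    """Build a 3x4 matrix mapping homogeneous global points to homogeneous pixels.

    vehicle_pose:      4x4 transform from the vehicle frame to the global frame
    camera_to_vehicle: 4x4 transform from the camera frame to the vehicle frame
    intrinsic:         3x3 pinhole camera matrix K
    """
    global_to_camera = matmul(invert_rigid(camera_to_vehicle), invert_rigid(vehicle_pose))
    k_pad = [list(row) + [0.0] for row in intrinsic]  # [K | 0], 3x4
    return matmul(k_pad, global_to_camera)


def project_box(corners: Sequence[Point3], p: Matrix) -> Box2D:
    """Project 3D box corners and return the axis-aligned 2D box enclosing them."""
    us, vs = [], []
    for x, y, z in corners:
        u, v, w = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in p)
        if w <= 1e-6:  # corner is behind the camera; this sketch simply skips it
            continue
        us.append(u / w)
        vs.append(v / w)
    if not us:
        raise ValueError("the box lies entirely behind the camera")
    return min(us), min(vs), max(us), max(vs)


if __name__ == "__main__":
    identity = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    # Camera 1.5 m above the vehicle origin, looking along the vehicle x axis
    # (camera axes: x right, y down, z forward).
    cam = [[0.0, 0.0, 1.0, 0.0], [-1.0, 0.0, 0.0, 0.0],
           [0.0, -1.0, 0.0, 1.5], [0.0, 0.0, 0.0, 1.0]]
    k = [[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]]
    p = global_to_image_matrix(identity, cam, k)
    # A 2 m cube centred 10 m ahead of the vehicle at camera height.
    corners = [(x, y, z) for x in (9.0, 11.0) for y in (-1.0, 1.0) for z in (0.5, 2.5)]
    print(project_box(corners, p))
```

With the vehicle at the global origin, the cube in the example projects to roughly (528.9, 248.9, 751.1, 471.1), centred on the principal point (640, 360). A production system would also clip boxes that are only partly in front of the camera rather than dropping corners.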

Optionally, performing matching according to the overlap degree between the mapped feature detection frames and the identification detection frames includes the following. Each long-tail traffic scene has at least one feature detection frame and at least one identification detection frame. For each feature detection frame, its overlap degree with each identification detection frame is calculated. If at least one overlap degree is greater than or equal to a specific threshold, the feature detection frame matches an identification detection frame; if all overlap degrees are below the threshold, the feature detection frame does not match any identification detection frame.
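The matching rule above, combined with the labeling step in which the target category becomes the object class label of a matched feature detection frame, can be sketched as follows. The 0.5 threshold, the box convention, and the category name in the example are assumptions; the patent only speaks of "a specific threshold".

```python
from typing import List, Optional, Sequence, Tuple

# Assumed box convention: (x_min, y_min, x_max, y_max) in image coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_labels(feature_boxes: Sequence[Box],
                  recognition_boxes: Sequence[Box],
                  target_category: str,
                  threshold: float = 0.5) -> List[Optional[str]]:
    """Label each mapped feature detection frame with the target category if it
    matches at least one identification detection frame, otherwise with None."""
    labels: List[Optional[str]] = []
    for feature_box in feature_boxes:
        # A frame matches if ANY identification frame reaches the threshold.
        matched = any(iou(feature_box, r) >= threshold for r in recognition_boxes)
        labels.append(target_category if matched else None)
    return labels


if __name__ == "__main__":
    features = [(10.0, 10.0, 50.0, 50.0), (100.0, 100.0, 140.0, 140.0)]
    recognized = [(12.0, 12.0, 52.0, 52.0)]
    # "traffic_cone" is a hypothetical target category.
    print(assign_labels(features, recognized, "traffic_cone"))  # ['traffic_cone', None]
```

Unmatched frames return `None` so that only frames confirmed by both the target image processing and the long-tail recognition model contribute labeled training samples.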

Because the recognition result obtained by data mining with the long-tail recognition model is not manually labeled, it may contain errors. If the long-tail traffic images and their target categories from the recognition result were used directly as the object class labels for training the initial classification model, the trained target classification model would have low accuracy. Therefore, the feature detection frames are matched with the recognition detection frames to obtain the object class labels corresponding to the image features, and these labels are used to train the initial classification model, which improves the accuracy of the target classification model.

In one embodiment, before the target traffic images corresponding to various traffic scenes are recognized based on the long-tail recognition model to obtain the recognition result, the method further includes: obtaining long-tail sample data, where the long-tail sample data includes a long-tail sample image and a sample detection frame, the long-tail sample image is a sample image in a long-tail traffic scene, the sample object in the long-tail sample image belongs to the target category, and the sample detection frame is used to identify the sample object in the long-tail sample image; and training the initial recognition model based on the long-tail sample data to obtain the long-tail recognition model.

The initial recognition model is a neural network classification model, for example an LSTM model; correspondingly, the long-tail recognition model is the trained neural network classification model, for example a trained LSTM model.

Optionally, the long-tail sample images are manually labeled and recorded in advance, and can be obtained from a historical traffic data repository. The sample detection frames are obtained through at least one of the following interactive retrieval methods: active learning, multivariate retrieval, and manual labeling. The long-tail sample images with their sample detection frames, together with the target category of the sample object in each sample detection frame, are used as input to train the initial recognition model, yielding the long-tail recognition model.

In summary, fig. 2 shows a flow diagram of another classification model training method. First, long-tail sample data is obtained and the long-tail recognition model is trained on it. Data mining is then performed with the long-tail recognition model to obtain long-tail traffic images, identification detection frames, target categories, and the like from the target traffic images of various traffic scenes. Long-tail traffic images of the same traffic scene are spliced to obtain at least one long-tail image sequence. For each long-tail image sequence, an upstream task is determined according to the traffic scene to which the sequence belongs, an image processing category is determined according to the upstream task, and target image processing of that category is performed on the original traffic images corresponding to the sequence to obtain image features, which include a feature detection frame, target speed, target pose, size, point cloud coordinates, and the like. The feature detection frames are then mapped to the coordinate system of the identification detection frames based on the coordinate transformation matrix, and matching is performed according to the overlap degree between the mapped feature detection frames and the identification detection frames. If a feature detection frame matches an identification detection frame, the target category is used as the object class label of that feature detection frame; that is, the target category of the long-tail traffic image obtained by data mining is associated with the feature detection frame produced by target image processing.
Finally, the image features corresponding to the matched feature detection frames and their object class labels, together with the common image features and their object class labels obtained from the historical traffic database, are used as input to train the initial classification model, yielding a target classification model for object classification. To clearly distinguish them from the common image features, the image features corresponding to the matched feature detection frames may be called long-tail image features. This approach increases the volume and diversity of the training input, so that the target classification model can classify objects across various traffic scenes, improving classification efficiency. Moreover, compared with handling long-tail traffic scenes through vehicle-road cooperation, this application does not require the infrastructure hardware and labor costs that vehicle-road cooperation entails, so it handles long-tail traffic scenes effectively and at low cost, with high practicability and extensibility.
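The final training step, which merges long-tail image features with common image features before training, can be sketched as follows. The patent does not specify the architecture of the initial classification model, so a nearest-centroid classifier is used here purely as a stand-in; the feature layout (speed, length, height) and the class names in the example are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[List[float], str]             # (image feature vector, object class label)
TaggedSample = Tuple[List[float], str, str]  # (features, label, source)


def build_training_set(long_tail: Sequence[Sample],
                       common: Sequence[Sample]) -> List[TaggedSample]:
    """Merge long-tail and common samples, tagging each with its source."""
    merged = [(list(f), label, "long_tail") for f, label in long_tail]
    merged += [(list(f), label, "common") for f, label in common]
    return merged


class NearestCentroidClassifier:
    """Hypothetical stand-in for the unspecified initial classification model."""

    def __init__(self) -> None:
        self.centroids: Dict[str, List[float]] = {}

    def fit(self, samples: Sequence[TaggedSample]) -> "NearestCentroidClassifier":
        sums: Dict[str, List[float]] = {}
        counts: Dict[str, int] = defaultdict(int)
        for features, label, _source in samples:
            acc = sums.setdefault(label, [0.0] * len(features))
            for i, x in enumerate(features):
                acc[i] += x
            counts[label] += 1
        # Each class is represented by the mean of its feature vectors.
        self.centroids = {lbl: [s / counts[lbl] for s in acc] for lbl, acc in sums.items()}
        return self

    def predict(self, features: Sequence[float]) -> str:
        # Euclidean distance on raw features; a real model would normalise them first.
        return min(self.centroids, key=lambda lbl: math.dist(features, self.centroids[lbl]))


if __name__ == "__main__":
    long_tail = [([1.0, 5.0, 3.0], "road_roller"), ([0.8, 5.5, 3.2], "road_roller")]
    common = [([12.0, 4.5, 1.5], "car"), ([14.0, 4.7, 1.4], "car"),
              ([1.3, 0.5, 1.7], "pedestrian")]
    model = NearestCentroidClassifier().fit(build_training_set(long_tail, common))
    print(model.predict([0.9, 5.2, 3.1]))  # road_roller
```

Keeping the source tag on each sample makes it easy to monitor how the long-tail portion of the training input changes the model, without changing how the classifier itself is trained.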

It should be understood that, although the steps in the flowcharts of the above embodiments are displayed sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, there is no strict restriction on their order of execution, and they may be performed in other orders. Moreover, at least some of the steps in these flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times; these sub-steps or stages are not necessarily performed sequentially, and may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Based on the same inventive concept, an embodiment of the present application further provides a classification model training apparatus for implementing the above classification model training method. The solution provided by the apparatus is similar to that described for the method, so for the specific limitations in the one or more embodiments of the classification model training apparatus below, refer to the limitations of the classification model training method above; details are not repeated here.

In one embodiment, as shown in fig. 3, a block diagram of a classification model training apparatus 300 is provided, and the classification model training apparatus 300 includes: an identification module 301, a determination module 302, and a training module 303, wherein:

The identification module 301 is configured to perform recognition on target traffic images corresponding to various traffic scenes based on the long-tail recognition model to obtain a recognition result.

The determining module 302 is configured to determine a long-tail traffic image from the target traffic image based on the recognition result, where the long-tail traffic image is an image in a long-tail traffic scene.

The training module 303 is configured to train the initial classification model based on the long-tail traffic image to obtain a target classification model, where the target classification model is used for object classification.

In one embodiment, the long-tail recognition model is used to recognize images in long-tail traffic scenes in which the object class is the target class; the determining module 302 is specifically configured to obtain, from the target traffic images, the long-tail traffic images for which the recognition result indicates that the category of the object in the image is the target category.
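The selection performed by the determining module can be sketched as a filter over recognition results. The record types and field names below are hypothetical; the patent only states that a long-tail traffic image is one whose recognition result indicates an object of the target category.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x_min, y_min, x_max, y_max)


@dataclass
class Detection:
    box: Box        # identification detection frame
    category: str   # class predicted by the long-tail recognition model


@dataclass
class RecognitionResult:
    image_id: str
    detections: List[Detection] = field(default_factory=list)


def select_long_tail_images(results: List[RecognitionResult],
                            target_category: str) -> List[str]:
    """Return the ids of target traffic images whose recognition result
    contains at least one object of the target category."""
    return [
        r.image_id
        for r in results
        if any(d.category == target_category for d in r.detections)
    ]


if __name__ == "__main__":
    results = [
        RecognitionResult("img_001", [Detection((0.0, 0.0, 10.0, 10.0), "traffic_cone")]),
        RecognitionResult("img_002", [Detection((5.0, 5.0, 40.0, 30.0), "car")]),
        RecognitionResult("img_003"),  # nothing recognized
    ]
    print(select_long_tail_images(results, "traffic_cone"))  # ['img_001']
```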

In one embodiment, the training module 303 is specifically configured to perform target image processing on the original traffic image corresponding to a long-tail traffic image, where the target image processing is image processing related to an upstream task of the object classification task, and to train the initial classification model based on the image features obtained by the target image processing to obtain the target classification model.

In one embodiment, as shown in fig. 4, a block diagram of another classification model training apparatus is provided. The classification model training apparatus 300 further includes: a splicing module 401, configured to splice long-tail traffic images belonging to the same traffic scene to obtain at least one long-tail image sequence; and an image processing category determining module 402, configured to determine, for each long-tail image sequence, an upstream task according to the traffic scene to which the long-tail image sequence belongs, and to determine an image processing category according to the upstream task. Correspondingly, the training module 303 is further configured to, for each long-tail image sequence, perform target image processing on the original traffic images corresponding to the long-tail image sequence based on the corresponding image processing category.
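The splicing and category-selection steps can be sketched as follows, assuming that each long-tail traffic image carries a scene identifier, a scene type, and a capture timestamp. The patent does not list concrete scenes, upstream tasks, or image processing categories, so the lookup tables and the fallback to object detection below are entirely hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class LongTailImage:
    image_id: str
    scene_id: str     # identifies one traffic scene, e.g. a recorded drive segment
    scene_type: str   # hypothetical scene label such as "construction_zone"
    timestamp: float  # capture time in seconds


# Hypothetical lookup tables (not specified in the patent).
UPSTREAM_TASK_BY_SCENE_TYPE: Dict[str, str] = {
    "construction_zone": "object_detection",
    "dense_pedestrian_crossing": "multi_object_tracking",
}
PROCESSING_CATEGORY_BY_TASK: Dict[str, str] = {
    "object_detection": "detection",
    "multi_object_tracking": "detection_and_tracking",
}


def splice_sequences(images: Iterable[LongTailImage]) -> Dict[str, List[LongTailImage]]:
    """Group long-tail images by traffic scene and order each group by capture time."""
    groups: Dict[str, List[LongTailImage]] = defaultdict(list)
    for image in images:
        groups[image.scene_id].append(image)
    return {sid: sorted(seq, key=lambda im: im.timestamp) for sid, seq in groups.items()}


def image_processing_category(sequence: List[LongTailImage]) -> str:
    """Scene -> upstream task -> image processing category for one sequence."""
    task = UPSTREAM_TASK_BY_SCENE_TYPE.get(sequence[0].scene_type, "object_detection")
    return PROCESSING_CATEGORY_BY_TASK[task]


if __name__ == "__main__":
    images = [
        LongTailImage("a", "s1", "construction_zone", 2.0),
        LongTailImage("b", "s1", "construction_zone", 1.0),
        LongTailImage("c", "s2", "dense_pedestrian_crossing", 0.5),
    ]
    for scene_id, seq in splice_sequences(images).items():
        print(scene_id, [im.image_id for im in seq], image_processing_category(seq))
```

Sorting each group by timestamp is what turns a bag of images into a sequence, which matters if the chosen processing (for example tracking) needs temporally ordered frames to estimate speed or pose.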

In one embodiment, the training module 303 is further configured to determine the object class labels corresponding to the image features based on the image features and the recognition result, and to train the initial classification model based on the image features and their corresponding object class labels to obtain the target classification model.

In one embodiment, the image features include a feature detection frame used to identify an object in the original traffic image, and the recognition result includes an identification detection frame used to identify an object belonging to the target category in the long-tail traffic image. The training module 303 is further configured to match the feature detection frames with the identification detection frames, and, if a feature detection frame matches an identification detection frame, to use the target category as the object class label corresponding to that feature detection frame.

In one embodiment, the training module 303 is further configured to map the feature detection frames to the coordinate system of the identification detection frames based on the coordinate transformation matrix, and to perform matching according to the overlap degree between the mapped feature detection frames and the identification detection frames.

In one embodiment, as shown in fig. 5, a block diagram of a classification model training apparatus 300 is provided, in which the classification model training apparatus 300 further includes a long-tail recognition model training module 501, configured to acquire long-tail sample data, where the long-tail sample data includes a long-tail sample image and a sample detection box, the long-tail sample image is a sample image in a long-tail traffic scene, the sample object in the long-tail sample image belongs to the target category, and the sample detection box is used to identify the sample object in the long-tail sample image; and to train the initial recognition model based on the long-tail sample data to obtain the long-tail recognition model.

Each module in the above classification model training apparatus may be implemented wholly or partly by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.

In one embodiment, a computer device is provided, which may be a server, and whose internal structure may be as shown in fig. 6. The computer device includes a processor, a memory, an input/output interface (I/O for short), and a communication interface. The processor, the memory, and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device is used to store original traffic images, long-tail sample data, and the like. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used to connect and communicate with external terminals through a network. When executed by the processor, the computer program implements a classification model training method.

Those skilled in the art will appreciate that the structure shown in fig. 6 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is further provided, which includes a memory and a processor, the memory stores a computer program, and the processor implements the steps of the above method embodiments when executing the computer program.

In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.

In an embodiment, a computer program product is provided, comprising a computer program which, when being executed by a processor, carries out the steps of the above-mentioned method embodiments.

It should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, displayed data, etc.) referred to in the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with the relevant laws and regulations and standards of the relevant countries and regions.

It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments may be implemented by a computer program instructing relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, database, or other media used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disks, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take many forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases involved in the embodiments provided herein may include at least one of relational and non-relational databases. Non-relational databases may include, but are not limited to, blockchain-based distributed databases. The processors involved in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic devices, quantum-computing-based data processing logic devices, and the like.

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any such combination should be considered within the scope of this specification as long as it involves no contradiction.

The above embodiments express only several implementations of the present application and are described in relatively specific detail, but they should not be construed as limiting the scope of the present application. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the concept of the present application, and all of these fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the appended claims.

Claims

1. A classification model training method, the method comprising:

identifying target traffic images corresponding to various different traffic scenes based on the long-tail identification model to obtain an identification result;

determining a long-tail traffic image from the target traffic image based on the identification result, wherein the long-tail traffic image is an image in a long-tail traffic scene;

training an initial classification model based on the long-tail traffic image to obtain a target classification model, wherein the target classification model is used for object classification.

2. The method according to claim 1, wherein the long-tail recognition model is used to recognize images in a long-tail traffic scene in which an object in the image belongs to a target class, and determining the long-tail traffic image from the target traffic image based on the recognition result comprises:

acquiring, from the target traffic image, the long-tail traffic image for which the identification result indicates that the class of the object in the image is the target class.

3. The method of claim 1, wherein training an initial classification model based on the long-tail traffic image to obtain a target classification model comprises:

performing target image processing on an original traffic image corresponding to the long-tail traffic image, wherein the target image processing is image processing related to an upstream task of an object classification task;

and training the initial classification model based on image features obtained by the target image processing to obtain the target classification model.

4. The method of claim 3, wherein before the target image processing is performed on the original traffic image corresponding to the long-tail traffic image, the method further comprises:

splicing the long-tail traffic images belonging to the same traffic scene to obtain at least one long-tail image sequence;

for each long-tail image sequence, determining the upstream task according to the traffic scene to which the long-tail image sequence belongs, and determining the image processing category according to the upstream task;

correspondingly, the target image processing of the original traffic image corresponding to the long-tail traffic image includes:

and for each long-tail image sequence, performing the target image processing on the original traffic image corresponding to the long-tail image sequence based on the corresponding image processing category.

5. The method of claim 3, wherein training the initial classification model based on image features obtained by the target image processing to obtain the target classification model comprises:

determining an object class label corresponding to the image feature based on the image feature and the recognition result;

and training the initial classification model based on the image features and the object class labels corresponding to the image features to obtain the target classification model.

6. The method of claim 5, wherein the image feature comprises a feature detection box for identifying an object in the original traffic image, wherein the recognition result comprises a recognition detection box for identifying an object in the long-tail traffic image belonging to the target category, and wherein determining an object category label corresponding to the image feature based on the image feature and the recognition result comprises:

matching the feature detection box with the recognition detection box;

and if a feature detection box matches the recognition detection box, taking the target category as the object category label corresponding to that feature detection box.

7. The method according to claim 6, wherein the matching the feature detection box and the recognition detection box comprises:

mapping the feature detection box to the coordinate system of the recognition detection box based on a coordinate conversion matrix;

and matching according to the overlap degree between the mapped feature detection box and the recognition detection box.

8. The method according to any one of claims 1 to 7, wherein before identifying the target traffic images corresponding to various traffic scenes based on the long-tail identification model to obtain the identification result, the method further comprises:

obtaining long-tail sample data, wherein the long-tail sample data comprises a long-tail sample image and a sample detection box, the long-tail sample image is a sample image in a long-tail traffic scene, a sample object in the long-tail sample image belongs to a target category, and the sample detection box is used for identifying the sample object in the long-tail sample image;

training an initial recognition model based on the long-tail sample data to obtain the long-tail recognition model.

9. A classification model training apparatus, characterized in that the apparatus comprises:

an identification module, configured to recognize target traffic images corresponding to various different traffic scenes based on a long-tail identification model to obtain an identification result;

a determining module, configured to determine a long-tail traffic image from the target traffic image based on the recognition result, wherein the long-tail traffic image is an image in a long-tail traffic scene;

and a training module, configured to train an initial classification model based on the long-tail traffic image to obtain a target classification model, wherein the target classification model is used for object classification.

10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method of any of claims 1 to 8.

11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 8.

12. A computer program product comprising a computer program, characterized in that the computer program realizes the steps of the method of any one of claims 1 to 8 when executed by a processor.