
CN118587622A - Lightweight target detection method and system based on unmanned aerial vehicle platform - Google Patents


Info

Publication number
CN118587622A
CN118587622A
Authority
CN
China
Prior art keywords
aerial vehicle
unmanned aerial
target detection
network
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202411073757.3A
Other languages
Chinese (zh)
Other versions
CN118587622B (en)
Inventor
徐鑫
李奇
潘洁
吴海涛
陈俊美
杨杰
张亦卓
董晓晗
亓立壮
李延港
刘承浩
孙明正
刘冲
周英
逯行政
魏宏伟
朱宏亮
张昊泽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu Aerospace Information Research Institute
Aerospace Information Research Institute of CAS
Original Assignee
Qilu Aerospace Information Research Institute
Aerospace Information Research Institute of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu Aerospace Information Research Institute, Aerospace Information Research Institute of CAS filed Critical Qilu Aerospace Information Research Institute
Priority to CN202411073757.3A priority Critical patent/CN118587622B/en
Publication of CN118587622A publication Critical patent/CN118587622A/en
Application granted granted Critical
Publication of CN118587622B publication Critical patent/CN118587622B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention belongs to the field of target detection and provides a lightweight target detection method and system based on an unmanned aerial vehicle platform, which solve the technical problem that existing target detection technology on unmanned aerial vehicle platforms cannot achieve both detection accuracy and efficiency. The lightweight target detection method based on the unmanned aerial vehicle platform comprises: acquiring an unmanned aerial vehicle inspection image; and processing the unmanned aerial vehicle inspection image with a trained lightweight target detection model embedded in the unmanned aerial vehicle platform to obtain a target detection result. The lightweight target detection model comprises a backbone network, a path aggregation network and a detection head, and can effectively balance detection accuracy and speed.

Description

Lightweight target detection method and system based on unmanned aerial vehicle platform
Technical Field
The invention belongs to the field of target detection, and particularly relates to a lightweight target detection method and system based on an unmanned aerial vehicle platform.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
Owing to their wide field of view and strong real-time capability, unmanned aerial vehicles have gradually become an important tool for remote sensing and are widely applied in fields such as emergency rescue, disaster monitoring, and urban traffic management. In unmanned-aerial-vehicle-based target detection, on the one hand, large variations in flight altitude and high flight speed frequently produce blurred objects and a high proportion of small objects in the scene, which increases the difficulty of the detection task; on the other hand, the payload and energy of an unmanned aerial vehicle platform are limited, making it difficult to carry high-power, high-performance computing equipment for on-board detection, so existing applications usually transmit raw images over the network to a ground processing station, which struggles to meet real-time detection requirements. Existing two-stage detection methods for unmanned aerial vehicle platforms generate candidate boxes through end-to-end learning with a Region Proposal Network (RPN), extract features for each candidate box with a deep neural network (such as ResNet or VGG), classify the candidate boxes with a classifier and a regressor, and fine-tune the bounding-box positions for an accurate fit; representative algorithms include Faster R-CNN and SPPNet. Although these methods offer good detection performance, the redundant candidate regions incur a large computational cost, making it difficult to meet real-time requirements on low-compute platforms.
Single-stage detection methods for unmanned aerial vehicle platforms do not rely on an explicit candidate-box generation stage and instead densely predict the position and category of targets directly on the input image, e.g., the YOLO series, SSD, and RetinaNet. Although these methods are faster than two-stage algorithms, factors such as target diversity and complex backgrounds make it difficult for them to reach high accuracy, so false detections and missed detections are common.
Disclosure of Invention
In order to solve the technical problem that existing target detection technology on unmanned aerial vehicle platforms cannot achieve both detection accuracy and efficiency, the invention provides a lightweight target detection method and system based on an unmanned aerial vehicle platform that can effectively balance detection accuracy and speed.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the first aspect of the invention provides a lightweight target detection method based on an unmanned aerial vehicle platform.
A lightweight target detection method based on an unmanned aerial vehicle platform, comprising:
acquiring an unmanned aerial vehicle inspection image;
processing the unmanned aerial vehicle inspection image by using a lightweight target detection model which is trained and embedded in an unmanned aerial vehicle platform to obtain a target detection result;
the lightweight target detection model comprises a backbone network, a path aggregation network and a detection head;
the backbone network is used for extracting spatial information and semantic information at multiple layers of different scales from the unmanned aerial vehicle inspection image to obtain multi-layer feature maps;
the path aggregation network comprises a 1×1 convolution layer and a bidirectional connection layer; the 1×1 convolution layer is used for reducing the channel number of the corresponding feature map extracted by the backbone network and transmitting the result to the bidirectional connection layer; the bidirectional connection layer is used for connecting the feature maps output by the 1×1 convolution layer with the corresponding feature maps output directly by the backbone network and, on this basis, adding a bottom-up cross-scale connection path to obtain a plurality of fused feature maps;
the detection head is used for detecting each fused feature map to obtain result maps containing target categories and positions, and the result maps are processed by a non-maximum suppression method to obtain the final target categories and position coordinates.
As an implementation manner, the path aggregation network is used for fusing the feature maps from the third layer onward extracted by the backbone network.
As an implementation mode, the backbone network comprises a plurality of feature extraction layers and a spatial pyramid pooling layer; the feature extraction layers are connected in series, and the spatial pyramid pooling layer is connected at the end of the series.
A graph depth convolution network is introduced into the feature extraction layer; the graph depth convolution network processes an input original feature map to obtain a feature map containing global information and local spatial information;
The graph depth convolution network comprises two branches: one branch comprises a local feature extraction network and a mixed local channel attention mechanism module, where the local feature extraction network extracts local features of the original feature map and the mixed local channel attention mechanism module then generates a local feature map; the other branch transmits the original feature map directly to a summation module, which sums the original feature map and the corresponding local feature map to obtain the feature map containing global information and local spatial information.
As one implementation mode, the spatial pyramid pooling layer comprises a first lightweight depthwise separable convolution network, a plurality of serially connected pooling kernels, a splicing layer, and a second lightweight depthwise separable convolution network; the first lightweight depthwise separable convolution network extracts features from the input feature map; the extracted features are divided into two paths, one pooled step by step through the serially connected pooling kernels and transmitted to the splicing layer, the other transmitted directly to the splicing layer; the splicing layer splices all received features and transmits them to the second lightweight depthwise separable convolution network for further feature extraction.
As an implementation mode, the detection head comprises a decoupled head structure, and a partial convolution module with an inverted residual structure is connected in series at the input end of the decoupled head structure;
The partial convolution module with the inverted residual structure is used for: after dimension lifting, normalization, and activation-function processing, computing in the high-dimensional space with a partial convolution network, and finally restoring the channel number through the inverse of the dimension lifting; the resulting feature map is sent into the decoupled head structure for processing.
As an embodiment, the partial convolution network is configured to: perform a convolution operation on part of the channels of the feature map and splice the convolution result with the remaining channels of the feature map.
As one embodiment, the activation function is an H-Swish function.
A second aspect of the present invention provides a lightweight target detection system based on an unmanned aerial vehicle platform.
In one or more embodiments, a lightweight target detection system based on an unmanned aerial vehicle platform comprises:
the image acquisition module is used for acquiring an unmanned aerial vehicle inspection image;
The target detection module is used for processing the unmanned aerial vehicle inspection image by utilizing a lightweight target detection model which is trained and embedded in the unmanned aerial vehicle platform to obtain a target detection result;
the lightweight target detection model comprises a backbone network, a path aggregation network and a detection head;
the backbone network is used for extracting spatial information and semantic information at multiple layers of different scales from the unmanned aerial vehicle inspection image to obtain multi-layer feature maps;
the path aggregation network comprises a 1×1 convolution layer and a bidirectional connection layer; the 1×1 convolution layer is used for reducing the channel number of the corresponding feature map extracted by the backbone network and transmitting the result to the bidirectional connection layer; the bidirectional connection layer is used for connecting the feature maps output by the 1×1 convolution layer with the corresponding feature maps output directly by the backbone network and, on this basis, adding a bottom-up cross-scale connection path to obtain a plurality of fused feature maps;
the detection head is used for detecting each fused feature map to obtain result maps containing target categories and positions, and the result maps are processed by a non-maximum suppression method to obtain the final target categories and position coordinates.
In one or more embodiments, a lightweight target detection system based on an unmanned aerial vehicle platform comprises:
An image acquisition module and an image processing module;
The image acquisition module is mounted on the unmanned aerial vehicle platform; the image processing module is embedded in the unmanned aerial vehicle platform;
The image acquisition module is used for acquiring the unmanned aerial vehicle inspection image and transmitting the image to the image processing module;
The image processing module is provided with the lightweight target detection model, which is used for processing the unmanned aerial vehicle inspection image to obtain a target detection result;
the lightweight target detection model comprises a backbone network, a path aggregation network and a detection head;
the backbone network is used for extracting spatial information and semantic information at multiple layers of different scales from the unmanned aerial vehicle inspection image to obtain multi-layer feature maps;
the path aggregation network comprises a 1×1 convolution layer and a bidirectional connection layer; the 1×1 convolution layer is used for reducing the channel number of the corresponding feature map extracted by the backbone network and transmitting the result to the bidirectional connection layer; the bidirectional connection layer is used for connecting the feature maps output by the 1×1 convolution layer with the corresponding feature maps output directly by the backbone network and, on this basis, adding a bottom-up cross-scale connection path to obtain a plurality of fused feature maps;
the detection head is used for detecting each fused feature map to obtain result maps containing target categories and positions, and the result maps are processed by a non-maximum suppression method to obtain the final target categories and position coordinates.
Compared with the prior art, the invention has the beneficial effects that:
(1) The path aggregation network of the invention uses a 1×1 convolution layer to reduce the channel number of the corresponding feature maps extracted by the backbone network, then uses the bidirectional connection layer to connect the feature maps output by the 1×1 convolution layer with the corresponding feature maps output directly by the backbone network, and on this basis adds a bottom-up cross-scale connection path to fuse the feature maps. This fuses the spatial information in the shallow feature maps extracted by the backbone network, improves the ability to detect small targets, and reduces the computational cost; the bottom-up cross-scale connection path also fuses feature maps extracted by dilated convolution kernels of different sizes, reducing the influence of size differences.
(2) A graph depth convolution network is introduced into the feature extraction layer, comprising two branches: one branch comprises a local feature extraction network and a mixed local channel attention mechanism module, while the other branch transmits the original feature map directly to a summation module. The graph depth convolution network combines the adaptability of deformable convolution to object shape and size with the ability of the mixed local channel attention mechanism to extract important information, effectively coping with interference from complex backgrounds and large variations in object size.
(3) A partial convolution module with an inverted residual structure is connected in series at the input end of the decoupled head structure of the detection head, which preserves the detection accuracy brought by the decoupled head while reducing the parameter count and computational cost, ensuring the real-time performance of the whole lightweight target detection model.
Additional aspects of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
FIG. 1 is a flowchart of a lightweight target detection method based on an unmanned aerial vehicle platform according to an embodiment of the invention;
FIG. 2 is a schematic diagram of the YOLOv8s model structure;
FIG. 3 is a schematic diagram of a DGLFG module structure according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the structure of an MLCA attention mechanism module according to an embodiment of the invention;
FIG. 5 is a schematic diagram of the structure of a spatial pyramid pooling layer according to embodiments of the present invention;
FIG. 6 is a schematic diagram of a path aggregation network architecture according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a detection head according to an embodiment of the present invention;
FIG. 8 is a graph of comparison of detection effects of different target detection models;
fig. 9 is a schematic structural diagram of a lightweight target detection system based on an unmanned aerial vehicle platform according to an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to the drawings and examples.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the present invention. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Fig. 1 is a flowchart of a lightweight target detection method based on an unmanned aerial vehicle platform according to an embodiment of the present invention. According to fig. 1, the lightweight target detection method based on the unmanned aerial vehicle platform according to the embodiment of the invention comprises the following steps:
S101: acquiring an unmanned aerial vehicle inspection image;
S102: processing the unmanned aerial vehicle inspection image with the trained lightweight target detection model embedded in the unmanned aerial vehicle platform to obtain a target detection result.
In the specific implementation process, the lightweight target detection model comprises a backbone network, a path aggregation network and a detection head;
the backbone network is used for extracting spatial information and semantic information at multiple layers of different scales from the unmanned aerial vehicle inspection image to obtain multi-layer feature maps;
the path aggregation network comprises a 1×1 convolution layer and a bidirectional connection layer; the 1×1 convolution layer is used for reducing the channel number of the corresponding feature map extracted by the backbone network and transmitting the result to the bidirectional connection layer; the bidirectional connection layer is used for connecting the feature maps output by the 1×1 convolution layer with the corresponding feature maps output directly by the backbone network and, on this basis, adding a bottom-up cross-scale connection path to obtain a plurality of fused feature maps, as shown in fig. 6;
the detection head is used for detecting each fused feature map to obtain result maps containing target categories and positions, and the result maps are processed by a non-maximum suppression method to obtain the final target categories and position coordinates.
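It should be noted that the following minimal sketch merely illustrates the non-maximum suppression post-processing described above; the tensor layout and thresholds are assumptions for illustration, not the values of the present invention.

```python
# Illustrative sketch of class-wise non-maximum suppression (assumed tensor
# layout; score/IoU thresholds are examples, not the patent's values).
import torch
from torchvision.ops import nms

def postprocess(boxes, scores, classes, score_thr=0.25, iou_thr=0.45):
    """boxes: (N, 4) in xyxy format; scores: (N,); classes: (N,) int64."""
    keep = scores > score_thr                        # drop low-confidence boxes
    boxes, scores, classes = boxes[keep], scores[keep], classes[keep]
    # offset boxes by class so a single nms() call behaves class-wise
    offsets = classes.to(boxes.dtype).unsqueeze(1) * 4096.0
    kept = nms(boxes + offsets, scores, iou_thr)
    return boxes[kept], scores[kept], classes[kept]
```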
In fig. 6:
SiLU: sigmoid Linear Unit, activating a function;
P is an abbreviation for pyremid, and feature maps are typically named in layers that represent different resolutions or scales. Wherein, P3-P7 are respectively in the feature graphs of the third layer to the seventh layer in the feature pyramid.
According to this embodiment, cross-scale connection is realized by adding features from the backbone feature extraction network along bottom-up paths, preserving shallow spatial information and enhancing the ability to extract small targets.
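For illustration only, one fusion node of such a structure can be sketched as follows; the channel sizes, nearest-neighbour resizing, and additive fusion are assumptions rather than the exact construction of the invention.

```python
# Hedged sketch of one bidirectional fusion node: a 1x1 conv reduces channels,
# and the reduced backbone feature (the cross-scale shortcut) is summed with
# resized top-down and bottom-up neighbours before a smoothing conv.
import torch.nn as nn
import torch.nn.functional as F

class FusionNode(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.reduce = nn.Conv2d(c_in, c_out, kernel_size=1)  # 1x1 channel reduction
        self.smooth = nn.Conv2d(c_out, c_out, 3, padding=1)

    def forward(self, backbone_feat, top_down_feat, bottom_up_feat):
        x = self.reduce(backbone_feat)                       # backbone shortcut
        td = F.interpolate(top_down_feat, size=x.shape[-2:], mode="nearest")
        bu = F.interpolate(bottom_up_feat, size=x.shape[-2:], mode="nearest")
        return self.smooth(x + td + bu)                      # fused feature map
```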
It should be noted that the backbone network may be implemented by using an existing neural network structure, such as a target detection model of the YOLO system, and those skilled in the art may specifically select the backbone network according to the actual situation.
Taking YOLOv8s as an example, as shown in fig. 2, YOLOv8s consists of a backbone network (Backbone), a neck network, and a detection head; the backbone network extracts features from the input unmanned aerial vehicle inspection image to obtain spatial information and semantic information at different scales.
The english abbreviations in fig. 2 have the following meanings:
C2f: CSP bottleneck with two convolutions (the faster C2f variant used in YOLOv8);
CBS: c represents Conv (i.e., convolume, convolution); b represents BatchNorm d (Batch Normalization ); s stands for SiLU (Sigmoid Linear Unit, activation function);
SPPF: Spatial Pyramid Pooling - Fast.
The spatial information refers to pixel-value information at different positions in the image, including geometric characteristics such as edges, textures, and simple shapes;
The semantic information refers to the category and attribute information of each region of the image, represented at each pixel of the feature map, and specifically includes:
Category probability: the predicted category distribution of the target at each location;
Target confidence: the prediction of whether a target exists at each location;
Bounding-box regression: the predicted position, width, and height of the target at each location.
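These three kinds of semantic information can be pictured as slices of a dense prediction tensor; the channel layout in the following sketch is an assumption for illustration.

```python
# Hedged sketch: splitting a dense prediction map into bounding-box regression,
# target confidence, and category probability (channel layout assumed).
import torch

def split_semantics(pred: torch.Tensor, num_classes: int):
    """pred: (B, 4 + 1 + num_classes, H, W) raw output of a detection head."""
    box_reg    = pred[:, :4]                           # position, width and height
    confidence = pred[:, 4:5].sigmoid()                # is a target present here?
    class_prob = pred[:, 5:5 + num_classes].sigmoid()  # category distribution
    return box_reg, confidence, class_prob
```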
In this embodiment, the path aggregation network is configured to fuse the feature maps from the third layer onward extracted by the backbone network. On the one hand, this enhances semantic expression and localization capability across multiple scales; on the other hand, it reduces the amount of data computation.
In this embodiment, the backbone network comprises a plurality of feature extraction layers and a spatial pyramid pooling layer; the feature extraction layers are connected in series, and the spatial pyramid pooling layer is connected at the end of the series.
A graph depth convolution network is introduced into the feature extraction layer; the graph depth convolution network processes an input original feature map to obtain a feature map containing global information and local spatial information;
The graph depth convolution network comprises two branches: one branch comprises a local feature extraction network and a mixed local channel attention mechanism module, where the local feature extraction network extracts local features of the original feature map and the mixed local channel attention mechanism module then generates a local feature map; the other branch transmits the original feature map directly to a summation module, which sums the original feature map and the corresponding local feature map to obtain the feature map containing global information and local spatial information.
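The two-branch pattern can be sketched structurally as follows; a plain convolution stands in for the deformable convolution and an identity for the attention module, both of which are described in detail below, so this is a shape-level illustration only.

```python
# Hedged sketch of the dual-branch idea: a local branch (stand-ins for DCNv3
# and MLCA) is summed with the untouched original feature map.
import torch.nn as nn

class DualBranchSketch(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),  # stand-in for DCNv3
            nn.BatchNorm2d(channels),
            nn.SiLU(),
            nn.Identity(),                                # stand-in for MLCA
        )

    def forward(self, x):
        return x + self.local(x)   # sum of the original and local feature maps
```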
In this embodiment, the graph depth convolution network is implemented as the DGLFG (Deformable Global-Local Feature Guidance) module, which replaces the C2f module in the backbone network, as shown in fig. 3. The DGLFG module can accommodate variations in the shape and scale of the detection targets.
In fig. 3:
CBS: c represents Conv (i.e., convolution, convolution); b represents BatchNorm d functions, and the normalization processing functions; s represents SiLU functions (Sigmoid Linear Unit, activation functions);
the DCBS is short for a network formed by serially connecting DCNv functions, batch normalization functions and SiLU functions;
DCNv3: deformable Convolutional Networks Version 3, a variable convolutional network;
SiLU: sigmoid Linear Unit, activating a function;
MLCA: mixed Local Channel Attention, mix local channel attention.
The 3×3 convolution network in the DarknetBottleneck part of the original C2f module is replaced by a deformable convolution network, which enlarges the shallow-feature receptive field compared with the original network, allowing the network to adapt to large differences in target size. The feature map processed by this convolution network is then input into the MLCA attention mechanism module, whose structure is shown in fig. 4. The feature map is first converted to C×d×d by local average pooling, where C is the channel number and d is the length and width; this step extracts local spatial features while rapidly reducing the amount of computation. The feature map is then restored to its original size through global average pooling, 1×1 convolution, and related steps, and the processed feature map is added to the original features, so that the resulting feature map contains both global information and local spatial information.
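That processing can be sketched step by step as follows; the pooled size d and the exact branch arithmetic are assumptions for illustration.

```python
# Hedged sketch of the MLCA steps above: local average pooling to C x d x d,
# a global pooling branch, 1x1 convolution, restoration to the input size,
# and addition to the original features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MLCASketch(nn.Module):
    def __init__(self, channels, d=8):
        super().__init__()
        self.d = d
        self.conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        local = F.adaptive_avg_pool2d(x, self.d)       # C x d x d local statistics
        glob = F.adaptive_avg_pool2d(local, 1)         # global channel statistics
        mixed = self.conv(local + glob)                # 1x1 conv over mixed stats
        mixed = F.interpolate(mixed, size=(h, w), mode="nearest")  # restore size
        return x + mixed                               # add to original features
```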
In this embodiment, as shown in fig. 5, the spatial pyramid pooling layer (SPPF, Spatial Pyramid Pooling - Fast) comprises a first lightweight depthwise separable convolution network, several serially connected pooling kernels (e.g., 3×3, 5×5, and 9×9), a splicing layer, and a second lightweight depthwise separable convolution network; the output of each pooling operation serves as the input of the next, which enlarges the shallow-feature receptive field and allows the network to adapt to large differences in target size.
The first lightweight depthwise separable convolution network extracts features from the input feature map; the extracted features are divided into two paths, one pooled step by step through the serially connected pooling kernels and transmitted to the splicing layer, the other transmitted directly to the splicing layer; the splicing layer splices all received features and transmits them to the second lightweight depthwise separable convolution network for further feature extraction.
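A compact sketch of this layer is given below; the channel split, the depthwise separable stand-ins, and the pooling kernel sizes follow the description above but remain assumptions.

```python
# Hedged sketch of the modified SPPF: a depthwise separable conv, serial
# max-pooling kernels whose outputs chain into one another, concatenation of
# all received features, and a second depthwise separable conv.
import torch
import torch.nn as nn

class DWSeparableConv(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.dw = nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in)  # depthwise
        self.pw = nn.Conv2d(c_in, c_out, 1)                         # pointwise

    def forward(self, x):
        return self.pw(self.dw(x))

class SPPFSketch(nn.Module):
    def __init__(self, c_in, c_out):
        super().__init__()
        self.cv1 = DWSeparableConv(c_in, c_in // 2)
        # serial pooling: each output feeds the next, enlarging the receptive field
        self.pools = nn.ModuleList(
            [nn.MaxPool2d(k, stride=1, padding=k // 2) for k in (3, 5, 9)]
        )
        self.cv2 = DWSeparableConv(c_in // 2 * 4, c_out)

    def forward(self, x):
        x = self.cv1(x)
        feats = [x]                                 # the direct (un-pooled) path
        for pool in self.pools:
            feats.append(pool(feats[-1]))           # step-by-step pooling
        return self.cv2(torch.cat(feats, dim=1))    # splice and extract again
```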
Specifically, as shown in fig. 7, the detection head comprises a decoupled head structure, with a partial convolution module with an inverted residual structure connected in series at its input end;
The partial convolution module with the inverted residual structure is used for: after dimension lifting, normalization, and activation-function processing, computing in the high-dimensional space with a partial convolution network, and finally restoring the channel number through the inverse of the dimension lifting; the resulting feature map is sent into the decoupled head structure for processing.
The partial convolution module with the inverted residual structure is abbreviated in English as the ISPP module (Inverse-residual Shared Parameter based on Partial convolution module).
The partial convolution network is configured to: perform a convolution operation on part of the channels of the feature map (e.g., 1/3 of the channels) and splice the convolution result with the remaining channels of the feature map (e.g., the other 2/3).
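Both pieces can be sketched together as follows; the 1/3 split, the expansion ratio, and the layer ordering are assumptions consistent with the description above.

```python
# Hedged sketch: PartialConv convolves about 1/3 of the channels and splices
# the rest back unchanged; ISPPSketch wraps it in an inverted residual
# (dimension lifting, BN + H-Swish, partial conv, channel restoration).
import torch
import torch.nn as nn

class PartialConv(nn.Module):
    def __init__(self, channels, ratio=3):
        super().__init__()
        self.c_conv = channels // ratio                   # channels to convolve
        self.conv = nn.Conv2d(self.c_conv, self.c_conv, 3, padding=1)

    def forward(self, x):
        x1, x2 = x[:, :self.c_conv], x[:, self.c_conv:]   # split the channels
        return torch.cat((self.conv(x1), x2), dim=1)      # splice with the rest

class ISPPSketch(nn.Module):
    def __init__(self, channels, expand=2):
        super().__init__()
        hidden = channels * expand
        self.lift = nn.Conv2d(channels, hidden, 1)        # dimension lifting
        self.bn_act = nn.Sequential(nn.BatchNorm2d(hidden), nn.Hardswish())
        self.pconv = PartialConv(hidden)
        self.restore = nn.Conv2d(hidden, channels, 1)     # inverse of the lifting

    def forward(self, x):
        return self.restore(self.pconv(self.bn_act(self.lift(x))))
```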
The activation function is the H-Swish (Hard Swish) function, whose expression is:
H-Swish(x) = x · ReLU6(x + 3) / 6
where x represents the input and ReLU6 denotes the ReLU activation function clipped to the range [0, 6]; the function is designed to suit low-precision computation on mobile devices.
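Written out directly from the expression above (PyTorch also provides this activation as torch.nn.Hardswish):

```python
import torch

def h_swish(x: torch.Tensor) -> torch.Tensor:
    # ReLU6 clips its argument to [0, 6], matching the formula above
    return x * torch.clamp(x + 3.0, min=0.0, max=6.0) / 6.0
```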
In fig. 7, BN_HSwish is the combination of batch normalization (Batch Normalization, BN) and the H-Swish activation function; h is the height of the input feature map; w is the width of the input feature map; Cp is the number of partial channels; k is the height and width of the filter kernel applied to the feature map.
The training process of the lightweight target detection model is specifically given below:
Step 1: Prepare the dataset.
Step 1.1: Data collection. Two typical unmanned aerial vehicle platform datasets, VisDrone2019 and UAVDT, are collected as sample data; the main scenes involved are low brightness, occlusion, and small targets. The VisDrone2019 dataset contains ten categories of data such as pedestrians, bicycles, and cars.
Step 1.2: Data division. To demonstrate the effectiveness of the invention, the data in this example are randomly divided into a training set, a validation set, and a test set at a ratio of 6:2:2; the training and validation sets are used to train the model parameters and the test set is used to evaluate model performance, ensuring that the three sets do not overlap.
Step 1.3: Data preprocessing. To improve detection efficiency, the model input data are resampled to 640×640. Meanwhile, the training dataset is expanded with random cropping, flipping, Mixup, and Mosaic techniques, which alleviates class imbalance and improves the robustness of the lightweight target detection model.
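The resampling step can be sketched as follows; the interpolation mode and normalization are assumptions, and Mixup/Mosaic act on image-label pairs, so they are normally enabled through the training pipeline rather than written by hand.

```python
# Hedged sketch of input resampling to the 640 x 640 model input size.
import cv2
import numpy as np

def preprocess(image: np.ndarray, size: int = 640) -> np.ndarray:
    resized = cv2.resize(image, (size, size), interpolation=cv2.INTER_LINEAR)
    return resized.astype(np.float32) / 255.0       # scale pixels to [0, 1]
```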
Step 2: Train and validate the model. The training dataset from step 1 is input into the constructed lightweight target detection model for training, with the training parameters set as follows: input image size 640×640, batch size 16, SGD optimizer, initial learning rate 0.01, and 300 iterations; this yields the trained lightweight target detection model.
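Expressed as a hedged Ultralytics-style training call, the parameters above might look as follows; the framework choice and file names are assumptions, not part of the invention.

```python
# Hedged sketch of the training configuration (file names are hypothetical).
from ultralytics import YOLO

model = YOLO("lightweight_uav.yaml")   # hypothetical model definition file
model.train(
    data="uav_dataset.yaml",           # hypothetical dataset config file
    imgsz=640,                         # input image size 640 x 640
    batch=16,                          # batch processing size
    optimizer="SGD",
    lr0=0.01,                          # initial learning rate
    epochs=300,                        # 300 iterations as stated above
)
```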
Step 3: Model export. The .pt-format weight file of the trained lightweight target detection model is exported as an ONNX-format model, which is then compiled by a tool chain and converted into an rknn-format model suitable for NPU devices.
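A hedged sketch of that export chain with the rknn-toolkit2 API is shown below; the file names and quantization settings are assumptions.

```python
# Hedged sketch of the .pt -> ONNX -> .rknn export chain.
from ultralytics import YOLO
from rknn.api import RKNN

YOLO("best.pt").export(format="onnx")          # .pt weights -> ONNX model

rknn = RKNN()
rknn.config(target_platform="rk3588")          # compile for the RK3588 NPU
rknn.load_onnx(model="best.onnx")
rknn.build(do_quantization=False)              # quantization choice assumed
rknn.export_rknn("best.rknn")                  # rknn model for NPU devices
```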
Step 4: Deploy the trained lightweight target detection model on the unmanned aerial vehicle platform. The unmanned aerial vehicle platform is equipped with an RK3588 chip; the trained lightweight target detection model is deployed on a board carrying the RK3588 chip, inference is performed on multiple connected video streams, and target coordinates and category confidences are returned after non-maximum suppression processing.
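On-board inference can be sketched with the rknnlite runtime as follows; the stream URL, input handling, and the decode step are assumptions.

```python
# Hedged sketch of multi-stream inference on the RK3588 board.
import cv2
from rknnlite.api import RKNNLite

rknn = RKNNLite()
rknn.load_rknn("best.rknn")
rknn.init_runtime()                                 # run on the NPU

cap = cv2.VideoCapture("rtsp://example/stream1")    # hypothetical video stream
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    inp = cv2.resize(frame, (640, 640))
    outputs = rknn.inference(inputs=[inp])          # raw detection head outputs
    # decode outputs, then apply non-maximum suppression as sketched earlier
```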
Table 1 shows the performance comparison between the lightweight target detection model of the invention and YOLOv8s. As shown in Table 1, the lightweight target detection method based on the unmanned aerial vehicle platform is simple in principle and easy to implement. Compared with YOLOv8s, the lightweight target detection model raises the accuracy of the target detection results from 37.4% to 40.6%, reduces the parameter count from 11.1M to 5.8M, reduces the computation from 28.5 GFLOPS to 20.6 GFLOPS, and increases the inference speed from 131 FPS to 147 FPS.
Table 1: Performance comparison between the lightweight target detection model of the invention and YOLOv8s.
Fig. 8 compares the detection effects of different target detection models, including YOLOv5s, YOLOv8s, and the lightweight target detection model of the invention. As can be seen from fig. 8, the lightweight target detection model of the invention achieves higher detection accuracy and precision than YOLOv5s and YOLOv8s.
Fig. 9 is a schematic structural diagram of a lightweight target detection system based on a unmanned aerial vehicle platform according to an embodiment of the present invention. As shown in fig. 9, a lightweight target detection system based on an unmanned aerial vehicle platform according to an embodiment of the present invention includes:
an image acquisition module 201 for acquiring an unmanned aerial vehicle inspection image;
The target detection module 202 is configured to process the unmanned aerial vehicle inspection image by using a lightweight target detection model that is trained and embedded in an unmanned aerial vehicle platform, so as to obtain a target detection result;
the lightweight target detection model comprises a backbone network, a path aggregation network and a detection head;
the backbone network is used for extracting spatial information and semantic information at multiple layers of different scales from the unmanned aerial vehicle inspection image to obtain multi-layer feature maps;
the path aggregation network comprises a 1×1 convolution layer and a bidirectional connection layer; the 1×1 convolution layer is used for reducing the channel number of the corresponding feature map extracted by the backbone network and transmitting the result to the bidirectional connection layer; the bidirectional connection layer is used for connecting the feature maps output by the 1×1 convolution layer with the corresponding feature maps output directly by the backbone network and, on this basis, adding a bottom-up cross-scale connection path to obtain a plurality of fused feature maps;
the detection head is used for detecting each fused feature map to obtain result maps containing target categories and positions, and the result maps are processed by a non-maximum suppression method to obtain the final target categories and position coordinates.
Here, it should be noted that, each module in the lightweight target detection system based on the unmanned aerial vehicle platform in this embodiment corresponds to each step in the lightweight target detection method based on the unmanned aerial vehicle platform, and the specific implementation process is the same, and will not be described in detail here.
The specific structure of the lightweight target detection model is the same as that of the lightweight target detection method based on the unmanned aerial vehicle platform, and will not be described again here.
In one or more embodiments, there is also provided a lightweight target detection system based on an unmanned aerial vehicle platform, comprising: an image acquisition module and an image processing module;
The image acquisition module is mounted on the unmanned aerial vehicle platform; the image processing module is embedded in the unmanned aerial vehicle platform;
The image acquisition module is used for acquiring the unmanned aerial vehicle inspection image and transmitting the image to the image processing module;
The image processing module is provided with the lightweight target detection model, which is used for processing the unmanned aerial vehicle inspection image to obtain a target detection result;
the lightweight target detection model comprises a backbone network, a path aggregation network and a detection head;
the backbone network is used for extracting spatial information and semantic information at multiple layers of different scales from the unmanned aerial vehicle inspection image to obtain multi-layer feature maps;
the path aggregation network comprises a 1×1 convolution layer and a bidirectional connection layer; the 1×1 convolution layer is used for reducing the channel number of the corresponding feature map extracted by the backbone network and transmitting the result to the bidirectional connection layer; the bidirectional connection layer is used for connecting the feature maps output by the 1×1 convolution layer with the corresponding feature maps output directly by the backbone network and, on this basis, adding a bottom-up cross-scale connection path to obtain a plurality of fused feature maps;
the detection head is used for detecting each fused feature map to obtain result maps containing target categories and positions, and the result maps are processed by a non-maximum suppression method to obtain the final target categories and position coordinates.
The specific structure of the lightweight object detection model is the same as that of the lightweight object detection method based on the unmanned aerial vehicle platform, and will not be described here.
In one or more embodiments, there is also provided a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program performs the steps of the lightweight target detection method based on the unmanned aerial vehicle platform shown in fig. 1 above.
In one or more embodiments, there is also provided an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor; when executing the program, the processor implements the steps of the lightweight target detection method based on the unmanned aerial vehicle platform shown in fig. 1 above.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A lightweight target detection method based on an unmanned aerial vehicle platform, characterized by comprising the following steps:
acquiring an unmanned aerial vehicle inspection image;
processing the unmanned aerial vehicle inspection image by using a lightweight target detection model which is trained and embedded in an unmanned aerial vehicle platform to obtain a target detection result;
the lightweight target detection model comprises a backbone network, a path aggregation network and a detection head;
the backbone network is used for extracting spatial information and semantic information at multiple layers of different scales from the unmanned aerial vehicle inspection image to obtain multi-layer feature maps;
the path aggregation network comprises a 1×1 convolution layer and a bidirectional connection layer; the 1×1 convolution layer is used for reducing the channel number of the corresponding feature map extracted by the backbone network and transmitting the result to the bidirectional connection layer; the bidirectional connection layer is used for connecting the feature maps output by the 1×1 convolution layer with the corresponding feature maps output directly by the backbone network and, on this basis, adding a bottom-up cross-scale connection path to obtain a plurality of fused feature maps;
the detection head is used for detecting each fused feature map to obtain result maps containing target categories and positions, and the result maps are processed by a non-maximum suppression method to obtain the final target categories and position coordinates.
2. The unmanned aerial vehicle platform-based lightweight target detection method of claim 1, wherein the path aggregation network is configured to fuse the feature maps from the third layer onward extracted by the backbone network.
3. The unmanned aerial vehicle platform-based lightweight target detection method of claim 1, wherein the backbone network comprises a plurality of feature extraction layers and a spatial pyramid pooling layer; the feature extraction layers are connected in series, and the spatial pyramid pooling layer is connected at the end of the series.
4. The unmanned aerial vehicle platform-based lightweight target detection method according to claim 3, wherein a graph depth convolution network is introduced into the feature extraction layer, and the graph depth convolution network processes an input original feature map to obtain a feature map containing global information and local spatial information;
The graph depth convolution network comprises two branches: one branch comprises a local feature extraction network and a mixed local channel attention mechanism module, where the local feature extraction network extracts local features of the original feature map and the mixed local channel attention mechanism module then generates a local feature map; the other branch transmits the original feature map directly to a summation module, which sums the original feature map and the corresponding local feature map to obtain the feature map containing global information and local spatial information.
5. The unmanned aerial vehicle platform-based lightweight target detection method of claim 3, wherein the spatial pyramid pooling layer comprises a first lightweight depthwise separable convolution network, a plurality of serially connected pooling kernels, a splicing layer, and a second lightweight depthwise separable convolution network; the first lightweight depthwise separable convolution network extracts features from the input feature map; the extracted features are divided into two paths, one pooled step by step through the serially connected pooling kernels and transmitted to the splicing layer, the other transmitted directly to the splicing layer; the splicing layer splices all received features and transmits them to the second lightweight depthwise separable convolution network for further feature extraction.
6. The unmanned aerial vehicle platform-based lightweight target detection method of claim 1, wherein the detection head comprises a decoupled head structure, and a partial convolution module with an inverted residual structure is connected in series at the input end of the decoupled head structure;
The partial convolution module with the inverted residual structure is used for: after dimension lifting, normalization, and activation-function processing, computing in the high-dimensional space with a partial convolution network, and finally restoring the channel number through the inverse of the dimension lifting; the resulting feature map is sent into the decoupled head structure for processing.
7. The unmanned aerial vehicle platform-based lightweight target detection method of claim 6, wherein the partial convolution network is configured to: perform a convolution operation on part of the channels of the feature map and splice the convolution result with the remaining channels of the feature map.
8. The unmanned aerial vehicle platform-based lightweight target detection method of claim 6, wherein the activation function is an H-Swish function.
9. A lightweight target detection system based on an unmanned aerial vehicle platform, characterized by comprising:
the image acquisition module is used for acquiring an unmanned aerial vehicle inspection image;
The target detection module is used for processing the unmanned aerial vehicle inspection image by utilizing a lightweight target detection model which is trained and embedded in the unmanned aerial vehicle platform to obtain a target detection result;
the lightweight target detection model comprises a backbone network, a path aggregation network and a detection head;
the backbone network is used for extracting spatial information and semantic information at multiple layers of different scales from the unmanned aerial vehicle inspection image to obtain multi-layer feature maps;
the path aggregation network comprises a 1×1 convolution layer and a bidirectional connection layer; the 1×1 convolution layer is used for reducing the channel number of the corresponding feature map extracted by the backbone network and transmitting the result to the bidirectional connection layer; the bidirectional connection layer is used for connecting the feature maps output by the 1×1 convolution layer with the corresponding feature maps output directly by the backbone network and, on this basis, adding a bottom-up cross-scale connection path to obtain a plurality of fused feature maps;
the detection head is used for detecting each fused feature map to obtain result maps containing target categories and positions, and the result maps are processed by a non-maximum suppression method to obtain the final target categories and position coordinates.
10. A lightweight target detection system based on an unmanned aerial vehicle platform, characterized by comprising:
An image acquisition module and an image processing module;
The image acquisition module is mounted on the unmanned aerial vehicle platform; the image processing module is embedded in the unmanned aerial vehicle platform;
The image acquisition module is used for acquiring the unmanned aerial vehicle inspection image and transmitting the image to the image processing module;
The image processing module is provided with the lightweight target detection model, which is used for processing the unmanned aerial vehicle inspection image to obtain a target detection result;
the lightweight target detection model comprises a backbone network, a path aggregation network and a detection head;
the backbone network is used for extracting spatial information and semantic information at multiple layers of different scales from the unmanned aerial vehicle inspection image to obtain multi-layer feature maps;
the path aggregation network comprises a 1×1 convolution layer and a bidirectional connection layer; the 1×1 convolution layer is used for reducing the channel number of the corresponding feature map extracted by the backbone network and transmitting the result to the bidirectional connection layer; the bidirectional connection layer is used for connecting the feature maps output by the 1×1 convolution layer with the corresponding feature maps output directly by the backbone network and, on this basis, adding a bottom-up cross-scale connection path to obtain a plurality of fused feature maps;
the detection head is used for detecting each fused feature map to obtain result maps containing target categories and positions, and the result maps are processed by a non-maximum suppression method to obtain the final target categories and position coordinates.
CN202411073757.3A 2024-08-07 2024-08-07 Lightweight target detection method and system based on unmanned aerial vehicle platform Active CN118587622B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202411073757.3A CN118587622B (en) 2024-08-07 2024-08-07 Lightweight target detection method and system based on unmanned aerial vehicle platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202411073757.3A CN118587622B (en) 2024-08-07 2024-08-07 Lightweight target detection method and system based on unmanned aerial vehicle platform

Publications (2)

Publication Number Publication Date
CN118587622A (en) 2024-09-03
CN118587622B (en) 2024-11-01

Family

ID=92538316

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202411073757.3A Active CN118587622B (en) 2024-08-07 2024-08-07 Lightweight target detection method and system based on unmanned aerial vehicle platform

Country Status (1)

Country Link
CN (1) CN118587622B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116342955A (en) * 2023-03-27 2023-06-27 山东师范大学 Target detection method and system based on improved feature pyramid network
CN117315509A (en) * 2023-09-13 2023-12-29 西安工程大学 Yolov 7-based unmanned aerial vehicle aerial image small target detection method
CN118262090A (en) * 2024-03-12 2024-06-28 西安工业大学 LMSFA-YOLO-based light-weight open-set remote sensing target detection method
CN118196426A (en) * 2024-03-28 2024-06-14 淮阴工学院 Lightweight unmanned aerial vehicle intrusion detection method
CN118230166A (en) * 2024-04-06 2024-06-21 黑龙江八一农垦大学 Corn canopy organ identification method and canopy phenotype detection method based on improved Mask2YOLO network
CN118397485A (en) * 2024-04-24 2024-07-26 齐鲁工业大学(山东省科学院) Lightweight unmanned aerial vehicle image target detection method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
涂铭等 (Tu Ming et al.), "Deep Learning and Object Detection: Tools, Principles and Algorithms", China Machine Press, 30 September 2021, pages 82-84 *
苏文俊等 (Su Wenjun et al.), "Face Mask Detection and Hardware Acceleration with Improved YOLOv4", Computer Engineering and Design, vol. 44, no. 3, 31 March 2023, page 2 *

Also Published As

Publication number Publication date
CN118587622B (en) 2024-11-01

Similar Documents

Publication Publication Date Title
CN111598030B (en) Method and system for detecting and segmenting vehicle in aerial image
CN114202672A (en) Small target detection method based on attention mechanism
CN111461083A (en) Rapid vehicle detection method based on deep learning
CN108334881B (en) License plate recognition method based on deep learning
CN110991444B (en) License plate recognition method and device for complex scene
CN114519819B (en) Remote sensing image target detection method based on global context awareness
CN116385958A (en) Edge intelligent detection method for power grid inspection and monitoring
CN114495029A (en) Traffic target detection method and system based on improved YOLOv4
CN112949578B (en) Vehicle lamp state identification method, device, equipment and storage medium
CN115797629A (en) Example segmentation method based on detection enhancement and multi-stage bounding box feature refinement
CN111008979A (en) Robust night image semantic segmentation method
CN114565891A (en) Smoke and fire monitoring method and system based on graph generation technology
CN117975418A (en) Traffic sign detection method based on improved RT-DETR
CN116597326A (en) Unmanned aerial vehicle aerial photography small target detection method based on improved YOLOv7 algorithm
CN116630932A (en) Road shielding target detection method based on improved YOLOV5
CN114693963A (en) Recognition model training and recognition method and device based on electric power data feature extraction
CN118587622B (en) Lightweight target detection method and system based on unmanned aerial vehicle platform
CN115761552B (en) Target detection method, device and medium for unmanned aerial vehicle carrying platform
CN118115934A (en) Dense pedestrian detection method and system
CN116863461A (en) Vehicle detection method based on multi-mode data fusion
Jiangzhou et al. Research on real-time object detection algorithm in traffic monitoring scene
CN114332051A (en) Image-inference-based power transmission and distribution line equipment asset general survey method
CN114067360A (en) Pedestrian attribute detection method and device
CN113470012A (en) Marking recognition method, marking recognition device, storage medium and electronic device
CN112541469A (en) Crowd counting method and system based on self-adaptive classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant