CN118396071B - Boundary driving neural network structure for unmanned ship environment understanding - Google Patents
Boundary driving neural network structure for unmanned ship environment understanding
- Publication number
- CN118396071B CN118396071B CN202410867559.8A CN202410867559A CN118396071B CN 118396071 B CN118396071 B CN 118396071B CN 202410867559 A CN202410867559 A CN 202410867559A CN 118396071 B CN118396071 B CN 118396071B
- Authority
- CN
- China
- Prior art keywords
- convolution
- layer
- module
- output
- attention
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
Abstract
The invention discloses a boundary driving neural network structure for unmanned ship environment understanding, belonging to the technical field of neural network structures and used for extracting environment information from the unmanned ship's viewpoint. The structure comprises a network main branch, an edge extraction module, an edge strengthening module, and a loss function. The network main branch comprises an encoder and a decoder: the encoder comprises five residual convolution blocks, a maximum pooling layer, and an average pooling layer; the decoder comprises two attention refinement modules, a feature fusion module, and an atrous convolution pyramid pooling module. The edge extraction module comprises three serially connected boundary attention flows, several 1×1 convolution layers, and an up-sampling layer. The edge strengthening module comprises an atrous convolution pyramid module, a channel attention module, and a 1×1 convolution layer. The invention resolves the blurred segmentation of the various boundaries encountered when extracting offshore environment information from the unmanned ship's viewpoint, improves the accuracy of offshore environment information extraction, and can guide the unmanned ship more effectively in path planning and autonomous navigation.
Description
Technical Field
The invention discloses a boundary driving neural network structure for unmanned ship environment understanding, belonging to the technical field of neural network structures.
Background
An unmanned surface vehicle (USV) is small, highly portable, and typically fitted with low-power sensors for autonomous navigation, so it is used mainly for scientific exploration, environmental data collection, search and rescue, and inspection across a variety of scenarios. To navigate autonomously, a USV must sense its surroundings, understand the positions of obstacles and the navigable water area, and plan paths and avoid obstacles in advance. Common ship sensors such as radar and sonar cannot be installed on a USV because of its limited size, small load capacity, constrained energy supply, and cost. A camera, by contrast, is power-efficient and portable while providing rich environmental information. In many applications a camera combined with a deep-learning network replaces sensors such as radar, and the USV is no exception; extracting navigation-relevant information from the camera's raw images is therefore crucial to realizing autonomous navigation of an unmanned ship.
In the fields of unmanned vehicles and automated guided vehicles (AGVs), research on sensor-based autonomous navigation has progressed rapidly in recent years. However, because the operating environments of AGVs and USVs differ in many respects, networks designed for AGVs cannot simply be transferred to USVs. While an unmanned ship is under way, the constantly changing ocean background creates situations that traditional vision methods handle poorly, and methods based on ship hardware equipment also face great difficulty. With the rise of new deep-learning architectures, techniques for extracting offshore environment features from the unmanned ship's viewpoint continue to improve, and research has emerged that introduces edge information to help segment target objects more accurately: fusing inertial data from an inertial measurement unit (IMU) with the decoder's visual information to improve water-surface boundary segmentation, or estimating the water-surface boundary with least squares and median filtering to the same end. These studies all assist feature extraction by estimating the position of the water-surface boundary, but in complex marine environments that prediction always carries large errors. Moreover, the edge information captured while a USV is under way includes not only the water-surface boundary but also boundaries between the sky and obstacles, boundaries of obstacles in the water, and so on, so extracting offshore environment features from the unmanned ship's viewpoint with existing methods remains a major challenge.
Disclosure of Invention
The invention aims to provide a boundary driving neural network structure for unmanned ship environment understanding that solves the low extraction precision of offshore environment feature information in the prior art.
A boundary driving neural network structure for unmanned ship environment understanding comprises a network main branch, an edge extraction module, an edge strengthening module, and a loss function. The network main branch comprises an encoder and a decoder; the encoder adopts a ResNet structure and comprises five residual convolution blocks, a maximum pooling layer, and an average pooling layer, while the decoder comprises two attention refinement modules, a feature fusion module, and an atrous convolution pyramid pooling module. The edge extraction module comprises three serially connected boundary attention flows, several 1×1 convolution layers, and an up-sampling layer. The edge strengthening module comprises an atrous convolution pyramid module, a channel attention module, and a 1×1 convolution layer.
Offshore environment images captured by the unmanned ship are divided into several data sets, which are imported into the neural network structure to extract offshore environment information features.
The five residual convolution blocks of the encoder contain 1, 9, 12, 69, and 9 convolution layers respectively; the convolution kernel sizes include 7×7, 1×1, and 3×3; the stride of the convolution layers in the first residual convolution block is 2, and the stride of the convolution layers in the remaining residual convolution blocks is 1.
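These stage sizes match a standard ResNet101 backbone (conv1 plus layer1–layer4 contain 1, 9, 12, 69, and 9 convolutions). As a rough illustration only, the sketch below wraps the torchvision ResNet101 as a multi-scale feature encoder; the patent does not specify an implementation, and its stride layout (stride 2 only in the first block) suggests a DeepLab-style dilated variant rather than the torchvision default, which also downsamples in layer2–layer4:

```python
import torch
import torchvision

class ResNet101Encoder(torch.nn.Module):
    """Hypothetical encoder wrapper exposing the five residual stages."""
    def __init__(self):
        super().__init__()
        net = torchvision.models.resnet101(weights=None)
        self.stem = torch.nn.Sequential(net.conv1, net.bn1, net.relu)  # 1 conv, stride 2
        self.maxpool = net.maxpool            # max pooling layer of the encoder
        self.layer1 = net.layer1              # 3 bottlenecks -> 9 convolutions
        self.layer2 = net.layer2              # 4 bottlenecks -> 12 convolutions
        self.layer3 = net.layer3              # 23 bottlenecks -> 69 convolutions
        self.layer4 = net.layer4              # 3 bottlenecks -> 9 convolutions
        self.avgpool = torch.nn.AvgPool2d(2)  # average pooling feeding the decoder (assumed size)

    def forward(self, x):
        x0 = self.stem(x)
        p0 = self.maxpool(x0)
        x1 = self.layer1(p0)
        x2 = self.layer2(x1)
        x3 = self.layer3(x2)
        x4 = self.layer4(x3)
        # taps used by the decoder and the edge extraction branch
        return x0, p0, x1, x2, x3, self.avgpool(x4)
```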
In the decoder, each attention refinement module comprises a down-sampling layer, a 1×1 convolution layer, a normalization layer, and a Sigmoid activation layer; one attention refinement module receives the output of the encoder pooling layer, and the other receives the output of the second residual convolution block of the encoder.
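A minimal sketch of one plausible attention refinement module consistent with this description follows; using global average pooling as the down-sampling layer is an assumption (as in BiSeNet-style refinement modules), not stated in the patent:

```python
import torch.nn as nn

class AttentionRefinementModule(nn.Module):
    """Down-sample -> 1x1 conv -> batch norm -> sigmoid -> channel-wise rescale."""
    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)           # down-sampling layer (assumed global)
        self.conv = nn.Conv2d(channels, channels, 1)  # 1x1 convolution layer
        self.bn = nn.BatchNorm2d(channels)            # normalization layer
        self.act = nn.Sigmoid()                       # Sigmoid activation layer

    def forward(self, x):
        w = self.act(self.bn(self.conv(self.pool(x))))
        return x * w                                  # attention-refined feature map
```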
The feature fusion module comprises a 3×3 convolution layer, two 1×1 convolution layers, a normalization and ReLU activation function layer, a down-sampling layer, and a Sigmoid activation function layer; the outputs of the two attention refinement modules are Concat-spliced, and the feature fusion module receives this splicing result together with the output of the second residual convolution block.
The atrous convolution pyramid pooling module comprises four atrous convolution layers, a pooling layer, two 1×1 convolution layers, and an up-sampling layer; the kernel sizes of the four atrous convolution layers are 1×1, 3×3, 3×3, and 3×3, with dilation rates of 1, 6, 12, and 18 respectively. The module receives the output of the feature fusion module and, after an up-sampling operation, its output serves as the output of the main branch.
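The kernel sizes and dilation rates match the ASPP block of DeepLabv3+; a minimal sketch under that assumption (exact channel widths and normalization layers are not given in the patent):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AtrousPyramidPooling(nn.Module):
    """Four parallel atrous convolutions (dilations 1, 6, 12, 18), a pooled
    image-level branch, 1x1 projection convolutions, and up-sampling."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.b0 = nn.Conv2d(in_ch, out_ch, 1)                          # 1x1, dilation 1
        self.b1 = nn.Conv2d(in_ch, out_ch, 3, padding=6, dilation=6)
        self.b2 = nn.Conv2d(in_ch, out_ch, 3, padding=12, dilation=12)
        self.b3 = nn.Conv2d(in_ch, out_ch, 3, padding=18, dilation=18)
        self.pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(in_ch, out_ch, 1))         # pooling + first 1x1
        self.project = nn.Conv2d(5 * out_ch, out_ch, 1)                # second 1x1 conv

    def forward(self, x):
        h, w = x.shape[-2:]
        pooled = F.interpolate(self.pool(x), size=(h, w),
                               mode="bilinear", align_corners=False)   # up-sampling layer
        y = torch.cat([self.b0(x), self.b1(x), self.b2(x), self.b3(x), pooled], dim=1)
        return self.project(y)
```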
Each boundary attention flow comprises three 1×1 convolution layers, ReLU activation function layers, and Sigmoid activation function layers.
The inputs of the first boundary attention flow are the input image processed by a 1×1 convolution layer and the output of the first residual convolution block of the main branch processed by an up-sampling layer; the inputs of the second boundary attention flow are the output of the first boundary attention flow processed by a 1×1 convolution layer and the output of the maximum pooling layer of the main branch processed by an up-sampling layer; the inputs of the third boundary attention flow are the output of the second boundary attention flow processed by a 1×1 convolution layer and the output of the third residual convolution block of the main branch processed by an up-sampling layer. The outputs of the three boundary attention flows are Concat-spliced as the final output of the edge extraction module.
The atrous convolution pyramid module comprises four atrous convolution layers with kernel sizes of 1×1, 3×3, and 3×3 and dilation rates of 1, 4, and 8. Its input is the result of Concat-splicing the network main-branch output with the edge extraction module output, and its results are Concat-spliced and output.
The channel attention module comprises a maximum pooling layer, an average pooling layer, a multi-layer perceptron, and a Sigmoid activation function layer. It receives the output of the atrous convolution pyramid module as input, and its output, passed through a 1×1 convolution layer, serves as the final output of the edge strengthening module.
The loss function includes a feature segmentation loss L_seg, a boundary loss L_bdy, and a focal loss L_focal.

The feature segmentation loss is computed from the output of the fourth residual convolution block of the main branch; the boundary loss is computed from the output of the third edge attention flow and the boundary ground truth; the focal loss is computed from the output of the edge strengthening module and the feature ground truth. The network total loss function L_total is the weighted sum of the three:

L_total = λ1·L_seg + λ2·L_bdy + λ3·L_focal

where λ1, λ2, and λ3 are the weights corresponding to the three losses.
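As a trivial sketch of this weighted sum (the weight values shown are the ones reported later in the description; the function and argument names are illustrative):

```python
def total_loss(l_seg, l_bdy, l_focal, w_seg=1.0, w_bdy=0.1, w_focal=0.1):
    """L_total = w_seg*L_seg + w_bdy*L_bdy + w_focal*L_focal."""
    return w_seg * l_seg + w_bdy * l_bdy + w_focal * l_focal
```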
The neural network structure is trained by iterating and optimizing the network parameters with a stochastic gradient descent method, using the constructed data set to pre-train the network.

Network training uses the training set and the validation set, with the pre-training hyper-parameters and loss-function weights set in advance; the pre-training hyper-parameters comprise the maximum number of iterations, the initial learning rate, the weight decay rate, and the batch size.
After training, the neural network structure is tested to verify the environment information extraction effect.
Compared with the prior art, the invention has the following beneficial effects: the trained neural network structure extracts the water-surface boundary rapidly, with high extraction precision, a low false-detection rate, and good edge segmentation, improving the accuracy of boundary feature extraction within offshore environment feature information.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions in the present invention will be clearly and completely described below, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Semantic segmentation of the offshore environment based on conventional ship hardware is difficult to apply to unmanned ships because of limits on size, load capacity, and power consumption. Current research on offshore semantic segmentation focuses mainly on suppressing environmental noise and improving model performance, while neglecting the boundary features that govern segmentation accuracy; existing edge-based semantic segmentation techniques, moreover, consider only the water-surface boundary and ignore boundaries between the sky and obstacles, boundaries of obstacles in the water, and so on. To achieve accurate semantic segmentation of camera images in complex offshore environments, and thereby guide the unmanned ship effectively in autonomous navigation, the invention designs a boundary driving neural network structure for unmanned ship environment understanding. By introducing an edge extraction module and an edge strengthening module that act synergistically, the network can effectively identify boundary features between categories and segment the various offshore environment elements more accurately.
The images of the MaSTr1325 data set, commonly used in offshore semantic segmentation, are resized to 512 px × 384 px; boundary ground-truth maps are extracted from the data-set labels; the data set is then split 8:1:1 into a training set, a validation set, and a test set.
When the unmanned ship travels on the sea surface, the information captured by the camera consists mainly of three parts: sky, coast, and sea surface. Waves and reflections make the sea-surface information complex, which places higher demands on the network encoder. Following an analysis of different encoder architectures, the ResNet101 backbone of the prior-art DeepLabv3+ is adopted as the encoder of the present network, since it offers higher accuracy than other networks on complex offshore segmentation tasks.
The network model is then tested and the segmentation effect verified.
Sub-step 1: testing on the MaSTr1325 data set. The test split of the data set constructed in step one is used. The trained network predicts the test-set images, and the evaluation indices mIoU and F1 are used to evaluate the segmentation accuracy and quality of the network. For comparison, the general-purpose semantic segmentation networks DeepLabv3+ and HRNet, together with WaSRNet, a semantic segmentation network for the maritime domain, are evaluated against the designed network to further verify its effectiveness.
Sub-step 2: testing on the MODD2 data set. The trained network predicts the images in the data set, and segmentation accuracy and quality are evaluated with MODS, an evaluation system for target detection and obstacle segmentation in the unmanned-ship operating environment. Again, DeepLabv3+, the keypoint-detection network HRNet, and the maritime segmentation network WaSRNet are compared with the designed network to further verify its effectiveness.
Sub-step 3: ablation experiments on the MODD2 data set for the edge extraction module and the edge strengthening module. To further verify the roles of the two modules in the designed network, the MaSTr data set constructed in step one is used to train four variants: the network containing only the main branch, the network with the edge extraction module added, the network with the edge strengthening module added, and the network with both modules added; all variants are then tested on the MODD2 data set with the MODS evaluation system.
To achieve more accurate semantic segmentation in complex offshore environments, and thus better guide autonomous navigation of the unmanned ship, the invention provides a boundary strengthening network that fuses multi-scale features. The network compensates for the boundary information lost in the encoder through deep convolution, and resolves the inaccurate segmentation caused by blurred category edges in the offshore environment by fusing boundary information with semantic information. The invention provides an edge extraction module, which fuses the low-level and high-level features of the main branch to enrich the boundary feature information; serial edge attention flows adaptively filter out boundary-irrelevant information, keeping the edge features independent of the other semantic features, and an edge loss function supplies additional supervision to guarantee the accuracy of the module. The invention further provides an edge strengthening module, which fuses the edge extraction result with the semantic segmentation result; atrous convolution layers form an atrous convolution pyramid module and a channel attention module is introduced, so that the receptive field is enlarged while the relationships among channel features are strengthened, the perception of edge-information context is enhanced, and the accuracy of every edge in the segmentation result is improved.
The network input is a three-channel RGB image, from which the encoder extracts features layer by layer; the high-level features output by the average pooling layer are finally sent to the decoder. In the decoder, this feature is first fed into one attention refinement module, while the low-level features from the residual convolution block are fed into the other. By extracting feature maps and fusing low-level and high-level features, the main branch can segment each class effectively; however, because the main branch does not learn category-edge features, the network's understanding of them is insufficient and the category edges of its output are blurred. To compensate for this limitation, the edge extraction module and the edge strengthening module are introduced to enhance the model's capture of edge features and improve the accuracy of semantic segmentation.
The key to high-precision semantic segmentation in a complex offshore environment is to delineate the boundaries between the semantic information of the various categories precisely. To obtain accurate boundary information from the input image and the decoder, the invention designs the edge extraction module. Compared with high-level feature maps, the low-level feature maps near the input image have higher resolution and thus more edge features but less semantic information; high-level feature maps carry richer semantics and a larger receptive field, but their lower resolution loses edge features. To combine the advantages of both, reducing the loss of boundary information while retaining rich semantics, the network fuses the high-resolution low-level features with the low-resolution high-level features, so that the low-level features supply sharp edge information and the high-level features supply rich semantic information. Because such fusion introduces task-irrelevant disturbance information that would impair edge extraction, the edge attention flow is designed to concentrate the network's attention on edge-related information and adaptively reject the irrelevant part.
The number of channels is adjusted by a 1×1 convolution layer and a Sigmoid activation function layer to obtain an attention weight map A. Comparing A with (1 − A) element-wise, the larger value is assigned to attention weight map B and the smaller value to attention weight map C.
This operation is defined as follows:

B = max(A, 1 − A), C = min(A, 1 − A) (element-wise).
Multiplying the attention weight map B element-wise with the first input preserves, in the fused feature map, the edge information missing from the high-level features while suppressing irrelevant interference, strengthening the network's ability to learn edge features without significantly increasing computational complexity.
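A minimal sketch of one plausible reading of this gating step (the channel counts, the requirement that both inputs share a spatial size, and the fate of map C, which the text leaves unspecified, are all assumptions):

```python
import torch
import torch.nn as nn

class BoundaryAttentionGate(nn.Module):
    """Fuse an edge-rich low-level input with a semantic high-level input,
    derive weight map A, then gate with B = max(A, 1-A), C = min(A, 1-A)."""
    def __init__(self, ch_low: int, ch_high: int):
        super().__init__()
        self.fuse = nn.Conv2d(ch_low + ch_high, 1, kernel_size=1)  # 1x1 conv -> 1-channel map

    def forward(self, feat_low, feat_high):
        # assumes feat_high was already up-sampled to feat_low's spatial size
        a = torch.sigmoid(self.fuse(torch.cat([feat_low, feat_high], dim=1)))  # map A
        b = torch.maximum(a, 1.0 - a)   # larger of A and (1 - A)
        c = torch.minimum(a, 1.0 - a)   # smaller of A and (1 - A); its use is unspecified
        return feat_low * b             # B multiplies the first input element-wise
```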
After the semantic segmentation result and the edge extraction result of the image are obtained, the two are fused to resolve the unclear edges between categories, for which the edge strengthening module is designed; it strengthens edge features while fusing the two kinds of features, making the edge information more salient, with the aim of resolving blurred category edges and improving segmentation accuracy. The edge strengthening module comprises two sub-modules: the atrous convolution pyramid (ACP) module and the channel attention (CA) module. The two inputs of the ACP module are the main-branch output and the edge extraction module output; they are spliced and then sent to four convolution layers. This design enlarges the receptive field so that edge features, including the surrounding background and structures, are understood more thoroughly, while the semantic information of each category is captured better, improving segmentation accuracy and adaptability to complex scenes. The features processed by the ACP module are joined by a Concat operation and passed to the channel attention module. To preserve the most significant spatial features in each channel, the input first undergoes average pooling and maximum pooling; the two pooled results are sent to a shared multi-layer perceptron with two hidden layers, each with half as many neurons as there are channels, further integrating the multi-scale, multi-channel feature information. The two perceptron outputs are summed and normalized by a Sigmoid activation function layer to yield the channel attention weight map M, which multiplies the module input channel by channel to give the adjusted channel features. These are added back to the original input, and a 1×1 convolution layer adjusts the channel number to produce the prediction result. Through the synergy of the ACP module and the CA mechanism, the edge strengthening module integrates the semantic features output by the main branch with the edge features output by the edge extraction module more effectively, so that the network captures the positions and interrelations of edges across the whole image more accurately and its global understanding of edge features improves. The module also helps relate edge information to the surrounding semantic information, improving context perception at edges, the model's performance on edge-adjacent segmentation, and the accuracy of the final result.
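A sketch of the channel attention sub-module as described above (shared two-hidden-layer MLP of width C/2, summed pooled descriptors, sigmoid weights M, channel-wise rescale, residual add, final 1×1 convolution); the output channel count of 3 is an assumption based on the three segmentation classes:

```python
import torch
import torch.nn as nn

class ChannelAttentionHead(nn.Module):
    def __init__(self, channels: int, out_channels: int = 3):  # 3 classes assumed
        super().__init__()
        hidden = channels // 2                     # half the channel count, per the text
        self.avg = nn.AdaptiveAvgPool2d(1)
        self.max = nn.AdaptiveMaxPool2d(1)
        self.mlp = nn.Sequential(                  # shared MLP with two hidden layers
            nn.Conv2d(channels, hidden, 1), nn.ReLU(),
            nn.Conv2d(hidden, hidden, 1), nn.ReLU(),
            nn.Conv2d(hidden, channels, 1))
        self.out = nn.Conv2d(channels, out_channels, 1)  # final 1x1 convolution

    def forward(self, x):
        m = torch.sigmoid(self.mlp(self.avg(x)) + self.mlp(self.max(x)))  # weight map M
        y = x * m          # channel-by-channel reweighting
        y = y + x          # add the original input back
        return self.out(y)
```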
The output of the designed network comprises two parts: the edge features extracted by the last edge attention flow in the edge extraction module, and the semantic segmentation result produced by the edge strengthening module. The semantic segmentation task divides the image elements into three categories: sea surface, sky, and obstacles. Two problems must be addressed. First, sea surface and sky normally occupy the vast majority of the image, far outnumbering obstacle elements, which produces sample imbalance between the classes. Second, complex optical phenomena on the sea surface, such as reflection and waves, cause its feature information to vary markedly, making sea-surface elements difficult to segment.
For the class-imbalance problem the network uses the focal loss L_focal, which alleviates imbalance by reweighting the loss so that the model concentrates on hard-to-classify samples. For the difficulty of segmenting sea-surface elements, training emphasizes the sea-surface segmentation task so as to capture and learn the particular visual characteristics of the sea surface and improve the model on sea-surface complexity. The main reason sea-surface elements are hard to learn and segment is reflection under strong sunlight, which can cause parts of the sea surface to be mistaken for sky or obstacles. While the unmanned ship is under way, falsely detecting sky as sea surface has little effect on path planning and safety; falsely identifying an obstacle as sea surface, however, can lead to erroneous driving plans with serious consequences. To avoid such false positives, the network should learn accurate encodings at the early layers, producing highly similar features for sea-surface appearance and clearly different features for obstacles. The feature segmentation loss L_seg is therefore introduced, taking the output of the fourth residual convolution block of the main branch as its input; by learning sea-surface appearance features it clusters the sea-surface semantic information and separates it from obstacle semantic information, making the two independent and separating sea surface from obstacles more cleanly. For the edge extraction result, the boundary loss L_bdy is introduced, with edge ground truth supervising the accuracy of edge extraction. Edge extraction is essentially a pixel-level binary classification: each pixel is classified as edge (positive) or non-edge (negative). The invention therefore computes this loss with binary cross-entropy, which penalizes the model's misclassification of every pixel and drives it to learn the fine features of edges, defined as follows:
L_bdy = −(1/N) Σ_{i=1}^{N} [ y_i · log_b(ŷ_i) + (1 − y_i) · log_b(1 − ŷ_i) ]

where i is a pixel position, N is the number of pixel positions, y_i is the edge ground-truth value, ŷ_i is the edge prediction value, and b is the logarithm base: when b equals 2 the entropy is measured in bits, and when b equals e it is measured in nats.
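A minimal sketch of this pixel-wise binary cross-entropy under the natural-log (nats) convention; the clamping constant is an implementation detail, not from the patent:

```python
import torch

def boundary_bce_loss(pred_edge, true_edge, eps: float = 1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all pixel positions.
    pred_edge holds per-pixel edge probabilities in (0, 1); true_edge is 0/1."""
    p = pred_edge.clamp(eps, 1.0 - eps)   # numerical safety near 0 and 1
    ll = true_edge * torch.log(p) + (1.0 - true_edge) * torch.log(1.0 - p)
    return -ll.mean()                     # torch.log is natural log -> units of nats
```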
Iteration and optimization of the network parameters use stochastic gradient descent. The data set constructed in step one is used to pre-train the designed deep-learning network: the training set of about 1060 pictures and the validation set of about 130 pictures are used for network training, with the pre-training hyper-parameters and loss-function weights set as follows. The pre-training hyper-parameters are: maximum number of iterations 100, initial learning rate 1×10⁻⁶, weight decay rate 1×10⁻⁶, and batch size 8.
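A sketch of the reported training configuration (SGD, 100 iterations, learning rate 1×10⁻⁶, weight decay 1×10⁻⁶, batch size 8); `model`, `train_loader`, `seg_loss`, and `focal_loss` are assumed placeholders rather than names from the patent, and `boundary_bce_loss` is the sketch defined above:

```python
import torch

optimizer = torch.optim.SGD(model.parameters(), lr=1e-6, weight_decay=1e-6)

for epoch in range(100):                            # maximum number of iterations
    for images, seg_gt, edge_gt in train_loader:    # batches of size 8
        aux_out, edge_out, seg_out = model(images)  # 4th residual block, edge branch, final head
        loss = (1.0 * seg_loss(aux_out, seg_gt)              # feature segmentation loss, weight 1
                + 0.1 * boundary_bce_loss(edge_out, edge_gt)  # boundary loss, weight 0.1
                + 0.1 * focal_loss(seg_out, seg_gt))          # focal loss, weight 0.1
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```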
The loss-function weights are λ1 = 1, λ2 = 0.1, and λ3 = 0.1.
Testing is performed on the MaSTr1325 data set, using the test split of about 133 pictures from the data set constructed in step one. The trained network predicts the test-set images, and the evaluation indices mIoU and F1 are used to evaluate the segmentation accuracy and quality of the network, defined as follows:

IoU_c = TP_c / (TP_c + FP_c + FN_c), mIoU = (1/C) Σ_{c=1}^{C} IoU_c, F1 = 2TP / (2TP + FP + FN)

where TP, FP, and FN denote the true-positive, false-positive, and false-negative counts respectively, C denotes the number of categories, and c denotes the c-th category.
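A minimal sketch of these two indices computed from label maps (the micro-averaged F1 shown is one plausible reading of the stripped formula):

```python
import numpy as np

def miou_and_f1(pred, gt, num_classes: int = 3, eps: float = 1e-9):
    """Per-class IoU averaged over classes, plus micro-averaged F1."""
    ious, tp_all, fp_all, fn_all = [], 0, 0, 0
    for c in range(num_classes):
        tp = int(np.sum((pred == c) & (gt == c)))   # true positives for class c
        fp = int(np.sum((pred == c) & (gt != c)))   # false positives
        fn = int(np.sum((pred != c) & (gt == c)))   # false negatives
        ious.append(tp / (tp + fp + fn + eps))
        tp_all, fp_all, fn_all = tp_all + tp, fp_all + fp, fn_all + fn
    f1 = 2 * tp_all / (2 * tp_all + fp_all + fn_all + eps)
    return float(np.mean(ious)), float(f1)
```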
Meanwhile, the general-purpose semantic segmentation networks DeepLabv3+ and HRNet and the maritime segmentation network WaSRNet are compared with the designed network to further verify its effectiveness; the results are shown in Table 1.
Table 1 Effect comparison (table not reproduced here)
Table 1 summarizes the predictions of the different networks on the MaStr data set. The designed network achieves better segmentation accuracy than all the others on sea surface, sky, and obstacles. For sea surface and sky, which occupy a relatively high proportion of each picture, the differences between networks are small: the designed network leads second-place WaSRNet by only 0.03% and 0.04% in IoU, respectively. For obstacles, which occupy much less of the picture than sky and sea surface, the network's strengthened boundary information produces a clear IoU gap over the other networks: about 0.28% ahead of second-place WaSRNet, and markedly more ahead of HRNet. In the qualitative results on the MaStr data set, covering shore structures (first, third, and fifth rows), a marine buoy (fourth row), and a yacht (second row), the designed network segments the boundaries better than the remaining networks. This further demonstrates that the invention can indeed accomplish the semantic segmentation task in the unmanned ship's offshore environment.
Testing is performed on the MODD2 data set. All 94 sequences in the data set, 80828 pictures in total, are used for prediction; every picture is resized from 640 × 480 to 512 × 384 before entering the network. The trained network predicts the test images, and segmentation accuracy and quality are evaluated with MODS, the target-detection and obstacle-segmentation evaluation system for the unmanned-ship operating environment. The evaluation criteria are as follows:
Obstacle–water edge positioning accuracy μ_A: the ground truth of the evaluation data set includes the water-edge position of each picture. For the pixels of the predicted water mask that lie closest, in vertical distance, to the water-edge ground truth, μ_A is defined as the square root of the mean squared distance to that ground truth:

μ_A = sqrt( (1/N) Σ_{i=1}^{N} d_i² )

where d_i is the distance of the i-th such pixel from the water-edge ground truth and N is the number of those pixels.
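A rough sketch of this metric (the rule for picking the predicted edge pixel in each column is paraphrased from the text and is an assumption):

```python
import numpy as np

def water_edge_accuracy(pred_mask, gt_edge_rows):
    """mu_A: RMS vertical distance between predicted and ground-truth water edge.
    pred_mask is an HxW array with water pixels == 1; gt_edge_rows[x] gives the
    ground-truth water-edge row index for image column x."""
    dists = []
    for x, gt_row in enumerate(gt_edge_rows):
        rows = np.where(pred_mask[:, x] == 1)[0]
        if rows.size:                        # topmost water pixel = predicted edge
            dists.append(float(rows.min() - gt_row))
    return float(np.sqrt(np.mean(np.square(dists)))) if dists else float("nan")
```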
Precision Pr reflects the accuracy of the model's semantic segmentation; recall Re reflects its ability to identify all positive examples; the false-detection rate is the average number of false-positive detections per hundred images; and the F1 score combines precision and recall to evaluate overall model performance. The prediction results are shown in Table 2.
Table 2 Comparison of prediction results (table not reproduced here)
Table 2 summarizes the prediction results of the different networks on the MODD2 data set under the MODS evaluation criteria. In obstacle–water edge positioning accuracy, thanks to the boundary extraction and strengthening modules, the network's mean water-edge deviation reaches 16.4 pixels, about 0.5 pixels better than second-place WaSRNet; this confirms that the boundary modules play an important role in improving edge accuracy and have a marked effect on water-edge positioning. In precision (Pr) the network surpasses all others at 96.3%, followed by WaSRNet at 95.2%; both far exceed the remaining networks because both are designed primarily for offshore environments. The designed network also segments boundaries better than the other networks. In summary, it achieves the highest obstacle–water edge positioning accuracy, the lowest false-detection rate, the highest F1, and the best edge segmentation, making it better suited to guiding autonomous navigation of the unmanned ship.
Ablation experiments on the edge extraction module and the edge strengthening module are performed on the MODD2 data set. To further verify the two modules' roles in the designed network, the MaSTr data set constructed in step one is used, with the same parameters and split ratio as in step 3, to train four variants: the network containing only the main branch, the network with the edge extraction module, the network with the edge strengthening module, and the network with both; all are tested on the MODD2 data set with the MODS evaluation system, using the same MODD2 settings and evaluation parameters as sub-step 2. The prediction results are shown in Table 3.
Table 3 Comparison of ablation experiment results (table not reproduced here)
Adding the boundary extraction module reduces μ_A by 0.4 px, mainly because it fuses multi-scale semantic information and thus delineates the sea boundary more accurately. Adding the boundary strengthening module reduces μ_A by 0.3 px, raises precision by 0.5%, and lowers the false-detection rate by 0.3%, mainly because its atrous convolution and channel attention mechanism enlarge the receptive field and strengthen the contextual relations among the semantic information of the channels; it fills in boundary details, though imperfectly. With both modules introduced, the network successfully fuses multi-scale semantic information with boundary information, strengthens the association between boundary information and multi-channel information, and improves the edge detail of the segmentation result over the main branch alone. As Table 3 shows, all indicators demonstrate the effectiveness of these modules compared with the baseline and the other variants.
The above embodiments are only for illustrating the technical aspects of the present invention, not for limiting the same, and although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may be modified or some or all of the technical features may be replaced with other technical solutions, which do not depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (1)
1. A method for constructing a boundary driving neural network structure for unmanned ship environment understanding, characterized in that the structure comprises a network main branch, an edge extraction module, an edge strengthening module and a loss function; the network main branch comprises an encoder and a decoder, the encoder adopts a ResNet structure and comprises five residual convolution blocks, a maximum pooling layer and an average pooling layer, and the decoder comprises two attention refinement modules, a feature fusion module and an atrous convolution pyramid pooling module; the edge extraction module comprises three serially connected boundary attention flows, several 1×1 convolution layers and an up-sampling layer; the edge strengthening module comprises an atrous convolution pyramid module, a channel attention module and a 1×1 convolution layer;

offshore environment images captured by the unmanned ship are divided into several data sets, and the data sets are imported into the neural network structure to extract offshore environment information features;

the five residual convolution blocks of the encoder contain 1, 9, 12, 69 and 9 convolution layers respectively, the convolution kernel sizes of the convolution layers include 7×7, 1×1 and 3×3, the stride of the convolution layers in the first residual convolution block is 2, and the stride of the convolution layers in the remaining residual convolution blocks is 1;

in the decoder, each attention refinement module comprises a down-sampling layer, a 1×1 convolution layer, a normalization layer and a Sigmoid activation layer; one attention refinement module receives the output of the encoder pooling layer, and the other receives the output of the second residual convolution block of the encoder;

the feature fusion module comprises a 3×3 convolution layer, two 1×1 convolution layers, a normalization and ReLU activation function layer, a down-sampling layer and a Sigmoid activation function layer; the outputs of the two attention refinement modules are Concat-spliced, and the feature fusion module receives the splicing result together with the output of the second residual convolution block;

the atrous convolution pyramid pooling module comprises four atrous convolution layers, a pooling layer, two 1×1 convolution layers and an up-sampling layer; the kernel sizes of the four atrous convolution layers are 1×1, 3×3, 3×3 and 3×3, with dilation rates of 1, 6, 12 and 18 respectively; the module receives the output of the feature fusion module and, after processing by an up-sampling operation, its output serves as the output of the main branch;

each boundary attention flow comprises three 1×1 convolution layers, ReLU activation function layers and Sigmoid activation function layers;

the inputs of the first boundary attention flow are the input image processed by a 1×1 convolution layer and the output of the first residual convolution block of the main branch processed by an up-sampling layer; the inputs of the second boundary attention flow are the output of the first boundary attention flow processed by a 1×1 convolution layer and the output of the maximum pooling layer of the main branch processed by an up-sampling layer; the inputs of the third boundary attention flow are the output of the second boundary attention flow processed by a 1×1 convolution layer and the output of the third residual convolution block of the main branch processed by an up-sampling layer; the outputs of the three boundary attention flows are Concat-spliced as the final output of the edge extraction module;

the atrous convolution pyramid module comprises four atrous convolution layers with kernel sizes of 1×1, 3×3 and 3×3 and dilation rates of 1, 4 and 8; the input of the atrous convolution pyramid module is the result of Concat-splicing the network main-branch output with the edge extraction module output, and the results of the atrous convolution pyramid module are Concat-spliced and output;

the channel attention module comprises a maximum pooling layer, an average pooling layer, a multi-layer perceptron and a Sigmoid activation function layer; it receives the output of the atrous convolution pyramid module as input, and its output, passed through a 1×1 convolution layer, serves as the final output of the edge strengthening module;
the loss function includes a feature segmentation loss L_seg, a boundary loss L_bdy and a focal loss L_focal;

the feature segmentation loss is computed from the output of the fourth residual convolution block of the main branch, the boundary loss is computed from the output of the third edge attention flow and the boundary ground truth, and the focal loss is computed from the output of the edge strengthening module and the feature ground truth; the network total loss function L_total is the weighted sum of the three:

L_total = λ1·L_seg + λ2·L_bdy + λ3·L_focal;

wherein λ1, λ2 and λ3 are the weights corresponding to the three losses;
training the neural network structure, iterating and optimizing the network parameters by a stochastic gradient descent method, and pre-training the neural network structure with the constructed data set;

performing network training with the training set and the validation set, and setting pre-training hyper-parameters and loss-function weights, wherein the pre-training hyper-parameters comprise a maximum number of iterations, an initial learning rate, a weight decay rate and a batch size;
after training the neural network structure, testing the neural network structure and verifying the environment information extraction effect.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410867559.8A CN118396071B (en) | 2024-07-01 | 2024-07-01 | Boundary driving neural network structure for unmanned ship environment understanding |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410867559.8A CN118396071B (en) | 2024-07-01 | 2024-07-01 | Boundary driving neural network structure for unmanned ship environment understanding |
Publications (2)
Publication Number | Publication Date |
---|---|
CN118396071A CN118396071A (en) | 2024-07-26 |
CN118396071B true CN118396071B (en) | 2024-09-03 |
Family
ID=91987696
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410867559.8A Active CN118396071B (en) | 2024-07-01 | 2024-07-01 | Boundary driving neural network structure for unmanned ship environment understanding |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN118396071B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117095277A (en) * | 2023-07-31 | 2023-11-21 | 大连海事大学 | Edge-guided multi-attention RGBD underwater salient object detection method |
CN117132870A (en) * | 2023-10-25 | 2023-11-28 | 西南石油大学 | Wing icing detection method combining CenterNet and mixed attention |
Family Cites Families (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111783514A (en) * | 2019-11-18 | 2020-10-16 | 北京京东尚科信息技术有限公司 | Face analysis method, face analysis device and computer-readable storage medium |
CN111462126B (en) * | 2020-04-08 | 2022-10-11 | 武汉大学 | Semantic image segmentation method and system based on edge enhancement |
CN112884760B (en) * | 2021-03-17 | 2023-09-26 | 东南大学 | Intelligent detection method for multi-type diseases of near-water bridge and unmanned ship equipment |
WO2022258142A1 (en) * | 2021-06-08 | 2022-12-15 | Huawei Technologies Co., Ltd. | Method and neural network for extracting road data from satellite imagery |
CN115841134A (en) * | 2021-09-18 | 2023-03-24 | 华为技术有限公司 | Neural network model optimization method and related equipment |
CN115526829A (en) * | 2022-07-29 | 2022-12-27 | 太原理工大学 | Honeycomb lung focus segmentation method and network based on ViT and context feature fusion |
CN116363361A (en) * | 2023-03-13 | 2023-06-30 | 湖南师范大学 | Automatic driving method based on real-time semantic segmentation network |
CN116363368A (en) * | 2023-04-23 | 2023-06-30 | 云南电网有限责任公司电力科学研究院 | Image semantic segmentation method and device based on convolutional neural network |
CN117036198A (en) * | 2023-08-23 | 2023-11-10 | 太原科技大学 | LDCT artifact suppression method for fusion of edge priori and CNN-transducer |
CN117333459A (en) * | 2023-10-09 | 2024-01-02 | 山东大学 | Image tampering detection method and system based on double-order attention and edge supervision |
CN117372829B (en) * | 2023-10-25 | 2024-09-24 | 中国船舶集团有限公司第七一一研究所 | Marine vessel target identification method, device, electronic equipment and readable medium |
CN117495876B (en) * | 2023-12-29 | 2024-03-26 | 山东大学齐鲁医院 | Coronary artery image segmentation method and system based on deep learning |
CN118096672A (en) * | 2024-02-07 | 2024-05-28 | 南京航空航天大学 | Road surface crack detection method based on edge reconstruction network |
-
2024
- 2024-07-01 CN CN202410867559.8A patent/CN118396071B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117095277A (en) * | 2023-07-31 | 2023-11-21 | 大连海事大学 | Edge-guided multi-attention RGBD underwater salient object detection method |
CN117132870A (en) * | 2023-10-25 | 2023-11-28 | 西南石油大学 | Wing icing detection method combining CenterNet and mixed attention |
Also Published As
Publication number | Publication date |
---|---|
CN118396071A (en) | 2024-07-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3690714B1 (en) | Method for acquiring sample images for inspecting label among auto-labeled images to be used for learning of neural network and sample image acquiring device using the same | |
CN110147763B (en) | Video semantic segmentation method based on convolutional neural network | |
CN112766087A (en) | Optical remote sensing image ship detection method based on knowledge distillation | |
CN111914924B (en) | Rapid ship target detection method, storage medium and computing equipment | |
CN113591968A (en) | Infrared weak and small target detection method based on asymmetric attention feature fusion | |
CN110334703B (en) | Ship detection and identification method in day and night image | |
CN111753986A (en) | Dynamic testing method and device for deep learning model | |
CN110020658B (en) | Salient object detection method based on multitask deep learning | |
CN111027445B (en) | Marine ship target identification method | |
CN116469020A (en) | Unmanned aerial vehicle image target detection method based on multiscale and Gaussian Wasserstein distance | |
CN115830531A (en) | Pedestrian re-identification method based on residual multi-channel attention multi-feature fusion | |
CN113129336A (en) | End-to-end multi-vehicle tracking method, system and computer readable medium | |
CN116665095B (en) | Method and system for detecting motion ship, storage medium and electronic equipment | |
CN115861756A (en) | Earth background small target identification method based on cascade combination network | |
CN111223087A (en) | Automatic bridge crack detection method based on generation countermeasure network | |
CN117611994A (en) | Remote sensing image target detection method based on attention mechanism weighting feature fusion | |
CN116071676A (en) | Infrared small target detection method based on attention-directed pyramid fusion | |
CN116363361A (en) | Automatic driving method based on real-time semantic segmentation network | |
CN115527098A (en) | Infrared small target detection method based on global mean contrast space attention | |
CN114120202B (en) | Multi-scale target model and feature fusion-based semi-supervised video target segmentation method | |
Yang et al. | Foreground enhancement network for object detection in sonar images | |
CN114266805A (en) | Twin region suggestion network model for unmanned aerial vehicle target tracking | |
CN118396071B (en) | Boundary driving neural network structure for unmanned ship environment understanding | |
CN116665016B (en) | Single-frame infrared dim target detection method based on improved YOLOv5 | |
CN115546668A (en) | Marine organism detection method and device and unmanned aerial vehicle |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||