
CN114492766B - Convolutional neural network model design space McaNetX and optimization method thereof


Info

Publication number
CN114492766B
Authority
CN
China
Prior art keywords
network
model
structural parameter
design
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210176263.2A
Other languages
Chinese (zh)
Other versions
CN114492766A (en)
Inventor
王青禄
俞翔
陈逸霖
张宇
郭展希
陆耀欢
陆媛媛
王权林
罗栋楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Institute of Technology
Original Assignee
Nanjing Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Institute of Technology
Priority to CN202210176263.2A
Publication of CN114492766A
Application granted
Publication of CN114492766B
Legal status: Active
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a convolutional neural network model design space McaNetX and an optimization method thereof. The optimization method comprises the following steps: 1) according to the characteristics of SAR images, designing a feature extraction network base structure and representing it with structural parameters; 2) analyzing the structural parameter rules of step 1) with a design space sample analysis tool; 3) evaluating the performance of the design space sample models as a whole and comparing the changes in performance; 4) optimizing the design space set McaNetX according to the structural parameter rules of step 2). A network structure designed with this design space optimization method can complete the SAR image target recognition task with a lightweight model and has strong robustness and generalization capability.

Description

Convolutional neural network model design space McaNetX and optimization method thereof
Technical Field
The invention relates to the field of image target recognition, in particular to a convolutional neural network model design space McaNetX and an optimization method thereof.
Background
Traditional approaches to SAR image recognition rely on hand-designed feature extraction algorithms and classifier selection. The main SAR image target recognition algorithms include template-matching-based, Boosting-based, and sparse-representation-based methods. Their accuracy depends on whether the hand-designed algorithm fits the data; they are easily affected by speckle noise, azimuth angle, and pitch angle, and the feature extraction algorithm often has to be redesigned for different SAR image characteristics. The mainstream way of selecting a classifier is to manually pick one with good classification performance, such as a support vector machine (SVM) designed for optical images, to fit the features extracted from SAR images; the coupling between classifier and features is therefore weak.
Deep learning methods for SAR image target recognition improve recognition capability by designing a neural network structure, thereby avoiding a complex hand-crafted feature extractor. At present, neural networks for SAR image target recognition are mostly designed manually, which requires high labor cost.
The design rules of a neural network structure can, to some extent, be generalized to other applications. Structural characteristics are an indispensable part of a deep convolutional neural network: its convolution kernels, parameter size, depth, and residual structure all influence its performance. LeNet, AlexNet, ResNet, and the like have significantly influenced how later researchers design networks. Thus, a neural network architecture proven effective is not only a specific network instance but also a set of design rules that can be generalized and applied to other networks.
Both manual design and NAS (Neural Architecture Search) have limitations. When designing a convolutional neural network, the way features are extracted and fused differs across data sets, so the challenge of manual design is that the process is complex and demands professional expertise. NAS can automatically search a well-performing network structure from a search space, and its automation clearly reduces design and labor cost; but the search result is a single network optimized for a specific setting, and the search process does not reveal the network's design rules.
Disclosure of Invention
In order to solve the problems set forth in the background art, the present invention provides a convolutional neural network model design space McaNetX and an optimization method thereof, the optimization method comprising the following steps: 1) according to the characteristics of SAR images, designing a feature extraction network base structure and representing it with structural parameters; 2) analyzing the structural parameter rules of step 1) with a design space sample analysis tool; 3) evaluating the performance of the design space sample models as a whole and comparing the changes in performance; 4) optimizing the design space set McaNetX according to the structural parameter rules of step 2).
The structural parameter rules analyzed here are obtained by statistically summarizing a batch of models. The design space is a set formed by a large number of models, and its structural parameter rules regulate the structure of each model at a macroscopic level. Taking Depth (the number of blocks per Stage, which can be roughly understood as the network depth) as an example: if the design space sample analysis tool finds the average Depth of the four Stages to be 1, 3, 2, 1, the structural parameter rule can be generalized as "rise then fall". McaNetX can be considered the union of all design spaces herein (in fact of the McaNetXA design space, see FIG. 5); within it the design rules act as constraints, and each structural parameter has its corresponding design rule.
The "design space" is the complete set of models that theoretically exist under the constraint of structural parameter law, while the "model" is the minimum unit of training, i.e., the deep-learning neural network model (network structure+effective weight) trained based on gradient descent method. A "sample model" refers to a collection of models randomly sampled from a design space to represent the performance of the design space, but not equal to the design space. A "set of design spaces" is a union of all design spaces presented herein, i.e., a set of all models that exist theoretically.
Preferably, in step 1) the base structure is McaNet, a multi-branch cross-channel attention network; the McaNet multi-branch cross-channel attention network comprises a fixed structure and a variable structure, and the structural parameters comprise the structural parameter Depth, the structural parameter Width, the structural parameter Bot_mul, and the structural parameter Groups;
The fixed structure comprises a Stem network Stem and a Head network Head; the Stem network Stem preprocesses the pictures input to the network, and the Head network Head consists of a simple pooling layer and a fully connected layer, acting as the classifier in the network;
The variable structure comprises a Body network Body; the Body network Body comprises 4 Stage network Stages, each Stage network Stage consists of 1 Block-A network block and several Block-B network blocks, and the number of blocks contained in each Stage network Stage is the structural parameter Depth.
Preferably, the trunk convolutions of the Block-A network block and the Block-B network block each comprise a plurality of branches and a cross-channel attention module; each branch is the minimum variable unit of the variable structure of the design space, and each branch comprises two 1×1 convolution layers and one 3×3 convolution layer;
The number of output channels of the 3×3 convolution layer is the structural parameter Width, whose size is positively correlated with the amount of extracted information; the 3×3 convolution layer adopts dilated convolution, whose dilation rate is controlled by the variable Rs; the number of branches in a block is controlled by the variable Rn, and the output feature maps of the plurality of branches are weighted by the cross-channel attention module to extract features. The "information" here is the image content carried by the image itself, such as "whether the tank has a gun barrel" or "the shadow of the tank".
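For concreteness, a minimal PyTorch sketch of one branch follows. The patent fixes only the layer types (two 1×1 convolutions and one 3×3 dilated convolution whose output channel count is Width); the ordering of the 1×1 layers around the 3×3 layer, the BatchNorm/ReLU placement, and the use of Width/Bot_mul as the bottleneck channel count are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class McaBranch(nn.Module):
    """One branch of a McaNet block -- a sketch, not the patented layout.

    Assumed order: 1x1 reduce -> dilated 3x3 (grouped) -> 1x1; the patent
    only specifies two 1x1 layers and one dilated 3x3 layer per branch.
    """
    def __init__(self, in_ch: int, width: int, bot_mul: int = 1,
                 groups: int = 8, rs: int = 1):
        super().__init__()
        mid = width // bot_mul  # assumed bottleneck channels (Width / Bot_mul)
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            # dilated 3x3 convolution: dilation rate Rs; padding=rs keeps H, W
            nn.Conv2d(mid, width, 3, padding=rs, dilation=rs,
                      groups=groups, bias=False),
            nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 1, bias=False),
            nn.BatchNorm2d(width),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# e.g. a branch with Width=128, Groups=8, Rs=2 on a 64-channel feature map
y = McaBranch(64, 128, bot_mul=1, groups=8, rs=2)(torch.randn(1, 64, 32, 32))
print(y.shape)  # torch.Size([1, 128, 32, 32])
```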
Preferably, the variable structure further comprises a bottleneck structure, which shapes the network like a bottleneck by controlling the channel counts of the successive convolution kernels; the structural parameter Bot_mul adjusts the channel count of part of the convolution kernels. In the group convolution mechanism of McaNet, the structural parameter Groups controls the number of groups of the group convolution; a suitable group convolution reduces the parameter count and speeds up model training. Bot_mul controls the "bottleneck structure" of the whole network by scaling the channel counts of the convolution kernels: if Bot_mul is 2, the channel count of the affected convolution kernels is set to Width/2.
Preferably, the variable structure further comprises a bypass convolution structure.

For the Block-A network block, the bypass convolution structure converts the current input to the same channel count as the structural parameter Width through one 1×1 convolution kernel of the right-hand bypass convolution in the Block-A of each Stage, and then adds the output tensor element-wise to the left-hand trunk convolution output. In each part of the neural network model, the "input" is a feature map, i.e., the image after convolution or other operations on the original image.

For the Block-B network block, the bypass convolution structure adds the input element-wise to the trunk output.
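The shortcut wiring can be sketched as follows, assuming the trunk (the multi-branch convolution plus the cross-channel attention module) is given as a module: Block-A adds a 1×1 bypass projection, while Block-B adds an identity shortcut. The stride handling is an assumption, since Block-A may also change the input resolution.

```python
import torch
import torch.nn as nn

class McaBlock(nn.Module):
    """Bypass wiring of Block-A vs Block-B -- a sketch.

    `trunk` stands in for the multi-branch trunk convolution and the
    cross-channel attention module; it must itself apply `stride`.
    """
    def __init__(self, trunk: nn.Module, in_ch: int, width: int,
                 block_a: bool = True, stride: int = 1):
        super().__init__()
        self.trunk = trunk
        if block_a:
            # Block-A: right-hand 1x1 bypass maps the input to `width` channels
            self.bypass = nn.Conv2d(in_ch, width, 1, stride=stride, bias=False)
        else:
            # Block-B: identity mapping shortcut
            self.bypass = nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # element-wise addition of trunk output and bypass output
        return self.trunk(x) + self.bypass(x)
```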
Preferably, the method for analyzing the structural parameter rules in step 2) is to obtain the MSTAR test-set accuracy accuracy_j and the structural parameters of the current model, and to estimate the optimal structural parameter of the current Stage network Stage, using accuracy_j as the weight, as the design rule of the excellent models; the weighted average formula is:

$$C_i = \frac{\sum_{j=1}^{n} accuracy_j \cdot C_j}{\sum_{j=1}^{n} accuracy_j}$$

wherein C_i is the optimal structural parameter, n is the number of models in the sample model, and C_j is the structural parameter of the current model;
The weighted average yields the optimal structural parameter mean of each Stage network Stage of the sample model; the optimal structural parameter means C_i are then arranged with the Stage network Stage as the unit index, and the pairwise size relationships between them are generalized and summarized.
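A NumPy sketch of the weighted average over one structural parameter is shown below; the sample accuracies and Depth values are made-up toy data.

```python
import numpy as np

def stage_means(acc: np.ndarray, param: np.ndarray) -> np.ndarray:
    """Per-Stage weighted mean C_i of a structural parameter.

    acc:   (n,) MSTAR test-set accuracies, used as weights accuracy_j.
    param: (n, 4) the parameter of each sampled model, one column per Stage.
    """
    return (acc[:, None] * param).sum(axis=0) / acc.sum()

# toy sample of 3 models, parameter = Depth
acc = np.array([0.99, 0.97, 0.95])
depth = np.array([[1, 3, 2, 1],
                  [2, 3, 2, 1],
                  [1, 4, 3, 1]])
print(stage_means(acc, depth))  # 4 per-Stage means to compare pairwise
```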
Preferably, in step 3) the performance of the entire design space is evaluated by a sample distribution function. First the sample distribution function F(e) is computed by

$$F(e) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}[e_i < e]$$

wherein e_i is the error value of a model, i.e., the difference between its MSTAR test-set accuracy and 100%, n is the number of sample models, and e is an artificially set error limit. Weights are then introduced through the NEDF function for normalization, to eliminate evaluation bias caused by other factors of the model; the NEDF function is

$$\hat{F}(e) = \sum_{i=1}^{n} w_i \cdot \mathbf{1}[e_i < e]$$

wherein w_i is the weight of a single sample model. To introduce w_i, the complexity interval set as needed is divided into k equal subintervals, and the m_{j'} models falling into the same subinterval j' share the same weight:

$$w_i = \frac{1}{k \cdot m_{j'}}$$

Finally the sample model performance value is estimated by

$$P = \int_{0}^{\varepsilon} \hat{F}(e)\, de$$

wherein P is the NEDF-estimated sample model performance value, ε is the set upper limit of the error integral, and $\hat{F}(e)$ is the NEDF value of the curve.
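These formulas translate directly into code. A NumPy sketch under the stated definitions follows; the trapezoidal integration is an implementation choice, not specified by the patent.

```python
import numpy as np

def complexity_weights(complexity: np.ndarray, k: int) -> np.ndarray:
    """w_i = 1/(k*m_j'): split the complexity range into k equal subintervals;
    all m_j' models in subinterval j' share the same weight."""
    bins = np.linspace(complexity.min(), complexity.max(), k + 1)
    idx = np.clip(np.digitize(complexity, bins) - 1, 0, k - 1)
    counts = np.bincount(idx, minlength=k)
    return 1.0 / (k * counts[idx])

def nedf(errors: np.ndarray, weights: np.ndarray, e: float) -> float:
    """NEDF value F_hat(e): weighted share of models with error below e."""
    return float((weights * (errors < e)).sum())

def performance_value(errors, weights, eps: float, steps: int = 1000) -> float:
    """P: area under the NEDF curve on [0, eps] (trapezoidal rule)."""
    grid = np.linspace(0.0, eps, steps)
    f = np.array([nedf(errors, weights, e) for e in grid])
    return float(((f[1:] + f[:-1]) * np.diff(grid)).sum() / 2)

# toy sample: errors = 1 - accuracy, complexity = parameter count
errors = np.array([0.01, 0.03, 0.02, 0.08])
params = np.array([2.5e5, 8.0e5, 4.0e5, 2.4e6])
w = complexity_weights(params, k=4)
print(performance_value(errors, w, eps=0.05))
```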
Preferably, the set of design spaces McaNetX in step 4) comprises McaNetXA, McaNetXB, McaNetXC, McaNetXD, McaNetXE, and McaNetXF.
For McaNetXB: the Bot_mul of each Stage of McaNet is made the same; after fixing Bot_mul per Stage, the McaNetXB design space performance is close to the McaNetXA design space performance;

For McaNetXC: on the basis of McaNetXB, the Groups of each Stage of McaNet is made the same; fixing Groups has little influence on network performance;

For McaNetXD: from a batch of models trained in McaNetXC, the design space sample analysis tool is applied to the structural parameter Width to obtain a design rule. The rule is then applied to McaNetXC, i.e., the random sampling of Width is manually constrained so that the rule holds; the constraint on Width slightly increases the proportion of high-performance models in the sample model;

For McaNetXE: Depth is constrained using the same method as the analysis of Width to obtain a design rule; the rule is applied to McaNetXD and models are randomly sampled and trained, yielding the McaNetXE sample model;

For McaNetXF: Rs and Rn are constrained; McaNetXE is randomly sampled, Rs and Rn are analyzed, and the resulting design rules are applied to obtain McaNetXF, whose results again surpass the previous design space.
The beneficial effects of the invention are as follows: the semi-automatic network design flow designs the SAR image feature extraction network by combining the advantages of manual and NAS network design, avoiding hand-designed feature extraction algorithms and classifiers. In this flow, network design does not depend on modifying a single network instance, nor is the view limited to the search space of one network; instead, the network structure is parameterized first, and the whole set is tested and optimized by adjusting the design rules of the structural parameters, yielding a model set that performs excellently on SAR image recognition tasks.
Drawings
FIG. 1 is a schematic diagram of the McaNet architecture of the present invention;
FIG. 2 is a schematic diagram of the i-th Stage network Stage architecture of the present invention;
FIG. 3 is a schematic BLOCK-A structure of the present invention;
FIG. 4 is a schematic BLOCK-B structure of the present invention;
FIG. 5 is a diagram of a McaNetX design space inclusion relationship of the present invention;
FIG. 6 is a NEDF image of the McaNetXA and McaNetXB sample models of the present invention;
FIG. 7 is a NEDF image of the McaNetXB and McaNetXC sample models of the present invention;
FIG. 8 is a NEDF image of the McaNetXC and McaNetXD sample models of the present invention;
FIG. 9 is a NEDF image of the McaNetXD and McaNetXE sample models of the present invention;
FIG. 10 is a NEDF image of the McaNetXE and McaNetXF sample models of the present invention;
FIG. 11 is a schematic diagram of a SOC operating condition confusion matrix of RdiNet-0.79M of the present invention;
FIG. 12 is a schematic diagram of a RdiNet-0.79M EOC-1 operating condition confusion matrix according to the present invention;
FIG. 13 is a schematic diagram of a RdiNet-0.79M EOC-2 operating condition confusion matrix according to the present invention.
Detailed Description
The following describes specific embodiments of the present invention in detail with reference to the drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.
McaNet (Multi-branch Cross-channel Attention Network) is the convolutional neural network structure of all design space models herein. McaNet comprises two parts, a fixed structure and a variable structure; network performance is mainly related to the variable structure responsible for feature extraction, so the fixed structure should be designed as simply as possible. As shown in FIG. 1, the input r denotes the image resolution and 1 the number of image channels (likewise below). In the fixed structure, the Stem preprocesses the picture input to the network, and the Head consists of a simple pooling layer and a fully connected layer, acting as the classifier in the network.
The variable part of McaNet is called the Body (network body); its structural design directly affects the feature extraction capability of the network, and its variability is controlled by the structural parameters shown in Table 1. As shown in FIG. 1, the Body contains 4 Stages (network stages); as in FIG. 2, each Stage consists of 1 Block-A (network block) and several Block-B blocks, and the number of blocks in a single Stage is the structural parameter Depth (hereinafter d.).
As shown in FIGS. 3 and 4, Block-A and Block-B differ in whether the input resolution is changed by controlling the convolution kernel stride. The trunk convolution of both blocks consists of several branches; each branch is the minimum variable unit of the variable structure, composed of two 1×1 convolution layers and one 3×3 convolution layer. The number of output channels of the 3×3 layer is the structural parameter Width (hereinafter w.), whose size is positively correlated with the amount of extracted information. The 3×3 layer uses dilated convolution, which enlarges the receptive field of the convolution kernel without adding kernel parameters; its dilation rate is controlled by Rs. The number of branches in a block is controlled by the variable Rn, and the multi-branch output feature maps are weighted by the cross-channel attention module to extract features. Among the multi-branch receptive fields, the large-scale receptive fields extract the large texture features of SAR images, while the small-scale receptive fields extract the local details of small textures along with speckle noise; weighting through the cross-channel attention module then mitigates the interference of speckle noise to some extent.
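The patent does not spell out the internal computation of the cross-channel attention module. A plausible sketch follows, assuming an SK-Net-style design in which a pooled descriptor of the fused branches produces per-branch, per-channel softmax weights; the reduction ratio and FC bottleneck are assumptions.

```python
import torch
import torch.nn as nn

class CrossChannelAttention(nn.Module):
    """Weights the outputs of Rn branches per channel -- a sketch only.

    Assumed SK-Net-style design: global-average-pool the summed branch
    features, then a small FC bottleneck emits per-branch, per-channel
    weights that compete via softmax across branches.
    """
    def __init__(self, channels: int, n_branches: int, reduction: int = 4):
        super().__init__()
        self.n = n_branches
        hidden = max(channels // reduction, 8)
        self.fc = nn.Sequential(
            nn.Linear(channels, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, channels * n_branches),
        )

    def forward(self, branches: list[torch.Tensor]) -> torch.Tensor:
        x = torch.stack(branches, dim=1)            # (B, Rn, C, H, W)
        s = x.sum(dim=1).mean(dim=(2, 3))           # (B, C) pooled descriptor
        w = self.fc(s).view(-1, self.n, x.size(2))  # (B, Rn, C)
        w = torch.softmax(w, dim=1)                 # compete across branches
        return (x * w.unsqueeze(-1).unsqueeze(-1)).sum(dim=1)

cca = CrossChannelAttention(channels=128, n_branches=2)
outs = [torch.randn(1, 128, 16, 16) for _ in range(2)]
print(cca(outs).shape)  # torch.Size([1, 128, 16, 16])
```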
McaNet has a bottleneck structure. Bot_mul (hereinafter b.) is the structural parameter controlling the bottleneck ratio; it adjusts the channel count of part of the convolution kernels. The bottleneck structure allows the network to be designed flexibly and reduces computation, and functionally removes high-frequency noise, though at the cost of losing part of the feature map information. In the group convolution mechanism of McaNet, the structural parameter Groups (hereinafter g.) controls the number of groups of the group convolution; a suitable group convolution reduces the parameter count and speeds up model training. Table 1 lists the structural parameters and their value ranges.
McaNet adds the bypass convolution structure shown in FIGS. 3 and 4: in the Block-A of each Stage, the current input is converted to the same channel count as w. through a 1×1 convolution kernel of the right-hand bypass convolution, and the output tensor is added element-wise to the left-hand trunk convolution output; in Block-B, the input and the trunk output are added element-wise via an identity mapping. This mitigates the vanishing-gradient problem that easily occurs in deep neural network training.
Table 1: structural parameters and their value ranges.
The core idea of optimizing the design space is to sample a set of models from it, describe the resulting distribution of model accuracy to analyze the quality of the design space, and finally take the analyzed structural parameter rules as design rules. Analysis of the design space includes: (1) analyzing the structural parameter rules of the design space sample model with the design space sample analysis tool; (2) evaluating the performance of the design space sample model as a whole with the NEDF drawing tool and comparing the changes in performance. The design space sample analysis tool is a statistical tool that screens and analyzes the sample model of the design space; the design principles it reveals are richer than those obtained by a NAS search for an optimal model or by manual network design. The NEDF drawing tool weights by complexity-divided interval, reducing bias in the overall evaluation of the design space. During analysis and optimization, models randomly sampled from the design space gradually gain performance.
The combination rules of structural parameters in the network structure, obtained statistically over the design space, can serve as design rules. In randomly sampled models of the design space, the performance of models composed of different structural parameters can be regarded as normally distributed, and excellent structural design rules can be inferred statistically. The principle of the design space sample analysis tool is to extract a set of excellent models from the sample model, compute a weighted average over a chosen parameter of the extracted set, and generalize the result into a design rule for optimizing the design space. Herein the normalization is performed by taking each model's best test-set accuracy as its weight in a weighted average, as in formula (1). Taking the structural parameter d. as an example, where accuracy_j and d_j denote the MSTAR test-set accuracy and the structural parameter d. of the current model, the optimal d. of the current Stage is estimated with test-set accuracy as the weight as the design rule of excellent models, and the average d. of the 4 Stages is then generalized:

$$d_i = \frac{\sum_{j=1}^{n} accuracy_j \cdot d_j}{\sum_{j=1}^{n} accuracy_j} \quad (1)$$

where n is the number of models in the sample model.
Generalization abstracts the result obtained by the design space sample analysis tool into a design rule; its design is key to obtaining performance gains in the design space, and it must be neither too loose nor too tight, so that subsets of the design space inherit the design rule while retaining room for random optimization. After comprehensively considering the characteristics of SAR images, and in order to keep model parameter counts low, generalization is defined as a simple fit based on local linearity and piecewise functions. Taking d. as an example: after the per-Stage averages of d. are obtained by the weighted average, they are arranged indexed by Stage, and the pairwise size relationships are generalized into three cases — ascending, descending, and flat — as the design rule.
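A sketch of this generalization step is given below; the tolerance used to treat adjacent Stages as equal is an assumption, since the patent does not give one.

```python
import numpy as np

def generalize(c: np.ndarray, tol: float = 0.25) -> str:
    """Summarize pairwise relations of per-Stage means into 'ascending',
    'descending' or 'flat' (differences within +/-tol count as equal)."""
    d = np.diff(c)
    if (d > -tol).all() and (d > tol).any():
        return "ascending"
    if (d < tol).all() and (d < -tol).any():
        return "descending"
    return "flat"

print(generalize(np.array([1.2, 1.8, 2.4, 3.1])))  # ascending
```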
Evaluating the overall performance of the design space means evaluating all of its models as a whole. The statistical empirical distribution function (EDF, sample distribution function) can analyze the performance of the entire design space; its principle is formula (2), where e_i is the error value of a model, i.e., the difference between the MSTAR test-set accuracy and 100%, n is the number of sample models, e is an artificially set error limit, and F(e) is the proportion of models whose error value is smaller than e:

$$F(e) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}[e_i < e] \quad (2)$$
The image of the EDF function reflects the cumulative relationship between model accuracy and the distribution of error values in the sample model. The NEDF function introduces weights when computing the EDF value, normalizing so as to eliminate evaluation bias caused by other factors of the model; the NEDF formula is (3), where w_i is the weight of a single sample model and the remaining variables are as in formula (2):

$$\hat{F}(e) = \sum_{i=1}^{n} w_i \cdot \mathbf{1}[e_i < e] \quad (3)$$
The weight of a sample model is obtained by dividing the complexity interval. Taking design spaces whose sample models differ in complexity as an example, the weights are introduced by dividing the complexity interval set as needed into k equal subintervals; the m_j models falling into the same subinterval (denoted subinterval j) then share the same weight

$$w_i = \frac{1}{k \cdot m_j}$$

which guarantees the correctness of the normalization.
In the NEDF function image, the performance of the sample model accumulated up to a specific error value can be represented by the area enclosed by the NEDF curve and the abscissa, as in formula (4), where P is the NEDF-estimated sample model performance value, ε is the set upper limit of the error integral, and $\hat{F}(e)$ is the NEDF value of the curve:

$$P = \int_{0}^{\varepsilon} \hat{F}(e)\, de \quad (4)$$
The NEDF images allow comparing the performance of sample models. Taking the NEDF image in FIG. 9 as an example, the projection of any point of a curve onto the abscissa gives the proportion of models in the corresponding design space's sample model whose error is at most that value. At a fixed error value (abscissa), the McaNetXE curve always lies above the McaNetXD curve, indicating that more McaNetXE models fall into the smaller-error interval; at a fixed proportion (ordinate), the McaNetXE curve always lies to the left of the McaNetXD curve, indicating that if the models of both sample model sets are sorted by ascending error and the same proportion is taken starting from the smallest errors, the maximum error of the McaNetXE sample model is smaller than that of McaNetXD. The overall performance of the McaNetXE sample model can therefore be considered better than McaNetXD.
McaNetX is the set of all design spaces herein. Its optimization operations can be summarized as design, sampling (training), and evaluation: evaluation and design use the design space tool to collect structural parameter statistics, extract design rules, and apply them to the next design space; sampling and training randomly draw a fixed number of models from the design space and train them briefly at low computational cost.
McaNetXA. Based on McaNet's structural parameters and their value ranges, McaNetXA can be estimated to contain (6×75×3×5×3×6)^4 ≈ 2.18×10^20 model structures. This design space contains no structural parameter constraint rules, so McaNetXA is the parent set of all the following design spaces.
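The size estimate can be checked in two lines; the per-Stage factor follows the patent's own count of value-range combinations per Stage.

```python
per_stage = 6 * 75 * 3 * 5 * 3 * 6  # value-range combinations per Stage
print(f"{per_stage ** 4:.2e}")       # 4 independent Stages -> 2.18e+20
```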
McaNetXB. In this design space, the b. of each Stage of McaNet is made the same, in order to optimize the design space and simplify the network's design rules. 1500 models were randomly sampled from McaNetXA and trained with the same neural network setup, but with the b. of each Stage constrained to be equal. As FIG. 6 shows, after fixing b. per Stage, the McaNetXB design space performance is close to that of McaNetXA.
McaNetXC. On the basis of McaNetXB, the g. of each Stage of McaNet is made identical; a suitable g. reduces the number of network parameters, lowers training cost, and improves parallelization efficiency. FIG. 7 shows that fixing Groups has little impact on network performance.
McaNetXD. This design space optimizes McaNet's variable structure. The design space sample analysis tool of 3.2.1 is applied to the batch of models trained in McaNetXC to analyze the structural parameter w. and obtain a design rule, which is then applied to McaNetXC by manually constraining the random sampling of w. As FIG. 8 shows, the proportion of high-accuracy models in design space XD is slightly higher than in XC, with a slight gap below XC in the middle section of the curve and no clear difference in the lower section. The results show that constraining w. by the design rule slightly increases the proportion of high-performance models in the sample model.
McaNetXE. Next the structural parameter d. is constrained. The same method used to analyze w. yields a design rule, which is applied to McaNetXD; models are then randomly sampled and trained, giving the McaNetXE sample model. The curve trend in FIG. 9 shows overall performance improvement across the high, middle, and low sections, indicating that the constraint on d. effectively improves model performance.
McaNetXF. This is the last of the McaNetX design spaces; the structural parameters to fix are Rs and Rn. McaNet's 3×3 convolution kernels incorporate dilated convolution, with the size and number of the convolution layer's receptive fields controlled by Rs and Rn respectively, so Rs and Rn are fixed together in McaNetXF. As before, McaNetXE is randomly sampled, Rs and Rn are analyzed, and the resulting design rule is applied to McaNetXF. As FIG. 10 shows, the result again surpasses the previous design space, demonstrating the effectiveness of McaNet's unique multi-branch dilated convolution structure in the SAR image recognition field.
In the foregoing, we have discussed step by step the design methods and principles from McaNetXA to McaNetXF (hereinafter referred to as RdiNet). By applying a series of design rules derived from analysis of the design space, we progressively sampled the McaNetX design space while improving the performance of the sample model, verifying the validity of statistical rules in guiding network design.
Among these, RdiNet-2.4M is one of the excellent models extracted from RdiNet; its structural parameter data are shown in Table 2. Its performance in the MSTAR dataset experiments is shown in Tables 4, 5, and 6.
Table 2: RdiNet-2.4M structural parameters
Model: RdiNet-0.24M
Parameter count: 247925
Depth: [3, 3, 2, 1]
Width: [152, 80, 344, 416]
Bot_mul: [1, 1, 1, 1]
Groups: [8, 8, 8, 8]
Rn: [1, 3, 1, 3]
Rs: [1, 11, 1, 9]
To verify the effectiveness of the algorithm of this paper for SAR image target recognition, we chose the MSTAR dataset to test the model. Each image in the MSTAR dataset is 128×128 pixels, and the sample set is divided into Standard Operating Conditions (SOC) and Extended Operating Conditions (EOC). The samples are generated under varying SAR acquisition conditions such as imaging side view angle, target attitude, and target model, posing a great challenge to target recognition algorithms.
The SOC condition comprises ten classes of ground targets; the depression angles of the training and test sets are 17° and 15° respectively, and the training and test sets contain 2747 and 3203 images respectively.
The EOC conditions include two subsets, EOC-1 and EOC-2. The EOC-1 dataset contains 4 classes of ground targets; the training set comprises 1195 images at a 17° depression angle and the test set 1151 images at 30°. The large depression-angle difference between test and training sets produces clear differences in image appearance for the same target at the same attitude; this is a small-sample learning problem and places high demands on model generalization. The EOC-2 training set comprises 996 images of 4 classes of ground targets including T72, at a 15° depression angle; the test set uses T72 targets of different serial-number versions, 2710 images at 15° or 17°. T72 targets of different serial versions differ slightly in their features, and the test-set T72 versions do not appear in the training set, which greatly raises the recognition difficulty. Taken together, SOC, EOC-1, and EOC-2 place increasingly high demands on model generalization, with gradually rising recognition difficulty. In this experiment, RdiNet performs target recognition on the MSTAR dataset with control groups; each control group runs sub-experiments under the SOC, EOC-1, and EOC-2 conditions, and the hardware is an Intel Xeon Gold 6240 CPU with a Tesla V100S GPU.
Data augmentation was not used in any experiment. Data augmentation expands the number of dataset samples by translation, rotation, noise addition, occlusion, etc. The MSTAR experiments are small-sample learning problems; avoiding augmentation makes it easier to demonstrate RdiNet's generalization capability and robustness for SAR target recognition.
The test was carried out with RdiNet-0.79M; its structural parameter data are shown in Table 3. The confusion matrix is a visualization tool for evaluating classification performance: rows correspond to the true class of the target, columns to the predicted class label, and each element is the number of corresponding targets. FIGS. 11, 12, and 13 show, as confusion matrices, the test-set classification capability of RdiNet-0.79M under the SOC, EOC-1, and EOC-2 conditions respectively. To compare RdiNet-0.79M with optical convolutional neural networks on SAR image recognition tasks, we compared against them under the three operating conditions in the same training environment, as shown in Tables 4 and 5.
Table 3: structural parameters of the model used in the experiment
Depth: [2, 3, 2, 1]
Width: [128, 64, 200, 256]
Bot_mul: [1, 1, 1, 1]
Groups: [8, 8, 8, 8]
Rn: [1, 2, 1, 2]
Rs: [1, 5, 1, 6]
Table 4: SOC condition performance comparison
Network type  Parameters/million  Training time/s  Test time/s  Best accuracy/%
ResNet-18 11.18 339.70 0.15 99.26
ResNext-50 23.00 922.04 0.44 98.92
DenseNet-121 6.96 803.35 0.65 99.38
VGG-11 128.78 610.84 0.05 95.46
SENet 1.23 289.55 0.15 92.82
SKNet 10.44 4338.52 0.47 96.70
A-CONVNet 0.30 165.88 0.05 98.73
RdiNet-0.79M 0.79 901.46 0.43 99.75
RdiNet-2.40M 2.40 1449.19 0.52 99.67
Table 5: EOC-1 condition performance comparison
Network type  Parameters/million  Training time/s  Test time/s  Best accuracy/%
ResNet-18 11.18 223.75 0.08 99.22
ResNext-50 23.00 454.82 0.16 98.70
DenseNet-121 6.96 421.67 0.25 91.75
VGG-11 128.78 322.26 0.02 98.87
SENet 1.23 215.13 0.07 95.83
SKNet 10.44 1929.05 0.17 98.87
A-CONVNet 0.30 132.15 0.04 96.18
RdiNet-0.79M 0.79 530.04 0.22 99.48
RdiNet-2.40M 2.40 828.47 0.24 95.91
As Tables 4 and 5 show, RdiNet-0.79M outperforms the mainstream convolutional neural networks in accuracy on the target recognition task under the lower-difficulty SOC and EOC-1 conditions of the MSTAR dataset, reaching recognition accuracies of 99.75% and 99.48%. Under the EOC-2 condition, which places high demands on model generalization, RdiNet-0.79M improves accuracy by 8.19% over ResNet, the network most similar to it in structural characteristics, while using 3.16% of its parameters. RdiNet-0.79M also shows overfitting on EOC-2; with the Early-Stop method applied, it finally reaches the test-set accuracy shown in Table 6.

Table 6: EOC-2 condition performance comparison
Network type  Parameters/million  Training time/s  Test time/s  Best accuracy/%
ResNet-18 11.18 225.30 0.14 87.13
ResNext-50 23.00 538.95 0.41 82.15
DenseNet-121 6.96 453.60 0.73 78.47
VGG-11 128.78 368.13 0.05 91.59
SENet 1.23 205.90 0.18 83.11
SKNet 10.44 2255.71 0.43 94.36
A-CONVNet 0.30 152.68 0.10 93.81
RdiNet-0.79M 0.79 591.29 0.48 95.32
RdiNet-2.40M 2.40 876.40 0.55 94.21
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
While the foregoing is directed to embodiments of the present invention, the description is merely illustrative and is not intended to limit the scope of the invention; modifications, equivalents, and improvements made within the spirit and principles of the invention are intended to fall within its scope.

Claims (1)

1. An optimization method of a convolutional neural network model design space McaNetX, characterized by comprising the following steps:
1) according to the characteristics of SAR images, designing a feature extraction network basic structure and representing the basic structure by structural parameters;
2) analyzing the structural parameter rules of step 1) by using a design space sample analysis tool, as design rules;
3) evaluating as a whole the performance of a sample model composed of models obtained by randomly sampling and training from the design space McaNetX, and comparing the variation of the performance;
4) optimizing the design space McaNetX according to the structural parameter rules of step 2);
The basic structure in step 1) is the McaNet multi-branch cross-channel attention network, which comprises a fixed structure and a variable structure; the structural parameters comprise the structural parameter Depth, the structural parameter Width, the structural parameter Bot_mul, and the structural parameter Groups;
the fixed structure comprises a Stem network Stem and a Head network Head, wherein the Stem network Stem is responsible for preprocessing pictures input into the McaNet multi-branch cross-channel attention network, and the Head network Head consists of a global pooling layer and a full connection layer;
The variable structure comprises a Body network Body, wherein the Body network Body comprises 4 Stage network stages, each Stage network Stage consists of 1 Block-A network Block and a plurality of Block-B network blocks, and the number of the Block blocks contained in each Stage network Stage is the structural parameter Depth;
The trunk convolutions of the Block-A network Block and the Block-B network Block each comprise a plurality of branches and a cross-channel attention module; each branch is a minimum variable unit of the variable structure in the McaNet multi-branch cross-channel attention network, and each branch comprises two 1×1 convolution layers and one 3×3 convolution layer;
The number of output channels of the 3×3 convolution layer is the structural parameter Width, whose size is positively correlated with the amount of image feature information the neural network extracts from the input image; the 3×3 convolution layer adopts dilated convolution, whose dilation rate is controlled by the variable Rs; the numbers of branches in the Block-A network Block and the Block-B network Block are controlled by the variable Rn, and the output feature maps of the plurality of branches are weighted by the cross-channel attention module to extract features;
The variable structure further comprises a bottleneck structure; the structural parameter Bot_mul controls the bottleneck ratio, shaping the bottleneck structure of the Body part of the network by adjusting the channel count of part of the convolution kernels; in the group convolution mechanism of McaNet, the number of groups of the group convolution is controlled through the structural parameter Groups;
The variable structure further comprises a bypass convolution structure;

for the Block-A network Block, the bypass convolution structure converts the current input feature map to the same channel count as the structural parameter Width through one 1×1 convolution kernel of the right-hand bypass convolution in the Block-A of each Stage, and then adds the output feature map element-wise to the left-hand trunk convolution output;

for the Block-B network Block, the bypass convolution structure adds the input element-wise to the trunk output;
the method for analyzing the structural parameter rules in step 2) is as follows:
2.1) sorting the sample models by test-set accuracy and extracting the top 10% of the sorted sample models; obtaining the MSTAR test-set accuracy accuracy_j and the structural parameters of the model with sequence number j in the sample model, and computing the structural parameter mean of the current Stage network Stage by a weighted average with accuracy_j as the weight, the weighted average formula being:

$$C_i = \frac{\sum_{j=1}^{n} accuracy_j \cdot C_j}{\sum_{j=1}^{n} accuracy_j}$$

wherein C_i is the optimal structural parameter, C_j is the structural parameter of the current model, n is the number of models in the sample model, and j is the sequence number of a model in the sample model;
2.2) after the structural parameter mean C_i of each Stage network Stage of the sample model is obtained by the weighted average, arranging the structural parameter means C_i with the Stage network Stage as the index, and summarizing the pairwise size relationships between the optimal structural parameter means C_i into three cases — ascending, descending, and flat — as the design rule;
in step 3), the performance of the whole design space is evaluated through a sample distribution function; first the sample distribution function F(e) is computed by

$$F(e) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}[e_i < e]$$

wherein e_i is the error value of a model, i.e., the difference between the MSTAR test-set accuracy and 100%, n is the number of sample models, and e is an artificially set error limit; then weights are introduced through the NEDF function for normalization, to eliminate evaluation bias caused by other factors of the model, the NEDF function being

$$\hat{F}(e) = \sum_{i=1}^{n} w_i \cdot \mathbf{1}[e_i < e]$$

wherein w_i is the weight of a single sample model; to introduce w_i, the complexity interval set as needed is divided into k equal subintervals, and the m_{j'} models falling into the subinterval with the same sequence number j' share the same weight:

$$w_i = \frac{1}{k \cdot m_{j'}}$$

finally the sample model performance value is estimated through

$$P = \int_{0}^{\varepsilon} \hat{F}(e)\, de$$

wherein a larger performance value P indicates better performance, P is the NEDF-estimated sample model performance value, ε is the set upper limit of the error integral, $\hat{F}(e)$ is the NEDF value of the curve, j' is the sequence number of each of the k equal subintervals, and m_{j'} is the number of sample models falling into that complexity subinterval;
The set McaNetX of design spaces in step 4) comprises McaNetXA, McaNetXB, McaNetXC, McaNetXD, McaNetXE, and McaNetXF, with McaNetXA as the parent set of all subsequent design spaces;
for McaNetXB: the structural parameter Bot_mul of each Stage of McaNet is made the same; after fixing the structural parameter Bot_mul of each Stage, the McaNetXB design space performance is close to the McaNetXA design space performance;

for McaNetXC: on the basis of McaNetXB, the structural parameter Groups of each Stage of McaNet is made the same; fixing Groups has little influence on the performance of the McaNet multi-branch cross-channel attention network;

for McaNetXD: the structural parameter Width is analyzed according to step 2) to obtain a design rule, and the design rule is then applied to the McaNetXC design, so as to obtain the McaNetXD sample model;

for McaNetXE: the structural parameter Depth is subjected to constraint analysis according to step 2) to obtain a design rule, and the rule is then applied to McaNetXD with models randomly sampled for training, so as to obtain the McaNetXE sample model;

for McaNetXF: the structural parameters Rs and Rn are subjected to constraint analysis according to step 2) to obtain a design rule, and the rule is then applied to McaNetXE with models randomly sampled for training, so as to obtain the McaNetXF sample model.
CN202210176263.2A 2022-02-25 2022-02-25 Convolutional neural network model design space McaNetX and optimization method thereof Active CN114492766B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210176263.2A CN114492766B (en) 2022-02-25 2022-02-25 Convolutional neural network model design space McaNetX and optimization method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210176263.2A CN114492766B (en) 2022-02-25 2022-02-25 Convolutional neural network model design space McaNetX and optimization method thereof

Publications (2)

Publication Number Publication Date
CN114492766A CN114492766A (en) 2022-05-13
CN114492766B (en) 2024-10-29

Family

ID=81483595

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210176263.2A Active CN114492766B (en) 2022-02-25 2022-02-25 Convolutional neural network model design space McaNetX and optimization method thereof

Country Status (1)

Country Link
CN (1) CN114492766B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105676224A (en) * 2016-03-31 2016-06-15 南京工程学院 Space target ISAR range alignment method based on range migration trajectory
AU2020102091A4 (en) * 2019-10-17 2020-10-08 Wuhan University Of Science And Technology Intelligent steel slag detection method and system based on convolutional neural network

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107392308B (en) * 2017-06-20 2020-04-03 中国科学院计算技术研究所 Convolutional neural network acceleration method and system based on programmable device
WO2020049597A1 (en) * 2018-09-06 2020-03-12 Dr Patel Hitesh D A disposable diagnostic kit for m. tuberculosis (mtb) and mdr-tb
CN112784983A (en) * 2021-01-28 2021-05-11 邱戴飞 Identity information prediction model training method and device based on deep neural network
CN113159077B (en) * 2021-05-24 2023-10-27 南京工程学院 Three-phase inverter fault identification method based on hybrid convolutional neural network

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105676224A (en) * 2016-03-31 2016-06-15 南京工程学院 Space target ISAR range alignment method based on range migration trajectory
AU2020102091A4 (en) * 2019-10-17 2020-10-08 Wuhan University Of Science And Technology Intelligent steel slag detection method and system based on convolutional neural network

Also Published As

Publication number Publication date
CN114492766A (en) 2022-05-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant