
CN110942057B - A container number identification method, device and computer equipment - Google Patents

A container number identification method, device and computer equipment

Info

Publication number
CN110942057B
CN110942057B
Authority
CN
China
Prior art keywords
container number
decoding result
image
identified
target area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811113365.XA
Other languages
Chinese (zh)
Other versions
CN110942057A (en)
Inventor
桂一鸣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN201811113365.XA priority Critical patent/CN110942057B/en
Publication of CN110942057A publication Critical patent/CN110942057A/en
Application granted granted Critical
Publication of CN110942057B publication Critical patent/CN110942057B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a container number identification method, a device, and computer equipment. The container number identification method comprises: locating the target area where the container number is located in an image to be identified containing the container number; performing spatial transformation on the target area to obtain a transformed target area; performing feature extraction on the transformed target area to obtain a first feature map; inputting the first feature map into a pre-trained container number identification model, which serializes the first feature map to obtain a feature sequence, encodes the feature sequence to obtain an encoding result, decodes the encoding result, and outputs a decoding result; and determining the container number in the image to be identified according to the decoding result. The container number identification method provided by the application can accurately identify the container number in the image to be identified.

Description

Container number identification method and device and computer equipment
Technical Field
The present application relates to the field of image recognition, and in particular, to a method, an apparatus, and a computer device for recognizing a container number.
Background
In gate operations, each container is typically assigned a box number so that individual containers can be identified by it. In recent years, to reduce manual transcription errors and labor costs, container numbers are often read by automatic identification technology.
The related art discloses a container number recognition method that locates the target area where the container number is located in the image to be recognized, performs character segmentation on the target area, recognizes each segmented character separately to obtain a plurality of recognition results, and combines those results into the container number.
When this method is used to identify the container number, it depends heavily on character segmentation of the target area; under conditions such as poor lighting, contamination, and large inclination, the segmentation is inaccurate, which in turn leads to low recognition accuracy.
Disclosure of Invention
In view of the above, the present application provides a container number identification method, an apparatus, and a computer device, so as to provide a container number identification method with high identification accuracy.
The first aspect of the application provides a container number identification method, which comprises the following steps:
locating the target area where the container number is located in an image to be identified containing the container number, and performing spatial transformation on the target area to obtain a transformed target area;
performing feature extraction on the transformed target area to obtain a first feature map;
inputting the first feature map into a pre-trained container number recognition model, serializing the first feature map by the container number recognition model to obtain a feature sequence, encoding the feature sequence to obtain an encoding result, decoding the encoding result, and outputting a decoding result;
and determining the container number in the image to be identified according to the decoding result.
A second aspect of the application provides a container number identification device, said device comprising a detection module, an identification module and a processing module, wherein,
The detection module is used for positioning a target area where the container number is located from an image to be identified containing the container number;
the identification module is used for carrying out space transformation on the target area to obtain a transformed target area;
The identification module is also used for extracting the characteristics of the transformed target area to obtain a first characteristic diagram;
The identification module is further configured to input the first feature map into a pre-trained container number identification model, serialize the first feature map by the container number identification model to obtain a feature sequence, encode the feature sequence to obtain an encoding result, decode the encoding result, and output a decoding result;
and the processing module is used for determining the container number in the image to be identified according to the decoding result.
A third aspect of the application provides a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of any of the methods provided in the first aspect of the application.
A fourth aspect of the application provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of any of the methods provided in the first aspect of the application when the program is executed.
According to the container number identification method, device, and computer equipment provided by the application, the target area where the container number is located is first located in the image to be identified containing the container number, the target area is spatially transformed to obtain a transformed target area, and feature extraction is performed on the transformed target area to obtain a first feature map. The first feature map is then input into a pre-trained container number identification model, which serializes it into a feature sequence, encodes the feature sequence to obtain an encoding result, decodes the encoding result, and outputs a decoding result, from which the container number in the image to be identified is determined. The container number can therefore be identified based on the target area without character segmentation, and the identification accuracy is high.
Drawings
Fig. 1 is a flowchart of a first embodiment of a container number identification method provided by the present application;
FIG. 2 is a schematic illustration of a container number according to an exemplary embodiment of the present application;
FIG. 3 is a schematic diagram of an implementation of serializing a first feature diagram according to an exemplary embodiment of the present application;
FIG. 4 is a schematic diagram of an attention model shown in an exemplary embodiment of the present application;
fig. 5 is a flowchart of a second embodiment of a container number identification method provided by the present application;
FIG. 6 is a schematic diagram of a detection network according to an exemplary embodiment of the present application;
fig. 7 is a flowchart of a third embodiment of a container number identification method provided by the present application;
FIG. 8 is a schematic diagram of an implementation of a method for identifying a container number according to an exemplary embodiment of the present application;
FIG. 9 is a hardware block diagram of a computing device in which a container number identification device is located, according to an exemplary embodiment of the present application;
Fig. 10 is a schematic structural diagram of a container number identification device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, these information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the application. The term "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context.
The application provides a container number identification method, a device and computer equipment, and aims to provide a container number identification method with high identification accuracy.
The container number identification method and device provided by the application can be applied to computer equipment, for example to an image acquisition device (such as a camera) or to a server; the present application does not limit this.
The following specific embodiments describe the technical solution of the present application in detail. These embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some of them.
Fig. 1 is a flowchart of a first embodiment of a container number identification method provided by the present application. Referring to fig. 1, the method provided in this embodiment may include:
S101, locating the target area where the container number is located in an image to be identified containing the container number, and performing spatial transformation on the target area to obtain a transformed target area.
It should be noted that the image to be identified is a snapshot captured by the image acquisition device; it may be an image containing the container number captured by the device in real time, or an image containing the container number stored by the device.
Specifically, fig. 2 is a schematic diagram of a container number according to an exemplary embodiment of the present application. Referring to fig. 2, container numbers may be distributed horizontally or vertically; in either case, the composition structure of the container number can be expressed in XYZ form, where X is a 4-character master box number (all letters), Y is a 7-digit number, and Z is a 4-character ISO number (which may be absent). It should be noted that the last digit of Y is a check code, which can be calculated from the 4 letters in X and the first 6 digits in Y.
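This composition structure lends itself to a direct structural check before any verification of the check code. The following is a minimal sketch, not part of the patent, expressing the XYZ form as a regular expression; the pattern and the helper name are illustrative assumptions.

```python
import re

# Assumed encoding of the XYZ structure described above: X is 4 letters,
# Y is 7 digits (the last being the check code), and the 4-character ISO
# number Z may be absent.
BOX_NUMBER_RE = re.compile(r"^(?P<X>[A-Z]{4})(?P<Y>[0-9]{7})(?P<Z>[A-Z0-9]{4})?$")

def parse_box_number(text: str):
    """Split a candidate string into its X / Y / Z fields, or return None."""
    m = BOX_NUMBER_RE.match(text)
    return (m.group("X"), m.group("Y"), m.group("Z")) if m else None

print(parse_box_number("CSQU3054383"))  # ('CSQU', '3054383', None)
```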
Further, the target area where the container number is located can be located in the image to be identified containing the container number using a relevant target detection algorithm, which this embodiment does not limit. For example, in one embodiment, the target area where the container number is located may be located from the image to be identified based on a YOLO (You Only Look Once) model.
For another example, the target area where the container number is located may also be located from the image to be identified containing the container number by a detection network.
For example, a deep convolutional neural network may be constructed for container number detection, with the image as its input and the coordinate position of the container number on the image as its output. A quadrangle represents the region coordinates of each row or column; when the container number is composed of multiple rows or columns, multiple quadrangle coordinates are output, and a quadrangle may be inclined, indicating that the container number has a certain direction.
The constructed deep convolutional neural network may be a network modified from an SSD network. For example, in one embodiment, the network may comprise a 29-layer fully convolutional structure, in which the first 13 layers inherit from a VGG-16 network (the last fully connected layers of VGG-16 are converted into convolutional layers), followed by a 10-layer fully convolutional structure, and then by Text-box layers implemented by 6 fully convolutional structures. The Text-box layer is the key component of this modified SSD network: the 6 fully convolutional structures of the 6 Text-box layers are connected to 6 feature maps of the preceding network, and at each feature-map position a Text-box layer predicts an n-dimensional vector (which may comprise 2 dimensions for text/non-text classification, 4 dimensions for the horizontal bounding rectangle, 5 dimensions for the rotated bounding rectangle, and 8 dimensions for the quadrilateral).
It should be noted that, the detailed description about the detection network will be described in detail in the following embodiments, which are not repeated here.
Further, an STN may be used to spatially transform the target area to obtain the transformed target area. In a specific implementation, the target area is input to a pre-trained Spatial Transformer Network (STN), which spatially transforms the target area and outputs the transformed target area.
In particular, the STN can perform spatial transformations (including, but not limited to, translation, scaling, and rotation) on the target area without requiring key-point calibration. Performing identification on the transformed target area therefore improves recognition accuracy.
It should be noted that, for the specific structure and specific implementation principle of the STN network, reference may be made to the description in the related art, and no further description is given here.
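For readers unfamiliar with STNs, the following is a minimal sketch of the idea assuming a PyTorch-style implementation: a small localization head regresses the six parameters of an affine transform, which is then applied to the input region through a sampling grid. The layer sizes are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    """Minimal STN sketch: regress an affine transform, then resample."""

    def __init__(self):
        super().__init__()
        self.localization = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=7), nn.MaxPool2d(2), nn.ReLU(),
            nn.Conv2d(8, 10, kernel_size=5), nn.MaxPool2d(2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.fc_loc = nn.Linear(10 * 4 * 4, 6)
        # Initialize to the identity transform so training starts stably.
        self.fc_loc.weight.data.zero_()
        self.fc_loc.bias.data.copy_(torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.fc_loc(self.localization(x).flatten(1)).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

stn = SpatialTransformer()
warped = stn(torch.randn(1, 3, 64, 192))  # the transformed target area
```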
S102, extracting features of the transformed target area to obtain a first feature map.
Specifically, the feature extraction can be performed on the transformed target region by a conventional method, for example a scale-invariant feature transform (SIFT) algorithm. Of course, the feature extraction may also be performed on the transformed target area by a neural network; for example, in an embodiment, a specific implementation of this step may include:
The transformed target area is input into a neural network for feature extraction. A designated layer in the neural network performs feature extraction on the transformed target area, the designated layer comprising a convolution layer, or a convolution layer together with at least one of a pooling layer and a fully connected layer, and the output of the designated layer is determined to be the first feature map.
In particular, the neural network for feature extraction may include a convolution layer that filters the input transformed target area; in that case, the filtering result output by the convolution layer is the extracted first feature map. The neural network may further include a pooling layer and/or a fully connected layer. For example, in one embodiment, the neural network comprises a convolution layer, a pooling layer, and a fully connected layer, where the convolution layer filters the input transformed target area, the pooling layer compresses the filtering result, and the fully connected layer aggregates the compression result; in that case, the aggregation result output by the fully connected layer is the extracted first feature map.
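As a concrete illustration of the designated-layer arrangement just described, the sketch below stacks convolution layers (filtering) and pooling layers (compression); the channel counts, kernel sizes, and input size are assumptions, and the output of the last designated layer is taken as the first feature map.

```python
import torch
import torch.nn as nn

# A minimal sketch, under assumed layer sizes, of the designated layers:
# convolution filters the transformed target area, pooling compresses it.
feature_extractor = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),   # filtering (convolution)
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),        # compression (pooling)
    nn.Conv2d(64, 128, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
)

# The output of the designated (last) layer is taken as the first feature map.
first_feature_map = feature_extractor(torch.randn(1, 3, 32, 128))
print(first_feature_map.shape)  # torch.Size([1, 128, 8, 32])
```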
S103, inputting the first feature map into a pre-trained container number recognition model; the model serializes the first feature map to obtain a feature sequence, encodes the feature sequence to obtain an encoding result, decodes the encoding result, and outputs a decoding result.
Specifically, fig. 3 is a schematic diagram of an implementation of serializing the first feature diagram according to an exemplary embodiment of the present application. Referring to fig. 3, the process of serializing the first feature map may include:
(1) Sliding a preset sliding window over the first feature map with a preset moving step, so as to segment out the local feature map at each window position;
(2) Determining all of the segmented local feature maps as the feature sequence.
Specifically, in one embodiment, the container number identification model may be an attention model, where the attention model may include a convolution network, and the step (1) may be implemented through the convolution network.
In addition, the size of the preset sliding window is adapted to the first feature map. For example, when the dimension of the first feature map is A×B×C (where A and B are the height and width of the first feature map, respectively, and C is the number of channels it contains), the size of the sliding window may be set to A×A. The preset moving step is set according to actual needs and is not limited in this embodiment; for example, in one embodiment, the preset moving step is 2.
Further, referring to fig. 3, in a specific implementation, a preset sliding window may be disposed at one end of the first feature map, and a local feature map of the position of the sliding window may be segmented, so as to move the sliding window based on a preset movement step length, and segment the local feature map of the position of the sliding window after movement. In this way, this process is repeated until the sliding window moves to the other end of the first signature. And finally, determining all the segmented local feature graphs as feature sequences.
It should be noted that when the first feature map is segmented using the preset sliding window and moving step, if the remaining portion at the end cannot be covered by the sliding window, the first feature map may be padded. Further, since the first feature map includes a plurality of channels, each segmented local feature map also includes a plurality of channels.
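The serialization step can be sketched as follows, assuming the first feature map is an A×B×C tensor (height, width, channels), the window is A×A as above, and the width is padded when the last window does not fit; the function and parameter names are illustrative.

```python
import torch
import torch.nn.functional as F

def serialize_feature_map(fmap: torch.Tensor, step: int = 2) -> list:
    """Slide an A-wide window along the width of an A x B x C first feature
    map and collect the local feature maps (a sketch; the step of 2 follows
    the example in the text)."""
    a, b, c = fmap.shape
    remainder = (b - a) % step
    if remainder:  # pad the width so the final window position is covered
        fmap = F.pad(fmap, (0, 0, 0, step - remainder))
        b = fmap.shape[1]
    return [fmap[:, i:i + a, :] for i in range(0, b - a + 1, step)]

sequence = serialize_feature_map(torch.randn(8, 33, 128))
print(len(sequence), sequence[0].shape)  # 14 torch.Size([8, 8, 128])
```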
Further, fig. 4 is a schematic diagram of an attention model according to an exemplary embodiment of the present application. Referring to fig. 4, the attention model may further include an input layer, a hidden layer, and an output layer connected in sequence, where X1, X2, X3, ..., Xm denote the feature sequence input to the input layer; αt,1, αt,2, αt,3, ..., αt,m denote the weight parameters of the features in the feature sequence at time t (the dimension of each weight parameter is the same as the dimension of its feature); Ct denotes the encoding result at time t; st-1 and st denote the hidden layer states at the corresponding times (the initial hidden layer state is 0); and yt, yt+1 denote the decoding results at the corresponding times.
Referring to fig. 4, the implementation of encoding the feature sequence to obtain an encoding result, and of decoding that result to output the decoding result, is described in detail below. The process may include:
(1) Calculating the weight parameter of each feature in the feature sequence at each time.
Specifically, this step is implemented by the input layer, and the weight parameter of each feature at each time may be calculated according to a first formula:
αt,i = φ(W·st-1 + U·Xi)
wherein αt,i is the weight parameter of the ith feature in the feature sequence at time t; Xi is the ith feature in the feature sequence; st-1 is the hidden layer state at time t-1; φ is an activation function; and W and U are model parameters of the attention model.
(2) Calculating the encoding result at each time according to the feature sequence and the weight parameters of the features at that time.
Specifically, this step is implemented by the hidden layer: the feature sequence is weighted and summed using the weight parameters of the features at each time, and the resulting weighted sum is determined to be the encoding result at that time.
With reference to the foregoing description, the process may be expressed by a second formula:
Ct = Σi αt,i·Xi
wherein Ct is the encoding result at time t, obtained by weighting each feature Xi by αt,i and summing over the feature sequence.
(3) Calculating the context-dependent hidden layer state at each time according to the feature sequence and the encoding result at each time.
Specifically, this step is implemented by the hidden layer, and the context-dependent hidden layer state at each time may be calculated according to a third formula:
st = LSTM(st-1, Ct, yt-1)
That is, the hidden layer state at time t is related to the hidden layer state at time t-1, the encoding result Ct at time t, and the decoding result yt-1 output by the attention model at time t-1.
(4) Obtaining the decoding result at each time according to the context-dependent hidden layer state at that time.
Specifically, this step is implemented by the output layer, and the decoding result at each time may be calculated by a fourth formula:
yt = softmax(st)
Specifically, the decoding result at each time includes the confidence of each candidate character at that time and the character identified at that time, the identified character being the candidate character with the highest confidence.
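Putting the four formulas together, one decoding step can be sketched as below. This is a hedged reading in which the first formula is treated as additive attention followed by a softmax over the feature positions; all parameter names and shapes are illustrative assumptions, not the patent's specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def attention_decode_step(X, state, y_prev, W, U, v, cell, out_proj):
    """One decoding step of the attention model (a sketch under assumptions).
    X: (m, d) feature sequence; state: LSTM state (h_prev, c_prev), each (1, h);
    y_prev: (k,) previous decoding result over k candidate characters."""
    h_prev, c_prev = state
    # First formula: weight parameter of each feature at time t.
    alpha = F.softmax(torch.tanh(X @ U + h_prev @ W) @ v, dim=0)     # (m,)
    # Second formula: encoding result Ct = sum_i alpha_{t,i} * X_i.
    C_t = alpha @ X                                                  # (d,)
    # Third formula: s_t = LSTM(s_{t-1}, C_t, y_{t-1}).
    h_t, c_t = cell(torch.cat([C_t, y_prev]).unsqueeze(0), (h_prev, c_prev))
    # Fourth formula: y_t = softmax(s_t); the identified character is argmax.
    y_t = F.softmax(out_proj(h_t), dim=-1).squeeze(0)                # (k,)
    return y_t, (h_t, c_t)

# Illustrative shapes (all assumptions): m=16 features of dim d=256,
# hidden size h=128, k=37 candidate characters (letters, digits, blank).
m, d, h, k = 16, 256, 128, 37
W, U, v = torch.randn(h, 64), torch.randn(d, 64), torch.randn(64)
cell, out_proj = nn.LSTMCell(d + k, h), nn.Linear(h, k)
y, state = attention_decode_step(torch.randn(m, d),
                                 (torch.zeros(1, h), torch.zeros(1, h)),
                                 torch.zeros(k), W, U, v, cell, out_proj)
```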
According to the method provided by this embodiment, the attention model is used to recognize the spatially transformed target area, so the container number can be identified based on the target area without character segmentation, and the accuracy is high.
S104, determining the container number in the image to be identified according to the decoding result.
Specifically, in an embodiment, each character identified in the decoding result may be directly combined in sequence, and the combined result is determined as the container number in the image to be identified.
According to the method provided by this embodiment, the target area where the container number is located is first located in the image to be identified containing the container number, the target area is spatially transformed, and feature extraction is performed on the transformed target area to obtain a first feature map. The first feature map is then input into a pre-trained container number identification model, which serializes it into a feature sequence, encodes the feature sequence to obtain an encoding result, decodes the encoding result, and outputs a decoding result, from which the container number in the image to be identified is determined. The container number is thus identified based on the target area where the whole container number is located, without character segmentation, and the identification accuracy is high. In addition, the method can identify rotated, bent, and inclined container numbers (that is, container numbers with large deformation) without additional manual annotation, and therefore has wide applicability.
Fig. 5 is a flowchart of a second embodiment of the container number identification method provided by the present application. Referring to fig. 5, based on the above embodiment, in the method provided by this embodiment, the specific implementation of step S101, locating the target area where the container number is located in the image to be identified containing the container number, may include:
S501, inputting the image to be identified into a pre-trained detection network; the detection network performs multi-level feature extraction on the image to be identified to obtain a specified number of second feature maps, the dimensions of which differ, then performs classification and position regression on each second feature map and outputs classification results and position information of a plurality of candidate areas.
In particular, the detection network may be implemented by a convolutional layer. Further, the specified number is set according to actual needs. For example, in one embodiment, the specified number is 6.
Fig. 6 is a schematic diagram of a detection network according to an exemplary embodiment of the present application. Referring to fig. 6, the detection network may include a 29-layer fully convolutional structure, in which the first 13 layers inherit from the VGG-16 network (the last fully connected layers of VGG-16 are converted into convolutional layers), followed by a 10-layer fully convolutional structure (e.g. Conv6 to Conv11_2 in fig. 6, where Conv8_2, Conv9_2, Conv10_2, and Conv11_2 are each preceded by a fully convolutional layer not shown in fig. 6). Referring to fig. 6, this 23-layer fully convolutional structure performs the multi-level feature extraction that yields the 6 second feature maps.
Further, with continued reference to fig. 6, the 23-layer fully convolutional structure is followed by 6 Text-box layers, each of which may be implemented by a fully convolutional structure. Each Text-box layer is connected to the preceding fully convolutional structure and classifies and regresses the second feature map output by that structure, outputting classification results and position information of a plurality of candidate areas.
In addition, referring to fig. 6, an NMS layer follows the Text-box layers and performs non-maximum suppression on the plurality of candidate areas, based on the classification result and position information of each candidate area, to obtain the target area where the container number is located.
It should be noted that, in an embodiment, the classification result may be a foreground and background classification result. In addition, the position information of a candidate region can be characterized by a 12-dimensional vector including coordinates (8 dimensions) of four points of a quadrangle including the candidate region, and coordinates of a center point, a width, and a height (4 dimensions) of a circumscribed horizontal rectangle of the quadrangle. Of course, in an embodiment, the location information may also include coordinates and width (5 dimensions) of two diagonal points of the rotation boundary rectangle corresponding to the candidate region.
It should be noted that because the dimensions of the second feature maps differ, their receptive fields also differ; the final target area is therefore effectively obtained by classification and position regression over feature maps with several different receptive fields, which gives strong multi-scale detection capability.
The following briefly describes the implementation principle of the Text-box layer for classifying the second feature map and performing position regression.
Specifically, the Text-box layer comprises three parts: a candidate box layer, a classification layer, and a position regression layer. The candidate box layer takes each pixel in the second feature map as a center and generates, according to a preset rule, a number of candidate boxes of different sizes at that position, which are then provided to the classification layer and the position regression layer for category judgment and position refinement.
Further, the classification layer outputs the probability that each candidate box belongs to the foreground or the background, and the position regression layer outputs the position information of each candidate box. Both layers are implemented with convolution layers, and to suit text detection the convolution layers may employ 3×5 convolution kernels.
For example, in one embodiment, the dimension of one of the second feature maps is 40×42×128, where 40×42 is its height and width and 128 is its number of channels. The candidate box layer takes each pixel in this feature map as a center and generates 20 candidate boxes of different sizes at that position according to a preset rule (for the specific principle, see the related art; details are not repeated here).
In addition, the dimension of the convolution kernel in the classification layer is 40×3×5×128, where 40 is the number of convolution kernels and 3×5 the kernel size; the kernel stride is 1. The classification layer thus convolves the second feature map to obtain a first convolution processing result of dimension 40×40×42. The first target convolution processing result corresponding to each pixel (40 convolution values) represents the classification results of the 20 candidate boxes at that pixel, the classification result of each box being the two-dimensional foreground/background classification.
Further, the convolution kernel of the position regression layer has dimension 240×3×5×128, where 240 is the number of kernels and 3×5 the kernel size; the stride is 1. The position regression layer thus convolves the second feature map to obtain a second convolution processing result of dimension 240×40×42. The second target convolution processing result corresponding to each pixel (240 convolution values) represents the position information of the 20 candidate boxes at that pixel; referring to the foregoing description, the position information of each box occupies 12 dimensions, so, for example, the first 12 of the 240 values are the position information of the first candidate box.
The above-mentioned candidate boxes are understood as candidate regions.
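The example dimensions above can be reproduced with two convolutions, as in the following sketch; the padding of (1, 2) is an assumption made so that the 40×42 spatial size is preserved, as the example implies.

```python
import torch
import torch.nn as nn

# Sketch of one Text-box layer's heads over a 128-channel 40x42 second
# feature map: 20 candidate boxes per position, 3x5 kernels, stride 1.
num_boxes = 20
second_feature_map = torch.randn(1, 128, 40, 42)   # N x C x H x W

cls_head = nn.Conv2d(128, num_boxes * 2, kernel_size=(3, 5), stride=1, padding=(1, 2))
reg_head = nn.Conv2d(128, num_boxes * 12, kernel_size=(3, 5), stride=1, padding=(1, 2))

cls_out = cls_head(second_feature_map)  # (1, 40, 40, 42): fg/bg score pair per box
reg_out = reg_head(second_feature_map)  # (1, 240, 40, 42): 12-dim position per box
```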
S502, performing non-maximum suppression processing on the plurality of candidate areas based on the classification result and the position information of each candidate area to obtain a target area where the container number is located.
With continued reference to fig. 6, in the example shown in fig. 6, this step may be accomplished by detecting the NMS layer in the network.
It should be noted that, for the specific implementation principle and implementation procedure of the non-maximum suppression, reference may be made to the description in the related art, and no further description is given here.
According to the method provided by this embodiment, the image to be identified is input into a pre-trained detection network, which performs multi-level feature extraction to obtain a specified number of second feature maps of differing dimensions, classifies and regresses each second feature map, and outputs classification results and position information of a plurality of candidate areas; non-maximum suppression over those candidate areas, based on the classification result and position information of each, then yields the target area where the container number is located. Because the second feature maps differ in dimension, and hence in receptive field, the final target area is effectively obtained by classification and position regression over several receptive fields, giving strong multi-scale detection capability and accurate positioning of the target area where the container number is located.
Optionally, in a possible implementation of the present application, the specific implementation of step S101, locating the target area where the container number is located in the image to be identified containing the container number, may include:
(1) And adjusting the size of the image to be identified to obtain a plurality of target images with different sizes.
For example, in an embodiment, interpolation processing or downsampling processing may be performed on the image to be identified to obtain the target images with different sizes.
(2) For each target image, inputting the target image into a pre-trained detection network; the detection network performs multi-level feature extraction on the target image to obtain a specified number of second feature maps of differing dimensions, then performs classification and position regression on each second feature map and outputs classification results and position information of a plurality of candidate areas.
(3) And performing non-maximum suppression processing on the plurality of candidate areas based on the classification result and the position information of each candidate area to obtain a target area where the container number in the target image is located.
In particular, the specific implementation process and implementation principle of the steps (2) and (3) may refer to the description in the foregoing embodiments, which are not repeated herein.
(4) And determining the target area of the container number in the image to be identified according to the target area of the container number in the target image.
Specifically, non-maximum suppression may be performed across the target areas where the container number is located in the plurality of target images, so as to obtain the target area where the container number is located in the image to be identified.
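The multi-scale procedure can be sketched as follows; resize, detect, and nms are assumed helper functions standing in for the resizing step, the detection network, and the final non-maximum suppression, and the scale set is an illustrative assumption.

```python
def detect_multi_scale(image, resize, detect, nms, scales=(0.5, 1.0, 2.0)):
    """Sketch of the multi-scale locating procedure above: run the detection
    network on several resized target images, map the resulting target areas
    back to original-image coordinates, and merge them with a final NMS."""
    all_boxes, all_scores = [], []
    for s in scales:
        boxes, scores = detect(resize(image, s))    # target areas at this size
        all_boxes.extend(box / s for box in boxes)  # back to original coordinates
        all_scores.extend(scores)
    return nms(all_boxes, all_scores)               # final target area(s)
```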
According to the method provided by this embodiment, the size of the image to be identified is adjusted to obtain a plurality of target images of different sizes, and the target area where the container number is located in the image to be identified is then located based on those target images. In this way, the positioning accuracy can be further improved.
It should be noted that, the networks used in the present application are all pre-trained networks. The training process of the network may include:
(1) Constructing a network;
For example, for the detection network, the input is set as an image and the output as the position information of the area where the container number is located. A quadrangle represents the region coordinates of each row or column; when the box number is composed of multiple rows or columns, multiple quadrangle coordinates are output, and a quadrangle may be inclined, indicating that the box number has a certain direction.
For another example, for the identification network, the input may be set as the area where the container number is located and the output as the box number character string, written as a single row in XYZ form, where X represents the 4-character master box number, Y the 7-digit number, and Z the 4-character ISO number.
(2) Obtaining a training sample;
For example, in this example, when training the detection network, the label information of a training sample is the position information of the area where the container number is located. It should be noted that a complete box number may be composed of multiple rows or columns; each row or column should be annotated with quadrangle coordinates, the quadrangle enclosing all characters in that row or column without leaving much blank space, and the quadrangle may be inclined, indicating that the box number has a certain direction.
For another example, when training the container number recognition network, the label information of a training sample is the box number character string. A complete box number may be composed of multiple rows or columns, but the annotation is uniformly written as a single row in XYZ form, where X represents the 4-character master box number, Y the 7-digit number, and Z the 4-character ISO number; the last digit of Y is the check digit, which can be calculated from the 4 letters in X and the first 6 digits of Y and should be verified during annotation.
(3) Training the network using the training set to obtain a trained network.
Specifically, the network parameters may be initialized to specified values, and the obtained training samples are then used to train the network to obtain the trained network.
Specifically, the process comprises two stages: forward propagation, in which a training sample is input and propagated forward through the network to extract data features and compute the loss function; and backward propagation, in which, starting from the last layer of the network, the loss is propagated backward layer by layer and the network parameters are modified by gradient descent, until the loss function converges.
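A minimal sketch of this two-stage training loop is given below; the optimizer choice (plain gradient descent) follows the text, while the learning rate and epoch count are illustrative assumptions.

```python
import torch

def train(network, loader, loss_fn, epochs=10, lr=1e-3):
    """Sketch of the two-stage procedure above: forward propagation computes
    features and the loss; backward propagation then updates the network
    parameters by gradient descent until the loss converges."""
    optimizer = torch.optim.SGD(network.parameters(), lr=lr)
    for _ in range(epochs):
        for samples, labels in loader:
            loss = loss_fn(network(samples), labels)  # forward propagation
            optimizer.zero_grad()
            loss.backward()                           # backward propagation
            optimizer.step()                          # gradient-descent update
    return network
```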
Fig. 7 is a flowchart of a third embodiment of the container number identification method provided by the present application. Referring to fig. 7, in the method provided by this embodiment, based on the above embodiments, the process in step S104 of determining the container number in the image to be identified according to the decoding result may include:
S701, judging whether the decoding result meets a specified check rule.
Specifically, the specific implementation process of this step may include:
(1) And judging whether the composition structure of the decoding result is matched with the composition structure of the container number.
Specifically, referring to the foregoing description, the composition structure of the container number can be expressed in XYZ form, where X is a 4-character master box number (all letters), Y is a 7-digit number, and Z is a 4-character ISO number (which may be absent). In this step, when the first 4 characters identified in the decoding result are letters and the 5th to 11th characters identified in the decoding result are numbers, it is determined that the composition structure of the decoding result matches the composition structure of the container number; otherwise, it is determined that they do not match.
(2) If not, determining that the decoding result does not satisfy the verification rule.
Specifically, when it is determined in step (1) that the composition structure of the decoding result does not match the composition structure of the container number, the decoding result is determined not to satisfy the verification rule. For example, in one embodiment, when a number appears among the first 4 characters identified in the decoding result, the composition structure of the decoding result does not match that of the container number, and the decoding result is therefore determined not to satisfy the verification rule. For another example, when a letter appears among the 5th to 11th characters identified in the decoding result, the composition structure does not match and the decoding result is likewise determined not to satisfy the verification rule.
(3) If so, calculating the check value of the decoding result according to a preset rule.
(4) And judging whether the check value is equal to the check code identified in the decoding result.
(5) And if the check value is equal to the check code identified in the decoding result, determining that the decoding result meets the check rule, otherwise, determining that the decoding result does not meet the check rule.
Specifically, the check value of the decoding result may be calculated according to the following method: (1) according to a preset correspondence between letters and numbers, the first 4 characters identified in the decoding result are converted into numbers to obtain a converted decoding result; (2) the check value of the decoding result is calculated according to the formula:
S = (c1×2^0 + c2×2^1 + … + c10×2^9) mod 11
wherein S is the check value of the decoding result and cn is the nth character in the converted decoding result.
Specifically, when the first 4 characters identified in the decoding result are converted into numbers, the conversion can be performed according to a preset correspondence between letters and numbers. For example, table 1 shows the preset letter-to-number correspondence relationship in an exemplary embodiment of the present application:
Table 1 correspondence between preset letters and numbers
Further, referring to the foregoing description, the 11th character of the container number is the check code, so the 11th character identified in the decoding result is the identified check code. In this step, it is judged whether the calculated check value equals the identified check code; if it does, the decoding result is determined to satisfy the verification rule, and otherwise it is determined not to satisfy it.
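The rule described above matches the public ISO 6346 check-digit scheme, so a sketch can be given under that assumption: Table 1 is taken to hold the standard ISO 6346 letter values, in which multiples of 11 are skipped, and a check value of 10 is treated as 0.

```python
# Assumed content of Table 1: the standard ISO 6346 letter values,
# A=10, B=12, C=13, ... with multiples of 11 (11, 22, 33) skipped.
LETTER_VALUES, v = {}, 10
for ch in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
    if v % 11 == 0:
        v += 1
    LETTER_VALUES[ch] = v
    v += 1

def check_value(decoded: str) -> int:
    """S = (sum over n = 1..10 of cn * 2^(n-1)) mod 11, with a result of 10
    treated as 0 (the usual ISO 6346 convention, assumed here)."""
    cs = [LETTER_VALUES[c] if c.isalpha() else int(c) for c in decoded[:10]]
    s = sum(c << n for n, c in enumerate(cs)) % 11
    return 0 if s == 10 else s

decoded = "CSQU3054383"                          # a well-known valid example
print(check_value(decoded) == int(decoded[10]))  # True: check rule satisfied
```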
S702, if yes, determining a first combination result obtained by sequentially combining the characters identified in the decoding result as a container number in the image to be identified.
S703, if not, correcting the decoding result to obtain a corrected decoding result, and determining a second combination result of each character in the corrected decoding result after sequential combination as the container number in the image to be identified, wherein the corrected decoding result meets the verification rule.
Specifically, in one possible implementation manner, the specific implementation process of this step may include:
(1) Executing step (2) when the composition structure of the decoding result does not match the composition structure of the container number, and executing step (5) when it matches.
(2) And carrying out first correction on the decoding result so that the composition structure of the decoding result after the first correction is matched with the composition structure of the container number.
Specifically, if a number appears among the first 4 characters identified in the decoding result, in an embodiment it may be judged whether that number is a misrecognized character recorded in a pre-established cross-type character misrecognition table; if so, the number is replaced with the letter the table associates with it, and otherwise with the letter having the highest confidence among the candidate characters at that time. Of course, in another embodiment, the number may be directly replaced with the letter having the highest confidence among the candidate characters at that time.
Further, if a letter appears among the 5th to 11th characters identified in the decoding result, in an embodiment it may be judged whether that letter is a misrecognized character recorded in the pre-established cross-type character misrecognition table; if so, the letter is replaced with the number the table associates with it, and otherwise with the number having the highest confidence among the candidate characters at that time. Of course, in an embodiment, the letter may be directly replaced with the number having the highest confidence among the candidate characters at that time.
For example, table 2 shows a pre-established cross-type character misrecognition table according to an exemplary embodiment of the present application. Referring to table 2, during recognition "0" is easily misrecognized as "O", and "O" as "0". For example, in one embodiment, when "0" appears among the first 4 characters identified in the decoding result, "0" is replaced with "O".
TABLE 2 Cross-type character misrecognition table

0 ↔ O
1 ↔ I
2 ↔ Z
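The first correction can be sketched as follows, using the cross-type pairs of Table 2; candidates[i], a mapping from candidate characters to confidences at position i, is an assumed interface onto the decoding result.

```python
# Sketch of the first correction: force the decoded string to match the
# XYZ structure using Table 2, falling back to the most-confident
# character of the required type.
DIGIT_TO_LETTER = {"0": "O", "1": "I", "2": "Z"}
LETTER_TO_DIGIT = {v: k for k, v in DIGIT_TO_LETTER.items()}

def best_of_type(cand: dict, want_letter: bool) -> str:
    """Most confident candidate character of the required type."""
    pool = {c: p for c, p in cand.items() if c.isalpha() == want_letter}
    return max(pool, key=pool.get)

def first_correction(decoded: str, candidates) -> str:
    chars = list(decoded)
    for i, ch in enumerate(chars[:11]):
        want_letter = i < 4               # X must be letters, Y must be digits
        if ch.isalpha() != want_letter:   # wrong character type at position i
            table = DIGIT_TO_LETTER if want_letter else LETTER_TO_DIGIT
            chars[i] = table.get(ch, best_of_type(candidates[i], want_letter))
    return "".join(chars)
```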
(3) And when the first corrected decoding result meets the verification rule, determining a combined result of sequentially combining the characters in the first corrected decoding result as the container number in the image to be identified.
(4) Executing step (5) when the first corrected decoding result does not satisfy the verification rule.
(5) And carrying out second correction on the decoding result or the first corrected decoding result to obtain a second corrected decoding result, and determining a combined result obtained by sequentially combining all characters in the second corrected decoding result as a container number in the image to be identified, wherein the second corrected decoding result meets the verification rule.
In one possible implementation, the character with the lowest confidence among the characters of the decoding result (or of the first corrected decoding result) may be modified to obtain the second corrected decoding result. Specifically, according to the verification rule, the target character that the lowest-confidence position must take for the rule to hold can be calculated, and the lowest-confidence character is then replaced with that target character to obtain the second corrected decoding result.
In addition, in another possible implementation, it may be judged, according to a same-type character misrecognition table, whether any misrecognized character recorded in that table appears among the first 10 characters of the decoding result (or of the first corrected decoding result). If exactly one such character appears, it is replaced with the character the table associates with it to obtain a corrected decoding result, and it is then judged whether the corrected decoding result satisfies the verification rule. If it does, the combined result of sequentially combining its characters is determined to be the recognized container number; if it does not, the lowest-confidence character in the decoding result (or in the first corrected decoding result) is modified as described above to obtain the second corrected decoding result.
Further, if at least two misrecognized characters recorded in the same-type character misrecognition table appear among the 11 characters, any single one of them may be replaced with its associated character, yielding a plurality of corrected decoding results. It is then judged whether a target decoding result satisfying the verification rule exists among them; if so, that target decoding result is determined to be the second corrected decoding result, and the combined result of sequentially combining its characters is determined to be the identified container number. If not, any two of the misrecognized characters may be replaced with their associated characters to obtain at least one further corrected decoding result, and the judgment is repeated; if a target decoding result satisfying the verification rule exists, it is determined to be the second corrected decoding result, and its sequentially combined characters are determined to be the container number in the image to be identified. If none exists, the lowest-confidence character in the decoding result (or in the first corrected decoding result) is modified as described above to obtain the second corrected decoding result.
In this example, if no misrecognized character recorded in the same-type character misrecognition table appears, the lowest-confidence character in the decoding result (or in the first corrected decoding result) may be modified directly, as described above, to obtain the second corrected decoding result.
For example, table 3 shows a pre-established same-type character misrecognition table according to an exemplary embodiment. Referring to table 3, "M" is easily misrecognized as "N" during recognition; therefore, if "M" is identified in the decoding result, "M" may be modified to "N".
TABLE 3 Same-type misrecognized character table

Actual character    Easily misrecognized as
M                   N
O                   D
U                   J
E                   F
L                   I
6                   8
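A minimal sketch of this table-driven correction is given below; it treats each pair in Table 3 as interchangeable in both directions (an assumption) and reuses satisfies_rule from the check-digit sketch above.

```python
# Same-type confusion pairs from Table 3, applied symmetrically.
PAIRS = [('M', 'N'), ('O', 'D'), ('U', 'J'), ('E', 'F'), ('L', 'I'), ('6', '8')]
CONFUSED = {}
for a, b in PAIRS:
    CONFUSED[a], CONFUSED[b] = b, a

def single_swap_candidates(number: str):
    """Yield corrected decoding results with one confusable character swapped."""
    for i, c in enumerate(number[:10]):
        if c in CONFUSED:
            yield number[:i] + CONFUSED[c] + number[i + 1:]

def correct_with_table(number: str):
    # satisfies_rule is the verification check from the previous sketch.
    for candidate in single_swap_candidates(number):
        if satisfies_rule(candidate):
            return candidate  # target decoding result
    return None
```

Because every pair couples two letters or two digits, a swap never breaks the 4-letter/7-digit composition structure, so only the check digit needs re-testing.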
According to the method provided by this embodiment, it is judged whether the decoding result satisfies the specified verification rule. When it does, the first combination result obtained by sequentially combining the characters identified in the decoding result is determined to be the container number in the image to be identified. When it does not, the decoding result is corrected to obtain a corrected decoding result that satisfies the verification rule, and the second combination result obtained by sequentially combining the characters in the corrected decoding result is determined to be the container number in the image to be identified. In this way, recognition accuracy can be further improved.
Fig. 8 is a schematic diagram illustrating an implementation of a container number recognition method according to an exemplary embodiment of the present application. Referring to fig. 8, the STN network, the feature-extraction network, and the container number recognition model are integrated into a single recognition network: when a target area is input into this recognition network, the network outputs a decoding result, from which the container number in the image to be identified can be determined.
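As a structural sketch of this integration, the following assumes a PyTorch-style composition; the three sub-modules are placeholders standing in for the STN, the feature-extraction network, and the container number recognition model described earlier, not the patent's actual architecture.

```python
import torch.nn as nn

class RecognitionNetwork(nn.Module):
    """Integrated network of fig. 8: target area in, decoding result out."""

    def __init__(self, stn: nn.Module, backbone: nn.Module, recognizer: nn.Module):
        super().__init__()
        self.stn = stn                # spatial transformation of the target area
        self.backbone = backbone      # feature extraction -> first feature map
        self.recognizer = recognizer  # sequencing + encode/decode -> decoding result

    def forward(self, target_area):
        transformed = self.stn(target_area)
        feature_map = self.backbone(transformed)
        return self.recognizer(feature_map)
```

Wrapping the three stages in one module means a single forward pass maps a cropped target area directly to a decoding result, matching the end-to-end behavior described for fig. 8.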
The application also provides an embodiment of the container number recognition device corresponding to the embodiment of the container number recognition method.
The embodiment of the container number identification device can be applied to a computer device. The apparatus embodiments may be implemented by software, or by hardware, or by a combination of hardware and software. Taking software implementation as an example, the apparatus in a logical sense is formed by the processor of the computer device where it is located reading the corresponding computer program instructions from non-volatile storage into memory for running. In terms of hardware, fig. 9 shows a hardware configuration diagram of the computer device where a container number identification apparatus according to an exemplary embodiment of the present application is located. In addition to the memory 910, the processor 920, the storage 930, and the network interface 940 shown in fig. 9, the computer device where the apparatus is located generally includes other hardware according to the actual function of the container number identification apparatus, which will not be described here.
Fig. 10 is a schematic structural diagram of a container number identification device according to an embodiment of the present application. Referring to fig. 10, the container number identification device provided in this embodiment may include a detection module 100, a recognition module 200 and a processing module 300, wherein:
The detection module 100 is configured to locate a target area where the container number is located from an image to be identified including the container number;
The identifying module 200 is configured to spatially transform the target area to obtain a transformed target area;
the identification module 200 is further configured to perform feature extraction on the transformed target area to obtain a first feature map;
the recognition module 200 is further configured to input the first feature map into a pre-trained container number recognition model, sequence the first feature map with the container number recognition model to obtain a feature sequence (a sliding-window sketch of this sequencing step follows this module description), encode the feature sequence to obtain an encoding result, and decode the encoding result to output a decoding result;
and the processing module 300 is configured to determine the container number in the image to be identified according to the decoding result.
The device of the present embodiment may be used to implement the technical solution of the method embodiment shown in fig. 1, and its implementation principle and technical effects are similar, and are not described here again.
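As a concrete reading of the sequencing step performed by the recognition module, the sketch below splits a feature map into a feature sequence with a sliding window; the window width and stride are assumed values for illustration, not the patent's preset ones.

```python
import numpy as np

def to_feature_sequence(feature_map: np.ndarray, win: int = 4, step: int = 2):
    """Slide a window of width `win` along the W axis of a (C, H, W)
    feature map with stride `step`, yielding the local feature maps
    that together form the feature sequence."""
    _, _, w = feature_map.shape
    return [feature_map[:, :, i:i + win] for i in range(0, w - win + 1, step)]
```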
The present application also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of any of the methods provided in the first aspect of the present application.
In particular, computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including, by way of example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
With continued reference to fig. 9, the present application further provides a computer device, including a memory 910, a processor 920, and a computer program stored in the memory 910 and executable on the processor 920, where the processor 920 implements the steps of any one of the methods provided in the first aspect of the present application when the program is executed.
The foregoing description is merely of preferred embodiments of the present application and is not intended to limit it; any modification, equivalent replacement, improvement or the like made within the spirit and principles of the application shall fall within its scope of protection.

Claims (8)

1. A container number identification method, characterized in that the method comprises:

locating a target area where the container number is located from an image to be identified that contains the container number, and performing spatial transformation on the target area to obtain a transformed target area, the spatial transformation comprising translation, scaling and rotation;

performing feature extraction on the transformed target area to obtain a first feature map;

inputting the first feature map into a pre-trained container number recognition model, and sliding a preset sliding window over the first feature map according to a preset moving step to segment out the local feature map at each position of the sliding window; determining all the segmented local feature maps as a feature sequence, encoding the feature sequence to obtain an encoding result, and decoding the encoding result to output a decoding result;

judging whether the decoding result satisfies a specified verification rule, wherein the judging whether the decoding result satisfies the specified verification rule comprises: judging whether the composition structure of the decoding result matches the composition structure of a container number; if not, determining that the decoding result does not satisfy the verification rule; if it matches, calculating a check value of the decoding result according to a preset rule; judging whether the check value is equal to the check code identified in the decoding result; if the check value is equal to the check code identified in the decoding result, determining that the decoding result satisfies the verification rule, otherwise determining that the decoding result does not satisfy the verification rule;

if the specified verification rule is satisfied, determining a first combination result obtained by sequentially combining the characters identified in the decoding result as the container number in the image to be identified;

if the specified verification rule is not satisfied, correcting the decoding result to obtain a corrected decoding result, wherein the corrected decoding result satisfies the verification rule, and determining a second combination result obtained by sequentially combining the characters in the corrected decoding result as the container number in the image to be identified;

wherein the locating a target area where the container number is located from an image to be identified that contains the container number comprises: outputting, by a detection network, the coordinate position of the container number on the image to be identified, the coordinate position comprising the area coordinates of each row or column of the container number represented by quadrilaterals, the quadrilaterals including inclined quadrilaterals indicating that the container number has an inclination angle.

2. The method according to claim 1, characterized in that the locating a target area where the container number is located from an image to be identified that contains the container number comprises:

inputting the image to be identified into a pre-trained detection network, performing, by the detection network, multi-level feature extraction on the image to be identified to obtain a specified number of second feature maps, performing classification and position regression on each of the second feature maps, and outputting classification results and position information of a plurality of candidate areas, wherein the specified number of second feature maps differ in dimension;

performing non-maximum suppression on the plurality of candidate areas based on the classification result and position information of each candidate area to obtain the target area where the container number is located.

3. The method according to claim 1, characterized in that the locating a target area where the container number is located from an image to be identified that contains the container number comprises:

resizing the image to be identified to obtain a plurality of target images of different sizes;

for each target image, inputting the target image into a pre-trained detection network, performing, by the detection network, multi-level feature extraction on the target image to obtain a specified number of second feature maps, performing classification and position regression on each of the second feature maps, and outputting classification results and position information of a plurality of candidate areas, wherein the specified number of second feature maps differ in dimension;

performing non-maximum suppression on the plurality of candidate areas based on the classification result and position information of each candidate area to obtain the target area where the container number in the target image is located;

determining the target area where the container number in the image to be identified is located according to the target area where the container number in the target image is located.

4. The method according to claim 1, characterized in that the performing spatial transformation on the target area to obtain a transformed target area comprises:

inputting the target area into a pre-trained STN network, performing, by the STN network, spatial transformation on the target area, and outputting the transformed target area.

5. The method according to claim 1, characterized in that the encoding the feature sequence to obtain an encoding result and decoding the encoding result to output a decoding result comprises:

calculating a weight parameter of each feature in the feature sequence at each moment;

calculating the encoding result at each moment according to the weight parameters of the features in the feature sequence at that moment and the feature sequence;

calculating a context-related hidden layer state at each moment according to the feature sequence and the encoding result at each moment;

obtaining the decoding result at each moment according to the context-related hidden layer state at each moment.

6. A container number identification device, characterized in that the device comprises a detection module, a recognition module and a processing module, wherein:

the detection module is configured to locate a target area where the container number is located from an image to be identified that contains the container number;

the recognition module is configured to perform spatial transformation on the target area to obtain a transformed target area, the spatial transformation comprising translation, scaling and rotation;

the recognition module is further configured to perform feature extraction on the transformed target area to obtain a first feature map;

the recognition module is further configured to input the first feature map into a pre-trained container number recognition model, slide a preset sliding window over the first feature map according to a preset moving step to segment out the local feature map at each position of the sliding window, determine all the segmented local feature maps as a feature sequence, encode the feature sequence to obtain an encoding result, and decode the encoding result to output a decoding result;

the processing module is configured to judge whether the decoding result satisfies a specified verification rule, wherein the judging whether the decoding result satisfies the specified verification rule comprises: judging whether the composition structure of the decoding result matches the composition structure of a container number; if not, determining that the decoding result does not satisfy the verification rule; if it matches, calculating a check value of the decoding result according to a preset rule; judging whether the check value is equal to the check code identified in the decoding result; if the check value is equal to the check code identified in the decoding result, determining that the decoding result satisfies the verification rule, otherwise determining that the decoding result does not satisfy the verification rule;

if the specified verification rule is satisfied, determine a first combination result obtained by sequentially combining the characters identified in the decoding result as the container number in the image to be identified;

if the specified verification rule is not satisfied, correct the decoding result to obtain a corrected decoding result, wherein the corrected decoding result satisfies the verification rule, and determine a second combination result obtained by sequentially combining the characters in the corrected decoding result as the container number in the image to be identified;

wherein the detection module is specifically configured to output, by a detection network, the coordinate position of the container number on the image to be identified, the coordinate position comprising the area coordinates of each row or column of the container number represented by quadrilaterals, the quadrilaterals including inclined quadrilaterals indicating that the container number has an inclination angle.

7. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1-5.

8. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1-5 when executing the program.
CN201811113365.XA 2018-09-25 2018-09-25 A container number identification method, device and computer equipment Active CN110942057B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811113365.XA CN110942057B (en) 2018-09-25 2018-09-25 A container number identification method, device and computer equipment


Publications (2)

Publication Number Publication Date
CN110942057A CN110942057A (en) 2020-03-31
CN110942057B (en) 2024-12-06

Family

ID=69904808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811113365.XA Active CN110942057B (en) 2018-09-25 2018-09-25 A container number identification method, device and computer equipment

Country Status (1)

Country Link
CN (1) CN110942057B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626982A (en) * 2020-04-13 2020-09-04 中国外运股份有限公司 Method and device for identifying batch codes of containers to be detected
CN112116586B (en) * 2020-09-29 2025-01-03 腾讯科技(深圳)有限公司 Method, device, computer equipment and storage medium for determining box fork slot area
CN113052156B (en) * 2021-03-12 2023-08-04 北京百度网讯科技有限公司 Optical character recognition method, device, electronic equipment and storage medium
CN113408512A (en) * 2021-06-03 2021-09-17 云从科技集团股份有限公司 Method, system, device and medium for checking container by using robot
CN114267032A (en) * 2021-12-10 2022-04-01 广东省电子口岸管理有限公司 Container positioning identification method, device, equipment and storage medium
CN114529893A (en) * 2021-12-22 2022-05-24 电子科技大学成都学院 Container code identification method and device
CN115527209A (en) * 2022-09-22 2022-12-27 宁波港信息通信有限公司 Method, device and system for identifying shore bridge box number and computer equipment
CN116229280B (en) * 2023-01-09 2024-06-04 广东省科学院广州地理研究所 Method and device for identifying collapse sentry, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203539A (en) * 2015-05-04 2016-12-07 杭州海康威视数字技术股份有限公司 The method and apparatus identifying container number
CN107679531A (en) * 2017-06-23 2018-02-09 平安科技(深圳)有限公司 Licence plate recognition method, device, equipment and storage medium based on deep learning
CN108205673A (en) * 2016-12-16 2018-06-26 塔塔顾问服务有限公司 Method and system for container code identification

Family Cites Families (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101770569A (en) * 2008-12-31 2010-07-07 汉王科技股份有限公司 Dish name recognition method based on OCR
US8977059B2 (en) * 2011-06-03 2015-03-10 Apple Inc. Integrating feature extraction via local sequential embedding for automatic handwriting recognition
CN102841928B (en) * 2012-07-18 2015-12-09 中央人民广播电台 File security sending, receiving method and device between net
CN107133616B (en) * 2017-04-02 2020-08-28 南京汇川图像视觉技术有限公司 Segmentation-free character positioning and identifying method based on deep learning
CN107423732A (en) * 2017-07-26 2017-12-01 大连交通大学 Vehicle VIN recognition methods based on Android platform
CN107527059B (en) * 2017-08-07 2021-12-21 北京小米移动软件有限公司 Character recognition method and device and terminal
CN107563245A (en) * 2017-08-24 2018-01-09 广东欧珀移动通信有限公司 The generation of graphic code and method of calibration, device and terminal, readable storage medium storing program for executing
CN107798327A (en) * 2017-10-31 2018-03-13 北京小米移动软件有限公司 Character identifying method and device
CN107871126A (en) * 2017-11-22 2018-04-03 西安翔迅科技有限责任公司 Model recognizing method and system based on deep-neural-network
CN108009515B (en) * 2017-12-14 2022-04-22 杭州远鉴信息科技有限公司 Power transmission line positioning and identifying method of unmanned aerial vehicle aerial image based on FCN
CN108062754B (en) * 2018-01-19 2020-08-25 深圳大学 Segmentation and recognition method and device based on dense network image
CN108399419B (en) * 2018-01-25 2021-02-19 华南理工大学 Method for recognizing Chinese text in natural scene image based on two-dimensional recursive network
CN108491836B (en) * 2018-01-25 2020-11-24 华南理工大学 An overall recognition method for Chinese text in natural scene images
CN108416318A (en) * 2018-03-22 2018-08-17 电子科技大学 Diameter radar image target depth method of model identification based on data enhancing

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203539A (en) * 2015-05-04 2016-12-07 杭州海康威视数字技术股份有限公司 The method and apparatus identifying container number
CN108205673A (en) * 2016-12-16 2018-06-26 塔塔顾问服务有限公司 Method and system for container code identification
CN107679531A (en) * 2017-06-23 2018-02-09 平安科技(深圳)有限公司 Licence plate recognition method, device, equipment and storage medium based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Visual attention models for scene text recognition; Ghosh, S.K., et al.; 2017 14th IAPR International Conference on Document Analysis and Recognition; page 994, section 1 to page 945, section 2, fig. 2 *

Also Published As

Publication number Publication date
CN110942057A (en) 2020-03-31

Similar Documents

Publication Publication Date Title
CN110942057B (en) A container number identification method, device and computer equipment
CN106909924B (en) Remote sensing image rapid retrieval method based on depth significance
CN110570433B (en) Image semantic segmentation model construction method and device based on generation countermeasure network
CN110188827B (en) Scene recognition method based on convolutional neural network and recursive automatic encoder model
CN109118473B (en) Angular point detection method based on neural network, storage medium and image processing system
CN107305630B (en) Text sequence recognition method and device
CN108038435B (en) Feature extraction and target tracking method based on convolutional neural network
CN111046859B (en) Character recognition method and device
CN108399625B (en) A SAR Image Orientation Generation Method Based on Deep Convolutional Generative Adversarial Networks
CN106845341B (en) Unlicensed vehicle identification method based on virtual number plate
CN110909618B (en) Method and device for identifying identity of pet
CN111079683A (en) Remote sensing image cloud and snow detection method based on convolutional neural network
CN106815323B (en) Cross-domain visual retrieval method based on significance detection
CN108805157A (en) Classifying Method in Remote Sensing Image based on the random supervision discrete type Hash in part
CN111523537A (en) Character recognition method, storage medium and system
CN106203373B (en) A kind of human face in-vivo detection method based on deep vision bag of words
CN107533671B (en) Pattern recognition device, pattern recognition method, and recording medium
CN111353325B (en) Key point detection model training method and device
US8934716B2 (en) Method and apparatus for sequencing off-line character from natural scene
US10580127B2 (en) Model generation apparatus, evaluation apparatus, model generation method, evaluation method, and storage medium
CN114565789B (en) Text detection method, system, device and medium based on set prediction
CN110942073A (en) Container trailer number identification method and device and computer equipment
CN109902751B (en) Dial Digit Character Recognition Method Fusion Convolutional Neural Network and Half-word Template Matching
WO2007026951A1 (en) Image search method and device
CN108694411B (en) A method for identifying similar images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant