WO2018188653A1 - Inspection method and inspection device - Google Patents
Inspection method and inspection device
- Publication number
- WO2018188653A1 (application PCT/CN2018/083012)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- vector
- transmission image
- category
- container
- word
- Prior art date
- 2017-04-14
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01N—INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
- G01N23/00—Investigating or analysing materials by the use of wave or particle radiation, e.g. X-rays or neutrons, not covered by groups G01N3/00 – G01N17/00, G01N21/00 or G01N22/00
- G01N23/02—Investigating or analysing materials by the use of wave or particle radiation, e.g. X-rays or neutrons, not covered by groups G01N3/00 – G01N17/00, G01N21/00 or G01N22/00 by transmitting the radiation through the material
- G01N23/04—Investigating or analysing materials by the use of wave or particle radiation, e.g. X-rays or neutrons, not covered by groups G01N3/00 – G01N17/00, G01N21/00 or G01N22/00 by transmitting the radiation through the material and forming images of the material
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
- G06F18/256—Fusion techniques of classification results, e.g. of results related to same input data of results relating to different input data, e.g. multimodal recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/809—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data
- G06V10/811—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data the classifiers operating on different input data, e.g. multi-modal recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/52—Surveillance or monitoring of activities, e.g. for recognising suspicious objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/05—Recognition of patterns representing particular kinds of hidden objects, e.g. weapons, explosives, drugs
Definitions
- Embodiments of the present disclosure relate to security checks, and more particularly to a method and apparatus for inspecting a container or the like based on image information and text information.
- a method of inspecting a container, comprising the steps of: performing an X-ray scan of the container to be inspected to obtain a transmission image; generating, with a convolutional neural network, a first vector describing a local region of the transmission image; generating, with a recurrent neural network, a word vector from a textual description of the container cargo as a second vector; integrating the first vector and the second vector to obtain a third vector representing both the transmission image and the text description; and determining, based on the third vector, the category to which the goods in the container belong.
- the step of determining, based on the third vector, the category to which the goods in the container belong further comprises: generating, from the third vector and based on a probability function, a probability value indicating that the goods in the container belong to a certain category; and taking the category with the highest probability value as the category to which the goods belong.
- based on the identified category, the method further includes presenting to the user a typical transmission image associated with that category.
- the step of generating the word vector includes: performing a word segmentation operation on the text description of the container cargo; and vectorizing the segmented text description to obtain the word vector.
- the method further comprises the steps of: retrieving a corresponding typical transmission image from a library of typical transmission images based on the word vector; and presenting the retrieved typical transmission image to a user.
- the method further comprises the steps of: retrieving a corresponding typical transmission image from a library of typical transmission images based on the first vector; and presenting the retrieved typical transmission image to a user.
- an inspection apparatus comprising: an X-ray inspection system that performs an X-ray scan of a container to be inspected to obtain a transmission image; a memory that stores the transmission image; and a processor configured to: generate, with a convolutional neural network, a first vector describing a local region of the transmission image; generate, with a recurrent neural network, a word vector from a textual description of the container cargo as a second vector; integrate the first vector and the second vector to obtain a third vector representing both the transmission image and the text description; and determine, based on the third vector, the category to which the goods in the container belong.
- the processor is configured to generate, from the third vector and based on a probability function, a probability value indicating that the goods in the container belong to a certain category, and to take the category with the largest probability value as the category to which the goods belong.
- the processor is further configured to present to a user, according to the identified category, a typical transmission image associated with that category.
- the processor is configured to perform a word segmentation operation on the text description of the container cargo, and to vectorize the segmented text description to obtain a word vector.
- the processor is further configured to: retrieve a corresponding typical transmission image from a typical transmission image library based on the word vector; and present the retrieved typical transmission image to a user.
- the approximate category of the target cargo can thus be initially determined, which facilitates further judgment by the inspector.
- FIG. 1 shows a schematic structural view of an inspection apparatus according to an embodiment of the present disclosure
- FIG. 2 is a schematic diagram showing the structure of a computing device included in the inspection apparatus as described in FIG. 1;
- FIG. 3 shows a schematic block configuration diagram of an inspection apparatus according to an embodiment of the present disclosure
- FIG. 4 shows a schematic flow chart of an inspection method according to an embodiment of the present disclosure
- FIG. 5 illustrates a schematic diagram of matching a cargo image with category information in accordance with an embodiment of the present disclosure
- FIG. 6 illustrates a schematic diagram of determining a category of a cargo using a convolutional neural network, in accordance with an embodiment of the present disclosure
- FIG. 7 illustrates a schematic diagram of retrieving a typical perspective image based on category information, in accordance with another embodiment of the present disclosure
- FIG. 8 illustrates a word vector space relationship diagram used in a method in accordance with an embodiment of the present disclosure
- FIG. 9 illustrates a cell structure diagram of a cyclic neural network used in a method in accordance with an embodiment of the present disclosure
- FIG. 10 illustrates a schematic diagram of generating category information from image vectors and word vectors, in accordance with an embodiment of the present disclosure.
- embodiments of the present disclosure propose a human-machine-assisted inspection technique based on X-ray images and text descriptions: an intelligent analysis tool for classifying and checking goods in specific regions (key regions of interest).
- inspection personnel form most of their judgments on local regions of the image, which is also where human-machine "mutual assistance" is most closely related and most necessary.
- the technology uses the computer's data analysis and image understanding capabilities to initially determine the approximate category of the target shipment.
- human perceptual information is introduced, in particular comprehensive cognition of locally prominent regions, to give more accurate classification results and thereby improve the effectiveness of the inspection recommendations.
- FIG. 1 shows a schematic structural view of an inspection apparatus according to an embodiment of the present disclosure.
- an inspection apparatus 100 includes an X-ray source 110, a detector 130, a data collection device 150, a controller 140, and a computing device 160, and performs a security check on an object to be inspected 120, such as a container truck, for example determining whether it contains dangerous goods such as firearms or drugs and/or other suspicious items.
- although the detector 130 and the data acquisition device 150 are described separately in this embodiment, those skilled in the art will appreciate that they may also be integrated as a single X-ray detection and data acquisition device.
- the X-ray source 110 described above may be an isotope source, an X-ray machine, an accelerator, or the like.
- the X-ray source 110 can be single energy or dual energy.
- the object to be inspected 120 is scanned by means of the X-ray source 110 and the detector 130, under control of the controller 140 and the computing device 160, to obtain detection data.
- for example, an operator, by means of the human-computer interaction interface of the computing device 160, issues an instruction through the controller 140 commanding the X-ray source 110 to emit radiation, which passes through the object under inspection 120 and is detected by the detector 130 and the data acquisition device 150. The computing device 160 processes the received data to obtain a transmission image, then uses a trained convolutional neural network to generate an image vector (first vector) describing a local region of the transmission image, and uses a trained recurrent neural network to generate a word vector (second vector) from the textual description of the container cargo.
- the computing device 160 determines the category to which the goods in the container belong based on the image vector and the word vector. For example, the computing device 160 integrates the first vector and the second vector to obtain a third vector representing the transmission image and the textual description, and determines, based on the third vector, the category to which the cargo in the container belongs.
- FIG. 2 shows a schematic structural diagram of a computing device as shown in FIG. 1.
- the signal detected by the detector 130 is collected by the data collection device 150, and the data is stored in the memory 161 through the interface unit 167 and the bus 163.
- Configuration information and a program of the computer data processor are stored in a read only memory (ROM) 162.
- a random access memory (RAM) 163 is used to temporarily store various data during the operation of the processor 165.
- computer programs for performing data processing, such as a substance recognition program and an image processing program, are also stored in the memory 161.
- the internal bus 163 is connected to the above-described memory 161, read only memory 162, random access memory 163, input device 164, processor 165, display device 166, and interface unit 167.
- the instruction code of the computer program instructs the processor 165 to execute a predetermined data processing algorithm; after obtaining the data processing result, the processor displays the result on the display device 166, such as an LCD (Liquid Crystal Display), or outputs it directly in hard-copy form, for example by printing.
- FIG. 3 shows a schematic block configuration diagram of an inspection apparatus according to an embodiment of the present disclosure.
- a software program installed in the computing device 160 of the inspection apparatus determines the category of the cargo, for example its HSCODE, based on the transmission image of the containerized cargo and the text information describing the cargo.
- the convolutional neural network based image understanding module 310 processes the input transmission image to obtain an image vector.
- the text understanding module 320, based on a recurrent neural network, processes the input text information to obtain a word vector.
- the analysis and learning module 330 determines the category to which the goods belong based on the image vector and the word vector.
- FIG. 4 shows a schematic flow chart of an inspection method according to an embodiment of the present disclosure.
- the inspection apparatus shown in FIG. 1 performs X-ray scanning on the container to be inspected to obtain a transmission image.
- a first vector describing a local region of the transmission image is generated from the transmission image using a convolutional neural network.
- for example, convolution operations are performed with a local region of the container transmission image as input, followed by fully connected operations, and a vector representation of the transmission image is output as the first vector.
- more specifically, the local region of the container transmission image is taken as input and passed through five stages of convolution and pooling operations (each stage corresponds to a set of convolution kernels and one pooling layer; the number and size of the convolution kernels are independent from stage to stage), after which the network outputs a vector representation of the transmission image.
- the image understanding module 310 based on a convolutional neural network is responsible for cargo identification and analysis of X-ray images.
- the module 310 mainly comprises two parts: cargo category judgment using a convolutional network, and typical template matching.
- the image of a local sensitive area contains rich texture information about the goods.
- the cargo category judgment of the convolutional neural network takes a specific local area image as input; through multi-level matrix operations, a vector representation of the local area image is generated, and this vector can be used to infer the category to which the cargo belongs. As shown in FIG. 5, this information is represented as the HSCODE code of the cargo type together with the corresponding confidence probability.
- the convolutional neural network preferably employs a network structure of VGG (Visual Geometry Group)-Net, but those skilled in the art understand that different embodiments may not be limited to such a structure.
- the input of the convolutional neural network is a local area of the cargo X-ray image; after multi-level convolution, pooling, and fully connected operations, a vector is obtained that represents the information of the image.
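- for illustration, a minimal PyTorch-style sketch of such a five-stage encoder follows; the channel counts, the 224x224 input size, and the number of categories are assumptions made for the sketch, not values taken from the patent.

```python
import torch
import torch.nn as nn

class ImageVectorNet(nn.Module):
    """VGG-style encoder: five convolution/pooling stages followed by
    fully connected layers that map a local region of the transmission
    image to a fixed-length vector (the 'first vector')."""

    def __init__(self, num_classes: int = 100):
        super().__init__()
        layers = []
        in_ch = 1                                   # single-channel X-ray image
        for out_ch in (64, 128, 256, 512, 512):     # five stages
            layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True),
                       nn.MaxPool2d(2)]             # one pooling layer per stage
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        self.fc = nn.Sequential(                    # vectorize and map to categories
            nn.Flatten(),
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, 224, 224) local region of the transmission image
        return self.fc(self.features(x))
```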
- the convolution operation is a process in which the convolution kernels, acting like filters, perform feature learning on the image in order to fully extract the information in the image.
- multiple different and independent convolution kernels are utilized, each convolution kernel convolving the input separately and passing all convolution results into the next operation.
- in the pooling operation, each output matrix after convolution is divided into an n*m grid, where n and m respectively denote the number of rows and columns of the grid; the maximum value within each grid cell is taken as that cell's output, so that a matrix of size n*m is finally obtained as the output of the pooling operation, as in the sketch below.
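- a small NumPy sketch of this grid max-pooling, under the assumption that the grid cells are formed by evenly splitting the rows and columns:

```python
import numpy as np

def grid_max_pool(feature_map: np.ndarray, n: int, m: int) -> np.ndarray:
    """Divide a convolution output matrix into an n*m grid and keep the
    maximum of each grid cell, producing an n*m output matrix."""
    return np.array([[block.max()
                      for block in np.array_split(band, m, axis=1)]
                     for band in np.array_split(feature_map, n, axis=0)])

# e.g. a 4x4 feature map pooled to 2x2 yields [[5, 7], [13, 15]]
out = grid_max_pool(np.arange(16).reshape(4, 4), 2, 2)
```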
- the fully connected process vectorizes the output matrix of the multi-layer convolution and pooling operations and applies a further mapping to the data using a fully connected matrix. This increases the learning capacity and at the same time transforms the output into a vector whose length equals the number of categories, to facilitate the subsequent classification operation.
- this vector is then converted into probabilities, which is done using the Softmax function; each element of the resulting vector represents a probability value, corresponding to the probability that the object under inspection belongs to the respective category.
- the probabilistic formula of the Softmax function can be expressed as:

  p(c_i | v) = exp(v_i) / Σ_{j=1}^{k} exp(v_j)

  where v denotes the fully connected output vector, v_i denotes the i-th element of v, k is the length of the vector, c_i denotes the i-th class, and p(c_i | v) is the probability that the input is predicted to belong to the i-th class. Accordingly, the category with the largest probability value can be taken as the prediction result of the first stage.
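- as a quick check of the formula, a small NumPy sketch (the max-shift is a standard numerical-stability trick, not part of the patent's formula):

```python
import numpy as np

def softmax(v: np.ndarray) -> np.ndarray:
    """p(c_i | v) = exp(v_i) / sum_j exp(v_j); shifting by max(v)
    leaves the result unchanged but avoids overflow."""
    e = np.exp(v - v.max())
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])     # fully connected output vector v
probs = softmax(scores)                # approx. [0.659, 0.242, 0.099]
predicted = int(np.argmax(probs))      # category with the largest probability
```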
- the typical template matching function can use the HSCODE code given by the convolutional neural network to visually provide several typical texture image blocks of that cargo type, so as to further confirm whether the inferred information is credible, as shown in FIG. 7.
- typical cargo images with the corresponding HSCODE code can be called up and presented in a nine-square grid.
- the inspector can then compare the image of the goods under inspection with the typical data to reach a better judgment.
- a word vector is generated from the textual description of the container cargo using a recurrent neural network, as the second vector.
- the textual description of the containerized goods is used as an input to the network, and the textual description is converted into a list by a word segmentation operation.
- a vector representation of each word in the list is obtained by querying a dictionary, and serves as the basis of the second vector. More specifically, the inspector's textual description is used as the network input.
- a descriptive sentence is converted into a corresponding list of words (in some cases, repeated words may be removed or certain weights may be assigned to words).
- the existing dictionary is then queried to turn each word into a word label, and a vector representation of each word in the list is extracted.
- the words in the word list are then input into the LSTM (Long-Short Term Memory) network one by one for prediction.
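- a minimal sketch of this segmentation-and-lookup pipeline; the vocabulary, the whitespace segmenter, and the unknown-word handling below are all illustrative assumptions:

```python
def words_to_ids(description, vocab, segment, unk_id=0):
    """Turn a free-text cargo description into a list of word labels:
    segment the sentence, then look each word up in the dictionary.
    `segment` is any word-segmentation function (for Chinese, a
    dictionary/CRF-based segmenter as described above); words missing
    from the vocabulary map to `unk_id`."""
    return [vocab.get(w, unk_id) for w in segment(description)]

# a hypothetical vocabulary and a trivial whitespace segmenter
vocab = {"<unk>": 0, "frozen": 1, "fish": 2, "in": 3, "cartons": 4}
ids = words_to_ids("frozen fish in cartons", vocab, str.split)  # [1, 2, 3, 4]
```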
- the text understanding module 320 based on the recurrent neural network takes a textual description as input, processes it systematically, and finally outputs a typical image consistent with the text description, so as to provide the inspector with effective decision-making information in a more human-friendly manner.
- the module includes a word segmentation part, a word vector generation part, a typical image display part, and the like.
- FIG. 8 illustrates a word vector space relationship diagram used in a method in accordance with an embodiment of the present disclosure.
- the typical image display part of the text understanding module addresses the fact that previous inspection systems basically took images as the only input, relying solely on the computer's ability to understand the image, with human perception rarely introduced. In some situations the inspector cannot judge which category an image belongs to, and can only characterize the goods by describing their texture, shape, etc., in order to call up typical historical images for comparison. A traditional inspection system would require the description to be given as fixed keywords, which indirectly becomes a burden on the inspector. Generating word vectors with a recurrent neural network naturally provides distance learning over similar words (FIG. 8), so in actual operation the inspector does not need to type the exact fixed keyword for the goods in order to retrieve the desired image conveniently and accurately.
- the word segmentation operation is a preprocessing step for data entered into the system as sentences (especially salient for Chinese). Unlike previous template data retrieval that takes attributes or keywords as input, the module allows users to type the desired information more flexibly and completely in the form of full sentences; however, working with sentences as the basic information unit is complex and inefficient, so sentences must first be properly decomposed into unit words.
- the word segmentation operation follows the human understanding of language: based on a dictionary, the sentence is divided into an array (or vector) whose elements are words (single words or phrases), to facilitate computer understanding.
- text understanding belongs to the domain of natural language processing, and word segmentation technology is the basis of text mining. This is especially true for Chinese input: owing to the particular structure of the language, Chinese text only clearly delimits sentences and paragraphs, and lacks an explicit delimiter between individual words. Hence the first prerequisite of text understanding is a clear division of the text description.
- the word segmentation operation here is performed on the basis of statistics and machine learning.
- a dictionary is first established from historical knowledge; during segmentation, rule-based string matching is applied, and for ambiguous words and words not entered in the dictionary, CRFs (conditional random fields) are used.
- the sentence is labeled by word position (beginning of a word, middle of a word, end of a word, and single-character word), and the CRF then performs the word segmentation; at the same time, new words not registered in the dictionary are added to the dictionary to facilitate later matching.
- generating word vectors is the process of transforming a language description into features that the computer can easily understand and manipulate; this process relies entirely on a recurrent neural network. Recurrent neural networks naturally have the ability to process and analyze serialized, associated data: they can condense a large piece of information into several core elements, and can also enrich words with no obvious mutual relation into an understandable piece of information.
- the segmented data is input to the network in vector form, and each word is analyzed and learned in turn until the last word has been processed, at which point a vector representation, called the word vector, is generated. This vector contains the information of the entire sentence; with it, typical images matching the description can be retrieved, or the subsequent category determination can be made.
- generating the word vector is thus the process of vectorizing the text description after the word segmentation operation, implemented with an LSTM (Long Short-Term Memory) network.
- the specific working process first converts each word in the text description into a vector, either by one-hot encoding or by using a mapping matrix whose number of rows is the number of dictionary words and whose number of columns is a specified size; preferably, the latter is used here, as illustrated below.
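- a small NumPy illustration of the two encoding options; the vocabulary size and embedding dimension are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 5, 8             # dictionary words x specified size

# Option 1: one-hot encoding (sparse; dimension equals the vocabulary size)
one_hot = np.eye(vocab_size)[2]          # vector for word label 2

# Option 2 (used here): a mapping matrix whose row count is the number of
# dictionary words and whose column count is the chosen embedding size
embedding = rng.normal(size=(vocab_size, embed_dim))
word_vec = embedding[2]                  # dense initial vector for word 2
```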
- FIG. 9 shows a cell structure diagram of a recurrent neural network used in a method in accordance with an embodiment of the present disclosure. After all words have been vectorized, they are reversed with respect to the order of the text description; the initial vector of each word is then selected in order and input into an LSTM network unit.
- the calculation process of an LSTM unit can be expressed as:

  m_t = sigmoid(f_t) * m_{t-1} + sigmoid(i_t) * tanh(c_t)

  where x_t represents the initial vector of the t-th word; h_{t-1} is the output of the previous LSTM unit; W is the weight matrix pre-trained on previous samples; i_t, c_t, f_t and o_t are the intermediate network states for the t-th word, obtained from x_t and h_{t-1} through W; m_{t-1} is the state transfer value of the previous word; sigmoid() and tanh() are the activation functions; and m_t is the state transfer value of the t-th word.
- h_t is the word vector generated from the first t words. If the input text description contains k words in total, then after the processing of k LSTM units, a final word vector h_k containing the description information is generated.
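- the text above spells out only the m_t update; a runnable sketch also needs the gate pre-activations and the output h_t, which are filled in below with the standard LSTM forms (W acting on the concatenated [x_t, h_{t-1}], and h_t = sigmoid(o_t) * tanh(m_t)). These completions, like all dimensions, are assumptions made for the sketch:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, m_prev, W, b):
    # intermediate states i_t, c_t, f_t, o_t for the t-th word, computed
    # from [x_t, h_{t-1}] via the pre-trained weight matrix W (assumed form)
    z = W @ np.concatenate([x_t, h_prev]) + b
    i_t, c_t, f_t, o_t = np.split(z, 4)
    # state transfer value, exactly as in the equation above
    m_t = sigmoid(f_t) * m_prev + sigmoid(i_t) * np.tanh(c_t)
    # assumed standard output gate: word vector after t words
    h_t = sigmoid(o_t) * np.tanh(m_t)
    return h_t, m_t

# a k-word description is processed word by word; h after the last
# word is the final word vector h_k containing the sentence information
d = 8                                        # hidden size (illustrative)
rng = np.random.default_rng(0)
W, b = rng.normal(size=(4 * d, 2 * d)), np.zeros(4 * d)
h = m = np.zeros(d)
for x_t in rng.normal(size=(3, d)):          # three word vectors
    h, m = lstm_step(x_t, h, m, W, b)
```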
- the typical image display part for the text description changes the traditional approach, in which image understanding is the sole criterion for the inspection system's results and similar-image displays; instead, the inspector's perception is made visible through text.
- the typical image template data is manually labeled and described in advance, and the BOW (Bag of Words) method is then used to classify the annotation information, yielding a BOW feature for each class of image.
- after the word vector of the input description has been generated, the correlation between this vector and the BOW feature of each typical image class is calculated; the categories corresponding to the three BOW features with the highest correlation are selected, and the typical images under those categories are extracted for visual display (see the sketch below).
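- a sketch of this retrieval step, assuming cosine similarity as the correlation measure and word vectors of the same dimensionality as the BOW features (neither choice is specified here):

```python
import numpy as np

def top_matching_categories(word_vec: np.ndarray,
                            bow_features: np.ndarray, k: int = 3):
    """Rank typical-image categories by the correlation between the
    description's word vector and each category's BOW feature; return
    the indices of the top-k categories for display."""
    w = word_vec / np.linalg.norm(word_vec)
    f = bow_features / np.linalg.norm(bow_features, axis=1, keepdims=True)
    return np.argsort(f @ w)[::-1][:k]   # top-k most correlated categories
```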
- in step S440, the first vector and the second vector are integrated to obtain a third vector representing the transmission image and the text description.
- the category to which the goods in the container belong is then determined based on the third vector. For example, a probability value indicating that the goods in the container belong to a certain category is generated from the third vector based on a probability function (for example, a Softmax function), and the category with the largest probability value is taken as the category to which the goods belong.
- FIG. 10 illustrates a schematic diagram of generating category information from image vectors and word vectors, in accordance with an embodiment of the present disclosure.
- the analysis and learning module 330 combines the computer's image understanding capability with human perception, a new means of completing inspection tasks more fully.
- the image understanding module based on a convolutional neural network and the text understanding module based on a recurrent neural network can each analyze the image or the text description and give corresponding results; the comprehensive analysis and learning module combines the capabilities of both.
- a process of mutual learning can thus be completed, and the output is a more accurate prediction result.
- in other words, whereas the image understanding module and the text understanding module each perform their part of the inspection on the image and the text description separately, the system's comprehensive analysis and learning module effectively combines the two to better assist the inspector in completing the inspection.
- the convolutional network for image understanding and the recurrent network for text understanding are each trained first, with their own losses calculated, so that the initial learning of the two networks is completed. The image representation vector of the convolutional network and the word vector output by the recurrent network are then integrated and, after a projection mapping, the Softmax function is again used to obtain the prediction category produced by combining the two networks. This effectively combines the two kinds of information; and since the two networks go through a joint training process during the training phase, when feedback adjustments are made, the adjustment of each network is subject to intervention and adjustment by the other network, which increases the learning capacity of the entire system.
- specifically, the local region of the container transmission image is used as the input to the VGG network and undergoes five stages of convolution and pooling operations (each stage corresponds to a set of convolution kernels and one pooling layer; the number and size of the convolution kernels are independent from stage to stage), followed by 3 fully connected layers; the output of the last layer is the vector representation I of the transmission image. For the recurrent neural network for text understanding, the inspector's text description is used as the network input: after the basic word segmentation operation, a text description sentence is converted into a corresponding list of words (optionally removing repeated words or assigning weights to words); then, by querying the existing dictionary, each word is turned into a word label and the vector representation of each word in the list is extracted. The words in the word list are then input into the LSTM network one by one in order; when all the words in the list have been processed, the final text understanding vector representation T is generated.
- the vector representation I of the image and the vector representation T of the text are spliced into one vector, which then passes through a 2-layer fully connected network, and category prediction is performed using a Softmax layer; in this way, a container cargo category prediction and reminder function combining image and text information is realized, as in the sketch below.
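- a minimal PyTorch sketch of this fusion step; the vector sizes, the hidden width of the two fully connected layers, and the class count are illustrative assumptions:

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Splices the image vector I and the text vector T into one vector
    (the 'third vector') and predicts the cargo category through two
    fully connected layers and a Softmax."""

    def __init__(self, dim_i: int = 4096, dim_t: int = 256,
                 hidden: int = 512, num_classes: int = 100):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim_i + dim_t, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, num_classes))

    def forward(self, img_vec: torch.Tensor,
                txt_vec: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([img_vec, txt_vec], dim=-1)   # the third vector
        return torch.softmax(self.mlp(fused), dim=-1)   # category probabilities
```

For training with a cross-entropy loss one would normally return the pre-Softmax logits instead; the Softmax is kept here to mirror the prediction step described in the text.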
- the network training process can use SGD (Stochastic Gradient Descent) or BGD (Batch Gradient Descent) to optimize the parameters of the network.
- the entire network structure contains separate processing networks for image and text, together with a common learning process that combines the two kinds of information; during each network adjustment, each network is to some extent interfered with and adjusted by the other, which increases the system's information utilization and learning capacity.
- the data analysis and image understanding capabilities of the computer are utilized to initially determine the approximate category of the target goods.
- human perceptual information is introduced, in particular comprehensive cognition of locally prominent regions, to give more accurate classification results and thereby improve the effectiveness of the inspection recommendations.
- aspects of the embodiments disclosed herein may be implemented, in whole or in part, in integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computer systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as substantially any combination of the above; and, in light of the present disclosure, those skilled in the art will be capable of designing the circuitry and/or writing the software and/or firmware code.
- signal bearing media include, but are not limited to: recordable media such as floppy disks, hard drives, compact disks (CDs), digital versatile disks (DVDs), digital tape, computer memory, and the like; and transmission-type media such as digital and/or analog communication media (e.g., fiber optic cable, waveguide, wired communication link, wireless communication link, etc.).
Abstract
Description
Claims (11)
- 1. A method of inspecting a container, comprising the steps of: performing an X-ray scan of the container to be inspected to obtain a transmission image; generating, with a convolutional neural network, a first vector describing a local transmission image from the transmission image; generating, with a recurrent neural network, a word vector from a textual description of the container cargo as a second vector; integrating the first vector and the second vector to obtain a third vector representing the transmission image and the textual description; and determining, based on the third vector, the category to which the goods in the container belong.
- 2. The method of claim 1, wherein the step of determining, based on the third vector, the category to which the goods in the container belong further comprises: generating, from the third vector and based on a probability function, a probability value indicating that the goods in the container belong to a certain category; and taking the category with the largest probability value as the category to which the goods belong.
- 3. The method of claim 2, further comprising: presenting to a user, according to the determined category, a typical transmission image associated with the category.
- 4. The method of claim 1, wherein the step of generating the word vector comprises: performing a word segmentation operation on the textual description of the container cargo; and vectorizing the segmented textual description to obtain the word vector.
- 5. The method of claim 4, further comprising the steps of: retrieving a corresponding typical transmission image from a typical transmission image library based on the word vector; and presenting the retrieved typical transmission image to a user.
- 6. The method of claim 1, further comprising the steps of: retrieving a corresponding typical transmission image from a typical transmission image library based on the first vector; and presenting the retrieved typical transmission image to a user.
- 7. An inspection apparatus, comprising: an X-ray inspection system that performs an X-ray scan of a container to be inspected to obtain a transmission image; a memory that stores the transmission image; and a processor configured to: generate, with a convolutional neural network, a first vector describing a local transmission image from the transmission image; generate, with a recurrent neural network, a word vector from a textual description of the container cargo as a second vector; integrate the first vector and the second vector to obtain a third vector representing the transmission image and the textual description; and determine, based on the third vector, the category to which the goods in the container belong.
- 8. The inspection apparatus of claim 7, wherein the processor is configured to: generate, from the third vector and based on a probability function, a probability value indicating that the goods in the container belong to a certain category; and take the category with the largest probability value as the category to which the goods belong.
- 9. The inspection apparatus of claim 8, wherein the processor is further configured to present to a user, according to the determined category, a typical transmission image associated with the category.
- 10. The inspection apparatus of claim 7, wherein the processor is configured to: perform a word segmentation operation on the textual description of the container cargo; and vectorize the segmented textual description to obtain a word vector.
- 11. The inspection apparatus of claim 10, wherein the processor is further configured to: retrieve a corresponding typical transmission image from a typical transmission image library based on the word vector; and present the retrieved typical transmission image to a user.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP18785106.8A EP3611666A4 (en) | 2017-04-14 | 2018-04-13 | INSPECTION PROCEDURE AND INSPECTION DEVICE |
KR1020197033257A KR20190139254A (ko) | 2017-04-14 | 2018-04-13 | 검사 방법 및 검사 장비 |
JP2019555877A JP2020516897A (ja) | 2017-04-14 | 2018-04-13 | 検査方法及び検査設備 |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710243591.9 | 2017-04-14 | ||
CN201710243591.9A CN108734183A (zh) | 2017-04-14 | 2017-04-14 | 检查方法和检查设备 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2018188653A1 true WO2018188653A1 (zh) | 2018-10-18 |
Family
ID=63792302
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2018/083012 WO2018188653A1 (zh) | 2017-04-14 | 2018-04-13 | 检查方法和检查设备 |
Country Status (5)
Country | Link |
---|---|
EP (1) | EP3611666A4 (zh) |
JP (1) | JP2020516897A (zh) |
KR (1) | KR20190139254A (zh) |
CN (1) | CN108734183A (zh) |
WO (1) | WO2018188653A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110472728A (zh) * | 2019-07-30 | 2019-11-19 | 腾讯科技(深圳)有限公司 | 目标信息确定方法、目标信息确定装置、介质及电子设备 |
CN111860263A (zh) * | 2020-07-10 | 2020-10-30 | 海尔优家智能科技(北京)有限公司 | 信息录入方法、装置及计算机可读存储介质 |
CN113496046A (zh) * | 2021-01-18 | 2021-10-12 | 图林科技(深圳)有限公司 | 一种基于区块链的电商物流系统及方法 |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109522913B (zh) * | 2017-09-18 | 2022-07-19 | 同方威视技术股份有限公司 | 检查方法和检查设备以及计算机可读介质 |
CN111461152B (zh) * | 2019-01-21 | 2024-04-05 | 同方威视技术股份有限公司 | 货物检测方法及装置、电子设备和计算机可读介质 |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015062352A1 (zh) * | 2013-10-29 | 2015-05-07 | 同方威视技术股份有限公司 | 立体成像系统及其方法 |
CN104751163A (zh) * | 2013-12-27 | 2015-07-01 | 同方威视技术股份有限公司 | 对货物进行自动分类识别的透视检查系统和方法 |
CN105784732A (zh) * | 2014-12-26 | 2016-07-20 | 同方威视技术股份有限公司 | 检查方法和检查系统 |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090174554A1 (en) * | 2005-05-11 | 2009-07-09 | Eric Bergeron | Method and system for screening luggage items, cargo containers or persons |
WO2013036735A1 (en) * | 2011-09-07 | 2013-03-14 | Rapiscan Systems, Inc. | X-ray inspection system that integrates manifest data with imaging/detection processing |
CN105808555B (zh) * | 2014-12-30 | 2019-07-26 | 清华大学 | 检查货物的方法和系统 |
JP6543986B2 (ja) * | 2015-03-25 | 2019-07-17 | 日本電気株式会社 | 情報処理装置、情報処理方法およびプログラム |
CN105574133A (zh) * | 2015-12-15 | 2016-05-11 | 苏州贝多环保技术有限公司 | 一种多模态的智能问答系统及方法 |
CN105975457A (zh) * | 2016-05-03 | 2016-09-28 | 成都数联铭品科技有限公司 | 基于全自动学习的信息分类预测系统 |
CN106446782A (zh) * | 2016-08-29 | 2017-02-22 | 北京小米移动软件有限公司 | 图像识别方法及装置 |
- 2017
  - 2017-04-14 CN CN201710243591.9A patent/CN108734183A/zh active Pending
- 2018
  - 2018-04-13 WO PCT/CN2018/083012 patent/WO2018188653A1/zh unknown
  - 2018-04-13 JP JP2019555877A patent/JP2020516897A/ja active Pending
  - 2018-04-13 KR KR1020197033257A patent/KR20190139254A/ko not_active Application Discontinuation
  - 2018-04-13 EP EP18785106.8A patent/EP3611666A4/en not_active Withdrawn
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015062352A1 (zh) * | 2013-10-29 | 2015-05-07 | 同方威视技术股份有限公司 | 立体成像系统及其方法 |
CN104751163A (zh) * | 2013-12-27 | 2015-07-01 | 同方威视技术股份有限公司 | 对货物进行自动分类识别的透视检查系统和方法 |
CN105784732A (zh) * | 2014-12-26 | 2016-07-20 | 同方威视技术股份有限公司 | 检查方法和检查系统 |
Non-Patent Citations (2)
Title |
---|
See also references of EP3611666A4 * |
ZHANG, JIAN ET AL.: "The Application of Neural Network Model for X-ray Image Fusion", COMPUTERIZED TOMOGRAPHY THEORY AND APPLICATIONS, vol. 20, no. 2, 30 June 2011 (2011-06-30), pages 235 - 243, XP009517427, ISSN: 1004-4140 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110472728A (zh) * | 2019-07-30 | 2019-11-19 | 腾讯科技(深圳)有限公司 | 目标信息确定方法、目标信息确定装置、介质及电子设备 |
CN111860263A (zh) * | 2020-07-10 | 2020-10-30 | 海尔优家智能科技(北京)有限公司 | 信息录入方法、装置及计算机可读存储介质 |
CN113496046A (zh) * | 2021-01-18 | 2021-10-12 | 图林科技(深圳)有限公司 | 一种基于区块链的电商物流系统及方法 |
CN113496046B (zh) * | 2021-01-18 | 2024-05-10 | 华翼(广东)电商科技有限公司 | 一种基于区块链的电商物流系统及方法 |
Also Published As
Publication number | Publication date |
---|---|
CN108734183A (zh) | 2018-11-02 |
EP3611666A1 (en) | 2020-02-19 |
KR20190139254A (ko) | 2019-12-17 |
JP2020516897A (ja) | 2020-06-11 |
EP3611666A4 (en) | 2021-01-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Ghiasi et al. | Scaling open-vocabulary image segmentation with image-level labels | |
WO2022227207A1 (zh) | 文本分类方法、装置、计算机设备和存储介质 | |
WO2018188653A1 (zh) | 检查方法和检查设备 | |
CN112214995B (zh) | 用于同义词预测的分层多任务术语嵌入学习 | |
CN111797241B (zh) | 基于强化学习的事件论元抽取方法及装置 | |
CN111125406B (zh) | 一种基于自适应聚类学习的视觉关系检测方法 | |
CN111914097A (zh) | 基于注意力机制和多层级特征融合的实体抽取方法与装置 | |
Younis et al. | Detection and annotation of plant organs from digitised herbarium scans using deep learning | |
WO2019052561A1 (zh) | 检查方法和检查设备以及计算机可读介质 | |
Menshawy | Deep Learning By Example: A hands-on guide to implementing advanced machine learning algorithms and neural networks | |
CN113672931B (zh) | 一种基于预训练的软件漏洞自动检测方法及装置 | |
Cheng et al. | A semi-supervised deep learning image caption model based on Pseudo Label and N-gram | |
CN117611576A (zh) | 一种基于图文融合对比学习预测方法 | |
CN115965818A (zh) | 一种基于相似度特征融合的小样本图像分类方法 | |
Gunaseelan et al. | Automatic extraction of segments from resumes using machine learning | |
CN111242059B (zh) | 基于递归记忆网络的无监督图像描述模型的生成方法 | |
Wu et al. | AGNet: Automatic generation network for skin imaging reports | |
US20240028828A1 (en) | Machine learning model architecture and user interface to indicate impact of text ngrams | |
Li et al. | Legal case inspection: An analogy-based approach to judgment evaluation | |
Sharma et al. | Optical Character Recognition Using Hybrid CRNN Based Lexicon-Free Approach with Grey Wolf Hyperparameter Optimization | |
CN116503674B (zh) | 一种基于语义指导的小样本图像分类方法、装置及介质 | |
Souri et al. | Neural network dealing with Arabic language | |
Yan et al. | Causality Extraction Cascade Model Based on Dual Labeling | |
Jayaswal et al. | Image Captioning Using VGG-16 Deep Learning Model | |
Abbruzzese et al. | REMOAC: A retroactive explainable method for OCR anomalies correction in legal domain |
Legal Events
- 121: Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 18785106; Country of ref document: EP; Kind code of ref document: A1)
- ENP: Entry into the national phase (Ref document number: 2019555877; Country of ref document: JP; Kind code of ref document: A)
- NENP: Non-entry into the national phase (Ref country code: DE)
- ENP: Entry into the national phase (Ref document number: 20197033257; Country of ref document: KR; Kind code of ref document: A)
- ENP: Entry into the national phase (Ref document number: 2018785106; Country of ref document: EP; Effective date: 20191114)