
WO2018188653A1 - Inspection method and inspection device - Google Patents

Inspection method and inspection device

Info

Publication number
WO2018188653A1
Authority
WO
WIPO (PCT)
Prior art keywords
vector
transmission image
category
container
word
Application number
PCT/CN2018/083012
Other languages
English (en)
French (fr)
Inventor
张健
赵占永
顾建平
刘耀红
赵自然
Original Assignee
清华大学
同方威视技术股份有限公司
Application filed by 清华大学, 同方威视技术股份有限公司
Priority to EP18785106.8A (patent EP3611666A4, en)
Priority to KR1020197033257A (patent KR20190139254A, ko)
Priority to JP2019555877A (patent JP2020516897A, ja)
Publication of WO2018188653A1 (patent WO2018188653A1, zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01N INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N23/00 Investigating or analysing materials by the use of wave or particle radiation, e.g. X-rays or neutrons, not covered by groups G01N3/00 – G01N17/00, G01N21/00 or G01N22/00
    • G01N23/02 Investigating or analysing materials by the use of wave or particle radiation, e.g. X-rays or neutrons, not covered by groups G01N3/00 – G01N17/00, G01N21/00 or G01N22/00 by transmitting the radiation through the material
    • G01N23/04 Investigating or analysing materials by the use of wave or particle radiation, e.g. X-rays or neutrons, not covered by groups G01N3/00 – G01N17/00, G01N21/00 or G01N22/00 by transmitting the radiation through the material and forming images of the material
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/254 Fusion techniques of classification results, e.g. of results related to same input data
    • G06F18/256 Fusion techniques of classification results, e.g. of results related to same input data of results relating to different input data, e.g. multimodal recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/809 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data
    • G06V10/811 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data the classifiers operating on different input data, e.g. multi-modal recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V2201/05 Recognition of patterns representing particular kinds of hidden objects, e.g. weapons, explosives, drugs

Definitions

  • Embodiments of the present disclosure relate to security checks, and more particularly to a method and apparatus for inspecting a container or the like based on image information and text information.
  • a method of inspecting a container, comprising the steps of: performing an X-ray scan of the container to be inspected to obtain a transmission image; generating, from the transmission image, a first vector describing a local transmission image using a convolutional neural network; generating a word vector from a textual description of the container cargo using a recurrent neural network, as a second vector; integrating the first vector and the second vector to obtain a third vector representing the transmission image and the textual description; and discriminating, based on the third vector, the category to which the goods in the container belong.
  • the step of discriminating the category to which the goods in the container belong based on the third vector further comprises: generating from the third vector, based on a probability function, probability values indicating that the goods in the container belong to given categories; and taking the category with the highest probability value as the category to which the goods belong.
  • the method further includes presenting to the user a typical transmission image associated with the category based on the identified category.
  • the step of generating a word vector includes: performing a word segmentation operation on the textual description of the container cargo; and vectorizing the textual description after the word segmentation operation to obtain a word vector.
  • the method further comprises the steps of: retrieving a corresponding typical transmission image from a library of typical transmission images based on the word vector; and presenting the retrieved typical transmission image to a user.
  • the method further comprises the steps of: retrieving a corresponding typical transmission image from a library of typical transmission images based on the first vector; and presenting the retrieved typical transmission image to a user.
  • an inspection apparatus comprising: an X-ray inspection system that performs an X-ray scan of a container to be inspected to obtain a transmission image; a memory that stores the transmission image; and a processor configured to: generate, from the transmission image, a first vector describing a local transmission image using a convolutional neural network; generate a word vector from a textual description of the container cargo using a recurrent neural network, as a second vector; integrate the first vector and the second vector to obtain a third vector representing the transmission image and the textual description; and discriminate the category to which the goods in the container belong based on the third vector.
  • the processor is configured to generate from the third vector, based on a probability function, probability values indicating that the goods in the container belong to given categories, and to take the category having the largest probability value as the category to which the goods belong.
  • the processor is further configured to present a typical transmission image associated with the category to a user according to the identified category.
  • the processor is configured to perform a word segmentation operation on the textual description of the container cargo, and to vectorize the textual description after the word segmentation operation to obtain a word vector.
  • the processor is further configured to: retrieve a corresponding typical transmission image from a typical transmission image library based on the word vector; and present the retrieved typical transmission image to a user.
  • the approximate category of the target cargo can be determined initially, which facilitates the inspector's further judgment.
  • FIG. 1 shows a schematic structural view of an inspection apparatus according to an embodiment of the present disclosure;
  • FIG. 2 is a schematic diagram showing the structure of a computing device included in the inspection apparatus shown in FIG. 1;
  • FIG. 3 shows a schematic block configuration diagram of an inspection apparatus according to an embodiment of the present disclosure;
  • FIG. 4 shows a schematic flow chart of an inspection method according to an embodiment of the present disclosure;
  • FIG. 5 illustrates a schematic diagram of matching a cargo image with category information in accordance with an embodiment of the present disclosure;
  • FIG. 6 illustrates a schematic diagram of determining a category of a cargo using a convolutional neural network, in accordance with an embodiment of the present disclosure;
  • FIG. 7 illustrates a schematic diagram of retrieving a typical transmission image based on category information, in accordance with another embodiment of the present disclosure;
  • FIG. 8 illustrates a word vector space relationship diagram used in a method in accordance with an embodiment of the present disclosure;
  • FIG. 9 illustrates a cell structure diagram of a recurrent neural network used in a method in accordance with an embodiment of the present disclosure;
  • FIG. 10 illustrates a schematic diagram of generating category information from image vectors and word vectors, in accordance with an embodiment of the present disclosure.
  • embodiments of the present disclosure propose a human-assisted inspection technique based on X-ray images and text descriptions, providing an intelligent analysis tool for classifying and checking goods in specific areas (key areas of interest).
  • in practice, inspectors mostly base their judgment on local areas of the image, which is also where human-machine "mutual assistance" is most closely related and most necessary.
  • the technology uses the computer's data analysis and image understanding capabilities to initially determine the approximate category of the target shipment.
  • human perception information is introduced, especially for the comprehensive cognition of local prominent regions, and more accurate classification results are given, thereby improving the effectiveness of the inspection recommendations.
  • FIG. 1 shows a schematic structural view of an inspection apparatus according to an embodiment of the present disclosure.
  • an inspection apparatus 100 includes an X-ray source 110, a detector 130, a data acquisition device 150, a controller 140, and a computing device 160, and performs a security check on an object to be inspected 120 such as a container truck, for example determining whether it contains dangerous goods such as firearms or drugs and/or suspicious items.
  • although the detector 130 and the data acquisition device 150 are described separately in this embodiment, those skilled in the art will appreciate that they may also be integrated together as an X-ray detection and data acquisition device.
  • the X-ray source 110 described above may be an isotope, or may be an X-ray machine or an accelerator or the like.
  • the X-ray source 110 can be single energy or dual energy.
  • the object to be inspected 120 is scanned using the X-ray source 110 and the detector 130, under the control of the controller 140 and the computing device 160, to obtain probe data.
  • the operator, by means of the human-computer interaction interface of the computing device 160, issues an instruction through the controller 140 to command the X-ray source 110 to emit radiation, which passes through the object under inspection 120 and is received by the detector 130 and the data acquisition device 150. The computing device 160 processes the data to obtain a transmission image, generates an image vector (first vector) describing the local transmission image from the transmission image using a trained convolutional neural network, and generates a word vector (second vector) from the textual description of the container cargo using a trained recurrent neural network.
  • computing device 160 determines the category to which the goods in the container belong based on the image vector and the word vector. For example, computing device 160 integrates the first vector and the second vector to obtain a third vector representing the transmission image and the textual description, and discriminates, based on the third vector, the category to which the cargo in the container belongs.
  • FIG. 2 shows a schematic structural diagram of a computing device as shown in FIG. 1.
  • the signal detected by the detector 130 is collected by a data collector, and the data is stored in the memory 161 through the interface unit 167 and the bus 163.
  • Configuration information and a program of the computer data processor are stored in a read only memory (ROM) 162.
  • a random access memory (RAM) 163 is used to temporarily store various data during the operation of the processor 165.
  • a computer program for performing data processing such as a substance recognition program, an image processing program, and the like are also stored in the memory 161.
  • the internal bus 163 is connected to the above-described memory 161, read only memory 162, random access memory 163, input device 164, processor 165, display device 166, and interface unit 167.
  • the instruction code of the computer program instructs the processor 165 to execute a predetermined data processing algorithm; after the data processing result is obtained, it is displayed on the display device 166, such as an LCD (liquid crystal display), or output directly in hard-copy form, such as by printing.
  • FIG. 3 shows a schematic block configuration diagram of an inspection apparatus according to an embodiment of the present disclosure.
  • a software program is installed in the computing device 160 of the inspection device to determine the category of the goods, such as the HSCODE, based on the transmission image of the containerized cargo and the textual information describing the goods.
  • the convolutional neural network based image understanding module 310 processes the input transmission image to obtain an image vector.
  • the text understanding module 320, based on the recurrent neural network, processes the input text information to obtain a word vector.
  • the analysis and learning module 330 determines the category to which the goods belong based on the image vector and the word vector.
  • FIG. 4 shows a schematic flow chart of an inspection method according to an embodiment of the present disclosure.
  • the inspection apparatus shown in FIG. 1 performs X-ray scanning on the container to be inspected to obtain a transmission image.
  • a first vector describing the local transmission image is generated from the transmission image using a convolutional neural network.
  • convolution and pooling operations are performed with a local region of the container transmission image as input, followed by a full convolution operation, and a vector representation of the transmission image is output as the first vector.
  • the local area of the container transmission image is taken as input and, after five stages of convolution and pooling operations (each stage corresponds to a set of convolution kernels and one pooling layer, and the number and size of the convolution kernels are independent from stage to stage), the network can output a vector representation of the transmission image.
  • the convolutional neural network based image understanding module 310 is responsible for cargo identification and analysis of x-ray images.
  • the module 310 mainly includes two parts: cargo category judgment using a convolutional network, and typical template matching.
  • the image of the local sensitive area contains rich texture information about the goods.
  • the cargo category judgment of the convolutional neural network takes a specific local area image as input and, through multi-level matrix operations, generates a vector representation of the local area image; this vector can be used to infer the category to which the goods belong. As shown in FIG. 5, this information is represented as the HSCODE of the cargo type together with the corresponding confidence probability.
  • the convolutional neural network preferably employs a network structure of VGG (Visual Geometry Group)-Net, but those skilled in the art understand that different embodiments may not be limited to such a structure.
  • the input of the convolutional neural network is a local area of the cargo X-ray image. After multi-level convolution, pooling, and fully connected operations, a vector is obtained that represents the information in the image.
  • the convolution operation simulates a filter performing feature learning on the image, in order to fully extract the information in the image.
  • multiple different and independent convolution kernels are utilized, each convolution kernel convolving the input separately and passing all convolution results into the next operation.
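
As a rough illustration of applying several independent kernels to the same input, the following pure-Python sketch slides each kernel over a small matrix and collects all the resulting feature maps. The function names, the toy image, the two kernels, and the choice of no padding with stride 1 are assumptions made for this example; they are not taken from the disclosure.

```python
def conv2d_valid(image, kernel):
    """Slide one kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def conv_layer(image, kernels):
    """Apply every kernel independently; all feature maps go to the next stage."""
    return [conv2d_valid(image, k) for k in kernels]


# Toy 3x4 "image" and two hypothetical 2x2 kernels.
image = [
    [1, 2, 3, 0],
    [4, 5, 6, 0],
    [7, 8, 9, 0],
]
kernels = [
    [[1, 0], [0, 1]],    # sums the main diagonal of each 2x2 patch
    [[1, -1], [1, -1]],  # left column minus right column of each patch
]
feature_maps = conv_layer(image, kernels)
```

Each kernel produces its own feature map, so the number of maps passed to the next operation equals the number of kernels.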
  • in the pooling operation, each output matrix after convolution is divided into an n*m grid, where n and m respectively represent the number of rows and columns of the grid; the maximum value in each grid cell is taken as the output value of that cell, finally yielding a matrix of size n*m, which is the output of the pooling operation.
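
The grid-based max pooling described here can be sketched in a few lines. This is a minimal illustration that assumes the matrix dimensions divide evenly by n and m; the function name and the sample matrix are invented for the example.

```python
def max_pool_grid(matrix, n, m):
    """Split the matrix into an n-by-m grid of cells and keep each cell's maximum.

    Assumption: the row count is divisible by n and the column count by m.
    """
    rows, cols = len(matrix), len(matrix[0])
    if rows % n or cols % m:
        raise ValueError("matrix shape must be divisible by the grid shape")
    cell_h, cell_w = rows // n, cols // m
    return [
        [
            max(matrix[r][c]
                for r in range(gi * cell_h, (gi + 1) * cell_h)
                for c in range(gj * cell_w, (gj + 1) * cell_w))
            for gj in range(m)
        ]
        for gi in range(n)
    ]


feature_map = [
    [1, 3, 2, 0],
    [5, 4, 1, 1],
    [0, 2, 9, 7],
    [6, 1, 3, 8],
]
pooled = max_pool_grid(feature_map, 2, 2)  # [[5, 2], [6, 9]]
```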
  • the fully connected process vectorizes the output matrix produced by the multi-layer convolution and pooling operations, and also applies a mapping operation to the data using the fully connected matrix. This increases the learning capacity and at the same time transforms the output matrix into a vector whose length equals the number of categories, to facilitate the subsequent classification operation.
  • this vector is then converted into probabilities using the Softmax function. That is, each element of the vector represents a probability value, corresponding to the probability that the object under inspection belongs to a given category.
  • the probabilistic formula of the Softmax function can be expressed as: $p(c_i \mid v) = \dfrac{e^{v_i}}{\sum_{j=1}^{k} e^{v_j}}$
  • where $v$ denotes the fully connected output vector, $v_i$ denotes the i-th element of $v$, $k$ is the length of the vector, $c_i$ denotes the i-th class, and $p(c_i \mid v)$ is the probability that the input is predicted to belong to the i-th class. Accordingly, the category with the largest probability value can be used as the prediction result of the first stage.
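
The Softmax step and the "largest probability wins" rule can be illustrated as follows. This is a sketch only: the category labels are placeholders rather than real HSCODE values, and subtracting the maximum before exponentiating is a common numerical-stability measure added here, not something stated in the text.

```python
import math


def softmax(v):
    """Turn a score vector into probabilities that sum to 1."""
    shift = max(v)  # stability trick; does not change the result
    exps = [math.exp(x - shift) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def predict(v, categories):
    """Return the category with the largest probability and that probability."""
    probs = softmax(v)
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best], probs[best]


# Placeholder labels standing in for cargo categories.
categories = ["category_a", "category_b", "category_c"]
scores = [1.0, 3.0, 0.5]
label, confidence = predict(scores, categories)
```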
  • the typical template matching function can use the HSCODE given by the convolutional neural network to visually provide a number of typical texture image blocks for those goods, thereby helping to confirm whether the inferred information is credible, as shown in FIG. 7.
  • typical cargo images with the corresponding HSCODE can be called up and presented in a nine-square (3×3) grid.
  • the inspector can compare the image of the goods to be inspected with the typical data for better judgment.
  • a word vector is generated from the textual description of the container cargo using the recurrent neural network, as the second vector.
  • the textual description of the containerized goods is used as the input to the network, and the textual description is converted into a list of words by a word segmentation operation.
  • a vector representation of each word in the list is then obtained by querying the dictionary. More specifically, the inspector's textual description is used as the network input.
  • a descriptive sentence is converted into a corresponding list of words (in some cases, repeated words may be removed, or certain weights may be assigned to words).
  • each word is then converted into a word label by querying the existing dictionary, and a vector representation of each word in the list is extracted.
  • the words in the word list are then input one by one, in order, into the LSTM (Long Short-Term Memory) network for prediction.
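
The preprocessing chain described above (sentence, word list, word labels, per-word vectors) might look like the sketch below. It uses whitespace splitting of an English phrase as a stand-in for real word segmentation, and the dictionary, the `<unk>` fallback entry, and the vector table are all hypothetical values chosen for illustration.

```python
def to_word_list(description, dedupe=True):
    """Stand-in segmentation: lowercase and split on whitespace."""
    words = description.lower().split()
    if dedupe:
        # Optionally drop repeated words while keeping first-seen order.
        words = list(dict.fromkeys(words))
    return words


def lookup(words, dictionary, vectors, unknown="<unk>"):
    """Map each word to its label (index), then fetch its vector."""
    labels = [dictionary.get(w, dictionary[unknown]) for w in words]
    return labels, [vectors[i] for i in labels]


# Hypothetical dictionary and vector table (one row per word label).
dictionary = {"<unk>": 0, "steel": 1, "pipes": 2, "rolled": 3}
vectors = [[0.0, 0.0], [0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]

words = to_word_list("Rolled steel pipes steel")
labels, word_vectors = lookup(words, dictionary, vectors)
```

The resulting `word_vectors` list is what would be fed, one element at a time, into the recurrent network.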
  • the text understanding module 320, based on the recurrent neural network, takes a textual description as input, processes it, and finally outputs typical images consistent with the text description, so as to provide the inspector with effective information for decision-making in a more user-friendly manner.
  • the module includes a word segmentation part, a word vector generation part, a typical image display part, and the like.
  • FIG. 8 illustrates a word vector space relationship diagram used in a method in accordance with an embodiment of the present disclosure.
  • the typical image display part of the text understanding module addresses the fact that earlier inspection systems essentially took only images as input and relied solely on the computer's ability to understand images, with human perception rarely brought in. Under certain conditions, the inspector may be unable to judge which category an image belongs to and can only characterize the goods by describing their texture, shape, and so on, in order to call up typical historical images for comparison. A traditional inspection system would require keyword information to be entered, which indirectly becomes a burden on the inspector. Generating the corresponding word vector with a recurrent neural network naturally provides the ability to learn distances between similar words (FIG. 8). As a result, in actual operation it is not necessary to type the fixed keyword for the goods completely and accurately in order to retrieve the desired image conveniently and accurately.
  • the word segmentation operation is a preprocessing step for systems that take sentences as input (it is especially important for Chinese). Unlike earlier template data retrieval that takes attributes or keywords as input, the module lets users type the desired information more flexibly and completely in the form of full sentences. However, working with sentences as the basic unit of information is complex and inefficient, so sentences must be properly decomposed into unit words.
  • the word segmentation operation follows the human understanding of the language: based on a dictionary, the sentence is divided into an array (or vector) whose elements are words or phrases, to facilitate computer understanding.
  • text understanding belongs to the field of natural language processing, and word segmentation is the basis of text mining. For Chinese input in particular, because of the inherent structure of the language, sentences and paragraphs are clearly marked but there is no explicit delimiter between individual words. Therefore, the first requirement of text understanding is a clear segmentation of the text description.
  • the word segmentation operation is performed here based on statistics and machine learning.
  • a dictionary is first established based on historical knowledge. During word segmentation, string matching is performed using certain rules. For ambiguous words and words not recorded in the dictionary, a CRF (conditional random field) is used.
  • the sentence is labeled with word positions (beginning of a word, middle of a word, end of a word, and single-character word), and the CRF is then used to perform the word segmentation; at the same time, new words not registered in the dictionary are added to it to facilitate later matching.
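
To make the dictionary matching and the word-position labels more concrete, here is a small sketch. The text does not say which matching rule is used, so forward maximum matching is assumed purely for illustration; the B/M/E/S tag letters are a common convention for begin, middle, end, and single-character word. No CRF is trained here: the tags shown are the kind of labels a CRF would learn to predict.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Greedy dictionary segmentation: take the longest known word at each position.

    Characters that start no dictionary word become single-character words.
    """
    words, pos = [], 0
    while pos < len(text):
        for size in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                pos += size
                break
    return words


def bmes_tags(words):
    """Label every character: B(egin), M(iddle), E(nd), or S(ingle)."""
    tags = []
    for w in words:
        if len(w) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(w) - 2) + ["E"])
    return tags


# Hypothetical three-word dictionary.
dictionary = {"集装箱", "货物", "检查"}
words = forward_max_match("集装箱的货物", dictionary)
tags = bmes_tags(words)
```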
  • word vector generation is the process of transforming a language description into features that are easy for the computer to understand and manipulate. This process relies entirely on a recurrent neural network. Recurrent neural networks are naturally able to process and analyze sequentially correlated data: they can condense large pieces of information into a few core elements, and can also enrich words that are not obviously related to one another into a piece of information that can be understood.
  • the segmented data is input to the network in vector form, and one word is analyzed and learned at a time; once the last word has been processed, a vector representation, called a word vector, is generated. This vector contains the information of the entire sentence; it can be used to retrieve typical images matching the description, or for the subsequent category determination.
  • word vector generation is the process of vectorizing the textual description after the word segmentation operation. An LSTM (Long Short-Term Memory) recurrent neural network is used here.
  • the specific working process is to convert each word in the text description into a vector, either by one-hot encoding or by using a mapping matrix whose number of rows is the number of dictionary words and whose number of columns is a specified size. The latter is used here.
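
The two encodings mentioned here are closely related: looking up a row of the mapping matrix gives the same result as multiplying the one-hot vector by that matrix, but without building a vector as long as the dictionary. The sketch below demonstrates this with a made-up 4-word dictionary and a column size of 3; all values are illustrative.

```python
def one_hot(index, vocab_size):
    """Vector of dictionary length with a single 1 at the word's position."""
    return [1.0 if i == index else 0.0 for i in range(vocab_size)]


def embed(index, mapping):
    """Mapping-matrix encoding: simply read the row for this word."""
    return mapping[index]


def vec_times_matrix(vec, matrix):
    """Row vector times matrix, written out explicitly."""
    cols = len(matrix[0])
    return [sum(vec[r] * matrix[r][c] for r in range(len(matrix)))
            for c in range(cols)]


# Hypothetical mapping matrix: 4 dictionary words (rows) x 3 dimensions (columns).
mapping = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]
word_index = 2
dense_vector = embed(word_index, mapping)
via_one_hot = vec_times_matrix(one_hot(word_index, len(mapping)), mapping)
```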
  • FIG. 9 shows a cell structure diagram of a recurrent neural network used in a method in accordance with an embodiment of the present disclosure. After all the words have been vectorized, they are arranged in reverse order of the text description, and the initial vector of each word is then selected in turn and input into the LSTM network unit.
  • the calculation process of the LSTM unit can be expressed as:
  • $m_t = \mathrm{sigmoid}(f_t) * m_{t-1} + \mathrm{sigmoid}(i_t) * \tanh(c_t)$
  • where $x_t$ represents the initial vector of the t-th word; $h_{t-1}$ is the output of the previous LSTM unit; $W$ is the weight matrix, that is, the parameter matrix pre-trained on earlier samples; $i_t$, $c_t$, $f_t$ and $o_t$ are the intermediate network states for the t-th word; $m_{t-1}$ is the transfer value of the intermediate state of the previous word; sigmoid() and tanh() are the activation functions; and $m_t$ is the state transfer value of the t-th word.
  • $h_t$ is the word vector generated from the first t words. If the input text description contains k words in total, then after processing by k LSTM units, a final word vector $h_k$ containing the description information is generated.
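
A minimal sketch of one LSTM step and of encoding a whole word sequence is given below. Only the state update for m_t comes from the text; two pieces are assumptions based on the standard LSTM formulation: that the intermediate states i_t, c_t, f_t, o_t are obtained by multiplying W with the concatenation of x_t and h_{t-1}, and that the output is h_t = sigmoid(o_t) * tanh(m_t). Bias terms are omitted, and all names and sizes are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x_t, h_prev, m_prev, W):
    """One LSTM unit.

    W has 4*d rows and len(x_t)+d columns (d = state size); its rows are
    read as four blocks giving i_t, c_t, f_t, o_t (assumed layout).
    """
    d = len(h_prev)
    inp = x_t + h_prev  # concatenation [x_t ; h_{t-1}]
    pre = [sum(w * a for w, a in zip(row, inp)) for row in W]
    i_t, c_t, f_t, o_t = (pre[k * d:(k + 1) * d] for k in range(4))
    # State update as given in the text.
    m_t = [sigmoid(f) * m + sigmoid(i) * math.tanh(c)
           for f, m, i, c in zip(f_t, m_prev, i_t, c_t)]
    # Output equation: assumed standard form, not stated in the text.
    h_t = [sigmoid(o) * math.tanh(m) for o, m in zip(o_t, m_t)]
    return h_t, m_t


def encode(word_vectors, W, d):
    """Feed the words one by one; the last output is the sentence vector h_k."""
    h, m = [0.0] * d, [0.0] * d
    for x_t in word_vectors:
        h, m = lstm_step(x_t, h, m, W)
    return h


# Tiny example: state size 1, word vectors of length 2, arbitrary weights.
W = [[0.5, -0.3, 0.1],
     [0.2, 0.4, -0.1],
     [0.0, 0.1, 0.3],
     [-0.2, 0.6, 0.2]]
h_k = encode([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], W, d=1)
```

With zero weights every gate evaluates to sigmoid(0) = 0.5, which gives an easy sanity check: the state is simply halved at each step.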
  • the typical image display part for text descriptions departs from traditional inspection systems, which use image understanding as the sole basis for results and for displaying similar images; instead, it visualizes the inspector's perception through text.
  • the typical image template data is first manually labeled and described, and the BOW (Bag of Words) method is then used to classify the annotation information, so that a BOW feature is obtained for each type of image.
  • the correlation between this vector and the BOW feature of each type of typical image is calculated. The categories corresponding to the three BOW features with the highest correlation are then selected, and the typical images under those categories are extracted for visual display.
  • in step S440, the first vector and the second vector are integrated to obtain a third vector representing the transmission image and the text description.
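
The retrieval step can be sketched as follows. Two simplifications are assumed for the example: the inspector's description is itself represented as a bag-of-words vector, and cosine similarity stands in for the unspecified "correlation" measure. The vocabulary, category names, and annotations are invented.

```python
import math
from collections import Counter


def bow(words, vocabulary):
    """Bag-of-words feature: count of each vocabulary word."""
    counts = Counter(words)
    return [counts[w] for w in vocabulary]


def cosine(a, b):
    """Cosine similarity, used here as the correlation measure (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_categories(query_words, annotations, vocabulary, k=3):
    """Rank categories by similarity to the query and keep the top k."""
    q = bow(query_words, vocabulary)
    scored = [(cosine(q, bow(words, vocabulary)), cat)
              for cat, words in annotations.items()]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [cat for _, cat in scored[:k]]


# Hypothetical vocabulary and manually annotated category descriptions.
vocabulary = ["soft", "rolled", "dense", "round", "granular", "long"]
annotations = {
    "textiles": ["soft", "rolled", "rolled"],
    "steel pipes": ["dense", "round", "long", "long"],
    "grain": ["granular", "granular", "soft"],
    "fruit": ["round", "soft"],
}
best = top_categories(["long", "round", "dense", "granular"],
                      annotations, vocabulary)
```

Typical images stored under the returned categories would then be shown to the inspector.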
  • the category to which the goods in the container belong is discriminated based on the third vector. For example, a probability value indicating that the goods in the container belong to a certain category is generated from the third vector based on a probability function (for example, a Softmax function), and the category having the largest probability value is regarded as the category to which the goods belong.
  • FIG. 10 illustrates a schematic diagram of generating category information from image vectors and word vectors, in accordance with an embodiment of the present disclosure.
  • the analysis and learning module 330 combines the image understanding capability of the computer with human perception, providing a new means of completing inspection tasks more fully.
  • the image understanding module based on the convolutional neural network and the text understanding module based on the recurrent neural network can each analyze an image or a text description, respectively, and give corresponding results; the comprehensive analysis and learning module combines the capabilities of both, so that a mutual learning process can be completed and a more accurate prediction result is output.
  • the image understanding module based on the convolutional neural network and the text understanding module based on the recurrent neural network each perform their own analysis, on the image and on the text description respectively, and the comprehensive analysis and learning module of the system effectively combines the two to better assist the inspector in completing the inspection.
  • the convolutional network for image understanding and the recurrent network for text understanding are first each trained with their own losses, completing the initial learning of the two networks. The image representation vector from the convolutional network and the word vector output by the recurrent network are then integrated and, after a projection mapping, the Softmax function is likewise used to obtain the category predicted by the two networks combined. This effectively combines the two kinds of information. Because the two networks are trained jointly during the training phase, each network's adjustment during feedback is influenced by the other network, which increases the learning capacity of the whole system.
  • in the convolutional neural network for image understanding, the local area of the container transmission image is used as the input of the VGG network and undergoes five stages of convolution and pooling operations (each stage corresponds to a set of convolution kernels and one pooling layer, and the number and size of the convolution kernels are independent from stage to stage), followed by a 3-layer full convolution operation; the output of the last convolution layer is the vector representation I of the transmission image.
  • in the recurrent neural network for text understanding, the inspector's text description is used as the network input. After a basic word segmentation operation, a descriptive sentence is converted into a corresponding list of words (here, repeated words may optionally be removed, or certain weights assigned to words). Each word is then converted into a word label by querying the existing dictionary, and the vector representation of each word in the list is extracted. The words in the list are then input into the LSTM network one by one, in order; once all the words in the list have been processed, a vector representation T of the final text understanding is generated.
  • the vector representation I of the image and the vector representation T of the text are concatenated into a single vector, which is passed through a 2-layer full convolution network; category prediction is then performed using a Softmax layer, thereby realizing a container cargo category prediction and prompting function that combines image and text information.
  • the network training learning process can use SGD (Stochastic Gradient Descent) and BGD (Stochastic Gradient Descent) to optimize the parameters of the learning network.
  • the entire network structure contains their own (image, text) processing network, and there is a common learning process that combines two kinds of information. In each network adjustment, it will be interfered and adjusted by another network to some extent, adding to the system's Information utilization and learning.
  • the data analysis and image understanding capabilities of the computer are utilized to initially determine the approximate category of the target goods.
  • human perception information is introduced, especially for the comprehensive cognition of local prominent regions, and more accurate classification results are given, thereby improving the effectiveness of the inspection recommendations.
  • aspects of the embodiments disclosed herein may be implemented in an integrated circuit as a whole or in part, as one or more of one or more computers running on one or more computers.
  • a computer program eg, implemented as one or more programs running on one or more computer systems
  • implemented as one or more programs running on one or more processors eg, implemented as one or One or more programs running on a plurality of microprocessors, implemented as firmware, or substantially in any combination of the above, and those skilled in the art, in accordance with the present disclosure, will be provided with design circuitry and/or write software and / or firmware code capabilities.
  • signal bearing media include, but are not limited to, recordable media such as floppy disks, hard drives, compact disks (CDs), digital versatile disks (DVDs), digital tapes, computer memories, and the like; and transmission-type media such as digital and / or analog communication media (eg, fiber optic cable, waveguide, wired communication link, wireless communication link, etc.).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Analysing Materials By The Use Of Radiation (AREA)
  • Image Analysis (AREA)

Abstract

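An inspection device and an inspection method are disclosed. A container to be inspected is scanned with X-rays to obtain a transmission image. A convolutional neural network is then used to generate, from the transmission image, a first vector describing a local transmission image, and a recurrent neural network is used to generate a word vector from a text description of the container cargo, as a second vector. The first vector and the second vector are integrated to obtain a third vector representing the transmission image and the text description. The category to which the cargo in the container belongs is determined based on the third vector. According to embodiments of the present disclosure, the approximate category of the target cargo can be preliminarily determined, which facilitates further judgment by the image inspector.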

Description

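Inspection method and inspection device
Technical Field
Embodiments of the present disclosure relate to security inspection, and in particular to a method and device for inspecting cargo such as containers based on image information and text information.
Background
At present, radiation security inspection systems focus mainly on the analysis of X-ray images of cargo. For example, techniques from image understanding are used to perform cargo classification and recognition tasks. For cargo that is hard to distinguish, however, the judgment still relies mainly on human cognition, and human-machine assistance has not yet reached the level of "mutual assistance".
Summary
In view of one or more problems in the prior art, a method and device for inspecting cargo such as containers are proposed.
In one aspect of the present disclosure, a method for inspecting a container is proposed, comprising the steps of: performing X-ray scanning on the container to be inspected to obtain a transmission image; generating, using a convolutional neural network, a first vector describing a local transmission image from the transmission image; generating, using a recurrent neural network, a word vector from a text description of the container cargo, as a second vector; integrating the first vector and the second vector to obtain a third vector representing the transmission image and the text description; and determining, based on the third vector, the category to which the cargo in the container belongs.
According to an embodiment of the present disclosure, the step of determining, based on the third vector, the category to which the cargo in the container belongs further comprises: generating, from the third vector and based on a probability function, probability values indicating that the cargo in the container belongs to a given category; and taking the category with the largest probability value as the category to which the cargo belongs.
According to an embodiment of the present disclosure, the method further comprises: presenting to the user, according to the determined category, typical transmission images associated with that category.
According to an embodiment of the present disclosure, the step of generating the word vector comprises: performing word segmentation on the text description of the container cargo; and vectorizing the segmented text description to obtain the word vector.
According to an embodiment of the present disclosure, the method further comprises the steps of: retrieving corresponding typical transmission images from a typical transmission image library based on the word vector; and presenting the retrieved typical transmission images to the user.
According to an embodiment of the present disclosure, the method further comprises the steps of: retrieving corresponding typical transmission images from a typical transmission image library based on the first vector; and presenting the retrieved typical transmission images to the user.
In another aspect of the present disclosure, an inspection device is proposed, comprising: an X-ray inspection system that performs X-ray scanning on the container to be inspected to obtain a transmission image; a memory that stores the transmission image; and a processor configured to: generate, using a convolutional neural network, a first vector describing a local transmission image from the transmission image; generate, using a recurrent neural network, a word vector from a text description of the container cargo, as a second vector; integrate the first vector and the second vector to obtain a third vector representing the transmission image and the text description; and determine, based on the third vector, the category to which the cargo in the container belongs.
According to an embodiment of the present disclosure, the processor is configured to: generate, from the third vector and based on a probability function, probability values indicating that the cargo in the container belongs to a given category; and take the category with the largest probability value as the category to which the cargo belongs.
According to an embodiment of the present disclosure, the processor is further configured to: present to the user, according to the determined category, typical transmission images associated with that category.
According to an embodiment of the present disclosure, the processor is configured to: perform word segmentation on the text description of the container cargo; and vectorize the segmented text description to obtain the word vector.
According to an embodiment of the present disclosure, the processor is further configured to: retrieve corresponding typical transmission images from a typical transmission image library based on the word vector; and present the retrieved typical transmission images to the user.
With the solutions of the above embodiments, the approximate category of the target cargo can be preliminarily determined, which facilitates further judgment by the image inspector.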
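Brief Description of the Drawings
For a better understanding of the present invention, the invention is described in detail with reference to the following drawings:
Fig. 1 shows a schematic structural diagram of an inspection device according to an embodiment of the present disclosure;
Fig. 2 is a schematic diagram of the structure of the computing device included in the inspection device shown in Fig. 1;
Fig. 3 shows a schematic module structure diagram of an inspection device according to an embodiment of the present disclosure;
Fig. 4 shows a schematic flowchart of an inspection method according to an embodiment of the present disclosure;
Fig. 5 shows a schematic diagram of matching a cargo image with category information according to an embodiment of the present disclosure;
Fig. 6 shows a schematic diagram of determining the category of cargo using a convolutional neural network according to an embodiment of the present disclosure;
Fig. 7 shows a schematic diagram of retrieving typical perspective images based on category information according to another embodiment of the present disclosure;
Fig. 8 shows a diagram of the spatial relationships among word vectors used in the method according to an embodiment of the present disclosure;
Fig. 9 shows the unit structure of the recurrent neural network used in the method according to an embodiment of the present disclosure; and
Fig. 10 shows a schematic diagram of generating category information from an image vector and a word vector according to an embodiment of the present disclosure.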
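Detailed Description
Specific embodiments of the present invention are described in detail below. It should be noted that the embodiments described here are for illustration only and are not intended to limit the invention. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the invention. However, it will be apparent to those of ordinary skill in the art that these specific details need not be used to practice the invention. In other instances, well-known structures, materials or methods are not described in detail in order to avoid obscuring the invention.
Throughout this specification, references to "one embodiment", "an embodiment", "one example" or "an example" mean that a particular feature, structure or characteristic described in connection with that embodiment or example is included in at least one embodiment of the invention. Thus, the phrases "in one embodiment", "in an embodiment", "one example" or "an example" appearing in various places throughout the specification do not necessarily all refer to the same embodiment or example. Furthermore, particular features, structures or characteristics may be combined in one or more embodiments or examples in any suitable combination and/or sub-combination. In addition, those of ordinary skill in the art will understand that the term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
In view of the problems in the prior art, embodiments of the present disclosure propose a human-machine assisted inspection technique based on X-ray images and text descriptions, providing an intelligent analysis tool for classifying and inspecting cargo in specific regions (regions of particular interest). In actual cargo inspection, inspectors mostly make judgments about local regions of the image, and this is also where human-machine "mutual assistance" is closest and most necessary. The technique uses the computer's data analysis and image understanding capabilities to make a preliminary determination of the approximate category of the target cargo. It also introduces human perception information, in particular the overall perception of locally salient regions, to give more accurate classification results and thereby improve the usefulness of the inspection recommendations.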
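Fig. 1 shows a schematic structural diagram of an inspection device according to an embodiment of the present disclosure. As shown in Fig. 1, the inspection device 100 includes an X-ray source 110, a detector 130, a data acquisition device 150, a controller 140 and a computing device 160, and performs security inspection on an inspected object 120 such as a container truck, for example to determine whether it contains dangerous and/or suspicious items such as firearms or drugs. Although the detector 130 and the data acquisition device 150 are described separately in this embodiment, those skilled in the art will understand that they may also be integrated and referred to as an X-ray detection and data acquisition device.
According to some embodiments, the X-ray source 110 may be an isotope, an X-ray machine, an accelerator or the like, and may be single-energy or dual-energy. The inspected object 120 is thus transmission-scanned by means of the X-ray source 110 and the detector 130 together with the controller 140 and the computing device 160 to obtain detection data. For example, while the inspected object 120 is moving, the operator uses the human-machine interface of the computing device 160 to issue an instruction through the controller 140, commanding the X-ray source 110 to emit rays, which pass through the inspected object 120 and are received by the detector 130 and the data acquisition device 150. The computing device 160 processes the data to obtain a transmission image, then uses a trained convolutional neural network to generate from the transmission image an image vector (first vector) describing a local transmission image, and uses a trained recurrent neural network to generate a word vector (second vector) from the text description of the container cargo. The computing device 160 then determines the category of the cargo in the container based on the image vector and the word vector. For example, the computing device 160 integrates the first vector and the second vector to obtain a third vector representing the transmission image and the text description, and determines the category to which the cargo in the container belongs based on the third vector.
Fig. 2 shows a schematic structural diagram of the computing device shown in Fig. 1. As shown in Fig. 2, the signal detected by the detector 130 is collected by the data collector, and the data is stored in the memory 161 through the interface unit 167 and the bus 163. The read-only memory (ROM) 162 stores configuration information and programs of the computer data processor. The random access memory (RAM) 163 temporarily stores various data while the processor 165 is operating. In addition, the memory 161 also stores computer programs for data processing, such as a substance identification program and an image processing program. The internal bus 163 connects the memory 161, the read-only memory 162, the random access memory 163, the input device 164, the processor 165, the display device 166 and the interface unit 167.
After the user enters an operation command through the input device 164, such as a keyboard and mouse, the instruction code of the computer program directs the processor 165 to execute a predetermined data processing algorithm. Once the data processing result is obtained, it is displayed on the display device 166, such as an LCD (Liquid Crystal Display), or output directly as a hard copy, for example by printing.
Fig. 3 shows a schematic module structure diagram of an inspection device according to an embodiment of the present disclosure. As shown in Fig. 3, according to an embodiment of the present disclosure, a software program is installed in the computing device 160 of the inspection device and determines the category of the cargo, for example its HSCODE, based on the transmission image of the container cargo and text information describing the cargo. For example, the convolutional neural network-based image understanding module 310 processes the input transmission image to obtain an image vector. The recurrent neural network-based text understanding module 320 processes the input text information to obtain a word vector. The analysis and learning module 330 determines the category of the cargo based on the image vector and the word vector.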
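Fig. 4 shows a schematic flowchart of an inspection method according to an embodiment of the present disclosure. As shown in Fig. 4, in step S410, an inspection device such as the one shown in Fig. 1 performs X-ray scanning on the container to be inspected to obtain a transmission image.
In step S420, a convolutional neural network is used to generate, from the transmission image, a first vector describing a local transmission image. For example, a local region of the container transmission image is taken as input for convolution and pooling operations, followed by full-convolution operations, and a vector representation of the transmission image is output as the first vector. More specifically, a local region of the container transmission image is taken as input and passes through five stages of convolution and pooling (each stage corresponds to a set of convolution kernels and one pooling layer; the number and size of the kernels are independent for each stage), followed by three full-convolution layers, after which the network outputs a vector representation of the transmission image. Although the above embodiment is described with five convolution stages and three full-convolution layers, those skilled in the art may use other convolutional neural networks.
According to an embodiment of the present disclosure, the convolutional neural network-based image understanding module 310 is responsible for cargo discrimination and analysis of the X-ray image. In practical application scenarios, the module 310 mainly consists of two parts: cargo category determination using the convolutional network, and typical template matching.
For the purpose of X-ray cargo inspection, the image of a locally sensitive region contains rich texture information about that type of cargo. For category recognition problems based mainly on texture analysis, attention can be focused on understanding that region. The cargo category determination of the convolutional neural network takes the image of a specific local region as input and, through multi-level matrix operations, generates a vector representation of the local region image, which can then be used to infer category membership. As shown in Fig. 5, this information takes the form of the HSCODE of the cargo type together with a corresponding confidence probability.
Fig. 6 shows a schematic diagram of determining the category of cargo using a convolutional neural network according to an embodiment of the present disclosure. According to an embodiment of the present disclosure, the convolutional neural network preferably uses the VGG (Visual Geometry Group)-Net structure, but those skilled in the art will understand that different embodiments need not be limited to this structure. The input of the convolutional neural network is a local region of the cargo X-ray image; after multiple stages of convolution, pooling, fully connected and other operations, a vector characterizing the information in the image is obtained.
The convolution operation simulates filters learning features of the image, in order to fully extract the information in the image. When convolving an image, multiple different and mutually independent convolution kernels are used; each kernel convolves the input separately, and all convolution results are passed on to the next operation.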
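The multi-kernel convolution described above can be illustrated with a minimal pure-Python sketch. The 4x4 patch and the two hand-picked kernels are hypothetical values chosen for illustration (in a trained network the kernels are learned), and the sliding-window product follows the usual CNN convention of no kernel flipping, no padding and stride 1, which is an assumption rather than something the source specifies.

```python
def conv2d_valid(image, kernel):
    """Slide one kernel over the image (no padding, stride 1) and return its response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def conv_layer(image, kernels):
    """Apply every kernel independently and pass all response maps on together."""
    return [conv2d_valid(image, kernel) for kernel in kernels]


# Hypothetical 4x4 image patch and two illustrative kernels.
patch = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
row_difference = [[1, 1], [-1, -1]]     # responds to changes between rows
column_difference = [[1, -1], [1, -1]]  # responds to changes between columns
feature_maps = conv_layer(patch, [row_difference, column_difference])
```

Each kernel yields its own 3x3 response map, and the list of maps is what the next stage (pooling) would receive.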
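The pooling operation effectively improves the algorithm's ability to handle multi-scale problems. Preferably, max pooling is used here. Specifically, each output matrix after convolution is divided into n*m cells, where n and m are the numbers of rows and columns of the grid; the maximum value in each cell is taken as that cell's output value, giving a matrix of size n*m, which is the output of the pooling operation.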
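As a small illustration of the grid-based max pooling just described, the sketch below splits a matrix into n*m cells and keeps the maximum of each cell. The function name `max_pool_grid` and the sample matrix are hypothetical, and the sketch assumes the matrix divides evenly into the grid.

```python
def max_pool_grid(matrix, n, m):
    """Split matrix into an n x m grid of cells and keep the maximum of each cell."""
    rows, cols = len(matrix), len(matrix[0])
    if rows % n or cols % m:
        raise ValueError("this sketch assumes the matrix divides evenly into the grid")
    cell_h, cell_w = rows // n, cols // m
    return [
        [
            max(
                matrix[r][c]
                for r in range(i * cell_h, (i + 1) * cell_h)
                for c in range(j * cell_w, (j + 1) * cell_w)
            )
            for j in range(m)
        ]
        for i in range(n)
    ]


# Hypothetical 4x4 convolution output pooled into a 2x2 grid.
conv_output = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [6, 2, 3, 1],
]
pooled = max_pool_grid(conv_output, 2, 2)  # [[4, 5], [6, 7]]
```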
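The fully connected stage vectorizes the output matrices of the multi-layer convolution and pooling operations, and the fully connected matrix also adds a mapping operation on the data. This increases learning capacity while turning the output matrices into a vector whose length equals the number of categories, which simplifies the subsequent classification. Finally, this vector is converted into probabilities using the Softmax function, so that each element of the vector is a probability value corresponding to the probability that the target under test belongs to a given category. The Softmax probability formula can be expressed as:
$$p(c_i \mid v) = \frac{e^{v_i}}{\sum_{j=1}^{k} e^{v_j}}$$
where v is the output vector after the fully connected stage, v_i is the i-th element of v, k is the length of the vector, c_i is the i-th category, and p(c_i|v) is the probability of predicting the i-th category given the input. Accordingly, the category with the largest probability value can be taken as the prediction result of the first stage.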
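The Softmax step and the selection of the most probable category can be sketched as follows. The category labels and the score vector are hypothetical placeholders (real labels would be HSCODE values), and subtracting the maximum score before exponentiation is a common numerical-stability measure that is not mentioned in the source.

```python
import math


def softmax(v):
    """Turn a score vector into probabilities: p_i = exp(v_i) / sum_j exp(v_j)."""
    shift = max(v)  # subtracting the maximum leaves the result unchanged but avoids overflow
    exps = [math.exp(x - shift) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def predict(v, categories):
    """Return the category with the largest probability, plus that probability."""
    probs = softmax(v)
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best], probs[best]


# Hypothetical category labels and fully connected output.
categories = ["category_a", "category_b", "category_c"]
label, confidence = predict([1.0, 3.0, 0.5], categories)
```

Returning the probability alongside the label mirrors the form described for Fig. 5, where a category code is shown together with a confidence value.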
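In addition, because inspectors differ in experience, how well they remember images of certain categories of cargo may also differ. In this case, the typical template matching function can use the HSCODE given by the convolutional neural network to display several typical texture image patches for that category of cargo, so that the inspector can further confirm whether the inferred information is credible, as shown in Fig. 7.
According to another embodiment of the present disclosure, typical cargo images with the corresponding HSCODE can be called up according to the prediction result given by the convolutional neural network and presented in a three-by-three grid. The inspector can then compare the image of the cargo under inspection with the typical data in order to make a better judgment.
In step S430, a recurrent neural network is used to generate a word vector from the text description of the container cargo, as the second vector. For example, the text description of the container cargo is taken as the network input and converted into a list by a word segmentation operation. The vector representation of each word in the list is then obtained by looking it up in a dictionary, as the second vector. More specifically, the inspector's text description is used as the network input. A basic word segmentation operation converts the sentence into a corresponding list of words (in some examples, repeated words may be removed or weights assigned to the words). An existing dictionary is then queried to convert each word into a word index and to extract the vector representation of each word in the list. The words in the list are then fed one by one, in order, into an LSTM (Long-Short Term Memory) network for prediction. Once all the words in the list have been processed by the recurrent neural network, the final vector representation of the text is generated.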
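The preprocessing chain in step S430 (description, word list, word indices, per-word vectors) can be sketched as below. Whitespace splitting stands in for a real word segmenter, and the vocabulary, the embedding values and the `<unk>` fallback entry are all hypothetical illustrations rather than details from the source.

```python
def to_word_list(description, drop_repeats=True):
    """Split a description into words, optionally removing repeated words."""
    words = description.lower().split()  # stand-in for a real word segmenter
    if not drop_repeats:
        return words
    seen, unique = set(), []
    for word in words:
        if word not in seen:
            seen.add(word)
            unique.append(word)
    return unique


# Hypothetical dictionary (word -> index) and one vector per index.
vocabulary = {"<unk>": 0, "round": 1, "metal": 2, "cans": 3, "stacked": 4}
embedding = [[0.0, 0.0], [0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.1, 0.3]]


def lookup(words):
    """Map words to indices (unknown words fall back to index 0) and to their vectors."""
    ids = [vocabulary.get(word, 0) for word in words]
    return ids, [embedding[i] for i in ids]


words = to_word_list("stacked round metal cans metal")
ids, vectors = lookup(words)
```

The resulting list of vectors is what would be fed, one word at a time, into the recurrent network.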
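According to an embodiment of the present disclosure, the recurrent neural network-based text understanding module 320 takes a text description as input and, after processing by the system, outputs typical images consistent with the description, with the aim of giving the inspector useful information for decision-making in a more user-friendly way. The module includes a word segmentation part, a word vector generation part, a typical image display part and so on.
Fig. 8 shows a diagram of the spatial relationships among word vectors used in the method according to an embodiment of the present disclosure. The typical image display part of the text understanding module is introduced because earlier inspection systems essentially took only images as input and could rely only on the computer's ability to understand images, with human perception playing little role. Under certain conditions, the inspector may find it hard to judge which category an image belongs to, and can only characterize the cargo through descriptions of texture, shape and the like in order to call up typical historical images for comparison. A traditional inspection system requires the description to be keyword information, which indirectly becomes a burden on the inspector. Word vectors generated by a recurrent neural network for the description naturally support distance learning between similar words (Fig. 8). As a result, in practice the inspector does not need to type the exact fixed keywords for the cargo in order to retrieve the desired images conveniently and accurately.
Word segmentation is a preprocessing operation for systems that take sentences as data input (this is especially relevant for Chinese). Unlike earlier template data retrieval, which took attributes or keywords as input, this module lets the user type the desired information more flexibly and completely in the form of full sentences. Working with the sentence as the basic unit of information is complex and inefficient, however, so it is necessary to split the sentence appropriately into unit words. Word segmentation imitates the way humans understand language: using a dictionary as the basis, it splits the sentence into an array (or vector) whose elements are words (single words or phrases), which is easier for a computer to process.
Text understanding belongs to natural language processing, and word segmentation is the foundation of text mining. For Chinese input in particular, because of the inherent structure of the language, Chinese text has explicit markers only for sentences, paragraphs and the like, and lacks a clear delimiter between individual words, so the first requirement for text understanding is a clear segmentation of the text description. Preferably, word segmentation is performed here using statistical and machine learning methods. In practice, a dictionary is first built from historical knowledge, and segmentation first applies rules for string matching; ambiguous words and words not in the dictionary are handled with CRF (conditional random fields). Specifically, the sentence is labeled with word positions (word start, word middle, word end and single-character word) and then segmented with the CRF; at the same time, new words not yet in the dictionary are added to it for later matching.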
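The two ingredients named above, dictionary-based string matching and word-position labels, can be sketched as follows. Forward maximum matching is used here as one common string-matching rule, and the labels B/M/E/S (begin, middle, end, single) are a conventional encoding of the four word positions; both choices are assumptions, the tiny dictionary is hypothetical, and the CRF model itself is not implemented in this sketch.

```python
def forward_max_match(sentence, dictionary, max_len=4):
    """Greedy dictionary segmentation: at each position take the longest known word."""
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            # A single character is always accepted so the scan can move on.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def position_tags(words):
    """Label each character with its position in its word: B, M, E, or S for single."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


# Hypothetical two-entry dictionary and a short description.
dictionary = {"集装箱", "金属"}
words = forward_max_match("集装箱金属罐", dictionary)
tags = position_tags(words)
```

Tag sequences of this kind are the sort of character-level labels a CRF segmenter would be trained to predict for ambiguous or out-of-dictionary text.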
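Word vector generation is the process of converting a language description into features that a computer can readily understand and manipulate; this process relies entirely on the recurrent neural network. A recurrent neural network is naturally able to process and analyze sequentially correlated data: it can condense a large block of information into a few core elements, and it can also enrich words with no obvious connection to one another into information that a person can understand. In this system, the segmented data is used as the network input in vector form, and one word is analyzed and learned at a time until the last word has been processed, producing a vector representation called the word vector. This vector contains the information of the whole sentence; it can be used to retrieve typical images that match the meaning of the description, or for the subsequent category determination.
Word vector generation vectorizes the segmented text description. Preferably, an LSTM (Long-Short Term Memory) recurrent neural network is used here. In operation, each word in the text description is first turned into a vector, either by one-hot encoding or by conversion with a mapping matrix whose number of rows is the number of words in the dictionary and whose number of columns is a specified size; preferably, the latter is used here.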
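The relationship between the two encodings can be shown in a few lines: multiplying a one-hot vector by the mapping matrix selects exactly one row, so the mapping-matrix approach amounts to a row lookup. The matrix values and sizes below are hypothetical.

```python
def one_hot(index, vocab_size):
    """Vector of zeros with a single 1.0 at the word's dictionary index."""
    return [1.0 if i == index else 0.0 for i in range(vocab_size)]


def vec_mat(vec, mat):
    """Row vector times matrix."""
    return [sum(vec[r] * mat[r][c] for r in range(len(mat))) for c in range(len(mat[0]))]


# Hypothetical mapping matrix: 4 dictionary words (rows), vector size 3 (columns).
mapping = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]

word_index = 2
via_one_hot = vec_mat(one_hot(word_index, len(mapping)), mapping)
via_lookup = mapping[word_index]  # same values, without the multiplication
```

The lookup form avoids building vectors as long as the dictionary, which is one practical reason to prefer the mapping matrix.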
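Fig. 9 shows the unit structure of the recurrent neural network used in the method according to an embodiment of the present disclosure. After all words have been vectorized, their order in the text description is reversed, and the initial vectors are then selected one word at a time, in that order, and fed into the LSTM network unit. The computation of the LSTM unit can be expressed as:
$$H = [x_t, h_{t-1}]$$
$$[i_t, c_t, f_t, o_t] = H * W$$
$$m_t = \mathrm{sigmoid}(f_t) * m_{t-1} + \mathrm{sigmoid}(i_t) * \tanh(c_t)$$
$$h_t = \mathrm{sigmoid}(o_t) * \tanh(m_t)$$
where x_t is the initial vector of the t-th word; h_{t-1} is the previous output of the LSTM unit; W is the weight matrix, a parameter matrix pre-trained on earlier samples; i_t, c_t, f_t and o_t are the intermediate network states for the t-th word; m_{t-1} is the state value passed on from the previous word; sigmoid() and tanh() are the activation functions; m_t is the state value passed on for the t-th word; and h_t is the word vector generated from the first t words. If the input text description contains k words in total, then after k passes through the LSTM unit a final word vector h_k containing the information of the description is generated.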
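The LSTM unit equations above translate fairly directly into code. The sketch below follows them term by term (no bias terms, since none appear in the equations) and assumes W has one row per element of H and four blocks of columns, one each for i, c, f and o; that layout, the zero initial state, the sizes and the weight values are all hypothetical choices for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x_t, h_prev, m_prev, W):
    """One LSTM unit update following the four equations above."""
    H = x_t + h_prev                      # H = [x_t, h_{t-1}] (concatenation)
    d = len(h_prev)
    gates = [sum(H[r] * W[r][c] for r in range(len(H))) for c in range(4 * d)]
    i_t, c_t, f_t, o_t = (gates[k * d:(k + 1) * d] for k in range(4))
    m_t = [sigmoid(f) * m + sigmoid(i) * math.tanh(c)
           for f, m, i, c in zip(f_t, m_prev, i_t, c_t)]
    h_t = [sigmoid(o) * math.tanh(m) for o, m in zip(o_t, m_t)]
    return h_t, m_t


def encode(word_vectors, W, hidden_size):
    """Feed the words in reversed order, one at a time, and return the final h."""
    h = [0.0] * hidden_size
    m = [0.0] * hidden_size
    for x_t in reversed(word_vectors):
        h, m = lstm_step(x_t, h, m, W)
    return h


# Hypothetical sizes: word vectors of length 2, hidden state of length 2.
word_vectors = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
W = [[0.1 * ((r + c) % 3 - 1) for c in range(8)] for r in range(4)]
sentence_vector = encode(word_vectors, W, hidden_size=2)
```

In a trained system W would come from training rather than from a formula, and the final `sentence_vector` plays the role of h_k.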
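The typical image display part for text descriptions changes the traditional approach, in which image understanding is the sole basis for the inspection system's results and similarity display, and instead makes the inspector's perception concrete through text. First, description information is manually annotated for the typical image template data, and the BOW (Bag of Words) method is then used to classify the annotations, giving one BOW feature for each category of image. In use, the text description entered by the inspector is vectorized, and the correlation between this vector and the BOW features of the typical images is computed. The categories corresponding to the three most correlated BOW features are then selected, and typical images from those categories are extracted for visual display.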
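A rough sketch of this retrieval step is given below. The source does not say how the correlation is computed or how the query vector is made comparable with the BOW features, so this sketch assumes the query is also represented as a bag-of-words count vector and uses cosine similarity as the correlation measure. The vocabulary, category names and annotations are hypothetical.

```python
import math
from collections import Counter


def bow_vector(words, vocabulary):
    """Count how often each vocabulary word occurs in the given word list."""
    counts = Counter(words)
    return [float(counts[word]) for word in vocabulary]


def cosine(a, b):
    """Cosine similarity, used here as the correlation measure (an assumption)."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def top_categories(query, category_features, top_n=3):
    """Return the names of the top_n categories most correlated with the query."""
    ranked = sorted(category_features,
                    key=lambda name: cosine(query, category_features[name]),
                    reverse=True)
    return ranked[:top_n]


# Hypothetical annotation vocabulary and per-category annotations.
vocabulary = ["round", "metal", "soft", "fabric", "liquid", "bottle"]
annotations = {
    "cans": ["round", "metal", "metal"],
    "textiles": ["soft", "fabric"],
    "beverages": ["liquid", "bottle", "round"],
    "tools": ["metal"],
}
category_features = {name: bow_vector(words, vocabulary)
                     for name, words in annotations.items()}
query = bow_vector(["round", "metal"], vocabulary)
best_three = top_categories(query, category_features)
```

Typical images stored under the three returned categories would then be shown to the inspector.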
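In step S440, the first vector and the second vector are integrated to obtain a third vector representing the transmission image and the text description. In step S450, the category to which the cargo in the container belongs is determined based on the third vector. For example, probability values indicating that the cargo in the container belongs to a given category are generated from the third vector based on a probability function (for example, the Softmax function), and the category with the largest probability value is taken as the category to which the cargo belongs.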
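Steps S440 and S450 can be sketched together: the two vectors are concatenated into a third vector, a single linear projection produces one score per category, and Softmax followed by an argmax gives the predicted category. The single projection layer, the weights and the category labels are hypothetical simplifications; in a real system the projection would be learned during joint training.

```python
import math


def softmax(scores):
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(vec, weights, bias):
    """One score per category: each weight row is dotted with the fused vector."""
    return [sum(x * w for x, w in zip(vec, row)) + b for row, b in zip(weights, bias)]


def classify(image_vec, text_vec, weights, bias, categories):
    fused = image_vec + text_vec                      # third vector (concatenation)
    probs = softmax(linear(fused, weights, bias))     # probability per category
    best = max(range(len(probs)), key=probs.__getitem__)
    return categories[best], probs


# Hypothetical vectors, projection weights and category labels.
image_vec = [0.2, 0.9]   # first vector, from the image network
text_vec = [0.7, 0.1]    # second vector, from the text network
weights = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
bias = [0.0, 0.0, 0.0]
categories = ["category_a", "category_b", "category_c"]
label, probs = classify(image_vec, text_vec, weights, bias, categories)
```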
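Fig. 10 shows a schematic diagram of generating category information from an image vector and a word vector according to an embodiment of the present disclosure. As shown in Fig. 10, the analysis and learning module 330 is a new means of completing the inspection task more accurately by combining the computer's image understanding capability with human perception. The convolutional neural network-based image understanding module and the recurrent neural network-based text understanding module can each analyze the image or the text description on their own and give corresponding results, while the comprehensive analysis and learning module combines the capabilities of both: by merging the image vector from the image understanding part with the word vector from the text understanding part and passing them through a joint analysis stage, the two parts learn from each other, and the output is a more accurate prediction.
According to embodiments of the present disclosure, the convolutional neural network-based image understanding module and the recurrent neural network-based text understanding module each carry out cargo inspection on their own input, one on the image and one on the text description. The comprehensive analysis and learning module of the system combines the two effectively to better assist the inspector in completing the inspection.
For example, the convolutional network for image understanding and the recurrent network for text understanding are first trained separately, each with its own loss, which completes the initial learning of the two networks. The image representation vector of the convolutional network and the word vector output by the recurrent network are then integrated and, after a projection mapping, the Softmax function is again used to obtain the category predicted by the two networks combined. This effectively combines the two kinds of information. Because the two networks also go through joint training, each network's adjustment during feedback is influenced by the other network, which improves the learning of the system as a whole.
More specifically, for the convolutional neural network for image understanding, a local region of the container transmission image is used as the input of the VGG network and passes through five stages of convolution and pooling (each stage corresponds to a set of convolution kernels and one pooling layer; the number and size of the kernels are independent for each stage), followed by three full-convolution layers; the output of the last convolutional layer is the vector representation I of the transmission image. For the recurrent neural network for text understanding, the inspector's text description is used as the network input. A basic word segmentation operation converts the sentence into a corresponding list of words (repeated words may optionally be removed, or weights assigned to the words). A lookup in an existing dictionary then converts each word into a word index and extracts the vector representation of each word in the list. The words in the list are then fed into the LSTM network one by one in order; once all words in the list have been processed by the recurrent neural network, the final vector representation T of the text is generated. Next, the vector representation I of the image and the vector representation T of the text are concatenated into one vector, passed through a two-layer full-convolution network, and classified with a Softmax layer, which provides a container cargo category prediction and prompting function that combines image and text information. Network training can use methods such as SGD (Stochastic Gradient Descent) and BGD (Batch Gradient Descent) to optimize the parameters of the network. The overall network structure contains a separate processing network for each modality (image, text) as well as a joint learning process that combines the two kinds of information. Each network's adjustment is influenced to some extent by the other network, which increases the system's use of information and its capacity to learn.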
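The difference between the two optimizers mentioned above can be illustrated on a deliberately tiny problem. The sketch fits a single weight w in the model y = w*x with a squared-error loss; the data, learning rate and epoch count are hypothetical, and a real network would apply the same update rules to all of its parameters through backpropagation.

```python
import random


def gradient(w, x, y):
    """Gradient of the squared error 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x


def bgd(samples, w=0.0, lr=0.1, epochs=200):
    """Batch gradient descent: one update per pass, using the average gradient."""
    for _ in range(epochs):
        avg_grad = sum(gradient(w, x, y) for x, y in samples) / len(samples)
        w -= lr * avg_grad
    return w


def sgd(samples, w=0.0, lr=0.1, epochs=200, seed=0):
    """Stochastic gradient descent: one update per sample, in shuffled order."""
    rng = random.Random(seed)
    samples = list(samples)
    for _ in range(epochs):
        rng.shuffle(samples)
        for x, y in samples:
            w -= lr * gradient(w, x, y)
    return w


# Hypothetical training data generated from y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w_batch = bgd(data)
w_stochastic = sgd(data)
```

Both variants recover a weight close to 2 here; they differ in how often the parameters are updated and in how noisy each update is.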
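According to the above embodiments, the data analysis and image understanding capabilities of the computer are used to make a preliminary determination of the approximate category of the target cargo. In addition, human perception information is introduced, in particular the overall perception of locally salient regions, to give more accurate classification results and thereby improve the usefulness of the inspection recommendations.
The above detailed description has set forth numerous embodiments of the inspection device and method through the use of schematic diagrams, flowcharts and/or examples. Where such schematic diagrams, flowcharts and/or examples contain one or more functions and/or operations, those skilled in the art will understand that each function and/or operation in them can be implemented, individually and/or collectively, by a wide range of structures, hardware, software, firmware, or virtually any combination thereof. In one embodiment, several portions of the subject matter described in the embodiments of the present invention may be implemented via application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), or other integrated formats. However, those skilled in the art will recognize that some aspects of the embodiments disclosed herein may be implemented, in whole or in part, equivalently in integrated circuits, as one or more computer programs running on one or more computers (e.g., as one or more programs running on one or more computer systems), as one or more programs running on one or more processors (e.g., as one or more programs running on one or more microprocessors), as firmware, or as virtually any combination thereof, and that those skilled in the art will, in light of the present disclosure, be capable of designing the circuitry and/or writing the software and/or firmware code. In addition, those skilled in the art will recognize that the mechanisms of the subject matter described in the present disclosure can be distributed as program products in a variety of forms, and that an exemplary embodiment of the subject matter applies regardless of the particular type of signal bearing medium actually used to carry out the distribution. Examples of signal bearing media include, but are not limited to: recordable-type media such as floppy disks, hard drives, compact disks (CDs), digital versatile disks (DVDs), digital tapes, computer memories, and the like; and transmission-type media such as digital and/or analog communication media (e.g., fiber optic cable, waveguide, wired communication link, wireless communication link, etc.).
Although the present invention has been described with reference to several typical embodiments, it should be understood that the terms used are illustrative and exemplary rather than restrictive. Since the invention can be embodied in many forms without departing from its spirit or essence, it should be understood that the above embodiments are not limited to any of the foregoing details, but should be interpreted broadly within the spirit and scope defined by the appended claims; all changes and modifications falling within the claims or their equivalents are therefore intended to be covered by the appended claims.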

Claims (11)

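  1. A method for inspecting a container, comprising the steps of:
    performing X-ray scanning on the container to be inspected to obtain a transmission image;
    generating, using a convolutional neural network, a first vector describing a local transmission image from the transmission image;
    generating, using a recurrent neural network, a word vector from a text description of the container cargo, as a second vector;
    integrating the first vector and the second vector to obtain a third vector representing the transmission image and the text description; and
    determining, based on the third vector, the category to which the cargo in the container belongs.
  2. The method of claim 1, wherein the step of determining, based on the third vector, the category to which the cargo in the container belongs further comprises:
    generating, from the third vector and based on a probability function, probability values indicating that the cargo in the container belongs to a given category;
    taking the category with the largest probability value as the category to which the cargo belongs.
  3. The method of claim 2, further comprising:
    presenting to the user, according to the determined category, typical transmission images associated with the category.
  4. The method of claim 1, wherein the step of generating the word vector comprises:
    performing word segmentation on the text description of the container cargo;
    vectorizing the segmented text description to obtain the word vector.
  5. The method of claim 4, further comprising the steps of:
    retrieving corresponding typical transmission images from a typical transmission image library based on the word vector;
    presenting the retrieved typical transmission images to the user.
  6. The method of claim 1, further comprising the steps of:
    retrieving corresponding typical transmission images from a typical transmission image library based on the first vector;
    presenting the retrieved typical transmission images to the user.
  7. An inspection device, comprising:
    an X-ray inspection system that performs X-ray scanning on the container to be inspected to obtain a transmission image;
    a memory that stores the transmission image;
    a processor configured to:
    generate, using a convolutional neural network, a first vector describing a local transmission image from the transmission image;
    generate, using a recurrent neural network, a word vector from a text description of the container cargo, as a second vector;
    integrate the first vector and the second vector to obtain a third vector representing the transmission image and the text description; and
    determine, based on the third vector, the category to which the cargo in the container belongs.
  8. The inspection device of claim 7, wherein the processor is configured to:
    generate, from the third vector and based on a probability function, probability values indicating that the cargo in the container belongs to a given category;
    take the category with the largest probability value as the category to which the cargo belongs.
  9. The inspection device of claim 8, wherein the processor is further configured to:
    present to the user, according to the determined category, typical transmission images associated with the category.
  10. The inspection device of claim 7, wherein the processor is configured to:
    perform word segmentation on the text description of the container cargo;
    vectorize the segmented text description to obtain the word vector.
  11. The inspection device of claim 10, wherein the processor is further configured to:
    retrieve corresponding typical transmission images from a typical transmission image library based on the word vector;
    present the retrieved typical transmission images to the user.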
PCT/CN2018/083012 2017-04-14 2018-04-13 Inspection method and inspection device WO2018188653A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP18785106.8A EP3611666A4 (en) 2017-04-14 2018-04-13 INSPECTION PROCEDURE AND INSPECTION DEVICE
KR1020197033257A KR20190139254A (ko) 2017-04-14 2018-04-13 Inspection method and inspection device
JP2019555877A JP2020516897A (ja) 2017-04-14 2018-04-13 Inspection method and inspection device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710243591.9 2017-04-14
CN201710243591.9A CN108734183A (zh) 2017-04-14 2017-04-14 Inspection method and inspection device

Publications (1)

Publication Number Publication Date
WO2018188653A1 true WO2018188653A1 (zh) 2018-10-18

Family

ID=63792302

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/083012 WO2018188653A1 (zh) 2017-04-14 2018-04-13 Inspection method and inspection device

Country Status (5)

Country Link
EP (1) EP3611666A4 (zh)
JP (1) JP2020516897A (zh)
KR (1) KR20190139254A (zh)
CN (1) CN108734183A (zh)
WO (1) WO2018188653A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
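CN110472728A (zh) * 2019-07-30 2019-11-19 腾讯科技(深圳)有限公司 Target information determination method, target information determination apparatus, medium and electronic device
CN111860263A (zh) * 2020-07-10 2020-10-30 海尔优家智能科技(北京)有限公司 Information entry method and apparatus, and computer-readable storage medium
CN113496046A (zh) * 2021-01-18 2021-10-12 图林科技(深圳)有限公司 Blockchain-based e-commerce logistics system and method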

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
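CN109522913B (zh) * 2017-09-18 2022-07-19 同方威视技术股份有限公司 Inspection method, inspection device and computer-readable medium
CN111461152B (zh) * 2019-01-21 2024-04-05 同方威视技术股份有限公司 Cargo detection method and apparatus, electronic device and computer-readable medium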

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
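WO2015062352A1 (zh) * 2013-10-29 2015-05-07 同方威视技术股份有限公司 Stereoscopic imaging system and method thereof
CN104751163A (zh) * 2013-12-27 2015-07-01 同方威视技术股份有限公司 Fluoroscopic inspection system and method for automatic classification and recognition of cargo
CN105784732A (zh) * 2014-12-26 2016-07-20 同方威视技术股份有限公司 Inspection method and inspection system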

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090174554A1 (en) * 2005-05-11 2009-07-09 Eric Bergeron Method and system for screening luggage items, cargo containers or persons
WO2013036735A1 (en) * 2011-09-07 2013-03-14 Rapiscan Systems, Inc. X-ray inspection system that integrates manifest data with imaging/detection processing
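CN105808555B (zh) * 2014-12-30 2019-07-26 清华大学 Method and system for inspecting cargo
JP6543986B2 (ja) * 2015-03-25 2019-07-17 日本電気株式会社 Information processing device, information processing method and program
CN105574133A (zh) * 2015-12-15 2016-05-11 苏州贝多环保技术有限公司 Multimodal intelligent question answering system and method
CN105975457A (zh) * 2016-05-03 2016-09-28 成都数联铭品科技有限公司 Information classification and prediction system based on fully automatic learning
CN106446782A (zh) * 2016-08-29 2017-02-22 北京小米移动软件有限公司 Image recognition method and device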

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
See also references of EP3611666A4 *
ZHANG, JIAN ET AL.: "The Application of Neural Network Model for X-ray Image Fusion", COMPUTERIZED TOMOGRAPHY THEORY AND APPLICATIONS, vol. 20, no. 2, 30 June 2011 (2011-06-30), pages 235 - 243, XP009517427, ISSN: 1004-4140 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
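CN110472728A (zh) * 2019-07-30 2019-11-19 腾讯科技(深圳)有限公司 Target information determination method, target information determination apparatus, medium and electronic device
CN111860263A (zh) * 2020-07-10 2020-10-30 海尔优家智能科技(北京)有限公司 Information entry method and apparatus, and computer-readable storage medium
CN113496046A (zh) * 2021-01-18 2021-10-12 图林科技(深圳)有限公司 Blockchain-based e-commerce logistics system and method
CN113496046B (zh) * 2021-01-18 2024-05-10 华翼(广东)电商科技有限公司 Blockchain-based e-commerce logistics system and method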

Also Published As

Publication number Publication date
CN108734183A (zh) 2018-11-02
EP3611666A1 (en) 2020-02-19
KR20190139254A (ko) 2019-12-17
JP2020516897A (ja) 2020-06-11
EP3611666A4 (en) 2021-01-20

Similar Documents

Publication Publication Date Title
Ghiasi et al. Scaling open-vocabulary image segmentation with image-level labels
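WO2022227207A1 (zh) Text classification method and apparatus, computer device and storage medium
WO2018188653A1 (zh) Inspection method and inspection device
CN112214995B (zh) Hierarchical multi-task term embedding learning for synonym prediction
CN111797241B (zh) Event argument extraction method and device based on reinforcement learning
CN111125406B (zh) Visual relationship detection method based on adaptive clustering learning
CN111914097A (zh) Entity extraction method and device based on attention mechanism and multi-level feature fusion
Younis et al. Detection and annotation of plant organs from digitised herbarium scans using deep learning
WO2019052561A1 (zh) Inspection method, inspection device and computer-readable medium
Menshawy Deep Learning By Example: A hands-on guide to implementing advanced machine learning algorithms and neural networks
CN113672931B (zh) Pre-training-based automatic software vulnerability detection method and device
Cheng et al. A semi-supervised deep learning image caption model based on Pseudo Label and N-gram
CN117611576A (zh) Prediction method based on image-text fusion contrastive learning
CN115965818A (zh) Few-shot image classification method based on similarity feature fusion
Gunaseelan et al. Automatic extraction of segments from resumes using machine learning
CN111242059B (zh) Method for generating an unsupervised image captioning model based on a recurrent memory network
Wu et al. AGNet: Automatic generation network for skin imaging reports
US20240028828A1 (en) Machine learning model architecture and user interface to indicate impact of text ngrams
Li et al. Legal case inspection: An analogy-based approach to judgment evaluation
Sharma et al. Optical Character Recognition Using Hybrid CRNN Based Lexicon-Free Approach with Grey Wolf Hyperparameter Optimization
CN116503674B (zh) Semantic-guidance-based few-shot image classification method, device and medium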
Souri et al. Neural network dealing with Arabic language
Yan et al. Causality Extraction Cascade Model Based on Dual Labeling
Jayaswal et al. Image Captioning Using VGG-16 Deep Learning Model
Abbruzzese et al. REMOAC: A retroactive explainable method for OCR anomalies correction in legal domain

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18785106

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019555877

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 20197033257

Country of ref document: KR

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2018785106

Country of ref document: EP

Effective date: 20191114