CN106599028B

CN106599028B - Book content searching and matching method based on video image processing

Info

Publication number: CN106599028B
Application number: CN201610946349.3A
Authority: CN
Inventors: 刘龙坡; 徐向民; 晋建秀
Original assignee: South China University of Technology SCUT
Current assignee: South China University of Technology SCUT
Priority date: 2016-11-02
Filing date: 2016-11-02
Publication date: 2020-04-28
Anticipated expiration: 2036-11-02
Also published as: CN106599028A

Abstract

The invention provides a book content searching and matching method based on video image processing, which is used for searching and matching textbook content images in a current camera by processing images captured by the camera. The method comprises the steps of performing target image segmentation on an image captured by a camera by using an image processing technology to obtain a target image area; and then extracting a target image from the target image area by using a four-edge detection algorithm, coding the target image based on a perceptual hash algorithm, and searching and matching in a database according to the code of the target image so as to obtain the page number of the current book content. The invention provides an intelligent book content searching and matching method, which is particularly suitable for educational products such as children robots and has wide market prospect and practical significance.

Description

Book content searching and matching method based on video image processing

Technical Field

The invention relates to a searching and matching technology based on an image video segmentation technology and a Hash coding technology, in particular to a book content searching and matching method based on video image processing.

Background

In recent years, driven by the vigorous promotion of computer education in China, robot education has developed rapidly, and the robot has become a highly regarded platform for innovative education. The children's robots currently on the market mainly provide companionship. Education-related smart products consist mainly of touch-and-talk pens, touch-and-talk machines, children's tablet computers and the like. Although these products are technically mature, their technical difficulty is low, their overall functions are simple, and their educational functions are few or narrow; they cannot handle complex teaching problems and therefore cannot give children real learning guidance.

In addition, mobile learning platforms on the market such as Xuebajun and Zuoyebang can provide answers to study questions. With either platform, however, the user must photograph the textbook and then adjust a cropping frame to select the question area, which makes the interaction cumbersome. The market therefore urgently needs a solution for children's education that can intelligently assist children in learning.

Therefore, the invention provides a book content searching and matching method based on video image processing. A camera is installed on an educational robot and films the book while the child does homework. When the child encounters a difficulty, the video image segmentation and search-matching function is started, and the captured images are processed to obtain the page number, in the corresponding book, of the content the child is currently studying. The child then only needs to say the number of the question; combined with voice recognition, this yields the question number, so that the question the child is currently working on is identified and an answer is provided from the back end.

Disclosure of Invention

The invention aims to introduce a brand-new book content identification and matching method based on video image processing.

The invention adopts at least one of the following technical solutions.

A book content searching and matching method based on video image processing is characterized in that a target image is segmented and extracted by combining a target image segmentation algorithm based on image processing and a target image extraction algorithm based on edge detection, and the target image is searched and matched by a target image searching and matching algorithm based on perceptual hash algorithm, and specifically comprises the following steps:

(1) a book image segmentation algorithm based on image processing, in which the video sequence captured by the camera is converted into images, a book image area is set, and the input image is segmented by an image segmentation algorithm to obtain the area where the target image is located;

(2) a book image extraction algorithm based on four-edge detection, in which the segmented image is converted into a gray-scale image, the edges of the target image are located by the four-edge detection algorithm, and a pixel boundary is subtracted to extract the target image;

(3) a book content image searching and matching algorithm based on a perceptual hash algorithm, in which every target image in the database is perceptually hash-coded and the codes are stored, the hash code of the input target image is then obtained, and book content searching and matching is achieved by calculating the Hamming distance between the code of the target image and the codes of the database images.

Further, in the target image segmentation algorithm based on image processing in step (1), the video captured by the camera is converted into images of a uniform format. A rectangular area S, obtained by subtracting p pixels from the start and end coordinates of the image, is set as the area where the foreground target is located, where p takes the empirical value 10; the area outside S and within the image is set as the background area. The foreground target region is then obtained by applying the image segmentation algorithm to the foreground and background areas. The target segmentation algorithm based on image processing comprises the following steps:

1) converting a video sequence captured by the camera into an image with a uniform format;

2) setting the number p of boundary pixels of a target area to be 10, subtracting p pixels according to the initial and final coordinates of the captured book image to obtain a rectangular area S where a foreground target is located, and judging the area outside the rectangular area S and within the captured image to be a background area;

3) segmenting the captured image with the image segmentation algorithm according to its foreground area and background area to obtain the foreground target region.
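The initialisation in step 2) can be sketched as follows. This is a minimal illustration, assuming that "subtracting p pixels" means insetting the rectangle S by p pixels from every border of the frame; the helper names `foreground_rect` and `initial_mask` are hypothetical, and only p = 10 and the 200x200 frame size come from the text.

```python
# Hypothetical helpers; only p = 10 and the rectangle S come from the text.
GC_BGD = 0     # definite background label (same value as cv2.GC_BGD)
GC_PR_FGD = 3  # probable foreground label (same value as cv2.GC_PR_FGD)


def foreground_rect(width, height, p=10):
    """Return rectangle S as (x, y, w, h), inset p pixels from every border."""
    if width <= 2 * p or height <= 2 * p:
        raise ValueError("image too small for the requested margin")
    return (p, p, width - 2 * p, height - 2 * p)


def initial_mask(width, height, p=10):
    """Label pixels inside S as probable foreground, everything else as background."""
    x, y, w, h = foreground_rect(width, height, p)
    mask = [[GC_BGD] * width for _ in range(height)]
    for row in range(y, y + h):
        for col in range(x, x + w):
            mask[row][col] = GC_PR_FGD
    return mask


rect = foreground_rect(200, 200)  # frame size used in the embodiment
mask = initial_mask(200, 200)
print(rect)  # (10, 10, 180, 180)
```

The tuple is in (x, y, width, height) form, which is the form OpenCV's `cv2.grabCut` accepts as its `rect` argument when called with `cv2.GC_INIT_WITH_RECT`; whether the patented software calls it this way is not stated in the text.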

Further, in the book content image extraction algorithm based on four-edge detection in step (2), the target image is extracted from the input image by removing the black background border of the image. The image is first converted into a gray-scale image, in which a pixel value of 0 indicates the black background border and a non-zero pixel value indicates the foreground target. Using this criterion, the uppermost point A and the rightmost point B of the target in the gray-scale image are detected, and the image is rotation-corrected according to a coordinate correction formula. Pixel values are then examined from the middle of each of the four edges of the image inward: a value of 0 is judged to be black background border and a non-zero value is judged to be target image. When detection is finished, the row and column start coordinates startRow, startCol and end coordinates endRow, endCol of the target image are obtained, and a boundary value pad is subtracted from the start and end coordinates to obtain the area of the book image, where pad takes the empirical value 2. The target image extraction algorithm based on four-edge detection comprises the following steps:

1) converting the image into a gray scale image;

2) detecting coordinates of an uppermost point A and a rightmost point B of a book content area in the image, and performing rotation correction on the image according to the coordinates of the A and the B and a correction formula;

3) detecting pixel values from the middle of the upper side, the lower side, the left side and the right side of the image to the inside respectively, if the pixel values are not 0, obtaining the initial coordinates startRow and startCol and the termination coordinates endRow and endCol of the target image, and if not, continuing to execute the step 2);

4) and respectively subtracting the boundary value pad from the start coordinates startRow and startCol and the end coordinates endRow and endCol of the target image to obtain the coordinates of the book image and obtain the horizontal book image.
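Steps 3) and 4) can be illustrated with a short sketch on an already rotation-corrected gray-scale image held as a list of rows. It assumes the scan runs along the middle column (for the top and bottom edges) and the middle row (for the left and right edges), and that "subtracting pad" means shrinking the detected box inward by pad pixels on every side; the function names are hypothetical.

```python
def four_edge_bounds(gray, pad=2):
    """Scan inward from the midpoint of each edge until a non-zero pixel is met.

    gray is a list of rows of integer intensities; 0 marks the black background.
    Returns (startRow, endRow, startCol, endCol), shrunk inward by pad (assumption).
    Raises StopIteration if the scanned row or column is entirely black.
    """
    rows, cols = len(gray), len(gray[0])
    mid_row, mid_col = rows // 2, cols // 2

    # Top and bottom edges: walk along the middle column.
    start_row = next(r for r in range(rows) if gray[r][mid_col] != 0)
    end_row = next(r for r in range(rows - 1, -1, -1) if gray[r][mid_col] != 0)
    # Left and right edges: walk along the middle row.
    start_col = next(c for c in range(cols) if gray[mid_row][c] != 0)
    end_col = next(c for c in range(cols - 1, -1, -1) if gray[mid_row][c] != 0)

    return (start_row + pad, end_row - pad, start_col + pad, end_col - pad)


def crop(gray, bounds):
    """Cut out the inclusive row and column range described by bounds."""
    start_row, end_row, start_col, end_col = bounds
    return [row[start_col:end_col + 1] for row in gray[start_row:end_row + 1]]


# Toy 12x12 frame: a bright page occupying rows 2..9 and columns 3..10.
demo = [[0] * 12 for _ in range(12)]
for r in range(2, 10):
    for c in range(3, 11):
        demo[r][c] = 255

bounds = four_edge_bounds(demo)
page = crop(demo, bounds)
print(bounds)  # (4, 7, 5, 8)
```

Scanning only from the edge midpoints keeps the search to four one-dimensional passes, which is why the rotation correction has to come first: on a tilted page the midpoints would not meet the true page edges.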

Further, in the target image search matching algorithm based on the perceptual hash algorithm in step (3), each image in the database is processed by step (1) and step (2) to obtain a target image. The target image is converted into a gray-scale image, compressed to a size of 12x12, and subjected to a discrete cosine transform to calculate the discrete cosine coefficients. The 8x8 region at the upper left corner is then hash-coded to obtain a picture fingerprint, which is stored in a fingerprint matrix. When a picture to be searched and matched is input, it is put through the same steps to obtain its hash fingerprint, and the Hamming distance between this fingerprint and each fingerprint in the fingerprint matrix is calculated; the image corresponding to the fingerprint with the minimum Hamming distance is the search-matching result. The target image search matching algorithm based on the perceptual hash algorithm comprises the following steps:

1) obtaining a target image from each image in the database through a target image segmentation algorithm and a target image extraction algorithm for detecting four edges;

2) converting the target image into a gray-scale image, compressing the gray-scale image to an image of 12x12, and calculating a discrete cosine coefficient to obtain a discrete cosine matrix;

3) carrying out hash coding on the 8x8 area at the upper left corner of the discrete cosine matrix in the step 2) to obtain a picture fingerprint and storing the picture fingerprint in a fingerprint matrix;

4) and inputting a matched image to be searched, and carrying out the steps 1), 2), 3) to obtain 64-bit fingerprints of the target image, carrying out Hamming distance calculation on the fingerprints in a fingerprint matrix, updating the subscript of the current searched matched image if the Hamming distance is smaller than the current minimum value, and continuing to execute the step 4) until the search is finished.
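Steps 2) and 3) can be sketched in plain Python as below. The sketch assumes the image has already been converted to gray scale and resized to 12x12, uses an unnormalised DCT-II, and thresholds the 64 low-frequency coefficients against their own mean, which is the usual perceptual-hash convention and an assumption here rather than a detail confirmed by the text. The names `dct_2d` and `fingerprint` are hypothetical.

```python
import math


def dct_2d(block):
    """Unnormalised 2-D DCT-II of a square list of rows."""
    n = len(block)
    cos = [[math.cos(math.pi * (2 * x + 1) * u / (2 * n)) for x in range(n)]
           for u in range(n)]
    return [[sum(block[x][y] * cos[u][x] * cos[v][y]
                 for x in range(n) for y in range(n))
             for v in range(n)]
            for u in range(n)]


def fingerprint(gray12, keep=8):
    """Build a 64-bit fingerprint from the top-left keep x keep DCT coefficients."""
    coeffs = dct_2d(gray12)
    low = [coeffs[u][v] for u in range(keep) for v in range(keep)]
    mean = sum(low) / len(low)  # assumption: threshold is the mean of these coefficients
    bits = 0
    for value in low:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


# Toy 12x12 gray-scale input: a simple intensity ramp.
ramp = [[r * 12 + c for c in range(12)] for r in range(12)]
fp = fingerprint(ramp)
print(f"{fp:016x}")
```

In an OpenCV-based implementation the resizing and the transform would more likely come from library routines such as `cv2.resize` and `cv2.dct`; the hand-written transform here only keeps the sketch free of dependencies.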

Compared with the prior art, the invention has the following advantages and technical effects:

the method comprises the steps of performing target image segmentation on an image captured by a camera by using an image processing technology to obtain a target image area; and then extracting a target image from the target image area by using a four-edge detection algorithm, coding the target image based on a perceptual hash algorithm, and searching and matching in a database according to the code of the target image so as to obtain the page number of the current book content. The invention provides an intelligent book content searching and matching method, which is particularly suitable for educational products such as children robots and has wide market prospect and practical significance.

Drawings

FIG. 1 is a flow chart of the overall implementation method in an example;

FIG. 2 is a flow chart of the book image segmentation algorithm;

FIG. 3 is a flow chart of the book image extraction algorithm.

Detailed Description

The following description of the embodiments of the present invention is provided in connection with the accompanying drawings and examples, but the invention is not limited thereto.

The technical scheme of this embodiment mainly comprises a book image segmentation algorithm based on image processing, a target image extraction algorithm based on four-edge detection, and a target image search matching algorithm based on a perceptual hash algorithm, described in turn below.

1. Book image segmentation algorithm based on video image processing

The GrabCut image segmentation algorithm models an image with Gaussian mixture models over the three RGB channels and segments the target through an iterative process that alternates segmentation estimation with learning of the Gaussian model parameters. First, a region M containing the target is specified in the image; the pixels inside M are taken as target pixels M_U and the pixels outside M as background pixels M_B. The target and the background are each modelled by a full-covariance Gaussian mixture model with k Gaussian components, where k takes the empirical value 5. Each pixel is assigned to exactly one Gaussian component, either of the target mixture model or of the background mixture model. The parameters of the mixture models are then optimized by assigning to each pixel the mixture component with the maximum probability, and these two processes, parameter optimization and segmentation estimation, are iterated to segment the target image.

Based on this image segmentation algorithm, the invention provides a target segmentation algorithm based on image processing. The camera films the book and each captured video frame is converted into a 200x200 image in jpg format. The captured image is regarded as consisting of the target image (the textbook) and the remaining background. The number p of boundary pixels of the target area is set to 10; a rectangular area S where the foreground target is located is obtained by subtracting p from the start and end coordinates of the captured image, and the area of the input image outside S is judged to be the background area. The image segmentation algorithm then separates the background area from the target area to obtain the target area image. The book image segmentation algorithm based on image processing comprises the following steps:

1) converting the video sequence captured by the camera into images of a uniform 200x200 size;

2) setting the boundary p of the target area to 10, subtracting p from the start and end coordinates of the captured image to obtain the rectangular area S where the foreground target is located, and judging the area outside S and within the input image to be the background area;

3) segmenting the input image with the graph segmentation algorithm according to its foreground area and background area to obtain the book image.

2. Book image extraction algorithm based on four-edge detection

After the book content area captured by the camera has been segmented from the image, the pixel values of the background are all 0, that is, the background is black. In addition, the textbook may not be level when it is filmed, so the captured textbook may be rotated by some angle. To extract the region where the book is located from the segmented target image, the invention provides a book content image extraction algorithm based on four-edge detection. If the target book image is rotated, the four corner points of the book image are the topmost, leftmost, bottommost and rightmost points of the part of the target image whose pixel values are not 0. The image is first converted into a gray-scale image; any two of these points that are not diagonally opposite are then located, and the image is straightened by slope adjustment. In this example the topmost and rightmost points are found and denoted A and B, with coordinates (x_a, y_a) and (x_b, y_b) respectively. The correction formula is as follows:

After the image has been corrected by the correction formula, the black background border of the image is removed by the four-edge-detection target image extraction algorithm. Pixel values are examined from the middle of each of the four edges of the image inward: a value of 0 is judged to be black background border and a non-zero value is judged to be target image. When detection is finished, the row and column start coordinates startRow, startCol and end coordinates endRow, endCol of the target image are obtained. A boundary value pad, which takes the empirical value 2, is subtracted from the start and end coordinates to obtain the area of the book image and hence the coordinates of the book content image, which is then extracted. The target image extraction algorithm based on four-edge detection comprises the following steps:

1) converting the image into a gray scale image;
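The rotation correction in this section relies on a formula that is not reproduced in this text. Purely as an illustration of what "slope adjustment" from two corner points could look like, the sketch below computes the inclination of the line through A and B against the horizontal axis. That the patent's formula takes this form is an assumption, and `tilt_angle_degrees` is a hypothetical name.

```python
import math


def tilt_angle_degrees(a, b):
    """Angle of the line from A to B against the horizontal axis, in degrees.

    a and b are (x, y) pixel coordinates; this is an illustrative stand-in,
    not the correction formula of the source document.
    """
    (xa, ya), (xb, yb) = a, b
    return math.degrees(math.atan2(yb - ya, xb - xa))


# Example: B lies 20 pixels to the right of A and 5 pixels further down the image.
angle = tilt_angle_degrees((100, 40), (120, 45))
print(round(angle, 2))
```

An angle obtained this way would then be handed to whatever rotation routine the implementation uses to level the page before the four-edge scan.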

3. Book content searching and matching algorithm based on perceptual hash algorithm

After the book content image has been extracted from the image captured by the camera, the invention uses a book content searching and matching algorithm based on a perceptual hash algorithm to identify which page of the textbook the current content is. First, the extracted book content image is compressed to a size of 12x12 and a discrete cosine transform is performed to calculate the discrete cosine coefficients. The compressed size can be chosen according to the actual image size, but it should be at least 8x8 and should not be too large. The 8x8 region at the upper left corner is then hash-coded: the mean of the pixel values of the compressed 12x12 image is calculated, and each value in the upper-left 8x8 region is set to the character 1 if it is greater than the mean and to the character 0 if it is smaller, giving a 64-bit hash code. To enable search matching, every page image of the textbook is hash-coded into 64 bits and stored in a database. During search matching, the input image is hash-coded to obtain its hash fingerprint, the Hamming distance between this fingerprint and each hash fingerprint in the database is calculated, and the image whose fingerprint has the minimum Hamming distance is judged to be the best match for the current input image, which determines the page of the textbook to which the input image corresponds. The book content searching and matching algorithm based on the perceptual hash algorithm comprises the following steps:

4) and inputting a matched image to be searched and carrying out the steps 1), 2), 3) to obtain 64-bit fingerprints of the target image, carrying out Hamming distance calculation on the fingerprints in a fingerprint matrix, updating the subscript of the current searched matched image if the Hamming distance is smaller than the current minimum value, and continuously executing the step 4).
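The search loop in step 4) can be sketched as a linear scan that keeps the index of the smallest Hamming distance seen so far. Fingerprints are held as Python integers; the stored values and the assumption that the list index corresponds to a page are invented for the example, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of differing bits between two 64-bit fingerprints."""
    return bin(a ^ b).count("1")


def best_match(query, fingerprints):
    """Return (index, distance) of the stored fingerprint closest to query."""
    best_index, best_distance = -1, 65  # 65 exceeds any 64-bit distance
    for index, stored in enumerate(fingerprints):
        distance = hamming(query, stored)
        if distance < best_distance:  # update the subscript of the current match
            best_index, best_distance = index, distance
    return best_index, best_distance


# Made-up fingerprint matrix, one entry per textbook page.
pages = [0x0F0F0F0F0F0F0F0F, 0xFFFF0000FFFF0000, 0x123456789ABCDEF0]
query = 0xFFFF0000FFFF0001  # differs from pages[1] in a single bit

print(best_match(query, pages))  # (1, 1)
```

A linear scan is adequate for a fingerprint matrix the size of one textbook; the text does not describe any indexing structure beyond this.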

The method only requires the camera to film the area where the textbook is located from the front; software developed with OpenCV processes the video stream acquired by the camera. The captured image undergoes image segmentation, rotation correction, textbook content extraction, perceptual hash coding and other processing, and the resulting hash fingerprint is searched and matched against the hash fingerprint database. The image whose fingerprint has the minimum Hamming distance from the computed hash fingerprint is the search-matching result, which gives the page number of the current textbook content.

As shown in the overall flow of FIG. 1, this embodiment applies image processing to segment the target image from the image captured by the camera and obtain the target image region; the target image is then extracted from that region by the four-edge detection algorithm and coded with a perceptual hash algorithm, and the code is searched and matched in a database to obtain the page number of the current book content. In this embodiment a camera is installed on an educational robot and films the book while the child does homework. When the child encounters a difficulty, the video image segmentation and search-matching function is started and the captured image is processed to obtain the page number, in the corresponding book, of the content the child is currently studying. The child then only needs to say the number of the question; combined with voice recognition, this yields the question number, so that the question the child is currently working on is identified and an answer is provided from the back end.

The method comprises the following specific steps:

First, book image segmentation algorithm based on image processing

A book is first filmed by the camera and each captured video frame is converted into a 200x200 image in jpg format. The boundary p of the target area is set to 10 to obtain the background area and the foreground area of the image, and the image segmentation algorithm then separates the background area from the target area of the captured image to obtain the target area image, as shown in FIG. 2.

Second, book image extraction algorithm based on four-edge detection

When the target image is extracted from the input image, the image is converted into a gray-scale image, any two corner points that are not diagonally opposite are located, and the image is straightened by slope adjustment. The black background border of the image is then removed by the four-edge-detection target image extraction algorithm. When detection is finished, the row and column start coordinates startRow, startCol and end coordinates endRow, endCol of the target image are obtained, and the boundary value pad is subtracted from the start and end coordinates to obtain the area of the book image, from which the book content image is extracted, as shown in FIG. 3.

Third, book content searching and matching algorithm based on perceptual hash algorithm

After the book content image has been extracted from the image captured by the camera, it is first compressed to a size of 12x12 and a discrete cosine transform is performed to calculate the discrete cosine coefficients; the 8x8 region at the upper left corner is then hash-coded. To enable search matching, every page image of the textbook is first hash-coded into 64 bits and stored in a database. During search matching, the input image is hash-coded to obtain its hash fingerprint, the Hamming distance between this fingerprint and each hash fingerprint in the database is calculated, and the image whose fingerprint has the minimum Hamming distance is judged to be the best match for the current input image, which determines the page of the textbook to which the input image corresponds.

Claims

1. A book content searching and matching method based on video image processing is characterized by comprising the following steps of segmenting and extracting a target image by combining a target image segmentation algorithm based on image processing and a target image extraction algorithm based on edge detection, and searching and matching the target image by a target image searching and matching algorithm based on perceptual hash algorithm, wherein the method comprises the following steps:

(2) a book image extraction algorithm based on four-edge detection, in which the segmented image is converted into a gray-scale image, the edges of the target image are located by the four-edge detection algorithm, and a pixel boundary is subtracted to extract the target image; in the book content image extraction algorithm based on four-edge detection, when the target image is extracted from the input image, the target image extraction algorithm based on four-edge detection is used to remove the black background border of the image; the image is converted into a gray-scale image, in which a pixel value of 0 indicates the black background border and a non-zero pixel value indicates the foreground target; using this criterion, the uppermost point A and the rightmost point B of the gray-scale image are detected, and the image is then rotation-corrected according to a coordinate correction formula; pixel values are then examined from the middle of each of the four edges of the image inward, a value of 0 being judged to be black background border and a non-zero value being judged to be target image; when detection is finished, the row and column start coordinates startRow, startCol and end coordinates endRow, endCol of the target image are obtained, and a boundary value pad is subtracted from the start and end coordinates to obtain the area of the book image, where pad takes the empirical value 2; the target image extraction algorithm based on four-edge detection comprises the following steps:

1) converting the image into a gray scale image;

4) respectively subtracting the boundary value pad from the initial coordinates startRow, startCol and the end coordinates endRow and endCol of the target image to obtain the coordinates of the book image and obtain a horizontal book image;

(3) the book content image searching and matching algorithm based on the perceptual hash algorithm is characterized in that perceptual hash coding is carried out on a target image of the whole database, codes are stored, then the hash codes of the target image are obtained, and the book content searching and matching are realized by calculating the Hamming distance between the target image codes and the database image codes; in the target image searching and matching algorithm based on the perceptual hash algorithm, an image in a database is extracted through the steps (1) and (2) respectively to obtain a target image, the target image is converted into a gray-scale image, the gray-scale image is compressed to 12x12 size and is subjected to discrete cosine transform to calculate a discrete cosine coefficient, then a region of 8x8 on the upper left corner is subjected to hash coding to obtain a picture fingerprint, the picture fingerprint is stored in a fingerprint matrix, when a matched picture to be searched is input, the input picture is subjected to the steps again to obtain an input image hash fingerprint, the hash fingerprint and each fingerprint in the fingerprint matrix are subjected to hamming distance calculation, an image corresponding to the fingerprint with the minimum hamming distance is an image obtained through searching and matching, and the target image searching and matching algorithm based on the perceptual hash algorithm comprises the following steps:

2. The method for searching and matching book contents based on video image processing according to claim 1, wherein in the step (1) of the target image segmentation algorithm based on image processing, the video captured by the camera is converted into an image with a uniform format, and the target area is set as a rectangular area S obtained by subtracting p pixels from the start and end coordinates of the image, where p is an empirical value of 10, and the area inside the input image outside the rectangular area is set as a background area, and the foreground target area can be segmented according to the foreground and background areas and the image segmentation algorithm, then the steps of the target segmentation algorithm based on image processing are as follows:

3) and segmenting according to the foreground region and the background region of the captured image by combining an image segmentation algorithm to obtain a foreground target region.