CN107256378A - Language identification method and device - Google Patents
Language identification method and device
- Publication number
- CN107256378A CN201710272779.6A CN201710272779A
- Authority
- CN
- China
- Prior art keywords
- image
- recognized
- identified
- energy
- pixel points
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F40/263—Natural language analysis; language identification
- G06F18/2411—Classification techniques based on the proximity to a decision surface, e.g. support vector machines
- G06V10/443—Local feature extraction by matching or filtering
- G06V30/413—Classification of document content, e.g. text, photographs or tables
- G06V30/10—Character recognition
Abstract
The embodiment of the present invention provides a language identification method and device. The method includes: obtaining the texture feature vector of an image to be recognized using a multi-wavelet transform method, and inputting the texture feature vector of the image to be recognized into a predetermined support vector machine (SVM) to obtain the language of the text in the image to be recognized. Because the multi-wavelet transform method obtains the texture feature vector quickly, the language identification method provided by the embodiment of the present invention has high recognition efficiency; owing to the characteristics of the multi-wavelet transform method, its recognition accuracy is also high. The method and device therefore improve both the efficiency and the accuracy of language identification.
Description
Technical Field
The present invention relates to machine learning technologies, and in particular, to a language identification method and apparatus.
Background
With the growing demand for human-computer interaction, systems that can handle a variety of inputs, such as printed and handwritten documents, are desirable for making that interaction more efficient. Although computers are used in almost every field, paper documents will continue to hold an important position for a long time to come. In addition, business documents in physical form remain widespread because many real-world needs, such as faxing documents or presenting documents in court, still require them. There is therefore a pressing need for methods and systems that can automatically extract, analyze, and store information from existing paper documents, and maintain and access that information when needed. These problems belong to the field of text image analysis. Language identification is an important component of text image processing; its applications include sorting text images, searching online text documents that contain a specific language, designing optical character recognition (OCR) systems, and building automatic text retrieval systems based on the language of the underlying document.
Currently, language identification can be performed based on the texture features of an image. Existing texture-based language identification methods mainly include the gray-level co-occurrence matrix method, the fractal dimension method, and the Gabor filter method; all of them first extract the texture features of the image. The gray-level co-occurrence matrix method performs language identification by extracting feature values of the image from its co-occurrence matrix. In practice, because computing the gray-level co-occurrence matrix is relatively expensive, the computational complexity is usually reduced by binarizing the image as a preprocessing step. The gray-level co-occurrence matrix of the processed image yields three values, which can be used as texture features for language identification. The basic idea of the fractal dimension method is to treat the image as a multifractal set and obtain its texture features by computing the corresponding generalized dimension spectrum: the text image is treated as a fractal model, its texture features are expressed by the fractal dimension computed with the differential box-counting method, and the language identification task is completed on that basis. The Gabor filter method constructs a multi-channel model based on Gabor filters to extract the texture features of the text image and thereby achieve language identification.
However, all of the above methods require a large amount of computation and a long time to extract texture features, so current language identification methods are inefficient.
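The gray-level co-occurrence matrix approach sketched in the background can be illustrated in a few lines. A minimal pure-Python sketch (the function name and toy image are illustrative, not from the patent): after binarization the matrix is only 2 × 2, which is the complexity reduction the text mentions.

```python
def cooccurrence_binary(img, dx=1, dy=0):
    """Co-occurrence counts for a binarized image (values 0/1).

    Returns a 2x2 matrix C where C[p][q] counts pixel pairs
    (img[y][x], img[y+dy][x+dx]) == (p, q) for the given offset.
    """
    rows, cols = len(img), len(img[0])
    c = [[0, 0], [0, 0]]
    for y in range(rows):
        for x in range(cols):
            y2, x2 = y + dy, x + dx
            if 0 <= y2 < rows and 0 <= x2 < cols:
                c[img[y][x]][img[y2][x2]] += 1
    return c

# Binarization shrinks the matrix from 256x256 gray levels to 2x2,
# which is the usual preprocessing the background describes.
page = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
glcm = cooccurrence_binary(page)
```

Scalar statistics of such a matrix (e.g. contrast or homogeneity) are what serve as the texture feature values mentioned in the text.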
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a language identification method and a language identification device, so as to improve the efficiency of language identification.
In a first aspect, an embodiment of the present invention provides a language identification method, including:
acquiring a texture feature vector of an image to be identified by adopting a multi-wavelet transform method;
and inputting the texture feature vector of the image to be recognized into a predetermined SVM to obtain the language of the text in the image to be recognized.
In the method shown above, the obtaining the texture feature vector of the image to be identified by using the multi-wavelet transform method includes:
processing the image to be identified by adopting an interpolation filtering method to obtain a coefficient matrix of the image to be identified;
taking the first half of the elements of each row of the coefficient matrix as the parameters of a first scale function of a second-order multi-wavelet, and the second half of the elements of each row as the parameters of a second scale function of the second-order multi-wavelet;
determining the second-order multi-wavelet according to the parameters of the first scale function and the parameters of the second scale function;
decomposing the coefficient matrix according to the second-order multi-wavelet to obtain a first preset number of detail sub-images and a second preset number of approximation sub-images of the image to be recognized;
respectively determining an energy mean value and an energy variance of each detail sub-graph;
and taking the energy mean value and the energy variance of each detail sub-image as the texture feature vector of the image to be identified.
In the method shown above, before the image to be recognized is processed by the interpolation filtering method, the method further includes:
normalizing the energy of each pixel point of the image to be recognized to obtain a normalized image to be recognized;
the processing of the image to be identified by adopting an interpolation filtering method comprises the following steps:
and processing the normalized image to be recognized by adopting an interpolation filtering method.
In the method, the normalizing the energy of each pixel of the image to be recognized to obtain the normalized image to be recognized includes:
acquiring, according to a normalization formula, the normalized energy of each pixel point in the image to be recognized; where m denotes the row index of a pixel point in the image to be recognized, n denotes the column index, W(m, n) denotes the energy of pixel point (m, n), and Q denotes the number of pixel points in each row and in each column of the image to be recognized;
and determining the normalized image to be identified according to the normalized energy of all the pixel points.
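The normalization step can be sketched as follows. The patent's normalization formula is an image that did not survive extraction, so this sketch assumes one common choice, dividing each pixel energy by the total energy of the Q × Q image; the function name is illustrative.

```python
def normalize_energy(img):
    """Normalize pixel energies so they sum to 1.

    ASSUMPTION: the patent's exact normalization formula is not
    reproduced in the text; dividing each W(m, n) by the total
    energy over the Q x Q image is one conventional choice.
    """
    total = sum(sum(row) for row in img)
    return [[w / total for w in row] for row in img]

img = [[2.0, 2.0], [2.0, 2.0]]   # toy 2x2 energy map
norm = normalize_energy(img)
```

Whatever the exact formula, the point of the step is the same: removing dependence on overall image brightness before the filtering stage.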
In the method shown above, the separately determining an energy mean and an energy variance of each detail sub-graph includes:
determining the energy value of each detail sub-image according to an energy formula; where a denotes the row index of a pixel point in the detail sub-image, b denotes the column index, l = 1, 2, 3, 4 indicates that each decomposition level produces four detail sub-images in the same direction, j = 1, 2 denotes the decomposition level, k = 1, 2, 3 denotes the horizontal, vertical, and diagonal directions, N denotes the number of pixel points in each row and in each column of the detail sub-image, and W(a, b) denotes the energy of pixel point (a, b);
determining the energy mean of each detail sub-image from its energy value;
determining the energy variance of each detail sub-image.
In the method as described above, before the texture feature vector of the image to be recognized is input into a predetermined support vector machine SVM and the language of the text in the image to be recognized is obtained, the method further includes:
acquiring a texture feature vector of each training image in a training image set by adopting a multi-wavelet transform method;
determining a classification surface of the SVM according to the texture feature vector of each training image and the language calibration result of each training image;
inputting the texture feature vector of the image to be recognized into a predetermined SVM, and acquiring the language of the text in the image to be recognized, including:
and determining the language of the text in the image to be recognized according to the texture feature vector of the image to be recognized and the classification surface of the SVM.
In a second aspect, an embodiment of the present invention provides a language identification apparatus, including:
the first acquisition module is used for acquiring the texture feature vector of the image to be identified by adopting a multi-wavelet transform method;
and the second acquisition module is used for inputting the texture feature vector of the image to be recognized into a predetermined Support Vector Machine (SVM) to acquire the language of the text in the image to be recognized.
In the above apparatus, the first obtaining module is specifically configured to:
processing the image to be identified by adopting an interpolation filtering method to obtain a coefficient matrix of the image to be identified;
taking the first half of the elements of each row of the coefficient matrix as the parameters of a first scale function of a second-order multi-wavelet, and the second half of the elements of each row as the parameters of a second scale function of the second-order multi-wavelet;
determining the second-order multi-wavelet according to the parameters of the first scale function and the parameters of the second scale function;
decomposing the coefficient matrix according to the second-order multi-wavelet to obtain a first preset number of detail sub-images and a second preset number of approximation sub-images of the image to be recognized;
respectively determining an energy mean value and an energy variance of each detail sub-graph;
and taking the energy mean value and the energy variance of each detail sub-image as the texture feature vector of the image to be identified.
In the apparatus as described above, the apparatus further comprises:
the normalization processing module is used for performing normalization processing on the energy of each pixel point of the image to be recognized to obtain a normalized image to be recognized;
the first obtaining module processes the image to be recognized by adopting an interpolation filtering method, and specifically includes:
and processing the normalized image to be recognized by adopting an interpolation filtering method.
In the apparatus as shown above, the normalization processing module is configured to:
acquiring, according to a normalization formula, the normalized energy of each pixel point in the image to be recognized; where m denotes the row index of a pixel point in the image to be recognized, n denotes the column index, W(m, n) denotes the energy of pixel point (m, n), and Q denotes the number of pixel points in each row and in each column of the image to be recognized;
and determining the normalized image to be identified according to the normalized energy of all the pixel points.
According to the language identification method and device provided by the embodiments of the present invention, the texture feature vector of the image to be recognized is obtained by a multi-wavelet transform method, and this vector is input into a predetermined SVM to obtain the language of the text in the image. Because the multi-wavelet transform method obtains the texture feature vector quickly, the identification efficiency of the method is high; and owing to the characteristics of the multi-wavelet transform method, its identification accuracy is also high. The method provided by the embodiments of the invention therefore improves both the efficiency and the accuracy of language identification.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a flowchart illustrating a language identification method according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a specific implementation manner of S101 in the embodiment shown in FIG. 1;
FIG. 3 is a schematic diagram of the multi-wavelet transform of FIG. 2;
FIG. 4 is a graph showing the results of the experiment in FIG. 3;
fig. 5 is a schematic flow chart of a specific implementation method of the language identification method according to the embodiment of the present invention;
FIG. 6A is a diagram illustrating a first image library for verifying a language identification method according to an embodiment of the present invention;
FIG. 6B is a diagram illustrating a second image library for verifying the language identification method according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an embodiment of a language identification device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," and "fourth," if any, in the description and claims of the invention and in the above-described figures are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The technical solution of the present invention will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 1 is a flowchart illustrating an embodiment of a language identification method according to an embodiment of the present invention. As shown in fig. 1, the language identification method provided in the embodiment of the present invention includes the following steps:
s101: and acquiring the texture feature vector of the image to be identified by adopting a multi-wavelet transform method.
Specifically, the language identification method provided by the embodiment of the present invention may be executed by a computer. Determining the language of text in an image may be applied to ranking text images or searching for online text documents containing a particular language, etc. Based on language identification of different application scenarios, the computer in the embodiment of the present invention may be a server of a search engine, or a server in a document database.
Texture of an image is a visual feature that reflects homogeneity in the image, and refers to a function of spatially varying pixel intensity. Thus, the texture feature vector of the image to be recognized may reflect the spatially varying pixel intensities of the image to be recognized. The image to be identified in the embodiment of the invention can be a text image obtained by scanning a paper material, or a text image converted from a document generated by character editing software. The content of the text in the image to be recognized may be a novel in various languages, a document, and the like.
Fig. 2 is a flowchart illustrating a specific implementation manner of S101 in the embodiment shown in fig. 1. As shown in fig. 2, in a possible implementation manner, a specific process of obtaining a texture feature vector of an image to be identified by using a multi-wavelet transformation method is as follows:
s1011: and processing the image to be recognized by adopting an interpolation filtering method to obtain a coefficient matrix of the image to be recognized.
Since the multi-wavelet transform is only applicable to vector signals, to perform the multi-wavelet transform on image signals, it is necessary to pre-filter rows and columns of an image, then compose the vector signals of the rows and columns of the pre-filtered image according to a certain rule, and then perform the multi-wavelet transform.
Let A denote an N × N image to be recognized, where N is the number of pixel points in each row and in each column (the image is square). Let a_{i,j} denote the pixel value of pixel point (i, j), with 1 ≤ i ≤ N and 1 ≤ j ≤ N, where pixel point (i, j) lies in the i-th row and j-th column.
The interpolation filtering method proceeds as follows:
First, each row of the image A to be recognized is formed into a row-vector signal A_i^row(n).
Each A_i^row(n) is then pre-filtered:
I_i^row(n) = h(n) * A_i^row(n),
where h(n) denotes the impulse response of the pre-filter and * denotes convolution. The resulting signals I_i^row(n) are assembled into I_1, a matrix with 2N rows and N/2 columns.
With the rows pre-filtered, each column of I_1 is in turn formed into a column-vector signal B_i^col(n), and each B_i^col(n) is pre-filtered in the same way. The resulting signals I_i^col(n) are assembled into I_2, which is the coefficient matrix of the image to be recognized.
S1012: and taking the first half row elements of all rows in the coefficient matrix as parameters of a first scale function of the second-order multi-wavelet, and taking the second half row elements of all rows in the coefficient matrix as parameters of a second scale function of the second-order multi-wavelet.
Specifically, and optionally, the embodiment of the present invention selects the biorthogonal multi-wavelet "ghmap2" to decompose the coefficient matrix. ghmap2 is a biorthogonal multi-wavelet function constructed from two scale functions, so the second-order multi-wavelet is characterized by two sets of parameters: those of the first scale function and those of the second scale function. Here, the parameters of a scale function are its scale factors. Determining both sets of parameters determines the second-order multi-wavelet, and both sets can be determined from the coefficient matrix.
S1013: and determining a second-order multi-wavelet according to the parameters of the first scale function and the parameters of the second scale function.
S1014: and decomposing the coefficient matrix according to the second-order multi-wavelet to obtain a first preset number of detail subgraphs and a second preset number of approximation subgraphs of the image to be recognized.
Specifically, in the embodiment of the present invention, a coefficient matrix is subjected to two-stage decomposition, and finally, 4 approximation subgraphs and 24 detail subgraphs can be obtained.
Fig. 3 is a schematic diagram of the multi-wavelet transform in fig. 2. As shown in fig. 3, the image to be recognized is pre-filtered to obtain its coefficient matrix, and the coefficient matrix is then decomposed in two levels. The first-level decomposition produces four types of sub-images: LL, HL, LH, and HH. Under the multi-resolution analysis theory, these correspond to the four combinations of mirror decomposition filters applied along rows and columns: LL results from low-pass filtering along both rows and columns, HL from high-pass filtering along rows and low-pass filtering along columns, and LH and HH are defined analogously. HH, LH, and HL retain the edge detail of the original image and describe its detail characteristics; they are called detail sub-images. LL retains most of the energy of the original image and is called the approximation sub-image. In the second-level decomposition, the LL sub-image is decomposed again into 16 sub-images, 4 of which are of LL type; these 4 LL sub-images (31) are the approximation sub-images, all other sub-images after the two-level decomposition are detail sub-images, and the number of detail sub-images is therefore 12 + 12 = 24. Fig. 4 is a schematic diagram of the experimental result of fig. 3; the sub-image denoted 41 in fig. 4 is an approximation sub-image, and the others are detail sub-images. The detail sub-images and approximation sub-images referred to in the embodiments of the invention are those obtained after the two-level decomposition.
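The two-level sub-band structure can be illustrated with a scalar Haar analysis step standing in for the "ghmap2" multiwavelet named above. This is a stand-in, not the patent's transform: a multiplicity-2 multiwavelet yields 2 × 2 blocks per band (hence 16 sub-images at the second level), while the Haar sketch yields 4, but the LL/HL/LH/HH layout and the "decompose LL again" recursion are the same.

```python
def haar_step(img):
    """One 2-D Haar analysis step: returns (LL, HL, LH, HH) quarter-size bands.

    Haar is a scalar wavelet used as a stand-in for the GHM-type
    multiwavelet in the text; the sub-band structure (one approximation
    band, three detail orientations) is the same.
    """
    half = len(img) // 2
    ll = [[0.0] * half for _ in range(half)]
    hl = [[0.0] * half for _ in range(half)]
    lh = [[0.0] * half for _ in range(half)]
    hh = [[0.0] * half for _ in range(half)]
    for i in range(half):
        for j in range(half):
            a = img[2 * i][2 * j]
            b = img[2 * i][2 * j + 1]
            c = img[2 * i + 1][2 * j]
            d = img[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2   # low-pass rows and columns
            hl[i][j] = (a - b + c - d) / 2   # high-pass rows, low-pass columns
            lh[i][j] = (a + b - c - d) / 2   # low-pass rows, high-pass columns
            hh[i][j] = (a - b - c + d) / 2   # high-pass rows and columns
    return ll, hl, lh, hh

img = [[float((i + j) % 2) for j in range(4)] for i in range(4)]  # checkerboard
ll1, hl1, lh1, hh1 = haar_step(img)
ll2, hl2, lh2, hh2 = haar_step(ll1)   # second level: decompose LL again
```

On the checkerboard input, the energy concentrates in HH at the first level, exactly the kind of direction-sensitive response the detail sub-images exploit for texture.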
S1015: and respectively determining the energy mean value and the energy variance of each detail sub-graph.
Specifically, the energy mean and the energy variance of each detail sub-graph can be determined according to the energy of each pixel point in the detail sub-graph. The detail subgraph is stored in a matrix form, and the value of each element in the matrix represents the energy of the pixel point represented by the element.
The energy value of each detail sub-image is determined according to an energy formula, where a denotes the row index of a pixel point in the detail sub-image, b denotes the column index, l = 1, 2, 3, 4 indicates that each decomposition level produces four detail sub-images in the same direction, j = 1, 2 denotes the decomposition level, k = 1, 2, 3 denotes the horizontal, vertical, and diagonal directions, and N denotes the number of pixel points in each row and in each column of the detail sub-image (the detail sub-image is thus square). The energy mean of each detail sub-image is then determined from its energy value, and finally the energy variance of each detail sub-image is determined.
S1016: and taking the energy mean value and the energy variance of each detail sub-image as texture feature vectors of the image to be identified.
Considering that, for a text image, the information carried in different directions and frequency bands differs, and that the detail sub-images have strong expressive power for edge information, the embodiment of the present invention constructs the texture features of the image to be recognized from the energy mean and energy variance of all detail sub-images. After the energy mean and energy variance of each detail sub-image are determined, they are used as the texture feature vector of the image to be recognized. Since there are 24 detail sub-images and each contributes two features (its energy mean and its energy variance), the texture feature vector is 48-dimensional and can be written as T = (μ1, σ1, μ2, σ2, ..., μ24, σ24).
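Assembling the 48-dimensional texture feature vector from the 24 detail sub-images can be sketched as below. The patent's exact mean and variance formulas are images that did not survive extraction, so standard definitions (mean of absolute coefficient values, variance about that mean) are assumed; function names are illustrative.

```python
def energy_features(subimage):
    """Energy mean and variance of one detail sub-image.

    ASSUMPTION: mean of |coefficient| values over the N x N sub-image
    and the variance about that mean; the patent's exact formulas are
    not reproduced in the text.
    """
    n = len(subimage)
    vals = [abs(w) for row in subimage for w in row]
    mean = sum(vals) / (n * n)
    var = sum((v - mean) ** 2 for v in vals) / (n * n)
    return mean, var

def texture_vector(detail_subimages):
    """Concatenate (mean, variance) of every detail sub-image:
    24 sub-images -> a 48-dimensional feature vector."""
    vec = []
    for sub in detail_subimages:
        mean, var = energy_features(sub)
        vec.extend([mean, var])
    return vec

subs = [[[1.0, 1.0], [1.0, 1.0]]] * 24    # toy stand-in detail sub-images
features = texture_vector(subs)
```

The resulting vector is exactly what step S102 feeds to the SVM.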
s102: and inputting the texture feature vector of the image to be recognized into a predetermined SVM to acquire the language of the text in the image to be recognized.
Specifically, the Support Vector Machine (SVM) is determined from a training image set. Before S102, the classification surface of the SVM must be determined, as follows: acquire the texture feature vector of each training image in the training image set using the multi-wavelet transform method; then determine the classification surface of the SVM from the texture feature vectors and the language calibration result of each training image. The language calibration result of a training image is the correct language of the text in that image, determined manually or by other means. Optionally, inputting the texture feature vector of the image to be recognized into the predetermined SVM to obtain the language of the text may consist of: determining the language of the text in the image to be recognized from its texture feature vector and the classification surface of the SVM.
In the embodiment of the invention, the SVM can be built with the LIBSVM toolkit, a simple and effective SVM library. It provides a variety of interfaces that make it easy to tune for a specific application, supplies sensible default parameters that simplify parameter adjustment, and includes built-in cross-validation. The classification steps are as follows:
1) the data set was prepared according to the following format:
<label><index1>:<value1><index2>:<value2>...
where <label> marks the category; <index> is an integer starting from 1 and need not be contiguous; <value> is a real number.
2) Scale the data (for example with LIBSVM's svm-scale tool) to improve training efficiency;
3) Select the radial basis function (RBF) kernel as the kernel function;
4) Train the model and tune the parameters C and g;
5) Complete model training with the chosen C and g;
6) Test the trained SVM classifier on the test set.
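Steps 2) to 5) can be sketched, as an illustration only, with scikit-learn (whose SVC classifier is itself built on LIBSVM): scale the features, choose an RBF kernel, and grid-search C and gamma (LIBSVM's g). The toy 48-dimensional data stands in for real texture feature vectors:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Toy stand-in for the 48-dim texture features of two "languages"
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (40, 48)),   # class 0 samples
               rng.normal(2.0, 1.0, (40, 48))])  # class 1 samples
y = np.array([0] * 40 + [1] * 40)

# Scale the data, use an RBF kernel, and grid-search C and gamma
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {"svc__C": [1, 10, 100], "svc__gamma": ["scale", 0.01, 0.1]}
grid = GridSearchCV(model, param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```

In LIBSVM proper, the same steps are carried out with the svm-scale and svm-train command-line tools; the pipeline above is a stand-in, not the patent's procedure.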
Optionally, the image to be recognized may first be energy-normalized. Different images to be recognized carry different amounts of energy, which affects the computed sub-image energies; therefore, before the multi-wavelet transform is performed, the energy of each pixel of the image to be recognized is normalized to obtain a normalized image. When the interpolation filtering method is applied, it is applied to this normalized image.
The normalized energy of each pixel in the image to be recognized can be obtained according to a formula, where m denotes the row index of a pixel in the image to be recognized, n denotes the column index, W(m, n) denotes the energy of pixel (m, n), and Q denotes the total number of pixels in each row and in each column of the image to be recognized. After the normalized energies of all pixels are determined, they are arranged in matrix form; this matrix represents the normalized image to be recognized. That is, the normalized image to be recognized is determined from the normalized energies of all the pixels.
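Since the normalization formula itself is not reproduced above, the following sketch assumes one common convention, dividing each pixel's value by the square root of the image's total energy so that the normalized pixel energies sum to 1; the patent's exact formula may differ:

```python
import numpy as np

def normalize_energy(image):
    """Energy-normalize an image so results are comparable across inputs.

    Assumption (the patent's formula is not reproduced here): divide each
    pixel value by the root of the image's total energy, so that
    sum(W'(m, n)^2) == 1 for the normalized image W'.
    """
    w = image.astype(float)
    total = np.sqrt((w ** 2).sum())
    return w / total if total > 0 else w

img = np.arange(16, dtype=float).reshape(4, 4)
norm = normalize_energy(img)
print((norm ** 2).sum())  # ~1.0
```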
Fig. 5 is a flowchart of a specific implementation of the language identification method according to an embodiment of the present invention. As shown in fig. 5, an image 51 to be recognized is first energy-normalized. Multi-wavelet decomposition is then performed on the normalized image to obtain the decomposition sub-images, from which the texture feature vector of the image to be recognized is computed. The texture feature vector is input into the SVM, which outputs the language of the text in the image. The language may be English, Chinese, Russian, Japanese, Arabic, etc.
It should be noted that an SVM is a binary classifier. In the embodiment of the invention, multiple SVMs can be predetermined to support multi-language recognition. For example, if SVM1 is a Chinese/English classifier and SVM2 is a Russian/Japanese classifier, the system formed by SVM1 and SVM2 can distinguish 4 languages.
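The embodiment does not fix how the binary SVMs are combined into a multi-language system; one common scheme, shown here with purely hypothetical threshold "classifiers" standing in for trained SVMs, is one-vs-one pairwise voting:

```python
from collections import Counter

def one_vs_one_predict(feature, pairwise_classifiers):
    """Combine binary classifiers into a multi-class decision by voting.

    pairwise_classifiers: dict mapping (lang_a, lang_b) -> callable that
    returns lang_a or lang_b for a feature vector. Pairwise voting is one
    common way to build a multi-language system from binary SVMs; the
    patent does not prescribe a particular combination strategy.
    """
    votes = Counter(clf(feature) for clf in pairwise_classifiers.values())
    return votes.most_common(1)[0][0]

# Hypothetical classifiers: each routes on the first feature component
clfs = {
    ("en", "zh"): lambda f: "en" if f[0] < 0.5 else "zh",
    ("en", "ru"): lambda f: "en" if f[0] < 0.5 else "ru",
    ("zh", "ru"): lambda f: "zh" if f[0] >= 0.5 else "ru",
}
print(one_vs_one_predict([0.9], clfs))  # 'zh' (two of three votes)
```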
The following describes a process of verifying the language identification method provided by the embodiment of the present invention. Fig. 6A is a schematic diagram illustrating a first image library for verifying a language identification method according to an embodiment of the present invention. Fig. 6B is a schematic diagram illustrating a second image library for verifying the language identification method according to the embodiment of the present invention.
Because the field of language identification lacks an authoritative public text image library, two text image libraries were collected and built to evaluate the proposed method. The second image library was built specifically to test the robustness of the multi-wavelet-transform texture feature extraction method to image quality.
A first image library: documents produced with text-editing software were converted into static images. The text content comes mainly from Chinese and English novels and from the multilingual corpus of the national language translation center. The images are 128x128, stored as 8-bit grayscale, and comprise 300 text images in Chinese, English, Tibetan, and Uyghur, with almost no quality degradation. Each image contains 8 lines of text. As shown in fig. 6A.
A second image library: text images obtained by scanning paper materials; the sources are images collected on the Internet and conversions of low-quality Portable Document Format (PDF) files. The images are 128x128 and stored as 8-bit grayscale; the 200 text images contain English and Chinese text. These images come from comparatively complex sources and generally exhibit quality degradation such as breaks, noise, and tilt. As shown in fig. 6B.
The hardware and software configuration used for this verification process is as follows. 1. Hardware environment: Central Processing Unit (CPU): Intel(R) Core(TM) Quad CPU @ 2.40 GHz; memory: 4 GB. 2. Software environment: operating system: 64-bit Windows 8; experiment platform: MATLAB R2014a.
In the verification process, the performance of the algorithm is evaluated in two parts: the efficiency of language identification and the effect of language identification. Efficiency refers mainly to the extraction time of the texture feature vector; effect refers mainly to the accuracy and recall rates under cross-validation. The accuracy rate is defined as the number of correctly classified samples divided by the total number of samples classified into the category; the recall rate is defined as the number of correctly classified samples divided by the number of samples of that category in the sample set.
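The two definitions above (the "accuracy rate" here is what is commonly called precision) can be computed directly; the labels below are illustrative only:

```python
def precision_recall(predicted, actual, label):
    """Per-class metrics as defined in the text:
    precision = correctly classified as `label` / total classified as `label`
    recall    = correctly classified as `label` / total actually `label`
    """
    tp = sum(1 for p, a in zip(predicted, actual) if p == label and a == label)
    predicted_as = sum(1 for p in predicted if p == label)
    actually = sum(1 for a in actual if a == label)
    precision = tp / predicted_as if predicted_as else 0.0
    recall = tp / actually if actually else 0.0
    return precision, recall

pred = ["en", "en", "zh", "en", "zh"]
true = ["en", "zh", "zh", "en", "en"]
print(precision_recall(pred, true, "en"))  # (0.666..., 0.666...)
```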
Experiments were carried out on the first and second image libraries; the algorithm was validated with ten-fold cross-validation to obtain the accuracy and recall rates, and the average feature analysis time was measured. By calculation, the feature extraction time for a single image is about 0.4639 s. Table 1 shows the experimental results for the first image library; Table 2 shows the experimental results for the second image library.
Table 1 experimental results for the first image library
Table 2 experimental results for the second image library
As can be seen from Tables 1 and 2, the language identification method provided by the embodiment of the invention classifies English images with an accuracy rate of about 90%, i.e., the identification accuracy is high.
The language identification method provided by the embodiment of the invention acquires the texture feature vector of the image to be recognized using a multi-wavelet transform method and inputs that vector into a predetermined SVM to obtain the language of the text in the image. Owing to the properties of the multi-wavelet transform, the method achieves high recognition accuracy. The language identification method provided by the embodiment of the invention therefore improves both the efficiency and the accuracy of language identification.
Fig. 7 is a schematic structural diagram of an embodiment of a language identification device according to an embodiment of the present invention. As shown in fig. 7, the language identification apparatus provided in the embodiment of the present invention includes the following modules:
a first obtaining module 71, configured to obtain a texture feature vector of the image to be identified by using a multi-wavelet transform method.
Optionally, the first obtaining module 71 is specifically configured to: processing an image to be recognized by adopting an interpolation filtering method to obtain a coefficient matrix of the image to be recognized; taking first half row elements of all rows in the coefficient matrix as parameters of a first scale function of the second-order multi-wavelet, and taking second half row elements of all rows in the coefficient matrix as parameters of a second scale function of the second-order multi-wavelet; determining a second-order multi-wavelet according to the parameters of the first scale function and the parameters of the second scale function; decomposing the coefficient matrix according to the second-order multi-wavelet to obtain a first preset number of detail sub-images and a second preset number of approximation sub-images of the image to be recognized; respectively determining the energy mean value and the energy variance of each detail sub-graph; and taking the energy mean value and the energy variance of each detail sub-image as texture feature vectors of the image to be identified.
The second obtaining module 72 is configured to input the texture feature vector of the image to be recognized into a predetermined SVM, and obtain a language of a text in the image to be recognized.
Optionally, the apparatus further comprises: and the normalization processing module is used for normalizing the energy of each pixel point of the image to be recognized to obtain the normalized image to be recognized. The first obtaining module 71 processes the image to be recognized by using an interpolation filtering method, which specifically includes: and processing the normalized image to be recognized by adopting an interpolation filtering method.
The normalization processing module is specifically configured to: acquire, according to a formula, the normalized energy of each pixel point in the image to be recognized, where m represents the row sequence number of the pixel in the image to be recognized, n represents the column sequence number, W(m, n) represents the energy of pixel (m, n), and Q represents the total number of pixels in each row and in each column of the image to be recognized; and determine the normalized image to be recognized according to the normalized energies of all pixels.
The device also includes: the third acquisition module is used for acquiring the texture feature vector of each training image in the training image set by adopting a multi-wavelet transform method; and the determining module is used for determining the classification surface of the SVM according to the texture feature vector of each training image and the language calibration result of each training image. The second obtaining module 72 is specifically configured to determine the language of the text in the image to be recognized according to the texture feature vector of the image to be recognized and the classification plane of the SVM.
The language identification device provided by the embodiment of the present invention can be specifically used for executing the language identification method in the embodiment shown in fig. 1, and the implementation process and the technical principle are similar, and are not described herein again.
The language identification device provided by the embodiment of the invention includes a first acquisition module that acquires the texture feature vector of the image to be recognized using a multi-wavelet transform method, and a second acquisition module that inputs that vector into a predetermined SVM to obtain the language of the text in the image to be recognized. Owing to the properties of the multi-wavelet transform, the device achieves high recognition accuracy. The language identification device provided by the embodiment of the invention therefore improves both the efficiency and the accuracy of language identification.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium; when executed, it performs the steps of the above method embodiments. The storage medium includes any medium that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A language identification method, comprising:
acquiring a texture feature vector of an image to be identified by adopting a multi-wavelet transform method;
and inputting the texture feature vector of the image to be recognized into a predetermined Support Vector Machine (SVM), and acquiring the language of the text in the image to be recognized.
2. The method according to claim 1, wherein the obtaining the texture feature vector of the image to be identified by using the multi-wavelet transform method comprises:
processing the image to be identified by adopting an interpolation filtering method to obtain a coefficient matrix of the image to be identified;
taking first half row elements of all rows in the coefficient matrix as parameters of a first scale function of the second-order multi-wavelet, and taking second half row elements of all rows in the coefficient matrix as parameters of a second scale function of the second-order multi-wavelet;
determining the second-order multi-wavelet according to the parameters of the first scale function and the parameters of the second scale function;
decomposing the coefficient matrix according to the second-order multi-wavelet to obtain a first preset number of detail sub-images and a second preset number of approximation sub-images of the image to be recognized;
respectively determining an energy mean value and an energy variance of each detail sub-graph;
and taking the energy mean value and the energy variance of each detail sub-image as the texture feature vector of the image to be identified.
3. The method of claim 2, wherein before the processing of the image to be recognized by the interpolation filtering method, the method further comprises:
normalizing the energy of each pixel point of the image to be recognized to obtain a normalized image to be recognized;
the processing of the image to be identified by adopting an interpolation filtering method comprises the following steps:
and processing the normalized image to be recognized by adopting an interpolation filtering method.
4. The method according to claim 3, wherein the normalizing the energy of each pixel point of the image to be recognized to obtain a normalized image to be recognized comprises:
acquiring, according to a formula, the normalized energy of each pixel point in the image to be identified; wherein m represents the row sequence number of the pixel points in the image to be identified, n represents the column sequence number of the pixel points in the image to be identified, W (m, n) represents the energy of the pixel points (m, n), and Q represents the total number of the pixel points in each row and the total number of the pixel points in each column in the image to be identified;
and determining the normalized image to be identified according to the normalized energy of all the pixel points.
5. The method according to any one of claims 2-4, wherein the separately determining an energy mean and an energy variance for each of the detail sub-images comprises:
determining, according to a formula, an energy value of each detail sub-graph; wherein: a represents the row sequence number of the pixel points in the detail subgraph, b represents the column sequence number of the pixel points in the detail subgraph, l = 1, 2, 3, 4 represents that each level of detail subgraphs decomposes four detail subgraphs in the same direction; j = 1, 2 represents the number of decomposition stages; k = 1, 2, 3 represents the three horizontal, vertical, and diagonal directions; N represents the total number of pixel points in each row and the total number of pixel points in each column of the detail subgraph, and W (a, b) represents the energy of the pixel points (a, b);
determining, according to a formula, an average energy value of each detail sub-graph;
determining, according to a formula, the energy variance of each detail sub-graph.
6. The method according to claim 1, wherein before the inputting of the texture feature vector of the image to be recognized into a predetermined Support Vector Machine (SVM) to acquire the language of the text in the image to be recognized, the method further comprises:
acquiring a texture feature vector of each training image in a training image set by adopting a multi-wavelet transform method;
determining a classification surface of the SVM according to the texture feature vector of each training image and the language calibration result of each training image;
inputting the texture feature vector of the image to be recognized into a predetermined SVM, and acquiring the language of the text in the image to be recognized, including:
and determining the language of the text in the image to be recognized according to the texture feature vector of the image to be recognized and the classification surface of the SVM.
7. A language identification device, comprising:
the first acquisition module is used for acquiring the texture feature vector of the image to be identified by adopting a multi-wavelet transform method;
and the second acquisition module is used for inputting the texture feature vector of the image to be recognized into a predetermined Support Vector Machine (SVM) to acquire the language of the text in the image to be recognized.
8. The apparatus of claim 7, wherein the first obtaining module is specifically configured to:
processing the image to be identified by adopting an interpolation filtering method to obtain a coefficient matrix of the image to be identified;
taking first half row elements of all rows in the coefficient matrix as parameters of a first scale function of the second-order multi-wavelet, and taking second half row elements of all rows in the coefficient matrix as parameters of a second scale function of the second-order multi-wavelet;
determining the second-order multi-wavelet according to the parameters of the first scale function and the parameters of the second scale function;
decomposing the coefficient matrix according to the second-order multi-wavelet to obtain a first preset number of detail sub-images and a second preset number of approximation sub-images of the image to be recognized;
respectively determining an energy mean value and an energy variance of each detail sub-graph;
and taking the energy mean value and the energy variance of each detail sub-image as the texture feature vector of the image to be identified.
9. The apparatus of claim 8, further comprising:
the normalization processing module is used for performing normalization processing on the energy of each pixel point of the image to be recognized to obtain a normalized image to be recognized;
the first obtaining module processes the image to be recognized by adopting an interpolation filtering method, and specifically includes:
and processing the normalized image to be recognized by adopting an interpolation filtering method.
10. The apparatus of claim 9, wherein the normalization processing module is configured to:
acquiring, according to a formula, the normalized energy of each pixel point in the image to be identified; wherein m represents the row sequence number of the pixel points in the image to be identified, n represents the column sequence number of the pixel points in the image to be identified, W (m, n) represents the energy of the pixel points (m, n), and Q represents the total number of the pixel points in each row and the total number of the pixel points in each column in the image to be identified;
and determining the normalized image to be identified according to the normalized energy of all the pixel points.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710272779.6A CN107256378A (en) | 2017-04-24 | 2017-04-24 | Language Identification and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710272779.6A CN107256378A (en) | 2017-04-24 | 2017-04-24 | Language Identification and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107256378A true CN107256378A (en) | 2017-10-17 |
Family
ID=60027222
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710272779.6A Pending CN107256378A (en) | 2017-04-24 | 2017-04-24 | Language Identification and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107256378A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108172212A (en) * | 2017-12-25 | 2018-06-15 | 横琴国际知识产权交易中心有限公司 | A kind of voice Language Identification and system based on confidence level |
CN110334705A (en) * | 2019-06-25 | 2019-10-15 | 华中科技大学 | A kind of Language Identification of the scene text image of the global and local information of combination |
CN111027528A (en) * | 2019-11-22 | 2020-04-17 | 华为技术有限公司 | Language identification method and device, terminal equipment and computer readable storage medium |
CN111339787A (en) * | 2018-12-17 | 2020-06-26 | 北京嘀嘀无限科技发展有限公司 | Language identification method and device, electronic equipment and storage medium |
CN114462397A (en) * | 2022-01-20 | 2022-05-10 | 连连(杭州)信息技术有限公司 | Language identification model training method, language identification method and device and electronic equipment |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103295001A (en) * | 2013-06-03 | 2013-09-11 | 哈尔滨理工大学 | Invoice information recognition equipment and invoice information recognition method both applied to handhold equipment |
CN105139041A (en) * | 2015-08-21 | 2015-12-09 | 北京旷视科技有限公司 | Method and device for recognizing languages based on image |
CN105760901A (en) * | 2016-01-27 | 2016-07-13 | 南开大学 | Automatic language identification method for multilingual skew document image |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103295001A (en) * | 2013-06-03 | 2013-09-11 | 哈尔滨理工大学 | Invoice information recognition equipment and invoice information recognition method both applied to handhold equipment |
CN105139041A (en) * | 2015-08-21 | 2015-12-09 | 北京旷视科技有限公司 | Method and device for recognizing languages based on image |
CN105760901A (en) * | 2016-01-27 | 2016-07-13 | 南开大学 | Automatic language identification method for multilingual skew document image |
Non-Patent Citations (2)
Title |
---|
顾立娟 et al.: "Text image script identification based on multi-wavelet transform", Electronic Design Engineering *
顾立娟: "Research on script identification based on texture features of text images", China Masters' Theses Full-text Database, Information Science and Technology *
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108172212A (en) * | 2017-12-25 | 2018-06-15 | 横琴国际知识产权交易中心有限公司 | A kind of voice Language Identification and system based on confidence level |
CN108172212B (en) * | 2017-12-25 | 2020-09-11 | 横琴国际知识产权交易中心有限公司 | Confidence-based speech language identification method and system |
CN111339787A (en) * | 2018-12-17 | 2020-06-26 | 北京嘀嘀无限科技发展有限公司 | Language identification method and device, electronic equipment and storage medium |
CN111339787B (en) * | 2018-12-17 | 2023-09-19 | 北京嘀嘀无限科技发展有限公司 | Language identification method and device, electronic equipment and storage medium |
CN110334705A (en) * | 2019-06-25 | 2019-10-15 | 华中科技大学 | A kind of Language Identification of the scene text image of the global and local information of combination |
CN110334705B (en) * | 2019-06-25 | 2021-08-03 | 华中科技大学 | Language identification method of scene text image combining global and local information |
CN111027528A (en) * | 2019-11-22 | 2020-04-17 | 华为技术有限公司 | Language identification method and device, terminal equipment and computer readable storage medium |
CN111027528B (en) * | 2019-11-22 | 2023-10-03 | 华为技术有限公司 | Language identification method, device, terminal equipment and computer readable storage medium |
CN114462397A (en) * | 2022-01-20 | 2022-05-10 | 连连(杭州)信息技术有限公司 | Language identification model training method, language identification method and device and electronic equipment |
CN114462397B (en) * | 2022-01-20 | 2023-09-22 | 连连(杭州)信息技术有限公司 | Language identification model training method, language identification method, device and electronic equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU2020200251B2 (en) | Label and field identification without optical character recognition (OCR) | |
RU2661750C1 (en) | Symbols recognition with the use of artificial intelligence | |
Ye et al. | Text detection and recognition in imagery: A survey | |
US20190180154A1 (en) | Text recognition using artificial intelligence | |
US20190385054A1 (en) | Text field detection using neural networks | |
Bhunia et al. | Text recognition in scene image and video frame using color channel selection | |
CN107256378A (en) | Language Identification and device | |
Mehri et al. | Texture feature benchmarking and evaluation for historical document image analysis | |
CN104751093B (en) | Method and apparatus for obtaining the video identification code that host equipment is shown | |
US10657369B1 (en) | Unsupervised removal of text from images using linear programming for optimal filter design | |
WO2024027347A9 (en) | Content recognition method and apparatus, device, storage medium, and computer program product | |
Zemouri et al. | Nonsubsampled contourlet transform and k-means clustering for degraded document image binarization | |
Kaur et al. | Proposed approach for layout and handwritten character recognization in OCR | |
Pereira et al. | Discrete artificial bee colony algorithm based optical character recognition | |
Rodin et al. | Document image quality assessment via explicit blur and text size estimation | |
Zhong et al. | Image patches analysis for text block identification | |
Mehri et al. | Historical Document Image Segmentation Combining Deep Learning and Gabor Features | |
Jindal et al. | Script identification for printed document images at text-line level using DCT and PCA | |
Balobaid et al. | Contemporary Methods on Text Detection and Localization from Natural Scene Images and Applications | |
Islam et al. | A deep convolutional neural network for Bangla handwritten numeral recognition | |
RU2792743C1 (en) | Identification of writing systems used in documents | |
Cui et al. | Image classification: No features, no clustering | |
Mehri et al. | Use of SLIC superpixels for ancient document image enhancement and segmentation | |
Ibrahim | Development for Application of Malaysia License Plate Recognition (MLPR) using Convolutional Neural Network (CNN). | |
Shelke et al. | A Tabulation method for Character Recognition using Haar wavelet in 2D images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20171017 |