CN107256378A

CN107256378A - Language identification method and device

Info

Publication number: CN107256378A
Application number: CN201710272779.6A
Authority: CN
Inventors: 刘庆杰; 王博; 陈兴岳; 王蕴红; 陈训逊
Original assignee: Beihang University
Current assignee: Beihang University
Priority date: 2017-04-24
Filing date: 2017-04-24
Publication date: 2017-10-17

Abstract

An embodiment of the present invention provides a language identification method and device. The method includes: obtaining a texture feature vector of an image to be recognized by using a multi-wavelet transform method, and inputting the texture feature vector of the image to be recognized into a predetermined SVM to obtain the language of the text in the image to be recognized. Because the multi-wavelet transform method obtains the texture feature vector quickly, the language identification method provided by the embodiment has high recognition efficiency. In addition, owing to the characteristics of the multi-wavelet transform method, its recognition accuracy is also high. The language identification method provided by the embodiment therefore improves both the efficiency and the accuracy of language identification.

Description

Language identification method and device

Technical Field

The present invention relates to machine learning technologies, and in particular, to a language identification method and apparatus.

Background

With the increasing demand for human-computer interaction, it is desirable to have systems that can handle a variety of inputs, such as printed and handwritten documents, in order to make that interaction more efficient. Although computers are widely used in almost all fields, paper documents will continue to hold an important position for a long time to come. In addition, documents in physical form are still widely needed in practice, for example for faxing or for presenting in court. Therefore, there is a pressing need for methods and systems that can automatically extract, analyze, and store information from existing paper documents, and maintain and retrieve it when needed. All of these problems belong to text image analysis. Language identification is an important component of text image processing, and its application scenarios include sorting text images, searching online text documents that contain specific languages, designing Optical Character Recognition (OCR) systems, and building automatic text retrieval systems based on the language of the underlying document.

Currently, language identification can be performed based on the texture features of images. Existing texture-based language identification methods mainly include the gray level co-occurrence matrix method, the fractal dimension method, and the Gabor filter method. All of these methods first require extracting the texture features of the image. The gray level co-occurrence matrix method identifies the language from feature values extracted from the co-occurrence matrix of the image. In practice, because computing the gray level co-occurrence matrix is relatively complex, the image is generally binarized in a preprocessing step to reduce the computational complexity. The gray level co-occurrence matrix of the processed image yields three values, which can be used as texture features for language identification. The basic idea of the fractal dimension method is to treat an image as a multi-fractal set and to obtain its texture features by calculating the corresponding generalized dimension spectrum. The method treats the text image as a fractal model and expresses the texture features of the image by the fractal dimension calculated with the differential box-counting method, thereby completing the language identification task. The Gabor filter method constructs a multi-channel model based on Gabor filters to extract the texture features of the text image, which are then used for language identification.

However, all of the above methods require a large amount of computation and a long time to extract the texture features, so current language identification methods are inefficient.

Disclosure of Invention

In order to solve the problems in the prior art, the invention provides a language identification method and a language identification device, so as to improve the efficiency of language identification.

In a first aspect, an embodiment of the present invention provides a language identification method, including:

acquiring a texture feature vector of an image to be identified by adopting a multi-wavelet transform method;

and inputting the texture feature vector of the image to be recognized into a predetermined SVM to obtain the language of the text in the image to be recognized.

In the method shown above, the obtaining the texture feature vector of the image to be identified by using the multi-wavelet transform method includes:

processing the image to be identified by adopting an interpolation filtering method to obtain a coefficient matrix of the image to be identified;

taking first half row elements of all rows in the coefficient matrix as parameters of a first scale function of the second-order multi-wavelet, and taking second half row elements of all rows in the coefficient matrix as parameters of a second scale function of the second-order multi-wavelet;

determining the second-order multi-wavelet according to the parameters of the first scale function and the parameters of the second scale function;

decomposing the coefficient matrix according to the second-order multi-wavelet to obtain a first preset number of detail sub-images and a second preset number of approximation sub-images of the image to be recognized;

respectively determining an energy mean value and an energy variance of each detail sub-graph;

and taking the energy mean value and the energy variance of each detail sub-image as the texture feature vector of the image to be identified.

In the method shown above, before the preprocessing the image to be recognized by using the interpolation filtering method, the method further includes:

normalizing the energy of each pixel point of the image to be recognized to obtain a normalized image to be recognized;

the processing of the image to be identified by adopting an interpolation filtering method comprises the following steps:

and processing the normalized image to be recognized by adopting an interpolation filtering method.

In the method, the normalizing the energy of each pixel of the image to be recognized to obtain the normalized image to be recognized includes:

acquiring the normalized energy of each pixel point in the image to be identified according to the formula [formula not reproduced in this text]; wherein m represents the row sequence number of the pixel point in the image to be identified, n represents the column sequence number of the pixel point in the image to be identified, W(m, n) represents the energy of the pixel point (m, n), and Q represents the total number of pixel points in each row and in each column of the image to be identified;

and determining the normalized image to be identified according to the normalized energy of all the pixel points.

In the method shown above, the separately determining an energy mean and an energy variance of each detail sub-graph includes:

determining an energy value of each detail sub-graph according to the formula [formula not reproduced in this text]; wherein: a represents the row sequence number of a pixel point in the detail sub-graph; b represents the column sequence number of a pixel point in the detail sub-graph; l = 1, 2, 3, 4 indexes the four detail sub-graphs obtained in the same direction at each decomposition level; j = 1, 2 represents the decomposition level; k = 1, 2, 3 represents the horizontal, vertical, and diagonal directions; N represents the total number of pixel points in each row and in each column of the detail sub-graph; and W(a, b) represents the energy of the pixel point (a, b);

determining an average energy value of each detail sub-graph according to the formula [formula not reproduced in this text];

determining the energy variance of each detail sub-graph according to the formula [formula not reproduced in this text].

In the method as described above, before the texture feature vector of the image to be recognized is input into a predetermined support vector machine SVM and the language of the text in the image to be recognized is obtained, the method further includes:

acquiring a texture feature vector of each training image in a training image set by adopting a multi-wavelet transform method;

determining a classification surface of the SVM according to the texture feature vector of each training image and the language calibration result of each training image;

inputting the texture feature vector of the image to be recognized into a predetermined SVM, and acquiring the language of the text in the image to be recognized, including:

and determining the language of the text in the image to be recognized according to the texture feature vector of the image to be recognized and the classification surface of the SVM.

In a second aspect, an embodiment of the present invention provides a language identification apparatus, including:

the first acquisition module is used for acquiring the texture feature vector of the image to be identified by adopting a multi-wavelet transform method;

and the second acquisition module is used for inputting the texture feature vector of the image to be recognized into a predetermined Support Vector Machine (SVM) to acquire the language of the text in the image to be recognized.

In the above apparatus, the first obtaining module is specifically configured to:

In the apparatus as described above, the apparatus further comprises:

the normalization processing module is used for performing normalization processing on the energy of each pixel point of the image to be recognized to obtain a normalized image to be recognized;

the first obtaining module processes the image to be recognized by adopting an interpolation filtering method, and specifically includes:

In the apparatus as shown above, the normalization processing module is configured to:

According to the language identification method and device provided by the embodiment of the invention, the texture feature vector of the image to be identified is obtained by adopting the multi-wavelet transform method and is input into the predetermined SVM to obtain the language of the text in the image to be identified. Because the multi-wavelet transform method obtains the texture feature vector quickly, the language identification method provided by the embodiment of the invention has high identification efficiency. Meanwhile, owing to the characteristics of the multi-wavelet transform method, its identification accuracy is also high. Therefore, the language identification method provided by the embodiment of the invention improves both the efficiency and the accuracy of language identification.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

FIG. 1 is a flowchart illustrating a language identification method according to an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a specific implementation manner of S101 in the embodiment shown in FIG. 1;

FIG. 3 is a schematic diagram of the multi-wavelet transform of FIG. 2;

FIG. 4 is a graph showing the results of the experiment in FIG. 3;

FIG. 5 is a schematic flow chart of a specific implementation method of the language identification method according to the embodiment of the present invention;

FIG. 6A is a diagram illustrating a first image library for verifying a language identification method according to an embodiment of the present invention;

FIG. 6B is a diagram illustrating a second image library for verifying the language identification method according to an embodiment of the present invention;

FIG. 7 is a schematic structural diagram of an embodiment of a language identification device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The terms "first," "second," "third," and "fourth," if any, in the description and claims of the invention and in the above-described figures are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The technical solution of the present invention will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.

Fig. 1 is a flowchart illustrating an embodiment of a language identification method according to an embodiment of the present invention. As shown in fig. 1, the language identification method provided in the embodiment of the present invention includes the following steps:

S101: acquiring the texture feature vector of the image to be identified by adopting a multi-wavelet transform method.

Specifically, the language identification method provided by the embodiment of the present invention may be executed by a computer. Determining the language of text in an image may be applied to ranking text images or searching for online text documents containing a particular language, etc. Based on language identification of different application scenarios, the computer in the embodiment of the present invention may be a server of a search engine, or a server in a document database.

Texture of an image is a visual feature that reflects homogeneity in the image, and refers to a function of spatially varying pixel intensity. Thus, the texture feature vector of the image to be recognized may reflect the spatially varying pixel intensities of the image to be recognized. The image to be identified in the embodiment of the invention can be a text image obtained by scanning a paper material, or a text image converted from a document generated by character editing software. The content of the text in the image to be recognized may be a novel in various languages, a document, and the like.

Fig. 2 is a flowchart illustrating a specific implementation manner of S101 in the embodiment shown in fig. 1. As shown in fig. 2, in a possible implementation manner, a specific process of obtaining a texture feature vector of an image to be identified by using a multi-wavelet transformation method is as follows:

S1011: processing the image to be recognized by adopting an interpolation filtering method to obtain a coefficient matrix of the image to be recognized.

Since the multi-wavelet transform is only applicable to vector signals, to perform the multi-wavelet transform on image signals, it is necessary to pre-filter rows and columns of an image, then compose the vector signals of the rows and columns of the pre-filtered image according to a certain rule, and then perform the multi-wavelet transform.

Let A denote an N × N image to be identified, where N represents the total number of pixels in each row and in each column, so the image is square. a_i,j represents the pixel value of the pixel point (i, j), where 1 ≤ i ≤ N and 1 ≤ j ≤ N, and the pixel point (i, j) is the pixel point in the i-th row and j-th column.

The specific process of the interpolation filtering method is as follows:

First, each row of the image A to be recognized forms a row vector signal A_irow(n) in the following way:

[formula not reproduced in this text]

Then A_irow(n) is pre-filtered:

[formula not reproduced in this text]

where the symbol omitted here represents the impulse response of the pre-filter, and the matrix I₁ composed of the resulting I_irow(n) has 2N rows and N/2 columns.

Next, the columns are pre-filtered. Each column of I₁ forms a column vector signal B_icol(n) in the following way:

[formula not reproduced in this text]

Then B_icol(n) is pre-filtered:

[formula not reproduced in this text]

The matrix I₂ composed of the resulting I_icol(n) is the coefficient matrix of the image to be recognized.

S1012: taking the first half row elements of all rows in the coefficient matrix as parameters of a first scale function of the second-order multi-wavelet, and taking the second half row elements of all rows in the coefficient matrix as parameters of a second scale function of the second-order multi-wavelet.

Specifically, and optionally, in the embodiment of the present invention the biorthogonal multi-wavelet "ghmap2" is selected to decompose the coefficient matrix. ghmap2 is a biorthogonal multi-wavelet constructed from two scale functions. A second-order multi-wavelet is thus characterized by two scale functions: a first scale function and a second scale function. Once the parameters of the first scale function and of the second scale function are determined, the second-order multi-wavelet is determined. Here, the parameters of the first and second scale functions refer to their scale factors, and both sets of parameters can be determined from the coefficient matrix.

S1013: determining a second-order multi-wavelet according to the parameters of the first scale function and the parameters of the second scale function.

S1014: decomposing the coefficient matrix according to the second-order multi-wavelet to obtain a first preset number of detail subgraphs and a second preset number of approximation subgraphs of the image to be recognized.

Specifically, in the embodiment of the present invention, a coefficient matrix is subjected to two-stage decomposition, and finally, 4 approximation subgraphs and 24 detail subgraphs can be obtained.

Fig. 3 is a schematic diagram of the multi-wavelet transform in fig. 2. As shown in fig. 3, the image to be recognized is pre-filtered to obtain its coefficient matrix, and the coefficient matrix then undergoes a first-level and a second-level decomposition. After the first-level decomposition there are four types of subgraphs: LL, HL, LH, and HH. According to multi-resolution analysis theory, these four types arise when the two-dimensional discrete wavelet transform is decomposed into two groups of mirror filters for signal decomposition and reconstruction. LL corresponds to applying the low-pass decomposition filter to both the rows and the columns; HL corresponds to applying the high-pass decomposition filter to the rows and the low-pass decomposition filter to the columns; LH and HH follow by analogy. HH, LH, and HL retain the edge detail information of the original image and describe its detail characteristics, so they are called detail subgraphs. LL retains most of the energy information of the original image and is called the approximation subgraph. In the second-level decomposition, the LL graph is decomposed again to obtain 16 subgraphs. Of these 16 subgraphs, the 4 LL graphs (labeled 31) are approximation subgraphs, and the other 12 are detail subgraphs. Together with the 12 detail subgraphs of the first level, this gives 12 + 12 = 24 detail subgraphs. Fig. 4 is a schematic diagram of the experimental result of fig. 3; the graph labeled 41 in fig. 4 is an approximation subgraph, and the other graphs are detail subgraphs. The detail subgraphs and approximation subgraphs in the embodiment of the invention refer to the graphs obtained after the two-level decomposition.
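The subgraph counts can be checked with simple index bookkeeping. The following is only an illustrative sketch: it enumerates the index triples (j, k, l) used in the energy formulas of this description and does not perform any wavelet decomposition. The constant and function names are hypothetical and do not come from the patent.

```python
from itertools import product

LEVELS = (1, 2)         # j: decomposition level
DIRECTIONS = (1, 2, 3)  # k: horizontal, vertical, diagonal
BLOCKS = (1, 2, 3, 4)   # l: four subgraphs per direction at each level


def detail_indices():
    """Return every (j, k, l) triple that labels one detail subgraph."""
    return list(product(LEVELS, DIRECTIONS, BLOCKS))


def approximation_count():
    """Only the four LL subgraphs of the last level remain as approximations."""
    return len(BLOCKS)


details = detail_indices()
per_level = len(DIRECTIONS) * len(BLOCKS)

print(per_level)              # detail subgraphs contributed by each level
print(len(details))           # detail subgraphs after two levels
print(approximation_count())  # approximation subgraphs after two levels
```

Under these assumptions each level contributes 3 × 4 = 12 detail subgraphs, so two levels give 24 detail subgraphs and 4 approximation subgraphs, matching the counts stated in the text.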

S1015: respectively determining the energy mean value and the energy variance of each detail sub-graph.

Specifically, the energy mean and the energy variance of each detail sub-graph can be determined according to the energy of each pixel point in the detail sub-graph. The detail subgraph is stored in a matrix form, and the value of each element in the matrix represents the energy of the pixel point represented by the element.

An energy value of each detail subgraph is determined according to the formula [formula not reproduced in this text]. Wherein: a represents the row sequence number of a pixel point in the detail subgraph; b represents the column sequence number of a pixel point in the detail subgraph; l = 1, 2, 3, 4 indexes the four detail subgraphs obtained in the same direction at each decomposition level; j = 1, 2 represents the decomposition level; k = 1, 2, 3 represents the horizontal, vertical, and diagonal directions; and N represents the total number of pixel points in each row and in each column of the detail subgraph, so the detail subgraph is square. An average energy value of each detail subgraph is determined according to the formula [formula not reproduced in this text]. The energy variance of each detail subgraph is determined according to the formula [formula not reproduced in this text].
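As a minimal sketch of this step, the code below computes a mean and a variance for each detail subgraph and concatenates them into one feature vector. The exact formulas are not reproduced in this text, so the sketch assumes the plain arithmetic mean of the element values and the population variance around that mean; the patent's formulas may differ. The function names and the random input matrices are hypothetical stand-ins.

```python
import numpy as np


def energy_stats(sub_image):
    """Return (mean, variance) of the element values of one detail subgraph.

    Assumption: each matrix element already holds the energy W(a, b) of its
    pixel point, and the statistics are the ordinary mean and population
    variance over all N*N elements.
    """
    w = np.asarray(sub_image, dtype=float)
    mean = w.mean()
    variance = ((w - mean) ** 2).mean()
    return mean, variance


def texture_feature_vector(detail_sub_images):
    """Concatenate the (mean, variance) pairs of all detail subgraphs."""
    features = []
    for sub in detail_sub_images:
        features.extend(energy_stats(sub))
    return np.array(features)


# Hypothetical input: 24 random 16x16 matrices standing in for detail subgraphs.
rng = np.random.default_rng(0)
fake_details = [rng.random((16, 16)) for _ in range(24)]

vector = texture_feature_vector(fake_details)
print(vector.shape)  # (48,)
```

With 24 detail subgraphs and two statistics per subgraph, the resulting vector has 2 × 24 = 48 entries.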

S1016: taking the energy mean value and the energy variance of each detail sub-image as texture feature vectors of the image to be identified.

Considering that, for a text image, the information is distributed differently across directions and frequencies, and that detail sub-images express edge information well, the embodiment of the present invention uses the energy mean and the energy variance of all detail sub-images to construct the texture features of the image to be recognized. After the energy mean and the energy variance of each detail sub-image are determined, they are used as the texture feature vector of the image to be identified. Since there are 24 detail sub-images and each contributes two measures, the energy mean and the energy variance, the texture feature vector is 48-dimensional. The texture feature vector is represented as follows:

[formula not reproduced in this text]

S102: inputting the texture feature vector of the image to be recognized into a predetermined SVM to acquire the language of the text in the image to be recognized.

Specifically, a Support Vector Machine (SVM) is determined from the training image set. Before S102, a classification surface in the SVM needs to be determined, and the process is as follows: acquiring a texture feature vector of each training image in a training image set by adopting a multi-wavelet transform method; and determining the classification surface of the SVM according to the texture feature vector of each training image and the language calibration result of each training image. The language calibration result of each training image refers to the correct language result of the text in the training image determined manually or in other ways. Optionally, the step of inputting the texture feature vector of the image to be recognized into a predetermined SVM to obtain the language of the text in the image to be recognized may be: and determining the language of the text in the image to be recognized according to the texture feature vector of the image to be recognized and the classification surface of the SVM.

In the embodiment of the invention, the SVM can be determined by using the LIBSVM toolkit. LIBSVM is a simple and effective SVM toolkit that provides a variety of interfaces, which makes it easy to adapt to specific applications. It also provides default values for many parameters, which simplifies parameter tuning, and it offers a cross-validation function. The classification steps are as follows:

1) Prepare the data set in the following format:

<label> <index1>:<value1> <index2>:<value2> ...

where <label> marks the category; <index> is an integer starting from 1 and may be discontinuous; <value> is a real number.

2) Scale the data by using a scaling method to improve operating efficiency;

3) Select the radial basis function as the kernel function;

4) Train and tune the parameters C and g;

5) Complete model training according to the parameters C and g;

6) Test the trained SVM classifier on the test set.
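The six steps above can be sketched as follows. This is an illustration under stated assumptions and not the patent's implementation: it uses scikit-learn's `SVC` instead of the LIBSVM command-line tools, the scaling range of [-1, 1] and the search grid are arbitrary choices, the data are synthetic, and `to_libsvm_line` is a hypothetical helper that only shows the text format from step 1.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


def to_libsvm_line(label, values):
    """Step 1: format one sample as '<label> <index>:<value> ...' (1-based)."""
    pairs = " ".join(f"{i}:{v:g}" for i, v in enumerate(values, start=1))
    return f"{label} {pairs}"


# Hypothetical data: two classes of 48-dimensional feature vectors.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (40, 48)), rng.normal(1.5, 1.0, (40, 48))])
y = np.array([0] * 40 + [1] * 40)

# Steps 2 and 3: scale every feature, then use an RBF-kernel SVM.
pipeline = make_pipeline(MinMaxScaler(feature_range=(-1, 1)), SVC(kernel="rbf"))

# Step 4: search over C and gamma (the "g" parameter) with cross-validation.
grid = {"svc__C": [0.1, 1, 10], "svc__gamma": [0.01, 0.1, 1]}
search = GridSearchCV(pipeline, grid, cv=5)

# Step 5: fit; the best parameter pair is refit on all training data.
search.fit(X, y)

# Step 6: predict on held-out samples (here, fresh synthetic ones).
X_test = np.vstack([rng.normal(0.0, 1.0, (5, 48)), rng.normal(1.5, 1.0, (5, 48))])
print(search.best_params_)
print(search.predict(X_test))
print(to_libsvm_line(1, [0.5, 0.25]))  # 1 1:0.5 2:0.25
```

Placing the scaler inside the pipeline means the scaling parameters are learned only from the training folds during cross-validation, which avoids leaking information from the validation folds.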

Optionally, the image to be identified may be energy normalized. Different images to be recognized have different energies, which affects the calculation of the sub-image energies. Therefore, before the multi-wavelet transform is carried out, the energy of each pixel point of the image to be recognized is normalized to obtain a normalized image to be recognized. In that case, the interpolation filtering method is applied to the normalized image to be recognized rather than to the original image.

The normalized energy of each pixel point in the image to be identified can be acquired according to the formula [formula not reproduced in this text], wherein m represents the row sequence number of the pixel point in the image to be identified, n represents the column sequence number of the pixel point in the image to be identified, W(m, n) represents the energy of the pixel point (m, n), and Q represents the total number of pixel points in each row and in each column of the image to be identified. After the normalized energy of all the pixel points is determined, the energies are arranged in matrix form, and this matrix represents the normalized image to be identified. That is, the normalized image to be identified is determined according to the normalized energy of all the pixel points.

Fig. 5 is a flowchart illustrating a specific implementation method of the language identification method according to an embodiment of the present invention. As shown in fig. 5, an image 51 to be recognized is first energy normalized. Multi-wavelet decomposition is then performed on the normalized image to obtain the decomposition subgraphs, from which the texture feature vector of the image to be recognized is obtained. The texture feature vector is input into the SVM to obtain the language of the text in the image to be recognized. The language here may be English, Chinese, Russian, Japanese, Arabic, or the like.

It should be noted that an SVM is a two-class classifier. In the embodiment of the invention, a plurality of SVMs can be predetermined to realize multi-language recognition. For example, if SVM1 is a Chinese/English classifier and SVM2 is a Russian/Japanese classifier, the SVM system formed by SVM1 and SVM2 can recognize 4 languages.
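The text does not say how the outputs of several two-class SVMs are combined. One common scheme, shown here only as an illustration and not as the patent's method, is one-vs-one voting: one binary classifier is trained for every pair of languages, and the language with the most votes wins. In the sketch below the binary classifiers are hypothetical stand-ins (nearest-prototype rules on a single number) rather than trained SVMs, and all names are assumptions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical one-dimensional "prototypes", used only to fake binary classifiers.
PROTOTYPES = {"Chinese": 0.0, "English": 1.0, "Russian": 2.0, "Japanese": 3.0}


def make_pair_classifier(lang_a, lang_b):
    """Return a stand-in two-class classifier that picks the nearer prototype."""
    def classify(x):
        dist_a = abs(x - PROTOTYPES[lang_a])
        dist_b = abs(x - PROTOTYPES[lang_b])
        return lang_a if dist_a <= dist_b else lang_b
    return classify


def one_vs_one_predict(binary_classifiers, features):
    """Let every pairwise classifier vote and return the most-voted language."""
    votes = Counter(clf(features) for clf in binary_classifiers.values())
    return votes.most_common(1)[0][0]


classifiers = {
    pair: make_pair_classifier(*pair) for pair in combinations(PROTOTYPES, 2)
}

print(len(classifiers))                      # 6
print(one_vs_one_predict(classifiers, 1.9))  # Russian
```

With this scheme, 4 languages need 4 × 3 / 2 = 6 pairwise classifiers, which is more than the two SVMs named in the example above.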

The following describes a process of verifying the language identification method provided by the embodiment of the present invention. Fig. 6A is a schematic diagram illustrating a first image library for verifying a language identification method according to an embodiment of the present invention. Fig. 6B is a schematic diagram illustrating a second image library for verifying the language identification method according to the embodiment of the present invention.

Because there is no authoritative, publicly available text image library in the field of language identification research, two text image libraries were collected and established to test the proposed method. The second image library was established specifically to test the robustness of the multi-wavelet transform texture feature extraction method to image quality.

A first image library: documents produced with character editing software are converted into static images. The text content mainly comes from Chinese and English novels and from the multilingual materials of a national language translation center. The images are 128x128 in size and are stored as 8-bit gray level images. The library comprises 300 text images of Chinese, English, Tibetan, and Uyghur text, with almost no quality degradation. Each image contains 8 lines of text, as shown in fig. 6A.

A second image library: the text images are obtained by scanning paper materials, and they come from material collected on the Internet and from conversions of low-quality Portable Document Format (PDF) documents. The images are 128x128 in size and are stored as 8-bit gray level images. The library comprises 200 text images of English and Chinese text. Such images have relatively complex origins and generally suffer from quality degradation such as broken strokes, noise, and skew, as shown in fig. 6B.

The hardware and software configuration used for this verification process is as follows: 1. Hardware environment: Central Processing Unit (CPU): Intel(R) Core(TM) Quad CPU @ 2.40 GHz; memory: 4 GB. 2. Software environment: operating system: 64-bit Windows 8; experiment platform: MATLAB R2014a.

In the verification process, the performance of the algorithm is evaluated in two respects: the efficiency of language identification and the effect of language identification. Efficiency mainly refers to the extraction time of the texture feature vector, and effect mainly refers to the accuracy rate and the recall rate under cross-validation. The accuracy rate is defined as: accuracy rate = number of correctly classified samples / total number of samples classified into the category. The recall rate is defined as: recall rate = number of correctly classified samples / number of samples in the sample set.
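The two evaluation measures can be illustrated with a short sketch. It computes them per category, which is an assumption: the accuracy rate is taken as correct samples divided by all samples predicted as that category (usually called precision), and the recall rate as correct samples divided by the samples that truly belong to that category. The function name and the example labels are hypothetical.

```python
from collections import Counter


def precision_recall(true_labels, predicted_labels):
    """Return {label: (accuracy_rate, recall_rate)} for every label seen."""
    actual = Counter(true_labels)          # samples that truly belong to a label
    predicted = Counter(predicted_labels)  # samples classified into a label
    correct = Counter()
    for t, p in zip(true_labels, predicted_labels):
        if t == p:
            correct[t] += 1

    result = {}
    for label in sorted(set(actual) | set(predicted)):
        rate = correct[label] / predicted[label] if predicted[label] else 0.0
        recall = correct[label] / actual[label] if actual[label] else 0.0
        result[label] = (rate, recall)
    return result


truth = ["en", "en", "en", "zh", "zh", "zh"]
guess = ["en", "en", "zh", "zh", "zh", "zh"]
scores = precision_recall(truth, guess)

print(scores["en"])  # (1.0, 0.6666666666666666)
print(scores["zh"])  # (0.75, 1.0)
```

In this toy example one English sample is misclassified as Chinese, so English keeps a perfect accuracy rate but loses recall, while Chinese keeps perfect recall but loses accuracy rate.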

Experiments were carried out on the first image library and the second image library respectively. The algorithm was verified by ten-fold cross-validation to obtain the accuracy rate and the recall rate, and the average feature analysis time was measured. The extraction time for a single image is about 0.4639 s. Table 1 shows the experimental results for the first image library. Table 2 shows the experimental results for the second image library.

Table 1 experimental results for the first image library

Table 2 experimental results for the second image library

As can be seen from Tables 1 and 2, the classification accuracy rate of the language identification method provided by the embodiment of the invention for English images is about 90%, which is a high identification accuracy rate.

The language identification method provided by the embodiment of the invention obtains the texture feature vector of the image to be identified by adopting the multi-wavelet transform method, inputs the texture feature vector of the image to be identified into the predetermined SVM, and obtains the language of the text in the image to be identified. Because the multi-wavelet transform method obtains the texture feature vector quickly, the identification efficiency of the language identification method provided by the embodiment of the invention is high. Meanwhile, due to the characteristics of the multi-wavelet transform method, the identification accuracy of the language identification method provided by the embodiment of the invention is also high. Therefore, the language identification method provided by the embodiment of the invention improves the efficiency and accuracy of language identification.

Fig. 7 is a schematic structural diagram of an embodiment of a language identification device according to an embodiment of the present invention. As shown in fig. 7, the language identification apparatus provided in the embodiment of the present invention includes the following modules:

a first obtaining module 71, configured to obtain a texture feature vector of the image to be identified by using a multi-wavelet transform method.

Optionally, the first obtaining module 71 is specifically configured to: process the image to be recognized by adopting an interpolation filtering method to obtain a coefficient matrix of the image to be recognized; take the first half of the elements of every row in the coefficient matrix as parameters of a first scale function of a second-order multi-wavelet, and take the second half of the elements of every row in the coefficient matrix as parameters of a second scale function of the second-order multi-wavelet; determine the second-order multi-wavelet according to the parameters of the first scale function and the parameters of the second scale function; decompose the coefficient matrix according to the second-order multi-wavelet to obtain a first preset number of detail sub-images and a second preset number of approximation sub-images of the image to be recognized; determine the energy mean value and the energy variance of each detail sub-image respectively; and take the energy mean value and the energy variance of each detail sub-image as the texture feature vector of the image to be recognized.
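Two of the steps above, splitting each row of the coefficient matrix into halves and assembling the feature vector from the detail sub-images, can be sketched as follows. This is a sketch under stated assumptions, not the embodiment's implementation: the multi-wavelet decomposition itself is not shown, the function names are hypothetical, and the energy of a coefficient is assumed here to be its absolute value (the `energy` parameter can be swapped for another definition).

```python
from statistics import fmean, pvariance


def split_row_halves(coefficient_matrix):
    """Split every row into its first half and its second half."""
    half = len(coefficient_matrix[0]) // 2
    first = [row[:half] for row in coefficient_matrix]
    second = [row[half:] for row in coefficient_matrix]
    return first, second


def texture_feature_vector(detail_subimages, energy=abs):
    """Concatenate (energy mean, energy variance) of each detail sub-image.

    `energy=abs` is an assumption; the definition of energy is not restated here.
    """
    features = []
    for sub in detail_subimages:
        energies = [energy(c) for row in sub for c in row]
        features.append(float(fmean(energies)))      # energy mean value
        features.append(float(pvariance(energies)))  # energy (population) variance
    return features
```

With this layout, a decomposition that yields N detail sub-images produces a feature vector of length 2N, which is then passed to the SVM.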

a second obtaining module 72, configured to input the texture feature vector of the image to be recognized into a predetermined SVM, and obtain the language of the text in the image to be recognized.

Optionally, the apparatus further comprises: a normalization processing module, configured to normalize the energy of each pixel point of the image to be recognized to obtain a normalized image to be recognized. In this case, the first obtaining module 71 processing the image to be recognized by adopting the interpolation filtering method specifically includes: processing the normalized image to be recognized by adopting the interpolation filtering method.

The normalization processing module is specifically configured to: acquire the normalized energy of each pixel point in the image to be identified according to the formula [formula missing from the source text]; wherein m represents the row serial number of a pixel point in the image to be identified, n represents the column serial number of the pixel point in the image to be identified, W (m, n) represents the energy of the pixel point (m, n), and Q represents both the total number of pixel points in each row and the total number of pixel points in each column of the image to be identified; and determine the normalized image to be identified according to the normalized energy of all the pixel points.

The device further includes: a third obtaining module, configured to obtain the texture feature vector of each training image in a training image set by adopting the multi-wavelet transform method; and a determining module, configured to determine the classification surface of the SVM according to the texture feature vector of each training image and the language calibration result of each training image. The second obtaining module 72 is specifically configured to determine the language of the text in the image to be recognized according to the texture feature vector of the image to be recognized and the classification surface of the SVM.
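The way the modules described above fit together can be sketched as a small composition of callables. The class name `LanguageIdentifier` and its parameters are hypothetical and chosen only for illustration; the feature extractor and the trained SVM are passed in as functions rather than implemented here.

```python
from typing import Callable, List, Optional

Image = List[List[float]]
Features = List[float]


class LanguageIdentifier:
    """Hypothetical wiring of the modules; not the embodiment's implementation."""

    def __init__(
        self,
        extract_features: Callable[[Image], Features],
        classify: Callable[[Features], str],
        normalize: Optional[Callable[[Image], Image]] = None,
    ):
        self.extract_features = extract_features  # role of the first obtaining module 71
        self.classify = classify                  # role of the second obtaining module 72
        self.normalize = normalize                # optional normalization processing module

    def identify(self, image: Image) -> str:
        # Normalize first when the optional module is present.
        if self.normalize is not None:
            image = self.normalize(image)
        return self.classify(self.extract_features(image))
```

Keeping the classifier behind a plain callable means that the decision function of any SVM trained on the training image set can be plugged in without changing the rest of the pipeline.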

The language identification device provided by the embodiment of the present invention can be specifically used for executing the language identification method in the embodiment shown in fig. 1, and the implementation process and the technical principle are similar, and are not described herein again.

The language identification device provided by the embodiment of the invention includes a first obtaining module, configured to obtain the texture feature vector of the image to be identified by adopting a multi-wavelet transform method, and a second obtaining module, configured to input the texture feature vector of the image to be identified into a predetermined SVM to obtain the language of the text in the image to be identified. Because the multi-wavelet transform method obtains the texture feature vector quickly, the identification efficiency of the language identification device provided by the embodiment of the invention is high. Meanwhile, due to the characteristics of the multi-wavelet transform method, the identification accuracy of the language identification device provided by the embodiment of the invention is also high. Therefore, the language identification device provided by the embodiment of the invention improves the efficiency and accuracy of language identification.

Those of ordinary skill in the art will understand that all or a portion of the steps of the above-described method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs the steps of the method embodiments described above. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, and magnetic or optical disks.

Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. A language identification method, comprising:

and inputting the texture feature vector of the image to be recognized into a predetermined Support Vector Machine (SVM), and acquiring the language of the text in the image to be recognized.

2. The method according to claim 1, wherein the obtaining the texture feature vector of the image to be identified by using the multi-wavelet transform method comprises:

3. The method of claim 2, wherein before the preprocessing the image to be recognized by interpolation filtering, the method further comprises:

4. The method according to claim 3, wherein the normalizing the energy of each pixel point of the image to be recognized to obtain a normalized image to be recognized comprises:

5. The method according to any one of claims 2 to 4, wherein the separately determining an energy mean and an energy variance for each of the detail sub-images comprises:

6. The method according to claim 1, wherein before the inputting the texture feature vector of the image to be recognized into a predetermined Support Vector Machine (SVM) and acquiring the language of the text in the image to be recognized, the method further comprises:

7. A language identification device, comprising:

8. The apparatus of claim 7, wherein the first obtaining module is specifically configured to:

9. The apparatus of claim 8, further comprising:

10. The apparatus of claim 9, wherein the normalization processing module is configured to: