Freehand Sketchy Graphics Recognition based on character string nuclear
Technical field
The present invention relates to cartographical sketching identification based on kernel method.
Background technology
Background technology involved in the present invention comprises following three aspects:
One, cartographical sketching identification
Cartographical sketching identification is the fuzzy sketch that pen type obtains alternately to be expressed map to accurate avatars, promptly from the man-machine interaction process, excavate the sketch shape constraining in the ever-increasing sketch information, on the basis of understanding the original intention of user, freely, irregular sketch recognition become rule, geometric configuration accurately.
The main at present pattern recognition method that has three kinds of modes, the i.e. method of representing based on stroke; The method of representing based on pels such as straight line, arc, curves; Pattern recognition method based on geometric properties.
1, based on the method for expressing of gesticulating
Generally, the sketch that obtains by the pen type interactive mode of sketch recognition system shows as some strokes that are made of the sampling spot sequence of user between starting to write and starting writing.Stroke is the configuration information of figure fractal key.It is symbol that stroke is looked by early stage system, gives its specific implication, the identification of stroke is equal to the identification of figure.In the one stroke identification, comparatively classical is: the Rubine method, and this method is simply effective, but it requires the user to sketch the contours figure with fixed mode stroke, and figure constitutes fairly simple.
2, based on the method for expressing of pel
Represent that based on pel with diagrammatic representation be the combinations of pels such as straight line, arc, curve under the certain space relation.Based on the method for pel comprise usually cut apart, step such as match, regular, identification.The sketch that the user draws can split into pel and relation thereof automatically, and the training of graphics template does not need user intervention, and template base is expanded easily, and adaptability is strong.But train template inevitably and all oneself attributes of definition of graphics test have been increased system overhead.
3, based on the identification of geometric properties
Method based on geometric properties is directly treated as recognition unit with figure, and the geometric properties that directly extracts sketch is used for classification.Usually its extraction that will cut apart different descriptive geometry features in some way earlier comes down to stroke information dimensionality reduction, but because the complicacy of figure, be difficult to the fixing complete information of the geometric properties reservation stroke of dimension, partial information has been lost in the dimensionality reduction process.Therefore, the geometric properties of the dissimilar figure of the not higher vision of accuracy of the method for geometric properties identification expresses possibility close.
Two, character string nuclear (Sting Kernel)
Kernel method is represented one group of relevant machine learning and data mining algorithm.The key components of kernel method are kernel functions, and kernel function can be measured the similarity of input data.Based on these kernel functions, can finish tasks such as classification, recurrence by support vector machine (SVM).
Character string is endorsed and handled the input data type is the data of character string, by the similarity of two input of character string of character string kernel function tolerance.
Character string endorse be divided into multiple, for example: spectrum-similarly, calculate the character string nuclear of the public substring of two input of character string, based on the character string nuclear of comparison, the character string nuclear that obtains by probability model.
Character string nuclear has been applied to fields such as protein homology detection, text classification.
Three, support vector machine (Support Vector Machine, SVM)
(Support Vector Machine SVM) is the famous system based on kernel method to support vector machine.
(support vector machine SVM) solves the new tool of machine learning problem to support vector machine by means of optimization method.It is introduced in the meeting of computer learning theory in 1992 and is entered the machine learning field by Vapnik and co-worker's invention thereof, has been subjected to afterwards paying close attention to widely.All obtain breakthrough aspect its theoretical research and the algorithm realization in recent years, and becoming the powerful measure that overcomes " dimension disaster " and tradition difficulties such as " crossing study ".The object that the theoretical system of SVM contains is very extensive, as dual representation, feature space, the theories of learning, optimum theory and algorithm etc.SVM has obtained reasonable application in fields such as text classification, handwriting recognition, image classification, bioinformatics.
This algorithm becomes sky at set L, stops when perhaps λ is enough little.
Summary of the invention
Based on above-mentioned prior art, the present invention proposes a kind of Freehand Sketchy Graphics Recognition based on character string nuclear, in conjunction with cartographical sketching identification, character string nuclear, these three prior arts of support vector machine, realize a kind of new Freehand Sketchy Graphics Recognition that can continuous training/accumulation.
The present invention proposes a kind of Freehand Sketchy Graphics Recognition based on character string nuclear, this method may further comprise the steps:
Step 1 is mapped as feature string with cartographical sketching, and the cartographical sketching of importing is carried out equidistant sampling, and the sampled distance threshold value is rule of thumb selected 5 pixels, and limits interior sketch of (0.7 second) continuous sampling of time threshold of sampling; The cartographical sketching of sampling is mapped to feature string, further comprising the steps of:
Get a positive integer n; The boundary rectangle of the cartographical sketching that calculating sampling arrives, and the boundary rectangle of sketch is divided into n
2Part; Each cuts apart little rectangle that the back obtains can use two-dimensional coordinate x, and y represents, 1≤x wherein, y≤n and x, y ∈ N; When the central point of and if only if a little rectangle had dropped in the zone that cartographical sketching surrounds, we thought that this little rectangle has been filled; It is n that a cartographical sketching has been mapped to a length
2Feature string;
Step 2 is checked sampling sketch training as training sample by support vector machine based on character string, obtains sorter, uses the lattice search to carry out the parameter tuning, selects best penalty factor C, gama, n; Character string nuclear is measured the similarity of these two character strings by the editing distance between two character strings; Described editing distance is that character string one is transformed to the used minimum character manipulation number of character string two;
Step 3, the sorter that obtains by training is classified to sketch to be identified and is discerned, and will blur, irregular cartographical sketching is mapped as accurate geometric configuration.
Described character manipulation comprises: delete character one; (2) insert character two; (3) character is changed into the operation of another character.
The penalty factor C of described the best, gama, the value of n is respectively: 5 pixels, 9 characters, 0.01 second.
The computing formula of described mapping process is as follows:
At first, defined function
f:N
2→{0,1,2,Λ,n}
Secondly, the process of calculated characteristics character string is as follows:
Initialize:x=1,y=1,n=k,?
S=″″;
For(x=1;x≤n;x++)
For(y=1;y≤n;y++)
{
Int?tmp=f(x,y);
S=strcat(S,tmp);
}
Output?S;
So far just a cartographical sketching being mapped to a length is n
2Feature string.
Compared with prior art, a kind of sketch recognition method of the present invention based on character string nuclear, the position of this method and sketch, size, the drafting mode is irrelevant, allows the user to carry out sketch drafting according to the mode of oneself being accustomed to.Sketch recognition method recognition accuracy based on character string nuclear is higher, and realizes simple.
Description of drawings
Fig. 1 is the leg-of-mutton feature string mapping of Freehandhand-drawing (n=5) synoptic diagram;
Fig. 2 is experiment graphical-set synoptic diagram;
Fig. 3 is cartographical sketching and recognition result figure thereof;
Fig. 4 is the overall flow figure of the Freehand Sketchy Graphics Recognition based on character string nuclear of the present invention.
Embodiment
At first the thought of filling based on the zone is mapped as feature string with cartographical sketching, secondly by support vector machine (Support Vector Machine, SVM) based on character string nuclear (String Kernelho) training sample is trained, obtain sorter, the sorter that obtains by training is classified to sketch to be identified and is discerned then, to blur, irregular cartographical sketching is mapped as accurate geometric configuration.
System of the present invention runs under the visual C++6.0 environment, based on the libsvm software package, uses C Plus Plus to develop.
At first, in visual C++6.0, set up the engineering of a single document view; The libsvm software package that has added character string nuclear (String Kernel) is transplanted in the engineering of foundation, is write the code of feature string mapping block.Prepare training data: gathered 1150 samples, wherein 1000 samples are as training data, and 150 as test data, and sketch is mapped as feature string, the feature string that obtains is write in the file preserve.Training classifier: use characteristic character string training classifier, and use the mode of lattice search to carry out parameter regulation, i.e. penalty factor C, gama, n regulates.
Be described in detail as follows:
One, the cartographical sketching sampling ﹠amp that imports as the present invention; Identification
What need to do in this step is, the cartographical sketching to be identified that samples is mapped as feature string, and the sorter that uses training to obtain is classified to it, thereby finishes the identification of cartographical sketching.
1) pre-service
Cartographical sketching to input carries out equidistant sampling, the sampled distance threshold value is rule of thumb selected 5 pixels, and interior sketch of (0.7 second) continuous sampling of the time threshold that limits sampling, order and direction no requirement (NR) that input is gesticulated, even the user start to write and last starting writing between interval greater than 0.7 second, then user's a last sketch end of input is thought by system, and begins to draw a new sketch.As the gap that pixel resamples, this variable-value rule of thumb obtains.
The cartographical sketching of sampling is mapped to feature string, may further comprise the steps:
At first get a positive integer n;
The boundary rectangle of the cartographical sketching that calculating sampling arrives, and the boundary rectangle of sketch is divided into n
2Part.Each is cut apart little rectangle that the back obtains and can use two-dimensional coordinate (x y) represents, 1≤x wherein, y≤n and x, y ∈ N.When the central point of and if only if a little rectangle had dropped in the zone that cartographical sketching surrounds, we thought that this little rectangle has been filled.The leg-of-mutton feature string mapping of Freehandhand-drawing as shown in Figure 1 (n=5Pixel) synoptic diagram (dash area is expressed as and is filled).The feature string S=0100002222033004400050000 of Freehandhand-drawing triangle mapping among the figure.The computing formula of mapping process is as follows:
At first, defined function
f:N
2→{0,1,2,Λ,n}
Secondly, the process of calculated characteristics character string is as follows:
Initialize:x=1,y=1,n=k,?
S=″″;
For(x=1;x≤n;x++)
For(y=1;y≤n;y++)
{
Int?tmp=f(x,y);
S=strcat(S,tmp);
}
Output?S;
So far we just a cartographical sketching to be mapped to a length be n
2Feature string.
Pre-service of the present invention comprises the sample collection of the cartographical sketching of various ways, and for example, the present invention has selected six kinds of common geometric figures such as straight line, triangle, rectangle, ellipse as the experiment graphical-set.
As shown in Figure 2, at above-mentioned graphical-set, in experiment, collect sample data.The user is free to the skeletonizing figure, 1000 of the total sample number that the present invention adopts.Be mainly used to of the influence of the different sample numbers of comparison to training time and recognition accuracy.
Two, examine training classifier by support vector machine based on character string
Use character string nuclear training classifier.Use the lattice search to carry out the parameter tuning, select best penalty factor C, gama, n.Character string kernel function in the native system is measured the similarity of these two character strings by the editing distance (EDITDISTANCE) between two character strings.
The definition of editing distance:
Suppose these 2 character strings of A and B.To character string A be converted to character string B with minimum character manipulation.Here said character manipulation comprises:
(1) character of deletion;
(2) insert a character;
(3) change a character into another character.
Character string A is transformed to the used minimum character manipulation number of character string B is called the editing distance of character string A to B, be designated as K (A, B).
Experimental result, as shown in Figure 3.And following experimental verification data are arranged:
The accuracy of identification of form 1 difformity figure and recall ratio (get n=9, C=5, gama=0.01)
The recognition accuracy of the cartographical sketching of form 2 on different training set scales (C=5, gama=0.01)