WO2016075274A1

WO2016075274A1 - Methods, systems and apparatus for image recognition based on recursively determined exemplar-support vector machines (e-svm) features

Info

Publication number: WO2016075274A1
Application number: PCT/EP2015/076522
Authority: WO
Inventors: Joaquin ZEPEDA SALVATIERRA; Patrick Perez
Original assignee: Thomson Licensing
Priority date: 2014-11-14
Filing date: 2015-11-13
Publication date: 2016-05-19
Also published as: WO2016075293A1

Abstract

An aspect of present principles relates to methods and apparatus for image searching. The method may include determining at least an RE-SVM feature based on a recursive exemplar support vector machine feature of an order n (RE-SVM_n) and performing image searching by applying the determined RE-SVM feature to a plurality of features of search images. The apparatus may include a processor. The processor may be configured to determine at least an RE-SVM feature based on a recursive exemplar support vector machine feature of an order (RE-SVM_n). The processor may further be configured to perform image searching by applying the determined RE-SVM feature to a plurality of features of search images. The RE-SVM_nfeature is determined based on at least an initial positive feature and at least an initial negative feature when n equals one. The RE-SVM_n feature is determined based on at least a positive RE-SVM_n-1 feature and at least a negative N-1 RE-SVM feature when n is greater than one.

Description

METHODS, SYSTEMS AND APPARATUS FOR IMAGE RECOGNITION BASED ON RECURSIVELY DETERMINED EXEMPLAR-SUPPORT VECTOR MACHINES (E-SVM) FEATURES

TECHNICAL FIELD

The present disclosure relates to image recognition or searching. More precisely, the present disclosure relates to image searching based on Exemplar Support Vector Machines (E- SVM).

BACKGROUND

Various computer or machine based applications offer image searching abilities. An image search is a search for an image or images. The input to the image search may be an image or images. Image searching is relevant in many fields, including computer vision, object recognition, video tracking, and video-based location determination/mapping.

Fig. 1 illustrates a method 100 for performing image searching. The method 100 includes receiving query image(s) 1 10 that will be the basis of the image search; extracting image features 120 from the query image(s); and performing image searching 130 by using the image features from block 120. Block 130 may perform image searching based on any technique, including image retrieval and semantic searching techniques.

Fig. 1A illustrates an example of an image retrieval search technique. An image retrieval search generally locates images that have the same scene as a query image. The located same-scene images may even have scene alterations resulting from task-related transformation(s) (e.g., changes in scene illumination, image cropping, scaling, wide changes in the perspective of the camera, high compression ratios, or picture-of-video-screen artifacts).

Query image features may be received at block 141. Block 142 may perform image retrieval based on image features received from block 141 and image features received from a large database of features 145. Block 142 may compare the image features received from block 141 with the image features received from the large database of features 145. The large database of features 145 may consist of image features extracted from images stored in a database of search images 143. Block 144 may extract features of each image in the database of search images 143. The large database of features 145 may store the results determined by block 144. Blocks 143-145 may be computed offline or prior to the execution of blocks 141, 142, and 146.

In one example, block 142 performs image retrieval based on a distance between the features from block 141 and the features of the large database from block 145. For example, block 142 may calculate the Euclidean distance between a query image feature from block 141 and each of the features from block 145. Block 142 may choose the images from database 143 that correspond to the smallest distances. Block 142 may provide the search results to block 146. Block 146 may output the search result(s). If multiple results are returned, the results may be ranked and the ranking may be provided along with the results.

Fig. IB illustrates an example of a semantic search. The semantic search searches images for visual concepts representing a search query (e.g., cats). Block 151 receives features of positive images (i.e., features of images that contain the queried visual concept (e.g., cats)). Block 152 receives features of negative images (i.e., features of images that do not contain the queried visual concept (e.g., no cats, instead contain dogs, other animals or no animals)). Block 153 may perform classifier learning to determine a classifier. Block 153 performs classifier learning based on the positive features from block 151 and the negative features from block 152. Block 153 may choose a classifier that best differentiates the positive features from the negative features. In one example, the classifier learning may be an SVM algorithm. Alternatively, the classifier learning may be other learning techniques.

Block 155 may perform image searching based on the classifier from block 153 and the features from the large database 154. The large database of features 154 may consist of feature vectors of images that will be searched. For example, the large database of features 154 may be determined in accordance with the principles described in connection with block 145 of Fig. l a. Block 155 may apply the classifier from block 153 to each of the features from block 154. Block 156 may output the result(s) from block 155. If multiple results are returned, the results may be ranked and the ranking may be provided along with the results.

An E-SVM is a classifier that is a linear Support Vector Machine learned from a single positive example, referred to as the exemplar. The exemplar can be determined based on the positive example and a set of generic, negative examples N. For example, Malisiewicz et al, Exemplar-SVMs for Visual Object Detection, Label Transfer and Image Retrieval, International Conference of Machine Learning, 2012 discuss E-SVMs.

Present E-SVM search techniques require an inefficiently large set of generic, negative examples Af (e.g., a set of one million feature vectors). The large set is inefficient during data computations. Also, present E-SVM search techniques are limited to using an E-SVM representation as a classifier.

Hard-negative mining is a process of (i) selecting subset of generic negative features Af (hereinafter referred to as the "hard-negative cache"); (ii) training a classifier based on the hard- negatives cache; and (iii) increasing the hard-negative cache based on items from Af that have a classification score inside a margin (i.e., greater than - l). However, E-SVM searching based on hard-negative mining is still a highly complex and inefficient process because it still relies on a large set of generic negatives.

Present E-SVM techniques also have limited generalization because, despite the size of the negative set, they utilize a single positive exemplar. Present E-SVM determinations rely on asymmetric determinations. Image retrieval using the asymmetric approach has many shortcomings, including sub-par performance relative to features tailored for the image retrieval task. The asymmetric approach has not been considered for semantic search because E-SVMs are tailored for data-constrained scenarios, and many positives are generally available in semantic search. The asymmetric approach relies on a very large generic negatives pool, and suffers from limits in generalization that must be overcome by deriving extra positives from the original exemplar.

SUMMARY OF INVENTION

There is a need for a novel application of E-SVM the shortcomings listed above.

An aspect of present principles relates to methods and apparatus for image searching. The method may include determining at least an RE-SVM feature based on a recursive exemplar support vector machine feature of an order n (RE-SVM_n) and performing image searching by applying the determined RE-SVM feature to a plurality of features of search images. The apparatus may include a processor. The processor may be configured to determine at least an RE-SVM feature based on a recursive exemplar support vector machine feature of an order n (RE-SVMn). The processor may further be configured to perform image searching by applying the determined RE-SVM feature to a plurality of features of search images. The RE-SVMn feature is determined based on at least an initial positive feature and at least an initial negative feature when n equals one. The RE-SVMn feature is determined based on at least a positive RE-SVMn- 1 feature and at least a negative N-l RE-SVM feature when n is greater than one.

The RE-SVM feature may represent at least a query image when the image search is an image retrieval search. The RE-SVM feature may represent at least a positive image when the image search is a semantic search. The semantic search may include determining at least a second RE-SVM feature representing at least a negative image.

The features of the search images may be RE-SVMn features. The RE-SVM feature may be determined based on n-iterations of performing an E-SVM function based on the at least a positive RE-SVMn-i feature and the at least a negative RE- SVM_n-i feature. The at least a negative RE-SVMn-i feature may be determined by performing an E-SVM function based on a corresponding RE-SVMn-2 feature and at least another RE-SVMn-2 feature. The at least a negative RE-SVMn-i feature may be determined offline.

The image searching may be performed based on the RE-SVMn feature. The image searching may include determining a classifier based on the positive RE-SVMn feature and negative RE-SVMn feature. A score may be determined for each search image when the classifier is applied to that search image. The score may be determined by applying the classifier to RE-SVM features of the search images.

A local descriptor may be determined for the RE-SVMn feature. The image search may be performed based on an aggregation of local descriptors.

SUMMARY OF THE DRAWINGS

The features and advantages of the present invention may be apparent from the detailed description below when taken in conjunction with the Figures described below:

FIG. 1 illustrates a flow diagram of a method for image searching.

FIG. 1 A illustrates a flow diagram of a method for an image retrieval search.

FIG. IB illustrates a flow diagram of a method for a semantic search.

FIG. 1C illustrates an example of an illustration of a graphical representation of an E-

SVM.

FIG. 2 illustrates a flow chart of an exemplary method for determining E-SVM feature(s).

FIG. 3 illustrates a flow chart of an exemplary method for determining an RE-SVM feature(s).

FIG. 4 illustrates a flow chart of an exemplary method for semantic search using RE- SVM features.

FIG. 5 illustrates a flow chart of an exemplary method for extracting RE-SVM local descriptors.

FIG. 6 illustrates a flow chart of an exemplary method for accelerated E-SVM determinations.

FIG. 6A illustrates a flow chart of an exemplary method for centroid determination.

FIG. 6B illustrates an exemplary diagram of a plot of similarity clustering.

FIG. 7 illustrates a block diagram of an exemplary image processing device.

FIG. 8 illustrates a block diagram of an exemplary distributed image processing system. FIG. 9 illustrates an exemplary apparatus.

FIG. 10 illustrates an exemplary plot diagram of experimental results,

FIG. 11 illustrates an exemplary plot diagram of experimental results,

FIG. 12 illustrates an exemplary plot diagram of experimental results,

FIG. 13 illustrates an exemplary plot diagram of experimental results,

FIG. 14 illustrates an exemplary plot diagram of experimental results,

FIG. 15 illustrates an exemplary plot diagram of experimental results.

DETAILED DESCRIPTION

As used herein, an "E-SVM function" refers to a determination of linear classifiers based on one or more positive and negative examples. The E-SVM function may perform a learning determination based on the positive examples of a class (i.e., examples that represent the class; for example, for the class "cats", the set of examples may consist of images of cats) and negative examples (i.e., examples that do not represent a class). As described herein, a learning algorithm may choose a function that best differentiates positive examples from negative examples.

As used herein, an "E-SVM feature" refers to the output of the E-SVM function.

As used herein, an "RE-SVM function" refers to a recursive framework for determining RE-SVM features in accordance with the principles herein. The recursive framework may determine an initial set of E-SVM features based on image feature(s) of a query image and image features of generic negative images. The recursive framework may determine an initial RE-SVM feature by applying an E-SVM function to E-SVM features of query image(s) and E- SVM features of generic negative images. The recursive framework may then recursively determine RE-SVM features by repetitively applying the E-SVM function based on a prior repetition's determination of RE-SVM query features and features of generic negative images. As used herein, an "RE-SVM feature" refers to the output of the RE-SVM function.

The scalars, vectors and matrices may be denoted using, respectively standard, bold, and uppercase-bold typeface (e.g., scalar a, vector a and matrix A). The notation v_k may denote a vector from a sequence v_{l t}v₂, . . . , v_N, and v_k may denote the A-tli coefficient of vector v. The notation [a*]* (respectively, [<¾]¾) may denote concatenation of the vectors a_fc (scalars <¾) to form a single column vector. Finally, | may denote the Jacobian matrix with (a, j)-th entry fjk An aspect of present principles is directed to mapping E-SVM features to RE-SVM feature(s) based on an E-SVM function. An aspect of present principles is directed to determining RE-SVM feature(s) during a post-processing stage. In one example, enhancement of image searching and image classification may occur based on E-SVM features obtained by concatenating the activations of Deep Convolutional Neural Networks (DCNNs). An aspect of present principles is directed to automatically determining an RE-SVM feature that contains unique information of a query image relative to information of generic, negative images. The automatic extraction of the unique information is more efficient than feature extractors that rely on manual encoding of uniqueness into a feature extraction process.

An aspect of present principles is directed to image searching based on symmetrically determined E-SVM features. As used herein, "symmetric" determinations or "symmetrically" determined refers image searching based on E-SVM features of query images (i.e., positive images), E-SVM features of generic negative images, and E-SVM features of images that will be searched. These symmetric determinations improve robustness and efficiency because the E-SVM function has been applied to the images and this function determines, from the feature representing each image, a representation that separates the image from a large generic negative pool of features.

An aspect of present principles is directed to a generic, -dimensional image feature encoder. The image feature encoder may be a global encoder, may be based on aggregated local features, or may be derived from CNNs-based features. The features may be denoted by vectors in a space R^D. The parameter ff^Dmay represent the space of all possible vectors having D, real-valued entries.

An aspect of present principles provides a novel solution to the problem of complexity of generic negative image sets with a large amount of images. An aspect of present principles reduces complexity based on clustering of the generic negative features. The clustering may consist of deriving centroids. Each centroid may represent a subset of the large set of generic negative images. Each centroid may be associated to a subset of generic negative features that are closest to that centroid. That is, the generic negative features of a subset are the generic features that are closer to that centroid than to any other centroid. The distance may be determined by maximizing the inner-product similarities between the generic negative features of the subset and the centroid.

An E-SVM feature in accordance with present principles may be determined based on an optimization determined as follows:

Equation No. (1) ²

The parameters λ, αι, and α ι may have positive, scalar values. The parameters λ, αι, and on may control regularization and relative weight of positive and negative features. Each of the parameters λ, αι, and on can be manually chosen or can be automatically determined. The parameter x may be a feature vector that relates to a query image, a generic negative image, or a search image. In one example, feature vector x may be determined using an encoder such as a VLAD, Fisher Vector, CNN activation coefficients, or another E-SVM feature. The parameter M may represent a set of generic feature vectors (e.g., of a size N). Parameter w is the parameter that is being optimized under Equation No. (1). The parameter zi is negative feature that is part of the set of .A/negative features. The set of generic feature vectors N may be a large set of vectors.

The optimization of Equation No. (1) may determine the best (i.e., the optimal) vector w. The optimization may be based on stochastic gradient descent. The stochastic gradient decent may be determined using a Pegasos algorithm. The stochastic gradient descent may process a feature vector of parameter x or parameter zi The feature vectors x and zi may be processed in a random order. The parameters ai and a-i may be determined during the stochastic gradient descent algorithm. The optimization of Equation No. (1) may be an SVM problem relying on hinge loss, where the amount of features in the positive set is unbalanced relative to the amount of features in the negative set. For example, there may be one positive feature and a million negative features.

In one example, an E-SVM feature may be determined based on a έ·₂ normalization of the result of Equation No. (1). In one example, the normalized E-SVM feature may be determined as follows:

, ,,, v(x,M)

w ( χ, Λ ) =— ~— ~—

Equation No. (2) " "

The parameters x and may have the same values as parameters x and λί of Equation

No. (1).

In other examples, other types of normalization may be used such as h normalization or other type of post-processing may be used such as zeroing of negative coefficients or signed square roots of coefficients. An E-SVM feature in accordance with present principles may be determined based on an optimization determined as follows:

u(x, ). b argniin |w'|²+Ci max(0, ί—π^Ί w'~i»)+rr?r Y max(0, l+z^Tw'+b), Equation No. (3) ^' w.t ^■ pv | _a6Af

The parameters x and Jsf may have the same values as parameters x and Jsf of Equation No. (1). The parameter z may denote a negative feature in a set of negative features Jsf. The parameters Ci, C2 are regularization parameters. In one example the parameters Ci, C2 may be customized empirically. The parameters w' and b denote the parameters over which there is optimization or minimization.

In one example, an E-SVM feature may be determined based on a ¾ normalization of the result of Equation No. (3) as follows:

, , .. u(x, AO

w(x. JM J = r^;' "-;7;-r

Equation No. (4) ^' |Μ(Χ, Λ^Γ) |

In other examples, other types of normalization may be used such as h normalization or other type of post-processing may be used such as zeroing of negative coefficients or signed square roots of coefficients.

An aspect of present principles is directed to determining RE-SVM feature(s) based on E-SVM features, including E-SVM features determined in accordance with the principles described in connection with Equation Nos. (l)-(4). In one example, the RE-SVM feature(s) may be determined based on an E-SVM function that utilizes previously determined E-SVM features of query, training and/or search images. In one example, a feature encoder may encode E-SVM features instead of the image feature representations. For example, if x is a feature vector encoded by a feature encoder, the feature encoder may instead encode an E-SVM feature (e.g., w(x,A ).

An aspect of present principles is directed to E-SVM functions based on symmetrically determined E-SVM features. The symmetric approach includes determining E-SVM features for a query image x°. The symmetric approach further includes determining corresponding E- SVM features for each image x in a database V. The database D may consist of images that will be searched.

The E-SVM feature w° for a query image x° may be represented by:

Equation No. (5) w° = w(x°, .V) The set of E-SVM features Wi corresponding with image xi in a database P of search images may be represented by:

Equation No. (6) {w, = w(x_itAi))}i

During image searching, search images ¾ or E-SVM features Wi corresponding may be sorted based on their similarity to the query image. In one example, similarity may be measured between an E-SVM feature w° derived from a query image feature x° and an E-SVM feature Wj derived from a database image feature xj as follows:

T o

Equation No. (7) w, w

The parameter wy represents the ESVM feature of a database image xy. The parameter w° represents the ESVM feature of a query image. The parameter Sj may represent a similarity score of the image feature of the query image and the jth search image. Because each feature w is uniquely associated with a single image, the parameter Sj may also represent a similarity score between the query image and the database image xj. A high similarity score may indicate a high similarity between the images, whereas a low similarity score may indicate a low similarity between the images. In one example, the score may be determined based on the inner product between wj and w°. In another example, the score may be determined based on a negated Euclidean distance (e.g.,— | wj - w₀ |₂).

Image searching based on E-SVM or RE-SVM features includes learning a classifier. A classifier based on E-SVM or RE-SVM features may be learned based on an optimization as follows:

„ JV

argiain M² + ¾· max(0, 1 - _Vi(w v + b))

Equation No. (8) ^{v b} i=i

The annotated training set {_j/i . i }^ may consist of labels Vector v represents a classifier v. Parameter C represents a scalar chosen empirically and controlling the regularization. Parameter N represents the size of the training set. Parameters Wi represent the E-SVM features of the training set images and b represent the bias or offset between the origin and the hyperplane orthogonal to v.

In an image search based on a symmetric determinations, E-SVM or RE-ESVM features will be determined for the query image, the negative images and the search images. The image classifier v is determined based on the training (positive and negative) images. The image classifier v is applied to a database of searched images that are also represented by E-SVM or RE-SVM features.

In one example, a classifier may be learned from the RE-SVM representations of training images. For example, the training image(s) may be utilized to determine a K-nearest- neighbor classifier, or a nearest-class-mean classifier. The RE-SVM features may be determined by recursively applying an E-SVM function to E-SVM or RE-SVM features (e.g.,

In one example, initial E-SVM features may be represented by:

Equation No. (9a) w° = x

Equation No. (9b) M° = M^'

The parameter w° may represent the initial feature of an image. The image can be any image, including a query, database, or training image. The parameter Λ*^"0 may represent the initial feature representations of generic, negative images. In one example, a first RE-SVM (RE-SVMi) feature is determined based on an E-SVM function applied to initial features. In one example, a j-th recursion RE-SVM (RE-SVMj) feature may be determined as follows:

Equation No. ( 10) ^w = ^w( ^{wJ" 1} - ¹ ) .

Equation No. (10a) where M^j = {w(z, ^i→), z€ Λ^¹ }

The value of j may be equal to or greater than 1. The -th RE-SVM (or the -th E-SVM) may be determined in accordance with the principles described with Equations Nos. (l)-(4). The E-SVM function in Equation Nos. (10) and (10a) may be derived according to Equation Nos. (l)-(4).

In one example, the RE-SVM feature construction may be similar to deep architectures that use an output of a given layer as the input of a subsequent layer. However, an aspect of present principles, unlike deep architecture, determines E-SVM or RE-SVM features using an unsupervised learning routine on a per-image basis. As used herein, unsupervised may indicate automatic, machine learning without user annotations to a training set. Additionally, each j-th RE-SVM feature (RE-SVMj) may be determined by a single, non-linear, and/or convex problem, as opposed to a standard tandem linear/non-linear arrangement.

In one example, local descriptors may be extracted from images. The local descriptors may be extracted for the query image, the negative images, the database images that will be searched, or any other images that are used to learn classifiers. The descriptors may be aggregated based on a descriptor aggregating method, such as VLAD, the Fisher Vector, or Bag-of-Words to produce a single feature vector.

Since E-SVMs are agnostic to the underlying feature representation, it is possible to compute an E-SVM from each local descriptor by using an adequate set M of generic local descriptors, and further to do so recursively as in Equations (10) and (10a). The resulting RE- SVM local descriptors can subsequently be used to build aggregated features using the standard methods previously mentioned such as VLAD, the Fisher Vector or Bag-of-Words. In turn, the subsequent features can be further processed using RE-SVM learning functions as discussed in connection with Equation Nos. (10) and (10a).

Accelerating E-SVM learning using similarity clustering

An aspect of present principles is directed to clustering generic, negative image features to reduce complexity. In one example, computational complexity of E-SVM feature determinations is decreased despite the existence of a large amount of generic, negative images. In one example, the amount of generic negative image features is reduced based on corresponding centroid features. This reduction results in reduced storage and complexity. In one example, this decrease is achieved by deriving a smaller amount of representative centroids from a large amount of generic, negative features. As used herein, a "centroid" may refer to a vector that represents a subset of features. For example, the centroid may represent the mean of the subset, the medoid of the subset (i.e., the subset feature closest to the mean) or a singular vector or eigenvector derived from the subset. The centroids may replace the large amount of generic features, thereby resulting in a smaller training set for an E-SVM function, and hence a faster learning time. The amount of centroids needs to be chosen empirically, as using too few centroids may result in a non-representative generic negative pool.

In one example, the centroids may further have associated auxiliary data. In one example, the auxiliary data may be a weight that is associated with each centroid. In one example, the weight may represent the number of generic negative image features that are part of a corresponding centroid. In another example, the auxiliary data may consist of statistics derived from the generic negative image features of the corresponding centroid. The statistics may include the mean and correlation matrix of the generic negative image features of the corresponding centroid. In one example, the centroids may be determined based on a clustering approach that is applied to features of generic, negative images. In one example, the clustering approach may be applied to multi-modal distribution of generic images' features. In one example, a clustering approach may locate the centroids of the multi-modal distribution. The large amount of negative image features may then be replaced by a smaller set of centroids, weighted by their importance. In one example, the centroids may be determined based on:

Equation (1 1):

where Mk = {z' G M, argmax z'¹ c¾ = k}

Equation (1 1 a):

The parameters Ck and c\ are centroids. The parameter K is the number of learned centroids. The paramctcr.V is the generic negative pool. The parameter M_k is the subset of features from the generic negative pool that correspond to the centroid Ck. The subset of features M_k may be assigned to centroid Ck because they are closer to centroid Ckthan any other centroid. The parameters z and z' are feature vectors from the generic negative pool. Each centroid e¾ vector may be determined based on a first left-singular vector of a matrix derived by stacking the vectors for z e M_k as columns.

In one example, the process for clustering features to determine centroids may include: i) updating each ¾ in accordance with Equation No. (1 1), ii) updating cluster information with a determination of whether a generic negative feature should be assigned to that cluster, and (iii) repeating (i) and (ii) recursively. Unlike dictionary learning for sparse coding, the present clustering approach includes clustering that is sensitive to the sign of the inner product z'^Tc/ (compare with dictionary learning where the cluster assignment is done using the absolute value, e.g., |z'^Tc; |). In one example, the process for clustering or centroid determination may be performed by block 603 of Fig. 6 or blocks 658 and 661 of Fig. 6A. After all the centroids c_k are determined, each feature z may be assigned a centroid. In one example, the feature z may be assigned to a centroid based on the distance between the feature z and the centroid.

After the features have been assigned to centroids, an E-SVM function may be performed based on the centroids of the generic negative images instead of the features of the generic negative images. For example, E-SVM features may be determined by applying an E- SVM function to centroids as follows: u(x, A0, b = argmin | '|²+0i max(0, l-x^Tw'-&)+rj r Ύ A¾ max(0, l+cJw'+6). Equation 12 ^w'-^fc <^{Λ 1} ¾=ι

As shown above, Equation No. (12) corresponds to a modification of Equation No. (1).

The parameter c_fc may correspond to a kth centroid. The parameter J\¾ denotes the number of vectors in Af assigned to e¾,. Letting |N| denote the number of feature vectors inside the generic negatives N, the computation time required to solve the problem in Equation No. (12) is the same as that required when using only K < \ \ generic negative features.

Fig. 2 illustrates a flow chart of a method 200 for extracting E-SVM features. Block

201 may receive at least an image. In one example, block 201 may modify the image to include at least a mirrored, flipped, cropped, shifted and scaled version of the original image.

Alternatively, block 201 may receive a plurality of images of a queried scene (e.g., a plurality of images of the same scene of the statue of liberty).

Block 202 may determine an initial feature vector for the image received at block 201.

The initial, base feature vector may be any type of feature vector, including VLAD, BoW, Fisher, and deep CNN activations. The initial feature vector of the image may also be known as the positive feature.

Block 203 may receive a set of generic images. In one example, the generic images may comprise the generic negative set as discussed in connection with Equation Nos. (1) and

(3). The amount of generic negative set images may be large enough to allow for a reasonable computational time on a target CPU (e.g., a per-image E-SVM feature construction time of about 1 second, corresponding to about 100,000 negative images). In one example, the content of the generic negative images varies between each negative image.

Block 204 may determine initial feature vectors the generic, negative images received from block 203. For example, block 204 may determine an initial feature vector for each image in the set of generic images. The feature vectors determined by block 204 may be the same type of feature vectors as the feature vector determined by block 202. Block 204 may provide the determined feature vectors, i.e., the generic negative features, to block 205.

Block 205 may determine an E-SVM feature 210. Block 205 may determine the E-

SVM feature 210 based on the initial feature vector from block 202 and the initial generic negative feature vectors received from block 204. Block 205 may include one or more of blocks

206, 207 and 208. In one example, block 205 may combine or omit the functionality described in connection with one or more of blocks 206, 207 and 208. Alternatively, the functionality of blocks 206-208 may be combined and performed by block 205. In one example, block 205 may perform a classifier determination other than the SVM determination. In one example, block 205 may perform additional post-processing scheme to the determined exemplar feature.

Block 206 may perform an SVM determination based on the initial feature from block 202 and the initial generic negative features from block 204. Block 206 may perform an SVM or E-SVM function. Block 206 may determine a vector u and a bias parameter b. The vector u may be orthogonal to a hyper-plane that separates generic negatives from an exemplar feature(s). The bias parameter b may specify a displacement of the hyper-plane along the orthogonal. In one example, block 206 may determine the orthogonal vector u and the bias parameter b in accordance with the principles described in connection with Equation Nos. (1) and (3).

Block 207 may process a vector u from block 206. Block 207 may normalize the vector u to determine parameter w. Block 207 may optionally post-process the vector u or parameter w. For example, block 207 may perform post-processing such as a normalization under a 1-2 normalization (e.g., in accordance with the principles described in connection with Equation Nos. (2) and (4)). In another example, the processing may be squared root related to the Hellinger Kernel. In another example, the processing can be a sequential application of a plurality of post-processing solutions.

Block 205 may output an E-SVM feature 210. The E-SVM feature 210 may correspond to the E-SVM feature illustrated in Fig. 1C. Alternatively, the E-SVM feature 210 may be a post-processed version of the E-SVM feature illustrated in Fig. 1C.

Fig. 1C illustrates an orthogonal vector « 174 and an E-SVM feature w 170. An SVM algorithm may determine an orthogonal vector u 174 and a bias b parameter 173. The orthogonal vector u 174 may be determined as discussed in connection with blocks 205 and 206 of Fig. 2. The orthogonal vector « 174 may be orthogonal to a separating hyper-plane 172. The orthogonal vector u 174 may be determined based on an exemplar feature x that may represent a queried image or a positive image concept. The orthogonal vector u 174 may be normalized to produce the E-SVM feature w 170. For example, the orthogonal vector u 174 may be normalized as described in connection with block 207 of Fig. 2. The parameter u and/or the E- SVM feature w 170 may be determined based on negative vectors zi 175. In one example, negative vectors zi may be determined in accordance with the principles described in connection with block 204 of Fig. 2. The negative vectors zi 175 may be selected from a pool of generic features 176.

Fig. 3 illustrates a flow chart of a method 300 for determining RE-SVM feature(s) in accordance with present principles. Block 301 may receive one or more image(s).

Block 302 may determine an initial feature for the image from block 301. The initial feature vector of the image may be known as the initial positive feature(s). In one example, block 302 may determine the initial feature vector in accordance with the principles described in connection with block 202 of Fig. 2.

Block 303 may receive a set of generic images. The generic images may be negative images determined in accordance with the images described in connection with block 203 of Fig. 2. Block 304 may determine initial negative features for the generic, negative images from block 303. Block 304 may determine initial negative features in accordance with the principles described in connection with block 204 of Fig. 3. In one example, block 304 may be performed off-line or prior to the performance of method 300.

Block 305 may determine a first, positive RE-SVM feature (RE-SVMi) based on the positive feature(s) from block 302 and the negative features from block 304. Block 305 may determine the RE-SVMi feature based on an SVM determination, e.g., by applying an E-SVM function to the positive and negative features. Block 305 may perform an SVM determination as discussed in connection with Fig. 2 (e.g., blocks 205-207) and Fig. 1C. Block 305 may determine the RE-SVMi feature as described in connection with Equation Nos. (1) to (4).

Block 306 may determine first negative RE-SVMi features Λί^ι corresponding to the generic image features from block 304. In one example, block 306 may determine the first generic negative RE-SVMi features in accordance with the principles described in connection with Equation Nos. (l)-(4), (10), (10a). For example, block 306 may use the output of block 304 to replace the set of generic features discussed in connection with Equation Nos. (1)- (4), (10), (10a). In one example, block 306 may use the output of block 304 as a generic negative pool Af° and then may use and each feature in the pool of features JV° to determine a corresponding RE-SVM feature while using all or some of the features of the pool Λ^*0 as a generic negative pool. The RE-SVM features may be determined as described in connection with Equation Nos. (l)-(4), (10), (10a). In one example, block 306 may be optional and the image features from block 304 may be provided directly to block 307. In one example, block 306 may be performed off-line or prior to the performance of method 300. Block 307 may determine a second, positive RE-SVM feature (RE-SVM2). Block 307 may determine the RE-SVM2 feature based on the first, positive RE-SVMi feature and the first negative RE-SVMi features Λί¹. Block 307 may determine the RE-SVM2 feature based on an SVM determination. The SVM determination of block 307 would consider second, positive RE-SVM2 feature as the positive exemplar and first negative RE-SVM features λί¹ as the negative exemplars. Block 307 may perform an SVM determination as discussed in connection with Fig. 2 (e.g., blocks 205-208) and Fig. 1C. Block 307 may determine the RE-SVM2 feature as described in connection with Equation Nos. (1-4, 10, 10a).

Block 309 may determine an Nth positive RE-SVMN feature. Block 309 may recursively determine the RE-SVMN feature based on an RE-SVMN-I positive feature and RE-SVMN-I negative features. The N-1 RE-SVMN-I negative features may be determined by block 308. The N-1 RE-SVMN-I negative features may be determined based on the N-2 RE-SVMN-2 negative features. Each N-1 RE-SVMN- 1 negative feature may be determined in accordance with the principles described in connection with Equation Nos. (1-4, 10, 10a). Each N-1 RE-SVM negative feature is computed by using the corresponding N-2 RE-SVMN-2 negative feature as a positive feature and using all other N-2 RE-SVMN-2 features as negative features. In one example, block 308 may be optional and the image features from blocks 304 or 306 may be provided directly to block 307. In one example, block 308 may be performed off-line or prior to the performance of method 300.

Block 309 may determine the final, positive RE-SVMN feature 310 based on N iterations of the recursive framework. Block 309 may determine the RE-SVMN feature 310 based on a positive RE-SVMN- 1 and N-1 negative RE-SVM features. For example, when N=2, the output RE-SVMN feature 310 is RE-SVM2 from block 307. In another example, when N=5, RE-SVM feature 310 may be RE-SVM5 which may be determined based on a previously determined RE- SVM₄ feature and previously determined fourth negative RE-SVM features for generic, negative images.

Block 309 may determine the RE-SVMN feature 310 based on an SVM determination. Block 309 may perform an SVM determination as discussed in connection with Fig. 2 (e.g., blocks 205-208) and Fig. 1C. Block 309 may determine the RE-SVMN feature as described in connection with Equation Nos. (l)-(4), (10), and (10a). The RE-SVM determination may be repeated N times to determine the RE-SVMN feature.

Fig. 4 illustrates an exemplary image search based on RE-SVM features. Block 401 may receive positive images (e.g., one or more query images). The positive images may represent a visual concept (e.g., cat). Block 402 may determine positive RE-SVM features for the positive images from block 1. Block 402 may determine positive RE-SVM features in accordance with the principles described in connection with Equation Nos. (l)-(4) and Fig. 3.

Block 403 may receive negative images. The negative images may be counter examples of the visual concept (e.g., images not containing cats). In one example, the negative images may be images a set of generic, negative images. Block 404 may determine negative RE-SVM features for the negative images from block 403. In one example, block 404 may determine negative RE-SVM features in accordance with the principles described in connection with Equation Nos. (l)-(4), (10) and (10a) and Fig. 3.

Block 405 may learn a classifier based on the negative and positive RE-SVM features received from blocks 402 and 404, respectively. The learned classifier can be an SVM classifier, or a KNN classifier, or a nearest class mean classifier, or any other type of classifier.

Block 406 may determine image search results based on the classifier from block 405 and features from a large database of image features 407. The large database of image features 407 may contain the features of the images that will be searched. The large database of image features 407 may be determined in accordance with the principles described in connection with blocks 143-145 of Fig. la and block 154 of Fig. lb. Block 406 may determine a score for each image by applying the classifier to each of the search image features. In a symmetric determination, the search features of block 407 may be RE-SVM features determined in accordance with the principles described in connection with Fig. 3 and Equation Nos. (10)- (10a). The search results may be ranked based on the score and the ranking may be provided along with the results.

Fig. 5 illustrates a flow chart of a method 500 for extracting RE-SVM local descriptors. Block 501 may extract N local descriptors from an input image. Block 502 may extract many local descriptors from a generic pool of images. Block 503 may compute RE-SVM local descriptors based on the local descriptors extracted at blocks 501 and 502. Block 504 may output the results of block 503. The output of block 504 may be provided to a local descriptor encoder, such as a VLAD, Fisher or Bag of Words local descriptor encoder. The Ν local descriptors provided by block 504 may be aggregated into a single global image feature. The global feature may represent features of a query, positive, negative, training, search or any other image and may be used during image searching. Fig. 6 illustrates a flow chart of a method 600 for accelerated E-SVM or RE-SVM feature determination. Method 600 may include a block 601 which may receive features of an image. In one example, block 601 may receive the image and may determine the image features. In one example, the features may be E-SVM or RE-SVM features as discussed in connection with Figs. 2 and 3. Alternatively, the features may be initial image features or any other type of features. Block 601 may determine the E-SVM or RE-SVM image features as discussed in connection with methods 200 and 300 of Figs. 2 and 3, respectively. In another example, the features may be E-SVM or RE-SVM features derived from local descriptors. Block 601 may provide the image features to block 604.

Block 602 may receive a large set of features of negative, generic images. The large set of features may have a size that is a multiple of the size of the feature vector (e.g., ΙΟχ, lOOx, lOOOx). Each feature vector may have any size, including in the range of 100 to tens of thousands. The total size of the large set may be very high, including over 100,000 or over a million features.

In one example, block 602 may receive features of generic images as described in connection with blocks 303 and 304. In another example, block 602 may determine the features of generic images. The features of the generic images may be initial, base features, E-SVM features or RE-SVM features. In another example, the features may be E-SVM or RE-SVM features derived from local descriptors. Block 602 may provide the features of the generic images to block 603.

Block 603 may perform similarity clustering for the generic features from block 602. In one example, block 603 may perform cosine-similarity clustering. In one example, block 603 may perform similarity clustering in accordance with the principles described by Equation Nos. (1 1) and (11a) and output the resulting centroids Ck.

In one example, block 603 may perform similarity clustering in accordance with the principles described in connection with Fig 6a. In one example, block 603 may determine each centroid by: i) initializing clusters , and ii) updating each centroid c_k and each corresponding cluster , and (iii) repeating (i) and (ii) recursively. Unlike dictionary learning for sparse coding, the present clustering approach includes clustering that is sensitive to the sign of the inner product x^rrc_t (compare with dictionary learning where the cluster assignment is done using the absolute value, e.g., Iz'^Tci (). Block 603 may optionally perform the clustering offline before any image searching is carried out. For example, the processing associated with block 603 may be performed prior to image searching and the results may be stored.

Block 604 may perform the E-SVM function using as generic negatives the centroids Ck from block 603. For example, the centroids ck may be utilized as the generic negative features are typically utilized by an E-SVM function. In one example, block 604 may perform the E- SVM function by further considering auxiliary data for each centroid.

In one example, block 604 may perform an E-SVM function based on the centroids from block 603 as the generic negative features. For example, block 604 may perform an E-SVM function as described in accordance with the principles of block 205 of Fig. 2. In another example, block 604 may perform the RE-SVM determination according to blocks 305, 307, or 309 of Fig. 3.

In one example, the auxiliary data may be represent the number of generic negative features vectors Nk in each cluster k. The variable k, with k=l, ...,K, identifies a cluster number, and the variable K denotes the total number of clusters. Block 604 may determine scaled centroids by multiplying the centroid k by the corresponding Nk . Block 604 may perform an E-SVM function based on the scaled centroids. Block 604 may perform E-SVM determinations in accordance with Equation Nos. (l)-(4) and (12). Fig. 6A illustrates an exemplary method for performing similarity clustering in accordance with present principles. Fig. 6A includes a block 650 that may receive generic negative features. The generic features may be E-SVM, RE-SVM or base features such as VLAD, Bag-of-words or Fisher vector. In one example, the generic negative features may be the features received from block 602 in Figure 6. In one example, the set of generic negative features may have a size N (i.e., there are N generic negative features received by block 651). In one example, block 650 may receive the actual values of each generic negative feature. In another example, block 650 may receive one or more pointers to the values of the generic negative features. Each negative generic feature may be represented by a vector.

Block 651 may initialize centroids in accordance with present principles. In one example, block 651 may initialize a set of centroids c\, CK„ where there are a total number of K centroids. In one example, block 651 may initialize centroids by randomly choosing K generic negative features from the features received in block 650. The value of K is determined empirically and should be smaller than the total number of generic negative features. In one example, block 651 may initialize the centroids based on the K generic negative features. For example, block 651 may loop through the K generic negative features and assign each feature to a corresponding centroid. For example, block 651 may assign a first negative feature to a first centroid ci, a second negative feature to a second centroid c₂, a third negative feature to a third centroid C3, and so on until the Kth negative feature which is assigned to centroid Ck. In another example, the centroids may be initialized to fixed values that do not depend on the generic negatives. For example, each centroid may be assigned a value that is determined by a random number generator. Each centroid may be a vector of the same size as the features stored in the set N.

Block 652 may begin a loop (hereinafter "loop(l)") that will loop and perform the processing of blocks 653-662 using a variable i having a range from one through a number Z. The number Z may signify the number of loop iterations that will be performed to optimize the values of the centroids. In one example, the number Z may be received from an outside source (e.g., may be set by a user) or may be empirically determined. Block 652 may pass control to block 653.

Block 653 may initialize clusters in accordance with present principles. In one example, block 653 may initialize a set of clusters Ci, C₂, ... Ck. The set of clusters may have the same amount of elements (K) as the set of centroids initialized by block 651. Each cluster C may have a one to one association with a corresponding centroid. For example, cluster Ci corresponds to centroid ci, cluster C2 corresponds to centroid c₂, cluster C3 corresponds to centroid C3, ..., cluster Ck corresponds to centroid Ck. Each cluster C is a list designed to point to features associated to the corresponding centroid of that cluster. Block 653 may initialize all the clusters (e.g., clusters Ci, C2, ... Ck) to zero. Block 653 may then pass control to block 654.Block 654 may begin a loop (hereinafter "loop(2)") that will loop through all the generic negative features from block 650 using a variable j having a range from one to the total number of generic negative features N. Block 654 may pass control to block 655. For example, when j=l, loop 1 may loop through generic negative feature zi. Likewise, when j=4, loop 2 may loop through generic negative feature z₄.

Block 655 may begin a loop (hereinafter "loop(3)") that will loop through all the centroids ci, ... Ck and their corresponding clusters Ci, ... Ck using a variable / having a range from one to the total number of centroids K. Block 655 may pass control to block 656.

Block 656 may determine a distance between the feature zj and a centroid ci. In one example, block 656 may determine distance based on the signed inner product or the absolute value of an inner product between the feature zj and the centroid ci. Block 656 may pass control to block 657 when the value of 1 is greater than K. Block 657 may end loop 3. Block 657 may then pass control to block 658.

Block 658 may assign feature zj to a closest centroid. In one example, block 658 may determine the closest centroid to the feature zj based on the distance between the feature zj and all the centroids determined by loop 3. In one example, block 658 may determine the centroid with the highest absolute value of an inner product between feature zi and the centroid. In another example, block 658 may determine the closest centroid based on any other method that compares the distances determined by loop 3. Block 658 may assign the feature zj to the closest centroid by updating the cluster corresponding to that closest centroid. In one example, block

658 may update the corresponding cluster to include an identifier to feature zj.

Block 658 may pass control to block 659 when the value of j is greater than N. Block

659 may end loop 2. Block 659 may pass control to block 660.

Block 660 may begin a loop (hereinafter "loop(4)") that will loop through all the centroids ci, . . . ck and their corresponding clusters Ci, ... Ck using a variable m having a range from one to the total number of centroids K. Block 660 may pass control to block 661.

Block 661 may update the value for the centroid c_m. In one example, block 661 may update the value of the centroid c_m based on the first left-singular vector of the features identified by the corresponding cluster Cm. Block 661 may review the values of the cluster Cm that corresponds to the centroid c_m. The cluster Cm may identify the features clustered with centroid c_m (e.g., the features closest to centroid c_m). Block 661 may create a matrix Mm based on the feature vectors identified by cluster Cm. This matrix may be the correlation matrix for the features assigned to cluster Cm. Block 661 may then determine an updated value for the centroid Cm based on the left-singular vector of the matrix Mm with the strongest singular value. Block 661 may pass control to block 662 when the value of m is greater than K. Block 662 may end loop 4. Block 662 may pass control to block 663 when the value of i is greater than A. Block 663 may further optionally output updated values for all the centroid ci, . . . Ck.

Fig. 6B illustrates a plot of similarity clustering in accordance with present principles. In one example, Fig. 6B illustrates an example of the similarity clustering performed by the method of Fig. 6A or by block 603 of Fig. 6. Fig. 6B illustrates a two dimensional feature space where the generic negative features of size two exist, with each of the two axes corresponding to one of the entries in each feature vector zi Fig. 6B illustrates for a set of 10 generic negative features zi 671, Z2 672, Z3 673, z₄ 674, Z5 675, Z6 676, Z7 677, zs 678, Z9 679, zio 680. Fig. 6B further illustrates three centroids ci 681, a 682, C3 683 and their corresponding weights Ni 691, N2 692, N3 693. In an example, each of the weights Ni, N₂, N3 may represent the number of nearest features of the corresponding centroid ci, a, C3. Each centroid ci, c₂, C3 may be learned in accordance with the principles described in connection with Fig. 6A.

As illustrated in Fig. 6B, each centroid ci, a, C3 may be associated with a respective Cluster 1, Cluster 2 and Cluster 3. Each feature zi is part of a cluster of the closest or nearest centroid. For example, centroid ci 681 may have a cluster of features Z6 676, Z7 677, zs 678. Furthermore, centroid ci 681 may have an associated weight Ni 691. The value of weight Ni may be 3, indicating that there are three features assigned to cluster ci 681. Centroid C2 682 may have a cluster of features zi 671, Z3 673, Z4 674, Z9 679. Furthermore, centroid C2 682 may have an associated weight 2 692. The value of weight N2 may be 4, indicating that there are four features assigned to cluster C2 682. Centroid C3 683 may have a cluster of features Z2 672, zs 675, zio 680. Furthermore, centroid C3 683 may have an associated weight 3 693. The value of weight N3 may be 3, indicating that there are three features assigned to cluster C3 683.

Fig. 7 is a block diagram illustrating an exemplary image processing system 700. The image processing system includes an image processing device 710 and a display 720. In one example, the device 710 and the display 720 may be connected by a physical link. In another example, the device 710 and the display 720 may communicate via a communication link, such as, for example a wireless network, a wired network, a short range communication network or a combination of different communication networks.

The display 720 may allow a user to interact with image processing device 710, including, for example, inputting criteria for performing an image search. The display 720 may also display the output of an image search.

The image processing device 710 includes memory and processor 730 and any other software or hardware that allow the performance of image searcher 740, RE-SVM image feature determinator 750 and accelerated feature determinator 760. The device 710 may be configured to perform any of the features of the blocks depicted in FIG. 2-6a. .

Each of the RE-SVM image searcher 740, RE-SVM image feature determinator 750 and accelerated feature determinator 760 may be executed in whole or in part on image processing device 710. Alternatively, each of the image searcher 740, RE-SVM image feature determinator 750 and accelerated feature determinator 760 may be performed remotely and the results may be communicated to device 710 via a communication link. The dashed boxes may indicate that each of the box processes may be executed on image processing device 710 or may be executed remotely.

Fig. 8 illustrates an example of various image processing devices 801-804 and a server

805. The image processing devices may be smartphones (e.g., device 801), tablets (e.g., device 802), laptops (e.g., device 803), or any other image processing device 804 that includes software and hardware to execute the features of described in the Figures. The image processing devices 801-804 may be similar to the image processing device 710 and the image processing system 700 described in connection with Fig. 7. Image searcher 740, RE-SVM image feature determinator 750 and accelerated feature determinator 760 may be executed on any of the devices 801-804, on server 805, or in a combination of any of the devices 801-804 and server 805.

Fig. 9 represents an exemplary architecture of an apparatus 900 which may be configured to implement methods described in relation with Fig. 1-6A and Equation Nos. 1- 10a. In one example, apparatus 900 represents an apparatus that may be configured to implement the methods according to present principles, including principles described in relation to Figs. l-6a.

Apparatus 900 comprises following elements that are linked together by a data and address bus 901 :

a microprocessor 902 (or CPU), which is, for example, a DSP (or Digital Signal

Processor);

a ROM (or Read Only Memory) 903 ;

a RAM (or Random Access Memory) 904;

- an I/O interface 905 for reception of data to transmit, from an application; and a battery 906 (or other suitable power source).

According to a variant, the battery 906 is external to the apparatus. In each of mentioned memory, the word « register » used in the specification can correspond to area of small capacity (some bits) or to very large area (e.g. a whole program or large amount of received or decoded data). ROM 903 comprises at least a program and parameters. Algorithm of the methods according to the invention is stored in the ROM 903. When switched on, the CPU 902 uploads the program in the RAM and executes the corresponding instructions. RAM 904 comprises, in a register, the program executed by the CPU 902 and uploaded after switch on of the apparatus 900, input data in a register, intermediate data in different states of the method in a register, and other variables used for the execution of the method in a register.

The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or an apparatus), the implementation of features discussed may also be implemented in other forms (for example a program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.

According to a specific example of image processing, the image or picture I is obtained from a source. For example, the source belongs to a set comprising:

a local memory (903 or 904), e.g. a video memory or a RAM (or Random Access Memory), a flash memory, a ROM (or Read Only Memory), a hard disk ;

a storage interface (905), e.g. an interface with a mass storage, a RAM, a flash memory, a ROM, an optical disc or a magnetic support;

a communication interface (905), e.g. a wireline interface (for example a bus interface, a wide area network interface, a local area network interface) or a wireless interface (such as a IEEE 802.1 1 interface or a Bluetooth® interface); and

an image capturing circuit (e.g. a sensor such as, for example, a CCD (or Charge- Coupled Device) or CMOS (or Complementary Metal-Oxide-Semiconductor)).

According to different embodiments, the decoded image I is sent to a destination; specifically, the destination belongs to a set comprising:

a local memory (903 or 904), e.g. a video memory or a RAM, a flash memory, a hard disk ;

- a storage interface (905), e.g. an interface with a mass storage, a RAM, a flash memory, a ROM, an optical disc or a magnetic support;

a communication interface (905), e.g. a wireline interface (for example a bus interface (e.g. USB (or Universal Serial Bus)), a wide area network interface, a local area network interface, a HDMI (High Definition Multimedia Interface) interface) or a wireless interface (such as a IEEE 802.11 interface, Wi-Fi ® or a Bluetooth ® interface); and

a display.

According to different examples, the bitstream BF and/or F are sent to a destination. As an example, one of bitstream F and BF or both bitstreams F and BF are stored in a local or remote memory, e.g. a video memory (904) or a RAM (904), a hard disk (903). In a variant, one or both bitstreams are sent to a storage interface (905), e.g. an interface with a mass storage, a flash memory, ROM, an optical disc or a magnetic support and/or transmitted over a communication interface (905), e.g. an interface to a point to point link, a communication bus, a point to multipoint link or a broadcast network.

According to different examples, the bitstream BF and/or F is obtained from a source. Exemplarily, the bitstream is read from a local memory, e.g. a video memory (904), a RAM (904), a ROM (903), a flash memory (903) or a hard disk (903). In a variant, the bitstream is received from a storage interface (905), e.g. an interface with a mass storage, a RAM, a ROM, a flash memory, an optical disc or a magnetic support and/or received from a communication interface (905), e.g. an interface to a point to point link, a bus, a point to multipoint link or a broadcast network.

According to different examples, apparatus 900 being configured to implement methods in accordance with present principles, belongs to a set comprising:

a mobile device ;

a communication device ;

a game device ;

a tablet (or tablet computer) ;

a laptop ;

a still image camera;

a video camera ;

an encoding chip;

a still image server ; and

a video server (e.g. a broadcast server, a video-on-demand server or a web server).

According to different examples, apparatus 900 being configured to implement an image processing process in accordance with present principles, belongs to a set comprising:

a mobile device ; a communication device ;

a game device ;

a set top box;

a TV set;

a tablet (or tablet computer) ;

a laptop ;

a display and

a decoding chip.

Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications. Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a preprocessor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.

An aspect of present principles may be implemented on any electronic device or combination of electronic devices. For example, the present principles may be implemented on any of variety of devices including a computer, a laptop, a smartphone, a handheld computing system, a remote server, or on any other type of dedicated hardware. Various examples of the present invention are described below with reference to the figures.

In one example, experiments were conducted based on Algorithm 1 shown below, where 1₀<|, = 1 if a < b and 0 otherwise.

Algorithm 1 : E-SVM feature encoding with PECASOS.

Input: x, fif = A, T, f

Set z = x, y = 1

end if

10: Set w_t+i = W(— jjr( AW( — l_?J¾! ^Twc"iS^z)

1 1 end far

12: Algorithm 1 may perform determinations as discussed in connection with Equation Nos. (1) and (3). The normalization in Equation Nos. (2) and (4) is carried out at the end of the algorithm (line 12).

As part of the experiments discussed herein, Algorithm 1 was performed on two publicly available datasets, Holidays and Oxford. The Holidays dataset consists of 1491 images of vacation shots divided into 500 groups of matching images. The second dataset, the Oxford dataset, consists of close to 5000 images of buildings from the city of Oxford. The images are divided into 55 groups of matching images, and a query image is specified for each group. The query image consisted of full images (as opposed to cropped regions). Both datasets include a specific evaluation protocol based on mean Average Precision (mAP) that we use throughout.

Figs. 10-13 illustrate the effect of the RE-SVM encoding parameters on mAP performance on the Holidays dataset. Fig. 10 illustrates the effect of a negative pool size N on the performance of the system. As shown in Fig. 10, the mean Average Precision (mAP) (vertical axis) increases when the size of the negative pool (horizontal axis) is larger. In other experiments, the pool size may be set to N = 60e3. Using larger pools could indeed increase the system performance, but this is at the expense of more encoder memory and longer Stochastic Gradient Descent (SGD ) runtimes.

Fig. 11 illustrates the effect on mAP v. for SGD iterations for two different pool sizes ( N = 10e3 and 60e3). As shown in Fig. 1 1, when the the pool size N equals 60e3, the benefit of increasing the number T of iterations saturates after WOeS' iterations. Due to this observed saturation, T— 100e3 iterations was utilized in other experiments.

Table 1 illustrates runtimes for E-SVM learning. As shown in Table 1, using lOOeS iterations takes 620 ms with a non-optimized implementation using the "C" programming language.

Iterations le3 10e3 100e3

Run-time 10 ms 70 ms 620 ms

Table 1: E-SVM encoding runtime when using a single core running at 2.6 GHz.

Fig. 12 illustrates the use of recursively trained RE-SVMs. Fig. 12 displays mAP on the vertical axis and number of RE-SVM recursions on the horizontal axis. As shown in Fig. 12, the RE-SVM results are determined based on r recursions (referred to as RE-SVM-r), for r = 0. 6, where r = 0 refers to the baseline results obtained with the VLAD encoder. As shown in Fig. 12, a single RE-SVM recursion produces a gain of close to 4 mAP points relative to the VLAD encoder. As shown in Fig. 12, using 6 recursions produces a gain of close to 5 mAP points.

Fig. 13 illustrates the effect of varying the exemplar sampling rate during the SGD optimization process. Note that any value between f_P =\

results in an improvement relative to the baseline VLAD encoder.

Figs. 14 and 15 illustrate the robustness of the RE-SVM methods in accordance with present principles. In particular they demonstrate the robustness relative to a large number of distractor images from Flickr different from those used for the negative pool. The distractor images are encoded in the same manner as the benchmark images using VLAD + RE-SVM-2. The parameters used for both RE-SVM recursions are N = 60e3, T=100e3, and f_P=30. It is noted that the same parameters selected on Holidays also give an important improvement in the Oxford dataset. Moreover, this improvement is constant over the entire range of distractor images considered. For the Holidays dataset, the improvement is in excess of 5 mAP points for the entire range of distractor images. For the Oxford dataset, the improvement is in excess of 10 mAP points likewise for the entire range of distractor images.

Table 2 illustrates the performance of the base VLAD-6 , BoW-1000 and Fisher-64 encodings on the Holidays and Oxford benchmarks, along with the performance of the RE- SVM-1 and RE-SVM-2 features derived from each encoding. As illustrated in Table 2, even a single RE-SVM recursion gives a large boost to all three encodings. The Fisher vector, in particular, performs poorly initially, but gains as many as 35 mAP points (on the Holidays dataset) after two RE-SVM recursions to outperform BoW. As shown in Table 2, the RE-SVM system of present principles also gives an important advantage (3.6 mAP points) when using such CNN-based features as initial features.

Holidays Oxford 5K

VLAD-64 72.7 46.3

VLAD-64 H h RE-SVM- 1 77.5 55.5

VLAD-64 H h RE-SVM-2 78.3 57.5

Fisher-64 18.2 9.27

Fisher-64 + RE-SVM- 1 59.8 27.7

Fisher-64 + RE-SVM-2 63.6 31.5

BoW-1000 38.6 17.0

BoW-1000^■ +- RE-SVM- 1 44.5 20.5

BoW-1000^■ +- RE-SVM-2 49.1 25.5

CNN 64.2 32.2

CNN 68.2 40.6 C + RE-SVM-1 71.3 43.9

CNN + RE-S VM-2 71.8 44.6

Table 2: Results for VLAD, BoW, Fisher and CNN encodings and their RE-SVM-1 and RE- SVM-2 variants.

Table 3 illustrates the present method in a Pascal VOC image classification task when using either Nearest Class Mean (NCM) or A'-Nearest Neighbors (ii-NN) classifiers. As shown in Table 3, there are improvements of close to 4 mAP points for the NCM classifier and up to 2 mAP points for the A^'-NN classifier. The NCM and ii-NN classifiers have the important advantage that new classifiers can be added at near zero cost, contrary to one-vs-rest approaches using linear classifiers that require that the classifiers for all classes be updated periodically when adding new classes. NCM in particular further enjoys a very low testing cost.

Classifier CNN + RE-SVM-1

NCM 5 L8 55.5

A^'-NN 3 60.7 62.2

Ji'-NN 5 65.7 66.5

ft^'-NN 10 68.9 69.8

Table 3: Results (mAP) on the Pascal VOC image classification task when using CNN as a base feature.

Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette ("CD"), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc), a random access memory ("RAM"), or a read-only memory ("ROM"). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor- readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.

As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described example, or to carry as data the actual syntax-values written by a described example. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.

A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application.

Numerous specific details have been set forth herein to provide a thorough understanding of the present invention. It will be understood by those skilled in the art, however, that the examples above may be practiced without these specific details. In other instances, well- known operations, components and circuits have not been described in detail so as not to obscure the present invention. It can be appreciated that the specific structural and functional details disclosed herein may be representative and do not necessarily limit the scope of the present invention. Various examples of the present invention may be implemented using hardware elements, software elements, or a combination of both. Some examples may be implemented, for example, using a computer-readable medium or article which may store an instruction or a set of instructions that, if executed by a machine, may cause the machine to perform a method and/or operations in accordance with the examples. Such a machine may include, for example, any suitable processing platform, computing platform, computing device, processing device, computing system, processing system, computer, processor, or the like, and may be implemented using any suitable combination of hardware and/or software. The computer- readable medium or article may include, for example, any suitable type of memory unit, memory device, memory article, memory medium, storage device, storage article, storage medium and/or storage unit. The instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, encrypted code, and the like, implemented using any suitable high-level, low-level, object- oriented, visual, compiled and/or interpreted programming language.

The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program). An apparatus and constituents included therein, for example, a processor, an encoder and a decoder, may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.

Additionally, this application or its claims may refer to "determining" various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.

Further, this application or its claims may refer to "accessing" various pieces of information. Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.

Additionally, this application or its claims may refer to "receiving" various pieces of information. Receiving is, as with "accessing", intended to be a broad term. Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory). Further, "receiving" is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.

Claims

1. A method (300) for image searching, comprising:

determining at least an RE-SVM feature (310) based on a recursive exemplar support vector machine feature of an order n (RE-SVM_n);

performing image searching by applying the determined RE-SVM feature to a plurality of features of search images;

wherein the RE-SVMn feature is determined based on at least an initial positive feature (302) and at least an initial negative feature (304) when n equals one, and

wherein the RE-SVMn feature is determined based on at least a positive RE-SVMn-i

(305, 307, 309) feature and at least a negative N-1 RE-SVM feature (306, 308) when n is greater than one.

2. An apparatus (700, 900) for image searching, comprising:

a processor (730, 902) configured to determine at least an RE-SVM feature (310) based on a recursive exemplar support vector machine feature of an order n (RE-SVM_n), the processor further configured to perform image searching by applying the determined RE-SVM feature to a plurality of features of search images;

wherein the RE-SVMn feature is determined based on at least a positive RE-SVMn-i (305, 307, 309) feature and at least a negative RE-SVMn-i feature (306, 308) when n is greater than one.

3. The method of claim 1 or the apparatus of claim 2, wherein the RE-SVM feature represents at least a query image and wherein the image search is an image retrieval search.

4. The method of claim 1 or the apparatus of claim 2, wherein the RE-SVM feature represents at least a positive image and wherein the image search is a semantic search that includes determining at least a second RE-SVM feature representing at least a negative image.

5. The method of claim 1, apparatus of claim 2, or method or apparatus of claims 3 or 4, wherein the features of the search images are RE-SVMn features.

6. The method of claim 1 or the apparatus of claim 2, wherein the RE-SVM feature is determined based on n-iterations of performing an E-SVM function based on the at least a positive RE-SVM_n-i feature and the at least a negative RE- SVM_n-i feature.

7. The method of claim 1 or the apparatus of claim 2, wherein the at least a negative RE-SVMn-i feature are determined by performing an E-SVM function based on a corresponding RE-SVMn-2 feature and at least another RE-SVM_{n 2} feature.

8. The method of claim 1 or the apparatus of claim 2, wherein the at least a negative

RE-SVMn-i feature is determined offline.

9. The method of claim 1 or the apparatus of claim 2, further comprising performing image searching based on the RE-SVM_n feature.

10. The method or apparatus of claim 7, wherein the image searching includes determining a classifier based on the positive RE-S VM_n feature and negative RE-SVM_n feature.

1 1 . The method or apparatus of claim 7, wherein a score is determined for each search image when the classifier is applied to that search image.

12. The method or apparatus of claim 11, wherein the score is determined by applying the classifier to RE-SVM features of the search images.

13. The method of claim 1 or the apparatus of claim 2, further comprising determining a local descriptor for the RE-SVM_n feature.

14. The method or apparatus of claim 13, further comprising performing image searching based on an aggregation of local descriptors.