
WO2016075274A1 - Methods, systems and apparatus for image recognition based on recursively determined exemplar-support vector machines (e-svm) features - Google Patents


Info

Publication number
WO2016075274A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
svm
features
block
image
Prior art date
Application number
PCT/EP2015/076522
Other languages
French (fr)
Inventor
Joaquin ZEPEDA SALVATIERRA
Patrick Perez
Original Assignee
Thomson Licensing
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Thomson Licensing filed Critical Thomson Licensing
Publication of WO2016075274A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Definitions

  • the present disclosure relates to image recognition or searching. More precisely, the present disclosure relates to image searching based on Exemplar Support Vector Machines (E-SVM).
  • An image search is a search for an image or images.
  • the input to the image search may be an image or images.
  • Image searching is relevant in many fields, including computer vision, object recognition, video tracking, and video-based location determination/mapping.
  • Fig. 1 illustrates a method 100 for performing image searching.
  • the method 100 includes receiving query image(s) 110 that will be the basis of the image search; extracting image features 120 from the query image(s); and performing image searching 130 by using the image features from block 120.
  • Block 130 may perform image searching based on any technique, including image retrieval and semantic searching techniques.
  • Fig. 1A illustrates an example of an image retrieval search technique.
  • An image retrieval search generally locates images that have the same scene as a query image. The located same-scene images may even have scene alterations resulting from task-related transformation(s) (e.g., changes in scene illumination, image cropping, scaling, wide changes in the perspective of the camera, high compression ratios, or picture-of-video-screen artifacts).
  • Query image features may be received at block 141.
  • Block 142 may perform image retrieval based on image features received from block 141 and image features received from a large database of features 145.
  • Block 142 may compare the image features received from block 141 with the image features received from the large database of features 145.
  • the large database of features 145 may consist of image features extracted from images stored in a database of search images 143.
  • Block 144 may extract features of each image in the database of search images 143.
  • the large database of features 145 may store the results determined by block 144.
  • Blocks 143-145 may be computed offline or prior to the execution of blocks 141, 142, and 146.
  • block 142 performs image retrieval based on a distance between the features from block 141 and the features of the large database from block 145. For example, block 142 may calculate the Euclidean distance between a query image feature from block 141 and each of the features from block 145. Block 142 may choose the images from database 143 that correspond to the smallest distances. Block 142 may provide the search results to block 146. Block 146 may output the search result(s). If multiple results are returned, the results may be ranked and the ranking may be provided along with the results.
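The distance-based ranking performed by blocks 142 and 146 can be sketched as follows; the function name and the toy feature values are illustrative, not part of the disclosure:

```python
import numpy as np

def retrieve(query_feat, db_feats, top_k=3):
    """Rank database images by Euclidean distance to the query feature
    (smaller distance = better match), as in block 142."""
    dists = np.linalg.norm(db_feats - query_feat, axis=1)
    order = np.argsort(dists)[:top_k]      # indices of the closest images
    return order, dists[order]

# toy 4-image database of 3-D features
db = np.array([[1.0, 0.0, 0.0],
               [0.9, 0.1, 0.0],
               [0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0]])
idx, d = retrieve(np.array([1.0, 0.0, 0.0]), db, top_k=2)
```

The returned ranking can then be output alongside the results, as block 146 describes.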
  • Fig. IB illustrates an example of a semantic search.
  • the semantic search searches images for visual concepts representing a search query (e.g., cats).
  • Block 151 receives features of positive images (i.e., features of images that contain the queried visual concept (e.g., cats)).
  • Block 152 receives features of negative images (i.e., features of images that do not contain the queried visual concept (e.g., no cats, instead contain dogs, other animals or no animals)).
  • Block 153 may perform classifier learning to determine a classifier.
  • Block 153 performs classifier learning based on the positive features from block 151 and the negative features from block 152.
  • Block 153 may choose a classifier that best differentiates the positive features from the negative features.
  • the classifier learning may be an SVM algorithm. Alternatively, the classifier learning may be other learning techniques.
  • Block 155 may perform image searching based on the classifier from block 153 and the features from the large database 154.
  • the large database of features 154 may consist of feature vectors of images that will be searched. For example, the large database of features 154 may be determined in accordance with the principles described in connection with block 145 of Fig. 1A.
  • Block 155 may apply the classifier from block 153 to each of the features from block 154.
  • Block 156 may output the result(s) from block 155. If multiple results are returned, the results may be ranked and the ranking may be provided along with the results.
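The semantic-search pipeline of blocks 151 through 156 can be sketched with a small linear classifier; the subgradient training loop is a stand-in for a full SVM solver, and the toy 2-D features and all names are illustrative:

```python
import numpy as np

def train_linear_classifier(X, y, C=1.0, steps=500, lr=0.01):
    """Block 153: learn a linear classifier by full-batch subgradient
    descent on an l2-regularized hinge loss (a stand-in for an SVM)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        margins = y * (X @ w + b)
        active = margins < 1.0                       # examples inside the margin
        gw = w - C * (y[active, None] * X[active]).sum(axis=0)
        gb = -C * y[active].sum()
        w, b = w - lr * gw, b - lr * gb
    return w, b

# toy 2-D features: positives near (1, 1), negatives near (-1, -1)
X = np.array([[1.0, 1.0], [0.9, 1.1], [1.1, 0.8],
              [-1.0, -1.0], [-0.9, -1.2], [-1.1, -0.8]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
w, b = train_linear_classifier(X, y)

db = np.array([[1.0, 0.9], [-1.0, -1.1], [0.8, 1.2]])  # block 154: database features
scores = db @ w + b                                    # block 155: apply classifier
ranking = np.argsort(-scores)                          # block 156: best match first
```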
  • An E-SVM is a classifier that is a linear Support Vector Machine learned from a single positive example, referred to as the exemplar.
  • the exemplar can be determined based on the positive example and a set of generic, negative examples N.
  • Malisiewicz et al, Exemplar-SVMs for Visual Object Detection, Label Transfer and Image Retrieval, International Conference of Machine Learning, 2012 discuss E-SVMs.
  • Present E-SVM search techniques require an inefficiently large set of generic, negative examples N (e.g., a set of one million feature vectors). The large set is inefficient during data computations. Also, present E-SVM search techniques are limited to using an E-SVM representation as a classifier.
  • Hard-negative mining is a process of (i) selecting a subset of the generic negative features N (hereinafter referred to as the "hard-negative cache"); (ii) training a classifier based on the hard-negative cache; and (iii) growing the hard-negative cache with items from N that have a classification score inside the margin (i.e., greater than -1).
  • E-SVM searching based on hard-negative mining is still a highly complex and inefficient process because it still relies on a large set of generic negatives.
  • E-SVM techniques also have limited generalization because, despite the size of the negative set, they utilize a single positive exemplar.
  • Present E-SVM determinations rely on asymmetric determinations.
  • Image retrieval using the asymmetric approach has many shortcomings, including sub-par performance relative to features tailored for the image retrieval task.
  • the asymmetric approach has not been considered for semantic search because E-SVMs are tailored for data-constrained scenarios, and many positives are generally available in semantic search.
  • the asymmetric approach relies on a very large generic negatives pool, and suffers from limits in generalization that must be overcome by deriving extra positives from the original exemplar.
  • the method may include determining at least an RE-SVM feature based on a recursive exemplar support vector machine feature of an order n (RE-SVM n ) and performing image searching by applying the determined RE-SVM feature to a plurality of features of search images.
  • the apparatus may include a processor.
  • the processor may be configured to determine at least an RE-SVM feature based on a recursive exemplar support vector machine feature of an order n (RE-SVMn).
  • the processor may further be configured to perform image searching by applying the determined RE-SVM feature to a plurality of features of search images.
  • the RE-SVMn feature is determined based on at least an initial positive feature and at least an initial negative feature when n equals one.
  • the RE-SVMn feature is determined based on at least a positive RE-SVMn-1 feature and at least a negative RE-SVMn-1 feature when n is greater than one.
  • the RE-SVM feature may represent at least a query image when the image search is an image retrieval search.
  • the RE-SVM feature may represent at least a positive image when the image search is a semantic search.
  • the semantic search may include determining at least a second RE-SVM feature representing at least a negative image.
  • the features of the search images may be RE-SVMn features.
  • the RE-SVMn feature may be determined based on n iterations of performing an E-SVM function based on the at least a positive RE-SVMn-1 feature and the at least a negative RE-SVMn-1 feature.
  • the at least a negative RE-SVMn-1 feature may be determined by performing an E-SVM function based on a corresponding RE-SVMn-2 feature and at least another RE-SVMn-2 feature.
  • the at least a negative RE-SVMn-1 feature may be determined offline.
  • the image searching may be performed based on the RE-SVMn feature.
  • the image searching may include determining a classifier based on the positive RE-SVMn feature and negative RE-SVMn feature.
  • a score may be determined for each search image when the classifier is applied to that search image. The score may be determined by applying the classifier to RE-SVM features of the search images.
  • a local descriptor may be determined for the RE-SVMn feature.
  • the image search may be performed based on an aggregation of local descriptors.
  • FIG. 1 illustrates a flow diagram of a method for image searching.
  • FIG. 1 A illustrates a flow diagram of a method for an image retrieval search.
  • FIG. IB illustrates a flow diagram of a method for a semantic search.
  • FIG. 1C illustrates an example of a graphical representation of an E-SVM feature.
  • FIG. 2 illustrates a flow chart of an exemplary method for determining E-SVM feature(s).
  • FIG. 3 illustrates a flow chart of an exemplary method for determining an RE-SVM feature(s).
  • FIG. 4 illustrates a flow chart of an exemplary method for semantic search using RE- SVM features.
  • FIG. 5 illustrates a flow chart of an exemplary method for extracting RE-SVM local descriptors.
  • FIG. 6 illustrates a flow chart of an exemplary method for accelerated E-SVM determinations.
  • FIG. 6A illustrates a flow chart of an exemplary method for centroid determination.
  • FIG. 6B illustrates an exemplary diagram of a plot of similarity clustering.
  • FIG. 7 illustrates a block diagram of an exemplary image processing device.
  • FIG. 8 illustrates a block diagram of an exemplary distributed image processing system.
  • FIG. 9 illustrates an exemplary apparatus.
  • FIG. 10 illustrates an exemplary plot diagram of experimental results.
  • FIG. 11 illustrates an exemplary plot diagram of experimental results.
  • FIG. 12 illustrates an exemplary plot diagram of experimental results.
  • FIG. 13 illustrates an exemplary plot diagram of experimental results.
  • FIG. 14 illustrates an exemplary plot diagram of experimental results.
  • FIG. 15 illustrates an exemplary plot diagram of experimental results.
  • an "E-SVM function” refers to a determination of linear classifiers based on one or more positive and negative examples.
  • the E-SVM function may perform a learning determination based on the positive examples of a class (i.e., examples that represent the class; for example, for the class "cats", the set of examples may consist of images of cats) and negative examples (i.e., examples that do not represent a class).
  • a learning algorithm may choose a function that best differentiates positive examples from negative examples.
  • E-SVM feature refers to the output of the E-SVM function.
  • an "RE-SVM function” refers to a recursive framework for determining RE-SVM features in accordance with the principles herein.
  • the recursive framework may determine an initial set of E-SVM features based on image feature(s) of a query image and image features of generic negative images.
  • the recursive framework may determine an initial RE-SVM feature by applying an E-SVM function to E-SVM features of query image(s) and E- SVM features of generic negative images.
  • the recursive framework may then recursively determine RE-SVM features by repetitively applying the E-SVM function based on a prior repetition's determination of RE-SVM query features and features of generic negative images.
  • an "RE-SVM feature” refers to the output of the RE-SVM function.
  • the scalars, vectors and matrices may be denoted using, respectively standard, bold, and uppercase-bold typeface (e.g., scalar a, vector a and matrix A).
  • the notation vk may denote a vector from a sequence v1, v2, ..., vN, and v^k may denote the k-th coefficient of vector v.
  • the notation [ak]k (respectively, [αk]k) may denote concatenation of the vectors ak (scalars αk) to form a single column vector.
  • An aspect of present principles is directed to mapping E-SVM features to RE-SVM feature(s) based on an E-SVM function.
  • An aspect of present principles is directed to determining RE-SVM feature(s) during a post-processing stage.
  • enhancement of image searching and image classification may occur based on E-SVM features obtained by concatenating the activations of Deep Convolutional Neural Networks (DCNNs).
  • An aspect of present principles is directed to automatically determining an RE-SVM feature that contains unique information of a query image relative to information of generic, negative images. The automatic extraction of the unique information is more efficient than feature extractors that rely on manual encoding of uniqueness into a feature extraction process.
  • An aspect of present principles is directed to image searching based on symmetrically determined E-SVM features.
  • symmetric determinations or "symmetrically" determined refers to image searching based on E-SVM features of query images (i.e., positive images), E-SVM features of generic negative images, and E-SVM features of images that will be searched. These symmetric determinations improve robustness and efficiency because the E-SVM function has been applied to all of the images, and this function determines, from the feature representing each image, a representation that separates the image from a large generic negative pool of features.
  • the image feature encoder may be a global encoder, may be based on aggregated local features, or may be derived from CNN-based features.
  • the features may be denoted by vectors in a space R^D.
  • the parameter R^D may represent the space of all possible vectors having D real-valued entries.
  • An aspect of present principles provides a novel solution to the problem of complexity of generic negative image sets with a large amount of images.
  • An aspect of present principles reduces complexity based on clustering of the generic negative features.
  • the clustering may consist of deriving centroids.
  • Each centroid may represent a subset of the large set of generic negative images.
  • Each centroid may be associated to a subset of generic negative features that are closest to that centroid. That is, the generic negative features of a subset are the generic features that are closer to that centroid than to any other centroid.
  • the distance may be determined by maximizing the inner-product similarities between the generic negative features of the subset and the centroid.
  • An E-SVM feature in accordance with present principles may be determined based on an optimization determined as follows: (u, b) = argmin over (w', b) of λ/2 ‖w'‖² + α1 max(0, 1 − xᵀw' − b) + α2 Σ over zi ∈ N of max(0, 1 + ziᵀw' + b). (Equation No. (1))
  • the parameters λ, α1, and α2 may have positive, scalar values.
  • the parameters λ, α1, and α2 may control regularization and the relative weight of the positive and negative features. Each of the parameters λ, α1, and α2 can be manually chosen or can be automatically determined.
  • the parameter x may be a feature vector that relates to a query image, a generic negative image, or a search image. In one example, feature vector x may be determined using an encoder such as a VLAD, Fisher Vector, CNN activation coefficients, or another E-SVM feature.
  • the parameter N may represent a set of generic negative feature vectors (e.g., of a size N). The parameter w' is the parameter that is being optimized under Equation No. (1).
  • the parameter zi is a negative feature that is part of the set N of negative features.
  • the set of generic feature vectors N may be a large set of vectors.
  • the optimization of Equation No. (1) may determine the best (i.e., the optimal) vector w.
  • the optimization may be based on stochastic gradient descent.
  • the stochastic gradient descent may be determined using a Pegasos algorithm.
  • the stochastic gradient descent may process one feature vector (the positive x or a negative zi) at each step.
  • the feature vectors x and zi may be processed in a random order.
  • the parameters α1 and α2 may be determined during the stochastic gradient descent algorithm.
  • the optimization of Equation No. (1) may be an SVM problem relying on hinge loss, where the amount of features in the positive set is unbalanced relative to the amount of features in the negative set. For example, there may be one positive feature and a million negative features.
  • an E-SVM feature may be determined based on an ℓ2 normalization of the result of Equation No. (1).
  • the normalized E-SVM feature may be determined as follows: w(x, N) = u / ‖u‖2, where u is the solution of Equation No. (1). (Equation No. (2))
  • the parameters x and N may have the same values as the parameters x and N of Equation No. (1).
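The stochastic optimization of Equation No. (1) can be sketched with a Pegasos-style subgradient loop; the replication of the single positive so that both classes appear in the sampled stream is an illustrative balancing choice, as are the function name and hyperparameter values (no bias term, for brevity):

```python
import numpy as np

def pegasos_esvm(x, negatives, lam=0.1, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent for the unbalanced
    E-SVM problem: one positive feature x against many negatives."""
    rng = np.random.default_rng(seed)
    # labeled stream; the single positive is replicated to balance it
    data = [(x, 1.0)] * len(negatives) + [(z, -1.0) for z in negatives]
    w, t = np.zeros_like(x), 0
    for _ in range(epochs):
        for i in rng.permutation(len(data)):   # features in random order
            v, y = data[i]
            t += 1
            eta = 1.0 / (lam * t)              # Pegasos step-size schedule
            w *= (1.0 - eta * lam)             # shrinkage from the l2 regularizer
            if y * (w @ v) < 1.0:              # hinge active for this sample
                w += eta * y * v
    return w
```

The returned vector u = w would then be ℓ2-normalized as in Equation No. (2).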
  • an E-SVM feature in accordance with present principles may also be determined based on an optimization determined as follows: (w', b) = argmin over (w', b) of ½ ‖w'‖² + C1 max(0, 1 − xᵀw' − b) + C2 Σ over z ∈ N of max(0, 1 + zᵀw' + b). (Equation No. (3))
  • the parameters x and N may have the same values as the parameters x and N of Equation No. (1).
  • the parameter z may denote a negative feature in a set of negative features N.
  • the parameters C1 and C2 are regularization parameters. In one example, the parameters C1 and C2 may be customized empirically.
  • the parameters w' and b denote the parameters over which there is optimization or minimization.
  • an E-SVM feature may be determined based on an ℓ2 normalization of the result of Equation No. (3) as follows: w = w' / ‖w'‖2. (Equation No. (4))
  • In addition to ℓ2 normalization, other types of normalization may be used, such as ℓ1 normalization, or other types of post-processing may be used, such as zeroing of negative coefficients or taking signed square roots of coefficients.
  • An aspect of present principles is directed to determining RE-SVM feature(s) based on E-SVM features, including E-SVM features determined in accordance with the principles described in connection with Equation Nos. (l)-(4).
  • the RE-SVM feature(s) may be determined based on an E-SVM function that utilizes previously determined E-SVM features of query, training and/or search images.
  • a feature encoder may encode E-SVM features instead of the image feature representations. For example, if x is a feature vector encoded by a feature encoder, the feature encoder may instead encode an E-SVM feature (e.g., w(x, N)).
  • An aspect of present principles is directed to E-SVM functions based on symmetrically determined E-SVM features.
  • the symmetric approach includes determining E-SVM features for a query image x°.
  • the symmetric approach further includes determining corresponding E-SVM features for each image x in a database D.
  • the database D may consist of images that will be searched.
  • the E-SVM feature w° for a query image x° may be represented by w° = w(x°, N).
  • the set of E-SVM features wi corresponding with images xi in a database D of search images may be represented by wi = w(xi, N).
  • the search images xi or the corresponding E-SVM features wi may be sorted based on their similarity to the query image.
  • similarity may be measured between an E-SVM feature w° derived from a query image feature x° and an E-SVM feature wj derived from a database image feature xj as follows: sj = wjᵀw°.
  • the parameter wj represents the E-SVM feature of a database image xj.
  • the parameter w° represents the E-SVM feature of the query image.
  • the parameter sj may represent a similarity score of the image feature of the query image and the j-th search image. Because each feature w is uniquely associated with a single image, the parameter sj may also represent a similarity score between the query image and the database image xj. A high similarity score may indicate a high similarity between the images, whereas a low similarity score may indicate a low similarity between the images.
  • the score may be determined based on the inner product between wj and w°. In another example, the score may be determined based on a negated Euclidean distance (e.g., −‖wj − w°‖2).
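With ℓ2-normalized E-SVM features on both sides, scoring a database reduces to inner products followed by a sort; a toy sketch with illustrative values:

```python
import numpy as np

# symmetric similarity: both the query and the database images are
# represented by l2-normalized E-SVM features, and sj is their inner product
w_query = np.array([0.8, 0.6])
W_db = np.array([[0.8, 0.6],     # same scene as the query
                 [0.6, 0.8],     # similar scene
                 [-0.6, 0.8]])   # unrelated scene
s = W_db @ w_query               # sj = wj^T w°
ranking = np.argsort(-s)         # most similar first
```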
  • Image searching based on E-SVM or RE-SVM features includes learning a classifier.
  • a classifier based on E-SVM or RE-SVM features may be learned based on an optimization as follows: (v, b) = argmin over (v, b) of ½ ‖v‖² + (C/N) Σ over i of max(0, 1 − yi(vᵀwi + b)).
  • the annotated training set {(wi, yi)} may consist of E-SVM features wi and labels yi.
  • the vector v represents a classifier.
  • the parameter C represents a scalar chosen empirically and controlling the regularization.
  • the parameter N represents the size of the training set.
  • the parameters wi represent the E-SVM features of the training set images, and b represents the bias or offset between the origin and the hyperplane orthogonal to v.
  • E-SVM or RE-SVM features will be determined for the query image, the negative images and the search images.
  • the image classifier v is determined based on the training (positive and negative) images.
  • the image classifier v is applied to a database of searched images that are also represented by E-SVM or RE-SVM features.
  • a classifier may be learned from the RE-SVM representations of training images.
  • the training image(s) may be utilized to determine a K-nearest- neighbor classifier, or a nearest-class-mean classifier.
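A nearest-class-mean classifier over (RE-)SVM features, as mentioned above, can be sketched as follows; the features and class names are illustrative:

```python
import numpy as np

def nearest_class_mean(train_feats, labels, query):
    """Nearest-class-mean classifier: assign the query to the class whose
    (normalized) mean training feature has the largest inner product with
    the query feature."""
    classes = sorted(set(labels))
    means = []
    for c in classes:
        m = np.mean([f for f, l in zip(train_feats, labels) if l == c], axis=0)
        means.append(m / np.linalg.norm(m))
    scores = [query @ m for m in means]
    return classes[int(np.argmax(scores))]

# toy l2-normalized training features for two visual concepts
feats = [np.array([1.0, 0.0]), np.array([0.9, 0.1]),
         np.array([0.0, 1.0]), np.array([0.1, 0.9])]
label = nearest_class_mean(feats, ["cat", "cat", "dog", "dog"],
                           np.array([0.95, 0.05]))
```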
  • the RE-SVM features may be determined by recursively applying an E-SVM function to E-SVM or RE-SVM features.
  • initial E-SVM features may be represented by:
  • the parameter w° may represent the initial feature of an image.
  • the image can be any image, including a query, database, or training image.
  • the parameter N° may represent the initial feature representations of the generic, negative images.
  • a first RE-SVM (RE-SVMi) feature is determined based on an E-SVM function applied to initial features.
  • a j-th recursion RE-SVM (RE-SVMj) feature may be determined as follows: w^j = w(w^(j-1), N^(j-1)), where w^(j-1) is the previous recursion's feature of the image and N^(j-1) denotes the RE-SVMj-1 features of the generic negative images.
  • the value of j may be equal to or greater than 1.
  • the j-th RE-SVM (or the j-th E-SVM) may be determined in accordance with the principles described with Equation Nos. (1)-(4).
  • the E-SVM function in Equation Nos. (10) and (10a) may be derived according to Equation Nos. (l)-(4).
  • the RE-SVM feature construction may be similar to deep architectures that use an output of a given layer as the input of a subsequent layer.
  • unlike deep architectures, an aspect of present principles determines E-SVM or RE-SVM features using an unsupervised learning routine on a per-image basis.
  • unsupervised may indicate automatic, machine learning without user annotations to a training set.
  • each j-th RE-SVM feature (RE-SVMj) may be determined by a single, non-linear, and/or convex problem, as opposed to a standard tandem linear/non-linear arrangement.
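The recursion of Equation Nos. (10) and (10a) can be sketched as follows; the toy E-SVM solver and the choice of negatives used when re-encoding each generic negative (each one against the remaining negatives, one reading of how the negative RE-SVMn-1 features arise) are illustrative assumptions:

```python
import numpy as np

def esvm(x, negatives, lam=0.01, alpha=10.0, steps=200, lr=0.05):
    """Toy E-SVM solver: l2-regularized hinge loss with one positive x and
    a pool of negatives, followed by l2 normalization (in the spirit of
    Equation Nos. (1)-(2); no bias term, for brevity)."""
    w = np.zeros_like(x)
    for _ in range(steps):
        g = lam * w
        if 1.0 - w @ x > 0.0:            # positive hinge active
            g -= alpha * x
        for z in negatives:
            if 1.0 + w @ z > 0.0:        # negative hinge active
                g += z
        w -= lr * g
    return w / (np.linalg.norm(w) + 1e-12)

def resvm(x0, negs0, depth):
    """RE-SVM_n: at each level, re-encode the image feature against the
    current negatives, and re-encode each negative against the remaining
    negatives of the same level."""
    x, negs = np.asarray(x0), [np.asarray(z) for z in negs0]
    for _ in range(depth):
        new_x = esvm(x, negs)
        negs = [esvm(z, [n for n in negs if n is not z]) for z in negs]
        x = new_x
    return x
```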
  • local descriptors may be extracted from images.
  • the local descriptors may be extracted for the query image, the negative images, the database images that will be searched, or any other images that are used to learn classifiers.
  • the descriptors may be aggregated based on a descriptor aggregating method, such as VLAD, the Fisher Vector, or Bag-of-Words to produce a single feature vector.
  • Because E-SVMs are agnostic to the underlying feature representation, it is possible to compute an E-SVM from each local descriptor by using an adequate set N of generic local descriptors, and further to do so recursively as in Equation Nos. (10) and (10a).
  • the resulting RE-SVM local descriptors can subsequently be used to build aggregated features using the standard methods previously mentioned, such as VLAD, the Fisher Vector or Bag-of-Words.
  • the subsequent features can be further processed using RE-SVM learning functions as discussed in connection with Equation Nos. (10) and (10a).
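The per-descriptor pipeline can be sketched as follows; the one-step E-SVM surrogate and the sum-pooling aggregation are crude stand-ins for a real E-SVM solver and for VLAD/Fisher/Bag-of-Words encoders, used only to show the data flow:

```python
import numpy as np

def aggregate_resvm_descriptors(local_descs, generic_local_descs):
    """Re-encode each local descriptor against a generic pool of local
    descriptors, then aggregate by sum pooling and l2 normalization."""
    def esvm_1step(d, pool):
        # crude one-step surrogate: push d away from the pool mean
        w = d - np.mean(pool, axis=0)
        return w / (np.linalg.norm(w) + 1e-12)
    encoded = [esvm_1step(d, generic_local_descs) for d in local_descs]
    agg = np.sum(encoded, axis=0)            # stand-in for VLAD/FV/BoW pooling
    return agg / (np.linalg.norm(agg) + 1e-12)

locals_ = [np.array([1.0, 0.2]), np.array([0.8, 0.4])]   # toy local descriptors
pool = np.array([[0.0, 1.0], [0.1, 0.9]])                # toy generic pool
out = aggregate_resvm_descriptors(locals_, pool)
```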
  • An aspect of present principles is directed to clustering generic, negative image features to reduce complexity.
  • computational complexity of E-SVM feature determinations is decreased despite the existence of a large amount of generic, negative images.
  • the amount of generic negative image features is reduced based on corresponding centroid features. This reduction results in reduced storage and complexity.
  • this decrease is achieved by deriving a smaller amount of representative centroids from a large amount of generic, negative features.
  • a centroid may refer to a vector that represents a subset of features.
  • the centroid may represent the mean of the subset, the medoid of the subset (i.e., the subset feature closest to the mean) or a singular vector or eigenvector derived from the subset.
  • the centroids may replace the large amount of generic features, thereby resulting in a smaller training set for an E-SVM function, and hence a faster learning time.
  • the number of centroids may need to be chosen empirically, as using too few centroids may result in a non-representative generic negative pool.
  • the centroids may further have associated auxiliary data.
  • the auxiliary data may be a weight that is associated with each centroid.
  • the weight may represent the number of generic negative image features that are part of a corresponding centroid.
  • the auxiliary data may consist of statistics derived from the generic negative image features of the corresponding centroid. The statistics may include the mean and correlation matrix of the generic negative image features of the corresponding centroid.
  • the centroids may be determined based on a clustering approach that is applied to features of generic, negative images.
  • the clustering approach may be applied to multi-modal distribution of generic images' features.
  • a clustering approach may locate the centroids of the multi-modal distribution. The large amount of negative image features may then be replaced by a smaller set of centroids, weighted by their importance.
  • the centroids may be determined based on:
  • the parameters ck and ck' are centroids.
  • the parameter K is the number of learned centroids.
  • the parameter N is the generic negative pool.
  • the parameter Nk is the subset of features from the generic negative pool that corresponds to the centroid ck. The subset of features Nk may be assigned to centroid ck because those features are closer to centroid ck than to any other centroid.
  • the parameters z and z' are feature vectors from the generic negative pool. Each centroid vector ck may be determined based on a first left-singular vector of a matrix derived by stacking the vectors z ∈ Nk as columns.
  • the process for clustering features to determine centroids may include: (i) updating each ck in accordance with Equation No. (11); (ii) updating cluster information with a determination of whether a generic negative feature should be assigned to that cluster; and (iii) repeating (i) and (ii) recursively.
  • the present clustering approach includes clustering that is sensitive to the sign of the inner product zᵀck (compare with dictionary learning, where the cluster assignment is done using the absolute value, e.g., |zᵀck|).
  • the process for clustering or centroid determination may be performed by block 603 of Fig. 6 or blocks 658 and 661 of Fig. 6A. After all the centroids ck are determined, each feature z may be assigned a centroid. In one example, the feature z may be assigned to a centroid based on the distance between the feature z and the centroid.
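The clustering loop can be sketched as a spherical-k-means-style procedure with sign-sensitive assignment and a singular-vector centroid update; the function name, the initialization scheme and the sign-fixing step are illustrative assumptions:

```python
import numpy as np

def cluster_negatives(Z, K, iters=20, seed=0):
    """Cluster generic negative features: assign each feature to the
    centroid maximizing the signed inner product, update each centroid as
    the leading left-singular vector of its members (stacked as columns),
    and keep a weight per centroid equal to the cluster size."""
    rng = np.random.default_rng(seed)
    C = Z[rng.choice(len(Z), K, replace=False)]        # random initialization
    C = C / np.linalg.norm(C, axis=1, keepdims=True)
    for _ in range(iters):
        assign = np.argmax(Z @ C.T, axis=1)            # sign-sensitive assignment
        for k in range(K):
            members = Z[assign == k]
            if len(members) == 0:
                continue
            # leading left-singular vector of the stacked member columns
            u, _, _ = np.linalg.svd(members.T, full_matrices=False)
            c = u[:, 0]
            if c @ members.sum(axis=0) < 0:            # fix the sign ambiguity
                c = -c
            C[k] = c
    weights = np.bincount(assign, minlength=K)         # cluster sizes
    return C, weights

Z = np.array([[1.0, 0.0], [0.99, 0.14], [0.0, 1.0], [0.14, 0.99]])
C, weights = cluster_negatives(Z, K=2)
```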
  • an E-SVM function may be performed based on the centroids of the generic negative images instead of the features of the generic negative images.
  • the E-SVM optimization may then be determined as follows: (u, b) = argmin over (w', b) of λ/2 ‖w'‖² + α1 max(0, 1 − xᵀw' − b) + α2 Σ over k = 1..K of Nk max(0, 1 + ckᵀw' + b). (Equation No. (12))
  • Equation No. (12) corresponds to a modification of Equation No. (1).
  • the parameter ck may correspond to the k-th centroid.
  • the parameter Nk denotes the number of vectors in N assigned to ck.
  • Letting N denote the number of feature vectors inside the generic negatives pool, the computation time required to solve the problem in Equation No. (12) is the same as that required when using only K ≪ N generic negative features.
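Evaluating the objective of Equation No. (12) makes the complexity claim concrete: the negative hinge sum runs over the K weighted centroids rather than the N generic negatives. A sketch with illustrative names (lam, a1, a2 stand for the λ, α1, α2 of Equation No. (12)):

```python
import numpy as np

def eq12_objective(w, b, x, centroids, weights, lam=0.01, a1=10.0, a2=1.0):
    """Objective of Equation No. (12): K weighted centroids stand in for
    the full generic negative pool, so the negative sum has K terms."""
    reg = 0.5 * lam * np.dot(w, w)
    pos = a1 * max(0.0, 1.0 - x @ w - b)
    neg = a2 * sum(nk * max(0.0, 1.0 + c @ w + b)
                   for c, nk in zip(centroids, weights))
    return reg + pos + neg
```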
  • Fig. 2 illustrates a flow chart of a method 200 for extracting E-SVM features.
  • block 201 may receive at least an image.
  • block 201 may modify the image to produce at least a mirrored, flipped, cropped, shifted or scaled version of the original image.
  • block 201 may receive a plurality of images of a queried scene (e.g., a plurality of images of the same scene of the statue of liberty).
  • Block 202 may determine an initial feature vector for the image received at block 201.
  • the initial, base feature vector may be any type of feature vector, including VLAD, BoW, Fisher, and deep CNN activations.
  • the initial feature vector of the image may also be known as the positive feature.
  • Block 203 may receive a set of generic images.
  • the generic images may comprise the generic negative set as discussed in connection with Equation Nos. (1) and (3).
  • the amount of generic negative set images may be large enough to allow for a reasonable computational time on a target CPU (e.g., a per-image E-SVM feature construction time of about 1 second, corresponding to about 100,000 negative images).
  • the content of the generic negative images varies between each negative image.
  • Block 204 may determine initial feature vectors for the generic, negative images received from block 203. For example, block 204 may determine an initial feature vector for each image in the set of generic images. The feature vectors determined by block 204 may be the same type of feature vectors as the feature vector determined by block 202. Block 204 may provide the determined feature vectors, i.e., the generic negative features, to block 205.
  • Block 205 may determine an E-SVM feature 210. Block 205 may determine the E-SVM feature based on the positive feature from block 202 and the generic negative features from block 204.
  • Block 205 may include one or more of blocks 206, 207 and 208.
  • block 205 may combine or omit the functionality described in connection with one or more of blocks 206, 207 and 208. Alternatively, the functionality of blocks 206-208 may be combined and performed by block 205. In one example, block 205 may perform a classifier determination other than the SVM determination. In one example, block 205 may perform an additional post-processing scheme on the determined exemplar feature.
  • Block 206 may perform an SVM determination based on the initial feature from block 202 and the initial generic negative features from block 204.
  • Block 206 may perform an SVM or E-SVM function.
  • Block 206 may determine a vector u and a bias parameter b.
  • the vector u may be orthogonal to a hyper-plane that separates generic negatives from an exemplar feature(s).
  • the bias parameter b may specify a displacement of the hyper-plane along the orthogonal.
  • block 206 may determine the orthogonal vector u and the bias parameter b in accordance with the principles described in connection with Equation Nos. (1) and (3).
  • Block 207 may process a vector u from block 206.
  • Block 207 may normalize the vector u to determine parameter w.
  • Block 207 may optionally post-process the vector u or parameter w.
  • block 207 may perform post-processing such as an ℓ2 normalization (e.g., in accordance with the principles described in connection with Equation Nos. (2) and (4)).
  • the processing may be a square-rooting related to the Hellinger kernel.
  • the processing can be a sequential application of a plurality of post-processing solutions.
  • Block 205 may output an E-SVM feature 210.
  • the E-SVM feature 210 may correspond to the E-SVM feature illustrated in Fig. 1C.
  • the E-SVM feature 210 may be a post-processed version of the E-SVM feature illustrated in Fig. 1C.
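The extraction of blocks 202-207 can be summarized as a linear SVM trained with a single positive (the exemplar) against many generic negatives, followed by the normalization of block 207. The Python sketch below is illustrative only: the full-batch subgradient solver, learning rate `lr`, and regularization constant `lam` are assumptions, not the parameters of the disclosed implementation.

```python
import numpy as np

def esvm_feature(x_pos, Z_neg, lam=1e-2, epochs=50, lr=0.1):
    """Train a linear SVM with one positive (the exemplar) and many generic
    negatives, then L2-normalize the weight vector u to obtain the E-SVM
    feature w (cf. blocks 206-207).  Hyper-parameters are illustrative."""
    d = x_pos.shape[0]
    u, b = np.zeros(d), 0.0
    X = np.vstack([x_pos[None, :], Z_neg])
    y = np.concatenate([[1.0], -np.ones(len(Z_neg))])
    for _ in range(epochs):
        margins = y * (X @ u + b)
        active = margins < 1.0                 # hinge-loss support set
        grad_u = lam * u - (y[active, None] * X[active]).sum(0) / len(X)
        grad_b = -(y[active]).sum() / len(X)
        u -= lr * grad_u
        b -= lr * grad_b
    return u / np.linalg.norm(u)               # E-SVM feature w
```

The returned vector corresponds to the normalized direction orthogonal to the separating hyper-plane; the bias b is dropped after training, as only the direction is kept as a feature.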
  • Fig. 1C illustrates an orthogonal vector u 174 and an E-SVM feature w 170.
  • An SVM algorithm may determine an orthogonal vector u 174 and a bias b parameter 173.
  • the orthogonal vector u 174 may be determined as discussed in connection with blocks 205 and 206 of Fig. 2.
  • the orthogonal vector u 174 may be orthogonal to a separating hyper-plane 172.
  • the orthogonal vector u 174 may be determined based on an exemplar feature x that may represent a queried image or a positive image concept.
  • the orthogonal vector u 174 may be normalized to produce the E-SVM feature w 170.
  • the orthogonal vector u 174 may be normalized as described in connection with block 207 of Fig. 2.
  • the parameter u and/or the E-SVM feature w 170 may be determined based on negative vectors zi 175.
  • negative vectors zi may be determined in accordance with the principles described in connection with block 204 of Fig. 2.
  • the negative vectors zi 175 may be selected from a pool of generic features 176.
  • Fig. 3 illustrates a flow chart of a method 300 for determining RE-SVM feature(s) in accordance with present principles.
  • Block 301 may receive one or more image(s).
  • Block 302 may determine an initial feature for the image from block 301.
  • the initial feature vector of the image may be known as the initial positive feature(s).
  • block 302 may determine the initial feature vector in accordance with the principles described in connection with block 202 of Fig. 2.
  • Block 303 may receive a set of generic images.
  • the generic images may be negative images determined in accordance with the images described in connection with block 203 of Fig. 2.
  • Block 304 may determine initial negative features for the generic, negative images from block 303.
  • Block 304 may determine initial negative features in accordance with the principles described in connection with block 204 of Fig. 2.
  • block 304 may be performed off-line or prior to the performance of method 300.
  • Block 305 may determine a first, positive RE-SVM feature (RE-SVMi) based on the positive feature(s) from block 302 and the negative features from block 304.
  • Block 305 may determine the RE-SVMi feature based on an SVM determination, e.g., by applying an E-SVM function to the positive and negative features.
  • Block 305 may perform an SVM determination as discussed in connection with Fig. 2 (e.g., blocks 205-207) and Fig. 1C.
  • Block 305 may determine the RE-SVMi feature as described in connection with Equation Nos. (1) to (4).
  • Block 306 may determine first negative RE-SVMi features N1 corresponding to the generic image features from block 304.
  • block 306 may determine the first generic negative RE-SVMi features in accordance with the principles described in connection with Equation Nos. (1)-(4), (10) and (10a).
  • block 306 may use the output of block 304 to replace the set of generic features discussed in connection with Equation Nos. (1)-(4), (10) and (10a).
  • block 306 may use the output of block 304 as a generic negative pool N0 and then may use each feature in the pool N0 to determine a corresponding RE-SVM feature while using all or some of the features of the pool N0 as a generic negative pool.
  • the RE-SVM features may be determined as described in connection with Equation Nos. (1)-(4), (10) and (10a).
  • block 306 may be optional and the image features from block 304 may be provided directly to block 307. In one example, block 306 may be performed off-line or prior to the performance of method 300.
  • Block 307 may determine a second, positive RE-SVM feature (RE-SVM2).
  • Block 307 may determine the RE-SVM2 feature based on the first, positive RE-SVMi feature and the first negative RE-SVMi features N1.
  • Block 307 may determine the RE-SVM2 feature based on an SVM determination.
  • Block 307 may perform an SVM determination as discussed in connection with Fig. 2 (e.g., blocks 205-208) and Fig. 1C. Block 307 may determine the RE-SVM2 feature as described in connection with Equation Nos. (1)-(4), (10) and (10a).
  • Block 309 may determine an Nth positive RE-SVMN feature.
  • Block 309 may recursively determine the RE-SVMN feature based on an RE-SVMN-1 positive feature and RE-SVMN-1 negative features.
  • the N-1 RE-SVMN-1 negative features may be determined by block 308.
  • the N-1 RE-SVMN-1 negative features may be determined based on the N-2 RE-SVMN-2 negative features.
  • Each N-1 RE-SVMN-1 negative feature may be determined in accordance with the principles described in connection with Equation Nos. (1)-(4), (10) and (10a).
  • Each N-1 RE-SVMN-1 negative feature is computed by using the corresponding N-2 RE-SVMN-2 negative feature as a positive feature and using all other N-2 RE-SVMN-2 features as negative features.
  • block 308 may be optional and the image features from blocks 304 or 306 may be provided directly to block 307. In one example, block 308 may be performed off-line or prior to the performance of method 300.
  • Block 309 may determine the final, positive RE-SVMN feature 310 based on N iterations of the recursive framework.
  • Block 309 may determine the RE-SVMN feature 310 based on an SVM determination.
  • Block 309 may perform an SVM determination as discussed in connection with Fig. 2 (e.g., blocks 205-208) and Fig. 1C.
  • Block 309 may determine the RE-SVMN feature as described in connection with Equation Nos. (1)-(4), (10) and (10a). The RE-SVM determination may be repeated N times to determine the RE-SVMN feature.
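The recursion of blocks 305-309 can be condensed into the sketch below. It is a simplified, assumed realization: at each level, every negative feature is re-encoded against the remaining negatives (leave-one-out), and the positive feature is re-encoded against the current negative pool; the inner subgradient SVM solver `_esvm` and its hyper-parameters are illustrative, not the disclosed implementation.

```python
import numpy as np

def _esvm(x_pos, Z_neg, lam=1e-2, epochs=50, lr=0.1):
    # Minimal single-exemplar SVM via full-batch subgradient descent,
    # followed by L2 normalization (illustrative stand-in for the E-SVM
    # function of Fig. 2).
    X = np.vstack([x_pos[None, :], Z_neg])
    y = np.concatenate([[1.0], -np.ones(len(Z_neg))])
    u = np.zeros(X.shape[1])
    for _ in range(epochs):
        act = y * (X @ u) < 1.0
        u -= lr * (lam * u - (y[act, None] * X[act]).sum(0) / len(X))
    return u / np.linalg.norm(u)

def resvm(x0, N0, levels=2):
    """Recursive E-SVM feature (sketch of blocks 305-309): at each level,
    re-encode every negative against the other negatives (leave-one-out)
    and the positive against the whole current pool."""
    pos, neg = np.asarray(x0, float), np.asarray(N0, float)
    for _ in range(levels):
        new_neg = np.stack([_esvm(neg[j], np.delete(neg, j, axis=0))
                            for j in range(len(neg))])
        pos = _esvm(pos, neg)       # positive vs current negative pool
        neg = new_neg               # re-encoded pool for the next level
    return pos
```

As the text notes, the leave-one-out re-encoding of the negative pool can be computed off-line, since it does not depend on the query image.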
  • Fig. 4 illustrates an exemplary image search based on RE-SVM features.
  • Block 401 may receive positive images (e.g., one or more query images). The positive images may represent a visual concept (e.g., cat).
  • Block 402 may determine positive RE-SVM features for the positive images from block 401.
  • Block 402 may determine positive RE-SVM features in accordance with the principles described in connection with Equation Nos. (1)-(4) and Fig. 3.
  • Block 403 may receive negative images.
  • the negative images may be counter examples of the visual concept (e.g., images not containing cats).
  • the negative images may be images from a set of generic, negative images.
  • Block 404 may determine negative RE-SVM features for the negative images from block 403.
  • block 404 may determine negative RE-SVM features in accordance with the principles described in connection with Equation Nos. (1)-(4), (10) and (10a) and Fig. 3.
  • Block 405 may learn a classifier based on the negative and positive RE-SVM features received from blocks 402 and 404, respectively.
  • the learned classifier can be an SVM classifier, or a KNN classifier, or a nearest class mean classifier, or any other type of classifier.
  • Block 406 may determine image search results based on the classifier from block 405 and features from a large database of image features 407.
  • the large database of image features 407 may contain the features of the images that will be searched.
  • the large database of image features 407 may be determined in accordance with the principles described in connection with blocks 143-145 of Fig. la and block 154 of Fig. lb.
  • Block 406 may determine a score for each image by applying the classifier to each of the search image features.
  • the search features of block 407 may be RE-SVM features determined in accordance with the principles described in connection with Fig. 3 and Equation Nos. (10)- (10a).
  • the search results may be ranked based on the score and the ranking may be provided along with the results.
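Blocks 405-407 amount to scoring every database feature with the learned classifier and ranking by score. A minimal sketch, assuming the learned classifier is linear with weights `w` and bias `b` (the names and the linearity assumption are illustrative; the text also allows KNN or nearest-class-mean classifiers):

```python
import numpy as np

def rank_database(w, b, db_features):
    """Score each database feature with a linear classifier (w, b) and
    return indices sorted from best to worst match, plus the sorted
    scores (sketch of blocks 406-407)."""
    scores = db_features @ w + b
    order = np.argsort(-scores)        # descending score = best first
    return order, scores[order]
```

The returned ranking can be presented alongside the search results, as described above.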
  • Fig. 5 illustrates a flow chart of a method 500 for extracting RE-SVM local descriptors.
  • Block 501 may extract N local descriptors from an input image.
  • Block 502 may extract many local descriptors from a generic pool of images.
  • Block 503 may compute RE-SVM local descriptors based on the local descriptors extracted at blocks 501 and 502.
  • Block 504 may output the results of block 503.
  • the output of block 504 may be provided to a local descriptor encoder, such as a VLAD, Fisher or Bag of Words local descriptor encoder.
  • the N local descriptors provided by block 504 may be aggregated into a single global image feature.
  • the global feature may represent features of a query, positive, negative, training, search or any other image and may be used during image searching.
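As a toy stand-in for the local-descriptor encoders named above (VLAD, Fisher, Bag of Words), the sketch below simply sum-pools the per-descriptor vectors and L2-normalizes the result. Real encoders are considerably more elaborate; this only illustrates the aggregation step from N local descriptors to one global feature.

```python
import numpy as np

def aggregate_descriptors(local_descs):
    """Aggregate N local descriptors into a single global image feature
    by sum-pooling followed by L2 normalization (illustrative stand-in
    for the encoder fed by block 504)."""
    g = np.asarray(local_descs, float).sum(axis=0)
    return g / np.linalg.norm(g)
```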
  • Fig. 6 illustrates a flow chart of a method 600 for accelerated E-SVM or RE-SVM feature determination.
  • Method 600 may include a block 601 which may receive features of an image.
  • block 601 may receive the image and may determine the image features.
  • the features may be E-SVM or RE-SVM features as discussed in connection with Figs. 2 and 3.
  • the features may be initial image features or any other type of features.
  • Block 601 may determine the E-SVM or RE-SVM image features as discussed in connection with methods 200 and 300 of Figs. 2 and 3, respectively.
  • the features may be E-SVM or RE-SVM features derived from local descriptors.
  • Block 601 may provide the image features to block 604.
  • Block 602 may receive a large set of features of negative, generic images.
  • the large set of features may have a size that is a multiple of the size of the feature vector (e.g., 10x, 100x, 1000x).
  • Each feature vector may have any size, including in the range of 100 to tens of thousands.
  • the total size of the large set may be very high, including over 100,000 or over a million features.
  • block 602 may receive features of generic images as described in connection with blocks 303 and 304. In another example, block 602 may determine the features of generic images.
  • the features of the generic images may be initial, base features, E-SVM features or RE-SVM features. In another example, the features may be E-SVM or RE-SVM features derived from local descriptors. Block 602 may provide the features of the generic images to block 603.
  • Block 603 may perform similarity clustering for the generic features from block 602.
  • block 603 may perform cosine-similarity clustering.
  • block 603 may perform similarity clustering in accordance with the principles described by Equation Nos. (11) and (11a) and output the resulting centroids Ck.
  • block 603 may perform similarity clustering in accordance with the principles described in connection with Fig 6a.
  • block 603 may determine each centroid by: (i) initializing clusters, (ii) updating each centroid ck and each corresponding cluster Ck, and (iii) repeating (i) and (ii) recursively.
  • the present clustering approach includes clustering that is sensitive to the sign of the inner product z^T ck (compare with dictionary learning, where the cluster assignment is done using the absolute value, e.g., |z^T ck|).
  • Block 603 may optionally perform the clustering offline before any image searching is carried out. For example, the processing associated with block 603 may be performed prior to image searching and the results may be stored.
  • Block 604 may perform the E-SVM function using as generic negatives the centroids Ck from block 603.
  • the centroids ck may be utilized as the generic negative features are typically utilized by an E-SVM function.
  • block 604 may perform the E- SVM function by further considering auxiliary data for each centroid.
  • block 604 may perform an E-SVM function based on the centroids from block 603 as the generic negative features.
  • block 604 may perform an E-SVM function as described in accordance with the principles of block 205 of Fig. 2.
  • block 604 may perform the RE-SVM determination according to blocks 305, 307, or 309 of Fig. 3.
  • the auxiliary data may represent the number of generic negative feature vectors Nk in each cluster k.
  • Block 604 may determine scaled centroids by multiplying the centroid ck by the corresponding Nk.
  • Block 604 may perform an E-SVM function based on the scaled centroids.
  • Block 604 may perform E-SVM determinations in accordance with Equation Nos. (1)-(4) and (12).
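The accelerated determination of block 604 can be sketched by replacing the N generic negatives with K centroids, each weighted by its cluster population Nk, in the SVM objective (in the spirit of Equation (12)). The solver below and its hyper-parameters are illustrative assumptions; here the weights enter the subgradient rather than scaling the centroids directly, which is one of the two weighting variants the text describes.

```python
import numpy as np

def accelerated_esvm(x_pos, centroids, counts, lam=1e-2, epochs=50, lr=0.1):
    """E-SVM over K cluster centroids instead of all N generic negatives.
    Each centroid's hinge-loss contribution is weighted by its cluster
    size N_k (auxiliary data of block 604).  Illustrative sketch only."""
    C = np.asarray(centroids, float)
    n = np.asarray(counts, float)
    X = np.vstack([np.asarray(x_pos, float)[None, :], C])
    y = np.concatenate([[1.0], -np.ones(len(C))])
    wts = np.concatenate([[1.0], n])     # positive weight 1, negatives N_k
    u = np.zeros(X.shape[1])
    total = wts.sum()
    for _ in range(epochs):
        act = y * (X @ u) < 1.0          # margin-violating examples
        grad = lam * u - ((wts[act] * y[act])[:, None] * X[act]).sum(0) / total
        u -= lr * grad
    return u / np.linalg.norm(u)
```

Because K ≪ N, each per-image E-SVM computation touches only K weighted vectors, which is the source of the speed-up.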
  • Fig. 6A illustrates an exemplary method for performing similarity clustering in accordance with present principles.
  • Fig. 6A includes a block 650 that may receive generic negative features.
  • the generic features may be E-SVM, RE-SVM or base features such as VLAD, Bag-of-words or Fisher vector.
  • the generic negative features may be the features received from block 602 in Figure 6.
  • the set of generic negative features may have a size N (i.e., there are N generic negative features received by block 650).
  • block 650 may receive the actual values of each generic negative feature.
  • block 650 may receive one or more pointers to the values of the generic negative features.
  • Each negative generic feature may be represented by a vector.
  • Block 651 may initialize centroids in accordance with present principles.
  • block 651 may initialize a set of centroids c1, ..., cK, where there are a total number of K centroids.
  • block 651 may initialize centroids by randomly choosing K generic negative features from the features received in block 650. The value of K is determined empirically and should be smaller than the total number of generic negative features.
  • block 651 may initialize the centroids based on the K generic negative features. For example, block 651 may loop through the K generic negative features and assign each feature to a corresponding centroid.
  • block 651 may assign a first negative feature to a first centroid c1, a second negative feature to a second centroid c2, a third negative feature to a third centroid c3, and so on until the Kth negative feature, which is assigned to centroid cK.
  • the centroids may be initialized to fixed values that do not depend on the generic negatives.
  • each centroid may be assigned a value that is determined by a random number generator.
  • Each centroid may be a vector of the same size as the features stored in the set N.
  • Block 652 may begin a loop (hereinafter "loop(l)") that will loop and perform the processing of blocks 653-662 using a variable i having a range from one through a number Z.
  • the number Z may signify the number of loop iterations that will be performed to optimize the values of the centroids.
  • the number Z may be received from an outside source (e.g., may be set by a user) or may be empirically determined.
  • Block 652 may pass control to block 653.
  • Block 653 may initialize clusters in accordance with present principles.
  • block 653 may initialize a set of clusters Ci, C 2 , ... Ck.
  • the set of clusters may have the same amount of elements (K) as the set of centroids initialized by block 651.
  • Each cluster C may have a one to one association with a corresponding centroid.
  • cluster Ci corresponds to centroid ci
  • cluster C2 corresponds to centroid c 2
  • cluster C3 corresponds to centroid C3
  • cluster Ck corresponds to centroid Ck.
  • Each cluster C is a list designed to point to features associated to the corresponding centroid of that cluster.
  • Block 653 may initialize all the clusters (e.g., clusters Ci, C2, ... Ck) to zero.
  • Block 653 may then pass control to block 654.
  • Block 654 may begin a loop (hereinafter "loop(2)") that will loop through all the generic negative features from block 650 using a variable j having a range from one to the total number of generic negative features N.
  • Block 655 may begin a loop (hereinafter "loop(3)") that will loop through all the centroids ci, ... Ck and their corresponding clusters Ci, ... Ck using a variable l having a range from one to the total number of centroids K. Block 655 may pass control to block 656.
  • Block 656 may determine a distance between the feature zj and a centroid cl. In one example, block 656 may determine the distance based on the signed inner product or the absolute value of an inner product between the feature zj and the centroid cl. Block 656 may pass control to block 657 when the value of l is greater than K. Block 657 may end loop 3. Block 657 may then pass control to block 658.
  • Block 658 may assign feature zj to a closest centroid.
  • block 658 may determine the closest centroid to the feature zj based on the distance between the feature zj and all the centroids determined by loop 3.
  • block 658 may determine the centroid with the highest absolute value of an inner product between feature zj and the centroid.
  • block 658 may determine the closest centroid based on any other method that compares the distances determined by loop 3.
  • Block 658 may assign the feature zj to the closest centroid by updating the cluster corresponding to that closest centroid.
  • Block 658 may update the corresponding cluster to include an identifier to feature zj.
  • Block 658 may pass control to block 659 when the value of j is greater than N.
  • Block 659 may end loop 2. Block 659 may pass control to block 660.
  • Block 660 may begin a loop (hereinafter "loop(4)") that will loop through all the centroids ci, ... Ck and their corresponding clusters Ci, ... Ck using a variable m having a range from one to the total number of centroids K.
  • Block 660 may pass control to block 661.
  • Block 661 may update the value for the centroid c m .
  • block 661 may update the value of the centroid c m based on the first left-singular vector of the features identified by the corresponding cluster Cm.
  • Block 661 may review the values of the cluster Cm that corresponds to the centroid c m .
  • the cluster Cm may identify the features clustered with centroid c m (e.g., the features closest to centroid c m ).
  • Block 661 may create a matrix Mm based on the feature vectors identified by cluster Cm. This matrix may be the correlation matrix for the features assigned to cluster Cm.
  • Block 661 may then determine an updated value for the centroid cm based on the left-singular vector of the matrix Mm with the strongest singular value. Block 661 may pass control to block 662 when the value of m is greater than K. Block 662 may end loop 4. Block 662 may pass control to block 663 when the value of i is greater than Z. Block 663 may then optionally output updated values for all the centroids c1, ..., cK.
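The loop structure of blocks 651-663 can be condensed into the following sketch: each feature is assigned to the centroid with the largest signed inner product, and each centroid is then updated to the first left-singular vector of its cluster's feature matrix. The initialization and fixed iteration count below are simplifications of the flow chart, not the disclosed procedure.

```python
import numpy as np

def similarity_cluster(Z, K, iters=10, seed=0):
    """Similarity clustering sketch (Fig. 6A): sign-sensitive inner-product
    assignment, SVD-based centroid update.  Returns the K centroids and
    the cluster sizes N_k used as auxiliary data by block 604."""
    rng = np.random.default_rng(seed)
    Z = np.asarray(Z, float)
    # Initialize centroids from K randomly chosen features (block 651).
    C = Z[rng.choice(len(Z), size=K, replace=False)].copy()
    for _ in range(iters):                       # loop(1)
        assign = np.argmax(Z @ C.T, axis=1)      # signed inner product
        for k in range(K):                       # loop(4): update centroids
            members = Z[assign == k]
            if len(members) == 0:
                continue                         # keep previous centroid
            # First left-singular vector (strongest singular value) of the
            # member matrix, columns = features (block 661).
            U, _, _ = np.linalg.svd(members.T, full_matrices=False)
            C[k] = U[:, 0]
    counts = np.bincount(assign, minlength=K)    # N_k per cluster
    return C, counts
```

Note the assignment uses the signed inner product, matching the sign-sensitive clustering the text contrasts with absolute-value assignment in dictionary learning.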
  • Fig. 6B illustrates a plot of similarity clustering in accordance with present principles.
  • Fig. 6B illustrates an example of the similarity clustering performed by the method of Fig. 6A or by block 603 of Fig. 6.
  • Fig. 6B illustrates a two-dimensional feature space in which generic negative features of size two exist, with each of the two axes corresponding to one of the entries in each feature vector zi.
  • Fig. 6B illustrates a set of 10 generic negative features z1 671, z2 672, z3 673, z4 674, z5 675, z6 676, z7 677, z8 678, z9 679, z10 680.
  • each of the weights N1, N2, N3 may represent the number of nearest features of the corresponding centroid c1, c2, c3.
  • Each centroid c1, c2, c3 may be learned in accordance with the principles described in connection with Fig. 6A.
  • each centroid c1, c2, c3 may be associated with a respective Cluster 1, Cluster 2 and Cluster 3.
  • Each feature zi is part of a cluster of the closest or nearest centroid.
  • centroid c1 681 may have a cluster of features z6 676, z7 677, z8 678.
  • centroid c1 681 may have an associated weight N1 691. The value of weight N1 may be 3, indicating that there are three features assigned to the cluster of centroid c1 681.
  • Centroid c2 682 may have a cluster of features z1 671, z3 673, z4 674, z9 679.
  • centroid c2 682 may have an associated weight N2 692.
  • the value of weight N2 may be 4, indicating that there are four features assigned to the cluster of centroid c2 682.
  • Centroid c3 683 may have a cluster of features z2 672, z5 675, z10 680. Furthermore, centroid c3 683 may have an associated weight N3 693.
  • the value of weight N3 may be 3, indicating that there are three features assigned to the cluster of centroid c3 683.
  • Fig. 7 is a block diagram illustrating an exemplary image processing system 700.
  • the image processing system includes an image processing device 710 and a display 720.
  • the device 710 and the display 720 may be connected by a physical link.
  • the device 710 and the display 720 may communicate via a communication link, such as, for example a wireless network, a wired network, a short range communication network or a combination of different communication networks.
  • the display 720 may allow a user to interact with image processing device 710, including, for example, inputting criteria for performing an image search.
  • the display 720 may also display the output of an image search.
  • the image processing device 710 includes memory and processor 730 and any other software or hardware that allow the performance of image searcher 740, RE-SVM image feature determinator 750 and accelerated feature determinator 760.
  • the device 710 may be configured to perform any of the features of the blocks depicted in Figs. 2-6A.
  • Each of the RE-SVM image searcher 740, RE-SVM image feature determinator 750 and accelerated feature determinator 760 may be executed in whole or in part on image processing device 710. Alternatively, each of the image searcher 740, RE-SVM image feature determinator 750 and accelerated feature determinator 760 may be performed remotely and the results may be communicated to device 710 via a communication link.
  • the dashed boxes may indicate that each of the box processes may be executed on image processing device 710 or may be executed remotely.
  • Fig. 8 illustrates an example of various image processing devices 801-804 and a server 805.
  • the image processing devices may be smartphones (e.g., device 801), tablets (e.g., device 802), laptops (e.g., device 803), or any other image processing device 804 that includes software and hardware to execute the features described in the Figures.
  • the image processing devices 801-804 may be similar to the image processing device 710 and the image processing system 700 described in connection with Fig. 7.
  • Image searcher 740, RE-SVM image feature determinator 750 and accelerated feature determinator 760 may be executed on any of the devices 801-804, on server 805, or in a combination of any of the devices 801-804 and server 805.
  • Fig. 9 represents an exemplary architecture of an apparatus 900 which may be configured to implement methods described in relation with Figs. 1-6A and Equation Nos. (1)-(10a).
  • apparatus 900 represents an apparatus that may be configured to implement the methods according to present principles, including principles described in relation to Figs. l-6a.
  • Apparatus 900 comprises the following elements that are linked together by a data and address bus 901:
  • a microprocessor 902 (or CPU), which is, for example, a DSP (or Digital Signal Processor);
  • a ROM (or Read Only Memory) 903;
  • a RAM (or Random Access Memory) 904;
  • an I/O interface 905 for reception of data to transmit, from an application; and a battery 906 (or other suitable power source).
  • the battery 906 is external to the apparatus.
  • the word "register" used in the specification can correspond to an area of small capacity (some bits) or to a very large area (e.g. a whole program or a large amount of received or decoded data).
  • ROM 903 comprises at least a program and parameters. The algorithm of the methods according to the invention is stored in the ROM 903.
  • the CPU 902 uploads the program in the RAM and executes the corresponding instructions.
  • RAM 904 comprises, in a register, the program executed by the CPU 902 and uploaded after switch on of the apparatus 900, input data in a register, intermediate data in different states of the method in a register, and other variables used for the execution of the method in a register.
  • the implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or an apparatus), the implementation of features discussed may also be implemented in other forms (for example a program).
  • An apparatus may be implemented in, for example, appropriate hardware, software, and firmware.
  • the methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs”), and other devices that facilitate communication of information between end-users.
  • the image or picture I is obtained from a source.
  • the source belongs to a set comprising:
  • a local memory e.g. a video memory or a RAM (or Random Access Memory), a flash memory, a ROM (or Read Only Memory), a hard disk ;
  • a storage interface (905), e.g. an interface with a mass storage, a RAM, a flash memory, a ROM, an optical disc or a magnetic support;
  • a communication interface (905) e.g. a wireline interface (for example a bus interface, a wide area network interface, a local area network interface) or a wireless interface (such as a IEEE 802.1 1 interface or a Bluetooth® interface); and
  • an image capturing circuit, e.g. a sensor such as, for example, a CCD (or Charge-Coupled Device) or a CMOS (or Complementary Metal-Oxide-Semiconductor) sensor.
  • the decoded image I is sent to a destination; specifically, the destination belongs to a set comprising:
  • a local memory (903 or 904), e.g. a video memory or a RAM, a flash memory, a hard disk ;
  • a storage interface (905), e.g. an interface with a mass storage, a RAM, a flash memory, a ROM, an optical disc or a magnetic support;
  • a communication interface (905) e.g. a wireline interface (for example a bus interface (e.g. USB (or Universal Serial Bus)), a wide area network interface, a local area network interface, a HDMI (High Definition Multimedia Interface) interface) or a wireless interface (such as a IEEE 802.11 interface, Wi-Fi ® or a Bluetooth ® interface); and
  • bitstream BF and/or F are sent to a destination.
  • bitstream F or BF, or both bitstreams F and BF, are stored in a local or remote memory, e.g. a video memory (904), a RAM (904) or a hard disk (903).
  • one or both bitstreams are sent to a storage interface (905), e.g. an interface with a mass storage, a flash memory, ROM, an optical disc or a magnetic support and/or transmitted over a communication interface (905), e.g. an interface to a point to point link, a communication bus, a point to multipoint link or a broadcast network.
  • the bitstream BF and/or F is obtained from a source.
  • the bitstream is read from a local memory, e.g. a video memory (904), a RAM (904), a ROM (903), a flash memory (903) or a hard disk (903).
  • the bitstream is received from a storage interface (905), e.g. an interface with a mass storage, a RAM, a ROM, a flash memory, an optical disc or a magnetic support and/or received from a communication interface (905), e.g. an interface to a point to point link, a bus, a point to multipoint link or a broadcast network.
  • apparatus 900 being configured to implement methods in accordance with present principles, belongs to a set comprising:
  • a tablet or tablet computer
  • a video server e.g. a broadcast server, a video-on-demand server or a web server.
  • apparatus 900 being configured to implement an image processing process in accordance with present principles, belongs to a set comprising:
  • a mobile device; a communication device;
  • a tablet or tablet computer
  • Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications.
  • Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a preprocessor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices.
  • the equipment may be mobile and even installed in a mobile vehicle.
  • An aspect of present principles may be implemented on any electronic device or combination of electronic devices.
  • the present principles may be implemented on any of variety of devices including a computer, a laptop, a smartphone, a handheld computing system, a remote server, or on any other type of dedicated hardware.
  • Various examples of the present invention are described below with reference to the figures.
  • Algorithm 1 E-SVM feature encoding with PEGASOS.
  • Algorithm 1 may perform determinations as discussed in connection with Equation Nos. (1) and (3). The normalization in Equation Nos. (2) and (4) is carried out at the end of the algorithm (line 12).
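Assuming the solver named in Algorithm 1 is the Pegasos-style primal sub-gradient SVM solver, its core loop can be sketched as follows; the final L2 normalization mirrors Equation Nos. (2) and (4). The uniform sampling (positive drawn as often as each negative) and the regularization constant are illustrative assumptions, not the weighting of the disclosed algorithm.

```python
import numpy as np

def pegasos_esvm(x_pos, Z_neg, lam=1e-4, T=1000, seed=0):
    """Pegasos-style SGD for the E-SVM objective (sketch of Algorithm 1):
    at each of T iterations, sample one training vector, take a
    sub-gradient step with rate 1/(lam*t), and L2-normalize at the end."""
    rng = np.random.default_rng(seed)
    X = np.vstack([np.asarray(x_pos, float)[None, :], Z_neg])
    y = np.concatenate([[1.0], -np.ones(len(Z_neg))])
    u = np.zeros(X.shape[1])
    for t in range(1, T + 1):
        i = rng.integers(len(X))
        eta = 1.0 / (lam * t)                  # Pegasos step size
        if y[i] * (X[i] @ u) < 1.0:            # hinge loss active
            u = (1 - eta * lam) * u + eta * y[i] * X[i]
        else:                                  # regularization shrink only
            u = (1 - eta * lam) * u
    return u / np.linalg.norm(u)               # normalization, line 12
```

The number of iterations T plays the role studied in the experiments below, where the mAP benefit saturates with growing T.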
  • Algorithm 1 was performed on two publicly available datasets, Holidays and Oxford.
  • the Holidays dataset consists of 1491 images of vacation shots divided into 500 groups of matching images.
  • the second dataset, the Oxford dataset, consists of close to 5000 images of buildings from the city of Oxford.
  • the images are divided into 55 groups of matching images, and a query image is specified for each group.
  • the query images consisted of full images (as opposed to cropped regions).
  • Both datasets include a specific evaluation protocol based on mean Average Precision (mAP) that we use throughout.
  • Figs. 10-13 illustrate the effect of the RE-SVM encoding parameters on mAP performance on the Holidays dataset.
  • Fig. 10 illustrates the effect of a negative pool size N on the performance of the system.
  • mean Average Precision (mAP) is plotted on the vertical axis; negative pool sizes of N = 10e3 and 60e3 were considered, with Stochastic Gradient Descent (SGD) used for learning.
  • the benefit of increasing the number T of iterations saturates after 100e3 iterations. Due to this observed saturation, T = 100e3 iterations was utilized in the other experiments.
  • Table 1 illustrates runtimes for E-SVM learning. As shown in Table 1, using 100e3 iterations takes 620 ms with a non-optimized implementation using the "C" programming language.
  • Table 1 E-SVM encoding runtime when using a single core running at 2.6 GHz.
  • Fig. 12 illustrates the use of recursively trained RE-SVMs.
  • Fig. 12 displays mAP on the vertical axis and number of RE-SVM recursions on the horizontal axis.
  • RE-SVM-r denotes the RE-SVM feature obtained after r recursions.
  • a single RE-SVM recursion produces a gain of close to 4 mAP points relative to the VLAD encoder.
  • using 6 recursions produces a gain of close to 5 mAP points.
  • Figs. 14 and 15 illustrate the robustness of the RE-SVM methods in accordance with present principles. In particular they demonstrate the robustness relative to a large number of distractor images from Flickr different from those used for the negative pool.
  • the distractor images are encoded in the same manner as the benchmark images using VLAD + RE-SVM-2.
  • Table 2 illustrates the performance of the base VLAD-6 , BoW-1000 and Fisher-64 encodings on the Holidays and Oxford benchmarks, along with the performance of the RE- SVM-1 and RE-SVM-2 features derived from each encoding.
  • the Fisher vector in particular, performs poorly initially, but gains as many as 35 mAP points (on the Holidays dataset) after two RE-SVM recursions to outperform BoW.
  • the RE-SVM system of present principles also gives an important advantage (3.6 mAP points) when using such CNN-based features as initial features.
  • Table 2 Results for VLAD, BoW, Fisher and CNN encodings and their RE-SVM-1 and RE- SVM-2 variants.
  • Table 3 illustrates the present method in a Pascal VOC image classification task when using either Nearest Class Mean (NCM) or k-Nearest Neighbors (k-NN) classifiers.
  • Table 3 Results (mAP) on the Pascal VOC image classification task when using CNN as a base feature.
  • the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette (“CD"), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc), a random access memory (“RAM”), or a read-only memory (“ROM”).
  • the instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination.
  • a processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.
  • implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted.
  • the information may include, for example, instructions for performing a method, or data produced by one of the described implementations.
  • a signal may be formatted to carry as data the rules for writing or reading the syntax of a described example, or to carry as data the actual syntax-values written by a described example.
  • Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal.
  • the formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream.
  • the information that the signal carries may be, for example, analog or digital information.
  • the signal may be transmitted over a variety of different wired or wireless links, as is known.
  • the signal may be stored on a processor-readable medium.
  • Such a machine may include, for example, any suitable processing platform, computing platform, computing device, processing device, computing system, processing system, computer, processor, or the like, and may be implemented using any suitable combination of hardware and/or software.
  • the computer- readable medium or article may include, for example, any suitable type of memory unit, memory device, memory article, memory medium, storage device, storage article, storage medium and/or storage unit.
  • the instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, encrypted code, and the like, implemented using any suitable high-level, low-level, object- oriented, visual, compiled and/or interpreted programming language.
  • the implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program).
  • An apparatus and constituents included therein, for example, a processor, an encoder and a decoder may be implemented in, for example, appropriate hardware, software, and firmware.
  • the methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs”), and other devices that facilitate communication of information between end-users.
  • Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
  • Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
  • Receiving is, as with “accessing”, intended to be a broad term.
  • Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory).
  • “receiving” is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.


Abstract

An aspect of present principles relates to methods and apparatus for image searching. The method may include determining at least an RE-SVM feature based on a recursive exemplar support vector machine feature of an order n (RE-SVMn) and performing image searching by applying the determined RE-SVM feature to a plurality of features of search images. The apparatus may include a processor. The processor may be configured to determine at least an RE-SVM feature based on a recursive exemplar support vector machine feature of an order n (RE-SVMn). The processor may further be configured to perform image searching by applying the determined RE-SVM feature to a plurality of features of search images. The RE-SVMn feature is determined based on at least an initial positive feature and at least an initial negative feature when n equals one. The RE-SVMn feature is determined based on at least a positive RE-SVMn-1 feature and at least a negative RE-SVMn-1 feature when n is greater than one.

Description

METHODS, SYSTEMS AND APPARATUS FOR IMAGE RECOGNITION BASED ON RECURSIVELY DETERMINED EXEMPLAR-SUPPORT VECTOR MACHINES (E-SVM) FEATURES
TECHNICAL FIELD
The present disclosure relates to image recognition or searching. More precisely, the present disclosure relates to image searching based on Exemplar Support Vector Machines (E-SVM).
BACKGROUND
Various computer or machine based applications offer image searching abilities. An image search is a search for an image or images. The input to the image search may be an image or images. Image searching is relevant in many fields, including computer vision, object recognition, video tracking, and video-based location determination/mapping.
Fig. 1 illustrates a method 100 for performing image searching. The method 100 includes receiving query image(s) 110 that will be the basis of the image search; extracting image features 120 from the query image(s); and performing image searching 130 by using the image features from block 120. Block 130 may perform image searching based on any technique, including image retrieval and semantic searching techniques.
Fig. 1A illustrates an example of an image retrieval search technique. An image retrieval search generally locates images that have the same scene as a query image. The located same-scene images may even have scene alterations resulting from task-related transformation(s) (e.g., changes in scene illumination, image cropping, scaling, wide changes in the perspective of the camera, high compression ratios, or picture-of-video-screen artifacts).
Query image features may be received at block 141. Block 142 may perform image retrieval based on image features received from block 141 and image features received from a large database of features 145. Block 142 may compare the image features received from block 141 with the image features received from the large database of features 145. The large database of features 145 may consist of image features extracted from images stored in a database of search images 143. Block 144 may extract features of each image in the database of search images 143. The large database of features 145 may store the results determined by block 144. Blocks 143-145 may be computed offline or prior to the execution of blocks 141, 142, and 146.
In one example, block 142 performs image retrieval based on a distance between the features from block 141 and the features of the large database from block 145. For example, block 142 may calculate the Euclidean distance between a query image feature from block 141 and each of the features from block 145. Block 142 may choose the images from database 143 that correspond to the smallest distances. Block 142 may provide the search results to block 146. Block 146 may output the search result(s). If multiple results are returned, the results may be ranked and the ranking may be provided along with the results.
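As a minimal, non-limiting sketch of the distance-based ranking performed by block 142, the following Python/NumPy fragment ranks database features by their Euclidean distance to a query feature. The function name and toy data are illustrative and are not part of the original disclosure.

```python
import numpy as np

def retrieve(query_feature, database_features, top_k=3):
    # Euclidean distance between the query feature and each database
    # feature (one row per search image); smaller distance = better match.
    dists = np.linalg.norm(database_features - query_feature, axis=1)
    # Indices of the top_k closest database images, nearest first.
    return np.argsort(dists)[:top_k]
```

The returned indices can then be mapped back to the images in the database of search images (block 143) and ranked results output (block 146).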
Fig. 1B illustrates an example of a semantic search. The semantic search searches images for visual concepts representing a search query (e.g., cats). Block 151 receives features of positive images (i.e., features of images that contain the queried visual concept (e.g., cats)). Block 152 receives features of negative images (i.e., features of images that do not contain the queried visual concept (e.g., no cats, instead contain dogs, other animals or no animals)). Block 153 may perform classifier learning to determine a classifier. Block 153 performs classifier learning based on the positive features from block 151 and the negative features from block 152. Block 153 may choose a classifier that best differentiates the positive features from the negative features. In one example, the classifier learning may be an SVM algorithm. Alternatively, the classifier learning may be other learning techniques.
Block 155 may perform image searching based on the classifier from block 153 and the features from the large database 154. The large database of features 154 may consist of feature vectors of images that will be searched. For example, the large database of features 154 may be determined in accordance with the principles described in connection with block 145 of Fig. 1A. Block 155 may apply the classifier from block 153 to each of the features from block 154. Block 156 may output the result(s) from block 155. If multiple results are returned, the results may be ranked and the ranking may be provided along with the results.
An E-SVM is a classifier that is a linear Support Vector Machine learned from a single positive example, referred to as the exemplar. The E-SVM can be determined based on the positive exemplar and a set of generic negative examples 𝒩. For example, Malisiewicz et al., Exemplar-SVMs for Visual Object Detection, Label Transfer and Image Retrieval, International Conference of Machine Learning, 2012, discuss E-SVMs.
Present E-SVM search techniques require an inefficiently large set of generic negative examples 𝒩 (e.g., a set of one million feature vectors). The large set is inefficient during data computations. Also, present E-SVM search techniques are limited to using an E-SVM representation as a classifier.
Hard-negative mining is a process of (i) selecting a subset of the generic negative features 𝒩 (hereinafter referred to as the "hard-negative cache"); (ii) training a classifier based on the hard-negative cache; and (iii) growing the hard-negative cache with items from 𝒩 that have a classification score inside the margin (i.e., greater than −1). However, E-SVM searching based on hard-negative mining is still a highly complex and inefficient process because it still relies on a large set of generic negatives.
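The three-step hard-negative mining loop just described can be sketched as follows. The simple gradient-descent hinge-loss trainer below stands in for the E-SVM solver, and all hyperparameters (learning rate, epochs, rounds, initial cache size) are illustrative assumptions rather than values from the original.

```python
import numpy as np

def train_on_cache(positive, cache, lam=0.01, epochs=50, lr=0.1):
    # Toy hinge-loss trainer standing in for the E-SVM solver.
    w = np.zeros(len(positive))
    for _ in range(epochs):
        grad = lam * w
        if 1.0 - positive @ w > 0.0:   # positive hinge term active
            grad -= positive
        for z in cache:                # negative hinge terms
            if 1.0 + z @ w > 0.0:
                grad += z / len(cache)
        w -= lr * grad
    return w

def hard_negative_mining(positive, negatives, rounds=3, init_size=2):
    cache = [np.asarray(z, float) for z in negatives[:init_size]]  # step (i)
    w = np.zeros(len(positive))
    for _ in range(rounds):
        w = train_on_cache(positive, cache)                        # step (ii)
        # step (iii): grow the cache with negatives scoring inside the margin
        for z in negatives:
            if z @ w > -1.0 and not any(np.array_equal(z, c) for c in cache):
                cache.append(np.asarray(z, float))
    return w, cache
```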
Present E-SVM techniques also have limited generalization because, despite the size of the negative set, they utilize a single positive exemplar. Present E-SVM determinations rely on asymmetric determinations. Image retrieval using the asymmetric approach has many shortcomings, including sub-par performance relative to features tailored for the image retrieval task. The asymmetric approach has not been considered for semantic search because E-SVMs are tailored for data-constrained scenarios, and many positives are generally available in semantic search. The asymmetric approach relies on a very large generic negatives pool, and suffers from limits in generalization that must be overcome by deriving extra positives from the original exemplar.
SUMMARY OF INVENTION
There is a need for a novel application of E-SVMs that addresses the shortcomings listed above.
An aspect of present principles relates to methods and apparatus for image searching. The method may include determining at least an RE-SVM feature based on a recursive exemplar support vector machine feature of an order n (RE-SVMn) and performing image searching by applying the determined RE-SVM feature to a plurality of features of search images. The apparatus may include a processor. The processor may be configured to determine at least an RE-SVM feature based on a recursive exemplar support vector machine feature of an order n (RE-SVMn). The processor may further be configured to perform image searching by applying the determined RE-SVM feature to a plurality of features of search images. The RE-SVMn feature is determined based on at least an initial positive feature and at least an initial negative feature when n equals one. The RE-SVMn feature is determined based on at least a positive RE-SVMn-1 feature and at least a negative RE-SVMn-1 feature when n is greater than one.
The RE-SVM feature may represent at least a query image when the image search is an image retrieval search. The RE-SVM feature may represent at least a positive image when the image search is a semantic search. The semantic search may include determining at least a second RE-SVM feature representing at least a negative image.
The features of the search images may be RE-SVMn features. The RE-SVM feature may be determined based on n iterations of performing an E-SVM function based on the at least a positive RE-SVMn-1 feature and the at least a negative RE-SVMn-1 feature. The at least a negative RE-SVMn-1 feature may be determined by performing an E-SVM function based on a corresponding RE-SVMn-2 feature and at least another RE-SVMn-2 feature. The at least a negative RE-SVMn-1 feature may be determined offline.
The image searching may be performed based on the RE-SVMn feature. The image searching may include determining a classifier based on the positive RE-SVMn feature and negative RE-SVMn feature. A score may be determined for each search image when the classifier is applied to that search image. The score may be determined by applying the classifier to RE-SVM features of the search images.
A local descriptor may be determined for the RE-SVMn feature. The image search may be performed based on an aggregation of local descriptors.
SUMMARY OF THE DRAWINGS
The features and advantages of the present invention may be apparent from the detailed description below when taken in conjunction with the Figures described below:
FIG. 1 illustrates a flow diagram of a method for image searching.
FIG. 1A illustrates a flow diagram of a method for an image retrieval search.
FIG. 1B illustrates a flow diagram of a method for a semantic search.
FIG. 1C illustrates a graphical representation of an E-SVM.
FIG. 2 illustrates a flow chart of an exemplary method for determining E-SVM feature(s).
FIG. 3 illustrates a flow chart of an exemplary method for determining an RE-SVM feature(s).
FIG. 4 illustrates a flow chart of an exemplary method for semantic search using RE- SVM features.
FIG. 5 illustrates a flow chart of an exemplary method for extracting RE-SVM local descriptors.
FIG. 6 illustrates a flow chart of an exemplary method for accelerated E-SVM determinations.
FIG. 6A illustrates a flow chart of an exemplary method for centroid determination.
FIG. 6B illustrates an exemplary diagram of a plot of similarity clustering.
FIG. 7 illustrates a block diagram of an exemplary image processing device.
FIG. 8 illustrates a block diagram of an exemplary distributed image processing system.
FIG. 9 illustrates an exemplary apparatus.
FIG. 10 illustrates an exemplary plot diagram of experimental results.
FIG. 11 illustrates an exemplary plot diagram of experimental results.
FIG. 12 illustrates an exemplary plot diagram of experimental results.
FIG. 13 illustrates an exemplary plot diagram of experimental results.
FIG. 14 illustrates an exemplary plot diagram of experimental results.
FIG. 15 illustrates an exemplary plot diagram of experimental results.
DETAILED DESCRIPTION
As used herein, an "E-SVM function" refers to a determination of linear classifiers based on one or more positive and negative examples. The E-SVM function may perform a learning determination based on the positive examples of a class (i.e., examples that represent the class; for example, for the class "cats", the set of examples may consist of images of cats) and negative examples (i.e., examples that do not represent a class). As described herein, a learning algorithm may choose a function that best differentiates positive examples from negative examples.
As used herein, an "E-SVM feature" refers to the output of the E-SVM function.
As used herein, an "RE-SVM function" refers to a recursive framework for determining RE-SVM features in accordance with the principles herein. The recursive framework may determine an initial set of E-SVM features based on image feature(s) of a query image and image features of generic negative images. The recursive framework may determine an initial RE-SVM feature by applying an E-SVM function to E-SVM features of query image(s) and E- SVM features of generic negative images. The recursive framework may then recursively determine RE-SVM features by repetitively applying the E-SVM function based on a prior repetition's determination of RE-SVM query features and features of generic negative images. As used herein, an "RE-SVM feature" refers to the output of the RE-SVM function.
The scalars, vectors and matrices may be denoted using, respectively, standard, bold, and uppercase-bold typeface (e.g., scalar a, vector a and matrix A). The notation vk may denote a vector from a sequence v1, v2, ..., vN, and vk may denote the k-th coefficient of vector v. The notation [ak]k (respectively, [αk]k) may denote concatenation of the vectors ak (scalars αk) to form a single column vector. Finally, the Jacobian matrix of a mapping may be denoted with (j, k)-th entry equal to the corresponding partial derivative. An aspect of present principles is directed to mapping E-SVM features to RE-SVM feature(s) based on an E-SVM function. An aspect of present principles is directed to determining RE-SVM feature(s) during a post-processing stage. In one example, enhancement of image searching and image classification may occur based on E-SVM features obtained by concatenating the activations of Deep Convolutional Neural Networks (DCNNs). An aspect of present principles is directed to automatically determining an RE-SVM feature that contains unique information of a query image relative to information of generic, negative images. The automatic extraction of the unique information is more efficient than feature extractors that rely on manual encoding of uniqueness into a feature extraction process.
An aspect of present principles is directed to image searching based on symmetrically determined E-SVM features. As used herein, "symmetric" determinations or "symmetrically" determined refers to image searching based on E-SVM features of query images (i.e., positive images), E-SVM features of generic negative images, and E-SVM features of images that will be searched. These symmetric determinations improve robustness and efficiency because the E-SVM function has been applied to the images and this function determines, from the feature representing each image, a representation that separates the image from a large generic negative pool of features.
An aspect of present principles is directed to a generic, D-dimensional image feature encoder. The image feature encoder may be a global encoder, may be based on aggregated local features, or may be derived from CNN-based features. The features may be denoted by vectors in a space R^D. The parameter R^D may represent the space of all possible vectors having D real-valued entries.
An aspect of present principles provides a novel solution to the complexity problem posed by generic negative image sets containing a large number of images. An aspect of present principles reduces complexity based on clustering of the generic negative features. The clustering may consist of deriving centroids. Each centroid may represent a subset of the large set of generic negative images. Each centroid may be associated to a subset of generic negative features that are closest to that centroid. That is, the generic negative features of a subset are the generic features that are closer to that centroid than to any other centroid. The distance may be determined by maximizing the inner-product similarities between the generic negative features of the subset and the centroid.
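One plausible realization of this clustering, sketched below, is a spherical k-means-style procedure in which each negative feature is assigned to the centroid with the largest inner-product similarity. The exact algorithm is not fixed by the text above, and the names and parameters here are illustrative.

```python
import numpy as np

def cluster_negatives(features, k=2, iters=10, seed=0):
    # Spherical k-means-style sketch: each negative feature is assigned to
    # the centroid of largest inner-product similarity, and each centroid
    # is the (renormalized) mean of its assigned features.
    rng = np.random.default_rng(seed)
    feats = features / np.linalg.norm(features, axis=1, keepdims=True)
    centroids = feats[rng.choice(len(feats), size=k, replace=False)]
    assign = np.zeros(len(feats), dtype=int)
    for _ in range(iters):
        assign = np.argmax(feats @ centroids.T, axis=1)  # max inner product
        for j in range(k):
            members = feats[assign == j]
            if len(members):
                c = members.mean(axis=0)
                centroids[j] = c / np.linalg.norm(c)
    return centroids, assign
```

The resulting centroids can then serve as a reduced negative pool in place of the full generic negative set.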
An E-SVM feature in accordance with present principles may be determined based on an optimization determined as follows:
Equation No. (1): v(x, 𝒩) = argmin_w (λ/2)||w||² + α1 max(0, 1 − xᵀw) + (α−1/|𝒩|) Σ_{zi ∈ 𝒩} max(0, 1 + ziᵀw)
The parameters λ, α1, and α−1 may have positive, scalar values. The parameters λ, α1, and α−1 may control regularization and the relative weight of positive and negative features. Each of the parameters λ, α1, and α−1 can be manually chosen or can be automatically determined. The parameter x may be a feature vector that relates to a query image, a generic negative image, or a search image. In one example, feature vector x may be determined using an encoder such as a VLAD, Fisher Vector, CNN activation coefficients, or another E-SVM feature. The parameter 𝒩 may represent a set of generic negative feature vectors (e.g., of a size N). Parameter w is the parameter that is being optimized under Equation No. (1). The parameter zi is a negative feature that is part of the set 𝒩 of negative features. The set of generic feature vectors 𝒩 may be a large set of vectors.
The optimization of Equation No. (1) may determine the best (i.e., the optimal) vector w. The optimization may be based on stochastic gradient descent. The stochastic gradient descent may be determined using a Pegasos algorithm. The stochastic gradient descent may process a feature vector of parameter x or parameter zi. The feature vectors x and zi may be processed in a random order. The parameters α1 and α−1 may be determined during the stochastic gradient descent algorithm. The optimization of Equation No. (1) may be an SVM problem relying on hinge loss, where the amount of features in the positive set is unbalanced relative to the amount of features in the negative set. For example, there may be one positive feature and a million negative features.
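A Pegasos-style stochastic gradient descent pass over the objective of Equation No. (1), followed by the normalization of Equation No. (2), might be sketched as follows. The hyperparameter names and values (lam, a_pos, a_neg, T) are illustrative assumptions, not values from the original.

```python
import numpy as np

def esvm_pegasos(x, negatives, lam=0.1, a_pos=1.0, a_neg=1.0, T=2000, seed=0):
    # One positive sample (the exemplar) and the pool of generic negatives.
    rng = np.random.default_rng(seed)
    w = np.zeros_like(np.asarray(x, dtype=float))
    samples = [(np.asarray(x, float), 1.0, a_pos)]
    samples += [(np.asarray(z, float), -1.0, a_neg) for z in negatives]
    for t in range(1, T + 1):
        feat, label, weight = samples[rng.integers(len(samples))]
        eta = 1.0 / (lam * t)            # Pegasos step-size schedule
        w *= 1.0 - eta * lam             # gradient step on the regularizer
        if label * (feat @ w) < 1.0:     # hinge subgradient, when active
            w += eta * weight * label * feat
    return w / np.linalg.norm(w)         # final normalization step
```

The returned unit-norm vector plays the role of the E-SVM feature w(x, N) of the single exemplar x against the negative pool.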
In one example, an E-SVM feature may be determined based on an ℓ2 normalization of the result of Equation No. (1). In one example, the normalized E-SVM feature may be determined as follows:
Equation No. (2): w(x, 𝒩) = v(x, 𝒩) / ||v(x, 𝒩)||2
The parameters x and 𝒩 may have the same values as the parameters x and 𝒩 of Equation No. (1).
In other examples, other types of normalization may be used, such as ℓ1 normalization, or other types of post-processing may be used, such as zeroing of negative coefficients or signed square roots of coefficients. An E-SVM feature in accordance with present principles may alternatively be determined based on an optimization determined as follows:
Equation No. (3): (u(x, 𝒩), b) = argmin_{w′, b} ||w′||² + C1 max(0, 1 − xᵀw′ − b) + (C2/|𝒩|) Σ_{z ∈ 𝒩} max(0, 1 + zᵀw′ + b)
The parameters x and 𝒩 may have the same values as the parameters x and 𝒩 of Equation No. (1). The parameter z may denote a negative feature in a set of negative features 𝒩. The parameters C1 and C2 are regularization parameters. In one example the parameters C1 and C2 may be customized empirically. The parameters w′ and b denote the parameters over which there is optimization or minimization.
In one example, an E-SVM feature may be determined based on an ℓ2 normalization of the result of Equation No. (3), as follows:
Equation No. (4): w(x, 𝒩) = u(x, 𝒩) / ||u(x, 𝒩)||2
In other examples, other types of normalization may be used, such as ℓ1 normalization, or other types of post-processing may be used, such as zeroing of negative coefficients or signed square roots of coefficients.
An aspect of present principles is directed to determining RE-SVM feature(s) based on E-SVM features, including E-SVM features determined in accordance with the principles described in connection with Equation Nos. (1)-(4). In one example, the RE-SVM feature(s) may be determined based on an E-SVM function that utilizes previously determined E-SVM features of query, training and/or search images. In one example, a feature encoder may encode E-SVM features instead of the image feature representations. For example, if x is a feature vector encoded by a feature encoder, the feature encoder may instead encode an E-SVM feature (e.g., w(x, 𝒩)).
An aspect of present principles is directed to E-SVM functions based on symmetrically determined E-SVM features. The symmetric approach includes determining E-SVM features for a query image x°. The symmetric approach further includes determining corresponding E-SVM features for each image x in a database 𝒟. The database 𝒟 may consist of images that will be searched.
The E-SVM feature w° for a query image x° may be represented by:
Equation No. (5): w° = w(x°, 𝒩)
The set of E-SVM features wi corresponding to each image xi in a database 𝒟 of search images may be represented by:
Equation No. (6): {wi = w(xi, 𝒩)}i
During image searching, search images xi or their corresponding E-SVM features wi may be sorted based on their similarity to the query image. In one example, similarity may be measured between an E-SVM feature w° derived from a query image feature x° and an E-SVM feature wj derived from a database image feature xj as follows:
Equation No. (7): sj = wjᵀ w°
The parameter wj represents the E-SVM feature of a database image xj. The parameter w° represents the E-SVM feature of the query image. The parameter sj may represent a similarity score between the image feature of the query image and the j-th search image. Because each feature wj is uniquely associated with a single image, the parameter sj may also represent a similarity score between the query image and the database image xj. A high similarity score may indicate a high similarity between the images, whereas a low similarity score may indicate a low similarity between the images. In one example, the score may be determined based on the inner product between wj and w°. In another example, the score may be determined based on a negated Euclidean distance (e.g., −||wj − w°||2).
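The scoring and sorting of Equation No. (7) can be sketched as follows; the function name is illustrative, and the database rows are assumed to hold one E-SVM feature per search image.

```python
import numpy as np

def rank_by_esvm_similarity(w_query, db_esvm_features):
    # Inner-product similarity s_j = w_j^T w0 against every database
    # E-SVM feature, then indices sorted most-similar first.
    scores = db_esvm_features @ w_query
    order = np.argsort(-scores)
    return order, scores
```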
Image searching based on E-SVM or RE-SVM features includes learning a classifier. A classifier based on E-SVM or RE-SVM features may be learned based on an optimization as follows:
Equation No. (8): argmin over (v, b) of |v|^2 + C Σi=1..N max(0, 1 - yi(wi^T v + b))
The annotated training set {yi, wi}i=1..N may consist of labels yi (e.g., yi ∈ {-1, +1}) and features wi. Vector v represents a classifier. Parameter C represents a scalar chosen empirically that controls the regularization. Parameter N represents the size of the training set. Parameters wi represent the E-SVM features of the training set images, and b represents the bias or offset between the origin and the hyperplane orthogonal to v.
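As a minimal sketch of the Equation No. (8) objective, the following illustration minimizes the regularized hinge loss by plain subgradient descent. The patent does not prescribe a solver; the solver, hyperparameters and names below are assumptions:

```python
import numpy as np

def learn_classifier(W, y, C=1.0, steps=2000, lr=0.01, seed=0):
    """Minimize the Equation No. (8) objective,
    |v|^2 + C * sum_i max(0, 1 - yi (wi^T v + b)),
    by plain subgradient descent. W holds one training feature wi per row;
    y holds labels in {-1, +1}. Hyperparameters are illustrative."""
    rng = np.random.default_rng(seed)
    v = rng.normal(scale=0.01, size=W.shape[1])
    b = 0.0
    for _ in range(steps):
        margins = y * (W @ v + b)
        active = margins < 1.0                      # margin violators
        grad_v = 2.0 * v - C * (y[active][:, None] * W[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        v -= lr * grad_v
        b -= lr * grad_b
    return v, b

# Toy separable training set: positives on the right, negatives on the left.
W = np.array([[2.0, 0.1], [1.5, -0.2], [-2.0, 0.3], [-1.8, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
v, b = learn_classifier(W, y)
```

On this toy set the learned (v, b) separates the two classes with the correct signs.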
In an image search based on a symmetric determination, E-SVM or RE-SVM features will be determined for the query image, the negative images and the search images. The image classifier v is determined based on the training (positive and negative) images. The image classifier v is applied to a database of searched images that are also represented by E-SVM or RE-SVM features.
In one example, a classifier may be learned from the RE-SVM representations of training images. For example, the training image(s) may be utilized to determine a K-nearest-neighbor classifier, or a nearest-class-mean classifier. The RE-SVM features may be determined by recursively applying an E-SVM function to E-SVM or RE-SVM features (e.g., as described in connection with Equation Nos. (10) and (10a)).
In one example, initial E-SVM features may be represented by:
Equation No. (9a) w° = x
Equation No. (9b): N^0 = N
The parameter w° may represent the initial feature of an image. The image can be any image, including a query, database, or training image. The parameter N^0 may represent the initial feature representations of the generic, negative images. In one example, a first RE-SVM (RE-SVM1) feature is determined based on an E-SVM function applied to the initial features. In one example, a j-th recursion RE-SVM (RE-SVMj) feature may be determined as follows:
Equation No. (10): w^j = w(w^(j-1), N^(j-1))
Equation No. (10a): where N^j = {w(z, N^(j-1)) : z ∈ N^(j-1)}
The value of j may be equal to or greater than 1. The j-th RE-SVM (or the j-th E-SVM) may be determined in accordance with the principles described with Equation Nos. (1)-(4). The E-SVM function in Equation Nos. (10) and (10a) may be derived according to Equation Nos. (1)-(4).
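The recursion of Equation Nos. (10) and (10a) can be sketched as follows, assuming a minimal E-SVM learner. The `esvm` subgradient solver, its hyperparameters, and the inclusion of each pool feature in its own negative pool are illustrative assumptions rather than the disclosed implementation:

```python
import numpy as np

def esvm(x, negatives, C1=10.0, C2=1.0, steps=500, lr=0.01):
    """Minimal E-SVM sketch: one positive exemplar x against a pool of
    negatives (one per row); returns the l2-normalized vector w.
    Solver and hyperparameters are illustrative assumptions."""
    u, b = np.zeros_like(x), 0.0
    for _ in range(steps):
        gu, gb = 2.0 * u, 0.0
        if x @ u + b < 1.0:                     # positive hinge term
            gu -= C1 * x
            gb -= C1
        viol = negatives @ u + b > -1.0         # violated negative hinges
        gu += C2 * negatives[viol].sum(axis=0)
        gb += C2 * viol.sum()
        u -= lr * gu
        b -= lr * gb
    return u / (np.linalg.norm(u) + 1e-12)

def resvm(x, pool, depth):
    """Equation Nos. (10)/(10a) sketch: at each level, re-encode the positive
    feature and every pool feature with an E-SVM trained against the current
    pool (here each pool feature keeps itself in its own negative pool)."""
    w, N = x, pool
    for _ in range(depth):
        new_N = np.stack([esvm(z, N) for z in N])   # Equation No. (10a)
        w = esvm(w, N)                              # Equation No. (10)
        N = new_N
    return w

rng = np.random.default_rng(0)
pool = rng.normal(size=(8, 4))
w2 = resvm(rng.normal(size=4), pool, depth=2)
```

Note that each recursion level re-encodes the entire negative pool before the positive feature advances, mirroring the layered structure discussed below.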
In one example, the RE-SVM feature construction may be similar to deep architectures that use an output of a given layer as the input of a subsequent layer. However, an aspect of present principles, unlike deep architectures, determines E-SVM or RE-SVM features using an unsupervised learning routine on a per-image basis. As used herein, unsupervised may indicate automatic machine learning without user annotations to a training set. Additionally, each j-th RE-SVM feature (RE-SVMj) may be determined by solving a single, non-linear, convex problem, as opposed to a standard tandem linear/non-linear arrangement.
In one example, local descriptors may be extracted from images. The local descriptors may be extracted for the query image, the negative images, the database images that will be searched, or any other images that are used to learn classifiers. The descriptors may be aggregated based on a descriptor aggregating method, such as VLAD, the Fisher Vector, or Bag-of-Words to produce a single feature vector.
Since E-SVMs are agnostic to the underlying feature representation, it is possible to compute an E-SVM from each local descriptor by using an adequate set N of generic local descriptors, and further to do so recursively as in Equation Nos. (10) and (10a). The resulting RE-SVM local descriptors can subsequently be used to build aggregated features using the standard methods previously mentioned, such as VLAD, the Fisher Vector or Bag-of-Words. In turn, the subsequent features can be further processed using RE-SVM learning functions as discussed in connection with Equation Nos. (10) and (10a).
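As a hypothetical sketch of the aggregation step, the following toy VLAD encoder assigns each (RE-SVM) local descriptor to its nearest codeword and accumulates residuals. The codebook is assumed to be given (e.g., learned by k-means), and all names are illustrative:

```python
import numpy as np

def vlad(descriptors, codebook):
    """Toy VLAD aggregation: assign each local descriptor to its nearest
    codeword (Euclidean), accumulate residuals per codeword, then
    l2-normalize the flattened result."""
    K, d = codebook.shape
    agg = np.zeros((K, d))
    dists = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    assign = dists.argmin(axis=1)               # nearest codeword per descriptor
    for i, k in enumerate(assign):
        agg[k] += descriptors[i] - codebook[k]  # residual accumulation
    v = agg.ravel()
    return v / (np.linalg.norm(v) + 1e-12)

rng = np.random.default_rng(1)
descriptors = rng.normal(size=(50, 8))          # e.g., RE-SVM local descriptors
codebook = rng.normal(size=(4, 8))              # assumed given, e.g., by k-means
feat = vlad(descriptors, codebook)
```

The resulting K*d-dimensional vector is the single global feature that a subsequent RE-SVM learning function could further process.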
Accelerating E-SVM learning using similarity clustering
An aspect of present principles is directed to clustering generic, negative image features to reduce complexity. In one example, the computational complexity of E-SVM feature determinations is decreased despite the existence of a large number of generic, negative images. In one example, the number of generic negative image features is reduced based on corresponding centroid features. This reduction results in reduced storage and complexity. In one example, this decrease is achieved by deriving a smaller number of representative centroids from a large number of generic, negative features. As used herein, a "centroid" may refer to a vector that represents a subset of features. For example, the centroid may represent the mean of the subset, the medoid of the subset (i.e., the subset feature closest to the mean) or a singular vector or eigenvector derived from the subset. The centroids may replace the large number of generic features, thereby resulting in a smaller training set for an E-SVM function, and hence a faster learning time. The number of centroids must be chosen empirically, as using too few centroids may result in a non-representative generic negative pool.
In one example, the centroids may further have associated auxiliary data. In one example, the auxiliary data may be a weight that is associated with each centroid. In one example, the weight may represent the number of generic negative image features that are part of a corresponding centroid. In another example, the auxiliary data may consist of statistics derived from the generic negative image features of the corresponding centroid. The statistics may include the mean and correlation matrix of the generic negative image features of the corresponding centroid. In one example, the centroids may be determined based on a clustering approach that is applied to features of generic, negative images. In one example, the clustering approach may be applied to multi-modal distribution of generic images' features. In one example, a clustering approach may locate the centroids of the multi-modal distribution. The large amount of negative image features may then be replaced by a smaller set of centroids, weighted by their importance. In one example, the centroids may be determined based on:
Equation No. (11): ck = argmax over c with |c| = 1 of Σz'∈Nk (z'^T c)^2
Equation No. (11a): where Nk = {z' ∈ N : argmax_l z'^T cl = k}
The parameters ck and cl are centroids. The parameter K is the number of learned centroids. The parameter N is the generic negative pool. The parameter Nk is the subset of features from the generic negative pool that correspond to the centroid ck. The subset of features Nk may be assigned to centroid ck because they are closer to centroid ck than to any other centroid. The parameters z and z' are feature vectors from the generic negative pool. Each centroid ck may be determined based on the first left-singular vector of a matrix derived by stacking the vectors z ∈ Nk as columns.
In one example, the process for clustering features to determine centroids may include: i) updating each ck in accordance with Equation No. (11), ii) updating cluster information with a determination of whether a generic negative feature should be assigned to that cluster, and iii) repeating (i) and (ii) recursively. Unlike dictionary learning for sparse coding, the present clustering approach includes clustering that is sensitive to the sign of the inner product z'^T cl (compare with dictionary learning, where the cluster assignment is done using the absolute value, e.g., |z'^T cl|). In one example, the process for clustering or centroid determination may be performed by block 603 of Fig. 6 or blocks 658 and 661 of Fig. 6A. After all the centroids ck are determined, each feature z may be assigned a centroid. In one example, the feature z may be assigned to a centroid based on the distance between the feature z and the centroid.
After the features have been assigned to centroids, an E-SVM function may be performed based on the centroids of the generic negative images instead of the features of the generic negative images. For example, E-SVM features may be determined by applying an E-SVM function to centroids as follows:
Equation No. (12): u(x, N), b = argmin over w', b of |w'|^2 + C1 max(0, 1 - x^T w' - b) + C2 Σk=1..K Nk max(0, 1 + ck^T w' + b)
As shown above, Equation No. (12) corresponds to a modification of Equation No. (1).
The parameter ck may correspond to the kth centroid. The parameter Nk denotes the number of vectors in N assigned to ck. Letting |N| denote the number of feature vectors inside the generic negative pool N, the computation time required to solve the problem in Equation No. (12) is the same as that required when using only K < |N| generic negative features.
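Equation No. (12) can be sketched by weighting each centroid's hinge term by its count Nk. The subgradient solver and hyperparameters below are illustrative assumptions:

```python
import numpy as np

def esvm_centroids(x, centroids, weights, C1=10.0, C2=1.0, steps=500, lr=0.01):
    """Equation No. (12) sketch: each centroid ck stands in for the Nk
    negatives assigned to it, so its hinge term is weighted by Nk.
    Solver and hyperparameters are illustrative assumptions."""
    u, b = np.zeros_like(x), 0.0
    for _ in range(steps):
        gu, gb = 2.0 * u, 0.0
        if x @ u + b < 1.0:                         # positive hinge
            gu -= C1 * x
            gb -= C1
        viol = centroids @ u + b > -1.0             # violated centroid hinges
        gu += C2 * (weights[viol][:, None] * centroids[viol]).sum(axis=0)
        gb += C2 * weights[viol].sum()
        u -= lr * gu
        b -= lr * gb
    return u / (np.linalg.norm(u) + 1e-12)

rng = np.random.default_rng(2)
w = esvm_centroids(rng.normal(size=6), rng.normal(size=(3, 6)),
                   np.array([5.0, 4.0, 3.0]))       # Nk counts per centroid
```

Each iteration touches only K centroids rather than |N| negatives, which is the source of the claimed speed-up.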
Fig. 2 illustrates a flow chart of a method 200 for extracting E-SVM features. Block 201 may receive at least an image. In one example, block 201 may modify the image to include at least a mirrored, flipped, cropped, shifted and scaled version of the original image. Alternatively, block 201 may receive a plurality of images of a queried scene (e.g., a plurality of images of the same scene of the Statue of Liberty).
Block 202 may determine an initial feature vector for the image received at block 201.
The initial, base feature vector may be any type of feature vector, including VLAD, BoW, Fisher, and deep CNN activations. The initial feature vector of the image may also be known as the positive feature.
Block 203 may receive a set of generic images. In one example, the generic images may comprise the generic negative set as discussed in connection with Equation Nos. (1) and (3). The number of generic negative set images may be chosen to allow for a reasonable computational time on a target CPU (e.g., a per-image E-SVM feature construction time of about 1 second, corresponding to about 100,000 negative images). In one example, the content of the generic negative images varies between each negative image.
Block 204 may determine initial feature vectors for the generic, negative images received from block 203. For example, block 204 may determine an initial feature vector for each image in the set of generic images. The feature vectors determined by block 204 may be the same type of feature vectors as the feature vector determined by block 202. Block 204 may provide the determined feature vectors, i.e., the generic negative features, to block 205.
Block 205 may determine an E-SVM feature 210. Block 205 may determine the E-SVM feature 210 based on the initial feature vector from block 202 and the initial generic negative feature vectors received from block 204. Block 205 may include one or more of blocks 206, 207 and 208. In one example, block 205 may combine or omit the functionality described in connection with one or more of blocks 206, 207 and 208. Alternatively, the functionality of blocks 206-208 may be combined and performed by block 205. In one example, block 205 may perform a classifier determination other than the SVM determination. In one example, block 205 may perform an additional post-processing scheme on the determined exemplar feature.
Block 206 may perform an SVM determination based on the initial feature from block 202 and the initial generic negative features from block 204. Block 206 may perform an SVM or E-SVM function. Block 206 may determine a vector u and a bias parameter b. The vector u may be orthogonal to a hyper-plane that separates generic negatives from an exemplar feature(s). The bias parameter b may specify a displacement of the hyper-plane along the orthogonal. In one example, block 206 may determine the orthogonal vector u and the bias parameter b in accordance with the principles described in connection with Equation Nos. (1) and (3).
Block 207 may process a vector u from block 206. Block 207 may normalize the vector u to determine parameter w. Block 207 may optionally post-process the vector u or parameter w. For example, block 207 may perform post-processing such as an l2 normalization (e.g., in accordance with the principles described in connection with Equation Nos. (2) and (4)). In another example, the processing may be a square root operation related to the Hellinger kernel. In another example, the processing can be a sequential application of a plurality of post-processing solutions.
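The post-processing of block 207 may be sketched as follows; the signed square root used for the Hellinger-style variant is an assumed interpretation, and the function name is hypothetical:

```python
import numpy as np

def postprocess(u, hellinger=False):
    """Post-process the SVM vector u from block 206: optional signed square
    root (Hellinger-kernel-style, an assumed interpretation), followed by
    l2 normalization per Equation Nos. (2) and (4)."""
    w = np.sign(u) * np.sqrt(np.abs(u)) if hellinger else u.copy()
    return w / (np.linalg.norm(w) + 1e-12)

w = postprocess(np.array([4.0, -1.0, 0.0]), hellinger=True)
```

Applying both steps in sequence illustrates the "sequential application of a plurality of post-processing solutions" mentioned above.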
Block 205 may output an E-SVM feature 210. The E-SVM feature 210 may correspond to the E-SVM feature illustrated in Fig. 1C. Alternatively, the E-SVM feature 210 may be a post-processed version of the E-SVM feature illustrated in Fig. 1C.
Fig. 1C illustrates an orthogonal vector u 174 and an E-SVM feature w 170. An SVM algorithm may determine an orthogonal vector u 174 and a bias parameter b 173. The orthogonal vector u 174 may be determined as discussed in connection with blocks 205 and 206 of Fig. 2. The orthogonal vector u 174 may be orthogonal to a separating hyper-plane 172. The orthogonal vector u 174 may be determined based on an exemplar feature x that may represent a queried image or a positive image concept. The orthogonal vector u 174 may be normalized to produce the E-SVM feature w 170. For example, the orthogonal vector u 174 may be normalized as described in connection with block 207 of Fig. 2. The parameter u and/or the E-SVM feature w 170 may be determined based on negative vectors zi 175. In one example, negative vectors zi may be determined in accordance with the principles described in connection with block 204 of Fig. 2. The negative vectors zi 175 may be selected from a pool of generic features 176.
Fig. 3 illustrates a flow chart of a method 300 for determining RE-SVM feature(s) in accordance with present principles. Block 301 may receive one or more image(s).
Block 302 may determine an initial feature for the image from block 301. The initial feature vector of the image may be known as the initial positive feature(s). In one example, block 302 may determine the initial feature vector in accordance with the principles described in connection with block 202 of Fig. 2.
Block 303 may receive a set of generic images. The generic images may be negative images determined in accordance with the principles described in connection with block 203 of Fig. 2. Block 304 may determine initial negative features for the generic, negative images from block 303. Block 304 may determine initial negative features in accordance with the principles described in connection with block 204 of Fig. 2. In one example, block 304 may be performed off-line or prior to the performance of method 300.
Block 305 may determine a first, positive RE-SVM feature (RE-SVM1) based on the positive feature(s) from block 302 and the negative features from block 304. Block 305 may determine the RE-SVM1 feature based on an SVM determination, e.g., by applying an E-SVM function to the positive and negative features. Block 305 may perform an SVM determination as discussed in connection with Fig. 2 (e.g., blocks 205-207) and Fig. 1C. Block 305 may determine the RE-SVM1 feature as described in connection with Equation Nos. (1)-(4).
Block 306 may determine first negative RE-SVM1 features N^1 corresponding to the generic image features from block 304. In one example, block 306 may determine the first generic negative RE-SVM1 features in accordance with the principles described in connection with Equation Nos. (1)-(4), (10) and (10a). For example, block 306 may use the output of block 304 to replace the set of generic features discussed in connection with Equation Nos. (1)-(4), (10) and (10a). In one example, block 306 may use the output of block 304 as a generic negative pool N^0 and may then use each feature in the pool N^0 to determine a corresponding RE-SVM feature, while using all or some of the features of the pool N^0 as a generic negative pool. The RE-SVM features may be determined as described in connection with Equation Nos. (1)-(4), (10) and (10a). In one example, block 306 may be optional and the image features from block 304 may be provided directly to block 307. In one example, block 306 may be performed off-line or prior to the performance of method 300.

Block 307 may determine a second, positive RE-SVM feature (RE-SVM2). Block 307 may determine the RE-SVM2 feature based on the first, positive RE-SVM1 feature and the first negative RE-SVM1 features N^1. Block 307 may determine the RE-SVM2 feature based on an SVM determination. The SVM determination of block 307 would consider the first, positive RE-SVM1 feature as the positive exemplar and the first negative RE-SVM1 features N^1 as the negative exemplars. Block 307 may perform an SVM determination as discussed in connection with Fig. 2 (e.g., blocks 205-208) and Fig. 1C. Block 307 may determine the RE-SVM2 feature as described in connection with Equation Nos. (1)-(4), (10) and (10a).
Block 309 may determine an Nth positive RE-SVMN feature. Block 309 may recursively determine the RE-SVMN feature based on an RE-SVMN-1 positive feature and RE-SVMN-1 negative features. The RE-SVMN-1 negative features may be determined by block 308. The RE-SVMN-1 negative features may be determined based on the RE-SVMN-2 negative features. Each RE-SVMN-1 negative feature may be determined in accordance with the principles described in connection with Equation Nos. (1)-(4), (10) and (10a). Each RE-SVMN-1 negative feature is computed by using the corresponding RE-SVMN-2 negative feature as a positive feature and using all other RE-SVMN-2 features as negative features. In one example, block 308 may be optional and the image features from blocks 304 or 306 may be provided directly to block 309. In one example, block 308 may be performed off-line or prior to the performance of method 300.
Block 309 may determine the final, positive RE-SVMN feature 310 based on N iterations of the recursive framework. Block 309 may determine the RE-SVMN feature 310 based on a positive RE-SVMN-1 feature and RE-SVMN-1 negative features. For example, when N=2, the output RE-SVMN feature 310 is RE-SVM2 from block 307. In another example, when N=5, RE-SVM feature 310 may be RE-SVM5, which may be determined based on a previously determined RE-SVM4 feature and previously determined fourth negative RE-SVM features for generic, negative images.
Block 309 may determine the RE-SVMN feature 310 based on an SVM determination. Block 309 may perform an SVM determination as discussed in connection with Fig. 2 (e.g., blocks 205-208) and Fig. 1C. Block 309 may determine the RE-SVMN feature as described in connection with Equation Nos. (1)-(4), (10) and (10a). The RE-SVM determination may be repeated N times to determine the RE-SVMN feature.
Fig. 4 illustrates an exemplary image search based on RE-SVM features. Block 401 may receive positive images (e.g., one or more query images). The positive images may represent a visual concept (e.g., cat). Block 402 may determine positive RE-SVM features for the positive images from block 401. Block 402 may determine positive RE-SVM features in accordance with the principles described in connection with Equation Nos. (1)-(4) and Fig. 3.
Block 403 may receive negative images. The negative images may be counter examples of the visual concept (e.g., images not containing cats). In one example, the negative images may be images from a set of generic, negative images. Block 404 may determine negative RE-SVM features for the negative images from block 403. In one example, block 404 may determine negative RE-SVM features in accordance with the principles described in connection with Equation Nos. (1)-(4), (10) and (10a) and Fig. 3.
Block 405 may learn a classifier based on the negative and positive RE-SVM features received from blocks 402 and 404, respectively. The learned classifier can be an SVM classifier, or a KNN classifier, or a nearest class mean classifier, or any other type of classifier.
Block 406 may determine image search results based on the classifier from block 405 and features from a large database of image features 407. The large database of image features 407 may contain the features of the images that will be searched. The large database of image features 407 may be determined in accordance with the principles described in connection with blocks 143-145 of Fig. 1A and block 154 of Fig. 1B. Block 406 may determine a score for each image by applying the classifier to each of the search image features. In a symmetric determination, the search features of block 407 may be RE-SVM features determined in accordance with the principles described in connection with Fig. 3 and Equation Nos. (10) and (10a). The search results may be ranked based on the score and the ranking may be provided along with the results.
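The scoring and ranking of block 406 can be sketched as follows (all names are hypothetical):

```python
import numpy as np

def rank_database(v, b, db_features):
    """Block 406 sketch: score every search feature with the learned linear
    classifier (v, b) and return indices sorted by descending score."""
    scores = db_features @ v + b
    order = np.argsort(-scores)                 # best match first
    return order, scores

v, b = np.array([1.0, -1.0]), 0.5               # hypothetical learned classifier
db = np.array([[0.2, 0.9], [1.0, 0.0], [0.4, 0.4]])
order, scores = rank_database(v, b, db)
```

The same routine applies whether the database features are base, E-SVM or RE-SVM features, since the classifier is linear in whichever representation it was trained on.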
Fig. 5 illustrates a flow chart of a method 500 for extracting RE-SVM local descriptors. Block 501 may extract N local descriptors from an input image. Block 502 may extract many local descriptors from a generic pool of images. Block 503 may compute RE-SVM local descriptors based on the local descriptors extracted at blocks 501 and 502. Block 504 may output the results of block 503. The output of block 504 may be provided to a local descriptor encoder, such as a VLAD, Fisher or Bag-of-Words local descriptor encoder. The N local descriptors provided by block 504 may be aggregated into a single global image feature. The global feature may represent features of a query, positive, negative, training, search or any other image and may be used during image searching.

Fig. 6 illustrates a flow chart of a method 600 for accelerated E-SVM or RE-SVM feature determination. Method 600 may include a block 601 which may receive features of an image. In one example, block 601 may receive the image and may determine the image features. In one example, the features may be E-SVM or RE-SVM features as discussed in connection with Figs. 2 and 3. Alternatively, the features may be initial image features or any other type of features. Block 601 may determine the E-SVM or RE-SVM image features as discussed in connection with methods 200 and 300 of Figs. 2 and 3, respectively. In another example, the features may be E-SVM or RE-SVM features derived from local descriptors. Block 601 may provide the image features to block 604.
Block 602 may receive a large set of features of negative, generic images. The large set of features may have a size that is a multiple of the size of the feature vector (e.g., 10x, 100x, 1000x). Each feature vector may have any size, including in the range of 100 to tens of thousands of dimensions. The total size of the large set may be very high, including over 100,000 or over a million features.
In one example, block 602 may receive features of generic images as described in connection with blocks 303 and 304. In another example, block 602 may determine the features of generic images. The features of the generic images may be initial, base features, E-SVM features or RE-SVM features. In another example, the features may be E-SVM or RE-SVM features derived from local descriptors. Block 602 may provide the features of the generic images to block 603.
Block 603 may perform similarity clustering for the generic features from block 602. In one example, block 603 may perform cosine-similarity clustering. In one example, block 603 may perform similarity clustering in accordance with the principles described by Equation Nos. (11) and (11a) and output the resulting centroids ck.
In one example, block 603 may perform similarity clustering in accordance with the principles described in connection with Fig. 6A. In one example, block 603 may determine each centroid by: i) initializing the clusters, ii) updating each centroid ck and each corresponding cluster Ck, and iii) repeating (i) and (ii) recursively. Unlike dictionary learning for sparse coding, the present clustering approach includes clustering that is sensitive to the sign of the inner product z'^T cl (compare with dictionary learning, where the cluster assignment is done using the absolute value, e.g., |z'^T cl|). Block 603 may optionally perform the clustering off-line before any image searching is carried out. For example, the processing associated with block 603 may be performed prior to image searching and the results may be stored.
Block 604 may perform the E-SVM function using as generic negatives the centroids ck from block 603. For example, the centroids ck may be utilized in place of the generic negative features typically utilized by an E-SVM function. In one example, block 604 may perform the E-SVM function by further considering auxiliary data for each centroid.
In one example, block 604 may perform an E-SVM function based on the centroids from block 603 as the generic negative features. For example, block 604 may perform an E-SVM function as described in accordance with the principles of block 205 of Fig. 2. In another example, block 604 may perform the RE-SVM determination according to blocks 305, 307, or 309 of Fig. 3.
In one example, the auxiliary data may represent the number of generic negative feature vectors Nk in each cluster k. The variable k, with k = 1, ..., K, identifies a cluster number, and the variable K denotes the total number of clusters. Block 604 may determine scaled centroids by multiplying each centroid ck by the corresponding Nk. Block 604 may perform an E-SVM function based on the scaled centroids. Block 604 may perform E-SVM determinations in accordance with Equation Nos. (1)-(4) and (12).

Fig. 6A illustrates an exemplary method for performing similarity clustering in accordance with present principles. Fig. 6A includes a block 650 that may receive generic negative features. The generic features may be E-SVM, RE-SVM or base features such as VLAD, Bag-of-Words or Fisher vector features. In one example, the generic negative features may be the features received from block 602 in Fig. 6. In one example, the set of generic negative features may have a size N (i.e., there are N generic negative features received by block 650). In one example, block 650 may receive the actual values of each generic negative feature. In another example, block 650 may receive one or more pointers to the values of the generic negative features. Each generic negative feature may be represented by a vector.
Block 651 may initialize centroids in accordance with present principles. In one example, block 651 may initialize a set of centroids c1, ..., cK, where there are a total number of K centroids. In one example, block 651 may initialize centroids by randomly choosing K generic negative features from the features received in block 650. The value of K is determined empirically and should be smaller than the total number of generic negative features. In one example, block 651 may initialize the centroids based on the K generic negative features. For example, block 651 may loop through the K generic negative features and assign each feature to a corresponding centroid. For example, block 651 may assign a first negative feature to a first centroid c1, a second negative feature to a second centroid c2, a third negative feature to a third centroid c3, and so on until the Kth negative feature, which is assigned to centroid cK. In another example, the centroids may be initialized to fixed values that do not depend on the generic negatives. For example, each centroid may be assigned a value that is determined by a random number generator. Each centroid may be a vector of the same size as the features stored in the set N.
Block 652 may begin a loop (hereinafter "loop(l)") that will loop and perform the processing of blocks 653-662 using a variable i having a range from one through a number Z. The number Z may signify the number of loop iterations that will be performed to optimize the values of the centroids. In one example, the number Z may be received from an outside source (e.g., may be set by a user) or may be empirically determined. Block 652 may pass control to block 653.
Block 653 may initialize clusters in accordance with present principles. In one example, block 653 may initialize a set of clusters C1, C2, ..., CK. The set of clusters may have the same number of elements (K) as the set of centroids initialized by block 651. Each cluster C may have a one-to-one association with a corresponding centroid. For example, cluster C1 corresponds to centroid c1, cluster C2 corresponds to centroid c2, cluster C3 corresponds to centroid c3, ..., and cluster CK corresponds to centroid cK. Each cluster C is a list designed to point to features associated with the corresponding centroid of that cluster. Block 653 may initialize all the clusters (e.g., clusters C1, C2, ..., CK) to zero. Block 653 may then pass control to block 654.

Block 654 may begin a loop (hereinafter "loop(2)") that will loop through all the generic negative features from block 650 using a variable j having a range from one to the total number of generic negative features N. Block 654 may pass control to block 655. For example, when j=1, loop(2) may consider generic negative feature z1. Likewise, when j=4, loop(2) may consider generic negative feature z4.
Block 655 may begin a loop (hereinafter "loop(3)") that will loop through all the centroids c1, ..., cK and their corresponding clusters C1, ..., CK using a variable l having a range from one to the total number of centroids K. Block 655 may pass control to block 656.
Block 656 may determine a distance between the feature zj and a centroid cl. In one example, block 656 may determine the distance based on the signed inner product or the absolute value of an inner product between the feature zj and the centroid cl. Block 656 may pass control to block 657 when the value of l is greater than K. Block 657 may end loop(3). Block 657 may then pass control to block 658.
Block 658 may assign feature zj to a closest centroid. In one example, block 658 may determine the closest centroid to the feature zj based on the distance between the feature zj and all the centroids determined by loop(3). In one example, block 658 may determine the centroid with the highest signed inner product between feature zj and the centroid, consistent with Equation No. (11a). In another example, block 658 may determine the closest centroid based on any other method that compares the distances determined by loop(3). Block 658 may assign the feature zj to the closest centroid by updating the cluster corresponding to that closest centroid. In one example, block 658 may update the corresponding cluster to include an identifier to feature zj.
Block 658 may pass control to block 659 when the value of j is greater than N. Block 659 may end loop(2). Block 659 may pass control to block 660.
Block 660 may begin a loop (hereinafter "loop(4)") that will loop through all the centroids c1, ..., cK and their corresponding clusters C1, ..., CK using a variable m having a range from one to the total number of centroids K. Block 660 may pass control to block 661.
Block 661 may update the value for the centroid cm. In one example, block 661 may update the value of the centroid cm based on the first left-singular vector of the features identified by the corresponding cluster Cm. Block 661 may review the values of the cluster Cm that corresponds to the centroid cm. The cluster Cm may identify the features clustered with centroid cm (e.g., the features closest to centroid cm). Block 661 may create a matrix Mm based on the feature vectors identified by cluster Cm. This matrix may be the correlation matrix for the features assigned to cluster Cm. Block 661 may then determine an updated value for the centroid cm based on the left-singular vector of the matrix Mm with the strongest singular value. Block 661 may pass control to block 662 when the value of m is greater than K. Block 662 may end loop(4). Block 662 may pass control to block 663 when the value of i is greater than Z. Block 663 may further optionally output updated values for all the centroids c1, ..., cK.
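The Fig. 6A procedure may be sketched as follows; the sign correction applied after the singular-vector update is an added assumption to keep the sign-sensitive assignment stable, and all names and iteration counts are illustrative:

```python
import numpy as np

def similarity_cluster(Z, K, iters=10, seed=0):
    """Sketch of Fig. 6A: initialize K centroids from the generic negative
    features Z (one feature per row), then alternate (i) assigning each
    feature to the centroid maximizing the signed inner product z^T cl
    (Equation No. (11a)) and (ii) updating each centroid to the first
    left-singular vector of its cluster's features stacked as columns."""
    rng = np.random.default_rng(seed)
    C = Z[rng.choice(len(Z), size=K, replace=False)].copy()
    for _ in range(iters):                      # loop(1)
        assign = (Z @ C.T).argmax(axis=1)       # loop(2)/loop(3): assignment
        for k in range(K):                      # loop(4): centroid update
            members = Z[assign == k]
            if len(members) == 0:
                continue                        # leave empty clusters as-is
            U, _, _ = np.linalg.svd(members.T, full_matrices=False)
            c = U[:, 0]
            # Assumed sign fix: SVD leaves the singular vector's sign
            # arbitrary; orient it toward its cluster members.
            C[k] = c if c @ members.sum(axis=0) >= 0 else -c
    assign = (Z @ C.T).argmax(axis=1)           # final assignment
    weights = np.bincount(assign, minlength=K)  # Nk per centroid
    return C, assign, weights

rng = np.random.default_rng(3)
Z = rng.normal(size=(20, 5))
C, assign, weights = similarity_cluster(Z, K=3)
```

The returned weights correspond to the per-centroid counts Nk used as auxiliary data in Equation No. (12).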
Fig. 6B illustrates a plot of similarity clustering in accordance with present principles. In one example, Fig. 6B illustrates an example of the similarity clustering performed by the method of Fig. 6A or by block 603 of Fig. 6. Fig. 6B illustrates a two-dimensional feature space in which generic negative features of size two exist, with each of the two axes corresponding to one of the entries in each feature vector zi. Fig. 6B illustrates a set of 10 generic negative features z1 671, z2 672, z3 673, z4 674, z5 675, z6 676, z7 677, z8 678, z9 679, z10 680. Fig. 6B further illustrates three centroids c1 681, c2 682, c3 683 and their corresponding weights N1 691, N2 692, N3 693. In an example, each of the weights N1, N2, N3 may represent the number of nearest features of the corresponding centroid c1, c2, c3. Each centroid c1, c2, c3 may be learned in accordance with the principles described in connection with Fig. 6A.
As illustrated in Fig. 6B, each centroid c1, c2, c3 may be associated with a respective Cluster 1, Cluster 2 and Cluster 3. Each feature zi is part of the cluster of its closest or nearest centroid. For example, centroid c1 681 may have a cluster of features z6 676, z7 677, z8 678. Furthermore, centroid c1 681 may have an associated weight N1 691. The value of weight N1 may be 3, indicating that there are three features assigned to the cluster of centroid c1 681. Centroid c2 682 may have a cluster of features z1 671, z3 673, z4 674, z9 679. Furthermore, centroid c2 682 may have an associated weight N2 692. The value of weight N2 may be 4, indicating that there are four features assigned to the cluster of centroid c2 682. Centroid c3 683 may have a cluster of features z2 672, z5 675, z10 680. Furthermore, centroid c3 683 may have an associated weight N3 693. The value of weight N3 may be 3, indicating that there are three features assigned to the cluster of centroid c3 683.
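The cluster assignments and weights N1, N2, N3 illustrated in Fig. 6B may be computed as in the following sketch, which assumes Euclidean nearest-centroid assignment; the function name and array layout are illustrative assumptions.

```python
import numpy as np

def assign_and_weights(Z, C):
    """Assign each feature (column of Z) to its nearest centroid (column of C)
    under Euclidean distance, and return the per-centroid weights
    N_m = number of features assigned to cluster m."""
    # squared distance between every feature and every centroid: (N, K)
    d2 = ((Z[:, :, None] - C[:, None, :]) ** 2).sum(axis=0)
    assign = d2.argmin(axis=1)
    weights = np.bincount(assign, minlength=C.shape[1])
    return assign, weights
```

With the ten features and three centroids of Fig. 6B, the weights would come out as N1 = 3, N2 = 4 and N3 = 3.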
Fig. 7 is a block diagram illustrating an exemplary image processing system 700. The image processing system includes an image processing device 710 and a display 720. In one example, the device 710 and the display 720 may be connected by a physical link. In another example, the device 710 and the display 720 may communicate via a communication link, such as, for example, a wireless network, a wired network, a short range communication network or a combination of different communication networks.
The display 720 may allow a user to interact with image processing device 710, including, for example, inputting criteria for performing an image search. The display 720 may also display the output of an image search.
The image processing device 710 includes a memory and processor 730 and any other software or hardware that allow the performance of RE-SVM image searcher 740, RE-SVM image feature determinator 750 and accelerated feature determinator 760. The device 710 may be configured to perform any of the features of the blocks depicted in Figs. 2-6A.
Each of the RE-SVM image searcher 740, RE-SVM image feature determinator 750 and accelerated feature determinator 760 may be executed in whole or in part on image processing device 710. Alternatively, each of the RE-SVM image searcher 740, RE-SVM image feature determinator 750 and accelerated feature determinator 760 may be performed remotely and the results may be communicated to device 710 via a communication link. The dashed boxes may indicate that each of the box processes may be executed on image processing device 710 or may be executed remotely.
Fig. 8 illustrates an example of various image processing devices 801-804 and a server 805. The image processing devices may be smartphones (e.g., device 801), tablets (e.g., device 802), laptops (e.g., device 803), or any other image processing device 804 that includes software and hardware to execute the features described in the Figures. The image processing devices 801-804 may be similar to the image processing device 710 and the image processing system 700 described in connection with Fig. 7. RE-SVM image searcher 740, RE-SVM image feature determinator 750 and accelerated feature determinator 760 may be executed on any of the devices 801-804, on server 805, or in a combination of any of the devices 801-804 and server 805.
Fig. 9 represents an exemplary architecture of an apparatus 900 which may be configured to implement methods described in relation with Figs. 1-6A and Equation Nos. 1-10a. In one example, apparatus 900 represents an apparatus that may be configured to implement the methods according to present principles, including principles described in relation to Figs. 1-6A.
Apparatus 900 comprises the following elements that are linked together by a data and address bus 901:
- a microprocessor 902 (or CPU), which is, for example, a DSP (or Digital Signal Processor);
- a ROM (or Read Only Memory) 903;
- a RAM (or Random Access Memory) 904;
- an I/O interface 905 for reception of data to transmit, from an application; and
- a battery 906 (or other suitable power source).
According to a variant, the battery 906 is external to the apparatus. In each of the mentioned memories, the word « register » used in the specification can correspond to an area of small capacity (some bits) or to a very large area (e.g. a whole program or a large amount of received or decoded data). ROM 903 comprises at least a program and parameters. An algorithm of the methods according to the invention is stored in the ROM 903. When switched on, the CPU 902 uploads the program into the RAM 904 and executes the corresponding instructions. RAM 904 comprises, in a register, the program executed by the CPU 902 and uploaded after switch-on of the apparatus 900, input data in a register, intermediate data in different states of the method in a register, and other variables used for the execution of the method in a register.
The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method or an apparatus), the implementation of features discussed may also be implemented in other forms (for example a program). An apparatus may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
According to a specific example of image processing, the image or picture I is obtained from a source. For example, the source belongs to a set comprising:
- a local memory (903 or 904), e.g. a video memory or a RAM (or Random Access Memory), a flash memory, a ROM (or Read Only Memory), a hard disk;
- a storage interface (905), e.g. an interface with a mass storage, a RAM, a flash memory, a ROM, an optical disc or a magnetic support;
- a communication interface (905), e.g. a wireline interface (for example a bus interface, a wide area network interface, a local area network interface) or a wireless interface (such as an IEEE 802.11 interface or a Bluetooth® interface); and
- an image capturing circuit (e.g. a sensor such as, for example, a CCD (or Charge-Coupled Device) or CMOS (or Complementary Metal-Oxide-Semiconductor)).
According to different embodiments, the decoded image I is sent to a destination; specifically, the destination belongs to a set comprising:
- a local memory (903 or 904), e.g. a video memory or a RAM, a flash memory, a hard disk;
- a storage interface (905), e.g. an interface with a mass storage, a RAM, a flash memory, a ROM, an optical disc or a magnetic support;
- a communication interface (905), e.g. a wireline interface (for example a bus interface (e.g. USB (or Universal Serial Bus)), a wide area network interface, a local area network interface, an HDMI (High Definition Multimedia Interface) interface) or a wireless interface (such as an IEEE 802.11 interface, a Wi-Fi® or a Bluetooth® interface); and
- a display.
According to different examples, the bitstream BF and/or F are sent to a destination. As an example, one of bitstream F and BF or both bitstreams F and BF are stored in a local or remote memory, e.g. a video memory (904) or a RAM (904), a hard disk (903). In a variant, one or both bitstreams are sent to a storage interface (905), e.g. an interface with a mass storage, a flash memory, ROM, an optical disc or a magnetic support and/or transmitted over a communication interface (905), e.g. an interface to a point to point link, a communication bus, a point to multipoint link or a broadcast network.
According to different examples, the bitstream BF and/or F is obtained from a source. Exemplarily, the bitstream is read from a local memory, e.g. a video memory (904), a RAM (904), a ROM (903), a flash memory (903) or a hard disk (903). In a variant, the bitstream is received from a storage interface (905), e.g. an interface with a mass storage, a RAM, a ROM, a flash memory, an optical disc or a magnetic support and/or received from a communication interface (905), e.g. an interface to a point to point link, a bus, a point to multipoint link or a broadcast network.
According to different examples, apparatus 900, being configured to implement methods in accordance with present principles, belongs to a set comprising:
- a mobile device;
- a communication device;
- a game device;
- a tablet (or tablet computer);
- a laptop;
- a still image camera;
- a video camera;
- an encoding chip;
- a still image server; and
- a video server (e.g. a broadcast server, a video-on-demand server or a web server).
According to different examples, apparatus 900, being configured to implement an image processing process in accordance with present principles, belongs to a set comprising:
- a mobile device;
- a communication device;
- a game device;
- a set top box;
- a TV set;
- a tablet (or tablet computer);
- a laptop;
- a display; and
- a decoding chip.
Implementations of the various processes and features described herein may be embodied in a variety of different equipment or applications. Examples of such equipment include an encoder, a decoder, a post-processor processing output from a decoder, a preprocessor providing input to an encoder, a video coder, a video decoder, a video codec, a web server, a set-top box, a laptop, a personal computer, a cell phone, a PDA, and other communication devices. As should be clear, the equipment may be mobile and even installed in a mobile vehicle.
An aspect of present principles may be implemented on any electronic device or combination of electronic devices. For example, the present principles may be implemented on any of a variety of devices including a computer, a laptop, a smartphone, a handheld computing system, a remote server, or on any other type of dedicated hardware. Various examples of the present invention are described below with reference to the figures.
In one example, experiments were conducted based on Algorithm 1 shown below, where the indicator 1a&lt;b equals 1 if a &lt; b and 0 otherwise.
Algorithm 1: E-SVM feature encoding with PEGASOS.
Input: x, w1, λ, T, fP
for t = 1, ..., T do
    if the exemplar is to be sampled at iteration t (at rate fP) then
        Set z = x, y = 1
    else
        Draw z from the pool of generic negative features, set y = −1
    end if
    Set wt+1 = wt − (1/(λt)) (λ wt − 1{y wtT z < 1} y z)
end for
Algorithm 1 may perform determinations as discussed in connection with Equation Nos. (1) and (3). The normalization in Equation Nos. (2) and (4) is carried out at the end of the algorithm, after the final SGD iteration.
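Under the assumption that Algorithm 1 follows the standard Pegasos update with an exemplar sampling rate fP, the E-SVM encoding may be sketched as follows. The parameter names and the deterministic sampling schedule (one exemplar draw every fP iterations) are assumptions made for illustration; they are not the claimed implementation.

```python
import numpy as np

def esvm_encode(x, Z_neg, lam=1e-4, T=1000, fP=30, rng=None):
    """Pegasos-style SGD sketch of E-SVM feature encoding.

    x     : (d,) positive exemplar feature
    Z_neg : (N, d) pool of generic negative features
    fP    : exemplar sampling rate (roughly one positive draw per fP steps)
    """
    rng = np.random.default_rng() if rng is None else rng
    w = np.zeros(x.shape[0])
    for t in range(1, T + 1):
        if t % fP == 0:                # sample the exemplar
            z, y = x, 1.0
        else:                          # sample a random negative
            z, y = Z_neg[rng.integers(len(Z_neg))], -1.0
        eta = 1.0 / (lam * t)          # Pegasos step size
        hinge_active = 1.0 if y * w.dot(z) < 1.0 else 0.0
        w = w - eta * (lam * w - hinge_active * y * z)
    return w / np.linalg.norm(w)       # final normalization
```

Each encoded feature is thus the normalized weight vector of a linear classifier separating the single exemplar x from the pool of generic negatives, in line with the normalization described above.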
As part of the experiments discussed herein, Algorithm 1 was performed on two publicly available datasets, Holidays and Oxford. The Holidays dataset consists of 1491 images of vacation shots divided into 500 groups of matching images. The second dataset, the Oxford dataset, consists of close to 5000 images of buildings from the city of Oxford. The images are divided into 55 groups of matching images, and a query image is specified for each group. The query images are full images (as opposed to cropped regions). Both datasets include a specific evaluation protocol based on mean Average Precision (mAP) that is used throughout.
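The mAP protocol used by both benchmarks may be summarized by the following sketch, which assumes each query returns a full ranking containing all of its relevant images; the helper names are illustrative.

```python
def average_precision(ranked_relevance):
    """AP for one query: ranked_relevance is a binary sequence ordered by
    decreasing retrieval score (1 = relevant image at that rank).
    Assumes the ranking contains every relevant image for the query."""
    hits, precisions = 0, []
    for i, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / i)   # precision at each relevant rank
    return sum(precisions) / max(hits, 1)

def mean_average_precision(all_ranked):
    """mAP: mean of the per-query average precisions."""
    return sum(average_precision(r) for r in all_ranked) / len(all_ranked)
```

For example, a query whose ranked list hits a relevant image at ranks 1 and 3 has AP = (1/1 + 2/3) / 2 = 5/6.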
Figs. 10-13 illustrate the effect of the RE-SVM encoding parameters on mAP performance on the Holidays dataset. Fig. 10 illustrates the effect of the negative pool size N on the performance of the system. As shown in Fig. 10, the mean Average Precision (mAP) (vertical axis) increases when the size of the negative pool (horizontal axis) is larger. In other experiments, the pool size may be set to N = 60e3. Using larger pools could indeed increase the system performance, but at the expense of more encoder memory and longer Stochastic Gradient Descent (SGD) runtimes.
Fig. 11 illustrates mAP versus the number of SGD iterations for two different pool sizes (N = 10e3 and 60e3). As shown in Fig. 11, when the pool size N equals 60e3, the benefit of increasing the number T of iterations saturates after 100e3 iterations. Due to this observed saturation, T = 100e3 iterations was utilized in other experiments.
Table 1 illustrates runtimes for E-SVM learning. As shown in Table 1, using 100e3 iterations takes 620 ms with a non-optimized implementation using the "C" programming language.
Iterations    1e3      10e3     100e3
Run-time      10 ms    70 ms    620 ms
Table 1: E-SVM encoding runtime when using a single core running at 2.6 GHz.
Fig. 12 illustrates the use of recursively trained RE-SVMs. Fig. 12 displays mAP on the vertical axis and the number of RE-SVM recursions on the horizontal axis. As shown in Fig. 12, the RE-SVM results are determined based on r recursions (referred to as RE-SVM-r), for r = 0, ..., 6, where r = 0 refers to the baseline results obtained with the VLAD encoder. As shown in Fig. 12, a single RE-SVM recursion produces a gain of close to 4 mAP points relative to the VLAD encoder. As shown in Fig. 12, using 6 recursions produces a gain of close to 5 mAP points.
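The r recursions of RE-SVM-r may be sketched as follows, where esvm is any E-SVM encoder mapping a feature and a negative pool to a new feature. The recursion structure shown here, re-encoding both the query feature and every pool feature at each level, is an illustrative assumption, not the claimed procedure.

```python
def re_svm(x0, negatives0, r, esvm):
    """RE-SVM-r sketch.

    x0         : base encoding of the query image (r = 0 baseline)
    negatives0 : list of base encodings of the generic negative pool
    esvm       : callable (feature, negative_pool) -> new feature
    """
    x, negatives = x0, list(negatives0)
    for _ in range(r):
        # Re-encode every negative against the rest of the pool...
        new_negatives = [esvm(negatives[j], negatives[:j] + negatives[j + 1:])
                         for j in range(len(negatives))]
        # ...and the query feature against the current pool.
        x = esvm(x, negatives)
        negatives = new_negatives
    return x
```

With r = 0 the base encoding is returned unchanged, matching the baseline point in Fig. 12.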
Fig. 13 illustrates the effect of varying the exemplar sampling rate fP during the SGD optimization process. Note that any of the tested values of fP results in an improvement relative to the baseline VLAD encoder.
Figs. 14 and 15 illustrate the robustness of the RE-SVM methods in accordance with present principles. In particular, they demonstrate the robustness relative to a large number of distractor images from Flickr, different from those used for the negative pool. The distractor images are encoded in the same manner as the benchmark images using VLAD + RE-SVM-2. The parameters used for both RE-SVM recursions are N = 60e3, T = 100e3, and fP = 30. It is noted that the same parameters selected on Holidays also give an important improvement on the Oxford dataset. Moreover, this improvement is constant over the entire range of distractor images considered. For the Holidays dataset, the improvement is in excess of 5 mAP points for the entire range of distractor images. For the Oxford dataset, the improvement is in excess of 10 mAP points, likewise for the entire range of distractor images.
Table 2 illustrates the performance of the base VLAD-64, BoW-1000 and Fisher-64 encodings on the Holidays and Oxford benchmarks, along with the performance of the RE-SVM-1 and RE-SVM-2 features derived from each encoding. As illustrated in Table 2, even a single RE-SVM recursion gives a large boost to all three encodings. The Fisher vector, in particular, performs poorly initially, but gains as many as 35 mAP points (on the Holidays dataset) after two RE-SVM recursions to outperform BoW. As shown in Table 2, the RE-SVM system of present principles also gives an important advantage (3.6 mAP points) when using CNN-based features as initial features.
                        Holidays    Oxford 5K
VLAD-64                 72.7        46.3
VLAD-64 + RE-SVM-1      77.5        55.5
VLAD-64 + RE-SVM-2      78.3        57.5
Fisher-64               18.2        9.27
Fisher-64 + RE-SVM-1    59.8        27.7
Fisher-64 + RE-SVM-2    63.6        31.5
BoW-1000                38.6        17.0
BoW-1000 + RE-SVM-1     44.5        20.5
BoW-1000 + RE-SVM-2     49.1        25.5
CNN                     64.2        32.2
CNN                     68.2        40.6
CNN + RE-SVM-1          71.3        43.9
CNN + RE-SVM-2          71.8        44.6
Table 2: Results for VLAD, BoW, Fisher and CNN encodings and their RE-SVM-1 and RE-SVM-2 variants.
Table 3 illustrates the present method in a Pascal VOC image classification task when using either Nearest Class Mean (NCM) or K-Nearest Neighbors (K-NN) classifiers. As shown in Table 3, there are improvements of close to 4 mAP points for the NCM classifier and up to 2 mAP points for the K-NN classifier. The NCM and K-NN classifiers have the important advantage that new classifiers can be added at near zero cost, contrary to one-vs-rest approaches using linear classifiers that require that the classifiers for all classes be updated periodically when adding new classes. NCM in particular further enjoys a very low testing cost.
Classifier       CNN     CNN + RE-SVM-1
NCM              51.8    55.5
K-NN (K = 3)     60.7    62.2
K-NN (K = 5)     65.7    66.5
K-NN (K = 10)    68.9    69.8
Table 3: Results (mAP) on the Pascal VOC image classification task when using CNN as a base feature.
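The two classifiers of Table 3 may be sketched as follows; the function names and the use of Euclidean distance are illustrative assumptions. Note that adding a new class to NCM only requires storing its mean feature, which is the near-zero-cost property noted above.

```python
import numpy as np
from collections import Counter

def ncm_classify(x, class_means):
    """Nearest Class Mean: predict the class whose mean feature is closest.
    class_means maps class label -> (d,) mean feature vector."""
    dists = {c: np.linalg.norm(x - m) for c, m in class_means.items()}
    return min(dists, key=dists.get)

def knn_classify(x, X_train, y_train, k=5):
    """K-Nearest Neighbors: majority vote among the k closest training features.
    X_train is (n, d); y_train holds the n class labels."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(d)[:k]
    return Counter(y_train[i] for i in nearest).most_common(1)[0][0]
```

In the setting of Table 3, x, X_train and the class means would all be CNN (or CNN + RE-SVM-1) features.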
Additionally, the methods may be implemented by instructions being performed by a processor, and such instructions (and/or data values produced by an implementation) may be stored on a processor-readable medium such as, for example, an integrated circuit, a software carrier or other storage device such as, for example, a hard disk, a compact diskette ("CD"), an optical disc (such as, for example, a DVD, often referred to as a digital versatile disc or a digital video disc), a random access memory ("RAM"), or a read-only memory ("ROM"). The instructions may form an application program tangibly embodied on a processor-readable medium. Instructions may be, for example, in hardware, firmware, software, or a combination. Instructions may be found in, for example, an operating system, a separate application, or a combination of the two. A processor may be characterized, therefore, as, for example, both a device configured to carry out a process and a device that includes a processor-readable medium (such as a storage device) having instructions for carrying out a process. Further, a processor-readable medium may store, in addition to or in lieu of instructions, data values produced by an implementation.
As will be evident to one of skill in the art, implementations may produce a variety of signals formatted to carry information that may be, for example, stored or transmitted. The information may include, for example, instructions for performing a method, or data produced by one of the described implementations. For example, a signal may be formatted to carry as data the rules for writing or reading the syntax of a described example, or to carry as data the actual syntax-values written by a described example. Such a signal may be formatted, for example, as an electromagnetic wave (for example, using a radio frequency portion of spectrum) or as a baseband signal. The formatting may include, for example, encoding a data stream and modulating a carrier with the encoded data stream. The information that the signal carries may be, for example, analog or digital information. The signal may be transmitted over a variety of different wired or wireless links, as is known. The signal may be stored on a processor-readable medium.
A number of implementations have been described. Nevertheless, it will be understood that various modifications may be made. For example, elements of different implementations may be combined, supplemented, modified, or removed to produce other implementations. Additionally, one of ordinary skill will understand that other structures and processes may be substituted for those disclosed and the resulting implementations will perform at least substantially the same function(s), in at least substantially the same way(s), to achieve at least substantially the same result(s) as the implementations disclosed. Accordingly, these and other implementations are contemplated by this application.
Numerous specific details have been set forth herein to provide a thorough understanding of the present invention. It will be understood by those skilled in the art, however, that the examples above may be practiced without these specific details. In other instances, well- known operations, components and circuits have not been described in detail so as not to obscure the present invention. It can be appreciated that the specific structural and functional details disclosed herein may be representative and do not necessarily limit the scope of the present invention. Various examples of the present invention may be implemented using hardware elements, software elements, or a combination of both. Some examples may be implemented, for example, using a computer-readable medium or article which may store an instruction or a set of instructions that, if executed by a machine, may cause the machine to perform a method and/or operations in accordance with the examples. Such a machine may include, for example, any suitable processing platform, computing platform, computing device, processing device, computing system, processing system, computer, processor, or the like, and may be implemented using any suitable combination of hardware and/or software. The computer- readable medium or article may include, for example, any suitable type of memory unit, memory device, memory article, memory medium, storage device, storage article, storage medium and/or storage unit. The instructions may include any suitable type of code, such as source code, compiled code, interpreted code, executable code, static code, dynamic code, encrypted code, and the like, implemented using any suitable high-level, low-level, object- oriented, visual, compiled and/or interpreted programming language.
The implementations described herein may be implemented in, for example, a method or a process, an apparatus, a software program, a data stream, or a signal. Even if only discussed in the context of a single form of implementation (for example, discussed only as a method), the implementation of features discussed may also be implemented in other forms (for example, an apparatus or program). An apparatus and constituents included therein, for example, a processor, an encoder and a decoder, may be implemented in, for example, appropriate hardware, software, and firmware. The methods may be implemented in, for example, an apparatus such as, for example, a processor, which refers to processing devices in general, including, for example, a computer, a microprocessor, an integrated circuit, or a programmable logic device. Processors also include communication devices, such as, for example, computers, cell phones, portable/personal digital assistants ("PDAs"), and other devices that facilitate communication of information between end-users.
Additionally, this application or its claims may refer to "determining" various pieces of information. Determining the information may include one or more of, for example, estimating the information, calculating the information, predicting the information, or retrieving the information from memory.
Further, this application or its claims may refer to "accessing" various pieces of information. Accessing the information may include one or more of, for example, receiving the information, retrieving the information (for example, from memory), storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.
Additionally, this application or its claims may refer to "receiving" various pieces of information. Receiving is, as with "accessing", intended to be a broad term. Receiving the information may include one or more of, for example, accessing the information, or retrieving the information (for example, from memory). Further, "receiving" is typically involved, in one way or another, during operations such as, for example, storing the information, processing the information, transmitting the information, moving the information, copying the information, erasing the information, calculating the information, determining the information, predicting the information, or estimating the information.

Claims

1. A method (300) for image searching, comprising:
determining at least an RE-SVM feature (310) based on a recursive exemplar support vector machine feature of an order n (RE-SVMn);
performing image searching by applying the determined RE-SVM feature to a plurality of features of search images;
wherein the RE-SVMn feature is determined based on at least an initial positive feature (302) and at least an initial negative feature (304) when n equals one, and
wherein the RE-SVMn feature is determined based on at least a positive RE-SVMn-1 (305, 307, 309) feature and at least a negative RE-SVMn-1 feature (306, 308) when n is greater than one.
2. An apparatus (700, 900) for image searching, comprising:
a processor (730, 902) configured to determine at least an RE-SVM feature (310) based on a recursive exemplar support vector machine feature of an order n (RE-SVMn), the processor further configured to perform image searching by applying the determined RE-SVM feature to a plurality of features of search images;
wherein the RE-SVMn feature is determined based on at least an initial positive feature (302) and at least an initial negative feature (304) when n equals one, and
wherein the RE-SVMn feature is determined based on at least a positive RE-SVMn-1 (305, 307, 309) feature and at least a negative RE-SVMn-1 feature (306, 308) when n is greater than one.
3. The method of claim 1 or the apparatus of claim 2, wherein the RE-SVM feature represents at least a query image and wherein the image search is an image retrieval search.
4. The method of claim 1 or the apparatus of claim 2, wherein the RE-SVM feature represents at least a positive image and wherein the image search is a semantic search that includes determining at least a second RE-SVM feature representing at least a negative image.
5. The method of claim 1, apparatus of claim 2, or method or apparatus of claims 3 or 4, wherein the features of the search images are RE-SVMn features.
6. The method of claim 1 or the apparatus of claim 2, wherein the RE-SVM feature is determined based on n iterations of performing an E-SVM function based on the at least a positive RE-SVMn-1 feature and the at least a negative RE-SVMn-1 feature.
7. The method of claim 1 or the apparatus of claim 2, wherein the at least a negative RE-SVMn-1 feature is determined by performing an E-SVM function based on a corresponding RE-SVMn-2 feature and at least another RE-SVMn-2 feature.
8. The method of claim 1 or the apparatus of claim 2, wherein the at least a negative RE-SVMn-1 feature is determined offline.
9. The method of claim 1 or the apparatus of claim 2, further comprising performing image searching based on the RE-SVMn feature.
10. The method or apparatus of claim 7, wherein the image searching includes determining a classifier based on the positive RE-SVMn feature and negative RE-SVMn feature.
11. The method or apparatus of claim 7, wherein a score is determined for each search image when the classifier is applied to that search image.
12. The method or apparatus of claim 11, wherein the score is determined by applying the classifier to RE-SVM features of the search images.
13. The method of claim 1 or the apparatus of claim 2, further comprising determining a local descriptor for the RE-SVMn feature.
14. The method or apparatus of claim 13, further comprising performing image searching based on an aggregation of local descriptors.
PCT/EP2015/076522 2014-11-14 2015-11-13 Methods, systems and apparatus for image recognition based on recursively determined exemplar-support vector machines (e-svm) features WO2016075274A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
EP14306810.4 2014-11-14
EP14306810 2014-11-14

Publications (1)

Publication Number Publication Date
WO2016075274A1 true WO2016075274A1 (en) 2016-05-19

Family

ID=52016008

Family Applications (2)

Application Number Title Priority Date Filing Date
PCT/EP2015/076522 WO2016075274A1 (en) 2014-11-14 2015-11-13 Methods, systems and apparatus for image recognition based on recursively determined exemplar-support vector machines (e-svm) features
PCT/EP2015/076567 WO2016075293A1 (en) 2014-11-14 2015-11-13 Accelerated support vector machine (svm) learning using clustering

Family Applications After (1)

Application Number Title Priority Date Filing Date
PCT/EP2015/076567 WO2016075293A1 (en) 2014-11-14 2015-11-13 Accelerated support vector machine (svm) learning using clustering

Country Status (1)

Country Link
WO (2) WO2016075274A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543571A (en) * 2018-11-07 2019-03-29 西安交通大学 A kind of intelligent recognition and search method of Complex Product abnormity machining feature
CN110175626A (en) * 2019-04-15 2019-08-27 哈尔滨理工大学 One kind is based on SVM image identification system and method under cloud platform
EP4022534A4 (en) * 2020-11-06 2022-11-30 Visenze Pte Ltd A system and a method for generating an image recognition model and classifying an input image

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI737659B (en) 2015-12-22 2021-09-01 以色列商應用材料以色列公司 Method of deep learning - based examination of a semiconductor specimen and system thereof
KR102036957B1 (en) * 2016-10-06 2019-10-25 가톨릭대학교 산학협력단 Safety classification method of the city image using deep learning-based data feature
CN107844337B (en) * 2017-10-31 2019-08-06 Oppo广东移动通信有限公司 Method for cleaning, device, storage medium and the electronic equipment of background application

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
DOUGLAS NATAN ET AL: "Iterative Technique for Content-Based Image Retrieval using Multiple SVM Ensembles", 3 June 2013 (2013-06-03), XP055254858, Retrieved from the Internet <URL:http://iris.sel.eesc.usp.br/wvc/Anais_WVC2013/Oral/1/6.pdf> [retrieved on 20160303] *
LORENZO DAL COL ET AL: "Fast and accurate object detection by means of recursive monomial feature elimination and cascade of SVM", AUTOMATION SCIENCE AND ENGINEERING (CASE), 2011 IEEE CONFERENCE ON, IEEE, 24 August 2011 (2011-08-24), pages 304 - 309, XP031975813, ISBN: 978-1-4577-1730-7, DOI: 10.1109/CASE.2011.6042464 *
ORIOL VINYALS ET AL: "Learning with Recursive Perceptual Representations", NIPS 2012, 6 December 2012 (2012-12-06), XP055170464, Retrieved from the Internet <URL:http://www1.icsi.berkeley.edu/~vinyals/Files/nips2012.pdf> [retrieved on 20150218] *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543571A (en) * 2018-11-07 2019-03-29 西安交通大学 A kind of intelligent recognition and search method of Complex Product abnormity machining feature
CN109543571B (en) * 2018-11-07 2022-03-08 西安交通大学 Intelligent identification and retrieval method for special-shaped processing characteristics of complex products
CN110175626A (en) * 2019-04-15 2019-08-27 哈尔滨理工大学 One kind is based on SVM image identification system and method under cloud platform
EP4022534A4 (en) * 2020-11-06 2022-11-30 Visenze Pte Ltd A system and a method for generating an image recognition model and classifying an input image
US11869228B2 (en) 2020-11-06 2024-01-09 Visenze Pte Ltd System and a method for generating an image recognition model and classifying an input image

Also Published As

Publication number Publication date
WO2016075293A1 (en) 2016-05-19

Similar Documents

Publication Publication Date Title
US11823050B2 (en) Semi-supervised person re-identification using multi-view clustering
WO2016075274A1 (en) Methods, systems and apparatus for image recognition based on recursively determined exemplar-support vector machines (e-svm) features
Gong et al. Deep convolutional ranking for multilabel image annotation
CN105912611B (en) Fast image retrieval method based on CNN
US20200334287A1 (en) Image retrieval method, image retrieval apparatus, image retrieval device and medium
US20170262478A1 (en) Method and apparatus for image retrieval with feature learning
Xia et al. Loop closure detection for visual SLAM using PCANet features
Shi et al. Training DCNN by combining max-margin, max-correlation objectives, and correntropy loss for multilabel image classification
CN104112018B (en) Large-scale image search method
CN108875487A (en) Training of a pedestrian re-identification network and pedestrian re-identification based thereon
CN108595546B (en) Semi-supervised cross-media feature learning retrieval method
EP2776981A2 (en) Methods and apparatuses for mobile visual search
CN108154156B (en) Image set classification method and device based on neural topic model
CN118334489B (en) Vision-language model domain adaptation method based on adversarial dual-prompt learning, terminal and readable storage medium
CN116777727B (en) Integrated memory chip, image processing method, electronic device and storage medium
Tong et al. A review of indoor-outdoor scene classification
Boix et al. Nested sparse quantization for efficient feature coding
CN109934253A (en) Adversarial example generation method and device
WO2023036157A1 (en) Self-supervised spatiotemporal representation learning by exploring video continuity
CN113255604B (en) Pedestrian re-identification method, device, equipment and medium based on deep learning network
US20170309004A1 (en) Image recognition using descriptor pruning
Li et al. Compact convolutional neural network transfer learning for small-scale image classification
Wu et al. Codebook-free compact descriptor for scalable visual search
CN110019096A (en) The generation method and device of index file
Reddy et al. Spatio-temporal feature based VLAD for efficient video retrieval

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15797921

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15797921

Country of ref document: EP

Kind code of ref document: A1