
CN107992611A - High-dimensional data retrieval method and system based on Cauchy-distribution hashing - Google Patents

High-dimensional data retrieval method and system based on Cauchy-distribution hashing

Info

Publication number
CN107992611A
Authority
CN
China
Prior art keywords
similar
high dimensional
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711353318.8A
Other languages
Chinese (zh)
Other versions
CN107992611B (en)
Inventor
王建民
龙明盛
刘斌
曹越
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201711353318.8A
Publication of CN107992611A
Application granted
Publication of CN107992611B
Legal status: Active (Current)
Anticipated expiration


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/283Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2255Hash tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Software Systems (AREA)

Abstract

The present invention provides a high-dimensional data retrieval method and system based on Cauchy-distribution hashing. The retrieval method includes: S1, inputting the high-dimensional data corresponding to a data point to be retrieved into a trained similarity retrieval model to obtain the hash code corresponding to the data point to be retrieved, wherein the similarity retrieval model is a neural network model and includes a hash coding module; S2, inputting the hash code into the inverted-index unit of the similarity retrieval model to obtain the buckets whose Hamming distance to the hash code is less than or equal to a preset value; S3, reordering all high-dimensional data in the buckets to obtain a list of similar high-dimensional data, and taking the list of similar high-dimensional data as the retrieval result. The present invention first prunes candidates with the hash method and then re-ranks them using the original features, which both accelerates retrieval and preserves retrieval precision. The present invention enables fast and accurate retrieval.

Description

High-dimensional data retrieval method and system based on Cauchy-distribution hashing
Technical field
The present invention relates to the field of big data analysis, and in particular to a high-dimensional data retrieval method and system based on Cauchy-distribution hashing.
Background technology
In the Internet era, as multimedia resources on the Internet keep growing, finding relevant data quickly and effectively in large-scale collections is a major challenge in terms of both time and space. With the rapid development of the Internet, large-volume, high-dimensional image big data has become increasingly common in search engines and social networks and has attracted more and more attention. How to perform image retrieval and analysis quickly and effectively is an urgent problem, and approximate nearest-neighbor search arose precisely for this problem; guaranteeing computational efficiency and retrieval quality at the same time is the key to approximate nearest-neighbor search. A very common and effective solution to this problem is hashing, which converts high-dimensional data into compact binary codes and generates similar binary codes for similar data. The present invention focuses on data-dependent hashing methods, which have been shown to be more effective than data-independent hashing methods (such as locality-sensitive hashing).
In the past, many hashing methods based on Hamming distance have been proposed. These methods can be divided into two classes: unsupervised hashing and supervised hashing. Unsupervised methods are more general, since they can be trained without semantic labels or data correlations. Accordingly, many unsupervised hashing methods have been proposed to learn compact binary codes for efficient Hamming-space image retrieval.
However, most existing methods are designed for data compression rather than candidate pruning, i.e., they maximize retrieval performance for a linear scan over hash codes. Since the cost of a linear scan remains high even with hash codes, most existing hashing methods deviate to some extent from the original intent, namely achieving the largest possible speed-up at an acceptable precision. With the emergence of powerful hashing methods suited to linear scan, attention should now turn to Hamming-space retrieval, which enables the most efficient constant-time search. In Hamming-space retrieval, the data points within a given Hamming radius of each query are returned via hash-table lookup rather than linear scan. However, the loss functions of existing hashing methods are not well suited to this setting: they usually lack the ability to concentrate related images within a small Hamming ball, so they may not be appropriate for Hamming-space retrieval.
Summary of the invention
To overcome the above problems, the present invention provides a high-dimensional data retrieval method and system based on Cauchy-distribution hashing.
According to one aspect of the present invention, a high-dimensional data retrieval method based on Cauchy-distribution hashing is provided, including: S1, inputting the high-dimensional data corresponding to a data point to be retrieved into a trained similarity retrieval model to obtain the hash code corresponding to the data point to be retrieved, wherein the similarity retrieval model is a neural network model and includes a hash coding module; S2, inputting the hash code into the inverted-index unit of the similarity retrieval model to obtain the buckets whose Hamming distance to the hash code is less than or equal to a preset value; S3, reordering all high-dimensional data in the buckets to obtain a list of similar high-dimensional data, and taking the list of similar high-dimensional data as the retrieval result.
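As an illustration of step S2, the following is a minimal sketch of how buckets within a small Hamming radius can be probed over an inverted index. It assumes hash codes are stored as Python integers and that `index` maps each code to the list of database ids hashed to it; the names `probe_buckets`, `index`, and `radius` are illustrative and are not taken from the patent.

```python
from itertools import combinations

def probe_buckets(query_code, index, n_bits, radius=2):
    """Collect candidate ids from every bucket whose code lies within the
    given Hamming distance of the query code (radius 2 mirrors the
    preferred preset value mentioned later in the description)."""
    candidates = []
    for r in range(radius + 1):
        # Enumerate all codes obtained by flipping exactly r bits of the query code.
        for bits in combinations(range(n_bits), r):
            code = query_code
            for b in bits:
                code ^= 1 << b
            candidates.extend(index.get(code, []))
    return candidates
```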
Preferably, the trained similarity retrieval model in step S1 is obtained as follows: multiple unused high-dimensional data points are obtained from the training set; the multiple unused high-dimensional data points are input into the hash module to obtain low-dimensional feature vectors better matched to hash coding; based on the low-dimensional feature vectors, the corresponding hash codes are obtained, and the gradient of a preset loss function with respect to the hash coding layer of the similarity retrieval model is computed according to the hash codes; and the similarity retrieval model is trained based on the gradient with respect to its hash coding layer, yielding the trained similarity retrieval model.
Preferably, the preset loss function is:
$$O \triangleq L + \lambda Q;$$
wherein O is the preset loss function, L is the Cauchy cross-entropy loss function, the symbol ≜ denotes a definition, Q is the Cauchy quantization loss function, and λ is the parameter balancing the Cauchy cross-entropy loss and the Cauchy quantization loss.
Preferably, the preset loss function is obtained as a linear combination of the Cauchy cross-entropy loss function and the Cauchy quantization loss function; the Cauchy cross-entropy loss function is given by the following formula:
$$L = \sum_{s_{ij} \in S} w_{ij} \left( s_{ij} \log \frac{d(z_i, z_j)}{\gamma} + \log\!\left(1 + \frac{\gamma}{d(z_i, z_j)}\right) \right)^{2};$$
wherein L is the Cauchy cross-entropy loss function, S is the similarity matrix, s_ij is the element in row i and column j of the similarity matrix, 0 ≤ i ≤ N, 0 ≤ j ≤ N, N is the total number of low-dimensional feature vectors, w_ij is the weight of the pair (i, j), z_i is the i-th low-dimensional feature vector, z_j is the j-th low-dimensional feature vector, d(z_i, z_j) is the normalized Euclidean distance between z_i and z_j, and γ is the scaling parameter of the Cauchy distribution;
The normalized Euclidean distance between z_i and z_j is obtained by the following formula:
$$d(z_i, z_j) = \frac{K}{4} \left\| \frac{z_i}{\|z_i\|} - \frac{z_j}{\|z_j\|} \right\|_2^2;$$
wherein d(z_i, z_j) is the normalized Euclidean distance between z_i and z_j, z_i is the i-th low-dimensional feature vector, z_j is the j-th low-dimensional feature vector, K is the number of bits of the hash code, and ||·|| is the Euclidean norm of a vector;
The Cauchy quantization loss function is given by the following formula:
$$Q = \sum_{i=1}^{N} \log\!\left(1 + \frac{d(|z_i|, \mathbf{1})}{\gamma}\right);$$
wherein Q is the Cauchy quantization loss function, 0 ≤ i ≤ N, N is the number of training data points, z_i is the i-th low-dimensional feature vector, d(|z_i|, 1) is the normalized Euclidean distance between |z_i| and the all-ones vector 1, and γ is the scaling parameter of the Cauchy distribution.
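For concreteness, the following is a minimal NumPy sketch of the loss terms defined above, written directly from the formulas (including the outer square of the cross-entropy term as printed in claim 4). The pair weights W are not fully specified in the text, so passing them in explicitly is an assumption, and all function and variable names are illustrative rather than the patent's own.

```python
import numpy as np

def normalized_distance(zi, zj, K):
    """d(z_i, z_j) = K/4 * || z_i/||z_i|| - z_j/||z_j|| ||_2^2."""
    diff = zi / np.linalg.norm(zi) - zj / np.linalg.norm(zj)
    return K / 4.0 * float(np.dot(diff, diff))

def cauchy_cross_entropy(Z, S, W, gamma):
    """Cauchy cross-entropy loss L over all labelled pairs (i, j) with i != j.
    Z: (N, K) continuous codes, S: (N, N) similarity matrix, W: (N, N) pair weights."""
    N, K = Z.shape
    L = 0.0
    for i in range(N):
        for j in range(N):
            if i == j:
                continue
            d = max(normalized_distance(Z[i], Z[j], K), 1e-12)
            term = S[i, j] * np.log(d / gamma) + np.log(1.0 + gamma / d)
            L += W[i, j] * term ** 2   # squared, matching the formula as printed in claim 4
    return L

def cauchy_quantization(Z, gamma):
    """Q = sum_i log(1 + d(|z_i|, 1) / gamma): pushes |z_i| towards the all-ones vector."""
    N, K = Z.shape
    ones = np.ones(K)
    return sum(np.log(1.0 + normalized_distance(np.abs(Z[i]), ones, K) / gamma)
               for i in range(N))

def total_loss(Z, S, W, gamma, lam):
    """O = L + lambda * Q."""
    return cauchy_cross_entropy(Z, S, W, gamma) + lam * cauchy_quantization(Z, gamma)
```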
Preferably, the similarity matrix is an N×N matrix, where N is the total number of data points in the training set, and each element of the matrix is 0 or 1; for any element s_ij of the matrix, s_ij = 1 indicates that data point x_i and data point x_j are similar, and s_ij = 0 indicates that data point x_i and data point x_j are dissimilar; wherein s_ij is the element in row i and column j of the similarity matrix, x_i is the i-th data point in the training set, x_j is the j-th data point in the training set, 0 ≤ i ≤ N, 0 ≤ j ≤ N.
Preferably, step S1 further includes: S11, inputting the high-dimensional data corresponding to all data points of the database into the trained similarity retrieval model to obtain the hash codes corresponding to the high-dimensional data of all data points of the database; S12, obtaining the inverted index of all data points of the database based on the hash codes corresponding to the high-dimensional data of all data points; S13, based on the inverted index of all data points of the database, obtaining the high-dimensional data corresponding to the data point to be retrieved and the hash code corresponding to that high-dimensional data.
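A minimal sketch of steps S11-S12 follows, assuming the model has already produced one integer hash code per database point; `build_inverted_index` and `codes` are illustrative names, not the patent's.

```python
def build_inverted_index(codes):
    """Group database point ids by their hash code (stored as an int),
    giving the inverted index used for Hamming-space lookup in step S2."""
    index = {}
    for point_id, code in enumerate(codes):
        index.setdefault(code, []).append(point_id)
    return index
```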
According to another aspect of the present invention, a high-dimensional data retrieval system based on Cauchy-distribution hashing is provided, including: a hash-code acquisition module, configured to input the high-dimensional data corresponding to a data point to be retrieved into a trained similarity retrieval model to obtain the hash code corresponding to the data point to be retrieved, wherein the similarity retrieval model is a neural network model and includes a hash coding module; a bucket acquisition module, configured to input the hash code into the inverted-index unit of the similarity retrieval model to obtain the buckets whose Hamming distance to the hash code is less than or equal to a preset value; and a retrieval-result acquisition module, configured to reorder all high-dimensional data in the buckets to obtain a list of similar high-dimensional data, which is taken as the retrieval result.
According to a further aspect of the present invention, an electronic device for high-dimensional data retrieval is provided, including a memory and a processor, the processor and the memory communicating with each other via a bus; the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the retrieval method according to any of the above.
According to a further aspect of the present invention, a computer program product is provided, the computer program product including a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions that, when executed by a computer, cause the computer to perform the retrieval method according to any of the above.
According to still another aspect of the present invention, a computer-readable storage medium is provided, characterized in that a computer program stored thereon, when executed by a processor, implements the retrieval method according to any of the above.
The high-dimensional data retrieval method and system based on Cauchy-distribution hashing provided by the present invention adopt a prune-then-reorder technique: candidates are first pruned with the hash method and then re-ranked using the original features, which both accelerates retrieval and preserves retrieval precision. Moreover, the retrieval method provided by the present invention is unsupervised and can be trained without semantic labels or data correlations. The present invention enables fast and accurate retrieval.
Brief description of the drawings
Fig. 1 is a flowchart of a high-dimensional data retrieval method based on Cauchy-distribution hashing in an embodiment of the present invention;
Fig. 2 is an architecture diagram of a high-dimensional data retrieval system based on Cauchy-distribution hashing in an embodiment of the present invention;
Fig. 3 is an overall flowchart of a high-dimensional data retrieval method based on Cauchy-distribution hashing in an embodiment of the present invention;
Fig. 4 is a module diagram of a high-dimensional data retrieval system based on Cauchy-distribution hashing in an embodiment of the present invention;
Fig. 5 is a structural diagram of an electronic device for high-dimensional data retrieval in an embodiment of the present invention.
Detailed description of the embodiments
The embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. The following examples are intended to illustrate the present invention, but not to limit its scope.
Fig. 1 is a flowchart of a high-dimensional data retrieval method based on Cauchy-distribution hashing in an embodiment of the present invention. As shown in Fig. 1, the method includes: S1, inputting the high-dimensional data corresponding to a data point to be retrieved into a trained similarity retrieval model to obtain the hash code corresponding to the data point to be retrieved, wherein the similarity retrieval model is a neural network model and includes a hash coding module; S2, inputting the hash code into the inverted-index unit of the similarity retrieval model to obtain the buckets whose Hamming distance to the hash code is less than or equal to a preset value; S3, reordering all high-dimensional data in the buckets to obtain a list of similar high-dimensional data, and taking the list of similar high-dimensional data as the retrieval result.
Specifically, the high-dimensional data in the embodiments of the present invention may be shallow features or deep features generated by a deep neural network.
Further, the preset value is preferably 2.
It should be noted that the reordering of all high-dimensional data in the buckets is carried out according to the distance between the original high-dimensional data corresponding to each point in the buckets and the high-dimensional data of the data point to be retrieved.
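A minimal sketch of this re-ranking step follows, assuming the original features are kept in a NumPy array indexed by point id; the names are illustrative.

```python
import numpy as np

def rerank(query_feature, candidate_ids, features):
    """Order the candidates collected from the probed buckets by the Euclidean
    distance between their original high-dimensional features and the query."""
    dists = [np.linalg.norm(features[i] - query_feature) for i in candidate_ids]
    return [cid for _, cid in sorted(zip(dists, candidate_ids))]
```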
Further, the hash code is a binary code; the hash code can also be understood as the binary feature corresponding to the high-dimensional data.
The high-dimensional data retrieval method based on Cauchy-distribution hashing provided by the present invention adopts a prune-then-reorder technique: candidates are first pruned with the hash method and then re-ranked using the original features, which both accelerates retrieval and preserves retrieval precision. Moreover, the retrieval method provided by the present invention is unsupervised and can be trained without semantic labels or data correlations.
Based on the above embodiment, the trained similarity retrieval model in step S1 is obtained as follows: multiple unused high-dimensional data points are obtained from the training set; the multiple unused high-dimensional data points are input into the hash module to obtain low-dimensional feature vectors better matched to hash coding; based on the low-dimensional feature vectors, the corresponding hash codes are obtained, and the gradient of the preset loss function with respect to the hash coding layer of the similarity retrieval model is computed according to the hash codes; and the similarity retrieval model is trained based on this gradient to obtain the trained similarity retrieval model.
Further, before obtaining multiple unused high-dimensional data points from the training set and inputting them into the hash module to obtain low-dimensional feature vectors better matched to hash coding, the following steps are performed:
Compute the similarity relation matrix S.
Let the current training epoch be Epoch_current (initialized to 0) and the maximum number of training epochs be Epoch_max. Update Epoch_current = Epoch_current + 1. If Epoch_current ≤ Epoch_max after the update, training has not yet finished: mark all data point features in the training set as unused and proceed to the next step; otherwise, terminate training.
Randomly fetch from the training set a batch of data point features not yet used in this epoch, together with the similarity relations among these features; the number of data points fetched equals the batch size, and the fetched points are marked as used. The data point features are denoted X, X = {x_1, x_2, ..., x_N}, and the similarity relations between data points are denoted S, S = {s_ij | 1 ≤ i, j ≤ N}, where s_ij = 1 indicates that x_i and x_j are semantically related and s_ij = -1 indicates that x_i and x_j are semantically unrelated. If all data points have already been used in this epoch, return to the previous step.
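The epoch and batch bookkeeping described above can be sketched as follows; the generator below is one reasonable realization assumed for illustration, not the patent's exact procedure, and it assumes X and S are NumPy arrays.

```python
import numpy as np

def iterate_batches(X, S, batch_size, epoch_max):
    """Yield mini-batches of not-yet-used points (with the matching block of S)
    until every point in the training set has been used, then start a new epoch."""
    N = len(X)
    for epoch in range(1, epoch_max + 1):
        order = np.random.permutation(N)              # all points start the epoch unused
        for start in range(0, N, batch_size):
            batch = order[start:start + batch_size]   # these points are now marked used
            yield epoch, X[batch], S[np.ix_(batch, batch)]
```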
Further, the gradients of the preset loss function with respect to the hash coding layer of the similarity retrieval model are computed. After the gradients are obtained, the standard back-propagation algorithm is used to train the similarity retrieval model; batches of data point features not yet used in the current epoch are fetched repeatedly and the steps of this embodiment are carried out until the trained similarity retrieval model is obtained.
Based on the above embodiment, the preset loss function is:
$$O \triangleq L + \lambda Q;$$
wherein O is the preset loss function, L is the Cauchy cross-entropy loss function, the symbol ≜ denotes a definition, Q is the Cauchy quantization loss function, and λ is the parameter balancing the Cauchy cross-entropy loss and the Cauchy quantization loss.
Based on the above embodiment, the preset loss function is obtained as a linear combination of the Cauchy cross-entropy loss function and the Cauchy quantization loss function.
The Cauchy cross-entropy loss function is given by the following formula:
$$L = \sum_{s_{ij} \in S} w_{ij} \left( s_{ij} \log \frac{d(z_i, z_j)}{\gamma} + \log\!\left(1 + \frac{\gamma}{d(z_i, z_j)}\right) \right)^{2};$$
wherein L is the Cauchy cross-entropy loss function, S is the similarity matrix, s_ij is the element in row i and column j of the similarity matrix, 0 ≤ i ≤ N, 0 ≤ j ≤ N, N is the total number of low-dimensional feature vectors, w_ij is the weight of the pair (i, j), z_i is the i-th low-dimensional feature vector, z_j is the j-th low-dimensional feature vector, d(z_i, z_j) is the normalized Euclidean distance between z_i and z_j, and γ is the scaling parameter of the Cauchy distribution;
The normalized Euclidean distance between z_i and z_j is obtained by the following formula:
$$d(z_i, z_j) = \frac{K}{4} \left\| \frac{z_i}{\|z_i\|} - \frac{z_j}{\|z_j\|} \right\|_2^2;$$
wherein d(z_i, z_j) is the normalized Euclidean distance between z_i and z_j, z_i is the i-th low-dimensional feature vector, z_j is the j-th low-dimensional feature vector, K is the number of bits of the hash code, and ||·|| is the Euclidean norm of a vector;
The Cauchy quantization loss function is given by the following formula:
$$Q = \sum_{i=1}^{N} \log\!\left(1 + \frac{d(|z_i|, \mathbf{1})}{\gamma}\right);$$
wherein Q is the Cauchy quantization loss function, 0 ≤ i ≤ N, N is the number of training data points, z_i is the i-th low-dimensional feature vector, d(|z_i|, 1) is the normalized Euclidean distance between |z_i| and the all-ones vector 1, and γ is the scaling parameter of the Cauchy distribution.
Specifically, the Cauchy cross-entropy loss function is used to ensure similarity learning, i.e., to quantify the similarity between low-dimensional feature vectors.
Further, the Cauchy quantization loss function is used to control the quality of the binary codes and the quantizability of the quantized representation.
The high-dimensional data retrieval method based on Cauchy-distribution hashing provided by the present invention preserves, through the Cauchy cross-entropy loss function, the pairwise similarity information contained in the similarity matrix, i.e., the similarity relations between the original data points, which helps train the model better. Through the Cauchy quantization loss function, the feature representation before binarization is pushed closer to 0 and 1, so the information loss caused by binarization is greatly reduced.
Based on the above embodiment, the similarity matrix is an N×N matrix, where N is the total number of data points in the training set, and each element of the matrix is 0 or 1; for any element s_ij of the matrix, s_ij = 1 indicates that data point x_i and data point x_j are similar, and s_ij = 0 indicates that data point x_i and data point x_j are dissimilar; wherein s_ij is the element in row i and column j of the similarity matrix, x_i is the i-th data point in the training set, x_j is the j-th data point in the training set, 0 ≤ i ≤ N, 0 ≤ j ≤ N.
Specifically, the similarity matrix is constructed as follows: the k most similar data points of each input data point, i.e., its k nearest neighbours, are computed from the input features, where k, the number of neighbours, is an adjustable parameter. For each input data point x_i and each of its k nearest neighbours x_j, s_ij = 1; for each remaining data point x_k, s_ik = 0.
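A minimal sketch of this construction follows, assuming Euclidean distance over the input features; the brute-force nearest-neighbour search and the names are illustrative.

```python
import numpy as np

def build_similarity_matrix(X, k):
    """s_ij = 1 if x_j is among the k nearest neighbours of x_i (by input
    features), and 0 otherwise."""
    N = len(X)
    S = np.zeros((N, N), dtype=np.int8)
    for i in range(N):
        dists = np.linalg.norm(X - X[i], axis=1)
        dists[i] = np.inf                      # a point is not its own neighbour
        S[i, np.argsort(dists)[:k]] = 1
    return S
```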
Based on the above embodiment, step S1 further includes: S11, inputting the high-dimensional data corresponding to all data points of the database into the trained similarity retrieval model to obtain the hash codes corresponding to the high-dimensional data of all data points of the database; S12, obtaining the inverted index of all data points of the database based on the hash codes corresponding to the high-dimensional data of all data points; S13, based on the inverted index of all data points of the database, obtaining the high-dimensional data corresponding to the data point to be retrieved and the hash code corresponding to that high-dimensional data.
It should be noted that, in order to obtain a more effective hash function, a fully connected layer with R units is used as the hash module, with the hyperbolic tangent (tanh) as the activation function, producing the fully connected quantized representation z. Finally, z is binarized to obtain the hash code.
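A minimal forward-pass sketch of such a hash module follows, assuming the fully connected layer's weights `W` and bias `b` have already been learned; the names are illustrative, and the sketch omits the network that produces the input features.

```python
import numpy as np

def hash_module(features, W, b):
    """Fully connected hash layer with R units: tanh gives the continuous
    representation z, and thresholding at zero gives the binary hash code."""
    z = np.tanh(features @ W + b)        # (N, R), values in (-1, 1)
    codes = (z >= 0).astype(np.int8)     # binarize; packing the bits into ints is a separate step
    return z, codes
```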
As a preferred embodiment, Fig. 2 is an architecture diagram of a high-dimensional data retrieval system based on Cauchy-distribution hashing in an embodiment of the present invention, and Fig. 3 is an overall flowchart of a high-dimensional data retrieval method based on Cauchy-distribution hashing in an embodiment of the present invention; this embodiment refers to Fig. 2 and Fig. 3.
Compute the similarity relation matrix S.
Let the current training epoch be Epoch_current (initialized to 0) and the maximum number of training epochs be Epoch_max. Update Epoch_current = Epoch_current + 1. If Epoch_current ≤ Epoch_max after the update, training has not yet finished: mark all data point features in the training set as unused and proceed to the next step; otherwise, terminate training.
Randomly fetch from the training set a batch of data point features not yet used in this epoch, together with the similarity relations among these features; the number of data points fetched equals the batch size, and the fetched points are marked as used. The data point features are denoted X, X = {x_1, x_2, ..., x_N}, and the similarity relations between data points are denoted S, S = {s_ij | 1 ≤ i, j ≤ N}, where s_ij = 1 indicates that x_i and x_j are semantically related and s_ij = -1 indicates that x_i and x_j are semantically unrelated. If all data points have already been used in this epoch, return to the previous step; otherwise, proceed to the next step.
Obtain multiple unused high-dimensional data points from the training set; input the multiple unused high-dimensional data points into the hash module to obtain low-dimensional feature vectors better matched to hash coding.
Based on the low-dimensional feature vectors, obtain the corresponding hash codes, and compute the gradient of the preset loss function with respect to the hash coding layer of the similarity retrieval model according to the hash codes.
After the gradients are obtained, the standard back-propagation algorithm is used to train the similarity retrieval model; batches of data point features not yet used in the current epoch are fetched repeatedly and the above steps are carried out until the trained similarity retrieval model is obtained.
Input the high-dimensional data corresponding to the data point to be retrieved into the trained similarity retrieval model to obtain the hash code corresponding to the data point to be retrieved, wherein the similarity retrieval model is a neural network model and includes a hash coding module.
Input the hash code into the inverted-index unit of the similarity retrieval model to obtain the buckets whose Hamming distance to the hash code is less than or equal to the preset value.
Reorder all high-dimensional data in the buckets to obtain a list of similar high-dimensional data, and take the list of similar high-dimensional data as the retrieval result.
Based on the above embodiments, Fig. 4 is a module diagram of a high-dimensional data retrieval system based on Cauchy-distribution hashing in an embodiment of the present invention. As shown in Fig. 4, the system includes: a hash-code acquisition module, configured to input the high-dimensional data corresponding to a data point to be retrieved into a trained similarity retrieval model to obtain the hash code corresponding to the data point to be retrieved, wherein the similarity retrieval model is a neural network model and includes a hash coding module; a bucket acquisition module, configured to input the hash code into the inverted-index unit of the similarity retrieval model to obtain the buckets whose Hamming distance to the hash code is less than or equal to a preset value; and a retrieval-result acquisition module, configured to reorder all high-dimensional data in the buckets to obtain a list of similar high-dimensional data, which is taken as the retrieval result.
Based on the above embodiments, Fig. 5 is a structural diagram of an electronic device for high-dimensional data retrieval in an embodiment of the present invention. As shown in Fig. 5, the device includes a memory and a processor, the processor and the memory communicating with each other via a bus; the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the retrieval method described in any of the above embodiments, for example including: S1, inputting the high-dimensional data corresponding to a data point to be retrieved into a trained similarity retrieval model to obtain the hash code corresponding to the data point to be retrieved, wherein the similarity retrieval model is a neural network model and includes a hash coding module; S2, inputting the hash code into the inverted-index unit of the similarity retrieval model to obtain the buckets whose Hamming distance to the hash code is less than or equal to a preset value; S3, reordering all high-dimensional data in the buckets to obtain a list of similar high-dimensional data, and taking the list of similar high-dimensional data as the retrieval result.
Based on the above embodiments, a further embodiment of the present invention provides a computer program product, the computer program product including a computer program stored on a non-transitory computer-readable storage medium, the computer program including program instructions that, when executed by a computer, cause the computer to perform the retrieval method described in any of the above embodiments, for example including: S1, inputting the high-dimensional data corresponding to a data point to be retrieved into a trained similarity retrieval model to obtain the hash code corresponding to the data point to be retrieved, wherein the similarity retrieval model is a neural network model and includes a hash coding module; S2, inputting the hash code into the inverted-index unit of the similarity retrieval model to obtain the buckets whose Hamming distance to the hash code is less than or equal to a preset value; S3, reordering all high-dimensional data in the buckets to obtain a list of similar high-dimensional data, and taking the list of similar high-dimensional data as the retrieval result.
Based on the above embodiments, a further embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the retrieval method described in any of the above embodiments, for example including: S1, inputting the high-dimensional data corresponding to a data point to be retrieved into a trained similarity retrieval model to obtain the hash code corresponding to the data point to be retrieved, wherein the similarity retrieval model is a neural network model and includes a hash coding module; S2, inputting the hash code into the inverted-index unit of the similarity retrieval model to obtain the buckets whose Hamming distance to the hash code is less than or equal to a preset value; S3, reordering all high-dimensional data in the buckets to obtain a list of similar high-dimensional data, and taking the list of similar high-dimensional data as the retrieval result.
The high-dimensional data retrieval method and system based on Cauchy-distribution hashing provided by the present invention adopt a prune-then-reorder technique: candidates are first pruned with the hash method and then re-ranked using the original features, which both accelerates retrieval and preserves retrieval precision. Moreover, the retrieval method provided by the present invention is unsupervised and can be trained without semantic labels or data correlations. By using the Cauchy cross-entropy loss function, the pairwise similarity information contained in the similarity matrix, i.e., the similarity relations between the original data points, is preserved, which helps train the model better. By using the Cauchy quantization loss function, the feature representation before binarization is pushed closer to 0 and 1, so the information loss caused by binarization is greatly reduced.
On the one hand, the present invention controls the quantization error within the optimization framework and improves the quality of the hash codes, thereby improving the efficiency of pruning; on the other hand, pruning greatly improves retrieval speed. The invention therefore has strong effectiveness and can achieve fast and accurate retrieval.
Finally, the above method of the present invention is only a preferred embodiment and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (10)

  1. A high-dimensional data retrieval method, characterized by comprising:
    S1, inputting the high-dimensional data corresponding to a data point to be retrieved into a trained similarity retrieval model to obtain the hash code corresponding to the data point to be retrieved, wherein the similarity retrieval model is a neural network model and includes a hash coding module;
    S2, inputting the hash code into an inverted-index unit of the similarity retrieval model to obtain buckets whose Hamming distance to the hash code is less than or equal to a preset value;
    S3, reordering all high-dimensional data in the buckets to obtain a list of similar high-dimensional data, and taking the list of similar high-dimensional data as the retrieval result.
  2. The retrieval method according to claim 1, characterized in that the trained similarity retrieval model in step S1 is obtained as follows:
    obtaining multiple unused high-dimensional data points from a training set;
    inputting the multiple unused high-dimensional data points into the hash module to obtain low-dimensional feature vectors better matched to hash coding;
    based on the low-dimensional feature vectors, obtaining the hash codes corresponding to the low-dimensional feature vectors, and computing the gradient of a preset loss function with respect to the hash coding layer of the similarity retrieval model according to the hash codes;
    training the similarity retrieval model based on the gradient with respect to its hash coding layer to obtain the trained similarity retrieval model.
  3. The retrieval method according to claim 2, characterized in that the preset loss function is:
    $$O \triangleq L + \lambda Q;$$
    wherein O is the preset loss function, L is the Cauchy cross-entropy loss function, the symbol ≜ denotes a definition, Q is the Cauchy quantization loss function, and λ is the parameter balancing the Cauchy cross-entropy loss and the Cauchy quantization loss.
  4. The retrieval method according to claim 3, characterized in that the preset loss function is obtained as a linear combination of the Cauchy cross-entropy loss function and the Cauchy quantization loss function;
    the Cauchy cross-entropy loss function is given by the following formula:
    $$L = \sum_{s_{ij} \in S} w_{ij} \left( s_{ij} \log \frac{d(z_i, z_j)}{\gamma} + \log\!\left(1 + \frac{\gamma}{d(z_i, z_j)}\right) \right)^{2};$$
    wherein L is the Cauchy cross-entropy loss function, S is the similarity matrix, s_ij is the element in row i and column j of the similarity matrix, 0 ≤ i ≤ N, 0 ≤ j ≤ N, N is the total number of low-dimensional feature vectors, w_ij is the weight of the pair (i, j), z_i is the i-th low-dimensional feature vector, z_j is the j-th low-dimensional feature vector, d(z_i, z_j) is the normalized Euclidean distance between z_i and z_j, and γ is the scaling parameter of the Cauchy distribution;
    the normalized Euclidean distance between z_i and z_j is obtained by the following formula:
    $$d(z_i, z_j) = \frac{K}{4} \left\| \frac{z_i}{\|z_i\|} - \frac{z_j}{\|z_j\|} \right\|_2^2;$$
    wherein d(z_i, z_j) is the normalized Euclidean distance between z_i and z_j, z_i is the i-th low-dimensional feature vector, z_j is the j-th low-dimensional feature vector, K is the number of bits of the hash code, and ||·|| is the Euclidean norm of a vector;
    the Cauchy quantization loss function is given by the following formula:
    $$Q = \sum_{i=1}^{N} \log\!\left(1 + \frac{d(|z_i|, \mathbf{1})}{\gamma}\right);$$
    wherein Q is the Cauchy quantization loss function, 0 ≤ i ≤ N, N is the number of training data points, z_i is the i-th low-dimensional feature vector, d(|z_i|, 1) is the normalized Euclidean distance between |z_i| and the all-ones vector 1, and γ is the scaling parameter of the Cauchy distribution.
  5. The retrieval method according to claim 4, characterized in that the similarity matrix is an N×N matrix, N is the total number of data points in the training set, and each element of the matrix is 0 or 1;
    for any element s_ij of the matrix, s_ij = 1 indicates that data point x_i and data point x_j are similar, and s_ij = 0 indicates that data point x_i and data point x_j are dissimilar;
    wherein s_ij is the element in row i and column j of the similarity matrix, x_i is the i-th data point in the training set, x_j is the j-th data point in the training set, 0 ≤ i ≤ N, 0 ≤ j ≤ N.
  6. The retrieval method according to claim 1, characterized in that step S1 further comprises:
    S11, inputting the high-dimensional data corresponding to all data points of a database into the trained similarity retrieval model to obtain the hash codes corresponding to the high-dimensional data of all data points of the database;
    S12, obtaining an inverted index of all data points of the database based on the hash codes corresponding to the high-dimensional data of all data points;
    S13, based on the inverted index of all data points of the database, obtaining the high-dimensional data corresponding to the data point to be retrieved, and obtaining the hash code corresponding to that high-dimensional data.
  7. A high-dimensional data retrieval system, characterized by comprising:
    a hash-code acquisition module, configured to input the high-dimensional data corresponding to a data point to be retrieved into a trained similarity retrieval model to obtain the hash code corresponding to the data point to be retrieved, wherein the similarity retrieval model is a neural network model and includes a hash coding module;
    a bucket acquisition module, configured to input the hash code into an inverted-index unit of the similarity retrieval model to obtain buckets whose Hamming distance to the hash code is less than or equal to a preset value;
    a retrieval-result acquisition module, configured to reorder all high-dimensional data in the buckets to obtain a list of similar high-dimensional data, and to take the list of similar high-dimensional data as the retrieval result.
  8. An electronic device for high-dimensional data retrieval, characterized by comprising a memory and a processor, the processor and the memory communicating with each other via a bus; the memory stores program instructions executable by the processor, and the processor calls the program instructions to perform the retrieval method according to any one of claims 1 to 6.
  9. A computer program product, characterized in that the computer program product comprises a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions that, when executed by a computer, cause the computer to perform the retrieval method according to any one of claims 1 to 6.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the retrieval method according to any one of claims 1 to 6.
CN201711353318.8A 2017-12-15 2017-12-15 High-dimensional data retrieval method and system based on Cauchy-distribution hashing Active CN107992611B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711353318.8A CN107992611B (en) 2017-12-15 2017-12-15 High-dimensional data retrieval method and system based on Cauchy-distribution hashing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711353318.8A CN107992611B (en) 2017-12-15 2017-12-15 High-dimensional data retrieval method and system based on Cauchy-distribution hashing

Publications (2)

Publication Number Publication Date
CN107992611A true CN107992611A (en) 2018-05-04
CN107992611B CN107992611B (en) 2018-12-28

Family

ID=62038530

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711353318.8A Active CN107992611B (en) 2017-12-15 2017-12-15 The high dimensional data search method and system of hash method are distributed based on Cauchy

Country Status (1)

Country Link
CN (1) CN107992611B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110059206A (en) * 2019-03-29 2019-07-26 银江股份有限公司 A kind of extensive hashing image search method based on depth representative learning
CN110209867A (en) * 2019-06-05 2019-09-06 腾讯科技(深圳)有限公司 Training method, device, equipment and the storage medium of image encrypting algorithm
CN110489515A (en) * 2019-08-01 2019-11-22 卫盈联信息技术(深圳)有限公司 Method, server and the storage medium of address list retrieval
CN111260101A (en) * 2018-11-30 2020-06-09 北京嘀嘀无限科技发展有限公司 Information processing method and device
CN111612079A (en) * 2020-05-22 2020-09-01 深圳前海微众银行股份有限公司 Data right confirming method, equipment and readable storage medium
CN112395438A (en) * 2020-11-05 2021-02-23 华中科技大学 Hash code generation method and system for multi-label image

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120328215A1 (en) * 2006-12-29 2012-12-27 Jm Van Thong Image-based retrieval for high quality visual or acoustic rendering
CN104834748A (en) * 2015-05-25 2015-08-12 中国科学院自动化研究所 Image retrieval method utilizing deep semantic to rank hash codes
CN105279554A (en) * 2015-09-29 2016-01-27 东方网力科技股份有限公司 Depth neural network training method and device based on Hash coding layer
CN106169961A (en) * 2016-09-07 2016-11-30 北京百度网讯科技有限公司 The network parameter processing method and processing device of neutral net based on artificial intelligence
CN106407352A (en) * 2016-09-06 2017-02-15 广东顺德中山大学卡内基梅隆大学国际联合研究院 Traffic image retrieval method based on depth learning
CN106503106A (en) * 2016-10-17 2017-03-15 北京工业大学 A kind of image hash index construction method based on deep learning
CN106777349A (en) * 2017-01-16 2017-05-31 广东工业大学 Face retrieval system and method based on deep learning
US20170293838A1 (en) * 2016-04-06 2017-10-12 Nec Laboratories America, Inc. Deep high-order exemplar learning for hashing and fast information retrieval
US20170316287A1 (en) * 2015-06-05 2017-11-02 At&T Intellectual Property I, L.P. Image hash codes generated by a neural network
CN107330074A (en) * 2017-06-30 2017-11-07 中国科学院计算技术研究所 The image search method encoded based on deep learning and Hash

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120328215A1 (en) * 2006-12-29 2012-12-27 Jm Van Thong Image-based retrieval for high quality visual or acoustic rendering
CN104834748A (en) * 2015-05-25 2015-08-12 中国科学院自动化研究所 Image retrieval method utilizing deep semantic to rank hash codes
US20170316287A1 (en) * 2015-06-05 2017-11-02 At&T Intellectual Property I, L.P. Image hash codes generated by a neural network
CN105279554A (en) * 2015-09-29 2016-01-27 东方网力科技股份有限公司 Depth neural network training method and device based on Hash coding layer
US20170293838A1 (en) * 2016-04-06 2017-10-12 Nec Laboratories America, Inc. Deep high-order exemplar learning for hashing and fast information retrieval
CN106407352A (en) * 2016-09-06 2017-02-15 广东顺德中山大学卡内基梅隆大学国际联合研究院 Traffic image retrieval method based on depth learning
CN106169961A (en) * 2016-09-07 2016-11-30 北京百度网讯科技有限公司 The network parameter processing method and processing device of neutral net based on artificial intelligence
CN106503106A (en) * 2016-10-17 2017-03-15 北京工业大学 A kind of image hash index construction method based on deep learning
CN106777349A (en) * 2017-01-16 2017-05-31 广东工业大学 Face retrieval system and method based on deep learning
CN107330074A (en) * 2017-06-30 2017-11-07 中国科学院计算技术研究所 The image search method encoded based on deep learning and Hash

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
HAN ZHU et al.: "Deep Hashing Network for Efficient Similarity Retrieval", Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16) *
曹玉东 et al.: "Research on indexing technology for high-dimensional image data based on locality-sensitive hashing", Journal of Liaoning University of Technology (Natural Science Edition) *
王云飞: "Research on similarity search based on latent semantic hashing", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111260101A (en) * 2018-11-30 2020-06-09 北京嘀嘀无限科技发展有限公司 Information processing method and device
CN111260101B (en) * 2018-11-30 2022-07-08 北京嘀嘀无限科技发展有限公司 Information processing method and device
CN110059206A (en) * 2019-03-29 2019-07-26 银江股份有限公司 A kind of extensive hashing image search method based on depth representative learning
CN110209867A (en) * 2019-06-05 2019-09-06 腾讯科技(深圳)有限公司 Training method, device, equipment and the storage medium of image encrypting algorithm
CN110209867B (en) * 2019-06-05 2023-05-16 腾讯科技(深圳)有限公司 Training method, device, equipment and storage medium for image retrieval model
CN110489515A (en) * 2019-08-01 2019-11-22 卫盈联信息技术(深圳)有限公司 Method, server and the storage medium of address list retrieval
CN111612079A (en) * 2020-05-22 2020-09-01 深圳前海微众银行股份有限公司 Data right confirming method, equipment and readable storage medium
CN111612079B (en) * 2020-05-22 2021-07-20 深圳前海微众银行股份有限公司 Data right confirming method, equipment and readable storage medium
CN112395438A (en) * 2020-11-05 2021-02-23 华中科技大学 Hash code generation method and system for multi-label image

Also Published As

Publication number Publication date
CN107992611B (en) 2018-12-28

Similar Documents

Publication Publication Date Title
CN107992611A (en) The high dimensional data search method and system of hash method are distributed based on Cauchy
CN109657246B (en) Method for establishing extraction type machine reading understanding model based on deep learning
CN109635936A (en) A kind of neural networks pruning quantization method based on retraining
CN107943938A (en) A kind of large-scale image similar to search method and system quantified based on depth product
Cao et al. Automatic selection of t-SNE perplexity
CN109558477A (en) A kind of community&#39;s question answering system, method and electronic equipment based on multi-task learning
CN104598611B (en) The method and system being ranked up to search entry
CN104156433B (en) Image retrieval method based on semantic mapping space construction
CN110457514A (en) A kind of multi-tag image search method based on depth Hash
CN113609326B (en) Image description generation method based on relationship between external knowledge and target
CN115080801B (en) Cross-modal retrieval method and system based on federal learning and data binary representation
CN111930887A (en) Multi-document multi-answer machine reading understanding system based on joint training mode
CN109063112A (en) A kind of fast image retrieval method based on multi-task learning deep semantic Hash, model and model building method
CN110019652A (en) A kind of cross-module state Hash search method based on deep learning
CN109740158A (en) Text semantic parsing method and device
CN118394890B (en) Knowledge retrieval enhancement generation method and system based on large language model
CN108228674A (en) A kind of information processing method and device based on DKT
CN112487020A (en) Method and system for converting graph of SQL to text into natural language statement
He et al. Directed acyclic graphs-based diagnosis approach using small data sets for sustainability
Yang et al. Endowing Interpretability for Neural Cognitive Diagnosis by Efficient Kolmogorov-Arnold Networks
CN117807232A (en) Commodity classification method, commodity classification model construction method and device
CN117609577A (en) Employment recommendation processing method and system based on artificial intelligence
CN116797832A (en) Stropharia rugoso-annulata hierarchical detection method based on mixed deep learning model
Yang et al. Evolutionary multi-objective neural architecture search for generalized cognitive diagnosis models
CN112632406B (en) Query method, query device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant