CN115130571A

CN115130571A - Feature encoding method, feature decoding method, feature encoding device, feature decoding device, electronic device, and storage medium

Info

Publication number: CN115130571A
Application number: CN202210730401.7A
Authority: CN
Inventors: 粘春湄; 林聚财; 江东; 陈瑶; 张雪; 施晓迪; 殷俊
Original assignee: Zhejiang Dahua Technology Co Ltd
Current assignee: Zhejiang Dahua Technology Co Ltd
Priority date: 2022-06-24
Filing date: 2022-06-24
Publication date: 2022-09-30

Abstract

The present application relates to the field of computer technologies, and in particular to a feature encoding method, a feature decoding method, an apparatus, an electronic device, and a storage medium. The feature encoding method includes: performing feature extraction on an acquired media resource to obtain an original feature vector; performing first processing on the original feature vector to obtain a first feature vector, where the first processing includes any one of feature normalization processing, principal component analysis processing, and feature transformation processing; and encoding the first feature vector to obtain encoded feature information. Performing the first processing gives the original feature vector a reasonable distribution, which can improve the compression rate and the encoding efficiency of feature encoding. Further, the decoding end decodes the encoded feature information and then performs second processing, which is the inverse of the first processing, to obtain a reconstructed feature vector, and performs enhancement processing on the reconstructed feature vector to reduce the reconstruction error.

Description

Feature encoding method, feature decoding method, feature encoding device, feature decoding device, electronic device, and storage medium

Technical Field

The present application relates to the field of computer technologies, and in particular, to a feature encoding and decoding method and apparatus, an electronic device, and a storage medium.

Background

In the related art, when media resources are transmitted, an encoding end encodes the media resources and transmits the encoded media resources to a decoding end, and the decoding end decodes the encoded media resources so as to execute corresponding processing tasks by using the decoded media resources; media assets include images, video, text, voice, etc. For example, in a traditional machine vision task, a picture captured by a camera is compressed into a picture stream by JPEG and then transmitted to a server, and the server decodes the picture stream and then matches the decoded picture stream with a picture in a database to obtain an identification result.

However, the method of encoding and transmitting media resources is not only time-consuming, but also occupies a large bandwidth, resulting in a slow transmission speed and affecting the efficiency of subsequent processing tasks. In addition, transmitting media resources is not conducive to protecting user privacy. Therefore, the use of the feature data extracted from the media resources for subsequent processing tasks is widely studied, and the extracted feature data is encoded and transmitted, so that the occupied bandwidth can be reduced, and the privacy of the user can be protected.

As the implementation of the processing task becomes more sophisticated, the implementation of the processing task is required to be faster, lighter, and lower in power consumption. In order to satisfy the above requirements, a higher compression rate and efficiency are required when encoding the extracted feature data, and therefore, how to improve the compression rate and encoding efficiency of feature encoding is a problem to be solved.

Disclosure of Invention

The embodiment of the application provides a feature encoding and decoding method, a feature encoding and decoding device, an electronic device and a storage medium, which are used for improving the compression rate and the encoding efficiency of feature encoding.

In one aspect, an embodiment of the present application provides a feature encoding method, applied to an encoding end, including:

acquiring media resources, and extracting the characteristics of the media resources to obtain original characteristic vectors;

performing first processing on the original characteristic vector to obtain a first characteristic vector; wherein the first processing includes any one of: feature normalization processing, principal component analysis processing and feature transformation processing;

and carrying out coding processing on the first feature vector to obtain coding feature information.

Optionally, the first processing includes feature normalization processing, and the performing first processing on the original feature vector to obtain a first feature vector includes:

performing linear normalization processing on the original feature vector based on the maximum feature value and the minimum feature value corresponding to the original feature vector to obtain the first feature vector; or

performing zero-mean normalization processing on the original feature vector based on the mean value and the variance corresponding to the original feature vector to obtain the first feature vector.

Optionally, after the encoding processing is performed on the first feature vector and the encoding feature information is obtained, the method further includes:

transmitting the coding feature information and the feature normalization parameter to the decoding end; wherein the feature normalization parameter includes a first parameter or a second parameter, the first parameter includes the maximum feature value and the minimum feature value, and the second parameter includes the mean and the variance.

Optionally, the first processing includes principal component analysis processing, and the performing first processing on the original feature vector to obtain a first feature vector includes:

based on a preset mean vector, performing mean removing operation on the original feature vector to obtain a candidate feature vector;

and performing a first transformation operation on the candidate feature vector based on a preset covariance matrix to obtain the first feature vector.

Optionally, the first processing includes feature transformation processing, and the first processing is performed on the original feature vector to obtain a first feature vector, where the first processing includes any one of the following operations:

and converting the original characteristic vector from a pixel domain to a frequency domain based on a discrete cosine transform method to obtain the first characteristic vector.

In one aspect, an embodiment of the present application provides a feature decoding method, applied to a decoding end, including:

decoding the coded feature information to obtain a decoded feature vector;

performing second processing on the decoded feature vector to obtain a reconstructed feature vector; wherein the second processing includes any one of: characteristic normalization inverse processing, principal component analysis inverse processing and characteristic transformation inverse processing;

and performing enhancement processing on the reconstructed feature vector to obtain a generated feature vector.

Optionally, the enhancing the reconstructed feature vector to obtain a generated feature vector includes:

performing convolution operation on the reconstructed feature vectors through a plurality of convolution modules of a multi-scale enhancement network respectively to obtain a plurality of intermediate feature vectors;

and carrying out fusion processing on the plurality of intermediate feature vectors to obtain the generated feature vectors.

or enhancing the reconstructed feature vector through a generation module of a generative adversarial network to obtain the generated feature vector;

wherein the generative adversarial network comprises the generation module and a discrimination module, and its training process comprises the following operations, executed iteratively until a preset convergence condition is met: acquiring a sample reconstructed feature vector of a sample media resource; inputting the sample reconstructed feature vector, with noise data added, into the generation module to obtain a sample generated feature vector; and judging, through the discrimination module, whether the sample generated feature vector is the same as a sample original feature vector; if so, adjusting the parameters of the discrimination module, and if not, adjusting the parameters of the generation module.

Optionally, the second processing includes inverse feature normalization processing, and before the decoding processing is performed on the encoded feature information to obtain the decoded feature vector, the method further includes:

receiving the coding feature information and the feature normalization parameter sent by a coding end; wherein the feature normalization parameter comprises a first parameter or a second parameter, the first parameter comprises a maximum feature value and a minimum feature value, and the second parameter comprises a mean and a variance;

the second processing on the decoded feature vector to obtain a reconstructed feature vector includes:

based on the first parameter, performing inverse linear normalization processing on the decoded feature vector to obtain the reconstructed feature vector; or

performing inverse zero-mean normalization processing on the decoded feature vector based on the second parameter to obtain the reconstructed feature vector.

Optionally, the second processing includes inverse principal component analysis processing, and the second processing is performed on the decoded feature vector to obtain a reconstructed feature vector, including:

performing a second transformation operation on the decoded feature vector based on a preset covariance matrix to obtain a second feature vector;

and performing a mean-adding operation on the second feature vector based on a preset mean vector to obtain the reconstructed feature vector.

Optionally, the second processing includes inverse feature transformation processing, and the second processing is performed on the decoded feature vector to obtain a reconstructed feature vector, where the second processing includes:

and converting the decoding feature vector from a frequency domain to a pixel domain based on an inverse discrete cosine transform method to obtain the reconstruction feature vector.

Optionally, the media resource is an image, and the method further includes:

searching whether an object feature vector matched with the generated feature vector exists or not from a preset feature database so as to execute an object re-identification task;

executing target segmentation processing on the generated feature vector to obtain a target segmentation result of the image;

performing super-resolution processing on the generated feature vector to generate a super-resolution image;

and performing denoising processing on the generated feature vector to generate a denoised image.

In one aspect, an embodiment of the present application provides a feature encoding apparatus, including:

a feature extraction module, configured to acquire media resources and perform feature extraction on the media resources to obtain original feature vectors;

the first processing module is used for carrying out first processing on the original characteristic vector to obtain a first characteristic vector; wherein the first processing includes any one of: feature normalization processing, principal component analysis processing and feature transformation processing;

and the coding module is used for coding the first feature vector to obtain coding feature information.

Optionally, the first processing includes feature normalization processing, and the first processing module is configured to:

performing linear normalization processing on the original feature vector based on the maximum feature value and the minimum feature value corresponding to the original feature vector to obtain the first feature vector; or alternatively

Optionally, the apparatus further comprises a transmission module, configured to:

transmitting the coding feature information and the feature normalization parameter to a decoding end; wherein the feature normalization parameter includes a first parameter or a second parameter, the first parameter includes the maximum feature value and the minimum feature value, and the second parameter includes the mean and the variance.

Optionally, the first processing includes principal component analysis processing, and the first processing module is configured to:

Optionally, the first processing includes feature transformation processing, and the first processing module is configured to:

and converting the original feature vector from a pixel domain to a frequency domain based on a discrete cosine transform method to obtain the first feature vector.

In one aspect, an embodiment of the present application provides a feature decoding apparatus, including:

the decoding module is used for decoding the coded characteristic information to obtain a decoded characteristic vector;

the second processing module is used for carrying out second processing on the decoding characteristic vector to obtain a reconstruction characteristic vector; wherein the second processing includes any one of: characteristic normalization inverse processing, principal component analysis inverse processing and characteristic transformation inverse processing;

and the enhancement module is used for enhancing the reconstructed feature vector to obtain a generated feature vector.

Optionally, the enhancement module is further configured to:

Optionally, the second processing includes inverse feature normalization, and the receiving module is further configured to:

receiving the coding feature information and the feature normalization parameter sent by a coding end; wherein the feature normalization parameter comprises a first parameter or a second parameter, the first parameter comprises the maximum feature value and the minimum feature value, and the second parameter comprises the mean and the variance;

the second processing module is further configured to:

based on the first parameter, carrying out linear normalization inverse processing on the decoding characteristic vector to obtain the reconstruction characteristic vector; or

Optionally, the second processing includes a principal component analysis inverse processing, and the second processing module is further configured to:

and performing a second transformation operation on the decoded feature vector based on a preset covariance matrix to obtain a second feature vector.

Optionally, the second processing includes inverse feature transformation processing, and the second processing module is further configured to:

Optionally, the media resource is an image, and the apparatus further includes a task processing module, configured to:

In one aspect, the present application provides an electronic device, which includes a processor and a memory, where the memory stores a computer program, and when the computer program is executed by the processor, the processor is caused to execute the steps of any one of the above feature encoding methods or feature decoding methods.

In one aspect, the present application provides a computer storage medium including a computer program, where, when the computer program runs on an electronic device, the computer program is configured to enable the electronic device to perform the steps of any one of the above feature encoding methods or feature decoding methods.

Due to the adoption of the technical scheme, the embodiment of the application has at least the following technical effects:

in the solution of the embodiment of the present application, after an original feature vector of a media resource is extracted, a first process is performed on the original feature vector to obtain a first feature vector, where the first process may be one of a feature normalization process, a principal component analysis process, and a feature transformation process, and the first feature vector is encoded to obtain encoded feature information. The original characteristic vectors can be reasonably distributed by carrying out first processing on the original characteristic vectors, so that when the first characteristic vectors after the first processing are coded, the distribution characteristics of the first characteristic vectors can be utilized, and the compression rate and the coding efficiency of characteristic coding can be favorably improved. In addition, the method is beneficial to coding the feature data extracted by adopting different feature extraction methods, and the generalization of feature coding is improved.

Furthermore, the decoding end decodes the coded feature information and then carries out second processing to obtain a reconstructed feature vector, the second processing is the inverse process of the first processing, and then the reconstructed feature vector is subjected to enhancement processing, so that the reconstruction error can be reduced, and the accuracy of a processing task is improved when the enhanced generated feature vector executes the processing task.

Additional features and advantages of the present application will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.

Fig. 1 is a flowchart of a feature encoding method according to an embodiment of the present application;

Fig. 2 is a schematic diagram of features before and after PCA processing according to an embodiment of the present application;

Fig. 3 is a flowchart of a feature decoding method according to an embodiment of the present application;

Fig. 4 is a schematic structural diagram of a multi-scale enhancement network according to an embodiment of the present application;

Fig. 5 is a schematic diagram illustrating a training process of a generative adversarial network according to an embodiment of the present application;

Fig. 6 is a logic diagram illustrating a feature encoding and decoding method according to an embodiment of the present application;

Fig. 7 is a block diagram of a feature encoding apparatus according to an embodiment of the present application;

Fig. 8 is a block diagram of a feature decoding apparatus according to an embodiment of the present application;

Fig. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application clearer, the present application will be described in further detail with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The word "exemplary" is used hereinafter to mean "serving as an example, embodiment, or illustration." Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.

The terms "first" and "second" are used herein for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature, and in the description of embodiments of the application, unless stated otherwise, "plurality" means two or more.

In the related art, the method for encoding and transmitting the media resources is time-consuming, occupies a large bandwidth, and causes a slow transmission speed, thereby affecting the efficiency of subsequent processing tasks. In addition, transmitting media resources is not conducive to protecting user privacy. Therefore, the use of features extracted from media resources for subsequent processing tasks is widely studied, and the extracted features are encoded and transmitted, so that the occupied bandwidth can be reduced, and the privacy of users can be protected.

As the implementation means of the processing task becomes more mature, the implementation process of the processing task is required to be faster, lighter and lower in energy consumption. In order to satisfy the above requirements, a higher compression rate is required when encoding the extracted feature data, and therefore, how to increase the compression rate of feature encoding is a problem to be solved. In view of this, embodiments of the present application provide a feature encoding method, a decoding method, an apparatus, an electronic device, and a storage medium, where after an original feature vector of a media resource is extracted, the original feature vector is subjected to a first processing, so that the original feature vector can be reasonably distributed, which is beneficial to improving the compression rate and encoding efficiency of feature encoding. In addition, the method is beneficial to coding the feature data extracted by adopting different feature extraction methods, and the generalization of feature coding is improved. Furthermore, after the decoding end decodes and reconstructs the coded feature information, the reconstructed feature vector is enhanced, so that the reconstruction error can be reduced, and the accuracy of a processing task is improved when the enhanced generated feature vector executes the processing task.

Specific embodiments of the feature encoding method and the feature decoding method according to the embodiments of the present application are described below with reference to the drawings.

Fig. 1 shows a flowchart of a feature encoding method according to an embodiment of the present application, which may be performed by an encoding end. Referring to fig. 1, the feature encoding method of the embodiment of the present application may include the following steps S101 to S103:

step S101, media resources are obtained, feature extraction is carried out on the media resources, and original feature vectors are obtained.

Media resources include, but are not limited to, image information, video information, text information, voice information, and the like. Methods for performing feature extraction on media resources include, but are not limited to, deep-learning-based feature extraction methods and traditional feature extraction methods, and a corresponding feature extraction method may be adopted for each kind of media resource. For example, when the media resource is image information, feature extraction may be performed with a deep learning network such as a Residual Network (ResNet), for example ResNet-50, or with a conventional feature extraction method such as the Scale-Invariant Feature Transform (SIFT), which is not limited herein.

Step S102, carrying out first processing on the original characteristic vector to obtain a first characteristic vector; the first process includes any one of: feature normalization processing, principal component analysis processing and feature transformation processing.

The original feature vectors can be mapped into a designated numerical value interval or designated distribution through feature normalization processing, so that the original feature vectors are reasonably distributed, and the compression rate and the encoding efficiency of feature encoding are improved.

The original feature vector can be transformed through Principal Component Analysis (PCA) processing, so that feature data with similar values in the transformed first feature vector are placed in the same region, which makes it easier to exploit spatial correlation during feature encoding and improves the compression rate and encoding efficiency of feature encoding. In addition, PCA processing can map the original feature vector into a lower-dimensional space, so that more of the characteristics of the original feature vector are retained with fewer data dimensions; alternatively, the number of dimensions of the original feature vector can be kept unchanged.

Through the feature transformation processing, most of the signal energy of the original feature vector can be concentrated in a small range of a frequency domain, so that high-frequency energy and low-frequency energy can be distinguished, and the compression rate and the encoding efficiency of feature encoding can be improved.
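The energy-compaction effect described above can be illustrated with a minimal Python sketch. The function names `dct` and `idct`, and the choice of the orthonormal DCT-II with its DCT-III inverse, are assumptions made for illustration only; the application does not state which DCT variant is used.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a 1-D list of floats (variant chosen for illustration)."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        out.append(scale * total)
    return out

def idct(coeffs):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k in range(n):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out

# A slowly varying feature vector: almost all energy lands in the first coefficient.
features = [4.0, 4.1, 3.9, 4.0]
coeffs = dct(features)
restored = idct(coeffs)
```

For this input the first coefficient carries nearly all of the signal energy, which is the property an encoder can exploit; the remaining coefficients are small and compress well.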

The following embodiments of the present application will further describe the above-mentioned several processing methods.

Step S103, the first feature vector is coded to obtain coding feature information.

In the embodiment of the present application, a coding method based on deep learning or a conventional coding method may be adopted to perform coding processing on the first feature vector. Deep-learning-based coding methods include, but are not limited to, a fully connected dimensionality reduction and quantization method, in which the fully connected dimensionality reduction reduces the dimensionality of the first feature vector through a fully connected layer, and the quantization discretizes the continuous feature data obtained after dimensionality reduction, which facilitates compression. Conventional encoding methods include, but are not limited to, the H.266 encoding method, the H.265 encoding method, the JPEG (Joint Photographic Experts Group) compression method, the JPEG2000 compression method, and the like.
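As a minimal sketch of the fully connected dimensionality reduction followed by quantization, the Python example below applies a single linear layer and then uniform scalar quantization. The helper names, the 4-to-2 layer with hand-picked weights, the 8-bit depth, and the assumption that inputs to the quantizer lie in [0, 1] are all hypothetical; in practice the weights would be learned and the quantizer design may differ.

```python
import math

def fully_connected_reduce(x, weights, bias):
    """Compute y = W x + b; W has one row per output dimension."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def quantize(values, bits=8):
    """Uniformly quantize values assumed to lie in [0, 1] to integers in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    codes = []
    for v in values:
        v = min(1.0, max(0.0, v))  # clamp out-of-range inputs
        codes.append(int(math.floor(v * levels + 0.5)))
    return codes

def dequantize(codes, bits=8):
    """Map integer codes back to approximate values in [0, 1]."""
    levels = (1 << bits) - 1
    return [c / levels for c in codes]

# Hypothetical 4 -> 2 layer; real weights would come from training.
weights = [[0.25, 0.25, 0.25, 0.25], [0.5, 0.0, 0.5, 0.0]]
bias = [0.0, 0.0]
reduced = fully_connected_reduce([0.2, 0.4, 0.6, 0.8], weights, bias)
codes = quantize(reduced)
approx = dequantize(codes)
```

The quantization error per value is bounded by half a quantization step, which is the price paid for turning continuous features into symbols that an entropy coder can compress.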

It should be noted that, the embodiment of the present application may use any suitable encoding method to perform encoding processing on the first feature vector, which is not limited in this respect.

Further, the encoding end may transmit the encoding characteristic information to the decoding end.

In the embodiment of the application, the original feature vectors can be reasonably distributed by carrying out the first processing on the original feature vectors, so that when the first processed feature vectors are coded, the distribution characteristics of the first feature vectors can be utilized, and the compression rate of feature coding is favorably improved. In addition, the method is also beneficial to coding the feature data extracted by adopting different feature extraction methods, and the generalization of feature coding is improved.

A possible implementation of step S102 is described below.

In a first possible implementation manner, the first processing in step S102 includes a feature normalization processing, and the step S102 performs the first processing on the original feature vector to obtain the first feature vector, which may include one of the following two manners.

In the first mode, linear normalization processing is performed on the original feature vector, based on the maximum feature value and the minimum feature value corresponding to the original feature vector, to obtain the first feature vector.

The original feature vector includes a plurality of dimensional features, for example 2048 dimensional features, and a maximum value of the plurality of dimensional features is used as a maximum feature value, and a minimum value of the plurality of dimensional features is used as a minimum feature value.

Specifically, each dimension feature in the original feature vector may be subjected to linear normalization processing by using the following formula (1):

$$X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}} \quad (1)$$

where $X_{max}$ and $X_{min}$ respectively represent the maximum feature value and the minimum feature value, and each dimension feature $X_{norm}$ after linear normalization lies in [0, 1]. In addition, $X_{max}$ and $X_{min}$ need to be transmitted to the decoding end, so that the decoding end can perform inverse linear normalization processing after decoding.

Each dimension feature in the original feature vector can be mapped between [0, 1] through the formula (1) so as to reasonably distribute the original feature vector, which is beneficial to improving the compression rate and the encoding efficiency of feature encoding.
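The linear normalization and its inverse at the decoding end can be sketched in Python as follows. The function names are hypothetical, and the handling of a constant vector (where the maximum equals the minimum) is an assumption, since the text does not cover that case.

```python
def linear_normalize(x):
    """Map each feature into [0, 1]; return the vector and the two parameters to transmit."""
    x_min, x_max = min(x), max(x)
    span = x_max - x_min
    if span == 0:
        # Assumption: a constant vector maps to all zeros (not specified in the text).
        return [0.0 for _ in x], x_min, x_max
    return [(v - x_min) / span for v in x], x_min, x_max

def linear_denormalize(x_norm, x_min, x_max):
    """Inverse processing at the decoding end, using the transmitted parameters."""
    return [v * (x_max - x_min) + x_min for v in x_norm]

normalized, lo, hi = linear_normalize([2.0, 4.0, 6.0])
restored = linear_denormalize(normalized, lo, hi)
```

Only the two scalars `lo` and `hi` need to accompany the encoded features for the decoder to undo the mapping.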

In the second mode, zero-mean normalization processing is performed on the original feature vector, based on the mean and the variance corresponding to the original feature vector, to obtain the first feature vector.

As can be seen from the above, the original feature vector includes a plurality of dimensional features, the average of the plurality of dimensional features is used as a mean, and the average of the square of the difference between each of the plurality of dimensional features and the mean is used as a variance.

Specifically, the zero-mean normalization processing may be performed on the original feature vector by using the following formula (2):

$$X_{norm} = \frac{X - \mu}{\sigma} \quad (2)$$

where $\mu$ represents the mean corresponding to the original feature vector, and $\sigma$ represents the standard deviation, that is, the square root of the variance corresponding to the original feature vector. $\mu$ and $\sigma$ need to be transmitted to the decoding end, so that the decoding end can perform inverse zero-mean normalization processing after decoding.

The original feature vectors can be mapped to the distribution with the mean value of 0 and the variance of 1 through the formula (2) so as to reasonably distribute the original feature vectors, and the compression rate and the coding efficiency of feature coding are favorably improved.
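A minimal Python sketch of zero-mean normalization and its inverse follows. It assumes that the normalization divides by the standard deviation (the square root of the variance), which is what yields a variance of 1; the function names and the treatment of a zero-variance vector are hypothetical.

```python
import math

def zero_mean_normalize(x):
    """Return the normalized vector plus the mean and variance to transmit."""
    mean = sum(x) / len(x)
    variance = sum((v - mean) ** 2 for v in x) / len(x)
    std = math.sqrt(variance)
    if std == 0:
        # Assumption: a constant vector maps to all zeros (not specified in the text).
        return [0.0 for _ in x], mean, variance
    return [(v - mean) / std for v in x], mean, variance

def zero_mean_denormalize(x_norm, mean, variance):
    """Inverse processing at the decoding end, using the transmitted mean and variance."""
    std = math.sqrt(variance)
    return [v * std + mean for v in x_norm]

normalized, mu, var = zero_mean_normalize([1.0, 2.0, 3.0, 4.0])
restored = zero_mean_denormalize(normalized, mu, var)
```

The variance here is the population variance (the average squared deviation from the mean), matching the definition given in the text.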

Further, the transmitting the encoding characteristic information to the decoding end in step S104 may include:

transmitting the coding feature information and the feature normalization parameters to a decoding end; the normalization parameter includes a first parameter or a second parameter, the first parameter includes the maximum characteristic value and the minimum characteristic value, and the second parameter includes the mean value and the variance.

The feature normalization parameter is transmitted to the decoding end so that the decoding end can perform inverse feature normalization processing on the decoded feature data, that is, the inverse of formula (1) or the inverse of formula (2).

In a second possible implementation, the first processing in step S102 includes principal component analysis processing, and performing the first processing on the original feature vector in step S102 to obtain the first feature vector may include the following steps A1-A2:

A1, based on the preset mean vector, executing a mean removing operation on the original feature vector to obtain a candidate feature vector.

The mean vector comprises a mean value of each dimension feature in the original feature vector, and the mean value removing operation means that a corresponding mean value is subtracted from each dimension feature to obtain a candidate feature vector after mean value removing.

A2, based on the preset covariance matrix, executing a first transformation operation on the candidate eigenvector to obtain a first eigenvector.

In this step, first, based on a preset covariance matrix, an eigenvalue of the covariance matrix and a corresponding eigenvector are calculated, as shown in the following formula (3):

Cu＝λu (3)

where C is the covariance matrix, u is an eigenvector, and λ is an eigenvalue. λ takes a plurality of values, which can be sorted from large to small; the first k values are selected to obtain the corresponding k eigenvectors.

Then, the original feature vector is projected onto the k eigenvectors to obtain a new k-dimensional feature, which is used as the first feature vector. Assume that the original feature vector is represented as $x = (x_1, x_2, \ldots, x_N)^T$ and that the k eigenvectors are $u_1, u_2, \ldots, u_k$; the new k-dimensional feature vector after projection is then represented as $y = (y_1, y_2, \ldots, y_k)^T$. The projection is shown in the following formula (4):

$$y_j = u_j^T x, \quad j = 1, 2, \ldots, k \qquad (4)$$

When the value of k is smaller than the number of dimensions of the original feature vector, dimensionality reduction is achieved, so that more of the characteristics of the original data are retained using fewer data dimensions.

It should be noted that the mean vector and the covariance matrix are trained through sample data and exist in the encoding end and the decoding end at the same time, so that the encoding end does not need to transmit the mean vector and the covariance matrix to the decoding end.
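Steps A1 and A2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the present application: the names `top_k_eigenvectors` and `pca_encode` are hypothetical, and the preset mean vector and covariance matrix are small made-up values.

```python
import numpy as np

def top_k_eigenvectors(cov, k):
    """Columns are the k eigenvectors of `cov` with the largest eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric input, eigenvalues ascending
    order = np.argsort(eigvals)[::-1][:k]    # indices of the k largest eigenvalues
    return eigvecs[:, order]

def pca_encode(x, mean_vec, cov, k):
    candidate = x - mean_vec                 # step A1: mean removal
    u = top_k_eigenvectors(cov, k)           # eigenvectors solving C u = lambda u
    return u.T @ candidate                   # step A2: projection onto k eigenvectors

# Made-up preset statistics shared by the encoding end and the decoding end.
mean_vec = np.array([1.0, 2.0, 3.0])
cov = np.diag([4.0, 1.0, 0.25])

x = np.array([3.0, 2.5, 3.1])
first = pca_encode(x, mean_vec, cov, k=2)    # 3 dimensions reduced to 2
```

Because the sign of each eigenvector is arbitrary, both ends need to derive the eigenvectors in the same deterministic way for the decoding end to invert the projection.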

The training process of the above-mentioned mean vector and covariance matrix is explained below.

Step 1: Mean removal.

For M sample feature vectors $\{X^1, X^2, \ldots, X^M\}$, each sample feature vector contains N dimensional features, $X^i = (x_1^i, x_2^i, \ldots, x_N^i)$.

The average value of each dimension feature over the M sample feature vectors is calculated, and the average value of each dimension is subtracted from the corresponding dimension feature in each sample feature vector to obtain a new feature value. Taking N = 2 as an example, the average value is calculated as shown in the following formula (5):

$$\bar{x}_1 = \frac{1}{M}\sum_{i=1}^{M} x_1^i, \qquad \bar{x}_2 = \frac{1}{M}\sum_{i=1}^{M} x_2^i \qquad (5)$$

Step 2: Covariance matrix calculation.

Specifically, the covariance matrix may be calculated based on the following formulas (6) and (7):

where x₁ and x₂ are the two feature dimensions when N is 2.
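The two training steps can be sketched as follows. The function name `train_pca_statistics` is hypothetical, and the 1/(M-1) normalization of the covariance is an assumption (it matches the default of `numpy.cov`); the present application may use a different normalization.

```python
import numpy as np

def train_pca_statistics(samples):
    """samples: array of shape (M, N). Returns (mean_vec, cov)."""
    samples = np.asarray(samples, dtype=np.float64)
    mean_vec = samples.mean(axis=0)              # step 1: mean of each dimension
    centered = samples - mean_vec                # step 1: mean removal
    m = samples.shape[0]
    cov = centered.T @ centered / (m - 1)        # step 2: N x N covariance matrix
    return mean_vec, cov

# Three made-up sample feature vectors with N = 2.
samples = np.array([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])
mean_vec, cov = train_pca_statistics(samples)
```

The resulting mean vector and covariance matrix would be stored at both the encoding end and the decoding end, so neither needs to be transmitted.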

In the foregoing embodiment, when the original feature vector is transformed by principal component analysis, the eigenvalues of the covariance matrix are sorted and the original feature vector is transformed based on the eigenvectors corresponding to the selected eigenvalues, so that feature values of similar magnitude in the transformed first feature vector are placed in the same region. As shown in fig. 2, (a) is the feature map corresponding to the original feature vector, in which the data distribution is relatively disordered, and (b) is the feature map corresponding to the first feature vector after PCA processing, in which the data is concentrated in one region, for example the right region of (b). This facilitates the use of spatial correlation during feature coding and improves the coding efficiency.

In a third possible implementation, the first processing in step S102 includes feature transformation processing, and the step S102 performs the first processing on the original feature vector to obtain a first feature vector, including the following steps:

Specifically, a one-dimensional discrete cosine transform may be performed on the original feature vector, so as to convert the original feature vector from the pixel domain to the frequency domain, as shown in the following formulas (8) and (9):

$$F(u) = c(u) \sum_{i=0}^{N-1} f(i) \cos\left[\frac{(2i+1)u\pi}{2N}\right], \quad u = 0, 1, \ldots, N-1 \qquad (8)$$

$$c(u) = \begin{cases} \sqrt{1/N}, & u = 0 \\ \sqrt{2/N}, & u \neq 0 \end{cases} \qquad (9)$$

where f(i) is the original feature vector, F(u) is the first feature vector after the discrete cosine transform, N is the number of dimensions of the original feature vector, and c(u) is the preset compensation coefficient. The inverse discrete cosine transform is the inverse process of formulas (8) and (9).

Through discrete cosine transform processing, most of signal energy of the original feature vector can be concentrated in a small range of a frequency domain, so that high-frequency energy and low-frequency energy can be distinguished, and the compression rate and the encoding efficiency of feature encoding can be improved.
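The energy concentration described above can be illustrated with a short sketch. It assumes the orthonormal one-dimensional DCT (DCT-II with scaling factors sqrt(1/N) and sqrt(2/N)); the helper names and the smooth test vector are made up for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix D, so that F = D @ f and f = D.T @ F."""
    i = np.arange(n)                         # sample index
    u = i.reshape(-1, 1)                     # frequency index
    basis = np.cos((2 * i + 1) * u * np.pi / (2 * n))
    c = np.full(n, np.sqrt(2.0 / n))         # compensation coefficient for u != 0
    c[0] = np.sqrt(1.0 / n)                  # compensation coefficient for u == 0
    return c.reshape(-1, 1) * basis

def dct_1d(f):
    return dct_matrix(len(f)) @ f

def idct_1d(F):
    return dct_matrix(len(F)).T @ F          # the transpose inverts an orthonormal matrix

f = np.linspace(0.0, 1.0, 16)                # a smooth, made-up feature vector
F = dct_1d(f)
low_freq_ratio = np.sum(F[:4] ** 2) / np.sum(F ** 2)   # energy share of 4 lowest frequencies
f_rec = idct_1d(F)
```

For a smooth input such as this one, `low_freq_ratio` is close to 1, which is the property that makes the transformed vector easier to compress; real feature vectors may concentrate energy less strongly.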

Besides the discrete cosine transform, other characteristic transform methods such as discrete sine transform and wavelet transform may be used, and may be specifically selected as needed.

Based on the same inventive concept, the embodiment of the application also provides a feature decoding method which can be executed by a decoding end. Referring to fig. 3, a feature decoding method provided in the embodiment of the present application includes the following steps S301 to S303:

Step S301, decoding the encoded feature information to obtain a decoded feature vector.

Before step S301 is executed, the decoding end may receive encoding characteristic information of the media resource sent by the encoding end, where the encoding characteristic information may be obtained after the encoding end performs characteristic encoding on the media resource based on steps S101 to S104.

The decoding method may correspond to the encoding method in the above embodiment, and the encoded feature information may be decoded by using a deep learning-based decoding method or a conventional decoding method. Deep learning-based decoding methods include but are not limited to: an end-to-end decoding method based on a deep neural network, a fully connected dimension-raising and inverse quantization method, and the like. Conventional decoding methods include, but are not limited to, H.266 decoding methods, H.265 decoding methods, JPEG (Joint Photographic Experts Group) decompression methods, JPEG2000 decompression methods, and the like. Based on the corresponding decoding method, the encoded feature information is decoded using the decoding parameters to obtain the decoded feature vector.

Step S302, carrying out second processing on the decoded feature vector to obtain a reconstructed feature vector; the second process includes any one of: characteristic normalization inverse processing, principal component analysis inverse processing and characteristic transformation inverse processing.

In this step, the second process may be the reverse process of the first process in the above-described embodiment, which will be further described in the following embodiment.

Step S303, the reconstructed feature vector is enhanced to obtain a generated feature vector.

The reconstruction error can be reduced by performing enhancement processing on the reconstructed feature vector, so that the accuracy of the processing task is improved when the enhanced generated feature vector executes the processing task.

In this embodiment of the application, the reconstructed feature vector may be enhanced based on a multi-scale enhancement network or a Generative Adversarial Network (GAN), which are introduced below.

In a possible implementation manner, when the reconstructed feature vector is enhanced by using a multi-scale enhancement network, step S303 may include the following steps:

B1, performing convolution operations on the reconstructed feature vector through a plurality of convolution modules of the multi-scale enhancement network, respectively, to obtain a plurality of intermediate feature vectors.

Each convolution module may be a residual block or other convolution module, and is not limited herein. As the scale of multi-scale enhancement networks increases, the number of convolution modules also increases.

B2, performing fusion processing on the plurality of intermediate feature vectors to obtain a generated feature vector.

Illustratively, as shown in fig. 4, the multi-scale enhancement network includes a plurality of residual blocks, each residual block performs convolution operation on the reconstructed feature vector to obtain corresponding features L1, L2.
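Steps B1 and B2 can be sketched structurally as follows. This is only a toy illustration and not the network of fig. 4: it assumes one-dimensional convolutions, a residual block of the form x + conv(x), different kernel sizes as the "scales", and weighted-sum fusion, none of which is specified by the present application. The kernels here are untrained, made-up values.

```python
import numpy as np

def residual_block(x, kernel):
    """A single 1-D convolution with a skip connection: x + conv(x)."""
    return x + np.convolve(x, kernel, mode="same")

def multi_scale_enhance(x, kernels, fuse_weights=None):
    # Step B1: every branch processes the same reconstructed feature vector.
    intermediates = np.stack([residual_block(x, k) for k in kernels])
    # Step B2: fuse the intermediate feature vectors (here: a weighted sum).
    if fuse_weights is None:
        fuse_weights = np.full(len(kernels), 1.0 / len(kernels))
    return fuse_weights @ intermediates

reconstructed = np.arange(8.0)                     # made-up reconstructed feature vector
kernels = [np.array([0.1, 0.0, -0.1]),             # scale 1: kernel size 3
           np.array([0.05, 0.0, 0.0, 0.0, -0.05])] # scale 2: kernel size 5
generated = multi_scale_enhance(reconstructed, kernels)

# With all-zero kernels each branch reduces to the identity mapping.
identity = multi_scale_enhance(reconstructed, [np.zeros(3), np.zeros(5)])
```

In a trained network the kernels and fusion weights would be learned so that the fused output is closer to the original feature vector than the reconstructed input is.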

In another possible implementation, when the reconstructed feature vector is enhanced by using the generative adversarial network (GAN), step S303 may include the following steps:

performing enhancement processing on the reconstructed feature vector through a generation module of the generative adversarial network to obtain a generated feature vector.

Specifically, as shown in fig. 5, the generative adversarial network includes a generation module and a discrimination module, and the training process includes iteratively executing the following operations until a preset convergence condition is satisfied: acquiring a sample reconstructed feature vector of a sample media resource; inputting the sample reconstructed feature vector, with noise data added, into the generation module to obtain a sample generated feature vector; and judging, through the discrimination module, whether the sample generated feature vector is the same as the sample original feature vector; if so, adjusting the parameters of the discrimination module, and if not, adjusting the parameters of the generation module.

The preset convergence condition may be that the number of iterations reaches a first preset number, or that the number of consecutive erroneous judgments by the discrimination module reaches a second preset number, which is not limited herein.

Three embodiments of performing the second processing on the decoded feature vector in step S302 are described below.

In the first embodiment, the second process includes a feature normalization inverse process, and the following steps may be further performed before the step S301 is performed:

receiving coding characteristic information and characteristic normalization parameters sent by a coding end; the normalization parameters comprise a first parameter or a second parameter, the first parameter comprises a maximum characteristic value and a minimum characteristic value, and the second parameter comprises a mean value and a variance.

Since the decoding end needs to use the feature normalization parameter used in the feature normalization process in the above embodiment when performing the feature normalization inverse process after decoding the encoded feature information, the decoding end receives the encoded feature information sent by the encoding end and also receives the feature normalization parameter.

Further, the second processing on the decoded feature vector in step S302 to obtain a reconstructed feature vector may include one of the following operations:

Operation 1: based on the first parameter, performing linear normalization inverse processing on the decoded feature vector to obtain a reconstructed feature vector.

Specifically, based on the maximum feature value and the minimum feature value in the first parameter, the decoded feature vector may be subjected to linear normalization inverse processing according to the inverse process of formula (1) in the above embodiment, so as to obtain the reconstructed feature vector.

Operation 2: based on the second parameter, performing zero-mean normalization inverse processing on the decoded feature vector to obtain a reconstructed feature vector.

Specifically, based on the mean and the variance in the second parameter, the decoded feature vector may be subjected to zero-mean normalization inverse processing according to the inverse process of formula (2) in the above embodiment, so as to obtain the reconstructed feature vector.

In the second embodiment, the second processing includes inverse principal component analysis, and the second processing is performed on the decoded feature vector in the step S302 to obtain a reconstructed feature vector, which may include the following steps C1-C2:

C1, based on a preset covariance matrix, performing a second transformation operation on the decoded feature vector to obtain a second feature vector.

Specifically, the eigenvalues of the covariance matrix and the corresponding eigenvectors may be calculated using formula (3) in the above embodiment based on the preset covariance matrix. The eigenvalues are sorted from large to small, and the first k are selected to obtain the corresponding k eigenvectors. Then, based on the k eigenvectors, the inverse process of formula (4) in the above embodiment is performed on the decoded feature vector to obtain the second feature vector.

C2, based on a preset mean vector, performing a mean addition operation on the second feature vector to obtain the reconstructed feature vector.

That is, the corresponding mean value is added to each dimension feature to obtain the reconstructed feature vector with the mean restored.
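Steps C1 and C2 can be sketched as follows, under the assumption that the inverse of the projection is a back-projection through the same k eigenvectors. The function names and the preset statistics are hypothetical illustration values.

```python
import numpy as np

def top_k_eigenvectors(cov, k):
    """Columns are the k eigenvectors of `cov` with the largest eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:k]
    return eigvecs[:, order]

def pca_decode(decoded, mean_vec, cov, k):
    u = top_k_eigenvectors(cov, k)
    second = u @ decoded                  # step C1: back-projection to N dimensions
    return second + mean_vec              # step C2: mean addition

# Made-up preset statistics, identical at the encoding end and the decoding end.
mean_vec = np.array([1.0, 2.0, 3.0])
cov = np.diag([4.0, 1.0, 0.25])
x = np.array([3.0, 2.5, 3.1])

# Encoding-side projection (for the round trip only), then decoding.
full = top_k_eigenvectors(cov, 3).T @ (x - mean_vec)
rec_full = pca_decode(full, mean_vec, cov, 3)        # k = N: exact reconstruction

reduced = top_k_eigenvectors(cov, 2).T @ (x - mean_vec)
rec_reduced = pca_decode(reduced, mean_vec, cov, 2)  # k < N: lossy reconstruction
```

With k smaller than N, the component along the discarded eigenvector is replaced by its mean, which is one source of the reconstruction error that the subsequent enhancement processing aims to reduce.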

In the third embodiment, the second processing includes inverse feature transformation processing, and the second processing is performed on the decoded feature vector in the step S302 to obtain a reconstructed feature vector, which may include the following steps:

converting the decoded feature vector from the frequency domain to the pixel domain based on an inverse discrete cosine transform method to obtain the reconstructed feature vector.

Specifically, based on the inverse process of formula (8) in the above embodiment, the decoded feature vector may be subjected to the inverse discrete cosine transform, which converts the decoded feature vector from the frequency domain to the pixel domain to obtain the reconstructed feature vector.

When the encoding end performs the feature transformation processing on the original feature vector by using another feature transformation method, for example, discrete sine transform or wavelet transform, the decoding end performs inverse feature transformation processing on the decoded feature vector by using the inverse transformation method corresponding to the one used at the encoding end, to obtain the reconstructed feature vector.

In some embodiments, after obtaining the generated feature vector, a subsequent processing task may be performed based on the generated feature vector, and specifically one of the following processing tasks may be performed:

Processing task 1: searching a preset feature database for an object feature vector that matches the generated feature vector, so as to execute an object re-identification task.

The feature database comprises object feature vectors of a plurality of objects, wherein the objects comprise but are not limited to pedestrians, vehicles and the like, and the object feature vector of each object is obtained by the following steps: and performing feature extraction on the image containing the object to obtain an object feature vector of the object, wherein the feature extraction method is the same as the extraction method of the original feature vector.

Specifically, the generated feature vector is compared with each object feature vector in the feature database, the similarity between the generated feature vector and each object feature vector is determined, and if the similarity between the generated feature vector and a certain object feature vector reaches a similarity threshold, the generated feature vector can be considered to be matched with the object feature vector; wherein, the similarity may be cosine similarity or the like.

Processing task 2: performing target segmentation processing on the generated feature vector to obtain a target segmentation result of the image.

The target segmentation process may segment a foreground and a background in the image, and specifically, based on a generated feature vector of the image, perform a segmentation process of the foreground and the background on the image to obtain foreground information and background information of the image.

Processing task 3: performing super-resolution processing on the generated feature vector to generate a super-resolved image.

Specifically, a super-resolution technique is applied to the generated feature vector to generate a super-resolved image. Super-resolution refers to reconstructing a corresponding high-resolution image from a low-resolution image by using a certain algorithm or model, recovering as much detail information as possible.

Processing task 4: performing denoising processing on the generated feature vector to generate a denoised image.

Specifically, an image denoising technique is applied to the generated feature vector to generate a denoised image. Image denoising refers to the process of reducing noise in a digital image. In practice, digital images are often affected by noise from the imaging equipment and the external environment during digitization and transmission; such images are called noisy images.

In addition to the above processing tasks, other processing tasks may be executed based on the generated feature vectors, and the processing tasks executed for different media resources are different, and are not limited herein.
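The matching described for processing task 1 can be sketched as follows. The function names, the toy two-dimensional database, and the similarity threshold of 0.9 are assumptions for illustration; the present application does not fix a threshold value, and the sketch assumes that no vector is all zeros.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def find_match(query, database, threshold=0.9):
    """Return (index, similarity) of the best match, or None if below threshold."""
    best_idx, best_sim = None, -1.0
    for idx, vec in enumerate(database):       # traverse every object feature vector
        sim = cosine_similarity(query, vec)
        if sim > best_sim:
            best_idx, best_sim = idx, sim
    if best_sim >= threshold:
        return best_idx, best_sim
    return None

# Made-up feature database with three object feature vectors.
database = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])]

hit = find_match(np.array([2.0, 2.1]), database)    # nearly parallel to entry 2
miss = find_match(np.array([1.0, -1.0]), database)  # no entry is similar enough
```

A linear scan is used here for clarity; a large feature database would typically call for an indexed nearest-neighbor search instead.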

The characteristic encoding and decoding methods of the embodiments of the present application are described below by way of specific examples.

As shown in fig. 6, taking the media resource as the image of the pedestrian and the executed processing task as the re-identification of the pedestrian as an example, the overall process of the feature encoding and decoding method includes the following steps:

1. A pedestrian image is input and passed through ResNet-50 to extract a 2048-dimensional float32 vector V₂₀₄₈ representing the feature data.

The pedestrian image is adjusted and passed through ResNet-50 to obtain the vector V₂₀₄₈, which is the original feature vector in the above embodiments of the present application.

2. A discrete cosine transform is applied to the vector V₂₀₄₈, so that the data in V₂₀₄₈ is re-clustered and redistributed according to an energy rule to obtain V'₂₀₄₈. The specific implementation is shown in the following formula (10):

3. At the encoding end, the vector V'₂₀₄₈ undergoes fully connected dimensionality reduction to obtain a 128-dimensional vector V'₁₂₈. The vector V'₁₂₈ is then quantized, for example by 4-bit fixed quantization into an integer vector, to obtain the encoded feature information, which is stored in a code stream file and transmitted to the decoding end over the network.

4. At the decoding end, 4-bit inverse quantization and a fully connected dimension-raising operation are applied to the encoded feature information, the inverse process of formula (10) is executed through the inverse discrete cosine transform, and the enhanced feature vector is then obtained through the multi-scale enhancement network. This enhanced feature vector is the generated feature vector in the above embodiments of the present application.

5. In the pedestrian re-identification task, feature extraction is performed on a large number of labeled images in the same way as in step 1 to construct a feature database, and the enhanced feature vector is compared with all the feature vectors in the feature database to re-identify the pedestrian and obtain the recognition accuracy.

The re-identification may use cosine similarity to determine the similarity between two eigenvectors, as shown in the following formula (11):

where d is the similarity measure, and a smaller d indicates a higher similarity; X and Y are, respectively, the enhanced feature vector and a feature vector in the feature database, and all the feature vectors in the feature database need to be traversed.

In the above embodiment of the present application, after the feature vector V₂₀₄₈ of the pedestrian image is extracted, a discrete cosine transform is performed on V₂₀₄₈ so that most of its signal energy is concentrated in a small range of the frequency domain. High-frequency energy and low-frequency energy can thus be distinguished, which improves the compression rate and the coding efficiency of feature coding. Furthermore, the decoding end decodes the encoded feature information, performs the inverse discrete cosine transform, and obtains the enhanced feature vector through the multi-scale enhancement network. This reduces the reconstruction error, so that the accuracy of the processing task is improved when the task is executed with the enhanced generated feature vector.
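The 4-bit quantization and inverse quantization in steps 3 and 4 of the example can be sketched as follows. The example only states that 4-bit fixed quantization into an integer vector is used; the uniform min-max quantizer below, the function names, and the made-up 128-dimensional input are assumptions for illustration.

```python
import numpy as np

def quantize(v, bits=4):
    """Uniformly map a float vector to integers in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1                       # 15 steps for 4 bits
    v_min, v_max = float(v.min()), float(v.max())
    scale = (v_max - v_min) / levels if v_max > v_min else 1.0
    q = np.round((v - v_min) / scale).astype(np.uint8)
    return q, (v_min, scale)                     # parameters needed for dequantization

def dequantize(q, params):
    """Inverse quantization back to float32."""
    v_min, scale = params
    return q.astype(np.float32) * scale + v_min

# Made-up stand-in for the 128-dimensional vector after dimensionality reduction.
v128 = np.linspace(-1.0, 1.0, 128).astype(np.float32)
codes, params = quantize(v128)                   # encoding end
restored = dequantize(codes, params)             # decoding end
max_error = float(np.max(np.abs(restored - v128)))
```

Each code fits in 4 bits, so two codes could be packed per byte in the code stream file; the sketch stores one code per `uint8` for simplicity. The per-element error is bounded by half a quantization step, which contributes to the reconstruction error that the enhancement network is meant to reduce.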

Based on the same inventive concept, the embodiment of the present application further provides a feature encoding apparatus, and as the principle of the apparatus for solving the problem is similar to the feature encoding method in the foregoing embodiment, the implementation of the apparatus may refer to the embodiment of the method, and repeated details are not described herein.

As shown in fig. 7, an embodiment of the present application provides a feature encoding apparatus, which includes a feature extraction module 71, a first processing module 72, and an encoding module 73.

The feature extraction module 71 is configured to obtain a media resource, and perform feature extraction on the media resource to obtain an original feature vector;

a first processing module 72, configured to perform first processing on the original feature vector to obtain a first feature vector; wherein the first processing includes any one of: feature normalization processing, principal component analysis processing and feature transformation processing;

and the encoding module 73 is configured to perform encoding processing on the first feature vector to obtain encoded feature information.

Optionally, the first processing includes a feature normalization processing, and the first processing module 72 is configured to:

performing linear normalization processing on the original characteristic vector based on the maximum characteristic value and the minimum characteristic value corresponding to the original characteristic vector to obtain a first characteristic vector; or

And performing zero-mean normalization processing on the original feature vector based on the mean value and the variance corresponding to the original feature vector to obtain a first feature vector.

transmitting the coding feature information and the feature normalization parameters to a decoding end; the normalization parameters comprise a first parameter or a second parameter, the first parameter comprises a maximum characteristic value and a minimum characteristic value, and the second parameter comprises a mean value and a variance.

Optionally, the first process includes a principal component analysis process, and the first processing module 72 is configured to:

and performing a first transformation operation on the candidate feature vector based on a preset covariance matrix to obtain a first feature vector.

Optionally, the first process includes a feature transformation process, and the first processing module 72 is configured to:

Based on the same inventive concept, the embodiment of the present application further provides a feature decoding apparatus, and as the principle of the apparatus for solving the problem is similar to the feature decoding method in the foregoing embodiment, the implementation of the apparatus may refer to the embodiment of the method, and repeated details are not described herein.

As shown in fig. 8, an embodiment of the present application provides a feature decoding apparatus, which includes a decoding module 81, a second processing module 82, and an enhancement module 83.

A decoding module 81, configured to perform decoding processing on the encoded feature information to obtain a decoded feature vector;

a second processing module 82, configured to perform second processing on the decoded feature vector to obtain a reconstructed feature vector; wherein the second processing includes any one of: characteristic normalization inverse processing, principal component analysis inverse processing and characteristic transformation inverse processing;

and the enhancing module 83 is configured to perform enhancement processing on the reconstructed feature vector to obtain a generated feature vector.

Optionally, the enhancing module 83 is further configured to:

performing convolution operation on the reconstructed feature vectors respectively through a plurality of convolution modules of the multi-scale enhancement network to obtain a plurality of intermediate feature vectors;

and carrying out fusion processing on the plurality of intermediate feature vectors to obtain a generated feature vector.

Optionally, the enhancing module 83 is further configured to:

enhancing the reconstructed feature vector through a generation module of the generative adversarial network to obtain a generated feature vector;

the generative adversarial network comprises a generation module and a discrimination module, and the training process comprises iteratively executing the following operations until a preset convergence condition is met: acquiring a sample reconstructed feature vector of a sample media resource; inputting the sample reconstructed feature vector, with noise data added, into the generation module to obtain a sample generated feature vector; and judging, through the discrimination module, whether the sample generated feature vector is the same as the sample original feature vector; if so, adjusting the parameters of the discrimination module, and if not, adjusting the parameters of the generation module.

Optionally, the second processing includes inverse feature normalization, and the apparatus further includes a receiving module, configured to:

receiving coding feature information and feature normalization parameters sent by a coding end; the normalization parameters comprise a first parameter or a second parameter, the first parameter comprises a maximum characteristic value and a minimum characteristic value, and the second parameter comprises a mean value and a variance;

the second processing module 82 is further configured to:

performing linear normalization inverse processing on the decoding feature vector based on the first parameter to obtain a reconstruction feature vector; or

And based on the second parameter, carrying out zero-mean normalized inverse processing on the decoded feature vector to obtain a reconstructed feature vector.

Optionally, the second processing includes a principal component analysis inverse processing, and the second processing module 82 is further configured to:

and performing a mean addition operation on the second feature vector based on a preset mean vector to obtain a reconstructed feature vector.

Optionally, the second processing includes inverse feature transformation processing, and the second processing module 82 is further configured to:

performing super-resolution processing on the generated feature vector to generate a super-resolved image;

Based on the same inventive concept, the embodiment of the present application further provides an electronic device, and since the electronic device is the electronic device in the method in the embodiment of the present application, and the principle of the electronic device to solve the problem is similar to the method, the implementation of the electronic device may refer to the embodiment of the method, and repeated details are omitted.

As shown in fig. 9, the electronic device includes a processor 900, a memory 901, and a communication interface 902, wherein the processor 900, the communication interface 902, and the memory 901 communicate with each other via a communication bus 903; the memory 901 is used for storing programs executable by the processor 900, and the processor 900 is used for reading the programs in the memory 901 and executing the steps of any feature encoding method or feature decoding method in the above embodiments.

The communication bus 903 mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus. The communication interface 902 is used for communication between the electronic device and other devices. The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.

The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an application specific integrated circuit, a field programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, etc.

Based on the same inventive concept, embodiments of the present application further provide a computer storage medium, in which a computer program executable by a processor is stored, and when the program runs on the processor, the processor is caused to execute the steps of any one of the feature encoding methods or the feature decoding methods in the foregoing embodiments.

The computer readable storage medium may be any available medium or data storage device that can be accessed by a processor in an electronic device, including but not limited to magnetic memory such as floppy disks, hard disks, magnetic tape, magneto-optical disks (MOs), etc., optical memory such as CDs, DVDs, BDs, HVDs, etc., and semiconductor memory such as ROMs, EPROMs, EEPROMs, non-volatile memory (NAND FLASH), Solid State Disks (SSDs), etc.

In yet another embodiment provided by the present application, a computer program product containing instructions is further provided, which when invoked by an electronic device, can cause the electronic device to perform the steps of any one of the feature encoding method or the feature decoding method in the above embodiments.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. A feature encoding method, applied to an encoding end, comprising:

2. The method of claim 1, wherein the first processing comprises feature normalization processing, and wherein performing the first processing on the original feature vector to obtain the first feature vector comprises:

3. The method of claim 2, wherein after encoding the first feature vector to obtain the encoded feature information, the method further comprises:

4. The method of claim 1, wherein the first processing comprises principal component analysis processing, and wherein performing the first processing on the original feature vector to obtain the first feature vector comprises:

5. The method of claim 1, wherein the first processing comprises feature transformation processing, and wherein performing the first processing on the original feature vector to obtain the first feature vector comprises:

6. A feature decoding method, applied to a decoding end, comprising:

decoding the encoded feature information to obtain a decoded feature vector;

7. The method of claim 6, wherein performing the enhancement processing on the reconstructed feature vector to obtain a generated feature vector comprises:

8. The method of claim 6, wherein performing the enhancement processing on the reconstructed feature vector to obtain a generated feature vector comprises:

9. The method of any one of claims 6 to 8, wherein the second processing comprises inverse feature normalization processing, and wherein before decoding the encoded feature information to obtain the decoded feature vector, the method further comprises:

10. The method of any one of claims 6 to 8, wherein the second processing comprises inverse principal component analysis processing, and wherein performing the second processing on the decoded feature vector to obtain the reconstructed feature vector comprises:

11. The method of any one of claims 6 to 8, wherein the second processing comprises inverse feature transformation processing, and wherein performing the second processing on the decoded feature vector to obtain the reconstructed feature vector comprises:

12. The method of any one of claims 6 to 8, wherein the media resource is an image, the method further comprising any one of:

13. A feature encoding apparatus, comprising:

14. A feature decoding apparatus, comprising:

a second processing module, configured to perform second processing on the decoded feature vector to obtain a reconstructed feature vector; wherein the second processing includes any one of: inverse feature normalization processing, inverse principal component analysis (PCA) processing, and inverse feature transformation processing;

15. An electronic device, comprising a processor and a memory, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the method of any one of claims 1 to 5 or claims 6 to 12.

16. A computer-readable storage medium, comprising a computer program which, when run on an electronic device, causes the electronic device to perform the steps of the method of any one of claims 1 to 5 or claims 6 to 12.