CN117014561B - Information fusion method, training method of variable learning and electronic equipment - Google Patents
- Publication number
- CN117014561B (application CN202311247658.8A)
- Authority
- CN
- China
- Prior art keywords
- image
- information
- data
- shooting
- learnable
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N5/00—Details of television systems
- H04N5/222—Studio circuitry; Studio devices; Studio equipment
- H04N5/262—Studio circuits, e.g. for mixing, switching-over, change of character of image, other special effects ; Cameras specially adapted for the electronic generation of special effects
- H04N5/265—Mixing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Signal Processing (AREA)
- Image Analysis (AREA)
Abstract
The embodiments of the present application relate to the technical field of information, and in particular to an information fusion method, a training method for a learnable variable, and an electronic device. In the information fusion method, the electronic device inputs image information into a pre-trained image feature extractor to obtain image information features. Next, the electronic device adds a pre-trained learnable variable to the shooting information and inputs the shooting information with the learnable variable added into the same image feature extractor to obtain shooting information features. Fusion information is then obtained based on the shooting information features and the image information features. Because the shooting information can be fused using a small number of feature extractors, the method reduces the electronic device resources occupied when the electronic device fuses the shooting information.
Description
Technical Field
The embodiment of the application relates to the technical field of information, in particular to an information fusion method, a training method of a learnable variable and electronic equipment.
Background
As electronic devices with photographing functions have become widespread, taking photographs with an electronic device has become an everyday activity, and people's requirements for the photographed images keep increasing. To improve the quality of a photographed image, the electronic device may fuse some shooting information with image information to obtain fusion information, and use the fusion information to improve the quality of the photographed image.
At present, how to fuse the image information of the photographed image with the photographing information when photographing the image is a problem to be solved.
Disclosure of Invention
The embodiment of the application provides an information fusion method, a training method of a learnable variable and electronic equipment, which can fuse shooting information through a small number of feature extractors and can reduce occupation of electronic equipment resources when the electronic equipment fuses the information.
In order to achieve the above purpose, the embodiment of the application adopts the following technical scheme:
In a first aspect, an information fusion method is provided. The method may be applied to an electronic device having image processing capability, such as a mobile phone, a notebook computer, or a tablet computer. The method comprises the following steps: the electronic device acquires a target image, where the target image includes image information and shooting information, the shooting information includes the camera parameters in effect when the camera captured the target image, and the data processing formats of the image information and the shooting information do not match. Then, the electronic device inputs the image information of the target image into an image feature extractor to obtain image features. Then, the electronic device adds a learnable variable to the shooting information and inputs the shooting information with the learnable variable added into the image feature extractor to obtain shooting features, where the data processing format of the shooting information with the learnable variable added matches the data processing format of the image information. Next, the electronic device obtains fusion information based on the image features and the shooting features, the fusion information being used to determine the quality of the target image under the influence of the shooting information.
Wherein the image features may be used to characterize the quality of the target image; such as to characterize whether the target image is overexposed, whether the target image is underexposed, whether the target image is color cast, whether the target image is blurred, and the like. The above-described photographing characteristics may be used to characterize the extent to which photographing information affects the quality of a target image.
It can be understood that the image feature extractor works by performing operations such as convolution, pooling and deconvolution on the image information, analyzing the regularities exhibited by the pixels in the image information, and extracting the features that those regularities express. The learnable variable can imitate these regularities. Therefore, adding the learnable variable to the shooting information gives the resulting data the same regularities, so that the image feature extractor can extract shooting features from the shooting information to which the learnable variable has been added.
In the method, the format of the shooting information can be changed by adding a pre-trained learnable variable into the shooting information, so that the format of the shooting information is matched with the format of the image information; thus, the image feature extractor can extract features from the photographing information to which the learnable variable is added. Meanwhile, the learnable variable can also enable the shooting information added with the learnable variable to have the same rule as the image information. Thus, the image feature extractor can extract the feature of the shot information from the shot information added with the learnable variable. That is, using a trained image information feature extractor, image information features can be extracted from the image information, and shot information features can be extracted from the shot information. Therefore, the electronic equipment can fuse shooting information and image information through a small number of feature extractors, and occupation of electronic equipment resources when the electronic equipment fuses information can be reduced.
In another possible design of the first aspect, the image information includes RGB data; the data processing format of the RGB data is a three-dimensional matrix format, and the data processing format of the learnable variable is a two-dimensional matrix format. The two-dimensional matrix format matches the three-dimensional matrix format; for example, the two-dimensional matrix has the same number of rows and columns as the three-dimensional matrix. The electronic device adding the learnable variable to the shooting information of the target image includes: the electronic device expands the data processing format of the shooting information into a one-dimensional matrix format, where the one-dimensional matrix format matches the three-dimensional matrix format; for example, the one-dimensional matrix has the same number of rows and columns as the three-dimensional matrix. Then, the electronic device combines the shooting information expanded into the one-dimensional matrix format with the learnable variable to obtain the shooting information with the learnable variable added, whose data processing format is the three-dimensional matrix format.
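The format matching described in this design can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: it reads "one-dimensional", "two-dimensional" and "three-dimensional matrix format" as one, two and three H x W planes (channels), and it fills the single plane by cyclically repeating the parameter values. The function name `add_learnable_variable` and the tiling rule are hypothetical.

```python
import numpy as np

def add_learnable_variable(shot_info, learnable, height, width):
    """Stack one plane of shooting parameters with a two-plane learnable variable.

    Assumption: the shooting parameters are repeated cyclically until they
    fill one H x W plane; the source does not specify the expansion rule.
    """
    values = np.asarray(shot_info, dtype=np.float32).ravel()
    plane = np.resize(values, (height, width))           # one H x W plane
    if learnable.shape != (height, width, 2):
        raise ValueError("learnable variable must have shape (H, W, 2)")
    # 1 plane of shooting information + 2 learnable planes = 3 planes,
    # the same layout as an H x W x 3 RGB array.
    return np.concatenate([plane[..., None], learnable], axis=-1)

# Usage: hypothetical AWB gains and a randomly initialised learnable variable.
learnable = np.random.default_rng(0).normal(size=(4, 6, 2)).astype(np.float32)
shot_input = add_learnable_variable([1.5, 1.0, 2.0], learnable, 4, 6)
```

With this layout the result can be passed to any extractor that accepts an H x W x 3 RGB array, which is the property the design relies on.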
In still another possible design of the first aspect, the electronic device obtains the fusion information according to the image feature and the shooting feature, including: superposing the image features and the shooting features to obtain fusion information; for example, the electronic device adds the image feature and the shooting feature to obtain the fusion information.
In this design, the electronic device obtains the fusion information by superposing the image features and the shooting features. Compared with schemes in which the electronic device uses a separate feature fuser to obtain the fusion information, superposition requires no feature fuser, which can further reduce the occupation of electronic device resources.
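As a rough sketch of this design, the same extractor can be applied to both inputs and the two feature vectors added element-wise. The extractor below is a toy stand-in (per-pixel channel mixing, ReLU, global average pooling), not the pre-trained network of the application; `toy_extractor`, `fuse` and the `kernel` parameter are hypothetical names.

```python
import numpy as np

def toy_extractor(x, kernel):
    """Toy stand-in for the image feature extractor.

    x: (H, W, 3) input, kernel: (3, D) mixing weights. Returns a (D,) feature.
    """
    mixed = np.maximum(x @ kernel, 0.0)   # per-pixel channel mixing + ReLU
    return mixed.mean(axis=(0, 1))        # global average pooling

def fuse(image_info, shot_input, kernel):
    """Run one shared extractor on both inputs and superpose the features."""
    image_feature = toy_extractor(image_info, kernel)
    shot_feature = toy_extractor(shot_input, kernel)
    # Superposition by element-wise addition: no trainable fusion module.
    return image_feature + shot_feature

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 8))
image_info = rng.random((4, 6, 3))   # stands in for RGB data
shot_input = rng.random((4, 6, 3))   # shooting information with learnable variable added
fusion = fuse(image_info, shot_input, kernel)
```

Addition keeps the fused vector the same size as each input feature and adds no parameters, which is the resource saving the design points to.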
In yet another possible design of the first aspect, the method further includes: the electronic equipment inputs the fusion information into a quality classifier to obtain a quality result of the target image, wherein the quality result is used for representing the quality of the target image under the influence of the shooting information; the quality classifier and the learnable variables are derived based on the same training process. The electronic device adjusts the photographing information based on the quality result. Then the electronic equipment controls the camera to shoot by utilizing the adjusted shooting information; or the electronic equipment adjusts the image information of the target image by using the adjusted shooting information to obtain an adjusted target image.
In the design, the electronic equipment can optimize the shooting process on the mobile phone through the adjusted shooting information, so that the electronic equipment can shoot and obtain images with higher quality, and the use experience of a user can be improved. Or the electronic equipment can also improve the quality of the target image through the adjusted shooting information.
In one possible design of the first aspect, the electronic device adjusting the shooting information based on the quality result includes the following. The electronic device inputs the fusion information into a quality classifier to obtain a quality result of the target image; the quality result represents the quality of the target image under the influence of the shooting information, and the quality classifier and the learnable variable are obtained from the same training process. The quality results include a first type of quality result and a second type of quality result, where an image whose quality result is of the first type has higher quality than an image whose quality result is of the second type. Next, when the quality result of the target image is of the second type, the electronic device executes a cyclic adjustment process on the shooting information until shooting information meeting a preset quality condition is obtained; the quality result corresponding to shooting information that meets the quality condition is of the first type. Then, the electronic device obtains an adjusted target image based on the shooting information that meets the quality condition. The cyclic adjustment process comprises the following steps: the electronic device adjusts the shooting information by a preset adjustment step; the electronic device then obtains the quality result corresponding to the adjusted shooting information through the learnable variable, the image features and the quality classifier. If the quality result corresponding to the adjusted shooting information is of the second type, the electronic device adjusts the shooting information again by the adjustment step, and repeats until the adjusted shooting information meets the quality condition.
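The cyclic adjustment process can be sketched as a simple loop. This is an illustration under assumptions: the shooting information is reduced to a single scalar, `classify` is a hypothetical callable that wraps the learnable variable, the image features and the quality classifier and returns 1 (first type) or 2 (second type), and the `max_iters` bound is added here for safety; the source does not mention an iteration limit.

```python
def cyclic_adjust(shot_info, classify, step, max_iters=50):
    """Adjust shot_info by a fixed step until classify reports a first-type result.

    Returns the adjusted value, or None if no first-type result is reached
    within max_iters evaluations (a safeguard assumed for this sketch).
    """
    current = float(shot_info)
    for _ in range(max_iters):
        if classify(current) == 1:   # first-type result: quality condition met
            return current
        current += step              # second-type result: adjust and retry
    return None

# Usage with a toy classifier: quality is "good" once the value reaches 0.5.
toy_classify = lambda value: 1 if value >= 0.5 else 2
adjusted = cyclic_adjust(0.0, toy_classify, step=0.125)
```

Because the image features do not change between iterations, only the shooting branch needs to be re-evaluated on each pass in a real implementation.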
In yet another possible design of the first aspect, the learnable variable corresponds to an information type in the photographing information, the information type of the photographing information including one or more of auto white balance AWB data, auto focus AF data, and auto exposure AE data. The above-mentioned learnable variables may include: a first class of learnable variables corresponding to AWB data, a second class of learnable variables corresponding to AF data, and a third class of learnable variables corresponding to AE data.
In still another possible design of the first aspect, the information type of the photographing information includes AWB data, AF data, and AE data, and the electronic apparatus adds a learnable variable to the photographing information of the target image, including: the electronic equipment adds a first type of learnable variable in the AWB data, adds a second type of learnable variable in the AF data, and adds a third type of learnable variable in the AE data. The electronic device inputs shooting information added with a learnable variable into an image feature extractor to obtain shooting features, and the electronic device comprises: the electronic equipment inputs the AWB data added with the first type of the learnable variables into an image feature extractor to obtain shooting features aiming at the AWB data; the electronic device inputs the AF data added with the second class of the learnable variables into the image feature extractor to obtain shooting features aiming at the AF data. The electronic device inputs the AE data added with the third type of the learnable variable into an image feature extractor to obtain shooting features aiming at the AE data. The electronic equipment obtains fusion information according to the image characteristics and the shooting characteristics, and the method comprises the following steps: and combining the shooting characteristics aiming at the AWB data, the shooting characteristics aiming at the AF data and the shooting characteristics aiming at the AE data with the image characteristics to obtain fusion information.
In this design, the electronic device may extract the shooting features from the 3A data (AWB data, AF data, and AE data) that affect the target image more directly, respectively, and fuse the shooting features and the image features to obtain fusion information. Thus, the electronic device can improve the quality of the target image based on the fusion information comprehensively or classify the target image based on the fusion information.
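A minimal sketch of the per-type handling of 3A data follows. It assumes one learnable variable per information type, a toy extractor in place of the pre-trained one, and element-wise addition as the way of "combining" the features (the source names addition as an example of superposition but does not fix the merge rule here). The names `extract` and `fuse_3a` and all numeric values are hypothetical.

```python
import numpy as np

def extract(x, kernel):
    """Toy stand-in for the shared image feature extractor."""
    return np.maximum(x @ kernel, 0.0).mean(axis=(0, 1))

def fuse_3a(image_info, shot_info, learnables, kernel):
    """Extract one shooting feature per 3A type and merge them with the image feature."""
    h, w, _ = image_info.shape
    fused = extract(image_info, kernel)
    per_type = {}
    for name in ("AWB", "AF", "AE"):
        # Expand this type's parameters to one H x W plane (assumed tiling rule).
        plane = np.resize(np.asarray(shot_info[name], dtype=np.float64), (h, w))
        # Each type is paired with its own learnable variable of shape (H, W, 2).
        x = np.concatenate([plane[..., None], learnables[name]], axis=-1)
        per_type[name] = extract(x, kernel)
        fused = fused + per_type[name]
    return fused, per_type

rng = np.random.default_rng(0)
kernel = rng.normal(size=(3, 8))
image_info = rng.random((4, 6, 3))
shot_info = {"AWB": [1.5, 1.0, 2.0], "AF": [0.3], "AE": [0.01, 100.0]}
learnables = {k: rng.normal(size=(4, 6, 2)) for k in shot_info}
fusion, shot_features = fuse_3a(image_info, shot_info, learnables, kernel)
```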
In a second aspect, a training method for a learnable variable corresponding to shooting information and for a quality classifier is provided. The method can be applied to electronic devices such as mobile phones, notebook computers, tablet computers and servers. The method comprises the following steps: the electronic device obtains training data samples, the training data samples comprising an image information sample of a training image, a shooting information sample of the training image and a quality result sample of the training image. The shooting information sample includes the camera parameters in effect when the camera captured the training image, and the quality result sample represents the quality of the training image under the influence of the shooting information sample; the data processing format of the image information sample does not match that of the shooting information sample. The electronic device then inputs the image information sample into an image feature extractor to obtain image features. Next, the electronic device adds an initial learnable variable to the shooting information sample and inputs the result into the image feature extractor to obtain shooting features; the data processing format of the shooting information sample with the initial learnable variable added matches the data processing format of the image information sample. Then, the electronic device inputs the image features and the shooting features into an initial quality classifier to obtain a quality training result. The electronic device then calculates a loss function based on the difference between the quality training result and the quality result sample, and on the difference between the image features and the shooting features.
Next, the electronic device iterates the initial learnable variable and the initial quality classifier based on the loss function until the loss function converges, resulting in a trained learnable variable and a trained quality classifier.
Thus, the electronic device can fuse shooting information and image information by means of the trained learnable variable, and can classify the image or adjust the quality of the image by means of the trained quality classifier.
In one possible design of the second aspect, the above-mentioned loss function includes a first weight and a second weight; the first weight corresponds to the difference between the image features and the shooting features, the second weight corresponds to the difference between the quality training result and the quality result sample, and the first weight is smaller than the second weight.
It will be appreciated that the term weighted by the first weight measures the degree of similarity between the image features and the shooting features, and the term weighted by the second weight measures the degree of similarity between the quality training result and the quality result sample. That is, in the iterative process, the first weight controls how strongly the learnable variable learns the data format of the image information, and the second weight controls the ability of the quality classifier to determine the quality of the image. Because the first weight is smaller than the second weight, the learnable variable can imitate the data format of the image information during the iterative process without losing the shooting characteristics carried by the shooting information. At the same time, the quality classifier can adapt to shooting information to which the learnable variable has been added, express the shooting features, and combine the shooting features with the image features to obtain a quality result.
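The weighted loss can be sketched as below. The source fixes only the structure (two terms, first weight smaller than second weight); the concrete distance functions are assumptions for this illustration: the standard smooth L1 form for the feature difference (the drawings mention smooth L1 and L1) and cross-entropy for the quality result. The weight values 0.1 and 1.0 are likewise hypothetical.

```python
import numpy as np

def smooth_l1(a, b):
    """Standard smooth L1: 0.5*d^2 where |d| < 1, otherwise |d| - 0.5, averaged."""
    d = np.abs(np.asarray(a, dtype=np.float64) - np.asarray(b, dtype=np.float64))
    return float(np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).mean())

def cross_entropy(probs, label):
    """Negative log-probability assigned to the labelled quality class."""
    return float(-np.log(probs[label] + 1e-12))

def total_loss(image_feature, shot_feature, probs, label, w1=0.1, w2=1.0):
    """w1 weights the feature difference, w2 the quality difference; w1 < w2."""
    if not w1 < w2:
        raise ValueError("the first weight must be smaller than the second weight")
    feature_term = smooth_l1(image_feature, shot_feature)
    quality_term = cross_entropy(np.asarray(probs, dtype=np.float64), label)
    return w1 * feature_term + w2 * quality_term

loss = total_loss([0.0, 0.0], [0.5, 2.0], probs=[0.5, 0.5], label=0)
```

Keeping `w1` small means the gradient reaching the learnable variable is dominated by the quality term, so the variable is pulled toward image-like statistics only weakly.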
In another possible design of the second aspect, the above-mentioned learnable variable corresponds to a type of information in a shot information sample, the type of information of the shot information sample including one or more of an auto white balance AWB data sample, an auto focus AF data sample, and an auto exposure AE data sample. The initial learnable variables include a first type of initial learnable variable corresponding to AWB data samples, a second type of initial learnable variable corresponding to AF data samples, and a third type of initial learnable variable corresponding to AE data samples.
In a further possible design of the second aspect, the shooting features comprise shooting features for the AWB data samples, shooting features for the AF data samples, and shooting features for the AE data samples. The information types of the shooting information samples include AWB data samples, AF data samples, and AE data samples. Inputting the shooting information sample with the initial learnable variable added into the image feature extractor to obtain the shooting features includes: inputting the AWB data samples with the first type of initial learnable variable added into the image feature extractor to obtain shooting features for the AWB data samples; inputting the AF data samples with the second type of initial learnable variable added into the image feature extractor to obtain shooting features for the AF data samples; and inputting the AE data samples with the third type of initial learnable variable added into the image feature extractor to obtain shooting features for the AE data samples. The electronic device calculating the loss function includes: the electronic device calculates the loss function based on the differences between the shooting features for the AWB data samples and the image features, the differences between the shooting features for the AF data samples and the image features, the differences between the shooting features for the AE data samples and the image features, and the differences between the image features and the shooting features.
The first weight includes a first sub-weight, a second sub-weight and a third sub-weight. The differences between the shooting features for the AWB data samples and the image features correspond to the first sub-weight, the differences between the shooting features for the AF data samples and the image features correspond to the second sub-weight, and the differences between the shooting features for the AE data samples and the image features correspond to the third sub-weight. The first sub-weight is greater than the second sub-weight, and the first sub-weight is greater than the third sub-weight.
In the design, the emphasis degree of the quality classifier obtained by training on different shooting information types can be controlled by controlling the value of the sub-weight during training. The first sub-weight is larger than the second sub-weight, and the first sub-weight is larger than the third sub-weight, so that the quality classifier obtained through training is more focused on the AWB data.
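A small sketch of how the sub-weights could enter the feature part of the loss follows. The per-type differences are assumed to be already computed as scalars; the function name `weighted_feature_term` and the numeric sub-weights are hypothetical, and only the ordering constraint (AWB sub-weight largest) comes from the text.

```python
def weighted_feature_term(diffs, sub_weights):
    """Combine per-type feature differences using the three sub-weights.

    diffs and sub_weights are dicts keyed by "AWB", "AF" and "AE".
    The AWB sub-weight must exceed both the AF and the AE sub-weights.
    """
    if not (sub_weights["AWB"] > sub_weights["AF"]
            and sub_weights["AWB"] > sub_weights["AE"]):
        raise ValueError("the AWB sub-weight must be the largest")
    return sum(sub_weights[k] * diffs[k] for k in ("AWB", "AF", "AE"))

# Usage with made-up differences: 0.5*1.0 + 0.25*2.0 + 0.25*4.0
term = weighted_feature_term(
    {"AWB": 1.0, "AF": 2.0, "AE": 4.0},
    {"AWB": 0.5, "AF": 0.25, "AE": 0.25},
)
```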
In a third aspect, the present application provides an electronic device comprising: a memory, one or more processors, and a Bluetooth module. The memory is coupled with the processor and stores computer program code comprising computer instructions. The computer instructions, when executed by the processor, cause the electronic device to perform the method provided by the first aspect and any one of the possible designs of the first aspect, or cause the electronic device to perform the method provided by the second aspect and any one of the possible designs of the second aspect.
In a fourth aspect, the present application provides a computer readable storage medium having stored therein instructions which, when executed on an electronic device, cause the electronic device to perform the method provided by the first aspect and any one of the possible designs of the first aspect; or cause the electronic device to perform the method provided by the second aspect and any one of the possible designs of the second aspect.
In a fifth aspect, the present application provides a computer program product comprising instructions which, when run on an electronic device, cause the electronic device to perform the method provided by the first aspect and any one of the possible designs of the first aspect; or cause the electronic device to perform the method provided by the second aspect and any one of the possible designs of the second aspect.
The technical effects of any one of the design manners of the third aspect to the fifth aspect may be referred to the technical effects of the different design manners of the first aspect, which are not repeated herein.
Drawings
Fig. 1 is a schematic diagram of a fusion process of shooting information and image information according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a usage scenario of an information fusion method according to an embodiment of the present application;
Fig. 3 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application;
Fig. 4 is a schematic software architecture diagram of an electronic device according to an embodiment of the present application;
Fig. 5 is a schematic flow chart of a training process for a learnable variable and an information fusion process according to an embodiment of the present application;
Fig. 6 is a flow chart of another training process for a learnable variable according to an embodiment of the present application;
Fig. 7 is a schematic comparison of smooth L1 and L1 according to an embodiment of the present application;
Fig. 8 is a schematic diagram of the smooth L1 and L1 loss functions according to an embodiment of the present application;
Fig. 9 is a schematic flow chart of an information fusion method according to an embodiment of the present application;
Fig. 10 is a schematic flow chart of an optimized shooting process according to an embodiment of the present application;
Fig. 11 is a schematic diagram of a shooting preview interface according to an embodiment of the present application;
Fig. 12 is a schematic diagram of yet another training process for a learnable variable according to an embodiment of the present application;
Fig. 13 is a schematic flow chart of another optimized shooting process according to an embodiment of the present application;
Fig. 14 is a schematic diagram of another training process for a learnable variable according to an embodiment of the present application;
Fig. 15 is a schematic flow chart of an image optimization process according to an embodiment of the present application;
Fig. 16 is a schematic flow chart of another information fusion process according to an embodiment of the present application;
Fig. 17 is a flow chart of yet another training process for a learnable variable according to an embodiment of the present application;
Fig. 18 is a schematic structural diagram of still another electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application. In the description of the present application, unless otherwise specified, "/" indicates an "or" relationship between the associated objects; for example, A/B may mean A or B. The term "and/or" merely describes an association between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate three cases: A alone, both A and B, and B alone, where A and B may each be singular or plural. Also, in the description of the embodiments of the present application, unless otherwise indicated, "plurality" means two or more. "At least one of" the following items, or a similar expression, means any combination of these items, including any combination of a single item or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c may each be single or plural. In addition, to describe the technical solutions of the embodiments of the present application clearly, the words "first", "second", and the like are used to distinguish between identical or similar items having substantially the same function and effect. Those skilled in the art will appreciate that the words "first", "second", and the like do not limit the quantity or the order of execution, and do not necessarily indicate that the items are different.
Meanwhile, in the embodiments of the present application, words such as "exemplary" or "such as" are used to mean serving as examples, illustrations or explanations. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion that may be readily understood.
With the popularization of electronic devices with photographing functions in life, photographing by people using electronic devices has become a daily behavior mode. The requirements of people on the photographed images are also increasing. In order to improve the quality of the photographed image, the electronic device fuses some photographing information and image information to obtain fusion information, and optimizes the photographing process of the electronic device according to the fusion information so as to improve the quality of the photographed image.
The image information may include any one of Red Green Blue (RGB) data, gray data, and YUV data. The image information is used to express the color, gray scale, etc. of each pixel in the image.
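As a small illustration of how these representations relate, the following sketch converts one RGB pixel into a gray value and a YUV triple. The BT.601 luma weights, the analog-style U/V scaling, and the function names are assumptions chosen for illustration; the present application does not specify any particular conversion.

```python
from typing import Tuple


def rgb_to_gray(r: float, g: float, b: float) -> float:
    """Gray (luma) value of an RGB pixel using BT.601 weights (assumed convention)."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def rgb_to_yuv(r: float, g: float, b: float) -> Tuple[float, float, float]:
    """Y is the luma; U and V are scaled blue and red color differences."""
    y = rgb_to_gray(r, g, b)
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


# A pure white pixel carries brightness only, so U and V are (numerically) zero.
white_y, white_u, white_v = rgb_to_yuv(255.0, 255.0, 255.0)
```

Whichever format is used, each pixel is described by a small fixed number of values, which is why image information is naturally a matrix tied to the image size.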
The photographing information may be understood as information reflecting the parameter settings related to photographing when the electronic device photographs, and may include: autofocus (automatic focusing, AF) data, auto exposure (automatic exposure, AE) data, and auto white balance (automatic white balance, AWB) data. The AF data, AE data, and AWB data may be collectively referred to as 3A data. The 3A data are three important kinds of data when the camera shoots, and they affect the captured image relatively directly. AF data mainly affects the focusing of the camera during shooting, and thus the definition of the captured image. AWB data mainly affects how the camera distinguishes colors, and thus the color level of the captured image (e.g., the color level may be characterized by a color temperature). AE data mainly affects the exposure time of the camera during shooting, and thus the brightness of the captured image.
The photographing information may further include: a depth map, a heat map, event information, network inertial measurement unit (inertial measurement unit, IMU) data, user click data, overexposure information, and so forth. The depth map is mainly used for reflecting the depth of field of the shot image. The heat map is mainly used for reflecting the heat of the shot image, such as the popularity of the image. The IMU data is mainly used for reflecting inertial data acquired by a network inertial measurement unit (such as an inertial sensor) of the electronic device during shooting.
For example, the electronic device may fuse the AE data and the image information to obtain fusion information, and determine whether the captured image is overexposed according to the fusion information. Also for example, the electronic device may fuse the AF data and the image information to obtain fusion information, and determine whether or not the photographed image has virtual focus or the like through the fusion information.
In some schemes, referring to FIG. 1, in the process of fusing the photographing information and the image information, the electronic device configures separate feature extractors for the photographing information and the image information. For example, feature extractor 1 is configured for the image information, feature extractor 2 is configured for photographing information 1, and feature extractor 3 is configured for photographing information 2. The electronic device then extracts the image information feature, shooting information feature 1, and shooting information feature 2 with the respective feature extractors. Next, the electronic device obtains fusion information from the image information feature, shooting information feature 1, and shooting information feature 2 through the feature fusion device. The electronic device can then use the fusion information to optimize its shooting process, for example by judging whether the shot image is in virtual focus, overexposed, or accurate in color, thereby improving the shooting quality and the user's experience when shooting with the electronic device.
In the above-mentioned scheme, the modalities of the photographing information and the image information are different (for example, image information in RGB format may be a three-dimensional matrix related to the image size, whereas AF data may be a numerical value, or a one-dimensional matrix of 4*4, etc.), so the photographing information and the image information cannot share one feature extractor, and the electronic device needs a relatively large number of feature extractors for photographing information fusion. The image features and shooting features obtained by the different feature extractors are then combined by a feature fusion device to obtain the fusion information. All of these feature extractors need to be pre-trained (e.g., pre-trained feature extractor 1, pre-trained feature extractor 2, and pre-trained feature extractor 3); training them is time consuming and consumes training resources, resulting in relatively large network overhead. In addition, multiple feature extractors (e.g., feature extractor 1, feature extractor 2, and feature extractor 3) need to be deployed on the electronic device, which occupies storage resources of the memory of the electronic device. Furthermore, multiple feature extractors must be run during photographing information fusion, which occupies a relatively large share of the computing resources of the processor of the electronic device. That is, using a relatively large number of feature extractors for photographing information fusion occupies relatively more resources of the electronic device.
Therefore, in the above scheme, because the electronic device uses a relatively large number of feature extractors when fusing the photographing information, the network overhead for training the feature extractors is relatively large, and the fusion occupies relatively more resources of the electronic device.
In view of this, an embodiment of the present application provides a shooting information fusion method, in which an electronic device may input image information into a pre-trained image feature extractor to obtain image information features. Next, the electronic device adds a pre-trained learnable variable (learnable embedding) to the photographing information. Then, the photographing information to which the learnable variable is added is input to the same image feature extractor to obtain photographing information features. Finally, fusion information is obtained based on the photographing information features and the image information features. The learnable variable may also be referred to as a learnable factor.
In the method, the format of the shooting information can be changed by adding a pre-trained learnable variable into the shooting information, so that the format of the shooting information is matched with the format of the image information; thus, the image feature extractor can extract features from the photographing information to which the learnable variable is added. And because the principle of the image feature extractor is to perform convolution, pooling, deconvolution and the like on the image information, analyzing the rule reflected by each pixel point in the image information, extracting and analyzing the rule to obtain the feature expressed by the rule, wherein the learnable variable can imitate the rule. Based on this, it is possible to make the photographing information after adding the learnable variable have the rule by adding the learnable variable in the photographing information. Thus, the image feature extractor can perform feature extraction on the shooting information added with the learnable variable, and the shooting feature is extracted.
That is, using a trained image information feature extractor, image information features can be extracted from the image information, and shot information features can be extracted from the shot information. Therefore, the number of the feature extractors used by the electronic equipment in the process of shooting information fusion can be reduced, and the occupation of the electronic equipment resources in the process of shooting information fusion by the electronic equipment is reduced.
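The flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the present application: the toy `image_feature_extractor` (mean and spread of a flattened image) stands in for the pre-trained extractor, the learnable variable is assumed to be already trained, and fusion by element-wise addition is only one possible choice. All names and values are hypothetical.

```python
from typing import List

Vector = List[float]


def image_feature_extractor(pixels: Vector) -> Vector:
    """Toy stand-in for the pre-trained image feature extractor.

    Summarises a flattened image as [mean, mean absolute deviation].
    """
    mean = sum(pixels) / len(pixels)
    spread = sum(abs(p - mean) for p in pixels) / len(pixels)
    return [mean, spread]


def add_learnable_variable(shooting_value: float, learnable: Vector) -> Vector:
    """Give scalar shooting information the same layout as the image."""
    return [shooting_value + w for w in learnable]


def fuse(image_feature: Vector, shooting_feature: Vector) -> Vector:
    """Fuse the two features by element-wise addition (an assumed choice)."""
    return [a + b for a, b in zip(image_feature, shooting_feature)]


image_info = [0.2, 0.4, 0.6, 0.8]           # flattened 2x2 gray image
ae_value = 0.5                               # hypothetical AE reading
learnable_variable = [0.1, -0.1, 0.0, 0.2]   # assumed to be already trained

# The same extractor instance handles both inputs; no second extractor is needed.
image_feature = image_feature_extractor(image_info)
shooting_input = add_learnable_variable(ae_value, learnable_variable)
shooting_feature = image_feature_extractor(shooting_input)
fusion_information = fuse(image_feature, shooting_feature)
```

The point of the sketch is structural: only one extractor has to be trained, stored, and executed, regardless of how many kinds of photographing information are fused.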
For example, referring to fig. 2, the photographing information fusion method provided by the embodiment of the application may be applied to a process of photographing by using the electronic device 100. For example, in the process of shooting by using the electronic device, the electronic device may fuse shooting information and image information to obtain fusion information, and optimize the process of shooting by using the electronic device 100 by the user according to the fusion information, so as to improve the quality of the shot image. In particular, reference may be made to the following text corresponding to fig. 10, fig. 13 or fig. 15.
The electronic device 100 may be a device with image processing capability such as a mobile phone, a tablet computer, a wearable device, a smart screen, an augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (personal digital assistant, PDA); it may also be an internet of things device with image processing capability such as a smart watch or a smart bracelet; or it may be a cloud server, a heterogeneous server, a clustered server, or the like with image processing capability. The embodiment of the application does not limit the product form of the electronic device.
Fig. 3 shows a schematic hardware structure of an electronic device 100 according to an embodiment of the present application.
The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (universal serial bus, USB) interface 130, a charge management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, keys 190, a motor 191, an indicator 192, a camera 193, a display 194, and a subscriber identity module (subscriber identification module, SIM) card interface 195, etc. The sensor module 180 may include a pressure sensor 180A, a gyro sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It should be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the application, electronic device 100 may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units, such as: the processor 110 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The controller may be the nerve center and command center of the electronic device 100. The controller can generate operation control signals according to the instruction operation codes and the timing signals to complete the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that the processor 110 has just used or uses repeatedly. If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. This avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others.
The MIPI interface may be used to connect the processor 110 to peripheral devices such as a display 194, a camera 193, and the like. The MIPI interfaces include camera serial interfaces (camera serial interface, CSI), display serial interfaces (display serial interface, DSI), and the like. In some embodiments, processor 110 and camera 193 communicate through a CSI interface to implement the photographing functions of electronic device 100. The processor 110 and the display 194 communicate via a DSI interface to implement the display functionality of the electronic device 100.
It should be understood that the interfacing relationship between the modules illustrated in the embodiments of the present application is only illustrative, and is not meant to limit the structure of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also employ different interfacing manners in the above embodiments, or a combination of multiple interfacing manners.
The electronic device 100 implements display functions through a GPU, a display screen 194, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display 194 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like.
The ISP is used to process data fed back by the camera 193. For example, when photographing, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electric signal, and the camera photosensitive element transmits the electric signal to the ISP for processing and is converted into an image visible to naked eyes.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, electronic device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
In the embodiment of the present application, the camera parameters at the time of photographing by the camera 193 may include AF data, AE data, and AWB data. The AF data can be used for controlling parameters related to focusing of the camera in operation, the AE data can be used for controlling parameters related to exposure time of the camera in operation, and the AWB data can be used for controlling parameters related to color temperature of the camera in operation. It is understood that the AF data, AE data, and AWB data may be stored in the memory of the electronic apparatus 100 by the camera after shooting is completed.
The NPU is a neural-network (NN) computing processor, and can rapidly process input information by referencing a biological neural network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent awareness of the electronic device 100 may be implemented through the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.
For example, the NPU may run pre-trained learnable variables, scene classifiers, and image feature extractors to obtain fusion information.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device 100. The internal memory 121 may be used to store computer executable program code including instructions. The processor 110 executes various functional applications of the electronic device 100 and data processing by executing instructions stored in the internal memory 121.
Also exemplary, the electronic device 100 may implement photographing functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, an application processor, and the like.
The software system of the electronic device 100 may employ a layered architecture, an event driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. In the embodiment of the application, taking an Android system with a layered architecture as an example, a software structure of the electronic device 100 is illustrated.
Fig. 4 shows a software architecture diagram of an electronic device 100 according to an embodiment of the present application.
The layered architecture divides the software into several layers, each with a distinct role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android® system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer, the Android® runtime (Android® Runtime) and system libraries, and a kernel layer.
The application layer may include a series of application packages.
As shown in fig. 4, the application package may include applications for cameras, gallery, calendar, phone calls, maps, navigation, WLAN, bluetooth, music, video, short messages, etc.
The application framework layer provides an application programming interface (application programming interface, API) and programming framework for application programs of the application layer. The application framework layer includes a number of predefined functions.
As shown in fig. 4, the application framework layer may include a window manager, a content provider, a view system, a telephony manager, a resource manager, a notification manager, and the like.
The window manager is used for managing window programs. The window manager can acquire the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.
The content provider is used to store and retrieve data and make such data accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phonebooks, etc.
The view system includes visual controls, such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, a display interface including a text message notification icon may include a view displaying text and a view displaying a picture.
The telephony manager is used to provide the communication functions of the electronic device 100. Such as the management of call status (including on, hung-up, etc.).
The resource manager provides various resources for the application program, such as localization strings, icons, pictures, layout files, video files, and the like.
The notification manager allows the application to display notification information in a status bar, can be used to communicate notification type messages, can automatically disappear after a short dwell, and does not require user interaction. Such as notification manager is used to inform that the download is complete, message alerts, etc. The notification manager may also be a notification in the form of a chart or scroll bar text that appears on the system top status bar, such as a notification of a background running application, or a notification that appears on the screen in the form of a dialog window. For example, a text message is prompted in a status bar, a prompt tone is emitted, the electronic device vibrates, and an indicator light blinks, etc.
The Android® Runtime includes a core library and virtual machines. The Android® Runtime is responsible for scheduling and management of the Android system.
The core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
The application layer and the application framework layer run in a virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used for executing functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules. For example: a surface manager (surface manager), media libraries (Media Libraries), a three-dimensional graphics processing library (e.g., OpenGL ES), a two-dimensional (2D) graphics engine (e.g., SGL), etc.
The surface manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
Media libraries support a variety of commonly used audio, video format playback and recording, still image files, and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, h.264, MP3, AAC, AMR, JPG, PNG, etc.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The kernel layer includes at least a display driver, a camera driver, an audio driver, and a sensor driver.
Next, a workflow of the electronic device 100 software and hardware is exemplarily described in connection with capturing a photo scene.
When the touch sensor 180K receives a touch operation, a corresponding hardware interrupt is issued to the kernel layer. The kernel layer processes the touch operation into an original input event (including information such as the touch coordinates and the time stamp of the touch operation). The original input event is stored at the kernel layer. The application framework layer acquires the original input event from the kernel layer and identifies the control corresponding to the input event. For example, if the touch operation is a tap and the control corresponding to the tap is the icon of the camera application, the camera application calls an interface of the application framework layer to start the camera application, which in turn starts the camera driver by calling the kernel layer, and a still image or video is captured by the camera 193.
In the shooting information fusion method provided by the embodiment of the application, the trained learnable variable can be used for processing shooting data. It will be appreciated that the learnable variable may be trained in a number of ways, and that the learnable variables obtained from different training methods may differ somewhat. In the following, taking as an example that the electronic device in the scene shown in FIG. 2 is a mobile phone, the training process of the learnable variable is briefly described for the case where the learnable variable and the scene classifier are trained jointly.
Before training the learnable variables, a training set (train set) and a validation set (validation set) may be constructed. Both the training set and the validation set include a plurality of pre-configured data samples. The data samples may include image information, photographing information, and scene classification results. Shooting information and image information in each data sample have a corresponding relationship; for example, the shooting information may reflect the parameter setting of the camera when the camera acquires the image information. For example, the photographing information may include one or more of Automatic White Balance (AWB) data, automatic Focus (AF) data, and Automatic Exposure (AE) data. It can be understood that the shooting information can be recorded and generated when the camera acquires the image information; alternatively, the shooting information may be extracted from the image information, and the specific extraction method may refer to related technology, which is not described herein.
And the scene classification result has a corresponding relation with the image information and shooting information in the data sample. The scene classification result may be understood as a result obtained by performing scene recognition on the image information and the photographing information (e.g., a virtual focus judgment result, an overexposure judgment result, a color shift judgment result described below). The scene classification result can reflect the image quality corresponding to the image information. If the overexposure judging result is that overexposure occurs (that is, the brightness of part or the whole of the image is higher), the image quality is lower; if the overexposure judging result is that underexposure occurs (namely, the partial or whole brightness in the image is lower), the image quality is lower; if the color cast judging result is that color cast (namely, the color in the image is warm or cold), the image quality is lower; if the result of the virtual focus judgment is that virtual focus occurs (that is, the definition of the image is relatively low, and the boundary of the object in the image is not obvious), the image quality is relatively low.
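One way to picture such a data sample and the two sets is sketched below. The field names, the dictionary keys, and the 80/20 split are illustrative assumptions only; the present application does not prescribe a storage layout or a split ratio.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class DataSample:
    image_info: List[List[float]]    # e.g. a gray image as rows of pixel values
    shooting_info: Dict[str, float]  # e.g. {"AE": 0.8}
    scene_result: Dict[str, bool]    # e.g. {"overexposed": True}

    def is_low_quality(self) -> bool:
        """Any positive judgment (overexposure, color cast, ...) means lower quality."""
        return any(self.scene_result.values())


def split_samples(
    samples: List[DataSample], validation_ratio: float = 0.2
) -> Tuple[List[DataSample], List[DataSample]]:
    """Split pre-configured samples into a training set and a validation set."""
    n_validation = round(len(samples) * validation_ratio)
    cut = len(samples) - n_validation
    return samples[:cut], samples[cut:]


# One overexposed sample followed by four normally exposed ones (made-up values).
samples = [DataSample([[0.9, 0.95], [0.92, 0.97]], {"AE": 0.8}, {"overexposed": True})]
samples += [
    DataSample([[0.5, 0.5], [0.5, 0.5]], {"AE": 0.4}, {"overexposed": False})
    for _ in range(4)
]
train_set, validation_set = split_samples(samples)
```

Keeping the image information, the photographing information, and the scene classification result in one record preserves the correspondence between them that the training process relies on.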
For example, referring to fig. 5, a training process for a learnable variable on a handset may include steps S500-S505 described below.
S500, the mobile phone acquires a data sample.
The data sample comprises image information, shooting information and scene classification results. The scene classification result has a corresponding relation with the image information and the shooting information; the formats of the photographing information and the image information are different.
S501, the mobile phone processes the image information by using a pre-trained image feature extractor to obtain image features. The image features are used for representing the quality of the picture corresponding to the image information. For example, the image features are used to characterize the exposure level of the picture corresponding to the image information, the sharpness level of the picture corresponding to the image information, the accuracy level of the color corresponding to the image information, and so on.
Illustratively, the pre-trained image feature extractor may be an image feature extractor associated with exposure level analysis, an image feature extractor associated with sharpness analysis, an image feature extractor associated with image color analysis, and so forth.
For example, the image feature extractor may be a pretrained CLIP image encoder model based on the ViT architecture.
S502, adding an initial learnable variable into the shooting information by the mobile phone so that the shooting information added with the initial learnable variable has the same format as the image information.
It will be appreciated that the image feature extractor is designed for image information and is pre-trained, so its input format is tied to the format of the image information. Therefore, before the photographing information is input to the image feature extractor, an initial learnable variable needs to be added to the photographing information so that the format of the photographing information after the addition is the same as the format of the image information. In this way, the photographing information to which the initial learnable variable is added can be input to the image feature extractor and processed by it.
For example, the shooting information can be expanded into a matrix form with the same size as the image information; thereafter, the initial learnable variable is added.
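The expansion described above might look like the following sketch. It assumes, purely for illustration, that the photographing information is a vector of three normalised values (AF, AE, AWB) mapped onto the three channels of an RGB-sized matrix, and that the learnable variable has the same height x width x channels shape; the helper names are hypothetical.

```python
from typing import List, Tuple

Image = List[List[List[float]]]  # height x width x channels


def expand_to_image_shape(values: List[float], height: int, width: int) -> Image:
    """Repeat the shooting values at every pixel, one value per channel."""
    return [[list(values) for _ in range(width)] for _ in range(height)]


def add_learnable_variable(expanded: Image, learnable: Image) -> Image:
    """Element-wise sum of the expanded shooting information and the variable."""
    return [
        [[e + v for e, v in zip(e_px, v_px)] for e_px, v_px in zip(e_row, v_row)]
        for e_row, v_row in zip(expanded, learnable)
    ]


def shape(tensor: Image) -> Tuple[int, int, int]:
    """Return (height, width, channels) of a nested-list tensor."""
    return (len(tensor), len(tensor[0]), len(tensor[0][0]))


height, width = 2, 2
shooting_info = [0.3, 0.5, 0.7]  # hypothetical normalised AF, AE, AWB values
initial_learnable = [[[0.01] * 3 for _ in range(width)] for _ in range(height)]

expanded = expand_to_image_shape(shooting_info, height, width)
extractor_input = add_learnable_variable(expanded, initial_learnable)
```

After this step the photographing information has the same height x width x channels layout as the image information, so it can be fed to the image feature extractor unchanged.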
S503, the mobile phone processes the shooting information added with the initial learnable variable by using the pre-trained image feature extractor to obtain shooting features.
The shooting feature is used for representing the influence degree of shooting information on the image quality. The pre-trained image feature extractor is the same as the image feature extractor in step S501 described above.
It can be understood that the image feature extractor works by performing operations such as convolution, pooling, and deconvolution on the image information, analyzing the rule reflected by the pixel points in the image information, and extracting the image features expressed by that rule. Based on this, adding the learnable variable to the shooting information makes the shooting information resemble the image information, so that the image feature extractor can perform feature extraction on the shooting information with the learnable variable added. In addition, the learnable variable can simulate the rule of the pixel points, so the shooting information with the learnable variable added can express the shooting features through that simulated rule.
It will be appreciated that the image feature extractor may extract features from the RGB space; and the learnable variables may map the photographing information to an RGB space of the image. Based on this, the image feature extractor can extract the photographing feature from the photographing information to which the learnable variable is added.
S504, the mobile phone processes the image features and shooting features by using an initial scene classifier to obtain scene classification training results.
For example, the mobile phone may obtain a scene classification training result through an initial scene classifier after adding the image feature and the shooting feature.
S505, the mobile phone iterates the initial learnable variable and the initial scene classifier according to the difference between the scene classification training result and the scene classification result and the difference between the shooting characteristic and the image characteristic; and training the initial learnable variable and the initial scene classifier until convergence to obtain a trained learnable variable and scene classifier.
Then, the mobile phone can use the learnable variable trained in the above process to fuse shooting information. For example, referring again to fig. 5, a process of fusing shooting information on a mobile phone may include steps S506-S509.
S506, the mobile phone obtains image features according to the image information and the pre-trained image feature extractor.
S507, the mobile phone obtains shooting features according to the shooting information, the pre-trained learnable variable, and the image feature extractor.
The shooting information and the image information may be shooting information and image information of the same image (such as a target image), and the data processing formats of the shooting information and the image information are not matched.
It is understood that the image feature extractor in step S507 is the same as that in step S506. It can be understood that the mobile phone may execute step S506 first and then execute step S507, or may execute steps S506 and S507 simultaneously; the embodiment of the present application is not limited in this regard.
For example, in step S507 the mobile phone may add the pre-trained learnable variable to the shooting information, so that the shooting information with the learnable variable added has the same format as the image information. Next, the mobile phone may input the shooting information with the learnable variable added into the image feature extractor to obtain the shooting features. The trained learnable variable can be used to narrow the gap between the shooting information and the image information.
Assuming that the format of the image information is a three-dimensional matrix, the mobile phone can expand the shooting information into a one-dimensional matrix whose height and width are the same as those of the three-dimensional matrix. The mobile phone then combines the pre-trained learnable variable with the one-dimensional matrix to obtain a three-dimensional matrix with the same format as the image information. It will be appreciated that the pre-trained learnable variable may be in the form of a two-dimensional matrix.
S508, the mobile phone obtains fusion information based on the image features and the shooting features.
The fusion information is used for determining the quality of the target image under the influence of the shooting information.
In some embodiments, the handset may derive the fused information based on a numerical relationship between the image features and the capture features.
For example, the image feature and the shooting feature may be added to obtain the fusion information. The fusion information can be used to optimize the shooting process on the mobile phone, so as to improve the quality of pictures shot by the user.
Optionally, after step S508, step S509 may also be performed.
S509, the mobile phone obtains a scene classification result based on the fusion information and the pre-trained scene classifier.
The pre-trained scene classifier may be the scene classifier trained in steps S500 to S505. The scene classification result can be used to represent the quality of the image corresponding to the image information, such as whether overexposure occurs, whether virtual focus occurs, or whether color cast occurs. Therefore, the mobile phone can use the fusion information to optimize the shooting process on the mobile phone and improve the quality of the pictures it shoots; the mobile phone can also use the fusion information to process an image so as to improve the quality of the image.
It will be appreciated that the training process corresponding to fig. 5 may be performed by a mobile phone or by another electronic device. After another electronic device obtains the learnable variable and the scene classifier through training, the trained learnable variable and scene classifier can be loaded onto the mobile phone for the mobile phone to use. Specifically, the execution body of the training process may be designed according to the actual use situation, which is not limited in any way by the embodiment of the present application.
It will be appreciated that the learnable variables that need to be trained differ for different shooting information, and that the scene classifiers corresponding to different shooting information may also differ. For example, a learnable variable for AF data may be jointly trained with a virtual focus classifier (hereinafter referred to as implementation 1); a learnable variable for AE data may be jointly trained with an overexposure classifier (hereinafter referred to as implementation 2); a learnable variable for AWB data may be jointly trained with a color cast classifier (hereinafter referred to as implementation 3). Similarly, the mobile phone can apply the learnable variables and scene classifiers obtained by training on different data in different ways. The three implementations will be described in detail below.
Implementation 1
In implementation 1, the technical solution provided in the embodiment of the present application is described in detail by taking the following as an example: in the usage scenario shown in fig. 2, the electronic device 100 is a mobile phone, the image information is RGB data, the shooting information is auto-focus (AF) data, and the fusion information is obtained by fusing the RGB data and the AF data.
In some embodiments, the data samples may include RGB data samples, AF data samples, and virtual focus result samples. An RGB data sample may be in matrix form, such as a three-dimensional matrix of H×W×3, where H is the height of the image corresponding to the RGB data sample and W is the width of that image; H and W are each positive integers. In some embodiments, H×W may also be referred to as the resolution of the image. The format of an AF data sample may be a numeric format, which may be denoted as af. The format of a virtual focus result sample may be a 0-1 format: if the virtual focus result sample is 0, the data sample is not in virtual focus; if the virtual focus result sample is 1, the data sample is in virtual focus, that is, the image definition is poor and the image quality is low. Thus, the format of a data sample may be expressed as [(H×W×3), (af), (1)] or [(H×W×3), (af), (0)]. It can be appreciated that the training set and the verification set can be constructed in other ways; specifically, this may be designed according to the actual use situation, and the embodiment of the present application does not limit the construction of the training set and the verification set. Virtual focus can be understood as the image having a relatively large number of noise points, relatively low definition, and a blurred picture.
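The data sample format [(H×W×3), (af), (0/1)] described above can be sketched as a small container. This is only an illustration: the class name `AFSample`, its field names, and the placeholder pixel values are assumptions and do not come from the embodiment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AFSample:
    """Hypothetical container for one data sample of implementation 1."""
    rgb: List[List[List[float]]]  # H x W x 3 RGB data sample
    af: float                     # AF data sample, a single numeric value
    label: int                    # virtual focus result sample: 0 or 1


def make_sample(height: int, width: int, af: float, label: int) -> AFSample:
    if label not in (0, 1):
        raise ValueError("label must be 0 (no virtual focus) or 1 (virtual focus)")
    # Placeholder pixels; a real sample would hold captured image data.
    rgb = [[[0.0, 0.0, 0.0] for _ in range(width)] for _ in range(height)]
    return AFSample(rgb=rgb, af=af, label=label)


sample = make_sample(height=2, width=3, af=0.5, label=1)
```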
The training process of the learnable variables will be described in detail below in connection with the virtual focus judgment scenario.
For example, referring to FIG. 6, a training process for a learnable variable may include steps S600-S603 described below.
S600, the mobile phone inputs the RGB data samples into the pre-trained image feature extractor 610 to obtain image features 630 of the RGB data samples.
It will be appreciated that the image features 630 are obtained by passing the RGB data samples through the image feature extractor 610, and the image features 630 may be used to characterize the sharpness of the picture (e.g., whether the boundaries are sharp) in the image corresponding to the RGB data samples.
S601, the mobile phone adds initial learnable variables into the AF data sample to obtain a learnable AF matrix 620. Wherein the format of the learnable AF matrix is the same as the format of the RGB data samples.
For example, assume that the RGB data sample format is the H×W×3 format described above, that is, a three-dimensional matrix of H×W. The AF data sample is a numeric value, i.e., af. Step S601 may include the following process:
First, the AF data sample is expanded into matrix form. For example, a one-dimensional matrix of H×W is created, in which every element has the value af.
Next, a two-dimensional matrix of H×W is created based on a plurality of initial learnable variables, and the elements in the matrix are the initial learnable variables.
Finally, the H×W one-dimensional matrix and the H×W two-dimensional matrix are combined. Thus, an AF matrix is constructed that has the same format as the RGB data samples (e.g., an H×W×3 matrix) and contains learnable variables. The AF matrix can then be input into the image feature extractor, and the image feature extractor can perform feature extraction on the AF matrix.
It should be noted that, in step S601, a matrix of H×W×2, that is, a two-dimensional matrix of H×W, may instead be constructed from the AF data sample, where every element in the matrix has the value af. Thereafter, a one-dimensional matrix of H×W is constructed based on the initial learnable variables.
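The construction in step S601 can be sketched as follows, under stated assumptions: the "one-dimensional" and "two-dimensional" matrices are read here as one and two H×W channels, the AF channel is placed first, and the learnable variables start at zero. The function name `build_learnable_af_matrix` is hypothetical, and the embodiment does not specify the channel order or the initial values.

```python
from typing import List

Matrix3D = List[List[List[float]]]


def build_learnable_af_matrix(af: float, learnable: Matrix3D) -> Matrix3D:
    """Combine one AF channel with two learnable channels into H x W x 3.

    `learnable` is an H x W x 2 nested list holding the learnable variables.
    """
    matrix = []
    for row in learnable:
        new_row = []
        for pair in row:
            if len(pair) != 2:
                raise ValueError("expected two learnable values per position")
            # Assumed channel order: AF value first, then the two variables.
            new_row.append([af, pair[0], pair[1]])
        matrix.append(new_row)
    return matrix


H, W = 2, 3
# Assumed initialisation: all learnable variables start at 0.0.
initial_vars = [[[0.0, 0.0] for _ in range(W)] for _ in range(H)]
af_matrix = build_learnable_af_matrix(0.5, initial_vars)
```

The resulting `af_matrix` has the same H×W×3 layout as an RGB data sample, which is what allows it to be passed to the same image feature extractor.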
S602, the mobile phone inputs the learnable AF matrix 620 into the image feature extractor 610 to obtain AF features 640.
In the virtual focus judgment scene, the AF feature can be used for representing the influence degree of the AF data sample on the image quality (such as the image definition).
It will be appreciated that the principle of the image feature extractor is to extract the rule reflected by the RGB value of each pixel point in the RGB data sample by convolution, pooling, deconvolution, and so on, so as to obtain the image feature. Thus, the AF data sample, i.e., the learnable AF matrix, to which the learnable variable is added, can simulate the above-described rule by the learnable variable and express the AF feature by the above-described rule. Based on this, the AF features can be extracted from the AF data samples by the image feature extractor.
S603, the mobile phone combines the AF features and the image features and inputs them into an initial virtual focus classifier to obtain a virtual focus training result.
After step S603, a loss function of the present training is calculated based on the difference between the image feature 630 and the AF feature 640, and the difference between the virtual focus training result and the virtual focus result sample. Based on the loss function, the initial learnable variable and the initial virtual focus classifier are each iterated using the training set, and the iteration ends when the loss function converges or the number of iterations exceeds a preset number (such as 1000, 10000, or 1000000). It should be noted that, in the training process of the learnable variable and the virtual focus classifier, the mobile phone can fix the weights of the image feature extractor; that is, the parameters of the image feature extractor 610 are not changed during the training process.
In some embodiments, the learnable variables and the virtual focus classifier after the iteration is finished can be used as the trained learnable variables and virtual focus classifier and used in the subsequent steps. In other embodiments, to ensure the performance of the learnable variable and the virtual focus classifier, the learnable variable and the virtual focus classifier after the iteration can be further verified by using a verification set to verify the performance of the learnable variable and the virtual focus classifier; thus, the capability of the learnable variable to express shooting characteristics in the form of image information can be improved, and the accuracy of the virtual focus classifier on image virtual focus verification can be improved.
Next, the iterated learnable variables and virtual focus classifier are verified using the verification set to obtain a virtual focus verification result, and the success rate of virtual focus judgment is calculated based on the virtual focus verification result and the virtual focus result samples in the verification set. If the success rate meets a preset threshold (e.g., 90%, 95%, or 97%), training of the learnable variables and the virtual focus classifier succeeds. If the success rate does not meet the preset threshold, the learnable variables and the virtual focus classifier are trained again using the training set.
As a possible implementation, in the training process described above, the difference between the image feature 630 and the AF feature 640 corresponds to a first weight, and the difference between the virtual focus training result and the virtual focus result sample corresponds to a second weight. The loss function for this training is calculated based on the difference between the image feature 630 and the AF feature 640, the first weight, the difference between the virtual focus training result and the virtual focus result sample, and the second weight. The first weight is less than the second weight.
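The verification check can be sketched as below. The function names are hypothetical, and treating "meets the preset" as "greater than or equal to the threshold" is an assumption.

```python
from typing import Sequence


def success_rate(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of verification samples whose judgment matches the label."""
    if len(predictions) != len(labels) or len(labels) == 0:
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def training_succeeded(predictions: Sequence[int], labels: Sequence[int],
                       threshold: float = 0.9) -> bool:
    # Assumption: the preset is met when the rate reaches the threshold.
    return success_rate(predictions, labels) >= threshold
```

If `training_succeeded` returns False, the learnable variables and the classifier would be trained again on the training set.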
Specifically, the loss function can be seen in the following expression 1.
$$\mathrm{Loss} = \alpha \cdot L_{1} + \beta \cdot L_{bce} \qquad \text{(Expression 1)}$$
Here, $L_{1} = \mathrm{sum}(f\_rgb - f\_other)$, where the operator "sum" denotes a summation operation, f_rgb denotes the RGB feature, and f_other denotes the shooting feature (e.g., the AF feature). $L_{bce} = \mathrm{BCE}(\text{virtual focus training result}, \text{virtual focus result sample})$, where the operator "BCE" denotes a binary cross entropy loss (BCE) calculation, used to calculate the difference between the virtual focus training result and the virtual focus result sample. $L_{1}$ may be referred to as a first type loss ($L_{1}$ loss), and $L_{bce}$ may be referred to as a binary cross entropy loss. It will be appreciated that in other embodiments, the difference between the image features and the AF features may be calculated in other ways, such as a second type loss ($L_{2}$ loss) or a smoothed first type loss ($\mathrm{smooth}L_{1}$); the difference between the virtual focus training result and the virtual focus result sample may also be calculated in other ways, such as mean squared error (MSE) or cross entropy loss (CE). $\alpha$ denotes the first weight, $\beta$ denotes the second weight, and $\alpha < \beta$. For example, $\alpha = 0.1$, $\beta = 3$; or $\alpha = 0.2$, $\beta = 5$; and so on.
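Expression 1 can be illustrated with a short sketch. Two points are assumptions rather than statements of the embodiment: the feature difference is summed in absolute value, as in the standard L1 loss, and the virtual focus training result is taken to be a probability between 0 and 1. All function names are hypothetical.

```python
import math
from typing import Sequence


def l1_loss(f_rgb: Sequence[float], f_other: Sequence[float]) -> float:
    # Assumption: absolute differences, as in the standard L1 loss.
    return sum(abs(a - b) for a, b in zip(f_rgb, f_other))


def bce_loss(prob: float, label: int, eps: float = 1e-7) -> float:
    # Clamp the probability so that log() never receives 0.
    p = min(max(prob, eps), 1.0 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def total_loss(f_rgb: Sequence[float], f_af: Sequence[float],
               prob: float, label: int,
               alpha: float = 0.1, beta: float = 3.0) -> float:
    """Loss = alpha * L1 + beta * L_bce, with alpha < beta."""
    if not alpha < beta:
        raise ValueError("the first weight must be smaller than the second")
    return alpha * l1_loss(f_rgb, f_af) + beta * bce_loss(prob, label)
```

With the example weights α = 0.1 and β = 3, the classification term dominates, which matches the stated intent that the first weight is smaller than the second.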
It will be appreciated that the first weight applies to the term measuring the degree of similarity between the image features and the AF features, and the second weight applies to the term measuring the degree of similarity between the virtual focus training result and the virtual focus result sample. That is, in the iterative process, the first weight can be used to control the ability of the learnable variables to learn the RGB data format, and the second weight can be used to control the ability of the virtual focus classifier to judge image definition. Because the first weight is smaller than the second weight, the learnable variables can simulate the format of the RGB data in the iterative process without losing the shooting features reflected by the shooting information. Meanwhile, the virtual focus classifier can adapt to the AF features expressed by the AF data with the learnable variable added (namely, the AF matrix), and can combine the AF features and the image features to judge whether virtual focus occurs in the image corresponding to the RGB data.
In some embodiments, in the training of the learnable variables, a smoothing operation may be performed on the above $L_{1}$, that is, $\mathrm{smooth}L_{1}$ may be used, so as to limit $L_{1}$. In this way, the training process of the learnable variables can converge more quickly, the training time can be reduced, and the training efficiency can be improved. For example, referring to fig. 7, compared with $L_{1}$, $\mathrm{smooth}L_{1}$ varies more smoothly in the interval [-1, 1], especially at the point 0.
For example, referring to fig. 8, the loss function calculated using the $L_{1}$ loss is shown in part A of fig. 8, and the loss function calculated using $\mathrm{smooth}L_{1}$ is shown in part B of fig. 8. As can be seen by comparing parts A and B of fig. 8, the burrs on the loss function become smaller. Therefore, calculating the loss function using $\mathrm{smooth}L_{1}$ can speed up the convergence of the loss function and reduce the time spent on training.
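For reference, the commonly used definition of the smooth L1 function is sketched below. The embodiment does not give its exact formula, so this form, which is quadratic inside [-1, 1] and linear outside, is an assumption.

```python
def smooth_l1(x: float) -> float:
    """Standard smooth L1: 0.5 * x^2 for |x| < 1, otherwise |x| - 0.5."""
    ax = abs(x)
    if ax < 1.0:
        return 0.5 * x * x
    return ax - 0.5


def l1(x: float) -> float:
    """Plain L1 for comparison; it has a sharp corner at x = 0."""
    return abs(x)
```

Inside [-1, 1] the quadratic branch has slope x, which shrinks to zero at the origin, whereas plain L1 keeps a slope of magnitude 1 right up to the corner at 0. This is consistent with the smoother behavior near 0 described for fig. 7.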
Based on this, by training the combination of the learnable variable and the scene classifier (e.g., the virtual focus classifier), the photographing information added with the learnable variable can be made to express the photographing characteristics of the photographing information in the form of image information, and the scene classifier can be made to judge the level of the image quality (e.g., whether virtual focus occurs) corresponding to the image information by combining the photographing characteristics and the image characteristics.
After the training of the learnable variables is completed, the mobile phone can use the learnable variables after the training is completed to perform shooting information fusion. Referring to fig. 9, a photographing information fusion method may include steps S700 to S704.
S700, the mobile phone inputs RGB data of a target image into the pre-trained image feature extractor 710 to obtain image features 730 of the RGB data.
S701, the mobile phone adds the pre-trained learnable variable to the AF data of the target image to obtain a learnable AF matrix 720. The format of the learnable AF matrix is the same as the format of the RGB data. It will be appreciated that in some embodiments, step S700 may be performed first and then step S701, or step S701 and step S700 may be performed simultaneously, which is not limited in any way in the embodiments of the present application.
S702, the mobile phone inputs the learnable AF matrix 720 into the image feature extractor 710 to obtain AF features 740.
For example, the mobile phone may expand the AF data into matrix form, and then add the pre-trained learnable variable to the expanded AF data to obtain the learnable AF matrix. Thus, the AF matrix has the same format as the RGB data, and AF features can be extracted from the AF matrix by the pre-trained image feature extractor.
Specifically, for details of steps S700-S702, refer to the description of steps S600-S602, which is not repeated here.
S703, the mobile phone obtains fusion information based on the image features 730 and the AF features 740.
Illustratively, the handset may add the image features 730 and the AF features 740 to obtain the fused information.
It can be appreciated that the shooting information fusion method provided by the embodiment of the application can reduce the number of feature extractors that need to be trained in advance: the learnable variable can be used once its training is completed, and there is no need to train a separate feature extractor for the shooting information. Thus, the overhead of network training can be reduced. In addition, fusion of shooting features can be realized by deploying only the learnable variables and the image feature extractor on the mobile phone, without separately training and deploying a feature extractor for the shooting information. Thus, by reducing the number of feature extractors on the mobile phone, the occupancy of mobile phone resources (e.g., memory resources, or processor resources at run time) can be reduced.
In some embodiments, after step S703, the above-described shooting information fusion method may further include step S704.
S704, the mobile phone inputs the fusion information into a pre-trained virtual focus classifier to obtain a virtual focus judgment result.
Therefore, after the fusion information is acquired, the mobile phone can generate a virtual focus judgment result through the pre-trained virtual focus classifier, and can judge the quality of the image corresponding to the image information (such as whether virtual focus occurs).
In some embodiments, the mobile phone may also optimize its shooting process through pre-trained learnable variables and scene classifiers. For example, the shooting process of the mobile phone may be optimized using the trained learnable variables and the corresponding virtual focus classifier obtained in steps S500-S505.
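The flow of steps S700 to S704 can be sketched as a small pipeline. The pipeline structure (one shared extractor, element-wise addition, then a classifier) follows the description; `toy_extractor` and `toy_classifier` are purely hypothetical stand-ins so that the sketch runs, and they do not represent the pre-trained models.

```python
from typing import Callable, List, Sequence

Matrix3D = List[List[List[float]]]


def fuse(image_features: Sequence[float], af_features: Sequence[float]) -> List[float]:
    """S703: element-wise addition of image features and AF features."""
    if len(image_features) != len(af_features):
        raise ValueError("features must have the same length")
    return [a + b for a, b in zip(image_features, af_features)]


def judge_virtual_focus(rgb: Matrix3D, af_matrix: Matrix3D,
                        extractor: Callable[[Matrix3D], List[float]],
                        classifier: Callable[[Sequence[float]], int]) -> int:
    image_features = extractor(rgb)        # S700
    af_features = extractor(af_matrix)     # S702: the same extractor is reused
    fused = fuse(image_features, af_features)
    return classifier(fused)               # S704: 1 = virtual focus, 0 = none


def toy_extractor(m: Matrix3D) -> List[float]:
    # Stand-in only: mean of each of the three channels.
    n = sum(len(row) for row in m)
    return [sum(px[c] for row in m for px in row) / n for c in range(3)]


def toy_classifier(fused: Sequence[float]) -> int:
    # Stand-in only: an arbitrary threshold on the fused features.
    return 1 if sum(fused) > 1.0 else 0
```

Because both inputs share the H×W×3 format, a single extractor serves for both, which is the source of the resource saving described above.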
It can be understood that the trained learnable variables and the scene classifier corresponding to the learnable variables can be used for optimizing the code scanning process, the face recognition process and the like of the mobile phone.
For example, referring to fig. 10, an optimized photographing process of a cellular phone may include steps S1001 to S1005.
S1001, the mobile phone displays a shooting preview interface 1010.
It can be understood that the user can click on the camera application on the mobile phone to trigger the mobile phone to run the camera application program and display the shooting preview interface.
The shooting preview interface includes an image 1070 to be shot. The image 1070 to be shot is acquired by the camera of the mobile phone based on shooting information. It will be appreciated that the shooting information may be used to control some of the operating parameters of the camera. For example, AF data may be used to control the focus-related parameters of the camera, AE data may be used to control the exposure-time-related parameters of the camera, and AWB data may be used to control the color-temperature-related parameters of the camera. In fig. 10, the broken line segments in the image 1070 to be shot indicate that virtual focus occurs.
S1002, the mobile phone obtains fusion information through shooting information 1020, image information 1030, pre-trained learnable variables 1040 and a pre-trained image feature extractor 1050.
The image information 1030 may be image information corresponding to the image 1070 to be photographed. The shooting information can be shooting information which is automatically generated according to light rays, composition and the like during shooting by a shooting algorithm preset on the mobile phone; alternatively, the shooting information input by the user in the shooting preview interface may be used. For example, the photographing information may include AF data. Wherein the learnable variable is used to reduce a difference between the photographing information and the image information.
S1003, the mobile phone obtains a virtual focus judgment result through fusion information and a pre-trained virtual focus classifier 1060.
It can be understood that the virtual focus judgment result can be used for representing the quality of the image corresponding to the current image information; if the virtual focus occurs, the image quality is lower, and if the virtual focus does not occur, the image quality is higher.
Specifically, for details of steps S1002 to S1003, refer to the description of steps S700 to S704, which is not repeated here.
If the virtual focus determination result is that virtual focus occurs, step S1004 is executed, and if the virtual focus determination result is that virtual focus does not occur, the optimized photographing process is ended.
S1004, the mobile phone adjusts shooting information 1020.
For example, the mobile phone may execute the preset shooting algorithm again to regenerate the shooting information, so as to adjust the shooting information. It can be understood that the shooting algorithm preset on the mobile phone may generate unreasonable shooting information; therefore, after the virtual focus classifier judges that virtual focus occurs, the mobile phone can execute the shooting algorithm again to regenerate the shooting information. This mitigates the case in which the shooting information generated by the shooting algorithm on the mobile phone is unreasonable, improves the quality of the image shot by the mobile phone, and improves the user experience.
As another example, the mobile phone may adjust the shooting information by a preset adjustment step size. Assume that the current AF data is 0.5 and the preset step size is 0.2. The mobile phone may increase or decrease the current AF data by the preset step size to obtain the adjusted shooting information; that is, the adjusted shooting information may be 0.7 or 0.3.
S1005, the mobile phone obtains a virtual focus judgment result based on the adjusted shooting information.
If the virtual focus determination result is that virtual focus occurs, step S1004 and step S1005 are executed in a circulating manner until the virtual focus determination result is that virtual focus does not occur.
It will be appreciated that for electronic devices with camera capabilities (e.g., mobile phones), the range of AF data values and the granularity of adjustments may be related to the camera on the mobile phone. Based on the above, the mobile phone can obtain the adjusted AF data by adjusting granularity and traversing the values in the AF data value range.
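The loop over steps S1004 and S1005 can be sketched as a traversal of the AF value range at the adjustment granularity. The function names, the callback `is_virtual_focus`, and returning `None` when no value in the range passes are all assumptions; the embodiment does not say what happens if every candidate is judged to be in virtual focus.

```python
from typing import Callable, List, Optional


def candidate_af_values(low: float, high: float, step: float) -> List[float]:
    """AF values low, low + step, ... that do not exceed high."""
    if step <= 0 or low > high:
        raise ValueError("require step > 0 and low <= high")
    count = int((high - low) / step + 1e-9)
    # Rounding only tidies floating-point noise such as 0.6000000000000001.
    return [round(low + i * step, 10) for i in range(count + 1)]


def find_focused_af(is_virtual_focus: Callable[[float], bool],
                    low: float, high: float, step: float) -> Optional[float]:
    for af in candidate_af_values(low, high, step):   # S1004: adjust AF data
        if not is_virtual_focus(af):                  # S1005: judge again
            return af
    return None  # Assumed fallback when no candidate avoids virtual focus.
```

In practice `is_virtual_focus` would wrap the fusion and classification steps for the image acquired with the candidate AF value.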
Next, after the mobile phone executes step S1005, the mobile phone displays the image to be shot acquired by the camera based on the adjusted shooting information on the shooting preview interface 1010. For convenience of the following description, the image to be photographed acquired by the camera based on the adjusted photographing information may be simply referred to as an adjusted image to be photographed.
For example, referring to fig. 11, the mobile phone may display a related prompt on the shooting preview interface 1100, such as "Virtual focus is detected in the current picture. Perform optimization?". In response to the user clicking the "yes" option, the mobile phone displays the adjusted image to be shot on the shooting preview interface 1110.
Also, for example, after the mobile phone obtains the adjusted shooting information, the mobile phone may also automatically display the adjusted image to be shot on the shooting preview interface.
Then, in response to a shooting operation of the user, the mobile phone saves the adjusted image information to be shot, so as to complete the shooting operation.
For example, referring again to fig. 11, in response to the user clicking on a control associated with the shooting on the mobile phone, the mobile phone saves the adjusted image information to be shot. In the shooting preview interface 1100 of fig. 11, a broken line section indicates that a virtual focus has occurred, and the sharpness is relatively low; in the photographing preview interface 1110 of fig. 11, the solid line segment portion indicates that no virtual focus has occurred, and the sharpness is relatively high.
It will be appreciated that in the above process, the user's shooting process on the cell phone can be optimized by pre-trained learnable variables and pre-trained scene classifiers (e.g., virtual focus classifiers). Thus, the user can shoot and obtain images with higher quality on the mobile phone. The quality of the image shot by the user on the mobile phone can be improved, and the use experience of the user can be improved.
Implementation 2
In implementation 2, the technical solution provided in the embodiment of the present application is described in detail by taking the following as an example: in the usage scenario shown in fig. 2, the electronic device 100 is a mobile phone, the image information is RGB data, the shooting information is automatic exposure (AE) data, and the fusion information is obtained by fusing the RGB data and the AE data.
In some embodiments, the data samples may include RGB data samples, AE data samples, and overexposure result samples. An RGB data sample may be in matrix form, such as a three-dimensional matrix of H×W×3, where H is the height of the image corresponding to the RGB data sample and W is the width of that image; H and W are each positive integers. In some embodiments, H×W may also be referred to as the resolution of the image. The format of an AE data sample may be a matrix format, such as a one-dimensional matrix of 4*4, with the data in the matrix being ae1-ae4. The format of an overexposure result sample may be a 0-1 format: if the overexposure result sample is 0, the data sample is not overexposed; if the overexposure result sample is 1, the data sample is overexposed, that is, the image definition is poor and the image quality is low. Thus, the format of a data sample may be expressed as [(H×W×3), (4*4), (0/1)]. It can be appreciated that the training set and the verification set can be constructed in other ways; specifically, this may be designed according to the actual use situation, and the embodiment of the present application does not limit the construction of the training set and the verification set. Overexposure can be understood as the brightness of an object in the image being relatively high, so that the details of the bright parts of the image are lost and the picture is excessively bright or distorted.
For example, the training process for a learnable variable on a handset, see fig. 12, may include steps S1200-S1203 described below.
S1200, inputting the RGB data samples into a pre-trained image feature extractor 1210 to obtain image features 1230 of the RGB data samples.
It will be appreciated that the image features 1230 may be used to characterize the exposure of pictures in an image to which the RGB data samples correspond.
S1201, adding the initial learnable variable into the AE data sample to obtain a learnable AE matrix 1220. Wherein the format of the learnable AE matrix is the same as the format of the RGB data samples.
For example, assume that the format of the RGB data samples is the H×W×3 format described above, that is, a three-dimensional matrix of H×W×3. The AE data samples are in the form of a matrix, i.e., a matrix of 4*4, the elements in the matrix being AE1-AE4, respectively. Step S1201 may include the following procedure:
First, the AE data samples are expanded into an H×W matrix form. For example, a one-dimensional matrix of H×W is created, in which the elements are assigned sequentially in the order AE1-AE4. For another example, a one-dimensional matrix of H×W is created, where the first four elements in the matrix are AE1-AE4 and the other elements are arbitrary integers.
Next, a two-dimensional matrix of H×W is created based on the plurality of initial learnable variables, the elements in the matrix being the initial learnable variables.
Finally, the H×W one-dimensional matrix and the H×W two-dimensional matrix are combined. Thus, a learnable AE matrix is constructed that has the same format as the RGB data samples (e.g., a matrix of H×W×3) and carries the variables to be learned. The learnable AE matrix can then be input into the image feature extractor, which performs feature extraction on it.
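The construction in step S1201 can be illustrated with a minimal NumPy sketch. It assumes that the "one-dimensional matrix of H×W" is a single H×W plane filled cyclically with AE1-AE4, and that the "two-dimensional matrix of H×W" is a pair of H×W planes of learnable variables, so that stacking them gives H×W×3. The function name `build_learnable_matrix` and the sample values are hypothetical and are not taken from the application.

```python
import numpy as np

def build_learnable_matrix(shot_values, learnable, height, width):
    """Build an H x W x 3 matrix from shooting data and learnable variables.

    shot_values: 1-D sequence of shooting parameters (e.g. AE1-AE4).
    learnable:   array of shape (height, width, 2) holding learnable variables.
    """
    shot_values = np.asarray(shot_values, dtype=np.float32).ravel()
    # Assumption: the shooting values are repeated in order until H*W
    # elements are filled, giving one H x W plane.
    plane = np.resize(shot_values, height * width).reshape(height, width, 1)
    # Stack the data plane with the two learnable planes along the channel axis.
    return np.concatenate([plane, learnable.astype(np.float32)], axis=-1)

H, W = 4, 6
rng = np.random.default_rng(0)
learnable_init = rng.normal(size=(H, W, 2))  # initial learnable variables
ae_matrix = build_learnable_matrix([0.1, 0.2, 0.3, 0.4], learnable_init, H, W)
print(ae_matrix.shape)
```

Because the result has the same shape as an RGB sample, it can be passed to the same image feature extractor without changing the extractor's input layer.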
S1202, inputting the learnable AE matrix 1220 into the image feature extractor 1210 to obtain AE features 1240.
Wherein the AE features can be used to characterize the extent to which the AE data samples affect image quality, such as image sharpness.
It will be appreciated that the principle of the image feature extractor is to extract the rule reflected by the RGB value of each pixel point in the RGB data sample by convolution, pooling, deconvolution, and so on, so as to obtain the image feature. Thus, the AE data sample, i.e., the learnable AE matrix, to which the learnable variable is added, can simulate the above-described rule by the learnable variable and express the AE feature by the above-described rule. Based on this, AE features can be extracted from AE data samples by an image feature extractor.
S1203, combining the AE features and the image features, and inputting the combined AE features and the combined image features into an initial overexposure classifier to obtain an overexposure training result.
After step S1203, a loss function of the present training is calculated based on the difference between the image features 1230 and the AE features 1240, and the difference between the overexposure training result and the overexposure result sample. Based on the loss function, the initial learnable variable and the initial overexposure classifier are iterated using the training set until the loss function converges or the number of iterations exceeds a preset number (such as 1000 times, 10000 times, 1000000 times, etc.), at which point the iteration ends.
Next, the iterated learnable variable and overexposure classifier are verified through a verification set to obtain an overexposure verification result, and the success rate of overexposure judgment is calculated based on the overexposure verification result and the overexposure result samples in the verification set; if the success rate meets a preset threshold (e.g., 90%, 95%, 97%, etc.), the training of the learnable variable and the overexposure classifier is successful. If the success rate does not meet the preset threshold, the learnable variable and the overexposure classifier are trained again using the training set. It should be noted that, during the training of the learnable variable and the overexposure classifier, the mobile phone may fix the weights of the image feature extractor 1210; that is, the parameters of the image feature extractor 1210 are not changed during the training process.
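The verification step can be sketched as follows. This is only an illustration: it assumes the classifier outputs a probability per sample that is thresholded at 0.5 to give the 0/1 judgment, and the names `success_rate` and `training_succeeded` as well as the sample values are hypothetical.

```python
import numpy as np

def success_rate(predicted_probs, labels, threshold=0.5):
    """Fraction of verification samples whose 0/1 judgment matches the label."""
    predicted = (np.asarray(predicted_probs) >= threshold).astype(int)
    return float(np.mean(predicted == np.asarray(labels)))

def training_succeeded(predicted_probs, labels, preset=0.95):
    """True when the success rate meets the preset; otherwise train again."""
    return success_rate(predicted_probs, labels) >= preset

# Hypothetical classifier outputs and overexposure result samples (0/1).
probs = [0.9, 0.2, 0.7, 0.1]
labels = [1, 0, 1, 1]
rate = success_rate(probs, labels)
print(rate, training_succeeded(probs, labels))
```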
It will be appreciated that the specific process of steps S1200-S1203 is similar to that of steps S600-S603, and reference is made to the above description, and the detailed description is omitted.
Based on this, by training the combination of the learnable variable and the scene classifier (e.g., overexposure classifier), the shot information added with the learnable variable can be made to express the shot characteristics of the shot information in the form of image information, and the scene classifier can be made to judge the level of the image quality (e.g., whether overexposure occurs) corresponding to the image information in combination with the shot characteristics and the image characteristics.
After the training of the learnable variables is completed, the mobile phone can use the learnable variables after the training is completed to perform shooting information fusion. A shooting information fusion process on the mobile phone can comprise steps S1300-S1303.
S1300, inputting RGB data into a pre-trained image feature extractor by the mobile phone to obtain image features of the RGB data.
S1301, adding the pre-trained learnable variable into AE data by the mobile phone to obtain a learnable AE matrix. Wherein the format of the learnable AE matrix is the same as the format of the RGB data. It will be appreciated that in some embodiments, step S1300 may be performed first and then step S1301 may be performed, or step S1301 and step S1300 may be performed simultaneously, which is not limited in any way by the embodiment of the present application.
S1302, inputting the learnable AE matrix into the image feature extractor by the mobile phone to obtain AE features.
S1303, the mobile phone obtains fusion information based on the image features and the AE features.
For example, the mobile phone may obtain the fusion information according to a numerical relationship between the image feature and the AE feature. For example, the mobile phone adds the image feature and the AE feature to obtain the fusion information.
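Steps S1300-S1303 can be sketched as below, with fusion by addition as in the example above. The function `toy_feature_extractor` is a hypothetical stand-in (a per-channel mean) for the pre-trained image feature extractor, used only so that the sketch runs; the point it illustrates is that one extractor serves both the RGB data and the learnable AE matrix.

```python
import numpy as np

def toy_feature_extractor(x):
    """Hypothetical stand-in for the pre-trained image feature extractor.

    Maps any H x W x 3 input to a 3-element feature (per-channel mean).
    """
    return x.mean(axis=(0, 1))

def fuse(rgb, learnable_ae_matrix, extractor=toy_feature_extractor):
    image_feature = extractor(rgb)               # S1300
    ae_feature = extractor(learnable_ae_matrix)  # S1302, same extractor
    return image_feature + ae_feature            # S1303, fusion by addition

rgb = np.full((2, 2, 3), 0.5)
ae_matrix = np.zeros((2, 2, 3))
ae_matrix[..., 0] = 1.0  # AE plane; the learnable planes are left at zero here
fusion = fuse(rgb, ae_matrix)
print(fusion)
```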
It can be appreciated that the shooting information fusion method provided by the embodiment of the present application can reduce the number of feature extractors that need to be trained in advance: the learnable variable can be used once its training is completed, and there is no need to train a separate feature extractor for the shooting information. Thus, the overhead of network training can be reduced. In addition, the fusion of shooting features can be realized by deploying only the learnable variable and the image feature extractor on the mobile phone, and a separate feature extractor does not need to be trained and deployed for the shooting features. Thus, by reducing the number of feature extractors on the mobile phone, the occupancy of mobile phone resources (e.g., memory resources, or processor resources at run time) may be reduced.
In some embodiments, after step S1303, the above-described shooting information fusion method may further include step S1304.
S1304, inputting the fusion information into a pre-trained overexposure classifier by the mobile phone to obtain an overexposure judgment result.
Therefore, after the fusion information is acquired, the mobile phone can generate an overexposure judgment result through the pre-trained overexposure classifier, and can judge the quality of the image corresponding to the image information (such as whether overexposure occurs or not).
Specifically, steps S1300-S1304 are similar to steps S700-S704 described above, and reference is made to the above description, and the details are not repeated here.
In some embodiments, the mobile phone may also optimize the shooting process of the mobile phone through pre-trained learnable variables and scene classifiers. The shooting process of the mobile phone is optimized by the trained learnable variables and the corresponding overexposure classifier obtained in the steps S1200-S1203.
For example, referring to fig. 13, an optimized photographing process of a cellular phone may include steps S1401-S1405.
S1401, the mobile phone displays a shooting preview interface 1410.
The shooting preview interface comprises: an image 1470 is to be captured. The image 1470 to be shot is acquired by the camera of the mobile phone based on shooting information. It will be appreciated that the shot information may be used to control some of the operating parameters of the camera when in operation. In fig. 13, a portion having diagonal line filling in the image 1470 to be photographed indicates that overexposure has occurred in the portion.
S1402, the mobile phone obtains fusion information according to shooting information 1420, image information 1430, pre-trained learnable variables 1440 and a pre-trained image feature extractor 1450.
The image information 1430 may be image information corresponding to the image 1470 to be photographed. The shooting information can be shooting information which is automatically generated according to light rays, composition and the like during shooting by a shooting algorithm preset on the mobile phone; alternatively, the shooting information input by the user in the shooting preview interface may be used. For example, the photographing information may include AE data.
S1403, the mobile phone obtains an overexposure judgment result according to the fusion information and a pre-trained overexposure classifier 1460.
It can be understood that the overexposure judgment result can be used for representing the quality of the image corresponding to the current image information; the image quality is lower if overexposure occurs and higher if overexposure does not occur.
If the overexposure determination result is that overexposure occurs, step S1404 is executed, and if the overexposure determination result is that overexposure does not occur, the optimized photographing process is ended.
S1404, the mobile phone adjusts shooting information 1420.
S1405, the mobile phone obtains an overexposure judgment result based on the adjusted shooting information.
If the overexposure determination result is that overexposure occurs, step S1404 and step S1405 are performed in a loop until the overexposure determination result is that overexposure does not occur.
It will be appreciated that, for a mobile phone, the value range and the adjustment granularity of the AE data are related to the camera on the mobile phone. Based on this, the mobile phone can traverse the values in the value range of the AE data according to the adjustment granularity of the AE data, and obtain the adjusted AE data, that is, the adjusted shooting information. Illustratively, in some embodiments, the value range of the AE data on the mobile phone may be (-4, 4), with an adjustment granularity of 1. Therefore, the mobile phone can traverse the AE data over (-4, 4) with a granularity of 1 to obtain the adjusted AE data.
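The loop over steps S1404 and S1405 can be sketched as a simple traversal. The sketch assumes that the endpoints -4 and 4 are included and that candidates are tried from the largest value downward; neither detail is stated in the application. The judgment function passed in is a hypothetical placeholder for "fuse the adjusted shooting information with the image information and run the overexposure classifier".

```python
def adjust_until_ok(candidates, is_overexposed):
    """Return the first candidate AE value judged not overexposed, else None."""
    for value in candidates:          # S1404: adjust the shooting information
        if not is_overexposed(value): # S1405: judge with the adjusted value
            return value
    return None

# Values from 4 down to -4 with granularity 1 (order and endpoints assumed).
ae_candidates = list(range(4, -5, -1))

# Hypothetical judgment: treat any AE value above 1 as overexposed.
chosen = adjust_until_ok(ae_candidates, lambda ae: ae > 1)
print(chosen)
```

If no candidate passes the judgment, the function returns `None`, and the caller would have to decide how to proceed; the application does not describe this case.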
Next, after the mobile phone executes step S1405, the mobile phone displays the image to be shot acquired by the camera based on the adjusted shooting information on the shooting preview interface 1410.
Then, in response to the shooting operation of the user, the mobile phone stores the adjusted image information to be shot so as to complete the shooting operation.
It will be appreciated that in the above process, the user's shooting process on the cell phone can be optimized by pre-trained learnable variables and pre-trained scene classifiers (e.g., overexposure classifiers). Thus, the user can shoot and obtain images with higher quality on the mobile phone. The quality of the image shot by the user on the mobile phone can be improved, and the use experience of the user can be improved.
Specifically, steps S1401-S1405 are similar to steps S1001-S1005 described above, and reference is made to the above description, and the details are not repeated here.
Implementation 3
In implementation 3, the technical solution provided in the embodiment of the present application is described in detail by taking the usage scenario shown in fig. 2 as an example, in which the electronic device 100 is a mobile phone, the image information is RGB data, the shooting information is automatic white balance (AWB) data, and the fusion information obtained by fusing the RGB data and the AWB data is used for color cast determination.
In some embodiments, the data samples may include: RGB data samples, AWB data samples, and color cast result samples. The format of the color cast result sample can be 0-1 format, if the color cast result sample is 0, the data sample is not color cast; if the color cast result sample is 1, the data sample is color cast, that is, the color of the image is inaccurate, and the image quality is lower. It can be appreciated that the training set and the verification set can be constructed in other ways; specifically, this may be designed according to the actual use situation, and the embodiment of the present application does not limit the construction of the training set and the verification set. Color cast is understood as that the color of an object in an image deviates, and the image cannot truly reflect the color of the object, and the color is warm or cold.
For example, a training process for a learnable variable, see fig. 14, may include steps S1500-S1503 described below.
S1500, inputting the RGB data samples into a pre-trained image feature extractor 1510 to obtain image features 1530 of the RGB data samples.
It will be appreciated that the image features 1530 may be used to characterize the accuracy of the colors in the image to which the RGB data samples correspond.
S1501, adding initial learnable variables into the AWB data samples to obtain a learnable AWB matrix 1520. Wherein the format of the learnable AWB matrix is the same as the format of the RGB data samples.
For example, the AWB data in AWB numerical format may be expanded into an AWB matrix of h×w size, and then the learnable variables are added to obtain a learnable AWB matrix 1520.
S1502, inputting the learnable AWB matrix 1520 into the image feature extractor 1510 to obtain AWB features 1540.
Wherein the AWB feature can be used to characterize the extent to which AWB data samples affect image quality (e.g., the color accuracy level described above).
It will be appreciated that the principle of the image feature extractor is to extract the rule reflected by the RGB value of each pixel point in the RGB data sample by convolution, pooling, deconvolution, and so on, so as to obtain the image feature. Thus, AWB data samples, i.e., a learnable AWB matrix, to which a learnable variable is added, the above-described rule can be simulated by the learnable variable, and AWB characteristics can be expressed by the above-described rule. Based on this, AWB features can be extracted from AWB data samples by an image feature extractor.
S1503, combining the AWB features 1540 and the image features 1530 and inputting the combined AWB features and the image features into an initial color cast classifier to obtain a color cast training result.
After step S1503, a loss function of the present training is calculated based on the difference between the image feature and the AWB feature, and the difference between the color cast training result and the color cast result sample. And based on the loss function, iterating the initial learnable variable and the initial color cast classifier by using a training set until the loss function converges or the iteration number exceeds a preset number (such as 1000 times, 10000 times, 1000000 times and the like), and ending the iteration.
Next, the iterated learnable variable and color cast classifier are verified through a verification set to obtain a color cast verification result, and the success rate of color cast judgment is calculated based on the color cast verification result and the color cast result samples in the verification set; if the success rate meets a preset threshold (e.g., 90%, 95%, 97%, etc.), the training of the learnable variable and the color cast classifier is successful. If the success rate does not meet the preset threshold, the learnable variable and the color cast classifier are trained again using the training set. It should be noted that, during the training of the learnable variable and the color cast classifier, the mobile phone can fix the weights of the image feature extractor 1510; that is, the parameters of the image feature extractor 1510 are not changed during the training process.
It will be appreciated that the specific process of steps S1500-S1503 is similar to that of steps S600-S603, and reference is made to the above description, and the detailed description is omitted.
Based on this, by training the combination of the learnable variable and the scene classifier (e.g., the color cast classifier), the shot information added with the learnable variable can express the shot characteristics of the shot information in the form of image information, and the scene classifier can judge the quality of the image (e.g., whether color cast occurs) corresponding to the image information by combining the shot characteristics and the image characteristics.
After the training of the learnable variables is completed, the mobile phone can use the learnable variables after the training is completed to perform shooting information fusion. A shooting information fusion method may include steps S1600-S1603.
S1600, inputting RGB data into a pre-trained image feature extractor by the mobile phone to obtain image features of the RGB data.
S1601, adding a pre-trained learnable variable into the AWB data by the mobile phone to obtain a learnable AWB matrix. Wherein the format of the learnable AWB matrix is the same as the format of the RGB data samples. It will be appreciated that in some embodiments, step S1600 may be performed first and then step S1601 may be performed, or step S1601 and step S1600 may be performed simultaneously, which is not limited in this respect.
S1602, the mobile phone inputs the learnable AWB matrix into an image feature extractor to obtain AWB features.
S1603, the mobile phone obtains fusion information based on the image features and the AWB features.
For example, the handset may add the image feature and the AWB feature to obtain the fusion information.
It can be appreciated that the shooting information fusion method provided by the embodiment of the present application can reduce the number of feature extractors that need to be trained in advance: the learnable variable can be used once its training is completed, and there is no need to train a separate feature extractor for the shooting information. Thus, the overhead of network training can be reduced. In addition, the fusion of shooting features can be realized by deploying only the learnable variable and the image feature extractor on the mobile phone, and a separate feature extractor does not need to be trained and deployed for the shooting features. Thus, by reducing the number of feature extractors on the mobile phone, the occupancy of mobile phone resources (e.g., memory resources, or processor resources at run time) may be reduced.
In some embodiments, after step S1603, the above-described shooting information fusion method may further include step S1604.
S1604, inputting the fusion information into a pre-trained color cast classifier to obtain a color cast judgment result.
Therefore, after the fusion information is acquired, the mobile phone can generate a color cast judgment result through the pre-trained color cast classifier, and can judge the quality of the image corresponding to the image information (such as whether color cast occurs or not).
Specifically, steps S1600-S1604 are similar to steps S700-S704 described above, and reference is made to the above description, and thus will not be repeated here.
In some embodiments, the mobile phone can also perform post-processing on the image shot by the mobile phone through the pre-trained learnable variables and the scene classifier, so as to optimize the quality of the image shot by the mobile phone. In the following, the image shot by the mobile phone is optimized by the trained learnable variable and the corresponding color cast classifier obtained in steps S1500-S1503.
For example, referring to fig. 15, an image optimization process of a mobile phone may include steps S1701-S1705.
S1701, acquiring a shooting image 1770 by the mobile phone.
Illustratively, the handset may obtain the captured image 1770 in response to a user's capture operation (e.g., the user clicking on a capture control) on the handset capture preview interface 1710. The portion filled with the horizontal stripes in the photographed image 1770 indicates that color shift occurs, and the color display of the portion is inaccurate.
S1702, the mobile phone obtains fusion information according to shooting information 1720, image information 1730, pre-trained learnable variables 1740 and a pre-trained image feature extractor 1750.
The image information 1730 may be image information corresponding to the photographed image 1770; for example, RGB data corresponding to the captured image 1770. The photographing information 1720 may be photographing information corresponding to the photographed image 1770; for example, AWB data corresponding to the captured image 1770. The AWB data may be AWB data manually input by a user in a shooting preview interface, or may be AWB data automatically generated according to a shooting algorithm preset on a mobile phone, such as light rays, composition, and the like during shooting.
S1703, the mobile phone obtains a color cast judging result according to the fusion information and the pre-trained color cast classifier 1760.
S1704, responding to the color cast judging result to be color cast, and adjusting shooting information by the mobile phone until the color cast judging result is that the color cast does not occur.
It will be appreciated that for electronic devices with camera capabilities (e.g., mobile phones), the range of values and granularity of adjustments of the AWB data will be related to the camera on the mobile phone. Based on the above, the mobile phone can obtain the adjusted AWB data by adjusting granularity and traversing the value in the value range of the AWB data.
For example, the value range of the AWB data on the mobile phone may be (2800K, 10000K), and the adjustment granularity of the AWB data is 100K. Wherein K is a color temperature unit and represents kelvin temperature.
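As a small worked example of the traversal cost, the candidate AWB values for this range and granularity can be enumerated as follows. The sketch assumes both endpoints are included; the helper name `candidate_values` is hypothetical.

```python
def candidate_values(low, high, step):
    """List the values from low to high (inclusive) at the given granularity."""
    count = (high - low) // step + 1
    return [low + i * step for i in range(count)]

# Color temperatures in kelvin: 2800 K to 10000 K in steps of 100 K.
awb_candidates = candidate_values(2800, 10000, 100)
print(len(awb_candidates), awb_candidates[0], awb_candidates[-1])
```

Under these assumptions a full traversal in step S1704 evaluates the color cast classifier at most 73 times per image.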
S1705, the mobile phone optimizes the shooting image based on the adjusted shooting information to obtain an optimized shooting image 1780.
The mobile phone adjusts RGB data of image information of the photographed image based on the adjusted photographing information.
The mobile phone can adjust the RGB values of the RGB data in the image information of the photographed image through the adjusted AWB data, so as to obtain an optimized photographed image. Specifically, this may be designed according to actual use requirements, and the embodiment of the present application does not limit this.
As can be seen from fig. 15, compared with the shot image 1770 before the optimization, the color of the portion of the image where color cast occurs is corrected, and the color cast phenomenon in the shot image 1770 is alleviated.
It can be understood that in the related art, the image can be optimized by a gray world method and a gray pixel method based on the pixel point with the highest gray value and the lowest gray value in the image. Compared with the image optimization method (such as a gray world method and a gray pixel method) in the related art, the image optimization process provided by the embodiment of the application optimizes the image, and can obtain adjusted AWB data based on all pixel points in the image. That is, in the image optimization process provided by the embodiment of the application, the obtained adjusted shooting information is based on all pixel points in the image. Therefore, the image optimization process provided by the embodiment of the application can more comprehensively optimize the image. Thus, the quality of the image can be improved, and the use experience of the user can be improved.
Specifically, steps S1701-S1705 are similar to steps S1001-S1005 described above, and reference is made to the above description, and the detailed description is omitted here.
According to the technical solution provided by the embodiment of the present application, a plurality of pieces of shooting information can be fused with the image information. For example, the AF data, the AE data and the image information are fused; for another example, the AWB data, the AE data and the image information are fused; for another example, the AF data, the AWB data and the image information are fused, and so on.
Next, the technical solution is described in detail by taking the usage scenario shown in fig. 2 as an example, in which the electronic device 100 is a mobile phone, the image information is RGB data, and the fusion information is obtained by fusing the AF data, the AE data, the AWB data and the image information.
For example, referring to fig. 16, a process of fusing shooting information on a mobile phone may include steps S1800-S1805.
S1800, inputting RGB data into a pre-trained image feature extractor by the mobile phone to obtain image features of the RGB data.
S1801, adding a pre-trained first learnable variable into AE data by the mobile phone to obtain a learnable AE matrix. Wherein the format of the learnable AE matrix is the same as the format of the RGB data. The first learnable variable may be the learnable variable obtained in implementation 2 through steps S1200 to S1203.
S1802, adding a pre-trained second learnable variable into AF data by the mobile phone to obtain a learnable AF matrix. Wherein the format of the learnable AF matrix is the same as the format of the RGB data. The second learnable variable may be the learnable variable obtained in implementation 1 through steps S600 to S603.
S1803, adding a pre-trained third learnable variable into AWB data by the mobile phone to obtain a learnable AWB matrix. Wherein the format of the learnable AWB matrix is the same as the format of the RGB data. The third learnable variable may be the learnable variable obtained in implementation 3 through steps S1500 to S1503.
It will be appreciated that in some embodiments, the order of execution of each of steps S1800-S1803 may be arbitrarily changed, and the embodiments of the present application are not limited in this respect.
S1804, the mobile phone obtains AE features, AF features and AWB features according to the learnable AE matrix, the learnable AF matrix and the learnable AWB matrix.
S1805, the mobile phone obtains fusion information according to AE features, AF features, AWB features and image features.
Alternatively, in some embodiments, after step S1805, the above-described shooting information fusion method may further include step S1806.
S1806, the mobile phone inputs the fusion information into a pre-trained image quality classifier to obtain a quality result.
Therefore, the mobile phone can judge the quality of the image shot by the mobile phone through the quality result, so that the shooting process of the mobile phone is optimized.
In other embodiments, the first, second, and third learnable variables may also be combined with the scene classifier for training. In this way, the shooting information added with the learnable variable can express the shooting characteristics of the shooting information in the form of image information, and the scene classifier can judge the image quality corresponding to the image information by combining the shooting characteristics and the image characteristics.
For example, the data samples used in the training process may include: RGB data samples, AE data samples, AF data samples, AWB data samples, and quality result samples. The RGB data samples may be in a matrix form, such as a three-dimensional matrix of H×W×3, where H is the height of the image corresponding to the RGB data samples, and W is the width of the image corresponding to the RGB data samples; H and W are each a positive integer. In some embodiments, H×W may also be referred to as the resolution of the image. The format of the AE data samples may be a matrix format, such as a one-dimensional matrix of 4*4, with the data in the matrix being AE1-AE4. The format of the AF data sample may be a numeric format, which may be denoted as AF. The format of the AWB data samples may be a numeric format, which may be denoted as AWB. The quality result sample may be in a 0-1 format, with a quality result sample of 0 indicating that the image quality of the data sample is relatively high and a quality result sample of 1 indicating that the image quality of the data sample is relatively low. Relatively low image quality can be understood as one or more of the following: the image has more noise, the definition of the image is lower, and the picture is blurred; the brightness of an object in the image is higher, the details of the bright part in the image are lost, and the object is excessively bright or distorted; an object in the image has a color cast, and its color is colder or warmer than it should be.
For example, referring to fig. 17, a training process for a learnable variable on a cell phone may include the following steps S1900-S1907.
S1900, inputting the RGB data samples into a pre-trained image feature extractor 1910 by the mobile phone to obtain image features of the RGB data samples. It will be appreciated that the image features may be used to characterize the sharpness or exposure of the picture in the image to which the RGB data samples correspond.
S1901, adding an initial learnable variable 1 into the AF data sample by the mobile phone to obtain a learnable AF matrix 1920.
S1902, inputting a learnable AF matrix into a pre-trained image feature extractor by the mobile phone to obtain AF features.
For example, the detailed process of step S1901 and step S1902 may be referred to the detailed description of step S601 and step S602, and the embodiments of the present application are not described herein.
S1903, adding an initial learnable variable 2 into the AE data sample by the mobile phone to obtain a learnable AE matrix 1930.
S1904, inputting the learnable AE matrix into the pre-trained image feature extractor by the mobile phone to obtain AE features.
For example, the detailed process of step S1903 and step S1904 may be referred to the detailed description of step S1201 and step S1202, and the embodiments of the present application are not described herein.
S1905, adding an initial learnable variable 3 into the AWB data sample by the mobile phone to obtain a learnable AWB matrix 1940.
S1906, inputting the learnable AWB matrix into the pre-trained image feature extractor by the mobile phone to obtain AWB features.
For example, the detailed procedure of step S1905 and step S1906 can be referred to the related description of step S1501 and step S1502 described above.
S1907, inputting the image features, the AF features, the AE features and the AWB features into an initial image quality classifier by the mobile phone to obtain a quality training result.
After step S1907, a loss function of the present training is calculated based on the differences between the image features and AE features, the differences between the image features and AF features, the differences between the image features and AWB features, and the differences between the quality training results and quality result samples. And iterating the initial learner variable 1, the initial learner variable 2, the initial learner variable 3, and the initial image quality classifier using a training set based on the loss function until the loss function converges or the iteration number exceeds a preset number (e.g., 1000 times, 10000 times, 1000000 times, etc.), and ending the iteration.
In some embodiments, after the initial learnable variable 1, the initial learnable variable 2, the initial learnable variable 3, and the initial image quality classifier have been trained to convergence using the training set, the iterated learnable variable 1, learnable variable 2, learnable variable 3, and image quality classifier can be verified using the verification set to obtain an image quality verification result. The success rate of image quality judgment is then calculated based on the image quality verification result and the quality result samples in the verification set. If the success rate meets a preset threshold (e.g., 90%, 95%, or 97%), the training of the learnable variables and the image quality classifier is successful. If the success rate does not meet the preset threshold, the learnable variables and the image quality classifier are trained again using the training set.
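The verification check can be illustrated with a short sketch. The names `success_rate` and `training_succeeded` are hypothetical, and two assumptions are made that the source does not state: the success rate is the fraction of verification samples whose predicted quality label equals the quality result sample, and "meets the preset" means greater than or equal to the threshold.

```python
from typing import Sequence


def success_rate(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of verification samples judged correctly (assumed definition)."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("the verification set must not be empty")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def training_succeeded(
    predictions: Sequence[int],
    labels: Sequence[int],
    threshold: float = 0.95,
) -> bool:
    """True if the success rate reaches the preset threshold (e.g., 0.90, 0.95, 0.97)."""
    return success_rate(predictions, labels) >= threshold
```

If `training_succeeded` returns False, the flow described above goes back to training with the training set.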
As a possible implementation manner, in the training process, the difference between the image features and the AE features corresponds to the 1st weight, the difference between the image features and the AF features corresponds to the 2nd weight, the difference between the image features and the AWB features corresponds to the 3rd weight, and the difference between the quality training result and the quality result sample corresponds to the 4th weight. The mobile phone can calculate the loss function of the current training iteration based on these differences and their corresponding weights. The 4th weight is greater than the 1st, 2nd, and 3rd weights.
Specifically, the loss function is given by Expression 2 below.
$$\mathrm{Loss} = a \cdot L_1^{1} + b \cdot L_1^{2} + c \cdot L_1^{3} + d \cdot L_{bce} \qquad \text{(Expression 2)}$$
Here, $L_1 = \mathrm{sum}(F_{rgb} - F_{other})$, where the operator "sum" denotes a summation operation, $F_{rgb}$ denotes the RGB feature (the image feature), and $F_{other}$ denotes a shooting feature (e.g., the AF feature, the AE feature, or the AWB feature). For example, $L_1^{1} = \mathrm{sum}(F_{rgb} - F_{ae})$, $L_1^{2} = \mathrm{sum}(F_{rgb} - F_{af})$, and $L_1^{3} = \mathrm{sum}(F_{rgb} - F_{awb})$. $L_{bce} = \mathrm{BCE}(\text{quality training result}, \text{quality result sample})$, where the operator "BCE" denotes the binary cross entropy (BCE) loss, used to calculate the difference between the quality training result and the quality result sample. a represents the first weight, b represents the second weight, c represents the third weight, and d represents the fourth weight, with d > a, d > b, and d > c.
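As a worked illustration of Expression 2, the following sketch evaluates the weighted loss on plain Python lists. It is written under stated assumptions rather than as the implementation of the present application: features are flattened into lists of floats, the L1 terms use the signed sum exactly as written in the expression (the source does not say whether an absolute value is intended), the BCE term is averaged over samples, and the function names and default weight values are hypothetical.

```python
import math
from typing import Sequence


def l1_term(f_rgb: Sequence[float], f_other: Sequence[float]) -> float:
    """sum(F_rgb - F_other), the signed element-wise sum as written in Expression 2."""
    return sum(r - o for r, o in zip(f_rgb, f_other))


def bce(pred: Sequence[float], target: Sequence[float], eps: float = 1e-7) -> float:
    """Binary cross entropy averaged over samples (mean reduction is an assumption)."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)


def total_loss(
    f_rgb: Sequence[float],
    f_ae: Sequence[float],
    f_af: Sequence[float],
    f_awb: Sequence[float],
    pred: Sequence[float],
    target: Sequence[float],
    a: float = 0.1,
    b: float = 0.1,
    c: float = 0.1,
    d: float = 1.0,
) -> float:
    """Loss = a*L1(AE) + b*L1(AF) + c*L1(AWB) + d*BCE, with d > a, b, c."""
    if not (d > a and d > b and d > c):
        raise ValueError("the 4th weight d must exceed a, b, and c")
    return (
        a * l1_term(f_rgb, f_ae)
        + b * l1_term(f_rgb, f_af)
        + c * l1_term(f_rgb, f_awb)
        + d * bce(pred, target)
    )
```

In practice the features would be tensors produced by the image feature extractor and the loss would be computed in an automatic-differentiation framework so that the learnable variables and the classifier can be updated by gradient descent.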
It will be appreciated that the 1st, 2nd, and 3rd weights apply to the degree of similarity between the image features and the shooting features, and the 4th weight applies to the degree of similarity between the quality training result and the quality result sample. That is, in the iterative process, the 1st, 2nd, and 3rd weights control how strongly the learnable variables learn the RGB data format, and the 4th weight controls the image quality classifier's ability to judge image quality. Making the 4th weight larger than the 1st, 2nd, and 3rd weights allows the learnable variables to imitate the format of RGB data during iteration without losing the shooting characteristics reflected by the shooting information. Meanwhile, the image quality classifier can adapt to the shooting data to which the learnable variables have been added (namely the AF matrix, the AWB matrix, and the AE matrix) and to the shooting characteristics that this data expresses, so that it can combine the shooting characteristics with the image features to judge the quality of the image corresponding to the RGB data.
The numerical relationship among a, b, and c can be determined according to how much importance is attached to overexposure, virtual focus, and color cast in picture quality. If the user cares more about the color accuracy of the picture, the value of c can be increased during training so that c > b and c > a. If the user cares more about the sharpness of the picture, the value of b can be increased during training so that b > a and b > c. If the user cares more about the exposure of the picture, the value of a can be increased during training so that a > b and a > c.
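One way to encode this choice is a small helper that raises the weight matching the user's priority while keeping d the largest. The helper name `choose_weights`, the priority labels, and the numeric values are hypothetical and only illustrate the ordering constraints stated above; the source gives no concrete weight values.

```python
from typing import Dict

# Mapping from what the user cares about to the weight in Expression 2:
# a weights the AE term, b the AF term, c the AWB term.
_PRIORITY_TO_WEIGHT = {"exposure": "a", "sharpness": "b", "color": "c"}


def choose_weights(
    priority: str,
    base: float = 0.1,
    boost: float = 0.3,
    d: float = 1.0,
) -> Dict[str, float]:
    """Return weights a, b, c, d with the prioritized weight raised to `boost`."""
    if priority not in _PRIORITY_TO_WEIGHT:
        raise ValueError(f"unknown priority: {priority!r}")
    if not (base < boost < d):
        raise ValueError("expected base < boost < d so that d stays the largest")
    weights = {"a": base, "b": base, "c": base, "d": d}
    weights[_PRIORITY_TO_WEIGHT[priority]] = boost
    return weights
```

For example, `choose_weights("color")` raises c above a and b, matching the case where color accuracy matters most.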
Based on this, by jointly training the variable 1, the variable 2, the variable 3, and the image quality classifier, the shooting information to which the variables are added can express its shooting characteristics in the form of image information, and the image quality classifier can combine the shooting characteristics with the image features to judge the quality of the image corresponding to the image information (e.g., whether overexposure occurs, whether virtual focus occurs, whether color cast occurs, etc.).
It should be noted that, the technical solution provided in the embodiment of the present application may also be other combinations of the foregoing implementation 1, implementation 2, or implementation 4, which are not listed here.
It will be appreciated that in order to achieve the above-described functionality, the electronic device comprises corresponding hardware and/or software modules that perform the respective functionality. The present application can be implemented in hardware or a combination of hardware and computer software, in conjunction with the example algorithm steps described in connection with the embodiments disclosed herein. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Those skilled in the art may implement the described functionality using different approaches for each particular application in conjunction with the embodiments, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The present embodiment may divide the electronic device into functional modules according to the above method examples; for example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated modules described above may be implemented in hardware. It should be noted that the division of modules in this embodiment is schematic and is merely a division by logical function; other division manners are possible in actual implementation.
Embodiments of the present application also provide an electronic device, as shown in fig. 18, which may include one or more processors 2001, memory 2002, and communication interface 2003.
The memory 2002 and the communication interface 2003 are coupled to the processor 2001. For example, the memory 2002, the communication interface 2003, and the processor 2001 may be coupled together by a bus 2004.
The communication interface 2003 is used for data transmission with other devices. The memory 2002 stores computer program code. The computer program code comprises computer instructions which, when executed by the processor 2001, cause the electronic device to perform the methods in the embodiments of the present application.
The processor 2001 may be a processor or a controller, such as a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor may also be a combination that implements computing functions, e.g., a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
The bus 2004 may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus 2004 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 18, but this does not mean that there is only one bus or only one type of bus.
The embodiment of the application also provides a computer readable storage medium, in which a computer program code is stored, which when executed by the above-mentioned processor, causes the electronic device to perform the relevant method steps in the above-mentioned method embodiments.
The present application also provides a computer program product which, when run on a computer, causes the computer to perform the relevant method steps of the method embodiments described above.
The electronic device, the computer readable storage medium or the computer program product provided by the present application are used to execute the corresponding method provided above, and therefore, the advantages achieved by the present application may refer to the advantages in the corresponding method provided above, and will not be described herein.
It will be apparent to those skilled in the art from this description that, for convenience and brevity of description, only the above-described division of the functional modules is illustrated, and in practical application, the above-described functional allocation may be performed by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to perform all or part of the functions described above.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or any other medium capable of storing program code.
The foregoing is merely illustrative of specific embodiments of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (13)
1. An information fusion method, characterized in that the method comprises:
acquiring a target image; the target image comprises image information and shooting information, the shooting information comprises camera parameters used for representing when the camera acquires the target image, and the data processing formats of the image information and the shooting information are not matched;
inputting the image information of the target image into an image feature extractor to obtain image features;
inputting the shooting information added with the learnable variable into the image feature extractor to obtain shooting features; the data processing format of the shooting information added with the learnable variable is matched with the data processing format of the image information;
obtaining fusion information according to the image characteristics and the shooting characteristics; the fusion information is used for determining the quality of the target image under the influence of the shooting information.
2. The method of claim 1, wherein the image information comprises RGB data, the RGB data having a data processing format of a three-dimensional matrix format, and the learnable variable having a data processing format of a two-dimensional matrix format; the two-dimensional matrix format is matched with the three-dimensional matrix format; the method further comprises the steps of:
expanding the data processing format of the shooting information into a one-dimensional matrix format; the one-dimensional matrix format is matched with the three-dimensional matrix format;
combining the shooting information which is expanded into a one-dimensional matrix format by a data processing format with the learnable variable to obtain the shooting information added with the learnable variable; and the data processing format of the shooting information added with the learnable variable is the three-dimensional matrix format.
3. The method of claim 1, wherein the obtaining fusion information from the image feature and the capture feature comprises:
and adding the image features and the shooting features to obtain the fusion information.
4. A method according to any one of claims 1-3, wherein the method further comprises:
inputting the fusion information into a quality classifier to obtain a quality result of the target image, wherein the quality result is used for representing the quality of the target image under the influence of the shooting information; the quality classifier and the learnable variable are obtained based on the same training process;
adjusting the shooting information based on the quality result;
controlling the camera to shoot by utilizing the adjusted shooting information,
or adjusting the image information of the target image by using the adjusted shooting information to obtain an adjusted target image.
5. The method of claim 4, wherein the quality result comprises a first type of quality result or a second type of quality result, and the quality of an image whose quality result is the first type of quality result is higher than the quality of an image whose quality result is the second type of quality result; the adjusting the shooting information based on the quality result comprises:
executing a cyclic adjustment process on the shooting information under the condition that the quality result of the target image is the second type quality result until shooting information meeting a preset quality condition is obtained; the quality result corresponding to the shooting information meeting the quality condition is the first type quality result;
wherein the loop adjustment process comprises:
adjusting the shooting information based on a preset adjustment step length to obtain shooting information adjusted by the adjustment step length;
obtaining, through the learnable variable, the image features, and the quality classifier, a quality result corresponding to the shooting information adjusted by the adjustment step length;
and if the quality result corresponding to the shooting information adjusted by the adjustment step length is the second type of quality result, adjusting the shooting information again based on the adjustment step length until the quality result corresponding to the adjusted shooting information meets the quality condition.
6. The method according to claim 1, wherein the information type of the photographing information includes one or more of auto white balance AWB data, auto focus AF data, and auto exposure AE data;
the learnable variables include a first type of learnable variable corresponding to the AWB data, a second type of learnable variable corresponding to the AF data, and a third type of learnable variable corresponding to the AE data.
7. The method according to claim 6, wherein the information type of the photographing information includes the AWB data, the AF data, and the AE data, and the photographing characteristics include a photographing characteristic for the AWB data, a photographing characteristic for the AF data, and a photographing characteristic for the AE data; inputting the shooting information added with the learnable variable into the image feature extractor to obtain shooting features, wherein the method comprises the following steps:
adding the first type of learnable variable into the AWB data, adding the second type of learnable variable into the AF data, and adding the third type of learnable variable into the AE data;
inputting the AWB data added with the first type of the learnable variables into the image feature extractor to obtain shooting features aiming at the AWB data;
inputting the AF data added with the second class of the learnable variables into the image feature extractor to obtain the shooting features aiming at the AF data;
inputting the AE data added with the third class of the learnable variables into the image feature extractor to obtain the shooting features aiming at the AE data.
8. A method of training a learnable variable, the method comprising:
obtaining training data samples, the training data samples comprising: an image information sample of a training image, a shooting information sample of the training image and a quality result sample of the training image; the shooting information sample comprises camera parameters used for representing when a camera collects the training image, the quality result sample is used for representing the quality of the training image under the influence of the shooting information sample, and the image information sample is not matched with the data processing format of the shooting information sample;
inputting the image information sample into an image feature extractor to obtain image features;
inputting the shooting information sample added with the initial learnable variable into the image feature extractor to obtain shooting features; the data processing format of the shooting information sample added with the initial learnable variable is matched with the data processing format of the image information sample;
inputting the image features and the shooting features into an initial quality classifier to obtain a quality training result;
calculating a loss function based on a difference between the quality training result and the quality result sample, and a difference between the image feature and the photographing feature;
and iterating the initial learnable variable and the initial quality classifier based on the loss function until the loss function converges to obtain the learnable variable and the quality classifier.
9. The method of claim 8, wherein the loss function comprises a first weight and a second weight; the first weight corresponds to a difference between the image feature and the shooting feature, the second weight corresponds to a difference between the quality training result and the quality result sample, and the first weight is smaller than the second weight.
10. The method of claim 9, wherein the learnable variable corresponds to the information type of the shooting information sample, the information type of the shooting information sample comprising one or more of an auto white balance (AWB) data sample, an auto focus (AF) data sample, and an auto exposure (AE) data sample;
the initial learnable variables include a first type of initial learnable variable corresponding to the AWB data samples, a second type of initial learnable variable corresponding to the AF data samples, and a third type of initial learnable variable corresponding to the AE data samples.
11. The method of claim 10, wherein the information type of the shot information samples comprises an AWB data sample, an AF data sample, and an AE data sample, the shot features comprising shot features for the AWB data sample, shot features for the AF data sample, and shot features for the AE data sample; inputting the shooting information sample added with the initial learnable variable into the image feature extractor to obtain shooting features, wherein the method comprises the following steps of:
inputting the AWB data sample added with the first type of initial learnable variable into the image feature extractor to obtain the shooting feature aiming at the AWB data sample;
inputting the AF data sample added with the second type of initial learnable variable into the image feature extractor to obtain the shooting feature aiming at the AF data sample;
inputting the AE data sample added with the third type of initial learnable variable into the image feature extractor to obtain the shooting feature aiming at the AE data sample;
the calculating a loss function includes:
calculating a loss function based on the difference between the shot feature for the AWB data sample and the image feature, the difference between the shot feature for the AF data sample and the image feature, the difference between the shot feature for the AE data sample and the image feature, and the differences between the image features and the shot features;
the first weight includes: a first sub-weight, a second sub-weight, and a third sub-weight; the difference between the shot feature for the AWB data sample and the image feature corresponds to the first sub-weight, the difference between the shot feature for the AF data sample and the image feature corresponds to the second sub-weight, the difference between the shot feature for the AE data sample and the image feature corresponds to the third sub-weight, the first sub-weight is greater than the second sub-weight, and the first sub-weight is greater than the third sub-weight.
12. An electronic device comprising a memory, one or more processors, the memory coupled with the processors; wherein the memory has stored therein computer program code comprising computer instructions; the computer instructions, when executed by the processor, cause the electronic device to perform the method of any of claims 1-11.
13. A computer readable storage medium comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the method of any of claims 1-11.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311247658.8A CN117014561B (en) | 2023-09-26 | 2023-09-26 | Information fusion method, training method of variable learning and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117014561A CN117014561A (en) | 2023-11-07 |
CN117014561B CN117014561B (en) | 2023-12-15 |
Family
ID=88574649
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311247658.8A Active CN117014561B (en) | 2023-09-26 | 2023-09-26 | Information fusion method, training method of variable learning and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117014561B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||