WO2019228450A1 - Image processing method, apparatus and device, and readable medium - Google Patents
Image processing method, apparatus and device, and readable medium
- Publication number
- WO2019228450A1 (PCT/CN2019/089249; application CN2019089249W)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- target
- image data
- image
- position information
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/90—Determination of colour characteristics
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10024—Color image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Definitions
- the present disclosure relates to the field of image processing technologies, and in particular, to an image processing method, apparatus, and device, and a readable medium.
- target detection technology detects and locates specific targets in a single frame of a picture or video.
- target detection technology has been widely used in various fields in society, such as: text detection of goods handling in logistics, detection of illegal vehicles in road traffic, detection of passenger flow and statistics of passenger flow in shopping malls and stations, and so on.
- the target detection algorithm mainly uses low-bit-width images that have already been processed by the ISP. After a target of interest is detected, the corresponding target image is extracted from the image for display or subsequent recognition. In systems using this detection technology, the quality of the target images finally obtained varies widely: some images are of good quality, but in many cases the quality is poor, with problems such as blur, insufficient brightness, and insufficient contrast.
- after ISP processing, the original information of the image is lost to a certain extent. The technical solution of CN104463103A performs its subsequent text processing on image data that has already been processed by the ISP algorithm, at which point the information may be severely lost and cannot be repaired afterwards; moreover, it processes only text, and text is generally only a very small part of what people pay attention to.
- that patent does not perform subsequent processing to improve the quality of key targets; on the whole, the current solutions are rather limited and cannot comprehensively improve the image quality of detected targets.
- the present disclosure provides an image processing method, apparatus, and device, and a readable medium, which can improve the image quality of a detection target.
- a first aspect of the present disclosure provides an image processing method, including:
- the acquiring position information of a specified target in the first image data from the acquired first image data in the first data format includes:
- the position information of the designated target is detected in the second image data, and the detected position information is determined as the position information of the designated target in the first image data.
- the detecting the position information of the designated target in the second image data, and determining the detected position information as the position information of the designated target in the first image data includes:
- the first neural network realizes the positioning and output of the position information of the specified target through at least a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully-connected layer for performing feature synthesis, and a frame regression layer for performing coordinate transformation;
- the result output by the first neural network is determined as position information of the specified target in the first image data.
- the converting the first image data into second image data capable of performing target detection includes: using at least one of the following image processing methods to convert the first image data into second image data capable of performing target detection: black level correction, white balance correction, color interpolation, contrast enhancement, and bit-width compression.
- the acquiring position information of a specified target in the first image data from the acquired first image data in the first data format includes:
- the second neural network converts the first image data into second image data capable of target detection and detects the position information of the specified target in the second image data through at least a grayscale layer for performing grayscale processing, a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a frame regression layer for performing coordinate transformation;
- the result output by the second neural network is determined as position information of the specified target in the first image data.
- converting the data format of the target data from the first data format to the second data format includes: inputting the target data to a trained third neural network; the third neural network implements the conversion of the data format of the target data from the first data format to the second data format through at least a convolution layer for performing convolution.
- the converting a data format of the target data from the first data format to a second data format includes:
- ISP processing is performed on the target data; wherein the ISP processing is used to convert a data format of the target data from the first data format to a second data format, and the ISP processing includes at least color interpolation.
- the ISP process further includes at least one of the following processes: white balance correction, curve mapping.
- a second aspect of the present disclosure provides an image processing apparatus including:
- a first processing module configured to obtain position information of a specified target in the first image data from the collected first image data in a first data format
- a second processing module configured to intercept target data corresponding to the position information from the first image data
- a third processing module is configured to convert a data format of the target data from the first data format to a second data format, where the second data format is suitable for displaying and / or transmitting the target data.
- the first processing module includes a first processing unit and a second processing unit; the first processing unit is configured to convert the first image data into second image data on which target detection can be performed; the second processing unit is configured to detect position information of the specified target in the second image data, and determine the detected position information as the position information of the specified target in the first image data.
- the second processing unit is specifically configured to: input the second image data to a trained first neural network, and determine the result output by the first neural network as the position information of the specified target in the first image data; the first neural network realizes the positioning and output of the position information of the specified target through at least a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a frame regression layer for performing coordinate transformation.
- the first processing unit is specifically configured to: use at least one image processing mode of black level correction, white balance correction, color interpolation, contrast enhancement, and bit width compression to implement the The first image data is converted into second image data capable of performing target detection.
- the first processing module includes a third processing unit; the third processing unit is configured to input the first image data to a trained second neural network; through at least a grayscale layer for performing grayscale processing, a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a frame regression layer for performing coordinate transformation, the second neural network converts the first image data into second image data capable of target detection and detects position information of the specified target in the second image data; the result output by the second neural network is determined as the position information of the specified target in the first image data.
- the third processing module includes a fourth processing unit; the fourth processing unit is configured to input the target data to a trained third neural network; the third neural network implements the conversion of the data format of the target data from the first data format to the second data format through at least a convolution layer for performing convolution.
- the third processing module includes a fifth processing unit; the fifth processing unit is configured to perform ISP processing on the target data; the ISP processing is used to convert the data format of the target data from the first data format to the second data format, and includes at least color interpolation.
- a third aspect of the present disclosure provides an electronic device including a processor and a memory; the memory stores a program that can be called by the processor; when the processor executes the program, the image processing method according to any one of the foregoing embodiments is implemented.
- a fourth aspect of the present disclosure provides a machine-readable storage medium having a program stored thereon that, when executed by a processor, implements the image processing method according to any one of the foregoing embodiments.
- the embodiments of the present disclosure have the following beneficial effects:
- the specified target is detected in the collected first image data in the first data format to obtain its position information, and the target data corresponding to the obtained position information is then intercepted from that first image data. Because the target data is intercepted directly from the first image data, there is no change in image format or quality. The target data then undergoes format conversion into a data format suitable for display and/or transmission. Compared with existing methods that post-process an image which has already undergone image processing after detection, this improves the image quality of the detected target.
- FIG. 1 is a schematic flowchart of an image processing method according to an exemplary embodiment of the present disclosure.
- FIG. 2 is a structural block diagram of an image processing apparatus according to an exemplary embodiment of the present disclosure.
- FIG. 3 is a structural block diagram of an embodiment of a first processing module provided by the present disclosure.
- FIG. 4 is a schematic flowchart of an embodiment of converting first image data to second image data provided by the present disclosure.
- FIG. 5 is a schematic diagram of an embodiment of color interpolation provided by the present disclosure.
- FIG. 6 is a structural block diagram of an embodiment of a first neural network provided by the present disclosure.
- FIG. 7 is a structural block diagram of another embodiment of a first neural network provided by the present disclosure.
- FIG. 8 is a structural block diagram of another embodiment of a first processing module provided by the present disclosure.
- FIG. 9 is a structural block diagram of an embodiment of a second neural network provided by the present disclosure.
- FIG. 10 is a structural block diagram of another embodiment of a second neural network provided by the present disclosure.
- FIG. 11 is a schematic diagram of an embodiment for performing grayscale processing provided by the present disclosure.
- FIG. 12 is a structural block diagram of an embodiment of an image processing apparatus provided by the present disclosure.
- FIG. 13 is a structural block diagram of an embodiment of a third neural network provided by the present disclosure.
- FIG. 14 is a structural block diagram of another embodiment of an image processing apparatus provided by the present disclosure.
- FIG. 15 is a structural block diagram of an embodiment of an ISP process for converting target data from a first data format to a second data format provided by the present disclosure.
- FIG. 16 is a structural block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
- although the terms first, second, third, etc. may be used in this disclosure to describe various information, such information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other.
- the first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information.
- the word "if" as used herein can be interpreted as "at the time of", "when", or "in response to determining".
- ISP (Image Signal Processor) processing: processes the image signals collected by the image sensor of a front-end imaging device; its functions include dead pixel correction, black level correction, white balance correction, color interpolation, gamma correction, color correction, sharpening, denoising, and so on, one or more of which can be selected according to the actual application.
- Deep learning: a method that uses neural networks to simulate the analysis and learning of the human brain and establishes corresponding data representations.
- Neural network: mainly composed of neurons; it can include convolutional layers (Convolutional Layer) and pooling layers (Pooling Layer).
- an image processing method according to an embodiment of the present disclosure is shown.
- the method may include the following steps:
- S1: Acquire position information of a specified target in the first image data from the collected first image data in a first data format;
- S2: Intercept target data corresponding to the position information from the first image data;
- S3: Convert the data format of the target data from the first data format to a second data format, where the second data format is suitable for displaying and/or transmitting the target data.
- the image processing method of FIG. 1 may be applied to an image device.
- the image device may be a device with an imaging function, such as a camera, or a device capable of performing image post-processing, and the like is not limited.
- the first image data in the first data format may be image data acquired by the image device itself, or image data acquired from other devices, which is not limited in particular.
- the image data collected by the image device is first image data
- the image data format of the first image data is the first data format.
- the first data format is the original image format collected by the image device.
- the original image format is an image format, without image preprocessing, generated by an image sensor in the image device after sensing one or more spectral bands; an image in the original image format may include data in one or more spectral bands, for example, a spectral sampling signal with a wavelength range of 380 nm to 780 nm and/or a spectral sampling signal with a wavelength range of 780 nm to 2500 nm.
- step S1 position information of a specified target in the first image data is acquired from the collected first image data in a first data format.
- the first image data includes a specified target
- the specified target is an object that is expected to undergo ISP processing to improve the image quality of the specified target.
- a specified target may be detected and located on the first image data.
- the position information of the specified target in the first image data may include: the coordinates of a feature point of the specified target in the first image data together with the size of the image area of the specified target; or the coordinates of the start and end points of the image area of the specified target.
- the coordinates and the like are not specifically limited, as long as they can locate the position of the designated target in the first image data.
- step S2 is executed to intercept target data corresponding to the position information from the first image data.
- the first image data in step S2 is the collected first image data in the first data format, that is, the original image as acquired by the device, not first image data that has been processed in order to obtain the position information of the target object, so there is no problem of image information being lost. That is, the first image data used in steps S1 and S2 come from the same data source: they may be the same frame of first image data, or different frames of first image data collected in the same scene, as long as the specified target does not undergo motion or other changes between the two frames of image data.
- the same first image data is selected, and the first image data can be stored in an image device and can be accessed when needed.
- the image area corresponding to the position information in the first image data is the designated target.
- the area indicated by the position information in the first image data can be cropped to obtain the target data corresponding to the specified target. Since the target data is intercepted from the first image data, its data format is still the first data format, the same as the data format of the first image data.
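As an illustrative sketch of step S2 (not taken from the patent), the interception can be pictured as a plain array crop; the `(x, y, width, height)` box form and the `crop_target` helper are assumptions for illustration:

```python
import numpy as np

def crop_target(raw: np.ndarray, pos: tuple) -> np.ndarray:
    """Intercept target data from the raw (first-data-format) image.

    `pos` is assumed to be (x, y, width, height) in pixels; the patent
    also allows other representations such as start/end points.
    """
    x, y, w, h = pos
    # Plain slicing: the crop keeps the raw data format and values intact.
    return raw[y:y + h, x:x + w].copy()

raw = np.arange(64, dtype=np.uint16).reshape(8, 8)  # stand-in raw frame
target = crop_target(raw, (2, 1, 4, 3))             # 4x3 region at (2, 1)
```

Because the crop is taken from the unprocessed first image data, the target keeps the original bit depth and pixel values, which is the point of the scheme.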
- Step S3 is then executed to convert the data format of the target data from the first data format to a second data format, and the second data format is suitable for displaying and / or transmitting the target data.
- step S3 image processing is performed on the target data in the first data format, and the data format is converted into the second data format.
- the second data format is a data format suitable for displaying and / or transmitting the target data.
- Both the first data format and the second data format are image formats.
- the image processing process may not only perform data format conversion, but may also include other image processing to improve the image quality of the target data.
- the specified target is detected in the collected first image data in the first data format to obtain its position information, and the target data corresponding to the obtained position information is then intercepted from that first image data in the first data format. Because the target data is intercepted from the first image data, there is no change in image format or quality; the target data then undergoes format conversion into a data format suitable for display and/or transmission. Compared with the manner in which an image that has already undergone image processing is post-processed after detection, the image quality of the detected target is improved.
- Step S1 is a step of acquiring position information.
- the position information of the specified target can be obtained by detecting the specified target of interest and positioning after detecting the specified target.
- the types of designated targets are not limited, such as text, characters, vehicles, license plates, buildings, etc. The shape and size are also unlimited.
- preprocessing can be performed to convert the input first image data in the first data format into data commonly used for target detection before performing target detection, or target detection can be performed directly on the first image data in the first data format and the target position information output; the specific implementation is not limited.
- the above method flow may be executed by the image processing apparatus 100.
- the image processing apparatus 100 mainly includes three modules: a first processing module 101, a second processing module 102, and a third processing module 103.
- the first processing module 101 is configured to perform step S1
- the second processing module 102 is configured to perform step S2
- the third processing module 103 is configured to perform step S3.
- the first processing module 101 detects a target or object of interest from the first image data in the first data format and outputs the position information of the detected target; based on the position information of the target of interest output by the first processing module 101 and the input first image data in the first data format, the second processing module 102 obtains, from the original first image data in the first data format, the target data in the first data format corresponding to the target of interest; the third processing module 103 performs adaptive ISP processing on the target data in the first data format output by the second processing module 102 to obtain higher-quality target data in the second data format.
- the first processing module 101 includes a first processing unit 1011 and a second processing unit 1012.
- Step S101 may be performed by the first processing unit 1011
- step S102 may be performed by the second processing unit 1012.
- the above step S1 specifically includes the following steps:
- S101: Convert the first image data into second image data on which target detection can be performed;
- S102: Detect position information of the specified target in the second image data, and determine the detected position information as the position information of the specified target in the first image data.
- the first image data is first converted into second image data that can be used for target detection, so that the second Image data can be used to detect specific targets.
- the specific conversion method is not limited, as long as the first image data can be converted into the second image data capable of detecting a target.
- after the conversion, the data format of the second image data may no longer be the first data format, so if the target were extracted from it for post-processing, image quality could not be guaranteed. Therefore, in this embodiment, the second image data is not used to extract the specified target; it is used only to detect the position information of the specified target.
- step S102 is performed, and position information of a specified target is detected in the second image data.
- Target recognition and positioning of the specified target in the second image data can determine the position information of the specified target in the second image data.
- the positional relationship of the specified target in the first image data and the second image data generally does not change.
- zooming or panning of the specified target may occur between the first image data and the second image data, but such zooming and panning are determinable during processing, so the position information of the specified target in the second image data can be used to obtain its position information in the first image data, and the detected position information is determined as the position information of the specified target in the first image data.
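When a known scaling or shift was applied in producing the second image data, mapping a detection back is a simple inverse transform. A minimal sketch (illustrative only; the `(x, y, w, h)` box form and the scale/offset parameters are assumptions, not specified by the patent):

```python
def map_back(pos, scale_x, scale_y, off_x, off_y):
    """Map a box detected in the second image data back to the first.

    `pos` is (x, y, w, h) in second-image coordinates; the scale and
    offset are the known zoom and pan applied during conversion.
    """
    x, y, w, h = pos
    return (int((x - off_x) / scale_x), int((y - off_y) / scale_y),
            int(w / scale_x), int(h / scale_y))

# Second image data downscaled by 2x in both directions, no pan:
pos_first = map_back((50, 40, 20, 10), 0.5, 0.5, 0, 0)
```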
- a manner of converting the first image data into second image data capable of performing target detection may include performing color interpolation processing on at least the first image data.
- at least one of the following processes may also be performed: black level correction, white balance correction, contrast enhancement, and bit width compression, of course, it is not specifically limited to this.
- the first processing unit 1011 may implement step S101 by performing steps S1011 to S1015.
- S1011 to step S1015 are specifically:
- the method of converting the first image data to the second image data is not limited to the above steps S1011 to S1015, and the processing order is not limited.
- for example, the first image data can be converted to the second image data by the color interpolation process alone, as long as the obtained second image data allows target detection to be performed.
- step S1011 it is assumed that the first image data in the first data format is recorded as imgR, and the black level correction is to remove the influence of the black level in the first image data in the first data format, and output imgR blc :
- imgR_blc = imgR - V_blc
- V_blc is the black level value; the "-" here denotes "removal" of the black level rather than a plain mathematical subtraction.
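A minimal sketch of the black level correction above; the black level value V_blc = 64 is an assumed example, and clipping at zero is one common reading of "removal" rather than plain subtraction:

```python
import numpy as np

def black_level_correct(img_r: np.ndarray, v_blc: int = 64) -> np.ndarray:
    # imgR_blc = imgR - V_blc, clipped at zero so that "removing" the
    # black level never produces negative pixel values.
    out = img_r.astype(np.int32) - v_blc
    return np.clip(out, 0, None).astype(img_r.dtype)

raw = np.array([[60, 64, 200], [500, 64, 1023]], dtype=np.uint16)
img_r_blc = black_level_correct(raw)
```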
- step S1012 the white balance correction is to remove the image color cast due to the influence of ambient light in the image imaging to restore the original color information of the image.
- the corresponding R1 and B1 components can be adjusted by two coefficients, R_gain and B_gain:
- R1' = R_gain × R1; B1' = B_gain × B1
- R1 and B1 are the color components of the red and blue channels of the image data after the black level correction processing
- R1' and B1' are the color components of the red and blue channels of the image output by the white balance correction module.
- the output image is recorded as imgR wb .
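A sketch of the white balance adjustment on an RGGB Bayer mosaic; the gain values are illustrative, and the RGGB site layout is an assumption (the patent does not fix the mosaic pattern):

```python
import numpy as np

def white_balance_rggb(img: np.ndarray, r_gain: float, b_gain: float) -> np.ndarray:
    # R1' = R_gain * R1 and B1' = B_gain * B1; G components pass through.
    out = img.astype(np.float64)
    out[0::2, 0::2] *= r_gain  # R sites (even rows, even cols in RGGB)
    out[1::2, 1::2] *= b_gain  # B sites (odd rows, odd cols)
    return out

bayer = np.full((4, 4), 100.0)
img_r_wb = white_balance_rggb(bayer, r_gain=1.5, b_gain=1.2)
```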
- the data targeted for color interpolation is data after white balance correction processing.
- the color interpolation can be implemented by the nearest-neighbor interpolation method, which expands the single-channel first image data in the first data format into multi-channel data.
- for first image data in a Bayer-format first data format, the missing color components are filled directly with the nearest pixels of the corresponding color, so that each pixel contains all three RGB color components.
- the specific interpolation process is shown in FIG. 5: R11 is filled into its three neighboring pixels as their red component; which specific neighboring pixels are filled can be configured. The same applies to the other color pixels, which will not be repeated here.
- the interpolated image is recorded as imgC.
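A toy nearest-neighbor color interpolation for an RGGB Bayer mosaic, sketching the filling described above. Sharing each 2x2 block's R, G and B samples across the block is one simple choice of "which neighboring pixels are filled"; the patent leaves that choice open:

```python
import numpy as np

def demosaic_nearest_rggb(bayer: np.ndarray) -> np.ndarray:
    # Expand the single-channel mosaic into 3-channel RGB data: every
    # pixel in a 2x2 block takes that block's R, G and B samples.
    h, w = bayer.shape
    rgb = np.empty((h, w, 3), dtype=bayer.dtype)
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            r = bayer[i, j]          # R site
            g = bayer[i, j + 1]      # one of the two G sites
            b = bayer[i + 1, j + 1]  # B site
            rgb[i:i + 2, j:j + 2] = (r, g, b)
    return rgb

bayer = np.array([[10, 20], [30, 40]], dtype=np.uint16)  # one RGGB block
img_c = demosaic_nearest_rggb(bayer)
```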
- the data for contrast enhancement is data after color interpolation.
- Contrast enhancement is to enhance the contrast of the image after interpolation.
- a gamma curve can be used for the mapping. Assume the mapping function of the gamma curve is f(); the mapped image is recorded as imgC_gm:
- imgC_gm(i, j) = f(imgC(i, j))
- bit-width compression compresses the high-bit-width data imgC_gm obtained after contrast enhancement. Linear compression is used directly, and the compressed image is recorded as imgC_1b:
- imgC_1b(i, j) = imgC_gm(i, j) / M
- M is a compression ratio corresponding to the compression of the first data format to the second data format.
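The gamma mapping and linear bit-width compression can be sketched together; the gamma exponent 0.45, the 12-bit input range, and the compression ratio M = 16 (12-bit to 8-bit) are assumed example values:

```python
import numpy as np

def gamma_and_compress(img_c: np.ndarray, gamma: float = 0.45,
                       max_in: int = 4095, m: int = 16) -> np.ndarray:
    # imgC_gm(i, j) = f(imgC(i, j)), with f() a gamma curve applied on
    # the normalized pixel value, then imgC_1b(i, j) = imgC_gm(i, j) / M.
    img_gm = max_in * (img_c / max_in) ** gamma
    return (img_gm / m).astype(np.uint8)

img_c = np.array([[0, 4095]], dtype=np.uint16)  # 12-bit extremes
img_c_1b = gamma_and_compress(img_c)
```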
- the second processing unit 1012 may implement step S102 by performing steps S1021 to S1022.
- S1021 input the second image data to a trained first neural network; the first neural network is used to achieve positioning through at least a convolution layer, a pooling layer, a fully connected layer, and a frame regression layer;
- S1022 Determine a result output by the first neural network as position information of the specified target in the first image data.
- step S1021 the first neural network is a trained network, and inputting the second image data into the first neural network can realize the positioning of the specified target in the second image data and obtain the position information of the specified target accordingly.
- the first neural network may be integrated in the second processing unit 1012 as a part of the first processing module 101, or may be provided outside the first processing module 101, and may be scheduled by the second processing unit 1012.
- the first neural network 200 may include at least one convolution layer 201 for performing convolution, at least one pooling layer 202 for performing downsampling, at least one fully connected layer for performing feature synthesis, and a frame regression layer for performing coordinate transformation.
- for example, the first neural network 200 may include a convolution layer 205, a convolution layer 206, a pooling layer 207, ..., a convolution layer 208, a pooling layer 209, a fully connected layer 210, and a frame regression layer 211.
- the second image data is input to the first neural network 200, and the first neural network 200 outputs position information, which is used as position information of the specified target in the first image data.
- the functions performed by each layer of the first neural network have been described above, and each layer may have adaptive changes.
- the convolution kernels of different convolution layers may be different, which will not be described again here.
- the first neural network shown in FIG. 7 is only an example, and is not specifically limited thereto.
- a convolution layer, a pooling layer, and / or other layers may be reduced or increased.
- the convolution layer (Conv) performs a convolution operation and can also carry an activation function ReLU, which activates the convolution result. The operation of a convolution layer can therefore be expressed by the following formula:
- YC_i(I) = g(W_i * YC_{i-1}(I) + B_i)
- YC_i(I) is the output of the i-th convolution layer
- YC_{i-1}(I) is the input of the i-th convolution layer
- W_i is the convolution kernel (weight coefficient) of the i-th convolution layer
- B_i is the bias coefficient of the i-th convolution layer
- g() represents the activation function
- when the activation function is ReLU, g(x) = max(0, x), where x is the convolution result W_i * YC_{i-1}(I) + B_i.
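A minimal single-channel sketch of the convolution-plus-ReLU formula above; the 2x2 kernel and the bias are illustrative values:

```python
import numpy as np

def conv_relu(x: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    # YC_i(I) = g(W_i * YC_{i-1}(I) + B_i), with g(x) = max(0, x),
    # computed as a 'valid' 2-D sliding-window correlation.
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    y = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            y[i, j] = np.sum(x[i:i + kh, j:j + kw] * w) + b
    return np.maximum(0.0, y)  # ReLU activation

x = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
w = np.array([[1., 0.], [0., 1.]])  # illustrative kernel
y = conv_relu(x, w, b=-8.0)         # negative pre-activations clip to 0
```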
- the pooling layer (Pool) is a special type of downsampling layer: the feature map obtained by the convolution is reduced using a window of size N × N, and the maximum value in each window is taken as the value of the corresponding point in the reduced image.
- the specific formula is as follows:
- YP_j(I) = maxpool(YP_{j-1}(I))
- YP_{j-1}(I) is the input of the j-th pooling layer
- YP_j(I) is the output of the j-th pooling layer
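The N × N max pooling above, sketched for N = 2:

```python
import numpy as np

def max_pool(fmap: np.ndarray, n: int = 2) -> np.ndarray:
    # YP_j(I) = maxpool(YP_{j-1}(I)): each N x N window of the feature
    # map is reduced to its maximum value.
    h, w = fmap.shape[0] // n, fmap.shape[1] // n
    out = np.empty((h, w), dtype=fmap.dtype)
    for i in range(h):
        for j in range(w):
            out[i, j] = fmap[i * n:(i + 1) * n, j * n:(j + 1) * n].max()
    return out

fmap = np.array([[1, 2, 5, 0],
                 [3, 4, 1, 1],
                 [0, 9, 2, 2],
                 [8, 7, 3, 6]])
pooled = max_pool(fmap)  # 4x4 feature map -> 2x2
```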
- the fully connected layer can be regarded as a convolution layer with a filter window of 1 ⁇ 1.
- Each node of the fully connected layer is connected to all the nodes in the previous layer, which is used to integrate the features extracted before.
- W_ij and B_ij are the connection weight coefficients and bias coefficients of the fully connected layer, g() represents the activation function, and I is (i, j).
- the frame regression layer finds a mapping such that the window P output by the fully connected layer is mapped to a window G' closer to the real window G; the regression is generally implemented as a coordinate transformation of window P, such as a translation transformation and/or a scaling transformation. Assume the coordinates of the window P output by the fully connected layer are (x1, x2, y1, y2) and the coordinates of the transformed window are (x3, x4, y3, y4);
- when the transformation is a translation transformation with translation amounts (Δx, Δy), the coordinate relationship before and after the translation is:
- x3 = x1 + Δx, x4 = x2 + Δx, y3 = y1 + Δy, y4 = y2 + Δy
- when the transformation is a scaling transformation with scaling factors dx and dy in the X and Y directions respectively, the coordinate relationship before and after the transformation is:
- x3 = dx · x1, x4 = dx · x2, y3 = dy · y1, y4 = dy · y2
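The two coordinate transforms of the frame regression layer can be sketched directly; the window values are illustrative, and scaling is taken about the origin (the patent does not fix the scaling center):

```python
def translate(p, dx_t, dy_t):
    # Translation: x3 = x1 + dx_t, x4 = x2 + dx_t, likewise for y.
    x1, x2, y1, y2 = p
    return (x1 + dx_t, x2 + dx_t, y1 + dy_t, y2 + dy_t)

def scale(p, sx, sy):
    # Scaling: x3 = sx * x1, x4 = sx * x2, y3 = sy * y1, y4 = sy * y2.
    x1, x2, y1, y2 = p
    return (x1 * sx, x2 * sx, y1 * sy, y2 * sy)

p = (10, 30, 20, 40)             # window P as (x1, x2, y1, y2)
p_shifted = translate(p, 5, -5)
p_scaled = scale(p, 2.0, 0.5)
```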
- In step S1022, the position information of the specified target in the first image data is determined according to the result output by the first neural network. The output of the first neural network may be used directly as that position information; alternatively, the output may be converted using the positional change relationship of the specified target between the first image data and the second image data to obtain the position information of the specified target in the first image data.
- The first neural network can be trained by obtaining second image data samples and corresponding position information samples as a training sample set, with the second image data samples as input and the corresponding position information samples as output of the training model.
- The corresponding position information samples can be obtained by processing the second image data samples with an image processing method capable of recognizing the detection target.
- The first processing module 101 includes a third processing unit 1013; steps S111 and S112 may be performed by the third processing unit 1013 to implement step S1 above. Steps S111 and S112 are specifically:
- S111: input the first image data to a trained second neural network; the second neural network converts the first image data into second image data on which target detection can be performed and detects the position information of the specified target in the second image data, at least through a grayscale layer, a convolution layer, a pooling layer, a fully connected layer and a border regression layer;
- S112: determine the result output by the second neural network as the position information of the specified target in the first image data.
- The second neural network may be integrated in the third processing unit 1013 as a part of the first processing module 101, or may be provided outside the first processing module 101 and scheduled by the third processing unit 1013.
- The second neural network 300 includes at least one grayscale layer 301 for performing grayscale processing, one convolution layer 302 for performing convolution, one pooling layer 303 for performing downsampling, one fully connected layer 304 for performing feature synthesis, and one border regression layer 305 for performing coordinate transformation.
- The second neural network can convert the first image data into second image data on which target detection can be performed and detect the position information of a specified target in the second image data, without performing other ISP processing.
- Depending on requirements, certain information processing can be performed on top of the second neural network processing, which is not specifically limited.
- The second neural network 300 may include, connected in sequence, a grayscale layer 306, a convolution layer 307, a convolution layer 308, a pooling layer 309, ..., a convolution layer 310, a pooling layer 311, a fully connected layer 312 and a border regression layer 313.
- The first image data is input to the second neural network; its layers process the first image data in turn and the network outputs position information, which is used as the position information of the specified target in the first image data.
- The function performed by each layer of the second neural network is the same as that of the corresponding layer in the first neural network, which has been described above.
- Each layer can have adaptive variations; for example, the convolution kernels of different convolution layers may differ, which will not be repeated here. It can be understood that the second neural network 300 shown in FIG. 10 is only an example, and is not specifically limited thereto.
- Convolution layers, and/or pooling layers, and/or other layers may be removed or added.
- The grayscale layer in the second neural network converts the multi-channel first-data-format information into single-channel grayscale information, which can be achieved by weighting the components representing different colors around the current pixel.
- Through the processing of the grayscale layer, the components of the different colors R, G and B are weighted and converted into single-channel grayscale information Y.
- For Y22, for example, the calculation formula is as follows: Y22 = (B22 + (G12 + G32 + G21 + G23)/4 + (R11 + R13 + R31 + R33)/4) / 3.
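The grayscale weighting for a pixel like Y22 — averaging the blue sample with the mean of its four green neighbours and the mean of its four diagonal red neighbours — can be sketched for a single interior blue Bayer site; generalizing to red and green sites follows the same pattern.

```python
def gray_at_blue_site(raw, i, j):
    # Y22 = (B22 + (G12+G32+G21+G23)/4 + (R11+R13+R31+R33)/4) / 3:
    # average the blue sample, the mean of the 4 green neighbours and the
    # mean of the 4 diagonal red neighbours (valid at interior blue sites).
    b = raw[i][j]
    g = (raw[i - 1][j] + raw[i + 1][j] + raw[i][j - 1] + raw[i][j + 1]) / 4
    r = (raw[i - 1][j - 1] + raw[i - 1][j + 1]
         + raw[i + 1][j - 1] + raw[i + 1][j + 1]) / 4
    return (b + g + r) / 3
```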
- The functions performed by the convolution layer, pooling layer, fully connected layer and border regression layer in the second neural network can be the same as those of the corresponding layers in the first neural network. Each layer can have adaptive variations; for example, the convolution kernels of different convolution layers may differ, which will not be repeated here.
- The second neural network can be trained by obtaining first image data samples and corresponding position information samples as a training sample set, with the first image data samples as input and the corresponding position information samples as output of the training model.
- Regarding the acquisition of the first image data samples and corresponding position information samples, a first image data sample may first undergo image processing that makes the target detectable, after which the target is detected with an image processing method capable of recognizing it, yielding the corresponding position information sample.
- In step S2, data can be intercepted at the corresponding position in the originally input first image data in the first data format, according to the position information of the specified target obtained in step S1; the intercepted data is used as the target data in the first data format corresponding to the specified target.
- Assume the position information of the specified target obtained in step S1 in the first image data is [x1, x2, y1, y2], where x1, y1 are the starting position and x2, y2 the ending position.
- The target data imgT in the first data format of the specified target is then: imgT = imgR(x1:x2, y1:y2).
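The interception imgT = imgR(x1:x2, y1:y2) is a plain array slice. Whether x indexes rows or columns, and whether the end indices are exclusive, are assumptions made here for illustration.

```python
import numpy as np

def crop_target(img_r, pos):
    # imgT = imgR(x1:x2, y1:y2): cut the target region out of the raw-format
    # first image data. Row-major indexing and exclusive ends are assumed.
    x1, x2, y1, y2 = pos
    return img_r[x1:x2, y1:y2]
```

Because the slice comes straight from the raw first image data, the cropped target keeps the first data format unchanged.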
- step S3 the target data in the first data format corresponding to the specified target obtained in step S2 is processed to convert the target data of the specified target from the first data format to the second data format.
- Step S3 is actually image processing for small target data, which can be implemented by ISP processing implemented by a non-neural network, or by a neural network.
- The third processing module 103 includes a fourth processing unit 1031, which may perform the following step to implement step S3 above:
- input the target data in the first data format to a trained third neural network, which converts the data format of the target data from the first data format to the second data format at least through a convolution layer.
- the third neural network may be integrated in the fourth processing unit 1031 as a part of the third processing module 103, or may be provided outside the third processing module 103, and may be scheduled by the fourth processing unit 1031.
- the third neural network may include at least one convolution layer for performing a convolution to convert a data format of the target data from the first data format to a second data format.
- the layer structure of the third neural network is not limited to this.
- it may include at least one ReLu layer for performing activation, or may include other layers. The number of specific layers is not limited.
- Image processing is implemented based on the third neural network, which reduces the error propagation that may be caused by traditional image processing in each processing step.
- each layer of the third neural network is described in detail below, but it should not be limited to this.
- FC_{i+1} = g(w_ik * FC_i + b_ik)
- where w_ik and b_ik are the parameters of the k-th convolution in the current convolution layer, and g(x) is a linear weighting function, i.e., the convolution output of each convolution layer is linearly weighted.
- the convolutional layer of the third neural network and the convolutional layer of the first neural network both perform convolution operations, and therefore have similar functions.
- the third neural network 400 may include a convolutional layer 401, a convolutional layer 402, a ReLu layer 403, a convolutional layer 404, and a convolutional layer 405 which are sequentially connected.
- the input of the third neural network 400 is the target data in the first data format
- the output is the target data in the second data format.
- the functions performed by each layer of the third neural network are the same as the corresponding layers of the first neural network, which have been described above.
- Each layer may have adaptive changes.
- the convolution kernels of different convolution layers may be different. I will not repeat them here.
- the third neural network shown in FIG. 13 is only an example, and is not specifically limited thereto.
- the convolutional layer, and / or, the pooling layer, and / or other layers may be reduced or increased.
- For the training of the third neural network, in order to optimize the deep neural network in advance, a large number of target data samples in the first data format, paired with the corresponding ideal target data samples in the second data format, can be used to form the training samples.
- During the training process, the network parameters of the third neural network are adjusted continuously until, given target data in the first data format as input, the network outputs target data in the ideal second data format; the network parameters are then exported for actual testing and use by the third neural network.
- the training process for training the third neural network may include the following steps:
- S311 Collect training samples: collect first-data-format information corresponding to the target of interest and the corresponding ideal second-data-format information. Assume n training sample pairs {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} have been obtained, where x_i denotes the input first-data-format information and y_i the corresponding ideal second-data-format information.
- S312 design the structure of the third neural network; the network structure used in network training and the network structure used in testing are the same network structure;
- S313 Initialize training parameters; initialize network parameters of the structure of the third neural network, which can be random value initialization, fixed value initialization, etc .; set training related parameters, such as learning rate, number of iterations, etc .;
- The training process of the third neural network is not limited to this; other training methods can also be used, as long as the trained third neural network, given target data in the first data format as input, can produce the corresponding target data in the second data format.
- The third processing module 103 includes a fifth processing unit 1032.
- The fifth processing unit 1032 may perform ISP processing on the target data to convert it from the first data format to the second data format, the ISP processing including at least color interpolation, thereby implementing step S3 described above.
- The ISP processing may further include at least one of the following processes: white balance correction and curve mapping, which can further improve image quality.
- Computing the parameters of the ISP processing using only the target data in the first data format improves the accuracy of the processing parameters, and thus the image quality of the processed target data.
- the ISP processing may include the following steps in order:
- S301 white balance correction; inputting target data in a first data format
- the ISP processing for converting the target data from the first data format to the second data format is not limited to this, for example, only color interpolation may be performed, or other ISP processing methods may be included.
- the ISP processes such as white balance correction, color interpolation, and curve mapping are described in more detail below, but it should not be limited to this.
- White balance correction removes the color cast caused by ambient illumination so as to restore the original color information of the image.
- Two coefficients, R_gain and B_gain, are used to control the adjustment of the corresponding R and B components:
- R2′ = R2 * R_gain, B2′ = B2 * B_gain
- where R2 and B2 are the red and blue channel color components of the input image of the white balance correction, and R2′ and B2′ are the red and blue channel color components of its output image.
- To compute R_gain and B_gain, only the R, B and G channel color components of the target of interest need to be counted and calculated.
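The per-channel gain application is straightforward; how the gains themselves are computed from the target's R, G and B statistics is not spelled out in the text, so the gray-world estimate below (scaling the R and B means to match the G mean) is an assumption for illustration.

```python
import numpy as np

def white_balance(img, r_gain, b_gain):
    # R2' = R2 * R_gain and B2' = B2 * B_gain; channel order (R, G, B) and
    # an unchanged green channel are assumed here.
    out = img.astype(float)
    out[..., 0] *= r_gain
    out[..., 2] *= b_gain
    return out

def grayworld_gains(img):
    # One common way to estimate the gains (an assumption -- the text does
    # not prescribe it): gray-world, scaling R and B means to the G mean.
    r, g, b = img[..., 0].mean(), img[..., 1].mean(), img[..., 2].mean()
    return g / r, g / b
```

Restricting the statistics to the target's pixels, as the text suggests, simply means passing the cropped target data instead of the full frame.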
- Color interpolation expands the white-balance-corrected target data in the first data format from a single-channel format into a multi-channel data format in which each channel represents one color component; it can be implemented with nearest-neighbor interpolation, expanding the single-channel target data in the first data format into multi-channel target data.
- The nearest pixels of each color can be used directly to fill the pixels missing that color, so that each pixel contains all three RGB color components.
- The specific interpolation process corresponds to the embodiment described above with reference to FIG. 4 and may be the same or similar; it is not repeated here.
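Nearest-neighbour demosaicing can be sketched as below. An RGGB Bayer layout with even dimensions is an assumption; the text notes that which neighbours fill which pixels is configurable, and here each 2×2 cell's R, G and B samples are simply copied to all four pixels of the cell.

```python
import numpy as np

def nearest_demosaic(raw):
    # Expand single-channel Bayer data (assumed RGGB, even dimensions) into
    # three channels by copying each 2x2 cell's R, G and B samples to all
    # four pixels of the cell -- a nearest-neighbour fill.
    h, w = raw.shape
    rgb = np.zeros((h, w, 3), dtype=raw.dtype)
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            r = raw[i, j]          # R at the cell's top-left
            g = raw[i, j + 1]      # one of the two G samples
            b = raw[i + 1, j + 1]  # B at the cell's bottom-right
            rgb[i:i + 2, j:j + 2] = (r, g, b)
    return rgb
```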
- Curve mapping adjusts the brightness and contrast of image data according to the visual characteristics of the human eye.
- Gamma curves with different parameters are commonly used for the mapping. Assuming the mapping function of the Gamma curve is g, the image before mapping is denoted img and the mapped image img_gm; then:
- img_gm(i, j) = g(img(i, j)).
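A power-law Gamma curve is the usual choice for g. The exponent 0.5 and the 8-bit value range below are illustrative assumptions, not values given by the text.

```python
import numpy as np

def gamma_map(img, gamma=0.5, max_val=255.0):
    # img_gm(i, j) = g(img(i, j)) with g a Gamma curve; gamma = 0.5 lifts
    # dark values, and max_val = 255 assumes an 8-bit range.
    return max_val * (img / max_val) ** gamma
```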
- The embodiment of the present disclosure uses the collected first image data in the first data format to detect a specified target and obtain its position information, and then intercepts, from the first image data in the first data format, the target data corresponding to the obtained position information. Because the target data is intercepted from the first image data, no change in image format or quality has occurred. The target data is then converted into a data format suitable for display and/or transmission. Compared with post-processing an already image-processed image after detection, this improves the image quality of the detected target.
- an image processing apparatus 100 may include:
- a first processing module 101 configured to obtain position information of a specified target in the first image data from the collected first image data in a first data format
- a second processing module 102 configured to intercept target data corresponding to the position information from the first image data
- a third processing module 103 is configured to convert a data format of the target data from the first data format to a second data format, where the second data format is suitable for displaying and / or transmitting the target data.
- the image processing apparatus 100 may be applied to an image device.
- the image device may be a device with an imaging function, such as a video camera, or a device capable of performing image post-processing, and the like is not limited.
- the first image data in the first data format may be image data acquired by the image device itself, or image data acquired from other devices, which is not limited in particular.
- the first processing module 101 includes a first processing unit 1011 and a second processing unit 1012.
- the first processing unit 1011 is configured to convert the first image data into second image data capable of performing target detection.
- the second processing unit 1012 is configured to detect position information of the designated target in the second image data, and determine the detected position information as a position of the designated target in the first image data. information.
- the second processing unit 1012 is specifically configured to: input the second image data to a trained first neural network, and determine the result output by the first neural network as the position information of the specified target in the first image data.
- the first neural network includes at least a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a frame regression layer for performing coordinate transformation. In order to realize the positioning and output of the position information of the specified target.
- the first processing unit 1011 is specifically configured to: convert the first image data into the second image data capable of target detection using at least one of black level correction, white balance correction, color interpolation, contrast enhancement and bit-width compression.
- the first processing module 101 includes a third processing unit 1013 for inputting the first image data to a trained second neural network.
- the second neural network includes at least a grayscale layer for performing grayscale processing, a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a border regression layer for performing coordinate transformation, so as to convert the first image data into second image data capable of target detection and to detect the position information of a specified target in the second image data. In this way, the position information of the specified target in the first image data may be determined according to the result output by the second neural network.
- the third processing module 103 includes a fourth processing unit 1031 for inputting the target data to a trained third neural network.
- the third neural network includes at least a convolution layer for performing convolution to convert the target data from the first data format to a second data format.
- the third processing module 103 includes a fifth processing unit 1032 for performing ISP processing on the target data.
- the ISP processing is used to convert the target data from the first data format to the second data format, and the ISP processing includes at least color interpolation.
- the relevant part may refer to the description of the method embodiment.
- the device embodiments described above are only schematic, and the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units.
- the present disclosure also provides an electronic device including a processor and a memory; the memory stores a program that can be called by the processor; when the processor executes the program, the image processing method according to any one of the foregoing embodiments is implemented.
- Embodiments of the image processing apparatus of the present disclosure can be applied to electronic devices. Taking software implementation as an example, as a device in a logical sense, it is formed by reading the corresponding computer program instructions in the non-volatile memory into the memory through the processor of the electronic device where it is located.
- FIG. 16 is a hardware structural diagram of an electronic device in which the image processing apparatus 100 is located, according to an exemplary embodiment of the present disclosure. In addition to the processor 510 and the memory shown in FIG. 16, the electronic device in which the apparatus 100 is located may generally include other hardware according to its actual function, which is not described here again.
- the present disclosure also provides a machine-readable storage medium having a program stored thereon, which when executed by a processor, causes an image device to implement the image processing method according to any one of the foregoing embodiments.
- the present disclosure may take the form of a computer program product implemented on one or more storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) containing program code therein.
- Machine-readable storage media includes permanent and non-permanent, removable and non-removable media, and information can be stored by any method or technology.
- Information may be computer-readable instructions, data structures, modules of a program, or other data.
- machine-readable storage media include, but are not limited to: phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only Memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, read-only disc read-only memory (CD-ROM), digital versatile disc (DVD), or other optical storage , Magnetic tape cartridges, magnetic tape storage or other magnetic storage devices or any other non-transmission media can be used to store information that can be accessed by computing devices.
Abstract
The present disclosure provides an image processing method, apparatus, device and readable medium. The image processing method includes: obtaining position information of a specified target in collected first image data in a first data format; intercepting target data corresponding to the position information from the first image data; and converting the data format of the target data from the first data format to a second data format, the second data format being suitable for displaying and/or transmitting the target data. The image quality of a detected target can thereby be improved.
Description
Cross-reference to related applications
This patent application claims priority to Chinese patent application No. 201810571964.X, filed on May 31, 2018 and entitled "Image processing method, apparatus and device, and readable medium", the entire content of which is incorporated herein by reference.
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method, apparatus and device, and a readable medium.
The main purpose of target detection technology is to detect and locate specific targets in a single frame of image or in video. At present, target detection technology is widely applied in many fields, for example: text detection for cargo handling in logistics, detection of vehicles violating traffic regulations in road traffic, and detection and statistics of passenger flow in shopping malls and stations.
Target detection algorithms are mainly applied to low-bit-width images that have already undergone ISP processing; after a target of interest is detected, the corresponding target image is cut out of the image for display or for subsequent processing such as recognition. In systems based on this detection technique, the quality of the finally obtained target images varies considerably: some are of good quality, but in many cases they may be blurred or suffer from insufficient brightness or contrast.
Patent application CN104463103A published by the Chinese Patent Office proposes an image processing method and apparatus in which, when the detected target is text, the text in the target image is sharpened. The main flow is as follows: first, targets of interest in the image are detected; the detected targets are classified with a preset classifier; and when the classification result is text, the text is sharpened.
Due to design limitations of ISP processing algorithms and the accumulation of losses across the processing modules, some of the original image information is ultimately lost. The information used by the solution of CN104463103A for the subsequent text processing is already an image in the data format produced by the ISP algorithm; at that point the information may already be seriously lost and cannot be recovered afterwards. Moreover, that solution only processes text, which is generally only a very small part of what people care about; when other targets of interest such as faces, vehicles or buildings are detected, it performs no further processing to improve the quality of the key target. Overall, the existing solution is rather limited and cannot comprehensively improve the image quality of detected targets.
Summary
In view of this, the present disclosure provides an image processing method, apparatus and device, and a readable medium, which can improve the image quality of a detected target.
A first aspect of the present disclosure provides an image processing method, including:
obtaining position information of a specified target in collected first image data in a first data format;
intercepting target data corresponding to the position information from the first image data;
converting the data format of the target data from the first data format to a second data format, the second data format being suitable for displaying and/or transmitting the target data.
According to an embodiment of the present disclosure, obtaining the position information of the specified target in the first image data from the collected first image data in the first data format includes:
converting the first image data into second image data on which target detection can be performed;
detecting the position information of the specified target in the second image data, and determining the detected position information as the position information of the specified target in the first image data.
According to an embodiment of the present disclosure, detecting the position information of the specified target in the second image data and determining the detected position information as the position information of the specified target in the first image data includes:
inputting the second image data into a trained first neural network, the first neural network locating and outputting the position information of the specified target at least through a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a border regression layer for performing coordinate transformation;
determining the result output by the first neural network as the position information of the specified target in the first image data.
According to an embodiment of the present disclosure, converting the first image data into second image data on which target detection can be performed includes: converting the first image data into the second image data using at least one of the following image processing methods: black level correction, white balance correction, color interpolation, contrast enhancement and bit-width compression.
According to an embodiment of the present disclosure, obtaining the position information of the specified target in the first image data from the collected first image data in the first data format includes:
inputting the first image data into a trained second neural network, the second neural network converting the first image data into second image data on which target detection can be performed and detecting the position information of the specified target in the second image data, at least through a grayscale layer for performing grayscale processing, a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a border regression layer for performing coordinate transformation;
determining the result output by the second neural network as the position information of the specified target in the first image data.
According to an embodiment of the present disclosure, converting the data format of the target data from the first data format to the second data format includes: inputting the target data into a trained third neural network, the third neural network converting the data format of the target data from the first data format to the second data format at least through a convolution layer for performing convolution.
According to an embodiment of the present disclosure, converting the data format of the target data from the first data format to the second data format includes:
performing ISP processing on the target data, where the ISP processing is used to convert the data format of the target data from the first data format to the second data format and includes at least color interpolation.
According to an embodiment of the present disclosure, the ISP processing further includes at least one of the following processes: white balance correction and curve mapping.
A second aspect of the present disclosure provides an image processing apparatus, including:
a first processing module configured to obtain position information of a specified target in collected first image data in a first data format;
a second processing module configured to intercept target data corresponding to the position information from the first image data;
a third processing module configured to convert the data format of the target data from the first data format to a second data format, the second data format being suitable for displaying and/or transmitting the target data.
According to an embodiment of the present disclosure, the first processing module includes a first processing unit and a second processing unit; the first processing unit is configured to convert the first image data into second image data on which target detection can be performed; the second processing unit is configured to detect the position information of the specified target in the second image data, and to determine the detected position information as the position information of the specified target in the first image data.
According to an embodiment of the present disclosure, the second processing unit is specifically configured to input the second image data into a trained first neural network and to determine the result output by the first neural network as the position information of the specified target in the first image data; the first neural network locates and outputs the position information of the specified target at least through a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a border regression layer for performing coordinate transformation.
According to an embodiment of the present disclosure, the first processing unit is specifically configured to convert the first image data into the second image data on which target detection can be performed, using at least one of black level correction, white balance correction, color interpolation, contrast enhancement and bit-width compression.
According to an embodiment of the present disclosure, the first processing module includes a third processing unit configured to input the first image data into a trained second neural network; the second neural network converts the first image data into second image data on which target detection can be performed and detects the position information of the specified target in the second image data, at least through a grayscale layer for performing grayscale processing, a convolution layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature synthesis, and a border regression layer for performing coordinate transformation; the result output by the second neural network is determined as the position information of the specified target in the first image data.
According to an embodiment of the present disclosure, the third processing module includes a fourth processing unit configured to input the target data into a trained third neural network; the third neural network converts the data format of the target data from the first data format to the second data format at least through a convolution layer for performing convolution.
According to an embodiment of the present disclosure, the third processing module includes a fifth processing unit configured to perform ISP processing on the target data, where the ISP processing is used to convert the data format of the target data from the first data format to the second data format and includes at least color interpolation.
A third aspect of the present disclosure provides an electronic device including a processor and a memory, the memory storing a program callable by the processor; when the processor executes the program, the image processing method according to any one of the foregoing embodiments is implemented.
A fourth aspect of the present disclosure provides a machine-readable storage medium on which a program is stored; when the program is executed by a processor, the image processing method according to any one of the foregoing embodiments is implemented. Compared with the prior art, the embodiments of the present invention have the following beneficial effects:
The embodiments of the present invention use the collected first image data in the first data format to detect a specified target and obtain its position information, and then intercept, from the first image data in the first data format, the target data corresponding to the obtained position information. Because the target data is intercepted from the first image data, no change in image format or quality has occurred. The target data then undergoes format conversion into a data format suitable for display and/or transmission. Compared with the existing approach of post-processing an already image-processed image after detection, this improves the image quality of the detected target.
FIG. 1 is a schematic flowchart of an image processing method according to an exemplary embodiment of the present disclosure.
FIG. 2 is a structural block diagram of an image processing apparatus according to an exemplary embodiment of the present disclosure.
FIG. 3 is a structural block diagram of an embodiment of the first processing module provided by the present disclosure.
FIG. 4 is a schematic flowchart of an embodiment of converting the first image data into the second image data provided by the present disclosure.
FIG. 5 is a schematic diagram of an embodiment of color interpolation provided by the present disclosure.
FIG. 6 is a structural block diagram of an embodiment of the first neural network provided by the present disclosure.
FIG. 7 is a structural block diagram of another embodiment of the first neural network provided by the present disclosure.
FIG. 8 is a structural block diagram of another embodiment of the first processing module provided by the present disclosure.
FIG. 9 is a structural block diagram of an embodiment of the second neural network provided by the present disclosure.
FIG. 10 is a structural block diagram of another embodiment of the second neural network provided by the present disclosure.
FIG. 11 is a schematic diagram of an embodiment of grayscale processing provided by the present disclosure.
FIG. 12 is a structural block diagram of an embodiment of the image processing apparatus provided by the present disclosure.
FIG. 13 is a structural block diagram of an embodiment of the third neural network provided by the present disclosure.
FIG. 14 is a structural block diagram of another embodiment of the image processing apparatus provided by the present disclosure.
FIG. 15 is a structural block diagram of an embodiment of the ISP processing for converting the target data from the first data format to the second data format provided by the present disclosure.
FIG. 16 is a structural block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Exemplary embodiments will be described in detail here, examples of which are shown in the accompanying drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; on the contrary, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The terms used in the present disclosure are for the purpose of describing particular embodiments only and are not intended to limit the present disclosure. The singular forms "a", "said" and "the" used in the present disclosure and the appended claims are also intended to include the plural forms unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third and so on may be used in the present disclosure to describe various kinds of information, the information should not be limited to these terms; these terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of the present disclosure, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "while" or "in response to determining".
To make the description of the present disclosure clearer and more concise, some technical terms used in the present disclosure are explained below:
ISP (Image Signal Processor) processing: processing of the image signal collected by the image sensor of a front-end imaging device, with functions such as defective pixel correction, black level correction, white balance correction, color interpolation, gamma correction, color correction, sharpening and denoising; one or several of these may be selected according to the actual application.
Deep learning: a method that uses neural networks to simulate the analysis and learning of the human brain and to build corresponding data representations.
Neural network: mainly composed of neurons; may include convolutional layers, pooling layers, and the like.
The image processing method of the embodiments of the present disclosure is described in more detail below, but it should not be limited thereto.
In an embodiment, referring to FIG. 1, an image processing method according to an embodiment of the present disclosure is shown; the method may include the following steps:
S1: obtain position information of a specified target in collected first image data in a first data format;
S2: intercept target data corresponding to the position information from the first image data;
S3: convert the data format of the target data from the first data format to a second data format, the second data format being suitable for displaying and/or transmitting the target data.
In the embodiments of the present disclosure, the image processing method of FIG. 1 may be applied to an image device. The image device may be a device with an imaging function such as a video camera, or a device capable of image post-processing, and is not specifically limited. The first image data in the first data format may be image data collected by the image device itself or image data obtained from another device, which is likewise not specifically limited.
The image data collected by the image device is the first image data, and its image data format is the first data format. The first data format is the raw image format collected by the image device; for example, the raw image format is the image format, without image pre-processing, generated after the image sensor in the image device senses one or more spectral bands. An image in the raw image format may contain data of one or more spectral bands, for example spectral sampling signals in the wavelength range of 380 nm to 780 nm and/or spectral sampling signals in the wavelength range of 780 nm to 2500 nm. Generally speaking, there are certain difficulties in using an image in the first data format directly for display or transmission.
In step S1, the position information of the specified target in the first image data is obtained from the collected first image data in the first data format.
The first image data contains a specified target, i.e. the object that is expected to undergo ISP processing so as to improve its image quality. The specified target can be detected and located in the first image data.
The position information of the specified target in the first image data may include: the coordinates of feature points of the specified target in the first image data together with the size of the image region of the specified target; or the coordinates of the start and end points of the image region of the specified target; and so on. It is not specifically limited, as long as the position of the specified target in the first image data can be located.
Step S2 is then executed: intercept the target data corresponding to the position information from the first image data.
The first image data in step S2 is the collected first image data in the first data format, i.e. the raw image as collected by the device, not image data obtained by processing the first image data in order to acquire the position information of the target object, so there is no problem of lost image information. In other words, the first image data used in steps S1 and S2 comes from the same data source; it may be the same first image data, or different first image data collected in the same scene, for example two consecutive frames of image data, as long as the specified target does not move or otherwise change between the two frames. Preferably, the same first image data is used in steps S1 and S2; this first image data may be stored in the image device and retrieved when needed.
Since the position information is detected from the first image data, the image region corresponding to the position information in the first image data is the specified target. The region indicated by the position information in the first image data can be cropped to obtain the target data corresponding to the specified target. Because the target data is intercepted from the first image data, its data format is still the first data format, the same as that of the first image data.
Step S3 is then executed: convert the data format of the target data from the first data format to the second data format, the second data format being suitable for displaying and/or transmitting the target data.
In step S3, image processing is performed on the target data in the first data format to convert its data format to the second data format; the second data format is a data format suitable for display and/or transmission of the target data, and both the first and second data formats are image formats. The image processing need not involve only the data format conversion; it may also include other image processing that improves the image quality of the target data.
The embodiments of the present disclosure use the collected first image data in the first data format to detect a specified target and obtain its position information, and then intercept, from the first image data in the first data format, the target data corresponding to the obtained position information. Because the target data is intercepted from the first image data, no change in image format or quality has occurred. The target data then undergoes format conversion into a data format suitable for display and/or transmission. Compared with post-processing an already image-processed image after detection, this improves the image quality of the detected target.
Step S1 is the position-information acquisition step: the specified target of interest is detected and, once detected, located, yielding its position information. The type of the specified target is not limited; it may for example be text, a person, a vehicle, a license plate or a building, and its shape and size are likewise not limited. The input first image data in the first data format may first be pre-processed and converted into commonly used data on which target detection can be performed before detection is carried out; alternatively, target detection may be performed directly on the first image data in the first data format to output the target position information. The specific implementation is not limited.
In an embodiment, the above method flow may be executed by an image processing apparatus 100. As shown in FIG. 2, the image processing apparatus 100 mainly contains three modules: a first processing module 101, a second processing module 102 and a third processing module 103. The first processing module 101 is configured to execute step S1 above, the second processing module 102 to execute step S2 above, and the third processing module 103 to execute step S3 above.
As shown in FIG. 2, the first processing module 101 detects the target or object of interest from the first image data in the first data format and outputs the position information of the detected target; the second processing module 102, based on the position information of the target of interest output by the first processing module 101 and the originally input first image data in the first data format, obtains the target data in the first data format corresponding to the target of interest from the original first image data in the first data format; the third processing module 103 performs adaptive ISP processing on the target data in the first data format output by the second processing module 102 to obtain higher-quality target data in the second data format.
In an embodiment, as shown in FIG. 3, the first processing module 101 includes a first processing unit 1011 and a second processing unit 1012; step S101 may be executed by the first processing unit 1011 and step S102 by the second processing unit 1012, so as to implement step S1 above. Step S1 specifically includes the following steps:
S101: convert the first image data into second image data on which target detection can be performed;
S102: detect the position information of the specified target in the second image data, and determine the detected position information as the position information of the specified target in the first image data.
Since the specified target needs to be detected and the first image data is not convenient for detecting it directly, in step S101 the first image data is first converted into second image data that can be used for target detection, so that the specified target can be detected in the second image data. The specific conversion method is not limited, as long as the first image data can be converted into second image data in which the target can be detected.
Because this second image data has been converted, its data format may no longer be the first data format; if it were used for post-processing to detect and extract the target, the image quality could not be guaranteed. Therefore, in this embodiment the second image data is not used to extract the specified target; it is only used to detect the position information of the specified target.
Step S102 is executed after step S101 to detect the position information of the specified target in the second image data. By recognizing and locating the specified target in the second image data, its position information in the second image data can be determined. The positional relationship of the specified target in the first image data and the second image data generally does not change; of course, scaling between the first and second image data or translation of the specified target is not excluded, but such scaling and translation are determinable during processing. Therefore, knowing the position information of the specified target in the second image data gives its position information in the first image data, and the detected position information is determined as the position information of the specified target in the first image data.
Further, converting the first image data into second image data on which target detection can be performed may include at least performing color interpolation on the first image data. On this basis, at least one of the following processes may also be performed, for example: black level correction, white balance correction, contrast enhancement and bit-width compression; of course, it is not specifically limited to these.
In a possible implementation, the first processing unit 1011 may implement step S101 above by executing steps S1011 to S1015. Referring to FIG. 4, steps S1011 to S1015 are specifically:
S1011: black level correction;
S1012: white balance correction;
S1013: color interpolation;
S1014: contrast enhancement;
S1015: bit-width compression.
It can be understood that the way of converting the first image data into the second image data is not limited to steps S1011 to S1015, and the processing order is also not limited; for example, the conversion may involve only color interpolation, as long as the resulting second image data allows target detection.
In step S1011, assume the first image data in the first data format is denoted imgR. Black level correction removes the influence of the black level from the first image data in the first data format and outputs imgR_blc:

imgR_blc = imgR - V_blc

where V_blc is the black level value; the "-" here is not a mathematical operation but denotes "removal".
In step S1012, white balance correction removes the color cast caused by ambient illumination during imaging so as to restore the original color information of the image; two coefficients R_gain and B_gain control the adjustment of the corresponding R1 and B1 components:

R1' = R1 * R_gain
B1' = B1 * B_gain

where R1 and B1 are the red and blue channel color components of the image data after black level correction, and R1' and B1' are the red and blue channel color components of the output image of the white balance correction module; the output image is denoted imgR_wb.
In step S1013, color interpolation operates on the data after white balance correction and can be implemented with nearest-neighbor interpolation, expanding the single-channel first image data in the first data format into multi-channel data. For first image data in the Bayer format, the pixels missing a given color are filled directly with the nearest pixel of that color, so that every pixel contains all three RGB color components. The specific interpolation process is shown in FIG. 5: R11 fills its three neighboring color pixels with R11 (which neighbors are filled can be configured), and the other color pixels are handled likewise, which is not repeated here. The interpolated image is denoted imgC.
In step S1014, contrast enhancement operates on the color-interpolated data and enhances the contrast of the interpolated image; a Gamma curve can be used for linear mapping. Assuming the mapping function of the Gamma curve is f(), the mapped image is denoted imgC_gm:

imgC_gm(i, j) = f(imgC(i, j))

where (i, j) are the coordinates of a pixel.
In step S1015, bit-width compression operates on the contrast-enhanced data: the high-bit-width data imgC_gm obtained after contrast enhancement is compressed down to the bit width corresponding to the second data format. Linear compression can be used directly, and the compressed image is denoted imgC_1b:

imgC_1b(i, j) = imgC_gm(i, j) / M

where M is the compression ratio from the first data format to the second data format.
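The linear bit-width compression imgC_1b(i, j) = imgC_gm(i, j) / M can be sketched as below. Taking M = 2^(in_bits − out_bits) and a 12-bit-to-8-bit conversion are illustrative assumptions; the text only says M is the ratio between the two formats' bit widths.

```python
import numpy as np

def compress_bit_width(img, in_bits=12, out_bits=8):
    # imgC_1b(i, j) = imgC_gm(i, j) / M, with M the compression ratio;
    # M = 2**(in_bits - out_bits) is assumed here (12-bit raw -> 8-bit).
    m = 2 ** (in_bits - out_bits)
    return (img // m).astype(np.uint8)
```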
In a possible implementation, the second processing unit 1012 may implement step S102 above by executing steps S1021 and S1022:
S1021: input the second image data into a trained first neural network; the first neural network realizes the locating at least through a convolution layer, a pooling layer, a fully connected layer and a border regression layer;
S1022: determine the result output by the first neural network as the position information of the specified target in the first image data.
In step S1021, the first neural network is an already trained network; inputting the second image data into this first neural network locates the specified target in the second image data and correspondingly obtains the position information of the specified target.
The first neural network may be integrated in the second processing unit 1012 as a part of the first processing module 101, or may be provided outside the first processing module 101 and scheduled by the second processing unit 1012.
Referring to FIG. 6, the first neural network 200 may include at least one convolution layer 201 for performing convolution, at least one pooling layer 202 for performing downsampling, at least one fully connected layer 203 for performing feature synthesis, and at least one border regression layer 204 for performing coordinate transformation.
As an embodiment of the first neural network, referring to FIG. 7, the first neural network 200 may include, connected in sequence, a convolution layer 205, a convolution layer 206, a pooling layer 207, ..., a convolution layer 208, a pooling layer 209, a fully connected layer 210 and a border regression layer 211. The second image data is input into the first neural network 200, which outputs position information; this position information serves as the position information of the specified target in the first image data. The functions performed by the layers of the first neural network are described below; each layer may have adaptive variations, e.g. the convolution kernels of different convolution layers may differ, which is not repeated here. It can be understood that the first neural network shown in FIG. 7 is only an example and is not specifically limited thereto; for example, convolution layers, and/or pooling layers, and/or other layers may be removed or added.
The specific functions of the layers in the first neural network are introduced below, but it should not be limited thereto.
The convolution layer (Conv) performs the convolution operation and may also carry an activation function ReLU, which activates the convolution result, so the operation of one convolution layer can be expressed by the following formula:

YC_i(I) = g(W_i * YC_{i-1}(I) + B_i)

where YC_i(I) is the output of the i-th convolution layer, YC_{i-1}(I) is the input of the i-th convolution layer, * denotes the convolution operation, W_i and B_i are respectively the weight coefficients and offset coefficients of the convolution filters of the i-th convolution layer, and g() denotes the activation function; when the activation function is ReLU, g(x) = max(0, x).
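The convolution-layer formula YC_i(I) = g(W_i * YC_{i-1}(I) + B_i) with g = ReLU can be sketched as a "valid" 2-D correlation on a single channel; real layers would of course use multiple channels and filters.

```python
import numpy as np

def conv_relu(x, w, b):
    # One convolution-layer step YC_i(I) = g(W_i * YC_{i-1}(I) + B_i),
    # with g = ReLU, as a 'valid' 2-D correlation on a single channel.
    kh, kw = w.shape
    h, w_in = x.shape
    out = np.empty((h - kh + 1, w_in - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + kh, j:j + kw] * w).sum() + b
    return np.maximum(out, 0.0)  # g(x) = max(0, x)
```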
The pooling layer (Pool) is a special downsampling layer that shrinks the feature map obtained by convolution; the shrinking window has a size of, for example, N×N. When max pooling is used, the maximum over each N×N window is taken as the value of the corresponding point in the new image. The specific formula is as follows:

YP_j(I) = maxpool(YP_{j-1}(I))

where YP_{j-1}(I) is the input of the j-th pooling layer and YP_j(I) is the output of the j-th pooling layer.
A fully connected layer (FC) can be viewed as a convolutional layer whose filter window is 1×1; every node of a fully connected layer is connected to all nodes of the previous layer, and it aggregates the features extracted earlier. Its concrete implementation is similar to convolution filtering.
The bounding-box regression layer (BBR) seeks a mapping that takes the window P output by the fully connected layer to a window G' closer to the ground-truth window G. The regression is generally realized as a coordinate transformation of window P, for example including a translation and/or a scaling. Let the coordinates of the window P output by the fully connected layer be (x_1, x_2, y_1, y_2), and the coordinates of the transformed window P be (x_3, x_4, y_3, y_4).

If the transformation is a translation with offsets (Δx, Δy), the coordinates before and after the translation are related by:

x_3 = x_1 + Δx
x_4 = x_2 + Δx
y_3 = y_1 + Δy
y_4 = y_2 + Δy

If the transformation is a scaling with scale factors dx and dy in the X and Y directions, the coordinates before and after are related by:

x_4 - x_3 = (x_2 - x_1) * dx
y_4 - y_3 = (y_2 - y_1) * dy.
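The translation and scaling relations above can be combined in a small sketch. Anchoring the scaling at the box center is an assumption of this sketch, since the source only constrains the width and height ratios:

```python
def refine_box(box, dx_off=0.0, dy_off=0.0, sx=1.0, sy=1.0):
    """Translate then scale a window (x1, x2, y1, y2) about its center.

    Translation matches x3 = x1 + dx_off etc.; scaling satisfies
    x4 - x3 = (x2 - x1) * sx and y4 - y3 = (y2 - y1) * sy.
    (Center anchoring is an illustrative choice, not stated in the source.)
    """
    x1, x2, y1, y2 = box
    cx, cy = (x1 + x2) / 2 + dx_off, (y1 + y2) / 2 + dy_off
    hw, hh = (x2 - x1) * sx / 2, (y2 - y1) * sy / 2
    return (cx - hw, cx + hw, cy - hh, cy + hh)

refined = refine_box((0.0, 10.0, 0.0, 4.0), dx_off=2.0, sx=2.0)
```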
In step S1022, the position information of the specified target in the first image data is determined from the result output by the first neural network. The network's output may be taken directly as the position information of the specified target in the first image data; alternatively, the output may be converted using the positional relationship of the specified target between the first image data and the second image data to obtain the position information of the specified target in the first image data.

To train the first neural network, second-image-data samples and the corresponding position-information samples can be collected as a training set, with the second-image-data samples as input and the corresponding position-information samples as output, and used to train the first neural network's training model. The position-information samples can be obtained by processing the second-image-data samples with an image-processing method capable of recognizing the detection target.
In another embodiment, referring to Fig. 8, the first processing module 101 includes a third processing unit 1013, which may perform steps S111 and S112 to implement step S1 above. Steps S111 and S112 are as follows.

S111: Input the first image data into a trained second neural network; through at least a grayscale layer, a convolutional layer, a pooling layer, a fully connected layer, and a bounding-box regression layer, the second neural network converts the first image data into second image data on which target detection can be performed, and detects the position information of the specified target in the second image data.

S112: Determine the result output by the second neural network as the position information of the specified target in the first image data.
The second neural network may be integrated into the third processing unit 1013 as part of the first processing module 101, or it may be deployed outside the first processing module 101 and scheduled by the third processing unit 1013.

Referring to Fig. 9, the second neural network 300 includes at least one grayscale layer 301 for performing grayscale conversion, one convolutional layer 302 for performing convolution, one pooling layer 303 for performing downsampling, one fully connected layer 304 for performing feature aggregation, and one bounding-box regression layer 305 for performing coordinate transformation. The second neural network can both convert the first image data into second image data on which target detection can be performed and detect the position information of the specified target in the second image data, without any other ISP processing. Of course, depending on requirements, further information processing may be performed on top of the second neural network's processing, without limitation.

As one embodiment of the second neural network, referring to Fig. 10, the second neural network 300 may include grayscale layer 306, convolutional layer 307, convolutional layer 308, pooling layer 309, ..., convolutional layer 310, pooling layer 311, fully connected layer 312, and bounding-box regression layer 313. The first image data is fed into the second neural network; its layers process the first image data and output position information, which serves as the position information of the specified target in the first image data. Each layer performs the same function as the corresponding layer of the first neural network, described above; each layer may be adapted, for example different convolutional layers may use different kernels, and the details are not repeated here. It can be understood that the second neural network 300 of Fig. 10 is merely an example and is not limiting; for example, convolutional layers, and/or pooling layers, and/or other layers may be removed or added.
The grayscale layer of the second neural network converts the multi-channel first-data-format information into single-channel grayscale information, which can be done simply by weighting the components representing different colors around the current pixel. Referring to Fig. 11, the grayscale layer weights the RGB color components and converts them all into single-channel grayscale information Y; for example, for Y22 the computation is:

Y22 = (B22 + (G12 + G32 + G21 + G23)/4 + (R11 + R13 + R31 + R33)/4) / 3

The other components can be handled analogously and are not repeated here.
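The Y22 formula can be sketched for a blue Bayer site, averaging the four G row/column neighbors and the four R diagonal neighbors. The 3x3 neighborhood values are illustrative:

```python
import numpy as np

def gray_at_blue(raw: np.ndarray, y: int, x: int) -> float:
    """Grayscale value at a blue Bayer site, per the Y22 formula:
    average of the local B, the mean of the 4 adjacent Gs, and the
    mean of the 4 diagonal Rs.

    Assumes (y, x) indexes a blue pixel with all 8 neighbors in bounds.
    """
    b = raw[y, x]
    g = (raw[y - 1, x] + raw[y + 1, x] + raw[y, x - 1] + raw[y, x + 1]) / 4.0
    r = (raw[y - 1, x - 1] + raw[y - 1, x + 1] + raw[y + 1, x - 1] + raw[y + 1, x + 1]) / 4.0
    return (b + g + r) / 3.0

mosaic = np.array([[8.0, 4.0, 8.0],
                   [4.0, 6.0, 4.0],
                   [8.0, 4.0, 8.0]])  # R G R / G B G / R G R neighborhood
y22 = gray_at_blue(mosaic, 1, 1)
```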
The convolutional, pooling, fully connected, and bounding-box regression layers of the second neural network may perform the same functions as the corresponding layers of the first neural network; each layer may be adapted, for example different convolutional layers may use different kernels, and the details are not repeated here.

To train the second neural network, first-image-data samples and the corresponding position-information samples can be collected as a training set, with the first-image-data samples as input and the corresponding position-information samples as output, and used to train the second neural network's training model. To obtain the position-information samples corresponding to the first-image-data samples, the first-image-data samples can first undergo image processing that makes targets detectable, and the targets can then be detected with an image-processing method capable of recognizing the detection target, yielding the corresponding position-information samples.
In step S2, based on the position information of the specified target in the first image data obtained in step S1, data can be cut out at the corresponding position of the originally input first image data in the first data format; the cut-out data serves as the target data, in the first data format, of the corresponding target.

In one embodiment, suppose the position information of the specified target in the first image data obtained in step S1 is [x1, x2, y1, y2], where x1, y1 are the starting position information and x2, y2 the ending position information. When the first image data of the whole image in the first data format is denoted imgR, the target data imgT of the specified target in the first data format is:

imgT = imgR(x1:x2, y1:y2).
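The cut-out imgT = imgR(x1:x2, y1:y2) can be sketched as an array slice. Treating the end coordinates (x2, y2) as inclusive is an assumption of this sketch; the source does not specify whether they fall inside the window:

```python
import numpy as np

def crop_target(img_r: np.ndarray, box) -> np.ndarray:
    """Cut imgT = imgR(x1:x2, y1:y2) out of the raw frame.

    box = (x1, x2, y1, y2); inclusive end coordinates are assumed.
    The slice is a view of imgR, so the raw data itself is untouched.
    """
    x1, x2, y1, y2 = box
    return img_r[x1:x2 + 1, y1:y2 + 1]

frame = np.arange(25).reshape(5, 5)
img_t = crop_target(frame, (1, 2, 1, 3))
```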
In step S3, the target data in the first data format corresponding to the specified target obtained in step S2 is processed, so as to convert the target data of the specified target from the first data format to the second data format. Step S3 is in effect image processing on small target data; it can be realized with ISP processing that is not neural-network based, or with a neural network.
In one embodiment, as shown in Fig. 12, the third processing module 103 includes a fourth processing unit 1031, which may implement step S3 above by performing the following step:

Input the target data in the first data format into a trained third neural network; the third neural network converts the data format of the target data from the first data format to the second data format through at least a convolutional layer.

The third neural network may be integrated into the fourth processing unit 1031 as part of the third processing module 103, or it may be deployed outside the third processing module 103 and scheduled by the fourth processing unit 1031.

The third neural network may include at least one convolutional layer for performing convolution so as to convert the data format of the target data from the first data format to the second data format. Of course, its layer structure is not limited to this; for example, it may also include at least one ReLU layer for performing activation, or other layers, and the specific number of layers is likewise not limited.
Implementing the image processing with the third neural network reduces the error propagation that can arise when traditional image processing handles each processing step separately.

The operations performed by the layers of the third neural network are described below, but the network is not limited to these descriptions.

For the convolutional layers of the third neural network, let the input of each convolutional layer be FC_i and the output of the convolutional layer be FC_{i+1}; then:

FC_{i+1} = g(w_ik * FC_i + b_ik)

where w_ik and b_ik are the parameters of the k-th convolution in the current convolutional layer, and g(x) is a linear weighting function, i.e., the convolution output of each convolutional layer is linearly weighted. Since the convolutional layers of the third neural network and of the first neural network both perform convolution, their functions are similar, and the relevant description of the first neural network's convolutional layer also applies.
For the ReLU layers of the third neural network, let the input of each ReLU layer be FR_i and its output be FR_{i+1}; then:

FR_{i+1} = max(FR_i, 0),

i.e., the larger of 0 and FR_i is selected.
As one embodiment of the third neural network, referring to Fig. 13, the third neural network 400 may include, connected in sequence, convolutional layer 401, convolutional layer 402, ReLU layer 403, convolutional layer 404, and convolutional layer 405. The input of the third neural network 400 is target data in the first data format, and the output is target data in the second data format. Each layer performs the same function as the corresponding layer of the first neural network, described above; each layer may be adapted, for example different convolutional layers may use different kernels, and the details are not repeated here. It can be understood that the third neural network of Fig. 13 is merely an example and is not limiting; for example, convolutional layers, and/or pooling layers, and/or other layers may be removed or added.
To train the third neural network and thereby optimize the deep neural network in advance, a large set of target-data samples in the first data format paired with the corresponding ideal target-data samples in the second data format can be used as samples, and the network parameters used during the training of the third neural network are trained continually until, given target data in the first data format as input, the network can output the ideal target data in the second data format; the network parameters are then output for actual testing and use of the third neural network.

The training procedure of the third neural network may include the following steps:

S311: Collect training samples: gather the first-data-format information of the targets of interest and the corresponding ideal second-data-format information. Suppose n training sample pairs {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)} have been obtained, where x_i is the input first-data-format information and y_i the corresponding ideal second-data-format information.

S312: Design the structure of the third neural network; the network structure used for training and the network structure used at test time are the same.

S313: Initialize the training parameters: initialize the network parameters of the third neural network's structure, e.g. with random-value or fixed-value initialization, and set training-related parameters such as the learning rate and the number of iterations.

S314: Forward propagation: with the current network parameters, propagate a training sample x_i forward through the third neural network to obtain the network output F(x_i), and compute the loss function:

Loss = (F(x_i) - y_i)^2

S315: Backward propagation: use backpropagation to adjust the network parameters of the third neural network.

S316: Iterate: repeat steps S314 and S315 until the network converges, then output the network parameters at that point.
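Steps S311 to S316 can be sketched with a deliberately tiny stand-in: training pairs (x_i, y_i) where the "ideal" output is a fixed gain of the input, and the "network" F is a single 1x1-convolution weight w trained by gradient descent on Loss = (F(x_i) - y_i)^2. Everything here (the gain 3.0, the learning rate, the iteration count) is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)

# S311: training pairs (x_i, y_i); the ideal output is an assumed fixed gain.
xs = rng.uniform(0.1, 1.0, size=64)
ys = 3.0 * xs

w = 0.0                            # S313: initialize the single parameter
lr = 0.05                          # S313: learning rate
for _ in range(500):               # S316: iterate S314/S315 until convergence
    for x, y in zip(xs, ys):
        f = w * x                  # S314: forward pass F(x_i)
        grad = 2.0 * (f - y) * x   # gradient of Loss = (F(x_i) - y_i)^2
        w -= lr * grad             # S315: backward pass / parameter update
```

After the loop, w has converged to the gain that maps first-format inputs to the ideal second-format outputs, mirroring the "repeat until convergence, then output the parameters" step.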
Of course, the training process of the third neural network is not limited to the above; other training approaches are possible, as long as the trained third neural network can produce the corresponding target data in the second data format when target data in the first data format is input.

In another embodiment, as shown in Fig. 14, the third processing module 103 includes a fifth processing unit 1032, which may implement step S3 above by performing ISP processing on the target data to convert the target data from the first data format to the second data format, the ISP processing including at least color interpolation.

Further, the ISP processing may also include at least one of the following: white balance correction and curve mapping, which can further improve image quality.

Computing the parameters of the ISP processing from the target data in the first data format alone can improve the accuracy of the processing parameters and hence the image quality of the processed target data.
As one embodiment of performing ISP processing on the target data, referring to Fig. 15, the ISP processing may include, in sequence:

S301: white balance correction, whose input is the target data in the first data format;
S302: color interpolation;
S303: curve mapping, whose output is the target data in the second data format.

It can be understood that the ISP processing that converts the target data from the first data format to the second data format is not limited to this; for example, color interpolation alone may be performed, or other ISP processing steps may be included.

White balance correction, color interpolation, and curve mapping are described in more detail below, but the processing is not limited to these descriptions.
White balance correction removes the color cast caused by ambient illumination during imaging so as to restore the original color information of the image. It is generally controlled by two coefficients, R_gain and B_gain, which adjust the corresponding R and B components:

R2' = R2 * R_gain
B2' = B2 * B_gain

where R2 and B2 are the red and blue channel color components of the white-balance-correction input image, and R2' and B2' are the red and blue channel color components of the white-balance-correction output image. Compared with white balance correction over the whole image, here R_gain and B_gain only require statistics and computation over the R, B, and G channel color components of the target of interest.

To compute R_gain and B_gain, the means R_avg, G_avg, and B_avg of each color component of the R, G, and B channels are first computed; the gains are then derived from these means.
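The source leaves the gain formulas implicit after introducing the channel means. A common choice, used here purely as an assumption and not necessarily the formula of the original, is the gray-world rule R_gain = G_avg / R_avg and B_gain = G_avg / B_avg:

```python
import numpy as np

def white_balance_gains(r: np.ndarray, g: np.ndarray, b: np.ndarray):
    """Gray-world gains from per-channel means over the target region only.

    The gray-world formulas are an illustrative assumption; the source
    only states that the gains are computed from R_avg, G_avg, B_avg.
    """
    r_avg, g_avg, b_avg = r.mean(), g.mean(), b.mean()
    return g_avg / r_avg, g_avg / b_avg

r = np.array([50.0, 150.0])   # hypothetical target-region samples
g = np.array([100.0, 100.0])
b = np.array([25.0, 75.0])
r_gain, b_gain = white_balance_gains(r, g, b)
```

Restricting the statistics to the cut-out target region is exactly the point made in the text: the gains adapt to the target of interest rather than to the whole frame.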
Color interpolation expands the target data in the first data format after white balance correction from a single-channel format into a multi-channel data format in which each channel represents one color component. It can be implemented with nearest-neighbor interpolation, expanding the single-channel target data in the first data format into multi-channel target data. For example, for image data in a Bayer-pattern first data format, the pixels missing a given color can be filled directly with the nearest pixel of that color, so that every pixel carries all three RGB color components; the interpolation process may be the same as or similar to the embodiment corresponding to Fig. 4 above, and is not repeated here.
Curve mapping adjusts the brightness and contrast of the image data according to the visual characteristics of the human eye; Gamma curves with different parameters are commonly used for the mapping. Let the mapping function of the Gamma curve be g, the image before mapping be denoted img, and the mapped image be denoted img_gm; then:

img_gm(i, j) = g(img(i, j)).
The embodiments of the present disclosure use the captured first image data in the first data format to detect the specified target and obtain its position information, and then cut the target data corresponding to the obtained position information out of the first image data in the first data format. Because the target data is cut from the first image data, no change of image format or quality has occurred; the target data is then converted into a data format suitable for display and/or transmission. Compared with post-processing a detected target in an image that has already undergone image processing, this improves the image quality of the detected target.
The image processing apparatus of the embodiments of the present disclosure is described below, but the apparatus is not limited to the following.

In one embodiment, referring to Fig. 2, an image processing apparatus 100 may include:

a first processing module 101, configured to obtain, from captured first image data in a first data format, position information of a specified target in the first image data;

a second processing module 102, configured to cut the target data corresponding to the position information out of the first image data; and

a third processing module 103, configured to convert the data format of the target data from the first data format to a second data format, the second data format being suitable for display and/or transmission of the target data.

In the embodiments of the present disclosure, the image processing apparatus 100 may be applied to an imaging device, which may be a device with an imaging capability such as a camera, or a device capable of image post-processing, without limitation. The first image data in the first data format may be image data captured by the imaging device itself or obtained from another device, without limitation.

In one embodiment, referring to Fig. 3, the first processing module 101 includes a first processing unit 1011 and a second processing unit 1012. The first processing unit 1011 is configured to convert the first image data into second image data on which target detection can be performed. The second processing unit 1012 is configured to detect the position information of the specified target in the second image data and determine the detected position information as the position information of the specified target in the first image data.

In one embodiment, the second processing unit 1012 is specifically configured to: input the second image data into a trained first neural network, and determine the result output by the first neural network as the position information of the specified target in the first image data. The first neural network includes at least a convolutional layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature aggregation, and a bounding-box regression layer for performing coordinate transformation, so as to localize and output the position information of the specified target.

In one embodiment, the first processing unit 1011 is specifically configured to convert the first image data into second image data on which target detection can be performed, using at least one of black level correction, white balance correction, color interpolation, contrast enhancement, and bit-width compression.

In one embodiment, referring to Fig. 8, the first processing module 101 includes a third processing unit 1013 configured to input the first image data into a trained second neural network. The second neural network includes at least a grayscale layer for performing grayscale conversion, a convolutional layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature aggregation, and a bounding-box regression layer for performing coordinate transformation, so as to convert the first image data into second image data on which target detection can be performed and detect the position information of the specified target in the second image data. In this way, the position information of the specified target in the first image data can be determined from the result output by the second neural network.

In one embodiment, referring to Fig. 12, the third processing module 103 includes a fourth processing unit 1031 configured to input the target data into a trained third neural network. The third neural network includes at least a convolutional layer for performing convolution, so as to convert the target data from the first data format to the second data format.

In one embodiment, referring to Fig. 14, the third processing module 103 includes a fifth processing unit 1032 configured to perform ISP processing on the target data, where the ISP processing converts the target data from the first data format to the second data format and includes at least color interpolation.
For the implementation of the functions and roles of the individual units of the above apparatus, see the implementation of the corresponding steps in the above method; details are not repeated here.

Since the apparatus embodiments essentially correspond to the method embodiments, reference may be made to the relevant descriptions of the method embodiments. The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units.

The present disclosure further provides an electronic device including a processor and a memory, the memory storing a program callable by the processor; when the processor executes the program, the image processing method of any of the foregoing embodiments is implemented.

The embodiments of the image processing apparatus of the present disclosure can be applied to an electronic device. Taking a software implementation as an example, the apparatus in a logical sense is formed by the processor of the electronic device on which it resides reading the corresponding computer program instructions from non-volatile storage into memory and running them. At the hardware level, Fig. 16 is a hardware structure diagram of the electronic device on which the image processing apparatus 100 resides according to an exemplary embodiment of the present disclosure; besides the processor 510, memory 530, interface 520, and non-volatile storage 540 shown in Fig. 16, the electronic device on which the apparatus 100 of the embodiment resides may, depending on its actual function, also include other hardware, which is not elaborated here.

The present disclosure further provides a machine-readable storage medium storing a program which, when executed by a processor, causes an imaging device to implement the image processing method of any of the foregoing embodiments.

The present disclosure may take the form of a computer program product implemented on one or more storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing program code. Machine-readable storage media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of machine-readable storage media include, but are not limited to: phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

The above are merely embodiments of the present disclosure and are not intended to limit it; any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.
Claims (17)

- An image processing method, comprising: obtaining, from captured first image data in a first data format, position information of a specified target in the first image data; cutting target data corresponding to the position information out of the first image data; and converting a data format of the target data from the first data format to a second data format, the second data format being suitable for display and/or transmission of the target data.
- The image processing method of claim 1, wherein obtaining, from the captured first image data in the first data format, the position information of the specified target in the first image data comprises: converting the first image data into second image data on which target detection can be performed; and detecting the position information of the specified target in the second image data, and determining the detected position information as the position information of the specified target in the first image data.
- The image processing method of claim 2, wherein detecting the position information of the specified target in the second image data and determining the detected position information as the position information of the specified target in the first image data comprises: inputting the second image data into a trained first neural network, the first neural network localizing and outputting the position information of the specified target through at least a convolutional layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature aggregation, and a bounding-box regression layer for performing coordinate transformation; and determining the position information of the specified target in the first image data according to an output of the first neural network.
- The image processing method of claim 2, wherein converting the first image data into the second image data on which target detection can be performed comprises: converting the first image data into the second image data on which target detection can be performed, using at least one of the following image processing methods: black level correction, white balance correction, color interpolation, contrast enhancement, and bit-width compression.
- The image processing method of claim 1, wherein obtaining, from the captured first image data in the first data format, the position information of the specified target in the first image data comprises: inputting the first image data into a trained second neural network, the second neural network converting the first image data into second image data on which target detection can be performed and detecting position information of the specified target in the second image data through at least a grayscale layer for performing grayscale conversion, a convolutional layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature aggregation, and a bounding-box regression layer for performing coordinate transformation; and determining the position information of the specified target in the first image data according to an output of the second neural network.
- The image processing method of any one of claims 1 to 5, wherein converting the data format of the target data from the first data format to the second data format comprises: inputting the target data into a trained third neural network, the third neural network converting the data format of the target data from the first data format to the second data format through at least a convolutional layer for performing convolution.
- The image processing method of any one of claims 1 to 5, wherein converting the data format of the target data from the first data format to the second data format comprises: performing image signal processor (ISP) processing on the target data, wherein the ISP processing converts the data format of the target data from the first data format to the second data format and comprises at least color interpolation.
- The image processing method of claim 7, wherein the ISP processing further comprises at least one of the following: white balance correction and curve mapping.
- An image processing apparatus, comprising: a first processing module configured to obtain, from captured first image data in a first data format, position information of a specified target in the first image data; a second processing module configured to cut target data corresponding to the position information out of the first image data; and a third processing module configured to convert a data format of the target data from the first data format to a second data format, the second data format being suitable for display and/or transmission of the target data.
- The image processing apparatus of claim 9, wherein the first processing module comprises a first processing unit and a second processing unit; the first processing unit is configured to convert the first image data into second image data on which target detection can be performed; and the second processing unit is configured to detect the position information of the specified target in the second image data and determine the detected position information as the position information of the specified target in the first image data.
- The image processing apparatus of claim 10, wherein the second processing unit is specifically configured to: input the second image data into a trained first neural network, and determine a result output by the first neural network as the position information of the specified target in the first image data, the first neural network localizing and outputting the position information of the specified target through at least a convolutional layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature aggregation, and a bounding-box regression layer for performing coordinate transformation.
- The image processing apparatus of claim 10, wherein the first processing unit is specifically configured to convert the first image data into second image data on which target detection can be performed, using at least one of black level correction, white balance correction, color interpolation, contrast enhancement, and bit-width compression.
- The image processing apparatus of claim 9, wherein the first processing module comprises a third processing unit configured to input the first image data into a trained second neural network, the second neural network converting the first image data into second image data on which target detection can be performed and detecting position information of the specified target in the second image data through at least a grayscale layer for performing grayscale conversion, a convolutional layer for performing convolution, a pooling layer for performing downsampling, a fully connected layer for performing feature aggregation, and a bounding-box regression layer for performing coordinate transformation; and the position information of the specified target in the first image data is determined according to a result output by the second neural network.
- The image processing apparatus of claim 9, wherein the third processing module comprises a fourth processing unit configured to input the target data into a trained third neural network, the third neural network converting the data format of the target data from the first data format to the second data format through at least a convolutional layer for performing convolution.
- The image processing apparatus of claim 9, wherein the third processing module comprises a fifth processing unit configured to perform ISP processing on the target data, wherein the ISP processing converts the data format of the target data from the first data format to the second data format and comprises at least color interpolation.
- An electronic device, comprising a processor and a memory, the memory storing a program callable by the processor; wherein, when the processor executes the program, the image processing method of any one of claims 1 to 8 is implemented.
- A machine-readable storage medium having a program stored thereon which, when executed by a processor, implements the image processing method of any one of claims 1 to 8.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810571964.XA CN110555877B (zh) | 2018-05-31 | 2018-05-31 | Image processing method, apparatus and device, and readable medium
CN201810571964.X | 2018-05-31 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019228450A1 true WO2019228450A1 (zh) | 2019-12-05 |
Family
ID=68698712
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/089249 WO2019228450A1 (zh) | 2018-05-31 | 2019-05-30 | 一种图像处理方法、装置及设备、可读介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110555877B (zh) |
WO (1) | WO2019228450A1 (zh) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113077516A (zh) * | 2021-04-28 | 2021-07-06 | 深圳市人工智能与机器人研究院 | Pose determination method and related device |
CN114387171A (zh) * | 2020-10-21 | 2022-04-22 | Oppo广东移动通信有限公司 | Image processing method and apparatus, storage medium, and electronic device |
CN115977496A (zh) * | 2023-02-24 | 2023-04-18 | 重庆长安汽车股份有限公司 | Vehicle door control method, system, device, and medium |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111110272B (zh) * | 2019-12-31 | 2022-12-23 | 深圳开立生物医疗科技股份有限公司 | Ultrasound image measurement information display method, apparatus, device, and readable storage medium |
CN114078168A (zh) * | 2020-08-19 | 2022-02-22 | Oppo广东移动通信有限公司 | Image processing model training method, image processing method, and electronic device |
RU2764395C1 (ru) | 2020-11-23 | 2022-01-17 | Самсунг Электроникс Ко., Лтд. | Способ и устройство для совместного выполнения дебайеризации и устранения шумов изображения с помощью нейронной сети |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170083752A1 (en) * | 2015-09-18 | 2017-03-23 | Yahoo! Inc. | Face detection |
CN107886074A (zh) * | 2017-11-13 | 2018-04-06 | 苏州科达科技股份有限公司 | Face detection method and face detection system |
CN108009524A (zh) * | 2017-12-25 | 2018-05-08 | 西北工业大学 | Lane line detection method based on fully convolutional network |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103873781B (zh) * | 2014-03-27 | 2017-03-29 | 成都动力视讯科技股份有限公司 | Wide dynamic range camera implementation method and apparatus |
US9881234B2 (en) * | 2015-11-25 | 2018-01-30 | Baidu Usa Llc. | Systems and methods for end-to-end object detection |
CN106529446A (zh) * | 2016-10-27 | 2017-03-22 | 桂林电子科技大学 | Vehicle model recognition method and system based on multi-block deep convolutional neural network |
CN107301383B (zh) * | 2017-06-07 | 2020-11-24 | 华南理工大学 | Road traffic sign recognition method based on Fast R-CNN |
CN107895378A (zh) * | 2017-10-12 | 2018-04-10 | 西安天和防务技术股份有限公司 | Target detection method and apparatus, storage medium, and electronic device |
CN107808139B (zh) * | 2017-11-01 | 2021-08-06 | 电子科技大学 | Real-time surveillance threat analysis method and system based on deep learning |
CN107871126A (zh) * | 2017-11-22 | 2018-04-03 | 西安翔迅科技有限责任公司 | Vehicle model recognition method and system based on deep neural network |
Also Published As
Publication number | Publication date |
---|---|
CN110555877B (zh) | 2022-05-31 |
CN110555877A (zh) | 2019-12-10 |
Legal Events

Date | Code | Title | Description
---|---|---|---
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19810790; Country of ref document: EP; Kind code of ref document: A1
NENP | Non-entry into the national phase | Ref country code: DE
122 | Ep: pct application non-entry in european phase | Ref document number: 19810790; Country of ref document: EP; Kind code of ref document: A1