
CN115643404A - Image processing method, device and system based on hybrid deep learning - Google Patents

Image processing method, device and system based on hybrid deep learning Download PDF

Info

Publication number
CN115643404A
Authority
CN
China
Prior art keywords
frame
image signal
image
coding
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211429849.1A
Other languages
Chinese (zh)
Inventor
卢金勤
傅清丁
邓芳名
朱立
彭仁夔
罗梓铭
段军华
朱淘淘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Kingroad Technology Development Co ltd
Original Assignee
Jiangxi Kingroad Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Kingroad Technology Development Co ltd filed Critical Jiangxi Kingroad Technology Development Co ltd
Priority to CN202211429849.1A priority Critical patent/CN115643404A/en
Publication of CN115643404A publication Critical patent/CN115643404A/en
Pending legal-status Critical Current

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides an image processing method, device and system based on hybrid deep learning, wherein the method comprises the following steps: acquiring an image signal; training the image signal through a convolutional neural network, with the optimization target set to the rate-distortion cost; extracting shallow network feature information of the image signal, classifying the image signal according to that information, and distinguishing salient image information from non-salient image information; performing a quantizing DCT (discrete cosine transform) on the image signal; coding the DCT-transformed image signal, wherein, when coding the salient image information, an intra-frame coding mode is adopted for the first frame of the salient image information and an inter-frame prediction coding mode for its other frames; performing the corresponding inverse DCT on the coded image signal and decoding it correspondingly; and uploading the decoded image signal. The invention can greatly improve compression performance, shrink the coded files and reduce memory redundancy.

Description

Image processing method, device and system based on hybrid deep learning
Technical Field
The invention relates to the technical field of electronics, in particular to an image processing method, device and system based on hybrid deep learning.
Background
Image compression falls mainly into still-image and non-still-image compression; the modes are chiefly lossy and lossless, using coding schemes such as Huffman coding, the JPEG algorithm, the LZW algorithm, wavelet coding and fractal coding. Vectorized coding technology and neural-network learning have also developed greatly. Mainstream data compression, as a memory-side data processing method, reduces memory load and is widely applied in monitoring fields such as traffic, software, aerospace and electric vehicles. Data are continuously collected, stored and transmitted in huge volumes, occupying large amounts of storage space and network bandwidth. Because the data are strongly correlated and contain much redundancy, compression can remove the redundant data, reduce storage occupancy and improve the transmission efficiency of the network bandwidth, so image data compression technology is a focus of current research.
Most existing compression methods compress the given data content as-is, without considering information sparsity or generality; they cannot single out meaningless content before compressing, so compression efficiency and precision are low. For example, the footage captured daily by highway surveillance video falls into three cases: no vehicle passes on the road; vehicles pass but none violates regulations; or a violation or traffic accident occurs. The first two cases happen every day and carry little significance, so storing many identical surveillance videos of this kind wastes much of the overall compression effort.
In order to solve the above problems, the present application provides an image processing method, apparatus and system based on hybrid deep learning. The method studies image compression technology against the background of non-still-image expressway video compression.
Disclosure of Invention
The present invention is directed to solving one of the problems set forth above.
The invention mainly aims to provide an image processing method based on hybrid deep learning.
Another object of the present invention is to provide an image processing apparatus based on hybrid deep learning.
It is still another object of the present invention to provide an image processing system based on hybrid deep learning.
In order to achieve the purpose, the technical scheme of the invention is realized as follows:
the invention provides an image processing method based on hybrid deep learning, which comprises the following steps: acquiring an image signal; training the image signal through a convolutional neural network, with the optimization target set to the rate-distortion cost; extracting shallow network feature information of the image signal, classifying the image signal according to that information, and distinguishing salient image information from non-salient image information; performing a quantizing DCT (discrete cosine transform) on the image signal, converting floating-point numbers to integers and reducing the symbols from the rational-number range to integers within a limited range, to obtain the DCT-transformed image signal; coding the DCT-transformed image signal to obtain a coded image signal, wherein, when coding the salient image information, an intra-frame coding mode is adopted for its first frame and an inter-frame prediction coding mode for its other frames; performing the corresponding inverse DCT on the coded image signal and decoding it correspondingly to obtain a decoded image signal; and uploading the decoded image signal.
In addition, the inter-frame prediction encoding mode specifically includes: for each frame currently to be coded, performing motion estimation and compensation on the previous frame, and subtracting the compensated frame from the current frame to obtain the corresponding residual frame.
Further, said performing a corresponding inverse DCT transform on said encoded image signal comprises: judging whether the frame is an intra-frame coding frame or an inter-frame coding frame, and performing compressed sensing reconstruction and DCT inverse transformation on the intra-frame coding frame; and for the inter-frame coding frame, firstly carrying out compressed sensing reconstruction, compensating image information by combining motion information, and carrying out operation with the residual frame.
Furthermore, the rate-distortion cost is: J = D + λR = L_cls + L_rec + λR, where L_cls is the cross-entropy loss of classification, L_rec is the reconstruction loss of feature compression, D is the coding distortion of the inter-frame prediction coding mode, λ is a Lagrange multiplier, and R is the bit count of the inter-frame prediction coding mode.
Another aspect of the present invention provides an image processing apparatus based on hybrid deep learning, the apparatus comprising: an image collector, an encoder and a decoder connected through optical fibers or wirelessly; the image collector is used for obtaining an image signal; the encoder is used for training the image signal through a convolutional neural network, with the optimization target set to the rate-distortion cost; extracting shallow network feature information of the image signal, classifying the image signal according to that information, and distinguishing salient image information from non-salient image information; performing a quantizing DCT (discrete cosine transform) on the image signal, converting floating-point numbers to integers and reducing the symbols from the rational-number range to integers within a limited range, to obtain the DCT-transformed image signal; and coding the DCT-transformed image signal to obtain a coded image signal, wherein, when coding the salient image information, an intra-frame coding mode is adopted for its first frame and an inter-frame prediction coding mode for its other frames; and the decoder is used for performing the corresponding inverse DCT on the coded image signal, decoding it correspondingly to obtain a decoded image signal, and uploading the decoded image signal.
In addition, the inter-frame prediction encoding mode specifically includes: for each frame currently to be coded, performing motion estimation and compensation on the previous frame, and subtracting the compensated frame from the current frame to obtain the corresponding residual frame.
Further, said performing a corresponding inverse DCT transform on said encoded image signal comprises: judging whether the frame is an intra-frame coding frame or an inter-frame coding frame; for the intra-frame coding frame, firstly carrying out compressed sensing reconstruction and then carrying out DCT inverse transformation; and for the inter-frame coding frame, firstly carrying out compressed sensing reconstruction, compensating image information by combining motion information, and carrying out operation with the residual frame.
Furthermore, the rate-distortion cost is: J = D + λR = L_cls + L_rec + λR, where L_cls is the cross-entropy loss of classification, L_rec is the reconstruction loss of feature compression, D is the coding distortion of the inter-frame prediction coding mode, λ is a Lagrange multiplier, and R is the bit count of the inter-frame prediction coding mode.
In addition, the image collector consists of a sensor and a data acquisition unit; the sensor completes photoelectric conversion by using a CCD to obtain original image data; the data acquisition unit collects R/G/B and YCbCr color space components in the original image data to form the image signal.
The invention also provides an image processing system based on hybrid deep learning, which comprises: the system comprises an external storage end, a cloud monitoring platform and the image processing device; the decoder is connected with the external storage end in a wired mode and uploads the decoded image signal to the external storage end; the decoder is connected with the cloud monitoring platform in a wireless mode and uploads the decoded image signals to the cloud monitoring platform.
As can be seen from the technical scheme provided above, the invention provides an image processing method, device and system based on hybrid deep learning that combine improved HEVC with DCT coding: an encoder composed of a deep-learning network processes the input image data, the intercepted image blocks are partitioned at sizes of 8 × 8, 16 × 16, 32 × 32 and 64 × 64, and intra-frame coding, inter-frame prediction coding, transform/quantization, filtering and a degree of entropy coding are applied, improving the ability to obtain the prediction probability of the input feature symbols so that the decoded output approximates the complete video source. The scheme performs non-still-image data compression with hybrid deep learning; its advantage is that an improved DCT is combined with a deep-learning classification algorithm on top of the traditional algorithm, salient and non-salient image information are classified by training a convolutional-neural-network model, and the salient information undergoes intra-frame coding and inter-frame prediction coding to reduce redundancy, thereby greatly improving compression performance, shrinking the coded files and reducing memory redundancy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
Fig. 1 is a flowchart of an image processing method based on hybrid deep learning according to embodiment 1 of the present invention;
fig. 2 is a schematic structural diagram of an image processing apparatus based on hybrid deep learning according to embodiment 1 of the present invention;
fig. 3 is a schematic structural diagram of an image processing system based on hybrid deep learning according to embodiment 1 of the present invention;
FIG. 4 is a system overview block diagram of an embodiment provided in example 1 of the present invention;
FIG. 5 is a flow chart of an image data compression process according to an embodiment of the present invention provided in example 1;
fig. 6 is a diagram of a deep learning classification network model according to an embodiment of the present invention provided in example 1.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it is to be understood that the terms "center", "longitudinal", "lateral", "up", "down", "front", "back", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like, indicate orientations or positional relationships based on those shown in the drawings, and are used only for convenience in describing the present invention and for simplicity in description, and do not indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and thus, are not to be construed as limiting the present invention. Furthermore, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying a relative importance or quantity or location.
In the description of the present invention, it should be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "connected" are to be construed broadly and may be, for example, fixedly connected, detachably connected, or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.
Embodiments of the present invention will be described in further detail below with reference to the accompanying drawings.
Example 1
The present embodiment provides an image processing method based on hybrid deep learning, which uses improved High Efficiency Video Coding (HEVC, also referred to as the H.265 coding technique) in combination with the Discrete Cosine Transform (DCT) to process images. As shown in fig. 1, the method specifically includes:
In step S101, an image signal is acquired.
Step S102, training the image signal through a convolutional neural network, with the optimization target set to the rate-distortion cost. Specifically, in an alternative embodiment, the rate-distortion cost is: J = D + λR = L_cls + L_rec + λR, where L_cls is the cross-entropy loss of classification, L_rec is the reconstruction loss of feature compression, D is the coding distortion of the inter-frame prediction coding mode, λ is a Lagrange multiplier, and R is the bit count of the inter-frame prediction coding mode. The optimization goal is to minimize the rate-distortion cost formed jointly by distortion and rate estimation, where the distortion part comprises the classification cross-entropy loss and the reconstruction loss of the feature compression module, and the rate part comprises the coded-file size of the feature map and the side information.
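For illustration, a minimal PyTorch sketch of this training objective follows; the logits, reconstruction and bit estimate are assumed to come from the classification head, the feature-compression decoder and the entropy model respectively, and all names here are placeholders rather than the patent's implementation.

```python
# Minimal sketch of the rate-distortion objective J = D + lambda*R = L_cls + L_rec + lambda*R.
# Assumes PyTorch; inputs are hypothetical model outputs, not the patent's actual modules.
import torch
import torch.nn.functional as F

def rate_distortion_cost(logits, labels, x_hat, x, bits_estimate, lam=0.01):
    l_cls = F.cross_entropy(logits, labels)      # classification cross-entropy loss (L_cls)
    l_rec = F.mse_loss(x_hat, x)                 # feature-compression reconstruction loss (L_rec)
    return l_cls + l_rec + lam * bits_estimate   # lambda weighs the estimated bit count R
```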
After the image signal is acquired, the partition depth must be determined from the input parameters, the I frame (i.e., the intra picture, usually the first frame) identified, and the following frames arranged in a strictly specified order for the subsequent encoding.
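A sketch of this frame ordering under the common assumption of a fixed group-of-pictures length (the GOP length here is an assumption, not given in the text):

```python
# Minimal sketch: the first frame of each group becomes an intra-coded I frame, the rest are
# inter-predicted P frames, kept in strict order. The GOP length of 32 is an assumption.
def frame_types(num_frames: int, gop: int = 32) -> list[str]:
    return ["I" if i % gop == 0 else "P" for i in range(num_frames)]
```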
Step S103, extracting shallow network characteristic information of the image signal, classifying the image signal according to the shallow network characteristic information, and distinguishing salient image information or non-salient image information from the image signal; specifically, the salient information and the non-salient information are classified based on the convolutional neural network model training, and intra-frame coding and inter-frame prediction coding can be subsequently performed on the salient information to reduce intra-frame redundancy, so that the image compression rate is improved, and the compression performance is improved.
Step S104, performing a quantizing DCT (discrete cosine transform) on the image signal, converting floating-point numbers to integers and reducing the number of symbols from the rational-number range to integers within a limited range, to obtain the DCT-transformed image signal.
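A minimal NumPy/SciPy sketch of this step, a block DCT followed by rounding to a limited integer range, is shown below; the block size and the uniform quantization step are assumptions, since the patent's modified DCT and quantization matrix are not reproduced here.

```python
# Minimal sketch of DCT quantization: float coefficients -> integers in a limited range.
import numpy as np
from scipy.fft import dctn

def dct_quantize(block: np.ndarray, step: float = 16.0) -> np.ndarray:
    coeffs = dctn(block, norm="ortho")               # floating-point DCT coefficients
    return np.round(coeffs / step).astype(np.int32)  # rational range -> limited integer range
```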
Step S105, encoding the DCT-transformed image signal to obtain an encoded image signal, wherein, when encoding the salient image information, an intra-frame encoding mode is used for its first frame and an inter-frame prediction encoding mode for its other frames. Specifically, the purpose of encoding is to reduce the amplitude of the non-zero coefficients and increase the number of zero values, laying the foundation for the subsequent entropy coding to achieve data compression. The first frame of the image adopts the intra-frame coding mode: after the DCT transform, which has decorrelating and energy-compacting properties, spatial correlation becomes uncorrelated data in the frequency domain and spatial redundancy is removed; the DCT coefficients are then quantized and entropy-coded, finally yielding a binary coded file.
In an optional embodiment, the inter-prediction encoding mode specifically includes: for each frame currently to be coded, performing motion estimation and compensation on the previous frame, and subtracting the compensated frame from the current frame to obtain the corresponding residual frame. Inter-prediction encoding removes temporal redundancy.
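A minimal sketch of this inter-prediction step, using an exhaustive block search over the previous frame; the block size and search radius are assumptions:

```python
# Minimal sketch: full-search block motion estimation on the previous frame, motion
# compensation, then residual = current - compensated. Illustrative only.
import numpy as np

def motion_compensate(prev: np.ndarray, cur: np.ndarray, bs: int = 8, search: int = 4):
    h, w = cur.shape
    comp = np.zeros_like(cur)
    for y in range(0, h - bs + 1, bs):
        for x in range(0, w - bs + 1, bs):
            target = cur[y:y + bs, x:x + bs].astype(np.float64)
            best, best_err = prev[y:y + bs, x:x + bs], np.inf
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy <= h - bs and 0 <= xx <= w - bs:
                        cand = prev[yy:yy + bs, xx:xx + bs]
                        err = float(np.sum((target - cand) ** 2))
                        if err < best_err:
                            best, best_err = cand, err
            comp[y:y + bs, x:x + bs] = best
    residual = cur.astype(np.int32) - comp.astype(np.int32)  # residual frame to be coded
    return comp, residual
```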
When intra prediction is performed on a coding unit of the salient image information, the nearest pixels in the neighboring coding units are taken as references, and the reference pixels are divided into five parts according to their position relative to the coding unit: lower-left, left, upper-left, upper and upper-right. Following the multiple-reference-line prediction technique, the usable reference pixel lines are extended to 3. Since some reference pixels may not be fully available, the unavailable pixels can be filled in by correction, for example by pixel interpolation, by assigning values twice before the transform, or by directly assigning values to the skipped pixels.
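As a sketch of filling unavailable reference pixels, here by propagating the nearest available sample, a simple stand-in for the interpolation and assignment schemes mentioned above:

```python
# Minimal sketch: fill unavailable samples in a 1-D reference line from the nearest
# available neighbor. The array layout is an assumption for illustration.
import numpy as np

def fill_reference(line: np.ndarray, valid: np.ndarray) -> np.ndarray:
    out = line.astype(np.float64).copy()
    if not valid.any():
        return out                      # nothing available to copy from
    last = None
    for i in range(len(out)):
        if valid[i]:
            last = out[i]
        elif last is not None:
            out[i] = last               # propagate nearest available pixel forward
    first = int(np.argmax(valid))
    out[:first] = out[first]            # back-fill any leading gap
    return out
```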
In addition, the H.265 coding technique is combined with prior deep-learning-based image compression to construct a compression network; inter-frame prediction coding and intra-frame coding remove redundancy from the data frames in the time domain and the spatial domain respectively, solving the problem of temporal data redundancy. The H.265 coding technique provides more tools for reducing the code rate: its coding units range over 8 × 8, 16 × 16, 32 × 32 and 64 × 64 blocks, i.e. from an 8 × 8 minimum block to a 64 × 64 maximum block. Areas with little image information (where color changes little, for example the red part of a vehicle body or the gray part of the ground in expressway monitoring images and videos) are divided into large macroblocks that yield few code words, while detailed areas (such as tires) are divided into correspondingly smaller and more numerous macroblocks that yield more code words; the image is thus coded where it matters, the overall code count falls, and coding efficiency improves. In addition, the intra prediction of the H.265 coding technique supports 33 directions (by comparison, the H.264 coding technique supports only 8), and better motion compensation and vector prediction methods are provided.
In addition, the data distortion of the coding aggregate is treated as a random experiment: the interval of distortion values each coding unit may produce serves as the sample space, and the distortion values are random events, each with its own probability of occurrence. To evaluate the degree of distortion, the method takes the mean squared error (MSE) as its index: the MSE is the mean of the squared differences between two signals, comparing them pixel by pixel:

MSE = (1 / (W × H)) · Σ (x − x̂)²

where x and x̂ denote the original image and the reconstructed image respectively, Σ (x − x̂)² sums the squared pixel differences, and W × H is the width × height of the image. The lower the MSE, the better the reconstructed image, i.e. the better the compression.
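A minimal sketch of this distortion metric:

```python
# Minimal sketch of the MSE between an original and a reconstructed frame; lower is better.
import numpy as np

def mse(x: np.ndarray, x_hat: np.ndarray) -> float:
    return float(np.mean((x.astype(np.float64) - x_hat.astype(np.float64)) ** 2))
```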
Step S106, performing the corresponding inverse DCT transform on the coded image signal and decoding it correspondingly to obtain a decoded image signal. In an alternative embodiment, the corresponding inverse DCT transform of the encoded image signal comprises: judging whether a frame is an intra-coded frame or an inter-coded frame; for an intra-coded frame, performing compressed-sensing reconstruction and then the inverse DCT; for an inter-coded frame, first performing compressed-sensing reconstruction, then compensating the image information with the motion information and combining the result with the residual frame.
When performing the inverse DCT, high-frequency components are discarded during the inverse transform to prevent inverse-transform error from making lossless recovery of the original information impossible.
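A minimal sketch of discarding the high-frequency components before the inverse DCT; the size of the retained low-frequency zone is an assumption:

```python
# Minimal sketch: zero out high-frequency coefficients, then apply the inverse DCT.
import numpy as np
from scipy.fft import idctn

def lowpass_idct(coeffs: np.ndarray, keep: int = 4) -> np.ndarray:
    masked = np.zeros_like(coeffs, dtype=np.float64)
    masked[:keep, :keep] = coeffs[:keep, :keep]   # keep only the low-frequency zone
    return idctn(masked, norm="ortho")
```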
In step S107, the decoded image signal is uploaded. Specifically, the data can be uploaded to the cloud monitoring platform through a wireless router connection.
The hybrid-deep-learning image processing method provided in this embodiment combines improved HEVC with DCT coding: an encoder composed of a deep-learning network processes the input image data, the captured image blocks are partitioned at sizes of 8 × 8, 16 × 16, 32 × 32 and 64 × 64, and intra-frame coding, inter-frame prediction coding, transform/quantization, filtering and a degree of entropy coding are applied, improving the ability to obtain the prediction probability of the input feature symbols so that the decoded output approximates the complete video source. The scheme performs non-still-image data compression with hybrid deep learning; its advantage is that an improved DCT is combined with a deep-learning classification algorithm on top of the traditional algorithm, salient and non-salient image information are classified by training a convolutional-neural-network model, and the salient information undergoes intra-frame coding and inter-frame prediction coding to reduce redundancy, thereby greatly improving compression performance, shrinking the coded files and reducing memory redundancy.
The present embodiment also provides an image processing apparatus based on hybrid deep learning, as shown in fig. 2, the image processing apparatus 20 includes: an image collector 201, an encoder 202 and a decoder 203 which are connected through optical fibers or wirelessly; specifically, each part of the image processing apparatus 20 is composed of electrical sensor nodes, and the nodes are connected by optical fibers or wireless communication.
An image collector 201 for obtaining an image signal; in a specific embodiment, the image collector 201 may be composed of a sensor and a data collecting unit; the sensor completes photoelectric conversion by using a CCD to obtain original image data; the data acquisition unit collects R/G/B and YCbCr color space components in the original image data to form an image signal.
The number of image collectors 201 can be several, configured on site as needed. In an optional embodiment, a plurality of image collectors 201 may be installed on the external monitoring equipment set up on the highway and used for receiving parameter data transmitted from the cameras in real time. The image collector 201 can be installed on the complete external monitoring equipment through RS-485 and RS-232 data interfaces, and is connected to the front-end camera equipment through RS-485 and RS-232 lines and 485/232 interfaces. For example, the external input monitoring device of the image acquirer 201 may be a POE bullet camera or other accessory device based on the H.265 coding mode; alternatively, the image capturing device may be built into the image collector 201. In addition, the image collectors 201 may be connected through optical fiber or a data-line concentrator via insulating plugs with signal shielding, the optical fiber being covered with an insulating sheath. The image collector 201 inputs the collected image signal to the encoder 202 through the shielded optical-fiber connection.
The image collector 201 can support various communication modes, has a Wide Area Network (WAN) port, a Local Area Network (LAN) port, an RS232 interface, an inter-integrated circuit (I2C) port, a transistor-transistor logic (TTL) level serial port, a 4-way switching value input interface, an 8-way analog value input interface (12 bit AD, 4-20mA current or 0-5V voltage signal support), a 4-way relay output interface, a 5-way power output interface (peripheral power supply) and other various interfaces, and can be compatible with the collection requirements of various sensors. For example, the image collector 201 may utilize an embedded TCP/IP protocol communication interface to transmit data from the image collector 201 to the encoder 202 or the cloud device; the image collector 201 may also be connected to the encoder 202 by means of wireless communication transmission. In order to provide connection-oriented, reliable, full-duplex, point-to-point communication services, both communicating parties need to establish a TCP connection before communicating using TCP. When it is desired to end the communication, the TCP connection may be terminated by either party to the communication. To ensure that both the establishment and termination of the connection are reliable, TCP uses a three-way handshake.
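For illustration, a minimal sketch of opening such a TCP connection from the collector side; the operating system performs the three-way handshake inside the connect call, and the host, port and payload are placeholders:

```python
# Minimal sketch: establish a TCP connection (three-way handshake done by the OS) and send data.
import socket

with socket.create_connection(("192.0.2.10", 9000), timeout=5) as sock:  # placeholder endpoint
    sock.sendall(b"frame-data")  # reliable, full-duplex, point-to-point transfer
```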
The encoder 202 is used for training the image signal through a convolutional neural network, with the optimization target set to the rate-distortion cost. Specifically, in an alternative embodiment, the rate-distortion cost is: J = D + λR = L_cls + L_rec + λR, where L_cls is the cross-entropy loss of classification, L_rec is the reconstruction loss of feature compression, D is the coding distortion of the inter-frame prediction coding mode, λ is a Lagrange multiplier, and R is the bit count of the inter-frame prediction coding mode. The optimization goal is to minimize the rate-distortion cost formed jointly by distortion and rate estimation, where the distortion part comprises the classification cross-entropy loss and the reconstruction loss of the feature compression module, and the rate part comprises the coded-file size of the feature map and the side information.
After the image signal is acquired, the partition depth must be determined from the input parameters, the I frame (i.e., the intra picture, usually the first frame) identified, and the following frames arranged in a strictly specified order for the subsequent encoding.
The encoder 202 is further configured to extract shallow layer network feature information of the image signal, classify the image signal according to the shallow layer network feature information, and distinguish salient image information or non-salient image information from the image signal; specifically, the salient information and the non-salient information are classified based on the convolutional neural network model training, and intra-frame coding and inter-frame prediction coding can be subsequently performed on the salient information to reduce intra-frame redundancy, so that the image compression rate is improved, and the compression performance is improved.
The encoder 202 is further configured to perform a quantizing DCT on the image signal, converting floating-point numbers to integers and reducing the number of symbols from the rational-number range to integers within a limited range, to obtain the DCT-transformed image signal.
The encoder 202 is further configured to encode the DCT-transformed image signal to obtain an encoded image signal, where, when encoding the salient image information, an intra-frame encoding mode is used for its first frame and an inter-frame prediction encoding mode for its other frames. Specifically, the purpose of encoding is to reduce the amplitude of the non-zero coefficients and increase the number of zero values, laying the foundation for the subsequent entropy coding to achieve data compression. The first frame of the image adopts the intra-frame coding mode: after the DCT transform, which has decorrelating and energy-compacting properties, spatial correlation becomes uncorrelated data in the frequency domain and spatial redundancy is removed; the DCT coefficients are then quantized and entropy-coded, finally yielding a binary coded file.
In an optional embodiment, the inter-frame prediction coding mode specifically includes: for each frame currently to be coded, performing motion estimation and compensation on the previous frame, and subtracting the compensated frame from the current frame to obtain the corresponding residual frame. Inter-frame prediction coding removes temporal redundancy.
When intra prediction is performed on a coding unit of the salient image information, the nearest pixels in the neighboring coding units are taken as references, and the reference pixels are divided into five parts according to their position relative to the coding unit: lower-left, left, upper-left, upper and upper-right. Following the multiple-reference-line prediction technique, the usable reference pixel lines are extended to 3. Since some reference pixels may not be fully available, the unavailable pixels can be filled in by correction, for example by pixel interpolation, by assigning values twice before the transform, or by directly assigning values to the skipped pixels.
In addition, the H.265 coding technique is combined with prior deep-learning-based image compression to construct a compression network; inter-frame prediction coding and intra-frame coding remove redundancy from the data frames in the time domain and the spatial domain respectively, solving the problem of temporal data redundancy. The H.265 coding technique provides more tools for reducing the code rate: its coding units range over 8 × 8, 16 × 16, 32 × 32 and 64 × 64 blocks, i.e. from an 8 × 8 minimum block to a 64 × 64 maximum block. Areas with little image information (where color changes little, for example the red part of a vehicle body or the gray part of the ground in expressway monitoring images and videos) are divided into large macroblocks that yield few code words, while detailed areas (such as tires) are divided into correspondingly smaller and more numerous macroblocks that yield more code words; the image is thus coded where it matters, the overall code count falls, and coding efficiency improves. In addition, the intra prediction of the H.265 coding technique supports 33 directions (by comparison, the H.264 coding technique supports only 8), and better motion compensation and vector prediction methods are provided.
In addition, the data distortion of the coding aggregate is treated as a random experiment: the interval of distortion values each coding unit may produce serves as the sample space, and the distortion values are random events, each with its own probability of occurrence. To evaluate the degree of distortion, the method takes the mean squared error (MSE) as its index: the MSE is the mean of the squared differences between two signals, comparing them pixel by pixel:

MSE = (1 / (W × H)) · Σ (x − x̂)²

where x and x̂ denote the original image and the reconstructed image respectively, Σ (x − x̂)² sums the squared pixel differences, and W × H is the width × height of the image. The lower the MSE, the better the reconstructed image, i.e. the better the compression.
The decoder 203 is configured to perform corresponding inverse DCT transform on the encoded image signal, perform corresponding decoding, obtain a decoded image signal, and upload the decoded image signal. Specifically, the decoder 203 may be connected to the encoder 202 through an optical fiber, may be connected to the cloud monitoring platform through a wireless router, and transmits data to the cloud monitoring platform. In addition, in a specific implementation manner, the decoder 203 may include an image parsing module, an image selecting module, and an image encoding module, and the three modules may further include a signal analyzing unit, a storage unit, a display unit, and an inquiring unit; the display unit is connected with the signal analysis unit.
In an alternative embodiment, the decoder 203 performing the corresponding inverse DCT transform on the encoded image signal comprises: judging whether a frame is an intra-coded frame or an inter-coded frame; for an intra-coded frame, performing compressed-sensing reconstruction and then the inverse DCT; for an inter-coded frame, first performing compressed-sensing reconstruction, then compensating the image information with the motion information and combining the result with the residual frame.
When performing the inverse DCT, high-frequency components are discarded during the inverse transform to prevent inverse-transform error from making lossless recovery of the original information impossible.
The hybrid-deep-learning image processing apparatus provided in this embodiment combines improved HEVC with DCT coding: an encoder composed of a deep-learning network processes the input image data, the captured image blocks are partitioned at sizes of 8 × 8, 16 × 16, 32 × 32 and 64 × 64, and intra-frame coding, inter-frame prediction coding, transform/quantization, filtering and a degree of entropy coding are applied, improving the ability to obtain the prediction probability of the input feature symbols so that the decoded output approximates the complete video source. The scheme performs non-still-image data compression with hybrid deep learning; its advantage is that an improved DCT is combined with a deep-learning classification algorithm on top of the traditional algorithm, salient and non-salient image information are classified by training a convolutional-neural-network model, and the salient information undergoes intra-frame coding and inter-frame prediction coding to reduce redundancy, thereby greatly improving compression performance, shrinking the coded files and reducing memory redundancy.
In addition, the image processing apparatus 20 may further include an external device kit, the external device kit is connected with a capacitor in parallel, and the electrical connection of the external device kit includes: the voltage-current transformer, the circuit breaker, the reactor, the grounding switch, the discharge coil, the capacitor bank and the like can provide complete power supply and circuit circulation functions.
The embodiment also provides an image processing system based on hybrid deep learning, as shown in fig. 3, the image processing system based on hybrid deep learning includes: an external storage 301, a cloud monitoring platform 302, and the image processing apparatus 20; the decoder 203 is connected with the external storage terminal 301 in a wired mode, and uploads the decoded image signal to the external storage terminal 301; the decoder 203 is connected to the cloud monitoring platform 302 in a wireless manner, and uploads the decoded image signal to the cloud monitoring platform 302.
The embodiment also provides a specific implementation of the image processing system of the invention. The implementation can be applied to an expressway monitoring system: by compressing the expressway-side monitoring video data received in real time, it reduces the required network transmission bandwidth and removes redundant data, making highway emergency response and maintenance diagnosis more convenient and improving the reliability of real-time traffic monitoring and control. The implementation is as follows:
Fig. 4 is the general block diagram of the system of this embodiment. The system comprises an input end, an image collector, an encoder, a decoder, an external storage end and a cloud monitoring platform, wherein the image collector comprises a signal sensor, a data acquisition and transmission instrument, and a camera device. The signal sensor is installed at the tail of the external complete equipment; the data acquisition and transmission instrument can be installed in an externally mounted power distribution cabinet using an RS-485 line; the camera device adopts a POE (Power over Ethernet) bullet camera that supports the H.265 coding mode, acquires the image conditions in real time, and allows the monitoring period to be set to 0.1 ms. The data acquisition and transmission instrument is connected to the encoder and transmits over wireless communication; the encoder is connected to the decoder through an optical fiber; the decoder is connected to the external storage end by wire and to the cloud monitoring platform through a wireless route. After the image data processed by the decoder are stored at the external storage end, they can be called up to the cloud monitoring platform through an RS-485 data line.
Fig. 5 shows the flow of the image data compression processing of this embodiment. After the input signal X is obtained, a lightweight deep-learning classification network model is built to reduce network parameters and computation. The model improves on the traditional CNN algorithm by combining the neural network with a dynamic probability model: five convolution layers first model the probability of the features, the feature compression module is optimized, the information entropy of the features is estimated, the network parameters are fixed for classification, DCT quantization is applied, the code-word size of the coded file is estimated through the entropy-coding probability model, and end-to-end rate-distortion optimization is finally achieved.
Fig. 6 is a diagram of a deep learning classification network model according to the present embodiment, and the encoder is mainly composed of a deep learning classification network. After receiving an input image signal, a five-layer convolution network is adopted, and a GND layer is used between each two layers to increase the nonlinear capacity, so that the neural network training is accelerated. The size of the convolution kernel is 3 × 3, and the second and fourth layers are down-sampled by a factor of 2 by an operation of step size 2. The part of the decoder is symmetric to the encoder, where the second and fourth layers are 2 times up-sampled deconvolution operations.
In this embodiment, the deep-learning network structure trains the input image signals to obtain a classification result, separating salient image information from non-salient image information. During training, the classification network parameters are fixed and only the feature compression module is optimized. The neural network's optimization target is the rate-distortion cost, formed from minimum distortion and code-rate estimation: the distortion part comprises the classification cross-entropy loss and the feature-compression reconstruction loss, and the rate part comprises the coded-file size of the feature map and the side information. It can be written as: J = D + λR = L_cls + L_rec + λR, where L_cls is the cross-entropy loss of classification, L_rec is the reconstruction loss of feature compression, D is the coding distortion of the inter-frame prediction coding mode, λ is a Lagrange multiplier, and R is the bit count of the inter-frame prediction coding mode. By adjusting the value of λ, feature compression results at different code rates are produced at the three positions Layer1, Layer2 and Layer3. Layer1 refers to the first layer of the convolutional neural network, i.e. the input layer; Layer2 and Layer3 refer to its second and third layers respectively. As the network deepens, the features it presents become more abstract and the code rate needed for the same classification precision gradually falls: later features are less sensitive to the loss caused by compression. The less sensitive feature parameters are designated non-salient image information and the sensitive ones salient image information, so that most of the energy of the less sensitive parameters concentrates in a small frequency-domain range; describing these less important components therefore requires fewer bits, which benefits the subsequent DCT quantization.
When intra prediction is performed on a coding unit of the salient image information, the nearest pixels in the neighboring coding units are taken as references, and the reference pixels are divided into five parts according to their position relative to the coding unit: lower-left, left, upper-left, upper and upper-right. Following the multiple-reference-line prediction technique, the usable reference pixel lines are extended to 3. Since some reference pixels may not be fully available, the unavailable pixels can be filled in by correction, for example by pixel interpolation, by assigning values twice before the transform, or by directly assigning values to the skipped pixels.
The N × N salient feature blocks produced by deep-learning network training are quantization-transformed: the obtained salient information is key-framed and coded with a modified DCT in which the input sequence is twice the length of the output sequence; the modified DCT over a one-dimensional real-valued sequence of length N is defined below.
The transformation matrix obtained by simplifying and correcting the two one-dimensional DCT components is given in the original as an image; all element values of the corresponding correction matrix are 1/128.
In the encoding process, each sub-block is first converted by the two-dimensional DCT into an n × n matrix of DCT transform coefficients, in which the first value in the top-left corner, the DC component, expresses the average of the pixel values of the n × n spatial-domain image sub-block, and the remaining n² − 1 coefficients are the AC components. The input sequence is twice the length of the output sequence.
For a one-dimensional real-valued sequence x(n) of length N, the improved DCT transform (given as an image in the original) maps x(n), the original image data, to the modified-DCT-transformed data, where N denotes the sequence length and is a positive integer.
After quantization, entropy coding is performed to remove redundancy. The minimum average code length of a coded file is determined by the true marginal distribution m(y) of its symbols:

R = E_{y ∼ m(y)}[ −log₂ P(y) ]

where R is the average code length and P(y) is the entropy-coding probability model; R attains its minimum when P(y) coincides with m(y). The entropy-coding model is a dynamic coding model: it counts the probability of a symbol appearing locally in a given region, or given a certain context, so the obtained probability distribution is more accurate and adapts better to fluctuations of the symbol probabilities across regions. The modeling therefore starts from the simple factorized assumption on the marginal distribution of the quantized feature y:

P(y) = ∏ᵢ P(yᵢ)

i.e. the symbol probability distributions at the different spatial locations are mutually independent. This assumption cannot model the spatial correlation of y and does not match the true distribution, so a side-information extraction network is designed to learn to construct the conditional probability dependence adaptively. The side-information network adopts an encoder/decoder structure and reduces the data volume of the side information by lowering the spatial resolution and channel count of the features. After the encoder's three convolution layers, the receptive field on the feature map reaches 15 × 15, corresponding to 285 × 285 on the input image, which for most images is enough to contain the largest image block.
When performing the inverse DCT, high-frequency components are discarded during the inverse transform to prevent inverse-transform error from making lossless recovery of the original information impossible.
Assuming an image of size M × M, its inverse transform can be represented by an orthogonal frequency-domain transform (the expression is given as an image in the original), where m = 1, 2, …, M; n = 1, 2, …, M; and M, a positive integer, is the side length of the image.
This scheme is a hybrid-deep-learning compression system for non-still image data. Its advantage is that an improved DCT and a deep-learning classification algorithm are combined and applied to the traditional algorithm: salient and non-salient image information are classified by training the neural-network model, and the salient information undergoes intra-frame and inter-frame prediction coding to reduce redundancy, greatly improving compression performance, shrinking the coded files and reducing memory redundancy. In this embodiment the compressed data are uploaded to a cloud monitoring platform, facilitating subsequent, deeper study of high-speed traffic information.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are exemplary and not to be construed as limiting the present invention, and that those skilled in the art may make variations, modifications, substitutions and alterations within the scope of the present invention without departing from the spirit and scope of the present invention. The scope of the invention is defined by the appended claims and equivalents thereof.

Claims (10)

1. An image processing method based on hybrid deep learning, comprising:
acquiring an image signal;
training the image signal through a convolutional neural network, and setting the optimization target as the rate-distortion cost;
extracting shallow network characteristic information of the image signal, classifying the image signal according to the shallow network characteristic information, and distinguishing salient image information or non-salient image information from the image signal;
carrying out a quantizing DCT (discrete cosine transform) on the image signal, converting floating-point numbers to integers and reducing the symbols from the rational-number range to integers within a limited range, to obtain the DCT-transformed image signal;
coding the image signal after DCT transformation to obtain a coded image signal, wherein when coding the salient image information, an intra-frame coding mode is adopted for a head frame of the salient image information, and an inter-frame prediction coding mode is adopted for other frames of the salient image information;
performing a corresponding inverse DCT on the coded image signal and decoding it correspondingly to obtain a decoded image signal;
and uploading the decoded image signal.
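
By way of editorial illustration only, and not as part of the claims: a minimal sketch of the transform-and-quantize step recited in claim 1 might look as follows, assuming 8×8 blocks, scipy's orthonormal 2-D DCT, and a single uniform quantization step q (the function names and the value q = 16 are hypothetical):

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_quantize_block(block: np.ndarray, q: float = 16.0) -> np.ndarray:
    """Apply a 2-D DCT to one 8x8 pixel block and round the floating-point
    coefficients to integer symbols, shrinking the symbol range as in claim 1."""
    coeffs = dctn(block.astype(np.float64), norm="ortho")  # floating-point coefficients
    return np.round(coeffs / q).astype(np.int32)           # limited-range integer symbols

def dequantize_idct_block(symbols: np.ndarray, q: float = 16.0) -> np.ndarray:
    """Inverse step used on the decoding side: rescale the integer symbols
    and apply the inverse DCT."""
    return idctn(symbols.astype(np.float64) * q, norm="ortho")
```

Dividing by q and rounding is what reduces the rational-valued coefficients to a limited alphabet of integer symbols; a larger q shrinks the alphabet further and compresses harder at the cost of reconstruction error.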
2. The image processing method according to claim 1,
the inter-frame prediction coding mode specifically comprises: for each frame currently to be coded, performing motion estimation and compensation with respect to the previous frame, and subtracting the motion-compensated frame from the current frame to be coded to obtain the corresponding residual frame.
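
Again purely as an editorial sketch of claim 2, a brute-force full-search realization of this step could be written as below; grayscale frames whose dimensions are exact multiples of the block size are assumed, and block, search and all function names are hypothetical (practical encoders use far faster search strategies):

```python
import numpy as np

def motion_compensate(prev: np.ndarray, cur: np.ndarray,
                      block: int = 16, search: int = 8) -> np.ndarray:
    """Predict `cur` from `prev` by full-search block motion estimation.
    Assumes frame height/width are exact multiples of `block`."""
    h, w = cur.shape
    pred = np.empty_like(cur, dtype=np.float64)
    for y in range(0, h, block):
        for x in range(0, w, block):
            target = cur[y:y + block, x:x + block].astype(np.float64)
            best, best_err = None, np.inf
            # Search a (2*search+1)^2 window of candidate displacements.
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy <= h - block and 0 <= xx <= w - block:
                        cand = prev[yy:yy + block, xx:xx + block].astype(np.float64)
                        err = float(np.sum((cand - target) ** 2))
                        if err < best_err:
                            best, best_err = cand, err
            pred[y:y + block, x:x + block] = best
    return pred

def residual_frame(prev: np.ndarray, cur: np.ndarray) -> np.ndarray:
    """Residual of claim 2: current frame minus its motion-compensated prediction."""
    return cur.astype(np.float64) - motion_compensate(prev, cur)
```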
3. The image processing method according to claim 2,
said performing a corresponding inverse DCT on the coded image signal comprises:
judging whether a frame is an intra-coded frame or an inter-coded frame; for an intra-coded frame, performing compressed-sensing reconstruction and then an inverse DCT; and for an inter-coded frame, first performing compressed-sensing reconstruction, then compensating the image information by combining the motion information and adding the decoded residual frame to the motion-compensated prediction.
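
A structural sketch of this decoder-side branching follows; note that the compressed-sensing reconstruction is reduced to a bare dequantization placeholder and motion compensation is stubbed out as zero motion, both loud simplifications rather than the patent's actual steps:

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np
from scipy.fft import idctn

@dataclass
class EncodedFrame:
    is_intra: bool
    symbols: np.ndarray  # quantized coefficients (intra) or residual coefficients (inter)
    # motion vectors omitted: this sketch assumes zero motion

def decode_frame(f: EncodedFrame, prev: Optional[np.ndarray], q: float = 16.0) -> np.ndarray:
    # Placeholder for the compressed-sensing reconstruction stage:
    # here it is just the dequantization rescaling.
    coeffs = f.symbols.astype(np.float64) * q
    # Frame-sized inverse DCT for brevity (a real codec works blockwise).
    pixels = idctn(coeffs, norm="ortho")
    if f.is_intra:
        return pixels                 # intra: the reconstruction is the frame itself
    # Inter: zero-motion placeholder prediction, then add the decoded residual.
    return prev + pixels
```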
4. The image processing method according to any one of claims 1 to 3, wherein the rate-distortion cost is:
J = D + λR = L_cls + L_rec + λR
wherein L_cls is the classification cross-entropy loss, L_rec is the reconstruction loss of the feature compression, D is the coding distortion of the inter-frame prediction coding mode, λ is the Lagrange multiplier, and R is the number of bits of the inter-frame prediction coding mode.
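
Numerically, the cost of claim 4 is a one-line computation; in the sketch below the loss values and the weight λ = 0.1 are invented for the example:

```python
def rate_distortion_cost(l_cls: float, l_rec: float, bits: float, lam: float = 0.1) -> float:
    """J = D + λR, with the distortion D decomposed into the classification
    cross-entropy loss plus the feature-compression reconstruction loss."""
    return (l_cls + l_rec) + lam * bits

# Example: J = (0.35 + 1.20) + 0.1 * 2048 ≈ 206.35
print(rate_distortion_cost(0.35, 1.20, 2048))
```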
5. An image processing apparatus based on hybrid deep learning, comprising an image collector, an encoder and a decoder connected by optical fiber or wirelessly;
the image collector is used for obtaining an image signal;
the encoder is used for training the image signal through a convolutional neural network, with the rate-distortion cost set as the optimization target; extracting shallow network feature information of the image signal, classifying the image signal according to the shallow network feature information, and distinguishing salient from non-salient image information in the image signal; quantizing the image signal by means of a DCT (discrete cosine transform), converting floating-point values into integers so that the symbols are reduced from the range of rational numbers to a limited range of integer symbols, thereby obtaining the DCT-transformed image signal; and coding the DCT-transformed image signal to obtain a coded image signal, wherein, when coding the salient image information, an intra-frame coding mode is adopted for the head frame of the salient image information and an inter-frame prediction coding mode is adopted for the other frames of the salient image information;
and the decoder is used for performing a corresponding inverse DCT on the coded image signal, decoding it correspondingly to obtain a decoded image signal, and uploading the decoded image signal.
6. The image processing apparatus according to claim 5,
the inter-frame prediction coding mode specifically comprises: for each frame currently to be coded, performing motion estimation and compensation with respect to the previous frame, and subtracting the motion-compensated frame from the current frame to be coded to obtain the corresponding residual frame.
7. The image processing apparatus according to claim 6,
said performing a corresponding inverse DCT on the coded image signal comprises:
judging whether a frame is an intra-coded frame or an inter-coded frame;
for an intra-coded frame, first performing compressed-sensing reconstruction and then an inverse DCT; and for an inter-coded frame, first performing compressed-sensing reconstruction, then compensating the image information by combining the motion information and adding the decoded residual frame to the motion-compensated prediction.
8. The image processing apparatus according to claim 5, wherein the rate-distortion cost is:
J = D + λR = L_cls + L_rec + λR
wherein L_cls is the classification cross-entropy loss, L_rec is the reconstruction loss of the feature compression, D is the coding distortion of the inter-frame prediction coding mode, λ is the Lagrange multiplier, and R is the number of bits of the inter-frame prediction coding mode.
9. The image processing device of claim 5, wherein the image collector is composed of a sensor and a data acquisition unit;
the sensor performs photoelectric conversion using a CCD to obtain raw image data;
the data acquisition unit collects the R/G/B and YCbCr color-space components of the raw image data to form the image signal.
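
The claim does not fix a particular RGB-to-YCbCr convention; a sketch under the common BT.601 full-range (JPEG-style) assumption is:

```python
import numpy as np

def rgb_to_ycbcr(rgb: np.ndarray) -> np.ndarray:
    """Convert an (..., 3) RGB array to YCbCr using the BT.601 full-range
    matrix (an assumption; the patent does not specify the variant)."""
    r = rgb[..., 0].astype(np.float64)
    g = rgb[..., 1].astype(np.float64)
    b = rgb[..., 2].astype(np.float64)
    y  =  0.299    * r + 0.587    * g + 0.114    * b
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128.0
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128.0
    return np.stack([y, cb, cr], axis=-1)
```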
10. An image processing system based on hybrid deep learning, comprising: an external storage terminal, a cloud monitoring platform, and the image processing apparatus according to any one of claims 5 to 9;
the decoder is connected to the external storage terminal by wire and uploads the decoded image signal to the external storage terminal;
the decoder is connected to the cloud monitoring platform wirelessly and uploads the decoded image signal to the cloud monitoring platform.

Priority Applications (1)

Application Number: CN202211429849.1A · Priority Date: 2022-11-16 · Filing Date: 2022-11-16 · Title: Image processing method, device and system based on hybrid deep learning

Publications (1)

Publication Number: CN115643404A · Publication Date: 2023-01-24

Family ID: 84947937

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104113763A (en) * 2013-04-22 2014-10-22 江西理工大学 Optimized integer transform radix applied to image coding
CN111432207A (en) * 2020-03-30 2020-07-17 北京航空航天大学 Perceptual high-definition video coding method based on salient target detection and salient guidance
US20210035331A1 (en) * 2019-07-31 2021-02-04 Hewlett Packard Enterprise Development Lp Deep neural network color space optimization
WO2022068716A1 (en) * 2020-09-30 2022-04-07 华为技术有限公司 Entropy encoding/decoding method and device
CN114820316A (en) * 2022-04-28 2022-07-29 哈尔滨理工大学 Video image super-resolution recovery system based on deep learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Z. Bojkovic: "Multimedia coding using adaptive regions of interest" *
朱威: "HEVC region-of-interest coding method based on intelligent object detection" *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116781916A (en) * 2023-08-21 2023-09-19 北京中星微人工智能芯片技术有限公司 Vehicle image storage method, apparatus, electronic device, and computer-readable medium
CN116781916B (en) * 2023-08-21 2023-11-17 北京中星微人工智能芯片技术有限公司 Vehicle image storage method, apparatus, electronic device, and computer-readable medium

Similar Documents

Publication Title
Hu et al. Coarse-to-fine deep video coding with hyperprior-guided mode prediction
CN101159875B (en) Double forecast video coding/decoding method and apparatus
CN109495741B (en) Image compression method based on self-adaptive down-sampling and deep learning
CN103108182B (en) The universal compressed method of multi-source foreign peoples unmanned plane reconnaissance image
CN103561263A (en) Motion compensation prediction method based on motion vector restraint and weighting motion vector
CN106385584B (en) The adaptively sampled coding method of distributed video compressed sensing based on spatial correlation
KR20010075232A (en) Encoding method for the compression of a video sequence
CN107027025B (en) A kind of light field image compression method based on macro block of pixels adaptive prediction
CN101841713B (en) Video coding method for reducing coding code rate and system
KR20100024484A (en) Joint coding of multiple transform blocks with reduced number of coefficients
CN110493596A (en) A kind of video coding framework neural network based
CN102256133A (en) Distributed video coding and decoding method based on side information refining
Zhang et al. Efficient CTU-based intra frame coding for HEVC based on deep learning
CN111726614A (en) HEVC (high efficiency video coding) optimization method based on spatial domain downsampling and deep learning reconstruction
CN105825530B (en) Littoral zone high spectrum image distribution lossy coding and coding/decoding method based on area-of-interest
CN115643404A (en) Image processing method, device and system based on hybrid deep learning
CN102595132A (en) Distributed video encoding and decoding method applied to wireless sensor network
CN104702959A (en) Intra-frame prediction method and system of video coding
CN104780383B (en) A kind of 3D HEVC multi-resolution video coding methods
CN110913232B (en) Selection method and device of TU division mode and readable storage medium
CN101854554A (en) Video encoding and decoding system based on image inpainting predication
CN112001854A (en) Method for repairing coded image and related system and device
CN110519606B (en) Depth video intra-frame intelligent coding method
CN110191344B (en) Intelligent coding method for light field image
CN116112694B (en) Video data coding method and system applied to model training

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 2023-01-24)