
CN118644640B - Underwater image three-dimensional reconstruction method and system based on deep learning - Google Patents

Underwater image three-dimensional reconstruction method and system based on deep learning

Info

Publication number
CN118644640B
CN118644640B (application CN202411093922.1A)
Authority
CN
China
Prior art keywords
target
underwater
image
data
view
Prior art date
Legal status
Active
Application number
CN202411093922.1A
Other languages
Chinese (zh)
Other versions
CN118644640A
Inventor
丁少春
黄勇
安楠
鲍习中
王翊坤
赵希赟
童玲婉
Current Assignee
Ningbo Bohai Shenheng Technology Co ltd
Original Assignee
Ningbo Bohai Shenheng Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Ningbo Bohai Shenheng Technology Co ltd
Priority to CN202411093922.1A
Publication of CN118644640A
Application granted
Publication of CN118644640B
Legal status: Active


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A 90/00: Technologies having an indirect contribution to adaptation to climate change
    • Y02A 90/30: Assessment of water resources

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to the technical field of underwater imaging, in particular to an underwater image three-dimensional reconstruction method and system based on deep learning. The method comprises the following steps: acquiring target underwater position data; performing single-view preliminary shooting according to the target underwater position data to obtain an underwater target preliminary shot image; performing image ambiguity analysis on the underwater target preliminary shot image to generate underwater target single-view image ambiguity data; and performing target motion recognition on the underwater target single-view image ambiguity data to generate target motion recognition data. According to the invention, image blur compensation, multi-view shooting and accurate depth estimation are carried out on the underwater target, so that the comprehensiveness and accuracy of three-dimensional image reconstruction are improved.

Description

Underwater image three-dimensional reconstruction method and system based on deep learning
Technical Field
The invention relates to the technical field of underwater imaging, in particular to an underwater image three-dimensional reconstruction method and system based on deep learning.
Background
The background development of deep-learning-based underwater image three-dimensional reconstruction can be described as follows. With the progress of underwater detection technology, conventional underwater image processing methods face challenges such as large underwater illumination variation, image blurring, and lack of effective depth information. With the advent of deep learning, researchers began applying convolutional neural networks (CNNs) to feature extraction and reconstruction of underwater images. Using large numbers of labeled data sets, such as underwater virtual data sets and limited field data, researchers first developed CNN-based underwater image enhancement algorithms for improving the sharpness and contrast of images. With the introduction of generative adversarial networks (GANs) and autoencoders, researchers began to explore how to reconstruct more accurate three-dimensional structures from a single underwater image. These methods utilize deep neural networks to learn features from complex underwater optical and physical scenes and, through end-to-end training, achieve the ability to accurately reconstruct object shape and depth information in an underwater environment. However, conventional methods are often affected by factors such as light attenuation and water flow when processing underwater images, resulting in blurred images and motion blur; meanwhile, underwater target images can only be obtained from a single view angle, so the reconstructed three-dimensional model lacks comprehensiveness and accuracy, and the comprehensiveness and accuracy of target three-dimensional image reconstruction are low.
Disclosure of Invention
Based on this, it is necessary to provide a method and a system for three-dimensional reconstruction of underwater images based on deep learning, so as to solve at least one of the above technical problems.
To achieve the above object, there is provided an underwater image three-dimensional reconstruction method based on deep learning, the method comprising the steps of:
Step S1: acquiring target underwater position data; performing single-view preliminary shooting according to the underwater position data of the target to obtain a preliminary shooting image of the underwater target; performing image ambiguity analysis on the primary shot image of the underwater target to generate single-view image ambiguity data of the underwater target; performing target motion recognition on the underwater target single-view image ambiguity data to generate target motion recognition data;
Step S2: performing target classification on the standard underwater target single-view shot image according to the target motion identification data to generate first underwater target type data and second underwater target type data; performing multi-view shooting compensation on the standard underwater target single-view shooting image based on the first underwater target type data and the second underwater target type data, so as to generate a standard underwater target multi-view shooting image;
step S3: performing image superposition on a standard underwater target multi-view shot image and a standard underwater target single-view shot image to generate a target view superposition image; performing depth estimation on the target visual angle superposition image to generate an underwater target depth map; performing depth fusion reconstruction on the target visual angle superposition image and the underwater target depth map to generate a target underwater fusion depth map;
step S4: performing three-dimensional point cloud conversion on the target underwater fusion depth map and the standard underwater target multi-view shooting image to generate target underwater three-dimensional point cloud data; constructing a three-dimensional model of the target underwater three-dimensional point cloud data to generate a target underwater three-dimensional preliminary model; and performing model rendering on the target underwater three-dimensional preliminary model so as to generate a high-precision target three-dimensional underwater model to execute three-dimensional reconstruction operation of the underwater image.
According to the invention, the underwater target single-view image ambiguity data and the target motion identification data are generated by acquiring the target underwater position data, carrying out single-view preliminary shooting and image ambiguity analysis, and the data provide a basis for subsequent depth estimation and multi-view shooting compensation. The image is classified by utilizing the target motion identification data, the first underwater target type data and the second underwater target type data are generated, multi-view shooting compensation is carried out, and the information quantity and the accuracy of the underwater target image are enhanced. And overlapping the standard underwater target multi-view shooting image and the single-view shooting image, and carrying out depth estimation and depth fusion reconstruction to generate a high-quality target underwater fusion depth map, wherein the steps improve the accuracy of depth information acquisition and image reconstruction under an underwater scene. And generating target underwater three-dimensional point cloud data and a preliminary three-dimensional model by carrying out three-dimensional point cloud conversion and model construction on the target underwater fusion depth map and the multi-view shooting image. And then performing model rendering on the preliminary model to generate a high-precision target three-dimensional underwater model, and providing detailed and real three-dimensional representation for three-dimensional reconstruction operation of the underwater image. Therefore, the method improves the comprehensiveness and accuracy of three-dimensional image reconstruction by carrying out image blur compensation, multi-view shooting and accurate depth estimation on the underwater target.
Preferably, step S1 comprises the steps of:
step S11: acquiring target underwater position data by using a GPS;
Step S12: performing single-view preliminary shooting by utilizing an underwater camera array according to the underwater position data of the target to obtain an underwater target preliminary shooting image;
Step S13: performing image preprocessing on the primary shooting image of the underwater target to obtain a standard single-view shooting image of the underwater target, wherein the image preprocessing comprises image brightness enhancement, image geometric transformation and image smoothing;
Step S14: performing image ambiguity analysis on a standard underwater target single-view shot image to generate underwater target single-view image ambiguity data; and carrying out target motion recognition on the underwater target single-view image ambiguity data to generate target motion recognition data.
According to the invention, the GPS technology is utilized to accurately and rapidly acquire the underwater position data of the target, so that the positioning error is reduced, and the working efficiency is improved. The underwater camera array is used for shooting at a single view, so that a preliminary underwater target image can be acquired in a complex underwater environment, and basic data is provided for subsequent processing. The brightness enhancement, geometric transformation and smoothing treatment in the image preprocessing process effectively improve the definition and stability of the image, so that the subsequent analysis is more accurate and reliable. And the single-view image is subjected to ambiguity analysis, so that the image quality is evaluated, and the blurred image caused by the influence of the underwater environment is identified, so that the accuracy of the subsequent 3D reconstruction is ensured. The motion characteristics of the target can be effectively distinguished by carrying out target motion recognition through the ambiguity data, which is very important for tracking and recognition of the dynamic target.
Preferably, step S2 comprises the steps of:
Step S21: performing target optical flow tracking on the standard underwater target single-view shot image according to the target motion identification data to generate target optical flow tracking data;
Step S22: performing target classification on a standard underwater target single-view shot image through target optical flow tracking data and target motion identification data to generate first underwater target type data and second underwater target type data; judging the target type of the standard underwater target single-view shot image, and outputting the standard underwater target single-view shot image when the standard underwater target single-view shot image is confirmed to be the first underwater target type data;
Step S23: when the target is confirmed to be the second underwater target type data, multi-view shooting compensation is carried out on the standard underwater target single-view shooting image based on the underwater target single-view image ambiguity data, so that the standard underwater target multi-view shooting image is generated.
According to the invention, the target motion recognition data is utilized to track the target optical flow, so that the motion trail of the target can be captured more accurately, and the accuracy of the image data is improved. For the second underwater target type data, the shortcomings of single-view shooting can be effectively overcome through multi-view shooting compensation, and more comprehensive target image information is obtained; this is particularly important in complex underwater environments, because more view information can be provided and the quality of 3D reconstruction improved. In the process of confirming the target type, the shooting strategy can be dynamically adjusted: for the first underwater target type data, a standard single-view shot image is directly output, improving processing efficiency; for the second underwater target type data, the image quality is further optimized by compensated shooting. The whole process of step S2 makes the system more flexible and robust when processing different types of underwater targets, adapts to changeable underwater environments, and provides high-quality image data support.
Preferably, the object classification of the standard underwater object single view shot image by the object optical flow tracking data and the object motion recognition data includes:
screening adjacent frame images of the standard underwater target single-view shot images to obtain the adjacent frame images;
Performing target position offset analysis on the adjacent frame images through the target optical flow tracking data to generate target position offset data;
Performing target motion period analysis on the target position offset data by utilizing the target motion identification data to generate target motion period analysis data; performing motion law exploration on the target motion cycle analysis data to obtain target motion law exploration data, wherein the target motion law exploration data comprises target motion law data and target motion irregular data;
Classifying targets of the standard underwater target single-view shot images according to the object motion rule exploration data, and generating first underwater target type data when the object motion rule exploration data are the target motion rule data; and when the object motion rule exploration data is the target motion irregular data, generating second underwater target type data.
According to the invention, the adjacent frame images are screened for the standard underwater target single-view shot images, so that the relation between adjacent frames in the target motion process can be rapidly determined, the data processing amount is effectively reduced, and the processing speed is improved. And performing target position offset analysis on the adjacent frame images by utilizing the target optical flow tracking data to generate accurate target position offset data, thereby being beneficial to accurately capturing the displacement change of the target in each frame image and improving the accuracy of motion analysis. And carrying out target movement period analysis on the target position deviation data by utilizing the target movement identification data, so that the movement period characteristics of the target can be identified, detailed target movement period analysis data is generated, and a solid foundation is provided for subsequent movement rule exploration. The motion rule exploration is carried out on the target motion cycle analysis data, the motion mode of the target, including regular motion and irregular motion, can be identified, so that target motion rule exploration data are generated, the process is helpful for understanding the motion behaviors of the target in depth, and the classification accuracy is improved. Target classification is carried out on the standard underwater target single-view shot images according to target movement rule exploration data, and the regularly moving targets and the irregularly moving targets can be effectively distinguished. When the target motion rule exploration data is target motion rule data, generating first underwater target type data; when the target motion rule exploration data is target motion irregular data, generating second underwater target type data, wherein the classification process improves the accuracy and reliability of target identification. Through carrying out detailed analysis and classification on the movement of the target, the movement characteristics of the target can be better understood, the subsequent image processing and analysis steps are optimized, and the overall processing quality is improved.
Preferably, step S23 includes the steps of:
Step S231: when the target is confirmed to be the second underwater target type data, carrying out target boundary frame identification on the standard underwater target single-view shot image to obtain target boundary frame data;
Step S232: performing image core region segmentation on a standard underwater target single-view shot image through target boundary frame data to generate a target core region image;
step S233: gray value sampling is carried out on the target core area image, and a one-dimensional signal sequence of the target core area image is obtained; performing Hill transformation effect calculation on the target core region image according to the one-dimensional signal sequence of the target core region image to obtain underwater target single view image ambiguity data;
step S234: and performing multi-view shooting compensation on the standard underwater target single-view shooting image based on the underwater target single-view image ambiguity data, so as to generate the standard underwater target multi-view shooting image.
According to the invention, target boundary frame identification is carried out on the standard underwater target single-view shot image, so that the boundary position of the target can be accurately determined and accurate target boundary frame data generated, providing a reliable basis for subsequent processing. Core region segmentation is performed on the image by utilizing the target boundary frame data, so that the core region image of the target can be effectively extracted; this avoids processing the whole image and improves processing efficiency and precision. Gray value sampling is carried out on the target core region image to obtain a one-dimensional signal sequence; this step retains key characteristic information of the target core region and provides accurate data support for the ambiguity calculation. A Hilbert transform calculation is performed on the one-dimensional signal sequence to obtain underwater target single-view image ambiguity data; this process can effectively evaluate the image blur and provide an important reference for multi-view compensation. Multi-view shooting compensation is carried out based on the ambiguity data, so that in the case of image blurring the definition and detail of the image can be improved by adding view information, and a standard underwater target multi-view shot image can be generated. Through multi-view shooting compensation, the blurring problem of a single-view shot image is effectively mitigated, the overall quality and usability of the image are improved, and clearer and more complete data support is provided for subsequent 3D reconstruction and analysis.
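As an illustration of the Hilbert-transform ambiguity measure described above, the following Python sketch treats the center row of a grayscale core-region image as the one-dimensional signal sequence and scores sharpness from the gradient of the analytic-signal envelope. The function name, the choice of scanline, and the envelope-gradient score are assumptions made for illustration; the patent does not fix a specific formula.

```python
import numpy as np
from scipy.signal import hilbert

def hilbert_blur_score(core_region_gray: np.ndarray) -> float:
    """Score sharpness of a target core-region image (higher = sharper).

    The center row serves as the one-dimensional signal sequence; the
    Hilbert transform yields its analytic signal, whose amplitude
    envelope has steep transitions at sharp edges and smeared
    transitions when the image is blurred.
    """
    # Illustrative choice: a single central scanline of the core region.
    row = core_region_gray[core_region_gray.shape[0] // 2, :].astype(np.float64)
    row -= row.mean()                    # remove the DC component first
    envelope = np.abs(hilbert(row))      # amplitude envelope of the analytic signal
    return float(np.mean(np.abs(np.diff(envelope))))
```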
Preferably, step S234 includes the steps of:
Step S2341: performing multi-view shooting and data acquisition through an IMU (inertial measurement unit) device on an underwater camera array based on single-view image ambiguity data of an underwater target to obtain an initial target multi-view image and target inertial measurement data;
Step S2342: performing data preprocessing on the target inertial measurement data to obtain target displacement data and target attitude information data, wherein the data preprocessing comprises denoising and integration;
Step S2343: image spatial positioning compensation is carried out on the target displacement data, the target attitude information data and the initial target multi-view image according to visual-inertial odometry, generating a standard underwater target multi-view shot image.
According to the invention, based on the underwater target single-view image ambiguity data, the IMU equipment on the underwater camera array is utilized to carry out multi-view shooting and data acquisition, so that more comprehensive and diversified initial target multi-view images and target inertial measurement data can be obtained, enhancing the richness and accuracy of the image information. The target inertial measurement data is preprocessed, including denoising and integration, which effectively reduces noise interference and yields more accurate target displacement data and target attitude information data, providing high-precision reference data for subsequent image spatial positioning compensation. By utilizing visual-inertial odometry, the target displacement data, the target attitude information data and the initial target multi-view image are combined to perform image spatial positioning compensation, so that the image position and attitude of each view can be accurately calculated and a standard underwater target multi-view shot image generated. Through image spatial positioning compensation, the position deviation caused by camera movement and target movement in multi-view shooting can be effectively corrected, so that the generated standard multi-view shot images are more consistent and accurate, reducing errors in the image reconstruction process. The combination of multi-view shooting and spatial positioning compensation can remarkably improve the definition and accuracy of the generated multi-view images, reducing image blurring and distortion and improving image quality.
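The denoising and integration of step S2342 can be sketched as follows in Python: raw IMU samples are smoothed with a Savitzky-Golay filter and then integrated into displacement and small-angle attitude estimates. The filter choice, window parameters, and function names are illustrative assumptions; a full visual-inertial odometry pipeline would additionally fuse these estimates with image features, which is beyond this fragment.

```python
import numpy as np
from scipy.signal import savgol_filter

def preprocess_imu(accel: np.ndarray, gyro: np.ndarray, dt: float):
    """Denoise and integrate raw IMU samples.

    accel: (N, 3) gravity-compensated accelerations in m/s^2.
    gyro:  (N, 3) angular rates in rad/s.
    Returns per-sample displacement (m) and small-angle attitude (rad).
    """
    # Window length and polynomial order are illustrative choices.
    accel_f = savgol_filter(accel, window_length=15, polyorder=3, axis=0)
    gyro_f = savgol_filter(gyro, window_length=15, polyorder=3, axis=0)
    velocity = np.cumsum(accel_f * dt, axis=0)       # first integration: a -> v
    displacement = np.cumsum(velocity * dt, axis=0)  # second integration: v -> s
    attitude = np.cumsum(gyro_f * dt, axis=0)        # rate -> angle (small angles)
    return displacement, attitude
```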
Preferably, step S3 comprises the steps of:
step S31: target feature point confirmation is carried out on a standard underwater target multi-view shot image and a standard underwater target single-view shot image, so as to obtain multi-view image target feature point data and single-view image target feature point data;
Step S32: carrying out image superposition on the standard underwater target multi-view shot image and the standard underwater target single-view shot image to generate a target view superposition image;
step S33: performing feature point matching on the target view coincident image through multi-view image target feature point data and single-view image target feature point data to generate a target matching feature point set;
Step S34: performing depth estimation on the target matching feature point set by using a deep learning network to generate an underwater target depth map; and carrying out depth fusion reconstruction on the target visual angle superposition image and the underwater target depth map to generate a target underwater fusion depth map.
According to the invention, target characteristic points of the standard underwater target multi-view shot image and the single-view shot image are confirmed, and the obtained target characteristic point data of the multi-view and single-view images provides accurate basic data for subsequent image superposition and characteristic point matching. The process can integrate multi-view information to provide more complete image information, and is beneficial to subsequent feature point matching and depth estimation. Through feature point matching, feature point matching is carried out on the target view angle superposition image by utilizing target feature point data of the multi-view angle image and the single-view angle image, and a target matching feature point set is generated. And carrying out depth estimation on the target matching feature point set by using a deep learning network to generate an underwater target depth map. The deep learning network can train by utilizing large-scale data, has stronger feature extraction and depth estimation capabilities, and improves the accuracy and the robustness of depth estimation. The target view angle coincident image and the underwater target depth map are subjected to depth fusion reconstruction, so that the target underwater fusion depth map can be generated, the process combines multi-view angle and depth information, more accurate and complete 3D structure information can be provided, and the quality of 3D reconstruction is improved. By combining multi-view shooting with depth estimation, the effect and the precision of 3D reconstruction can be remarkably improved, a more vivid and fine underwater target 3D model can be generated, and high-quality data support is provided for subsequent analysis and application.
Preferably, step S34 includes the steps of:
step S341: screening the characteristic point motion trail of the target matching characteristic point set to obtain an image characteristic point trail set;
step S342: constructing a depth estimation network; predicting pixel depth values of the image characteristic point track set by using a depth estimation network, and generating an underwater target preliminary depth map;
step S343: and carrying out multi-view depth map fusion on the underwater target preliminary depth map by using the target view coincident image to generate a target underwater fusion depth map.
According to the method, characteristic point motion trail screening is carried out on the target matching characteristic point set, and the obtained image characteristic point trail set can more accurately reflect the motion trail of the target under different view angles, thereby improving the accuracy of depth estimation. A depth estimation network is constructed and used to predict pixel depth values for the image characteristic point trail set, generating an underwater target preliminary depth map; the network can effectively utilize the characteristic point trail information and improves the accuracy of depth prediction, and the preliminary depth map provides initial depth information of the target and lays a foundation for subsequent multi-view fusion. Multi-view depth map fusion is then carried out on the underwater target preliminary depth map by using the target view superposition image to generate a target underwater fusion depth map. The multi-view fusion can synthesize the depth information of each view, reduce the error of single-view depth estimation, and improve the accuracy and reliability of the depth map. Through multi-view depth map fusion, the challenges brought by complex light conditions and target movement in the underwater environment can be better handled, and the generated target underwater fusion depth map has higher precision and definition. The high-precision target underwater fusion depth map provides more accurate depth information for subsequent 3D reconstruction, remarkably improves the effect and precision of 3D reconstruction, and yields a more vivid and fine underwater target 3D model.
Preferably, step S4 comprises the steps of:
step S41: performing three-dimensional point cloud conversion on the target underwater fusion depth map and the standard underwater target multi-view shooting image to generate target underwater three-dimensional point cloud data;
Step S42: carrying out surface reconstruction on the target underwater three-dimensional point cloud data to generate target three-dimensional surface reconstruction data;
Step S43: performing texture mapping on the target three-dimensional surface reconstruction data to generate target three-dimensional texture mapping data;
Step S44: constructing a three-dimensional model based on the target three-dimensional surface reconstruction data and the target three-dimensional texture mapping data to generate a target underwater three-dimensional preliminary model; and performing model rendering on the target underwater three-dimensional preliminary model so as to generate a high-precision target three-dimensional underwater model to execute three-dimensional reconstruction operation of the underwater image.
According to the invention, three-dimensional point cloud conversion is carried out on the target underwater fusion depth map and the standard underwater target multi-view shot image, so that the generated target underwater three-dimensional point cloud data can accurately reflect the three-dimensional structure of the target, and a reliable data base is provided for subsequent surface reconstruction. And carrying out surface reconstruction on the target underwater three-dimensional point cloud data, wherein the generated target three-dimensional surface reconstruction data can accurately reproduce the surface details of the target, and the accuracy and the authenticity of the model are improved. By performing texture mapping on the target three-dimensional surface reconstruction data, the generated target three-dimensional texture mapping data can accurately map detail information in a photographed image to the surface of the three-dimensional model, and the visual effect and detail performance of the model are enhanced. The three-dimensional model construction is carried out based on the target three-dimensional surface reconstruction data and the target three-dimensional texture mapping data, and the generated target underwater three-dimensional preliminary model can accurately reflect the three-dimensional structure and texture details of the target, so that a foundation is provided for a final high-precision model. And performing model rendering on the target underwater three-dimensional preliminary model, wherein the generated high-precision target three-dimensional underwater model can accurately reproduce the three-dimensional form and texture details of the target, and the visual effect and the authenticity of the model are improved. Through the whole process of the step S4, the three-dimensional reconstruction effect of the underwater image can be remarkably improved, the generated high-precision three-dimensional model has higher precision and authenticity, and high-quality three-dimensional data support is provided for subsequent analysis and application.
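For the surface reconstruction of step S42, a minimal sketch using the open-source Open3D library and Poisson reconstruction is shown below; the patent names Poisson surface reconstruction and Marching Cubes only as examples, so the library choice and parameters here are assumptions.

```python
import numpy as np
import open3d as o3d

def reconstruct_surface(points: np.ndarray) -> o3d.geometry.TriangleMesh:
    """Poisson surface reconstruction from (N, 3) underwater point cloud data."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    pcd.estimate_normals()                         # Poisson needs per-point normals
    pcd.orient_normals_consistent_tangent_plane(30)
    mesh, _densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=9)                              # depth=9 is an illustrative choice
    return mesh
```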
In the present specification, there is provided a deep learning-based underwater image three-dimensional reconstruction system for performing the above-described deep learning-based underwater image three-dimensional reconstruction method, the system comprising:
The target identification module is used for acquiring target underwater position data; performing single-view preliminary shooting according to the underwater position data of the target to obtain a preliminary shooting image of the underwater target; performing image ambiguity analysis on the primary shot image of the underwater target to generate single-view image ambiguity data of the underwater target; performing target motion recognition on the underwater target single-view image ambiguity data to generate target motion recognition data;
The multi-view compensation module is used for carrying out target classification on the standard underwater target single-view shot image according to the target motion identification data to generate first underwater target type data and second underwater target type data; performing multi-view shooting compensation on the standard underwater target single-view shooting image based on the first underwater target type data and the second underwater target type data, so as to generate a standard underwater target multi-view shooting image;
The depth estimation module is used for carrying out image superposition on the standard underwater target multi-view shot image and the standard underwater target single-view shot image to generate a target view superposition image; performing depth estimation on the target visual angle superposition image to generate an underwater target depth map; performing depth fusion reconstruction on the target visual angle superposition image and the underwater target depth map to generate a target underwater fusion depth map;
The three-dimensional reconstruction module is used for carrying out three-dimensional point cloud conversion on the target underwater fusion depth map and the standard underwater target multi-view shooting image to generate target underwater three-dimensional point cloud data; constructing a three-dimensional model of the target underwater three-dimensional point cloud data to generate a target underwater three-dimensional preliminary model; and performing model rendering on the target underwater three-dimensional preliminary model so as to generate a high-precision target three-dimensional underwater model to execute three-dimensional reconstruction operation of the underwater image.
The invention has the beneficial effects that the primary image data and the definition analysis can be provided by acquiring the underwater position data of the target and carrying out single-view primary shooting and image ambiguity analysis, so as to lay a foundation for subsequent processing. The image is classified by utilizing the target motion identification data, different underwater target type data are generated, multi-view shooting compensation is carried out, the details and the motion characteristics of the target can be more comprehensively captured, and the information richness and the accuracy of the image are improved. And overlapping the standard underwater target multi-view shot image and the single-view shot image, performing depth estimation, performing depth fusion reconstruction, and generating a high-quality underwater target depth map, wherein the steps effectively improve the depth information acquisition and the image quality of the images in the underwater scene. By carrying out three-dimensional point cloud conversion and model construction on the target underwater fusion depth map and the multi-view shooting image, a high-precision target underwater three-dimensional model is generated, and model rendering is carried out at the same time, so that the accuracy and visual effect of three-dimensional reconstruction operation are improved. Therefore, the method improves the comprehensiveness and accuracy of three-dimensional image reconstruction by carrying out image blur compensation, multi-view shooting and accurate depth estimation on the underwater target.
Drawings
FIG. 1 is a schematic flow chart of the steps of a three-dimensional reconstruction method of an underwater image based on deep learning;
FIG. 2 is a flowchart illustrating the detailed implementation of step S2 in FIG. 1;
FIG. 3 is a flowchart illustrating the detailed implementation of step S3 in FIG. 1;
FIG. 4 is a flowchart illustrating the detailed implementation of step S4 in FIG. 1;
the achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
The following is a clear and complete description of the technical method of the present invention, taken in conjunction with the accompanying drawings, and it is evident that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, are intended to fall within the scope of the present invention.
Furthermore, the drawings are merely schematic illustrations of the present invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. The functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different network and/or processor devices and/or microcontroller devices.
It will be understood that, although the terms "first," "second," etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the example embodiments. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
To achieve the above objective, please refer to fig. 1 to 4, a three-dimensional reconstruction method of an underwater image based on deep learning, the method comprises the following steps:
Step S1: acquiring target underwater position data; performing single-view preliminary shooting according to the underwater position data of the target to obtain a preliminary shooting image of the underwater target; performing image ambiguity analysis on the primary shot image of the underwater target to generate single-view image ambiguity data of the underwater target; performing target motion recognition on the underwater target single-view image ambiguity data to generate target motion recognition data;
Step S2: performing target classification on the standard underwater target single-view shot image according to the target motion identification data to generate first underwater target type data and second underwater target type data; performing multi-view shooting compensation on the standard underwater target single-view shooting image based on the first underwater target type data and the second underwater target type data, so as to generate a standard underwater target multi-view shooting image;
step S3: performing image superposition on a standard underwater target multi-view shot image and a standard underwater target single-view shot image to generate a target view superposition image; performing depth estimation on the target visual angle superposition image to generate an underwater target depth map; performing depth fusion reconstruction on the target visual angle superposition image and the underwater target depth map to generate a target underwater fusion depth map;
step S4: performing three-dimensional point cloud conversion on the target underwater fusion depth map and the standard underwater target multi-view shooting image to generate target underwater three-dimensional point cloud data; constructing a three-dimensional model of the target underwater three-dimensional point cloud data to generate a target underwater three-dimensional preliminary model; and performing model rendering on the target underwater three-dimensional preliminary model so as to generate a high-precision target three-dimensional underwater model to execute three-dimensional reconstruction operation of the underwater image.
According to the invention, the underwater target single-view image ambiguity data and the target motion identification data are generated by acquiring the target underwater position data, carrying out single-view preliminary shooting and image ambiguity analysis, and the data provide a basis for subsequent depth estimation and multi-view shooting compensation. The image is classified by utilizing the target motion identification data, the first underwater target type data and the second underwater target type data are generated, multi-view shooting compensation is carried out, and the information quantity and the accuracy of the underwater target image are enhanced. And overlapping the standard underwater target multi-view shooting image and the single-view shooting image, and carrying out depth estimation and depth fusion reconstruction to generate a high-quality target underwater fusion depth map, wherein the steps improve the accuracy of depth information acquisition and image reconstruction under an underwater scene. And generating target underwater three-dimensional point cloud data and a preliminary three-dimensional model by carrying out three-dimensional point cloud conversion and model construction on the target underwater fusion depth map and the multi-view shooting image. And then performing model rendering on the preliminary model to generate a high-precision target three-dimensional underwater model, and providing detailed and real three-dimensional representation for three-dimensional reconstruction operation of the underwater image. Therefore, the method improves the comprehensiveness and accuracy of three-dimensional image reconstruction by carrying out image blur compensation, multi-view shooting and accurate depth estimation on the underwater target.
In the embodiment of the present invention, as described with reference to fig. 1, a schematic step flow diagram of the deep learning-based underwater image three-dimensional reconstruction method of the present invention is provided, and in this example, the deep learning-based underwater image three-dimensional reconstruction method includes the following steps:
Step S1: acquiring target underwater position data; performing single-view preliminary shooting according to the underwater position data of the target to obtain a preliminary shooting image of the underwater target; performing image ambiguity analysis on the primary shot image of the underwater target to generate single-view image ambiguity data of the underwater target; performing target motion recognition on the underwater target single-view image ambiguity data to generate target motion recognition data;
In the embodiment of the invention, the accurate coordinates of the underwater position of the target are determined by utilizing sonar positioning equipment or an underwater positioning system. The position data may be obtained by a combination of GPS and underwater sensors, or may be acquired by an underwater robot (ROV/AUV). On the basis of the acquired underwater position data of the target, underwater imaging equipment (such as an underwater camera, a camera carried by an ROV and the like) is used for shooting the target at a single view angle. The camera equipment is ensured to be kept stable in the shooting process, and shake caused by water flow or other factors is reduced as much as possible. After shooting is completed, shooting underwater target images are acquired, and the images are used as basic data for subsequent processing. And carrying out ambiguity analysis on the shot image by using an image processing algorithm. The degree of blurring of the image can be evaluated using Laplacian transform, edge detection, etc. And generating ambiguity data reflecting the image ambiguity degree by calculating the gradient amplitude or the frequency domain characteristic of the image. The analyzed blur data is stored as structured data, which may contain specific values of blur, the location of the blur areas, and a sharpness assessment of the image as a whole. The ambiguity data is analyzed using a deep learning algorithm to identify target motion in the image. A Convolutional Neural Network (CNN) or a Recurrent Neural Network (RNN) may be used to analyze the sequence of images to determine whether there is motion, and the direction and speed of motion, of a target in the image. The identified target motion information is arranged into data, including the motion track, speed, direction and the like, and the data is used in subsequent multi-view image registration and three-dimensional reconstruction so as to improve the accuracy and consistency of reconstruction.
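A common concrete form of the Laplacian-based ambiguity evaluation mentioned above is the variance-of-Laplacian focus measure, sketched here with OpenCV; lower variance indicates a blurrier image. The patent does not prescribe this exact metric, so it is given as one plausible realization.

```python
import cv2

def laplacian_blur_metric(image_bgr) -> float:
    """Variance of the Laplacian response: low values indicate a blurry image."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    return float(cv2.Laplacian(gray, cv2.CV_64F).var())
```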
Step S2: performing target classification on the standard underwater target single-view shot image according to the target motion identification data to generate first underwater target type data and second underwater target type data; performing multi-view shooting compensation on the standard underwater target single-view shooting image based on the first underwater target type data and the second underwater target type data, so as to generate a standard underwater target multi-view shooting image;
In the embodiment of the present invention, the targets in the single-view shot image are classified by using the target motion recognition data generated in step S1. Deep learning algorithms (e.g., convolutional neural networks, CNNs) may be used for target detection and classification. By training a predefined object classification model, objects in the image can be classified into different types. For example, the model may classify the targets into two categories, a "stationary target" and a "moving target", or more refined categories. And respectively storing the classified target data as first underwater target type data and second underwater target type data. The first underwater object type data may represent a stationary object and the second underwater object type data may represent a moving object or other category defined according to particular needs. For stationary targets (first underwater target type data), conventional multi-view shots may be performed, ensuring that enough images are shot from different angles for three-dimensional reconstruction. For a moving object (second underwater object type data), multi-view photographing compensation is required. Based on the movement track and speed data of the target, the shooting time and angle are adjusted, and a plurality of visual angle images of the target at different time points are shot. Multi-view camera compensation may be implemented using a multi-camera array, or by controlling the motion path of the ROV/AUV. And (3) arranging the image set subjected to multi-view shooting and compensation into standard underwater target multi-view shooting images, wherein the images are used for a subsequent three-dimensional reconstruction process. Good registration and consistency between images are ensured, so that the accuracy of three-dimensional reconstruction is improved.
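As a minimal sketch of the stationary-versus-moving split described above, the fragment below thresholds the mean dense optical-flow magnitude inside the target region; the threshold value and the two labels are illustrative assumptions standing in for the trained CNN classifier.

```python
import numpy as np

def classify_target(flow: np.ndarray, motion_threshold: float = 0.5) -> str:
    """Split targets by mean optical-flow magnitude in the target region.

    flow: (H, W, 2) dense flow field restricted to the target region.
    The threshold (pixels per frame) is an illustrative assumption.
    """
    magnitude = np.linalg.norm(flow, axis=2)
    return "first_type" if magnitude.mean() < motion_threshold else "second_type"
```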
Step S3: performing image superposition on a standard underwater target multi-view shot image and a standard underwater target single-view shot image to generate a target view superposition image; performing depth estimation on the target visual angle superposition image to generate an underwater target depth map; performing depth fusion reconstruction on the target visual angle superposition image and the underwater target depth map to generate a target underwater fusion depth map;
In the embodiment of the invention, the multi-view shot image and the single-view shot image are aligned by using image registration technology. Image registration may be performed using feature matching algorithms (e.g., SIFT, SURF) or directly with a deep learning model (e.g., a convolutional neural network-based image registration model). After registering all the images, target view superposition images are generated which contain target information shot from different angles and viewpoints. Depth estimation is then carried out on the superposed images by using a deep learning model. Common methods include single-view depth estimation based on convolutional neural networks (e.g., U-Net, ResNet, etc.) and multi-view depth estimation (e.g., stereo matching, Structure-from-Motion). The result of the depth estimation is an underwater target depth map representing the depth corresponding to each pixel in the image. The target view superposition image is combined with the underwater target depth map for depth fusion reconstruction. A depth fusion algorithm (e.g., a 3D reconstruction algorithm based on a deep neural network) may be used to integrate the image information and the depth information. Through depth fusion reconstruction, a target underwater fusion depth map is generated which reflects the three-dimensional structure and shape of the target.
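The SIFT-based registration mentioned above can be sketched as follows: keypoints are matched with Lowe's ratio test and a RANSAC homography warps one view onto the other. A homography assumes a roughly planar scene, so this is a simplified stand-in for the full multi-view alignment.

```python
import cv2
import numpy as np

def register_views(img_a, img_b):
    """Warp grayscale view img_a onto img_b via SIFT matches + RANSAC homography."""
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)
    matches = cv2.BFMatcher(cv2.NORM_L2).knnMatch(des_a, des_b, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]  # Lowe ratio test
    if len(good) < 4:
        raise ValueError("not enough matches to estimate a homography")
    src = np.float32([kp_a[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_b[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    # Homography assumes a near-planar scene: an illustrative simplification.
    return cv2.warpPerspective(img_a, H, (img_b.shape[1], img_b.shape[0]))
```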
Step S4: performing three-dimensional point cloud conversion on the target underwater fusion depth map and the standard underwater target multi-view shooting image to generate target underwater three-dimensional point cloud data; constructing a three-dimensional model of the target underwater three-dimensional point cloud data to generate a target underwater three-dimensional preliminary model; and performing model rendering on the target underwater three-dimensional preliminary model so as to generate a high-precision target three-dimensional underwater model to execute three-dimensional reconstruction operation of the underwater image.
In the embodiment of the invention, three-dimensional point cloud data is generated by utilizing the target underwater fusion depth map and the multi-view image. The depth map may be converted into three-dimensional point cloud data using computer vision techniques, such as backprojection. By converting the depth information of each pixel into three-dimensional space coordinates (x, y, z), point cloud data is generated, which represents the three-dimensional structure of the underwater target. And generating a three-dimensional preliminary model of the target by using the three-dimensional point cloud data. Point cloud processing and three-dimensional modeling techniques may be used, such as surface reconstruction algorithms (e.g., poisson surface reconstruction, marching Cubes algorithms, etc.). Rendering the generated three-dimensional preliminary model to improve the visual effect of the model. The high-precision three-dimensional underwater model can be generated by adding effects such as textures, illumination, shadows and the like by using a rendering technology in computer graphics.
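The back-projection described above follows the standard pinhole camera model; a minimal sketch, assuming known intrinsics (fx, fy, cx, cy) and a metric depth map:

```python
import numpy as np

def depth_to_point_cloud(depth: np.ndarray, fx: float, fy: float,
                         cx: float, cy: float) -> np.ndarray:
    """Back-project an (H, W) metric depth map into an (N, 3) point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx            # pinhole model: X = (u - cx) * Z / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]      # drop pixels with no valid depth
```

Each valid pixel contributes one 3D point; fusing clouds from several views additionally requires the camera poses recovered during registration.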
Preferably, step S1 comprises the steps of:
step S11: acquiring target underwater position data by using a GPS;
Step S12: performing single-view preliminary shooting by utilizing an underwater camera array according to the underwater position data of the target to obtain an underwater target preliminary shooting image;
Step S13: performing image preprocessing on the primary shooting image of the underwater target to obtain a standard single-view shooting image of the underwater target, wherein the image preprocessing comprises image brightness enhancement, image geometric transformation and image smoothing;
Step S14: performing image ambiguity analysis on a standard underwater target single-view shot image to generate underwater target single-view image ambiguity data; and carrying out target motion recognition on the underwater target single-view image ambiguity data to generate target motion recognition data.
In the embodiment of the invention, the accurate position coordinates on the water surface are acquired by using a GPS receiver, and the coordinates are used as the position reference of the underwater target. The underwater sonar equipment (such as multi-beam sonar) is used for scanning from the subsurface, and the accurate position of the target under the water is determined by combining the GPS coordinates of the water surface. And according to the determined target underwater position, the underwater camera array is deployed at a proper position and angle, so that the target area can be covered. And starting the underwater camera array to carry out single-view shooting. Each camera is ensured to shoot at the same time point, so that a group of preliminary shooting images of the underwater targets are obtained. The image brightness is enhanced by using a histogram equalization or adaptive histogram equalization (CLAHE) technology, and the visual effect of the image is improved. The images are geometrically corrected using affine or perspective transformation, ensuring correct alignment and correction of the images. The image is smoothed using a gaussian filter to reduce noise and detail. The image blur is analyzed using a Laplacian transform to generate blur data. And performing target motion recognition by using a deep learning algorithm or a traditional image processing technology to generate target motion recognition data.
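A possible realization of the step S13 preprocessing chain with OpenCV is sketched below: CLAHE is applied to the luminance channel only, followed by Gaussian smoothing. The geometric correction is omitted because it depends on the camera calibration, and the parameter values are illustrative.

```python
import cv2

def preprocess_underwater_frame(img_bgr):
    """CLAHE luminance enhancement followed by Gaussian smoothing."""
    lab = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))  # illustrative values
    enhanced = cv2.cvtColor(cv2.merge((clahe.apply(l), a, b)), cv2.COLOR_LAB2BGR)
    return cv2.GaussianBlur(enhanced, (5, 5), 0)  # suppress high-frequency noise
```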
As an example of the present invention, referring to fig. 2, the step S2 in this example includes:
Step S21: performing target optical flow tracking on the standard underwater target single-view shot image according to the target motion identification data to generate target optical flow tracking data;
Step S22: performing target classification on a standard underwater target single-view shot image through target optical flow tracking data and target motion identification data to generate first underwater target type data and second underwater target type data; judging the target type of the standard underwater target single-view shot image, and outputting the standard underwater target single-view shot image when the standard underwater target single-view shot image is confirmed to be the first underwater target type data;
Step S23: when the target is confirmed to be the second underwater target type data, multi-view shooting compensation is carried out on the standard underwater target single-view shooting image based on the underwater target single-view image ambiguity data, so that the standard underwater target multi-view shooting image is generated.
In the embodiment of the invention, the motion track of the target is recognized through sensors (such as an inertial measurement unit, IMU) and image processing technology. The motion state of the target is analyzed in real time by using a deep learning algorithm, such as a convolutional neural network (CNN) combined with a recurrent neural network (RNN). The motion of the target in the image is tracked using an optical flow algorithm, such as the Lucas-Kanade or Farneback optical flow algorithm. The optical flow vector of the target is calculated from the pixel changes between successive frames. The optical flow vector data is combined with the target motion recognition data to generate complete target optical flow tracking data, which includes displacement, velocity and direction information of the target at different points in time. The standard underwater target single-view shot images are classified according to the optical flow tracking data and the target motion recognition data by using a classification algorithm (such as a support vector machine (SVM) or a convolutional neural network (CNN)). The targets are classified into a first underwater target type and a second underwater target type: the first type corresponds to stationary or regularly moving targets, and the second type to dynamically or irregularly moving targets. The target type of each single-view image is judged to confirm whether it belongs to the first or the second type. If the image is confirmed as first underwater target type data, the original single-view image is output. An image quality evaluation algorithm (such as the Laplacian) is used to evaluate image blur and obtain ambiguity data. Based on the ambiguity data, images that require multi-view compensation are determined. The target image is re-shot from different angles by utilizing multi-view shooting techniques (such as structured light or stereoscopic vision) to overcome the limitations of single-view shooting. Through image stitching and fusion algorithms (such as multi-view geometric reconstruction), standard underwater target multi-view shot images are generated; these images provide more comprehensive and detailed target information and improve the accuracy and quality of three-dimensional reconstruction.
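The dense optical-flow tracking referred to above can be sketched with OpenCV's Farneback implementation; the parameter values shown are commonly used defaults rather than values specified by the patent.

```python
import cv2
import numpy as np

def track_optical_flow(prev_gray, next_gray):
    """Dense Farneback flow between consecutive grayscale frames.

    Returns the (H, W, 2) flow field plus its mean magnitude and mean
    direction, which feed the displacement/velocity/direction entries
    of the target optical flow tracking data.
    """
    # Parameter values below are common defaults, not patent-specified.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2,
                                        flags=0)
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    return flow, float(np.mean(mag)), float(np.mean(ang))
```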
Preferably, the target classification of the standard underwater target single-view shot image by the target optical flow tracking data and the target motion identification data includes:
screening adjacent frame images of the standard underwater target single-view shot images to obtain the adjacent frame images;
Performing target position offset analysis on the adjacent frame images through the target optical flow tracking data to generate target position offset data;
Performing target motion period analysis on the target position offset data by utilizing the target motion identification data to generate target motion period analysis data; performing motion law exploration on the target motion period analysis data to obtain target motion law exploration data, wherein the target motion law exploration data comprises target motion law data and target motion irregular data;
Classifying targets of the standard underwater target single-view shot images according to the target motion law exploration data, generating first underwater target type data when the target motion law exploration data is the target motion law data, and generating second underwater target type data when the target motion law exploration data is the target motion irregular data.
In an embodiment of the invention, adjacent frames of the standard underwater target single-view shot image are selected based on the target motion identification data; these are typically images that are consecutive in time and correlated with the target motion. The adjacent frame images are processed with an optical flow algorithm to compute the optical flow vectors of the target between successive frames, which represent the motion trajectory and speed of the target over time. The target position offset in the optical flow vectors is then analyzed using the target optical flow tracking data, which describes the position change of the target between successive frames. The target position offset data is analyzed for periodicity using the target motion identification data; this includes identifying and analyzing periodic motion patterns of the target, such as periodic oscillation or repetitive movement. Based on the target motion period analysis data, the motion law of the target is explored and its motion behavior is classified as either regular or irregular. Classification then follows the target motion law exploration data: if the target exhibits an explicit motion law (such as periodic motion), the single-view shot image is classified as first underwater target type data; if the target exhibits irregular motion (such as random motion or motion with no apparent period), it is classified as second underwater target type data.
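A minimal sketch of this periodicity analysis, assuming NumPy; the spectral power-ratio test and the power_ratio_threshold value are hypothetical tuning choices standing in for whatever criterion a concrete embodiment would use:

```python
import numpy as np

def classify_motion(offsets: np.ndarray, power_ratio_threshold: float = 0.3) -> str:
    """Label a 1-D target position offset sequence as 'regular' (periodic)
    or 'irregular' by how much spectral power its dominant frequency holds."""
    x = offsets - offsets.mean()            # remove the DC component
    power = np.abs(np.fft.rfft(x)) ** 2     # power spectrum of the offsets
    power[0] = 0.0                          # ignore any residual DC bin
    total = power.sum()
    if total == 0.0:
        return "regular"                    # a motionless target is trivially regular
    # One dominant frequency suggests periodic motion (first target type);
    # spread-out spectral power suggests irregular motion (second target type).
    return "regular" if power.max() / total >= power_ratio_threshold else "irregular"
```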
Preferably, step S23 includes the steps of:
Step S231: when the target is confirmed to be the second underwater target type data, carrying out target boundary frame identification on the standard underwater target single-view shot image to obtain target boundary frame data;
Step S232: performing image core region segmentation on a standard underwater target single-view shot image through target boundary frame data to generate a target core region image;
Step S233: Performing gray value sampling on the target core region image to obtain a one-dimensional signal sequence of the target core region image; performing a Hilbert transform calculation on the target core region image according to the one-dimensional signal sequence to obtain underwater target single-view image ambiguity data;
Step S234: Performing multi-view shooting compensation on the standard underwater target single-view shooting image based on the underwater target single-view image ambiguity data, so as to generate the standard underwater target multi-view shooting image.
In the embodiment of the invention, when the standard underwater target single-view shot image is confirmed as second underwater target type data, a target detection algorithm (such as YOLO or Fast R-CNN) identifies the target bounding box in the image; the bounding box defines the position and extent of the target. Based on the target bounding box data, the core region of the target is extracted from the standard underwater target single-view shot image. An image segmentation algorithm (such as region growing or a deep-learning-based semantic segmentation model) may be used to separate the target from the background and generate the target core region image. The target core region image is then gray-value sampled and converted into a one-dimensional signal sequence describing the gray value variation of the core region. The ambiguity of this one-dimensional signal sequence is analyzed with a Hilbert transform calculation; the Hilbert transform yields the envelope of the analytic signal and is commonly used to evaluate the sharpness or blur of images, and it is particularly effective for assessing image quality and clarity in underwater environments. Based on the underwater target single-view image ambiguity data, the image parts needing multi-view compensation are determined, and a multi-view shooting technique re-shoots the blurred region of the target image from different angles to improve the clarity and detail presentation of the image.
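A sketch of the gray value sampling and Hilbert transform blur measure, assuming SciPy is available; the row-mean sampling scheme and the envelope-gradient statistic are illustrative assumptions rather than the exact calculation of the embodiment:

```python
import numpy as np
from scipy.signal import hilbert

def hilbert_blur_score(core_region: np.ndarray) -> float:
    """Sharpness proxy for a 2-D gray-level target core region image.
    Sharp edges produce rapid variations in the analytic-signal envelope,
    so lower scores indicate blurrier images."""
    # Collapse the core region to a 1-D gray-value signal (row means).
    signal = core_region.astype(np.float64).mean(axis=0)
    analytic = hilbert(signal - signal.mean())  # analytic signal via Hilbert transform
    envelope = np.abs(analytic)                 # instantaneous amplitude
    # Mean absolute gradient of the envelope: high for crisp edges.
    return float(np.mean(np.abs(np.diff(envelope))))
```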
Preferably, step S234 includes the steps of:
Step S2341: performing multi-view shooting and data acquisition through an IMU (inertial measurement unit) device on an underwater camera array based on single-view image ambiguity data of an underwater target to obtain an initial target multi-view image and target inertial measurement data;
Step S2342: performing data preprocessing on the target inertial measurement data to obtain target displacement data and target attitude information data, wherein the data preprocessing comprises denoising and integration;
Step S2343: Carrying out image space positioning compensation on the target displacement data, the target attitude information data and the initial target multi-view image using visual-inertial odometry, and generating a standard underwater target multi-view shooting image.
In the embodiment of the invention, the underwater camera array performs multi-view shooting based on the underwater target single-view image ambiguity data; the cameras can be arranged at different positions and angles to cover multiple views of the target. Meanwhile, inertial measurement data of the target are acquired in real time through the IMU devices on the underwater camera array; these data comprise accelerometer and gyroscope measurements and are used for subsequent motion state estimation and attitude calculation. The collected target inertial measurement data are preprocessed by denoising and integration: denoising removes sensor noise using digital filters or other signal processing techniques; integration integrates the denoised acceleration data to compute the displacement of the target, while the gyroscope data are used to compute the attitude information of the target (such as rotation angles). Visual-inertial odometry then combines the target displacement data, the attitude information data and the initial target multi-view images to perform image space positioning compensation: the displacement data determine the change in the target's position in space, and the attitude information data correct the pose change of the target across the images shot from different angles. The initial multi-view shot images are adjusted with the compensated displacement and attitude information to generate standard underwater target multi-view shooting images, which integrate the viewpoints from different angles and provide more comprehensive and accurate target information.
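A minimal sketch of the denoising and integration of the inertial data, assuming NumPy; the moving-average filter and cumulative-sum integration are simple stand-ins for a production visual-inertial odometry pipeline:

```python
import numpy as np

def imu_displacement(accel: np.ndarray, dt: float, win: int = 5) -> np.ndarray:
    """Denoise an (N, 3) accelerometer log and double-integrate it into
    per-axis displacement samples (an (N, 3) array)."""
    kernel = np.ones(win) / win
    # Moving-average denoising, applied independently to each axis.
    smoothed = np.stack(
        [np.convolve(accel[:, k], kernel, mode="same") for k in range(3)],
        axis=1)
    velocity = np.cumsum(smoothed, axis=0) * dt       # acceleration -> velocity
    displacement = np.cumsum(velocity, axis=0) * dt   # velocity -> displacement
    return displacement
```

In practice the gyroscope stream would be integrated analogously into attitude, and both estimates fused with the camera images in the visual-inertial odometry step; pure double integration drifts quickly, which is precisely why the visual correction matters.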
As an example of the present invention, referring to fig. 3, the step S3 in this example includes:
Step S31: Performing target feature point confirmation on a standard underwater target multi-view shot image and a standard underwater target single-view shot image to obtain multi-view image target feature point data and single-view image target feature point data;
Step S32: Carrying out image superposition on the standard underwater target multi-view shot image and the standard underwater target single-view shot image to generate a target view superposition image;
Step S33: Performing feature point matching on the target view superposition image through the multi-view image target feature point data and the single-view image target feature point data to generate a target matching feature point set;
Step S34: Performing depth estimation on the target matching feature point set by using a deep learning network to generate an underwater target depth map; and carrying out depth fusion reconstruction on the target view superposition image and the underwater target depth map to generate a target underwater fusion depth map.
In the embodiment of the invention, feature points are extracted and confirmed from the standard underwater target multi-view shot images and the standard underwater target single-view shot image; these can be key points, corner points or other salient features that uniquely describe the target across different viewpoints and images. The standard underwater target multi-view shot images and the standard underwater target single-view shot image are then superimposed; this step aligns the images from different viewpoints under the same coordinate system to form the target view superposition image. Feature point matching is performed on the target view superposition image using the multi-view image target feature point data and the single-view image target feature point data; common approaches include descriptor-based matching algorithms such as SIFT or SURF, or features extracted by a deep learning network. Depth estimation is then performed on the target matching feature point set with a deep learning network, inferring the depth distribution of the target in space from the positions of the matched feature points and their disparity information. Finally, depth fusion reconstruction combines the target view superposition image with the depth estimation result (the underwater target depth map), merging visual and depth information into a more accurate and fine target underwater fusion depth map.
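A sketch of the descriptor-based matching option, assuming OpenCV with SIFT available (cv2.SIFT_create) and two preprocessed grayscale views; Lowe's ratio test is a standard but assumed filtering choice:

```python
import cv2

def match_views(multi_view_img, single_view_img, ratio: float = 0.75):
    """Return matched (x, y) coordinate pairs between two views,
    forming a candidate target matching feature point set."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(multi_view_img, None)
    kp2, des2 = sift.detectAndCompute(single_view_img, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    candidates = matcher.knnMatch(des1, des2, k=2)  # two nearest neighbours each
    # Lowe's ratio test rejects ambiguous matches.
    good = [m for m, n in candidates if m.distance < ratio * n.distance]
    return [(kp1[m.queryIdx].pt, kp2[m.trainIdx].pt) for m in good]
```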
Preferably, step S34 includes the steps of:
Step S341: Screening the feature point motion trajectories of the target matching feature point set to obtain an image feature point trajectory set;
Step S342: Constructing a depth estimation network; predicting pixel depth values of the image feature point trajectory set by using the depth estimation network, and generating an underwater target preliminary depth map;
Step S343: Carrying out multi-view depth map fusion on the underwater target preliminary depth map by using the target view superposition image to generate a target underwater fusion depth map.
In the embodiment of the invention, the target matching feature point set is analyzed and screened by motion trajectory: the feature points can be salient points in the images from different viewpoints, and tracking their motion trajectories yields their continuity in time and space. A depth estimation network is then constructed, typically with a deep learning approach such as a convolutional neural network (CNN) or one of its variants; the network learns to predict pixel-level depth values from the image feature point trajectories. The depth estimation network processes the image feature point trajectory set and predicts the depth value of each pixel, reflecting the depth information of the target in the multi-view images. Using the target view superposition image as a reference, multi-view depth map fusion is performed on the underwater target preliminary depth map; this combines the depth information under each viewpoint and improves the accuracy and completeness of the depth map.

To build the network, a training dataset is first collected and prepared. The dataset should include underwater multi-view images and their corresponding depth ground truth values or depth maps; simulated data or actually acquired data may be used to ensure coverage of different underwater environments and target types. A suitable deep learning architecture is then selected, such as a CNN. Common depth estimation networks include: a single-view depth estimation network, which processes one image and predicts the depth of each pixel; a multi-view depth estimation network, which performs depth estimation from image information of multiple viewpoints, processing several input images jointly through a convolutional network; and methods based on autoencoders or generative adversarial networks, which use the generative capability of the network to produce depth images. The network is trained with the prepared dataset; a loss function is defined during training, typically a depth-difference loss or another loss that compares the network-generated depth map with the real depth map. Data enhancement and regularization improve the generalization and robustness of the network: enhancement includes operations such as image rotation, scaling and flipping, and regularization controls overfitting through means such as batch normalization and dropout. Network parameters are optimized with gradient descent or its variants to minimize the loss, and learning rate schedules and batch training techniques can improve the convergence speed and stability of the network. During training, the performance of the network on the validation set is monitored, and the network architecture or training parameters are adjusted accordingly.
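A deliberately small encoder-decoder sketch of such a depth estimation network, assuming PyTorch; the layer sizes, the L1 depth-difference loss and the random stand-in tensors are illustrative assumptions, not the patented architecture:

```python
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    """Toy convolutional encoder-decoder mapping an RGB image to a
    one-channel per-pixel depth prediction."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))

# One illustrative training step against a ground-truth depth map.
model = TinyDepthNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
images = torch.rand(2, 3, 64, 64)       # stand-in image batch
gt_depth = torch.rand(2, 1, 64, 64)     # stand-in depth ground truth
loss = nn.functional.l1_loss(model(images), gt_depth)  # depth-difference loss
optimizer.zero_grad()
loss.backward()
optimizer.step()
```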
As an example of the present invention, referring to fig. 4, the step S4 in this example includes:
Step S41: Performing three-dimensional point cloud conversion on the target underwater fusion depth map and the standard underwater target multi-view shooting image to generate target underwater three-dimensional point cloud data;
Step S42: carrying out surface reconstruction on the target underwater three-dimensional point cloud data to generate target three-dimensional surface reconstruction data;
Step S43: performing texture mapping on the target three-dimensional surface reconstruction data to generate target three-dimensional texture mapping data;
Step S44: constructing a three-dimensional model based on the target three-dimensional surface reconstruction data and the target three-dimensional texture mapping data to generate a target underwater three-dimensional preliminary model; and performing model rendering on the target underwater three-dimensional preliminary model so as to generate a high-precision target three-dimensional underwater model to execute three-dimensional reconstruction operation of the underwater image.
In the embodiment of the invention, the depth information from the target underwater fusion depth map and the standard underwater target multi-view shot images is converted into three-dimensional point cloud data; this can be done by reconstructing spatial coordinates from the depth image and converting them into a point cloud format. Surface reconstruction is then performed on the generated target underwater three-dimensional point cloud data. Common surface reconstruction methods include mesh reconstruction from point clouds, which generates a triangulated surface from the point cloud data to form the precise three-dimensional geometry of the target, and voxel-grid methods, which convert the point cloud into a voxel grid and then reconstruct the surface with a voxel surface extraction algorithm. Texture mapping projects the texture information from the standard underwater target multi-view shot images onto the reconstructed three-dimensional surface; this can be achieved with image projection techniques or by combining the texture information with the three-dimensional geometric model. A three-dimensional model is then built from the target three-dimensional surface reconstruction data and the texture mapping data, integrating the surface geometry and texture into the target underwater three-dimensional preliminary model. Finally, the preliminary model is rendered; the rendering process can use ray tracing or real-time rendering techniques to project the three-dimensional model into a two-dimensional image and produce the visual effect of the high-precision target three-dimensional underwater model.
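A sketch of the depth-map-to-point-cloud conversion of step S41, assuming NumPy and a pinhole camera model; the intrinsics fx, fy, cx, cy are assumed to come from the underwater camera calibration:

```python
import numpy as np

def depth_to_point_cloud(depth: np.ndarray, fx: float, fy: float,
                         cx: float, cy: float) -> np.ndarray:
    """Back-project an (H, W) fused depth map into an (M, 3) point cloud,
    keeping only pixels with a valid (positive) depth."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx    # pinhole back-projection, X axis
    y = (v - cy) * z / fy    # pinhole back-projection, Y axis
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]
```

The resulting cloud can then be handed to a surface reconstruction routine (for example, Poisson surface reconstruction as provided by libraries such as Open3D) before texture mapping.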
In the present specification, there is provided a deep learning-based underwater image three-dimensional reconstruction system for performing the above-described deep learning-based underwater image three-dimensional reconstruction method, the deep learning-based underwater image three-dimensional reconstruction system comprising:
The target identification module is used for acquiring target underwater position data; performing single-view preliminary shooting according to the underwater position data of the target to obtain a preliminary shooting image of the underwater target; performing image ambiguity analysis on the primary shot image of the underwater target to generate single-view image ambiguity data of the underwater target; performing target motion recognition on the underwater target single-view image ambiguity data to generate target motion recognition data;
The multi-view compensation module is used for carrying out target classification on the standard underwater target single-view shot image according to the target motion identification data to generate first underwater target type data and second underwater target type data; performing multi-view shooting compensation on the standard underwater target single-view shooting image based on the first underwater target type data and the second underwater target type data, so as to generate a standard underwater target multi-view shooting image;
The depth estimation module is used for carrying out image superposition on the standard underwater target multi-view shot image and the standard underwater target single-view shot image to generate a target view superposition image; performing depth estimation on the target view superposition image to generate an underwater target depth map; and performing depth fusion reconstruction on the target view superposition image and the underwater target depth map to generate a target underwater fusion depth map;
The three-dimensional reconstruction module is used for carrying out three-dimensional point cloud conversion on the target underwater fusion depth map and the standard underwater target multi-view shooting image to generate target underwater three-dimensional point cloud data; constructing a three-dimensional model of the target underwater three-dimensional point cloud data to generate a target underwater three-dimensional preliminary model; and performing model rendering on the target underwater three-dimensional preliminary model so as to generate a high-precision target three-dimensional underwater model to execute three-dimensional reconstruction operation of the underwater image.
The invention has the beneficial effects that: by acquiring the target underwater position data and performing single-view preliminary shooting and image ambiguity analysis, preliminary image data and sharpness analysis are provided, laying a foundation for subsequent processing. Classifying the images using the target motion identification data to generate the different underwater target type data, and performing multi-view shooting compensation, captures the details and motion characteristics of the target more comprehensively and improves the information richness and accuracy of the images. Superimposing the standard underwater target multi-view shot images with the single-view shot image, performing depth estimation and then depth fusion reconstruction generates a high-quality underwater target depth map; these steps effectively improve the acquisition of depth information and the image quality in underwater scenes. Converting the target underwater fusion depth map and the multi-view shot images into a three-dimensional point cloud and constructing the model yields a high-precision target underwater three-dimensional model, and the model rendering improves the accuracy and visual effect of the three-dimensional reconstruction operation. Therefore, the invention improves the comprehensiveness and accuracy of three-dimensional image reconstruction by performing image blur compensation, multi-view shooting and accurate depth estimation on the underwater target.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An underwater image three-dimensional reconstruction method based on deep learning, characterized by comprising the following steps:
Step S1: acquiring target underwater position data; performing single-view preliminary shooting according to the underwater position data of the target to obtain a preliminary shooting image of the underwater target; performing image ambiguity analysis on the primary shot image of the underwater target to generate single-view image ambiguity data of the underwater target; performing target motion recognition on the underwater target single-view image ambiguity data to generate target motion recognition data;
Step S2: performing target classification on the standard underwater target single-view shot image according to the target motion identification data to generate first underwater target type data and second underwater target type data; performing multi-view shooting compensation on the standard underwater target single-view shooting image based on the first underwater target type data and the second underwater target type data, so as to generate a standard underwater target multi-view shooting image;
Step S3: Performing image superposition on a standard underwater target multi-view shot image and a standard underwater target single-view shot image to generate a target view superposition image; performing depth estimation on the target view superposition image to generate an underwater target depth map; performing depth fusion reconstruction on the target view superposition image and the underwater target depth map to generate a target underwater fusion depth map;
Step S4: Performing three-dimensional point cloud conversion on the target underwater fusion depth map and the standard underwater target multi-view shooting image to generate target underwater three-dimensional point cloud data; constructing a three-dimensional model of the target underwater three-dimensional point cloud data to generate a target underwater three-dimensional preliminary model; and performing model rendering on the target underwater three-dimensional preliminary model so as to generate a high-precision target three-dimensional underwater model to execute three-dimensional reconstruction operation of the underwater image.
2. The deep learning-based underwater image three-dimensional reconstruction method according to claim 1, wherein the step S1 comprises the steps of:
Step S11: Acquiring target underwater position data by using a GPS;
Step S12: performing single-view preliminary shooting by utilizing an underwater camera array according to the underwater position data of the target to obtain an underwater target preliminary shooting image;
Step S13: performing image preprocessing on the primary shooting image of the underwater target to obtain a standard single-view shooting image of the underwater target, wherein the image preprocessing comprises image brightness enhancement, image geometric transformation and image smoothing;
Step S14: performing image ambiguity analysis on a standard underwater target single-view shot image to generate underwater target single-view image ambiguity data; and carrying out target motion recognition on the underwater target single-view image ambiguity data to generate target motion recognition data.
3. The deep learning-based underwater image three-dimensional reconstruction method according to claim 1, wherein the step S2 comprises the steps of:
Step S21: performing target optical flow tracking on the standard underwater target single-view shot image according to the target motion identification data to generate target optical flow tracking data;
Step S22: performing target classification on a standard underwater target single-view shot image through target optical flow tracking data and target motion identification data to generate first underwater target type data and second underwater target type data; judging the target type of the standard underwater target single-view shot image, and outputting the standard underwater target single-view shot image when the standard underwater target single-view shot image is confirmed to be the first underwater target type data;
Step S23: when the target is confirmed to be the second underwater target type data, multi-view shooting compensation is carried out on the standard underwater target single-view shooting image based on the underwater target single-view image ambiguity data, so that the standard underwater target multi-view shooting image is generated.
4. The deep learning-based underwater image three-dimensional reconstruction method according to claim 3, wherein the target classification of the standard underwater target single-view shot image by the target optical flow tracking data and the target motion identification data comprises:
screening adjacent frame images of the standard underwater target single-view shot images to obtain the adjacent frame images;
Performing target position offset analysis on the adjacent frame images through the target optical flow tracking data to generate target position offset data;
Performing target motion period analysis on the target position offset data by utilizing the target motion identification data to generate target motion period analysis data; performing motion law exploration on the target motion period analysis data to obtain target motion law exploration data, wherein the target motion law exploration data comprises target motion law data and target motion irregular data;
Classifying targets of the standard underwater target single-view shot images according to the target motion law exploration data, generating first underwater target type data when the target motion law exploration data is the target motion law data, and generating second underwater target type data when the target motion law exploration data is the target motion irregular data.
5. The deep learning-based underwater image three-dimensional reconstruction method according to claim 3, wherein the step S23 comprises the steps of:
Step S231: when the target is confirmed to be the second underwater target type data, carrying out target boundary frame identification on the standard underwater target single-view shot image to obtain target boundary frame data;
Step S232: performing image core region segmentation on a standard underwater target single-view shot image through target boundary frame data to generate a target core region image;
Step S233: Performing gray value sampling on the target core region image to obtain a one-dimensional signal sequence of the target core region image; performing a Hilbert transform calculation on the target core region image according to the one-dimensional signal sequence to obtain underwater target single-view image ambiguity data;
Step S234: Performing multi-view shooting compensation on the standard underwater target single-view shooting image based on the underwater target single-view image ambiguity data, so as to generate the standard underwater target multi-view shooting image.
6. The deep learning-based underwater image three-dimensional reconstruction method according to claim 5, wherein the step S234 comprises the steps of:
Step S2341: performing multi-view shooting and data acquisition through an IMU (inertial measurement unit) device on an underwater camera array based on single-view image ambiguity data of an underwater target to obtain an initial target multi-view image and target inertial measurement data;
Step S2342: performing data preprocessing on the target inertial measurement data to obtain target displacement data and target attitude information data, wherein the data preprocessing comprises denoising and integration;
Step S2343: Carrying out image space positioning compensation on the target displacement data, the target attitude information data and the initial target multi-view image using visual-inertial odometry, and generating a standard underwater target multi-view shooting image.
7. The deep learning-based underwater image three-dimensional reconstruction method according to claim 1, wherein the step S3 comprises the steps of:
Step S31: Performing target feature point confirmation on a standard underwater target multi-view shot image and a standard underwater target single-view shot image to obtain multi-view image target feature point data and single-view image target feature point data;
Step S32: Carrying out image superposition on the standard underwater target multi-view shot image and the standard underwater target single-view shot image to generate a target view superposition image;
Step S33: Performing feature point matching on the target view superposition image through the multi-view image target feature point data and the single-view image target feature point data to generate a target matching feature point set;
Step S34: Performing depth estimation on the target matching feature point set by using a deep learning network to generate an underwater target depth map; and carrying out depth fusion reconstruction on the target view superposition image and the underwater target depth map to generate a target underwater fusion depth map.
8. The deep learning-based underwater image three-dimensional reconstruction method according to claim 7, wherein the step S34 comprises the steps of:
Step S341: Screening the feature point motion trajectories of the target matching feature point set to obtain an image feature point trajectory set;
Step S342: Constructing a depth estimation network; predicting pixel depth values of the image feature point trajectory set by using the depth estimation network, and generating an underwater target preliminary depth map;
Step S343: Carrying out multi-view depth map fusion on the underwater target preliminary depth map by using the target view superposition image to generate a target underwater fusion depth map.
9. The deep learning-based underwater image three-dimensional reconstruction method according to claim 1, wherein the step S4 comprises the steps of:
Step S41: Performing three-dimensional point cloud conversion on the target underwater fusion depth map and the standard underwater target multi-view shooting image to generate target underwater three-dimensional point cloud data;
Step S42: carrying out surface reconstruction on the target underwater three-dimensional point cloud data to generate target three-dimensional surface reconstruction data;
Step S43: performing texture mapping on the target three-dimensional surface reconstruction data to generate target three-dimensional texture mapping data;
Step S44: constructing a three-dimensional model based on the target three-dimensional surface reconstruction data and the target three-dimensional texture mapping data to generate a target underwater three-dimensional preliminary model; and performing model rendering on the target underwater three-dimensional preliminary model so as to generate a high-precision target three-dimensional underwater model to execute three-dimensional reconstruction operation of the underwater image.
10. A deep learning-based underwater image three-dimensional reconstruction system for performing the deep learning-based underwater image three-dimensional reconstruction method as set forth in claim 1, the deep learning-based underwater image three-dimensional reconstruction system comprising:
The target identification module is used for acquiring target underwater position data; performing single-view preliminary shooting according to the underwater position data of the target to obtain a preliminary shooting image of the underwater target; performing image ambiguity analysis on the primary shot image of the underwater target to generate single-view image ambiguity data of the underwater target; performing target motion recognition on the underwater target single-view image ambiguity data to generate target motion recognition data;
The multi-view compensation module is used for carrying out target classification on the standard underwater target single-view shot image according to the target motion identification data to generate first underwater target type data and second underwater target type data; performing multi-view shooting compensation on the standard underwater target single-view shooting image based on the first underwater target type data and the second underwater target type data, so as to generate a standard underwater target multi-view shooting image;
The depth estimation module is used for carrying out image superposition on the standard underwater target multi-view shot image and the standard underwater target single-view shot image to generate a target view superposition image; performing depth estimation on the target view superposition image to generate an underwater target depth map; and performing depth fusion reconstruction on the target view superposition image and the underwater target depth map to generate a target underwater fusion depth map;
The three-dimensional reconstruction module is used for carrying out three-dimensional point cloud conversion on the target underwater fusion depth map and the standard underwater target multi-view shooting image to generate target underwater three-dimensional point cloud data; constructing a three-dimensional model of the target underwater three-dimensional point cloud data to generate a target underwater three-dimensional preliminary model; and performing model rendering on the target underwater three-dimensional preliminary model so as to generate a high-precision target three-dimensional underwater model to execute three-dimensional reconstruction operation of the underwater image.