
CN116958393A - Incremental image rendering method and device - Google Patents

Incremental image rendering method and device Download PDF

Info

Publication number
CN116958393A
Authority
CN
China
Prior art keywords
data
scene data
scene
radiation field
trained
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310964562.7A
Other languages
Chinese (zh)
Inventor
赵飞飞
刘祥德
于金波
严旭
王梦魁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Digital City Research Center
Original Assignee
Beijing Digital City Research Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Digital City Research Center filed Critical Beijing Digital City Research Center
Priority to CN202310964562.7A priority Critical patent/CN116958393A/en
Publication of CN116958393A publication Critical patent/CN116958393A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/10 Geometric effects
    • G06T15/20 Perspective computation
    • G06T15/205 Image-based rendering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/096 Transfer learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/50 Lighting effects
    • G06T15/55 Radiosity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/50 Lighting effects
    • G06T15/506 Illumination models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an incremental image rendering method and device, relating to the technical fields of computer vision and computer graphics. The method comprises the following steps: acquiring scene data; if a moving object appears in the scene data, deleting the moving object to obtain target data; training a first neural radiance field with the target data to obtain the colors and volume densities of the sampling points corresponding to the target data; training a second neural radiance field, based on the trained first neural radiance field, with the target data and the sampling-point colors and volume densities corresponding to the target data; and acquiring the sampling-point data and volume densities corresponding to all of the target data through the trained second neural radiance field and integrating them to obtain an image rendering result. In this way, moving objects segmented out of the scene data can be accurately deleted during incremental reconstruction of the same scene, so that higher-quality images with clearer details can be rendered even when moving objects occlude the scene.

Description

Incremental image rendering method and device
Technical Field
The application relates to the technical field of computer vision and computer graphics, in particular to an incremental image rendering method and device.
Background
A neural radiance field is a deep-learning-based three-dimensional reconstruction technique that can reconstruct a target scene captured in multiple two-dimensional images or videos into a high-quality three-dimensional model, and it is therefore widely used in virtual reality, augmented reality, game production, and other fields.
In the related art, a neural radiance field represents the radiance field of a three-dimensional scene with a neural network model. By inputting the direction information and distance information of each sampling point in a two-dimensional image or video into the neural network model, the color and transparency of that sampling point can be obtained, so that a realistic image can be rendered.
However, when a moving object that occludes the target scene appears in the two-dimensional image or video, valid direction information and distance information cannot be obtained for part of the target scene region, so an image of that region cannot be rendered accurately.
Disclosure of Invention
In view of this, embodiments of the present application provide an incremental image rendering method and apparatus that can accurately render an image of the target scene even when a moving object occluding the target scene appears in the two-dimensional image or video.
The embodiments of the present application disclose the following technical solutions:
In a first aspect, the present application provides an incremental image rendering method, the method comprising:
acquiring scene data;
if a moving object appears in the scene data, deleting the moving object segmented out by a semantic segmentation model from the scene data to obtain target data, wherein the moving object is an object that violates the scene-geometry-invariance assumption;
training a first neural radiance field with the target data to obtain a trained first neural radiance field and the sampling-point colors and volume densities corresponding to the target data;
training a second neural radiance field, based on the trained first neural radiance field, with the target data and the sampling-point colors and volume densities corresponding to the target data, to obtain a trained second neural radiance field;
acquiring the sampling-point data and volume densities corresponding to all of the target data through the trained second neural radiance field; and
integrating the sampling-point data and volume densities corresponding to all of the target data to obtain an image rendering result.
Optionally, the acquiring of scene data includes:
acquiring original data; and
executing a generative latent optimization algorithm on the original data to generate an appearance embedding of the original data, so as to acquire the scene data.
Optionally, if a moving object appears in the scene data, deleting the moving object segmented out by the semantic segmentation model from the scene data includes:
detecting whether the scene data include a moving object according to an object detection method of the Segment Anything Model; and
if the scene data include a moving object, deleting the moving object segmented out by the semantic segmentation model from the scene data.
Optionally, the scene data include first scene data and second scene data, and the detecting whether the scene data include a moving object includes:
identifying object categories in the first scene data and object categories in the second scene data;
if an object category in the first scene data is the same as an object category in the second scene data, acquiring the similarity between the object in the first scene data and the object in the second scene data; and
if the similarity is less than or equal to a first preset threshold, determining that the object in the first scene data and the object in the second scene data are moving objects.
Optionally, the method further comprises:
if an object category in the first scene data is different from the object categories in the second scene data, determining that the object in the first scene data and the object in the second scene data are moving objects.
Optionally, the similarity is calculated as follows:
Sim_i = (A_i · B_i) / (‖A_i‖ × ‖B_i‖)
where Sim_i is the similarity, A_i is the feature vector of the object in the first scene data, and B_i is the feature vector of the object in the second scene data.
Optionally, the training of the second neural radiance field based on the trained first neural radiance field includes:
determining a total loss function based on the trained first neural radiance field, the total loss function being related to a segmentation mask, a volume density loss function, a color loss function, and a distillation loss function; and
training the second neural radiance field according to the total loss function.
Optionally, the distillation loss function is formulated as follows:
where L_dis is the distillation loss function, x is position information, d is the view direction, F_t(x, d) is the result of inputting the position information and view direction into the trained first neural radiance field, and F_s(x, d) is the result of inputting the position information and view direction into the second neural radiance field.
In a second aspect, the present application discloses an image rendering apparatus based on a neural radiance field, the apparatus comprising: a first acquisition module, a deletion module, a second acquisition module, a third acquisition module, a fourth acquisition module, and an integration module;
the first acquisition module is configured to acquire scene data;
the deletion module is configured to, if a moving object appears in the scene data, delete the moving object segmented out by a semantic segmentation model from the scene data to obtain target data, wherein the moving object is an object that violates the scene-geometry-invariance assumption;
the second acquisition module is configured to train a first neural radiance field with the target data to obtain a trained first neural radiance field and the sampling-point colors and volume densities corresponding to the target data;
the third acquisition module is configured to train a second neural radiance field, based on the trained first neural radiance field, with the target data and the sampling-point colors and volume densities corresponding to the target data, to obtain a trained second neural radiance field;
the fourth acquisition module is configured to acquire the sampling-point data and volume densities corresponding to all of the target data through the trained second neural radiance field; and
the integration module is configured to integrate the sampling-point data and volume densities corresponding to all of the target data to obtain an image rendering result.
Optionally, the deletion module specifically includes a detection sub-module and a deletion sub-module;
the detection sub-module is configured to detect whether the scene data include a moving object according to an object detection method of the Segment Anything Model; and
the deletion sub-module is configured to delete the moving object segmented out by the semantic segmentation model from the scene data if the scene data include a moving object.
Compared with the prior art, the present application has the following beneficial effects:
The application discloses an incremental image rendering method and device. The method comprises: acquiring scene data; if a moving object appears in the scene data, deleting the moving object segmented out by a semantic segmentation model from the scene data to obtain target data, wherein the moving object is an object that violates the scene-geometry-invariance assumption; training a first neural radiance field with the target data to obtain a trained first neural radiance field and the sampling-point colors and volume densities corresponding to the target data; training a second neural radiance field, based on the trained first neural radiance field, with the target data and the sampling-point colors and volume densities corresponding to the target data, to obtain a trained second neural radiance field; acquiring the sampling-point data and volume densities corresponding to all of the target data through the trained second neural radiance field; and integrating the sampling-point data and volume densities corresponding to all of the target data to obtain an image rendering result. In this way, moving objects segmented out of the scene data can be accurately deleted during incremental reconstruction of the same scene, so that higher-quality images with clearer details can be rendered even when moving objects occlude the scene.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from these drawings without inventive effort.
FIG. 1 is a flowchart of an incremental image rendering method according to an embodiment of the present application;
FIG. 2 is a flow chart of a method for detecting a moving object according to an embodiment of the present application;
FIG. 3 is a schematic diagram of detecting a moving object according to an embodiment of the present application;
FIG. 4 is a schematic diagram of input and output of a neural radiation field according to an embodiment of the present application;
FIG. 5 is a schematic diagram of image rendering according to an embodiment of the present application;
FIG. 6 is a schematic diagram of an incremental image rendering apparatus according to an embodiment of the present application.
Detailed Description
The neural radiation field (Representing Scenes as Neural Radiance Fields for View Synthesis, neRF) is a three-dimensional reconstruction technique based on deep learning, which can reconstruct scenes in a plurality of two-dimensional images or videos into a high-quality three-dimensional model, and is widely applied to the fields of virtual reality, augmented reality, game making and the like.
In the related art, a neural radiation field uses a neural network model to represent a radiation field of a three-dimensional scene, and the color and transparency of each point in a two-dimensional image or video can be obtained by inputting the direction information and the distance information of the point into the neural network model, so that a vivid image is rendered and generated.
However, neural radiation fields may be subject to some challenges when moving objects occluding the target scene appear in a two-dimensional image or video.
On the one hand, occlusion may result in a part of the target scene area not being able to obtain valid direction information. This is because the neural radiation field is trained and generated based on two-dimensional image or video data captured from different perspectives, and the presence of occlusions may result in missing or inaccurate directional information for these target scene regions, thereby failing to accurately render an image of the target scene region.
On the other hand, occlusion also causes a problem that depth information is discontinuous in a part of the target scene area. This is because the neural radiation field is to sample points in the target scene using rays from different perspectives and render according to the depth values of the sampled points. When occlusion exists, depth information may be discontinuous, and it is also difficult for the neural radiation field to accurately model the occluded region, and an image of the target scene region cannot be accurately rendered.
In view of this, the present application discloses an incremental image rendering method and apparatus. The method comprises: acquiring scene data; if a moving object appears in the scene data, deleting the moving object segmented out by a semantic segmentation model from the scene data to obtain target data, wherein the moving object is an object that violates the scene-geometry-invariance assumption; training a first neural radiance field with the target data to obtain a trained first neural radiance field and the sampling-point colors and volume densities corresponding to the target data; training a second neural radiance field, based on the trained first neural radiance field, with the target data and the sampling-point colors and volume densities corresponding to the target data, to obtain a trained second neural radiance field; acquiring the sampling-point data and volume densities corresponding to all of the target data through the trained second neural radiance field; and integrating the sampling-point data and volume densities corresponding to all of the target data to obtain an image rendering result. In this way, moving objects segmented out of the scene data can be accurately deleted during incremental reconstruction of the same scene, so that higher-quality images with clearer details can be rendered even when moving objects occlude the scene.
To help those skilled in the art better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort fall within the scope of protection of the present application.
Referring to FIG. 1, which is a flowchart of the incremental image rendering method provided by an embodiment of the present application, the method comprises the following steps:
s101: scene data is acquired.
Scene data refers to image data or video data obtained by shooting a target scene from multiple angles. The scene may be a real scene or a simulated scene (i.e., a synthetic dataset).
It should be noted that the scene data may be RGB image data or RGB video data, or image data or video data in other formats; the present application does not limit the specific format.
It should also be noted that if the scene corresponding to the acquired original scene data (hereinafter referred to as the original data) is an outdoor real scene, the brightness in the original data may be uneven or even vary greatly with real-world factors such as weather and illumination conditions, which prevents the scene image from being rendered accurately. Therefore, the original data can be processed to obtain processed scene data, and the subsequent operations are performed on the processed scene data.
In some specific implementations, after the original data are acquired, a generative latent optimization (Generative Latent Optimization, GLO) algorithm may be executed on the original data to generate an appearance embedding of the original data, so as to perform global illumination unification and obtain the processed scene data. The appearance embedding handles changes in real-world factors such as weather and illumination conditions well and enables interpolation between different illuminations.
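As an illustration only (not part of the filed text), a GLO-style appearance embedding is typically implemented as one learnable latent code per training image, optimized jointly with the radiance field and fed to its color branch; a minimal PyTorch sketch under these assumptions, with hypothetical module names and dimensions, is:

import torch
import torch.nn as nn

class AppearanceEmbedding(nn.Module):
    # One learnable appearance code per training image (GLO-style).
    # The code is concatenated to the color-branch input so the field can
    # explain global illumination changes without altering geometry.
    def __init__(self, num_images: int, embed_dim: int = 48):
        super().__init__()
        self.codes = nn.Embedding(num_images, embed_dim)
        nn.init.normal_(self.codes.weight, std=0.01)

    def forward(self, image_ids: torch.Tensor) -> torch.Tensor:
        # image_ids: (N,) index of the source image for each sampled ray
        return self.codes(image_ids)

# Usage: append the returned code to the view-direction features before the
# color head of the radiance-field MLP.
embedding = AppearanceEmbedding(num_images=100)
codes = embedding(torch.tensor([0, 3, 7]))   # (3, 48) appearance codes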
It will be appreciated that if the acquired original scene data already have constant illumination and good weather conditions, the above processing step need not be performed; the present application is not limited in this regard.
S102: whether the scene data includes a moving object is detected, if yes, S103 is executed, and if no, S104 is executed.
Moving objects refer to objects that violate scene geometry invariance assumptions. After the scene data is acquired, it is necessary to detect whether a moving object is included in the scene data. If the scene data includes a moving object, the step S103 and subsequent operations need to be performed.
In some specific implementations, the detection of moving objects may be performed on scene data based on a target detection method provided by a subdivision model (Segment Anything Model, SAM). Where a subdivision model is a model for semantic segmentation of an image, it aims at achieving accurate segmentation of various objects or regions in the image such that each pixel is assigned to the correct semantic class. The application of subdivision models is very widespread. For example, in the field of computer vision and image processing, subdivision models may be used for tasks such as object detection, image segmentation, scene understanding, and virtual reality.
Referring to FIG. 2, which is a flowchart of the moving-object detection method according to an embodiment of the present application, the moving object may be detected according to the following steps:
S201: Identify object categories in the first scene data and the second scene data.
The first scene data and the second scene data are both scene data acquired in step S101; they only need to be acquired at different times. Specifically, the first scene data may be acquired before the second scene data, or the second scene data may be acquired first. For ease of understanding, the following description assumes that the first scene data are acquired before the second scene data.
After the first scene data and the second scene data are acquired, the object categories in the first scene data and in the second scene data need to be identified.
Referring to FIG. 3, a schematic diagram of moving-object detection according to an embodiment of the present application, the object categories in the first scene data 10 and the second scene data 20 are first identified separately. As shown in the figure, the objects in the first scene data 10 are a house 11 and a girl 12, whose categories are house and child respectively; the objects in the second scene data 20 are a house 21, a boy 22, and a car 23, whose categories are house, child, and vehicle respectively.
S202: whether the object type in the first scene data is the same as the object type in the second scene data is judged, if yes, S203 is executed, and if not, S205 is executed.
After the object categories in the first scene data and the second scene data are acquired, it is necessary to determine whether the object category in the first scene data matches the object category in the second scene data, that is, whether the object categories are the same. If the two are the same, the step of S203 is performed, and if the two are different, the step of S205 is performed.
In some specific implementations, if the first scene data and the second scene data include a plurality of object categories, it is necessary to sequentially determine whether the first object in the first scene data matches a certain object in the second scene data. If so, S203 is performed for the first object in the first scene data and the corresponding object in the second scene data, and if not, S205 is performed for the first object. Then, it is continued to determine whether the second object in the first scene data matches a certain object in the second scene data, and so on.
The above description is made on the basis of the first scene data to determine whether or not there is an object corresponding to the object in the first scene data in the second scene data, or on the basis of the second scene data to determine whether or not there is an object corresponding to the object in the second scene data in the first scene data, which is not limited to this application.
Illustratively, as shown in fig. 3, the categories of the house 11 and the house 21 are houses, so that both match, the step of S203 may be performed for the house 11 and the house 21. The categories of the girl 12 and the boy 22 are children, and thus the two match, the step of S203 may be performed for the girl 12 and the boy 22. However, there is no object in the first scene data 10 that matches the car 23 in the second scene data 20, so the step of S205 may be performed for that car 23.
S203: similarity of objects in the first scene data and objects in the second scene data is calculated.
If the object class in the first scene data and the object class in the second scene data match, then the similarity of the feature vector of the object in the first scene data and the object in the second scene data may continue to be calculated.
In some specific implementations, the similarity between the object in the first scene data and the object in the second scene data may be calculated according to the following equation (1):
Sim_i = (A_i · B_i) / (‖A_i‖ × ‖B_i‖)    (1)
where Sim_i is the similarity, A_i is the feature vector of the object in the first scene data, and B_i is the feature vector of the object in the second scene data.
It will be appreciated that the similarity Sim_i takes values in the range [-1, 1]. The closer the similarity is to 1, the more similar the feature vector of the object in the first scene data is to that of the object in the second scene data; the closer it is to -1, the more dissimilar the two feature vectors are.
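The cosine-similarity computation of equation (1) can be sketched in a few lines of NumPy; how the object feature vectors are extracted (for example, from a detection or segmentation backbone) is left open in the text, so the inputs below are assumptions:

import numpy as np

def object_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two object feature vectors, in [-1, 1].
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return -1.0  # degenerate features are treated as dissimilar
    return float(np.dot(a, b) / denom)

# Example: objects of the same category are flagged as moving when their
# features disagree (threshold value taken from the example in the text).
FIRST_PRESET_THRESHOLD = 0.5
sim = object_similarity(np.array([0.2, 0.9, 0.1]), np.array([0.3, 0.8, 0.0]))
is_moving = sim <= FIRST_PRESET_THRESHOLD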
S204: and judging whether the similarity is smaller than or equal to a first preset threshold, if so, executing S205.
After the step S203 is performed to calculate the similarity between the object in the first scene data and the object in the second scene data, it may be determined whether the similarity is less than or equal to the first preset threshold.
In some specific implementations, the first preset threshold may be 0.5. That is, if the similarity is less than or equal to 0.5, it is indicated that the object in the first scene data and the object in the second scene data are not similar, and the step of S205 may be continued. It should be noted that, for a specific first preset threshold, the present application is not limited, and the first preset threshold may be set to 0.4, 0.3, or the like.
It will be appreciated that if the similarity is greater than the first predetermined threshold, it indicates that the object in the first scene data is very similar to the object in the second scene data, and it may be determined that the object is not a moving object, without performing subsequent steps.
Illustratively, as shown in fig. 3, if the similarity between the house 11 and the house 21 is greater than the first preset threshold, the house 11 is proved to be very similar to the house 21, and no subsequent steps are required. If the similarity between the girl 12 and the boy 22 is smaller than the first preset threshold, it is verified that the girl 12 and the boy 22 are not similar, and step S205 is performed.
S205: the object is determined to be a moving object.
For example, as shown in fig. 3, after the steps of S201 to S204 described above are performed, the girl 12, the boy 22, and the automobile 23 may all be determined as moving objects. Therefore, the moving object in the scene data can be judged, the moving object in the scene data is further accurately segmented, and the interference of the moving object in the scene data is further eliminated.
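Steps S201 to S205 can be combined into a small matching routine; the (category, feature) representation of detected objects and the greedy pairing below are illustrative assumptions, and object_similarity is the helper sketched under equation (1):

def detect_moving_objects(objects_a, objects_b, threshold=0.5):
    # objects_a / objects_b: lists of (category, feature_vector) tuples from
    # the first and second scene data.
    moving = []
    unmatched_b = list(objects_b)
    for category, feature in objects_a:
        idx = next((i for i, o in enumerate(unmatched_b) if o[0] == category), None)
        if idx is None:
            moving.append((category, feature))          # no counterpart in the other capture: S205
            continue
        matched = unmatched_b.pop(idx)
        if object_similarity(feature, matched[1]) <= threshold:
            moving.append((category, feature))          # same category but dissimilar: S204 -> S205
            moving.append(matched)
    moving.extend(unmatched_b)                          # objects appearing only in the second capture
    return moving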
The above embodiment takes image data as the scene data as an example. If the scene data are video data, an object whose coordinates change in the video data may be used directly as the moving object, and step S103 and the subsequent operations are executed.
S103: and deleting the moving objects segmented according to the semantic segmentation model in the scene data to obtain target data.
After the moving objects are detected to be included in the scene data, the moving objects in the scene data can be separated based on the semantic segmentation model of the subdivision model, and a mask is added, so that the moving objects in the scene data are deleted, and target data are acquired.
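A sketch of this step using the publicly released segment-anything library is given below; the checkpoint path, the model size, and the box prompt coming from the object detector are placeholders, and in practice the masked rays would typically be excluded from the training loss rather than literally zeroed:

import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # placeholder checkpoint
predictor = SamPredictor(sam)

def delete_moving_object(image: np.ndarray, box: np.ndarray):
    # image: HxWx3 uint8 RGB frame; box: [x0, y0, x1, y1] from the detector.
    predictor.set_image(image)
    masks, _, _ = predictor.predict(box=box, multimask_output=False)
    mask = masks[0]                       # HxW boolean mask of the moving object
    target = image.copy()
    target[mask] = 0                      # remove the object's pixels
    return target, mask                   # mask doubles as the segmentation mask M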
S104: the target data is input into the neural radiation field to obtain the color and volume density of each sampling point of the target data.
After the target data is acquired, the target data may be input into a neural radiation field. Referring to fig. 4, a schematic diagram of input and output of a neural radiation field according to an embodiment of the present application is shown. Since the target data includes position information x= (x, y, z) of the 3D point of the scene and camera view direction d= (θ, Φ), the neural radiation field F θ Can output the purposeThe color RGB and the volume density σ of the spatial point of each sampling point of the target data.
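Conceptually, the radiance field is a small MLP mapping the 5D input (x, y, z, θ, φ) to a color and a volume density; the sketch below omits the positional encoding and skip connections of the full NeRF network, and its layer sizes are illustrative only:

import torch
import torch.nn as nn

class TinyRadianceField(nn.Module):
    # Minimal radiance-field MLP: (x, y, z) and (theta, phi) -> (RGB, sigma).
    def __init__(self, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(3, hidden), nn.ReLU(),
                                   nn.Linear(hidden, hidden), nn.ReLU())
        self.sigma_head = nn.Linear(hidden, 1)
        self.color_head = nn.Sequential(nn.Linear(hidden + 2, hidden // 2), nn.ReLU(),
                                        nn.Linear(hidden // 2, 3), nn.Sigmoid())

    def forward(self, x: torch.Tensor, d: torch.Tensor):
        # x: (N, 3) sample positions; d: (N, 2) view directions (theta, phi)
        h = self.trunk(x)
        sigma = torch.relu(self.sigma_head(h))              # volume density >= 0
        rgb = self.color_head(torch.cat([h, d], dim=-1))    # view-dependent color
        return rgb, sigma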
It will be appreciated that if the scene data are detected not to include a moving object, the scene data may be used directly as the target data, and step S104 and the subsequent operations are executed.
It should be noted that, as more and more target data are input into the neural radiance field, catastrophic forgetting may occur. Catastrophic forgetting means that, while training on the currently available target data, the neural radiance field forgets the knowledge learned from historical target data. To address this, the present application introduces a distillation technique into incremental learning.
Incremental learning is a machine learning technique that allows new target data to be learned continuously on top of an existing neural radiance field model without retraining the whole model.
Distillation is one incremental learning technique: the knowledge of an already trained large model (the teacher model) is taught to a small model (the student model), so that the student model can continue to learn new target data. With distillation, the student model converges faster and generalizes better on new target data while continuing to learn on top of the already trained model.
In some specific implementations, the first neural radiance field may first be trained with the target data to obtain a trained first neural radiance field and the sampling-point colors and volume densities corresponding to the target data; next, the second neural radiance field is trained, based on the trained first neural radiance field, with the target data and the sampling-point colors and volume densities corresponding to the target data, to obtain a trained second neural radiance field; finally, the sampling-point data and volume densities corresponding to all of the target data are acquired through the trained second neural radiance field. The first neural radiance field is the teacher model F_t, and the second neural radiance field is the student model F_s.
That is, in a first step, a first neural radiance field and a second neural radiance field are established. In a second step, the first target data are input into the first neural radiance field, which distills the first target data to obtain the first sampling-point colors and first volume densities of the first target data. In a third step, the second neural radiance field is updated based on the first sampling-point colors, the first volume densities, and the first target data. In a fourth step, the second target data are input into the second neural radiance field, which distills the second target data to obtain the second sampling-point colors and second volume densities of the second target data. In a fifth step, the first neural radiance field is updated based on the second sampling-point colors, the second volume densities, and the second target data. This continues until all target data have been input and the sampling-point colors and volume densities of all target data are acquired; a training-loop sketch of this alternating scheme is given below.
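The alternating teacher-student rounds described above can be sketched as follows, reusing the TinyRadianceField class from the previous sketch; the optimizer settings and the simple squared-error combination of the supervision and distillation terms are assumptions:

import torch

teacher = TinyRadianceField()   # first neural radiance field, F_t (teacher)
student = TinyRadianceField()   # second neural radiance field, F_s (student)
optimizer = torch.optim.Adam(student.parameters(), lr=5e-4)

def distill_round(ray_batches):
    # One increment: the frozen teacher supervises the student on the new
    # target data, then the trained student becomes the next teacher.
    global teacher, student, optimizer
    for x, d, rgb_gt in ray_batches:                 # rays sampled from new target data
        with torch.no_grad():
            rgb_t, sigma_t = teacher(x, d)           # old knowledge from the teacher
        rgb_s, sigma_s = student(x, d)
        loss = ((rgb_s - rgb_gt) ** 2).mean()        # fit the new data
        loss = loss + ((rgb_s - rgb_t) ** 2).mean()          # color distillation
        loss = loss + ((sigma_s - sigma_t) ** 2).mean()      # density distillation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    teacher, student = student, teacher              # swap roles for the next increment
    optimizer = torch.optim.Adam(student.parameters(), lr=5e-4)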
In some specific implementations, training the second neural radiance field based on the trained first neural radiance field to obtain the trained second neural radiance field may include: determining a total loss function based on the trained first neural radiance field, and training the second neural radiance field according to the total loss function to obtain the trained second neural radiance field. The total loss function is related to the segmentation mask, a volume density loss function, a color loss function, and a distillation loss function.
Illustratively, the total loss function may be expressed by the following formula (2):
where L_loss is the total loss function, L_σ is the volume density loss function, L_rgb is the color loss function, M_mask is the segmentation mask, λ is a weighting factor, and L_dis is the distillation loss function.
Specifically, the volume density loss function L_σ may be expressed by the following formula (3):
where L_σ is the volume density loss function, R is a set of rays from one or more camera views, σ_{t-1}(r) is the volume density output by the trained first neural radiance field, and σ_t(r) is the volume density output by the trained second neural radiance field.
Specifically, the color loss function L_rgb may be expressed by the following formula (4):
where L_rgb is the color loss function, R is a set of rays from one or more camera views, c_{t-1}(r) is the sampling-point color output by the trained first neural radiance field, and c_t(r) is the sampling-point color output by the trained second neural radiance field.
Specifically, the distillation loss function L_dis may be expressed by the following formula (5):
where L_dis is the distillation loss function, x is position information, d is the view direction, F_t(x, d) is the result of inputting the position information and view direction into the trained first neural radiance field, and F_s(x, d) is the result of inputting the position information and view direction into the second neural radiance field.
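One plausible composition of the loss terms defined above is sketched here; the squared-error form, the placement of the segmentation mask M_mask, and the weighting factor λ are assumptions rather than the filed formulas (2) to (5):

def total_loss(sigma_s, sigma_t, rgb_s, rgb_t, f_s, f_t, mask, lam=0.1):
    # sigma_* / rgb_*: per-ray outputs of the student (s) and teacher (t) fields;
    # f_s / f_t: stacked field outputs used for distillation;
    # mask: per-ray segmentation mask M (0 on deleted moving objects).
    l_sigma = (mask * (sigma_s - sigma_t) ** 2).sum()            # ~ formula (3)
    l_rgb = (mask.unsqueeze(-1) * (rgb_s - rgb_t) ** 2).sum()    # ~ formula (4)
    l_dis = ((f_s - f_t) ** 2).mean()                            # ~ formula (5)
    return l_sigma + l_rgb + lam * l_dis                         # ~ formula (2)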
By fine-tuning the neural radiance fields with this alternating distillation strategy, the second neural radiance field can learn new knowledge from new target data while learning old knowledge from the first neural radiance field, which alleviates the forgetting problem.
S105: and integrating the color and the volume density of the sampling points to obtain an image rendering result.
The color and the Volume density of each sampling point of the target data obtained in step S104 are integrated, and voxel Rendering (Volume Rendering) is performed to obtain an image Rendering result.
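The integration is the standard volume-rendering quadrature used by NeRF: per-sample colors are composited with weights derived from the densities and the spacing between samples along each ray. A self-contained sketch:

import torch

def volume_render(rgb: torch.Tensor, sigma: torch.Tensor, t_vals: torch.Tensor) -> torch.Tensor:
    # rgb:    (R, S, 3) sample colors along R rays with S samples each
    # sigma:  (R, S)    volume densities
    # t_vals: (R, S)    sample depths along each ray
    deltas = t_vals[:, 1:] - t_vals[:, :-1]                          # spacing between samples
    deltas = torch.cat([deltas, torch.full_like(deltas[:, :1], 1e10)], dim=-1)
    alpha = 1.0 - torch.exp(-sigma * deltas)                         # per-segment opacity
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)               # transmittance T_i
    trans = torch.cat([torch.ones_like(trans[:, :1]), trans[:, :-1]], dim=-1)
    weights = alpha * trans
    return (weights.unsqueeze(-1) * rgb).sum(dim=-2)                 # (R, 3) rendered pixels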
Referring to FIG. 5, a schematic diagram of image rendering according to an embodiment of the present application, the figure shows the overall framework of neural-radiance-field-based image rendering proposed by the present application. First, after T different sets of scene data, Scene1 to SceneT, are acquired, each is checked for moving objects; if a moving object is present, it is deleted to obtain the target data, which are then input into the neural radiance field, and otherwise the scene data are input directly as the target data. In each step t of inputting target data into the neural radiance field to obtain sampling-point colors and volume densities, the student model learns simultaneously from the currently available data and from the old knowledge of the teacher model. For each batch of newly added target data, newly appearing objects are identified with the object detection algorithm, and newly added moving objects are removed with the SAM. The trained student model then serves as the teacher model for the next step, passing the learned knowledge on to the next student, and the process iterates. After all sampling-point colors and volume densities are obtained, they can be integrated to obtain the image rendering result.
In summary, the present application discloses an incremental image rendering method that can accurately delete moving objects segmented from the scene data during incremental reconstruction of the same scene, so that higher-quality images with clearer details can be rendered when moving objects occlude the scene.
Referring to FIG. 6, a schematic diagram of an incremental image rendering apparatus according to an embodiment of the present application, the incremental image rendering apparatus 600 includes a first acquisition module 601, a deletion module 602, a second acquisition module 603, a third acquisition module 604, a fourth acquisition module 605, and an integration module 606.
Specifically, the first acquisition module 601 is configured to acquire scene data;
the deletion module 602 is configured to, if a moving object appears in the scene data, delete the moving object segmented out by the semantic segmentation model from the scene data to obtain target data, wherein the moving object is an object that violates the scene-geometry-invariance assumption;
the second acquisition module 603 is configured to train a first neural radiance field with the target data to obtain a trained first neural radiance field and the sampling-point colors and volume densities corresponding to the target data;
the third acquisition module 604 is configured to train a second neural radiance field, based on the trained first neural radiance field, with the target data and the sampling-point colors and volume densities corresponding to the target data, to obtain a trained second neural radiance field;
the fourth acquisition module 605 is configured to acquire the sampling-point data and volume densities corresponding to all of the target data through the trained second neural radiance field; and
the integration module 606 is configured to integrate the sampling-point data and volume densities corresponding to all of the target data to obtain the image rendering result.
In some specific implementations, the first acquisition module 601 specifically includes a first acquisition sub-module and a second acquisition sub-module. Specifically, the first acquisition sub-module is configured to acquire original data, and the second acquisition sub-module is configured to execute a generative latent optimization algorithm on the original data to generate an appearance embedding of the original data, so as to acquire the scene data.
In some specific implementations, the deletion module 602 specifically includes a detection sub-module and a deletion sub-module. The detection sub-module is configured to detect whether the scene data include a moving object according to an object detection method of the Segment Anything Model; the deletion sub-module is configured to delete the moving object segmented out by the semantic segmentation model from the scene data if the scene data include a moving object.
In some specific implementations, the scene data include first scene data and second scene data, and the detection sub-module specifically includes an identification sub-module, a third acquisition sub-module, and a first determination sub-module. Specifically, the identification sub-module is configured to identify object categories in the first scene data and object categories in the second scene data; the third acquisition sub-module is configured to acquire the similarity between an object in the first scene data and an object in the second scene data if their object categories are the same; and the first determination sub-module is configured to determine that the object in the first scene data and the object in the second scene data are moving objects if the similarity is less than or equal to the first preset threshold.
In some specific implementations, the detection sub-module further includes a second determination sub-module. Specifically, the second determination sub-module is configured to determine that the object in the first scene data and the object in the second scene data are moving objects if their object categories are different.
In some specific implementations, the similarity is calculated according to the following formula (6):
Sim_i = (A_i · B_i) / (‖A_i‖ × ‖B_i‖)    (6)
where Sim_i is the similarity, A_i is the feature vector of the object in the first scene data, and B_i is the feature vector of the object in the second scene data.
In some specific implementations, the third acquisition module 604 specifically includes a determination sub-module and a training sub-module. Specifically, the determination sub-module is configured to determine a total loss function based on the trained first neural radiance field, the total loss function being related to the segmentation mask, the volume density loss function, the color loss function, and the distillation loss function; the training sub-module is configured to train the second neural radiance field according to the total loss function.
In some specific implementations, the distillation loss function is expressed by the following formula (7):
where L_dis is the distillation loss function, x is position information, d is the view direction, F_t(x, d) is the result of inputting the position information and view direction into the trained first neural radiance field, and F_s(x, d) is the result of inputting the position information and view direction into the second neural radiance field.
In summary, the present application discloses an incremental image rendering apparatus comprising a first acquisition module, a deletion module, a second acquisition module, a third acquisition module, a fourth acquisition module, and an integration module. In this way, moving objects segmented from the scene data can be accurately deleted during incremental reconstruction of the same scene, so that higher-quality images with clearer details can be rendered when moving objects occlude the scene.
Embodiments of the present application further provide a corresponding device and a computer storage medium for implementing the solutions provided by the embodiments of the present application.
The device comprises a memory for storing instructions or code and a processor for executing the instructions or code, so that the device performs the incremental image rendering method of the present application.
The computer storage medium stores code which, when executed, causes the device executing the code to implement the incremental image rendering method described above.
The terms "first" and "second" in names such as "first ..." and "second ..." in the embodiments of the present application are used only for identification and do not indicate an order.
From the above description of the embodiments, those skilled in the art will clearly understand that all or part of the steps of the example methods described above may be implemented by software together with a general-purpose hardware platform. Based on this understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a storage medium such as a read-only memory (ROM)/RAM, a magnetic disk, or an optical disc, and which includes instructions that cause a computer device (which may be a personal computer, a server, or a network communication device such as a router) to perform the methods described in the embodiments, or in parts of the embodiments, of the present application.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another. In particular, the apparatus and readable-storage-medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant parts, reference may be made to the description of the method embodiments. The apparatus and readable-storage-medium embodiments described above are merely illustrative: units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment, and those of ordinary skill in the art can understand and implement it without inventive effort.
The foregoing is only a specific embodiment of the present application, but the scope of protection of the present application is not limited thereto. Any changes or substitutions easily conceived by those skilled in the art within the technical scope disclosed in the present application shall fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the scope of protection of the claims.

Claims (10)

1. A method of incremental image rendering, the method comprising:
acquiring scene data;
if a moving object appears in the scene data, deleting the moving object segmented out by a semantic segmentation model from the scene data to obtain target data, wherein the moving object is an object that violates the scene-geometry-invariance assumption;
training a first neural radiance field with the target data to obtain a trained first neural radiance field and the sampling-point colors and volume densities corresponding to the target data;
training a second neural radiance field, based on the trained first neural radiance field, with the target data and the sampling-point colors and volume densities corresponding to the target data, to obtain a trained second neural radiance field;
acquiring the sampling-point data and volume densities corresponding to all of the target data through the trained second neural radiance field; and
integrating the sampling-point data and volume densities corresponding to all of the target data to obtain an image rendering result.
2. The method according to claim 1, wherein the acquiring of scene data comprises:
acquiring original data; and
executing a generative latent optimization algorithm on the original data to generate an appearance embedding of the original data, so as to acquire the scene data.
3. The method according to claim 1, wherein, if a moving object appears in the scene data, deleting the moving object segmented out by the semantic segmentation model from the scene data comprises:
detecting whether the scene data include a moving object according to an object detection method of the Segment Anything Model; and
if the scene data include the moving object, deleting the moving object segmented out by the semantic segmentation model from the scene data.
4. The method according to claim 3, wherein the scene data comprise first scene data and second scene data, and the detecting whether the scene data include a moving object comprises:
identifying object categories in the first scene data and object categories in the second scene data;
if an object category in the first scene data is the same as an object category in the second scene data, acquiring the similarity between the object in the first scene data and the object in the second scene data; and
if the similarity is less than or equal to a first preset threshold, determining that the object in the first scene data and the object in the second scene data are moving objects.
5. The method according to claim 4, further comprising:
if an object category in the first scene data is different from the object categories in the second scene data, determining that the object in the first scene data and the object in the second scene data are moving objects.
6. The method according to claim 4, wherein the similarity is calculated by the following formula:
Sim_i = (A_i · B_i) / (‖A_i‖ × ‖B_i‖)
where Sim_i is the similarity, A_i is the feature vector of the object in the first scene data, and B_i is the feature vector of the object in the second scene data.
7. The method according to claim 1, wherein the training of a second neural radiance field based on the trained first neural radiance field comprises:
determining a total loss function based on the trained first neural radiance field, the total loss function being related to a segmentation mask, a volume density loss function, a color loss function, and a distillation loss function; and
training the second neural radiance field according to the total loss function.
8. The method according to claim 7, wherein the distillation loss function is formulated as follows:
where L_dis is the distillation loss function, x is position information, d is the view direction, F_t(x, d) is the result of inputting the position information and view direction into the trained first neural radiance field, and F_s(x, d) is the result of inputting the position information and view direction into the second neural radiance field.
9. An incremental image rendering apparatus, the apparatus comprising: a first acquisition module, a deletion module, a second acquisition module, a third acquisition module, a fourth acquisition module, and an integration module;
the first acquisition module is configured to acquire scene data;
the deletion module is configured to, if a moving object appears in the scene data, delete the moving object segmented out by a semantic segmentation model from the scene data to obtain target data, wherein the moving object is an object that violates the scene-geometry-invariance assumption;
the second acquisition module is configured to train a first neural radiance field with the target data to obtain a trained first neural radiance field and the sampling-point colors and volume densities corresponding to the target data;
the third acquisition module is configured to train a second neural radiance field, based on the trained first neural radiance field, with the target data and the sampling-point colors and volume densities corresponding to the target data, to obtain a trained second neural radiance field;
the fourth acquisition module is configured to acquire the sampling-point data and volume densities corresponding to all of the target data through the trained second neural radiance field; and
the integration module is configured to integrate the sampling-point data and volume densities corresponding to all of the target data to obtain an image rendering result.
10. The apparatus according to claim 9, wherein the deletion module specifically comprises a detection sub-module and a deletion sub-module;
the detection sub-module is configured to detect whether the scene data include a moving object according to an object detection method of the Segment Anything Model; and
the deletion sub-module is configured to delete the moving object segmented out by the semantic segmentation model from the scene data if the scene data include the moving object.
CN202310964562.7A 2023-08-02 2023-08-02 Incremental image rendering method and device Pending CN116958393A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310964562.7A CN116958393A (en) 2023-08-02 2023-08-02 Incremental image rendering method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310964562.7A CN116958393A (en) 2023-08-02 2023-08-02 Incremental image rendering method and device

Publications (1)

Publication Number Publication Date
CN116958393A true CN116958393A (en) 2023-10-27

Family

ID=88461798

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310964562.7A Pending CN116958393A (en) 2023-08-02 2023-08-02 Incremental image rendering method and device

Country Status (1)

Country Link
CN (1) CN116958393A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117333609A (en) * 2023-12-01 2024-01-02 北京渲光科技有限公司 Image rendering method, network training method, device and medium
CN117333609B (en) * 2023-12-01 2024-02-09 北京渲光科技有限公司 Image rendering method, network training method, device and medium

Similar Documents

Publication Publication Date Title
US11379987B2 (en) Image object segmentation based on temporal information
Uittenbogaard et al. Privacy protection in street-view panoramas using depth and multi-view imagery
US10019652B2 (en) Generating a virtual world to assess real-world video analysis performance
CN112991413A (en) Self-supervision depth estimation method and system
CN114365201A (en) Structural annotation
Xiao et al. Single image dehazing based on learning of haze layers
Bescos et al. Empty cities: Image inpainting for a dynamic-object-invariant space
CN115661246A (en) Attitude estimation method based on self-supervision learning
Wang et al. A feature-supervised generative adversarial network for environmental monitoring during hazy days
CN114996814A (en) Furniture design system based on deep learning and three-dimensional reconstruction
CN116958393A (en) Incremental image rendering method and device
CN112132753B (en) Infrared image super-resolution method and system for multi-scale structure guide image
CN117333627B (en) Reconstruction and complement method, system and storage medium for automatic driving scene
Bevilacqua et al. Joint inpainting of depth and reflectance with visibility estimation
CN111738061A (en) Binocular vision stereo matching method based on regional feature extraction and storage medium
CN117726747A (en) Three-dimensional reconstruction method, device, storage medium and equipment for complementing weak texture scene
CN117808691A (en) Image fusion method based on difference significance aggregation and joint gradient constraint
Alfarano et al. Estimating optical flow: A comprehensive review of the state of the art
He et al. A novel way to organize 3D LiDAR point cloud as 2D depth map height map and surface normal map
CN116912393A (en) Face reconstruction method and device, electronic equipment and readable storage medium
US20220254008A1 (en) Multi-view interactive digital media representation capture
Fan et al. Collaborative three-dimensional completion of color and depth in a specified area with superpixels
Kim et al. Real-time human segmentation from RGB-D video sequence based on adaptive geodesic distance computation
CN116883770A (en) Training method and device of depth estimation model, electronic equipment and storage medium
US10896333B2 (en) Method and device for aiding the navigation of a vehicle

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination