CN116074581B - Implant position determining method and device, electronic equipment and storage medium - Google Patents
Info
- Publication number
- Publication: CN116074581B; Application: CN202310047604A (filed as CN202310047604.0A)
- Authority
- CN
- China
- Prior art keywords
- video frame
- video
- image
- determining
- specified
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/81—Monomedia components thereof
- H04N21/812—Monomedia components thereof involving advertisement data
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
Embodiments of the present application provide an implantation position determining method and apparatus, an electronic device, and a storage medium, relating to the technical field of video processing. The method comprises: acquiring a video image into which a three-dimensional image resource is to be implanted; determining, based on a target detection algorithm, a video frame of the video image that contains a specified object, as a first video frame, where, when the three-dimensional image resource is implanted at the position of the specified object in the first video frame, the intersection-over-union (IoU) ratio of the image area occupied by the three-dimensional image resource to the image area occupied by the specified object is smaller than a first preset threshold; and determining, based on the position of the specified object in the first video frame, an implantation position at which the three-dimensional image resource is to be implanted in the video image. With this method the implantation position is detected automatically by an electronic device; compared with determining the implantation position by manually browsing the video image, this reduces labor cost and improves the efficiency of implantation position determination.
Description
Technical Field
The present application relates to the field of video processing technologies, and in particular, to a method and apparatus for determining an implantation position, an electronic device, and a storage medium.
Background
Currently, on video playing platforms it is often necessary to implant other resources into a video without affecting the user's viewing of the video content; for example, the implanted resources may be advertisements. For instance, when a beverage advertisement needs to be implanted, a picture of a beverage bottle can be implanted in the image area occupied by a dining-table top in the video, so that after implantation the video displays the three-dimensional visual effect of a beverage bottle standing on the table top. It is therefore important for video playing platforms to determine where resources are to be embedded in the image frames, so that the resources blend into the video content without affecting the user's viewing experience.
In the related art, the video into which a resource is to be implanted is usually browsed manually to determine the implantation position of the resource in the video. However, this approach is labor-intensive and inefficient.
Disclosure of Invention
The embodiment of the application aims to provide an implantation position determining method, an implantation position determining device, electronic equipment and a storage medium, so that the cost for determining the implantation position of a resource is reduced, and the efficiency for determining the implantation position of the resource is improved. The specific technical scheme is as follows:
In a first aspect of the present application, there is provided a method of determining implantation location, the method comprising:
acquiring a video image into which a three-dimensional image resource is to be implanted;
Determining, based on a target detection algorithm, a video frame of the video image that contains a specified object, as a first video frame; wherein, when the three-dimensional image resource is implanted at the position of the specified object in the first video frame, the intersection-over-union (IoU) ratio of the image area occupied by the three-dimensional image resource to the image area occupied by the specified object is smaller than a first preset threshold;
Determining, based on the position of the specified object in the first video frame, an implantation position at which the three-dimensional image resource is to be implanted in the video image.
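As an illustration only (not part of the claimed method), the threshold condition above reduces to a simple IoU test. The following is a minimal Python sketch, assuming boxes are given as (x1, y1, x2, y2) tuples and taking 0.2, one of the example values given later in the description, as the first preset threshold:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

FIRST_PRESET_THRESHOLD = 0.2  # assumed example value; the patent leaves it open

def qualifies_as_implant_anchor(asset_box, object_box):
    # The implanted asset should overlap the specified object only slightly,
    # e.g. a bottle standing on (not covering) a table top.
    return iou(asset_box, object_box) < FIRST_PRESET_THRESHOLD
```

Because the asset region is much smaller than the object region and only touches it, the IoU stays well below the threshold in the intended placements.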
Optionally, the determining, based on the target detection algorithm, of a video frame of the video image that contains a specified object as a first video frame includes:
Determining, based on the target detection algorithm, the video frames of the video image in which a contained specified object is the same object, as the second video frames corresponding to that specified object;
the determining, based on the position of the specified object in the first video frame, of an implantation position at which the three-dimensional image resource is to be implanted in the video image includes:
Determining, based on the position of each specified object in its corresponding second video frames, an implantation position at which the three-dimensional image resource is to be implanted in the video image.
Optionally, the determining, based on the target detection algorithm, of the video frames in which a specified object contained in the video image is the same object, as the second video frames corresponding to the specified object, includes:
Determining the current video frame to be processed in the video image according to the sequence among the video frames in the video image;
detecting whether a current video frame to be processed contains a specified object or not based on a target detection algorithm;
If it is determined, based on the target detection algorithm, that the current video frame to be processed contains the specified object, judging whether the specified object was determined in the previous video frame of the current video frame to be processed;
If the specified object was determined in the previous video frame of the current video frame to be processed, judging whether the specified object detected in the current video frame to be processed and the specified object determined in the previous video frame are the same object;
If the specified object detected in the current video frame to be processed and the specified object determined in the previous video frame are the same object, determining that the current video frame to be processed and the previous video frame belong to a video frame set containing the same specified object; and returning to execute the step of determining the current video frame to be processed in the video image according to the sequence among the video frames in the video image;
and determining, based on the similarity between the image areas occupied by the specified object in the video frames, the video frames in the video frame sets in which the contained specified object is the same object, as the second video frames corresponding to the specified object.
Optionally, before the determining, based on the similarity between the image areas occupied by the specified object in the video frames, of the video frames in the video frame sets in which the contained specified object is the same object as the second video frames corresponding to the specified object, the method further includes:
Determining the scene to which each video frame in the video image belongs based on a scene segmentation algorithm;
the determining, based on the similarity between the image areas occupied by the specified object in the video frames, of the video frames in the video frame sets in which the contained specified object is the same object, as the second video frames corresponding to the specified object, includes:
For each scene, determining, from the video frame sets belonging to the scene and based on the similarity between the image areas occupied by the specified object in the video frames, the video frame sets in which the contained specified object is the same object, as the video frame sets corresponding to that specified object;
And determining the video frames in the video frame set corresponding to the specified object as second video frames corresponding to the specified object.
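The patent does not fix a particular scene segmentation algorithm. Purely as an illustration, the sketch below assigns a scene index to each frame using an HSV-histogram comparison between consecutive frames; the 0.5 cut threshold and the histogram bin counts are assumptions, not values from the description:

```python
import cv2

def scene_ids(frames, cut_threshold=0.5):
    """Assign a scene index to every frame; a stand-in for the unspecified
    scene segmentation algorithm. frames: BGR images as numpy arrays."""
    ids, scene, prev_hist = [], 0, None
    for frame in frames:
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [32, 32], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None and \
                cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < cut_threshold:
            scene += 1  # low correlation between consecutive frames suggests a cut
        ids.append(scene)
        prev_hist = hist
    return ids
```

Grouping video frame sets per scene then amounts to partitioning the set indices by the scene id of their frames.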
Optionally, the method further comprises:
If the specified object detected in the current video frame to be processed is not the same object as the specified object determined in the previous video frame, or if no specified object was determined in the previous video frame of the current video frame to be processed, or if it is determined based on the target detection algorithm that the current video frame to be processed does not contain the specified object, determining the most recently determined video frame that contains the specified object as the current third video frame;
determining, based on a target tracking algorithm, whether the specified object determined in the current third video frame is tracked in the current video frame to be processed;
If the specified object determined in the current third video frame is tracked in the current video frame to be processed based on the target tracking algorithm, determining that the current video frame to be processed and the current third video frame belong to a video frame set containing the same specified object; and returning to execute the step of determining the current video frame to be processed in the video image according to the sequence among the video frames in the video image.
Optionally, in the case that it is determined, based on the target detection algorithm, that the current video frame to be processed does not contain the specified object, before the determining of the most recently determined video frame containing the specified object as the current third video frame, the method further includes:
judging whether a preset number of consecutive fourth video frames exist before the current video frame to be processed; wherein a fourth video frame represents a video frame determined not to contain the specified object;
the determining of the most recently determined video frame containing the specified object as the current third video frame includes:
In the case that a preset number of consecutive fourth video frames do not exist before the current video frame to be processed, determining the most recently determined video frame containing the specified object as the current third video frame;
And in the case that a preset number of consecutive fourth video frames exist before the current video frame to be processed, determining that the current video frame to be processed does not contain a specified object, and returning to the step of determining the current video frame to be processed in the video image according to the sequence among the video frames in the video image.
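To make this detection-then-tracking fallback concrete, here is a hedged sketch of one per-frame decision. Everything named here is an assumption for illustration: `detect` stands for any target detection step returning a box or None, the OpenCV CSRT tracker is one arbitrary tracker choice (its constructor name varies across OpenCV versions), and `GAP_LIMIT` plays the role of the "preset number" of consecutive object-free frames:

```python
import cv2

GAP_LIMIT = 5  # assumed value for the "preset number" of consecutive misses

def locate_object(frame, prev_frame, prev_box, detect, gap_count):
    """Return (box, new_gap_count) for the current frame; box is None if the
    specified object is treated as absent. prev_frame/prev_box come from the
    most recently determined frame that contained the object ("third frame")."""
    box = detect(frame)
    if box is not None:
        return box, 0                            # detection succeeded
    if prev_box is None or gap_count >= GAP_LIMIT:
        return None, gap_count + 1               # too many misses: give up tracking
    tracker = cv2.TrackerCSRT_create()           # name differs across OpenCV versions
    tracker.init(prev_frame, tuple(prev_box))    # box as (x, y, w, h)
    ok, tracked = tracker.update(frame)
    return (tuple(int(v) for v in tracked), 0) if ok else (None, gap_count + 1)
```

A successful track keeps the frame in the current video frame set and resets the gap counter; once the gap limit is reached, the object is treated as having left the picture.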
Optionally, before determining the implantation position of the three-dimensional image resource in the video image based on the position of each specified object in the corresponding second video frame, the method further comprises:
calculating, for each candidate object contained in the video image and based on the second video frames corresponding to the candidate object, the evaluation value of the candidate object for a preset evaluation item, to obtain the object evaluation value of the candidate object; wherein the candidate object is determined based on a specified object contained in the video image;
The preset evaluation items comprise at least one of the following: an integrity evaluation item, a sharpness evaluation item and a stability evaluation item;
The integrity evaluation item is used to characterize: the average value of the integrity of the candidate object in each corresponding second video frame;
The sharpness evaluation item is used to characterize: the proportion of the foreground in the first image areas corresponding to the candidate object; a first image area corresponding to the candidate object represents: the image area occupied by the candidate object in a corresponding second video frame;
The stability evaluation item comprises at least one of the following: a region stability evaluation item and an area stability evaluation item;
The region stability evaluation item is used to characterize: the change amplitude of the first image areas corresponding to the candidate object;
The area stability evaluation item is used to characterize: among the first image areas corresponding to the candidate object, the proportion of image areas whose area is smaller than a second preset threshold;
the determining, based on the position of each specified object in the corresponding second video frames, of an implantation position at which the three-dimensional image resource is to be implanted in the video image includes:
Determining a target object from the candidate objects based on the object evaluation values of the candidate objects;
and determining an implantation position for implanting the three-dimensional image resource in the video image based on the position of the target object in the corresponding second video frame.
Optionally, the evaluation value of the integrity evaluation item of one candidate object is calculated in the following manner:
For each second video frame corresponding to the candidate object, acquiring the detection value corresponding to the second video frame as the integrity of the candidate object in the second video frame; wherein the detection value corresponding to the second video frame represents the detected probability that the second video frame contains the candidate object;
For each video frame set corresponding to the candidate object, calculating the average value of the integrity of the candidate object in the second video frames contained in the video frame set, as the average integrity of the candidate object in the video frame set;
and determining the average value of the average integrity of the candidate object in the video frame sets corresponding to the candidate object as the evaluation value of the candidate object for the integrity evaluation item.
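Read as pseudocode, the integrity evaluation is a nested average. A minimal sketch, assuming the per-frame integrity values (detection confidences) have already been grouped by video frame set:

```python
from statistics import mean

def integrity_evaluation(frame_sets):
    """frame_sets: for one candidate object, a list of video frame sets, each
    given as the list of detection values of the object in the set's frames."""
    average_per_set = [mean(confidences) for confidences in frame_sets]
    return mean(average_per_set)  # average of the per-set average integrity
```

For example, `integrity_evaluation([[0.9, 0.8], [0.7]])` returns (0.85 + 0.7) / 2 = 0.775.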
Optionally, the evaluation value of the sharpness evaluation item of one candidate object is calculated as follows:
for each second video frame corresponding to the candidate object, calculating the proportion of the foreground in the first image area corresponding to the candidate object in that second video frame, as the sharpness of the candidate object in that second video frame;
for each video frame set corresponding to the candidate object, calculating the average value of the sharpness of the candidate object in the second video frames contained in the video frame set, as the average sharpness of the candidate object in the video frame set;
And determining the maximum value among the average sharpness of the candidate object in the video frame sets corresponding to the candidate object as the evaluation value of the candidate object for the sharpness evaluation item.
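A sketch of this computation, under the assumption that a foreground/background segmentation mask is available per frame (the patent does not prescribe how the foreground is obtained):

```python
import numpy as np
from statistics import mean

def foreground_proportion(object_mask, foreground_mask):
    """Share of the candidate object's image region covered by the foreground;
    both masks are boolean arrays of the frame's shape."""
    region = np.count_nonzero(object_mask)
    if region == 0:
        return 0.0
    return np.count_nonzero(object_mask & foreground_mask) / region

def sharpness_evaluation(frame_sets):
    """frame_sets: per-set lists of the per-frame foreground proportions."""
    return max(mean(proportions) for proportions in frame_sets)
```

Note the asymmetry with the integrity item: per-set averages are taken in both cases, but here the maximum over sets is kept rather than their mean.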
Optionally, the region stability evaluation item includes at least one of the following: a first region stability evaluation sub-item and a second region stability evaluation sub-item;
The evaluation value corresponding to the first region stability evaluation sub-item of one candidate object is calculated by the following method:
For each second video frame corresponding to the candidate object, acquiring the coordinates of the center point of the first image area corresponding to the candidate object in the second video frame as the coordinates to be processed corresponding to the second video frame; calculating the area of a first image area corresponding to the candidate object in the second video frame as a first area to be processed corresponding to the second video frame;
for each video frame set corresponding to the candidate object, calculating the standard deviation of the coordinates to be processed corresponding to each second video frame contained in the video frame set, and taking the standard deviation as the first standard deviation corresponding to the video frame set;
Calculating an average value of a first area to be processed corresponding to each second video frame contained in the video frame set, and taking the average value as an average area corresponding to the video frame set;
Calculating the ratio of the first standard deviation corresponding to the video frame set to the average area corresponding to the video frame set as the center point stability corresponding to the video frame set;
Determining an average value of the stability of the central point corresponding to each video frame set corresponding to the candidate object as an evaluation value of the candidate object for the first region stability evaluation sub-item;
the evaluation value corresponding to the second region stability evaluation sub-item of one candidate object is calculated by the following method:
For each second video frame corresponding to the candidate object, calculating the area of a first image area corresponding to the candidate object in the second video frame as a first area to be processed corresponding to the second video frame;
For each video frame set corresponding to the candidate object, calculating the standard deviation of a first area to be processed corresponding to each second video frame contained in the video frame set, and taking the standard deviation as a second standard deviation corresponding to the video frame set;
and determining an average value of the second standard deviation corresponding to each video frame set corresponding to the candidate object as an evaluation value of the candidate object for the second region stability evaluation sub-item.
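The two sub-items translate directly into a few NumPy reductions. A minimal sketch, with one noted assumption: the "standard deviation of the coordinates" is two-dimensional, and it is folded into a single number here by taking the norm of the per-axis standard deviations (one possible reading):

```python
import numpy as np

def first_region_stability(frame_sets):
    """Per set: std of the region centre points divided by the mean region
    area; the evaluation value is the average over the sets.
    frame_sets: per-set lists of ((cx, cy), area) tuples, one per frame."""
    values = []
    for frames in frame_sets:
        centres = np.array([c for c, _ in frames], dtype=float)
        areas = np.array([a for _, a in frames], dtype=float)
        centre_std = float(np.linalg.norm(centres.std(axis=0)))  # assumed reading
        values.append(centre_std / areas.mean())
    return float(np.mean(values))

def second_region_stability(frame_sets):
    """Per set: std of the region areas; averaged over the sets."""
    return float(np.mean([np.std([a for _, a in frames]) for frames in frame_sets]))
```

Under this reading, lower values indicate a steadier region; how the values are combined into the final object evaluation score is left open by the description.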
Optionally, before calculating, for each candidate object included in the video image, an evaluation value of the candidate object for a preset evaluation item based on each second video frame corresponding to the candidate object, and obtaining an object evaluation value of the candidate object, the method further includes:
for each specified object contained in the video image, obtaining the appearance duration of the specified object in the video image based on the second video frames corresponding to the specified object;
judging whether the appearance duration of the specified object is longer than a preset duration;
and if the appearance duration of the specified object is longer than the preset duration, determining the specified object as a candidate object.
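This pre-filter is a one-liner once the second video frames per object have been counted. A sketch, assuming the appearance duration is derived from frame counts and the video frame rate, and taking 3 seconds as a purely illustrative preset duration:

```python
def select_candidate_objects(object_frame_counts, fps, preset_duration_s=3.0):
    """object_frame_counts: {object_id: number of second video frames containing
    the object}. The 3 s preset duration is an assumed value."""
    return [
        object_id
        for object_id, n_frames in object_frame_counts.items()
        if n_frames / fps > preset_duration_s
    ]
```

Objects that flash by too briefly are thus excluded before the (more expensive) evaluation items are computed.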
In a second aspect of the present application, there is also provided an implantation position determining apparatus, the apparatus comprising:
The video image acquisition module is used for acquiring a video image into which a three-dimensional image resource is to be implanted;
The first determining module is used for determining, based on a target detection algorithm, a video frame of the video image that contains a specified object, as a first video frame; wherein, when the three-dimensional image resource is implanted at the position of the specified object in the first video frame, the intersection-over-union (IoU) ratio of the image area occupied by the three-dimensional image resource to the image area occupied by the specified object is smaller than a first preset threshold;
The implantation position determining module is used for determining, based on the position of the specified object in the first video frame, an implantation position at which the three-dimensional image resource is to be implanted in the video image.
Optionally, the first determining module includes:
The first determining submodule is used for determining, based on a target detection algorithm, the video frames of the video image in which a contained specified object is the same object, as the second video frames corresponding to that specified object;
the implantation position determining module includes:
The implantation position determining submodule is used for determining, based on the position of each specified object in the corresponding second video frames, an implantation position at which the three-dimensional image resource is to be implanted in the video image.
Optionally, the first determining sub-module includes:
The first determining unit is used for determining the current video frame to be processed in the video image according to the sequence among the video frames in the video image;
The specified object detection unit is used for detecting whether the current video frame to be processed contains a specified object or not based on a target detection algorithm;
The first judging unit is used for judging, if it is determined based on the target detection algorithm that the current video frame to be processed contains the specified object, whether the specified object was determined in the previous video frame of the current video frame to be processed;
The second judging unit is used for judging, if the specified object was determined in the previous video frame of the current video frame to be processed, whether the specified object detected in the current video frame to be processed and the specified object determined in the previous video frame are the same object;
The second determining unit is used for determining, if the specified object detected in the current video frame to be processed and the specified object determined in the previous video frame are the same object, that the current video frame to be processed and the previous video frame belong to a video frame set containing the same specified object; and returning to execute the step of determining the current video frame to be processed in the video image according to the sequence among the video frames in the video image;
And the third determining unit is used for determining, based on the similarity between the image areas occupied by the specified object in the video frames, the video frames in the video frame sets in which the contained specified object is the same object, as the second video frames corresponding to the specified object.
Optionally, the apparatus further includes:
The scene determining module is used for determining, based on a scene segmentation algorithm, the scene to which each video frame in the video image belongs, before the video frames in the video frame sets in which the contained specified object is the same object are determined, based on the similarity between the image areas occupied by the specified object in the video frames, as the second video frames corresponding to the specified object;
The third determining unit is specifically configured to determine, for each scene, from the video frame sets belonging to the scene and based on the similarity between the image areas occupied by the specified object in the video frames, the video frame sets in which the contained specified object is the same object, as the video frame sets corresponding to that specified object; and to determine the video frames in the video frame sets corresponding to the specified object as the second video frames corresponding to the specified object.
Optionally, the apparatus further includes:
The second determining module is configured to determine the most recently determined video frame containing the specified object as the current third video frame if the specified object detected in the current video frame to be processed is not the same object as the specified object determined in the previous video frame, or if no specified object was determined in the previous video frame of the current video frame to be processed, or if it is determined based on the target detection algorithm that the current video frame to be processed does not contain the specified object;
The specified object tracking module is used for determining, based on a target tracking algorithm, whether the specified object determined in the current third video frame is tracked in the current video frame to be processed;
The video frame set determining module is used for determining, if the specified object determined in the current third video frame is tracked in the current video frame to be processed based on the target tracking algorithm, that the current video frame to be processed and the current third video frame belong to a video frame set containing the same specified object; and triggering the first determining unit.
Optionally, the apparatus further includes:
The first judging module is used for judging, in the case that it is determined based on the target detection algorithm that the current video frame to be processed does not contain the specified object and before the most recently determined video frame containing the specified object is determined as the current third video frame, whether a preset number of consecutive fourth video frames exist before the current video frame to be processed; wherein a fourth video frame represents a video frame determined not to contain the specified object;
The second determining module is specifically configured to: in the case that a preset number of consecutive fourth video frames do not exist before the current video frame to be processed, determine the most recently determined video frame containing the specified object as the current third video frame; and in the case that a preset number of consecutive fourth video frames exist before the current video frame to be processed, determine that the current video frame to be processed does not contain a specified object, and trigger the first determining unit.
Optionally, the apparatus further includes:
The object evaluation value calculation module is used for calculating, for each candidate object contained in the video image, the evaluation value of the candidate object for a preset evaluation item based on the second video frames corresponding to the candidate object, so as to obtain the object evaluation value of the candidate object, before the implantation position at which the three-dimensional image resource is to be implanted in the video image is determined based on the position of each specified object in the corresponding second video frames; wherein the candidate object is determined based on a specified object contained in the video image;
The preset evaluation items comprise at least one of the following: an integrity evaluation item, a sharpness evaluation item and a stability evaluation item;
The integrity evaluation item is used to characterize: the average value of the integrity of the candidate object in each corresponding second video frame;
The sharpness evaluation item is used to characterize: the proportion of the foreground in the first image areas corresponding to the candidate object; a first image area corresponding to the candidate object represents: the image area occupied by the candidate object in a corresponding second video frame;
The stability evaluation item comprises at least one of the following: a region stability evaluation item and an area stability evaluation item;
The region stability evaluation item is used to characterize: the change amplitude of the first image areas corresponding to the candidate object;
The area stability evaluation item is used to characterize: among the first image areas corresponding to the candidate object, the proportion of image areas whose area is smaller than a second preset threshold;
The implantation position determining submodule is specifically configured to determine a target object from the candidate objects based on the object evaluation values of the candidate objects; and to determine, based on the position of the target object in the corresponding second video frames, an implantation position at which the three-dimensional image resource is to be implanted in the video image.
Optionally, the evaluation value of the integrity evaluation item of one candidate object is calculated in the following manner:
For each second video frame corresponding to the candidate object, acquiring the detection value corresponding to the second video frame as the integrity of the candidate object in the second video frame; wherein the detection value corresponding to the second video frame represents the detected probability that the second video frame contains the candidate object;
For each video frame set corresponding to the candidate object, calculating the average value of the integrity of the candidate object in the second video frames contained in the video frame set, as the average integrity of the candidate object in the video frame set;
and determining the average value of the average integrity of the candidate object in the video frame sets corresponding to the candidate object as the evaluation value of the candidate object for the integrity evaluation item.
Optionally, the evaluation value of the sharpness evaluation item of one candidate object is calculated as follows:
for each second video frame corresponding to the candidate object, calculating the proportion of the foreground in the first image area corresponding to the candidate object in that second video frame, as the sharpness of the candidate object in that second video frame;
for each video frame set corresponding to the candidate object, calculating the average value of the sharpness of the candidate object in the second video frames contained in the video frame set, as the average sharpness of the candidate object in the video frame set;
And determining the maximum value among the average sharpness of the candidate object in the video frame sets corresponding to the candidate object as the evaluation value of the candidate object for the sharpness evaluation item.
Optionally, the region stability evaluation item includes at least one of the following: a first region stability evaluation sub-item and a second region stability evaluation sub-item;
The evaluation value corresponding to the first region stability evaluation sub-item of one candidate object is calculated by the following method:
For each second video frame corresponding to the candidate object, acquiring the coordinates of the center point of the first image area corresponding to the candidate object in the second video frame as the coordinates to be processed corresponding to the second video frame; calculating the area of a first image area corresponding to the candidate object in the second video frame as a first area to be processed corresponding to the second video frame;
for each video frame set corresponding to the candidate object, calculating the standard deviation of the coordinates to be processed corresponding to each second video frame contained in the video frame set, and taking the standard deviation as the first standard deviation corresponding to the video frame set;
Calculating an average value of a first area to be processed corresponding to each second video frame contained in the video frame set, and taking the average value as an average area corresponding to the video frame set;
Calculating the ratio of the first standard deviation corresponding to the video frame set to the average area corresponding to the video frame set as the center point stability corresponding to the video frame set;
Determining an average value of the stability of the central point corresponding to each video frame set corresponding to the candidate object as an evaluation value of the candidate object for the first region stability evaluation sub-item;
the evaluation value corresponding to the second region stability evaluation sub-item of one candidate object is calculated by the following method:
For each second video frame corresponding to the candidate object, calculating the area of a first image area corresponding to the candidate object in the second video frame as a first area to be processed corresponding to the second video frame;
For each video frame set corresponding to the candidate object, calculating the standard deviation of a first area to be processed corresponding to each second video frame contained in the video frame set, and taking the standard deviation as a second standard deviation corresponding to the video frame set;
and determining an average value of the second standard deviation corresponding to each video frame set corresponding to the candidate object as an evaluation value of the candidate object for the second region stability evaluation sub-item.
Optionally, the apparatus further includes:
The appearance duration obtaining module is used for obtaining, for each specified object contained in the video image and based on the second video frames corresponding to the specified object, the appearance duration of the specified object in the video image, before the evaluation value of each candidate object for the preset evaluation item is calculated;
The second judging module is used for judging whether the appearance duration of the specified object is longer than a preset duration;
And the candidate object determining module is used for determining the specified object as a candidate object if the appearance duration of the specified object is longer than the preset duration.
In a third aspect of the present application, there is also provided an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
A memory for storing a computer program;
and the processor is used for implementing the steps of any of the above implantation position determining methods when executing the program stored in the memory.
In yet another aspect of the present application, there is also provided a computer readable storage medium having a computer program stored therein, which when executed by a processor, implements any of the above described implantation position determination methods.
In yet another aspect of the application there is also provided a computer program product containing instructions which, when run on a computer, cause the computer to perform any of the above described implantation location determination methods.
According to the implantation position determining method provided by the embodiments of the present application, a video image into which a three-dimensional image resource is to be implanted is acquired; a video frame of the video image that contains a specified object is determined, based on a target detection algorithm, as a first video frame, where, when the three-dimensional image resource is implanted at the position of the specified object in the first video frame, the intersection-over-union (IoU) ratio of the image area occupied by the three-dimensional image resource to the image area occupied by the specified object is smaller than a first preset threshold; and the implantation position at which the three-dimensional image resource is to be implanted in the video image is determined based on the position of the specified object in the first video frame.
With this processing, the specified object in the video image can be determined. Because the IoU ratio of the image area occupied by the three-dimensional image resource to the image area occupied by the specified object is smaller than the first preset threshold when the three-dimensional image resource is implanted at the position of the specified object in the first video frame, the three-dimensional image resource can be implanted at the position of the specified object; accordingly, the implantation position at which the three-dimensional image resource is to be implanted in the video image can be determined from the position of the specified object in the first video frame. The implantation position is thus detected automatically by an electronic device; compared with determining the implantation position by manually browsing the video image, this reduces labor cost and improves the efficiency of implantation position determination.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a first flowchart of an implantation position determination method according to an embodiment of the present application;
FIG. 2 is a second flowchart of an implantation position determination method according to an embodiment of the present application;
FIG. 3 is a third flowchart of an implantation position determination method according to an embodiment of the present application;
FIG. 4 is a fourth flowchart of an implantation position determination method according to an embodiment of the present application;
FIG. 5 is a fifth flowchart of an implantation position determination method according to an embodiment of the present application;
FIG. 6 is a flowchart of determining a set of video frames containing the same specified object according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a foreground portion of a video frame according to an embodiment of the present application;
FIG. 8 is a flowchart of calculating an evaluation value of a candidate object for a preset evaluation item according to an embodiment of the present application;
FIG. 9 is a flowchart of clustering a set of video frames according to an embodiment of the present application;
fig. 10 is a schematic diagram of a video frame set clustering result provided in an embodiment of the present application;
FIG. 11 is a schematic diagram of another clustering result of video frame sets according to an embodiment of the present application;
FIG. 12 is a sixth flowchart of an implantation position determination method according to an embodiment of the present application;
Fig. 13 is a schematic structural diagram of an implantation position determining apparatus according to an embodiment of the present application;
Fig. 14 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
Currently, on video playing platforms it is often necessary to implant other resources into a video without affecting the user's viewing of the video content; for example, the implanted resources may be advertisements. For instance, an advertising poster picture may be implanted on a wall surface in the video, or on a desk in the video. It is therefore important for video playing platforms to determine where resources are to be embedded in the image frames, so that the resources blend into the video content without affecting the user's viewing experience.
In the related art, the video into which a resource is to be implanted is usually browsed manually to determine the implantation position of the resource in the video. However, this approach is labor-intensive and inefficient.
In order to reduce the cost of implantation position determination and improve the efficiency of implantation position determination, the embodiment of the application provides an implantation position determination method. Referring to fig. 1, fig. 1 is a first flowchart of a method for determining an implantation position according to an embodiment of the present application, the method includes the following steps:
Step S101: video images are acquired that require implantation of a three-dimensional image resource.
Step S102: based on the target detection algorithm, a video frame containing a specified object in the video image is determined as a first video frame.
When the three-dimensional image resource is implanted at the position of the specified object in the first video frame, the intersection-over-union (IoU) ratio of the image area occupied by the three-dimensional image resource to the image area occupied by the specified object is smaller than a first preset threshold.
Step S103: an implantation location for implanting a three-dimensional image asset in a video image is determined based on a location of a specified object in a first video frame.
With the above processing, the specified object in the video image can be determined. Because the IoU ratio of the image area occupied by the three-dimensional image resource to the image area occupied by the specified object is smaller than the first preset threshold when the three-dimensional image resource is implanted at the position of the specified object in the first video frame, the three-dimensional image resource can be implanted at the position of the specified object; accordingly, the implantation position at which the three-dimensional image resource is to be implanted in the video image can be determined from the position of the specified object in the first video frame. The implantation position is thus detected automatically by an electronic device; compared with determining the implantation position by manually browsing the video image, this reduces labor cost and improves the efficiency of implantation position determination.
For step S101, the video image into which a three-dimensional image resource needs to be implanted is a video image for which the implantation position of the three-dimensional image resource currently needs to be determined, for example, the video image of a television series or a movie. The three-dimensional image resource may be a picture resource capable of representing a three-dimensional object, for example, a picture of a beverage bottle.
For step S102, the first preset threshold may be, for example, 0.2 or 0.1. The image area occupied by the three-dimensional image resource is smaller than the image area occupied by the specified object, and when the three-dimensional image resource is implanted at the position of the specified object in the first video frame, the IoU ratio of the two image areas is smaller than the first preset threshold. That is, the image area occupied by the three-dimensional image resource overlaps only a small portion of the image area occupied by the specified object.
For example, in the first video frame the specified object may have a plane whose angle with the vertical direction is greater than a preset angle; that is, from the viewing angle of the user, the specified object has an approximately horizontal plane. For example, the preset angle may be 75° or 85°. The specified object may be, for example, a dining table, a desk, a tea table, or a bedside table.
In order to ensure that the implanted three-dimensional image resource presents a three-dimensional visual effect in the video image, the position of the three-dimensional image resource within the image area occupied by the specified object can be determined according to the positional relation, in the scene depicted by the video image, between the object represented by the three-dimensional image resource and the specified object. The three-dimensional image resource is then implanted at the determined position, rather than filling the whole image area occupied by the specified object, so that the image area occupied by the three-dimensional image resource overlaps only a small portion of the image area occupied by the specified object. For example, the three-dimensional image resource may be a picture of a beverage bottle and the specified object may be a dining table; since in a real living scene a beverage bottle usually stands on the table top, the bottle picture can be implanted in the image area occupied by the table top, and the video image after implantation displays the three-dimensional visual effect of a beverage bottle standing on the table top.
The video image includes a plurality of video frames, and any video frame may contain one specified object, several specified objects, or none. In the present application, whether a video frame contains a specified object can be detected by the target detection algorithm, and a video frame containing a specified object can then be determined as a first video frame. It will be appreciated that a plurality of first video frames may be determined; for any video frame, if it contains at least one specified object, that video frame may be determined to be a first video frame.
The target detection algorithm may be an R-CNN (Region-based Convolutional Neural Networks, region-based convolutional neural network) algorithm, an SSD (Single Shot MultiBox Detector, single step multiple frame detection) algorithm, or a YOLO (You Only Look Once, a target detection algorithm) algorithm.
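None of these detectors is mandated by the description. As one concrete possibility, the sketch below runs a pretrained torchvision Faster R-CNN and keeps only detections of the wanted classes; the choice of model, the score threshold, and the COCO class id for "dining table" are all assumptions for illustration:

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

model = fasterrcnn_resnet50_fpn(weights="DEFAULT").eval()  # COCO-pretrained
DINING_TABLE = 67  # COCO class id for "dining table" (illustrative choice)

@torch.no_grad()
def detect_specified_objects(frame, wanted_labels={DINING_TABLE}, min_score=0.5):
    """frame: float tensor of shape (3, H, W) with values in [0, 1].
    Returns (box, score) pairs for detections of the wanted classes."""
    output = model([frame])[0]
    return [
        (box.tolist(), float(score))
        for box, label, score in zip(output["boxes"], output["labels"], output["scores"])
        if int(label) in wanted_labels and float(score) >= min_score
    ]
```

The returned score is exactly the kind of detection value that the integrity evaluation item reuses later as the per-frame integrity of the object.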
For step S103, for example, the position of the specified object in the first video frame may be directly determined as the implantation position of the three-dimensional image resource to be implanted in the video image.
In addition, in the case where there are a plurality of determined video frames containing a specified object, referring to fig. 2, fig. 2 is a second flowchart of the implantation position determining method according to an embodiment of the present application; the determining, based on the target detection algorithm, of a video frame of the video image that contains a specified object as a first video frame (S102) includes:
Step S1021: determining, based on the target detection algorithm, the video frames in which a specified object contained in the video image is the same object, as the second video frames corresponding to that specified object.
The determining, based on the position of the specified object in the first video frame, of an implantation position at which the three-dimensional image resource is to be implanted in the video image (S103) includes:
Step S1031: determining, based on the position of each specified object in its corresponding second video frames, an implantation position at which the three-dimensional image resource is to be implanted in the video image.
In the embodiment of the present application, the video image may contain a plurality of specified objects. For any one specified object, the first video frames containing that specified object, that is, the second video frames corresponding to that specified object, may be determined. For example, the first video frames of the video image that contain a specified object may be determined based on the target detection algorithm; then, based on the similarity between the image areas occupied by the specified object in these first video frames, it may be determined whether the specified objects contained in any two first video frames are the same object, and the first video frames whose contained specified object is the same object may be determined as the second video frames corresponding to that specified object.
Subsequently, the implantation position at which the three-dimensional image resource is to be implanted in the video image may be determined based on the position of each specified object in its corresponding second video frames. For example, the position of any one specified object in its corresponding second video frames may be determined as the implantation position; alternatively, the plurality of specified objects may be screened, and the implantation position may be determined based on the screening result.
On this basis, for any specified object, the second video frames of the video image that contain that specified object can be determined. That is, the video frames containing the same specified object can be associated with one another, and the implantation position of the three-dimensional image resource is then determined based on the position of the specified object in its corresponding second video frames. This avoids determining the positions of the same specified object in different video frames as different implantation positions of the three-dimensional image resource, thereby ensuring the effect of implanting the three-dimensional image resource.
In one embodiment, referring to fig. 3 on the basis of fig. 2, fig. 3 is a third flowchart of the implantation position determining method according to an embodiment of the present application; the determining, based on the target detection algorithm, of the video frames in which a specified object contained in the video image is the same object, as the second video frames corresponding to the specified object (S1021), includes:
Step S10211: determining the current video frame to be processed in the video image according to the sequence among the video frames in the video image.
Step S10212: detecting, based on a target detection algorithm, whether the current video frame to be processed contains a specified object. If so, step S10213 is performed; if not, step S10219 is performed.
Step S10213: judging whether the specified object was determined in the video frame preceding the current video frame to be processed. If so, step S10214 is performed; if not, step S10216 is performed.
Step S10214: judging whether the specified object detected in the current video frame to be processed and the specified object determined in the previous video frame are the same object. If so, step S10215 is performed; if not, step S10216 is performed.
Step S10215: determining that the current video frame to be processed and the previous video frame belong to a video frame set containing the same specified object. If the current video frame to be processed is the last frame in the video image, step S102111 is performed; if it is not the last frame, the process returns to step S10211.
Step S10216: determining the most recently determined video frame containing the specified object as the current third video frame.
Step S10217: determining, based on a target tracking algorithm, whether the specified object determined in the current third video frame is tracked in the current video frame to be processed. If so, step S10218 is performed.
Step S10218: determining that the current video frame to be processed and the current third video frame belong to a video frame set containing the same specified object. If the current video frame to be processed is the last frame in the video image, step S102111 is performed; if it is not the last frame, the process returns to step S10211.
Step S10219: judging whether a preset number of consecutive fourth video frames exist before the current video frame to be processed. If not, step S10216 is performed; if so, step S102110 is performed.
A fourth video frame is a video frame determined not to contain the specified object, that is, a frame in which the specified object was found neither by the target detection algorithm nor by the target tracking algorithm.
Step S102110: it is determined that the current video frame to be processed does not contain the specified object. If the current video frame to be processed is the last frame in the video image, executing step S102111; if the current frame is not the last frame in the video image, the process returns to step S10211.
Step S102111: and determining the video frames in the video frame set containing the specified object as the same object as each second video frame corresponding to the specified object based on the similarity between the image areas occupied by the specified object in the video frames.
In the embodiment of the application, each video frame in the video image can be sequentially processed according to the sequence among the video frames in the video image so as to determine whether a specified object exists in the video frame.
For example, if there are N video frames in the video image, the video frames may be numbered sequentially; then, the 1st frame, the 2nd frame, ..., and the Nth frame may be determined in turn as the current video frame to be processed and processed according to steps S10212 to S102110. It will be appreciated that for the 1st frame, it is only necessary to detect whether the frame contains a specified object based on the target detection algorithm, without determining whether the specified object was determined in the previous video frame.
If the current video frame to be processed contains the specified object based on the target detection algorithm and the specified object is determined in the previous video frame of the current video frame to be processed, judging whether the detected specified object in the current video frame to be processed and the specified object determined in the previous video frame are the same object or not.
For example, the degree of coincidence between the image area occupied by the specified object detected in the current video frame to be processed and the image area occupied by the specified object determined in the previous video frame may be calculated. The overlap ratio may be represented by the IoU (Intersection over Union) of the two image regions.
When the calculated overlap ratio reaches a preset overlap ratio threshold value, the fact that the appointed object detected in the current video frame to be processed and the appointed object determined in the previous video frame are the same object is indicated, and it can be determined that the current video frame to be processed and the previous video frame belong to a video frame set containing the same appointed object. Accordingly, video frames belonging to the same set of video frames may be marked with the same identification (which may be referred to as a target ID).
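As an illustrative sketch (not part of the claimed method itself), the overlap-ratio check can be implemented as follows; the [x1, y1, x2, y2] box format and the 0.5 threshold are assumptions for the example:

```python
def iou(box_a, box_b):
    """Intersection over Union of two target frames given as [x1, y1, x2, y2]."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Same-object test with an assumed overlap-ratio threshold of 0.5.
prev_box = [100, 120, 260, 300]   # specified object in the previous video frame
cur_box = [110, 125, 265, 305]    # specified object in the current video frame to be processed
same_object = iou(prev_box, cur_box) >= 0.5
```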
Based on the above processing, video frames that are adjacent and contain the same specified object in the video image can be determined to belong to one video frame set. Since the same designated object may appear in different segments in the whole video image, that is, different video frame sets may also contain the same designated object, the video frames in the video frame set containing the designated object as the same object may be determined as the second video frames corresponding to the designated object based on the similarity between the image areas occupied by the designated object in the video frames. Thus, for any specified object, the video frame containing the specified object in the whole video image can be determined. That is, the video frames containing the specified object in the whole video image can be associated, and then, the implantation position of the three-dimensional image resource implanted in the video image is determined based on the position of the specified object in each corresponding second video frame, so that the situation that the position of the same specified object in different video frames is determined to be different implantation positions of the three-dimensional image resource can be avoided, and the effect of the implantation of the three-dimensional image resource is further ensured.
Since the same designated object may appear in different segments throughout the video image, i.e., the same designated object may appear in non-adjacent video frames. Therefore, if the specified object detected in the current video frame to be processed is not the same object as the specified object determined in the previous video frame, or if the specified object is not determined in the previous video frame of the current video frame to be processed, the video frame containing the specified object (i.e., the current third video frame) determined last time may be utilized to perform target tracking on the specified object (which may be referred to as the first specified object) determined in the current third video frame based on the target tracking algorithm.
If the first appointed object is tracked in the current video frame to be processed, determining that the current video frame to be processed and the current third video frame belong to a video frame set containing the first appointed object. And continuing to determine the current video frames to be processed in the video image according to the sequence among the video frames in the video image, so that the video frames which are not adjacent and contain the same appointed object in the video image can be determined to belong to one video frame set.
It will be appreciated that the specified object determined in the previous video frame, and the specified object determined in the current third video frame, may be determined based on a target detection algorithm or may be determined based on a target tracking algorithm.
For example, the target tracking algorithm may be the CSRT (correlation-filter-based tracking) algorithm, the KCF tracker (Kernelized Correlation Filters tracker) algorithm, or the MIL tracker (Multiple Instance Learning tracker) algorithm.
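For reference, implementations of all three trackers ship with OpenCV. The sketch below assumes the opencv-contrib-python package (in some 4.x builds the constructors live under cv2.legacy instead); the video path and target-frame coordinates are placeholder values:

```python
import cv2

cap = cv2.VideoCapture("video.mp4")        # hypothetical input video image
ok1, third_frame = cap.read()              # stands in for the current third video frame
ok2, frame_to_process = cap.read()         # stands in for the current video frame to be processed

bbox = (100, 100, 80, 60)                  # (x, y, w, h) of the target frame, assumed values
tracker = cv2.TrackerCSRT_create()         # alternatives: cv2.TrackerKCF_create(), cv2.TrackerMIL_create()
tracker.init(third_frame, bbox)

found, new_bbox = tracker.update(frame_to_process)
# found plays the role of the decision in step S609: whether the first specified
# object determined in the current third video frame is tracked in the new frame.
```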
Since the same specified object may appear in different segments in the entire video image, based on the above processing, video frames that are not adjacent in the video image and contain the same specified object can be determined to belong to one video frame set. Thus, for any specified object, the video frame containing the specified object in the whole video image can be determined. That is, the video frames containing the specified object in the whole video image can be associated, and then, the implantation position of the three-dimensional image resource implanted in the video image is determined based on the position of the specified object in each corresponding second video frame, so that the situation that the position of the same specified object in different video frames is determined to be different implantation positions of the three-dimensional image resource can be avoided, and the effect of the implantation of the three-dimensional image resource is further ensured.
Because of the limited accuracy of the target detection algorithm, a specified object actually contained in the current video frame to be processed may fail to be detected. Therefore, in the embodiment of the present application, if it is determined based on the target detection algorithm that the current video frame to be processed does not contain the specified object, target tracking may be performed on the first specified object based on the target tracking algorithm, using the last determined video frame containing the specified object (i.e., the current third video frame).
If the first appointed object is tracked in the current video frame to be processed, determining that the current video frame to be processed and the current third video frame belong to a video frame set containing the same appointed object. And continuing to determine the current video frame to be processed in the video image according to the sequence among the video frames in the video image, thereby determining the video frame containing the same appointed object in the video image as belonging to a video frame set.
Based on the above processing, for any specified object, the situation that the video frame of the specified object is not determined due to the influence of the accuracy of the target detection algorithm, and the video frame of the specified object in the whole video image can be determined. That is, the video frames including the specified object in the entire video image can be associated, and then, the implantation position of the three-dimensional image resource to be implanted in the video image is determined based on the position of the specified object in each corresponding second video frame, so that the situation that only the position of the same specified object in a part of the video frames is determined as the implantation position of the three-dimensional image resource to be implanted can be avoided, and the effect of the implantation of the three-dimensional image resource can be further ensured.
When an advertisement needs to be implanted in the video image, the method provided by this embodiment avoids determining only the positions of the same specified object in part of the video frames as the implantation position of the advertisement, avoids determining the positions of the same specified object in different video frames as implantation positions of different advertisements, and thereby avoids visible goofs in the advertisement implantation.
In a video image, a specified object may appear in different segments; however, if the specified object does not appear in a plurality of consecutive video frames, it is unlikely to appear in subsequent video frames. Therefore, in the embodiment of the present application, if there is no preset number of consecutive fourth video frames before the current video frame to be processed, the specified object may still exist in the current video frame to be processed, so target tracking may be performed on it. The preset number may be 10, or may be 20.
Otherwise, if a preset number of consecutive fourth video frames exist before the current video frame to be processed, it indicates that the specified object is unlikely to exist in the current video frame to be processed or in subsequent video frames. In this case, it is directly determined that the current video frame to be processed does not contain the specified object, without performing target tracking on it.
Based on the above processing, the efficiency of implant position determination can be improved without tracking each video frame in the video image.
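To make the control flow of steps S10211 to S102111 concrete, the following simplified sketch groups video frames into video frame sets; detect(), track() and iou() are hypothetical stand-ins for the target detection algorithm, the target tracking algorithm and the overlap-ratio check, and max_gap plays the role of the preset number of consecutive fourth video frames:

```python
def associate_frames(frames, detect, track, iou, max_gap=10, iou_thresh=0.5):
    """Group video frames containing the same specified object into sets keyed by target ID.

    detect(frame) -> box or None                      # target detection algorithm
    track(ref_frame, ref_box, frame) -> box or None   # target tracking algorithm
    iou(box_a, box_b) -> float                        # overlap ratio of two boxes
    """
    sets = {}          # target ID -> list of (frame index, box)
    next_id = 0
    cur_id = None      # target ID assigned to the previous video frame, if any
    last_hit = None    # (frame, box) of the current third video frame
    misses = 0         # consecutive fourth video frames seen so far

    for idx, frame in enumerate(frames):
        box = detect(frame)
        if box is None and last_hit is not None and misses < max_gap:
            # Detection may miss due to detector accuracy: fall back to tracking
            # from the last video frame determined to contain the object.
            box = track(last_hit[0], last_hit[1], frame)
        if box is None:
            misses += 1
            cur_id = None
            continue
        if cur_id is not None and iou(sets[cur_id][-1][1], box) >= iou_thresh:
            sets[cur_id].append((idx, box))   # same object as in the previous frame
        else:
            cur_id = next_id                  # open a new video frame set
            next_id += 1
            sets[cur_id] = [(idx, box)]
        last_hit, misses = (frame, box), 0
    return sets
```

The sketch only builds per-segment video frame sets; merging the sets that contain the same specified object in non-adjacent segments (step S102111) is performed afterwards based on the similarity between the occupied image areas.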
In addition, since the specified objects may be plural, the plural specified objects may also be screened, and the implantation position of the three-dimensional image resource to be implanted in the video image may be determined based on the screening result. Referring to fig. 4, fig. 4 is a fourth flowchart of an implantation position determining method according to an embodiment of the present application, before determining an implantation position of implanting a three-dimensional image resource in a video image based on a position of each specified object in a corresponding second video frame (S1031), the method may further include:
Step S104: and calculating the evaluation value of each candidate object aiming at a preset evaluation item based on each second video frame corresponding to the candidate object aiming at each candidate object contained in the video image to obtain the object evaluation value of the candidate object.
Wherein the candidate object is determined based on a specified object contained in the video image.
The preset evaluation items comprise at least one of the following: an integrity evaluation item, a sharpness evaluation item and a stability evaluation item; the integrity evaluation item is used to characterize: the average value of the integrity of the candidate object in each corresponding second video frame; the sharpness evaluation item is used to characterize: the proportion of the foreground in the first image areas corresponding to the candidate object; a first image area corresponding to the candidate object represents: the image area occupied by the candidate object in a corresponding second video frame;
the stability evaluation item includes at least one of: a region stability evaluation item and an area stability evaluation item; the region stability evaluation item is used to characterize: the change amplitude of the first image areas corresponding to the candidate object; the area stability evaluation item is used to characterize: among the first image areas corresponding to the candidate object, the proportion occupied by image areas whose area is smaller than a second preset threshold.
Accordingly, determining an implantation location for implanting a three-dimensional image resource in a video image based on a location of each specified object in a corresponding second video frame (S1031), comprising:
Step S10311: the target object is determined from among the candidate objects based on the object evaluation values of the candidate objects.
Step S10312: an implantation location for implanting a three-dimensional image asset in the video image is determined based on the location of the target object in the corresponding second video frame.
In the embodiment of the present application, the first video frames may include a plurality of specified objects. For any one of the specified objects, the first video frames including that specified object, that is, the second video frames corresponding to it, may be determined. For example, whether the specified objects contained in any two first video frames are the same object may be determined based on the similarity between the image areas occupied by the specified objects in those first video frames.
In one implementation, each specified object determined in the first video frame may be directly determined as an alternative object.
In another implementation manner, the selection may be performed on the specified object determined in the first video frame, so as to determine the candidate object. Referring to fig. 5, fig. 5 is a fifth flowchart of an implantation position determining method according to an embodiment of the present application. Before calculating an evaluation value of each candidate object for a preset evaluation item based on each second video frame corresponding to the candidate object for each candidate object included in the video image to obtain an object evaluation value of the candidate object (S104), the implantation position determining method further includes:
step S105: and aiming at each appointed object contained in the video image, obtaining the appearance time of the appointed object in the video image based on each second video frame corresponding to the appointed object.
Step S106: judging whether the appearance duration of the appointed object is longer than a preset duration.
Step S107: and if the appearance duration of the specified object is longer than the preset duration, determining the specified object as an alternative object.
In the embodiment of the application, the appearance time of each specified object in the video image can be obtained based on each second video frame corresponding to the specified object. For example, the number of second video frames corresponding to the specified object may be counted to represent the duration of occurrence of the specified object in the video image. Accordingly, the preset duration may be a preset number of frames, for example, 144 frames or 192 frames.
Alternatively, the total playing duration of the second video frames corresponding to the specified object in the video image may be counted as the appearance duration of the specified object in the video image; correspondingly, the preset duration may be a preset length of time, for example, 6 seconds or 8 seconds.
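A minimal sketch of this duration filter, assuming the frame-count representation and an assumed frame rate of 24 fps (so 144 frames correspond to 6 seconds):

```python
def screen_by_duration(second_frames_per_object, fps=24.0, min_seconds=6.0):
    """Keep only the specified objects whose appearance duration exceeds the preset duration.

    second_frames_per_object: dict mapping each specified object to its second video frames.
    """
    return {obj: frames
            for obj, frames in second_frames_per_object.items()
            if len(frames) / fps > min_seconds}
```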
The longer a specified object appears in the video image, the better the implantation effect will be if the three-dimensional image resource is implanted at the position of that object. The specified objects can therefore be screened based on their appearance duration; that is, the position of a specified object with a short appearance duration is not suitable for implanting the three-dimensional image resource.
In addition, screening the specified objects in the video image based on their appearance durations and determining the specified objects with longer appearance durations as candidate objects filters out the specified objects with short appearance durations. The implantation position of the three-dimensional image resource in the video image can then be determined based only on the specified objects with longer appearance durations (i.e., the candidate objects); since the subsequent processing of step S104 need not be performed for every specified object, the efficiency of determining the implantation position is further improved.
And calculating the evaluation value of each candidate object aiming at a preset evaluation item based on each second video frame corresponding to the candidate object to obtain the object evaluation value of the candidate object.
It can be understood that the larger the average integrity of the candidate object in its corresponding second video frames, the more completely the implanted three-dimensional image resource will appear in the video image if it is implanted at the position of that candidate object. Thus, the candidate objects can be screened based on the integrity evaluation item.
As described above, the sharpness evaluation item of a candidate object is used to characterize the proportion of the foreground in the first image areas corresponding to the candidate object. A first image area corresponding to the candidate object is the image area occupied by the candidate object in a corresponding second video frame; specifically, it may be the image area corresponding to the minimum bounding rectangle of the candidate object in that second video frame. The first image area may be marked with a target frame, i.e., the target frame is the frame formed by the boundary of the first image area.
In addition, in order to highlight the foreground, the background is often defocused (blurred), so that the foreground in the video image is clearer. Therefore, the higher the proportion of the foreground in the first image areas corresponding to the candidate object, the clearer the implanted three-dimensional image resource will be in the video image if it is implanted at the position of that candidate object. The candidate objects can therefore be screened based on the sharpness evaluation item.
In one embodiment, the sharpness evaluation term may be used to characterize the sharpness of the first image area corresponding to the candidate object, where the sharpness of the first image area corresponding to the candidate object may be obtained based on a preset sharpness evaluation method. For example, the preset sharpness evaluation method may be a Tenengrad (a gradient-based function) gradient method, a Laplacian (Laplacian) gradient method, or a variance method.
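For instance, a Laplacian-based sharpness measure can be computed with OpenCV as below; using the variance of the Laplacian response as the score is one common realization of the gradient method, not necessarily the exact one intended here:

```python
import cv2

def laplacian_sharpness(image_region):
    """Variance of the Laplacian response; larger values indicate a sharper region."""
    gray = cv2.cvtColor(image_region, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()
```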
In addition, the smaller the change amplitude of the first image areas corresponding to the candidate object, and the smaller the proportion of first image areas whose area is below the second preset threshold, the more stable the position and the larger the area of the candidate object in the video image. Correspondingly, a three-dimensional image resource implanted at the position of that candidate object can be displayed more stably in the video image, and the implantation effect is better. Therefore, the candidate objects can be screened based on the stability evaluation item.
It is understood that the preset evaluation items may include any one or any two of the integrity evaluation item, the sharpness evaluation item, and the stability evaluation item, or may include three items.
Accordingly, the evaluation value of each candidate object for the preset evaluation items (i.e., the object evaluation value) can be obtained based on any one, any two, or all three of: the candidate object's evaluation value for the integrity evaluation item (which may be referred to as the detection score), its evaluation value for the sharpness evaluation item (the sharpness score), and its evaluation value for the stability evaluation item (the stability score), each computed over the corresponding second video frames.
The object evaluation value is positively correlated with the calculated evaluation values. For example, the sum of the obtained evaluation values may be calculated as the object evaluation value. In addition, when calculating the sum, weights may be set for the different evaluation items, and a weighted sum of the evaluation values may then be calculated as the object evaluation value.
For example, the candidate object having the highest object evaluation value may be determined as the target object. Therefore, the implantation position can be automatically determined through the electronic equipment, compared with the case that the implantation position is determined by manually browsing the video image, the labor cost can be reduced, and the efficiency of determining the implantation position is improved. Or a plurality of candidate objects with highest object evaluation values can be selected as target objects, and the plurality of candidate objects and the respective corresponding object evaluation values can be provided for the user subsequently, so that the user can further screen the plurality of candidate objects.
Based on the above processing, a target object can be determined from among the candidate objects based on a preset evaluation item, and further, the position of the target object in the video image can be determined as the implantation position of implanting the three-dimensional image resource in the video image. The implantation position with good effect of implanting the three-dimensional image resource can be determined, the efficiency of determining the implantation position is improved, and the effect of implanting the three-dimensional image resource at the determined implantation position is improved.
In one embodiment, referring to fig. 6, fig. 6 is a flowchart of determining a set of video frames containing the same specified object according to an embodiment of the present application.
Step S601: and acquiring an ith frame picture. I.e. the ith frame picture in the video image is obtained as the current video frame to be processed, and the initial value of i is 1.
Step S602: and (5) detecting a target. That is, based on the target detection algorithm, it is detected whether or not the specified object is contained in the current video frame to be processed.
Step S603: and judging whether a target exists. That is, based on the result of the object detection, it is determined whether or not the current video frame to be processed contains an object (i.e., a specified object). If the current video frame to be processed contains the target, executing step S604; if the current video frame to be processed does not contain the target, step S605 is executed.
Step S604: it is determined whether the previous frame has a target. That is, it is determined whether the specified object is determined in the video frame preceding the current video frame to be processed. If the specified object is determined in the previous video frame of the current video frame to be processed, executing step S606; if the specified object is not determined in the video frame preceding the current video frame to be processed, step S607 is performed.
Step S605: judging whether the duration of consecutive "no target" detection results is longer than a threshold value. That is, it is determined whether a preset number of consecutive fourth video frames exist before the current video frame to be processed. The fourth video frame represents a video frame determined not to contain the specified object. If a preset number of consecutive fourth video frames exist before the current video frame to be processed, the flow ends; if there is no preset number of consecutive fourth video frames before the current video frame to be processed, step S607 is performed.
The end here is just the end of the flow for the specified object, and the flow for other specified objects may be continued.
Step S606: and judging whether the target contact ratio with the previous frame reaches a threshold value or not. That is, it is determined whether the specified object detected in the current video frame to be processed and the specified object determined in the previous video frame are the same object. If the specified object detected in the current video frame to be processed and the specified object determined in the previous video frame are the same object, executing step S608; if the detected specified object in the current video frame to be processed is not the same object as the specified object determined in the previous video frame, step S607 is performed.
Step S607: and (5) tracking a target. That is, the specified object determined in the current third video frame is subjected to target tracking based on a target tracking algorithm. The current third video frame is the last determined video frame containing the specified object.
Step S608: updating object_result (the object result). After object_result is updated, the value of i is incremented to i+1 and execution returns to step S601.
Step S609: it is determined whether the target is tracked. That is, it is determined whether the specified object determined in the current third video frame is tracked in the current video frame to be processed. If the specified object determined in the current third video frame is tracked in the current video frame to be processed, executing step S608; and if the specified object determined in the current third video frame is not tracked in the current video frame to be processed, ending.
In the embodiment of the present application, object_result is a set of detection and tracking results obtained after target detection and tracking are performed on specified objects, and is used for recording the detection and tracking results of each specified object. For each specified object, the detection and tracking results of the specified object are shown in table (1). The information recorded in table (1) is as follows:
TrackId the data type is Int (integer), the target tracking result ID, i.e. the target ID in the above embodiment.
Label, data type is Str (character string type), target tracking result tag, i.e. the category of the specified object. The category of the specified object may be a dining table, a desk, a tea table, or a bedside table.
DetectScore, the data type is List (list type), the target detection score, i.e. the probability (which may be referred to as a first probability) that a specified object is contained in a video frame, determined based on the target detection algorithm. For a video frame, if the first probability is greater than a preset detection probability threshold, it is determined that the video frame contains the specified object, and the DetectScore corresponding to the video frame is recorded as the first probability. If the first probability is smaller than the preset detection probability threshold, it is determined that the video frame does not contain the specified object, and the DetectScore corresponding to the video frame is recorded as -1. The preset detection probability threshold may be 0.7 or 0.8.
TrackScore, the data type is List (list type), the target tracking score, i.e. the probability (which may be referred to as a second probability) that a specified object is contained in the video frame, determined based on the target tracking algorithm. For a video frame, if the second probability is greater than a preset tracking probability threshold, it is determined that the video frame contains the specified object, and the TrackScore corresponding to the video frame is recorded as the second probability. If the second probability is smaller than the preset tracking probability threshold, it is determined that the video frame does not contain the specified object, and the TrackScore corresponding to the video frame is recorded as -1. The preset tracking probability threshold may be 0.7 or 0.8.
FrameId, the data type is List (list type), the frame numbers in which the target exists, i.e. the sequence numbers of the video frames containing the specified object. For example, if there are N video frames in a video image, the video frames may be numbered sequentially. Further, the sequence number of each video frame in the video image can be obtained, and when a video frame containing the specified object is determined, its sequence number can be recorded.
Box, the data type is List (list type), the position of the target in each frame, that is, the position of the specified object in each video frame containing the specified object. The position of the target frame in the above embodiment may be used; correspondingly, the coordinates of the four vertices of the target frame may be recorded.
Table (1)
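A minimal sketch of one record in object_result might look as follows; the field names mirror Table (1), and the -1 sentinel convention is as described above:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class TrackRecord:
    TrackId: int                    # target tracking result ID (the target ID)
    Label: str                      # category of the specified object, e.g. "tea table"
    DetectScore: List[float] = field(default_factory=list)   # first probability per frame, or -1
    TrackScore: List[float] = field(default_factory=list)    # second probability per frame, or -1
    FrameId: List[int] = field(default_factory=list)         # sequence numbers of frames with the object
    Box: List[List[float]] = field(default_factory=list)     # target-frame vertex coordinates per frame

object_result: List[TrackRecord] = []   # one record per tracked specified object
```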
In one embodiment, the evaluation value of an alternative object for the integrity evaluation item is calculated by:
And acquiring a detection value corresponding to each second video frame corresponding to the candidate object as the integrity of the candidate object in the second video frame. Wherein the detection value corresponding to the second video frame represents a probability of determining that the second video frame contains the candidate object.
And calculating the average value of the integrality of the candidate object in each second video frame contained in the video frame set as the average integrality of the candidate object in the video frame set aiming at each video frame set corresponding to the candidate object.
And determining an average value in the average completeness of the candidate object in each video frame set corresponding to the candidate object as an evaluation value of the candidate object for the completeness evaluation item.
It can be understood that, for each candidate object, each video frame included in each video frame set corresponding to the candidate object is a second video frame corresponding to the candidate object. The object IDs corresponding to the second video frames belonging to the same video frame set are the same.
In the embodiment of the present application, the detection value of the candidate object in a second video frame is the probability that the second video frame contains the candidate object. It may be the larger of the DetectScore and TrackScore corresponding to the candidate object in that second video frame (as in the above embodiment), or it may be the intersection-over-union of the predicted target frame and the actual target frame of the candidate object in that second video frame. The larger the probability that a second video frame contains the candidate object, the larger the detection value corresponding to that second video frame, and hence the larger the average integrity of the candidate object in each corresponding video frame set. The greater the average integrity of the candidate object, the more completely the candidate object appears in the video image. Determining the average of the average integrity values of the candidate object over its corresponding video frame sets as its evaluation value for the integrity evaluation item therefore gives candidate objects that appear more completely in the video image a higher evaluation value for this item. On this basis, a more complete implantation position can be selected, and the effect of implanting the three-dimensional image resource is further ensured.
Or for each second video frame corresponding to the candidate object, acquiring a detection value corresponding to the second video frame as the integrity of the candidate object in the second video frame, and calculating the average value of the integrity of the candidate object in each second video frame corresponding to the candidate object as the evaluation value of the candidate object for the integrity evaluation item.
In one embodiment, the average integrity of the video frame set with target ID i may be calculated based on equation (1):

score1_i = ( Σ_{j=StartId}^{EndId} max(DetectScore_j, TrackScore_j) ) / (EndId - StartId + 1) (1)

wherein score1_i is the average integrity of the video frame set with target ID i, EndId is the sequence number of the last video frame of that set, StartId is the sequence number of the first video frame of that set, DetectScore_j is the target detection score of the j-th frame in the set, TrackScore_j is the target tracking score of the j-th frame in the set, and j is the sequence number of a video frame in the set.
The evaluation value of the candidate object for the integrity evaluation item can be calculated based on formula (2),
score1 = mean(score1_{i∈Ω}) (2)
wherein score1 is the evaluation value of the candidate object for the integrity evaluation item, and Ω is the set of target IDs corresponding to the video frame sets containing the candidate object.
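Under this reading of equations (1) and (2), a sketch of the integrity score, taking the per-frame integrity as the larger of the detection and tracking scores as described above:

```python
def average_integrity(detect_scores, track_scores):
    """Equation (1): mean per-frame integrity over one video frame set."""
    per_frame = [max(d, t) for d, t in zip(detect_scores, track_scores)]
    return sum(per_frame) / len(per_frame)

def integrity_score(score_pairs_per_set):
    """Equation (2): mean of the average integrity over all sets containing the candidate.

    score_pairs_per_set: one (detect_scores, track_scores) pair per video frame set.
    """
    values = [average_integrity(d, t) for d, t in score_pairs_per_set]
    return sum(values) / len(values)
```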
In one embodiment, the evaluation value of an alternative object for the sharpness evaluation term is calculated by:
and calculating the proportion of the foreground in the first image area corresponding to the candidate object in the second video frame as the definition of the candidate object in the second video frame aiming at each second video frame corresponding to the candidate object.
And calculating the average value of the definition of the candidate object in each second video frame contained in the video frame set as the average definition of the candidate object in the video frame set aiming at each video frame set corresponding to the candidate object.
And determining the maximum value in the average definition of the candidate object in each video frame set corresponding to the candidate object as the evaluation value of the candidate object for the definition evaluation item.
In the embodiment of the application, the video image is divided into a foreground and a background, the foreground is clearer, and the background is more blurred. In order to ensure the effect of implanting the three-dimensional image resource, the proportion of the foreground in the implantation position needs to be high.
For example, for each video frame in a video image, the video frame may be foreground detected by a foreground detection network. Further, a binary matrix in which the pixel value corresponding to the pixel belonging to the foreground is 1 and the pixel value corresponding to the pixel belonging to the background is 0 can be obtained in accordance with the pixel size of the video frame. The foreground detection network can be trained based on a frame difference method, an optical flow method or an average background method.
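As an illustrative substitute for the trained foreground detection network, a background-subtraction method such as MOG2 (an average-background-style method shipped with OpenCV) can produce the same kind of binary mask:

```python
import cv2
import numpy as np

subtractor = cv2.createBackgroundSubtractorMOG2()

def foreground_mask(frame):
    """Binary mask with 1 for foreground pixels and 0 for background pixels."""
    mask = subtractor.apply(frame)          # 0 = background, 127 = shadow, 255 = foreground
    return (mask > 127).astype(np.uint8)    # threshold out the shadow label
```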
Referring to fig. 7, fig. 7 is a schematic diagram of the foreground portion of a video frame according to an embodiment of the present application. The dashed box in fig. 7 marks the detected foreground in the video frame, and the solid box marks the target frame in the video frame.
For the j-th video frame with target ID i, a binary matrix mask_{i,j}, consistent with the pixel size of the video frame, is obtained through the foreground detection network. The target frame corresponding to the specified object in the video frame may be represented as box_{i,j} = [x1_{i,j}, y1_{i,j}, x2_{i,j}, y2_{i,j}], where (x1_{i,j}, y1_{i,j}) and (x2_{i,j}, y2_{i,j}) are the coordinates of the top-left and bottom-right vertices of the target frame, respectively. The sharpness of the candidate object in the j-th video frame with target ID i can be calculated based on equation (3):

score2_{i,j} = ( Σ_{(r,c) ∈ box_{i,j}} mask_{i,j}(r, c) ) / ( (x2_{i,j} - x1_{i,j}) · (y2_{i,j} - y1_{i,j}) ) (3)

wherein score2_{i,j} is the sharpness of the candidate object in the j-th video frame with target ID i, and mask_{i,j}(r, c) is the pixel value corresponding to the pixel point with coordinates (r, c) in the j-th video frame with target ID i.
The average sharpness of the candidate object in a video frame set may be calculated based on equation (4):

score2_i = ( Σ_{j=StartId}^{EndId} score2_{i,j} ) / (EndId - StartId + 1) (4)

wherein score2_i is the average sharpness of the video frame set with target ID i, EndId is the sequence number of the last video frame of that set, StartId is the sequence number of the first video frame of that set, score2_{i,j} is the sharpness of the candidate object in the j-th video frame with target ID i, and j is the sequence number of a video frame in the set.
The evaluation value of the candidate object for the sharpness evaluation term can be calculated based on the formula (5),
score2 = max(score2_{i∈Ω}) (5)
wherein score2 is the evaluation value of the candidate object for the sharpness evaluation item, and Ω is the set of target IDs corresponding to the video frame sets containing the candidate object.
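A sketch of equations (3) to (5) as reconstructed above, given per-frame binary masks and target frames:

```python
import numpy as np

def clarity_per_frame(mask, box):
    """Equation (3): proportion of foreground pixels inside the target frame.

    mask: binary (H, W) array from the foreground detector;
    box: (x1, y1, x2, y2) integer pixel coordinates of the target frame.
    """
    x1, y1, x2, y2 = box
    region = mask[y1:y2, x1:x2]
    return float(region.sum()) / max(region.size, 1)

def clarity_score(sets):
    """Equations (4) and (5): per-set mean of equation (3), then the max across sets.

    sets: list of video frame sets, each a list of (mask, box) pairs.
    """
    per_set = [np.mean([clarity_per_frame(m, b) for m, b in s]) for s in sets]
    return max(per_set)
```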
In one embodiment, for each second video frame corresponding to the candidate object, the sharpness of the first image area corresponding to the candidate object in the second video frame may be calculated based on a preset sharpness evaluation method, and the sharpness of the candidate object in the second video frame may be used as the sharpness of the candidate object.
And calculating the average value of the definition of the candidate object in each second video frame contained in the video frame set as the average definition of the candidate object in the video frame set aiming at each video frame set corresponding to the candidate object.
And determining the maximum value in the average definition of the candidate object in each video frame set corresponding to the candidate object as the evaluation value of the candidate object for the definition evaluation item.
In one embodiment, the region stability evaluation term includes at least one of: a first region stability rating sub-term and a second region stability rating sub-term.
The evaluation value corresponding to the first region stability evaluation sub-item of one candidate object is calculated by the following method:
For each second video frame corresponding to the candidate object, acquiring the coordinates of the center point of the first image area corresponding to the candidate object in the second video frame as the coordinates to be processed corresponding to the second video frame; calculating the area of a first image area corresponding to the candidate object in the second video frame as a first area to be processed corresponding to the second video frame;
for each video frame set corresponding to the candidate object, calculating the standard deviation of the coordinates to be processed corresponding to each second video frame contained in the video frame set, and taking the standard deviation as the first standard deviation corresponding to the video frame set;
Calculating an average value of a first area to be processed corresponding to each second video frame contained in the video frame set, and taking the average value as an average area corresponding to the video frame set;
Calculating the ratio of the first standard deviation corresponding to the video frame set to the average area corresponding to the video frame set as the center point stability corresponding to the video frame set;
And determining an average value of the center point stability corresponding to each video frame set corresponding to the candidate object as an evaluation value of the candidate object for the first region stability evaluation sub-item.
In one embodiment, for each candidate object in the set of video frames having a target ID i, the coordinates of the center point of each target frame corresponding to the candidate object may be obtained based on equation (6),
center_i = [ ((x1_{i,StartId} + x2_{i,StartId})/2, (y1_{i,StartId} + y2_{i,StartId})/2), …, ((x1_{i,EndId} + x2_{i,EndId})/2, (y1_{i,EndId} + y2_{i,EndId})/2) ] (6)

wherein center_i is the list of center-point coordinates of the target frames corresponding to the candidate object, EndId is the sequence number of the last video frame of the video frame set with target ID i, and StartId is the sequence number of the first video frame of that set.
The center point stability corresponding to the video frame set is calculated based on equation (7):

stability_i = std[center_i] / mean[area_i] (7)

wherein stability_i is the center point stability corresponding to the video frame set, std[center_i] is the first standard deviation corresponding to the video frame set, and mean[area_i] is the average area corresponding to the video frame set.
The first areas to be processed corresponding to the second video frames in the video frame set with target ID i are calculated based on equation (8):

area_i = [ (x2_{i,StartId} - x1_{i,StartId}) · (y2_{i,StartId} - y1_{i,StartId}), …, (x2_{i,EndId} - x1_{i,EndId}) · (y2_{i,EndId} - y1_{i,EndId}) ] (8)

wherein area_i is the list of first areas to be processed corresponding to the second video frames in the video frame set with target ID i, EndId is the sequence number of the last video frame of that set, and StartId is the sequence number of the first video frame of that set.
The evaluation value of the candidate object aiming at the first region stability evaluation sub-item is the average value of the center point stability corresponding to each video frame set corresponding to the candidate object.
The evaluation value corresponding to the second region stability evaluation sub-item of one candidate object is calculated by the following method:
For each second video frame corresponding to the candidate object, calculating the area of a first image area corresponding to the candidate object in the second video frame as a first area to be processed corresponding to the second video frame;
For each video frame set corresponding to the candidate object, calculating the standard deviation of a first area to be processed corresponding to each second video frame contained in the video frame set, and taking the standard deviation as a second standard deviation corresponding to the video frame set;
and determining an average value of the second standard deviation corresponding to each video frame set corresponding to the candidate object as an evaluation value of the candidate object for the second region stability evaluation sub-item.
In one embodiment, the second standard deviation corresponding to the video frame set with target ID i is calculated based on equation (9):

σ_i = std[area_i] (9)

wherein σ_i is the second standard deviation corresponding to the video frame set with target ID i, and area_i is the list of first areas to be processed corresponding to the second video frames in that set.
The evaluation value of the candidate object for the second region stability evaluation sub-item is the average value of the second standard deviation corresponding to each video frame set corresponding to the candidate object.
In one embodiment, the evaluation value of an alternative object for the area stability evaluation term is calculated by:
and comparing the first area to be processed corresponding to each second video frame corresponding to the candidate object with a second preset threshold value.
And calculating the proportion of video frames with the first area to be processed smaller than a second preset threshold value in each second video frame contained in the video frame set as the area stability of the candidate object in the video frame set aiming at each video frame set corresponding to the candidate object.
And determining an average value of the area stability of the candidate object in each video frame set corresponding to the candidate object as an evaluation value of the candidate object for an area stability evaluation item.
The first area to be processed corresponding to the second video frame with sequence number j in the video frame set with target ID i is obtained based on equation (10), where (x1_{i,j}, y1_{i,j}) and (x2_{i,j}, y2_{i,j}) are the coordinates of the top-left and bottom-right vertices of the target frame in that second video frame:

A_{i,j} = (x2_{i,j} - x1_{i,j}) · (y2_{i,j} - y1_{i,j}) (10)

wherein A_{i,j} is the first area to be processed corresponding to the second video frame with sequence number j in the video frame set with target ID i.
The proportion of video frames whose first area to be processed is smaller than the second preset threshold, among the second video frames contained in the video frame set with target ID i, can be calculated based on equation (11):

score3c_i = ( Σ_{j=StartId}^{EndId} I(A_{i,j} < th_m) ) / (EndId - StartId + 1) (11)

wherein score3c_i is that proportion for the video frame set with target ID i, EndId is the sequence number of the last video frame of the set, StartId is the sequence number of the first video frame of the set, A_{i,j} is the first area to be processed corresponding to the second video frame with sequence number j in the set, j is the sequence number of a video frame in the set, and th_m is the second preset threshold. I(A_{i,j} < th_m) is an indicator function: its value is 1 if A_{i,j} < th_m, and 0 otherwise. The second preset threshold may be set according to the type of the candidate object and is not specifically limited here.
The evaluation value of the candidate object for the area stability evaluation item may be calculated based on equation (12):

score3c = mean(score3c_{i∈Ω}) (12)

wherein score3c is the evaluation value of the candidate object for the area stability evaluation item, and Ω is the set of target IDs corresponding to the video frame sets containing the candidate object.
In one embodiment, the evaluation value of a candidate object for the stability evaluation item may be inversely related to its evaluation value for the first region stability evaluation sub-item, its evaluation value for the second region stability evaluation sub-item, and its evaluation value for the area stability evaluation item. For example, the evaluation value of a candidate object for the stability evaluation item may be calculated based on equation (13); one form consistent with this inverse relationship is:

score3 = p / (score3a + score3b + score3c) (13)

wherein score3 is the evaluation value of the candidate object for the stability evaluation item, score3a is the evaluation value of the candidate object for the first region stability evaluation sub-item, score3b is the evaluation value of the candidate object for the second region stability evaluation sub-item, score3c is the evaluation value of the candidate object for the area stability evaluation item, and p is a preset parameter, which may be 0.5 or 1.
In the embodiment of the application, the stability of the candidate object in each corresponding second video frame can be determined based on the position coordinates of the candidate object in each corresponding second video frame and the area of the occupied image area. Furthermore, an alternative object with a stable position in the video image can be selected, so that the implanted three-dimensional image resource can be more stably displayed in the video image. Furthermore, the effect of the implantation of the three-dimensional image resource can be better.
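The following sketch computes score3a, score3b and score3c per equations (6) to (12) as reconstructed above. Since the exact form of equation (13) is not recoverable from the text beyond score3 being inversely related to the three terms, the last line is only one plausible combination; th_m and p are example values:

```python
import numpy as np

def stability_score(boxes_per_set, th_m=2000.0, p=0.5):
    """boxes_per_set: one list of (x1, y1, x2, y2) target frames per video frame set."""
    s3a, s3b, s3c = [], [], []
    for boxes in boxes_per_set:
        b = np.asarray(boxes, dtype=float)
        centers = np.stack([(b[:, 0] + b[:, 2]) / 2,
                            (b[:, 1] + b[:, 3]) / 2], axis=1)        # equation (6)
        areas = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])            # equations (8)/(10)
        s3a.append(np.std(centers) / np.mean(areas))                 # equation (7)
        s3b.append(np.std(areas))                                    # equation (9)
        s3c.append(np.mean(areas < th_m))                            # equation (11)
    score3a, score3b, score3c = map(np.mean, (s3a, s3b, s3c))        # equation (12) and analogues
    return p / (score3a + score3b + score3c)                         # one reading of equation (13)
```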
In one embodiment, referring to fig. 8, fig. 8 is a flowchart of calculating an evaluation value of a candidate object for a preset evaluation item according to an embodiment of the present application. Each pit corresponds to one candidate object: the position of the candidate object in the video frames may be referred to as a pit.
Step S801: duration filtering, namely, aiming at each appointed object contained in the video image, obtaining the appearance duration of the appointed object in the video image based on each second video frame corresponding to the appointed object; judging whether the appearance duration of the appointed object is longer than a preset duration; and if the appearance duration of the specified object is longer than the preset duration, determining the specified object as an alternative object.
Step S802: the pit detection score1 is calculated, that is, an evaluation value of the candidate object for the integrity evaluation item is calculated.
Step S803: pit definition score2 is calculated, i.e., an evaluation value of the candidate object for the definition evaluation item is calculated.
Step S804: pit stability score3 is calculated, i.e., an evaluation value of the candidate for the stability evaluation item is calculated.
Step S805: and (3) pit total score calculation, namely calculating the evaluation value of the candidate object for a preset evaluation item.
The evaluation value of the candidate object for the preset evaluation item may be a sum of the evaluation value of the candidate object for the integrity evaluation item, the evaluation value of the candidate object for the sharpness evaluation item, and the evaluation value of the candidate object for the stability evaluation item.
In one embodiment, before determining the video frames in the video frame set containing the specified object as the same object based on the similarity between the image areas occupied by the specified object in the video frames, the implantation position determining method further includes:
Step one: based on a scene segmentation algorithm, determining a scene to which each video frame in the video image belongs.
Based on the similarity between the image areas occupied by the specified objects in the video frames, determining the video frames in the video frame set containing the specified objects as the same object as each second video frame corresponding to the specified objects, wherein the method comprises the following steps:
Step two: for each scene, based on the similarity between the image areas occupied by the specified objects in the video frames, determining the video frame set containing the specified objects as the same object from all video frame sets belonging to the scene, and taking the video frame set as the video frame set corresponding to the specified object.
Step three: and determining the video frames in the video frame set corresponding to the specified object as second video frames corresponding to the specified object.
In the embodiment of the application, each video frame in the video image can be divided according to the affiliated scene based on a scene segmentation algorithm. The scene segmentation algorithm may be a pixel comparison method, a histogram comparison method, an edge comparison method, or a block matching method, or may also perform scene segmentation based on a convolutional neural network trained in advance.
Since a specified object often appears in only one scene of the video image, the video frame sets can be associated scene by scene: for each scene, the video frame sets whose specified objects are the same object are determined from the video frame sets belonging to that scene and taken as the video frame sets corresponding to that specified object. Since the video frame sets contained in the whole video image need not all be compared with one another, the efficiency of implantation position determination can be further improved.
In one embodiment, shot segmentation may be performed before scene segmentation, and each video frame in the video image may be divided according to the shot to which it belongs based on a shot segmentation algorithm. The shot segmentation algorithm may be a pixel comparison method, a histogram comparison method, an edge comparison method, or a block matching method. Since a specified object often appears in only one shot of a video image, the video frame sets can be associated shot by shot: for each shot, the video frame sets whose specified objects are the same object are determined from the video frame sets belonging to that shot and taken as the video frame sets corresponding to that specified object. Since the video frame sets contained in the whole video image need not all be compared with one another, the efficiency of implantation position determination can be further improved.
After shot segmentation, the start frame number (StartFrame), end frame number (EndFrame) and shot number (ShotId) of each shot can be obtained. After scene segmentation, the shot numbers contained in each scene are obtained.
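An illustrative histogram-comparison sketch of shot segmentation that yields (StartFrame, EndFrame, ShotId) triples; the HSV histogram parameters and the 0.6 correlation threshold are assumptions:

```python
import cv2

def segment_shots(path, threshold=0.6):
    """Split a video into shots by comparing HSV histograms of consecutive frames."""
    cap = cv2.VideoCapture(path)
    shots, start, prev_hist, idx = [], 0, None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
        hist = cv2.calcHist([hsv], [0, 1], None, [50, 60], [0, 180, 0, 256])
        cv2.normalize(hist, hist)
        if prev_hist is not None and \
           cv2.compareHist(prev_hist, hist, cv2.HISTCMP_CORREL) < threshold:
            shots.append((start, idx - 1, len(shots)))   # shot boundary before this frame
            start = idx
        prev_hist = hist
        idx += 1
    shots.append((start, idx - 1, len(shots)))           # close the final shot
    return shots                                          # (StartFrame, EndFrame, ShotId)
```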
In one embodiment, referring to fig. 9, fig. 9 is a flowchart of clustering a video frame set according to an embodiment of the present application.
Step S901: and randomly extracting a preset number of video frames corresponding to each target ID in the same scene. Wherein the preset number may be 5 or 6.
Step S902: and cutting the video frame according to the detection frame to extract a target part. The detection frame is the target frame in the above embodiment.
Step S903: and calculating the similarity between the pictures by using the trained deep learning network. That is, the similarity between the pictures corresponding to the extracted target portions is calculated using the deep learning network.
Step S904: cluster the target IDs according to the similarity. For example, the target IDs may be clustered, in combination with the similarity, using a K-means clustering algorithm or an AGNES clustering algorithm. After the target IDs are clustered, a clustering result can be obtained, as shown in Table (2). The information recorded in Table (2) is as follows:
Target ID set, the data type is List (list type), the target IDs of the video frame sets containing the same specified object.
Implantation location category, the data type is Str (string type).
Duration, the data type is List (list type), the accumulated time from the beginning to the end of the implantation position.
Associated pits, the data type is List (list type); a pit number may be set for each set of target IDs of video frame sets containing the same specified object, to represent one pit, and pits that contain the same specified object and belong to the same scene are marked with the same pit number.
Table (2)
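A sketch of steps S901 to S904, in which embed() is a hypothetical stand-in for the trained deep learning network and scikit-learn's AgglomerativeClustering serves as an AGNES-style clusterer (the metric keyword assumes scikit-learn 1.2+; older versions call it affinity):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_target_ids(crops_by_id, embed, distance_threshold=0.4):
    """Group target IDs whose cropped target regions depict the same specified object.

    crops_by_id: dict mapping a target ID to cropped target-region images (step S902);
    embed: hypothetical network mapping an image to an L2-normalized feature vector.
    """
    ids = list(crops_by_id)
    # Average the per-crop features of each target ID into one descriptor (step S903).
    feats = np.stack([np.mean([embed(c) for c in crops_by_id[i]], axis=0) for i in ids])
    labels = AgglomerativeClustering(
        n_clusters=None, distance_threshold=distance_threshold,
        metric="cosine", linkage="average").fit_predict(feats)
    clusters = {}
    for tid, lab in zip(ids, labels):
        clusters.setdefault(lab, []).append(tid)
    return list(clusters.values())   # each inner list is one pit's target ID set
```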
Referring to fig. 10, fig. 10 is a schematic diagram of a video frame set clustering result according to an embodiment of the present application. As shown in fig. 10, the shot with shot number k contains four classes of objects: dining tables, desks, tea tables and bedside tables. The dining table class includes dining table 1 and dining table 2: the video frames in the video frame sets with target IDs 0, 5 and 6 contain dining table 1, and the video frames in the video frame sets with target IDs 10 and 11 contain dining table 2. The desk class includes desk 1 and desk 2: the video frames in the video frame sets with target IDs 1 and 3 contain desk 1, and the video frames in the video frame sets with target IDs 5, 6 and 7 contain desk 2. The tea table class includes tea table 1: the video frames in the video frame sets with target IDs 7 and 8 contain tea table 1. The bedside table class includes bedside table 1 and bedside table 2: the video frames in the video frame set with target ID 2 contain bedside table 1, and the video frames in the video frame sets with target IDs 12, 13 and 15 contain bedside table 2.
Referring to fig. 11, fig. 11 is a schematic diagram of another video frame set clustering result according to an embodiment of the present application. As shown in fig. 11, the scene with scene number k contains four object classes: dining table, desk, tea table and bedside table. The dining table class includes dining table 1, dining table 2 and dining table 3. The video frames in the video frame sets with target IDs 0, 5 and 6 contain dining table 1, the video frames in the video frame sets with target IDs 10 and 11 contain dining table 2, and the video frames in the video frame sets with target IDs 13 and 14 contain dining table 3. Dining table 1 and dining table 2 belong to shot A, dining table 3 belongs to shot B, and dining table 1 and dining table 3 are the same target.
The desk class includes desk 1, desk 2, desk 3, desk 4 and desk 5: the video frames in the video frame sets with target IDs 1 and 3 contain desk 1, the video frames in the video frame sets with target IDs 5, 6 and 7 contain desk 2, the video frames in the video frame sets with target IDs 16 and 17 contain desk 3, the video frames in the video frame set with target ID 19 contain desk 4, and the video frames in the video frame sets with target IDs 21 and 22 contain desk 5. Desk 1 and desk 2 belong to shot A; desk 3, desk 4 and desk 5 belong to shot B; desk 2 and desk 4 are the same object.
The tea table class includes tea table 1 and tea table 2: the video frames in the video frame sets with target IDs 7 and 8 contain tea table 1, and the video frames in the video frame sets with target IDs 18 and 20 contain tea table 2. Tea table 1 belongs to shot A, and tea table 2 belongs to shot B.
The bedside table class includes bedside table 1 and bedside table 2: the video frames in the video frame set with target ID 2 contain bedside table 1, and the video frames in the video frame sets with target IDs 12 and 15 contain bedside table 2. Bedside table 1 and bedside table 2 both belong to shot B.
In one embodiment, referring to fig. 12, fig. 12 is a sixth flowchart of a method for determining an implantation position according to an embodiment of the present application.
Step S1201: the video is input, i.e., a video image that requires implantation of a three-dimensional image resource is acquired.
Step S1202: target detection and tracking are performed, i.e., the video frame sets containing the same specified object are determined based on a target detection algorithm and a target tracking algorithm, namely steps S601 to S609 in the above embodiment.
Step S1203: shot and scene segmentation, namely, determining the shot to which each video frame belongs and the scene to which each shot belongs in a video image based on a shot segmentation algorithm and a scene segmentation algorithm.
It is to be understood that step S1202 and step S1203 may be performed independently of each other: the two may be performed in parallel or sequentially. Pit aggregation is then performed based on the detection and tracking result obtained in step S1202, together with the shot to which each video frame belongs and the scene to which each shot belongs obtained in step S1203.
Step S1204: pit aggregation, i.e., for each shot, based on the similarity between the image areas occupied by the specified objects in the video frames, the video frame sets containing the same specified object are determined from all video frame sets belonging to the shot and taken as the video frame set corresponding to the specified object. The video frames in the video frame set corresponding to the specified object are determined as the second video frames corresponding to the specified object.
Step S1205: pit scoring, namely determining alternative objects based on the specified objects contained in the video image, and, for each alternative object contained in the video image, calculating the evaluation value of the alternative object for a preset evaluation item based on each corresponding second video frame, to obtain the object evaluation value of the alternative object.
Step S1206: pit information is output, i.e., a target object is determined based on the object evaluation values of the candidate objects, and the target object together with its object evaluation value is provided to the user. For example, the candidate objects may be sorted in descending order of their object evaluation values, and the candidate object with the highest object evaluation value may be determined as the target object. Alternatively, several candidate objects with the highest object evaluation values may be selected as target objects and subsequently provided to the user together with their respective object evaluation values, so that the user can further screen them.
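The overall flow of steps S1201 to S1206 can be sketched as follows; each stage function is a placeholder for the corresponding embodiment above and is passed in rather than defined here:

```python
from concurrent.futures import ThreadPoolExecutor

def determine_implant_positions(video_path, detect_and_track, segment_shots_and_scenes,
                                aggregate_pits, score_pit, top_k=3):
    """Pipeline sketch for steps S1201-S1206; the stage callables stand in
    for the embodiments described above and are illustrative assumptions."""
    # S1202 and S1203 are independent, so they may run in parallel.
    with ThreadPoolExecutor() as pool:
        track_job = pool.submit(detect_and_track, video_path)        # S1202
        seg_job = pool.submit(segment_shots_and_scenes, video_path)  # S1203
        frame_sets, structure = track_job.result(), seg_job.result()
    pits = aggregate_pits(frame_sets, structure)       # S1204: merge sets per shot/scene
    scored = [(pit, score_pit(pit)) for pit in pits]   # S1205: object evaluation values
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]                              # S1206: highest-scoring pits for the user
```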
Corresponding to the above method embodiment, the embodiment of the present application further provides an implantation position determining apparatus, referring to fig. 13, fig. 13 is a schematic structural diagram of an implantation position determining apparatus provided by the embodiment of the present application, where the apparatus includes:
the video image acquisition module 1301 is configured to acquire a video image that needs to be implanted with a three-dimensional image resource.
The first determining module 1302 is configured to determine, based on the object detection algorithm, a video frame including a specified object in the video image as a first video frame. When the three-dimensional image resource is implanted at the position of the designated object in the first video frame, the intersection ratio of the image area occupied by the three-dimensional image resource and the image area occupied by the designated object is smaller than a first preset threshold value.
An implantation position determining module 1303 is configured to determine an implantation position of implanting a three-dimensional image resource in the video image based on a position of the specified object in the first video frame.
Based on the above processing, the specified object in the video image can be determined. Since the intersection ratio of the image area occupied by the three-dimensional image resource and the image area occupied by the specified object is smaller than the first preset threshold when the three-dimensional image resource is implanted at the position of the specified object in the first video frame, the three-dimensional image resource can be implanted at that position, and accordingly the implantation position of the three-dimensional image resource in the video image can be determined from the position of the specified object in the first video frame. The implantation position is thus detected automatically by the electronic device; compared with manually browsing the video image to determine the implantation position, this reduces labor cost and improves the efficiency of implantation position determination.
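Reading the intersection ratio as the intersection-over-union of the two image areas (one plausible interpretation of the text), the screening condition can be sketched as follows; the default threshold is an assumption:

```python
def intersection_ratio(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
             + (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]) - inter)
    return inter / union if union else 0.0

def can_implant(resource_box, object_box, first_threshold=0.3):
    # The position qualifies when the overlap stays below the first preset threshold.
    return intersection_ratio(resource_box, object_box) < first_threshold
```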
In one embodiment, the first determination module 1302 includes:
The first determining submodule is used for determining video frames of which the designated objects contained in the video image are the same object based on a target detection algorithm, and the video frames are used as second video frames corresponding to the designated objects;
An implant location determination module 1303, comprising:
an implantation position determination sub-module for determining an implantation position of the three-dimensional image resource to be implanted in the video image based on the position of each specified object in the corresponding second video frame.
In one embodiment, the first determining sub-module comprises:
the first determining unit is used for determining the current video frames to be processed in the video images according to the sequence among the video frames in the video images;
The specified object detection unit is used for detecting whether the current video frame to be processed contains a specified object or not based on a target detection algorithm;
The first judging unit is used for judging whether the appointed object is determined in the previous video frame of the current video frame to be processed if the appointed object is contained in the current video frame to be processed based on the target detection algorithm;
The second judging unit is used for judging whether the appointed object detected in the current video frame to be processed and the appointed object determined in the previous video frame are the same object if the appointed object is determined in the previous video frame of the current video frame to be processed;
the second determining unit is used for determining that the current video frame to be processed and the previous video frame belong to a video frame set containing the same designated object if the designated object detected in the current video frame to be processed and the designated object determined in the previous video frame are the same object; and returning to execute the step of determining the current video frame to be processed in the video image according to the sequence among the video frames in the video image;
And the third determining unit is used for determining the video frames in the video frame set containing the specified object as the same object as each second video frame corresponding to the specified object based on the similarity between the image areas occupied by the specified object in the video frames.
In one embodiment, the apparatus further comprises:
The scene determining module is used for determining, based on a scene segmentation algorithm, the scene to which each video frame in the video image belongs, before the video frames in the video frame sets containing the same specified object are determined, based on the similarity between the image areas occupied by the specified object in the video frames, as the second video frames corresponding to the specified object;
The third determining unit is specifically configured to determine, for each scene, a video frame set that includes a specified object that is the same object, as a video frame set corresponding to the specified object, from among video frame sets belonging to the scene, based on a similarity between image areas occupied by the specified object in the video frames; and determining the video frames in the video frame set corresponding to the specified object as second video frames corresponding to the specified object.
In one embodiment, the apparatus further comprises:
The second determining module is configured to determine, if the specified object detected in the current video frame to be processed is not the same object as the specified object determined in the previous video frame, or if the specified object is not determined in the previous video frame of the current video frame to be processed, or if the specified object is not included in the current video frame to be processed based on the target detection algorithm, a video frame including the specified object determined last time is determined as a current third video frame;
the specified object tracking module is used for determining whether the specified object determined in the current third video frame is tracked in the current video frame to be processed or not based on a target tracking algorithm;
The video frame set determining module is used for determining that the current video frame to be processed and the current third video frame belong to a video frame set containing the same designated object if the designated object determined in the current third video frame is tracked in the current video frame to be processed based on a target tracking algorithm; and triggering the first determination unit.
In one embodiment, the apparatus further comprises:
The first judging module is used for judging, in the case that it is determined based on the target detection algorithm that the current video frame to be processed does not contain the specified object, whether a preset number of consecutive fourth video frames exist before the current video frame to be processed, before the last determined video frame containing the specified object is taken as the current third video frame; wherein a fourth video frame represents a video frame determined not to contain the specified object;
the second determining module is specifically configured to determine the last determined video frame containing the specified object as the current third video frame in the case that a preset number of consecutive fourth video frames do not exist before the current video frame to be processed; and, in the case that a preset number of consecutive fourth video frames do exist before the current video frame to be processed, to determine that the current video frame to be processed does not contain the specified object and trigger the first determining unit.
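A minimal sketch of this detection/tracking fallback follows; the state fields, the callables and the miss limit are illustrative assumptions rather than the embodiment's exact interfaces:

```python
from dataclasses import dataclass, field

@dataclass
class TrackState:
    third_frame: object = None    # last determined video frame containing the object
    misses: int = 0               # consecutive fourth video frames seen so far
    current_set: list = field(default_factory=list)

def process_frame(frame, state, detect, same_object, track, preset_misses=10):
    """Per-frame fallback logic: detect first; if detection fails or finds a
    different object, try tracking from the current third video frame, unless
    a preset number of consecutive fourth video frames has already passed."""
    det = detect(frame)
    if det is not None and same_object(det):
        state.current_set.append((frame, det))  # same specified object continues the set
        state.third_frame, state.misses = frame, 0
        return
    if state.misses < preset_misses and state.third_frame is not None:
        tracked = track(state.third_frame, frame)  # target tracking from the third video frame
        if tracked is not None:
            state.current_set.append((frame, tracked))
            state.third_frame, state.misses = frame, 0
            return
    state.misses += 1  # the frame counts as a fourth video frame (no specified object)
```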
In one embodiment, the apparatus further comprises:
The object evaluation value calculation module is used for calculating the evaluation value of each candidate object aiming at a preset evaluation item based on each second video frame corresponding to each candidate object contained in the video image before determining the implantation position of the three-dimensional image resource in the video image based on the position of each specified object in the corresponding second video frame, so as to obtain the object evaluation value of the candidate object; wherein the candidate object is determined based on a specified object contained in the video image;
The preset evaluation items comprise at least one of the following: an integrity evaluation item, a sharpness evaluation item and a stability evaluation item;
The integrity evaluation item is used for characterizing: the average value of the integrity of the candidate object in each corresponding second video frame;
The sharpness evaluation item is used for characterizing: the proportion of the foreground in the first image area corresponding to the candidate object; the first image area corresponding to the candidate object represents: the image area occupied by the candidate object in each corresponding second video frame;
The stability evaluation item comprises at least one of the following: a region stability evaluation item and an area stability evaluation item;
The region stability evaluation item is used for characterizing: the change amplitude of the first image area corresponding to the candidate object;
The area stability evaluation item is used for characterizing: the proportion, among the first image areas corresponding to the candidate object, of image areas whose area is smaller than a second preset threshold value;
The implantation position determining submodule is specifically configured to determine a target object from each candidate object based on an object evaluation value of each candidate object; and determining an implantation position for implanting the three-dimensional image resource in the video image based on the position of the target object in the corresponding second video frame.
In one embodiment, the evaluation value of an alternative object for the integrity evaluation item is calculated by:
For each second video frame corresponding to the candidate object, acquiring the detection value corresponding to the second video frame as the integrity of the candidate object in that second video frame; wherein the detection value corresponding to the second video frame represents the probability that the second video frame is determined to contain the candidate object;
for each video frame set corresponding to the candidate object, calculating the average value of the integrity of the candidate object in the second video frames contained in the video frame set, as the average integrity of the candidate object in the video frame set;
and determining the average value of the average integrity of the candidate object over the video frame sets corresponding to the candidate object as the evaluation value of the candidate object for the integrity evaluation item.
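Under the assumption that the detection values have already been collected per video frame set, the integrity evaluation value reduces to a nested average, for example:

```python
import numpy as np

def integrity_score(frame_sets):
    """Integrity evaluation item: `frame_sets` maps each video frame set of the
    candidate object to the detection values of its second video frames (the
    probabilities that those frames contain the object); the input layout is
    an assumption."""
    set_means = [np.mean(confidences) for confidences in frame_sets.values()]  # average integrity per set
    return float(np.mean(set_means))  # averaged again over the candidate's video frame sets
```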
In one embodiment, the evaluation value of an alternative object for the sharpness evaluation item is calculated by:
for each second video frame corresponding to the candidate object, calculating the proportion of the foreground in the first image area corresponding to the candidate object in the second video frame, as the sharpness of the candidate object in that second video frame;
for each video frame set corresponding to the candidate object, calculating the average value of the sharpness of the candidate object in the second video frames contained in the video frame set, as the average sharpness of the candidate object in the video frame set;
and determining the maximum value among the average sharpness of the candidate object over the video frame sets corresponding to the candidate object as the evaluation value of the candidate object for the sharpness evaluation item.
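A corresponding sketch for the sharpness evaluation item, assuming a binary foreground mask is available for each second video frame:

```python
import numpy as np

def sharpness_of_frame(foreground_mask, box):
    """Proportion of foreground pixels inside the first image area
    (x1, y1, x2, y2); the mask is assumed binary with 1 = foreground."""
    x1, y1, x2, y2 = box
    return float(foreground_mask[y1:y2, x1:x2].mean())

def sharpness_score(frame_sets):
    """`frame_sets` maps each video frame set to the per-frame sharpness values
    above; the evaluation value is the maximum of the per-set averages."""
    return float(max(np.mean(values) for values in frame_sets.values()))
```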
In one embodiment, the region stability evaluation item includes at least one of the following: a first region stability evaluation sub-item and a second region stability evaluation sub-item;
The evaluation value corresponding to the first region stability evaluation sub-item of one candidate object is calculated by the following method:
For each second video frame corresponding to the candidate object, acquiring the coordinates of the center point of the first image area corresponding to the candidate object in the second video frame as the coordinates to be processed corresponding to the second video frame; calculating the area of a first image area corresponding to the candidate object in the second video frame as a first area to be processed corresponding to the second video frame;
for each video frame set corresponding to the candidate object, calculating the standard deviation of the coordinates to be processed corresponding to each second video frame contained in the video frame set, and taking the standard deviation as the first standard deviation corresponding to the video frame set;
Calculating an average value of a first area to be processed corresponding to each second video frame contained in the video frame set, and taking the average value as an average area corresponding to the video frame set;
Calculating the ratio of the first standard deviation corresponding to the video frame set to the average area corresponding to the video frame set as the center point stability corresponding to the video frame set;
Determining an average value of the stability of the central point corresponding to each video frame set corresponding to the candidate object as an evaluation value of the candidate object for the first region stability evaluation sub-item;
the evaluation value corresponding to the second region stability evaluation sub-item of one candidate object is calculated by the following method:
For each second video frame corresponding to the candidate object, calculating the area of a first image area corresponding to the candidate object in the second video frame as a first area to be processed corresponding to the second video frame;
For each video frame set corresponding to the candidate object, calculating the standard deviation of a first area to be processed corresponding to each second video frame contained in the video frame set, and taking the standard deviation as a second standard deviation corresponding to the video frame set;
and determining an average value of the second standard deviation corresponding to each video frame set corresponding to the candidate object as an evaluation value of the candidate object for the second region stability evaluation sub-item.
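The two sub-items can be sketched as follows, assuming the center coordinates and areas of the first image areas have been collected per video frame set. Note the text leaves open how the standard deviation of a two-dimensional coordinate is reduced to a scalar; the per-axis average used here is only one reading:

```python
import numpy as np

def region_stability_scores(frame_sets):
    """Sketch of the two region stability sub-items. `frame_sets` maps each
    video frame set to per-frame (center_x, center_y, area) triples of the
    first image area; the input layout is an assumption. Lower values mean a
    steadier implantation position."""
    center_terms, area_terms = [], []
    for triples in frame_sets.values():
        centers = np.array([(cx, cy) for cx, cy, _ in triples], dtype=float)
        areas = np.array([a for _, _, a in triples], dtype=float)
        # First sub-item: std of the center coordinates (averaged over x and y)
        # divided by the average area of the set, i.e., center point stability.
        center_terms.append(centers.std(axis=0).mean() / areas.mean())
        # Second sub-item: std of the first areas within the set.
        area_terms.append(areas.std())
    return float(np.mean(center_terms)), float(np.mean(area_terms))
```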
In one embodiment, the apparatus further comprises:
The appearance duration obtaining module is used for obtaining the appearance duration of each appointed object contained in the video image based on each second video frame corresponding to the appointed object before the evaluation value of the appointed object for the preset evaluation item is obtained according to each second video frame corresponding to the appointed object;
The second judging module is used for judging whether the appearance time length of the appointed object is longer than a preset time length;
And the alternative object determining module is used for determining the designated object as an alternative object if the appearance duration of the designated object is longer than the preset duration.
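A sketch of this screening step, assuming the appearance duration is approximated from the count of second video frames and the video frame rate; the preset duration is an assumed value:

```python
def select_candidates(second_frames_by_object, fps, preset_duration=2.0):
    """Screen specified objects into alternative objects by appearance duration.
    `second_frames_by_object` maps each specified object to the frame indices
    of its second video frames; the layout is an assumption."""
    candidates = []
    for obj, frame_indices in second_frames_by_object.items():
        duration = len(frame_indices) / fps  # accumulated appearance time in seconds
        if duration > preset_duration:
            candidates.append(obj)
    return candidates
```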
The embodiment of the present application also provides an electronic device, as shown in fig. 14, including a processor 1401, a communication interface 1402, a memory 1403, and a communication bus 1404, where the processor 1401, the communication interface 1402, and the memory 1403 perform communication with each other through the communication bus 1404,
A memory 1403 for storing a computer program;
The processor 1401 is configured to execute the program stored in the memory 1403, and implement the following steps:
acquiring a video image to be implanted with a three-dimensional image resource;
Determining a video frame containing a specified object in the video image as a first video frame based on a target detection algorithm; when the three-dimensional image resource is implanted at the position of the appointed object in the first video frame, the intersection ratio of the image area occupied by the three-dimensional image resource and the image area occupied by the appointed object is smaller than a first preset threshold value;
An implantation location for implanting the three-dimensional image asset in the video image is determined based on the location of the specified object in the first video frame.
The communication bus mentioned for the above terminal may be a Peripheral Component Interconnect (PCI) bus or an Extended Industry Standard Architecture (EISA) bus, etc. The communication bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one bold line is used in the figures, but this does not mean that there is only one bus or only one type of bus.
The communication interface is used for communication between the terminal and other devices.
The memory may include random access memory (Random Access Memory, RAM) or may include non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), etc.; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
In yet another embodiment of the present application, a computer readable storage medium is provided, in which a computer program is stored, which when executed by a processor, implements the implantation position determination method according to any of the above embodiments.
In yet another embodiment of the present application, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform the implantation position determination method of any of the above embodiments.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
In this specification, each embodiment is described in a related manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the apparatus, electronic device, and storage medium embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and references to the parts of the description of the method embodiments are only needed.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application are included in the protection scope of the present application.
Claims (12)
1. A method of implant location determination, the method comprising:
acquiring a video image to be implanted with a three-dimensional image resource;
Determining a video frame containing a specified object in the video image as a first video frame based on a target detection algorithm; when the three-dimensional image resource is implanted at the position of the appointed object in the first video frame, the intersection ratio of the image area occupied by the three-dimensional image resource and the image area occupied by the appointed object is smaller than a first preset threshold value;
Determining an implantation location for implanting the three-dimensional image resource in the video image based on the location of the specified object in the first video frame;
The determining, based on the target detection algorithm, a video frame containing a specified object in the video image as a first video frame includes:
Determining video frames of which the designated objects contained in the video image are the same object based on a target detection algorithm, and taking the video frames as second video frames corresponding to the designated objects;
the determining an implantation location of the three-dimensional image resource in the video image based on the location of the specified object in the first video frame includes:
Determining an implantation position for implanting the three-dimensional image resource in the video image based on the position of each specified object in the corresponding second video frame;
before determining the implantation location of the three-dimensional image resource in the video image based on the location of each specified object in the corresponding second video frame, the method further comprises:
calculating an evaluation value of each candidate object contained in the video image aiming at a preset evaluation item based on each second video frame corresponding to the candidate object to obtain an object evaluation value of the candidate object; wherein the candidate object is determined based on a specified object contained in the video image;
The preset evaluation items comprise at least one of the following: an integrity evaluation item, a sharpness evaluation item and a stability evaluation item;
the integrity evaluation item is used for characterizing: the average value of the integrity of the candidate object in each corresponding second video frame;
the sharpness evaluation item is used for characterizing: the proportion of the foreground in the first image area corresponding to the candidate object; the first image area corresponding to the candidate object represents: the image area occupied by the candidate object in each corresponding second video frame;
the stability evaluation item comprises at least one of the following: a region stability evaluation item and an area stability evaluation item;
the region stability evaluation item is used for characterizing: the change amplitude of the first image area corresponding to the candidate object;
the area stability evaluation item is used for characterizing: the proportion, among the first image areas corresponding to the candidate object, of image areas whose area is smaller than a second preset threshold value;
the determining an implantation position of implanting the three-dimensional image resource in the video image based on the position of each specified object in the corresponding second video frame includes:
Determining a target object from the candidate objects based on the object evaluation values of the candidate objects;
and determining an implantation position for implanting the three-dimensional image resource in the video image based on the position of the target object in the corresponding second video frame.
2. The method according to claim 1, wherein the determining, based on the object detection algorithm, the video frame of which the specified object included in the video image is the same object as each second video frame corresponding to the specified object includes:
Determining the current video frames to be processed in the video images according to the sequence among the video frames in the video images;
detecting whether a current video frame to be processed contains a specified object or not based on a target detection algorithm;
If the current video frame to be processed contains the specified object based on the target detection algorithm, judging whether the specified object is determined in the previous video frame of the current video frame to be processed;
If the appointed object is determined in the previous video frame of the current video frame to be processed, judging whether the appointed object detected in the current video frame to be processed and the appointed object determined in the previous video frame are the same object or not;
If the specified object detected in the current video frame to be processed and the specified object determined in the previous video frame are the same object, determining that the current video frame to be processed and the previous video frame belong to a video frame set containing the same specified object; and returning to execute the step of determining the current video frame to be processed in the video image according to the sequence among the video frames in the video image;
and determining the video frames in the video frame set containing the specified object as the same object as each second video frame corresponding to the specified object based on the similarity between the image areas occupied by the specified object in the video frames.
3. The method according to claim 2, wherein before determining, based on the similarity between image areas occupied by the specified object in the video frames, the video frames in the video frame set that include the specified object as the same object as each second video frame corresponding to the specified object, the method further comprises:
Determining a scene to which each video frame in the video image belongs based on a scene segmentation algorithm;
Based on the similarity between the image areas occupied by the specified objects in the video frames, determining the video frames in the video frame set containing the specified objects as the same object as each second video frame corresponding to the specified object, wherein the method comprises the following steps:
For each scene, determining a video frame set containing a specified object as the same object from all video frame sets belonging to the scene based on the similarity between image areas occupied by the specified object in the video frames, and taking the video frame set as the video frame set corresponding to the specified object;
And determining the video frames in the video frame set corresponding to the specified object as second video frames corresponding to the specified object.
4. The method according to claim 2, wherein the method further comprises:
If the appointed object detected in the current video frame to be processed is not the same object as the appointed object determined in the previous video frame, or if the appointed object is not determined in the previous video frame of the current video frame to be processed, or if the appointed object is not included in the current video frame to be processed based on the target detection algorithm, determining the video frame which is determined last time and contains the appointed object as a current third video frame;
determining whether a specified object determined in a current third video frame is tracked in the current video frame to be processed or not based on a target tracking algorithm;
If the specified object determined in the current third video frame is tracked in the current video frame to be processed based on the target tracking algorithm, determining that the current video frame to be processed and the current third video frame belong to a video frame set containing the same specified object; and returning to the step of executing the determination of the current video frame to be processed in the video image according to the sequence among the video frames in the video image.
5. The method according to claim 4, wherein in case it is determined that the specified object is not included in the current video frame to be processed based on the object detection algorithm, before the determining that the video frame including the specified object determined last time is the current third video frame, the method further comprises:
judging whether a preset number of continuous fourth video frames exist before the current video frame to be processed; wherein the fourth video frame represents the determined video frame which does not contain the specified object;
the determining, as the current third video frame, the video frame containing the specified object determined last time includes:
Under the condition that a preset number of continuous fourth video frames do not exist before the current video frame to be processed, determining the video frame which contains the specified object and is determined last time as the current third video frame;
And under the condition that a preset number of continuous fourth video frames exist before the current video frame to be processed, determining that the current video frame to be processed does not contain a specified object, and returning to the step of executing the step of determining the current video frame to be processed in the video image according to the sequence among the video frames in the video image.
6. The method according to claim 1, wherein the evaluation value of an alternative object for the integrity evaluation item is calculated by:
For each second video frame corresponding to the candidate object, acquiring the detection value corresponding to the second video frame as the integrity of the candidate object in that second video frame; wherein the detection value corresponding to the second video frame represents the probability that the second video frame is determined to contain the candidate object;
for each video frame set corresponding to the candidate object, calculating the average value of the integrity of the candidate object in the second video frames contained in the video frame set, as the average integrity of the candidate object in the video frame set;
and determining the average value of the average integrity of the candidate object over the video frame sets corresponding to the candidate object as the evaluation value of the candidate object for the integrity evaluation item.
7. The method according to claim 1, wherein the evaluation value of an alternative object for the sharpness evaluation item is calculated by:
for each second video frame corresponding to the candidate object, calculating the proportion of the foreground in the first image area corresponding to the candidate object in the second video frame, as the sharpness of the candidate object in that second video frame;
for each video frame set corresponding to the candidate object, calculating the average value of the sharpness of the candidate object in the second video frames contained in the video frame set, as the average sharpness of the candidate object in the video frame set;
and determining the maximum value among the average sharpness of the candidate object over the video frame sets corresponding to the candidate object as the evaluation value of the candidate object for the sharpness evaluation item.
8. The method of claim 1, wherein the region stability evaluation item comprises at least one of: a first region stability evaluation sub-item and a second region stability evaluation sub-item;
The evaluation value corresponding to the first region stability evaluation sub-item of one candidate object is calculated by the following method:
For each second video frame corresponding to the candidate object, acquiring the coordinates of the center point of the first image area corresponding to the candidate object in the second video frame as the coordinates to be processed corresponding to the second video frame; calculating the area of a first image area corresponding to the candidate object in the second video frame as a first area to be processed corresponding to the second video frame;
for each video frame set corresponding to the candidate object, calculating the standard deviation of the coordinates to be processed corresponding to each second video frame contained in the video frame set, and taking the standard deviation as the first standard deviation corresponding to the video frame set;
Calculating an average value of a first area to be processed corresponding to each second video frame contained in the video frame set, and taking the average value as an average area corresponding to the video frame set;
Calculating the ratio of the first standard deviation corresponding to the video frame set to the average area corresponding to the video frame set as the center point stability corresponding to the video frame set;
Determining an average value of the stability of the central point corresponding to each video frame set corresponding to the candidate object as an evaluation value of the candidate object for the first region stability evaluation sub-item;
the evaluation value corresponding to the second region stability evaluation sub-item of one candidate object is calculated by the following method:
For each second video frame corresponding to the candidate object, calculating the area of a first image area corresponding to the candidate object in the second video frame as a first area to be processed corresponding to the second video frame;
For each video frame set corresponding to the candidate object, calculating the standard deviation of a first area to be processed corresponding to each second video frame contained in the video frame set, and taking the standard deviation as a second standard deviation corresponding to the video frame set;
and determining an average value of the second standard deviation corresponding to each video frame set corresponding to the candidate object as an evaluation value of the candidate object for the second region stability evaluation sub-item.
9. The method according to claim 1, wherein before the calculating, for each candidate object included in the video image, an evaluation value of the candidate object for a preset evaluation item based on each second video frame corresponding to the candidate object, and obtaining an object evaluation value of the candidate object, the method further includes:
aiming at each appointed object contained in the video image, obtaining the appearance time of the appointed object in the video image based on each second video frame corresponding to the appointed object;
judging whether the appearance duration of the appointed object is longer than a preset duration;
and if the appearance duration of the specified object is longer than the preset duration, determining the specified object as an alternative object.
10. An implant location determination apparatus, the apparatus comprising:
the video image acquisition module is used for acquiring a video image needing to be implanted with a three-dimensional image resource;
The first determining module is used for determining a video frame containing a specified object in the video image based on a target detection algorithm as a first video frame; when the three-dimensional image resource is implanted at the position of the appointed object in the first video frame, the intersection ratio of the image area occupied by the three-dimensional image resource and the image area occupied by the appointed object is smaller than a first preset threshold value;
an implantation position determining module for determining an implantation position of the three-dimensional image resource to be implanted in the video image based on a position of the specified object in the first video frame;
the first determining module includes:
The first determining submodule is used for determining video frames of which the designated objects contained in the video image are the same object based on a target detection algorithm, and the video frames are used as second video frames corresponding to the designated objects;
the implant location determination module includes:
an implantation position determining sub-module for determining an implantation position of implanting the three-dimensional image resource in the video image based on a position of each specified object in the corresponding second video frame;
The apparatus further comprises:
The object evaluation value calculation module is used for calculating the evaluation value of each candidate object aiming at a preset evaluation item based on each second video frame corresponding to each candidate object contained in the video image before determining the implantation position of the three-dimensional image resource in the video image based on the position of each specified object in the corresponding second video frame, so as to obtain the object evaluation value of the candidate object; wherein the candidate object is determined based on a specified object contained in the video image;
The preset evaluation items comprise at least one of the following: an integrity evaluation item, a sharpness evaluation item and a stability evaluation item;
the integrity evaluation item is used for characterizing: the average value of the integrity of the candidate object in each corresponding second video frame;
the sharpness evaluation item is used for characterizing: the proportion of the foreground in the first image area corresponding to the candidate object; the first image area corresponding to the candidate object represents: the image area occupied by the candidate object in each corresponding second video frame;
the stability evaluation item comprises at least one of the following: a region stability evaluation item and an area stability evaluation item;
the region stability evaluation item is used for characterizing: the change amplitude of the first image area corresponding to the candidate object;
the area stability evaluation item is used for characterizing: the proportion, among the first image areas corresponding to the candidate object, of image areas whose area is smaller than a second preset threshold value;
The implantation position determining submodule is specifically configured to determine a target object from each candidate object based on an object evaluation value of each candidate object; and determining an implantation position for implanting the three-dimensional image resource in the video image based on the position of the target object in the corresponding second video frame.
11. The electronic equipment is characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;
A memory for storing a computer program;
a processor for carrying out the method steps of any one of claims 1-9 when executing a program stored on a memory.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-9.