
CN113793382A - Video image splicing seam searching method and video image splicing method and device - Google Patents


Info

Publication number: CN113793382A
Application number: CN202110893253.6A
Authority: CN (China)
Prior art keywords: video, video image, image, frame, fisheye
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN113793382B (en)
Inventors: 刘伟舟, 胡晨, 周舒畅
Current Assignee: Force Map New Chongqing Technology Co ltd
Original Assignees: Beijing Kuangshi Technology Co Ltd; Beijing Megvii Technology Co Ltd
Application filed by Beijing Kuangshi Technology Co Ltd and Beijing Megvii Technology Co Ltd
Priority: CN202110893253.6A; PCT/CN2022/098992 (WO2023011013A1)
Publications: CN113793382A; CN113793382B (application granted)

Classifications

    • G06T7/73 - Image analysis; determining position or orientation of objects or cameras using feature-based methods
    • G06T3/047 - Geometric image transformations in the plane of the image; context-preserving transformations; fisheye or wide-angle transformations
    • G06T3/4038 - Scaling of whole images or parts thereof; image mosaicing, e.g. composing plane images from plane sub-images
    • G06T7/246 - Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T2200/32 - Indexing scheme for image data processing or generation, involving image mosaicing
    • G06T2207/10016 - Indexing scheme for image analysis or image enhancement; image acquisition modality: video; image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Studio Devices (AREA)

Abstract

The present invention provides a seam search method for video images, and a video image stitching method and apparatus. An energy map is obtained for each frame of video image in a first video. For the first frame of video image, its seam search result is determined based on its energy map. For every other frame, a seam search area range is first determined based on the seam search result of the previous frame of video image, and the seam search result of the current frame is then determined, within that range, based on the current frame's energy map. Because the seam search result is derived from the energy map, and because the search for every frame after the first is constrained to a range around the previous frame's seam, the difference between the seam regions of consecutive frames is reduced, jitter of the stitched video during playback is alleviated, and the stitching quality of the panoramic video is improved.

Figure 202110893253

Description

Video image splicing seam searching method and video image splicing method and device
Technical Field
The invention relates to the technical field of video processing, in particular to a method for searching a splicing seam of a video image, a method and a device for splicing the video image.
Background
Panoramic video splicing means stitching multiple videos whose fields of view overlap. Specifically, the frame images of the videos correspond to one another one by one, and the mutually corresponding frame images are spliced to obtain a video with a 360-degree panoramic field of view. During panoramic video splicing, a splicing seam generally needs to be searched for each frame of image across the multiple videos, and the frames are then spliced based on the seams that were found. The image splicing algorithms in the related art are mainly designed for splicing single images; when such an algorithm is used to splice multiple videos, the seam regions of consecutive frames tend to differ considerably, so the spliced video jitters during playback, which degrades the splicing effect of the panoramic video.
Disclosure of Invention
The invention aims to provide a video image splicing seam searching method, a video image splicing method and a video image splicing device, so as to alleviate jitter of the spliced video during playback and improve the splicing effect of the panoramic video.
The invention provides a method for searching for a spliced seam of a video image, which comprises the following steps: acquiring an energy map of each frame of video image in a first video; wherein the energy map is used for indicating the position area and the edge of the specified object in the video image; aiming at a first frame of video image in a first video, determining a seam search result of the first frame of video image based on an energy map of the first frame of video image; wherein, the seam search result comprises: a splicing region of the video image and the target image; the target image is a video image corresponding to the video image in the second video; aiming at each frame of video image except the first frame in the first video, determining a seam search area range of the current video image based on a seam search result of a video image of the previous frame of the current video image; and determining a seam search result of the current video image based on the energy graph of the current video image within the seam search area.
Further, the step of obtaining an energy map of each frame of video image in the first video comprises: acquiring a saliency target energy map, a motion target energy map and an edge target energy map of each frame of video image in a first video; and for each frame of video image, fusing a saliency target energy map, a motion target energy map and an edge energy map corresponding to the frame of video image to obtain an energy map of the frame of video image.
Further, the step of obtaining the saliency target energy map, the motion target energy map and the edge energy map of each frame of video image in the first video includes: for each frame of video image in a first video, inputting the video image into a preset neural network model so as to output a saliency target energy map of the frame of video image through the preset neural network model; determining a moving object energy map of the frame of video image based on the moving object in the frame of video image; and carrying out edge detection on each object in the frame video image to obtain an edge energy map of the frame video image.
Further, for a first frame of video image in the first video, the step of determining a seam search result of the first frame of video image based on the energy map of the first frame of video image includes: aiming at a first frame of video image in a first video, calculating a seam search result of the first frame of video image by adopting a dynamic programming algorithm based on an energy map of the first frame of video image.
Further, aiming at each frame of video image except the first frame in the first video, determining a seam search area range of the current video image based on a seam search result of a video image of a previous frame of the current video image; the step of determining the seam search result of the current video image based on the energy map of the current video image within the seam search area comprises the following steps: aiming at each frame of video image except the first frame in the first video, on the basis of a seam searching result of a previous frame of video image of the current video image, increasing a preset constraint condition, and determining a seam searching area range of the current video image; and determining the seam search result of the current video image by adopting a dynamic programming algorithm based on the energy map of the current video image in the seam search area range.
Furthermore, in the first video, each frame of video image has an overlapping area with a target image corresponding to the video image, the area of the overlapping area corresponding to the video image is a first overlapping area, and the area of the overlapping area corresponding to the target image is a second overlapping area; the method further comprises the following steps: inputting an image corresponding to a first overlapping area in each frame of video image and an image corresponding to a second overlapping area in a target image corresponding to the frame of video image into a pre-trained neural network model to obtain a seam prediction result of the frame of video image; wherein, the seam prediction result comprises: and (3) a splicing prediction area of the video image and the corresponding target image.
Further, the pre-trained neural network model is determined by the following method: acquiring a training sample containing a plurality of continuous groups of image pairs to be spliced and a splicing search result of each group of image pairs to be spliced; inputting the group of image pairs to be spliced and the adjacent joint prediction result of the previous group of image pairs to be spliced into an initial neural network model aiming at each group of image pairs to be spliced except the first group of image pairs to be spliced so as to output the joint prediction result of the group of image pairs to be spliced through the initial neural network model; calculating a loss value of a seam prediction result of the group of image pairs to be spliced based on a seam search result of the group of image pairs to be spliced and a preset loss function; updating the weight parameters of the initial neural network model based on the loss values; and continuing to execute the step of obtaining the training samples containing the continuous groups of the image pairs to be spliced until the initial neural network model converges to obtain the neural network model.
Further, after the steps of obtaining a training sample containing a plurality of continuous groups of image pairs to be spliced and obtaining a splicing search result of each group of image pairs to be spliced, the method further comprises: acquiring a preset splicing seam template; the preset splicing seam template comprises a preset splicing seam area; and inputting the first group of image pairs to be spliced and a preset splicing template into the initial neural network model aiming at the first group of image pairs to be spliced so as to output a splicing prediction result of the first group of image pairs to be spliced through the initial neural network model.
The invention provides a video image splicing method, which comprises the following steps: acquiring a first fisheye video and a second fisheye video; wherein the fisheye video images in the first fisheye video and the second fisheye video have an overlapping region; extracting a first target area of each frame of fisheye video image in the first fisheye video and a second target area of each frame of fisheye video image in the second fisheye video; aiming at two frames of fisheye video images which correspond to each other, determining a first equidistant projection picture of the first fisheye video image after the fisheye video image is unfolded and a second equidistant projection picture of the second fisheye video image after the fisheye video image is unfolded based on a first target area and a second target area which correspond to the two frames of fisheye video images and a pre-obtained updating and unfolding parameter value; determining a seam search result based on the first equidistant projection picture and the second equidistant projection picture which correspond to each other; the seam search result is determined by adopting any one of the seam search methods of the video images; and determining a video splicing result of the video images based on the splicing searching result corresponding to each group of two frames of fisheye video images corresponding to each other.
Further, the step of determining the seam search result based on the first equidistant projection picture and the second equidistant projection picture which correspond to each other includes: aligning the first equidistant projection picture and the second equidistant projection picture which correspond to each other; extracting a third overlapping region based on the aligned first and second equidistant projection pictures; performing illumination compensation on the second equidistant projection picture based on the third overlapping area, so that the pixel value of each pixel in the second equidistant projection picture after illumination compensation is matched with the pixel value of each pixel in the corresponding first equidistant projection picture; and determining a seam searching result based on the first equidistant projection picture and the second equidistant projection picture after illumination compensation.
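As an illustration of the illumination compensation step, the following sketch scales the second equidistant projection picture by a gain computed from the third overlapping area; the per-channel-mean gain is an assumed realization (the patent only requires the compensated pixel values to match those of the first picture), and all function and variable names are hypothetical.

```python
import numpy as np

def illumination_compensate(erp_right, overlap_left, overlap_right, eps=1e-6):
    """Scale the second (right) equidistant projection picture so that the mean
    pixel values of its overlap crop match those of the first picture's overlap
    crop. The per-channel gain is an illustrative choice; the patent only
    requires the compensated pixel values to match the first picture."""
    gain = (overlap_left.astype(np.float64).mean(axis=(0, 1)) + eps) / \
           (overlap_right.astype(np.float64).mean(axis=(0, 1)) + eps)
    compensated = erp_right.astype(np.float64) * gain
    return np.clip(compensated, 0, 255).astype(np.uint8)
```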
Further, the updated unfolding parameter values include: a field angle parameter value, an optical center parameter value in the x-axis direction, an optical center parameter value in the y-axis direction and a fisheye rotation angle parameter value; the updated unfolding parameter values are determined in advance by: acquiring an initial unfolding parameter value and a preset offset range of each unfolding parameter; sampling each unfolding parameter based on the initial unfolding parameter value and the preset offset range of each unfolding parameter to obtain a sampling value of each unfolding parameter, and determining, based on the sampling value of each unfolding parameter, a third equidistant projection picture after unfolding of the frame fisheye video image in the first fisheye video and a fourth equidistant projection picture after unfolding of the frame fisheye video image in the second fisheye video; extracting a fourth overlapping area of the third equidistant projection picture and the fourth equidistant projection picture; performing cross-correlation calculation on the fourth overlapping area to obtain a first cross-correlation calculation result; and determining an updated unfolding parameter value based on the first cross-correlation calculation result and a preset iteration number.
Further, the step of determining an updated unfolding parameter value based on the first cross-correlation calculation result and the preset iteration number comprises: according to the preset iteration times, repeatedly executing the steps of sampling each expansion parameter based on the initial expansion parameter value and the preset offset range of each expansion parameter to obtain a plurality of first cross-correlation calculation results; selecting a first cross-correlation calculation result with the largest value from the plurality of first cross-correlation calculation results; and determining the sampling value of the expansion parameter corresponding to the first cross-correlation calculation result with the maximum value as an updated expansion parameter value.
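A minimal sketch of the unfolding-parameter update described in the last two paragraphs, assuming helper functions `unfold` (fisheye-to-equidistant-projection expansion) and `extract_overlap` are provided elsewhere; the uniform random sampling, the parameter dictionary keys, and the iteration count are illustrative assumptions rather than details given by the patent.

```python
import numpy as np

def ncc(a, b, eps=1e-6):
    """Normalized cross-correlation between two equally sized gray crops."""
    a = a.astype(np.float64).ravel(); a -= a.mean()
    b = b.astype(np.float64).ravel(); b -= b.mean()
    return float((a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def search_unfold_params(fish_left, fish_right, unfold, extract_overlap,
                         init, offsets, n_iter=200, seed=0):
    """Randomly sample each unfolding parameter (field angle, optical center
    x/y, fisheye rotation angle) inside its preset offset range, unfold both
    fisheye frames with the sampled values, and keep the sample whose overlap
    region gives the largest cross-correlation result."""
    rng = np.random.default_rng(seed)
    best_params, best_score = dict(init), -np.inf
    for _ in range(n_iter):                       # preset iteration count
        sample = {k: init[k] + rng.uniform(-offsets[k], offsets[k]) for k in init}
        erp_l = unfold(fish_left, sample)         # third equidistant projection picture
        erp_r = unfold(fish_right, sample)        # fourth equidistant projection picture
        ov_l, ov_r = extract_overlap(erp_l, erp_r)
        score = ncc(ov_l, ov_r)
        if score > best_score:
            best_score, best_params = score, sample
    return best_params
```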
Further, the step of aligning the first equidistant projection picture and the second equidistant projection picture which correspond to each other includes: extracting a first characteristic point from the first equidistant projection picture, and extracting a second characteristic point from the second equidistant projection picture; determining a matching characteristic point pair based on the first characteristic point and the second characteristic point; and aligning the first equidistant projection picture and the second equidistant projection picture based on the matched characteristic point pairs.
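For the feature-point based alignment above, a sketch using OpenCV ORB features and a RANSAC-estimated partial affine transform; ORB and the affine model are illustrative choices, since the patent does not name a specific feature detector or transformation model.

```python
import cv2
import numpy as np

def align_by_features(erp_left, erp_right, max_matches=100):
    """Align the second equidistant projection picture to the first one using
    matched feature point pairs."""
    g1 = cv2.cvtColor(erp_left, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(erp_right, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(g1, None)    # first characteristic points
    kp2, des2 = orb.detectAndCompute(g2, None)    # second characteristic points
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des2, des1), key=lambda m: m.distance)[:max_matches]
    src = np.float32([kp2[m.queryIdx].pt for m in matches])   # points in the right picture
    dst = np.float32([kp1[m.trainIdx].pt for m in matches])   # matching points in the left picture
    M, _ = cv2.estimateAffinePartial2D(src, dst, method=cv2.RANSAC)
    h, w = erp_left.shape[:2]
    return cv2.warpAffine(erp_right, M, (w, h))
```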
Further, the step of aligning the first equidistant projection picture and the second equidistant projection picture which correspond to each other includes: moving the second equidistant projection picture according to a preset direction; extracting a plurality of fifth overlapping areas of the first equidistant projection picture and the second equidistant projection picture in the moving process; performing cross-correlation calculation on the fifth overlapping areas respectively to obtain a plurality of second cross-correlation calculation results; aligning the first and second equidistant projection pictures based on the plurality of second cross-correlation calculation results.
Further, aligning the first and second equidistant projection pictures based on the plurality of second cross-correlation calculation results comprises: selecting a second cross-correlation calculation result with the largest value from the plurality of second cross-correlation calculation results; acquiring position coordinates of corresponding first boundary pixel points of a fifth overlapping region corresponding to a second cross-correlation calculation result with the largest numerical value in the first equidistant projection picture and position coordinates of corresponding second boundary pixel points in the second equidistant projection picture; calculating an affine transformation matrix based on the position coordinates of the first boundary pixel points and the position coordinates of the second boundary pixel points; aligning the first and second isometric projection pictures based on the affine transformation matrix.
Further, the step of determining the video stitching result of the video image based on the stitching search result corresponding to each group of two frames of fisheye video images corresponding to each other includes: aiming at each group of two frames of fisheye video images which correspond to each other, determining a fused overlapping area corresponding to the two frames of fisheye video images in the group based on a seam searching result corresponding to the two frames of fisheye video images in the group; replacing the fused overlapping area with a third overlapping area corresponding to the two frames of fisheye video images in the group to obtain an image splicing result of the two frames of fisheye video images in the group; and determining the video splicing result of the video images based on the image splicing result of the two frames of fisheye video images in each group.
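A sketch of fusing the two overlap crops along a searched seam, before the fused region replaces the third overlapping area as described above; the one-column-per-row seam representation and the feathered transition are assumptions made for illustration.

```python
import numpy as np

def fuse_overlap(overlap_left, overlap_right, seam_cols, feather=20):
    """Fuse the two overlap crops along the searched seam. seam_cols[r] is the
    seam column found for row r; pixels left of the seam come from the first
    picture and pixels right of it from the second, with a narrow feathered
    transition (the feather width is an illustrative choice)."""
    h, w = overlap_left.shape[:2]
    cols = np.arange(w)[None, :]                          # (1, w)
    d = cols - np.asarray(seam_cols)[:, None]             # signed distance to the seam
    alpha = np.clip(0.5 - d / (2.0 * feather), 0.0, 1.0)  # 1 -> left image, 0 -> right image
    if overlap_left.ndim == 3:
        alpha = alpha[..., None]
    fused = alpha * overlap_left + (1.0 - alpha) * overlap_right
    return fused.astype(overlap_left.dtype)
```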
The invention provides a device for searching for a spliced seam of a video image, which comprises: the first acquisition module is used for acquiring an energy map of each frame of video image in a first video; wherein the energy map is used for indicating the position area and the edge of the specified object in the video image; the first determining module is used for determining a seam searching result of a first frame video image in a first video based on an energy map of the first frame video image; wherein, the seam search result comprises: a splicing region of the video image and the target image; the target image is a video image corresponding to the video image in the second video; the second determining module is used for determining the range of a seam searching area of the current video image based on the seam searching result of the video image of the previous frame of the current video image aiming at each frame of video image except the first frame in the first video; and determining a seam search result of the current video image based on the energy graph of the current video image within the seam search area.
The invention provides a video image splicing device, which comprises: the second acquisition module is used for acquiring the first fisheye video and the second fisheye video; wherein the fisheye video images in the first fisheye video and the second fisheye video have an overlapping region; the extraction module is used for extracting a first target area of each frame of fisheye video image in the first fisheye video and a second target area of each frame of fisheye video image in the second fisheye video; a third determining module, configured to determine, for two frames of fisheye video images corresponding to each other, a first equidistant projection picture after the fisheye video image of the frame is unfolded in the first fisheye video and a second equidistant projection picture after the fisheye video image of the frame is unfolded in the second fisheye video based on a first target region and a second target region corresponding to the two frames of fisheye video images and a pre-obtained updated unfolding parameter value; the fourth determining module is used for determining a seam searching result based on the first equidistant projection picture and the second equidistant projection picture which correspond to each other; the seam search result is determined by adopting the seam search device of the video image; and the fifth determining module is used for determining the video splicing result of the video images based on the splicing searching result corresponding to each group of two frames of fisheye video images corresponding to each other.
The invention provides an electronic system, comprising: the device comprises an image acquisition device, a processing device and a storage device; the image acquisition equipment is used for acquiring preview video frames or image data; the storage device has stored thereon a computer program that, when executed by a processing apparatus, executes any one of the above-described method for searching for a splice of video images, or any one of the above-described method for splicing video images.
The invention provides a computer readable storage medium, on which a computer program is stored, the computer program being executed by a processing device to perform the steps of the method for searching for a patchwork of video images as described above, or the steps of the method for stitching video images as described above.
The invention provides a video image splicing seam searching method, a video image splicing method and a video image splicing device. First, an energy map of each frame of video image in a first video is acquired; then, for the first frame of video image in the first video, a seam search result of the first frame of video image is determined based on its energy map; for each frame of video image other than the first frame in the first video, a seam search area range of the current video image is determined based on the seam search result of the previous frame of video image, and the seam search result of the current video image is determined, within the seam search area range, based on the energy map of the current video image. The method thus determines the seam search result based on the energy map of the video image and, for the video images other than the first frame, first determines a seam search area range based on the seam search result of the previous frame and then determines the seam search result within that range; constraining the seam search area range in this way reduces the difference between the seam regions of consecutive frames, alleviates jitter of the spliced video during playback, and thereby improves the splicing effect of the panoramic video.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic structural diagram of an electronic system according to an embodiment of the present invention;
fig. 2 is a flowchart of a method for searching for a seam of a video image according to an embodiment of the present invention;
FIG. 3 is a flowchart of another method for searching for a seam of a video image according to an embodiment of the present invention;
FIG. 4 is a flowchart of another method for searching for a seam of a video image according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a neural network model training process according to an embodiment of the present invention;
fig. 6 is a flowchart of a video image stitching method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of an isometric projection image unfolding method according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a cross-correlation calculation provided by an embodiment of the present invention;
fig. 9 is a schematic diagram of a picture alignment according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of an apparatus for searching for a video image seam according to an embodiment of the present invention;
fig. 11 is a schematic structural diagram of a video image stitching apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In recent years, technical research based on artificial intelligence, such as computer vision, deep learning, machine learning, image processing, and image recognition, has been actively developed. Artificial Intelligence (AI) is an emerging scientific technology for studying and developing theories, methods, techniques and application systems for simulating and extending human Intelligence. The artificial intelligence subject is a comprehensive subject and relates to various technical categories such as chips, big data, cloud computing, internet of things, distributed storage, deep learning, machine learning and neural networks. Computer vision is used as an important branch of artificial intelligence, particularly a machine is used for identifying the world, and the computer vision technology generally comprises the technologies of face identification, living body detection, fingerprint identification and anti-counterfeiting verification, biological feature identification, face detection, pedestrian detection, target detection, pedestrian identification, image processing, image identification, image semantic understanding, image retrieval, character identification, video processing, video content identification, behavior identification, three-dimensional reconstruction, virtual reality, augmented reality, synchronous positioning and map construction (SLAM), computational photography, robot navigation and positioning and the like. With the research and progress of artificial intelligence technology, the technology is applied to various fields, such as security, city management, traffic management, building management, park management, face passage, face attendance, logistics management, warehouse management, robots, intelligent marketing, computational photography, mobile phone images, cloud services, smart homes, wearable equipment, unmanned driving, automatic driving, smart medical treatment, face payment, face unlocking, fingerprint unlocking, testimony verification, smart screens, smart televisions, cameras, mobile internet, live webcasts, beauty treatment, medical beauty treatment, intelligent temperature measurement and the like.
Currently, panoramic video stitching refers to stitching a plurality of videos with overlapped view fields to obtain a 360-degree panoramic view field video; the panoramic video stitching technology can be applied to application scenes such as a motion camera, a teleconference or security monitoring. Generally, panoramic video splicing can be realized by combining a plurality of wide-angle cameras, the coverage field angles of the wide-angle cameras need to exceed 360 degrees, the number of the wide-angle cameras can be reduced by using fisheye cameras, and 360-degree panoramic video can be generated by splicing two fisheye cameras with the field angles larger than 180 degrees. In the process of splicing the panoramic videos, the joints of each frame of image in a plurality of videos are generally required to be searched, and then the joints of each frame of image are spliced based on the searched joints; the image stitching algorithm in the related technology is mainly applied to stitching of a single image, and when the image stitching algorithm is applied to different fisheye cameras, the quality of the obtained stitched image is unstable, namely, the adaptability to different fisheye cameras is poor; moreover, when the image splicing algorithm is used for splicing a plurality of videos to be spliced, the splicing area difference of front and rear frame images is large, and the spliced videos shake in the playing process, so that the splicing effect of the panoramic video is influenced. Based on this, the embodiments of the present invention provide a method for searching for a seam between video images, a method for splicing video images, and an apparatus for splicing video images, where the technique may be applied to splicing multiple videos, and the technique may be implemented by using corresponding software and hardware, and the embodiments of the present invention are described in detail below.
Example one
First, an example electronic system 100 for implementing a method for searching for a patchwork of video images, a method and an apparatus for stitching video images according to an embodiment of the present invention will be described with reference to fig. 1.
As shown in FIG. 1, an electronic system 100 includes one or more processing devices 102, one or more memory devices 104, an input device 106, an output device 108, and one or more image capture devices 110, which are interconnected via a bus system 112 and/or other type of connection mechanism (not shown). It should be noted that the components and structure of the electronic system 100 shown in fig. 1 are exemplary only, and not limiting, and that the electronic system may have other components and structures as desired.
Processing device 102 may be a gateway or may be an intelligent terminal or device that includes a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may process data from and control other components of electronic system 100 to perform desired functions.
Storage 104 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory can include, for example, Random Access Memory (RAM), cache memory (or the like). The non-volatile memory may include, for example, Read Only Memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on a computer-readable storage medium and executed by processing device 102 to implement the client functionality (implemented by the processing device) of the embodiments of the invention described below and/or other desired functionality. Various applications and various data, such as various data used and/or generated by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (e.g., images or sounds) to the outside (e.g., a user), and may include one or more of a display, a speaker, and the like.
Image capture device 110 may capture preview video frames or image data and store the captured preview video frames or image data in storage 104 for use by other components.
For example, the devices in the exemplary electronic system for implementing the method for searching for a patchwork of video images, the method for stitching video images, and the apparatus according to the embodiments of the present invention may be integrally disposed, or may be disposed in a distributed manner, such as integrally disposing the processing device 102, the storage device 104, the input device 106, and the output device 108, and disposing the image capturing device 110 at a specific position where a target image can be captured. When the above-described devices in the electronic system are integrally provided, the electronic system may be implemented as an intelligent terminal such as a camera, a smart phone, a tablet computer, a vehicle-mounted terminal, and the like.
Example two
The embodiment provides a method for searching for a patchwork of a video image, as shown in fig. 2, the method includes the following steps:
step S202, acquiring an energy map of each frame of video image in a first video; wherein the energy map is used to indicate the location areas and edges of the specified objects in the video image.
The first video may be a video captured by a camera or a camera, for example, a video captured by a wide-angle camera or a fisheye camera; for example, if the energy map is a representation form of a gray map, the higher the gray value of a pixel point in the energy map, the larger the energy generally representing the pixel point, the higher the corresponding energy value, whereas, the lower the gray value of the pixel point, the smaller the energy generally representing the pixel point, and the lower the corresponding energy value, the energy value in the energy map may be represented in a normalized manner, i.e., the energy value may be a value between 0 and 1, and the energy distribution between 0 and 1 is adopted; the above-mentioned designated object may be any object in the video image, for example, the designated object may be a human or an animal in the video image, etc.; the position area can be understood as an area occupied by the specified object in the video image; the edge can be understood as an outer edge contour corresponding to the specified object in the video image. In practical implementation, when a search needs to be performed on a seam of a video image, an energy map corresponding to each frame of the video image in a first video is generally acquired, and the energy map of each frame of the video image can indicate a position area and an edge contour where a specified object is located in the frame of the video image.
Step S204, aiming at a first frame of video image in a first video, determining a seam search result of the first frame of video image based on an energy map of the first frame of video image; wherein, the seam search result comprises: a splicing region of the video image and the target image; the target image is a video image corresponding to the video image in the second video.
The second video may be a video captured by a camera or a camera, for example, a video captured by a wide-angle camera or a fisheye camera; the splicing seam area can be understood as an area corresponding to a corresponding splicing seam line when the video image is spliced with the target image; the video images in the first video and the video images in the second video can be in one-to-one correspondence; for example, two fisheye cameras shoot two segments of videos, namely the first video and the second video, at the same time in the same scene, the fields of vision of the two fisheye cameras are different, in the second video, a video image corresponding to a first frame video image of the first video is a first frame video image of the second video, and the first frame video image of the second video corresponds to the target image; in practical implementation, for a first frame of video image in a first video, a target image corresponding to the first frame of video image in the first video in a second video may be determined, and then a seam region between the first frame of video image and the corresponding target image may be determined based on an energy map of the first frame of video image to determine a seam search result of the first frame of video image, for example, a seam region between the first frame of video image and the corresponding target image may be searched from a relatively static region with smaller energy based on an energy map of the first frame of video image.
Step S206, aiming at each frame of video image except the first frame in the first video, determining the range of a seam searching area of the current video image based on the seam searching result of the previous frame of video image of the current video image; and determining a seam search result of the current video image based on the energy graph of the current video image within the seam search area.
The seam search area range can be understood as a constraint range limited for searching seam search results; in practical implementation, for each frame of video image in the first video except the first frame of video image, the corresponding seam search area range is restricted for the seam search result of the current frame of video image based on the seam search result of the previous frame of video image of the current frame of video image, and the seam search result of the current frame of video image is determined based on the energy map of the current frame of video image in the seam search area range; for example, a preset constraint condition may be added based on a seam search result of a previous frame of video image to constrain a seam search range of a current frame of video image, and a seam region between the current frame of video image and a corresponding target image may be searched from a relatively static region with smaller energy based on an energy map of the current frame of video image within the determined seam search range.
Firstly, acquiring an energy map of each frame of video image in a first video; then, aiming at a first frame of video image in the first video, determining a seam search result of the first frame of video image based on an energy map of the first frame of video image; aiming at each frame of video image except the first frame in the first video, determining a seam search area range of the current video image based on a seam search result of a video image of the previous frame of the current video image; and determining a seam search result of the current video image based on the energy graph of the current video image within the seam search area. The method determines a seam search result based on the energy diagram of the video image, determines a seam search area range based on the seam search result of the previous frame of video image for the video images except the first frame, and then determines the seam search result in the seam search area range.
EXAMPLE III
The present embodiment provides another method for searching for a seam of a video image, which is implemented on the basis of the method of the above embodiment, as shown in fig. 3, the method includes the following steps:
step S302, a saliency target energy map, a motion target energy map and an edge energy map of each frame of video image in the first video are obtained.
The above-mentioned saliency target energy map may indicate a position area of a specified first object in a video image, that is, an area occupied by the first object in the video image, the first object being generally the most noticeable object in the video image, and may also be understood as a saliency target object, an object or a subject of interest, or the like in the video image, for example, in a video image including a person, the person being generally the first object in the video image, or the like; in the saliency target energy map, the energy values of the location areas of the first object are typically relatively high.
The above moving object energy map may indicate a location area of a second object specified in the video image, that is, an area occupied by the second object in the video image, the second object being generally a moving object in the video image, such as a running vehicle or the like included in the video image; in the moving object energy map, the energy value of the location area of the second object is typically relatively high.
The edge energy map may indicate an edge of a third object specified in the video image, that is, an outer edge contour corresponding to the third object, the third object generally including the first object and the second object, and may further include other objects included in the video image, and the like; in the edge energy map, the energy value of the edge of the third object is generally relatively high.
In practical implementation, when a search for a seam of a video image is required, a salient target energy map, a moving target energy map and an edge energy map corresponding to each frame of the video image in a first video are generally acquired, so as to determine a seam search result of each frame of the video image based on the three target energy maps.
Specifically, the step S302 can be implemented by the following steps one to three:
the method comprises the steps that firstly, each frame of video image in a first video is input into a preset neural network model, and a significant target energy map of the frame of video image is output through the preset neural network model.
The preset neural network model may be implemented by various convolutional neural networks, such as a residual error network, a VGG network, and the like, and may be a convolutional neural network model of any size, for example, resnet34_05x, and the like; in practical implementation, when the saliency target energy map of each frame of video image in the first video needs to be constructed, the construction can be implemented by a neural network or the like, for example, each frame of video image can be input into the preset neural network, a first object specified in each frame of video image can be detected by the preset neural network, and the saliency target energy map of each frame of video image is output to indicate a position area of the first object specified in each frame of video image. Certainly, the significant target energy map of each frame of video image may also be determined in other manners, which may specifically refer to an implementation manner in the prior art and will not be described herein again.
And secondly, determining a moving object energy map of the frame video image based on the moving object in the frame video image.
In practical implementation, the moving object energy map of each frame of video image may be determined by optical flow calculation or the like based on the moving object in each frame of video image, for example, for each frame of video image except the first frame in the first video, the moving object may be determined based on the current frame of video image and the previous frame of video image, and then the moving object energy map of the current frame of video image may be determined based on the determined moving object; the detection of the moving target can be understood as a process of presenting and marking an object with a spatial position change in an image sequence or a video as a foreground. Of course, the moving target energy map of each frame of video image may also be determined in other manners, which may specifically refer to the implementation manner in the prior art and will not be described herein again.
And step three, performing edge detection on each object in the frame of video image to obtain an edge energy map of the frame of video image.
The purpose of edge detection is to find a set formed by pixels with severe brightness change in a video image, and the set is often a contour. In practical implementation, optical flow calculation and other manners may be adopted to perform edge detection on each object included in each frame of video image, determine the outline of each object, and obtain an edge energy map of each frame of video image. Certainly, the edge energy map of each frame of video image may also be determined in other manners, which may specifically refer to an implementation manner in the prior art and will not be described herein again.
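As a concrete illustration of steps one to three, the sketch below builds the three per-frame maps with OpenCV. The spectral-residual saliency detector (opencv-contrib) stands in for the patent's preset neural network model, Farneback optical-flow magnitude stands in for the moving-object detection, and Canny stands in for the edge detection; all three are assumed substitutions, not the patent's prescribed implementations.

```python
import cv2
import numpy as np

def frame_energy_maps(frame_bgr, prev_gray=None):
    """Build the saliency, motion and edge energy maps for one frame, each
    roughly normalized to [0, 1]. Returns the grayscale frame as well so the
    caller can pass it as prev_gray for the next frame."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)

    # Saliency target energy map (stand-in for the preset neural network model).
    sal_detector = cv2.saliency.StaticSaliencySpectralResidual_create()
    ok, saliency = sal_detector.computeSaliency(frame_bgr)
    saliency = saliency.astype(np.float32) if ok else np.zeros_like(gray, np.float32)

    # Moving target energy map from dense optical-flow magnitude
    # (zero for the first frame, which has no previous frame).
    if prev_gray is None:
        motion = np.zeros_like(gray, np.float32)
    else:
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        motion = np.linalg.norm(flow, axis=2)
        motion = motion / (motion.max() + 1e-6)

    # Edge energy map from Canny edge detection.
    edges = cv2.Canny(gray, 100, 200).astype(np.float32) / 255.0

    return saliency, motion, edges, gray
```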
And step S304, for each frame of video image, fusing the saliency target energy map, the motion target energy map and the edge energy map corresponding to the frame of video image to obtain the energy map of the frame of video image.
For each frame of video image, after obtaining a saliency target energy map, a motion target energy map and an edge energy map of the frame of video image, fusing the three energy maps to obtain a fused energy map corresponding to the frame of video image, namely the energy map of the frame of video image; the fused energy map contains the position area and the edge of a salient target object, the position area and the edge of a moving target and the edge of other objects in the frame of video image.
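A minimal sketch of the fusion step; the weighted sum and renormalization are assumed for illustration, since the embodiment does not prescribe a specific fusion formula.

```python
import numpy as np

def fuse_energy_maps(saliency, motion, edges, weights=(1.0, 1.0, 1.0)):
    """Fuse the saliency target, moving target and edge energy maps into the
    per-frame energy map, renormalized to [0, 1]."""
    w_s, w_m, w_e = weights
    energy = w_s * saliency + w_m * motion + w_e * edges
    return energy / (energy.max() + 1e-6)
```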
Step S306, aiming at a first frame of video image in a first video, calculating a seam search result of the first frame of video image by adopting a dynamic programming algorithm based on an energy map of the first frame of video image; wherein, the seam search result comprises: a splicing region of the video image and the target image; the target image is a video image corresponding to the video image in the second video.
The dynamic programming algorithm can solve the problem in a recursion mode by splitting the problem and defining the problem state and the relation between the states; the problem to be solved is decomposed into a plurality of sub-problems, the sub-problems are solved in sequence, and the solution of the former sub-problem provides useful information for the solution of the latter sub-problem; when any sub-problem is solved, various possible local solutions are listed, the local solutions which are possible to reach the optimal are reserved through decision, and other local solutions are discarded; solving each subproblem in sequence, wherein the last subproblem is the solution of the initial problem; in this embodiment, for a first frame of video image in a first video, a plurality of seam search results of the first frame of video image may be searched through a dynamic programming algorithm based on an energy map of the first frame of video image, and an optimal seam search result may be selected from the plurality of seam search results, where the optimal seam search result is generally a seam between the first frame of video image and a corresponding target image, which is searched from a relatively static background with relatively low energy, and the shape of the seam may be irregular, and the optimal seam search result generally avoids a position area and an edge of a salient target object, a position area and an edge of a moving target, and edges of other objects included in the first frame of video image.
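The following sketch shows a seam-carving style dynamic-programming search over an energy map; the vertical (one column per row) seam parameterization is an assumption made for illustration, since the patent only requires a minimal-energy seam whose shape may be irregular.

```python
import numpy as np

def dp_seam(energy, lo=None, hi=None):
    """Search a vertical seam (one column per row) of minimal accumulated
    energy by dynamic programming. lo/hi optionally restrict the admissible
    columns per row, which is used later to constrain the search around the
    previous frame's seam."""
    h, w = energy.shape
    cost = energy.astype(np.float64).copy()
    if lo is not None:                            # mask out columns outside the range
        cols = np.arange(w)[None, :]
        cost[(cols < lo[:, None]) | (cols > hi[:, None])] = np.inf
    back = np.zeros((h, w), dtype=np.int32)
    for r in range(1, h):
        for c in range(w):
            c0, c1 = max(0, c - 1), min(w, c + 2)
            k = int(np.argmin(cost[r - 1, c0:c1])) + c0
            back[r, c] = k
            cost[r, c] += cost[r - 1, k]
    seam = np.zeros(h, dtype=np.int32)
    seam[-1] = int(np.argmin(cost[-1]))
    for r in range(h - 2, -1, -1):                # backtrack the optimal seam
        seam[r] = back[r + 1, seam[r + 1]]
    return seam
```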
Step S308, aiming at each frame of video image except the first frame in the first video, on the basis of the seam searching result of the previous frame of video image of the current video image, a preset constraint condition is added, and the seam searching area range of the current video image is determined.
The preset constraint condition may be set according to actual requirements. For example, for each frame of video image in the first video except the first frame, the constraint condition may be that, on the basis of the seam search result of the previous frame, the deviation at the upper and lower edges is constrained to be less than 50 pixels, that is, the seam search result of the current frame of video image may deviate from the upper and lower edges of the previous frame's seam search result by no more than 50 pixels; the seam search area range of the current frame of video image is then determined based on this constraint condition.
And S310, determining a seam search result of the current video image by adopting a dynamic programming algorithm based on the energy map of the current video image in the seam search area range.
Aiming at each frame of video image except the first frame in the first video, after determining the range of a seam searching area of the current frame of video image, calculating a seam searching result of the current frame of video image by adopting a dynamic programming algorithm based on an energy map of the current frame of video image; the seam search result usually avoids the position area and the edge of a salient target object, the position area and the edge of a moving target and the edge of other objects contained in the current frame video image; usually from a relatively static, low-energy background, a search is made for a seam between the video image of the current frame and the corresponding target image. In addition, there may be video images that cannot be normally searched in the first video, and for such video images, the step S306 may be adopted to determine the seam search result of each frame of video image in such video images based on the single frame of video image search.
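Reusing the `dp_seam` sketch above, the constrained search for frames after the first can be illustrated as follows; the 50-pixel band mirrors the example constraint mentioned earlier and is applied around the previous frame's seam.

```python
import numpy as np

def constrained_seam(energy, prev_seam, max_offset=50):
    """Constrain the seam search of the current frame to a band of
    +/- max_offset pixels around the previous frame's seam, then run the
    same dynamic-programming search inside that seam search area range."""
    w = energy.shape[1]
    prev_seam = np.asarray(prev_seam)
    lo = np.clip(prev_seam - max_offset, 0, w - 1)
    hi = np.clip(prev_seam + max_offset, 0, w - 1)
    return dp_seam(energy, lo=lo, hi=hi)

# Per-video usage sketch: first frame unconstrained, later frames constrained.
# seams = []
# for i, energy in enumerate(energy_maps):
#     seams.append(dp_seam(energy) if i == 0 else constrained_seam(energy, seams[-1]))
```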
Firstly, acquiring a saliency target energy map, a motion target energy map and an edge energy map of each frame of video image in a first video; for each frame of video image, fusing a saliency target energy map, a motion target energy map and an edge energy map corresponding to the frame of video image to obtain an energy map of the frame of video image; then, aiming at a first frame of video image in the first video, calculating a seam search result of the first frame of video image by adopting a dynamic programming algorithm based on an energy map of the first frame of video image; aiming at each frame of video image except the first frame in the first video, on the basis of a seam searching result of a video image of the previous frame of the current video image, a preset constraint condition is added, and a seam searching area range of the current video image is determined. And finally, determining the seam searching result of the current video image by adopting a dynamic programming algorithm based on the energy map of the current video image in the seam searching area range. The method determines a seam search result based on the energy diagram of the video image, determines a seam search area range based on the seam search result of the previous frame of video image for the video images except the first frame, and then determines the seam search result in the seam search area range.
Example four
In the method, in a first video, each frame of video image and a target image corresponding to the video image have an overlapping area, a corresponding area of the overlapping area in the video image is a first overlapping area, and a corresponding area in the target image is a second overlapping area; in practical implementation, each frame of video image in the first video has an overlap region with a corresponding target image in the second video, where the first overlap region may be understood as a region corresponding to a part of pictures in the overlap region in each frame of video image in the first video; the second overlap area may be understood as an area corresponding to a part of the picture in the overlap area in each frame of the target image in the second video; as shown in fig. 4, the method includes the steps of:
step S402, acquiring an energy map of each frame of video image in a first video; wherein the energy map is used to indicate the location areas and edges of the specified objects in the video image.
Step S404, aiming at a first frame of video image in a first video, determining a seam search result of the first frame of video image based on an energy map of the first frame of video image; wherein, the seam search result comprises: a splicing region of the video image and the target image; the target image is a video image corresponding to the video image in the second video.
Step S406, aiming at each frame of video image except the first frame in the first video, determining the range of a seam searching area of the current video image based on the seam searching result of the video image of the previous frame of the current video image; and determining a seam search result of the current video image based on the energy graph of the current video image within the seam search area.
Step S408, inputting an image corresponding to a first overlapping area in each frame of video image and an image corresponding to a second overlapping area in a target image corresponding to the frame of video image into a pre-trained neural network model to obtain a joint prediction result of the frame of video image; wherein, the seam prediction result comprises: and (3) a splicing prediction area of the video image and the corresponding target image.
The pre-trained neural network model can be realized by various convolutional neural networks, such as a Unet network, a residual error network, a VGG network, and the like, and the preset neural network model can be a convolutional neural network model of any size, such as resnet34_05x, and the like; in practical implementation, a dynamic programming algorithm can be adopted based on the execution process to determine the seam search result of each frame of video image in the first video, but the dynamic programming algorithm is long in time consumption and low in execution efficiency, so that the seam search mode can be distilled based on a neural network mode, hardware acceleration can be realized through the neural network mode, the processing speed is increased, and the processing efficiency is improved. In specific implementation, for each frame of video image, an image corresponding to a first overlapping area in the frame of video image and an image corresponding to a second overlapping area in a target image corresponding to the frame of video image may be input into the pre-trained neural network model, and a joint prediction result of the frame of video image and the corresponding target image may be obtained through the pre-trained neural network model, where the joint prediction result includes a joint prediction area of the frame of video image and the corresponding target image.
The pre-trained neural network model is determined through the following steps four to nine:
And step four, acquiring training samples containing a plurality of continuous groups of image pairs to be spliced and a seam search result of each group of image pairs to be spliced.
The image pair to be spliced usually comprises two images to be spliced; the training samples may also be referred to as a multi-frame data set, which generally includes a plurality of consecutive groups of image pairs to be spliced. Specifically, the training sample may be constructed in the following manner: firstly, a single image is acquired, which may be denoted overlap_left; a geometric transform is performed on the single image to obtain an image to be spliced that is paired with it, which may be denoted overlap_right, and the single image and the transformed image form a group of image pairs to be spliced; sequence pictures are then generated based on the geometric transform, namely a plurality of groups of image pairs to be spliced are generated to simulate a video scene, where overlap_right and overlap_left in the plurality of groups of image pairs to be spliced can be distinguished by their numbers. The geometric transform generally includes random translation or rotation operations on the images, and each image to be stitched in the generated groups of image pairs to be stitched may have black edges. In practical implementation, when a trained neural network model needs to be obtained, a training sample containing a plurality of groups of image pairs to be spliced and the seam search result of each group of image pairs to be spliced are generally acquired, where the seam search result of each group of image pairs to be spliced can be determined by the seam search steps described above, and the seam search result of each group is used as the GT (ground truth, the labels of a supervised-learning training set, used to prove or refute a hypothesis) to train the initial neural network so as to predict the seam region of each frame of video image.
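As a rough sketch of this sample-construction step (assuming OpenCV; the frame count, offset ranges and function name below are illustrative, not values given in this application):

```python
import cv2
import numpy as np

def make_training_sequence(image, num_frames=8, max_shift=10, max_angle=3):
    # image: the single source picture (overlap_left); returns a list of
    # (overlap_left_i, overlap_right_i) pairs simulating consecutive frames.
    h, w = image.shape[:2]
    pairs = []
    for _ in range(num_frames):
        dx = np.random.uniform(-max_shift, max_shift)
        dy = np.random.uniform(-max_shift, max_shift)
        angle = np.random.uniform(-max_angle, max_angle)
        m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
        m[:, 2] += (dx, dy)
        # the warped copy plays the role of overlap_right; uncovered border
        # pixels stay black, matching the black edges mentioned above
        overlap_right = cv2.warpAffine(image, m, (w, h))
        pairs.append((image.copy(), overlap_right))
    return pairs
```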
Step five, acquiring a preset splicing seam template; the preset splicing seam template comprises a preset splicing seam area.
The preset splicing seam area in the preset splicing seam template can be set as a seam mask whose left half is 1 and right half is 0, or as a seam mask that is all 1, where a mask value of 0 represents black and a mask value of 1 represents white. In practical implementation, the first group of image pairs to be stitched among the plurality of groups may be understood as the first frame overlap_right_1 and the first frame overlap_left_1 in the simulated video, and a preset splicing seam template containing a preset splicing seam area generally needs to be provided in advance for this first group of image pairs to be stitched.
And step six, for the first group of image pairs to be spliced, inputting the first group of image pairs to be spliced and the preset splicing seam template into the initial neural network model, so as to output a seam prediction result of the first group of image pairs to be spliced through the initial neural network model.
The initial neural network model can be realized by various convolutional neural networks, such as a residual network, a VGG network, and the like. In practical implementation, referring to the schematic diagram of the neural network model training process shown in fig. 5, for the first group of images to be spliced, after the preset splicing seam template is obtained, overlap_right_1 and overlap_left_1 in the first group of image pairs to be spliced and the preset mask template (corresponding to the preset splicing seam template) may be input into the NN (Neural Network), that is, the initial neural network model, with the seam search result of the first group of image pairs to be spliced taken as the GT, and seam_mask_left_1, that is, the seam prediction result of the first group of image pairs to be spliced, is output; the seam prediction result comprises: the splicing prediction area of overlap_right_1 and overlap_left_1 in the first group of image pairs to be spliced.
And step seven, for each group of image pairs to be spliced except the first group of image pairs to be spliced, inputting the group of image pairs to be spliced and the seam prediction result of the adjacent previous group of image pairs to be spliced into the initial neural network model, and outputting the seam prediction result of the group of image pairs to be spliced through the initial neural network model.
In practical implementation, each group of image pairs to be spliced except the first group can be understood as comprising each frame overlap_right except the first frame overlap_right_1 and each frame overlap_left except the first frame overlap_left_1 in the simulated video. In actual implementation, for each group of image pairs to be spliced except the first group, overlap_right and overlap_left in the current group and the seam prediction result of the adjacent previous group can be input into the initial neural network model, with the seam search result of the current group taken as the GT, and the seam prediction result of the current group is output; the seam prediction result comprises: the splicing prediction area of overlap_right and overlap_left in the current group of image pairs to be spliced. As shown in fig. 5, the second frame overlap_right_2 and overlap_left_2 and the first frame seam_mask_left_1, that is, the seam prediction result of the first group of image pairs to be stitched, are input into the NN, that is, the initial neural network model, with the seam search result of the second group taken as the GT, and seam_mask_left_2, that is, the seam prediction result of the second group of image pairs to be stitched, is output; and so on, outputting the seam prediction result of each group of image pairs to be spliced except the first group.
And step eight, calculating the loss value of the seam prediction result of the group of image pairs to be spliced based on the seam search result of the group of image pairs to be spliced and a preset loss function.
The Loss function can be used for evaluating the degree of inconsistency between the seam prediction result output by the initial neural network model and the corresponding real seam search result, the degree of inconsistency can be represented by the Loss value, and the Loss function can be represented by Loss; in practical implementation, the above-mentioned loss function can be designed as follows:
loss = loss_continue * scale + loss_gt;
wherein, loss_continue = Max(L1(mask_cur - mask_prev), margin);
loss_gt = L1(mask_cur - mask_gt);
the loss _ continuity represents the continuity loss between the output of the current group of image pairs to be spliced and the output of the previous group of image pairs to be spliced; scale represents the continuity loss weight ratio, which is generally set to 0.1, and loss _ GT represents the L1 loss calculated between the output of the current set of image pairs to be spliced and the corresponding GT; mask _ cur represents a joint mask obtained by predicting the current group of image pairs to be spliced, namely a joint prediction result obtained by predicting the current group of image pairs to be spliced; the mask _ prev represents a seam mask obtained by predicting a previous group of image pairs to be spliced, namely a seam prediction result obtained by predicting the previous group of image pairs to be spliced; margin is an allowable threshold for inconsistency, typically set to 0.2; the mask _ gt is a predicted seam truth value of the current group of image pairs to be spliced, and the predicted seam truth value is a real seam searching result of the current group of image pairs to be spliced.
Step nine, updating the weight parameters of the initial neural network model based on the loss values; and continuing to execute the step of obtaining the training samples containing the continuous groups of the image pairs to be spliced until the initial neural network model converges to obtain the neural network model.
The weight parameters may include all parameters in the initial neural network model, such as convolution kernel parameters, and when the initial neural network model is trained, all parameters in the initial neural network model are generally required to be updated based on a loss value of a seam prediction result of the set of image pairs to be stitched, so as to train the initial neural network model. And then, continuously executing the step of obtaining training samples containing a plurality of continuous groups of image pairs to be spliced until the initial neural network model converges or the loss value converges, and finally obtaining the trained neural network model.
The method for searching for the seams of the video images comprises the steps that aiming at each frame of video image, an image corresponding to a first overlapping area in the frame of video image and an image corresponding to a second overlapping area in a target image corresponding to the frame of video image are input into a pre-trained neural network model, and a seam prediction result of the frame of video image is obtained; the neural network is used for predicting the seam area, so that the processing speed of predicting the seam area can be increased, and the processing efficiency is improved.
EXAMPLE five
The embodiment provides a video image stitching method, as shown in fig. 6, the method includes the following steps:
step S602, a first fisheye video and a second fisheye video are obtained; wherein the fisheye video images in the first fisheye video and the second fisheye video have an overlapping region.
The fisheye lens is a special camera lens with short focal length and large field of view, and the field angle of the fisheye lens can be close to or exceed 180 degrees; in practical implementation, in the same scene, at the same time, a first fisheye video may be captured through the first fisheye lens, a second fisheye video may be captured through the second fisheye lens, the corresponding visual fields of the first fisheye lens and the second fisheye lens are usually different, and the first fisheye video and the second fisheye video have overlapping fields of view, and since the first fisheye lens and the second fisheye lens are captured simultaneously, the corresponding fisheye video images in the first fisheye video and the second fisheye video usually have overlapping regions, for example, a first frame fisheye video image in the first fisheye video and a first frame fisheye video image in the second fisheye video have overlapping regions, a second frame fisheye video image in the first fisheye video and a second frame fisheye video image in the second fisheye video have overlapping regions, and so on.
Step S604, a first target region of each frame of fisheye video image in the first fisheye video and a second target region of each frame of fisheye video image in the second fisheye video are extracted.
The target area can be understood as the effective area in the fisheye video image, namely the area containing the shot object. In practical implementation, an original fisheye video image acquired through a fisheye lens is generally square or rectangular, and contains a circular area that corresponds to the effective area of the fisheye video image, while the part of the square or rectangular image outside the circular area is generally a black background area. After the first fisheye video and the second fisheye video are acquired, the first target area corresponding to the circular area of each frame of fisheye video image in the first fisheye video and the second target area corresponding to the circular area of each frame of fisheye video image in the second fisheye video need to be extracted. Specifically, a preset pixel value may be used as a threshold to filter out the black background area in the fisheye video image; for example, a pixel value of 20 may be used as the threshold to extract the circular effective area of the fisheye video image.
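A possible sketch of this thresholding step with OpenCV, assuming a pixel-value threshold of 20 and that cropping to the bounding box of the bright pixels is an acceptable way to isolate the circular effective area:

```python
import cv2
import numpy as np

def extract_fisheye_region(frame, threshold=20):
    # frame: original square/rectangular fisheye frame (BGR).
    # Pixels darker than `threshold` are treated as the black background.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    mask = (gray > threshold).astype(np.uint8)
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return frame, mask
    # crop to the bounding box of the circular effective area
    x0, x1, y0, y1 = xs.min(), xs.max(), ys.min(), ys.max()
    return frame[y0:y1 + 1, x0:x1 + 1], mask[y0:y1 + 1, x0:x1 + 1]
```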
Step S606, for two frames of fisheye video images corresponding to each other, based on the first target region and the second target region corresponding to the two frames of fisheye video images, and the pre-obtained updated expansion parameter value, determining a first equidistant projection picture after the frame of fisheye video image in the first fisheye video is expanded, and a second equidistant projection picture after the frame of fisheye video image in the second fisheye video is expanded.
The equidistant projection picture can also be called an equidistant cylindrical projection picture; the updated unfolding parameters generally include fisheye lens parameters related to the unfolding of fisheye video images, and the like, and for example, the parameters may include a field angle parameter value, a related parameter value of an optical center, or a fisheye rotation angle parameter value; in practical implementation, the manner of unfolding the fisheye video image into equidistant projection pictures is generally as follows: firstly, extracting a target area, namely extracting a fisheye video image with a circular effective area, converting a two-dimensional fisheye coordinate into a three-dimensional coordinate, and then expanding the spherical coordinate of the fisheye video image with the circular effective area according to longitude and latitude mapping to obtain an expanded equidistant projection picture corresponding to the fisheye video image.
For further explanation of the process of expanding the fisheye video image into the equidistant projection picture, refer to the schematic diagram of an equidistant projection expansion manner shown in fig. 7. In the normalized fisheye coordinates corresponding to the target region of the fisheye video image, take a pixel point at coordinate position (x, y) as an example: its distance from the origin of the coordinate system is r, and its included angle with the x-axis direction is θ. The fisheye video image with the circular effective area extracted is converted from the normalized fisheye coordinates to three-dimensional coordinates through the conversion formulas φ = r·aperture/2 and θ = atan2(y, x) (where aperture denotes the field angle of the fisheye lens), that is, the process from 2D fisheye to 3D vector in fig. 7, in which the pixel point (x, y) in the normalized fisheye coordinate system is converted to the point P = (Px, Py, Pz) in the three-dimensional coordinate system. Then, through the conversion formulas longitude = atan2(Py, Px) and latitude = atan2(Pz, √(Px² + Py²)), the corresponding longitude and latitude values are obtained, i.e. the process from 3D vector to longitude/latitude in fig. 7. Then, the three-dimensional coordinates are mapped and expanded according to longitude and latitude through x = longitude/π and y = 2·latitude/π, and the position in the equidistant projection picture corresponding to the pixel point at coordinate position (x, y) in the target area of the fisheye video image is obtained, namely the process from 3D vector to 2D equirectangular picture in fig. 7. The equidistant projection picture corresponding to the fisheye video image is then obtained based on these corresponding positions and the pixel values of the pixel points in the target area.
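The forward mapping reconstructed above can be sketched as follows (NumPy); the axis convention for the 3D vector (with the y-axis taken as the optical axis) is inferred from the inverse formulas given below and may differ from the exact convention of fig. 7:

```python
import numpy as np

def fisheye_point_to_equirect(x, y, aperture):
    # (x, y): normalized fisheye coordinates in [-1, 1];
    # aperture: field angle of the fisheye lens in radians.
    r = np.hypot(x, y)
    theta = np.arctan2(y, x)
    phi = r * aperture / 2.0                 # 2D fisheye -> angle from optical axis
    # 3D unit vector on the viewing sphere
    px = np.sin(phi) * np.cos(theta)
    py = np.cos(phi)
    pz = np.sin(phi) * np.sin(theta)
    longitude = np.arctan2(py, px)           # 3D vector -> longitude/latitude
    latitude = np.arctan2(pz, np.sqrt(px ** 2 + py ** 2))
    # longitude/latitude -> normalized equirectangular coordinates
    return longitude / np.pi, 2.0 * latitude / np.pi
```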
According to the above process, the fisheye video image from which the target area has been extracted can be input into a corresponding equidistant projection expansion module, which outputs the equidistant projection picture. The field angle parameter of the fisheye lens is required in the conversion formulas; that is, once the field angle of the fisheye lens is known, the fisheye video image can be expanded according to the conversion formulas. The value r in the above formulas is determined from the coordinate values (x, y); for details, reference may be made to determination methods in the prior art, which are not described herein again. The remap parameters of the image transformation, namely the coordinate mapping relation expressed by the conversion formulas, can be obtained by calculation in the above expansion manner. Because remap is processed serially pixel by pixel, the processing is time-consuming and inefficient; the hardware implementation efficiency can be improved by processing based on block warp, in which the target area of the fisheye video image is partitioned into blocks and the resulting small images are processed in parallel, thereby improving the processing efficiency.
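A simplified sketch of such a block warp, assuming the remap tables map_x/map_y (float32) have already been computed from the conversion formulas; the tile size and the thread-pool parallelism below are illustrative stand-ins for a hardware-friendly blocked implementation:

```python
import cv2
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def block_warp(src, map_x, map_y, tile=256, workers=4):
    # src: fisheye image with the target area extracted; map_x / map_y:
    # float32 remap tables giving, for every output pixel, its source
    # coordinate. Output tiles are remapped independently and in parallel.
    h, w = map_x.shape
    out = np.zeros((h, w) + src.shape[2:], dtype=src.dtype)

    def warp_tile(y0, x0):
        y1, x1 = min(y0 + tile, h), min(x0 + tile, w)
        mx = np.ascontiguousarray(map_x[y0:y1, x0:x1])
        my = np.ascontiguousarray(map_y[y0:y1, x0:x1])
        out[y0:y1, x0:x1] = cv2.remap(src, mx, my, cv2.INTER_LINEAR)

    tiles = [(y, x) for y in range(0, h, tile) for x in range(0, w, tile)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda t: warp_tile(*t), tiles))
    return out
```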
Correspondingly, the equidistant projection picture may also be reversely converted into the target region of the fisheye video image, as shown in fig. 7; specifically, the equidistant projection picture may be converted into the target region of the fisheye video image through the conversion formulas longitude = x·π, latitude = y·π/2, Px = cos(latitude)·cos(longitude), Py = cos(latitude)·sin(longitude), Pz = sin(latitude), r = 2·atan2(√(Px² + Pz²), Py)/aperture, θ = atan2(Pz, Px), x = r·cos(θ) and y = r·sin(θ), thereby realizing the process of converting the equidistant projection picture back into the target region of the fisheye video image.
In practical implementation, as the manufacturing and installation of the fisheye camera have process errors, in order to avoid the influence of the errors on the expansion process of the equidistant projection picture, corresponding expansion parameters, such as a field angle parameter value, a related parameter value of an optical center or a fisheye rotation angle parameter value, need to be updated to obtain updated expansion parameters; then, aiming at the two frames of fisheye video images which correspond to each other, based on the first target area and the second target area which correspond to the two frames of fisheye video images respectively and the updated expansion parameters, the first target area is expanded into a first equidistant projection picture and the second target area is expanded into a second equidistant projection picture by referring to the equidistant projection picture expansion mode.
Step S608, determining a seam search result based on the first equidistant projection picture and the second equidistant projection picture which correspond to each other; and determining the seam search result by adopting the method in the embodiment.
In actual implementation, after the first equidistant projection picture and the second equidistant projection picture are obtained, corresponding seam searching results can be determined based on the first equidistant projection picture and the second equidistant projection picture which correspond to each other; for example, based on a first equidistant projection picture after the first frame fisheye video image in the first fisheye video is unfolded and a second equidistant projection picture after the first frame fisheye video image in the second fisheye video is unfolded, the seam search result of the first equidistant projection picture and the second equidistant projection picture may be determined according to the scheme for determining the seam search result in the foregoing embodiment.
And step S610, determining a video splicing result of the video images based on the seam search result corresponding to each group of two frames of fisheye video images corresponding to each other.
In practical implementation, the seam search result corresponding to each group of two frames of fisheye video images corresponding to each other can be determined according to the steps, and the video stitching result of the video image can be determined based on the obtained seam search results.
The video image splicing method comprises the steps of firstly, obtaining a first fisheye video and a second fisheye video; and extracting a first target area of each frame of fisheye video image in the first fisheye video and a second target area of each frame of fisheye video image in the second fisheye video. And then determining a first equidistant projection picture of the frame of fisheye video image in the first fisheye video after expansion and a second equidistant projection picture of the frame of fisheye video image in the second fisheye video after expansion based on a first target area and a second target area corresponding to the two frames of fisheye video images and a pre-obtained updated expansion parameter value. Finally, determining a seam search result based on the first equidistant projection picture and the second equidistant projection picture which correspond to each other; and determining a video splicing result of the video images based on the seam search result corresponding to each group of two frames of fisheye video images corresponding to each other. The method determines a seam search result based on the energy map of the video image, determines a seam search area range based on the seam search result of the previous frame of video image for the video images except the first frame, and then determines the seam search result in the seam search area range.
EXAMPLE six
The embodiment provides another video image splicing method, which is implemented on the basis of the method of the embodiment, and comprises the following steps:
step 802, acquiring a first fisheye video and a second fisheye video; wherein the fisheye video images in the first fisheye video and the second fisheye video have an overlapping region.
Step 804, extracting a first target region of each frame of fisheye video image in the first fisheye video and a second target region of each frame of fisheye video image in the second fisheye video.
Step 806, for two frames of fisheye video images corresponding to each other, determining a first equidistant projection picture of the first fisheye video image after the frame of fisheye video image is unfolded and a second equidistant projection picture of the second fisheye video image after the frame of fisheye video image is unfolded based on a first target area and a second target area corresponding to the two frames of fisheye video images and a pre-obtained updated unfolding parameter value.
The updated expansion parameter values include: the field angle parameter value, the parameter value of the optical center in the x-axis direction, the parameter value of the optical center in the y-axis direction, and the fisheye rotation angle parameter value. The field angle parameter value may be a combination of the known nominal field angle of the fisheye lens and a field angle offset, where the field angle offset may be represented by aperture_shift; for example, if the known nominal field angle of the fisheye lens is 190 degrees, the actual field angle may deviate due to manufacturing and mounting process errors of the fisheye camera, and if the field angle offset is estimated as +5 degrees, the field angle parameter value is 190 + 5 = 195 degrees. The parameter value of the optical center in the x-axis direction and the parameter value of the optical center in the y-axis direction may be understood as the coordinates of the optical center in the x-axis and y-axis directions after taking the optical center offset into account, where the offset of the optical center in the x-axis direction may be represented by x_shift and the offset in the y-axis direction by y_shift. The fisheye rotation angle parameter value may be understood as the fisheye rotation angle after taking a fisheye rotation angle offset into account, where the rotation angle offset may be represented by rotation_shift. The updated expansion parameter values are determined in advance through the following steps eleven to fifteen:
and step eleven, acquiring an initial expansion parameter value and a preset offset range of each expansion parameter.
The initial expansion parameter values generally include an initial field angle parameter value, an initial optical center parameter value in the x-axis direction, an initial optical center parameter value in the y-axis direction, and an initial fisheye rotation angle parameter value, and the initial expansion parameter values may be generally obtained from camera parameters given by a fisheye camera; the preset offset range generally includes an offset range corresponding to each expansion parameter, and the preset offset range is generally an offset range estimated by a technician for each expansion parameter according to an empirical value; for convenience of description, taking a binocular fisheye camera as an example, in actual implementation, for an existing binocular fisheye camera, initial expansion parameter values corresponding to the four expansion parameters may be obtained from camera parameters given by the fisheye camera, and a proper offset range is set for each expansion parameter according to experience, where the offset range is generally not too large, for example, the initial expansion parameter value of the field angle is 190 degrees, and the corresponding preset offset range is set to ± 10 degrees according to experience.
And step twelve, sampling each expansion parameter based on the initial expansion parameter value and the preset offset range of each expansion parameter to obtain a sampling value of each expansion parameter, and determining a third equidistant projection picture after the expansion of the frame of fisheye video image in the first fisheye video and a fourth equidistant projection picture after the expansion of the frame of fisheye video image in the second fisheye video based on the sampling value of each expansion parameter.
In practical implementation, after the initial expansion parameter value and the preset offset range of each expansion parameter are obtained, random sampling can be performed on the four expansion parameters by combining the initial expansion parameter value and the preset offset range of each parameter; among the four expansion parameters, each parameter takes one sampling value to form a group of sampling values. For example, if the offset of each of the four expansion parameters had only two selectable values, 0 and 1, the sampling values of the four expansion parameters would have 2^4 = 16 combinations in total, each combination corresponding to a group of sampling values. Based on each group of sampling values, each frame of fisheye video image in the first fisheye video from which the target area has been extracted is expanded to obtain the expanded third equidistant projection picture, and each frame of fisheye video image in the second fisheye video from which the target area has been extracted is expanded to obtain the expanded fourth equidistant projection picture.
And step thirteen, extracting a fourth overlapping area of the third equidistant projection picture and the fourth equidistant projection picture.
A fourth overlapping region is extracted from the expanded third and fourth equidistant projection pictures, where the fourth overlapping region generally comprises a partial picture area in the third equidistant projection picture and the corresponding partial picture area in the fourth equidistant projection picture.
And step fourteen, performing cross-correlation calculation on the fourth overlapping area to obtain a first cross-correlation calculation result.
After the fourth overlapping area is extracted, cross-correlation calculation may be performed on the partial picture area of the third equidistant projection picture and the partial picture area of the fourth equidistant projection picture contained in the fourth overlapping area to obtain a first cross-correlation calculation result, where the first cross-correlation calculation result reflects a measure of similarity between the two partial picture areas. Referring to the schematic diagram of cross-correlation calculation shown in fig. 8, fig. 8 includes two signals f and g, where f*g represents the cross-correlation calculation result of the two signals; it can be seen that the larger the value of the cross-correlation calculation result is, the more similar the two partial picture areas are, and conversely, the smaller the value of the cross-correlation calculation result is, the less similar the two partial picture areas are.
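The cross-correlation score itself can be sketched as a normalized correlation of the two overlap crops; the normalization is an assumption here, since the application only requires a similarity measure whose larger values mean higher similarity:

```python
import numpy as np

def cross_correlation(region_a, region_b):
    # region_a / region_b: grayscale overlap crops of identical shape.
    # Returns a normalized score in [-1, 1]; larger means more similar.
    a = region_a.astype(np.float64).ravel()
    b = region_b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0
```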
And step fifteen, determining the updated expansion parameter value based on the first cross-correlation calculation result and the preset iteration number.
The preset iteration number may be set according to actual requirements; for example, it may be set to 10000. In actual implementation, the updated expansion parameter values may be determined based on the first cross-correlation calculation results and the preset iteration number.
Specifically, the step fifteen can be realized by the following steps a to C:
and step A, repeatedly executing the steps of sampling each expansion parameter based on the initial expansion parameter value and the preset offset range of each expansion parameter according to preset iteration times to obtain a plurality of first cross-correlation calculation results.
For convenience of understanding, assuming that the preset iteration number is 10000, the step of sampling each expansion parameter is performed based on the initial expansion parameter value and the preset offset range of each expansion parameter, and 10000 times of sampling is repeatedly performed, so that 10000 first cross-correlation calculation results can be obtained.
And B, selecting the first cross-correlation calculation result with the maximum value from the plurality of first cross-correlation calculation results.
Since the larger the value of the first cross-correlation calculation result is, the higher the similarity between the two partial picture regions is, and the more appropriate the sampling value of the corresponding expansion parameter is, the first cross-correlation calculation result with the largest value can be selected from the plurality of calculated first cross-correlation calculation results.
And step C, determining the sampling value of the expansion parameter corresponding to the first cross-correlation calculation result with the maximum value as an updated expansion parameter value.
The first cross-correlation calculation result with the largest value indicates that the similarity between the two partial picture regions is the highest, and therefore, the sampling value of the corresponding expansion parameter is the most appropriate, and the sampling value of the expansion parameter can be determined as the updated expansion parameter value.
The off-line adaptive expansion parameter optimization process can enable the algorithm to adaptively obtain the optimal expansion parameters of the binocular fisheye camera, and the off-line calculation process only needs to be carried out once for the assembled and fixed binocular fisheye camera.
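Putting steps eleven to fifteen together, this offline search might be sketched as a simple random search; `unfold_fn` (which unfolds the two fisheye frames under a given parameter set) and `score_fn` (the cross-correlation of their overlap) are assumed helpers, and the parameter names below are illustrative:

```python
import numpy as np

def search_unfold_params(unfold_fn, score_fn, init, offset_range,
                         iterations=10000):
    # init: initial expansion parameter values, e.g. {"aperture": 190.0,
    # "x_shift": 0.0, "y_shift": 0.0, "rotation": 0.0};
    # offset_range: allowed +/- offset for each parameter.
    best_params, best_score = dict(init), -np.inf
    for _ in range(iterations):
        sample = {k: init[k] + np.random.uniform(-offset_range[k],
                                                 offset_range[k])
                  for k in init}
        pic_a, pic_b = unfold_fn(sample)       # third / fourth pictures
        score = score_fn(pic_a, pic_b)         # first cross-correlation result
        if score > best_score:
            best_score, best_params = score, sample
    return best_params                         # updated expansion parameter values
```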
Step 808, aligning the first equidistant projection picture and the second equidistant projection picture which correspond to each other.
Because the updated expansion parameters obtained by offline optimization can only be well aligned to a scene with a fixed depth of field generally, the images need to be dynamically aligned under the condition that the depth of field changes obviously, and the dynamic alignment modes can include two modes, one is a dynamic alignment mode based on feature point matching, and the other is an affine parameter search mode based on cross-correlation maximization; two implementation modes are described below;
the above dynamic alignment method based on feature point matching may be implemented by the following steps twenty to twenty-two:
twenty, extracting a first characteristic point from the first equidistant projection picture, and extracting a second characteristic point from the second equidistant projection picture.
In order to better align the images, it is usually necessary to select representative areas in the equidistant projection pictures, such as corner points, edge points, bright points in dark areas or dark points in bright areas. The first feature points may be corner points, edge points, bright points in dark areas or dark points in bright areas extracted from the first equidistant projection picture; the second feature points may be corner points, edge points, bright points in dark areas or dark points in bright areas extracted from the second equidistant projection picture, and the like. In practical implementation, the feature points of the equidistant projection pictures can be extracted based on SIFT (Scale-Invariant Feature Transform) or other methods; SIFT is a descriptor used in the field of image processing that has scale invariance, can detect key points in an image, and is a local feature descriptor.
Step twenty-one, based on the first feature points and the second feature points, determining matched feature point pairs.
After the first feature points in the first equidistant projection picture and the second feature points in the second equidistant projection picture are extracted through the above steps, feature point matching may be performed on the first feature points and the second feature points in a block-wise manner. Referring to the picture alignment diagram shown in fig. 9, img1 corresponds to the first equidistant projection picture and img2 corresponds to the second equidistant projection picture; the blocking may specifically be performed by matching feature points of the left half of img1 with the right half of img2, and matching feature points of the right half of img1 with the left half of img2, to obtain the corresponding matched feature point pairs. Based on the obtained matched feature point pairs, img1 and img2 are aligned; the alignment may be performed by minimizing the difference between the matched feature points of img1 and img2 in the width (w) dimension, so that the overlapping regions after stitching are aligned as well as possible. For example, matching can be achieved by calculating the Euclidean distance between the 128-dimensional key point descriptors of the two groups of feature points: the smaller the Euclidean distance, the higher the similarity, and when the Euclidean distance is smaller than a set threshold, the match can be judged successful.
And step twenty-two, aligning the first equidistant projection picture and the second equidistant projection picture based on the matched feature point pairs.
And aligning the first equidistant projection picture and the second equidistant projection picture based on the determined matching characteristic point pairs. And performing dynamic alignment processing on each corresponding first equidistant projection picture and second equidistant projection picture in a characteristic point matching-based mode.
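A sketch of this feature-based alignment with OpenCV; the Lowe ratio test used here is one common stand-in for the Euclidean-distance threshold mentioned above, and the block-wise matching of image halves is omitted for brevity:

```python
import cv2
import numpy as np

def align_by_features(img1, img2):
    # img1 / img2: grayscale equidistant projection pictures to be aligned.
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    matches = cv2.BFMatcher().knnMatch(des1, des2, k=2)
    # keep only sufficiently distinctive matches
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]
    src = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    # transform that moves img2 onto img1, estimated from the matched pairs
    matrix, _ = cv2.estimateAffinePartial2D(src, dst)
    aligned = cv2.warpAffine(img2, matrix, (img1.shape[1], img1.shape[0]))
    return aligned, matrix
```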
The above-mentioned manner of affine parameter search based on cross-correlation maximization may be implemented by the following steps thirty to thirty-three:
and thirty, moving the second equidistant projection picture according to a preset direction.
For example, referring to a picture alignment diagram shown in fig. 9, for the unfolded images img1 and img2, img2 may be moved laterally to align with img1, for example, img2 may be moved leftward and then img2 may be moved rightward, and during the moving process, the matching degree between img2 and img1 may generally change.
And step thirty-one, extracting a plurality of fifth overlapping areas of the first equidistant projection picture and the second equidistant projection picture during the moving process.
The overlapping area is extracted by moving img2 in fig. 9; the corresponding overlapping area can specifically be estimated based on the aperture field angle parameter. For example, for a binocular fisheye camera, if the aperture field angle of each fisheye lens is 180 degrees, the overlapping area is 0, and if the aperture field angle of each fisheye lens is greater than 180 degrees, the corresponding overlapping area can be estimated based on the overlapping angle of the aperture field angles of the two fisheye lenses. In practical implementation, during the process of moving the second equidistant projection picture, a plurality of fifth overlapping regions of the first equidistant projection picture and the second equidistant projection picture may be extracted.
And step thirty-two, performing cross-correlation calculation on the fifth overlapped areas respectively to obtain a plurality of second cross-correlation calculation results.
After the plurality of fifth overlapping regions are extracted, cross-correlation calculation may be performed on a partial picture region corresponding to the first equidistant projection picture and a partial picture region corresponding to the second equidistant projection picture included in each fifth overlapping region to obtain respective corresponding second cross-correlation calculation results, and the similarity between the two partial picture regions is reflected by the second cross-correlation calculation results.
And step thirty-three, aligning the first equidistant projection picture and the second equidistant projection picture based on the plurality of second cross-correlation calculation results.
Specifically, the step thirty-three can be realized by the following steps H to K:
and H, selecting a second cross-correlation calculation result with the largest value from the plurality of second cross-correlation calculation results.
Since the larger the value of the second cross-correlation calculation result is, the higher the similarity between the two partial picture regions is, the second cross-correlation calculation result with the largest value can be selected from the plurality of calculated second cross-correlation calculation results.
And step I, acquiring the position coordinates of corresponding first boundary pixel points of a fifth overlapping region corresponding to a second cross-correlation calculation result with the largest numerical value in the first equidistant projection picture and the position coordinates of corresponding second boundary pixel points in the second equidistant projection picture.
And step J, calculating an affine transformation matrix based on the position coordinates of the first boundary pixel points and the position coordinates of the second boundary pixel points.
And K, aligning the first equidistant projection picture and the second equidistant projection picture based on the affine transformation matrix.
An alignment parameter that maximizes the cross-correlation of the overlapping region is searched for; this alignment parameter may also be referred to as image boundary points, namely the position coordinates of the first boundary pixel points, in the first equidistant projection picture, of the fifth overlapping region corresponding to the second cross-correlation calculation result with the largest value, and the position coordinates of the corresponding second boundary pixel points in the second equidistant projection picture. An affine parameter of the second equidistant projection picture, which may also be called an affine transformation matrix, is calculated based on the found position coordinates of the first boundary pixel points and the second boundary pixel points, and based on this affine transformation matrix the second equidistant projection picture is projectively transformed so as to be well aligned with the first equidistant projection picture.
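A much-simplified sketch of this alignment path, which only searches a horizontal shift of the second picture and expresses it as an affine matrix; the application computes a full affine transform from the matched boundary pixel coordinates, so the shift-only search, `overlap_w` and `max_shift` below are assumptions:

```python
import cv2
import numpy as np

def align_by_correlation(img1, img2, overlap_w, max_shift=40):
    # img1 / img2: grayscale equidistant projection pictures;
    # overlap_w: overlap width estimated from the aperture field angles.
    best_shift, best_score = 0, -np.inf
    for shift in range(-max_shift, max_shift + 1):
        shifted = np.roll(img2, shift, axis=1)
        a = img1[:, -overlap_w:].astype(np.float64)
        b = shifted[:, :overlap_w].astype(np.float64)
        a, b = a - a.mean(), b - b.mean()
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        score = float((a * b).sum() / denom) if denom > 0 else 0.0
        if score > best_score:
            best_score, best_shift = score, shift
    matrix = np.float32([[1, 0, best_shift], [0, 1, 0]])  # translation only
    aligned = cv2.warpAffine(img2, matrix, (img2.shape[1], img2.shape[0]))
    return aligned, matrix
```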
Step 810, extracting a third overlapping area based on the aligned first and second equal-distance projection pictures.
A third overlapping region is extracted based on the aligned first and second equidistant projection pictures; specifically, the third overlapping region may be extracted based on feature point matching, for example, after the matched feature point pairs are determined, the first and second equidistant projection pictures may be paired according to the matched feature point pairs, and the third overlapping region may then be extracted. For example, for a binocular fisheye camera, if the aperture field angle of each of the two fisheye lenses is 180 degrees, the overlapping area is 0, and if the aperture field angle of each of the two fisheye lenses is greater than 180 degrees, the corresponding third overlapping area can be estimated according to the overlapping angle of the aperture field angles of the two fisheye lenses.
And 812, performing illumination compensation on the second equidistant projection picture based on the third overlapping area, so that the pixel value of each pixel in the illumination-compensated second equidistant projection picture is matched with the pixel value of each pixel in the corresponding first equidistant projection picture.
Performing illumination compensation on the second equidistant projection picture based on the extracted third overlapping region, for example, the distribution of the pixel values of each pixel in the second equidistant projection picture can be mapped to the distribution similar to the pixel value of each pixel in the first equidistant projection picture by adopting a histogram matching mode; of course, other illumination compensation methods may also be adopted, and reference may be made to the illumination compensation method in the related art, which is not described herein again.
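A self-contained sketch of the histogram-matching option mentioned above (standard CDF matching; operating on a single channel is an assumption, and colour images would be processed per channel):

```python
import numpy as np

def match_histogram(source, reference):
    # Map the pixel-value distribution of `source` (the second picture's
    # overlap) onto that of `reference` (the first picture's overlap).
    src_vals, src_counts = np.unique(source.ravel(), return_counts=True)
    ref_vals, ref_counts = np.unique(reference.ravel(), return_counts=True)
    src_cdf = np.cumsum(src_counts).astype(np.float64) / source.size
    ref_cdf = np.cumsum(ref_counts).astype(np.float64) / reference.size
    # for each source value, find the reference value with the closest CDF
    mapped = np.interp(src_cdf, ref_cdf, ref_vals)
    lut = np.interp(source.ravel(), src_vals, mapped)
    return lut.reshape(source.shape).astype(source.dtype)
```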
Step 814, determining a seam search result based on the first equidistant projection picture and the second equidistant projection picture after the illumination compensation.
In practical implementation, after the first equidistant projection picture and the illumination-compensated second equidistant projection picture are obtained, the seam search result of the first equidistant projection picture and the corresponding illumination-compensated second equidistant projection picture may be determined according to the scheme for determining the seam search result in the foregoing embodiment.
Step 816, for each group of two frames of fisheye video images corresponding to each other, determining a fused overlapping region corresponding to the two frames of fisheye video images in the group based on the seam search result corresponding to the two frames of fisheye video images in the group.
In practical implementation, after the seam search result corresponding to the two frames of fisheye video images in each group is obtained for each group of two frames of fisheye video images corresponding to each other, the fusion result of the overlapping regions corresponding to the two frames of fisheye video images in the group can be obtained through a fusion operation.
And step 818, replacing the third overlapping area corresponding to the two frames of fisheye video images in the group with the fused overlapping area to obtain an image splicing result of the two frames of fisheye video images in the group.
In practical implementation, for two frames of fisheye video images in each group, the corresponding fused overlapping region may be used to replace the corresponding aligned third overlapping region, so as to obtain an image stitching result of the two frames of fisheye video images in the group.
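As a purely illustrative sketch of this replacement step, assuming the two aligned pictures share a vertical overlap strip of width `overlap_w` at the right edge of the first picture and the left edge of the second (the actual layout depends on the alignment above):

```python
import numpy as np

def stitch_pair(pic_left, pic_right, fused_overlap, overlap_w):
    # pic_left / pic_right: the aligned first and second equidistant
    # projection pictures; fused_overlap: the seam-based fusion of their
    # shared region, which replaces the third overlapping area.
    left_part = pic_left[:, :-overlap_w]
    right_part = pic_right[:, overlap_w:]
    return np.concatenate([left_part, fused_overlap, right_part], axis=1)
```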
In addition, it should be noted that the image splicing result of the two frames of fisheye video images in each group may also be obtained based on optical flow. Specifically, the optical flow information of the overlapping region of the first equidistant projection picture and the illumination-compensated second equidistant projection picture is calculated; the remap (remapping) conversion parameters of the illumination-compensated second equidistant projection picture are calculated based on the optical flow information; and based on the calculated remap conversion parameters, the overlapping region corresponding to the illumination-compensated second equidistant projection picture is fused with the overlapping region corresponding to the first equidistant projection picture, so as to realize the fusion of the overlapping regions of the two pictures. For the case where a per-frame update is not required, the remap conversion parameters may be merged into the remap parameters used when the fisheye video image from which the target region has been extracted is expanded into the equidistant projection picture, so as to reduce the amount of computation; the fused picture can be obtained directly in the optical flow manner, yielding the image splicing result.
And step 820, determining a video splicing result of the video images based on the image splicing result of the two frames of fisheye video images in each group.
And according to the time sequence, combining the image splicing results of the two frames of fisheye video images in each group into a video splicing result of the video images.
Firstly, aligning a first equidistant projection picture and a second equidistant projection picture which correspond to each other; extracting a third overlapping region based on the aligned first and second equidistant projection pictures; then, based on the third overlapping area, performing illumination compensation on the second equidistant projection picture so that the pixel value of each pixel in the second equidistant projection picture after illumination compensation is matched with the pixel value of each pixel in the corresponding first equidistant projection picture; determining a seam search result based on the first equidistant projection picture and the second equidistant projection picture after illumination compensation; and finally, aiming at each group of two frames of fisheye video images which correspond to each other, determining a fused overlapping area corresponding to the two frames of fisheye video images in the group based on the seam search result corresponding to the two frames of fisheye video images in the group. The third overlapping area corresponding to the two frames of fisheye video images in the group is replaced with the fused overlapping area to obtain an image splicing result of the two frames of fisheye video images in the group; and the video splicing result of the video images is determined based on the image splicing result of the two frames of fisheye video images in each group. The method determines a seam search result based on the energy map of the video image, determines a seam search area range based on the seam search result of the previous frame of video image for the video images except the first frame, and then determines the seam search result in the seam search area range.
According to the above video image splicing method, the splicing alignment effect of the video images can be improved through the offline adaptive expansion parameter optimization process, and the adaptability to different binocular fisheye modules is improved; the splicing effect of the stitching algorithm for scenes of different depths can be improved through the dynamic fine alignment algorithm; in addition, the seam prediction and fusion based on the video images can improve the splicing stability of the video images and the splicing effect of the panoramic video; furthermore, neural network distillation and block warp are adopted in place of the remap operation, realizing hardware acceleration.
EXAMPLE seven
The present embodiment provides a device for searching for a seam of a video image, as shown in fig. 10, the device including: a first obtaining module 100, configured to obtain an energy map of each frame of video image in a first video; wherein the energy map is used for indicating the position area and the edge of the specified object in the video image; the first determining module 101 is configured to determine, for a first frame of video image in a first video, a seam search result of the first frame of video image based on an energy map of the first frame of video image; wherein, the seam search result comprises: a splicing region of the video image and the target image; the target image is a video image corresponding to the video image in the second video; a second determining module 102, configured to determine, for each frame of video image in the first video except for the first frame, a range of a seam search area of the current video image based on a seam search result of a video image of the previous frame of the current video image; and determine a seam search result of the current video image based on the energy map of the current video image within the seam search area range.
Firstly, acquiring an energy map of each frame of video image in a first video; then, aiming at a first frame of video image in the first video, determining a seam search result of the first frame of video image based on an energy map of the first frame of video image; aiming at each frame of video image except the first frame in the first video, determining a seam search area range of the current video image based on a seam search result of a video image of the previous frame of the current video image; and determining a seam search result of the current video image based on the energy graph of the current video image within the seam search area. The device determines a seam search result based on an energy graph of a video image, determines a seam search area range based on a seam search result of a previous frame of video image for the video images except for a first frame, and determines a seam search result in the seam search area range.
Further, the first obtaining module 100 is further configured to: acquiring a saliency target energy map, a motion target energy map and an edge energy map of each frame of video image in a first video; and for each frame of video image, fusing a saliency target energy map, a motion target energy map and an edge energy map corresponding to the frame of video image to obtain an energy map of the frame of video image.
Further, the first obtaining module 100 is further configured to: for each frame of video image in a first video, inputting the video image into a preset neural network model so as to output a saliency target energy map of the frame of video image through the preset neural network model; determining a moving object energy map of the frame of video image based on the moving object in the frame of video image; and carrying out edge detection on each object in the frame video image to obtain an edge energy map of the frame video image.
Further, the first determining module 101 is further configured to: aiming at a first frame of video image in a first video, calculating a seam search result of the first frame of video image by adopting a dynamic programming algorithm based on an energy map of the first frame of video image.
Further, the second determining module 102 is further configured to: aiming at each frame of video image except the first frame in the first video, on the basis of a seam searching result of a previous frame of video image of the current video image, increasing a preset constraint condition, and determining a seam searching area range of the current video image; and determining the seam search result of the current video image by adopting a dynamic programming algorithm based on the energy map of the current video image in the seam search area range.
Furthermore, in the first video, each frame of video image has an overlapping area with a target image corresponding to the video image, the area of the overlapping area corresponding to the video image is a first overlapping area, and the area of the overlapping area corresponding to the target image is a second overlapping area; the apparatus is also configured to: input an image corresponding to the first overlapping area in each frame of video image and an image corresponding to the second overlapping area in the target image corresponding to the frame of video image into a pre-trained neural network model to obtain a seam prediction result of the frame of video image; wherein the seam prediction result comprises: a splicing prediction area of the video image and the target image.
Further, the apparatus further comprises a neural network model determining module, configured to: acquiring a training sample containing a plurality of continuous groups of image pairs to be spliced and a seam search result of each group of image pairs to be spliced; inputting, for each group of image pairs to be spliced except the first group of image pairs to be spliced, the group of image pairs to be spliced and the seam prediction result of the adjacent previous group of image pairs to be spliced into an initial neural network model so as to output the seam prediction result of the group of image pairs to be spliced through the initial neural network model; calculating a loss value of the seam prediction result of the group of image pairs to be spliced based on the seam search result of the group of image pairs to be spliced and a preset loss function; updating the weight parameters of the initial neural network model based on the loss value; and continuing to execute the step of obtaining the training samples containing the continuous groups of image pairs to be spliced until the initial neural network model converges to obtain the neural network model.
Further, the neural network model determination module is further configured to: acquiring a preset splicing seam template; the preset splicing seam template comprises a preset splicing seam area; and inputting the first group of image pairs to be spliced and a preset splicing template into the initial neural network model aiming at the first group of image pairs to be spliced so as to output a splicing prediction result of the first group of image pairs to be spliced through the initial neural network model.
The implementation principle and the generated technical effects of the apparatus for searching for a seam of a video image provided by the embodiment of the present invention are the same as those of the embodiment of the method for searching for a seam of a video image, and for brief description, reference may be made to the corresponding contents in the embodiment of the method for searching for a seam of a video image, where the embodiment of the apparatus for searching for a seam of a video image is not mentioned.
Example eight
The present embodiment provides a video image stitching apparatus, as shown in fig. 11, the apparatus includes: a second obtaining module 110, configured to obtain a first fisheye video and a second fisheye video; wherein the fisheye video images in the first fisheye video and the second fisheye video have an overlapping region; the extracting module 111 is configured to extract a first target region of each frame of fisheye video image in the first fisheye video and a second target region of each frame of fisheye video image in the second fisheye video; a third determining module 112, configured to determine, for two frames of fisheye video images corresponding to each other, a first equidistant projection picture after the fisheye video image of the frame is unfolded in the first fisheye video and a second equidistant projection picture after the fisheye video image of the frame is unfolded in the second fisheye video based on the first target region and the second target region corresponding to the two frames of fisheye video images and the pre-obtained updated unfolding parameter value; a fourth determining module 113, configured to determine a seam search result based on the first equidistant projection picture and the second equidistant projection picture that correspond to each other; the seam search result is determined by adopting the above seam search device of the video image; and a fifth determining module 114, configured to determine a video stitching result of the video images based on the seam search result corresponding to each group of two frames of fisheye video images corresponding to each other.
The video image splicing apparatus first obtains the first fisheye video and the second fisheye video, and extracts a first target region of each frame of fisheye video image in the first fisheye video and a second target region of each frame of fisheye video image in the second fisheye video. It then determines, based on the first target region and the second target region corresponding to two mutually corresponding frames of fisheye video images and a pre-obtained updated unfolding parameter value, a first equidistant projection picture obtained by unfolding the frame in the first fisheye video and a second equidistant projection picture obtained by unfolding the frame in the second fisheye video. Finally, it determines a seam search result based on the mutually corresponding first and second equidistant projection pictures, and determines the video splicing result of the video images based on the seam search result corresponding to each group of two mutually corresponding frames of fisheye video images. The apparatus determines the seam search result based on the energy map of a video image; for every video image except the first frame, it determines a seam search area range based on the seam search result of the previous frame of video image, and determines the seam search result within that range.
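A minimal sketch of such an energy-map-based seam search is given below, assuming NumPy, a vertical seam represented as one column index per row, and a fixed band around the previous frame's seam standing in for the seam search area range; the band width and seam orientation are illustrative assumptions rather than requirements of the embodiment.

```python
import numpy as np

def search_seam(energy, prev_seam=None, band=20):
    """Dynamic-programming seam search over an energy map. When the previous
    frame's seam is given, the search is restricted to +/- `band` columns
    around it, realizing the seam search area range described above."""
    h, w = energy.shape
    cost = energy.astype(np.float64).copy()
    if prev_seam is not None:                       # constrain the search near the previous seam
        for row in range(h):
            lo = max(prev_seam[row] - band, 0)
            hi = min(prev_seam[row] + band + 1, w)
            outside = np.ones(w, dtype=bool)
            outside[lo:hi] = False
            cost[row, outside] = np.inf
    for row in range(1, h):                         # accumulate minimal energy top to bottom
        left = np.r_[np.inf, cost[row - 1, :-1]]
        right = np.r_[cost[row - 1, 1:], np.inf]
        cost[row] += np.minimum(np.minimum(left, cost[row - 1]), right)
    seam = np.empty(h, dtype=int)                   # backtrack the minimal-cost path
    seam[-1] = int(np.argmin(cost[-1]))
    for row in range(h - 2, -1, -1):
        lo = max(seam[row + 1] - 1, 0)
        hi = min(seam[row + 1] + 2, w)
        seam[row] = lo + int(np.argmin(cost[row, lo:hi]))
    return seam
```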
Further, the fourth determining module 113 is further configured to: align the mutually corresponding first equidistant projection picture and second equidistant projection picture; extract a third overlapping region based on the aligned first and second equidistant projection pictures; perform illumination compensation on the second equidistant projection picture based on the third overlapping region, so that the pixel value of each pixel in the illumination-compensated second equidistant projection picture matches the pixel value of the corresponding pixel in the first equidistant projection picture; and determine the seam search result based on the first equidistant projection picture and the illumination-compensated second equidistant projection picture.
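One simple way to realize the pixel-value matching described above is a per-channel gain estimated over the overlap region, sketched below with NumPy; `overlap_mask` (a boolean mask of the third overlapping region) is an assumed input, and the embodiment does not prescribe this particular compensation formula.

```python
import numpy as np

def illumination_compensate(eqr_first, eqr_second, overlap_mask):
    """Scale each channel of the second picture so its mean over the overlap
    region matches that of the first picture (a minimal gain-based sketch)."""
    compensated = eqr_second.astype(np.float32)
    for c in range(compensated.shape[2]):
        mean_first = eqr_first[..., c][overlap_mask].mean()
        mean_second = compensated[..., c][overlap_mask].mean()
        gain = mean_first / max(mean_second, 1e-6)
        compensated[..., c] *= gain
    return np.clip(compensated, 0, 255).astype(np.uint8)
```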
Further, the apparatus further includes a parameter value determination module, and the updated unfolding parameter value includes: an angle-of-view parameter value, an optical-center parameter value in the x-axis direction, an optical-center parameter value in the y-axis direction, and a fisheye rotation angle parameter value. The parameter value determination module is configured to: acquire an initial expansion parameter value and a preset offset range of each expansion parameter; sample each expansion parameter based on its initial value and preset offset range to obtain a sampling value of each expansion parameter, and determine, based on the sampling values, a third equidistant projection picture obtained by unfolding the frame of fisheye video image in the first fisheye video and a fourth equidistant projection picture obtained by unfolding the frame of fisheye video image in the second fisheye video; extract a fourth overlapping area of the third equidistant projection picture and the fourth equidistant projection picture; perform cross-correlation calculation on the fourth overlapping area to obtain a first cross-correlation calculation result; and determine the updated unfolding parameter value based on the first cross-correlation calculation result and a preset number of iterations.
Further, the parameter value determination module is further configured to: repeatedly execute, according to the preset number of iterations, the step of sampling each expansion parameter based on its initial value and preset offset range, so as to obtain a plurality of first cross-correlation calculation results; select the first cross-correlation calculation result with the largest value from the plurality of first cross-correlation calculation results; and determine the sampling values of the expansion parameters corresponding to that largest first cross-correlation calculation result as the updated unfolding parameter value.
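Taken together, the two paragraphs above describe a sampling-based search over the unfolding parameters scored by cross-correlation of the overlap. The sketch below assumes NumPy, uniform sampling within the preset offset ranges, and two caller-supplied hypothetical helpers (`unfold_fn` for fisheye unfolding and `overlap_fn` for extracting the fourth overlapping area); normalized cross-correlation is used as one possible realization of the cross-correlation calculation.

```python
import numpy as np

def normalized_cross_correlation(a, b):
    """Zero-mean normalized cross-correlation of two equally sized image patches."""
    a = a.astype(np.float32).ravel() - a.mean()
    b = b.astype(np.float32).ravel() - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-6
    return float(np.dot(a, b) / denom)

def refine_unfold_params(init_params, offsets, unfold_fn, overlap_fn,
                         frame_a, frame_b, iterations=200, rng=None):
    """Random search: sample each parameter within its preset offset range,
    unfold both fisheye frames, score the overlap by cross-correlation, and
    keep the best-scoring sample after the preset number of iterations."""
    rng = rng or np.random.default_rng()
    best_params, best_score = dict(init_params), -np.inf
    for _ in range(iterations):
        sample = {k: init_params[k] + rng.uniform(-offsets[k], offsets[k])
                  for k in init_params}
        eqr_a, eqr_b = unfold_fn(frame_a, sample), unfold_fn(frame_b, sample)
        overlap_a, overlap_b = overlap_fn(eqr_a, eqr_b)   # fourth overlapping area
        score = normalized_cross_correlation(overlap_a, overlap_b)
        if score > best_score:
            best_params, best_score = sample, score
    return best_params
```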
Further, the fourth determining module 113 is further configured to: extract a first feature point from the first equidistant projection picture and a second feature point from the second equidistant projection picture; determine a matching feature point pair based on the first feature point and the second feature point; and align the first equidistant projection picture and the second equidistant projection picture based on the matching feature point pair.
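As an illustration of this feature-based alignment, the sketch below uses OpenCV ORB keypoints, brute-force Hamming matching, and a RANSAC-estimated partial affine transform on 8-bit BGR pictures; the detector, matcher, and transform model are assumptions, since the embodiment only requires matched feature point pairs and an alignment derived from them.

```python
import cv2
import numpy as np

def align_by_features(eqr_first, eqr_second, max_matches=200):
    """Align the second equidistant projection picture to the first using
    matched feature point pairs (ORB + brute-force matching + RANSAC affine)."""
    gray_first = cv2.cvtColor(eqr_first, cv2.COLOR_BGR2GRAY)
    gray_second = cv2.cvtColor(eqr_second, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(2000)
    kp1, des1 = orb.detectAndCompute(gray_first, None)    # first feature points
    kp2, des2 = orb.detectAndCompute(gray_second, None)   # second feature points
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des2, des1), key=lambda m: m.distance)[:max_matches]
    src = np.float32([kp2[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp1[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    transform, _ = cv2.estimateAffinePartial2D(src, dst)  # RANSAC by default
    h, w = eqr_first.shape[:2]
    return cv2.warpAffine(eqr_second, transform, (w, h))
```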
Further, the fourth determining module 113 is further configured to: move the second equidistant projection picture in a preset direction; extract a plurality of fifth overlapping areas of the first equidistant projection picture and the second equidistant projection picture during the movement; perform cross-correlation calculation on the fifth overlapping areas respectively to obtain a plurality of second cross-correlation calculation results; and align the first equidistant projection picture and the second equidistant projection picture based on the plurality of second cross-correlation calculation results.
Further, the fourth determining module 113 is further configured to: select the second cross-correlation calculation result with the largest value from the plurality of second cross-correlation calculation results; acquire the position coordinates, in the first equidistant projection picture, of the first boundary pixel points of the fifth overlapping region corresponding to that largest second cross-correlation calculation result, and the position coordinates of the corresponding second boundary pixel points in the second equidistant projection picture; calculate an affine transformation matrix based on the position coordinates of the first boundary pixel points and the position coordinates of the second boundary pixel points; and align the first equidistant projection picture and the second equidistant projection picture based on the affine transformation matrix.
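A minimal sketch of this correlation-driven alignment is given below, assuming a purely horizontal preset direction, candidate overlaps taken as the trailing and leading columns of the two pictures, and the overlap corners used as the boundary pixel points; these are illustrative simplifications rather than the only arrangement the embodiment allows.

```python
import cv2
import numpy as np

def align_by_sliding(eqr_first, eqr_second, max_shift=200):
    """Slide the second picture horizontally, score each candidate overlap by
    cross-correlation, then estimate an affine transform from the boundary
    points of the best overlap and warp the second picture accordingly."""
    h, w = eqr_first.shape[:2]
    best_shift, best_score = 1, -np.inf
    for shift in range(1, max_shift):
        overlap_first = eqr_first[:, w - shift:].astype(np.float32).ravel()
        overlap_second = eqr_second[:, :shift].astype(np.float32).ravel()
        score = float(np.corrcoef(overlap_first, overlap_second)[0, 1])  # second cross-correlation result
        if score > best_score:
            best_shift, best_score = shift, score
    # corner points of the best overlap serve as the boundary pixel points
    pts_first = np.float32([[w - best_shift, 0], [w - 1, 0], [w - best_shift, h - 1]])
    pts_second = np.float32([[0, 0], [best_shift - 1, 0], [0, h - 1]])
    affine = cv2.getAffineTransform(pts_second, pts_first)
    return cv2.warpAffine(eqr_second, affine, (w, h))
```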
Further, the fourth determining module 113 is further configured to: for each group of two mutually corresponding frames of fisheye video images, determine a fused overlapping area corresponding to the two frames of fisheye video images in the group based on the seam search result corresponding to the two frames; replace the third overlapping area corresponding to the two frames of fisheye video images in the group with the fused overlapping area to obtain an image splicing result of the two frames of fisheye video images in the group; and determine the video splicing result of the video images based on the image splicing result of the two frames of fisheye video images in each group.
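The sketch below shows one way to fuse and replace the overlapping area along a seam, assuming both equidistant projection pictures are NumPy arrays already aligned onto a common canvas of the same size, the third overlapping area spans columns [x0, x1) over the full height, and the seam search result is given as one overlap-relative column index per row; all of these conventions are assumptions made for illustration.

```python
def fuse_along_seam(eqr_first, eqr_second_aligned, x0, x1, seam_cols):
    """Build the fused overlapping area (first picture left of the seam, second
    picture right of it) and write it back in place of the third overlapping
    area; columns to the right of the overlap come from the second picture."""
    stitched = eqr_first.copy()
    fused = eqr_first[:, x0:x1].copy()
    overlap_second = eqr_second_aligned[:, x0:x1]
    for row, seam_col in enumerate(seam_cols):
        fused[row, seam_col:] = overlap_second[row, seam_col:]
    stitched[:, x0:x1] = fused                        # replace the third overlapping area
    stitched[:, x1:] = eqr_second_aligned[:, x1:]     # non-overlap part of the second picture
    return stitched
```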
The implementation principle and technical effects of the video image splicing apparatus provided by the embodiment of the present invention are the same as those of the foregoing video image splicing method embodiment; for the sake of brevity, where this apparatus embodiment is not described in detail, reference may be made to the corresponding contents of the video image splicing method embodiment.
Example nine
The present embodiment provides an electronic system, including: an image acquisition device, a processing device, and a storage device; the image acquisition device is configured to acquire preview video frames or image data; the storage device stores a computer program which, when executed by the processing device, performs the above seam search method for video images or the above video image splicing method.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working process of the electronic system described above may refer to the corresponding process in the foregoing method embodiments, and is not described herein again.
The embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processing device, performs the steps of the above seam search method for video images or the steps of the above video image splicing method.
The computer program products of the seam search method for video images, the video image splicing method, and the corresponding apparatuses provided in the embodiments of the present invention include a computer-readable storage medium storing program code; the instructions included in the program code may be used to execute the methods described in the foregoing method embodiments, and specific implementations may refer to the method embodiments, which are not described herein again.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (20)

1. A method for searching for a splicing seam of a video image, the method comprising:
acquiring an energy map of each frame of video image in a first video; wherein the energy map is used to indicate the location area and edges of a specified object in the video image;
for a first frame of video image in the first video, determining a seam search result of the first frame of video image based on an energy map of the first frame of video image; wherein the seam search result comprises: a splicing region of the video image and the target image; the target image is a video image corresponding to the video image in a second video;
for each frame of video image except the first frame in the first video, determining a seam search area range of a current video image based on a seam search result of a video image of a previous frame of the current video image; and determining a seam search result of the current video image based on the energy map of the current video image within the seam search area range.
2. The method of claim 1, wherein the step of obtaining an energy map for each frame of video image in the first video comprises:
acquiring a saliency target energy map, a motion target energy map and an edge energy map of each frame of video image in the first video;
and for each frame of video image, fusing the saliency target energy map, the motion target energy map and the edge energy map corresponding to the frame of video image to obtain the energy map of the frame of video image.
3. The method of claim 2, wherein the step of obtaining the saliency target energy map, the motion target energy map, and the edge energy map of each frame of the video image in the first video comprises:
for each frame of video image in the first video, inputting the frame of video image into a preset neural network model so as to output a saliency target energy map of the frame of video image through the preset neural network model;
determining a moving object energy map of the frame of video image based on the moving object in the frame of video image;
and carrying out edge detection on each object in the frame video image to obtain an edge energy map of the frame video image.
4. The method according to any one of claims 1-3, wherein the step of determining, for a first frame of video image in the first video, a seam search result of the first frame of video image based on an energy map of the first frame of video image comprises:
and aiming at a first frame of video image in the first video, calculating a seam search result of the first frame of video image by adopting a dynamic programming algorithm based on an energy map of the first frame of video image.
5. The method according to any one of claims 1 to 4, wherein the step of, for each frame of video image in the first video except the first frame, determining a seam search area range of a current video image based on a seam search result of a video image of a frame preceding the current video image, and determining the seam search result of the current video image based on the energy map of the current video image within the seam search area range, comprises:
for each frame of video image except the first frame in the first video, determining the seam search area range of the current video image by adding a preset constraint condition on the basis of the seam search result of the video image of the frame preceding the current video image;
and determining a seam search result of the current video image by adopting a dynamic programming algorithm based on the energy map of the current video image in the seam search area range.
6. The method according to claim 1, wherein each frame of video image in the first video has an overlapping area with its corresponding target image, the overlapping area corresponding to a first overlapping area in the video image and a second overlapping area in the target image; the method further comprises:
inputting an image corresponding to the first overlapping area in each frame of video image and an image corresponding to the second overlapping area in the target image corresponding to the frame of video image into a pre-trained neural network model to obtain a seam prediction result of the frame of video image; wherein the seam prediction result comprises: a splicing prediction area of the video image and the corresponding target image.
7. The method of claim 6, wherein the pre-trained neural network model is determined by:
acquiring a training sample containing a plurality of continuous groups of image pairs to be spliced and a splicing search result of each group of image pairs to be spliced;
inputting, for each group of image pairs to be spliced except the first group of image pairs to be spliced, the group of image pairs to be spliced and the seam prediction result of the adjacent previous group of image pairs to be spliced into an initial neural network model, so as to output the seam prediction result of the group of image pairs to be spliced through the initial neural network model;
calculating a loss value of a seam prediction result of the group of image pairs to be spliced based on a seam search result of the group of image pairs to be spliced and a preset loss function;
updating a weight parameter of the initial neural network model based on the loss value; and continuing to execute the step of obtaining training samples containing a plurality of continuous groups of image pairs to be spliced until the initial neural network model converges to obtain the neural network model.
8. The method of claim 7, wherein after the step of acquiring a training sample containing a plurality of consecutive groups of image pairs to be spliced and a seam search result of each group of image pairs to be spliced, the method further comprises:
acquiring a preset splicing seam template; the preset splicing seam template comprises a preset splicing seam area;
and aiming at a first group of image pairs to be spliced, inputting the first group of image pairs to be spliced and the preset splicing seam template into the initial neural network model so as to output a splicing seam prediction result of the first group of image pairs to be spliced through the initial neural network model.
9. A method for stitching video images, the method comprising:
acquiring a first fisheye video and a second fisheye video; wherein fisheye video images in the first fisheye video and the second fisheye video have an overlapping region;
extracting a first target area of each frame of fisheye video image in the first fisheye video and a second target area of each frame of fisheye video image in the second fisheye video;
determining, for two frames of fisheye video images which correspond to each other, a first equidistant projection picture after the frame of fisheye video image in the first fisheye video is unfolded and a second equidistant projection picture after the frame of fisheye video image in the second fisheye video is unfolded, based on the first target area and the second target area corresponding to the two frames of fisheye video images and a pre-obtained updated unfolding parameter value;
determining a seam search result based on the first equidistant projection picture and the second equidistant projection picture which are mutually corresponding; wherein the seam search result is determined by the method of any one of the preceding claims 1-8;
and determining a video splicing result of the video images based on the seam search result corresponding to each group of two frames of fisheye video images corresponding to each other.
10. The method according to claim 9, wherein the step of determining a seam search result based on the first and second equidistant projection pictures corresponding to each other comprises:
aligning the first equidistant projection picture and the second equidistant projection picture which correspond to each other;
extracting a third overlapping region based on the aligned first and second equidistant projection pictures;
performing illumination compensation on the second equidistant projection picture based on the third overlapping area, so that the pixel value of each pixel in the second equidistant projection picture after illumination compensation is matched with the pixel value of each pixel in the corresponding first equidistant projection picture;
and determining the seam searching result based on the first equidistant projection picture and the second equidistant projection picture after illumination compensation.
11. The method of claim 9, wherein the updated unfolding parameter value comprises: an angle of view parameter value, a parameter value of the optical center in the x-axis direction, a parameter value of the optical center in the y-axis direction and a fisheye rotation angle parameter value; and the updated unfolding parameter value is determined in advance by:
acquiring an initial expansion parameter value and a preset offset range of each expansion parameter;
sampling each expansion parameter based on the initial expansion parameter value and a preset offset range of each expansion parameter to obtain a sampling value of each expansion parameter, and determining a third equidistant projection picture after the expansion of the frame of fisheye video image in the first fisheye video and a fourth equidistant projection picture after the expansion of the frame of fisheye video image in the second fisheye video based on the sampling value of each expansion parameter;
extracting a fourth overlapping area of the third equidistant projection picture and the fourth equidistant projection picture;
performing cross-correlation calculation on the fourth overlapping area to obtain a first cross-correlation calculation result;
and determining the updated unfolding parameter value based on the first cross-correlation calculation result and a preset iteration number.
12. The method of claim 11, wherein the step of determining the updated unfolding parameter value based on the first cross-correlation calculation and a preset number of iterations comprises:
according to the preset iteration times, repeatedly executing the step of sampling each expansion parameter based on the initial expansion parameter value and the preset offset range of each expansion parameter to obtain a plurality of first cross-correlation calculation results;
selecting a first cross-correlation calculation result with the largest value from the plurality of first cross-correlation calculation results;
and determining the sampling value of the expansion parameter corresponding to the first cross-correlation calculation result with the maximum value as the updated expansion parameter value.
13. The method according to claim 10, wherein the step of aligning the first and second equidistant projection pictures corresponding to each other comprises:
extracting a first characteristic point from the first equidistant projection picture, and extracting a second characteristic point from the second equidistant projection picture;
determining a matching feature point pair based on the first feature point and the second feature point;
aligning the first and second equidistant projection pictures based on the matching feature point pairs.
14. The method according to claim 10, wherein the step of aligning the first and second equidistant projection pictures corresponding to each other comprises:
moving the second equidistant projection picture according to a preset direction;
extracting a plurality of fifth overlapping areas of the first equidistant projection picture and the second equidistant projection picture in the moving process;
performing cross-correlation calculation on the fifth overlapping areas respectively to obtain a plurality of second cross-correlation calculation results;
aligning the first and second equidistant projection pictures based on the plurality of second cross-correlation calculation results.
15. The method of claim 14, wherein the step of aligning the first and second equidistant projection pictures based on the plurality of second cross-correlation calculation results comprises:
selecting a second cross-correlation calculation result with the largest value from the plurality of second cross-correlation calculation results;
obtaining the position coordinates of corresponding first boundary pixel points of a fifth overlapping region corresponding to the second cross-correlation calculation result with the largest numerical value in the first equidistant projection picture and the position coordinates of corresponding second boundary pixel points in the second equidistant projection picture;
calculating an affine transformation matrix based on the position coordinates of the first boundary pixel points and the position coordinates of the second boundary pixel points;
aligning the first and second equidistant projection pictures based on the affine transformation matrix.
16. The method according to claim 10, wherein the step of determining the video splicing result of the video images based on the seam search result corresponding to each group of two frames of fisheye video images corresponding to each other comprises:
aiming at each group of two frames of fisheye video images which correspond to each other, determining a fused overlapping area corresponding to the two frames of fisheye video images in the group based on a seam searching result corresponding to the two frames of fisheye video images in the group;
replacing the third overlapping area corresponding to the two frames of fisheye video images in the group with the fused overlapping area to obtain an image splicing result of the two frames of fisheye video images in the group;
and determining the video splicing result of the video images based on the image splicing result of the two frames of fisheye video images in each group.
17. An apparatus for searching for a splicing seam of a video image, the apparatus comprising:
the first acquisition module is used for acquiring an energy map of each frame of video image in a first video; wherein the energy map is used to indicate the location area and edges of a specified object in the video image;
the first determining module is used for determining a seam search result of a first frame video image in the first video based on an energy map of the first frame video image; wherein the seam search result comprises: a splicing region of the video image and the target image; the target image is a video image corresponding to the video image in a second video;
the second determining module is used for determining a seam search area range of the current video image according to a seam search result of a video image of a previous frame of the current video image, for each frame of video image except the first frame in the first video; and determining a seam search result of the current video image based on the energy map of the current video image within the seam search area range.
18. An apparatus for stitching video images, the apparatus comprising:
the second acquisition module is used for acquiring the first fisheye video and the second fisheye video; wherein fisheye video images in the first fisheye video and the second fisheye video have an overlapping region;
the extraction module is used for extracting a first target area of each frame of fisheye video image in the first fisheye video and a second target area of each frame of fisheye video image in the second fisheye video;
a third determining module, configured to determine, for two frames of fisheye video images corresponding to each other, a first equidistant projection picture after the fisheye video image of the frame is unfolded in the first fisheye video and a second equidistant projection picture after the fisheye video image of the frame is unfolded in the second fisheye video based on the first target region and the second target region corresponding to the two frames of fisheye video images and a pre-obtained updated unfolding parameter value;
the fourth determining module is used for determining a seam search result based on the first equidistant projection picture and the second equidistant projection picture which correspond to each other; wherein the seam search result is determined by the apparatus for searching for a splicing seam of a video image according to claim 17;
and the fifth determining module is used for determining the video splicing result of the video images based on the splicing searching result corresponding to each group of two frames of fisheye video images corresponding to each other.
19. An electronic system, characterized in that the electronic system comprises: the device comprises an image acquisition device, a processing device and a storage device;
the image acquisition equipment is used for acquiring preview video frames or image data;
the storage device has stored thereon a computer program which, when executed by the processing apparatus, executes the method for searching for a splicing seam of a video image according to any one of claims 1 to 8, or the method for stitching video images according to any one of claims 9 to 16.
20. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processing device, performs the steps of the method for searching for a splicing seam of a video image according to any one of claims 1 to 8, or the steps of the method for stitching video images according to any one of claims 9 to 16.
CN202110893253.6A 2021-08-04 2021-08-04 Video image seam search method, video image splicing method and device Active CN113793382B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110893253.6A CN113793382B (en) 2021-08-04 2021-08-04 Video image seam search method, video image splicing method and device
PCT/CN2022/098992 WO2023011013A1 (en) 2021-08-04 2022-06-15 Splicing seam search method and apparatus for video image, and video image splicing method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110893253.6A CN113793382B (en) 2021-08-04 2021-08-04 Video image seam search method, video image splicing method and device

Publications (2)

Publication Number Publication Date
CN113793382A true CN113793382A (en) 2021-12-14
CN113793382B CN113793382B (en) 2024-10-18

Family

ID=78877131

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110893253.6A Active CN113793382B (en) 2021-08-04 2021-08-04 Video image seam search method, video image splicing method and device

Country Status (2)

Country Link
CN (1) CN113793382B (en)
WO (1) WO2023011013A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116452426B (en) * 2023-06-16 2023-09-05 广汽埃安新能源汽车股份有限公司 Panorama stitching method and device
CN117541764B (en) * 2024-01-09 2024-04-05 北京大学 Image stitching method, electronic device and storage medium
CN117544862B (en) * 2024-01-09 2024-03-29 北京大学 An image splicing method based on parallel processing of image moments
CN118096523B (en) * 2024-04-25 2024-07-12 陕西旭腾光讯科技有限公司 Image stitching method based on computer vision
CN118941753A (en) * 2024-09-30 2024-11-12 浙江大华技术股份有限公司 Image stitching method, device, terminal and computer-readable storage medium
CN119579403B (en) * 2025-01-25 2025-07-04 狮展(上海)展示服务有限公司 Image processing system and method for suspended spherical screen panoramic vision

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7756358B2 (en) * 2004-11-30 2010-07-13 Hewlett-Packard Development Company, L.P. System and method of aligning images
CN101533474B (en) * 2008-03-12 2014-06-04 三星电子株式会社 Character and image recognition system based on video image and method thereof
US10136055B2 (en) * 2016-07-29 2018-11-20 Multimedia Image Solution Limited Method for stitching together images taken through fisheye lens in order to produce 360-degree spherical panorama
CN106651767A (en) * 2016-12-30 2017-05-10 北京星辰美豆文化传播有限公司 Panoramic image obtaining method and apparatus
CN110009567A (en) * 2019-04-09 2019-07-12 三星电子(中国)研发中心 Image stitching method and device for fisheye lens
CN113793382B (en) * 2021-08-04 2024-10-18 北京旷视科技有限公司 Video image seam search method, video image splicing method and device

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103997609A (en) * 2014-06-12 2014-08-20 四川川大智胜软件股份有限公司 Multi-video real-time panoramic fusion splicing method based on CUDA
CN104408701A (en) * 2014-12-03 2015-03-11 中国矿业大学 Large-scale scene video image stitching method
US20160286138A1 (en) * 2015-03-27 2016-09-29 Electronics And Telecommunications Research Institute Apparatus and method for stitching panoramaic video
CN104794683A (en) * 2015-05-05 2015-07-22 中国人民解放军国防科学技术大学 Video connecting method based on planar scanning around gradient joint regions
CN105096239A (en) * 2015-07-02 2015-11-25 北京旷视科技有限公司 Method and device for image registration, method and device for image splicing
CN106210535A (en) * 2016-07-29 2016-12-07 北京疯景科技有限公司 The real-time joining method of panoramic video and device
CN107333064A (en) * 2017-07-24 2017-11-07 广东工业大学 The joining method and system of a kind of spherical panorama video
CN108093221A (en) * 2017-12-27 2018-05-29 南京大学 A kind of real-time video joining method based on suture
CN110519528A (en) * 2018-05-22 2019-11-29 杭州海康威视数字技术股份有限公司 A kind of panoramic video synthetic method, device and electronic equipment
CN110660023A (en) * 2019-09-12 2020-01-07 中国测绘科学研究院 Video stitching method based on image semantic segmentation
CN111709877A (en) * 2020-05-22 2020-09-25 浙江四点灵机器人股份有限公司 Image fusion method for industrial detection
CN111915483A (en) * 2020-06-24 2020-11-10 北京迈格威科技有限公司 Image splicing method and device, computer equipment and storage medium
CN112508849A (en) * 2020-11-09 2021-03-16 中国科学院信息工程研究所 Digital image splicing detection method and device
CN112862685A (en) * 2021-02-09 2021-05-28 北京迈格威科技有限公司 Image stitching processing method and device and electronic system

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023011013A1 (en) * 2021-08-04 2023-02-09 北京旷视科技有限公司 Splicing seam search method and apparatus for video image, and video image splicing method and apparatus
CN114708354A (en) * 2022-03-04 2022-07-05 广东省国土资源测绘院 Drawing method, equipment, medium and product of inlaid line
CN114708354B (en) * 2022-03-04 2023-06-23 广东省国土资源测绘院 Method, equipment, medium and product for drawing embedded line
TWI851993B (en) * 2022-04-18 2024-08-11 大陸商北京集創北方科技股份有限公司 Image recognition method, image recognition device and information processing device
WO2024119619A1 (en) * 2022-12-05 2024-06-13 深圳看到科技有限公司 Correction method and apparatus for picture captured underwater, and storage medium

Also Published As

Publication number Publication date
WO2023011013A1 (en) 2023-02-09
CN113793382B (en) 2024-10-18

Similar Documents

Publication Publication Date Title
CN113793382B (en) Video image seam search method, video image splicing method and device
Huang et al. Indoor depth completion with boundary consistency and self-attention
CN111160172B (en) Parking space detection method, device, computer equipment and storage medium
Kendall et al. Posenet: A convolutional network for real-time 6-dof camera relocalization
CN113902657B (en) Image stitching method, device and electronic device
CN113256699B (en) Image processing method, image processing device, computer equipment and storage medium
CN108229281B (en) Neural network generation method, face detection device and electronic equipment
Li et al. MODE: Multi-view omnidirectional depth estimation with 360° cameras
CN114677330A (en) An image processing method, electronic device and storage medium
CN114648604A (en) Image rendering method, electronic device, storage medium and program product
Zhang et al. Data association between event streams and intensity frames under diverse baselines
Ali et al. Single image Façade segmentation and computational rephotography of House images using deep learning
Han et al. Relating view directions of complementary-view mobile cameras via the human shadow
Garau et al. Fast automatic camera network calibration through human mesh recovery
CN115330695A (en) A parking information determination method, electronic device, storage medium and program product
CN113592777A (en) Image fusion method and device for double-shooting and electronic system
Zhang et al. Line-based geometric consensus rectification and calibration from single distorted manhattan image
CN116543014A (en) Panorama-integrated automatic teacher tracking method and system
CN115620403A (en) Living body detection method, electronic device, and storage medium
Hwang et al. Real-time 2d orthomosaic mapping from drone-captured images using feature-based sequential image registration
Greco et al. 360 tracking using a virtual PTZ camera
Deka et al. Erasing the ephemeral: Joint camera refinement and transient object removal for street view synthesis
CN119832132B (en) Three-dimensional virtual human video synthesis method, system, equipment and storage medium
CN118097030B (en) A 3D reconstruction method based on BundleFusion
Chen et al. Automatic photographic composition based on convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20250205

Address after: No. 257, 2nd Floor, Building 9, No. 2 Huizhu Road, Liangjiang New District, Yubei District, Chongqing, China 401123

Patentee after: Force Map New (Chongqing) Technology Co.,Ltd.

Country or region after: China

Address before: No. 1268, 1f, building 12, neijian Middle Road, Xisanqi building materials City, Haidian District, Beijing 100096

Patentee before: BEIJING KUANGSHI TECHNOLOGY Co.,Ltd.

Country or region before: China

Patentee before: MEGVII (BEIJING) TECHNOLOGY Co.,Ltd.