
WO2022148319A1 - Video switching method, apparatus, storage medium and device - Google Patents


Info

Publication number
WO2022148319A1
WO2022148319A1 · PCT/CN2021/143821 · CN2021143821W
Authority
WO
WIPO (PCT)
Prior art keywords
image frame
video
switching
target object
image
Prior art date
Application number
PCT/CN2021/143821
Other languages
English (en)
French (fr)
Inventor
夏璐
方德春
邓清珊
邹文进
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to US18/260,192 (published as US20240064346A1)
Priority to EP21917358.0A (published as EP4266208A4)
Publication of WO2022148319A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23424 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs, involving splicing one content stream with another content stream, e.g. for inserting or substituting an advertisement
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs, involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/245 Aligning, centring, orientation detection or correction of the image by locating a pattern; Special marks for positioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761 Proximity, similarity or dissimilarity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/41 Structure of client; Structure of client peripherals
    • H04N21/414 Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance
    • H04N21/41407 Specialised client platforms, e.g. receiver in car or embedded in a mobile appliance, embedded in a portable device, e.g. video client on a mobile phone, PDA, laptop
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/472 End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
    • H04N21/47205 End-user interface for manipulating displayed content, e.g. interacting with MPEG-4 objects, editing locally
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content

Definitions

  • the present application relates to the field of video and, in particular, to a video switching method, apparatus, storage medium and device.
  • the present application provides a video switching method, which can perform video switching according to a target object without manual editing by post-production personnel.
  • the technical solution is as follows:
  • a first aspect provides a video switching method, the method including: determining a target object; calculating a similarity of the target object between a first image frame and a second image frame to obtain a similarity value, wherein the first image frame is from a first video and the second image frame is from a second video; acquiring a switching image frame, wherein the switching image frame includes a first image frame and a second image frame whose similarity value is greater than or equal to a preset threshold; and switching, according to the switching image frame, the first image frame of the first video to the second image frame of the second video, or the second image frame of the second video to the first image frame of the first video.
  • the images are automatically aligned by calculating the similarity between the two image frames, thereby realizing video switching without manual editing by post-production personnel, which is convenient for users.
  • calculating the similarity of the target object between the first image frame and the second image frame to obtain the similarity value includes: acquiring the feature of the target object in the first image frame and the second image frame; and calculating the distance of the feature of the target object between the first image frame and the second image frame to obtain the similarity value.
  • the features of the target object include facial features of the target object and/or body posture features of the target object.
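As a concrete illustration of the feature-distance computation described in the two items above, the following is a minimal Python sketch. The function name `similarity` and the particular mapping from distance to a similarity value are illustrative assumptions, not something specified by the application.

```python
import math

def similarity(features_a, features_b):
    """Convert the distance between the target object's feature vectors
    (e.g. facial features or body-posture features) in two image frames
    into a similarity value in (0, 1]; identical features give 1.0."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(features_a, features_b)))
    return 1.0 / (1.0 + dist)
```

The resulting value can then be compared against the preset threshold; any monotone decreasing mapping from distance to similarity would serve the same purpose.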
  • the method further includes: providing an editing interface, where the editing interface includes the objects presented after identifying the first image frame and the second image frame; determining the target object then includes: determining the target object in response to the user's selection.
  • the editing interface further includes one or more pairs of switching image frames for the user to select; switching according to the switching image frames then includes: in response to the one or more pairs of switching image frames selected by the user, switching the first image frame of the first video to the second image frame of the second video, or the second image frame of the second video to the first image frame of the first video, according to the selected pair or pairs of switching image frames.
  • a video switching device, comprising: a determination module for determining a target object; a calculation module for calculating the similarity of the target object between a first image frame and a second image frame to obtain a similarity value, wherein the first image frame comes from a first video and the second image frame comes from a second video; an acquisition module for acquiring a switching image frame, wherein the switching image frame includes a first image frame and a second image frame whose similarity value is greater than or equal to a preset threshold; and a switching module, configured to switch, according to the switching image frame, the first image frame of the first video to the second image frame of the second video, or the second image frame of the second video to the first image frame of the first video.
  • the calculation module is specifically configured to: acquire the feature of the target object in the first image frame and the second image frame; and calculate the distance of the feature of the target object between the first image frame and the second image frame to obtain the similarity value.
  • the features of the target object include facial features of the target object and/or body posture features of the target object.
  • the device further includes: an editing module, configured to provide an editing interface, the editing interface includes objects presented after identifying the first image frame and the second image frame; then the determining module Specifically for: determining the target object in response to the user's selection.
  • the editing interface further includes one or more pairs of switching image frames for the user to select; then the switching module is specifically configured to: respond to the one or more pairs of switching image frames selected by the user, according to the pair of switching image frames. or pairs of switching image frames to switch the first image frame of the first video to the second image frame of the second video or to switch the second image frame of the second video to the first image frame of the first video an image frame.
  • the present application also provides an electronic device, the structure of the electronic device including a processor and a memory, where the memory is used to store the program that supports the electronic device in performing the video switching method provided by the above first aspect and its optional implementations, and to store the data involved in implementing that video switching method.
  • the processor executes the program stored in the memory to execute the method provided by the foregoing first aspect and its optional implementation manners.
  • the electronic device may also include a communication bus for establishing a connection between the processor and the memory.
  • the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium; when the instructions are run on a computer, the computer is caused to execute the video switching method described in the first aspect and its optional implementations.
  • by determining the target object, the similarity of the target object between the first image frame and the second image frame is calculated to obtain the similarity value, wherein the first image frame is from the first video and the second image frame is from the second video; if the similarity value is greater than or equal to a preset threshold, a pair of switching image frames is obtained from the first image frame and the second image frame, and the switching of the first video and the second video is realized according to the switching image frames, so that the image switching effect can be achieved very conveniently.
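The overall flow summarized above (determine the target object, compare frames across the two videos, pick a pair of frames meeting the threshold, and splice there) can be sketched as follows. Treating frames as plain values, the frame scan order, and the function name `switch_videos` are simplifying assumptions for illustration, not part of the application.

```python
def switch_videos(first_frames, second_frames, sim, threshold):
    """Find the first pair of frames whose target-object similarity is
    greater than or equal to the preset threshold, and join the two
    videos at that pair of switching image frames."""
    for i, f1 in enumerate(first_frames):
        for j, f2 in enumerate(second_frames):
            if sim(f1, f2) >= threshold:
                # Play the first video up to and including frame i,
                # then continue the second video from frame j onward.
                return first_frames[:i + 1] + second_frames[j:]
    return None  # no switching image frame found
```

In practice `sim` would compute the similarity of the target object's features between the two frames; here any callable returning a value comparable to the threshold works.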
  • FIG. 1 is a schematic structural diagram of a video switching device provided by an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an application of a video switching device provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of a video switching method provided by an embodiment of the present application.
  • 5a is a schematic diagram of an interactive interface provided by an embodiment of the present application.
  • 5b is a schematic diagram of a video import interface provided by an embodiment of the present application.
  • 5c is a schematic diagram of an object presentation interface provided by an embodiment of the present application.
  • FIG. 6 is a flowchart of another video switching method provided by an embodiment of the present application.
  • FIG. 7a is a schematic diagram of an editing interface provided by an embodiment of the present application.
  • FIG. 7b is a schematic diagram of a first image frame image provided by an embodiment of the present application.
  • 7c is a schematic diagram of a second image frame image provided by an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of another video switching method provided by an embodiment of the present application.
  • 200-electronic device; 110-processor; 120-external memory interface; 121-internal memory; 130-USB interface; 140-charging management module; 141-power management module; 142-battery; 1-antenna 1; 2-antenna 2; 150-mobile communication module; 160-wireless communication module; 170-audio module; 170A-speaker; 170B-receiver; 170C-microphone; 170D-headphone jack; 180-sensor module; 193-camera; 194-display screen; 195-video codec; 100-video switching device; 10-determination module; 20-calculation module; 30-acquisition module; 40-switching module; 50-editing module; 501-Dock bar; 510-main interface; 511-status bar; 512-video import interface; 513-object presentation interface; 711-switching image frame interface; 712-first image frame display interface; 713-second image frame display interface.
  • the video switching method provided by the embodiments of the present application can be used to automatically perform video switching, thereby reducing the requirement on the user's technical capability for switching video production.
  • the embodiment of the present application provides a video switching method, and the method is executed by a video switching apparatus.
  • the function of the video switching apparatus can be realized by a software system, can also be realized by a hardware device, and can also be realized by a combination of a software system and a hardware device.
  • the video switching device 100 can be logically divided into multiple modules, each module can have different functions, and the function of each module is read by the processor in the electronic device And to implement the computer instructions in the memory, the structure of the electronic device can be the electronic device shown in FIG. 2 below.
  • the video switching apparatus 100 may include a determination module 10 , a calculation module 20 , an acquisition module 30 and a switching module 40 .
  • the video switching apparatus 100 may perform the contents described in steps S40-S44, steps S61-S62 and steps S81-S85 described below. It should be noted that the embodiments of the present application only exemplarily divide the structure and functional modules of the video switching apparatus 100 , but do not make any limitations on the specific division.
  • the determining module 10 is used for determining the target object.
  • the determined target object is used for subsequent video switching, and video switching is realized by aligning the target object.
  • a video can be understood as comprising a series of image frames displayed at a given frame rate; stopping at a particular frame in the sequence yields a single image frame, i.e., an image.
  • the video may include objects.
  • the video may be a video file recorded for a specific object, and the object may be a living body, such as a person or an animal, or a static item such as a book or a TV.
  • the video may be a video recorded for a moving human body.
  • Image recognition is performed on the image frame in the video, and the object included in the image frame is recognized.
  • image frames in the video may be acquired frame by frame, and image recognition is performed on the acquired image frames to obtain the objects included in the video. It is also possible to acquire multiple image frames from a video. For example, a video including a specific object can be acquired, and multiple image frames can then be captured from it, such as the frames at the 1st, 20th, and 34th seconds of the video; each captured image frame corresponds to specific time information. As another example, multiple image frames can be captured from the video at fixed time intervals; for instance, a frame can be captured every 10 seconds, yielding the frames at the 1st, 11th, 21st, etc. seconds of the video.
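The fixed-interval capture described above can be sketched as a small helper that lists the timestamps at which frames would be taken. The name `sample_timestamps` and its parameters are illustrative assumptions.

```python
def sample_timestamps(duration_s, start_s=1, interval_s=10):
    """Return the timestamps (in seconds) at which image frames would be
    captured, e.g. the 1st, 11th, 21st, ... seconds of the video."""
    return list(range(start_s, duration_s + 1, interval_s))
```

For a 35-second video this yields the 1st, 11th, 21st, and 31st seconds; the actual frame extraction at each timestamp would be done by a decoder.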
  • the recognized objects may include character A, character B, cat C, TV D, and the like.
  • the target object may be determined according to the user's selection; for example, the identified objects are presented to the user on the editing interface, and the user selects the target object.
  • alternatively, an object that meets certain conditions in the image frames can be taken as the target object by default; for example, the object located in the middle of the screen is taken as the target object by default.
  • the calculation module 20 is configured to calculate the similarity of the target object between the first image frame and the second image frame to obtain a similarity value.
  • the first image frame is from a first video
  • the second image frame is from a second video.
  • the calculation module 20 is used to calculate the similarity of the target object between each first image frame in the first video and each second image frame in the second video; if the first video has 3 first image frames and the second video has 3 second image frames, 9 similarity values are obtained.
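The exhaustive pairwise comparison performed by the calculation module can be sketched as follows. Representing frames as opaque values, keying the result by index pairs, and the name `pairwise_similarities` are illustrative choices, not part of the application.

```python
def pairwise_similarities(first_frames, second_frames, sim):
    """Compute one similarity value of the target object per
    (first image frame, second image frame) pair:
    3 first frames x 3 second frames -> 9 values."""
    return {(i, j): sim(f1, f2)
            for i, f1 in enumerate(first_frames)
            for j, f2 in enumerate(second_frames)}
```

Any callable `sim` comparing two frames can be plugged in; a real implementation would compare the target object's extracted features.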
  • Video switching can be performed on one or more videos. For example, when performing video switching on a video, edit this video into two videos according to different scenes to obtain a first video and a second video.
  • the first video includes multiple first image frames, and the second video includes multiple second image frames. To calculate the similarity of the target object between the first image frames and the second image frames, a certain first image frame in the first video can first be obtained, and the similarity of the target object between this first image frame and all the second image frames in the second video is calculated; then the next first image frame in the first video is obtained, and the similarity of the target object between that first image frame and all the second image frames in the second video is calculated, and so on, until all the first image frames in the first video have been processed.
  • the acquisition module 30 is configured to acquire a switching image frame, where the switching image frame includes a first image frame and a second image frame whose similarity value is greater than or equal to a preset threshold. That is, if the similarity value of the target object between a first image frame and a second image frame is greater than or equal to the preset threshold, a pair of switching image frames consisting of that first image frame and that second image frame is obtained.
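The thresholding step performed by the acquisition module can be sketched as follows, assuming similarity values are stored per index pair as in an exhaustive pairwise comparison; the name `switching_pairs` is an illustrative assumption.

```python
def switching_pairs(similarities, threshold):
    """Keep only the (first frame index, second frame index) pairs whose
    similarity value is greater than or equal to the preset threshold;
    each surviving pair is one pair of switching image frames."""
    return [pair for pair, value in similarities.items() if value >= threshold]
```

Each returned pair identifies one candidate switching position between the first video and the second video.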
  • the pair of switching image frames can be understood as the switching position between the first video and the second video, that is, the position where the two videos are joined: after the first image frame of the first video is displayed, playback switches to the second video.
  • if the similarity value of the target object between the first image frame and the second image frame is greater than or equal to the preset threshold, the target object can be considered highly similar in the two frames. When performing video switching, aligning the target objects of the first image frame and the second image frame lets the user focus on the target object and ignore changes in other objects; because the similarity of the target object is high, the video transition appears smooth and natural.
  • the switching module 40 is configured to switch the first image frame of the first video to the second image frame of the second video or switch the second image frame of the second video to the second image frame according to the switching image frame the first image frame of the first video.
  • the switching of the first video and the second video is realized according to the switching image frame.
  • first, locate the image frames to be switched between the first video and the second video: the image frame to be switched in the first video is the first image frame in the pair of switching image frames, and the image frame to be switched in the second video is the second image frame in the pair of switching image frames.
  • the first image frame and the second image frame can then be joined to realize the switching of the first video and the second video: after the first image frame is displayed during playback, the next image frame is the second image frame, or after the second image frame is displayed during playback, the next image frame is the first image frame.
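Joining the two frame sequences at the switching image frames can be sketched as a list splice; treating each video as a Python list of frames and the name `splice` are simplifying assumptions for illustration.

```python
def splice(first_frames, second_frames, first_idx, second_idx):
    """Play the first video up to and including its switching frame
    (index first_idx), then continue the second video from its
    switching frame (index second_idx) onward."""
    return first_frames[:first_idx + 1] + second_frames[second_idx:]
```

Swapping the arguments produces the opposite direction, i.e., switching from the second video back to the first.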
  • the video switching device 100 may further include an editing module 50 configured to provide the user with an editing interface, where the editing interface includes objects for the user to select, namely the objects recognized after image recognition is performed on each image frame in the video. That is, the video switching apparatus 100 identifies the objects in the videos to be switched and presents the identified objects through the editing interface, so that the target object can be selected from the objects presented there.
  • the editing interface also includes one or more pairs of switching image frames for the user to select. After the video switching apparatus 100 calculates the similarity of the target object between the first image frames of the first video and the second image frames of the second video, multiple pairs of switching image frames, i.e., multiple switching positions between the first video and the second video, can be obtained.
  • video switching can be performed according to the selected switching image frames. For example, if two pairs of switching image frames are selected, after the first video is switched to the second video, the second video can also be switched back to the first video.
  • some of the modules included in the video switching apparatus 100 may also be combined into one module.
  • the acquisition module 30 and the switching module 40 may be combined into a video switching module.
  • the video switching apparatus 100 described above can be flexibly deployed.
  • the video switching apparatus 100 may be deployed on an electronic device, or it may be a software apparatus deployed on a server or a virtual machine in a cloud data center; the software apparatus can be used for video switching.
  • the electronic device may include a cell phone, tablet computer, smart watch, laptop computer, in-vehicle computer, desktop computer, wearable device, and the like.
  • please refer to FIG. 2; here the electronic device being a mobile phone is taken as an example.
  • the mobile phone shown in FIG. 2 is only an example and does not constitute a limitation on the mobile phone; the mobile phone may have more or fewer parts than shown.
  • FIG. 2 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device 200 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, Antenna 2, mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, headphone jack 170D, sensor module 180, camera 193, display screen 194, etc.
  • the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device 200.
  • the electronic device 200 may include more or less components than shown, or combine some components, or separate some components, or arrange different components.
  • the illustrated components may be implemented in hardware, software, or a combination of software and hardware.
  • the processor 110 may include one or more processing units; for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec 195, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. Different processing units may be independent devices, or may be integrated in one or more processors.
  • Memory may also be provided in the processor 110 for storing computer instructions and data.
  • the memory in processor 110 is cache memory.
  • the memory may hold computer instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs the computer instructions or data again, it can call them directly from this memory, which avoids repeated accesses and reduces the waiting time of the processor 110, thereby improving the efficiency of the system.
  • the video switching apparatus 100 runs in the processor 110, and the function of each module in the video switching apparatus 100 is read by the processor 110 and executes relevant computer instructions to realize video switching.
  • the video switching apparatus 100 may be deployed in a memory, and the processor 110 reads and executes computer instructions from the memory to implement the video switching.
  • the processor 110 may include one or more interfaces.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
  • the charging management module 140 is used to receive charging input from the charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module 140 may receive charging input from the wired charger through the USB interface 130 .
  • the charging management module 140 may receive wireless charging input through a wireless charging coil of the electronic device 200 . While the charging management module 140 charges the battery 142 , the electronic device 200 can also be powered by the power management module 141 .
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, the internal memory 121, the display screen 194, the camera 193, and the wireless communication module 160.
  • the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, battery health status (leakage, impedance).
  • the power management module 141 may also be provided in the processor 110 .
  • the power management module 141 and the charging management module 140 may also be provided in the same device.
  • the wireless communication function of the electronic device 200 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the electronic device 200 may be used to cover a single communication frequency band or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization.
  • the antenna 1 can be multiplexed as a diversity antenna of the wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
  • the mobile communication module 150 can provide a wireless communication solution including 2G/3G/4G/5G, etc. applied on the electronic device 200 .
  • the mobile communication module 150 may include one or more filters, switches, power amplifiers, low noise amplifiers (LNAs), and the like.
  • the mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor, and then turn it into an electromagnetic wave for radiation through the antenna 1 .
  • at least part of the functional modules of the mobile communication module 150 may be provided in the processor 110 .
  • at least some of the functional modules of the mobile communication module 150 may be provided in the same device as at least some of the modules of the processor 110.
  • the modem processor may include a modulator and a demodulator.
  • the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low frequency baseband signal is processed by the baseband processor and passed to the application processor.
  • the application processor outputs sound signals through audio devices (not limited to the speaker 170A, the receiver 170B, etc.), or displays images or videos through the display screen 194 .
  • the modem processor may be a stand-alone device.
  • the modem processor may be independent of the processor 110, and may be provided in the same device as the mobile communication module 150 or other functional modules.
  • the wireless communication module 160 can provide wireless communication solutions applied on the electronic device 200, including wireless local area network (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like.
  • the wireless communication module 160 may be one or more devices integrating one or more communication processing modules.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110, perform frequency modulation on it, amplify it, and convert it into electromagnetic waves for radiation through the antenna 2.
  • the antenna 1 of the electronic device 200 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 200 can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite based augmentation system (SBAS).
  • the electronic device 200 implements a display function through a GPU, a display screen 194, an application processor, and the like.
  • the GPU is a microprocessor for image processing, and is connected to the display screen 194 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • Processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
  • Display screen 194 is used to display images, videos, and the like.
  • Display screen 194 includes a display panel.
  • the display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLed, a MicroLed, a Micro-oLed, a quantum dot light-emitting diode (QLED), and so on.
  • the electronic device 200 may include one or N display screens 194 , where N is a positive integer greater than one.
  • the electronic device 200 can realize the shooting function through the ISP, the camera 193, the video codec 195, the GPU, the display screen 194 and the application processor.
  • the ISP is used to process the data fed back by the camera 193 .
  • when the shutter is opened, light is transmitted through the lens to the photosensitive element of the camera, where the optical signal is converted into an electrical signal; the photosensitive element of the camera transmits the electrical signal to the ISP for processing, and the ISP converts it into an image visible to the naked eye.
  • the ISP can also perform algorithm optimization on image noise, brightness, and skin tone, and can also optimize parameters such as the exposure and color temperature of the shooting scene.
  • the ISP may be provided in the camera 193 .
  • Camera 193 is used to capture still images or video.
  • an optical image of the object is generated through the lens and projected onto the photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then transmits the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs the digital image signal to the DSP for processing.
  • the DSP converts digital image signals into image signals in standard formats such as RGB and YUV.
  • the electronic device 200 may include 1 or N cameras 193 , where N is a positive integer greater than 1.
  • a digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the electronic device 200 selects a frequency point, the digital signal processor is used to perform Fourier transform on the frequency point energy, and the like.
  • Video codec 195 is used to compress or decompress digital video.
  • the electronic device 200 may support one or more video codecs 195 .
  • the electronic device 200 can play or record videos in various encoding formats, such as: moving picture experts group (moving picture experts group, MPEG) 1, MPEG2, MPEG3, MPEG4, and so on.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 200 .
  • the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function, for example, to save files such as music and videos in the external memory card.
  • Internal memory 121 may be used to store one or more computer programs including instructions.
  • the processor 110 may execute the above-mentioned instructions stored in the internal memory 121, thereby causing the electronic device 200 to execute the video switching method provided in some embodiments of the present application, as well as various functional applications and data processing.
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area may store the operating system; it may also store one or more application programs (such as Gallery, Contacts, etc.) and the like.
  • the storage data area may store data (such as photos, contacts, etc.) created during the use of the electronic device 200 and the like.
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, universal flash storage (UFS), and the like.
  • the processor 110 causes the electronic device 200 to perform the video switching method provided in the embodiments of the present application, as well as various functional applications and data processing, by executing the instructions stored in the internal memory 121 and/or the instructions stored in the memory provided in the processor.
  • the electronic device 200 may implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone jack 170D, and the application processor. Such as music playback, recording, etc.
  • the audio module 170 is used for converting digital audio information into analog audio signal output, and also for converting analog audio input into digital audio signal. Audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110 , or some functional modules of the audio module 170 may be provided in the processor 110 .
  • the speaker 170A, also referred to as a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the electronic device 200 can listen to music through the speaker 170A, or listen to a hands-free call.
  • the receiver 170B, also referred to as an "earpiece", is used to convert audio electrical signals into sound signals.
  • the voice can be answered by placing the receiver 170B close to the human ear.
  • the microphone 170C, also called a "mike" or a "mic", is used to convert sound signals into electrical signals.
  • the user can speak with the mouth close to the microphone 170C to input a sound signal into the microphone 170C.
  • the electronic device 200 may be provided with one or more microphones 170C.
  • the electronic device 200 may be provided with two microphones 170C, which may implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device 200 may further be provided with three, four or more microphones 170C to collect sound signals, reduce noise, identify sound sources, and implement directional recording functions.
  • the earphone jack 170D is used to connect wired earphones.
  • the earphone interface 170D may be the USB interface 130, or may be a 3.5mm open mobile terminal platform (OMTP) standard interface or a cellular telecommunications industry association of the USA (CTIA) standard interface.
  • the sensor module 180 may include a pressure sensor, a gyro sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
  • the touch sensor may be arranged on the display screen; the touch sensor and the display screen together form the touchscreen, also called a "touch screen".
  • the above electronic device 200 may also include one or more components such as buttons, motors, indicators, and SIM card interfaces, which are not limited in this embodiment of the present application.
  • when the video switching device is a hardware device, it may be the electronic device 200 described above, including a display screen 194, a processor 110 and an internal memory 121; the internal memory 121 may exist independently and be connected to the processor 110 through a communication bus.
  • the internal memory 121 may also be integrated with the processor 110 .
  • the internal memory 121 may store computer instructions, and when the computer instructions stored in the internal memory 121 are executed by the processor 110, the video switching method of the present application may be implemented.
  • the internal memory 121 may also store data required by the processor in the process of executing the video switching method of the embodiment of the present application and the generated intermediate data and/or result data.
  • FIG. 3 is a schematic diagram of an application of the video switching apparatus in this application.
  • the function provided by the video switching apparatus 100 may be abstracted into an application, i.e., a video switching application, by an electronic device supplier or an application supplier.
  • the electronic device supplier may pre-install the video switching application on the electronic device 200, or the application supplier may allow the user to purchase the video switching application.
  • the user can use the video switching application installed on the electronic device 200, or download the video switching application online, and then use the video switching application to perform video switching.
  • FIG. 4 is a flowchart of a video switching method provided by an embodiment of the present application.
  • the video switching method can be performed by the aforementioned video switching device, referring to FIG. 4 , the method includes the following steps:
  • Step S40 Provide an editing interface to the user.
  • the function of the video switching device is abstracted into a video switching application.
  • the interface of the mobile phone includes a status bar 511 , a main interface 510 and a Dock bar 501 .
  • the status bar 511 may include the operator's name (e.g., China Mobile), the time, the signal strength, the current remaining power, and the like. The status bar 511 in the following figures is similar and will not be described again here.
  • the main interface 510 includes applications, including embedded applications and downloadable applications. As shown in FIG. 5a, the main interface 510 includes calendar, alarm clock, and video switching applications.
  • the Dock bar 501 includes commonly used applications such as Phone, Messages, and Camera.
  • the user can import the to-be-processed video through the video import interface 512 to edit it and realize video switching.
  • the user can obtain the to-be-processed video by reading a video from the gallery on the mobile phone, by shooting with the camera, or by downloading the corresponding video from a web page, which is not specifically limited in this application.
  • videos for user selection are presented on the video import interface 512, including video 1, video 2 and video 3.
  • the content of each video is different, and the durations can also differ; for example, the duration of video 1 may differ from that of video 2.
  • alternatively, the durations of the videos may be the same.
  • the user can select one video or multiple videos. When the user selects one video, the video can be edited into two or more videos according to recognition of the scenes in the video, or the user determines the clipping positions and the number of clips. It can be understood that if the user selects one video to perform video switching, the target objects in the resulting videos are the same, but the scenes are different. The scene includes objects other than the target object; the objects other than the target object include characters, the background environment, etc., and the background environment can include grassland, indoor scenes, the sky, stationary objects, and the like. For example, if a user shoots a video in which the scene changes from indoor to outdoor, the video can be edited into an indoor video and an outdoor video.
  • Video 1 and video 3 can be videos of the same dance performed by the same dancer shot at the same angle.
  • the objects in video 1 and video 3 are the same dancer, but the dancer's clothing, makeup or hairstyle is different, and the scenes in video 1 and video 3 are also different.
  • for example, video 1 is the first video and video 2 is the second video.
  • it is understandable that other videos may also be selected as videos to be processed; that is, the number of videos to be processed is not limited to two, and the number of videos to be processed is not specifically limited in this application.
  • after determining the video to be processed, the video switching device performs image processing on the image frames in the to-be-processed video to identify the objects in the image frames.
  • the image frames in the video can be acquired frame by frame, and image recognition can be performed on the acquired image frames to obtain the objects in the video. It is also possible to acquire multiple image frames from a video. For example, a video including specific objects can be acquired, and then multiple image frames can be intercepted from the video, for example, the image frames at the 1st, 11th, 20th, and 34th seconds of the video, wherein each image frame corresponds to specific time information.
  • for another example, it is also possible to intercept multiple image frames from the video at certain time intervals; for example, the video can be intercepted every 10 seconds to obtain the image frames at the 1st, 11th, 21st, etc. seconds of the video.
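As an illustrative sketch of the interval-based interception described above (the function names, the fixed 10-second interval and the frame rate are assumptions for illustration, not specified by this application):

```python
def sample_timestamps(duration_s, interval_s=10, start_s=1):
    """Timestamps (in seconds) at which image frames would be
    intercepted, e.g. the 1st, 11th, 21st, ... seconds of the video."""
    return list(range(start_s, int(duration_s) + 1, interval_s))

def frame_indices(duration_s, fps, interval_s=10, start_s=1):
    """Convert the sampled timestamps into frame indices for a video
    with the given frame rate."""
    return [int(t * fps) for t in sample_timestamps(duration_s, interval_s, start_s)]

print(sample_timestamps(35))      # [1, 11, 21, 31]
print(frame_indices(35, fps=30))  # [30, 330, 630, 930]
```

In practice the frames themselves would then be read from the video file at these indices with a video decoding library.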
  • the editing interface may further include an object presentation interface 513, on which the recognition results of the objects in the videos to be processed are presented, including the faces of object A, object B, and object C. It can be understood that object A, object B and object C are the objects recognized after performing image recognition on the image frames in video 1 and video 2.
  • the object presentation interface 513 may present the face or the whole of the character.
  • the image frame of video 1 includes object A and/or object B and/or object C
  • the image frame of video 2 includes object A and/or object B and/or object C
  • the image frames of video 1 and video 2 include object A and/or object B and/or object C.
  • the recognized objects include object A, object B and object C.
  • the number of recognized objects is not limited, and the number of recognized objects is determined by the actual number of objects in the video.
  • if the user selects one segment of video, the video switching device edits this segment into multiple segments, and the videos to be processed are the multiple edited segments. The video switching device then screens out image frames in which the same person has similar facial expressions and body postures and which are separated by a certain time interval. Since a person's expression and posture change very little within a short period of time, to achieve the best effect, the screened image frames need to be separated by a certain time interval.
  • Step S41 Determine the target object.
  • the video switching device can, according to the target person, screen out the image frames in the first image frames and the second image frames in which the same target person has similar facial expressions and similar body postures.
  • the image frames in the first video can be processed frame by frame to perform face recognition, and whether a face is the target object is determined by applying a face recognition algorithm to the RGB data of the face in the image frame.
  • the processing of the image frames in the second video is the same, and details are not repeated here.
  • a rectangular frame of the human face can be obtained, and then the face in the rectangular frame can be identified by using face recognition technology; for example, Face ID technology can be used to label the face to determine which person in the video the face belongs to, and then to determine the target object in the first image frame.
  • the processing of the second image frame is the same, and details are not repeated here.
  • object A, object B, and object C are presented in the object presentation interface 513, and the user can click on object A to select and determine object A as the target object.
  • the video switching device may automatically determine the object located in the center of the image frame as the target object.
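The automatic choice described above, picking the object whose bounding box is closest to the centre of the image frame, can be sketched as follows (the bounding-box representation and the object labels are hypothetical):

```python
def pick_center_object(objects, frame_w, frame_h):
    """objects maps an object label to its (x, y, w, h) bounding box.
    Returns the label whose box centre is closest to the frame centre."""
    cx, cy = frame_w / 2, frame_h / 2

    def squared_dist(box):
        x, y, w, h = box
        bx, by = x + w / 2, y + h / 2  # box centre
        return (bx - cx) ** 2 + (by - cy) ** 2

    return min(objects, key=lambda label: squared_dist(objects[label]))

# 1920x1080 frame; object B's box is centred in the frame
boxes = {"A": (100, 100, 200, 300), "B": (860, 390, 200, 300), "C": (1600, 600, 200, 300)}
print(pick_center_object(boxes, 1920, 1080))  # B
```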
  • Step S42 Calculate the similarity of the target object between the first image frame and the second image frame to obtain a similarity value.
  • a certain first image frame A1 in the first video may be acquired first, and then the first image frame A1 may be compared with all the second image frames in the second video. For example, a certain second image frame B1 in the second video is selected, and the similarity of the target object between the first image frame A1 and the second image frame B1 is calculated; then the next second image frame B2 in the second video is obtained, and the similarity of the target object between the first image frame A1 and the second image frame B2 is calculated, and so on, until the similarity of the target object between the first image frame A1 and all the second image frames in the second video has been calculated.
  • alternatively, a certain second image frame B1 may be fixed first: the similarity of the target object between the first image frame A1 and the second image frame B1 is calculated, then the next first image frame A2 in the first video is obtained and the similarity of the target object between the first image frame A2 and the second image frame B1 is calculated, and so on, to calculate the similarity of the target object between the first image frames and the second image frames.
  • the similarity of the target object can be calculated in the following manner to obtain the similarity value.
  • Step S61 Acquire the characteristics of the target object in the first image frame and the second image frame.
  • Step S62 Calculate the distance of the feature of the target object between the first image frame and the second image frame to obtain a similarity value.
  • the features of the target object, such as facial features and/or body posture features of the target object, may be acquired.
  • for example, one or more of the two-dimensional features, three-dimensional features, and face mesh of the human face can be obtained; the distance between the two-dimensional features of the human face in the first image frame and the two-dimensional features of the human face in the second image frame is calculated to obtain a distance measure, and the similarity value is then obtained according to the distance measure.
  • alternatively, the distance measures of the above features can be integrated, and the final similarity value can be obtained after processing.
  • the distance may be Euclidean distance, cosine distance, etc., which is not specifically limited in this application. It can be understood that the distance metric is used to measure the distance between individuals in space, and the farther the distance is, the greater the difference between individuals.
  • the similarity measure is to calculate the degree of similarity between individuals. Contrary to the distance measure, the smaller the value of the similarity measure, the smaller the similarity between individuals and the greater the difference.
  • the distance between the facial features and/or the body posture features of the target object may be calculated to ensure that the face of the target object is similar in the first image frame and the second image frame, or to ensure that the body posture of the target object is similar in the first image frame and the second image frame, or to ensure that both the face and the body posture of the target object are similar in the first image frame and the second image frame.
  • the similarity error value of the target object between the first image frame and the second image frame may be calculated; that is, the distance between the facial features and/or the body posture features of the target object is calculated, and the similarity value is obtained according to the similarity error value. It can be understood that the larger the similarity error value, the smaller the similarity between individuals and the greater the difference.
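Steps S61 and S62 can be sketched with plain feature vectors. The mapping 1/(1+d) from the distance (the similarity error value) to a similarity value is one possible choice for illustration, not mandated by this application:

```python
import math

def euclidean(u, v):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_distance(u, v):
    """Cosine distance: 0 for identical directions, up to 2 for opposite ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

def similarity_value(feat1, feat2):
    """Map the similarity error value into (0, 1]:
    the larger the error, the smaller the similarity."""
    return 1.0 / (1.0 + euclidean(feat1, feat2))

print(similarity_value([1.0, 2.0], [1.0, 2.0]))  # 1.0 (identical features)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))   # 1.0 (orthogonal features)
```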
  • Step S43 Acquire a switching image frame, wherein the switching image frame includes a first image frame and a second image frame whose similarity value is greater than or equal to a preset threshold.
  • when the similarity value is greater than or equal to the preset threshold, the features of the target object in the first image frame and the second image frame are similar, such as similar facial features and/or body postures, while the scenes of the first image frame and the second image frame, or the clothing, hairstyle, etc. of the target object, may not be similar.
  • in this case, a pair of switching image frames can be obtained, including the first image frame and the second image frame.
  • multiple pairs of switching image frames can be obtained.
  • for example, if the similarity value between the target object of the first image frame A1 of the first video and the target object of the second image frame B1 of the second video is greater than the preset threshold, a pair of switching image frames including the first image frame A1 and the second image frame B1 is obtained.
  • if the similarity value between the target object of the first image frame A2 of the first video and the target object of the second image frame B2 of the second video is greater than the preset threshold, a pair of switching image frames including the first image frame A2 and the second image frame B2 is obtained.
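Step S43 then amounts to thresholding the pairwise similarity values. A minimal sketch, where the per-frame scalar "features" and the 1/(1+|a-b|) similarity are toy assumptions standing in for the real feature distances:

```python
def find_switch_pairs(feats1, feats2, similarity, threshold):
    """Return (i, j) index pairs whose target-object similarity value is
    greater than or equal to the preset threshold; feats1 and feats2 hold
    per-frame features of the first and second video."""
    return [(i, j)
            for i, f1 in enumerate(feats1)
            for j, f2 in enumerate(feats2)
            if similarity(f1, f2) >= threshold]

toy_similarity = lambda a, b: 1.0 / (1.0 + abs(a - b))
print(find_switch_pairs([0.0, 5.0], [0.1, 9.0], toy_similarity, 0.5))  # [(0, 0)]
```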
  • the editing interface may further include a switching image frame interface 711 (as shown in FIG. 7 a ), and the user may select an image frame to be switched through the switching image frame interface 711 .
  • the switching image frame 1 and the switching image frame 2 are presented on the switching image frame interface 711.
  • the switching image frame interface 711 may include multiple pairs of switching image frames.
  • the switching image frame 1 includes the first image frame A100 and the second image frame B200; the image frame connected after the first image frame A100 is the second image frame B200, or the image frame connected after the second image frame B200 is the first image frame A100.
  • Step S44 Switch the first image frame of the first video to the second image frame of the second video, or switch the second image frame of the second video to the first image frame of the first video, according to the switching image frame.
  • the switching of the first video to the second video according to the switching image frame may be implemented as: switching the first video and the second video according to the obtained switching image frame , or implemented as switching the first video and the second video according to the switching image frame selected by the user.
  • the switching between the first video and the second video according to the switching image frame may specifically include: determining the switching positions of the first video and the second video according to the switching image frame, and then performing the video switching according to the switching positions of the first video and the second video.
  • a pair of switching image frames includes a first image frame A10 and a second image frame B10, and the switching of the first video and the second video is implemented according to the first image frame A10 and the second image frame B10, and the first An image frame A10 is connected to the second image frame B10, that is, the image frame after the first image frame A10 is the second image frame B10, or the image frame after the second image frame B10 is the first image frame A10. So as to switch to the second image frame B10 when the first image frame A10 is played. Or, switch to the first image frame A10 when playing the second image frame B10.
  • The first-image-frame display interface 712 in FIG. 7b presents the image of the first image frame, and the second-image-frame display interface 713 in FIG. 7c presents the image of the second image frame.
  • The image of the first image frame includes the target object, the grass, and the clouds, while the image of the second image frame in FIG. 7c includes the target object.
  • The face and/or body posture of the target object in the first image frame is similar to that in the second image frame, but the scene differs; for example, the target object's clothing and the background are different.
  • The switching image frames can be merged into one video to implement the switch between the first video and the second video. If all obtained switching image frames are merged into one video, the first image frame and the second image frame are adjacent in the merged video, and the merged video differs from both the first video and the second video.
  • To achieve a better effect, some image frames may be added as appropriate. For example, two pairs of switching image frames are obtained: one pair includes the first image frame A10 and the second image frame B20, and the other pair includes the first image frame A31 and the second image frame B41. When merging into one video according to the switching image frames, the first image frames A1 to A9 before the first image frame A10, the second image frames B21 to B40 after the second image frame B20, and the first image frames after the first image frame A31 can be obtained.
  • The first image frames A1 to A10, the second image frames B20 to B41, and the first image frame A31 together with the subsequent first image frames may then be merged into one video.
  • Playing this video sequentially starts from the first image frame A1, proceeds to the first image frame A10, switches to the second image frame B20 after A10 is displayed (rather than continuing with the first image frame A11), then plays the second image frames B21 to B40, switches to the first image frame A31 after the second image frame B41 is displayed, and then plays the image frames after the first image frame A31.
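The merge described above can be sketched as a simple sequence operation. The following is a minimal illustrative sketch (our own, not the patent's implementation); frames are represented by string labels, and the switching pairs (A10, B20) and (A31, B41) are the hypothetical pairs from the example:

```python
# Sketch: merge two frame sequences at the switching pairs from the example.
# Frames are string labels; a real implementation would carry decoded images.

def merge_at_switch_points(first, second, pairs):
    """Build one frame sequence that follows `first` up to each switch frame,
    jumps into `second`, and jumps back at the next switch frame."""
    merged = []
    i = 0            # current read position in the active sequence
    use_first = True # which sequence we are currently playing
    for a, b in pairs:
        if use_first:
            j = first.index(a)               # play up to and including frame a
            merged.extend(first[i:j + 1])
            i = second.index(b)              # next frame played is b
            use_first = False
        else:
            j = second.index(b)              # play up to and including frame b
            merged.extend(second[i:j + 1])
            i = first.index(a)               # next frame played is a
            use_first = True
    merged.extend(first[i:] if use_first else second[i:])
    return merged

first = [f"A{n}" for n in range(1, 41)]    # A1..A40
second = [f"B{n}" for n in range(1, 51)]   # B1..B50

# Two switching pairs, as in the example: (A10, B20) and (A31, B41).
merged = merge_at_switch_points(first, second, [("A10", "B20"), ("A31", "B41")])
print(merged[9:12])   # ['A10', 'B20', 'B21']: A11 is skipped at the switch
```

The merged sequence plays A1..A10, jumps to B20..B41, then returns to A31 and continues, matching the playback order described in the bullets above.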
  • The target objects of the image frames in a switching pair may be aligned; for example, the target object in the first image frame and the target object in the second image frame are aligned at the same position in the picture, so that when the display switches from the first image frame to the second image frame, the user sees little visual change in the target object.
  • When the video to be processed includes three video segments, a pair of switching image frames for video 1 and video 2 can be obtained, then a pair for video 2 and video 3, then a pair for video 3 and video 1, and so on.
  • The video switching method can perform video switching automatically.
  • The user only needs to input the video to be processed and determine the target object; the switch is then performed automatically according to the characteristics of the target object, avoiding wasted labor and time.
  • FIG. 8 is a schematic flowchart of video switching provided by an embodiment of this application.
  • The following description takes a person's face as the target object.
  • Step S81: Obtain a face bounding rectangle from the RGB image.
  • The RGB images of the first image frame and the second image frame are acquired, and image processing is performed on both to obtain face bounding rectangles; that is, the region where the face is located is identified in each RGB image and framed with a rectangle.
  • Step S82: Label the face using recognition technology to determine the target object.
  • The face in the bounding rectangle is identified by face recognition; for example, the recognized face is labeled using Face ID technology to determine which person in the video it belongs to.
  • The video switching apparatus may determine the face located in the middle of the picture as the target object according to its position, or the user may specify which person is the target object.
  • Step S83: Compute 2D face feature points and/or 3D face feature points and/or a face mesh.
  • After step S82 determines that the target object in the first image frame and the second image frame is the same person, the similarity of the target object between the two frames is computed.
  • The two-dimensional face feature points of the target object are obtained from the first image frame and from the second image frame; the distance between the two sets of feature points is then computed to obtain a similarity error value, determining the difference of the target object's face between the two frames.
  • And/or, the three-dimensional face feature points of the target object are obtained from the first image frame and from the second image frame; the distance between the two sets of feature points is then computed to obtain a similarity error value, determining the difference of the target object's face between the two frames.
  • And/or, the face mesh points of the target object are obtained from the first image frame and from the second image frame; the distance between the two sets of mesh points is then computed to obtain a similarity error value, determining the difference of the target object's face between the two frames.
  • The similarity error value may be obtained from the difference values between the two-dimensional feature points and/or the three-dimensional feature points and/or the mesh points.
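As a rough illustration of the distance computation described above, the following sketch (our own, not the patent's code) treats the 2D face feature points of each frame as a list of (x, y) pairs and uses the mean Euclidean distance between corresponding points as the similarity error value; the landmark coordinates and the threshold are hypothetical:

```python
import math

def similarity_error(points_a, points_b):
    """Mean Euclidean distance between corresponding 2D feature points.
    Smaller values mean the two faces are more similar."""
    assert len(points_a) == len(points_b)
    total = 0.0
    for (xa, ya), (xb, yb) in zip(points_a, points_b):
        total += math.hypot(xa - xb, ya - yb)
    return total / len(points_a)

# Hypothetical landmark sets for the target object's face in two frames:
# every point in the second frame is shifted 3 pixels down.
frame1_pts = [(10.0, 20.0), (30.0, 20.0), (20.0, 35.0)]
frame2_pts = [(10.0, 23.0), (30.0, 23.0), (20.0, 38.0)]

err = similarity_error(frame1_pts, frame2_pts)
print(err)  # 3.0

ERROR_THRESHOLD = 5.0                  # hypothetical error threshold
is_switch_pair = err <= ERROR_THRESHOLD  # the comparison of step S84
print(is_switch_pair)  # True
```

The same shape of computation applies to 3D feature points or mesh vertices; the per-feature error values could then be combined (for example, averaged) into the final similarity error value.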
  • Step S84: Determine whether the similarity error value is less than or equal to the error threshold.
  • Step S85: Obtain the switching image frames.
  • If the similarity error value between the two image frames is less than or equal to the error threshold, the two frames are selected as a pair of switching image frames, and video switching is then performed according to them.
  • The video switching apparatus automatically aligns the faces in the two image frames completely and performs the switch, so that the two videos are joined seamlessly and cleverly, achieving high face similarity between the two switched frames and a striking environment-switching effect.
  • The computer program product for implementing video switching includes one or more computer instructions for performing video switching; when these computer program instructions are loaded and executed on a computer, the processes or functions described in FIG. 4 and FIG. 6 of the embodiments of this application are produced in whole or in part.
  • The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means.
  • The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center that integrates one or more available media.
  • The available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., digital versatile disc (DVD)), or semiconductor media (e.g., solid state disk (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Security & Cryptography (AREA)
  • Television Signal Processing For Recording (AREA)
  • Studio Devices (AREA)

Abstract

This application discloses a video switching method, apparatus, storage medium, and device, belonging to the field of video processing. In embodiments of this application, a target object is determined; a similarity of the target object between a first image frame and a second image frame is computed to obtain a similarity value, where the first image frame comes from a first video and the second image frame comes from a second video; switching image frames are obtained, where the switching image frames include a first image frame and a second image frame whose similarity value is greater than or equal to a preset threshold; and, according to the switching image frames, the first image frame of the first video is switched to the second image frame of the second video, or the second image frame of the second video is switched to the first image frame of the first video. In other words, embodiments of this application can perform video switching automatically according to the similarity of the target object in two image frames.

Description

Video switching method, apparatus, storage medium, and device
This application claims priority to Chinese Patent Application No. 202110008033.0, filed with the China National Intellectual Property Administration on January 5, 2021 and entitled "Video switching method, apparatus, storage medium, and device", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the video field, and in particular, to a video switching method, apparatus, storage medium, and device.
Background
With the rise of short videos, more and more people record and share the joys and sorrows of daily life through short videos. In practical application scenarios, users with diverse needs often wish to combine multiple video clips of different scenes into one video with good transition effects, bringing a very striking switching experience. At present, however, such transitions are mainly produced through manual editing by post-production staff.
Summary
This application provides a video switching method that can switch videos according to a target object without manual editing by post-production staff. The technical solution is as follows:
According to a first aspect, a video switching method is provided, the method including: determining a target object; computing a similarity of the target object between a first image frame and a second image frame to obtain a similarity value, where the first image frame comes from a first video and the second image frame comes from a second video; obtaining switching image frames, where the switching image frames include a first image frame and a second image frame whose similarity value is greater than or equal to a preset threshold; and switching, according to the switching image frames, the first image frame of the first video to the second image frame of the second video, or the second image frame of the second video to the first image frame of the first video.
In embodiments of this application, images are automatically aligned by computing the similarity between two image frames, thereby implementing video switching without manual editing by post-production staff, which is convenient for users.
Optionally, computing the similarity of the target object between the first image frame and the second image frame to obtain the similarity value includes: obtaining features of the target object in the first image frame and the second image frame; and computing a distance between the features of the target object in the first image frame and the second image frame to obtain the similarity value.
Optionally, the features of the target object include facial features of the target object and/or body posture features of the target object.
Optionally, the method further includes: providing an editing interface, where the editing interface includes objects presented after recognition is performed on the first image frame and the second image frame; and determining the target object includes: determining the target object in response to a user's selection.
Optionally, the editing interface further includes one or more pairs of switching image frames for the user to select; and switching, according to the switching image frames, the first image frame of the first video to the second image frame of the second video, or the second image frame of the second video to the first image frame of the first video includes: in response to one or more pairs of switching image frames selected by the user, switching the first image frame of the first video to the second image frame of the second video, or the second image frame of the second video to the first image frame of the first video, according to the one or more selected pairs.
According to a second aspect, a video switching apparatus is provided, the apparatus including: a determining module configured to determine a target object; a computing module configured to compute a similarity of the target object between a first image frame and a second image frame to obtain a similarity value, where the first image frame comes from a first video and the second image frame comes from a second video; an obtaining module configured to obtain switching image frames, where the switching image frames include a first image frame and a second image frame whose similarity value is greater than or equal to a preset threshold; and a switching module configured to switch, according to the switching image frames, the first image frame of the first video to the second image frame of the second video, or the second image frame of the second video to the first image frame of the first video.
Optionally, the computing module is specifically configured to: obtain features of the target object in the first image frame and the second image frame; and compute a distance between the features of the target object in the first image frame and the second image frame to obtain the similarity value.
Optionally, the features of the target object include facial features of the target object and/or body posture features of the target object.
Optionally, the apparatus further includes: an editing module configured to provide an editing interface, where the editing interface includes objects presented after recognition is performed on the first image frame and the second image frame; and the determining module is specifically configured to determine the target object in response to a user's selection.
Optionally, the editing interface further includes one or more pairs of switching image frames for the user to select; and the switching module is specifically configured to: in response to one or more pairs of switching image frames selected by the user, switch the first image frame of the first video to the second image frame of the second video, or the second image frame of the second video to the first image frame of the first video, according to the one or more selected pairs.
According to a third aspect, this application further provides an electronic device whose structure includes a processor and a memory. The memory is configured to store a program that supports the electronic device in performing the video switching method provided in the first aspect and its optional implementations, and to store the data involved in implementing that method. The processor executes the program stored in the memory to perform the method provided in the first aspect and its optional implementations. The electronic device may further include a communication bus for establishing a connection between the processor and the memory.
According to a fourth aspect, this application further provides a computer-readable storage medium storing instructions that, when run on a computer, cause the computer to perform the video switching method described in the first aspect and its optional implementations.
The technical effects obtained by the second, third, and fourth aspects are similar to those obtained by the corresponding technical means in the first aspect, and are not repeated here.
The beneficial effects of the technical solutions provided by this application include at least the following:
In embodiments of this application, a target object is determined, and a similarity of the target object between a first image frame and a second image frame is computed to obtain a similarity value, where the first image frame comes from a first video and the second image frame comes from a second video; if the similarity value is greater than or equal to a preset threshold, a pair of switching image frames is obtained from the first image frame and the second image frame, and switching between the first video and the second video is implemented according to the switching image frames, achieving the image-switching effect very conveniently.
Brief Description of Drawings
FIG. 1 is a schematic structural diagram of a video switching apparatus provided by an embodiment of this application;
FIG. 2 is a schematic structural diagram of an electronic device provided by an embodiment of this application;
FIG. 3 is a schematic diagram of an application of a video switching apparatus provided by an embodiment of this application;
FIG. 4 is a schematic flowchart of a video switching method provided by an embodiment of this application;
FIG. 5a is a schematic diagram of an interaction interface provided by an embodiment of this application;
FIG. 5b is a schematic diagram of a video import interface provided by an embodiment of this application;
FIG. 5c is a schematic diagram of an object presentation interface provided by an embodiment of this application;
FIG. 6 is a flowchart of another video switching method provided by an embodiment of this application;
FIG. 7a is a schematic diagram of an editing interface provided by an embodiment of this application;
FIG. 7b is a schematic diagram of a first image frame provided by an embodiment of this application;
FIG. 7c is a schematic diagram of a second image frame provided by an embodiment of this application;
FIG. 8 is a schematic flowchart of another video switching method provided by an embodiment of this application.
Description of main reference numerals
200-electronic device; 110-processor; 120-external memory interface; 121-internal memory; 130-USB interface; 140-charging management module; 141-power management module; 142-battery; 1-antenna; 2-antenna; 150-mobile communication module; 160-wireless communication module; 170-audio module; 170A-speaker; 170B-receiver; 170C-microphone; 170D-headset jack; 180-sensor module; 193-camera; 194-display; 195-video codec; 100-video switching apparatus; 10-determining module; 20-computing module; 30-obtaining module; 40-switching module; 50-editing module; 501-Dock bar; 510-home screen; 511-status bar; 512-video import interface; 513-object presentation interface; 711-switching-image-frame interface; 712-first-image-frame display interface; 713-second-image-frame display interface.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the implementations of this application are described in further detail below with reference to the accompanying drawings.
Before the embodiments of this application are explained in detail, the application scenarios involved are introduced.
At present, more and more users produce and share short videos on social platforms. When producing a short video, a user hopes to combine multiple video clips into one video and switch between video scenes. However, this is currently done manually by post-production staff, who search the clips frame by frame for the image frames at which to switch and arrange the found frames to implement the switch. An ordinary user without video editing tools or skills cannot produce a video with such transition effects, and manual post-production editing is labor-intensive.
In view of this, the video switching method provided by embodiments of this application can be used to switch videos automatically, reducing the technical skill required of the user.
An embodiment of this application provides a video switching method performed by a video switching apparatus. The functions of the video switching apparatus may be implemented by a software system, by a hardware device, or by a combination of both.
When the video switching apparatus is a software apparatus, referring to FIG. 1, the video switching apparatus 100 may be logically divided into multiple modules, each with a different function. The function of each module is implemented by a processor in an electronic device reading and executing computer instructions in a memory; the structure of the electronic device may be as shown in FIG. 2 below. Exemplarily, the video switching apparatus 100 may include a determining module 10, a computing module 20, an obtaining module 30, and a switching module 40. In a specific implementation, the video switching apparatus 100 may perform the content described in steps S40-S44, steps S61-S62, and steps S81-S85 below. It should be noted that the embodiments of this application divide the structure and functional modules of the video switching apparatus 100 only by way of example, without limiting the specific division.
The determining module 10 is configured to determine a target object. The determined target object is used for subsequent video switching: by aligning the target object, video switching is implemented. A video can be understood as a series of image frames displayed at a given frame rate; stopping at a particular frame in the sequence yields a single image frame, that is, an image. The video may include objects; for example, it may be a video file recorded of a specific object, which may be a living body such as a person or an animal, or a static item such as a book or a television — for instance, a video recorded of a moving human body. Image recognition is performed on the image frames of the video to identify the objects they contain. In practice, the image frames may be obtained frame by frame and recognized to obtain the objects in the video, or multiple image frames may be obtained from the video. For example, a video including a specific object may be obtained and multiple frames captured from it, such as those at the 1st, 20th, and 34th second, each captured frame corresponding to specific time information. Alternatively, frames may be captured at certain time intervals, for example every 10 seconds, yielding the frames at the 1st, 11th, 21st second, and so on. The recognized objects may include person A, person B, cat C, television D, etc. The target object may be determined by the user's selection — for example, the recognized objects are presented to the user on an editing interface and the user determines the target object — or an object meeting certain conditions in the image frame may be taken as the target object by default, such as the object located in the middle of the picture.
The computing module 20 is configured to compute a similarity of the target object between a first image frame and a second image frame to obtain a similarity value, where the first image frame comes from a first video and the second image frame comes from a second video. The computing module 20 computes the similarity of the target object between each first image frame in the first video and each second image frame in the second video; for example, if the first video has 3 first image frames and the second video has 3 second image frames, 9 similarity values can be obtained. Video switching may be performed on one or more videos. When switching within one video, that video is cut into two segments according to different scenes to obtain a first video and a second video, the first video including multiple first image frames and the second video including multiple second image frames. To compute the similarity, a first image frame in the first video may first be obtained and the similarity of the target object between it and all second image frames in the second video computed; then the next first image frame is obtained and its similarity against all second image frames computed, and so on, until all first image frames in the first video have been processed.
The obtaining module 30 is configured to obtain switching image frames, where the switching image frames include a first image frame and a second image frame whose similarity value is greater than or equal to a preset threshold. If the similarity value of the target object between a first image frame and a second image frame is greater than or equal to the preset threshold, a pair of switching image frames including those two frames is obtained. The pair can be understood as the switching position of the first and second videos, or the position where they connect: after the first image frame of the first video is displayed, playback switches to the second image frame of the second video, or after the second image frame of the second video is displayed, playback switches to the first image frame of the first video. Since the similarity value of the target object between the two frames reaches the preset threshold, the target object in the two frames can be considered highly similar; when switching, aligning the target object of the two frames lets the user focus on the target object and ignore changes in other objects, and the high similarity of the target object makes the switching effect smooth and natural.
The switching module 40 is configured to switch, according to the switching image frames, the first image frame of the first video to the second image frame of the second video, or the second image frame of the second video to the first image frame of the first video, thereby implementing the switch between the two videos. The switching image frames locate the frames at which the first and second videos are to be switched: the frame to be switched in the first video is the first image frame of the pair, and the frame in the second video is the second image frame of the pair. The two frames may be merged to implement the switch, or connected so that after the first image frame is played, the next frame is the second image frame, or after the second image frame is played, the next frame is the first image frame.
Optionally, the video switching apparatus 100 may further include an editing module 50 configured to provide the user with an editing interface that includes objects for the user to select, the objects being those recognized by performing image recognition on the image frames of the video. That is, the video switching apparatus 100 recognizes the objects in the videos to be switched and presents them on the editing interface, so that the user can select the target object from them. The editing interface further includes one or more pairs of switching image frames for the user to select. After the video switching apparatus 100 computes the similarity of the target object between the first image frames of the first video and the second image frames of the second video, multiple pairs of switching image frames may be obtained, that is, there are multiple positions at which the two videos can be switched. Switching may be performed according to the selected pairs; for example, if two pairs are selected, then after switching from the first video to the second video, playback can also switch back from the second video to the first video.
In addition, in some possible cases, some of the modules of the video switching apparatus 100 may be combined into one module; for example, the obtaining module 30 and the switching module 40 may be combined into a video switching module.
In embodiments of this application, the video switching apparatus 100 described above can be deployed flexibly. For example, it may be deployed on an electronic device, or it may be a software apparatus deployed on a server or virtual machine in a cloud data center and used for video switching. The electronic device may include a mobile phone, a tablet, a smartwatch, a laptop, an in-vehicle computer, a desktop computer, a wearable device, and the like.
Referring to FIG. 2, taking a mobile phone as an example of the electronic device, those skilled in the art can understand that the phone shown in FIG. 2 is merely an example and does not constitute a limitation on the phone; the phone may have more or fewer components than shown in the figure.
Exemplarily, FIG. 2 is a schematic structural diagram of an electronic device provided by an embodiment of this application.
The electronic device 200 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, a camera 193, a display 194, and the like.
It can be understood that the structure illustrated in this embodiment does not constitute a specific limitation on the electronic device 200. In other embodiments of this application, the electronic device 200 may include more or fewer components than shown, combine some components, split some components, or use a different component arrangement. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec 195, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc. The different processing units may be independent devices or may be integrated into one or more processors.
A memory may also be provided in the processor 110 for storing computer instructions and data. In some embodiments, the memory in the processor 110 is a cache, which may hold computer instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs the instructions or data again, it can call them directly from the memory, avoiding repeated access, reducing the waiting time of the processor 110, and improving system efficiency.
In some embodiments, the video switching apparatus 100 runs in the processor 110, and the function of each module is implemented by the processor 110 reading and executing related computer instructions. In other embodiments, the video switching apparatus 100 may be deployed in the memory, and the processor 110 reads and executes computer instructions from the memory to implement video switching.
In some embodiments, the processor 110 may include one or more interfaces, such as an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
The charging management module 140 is configured to receive a charging input from a charger, which may be wireless or wired. In some wired-charging embodiments, the charging management module 140 may receive the charging input of a wired charger through the USB interface 130. In some wireless-charging embodiments, it may receive a wireless charging input through a wireless charging coil of the electronic device 200. While charging the battery 142, the charging management module 140 may also supply power to the electronic device 200 through the power management module 141.
The power management module 141 is configured to connect the battery 142, the charging management module 140, and the processor 110. It receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110, the internal memory 121, the display 194, the camera 193, the wireless communication module 160, and so on. It may also be used to monitor parameters such as battery capacity, battery cycle count, and battery health (leakage, impedance). In some other embodiments, the power management module 141 may be provided in the processor 110; in still other embodiments, the power management module 141 and the charging management module 140 may be provided in the same device.
The wireless communication function of the electronic device 200 may be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and so on.
The antenna 1 and the antenna 2 are configured to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 200 may cover a single or multiple communication frequency bands. Different antennas may also be multiplexed to improve antenna utilization; for example, the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antennas may be used in combination with tuning switches.
The mobile communication module 150 may provide wireless communication solutions applied to the electronic device 200, including 2G/3G/4G/5G. It may include one or more filters, switches, power amplifiers, low noise amplifiers (LNA), etc. The mobile communication module 150 may receive electromagnetic waves through the antenna 1, filter and amplify them, and transmit them to the modem processor for demodulation. It may also amplify signals modulated by the modem processor and convert them into electromagnetic waves for radiation through the antenna 1. In some embodiments, at least some functional modules of the mobile communication module 150 may be provided in the processor 110; in some embodiments, at least some of its functional modules may be provided in the same device as at least some modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator modulates a low-frequency baseband signal to be sent into a medium/high-frequency signal; the demodulator demodulates a received electromagnetic wave signal into a low-frequency baseband signal and transmits it to the baseband processor for processing. After processing by the baseband processor, the low-frequency baseband signal is passed to the application processor, which outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.) or displays an image or video through the display 194. In some embodiments, the modem processor may be an independent device; in other embodiments, it may be independent of the processor 110 and provided in the same device as the mobile communication module 150 or other functional modules.
The wireless communication module 160 may provide wireless communication solutions applied to the electronic device 200, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology. It may be one or more devices integrating one or more communication processing modules. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the signals, and sends the processed signals to the processor 110; it may also receive signals to be sent from the processor 110, perform frequency modulation and amplification on them, and convert them into electromagnetic waves for radiation through the antenna 2.
In some embodiments, the antenna 1 of the electronic device 200 is coupled to the mobile communication module 150 and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 200 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
The electronic device 200 implements the display function through the GPU, the display 194, the application processor, and so on. The GPU is a microprocessor for image processing that connects the display 194 and the application processor, and performs mathematical and geometric computation for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or change display information.
The display 194 is configured to display images, videos, etc. It includes a display panel, which may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), etc. In some embodiments, the electronic device 200 may include 1 or N displays 194, where N is a positive integer greater than 1. The electronic device 200 may implement the shooting function through the ISP, the camera 193, the video codec 195, the GPU, the display 194, the application processor, and so on. The ISP is configured to process data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted through the lens to the photosensitive element of the camera, the optical signal is converted into an electrical signal, and the photosensitive element transmits the electrical signal to the ISP for processing and conversion into an image visible to the eye. The ISP may also perform algorithm optimization on the noise, brightness, and skin tone of the image, and optimize parameters such as exposure and color temperature of the shooting scene. In some embodiments, the ISP may be provided in the camera 193.
The camera 193 is configured to capture static images or videos. An optical image of an object is generated through the lens and projected onto the photosensitive element, which may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and transmits it to the ISP for conversion into a digital image signal. The ISP outputs the digital image signal to the DSP for processing, and the DSP converts it into a standard image signal in RGB, YUV, or another format. In some embodiments, the electronic device 200 may include 1 or N cameras 193, where N is a positive integer greater than 1.
The digital signal processor is configured to process digital signals; in addition to digital image signals, it can process other digital signals. For example, when the electronic device 200 selects a frequency point, the digital signal processor is configured to perform Fourier transform or the like on the frequency point energy.
The video codec 195 is configured to compress or decompress digital video. The electronic device 200 may support one or more video codecs 195, so the electronic device 200 can play or record videos in multiple encoding formats, such as moving picture experts group (MPEG) 1, MPEG2, MPEG3, and MPEG4.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the storage capacity of the electronic device 200. The external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, for example saving files such as music and videos in the external memory card.
The internal memory 121 may be configured to store one or more computer programs including instructions. By running the instructions stored in the internal memory 121, the processor 110 causes the electronic device 200 to perform the video switching method provided in some embodiments of this application, as well as various functional applications and data processing. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store the operating system, and may also store one or more application programs (such as Gallery and Contacts); the data storage area may store data created during use of the electronic device 200 (such as photos and contacts). In addition, the internal memory 121 may include a high-speed random access memory and may also include a nonvolatile memory, such as one or more magnetic disk storage devices, flash memory devices, or universal flash storage (UFS). In other embodiments, the processor 110 causes the electronic device 200 to perform the video switching method provided in embodiments of this application, as well as various functional applications and data processing, by running instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
The electronic device 200 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and so on. The audio module 170 is configured to convert digital audio information into an analog audio signal output, and to convert an analog audio input into a digital audio signal; it may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be provided in the processor 110, or some of its functional modules may be provided in the processor 110. The speaker 170A, also called a "horn", converts an audio electrical signal into a sound signal; the electronic device 200 can play music or hands-free calls through the speaker 170A. The receiver 170B, also called an "earpiece", converts an audio electrical signal into a sound signal; when the electronic device 200 answers a call or voice message, the receiver 170B can be placed close to the ear to listen. The microphone 170C, also called a "mic" or "mouthpiece", converts a sound signal into an electrical signal; when making a call or sending a voice message, the user can speak close to the microphone 170C to input the sound signal. The electronic device 200 may be provided with one or more microphones 170C. In other embodiments, two microphones 170C may be provided to implement noise reduction in addition to collecting sound signals; in still other embodiments, three, four, or more microphones 170C may be provided to collect sound signals, reduce noise, identify sound sources, and implement directional recording. The headset jack 170D is configured to connect wired headsets and may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
The sensor module 180 may include a pressure sensor, a gyroscope sensor, a barometric pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity light sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, etc. The touch sensor may be provided on the display; the touch sensor and the display form a touchscreen, also called a "touch panel".
In addition, the electronic device 200 may further include one or more components such as buttons, a motor, an indicator, and a SIM card interface, which is not limited in the embodiments of this application.
When the video switching apparatus is a hardware device, it may be the electronic device 200 described above, including the display 194, the processor 110, and the internal memory 121. The internal memory 121 may exist independently and be connected to the processor 110 through a communication bus, or may be integrated with the processor 110. The internal memory 121 may store computer instructions; when the computer instructions stored in the internal memory 121 are executed by the processor 110, the video switching method of this application can be implemented. The internal memory 121 may also store the data required by the processor in performing the video switching method of the embodiments of this application, as well as the intermediate data and/or result data produced.
Exemplarily, FIG. 3 is a schematic diagram of an application of the video switching apparatus in this application. As shown in FIG. 3, an electronic device vendor or an application vendor may abstract the functions provided by the video switching apparatus 100 into an application, such as a video switching application. The electronic device vendor installs the video switching application on the electronic device 200, or the application vendor offers it for users to purchase. A user who purchases the electronic device 200 can use the video switching application installed on it, or download the application online, and use it to perform video switching.
The video switching method provided by the embodiments of this application is introduced next.
FIG. 4 is a flowchart of a video switching method provided by an embodiment of this application. The method may be performed by the aforementioned video switching apparatus. Referring to FIG. 4, the method includes the following steps:
Step S40: provide an editing interface to the user.
In this embodiment, the functions of the video switching apparatus are abstracted into a video switching application. Taking a phone with the video switching application installed as an example, as shown in FIG. 5a, the phone's interface includes a status bar 511, a home screen 510, and a Dock bar 501. The status bar 511 may include the carrier name (for example, China Mobile), the time, the signal strength, the current remaining battery, and so on; the status bars below are similar and are not repeated here. The home screen 510 includes application programs, including embedded and downloadable applications; as shown in FIG. 5a, it includes Calendar, Clock, and the video switching application. The Dock bar 501 includes frequently used applications such as Phone, Messages, and Camera.
The user taps the "video switching" icon on the home screen 510 to enter the video switching application, which presents an editing interface including a video import interface 512 (FIG. 5b). Through the video import interface 512, the user can import the videos to be processed for editing to implement video switching. The user may obtain the videos by reading the gallery on the phone, by shooting with the camera, or by downloading from a web page, which is not specifically limited in this application. As shown in FIG. 5b, the video import interface 512 presents videos for the user to select, including video 1, video 2, and video 3; their content differs, and their durations may also differ — for example, video 1 is 30 minutes long, video 2 is 15 minutes, and video 3 is 30 minutes; in one embodiment the durations may be the same. The user may select one video or multiple videos. When one video is selected, it can be cut into two or more segments according to recognition of the scenes in the video, or the user may determine the cut positions and the number of segments. It can be understood that when one video is selected for switching, the target object in that video is the same but the scenes differ. A scene includes objects other than the target object, such as other people and the background environment, and the background environment may include grassland, indoors, the sky, stationary objects, etc. For example, if a user shoots a video whose scene changes from indoors to outdoors, the video can be cut into an indoor video and an outdoor video.
The user taps video 1 and video 3 in FIG. 5b to select them as the videos to be processed. Video 1 and video 3 may be videos of the same dancer performing the same dance shot from the same angle; the subject in both is the same dancer, but the dancer's clothing, makeup, or hairstyle differs, and the scenes in the two videos also differ. Video 1 is the first video and video 3 is the second video. It can be understood that video 2 could also be selected as a video to be processed; that is, the number of videos to be processed is not limited to two, and this application does not specifically limit it.
After the videos to be processed are determined, the video switching apparatus performs image processing on their image frames to recognize the objects in the images. In practice, the frames may be obtained one by one and recognized to obtain the objects in the video, or multiple frames may be obtained from the video. For example, a video including a specific object may be obtained and multiple frames captured from it, such as those at the 1st, 11th, 20th, 34th second, etc., each corresponding to specific time information; or frames may be captured at certain time intervals, for example every 10 seconds, yielding the frames at the 1st, 11th, 21st second, and so on.
For video 1 and video 3, the living bodies in them are recognized. Since three dancers appear in the two videos, as shown in FIG. 5c, the editing interface may further include an object presentation interface 513, which presents the recognition results of the objects in the videos to be processed, including the faces of object A, object B, and object C. It can be understood that objects A, B, and C are the objects recognized after image recognition on the frames of the two videos; the object presentation interface 513 may present a person's face or whole body. That is, an image frame of the first video includes object A and/or object B and/or object C, an image frame of the second video includes object A and/or object B and/or object C, and after image recognition on all frames of both videos, the recognized objects include object A, object B, and object C. The number of recognized objects is not limited; it is determined by the actual number of objects in the videos.
It can be understood that when the user selects one video from the video import interface 512, the video switching apparatus cuts it into multiple segments, and those segments are the videos to be processed. The video switching apparatus then filters out image frames in which the same person has the same facial expression and body posture at a certain time interval; since a person's expression and posture are very close within a short time, to achieve the best effect, the selected frames need to be within a certain time interval.
Step S41: determine the target object.
In this embodiment, multiple objects may be recognized in an image frame; by aligning the target object, frame switching is implemented. The user may select an object to determine the target object, or the video switching apparatus may determine it automatically. When the target object is a person, the apparatus can, according to the target person, filter out the image frames in which the same target person has the same facial expression and body posture in the first image frame and the second image frame.
In this embodiment, the frames of the first video may be processed one by one for face recognition: using the RGB data of the face in a frame together with a face recognition algorithm, it is determined whether the face is the target object; frames of the second video are processed similarly, which is not repeated here. Exemplarily, for a first image frame, a face bounding rectangle may be obtained after processing its RGB image, and face recognition is then applied to the face within the rectangle; for example, Face ID technology may be used to label the face to determine which person in the video it is, thereby determining the target object in the first image frame. The second image frame is processed similarly.
In this embodiment, as shown in FIG. 5c, the object presentation interface 513 presents object A, object B, and object C, and the user can tap object A to determine it as the target object. In one possible implementation, the video switching apparatus may automatically determine the object located at the center of the frame as the target object.
Step S42: compute the similarity of the target object between the first image frame and the second image frame to obtain a similarity value.
In this embodiment, a first image frame A1 in the first video may first be obtained and compared with all second image frames in the second video: a second image frame B1 is selected and the similarity of the target object between A1 and B1 is computed; then the next second image frame B2 is obtained and the similarity between A1 and B2 computed, and so on, until the similarity between that first image frame and all second image frames in the second video has been computed. Alternatively, a second image frame B1 may first be obtained and compared with all first image frames in the first video: a first image frame A1 is selected and the similarity between A1 and B1 computed, then the next first image frame A2 is obtained and the similarity between A2 and B1 computed, and so on, computing the similarity of the target object between the first image frames and the second image frames.
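The exhaustive comparison over every (first frame, second frame) pair can be sketched as a double loop. In the following illustrative sketch (ours, not the patent's), `similarity` is a stand-in for whatever feature-distance-based measure step S42 uses, and frames are reduced to toy scalar values:

```python
def find_switch_pairs(first_frames, second_frames, similarity, threshold):
    """Return every (i, j) index pair whose similarity value reaches the
    preset threshold; each such pair is a candidate switching pair."""
    pairs = []
    for i, a in enumerate(first_frames):
        for j, b in enumerate(second_frames):
            if similarity(a, b) >= threshold:
                pairs.append((i, j))
    return pairs

# Toy stand-in: frames are scalars, similarity decays with their difference.
first = [0.1, 0.5, 0.9]
second = [0.52, 0.95, 0.4]
sim = lambda a, b: 1.0 / (1.0 + abs(a - b))

pairs = find_switch_pairs(first, second, sim, threshold=0.9)
print(pairs)  # [(1, 0), (1, 2), (2, 1)]
```

With 3 first frames and 3 second frames, the loop evaluates 9 similarity values, matching the count given in the description of the computing module; a production version would cache per-frame features so each frame is analyzed only once.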
Referring also to FIG. 6, the similarity of the target object can be computed as follows to obtain the similarity value.
Step S61: obtain the features of the target object in the first image frame and the second image frame.
Step S62: compute the distance between the features of the target object in the first image frame and the second image frame to obtain the similarity value.
In this embodiment, features of the target object such as facial features and/or body posture features may be obtained. Exemplarily, one or more of the face's two-dimensional features, three-dimensional features, and face mesh may be obtained. The distance between the 2D face features of the first image frame and those of the second image frame may be computed to obtain a distance metric, from which the similarity value is derived; likewise the distance between the 3D face features of the two frames, or the distance between the face mesh features of the two frames, may be computed to obtain a distance metric and thus a similarity value. The distance metrics of these features may also be combined and processed to obtain the final similarity value. The distance may be a Euclidean distance, a cosine distance, etc., which is not specifically limited in this application. It can be understood that a distance metric measures the spatial distance between individuals: the larger the distance, the larger the difference between the individuals. A similarity metric, which measures how similar individuals are, is the opposite of a distance metric: the smaller the similarity value, the smaller the similarity between the individuals and the larger the difference.
In this embodiment, the distance of the facial features and/or body posture features may be computed to ensure that the target object's face is similar in the two frames, or that the target object's body posture is similar in the two frames, or that both the face and the body posture are similar in the two frames.
In this embodiment, a similarity error value of the target object between the first image frame and the second image frame may be computed, that is, the distance of the target object's facial features and/or body posture features, and the similarity value is obtained from the similarity error value. It can be understood that the larger the similarity error value, the smaller the similarity between the individuals and the larger the difference.
Step S43: obtain the switching image frames, where the switching image frames include a first image frame and a second image frame whose similarity value is greater than or equal to the preset threshold.
In this embodiment, if the similarity value is greater than or equal to the preset threshold, the features of the target object in the two frames are similar, for example, similar facial features and/or similar body posture; to achieve a better effect, the scenes, or the target object's clothing, hairstyle, etc., in the two frames may be dissimilar.
It can be understood that when the similarity value between the target object in a first image frame of the first video and the target object in a second image frame of the second video reaches the preset threshold, a pair of switching image frames including those two frames can be obtained; in the process of switching between the first and second videos, multiple pairs of switching image frames may be obtained. Exemplarily, if the similarity value between the target object of first image frame A1 and that of second image frame B1 is greater than the preset threshold, a pair of switching image frames including A1 and B1 is obtained; if the similarity value between the target object of first image frame A2 and that of second image frame B2 is greater than the preset threshold, a pair including A2 and B2 is obtained.
Exemplarily, the editing interface may further include a switching-image-frame interface 711 (FIG. 7a), through which the user may select the image frames at which to switch. As shown in FIG. 7a, switching image frame pair 1 and switching image frame pair 2 are presented on the interface 711, and it can be understood that the interface 711 may include multiple pairs of switching image frames. When the user chooses to switch between the first and second videos according to pair 1, which includes first image frame A100 and second image frame B200, the frame connected after A100 is B200, or the frame connected after B200 is A100.
Step S44: switch the first image frame of the first video to the second image frame of the second video, or the second image frame of the second video to the first image frame of the first video, according to the switching image frames.
In this embodiment, switching the first video to the second video according to the switching image frames may be implemented as: switching between the first video and the second video according to the obtained switching image frames, or according to the switching image frames selected by the user.
In this embodiment, switching between the two videos according to the switching image frames may specifically include: determining, from the switching image frames, the positions at which the first and second videos are to be switched, and performing the video switch at those positions. Exemplarily, a pair of switching image frames includes first image frame A10 and second image frame B10, and the switch is performed according to these two frames: A10 is connected to B10, that is, the frame after A10 is B10, or the frame after B10 is A10, so that playback switches to B10 when A10 is played, or to A10 when B10 is played.
Referring also to FIG. 7b and FIG. 7c, the first-image-frame display interface 712 in FIG. 7b presents the image of the first image frame, and the second-image-frame display interface 713 in FIG. 7c presents the image of the second image frame. The first image frame includes the target object, grass, and clouds, while the second image frame in FIG. 7c includes the target object. The face and/or body posture of the target object is similar in the two frames, but the scene differs; for example, the target object's clothing and the background are different.
In one possible implementation, the switching image frames can be merged into one video to implement the switch between the two videos. If all obtained switching image frames are merged into one video, the first and second image frames are adjacent in the merged video, and the merged video differs from both the first video and the second video.
In this embodiment, to achieve a better effect, some image frames may be added as appropriate. For example, two pairs of switching image frames are obtained: one includes first image frame A10 and second image frame B20, and the other includes first image frame A31 and second image frame B41. When merging into one video according to the switching image frames, the first image frames A1 to A9 before A10, the second image frames B21 to B40 after B20, and the first image frames after A31 can be obtained, and the first image frames A1 to A10, the second image frames B20 to B41, and A31 together with its subsequent first image frames can be merged into one video. Exemplarily, playing this video sequentially starts from A1, proceeds to A10, switches to B20 after A10 is displayed (rather than continuing with A11), then plays B21 to B40, switches to A31 after B41 is displayed, and then plays the frames after A31.
In this embodiment, the target objects of the frames in a switching pair may be aligned; for example, the target object in the first image frame and the target object in the second image frame are aligned at the same position in the picture, so that when the display switches from the first frame to the second frame, the user sees little visual change in the target object.
It can be understood that when the video to be processed includes three video segments, a pair of switching image frames for video 1 and video 2 can be obtained, then a pair for video 2 and video 3, then a pair for video 3 and video 1, and so on.
In the embodiments of this application, the video switching method can perform video switching automatically: the user only needs to input the video to be processed and determine the target object, and the switch is then performed automatically according to the features of the target object, avoiding wasted labor and time.
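The patent does not specify an alignment algorithm, but as an illustrative sketch under that assumption, one simple approach is to translate the second frame's content so that the center of the target object's bounding rectangle coincides with its center in the first frame; the rectangles below are hypothetical:

```python
def box_center(box):
    """Center of a (left, top, right, bottom) rectangle."""
    l, t, r, b = box
    return ((l + r) / 2.0, (t + b) / 2.0)

def alignment_offset(box_first, box_second):
    """Translation (dx, dy) to apply to the second frame so the target
    object's center lands where it was in the first frame."""
    cx1, cy1 = box_center(box_first)
    cx2, cy2 = box_center(box_second)
    return (cx1 - cx2, cy1 - cy2)

# Hypothetical face rectangles of the target object in the two frames.
box_a = (100, 80, 180, 200)   # in the first image frame
box_b = (130, 60, 210, 180)   # in the second image frame

dx, dy = alignment_offset(box_a, box_b)
print(dx, dy)  # -30.0 20.0: shift the second frame 30 px left, 20 px down
```

The (dx, dy) offset would then be applied to every pixel of the second frame (or to the crop window) before the two frames are joined, so the target object barely appears to move at the cut.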
Referring to FIG. 8, FIG. 8 is a schematic flowchart of video switching provided by an embodiment of this application. The description takes a person's face as the target object.
Step S81: obtain a face bounding rectangle from the RGB image.
In this embodiment, the RGB images of the first image frame and the second image frame are obtained, and image processing is performed on both to obtain face bounding rectangles, that is, the region where the face is located is identified in each RGB image and the face is framed with a rectangle.
Step S82: label the face using recognition technology to determine the target object.
In this embodiment, the face in the bounding rectangle is identified by face recognition; for example, Face ID technology is used to label the recognized face to determine which person in the video it is. The video switching apparatus may determine the face located in the middle of the picture as the target object according to its position, or the user may specify which person is the target object.
Step S83: compute 2D face feature points and/or 3D face feature points and/or a face mesh.
In this embodiment, after step S82 determines that the target object in the first and second image frames is the same person, the similarity of the target object between the two frames is computed. The 2D face feature points of the target object are obtained from the first image frame and from the second image frame, and the distance between the two sets of feature points is computed to obtain a similarity error value, determining the difference of the target object's face between the two frames. And/or, the 3D face feature points of the target object are obtained from the two frames, and their distance is computed to obtain a similarity error value, determining the difference of the target object's face between the two frames. And/or, the face mesh points of the target object are obtained from the two frames, and their distance is computed to obtain a similarity error value, determining the difference of the target object's face between the two frames. The similarity error value may be obtained from the difference values between the 2D feature points and/or the 3D feature points and/or the mesh points.
Step S84: determine whether the similarity error value is less than or equal to the error threshold.
In this embodiment, it is determined whether the similarity error value is less than or equal to the error threshold, thereby determining the similarity of the target object between the two image frames.
Step S85: obtain the switching image frames.
In this embodiment, if the similarity error value between the first and second image frames is less than or equal to the error threshold, the two frames are selected as switching image frames, and video switching is then performed according to them.
In this embodiment, if the first and second image frames come from the same video, then, since a person's expression and posture are very close within a short time, to achieve the best effect, the group of image frames with the smallest similarity error value within a certain time interval is found as the switching image frames. The user can pick one or more favorite pairs from these switching image frames according to personal preference; the video switching apparatus automatically aligns the faces in the two frames completely and performs the switch, so that the two videos are joined seamlessly and cleverly, achieving high face similarity between the two switched frames and a striking environment-switching effect.
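Selecting the minimum-error pair within a time interval, as described above, can be sketched as a filtered search over an error matrix. The matrix values, timestamps, and gap bounds below are hypothetical, and `errors[i][j]` stands for the similarity error value between first frame i and second frame j:

```python
def best_pair_in_window(errors, times_a, times_b, min_gap, max_gap):
    """Among all (i, j) whose time gap lies in [min_gap, max_gap], return
    (error, i, j) for the pair with the smallest similarity error value,
    or None if no pair qualifies."""
    best = None
    for i, row in enumerate(errors):
        for j, e in enumerate(row):
            gap = abs(times_b[j] - times_a[i])
            if min_gap <= gap <= max_gap:
                if best is None or e < best[0]:
                    best = (e, i, j)
    return best

# Hypothetical error matrix and frame timestamps (seconds) for one video
# that was cut into two segments.
errors = [[4.0, 2.5, 6.0],
          [3.0, 1.0, 5.5]]
times_a = [1.0, 2.0]
times_b = [2.5, 3.0, 9.0]

best = best_pair_in_window(errors, times_a, times_b, min_gap=0.5, max_gap=2.0)
print(best)  # (1.0, 1, 1): frames at 2.0 s and 3.0 s, error 1.0
```

The pair at 9.0 s is excluded by the time window even though its error might be acceptable, reflecting the idea that expression and posture only stay close within a short interval.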
Those of ordinary skill in the art may realize that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementation should not be considered beyond the scope of the present invention.
The descriptions of the processes corresponding to the above drawings each have their own emphasis; for a part not detailed in one process, refer to the related descriptions of the other processes.
The above embodiments may be implemented wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partly in the form of a computer program product. The computer program product for implementing video switching includes one or more computer instructions for performing video switching; when these computer program instructions are loaded and executed on a computer, the processes or functions described in FIG. 4 and FIG. 6 of the embodiments of this application are produced in whole or in part.
The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available media may be magnetic media (e.g., floppy disk, hard disk, magnetic tape), optical media (e.g., digital versatile disc (DVD)), or semiconductor media (e.g., solid state disk (SSD)), etc.
Those of ordinary skill in the art can understand that all or some of the steps for implementing the above embodiments may be completed by hardware, or by a program instructing related hardware; the program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The above are embodiments provided by this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of this application shall be included in the protection scope of this application.

Claims (12)

  1. 一种视频切换方法,其特征在于,所述方法包括:
    确定目标对象;
    计算第一图像帧和第二图像帧之间所述目标对象的相似度,得到相似度值,其中所述第一图像帧来自第一视频,所述第二图像帧来自第二视频;
    获取切换图像帧,其中所述切换图像帧包括所述相似度值大于或等于预设阈值的第一图像帧和第二图像帧;
    根据所述切换图像帧将所述第一视频的第一图像帧切换至所述第二视频的第二图像帧或将所述第二视频的第二图像帧切换至所述第一视频的第一图像帧。
  2. 根据权利要求1所述的方法,其特征在于,所述计算第一图像帧和第二图像帧之间所述目标对象的相似度,得到相似度值包括:
    获取所述第一图像帧和所述第二图像帧中所述目标对象的特征;
    计算所述第一图像帧和所述第二图像帧之间所述目标对象的特征的距离,得到相似度值。
  3. 根据权利要求2所述的方法,其特征在于,所述目标对象的特征包括目标对象的脸部特征和/或目标对象的身体姿态特征。
  4. 根据权利要求1至3任一项所述的方法,其特征在于,所述方法还包括:
    提供编辑界面,所述编辑界面包括对所述第一图像帧和所述第二图像帧进行识别后呈现的对象;
    则所述确定目标对象包括:
    响应于用户的选择确定目标对象。
  5. 根据权利要求4所述的方法,其特征在于,所述编辑界面还包括供用户选择的一对或多对切换图像帧;
    则所述根据所述切换图像帧将所述第一视频的第一图像帧切换至所述第二视频的第二图像帧或将所述第二视频的第二图像帧切换至所述第一视频的第一图像帧包括:
    响应用户选择的一对或多对切换图像帧,根据所述一对或多对切换图像帧将所述第一视频的第一图像帧切换至所述第二视频的第二图像帧或将所述第二视频的第二图像帧切换至所述第一视频的第一图像帧。
  6. 一种视频切换装置,其特征在于,所述装置包括:
    确定模块,用于确定目标对象;
    计算模块,用于计算第一图像帧和第二图像帧之间所述目标对象的相似度,得到相似度值,其中所述第一图像帧来自第一视频,所述第二图像帧来自第二视频;
    获取模块,用于获取切换图像帧,其中所述切换图像帧包括所述相似度值大于或等于预设阈值的第一图像帧和第二图像帧;
    切换模块,用于根据所述切换图像帧将所述第一视频的第一图像帧切换至所述第二视频的第二图像帧或将所述第二视频的第二图像帧切换至所述第一视频的第一图像帧。
  7. The apparatus according to claim 6, wherein the calculation module is specifically configured to:
    obtain features of the target object in the first image frame and the second image frame; and
    calculate a distance between the features of the target object in the first image frame and the second image frame to obtain the similarity value.
  8. The apparatus according to claim 7, wherein the features of the target object comprise facial features of the target object and/or body posture features of the target object.
  9. The apparatus according to any one of claims 6 to 8, wherein the apparatus further comprises:
    an editing module, configured to provide an editing interface, wherein the editing interface comprises objects presented after recognition is performed on the first image frame and the second image frame; and
    the determining module is specifically configured to:
    determine the target object in response to a selection by a user.
  10. The apparatus according to claim 9, wherein the editing interface further comprises one or more pairs of switching image frames for the user to select; and
    the switching module is specifically configured to:
    in response to one or more pairs of switching image frames selected by the user, switch, according to the one or more pairs of switching image frames, from the first image frame of the first video to the second image frame of the second video, or from the second image frame of the second video to the first image frame of the first video.
  11. A computer-readable storage medium, wherein the computer-readable storage medium stores computer program code, and when the computer program code is executed by an electronic device, the electronic device performs the method according to any one of claims 1 to 5.
  12. An electronic device, wherein the electronic device comprises a processor and a memory, the memory is configured to store a set of computer instructions, and when the processor executes the set of computer instructions, the electronic device performs the method according to any one of claims 1 to 5.
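The four steps of claim 1 (determine a target object, compute a similarity value between frames of the two videos, keep pairs at or above a preset threshold, then switch on those pairs) can be illustrated with a short sketch. This is not the patented implementation; it merely assumes cosine similarity over hypothetical per-frame feature vectors, and the names `find_switching_frames`, `video1_feats`, and `video2_feats` are assumptions for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def find_switching_frames(video1_feats, video2_feats, threshold=0.9):
    """Return all (i, j) frame-index pairs whose target-object
    similarity is greater than or equal to the preset threshold."""
    pairs = []
    for i, f1 in enumerate(video1_feats):
        for j, f2 in enumerate(video2_feats):
            if cosine_similarity(f1, f2) >= threshold:
                pairs.append((i, j))
    return pairs
```

A pair returned by such a search would mark the point at which playback switches from the first video's frame to the second video's frame (or vice versa).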
PCT/CN2021/143821 2021-01-05 2021-12-31 Video switching method and apparatus, storage medium, and device WO2022148319A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US18/260,192 US20240064346A1 (en) 2021-01-05 2021-12-31 Video Switching Method and Apparatus, Storage Medium, and Device
EP21917358.0A EP4266208A4 (en) 2021-01-05 2021-12-31 VIDEO SWITCHING METHOD AND APPARATUS, STORAGE MEDIUM AND DEVICE

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110008033.0 2021-01-05
CN202110008033.0A CN114724055A (zh) 2021-01-05 2021-01-05 Video switching method and apparatus, storage medium, and device

Publications (1)

Publication Number Publication Date
WO2022148319A1 true WO2022148319A1 (zh) 2022-07-14

Family

ID=82234015

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/143821 WO2022148319A1 (zh) 2021-01-05 2021-12-31 Video switching method and apparatus, storage medium, and device

Country Status (4)

Country Link
US (1) US20240064346A1 (zh)
EP (1) EP4266208A4 (zh)
CN (1) CN114724055A (zh)
WO (1) WO2022148319A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230353798A1 (en) * 2022-04-29 2023-11-02 Rajiv Trehan Method and system of generating on-demand video of interactive activities
CN115243023A (zh) * 2022-07-20 2022-10-25 展讯通信(上海)有限公司 Image processing method and apparatus, electronic device, and storage medium
US20240029435A1 (en) * 2022-07-25 2024-01-25 Motorola Solutions, Inc. Device, system, and method for altering video streams to identify objects of interest
CN116095221B (zh) * 2022-08-10 2023-11-21 荣耀终端有限公司 一种游戏中的帧率调整方法及相关装置
CN118118734A (zh) * 2022-11-30 2024-05-31 华为技术有限公司 Video processing method and electronic device

Citations (5)

Publication number Priority date Publication date Assignee Title
US6636220B1 (en) * 2000-01-05 2003-10-21 Microsoft Corporation Video-based rendering
CN110675433A (zh) * 2019-10-31 2020-01-10 北京达佳互联信息技术有限公司 Video processing method and apparatus, electronic device, and storage medium
CN111294644A (zh) * 2018-12-07 2020-06-16 腾讯科技(深圳)有限公司 Video splicing method and apparatus, electronic device, and computer storage medium
CN111460219A (zh) * 2020-04-01 2020-07-28 百度在线网络技术(北京)有限公司 Video processing method and apparatus, and short video platform
CN111970562A (zh) * 2020-08-17 2020-11-20 Oppo广东移动通信有限公司 Video processing method, video processing apparatus, storage medium, and electronic device

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10734027B2 (en) * 2017-02-16 2020-08-04 Fusit, Inc. System and methods for concatenating video sequences using face detection

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
US6636220B1 (en) * 2000-01-05 2003-10-21 Microsoft Corporation Video-based rendering
CN111294644A (zh) * 2018-12-07 2020-06-16 腾讯科技(深圳)有限公司 Video splicing method and apparatus, electronic device, and computer storage medium
CN110675433A (zh) * 2019-10-31 2020-01-10 北京达佳互联信息技术有限公司 Video processing method and apparatus, electronic device, and storage medium
CN111460219A (zh) * 2020-04-01 2020-07-28 百度在线网络技术(北京)有限公司 Video processing method and apparatus, and short video platform
CN111970562A (zh) * 2020-08-17 2020-11-20 Oppo广东移动通信有限公司 Video processing method, video processing apparatus, storage medium, and electronic device

Non-Patent Citations (1)

Title
See also references of EP4266208A4

Also Published As

Publication number Publication date
US20240064346A1 (en) 2024-02-22
EP4266208A1 (en) 2023-10-25
EP4266208A4 (en) 2024-06-12
CN114724055A (zh) 2022-07-08

Similar Documents

Publication Publication Date Title
WO2022148319A1 (zh) Video switching method and apparatus, storage medium, and device
CN112714214B (zh) Content continuation method, device, system, GUI, and computer-readable storage medium
CN111476911B (zh) Virtual image implementation method and apparatus, storage medium, and terminal device
CN111179282B (zh) Image processing method, image processing apparatus, storage medium, and electronic device
WO2020192461A1 (zh) Time-lapse photography recording method and electronic device
CN110381195A (zh) Screen projection display method and electronic device
WO2020140726A1 (zh) Photographing method and electronic device
US11682148B2 (en) Method for displaying advertisement picture, method for uploading advertisement picture, and apparatus
WO2021057673A1 (zh) Image display method and electronic device
CN112954251B (zh) Video processing method, video processing apparatus, storage medium, and electronic device
WO2021104114A1 (zh) Method for providing wireless fidelity (WiFi) network access service and electronic device
CN113473013A (zh) Method, apparatus, and terminal device for displaying image beautification effects
CN117133306A (zh) Stereo noise reduction method, device, and storage medium
CN113593567B (zh) Method for converting video audio to text and related device
CN112269554B (zh) Display system and display method
CN114449333B (zh) Video note generation method and electronic device
US20230319217A1 (en) Recording Method and Device
CN115119214B (zh) Stereo networking method and system, and related apparatus
CN111626931B (zh) Image processing method, image processing apparatus, storage medium, and electronic device
CN117440194A (zh) Screen projection picture processing method and related apparatus
CN115730091A (zh) Annotation display method and apparatus, terminal device, and readable storage medium
WO2024193523A1 (zh) Device-cloud collaborative image processing method and related apparatus
WO2023071730A1 (zh) Voiceprint registration method and electronic device
CN115841099B (zh) Intelligent recommendation method for page fill-in words based on data processing
WO2024140123A1 (zh) Stop-motion animation generation method, electronic device, cloud server, and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21917358

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 18260192

Country of ref document: US

ENP Entry into the national phase

Ref document number: 2021917358

Country of ref document: EP

Effective date: 20230721

NENP Non-entry into the national phase

Ref country code: DE