CN117714816A - Electronic equipment and multimedia data generation method thereof
- Publication number: CN117714816A
- Application number: CN202310595352.5A
- Authority: CN (China)
- Prior art keywords: dynamic, detection result, video, feature, user
- Legal status: Pending (assumption, not a legal conclusion)
Classifications
- H04N21/8549: Creating video summaries, e.g. movie trailer
- H04N21/44008: Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
- H04N23/632: Graphical user interfaces [GUI] for displaying or modifying preview images prior to image capturing
- H04N23/661: Transmitting camera control signals through networks, e.g. control via the Internet
- H04N5/92: Transformation of the television signal for recording, e.g. modulation, frequency changing; inverse transformation for playback
- H04M2250/52: Details of telephonic subscriber devices including functional features of a camera
Abstract
The application relates to an electronic device and a method for generating multimedia data on it. The method comprises the following steps: displaying a shooting preview screen for a dynamic photo; generating and storing a first feature detection result of the preview video data; and, on detecting a shooting instruction from the user, generating and storing a first feature identifier of a first dynamic photo, where the first dynamic photo comprises the preview video data, the first preview screen captured when the shooting instruction was detected, and the video data collected after the shooting instruction was detected. By performing feature detection on the video inside a captured dynamic photo and storing the correspondence between the photo and its detection results as tags, the shot content of each dynamic photo can be presented concisely through its tags. When producing a short video, the dynamic photos can be classified or sorted by tag before being displayed, so that the user can quickly select suitable dynamic photos based on the tags, and the user experience is good.
Description
Technical Field
This application relates to the field of terminal devices, and more particularly to an electronic device and a method for generating multimedia data on it.
Background
A dynamic photo contains not only the still image captured when the user presses the shutter key, but also video from a period before and after that moment. Because it records richer shooting content, the dynamic photo is becoming more and more widely used.
Since a dynamic photo contains video, a user can generate a short video, such as a video blog (Vlog), by splicing multiple dynamic photos together. If the album contains too many dynamic photos, the user has to search for and confirm the shooting content of each one in turn when assembling a Vlog, and the user experience is poor.
Disclosure of Invention
The application provides an electronic device and a multimedia data generation method thereof.
In a first aspect, an embodiment of the present application provides a method for generating multimedia data, applied to an electronic device with a shooting function. The method includes:
displaying a shooting preview screen of the dynamic photo;
generating and storing a first feature detection result of the preview video data;
detecting a shooting instruction from the user, and generating and storing a first feature identifier of a first dynamic photo, where the first dynamic photo includes the preview video data, a first preview screen captured when the shooting instruction was detected, and video data collected after the shooting instruction was detected, and
the first feature identifier includes the first feature detection result, a second feature detection result of the collected video data, and a third feature detection result of the first preview screen.
In the present application, the electronic device, which may also be referred to as a terminal device, can be any of various mobile terminals with a shooting function, for example a mobile phone. The shooting preview screen may be the preview the user sees after turning on the phone's dynamic photo mode and framing the subject, before pressing the shutter key. The electronic device may buffer and update the preview video data. When the user taps the shutter key of the electronic device's camera application, the preview video data records how the subject changed before the tap. The first feature detection result may include the results of feature detection performed by the electronic device on the shot content of each video frame in the preview video data, including a scene detection result, a human body detection result, a face detection result, an action evaluation result, and the like.
For example, the shooting instruction here may be the instruction generated at the moment the user taps the shutter key of the electronic device's camera application. The first dynamic photo includes the preview video data, the first preview screen captured when the shooting instruction was detected, and the video data collected after the shooting instruction was detected.
It is understood that the first feature identifier includes the first feature detection result of the preview video data, the second feature detection result of the collected video data, and the third feature detection result of the first preview screen. The second feature detection result may include the results of feature detection performed by the electronic device on the shot content of each video frame in the collected video data. The third feature detection result may include the result of feature detection performed by the electronic device on the shot content of the first preview screen.
In a possible implementation of the first aspect, the first feature detection result, the second feature detection result, and the third feature detection result each include at least one of: a scene detection result, a human body detection result, a face detection result, and an action evaluation result.
In one possible implementation of the first aspect, generating and storing a first feature detection result of the preview video data includes:
in the process of generating and storing the preview video data, performing feature detection frame by frame on the video frames included in the preview video data to obtain the image features of each video frame, where the first feature detection result includes the image features obtained by merging and deduplicating the image features of all the video frames.
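As an illustration only, this merge-and-deduplicate step can be sketched as follows in Kotlin; the type and function names are hypothetical, since the embodiment does not disclose an implementation:

```kotlin
// Hypothetical per-frame detector output; the text lists scene, human body,
// face and action-evaluation results but does not fix a data model.
data class FrameFeatures(val frameIndex: Int, val labels: Set<String>)

// Merge the per-frame image features; collecting into a set deduplicates them.
fun mergeFeatureDetections(frames: List<FrameFeatures>): Set<String> =
    frames.flatMapTo(mutableSetOf()) { it.labels }
```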
In a possible implementation manner of the first aspect, detecting a shooting instruction of a user, generating and storing a first feature identifier of a first dynamic photo includes:
after a shooting instruction from the user is detected, performing feature detection frame by frame on the video frames included in the preview video data to obtain the first feature detection result.
In a possible implementation of the first aspect, the content of the first feature identifier includes: at least one of sports, travel, scenery, and children's interests.
In the present application, the first feature identifier may serve as a tag on each dynamic photo, and each dynamic photo may carry at least one tag, for example: children's interests, portrait, scenery, and so on.
In a possible implementation of the first aspect, the first feature identifier is displayed on a first dynamic photo.
In this application, the first feature identifier may be displayed in a local area on the first dynamic photo.
In a possible implementation of the first aspect, generating and storing a first feature identifier of a first dynamic photo includes:
merging and deduplicating the first feature detection result, the second feature detection result, and the third feature detection result to obtain the first feature identifier.
In a possible implementation of the first aspect, the method further includes:
in the storage display interface for dynamic photos, the first dynamic photo is arranged adjacent to a stored second dynamic photo, where the second feature identifier of the second dynamic photo is identical to the first feature identifier of the first dynamic photo.
In the present application, the storage display interface for dynamic photos may be an application interface of a gallery or album application, that is, the authoring interface in fig. 2 (c). In some embodiments, the storage display interface may also be the interface, displayed after the user taps a "one-key blockbuster" control in the gallery or album application, for producing a short video from dynamic photos.
In a possible implementation of the first aspect, the method further includes:
in the storage display interface for dynamic photos, the first dynamic photo is placed before a third dynamic photo, where the first feature identifier of the first dynamic photo matches a feature identifier in the sorting condition and the third feature identifier of the third dynamic photo does not.
In the application, the album can classify and sort the dynamic photos by tag before presenting them to the user, that is, the album can display the dynamic photos in priority order. Before the user taps the "one-key blockbuster" control, that is, among the dynamic photo options corresponding to that control, a subset of dynamic photos is selected (checked) and recommended by default. In some embodiments, some ordinary photos or videos may be included in addition to the recommended dynamic photos. The dynamic photos, ordinary photos, and videos may be sorted uniformly (for example, by shooting time), and the album may pre-check the dynamic photos, ordinary photos, or videos it recommends for generating the Vlog.
In a possible implementation of the first aspect, the method further includes:
detecting a short-video generation operation by the user, and generating a short video from the dynamic photos selected by the user.
In the application, the short-video generation operation may be that the user adds or removes selected dynamic photos, ordinary photos, or videos, and, once the selection is done, taps the "one-key blockbuster" control to generate the Vlog video.
In a second aspect, the present application provides an electronic device, comprising:
a memory for storing instructions to be executed by one or more processors of the electronic device, and
a processor, being one of the processors of the electronic device, for performing the multimedia data generation method of the first aspect.
In a third aspect, the present application provides a computer program product, comprising: a non-transitory computer-readable storage medium containing computer program code for performing the multimedia data generation method of the first aspect.
Drawings
Fig. 1 (a) to fig. 1 (e) are schematic views of a scene of displaying a dynamic photo by an electronic device according to an embodiment of the present application;
fig. 2 (a) to fig. 2 (d) are schematic views of a scene of an electronic device selecting a dynamic photo to generate a short video according to an embodiment of the present application;
fig. 3 (a) to fig. 3 (g) are schematic views of a mobile phone displaying dynamic photos according to an embodiment of the present application;
fig. 4 is a schematic flow chart of an electronic device implementing selection of dynamic photos to generate a short video according to an embodiment of the present application;
fig. 5 is a flowchart of generating a short video by implementing each functional module in a system architecture of an electronic device according to an embodiment of the present application;
fig. 6 is an interactive schematic diagram of an electronic device implementing selection of dynamic photos to generate a short video according to an embodiment of the present application;
fig. 7 (a) and fig. 7 (b) are schematic flow diagrams of an electronic device selecting dynamic photos to generate a short video according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
fig. 9 is a software structural block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and thoroughly described below with reference to the accompanying drawings.
It can be appreciated that the technical solutions of the present application are applicable to various electronic devices having a shooting function, which may also be referred to as terminal devices, for example mobile phones, tablet computers, wearable devices, vehicle-mounted devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, personal digital assistants (PDA), and other mobile terminals.
The following describes an embodiment of the present application by taking an electronic device as an example of a mobile phone.
Fig. 1 (a) to 1 (e) illustrate a scene in which a mobile phone 100 displays dynamic photos. As shown in fig. 1 (a), the screen of the mobile phone 100 displays a dynamic photo 101 stored in the album; the dynamic photo 101 includes a photographed person A and a photographed person B. The mobile phone 100 receives the user's tap on the dynamic photo 101 and, in response, as shown in fig. 1 (b), displays the dynamic photo 101 full screen. The mobile phone 100 receives another tap on the dynamic photo 101 and, in response, as shown in fig. 1 (c) to fig. 1 (e), plays the video 102 corresponding to the dynamic photo 101, for example a video of 3 s duration. In fig. 1 (c) the screen of the mobile phone 100 first shows person A; as shown in fig. 1 (d) and 1 (e), the screen then shows person B next to person A, and after the photo is taken, person A leaves.
In some embodiments, the user wants to make a Vlog by selecting dynamic photos in the album. For example, as shown in fig. 2 (a), the user opens the album of the mobile phone 100, whose application interface displays multiple photos and an authoring control 2001. The mobile phone 100 receives the user's tap on the authoring control 2001 and, in response, as shown in fig. 2 (b), displays an authoring interface. The authoring interface displays multiple photos and a free-authoring control 2002. The mobile phone 100 receives the user's tap on the free-authoring control 2002 and, in response, as shown in fig. 2 (c), displays multiple photos through an authoring interface, which may be the album's application interface for creating a video from dynamic photos. The mobile phone 100 receives the user's taps on several photos and, in response, as shown in fig. 2 (d), displays the selected photos and a start-creation control 2003 through the authoring interface. Upon receiving the user's tap on the start-creation control 2003, the mobile phone 100 completes the Vlog production. It will be appreciated that the mobile phone 100 may save the Vlog file in a video folder.
As described above, when selecting dynamic photos in the album to make a Vlog, because the number of dynamic photos is large, the user has to open them one by one, confirm their shooting content, and screen out the suitable ones. This makes producing a Vlog tedious and time-consuming, and the user experience is poor.
To solve the above problems, embodiments of the present application provide a method for generating multimedia data. While a dynamic photo is being shot, it is tagged and then added to the corresponding category for display; that is, when generating a Vlog from dynamic photos, the photos are displayed by category and can be sorted, either by a default sorting rule or by a rule set by the user, for the user to choose from. For example, feature detection (also called semantic detection) is performed on the video data in a dynamic photo (both the video data buffered before the user presses the shutter key and the video data collected afterwards); that is, feature detection is performed on the shot content of each video frame in the video data, and the detection results, including scene, human body, face, and action evaluation results, are stored. A tag corresponding to the detection result is attached to the video data (each video frame), so that a dynamic photo whose video is generated from that data also carries the tag, and the tag presents the photo's shot content.
In some embodiments, when the user opens the album (also referred to as a gallery) to select dynamic photos for generating a Vlog and enters the corresponding application interface, the album may classify and sort the dynamic photos by tag before presenting them, that is, display them in priority order. Before the user taps the "one-key blockbuster" control, that is, among the dynamic photo options corresponding to that control, a subset of dynamic photos is selected (checked) and recommended by default. In some embodiments, some ordinary photos or videos may be included in addition to the recommended dynamic photos. The dynamic photos, ordinary photos, and videos are sorted uniformly (for example, by shooting time); the album may pre-check the items it recommends for generating the Vlog; the user can add to or remove from the selection and, once done, tap the "one-key blockbuster" control to generate the Vlog video.
In some embodiments, the user may tap the "one-key blockbuster" control to generate the Vlog video directly from the default-selected dynamic photos, ordinary photos, or videos.
In some embodiments, the tags here may include sports (for example: playing football, playing basketball), travel (for example: train, airplane), and scenery (for example: seaside, forest).
In addition, it can be appreciated that when a dynamic photo is tagged, if the video frames of its video data carry multiple tags, the generated dynamic photo may also carry multiple tags. For example, the user interface shown in fig. 3 (c) may be the album's authoring interface displayed after the mobile phone 100 receives a tap on the one-key blockbuster control 3002 shown in fig. 3 (b); each dynamic photo may carry at least one tag, for example: children's interests, portrait, scenery, and so on. Illustratively, the album of the mobile phone 100 may also classify and sort the dynamic photos by tag, for example: arranging the dynamic photos in descending order of tag count, with photos sharing the same tags placed adjacently.
In some embodiments, tags that appear in fewer than a threshold number of a dynamic photo's video frames may also be removed, making the photo's tags more accurate. For example, if the dynamic photo includes 100 frames of video data, tags that correspond to fewer than 10 of those frames can be removed.
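A minimal sketch of this pruning step, reusing the hypothetical FrameFeatures type from the earlier sketch and assuming the example's threshold of 10 frames:

```kotlin
// Count how many frames each tag appears in and keep only tags that reach
// the threshold (the example above: 100 frames, threshold 10).
fun pruneRareTags(frames: List<FrameFeatures>, minFrames: Int = 10): Set<String> =
    frames.flatMap { it.labels }
        .groupingBy { it }
        .eachCount()
        .filterValues { it >= minFrames }
        .keys
```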
According to the multimedia data generation method provided by the embodiments of the present application, while a dynamic photo is being shot, feature detection is performed on the video it contains, and the correspondence between the photo and its detection results is stored by attaching tags, so that the shot content of the photo can be presented concisely through its tags. When the user selects dynamic photos to make a Vlog, the photos can be classified or sorted by tag before being displayed, so the user can quickly pick suitable photos based on the tags; this simplifies the steps of making a Vlog and gives a good user experience.
To describe the method for generating multimedia data provided in the embodiments of the present application in more detail, fig. 4 first shows a flowchart of a terminal device implementing the method; the method shown in fig. 4 may be implemented by a processor of the terminal device executing the related instructions. The multimedia data generation method may include the following steps.
S401: and displaying a shooting preview picture of the dynamic photo.
For example, the user may open the camera application of the terminal device and turn on its dynamic photo shooting mode. After that, the user can aim the camera of the terminal device at the subject, and the application interface of the camera application may display a shooting preview screen of the subject.
S402: a first feature detection result of the preview video data is generated and stored.
For example, the preview video data may be a video spanning a first period, and the terminal device may buffer and update the preview video data while framing. When the user taps the shutter key of the camera application of the terminal device, the preview video data records how the subject changed before the tap.
The first feature detection result may include a feature detection result obtained by performing feature detection on shot content corresponding to each video frame included in the preview video data by the terminal device, including a scene detection result, a human body detection result, a human face detection result, an action evaluation result, and the like.
In some embodiments, the terminal device may perform feature detection on the shot content of each video frame of the preview video data in real time, and store the correspondence between the detection results and the frames. In another embodiment, the terminal device may instead buffer the preview video data first and, only upon detecting that the user has tapped the shutter key of the camera application, fetch the buffered preview video data and perform feature detection on the shot content of each of its frames to obtain the corresponding detection results. It will be appreciated that the preview video data may comprise multiple video frames and, correspondingly, may have multiple feature detection results.
S403: and detecting a shooting instruction of a user, and generating and storing a first characteristic identifier of the first dynamic photo.
For example, the shooting instruction here may be the instruction generated at the moment the user taps the shutter key of the camera application of the terminal device. The first dynamic photo includes the preview video data, the first preview screen captured when the shooting instruction was detected, and the video data collected after the shooting instruction was detected.
It can be understood that, besides capturing the first preview screen of the subject at the moment the user taps the shutter key, the camera application of the terminal device also needs to obtain the video data collected after the tap. The collected video data may span a second period, whose duration may be the same as or different from the first period of the preview video data; the collected video data records how the subject changed after the user tapped the shutter key of the camera application.
The first feature identifier includes the first feature detection result of the preview video data, the second feature detection result of the collected video data, and the third feature detection result of the first preview screen. The second feature detection result may include the results of feature detection performed by the terminal device on the shot content of each frame of the collected video data. The third feature detection result may include the result of feature detection performed on the shot content of the first preview screen. The terminal device combines the preview video data, the collected video data, and the first preview screen into a dynamic photo and stores it in the album of the terminal device. The dynamic photo includes the first feature identifier. In some embodiments, the first feature identifier may be represented in the album as a tag on the dynamic photo.
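For illustration, one possible in-memory shape for such a dynamic photo is sketched below in Kotlin; the field names are assumptions, not the patent's data model:

```kotlin
// Hypothetical frame record; real pixel data is omitted for brevity.
data class VideoFrame(val frameNumber: Int, val timestampMs: Long)

// One possible shape for the stored dynamic photo.
data class DynamicPhoto(
    val previewVideo: List<VideoFrame>,   // frames buffered before the shutter tap
    val stillFrame: VideoFrame,           // first preview screen at the tap
    val collectedVideo: List<VideoFrame>, // frames collected after the tap
    val featureId: Set<String>            // merged, deduplicated tags
)
```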
Having introduced the method flow of the multimedia data generation method, the following describes the system architecture with which the terminal device implements it in the embodiments of the present application. Referring to fig. 5, fig. 5 exemplarily shows a system architecture diagram for a terminal device taking dynamic photos according to an embodiment of the present application.
As shown in fig. 5, the system architecture of the terminal device includes a hardware portion and an operating system 500, wherein the operating system 500 is composed of an application layer, a system framework layer, and a hardware abstraction layer. The hardware part comprises: a photographing sensor 501, an image signal processor 502, and an image optimization module 503. The hardware abstraction layer of the operating system 500 includes: the video processing module 504, the semantic algorithm module 505, the image processing module 506, the system framework layer includes: the electronic anti-shake module 507, the cache pool 508, the application layer includes: camera application 509.
The photographing sensor 501 is used for collecting data of a photographing object, and may include preview data and video data.
The image signal processor 502 (IFE) performs processing such as white balance correction and color correction on the data, and sends image data and video data to the image optimization module 503, the video processing module 504, and the semantic algorithm module 505, respectively. The video data sent to the video processing module 504 is divided into preview data, used to display the preview screen on the camera application's interface (the video processing module 504 may apply beautification, anti-shake, and so on to it), and video data used to buffer the video from which dynamic photos are generated. The video data (Tiny stream) sent to the semantic algorithm module 505 is used by that module for semantic detection (video scene detection); tags corresponding to the semantic detection results are attached to the generated video and can later be consulted when selecting material for a short video (Vlog).
The image optimization module 503 (Bayer Processing Segment, BPS) is configured to perform dead pixel removal, phase focusing, demosaicing, downsampling, HDR processing, and Bayer hybrid noise reduction processing on the image data.
The video processing module 504 is configured to perform video optimization processing such as anti-shake, exposure, mirroring, and the like on the video data.
The semantic algorithm module 505 (perception engine) is configured to perform semantic detection, that is, feature detection, frame by frame on the video frames of the video data, and to store the resulting semantic detection results, such as human body, face, and action evaluation results, for each detected video frame.
The image processing module 506 is configured to perform image processing tasks such as clipping, noise reduction, color processing, and detail enhancement on the image data.
The electronic anti-shake module 507 (Electronic Image Stabilization, EIS) detects the shake amplitude of the terminal device through its acceleration sensor, gyroscope, and the like, and dynamically adjusts shutter, exposure, and imaging to correct the video data and preview data and ensure sharpness. The electronic anti-shake module 507 may send preview data to the preview display module 5091 of the camera application 509 for shooting preview, and may also send video data to the buffer pool 508.
The buffer pool 508 sequentially stores the video frames of the video data; the frames form a video of a fixed duration, for example a 1.5 s video (45 video frames). Once the buffer pool 508 holds 45 frames, it is continuously updated, the oldest frames being replaced, so that the buffer pool 508 always stores the most recent video frames.
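A minimal Kotlin sketch of such a fixed-capacity buffer follows, reusing the hypothetical VideoFrame type from the earlier sketch; the 30 fps figure is only inferred from the example's 45 frames per 1.5 s:

```kotlin
// Fixed-capacity ring buffer holding the most recent frames
// (45 frames is 1.5 s at 30 fps). Once full, the oldest frame
// is evicted on every push.
class FrameRingBuffer(private val capacity: Int = 45) {
    private val frames = ArrayDeque<VideoFrame>(capacity)

    fun push(frame: VideoFrame) {
        if (frames.size == capacity) frames.removeFirst() // drop the oldest frame
        frames.addLast(frame)
    }

    // Copy of the currently buffered frames, oldest first.
    fun snapshot(): List<VideoFrame> = frames.toList()
}
```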
The preview display module 5091 of the camera application 509 displays the shooting preview of the subject. After a shooting instruction from the user is received, the video and image synthesis module 5092 of the camera application 509 may obtain, through the image processing module 506, the image (photo) taken when the instruction was received; obtain, through the buffer pool 508, the video data stored before the instruction and the video data collected after it; and obtain, through the semantic algorithm module 505, the semantic detection results corresponding to that video data. It then synthesizes these into a dynamic photo and stores it in the album of the terminal device, the dynamic photo carrying the tags corresponding to the semantic detection results.
Having described the multimedia data generation method and the system architecture of the terminal device through fig. 4 and fig. 5, the following further describes, through fig. 6, how the functional modules in the system architecture cooperate to implement the method. The multimedia data generation method may include the following steps.
S601: the user turns on the camera.
Illustratively, turning on the camera here means opening the camera application 601 on the terminal device. The terminal device may receive the user's tap on the icon of the camera application and, in response, execute the instruction to open the camera application 601.
S602: the user turns on the dynamic photograph taking mode.
For example, the camera application 601 may receive the user's tap on the dynamic photo control and, in response, turn on the dynamic photo shooting mode. It will be appreciated that, once the mode is on, the application interface of the camera application 601 must present the preview content of the subject in real time and, to be able to generate a dynamic photo, must also acquire in real time the video data of the first period leading up to the current moment.
S603: the camera HAL6031 returns to the preview screen.
Illustratively, the camera HAL6031 herein may be a (Hardware Abstraction Layer, hardware abstraction) camera hardware abstraction. The preview stream may be a video frame of the object photographed at the current time acquired by the camera HAL6031 through the camera, and is returned to the camera application 601 as a preview screen through the video stream, and the application interface of the camera application 601 may present the preview screen of the object photographed at the current time in real time.
S604: the camera HAL6031 buffers the video stream to a buffer pool.
Illustratively, the buffer pool here may be a memory region (queue buffer) configured in the memory of the terminal device for buffering the video stream. In some embodiments, the framework layer 602 may configure the buffer pool, which may also be referred to as an ION buffer, for the camera HAL6031. The camera HAL6031 may buffer a video stream of preset duration as the preview video data, for example 45 video frames, that is, a stream of 1.5 s duration. The video stream here is the video data of the first period before the current moment, and the camera HAL6031 must continually refresh the video stream in the buffer pool. The video frames of this stream may also be referred to as buffered frames.
S605: the semantic algorithm module 6032 obtains and caches semantic detection results.
The semantic detection results here may include the feature detection results obtained by detecting the shot content of each frame of the video stream, including scene, human body, face, and action evaluation results; they are used to attach the corresponding tags to the video frames, so that a dynamic photo generated from those frames also carries tags, which later helps the user generate a video blog (Vlog) from dynamic photos. In some embodiments, the semantic algorithm module 6032 may obtain the semantic detection results frame by frame, that is, whenever the camera HAL6031 buffers one frame of the preview video data, the semantic algorithm module 6032 performs feature detection on that frame and caches the result. For example, if the camera HAL6031 sequentially buffers a stream of 90 video frames, the semantic algorithm module 6032 may cache 90 corresponding semantic detection results, one per frame.
It can be seen that, with the semantic algorithm module 6032 detecting features in real time, the semantic detection results are obtained while the camera HAL6031 buffers the preview video data, saving the time the module would otherwise spend on detection later; a sketch of this per-frame caching follows.
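The sketch below reuses the hypothetical VideoFrame type from the earlier sketches; the detector is passed in as a function, since the embodiment does not disclose the detection model:

```kotlin
// Cache one semantic detection result per buffered frame so that detection
// keeps pace with the HAL's buffering.
class SemanticCache(private val detectFeatures: (VideoFrame) -> Set<String>) {
    private val results = LinkedHashMap<Long, Set<String>>()

    // Called whenever the camera HAL buffers one more preview frame.
    fun onFrameBuffered(frame: VideoFrame) {
        results[frame.timestampMs] = detectFeatures(frame)
    }

    // All cached results, keyed by frame timestamp in milliseconds.
    fun all(): Map<Long, Set<String>> = results.toMap()
}
```

Keying the cache by timestamp lets the later interval lookup of step S610 reuse it directly.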
In another embodiment, the semantic algorithm module 6032 may perform feature detection on each frame of the preview video data buffered by the camera HAL6031 only after the camera application 601 issues the photographing instruction, obtaining and caching the detection results at that point. For example, if the camera HAL6031 has sequentially buffered a stream of 90 video frames, the semantic algorithm module 6032 can then detect features in those 90 buffered frames to obtain their semantic detection results.
It can be seen that if the camera application 601 does not issue a photographing instruction for a long time, that is, the user keeps framing with the phone for a long time, the preview video data buffered by the camera HAL6031 is continuously updated, and semantic detection results obtained eagerly would have to be updated along with it. Performing feature detection on the buffered frames only after the camera application 601 issues the photographing instruction therefore saves more power.
S606: the user clicks to take a picture, and the camera application 601 issues a picture taking instruction.
For example, the user tapping to take a picture is the user tapping the shutter key of the camera application 601 of the terminal device; in response, the camera application 601 issues a photographing instruction to the camera HAL6031. It is understood that the camera application 601 may record and save the moment of the tap as the photographing time point.
S607: the camera HAL6031 returns the target photograph.
The target photo here may be, for example, the photo of the subject taken at the moment the user taps the shutter key of the camera application 601 of the terminal device.
S608: the camera application 601 issues an instruction to acquire a video stream.
Illustratively, the camera application 601 issues, through the framework layer 602, an instruction to the camera HAL6031 to acquire the video frames of the video stream.
Illustratively, the camera application 601 may issue the instruction to acquire the video stream to the camera HAL6031 through the framework layer 602 at the same time as the photographing instruction, obtaining the video stream used to generate the dynamic photo. It is understood that the video stream here may include both the preview video data and the collected video data.
S609: the framework layer 602 returns the video stream to the camera application 601.
Illustratively, the framework layer 602 may perform video optimization, video encoding, and so on, on the video frames and then return the video stream (preview video data and collected video data) to the camera application 601; each frame in the stream carries a frame number and a timestamp. The camera application 601 may store the frames in order according to their frame numbers and timestamps. In some embodiments, the camera application 601 may receive and buffer encoded video frames of 3 s duration; the buffered frames may take the form of an AVC (Advanced Video Coding) buffer queue.
S609: the camera application 601 issues a photographing time point and a detection section to the semantic algorithm module 6032.
Illustratively, the photographing time point is issued so that the semantic algorithm module 6032 can return the semantic detection results for the video within a preset duration before and after that point. The detection interval here denotes the period of video determined from the photographing time point. For example, if the video duration is 3 s, the interval may span from 1.5 s before to 1.5 s after the photographing time point. The semantic algorithm module 6032 retrieves the matching semantic detection results from its cache according to the photographing time point.
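For illustration, selecting the cached results that fall inside such an interval can be sketched as follows, keyed by frame timestamp as in the caching sketch above; the 1.5 s half-window comes from the example:

```kotlin
// Keep only the cached results whose frame timestamps fall within
// ±halfWindowMs of the photographing time point (3 s clip: 1.5 s per side).
fun resultsInInterval(
    cached: Map<Long, Set<String>>, // timestampMs -> detected labels
    shotTimeMs: Long,
    halfWindowMs: Long = 1_500
): Map<Long, Set<String>> =
    cached.filterKeys { it in (shotTimeMs - halfWindowMs)..(shotTimeMs + halfWindowMs) }
```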
S611: the semantic algorithm module 6032 returns the semantic detection results to the camera application 601.
Illustratively, after the semantic algorithm module 6032 obtains the semantic detection results corresponding to the photographing time point, it returns them to the camera application 601. It may be understood that these include the semantic detection results of both the preview video data and the collected video data.
S612: the camera application 601 synthesizes dynamic photos.
Illustratively, the camera application 601 may take the buffered video frames from the AVC buffer queue and compose a video file, for example an MP4 file. The camera application 601 may then synthesize the target photo, the MP4 file, and the semantic detection results into a dynamic photo and save it to the album of the terminal device. The semantic detection results of the dynamic photo may be represented in the album by tags on the photo, for example: sports, scenery, children's interests, and so on.
Having introduced, through fig. 6, how the functional modules in the system architecture implement the multimedia data generation method, the following introduces, through fig. 7 (a) and fig. 7 (b), the flow of selecting dynamic photos to generate a short video. The flow may be executed by a processor of the terminal device and includes the following steps. In fig. 7, the terminal device may be a mobile phone.
S701: the album is opened to display the dynamic photos.
For example, the album may also be the gallery of the mobile phone 100. Opening the album may proceed as shown in fig. 2 (a) to fig. 2 (c): the mobile phone 100 receives the user's operation of opening the album, and the album's application interface displays multiple photos and an authoring control 2001. The mobile phone 100 receives the user's tap on the authoring control 2001 and, in response, displays an authoring interface. The authoring interface displays multiple photos and a free-authoring control 2002. The mobile phone 100 receives the user's tap on the free-authoring control 2002 and, in response, lists multiple dynamic photos through the authoring interface.
S702: it is detected whether the number of dynamic photos in the album exceeds a number threshold.
For example, the mobile phone 100 may count the dynamic photos currently in the album. If the count exceeds the number threshold, there are too many dynamic photos, and the user would have to scroll the album's interface to browse and select them when generating a Vlog; in that case step S703 is executed to judge and recommend the dynamic photos in the album. Otherwise, step S704 is executed to determine the priority and classification of the dynamic photos according to their tags.
S703: and judging and recommending the dynamic photos in the album.
Illustratively, step S703 may be constituted by the following steps, as shown in fig. 7 (b).
S703a: and determining a dynamic photo of which the shooting time is within a preset time period.
Illustratively, the preset time period here may be set in advance, for example to roughly the last three days. If a dynamic photo was shot earlier than the preset time period, it may be excluded, that is, ranked lower.
S703b: and determining that the shooting time belongs to a holiday or a dynamic picture of a preset holiday.
Illustratively, the preset holidays here may be set in advance, for example a wedding anniversary, a birthday, and the like. If a dynamic photo was shot on a holiday or a preset holiday, it may be kept, that is, ranked higher.
S703c: and counting the information of the shooting places of the dynamic pictures.
For example, the shooting location here may be position information acquired when the user takes a dynamic photograph.
S703d: the duplication is removed according to the mark of the dynamic photo.
For example, if multiple dynamic photos carry the same tags, at least one of them may be kept and the rest dropped, which avoids near-identical dynamic photos being arranged adjacently and crowding the album's application interface.
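Steps S703a, S703b, and S703d can be combined into the following rough Kotlin sketch; the fields, the three-day window, and the ranking-then-deduplication order are assumptions drawn from the examples above, and the location statistics of S703c are omitted:

```kotlin
// Hypothetical per-photo metadata for the recommendation flow.
data class PhotoMeta(
    val shotTimeMs: Long,
    val isHoliday: Boolean,
    val tags: Set<String>
)

fun recommend(photos: List<PhotoMeta>, nowMs: Long): List<PhotoMeta> {
    val threeDaysMs = 3L * 24 * 60 * 60 * 1000
    return photos
        .sortedWith(
            compareByDescending<PhotoMeta> { nowMs - it.shotTimeMs <= threeDaysMs } // S703a: recent first
                .thenByDescending { it.isHoliday }                                  // S703b: holidays next
        )
        .distinctBy { it.tags } // S703d: keep one photo per identical tag set
}
```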
S704: the priority and classification of the dynamic photos are determined according to the marks of the dynamic photos.
Illustratively, when the mobile phone 100 generates a dynamic photo, its tags may include: children's interests, travel, nature, sports, and so on. A dynamic photo may thus carry at least one tag, and a photo carrying multiple tags contains multiple kinds of shot content.
In some embodiments, the classification here may be based on the tags of the dynamic photos, that is, photos carrying the same tags are arranged adjacently. The priority of dynamic photos may include: shooting-time priority, shooting-location priority, tag-count priority, and the like. Taking shooting-time priority as an example, the album of the mobile phone 100 may arrange the dynamic photos in descending order of shooting time, that is, place the most recently shot photos at the front of the album.
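A sketch of such tag-based ordering, reusing the hypothetical PhotoMeta type from the recommendation sketch: descending tag count as the primary key, with the tag set itself as a secondary key so that photos sharing the same tags land adjacently:

```kotlin
// Descending by tag count; the canonicalised tag set is the secondary sort
// key, so photos with identical tags end up next to each other.
fun sortByTags(photos: List<PhotoMeta>): List<PhotoMeta> =
    photos.sortedWith(
        compareByDescending<PhotoMeta> { it.tags.size }
            .thenBy { it.tags.sorted().joinToString(",") }
    )
```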
S705: and determining the priority of generating the Vlog recommendation according to the determined priority and classification of the dynamic photo.
Illustratively, the priority of the Vlog recommendation here may be preset, and may be determined based on the priority and/or classification, for example: combining the shooting-time priority with the tags, the most recently shot dynamic photos tagged as sports may be given the highest Vlog-recommendation priority, that is, placed at the front of the album.
It will be appreciated that determining the priority of generating the Vlog recommendation may also include other various criteria.
S706: the user is prompted by an album for a generation control for generating Vlog.
Illustratively, the cell phone 100 may highlight or animate a localized area on the application interface of the album (e.g., below or above the application interface of the album, etc.) to indicate to the user that the Vlog may be generated by generating a control one-key. For example, as shown in fig. 3 (a), the mobile phone 100 receives an operation of opening an album of the mobile phone 100 by a user, and an application interface of the album displays a plurality of photos and an authoring control 3001. The mobile phone 100 receives a click operation of the authoring control 3001 by the user, and in response to the operation, the mobile phone 100 displays an authoring interface as shown in fig. 3 (b). The authoring interface displays a plurality of photos and one-touch tab control 3002. The mobile phone 100 receives the click operation of the one-key sheeting control 3002 by the user, and in response to the click operation, the mobile phone 100 displays a list of a plurality of recommended dynamic photos, images or videos through the authoring interface as shown in fig. 3 (c). In some embodiments, the one-touch tab control 3002 may also be identified by a one-touch tab control 3002.
S707: displaying dynamic photos, images or videos which are used for generating Vlog and are recommended to a user according to the preset recommendation quantity according to the default recommendation rule.
By way of example, the default recommendation rules herein may be the display of dynamic photos that the user previously set in the album, as well as the recommendation policy. The default recommendation rules may be determined based on the time of taking the dynamic picture, the place of taking, the tag, the priority, etc. For example: the user may set default recommendation rules including: dynamic pictures marked as children's interests, which are most recent in time, are recommended to be taken. The preset recommended number here may be the number of dynamic photos that generate Vlog at a time, for example: the preset recommended number may be 3. It is to be understood that the numerical values herein are exemplary and not limiting in any way. For example, the default recommendation rules may be landscape, portrait; the preset recommendation number may be 3, the album of the mobile phone 100 may display to recommend preset dynamic photos to the user, and the user may directly generate Vlog or adjust the recommended dynamic photos.
In some embodiments, the album of the mobile phone 100 may contain ordinary photos or videos in addition to dynamic photos; the album may sort the dynamic photos, other photos (images), and videos by priority and pre-check (select by default) the ones recommended for generating the Vlog. As shown in fig. 3 (c), the album of the mobile phone 100 recommends multiple dynamic photos, images, and videos, and pre-checks 3 dynamic photos; a local area 3003 on the album's application interface tells the user which dynamic photos are checked, so the user can directly generate the Vlog. The user may also add to, remove from, or keep the recommended dynamic photos, other photos, or videos, for example adding or removing dynamic photos, or changing their order. As shown in fig. 3 (d), the user removes the dynamic photo checked by default in the album of the mobile phone 100 and selects the image 3005; the local area 3003 on the album's application interface then indicates that the checked selection has changed and the image 3005 has been added.
S708: receiving the dynamic photo, video, or image selected by the user.
Illustratively, in addition to the dynamic photos displayed on the album's application interface that may be used as material for generating the Vlog, the user may also select other videos or images; that is, the user may add videos or images manually. For example, with continued reference to fig. 3 (e), the mobile phone 100 detects a click operation performed by the user on the catalog selection control 3006, and in response to this operation, as shown in fig. 3 (f), the mobile phone 100 displays an application interface of videos including a plurality of videos, at least one of which the user can select as material for generating the Vlog.
S709: judging whether the number of selected materials exceeds the upper limit, and prompting the user if it does.
The upper limit here may also be preset, for example 5. If the number of materials selected by the user exceeds the upper limit, the mobile phone 100 may prompt the user in the local area 3003 on the application interface of the album.
S710: screening according to the priorities of dynamic photos, videos, and images.
For example, the priorities may be arranged in the order dynamic photo > video > image; that is, dynamic photos have the highest priority, videos have the second highest priority, and images have the lowest priority. If the number of materials selected by the user exceeds the upper limit, the mobile phone 100 may preferentially keep the dynamic photos selected by the user and exclude images first, then videos.
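The Kotlin sketch below illustrates S709 and S710 under stated assumptions: an upper limit of 5 and the priority order described above. The type names and the screenByPriority function are illustrative, not part of this application, and the prompt shown in local area 3003 is omitted.

```kotlin
// Hedged sketch of S709/S710: when the user's selection exceeds the upper limit
// (assumed to be 5 here), exclude the lowest-priority items first so that dynamic
// photos are retained preferentially. All names are illustrative assumptions.
enum class MediaKind(val priority: Int) { DYNAMIC_PHOTO(3), VIDEO(2), IMAGE(1) }

data class SelectedItem(val id: Long, val kind: MediaKind)

fun screenByPriority(selected: List<SelectedItem>, upperLimit: Int = 5): List<SelectedItem> {
    if (selected.size <= upperLimit) return selected  // S709: within the limit, no prompt
    // Stable sort by descending priority: images drop out first, then videos,
    // while the relative order of items of the same kind is preserved.
    return selected.sortedByDescending { it.kind.priority }.take(upperLimit)
}
```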
S711: detecting an operation of the user on the generation control, and prompting the audio and the Vlog template used for generating the Vlog.
Illustratively, with continued reference to fig. 3 (d), upon detecting a click operation performed by the user on the one-touch filming control 3004, the mobile phone 100 may open a prompt interface displaying the audio and the Vlog template used for generating the Vlog, as shown in fig. 3 (g).
S712: detecting the operation of the user on the confirmation control, and generating and storing the Vlog.
Illustratively, with continued reference to fig. 3 (g), the mobile phone 100 detects a click operation performed by the user on the confirmation control 3007, and in response generates a Vlog, which may be saved in the video catalog of the mobile phone 100.
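As a hedged illustration of S711 and S712, the Kotlin sketch below combines the user's selection with a chosen audio track and Vlog template and saves the result under a video directory. VlogTemplate, AudioTrack, generateAndSaveVlog, and the commented-out renderVlog call are placeholder names assumed for illustration; the actual rendering is device- and codec-specific and is elided.

```kotlin
import java.io.File

// Hedged sketch of S711/S712: combine the user's selection with the chosen Vlog
// template and audio track, then save the result under the video directory.
data class VlogTemplate(val name: String)
data class AudioTrack(val title: String)

fun generateAndSaveVlog(
    selectedItemIds: List<Long>,
    template: VlogTemplate,
    audio: AudioTrack,
    videoDir: File
): File {
    val output = File(videoDir, "vlog_${System.currentTimeMillis()}.mp4")
    // renderVlog(selectedItemIds, template, audio, output)  // hypothetical renderer;
    // stitching the items, applying the template, and mixing audio are device-specific.
    return output
}
```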
Fig. 8 is a schematic structural diagram of a terminal device 800 according to an embodiment of the present application. As shown in fig. 8, an electronic device (e.g., a mobile phone) may include: processor 810, external memory interface 820, internal memory 821, universal serial bus (universal serial bus, USB) interface 830, charge management module 840, power management module 841, battery 842, antenna 1, antenna 2, mobile communication module 850, wireless communication module 860, audio module 870, speaker 870A, receiver 870B, microphone 870C, ear-piece interface 870D, sensor module 880, keys 890, motor 891, indicator 892, camera 893, display 894, and subscriber identity module (subscriber identification module, SIM) card interface 895, among others.
The sensor module 880 may include a pressure sensor, a gyroscope sensor, a barometric sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, and the like.
It is to be understood that the configuration illustrated in this embodiment does not constitute a specific limitation on the electronic apparatus. In other embodiments, the electronic device may include more or fewer components than shown, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 810 may include one or more processing units, such as: the processor 810 may include an application processor (application processor, AP), a modem processor, a graphics processor (graphics processing unit, GPU), an image signal processor (image signal processor, ISP), a controller, a memory, a video codec, a digital signal processor (digital signal processor, DSP), a baseband processor, and/or a neural network processor (neural-network processing unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
A memory may also be provided in the processor 810 for storing instructions and data. In some embodiments, the memory in the processor 810 is a cache. This memory may hold instructions or data that the processor 810 has just used or uses cyclically. If the processor 810 needs to use the instructions or data again, it can call them directly from this memory, which avoids repeated accesses and reduces the waiting time of the processor 810, thereby improving system efficiency.
In some embodiments, the processor 810 may include one or more interfaces. The interfaces may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit built-in audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (pulse code modulation, PCM) interface, a universal asynchronous receiver transmitter (universal asynchronous receiver/transmitter, UART) interface, a mobile industry processor interface (mobile industry processor interface, MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (subscriber identity module, SIM) interface, and/or a universal serial bus (universal serial bus, USB) interface, among others.
It should be understood that the connection relationship between the modules illustrated in this embodiment is only illustrative and does not limit the structure of the electronic device. In other embodiments, the electronic device may also adopt an interfacing manner different from those in the foregoing embodiments, or a combination of multiple interfacing manners.
The external memory interface 820 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the electronic device. The external memory card communicates with the processor 810 through an external memory interface 820 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 821 may be used to store computer-executable program code, where the code includes instructions. The processor 810 executes various functional applications and data processing of the electronic device by running the instructions stored in the internal memory 821. For example, in an embodiment of the present application, the internal memory 821 may include a storage program area and a storage data area.
The storage program area may store an operating system, an application program required by at least one function (such as a sound playing function or an image playing function), and the like. The storage data area may store data created during use of the electronic device (such as audio data and a phone book), and so forth. In addition, the internal memory 821 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (universal flash storage, UFS).
It should be understood that the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device. In other embodiments of the present application, the electronic device may include more or fewer components than illustrated, or certain components may be combined or split, or the components may be arranged differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Fig. 9 is a software structure block diagram of a terminal device according to an embodiment of the present application.
The layered architecture divides the software into several layers, each layer having a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers: from top to bottom, an application layer (APK), an application framework layer (FWK), a hardware abstraction layer (HAL), and a kernel layer.
As shown in fig. 9, the application layer may include an application program such as a camera application. It is understood that the application program here may be an application or service installed in the terminal device, or an application or service that is not installed in the terminal device and is accessed through a quick service center.
The application layer includes a camera application including a preview display module 901 and a dynamic photo management module 902. The preview display module 901 is used for displaying a shooting object on a preview shooting screen. The dynamic photo management module 902 is configured to synthesize the photographed target photo, the MP4 file, the semantic detection result, and the audio data into a dynamic photo.
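A minimal sketch of the four parts the dynamic photo management module 902 is described as assembling follows: the target photo, the MP4 clip, the semantic detection result, and the audio data. The field names and in-memory layout below are assumptions for illustration, not the module's actual container format.

```kotlin
// Illustrative sketch of the inputs the dynamic photo management module 902
// synthesizes into a dynamic photo. All names and layouts are assumptions.
data class SemanticResult(val labels: List<String>)  // e.g. ["human body", "face"]

class DynamicPhoto(
    val targetPhoto: ByteArray,     // the still (cover) photo, e.g. JPEG bytes
    val mp4Clip: ByteArray,         // the short MP4 video clip
    val semantics: SemanticResult,  // the semantic detection result
    val audio: ByteArray            // the recorded audio data
)

fun synthesizeDynamicPhoto(
    photo: ByteArray,
    clip: ByteArray,
    semantics: SemanticResult,
    audio: ByteArray
): DynamicPhoto = DynamicPhoto(photo, clip, semantics, audio)
```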
The application framework layer comprises a storage management module 903. The storage management module 903 is configured to allocate memory space for the camera HAL 904, that is, to configure a buffer pool for buffering video data.
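As an illustration of such a buffer pool, the Kotlin sketch below shows one way a fixed set of reusable frame buffers could be configured. The pool size and buffer size are assumptions, and the real allocation for the camera HAL 904 would be performed natively rather than in managed code.

```kotlin
import java.util.concurrent.ArrayBlockingQueue

// Illustrative sketch of a buffer pool for video frames: a fixed set of reusable
// byte buffers handed out to a producer and returned after use.
class FrameBufferPool(poolSize: Int = 8, bufferBytes: Int = 4 * 1024 * 1024) {
    private val free = ArrayBlockingQueue<ByteArray>(poolSize).apply {
        repeat(poolSize) { add(ByteArray(bufferBytes)) }  // pre-allocate all buffers
    }

    fun acquire(): ByteArray = free.take()            // blocks until a buffer is free
    fun release(buffer: ByteArray) { free.put(buffer) }
}
```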
The hardware abstraction layer includes a camera HAL 904, a semantic algorithm module 905, an image processing front end 906, an image processing engine 907, and an image processing module 908.
The camera HAL 904 is used for controlling the camera of the terminal device to take photos or record video, buffering video frames, and performing video optimization processing on the video frames. The semantic algorithm module 905 is configured to obtain semantic detection results such as human body, face, and action evaluation results from the video frames collected by the camera of the terminal device.
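The Kotlin sketch below illustrates the per-frame interface implied by the semantic algorithm module 905: each video frame yields human body, face, and action evaluation results. FrameSemantics and FrameDetector are hypothetical names; a real detector would run a neural network on each frame rather than a caller-supplied lambda.

```kotlin
// Sketch of a per-frame semantic detection interface. All names are assumptions.
data class FrameSemantics(
    val humanBodyDetected: Boolean,
    val faceDetected: Boolean,
    val actionScore: Float          // action evaluation result, assumed in 0..1
)

fun interface FrameDetector {
    fun detect(frame: ByteArray): FrameSemantics
}

// Runs detection frame by frame over the buffered video frames.
fun detectFrameByFrame(frames: List<ByteArray>, detector: FrameDetector): List<FrameSemantics> =
    frames.map(detector::detect)
```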
The image processing front end (image front end, IFE) 906 is configured to perform effect processing on image frames, for example: performing color correction, downsampling, and demosaicing on the image frames to obtain 3A data.
The image processing engine (image processing engine, IPE) 907 is configured to perform image processing such as cropping, noise reduction, color processing, and detail enhancement on the image frames.
The image processing module (Bayer processing segment, BPS) 908 is configured to perform dead pixel removal, phase focusing, demosaicing, downsampling, HDR processing, and Bayer hybrid noise reduction on the image frames.
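To illustrate how the three stages named above could be chained conceptually, the Kotlin sketch below models each stage as a function over a frame. The Frame type, the empty stage bodies, and the stage order are assumptions for illustration only; the real stages run in dedicated hardware blocks with vendor-specific ordering.

```kotlin
// Conceptual sketch only: each hardware stage is modeled as a function over a
// frame so the chaining is visible. Names, bodies, and order are assumptions.
typealias Frame = ByteArray
typealias Stage = (Frame) -> Frame

val ifeStage: Stage = { it /* color correction, downsampling, demosaicing */ }
val bpsStage: Stage = { it /* dead pixel removal, HDR, Bayer noise reduction */ }
val ipeStage: Stage = { it /* cropping, noise reduction, detail enhancement */ }

// Folds the frame through each stage in sequence.
fun runPipeline(
    frame: Frame,
    stages: List<Stage> = listOf(ifeStage, bpsStage, ipeStage)
): Frame = stages.fold(frame) { f, stage -> stage(f) }
```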
It will be understood that, although the terms "first," "second," etc. may be used herein to describe various features, these features should not be limited by these terms. These terms are used merely for distinguishing and are not to be construed as indicating or implying relative importance. For example, a first feature may be referred to as a second feature, and similarly a second feature may be referred to as a first feature, without departing from the scope of the example embodiments.
Furthermore, various operations will be described as multiple discrete operations, in a manner that is most helpful for understanding the illustrative embodiments; however, the order of description should not be construed as implying that these operations are necessarily order-dependent, and many of the operations may be performed in parallel, concurrently, or together with other operations. Furthermore, the order of the operations may be rearranged. The process may be terminated when the described operations are completed, but may also include additional operations not shown in the figures. The processes may correspond to methods, functions, procedures, subroutines, and the like.
References in the specification to "one embodiment," "an illustrative embodiment," etc., indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may or may not necessarily include that particular feature, structure, or characteristic. Moreover, such phrases are not necessarily referring to the same embodiment. Furthermore, when a particular feature is described in connection with a particular embodiment, it is within the knowledge of one skilled in the art to effect such a feature in connection with other embodiments, whether or not those embodiments are explicitly described.
The terms "comprising," "having," and "including" are synonymous, unless the context dictates otherwise. The phrase "A/B" means "A or B". The phrase "a and/or B" means "(a), (B) or (a and B)".
As used herein, the term "module" may refer to, be part of, or include: an application specific integrated circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group) and/or memory (shared, dedicated, or group) that runs one or more software or firmware programs, a combinational logic circuit, and/or other suitable components that provide the described functionality.
In the drawings, some structural or methodological features may be shown in a particular arrangement and/or order. However, it should be understood that such a particular arrangement and/or ordering is not required. Rather, in some embodiments, these features may be described in a different manner and/or order than shown in the illustrative figures. Additionally, the inclusion of a structural or methodological feature in a particular drawing does not imply that all embodiments need to include such feature, and in some embodiments may not be included or may be combined with other features.
The embodiments of the present application have been described in detail above with reference to the accompanying drawings, but the application of the technical solutions of the present application is not limited to the applications mentioned in these embodiments; various structures and modifications can be easily implemented with reference to the technical solutions of the present application to achieve the various beneficial effects mentioned herein. Various changes that can be made by those of ordinary skill in the art without departing from the spirit of the present application shall fall within the scope of the claims of the present application.
Claims (12)
1. A multimedia data generation method, which is applied to an electronic device having a photographing function, the method comprising:
displaying a shooting preview picture of the dynamic photo;
generating and storing a first feature detection result of the preview video data;
detecting a shooting instruction of a user, and generating and storing a first feature identifier of a first dynamic photo, wherein the first dynamic photo comprises the preview video data, a first preview picture displayed when the shooting instruction is detected, and collected video data acquired after the shooting instruction is detected, and
the first feature identifier comprises the first feature detection result, a second feature detection result of the collected video data, and a third feature detection result of the first preview picture.
2. The method of claim 1, wherein the first feature detection result, the second feature detection result, and the third feature detection result comprise: at least one of scene detection results, human body detection results, human face detection results and action evaluation results.
3. The method of claim 2, wherein generating and storing the first feature detection result of the preview video data comprises:
in the process of generating and storing the preview video data, performing feature detection frame by frame on the video frames included in the preview video data to obtain image features of each video frame, wherein the first feature detection result comprises the image features obtained by performing a merging and de-duplication operation on the image features of each video frame.
4. The method of claim 2, wherein detecting a shooting instruction of a user and generating and storing the first feature identifier of the first dynamic photo comprises:
after the shooting instruction of the user is detected, performing feature detection frame by frame on the video frames included in the preview video data to obtain the first feature detection result.
5. The method of claim 2, wherein the content of the first feature identifier comprises: at least one of sports, travel, scenery, and children's interests.
6. The method of claim 5, wherein the first feature identifier is displayed on the first dynamic photograph.
7. The method of claim 2, wherein generating and storing the first feature identifier of the first dynamic photo comprises:
performing a merging and de-duplication operation on the first feature detection result, the second feature detection result, and the third feature detection result to obtain the first feature identifier.
8. The method as recited in claim 1, further comprising:
in a storage display interface of the dynamic photos, the first dynamic photo is arranged adjacent to a stored second dynamic photo, wherein a second feature identifier of the second dynamic photo is identical to the first feature identifier of the first dynamic photo.
9. The method as recited in claim 8, further comprising:
in a storage display interface of the dynamic photos, the first dynamic photo is positioned before a third dynamic photo, wherein the first feature identifier of the first dynamic photo belongs to the feature identifiers in a sorting condition, and a third feature identifier of the third dynamic photo does not belong to the feature identifiers in the sorting condition.
10. The method as recited in claim 9, further comprising:
detecting a short video generation operation of the user, and generating a short video from the dynamic photos selected by the user.
11. An electronic device, comprising:
a memory for storing instructions for execution by one or more processors of the electronic device, an
A processor, being one of the processors of an electronic device, for performing the multimedia data generation method of any of claims 1-10.
12. A computer program product, comprising: a non-transitory computer readable storage medium containing computer program code for performing the multimedia data generation method of any one of claims 1-10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310595352.5A CN117714816A (en) | 2023-05-24 | 2023-05-24 | Electronic equipment and multimedia data generation method thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117714816A true CN117714816A (en) | 2024-03-15 |
Family
ID=90159437
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310595352.5A Pending CN117714816A (en) | 2023-05-24 | 2023-05-24 | Electronic equipment and multimedia data generation method thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117714816A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110162643A (en) * | 2018-09-13 | 2019-08-23 | 腾讯科技(深圳)有限公司 | Electron album report-generating method, device and storage medium |
CN112911399A (en) * | 2021-01-18 | 2021-06-04 | 网娱互动科技(北京)股份有限公司 | Method for quickly generating short video |
CN113077532A (en) * | 2021-03-17 | 2021-07-06 | 珠海市魅族科技有限公司 | Dynamic photo generation method and device and readable storage medium |
CN113453040A (en) * | 2020-03-26 | 2021-09-28 | 华为技术有限公司 | Short video generation method and device, related equipment and medium |
CN115567660A (en) * | 2022-02-28 | 2023-01-03 | 荣耀终端有限公司 | Video processing method and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||