
CN116821415A - Information processing method and device and electronic equipment - Google Patents

Information processing method and device and electronic equipment

Info

Publication number
CN116821415A
CN116821415A
Authority
CN
China
Prior art keywords
area
dynamic
text content
film
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310824391.8A
Other languages
Chinese (zh)
Inventor
姬广如
佘志强
魏丽萍
章峥峥
计晨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
MIGU Digital Media Co Ltd
Original Assignee
Migu Cultural Technology Co Ltd
China Mobile Communications Group Co Ltd
MIGU Digital Media Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Migu Cultural Technology Co Ltd, China Mobile Communications Group Co Ltd, MIGU Digital Media Co Ltd filed Critical Migu Cultural Technology Co Ltd
Priority to CN202310824391.8A priority Critical patent/CN116821415A/en
Publication of CN116821415A publication Critical patent/CN116821415A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/74Browsing; Visualisation therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F16/7844Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using original textual content or text extracted from visual content or transcript of audio data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/12Edge-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/635Overlay text, e.g. embedded captions in a TV program
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Library & Information Science (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses an information processing method and apparatus and an electronic device, relates to the technical field of electronic information, and aims to solve the problem that the display effect of existing electronic reading modes is poor. The method comprises the following steps: determining an associated film according to text content displayed in a first area on a reading interface; acquiring N dynamic videos in the associated film; and displaying at least one of the N dynamic videos in a second area on the reading interface. According to the embodiments of the application, by displaying dynamic videos from the associated film on the text reading interface, the text content can be acted out by the movie content in the associated movie resource, which improves the reading display effect and provides users with a more immersive content experience.

Description

Information processing method and device and electronic equipment
Technical Field
The present application relates to the field of electronic information technologies, and in particular, to an information processing method, an information processing device, and an electronic device.
Background
With the popularization of electronic devices such as mobile phones, people's reading habits have changed, and novels are often read using software such as e-book readers. E-books are basically displayed as plain text or as text with images, so users can only appreciate the excitement of a novel through written language and pictures, and the display effect is therefore poor.
Disclosure of Invention
The embodiment of the application provides an information processing method, an information processing device and electronic equipment, which are used for solving the problem that the display effect of the existing electronic reading mode is poor.
In a first aspect, an embodiment of the present application provides an information processing method, including:
determining an associated film according to the text content displayed in the first area on the reading interface;
acquiring N dynamic videos in the associated film;
and displaying at least one of the N dynamic videos in a second area on the reading interface.
Optionally, the acquiring N dynamic videos in the associated movie includes:
and performing matting processing on N objects in the associated film that match the characters in the text content, and generating a dynamic video of each object from the multi-frame images of that object obtained by matting, to obtain the N dynamic videos.
Optionally, the determining the associated movie according to the text content displayed in the first area on the reading interface includes:
determining, according to the information of the literary work to which the text content belongs, a first movie resource that matches the information of the literary work;
extracting the dialogue lines or narration from the text content;
acquiring subtitle information of the first movie resource;
and determining the movie clip in the first movie resource whose subtitle information matches the dialogue lines or narration as the associated film.
Optionally, the determining the associated movie according to the text content displayed in the first area on the reading interface includes:
determining, according to the information of the literary work to which the text content belongs, a second movie resource that matches the information of the literary work;
semantic understanding is carried out on the text content, and the content plot of the text content is determined;
and determining the film and television fragment matched with the content plot in the second film and television resource as the associated film.
Optionally, the number of the associated movies is multiple, and the multiple associated movies are respectively from multiple different movie resources corresponding to the text content;
the displaying at least one of the N dynamic videos in the second area on the reading interface includes:
displaying N object options in a third area on the reading interface, wherein the N object options are in one-to-one correspondence with the N dynamic videos;
and displaying M dynamic videos corresponding to the M object options in the second area according to the selection operation of the user on the M object options in the N object options.
Optionally, the associated film includes a first film of the same type as the literary work to which the text content belongs;
the displaying at least one of the N dynamic videos in the second area on the reading interface includes:
converting the text content into a first audio signal;
based on the first audio signal, adjusting facial actions in a first dynamic video, wherein the first dynamic video is a dynamic video from the first film in the N dynamic videos;
and displaying the adjusted first dynamic video in the second area.
Optionally, after the displaying at least one of the N dynamic videos in the second area on the reading interface, the method further includes:
updating the text content displayed in the first area;
adjusting the display transparency of the second dynamic video to hide the second dynamic video under the condition that the content performed by the second dynamic video displayed on the second area is not matched with the text content currently displayed in the first area;
or, under the condition that the content performed by the second dynamic video is not matched with the text content currently displayed in the first area, converting the text content currently displayed in the first area into a second audio signal, and adjusting the facial action in the second dynamic video based on the second audio signal.
Optionally, the displaying at least one of the N dynamic videos in the second area on the reading interface includes:
determining the display position and the display size of the second area according to the display position and the display size of the first area, wherein the second area is not overlapped with the first area;
and displaying at least one of the N dynamic videos in the second area.
In a second aspect, an embodiment of the present application further provides an information processing apparatus, including:
the determining module is used for determining the associated film according to the text content displayed in the first area on the reading interface;
the acquisition module is used for acquiring N dynamic videos in the associated film;
and the display module is used for displaying at least one of the N dynamic videos in a second area on the reading interface.
Optionally, the acquiring module is configured to perform matting processing on N objects in the associated film, where the N objects are matched with characters in the text content, and generate a dynamic video of each object by using a multi-frame image of each object obtained by matting, so as to obtain the N dynamic videos.
Optionally, the determining module includes:
the first determining unit is used for determining a first film and television resource matched with the information of the literary works according to the information of the literary works to which the text content belongs;
The extraction unit is used for extracting the dialogue lines or narration from the text content;
the acquisition unit is used for acquiring subtitle information of the first film and television resource;
and the second determining unit is used for determining the movie clip in the first movie resource whose subtitle information matches the dialogue lines or narration as the associated film.
Optionally, the determining module includes:
a third determining unit, configured to determine, according to information of a literature work to which the text content belongs, a second movie resource that matches the information of the literature work;
a fourth determining unit, configured to perform semantic understanding on the text content, and determine a content scenario of the text content;
and a fifth determining unit, configured to determine a movie clip matching the content scenario in the second movie resource as the associated movie.
Optionally, the number of the associated movies is multiple, and the multiple associated movies are respectively from multiple different movie resources corresponding to the text content;
the display module includes:
the first display unit is used for displaying N object options in a third area on the reading interface, and the N object options are in one-to-one correspondence with the N dynamic videos;
And the second display unit is used for displaying M dynamic videos corresponding to the M object options in the second area according to the selection operation of the user on the M object options in the N object options.
Optionally, the associated film includes a first film of the same type as the literary work to which the text content belongs;
the display module includes:
a conversion unit for converting the text content into a first audio signal;
the adjusting unit is used for adjusting facial actions in a first dynamic video based on the first audio signal, wherein the first dynamic video is a dynamic video from the first film in the N dynamic videos;
and the third display unit is used for displaying the adjusted first dynamic video in the second area.
Optionally, the information processing apparatus further includes:
the updating module is used for updating the text content displayed in the first area;
the first adjusting module is used for adjusting the display transparency of the second dynamic video to hide the second dynamic video when the content performed by the second dynamic video displayed on the second area is not matched with the text content currently displayed in the first area;
Or the second adjusting module is used for converting the text content currently displayed in the first area into a second audio signal and adjusting the facial action in the second dynamic video based on the second audio signal under the condition that the content performed by the second dynamic video is not matched with the text content currently displayed in the first area.
Optionally, the display module includes:
a sixth determining unit, configured to determine, according to a display position and a display size of the first area, a display position and a display size of the second area, where the second area does not overlap with the first area;
and the fourth display unit is used for displaying at least one of the N dynamic videos in the second area.
In a third aspect, an embodiment of the present application further provides an electronic device, including: a memory, a processor and a computer program stored on the memory and executable on the processor, which processor implements the steps in the information processing method as described above when executing the computer program.
In a fourth aspect, embodiments of the present application also provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the information processing method as described above.
In the embodiment of the application, the associated film is determined according to the text content displayed in the first area on the reading interface; N dynamic videos in the associated film are acquired; and at least one of the N dynamic videos is displayed in the second area on the reading interface. Thus, by displaying dynamic videos from the associated film on the text reading interface, the text content can be acted out by the movie content in the associated movie resource, improving the reading display effect and providing users with a more immersive content experience.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort to a person of ordinary skill in the art.
FIG. 1 is one of the flowcharts of an information processing method provided by an embodiment of the present application;
fig. 2 is a schematic diagram of subtitle information in a video resource according to an embodiment of the present application;
FIG. 3 is a schematic diagram of matching video resources according to text content according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a person image extracted from a matched movie resource according to an embodiment of the present application;
FIG. 5 is a schematic diagram of fusing characters from different movie versions to act out text content according to an embodiment of the present application;
fig. 6 is a schematic diagram of an operation interface for dragging a selected character into a performance area according to an embodiment of the present application;
fig. 7 is a block diagram of an information processing apparatus provided by an embodiment of the present application;
fig. 8 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
According to the embodiments of the application, the original novel content is combined with related movie works. A passage of the original novel is automatically matched with a similar plot in a movie work, the characters of that movie scene are extracted by matting technology and matched into the novel plot, and the matched characters then perform the scene again according to the novel's dialogue, returning to the original content. A user faithful to the original can thus feel the charm of the original work through the movie performance, which provides a better immersive content experience.
Referring to fig. 1, fig. 1 is a flowchart of an information processing method provided in an embodiment of the present application, as shown in fig. 1, including the following steps:
Step 101, determining the associated film according to the text content displayed in the first area on the reading interface.
The embodiment of the application is suitable for electronic reading scenarios: when a user reads a novel using software such as an e-book reader, related movie works can be matched automatically, and characters or related content from the movie works can be displayed on the reading interface through matting processing to act out the plot currently being read, so that the user can appreciate the currently read content together with the performance of the movie characters.
The reading interface may be a reading interface of a specific application program, such as an electronic book or a reading application program, and specifically, the user may set a video deduction function to be started for such application program, so as to execute the flow of the embodiment of the present application when the user opens such application program to start reading.
The first area may be a full text display area on the reading interface, or may be a part of text display area on the reading interface, for example, an upper half text area, a middle part text area, or a lower half text area on the reading interface may default to a text area currently read by a user, that is, may be used as the first area, and a specific area size may be preset.
According to the embodiment of the application, the text content displayed in the first area on the reading interface of the electronic equipment can be read in real time, the film and television resources matched with the literary works read currently are searched in a networking mode according to the text content, the film and television segments associated with the text content are determined, or the film and television resources corresponding to each literary work are preloaded, and therefore the film and television segments associated with the text content are determined from the film and television resources corresponding to the literary works read currently; the associated movie may be a movie fragment in which the movie content is the same as or similar to the text content, or a movie fragment in which the movie subtitle is the same as or similar to the line in the text content, or a movie fragment in which the plot of the movie content is similar to the plot of the story expressed by the text content. The associated movie may be a movie fragment that matches the textual content, taken from a piece of complete movie work.
Optionally, the step 101 includes:
determining, according to the information of the literary work to which the text content belongs, a first movie resource that matches the information of the literary work;
extracting the dialogue lines or narration from the text content;
acquiring subtitle information of the first movie resource;
and determining the movie clip in the first movie resource whose subtitle information matches the dialogue lines or narration as the associated film.
In one embodiment, the associated film may be matched based on the dialogue lines in the text currently being read.
In this embodiment, information of the literary work to which the currently read text content belongs may be obtained; for example, the work name, chapter title, and similar information displayed on the reading interface may be read, or the information of the literary work, such as the work name, author, and chapter title, may be determined based on the literary work the user selected before entering the current reading interface. A first movie resource matching the information of the literary work, such as the work name or chapter title, is then determined by searching each movie resource's film title, episode titles, or synopsis. For example, when the literary work Dream of the Red Chamber is being read, the matching TV series Dream of the Red Chamber can be found, and when the literary work Life is being read, the matching film Life can be found.
A video file is generally divided into a video stream, an audio stream, and a subtitle stream, where the subtitle stream faithfully reflects the original dialogue, narration, and other content; for a video without a subtitle stream, the subtitle information can be recovered by converting the audio into text. The subtitle information format is shown in fig. 2.
In order to determine the movie clip in the first movie resource that is associated with the text content, the text content can be parsed to identify each character's name, the dialogue content, the dialogue order, the narration, and other information, so that each character's dialogue lines or narration can be extracted from the text content. The subtitle information of the first movie resource is acquired as well: when the first movie resource has a subtitle stream, the subtitle information can be obtained directly from the subtitle stream and the dialogue lines or narration of each actor identified; when there is no subtitle stream, the audio stream of the first movie resource can be converted into text to obtain the subtitle information and identify each actor's dialogue lines or narration.
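As a minimal sketch of this step, the snippet below first tries to dump an existing subtitle stream with ffmpeg and falls back to speech recognition when none exists; the choice of openai-whisper as the ASR engine is an assumption for illustration, since the application does not name one.

```python
import subprocess
import whisper  # example ASR engine; an assumption, the application names none

def extract_subtitle_stream(video_path: str, out_srt: str) -> bool:
    """Dump the first subtitle stream to .srt; returns False if none exists."""
    result = subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-map", "0:s:0", out_srt],
        capture_output=True,
    )
    # ffmpeg exits with an error when the file has no subtitle stream.
    return result.returncode == 0

def subtitles_via_asr(video_path: str) -> list[dict]:
    """Fallback: recover subtitle-like timed entries by converting audio to text."""
    model = whisper.load_model("base")
    segments = model.transcribe(video_path)["segments"]
    return [{"start": s["start"], "end": s["end"], "text": s["text"]}
            for s in segments]
```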
In this way, the dialogue lines or narration in the text content can be matched against the subtitle information of the first movie resource to determine the matching subtitle entries; specifically, a match may be considered found when the matching degree exceeds a certain threshold, such as 80%. The start frame and end frame of the associated film in the first movie resource are then determined from the matched subtitle information, and the video frames between the start frame and the end frame constitute the associated film.
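The boundary marking just described can be sketched as follows, using simple string similarity as a stand-in for whatever matching model is actually deployed; the 0.8 threshold mirrors the 80% matching degree mentioned above.

```python
from difflib import SequenceMatcher

def match_clip(lines: list[str], subtitles: list[dict],
               threshold: float = 0.8):
    """Locate the associated clip by matching extracted dialogue lines
    against timed subtitle entries of the form
    {"start": s, "end": e, "text": t}.
    Returns (start_time, end_time) of the clip, or None if nothing matches."""
    hits = []
    for line in lines:
        best = max(
            subtitles,
            key=lambda sub: SequenceMatcher(None, line, sub["text"]).ratio(),
        )
        if SequenceMatcher(None, line, best["text"]).ratio() >= threshold:
            hits.append(best)
    if not hits:
        return None
    # The clip spans from the first matched subtitle to the last one;
    # the frames between them form the associated film.
    return min(h["start"] for h in hits), max(h["end"] for h in hits)
```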
With this implementation, the associated film can be matched accurately from the literary work information and the dialogue lines of the currently read text content, so that the movie clip corresponding to that content can be displayed to the user, bringing an immersive reading experience.
Optionally, the step 101 includes:
determining, according to the information of the literary work to which the text content belongs, a second movie resource that matches the information of the literary work;
performing semantic understanding on the text content and determining the content plot of the text content;
and determining the movie clip in the second movie resource that matches the content plot as the associated film.
In another embodiment, the associated film may be matched based on the storyline of the text currently being read.
In this embodiment, the manner of determining the second movie resource that matches the information of the literary work may be similar to that of the previous embodiment: the information of the literary work to which the currently read text content belongs, such as the work name, author, or chapter title, is obtained from the reading interface or determined from the literary work the user selected before entering the current reading interface, and the movie resource whose film title, episode titles, or synopsis matches that information is determined as the second movie resource.
In order to determine the movie clip in the second movie resource that is associated with the text content, semantic understanding can be performed on the text content to determine its content plot; for example, the text content is processed by a natural language understanding algorithm to obtain a semantic representation. The movie segment in the second movie resource with a similar plot is then intelligently matched using the story plot of the text content, and the start and end positions of the associated film are marked; for example, a deep learning algorithm performs content plot matching between the second movie resource and the text content to determine the associated film in the second movie resource.
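One way to realize this plot matching, sketched under the assumption that each segment of the second movie resource already has a short plot summary, is to embed both sides with a sentence-embedding model and take the most similar segment; the specific model is an illustrative choice, not something the application prescribes.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative embedding model standing in for the "semantic understanding
# + deep learning matching" described above.
model = SentenceTransformer("all-MiniLM-L6-v2")

def find_associated_clip(text_content: str, scene_summaries: list[dict]) -> dict:
    """Pick the movie segment whose plot summary is semantically closest.
    `scene_summaries` entries look like {"start": s, "end": e, "summary": "..."}
    and are assumed to have been prepared per segment of the movie resource."""
    text_vec = model.encode(text_content, convert_to_tensor=True)
    scene_vecs = model.encode([s["summary"] for s in scene_summaries],
                              convert_to_tensor=True)
    scores = util.cos_sim(text_vec, scene_vecs)[0]
    best = int(scores.argmax())
    # The chosen segment's start/end mark the associated film's boundaries.
    return scene_summaries[best]
```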
For example, as shown in fig. 3, semantic analysis is performed on the story content displayed in the first area 31 on the reading interface, namely the plot in Dream of the Red Chamber in which the three main characters meet for the first time, and it is automatically matched with the relevant plot segment in the movie work Dream of the Red Chamber. The same content plot of the original work can also be matched to multiple different versions of movie works.
Therefore, with this implementation, the associated film can be matched semantically according to the literary work information and the content plot of the currently read text content, so that the movie clip corresponding to that content can be displayed to the user, bringing an immersive reading experience.
Step 102, acquiring the dynamic videos of N objects in the associated film to obtain N dynamic videos, wherein N is a positive integer.
The N dynamic videos may be character dynamic videos or dynamic videos of other contents related to the description of the text contents, and in particular, the specific embodiment of the present application is described by taking characters as an example:
Acquiring the N dynamic videos in the associated film may involve performing matting processing on one or more persons in the associated film to obtain multiple person dynamic videos, each composed of the same person's image in every frame, where the persons to be matted out may be the main characters in the associated film or the persons matching the characters appearing in the text content.
Alternatively, acquiring the N dynamic videos in the associated film may involve directly acquiring, from the person dynamic videos obtained by matting the associated film in advance, the N dynamic videos of persons matching the characters appearing in the text content.
Optionally, the step 102 includes:
and performing matting processing on N objects in the associated film that match the characters in the text content, and generating a dynamic video of each object from the multi-frame images of that object obtained by matting, to obtain the N dynamic videos.
In one embodiment, the associated film may be matted while reading, where an object may be a person, an animal, an animated character, or the like in the associated film. Taking persons as an example, each frame of image in the associated film can be analyzed to identify the person images, the person image in each frame is matted out, and the per-frame images of the same person obtained by matting are combined in video frame order to generate that person's dynamic video. In some embodiments, in order to extract only the dynamic videos of persons corresponding to the characters in the text content, the images of those persons in the associated film can be identified first, so that only their images are matted out to obtain their dynamic videos.
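A minimal sketch of this per-frame matting loop is given below; the `segment_persons` stub stands in for whatever video instance-segmentation and tracking model is used (the application does not name one), and OpenCV is assumed only for frame I/O.

```python
import cv2
import numpy as np

def segment_persons(frame: np.ndarray) -> list[tuple[str, np.ndarray]]:
    """Placeholder for an instance-segmentation + tracking model that returns
    (person_id, alpha_mask) pairs per frame, with stable ids across frames.
    This is an assumption, not a real library call."""
    raise NotImplementedError

def matte_characters(video_path: str) -> dict[str, list[np.ndarray]]:
    """Cut each detected person out of every frame and group the RGBA
    cut-outs by person, in frame order, yielding one dynamic video
    (frame sequence) per character."""
    clips: dict[str, list[np.ndarray]] = {}
    cap = cv2.VideoCapture(video_path)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        for person_id, mask in segment_persons(frame):
            rgba = cv2.cvtColor(frame, cv2.COLOR_BGR2BGRA)
            rgba[:, :, 3] = mask  # keep only this person: alpha from the matte
            clips.setdefault(person_id, []).append(rgba)
    cap.release()
    return clips
```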
It should be noted that the audio of a matted character in the associated film may be added to that character's dynamic video, so as to ensure that the character performs its dialogue lines when its dynamic video is displayed; alternatively, corresponding audio may be generated for the lines each character needs to perform and merged into the corresponding dynamic video.
For example, as shown in fig. 4, when the chapter in which the three main characters of the literary work Dream of the Red Chamber meet for the first time is currently being read, it can be matched with the video segment of that first meeting in the movie work, and dynamic matting can be performed on the three characters in the video segment to obtain dynamic videos of the three characters.
Thus, with this embodiment, the associated film can be matched in real time according to the currently read content, and the dynamic videos of the corresponding characters obtained by matting, so that the currently read content can conveniently be acted out by subsequently displaying the dynamic videos.
Step 103, displaying at least one of the N dynamic videos in a second area on the reading interface.
After the N dynamic videos are obtained, the N dynamic videos may be directly displayed in a second area on the reading interface, or the N dynamic videos may be further filtered, and only some of the N dynamic videos may be displayed, or specifically, one or more of the N dynamic videos may be determined to be displayed according to user selection.
The second area may be a specific display area on the reading interface, for example, a bottom area or a top area of the reading interface, and when the first area is a certain part of a text display area on the reading interface, the second area may be an area which is not overlapped with the first area on the reading interface, so as to avoid that a dynamic video displayed on the reading interface and the text content are blocked mutually to influence the reading of a user.
Optionally, the step 103 includes:
determining the display position and the display size of the second area according to the display position and the display size of the first area, wherein the second area is not overlapped with the first area;
and displaying at least one of the N dynamic videos in the second area.
In an embodiment, the first area and the second area may be display areas that do not overlap each other on the reading interface, so as to avoid mutual shielding between text content and dynamic video.
It should be noted that, in some embodiments, the first area and the second area may also overlap, and when the first area and the second area overlap, transparency of the second area and the content displayed therein may be improved, for example, the transparency of the display of the second area is increased, so that the user can view the content displayed in the overlapping area under the condition of overlapping.
In this embodiment, when a dynamic video needs to be displayed on the reading interface, its display area can be automatically adapted to the display area of the text content currently being read, that is, the first area. Specifically, the display position and size of the second area may be determined from the display position and size of the first area so that the second area does not overlap the first area: for example, the second area is placed at a different position from the first area with a certain spacing, and its size equals that of the first area or differs from it by no more than a preset value.
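A minimal layout sketch, assuming a below-else-above placement rule and a fixed gap (both illustrative; the application only requires that the second area follow the first area's position and size without overlapping it):

```python
from dataclasses import dataclass

@dataclass
class Rect:
    x: int
    y: int
    w: int
    h: int

def place_second_area(first: Rect, screen: Rect, gap: int = 24) -> Rect:
    """Derive the performance area's position and size from the text area,
    keeping the two areas non-overlapping."""
    # Same size as the first area (the text also allows a bounded difference).
    w, h = first.w, first.h
    below_y = first.y + first.h + gap
    if below_y + h <= screen.y + screen.h:
        return Rect(first.x, below_y, w, h)   # fits below the text area
    return Rect(first.x, max(screen.y, first.y - gap - h), w, h)  # place above
```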
After determining the second region, at least one of the N dynamic videos may be displayed in the second region.
Through the implementation mode, the mutual shielding of the text content and the dynamic video can be avoided, and the better visual display effect is ensured.
Optionally, the number of the associated movies is multiple, and the multiple associated movies are respectively from multiple different movie resources corresponding to the text content;
the step 103 includes:
displaying N object options in a third area on the reading interface, wherein the N object options are in one-to-one correspondence with the N dynamic videos;
and displaying M dynamic videos corresponding to the M object options in the second area according to the selection operation of the user on the M object options in the N object options, wherein M is an integer greater than or equal to 1 and less than or equal to N.
In one embodiment, the current literary work may have multiple matching movie resources; for example, a literary work may have been adapted into multiple different versions of movies. In this case, multiple associated films from different movie versions may be matched, and multiple dynamic videos from the different movie resources obtained by matting. In this embodiment, dynamic videos from different movie works may therefore need to be presented simultaneously in one picture; the extracted dynamic videos can be listed, so that the user can select the dynamic videos to be performed according to preference.
For example, as shown in fig. 5, when the literary work Dream of the Red Chamber is being read, two versions of the movie work Dream of the Red Chamber can be matched, so that characters from the two versions can be displayed together on the reading interface, for example characters from the first version and the second version performing a scene together on the reading interface.
Specifically, taking persons as an example, N person options may be displayed in a third area on the reading interface, the N person options corresponding one-to-one to the N dynamic videos. The third area may be a bottom area of the reading interface or another preset area. The user may select the desired performer options according to preference; for example, the user may select a person option and drag it into the performance area, that is, the second area, so that the dynamic video corresponding to the selected person option is displayed in the performance area. The user may repeat the selection and drag operations on different person options to finish selecting multiple dynamic videos for the performance.
For example, as shown in fig. 6, multiple person options obtained by matting two different versions of movie work clips are displayed at the bottom of the reading interface; the user selects a favorite actor and drags it to the performance area, so that the dynamic video of the selected person is displayed in the performance area, while the dynamic videos of the unselected persons may be hidden or not displayed. A minimal state model of this interaction is sketched below.
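The sketch assumes a simple option-to-video mapping; all names are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class PerformanceStage:
    """Minimal state for the option/performance areas: N options map
    one-to-one to N dynamic videos; dragging an option into the stage adds
    the corresponding video to the set of M displayed ones."""
    videos: dict[str, str]                    # option id -> dynamic video uri
    staged: list[str] = field(default_factory=list)

    def drag_to_stage(self, option_id: str) -> None:
        if option_id in self.videos and option_id not in self.staged:
            self.staged.append(option_id)

    def videos_to_display(self) -> list[str]:
        # Only the selected characters perform; unselected ones stay hidden.
        return [self.videos[o] for o in self.staged]
```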
Thus, through the embodiment, the dynamic videos in different film and television versions can be fused according to the selection of the user, so that the display effect is improved.
Optionally, the associated film includes a first film of the same type as the literary work to which the text content belongs;
the step 103 includes:
converting the text content into a first audio signal;
based on the first audio signal, adjusting facial actions in a first dynamic video, wherein the first dynamic video is a dynamic video from the first film in the N dynamic videos;
and displaying the adjusted first dynamic video in the second area.
In one embodiment, when determining the associated film, there may be no movie resource for the literary work to which the text content belongs. In this case, movie resources of the same type as the literary work may be searched, and a classic movie clip extracted from them as the associated film, or a movie clip performed by an actor the user loves may be extracted. For example, for a martial-arts novel, a clip performed by the user's favorite actor may be extracted from a martial-arts movie starring that actor as the associated film.
In this embodiment, the content of the dynamic video extracted from the associated film may not be identical to the text content. In this case, the facial actions of the person in the dynamic video need to be adaptively adjusted to perform the dialogue lines of the corresponding character in the text content. Based on the dialogue of the current chapter's characters in the original work, the movie character can act out the original chapter, presenting a movie picture to readers who love the original.
Specifically, the text content may be converted into the first audio signal; in particular, the dialogue portion of the text content that needs to be performed by the character in the first dynamic video may be converted into the first audio signal, for example by using text-to-speech (TTS) technology.
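A minimal TTS sketch; pyttsx3 is one off-the-shelf engine chosen for illustration, as the application only specifies TTS in general:

```python
import pyttsx3  # example offline TTS engine; an illustrative choice

def lines_to_audio(dialogue: str, out_wav: str = "first_audio.wav") -> str:
    """Convert the dialogue portion the character must perform into the
    first audio signal, saved here as a WAV file."""
    engine = pyttsx3.init()
    engine.save_to_file(dialogue, out_wav)
    engine.runAndWait()
    return out_wav
```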
The first audio signal is then used to drive the facial actions of the person in the first dynamic video, so that what the first dynamic video performs matches the text content. This means the facial expression of the person in the first dynamic video must be redrawn, which can be achieved with the speech-driven head animation technique MakeItTalk: from a speech signal and an ordinary two-dimensional (2D) photo or cartoon image, it generates a facial animation video synchronized with the input audio. The core of the technique is to use 68 facial key points as an intermediate representation linking the mapping from speech to 2D animation, decomposing the overall problem into two independent ones and using neural networks to learn, respectively, the mapping from audio to low-dimensional facial key point animation and the mapping from facial key point animation to 2D animation. The specific steps are as follows:
firstly, a picture to be driven, such as a person image from the first dynamic video, is input, and the facial key points of the person are extracted by an existing method;
secondly, a piece of audio, such as the first audio signal, is input, and an audio conversion module separates it into two signals: an audio content signal and a speaker identity signal;
then, for the speech content signal, a temporal network learns the facial key point displacements driven by the speech content, and these displacements are superimposed on the person's extracted original facial key points so that the mouth motion matches the input audio;
finally, the facial key point animation is converted into a facial animation: a cartoon face animation is converted by a triangulation and warping module, and a real face animation by an image translation network structure.
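The skeleton below mirrors those four steps; the landmark detector and the three learned modules are stubs labeled as assumptions, not a real MakeItTalk API.

```python
import numpy as np

def extract_landmarks(face_image: np.ndarray) -> np.ndarray:
    """Placeholder for a 68-point facial landmark detector; returns an
    array of shape (68, 2). An assumption, not a real library call."""
    raise NotImplementedError

def drive_face(face_image: np.ndarray, audio: np.ndarray,
               content_net, identity_net, renderer) -> list[np.ndarray]:
    """Two-stage MakeItTalk-style pipeline. `content_net`, `identity_net`
    and `renderer` stand for the learned modules (audio -> landmark
    displacement, speaker embedding, and landmark animation -> 2D face);
    they are assumptions for illustration."""
    base = extract_landmarks(face_image)               # step 1: 68 keypoints
    speaker = identity_net(audio)                      # step 2: identity signal
    frames = []
    for audio_window in np.array_split(audio, 100):    # 100 output frames, for simplicity
        # step 3: temporal net predicts landmark displacement from content
        delta = content_net(audio_window, speaker)     # shape (68, 2)
        moved = base + delta                           # mouth tracks the audio
        # step 4: landmark animation -> 2D face frame (warping for cartoons,
        # image translation for real photos)
        frames.append(renderer(face_image, moved))
    return frames
```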
After the facial actions of the people in the first dynamic video are adjusted, the adjusted first dynamic video can be displayed in the second area.
Therefore, according to the embodiment, the actors in the movie drama of the same type as the currently read text content can be used for performing the currently read text content, so that the currently read text content is presented to the user in a movie manner, and the electronic reading experience of the user is improved.
Optionally, after the step 103, the method further includes:
updating the text content displayed in the first area;
adjusting the display transparency of the second dynamic video to hide the second dynamic video under the condition that the content performed by the second dynamic video displayed on the second area is not matched with the text content currently displayed in the first area;
or under the condition that the content performed by the second dynamic video is not matched with the text content currently displayed in the first area, converting the text content currently displayed in the first area into a second audio signal, and adjusting the facial action in the second dynamic video based on the second audio signal.
In one embodiment, as the user's reading progress advances, the text content displayed in the first area is updated. If the content performed by a second dynamic video displayed in the second area does not match the text content currently displayed in the first area, for example the dialogue lines it performs are inconsistent with the dialogue lines in the text content, or while narration is being played through intelligent speech, the display transparency of that dynamic video can be adjusted to hide it. The dynamic video extracted from the associated film can be in webm format, and videos in this format support a transparency (alpha) channel, so the transparency can be adjusted.
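A sketch of producing such a transparency-capable clip from matted RGBA frames; the libvpx-vp9/yuva420p combination is a standard way to keep an alpha channel in webm, while the frame naming is illustrative:

```python
import subprocess

def rgba_frames_to_webm(frame_pattern: str, out_path: str, fps: int = 25) -> None:
    """Encode matted RGBA frames (e.g. 'char_%04d.png') into a webm whose
    alpha channel is preserved, so the player can later fade the whole
    clip out by adjusting its opacity."""
    subprocess.run(
        ["ffmpeg", "-y", "-framerate", str(fps), "-i", frame_pattern,
         "-c:v", "libvpx-vp9", "-pix_fmt", "yuva420p", out_path],
        check=True,
    )
```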
In still another embodiment, if the content performed by a second dynamic video displayed in the second area does not match the text content currently displayed in the first area, that text content can be converted into audio, the facial actions of the person in the dynamic video adjusted accordingly, and the converted audio played while the adjusted dynamic video is displayed, so that the content performed by the adjusted dynamic video matches the text content currently displayed in the first area. The specific manner is similar to the adjustment of the first dynamic video in the previous embodiment.
The text content currently displayed in the first area can be converted into a second audio signal; in particular, the dialogue portion of the text content that needs to be performed by the character in the second dynamic video can be converted into the second audio signal. The second audio signal is then used to drive the facial actions in the second dynamic video, so that the content performed by the second dynamic video matches the text content currently displayed in the first area. That is, the facial expression of the person in the second dynamic video needs to be redrawn, which can be implemented with the speech-driven head animation (MakeItTalk) technique; the specific implementation process is described in the foregoing embodiment and is not repeated here.
Thus, with this implementation, a mismatch between the content performed by the dynamic video and the currently read text content can be avoided, and their matching can be ensured, guaranteeing the video performance effect and the user's reading experience.
According to the embodiments of the application, the plot of the original text content is combined with movie works, and movie characters perform it anew, providing readers faithful to the original with a richer, multi-dimensional experience; movie characters from different versions are composited in one interface to perform across time and space, offering users different scene interpretations; and based on the text content of the novel, the movie character videos and the text area are reasonably adapted and laid out, further improving the reader's reading experience.
According to the information processing method, the associated film is determined according to the text content displayed in the first area on the reading interface; N dynamic videos in the associated film are acquired; and at least one of the N dynamic videos is displayed in a second area on the reading interface. Thus, by displaying dynamic videos from the associated film on the text reading interface, the text content can be acted out by the movie content in the associated movie resource, improving the reading display effect and providing users with a more immersive content experience.
The embodiment of the application also provides an information processing device. Referring to fig. 7, fig. 7 is a block diagram of an information processing apparatus provided in an embodiment of the present application. Since the principle of solving the problem of the information processing apparatus is similar to that of the information processing method in the embodiment of the present application, the implementation of the information processing apparatus can refer to the implementation of the method, and the repetition is omitted.
As shown in fig. 7, the information processing apparatus 700 includes:
the determining module 701 is configured to determine an associated movie according to text content displayed in the first area on the reading interface;
an acquiring module 702, configured to acquire N dynamic videos in the associated movie;
and the display module 703 is configured to display at least one of the N dynamic videos in a second area on the reading interface.
Optionally, the obtaining module 702 is configured to perform matting processing on N objects in the associated movie, where the N objects are matched with characters in the text content, and generate a dynamic video of each object by using a multi-frame image of each object obtained by matting, so as to obtain the N dynamic videos.
Optionally, the determining module 701 includes:
the first determining unit is used for determining a first film and television resource matched with the information of the literary works according to the information of the literary works to which the text content belongs;
The extraction unit is used for extracting the dialogue lines or narration from the text content;
the acquisition unit is used for acquiring subtitle information of the first film and television resource;
and the second determining unit is used for determining the movie clip in the first movie resource whose subtitle information matches the dialogue lines or narration as the associated film.
Optionally, the determining module 701 includes:
a third determining unit, configured to determine, according to information of a literature work to which the text content belongs, a second movie resource that matches the information of the literature work;
a fourth determining unit, configured to perform semantic understanding on the text content, and determine a content scenario of the text content;
and a fifth determining unit, configured to determine a movie clip matching the content scenario in the second movie resource as the associated movie.
Optionally, the number of the associated movies is multiple, and the multiple associated movies are respectively from multiple different movie resources corresponding to the text content;
the display module 703 includes:
the first display unit is used for displaying N object options in a third area on the reading interface, the N object options corresponding one-to-one to the N dynamic videos;
And the second display unit is used for displaying M dynamic videos corresponding to the M object options in the second area according to the selection operation of the user on the M object options in the N object options.
Optionally, the associated film includes a first film of the same type as the literary work to which the text content belongs;
the display module 703 includes:
a conversion unit for converting the text content into a first audio signal;
the adjusting unit is used for adjusting facial actions in a first dynamic video based on the first audio signal, wherein the first dynamic video is a dynamic video from the first film in the N dynamic videos;
and the third display unit is used for displaying the adjusted first dynamic video in the second area.
Optionally, the information processing apparatus 700 further includes:
the updating module is used for updating the text content displayed in the first area;
the first adjusting module is used for adjusting the display transparency of the second dynamic video to hide the second dynamic video when the content performed by the second dynamic video displayed on the second area is not matched with the text content currently displayed in the first area;
Or the second adjusting module is used for converting the text content currently displayed in the first area into a second audio signal and adjusting the facial action in the second dynamic video based on the second audio signal under the condition that the content performed by the second dynamic video is not matched with the text content currently displayed in the first area.
Optionally, the display module includes:
a sixth determining unit, configured to determine, according to a display position and a display size of the first area, a display position and a display size of the second area, where the second area does not overlap with the first area;
and the fourth display unit is used for displaying at least one of the N dynamic videos in the second area.
The information processing apparatus 700 provided in the embodiment of the present application may execute the above method embodiment, and its implementation principle and technical effects are similar, and this embodiment will not be described herein.
The information processing device 700 of the embodiment of the application determines the associated film according to the text content displayed in the first area on the reading interface; acquires N dynamic videos in the associated film; and displays at least one of the N dynamic videos in the second area on the reading interface. Thus, by displaying dynamic videos from the associated film on the text reading interface, the text content can be acted out by the movie content in the associated movie resource, improving the reading display effect and providing users with a more immersive content experience.
The embodiment of the application also provides electronic equipment. Because the principle of solving the problem of the electronic device is similar to that of the information processing method in the embodiment of the application, the implementation of the electronic device can be referred to the implementation of the method, and the repetition is omitted. As shown in fig. 8, an electronic device according to an embodiment of the present application includes:
processor 800, for reading the program in memory 820, performs the following processes:
determining an associated film according to the text content displayed in the first area on the reading interface;
acquiring N dynamic videos in the associated film;
and displaying at least one of the N dynamic videos in a second area on the reading interface.
Wherein in fig. 8, a bus architecture may comprise any number of interconnected buses and bridges, and in particular, one or more processors represented by processor 800 and various circuits of memory represented by memory 820, linked together. The bus architecture may also link together various other circuits such as peripheral devices, voltage regulators, power management circuits, etc., which are well known in the art and, therefore, will not be described further herein. The bus interface provides an interface. The processor 800 is responsible for managing the bus architecture and general processing, and the memory 820 may store data used by the processor 800 in performing operations.
Optionally, the processor 800 is further configured to read the program in the memory 820, and perform the following steps:
performing matting processing on N objects in the associated film that match the characters in the text content, and generating a dynamic video of each object from the multi-frame images of that object obtained by matting, to obtain the N dynamic videos.
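As an illustration only (the embodiment does not prescribe a concrete matting algorithm), the sketch below uses OpenCV to matte a moving object out of a film clip frame by frame and assemble the matted frames into a dynamic video. A real system would use a character-recognition and segmentation model to pick out the N objects matching the characters in the text; here a simple background subtractor stands in for that model, and the file paths are hypothetical.

```python
import cv2

def extract_object_clip(src_path: str, dst_path: str) -> None:
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    # stand-in for a character segmentation model: crude motion-based matting
    subtractor = cv2.createBackgroundSubtractorMOG2()
    writer = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        mask = subtractor.apply(frame)                  # rough foreground matte
        matte = cv2.bitwise_and(frame, frame, mask=mask)
        if writer is None:
            h, w = matte.shape[:2]
            writer = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"),
                                     fps, (w, h))
        writer.write(matte)                             # matted frames -> dynamic video
    cap.release()
    if writer is not None:
        writer.release()

# hypothetical paths, one call per matted object
extract_object_clip("associated_film_fragment.mp4", "object_1_dynamic_video.mp4")
```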
Optionally, the processor 800 is further configured to read the program in the memory 820, and perform the following steps:
determining a first film and television resource matched with the information of the literary work according to the information of the literary work to which the text content belongs;
extracting the dialogue lines or narration from the text content;
acquiring subtitle information of the first film and television resource;
and determining the film and television fragment in the first film and television resource whose subtitle information matches the dialogue lines or narration as the associated film.
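A minimal sketch of this subtitle-matching step, using only the Python standard library. It assumes the subtitle information arrives as (start_sec, end_sec, text) cues; the string-similarity measure and the 0.5 threshold are illustrative choices, not values from the embodiment.

```python
from difflib import SequenceMatcher

def find_associated_clip(line: str,
                         cues: list[tuple[float, float, str]],
                         threshold: float = 0.5):
    """Return the (start, end) span of the cue best matching a dialogue line."""
    best_span, best_score = None, threshold
    for start, end, text in cues:
        score = SequenceMatcher(None, line.lower(), text.lower()).ratio()
        if score > best_score:
            best_span, best_score = (start, end), score
    return best_span  # None if no cue clears the threshold

span = find_associated_clip(
    "To be, or not to be",
    [(12.0, 15.5, "To be or not to be, that is the question"),
     (16.0, 18.0, "Whether 'tis nobler in the mind")],
)
```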
Optionally, the processor 800 is further configured to read the program in the memory 820, and perform the following steps:
determining a second film and television resource matched with the information of the literary work according to the information of the literary work to which the text content belongs;
performing semantic understanding on the text content to determine the content plot of the text content;
and determining the film and television fragment in the second film and television resource that matches the content plot as the associated film.
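A minimal, self-contained stand-in for this semantic-matching step. A real embodiment would presumably use a sentence-embedding or other semantic model; a bag-of-words cosine similarity is used here only so the sketch runs as written.

```python
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def match_plot(content_plot: str, fragment_plots: dict[str, str]) -> str:
    # fragment_plots maps a fragment id to a plot summary of that fragment;
    # the fragment whose summary is most similar becomes the associated film
    return max(fragment_plots,
               key=lambda f: cosine(content_plot, fragment_plots[f]))
```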
Optionally, there are a plurality of associated films, and the plurality of associated films are respectively from a plurality of different film and television resources corresponding to the text content;
the processor 800 is further configured to read the program in the memory 820, and perform the following steps:
displaying N object options in a third area on the reading interface, wherein the N object options are in one-to-one correspondence with the N dynamic videos;
and displaying, in the second area, M dynamic videos corresponding to M object options in response to a selection operation by the user on the M object options among the N object options.
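A minimal sketch of the one-to-one mapping from selected object options to displayed dynamic videos; the option indices and video file names are hypothetical.

```python
def videos_for_selection(selected_options: list[int],
                         dynamic_videos: list[str]) -> list[str]:
    # option i corresponds one-to-one to dynamic video i,
    # so the M selected options pick out M of the N dynamic videos
    return [dynamic_videos[i] for i in selected_options]

shown = videos_for_selection([0, 2], ["hero.mp4", "villain.mp4", "narrator.mp4"])
# shown == ["hero.mp4", "narrator.mp4"]
```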
Optionally, the associated film includes a first film of the same type as the literary work to which the text content belongs;
the processor 800 is further configured to read the program in the memory 820, and perform the following steps:
converting the text content into a first audio signal;
adjusting facial actions in a first dynamic video based on the first audio signal, wherein the first dynamic video is the dynamic video, among the N dynamic videos, that is from the first film;
and displaying the adjusted first dynamic video in the second area.
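The embodiment leaves the audio-to-face mapping open; real systems would use an audio-driven talking-head model. As the crudest possible stand-in, the sketch below derives a per-video-frame "mouth openness" curve from the first audio signal's energy. The sample rate, frame rate and scale factor are illustrative assumptions, and `samples` is raw mono PCM normalized to [-1, 1].

```python
import math

def mouth_openness_per_frame(samples: list[float],
                             sample_rate: int = 16000,
                             fps: int = 25) -> list[float]:
    hop = sample_rate // fps               # audio samples per video frame
    curve = []
    for i in range(0, len(samples) - hop + 1, hop):
        window = samples[i:i + hop]
        rms = math.sqrt(sum(s * s for s in window) / len(window))
        curve.append(min(1.0, rms * 4.0))  # scale factor chosen arbitrarily
    return curve                           # one openness value per video frame
```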
Optionally, the processor 800 is further configured to read the program in the memory 820, and perform the following steps:
updating the text content displayed in the first area;
adjusting the display transparency of the second dynamic video to hide the second dynamic video in the case that the content being performed in the second dynamic video displayed in the second area does not match the text content currently displayed in the first area;
or, in the case that the content being performed in the second dynamic video does not match the text content currently displayed in the first area, converting the text content currently displayed in the first area into a second audio signal, and adjusting the facial action in the second dynamic video based on the second audio signal.
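A minimal sketch of these two mismatch-handling branches. The match test, the text-to-speech step and the face-driving step are injected as callables because the embodiment leaves their implementations open; which branch is taken is modeled here as a simple flag.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class DynamicVideo:
    alpha: float = 1.0  # display transparency, 1.0 = fully visible

def on_text_updated(video: DynamicVideo,
                    current_text: str,
                    matches: Callable[[DynamicVideo, str], bool],
                    tts: Callable[[str], bytes],
                    drive_face: Callable[[DynamicVideo, bytes], None],
                    hide_on_mismatch: bool = True) -> None:
    if matches(video, current_text):
        return                             # performed content still matches
    if hide_on_mismatch:
        video.alpha = 0.0                  # hide by adjusting transparency
    else:
        audio = tts(current_text)          # the second audio signal
        drive_face(video, audio)           # re-drive the facial action
```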
Optionally, the processor 800 is further configured to read the program in the memory 820, and perform the following steps:
determining the display position and the display size of the second area according to the display position and the display size of the first area, wherein the second area does not overlap the first area;
and displaying at least one of the N dynamic videos in the second area.
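A minimal sketch of deriving the second area from the first. The layout policy (half the first area's size, placed to its right if that fits on screen, otherwise below it) is an illustrative assumption, not the embodiment's rule; both candidate placements avoid overlapping the first area.

```python
from typing import NamedTuple

class Rect(NamedTuple):
    x: int
    y: int
    w: int
    h: int

def place_second_area(first: Rect, screen: Rect) -> Rect:
    w, h = first.w // 2, first.h // 2      # illustrative sizing rule
    right = Rect(first.x + first.w, first.y, w, h)
    below = Rect(first.x, first.y + first.h, w, h)
    # prefer the right-hand candidate when it fits within the screen
    if right.x + right.w <= screen.x + screen.w:
        return right
    return below
```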
The electronic device provided by the embodiment of the present application can perform the above method embodiment; its implementation principle and technical effects are similar, and are not repeated here.
Furthermore, an embodiment of the present application provides a computer-readable storage medium for storing a computer program, where the computer program can be executed by a processor to implement the steps of the method embodiment shown in fig. 1.
In the several embodiments provided in the present application, it should be understood that the disclosed methods and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for instance, the division of the units is merely a logical function division, and there may be other divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections via some interfaces, devices or units, and may be electrical, mechanical or in other forms.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, each unit may be physically separate, or two or more units may be integrated in one unit. The integrated units may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated units implemented in the form of software functional units may be stored in a computer-readable storage medium. The software functional units are stored in a storage medium and include several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform some of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
While the foregoing is directed to the preferred embodiments of the present application, it will be appreciated by those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present application, and such modifications and adaptations are intended to be encompassed within the scope of the present application.

Claims (10)

1. An information processing method, characterized by comprising:
determining an associated film according to the text content displayed in the first area on the reading interface;
acquiring N dynamic videos in the associated film;
and displaying at least one of the N dynamic videos in a second area on the reading interface.
2. The method of claim 1, wherein the acquiring N dynamic videos in the associated film comprises:
performing matting processing on N objects in the associated film that match the characters in the text content, and generating a dynamic video of each object from the multi-frame images of that object obtained by matting, to obtain the N dynamic videos.
3. The method of claim 1, wherein determining the associated film according to the text content displayed in the first area on the reading interface comprises:
determining a film and television resource matched with the information of the literary work according to the information of the literary work to which the text content belongs;
performing semantic understanding on the text content to determine the content plot of the text content;
and determining the film and television fragment in the film and television resource that matches the content plot as the associated film.
4. The method of claim 1, wherein there are a plurality of associated films, and the plurality of associated films are respectively from a plurality of different film and television resources corresponding to the text content;
the displaying at least one of the N dynamic videos in the second area on the reading interface includes:
displaying N object options in a third area on the reading interface, wherein the N object options are in one-to-one correspondence with the N dynamic videos;
and displaying, in the second area, M dynamic videos corresponding to M object options in response to a selection operation by the user on the M object options among the N object options.
5. The method according to any one of claims 1 to 4, wherein the associated film comprises a first film of the same type as the literary work to which the text content belongs; the displaying at least one of the N dynamic videos in the second area on the reading interface includes:
converting the text content into a first audio signal;
adjusting facial actions in a first dynamic video based on the first audio signal, wherein the first dynamic video is the dynamic video, among the N dynamic videos, that is from the first film;
and displaying the adjusted first dynamic video in the second area.
6. The method according to any one of claims 1 to 4, wherein after the displaying at least one of the N dynamic videos in the second area on the reading interface, the method further comprises:
updating the text content displayed in the first area;
adjusting the display transparency of the second dynamic video to hide the second dynamic video in the case that the content being performed in the second dynamic video displayed in the second area does not match the text content currently displayed in the first area;
or, in the case that the content being performed in the second dynamic video does not match the text content currently displayed in the first area, converting the text content currently displayed in the first area into a second audio signal, and adjusting the facial action in the second dynamic video based on the second audio signal.
7. The method of any one of claims 1 to 4, wherein the displaying at least one of the N dynamic videos in the second area on the reading interface comprises:
determining the display position and the display size of the second area according to the display position and the display size of the first area, wherein the second area does not overlap the first area;
and displaying at least one of the N dynamic videos in the second area.
8. An information processing apparatus, characterized by comprising:
the determining module is used for determining the associated film according to the text content displayed in the first area on the reading interface;
the acquisition module is used for acquiring N dynamic videos in the associated film;
and the display module is used for displaying at least one of the N dynamic videos in a second area on the reading interface.
9. An electronic device, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor; characterized in that the processor is configured to read the program in the memory to implement the steps of the information processing method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps in the information processing method according to any one of claims 1 to 7.