
CN115631270A - Live broadcast method and device of virtual role, computer storage medium and terminal - Google Patents

Live broadcast method and device of virtual role, computer storage medium and terminal

Info

Publication number
CN115631270A
Authority
CN
China
Prior art keywords
driving data
data
virtual
driving
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211312443.5A
Other languages
Chinese (zh)
Inventor
柴金祥
谭宏冰
周子夏
熊兴堂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Movu Technology Co Ltd
Mofa Shanghai Information Technology Co Ltd
Original Assignee
Shanghai Movu Technology Co Ltd
Mofa Shanghai Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Movu Technology Co Ltd and Mofa Shanghai Information Technology Co Ltd
Priority to CN202211312443.5A
Publication of CN115631270A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 - Animation
    • G06T13/20 - 3D [Three Dimensional] animation
    • G06T13/40 - 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • A - HUMAN NECESSITIES
    • A63 - SPORTS; GAMES; AMUSEMENTS
    • A63F - CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
    • A63F13/00 - Video games, i.e. games using an electronically generated display having two or more dimensions
    • A63F13/55 - Controlling game characters or game objects based on the game progress
    • A63F13/56 - Computing the motion of game characters with respect to other game characters, game objects or elements of the game scene, e.g. for simulating the behaviour of a group of virtual soldiers or for path finding
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 - Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 - Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 - Server components or server architectures
    • H04N21/218 - Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 - Live feed

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Databases & Information Systems (AREA)
  • Signal Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A live broadcast method and device of virtual roles, a computer storage medium and a terminal are provided, and the method comprises the following steps: acquiring a first user image, wherein the first user image comprises an image of a first user; performing redirection processing on state information of a first user to generate first driving data for driving a first virtual role, wherein the state information of the first user is obtained according to a first user image; acquiring a trigger instruction, wherein the trigger instruction is generated by triggering of external operation; determining drive data for generating a live video of the first virtual character according to the first drive data, the second drive data and/or the third drive data; the second driving data is driving data which is obtained according to the trigger instruction and is used for driving the first virtual role, and the third driving data is driving data which is obtained according to the trigger instruction and is used for driving elements in the virtual live broadcast scene. Through the scheme provided by the application, the live broadcast content in the live broadcast process of the virtual role can be richer and more flexible.

Description

Live broadcast method and device of virtual role, computer storage medium and terminal
Technical Field
The present application relates to the field of video technologies, and in particular, to a live broadcast method and apparatus for virtual roles, a computer storage medium, and a terminal.
Background
With the development of virtual reality and augmented reality technologies, a number of representative virtual characters have emerged, and virtual live broadcast technology has emerged along with them. Virtual live broadcast technology produces video by replacing a real-person anchor with a virtual character, and typically drives the virtual character to perform synchronously in real time by capturing the performance of a real actor. The prior art is usually concerned only with reproducing the real person's performance on the virtual character; as for the live content, the actions, expressions and the like of the virtual character depend entirely on the real-person anchor, so the live content of the virtual character is monotonous and inflexible.
Disclosure of Invention
The method and the device address the technical problem of how to make the live content of a virtual character richer and more flexible.
In order to solve the technical problem, an embodiment of the present application provides a live broadcast method for a virtual role, where the method includes: acquiring a first user image, wherein the first user image comprises an image of a first user; performing redirection processing on the state information of the first user to generate first driving data for driving a first virtual role, wherein the state information of the first user is obtained according to the first user image; acquiring a trigger instruction, wherein the trigger instruction is generated by triggering of external operation; determining drive data for generating a live video of the first virtual character according to the first drive data, the second drive data and/or the third drive data; the second driving data is driving data which is obtained according to the trigger instruction and is used for driving the first virtual role, and the third driving data is driving data which is obtained according to the trigger instruction and is used for driving elements in a virtual live broadcast scene.
Optionally, the external operation is any one of the following: an interactive operation issued by a viewer, an operation directed at a first game character, or an operation instructing the first virtual character and a second virtual character to interact; the first game character is a game character associated with the first virtual character, and the second virtual character is a virtual character driven by a second user.
Optionally, before determining, according to the first driving data, the second driving data, and/or the third driving data, driving data for generating a live video of the first virtual character, the method further includes: acquiring audio data of a first user; performing tone conversion processing on the audio data to obtain converted audio data, wherein the tone of the converted audio data is a target tone, and the target tone depends on the first virtual character; and generating the live video according to the converted audio data and the first driving data.
Optionally, before determining, according to the first driving data, the second driving data, and/or the third driving data, driving data for generating a live video of the first virtual character, the method further includes: and selecting second driving data and/or third driving data associated with the triggering instruction from a prefabricated database according to the triggering instruction.
Optionally, the triggering instruction is obtained within a preset time window, and selecting, according to the triggering instruction, second driving data and/or third driving data associated with the triggering instruction from a pre-made database includes: and judging whether the same trigger instruction is acquired in the time window, and if not, selecting second driving data and/or third driving data associated with the trigger instruction from a prefabricated database according to the trigger instruction.
Optionally, the third driving data includes a movement trajectory of a virtual prop, where the movement trajectory includes positions of the virtual prop at multiple times, and determining, according to the first driving data, the second driving data and/or the third driving data, driving data for generating a live video of the first virtual character includes: when the distance between the position of the virtual prop and the first virtual character is smaller than or equal to a first preset threshold, triggering acquisition of fourth driving data; and determining driving data for generating the live video according to the first driving data and the fourth driving data.
Optionally, the method further includes: acquiring fifth driving data, wherein the fifth driving data is generated by redirecting the state information of the second user and is used for driving the second virtual role; and displaying the second virtual role in the virtual live scene of the first virtual role according to the fifth driving data.
Optionally, the triggering instruction is further configured to trigger the second virtual character to respond to the external operation, and determining, according to the first driving data, the second driving data, and/or the third driving data, driving data for generating a live video of the first virtual character includes: obtaining the driving data for generating the live video according to the second driving data and the sixth driving data; wherein the sixth driving data is driving data for driving the second avatar generated according to the trigger instruction.
The embodiment of the present application further provides a live device of a virtual character, the device includes: the first acquisition module is used for acquiring a first user image, and the first user image comprises an image of a first user; the first generation module is used for carrying out redirection processing on the state information of the first user and generating first driving data for driving a first virtual role, wherein the state information of the first user is obtained according to the first user image; the second acquisition module is used for acquiring a trigger instruction, and the trigger instruction is generated by external operation trigger; the second generation module is used for determining driving data used for generating the live video of the first virtual role according to the first driving data, the second driving data and/or the third driving data; the second driving data is driving data which is obtained according to the trigger instruction and is used for driving the first virtual role, and the third driving data is driving data which is obtained according to the trigger instruction and is used for driving elements in a virtual live broadcast scene.
The embodiment of the present application further provides a computer storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the live broadcasting method for the virtual character are executed.
The embodiment of the present application further provides a terminal, which includes a memory and a processor, where the memory stores a computer program that can be run on the processor, and the processor executes the steps of the live broadcast method for the virtual character when running the computer program.
Compared with the prior art, the technical scheme of the embodiment of the application has the following beneficial effects:
in the scheme of the embodiment of the application, on one hand, a first user image is obtained, the state information of a first user is determined according to the first user image, then the state information of the first user is redirected, and first driving data for driving a first virtual role is generated; on the other hand, a trigger instruction generated by external operation triggering is acquired, and then second driving data used for driving the first virtual role and third driving data used for driving elements in the virtual live broadcast scene are determined according to the trigger instruction, so that driving data used for generating live broadcast video are obtained according to the first driving data, the second driving data and/or the third driving data, and accordingly the live broadcast video of the first virtual role can be generated according to the driving data. In the above-described aspect, the live video is generated not only from the first drive data generated by performing the redirection processing based on the status information of the first user, but also from the second drive data and/or the third drive data obtained by the trigger instruction. Therefore, the first virtual role does not completely depend on the state information of the first user, and can be driven by a trigger instruction, for example, the first virtual role is driven by the trigger instruction to present actions or expressions and the like which cannot be achieved by the first user, and elements in a scene where the first virtual role is located can also be driven by the trigger instruction, so that the live content of the first virtual role can be richer and more flexible.
Further, in the solution of the embodiment of the present application, the trigger instruction is generated by triggering an external operation, where the external operation may be any one of the following: the game system comprises interactive operation sent by audiences, operation aiming at a first game character and operation indicating the first virtual character and a second virtual character to interact. Therefore, in the scheme of the embodiment of the application, the second driving data and the third driving data are generated according to the triggering instruction, and the live broadcast video is generated according to the second driving data and the third driving data.
Further, in the solution of the embodiment of the present application, the audio data of the first user is subjected to a timbre conversion process to obtain converted audio data, where the timbre of the converted audio data depends on the first virtual character. By adopting the scheme, the audio with different timbres can be output when the same user drives different virtual roles, and the audio with the same timbre can also be output when different users drive the same virtual role, so that the requirement on the certainty of the timbre of the virtual role is met.
Further, in the scheme of the embodiment of the application, only when the same trigger instruction is not acquired in the time window, the second driving data and/or the third driving data associated with the trigger instruction are/is selected from the pre-manufactured database according to the trigger instruction. By adopting the scheme, the first virtual role can be ensured to give a response in time under the condition of acquiring a large number of trigger instructions in a short time, and the condition that the jamming is caused by a large number of trigger instructions is avoided.
Drawings
Fig. 1 is a schematic view of an application scenario of a live broadcast method for a virtual role in an embodiment of the present application;
fig. 2 is a schematic flowchart of a live broadcast method for a virtual role in an embodiment of the present application;
FIG. 3 is a schematic flow chart diagram illustrating one embodiment of step S22 in FIG. 2;
FIG. 4 is a diagram illustrating a live view in an embodiment of the present application;
fig. 5 is a schematic architecture diagram of another live broadcast method for virtual roles in the embodiment of the present application;
fig. 6 is an architecture diagram of a live broadcast method for a virtual role in an embodiment of the present application;
fig. 7 is a schematic structural diagram of a live device of a virtual character in an embodiment of the present application.
Detailed Description
As described in the background section, the prior art generally focuses only on reproducing the real person's performance on the virtual character; as for the live content, the actions, expressions and the like of the virtual character depend entirely on the real-person anchor, so the live content of the virtual character is monotonous and inflexible.
In the traditional virtual character live broadcast technology, only the performance of the real-person anchor is reproduced on the virtual character, and the actions and expressions of the virtual character depend entirely on that anchor; the virtual character cannot show actions or expressions that the anchor cannot, or does not, perform, so the actions and expressions of the virtual character are relatively limited and the live content of the virtual character is monotonous and inflexible. For example, actions that a user cannot directly perform in a live scene may include shooting, dancing, rolling, jumping, and the like.
In order to solve the above technical problem, an embodiment of the present application provides a live broadcast method for virtual roles, and in a scheme of the embodiment of the present application, on one hand, a first user image is obtained, state information of a first user is determined according to the first user image, and then the state information of the first user is redirected to generate first driving data for driving the first virtual role; on the other hand, a trigger instruction generated by external operation triggering is acquired, and then second driving data used for driving the first virtual role and third driving data used for driving elements in the virtual live broadcast scene are determined according to the trigger instruction, so that driving data used for generating live broadcast video are obtained according to the first driving data, the second driving data and/or the third driving data, and accordingly the live broadcast video of the first virtual role can be generated according to the driving data. In the above-described aspect, the live video is generated not only from the first drive data generated by performing the redirection processing based on the state information of the first user, but also from the second drive data and/or the third drive data obtained from the trigger instruction. Therefore, the first virtual role does not depend on the state information of the first user completely, and can be driven by a trigger instruction, for example, the first virtual role is driven by the trigger instruction to present actions or expressions and the like which cannot be achieved by the first user, and elements in a scene where the first virtual role is located can also be driven by the trigger instruction, so that the live broadcast content of the first virtual role can be richer and more flexible.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, embodiments accompanying the present application are described in detail below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario of a live broadcast method for a virtual role in an embodiment of the present application. The following non-limiting description is made with reference to fig. 1 for an application scenario of the embodiment of the present application.
As shown in fig. 1, in the solution of the embodiment of the present application, a camera 11 may be used to photograph a first user 10. In the solution of the embodiment of the present application, the first user 10 does not need to wear motion capture clothing, expression capture devices, eye capture devices, and the like. The camera 11 is not provided on the wearing apparatus of the first user 10.
Specifically, the first user 10 is a subject of the camera 11, and the first user 10 may be a living body capable of autonomously performing an action, for example, a real actor, but is not limited thereto. It should be noted that the camera 11 may be any suitable photographing device, and the present embodiment is not limited to the type and number of the camera 11.
In a specific example, a single camera 11 is used, and the camera 11 may be an RGB (Red, Green, Blue) camera or an RGBD (RGB plus Depth) camera. That is, the image captured by the camera 11 may be an RGB image, an RGBD image, or the like, but is not limited thereto.
Further, the camera 11 captures the first user 10, and may obtain a first user image, where the first user image may include an image of the first user 10.
In one particular example, the first user image may include an image of the face of the first user 10 and may also include an image of the neck and shoulders of the first user 10. Still further, the first user image may also include an image of the entire body of the first user 10.
Further, the camera 11 may be connected to the terminal 12, the terminal 12 may be various existing devices having data receiving and data processing functions, and the camera 11 may transmit the acquired first user image to the terminal 12. The terminal 12 may be, but is not limited to, a mobile phone, a tablet computer, a computer, and the like. It should be noted that, in this embodiment, the connection manner between the camera 11 and the terminal 12 is not limited, and may be a wired connection or a wireless connection (for example, a bluetooth connection, a local area network connection, or the like). More specifically, the camera 11 may be a camera provided on the terminal 12, and may be, for example, a camera on a mobile phone, a camera on a computer, or the like.
Further, the terminal 12 may generate first driving data for driving the first virtual object 13 according to the first user image, wherein the first driving data and the first user image have the same time code. For convenience of description, the terminal used by the first user will be referred to as a first terminal hereinafter. That is, the first user image is processed by the first terminal to obtain the first driving data.
The first virtual object 13 may include a virtual human, a virtual animal, a virtual plant, and other objects having a face and a body. The first virtual object 13 may be three-dimensional or two-dimensional, which is not limited in this embodiment of the application.
Therefore, the live broadcast method of the virtual character provided by the embodiment of the application can be applied to a scene of single-camera virtual live broadcast, an optical motion capture device or an inertial motion capture device and the like are not required to be arranged to capture the expression, the action and the like of the user, the virtual character can be driven by shooting the user through the camera, and the virtual live broadcast is realized.
Referring to fig. 2, fig. 2 is a schematic flowchart of a live broadcast method for a virtual role in the embodiment of the present application. The method may be performed by a terminal, which may be any existing terminal device with data receiving and processing capabilities, such as, but not limited to, a mobile phone, a computer, an internet of things device, a server, and the like. More specifically, the method illustrated in fig. 2 may be performed by the first terminal described above. Fig. 2 shows that the live broadcasting method of the virtual character may include the steps of:
step S21: acquiring a first user image, wherein the first user image comprises an image of a first user;
step S22: performing redirection processing on the state information of the first user to generate first driving data for driving a first virtual role, wherein the state information of the first user is obtained according to the first user image;
step S23: acquiring a trigger instruction, wherein the trigger instruction is generated by triggering of external operation;
step S24: determining drive data for generating a live video of the first virtual character according to the first drive data, the second drive data and/or the third drive data;
the second driving data is driving data which is obtained according to the trigger instruction and is used for driving the first virtual role, and the third driving data is driving data which is obtained according to the trigger instruction and is used for driving elements in a virtual live broadcast scene.
It is understood that, in a specific implementation, the method may be implemented by a software program running in a processor integrated inside a chip or a chip module; alternatively, the method can be implemented in hardware or a combination of hardware and software, for example, using a dedicated chip or chip module or a dedicated chip or chip module in combination with a software program.
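For illustration only, the following Python sketch shows one way steps S21 to S24 could be wired together per frame; the camera, retargeter, trigger-queue and database objects and their method names are assumptions introduced here, not part of the application.

```python
# Hypothetical end-to-end sketch of steps S21-S24; all object and method names
# (camera, retargeter, trigger_queue, premade_db) are illustrative assumptions.
from typing import Optional


def merge_drive_data(first: dict, second: Optional[dict], third: Optional[dict]) -> dict:
    """Combine driving data: premade character data (second) takes priority over
    captured data (first) for the parts it covers, and scene-element data (third)
    is carried alongside for the rendering step."""
    drive = dict(first)
    if second:
        drive.update(second)                 # premade expression/action replaces the captured one
    if third:
        drive["scene_elements"] = third      # drives elements of the virtual live scene
    return drive


def live_broadcast_tick(camera, retargeter, trigger_queue, premade_db, avatar) -> dict:
    image = camera.capture_frame()                        # step S21: acquire the first user image
    state = retargeter.reconstruct_state(image)           # face, body posture and gaze information
    first = retargeter.retarget(state, avatar)            # step S22: first driving data
    trigger = trigger_queue.poll()                        # step S23: trigger instruction (may be None)
    second = premade_db.character_drive(trigger) if trigger else None
    third = premade_db.scene_drive(trigger) if trigger else None
    return merge_drive_data(first, second, third)         # step S24: driving data for the live video
```

The merge step shown here simply lets the premade character data take precedence over the captured data for the parts it covers, which mirrors the replacement cases discussed later in the detailed description.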
In a specific implementation of step S21, a first user image may be acquired, and the first user image may be obtained by shooting the first user image by a camera. Further, the first user image includes imagery of the first user. In particular, the first user image may include an image of the first user's face, and may also include an image of the first user's body, but is not limited thereto.
For more about the first user image, reference may be made to the related description above about fig. 1, which is not repeated herein.
In a specific implementation of step S22, first driving data for driving the first avatar may be generated from the first user image.
Referring to fig. 3, fig. 3 is a flowchart illustrating an embodiment of step S22 in fig. 2. As shown in fig. 3, step S22 may include step S221 and step S222:
step S221: determining the state information of the first user at the current moment according to the first user image;
step S222: and performing redirection processing on the state information of the first user to generate first driving data for driving the first virtual role.
In a specific implementation of step S221, the state information may include: face information, body posture information, and gaze direction information. The state information of the first user may be obtained by performing restoration and reconstruction on the first user according to the first user image. The current moment may be the time indicated by the time code of the first user image.
In a first aspect, face pose reconstruction may be performed from an image of a first user to obtain face information of the first user. The face information may include facial pose information and facial expression information.
More specifically, the face pose information is used to describe the position and orientation of the user's face, and more specifically, the position and orientation of the user's face refers to the position and orientation of the user's face in three-dimensional space. For example, the position of the user's face may be the position of the user's face relative to the camera and the orientation of the user's face may be the orientation relative to the camera.
Further, facial expression information may be used to describe the expression of the user. In a specific example, the facial expression information may be weights of a plurality of blended shapes (Blend shapes), wherein the plurality of blended shapes may be preset; the facial expression information may be weights of a plurality of principal component vectors obtained by performing principal component analysis on a plurality of mixed shapes; three-dimensional feature points and the like are also possible, but not limited thereto.
In a second aspect, human body pose reconstruction may be performed from the first user image to obtain human body pose information of the first user. The body posture information can be used to describe the motion posture of the user's body. In a specific example, the body pose information may be joint angle data, and more specifically, the joint angle data is an angle of a joint.
In a third aspect, gaze reconstruction may be performed according to the first user image to obtain the gaze direction information of the first user. The gaze direction information may be used to describe the gaze direction of the user.
Specifically, the direction in which the center of the eyeball points to the center of the three-dimensional pupil is the gaze direction. The central position of the eyeball is the position of the central point of the eyeball, and the central position of the three-dimensional pupil is the position of the central point of the pupil.
In one specific example, the gaze direction information may be a three-dimensional pupil center position. More specifically, the gaze direction information may be a zenith angle and an azimuth angle of the three-dimensional pupil center position in a spherical coordinate system with the eyeball center position as a coordinate origin.
Therefore, the state information of the first user can be obtained through the restoration and reconstruction operation.
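As a concrete illustration of the reconstructed state information described above (face pose, expression weights, joint angles, and zenith/azimuth gaze angles), a possible container is sketched below; the field layout and the spherical-coordinate convention are assumptions for illustration, not the application's data format.

```python
# Illustrative container for the reconstructed state information; the field
# layout and coordinate conventions are assumptions, not the patent's format.
import math
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserState:
    # Face pose: position and orientation of the face relative to the camera
    face_position: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])
    face_rotation: List[float] = field(default_factory=lambda: [0.0, 0.0, 0.0])
    # Facial expression: weights of preset blend shapes (or of PCA components)
    blendshape_weights: Dict[str, float] = field(default_factory=dict)
    # Body posture: joint angles, e.g. {"left_elbow": 35.0} in degrees
    joint_angles: Dict[str, float] = field(default_factory=dict)
    # Gaze: zenith and azimuth of the 3D pupil centre in a spherical coordinate
    # system with the eyeball centre as origin, as described above
    gaze_zenith: float = 0.0
    gaze_azimuth: float = 0.0

    def gaze_direction(self) -> List[float]:
        """Unit gaze vector from the two angles (standard spherical convention,
        which is an assumption; the text does not fix one)."""
        st, ct = math.sin(self.gaze_zenith), math.cos(self.gaze_zenith)
        return [st * math.cos(self.gaze_azimuth), st * math.sin(self.gaze_azimuth), ct]
```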
In a specific implementation of step S222, a redirection process may be performed according to the status information of the first user, so as to obtain the first driving data. That is, the first driving data refers to driving data for driving the first avatar, which is generated according to the state information of the first user.
It should be noted that the driving data in the embodiments of the present application may include controller data for generating an animation of the virtual character, embodied as a sequence of digitized vectors. The driving data may be converted into a data format that UE (Unreal Engine) or Unity3d can receive (weights of multiple blend shapes and joint angle data) and input into a rendering engine such as UE or Unity3d to drive the corresponding part of the virtual character to perform the corresponding action.
Specifically, the first driving data may include: first facial drive data, first gaze drive data, and first body drive data.
In a first aspect, the facial expression information of the first user may be redirected to obtain first facial driving data, where the first facial driving data may be used to drive the first virtual character to present an expression that is the same as the facial expression of the first user.
In a second aspect, the gaze direction information of the first user may be redirected to obtain the first gaze driving data, and the first gaze driving data may be used to drive the gaze direction of the first virtual character to be consistent with the gaze direction of the first user.
In a third aspect, the human posture information of the first user may be redirected to obtain the first body driving data. The first body drive data may be used to drive the body pose of the first virtual character to be consistent with the body pose of the first user, i.e., the first body drive data may drive the first virtual character to assume the same body pose as the first user. Wherein the first body drive data may comprise: first extremity drive data for driving an extremity of the first avatar and first torso drive data for driving a torso of the first avatar.
It should be noted that, the specific process of redirecting the status information of the first user may be various existing appropriate methods, and this embodiment does not limit this.
Therefore, during the live broadcast of the first virtual character, an animation video can be generated according to the first driving data and then pushed to the live broadcast platform to obtain the live video, thereby realizing the virtual live broadcast of the first virtual character. A viewer can obtain the live video by accessing the live broadcast platform on the viewer's own terminal to watch the live broadcast of the first virtual character. More specifically, the live broadcast of the first virtual character is realized by shooting the first user and, according to the captured first user image, driving the first virtual character to reproduce the state of the first user (such as expression, gaze, action, and the like).
It should be noted that the process of generating a live video according to drive data described in the embodiment of the present application may include: and generating an animation frame sequence according to the driving data, the virtual character and the virtual live broadcast scene in which the virtual character is positioned, and then rendering the animation frame sequence to obtain an animation video. And then, the animation video can be pushed to a live broadcast platform to obtain a live broadcast video. Further, the "driving data" herein may include one or more of first driving data, second driving data, third driving data, fourth driving data, fifth driving data, and sixth driving data.
In addition, audio data of the first user can be collected, and animation videos can be generated according to the audio data and the first driving data. In a specific implementation, the audio data and the first driving data can be input to a rendering engine module associated with the first terminal to obtain an animated video output by the rendering engine module. For ease of description, the rendering engine module associated with the first terminal will be referred to hereinafter as the first rendering engine module.
It should be noted that, in the scheme of the embodiment of the present application, local rendering may be performed, or cloud rendering may be performed. The local rendering refers to that the rendering engine module is deployed locally at the terminal, the cloud rendering may refer to that the rendering engine module is deployed at the cloud server, the terminal is coupled with the rendering engine module, and the driving data and/or the audio data are uploaded to the cloud server to generate an animation frame sequence and then are rendered to generate an animation video.
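The rendering path just described (driving data and optional audio, to an animation frame sequence, to a rendered animation video, to the live platform) might be sketched as follows; the rendering-engine and streaming interfaces are assumed names, and the engine may equally sit locally or on a cloud server.

```python
# Illustrative rendering-and-push path; the rendering engine module and the
# live platform interfaces are assumed, not taken from the patent text.

def generate_live_video(drive_frames, audio, character, scene, engine, platform):
    """drive_frames: per-frame driving data (first/second/third driving data, etc.);
    engine: the first rendering engine module, deployed locally or on a cloud server."""
    animation_frames = [engine.animate(character, scene, d) for d in drive_frames]
    animation_video = engine.render(animation_frames, audio=audio)
    platform.push(animation_video)        # the pushed stream is the live video
    return animation_video
```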
In a specific example, after the audio data of the first user is collected, the audio data of the first user may be subjected to tone conversion to convert the tone of the first user into a target tone, and the converted audio data may be input to the first rendering engine module. Thereby, an animation video can be generated from the converted audio data and the first drive data. Wherein the target timbre depends on the first avatar. That is, the target timbre changes as the first avatar adopted by the first user changes.
Through the scheme, each virtual role can have a fixed tone, the virtual roles adopted by the same user in live broadcasting are different, and the output tones are also different. Different users can drive the same virtual character when broadcasting directly, and the sound of different actors can be converted into the sound of a fixed character.
The following describes a specific method of tone conversion. Performing timbre conversion on the first user audio data to obtain converted audio data may include steps (1) to (3):
step (1): semantic information in the audio data S of the first user is extracted.
Specifically, a semantic recognition model may be employed to extract the semantic content L of the audio data of the first user, where the semantic content L is independent of timbre. More specifically, the semantic recognition model may extract the semantic content L using Automatic Speech Recognition (ASR) technology.
Step (2): prosodic feature information in the audio data S of the first user is extracted.
In particular, the prosodic feature information may refer to prosodic features that are independent of timbre, for example a fundamental frequency, a speech rate, a pitch, a volume, or the like, but is not limited thereto. The prosodic feature information may be characterized by prosodic feature vectors (P_1~P_N). The prosodic feature information may be extracted from the audio data S by an encoder M_A, where the encoder M_A may be composed of one or more deep learning network models A.
And (3): and synthesizing to obtain the converted audio data.
Specifically, a target feature vector T for characterizing the target timbre may be acquired, and then the semantic content L, the target feature vector T, and the prosodic feature vectors (P_1~P_N) may be subjected to synthesis processing to generate the converted audio data.
More specifically, the semantic content L, the target feature vector T, and the prosodic feature vectors (P_1~P_N) may be input to a decoder M_C for synthesis processing to obtain the converted audio data, where the decoder M_C may be composed of one or more deep learning network models.
The target feature vector T may be extracted from a reference audio signal R, the reference audio signal may be an audio signal containing a target timbre, and a duration of the reference audio signal may be not less than a preset duration threshold, where the preset duration threshold may be 3 seconds. In addition, the content of the reference audio signal is not limited in this embodiment.
Further, by inputting the reference audio signal into an encoder M_B, the target feature vector T for characterizing the target timbre may be extracted, where the encoder M_B may be composed of one or more deep learning network models C.
Before step (3) is executed, the prosodic feature information (P_1~P_N) may also be adjusted to obtain adjusted prosodic feature information (R_1~R_N). Accordingly, in step (3), synthesis processing may be performed according to the semantic content L, the target feature vector T, and the adjusted prosodic feature information (R_1~R_N) to generate the converted audio data.
Specifically, it may be determined whether the prosodic feature information needs to be adjusted. The prosodic feature information may be adjusted if it does not meet a preset requirement and/or the user wishes to adjust the prosodic features to transform the voice style (for example, transforming a deep voice style into a bright one, slowing down the speech, strengthening the cadence, increasing the volume, and the like). If the user chooses to transform only the timbre, the prosodic feature information may be left unadjusted. The preset requirement may be set according to the actual needs of the user, and the adjustment of the prosodic feature information may be a linear adjustment.
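Putting steps (1) to (3) together, a schematic sketch of the tone-conversion flow is given below; the interfaces of the ASR model, the encoders M_A and M_B, and the decoder M_C are assumptions standing in for the deep learning models described above.

```python
# Schematic tone conversion following steps (1) to (3); the model interfaces
# (asr_model, enc_a, enc_b, dec_c) are assumptions standing in for M_A, M_B, M_C.

def convert_timbre(audio_s, reference_r, asr_model, enc_a, enc_b, dec_c,
                   adjust_prosody=None):
    content_l = asr_model.recognize(audio_s)        # step (1): timbre-independent semantic content L
    prosody = enc_a.encode(audio_s)                 # step (2): prosodic features P_1..P_N
    target_t = enc_b.encode(reference_r)            # target timbre vector T from a reference clip (>= ~3 s)
    if adjust_prosody is not None:                  # optional (e.g. linear) adjustment -> R_1..R_N
        prosody = adjust_prosody(prosody)
    return dec_c.decode(content_l, target_t, prosody)   # step (3): synthesize converted audio
```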
With reference to fig. 2, in the process that the first user drives the first virtual character to perform live broadcasting, the first virtual character and the element in the virtual live broadcasting scene where the first virtual character is located may also be driven according to the trigger instruction.
In the solutions of other embodiments, the seventh driving data may be generated according to a text content input by the first user, where the text content input by the first user is an utterance of the first user in a live broadcasting process, and the text content may be obtained by performing semantic recognition according to audio of the first user in the live broadcasting process, or may be directly input by the first user. In one specific example, the textual content may be user-generated after capturing the following external operation.
Further, the first driving data and the seventh driving data may be subjected to fusion processing to obtain driving data for generating a live video. The fusion process may refer to superimposition, replacement, and the like, but is not limited thereto. More specifically, in the fusion process, the seventh drive data has a higher priority than the first drive data. The seventh driving data having a higher priority than the first driving data means that the seventh driving data is preferentially used for driving the same portion of the first avatar. Therefore, in the virtual live broadcast process, the live broadcast content can be enriched by not only being driven by the state information of the first user, but also being driven by the words of the first user.
In a specific implementation of step S23, a trigger instruction may be acquired, where the trigger instruction is generated by an external operation trigger.
In one embodiment of the application, the external operation may be issued by the first user. The first user may trigger generation of the trigger instruction by any one of the following:
the first method is as follows: the first user can execute preset operation on the preset control. That is, the external operation may be a preset operation performed on the preset control by the first user, and the preset operation may be a click operation, a slide operation, a long press operation, and the like. The triggering instruction can be different according to different preset controls.
The second method comprises the following steps: when it is detected that the first user speaks the first preset keyword or the user inputs the first preset keyword, a trigger instruction corresponding to the first preset keyword can be triggered and generated. That is, the external operation may be the first user speaking the first preset keyword. The triggering instruction can be different according to different first preset keywords.
In a specific implementation, the audio data of the first user may be recognized by the semantic recognition model, and when the first preset keyword is recognized in the audio data of the first user, a trigger instruction corresponding to the first preset keyword may be generated.
The third method comprises the following steps: when it is detected that the first user makes a first preset action, a trigger instruction corresponding to the first preset action may be triggered and generated. That is, the external operation may be that the first user makes a first preset action, wherein the trigger instruction may be different according to the first preset action.
In specific implementation, a motion recognition model may be set on the terminal to recognize a motion of the first user, and when the motion of the first user is a first preset motion, a trigger instruction corresponding to the first preset motion may be generated.
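The three trigger-generation modes described above could be illustrated as follows; the keyword and action tables and the recognizer interfaces are example assumptions, not values from the application.

```python
# Illustrative trigger-instruction generation for the three modes described above;
# the keyword/action tables and recognizer interfaces are assumptions.

KEYWORD_TRIGGERS = {"fireworks": "TRIGGER_FIREWORKS"}   # first preset keywords (example values)
ACTION_TRIGGERS = {"wave": "TRIGGER_WAVE_EFFECT"}       # first preset actions (example values)


def trigger_from_control(control_id, control_table):
    # Mode 1: a preset operation (click, slide, long press) on a preset control
    return control_table.get(control_id)


def trigger_from_speech(audio_data, speech_recognizer):
    # Mode 2: the first user speaks a first preset keyword
    text = speech_recognizer.recognize(audio_data)
    return next((trig for kw, trig in KEYWORD_TRIGGERS.items() if kw in text), None)


def trigger_from_motion(user_image, action_recognizer):
    # Mode 3: the first user performs a first preset action
    action = action_recognizer.classify(user_image)
    return ACTION_TRIGGERS.get(action)
```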
In a specific implementation of step S24, the terminal may select, according to the trigger instruction, the second driving data and/or the third driving data associated with the trigger instruction from the pre-prepared database. Further, the drive data for generating the live video may be determined from the first drive data, the second drive data and/or the third drive data.
It should be noted that, when the third driving data is not selected according to the trigger instruction, the third driving data may be null. Accordingly, when the second driving data is not selected according to the trigger command, the second driving data may be null.
Specifically, the prefabricated database may include a plurality of preset drive data, the plurality of preset drive data may include drive data for driving the first virtual character, and may also include drive data for driving an element in a virtual live broadcast scene, the virtual live broadcast scene may refer to a virtual scene where the first virtual character is located, and the element in the virtual live broadcast scene may refer to a virtual prop, a virtual special effect, and the like in the virtual live broadcast scene. Therefore, the preset expression, the preset action, the preset special effect and the like can be realized through the preset driving data.
After the trigger instruction is obtained, the preset driving data associated with the trigger instruction can be selected from the preset database according to the trigger instruction and the preset mapping relation. The preset mapping relationship may refer to a corresponding relationship between the trigger instruction and the preset driving data. That is, the preset driving data associated with the trigger instruction refers to the preset driving data corresponding to the trigger instruction.
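For illustration, the pre-made database and the preset mapping relation can be thought of as a lookup from trigger instruction to preset driving data, as in the sketch below; the keys and entries are invented examples.

```python
# Assumed shape of the pre-made database and the preset mapping relation from
# trigger instruction to preset driving data; keys and values are examples only.
PREMADE_DATABASE = {
    "TRIGGER_DANCE": {
        "second_drive": {"body": "premade_dance_motion"},   # drives the first virtual character
        "third_drive": None,
    },
    "TRIGGER_FIREWORKS": {
        "second_drive": None,
        "third_drive": {"element": "fireworks_effect"},     # drives an element of the virtual scene
    },
}


def select_preset_drive(trigger):
    """Return (second_driving_data, third_driving_data); either may be None (null)
    when the trigger is not associated with that kind of data."""
    entry = PREMADE_DATABASE.get(trigger, {})
    return entry.get("second_drive"), entry.get("third_drive")
```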
In a first example, the preset driving data selected according to the trigger instruction includes driving data for driving the first virtual character, and is recorded as second driving data.
Specifically, the second driving data may include one or more of: second facial driving data, second gaze driving data, and second body driving data. The second facial driving data is driving data, obtained according to the trigger instruction, for driving the facial expression of the first virtual character; the second gaze driving data is driving data, obtained according to the trigger instruction, for driving the gaze direction of the first virtual character; and the second body driving data is driving data, obtained according to the trigger instruction, for driving the body posture of the first virtual character.
Further, drive data for generating live video may be determined from the second drive data and the first drive data.
In the first example, the second driving data may include only the second face driving data. In this case, the first face drive data in the first drive data may be replaced with the second face drive data to obtain drive data for generating live video. In this scheme, the captured expression of the first user may be replaced by a pre-made expression (i.e., an expression corresponding to the second face driving data) to obtain an expression of the first virtual character, so that the expression of the first virtual character is richer and more diverse.
In a second example, the second drive data may comprise only second body drive data. In this case, the first body drive data in the first drive data may be replaced with the second body drive data to obtain drive data for generating live video. When the scheme is adopted, the captured action of the first user can be replaced by the prefabricated action (namely, the action corresponding to the second body driving data) so as to obtain the action of the first virtual character, and the action of the first virtual character is richer and more diverse. Wherein, the limb movement corresponding to the second body driving data can be dancing and the like.
Alternatively, the second drive data may comprise only second body drive data, which may be superimposed to the first body drive data to obtain drive data for generating live video. That is, the selected pre-formed action (i.e., the action corresponding to the second body-actuation data) may be superimposed on the captured first user action to obtain the action of the first virtual character. In other words, the actions of the first virtual character include the actions of the first user and the pre-fabricated limb actions corresponding to the second body actuation data. For example, the first body drive data may comprise only drive data for driving the torso and the second drive data may comprise drive data for driving the limbs, whereby the second drive data and the first body drive data may be superimposed.
In a third example, the second driving data may include only the second gaze driving data. If the first driving data includes the first gaze driving data, the second gaze driving data may replace the first gaze driving data in the first driving data; alternatively, if the first driving data does not include the first gaze driving data, the second gaze driving data may be superimposed on the first driving data.
In a fourth example, the second drive data may include a full set of drive data (i.e., the second drive data includes second facial drive data, second gaze drive data, and second body drive data). In this case, the first drive data may be directly replaced with the second drive data, i.e. the second drive data may be directly used as drive data for generating live video. Therefore, the expression, the limb action and the like of the first virtual character are the prefabricated expression and the prefabricated limb action corresponding to the second driving data.
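The four cases above amount to a simple merge policy between the captured first driving data and the premade second driving data; a hedged sketch, assuming both are represented as dictionaries keyed by part, is given below.

```python
# Illustrative fusion of captured (first) and premade (second) driving data,
# following the replace/superimpose cases above; the dictionary layout is assumed.
from typing import Optional


def fuse_character_drive(first: dict, second: Optional[dict]) -> dict:
    if not second:
        return dict(first)
    if {"face", "gaze", "body"}.issubset(second):
        return dict(second)                   # fourth example: the full premade set replaces capture
    drive = dict(first)
    if "face" in second:
        drive["face"] = second["face"]        # first example: premade expression replaces capture
    if "gaze" in second:
        drive["gaze"] = second["gaze"]        # third example: replace (or add) the gaze component
    if "body" in second:
        if second.get("superimpose_body"):    # second example, superposition variant
            drive["body"] = {**first.get("body", {}), **second["body"]}   # e.g. premade limbs over captured torso
        else:
            drive["body"] = second["body"]    # second example, replacement variant
    return drive
```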
In a specific implementation, the second driving data may carry a specific time code for specifying a time point for adding the pre-made expression and/or the pre-made action.
Specifically, the time when the trigger instruction is acquired is denoted as time T, the designated time code may be ΔT, and the driving data for driving the first virtual character at time (T + ΔT) is denoted as the target driving data. That is, the driving data for generating the live video at time (T + ΔT) is the target driving data. ΔT may be predetermined.
More specifically, the target driving data may be determined based on the first driving data and the second driving data at time T. For example, if the second driving data includes the full set of driving data, the target driving data may be the second driving data. For more on determining driving data from the first driving data and the second driving data at time T, reference may be made to the related description above on deriving driving data for generating the live video from the first driving data and the second driving data.
Further, the time period between time T and time (T + ΔT) is a transition period; that is, the first virtual character transitions from its state at time T to the state corresponding to the target driving data.
Specifically, during the period between time T and time (T + ΔT), the first driving data at time T and the target driving data may be fused to obtain the driving data for generating the live video at each moment within the transition period. In the fusion, a weighted calculation may be performed on the first driving data at time T and the target driving data, where, as time passes, the weight of the first driving data at time T gradually decreases and the weight of the target driving data gradually increases. With such a scheme, a natural transition of the state of the first virtual character can be achieved.
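The transition just described can be illustrated with a weighted blend over the window from time T to time (T + ΔT); the linear weight schedule below is an assumption, since the text only states that the weights change gradually, and the driving data is assumed to be a flat dictionary of numeric controller values.

```python
# Linear blend over the transition window [T, T + delta_t]; the linear schedule
# is an assumption, and driving data is assumed to be a flat dict of numbers.

def transition_drive(first_at_T: dict, target: dict, t_now: float,
                     T: float, delta_t: float) -> dict:
    """Weighted fusion of the driving data captured at time T and the target
    driving data, for a moment t_now inside [T, T + delta_t]."""
    alpha = min(max((t_now - T) / delta_t, 0.0), 1.0)       # 0 at T, 1 at T + delta_t
    blended = {}
    for key in set(first_at_T) | set(target):
        a = first_at_T.get(key, 0.0)
        b = target.get(key, 0.0)
        blended[key] = (1.0 - alpha) * a + alpha * b        # first's weight falls, target's rises
    return blended
```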
In a non-limiting example, before step S24 is executed, the matching degree between the first driving data and the second driving data at time T may be calculated; if the calculated matching degree is greater than or equal to a preset matching threshold, step S24 may continue to be executed. If the calculated matching degree is less than the matching threshold, the second driving data may be ignored. That is, if the calculated matching degree is less than the matching threshold, the first user may continue to drive the first virtual character, and the first virtual character is not driven according to the second driving data selected by the trigger instruction. In other words, if the calculated matching degree is less than the matching threshold, the trigger instruction may not be automatically responded to. Further, in this case, a reminder message may be issued to inform the first user of the received trigger instruction, so that the first user can manually respond to the trigger instruction and interact with the outside.
In particular, the degree of matching may be used to characterize the degree of similarity between states (e.g., body posture and facial expression) of the first virtual character to which the drive data corresponds. If the matching degree between the two driving data is greater than or equal to the preset matching threshold, the switching between the states corresponding to the two driving data is reasonable and natural. Conversely, if the matching degree is smaller than the matching threshold, it can be determined that the switching between the states corresponding to the two drive data is not reasonable. Therefore, the adoption of the scheme is beneficial to enabling the state switching of the first virtual role to be more natural in the interaction process.
In a second example, the preset driving data selected according to the trigger instruction includes driving data for driving an element in the virtual live scene, and is denoted as third driving data. The element may be a virtual prop, but is not limited thereto. Wherein the third drive data may be drive data of an element. The elements may be fireworks, wind, rain, lightning, etc., but are not limited thereto.
Further, the drive data for generating the live video may be determined based on the third drive data and the first drive data. In a specific implementation, the drive data for generating live video may include both the first drive data and the third drive data.
Specifically, since the first driving data is used to drive the first virtual character and the third driving data is used to drive the elements in the virtual character scene, the first rendering engine module may generate the animation video according to the first driving data and the third driving data. That is, the expression, the posture, and the like of the first user may be captured synchronously in the process of rendering the prefabricated special effect according to the third driving data.
The third driving data may also carry a designated time code, and the animation video may be generated based on the third driving data at the time indicated by the designated time code. For example, if the time when the trigger instruction is acquired is denoted as time T, the animation video may be generated from the third driving data at time (T + ΔT).
In another embodiment of the present application, the external operation may refer to an operation that triggers the first virtual character to interact with the outside.
In a first example, the external operation may be an interactive operation issued by a viewer, where a viewer refers to a user watching the live broadcast of the first virtual character. Accordingly, the trigger instruction may be generated by the interactive operation issued by the viewer.
In a specific implementation, a viewer may issue an interactive operation through the viewer's own terminal; the interactive operation issued by the viewer may be, but is not limited to, liking, following, sending a bullet-screen comment, or clicking a virtual item.
In one non-limiting example, the trigger instruction may be captured within a preset time window. The time window may be set by the first user, that is, the first user may set a time period for receiving the trigger instruction during the live broadcast of the first virtual character.
For example, the starting time of the time window may be manually set by the first user, and the length of the time window may be set in advance, for example, may be 2 seconds.
As another example, the starting time of the time window may be set by a voice trigger of the first user. More specifically, when it is detected that the voice of the first user includes the second preset keyword, the time when the second preset keyword is detected may be used as the starting time of the time window.
Further, for each trigger instruction acquired in the time window, the terminal may determine whether the same trigger instruction has occurred in the current time window, and if the same trigger instruction has not occurred in the current time window, may select the second driving data and/or the third driving data associated with the trigger instruction from the pre-manufactured database. If the same trigger instruction occurs within the current time window, the trigger instruction may be ignored. Ignoring the trigger instruction means that the drive data is not selected according to the trigger instruction, and the trigger instruction may be deleted or discarded.
This scheme makes it possible to cope with a large number of viewers performing interactive operations at the same time and still respond to those operations in a timely manner.
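A minimal sketch of the de-duplication just described, assuming trigger instructions can be keyed by type (the class and method names are illustrative, not from the application):

```python
import time
from typing import Optional

class TriggerWindow:
    """Accept only the first occurrence of each trigger-instruction type inside
    the current time window; repeats within the window are ignored."""

    def __init__(self, window_s: float = 2.0):
        self.window_s = window_s
        self.window_start: Optional[float] = None
        self.seen: set = set()

    def open(self, start_s: Optional[float] = None) -> None:
        """Called, e.g., when the second preset keyword is detected in the first user's voice."""
        self.window_start = time.time() if start_s is None else start_s
        self.seen.clear()

    def accept(self, instruction_type: str, now_s: Optional[float] = None) -> bool:
        now = time.time() if now_s is None else now_s
        if self.window_start is None or not (self.window_start <= now <= self.window_start + self.window_s):
            return False                 # outside the window: not captured
        if instruction_type in self.seen:
            return False                 # duplicate within this window: ignore/discard
        self.seen.add(instruction_type)
        return True                      # select the associated second/third driving data
```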
Further, the second driving data and/or the third driving data can be selected from the pre-made database according to a trigger instruction generated in response to an interactive operation sent by a viewer. Driving data for generating the live video is then determined from the first driving data and the second and/or third driving data.
In a non-limiting example, the interactive operation issued by the viewer may be clicking on a virtual item, and accordingly, the third driving data selected according to the trigger instruction generated by the interactive operation trigger may include a movement track of the virtual item, and the movement track may include positions of the virtual item at a plurality of times.
In the virtual live scene, the virtual prop may or may not be in contact with the first virtual character.
When the distance between the virtual prop and the first virtual character is less than or equal to a first preset threshold, selection of the fourth drive data may be triggered. Wherein the fourth driving data may be data for driving the first avatar.
The position of the virtual prop is its position in the virtual live scene where the first virtual character is located. The distance between the virtual prop and the first virtual character may be the distance between the centroid of the virtual prop and the position of the first virtual character's contact reference site. The contact reference site can be determined according to the type of the virtual prop: for example, if the virtual prop is a cola, the contact reference site may be the hand; if the virtual prop is a stone, the contact reference site may be the head, and so on, but the present application is not limited thereto.
More specifically, when the distance between the position of the virtual item and the first virtual character is less than or equal to a first preset threshold, it may be determined that the virtual item is in contact with the first virtual character in a virtual live broadcast scene, and the fourth driving data is used to drive the first virtual character to show a state after the contact with the virtual item occurs.
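For illustration only (the mapping from prop type to contact reference site and the threshold value are assumptions), the contact check could look like this:

```python
import math

# Assumed mapping from virtual-prop type to the contact reference site.
CONTACT_SITE_BY_PROP = {"cola": "hand", "egg": "head", "stone": "head"}

def contact_triggered(prop_type: str, prop_centroid, character_sites: dict,
                      threshold: float = 0.1) -> bool:
    """Trigger selection of the fourth driving data when the distance between the
    prop's centroid and the character's contact reference site is within the
    first preset threshold (scene units)."""
    site = CONTACT_SITE_BY_PROP.get(prop_type, "body")
    return math.dist(prop_centroid, character_sites[site]) <= threshold
```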
Further, the drive data for generating the live video may be determined from the fourth drive data and the first drive data. Regarding more contents of determining the drive data for generating the live video according to the fourth drive data and the first drive data, reference may be made to the above description regarding determining the drive data for generating the live video according to the second drive data and the first drive data, which is not repeated herein.
It should be noted that the third driving data and the fourth driving data have a temporal precedence relationship: when the live video is generated according to the fourth driving data, it is no longer generated according to the third driving data that triggered the fourth driving data.
In the second example, the external operation may be an operation for the first game character. Wherein the first game character is a game character associated with the first virtual character.
Specifically, the first game character may be a game character controlled in the game by the first user. For example, the first user may control the first game character through an external device (e.g., a keyboard, a mouse, a gamepad, etc.) of the first terminal. It can be seen that both the first game character and the first virtual character are controlled by the first user.
Further, the operation for the first game character may be an operation of changing the state of the first game character in the game. More specifically, the operation for the first game character may be an attack operation of the first game character by other game characters in the game, for example, a shooting operation or the like. Alternatively, the operation on the first game character may be a control operation of the first game character by the first user, for example, replacement of equipment of the game character. Wherein the other game character may be controlled by a different game player than the first user.
In a specific implementation, during a game, the terminal may obtain game data from the interface server, where the game data may include data of game characters and props in the game. Further, the game data may be parsed to identify operations directed to the first game character. For example, when the analysis determines that the distance between a prop in the game and the head of the first game character is smaller than the preset distance, it may be determined that the head of the first game character is attacked by the prop.
Therefore, when another game character performs an attack operation on the first game character and/or the first user performs a preset control operation on the first game character, a corresponding trigger instruction can be generated.
Further, the second driving data and/or the third driving data may be selected from the pre-made database according to the trigger instruction generated by the operation on the first game character. Driving data for generating the live video is then determined from the first driving data and the second and/or third driving data.
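A hedged sketch of parsing one frame of game data into a trigger instruction; the dictionary layout, field names, and the 0.5-unit distance below are assumptions for illustration:

```python
ATTACK_DISTANCE = 0.5  # assumed preset distance, in game units

def detect_trigger(game_data: dict, first_character_id: str):
    """Scan game data obtained from the interface server and return a trigger
    instruction when the first game character is attacked or re-equipped."""
    head = game_data["characters"][first_character_id]["head_position"]
    for prop in game_data.get("props", []):
        gap = sum((a - b) ** 2 for a, b in zip(prop["position"], head)) ** 0.5
        if gap < ATTACK_DISTANCE:
            return {"type": "attacked", "prop": prop["kind"]}       # e.g., hit by a stone
    for event in game_data.get("events", []):
        if event.get("target") == first_character_id and event.get("kind") == "equip_change":
            return {"type": "equip_change", "item": event.get("item")}
    return None  # no operation directed at the first game character this frame
```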
Referring to fig. 4, fig. 4 is a schematic diagram of a live view in an embodiment of the present application.
The live view shown in fig. 4 may include a first area 41 and a second area 42, wherein the first area 41 may display a game view in which a game progress of a first user-controlled first game character and other game characters may be displayed. The second region 42 can display a live view that includes live content of the first avatar. The live view may be displayed on an interface of the first user's terminal and may also be displayed on an interface of the viewer's terminal.
During the first user game, an animation video may be generated according to the first driving data to display a video picture including the first virtual character within the second region 42. Further, if the first game character is subjected to the above-described control operation or attack operation, etc., a trigger instruction may be generated, and the second drive data and/or the third drive data may be selected according to the trigger instruction.
Further, the drive data for generating the live video may be determined according to the first drive data, the second drive data, and/or the third drive data, and then the generated live video may be displayed in the second area 42.
With this scheme, an operation directed at the first game character can be mapped onto the first virtual character, so that the first virtual character responds to that operation. This enables linkage between the game character and the virtual character and makes the live broadcast more engaging.
In yet another embodiment of the present application, a second virtual character may be displayed in a virtual live scene in which the first virtual character is located. That is, the first virtual character and the second virtual character may be displayed simultaneously in the virtual live scene in which the first virtual character is located.
Wherein the second avatar may be an avatar driven by a second user, the second user being a different user than the first user.
In a specific implementation, fifth driving data may be obtained from a terminal used by the second user, where the fifth driving data is driving data for driving the second virtual role, and the fifth driving data is obtained by performing redirection processing on the state information of the second user. For more details about the generation of the fifth driving data, reference may be made to the above description about step S21 to step S22, which is not repeated herein. For convenience of description, the terminal used by the second user will be referred to as a second terminal hereinafter.
It should be noted that the first terminal and the second terminal are different terminals, that is, the first user image and the second user image are collected by cameras coupled to the different terminals, and the state information of the first user and the state information of the second user are calculated by the different terminals.
After the first terminal acquires the fifth driving data, the driving data for generating the live video can be determined according to the first driving data and the fifth driving data at the same time, so that the first virtual character and the second virtual character are displayed at the same time.
Referring to fig. 5, fig. 5 is a schematic architecture diagram of another live broadcast method for virtual roles in this embodiment. As shown in fig. 5, different user-driven avatars may be displayed in the same avatar scene.
Specifically, the first terminal may upload the first driving data to the interface server after calculating the first driving data, and correspondingly, the second terminal may upload the fifth driving data to the interface server after calculating the fifth driving data.
Further, the first terminal may obtain fifth driving data from the interface server, and input the first driving data and the fifth driving data to the first rendering engine module together, and the first rendering engine module may generate an animation video including the first virtual character and the second virtual character according to the first driving data and the fifth driving data. Further, the animated video may be transmitted to a first video streaming server to simultaneously display the first virtual character and the second virtual character within the live room of the first virtual character.
Accordingly, the second terminal may receive the first driving data from the interface server and input the first driving data and the fifth driving data to the second rendering engine module, and the second rendering engine module may generate an animation video including the first virtual character and the second virtual character according to the first driving data and the fifth driving data. Further, the animated video may be transmitted to a second video streaming server to simultaneously display the first virtual character and the second virtual character within the live room of the second virtual character. The second rendering engine module may refer to a rendering engine module associated with the second terminal.
It should be noted that, in other embodiments, the first terminal may also obtain the fifth driving data without going through the interface server, for example, the first terminal may directly obtain the fifth driving data from the second terminal.
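To illustrate the data exchange (the endpoint URL, payload layout, and function names below are placeholders, not part of the application), each terminal uploads its own driving data and pulls the other character's data before rendering:

```python
import json
import urllib.request

INTERFACE_SERVER = "http://interface-server.example"  # placeholder address

def fetch_drive_data(character_id: str) -> dict:
    """Pull the latest driving data uploaded by the other terminal, e.g. the
    fifth driving data computed by the second terminal."""
    url = f"{INTERFACE_SERVER}/drive/{character_id}"
    with urllib.request.urlopen(url, timeout=1.0) as resp:
        return json.loads(resp.read())

def render_payload(first_drive: dict, fifth_drive: dict) -> dict:
    """Feed both characters' driving data to the local rendering engine so the
    first and second virtual characters appear in the same live scene."""
    return {"first_character": first_drive, "second_character": fifth_drive}
```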
In a specific implementation, before acquiring the fifth driving data, the first terminal may acquire the second virtual character, that is, obtain the model of the second virtual character: specifically, it receives an instruction about the second virtual character and loads the second virtual character accordingly. The second virtual character is then placed at a preset position in the virtual live scene where the first virtual character is located, and the first terminal can display the first virtual character and the second virtual character simultaneously. The preset position may be set by the first user. Accordingly, the first virtual character and the second virtual character can be displayed simultaneously on the terminals of viewers in the first user's live room.
Correspondingly, before acquiring the first driving data, the second terminal may acquire the first virtual character, that is, obtain the model of the first virtual character: specifically, it receives an instruction about the first virtual character and loads it. The first virtual character is then placed at a preset position in the virtual live scene where the second virtual character is located, and the second terminal displays the first virtual character and the second virtual character simultaneously. Here the preset position may be set by the second user. Accordingly, the first virtual character and the second virtual character can be displayed simultaneously on the terminals of viewers in the second user's live room.
Further, during the process of simultaneously displaying the first virtual character and the second virtual character, a trigger instruction may also be received, where the trigger instruction may be generated by an external operation trigger that instructs the first virtual character and the second virtual character to interact with each other. The external operation instructing the first virtual character and the second virtual character to interact with each other may be issued by the first user, or may be issued by the second user, but is not limited thereto.
Further, after receiving the trigger instruction, the corresponding second driving data and sixth driving data may be selected from the pre-manufactured database according to the trigger instruction. The second driving data is used for driving the first virtual character, and the sixth driving data can be used for driving the second virtual character. Interaction between the first virtual character and the second virtual character may occur through driving of the second driving data and the sixth driving data.
In one non-limiting example, the trigger instruction is used to drive the first virtual character and the second virtual character to perform a contact interaction action in the virtual live scene. Hereinafter, the position where the first virtual character and the second virtual character contact in the virtual live scene is denoted as a target position, and the target position may be determined according to the trigger instruction. That is, the target location may be different for different interactive actions.
Further, a first set position may be determined according to the target position and the limb parameters of the first virtual character, and a second set position according to the target position and the limb parameters of the second virtual character. The limb parameters of a virtual character describe the specific form of the generated character, in particular the lengths of its limbs. In one specific example, the limb parameters may include the arm length of the first virtual character and the arm length of the second virtual character. The first set position and the second set position may refer to the coordinate positions of the respective character's centroid in the virtual scene.
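A minimal geometric sketch of that computation, assuming the set position is obtained by standing each character at arm's length from the target point along its current approach direction (a real system would additionally solve hand IK):

```python
import numpy as np

def set_positions(target, first_pos, second_pos, first_arm_len: float, second_arm_len: float):
    """Return the first and second set positions: each character's centroid is
    placed on the line from its current position to the contact target, one arm
    length away from the target, so the interaction lands at the target."""
    target = np.asarray(target, dtype=float)

    def stand_point(current, arm_len):
        current = np.asarray(current, dtype=float)
        direction = current - target
        norm = np.linalg.norm(direction)
        return target if norm == 0 else target + direction / norm * arm_len

    return stand_point(first_pos, first_arm_len), stand_point(second_pos, second_arm_len)
```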
Further, first default driving data and second default driving data may be acquired from a pre-manufactured database, and then the first virtual character may be driven with the first default driving data, and the second virtual character may be driven with the second default driving data, where the default driving data may be driving data for driving the virtual character to move.
More specifically, if the time when the trigger instruction is acquired is time T and the designated time code of the second driving data and the sixth driving data is ΔT, then between time (T + 1) and time (T + ΔT) the first virtual character may be driven by the first default driving data to move from its position at time T to the first set position, and the second virtual character may be driven by the second default driving data to move from its position at time T to the second set position. By time (T + ΔT), the first virtual character has reached the first set position and the second virtual character has reached the second set position.
Further, at time (T + ΔT), the first virtual character may be driven according to the second driving data and the second virtual character according to the sixth driving data, so that the two virtual characters interact with each other in the live video.
Referring to fig. 6, fig. 6 is a schematic architecture diagram of a live broadcast method for a virtual role in this embodiment. As shown in fig. 6, the first terminal may perform restoration reconstruction according to the first user image to obtain the state information of the first user. Further, the state information of the first user may be redirected to obtain the first driving data. For more details on the restoration reconstruction and redirection process, reference may be made to the above description related to fig. 2, and further details are not repeated here.
Further, the first driving data can be input into the first rendering engine module, and the animation video output by the first rendering engine module is sent to the live broadcast platform to obtain the live video. The audience terminals (for example, audience terminal 1, audience terminal 2, …, audience terminal n) can access the live broadcast platform to obtain the live video and watch the live broadcast of the virtual character.
In different modes, a trigger instruction may be received. The triggering instruction may be obtained from the interface server, but is not limited thereto.
When the first user sets a mode for interacting with the viewer (i.e., S1 is closed), the trigger instruction may be generated by an interactive operation trigger issued by the viewer.
For example, the interactive operation may be throwing an egg at the first virtual character, and the corresponding second driving data and third driving data are selected from the pre-made database according to the trigger instruction generated by this interactive operation. The second driving data may be null, and the third driving data may be driving data that makes the egg fly toward the first virtual character. When the distance between the virtual prop (the egg) and the first virtual character is smaller than the first preset threshold, the fourth driving data is triggered and called; the fourth driving data may be used to drive a bump to appear on the first virtual character's head.
Accordingly, the first user may also respond to the virtual prop manually. For example, after the first virtual character is hit by the egg, the first user may say "that hurts" while making a pained expression, or say "you missed", so that the first virtual character presents the first user's expression and voice at the same time.
As another example, the interactive operation may be a viewer presenting a gift to the first virtual character, and the first virtual character may give feedback on this behavior. The corresponding second driving data and third driving data are selected from the pre-made database according to the trigger instruction generated by the interactive operation. Here the third driving data may be null, and the second driving data may be used to drive the first virtual character to perform an action expressing thanks.
When the first user sets a mode for interacting with the first game character (i.e., S2 is closed), the trigger instruction may be generated by an operation trigger for the first game character.
For example, when the first game character is hit by a stone in the game, a corresponding trigger instruction is generated. Further, the corresponding second driving data and third driving data can be selected from the pre-made database according to the trigger instruction, where the second driving data may be null and the third driving data may be driving data that makes a stone fly toward the first virtual character.
When the distance between the virtual prop (the stone) and the first virtual character is smaller than the first preset threshold, the fourth driving data is triggered and called; the fourth driving data may be used to drive a bump to appear on the first virtual character's head.
Accordingly, the first user can also give feedback in response to the operation on the first game character. For example, when the first virtual character is hit by the stone, the first user may say "ouch, who hit me" while making a pained, puzzled expression. In this way, the first user drives the first virtual character to present the first user's expression and voice at the same time.
When the first user sets the mode for interacting with other virtual characters (i.e., S3 is closed), the trigger instruction may be generated by an operation that instructs the first virtual character and the second virtual character to interact. The interactive action may be, for example, a high-five or a handshake.
For example, when the first user instructs the first virtual character and the second virtual character to high-five, the real-time motion capture of the first user and the second user may first stop driving the characters. That is, once the trigger instruction is received, the live video is no longer generated from the first driving data and the fifth driving data.
Further, the two virtual characters can first be adjusted to a natural (rest) pose, and the position of the high-five contact is taken as the target position. The first set position can then be calculated from the target position and the limb parameters of the first virtual character, and the second set position from the target position and the limb parameters of the second virtual character.
Further, the first default driving data and the second default driving data (not shown in fig. 6) may be obtained from a pre-manufactured database, and the first default driving data and the second default driving data may be input to the first rendering engine module to drive the first avatar to move from the natural state position to the first set position, and to drive the second avatar to move from the natural state position to the second set position.
When the first virtual character reaches the first set position and the second virtual character reaches the second set position, the second driving data and the sixth driving data (not shown in fig. 6) may be read from the pre-made database. The second driving data can be used to drive the first virtual character to perform the high-five action, and the sixth driving data can be used to drive the second virtual character to perform the high-five action. The live video is then generated according to the second driving data and the sixth driving data, so that the first virtual character and the second virtual character perform the high-five in the live video.
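Tying the three modes together, a hedged dispatch sketch (the database keys and clip names below are invented for illustration; the application only requires that pre-made driving data be selectable per trigger instruction):

```python
# Assumed layout: (mode, instruction) -> pre-made driving data to merge with the live data.
PREMADE_DB = {
    ("audience", "throw_egg"):  {"second": None, "third": "egg_flight_clip"},
    ("audience", "send_gift"):  {"second": "thank_you_clip", "third": None},
    ("game", "hit_by_stone"):   {"second": None, "third": "stone_flight_clip"},
    ("co_anchor", "high_five"): {"second": "high_five_first", "sixth": "high_five_second"},
}

def handle_trigger(mode: str, instruction: str):
    """Dispatch a trigger instruction according to the mode set by the first user
    (S1: audience interaction, S2: game linkage, S3: multi-character interaction)."""
    return PREMADE_DB.get((mode, instruction))
```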
Therefore, the scheme of the embodiment of the application can enable the first virtual role to respond to different external operations.
In the scheme of the embodiment of the present application, the user can choose the appearance of the first virtual character. For example, the user may select the first virtual character from a plurality of pre-made avatars, or create a customized one. A pre-made avatar is a pre-designed virtual character model, for example an IP character image or a customized fixed character image. A customized avatar can be obtained by the user editing at least one part of a pre-made avatar, or can be built from a body template.
More specifically, the user can separately select and edit the model parts of a pre-made avatar (including head, face, body, clothes, props, and the like) for local model replacement: a pre-made part is chosen from the pre-made list of each part, swapped onto the pre-made avatar model, and the result is saved as a customized pre-made avatar. Different types of clothing, such as daily wear and new national-style outfits, may be provided for the user to choose among different avatar looks. In this way, a variety of high-quality avatar styles, including realistic, American-cartoon, and anime (2D) styles, can be supported.
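A toy sketch of the part-replacement flow (the part names and dictionary representation are assumptions; a production system would operate on actual 3D model assets):

```python
def customize_avatar(base_avatar: dict, replacements: dict) -> dict:
    """Swap selected model parts (head, face, body, clothes, props) of a pre-made
    avatar for entries chosen from each part's pre-made list, then save the
    result as a customized avatar."""
    custom = dict(base_avatar)
    custom.update(replacements)
    custom["name"] = base_avatar.get("name", "avatar") + "_custom"
    return custom

# Example: keep the pre-made head but swap in a national-style outfit.
custom = customize_avatar(
    {"name": "ip_role", "head": "head_01", "clothes": "daily_01"},
    {"clothes": "national_style_03"},
)
```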
Furthermore, the virtual live scene can be set according to the user's needs and can be switched at any time during the live broadcast, for example among racing, cartoon, technology, and national-style scenes. In addition, the user can select different shots during the virtual live broadcast, such as close-up, long shot, full-body panorama, and half-body close-up; dynamic camera moves can also be selected for the virtual live broadcast.
Therefore, the virtual live broadcast method provided by the embodiment of the application can serve as the basis of a live-broadcast product in which a three-dimensional (3D) virtual character is driven in real time, allowing a user to interact with viewers through the virtual character simply and quickly. The scheme is applicable to live broadcasting in many scenarios, such as e-commerce, entertainment, and events; online meetings, training, and interviews; video creation; and offline virtual interaction. More specifically, for e-commerce live broadcasting it supports single-person daily broadcasts, promotional campaigns, themed events, and brand venues. Various interactive games, such as 3D bullet screens, arena PK, anchor challenges, and idle rewards, can be customized to raise the interaction rate, and live-broadcast tools such as voting and lotteries can be configured to help the anchor enhance interactivity.
Therefore, the embodiment of the application provides a more optimized live broadcast method for virtual roles, which can specifically have the following effects:
1. low threshold
1-1: the hardware threshold is low: complex optical dynamic capturing equipment and inertia dynamic capturing equipment are not needed, and only a common RGB color camera or RGBD color and depth camera is needed; 1-2: the operating threshold is low: the user does not need to wear any equipment, only needs to face the camera, and can realize high-quality virtual live broadcast by matching the action and expression of the user with the operation of external equipment such as a mouse, a keyboard and the like; 1-3: the image import threshold is low: the user can select various different virtual images, can also rapidly import the virtual images, can also self-define the virtual images, and can rapidly realize virtual live broadcast.
2. High quality
2-1: vivid and fine expressions: a high-precision facial expression capture system captures the user's expression and redirects it to the virtual character in real time to drive the character; in addition, rich expressions can be produced on the virtual character through pre-made expressions; 2-2: natural and smooth actions: the user's body movements are captured and redirected to the virtual character in real time to drive it, and pre-made actions add further richness; 2-3: rich talent performances: a large library of pre-made talent actions allows the virtual character to be triggered to give a wide range of performances; 2-4: real-time bullet-screen interaction is supported, which can enhance the live broadcast effect.
3. Strong interaction
3-1: gift interaction: the audience can control the virtual props to interact with the virtual characters through barrage or other modes, such as buttons or voice; 3-2: game linkage: the user can simultaneously control the roles and the virtual roles in the game, so that the states of the game roles in the game are mapped to the virtual roles, and the linkage between the game roles and the virtual roles is realized;
3-3: multi-person interaction: different users can drive different virtual roles in the same virtual live broadcast scene and interact with each other; different virtual characters can play games, chat or do other things together;
4. easy expansion
4-1: extension in sound: through timbre conversion, a single user can output a variety of different voices and different users can output the same voice, meeting different requirements such as consistency and flexibility of the sound;
4-2: extension of game and live rooms: virtual live broadcast can get through the barrage of the game or the live broadcast room, gift buttons, role special effects, role attributes and the like in the game, and richer interactive extension is realized.
Referring to fig. 7, fig. 7 is a schematic structural diagram of a live device of a virtual character in an embodiment of the present application, where the device shown in fig. 7 may include:
a first obtaining module 71, configured to obtain a first user image, where the first user image includes an image of a first user;
a first generating module 72, configured to perform redirection processing on the state information of the first user, and generate first driving data for driving a first virtual role, where the state information of the first user is obtained according to the first user image;
a second obtaining module 73, configured to obtain a trigger instruction, where the trigger instruction is generated by triggering an external operation;
a second generating module 74, configured to determine, according to the first driving data, the second driving data, and/or the third driving data, driving data for generating a live video of the first virtual character;
the second driving data is driving data which is obtained according to the trigger instruction and is used for driving the first virtual role, and the third driving data is driving data which is obtained according to the trigger instruction and is used for driving elements in a virtual live broadcast scene.
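Purely as a structural illustration of how the four modules of Fig. 7 could be composed (class and method names are assumptions, not part of the application):

```python
class VirtualCharacterLiveDevice:
    """Sketch of the live device: camera capture feeds redirection, triggers pull
    pre-made driving data, and the merged result goes to the rendering engine."""

    def __init__(self, capture, retarget, trigger_source, premade_db):
        self.capture = capture                # first obtaining module (first user image)
        self.retarget = retarget              # first generating module (redirection)
        self.trigger_source = trigger_source  # second obtaining module (triggers)
        self.premade_db = premade_db          # backs the second generating module

    def step(self) -> dict:
        image = self.capture.read()                      # acquire the first user image
        first_drive = self.retarget(image)               # first driving data
        instruction = self.trigger_source.poll()         # external trigger, may be None
        extra = self.premade_db.select(instruction) if instruction else {}
        return {"first": first_drive, **extra}           # driving data for the live video
```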
For more contents such as the working principle, the working method, and the beneficial effects of the live broadcasting device for the virtual character in the embodiment of the present application, reference may be made to the above description related to the live broadcasting method for the virtual character, and details are not described herein again.
The embodiment of the present application further provides a computer storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the steps of the live broadcasting method for the virtual character are executed. The storage medium may include ROM, RAM, magnetic or optical disks, etc. The storage medium may further include a non-volatile memory (non-volatile) or a non-transitory memory (non-transient), and the like.
The embodiment of the application further provides a terminal, which comprises a memory and a processor, wherein a computer program capable of running on the processor is stored in the memory, and the processor executes the steps of the live broadcast method of the virtual role when running the computer program. The terminal includes, but is not limited to, a mobile phone, a computer, a tablet computer and other terminal devices.
It should be understood that, in the embodiment of the present application, the processor may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
It will also be appreciated that the memory in the embodiments of the present application may be volatile memory or nonvolatile memory, or may include both. The nonvolatile memory may be a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable PROM (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which acts as an external cache. By way of illustration and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct bus RAM (DR RAM).
The above-described embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. The procedures or functions according to the embodiments of the present application are wholly or partially generated when the computer instructions or the computer program are loaded or executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer program may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer program may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire or wirelessly.
In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus and system may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative; for example, the division of the unit is only a logic function division, and there may be another division manner in actual implementation; for example, various elements or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may be physically included alone, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit. For example, for each device or product applied to or integrated into a chip, each module/unit included in the device or product may be implemented by hardware such as a circuit, or at least a part of the module/unit may be implemented by a software program running on a processor integrated within the chip, and the rest (if any) part of the module/unit may be implemented by hardware such as a circuit; for each device or product applied to or integrated with the chip module, each module/unit included in the device or product may be implemented by using hardware such as a circuit, and different modules/units may be located in the same component (e.g., a chip, a circuit module, etc.) or different components of the chip module, or at least some of the modules/units may be implemented by using a software program running on a processor integrated within the chip module, and the rest (if any) of the modules/units may be implemented by using hardware such as a circuit; for each device and product applied to or integrated in the terminal, each module/unit included in the device and product may be implemented by using hardware such as a circuit, and different modules/units may be located in the same component (e.g., a chip, a circuit module, etc.) or different components in the terminal, or at least part of the modules/units may be implemented by using a software program running on a processor integrated in the terminal, and the rest (if any) part of the modules/units may be implemented by using hardware such as a circuit.
It should be understood that the term "and/or" herein is merely one type of association relationship that describes an associated object, meaning that three relationships may exist, e.g., a and/or B may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein indicates that the former and latter associated objects are in an "or" relationship.
The "plurality" appearing in the embodiments of the present application means two or more. The descriptions of the first, second, etc. appearing in the embodiments of the present application are only for illustrating and differentiating the objects, and do not represent the order or the particular limitation of the number of the devices in the embodiments of the present application, and do not constitute any limitation to the embodiments of the present application. Although the present application is disclosed above, the present application is not limited thereto. Various changes and modifications may be effected by one skilled in the art without departing from the spirit and scope of the application, and the scope of protection is defined by the claims.
Although the present application is disclosed above, the present application is not limited thereto. Various changes and modifications may be effected therein by one of ordinary skill in the pertinent art without departing from the scope or spirit of the present disclosure, and it is intended that the scope of the present disclosure be defined by the appended claims.

Claims (11)

1. A live broadcast method of a virtual character is characterized in that the method comprises the following steps:
acquiring a first user image, wherein the first user image comprises an image of a first user;
performing redirection processing on the state information of the first user to generate first driving data for driving a first virtual role, wherein the state information of the first user is obtained according to the first user image;
acquiring a trigger instruction, wherein the trigger instruction is generated by triggering of external operation;
determining drive data for generating a live video of the first virtual character according to the first drive data, the second drive data and/or the third drive data;
the second driving data is driving data which is obtained according to the trigger instruction and is used for driving the first virtual role, and the third driving data is driving data which is obtained according to the trigger instruction and is used for driving elements in a virtual live broadcast scene.
2. The live broadcasting method of the virtual character according to claim 1, wherein the external operation is any one of the following:
the method comprises the following steps of interactive operation sent by audiences, operation aiming at a first game role and operation for indicating the first virtual role and a second virtual role to interact;
the first game role is a game role related to the first virtual role, and the second virtual role is a virtual role driven by a second user.
3. The live method of a virtual character according to claim 1, wherein before determining the drive data for generating the live video of the first virtual character from the first drive data, the second drive data and/or the third drive data, the method further comprises:
acquiring audio data of a first user;
performing tone conversion processing on the audio data to obtain converted audio data, wherein the tone of the converted audio data is a target tone, and the target tone depends on the first virtual character;
and generating the live video according to the converted audio data and the first driving data.
4. The live method of a virtual character according to claim 1, wherein before determining the drive data for generating the live video of the first virtual character from the first drive data, the second drive data and/or the third drive data, the method further comprises:
and selecting second driving data and/or third driving data associated with the triggering instruction from a prefabricated database according to the triggering instruction.
5. The live broadcasting method of the virtual character according to claim 4, wherein the trigger instruction is obtained within a preset time window, and selecting the second driving data and/or the third driving data associated with the trigger instruction from a pre-manufactured database according to the trigger instruction comprises:
and judging whether the same trigger instruction is acquired in the time window, and if not, selecting second driving data and/or third driving data associated with the trigger instruction from a prefabricated database according to the trigger instruction.
6. The live method of a virtual character according to claim 1, wherein the third driving data comprises a movement track of a virtual prop, and determining, according to the first driving data, the second driving data, and/or the third driving data, driving data for generating a live video of the first virtual character comprises:
when the distance between the position of the virtual prop and the first virtual character is smaller than or equal to a first preset threshold value, triggering to acquire fourth driving data;
determining drive data for generating the live video according to the first drive data and the fourth drive data.
7. The live method of a virtual character according to claim 1, further comprising:
acquiring fifth driving data, wherein the fifth driving data is driving data which is generated by redirecting the state information of the second user and is used for driving the second virtual role;
and displaying the second virtual role in the virtual live scene of the first virtual role according to the fifth driving data.
8. The live method of the virtual character according to claim 7, wherein the triggering instruction is further configured to trigger the second virtual character to respond to the external operation, and determining, according to the first driving data, the second driving data, and/or the third driving data, driving data for generating a live video of the first virtual character comprises:
obtaining the driving data for generating the live video according to the second driving data and the sixth driving data;
wherein the sixth driving data is driving data for driving the second avatar generated according to the trigger instruction.
9. A live device of a virtual character, the device comprising:
the first acquisition module is used for acquiring a first user image, and the first user image comprises an image of a first user;
the first generation module is used for carrying out redirection processing on the state information of the first user and generating first driving data for driving a first virtual role, wherein the state information of the first user is obtained according to the first user image;
the second acquisition module is used for acquiring a trigger instruction, and the trigger instruction is generated by triggering of external operation;
the second generation module is used for determining driving data used for generating the live video of the first virtual role according to the first driving data, the second driving data and/or the third driving data;
the second driving data are driving data which are obtained according to the trigger instruction and are used for driving the first virtual role, and the third driving data are driving data which are obtained according to the trigger instruction and are used for driving elements in a virtual live broadcast scene.
10. A computer storage medium having a computer program stored thereon, the computer program, when executed by a processor, performing the steps of the live method of a virtual character of any of claims 1 to 8.
11. A terminal comprising a memory and a processor, the memory having stored thereon a computer program operable on the processor, wherein the processor, when executing the computer program, performs the steps of the live method of a virtual character of any of claims 1 to 8.
CN202211312443.5A 2022-10-25 2022-10-25 Live broadcast method and device of virtual role, computer storage medium and terminal Pending CN115631270A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211312443.5A CN115631270A (en) 2022-10-25 2022-10-25 Live broadcast method and device of virtual role, computer storage medium and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211312443.5A CN115631270A (en) 2022-10-25 2022-10-25 Live broadcast method and device of virtual role, computer storage medium and terminal

Publications (1)

Publication Number Publication Date
CN115631270A true CN115631270A (en) 2023-01-20

Family

ID=84906378

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211312443.5A Pending CN115631270A (en) 2022-10-25 2022-10-25 Live broadcast method and device of virtual role, computer storage medium and terminal

Country Status (1)

Country Link
CN (1) CN115631270A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116527956A (en) * 2023-07-03 2023-08-01 世优(北京)科技有限公司 Virtual object live broadcast method, device and system based on target event triggering
CN116527956B (en) * 2023-07-03 2023-08-22 世优(北京)科技有限公司 Virtual object live broadcast method, device and system based on target event triggering
CN117319628A (en) * 2023-09-18 2023-12-29 四开花园网络科技(广州)有限公司 Real-time interactive naked eye 3D virtual scene system supporting outdoor LED screen


Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination