CN111741368B - Interactive video display and generation method, device, equipment and storage medium - Google Patents
- Publication number
- CN111741368B (application number CN202010102557.1A)
- Authority
- CN
- China
- Prior art keywords
- user
- video
- interactive video
- data
- generating
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/478—Supplemental services, e.g. displaying phone caller identification, shopping application
- H04N21/47815—Electronic shopping
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/239—Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests
- H04N21/2393—Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests involving handling client requests
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/431—Generation of visual interfaces for content selection or interaction; Content or additional data rendering
- H04N21/4312—Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/435—Processing of additional data, e.g. decrypting of additional data, reconstructing software from modules extracted from the transport stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/442—Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
- H04N21/44213—Monitoring of end-user related data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/47—End-user applications
- H04N21/472—End-user interface for requesting content, additional data or services; End-user interface for interacting with content, e.g. for content reservation or setting reminders, for requesting event notification, for manipulating displayed content
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Human Computer Interaction (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Social Psychology (AREA)
- Computer Networks & Wireless Communication (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The embodiment of the invention discloses a method, a device, equipment and a storage medium for displaying and generating an interactive video. The method for displaying the interactive video is applied to a client and comprises the following steps: when a user interaction instruction is detected, generating a video acquisition request, wherein the video acquisition request comprises the article information of a target article and the user interaction instruction; sending the video acquisition request to a server, and receiving an interactive video of the target article generated by the server; and displaying the interactive video in a set area. The video frame picture of the interactive video comprises a virtual assistant, the interactive video is generated by the server according to the sound data and the model data of the virtual assistant, and the sound data and the model data are generated by the server according to the article information and the user interaction instruction. According to the technical scheme of the embodiment of the invention, the cloud generates the interactive video including the virtual assistant according to the user interaction instruction and the article information and uses it for interaction, so that the interest of voice shopping is improved.
Description
Technical Field
The embodiment of the invention relates to the technical field of videos, in particular to a method, a device, equipment and a storage medium for displaying and generating an interactive video.
Background
With the popularization of internet online shopping, consumers' enthusiasm for shopping has intensified competition among practitioners, and the new mode of selling goods through live streaming has quickly become popular.
When switching to a voice-interactive smart device, the voice shopping experience mainly revolves around a self-service information presentation mode, in which the information triggers of touch clicks or text input are changed to voice input. That is, where the user originally had to type in a web address or click into an application App manually and manually click an "add item" button to put goods into the shopping cart, the user can now say "I want to shop" to bring up the information flow and say "add to shopping cart" to add items.
However, in the process of implementing the present invention, the inventor found that the prior art has at least the following problems: introducing article information purely by voice is time-consuming, not interesting enough, offers too little user interaction, and results in poor user experience and acceptance.
Disclosure of Invention
The invention provides a method, a device, equipment and a storage medium for displaying and generating an interactive video, which are used for improving the interactivity and interestingness of voice shopping.
In a first aspect, an embodiment of the present invention provides a method for displaying an interactive video, which is applied to a client, and the method includes:
when a user interaction instruction is detected, generating a video acquisition request, wherein the video acquisition request comprises the article information of a target article and the user interaction instruction;
sending the video acquisition request to a server, and receiving an interactive video of a target object generated by the server;
displaying the interactive video in a set area;
the video frame picture of the interactive video comprises a virtual assistant, the interactive video is generated by the server according to sound data and model data of the virtual assistant, and the sound data and the model data are generated by the server according to the article information and the user interaction instruction.
In a second aspect, an embodiment of the present invention further provides a method for generating an interactive video, which is applied to a server, and the method includes:
receiving a video acquisition request sent by a client, wherein the video acquisition request comprises article information of a target article and a user interaction instruction;
generating model data and sound data of the virtual assistant according to the article information and the user interaction instruction;
generating an interactive video of the target object according to the model data and the sound data, wherein a video frame picture of the interactive video comprises a virtual assistant;
and sending the interactive video to a client so that the client plays the interactive video.
In a third aspect, an embodiment of the present invention further provides an interactive video display apparatus, where the apparatus includes:
the video acquisition request generating module is used for generating a video acquisition request when a user interaction instruction is detected, wherein the video acquisition request comprises the article information of a target article and the user interaction instruction;
the interactive video receiving module is used for sending the video acquisition request to the server and receiving an interactive video of the target object generated by the server;
the interactive video display module is used for displaying the interactive video in a set area;
the video frame picture of the interactive video comprises a virtual assistant, the interactive video is generated by the server according to sound data and model data of the virtual assistant, and the sound data and the model data are generated by the server according to the article information and the user interaction instruction.
In a fourth aspect, an embodiment of the present invention further provides an apparatus for generating an interactive video, where the apparatus includes:
the system comprises a video acquisition request receiving module, a video acquisition request sending module and a video acquisition request sending module, wherein the video acquisition request comprises the article information of a target article and a user interaction instruction;
the sound and model data generation module is used for generating model data and sound data of the virtual assistant according to the article information and the user interaction instruction;
the interactive video generation module is used for generating an interactive video of the target object according to the model data and the sound data, wherein a video frame picture of the interactive video comprises a virtual assistant;
and the interactive video sending module is used for sending the interactive video to the client so as to enable the client to play the interactive video.
In a fifth aspect, an embodiment of the present invention further provides a terminal device, where the terminal device includes:
one or more processors;
a memory for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors implement the method for generating the interactive video provided by any embodiment of the present invention, and/or the method for displaying the interactive video provided by any embodiment of the present invention.
In a sixth aspect, the present invention further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the interactive video generation method provided in any embodiment of the present invention, and/or the interactive video presentation method provided in any embodiment of the present invention.
According to the technical scheme of the embodiment of the invention, the server generates the interactive video according to the user interaction instruction and the article information, and sends the interactive video to the client for display, so that interaction with the user is carried out through the interactive video, which improves the interest and interactivity of the user's shopping; meanwhile, the interactive video also comprises a virtual assistant, and the sound data and the model data of the virtual assistant are generated by the server in real time according to the user interaction instruction and the article information, which improves the diversity and adaptability of the virtual assistant, further increases the interest of the interactive video, and improves the user's shopping experience.
Drawings
Fig. 1 is a flowchart of a method for displaying an interactive video according to a first embodiment of the present invention;
fig. 2 is a flowchart of a method for displaying an interactive video according to a second embodiment of the present invention;
fig. 3 is a flowchart of a method for generating an interactive video according to a third embodiment of the present invention;
fig. 4 is a flowchart of a method for generating an interactive video according to a fourth embodiment of the present invention;
fig. 5 is a flowchart of a method for generating an interactive video according to a fifth embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a display apparatus for interactive videos according to a sixth embodiment of the present invention;
fig. 7 is a schematic structural diagram of an interactive video generation apparatus in a seventh embodiment of the present invention;
fig. 8 is a schematic structural diagram of a terminal device in an eighth embodiment of the present invention;
fig. 9 is a schematic structural diagram of an interactive system in the ninth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not to be construed as limiting the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a method for displaying an interactive video according to an embodiment of the present invention, where the method is applied to a client, and this embodiment is applicable to a situation where a user performs voice shopping, and the method may be executed by a display apparatus for interactive video, as shown in fig. 1, where the method specifically includes the following steps:
and step 110, when a user interaction instruction is detected, generating a video acquisition request.
The video acquisition request comprises the item information of the target item and a user interaction instruction. The user interaction instruction refers to an instruction sent by a user for interacting with the client, and the instruction can be a voice instruction, a gesture instruction, a touch instruction, a text instruction and the like. The video acquisition request refers to a request initiated by a client to a server for acquiring an interactive video. The client may be a shopping guide display screen, such as a shopping guide display screen arranged in a shopping mall or a supermarket, or a mobile terminal of a user, or other equipment arranged facing a customer. The target object can be any object, such as a commodity, an exhibit, and the like. The article information may include identification information, parameter information, additional information, etc. of the target article, such as an article identification code, specification, sales volume, a label corresponding to the article, etc.
Specifically, when a user interaction instruction input by a user is detected, a video acquisition request may be generated according to the user interaction instruction. The video acquisition request can be generated according to the user voice interaction instruction when the microphone of the client acquires the user voice interaction instruction of the user.
Specifically, the client is generally configured to display detailed information of one or more target items, where the target items may be currently displayed items or items selected by the user. The target item and the user interaction instruction related to the target item can also be determined according to the user interaction instruction. Namely, the target object and the interaction information can be included in the user interaction instruction. And further, acquiring the article information of the target article according to a pre-designed target article catalog, and generating a video acquisition request according to the article information and the interaction information of the target article, wherein the target article catalog comprises each target article and the article information corresponding to the target article. Further, the target item catalog may be modified as needed.
For example, the user interaction instruction may be a user voice interaction instruction, such as "introduce the XX product" or "is it suitable for people with a yellow skin tone". When the client detects the user interaction instruction input by the user, the user voice interaction instruction is converted into text information and the target article corresponding to the user interaction instruction is obtained; alternatively, the user interaction information and the target article are identified from the text information. The article information of the target article is then obtained according to a pre-stored correspondence between target articles and their article information, such as the target article catalog mentioned above, and a video acquisition request is generated according to the article information of the target article and the user interaction information.
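As a concrete illustration of this step, the following is a minimal client-side sketch in Python. It assumes a hypothetical speech-to-text helper and an in-memory target article catalog; all names, items and fields are invented for illustration and are not part of the patented implementation.

```python
# Hypothetical sketch of step 110: turn a detected voice instruction into a
# video acquisition request. Names and catalog contents are illustrative only.
from dataclasses import dataclass

# Pre-designed target article catalog: article name -> article information.
ITEM_CATALOG = {
    "XX lipstick": {"item_id": "SKU-001", "spec": "3.5g", "tags": ["matte", "warm tone"]},
}

@dataclass
class VideoAcquisitionRequest:
    item_info: dict        # article information of the target article
    user_instruction: str  # recognized user interaction instruction (text)

def build_video_request(voice_audio: bytes, speech_to_text) -> VideoAcquisitionRequest:
    """Convert the voice instruction to text, resolve the target article from
    the catalog, and package both into a video acquisition request."""
    text = speech_to_text(voice_audio)  # e.g. "introduce XX lipstick"
    target = next((name for name in ITEM_CATALOG if name in text), None)
    if target is None:
        raise ValueError("no target article recognized in the instruction")
    return VideoAcquisitionRequest(item_info=ITEM_CATALOG[target], user_instruction=text)
```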
And step 120, sending the video acquisition request to the server, and receiving the interactive video of the target object generated by the server.
The server refers to a terminal providing services for the user, such as a cloud server. The video frame picture of the interactive video comprises a virtual assistant, the interactive video is generated by the server according to the sound data and the model data of the virtual assistant, and the sound data and the model data are generated by the server according to the article information and the user interaction instruction. The virtual assistant refers to an avatar generated by the server for the target article; the avatar involves model data and sound data, wherein the model data is the data used to generate the avatar, and the sound data is the voice data of the virtual assistant related to the article or the user interaction instruction. The sound data may be introduction information of the target article, such as attribute information like the commodity price or comment information on commodity characteristics, or may be response information for interacting with the user, such as conversational utterances or answers to user questions.
Optionally, the sound data is generated by the server according to the user interaction instruction and the item information; the model data is generated by the server according to the sound data and the article information.
Specifically, a first correspondence between the item information of the target item, the user interaction instruction, and the sound data may be stored in advance, and the sound data of the virtual assistant may be generated according to the first correspondence, the user interaction instruction, and the item information. Further, the model data of the virtual assistant may be determined based on the item information of the target item and the sound data.
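The first correspondence can be thought of as a lookup from (article, interaction intent) to a response text that is then synthesized into the virtual assistant's sound data. A hedged sketch, with an invented template table and an assumed text-to-speech engine passed in by the caller:

```python
# Illustrative only: generate the virtual assistant's sound data from the
# first correspondence (article information + user instruction -> response
# text), then synthesize speech. The TTS engine is an assumed dependency.
RESPONSE_TEMPLATES = {
    ("SKU-001", "introduce"): "This lipstick has a matte finish and suits warm skin tones.",
    ("SKU-001", "price"): "The current price is 99 yuan.",
}

def generate_sound_data(item_info: dict, user_instruction: str, tts_engine) -> bytes:
    # Very rough intent resolution for the sake of the example.
    intent = "price" if "price" in user_instruction else "introduce"
    text = RESPONSE_TEMPLATES.get(
        (item_info["item_id"], intent),
        "Sorry, I do not have that information yet.",
    )
    return tts_engine.synthesize(text)  # audio waveform for the virtual assistant
```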
Optionally, the interactive video is generated by the server based on a voice-driven face recognition technology: an expression time sequence of the interactive video is generated according to the audio track time sequence of the sound data, and the interactive video is then generated according to the expression time sequence, the model data and the sound data.
The face recognition technology based on voice driving is mainly used for driving a three-dimensional or two-dimensional face model according to voice to generate an animation effect and achieve synchronization of voice and video. The audio track time sequence is each frame of sound data and the corresponding sequence thereof, and correspondingly, the expression time sequence is each frame of expression parameters of the virtual assistant and the corresponding sequence thereof determined according to the audio track time sequence.
Specifically, a basic image of the virtual assistant is generated according to the model data; based on the voice-driven face technology, the face parameters corresponding to each frame of speech are determined, and the expression time sequence of the interactive video is generated according to the audio track time sequence of the sound data; the interactive video is then generated according to the basic image, the expression time sequence and the audio track time sequence.
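To make the audio-track/expression alignment concrete, here is a minimal sketch that walks the sound data frame by frame and asks a predictor (an assumed callable, not a specific library) for the face parameters of each frame; the renderer can then pair one entry of the resulting expression time sequence with each video frame so that sound and picture stay synchronized.

```python
# Sketch of deriving an expression time sequence from the audio track time
# sequence. predict_face_params is an assumed model that maps an audio frame
# to expression parameters (e.g. mouth openness, eyebrow height, blink state).
import numpy as np

def expression_time_sequence(audio: np.ndarray, sample_rate: int,
                             predict_face_params, fps: int = 25) -> list:
    frame_len = sample_rate // fps          # audio samples per video frame
    sequence = []
    for start in range(0, len(audio) - frame_len + 1, frame_len):
        frame = audio[start:start + frame_len]
        sequence.append(predict_face_params(frame))
    return sequence                          # one expression entry per video frame
```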
And step 130, displaying the interactive video in a set area.
The set area may be any area on the client screen, and may be a system default area or a user-defined area. Specifically, it may be the left half or upper half of the client screen, or another position, or the video may be displayed full-screen. When the set area is a partial area, the remaining area can be used for static display of the target article.
According to the technical scheme of the embodiment of the invention, the server generates the interactive video according to the user interaction instruction and the article information, and sends the interactive video to the client for display, so that interaction with the user is carried out through the interactive video, which improves the interest and interactivity of the user's shopping; meanwhile, the interactive video also comprises a virtual assistant, and the sound data and the model data of the virtual assistant are generated by the server in real time according to the user interaction instruction and the article information, which improves the diversity and adaptability of the virtual assistant, further increases the interest of the interactive video, and improves the user's shopping experience.
Example two
Fig. 2 is a flowchart of a method for displaying an interactive video according to a second embodiment of the present invention, where this embodiment is a further refinement and supplement to the previous embodiment, and the method for displaying an interactive video according to this embodiment further includes: receiving a user voice payment instruction and user binding information, and sending the user voice payment instruction and the user binding information to a server; acquiring user response information and sending the user response information to a server; and when a payment verification passing message fed back by the server is received, payment is carried out according to the article information and the user binding information.
As shown in fig. 2, the interactive video display method includes the following steps:
Step 210, when a user interaction instruction is detected, generating a video acquisition request.

The video acquisition request comprises the item information of the target item and a user interaction instruction.
Step 220, sending the video acquisition request to the server, and receiving an interactive video of the target item generated by the server.

The video frame picture of the interactive video comprises a virtual assistant, the interactive video is generated by the server according to sound data and model data of the virtual assistant, and the sound data and the model data are generated by the server according to the article information and the user interaction instruction.
Optionally, the sound data is generated by the server according to the user interaction instruction and the item information, and the model data is generated by the server according to the sound data and the article information. The model data comprises basic image data, accessory data and facial expression data, so that the image of the virtual assistant is determined according to the basic image data and the accessory data, wherein the facial expression data is generated by the server according to the sound data, and the basic image data and the accessory data are generated by the server according to the article information.
Specifically, the basic image data is determined by the server according to the article information, for example, the provider of the target article may select a default or dedicated basic image (generated according to the basic image data and the accessory data) for the target article in advance, and may determine the corresponding basic image data and the accessory data according to the identification code of the target article. The facial expression data need to be in one-to-one correspondence with the sound data, that is, the facial expression of the virtual image during speaking, such as the mouth shape, needs to be determined according to the sound data, and of course, parameters of other parts, such as blinking eyes, eyebrow height and the like, can also be included.
Optionally, the interactive video is generated by the server based on a voice-driven face recognition technology: an expression time sequence of the interactive video is generated according to the audio track time sequence of the sound data, and the interactive video is then generated according to the expression time sequence, the model data and the sound data.
And step 230, displaying the interactive video in a set area.
Optionally, the method for displaying an interactive video further includes:
and when the user interaction instruction is a set instruction, or during the period of receiving the user interaction instruction, or when the interactive video is played completely, generating a default video acquisition request so as to acquire the default interactive video of the target commodity from a server according to the default video request.
The set instruction may be a pause instruction, a stop instruction, or an experience instruction. Of course, the set instruction may also be another instruction. The set instruction can be a system default or user-defined, and can be modified. The default video acquisition request is initiated by the client to the server, and acquires a default interactive video comprising a default image of the virtual assistant. Generally, in the default interactive video, the virtual assistant maintains a fixed image, or repeats a set action at a set period.
Illustratively, if a user sends a "stop playing" voice instruction to the client during the playing of the interactive video, the client sends a default video acquisition request to the server according to the voice instruction, and plays or displays the default interactive video in a set area after acquiring the default interactive video from the server.
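The fallback condition can be summarized as below; this is only a sketch of the decision logic described above, with an invented set of example "set instructions".

```python
# When to request the default interactive video: on a set instruction, while
# waiting for a new user instruction, or when playback has finished.
from typing import Optional

SET_INSTRUCTIONS = {"pause", "stop playing", "experience"}  # example values only

def should_request_default_video(instruction: Optional[str],
                                 awaiting_user_input: bool,
                                 playback_finished: bool) -> bool:
    return (instruction in SET_INSTRUCTIONS) or awaiting_user_input or playback_finished
```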
Optionally, the method for displaying an interactive video further includes:
when the received user interaction instruction is a user experience instruction, starting a camera, wherein the user experience instruction belongs to a set instruction; acquiring user image data and a user operation instruction of the user, and sending the user image data and the user operation instruction to a server so that the server generates an article experience video of the user according to the article information and the user operation instruction; and receiving the item experience video returned by the server, displaying the default interactive video in a virtual assistant area of the interactive video, and displaying the item experience video in a user experience area of the interactive video.
The user experience instruction can be an experience instruction initiated by the user, such as try-on or trial use. The user image data is generated according to the user pictures or video acquired by the camera, and comprises appearance image data of the user, such as hair style, skin color and body type. The user operation instruction refers to an instruction performed by the user on the display picture of the client after the camera is turned on, such as a color selection instruction or an enlargement or reduction instruction. The item experience video is a video in which the experience effect of the target item is superimposed on the live user video collected by the camera.
Exemplarily, taking a lipstick as the target item: after the user sends an experience instruction, the client starts the camera. Guide information can be displayed on the client screen to indicate that the image of the user's head should be located in a designated area of the screen. After the user is positioned, the user sends an experience-related operation instruction, for example selecting or changing the color number of the lipstick. The camera obtains the user image data (head image data) and the user operation instruction (the selected color number); the server identifies the area where the user's mouth is located according to the user image data, generates an item (lipstick) experience video according to the selected color number, and superimposes it on the user video acquired by the camera to generate a virtual experience video of the user, thereby providing the effect of the user experiencing the target item and improving user experience.
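A rough sketch of the compositing idea in the lipstick example, assuming the mouth region has already been located (the mask is provided by the caller) and treating frames as plain RGB arrays; no specific computer-vision library is implied.

```python
# Superimpose the selected lipstick color onto the mouth region of each live
# camera frame to produce the item experience video. Purely illustrative.
import numpy as np

def overlay_lipstick(frame: np.ndarray, mouth_mask: np.ndarray,
                     color_rgb: tuple, alpha: float = 0.6) -> np.ndarray:
    """frame: HxWx3 uint8; mouth_mask: HxW bool; color_rgb: selected color number."""
    out = frame.astype(np.float32)
    tint = np.array(color_rgb, dtype=np.float32)
    out[mouth_mask] = (1 - alpha) * out[mouth_mask] + alpha * tint
    return out.astype(np.uint8)

def item_experience_video(user_frames, mouth_masks, color_rgb):
    # Blend the target item's effect into every frame of the user's live video.
    return [overlay_lipstick(f, m, color_rgb) for f, m in zip(user_frames, mouth_masks)]
```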
Step 240, receiving a user voice payment instruction and user binding information.

The voice payment instruction is a voice instruction sent by the user for payment. The user binding information can be information such as the user's bank card, Alipay or WeChat account.
And step 250, sending the user voice payment instruction and the user binding information to a server, so that the server verifies the voiceprint data of the user according to the user voice payment instruction and generates random response information.
Voiceprint data is a biometric feature that does not involve sensitive private content; it can be used for identity verification while protecting the user's privacy, and makes payment convenient and fast.
Illustratively, a user sends a payment instruction (user voice payment instruction), the client instructs the user to input binding information, and the user inputs a binding bank card 123, then the client sends the user voice payment instruction and the binding information input by the user to the server, the server determines voiceprint data of the user according to the user voice payment instruction, and verifies the voiceprint data according to the voiceprint data preset by the user, and after the verification is successful, the subsequent steps are performed.
The random response information is a question that the user needs to respond to, intended to improve payment security; it can be an arithmetic question that the user must answer, a Chinese character to be read aloud, or any other question.
And step 260, acquiring the user response information and sending the user response information to the server.
Specifically, the server needs to verify the response information of the user, including voiceprint verification and content verification of the response information, so as to ensure the security of the user payment.
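The double verification in this payment flow might look roughly like the following on the server side; the voiceprint embedding extractor and the speech recognizer are assumed components, and the challenge here is a simple arithmetic question, all for illustration only.

```python
# Hedged sketch of the server-side checks: voiceprint verification plus a
# random challenge whose spoken answer is verified for both content and voice.
import random
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_voiceprint(audio, enrolled_embedding, extract_embedding, threshold=0.8) -> bool:
    return cosine(extract_embedding(audio), enrolled_embedding) >= threshold

def issue_random_challenge():
    a, b = random.randint(1, 9), random.randint(1, 9)
    return f"Please read out the result of {a} plus {b}", str(a + b)

def payment_verification_passed(payment_audio, response_audio, enrolled_embedding,
                                extract_embedding, speech_to_text, expected_answer) -> bool:
    voice_ok = (verify_voiceprint(payment_audio, enrolled_embedding, extract_embedding)
                and verify_voiceprint(response_audio, enrolled_embedding, extract_embedding))
    # A production system would normalize the transcript (e.g. "eight" vs "8").
    content_ok = expected_answer in speech_to_text(response_audio)
    return voice_ok and content_ok
```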
And 270, when the payment verification passing message fed back by the server is received, paying according to the article information and the user binding information.
According to the technical scheme of the embodiment of the invention, the interactive video containing the virtual assistant of the target article is generated according to the user's interaction instruction, so that when the user browses the target article, interaction is carried out in a manner similar to a live video broadcast, which improves the interest and immersion of voice shopping. The model data of the virtual assistant comprises a basic image, accessories and facial expressions; the facial expressions are determined by the sound data, which ensures synchronization of the sound and picture of the video, and the basic image and accessories are determined by the target article and the user instruction, which improves the richness and differentiation of the virtual assistant. When the user is inputting an instruction or the video has finished playing, the virtual assistant is displayed with a default image, which increases the diversity of the virtual assistant's states and further improves the interest of the interactive video. Meanwhile, a user experience function is provided to meet the user's trial requirements and give the user a more comprehensive understanding of the target article. When the user pays, double verification of voiceprint verification and random response information is adopted, which improves payment convenience on the premise of ensuring payment security.
EXAMPLE III
Fig. 3 is a flowchart of a method for generating an interactive video according to a third embodiment of the present invention, where the method is applied to a server, and this embodiment is applicable to a case where a user purchases a product by voice, and the method may be executed by a device for generating an interactive video, as shown in fig. 3, where the method specifically includes the following steps:
and step 310, receiving a video acquisition request sent by a client.
The video acquisition request comprises the item information of the target item and a user interaction instruction.
And step 320, generating model data and sound data of the virtual assistant according to the article information and the user interaction instruction.
Optionally, the generating model data and sound data of the virtual assistant according to the item information and the user interaction instruction includes:
generating sound data of the virtual assistant according to the user interaction instruction and the item information; and generating model data of the virtual assistant according to the sound data and the item information.
Step 330, generating an interactive video of the target item according to the model data and the sound data, wherein a video frame picture of the interactive video comprises a virtual assistant.

Optionally, the generating an interactive video of the target item according to the model data and the sound data includes:
generating an expression time sequence of the interactive video according to an audio track time sequence of the sound data based on a voice-driven face recognition technology; and generating an interactive video of the target object according to the expression time sequence, the model data and the sound data.
And step 340, sending the interactive video to the client so that the client plays the interactive video.
Optionally, the method for generating an interactive video further includes:
receiving a user voice payment instruction and user binding information; verifying the voiceprint data of the user according to the user voice payment instruction; generating random response information and acquiring user response information; and when the user response information meets the set conditions, generating a payment verification passing message, and sending the payment verification passing message to the client so that the client can pay according to the article information and the user binding information. When the user pays, double verification of voiceprint verification and random response information is adopted, which improves payment convenience on the premise of ensuring payment security.
According to the technical scheme of the embodiment of the invention, the server generates the interactive video according to the user interaction instruction and the article information, and sends the interactive video to the client for display, so that interaction with the user is carried out through the interactive video, which improves the interest and interactivity of the user's shopping; meanwhile, the interactive video also comprises a virtual assistant, and the sound data and the model data of the virtual assistant are generated by the server in real time according to the user interaction instruction and the article information, which improves the diversity and adaptability of the virtual assistant, further increases the interest of the interactive video, and improves the user's shopping experience.
Example four
Fig. 4 is a flowchart of a method for generating an interactive video according to a fourth embodiment of the present invention, where this embodiment is a further refinement and supplement to the previous embodiment, and the method for generating an interactive video according to this embodiment further includes: receiving user image data and user operation instructions acquired by a camera; generating an article experience video of the user according to the article information of the target article and a user operation instruction; and sending the item experience video to a client so that the client displays the default interactive video in a virtual assistant area of the interactive video and displays the item experience video in a user experience area of the interactive video.
As shown in fig. 4, the method for generating an interactive video includes the following steps:
and step 410, receiving a video acquisition request sent by a client.
The video acquisition request comprises the item information of the target item and a user interaction instruction.
And step 420, generating sound data of the virtual assistant according to the user interaction instruction and the item information.
Step 430, generating model data of the virtual assistant according to the sound data and the item information.

Step 440, generating an interactive video of the target item according to the model data and the sound data.

The video frame picture of the interactive video comprises a virtual assistant.
and 450, generating default image data of the virtual assistant according to the item information, and generating a default interactive video of the virtual assistant according to the default image data when a default video acquisition request is received.
Further, to improve efficiency, the default interactive video and the interactive video may be transmitted based on a CDN (Content Delivery Network) technology.
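As a small, hedged sketch of that idea, the server could upload each generated video to a CDN origin and hand the client a playback URL served from edge nodes; the uploader and domain below are placeholders, not a specific CDN API.

```python
# Publish a generated (default) interactive video through a CDN. Illustrative.
CDN_BASE_URL = "https://cdn.example.com/interactive-videos"  # placeholder domain

def publish_to_cdn(video_bytes: bytes, video_id: str, upload) -> str:
    path = f"{video_id}.mp4"
    upload(path, video_bytes)        # assumed uploader to the CDN origin
    return f"{CDN_BASE_URL}/{path}"  # URL the client plays the video from
```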
And step 470, receiving the user image data and the user operation instruction collected by the camera.
And step 480, generating an item experience video of the user according to the item information of the target item and the user operation instruction.
According to the technical scheme of the embodiment of the invention, the interactive video containing the virtual assistant of the target article is generated according to the user's interaction instruction, so that when the user browses the target article, interaction is carried out in a manner similar to a live video broadcast, which improves the interest and immersion of voice shopping. When the user is inputting an instruction or the video has finished playing, the virtual assistant is displayed with a default image, which increases the diversity of the virtual assistant's states and further improves the interest of the interactive video. Meanwhile, a user experience function is provided to meet the user's trial requirements and give the user a more comprehensive understanding of the target article.
EXAMPLE five
Fig. 5 is a flowchart of a method for generating an interactive video according to a fifth embodiment of the present invention, which is a further refinement of the third embodiment, and as shown in fig. 5, the method for generating an interactive video includes the following steps:
Step 510, receiving a video acquisition request sent by a client.

The video acquisition request comprises the item information of the target item and a user interaction instruction.
And step 520, generating sound data of the virtual assistant according to the user interaction instruction and the item information.
Specifically, the user interaction instruction is converted into a user interaction text, and sound data is determined according to the user interaction text and the article information.
And step 530, generating basic image data and accessory data of the virtual assistant according to the article information.
Specifically, a second correspondence among the article information, the basic image data and the accessory data may be pre-established. For example, the basic image data may be stored in a basic modeling library and the accessory data in an accessory library, and the basic image data and the corresponding accessory data may be determined from the basic modeling library and the accessory library according to the second correspondence and the article information, so as to perform character modeling or assembly of the virtual assistant according to these two kinds of data.
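An illustrative sketch of assembling the virtual assistant from the two libraries via the second correspondence; the library entries and keys are invented for the example.

```python
# Look up the base figure and accessories for an article and assemble the
# avatar's image modeling data. All contents are invented placeholders.
BASE_MODEL_LIBRARY = {
    "base_female_01": {"mesh": "female_01.obj", "skin_tone": "warm"},
}
ACCESSORY_LIBRARY = {
    "earrings_gold": {"mesh": "earrings_gold.obj", "attach_to": "ears"},
}
SECOND_CORRESPONDENCE = {
    "SKU-001": ("base_female_01", ["earrings_gold"]),
}

def build_image_modeling_data(item_id: str) -> dict:
    base_key, accessory_keys = SECOND_CORRESPONDENCE[item_id]
    return {
        "base": BASE_MODEL_LIBRARY[base_key],
        "accessories": [ACCESSORY_LIBRARY[k] for k in accessory_keys],
    }
```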
And 540, generating image modeling data of the virtual assistant according to the basic image data and the accessory data so as to determine the image of the virtual assistant according to the image modeling data.
And step 550, determining facial expression data of the virtual assistant according to the sound data.
Specifically, after the sound data of the virtual assistant is determined, the facial expression data of the virtual assistant is determined based on the voice-driven face recognition technology.
Further, based on a voice-driven face recognition technology, an expression time sequence of the interactive video is generated according to an audio track time sequence of the sound data, and therefore the interactive video of the target object is generated according to the expression time sequence, the model data and the sound data.
And 560, generating an interactive video of the target object according to the facial expression data, the image modeling data and the sound data.
And the virtual assistant is included in a video frame picture of the interactive video.
According to the technical scheme of the embodiment of the invention, an interactive video in which the virtual assistant interacts with the user is generated from the user interaction instruction and the article information, and the image data and accessory data of the virtual assistant are generated and changed in real time according to the commodity information and the user instruction, which improves the diversity of the virtual assistant's image; meanwhile, the facial expression of the virtual assistant is determined from the sound data, which ensures the consistency of sound and picture in the video, improves the quality of the interactive video, and further improves the user's interaction experience.
EXAMPLE six
Fig. 6 is a schematic structural diagram of an interactive video display apparatus according to a sixth embodiment of the present invention, and as shown in fig. 6, the apparatus includes: a video acquisition request generating module 610, an interactive video receiving module 620 and an interactive video presenting module 630.
The video obtaining request generating module 610 is configured to generate a video obtaining request when a user interaction instruction is detected, where the video obtaining request includes item information of a target item and the user interaction instruction; the interactive video receiving module 620 is configured to send the video obtaining request to the server and receive an interactive video of the target item generated by the server; an interactive video display module 630, configured to display the interactive video in a set area; the video frame picture of the interactive video comprises a virtual assistant, the interactive video is generated by the server according to sound data and model data of the virtual assistant, and the sound data and the model data are generated by the server according to the article information and the user interaction instruction.
According to the technical scheme of the embodiment of the invention, the server generates the interactive video according to the user interaction instruction and the article information, and sends the interactive video to the client for display, so that interaction with the user is carried out through the interactive video, which improves the interest and interactivity of the user's shopping; meanwhile, the interactive video also comprises a virtual assistant, and the sound data and the model data of the virtual assistant are generated by the server in real time according to the user interaction instruction and the article information, which improves the diversity and adaptability of the virtual assistant, further increases the interest of the interactive video, and improves the user's shopping experience.
Optionally, the sound data is generated by the server according to the user interaction instruction and the item information; the model data is generated by the server according to the sound data and the article information.
Optionally, the model data includes basic image data, accessory data and facial expression data, so as to determine the image of the virtual assistant according to the basic image data and the accessory data, wherein the facial expression data is generated by the server according to the sound data, and the basic image data and the accessory data are generated by the server according to the article information.
Optionally, the interactive video is generated by the server based on a voice-driven face recognition technology: an expression time sequence of the interactive video is generated according to the audio track time sequence of the sound data, and the interactive video is then generated according to the expression time sequence, the model data and the sound data.
Optionally, the interactive video display device further includes:
and the default video acquisition request generating module is used for generating a default video acquisition request when the user interaction instruction is a set instruction, or during the period of receiving the user interaction instruction, or when the interactive video is completely played, so as to acquire a default interactive video of the target commodity from a server according to the default video request.
Optionally, the interactive video display device further includes:
the user experience module is used for starting the camera when the received user interaction instruction is a user experience instruction, wherein the user experience instruction belongs to a set instruction; acquiring user image data and a user operation instruction of the user, and sending the user image data and the user operation instruction to a server so that the server generates an article experience video of the user according to the article information and the user operation instruction; and receiving the item experience video returned by the server, displaying the default interactive video in a virtual assistant area of the interactive video, and displaying the item experience video in a user experience area of the interactive video.
Optionally, the interactive video display device further includes:
the voice print payment module is used for receiving a user voice payment instruction and user binding information and sending the user voice payment instruction and the user binding information to a server so that the server verifies voice print data of a user according to the user voice payment instruction and generates random response information; acquiring user response information and sending the user response information to a server; and when the payment verification passing message fed back by the server is received, payment is carried out according to the article information and the user binding information.
The interactive video display device provided by the embodiment of the invention can execute the interactive video display method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
EXAMPLE seven
Fig. 7 is a schematic structural diagram of an interactive video generating apparatus according to a seventh embodiment of the present invention, and as shown in fig. 7, the apparatus includes: a video acquisition request receiving module 710, a sound and model data generating module 720, an interactive video generating module 730 and an interactive video transmitting module 740.
The video obtaining request receiving module 710 is configured to receive a video obtaining request sent by a client, where the video obtaining request includes item information of a target item and a user interaction instruction; a sound and model data generation module 720, configured to generate model data and sound data of the virtual assistant according to the item information and the user interaction instruction; an interactive video generating module 730, configured to generate an interactive video of the target item according to the model data and the sound data, where a video frame picture of the interactive video includes a virtual assistant; the interactive video sending module 740 is configured to send the interactive video to the client, so that the client plays the interactive video.
According to the technical scheme of the embodiment of the invention, the server generates the interactive video according to the user interaction instruction and the article information, and sends the interactive video to the client for display, so that interaction with the user is carried out through the interactive video, which improves the interest and interactivity of the user's shopping; meanwhile, the interactive video also comprises a virtual assistant, and the sound data and the model data of the virtual assistant are generated by the server in real time according to the user interaction instruction and the article information, which improves the diversity and adaptability of the virtual assistant, further increases the interest of the interactive video, and improves the user's shopping experience.
Optionally, the sound and model data generating module 720 includes:
the sound data generating unit is used for generating sound data of the virtual assistant according to the user interaction instruction and the article information; and the model data generating unit is used for generating model data of the virtual assistant according to the sound data and the article information.
Optionally, the model data generating unit is specifically configured to:
generating basic image data and accessory data of the virtual assistant according to the article information; generating image modeling data of the virtual assistant according to the basic image data and the accessory data so as to determine the image of the virtual assistant according to the image modeling data; and determining facial expression data of the virtual assistant according to the sound data.
Optionally, the sound and model data generating module 720 is specifically configured to:
generating an expression time sequence of the interactive video according to an audio track time sequence of the sound data based on a voice-driven face recognition technology; and generating an interactive video of the target object according to the expression time sequence, the model data and the sound data.
Optionally, the interactive video generating apparatus further includes:
the default interactive video generation module is used for generating default image data of the virtual assistant according to the article information; and when a default video acquisition request is received, generating a default interactive video of the virtual assistant according to the default image data.
Optionally, the interactive video generating apparatus further includes:
the article experience video generation module is used for receiving user image data and user operation instructions collected by the camera; generating an article experience video of the user according to the article information of the target article and a user operation instruction; and sending the item experience video to a client side so that the client side displays the default interactive video in a virtual assistant area of the interactive video and displays the item experience video in a user experience area of the interactive video.
Optionally, the interactive video generating apparatus further includes:
the voice print payment verification module is used for receiving a user voice payment instruction and user binding information; verifying the voiceprint data of the user according to the user voice payment instruction; generating random response information and acquiring user response information; and when the user response information meets the set conditions, generating payment verification passing information, and sending the payment verification passing information to the client so that the client can pay according to the article information and the user binding information.
The interactive video generation device provided by the embodiment of the present invention can execute the interactive video generation method provided by any embodiment of the present invention, and has functional modules corresponding to the executed method as well as its beneficial effects.
Example eight
Fig. 8 is a schematic structural diagram of a terminal device according to an eighth embodiment of the present invention. As shown in Fig. 8, the terminal device includes a processor 810, a memory 820, an input device 830, and an output device 840. The number of processors 810 in the device may be one or more; one processor 810 is taken as an example in Fig. 8. The processor 810, the memory 820, the input device 830 and the output device 840 in the device may be connected by a bus or in other ways; a bus connection is taken as an example in Fig. 8.
The memory 820, as a computer-readable storage medium, can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the interactive video generation method and/or the interactive video display method in the embodiment of the present invention (for example, the video capture request generation module 610, the interactive video receiving module 620 and the interactive video display module 630 in the interactive video display device, and the video capture request receiving module 710, the sound and model data generation module 720, the interactive video generation module 730 and the interactive video sending module 740 in the interactive video generation device). The processor 810 executes various functional applications and data processing of the device by running the software programs, instructions and modules stored in the memory 820, thereby implementing the above-described interactive video generation method and/or interactive video presentation method.
The memory 820 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created according to the use of the terminal, and the like. Further, the memory 820 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, the memory 820 may further include memory located remotely from the processor 810, which may be connected to the device/terminal/server through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 830 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function controls of the apparatus. The output device 840 may include a display device such as a display screen.
Example nine
Fig. 9 is a schematic structural diagram of an interactive system according to a ninth embodiment of the present invention. As shown in Fig. 9, the interactive system includes a client 910 and a server 920, and the server 920 is communicatively connected to the client 910.
The client 910 is configured to interact with a user, for example, to receive a user interaction instruction from the user, generate a corresponding request and send it to the server 920, and display related multimedia information such as various interactive videos and pictures. The client 910 may execute the method for displaying an interactive video provided by any embodiment of the present invention. The server 920 is configured to generate, according to the various requests of the client 910, an interactive video related to the user interaction instruction and the target item, and to send the interactive video to the client for playback. The server 920 may execute the interactive video generation method provided by any embodiment of the present invention.
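A self-contained Python sketch of the request/response cycle between client 910 and server 920 follows; the dictionary message format and the in-process call standing in for the real network transport are assumptions of this sketch.

```python
class Server:
    """Stands in for server 920: turns a video acquisition request into an interactive video."""
    def handle(self, request: dict) -> dict:
        item_info = request["item_info"]
        instruction = request["user_interaction_instruction"]
        sound = f"speech about {item_info['name']} for '{instruction}'".encode()
        model = {"avatar": "assistant", "theme": item_info.get("brand_style", "neutral")}
        return {"interactive_video": {"sound": sound, "model": model}}

class Client:
    """Stands in for client 910 (e.g. a shopping guide display screen)."""
    def __init__(self, server: Server):
        self.server = server

    def on_user_interaction(self, instruction: str, item_info: dict) -> None:
        request = {"item_info": item_info, "user_interaction_instruction": instruction}
        response = self.server.handle(request)    # a network round trip in a real deployment
        self.play(response["interactive_video"])

    def play(self, video: dict) -> None:
        print("playing interactive video with model:", video["model"])

Client(Server()).on_user_interaction("tell me more", {"name": "sneaker"})
```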
Example ten
An embodiment of the present invention further provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform a method for displaying an interactive video and/or a method for generating an interactive video, where the method for displaying an interactive video includes:
when a user interaction instruction is detected, generating a video acquisition request, wherein the video acquisition request comprises the article information of a target article and the user interaction instruction;
sending the video acquisition request to a server, and receiving an interactive video of a target object generated by the server;
displaying the interactive video in a set area;
the video frame picture of the interactive video comprises a virtual assistant, the interactive video is generated by the server according to sound data and model data of the virtual assistant, and the sound data and the model data are generated by the server according to the article information and the user interaction instruction.
The interactive video generation method comprises the following steps:
receiving a video acquisition request sent by a client, wherein the video acquisition request comprises article information of a target article and a user interaction instruction;
generating model data and sound data of the virtual assistant according to the article information and the user interaction instruction;
generating an interactive video of the target object according to the model data and the sound data, wherein a video frame picture of the interactive video comprises a virtual assistant;
and sending the interactive video to a client so that the client plays the interactive video.
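As an illustration of the steps recited above, a compact Python sketch of the server-side pipeline is given below; the data shapes (text segments, frame dictionaries) are assumptions of the sketch, not the embodiment's actual representation.

```python
def generate_interactive_video(request: dict) -> dict:
    item_info = request["item_info"]
    instruction = request["user_interaction_instruction"]

    # 1. Sound data from the user interaction instruction and the article information.
    sound_segments = [{"text": f"{instruction}: {item_info['name']}", "start": 0.0, "energy": 0.6}]

    # 2. Model data (basic image, accessories, facial expressions) from the article
    #    information and the sound data.
    model = {
        "base_avatar": {"theme": item_info.get("brand_style", "neutral")},
        "accessories": [{"type": "badge", "text": item_info["name"]}],
        "expressions": ["smile" if s["energy"] > 0.5 else "neutral" for s in sound_segments],
    }

    # 3. Interactive video whose frame pictures contain the virtual assistant.
    frames = [{"assistant": model, "t": s["start"]} for s in sound_segments]
    return {"frames": frames, "sound": sound_segments}   # returned to the client for playback

video = generate_interactive_video(
    {"item_info": {"name": "sneaker", "brand_style": "sporty"},
     "user_interaction_instruction": "introduce this item"})
```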
Of course, the computer-executable instructions contained in the storage medium provided by the embodiment of the present invention are not limited to the method operations described above; they may also perform the interactive video generation method provided by any embodiment of the present invention and/or related operations in the interactive video presentation method.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present invention may be implemented by software plus necessary general-purpose hardware, or by hardware alone, although the former is the preferred implementation in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which can be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory (FLASH), a hard disk or an optical disk of a computer, and which includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the methods according to the embodiments of the present invention.
It should be noted that, in the above embodiments of the interactive video generating device and the interactive video displaying device, the included units and modules are divided only according to functional logic, and the division is not limited thereto as long as the corresponding functions can be implemented; in addition, the specific names of the functional units are only for ease of distinguishing them from each other and are not intended to limit the protection scope of the present invention.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.
Claims (14)
1. A method for displaying an interactive video is applied to a client, and is characterized by comprising the following steps:
when a user interaction instruction is detected, generating a video acquisition request, wherein the video acquisition request comprises the article information of a target article and the user interaction instruction;
sending the video acquisition request to a server, and receiving an interactive video of a target object generated by the server;
displaying the interactive video in a set area;
the video frame picture of the interactive video comprises a virtual assistant, the interactive video is generated by a server according to sound data and model data of the virtual assistant, and the sound data and the model data are generated by the server according to the article information and a user interaction instruction; the model data comprises basic image data, accessory data and facial expression data;
wherein the method further comprises:
when the user interaction instruction is a set instruction, or during the period of receiving the user interaction instruction, or when the interactive video is played completely, generating a default video acquisition request so as to acquire a default interactive video of the target object from a server according to the default video acquisition request;
wherein the method further comprises:
when the received user interaction instruction is a user experience instruction, starting a camera, wherein the user experience instruction belongs to a set instruction;
acquiring user image data and a user operation instruction of the user, and sending the user image data and the user operation instruction to a server so that the server generates an article experience video of the user according to the article information and the user operation instruction;
receiving the item experience video returned by the server, displaying the default interactive video in a virtual assistant area of the interactive video, and displaying the item experience video in a user experience area of the interactive video; wherein, the client is a shopping guide display screen; the target item is a currently displayed item or a user-selected item.
2. The method of claim 1, wherein the voice data is generated by a server according to the user interaction instruction and item information; the model data is generated by the server according to the sound data and the article information.
3. The method according to claim 1, wherein the avatar of the virtual assistant is determined according to the basic avatar data and accessory data, wherein the facial expression data is generated by a server according to the sound data, and the basic avatar data and accessory data are generated by a server according to the item information.
4. The method of claim 1, wherein, based on a voice-driven face recognition technology, an expression time sequence of the interactive video is generated by the server according to an audio track time sequence of the sound data, and the interactive video is generated according to the expression time sequence, the model data and the sound data.
5. The method of claim 1, further comprising:
receiving a user voice payment instruction and user binding information, and sending the user voice payment instruction and the user binding information to a server, so that the server verifies voiceprint data of a user according to the user voice payment instruction and generates random response information;
acquiring user response information and sending the user response information to a server;
and when the payment verification passing message fed back by the server is received, payment is carried out according to the article information and the user binding information.
6. A method for generating an interactive video is applied to a server side, and is characterized in that the method comprises the following steps:
receiving a video acquisition request sent by a client, wherein the video acquisition request comprises article information of a target article and a user interaction instruction;
generating model data and sound data of the virtual assistant according to the item information and the user interaction instruction; wherein the model data comprises basic image data, accessory data and facial expression data;
generating an interactive video of the target object according to the model data and the sound data, wherein a video frame picture of the interactive video comprises a virtual assistant;
sending the interactive video to a client side so that the client side can play the interactive video;
wherein the method further comprises:
generating default image data of the virtual assistant according to the item information;
when a default video acquisition request is received, generating a default interactive video of the virtual assistant according to the default image data;
wherein the method further comprises:
receiving user image data and user operation instructions acquired by a camera;
generating an article experience video of the user according to the article information of the target article and a user operation instruction;
sending the item experience video to a client so that the client displays the default interactive video in a virtual assistant area of the interactive video and displays the item experience video in a user experience area of the interactive video; wherein, the client is a shopping guide display screen; the target item is a currently displayed item or a user-selected item.
7. The method of claim 6, wherein generating model data and sound data for a virtual assistant from the item information and user interaction instructions comprises:
generating sound data of the virtual assistant according to the user interaction instruction and the item information;
and generating model data of the virtual assistant according to the sound data and the item information.
8. The method of claim 7, wherein generating model data for the virtual assistant from the sound data and the item information comprises:
generating basic image data and accessory data of the virtual assistant according to the article information;
generating image modeling data of the virtual assistant according to the basic image data and the accessory data so as to determine the image of the virtual assistant according to the image modeling data;
and determining facial expression data of the virtual assistant according to the sound data.
9. The method of claim 6, wherein generating an interactive video of the target item from the model data and the sound data comprises:
generating an expression time sequence of the interactive video according to an audio track time sequence of the sound data based on a voice-driven face recognition technology;
and generating an interactive video of the target object according to the expression time sequence, the model data and the sound data.
10. The method of claim 6, further comprising:
receiving a user voice payment instruction and user binding information;
verifying the voiceprint data of the user according to the user voice payment instruction;
if the verification is passed, generating random response information and acquiring user response information;
and when the user response information meets the set conditions, generating payment verification passing information, and sending the payment verification passing information to the client so that the client can pay according to the article information and the user binding information.
11. A presentation apparatus for interactive video, wherein the presentation apparatus is applied to a client, the apparatus comprising:
the video acquisition request generating module is used for generating a video acquisition request when a user interaction instruction is detected, wherein the video acquisition request comprises the article information of a target article and the user interaction instruction;
the interactive video receiving module is used for sending the video acquisition request to the server and receiving an interactive video of the target object generated by the server;
the interactive video display module is used for displaying the interactive video in a set area;
the video frame picture of the interactive video comprises a virtual assistant, the interactive video is generated by a server according to sound data and model data of the virtual assistant, and the sound data and the model data are generated by the server according to the article information and a user interaction instruction; the model data comprises basic image data, accessory data and facial expression data;
wherein the presentation apparatus for the interactive video further comprises:
a default video acquisition request generation module, configured to generate a default video acquisition request when the user interaction instruction is a setting instruction, or during receiving the user interaction instruction, or when the interactive video is completely played, so as to acquire a default interactive video of the target item from a server according to the default video acquisition request;
wherein the presentation apparatus for the interactive video further comprises:
the user experience module is used for starting the camera when the received user interaction instruction is a user experience instruction, wherein the user experience instruction belongs to a set instruction; acquiring user image data and a user operation instruction of the user, and sending the user image data and the user operation instruction to a server so that the server generates an article experience video of the user according to the article information and the user operation instruction; receiving the item experience video returned by the server, displaying the default interactive video in a virtual assistant area of the interactive video, and displaying the item experience video in a user experience area of the interactive video; wherein, the client is a shopping guide display screen; the target item is a currently displayed item or a user-selected item.
12. An apparatus for generating an interactive video, the apparatus comprising:
the video acquisition request receiving module is used for receiving a video acquisition request sent by a client, wherein the video acquisition request comprises the article information of a target article and a user interaction instruction;
the sound and model data generation module is used for generating model data and sound data of the virtual assistant according to the article information and the user interaction instruction; the model data comprises basic image data, accessory data and facial expression data;
the interactive video generation module is used for generating an interactive video of the target object according to the model data and the sound data, wherein a video frame picture of the interactive video comprises a virtual assistant;
the interactive video sending module is used for sending the interactive video to a client so that the client can play the interactive video;
the interactive video generation device further comprises:
the default interactive video generation module is used for generating default image data of the virtual assistant according to the article information; when a default video acquisition request is received, generating a default interactive video of the virtual assistant according to the default image data;
the interactive video generation device further comprises:
the article experience video generation module is used for receiving user image data and user operation instructions collected by the camera; generating an article experience video of the user according to the article information of the target article and a user operation instruction; sending the item experience video to a client so that the client displays the default interactive video in a virtual assistant area of the interactive video and displays the item experience video in a user experience area of the interactive video; wherein, the client is a shopping guide display screen; the target item is a currently displayed item or a user-selected item.
13. A terminal device, characterized in that the device comprises:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of presenting an interactive video according to any one of claims 1-5 and/or the method of generating an interactive video according to any one of claims 6-10.
14. A storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform the method of presenting an interactive video according to any one of claims 1-5 and/or the method of generating an interactive video according to any one of claims 6-10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010102557.1A CN111741368B (en) | 2020-02-19 | 2020-02-19 | Interactive video display and generation method, device, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111741368A CN111741368A (en) | 2020-10-02 |
CN111741368B (en) | 2023-04-07 |
Family
ID=72646325
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010102557.1A Active CN111741368B (en) | 2020-02-19 | 2020-02-19 | Interactive video display and generation method, device, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111741368B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112364144B (en) * | 2020-11-26 | 2024-03-01 | 北京汇钧科技有限公司 | Interaction method, device, equipment and computer readable medium |
CN113992929B (en) * | 2021-10-26 | 2024-10-01 | 招商银行股份有限公司 | Virtual digital human interaction method, system, equipment and computer program product |
CN116962790A (en) * | 2023-08-02 | 2023-10-27 | 深圳市辉宏科技有限公司 | Video interaction system, method and storage medium |
CN117314576B (en) * | 2023-10-12 | 2024-04-02 | 南京软迅科技有限公司 | Visual data processing method for intelligent park |
CN118227009A (en) * | 2024-04-11 | 2024-06-21 | 北京达佳互联信息技术有限公司 | Article interaction method and device based on virtual image and electronic equipment |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103049873A (en) * | 2013-01-23 | 2013-04-17 | 贵州宝森科技有限公司 | 3D shopping guide machine system and method |
CN103544636A (en) * | 2013-11-08 | 2014-01-29 | 梁涛 | Interaction method and equipment based on virtual mall |
CN110286756A (en) * | 2019-06-13 | 2019-09-27 | 深圳追一科技有限公司 | Method for processing video frequency, device, system, terminal device and storage medium |
CN110413841A (en) * | 2019-06-13 | 2019-11-05 | 深圳追一科技有限公司 | Polymorphic exchange method, device, system, electronic equipment and storage medium |
WO2019221842A1 (en) * | 2018-05-18 | 2019-11-21 | Carrier Corporation | Interactive system for shopping place and implementation method thereof |
CN110767220A (en) * | 2019-10-16 | 2020-02-07 | 腾讯科技(深圳)有限公司 | Interaction method, device, equipment and storage medium of intelligent voice assistant |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2570076B (en) * | 2016-10-27 | 2022-02-23 | Walmart Apollo Llc | Systems and methods for adjusting the display quality of an avatar during an online or virtual shopping session |
2020-02-19: CN application CN202010102557.1A filed; granted as CN111741368B (status: Active)
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111741368B (en) | Interactive video display and generation method, device, equipment and storage medium | |
WO2021109652A1 (en) | Method and apparatus for giving character virtual gift, device, and storage medium | |
WO2021109650A1 (en) | Virtual gift sending method, apparatus and device, and storage medium | |
CN111246232A (en) | Live broadcast interaction method and device, electronic equipment and storage medium | |
US20200112756A1 (en) | Content data recommendation method and apparatus based on identity verification apparatus, and storage medium | |
WO2019052451A1 (en) | Virtual interaction method and terminal | |
WO2017177766A1 (en) | Virtual reality device control method and apparatus, and virtual reality device and system | |
EP3542256A1 (en) | Machine-based object recognition of video content | |
CN113923462A (en) | Video generation method, live broadcast processing method, video generation device, live broadcast processing device and readable medium | |
CN112516589A (en) | Game commodity interaction method and device in live broadcast, computer equipment and storage medium | |
CN110868635A (en) | Video processing method and device, electronic equipment and storage medium | |
CN112675537B (en) | Game prop interaction method and system in live broadcast | |
WO2021023047A1 (en) | Facial image processing method and device, terminal, and storage medium | |
CN114025188B (en) | Live advertisement display method, system, device, terminal and readable storage medium | |
CN113822972B (en) | Video-based processing method, device and readable medium | |
CN111897483A (en) | Live broadcast interaction processing method, device, equipment and storage medium | |
CN109828660B (en) | Method and device for controlling application operation based on augmented reality | |
CN111260509A (en) | Intelligent ordering service system and method | |
CN113873286A (en) | Live broadcast method and system based on artificial intelligence | |
CN111045510A (en) | Man-machine interaction method and system based on augmented reality | |
JP7300925B2 (en) | Live communication system with characters | |
CN109683711B (en) | Product display method and device | |
CN105760420B (en) | Realize the method and device with multimedia file content interaction | |
CN110689380A (en) | Intelligent vending system | |
WO2021184153A1 (en) | Summary video generation method and device, and server |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |