
CN115484521A - Intelligent sound box, virtual image control method, equipment and medium - Google Patents

Intelligent sound box, virtual image control method, equipment and medium

Info

Publication number
CN115484521A
CN115484521A (application CN202110602966.2A)
Authority
CN
China
Prior art keywords
virtual image
touch operation
sound box
file
aerial
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110602966.2A
Other languages
Chinese (zh)
Inventor
杜兆臣
孟卫明
王彦芳
王月岭
田羽慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Group Holding Co Ltd
Original Assignee
Hisense Group Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Group Holding Co Ltd filed Critical Hisense Group Holding Co Ltd
Priority to CN202110602966.2A
Publication of CN115484521A
Legal status: Pending

Classifications

    • H — ELECTRICITY
    • H04 — ELECTRIC COMMUNICATION TECHNIQUE
    • H04R — LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 — Details of transducers, loudspeakers or microphones
    • H04R1/02 — Casings; Cabinets; Supports therefor; Mountings therein
    • H04R3/00 — Circuits for transducers, loudspeakers or microphones
    • H04R2201/00 — Details of transducers, loudspeakers or microphones covered by H04R1/00 but not provided for in any of its subgroups
    • H04R2201/02 — Details of casings, cabinets or mountings therein for transducers covered by H04R1/02 but not provided for in any of its subgroups
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 — Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 — Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 — Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F3/03 — Arrangements for converting the position or the displacement of a member into a coded form
    • G06F3/041 — Digitisers, e.g. for touch screens or touch pads, characterised by the transducing means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a smart sound box, an avatar control method, a device, and a medium. In the embodiments of the application, when a touch sensor in the smart sound box recognizes a touch operation, it sends the information of the touch operation to the controller of the smart sound box, and the controller controls the avatar to act in response to that information.

Description

Intelligent sound box, virtual image control method, equipment and medium
Technical Field
The application relates to the fields of smart sound boxes and virtual projection, and in particular to a smart sound box, an avatar control method, a device, and a medium.
Background
With the rapid development of science and technology, more and more smart devices have appeared on the market. As one of them, the smart sound box can play music, videos, and so on, and plays an increasingly important role in people's lives. In the prior art, smart sound boxes mainly fall into two types: those without a screen and those with a screen. A sound box with a screen has only an ordinary display that can show different video content, but because that content appears only on an electronic screen, it creates a strong sense of distance and degrades the user experience.
The application provides a smart sound box, which includes:
a touch sensor configured to recognize a touch operation and, if a touch operation is recognized, send the information of the touch operation to the controller;
a controller configured to control an avatar action in response to the information of the touch operation.
The application provides an avatar control method, applied to a smart sound box, comprising the following steps:
if a touch operation is recognized, determining the information of the touch operation;
and receiving the information of the touch operation and controlling the avatar action.
The present application provides an electronic device comprising a processor and a memory, said memory being adapted to store program instructions, said processor being adapted to carry out the steps of the above-mentioned avatar control method when executing a computer program stored in the memory.
The present application provides a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above-described avatar control method.
In the embodiments of the application, when the touch sensor in the smart sound box recognizes a touch operation, the information of the touch operation is sent to the controller of the smart sound box, and the controller controls the avatar action in response to that information.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below cover only some embodiments of the present application, and a person skilled in the art could derive other drawings from them without inventive labor.
Fig. 1 is a detailed structural schematic diagram of an intelligent sound box according to some embodiments of the present application;
FIG. 2a is a schematic view of a multi-color card provided in some embodiments of the present application;
fig. 2b is a schematic illustration of an aerial image of a multi-color card under dim conditions according to some embodiments of the present application;
fig. 3 is a schematic structural diagram of a touch sensor according to some embodiments of the present application;
fig. 4 is a schematic illustration of an avatar aerial imaging provided in some embodiments of the present application;
FIG. 5 is a schematic diagram of a process for controlling an action of an avatar through a click operation according to some embodiments of the present application;
fig. 6 is a schematic diagram illustrating a control process of an avatar under a sliding operation according to some embodiments of the present application;
FIG. 7 is a schematic illustration of a process for voice-driving an avatar provided in some embodiments of the present application;
FIG. 8 is a schematic diagram of an audio file adjustment page provided by some embodiments of the present application;
FIG. 9 is a schematic diagram illustrating relationships between different touch operation types provided by some embodiments of the present application;
FIG. 10 is a process diagram of content interaction for a click operation provided by some embodiments of the present application;
FIG. 11 is a process diagram of content interaction for a swipe operation provided by some embodiments of the present application;
FIG. 12 is a schematic process diagram of an avatar control method according to some embodiments of the present application;
fig. 13 is a schematic structural diagram of an electronic device according to some embodiments of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the protection scope of the present application.
In the embodiments of the application, when the touch sensor in the smart sound box recognizes a touch operation, it sends the information of that touch operation to the controller of the smart sound box, and the controller controls the avatar action in response to that information. Because the smart sound box can control the avatar action after recognizing a touch operation, interaction with the user is achieved and the user experience is improved.
In order to realize interaction with the user and improve the user experience, some embodiments of the application provide a smart sound box, an avatar control method, a device, and a medium.
Fig. 1 is a schematic structural diagram of a smart sound box 100 according to some embodiments of the present application, where the smart sound box 100 includes:
a touch sensor 101, configured to recognize a touch operation and, if a touch operation is recognized, send the information of the touch operation to a controller 102;
the controller 102, configured to control an avatar action in response to the information of the touch operation.
To realize interaction between the user and the smart sound box, in this application the user can interact with the smart sound box through touch operations. Specifically, the smart sound box is provided with a touch sensor that can recognize the user's touch operation. If the avatar is displayed on the display, the sensor is mounted inside the display. The information of the touch operation may be the position information of the touch point. After the touch sensor recognizes the touch operation, it sends the recognized touch-operation information to the controller.
After receiving the information of the touch operation, the controller controls the avatar action according to that information.
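The sensor-to-controller flow just described can be sketched as a minimal event pipeline. Everything here, from the class names to the shape of the touch-information dictionary, is an illustrative assumption; the patent does not specify any API.

```python
# Hypothetical sketch of the sensor-to-controller flow described in the text.
# All names and data shapes are illustrative assumptions.

class Controller:
    """Receives touch information and controls the avatar accordingly."""

    def __init__(self):
        self.last_action = None

    def on_touch(self, touch_info):
        # The touch information carries the position of the touch point.
        x, y = touch_info["position"]
        self.last_action = f"avatar action at ({x}, {y})"
        return self.last_action


class TouchSensor:
    """Recognizes a touch operation and forwards its information to the controller."""

    def __init__(self, controller):
        self.controller = controller

    def recognize(self, x, y):
        # In the real device the sensor detects an interruption of an
        # infrared grid; here we simply forward the coordinates.
        return self.controller.on_touch({"position": (x, y)})


controller = Controller()
sensor = TouchSensor(controller)
result = sensor.recognize(12, 34)
```

The point of the split is the one the text describes: the sensor only recognizes and forwards, while all avatar-control logic lives in the controller.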
In order to realize the interaction with the user through the smart sound box, on the basis of the above embodiments, in this application, the smart sound box 100 further includes:
a display 103, located inside the housing of the smart sound box 100, with the display surface of the display 103 forming a preset included angle with the upper surface of the housing of the smart sound box;
a negative-refraction lens glass 104, located on the upper surface of the housing of the smart sound box 100, which refracts the image shown on the display surface of the display 103 into an aerial virtual image.
In this application, the avatar can be shown on the display of the smart sound box, and a touch operation can touch the avatar on that display. To improve interactivity with the user, the avatar may also be an aerial avatar: the avatar shown on the display of the smart sound box can be projected and imaged in the air, and when a touch operation is performed on the aerial avatar, the information of that touch operation can be recognized by the touch sensor.
To image the avatar in the air, the smart sound box includes a display, used to show the avatar (and of course pictures or videos, etc.), and a negative-refraction lens glass. In this application, the display is placed inside the housing, and its display surface forms a preset included angle with the upper surface of the housing of the smart sound box. To ensure that the image in the display and the aerial image are in a 1:1 ratio, the preset included angle may be 45 degrees; this angle was obtained from experience, and at this angle the aerial imaging effect is better. The negative-refraction lens glass is located on the upper surface of the housing of the smart sound box and refracts the image shown on the display surface of the display into an aerial virtual image.
Because the negative-refraction lens glass is easily affected by illumination and by the display background, the imaging effect may not be obvious. Therefore, test experiments can be carried out in advance for different illumination scenes and different background colors, where the illumination scene may be normal indoor lighting, dim indoor conditions, dark indoor conditions, and so on. In this application, a multi-color card can be used in the test experiment: the aerial imaging effect of the multi-color card is determined under various illumination conditions, and the background color of the display is finally chosen accordingly.
Specifically, the smart sound box as a whole is a cylinder. The display is located inside the housing of the smart sound box and is sized to fit it, and the included angle between the display and the upper surface of the smart sound box is 45 degrees. The negative-refraction lens glass is located on the upper surface of the smart sound box and has a circular structure. To ensure that the image in the display and the aerial image are in a 1:1 ratio, the included angle between the aerial image and the negative-refraction lens glass is 45 degrees, and the size of the negative-refraction lens glass and the size of the display satisfy the following formula:
(The formula appears only as an image in the original filing.) Here L2 is the diameter of the negative-refraction lens glass and L1 is the length of the display of the smart sound box.
In addition, to avoid the problem of the imaging effect not being obvious, a display with a high-definition screen can be selected, and the avatar shown on the display can be a high-definition image or video.
Fig. 2a is a schematic view of a multi-color card provided by some embodiments of the present application, and fig. 2b is a schematic view of aerial imaging of a multi-color card provided by some embodiments of the present application in a dim condition, which will now be described with respect to fig. 2a and 2 b:
the multi-color card has a plurality of different colors including red, yellow, green and the like, and can determine which color has the best imaging effect according to the imaging result under the dim condition, and then the color of the display of the sound box screen is set to be the color with the best imaging effect. Specifically, the color with the best imaging effect is determined according to the experimental result.
So that the touch sensor can sense the information of a touch operation performed on the aerial virtual image, in this application the touch sensor is installed above the negative-refraction lens glass, and the infrared emission surface of the touch sensor is parallel to the aerial imaging surface.
Fig. 3 is a schematic structural diagram of a touch sensor according to some embodiments of the present application, and is now described with reference to fig. 3:
the touch sensor has a red emitting surface, a plurality of holes capable of emitting infrared rays and holes capable of receiving infrared rays are formed in the infrared emitting surface of the touch sensor, the emitting surface of the touch sensor is parallel to the aerial imaging position, the length of the touch sensor is equal to the width of the display, and the touch sensor has positive and negative electrodes and is used for being connected with a main control board of the intelligent sound box.
In order to control the avatar action according to the information of the touch operation, on the basis of the above embodiments, in the present application, the controller 102 is configured to perform:
determining whether the aerial virtual image is touched or not according to the information of the touch operation, and determining the target type of the touch operation;
and if the aerial virtual image is determined to be touched, controlling the aerial virtual image to execute a preset action according to the touch operation of the target type.
Touch operations include several types, such as click, slide, multi-click, and press, and the touched position may be any part of the aerial avatar or a position outside it. To increase the diversity of interaction, different interactions can be performed for different touch types and different touch positions: when the touched position lies on the aerial avatar, the aerial avatar can be controlled to execute different preset actions, while other operations can be performed at positions of the aerial image outside the avatar. Therefore, after receiving the information of the touch operation, the controller of the smart sound box determines, from that information, whether the aerial avatar was touched, and determines the target type of the touch operation.
Determining the target type of the touch operation from its information is prior art and is not described here again.
After the target type of the touch operation is determined, in order to perform the corresponding interaction, the aerial avatar is controlled to execute the preset action according to the touch operation of the target type, where the preset action may be shaking the head, stretching out a hand, and so on.
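The two-step decision above (was the aerial avatar touched, and what is the target type?) can be sketched as a hit test followed by a table lookup. The region bounds and action names below are invented for illustration only.

```python
# Hypothetical sketch of the controller's decision: hit test, then
# dispatch on the target type. Bounds and actions are illustrative.

AVATAR_REGION = (0, 0, 100, 200)  # x_min, y_min, x_max, y_max (assumed)

PRESET_ACTIONS = {                # target type -> preset action (assumed)
    "click": "nod",
    "slide": "dance",
}

def handle_touch(x, y, target_type):
    x_min, y_min, x_max, y_max = AVATAR_REGION
    touched = x_min <= x <= x_max and y_min <= y <= y_max
    if not touched:
        # The touch fell outside the aerial avatar: no preset action.
        return None
    return PRESET_ACTIONS.get(target_type)
```

A touch inside the region triggers the preset action for its type; a touch outside the avatar is left for other operations, as the text describes.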
In the embodiments of the application, when the touch sensor in the smart sound box recognizes a touch operation, it sends the information of that touch operation to the controller of the smart sound box, and the controller controls the avatar action in response to that information. Because the smart sound box can control the avatar action after recognizing a touch operation, interaction with the user is achieved and the user experience is improved.
In order to control the action of the corresponding part of the avatar, on the basis of the above embodiments, in the present application the controller 102 is configured to perform:
if the target type is a click operation, determining the position on the aerial avatar where the click acts, determining the target part to which that position belongs, and controlling the action of that target part of the aerial avatar.
Since touch operations include several types, such as click, slide, multi-click, and press, and in order to improve the variety of interactions with the user, the aerial avatar may be controlled to perform different preset actions for different types of touch operation.
Because a click operation can act at different positions on the aerial avatar, and different positions belong to different parts, different parts can be controlled to execute preset actions in order to improve the diversity of interaction and the user experience. A part may be the head, legs, torso, arms, and so on, of the aerial avatar.
In the present application, after the target type is determined to be a click operation, in order to determine which part of the aerial avatar should act, the position on the avatar where the click acts is determined. After that position is determined, the target part to which it belongs is determined, and the action of that target part of the aerial avatar is controlled.
For example, if the target part is the head, the avatar can be controlled to nod; if the target part is an arm, the avatar can be controlled to wave; if the target part is the torso, the avatar can be controlled to rub its belly; if the target part is a leg, the avatar can be controlled to run.
In order to improve the interactivity between the user and the smart speaker, on the basis of the above embodiments, in this application, the controller 102 is configured to perform:
determining the action position of the clicking operation on the aerial virtual image, determining a target part to which the position belongs, searching first voice information corresponding to the target part, and playing the first voice information.
In the present application, if the target type is a click operation, then in addition to determining the target part from the position where the click acts on the aerial avatar and controlling the action of that part, the first voice information may be played in order to realize interaction with the user through the smart sound box. Specifically, when the first voice information to be played is determined, the position where the click acts on the avatar is determined and the target part to which that position belongs is determined; because the correspondence between each part and the voice information to be played is stored in advance, the first voice information corresponding to the target part can be found and played once the target part is determined.
For example, if the target part is the head, first voice information such as "You have a clever head, just like me!" may be played; if the target part is an arm, first voice information such as "Hardworking hands are the tools that create all wealth!" may be played; if the target part is the torso, first voice information such as "My belly is round because it is full of knowledge!" may be played; if the target part is a leg, first voice information such as "People who love life love sports!" may be played.
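The click handling described above, mapping the click position to a body part and then to a preset action and a voice line, amounts to two table lookups. The part-to-response table below follows the examples in the text; the region bounds and the exact phrasing of the voice lines are illustrative assumptions.

```python
# Hypothetical click handler: position -> part -> (action, voice line).
# Region bounds (vertical bands) and phrasing are illustrative.

PART_REGIONS = {                     # part -> (y_min, y_max), assumed
    "head": (150, 200),
    "arm": (100, 150),
    "torso": (50, 100),
    "leg": (0, 50),
}

CLICK_RESPONSES = {                  # part -> (preset action, voice line)
    "head": ("nod", "You have a clever head, just like me!"),
    "arm": ("wave", "Hardworking hands are the tools that create all wealth!"),
    "torso": ("rub belly", "My belly is round because it is full of knowledge!"),
    "leg": ("run", "People who love life love sports!"),
}

def handle_click(y):
    for part, (y_min, y_max) in PART_REGIONS.items():
        if y_min <= y < y_max:
            return (part,) + CLICK_RESPONSES[part]
    # A click outside every part region is the "invalid area":
    # the avatar keeps its original state.
    return None
```

Keeping the correspondences in pre-stored tables mirrors the text, which says the part-to-voice mapping is preserved in advance.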
Fig. 4 is a schematic view of aerial imaging of an avatar according to some embodiments of the present application, and fig. 5 is a schematic view of a process for controlling an action of the avatar through a click operation according to some embodiments of the present application, which will now be described with reference to fig. 4 and 5:
the method comprises the steps that an avatar forms images in the air, wherein the avatar comprises a head part, legs, a trunk, arms and other parts, when a click operation of a touch operation on the avatar in the air is determined, the intelligent sound box judges a click position, namely, an action position of the click operation on the avatar is determined, and a target part to which the position belongs is determined. If the target part is determined to be the head part, the intelligent sound box controls the virtual image to perform head pointing action, and determines and plays first voice information corresponding to the head part according to the corresponding relation between each part and the audio file to be played, wherein the corresponding relation is stored in advance.
If the target part is determined to be the torso area, the smart sound box controls the avatar to rub its belly, and determines and plays the first voice information corresponding to the torso according to the pre-stored correspondence between each part and the audio file to be played.
If the target part is determined to be an arm area, the smart sound box controls the avatar to wave, and determines and plays the first voice information corresponding to the arm according to the pre-stored correspondence.
If the target part is determined to be a leg area, the smart sound box controls the avatar to run, and determines and plays the first voice information corresponding to the leg according to the pre-stored correspondence.
If the click acts on an invalid area, the avatar is controlled to keep its original state; that is, the smart sound box does not control the avatar to execute any preset action. The invalid area is the area outside the avatar.
In order to improve the interactivity between the user and the smart speaker, on the basis of the above embodiments, in this application, the controller 102 is configured to perform:
if the target type is a sliding operation, determining the direction of the sliding operation, and controlling the virtual image to execute a preset action corresponding to the direction.
If the target type of the touch operation is determined to be a slide operation, the slide direction may be up-down, left-right, and so on. Therefore, to improve the diversity of interaction and the user's experience, different preset actions can be set for different slide directions.
In the application, if the target type is determined to be a slide operation, the direction of the slide is determined, and after the direction is determined, the aerial avatar is controlled to execute the preset action corresponding to that direction.
For example, if the slide direction is up-down, the corresponding preset action may be jumping and somersaulting; if the slide direction is left-right, the corresponding preset action may be dancing and spinning.
On the basis of the above embodiments, in order to improve the interactivity between the user and the avatar, in this embodiment the method further includes:
and searching and playing second voice information corresponding to the direction.
In the present application, if the target type is a slide operation, then in addition to controlling the aerial avatar to execute the preset action corresponding to the slide direction, the second voice information may be played to enhance the interactivity between the user and the avatar. Specifically, when the second voice information to be played is determined, the slide direction is determined from the slide operation; because the correspondence between each direction and the audio file to be played is stored in advance, the second voice information corresponding to the slide direction can be found and played once the direction is determined.
For example, if the slide direction is up-down, second voice information such as "Rolling in the air to show off my skills!" may be played; if the slide direction is left-right, second voice information such as "Life is beautiful, so dance!" may be played.
Fig. 6 is a schematic diagram of a control process of an avatar under a sliding operation according to some embodiments of the present application, and the description will now be made with reference to fig. 6:
When the smart sound box determines that a slide operation is performed on the aerial avatar, it judges the slide direction. If the direction is left-right, the smart sound box controls the avatar to spin and dance, and finds and plays the corresponding second voice information according to the pre-stored correspondence between each direction and the audio file to be played. If the direction is up-down, the smart sound box controls the avatar to somersault, and likewise finds and plays the corresponding second voice information.
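The slide handling can be sketched the same way as the click handling: classify the slide direction from the start and end points, then look up the preset action and the second voice information. The direction names, the classification rule, and the phrasing are illustrative assumptions.

```python
# Hypothetical slide handler: start/end points -> direction -> response.
# Direction names and voice phrasing are illustrative.

SLIDE_RESPONSES = {   # direction -> (preset action, second voice information)
    "up_down": ("somersault", "Rolling in the air to show off my skills!"),
    "left_right": ("spin and dance", "Life is beautiful, so dance!"),
}

def handle_slide(start, end):
    # Classify by the dominant axis of motion.
    dx = abs(end[0] - start[0])
    dy = abs(end[1] - start[1])
    direction = "left_right" if dx >= dy else "up_down"
    return (direction,) + SLIDE_RESPONSES[direction]
```

As with clicks, the direction-to-audio correspondence is kept in a pre-stored table, matching the text's description.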
In order to implement the file playing function, on the basis of the foregoing embodiments, in this application, the controller 102 is configured to execute:
receiving a file playing instruction, where the file playing instruction carries the name of the file to be played and the type of the file;
and controlling, according to the type of the file, the aerial avatar to execute the preset action corresponding to that type, and playing the file with that name.
In the application, if there is a need to play a file, a file playing instruction can be sent to the smart sound box. The type of the file may be a music file or a video file, and in order to increase the interaction between the user and the avatar, different preset actions can be set for different types of file.
In the application, the smart sound box receives a file playing instruction that carries the name of the file to be played and the type of the file. After receiving the instruction, the smart sound box determines the type of the file and determines the name of the file to be played from the instruction.
In the application, once the type of the file is determined, the aerial avatar is controlled to execute the preset action corresponding to that type, and the file identified by the name carried in the file playing instruction is determined and played.
For example, when there is a demand for music playing, after the smart speaker recognizes the file playing instruction, the type of the file carried in the file playing instruction is a music file, and the name of the file to be played carried in the file playing instruction is "blue and white porcelain", the virtual image may be controlled to perform an action of listening to music with headphones, and the music named "blue and white porcelain" may be played. When the video playing requirement exists, after the intelligent sound box identifies the file playing instruction, the type of the file carried in the file playing instruction is the video file, and the name of the file to be played carried in the file playing instruction is 'our country', the virtual image can be controlled to watch television, and the video named 'our country' is played.
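The file-playing dispatch in the example above can be sketched as follows. The dictionary layout and action names are assumptions for illustration only.

```python
# Hypothetical correspondence between a file type and the preset avatar action.
PRESET_ACTIONS = {
    "music": "listen_with_headphones",
    "video": "watch_television",
}

def handle_play_instruction(instruction):
    """Pick the preset avatar action for the carried file type and return it
    together with the name of the file to play."""
    return PRESET_ACTIONS[instruction["type"]], instruction["name"]
```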
Fig. 7 is a schematic diagram of a process of driving the avatar by voice according to some embodiments of the present application, and is now described with reference to Fig. 7:
Before receiving a file playing instruction, the smart speaker checks its state. If no file playing instruction has been received, the smart speaker controls the avatar to keep an eager action, that is, an action of expecting to receive an instruction. If a file playing instruction is received, the smart speaker controls the avatar to perform the action of listening to music or watching video, and plays the corresponding music or video.
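The two-state behaviour of Fig. 7 can be sketched as a tiny state machine: the avatar holds the idle "eager" action until a file playing instruction arrives. State and action names here are assumptions.

```python
# Minimal sketch of the voice-driven avatar states described for Fig. 7.
class AvatarDriver:
    def __init__(self):
        self.action = "eager"  # default idle action: expecting an instruction

    def on_play_instruction(self, file_type):
        # switch from the eager idle action to the matching playback action
        self.action = "listen_to_music" if file_type == "music" else "watch_video"

    def on_playback_finished(self):
        self.action = "eager"  # return to expecting an instruction
```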
In order to adjust the currently played audio file, on the basis of the foregoing embodiments, in this application the controller 102 is configured to perform:
if it is determined that a position other than the aerial avatar is touched and the touch operation is a preset operation, displaying an audio file adjustment page, and adjusting the currently played audio file after receiving an adjustment instruction.
In order to adjust the currently played audio file, the smart speaker may display the audio file adjustment page after receiving an audio file adjustment instruction, where the instruction may be a voice instruction or an instruction sent from an application program.
In order to improve the intelligence of the smart speaker, in this application, if the audio file needs to be adjusted, the user may touch a position in the aerial image other than the avatar. That is, when the smart speaker recognizes that such a position is touched and determines that the touch operation is a preset operation, it displays an audio file adjustment page. The adjustment page is also imaged in the air, and the principle of imaging it in the air is the same as that of imaging the avatar in the air, so it is not described again here. The audio file adjustment page contains a plurality of keys, such as a volume adjustment key, a pause key, and a file switching key. The preset operation may be an up-down sliding operation, a left-right sliding operation, a pressing operation, or the like. In this application, the preset operation is a left-right sliding operation at a position other than the avatar.
Furthermore, since the touch sensor has a positioning error of ±2 mm, in order to avoid failed clicks, the length and width of each key in the audio file adjustment page may be set to be larger than 8 mm.
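The sizing rule above can be checked numerically: with a ±2 mm sensor error, a key whose side exceeds 8 mm keeps a usable hit area even when the reported touch point is off by the worst-case error on both sides. This is a sketch under that assumption.

```python
# Sketch of the key-sizing rule: sensor error is +/-2 mm, minimum side 8 mm.
SENSOR_ERROR_MM = 2.0
MIN_KEY_SIDE_MM = 8.0

def key_side_is_safe(side_mm):
    """True if a square key of this side length tolerates the sensor error."""
    # worst case shrinks the effective key by the error on both sides
    effective_side = side_mm - 2 * SENSOR_ERROR_MM
    return side_mm > MIN_KEY_SIDE_MM and effective_side > 0
```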
Fig. 8 is a schematic diagram of an audio file adjustment page according to some embodiments of the present application, and is described with reference to Fig. 8:
Four keys are displayed on the audio file adjustment page: the uppermost key increases the volume, the lowermost key decreases the volume, the leftmost key pauses the audio file, and the rightmost key switches the audio file. When the smart speaker recognizes that any one of the four keys is touched, it adjusts the audio file being played.
In this application, after the audio file adjustment page is displayed, if the audio file needs to be adjusted, an adjustment instruction may be sent to the smart speaker, where the adjustment instruction may be a volume adjustment instruction, an audio file pause instruction, or an audio file switching instruction. After receiving the adjustment instruction, the smart speaker adjusts the currently played audio file. Specifically, after receiving a volume adjustment instruction, the smart speaker adjusts the volume; after receiving an audio file pause instruction, it pauses the currently played audio file; and after receiving an audio file switching instruction, it switches the currently played audio file.
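The three kinds of adjustment instruction described above can be sketched as a single dispatch function. The player state layout and instruction names are assumptions for illustration.

```python
# Hypothetical dispatch of the volume / pause / switch adjustment instructions.
def adjust_playback(player, instruction):
    """Mutate a simple player-state dict according to the adjustment instruction."""
    if instruction == "volume_up":
        player["volume"] += 1
    elif instruction == "volume_down":
        player["volume"] = max(0, player["volume"] - 1)  # clamp at silence
    elif instruction == "pause":
        player["paused"] = True
    elif instruction == "switch":
        player["track_index"] += 1  # move to the next audio file
    return player
```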
Fig. 9 is a schematic diagram of the relationship between different touch operation types according to some embodiments of the present application, Fig. 10 is a schematic diagram of the content interaction process for a click operation according to some embodiments of the present application, and Fig. 11 is a schematic diagram of the content interaction process for a slide operation according to some embodiments of the present application; these are now described with reference to Fig. 9, Fig. 10, and Fig. 11:
The smart speaker comprises a controller. Specifically, the controller is applied when the smart speaker is driven by voice; that is, after a file playing instruction is received, the controller controls the avatar to play music or watch a video. The controller is also applied in the aerial point-touch interaction process, that is, the process of touch operation based on aerial imaging. Specifically, when the virtual screen is controlled in the point-touch state, that is, when the user clicks the aerially imaged avatar, the touch sensor senses the clicked position and sends it to the controller of the smart speaker, and the controller controls the avatar to perform the corresponding action according to the touched position and plays the corresponding first voice information. When the virtual screen is operated in the sliding state, that is, when the user performs a sliding operation on the aerially imaged avatar, the touch sensor senses a plurality of coherent positions along one direction and sends them to the controller, which determines the direction of the sliding operation, that is, judges the gesture direction, and then controls the avatar to perform the corresponding action according to the direction and plays the corresponding second voice information.
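The click/slide distinction described above can be sketched as follows: the sensor reports a single position for a click and several coherent positions along one direction for a slide. The position representation and thresholds are assumptions.

```python
# Sketch of classifying sensor input into a click or a directional slide.
def classify_touch(positions):
    """positions: list of (x, y) points reported by the touch sensor."""
    if len(positions) == 1:
        return ("click", positions[0])
    (x0, y0), (x1, y1) = positions[0], positions[-1]
    # the dominant axis of movement gives the gesture direction
    direction = "horizontal" if abs(x1 - x0) >= abs(y1 - y0) else "vertical"
    return ("slide", direction)
```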
In order to implement interaction with the user and improve the user experience, the present application provides an avatar control method applied to a smart speaker. Fig. 12 is a schematic diagram of a process of an avatar control method provided in some embodiments of the present application, where the process includes the following steps:
S1201: if a touch operation is recognized, determining information of the touch operation;
S1202: controlling the avatar action according to the information of the touch operation.
In a possible implementation manner, the controlling the avatar action according to the information of the touch operation includes:
determining whether the aerial virtual image is touched or not according to the information of the touch operation, and determining the target type of the touch operation;
and if the aerial virtual image is determined to be touched, controlling the aerial virtual image to execute a preset action according to the touch operation of the target type.
In a possible implementation, the controlling the aerial avatar to perform a preset action according to the touch operation of the target type includes:
if the target type is a click operation, determining the position where the click operation acts on the aerial avatar, determining the target part to which the position belongs, and controlling the aerial avatar to perform an action with the target part.
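The position-to-part lookup in the step above can be sketched with rectangular hit regions. The rectangles and part names here are illustrative assumptions, not values from the patent.

```python
# Hypothetical mapping from a click position to the avatar part it falls in.
PART_REGIONS = {
    "head": (0, 0, 100, 40),    # (x_min, y_min, x_max, y_max)
    "body": (0, 40, 100, 80),
    "feet": (0, 80, 100, 120),
}

def target_part_at(x, y):
    """Return the avatar part containing the point, or None if outside all parts."""
    for part, (x0, y0, x1, y1) in PART_REGIONS.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return part
    return None
```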
In one possible embodiment, the method further comprises:
and determining the action position of the clicking operation on the aerial virtual image, determining a target part to which the position belongs, searching first voice information corresponding to the target part, and playing the first voice information.
In a possible implementation manner, the controlling the aerial avatar to perform a preset action according to the touch operation of the target type includes:
and if the target type is a sliding operation, determining the direction of the sliding operation, and controlling the aerial virtual image to execute a preset action corresponding to the direction.
In one possible embodiment, the method further comprises:
and searching and playing second voice information corresponding to the direction.
In one possible embodiment, the method further comprises:
receiving a file playing instruction, wherein the file playing instruction carries the name of a file to be played and the type of the file;
and controlling the aerial virtual image to execute a preset action corresponding to the type of the file according to the type of the stored file, and playing the file with the name.
In one possible embodiment, the method further comprises:
and if the other positions outside the aerial virtual image are determined to be touched and the touch operation is determined to be a preset operation, displaying an audio file adjusting page, and adjusting the currently played audio file after receiving an adjusting instruction.
The method is applied to the smart speaker, and for the specific process of the smart speaker executing the avatar control method, reference may be made to the foregoing embodiments; repeated content is not described again.
Fig. 13 is a schematic structural diagram of an electronic device according to some embodiments of the present application. On the basis of the foregoing embodiments, an embodiment of the present application further provides an electronic device, as shown in Fig. 13, including: a processor 1301, a communication interface 1302, a memory 1303, and a communication bus 1304, wherein the processor 1301, the communication interface 1302, and the memory 1303 communicate with each other through the communication bus 1304;
the memory 1303 stores therein a computer program that, when executed by the processor 1301, causes the processor 1301 to perform the steps of:
if the touch operation is identified, determining the information of the touch operation;
and controlling the action of the virtual image according to the information of the touch operation.
Further, the processor 1301 is further configured to determine whether the aerial avatar is touched according to the information of the touch operation, and determine a target type of the touch operation; and if the aerial virtual image is determined to be touched, controlling the aerial virtual image to execute a preset action according to the touch operation of the target type.
Further, the processor 1301 is further configured to determine, if the target type is a click operation, an action position of the click operation on the aerial avatar, determine a target portion to which the position belongs, and control the aerial avatar to perform the action on the target portion.
Further, the processor 1301 is further configured to determine an action position of the click operation on the aerial avatar, determine a target portion to which the position belongs, search for first voice information corresponding to the target portion, and play the first voice information.
Further, the processor 1301 is further configured to determine a direction of the sliding operation if the target type is the sliding operation, and control the aerial avatar to execute a preset action corresponding to the direction.
Further, the processor 1301 is further configured to search for second voice information corresponding to the direction and play the second voice information.
Further, the processor 1301 is further configured to receive a file playing instruction, where the file playing instruction carries a name of a file to be played and a type of the file; and controlling the aerial virtual image to execute a preset action corresponding to the type of the file according to the type of the stored file, and playing the file with the name.
Further, the processor 1301 is further configured to display an audio file adjustment page if it is determined that other positions outside the aerial avatar are touched and it is determined that the touch operation is a preset operation, and adjust the currently played audio file after receiving the adjustment instruction.
The communication interface 1302 is used for communication between the electronic device and other devices.
The memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk memory. Optionally, the memory may be at least one storage device located remotely from the aforementioned processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like.
On the basis of the foregoing embodiments, an embodiment of the present application provides a computer-readable storage medium, in which a computer program executable by a processor is stored, and when the program runs on the processor, the processor is caused to execute the following steps:
if the touch operation is identified, determining the information of the touch operation;
and controlling the action of the virtual image according to the information of the touch operation.
In a possible implementation, the controlling the avatar action according to the information of the touch operation includes:
determining whether the aerial avatar is touched according to the information of the touch operation, and determining the target type of the touch operation;
and if the aerial virtual image is determined to be touched, controlling the aerial virtual image to execute a preset action according to the touch operation of the target type.
In a possible implementation, the controlling the aerial avatar to perform a preset action according to the touch operation of the target type includes:
if the target type is click operation, determining the action position of the click operation on the aerial virtual image, determining the target part to which the position belongs, and controlling the aerial virtual image to act on the target part.
In one possible embodiment, the method further comprises:
determining the action position of the clicking operation on the aerial virtual image, determining a target part to which the position belongs, searching first voice information corresponding to the target part, and playing the first voice information.
In a possible implementation manner, the controlling the aerial avatar to perform a preset action according to the touch operation of the target type includes:
if the target type is a sliding operation, determining the direction of the sliding operation, and controlling the aerial virtual image to execute a preset action corresponding to the direction.
In one possible embodiment, the method further comprises:
and searching and playing second voice information corresponding to the direction.
In one possible embodiment, the method further comprises:
receiving a file playing instruction, wherein the file playing instruction carries the name of a file to be played and the type of the file;
and controlling the aerial virtual image to execute a preset action corresponding to the type of the file according to the type of the stored file, and playing the file with the name.
In one possible embodiment, the method further comprises:
and if the other positions outside the aerial virtual image are determined to be touched and the touch operation is determined to be a preset operation, displaying an audio file adjusting page, and adjusting the currently played audio file after receiving an adjusting instruction.
Since the principle by which the computer-readable medium solves the problem is similar to that of the avatar control method, for the steps implemented after the processor executes the computer program in the computer-readable medium, reference may be made to the foregoing embodiments; repeated content is not described again.
In the embodiments of the present application, when the touch sensor in the smart speaker recognizes a touch operation, it sends the information of the touch operation to the controller of the smart speaker, and the controller controls the avatar action in response to the information of the touch operation.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A smart speaker, the smart speaker comprising:
a touch sensor configured to recognize a touch operation and, if a touch operation is recognized, send information of the touch operation to a controller; and
the controller, configured to control an avatar action in response to the information of the touch operation.
2. The smart speaker of claim 1, further comprising:
a display located inside a housing of the smart speaker, wherein a display surface of the display forms a preset included angle with an upper surface of the housing of the smart speaker; and
a negative-refraction lens glass located on the upper surface of the housing of the smart speaker, wherein the negative-refraction lens glass refracts an image displayed on the display surface of the display into an aerial avatar.
3. The smart speaker of claim 2, wherein the controller is configured to perform:
determining whether the aerial avatar is touched according to the information of the touch operation, and determining a target type of the touch operation;
and if it is determined that the aerial avatar is touched, controlling the aerial avatar to execute a preset action according to the touch operation of the target type.
4. The smart speaker of claim 3, wherein the controller is configured to perform:
if the target type is a click operation, determining the position where the click operation acts on the aerial avatar, determining the target part to which the position belongs, and controlling the aerial avatar to perform an action with the target part.
5. The smart speaker of claim 3, wherein the controller is configured to perform:
if the target type is a sliding operation, determining the direction of the sliding operation, and controlling the aerial avatar to execute a preset action corresponding to the direction.
6. The smart speaker of claim 1, wherein the controller is configured to perform:
receiving a file playing instruction, wherein the file playing instruction carries the name of a file to be played and the type of the file;
and controlling the aerial avatar to execute a preset action corresponding to the stored file type, and playing the file with that name.
7. The smart speaker of claim 6, wherein the controller is configured to perform:
if a position other than the aerial avatar is touched and the touch operation is determined to be a preset operation, displaying an audio file adjustment page, and adjusting the currently played audio file after receiving an adjustment instruction.
8. An avatar control method, the method comprising:
if the touch operation is identified, determining the information of the touch operation;
and controlling the action of the virtual image according to the information of the touch operation.
9. An electronic device, characterized in that the electronic device comprises a processor and a memory, the memory being adapted to store program instructions, the processor being adapted to carry out the steps of the avatar control method of claim 8 when executing a computer program stored in the memory.
10. A computer-readable storage medium, characterized in that it stores a computer program which, when being executed by a processor, carries out the steps of the avatar control method of claim 8 above.
CN202110602966.2A 2021-05-31 2021-05-31 Intelligent sound box, virtual image control method, equipment and medium Pending CN115484521A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110602966.2A CN115484521A (en) 2021-05-31 2021-05-31 Intelligent sound box, virtual image control method, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110602966.2A CN115484521A (en) 2021-05-31 2021-05-31 Intelligent sound box, virtual image control method, equipment and medium

Publications (1)

Publication Number Publication Date
CN115484521A true CN115484521A (en) 2022-12-16

Family

ID=84420305

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110602966.2A Pending CN115484521A (en) 2021-05-31 2021-05-31 Intelligent sound box, virtual image control method, equipment and medium

Country Status (1)

Country Link
CN (1) CN115484521A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104679419A (en) * 2013-11-27 2015-06-03 苏州蜗牛数字科技股份有限公司 Touch unlocking mode based on skeleton positioning
CN205491078U (en) * 2016-03-22 2016-08-17 深圳市优思曼科技有限公司 Intelligent sound box in air forms images
CN108573403A (en) * 2018-03-20 2018-09-25 广东康云多维视觉智能科技有限公司 A kind of multidimensional vision purchase guiding system and method
CN109358923A (en) * 2018-08-29 2019-02-19 华为技术有限公司 A kind of rendering method and device of virtual robot image
CN109782435A (en) * 2019-03-26 2019-05-21 浙江棱镜文化传媒有限公司 More scene air imagings and interactive system
CN111343509A (en) * 2020-02-17 2020-06-26 聚好看科技股份有限公司 Action control method of virtual image and display equipment
CN211880554U (en) * 2020-04-19 2020-11-06 郭生文 Holographic aerial imaging device on AI intelligent sound box
CN112596626A (en) * 2020-11-06 2021-04-02 上海风语筑文化科技股份有限公司 Mobile intelligent screen and lamplight interaction system



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination