CN107182021A - The virtual acoustic processing system of dynamic space and processing method in VR TVs - Google Patents
- Publication number
- CN107182021A CN107182021A CN201710331233.3A CN201710331233A CN107182021A CN 107182021 A CN107182021 A CN 107182021A CN 201710331233 A CN201710331233 A CN 201710331233A CN 107182021 A CN107182021 A CN 107182021A
- Authority
- CN
- China
- Prior art keywords
- virtual speaker
- virtual
- transfer function
- signal
- input signal
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H04S3/008 — Systems employing more than two channels, e.g. quadraphonic, in which the audio signals are in digital form, i.e. employing more than two discrete digital channels
- H04S7/304 — Control circuits for electronic adaptation of the sound field to listener position or orientation; tracking of listener position or orientation; for headphones
- H04S2400/11 — Positioning of individual sound objects, e.g. moving airplane, within a sound field
- H04S2420/01 — Enhancing the perception of the sound image or of the spatial distribution using head related transfer functions [HRTFs] or equivalents thereof, e.g. interaural time difference [ITD] or interaural level difference [ILD]
Abstract
The invention discloses a dynamic spatial virtual sound processing system and method for a VR television. The processing system includes a VR television remote controller, a first processor, a calling module and a second processor. The first processor calculates the user's head azimuth angle information from a reference coordinate, azimuth change instruction information and the setting angles of the virtual speakers. The calling module reads the head azimuth angle information and the setting angles of the virtual speakers to obtain the relative azimuth angle between the head and each virtual speaker, then retrieves from a head-related transfer function database the transfer function corresponding to that relative azimuth angle, or matches it to the transfer function of the closest relative azimuth angle in the database. The second processor convolves the input signal with the transfer functions to generate a first channel signal and a second channel signal. The invention outputs audio signals that follow the viewer's movement, thereby improving the viewer's immersive experience.
Description
Technical Field
The invention relates to data processing technology, and in particular to a dynamic spatial virtual sound processing system and processing method in a virtual reality (VR) television.
Background
With the advent of VR television, the traditional way audiences watch television has changed significantly. When a user watches VR television, the audio is typically played back through an audio playback device such as headphones, which reproduces the sound of the virtual scene. The viewer can control the on-screen picture through a remote controller to change the viewing angle, and while watching virtual reality images, the viewer's movements also change with the scene and plot. For example, in a virtual football match a player strikes the ball and runs across the pitch, or in a virtual war drama a gunshot or explosion suddenly sounds behind the viewer, and the viewer's head naturally turns toward it. However, when the viewer moves, the direction of the sound source in the virtual scene has changed relative to the user, while the direction of the sound source played back in the headphones has not changed correspondingly. This greatly undermines the immersion created by the virtual reality and degrades the user experience.
Disclosure of Invention
In view of the above, there is a need to provide a dynamic spatial virtual sound processing system and method in VR television that can improve the user experience.
A dynamic spatial virtual sound processing system in a VR television, comprising:
the VR television remote controller is used for outputting azimuth change instruction information corresponding to a VR picture watched by a user;
the first processor is used for acquiring the azimuth change instruction information output by the VR television remote controller and setting a reference coordinate according to a trigger condition and the azimuth change instruction information; the first processor is also used for calculating the head azimuth angle information of the user according to the reference coordinate, the azimuth change instruction information and the setting angles of the plurality of virtual speakers;
the calling module is used for reading the head azimuth angle information and the setting angles of the plurality of virtual speakers to obtain the relative azimuth angle between the current head and each virtual speaker; the calling module is also used for retrieving, from a head-related transfer function database, the transfer function corresponding to the relative azimuth angle, or matching to the transfer function corresponding to the closest relative azimuth angle in the database; and
the second processor is used for receiving an input signal and performing convolution processing on the input signal according to the transfer functions so as to generate a first channel signal and a second channel signal corresponding to a playing device.
A method of dynamic spatial virtual sound processing in a VR television, comprising:
outputting azimuth change instruction information corresponding to the VR picture watched by a user through a VR television remote controller;
acquiring the azimuth change instruction information output by the VR television remote controller;
setting a reference coordinate according to a trigger condition and the azimuth change instruction information;
calculating the head azimuth angle information of the user according to the reference coordinate, the azimuth change instruction information and the setting angles of the plurality of virtual loudspeakers;
reading the head azimuth angle information and the setting angles of the plurality of virtual speakers to obtain the relative azimuth angle between the current head and each virtual speaker;
retrieving, from a head-related transfer function database, the transfer function corresponding to the relative azimuth angle, or matching to the transfer function corresponding to the closest relative azimuth angle in the database; and
receiving an input signal, performing convolution processing on the input signal according to the transmission function of each virtual loudspeaker, and generating a first channel signal and a second channel signal corresponding to a playing device.
Further, the head azimuth angle information includes a horizontal angle and an elevation angle, and the processing method further includes: at the initial moment when the viewer begins watching the VR television program, initializing the horizontal angle and elevation angle contained in the azimuth change instruction information output by the VR television remote controller.
Further, in one case, the input signal comprises a two-channel stereo audio signal reproduced over a left virtual speaker and a right virtual speaker disposed at the front left and front right of the user, respectively, and the processing method further includes:
the convolution processing of the input signal according to the transfer function of each virtual loudspeaker to obtain a first channel signal comprises the following steps:
performing convolution processing on the input signal according to the transfer function of each virtual loudspeaker to obtain a second path signal comprises:
wherein L is a first path signal, R is a second path signal,representing a convolution operation, l being the left-path signal in the input signal, r being the right-path signal in the input signal, hrirl(θ0θ, φ) corresponds to the right virtual speaker-left ear transfer function, hrirr(θ0-theta, phi) corresponds to the right virtual speaker-right ear transfer function,corresponding to the left virtual speaker-left ear transfer function,corresponding to the transfer function of left virtual speaker-right ear, where0=30°。
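The stereo-to-binaural rendering described above can be sketched in a few lines of numpy. This is an illustrative implementation under assumed conditions: the function name `binaural_stereo` and the unit-impulse stand-in HRIRs are not from the patent, and a real system would use measured HRIRs indexed by the relative angles (360°−θ0−θ, φ) and (θ0−θ, φ).

```python
import numpy as np

def binaural_stereo(l, r, hrir_ll, hrir_lr, hrir_rl, hrir_rr):
    """Render a stereo pair (l, r) to binaural (L, R): convolve each
    channel with its virtual speaker's HRIR for each ear.

    hrir_ll: left speaker -> left ear, i.e. hrir_l(360deg - theta0 - theta, phi)
    hrir_lr: left speaker -> right ear
    hrir_rl: right speaker -> left ear, i.e. hrir_l(theta0 - theta, phi)
    hrir_rr: right speaker -> right ear
    """
    L = np.convolve(l, hrir_ll) + np.convolve(r, hrir_rl)
    R = np.convolve(l, hrir_lr) + np.convolve(r, hrir_rr)
    return L, R

# Toy check: with unit-impulse HRIRs each ear simply receives l + r.
l = np.array([1.0, 0.5, -0.25])
r = np.array([0.0, 1.0, 0.0])
delta = np.array([1.0])  # stand-in for a measured HRIR
L, R = binaural_stereo(l, r, delta, delta, delta, delta)
```

The unit-impulse case makes the structure easy to verify by hand: each ear receives the sum of both speaker feeds, with no spatial filtering.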
In another case, the input signal comprises a multi-channel surround audio signal reproduced over a front left virtual speaker, a front right virtual speaker, a center virtual speaker, a rear left virtual speaker, a rear right virtual speaker and a bass virtual speaker, wherein the front left, front right, center, rear left and rear right virtual speakers are disposed at the front left, front right, directly ahead, rear left and rear right of the user, respectively, and the processing method further includes:
the convolution processing of the input signal according to the transfer function of each virtual loudspeaker to obtain a first channel signal comprises the following steps:
performing convolution processing on the input signal according to the transfer function of each virtual loudspeaker to obtain a second path signal comprises:
wherein L is a first path signal, R is a second path signal,representing a convolution operation, l being a left channel signal in the input signal, r being a right channel signal in the input signal, ls being a left surround sound signal in the input signal, rs being a right surround sound signal in the input signal, c being a center channel signal in the input signal, lfe being a bass channel signal in the input signal,corresponding to the transfer function of the front right virtual speaker-left ear,corresponding to the right front virtual speaker-right ear transfer function,corresponding to the left front virtual speaker-left ear transfer function,corresponding to the left front virtual speaker-right ear transfer function,corresponding to the right rear virtual speaker-left ear transfer function,corresponding to the right rear virtual speaker-right ear transfer function,corresponding to the left rear virtual speaker-left ear transfer function,corresponding to the left rear virtual speaker-right ear transfer function, where0=30°,θs=110°±10°。
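The 5.1-channel formulas can be sketched the same way. Assumed here (not in the patent) are the helper name `binaural_51`, the dictionary layout and the unit-impulse HRIRs; the 0.707 fold-down of the center and LFE channels into the front feeds follows the description above.

```python
import numpy as np

def binaural_51(ch, hrirs):
    """ch: dict with channel arrays under keys 'l','r','c','ls','rs','lfe'.
    hrirs: dict mapping (speaker, ear) -> HRIR, speakers 'fl','fr','rl','rr',
    ears 'L','R'. Center and LFE are scaled by 0.707 and mixed into the
    front-left and front-right speaker feeds before convolution."""
    fl = ch['l'] + 0.707 * ch['c'] + 0.707 * ch['lfe']
    fr = ch['r'] + 0.707 * ch['c'] + 0.707 * ch['lfe']
    feeds = {'fl': fl, 'fr': fr, 'rl': ch['ls'], 'rr': ch['rs']}
    L = sum(np.convolve(feeds[s], hrirs[(s, 'L')]) for s in feeds)
    R = sum(np.convolve(feeds[s], hrirs[(s, 'R')]) for s in feeds)
    return L, R

# Toy check with unit-impulse HRIRs: each ear hears the sum of all feeds.
delta = np.array([1.0])
ch = {'l': np.array([1.0, 0.0]), 'r': np.array([0.0, 1.0]),
      'c': np.array([1.0, 1.0]), 'lfe': np.array([0.0, 0.0]),
      'ls': np.array([0.0, 0.0]), 'rs': np.array([1.0, 0.0])}
hrirs = {(s, e): delta for s in ('fl', 'fr', 'rl', 'rr') for e in ('L', 'R')}
L, R = binaural_51(ch, hrirs)
```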
In the dynamic spatial virtual sound processing system and method of the invention, the relative azimuth angle between the viewer's current head orientation and each virtual speaker is obtained; the transfer function corresponding to that relative azimuth angle is retrieved from a head-related transfer function database, or matched to the transfer function of the closest relative azimuth angle in the database; and the input signal is convolved with the transfer function of each virtual speaker to generate a first channel signal and a second channel signal for a playing device. The system therefore outputs audio signals that follow the viewer's movement, improving the viewer's immersive experience.
Drawings
FIG. 1 is a block diagram of a dynamic spatial virtual sound processing system in a VR television of the present invention;
FIG. 2 is a block diagram of the preferred embodiment of FIG. 1;
FIG. 3 is a schematic diagram of the azimuth angle corresponding to the application of the dynamic spatial virtual sound processing system in a VR television according to the present invention to a stereo audio signal;
FIG. 4 is a schematic diagram of the azimuth angle of the dynamic spatial virtual sound processing system in a VR television according to the present invention applied to a surround sound audio signal;
FIG. 5 is a diagram of a preferred embodiment of the second processor of FIG. 2 for convolution processing of a stereo audio signal;
FIG. 6 is a diagram of a preferred embodiment of the second processor of FIG. 2 for performing convolution processing on a surround audio signal;
FIG. 7 is a flow chart of a preferred embodiment of a method for dynamic spatial virtual sound processing in a VR television in accordance with the present invention.
Description of the main elements
VR television remote controller | 10 |
First processor | 20 |
Calling module | 30 |
Input signal | 40 |
Second processor | 50 |
Playing device | 60 |
Renderer | 200 |
Real-time convolution module | 500 |
Detailed Description
Referring to fig. 1, a preferred embodiment of a dynamic spatial virtual sound processing system in a VR television of the present invention comprises:
a VR television remote controller 10 for outputting azimuth change instruction information corresponding to a VR picture viewed by the user;
the first processor 20 is configured to obtain the azimuth change instruction information output by the VR television remote controller 10 and to set a reference coordinate according to a trigger condition and the azimuth change instruction information; the first processor 20 is further configured to calculate the user's head azimuth angle information according to the reference coordinate, the azimuth change instruction information and the setting angles of the plurality of virtual speakers;
the calling module 30 is configured to read the head azimuth angle information and the setting angles of the plurality of virtual speakers to obtain the relative azimuth angle between the current user's head and each virtual speaker; the calling module 30 is further configured to retrieve, from a head-related transfer function database, the transfer function corresponding to the relative azimuth angle, or to match to the transfer function corresponding to the closest relative azimuth angle in the database; and
the second processor 50 is configured to receive an input signal 40 and to perform convolution processing on the input signal 40 according to the transfer functions to generate a first channel signal and a second channel signal corresponding to a playing device 60. The playing device 60 plays back the audio signal processed by the second processor 50.
In this embodiment, the azimuth change instruction information output by the VR television remote controller 10 includes a step signal representing an orientation change such as left/right/up/down, corresponding to a change in the viewing angle of the VR television picture. The VR television remote controller 10 is generally provided with a sensor that senses the user's motion and converts the motion information into the step signal, which is transmitted to the first processor 20.
Referring to fig. 2, in the preferred embodiment, the first processor 20 includes a renderer 200. The renderer 200 obtains the azimuth change instruction information (such as the step signal) output by the VR television remote controller 10 and sets a reference coordinate according to a trigger condition and the azimuth change instruction information. The renderer 200 then calculates the user's current head azimuth angle information from the reference coordinate, the azimuth change instruction information and the setting angles of the plurality of virtual speakers. The head azimuth angle information comprises a horizontal angle θ and an elevation angle φ: the horizontal angle θ is the horizontal angle between the viewer's current line of sight and the main axis direction of the VR television's main camera, which generally lies in the horizontal plane during shooting; the elevation angle φ is the angle between the viewer's current line of sight and the horizontal plane containing that main axis. The first processor 20 transmits the head azimuth angle information (θ, φ) computed by the renderer 200 to the calling module 30. Further, in this embodiment, at the initial moment the user begins watching the VR television program, the renderer 200 sets the motion information received at that moment as the reference coordinate.
For example, at the initial moment the viewer begins watching the VR television program, the renderer 200 takes the viewer's orientation as directly ahead (i.e., it initializes the azimuth change instruction information output by the VR television remote controller 10). If the renderer 200 then calculates the horizontal angle θ and elevation angle φ of the viewer's current head azimuth angle information as 0° and 0° from the reference coordinate, the azimuth change instruction information and the setting angles of the virtual speakers, the viewer's line of sight is parallel to the main axis direction of the VR television's main camera (i.e., the earth's horizontal plane). In other embodiments, the user may also set the reference coordinate with a function key: when the function key is triggered, the renderer 200 sets the motion information received at that moment as the reference coordinate.
Referring to fig. 2, in this embodiment, the calling module 30 reads the user's current head azimuth angle information (θ, φ) calculated by the renderer 200 and superimposes it on the setting angles of the plurality of virtual speakers to obtain the relative azimuth angle between the current user's head and each virtual speaker.
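The superposition of the head angle onto a speaker's setting angle amounts to a wrapped subtraction on the horizontal plane. A minimal sketch, with an assumed function name:

```python
def relative_azimuth(speaker_angle_deg, head_theta_deg):
    """Relative horizontal angle of a virtual speaker after the head has
    turned by theta, wrapped into [0, 360). For example, the right stereo
    speaker set at theta0 = 30 deg, seen by a head turned 40 deg to the
    right, sits at 350 deg relative to the new line of sight."""
    return (speaker_angle_deg - head_theta_deg) % 360.0
```

The elevation angle φ passes through unchanged, since the speaker setting angles here are defined on the horizontal plane.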
Referring to fig. 2, in this embodiment, the second processor 50 includes a real-time convolution module 500, which receives the input signal 40 and convolves it in real time with the head-related transfer function (HRTF) of each virtual speaker output by the calling module 30 to generate a first channel signal and a second channel signal corresponding to the playing device 60. The real-time convolution module 500 can be implemented by a DSP (digital signal processing) chip.
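A real-time convolution module necessarily processes audio block by block rather than all at once. The sketch below shows a direct-form overlap-add scheme; all names are illustrative, and a production DSP implementation would more likely use FFT-based partitioned convolution for long HRIRs.

```python
import numpy as np

def overlap_add_stream(blocks, hrir):
    """Convolve a stream of equal-length blocks with an HRIR using
    direct-form overlap-add, yielding one output block per input block
    (the final tail beyond the last block is discarded)."""
    n = len(hrir)
    tail = np.zeros(n - 1)
    for block in blocks:
        y = np.convolve(block, hrir)   # length len(block) + n - 1
        y[:n - 1] += tail              # add the overlap from the previous block
        tail = y[len(block):].copy()   # save this block's overlap for the next
        yield y[:len(block)]

# Example: convolving [1,2,3,4] in blocks of two with HRIR [1,1]
# reproduces the first four samples of the full convolution [1,3,5,7,4].
```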
In this embodiment, the playing device 60 is a headphone worn by the viewer for playing back the audio signal processed by the second processor 50. The playing device 60 plays back either a two-channel stereo audio signal or a multi-channel surround audio signal. Two-channel stereo uses a left virtual speaker and a right virtual speaker disposed at the front left and front right of the user, respectively. Multi-channel surround sound may be 5.1-channel surround sound, with a front left virtual speaker, a front right virtual speaker, a center virtual speaker, a rear left virtual speaker, a rear right virtual speaker and a bass virtual speaker (also called a subwoofer); the front left, front right, center, rear left and rear right virtual speakers are disposed at the front left, front right, directly ahead, rear left and rear right of the viewer, respectively.
Referring to fig. 3, for a two-channel stereo audio signal, the horizontal angle between the left virtual speaker l and the front central axis is initially (360° − θ0), and the horizontal angle between the right virtual speaker r and the front central axis is θ0. That is, the setting angle of the left virtual speaker l is (360° − θ0) and the setting angle of the right virtual speaker r is θ0. In this embodiment, θ0 = 30°, per the international stereo standard. The calling module 30 superimposes the user's current head azimuth angle information (θ, φ) on the setting angles (360° − θ0) and θ0, giving relative azimuth angles between the current user's head and the left virtual speaker l and right virtual speaker r of (360° − θ0 − θ, φ) and (θ0 − θ, φ), respectively.
Referring to fig. 4, for the multi-channel (taking 5.1 channels as an example) surround sound audio signal, the horizontal angle between the front left virtual speaker l and the front central axis is initially (360° − θ0), the horizontal angle between the front right virtual speaker r and the front central axis is θ0, the center virtual speaker c is set at 0°, directly in front of the user (not shown), the horizontal angle between the rear left virtual speaker ls and the front central axis is (360° − θs), and the horizontal angle between the rear right virtual speaker rs and the front central axis is θs. In this embodiment, θs = 110° ± 10°, as recommended in the international surround sound standard. That is, the setting angles of the front left, front right, center, rear left and rear right virtual speakers are (360° − θ0), θ0, 0°, (360° − θs) and θs, respectively. Thus, the relative azimuth angles between the current user's head and the front left virtual speaker l, front right virtual speaker r, rear left virtual speaker ls and rear right virtual speaker rs are (360° − θ0 − θ, φ), (θ0 − θ, φ), (360° − θs − θ, φ) and (θs − θ, φ), respectively. In this embodiment, for the center virtual speaker c and the bass virtual speaker lfe, the calling module 30 multiplies their channel signals by 0.707 (i.e., 1/√2) and feeds them into the front left virtual speaker l and front right virtual speaker r, respectively.
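The setting angles and per-speaker relative azimuths for the 5.1 layout can be tabulated in a few lines. The default θs = 110° follows the surround-sound recommendation quoted above; the function name and dictionary keys are assumptions for illustration.

```python
def speaker_relative_azimuths(head_theta, head_phi, theta0=30.0, theta_s=110.0):
    """Return the relative (azimuth, elevation) of each 5.1 virtual speaker
    for the current head orientation, using the setting angles described
    above. All angles in degrees; azimuths wrap into [0, 360)."""
    setting = {'fl': 360.0 - theta0, 'fr': theta0, 'c': 0.0,
               'rl': 360.0 - theta_s, 'rr': theta_s}
    return {name: ((ang - head_theta) % 360.0, head_phi)
            for name, ang in setting.items()}
```

With the head facing forward (θ = 0), this reproduces the nominal layout: front right at 30°, rear left at 250° (i.e., 110° to the left of straight ahead, measured the long way around).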
In this embodiment, the calling module 30 can localize the sound of each virtual speaker. From a head-related transfer function (HRTF) database, the calling module 30 retrieves the transfer function corresponding to the relative azimuth angle, or matches to the transfer function corresponding to the closest relative azimuth angle in the database, and outputs the transfer function of each virtual speaker.
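Measured HRTF databases are sampled on a discrete grid of directions, so the "closest relative azimuth angle" match can be a nearest-neighbour search with azimuth wrap-around. A sketch, with names assumed:

```python
def nearest_hrtf_angle(db_angles, target):
    """Pick the (azimuth, elevation) grid point closest to the requested
    relative angle. Azimuth distance wraps at 360 deg, so 355 deg is
    5 deg from 0 deg, not 355 deg."""
    def dist2(a, b):
        daz = abs(a[0] - b[0]) % 360.0
        daz = min(daz, 360.0 - daz)          # wrapped azimuth difference
        return daz * daz + (a[1] - b[1]) ** 2
    return min(db_angles, key=lambda a: dist2(a, target))
```

A production system might instead interpolate between neighbouring HRIRs to avoid audible jumps as the head turns, but nearest-neighbour matching is the behaviour the text describes.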
Referring to fig. 5, for a two-channel stereo audio signal, the input signal 40 has a left channel signal l and a right channel signal r. The calling module 30 retrieves from the head-related transfer function (HRTF) database the time-domain transfer functions hrir_l(θ0−θ, φ) (right virtual speaker to left ear), hrir_r(θ0−θ, φ) (right virtual speaker to right ear), hrir_l(360°−θ0−θ, φ) (left virtual speaker to left ear) and hrir_r(360°−θ0−θ, φ) (left virtual speaker to right ear), or the time-domain transfer functions corresponding to the closest relative azimuth angle in the database. The real-time convolution module 500 convolves the input signal 40 with the retrieved functions in real time to obtain the processed first channel signal L and second channel signal R, which are output to the playing device 60:

L = l ⊛ hrir_l(360°−θ0−θ, φ) + r ⊛ hrir_l(θ0−θ, φ)

R = l ⊛ hrir_r(360°−θ0−θ, φ) + r ⊛ hrir_r(θ0−θ, φ)

where ⊛ represents a convolution operation and θ0 = 30°.
Referring to fig. 6, for a surround (taking 5.1 channels as an example) audio signal, the calling module 30 retrieves from the HRTF database the time-domain transfer functions hrir_l(θ0−θ, φ) and hrir_r(θ0−θ, φ) (front right virtual speaker to left and right ear), hrir_l(360°−θ0−θ, φ) and hrir_r(360°−θ0−θ, φ) (front left virtual speaker), hrir_l(θs−θ, φ) and hrir_r(θs−θ, φ) (rear right virtual speaker), and hrir_l(360°−θs−θ, φ) and hrir_r(360°−θs−θ, φ) (rear left virtual speaker), or those corresponding to the closest relative azimuth angle in the database. The real-time convolution module 500 convolves the input signal 40 with the retrieved functions in real time to obtain the processed first channel signal L and second channel signal R, which are output to the playing device 60:

L = (l + 0.707c + 0.707lfe) ⊛ hrir_l(360°−θ0−θ, φ) + (r + 0.707c + 0.707lfe) ⊛ hrir_l(θ0−θ, φ) + ls ⊛ hrir_l(360°−θs−θ, φ) + rs ⊛ hrir_l(θs−θ, φ)

R = (l + 0.707c + 0.707lfe) ⊛ hrir_r(360°−θ0−θ, φ) + (r + 0.707c + 0.707lfe) ⊛ hrir_r(θ0−θ, φ) + ls ⊛ hrir_r(360°−θs−θ, φ) + rs ⊛ hrir_r(θs−θ, φ)

where ⊛ represents a convolution operation, l, r, ls, rs, c and lfe are respectively the left, right, left surround, right surround, center and bass channel signals in the input signal, θ0 = 30°, and θs = 110° ± 10°.
Referring to fig. 7, a preferred embodiment of the dynamic spatial virtual sound processing method in VR tv of the present invention includes the following steps:
outputting azimuth change instruction information corresponding to the VR picture watched by a user through a VR television remote controller;
acquiring the azimuth change instruction information output by the VR television remote controller;
setting a reference coordinate according to a trigger condition and the azimuth change instruction information;
calculating the head azimuth angle information of the user according to the reference coordinate, the azimuth change instruction information and the setting angles of the plurality of virtual loudspeakers;
reading the head azimuth angle information and the setting angles of the plurality of virtual speakers to obtain the relative azimuth angle between the current head and each virtual speaker;
retrieving, from a head-related transfer function database, the transfer function corresponding to the relative azimuth angle, or matching to the transfer function corresponding to the closest relative azimuth angle in the database;
receiving an input signal, performing convolution processing on the input signal according to the transmission function of each virtual loudspeaker, and generating a first channel signal and a second channel signal corresponding to a playing device.
Further, the head azimuth angle information includes a horizontal angle and an elevation angle, and the dynamic spatial virtual sound processing method in the VR television further includes: at the initial moment when the viewer begins watching the VR television program, initializing the horizontal angle and elevation angle contained in the azimuth change instruction information output by the VR television remote controller.
The dynamic spatial virtual sound processing system and method in the VR television obtain the relative azimuth angle between the viewer's current head orientation and each virtual speaker; retrieve from a head-related transfer function database the transfer function corresponding to that relative azimuth angle, or match to the transfer function corresponding to the closest relative azimuth angle in the database; and convolve the input signal with the transfer function of each virtual speaker to generate a first channel signal and a second channel signal for a playing device, so that the audio output follows the viewer's movement and the viewer's immersive experience is improved.
The invention has the following advantages:
1. The change in the viewer's visual angle is tracked in real time, enhancing the sense of immersion in virtual reality.
2. Wide range of application: the system applies to headphone virtual-reality playback of both stereo and surround-sound signals, covering essentially all audio signals in existing television program inventories.
It should be noted that the terms "first," "second," and the like in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Any process or method description in a flow chart, or otherwise described herein, may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. Alternate implementations, in which functions may be executed out of the order shown or discussed (including substantially concurrently or in reverse order, depending on the functionality involved), are included within the scope of the preferred embodiments of the present invention, as would be understood by those skilled in the art.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.
Claims (10)
1. A dynamic spatial virtual sound processing system in a VR television, comprising:
the VR television remote controller is used for outputting azimuth change instruction information corresponding to a VR picture watched by a user;
the first processor is used for acquiring the azimuth change instruction information output by the VR television remote controller and setting a reference coordinate according to a trigger condition and the azimuth change instruction information; the first processor is further used for calculating the user's head azimuth angle information according to the reference coordinate, the azimuth change instruction information, and the setting angles of the plurality of virtual speakers;
the calling module is used for reading the head azimuth angle information and the setting angles of the plurality of virtual speakers to obtain the relative azimuth angle between the current head orientation and each virtual speaker; the calling module is further used for retrieving from a head-related transfer function database the transfer function corresponding to the relative azimuth angle, or matching the transfer function corresponding to the closest relative azimuth angle in the database; and
the second processor is used for receiving an input signal and convolving the input signal with the transfer functions to generate a first channel signal and a second channel signal for a playback device.
2. The dynamic spatial virtual sound processing system in a VR television of claim 1, wherein the head azimuth angle information includes a horizontal angle and an elevation angle, and the first processor initializes the horizontal angle and the elevation angle contained in the azimuth change instruction information output by the VR television remote controller at the moment the viewer begins watching a VR television program.
3. The dynamic spatial virtual sound processing system in a VR television of claim 1 or 2, wherein the input signal is a two-channel stereo audio signal reproduced through a left virtual speaker and a right virtual speaker, the left virtual speaker and the right virtual speaker being disposed at the front left and front right of the user, respectively.
4. The dynamic spatial virtual sound processing system in a VR television of claim 3, wherein the second processor obtains the first channel signal by convolving the input signal with the transfer function of each virtual speaker:

L = l ⊗ hrir_l(−θ₀ − θ, φ) + r ⊗ hrir_l(θ₀ − θ, φ)

and obtains the second channel signal by convolving the input signal with the transfer function of each virtual speaker:

R = l ⊗ hrir_r(−θ₀ − θ, φ) + r ⊗ hrir_r(θ₀ − θ, φ)

wherein L is the first channel signal, R is the second channel signal, ⊗ denotes convolution, l is the left channel of the input signal, r is the right channel of the input signal, hrir_l(θ₀ − θ, φ) is the right virtual speaker-left ear transfer function, hrir_r(θ₀ − θ, φ) is the right virtual speaker-right ear transfer function, hrir_l(−θ₀ − θ, φ) is the left virtual speaker-left ear transfer function, hrir_r(−θ₀ − θ, φ) is the left virtual speaker-right ear transfer function, and θ₀ = 30°.
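The two-speaker virtualization of claim 4 can be sketched as follows. The `hrir(az)` lookup function and the impulse-response format are illustrative assumptions; the sketch only mirrors the structure of the claimed mix (each input channel convolved with the left-ear and right-ear responses of its virtual speaker, summed per ear).

```python
import numpy as np

def stereo_virtualize(l, r, theta, hrir, theta0=30.0):
    """Binaural mix for virtual speakers at +/-theta0, head azimuth theta.
    `hrir(az)` is a hypothetical lookup returning the (left-ear, right-ear)
    HRIR pair for a source at relative azimuth az."""
    hl_r, hr_r = hrir(theta0 - theta)    # right virtual speaker
    hl_l, hr_l = hrir(-theta0 - theta)   # left virtual speaker
    L = np.convolve(l, hl_l) + np.convolve(r, hl_r)
    R = np.convolve(l, hr_l) + np.convolve(r, hr_r)
    return L, R
```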
5. The dynamic spatial virtual sound processing system in a VR television of claim 1 or 2, wherein the input signal is a multi-channel surround-sound audio signal reproduced through a front left virtual speaker, a front right virtual speaker, a center virtual speaker, a rear left virtual speaker, a rear right virtual speaker, and a bass virtual speaker, wherein the front left virtual speaker, the front right virtual speaker, the center virtual speaker, the rear left virtual speaker, and the rear right virtual speaker are disposed at the front left, front right, directly in front, rear left, and rear right of the user, respectively.
6. The dynamic spatial virtual sound processing system in a VR television of claim 5, wherein the second processor obtains the first channel signal by convolving the input signal with the transfer function of each virtual speaker:

L = l ⊗ hrir_l(−θ₀ − θ, φ) + r ⊗ hrir_l(θ₀ − θ, φ) + (c + lfe) ⊗ hrir_l(−θ, φ) + ls ⊗ hrir_l(−θs − θ, φ) + rs ⊗ hrir_l(θs − θ, φ)

and obtains the second channel signal by convolving the input signal with the transfer function of each virtual speaker:

R = l ⊗ hrir_r(−θ₀ − θ, φ) + r ⊗ hrir_r(θ₀ − θ, φ) + (c + lfe) ⊗ hrir_r(−θ, φ) + ls ⊗ hrir_r(−θs − θ, φ) + rs ⊗ hrir_r(θs − θ, φ)

wherein L is the first channel signal, R is the second channel signal, ⊗ denotes convolution, and l, r, ls, rs, c, and lfe are respectively the left, right, left surround, right surround, center, and bass channels of the input signal; hrir_l(θ₀ − θ, φ) and hrir_r(θ₀ − θ, φ) are the front right virtual speaker-left ear and front right virtual speaker-right ear transfer functions, hrir_l(−θ₀ − θ, φ) and hrir_r(−θ₀ − θ, φ) are the front left virtual speaker-left ear and front left virtual speaker-right ear transfer functions, hrir_l(θs − θ, φ) and hrir_r(θs − θ, φ) are the rear right virtual speaker-left ear and rear right virtual speaker-right ear transfer functions, hrir_l(−θs − θ, φ) and hrir_r(−θs − θ, φ) are the rear left virtual speaker-left ear and rear left virtual speaker-right ear transfer functions, hrir_l(−θ, φ) and hrir_r(−θ, φ) are the center virtual speaker-left ear and center virtual speaker-right ear transfer functions (also applied to the bass channel), and θ₀ = 30°, θs = 110° ± 10°.
7. A method for dynamic spatial virtual sound processing in a VR television, comprising:
outputting, by a VR television remote controller, azimuth change instruction information corresponding to the VR picture being watched by a user;
acquiring the azimuth change instruction information output by the VR television remote controller;
setting a reference coordinate according to a trigger condition and the azimuth change instruction information;
calculating the user's head azimuth angle information according to the reference coordinate, the azimuth change instruction information, and the setting angles of the plurality of virtual speakers;
reading the head azimuth angle information and the setting angles of the plurality of virtual speakers to obtain the relative azimuth angle between the current head orientation and each virtual speaker;
retrieving from a head-related transfer function database the transfer function corresponding to the relative azimuth angle, or matching the transfer function corresponding to the closest relative azimuth angle in the database; and
receiving an input signal, convolving the input signal with the transfer function of each virtual speaker, and generating a first channel signal and a second channel signal for a playback device.
8. The dynamic spatial virtual sound processing method in a VR television of claim 7, wherein the head azimuth angle information includes a horizontal angle and an elevation angle, the processing method further comprising:
initializing, at the moment the viewer begins watching a VR television program, the horizontal angle and the elevation angle contained in the azimuth change instruction information output by the VR television remote controller.
9. The dynamic spatial virtual sound processing method in a VR television of claim 7 or 8, wherein the input signal is a two-channel stereo audio signal reproduced through a left virtual speaker and a right virtual speaker disposed at the front left and front right of the user, respectively, the processing method further comprising:
convolving the input signal with the transfer function of each virtual speaker to obtain the first channel signal:

L = l ⊗ hrir_l(−θ₀ − θ, φ) + r ⊗ hrir_l(θ₀ − θ, φ)

and convolving the input signal with the transfer function of each virtual speaker to obtain the second channel signal:

R = l ⊗ hrir_r(−θ₀ − θ, φ) + r ⊗ hrir_r(θ₀ − θ, φ)

wherein L is the first channel signal, R is the second channel signal, ⊗ denotes convolution, l is the left channel of the input signal, r is the right channel of the input signal, hrir_l(θ₀ − θ, φ) is the right virtual speaker-left ear transfer function, hrir_r(θ₀ − θ, φ) is the right virtual speaker-right ear transfer function, hrir_l(−θ₀ − θ, φ) is the left virtual speaker-left ear transfer function, hrir_r(−θ₀ − θ, φ) is the left virtual speaker-right ear transfer function, and θ₀ = 30°.
10. The dynamic spatial virtual sound processing method in a VR television of claim 7 or 8, wherein the input signal is a multi-channel surround-sound audio signal reproduced through a front left virtual speaker, a front right virtual speaker, a center virtual speaker, a rear left virtual speaker, a rear right virtual speaker, and a bass virtual speaker, wherein the front left virtual speaker, the front right virtual speaker, the center virtual speaker, the rear left virtual speaker, and the rear right virtual speaker are disposed at the front left, front right, directly in front, rear left, and rear right of the user, respectively, the processing method further comprising:
convolving the input signal with the transfer function of each virtual speaker to obtain the first channel signal:

L = l ⊗ hrir_l(−θ₀ − θ, φ) + r ⊗ hrir_l(θ₀ − θ, φ) + (c + lfe) ⊗ hrir_l(−θ, φ) + ls ⊗ hrir_l(−θs − θ, φ) + rs ⊗ hrir_l(θs − θ, φ)

and convolving the input signal with the transfer function of each virtual speaker to obtain the second channel signal:

R = l ⊗ hrir_r(−θ₀ − θ, φ) + r ⊗ hrir_r(θ₀ − θ, φ) + (c + lfe) ⊗ hrir_r(−θ, φ) + ls ⊗ hrir_r(−θs − θ, φ) + rs ⊗ hrir_r(θs − θ, φ)

wherein L is the first channel signal, R is the second channel signal, ⊗ denotes convolution, and l, r, ls, rs, c, and lfe are respectively the left, right, left surround, right surround, center, and bass channels of the input signal; hrir_l(θ₀ − θ, φ) and hrir_r(θ₀ − θ, φ) are the front right virtual speaker-left ear and front right virtual speaker-right ear transfer functions, hrir_l(−θ₀ − θ, φ) and hrir_r(−θ₀ − θ, φ) are the front left virtual speaker-left ear and front left virtual speaker-right ear transfer functions, hrir_l(θs − θ, φ) and hrir_r(θs − θ, φ) are the rear right virtual speaker-left ear and rear right virtual speaker-right ear transfer functions, hrir_l(−θs − θ, φ) and hrir_r(−θs − θ, φ) are the rear left virtual speaker-left ear and rear left virtual speaker-right ear transfer functions, hrir_l(−θ, φ) and hrir_r(−θ, φ) are the center virtual speaker-left ear and center virtual speaker-right ear transfer functions (also applied to the bass channel), and θ₀ = 30°, θs = 110° ± 10°.
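The multi-channel surround virtualization of claims 6 and 10 can be sketched in the same way as the stereo case, with one virtual speaker per input channel. The channel-to-azimuth mapping (front pair at ±θ₀, rear pair at ±θs, center and bass at 0°) follows the claims; the `hrir(az)` lookup and the decision to route the bass channel through the center position are illustrative assumptions.

```python
import numpy as np

def surround_virtualize(ch, theta, hrir, theta0=30.0, theta_s=110.0):
    """Binaural downmix of 5.1 channels (keys: l, r, c, ls, rs, lfe) through
    virtual speakers at +/-theta0 (front), 0 (center and bass), and
    +/-theta_s (rear), for head azimuth theta. `hrir(az)` is a hypothetical
    lookup returning the (left-ear, right-ear) HRIR pair for relative
    azimuth az."""
    feeds = {
        'l': -theta0, 'r': theta0, 'c': 0.0,
        'ls': -theta_s, 'rs': theta_s, 'lfe': 0.0,
    }
    L = R = 0.0
    for name, az in feeds.items():
        h_l, h_r = hrir(az - theta)           # azimuth relative to the head
        L = L + np.convolve(ch[name], h_l)
        R = R + np.convolve(ch[name], h_r)
    return L, R
```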
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710331233.3A CN107182021A (en) | 2017-05-11 | 2017-05-11 | The virtual acoustic processing system of dynamic space and processing method in VR TVs |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107182021A true CN107182021A (en) | 2017-09-19 |
Family
ID=59832275
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11611841B2 (en) | 2018-08-20 | 2023-03-21 | Huawei Technologies Co., Ltd. | Audio processing method and apparatus |
CN110856094A (en) * | 2018-08-20 | 2020-02-28 | 华为技术有限公司 | Audio processing method and device |
CN110856095A (en) * | 2018-08-20 | 2020-02-28 | 华为技术有限公司 | Audio processing method and device |
CN110856095B (en) * | 2018-08-20 | 2021-11-19 | 华为技术有限公司 | Audio processing method and device |
US11910180B2 (en) | 2018-08-20 | 2024-02-20 | Huawei Technologies Co., Ltd. | Audio processing method and apparatus |
US11451921B2 (en) | 2018-08-20 | 2022-09-20 | Huawei Technologies Co., Ltd. | Audio processing method and apparatus |
WO2020037984A1 (en) * | 2018-08-20 | 2020-02-27 | 华为技术有限公司 | Audio processing method and apparatus |
US11863964B2 (en) | 2018-08-20 | 2024-01-02 | Huawei Technologies Co., Ltd. | Audio processing method and apparatus |
WO2020135366A1 (en) * | 2018-12-29 | 2020-07-02 | 华为技术有限公司 | Audio signal processing method and apparatus |
US11917391B2 (en) | 2018-12-29 | 2024-02-27 | Huawei Technologies Co., Ltd. | Audio signal processing method and apparatus |
CN112071326A (en) * | 2020-09-07 | 2020-12-11 | 三星电子(中国)研发中心 | Sound effect processing method and device |
CN114501295B (en) * | 2020-10-26 | 2022-11-15 | 深圳Tcl数字技术有限公司 | Audio data processing method, device, terminal and computer readable storage medium |
CN114501295A (en) * | 2020-10-26 | 2022-05-13 | 深圳Tcl数字技术有限公司 | Audio data processing method, device, terminal and computer readable storage medium |
CN113903325A (en) * | 2021-05-31 | 2022-01-07 | 荣耀终端有限公司 | Method and device for converting text into 3D audio |
CN115442700A (en) * | 2022-08-30 | 2022-12-06 | 北京奇艺世纪科技有限公司 | Spatial audio generation method and device, audio equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||