CN113993057A - Sound field self-adaption system, method and storage medium based on audio real-time positioning technology

Info

Publication number: CN113993057A
Application number: CN202111242898.XA
Authority: CN
Inventors: 陈锐志; 叶锋; 徐诗豪; 钱隆; 李正
Original assignee: Zhejiang Deqing Zhilu Navigation Technology Co ltd
Current assignee: Zhejiang Deqing Zhilu Navigation Technology Co ltd
Priority date: 2021-10-25
Filing date: 2021-10-25
Publication date: 2022-01-28

Abstract

The invention belongs to the technical fields of electronic information and indoor positioning, and discloses a sound field self-adaptive system, method and storage medium based on an audio real-time positioning technology. An intelligent loudspeaker node plays an audio-video stream on demand and plays different sound channels according to the characteristics of the data stream; broadcasts an audio positioning signal, modulated in a predetermined manner, at fixed time periods and intervals; and receives signal feedback from the electronic terminal and adjusts the playing time delay of the audio-video stream passing through that node. The electronic terminal denoises the received audio positioning signal, demodulates it, calculates the distance difference, and feeds the result back to each intelligent loudspeaker node. The invention can thus adaptively apply an accurate time delay to the sound production of each sound box, so that the sound produced by every sound box reaches the listener's position at the same time, realizing adaptive adjustment of the optimum listening position.

Description

Sound field self-adaption system, method and storage medium based on audio real-time positioning technology

Technical Field

The invention belongs to the technical field of electronic information technology and indoor positioning, and particularly relates to a sound field self-adaptive system, a sound field self-adaptive method and a sound field self-adaptive storage medium based on an audio real-time positioning technology.

Background

Currently, for sound systems, when the speaker nodes are fixed, an optimum listening position is formed, which is commonly referred to as the "emperor's position". In order to set the specific location as the optimum listening location, the placement locations of the speaker nodes are usually adjusted according to the acoustic environment of the listener, or the sounding time of each speaker node is adjusted by a central control system in the sound system.

The key to sound field self-adaptation is measuring the real-time distance between a loudspeaker node and an electronic terminal. Traditional ranging methods based on signals such as Bluetooth or Wi-Fi need fine-grained channel information, and their best ranging accuracy is only at the meter level, which does not meet the requirement of sound field self-adaptation. Other ranging technologies based on UWB signals lack universality: they require a dedicated transceiver unit to be added to the speaker, with complex logic management and coordination control, and also a dedicated electronic terminal adapted to the speaker, so widely used intelligent terminals cannot be used.

Through the above analysis, the problems and defects of the prior art are as follows: the traditional ranging method based on signals such as Bluetooth or Wi-Fi requires fine-grained channel information, the ranging precision is not high, and the requirement of sound field self-adaption is not met. Other ranging techniques or methods increase the complexity of the system and have specific requirements on the terminal.

The difficulty in solving the above problems and defects is:

signals designed for short-range communication have long been at a disadvantage for measurement in indoor scenes: the diversity of environments, the signal bandwidth and other conditions imposed by the communication standards limit the accuracy and stability achievable when such signals are used for ranging. Signal frequency bands or techniques dedicated to high-precision measurement will, in the short term, remain limited by cost and device compatibility.

The significance of solving the problems and the defects is as follows:

a sound system already has the basic capability of producing sound. Through improved software logic and simple hardware modification, such as adding a low-cost wireless communication unit, high-precision ranging based on audio signals can be realized.

Disclosure of Invention

Aiming at the problems in the prior art, the invention provides a sound field self-adaptive system, a sound field self-adaptive method and a sound field self-adaptive storage medium based on an audio real-time positioning technology.

The invention is realized in such a way that a sound field adaptive system based on an audio real-time positioning technology comprises the following components:

an intelligent loudspeaker node and an electronic terminal;

the intelligent loudspeaker node is used for playing the audio-video stream and playing different sound channels according to the characteristics of the data stream; for playing out the audio positioning signal, modulated in a predetermined manner, at fixed time periods and intervals; and for receiving the signal feedback of the electronic terminal and adjusting the playing time delay of the audio-video stream passing through the node;

and the electronic terminal is used for denoising the received audio positioning signal, demodulating the audio positioning signal, calculating the distance difference, and feeding the result back to each intelligent loudspeaker node.

Furthermore, time synchronization is carried out between the intelligent loudspeaker nodes through a radio frequency unit.

Furthermore, the intelligent loudspeaker node is provided with an audio interface, an analog-to-digital conversion module, a processor, a digital-to-analog conversion module, a power amplification module, a loudspeaker, a radio frequency transceiving processor and an antenna;

the processor is used for performing video stream playing, wireless communication, time synchronization, positioning signal emission and active delay control by utilizing the synchronous signal processing, the time management logic, the signal modulation processing logic and the sounding management logic.

Further, the frequency of the audio positioning signal is within the range of the audible band and the ultrasonic band near the audible band, and can be modulated in frequency.

Furthermore, the electronic terminal is an intelligent device comprising a microphone;

the intelligent device comprises a special label device matched with the system and a smart phone/smart wearable device provided with a matched application program.

Another object of the present invention is to provide an audio real-time localization technology-based sound field adaptation method applied to the audio real-time localization technology-based sound field adaptation system, where the audio real-time localization technology-based sound field adaptation method includes:

firstly, playing audio and video streams by using loudspeakers, simultaneously self-networking a plurality of loudspeaker nodes through a time synchronization unit, and outwards playing audio positioning signals modulated in a predetermined mode at fixed time periods and intervals;

step two, more than one electronic terminal receives the audio positioning signals played by the loudspeaker nodes, carries out denoising and demodulation processing on the received audio positioning signals, calculates distance differences based on the demodulated audio positioning signals, and feeds back calculation results to each loudspeaker node;

and step three, each loudspeaker node receives the calculation result fed back by the electronic terminal and, based on that result, adjusts the playing time delay of the audio-video stream passing through it by active time delay processing, with the goal that the audio-video streams from all nodes arrive at the terminal at the same time.

Further, in step two, the denoising and demodulating the received audio positioning signal, and calculating the distance difference based on the demodulated audio positioning signal includes:

(1) filtering and denoising the received audio signal with a 10th-order Butterworth band-pass filter; performing a short-time Fourier transform on the filtered data, with a window applied to the signal in the time domain, to obtain the time-frequency matrix of the current window signal;

(2) applying a rotation transformation to the matrix using the known frequency modulation rate of the signal, calculating the energy accumulation of the matrix along the horizontal axis, and estimating the arrival time T_c of the audio signal from the variation between adjacent energy accumulations; taking the detected timestamp T_c as a reference, intercepting sampled data of N/2 detection-window size before and after it, and calculating the generalized cross-correlation with a template signal;

(3) after locating the maximum peak of the signal correlation, carrying out a reverse-order backtracking peak search from it toward earlier times, and selecting the first time at which the peak threshold is exceeded as the arrival time estimate of the audio signal, where α (0 < α < 1) is the threshold coefficient;

(4) multiplying the arrival time delay acquired for each loudspeaker node by the speed of sound c to obtain the distance or distance difference result.

Further, in the third step, the adjusting, by taking the same time of the audio-video streams arriving at the terminal as a target, the time delay of the audio-video stream playing passing through each speaker node through the active time delay processing includes:

taking the time base of the first intelligent loudspeaker node within the fixed sounding period as the reference, calculating the difference in arrival time at the listener's position between the signals of the other intelligent loudspeaker nodes in the sound field and the signal of the first node, and from it calculating the delay that each of the other nodes requires, relative to its original sounding time, in the next period.

Further, the step of adjusting the time delay of the audio-video stream playing through each speaker node by active time delay processing with the same time of the audio-video stream reaching the terminal as a target comprises the following steps:

1) recording the distance differences from the intelligent loudspeaker nodes to the listener's position, as calculated by the electronic terminal, as δ_{2/1}, δ_{3/2}, …, δ_{n/n-1}, where δ_{n/n-1} is the difference between the distances from the n-th and the (n-1)-th intelligent loudspeaker nodes in the sound field to the listener's position;

2) from the distance differences, calculating the arrival time difference of the audio signal of every intelligent loudspeaker node other than the first, relative to the first node, by the following formula:

$$\Delta T_n = \frac{1}{c}\sum_{k=2}^{n}\delta_{k/k-1}$$

where ΔT_n is the difference in arrival time of the signal of the n-th intelligent loudspeaker node relative to the first node, δ_{k/k-1} is the difference between the distances from the k-th and the (k-1)-th intelligent loudspeaker nodes to the listener's position, and c is the given speed of sound at normal temperature and pressure;

3) according to the calculated time differences of the intelligent loudspeaker nodes other than the first, performing adaptive delay adjustment on each node, namely adding a time delay of -ΔT_n to the originally scheduled sounding timestamp of each sound box; if ΔT_n is positive, the n-th intelligent loudspeaker node must sound |ΔT_n| earlier than its originally scheduled sounding timestamp; if ΔT_n is negative, the n-th node must sound |ΔT_n| later than its originally scheduled sounding timestamp;

4) Ending the time delay processing of the sound production of each intelligent loudspeaker node in the period; judging whether the identity of the optimum listener is transferred or not, and if not, starting state detection and time delay processing of a new cycle; if yes, clearing the time delay flag bit maintained by the system, and performing time delay adjustment again for the transferred listening position.
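Steps 1) to 3) can be sketched in Python as follows. This is only an illustrative sketch: the function names, the sound speed of 343 m/s and the optional `non_negative` shift (which moves all offsets by a common constant so that no node has to sound ahead of schedule) are assumptions and are not taken from the source.

```python
SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def arrival_time_differences(chained_diffs, c=SPEED_OF_SOUND):
    """Return [dT_1, ..., dT_n] in seconds, with dT_1 = 0 for the first node.

    chained_diffs holds delta_{2/1}, delta_{3/2}, ..., delta_{n/n-1} in metres:
    each entry is the distance from node k to the listener minus the distance
    from node k-1 to the listener.
    """
    diffs = [0.0]
    running = 0.0
    for d in chained_diffs:
        running += d  # cumulative distance difference relative to node 1
        diffs.append(running / c)
    return diffs


def delay_offsets(chained_diffs, c=SPEED_OF_SOUND, non_negative=False):
    """Offset -dT_n to add to each node's scheduled sounding timestamp."""
    offsets = [-dt for dt in arrival_time_differences(chained_diffs, c)]
    if non_negative:
        # Hypothetical convenience: shift every node equally so that the
        # farthest node keeps its schedule and the others are only delayed.
        shift = -min(offsets)
        offsets = [o + shift for o in offsets]
    return offsets
```

For example, with the listener 2.000 m, 3.715 m and 2.686 m from three nodes, the chained differences are 1.715 m and -1.029 m; the second node's signal then arrives about 5 ms after the first node's and the third node's about 2 ms after it, so those nodes must sound earlier by those amounts.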

By combining all the technical schemes, the invention has the advantages and positive effects that:

the invention uses real-time positioning based on audio signals to measure the distance difference between each loudspeaker and the listener, and adaptively applies an accurate time delay to the sound production of each loudspeaker according to that real-time distance difference, so that the sound produced by every loudspeaker reaches the listener's position at the same time, realizing adaptive adjustment of the optimum listening position (the "emperor's position").

The invention can optimize and adjust the sound production time delay of the sound box in real time according to the position change of the listener so as to ensure that the listener can always obtain the best listening effect after the position change.

The audio positioning technology adopted by the invention achieves decimeter-level ranging accuracy on average, which provides the technical support for sound field self-adaptation. Because it is based on audio signals, it can directly use the existing hardware of the sound system without adding excessive hardware cost.
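As a rough, illustrative calculation of what decimeter-level ranging means in time, a range error of 0.1 m can be converted to a timing error and a sample count. The sound speed of 343 m/s and the sampling rate of 48 kHz are assumptions, not values given in the source:

```latex
\Delta t = \frac{\Delta d}{c} = \frac{0.1\ \text{m}}{343\ \text{m/s}} \approx 0.29\ \text{ms},
\qquad
\Delta t \cdot f_s \approx 0.29\ \text{ms} \times 48\ \text{kHz} \approx 14\ \text{samples}
```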

Drawings

Fig. 1 is a schematic diagram of a sound field adaptive system based on an audio real-time positioning technology according to an embodiment of the present invention.

FIG. 2 is a schematic structural diagram of a sound field adaptive system based on an audio real-time positioning technology according to an embodiment of the present invention;

in the figure: 1. an intelligent speaker node; 2. an electronic terminal.

Fig. 3 is a schematic diagram of a sound field adaptation method based on an audio real-time positioning technology according to an embodiment of the present invention.

FIG. 4 is a flowchart of a sound field adaptation method based on an audio real-time localization technique according to an embodiment of the present invention.

Fig. 5 is a flowchart of adaptive audio signal delay estimation and distance calculation based on an intelligent speaker node frequency modulation signal according to an embodiment of the present invention.

Fig. 6 is a flow chart of sound production delay of an "emperor position" adaptive sound box based on a distance measurement result of an audio positioning technology according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

In view of the problems in the prior art, the present invention provides a sound field adaptive system based on audio real-time positioning technology, and the following describes the present invention in detail with reference to the accompanying drawings.

As shown in fig. 1-2, a sound field adaptive system based on an audio real-time positioning technology provided by an embodiment of the present invention includes:

an intelligent loudspeaker node 1 and an electronic terminal 2;

the intelligent loudspeaker node 1 is used for playing the audio-video stream and playing different sound channels according to the characteristics of the data stream; for playing out the audio positioning signal, modulated in a predetermined manner, at fixed time periods and intervals; and for receiving the signal feedback of the electronic terminal and adjusting the playing time delay of the audio-video stream passing through the node;

and the electronic terminal 2 is used for denoising the received audio positioning signal, demodulating the audio positioning signal, calculating the distance difference, and feeding the result back to each intelligent loudspeaker node.

The intelligent loudspeaker nodes provided by the embodiment of the invention are time-synchronized through the radio frequency unit.

As shown in fig. 3, the intelligent speaker node provided in the embodiment of the present invention is provided with an audio interface, an analog-to-digital conversion module, a processor, a digital-to-analog conversion module, a power amplification module, a speaker, a radio frequency transceiver processor, and an antenna.

The processor provided by the embodiment of the invention is used for performing video stream playing, wireless communication, time synchronization, positioning signal emission and active delay control by utilizing the synchronous signal processing, the time management logic, the signal modulation processing logic and the sounding management logic.

The frequency of the audio positioning signal provided by the embodiment of the invention is within the range of the audible band and the ultrasonic band near the audible band, and the frequency can be modulated.

The electronic terminal provided by the embodiment of the invention is an intelligent device comprising a microphone;

the intelligent device provided by the embodiment of the invention comprises but is not limited to a special label device matched with a system and a smart phone/intelligent wearable device provided with a matched application program.

As shown in fig. 4-5, a sound field adaptive method based on an audio real-time positioning technology according to an embodiment of the present invention includes:

s101, playing audio and video streams by using speakers, simultaneously self-networking a plurality of speaker nodes through a time synchronization unit, and outwards playing audio positioning signals modulated in a preset mode at fixed time periods and intervals;

s102, the electronic terminal receives the audio positioning signals played by the loudspeaker nodes, carries out denoising and demodulation processing on the received audio positioning signals, calculates distance differences based on the demodulated audio positioning signals, and feeds back calculation results to the loudspeaker nodes;

s103, each loudspeaker node receives a calculation result fed back by the electronic terminal, and meanwhile, each loudspeaker node takes the same time of the video and audio streams arriving at the terminal as a target based on the calculation result, and the time delay of the video and audio stream playing passing through each loudspeaker node is adjusted through active time delay processing.

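The embodiment of the present invention provides a method for denoising and demodulating a received audio positioning signal, and calculating a distance difference based on the demodulated audio positioning signal, including:

(1) filtering and denoising the received audio signal with a 10th-order Butterworth band-pass filter; performing a short-time Fourier transform on the filtered data, with a window applied to the signal in the time domain, to obtain the time-frequency matrix of the current window signal;

(2) applying a rotation transformation to the matrix using the known frequency modulation rate of the signal, calculating the energy accumulation of the matrix along the horizontal axis, and estimating the arrival time T_c of the audio signal from the variation between adjacent energy accumulations; taking the detected timestamp T_c as a reference, intercepting sampled data of N/2 detection-window size before and after it, and calculating the generalized cross-correlation with a template signal;

(3) after locating the maximum peak of the signal correlation, carrying out a reverse-order backtracking peak search from it toward earlier times, and selecting the first time at which the peak threshold is exceeded as the arrival time estimate of the audio signal, where α (0 < α < 1) is the threshold coefficient;

(4) multiplying the arrival time delay acquired for each loudspeaker node by the speed of sound c to obtain the distance or distance difference result.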

As shown in fig. 6, the embodiment of the invention adjusts the playing time delay of the audio-video stream passing through each loudspeaker node by active time delay processing, with the goal that the audio-video streams from all nodes arrive at the terminal at the same time; the steps are those set out in the Disclosure of Invention above and detailed in Example 3 below.

The technical solution of the present invention is further described with reference to the following specific embodiments.

Example 1:

referring now to fig. 1 and 3, a sound field adaptation system based on audio localization technology according to a preferred embodiment of the present invention is described, which includes: intelligent speaker node, electronic terminal, self-adaptation execution system.

At least two sound boxes are fixedly arranged in the viewing space. The basic functions of the loudspeaker in each sound box are playing the audio-video stream and playing different sound channels according to the characteristics of the data stream, so that the system can output a stereo effect. Each loudspeaker also forms a multifunctional node with the following extended functions: the nodes are time-synchronized through the radio frequency unit and play out, at fixed time periods and intervals, the audio positioning signal modulated in a predetermined manner, which is superposed on the audio-video stream without affecting the listening effect; and each node can receive the signal feedback of the terminal and adjust the playing time delay of the audio-video stream passing through it. It is therefore called an intelligent loudspeaker node.

The electronic terminal, which generally follows the listener, may perform audio positioning signal demodulation and distance difference resolution, and feed the results back to each speaker node.

For example, the electronic terminal may be a companion tag of the adaptive system, or may be implemented by installing an application on a smart phone.

And after measuring the distance difference between the electronic terminal and each loudspeaker, the electronic terminal feeds back the distance difference to all the loudspeaker nodes in a radio frequency communication mode. Based on the feedback information, each node respectively executes active delay action by taking the same time of the video and audio stream arriving at the terminal as a target.

Example 2:

in one embodiment, to achieve adaptive adjustment of the optimum listening position (the "emperor's position"), the electronic terminal follows the listener, demodulates the audio positioning signal, calculates the distance difference, and feeds the result back to each loudspeaker node. Specifically, the self-adaptive system uses the electronic terminal to receive and record the frequency-modulated positioning audio signals emitted by the loudspeaker nodes, demodulates and detects the audio signals in real time to obtain the arrival time delays, calculates the distance differences from the arrival time delays, and feeds the distance result back to each loudspeaker node.

Specifically, fig. 5 schematically shows a flow of adaptive audio signal delay estimation and distance solution based on an intelligent speaker node frequency modulation signal according to a preferred embodiment of the present invention, including:

step X: first, the audio signal broadcast by an intelligent loudspeaker node, as it arrives at the microphone of the electronic terminal, is defined as x(t). Because reflecting surfaces create propagation paths of different lengths, the audio signal received by the electronic terminal can be regarded as a superposition of copies with different energies and time delays, which can be defined as

$$x(t) = s'(t) * h(t) = \sum_{i} \alpha_i\, s'(t - \tau_i), \qquad h(t) = \sum_{i} \alpha_i\, \delta(t - \tau_i)$$

where s'(t) is the modulation signal of the smart speaker node, * denotes convolution, h(t) is the Channel Impulse Response (CIR) of the indoor environment, δ(·) is the Dirac delta function, α_i is the attenuation coefficient or channel gain of the i-th path, and τ_i is the propagation delay of that path.
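The superposition of delayed, attenuated copies described in this step can be illustrated with a short discrete-time sketch. The function name and the rounding of each path delay to the nearest sample are assumptions made for illustration; the source does not specify a discrete model.

```python
def multipath_receive(signal, paths, sample_rate):
    """Superpose delayed, attenuated copies of `signal`.

    paths is a list of (gain, delay_seconds) pairs, one per propagation path,
    playing the roles of alpha_i and tau_i. Delays are rounded to the nearest
    sample, which is an approximation.
    """
    shifts = [(gain, round(delay * sample_rate)) for gain, delay in paths]
    length = len(signal) + max(shift for _, shift in shifts)
    received = [0.0] * length
    for gain, shift in shifts:
        for i, s in enumerate(signal):
            received[shift + i] += gain * s  # add this path's contribution
    return received
```

With a direct path `(1.0, 0.0)` and one reflection `(0.5, 0.002)` at a 1 kHz sampling rate, the two-sample signal `[1.0, 0.5]` is received as `[1.0, 0.5, 0.5, 0.25]`.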

Step X: a 10th-order Butterworth band-pass filter is designed to preprocess the received audio signal, filtering out background music and environmental noise and weakening their influence on the subsequent demodulation and detection.

Step X: a Short-Time Fourier Transform (STFT) is then applied to the filtered data, with a window applied to the signal in the time domain, to obtain the time-frequency matrix of the current window signal. The matrix is rotated using the known frequency modulation rate of the signal, the energy accumulation of the matrix along the horizontal axis is obtained, and the variation between adjacent energy accumulations is calculated to roughly estimate the arrival time T_c of the audio signal.
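The idea of locating the coarse arrival time from the change between adjacent energy accumulations can be illustrated with a deliberately simplified sketch. It is an assumption-laden stand-in: it accumulates energy over plain time-domain frames instead of over the columns of a rotated time-frequency matrix, and the function name and `frame_len` parameter are hypothetical.

```python
def coarse_arrival_index(samples, frame_len):
    """Return the start index of the frame whose energy rises most sharply.

    Simplification: energy is accumulated per time-domain frame, not per
    column of a rotated STFT matrix as in the described method.
    """
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(v * v for v in frame))
    if len(energies) < 2:
        raise ValueError("need at least two full frames")
    # Variation between adjacent energy accumulations.
    jumps = [energies[i + 1] - energies[i] for i in range(len(energies) - 1)]
    best = max(range(len(jumps)), key=jumps.__getitem__)
    return (best + 1) * frame_len
```

The returned index is only accurate to one frame, which is why a finer correlation-based search around it is still needed.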

Step X: taking the coarsely detected timestamp T_c as a reference, sampled data of N/2 detection-window size is intercepted before and after it and used to compute the generalized cross-correlation with the template signal.

Step X: after the maximum peak of the signal correlation is located, a reverse-order backtracking peak search is carried out from it toward earlier times, and the first time at which the peak threshold is exceeded is selected as the arrival time estimate of the audio signal, where α (0 < α ≤ 1) is the threshold coefficient; α = 0.3 is selected here.

Step X: the arrival time delay acquired for each loudspeaker node is multiplied by the speed of sound c to obtain the distance or distance difference result, which is fed back to each loudspeaker node.
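The fine detection steps above can be sketched as follows. Several points are assumptions rather than the source's method: plain cross-correlation stands in for the generalized cross-correlation, the peak threshold is taken to be α times the maximum correlation value, the search window is extended by the template length, and all function names are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C


def cross_correlate(window, template):
    """Magnitude of the plain cross-correlation for every valid lag."""
    m = len(template)
    return [abs(sum(window[lag + j] * template[j] for j in range(m)))
            for lag in range(len(window) - m + 1)]


def first_path_lag(corr, alpha=0.3):
    """Walk backward from the global peak and keep the earliest local peak
    whose value reaches alpha times the global peak (assumed threshold)."""
    peak = max(range(len(corr)), key=corr.__getitem__)
    threshold = alpha * corr[peak]
    best = peak
    for i in range(peak - 1, 0, -1):
        is_local_peak = corr[i] >= corr[i - 1] and corr[i] >= corr[i + 1]
        if is_local_peak and corr[i] >= threshold:
            best = i
    return best


def fine_arrival_index(samples, template, coarse_index, n, alpha=0.3):
    """Refine a coarse arrival index using n/2 samples on each side of it."""
    start = max(0, coarse_index - n // 2)
    stop = min(len(samples), coarse_index + n // 2 + len(template))
    corr = cross_correlate(samples[start:stop], template)
    return start + first_path_lag(corr, alpha)


def distance_difference(index_a, index_b, sample_rate, c=SPEED_OF_SOUND):
    """Distance difference in metres between two arrival sample indices."""
    return (index_a - index_b) / sample_rate * c
```

The backward search matters in reflective rooms: when a reflection correlates more strongly than the direct path, the global peak is late, and the earlier peak above the threshold is the better arrival time estimate.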

Example 3:

in an embodiment, to ensure the best listening effect, the sounding time delay of the sound boxes is adaptively optimized and adjusted in real time according to the distance or distance difference result obtained in the previous embodiment, as follows: taking the time base of the first sound box within the fixed sounding period as the reference, the difference in arrival time at the listener's position between the signals of the other sound boxes in the sound field and the signal of the first sound box is calculated, and from it the delay required by each of the other sound boxes, relative to its original sounding time, in the next period is back-calculated.

Specifically, fig. 6 schematically shows an "emperor's position" adaptive sound box sounding delay flow based on the distance measurement result of the audio positioning technology according to a preferred embodiment of the present invention, including:

step X: the sending timestamps of the audio signals of the sound boxes in the sounding period, relative to the time base of the electronic terminal, are recorded as t1, t2, …, tn, where t1 is the sending timestamp of the audio signal of the first sound box;

step X: the distance differences from the sound boxes to the listener's position, as calculated by the electronic terminal, are recorded as δ_{2/1}, δ_{3/2}, …, δ_{n/n-1}, where δ_{n/n-1} is the difference between the distances from the n-th and the (n-1)-th sound boxes in the sound field to the listener's position;

step X: from the distance differences, the difference in arrival time at the listener's position of the audio signal of every sound box other than the first, relative to the first sound box, is back-calculated with the formula:

$$\Delta T_n = \frac{1}{c}\sum_{k=2}^{n}\delta_{k/k-1}$$

where ΔT_n is the difference in arrival time of the signal of the n-th sound box relative to the first sound box, δ_{k/k-1} is the difference between the distances from the k-th and the (k-1)-th sound boxes to the listener's position, and c is the given speed of sound at normal temperature and pressure;

step X: according to the calculated time differences of the sound boxes other than the first, adaptive delay adjustment is performed on each sound box, namely a time delay of -ΔT_n is added to its originally scheduled sounding timestamp;

if ΔT_n is positive, the n-th sound box must sound |ΔT_n| earlier than its originally scheduled sounding timestamp; if ΔT_n is negative, the n-th sound box must sound |ΔT_n| later than its originally scheduled sounding timestamp;

Step X: the time delay processing of each sound box in the current period is finished;

step X: judging whether the identity of the optimum listener has been transferred:

if not, state detection and time delay processing of a new cycle are started;

if yes, the time delay flag bits maintained by the system are cleared, and a new time delay scheme is started for the transferred listening position.
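The per-period flow, including the handover to a new optimum listener, can be sketched as a small state holder. The class name, the method signature and the convention that `None` means "no measurement reported in this period" are all assumptions made for illustration.

```python
class DelayController:
    """Keeps the per-sound-box delay offsets for the current optimum listener."""

    def __init__(self, speaker_count):
        self.speaker_count = speaker_count
        self.listener_id = None
        # Offsets relative to the original sounding timestamps, in seconds;
        # these play the role of the "time delay flag bits".
        self.offsets = [0.0] * speaker_count

    def run_cycle(self, listener_id, arrival_time_diffs):
        """Process one sounding period and return the offsets to apply next.

        arrival_time_diffs[i] is dT for sound box i relative to the first
        sound box, or the whole argument is None if nothing was reported.
        """
        if listener_id != self.listener_id:
            # Listener identity transferred: clear the maintained delays.
            self.offsets = [0.0] * self.speaker_count
            self.listener_id = listener_id
        if arrival_time_diffs is not None:
            self.offsets = [-dt for dt in arrival_time_diffs]
        return list(self.offsets)
```

Clearing the offsets on handover means the new listener is never served with delays computed for the previous listener's position while the first measurement is pending.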

Example 4:

a sound field self-adaptive system based on audio real-time positioning technology comprises: the system comprises an intelligent loudspeaker node, an electronic terminal and a self-adaptive execution system;

wherein at least two sound boxes are fixedly arranged in the viewing space. The basic functions of the loudspeaker in each sound box are playing the audio-video stream and playing different sound channels according to the characteristics of the data stream, so that the system can output a stereo effect. Each loudspeaker also forms a multifunctional node with the following extended functions: the nodes are self-networked through a time synchronization unit and play out, at fixed time periods and intervals, the audio positioning signal modulated in a predetermined manner, which is superposed on the audio-video stream without affecting the listening effect; and each node can receive the signal feedback of the terminal and adjust the playing time delay of the audio-video stream passing through it. It is therefore called an intelligent loudspeaker node.

The electronic terminal, which generally follows the listener, may perform audio positioning signal demodulation and distance difference resolution, and feed the results back to each speaker node. The application process can be realized by a matched label of the self-adaptive system or by installing an application program on the smart phone.

And after measuring the distance difference between the electronic terminal and each loudspeaker, feeding back the distance difference to all loudspeaker nodes. Based on the feedback information, each node respectively executes active delay action by taking the same time of the video and audio stream arriving at the terminal as a target.

The loudspeaker used by the sound field self-adaptive system based on the audio real-time positioning technology is an intelligent sound box, and has the function of transmitting positioning signals in real time besides the traditional audio-video playing function. The applied method and system implant the functions of wireless communication, time synchronization, positioning signal transmission and active time delay in the existing sound system. The implantation generally needs to be implemented by software and hardware modification based on a general sound system, including addition of hardware units such as a radio frequency transceiver processor and an antenna for communication and time synchronization, addition of synchronous signal processing and time management logic, addition of signal modulation processing logic and sounding logic management, and the like.

The frequency of the positioning audio signal emitted by the loudspeakers of the sound field self-adaptive system based on the audio real-time positioning technology lies within the range spanning the audible band and the ultrasonic band near the audible band, and the positioning audio signal is frequency modulated.

The electronic terminal of the sound field self-adaptive system based on the audio real-time positioning technology can filter out background music and environmental noise, receive and demodulate the positioning audio signals in real time, and calculate the distance differences.
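The distance difference calculation on the terminal can be sketched as follows. This is a simplified stand-in, not the patented procedure: it uses plain time-domain cross-correlation against a known template, omits the filtering and denoising stage, and assumes that the two nodes emitted their signals at the same synchronized instant (or that any known schedule offset has already been subtracted). The names `arrival_index` and `distance_difference_m`, the short template and the sample positions are hypothetical.

```python
def arrival_index(received, template):
    """Return the sample offset at which `template` best matches `received`."""
    best_offset, best_score = 0, float("-inf")
    for offset in range(len(received) - len(template) + 1):
        # Correlation score of the template against this window of the recording
        score = sum(received[offset + k] * template[k]
                    for k in range(len(template)))
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset


def distance_difference_m(index_a, index_b, sample_rate_hz, speed_of_sound=343.0):
    """Convert an arrival-sample difference (a minus b) into metres."""
    return (index_a - index_b) / sample_rate_hz * speed_of_sound


# Toy recordings: the same template arrives at sample 24 from node A
# and at sample 10 from node B.
template = [1.0, -1.0, 1.0, 1.0, -1.0]
from_a = [0.0] * 24 + template + [0.0] * 20
from_b = [0.0] * 10 + template + [0.0] * 34
idx_a = arrival_index(from_a, template)
idx_b = arrival_index(from_b, template)
diff_m = distance_difference_m(idx_a, idx_b, 48000)
```

A positive result means the terminal is farther from node A than from node B. The brute-force search is quadratic in signal length and is only suitable for short illustrative buffers.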

The electronic terminal of the sound field adaptive system based on the audio real-time positioning technology can be any of various smart devices equipped with a microphone, including but not limited to a dedicated tag device matched to the system, or a smart phone or smart wearable device with the matching application program installed.

The sound field self-adaptive system based on the audio real-time positioning technology comprises a sound field self-adaptive execution method, which optimizes and adjusts the sounding delay of each sound box in real time according to changes in the listener's position, so that the listener always obtains the best listening effect after moving.

The sound field adaptive system based on the audio real-time positioning technology can switch sweet-spot tracking from one listener to another.

In the description of the present invention, "a plurality" means two or more unless otherwise specified; the terms "upper", "lower", "left", "right", "inner", "outer", "front", "rear", "head", "tail", and the like indicate orientations or positional relationships based on those shown in the drawings, are used only for convenience and simplicity of description, and do not indicate or imply that the device or element referred to must have a particular orientation or be constructed and operated in a particular orientation, and thus should not be construed as limiting the invention. Furthermore, the terms "first," "second," "third," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

It should be noted that the embodiments of the present invention can be realized by hardware, software, or a combination of software and hardware. The hardware portion may be implemented using dedicated logic; the software portion may be stored in a memory and executed by a suitable instruction execution system, such as a microprocessor or specially designed hardware. Those skilled in the art will appreciate that the apparatus and methods described above may be implemented using computer executable instructions and/or embodied in processor control code, such code being provided, for example, on a carrier medium such as a disk, CD- or DVD-ROM, on programmable memory such as read only memory (firmware), or on a data carrier such as an optical or electronic signal carrier. The apparatus and its modules of the present invention may be implemented by hardware circuits such as very large scale integrated circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field programmable gate arrays and programmable logic devices, or by software executed by various types of processors, or by a combination of hardware circuits and software, e.g., firmware.

The above description is only for the purpose of illustrating the present invention and is not to be construed as limiting the scope of the invention, which is intended to cover all modifications, equivalents and improvements that are within the spirit and scope of the invention as defined by the appended claims.

Claims

1. A sound field self-adaptation method based on an audio real-time positioning technology is characterized by comprising the following steps:

step two, one or more electronic terminals receive the audio positioning signals played by the loudspeaker nodes, perform denoising and demodulation processing on the received audio positioning signals, calculate distance differences based on the demodulated audio positioning signals, and feed back calculation results to the loudspeaker nodes;

and step three, each loudspeaker node receives the calculation result fed back by each electronic terminal and, based on the calculation result, adjusts the playing delay of the audio and video stream passing through that node by active delay processing, with the goal that the audio and video streams arrive at the selected or default terminal at the same time.

2. The method of claim 1, wherein, in step two, denoising and demodulating the received audio positioning signal and calculating the distance difference based on the demodulated audio positioning signal comprises:

(1) carrying out filtering and denoising processing on the received audio signal by using a digital filter; performing mathematical transformation on the filtered data to obtain a time-frequency matrix of the signal;

(2) applying a rotation transformation to the matrix using the known frequency modulation of the signal, calculating the energy accumulation of the matrix along the horizontal axis, and calculating the variation between adjacent energy accumulations to estimate the arrival time T_c of the audio signal; taking the detected timestamp T_c as a reference, intercepting the sampled data within a detection window of size N/2 before and after it, and calculating the generalized cross-correlation with a template signal;

wherein alpha (0 < alpha < 1) is a threshold coefficient;

(4) according to the arrival time delay acquired by each loudspeaker node

3. The sound field adaptive method according to claim 1, wherein, in step three, adjusting the playing delay of the video stream passing through each speaker node by active delay processing, with the goal that the video streams arrive at the selected or default terminal at the same time, comprises:

4. The sound field adaptive method based on the audio real-time positioning technology as claimed in claim 1, wherein adjusting the playing delay of the video stream passing through each speaker node by active delay processing, with the goal that the video streams arrive at the terminal at the same time, comprises the following steps:

5. A sound field adaptive system based on an audio real-time positioning technology for implementing the sound field adaptive method of any one of claims 1 to 4, wherein the sound field adaptive system based on the audio real-time positioning technology comprises:

6. The system of claim 5, wherein the nodes of the intelligent speakers are time-synchronized through the RF unit.

7. The sound field adaptive system based on the audio real-time positioning technology, as claimed in claim 5, wherein the smart speaker node is provided with an audio interface, an analog-to-digital conversion module, a processor, a digital-to-analog conversion module, a power amplification module, a speaker, a radio frequency transceiver processor and an antenna;

8. The system of claim 5, wherein the frequency of the audio positioning signal lies within the range spanning the audible band and the ultrasonic band near the audible band, and the audio positioning signal is frequency modulated.

9. The sound field adaptive system based on the audio real-time positioning technology, as recited in claim 5, wherein the electronic terminal is a smart device comprising a microphone;

10. A computer-readable storage medium, storing a computer program which, when executed by a processor, causes the processor to perform the sound field adaptation method based on audio real-time localization techniques of claims 1-5.