CN114913868A - FPGA-based acoustic array directional pickup method - Google Patents
- Publication number
- CN114913868A (application CN202210539422.0A)
- Authority
- CN
- China
- Prior art keywords
- signal
- output
- filter
- fpga
- interference
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L19/26—Pre-filtering or post-filtering
- G10L21/0232—Processing in the frequency domain
- G10L21/0264—Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
- G10L21/034—Automatic adjustment
- G10L2021/02166—Microphone arrays; Beamforming
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Circuit For Audible Band Transducer (AREA)
- Obtaining Desirable Characteristics In Audible-Bandwidth Transducers (AREA)
Abstract
The invention relates to acoustic array signal acquisition and signal processing, in particular to an FPGA-based acoustic array directional sound pickup method. The invention takes an FPGA as the main controller of a space-time-frequency broadband beamformer, uses a rectangular MEMS digital microphone array with 32 elements as the carrier, and realizes real-time, reliable and stable directional pickup and interference suppression in indoor and outdoor environments. On the hardware circuit controlled by the FPGA chip, the 32 microphone elements are decoded through the I²S interface, and their audio is synchronously acquired and transmitted at the standard 48KHz sampling rate and 24bit quantization precision, then processed in sequence in the spatial, frequency, wavelet and time domains. The output audio of the invention enhances the target sound source while greatly suppressing interfering sound sources and environmental noise, serving demanding scenarios such as intelligent offices, smart homes, high-quality conferencing and equipment health monitoring.
Description
Technical Field
The invention relates to sound array signal acquisition and signal processing, in particular to a sound array directional pickup method based on an FPGA (field programmable gate array).
Background
The acoustic array system is widely applied in the fields of industrial monitoring, military use and business offices. Under design requirements and concepts trending toward intelligence, miniaturization and digitization, such array systems can replace human operators for specific work on specific occasions, completing specific tasks.
Compared with the traditional array signal processing technology, the acoustic array signal processing has the following four difficulties:
1. Bandwidth of the received signal: the object of acoustic array signal processing is a broadband voice signal, typically occupying the 300~3400Hz band. The receiving array is non-coherent with respect to such sources: the signals received at the array elements differ in more than phase alone, so they cannot simply be weighted and summed.
2. Non-stationarity of the received signal: the speech signal must be preprocessed (anti-aliasing filtering, pre-emphasis, windowing and framing) to guarantee short-time stationarity of the received signal, yielding a digital signal with lower distortion and higher tone quality.
3. Physical characteristics of the received signal: the received signal is usually a mechanical longitudinal wave. Due to reflection and diffraction, the signal received by each microphone contains, besides the direct signal, superposed multipath interference, producing acoustic reverberation; in an indoor environment, diffraction and reflection from room boundaries or obstacles cause unknown changes in the amplitude and phase of the acoustic signal propagating to each array element.
4. External environment of the array: the signal processing algorithm should adapt to indoor and outdoor environments and to near-field and far-field conditions; in the near field, both the angle of incidence and the distance of the source relative to the array must be considered.
Aiming at these four technical difficulties: although the widely adopted hardware scheme of capture card + DSP processor with broadband adaptive beamforming algorithms can achieve some high signal-to-noise-ratio directional pickup, various unsolved technical problems remain, such as too few array elements, insufficient pickup distance, insufficient suppression of non-stationary interference, the heavy computation incurred by adding a large number of array elements, and high hardware cost.
Disclosure of Invention
Aiming at these problems and defects, the invention provides an FPGA-based acoustic array directional pickup method to address the shortcomings of directional pickup realized with current acoustic-array beamforming technology: small element count, insufficient pickup distance, insufficient suppression of non-stationary interference, the large amount of computation caused by adding many array elements, and high hardware cost.
An FPGA-based acoustic array directional pickup method comprises the following specific steps:
Step 1: the FPGA ASCII-decodes the control characters to obtain the azimuth-pitch pairs of the target direction and the interference direction, then applies tapped delay lines of different lengths, mapping the range -80° to 80° to quantized delays -4 to 4, to perform fixed-weight delay-and-sum on the input 32CH parallel digital PCM stream. The delay-and-sum proceeds in two stages: first across the 4 transversely arranged rows of microphones, then longitudinally over the 4 row sums.
Finally, through Mux9_1 (a 9-to-1 multiplexer) the microphone array is steered to the azimuth-pitch pairs of the target and interference directions respectively, and a 48KHz, 24bit, 2CH digital PCM stream is output.
Step 2: improve the system signal-to-interference ratio of the 48KHz, 24bit, 2CH digital PCM stream output by step 1.
The FPGA applies a windowed short-time Fourier transform with a 24K-point rectangular window to the two signals obtained in step 1 (aligned to the target and interference azimuth-pitch pairs) and computes their power spectra. The optimal weight of Wiener post-filtering under the classical MMSE (minimum mean square error) criterion is improved by adding a variable coefficient α to improve the weighting effect. The improved weight is applied in the frequency domain of the target direction, and the weighted result is inverse-transformed back to a single time-domain channel, removing the non-stationary interference signal and improving the system signal-to-interference ratio. The transform ordering is natural order and the implementation is Radix-2.
The final output is a 48KHz, 24bit, 1CH digital PCM stream: the single-channel target-direction output signal S(n) with high signal-to-interference ratio obtained after frequency-domain weighting.
Step 3: process the 48KHz, 24bit, 1CH digital PCM stream S(n) output by step 2.
The FPGA uses a decomposition-reconstruction FIR filter bank built on the dB3 wavelet basis to estimate the stationary noise floor of the single-channel interference-suppressed signal obtained in step 2. Taking the estimated noise as input and the error signal as output, adaptive noise cancellation is performed: the weight update and filtering of the adaptive filter follow the LMS (least mean square) criterion, implemented as a 33-tap FIR. This improves the system signal-to-noise ratio.
The final output is a 48KHz, 24bit, 1CH digital PCM stream: the single-channel target-direction output signal S_af(n) with high signal-to-noise ratio obtained after adaptive noise cancellation.
Step 4: the FPGA computes the short-time energy of the single-channel signal obtained in step 3 from the signal envelope extracted by a Hilbert filter. The filter response characteristic is the Hilbert transform, implemented as an 11-tap FIR. The DC component of the envelope is computed by 16th-order sliding filtering and divided into 4 levels by intensity, and the time-domain signal under the corresponding gain is output through Mux4_1. This automatic gain control of the amplitude keeps the volume of the output audio balanced.
The final result is a 48KHz, 24bit, 1CH digital PCM stream S_audio(n), output directly to the sound card and played through an external earphone.
Further, the step 1 specifically comprises the following steps:
Step 1.1: the control characters fed back by the upper computer are sent to the FPGA over the UART protocol and ASCII-decoded to obtain the spatial-filtering target position (θ_s, φ_s) and interference position (θ_i, φ_i), where θ and φ denote the azimuth angle and pitch angle respectively, the subscript s denotes the target and i the interference;
Step 1.2: according to the azimuth and pitch information of step 1.1, map -80° to 80° onto the quantized integers -4 to 4 as tapped-delay-line lengths, and perform fixed-weight delay-and-sum using the delay line mapped from each angle. In the spatial domain, first perform the delay-and-sum of the 4 transversely arranged rows of microphones, obtaining in total 4 × 2CH signals aligned to the target and interference azimuths; then perform the longitudinal delay-and-sum over the four rows, obtaining the 2CH signals aligned to the target and interference pitch angles.
The fixed-weight delay-and-sum is expressed as y(n) = (1/M)·Σ_{i=1}^{M} w_i·x_i(n − i·h), where w = [1, 1, …, 1]^T, h ∈ {−4, −3, …, 0, …, 3, 4} is the quantized unit delay (the tap delay of element i is i·h, which measures the "different lengths" above), n denotes a sampling point in the time domain, and M denotes the number of microphone elements in the current delay-and-sum: 8 when the transverse delay-and-sum is performed and 4 when the longitudinal delay-and-sum is performed.
Step 1.3, timing and overflow in the hardware design of steps 1.1 to 1.2: aligning the azimuth transversely requires 4 × 7 × 2 adders, and aligning the pitch longitudinally requires 3 × 2 adders. The adders are arranged as a pipelined tree; the bit width grows by 1 bit after each adder stage, so after accumulation the sum is shifted right by 3 bits and then cast uniformly to the 24_0_fixed format. The UART module receives the control word and decodes it to ASCII as the select signal of the multiplexer Mux9_1; since h takes 9 values, each delay-and-sum requires a Mux9_1 output selection.
Step 1.4: after the spatial filtering of steps 1.1 to 1.3, the microphone array signals yield two alignment signals Aligned_s(n) and Aligned_i(n), aligned in azimuth and pitch to (θ_s, φ_s) and (θ_i, φ_i) respectively.
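The fixed-weight delay-and-sum of steps 1.2 to 1.4 can be sketched in floating-point Python as follows. This is an illustrative simplification, not the FPGA implementation: the circular shift stands in for the tapped delay line, and the two-channel usage example below assumes hypothetical integer arrival offsets.

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Fixed-weight delay-and-sum: y(n) = (1/M) * sum_i x_i(n - d_i).

    channels: (M, N) array, one row of samples per microphone element.
    delays:   length-M integer sample delays (quantized -4..4 in the text).
    All weights are 1, and the result is normalized by the element count M.
    """
    M, _ = channels.shape
    # np.roll(x, d) delays x by d samples (circular, for brevity of the sketch)
    delayed = [np.roll(channels[m], delays[m]) for m in range(M)]
    return np.sum(delayed, axis=0) / M
```

Steering channels whose wavefronts arrive with opposite sample offsets realigns them, so the sum reproduces the source waveform at full amplitude while misaligned directions are attenuated.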
further, the step 2 specifically comprises the following steps:
step 2.1, performing windowing and framing on the two alignment signals Aligned _ s (n) and Aligned _ i (n) obtained in the step 1 according to a rectangular window of 0.5s (24K length), and calculating short-time Fourier transform with the same point number to obtain two frequency domain signals F _ s (K) and F _ i (K), wherein K represents a sampling point on a frequency domain.
Step 2.2: compute the auto-power spectral densities of the two frequency-domain signals obtained in step 2.1 to get P_s(k) and P_i(k). The auto-power spectrum is computed as P(k) = F(k)·F*(k)/N, where (·)* denotes the conjugate of the sequence and N the frame length.
Step 2.3: the optimal weight of Wiener post-filtering under the MMSE criterion is improved and written as w_opt(k) = P_s(k)/(P_s(k) + α·P_i(k)). The value of the coefficient α is adjusted according to the short-time power spectral density of the interference direction: the larger P_i is, the smaller α is taken, to reduce the musical noise caused by excessive weighting; the smaller P_i is, the larger α is taken, to further strengthen the suppression of non-stationary interference signals.
Step 2.4: after obtaining the weight of step 2.3, the target-direction frequency-domain transform F_s(k) of step 2.1 is weighted and inverse-Fourier-transformed, computed as S(n) = F^{-1}{F_s(k)·w_opt(k)}.
Step 2.5, timing and overflow in the hardware design of steps 2.1 to 2.4: both the FFT module used for the Fourier transform and the IFFT module used for the inverse transform adopt natural ordering and Radix-2 operation. The conjugate multiplication is essentially a squared-modulus calculation, so one adder and two multipliers can replace a full complex multiplier, and all outputs are Full-Scale. "Adjusting the value of α according to the short-time power spectral density of the interference direction" is implemented with Mux4_1. Since the IFFT accepts fixed-point input of at most 33 bits, all multiplier results are right-shifted by 16 bits as anti-overflow quantization. The IFFT output contains fractional bits and is uniformly cast to 24_0_fixed.
Step 2.6: through the computation of steps 2.1 to 2.5, the two spatial-domain alignment signals are combined by frequency-domain weighting into the single-channel target-direction output signal S(n) with high signal-to-interference ratio.
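A floating-point sketch of the frequency-domain weighting of steps 2.1 to 2.6, assuming the improved weight takes the form w_opt(k) = P_s(k)/(P_s(k) + α·P_i(k)) with a fixed α (one plausible reading of step 2.3; the fixed-point pipeline and the α selection via Mux4_1 are omitted):

```python
import numpy as np

def spectral_weighting(aligned_s, aligned_i, alpha=1.0):
    """Weight the target-direction spectrum by an improved Wiener post-filter.

    aligned_s: frame aligned to the target direction (time domain).
    aligned_i: frame aligned to the interference direction (time domain).
    alpha:     variable coefficient of step 2.3 (held constant here).
    """
    N = len(aligned_s)
    F_s = np.fft.fft(aligned_s)
    F_i = np.fft.fft(aligned_i)
    P_s = (F_s * np.conj(F_s)).real / N        # auto-power spectra (step 2.2)
    P_i = (F_i * np.conj(F_i)).real / N
    w_opt = P_s / (P_s + alpha * P_i + 1e-12)  # improved Wiener weight (step 2.3)
    return np.fft.ifft(F_s * w_opt).real       # back to the time domain (step 2.4)
```

With no interference energy the weight approaches 1 and the frame passes through unchanged; bins dominated by interference are attenuated toward 0.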
Further, the step 3 specifically comprises the following steps:
step 3.1, wavelet decomposition and reconstruction each require a pair of FIR filter banks, given the filter coefficients of the dB3 wavelet basis used:
- low-pass filter for scale-space decomposition: Lo_D = [0.035226, -0.085441, -0.135011, 0.459878, 0.806892, 0.332671];
- high-pass filter for wavelet-space decomposition: Hi_D = [-0.332671, 0.806892, -0.459878, -0.135011, 0.085441, 0.035226];
- low-pass filter for scale-space reconstruction: Lo_R = [0.332671, 0.806892, 0.459878, -0.135011, -0.085441, 0.035226];
- high-pass filter for wavelet-space reconstruction: Hi_R = [0.332671, 0.806892, -0.459878, -0.135011, 0.085441, -0.035226].

All the above coefficients are quantized in 32_31_fixed format.
Step 3.2, one-layer decomposition: pass the single-channel target-direction signal S(n) obtained in step 2 through Lo_D and Hi_D given in step 3.1 in turn, and decimate by 2 to obtain the wavelet-domain coefficients A_1 and D_1.
Step 3.3, two-layer decomposition: pass the scale-space coefficient A_1 computed in step 3.2 through Lo_D and Hi_D given in step 3.1 in turn, and decimate by 2 to obtain the second-level coefficients A_2 and D_2.
Step 3.4, threshold noise filtering. Each quantized sample of the one-layer coefficient D_1 is put through the following hard decision against the threshold λ to extract the estimated noise:

D̂_1(n) = D_1(n) if |D_1(n)| ≤ λ, and D̂_1(n) = 0 otherwise,

where the threshold λ is calculated by the sqtwolog criterion λ = σ·√(2·ln N), with σ² the estimated variance of the current sample frame and N the frame length.

The two-layer coefficients A_2 and D_2 are put through the same decision and selection to extract their corresponding estimated noise.
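The hard decision and sqtwolog threshold of step 3.4 can be sketched as follows. This is a floating-point illustration in which the per-frame variance is computed directly, an assumption standing in for the FPGA's moving-average-based variance estimate described in step 3.9:

```python
import numpy as np

def extract_noise(coeffs):
    """Keep wavelet coefficients at or below the sqtwolog threshold as
    estimated noise; zero the larger (signal-dominated) coefficients."""
    N = len(coeffs)
    sigma = np.sqrt(np.var(coeffs))            # estimated std of the frame
    lam = sigma * np.sqrt(2.0 * np.log(N))     # lambda = sigma * sqrt(2 ln N)
    return np.where(np.abs(coeffs) <= lam, coeffs, 0.0)
```

The same routine is applied identically to D_1, A_2 and D_2.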
Step 3.5, one-layer reconstruction: the wavelet-space estimated noise extracted from D_1 is first interpolated by a factor of 2 and then restored through Hi_R to obtain the time-domain sequence N_D_1(n).
Step 3.6, two-layer reconstruction: the wavelet-space estimated noise extracted from D_2 and the scale-space estimated noise extracted from A_2 are first interpolated by a factor of 2 and passed through Hi_R and Lo_R respectively; the results are then interpolated by 2 again and restored through Lo_R, yielding the time-domain sequences N_D_2(n) and N_A_2(n).
Step 3.7, time-domain superposition. The estimated noise of the frame computed in steps 3.5 to 3.6 is superposed:

N(n) = N_D_1(n) + N_D_2(n) + N_A_2(n)
Step 3.8: take N(n) computed in step 3.7 as the input signal and S(n) as the reference signal; the filter output is y_N(n) = Σ_{j=0}^{32} w_af,j(n)·N(n−j). With S(n) − y_N(n) as the error signal, adaptive FIR filtering is performed under the LMS criterion. The filter order is 32 (33 taps) and the initial weights w_af are set to 0; the gradient-descent method gives the weight update formula w_af(n+1) = w_af(n) + 2μ·(S(n) − y_N(n))·N(n), where the learning step μ = 0.005.
Step 3.9, timing and overflow in the hardware design of steps 3.1 to 3.8: the FIR filters used for wavelet decomposition and reconstruction are all 6-tap (5 delay units, 6 multipliers, 5 adders); their coefficients are quantized in 32_31_fixed and the output of each filter is cast to 24_0_fixed. The threshold is computed from the variance of the sample frame, estimated by the usual statistical formula; all adders and multipliers involved use Full-Scale outputs, and the mean needed for the variance estimate is obtained by 16th-order moving-average filtering. The hard decision of the threshold noise filtering is implemented with Mux2_1. The adaptive FIR filter has 33 taps (32 delay units, 33 multipliers, 32 adders); its coefficients (the weights) and outputs are 24_0_fixed, and the learning step is 16_16_fixed.
Step 3.10: through the computation of steps 3.1 to 3.9, the single-channel digital audio PCM stream output after adaptive noise cancellation is finally expressed as S_af(n) = S(n) − y_N(n).
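The adaptive cancellation of steps 3.8 to 3.10 amounts to a standard LMS noise canceller. Below is a floating-point sketch using the 33-tap length and μ = 0.005 stated in the text; the fixed-point formats of step 3.9 are omitted:

```python
import numpy as np

def lms_cancel(noise_ref, primary, taps=33, mu=0.005):
    """LMS adaptive noise canceller.

    noise_ref: N(n), the reconstructed noise estimate (filter input).
    primary:   S(n), the signal containing correlated noise (reference).
    Returns the error signal e(n) = S(n) - y_N(n), i.e. S_af(n).
    """
    w = np.zeros(taps)                 # initial weights 0, per step 3.8
    buf = np.zeros(taps)               # tapped delay line of the FIR filter
    err = np.empty(len(primary))
    for n in range(len(primary)):
        buf = np.roll(buf, 1)          # shift the delay line by one sample
        buf[0] = noise_ref[n]
        y = w @ buf                    # filter output y_N(n)
        e = primary[n] - y
        w += 2.0 * mu * e * buf        # gradient-descent update of step 3.8
        err[n] = e
    return err
```

When the primary channel equals the reference noise exactly, the canceller converges toward a unit tap and the error power decays toward zero.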
Further, the step 4 is specifically as follows:
step 4.1, the response characteristic of the Hilbert filter is essentially an all-pass filter with the amplitude-frequency characteristic of 1, and the Hilbert filter with an even order requires 0 and f s A strict cut-off at/2 (-above 20 dB). The output of the Hilbert filter is characterized in phase by a delayed input of 90. The Filter design with the properties is realized by utilizing a Filter-Designer tool box, the response characteristic is set as Hilbert-Transformation, the realization method is an equiripple FIR with the order of 10 orders, and the Filter is obtained as [ -0.53380-0.22630-0.641200.641200.226300.5338]. The coefficients are quantized by 16_15_ fixed and placed in an IP core of a hardware design, wherein the IP core is 11-tapsThe FIR filter of (1).
Step 4.2: the input of the Hilbert filter is the single-channel digital audio PCM stream S_af(n) obtained in step 3; the output is a pair of orthogonal time-domain IQ sequences S_I(n) and S_Q(n), where S_I(n) is S_af(n) after 5 tap delays and S_Q(n) is the weighted output after the 10 tap delays of the Hilbert filter.
Step 4.3: compute the square root of the sum of squares of the S_I(n) and S_Q(n) obtained in step 4.2 to get the envelope of the input signal S_af(n): E_af(n) = √(S_I(n)² + S_Q(n)²).
step 4.4, calculating E obtained in step 4.3 by utilizing 16-order moving average filtering af Direct current component E of (n) DC (n) for providing the reference signal of Mux for automatic gain control. Design Mux4_1, E to be calculated DC (n) is divided into 4 ranges by amplitude, the four ranges being E DC (n)>2 20 、2 16 <E DC (n)≤2 20 、2 14 <E DC (n)≤2 16 And E DC (n)≤2 12 . According to calculated E DC (n) in the range interval, S af And (n) respectively shifting the amplitude of the signal to the right by 4 bits, shifting the amplitude of the signal to the right by 2 bits, shifting the amplitude of the signal to the right by 1bit and shifting the amplitude of the signal to the left by 1bit, and performing expansion gain.
Step 4.5, timing and overflow in the hardware design of steps 4.1 to 4.4: the FIR filter used for the Hilbert transform has 11 taps (10 delay units, 11 multipliers, 10 adders); its coefficients are quantized in 16_15_fixed and its output is cast to 24_0_fixed. The multiplier-adder of the sum-of-squares operation and the subsequent moving-average filtering both use Full-Scale outputs. E_DC(n) is right-shifted by 16 bits and cast to 24_0_fixed before being input to the Mux4_1.
Step 4.6: through the computation of steps 4.1 to 4.5, the final output signal is the digital audio PCM stream S_audio(n), i.e. S_af(n) with its amplitude under automatic gain control. The signal is moved over the AXI (advanced extensible interface) bus into the DDR3 memory of the ARM, sent to the upper computer via the TCP/IP protocol, and played through the sound-card driver.
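The envelope extraction and 4-level gain of steps 4.1 to 4.6 can be sketched as below. Two assumptions are made for illustration: the 11-tap equiripple design from the Filter-Designer toolbox is replaced by a generic windowed ideal Hilbert FIR, and the fixed-point 2^20/2^16/2^14/2^12 thresholds are replaced by illustrative floating-point ones.

```python
import numpy as np

def hilbert_fir(taps=31):
    """Odd-length type-III FIR Hilbert transformer: ideal impulse response
    h[n] = 2/(pi*n) for odd n, 0 otherwise, shaped by a Hamming window.
    (A stand-in for the patent's equiripple design, not its coefficients.)"""
    M = taps // 2
    n = np.arange(-M, M + 1)
    h = np.zeros(taps)
    odd = n % 2 != 0
    h[odd] = 2.0 / (np.pi * n[odd])
    return h * np.hamming(taps)

def envelope(x, h):
    """E(n) = sqrt(S_I(n)^2 + S_Q(n)^2); mode='same' absorbs the filter's
    group delay, so the in-phase path is x itself."""
    q = np.convolve(x, h, mode="same")         # quadrature path S_Q(n)
    return np.sqrt(x ** 2 + q ** 2)

def agc_gain(env_dc):
    """4-level gain mux on the envelope DC component (thresholds illustrative)."""
    if env_dc > 0.8:
        return 1 / 16          # analogous to a 4-bit right shift
    if env_dc > 0.4:
        return 1 / 4           # 2-bit right shift
    if env_dc > 0.2:
        return 1 / 2           # 1-bit right shift
    return 2                   # 1-bit left shift
```

For a constant-amplitude tone away from the band edges, the computed envelope stays close to the tone amplitude, which is what makes the DC component a usable loudness measure for the gain mux.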
The invention has the following beneficial effects:
(1) Large number of array elements. Compared with existing directional pickup acoustic array products, which generally carry fewer than 10 microphones, the signal-to-noise ratio of the audio output of the invention can be improved by more than 14dB, so the pickup range in outdoor and indoor scenes is long, reaching more than 20m.
(2) Real-time processing, low delay and low hardware cost. Increasing the element count makes the capture-card + DSP hardware scheme common in existing acoustic array products hard-pressed to deliver high-throughput, low-latency real-time processing. Thanks to the acceleration of FPGA parallel operation, the overall measured system delay is 1~2s; and because the FPGA serves as the integrated hardware platform for data acquisition, transmission and processing, no expensive capture card or complex data interface needs to be separately configured, greatly reducing hardware cost. This satisfies real-time-critical application scenarios such as industrial monitoring, military use and business offices.
(3) Obvious suppression of non-stationary interference and the stationary noise floor. Existing products on the market cannot well realize directional pickup with non-stationary interference suppression, while this method suppresses non-stationary interference signals and the noise floor in the space far more markedly.
Drawings
FIG. 1 is a schematic diagram of the operation of the present invention;
FIG. 2 is a flow diagram of the overall architecture of the present invention;
FIG. 3 is a framework diagram of the first part (spatial domain) of the present invention;
FIG. 4 is a framework diagram of the second part (frequency domain) of the present invention;
FIG. 5 is a framework diagram of the third part (wavelet domain) of the present invention;
FIG. 6 is a framework diagram of the fourth part (time domain) of the present invention;
FIG. 7 shows the results of the performance test of the examples.
Fig. 8 is a time domain waveform at various steps of an embodiment.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
The FPGA-based 32-element acoustic array directional pickup method is implemented on a ZYNQ-XC7Z020 heterogeneous chip. The working schematic of the whole system is shown in FIG. 1, involving the 32-element acoustic array directional pickup system, MEMS digital microphones, a 130° wide-angle camera and a noise-reduction Bluetooth headset. The method is responsible for realizing the following functions:
(1) The digital MEMS microphone chip ICS-43432 is selected. The FPGA synchronously acquires surrounding sound pressure signals in the 300-3400 Hz working band, spatially distributed over 360° in all directions; through the internal ADC of the ICS-43432, anti-aliasing filtering and the I²S interface, the data acquired by each channel are converted into a PCM format that a digital system can process, with a sampling frequency of 48 kHz, signed 24-bit quantization precision and 32 synchronous channels (32CH).
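The ICS-43432 delivers signed 24-bit two's-complement PCM words. A minimal sketch of the sign extension a host-side decoder would perform on such words (the function name is illustrative, not from the patent):

```python
def pcm24_to_int(word: int) -> int:
    """Sign-extend a raw 24-bit two's-complement PCM word to a Python int."""
    word &= 0xFFFFFF  # keep only the low 24 bits
    # If the sign bit (bit 23) is set, subtract 2**24 to recover the negative value
    return word - 0x1000000 if word & 0x800000 else word
```

For example, `0x7FFFFF` decodes to the largest positive sample and `0x800000` to the most negative one.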
(2) The originally acquired 32CH digital audio signals are processed in the FPGA by a space-time-frequency broadband beamforming algorithm into a 1CH digital output whose data stream meets the design requirements of real-time operation, non-stationary interference suppression, stationary noise-floor suppression and long pickup distance.
(3) The 1CH digital signal output in the previous step is distributed into the DDR3 memory of the ARM part through the AXI bus inside the FPGA, and then sent to an upper computer over the TCP/IP protocol. The upper computer receives the data, connects to the Bluetooth headset and drives the sound card to play through the headset.
The test environment of the system is located outdoors or indoors.
(1) In the outdoor environment, the signal-to-noise ratio is 0-5 dB. Single-sound-source pickup: azimuth angle 0°, pitch angle 3°, pickup distance 20 m.
Multi-sound-source pickup: sound source A at azimuth -30°, pitch 5°, pickup distance 5 m; sound source B at azimuth 30°, pitch 5°, pickup distance 5 m.
(2) In the indoor environment, the space measures 10 × 8 × 5 m, the signal-to-noise ratio exceeds 20 dB and the reverberation time RT60 is 250 ms. Multi-sound-source pickup: sound source A at azimuth -30°, pitch 12°, pickup distance 5 m; sound source B at azimuth 30°, pitch 11°, pickup distance 5 m.
Taking the indoor test as an example, FIG. 7 plots the performance test result of the embodiment (from top to bottom: layer 1 is the interference signal, layer 2 the target signal, layer 3 the signal received by single microphone No. 2, and layer 4 the directional pickup output).
FIG. 8 shows the time domain waveform of this embodiment at each step of the invention (from top to bottom: layer 1 is the signal received by single microphone No. 2; layer 2 is the layer-1 signal after spatial processing; layer 3 is the layer-2 signal after frequency-domain weighting; layer 4 is the layer-3 signal after adaptive noise cancellation (ANC) and automatic gain control (AGC)).
As seen from the time domain signals, the directional pickup output reproduces the target signal while greatly suppressing the non-stationary interference signal and the stationary background noise, demonstrating the feasibility of the system of the invention.
In conclusion, with the FPGA as master controller, a space-time-frequency broadband beamformer is implemented on a rectangular MEMS digital microphone array of 32 elements as carrier, achieving real-time, reliable and stable directional pickup and interference suppression in indoor and outdoor environments. On the hardware circuit controlled by the FPGA chip, the 32 microphone elements are decoded through the I²S interface, synchronously acquired and transmitted at the standard audio sampling rate and quantization precision of 48 kSps and 24 bit, and then processed in order through: spatial domain processing, frequency domain processing, wavelet domain processing and time domain processing. The output audio of the invention thus enhances the target sound source while suppressing interfering sound sources and environmental noise to a great extent, serving demand-rich scenes such as intelligent offices, smart homes, high-quality conferences and equipment health monitoring.
Claims (5)
1. An FPGA-based acoustic array directional pickup method is characterized by comprising the following steps:
step 1, the UART module of an upper computer sends control characters, mapped to azimuth-pitch angle groups in the range -80° to 80°, to the FPGA; the FPGA synchronously acquires the output words of the 32-element microphone array and obtains 48 kHz, 24-bit, 32CH parallel digital PCM streams through I²S decoding;
then the FPGA performs ASCII decoding on the control characters to obtain the azimuth-pitch angle groups of the target direction and the interference direction, and applies fixed-weight delay-and-sum to the input 32CH parallel digital PCM stream using tapped delay lines of different lengths, mapped from -80°..80° and quantized to -4..4; the order of the delay-and-sum is: first delay-and-sum across the 4 transversely arranged rows of microphones, then delay-and-sum of the 4 row results in the longitudinal direction;
finally, through Mux9_1 the microphone array is aligned with the azimuth-pitch angle groups of the target and interference directions respectively, outputting 8 kHz, 24-bit, 2CH digital PCM streams;
step 2, the 8 kHz, 24-bit, 2CH digital PCM streams output in step 1 undergo processing to improve the system signal-to-interference ratio;
the FPGA applies a windowed short-time Fourier transform with a 24K-point rectangular window to the two paths aligned with the target and interference azimuth-pitch angle groups obtained in step 1, and computes the cross-power spectrum;
the optimal weight of Wiener post-filtering under the traditional MMSE (minimum mean square error) criterion is improved by adding a variable coefficient α to improve the weighting effect; the improved weight is applied to the frequency domain of the target direction, and the weighted result is inverse-transformed to a single time-domain channel to remove non-stationary interference; the transform ordering is Natural Order and the implementation is Radix-2;
finally a 48 kHz, 24-bit, 1CH digital PCM stream is output, i.e. the single-path target-direction output signal S(n) obtained after frequency-domain weighting;
step 3, processing the 48 kHz, 24-bit, 1CH digital PCM stream S(n) output in step 2;
the FPGA uses a decomposition-reconstruction FIR filter bank on the dB3 wavelet basis to estimate the stationary background noise of the single-path, interference-removed signal obtained in step 2; adaptive noise cancellation is performed with the estimated noise as input and the error signal as output, the adaptive filter weights are updated under the LMS (least mean square) criterion, and the filtering is implemented as a 33-tap FIR;
finally a 48 kHz, 24-bit, 1CH digital PCM stream is output, i.e. the single-path target-direction output signal S_af(n) after adaptive noise cancellation;
step 4, processing the 48 kHz, 24-bit, 1CH digital PCM stream output in step 3 to compute the short-time energy of the single-path signal;
the FPGA computes the short-time energy of the single-path signal obtained in step 3 from the signal envelope extracted by a Hilbert filter; the filter response characteristic is the Hilbert transform and the filtering is implemented as an 11-tap FIR; the envelope DC component is computed by 16-order sliding filtering, a 4-level division is performed according to the intensity of the envelope DC component, and the gain time-domain signal of the corresponding level is output through Mux4_1; automatic gain control of the amplitude ensures the volume balance of the output audio;
finally the 48 kHz, 24-bit, 1CH digital PCM stream S_audio(n) is obtained, output directly to the sound card and sounded through an external earphone.
2. The FPGA-based acoustic array directional sound pickup method according to claim 1, wherein the step 1 is as follows:
step 1.1, the control characters fed back by the upper computer are sent to the FPGA over the UART protocol for ASCII decoding, yielding the target position (θ_s, φ_s) and the interference position (θ_i, φ_i) for spatial filtering, where θ and φ denote the azimuth angle and pitch angle respectively, subscript s denotes the target and subscript i the interference;
step 1.2, according to the azimuth and pitch angle information of step 1.1, -80°..80° is mapped to integers quantized to -4..4 that serve as tapped-delay-line lengths, and fixed-weight delay-and-sum is performed with tapped delay lines whose lengths map the different angle information; first, delay-and-sum of the 4 transversely arranged rows of microphones is performed in the spatial domain, giving in total 4 × 2CH signals, four paths each aligned with the target and interference azimuth angles; then delay-and-sum of these four paths in the longitudinal direction gives the 2CH signals aligned with the target and interference pitch angles;
the fixed-weight delayed sum is expressed as y(n) = (1/M)·Σ_{m=1}^{M} w_m·x_m(n − h_m), where w = [1, 1, ..., 1]^T, h_m ∈ {−4, −3, ..., 0, ..., 3, 4} (i.e. h takes different lengths), n denotes the sampling point in the time domain, and M denotes the number of microphone elements used for the current delay-and-sum: 8 for the transverse delay-and-sum and 4 for the longitudinal delay-and-sum;
step 1.3, addressing timing and overflow issues in the hardware design of steps 1.1-1.2: 4 × 7 × 2 adders are used for the transverse azimuth alignment and 3 × 2 adders for the longitudinal pitch alignment; the adders use a tree pipeline, the bit width must be increased by 1 bit after each adder, the output of the accumulated sum is shifted right by 3 bits and then uniformly cast to 24_0_fixed; the UART module receives the control characters and decodes them to ASCII as the select signal of multiplexer Mux9_1 — since h has 9 values, the Mux9_1 output must be selected at each delay-and-sum;
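The fixed-weight delay-and-sum of step 1.2 can be sketched in floating point as follows; this is an illustrative model, using circular shifts in place of the hardware tapped delay lines and ignoring the fixed-point casts of step 1.3:

```python
import numpy as np

def delay_and_sum(x: np.ndarray, delays: np.ndarray) -> np.ndarray:
    """Fixed-weight delay-and-sum over M channels.

    x: (M, N) array of time samples; delays: (M,) integer sample delays
    quantized to -4..4, i.e. the tapped-delay-line lengths.  Implements
    y(n) = (1/M) * sum_m x_m(n - h_m) with unit weights.
    """
    M, _ = x.shape
    y = np.zeros(x.shape[1])
    for m in range(M):
        # np.roll(x_m, h) gives x_m(n - h); a circular shift stands in
        # for the hardware tap delay line at the buffer edges
        y += np.roll(x[m], delays[m])
    return y / M
```

Steering at a source then amounts to choosing `delays` so that all channels line up before the sum, which is what the Mux9_1 selection realizes in hardware.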
3. the FPGA-based acoustic array directional sound pickup method according to claim 2, wherein the step 2 is as follows:
step 2.1, the two aligned signals Aligned_s(n) and Aligned_i(n) obtained in step 1 are windowed and framed with rectangular windows of length 0.5 s (24K points), and the short-time Fourier transform of the same number of points is computed, giving the two frequency-domain signals F_s(k) and F_i(k), where k denotes the sampling point in the frequency domain;
step 2.2, the self-power spectral densities of the two frequency-domain signals obtained in step 2.1 are computed respectively, giving P_s(k) and P_i(k); the self-power spectrum is computed as P(k) = F(k)·F*(k)/N, where (·)* denotes the conjugate of the sequence and N denotes the frame length;
step 2.3, the optimal weight of Wiener post-filtering under the MMSE criterion is improved and written as: w_opt(k) = P_s(k) / (P_s(k) + α·P_i(k)); the value of the coefficient α is adjusted according to the short-time power spectral density of the interference direction — the larger P_i is, the smaller α is taken to reduce the musical noise caused by excessive weighting, and the smaller P_i is, the larger α is taken to further enhance the suppression of the non-stationary interference signal;
step 2.4, after the weight of step 2.3 is obtained, the target-direction frequency-domain transform F_s(k) computed in step 2.1 is weighted as described in step 2.3 and inverse-Fourier-transformed: s(n) = F^{-1}{F_s(k)·w_opt(k)};
step 2.5, addressing timing and overflow issues in the hardware design of steps 2.1-2.4: the FFT and IFFT modules both use Natural Order and Radix-2 operation; the conjugate multiplication is in essence a modulus-square calculation, so 1 adder and 2 multipliers replace the complex multiplier, with Full-Scale output for the self-power spectral densities; the value of α is adjusted by Mux4_1 according to the magnitude of the interference-direction short-time power spectral density; because the IFFT module only supports fixed-point input with an upper limit of 33 bits, all operation results involving multipliers are right-shifted by 16 bits as anti-overflow quantization; the IFFT module output is small and is uniformly cast to 24_0_fixed;
step 2.6, through the calculation of steps 2.1-2.5, the two spatially aligned signals yield, after frequency-domain weighting, the single-path target-direction output signal S(n) with high signal-to-interference ratio.
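A floating-point sketch of one frame of the frequency-domain weighting of steps 2.1-2.4. The patent text does not reproduce the exact improved-Wiener expression; the form w_opt(k) = P_s(k)/(P_s(k) + α·P_i(k)) used below is one plausible reading and should be treated as an assumption:

```python
import numpy as np

def wiener_postfilter(aligned_s: np.ndarray, aligned_i: np.ndarray,
                      alpha: float = 1.0) -> np.ndarray:
    """One frame of frequency-domain weighting (sketch).

    aligned_s / aligned_i: rectangular-windowed time-domain frames steered
    at the target and at the interference direction.  Returns the weighted
    frame back in the time domain, s(n) = IFFT{F_s(k) * w_opt(k)}.
    """
    N = len(aligned_s)
    F_s = np.fft.fft(aligned_s)
    F_i = np.fft.fft(aligned_i)
    P_s = (F_s * np.conj(F_s)).real / N   # self-power spectral density P(k)=F F*/N
    P_i = (F_i * np.conj(F_i)).real / N
    # Assumed improved-Wiener weight; the small bias term avoids 0/0 bins
    w_opt = P_s / (P_s + alpha * P_i + 1e-12)
    return np.fft.ifft(F_s * w_opt).real
```

When the interference-steered frame carries no energy the weight is close to 1 and the target frame passes through unchanged; equal target and interference power with α = 1 halves every bin.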
4. The FPGA-based acoustic array directional sound pickup method according to claim 3, wherein the step 3 is as follows:
step 3.1, wavelet decomposition and reconstruction each need a pair of FIR filter banks; the filter coefficients of the dB3 wavelet basis are respectively:
the low-pass filter for scale-space decomposition, Lo_D = [0.035226, -0.085441, -0.135011, 0.459878, 0.806892, 0.332671]; the high-pass filter for wavelet-space decomposition, Hi_D = [-0.332671, 0.806892, -0.459878, -0.135011, 0.085441, 0.035226]; the low-pass filter for scale-space reconstruction, Lo_R = [0.332671, 0.806892, 0.459878, -0.135011, -0.085441, 0.035226]; the high-pass filter for wavelet-space reconstruction, Hi_R = [0.332671, 0.806892, -0.459878, -0.135011, 0.085441, -0.035226]; all the above coefficients are quantized as 32_31_fixed;
step 3.2, one-layer decomposition: the single-path target-direction signal S(n) obtained in step 2 passes through Lo_D and Hi_D of step 3.1 in turn and is decimated by 2, giving the wavelet-domain coefficients A_1 and D_1;
step 3.3, two-layer decomposition: the coefficient A_1 computed in step 3.2 passes through Lo_D and Hi_D of step 3.1 in turn and is decimated by 2, giving the wavelet-domain coefficients A_2 and D_2;
Step 3.4, threshold noise filtering: let D of a layer 1 The following hard decision and value are made for the quantized value of each sampling and the threshold lambda to extract the estimated noise:
wherein the threshold λ is calculated using the sqtwolog criterion:
wherein sigma 2 The estimated variance of the current sample frame is N, and the N is the frame length;
the same decision and selection are applied to the two-layer coefficients A_2 and D_2, extracting the corresponding estimated noise;
step 3.5, one-layer reconstruction: the estimated noise of the wavelet-space coefficient D_1 is first interpolated by 2 and then restored through Hi_R, giving the time-domain sequence N_D_1(n);
step 3.6, two-layer reconstruction: the estimated noise of the wavelet-space coefficient D_2 and of the scale-space coefficient A_2 is first interpolated by 2 and then passed through Hi_R and Lo_R respectively; afterwards, each is again interpolated by 2 and restored through Lo_R, giving the time-domain sequences N_D_2(n) and N_A_2(n);
step 3.7, time-domain superposition: the estimated noise of the frame computed in steps 3.5-3.6 is superposed:
N(n) = N_D_1(n) + N_D_2(n) + N_A_2(n)
step 3.8, N(n) computed in step 3.7 is used as the input signal and S(n) as the reference signal; the filter output is y_N(n), and with S(n) − y_N(n) as the error signal, adaptive FIR filtering is performed under the LMS criterion; the order of the filter is 32 (33 taps), the initial weights w_af are set to 0, and the weight update formula is given by gradient descent: w_af(n+1) = w_af(n) + 2μ(S(n) − y_N(n))·N(n), with learning step μ = 0.005;
step 3.9, addressing timing and overflow issues in the hardware design of steps 3.1-3.8: the FIR filters used for wavelet decomposition-reconstruction are all 6-tap (5 delay units, 6 multipliers and 5 adders), their coefficients are quantized as 32_31_fixed, and the output of each FIR filter is cast to 24_0_fixed; the variance of the sample frame used for threshold calculation is estimated with the statistical formula, all adders and multipliers involved use Full-Scale output, and the mean calculation involved in the variance estimation is implemented with 16-order moving-average filtering; the hard decision of the threshold noise filtering is implemented by Mux2_1; the adaptive FIR filter is 33-tap (32 delay units, 33 multipliers and 32 adders), its coefficients and output use 24_0_fixed, and the learning step uses 16_16_fixed;
step 3.10, through the calculation of steps 3.1-3.9, the single-channel digital audio PCM stream output after adaptive noise cancellation is finally expressed as S_af(n) = S(n) − y_N(n).
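The adaptive noise cancellation of step 3.8 can be sketched as a sample-by-sample LMS loop; this is a floating-point illustration (the fixed-point quantization of step 3.9 is omitted), and the wavelet-domain noise estimate N(n) is taken as given:

```python
import numpy as np

def lms_anc(S: np.ndarray, Nn: np.ndarray,
            taps: int = 33, mu: float = 0.005) -> np.ndarray:
    """LMS adaptive noise cancellation (sketch of step 3.8).

    S: primary signal (target + residual noise), used as reference.
    Nn: estimated noise, used as filter input.
    Returns the error signal e(n) = S(n) - y_N(n), i.e. S_af(n).
    """
    w = np.zeros(taps)          # initial weights w_af = 0
    buf = np.zeros(taps)        # FIR delay line holding N(n), N(n-1), ...
    out = np.empty(len(S))
    for n in range(len(S)):
        buf = np.roll(buf, 1)
        buf[0] = Nn[n]
        y = w @ buf             # filter output y_N(n)
        e = S[n] - y            # error = cleaned output sample
        w += 2 * mu * e * buf   # gradient-descent weight update
        out[n] = e
    return out
```

With the noise estimate well correlated with the additive noise, the weights converge toward the filter that cancels it, and the error signal converges toward the target alone.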
5. The FPGA-based acoustic array directional sound pickup method according to claim 4, wherein the step 4 is as follows:
step 4.1, the response characteristic of the Hilbert filter is essentially that of an all-pass filter with amplitude-frequency characteristic 1; an even-order Hilbert filter requires strict cutoff of more than -20 dB at 0 and f_s/2, and its output is characterized by a 90° phase delay of the input; a filter with these properties is designed with the Filter-Designer toolbox, with response characteristic Hilbert-Transformation, equiripple FIR implementation and order 10, giving [-0.5338, 0, -0.2263, 0, -0.6412, 0, 0.6412, 0, 0.2263, 0, 0.5338]; the coefficients are quantized as 16_15_fixed and placed in the hardware-designed IP core, an 11-tap FIR filter;
step 4.2, the input of the Hilbert filter is the single-path digital audio PCM stream S_af(n) obtained in step 3, and the output is a pair of orthogonal time-domain IQ sequences S_I(n) and S_Q(n), where S_I(n) is S_af(n) delayed by 5 taps and S_Q(n) is the output of S_af(n) after the 10-tap delay and weighting of the Hilbert filter;
step 4.3, the sum of squares of S_I(n) and S_Q(n) obtained in step 4.2 is computed, giving the envelope of the input signal S_af(n): E_af(n) = S_I²(n) + S_Q²(n);
step 4.4, the DC component E_DC(n) of E_af(n) obtained in step 4.3 is computed with 16-order moving-average filtering and provides the select signal of the Mux for automatic gain control; Mux4_1 is designed to divide the computed E_DC(n) by amplitude into 4 ranges: E_DC(n) > 2^20, 2^16 < E_DC(n) ≤ 2^20, 2^14 < E_DC(n) ≤ 2^16 and E_DC(n) ≤ 2^12; according to the range in which the computed E_DC(n) falls, the amplitude of S_af(n) is shifted right by 4 bits, right by 2 bits, right by 1 bit or left by 1 bit respectively;
step 4.5, addressing timing and overflow issues in the hardware design of steps 4.1-4.4: the FIR filter used for the Hilbert transform is 11-tap (10 delay units, 11 multipliers and 10 adders), its coefficients are quantized as 16_15_fixed, and the output of each FIR filter is cast to 24_0_fixed; the square-sum operation involves a multiplier-adder, and the subsequent moving-average filtering uses Full-Scale output; E_DC(n) is cast to 24_0_fixed by a 16-bit right shift before being input to Mux4_1;
step 4.6, through the calculation of steps 4.1-4.5, the final output signal is the digital audio PCM stream S_audio(n) obtained by automatic gain control of the amplitude of S_af(n); the signal is distributed into the DDR3 memory of the ARM through the AXI (Advanced eXtensible Interface) bus, sent to the upper computer over the TCP/IP protocol, and drives the sound card to play.
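The shift-based automatic gain control of step 4.4 can be sketched as follows. Two assumptions are made and marked in the code: the mean squared sample value stands in for the Hilbert-derived envelope DC component E_DC(n), and the lower bound of the left-shift range is taken as 2^14, since the text leaves a gap between 2^12 and 2^14:

```python
import numpy as np

def agc_gain_shift(e_dc: int) -> int:
    """Map the envelope DC component to a bit-shift amount (Mux4_1 sketch).

    Negative return values mean right shifts (attenuate), positive mean
    left shifts (amplify).  NOTE: the patent specifies the amplify range
    as E_DC <= 2**12; treating everything <= 2**14 as 'quiet' here is an
    assumption that closes the gap in the stated ranges.
    """
    if e_dc > 2 ** 20:
        return -4
    if e_dc > 2 ** 16:
        return -2
    if e_dc > 2 ** 14:
        return -1
    return 1

def apply_agc(x: np.ndarray) -> np.ndarray:
    """Shift-based AGC on a buffer of signed integer PCM samples.

    The mean of the squared samples is a crude stand-in for the envelope
    DC component the patent derives from the Hilbert IQ pair and a
    16-order moving average.
    """
    e_dc = int(np.mean(x.astype(np.int64) ** 2))
    s = agc_gain_shift(e_dc)
    return x << s if s > 0 else x >> -s
```

Loud buffers are attenuated by up to 4 bits and quiet ones amplified by 1 bit, balancing the playback volume without a multiplier.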
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210539422.0A CN114913868B (en) | 2022-05-17 | 2022-05-17 | Acoustic array directional pickup method based on FPGA |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114913868A true CN114913868A (en) | 2022-08-16 |
CN114913868B CN114913868B (en) | 2023-05-05 |
Family
ID=82769138
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210539422.0A Active CN114913868B (en) | 2022-05-17 | 2022-05-17 | Acoustic array directional pickup method based on FPGA |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114913868B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120163624A1 (en) * | 2010-12-23 | 2012-06-28 | Samsung Electronics Co., Ltd. | Directional sound source filtering apparatus using microphone array and control method thereof |
WO2015196729A1 (en) * | 2014-06-27 | 2015-12-30 | 中兴通讯股份有限公司 | Microphone array speech enhancement method and device |
CN108694957A (en) * | 2018-04-08 | 2018-10-23 | 湖北工业大学 | The echo cancelltion design method formed based on circular microphone array beams |
CN110133579A (en) * | 2019-04-11 | 2019-08-16 | 南京航空航天大学 | Ball harmonic order adaptive selection method suitable for spherical surface microphone array sound source direction |
CN111161751A (en) * | 2019-12-25 | 2020-05-15 | 声耕智能科技(西安)研究院有限公司 | Distributed microphone pickup system and method under complex scene |
CN114486259A (en) * | 2022-01-05 | 2022-05-13 | 电子科技大学 | Signal processing method of distributed optical fiber acoustic sensing system for optimizing variational modal decomposition |
Non-Patent Citations (1)
Title |
---|
WU Xiuwei; WU Huijuan; RAO Yunjiang; WU Yu; ZHAO Tian: "Low false-alarm-rate distributed optical fiber fence intrusion monitoring system based on comprehensive decision over multiple wavelet decomposition methods" *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106710601B (en) | Noise-reduction and pickup processing method and device for voice signals and refrigerator | |
CN106782590B (en) | Microphone array beam forming method based on reverberation environment | |
CN102306496B (en) | Noise elimination method, device and system of multi-microphone array | |
CN105869651B (en) | Binary channels Wave beam forming sound enhancement method based on noise mixing coherence | |
CN107479030B (en) | Frequency division and improved generalized cross-correlation based binaural time delay estimation method | |
CN102164328B (en) | Audio input system used in home environment based on microphone array | |
US8958572B1 (en) | Adaptive noise cancellation for multi-microphone systems | |
KR101422368B1 (en) | A method and an apparatus for processing an audio signal | |
CN110517701B (en) | Microphone array speech enhancement method and implementation device | |
JP4724054B2 (en) | Specific direction sound collection device, specific direction sound collection program, recording medium | |
CN110537221A (en) | Two stages audio for space audio processing focuses | |
CN1367977A (en) | Methods and apparatus for improved sub-band adaptive filtering in echo cancellation systems | |
CN106572419A (en) | Stereo sound effect enhancement system | |
CN111312269B (en) | Rapid echo cancellation method in intelligent loudspeaker box | |
CN110111802B (en) | Kalman filtering-based adaptive dereverberation method | |
CN115665616A (en) | FPGA-based microphone array directional pickup method | |
CN114913868B (en) | Acoustic array directional pickup method based on FPGA | |
CN112820312B (en) | Voice separation method and device and electronic equipment | |
CN113744752A (en) | Voice processing method and device | |
US12087267B2 (en) | Method and system for implementing a modal processor | |
CN112489669B (en) | Audio signal processing method, device, equipment and medium | |
CN114724574A (en) | Double-microphone noise reduction method with adjustable expected sound source direction | |
CN107545901B (en) | Signal processing device and signal processing method | |
CN117835135B (en) | Voice audio processing method and system applied to hearing aid | |
CN112185404A (en) | Low-complexity double-end detection method based on sub-band signal-to-noise ratio estimation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||