CN114464216A - Acoustic detection method and device under unmanned driving environment - Google Patents
Acoustic detection method and device under unmanned driving environment
- Publication number
- CN114464216A (application number CN202210118800.8A)
- Authority
- CN
- China
- Prior art keywords
- acoustic
- vehicle
- decomposition
- unmanned
- coefficients
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Traffic Control Systems (AREA)
Abstract
An embodiment of the present application provides an acoustic detection method and an acoustic detection device for an unmanned driving environment. The acoustic detection method comprises the following steps: acquiring an acoustic signal of the environment in which the unmanned vehicle is located at the current moment; performing multi-scale spatial decomposition of the acoustic signal to obtain multiple groups of decomposition coefficients of the acoustic signal in different scale spaces; extracting acoustic features from each group of decomposition coefficients to obtain feature vectors equal in number to the groups of decomposition coefficients; correspondingly inputting the feature vectors into a plurality of trained neural network models, which output classification results for recognizing the sound-producing object in the environment in the different scale spaces; and determining, from the classification results across all scale spaces, whether the sound-producing object is a vehicle of a preset type. The method enables an unmanned vehicle to use acoustic signals to recognize specific sound-emitting vehicles, such as ambulances and police cars, and to carry out corresponding control operations such as avoidance.
Description
Technical Field
The application relates to the technical field of unmanned vehicle control, and in particular to an acoustic detection method and device in an unmanned driving environment.
Background
In an unmanned driving environment, the surroundings of the vehicle are often complex. Current unmanned vehicles generally perceive surrounding objects with millimeter-wave radar, lidar, cameras and the like, all of which are signals of a visual perception system. Under road traffic safety regulations, a moving vehicle must yield to special vehicles performing tasks, such as police cars and ambulances. A visual perception signal alone, however, does not enable the unmanned vehicle to perceive such special vehicles effectively. The acoustic signal, as another sensing signal, also provides rich environmental information and can, under specific conditions, provide the unmanned vehicle with more accurate information about the external environment.
Disclosure of Invention
In view of this, the present application provides an acoustic detection method and apparatus in an unmanned driving environment.
In a first aspect, an embodiment of the present application provides an acoustic detection method in an unmanned driving environment, including:
acquiring an acoustic signal of an environment where the unmanned vehicle is located at the current moment;
carrying out multi-scale spatial decomposition on the acoustic signals to obtain a plurality of groups of decomposition coefficients of the acoustic signals in different scale spaces;
respectively extracting acoustic features from each group of decomposition coefficients to obtain feature vectors equal in number to the groups of decomposition coefficients;
correspondingly inputting the feature vectors into a plurality of trained neural network models, and outputting classification results for recognizing the sound-producing object in the environment in different scale spaces;
and determining whether the sound-producing object is a preset type vehicle according to the classification results in all the scale spaces.
In some embodiments, the performing multi-scale spatial decomposition on the acoustic signal to obtain multiple sets of decomposition coefficients of the acoustic signal in different scale spaces includes:
performing binary wavelet decomposition on the acoustic signals based on frequency to obtain a plurality of groups of wavelet coefficients and a group of residual coefficients of different scale spaces; the sets of wavelet coefficients and the set of residual coefficients together constitute sets of decomposition coefficients of the acoustic signal.
In some embodiments, each neural network model is constructed using a conditional restricted boltzmann machine, and the training process of the plurality of neural network models includes:
taking P+1 feature vectors with classification labels as first training samples, correspondingly inputting them into P+1 first conditional restricted Boltzmann machines, and performing unsupervised training on the P+1 first conditional restricted Boltzmann machines with a preset learning algorithm to obtain the network parameters of each first conditional restricted Boltzmann machine; wherein P is an integer, and the P+1 feature vectors are obtained by feature extraction from P groups of wavelet coefficients and 1 group of residual coefficients;
taking the label vectors corresponding to the P+1 feature vectors as second training samples, correspondingly inputting them into P+1 second conditional restricted Boltzmann machines, and performing unsupervised training on the P+1 second conditional restricted Boltzmann machines with a preset learning algorithm to obtain the network parameters of each second conditional restricted Boltzmann machine;
acquiring the high-order feature information of each of the P+1 feature vectors and of each of the P+1 label vectors;
taking the high-order feature information of the feature vectors and of the label vectors as the input and the output of a recurrent neural network, respectively, and performing supervised training to obtain the network parameters of the recurrent neural network;
and stacking the first conditional restricted Boltzmann machine, the recurrent neural network and the second conditional restricted Boltzmann machine whose network parameters have been obtained, from bottom to top, to construct a deep restricted Boltzmann machine that serves as a trained neural network model.
In some embodiments, the determining whether the sound-generating object is a preset type of vehicle according to the classification results in all scale spaces includes:
carrying out weighted average on the classification results under all scale spaces to obtain a combined classification result;
and selecting the vehicle type with the maximum probability value from the combined classification result to determine whether the sound-producing object is a preset type vehicle.
In some embodiments, the predetermined type of vehicle is any one of a police vehicle, a fire truck, and an ambulance.
In a second aspect, an embodiment of the present application further provides a method for controlling an unmanned vehicle, including:
the acoustic detection method under the unmanned driving environment is adopted to determine whether the sounding object in the environment where the unmanned vehicle is located is a preset type vehicle;
when the sounding object is determined to be a preset type vehicle, calculating the distance from the preset type vehicle to the unmanned vehicle according to the acoustic signals acquired in real time;
and performing an avoidance operation when the preset type vehicle is detected to have moved within a preset distance range directly behind the vehicle.
In some embodiments, upon determining that the sound-generating object is a preset type of vehicle, the method further comprises:
continuously acquiring a plurality of acoustic signals within an environment at subsequent successive time instants;
when it is detected that the plurality of subsequent acoustic signals all contain the specific frequency signal emitted by the preset type of vehicle, determining whether the preset type vehicle is approaching the unmanned vehicle according to the amplitude change of the specific frequency signal;
and if the unmanned vehicle approaches, calculating the distance from the preset type vehicle to the unmanned vehicle according to the acoustic signals acquired in real time.
In a third aspect, an embodiment of the present application further provides an acoustic detection apparatus in an unmanned driving environment, including:
the acquisition module is used for acquiring acoustic signals of the environment where the unmanned vehicle is located at the current moment;
the decomposition module is used for carrying out multi-scale space decomposition on the acoustic signals to obtain a plurality of groups of decomposition coefficients of the acoustic signals in different scale spaces;
the characteristic extraction module is used for respectively extracting acoustic characteristics of each group of decomposition coefficients to obtain a plurality of characteristic vectors with the same number as the decomposition coefficient groups;
the model identification module is used for correspondingly inputting the plurality of characteristic vectors into a plurality of trained neural network models respectively and outputting the classification results of the acoustic object identification in the environment under different scale spaces;
and the determining module is used for determining whether the sound-producing object is a preset type vehicle according to the classification result under all the scale spaces.
In a fourth aspect, an embodiment of the present application further provides an unmanned vehicle. The unmanned vehicle includes a sound sensing unit, a processor and a memory; the sound sensing unit is configured to acquire environmental acoustic signals, the memory stores a computer program, and the processor is configured to execute the computer program to implement the above acoustic detection method in an unmanned driving environment or the unmanned vehicle control method.
In a fifth aspect, an embodiment of the present application further provides a readable storage medium, which stores a computer program, and when the computer program is executed on a processor, the computer program implements the acoustic detection method or the unmanned vehicle control method in the unmanned driving environment.
The embodiment of the application has the following beneficial effects:
according to the acoustic detection method under the unmanned driving environment, the acoustic signal of the environment where the unmanned vehicle is located at the current moment is obtained; carrying out multi-scale spatial decomposition on the acoustic signals to obtain a plurality of groups of decomposition coefficients of the acoustic signals in different scale spaces; respectively extracting acoustic features of each group of decomposition coefficients to obtain a plurality of feature vectors with the same number as the groups of the decomposition coefficients; correspondingly inputting each feature vector into a plurality of trained neural network models respectively, and outputting classification results of the acoustic object recognition in the environment under different scale spaces; and determining whether the sound-producing object is a preset type vehicle according to the classification results in all the scale spaces. According to the method, through multi-resolution analysis of external acoustic signals and further recognition of sounding objects through the neural network model, whether the sounding objects are concerned specific vehicles can be determined more accurately, and more accurate environmental information and basis are provided for automatic avoidance and other operations of unmanned vehicles.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings required by the embodiments are briefly described below. It should be understood that the following drawings show only some embodiments of the present application and should therefore not be regarded as limiting its scope; for those skilled in the art, other related drawings can be obtained from these drawings without inventive effort.
FIG. 1 shows a schematic structural diagram of an unmanned vehicle according to an embodiment of the present application;
FIG. 2 is a flow chart illustrating an acoustic detection method in an unmanned driving environment according to an embodiment of the present application;
FIG. 3 is a system architecture diagram illustrating an acoustic detection method in an unmanned driving environment according to an embodiment of the present application;
FIG. 4 is a flow chart illustrating a training process of a plurality of neural network models of an acoustic detection method in an unmanned driving environment according to an embodiment of the present application;
fig. 5a and 5b respectively show the composition structure of the deep restricted Boltzmann machine and its network structure when the number of RNN hidden layers is 1, according to an embodiment of the present application;
FIG. 6 shows a first flowchart of an unmanned vehicle control method of an embodiment of the present application;
FIG. 7 is a graph illustrating a relationship between training sample labels and a linear distance of a sound source collecting system in the unmanned vehicle control method according to the embodiment of the present application;
FIG. 8 shows a second flowchart of an unmanned vehicle control method of an embodiment of the present application;
fig. 9 is a schematic structural diagram illustrating an acoustic detection device in an unmanned driving environment according to an embodiment of the present application;
fig. 10 is a schematic structural diagram showing the unmanned vehicle control device according to the embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments.
The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
Hereinafter, the terms "including", "having" and their derivatives, as used in the various embodiments of the present application, are intended to indicate particular features, numbers, steps, operations, elements, components or combinations thereof, and should not be construed as excluding the existence or addition of one or more other features, numbers, steps, operations, elements, components or combinations thereof.
Furthermore, the terms "first," "second," "third," and the like are used solely to distinguish one from another and are not to be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which the various embodiments of the present application belong. The terms (such as those defined in commonly used dictionaries) should be interpreted as having a meaning that is consistent with their contextual meaning in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in various embodiments.
Fig. 1 is a schematic structural diagram of an unmanned vehicle according to an embodiment of the present application. Exemplarily, the unmanned vehicle mainly includes a processor 11, a memory 12 and a sound sensing unit 13, where the memory 12 and the sound sensing unit 13 are both connected to the processor 11. The memory 12 stores a corresponding computer program, and the processor 11 is configured to execute the computer program to implement the acoustic detection method in an unmanned driving environment, or the unmanned vehicle control method, of the embodiments of the present application. By performing multi-resolution analysis of the external acoustic signal and then recognizing the sound-producing object with the neural network models, it can be determined more accurately whether the sound-producing object is a specific vehicle of interest, providing more accurate environmental information and a basis for operations such as automatic avoidance by the unmanned vehicle.
The processor 11 may be an integrated circuit chip having signal processing capability. The processor may be a general-purpose processor including at least one of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU) and a Network Processor (NP); it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The general-purpose processor may be a microprocessor, or any conventional processor that implements or executes the methods, steps and logic blocks disclosed in the embodiments of the present application.
The memory 12 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like. The memory is used for storing a computer program, and the processor executes the computer program accordingly after receiving an execution instruction.
The sound sensing unit 13 is mainly used for collecting environmental sound information around the unmanned vehicle and transmitting the environmental sound information to the vehicle control system, so that the vehicle control system performs corresponding operations according to the environmental sound information. For example, the sound sensing unit 13 may include, but is not limited to, a sound sensor, such as a MIC (microphone), a sound pickup array, and the like, provided inside and/or outside the vehicle. It should be understood that there may be more than one sound sensing unit in the embodiments of the present application, and when there are multiple sound sensing units, synchronous sampling may be performed to obtain sound signals and the like in different directions of the vehicle at the same time, so that the vehicle can determine the direction, distance, and the like of the same sound source.
Based on the unmanned vehicle with the structure, the embodiment of the application provides an acoustic detection method under the unmanned driving environment. The method is described in detail below.
Referring to fig. 2, exemplarily, the acoustic detection method in the unmanned driving environment includes steps S110 to S150:
and S110, acquiring acoustic signals of the environment where the unmanned vehicle is located at the current moment.
For example, while the unmanned vehicle is driving, the acoustic signal of the environment in which the vehicle is located can be collected in real time by the sound sensing unit 13 arranged on the vehicle, and the vehicle's control system then performs acoustic analysis of the signal to extract relevant information about the outside world from it.
And S120, carrying out multi-scale spatial decomposition on the acoustic signal to obtain multiple groups of decomposition coefficients of the acoustic signal in different scale spaces.
Multi-scale spatial decomposition, also called multi-resolution decomposition, means decomposing a signal over different scale spaces to obtain its expansion coefficients at different scales (resolutions). It is understood that the original signal can be restored from these wavelet expansion coefficients, and that the expansion results at different scales allow the target frequencies to be observed and analyzed at different levels of coarseness.
In an unmanned vehicle, the collected acoustic signals of the external environment are often complex: they generally contain sounds of different frequencies, or of the same frequency, emitted by various objects, and at any given moment these sounds may be superimposed or masked, which makes recognizing the sound-producing object from the sounds more difficult. For this reason, the present embodiment proposes to perform subband decomposition of the acoustic signal with different scale functions to obtain mutually disjoint subbands, so that features can be extracted from these subbands and the sound-producing object further classified.
For example, in one embodiment, the acoustic signal f(t) acquired at time t may be subjected to a frequency-based dyadic wavelet decomposition to obtain several groups of wavelet coefficients and one group of residual coefficients in different scale spaces. The P groups of wavelet coefficients can be expressed as d_j f(t), j = 1, 2, …, P, where j denotes the j-th scale space, and the group of residual coefficients can be expressed as a_P f(t). The P groups of wavelet coefficients d_j f(t) and the group of residual coefficients a_P f(t) together form the multiple groups of decomposition coefficients of the acoustic signal. It is understood that the above dyadic wavelet decomposition can be realized by the Mallat algorithm; for how the Mallat algorithm calculates the residual coefficients and the wavelet coefficients of each scale space, reference can be made to the related literature, so no description is given here.
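As a concrete illustration, the following is a minimal sketch of such a decomposition using PyWavelets; the wavelet family ('db4'), the depth P = 4 and the sampling rate are assumptions for illustration, since the embodiment does not fix them.

```python
# A minimal sketch of the frequency-based dyadic wavelet decomposition using
# PyWavelets. The wavelet family ('db4') and depth P are assumptions; the
# embodiment does not specify them.
import numpy as np
import pywt

def multiscale_decompose(signal: np.ndarray, P: int = 4, wavelet: str = "db4"):
    """Decompose a 1-D acoustic signal into P groups of wavelet (detail)
    coefficients and one group of residual (approximation) coefficients,
    i.e. P + 1 coefficient groups in total."""
    # wavedec returns [a_P, d_P, d_{P-1}, ..., d_1]
    coeffs = pywt.wavedec(signal, wavelet, level=P)
    residual, details = coeffs[0], coeffs[1:]
    return details + [residual]          # P + 1 groups

# Example: one second of a synthetic 8 kHz signal
fs = 8000
t = np.arange(fs) / fs
f_t = np.sin(2 * np.pi * 440 * t) + 0.1 * np.random.randn(fs)
groups = multiscale_decompose(f_t, P=4)
print([g.shape for g in groups])         # 5 coefficient groups
```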
Then, after performing multi-resolution analysis on the acoustic signal, a plurality of sets of decomposition coefficients may be obtained, as shown in fig. 3, such as including the above P +1 sets of decomposition coefficients, and then performing feature extraction on each set of decomposition coefficients to obtain frequency feature information at different resolutions.
And S130, respectively extracting acoustic features from each group of decomposition coefficients to obtain feature vectors equal in number to the groups of decomposition coefficients.
The acoustic features may include, but are not limited to, signal intensity, loudness, mel-frequency cepstral coefficients, line spectrum pairs, and the energy in different frequency bands. The features extracted from the same group of decomposition coefficients are then pooled with statistical functions and concatenated to obtain the feature vector for that scale space. The statistical functions may include the mean, variance, linear regression coefficients, standard deviation, kurtosis, slope, quartiles, range, and the like.
When performing feature extraction, frame-level, segment-level or utterance-level features can be selected according to actual requirements. For example, in one embodiment, for the feature extraction in step S130, frame-level acoustic feature extraction may be performed to obtain the corresponding features.
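The sketch below illustrates one plausible realization of this step for a single coefficient group, reusing the coefficient groups from the previous sketch; the chosen frame-level features (short-time energy, zero-crossing rate), the frame size and the pooled statistics are assumptions, since the embodiment only lists examples.

```python
# A sketch of frame-level feature extraction and statistical pooling for one
# coefficient group. Frame length, hop and pooled statistics are assumptions.
import numpy as np
from scipy.stats import kurtosis, skew

def frame(x: np.ndarray, size: int = 256, hop: int = 128) -> np.ndarray:
    n = 1 + max(0, (len(x) - size) // hop)
    return np.stack([x[i * hop:i * hop + size] for i in range(n)])

def feature_vector(coeffs: np.ndarray) -> np.ndarray:
    frames = frame(coeffs)
    # frame-level acoustic features: short-time energy and zero-crossing rate
    energy = (frames ** 2).sum(axis=1)
    zcr = (np.diff(np.sign(frames), axis=1) != 0).mean(axis=1)
    feats = []
    for f in (energy, zcr):
        # statistical functionals pooled over frames, then concatenated
        feats += [f.mean(), f.var(), f.std(), kurtosis(f), skew(f),
                  *np.percentile(f, [25, 50, 75])]
    return np.asarray(feats)

# one feature vector per coefficient group -> P + 1 vectors in total
vectors = [feature_vector(g) for g in groups]
```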
And S140, correspondingly inputting each feature vector into the trained neural network models, and outputting the classification results for recognizing the sound-producing object in the environment in different scale spaces.
In an unmanned environment, acoustic event detection usually comprises two parts: feature learning and pattern recognition. In some existing acoustic event detection methods these two parts are handled independently; for pattern recognition, for instance, a Support Vector Machine (SVM) or a Gaussian Mixture Model (GMM) is often used, so the acoustic features do not effectively exploit their correlation with the recognition task. The embodiment of the application therefore proposes an acoustic event detection method that combines wavelet decomposition with a deep restricted Boltzmann machine, unifying feature learning and pattern recognition in acoustic event detection so as to make full use of the correlation between acoustic features and the recognition task and acquire more accurate external environment signals.
Exemplarily, each neural network model has the same structure and is constructed with conditional restricted Boltzmann machines as the input and output layers of the model. The Conditional Restricted Boltzmann Machine (CRBM) is a stochastic neural network that usually contains a visible layer and a hidden layer; neurons within the same layer are independent of each other, while neurons in different layers are connected to each other. During training, network parameters such as the connection weights and offset vectors between the visible and hidden layers need to be calculated. A Recurrent Neural Network (RNN) is used as the intermediate layer connecting the input layer and the output layer.
It can be appreciated that, owing to ambient noise, the acoustic detection of a vehicle in an unmanned driving environment is a very challenging task. Compared with the ambient noise, the acoustic signal of a vehicle has better temporal stability, so the RNN is used as the intermediate layer of the deep restricted Boltzmann machine to model, over time, the transformation between the high-order feature information of the feature vectors and that of the tag vectors, thereby improving the recognition accuracy for vehicle acoustic signals.
In one embodiment, the neural network models can be pre-constructed and trained, as shown in fig. 4, the training process can include the following sub-steps S141 to S145:
S141, taking P+1 feature vectors with classification labels as first training samples, correspondingly inputting them into P+1 first conditional restricted Boltzmann machines, and performing unsupervised training on each with a preset learning algorithm to obtain the network parameters of each first conditional restricted Boltzmann machine; wherein P is an integer, and the P+1 feature vectors are obtained by feature extraction and statistical calculation from P groups of wavelet coefficients and 1 group of residual coefficients.
The classification labels can be annotated in advance when collecting training samples, and mainly indicate which category the sound-producing object belongs to. For an unmanned vehicle, given the need to yield to special task-performing vehicles in time, the classification labels are associated with such vehicles; acoustic signals may be marked, for example but not exclusively, as police car, fire truck or ambulance. It is understood that the classification labels correspond to the preset types of vehicle mentioned later; that is, to recognize a preset type of vehicle, training samples containing the specific frequency signal emitted by that type of vehicle should be used for classification labeling and training.
Since the first conditional restricted Boltzmann machine operates on feature vectors, it is hereinafter referred to as the feature CRBM. For training the feature CRBM, the preset learning algorithm may include, but is not limited to, the Contrastive Divergence (CD) algorithm, the simulated annealing algorithm and the like; no limitation is made here, and the choice can follow actual requirements.
Suppose the p-th feature vector is denoted v_p(t), where p = 1, 2, …, P+1 and t denotes the t-th time. Taking the contrastive divergence method as an example, for step S141, based on the maximum likelihood estimation principle, unsupervised training can be performed on the P+1 feature CRBMs with the contrastive divergence method to obtain P+1 groups of parameters, denoted θ_p = {W_p, B_p, A_p}, where W_p is the connection weight matrix between the visible layer nodes v_p(t) and the hidden layer nodes h_p(t) in the p-th feature CRBM, B_p is the directed connection weight matrix from the visible layer nodes v_p(t-1) at time t-1 to the hidden layer nodes h_p(t), and A_p is the connection weight matrix between the visible layer nodes v_p(t) at the t-th time and the visible layer nodes v_p(t-1) at time t-1.
It will be appreciated that the nodes of the feature CRBM correspond one-to-one to the dimensions of the feature vector. In the embodiment of the application, one conditional restricted Boltzmann machine is used for the decomposition coefficients of each scale space, and the final classification result is then determined by combining the classification results of all the scale spaces, which helps improve the recognition accuracy.
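For illustration, the following is a compact sketch of unsupervised CRBM training with one step of contrastive divergence (CD-1); the Gaussian visible units, the learning rate and the initialization are assumptions, and the parameter names mirror W_p, A_p and B_p above.

```python
# A compact sketch of unsupervised CRBM training with one step of
# contrastive divergence (CD-1). Gaussian visible units and the learning
# rate are assumptions; notation follows the description above:
# W (visible-hidden), B (past visible -> hidden), A (past visible -> visible).
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

class CRBM:
    def __init__(self, n_vis: int, n_hid: int, lr: float = 1e-3):
        self.W = 0.01 * rng.standard_normal((n_vis, n_hid))
        self.A = 0.01 * rng.standard_normal((n_vis, n_vis))
        self.B = 0.01 * rng.standard_normal((n_vis, n_hid))
        self.b = np.zeros(n_vis)   # visible bias
        self.c = np.zeros(n_hid)   # hidden bias
        self.lr = lr

    def cd1(self, v_t: np.ndarray, v_prev: np.ndarray) -> None:
        # dynamic biases conditioned on the previous-time visible vector
        b_dyn = self.b + v_prev @ self.A
        c_dyn = self.c + v_prev @ self.B
        # positive phase
        h0 = sigmoid(v_t @ self.W + c_dyn)
        # one Gibbs step (mean-field reconstruction of Gaussian visibles)
        v1 = h0 @ self.W.T + b_dyn
        h1 = sigmoid(v1 @ self.W + c_dyn)
        # parameter updates from the positive/negative statistics
        self.W += self.lr * (np.outer(v_t, h0) - np.outer(v1, h1))
        self.b += self.lr * (v_t - v1)
        self.c += self.lr * (h0 - h1)
        self.A += self.lr * np.outer(v_prev, v_t - v1)
        self.B += self.lr * np.outer(v_prev, h0 - h1)
```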
And S142, taking the label vectors corresponding to the P+1 feature vectors as second training samples, correspondingly inputting them into the P+1 second conditional restricted Boltzmann machines, and performing unsupervised training on each with a preset learning algorithm to obtain the network parameters of each second conditional restricted Boltzmann machine.
It is understood that the first conditional restricted Boltzmann machine operates on feature vectors, while the second operates on tag vectors (hereinafter referred to as the tag CRBM); "first" and "second" merely distinguish the conditional restricted Boltzmann machines for the different vectors.
In step S142, the unsupervised training of the tag CRBM may follow the training of the feature CRBM above. For example, based on the maximum likelihood estimation principle, the contrastive divergence method is used to train each tag CRBM without supervision, obtaining the corresponding network parameters θ'_p = {W'_p, B'_p, A'_p}, where W'_p is the directed connection weight matrix between the visible layer nodes y_p(t) and the hidden layer vector g_p(t) of the p-th tag CRBM, B'_p is the directed connection weight matrix from the visible layer vector y_p(t-1) at time t-1 to the hidden layer vector g_p(t), and A'_p is the directed connection weight matrix from the visible layer vector y_p(t-1) to y_p(t).
S143, acquiring the high-order feature information of each of the P+1 feature vectors and of each of the P+1 tag vectors.
For the p-th feature CRBM, given the feature vector v_p(t) at time t and the feature vector v_p(t-1) at time t-1, the high-order feature information of the feature vectors can be obtained with the mean-field approximation:

h_p(t) = s(W_p^T v_p(t) + B_p^T v_p(t-1) + c_p)

where s is the sigmoid function, and b_p and c_p are the bias vectors of the visible layer nodes and the hidden layer nodes in the feature CRBM, respectively.
Similarly, for the p-th tag CRBM, given the tag vector y_p(t) at time t and the tag vector y_p(t-1) at time t-1, the high-order feature information of the tag vectors can be obtained with the mean-field approximation:

g_p(t) = s(W'_p^T y_p(t) + B'_p^T y_p(t-1) + c'_p)
It should be understood that the execution order of the steps S141, S142, and S143 is not limited, and may be executed simultaneously, or executed according to a set sequence.
And S144, taking the high-order feature information of the feature vectors and of the label vectors as the input and the output of the recurrent neural network, respectively, and performing supervised training to obtain the network parameters of the recurrent neural network.
The Recurrent Neural Network (RNN) is mainly used to learn the mapping between the high-order feature information of the feature vectors and that of the label vectors. Exemplarily, the output h_p(t) of the feature CRBM and the input g_p(t) of the tag CRBM are used as the input and output of the RNN, respectively, for supervised training. In one embodiment, the RNN acts as the middle layer of the deep restricted Boltzmann machine, and its objective function may take the general form

J_p = Σ_t ‖ g_p(t) − f_p(h_p(t)) ‖² + λ_p ‖ Θ_p ‖²

where f_p denotes the RNN mapping, Θ_p collects the weights of the recurrent neural network (e.g., input-to-hidden and hidden-to-output matrices U_p, V_p) and the corresponding offsets (e_p, q_p), and λ_p is a regularization factor. When the RNN has only one hidden layer, f_p(h_p(t)) can be written as

f_p(h_p(t)) = V_p s(U_p h_p(t) + R_p z_p(t−1) + e_p) + q_p

where z_p(t−1) is the RNN hidden state at the previous time, R_p is the recurrent weight matrix, and s is the sigmoid function.
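A minimal sketch of this middle layer follows, under the assumed general form above: a single-hidden-layer RNN maps the high-order feature information of the feature CRBM to a prediction of the high-order feature information of the tag CRBM. The weight names U, R, V and the offsets e, q are the assumed ones, not those of the patent.

```python
# A sketch of the single-hidden-layer RNN middle layer: it maps the
# high-order feature information h_t of a feature CRBM to a prediction of
# the high-order feature information of the corresponding tag CRBM.
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def rnn_forward(H: np.ndarray, U, R, V, e, q):
    """H: (T, n_in) sequence of feature-CRBM hidden activations.
    Returns (T, n_out) predictions of the tag-CRBM hidden activations."""
    z = np.zeros(R.shape[0])
    out = []
    for h_t in H:
        z = sigmoid(U @ h_t + R @ z + e)   # recurrent hidden state
        out.append(V @ z + q)              # linear read-out
    return np.stack(out)

def objective(pred, target, params, lam=1e-4):
    # squared error plus L2 regularization, as in the objective above
    reg = sum((p ** 2).sum() for p in params)
    return ((pred - target) ** 2).sum() + lam * reg
```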
s145, stacking the first conditional restricted Boltzmann machine, the recurrent neural network and the second conditional restricted Boltzmann machine which obtain the network parameters from bottom to top, and building up the depth restricted Boltzmann machine.
Exemplarily, after the network parameters of the feature CRBM, the RNN and the tag CRBM are obtained, as shown in fig. 5a, the trained feature CRBM, RNN and tag CRBM are stacked from bottom to top, realizing the mapping from acoustic features to object labels. For example, fig. 5b shows the network structure of the deep restricted Boltzmann machine when the RNN has one hidden layer and p = 1.
In fig. 5b, the top label layer is modeled with a normal distribution, y_p(t) ~ N(W'_p g_p(t) + a_p, σ_p²), where N denotes the normal distribution function, W'_p is the directed connection weight matrix between the visible layer nodes and the hidden layer vector of the p-th tag CRBM, a_p is the offset of the p-th tag CRBM, and σ_p² is the variance of the normal distribution N. This formula can be estimated based on the least squares method, with the unknown parameters solved by an alternating iteration method. It will be appreciated that the training process for each neural network model is the same.
As for step S140, as shown in fig. 3, during actual detection it is only necessary to input the feature vectors extracted at the corresponding times into the respective neural network models; the corresponding classification label vectors are then output, i.e., the classification results for recognizing the sound-producing object are obtained.
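The detection-time pass for one scale space might then look like the sketch below: a bottom-up mean-field pass through the feature CRBM, the RNN mapping, and a top-down projection through the tag CRBM. All parameter names are the assumed ones introduced above, not those of the patent.

```python
# A sketch of the detection-time pipeline for one scale space. feat_crbm and
# lab_crbm are assumed instances of the CRBM class sketched earlier; rnn is
# any callable mapping feature-CRBM hidden activations to tag-CRBM ones.
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def classify_one_scale(v_t, v_prev, feat_crbm, rnn, lab_crbm):
    # bottom: high-order feature information of the feature vector
    h = sigmoid(v_t @ feat_crbm.W + feat_crbm.c + v_prev @ feat_crbm.B)
    # middle: RNN predicts the tag CRBM's hidden activations
    g = rnn(h)
    # top: project back to the label space of the tag CRBM
    y = sigmoid(g @ lab_crbm.W.T + lab_crbm.b)   # components in [0, 1]
    return y
```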
And S150, determining whether the sound-producing object is a preset type vehicle according to the classification result in all the scale spaces.
In one embodiment, step S150 includes: and carrying out weighted average on the classification results under all the scale spaces to obtain a combined classification result. And then, selecting the vehicle type corresponding to the maximum value from the combined classification result to determine whether the sounding object is a preset type vehicle.
Each deep restricted Boltzmann machine outputs a predictive classification vector that contains the probabilities of the sound-producing object being each type of vehicle. It will be appreciated that these vehicle types are all designated vehicles of interest and may include, but are not limited to, police cars, fire trucks, ambulances and the like. Each component of a label vector takes a value in [0, 1]. The outputs of the deep restricted Boltzmann machines can then be weighted-averaged to obtain a combined label output for the sound-producing object. Different weights can be assigned to the machines at different scales; the specific values can be set according to the actual situation and are not limited here.
For example, the probability values for the same vehicle type output by the deep restricted Boltzmann machines are weighted-averaged to obtain a combined average label vector, and the type with the highest probability is selected as the final classification result. Assuming the preset type is vehicle type A, if the probability value of type A is the maximum in the combined classification result, the sound-producing object is determined to indeed be a preset type vehicle. Further, if an ambulance, a fire truck or the like is determined, the unmanned vehicle may go on to decide whether to perform avoidance and other related operations.
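A minimal sketch of this combination step follows; the class list, the uniform default weights and the vector shapes are assumptions.

```python
# A sketch of combining the per-scale classification vectors by weighted
# averaging and picking the most probable vehicle type.
import numpy as np

CLASSES = ["police car", "fire truck", "ambulance", "other"]  # assumed

def combine(label_vectors, weights=None):
    L = np.stack(label_vectors)               # (P+1, n_classes), each in [0, 1]
    w = np.ones(len(L)) / len(L) if weights is None else np.asarray(weights)
    combined = w @ L / w.sum()                # weighted average per class
    k = int(np.argmax(combined))
    return CLASSES[k], combined[k]

# e.g. ("ambulance", 0.83) -> treated as a preset-type vehicle if it matches
```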
In the embodiment of the application, unsupervised CRBM training is performed separately on the feature vectors of the different scale spaces and on their corresponding label vectors, the high-order feature information of both is extracted, and the high-order feature information of the feature vectors and of the label vectors is then used as the input and output of the RNN, respectively, which improves the recognition accuracy for vehicle acoustic features.
Referring to fig. 6, based on the acoustic detection method in the unmanned driving environment, an embodiment of the present application further provides a method for controlling an unmanned vehicle, so as to execute a corresponding control operation when it is determined that a sound-generating object in the environment is a preset type of vehicle.
Exemplarily, the unmanned vehicle control method comprises steps S210 to S230:
step S210, using the acoustic detection method in the unmanned driving environment of the above embodiment to determine whether the sound object in the environment of the unmanned vehicle is a preset type vehicle.
For example, during the operation of the unmanned vehicle, the acoustic detection method of the above embodiment may be used to determine whether a specific vehicle, such as a police car or an ambulance, exists according to the sound signal obtained in real time, and if so, step S220 is executed.
Step S220, when the sound-producing object is determined to be a preset type vehicle, calculating a distance from the preset type vehicle to the unmanned vehicle according to the acoustic signal acquired in real time.
Regarding the positioning of the vehicle, in one embodiment a subspace-based sound source localization method may be employed, while the distance information is determined from the range of label values. For example, when building training samples, each sample is labeled using a step function as shown in fig. 7, where mainly the linear distance is collected. During labeling, the sound source azimuth can vary from 0 to 360 degrees in steps of 10 degrees.
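A minimal sketch of the step-function distance labeling described above follows; the distance bin edges and label values are assumptions, since fig. 7 is not reproduced here.

```python
# A sketch of step-function distance labels: the continuous linear distance
# from the sound source to the collecting system is quantized into a small
# set of label values. Bin edges and label values are assumed.
import numpy as np

DIST_EDGES = np.array([10.0, 25.0, 50.0, 100.0])   # metres, assumed
LABELS = np.array([1.0, 0.75, 0.5, 0.25, 0.0])     # assumed label values

def distance_label(distance_m: float) -> float:
    """Map a linear distance to its step-function training label."""
    return float(LABELS[np.searchsorted(DIST_EDGES, distance_m)])

# azimuth labels: 0-360 degrees in 10-degree steps, as in the description
AZIMUTHS = np.arange(0, 360, 10)
```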
And step S230, when the preset type vehicle is detected to move to the preset distance range right behind the vehicle, performing avoidance operation. For example, if it is detected that a specific vehicle that is performing a task moves to a position directly behind the unmanned vehicle and is a short distance away from the unmanned vehicle, the avoidance operation is immediately started to improve the intelligence of the unmanned vehicle.
As an alternative, in some cases the specific sound emitted by a special vehicle may be recognized only at a long distance and may disappear soon afterwards, for example when the special vehicle turns into another area. In such cases the unmanned vehicle does not need to perform real-time distance estimation and subsequent avoidance, which saves resource occupation and preserves cruising range.
Exemplarily, as shown in fig. 8, when the sound generating object is determined to be a preset type of vehicle, the acoustic detection method in the unmanned driving environment further includes steps S240 to S250:
s240, a plurality of acoustic signals in the environment at subsequent consecutive time instances are continuously acquired.
And S250, when it is detected that the subsequent acoustic signals all contain the specific frequency signal emitted by the preset type of vehicle, determining whether the preset type vehicle is approaching the unmanned vehicle according to the amplitude change of the specific frequency signal; if it is approaching, step S230 is executed.
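The following sketch illustrates one plausible realization of steps S240 to S250: the amplitude of an assumed characteristic siren frequency is tracked over consecutive signals, and an approach is flagged when it grows consistently. The frequency f0, the bandwidth and the growth threshold are all assumptions.

```python
# A sketch of steps S240-S250: track the amplitude of the special vehicle's
# characteristic frequency over consecutive acoustic signals and flag an
# approach when the amplitude grows consistently.
import numpy as np

def band_amplitude(signal: np.ndarray, fs: float, f0: float, bw: float = 50.0):
    spec = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    band = (freqs > f0 - bw) & (freqs < f0 + bw)
    return spec[band].max() if band.any() else 0.0

def is_approaching(signals, fs: float, f0: float = 700.0, growth: float = 1.1):
    amps = [band_amplitude(s, fs, f0) for s in signals]
    # approaching if the band amplitude rises monotonically by >10% overall
    rising = all(a2 >= a1 for a1, a2 in zip(amps, amps[1:]))
    return rising and amps[-1] > growth * amps[0]
```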
Fig. 9 is a schematic structural diagram of an acoustic detection apparatus 100 in an unmanned driving environment according to an embodiment of the present disclosure.
Exemplarily, the acoustic detection apparatus 100 in the unmanned driving environment includes:
the acquiring module 110 is configured to acquire an acoustic signal of an environment where the unmanned vehicle is located at the current time;
the decomposition module 120 is configured to perform multi-scale spatial decomposition on the acoustic signal to obtain multiple groups of decomposition coefficients of the acoustic signal in different scale spaces;
a feature extraction module 130, configured to perform acoustic feature extraction on each group of decomposition coefficients to obtain feature vectors equal in number to the groups of decomposition coefficients;
the model identification module 140 is configured to correspondingly input the feature vectors into a plurality of trained neural network models, and output classification results of the acoustic object identification in the environments in different scale spaces;
and the determining module 150 is configured to determine whether the sound-generating object is a preset type vehicle according to the classification results in all scale spaces.
It is understood that the device of the present embodiment corresponds to the acoustic detection method in the driverless driving environment of the above embodiment, and the alternatives in the method are also applicable to the present embodiment, so that the description is not repeated here.
Fig. 10 is a schematic structural diagram of an unmanned vehicle control device 200 according to an embodiment of the present application. Exemplarily, the unmanned vehicle control device 200 includes:
the acoustic detection module 210 is configured to determine whether a sound-generating object in an environment where the unmanned vehicle is located is a preset type vehicle by using the acoustic detection method in the unmanned driving environment according to the above embodiment;
the distance obtaining module 220 is configured to, when it is determined that the sounding object is a preset type vehicle, calculate a distance from the preset type vehicle to the unmanned vehicle according to the acoustic signal obtained in real time;
and the avoidance control module 230 is configured to perform an avoidance operation when it is detected that the preset type of vehicle moves within a preset distance range right behind the vehicle.
It is to be understood that the apparatus of the present embodiment corresponds to the unmanned vehicle control method of the above-described embodiment, and the alternatives in the method are also applicable to the present embodiment, and therefore, the description will not be repeated here.
The present application also provides a readable storage medium for storing the computer program used in the above unmanned vehicle.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, each functional module or unit in each embodiment of the present application may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solution of the present application, or the portions thereof that substantially contribute to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a smart phone, a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application.
Claims (10)
1. An acoustic detection method in an unmanned driving environment, comprising:
acquiring an acoustic signal of an environment where the unmanned vehicle is located at the current moment;
carrying out multi-scale spatial decomposition on the acoustic signals to obtain a plurality of groups of decomposition coefficients of the acoustic signals in different scale spaces;
respectively extracting acoustic features from each group of decomposition coefficients to obtain a plurality of feature vectors equal in number to the groups of decomposition coefficients;
correspondingly inputting the plurality of feature vectors into a plurality of trained neural network models respectively, and outputting classification results of the acoustic object recognition in the environment under different scale spaces;
and determining whether the sound-producing object is a preset type vehicle according to the classification results in all the scale spaces.
2. The method according to claim 1, wherein the performing multi-scale spatial decomposition on the acoustic signal to obtain multiple sets of decomposition coefficients of the acoustic signal in different scale spaces comprises:
performing binary wavelet decomposition on the acoustic signals based on frequency to obtain a plurality of groups of wavelet coefficients and a group of residual coefficients of different scale spaces; the sets of wavelet coefficients and the set of residual coefficients together constitute sets of decomposition coefficients of the acoustic signal.
3. The acoustic detection method in the unmanned driving environment of claim 2, wherein each neural network model is constructed by using a conditional restricted boltzmann machine, and the training process of the plurality of neural network models comprises:
taking P +1 feature vectors with classification labels as first training samples, respectively and correspondingly inputting the first training samples into P +1 first conditional restricted Boltzmann machines, and respectively carrying out unsupervised training on the P +1 first conditional restricted Boltzmann machines by utilizing a preset learning algorithm to obtain network parameters of each first conditional restricted Boltzmann machine; wherein, P is an integer, and the P +1 characteristic vectors are obtained by characteristic extraction of P groups of wavelet coefficients and 1 group of residual coefficients;
taking the label vectors corresponding to the P +1 eigenvectors as second training samples, respectively and correspondingly inputting the second training samples into P +1 second conditional restricted Boltzmann machines, and respectively carrying out unsupervised training on the P +1 second conditional restricted Boltzmann machines by utilizing a preset learning algorithm to obtain network parameters of each second conditional restricted Boltzmann machine;
acquiring high-order feature information of each feature vector in the P +1 feature vectors and high-order feature information of each tag vector in the P +1 tag vectors;
respectively taking the high-order characteristic information of the characteristic vector and the label vector as the input and the output of a recurrent neural network, and carrying out supervision training to obtain network parameters of the recurrent neural network;
and stacking the first conditional restricted Boltzmann machine, the recurrent neural network and the second conditional restricted Boltzmann machine whose network parameters have been obtained, from bottom to top, to construct a deep restricted Boltzmann machine serving as a trained neural network model.
4. The method according to claim 1, wherein the determining whether the sound-generating object is a preset type of vehicle according to the classification result in all scale spaces comprises:
carrying out weighted average on the classification results under all scale spaces to obtain a combined classification result;
and selecting the vehicle type with the maximum value from the combined classification result to determine whether the sound-producing object is a preset type vehicle.
5. The acoustic detection method in an unmanned driving environment of any one of claims 1 to 4, wherein the predetermined type of vehicle is any one of a police vehicle, a fire truck and an ambulance.
6. A method of controlling an unmanned vehicle, comprising:
adopting the acoustic detection method in the unmanned driving environment according to any one of claims 1 to 5 to determine whether the sounding object in the environment where the unmanned vehicle is located is a preset type vehicle;
when the sounding object is determined to be a preset type vehicle, calculating the distance from the preset type vehicle to the unmanned vehicle according to the acoustic signals acquired in real time;
and when the preset type vehicle is detected to move to the preset distance range right behind the vehicle, performing avoidance operation.
7. The unmanned vehicle control method of claim 6, wherein upon determining that the sound-generating object is a preset type of vehicle, the method further comprises:
continuously acquiring a plurality of acoustic signals within an environment at subsequent successive time instants;
when it is detected that the plurality of subsequent acoustic signals all contain specific frequency signals sent by the preset type of vehicle, determining whether the preset type of vehicle approaches to the unmanned vehicle according to the amplitude change of the specific frequency signals;
and if the unmanned vehicle approaches, calculating the distance from the preset type vehicle to the unmanned vehicle according to the acoustic signals acquired in real time.
8. An acoustic detection device under an unmanned driving environment, comprising:
the acquisition module is used for acquiring acoustic signals of the environment where the unmanned vehicle is located at the current moment;
the decomposition module is used for carrying out multi-scale space decomposition on the acoustic signals to obtain a plurality of groups of decomposition coefficients of the acoustic signals in different scale spaces;
the characteristic extraction module is used for respectively extracting acoustic characteristics of each group of decomposition coefficients to obtain a plurality of characteristic vectors with the same number as the decomposition coefficient groups;
the model identification module is used for correspondingly inputting the plurality of characteristic vectors into a plurality of trained neural network models respectively and outputting the classification results of the acoustic object identification in the environment under different scale spaces;
and the determining module is used for determining whether the sound-producing object is a preset type vehicle according to the classification result under all the scale spaces.
9. An unmanned vehicle, comprising a sound sensing unit for obtaining ambient acoustic signals, a processor and a memory, the memory storing a computer program, the processor being configured to execute the computer program to implement the method for acoustic detection in an unmanned driving environment of any one of claims 1-5 or the method for controlling an unmanned vehicle of any one of claims 6-7.
10. A readable storage medium, characterized in that it stores a computer program which, when executed on a processor, implements the method of acoustic detection in an unmanned driving environment according to any of claims 1-5 or the method of controlling an unmanned vehicle according to any of claims 6 or 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210118800.8A CN114464216A (en) | 2022-02-08 | 2022-02-08 | Acoustic detection method and device under unmanned driving environment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210118800.8A CN114464216A (en) | 2022-02-08 | 2022-02-08 | Acoustic detection method and device under unmanned driving environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114464216A true CN114464216A (en) | 2022-05-10 |
Family
ID=81413404
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210118800.8A Pending CN114464216A (en) | 2022-02-08 | 2022-02-08 | Acoustic detection method and device under unmanned driving environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114464216A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116125996A (en) * | 2023-04-04 | 2023-05-16 | 北京千种幻影科技有限公司 | Safety monitoring method and system for unmanned vehicle |
CN116125996B (en) * | 2023-04-04 | 2023-06-27 | 北京千种幻影科技有限公司 | Safety monitoring method and system for unmanned vehicle |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11928866B2 (en) | Neural networks for object detection and characterization | |
Feng et al. | A review and comparative study on probabilistic object detection in autonomous driving | |
CN107463907B (en) | Vehicle collision detection method and device, electronic equipment and vehicle | |
Wheeler et al. | Deep stochastic radar models | |
US20210103747A1 (en) | Audio-visual and cooperative recognition of vehicles | |
JP6742554B1 (en) | Information processing apparatus and electronic apparatus including the same | |
Sliwa et al. | The channel as a traffic sensor: Vehicle detection and classification based on radio fingerprinting | |
EP3938806A1 (en) | Radar data collection and labeling for machine-learning | |
Amsalu et al. | Driver behavior modeling near intersections using hidden Markov model based on genetic algorithm | |
López-Randulfe et al. | Spiking neural network for fourier transform and object detection for automotive radar | |
McCoy et al. | Ensemble deep learning for sustainable multimodal uav classification | |
CN112883991A (en) | Object classification method, object classification circuit and motor vehicle | |
Mohammed et al. | Radio frequency fingerprint-based drone identification and classification using Mel spectrograms and pre-trained YAMNet neural | |
Sun et al. | Vision-based traffic conflict detection using trajectory learning and prediction | |
CN114464216A (en) | Acoustic detection method and device under unmanned driving environment | |
CN114595738A (en) | Method for generating training data for recognition model and method for generating recognition model | |
Kanona et al. | A machine learning based vehicle classification in forward scattering radar | |
Kubin et al. | Deep crash detection from vehicular sensor data with multimodal self-supervision | |
CN112614156A (en) | Training method and device for multi-target tracking network model and related equipment | |
US20230316775A1 (en) | Methods and Systems for Object Tracking | |
US20230366983A1 (en) | Radar clustering using machine learning | |
CN117897741A (en) | Method and apparatus for generating synthetic training data for an ultrasound sensor model | |
Zhang | Prediction of Pedestrians' Red Light Violations Using Deep Learning | |
Schütt et al. | Exploring the Range of Possible Outcomes by means of Logical Scenario Analysis and Reduction for Testing Automated Driving Systems | |
Sarker et al. | DeepDMC: A traffic context independent deep driving maneuver classification framework |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |