
CN112793584B - Emergency vehicle audio detection

Info

Publication number: CN112793584B
Application number: CN202011481773.8A
Authority: CN
Inventors: 罗琦; 许珂诚; 周金运; 肖祥全; 黄硕; 胡江滔; 缪景皓
Original assignee: Baidu USA LLC
Current assignee: Baidu USA LLC
Priority date: 2019-12-05
Filing date: 2020-12-07
Publication date: 2023-12-15
Anticipated expiration: 2040-12-07
Also published as: CN112793584A; US11609576B2; US20210173408A1

Abstract

The present disclosure provides audio detection of emergency vehicles. In one embodiment, a process is performed during control of an Autonomous Driving Vehicle (ADV). Microphone signals sense sound in the environment of the ADV. The microphone signals are combined and filtered to form an audio signal having sound sensed in the environment of the ADV. A neural network is applied to the audio signal to detect the presence of an audio signature of an emergency vehicle siren. If a siren is detected, a change in the audio signature is used to determine whether the emergency vehicle siren is a) moving toward the ADV or b) not moving toward the ADV. The ADV may make driving decisions, such as slowing down, stopping, and/or steering to one side, based on the emergency vehicle siren moving toward the ADV.

Description

Emergency vehicle audio detection

Technical Field

Embodiments of the present disclosure generally relate to operating an autonomous vehicle. More particularly, embodiments of the present disclosure relate to audio-based detection of emergency vehicles.

Background

A vehicle operating in an autonomous mode (e.g., driverless) may relieve occupants (particularly the driver) of certain driving-related responsibilities. When operating in autonomous mode, the vehicle may navigate to various locations using onboard sensors, allowing the vehicle to travel with minimal human interaction, or in some cases without any passengers.

Emergency vehicles, such as police cars, ambulances, and fire trucks, use an emergency siren to alert pedestrians and vehicle drivers to an emergency. When the siren sounds, pedestrians and vehicle drivers must keep clear of the emergency vehicle and allow it to pass. The siren produces a loud sound that is easily heard and identified and that carries relatively far.

Autonomous Driving Vehicles (ADVs) must be aware of emergencies so that they can respond correctly. For example, if an emergency siren on a fire truck sounds, the ADV should keep clear of the fire truck. Thus, it is desirable that the ADV make driving decisions based on audio detection of the emergency siren so that the ADV can stay out of the way during an emergency.

Disclosure of Invention

In a first aspect, there is provided a computer-implemented method of operating an Autonomous Driving Vehicle (ADV), the method comprising:

capturing an audio signal stream using one or more audio capture devices mounted on the ADV, the audio signal stream representing an audible environment surrounding the ADV;

applying a neural network to at least a portion of the audio signal stream to detect an emergency vehicle siren signal;

in response to detecting the emergency vehicle siren signal, performing an audio analysis on at least a portion of the audio signal stream to determine whether the emergency vehicle is moving toward or away from the ADV; and

planning a trajectory to control the ADV based on the determination of whether the emergency vehicle is moving toward or away from the ADV.

In a second aspect, there is provided a data processing system comprising:

a processor; and

a memory coupled to the processor and for storing instructions that, when executed by the processor, cause the processor to perform operations of the method of operating an Autonomous Driving Vehicle (ADV) as described in the first aspect.

In a third aspect, there is provided a non-transitory machine readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform the operations of the method of operating an Autonomous Driving Vehicle (ADV) as described in the first aspect.

By the method and system of the present application, the ADV can make the correct driving decision based on the audio detection of the siren.

Drawings

Embodiments of the present disclosure are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements.

FIG. 1 is a block diagram illustrating a networked system according to one embodiment.

FIG. 2 is a block diagram illustrating an example of an autonomous vehicle according to one embodiment.

Figs. 3A-3B are block diagrams illustrating examples of a perception and planning system for use with an autonomous vehicle according to one embodiment.

Fig. 4 illustrates a process for detecting an emergency vehicle siren, according to some embodiments.

Fig. 5 shows a block diagram illustrating a system for detecting an emergency vehicle siren, according to some embodiments.

Fig. 6 illustrates an example of buffering an audio signal according to some embodiments.

Figs. 7-10 illustrate examples of ADVs interacting with emergency vehicles.

Detailed Description

Various embodiments and aspects of the disclosure will be described with reference to details discussed below, and the accompanying drawings will illustrate the various embodiments. The following description and drawings are illustrative of the disclosure and are not to be construed as limiting the disclosure. Numerous specific details are described to provide a thorough understanding of various embodiments of the disclosure. However, in some instances, well-known or conventional details are not described in order to provide a brief discussion of embodiments of the present disclosure.

Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the disclosure. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment.

A system and process may a) detect the presence of an emergency vehicle siren in one or more microphone signals, and b) determine whether the emergency vehicle siren is approaching an ADV based on a change in the amplitude or frequency of the siren (e.g., via the Doppler effect). If the ADV senses that the emergency vehicle is approaching the ADV (i.e., getting closer), the ADV may make driving decisions to give the emergency vehicle an unobstructed path.
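The two cues named above can be illustrated with a minimal sketch. It applies the classical Doppler relation for motion along the line of sight and uses a rising amplitude trend as a stand-in test for "approaching". The function names, the 343 m/s speed of sound, and the least-squares trend test are illustrative assumptions; the disclosure does not specify how the change in amplitude or frequency is evaluated.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees Celsius


def observed_frequency(source_hz, source_speed_toward, observer_speed_toward=0.0,
                       c=SPEED_OF_SOUND_M_S):
    """Classical Doppler shift along the line of sight.

    Speeds are in m/s and positive when the party is closing the distance.
    """
    if source_speed_toward >= c:
        raise ValueError("source speed must be below the speed of sound")
    return source_hz * (c + observer_speed_toward) / (c - source_speed_toward)


def trend_slope(values):
    """Least-squares slope of equally spaced samples (units per sample)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def siren_is_approaching(amplitudes, min_slope=0.0):
    """Hypothetical test: a rising siren amplitude suggests an approaching source."""
    return trend_slope(amplitudes) > min_slope
```

A siren moving toward a stationary listener is heard at a higher pitch than it emits, and one moving away at a lower pitch, so `observed_frequency(1000.0, 20.0)` exceeds 1000 Hz while `observed_frequency(1000.0, -20.0)` falls below it.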

Fig. 1 is a block diagram illustrating an autonomous vehicle network configuration according to one embodiment of the present disclosure. Referring to fig. 1, a network configuration 100 includes an autonomous vehicle 101 that may be communicatively coupled to one or more servers 103-104 through a network 102. Although one autonomous vehicle is shown, multiple autonomous vehicles may be coupled to each other and/or to servers 103-104 through network 102. The network 102 may be any type of wired or wireless network, such as a Local Area Network (LAN), a Wide Area Network (WAN) such as the Internet, a cellular network, a satellite network, or a combination thereof. Servers 103-104 may be any kind of server or cluster of servers, such as Web or cloud servers, application servers, backend servers, or a combination thereof. The servers 103-104 may be data analysis servers, content servers, traffic information servers, map and point of interest (MPOI) servers, or location servers, among others.

An autonomous vehicle refers to a vehicle that may be configured to be in an autonomous mode in which the vehicle navigates through an environment with little or no input from a driver. Such autonomous vehicles may include a sensor system having one or more sensors configured to detect information about the environment in which the vehicle is operating. The vehicle and its associated controller(s) use the detected information to navigate through the environment. The autonomous vehicle 101 may operate in a manual mode, a fully autonomous mode, or a partially autonomous mode.

In one embodiment, autonomous vehicle 101 includes, but is not limited to, a perception and planning system 110, a vehicle control system 111, a wireless communication system 112, a user interface system 113, and a sensor system 115. Autonomous vehicle 101 may also include certain common components found in ordinary vehicles, such as an engine, wheels, a steering wheel, and a transmission, which may be controlled by vehicle control system 111 and/or perception and planning system 110 using various communication signals and/or commands, such as acceleration signals or commands, deceleration signals or commands, steering signals or commands, and braking signals or commands.

The components 110-115 may be communicatively coupled to each other via an interconnect, bus, network, or combination thereof. For example, the components 110-115 may be communicatively coupled to each other via a Controller Area Network (CAN) bus. The CAN bus is a vehicle bus standard designed to allow microcontrollers and devices to communicate with each other in applications without a host computer. It is a message-based protocol originally designed for multiplex electrical wiring within automobiles, but it is also used in many other contexts.

Referring now to FIG. 2, in one embodiment, sensor system 115 includes, but is not limited to, one or more cameras 211, a Global Positioning System (GPS) unit 212, an Inertial Measurement Unit (IMU) 213, a radar unit 214, and a light detection and ranging (LIDAR) unit 215. The GPS unit 212 may include a transceiver operable to provide information regarding the location of the autonomous vehicle. The IMU 213 may sense changes in the position and orientation of the autonomous vehicle based on inertial acceleration. Radar unit 214 may represent a system that utilizes radio signals to sense objects within the local environment of the autonomous vehicle. In some embodiments, in addition to sensing an object, radar unit 214 may sense the speed and/or heading of the object. The LIDAR unit 215 may use lasers to sense objects in the environment in which the autonomous vehicle is located. The LIDAR unit 215 may include one or more laser sources, a laser scanner, and one or more detectors, as well as other system components. The cameras 211 may include one or more devices to capture images of the environment surrounding the autonomous vehicle. A camera 211 may be a still camera and/or a video camera. A camera may be mechanically movable, for example by mounting the camera on a rotating and/or tilting platform.

The sensor system 115 may also include other sensors such as sonar sensors, infrared sensors, steering sensors, throttle sensors, brake sensors, and audio sensors (e.g., microphones). The audio sensor may be configured to capture sound from an environment surrounding the autonomous vehicle. The steering sensor may be configured to sense a steering angle of a steering wheel, a vehicle wheel, or a combination thereof. The throttle sensor and the brake sensor sense a throttle position and a brake position of the vehicle, respectively. In some cases, the throttle sensor and the brake sensor may be integrated as an integrated throttle/brake sensor. In some embodiments, any combination of sensors (e.g., cameras, scanners, and/or detectors) of the sensor system may collect data for detecting an obstacle.

In one embodiment, the vehicle control system 111 includes, but is not limited to, a steering unit 201, a throttle unit 202 (also referred to as an acceleration unit), and a braking unit 203. The steering unit 201 is used to adjust the direction or heading of the vehicle. The throttle unit 202 is used to control the speed of the motor or engine and thus the speed and acceleration of the vehicle. The braking unit 203 is used to decelerate the vehicle by providing friction to slow the wheels or tires of the vehicle. Note that the components shown in fig. 2 may be implemented in hardware, software, or a combination thereof.

Referring again to fig. 1, the wireless communication system 112 is used to allow communication between the autonomous vehicle 101 and external systems (such as devices, sensors, other vehicles, etc.). For example, wireless communication system 112 may communicate wirelessly with one or more devices directly or via a communication network, such as with servers 103-104 over network 102. The wireless communication system 112 may communicate with another component or system using any cellular communication network or Wireless Local Area Network (WLAN), for example using WiFi. The wireless communication system 112 may communicate directly with devices (e.g., a passenger's mobile device, a display device within the vehicle 101, speakers), for example, using an infrared link, Bluetooth, or the like. The user interface system 113 may be part of peripheral devices implemented within the vehicle 101, including, for example, a keyboard, a touch screen display, a microphone, a speaker, and the like.

Some or all of the functions of the autonomous vehicle 101 may be controlled or managed by the perception and planning system 110, particularly when operating in an autonomous driving mode. The perception and planning system 110 includes the necessary hardware (e.g., processor(s), memory, storage devices) and software (e.g., operating system, planning and routing programs) to receive information from the sensor system 115, the control system 111, the wireless communication system 112 and/or the user interface system 113, process the received information, plan a route or path from the origin to the destination point, and then drive the vehicle 101 based on the planning and control information. Alternatively, the perception and planning system 110 may be integrated with the vehicle control system 111.

For example, a user as a passenger may specify a starting location and destination of a trip, e.g., via a user interface. The perception and planning system 110 obtains data relating to the trip. For example, the perception and planning system 110 may obtain location and route information from an MPOI server (which may be part of servers 103-104). The location server provides location services, and the MPOI server provides map services and POIs for certain locations. Alternatively, such location and MPOI information may be cached locally in persistent storage of the perception and planning system 110.

The perception and planning system 110 may also obtain real-time traffic information from a traffic information system or server (TIS) as the autonomous vehicle 101 moves along the route. Note that servers 103-104 may be operated by third party entities. Alternatively, the functionality of servers 103-104 may be integrated with perception and planning system 110. Based on the real-time traffic information, MPOI information, and location information, as well as real-time local environment data detected or sensed by the sensor system 115 (e.g., obstacles, objects, nearby vehicles), the perception and planning system 110 may plan an optimal route and drive the vehicle 101, e.g., via the control system 111, according to the planned route to reach the specified destination safely and efficiently.

The server 103 may be a data analysis system that performs data analysis services for various clients. In one embodiment, data analysis system 103 includes a data collector 121 and a machine learning engine 122. The data collector 121 collects driving statistics 123 from individual vehicles (autonomous vehicles or conventional vehicles driven by human drivers). The driving statistics 123 include information indicating the driving commands issued (e.g., throttle, brake, steering commands) and the responses of the vehicles (e.g., speed, acceleration, deceleration, direction) captured by the vehicles' sensors at different points in time. The driving statistics 123 may also include information describing the driving environments at different points in time, such as routes (including start and destination locations), MPOIs, road conditions, weather conditions, and the like.

Based on the driving statistics 123, the machine learning engine 122 generates or trains a set of rules, algorithms, and/or predictive models 124 for various purposes. The algorithms 124 may then be uploaded to the ADV for real-time use during autonomous driving.

Fig. 3A and 3B are block diagrams illustrating examples of a perception and planning system for use with an autonomous vehicle according to one embodiment. The system 300 may be implemented as part of the autonomous vehicle 101 of fig. 1, including but not limited to the perception and planning system 110, the control system 111, and the sensor system 115. Referring to fig. 3A-3B, perception and planning system 110 includes, but is not limited to, a positioning module 301, a perception module 302, a prediction module 303, a decision module 304, a planning module 305, a control module 306, and a routing module 307.

Some or all of the modules 301-307 may be implemented in software, hardware, or a combination thereof. For example, the modules may be installed in persistent storage 352, loaded into memory 351, and executed by one or more processors (not shown). Note that some or all of these modules may be communicatively coupled to or integrated with some or all of the modules of the vehicle control system 111 of fig. 2. Some of the modules 301-307 may be integrated together as an integrated module.

The positioning module 301 (also referred to as a map and route module) determines the current location of the autonomous vehicle 300 (e.g., using the GPS unit 212) and manages any data related to the user's trip or route. The user may log in and specify the starting location and destination of the trip, for example, via a user interface. The positioning module 301 communicates with other components of the autonomous vehicle 300, such as the map and route information 311, to obtain data related to the trip. For example, the positioning module 301 may obtain location and route information from a location server and a map and POI (MPOI) server. The location server provides location services and the MPOI server provides map services and POIs for certain locations, which may be cached as part of the map and route information 311. The positioning module 301 may also obtain real-time traffic information from a traffic information system or server as the autonomous vehicle 300 moves along a route.

Based on the sensor data provided by the sensor system 115 and the positioning information obtained by the positioning module 301, the perception module 302 determines a perception of the surrounding environment. The perception information may represent what an average driver would perceive around the vehicle that the driver is driving. The perception may include, for example in the form of objects, the lane configuration, traffic lights, the relative position of another vehicle, pedestrians, buildings, crosswalks, or other traffic-related signs (e.g., stop signs, yield signs). The lane configuration includes information describing one or more lanes, such as the shape of the lane (e.g., straight or curved), the width of the lane, the number of lanes in the road, one-way or two-way lanes, merging or splitting lanes, exit lanes, etc.

The perception module 302 may include a computer vision system or functionality of a computer vision system to process and analyze images captured by one or more cameras to identify objects and/or features in the environment of the autonomous vehicle. The objects may include traffic signals, road boundaries, other vehicles, pedestrians and/or obstacles, etc. Computer vision systems may use object recognition algorithms, video tracking, and other computer vision techniques. In some embodiments, the computer vision system may map the environment, track the object, and estimate the speed of the object, etc. The perception module 302 may also detect objects based on other sensor data provided by other sensors, such as radar and/or LIDAR.

For each object, the prediction module 303 predicts how the object will behave under the circumstances. The prediction is made based on perception data of the driving environment perceived at that point in time, in view of the map/route information 311 and traffic rules 312. For example, if the object is a vehicle traveling in the opposite direction and the current driving environment includes an intersection, the prediction module 303 will predict whether the vehicle is likely to move straight ahead or turn. If the perception data indicates that the intersection has no traffic light, the prediction module 303 may predict that the vehicle may have to come to a complete stop before entering the intersection. If the perception data indicates that the vehicle is currently in a left-turn-only lane or a right-turn-only lane, the prediction module 303 may predict that the vehicle will be more likely to turn left or right, respectively.
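The three example predictions above can be written as a small rule table. The sketch below is only an illustration of that reasoning: the function name, argument names, and string labels are hypothetical and are not taken from the disclosure, which does not describe how the prediction module 303 encodes its rules.

```python
def predict_vehicle_behavior(at_intersection, has_traffic_light, lane_type="any"):
    """Return predicted behaviors for another vehicle from perception data.

    lane_type is one of "left_only", "right_only", or "any" (hypothetical labels).
    """
    predictions = []
    # No traffic light at the intersection: the vehicle may have to stop fully.
    if at_intersection and not has_traffic_light:
        predictions.append("full_stop_before_intersection")
    # Turn-only lanes make the corresponding turn more likely.
    if lane_type == "left_only":
        predictions.append("turn_left")
    elif lane_type == "right_only":
        predictions.append("turn_right")
    elif at_intersection:
        # Otherwise the vehicle may go straight or turn.
        predictions.append("straight_or_turn")
    return predictions
```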

In some aspects, the prediction module generates a predicted trajectory of an obstacle that predicts a path of the moving obstacle at least in an area deemed relevant to a current path of the ADV. The predicted trajectory may be generated based on a current state of the moving obstacle (e.g., a speed, a position, a heading, an acceleration, or a type of the moving obstacle), map data, and traffic rules.

For example, the ADV may identify an obstacle as a vehicle (e.g., an emergency vehicle) that is sensed to be traveling in a driving lane, based on the heading and position of the obstacle with reference to map data. The map data includes lane positions and directions, which confirm that the obstacle, given its heading and position, appears to be driving in that lane. Assume the map data indicates that the lane is a right-turn-only lane. According to traffic rules (e.g., an obstacle of the "vehicle" type in a right-turn-only lane must turn right), a trajectory can be generated that defines the movement of the obstacle. The trajectory may contain coordinates and/or a mathematical representation of a line that predicts the obstacle's movement.

In some embodiments, when predicting a movement trajectory of an obstacle, a prediction system or module divides the trajectory prediction of the obstacle into two parts: 1) Longitudinal movement trajectory generation, and 2) lateral movement trajectory generation. These portions may combine to form a predicted trajectory of the obstacle.

In some embodiments, generating a lateral movement trajectory (also simply referred to as a lateral trajectory) includes optimizing the trajectory using a first polynomial function. Generating a longitudinal movement trajectory (also simply referred to as a longitudinal trajectory) includes optimizing the trajectory using a second polynomial function. Based on a) the current state of the obstacle as an initial state and b) the predicted end state of the obstacle as a set of constraints, an optimization is performed to smoothly align the trajectory with at least the current heading of the obstacle. The end state is determined according to the shape of the lane to which the obstacle is predicted to move. Once the longitudinal movement trajectory and the lateral movement trajectory have been defined and generated, a final predicted trajectory of the obstacle may be determined by combining the longitudinal movement trajectory and the lateral movement trajectory. As a result, the predicted trajectory of the obstacle is more accurate based on the current state of the obstacle and the shape of the lane.
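As an illustration of fitting a polynomial between a current state and a predicted end state, the sketch below computes the closed-form coefficients of a single quintic polynomial that matches position, velocity, and acceleration at both ends. Choosing a quintic, treating one axis at a time, and the function names are assumptions made for the example; the disclosure describes an optimization and does not give this formula.

```python
def quintic_coefficients(x0, v0, a0, x1, v1, a1, T):
    """Coefficients c0..c5 of p(t) = sum(c_k * t**k) on [0, T].

    p matches (x0, v0, a0) at t = 0 and (x1, v1, a1) at t = T.
    """
    if T <= 0:
        raise ValueError("T must be positive")
    c0, c1, c2 = x0, v0, a0 / 2.0
    c3 = (20 * (x1 - x0) - (8 * v1 + 12 * v0) * T - (3 * a0 - a1) * T**2) / (2 * T**3)
    c4 = (30 * (x0 - x1) + (14 * v1 + 16 * v0) * T + (3 * a0 - 2 * a1) * T**2) / (2 * T**4)
    c5 = (12 * (x1 - x0) - 6 * (v1 + v0) * T - (a0 - a1) * T**2) / (2 * T**5)
    return [c0, c1, c2, c3, c4, c5]


def evaluate(coeffs, t, order=0):
    """Value (order=0) or the order-th derivative of the polynomial at t."""
    total = 0.0
    for k, c in enumerate(coeffs):
        if k < order:
            continue
        factor = 1
        for j in range(order):
            factor *= (k - j)
        total += c * factor * t ** (k - order)
    return total


# Longitudinal example: start at 0 m moving 5 m/s, end at 30 m moving 8 m/s after 4 s.
longitudinal = quintic_coefficients(0.0, 5.0, 0.0, 30.0, 8.0, 0.0, 4.0)
# Lateral example: shift 3.5 m sideways (an assumed lane width) over the same 4 s.
lateral = quintic_coefficients(0.0, 0.0, 0.0, 3.5, 0.0, 0.0, 4.0)
```

Evaluating the two polynomials at the same time `t` gives one point of a combined trajectory, which mirrors the idea of generating the longitudinal and lateral parts separately and then combining them.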

Polynomial optimization or polynomial fitting refers to optimizing the shape of a curve (e.g., the trajectory of an obstacle) represented by a polynomial function (e.g., a quintic or quartic polynomial function) such that the curve is continuous along its length (e.g., a derivative exists at the junction of two adjacent segments). In the autonomous driving field, a polynomial curve from a start point to an end point is divided into a plurality of segments, each segment corresponding to a control point (or reference point). Such a segmented curve is called a piecewise polynomial. When optimizing a piecewise polynomial, in addition to a set of initial state constraints and final state constraints, a set of connection constraints between adjacent segments and a set of boundary constraints must be satisfied.

The set of connection constraints requires that the position (x, y), speed, heading, and acceleration of adjacent segments be the same at their junction. For example, the end position of the first segment (e.g., the leading segment) and the start position of the second segment (e.g., the following segment) must be the same or within a predetermined proximity. The speed, heading, and acceleration at the end position of the first segment and the corresponding speed, heading, and acceleration at the start position of the second segment must be the same or within a predetermined range. In addition, each control point is associated with a predetermined boundary (e.g., about 0.2 meters around the control point). The polynomial curve must pass through each control point within its respective boundary. When these two sets of constraints are satisfied during the optimization, the polynomial curve representing the trajectory should be smooth and continuous.
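The two constraint sets can be expressed as simple checks. The sketch below is a hypothetical illustration: the `SegmentState` type, the function names, and the default tolerances are assumptions, and only the 0.2 meter boundary comes from the text above.

```python
import math
from dataclasses import dataclass


@dataclass
class SegmentState:
    x: float             # metres
    y: float             # metres
    speed: float         # metres per second
    heading: float       # radians
    acceleration: float  # metres per second squared


def heading_difference(a, b):
    """Smallest absolute angle between two headings, in radians."""
    return abs(math.atan2(math.sin(a - b), math.cos(a - b)))


def satisfies_connection(end_of_first, start_of_second,
                         position_tol=0.01, speed_tol=0.01,
                         heading_tol=0.01, accel_tol=0.01):
    """Connection constraint: adjacent segments agree at their junction."""
    gap = math.hypot(end_of_first.x - start_of_second.x,
                     end_of_first.y - start_of_second.y)
    return (gap <= position_tol
            and abs(end_of_first.speed - start_of_second.speed) <= speed_tol
            and heading_difference(end_of_first.heading,
                                   start_of_second.heading) <= heading_tol
            and abs(end_of_first.acceleration
                    - start_of_second.acceleration) <= accel_tol)


def within_boundary(curve_point, control_point, radius_m=0.2):
    """Boundary constraint: the curve passes within radius_m of the control point."""
    return math.hypot(curve_point[0] - control_point[0],
                      curve_point[1] - control_point[1]) <= radius_m
```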

For each object, decision module 304 makes a decision as to how to handle the object. For example, for a particular object (e.g., another vehicle on a crossing route) and metadata describing the object (e.g., speed, direction, turn angle), the decision module 304 decides how to encounter the object (e.g., overtake, yield, stop, pass). The decision module 304 may make such a decision according to a set of rules (such as traffic rules or driving rules 312) that may be stored in the persistent storage 352.

The routing module 307 is configured to provide one or more routes or paths from an origin point to a destination point. For a given trip from a starting location to a destination location, received from a user for example, the routing module 307 obtains route and map information 311 and determines all possible routes or paths from the starting location to the destination location. The routing module 307 may generate a reference line in the form of a topographic map for each route it determines from the starting location to the destination location. A reference line refers to an ideal route or path without any interference from other factors such as other vehicles, obstacles, or traffic conditions. That is, if there are no other vehicles, pedestrians, or obstacles on the road, the ADV should follow the reference line exactly or closely. The topographic maps are then provided to decision module 304 and/or planning module 305. Based on other data provided by other modules, such as traffic conditions from the positioning module 301, the driving environment perceived by the perception module 302, and traffic conditions predicted by the prediction module 303, the decision module 304 and/or the planning module 305 examine all possible routes to select and modify one of the best routes. The actual path or route used to control the ADV may be close to or different from the reference line provided by the routing module 307, depending on the particular driving environment at that point in time.

Based on the decision for each perceived object, the planning module 305 uses the reference line provided by the routing module 307 as a basis to plan a path or route for the autonomous vehicle as well as driving parameters (e.g., distance, speed, and/or turning angle). That is, for a given object, the decision module 304 decides what to do with the object, and the planning module 305 determines how to do it. For example, for a given object, the decision module 304 may decide to pass the object, while the planning module 305 may determine whether to pass on the left or right side of the object. Planning and control data is generated by the planning module 305, which includes information describing how the vehicle 300 will move in the next movement cycle (e.g., the next route/path segment). For example, the planning and control data may instruct the vehicle 300 to move 10 meters at a speed of 30 miles per hour (mph) and then change to the right lane at a speed of 25 mph.

Based on the planning and control data, the control module 306 controls and drives the autonomous vehicle by sending appropriate commands or signals to the vehicle control system 111 according to the route or path defined by the planning and control data. The planning and control data includes sufficient information to drive the vehicle from a first point to a second point of the route or path at different points in time along the route or path using appropriate vehicle settings or driving parameters (e.g., throttle, brake, steering commands).

In one embodiment, the planning phase is performed in a number of planning cycles (also referred to as driving cycles), for example at intervals of 100 milliseconds (ms). For each planning cycle or driving cycle, one or more control commands are issued based on the planning and control data. That is, every 100 ms, the planning module 305 plans the next route segment or path segment (e.g., including the target location and the time required for the ADV to reach the target location). Alternatively, the planning module 305 may also specify a particular speed, direction, and/or steering angle, etc. In one embodiment, the planning module 305 plans the route segment or path segment for a next predetermined period of time (such as 5 seconds). For each planning cycle, the planning module 305 plans the target location for the current cycle (e.g., the next 5 seconds) based on the target location planned in the previous cycle. The control module 306 then generates one or more control commands (e.g., throttle, brake, steering control commands) based on the planning and control data for the current cycle.
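The cycle timing described above can be sketched as a loop that plans one segment every 100 ms over a 5 second horizon, with each target derived from the previous one. This is a deliberately simplified one-dimensional illustration at constant speed; the `Segment` type, the function name, and the rule for extending the previous target are assumptions, not the planner of the disclosure.

```python
from dataclasses import dataclass

PLANNING_INTERVAL_S = 0.1  # one planning cycle every 100 ms
HORIZON_S = 5.0            # each cycle plans roughly the next 5 seconds


@dataclass
class Segment:
    start_time_s: float       # when this planning cycle begins
    target_position_m: float  # target location along the path
    time_to_target_s: float   # time allowed to reach the target


def plan_segments(num_cycles, speed_m_s, initial_position_m=0.0):
    """Plan one segment per cycle, each based on the previous cycle's target."""
    segments = []
    previous_target = None
    for cycle in range(num_cycles):
        if previous_target is None:
            # First cycle: look one full horizon ahead of the start position.
            target = initial_position_m + speed_m_s * HORIZON_S
        else:
            # Later cycles: extend the previous target by one interval of travel.
            target = previous_target + speed_m_s * PLANNING_INTERVAL_S
        segments.append(Segment(cycle * PLANNING_INTERVAL_S, target, HORIZON_S))
        previous_target = target
    return segments
```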

Note that the decision module 304 and the planning module 305 may be integrated as an integrated module. The decision module 304/planning module 305 may include a navigation system or functionality of a navigation system to determine a travel path for the autonomous vehicle. For example, the navigation system may determine a series of speeds and headings to effect movement of the autonomous vehicle along a path that substantially avoids perceived obstacles while generally advancing the autonomous vehicle along a road-based path leading to the final destination. The destination may be set according to user input via the user interface system 113. The navigation system may dynamically update the travel path while the autonomous vehicle is in operation. The navigation system may incorporate data from the GPS system and one or more maps to determine the travel path of the autonomous vehicle.

Fig. 4 illustrates a process 400 that may be performed during driving of an Autonomous Driving Vehicle (ADV) to make a driving decision for the ADV based on whether an emergency vehicle siren (e.g., that of a police car, ambulance, or fire truck) is heard. The process may be repeated periodically, for example, every driving cycle, every several driving cycles, or every planning cycle. The process may take precedence over the other vehicle planning decisions described above until it is determined that the emergency vehicle siren is no longer approaching the ADV. At that time, the ADV may resume normal planning based on the destination, obstacle detection, etc., as described elsewhere.

At block 402, the process includes receiving one or more microphone signals having sounds sensed in an environment of an ADV. The microphone signal may be generated by a microphone integrated into the ADV and arranged to sense sound in the environment of the ADV. In some embodiments, one or more microphones may be disposed on the exterior of the ADV or in a manner such that the microphones may sense ambient sounds outside of the ADV.

At block 404, the process includes processing one or more microphone signals to generate an audio signal having sound sensed in the environment of the ADV. Such audio processing may include combining microphone signals and filtering the resulting combined signals, as described further in other sections.

At block 406, the process includes applying a neural network to the audio signal to detect the presence of an audio signature of an emergency vehicle siren. The neural network may be trained with audio training data (e.g., prior to deployment of the ADV). The training data may include: a) audio data classified as having an emergency vehicle siren, and b) audio data classified as not having an emergency vehicle siren. For example, the training data may include a number of audio samples, some of which contain police sirens, fire truck sirens, and ambulance sirens and are classified as having an emergency vehicle siren, and some of which contain other noise, such as, but not limited to, dog barking, bird song, car horns, music, and the like, and are classified as not having an emergency vehicle siren.

In some embodiments, the neural network is a convolutional neural network, although other neural networks may be used without departing from the scope of the application. Similarly, the neural network may have a depth N (e.g., 1, 2, 3, or greater) that can be determined by routine testing and experimentation. The neural network may be trained using known techniques such as, for example, linear regression, minimizing a loss function (also known as a cost function) such as mean squared error (MSE) or linear least squares, gradient descent, and/or other known training techniques.
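By way of non-limiting illustration, gradient descent on an MSE loss may be sketched as follows. The sketch deliberately substitutes a toy one-feature linear model for the convolutional neural network of the embodiment; the feature (fraction of frame energy in a siren-like band), the sample values, the learning rate, and all function names are assumptions made only for this example.

```python
# Toy stand-in for the neural network: one weight and one bias fitted by
# gradient descent on a mean squared error (MSE) loss.
# Feature (hypothetical): fraction of frame energy in a siren-like band.
# Label: 1.0 = contains an emergency vehicle siren, 0.0 = does not.
SAMPLES = [(0.9, 1.0), (0.8, 1.0), (0.7, 1.0), (0.2, 0.0), (0.1, 0.0), (0.3, 0.0)]


def mse(w, b, samples):
    """Mean squared error of the linear model w * x + b over the samples."""
    return sum((w * x + b - y) ** 2 for x, y in samples) / len(samples)


def train(samples, lr=0.1, steps=2000):
    """Fit w and b by plain (full-batch) gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def has_siren(w, b, x, threshold=0.5):
    """Classify a feature value using the trained model (assumed threshold)."""
    return w * x + b >= threshold
```

In this sketch, `train(SAMPLES)` returns the fitted parameters, which `has_siren` then applies to a new feature value; a deployed embodiment would instead train the network on labeled audio as described above.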

At decision block 408, if a siren (e.g., an audio signature of an emergency vehicle siren) is detected by the neural network algorithm, the process proceeds to block 410. If not, the process takes no further action, meaning that in the current driving cycle the driving decision of the ADV is not affected by this process.

At block 410, in response to detecting the presence of the audio signature, the process may analyze the change in the audio signature to make a determination of whether the emergency vehicle siren is a) moving toward the ADV or b) not moving toward the ADV. In some embodiments, the analysis may include detecting a change in the amplitude and/or frequency (also referred to as pitch) of the audio signature from one time frame of the audio signal to another time frame of the audio signal.

For example, an increase in the amplitude and/or frequency of the siren over time may indicate that the emergency vehicle siren is moving toward the ADV. In some embodiments, the microphone signals may be analyzed to determine a phase shift of the audio signature from one microphone signal to another, which may also indicate the direction of the siren and whether the siren is approaching the ADV.
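By way of non-limiting illustration, the frame-to-frame comparison may be sketched as follows. The root-mean-square amplitude measure, the zero-crossing pitch estimate, the 2% tolerance, and all function names are assumptions chosen for this example; an embodiment may measure amplitude and frequency in any suitable manner.

```python
import math


def rms(frame):
    """Root-mean-square amplitude of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def zero_crossing_frequency(frame, sample_rate):
    """Rough pitch estimate in Hz from the number of sign changes."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
    )
    duration = len(frame) / sample_rate
    return crossings / (2 * duration)


def is_moving_toward(prev_frame, next_frame, sample_rate, tolerance=0.02):
    """True if amplitude or frequency grew by more than `tolerance`."""
    amp_up = rms(next_frame) > rms(prev_frame) * (1 + tolerance)
    freq_up = zero_crossing_frequency(next_frame, sample_rate) > (
        zero_crossing_frequency(prev_frame, sample_rate) * (1 + tolerance)
    )
    return amp_up or freq_up


def tone(freq, amp, sample_rate=8000, n=4000):
    """Synthetic test tone standing in for a siren frame."""
    return [amp * math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]
```

For instance, `is_moving_toward(tone(700, 0.2), tone(740, 0.4), 8000)` compares a quieter, lower-pitched frame with a louder, higher-pitched one, while two identical frames yield no increase.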

At block 412, the process may control the ADV based on the determination. For example, if the emergency vehicle siren is moving toward the ADV, the process may generate a driving decision such as steering the ADV out of the current driving lane, accelerating or decelerating the ADV, and/or braking the ADV to slow or stop. The driving decision may be effected by a series of control commands that actuate the throttle, brake, and/or steering.
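By way of non-limiting illustration, the flow of blocks 402 through 412 may be sketched as a single pass in which each block is supplied as a callable. The function name `process_400`, the callable parameters, and the decision labels are hypothetical and serve only to show the order of operations.

```python
from typing import Callable, List, Sequence

Signal = List[float]


def process_400(
    receive_microphone_signals: Callable[[], Sequence[Signal]],  # block 402
    process_signals: Callable[[Sequence[Signal]], Signal],       # block 404
    detect_siren: Callable[[Signal], bool],                      # blocks 406/408
    is_moving_toward_adv: Callable[[Signal], bool],              # block 410
) -> List[str]:
    """Run one pass and return the driving decisions for block 412."""
    mic_signals = receive_microphone_signals()
    audio = process_signals(mic_signals)
    if not detect_siren(audio):
        return []  # no siren: driving decisions are left unaffected
    if is_moving_toward_adv(audio):
        # Hypothetical labels for the decisions named at block 412.
        return ["steer_out_of_lane", "decelerate", "brake"]
    return []  # siren present but not approaching
```

Passing the blocks in as callables keeps the sketch independent of any particular microphone, filter, or neural network implementation.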

Referring to fig. 5, an ADV system is shown that can detect and respond to approaching sirens. One or more microphones 502 may sense sound in the environment of the ADV. The microphones form a microphone array having a fixed and known position on the ADV, the microphone array being arranged to sense sound in a plurality of directions around the ADV.

The microphone signal processor 504 may process the microphone signals to generate an audio signal 515 for consumption by a machine learning detection algorithm module 518 and a Doppler shift and frequency/time analysis module 516. The microphone signal processor comprises a combiner module 506 that combines the microphone signals into a combined microphone signal 507. This may be done, for example, by averaging the microphone signals. The combiner may use feature extraction to determine key points of the microphone signals at which the signals are to be aligned so that they can be correctly combined. The combiner may combine the microphone signals into a single audio signal using other combining or downmix techniques without departing from the scope of this disclosure.
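By way of non-limiting illustration, combining by averaging may be sketched as follows, assuming the microphone signals have already been time-aligned and have equal length. The function name `combine_by_averaging` is hypothetical.

```python
def combine_by_averaging(mic_signals):
    """Average equal-length, time-aligned microphone signals sample by sample."""
    if not mic_signals:
        raise ValueError("at least one microphone signal is required")
    length = len(mic_signals[0])
    if any(len(sig) != length for sig in mic_signals):
        raise ValueError("microphone signals must have the same length")
    count = len(mic_signals)
    return [sum(samples) / count for samples in zip(*mic_signals)]
```

For example, `combine_by_averaging([[1.0, 2.0], [3.0, 4.0]])` yields a two-sample combined signal in which each sample is the mean of the corresponding microphone samples.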

The microphone signal processor comprises a filter 508 that removes noise components from the combined microphone signal 507. For example, the low pass filter 510 may attenuate high frequency audio components. Additionally or alternatively, the active noise control module 512 may actively remove noise from the high frequency audio component.
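By way of non-limiting illustration, a low-pass filter may be sketched as a first-order recursive filter. The filter order, the cutoff frequency, the sample rate, and the names `low_pass` and `peak` are assumptions for this example; the disclosure does not prescribe a particular filter design.

```python
import math


def low_pass(signal, cutoff_hz, sample_rate):
    """First-order low-pass filter: y[n] = y[n-1] + alpha * (x[n] - y[n-1])."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out, prev = [], 0.0
    for x in signal:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out


def peak(signal):
    """Largest absolute sample value."""
    return max(abs(s) for s in signal)


def sine(freq, sample_rate=8000, n=800):
    """Unit-amplitude test tone."""
    return [math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]
```

With an assumed 2000 Hz cutoff at an 8000 Hz sample rate, a 200 Hz tone passes nearly unchanged while a 3500 Hz tone is attenuated to less than half of its input amplitude.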

For example, active noise control may analyze the waveform of the microphone signal to determine background noise in the environment of the ADV. The noise may include noise generated by the ADV, or other environmental noise, such as the engines of other vehicles, music, wind, etc. An anti-noise signal may be generated that, when applied to the combined microphone signal, cancels noise in the combined microphone signal. The anti-noise signal may be a phase-shifted or inverted version of the detected noise (which may be detected in the microphone signal or the combined microphone signal). In some embodiments, one or more microphone signals may be used as noise reference signals. Known adaptive algorithms may analyze the noise reference signal to determine the noise in the combined microphone signal that is to be cancelled. This noise can then be inverted so that, when it is applied to the combined microphone signal, destructive interference is created. Cancelling the noise in the audio signal in this way makes it possible to identify the siren in the audio signal more effectively.
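By way of non-limiting illustration, cancellation with a noise reference signal may be sketched as follows. The sketch uses a single least-squares gain in place of the known adaptive algorithms mentioned above, which is an assumption made for brevity; the signal frequencies, the 0.5 noise gain, and all names are likewise hypothetical.

```python
import math


def cancel_noise(combined, noise_reference):
    """Add a scaled, inverted noise reference to the combined signal.

    The gain is the least-squares fit of the reference to the combined
    signal; the inverted, scaled reference is the anti-noise signal that
    produces the destructive interference.
    """
    energy = sum(r * r for r in noise_reference)
    if energy == 0.0:
        return list(combined)
    gain = sum(c * r for c, r in zip(combined, noise_reference)) / energy
    anti_noise = [-gain * r for r in noise_reference]
    return [c + a for c, a in zip(combined, anti_noise)]


SAMPLE_RATE = 8000
N = 800
# Hypothetical test signals: a 1000 Hz "siren" plus 100 Hz "engine" noise.
siren = [math.sin(2 * math.pi * 1000 * i / SAMPLE_RATE) for i in range(N)]
engine = [math.sin(2 * math.pi * 100 * i / SAMPLE_RATE) for i in range(N)]
combined = [s + 0.5 * e for s, e in zip(siren, engine)]
cleaned = cancel_noise(combined, engine)
```

In this example the engine tone and the siren tone are uncorrelated over the analysis window, so the estimated gain matches the mixing gain and `cleaned` closely follows `siren`. Real noise varies over time, which is why an adaptive algorithm would typically update the estimate continuously.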

The filtered audio signal 513 resulting from filtering the combined microphone signal may be output by the filter 508 and fed to the audio signal storage buffer 514. The audio signal storage buffer may store the audio signal as a plurality of time frames prior to applying the neural network to the audio signal (at block 518) and analyzing the change in the audio signature (at block 516).

For example, referring to fig. 6, a plurality of audio frames (e.g., frame 1 and frame 2) may be extracted from the audio signal 604. Each audio frame may overlap an adjacent audio frame, although this is not required. The frame size may vary based on the application (e.g., 0.5 seconds, 1 second, 3 seconds) and may be tuned for the application by routine testing and experimentation. Each frame may be stored in a memory buffer 614. The Doppler shift and frequency/time analysis module and the machine learning detection algorithm module may consume the frames (e.g., on a frame-by-frame basis) from the memory buffer as the memory buffer is filled with filtered audio from the filter module. Thus, the memory buffer serves as a pipeline that provides continuous audio data for processing by the modules.
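By way of non-limiting illustration, framing with optional overlap and a first-in, first-out memory buffer may be sketched as follows. The names `split_into_frames` and `frame_buffer`, the frame size, and the hop size are assumptions for this example; sizes are expressed in samples rather than seconds.

```python
from collections import deque


def split_into_frames(signal, frame_size, hop_size):
    """Yield frames of `frame_size` samples; hop_size < frame_size overlaps."""
    if frame_size <= 0 or hop_size <= 0:
        raise ValueError("frame_size and hop_size must be positive")
    for start in range(0, len(signal) - frame_size + 1, hop_size):
        yield signal[start:start + frame_size]


# Memory buffer acting as a pipeline: the filter side appends frames,
# the detection and analysis modules pop them in arrival order.
frame_buffer = deque()
for frame in split_into_frames(list(range(10)), frame_size=4, hop_size=2):
    frame_buffer.append(frame)

first_frame = frame_buffer.popleft()  # oldest frame is consumed first
```

Here a hop size of half the frame size gives a 50% overlap between adjacent frames; setting `hop_size` equal to `frame_size` produces non-overlapping frames.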

Referring back to fig. 5, the machine learning detection algorithm module 518 applies a neural network algorithm to the audio signal 515 to determine whether an audio signature of an emergency vehicle siren is present in the audio signal. As mentioned, the audio signal may be a series of audio frames. Thus, the neural network algorithm may analyze a series of audio frames to determine whether an audio signature of the siren is present.

If so, the Doppler shift and frequency/time analysis module 516 may be notified that a siren has been detected. In response to the notification, the Doppler shift and frequency/time analysis module 516 may analyze the audio signal 515 to estimate, based on the Doppler effect, whether the emergency vehicle siren (and thus the emergency vehicle) is moving toward the ADV. For example, if the frequency of the emergency vehicle siren increases over time (e.g., from one time frame to another time frame), for instance from 1 Hz to 1.1 Hz, this may indicate, based on the Doppler effect, that the emergency vehicle is moving toward the ADV. In this case, the ADV should stop or pull over. Other indications may include an increase in the loudness of the siren, or a phase shift of the siren from one microphone signal to another. The time frames may be analyzed in the time domain and/or the frequency domain.
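By way of non-limiting illustration, the Doppler relation for a moving source and a stationary listener in still air may be sketched as follows. The function name, the assumed speed of sound, and the example values are not taken from the disclosure.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for dry air at about 20 degrees C


def observed_frequency(source_hz, closing_speed_m_s, c=SPEED_OF_SOUND_M_S):
    """Frequency heard by a stationary listener.

    closing_speed_m_s > 0 means the source approaches the listener;
    closing_speed_m_s < 0 means it recedes.
    """
    if closing_speed_m_s >= c:
        raise ValueError("source speed must be below the speed of sound")
    return source_hz * c / (c - closing_speed_m_s)
```

Under these assumptions, a siren approaching the listener is heard at a pitch above its emitted frequency and a receding siren at a pitch below it, which is the property the analysis module relies on.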

In some embodiments, the siren logic 517 may also analyze the movement of the ADV to help determine whether the emergency vehicle siren is approaching the ADV. For example, the siren logic 517 may check the ADV speed. If the ADV speed is zero and the amplitude and frequency of the siren remain unchanged over time, the siren logic 517 may form a determination that the emergency vehicle and the siren are stationary. In another example, if the ADV is moving and the amplitude and frequency of the siren remain constant (or substantially constant) over time, the siren logic needs more data to form a determination. For example, the ADV may analyze the camera data to determine where the emergency vehicle is and whether the emergency vehicle is approaching the ADV.
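By way of non-limiting illustration, the cases described for the siren logic may be sketched as follows. The function name, the tolerance `eps`, and the returned labels are hypothetical; the "need more data" outcome stands for falling back to other sensor data such as camera data.

```python
def siren_determination(adv_speed, amplitude_change, frequency_change, eps=1e-3):
    """Classify the siren from ADV speed and frame-to-frame changes.

    Returns one of 'approaching', 'not approaching', 'stationary',
    or 'need more data' (hypothetical labels).
    """
    unchanged = abs(amplitude_change) <= eps and abs(frequency_change) <= eps
    if unchanged:
        # ADV at rest and nothing changing: source is taken to be stationary.
        # ADV moving and nothing changing: audio alone is inconclusive.
        return "stationary" if adv_speed == 0 else "need more data"
    if amplitude_change > eps or frequency_change > eps:
        return "approaching"
    return "not approaching"
```

For example, a stopped ADV that hears a siren of constant loudness and pitch yields "stationary", whereas the same observation from a moving ADV yields "need more data".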

The determination 519 may be indicated to the decision module 304 (as described elsewhere) and/or the planning module 305. The determination may be one of "approaching" or "not approaching". The decision module and/or planning module may then make driving decisions (e.g., slow down, change lanes, stop) based on the determination. The driving decisions may be effected by a series of control commands (throttle, brake, steering) issued from the control module 306 to the vehicle control system 111. In some embodiments, a driving decision is made only when it is determined that the emergency vehicle is approaching the ADV. Thus, the ADV can ignore sirens that are not approaching and avoid unnecessary disruption to driving.

Figs. 7-10 illustrate different non-exhaustive scenarios in which an ADV detects and responds to an emergency vehicle siren by performing a process such as that described with reference to fig. 4. In fig. 7, emergency vehicle 704 is approaching ADV 702. The ADV 702 has one or more microphones 703 that sense sounds in the environment of the ADV. Once the ADV determines that the emergency vehicle 704 is approaching the ADV (e.g., from behind as shown), the ADV may make driving decisions such as slowing, braking, and/or steering toward the roadside.

Fig. 8 shows a scenario in which the emergency vehicle 804 is moving away from the ADV 802. In this case, the ADV may detect that the emergency vehicle siren is moving away from the ADV, and the ADV may continue to travel along its current path without disruption.

Figs. 9 and 10 show a scenario in which an emergency vehicle and an ADV are on the same path. Even if the emergency vehicle is not on the same road or path, vehicles on the road must respond to the emergency vehicle, because non-emergency vehicles must yield so that the emergency vehicle has priority in selecting its path of travel. In fig. 9, an ADV 902 is driving toward an intersection and senses an approaching emergency vehicle 904 based on the siren of the emergency vehicle. The ADV may respond by slowing down, braking, and/or pulling over so that the emergency vehicle has priority in choosing a path (straight or turning) and so that the ADV does not block the emergency vehicle. In fig. 10, by contrast, when it is determined that the emergency vehicle 1004 is driving away from the ADV 1002, the ADV may safely continue along its own path without further consideration of the emergency vehicle.

It should be understood that the examples discussed with reference to figs. 7-10 are illustrative and not limiting. Many scenarios involving an ADV and the detection of whether an emergency vehicle is approaching are possible, not all of which are shown in this disclosure.

Note that some or all of the components shown and described above may be implemented in software, hardware, or a combination thereof. For example, such components may be implemented as software installed and stored in a persistent storage device, which may be loaded into memory and executed by a processor (not shown) to implement the processes or operations described throughout the present application. Alternatively, such components may be implemented as executable code programmed or embedded in special purpose hardware, such as an integrated circuit (e.g., an application specific IC or ASIC), a Digital Signal Processor (DSP), or a Field Programmable Gate Array (FPGA), which may be accessed from an application via a corresponding driver and/or operating system. Furthermore, such components may be implemented as specific hardware logic in a processor or processor core, as part of an instruction set accessible to software components via one or more specific instructions.

Some portions of the preceding detailed description have been presented in terms of algorithms and symbolic representations of operations on data bits within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, considered to be a self-consistent sequence of operations leading to a desired result. The operations are those requiring physical manipulations of physical quantities.

It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the above discussion, it is appreciated that throughout the description, discussions utilizing terms such as those set forth in the appended claims, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

Embodiments of the present disclosure also relate to an apparatus for performing the operations herein. Such a computer program is stored in a non-transitory computer readable medium. A machine-readable medium includes any mechanism for storing information in a form readable by a machine (e.g., a computer). For example, a machine-readable (e.g., computer-readable) medium includes a machine (e.g., computer) readable storage medium (e.g., read only memory ("ROM"), random access memory ("RAM"), magnetic disk storage medium, optical storage medium, flash memory device).

The processes or methods depicted in the preceding figures may be performed by processing logic that comprises hardware (e.g., circuitry, dedicated logic, etc.), software (e.g., embodied on a non-transitory computer readable medium), or a combination of both. Although the processes or methods are described above in terms of some sequential operations, it should be appreciated that some of the operations described may be performed in a different order. Further, some operations may be performed in parallel rather than sequentially.

Embodiments of the present disclosure are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of embodiments of the disclosure as described herein.

In the foregoing specification, embodiments of the present disclosure have been described with reference to specific exemplary embodiments thereof. It will be evident that various modifications may be made thereto without departing from the broader spirit and scope of the disclosure as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims

1. A computer-implemented method of operating an autonomous vehicle ADV, the method comprising:

performing an audio analysis on at least a portion of the audio signal stream in response to determining the emergency vehicle siren signal to determine whether the emergency vehicle is moving toward or away from the ADV, including analyzing camera data to determine whether the emergency vehicle is moving toward or away from the ADV in response to detecting that the amplitude and frequency of the emergency vehicle siren remains constant over time with the ADV moving; and

2. The method of claim 1, further comprising, in response to determining that the emergency vehicle is moving toward the ADV, controlling the ADV including at least one of steering the ADV out of a current driving lane or braking the ADV to slow down.

3. The method of claim 1, wherein performing audio analysis on at least a portion of the stream of audio signals comprises detecting a change in at least one of amplitude or frequency of an audio pattern across a plurality of audio frames of the audio signals.

4. The method of claim 3, further comprising, in response to detecting an increase in amplitude or frequency, determining that the emergency vehicle siren is moving toward the ADV.

5. The method of claim 1, further comprising performing a Digital Signal Processing (DSP) operation on the audio signal to remove noise.

6. The method of claim 1, wherein the audio signal comprises a plurality of audio frames and the stream of audio signals is stored in a buffer prior to applying the neural network to the audio signal and analyzing the change in the audio signature.

7. The method of claim 1, wherein the neural network is trained with audio data representative of emergency vehicle sirens collected from a plurality of emergency vehicles.

8. The method of claim 1, wherein the neural network is a convolutional neural network.

9. A non-transitory machine readable medium having instructions stored therein, which when executed by a processor, cause the processor to perform the operations of the method of operating an autonomous vehicle ADV of any of claims 1 to 8.

10. A data processing system, comprising:

a processor; and

a memory coupled to the processor and for storing instructions that, when executed by the processor, cause the processor to perform the operations of the method of operating an autonomous vehicle ADV of any of claims 1 to 8.

11. A computer program product comprising a computer program which, when executed by a processor, causes the processor to perform the operations of the method of operating an autonomous vehicle ADV of any of claims 1 to 8.