
WO2022086739A2 - Systems and methods for camera-lidar fused object detection - Google Patents

Systems and methods for camera-lidar fused object detection

Info

Publication number
WO2022086739A2
Authority
WO
WIPO (PCT)
Prior art keywords
lidar
dataset
image
segments
points
Prior art date
Application number
PCT/US2021/054333
Other languages
French (fr)
Other versions
WO2022086739A3 (en)
Inventor
Arsenii Saranin
Basel ALGHANEM
Benjamin D. BALLARD
Jason Ziglar
G. Peter K. CARR
Original Assignee
Argo AI, LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from US17/078,548 (US12135375B2)
Priority claimed from US17/078,561 (US12122428B2)
Priority claimed from US17/078,532 (US12050273B2)
Priority claimed from US17/078,543 (US11885886B2)
Priority claimed from US17/078,575 (US11430224B2)
Application filed by Argo AI, LLC
Priority to DE112021005607.7T (DE112021005607T5)
Priority to CN202180085904.7A (CN116685874A)
Publication of WO2022086739A2
Publication of WO2022086739A3

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001Planning or execution of driving tasks
    • B60W60/0011Planning or execution of driving tasks involving control alternatives for a single driving scenario, e.g. planning several paths to avoid obstacles
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W10/00Conjoint control of vehicle sub-units of different type or different function
    • B60W10/04Conjoint control of vehicle sub-units of different type or different function including control of propulsion units
    • B60W10/06Conjoint control of vehicle sub-units of different type or different function including control of propulsion units including control of combustion engines
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W10/00Conjoint control of vehicle sub-units of different type or different function
    • B60W10/18Conjoint control of vehicle sub-units of different type or different function including control of braking systems
    • B60W10/184Conjoint control of vehicle sub-units of different type or different function including control of braking systems with wheel brakes
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W10/00Conjoint control of vehicle sub-units of different type or different function
    • B60W10/20Conjoint control of vehicle sub-units of different type or different function including control of steering systems
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001Planning or execution of driving tasks
    • B60W60/0015Planning or execution of driving tasks specially adapted for safety
    • B60W60/0017Planning or execution of driving tasks specially adapted for safety of other traffic participants
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001Planning or execution of driving tasks
    • B60W60/0027Planning or execution of driving tasks using trajectory prediction for other traffic participants
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/86Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/88Lidar systems specially adapted for specific applications
    • G01S17/89Lidar systems specially adapted for specific applications for mapping or imaging
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/48Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S17/00
    • G01S7/4808Evaluating distance, position or velocity data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/803Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of input or preprocessed data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2420/00Indexing codes relating to the type of sensors based on the principle of their operation
    • B60W2420/40Photo, light or radio wave sensitive means, e.g. infrared sensors
    • B60W2420/408Radar; Laser, e.g. lidar
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B60VEHICLES IN GENERAL
    • B60WCONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2554/00Input parameters relating to objects
    • B60W2554/40Dynamic objects, e.g. animals, windblown objects
    • B60W2554/402Type
    • B60W2554/4026Cycles

Definitions

  • the present disclosure relates generally to object detection systems. More particularly, the present disclosure relates to implementing systems and methods for camera-LiDAR Fused (“CLF”) object detection with LiDAR-to-image detection matching, point pruning, local variation segmentation, segment merging and/or segment filtering.
  • CLF: camera-LiDAR Fused
  • Modern day vehicles have at least one on-board computer and have internet/satellite connectivity.
  • the software running on these on-board computers monitors and/or controls operations of the vehicles.
  • the vehicle also comprises LiDAR detectors for detecting objects in proximity thereto.
  • the LiDAR detectors generate LiDAR datasets that measure the distance from the vehicle to an object at a plurality of different times. These distance measurements can be used for tracking movements of the object, making predictions as to the object’s trajectory, and planning paths of travel for the vehicle based on the object’s predicted trajectory.
  • the present disclosure concerns implementing systems and methods for object detection with LiDAR-to-image detection matching.
  • the object detection may be used to control an autonomous vehicle.
  • the methods comprise: obtaining, by a computing device, a LiDAR dataset generated by a LiDAR system of the autonomous vehicle; and using, by a computing device, the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle.
  • the object is detected by: matching points of the LiDAR dataset to pixels in the at least one image; and detecting the object in a point cloud defined by the LiDAR dataset based on the matching.
  • the object detection is used to facilitate at least one autonomous driving operation (e.g., autonomous driving operation comprises an object tracking operation, an object trajectory prediction operation, a vehicle trajectory determination operation, and/or a collision avoidance operation).
  • the methods also comprise obtaining, by the computing device, at least one image that was captured at a time when a sensor of the LiDAR system swept over a center of a camera Field Of View (“FOV”), wherein the at least one image is used in addition to the LiDAR dataset to detect the object.
  • the matching may be based on identifiers for each object detected in the at least one image, a mask identifier, cell identifiers for a mask, confidence values for each cell, LiDAR point identifiers, LiDAR point coordinates, extrinsic LiDAR sensor and camera calibration parameters, and/or intrinsic camera calibration parameters.
  • the matching may comprise determining a probability distribution of pixels of the at least one image to which a point of the LiDAR dataset may project taking into account a projection uncertainty in view of camera calibration uncertainties.
  • the probability distribution is determined by computing a probability distribution function over image space coordinates for a pixel to which a point of the LiDAR dataset would probably project.
  • the probability distribution function may be computed in accordance with the following mathematical equation, where x_I and y_I represent image space coordinates for a pixel, and X, Y and Z represent LiDAR space coordinates for a point of the LiDAR dataset.
  • the probability distribution function may be converted to image detection mask coordinates in accordance with the following mathematical equation, where the remaining terms represent the image space boundaries of a bounding box and R represents a mask resolution.
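The two equations referenced above are not reproduced as text here. As a rough, non-authoritative sketch only, the following assumes a standard pinhole camera model with extrinsic parameters (R, t) and intrinsic matrix K, plus a simple linear rescaling into mask cells; the calibration-uncertainty spread described above is omitted, and all function and parameter names are illustrative.

```python
import numpy as np

def project_lidar_point(point_lidar, R, t, K):
    """Project a LiDAR-frame point (X, Y, Z) into image-space pixel coordinates
    (x_I, y_I) using extrinsic (R, t) and intrinsic (K) calibration."""
    p_cam = R @ np.asarray(point_lidar) + t   # LiDAR frame -> camera frame
    u, v, w = K @ p_cam                       # pinhole projection (homogeneous)
    return np.array([u / w, v / w])

def pixel_to_mask_cell(pixel, box_min, box_max, mask_resolution):
    """Convert image-space coordinates into detection-mask cell coordinates,
    given the image-space boundaries of the bounding box and the mask
    resolution R."""
    pixel, box_min, box_max = map(np.asarray, (pixel, box_min, box_max))
    cell = (pixel - box_min) / (box_max - box_min) * mask_resolution
    return np.clip(cell.astype(int), 0, mask_resolution - 1)
```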
  • the matching comprises determining a probability distribution over a set of object detections in which a point of the LiDAR dataset is likely to be, based on at least one confidence value indicating a level of confidence that at least one respective pixel of the at least one image belongs to a given detected object.
  • the probability distribution may be determined by computing a probability that a point of the LiDAR dataset projects into an image detection independent of all other image detections. For example, the probability may be computed in accordance with the following mathematical equation(s).
  • the matching comprises determining a probability that the LiDAR point does not project into any image detection.
  • the matching involves normalizing a plurality of probabilities determined for a given point of the LiDAR dataset in accordance with the following mathematical equation, where one term represents the probability that a point of the LiDAR dataset projects into an image detection independent of all other image detections, and the other term represents the probability that the LiDAR point does not project into any image detection.
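A minimal sketch of the normalization step, assuming (as an assumption, not the filing's exact equation) that the independent per-detection probabilities and the no-detection probability are simply rescaled so they sum to one:

```python
import numpy as np

def detection_membership_distribution(p_detections, p_none):
    """Normalize the per-detection probabilities computed for one LiDAR point.

    p_detections[i] is the probability that the point projects into image
    detection i independent of all other image detections; p_none is the
    probability that it projects into no image detection.  The normalized
    result is a distribution over the set of object detections (plus "none").
    """
    scores = np.append(np.asarray(p_detections, dtype=float), p_none)
    return scores / scores.sum()

# Example: a LiDAR point overlapping two detection masks.
print(detection_membership_distribution([0.6, 0.3], p_none=0.1))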
  • the present disclosure also concerns implementing systems and methods for CLF object detection with point pruning.
  • the present solution can be used to operate an autonomous vehicle.
  • the methods comprise: obtaining, by a computing device, a LiDAR dataset generated by a LiDAR system of the autonomous vehicle; and using, by a computing device, the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle.
  • the object is detected by generating a pruned LiDAR dataset by reducing a total number of points contained in the LiDAR dataset, and detecting the object in a point cloud defined by the pruned LiDAR dataset.
  • the object detection may be used by the computing device to facilitate at least one autonomous driving operation (e.g., an object tracking operation, an object trajectory prediction operation, a vehicle trajectory determination operation, and/or a collision avoidance operation).
  • the methods also comprise obtaining, by the computing device, at least one image that was captured at a time when a sensor of the LiDAR system swept over a center of a camera FOV.
  • the image is used in addition to the LiDAR dataset to detect the object.
  • the pruned LiDAR dataset is generated by downsampling the points based on a planned trajectory of the autonomous vehicle.
  • the points of the LiDAR dataset, corresponding to a first region along the planned trajectory of the autonomous vehicle, may be downsampled at a higher or lower sampling rate than points of the LiDAR dataset corresponding to a second region that is not along the planned trajectory of the autonomous vehicle.
  • the first region may comprise a region including points corresponding to at least one object that is unlikely to interfere with the autonomous vehicle when following the planned trajectory.
  • the second region may comprise a region including points corresponding to at least one object that is likely to interfere with the autonomous vehicle when following the planned trajectory.
  • the pruned LiDAR dataset is generated by downsampling the LiDAR dataset based on point labels assigned to the points.
  • Each of the point labels may comprise at least one of an object class identifier, a color, and/or a unique identifier.
  • the LiDAR dataset is downsampled by assigning a first importance label to points associated with a moving object class and a second importance label to points associated with a static object class.
  • the points assigned the first importance label may be downsampled (e.g., at a first resolution), and/or the points assigned the second importance label may be downsampled (e.g., at a second resolution lower than the first resolution).
  • the pruned LiDAR dataset is generated by downsampling the LiDAR dataset based on point distances from a bounding box.
  • a point may be removed from the LiDAR dataset when a respective one of the point distances is greater than a threshold distance.
  • the pruned LiDAR dataset is generated by downsampling the LiDAR dataset using a map that includes information associated with a planned trajectory of the autonomous vehicle.
  • a point may be removed from the LiDAR dataset when the point has a height less than a minimum height threshold value or greater than a maximum height threshold value.
  • the pruned LiDAR dataset is generated by downsampling the LiDAR dataset at a resolution selected based on a modeled process latency.
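The following sketch combines just two of the pruning strategies described above, height thresholds and importance-label downsampling; the trajectory-, bounding-box-distance-, map- and latency-based strategies are omitted, and the parameter names and "moving"/"static" labels are illustrative assumptions.

```python
import numpy as np

def prune_lidar_points(points, importance, min_height, max_height, static_keep_every=4):
    """Prune a LiDAR dataset before object detection.

    points     : (N, 3) array of x, y, z coordinates.
    importance : (N,) array of labels, "moving" or "static", e.g. derived from
                 per-point object-class labels.
    Points outside [min_height, max_height] are dropped, and points labeled
    "static" are downsampled at a coarser rate than "moving" points.
    """
    points = np.asarray(points)
    importance = np.asarray(importance)

    keep = (points[:, 2] >= min_height) & (points[:, 2] <= max_height)

    static_idx = np.flatnonzero(importance == "static")
    keep_static = np.zeros(len(points), dtype=bool)
    keep_static[static_idx[::static_keep_every]] = True   # keep every k-th static point
    keep &= (importance == "moving") | keep_static

    return points[keep]
```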
  • the present disclosure further concerns implementing systems and methods for object detection with local variation segmentation.
  • the object detection may be used to control an autonomous vehicle.
  • the method comprises: obtaining, by a computing device, a LiDAR dataset generated by a LiDAR system of the autonomous vehicle; and using, by a computing device, the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle.
  • the object is detected by: computing a distribution of object detections that each point of the LiDAR dataset is likely to be in; creating a plurality of segments of LiDAR data points using the distribution of object detections; and detecting the object in a point cloud defined by the LiDAR dataset based on the plurality of segments of LiDAR data points.
  • the object detection may be used by the computing device to facilitate at least one autonomous driving operation (e.g., autonomous driving operation comprises an object tracking operation, an object trajectory prediction operation, a vehicle trajectory determination operation, and/or a collision avoidance operation).
  • the methods also comprise obtaining, by the computing device, at least one image that was captured at a time when a sensor of the LiDAR system swept over a center of a camera FOV, wherein the at least one image is used in addition to the LiDAR dataset to detect the object.
  • the distribution of object detections may be computed based on (a) a probability distribution of pixels of the at least one image to which a point of the LiDAR dataset may project, and (b) a probability that the point does not project into any image detection.
  • the segments of LiDAR data points may be created by using the LiDAR dataset to construct a connectivity graph.
  • the connectivity graph comprises points of the LiDAR dataset plotted in a 3D coordinate system and connection lines respectively connecting the points.
  • the connection lines may be added to the connectivity graph based on whether two points of the LiDAR dataset are within a threshold spatial or temporal distance from each other, whether two points are nearest neighbors, or triangulation.
  • the segments of LiDAR data points are created by determining, for each point in the connectivity graph, a descriptor comprising a vector of elements that characterize a given point of the LiDAR data set.
  • the elements of the vector may comprise a surface normal, a color value based on the at least one image, an intensity, a texture, spatial coordinates, a height above ground, a class label, an instance identifier, an image based feature, a Fast Point Feature Histogram, an image detection capability, and/or a modified distance.
  • the segments of LiDAR data points are created by further assigning a weight to each connection line based on the descriptor.
  • the weight represents a dissimilarity measure between two points connected to each other in the connectivity graph via the connection line.
  • the plurality of segments of LiDAR data points are created by further merging points of the LiDAR dataset based on the weights. Two points may be merged together when a weight associated with a respective connection line is less than a threshold value.
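A simplified, non-authoritative sketch of the segmentation step follows: it builds a k-nearest-neighbor connectivity graph, weights each connection by descriptor dissimilarity, and merges points across edges whose weight falls below a fixed threshold using union-find. A local variation criterion (in the spirit of Felzenszwalb-Huttenlocher graph segmentation) would adapt this threshold per segment rather than keeping it fixed; names and defaults are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def segment_point_cloud(points, descriptors, k_neighbors=8, weight_threshold=0.5):
    """Segment a LiDAR point cloud with a simplified graph-based criterion.

    points      : (N, 3) array of LiDAR coordinates used to find neighbors.
    descriptors : (N, D) array of per-point feature vectors (e.g. surface
                  normal, color, intensity, height above ground).
    Returns one segment label per point.
    """
    points = np.asarray(points)
    descriptors = np.asarray(descriptors)
    n = len(points)
    parent = np.arange(n)

    def find(i):                             # union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    _, neighbors = cKDTree(points).query(points, k=k_neighbors + 1)
    for i, nbrs in enumerate(neighbors):
        for j in nbrs[1:]:                   # skip the point itself
            if j >= n:                       # padding index when fewer than k neighbors exist
                continue
            weight = np.linalg.norm(descriptors[i] - descriptors[j])
            if weight < weight_threshold:
                parent[find(i)] = find(j)    # merge the two segments

    return np.array([find(i) for i in range(n)])
```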
  • the present disclosure concerns implementing systems and methods for object detection with segment merging.
  • the object detection may be used to control an autonomous vehicle.
  • the methods comprise: obtaining, by a computing device, a LiDAR dataset generated by a LiDAR system of the autonomous vehicle; and using, by a computing device, the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle.
  • the object is detected by: computing a distribution of object detections that each point of the LiDAR dataset is likely to be in; creating a plurality of segments of LiDAR data points using the distribution of object detections; merging the plurality of segments of LiDAR data points to generate merged segments; and detecting the object in a point cloud defined by the LiDAR dataset based on the merged segments.
  • the object detection may be used by the computing device to facilitate at least one autonomous driving operation (e.g., an object tracking operation, an object trajectory prediction operation, a vehicle trajectory determination operation, and/or a collision avoidance operation).
  • the methods also comprise obtaining, by the computing device, at least one image that was captured at a time when a sensor of the LiDAR system swept over a center of a camera FOV, wherein the at least one image is used in addition to the LiDAR dataset to detect the object.
  • the distribution of object detections may be computed based on (a) a probability distribution of pixels of the at least one image to which a point of the LiDAR dataset may project, and (b) a probability that the point does not project into any image detection.
  • the merged segments may be generated by: selecting pairs of segments from the plurality of segments of LiDAR data points; computing features for each pair of segments based on attributes of the segments contained in the pair; generating, for each pair of segments, a probability that the segments contained in the pair should be merged based on the features; and merging the plurality of segments of LiDAR data points based on the probabilities generated for the pairs of segments.
  • the pairs of segments may be filtered to remove pairs of segments which have centroid-to-centroid distances greater than a threshold value.
  • the features may include, but are not limited to, a difference between the average of the probability distributions that were computed for the LiDAR data points contained in a first segment of the plurality of segments of LiDAR data points and the average of the probability distributions that were computed for the LiDAR data points contained in a second segment of the plurality of segments of LiDAR data points.
  • the attributes may include, but are not limited to, an average of a plurality of probability distributions that were computed for the LiDAR data points contained in a given segment of the plurality of segments of LiDAR data points, and/or each probability distribution specifying detected objects in which a given LiDAR data point is likely to be.
  • the attributes include a 2D region that the LiDAR data points in a given segment cover, a percentage of LiDAR data points contained in the given segment that are on a road, a percentage of LiDAR data points contained in the given segment that are off a road, and/or a total number of lanes that the given segment at least partially overlaps.
  • the features include a difference in on-road proportions, difference in off-road proportions, a region compatibility, a lane compatibility, a difference between a total number of lanes that a first segment of LiDAR data points at least partially overlaps and a total number of lanes that a second segment of LiDAR data points at least partially overlaps, a difference or distance in height between segments of LiDAR data points, a mask compatibility, a difference in object type distributions, and/or an object type compatibility.
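A hedged sketch of the pair-scoring loop described above, using an illustrative three-element feature vector and a caller-supplied merge model standing in for the trained classifier; the attribute names are assumptions, not the filing's.

```python
import numpy as np
from itertools import combinations

def propose_segment_merges(segments, merge_model, max_centroid_dist=3.0, threshold=0.5):
    """Score pairs of segments and return the pairs that should be merged.

    segments    : list of dicts with illustrative attributes: "centroid" (3,),
                  "detection_dist" (average per-point detection distribution),
                  "on_road_frac" and "height".
    merge_model : any callable mapping a feature vector to a merge probability.
    """
    merges = []
    for a, b in combinations(range(len(segments)), 2):
        sa, sb = segments[a], segments[b]
        # Filter out pairs whose centroid-to-centroid distance exceeds the threshold.
        if np.linalg.norm(np.asarray(sa["centroid"]) - np.asarray(sb["centroid"])) > max_centroid_dist:
            continue
        features = np.array([
            np.abs(np.asarray(sa["detection_dist"]) - np.asarray(sb["detection_dist"])).sum(),
            abs(sa["on_road_frac"] - sb["on_road_frac"]),   # difference in on-road proportions
            abs(sa["height"] - sb["height"]),               # height difference between segments
        ])
        if merge_model(features) > threshold:
            merges.append((a, b))
    return merges
```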
  • the present disclosure concerns implementing systems and methods for object detection with segment filtering.
  • the object detection can be used to control an autonomous vehicle.
  • the methods comprise: obtaining, by a computing device, a LiDAR dataset generated by a LiDAR system of the autonomous vehicle; and using, by a computing device, the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle.
  • the object is detected by performing the following operations: computing a distribution of object detections that each point of the LiDAR dataset is likely to be in; creating a plurality of segments of LiDAR data points using the distribution of object detections; merging the plurality of segments of LiDAR data points to generate merged segments; and detecting the object in a point cloud defined by the LiDAR dataset based on the merged segments.
  • the object detection is used by the computing device to facilitate at least one autonomous driving operation (e.g., an object tracking operation, an object trajectory prediction operation, a vehicle trajectory determination operation, and/or a collision avoidance operation).
  • the methods also comprise obtaining, by the computing device, at least one image that was captured at a time when a sensor of the LiDAR system swept over a center of a camera FOV, wherein the at least one image is used in addition to the LiDAR dataset to detect the object.
  • the distribution of object detections may be computed based on (a) a probability distribution of pixels of the at least one image to which a point of the LiDAR dataset may project, and (b) a probability that the point does not project into any image detection.
  • the detecting comprises obtaining information for a given detection mask and a given merged segment of the merged segments.
  • the information may comprise at least one of P_m representing a number of points of a LiDAR dataset that project into the given detection mask, S_i representing a number of points forming the given merged segment, P_s_m representing a number of points in the given merged segment projecting into the given detection mask, a height of the given merged segment, a length l_s of the given merged segment, and/or a width w_s of the given merged segment.
  • the detecting comprises determining at least one cluster feature based on the information.
  • the cluster feature may comprise: a cluster feature U determined based on a number of points of a LiDAR dataset that project into the given detection mask and/or a number of points forming the given merged segment; a cluster feature V determined based on a number of points in the given merged segment projecting into the given detection mask and/or a number of points of a LiDAR dataset that project into the given detection mask; and/or a cluster feature H representing a cluster height, a cluster feature L representing a cluster length, a cluster feature LTW representing a length-to-width ratio for a cluster, and/or a cluster feature C representing a cylinder convolution (or fit) score of clustered LiDAR data points.
  • the detecting comprises computing a projection score PS based on the at least one cluster feature.
  • the projection score PS is a product of two or more cluster features.
  • the detecting comprises using the projection score PS to verify that the given merged segment is part of a particular detected object that is associated with the given detection mask.
  • a verification may be made that the given merged segment is part of a particular detected object that is associated with the given detection mask when the projection score PS exceeds a threshold value or has a value greater than other projection scores determined for other merged segments with points in the given detection mask.
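A minimal sketch of the projection-score check; the specific ratios chosen for the cluster features U and V, and the use of a simple product as the projection score PS, are one plausible reading of the description above rather than the filing's exact formulas.

```python
def cluster_features(p_m, s_i, p_s_m):
    """Compute two of the cluster features for one merged segment and one
    detection mask: U from P_m and S_i, and V from P_s_m and P_m (illustrative
    ratios)."""
    u = s_i / p_m if p_m else 0.0     # segment size relative to mask point count
    v = p_s_m / p_m if p_m else 0.0   # fraction of mask points falling inside the segment
    return u, v

def best_segment_for_mask(candidates, threshold):
    """Pick the merged segment with the highest projection score for a mask and
    accept it only when the score exceeds the threshold.  Each candidate is a
    dict with keys p_m, s_i, p_s_m."""
    def score(c):
        u, v = cluster_features(c["p_m"], c["s_i"], c["p_s_m"])
        return u * v                  # projection score as a product of cluster features
    best = max(candidates, key=score)
    return best if score(best) > threshold else None
```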
  • the implementing systems can comprise: a processor; and a non-transitory computer- readable storage medium comprising programming instructions that are configured to cause the processor to implement a method for object detection.
  • the above described methods can also be implemented by a computer program product comprising a memory and programming instructions that are configured to cause a processor to perform operations.
  • FIG. 1 is an illustration of an illustrative system.
  • FIG. 2 is an illustration of an illustrative architecture for a vehicle.
  • FIG. 3 is an illustration of an illustrative architecture for a LiDAR system employed by the vehicle shown in FIG. 2.
  • FIG. 4 is an illustration of an illustrative computing device.
  • FIG. 5 provides a block diagram that is useful for understanding how vehicle control is achieved in accordance with the present solution.
  • FIGS. 6A-6B (collectively referred to herein as “FIG. 6”) provide a flow diagram of an illustrative method for controlling an autonomous vehicle using CLF object detection.
  • FIG. 7 provides a flow diagram of an illustrative method for CLF object detection.
  • FIG. 8 provides a flow diagram of an illustrative method for pruning (or reducing) the number of LiDAR data points that are processed for purposes of detecting an object that is located in proximity to an AV.
  • FIG. 9 provides a flow diagram of an illustrative method for performing a LiDAR-to-Image Detection (“LID”) matching algorithm.
  • LID: LiDAR-to-Image Detection
  • FIG. 10 provides an illustrative image captured by a camera of a vehicle.
  • FIG. 11 provides an illustrative image having a plurality of bounding boxes overlaid thereon.
  • FIG. 12 provides an illustrative image having a bounding box and mask overlaid thereon.
  • FIG. 13 provides a flow diagram of an illustrative method for determining a probability distribution of pixels to which a LiDAR data point may project taking into account a projection uncertainty.
  • FIG. 14 provides a flow diagram of an illustrative method for determining a probability distribution over a set of object detections in which a LiDAR data point is likely to be.
  • FIG. 15 provides an illustration that is useful for understanding the novel Local Variation Segmentation (“LVS”) algorithm of the present solution.
  • LVS: Local Variation Segmentation
  • FIG. 16 provides an illustration showing a graph that is generated during the LVS algorithm of FIG. 15.
  • FIG. 17 provides an illustration of an illustrative architecture for a segment merger.
  • FIG. 18 provides a flow diagram of an illustrative method for object detection segment filtering.
  • An “electronic device” or a “computing device” refers to a device that includes a processor and memory. Each device may have its own processor and/or memory, or the processor and/or memory may be shared with other devices as in a virtual machine or container arrangement.
  • the memory will contain or receive programming instructions that, when executed by the processor, cause the electronic device to perform one or more operations according to the programming instructions.
  • The terms “memory,” “memory device,” “data store,” “data storage facility” and the like each refer to a non-transitory device on which computer-readable data, programming instructions or both are stored. Except where specifically stated otherwise, these terms are intended to include single device embodiments, embodiments in which multiple memory devices together or collectively store a set of data or instructions, as well as individual sectors within such devices.
  • The terms “processor” and “processing device” refer to a hardware component of an electronic device that is configured to execute programming instructions. Except where specifically stated otherwise, the singular term “processor” or “processing device” is intended to include both single-processing device embodiments and embodiments in which multiple processing devices together or collectively perform a process.
  • The term “vehicle” refers to any moving form of conveyance that is capable of carrying one or more human occupants and/or cargo and is powered by any form of energy.
  • The term “vehicle” includes, but is not limited to, cars, trucks, vans, trains, autonomous vehicles, aircraft, aerial drones and the like.
  • An “autonomous vehicle” is a vehicle having a processor, programming instructions and drivetrain components that are controllable by the processor without requiring a human operator.
  • An autonomous vehicle may be fully autonomous in that it does not require a human operator for most or all driving conditions and functions, or it may be semi-autonomous in that a human operator may be required in certain conditions or for certain operations, or that a human operator may override the vehicle’s autonomous system and may take control of the vehicle.
  • the present solution concerns systems and methods for controlling vehicles.
  • the methods generally involve: generating a vehicle trajectory for the vehicle that is in motion; performing CLF object detection operations to detect an object within a given distance from the vehicle; generating at least one possible object trajectory for the object which was detected; using the vehicle trajectory and at least one possible object trajectory to determine whether there is an undesirable probability that a collision will occur between the vehicle and the object; and modifying the vehicle trajectory when a determination is made that there is an undesirable probability that the collision will occur.
  • the present solution is being described herein in the context of an autonomous vehicle.
  • the present solution is not limited to autonomous vehicle applications.
  • the present solution can be used in other applications such as robotic applications, radar system applications, metric applications, and/or system performance applications.
  • System 100 comprises a vehicle 102₁ that is traveling along a road in a semi-autonomous or autonomous manner.
  • Vehicle 102₁ is also referred to herein as an Autonomous Vehicle (“AV”).
  • the AV 102₁ can include, but is not limited to, a land vehicle (as shown in FIG. 1), an aircraft, or a watercraft.
  • AV 102₁ is generally configured to detect objects 102₂, 114, 116 in proximity thereto.
  • the objects can include, but are not limited to, a vehicle 102₂, a cyclist 114 (such as a rider of a bicycle, electric scooter, motorcycle, or the like) and/or a pedestrian 116.
  • the object detection is achieved in accordance with a novel CLF object detection process.
  • the novel CLF object detection process will be described in detail below.
  • When such a detection is made, AV 102₁ performs operations to: generate one or more possible object trajectories for the detected object; and analyze at least one of the generated possible object trajectories to determine whether or not there is an undesirable probability that a collision will occur between the AV and the object in a threshold period of time (e.g., 1 minute). If so, the AV 102₁ performs operations to determine whether the collision can be avoided if a given vehicle trajectory is followed by the AV 102₁ and any one of a plurality of dynamically generated emergency maneuvers is performed in a pre-defined time period (e.g., N milliseconds).
  • If the collision can be avoided, then the AV 102₁ takes no action or optionally performs a cautious maneuver (e.g., mildly slows down). In contrast, if the collision cannot be avoided, then the AV 102₁ immediately takes an emergency maneuver (e.g., brakes and/or changes direction of travel).
  • Referring now to FIG. 2, there is provided an illustration of an illustrative system architecture 200 for a vehicle.
  • Vehicles 102₁ and/or 102₂ of FIG. 1 can have the same or similar system architecture as that shown in FIG. 2.
  • system architecture 200 is sufficient for understanding vehicle(s) 102₁, 102₂ of FIG. 1.
  • the vehicle 200 includes an engine or motor 202 and various sensors 204-218 for measuring various parameters of the vehicle.
  • the sensors may include, for example, an engine temperature sensor 204, a battery voltage sensor 206, an engine Rotations Per Minute (“RPM”) sensor 208, and a throttle position sensor 210.
  • the vehicle may have an electric motor, and accordingly will have sensors such as a battery monitoring system 212 (to measure current, voltage and/or temperature of the battery), motor current 214 and voltage 216 sensors, and motor position sensors such as resolvers and encoders 218.
  • Operational parameter sensors that are common to both types of vehicles include, for example: a position sensor 236 such as an accelerometer, gyroscope and/or inertial measurement unit; a speed sensor 238; and an odometer sensor 240.
  • the vehicle also may have a clock 242 that the system uses to determine vehicle time during operation.
  • the clock 242 may be encoded into the vehicle on-board computing device, it may be a separate device, or multiple clocks may be available.
  • the vehicle also will include various sensors that operate to gather information about the environment in which the vehicle is traveling.
  • sensors may include, for example: a location sensor 260 (e.g., a Global Positioning System (“GPS”) device); object detection sensors such as one or more cameras 262; a LiDAR sensor system 264; and/or a radar and/or a sonar system 266.
  • the sensors also may include environmental sensors 268 such as a precipitation sensor and/or ambient temperature sensor.
  • the object detection sensors may enable the vehicle to detect objects that are within a given distance range of the vehicle 200 in any direction, while the environmental sensors collect data about environmental conditions within the vehicle’s area of travel.
  • the on-board computing device 220 analyzes the data captured by the sensors and optionally controls operations of the vehicle based on results of the analysis. For example, the on-board computing device 220 may control: braking via a brake controller 232; direction via a steering controller 224; speed and acceleration via a throttle controller 226 (in a gas-powered vehicle) or a motor speed controller 228 (such as a current level controller in an electric vehicle); a differential gear controller 230 (in vehicles with transmissions); and/or other controllers.
  • Geographic location information may be communicated from the location sensor 260 to the on-board computing device 220, which may then access a map of the environment that corresponds to the location information to determine known fixed features of the environment such as streets, buildings, stop signs and/or stop/go signals. Captured images from the cameras 262 and/or object detection information captured from sensors such as LiDAR 264 are communicated from those sensors to the on-board computing device 220. The object detection information and/or captured images are processed by the on-board computing device 220 to detect objects in proximity to the vehicle 200. Any known or to be known technique for making an object detection based on sensor data and/or captured images can be used in the embodiments disclosed in this document.
  • LiDAR information is communicated from LiDAR sensor 264 to the on-board computing device 220. Additionally, captured images are communicated from the camera(s) 262 to the on-board computing device 220. The LiDAR information and/or captured images are processed by the on-board computing device 220 to detect objects in proximity to the vehicle 200. The manner in which the object detections are made by the on-board computing device 220 will become evident as the discussion progresses.
  • When the on-board computing device 220 detects a moving object, the on-board computing device 220 will generate one or more possible object trajectories for the detected object, and analyze the possible object trajectories to assess the probability of a collision between the object and the AV. If the probability exceeds an acceptable threshold, the on-board computing device 220 performs operations to determine whether the collision can be avoided if the AV follows a defined vehicle trajectory and/or implements one or more dynamically generated emergency maneuvers in a pre-defined time period (e.g., N milliseconds). If the collision can be avoided, then the on-board computing device 220 may cause the vehicle 200 to perform a cautious maneuver (e.g., mildly slow down, accelerate, or swerve). In contrast, if the collision cannot be avoided, then the on-board computing device 220 will cause the vehicle 200 to take an emergency maneuver (e.g., brake and/or change direction of travel).
  • LiDAR system 264 of FIG. 2 may be the same as or substantially similar to the LiDAR system 300. As such, the discussion of LiDAR system 300 is sufficient for understanding LiDAR system 264 of FIG. 2.
  • the LiDAR system 300 includes a housing 306 which may be rotatable 360° about a central axis such as hub or axle 316.
  • the housing may include an emitter/receiver aperture 312 made of a material transparent to light.
  • the present solution is not limited in this regard. In other scenarios, multiple apertures for emitting and/or receiving light may be provided. Either way, the LiDAR system 300 can emit light through one or more of the aperture(s) 312 and receive reflected light back toward one or more of the aperture(s) 312 as the housing 306 rotates around the internal components.
  • the outer shell of housing 306 may be a stationary dome, at least partially made of a material that is transparent to light, with rotatable components inside of the housing 306.
  • a light emitter system 304 that is configured and positioned to generate and emit pulses of light through the aperture 312 or through the transparent dome of the housing 306 via one or more laser emitter chips or other light emitting devices.
  • the emitter system 304 may include any number of individual emitters (e.g., 8 emitters, 64 emitters, or 128 emitters). The emitters may emit light of substantially the same intensity or of varying intensities.
  • the individual beams emitted by the light emitter system 304 will have a well-defined state of polarization that is not the same across the entire array. As an example, some beams may have vertical polarization and other beams may have horizontal polarization.
  • the LiDAR system will also include a light detector 308 containing a photodetector or array of photodetectors positioned and configured to receive light reflected back into the system.
  • the emitter system 304 and light detector 308 would rotate with the rotating shell, or they would rotate inside the stationary dome of the housing 306.
  • One or more optical element structures 310 may be positioned in front of the light emitting unit 304 and/or the light detector 308 to serve as one or more lenses or waveplates that focus and direct light that is passed through the optical element structure 310.
  • One or more optical element structures 310 may be positioned in front of a mirror 312 to focus and direct light that is passed through the optical element structure 310.
  • the system includes an optical element structure 310 positioned in front of the mirror 312 and connected to the rotating elements of the system so that the optical element structure 310 rotates with the mirror 312.
  • the optical element structure 310 may include multiple such structures (for example lenses and/or waveplates).
  • multiple optical element structures 310 may be arranged in an array on or integral with the shell portion of the housing 306.
  • each optical element structure 310 may include a beam splitter that separates light that the system receives from light that the system generates.
  • the beam splitter may include, for example, a quarter-wave or half-wave waveplate to perform the separation and ensure that received light is directed to the receiver unit rather than to the emitter system (which could occur without such a waveplate as the emitted light and received light should exhibit the same or similar polarizations).
  • the LiDAR system will include a power unit 318 to power the light emitting unit 304, a motor 316, and electronic components.
  • the LiDAR system will also include an analyzer 314 with elements such as a processor 322 and non-transitory computer-readable memory 320 containing programming instructions that are configured to enable the system to receive data collected by the light detector unit, analyze it to measure characteristics of the light received, and generate information that a connected system can use to make decisions about operating in an environment from which the data was collected.
  • the analyzer 314 may be integral with the LiDAR system 300 as shown, or some or all of it may be external to the LiDAR system and communicatively connected to the LiDAR system via a wired or wireless communication network or link.
  • Referring now to FIG. 4, there is provided an illustration of an illustrative architecture for a computing device 400.
  • the computing device 110 of FIG. 1 and/or the vehicle on-board computing device 220 of FIG. 2 is/are the same as or similar to computing device 400. As such, the discussion of computing device 400 is sufficient for understanding the computing device 110 of FIG. 1 and the vehicle on-board computing device 220 of FIG. 2.
  • Computing device 400 may include more or fewer components than those shown in FIG. 4. However, the components shown are sufficient to disclose an illustrative solution implementing the present solution.
  • the hardware architecture of FIG. 4 represents one implementation of a representative computing device configured to operate a vehicle, as described herein. As such, the computing device 400 of FIG. 4 implements at least a portion of the method(s) described herein.
  • the hardware includes, but is not limited to, one or more electronic circuits.
  • the electronic circuits can include, but are not limited to, passive components (e.g., resistors and capacitors) and/or active components (e.g., amplifiers and/or microprocessors).
  • the passive and/or active components can be adapted to, arranged to and/or programmed to perform one or more of the methodologies, procedures, or functions described herein.
  • the computing device 400 comprises a user interface 402, a Central Processing Unit (“CPU”) 406, a system bus 410, a memory 412 connected to and accessible by other portions of computing device 400 through system bus 410, a system interface 460, and hardware entities 414 connected to system bus 410.
  • the user interface can include input devices and output devices, which facilitate user-software interactions for controlling operations of the computing device 400.
  • the input devices include, but are not limited to, a physical and/or touch keyboard 450.
  • the input devices can be connected to the computing device 400 via a wired or wireless connection (e.g., a Bluetooth® connection).
  • the output devices include, but are not limited to, a speaker 452, a display 454, and/or light emitting diodes 456.
  • System interface 460 is configured to facilitate wired or wireless communications to and from external devices (e.g., network nodes such as access points, etc.).
  • Hardware entities 414 perform actions involving access to and use of memory 412, which can be a Random Access Memory (“RAM”), a disk drive, flash memory, a Compact Disc Read Only Memory (“CD-ROM”) and/or another hardware device that is capable of storing instructions and data.
  • Hardware entities 414 can include a disk drive unit 416 comprising a computer-readable storage medium 418 on which is stored one or more sets of instructions 420 (e.g., software code) configured to implement one or more of the methodologies, procedures, or functions described herein.
  • the instructions 420 can also reside, completely or at least partially, within the memory 412 and/or within the CPU 406 during execution thereof by the computing device 400.
  • the memory 412 and the CPU 406 also can constitute machine-readable media.
  • The term “machine-readable media” refers to a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions 420.
  • The term “machine-readable media” also refers to any medium that is capable of storing, encoding or carrying a set of instructions 420 for execution by the computing device 400 and that cause the computing device 400 to perform any one or more of the methodologies of the present disclosure.
  • Referring now to FIG. 5, there is provided a block diagram that is useful for understanding how vehicle control is achieved in accordance with the present solution. All of the operations performed in blocks 502-518 can be performed by the on-board computing device of a vehicle (e.g., AV 102₁ of FIG. 1).
  • a location of the vehicle is detected. This detection can be made based on sensor data output from a location sensor (e.g., location sensor 260 of FIG. 2) of the vehicle. This sensor data can include, but is not limited to, GPS data.
  • the detected location of the vehicle is then passed to block 506.
  • an object is detected within proximity of the vehicle. This detection is made based on sensor data output from a LiDAR system (e.g., LiDAR system 264 of FIG. 2) and a camera (e.g., camera 262 of FIG. 2) of the vehicle. The manner in which the object detection is achieved will become evident as the discussion progresses.
  • Information about the detected object is passed to block 506. This information includes, but is not limited to, an initial predicted trajectory of the object, a speed of the object, a full extent of the object, a heading of the object, a direction of travel of the object, and/or a classification of the object.
  • the full extent of the object and the heading of the object can be specified by a cuboid defined in a 3D graph on which the LiDAR data points are plotted.
  • the plotted LiDAR data points form a 3D point cloud.
  • the initial predicted object trajectory can include, but is not limited to, a linear path pointing in the heading direction of the cuboid.
  • This object detection information output from block 504 can be subsequently used to facilitate at least one autonomous driving operation (e.g., object tracking operations, object trajectory prediction operations, vehicle trajectory determination operations, and/or collision avoidance operations).
  • a cuboid can be defined for the detected object in a 3D graph comprising a LiDAR dataset.
  • the cuboid heading and geometry can be used to predict object trajectories in block 512 as discussed below and/or determine a vehicle trajectory in block 506 as discussed below.
  • a worst-case predicted object trajectory can be identified and used to trigger emergency maneuvers in blocks 514-518 as discussed below.
  • the present solution is not limited to the particulars of this example.
  • a vehicle trajectory is generated using the information from blocks 502 and 504.
  • Techniques for determining a vehicle trajectory are well known in the art. Any known or to be known technique for determining a vehicle trajectory can be used herein without limitation. For example, in some scenarios, such a technique involves determining a trajectory for the AV that would pass the object when the object is in front of the AV, the cuboid has a heading direction that is aligned with the direction in which the AV is moving, and the cuboid has a length that is greater than a threshold value. The present solution is not limited to the particulars of this scenario.
  • the vehicle trajectory 520 can be determined based on the location information from block 502, the object detection information from block 504, and map information 528 (which is pre-stored in a data store of the vehicle).
  • the vehicle trajectory 520 may represent a smooth path that does not have abrupt changes that would otherwise provide passenger discomfort.
  • the vehicle trajectory is defined by a path of travel along a given lane of a road in which the object is not predicted to travel within a given amount of time.
  • the vehicle trajectory 520 is then provided to block 508.
  • a steering angle and velocity command is generated based on the vehicle trajectory 520.
  • the steering angle and velocity command is provided to block 510 for vehicle dynamics control.
  • the present solution augments the above-described vehicle trajectory planning process 500 of blocks 502-510 with an additional supervisory layer process 550.
  • the additional supervisory layer process 550 optimizes the vehicle trajectory for the most likely behavior of the objects detected in block 504, but nonetheless maintains acceptable operations if worst-case behaviors occur.
  • This additional supervisory layer process 550 is implemented by blocks 512-518.
  • an object classification is performed in block 504 to classify the detected object into one of a plurality of classes and/or sub-classes.
  • the classes can include, but are not limited to, a vehicle class and a pedestrian class.
  • the vehicle class can have a plurality of vehicle sub-classes.
  • the vehicle sub-classes can include, but are not limited to, a bicycle subclass, a motorcycle sub-class, a skateboard sub-class, a roller blade sub-class, a scooter sub-class, a sedan sub-class, an SUV sub-class, and/or a truck sub-class.
  • the object classification is made based on sensor data generated by a LiDAR system (e.g., LiDAR system 264 of FIG. 2).
  • Information 530 specifying the object’s classification is provided to block 512, in addition to the information 532 indicating the object’s actual speed and direction of travel.
  • Block 512 involves determining one or more possible object trajectories for the object detected in 504.
  • the possible object trajectories can include, but are not limited to, the following trajectories:
  • a trajectory defined by the object’s actual speed (e.g., 1 mile per hour) and actual direction of travel (e.g., west);
  • a trajectory defined by the object’s actual speed (e.g., 1 mile per hour) and another possible direction of travel (e.g., south, south-west, or X (e.g., 40°) degrees from the object’s actual direction of travel in a direction towards the AV) for the object;
  • a trajectory defined by another possible speed for the object (e.g., 2-10 miles per hour) and another possible direction of travel (e.g., south, south-west, or X (e.g., 40°) degrees from the object’s actual direction of travel in a direction towards the AV) for the object.
  • the possible speed(s) and/or possible direction(s) of travel may be pre-defined for objects in the same class and/or sub-class as the object.
  • the one or more possible object trajectories 522 is(are) then passed to block 514.
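An illustrative sketch of how such a set of candidate trajectories could be enumerated; the alternative heading offsets and speeds are placeholder values, not class tables from the filing.

```python
import numpy as np

def possible_object_trajectories(position_xy, speed_mps, heading_deg,
                                 alt_heading_offsets_deg=(40.0, -40.0),
                                 alt_speeds_mps=(2.0, 4.5),
                                 horizon_s=3.0, dt_s=0.5):
    """Enumerate candidate straight-line trajectories for a detected object:
    its actual speed and heading plus class-dependent alternative headings and
    speeds (the offsets and speeds here are illustrative placeholders)."""
    times = np.arange(dt_s, horizon_s + dt_s, dt_s)
    candidates = []
    for speed in (speed_mps, *alt_speeds_mps):
        for heading in (heading_deg, *(heading_deg + d for d in alt_heading_offsets_deg)):
            rad = np.deg2rad(heading)
            direction = np.array([np.cos(rad), np.sin(rad)])
            candidates.append(np.asarray(position_xy) + np.outer(times * speed, direction))
    return candidates
```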
  • the system may cause the vehicle’s speed and steering controllers to move the vehicle according to the defined trajectory as discussed below.
  • Block 512 may optionally also involve selecting one of the possible object trajectories which provides a worst-case collision scenario for the AV. This determination is made based on information 532 indicating the AV’s actual speed and direction of travel. The selected possible object trajectory is then passed to block 514, instead of all the possible object trajectories determined in 512.
  • a collision check is performed for each of the possible object trajectories 522 passed to block 514.
  • the collision check involves determining whether there is an undesirable probability that a collision will occur between the vehicle and the object. Such a determination is made by first determining if the vehicle trajectory 520 and a given possible object trajectory 522 intersect. If the two trajectories 520, 522 do not intersect, then the vehicle trajectory 520 is deemed to be an acceptable vehicle trajectory and no control action is taken to modify the vehicle trajectory.
  • a predicted time at which a collision would occur if the two trajectories are followed is determined.
• the predicted time is compared to a threshold value (e.g., 1 second). If the predicted time exceeds the threshold value, then the vehicle trajectory 520 is deemed to be an acceptable vehicle trajectory and no control action is taken to modify the vehicle trajectory.
• If the predicted time is equal to or less than the threshold value, then a determination is made as to whether the collision can be avoided if (a) the vehicle trajectory is followed by the AV and (b) any one of a plurality of dynamically generated emergency maneuvers is performed in a pre-defined time period (e.g., N milliseconds).
  • the dynamically generated emergency maneuvers include, but are not limited to, the following:
• an emergency maneuver that comprises a braking command and that is determined based on the vehicle trajectory and a possible object trajectory;
• an emergency maneuver that comprises at least a steering command, and a braking command or an acceleration command, and that is determined via a gradient descent from the active AV trajectory on an objective function which penalizes collision and/or ride discomfort; and/or
  • an emergency maneuver that comprises a pre-defined emergency maneuver that has been optimized via a gradient descent from the active AV trajectory on an objective function which penalizes collision and/or ride discomfort.
  • an emergency braking maneuver is produced by postulating a trajectory that maintains the intended trajectory for the pre-defined time period (N milliseconds) and then decelerates at a maximum braking profile parameterized by maximum allowable deceleration and jerk limits.
  • the maximum braking profile is produced along the original trajectory via Euler integration of a new velocity profile, or by other methods. The present solution is not limited to the particulars of these scenarios.
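The following is a minimal, hypothetical Python sketch of how such a braking profile could be generated: it holds the current speed for a pre-defined period, then ramps deceleration up at a jerk limit until a maximum allowable deceleration is reached, integrating the velocity with Euler steps. The parameter names and values are illustrative assumptions, not values from the present solution.

```python
def braking_velocity_profile(v0, hold_time=0.2, dt=0.05,
                             max_decel=6.0, max_jerk=10.0):
    """Sketch: velocity profile that holds speed for `hold_time` seconds,
    then decelerates at a jerk-limited, saturating braking profile.
    All parameters (m/s, m/s^2, m/s^3) are illustrative assumptions."""
    t, v, decel = 0.0, v0, 0.0
    profile = [(t, v)]
    while v > 0.0:
        if t >= hold_time:
            # ramp deceleration up at the jerk limit, saturating at max_decel
            decel = min(max_decel, decel + max_jerk * dt)
        v = max(0.0, v - decel * dt)   # Euler integration of the new velocity
        t += dt
        profile.append((t, v))
    return profile

# Example: profile for an AV travelling at 15 m/s
samples = braking_velocity_profile(15.0)
```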
• an emergency maneuver that comprises both steering and braking is generated by: parameterizing both steering and braking with a limited set of spline points (e.g., 4 spline points for steering and 3 spline points for velocity); minimizing an objective function which penalizes collision and/or ride discomfort, as a function of those parameters, using conjugate gradient descent, Newton's method, Powell's method, or other existing method(s) for minimizing multivariate functions; and computing the trajectory corresponding to the parameterized spline points with the minimal objective function cost. The present solution is not limited to the particulars of these scenarios.
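As one illustrative sketch of the spline-parameterized optimization described above, the snippet below treats 4 steering spline points and 3 velocity spline points as decision variables and minimizes a toy objective that penalizes proximity to an obstacle and ride discomfort using Powell's method from SciPy. The rollout, objective terms, weights and helper names are assumptions made for illustration only, not the present solution's actual cost function.

```python
import numpy as np
from scipy.optimize import minimize

def objective(params, obstacle_xy=(20.0, 1.0)):
    """Toy cost: penalize proximity to a predicted obstacle position and
    penalize large spline-point changes (a stand-in for ride discomfort)."""
    steer = params[:4]          # 4 steering spline points (assumed)
    vel = params[4:]            # 3 velocity spline points (assumed)
    # crude rollout: x advances with speed, lateral offset grows with steering
    x = np.cumsum(np.interp(np.linspace(0, 1, 20), np.linspace(0, 1, 3), vel)) * 0.1
    y = np.cumsum(np.interp(np.linspace(0, 1, 20), np.linspace(0, 1, 4), steer)) * 0.1
    dist = np.hypot(x - obstacle_xy[0], y - obstacle_xy[1]).min()
    collision_cost = np.exp(-dist)                    # large when close to obstacle
    discomfort_cost = np.sum(np.diff(steer) ** 2) + np.sum(np.diff(vel) ** 2)
    return collision_cost + 0.1 * discomfort_cost

x0 = np.concatenate([np.zeros(4), np.full(3, 10.0)])  # initial spline points
result = minimize(objective, x0, method="Powell")     # or CG / Newton-type methods
best_spline_points = result.x
```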
  • a pre-defined emergency maneuver is generated by recording commands from a human operator during a simulated emergency braking event, or by sampling a small set of steering torques and braking profiles applied to the current vehicle state. These torques are computed at constant intervals from zero up until the limits of the steering and brake mechanism, or by other methods. The present solution is not limited to the particulars of these scenarios. [0099] If it is determined that the collision can be avoided in the pre-defined time period, then the vehicle trajectory 520 is deemed to be an acceptable vehicle trajectory and no control action is taken to modify the vehicle trajectory. Alternatively, the AV is caused to perform a cautious maneuver (e.g., mildly slow down such as by 5-10 mph).
  • a control action command is generated as shown by 516, and used to adjust or otherwise modify the vehicle trajectory at 508 prior to being passed to block 510.
  • the vehicle trajectory can be adjusted or otherwise modified to cause the vehicle to decelerate, cause the vehicle to accelerate, and/or cause the vehicle to change its direction of travel.
  • the AV is caused to immediately take an emergency maneuver.
  • This emergency maneuver may include one of the dynamically generated emergency maneuvers discussed above. Techniques for causing an AV to take emergency maneuvers are well known in the art.
• Referring now to FIG. 6, there is provided a flow diagram of an illustrative method 600 for controlling a vehicle (e.g., vehicle 102i of FIG. 1). At least a portion of method 600 is performed by a vehicle on-board computing device (e.g., vehicle on-board computing device 220 of FIG. 2). Method 600 is performed for each object (e.g., vehicle 1022 of FIG. 1, cyclist 104 of FIG. 1, and/or pedestrian 106 of FIG. 1) that has been detected to be within a distance range from the vehicle at any given time.
  • Method 600 comprises a plurality of operations 602-630.
  • the present solution is not limited to the particular order of operations 602-630 shown in FIG. 6.
  • the operations of 620 can be performed in parallel with the operations of 604-618, rather than subsequent to as shown in FIG. 6.
  • method 600 begins with 602 and continues with 604 where a vehicle trajectory (e.g., vehicle trajectory 520 of FIG. 5) for an AV is generated.
  • the vehicle trajectory represents a smooth path that does not have abrupt changes that would otherwise provide passenger discomfort.
  • Techniques for determining a vehicle trajectory are well known in the art. Any known or to be known technique for determining a vehicle trajectory can be used herein without limitation.
• the vehicle trajectory is determined based on location information generated by a location sensor (e.g., location sensor 260 of FIG. 2) of the AV, object detection information generated by the on-board computing device (e.g., on-board computing device 220 of FIG. 2), and/or map information.
  • lane information is used as an alternative to or in addition to the location information and/or map information.
  • method 600 continues with 605 where the AV performs operations to detect an object that is in proximity thereto.
  • a CLF object detection algorithm is employed in 605.
  • the CLF object detection algorithm will be described in detail below.
  • the object detection is then used to facilitate at least one autonomous driving operation (e.g., object tracking operations, object trajectory prediction operations, vehicle trajectory determination operations, and/or collision avoidance operations).
  • a cuboid can be defined for the detected object in a 3D graph comprising a LiDAR data set.
  • the cuboid specifies a heading of the object and/or full extent of the object’s geometry.
  • the heading and object geometry can be used to predict an object trajectory and/or determine a vehicle trajectory, as is known in the art and discussed above.
  • the present solution is not limited to the particulars of this example.
• method 600 continues with 606 where one or more possible object trajectories (e.g., possible object trajectories 522 of FIG. 5) are determined for the object (e.g., vehicle 1022, cyclist 104 or pedestrian 106 of FIG. 1) detected in 605.
• the possible object trajectories can include, but are not limited to, the following trajectories: a trajectory defined by the object's actual speed (e.g., 1 mile per hour) and actual direction of travel (e.g., west); a trajectory defined by the object's actual speed (e.g., 1 mile per hour) and another possible direction of travel (e.g., south, south-west, or X (e.g., 40°) degrees from the object's actual direction of travel in a direction towards the AV); a trajectory defined by another possible speed for the object (e.g., 2-10 miles per hour) and the object's actual direction of travel (e.g., west); and/or a trajectory defined by another possible speed for the object (e.g., 2-10 miles per hour) and another possible direction of travel (e.g., south or south-west or X (e.g., 40°) degrees from the object's actual direction of travel in a direction towards the AV).
  • one of the possible object trajectories is selected for subsequent analysis.
  • the operations of 610-628 are performed (e.g., in an iterative or parallel manner) for each possible object trajectory generated in 606.
• the operations of 610-628 are performed for only one of the possible object trajectories which provides a worst-case collision scenario for the AV. This worst-case possible object trajectory is selected based on information indicating the AV's actual speed and direction of travel (e.g., generated by a speed sensor 238 of FIG. 2 and/or location sensor 260 of FIG. 2).
• a worst-case collision scenario may include, but is not limited to, a collision scenario which is to occur sooner than all other collision scenarios provided by the possible object trajectories and/or is expected to result in serious injury or death (e.g., a high speed, side-impact collision or a high speed, head-on collision).
• the operations 610-628 are performed for two or more of the possible object trajectories which provide the top Z (e.g., 2 or 5) worst-case collision scenarios for the AV.
  • Z is an integer selected in accordance with a particular application. The present solution is not limited to the particulars of these scenarios.
• next, in 610, a determination is made as to whether the vehicle trajectory generated in 604 and the possible object trajectory selected in 608 intersect each other. If the two trajectories do not intersect each other [611:NO], then 612 is performed where method 600 returns to 604.
  • method 600 continues to 614 where a time value is determined.
  • This time value represents a time at which a collision will occur if the vehicle trajectory is followed by the AV and the possible object trajectory is followed by the object.
  • the time value determined in 614 is then compared to a threshold time value, as shown by 616.
• the threshold time value is selected in accordance with a given application (e.g., one or more seconds). If the time value is greater than the threshold time value [616:NO], then 618 is performed where method 600 returns to 604. If the time value is equal to or less than the threshold time value [616:YES], then method 600 continues with 620-622.
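A compact sketch of the intersection and time-threshold check of 610-616 follows, under the simplifying assumption that both trajectories are available as time-aligned, time-stamped 2D waypoints; the function names, the collision radius and the 1-second threshold are illustrative assumptions.

```python
from typing import List, Tuple, Optional

Waypoint = Tuple[float, float, float]  # (t, x, y)

def predicted_collision_time(av_traj: List[Waypoint],
                             obj_traj: List[Waypoint],
                             radius: float = 1.5) -> Optional[float]:
    """Return the earliest time at which the two trajectories come within
    `radius` meters of each other, or None if they never intersect.
    Assumes both lists are sampled at the same time stamps."""
    for (t, ax, ay), (_, ox, oy) in zip(av_traj, obj_traj):
        if ((ax - ox) ** 2 + (ay - oy) ** 2) ** 0.5 <= radius:
            return t
    return None

def trajectory_is_acceptable(av_traj, obj_traj, threshold_s: float = 1.0) -> bool:
    """Acceptable if no collision is predicted, or it lies beyond the threshold."""
    t_col = predicted_collision_time(av_traj, obj_traj)
    return t_col is None or t_col > threshold_s
```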
• 620-622 involve: dynamically generating one or more emergency maneuver profiles based on the vehicle trajectory and the possible object trajectory; and determining whether the collision can be avoided if the vehicle trajectory is followed by the AV and any one of the emergency maneuvers is performed in a pre-defined time period (e.g., N milliseconds).
  • the first maneuver can include, but is not limited to, one of the dynamically generated emergency maneuvers discussed above in relation to 620.
  • Techniques for causing an AV to take maneuvers are well known in the art. Any known or to be known technique for causing an AV to take maneuvers can be used here.
  • 630 is performed where method 600 ends or other processing is performed.
• The following discussion describes a novel solution for detecting objects.
  • This novel solution may be performed in block 504 of FIG. 5 and/or block 605 of FIG. 6.
  • the novel solution is referred to herein as a CLF based solution.
• The purpose of CLF object detection is to detect objects in a LiDAR point cloud with added context from image detections.
• the AV may operate in a cluttered environment in which objects can move and interact with the AV and/or each other. In a pure LiDAR environment, this task is extremely difficult in situations when objects are in close proximity to each other and interact with each other.
  • the CLF object detection takes full advantage of monocular camera image detections where detections are fused with the LiDAR point cloud.
  • LiDAR data points are projected into the monocular camera frame in order to transfer pixel information to the LiDAR data points, as described above.
  • the transferred information can include, but is not limited to, color, object type and object instance.
  • the CLF based solution detects objects (e.g., objects 1022, 114 and/or 116 of FIG. 1) in a LiDAR point cloud with added context from image detections.
• the AV (e.g., AV 102i of FIG. 1) must operate in a cluttered environment, where objects can move and interact. In a pure LiDAR environment, this task is extremely difficult in situations when objects are in close proximity to each other and also when objects interact.
• Typical segment detection approaches based on Euclidean point clustering struggle to detect separate objects that are in close proximity to each other. For example, pedestrians staying close to the vehicle, loading it or entering it are likely to be represented as a single pedestrian+vehicle segment: the points are close, but the objects are separate.
• For a large object such as a bus, a large windowed area on the sides of the bus allows the light from the laser scanner to pass freely through the window and get returns from the objects inside the bus. This produces multiple fragments that actually belong to the same large object: the points are far apart, but they belong to the same object.
  • monocular camera detections are fused with a LiDAR point cloud.
  • Points of the LiDAR point cloud are projected into a monocular camera frame in order to transfer pixel information to each point in the LiDAR point cloud.
  • the pixel information includes, but is not limited to, a color, an object type and an object instance.
• the cameras (e.g., cameras 262 of FIG. 2) may have overlapping Fields Of View ("FOV").
  • the LiDAR system’s vertical FOV does not perfectly overlap with the vertical FOVs of the cameras. Therefore, some LiDAR points may be visible from multiple cameras and other points may not be visible from any of the cameras.
  • the cameras are configured to fire as a LiDAR system sweeps over the center of the camera’s FOV.
• This time alignment error (i.e., the difference in time between LiDAR point capture and image capture) is used to compute a projection uncertainty, which is then used for LiDAR-to-Image Detection matching.
  • the camera image information is used as an additional cue to aid with LiDAR point segmentation.
• the distance function used to cluster LiDAR points is augmented to include a color and an image detection instance compatibility. This makes LiDAR points that project into different object detections in the image appear farther apart to the segmentation algorithm. Similarly, LiDAR points that project into the same image detection mask appear closer. This approach provides a profound improvement over segmentation that relies on Euclidean distance between points alone in cases where different objects are in close proximity to each other.
• Segmentation: Any segmentation algorithm can be used by the present solution as long as it supports a customized distance function.
• the segmentation algorithm used in the CLF based solution is Local Variation Segmentation ("LVS").
  • the present solution may include color distance and/or image detection instance compatibility in the distance function.
  • the two major error modes of any segmentation algorithm are under-segmentation (multiple objects represented with a single segment) and over- segmentation (single object represented as multiple segments).
• an optimization is performed for a minimal number of under-segmentation events at the cost of a high number of over-segmentation events. Over-segmentation events are then handled by a separate Segment Merger component.
• Segment Merger: Any machine-learned classification technique can be employed by the present solution to learn which segments should be merged.
• the machine-learned classification techniques include, but are not limited to, an artificial neural network, a random forest, a decision tree, and/or a support vector machine.
  • the machine-learned classification technique is trained to determine which segments should be merged with each other.
  • the same image detection information that was used in segmentation is now aggregated over the constituent points of the segment in order to compute segment-level features.
• the ground height and lane information features from the HD map are also used to aid segment merging.
• Segment Filter: Not all detected segments are relevant to the AV; many of them correspond to clutter off the road (buildings, poles, garbage cans, etc.). This is where image detection information is used again to find relevant objects off the road. Because only actors that can move are of interest for tracking, static objects can be discarded to improve the latency of the rest of the tracking pipeline and reduce its compute requirements. It is important to distinguish relevant objects (e.g., moving objects, or objects that can move and possibly intersect the AV path if they start moving) from static objects (e.g., objects that are unlikely to move). Highly relevant objects may be assigned the highest priority in order to allocate limited onboard computation resources accordingly.
  • Every image detection mask corresponds to a collection of LiDAR points inside a frustum in 3D space.
  • the challenge here is that there are usually multiple objects at different depths projecting into the same image detection mask.
  • An example is a vehicle detection with the pole in front of it and also a pedestrian behind it.
• LiDAR points that belong to the true pedestrian object and the pole object will have points labeled as vehicle due to projection errors that occur during the sensor fusion stage. These errors arise from the difference in time between when a LiDAR point was acquired and when an image pixel was acquired, the parallax effect due to the different positions of the LiDAR and camera (the LiDAR may see above the object seen by the camera), AV movement, actor movement, calibration errors, and/or the accuracy and limited resolution of image detection masks.
• the present CLF based solution has many advantages. For example, the present CLF based solution takes full advantage of image detections but does not rely only on image detections or machine learning. This means both separating objects in close proximity and detecting objects that have not been recognized before. This approach combines ML image detections with classical methods for point cloud segmentation.
  • Over-Segmentation + Merge strategy is probably well known for image pixels, but may not be widely used when applied to LiDAR point clouds.
  • many baseline LiDAR detection approaches either operate with a single cluster step, or employ deep learning methods.
• the proposed approach builds small clusters from low level features, but then extracts more meaningful features from the clusters to determine which clusters to merge in order to form objects.
  • Method 700 begins with 702 and continues with 704 where operations are performed by a LiDAR system (e.g., LiDAR system 264 of FIG. 2) of the AV (e.g., AV 102i of FIG. 1 and/or 200 of FIG. 2) to generate a LiDAR dataset.
• the LiDAR dataset measures a distance (contains distance, azimuth and elevation measurements) from the AV to at least one object (e.g., vehicle 1022 of FIG. 1) at a given time t.
  • the LiDAR dataset comprises a plurality of data points that form a point cloud when plotted on a 3D graph.
  • LiDAR datasets are well known in the art. Any known or to be known technique for generating LiDAR datasets can be used here. In some scenarios, the LiDAR system continuously spins at 10 Hz and captures data at whatever its current angle is.
  • a detection is made as to when a sensor of the LiDAR system is about to sweep over a center of a camera’s FOV.
• Operations of the camera (e.g., camera 262 of FIG. 2) are triggered when such a detection is made, as shown by 708.
  • the camera captures an image as the LiDAR system’s sensor sweeps over the center of the camera’s FOV.
  • the image includes content representing the location of a first object (e.g., vehicle 1022 of FIG. 1) at a given time t relative to the AV.
  • the image is referred to herein as a camera frame or a monocular camera frame.
  • the camera is a global shutter (i.e., all pixels are captured at the same time) operating at 20 Hz.
• the operations of 706-710 aid with the temporal alignment of the camera's firing with the LiDAR system sweeping.
• the time alignment error (i.e., the difference in time between LiDAR point capture and image capture) is therefore minimized for the camera.
• an on-board computing device (e.g., on-board computing device 220 of FIG. 2) performs operations to obtain the image and the LiDAR dataset.
  • the on-board computing device then performs operations in 714-728 to detect objects in proximity to the AV using the image and the LiDAR dataset.
  • 714-728 involve: pruning (or reducing) a total number of points contained in the LiDAR dataset; performing LiDAR-to-Image object detection operations to compute a distribution of object detections that each point of the LiDAR dataset is likely to be in; performing local variation segmentation using the outputs of the LiDAR-to-Image object detection operations to create a plurality of segments of LiDAR data points; performing segment merging operations to merge the plurality of segments of LiDAR data points into objects; and performing segment filtering operations to detect objects in the point cloud defined by the LiDAR dataset.
  • the LiDAR points can be further pruned one or more times during the on-board computing device’s processing of the image and LiDAR dataset as shown by 718, 722 and 726.
  • the point pruning operations of 714, 718, 722 and 726 are described in detail in the following section entitled “Point Pruning” and in relation to FIG. 8.
  • the LiDAR- to-Image object detection operations of 716 are described in detail in the following section entitled “LiDAR-to-Image Detection Matching” and in relation to FIGS. 9-14.
  • the local variation segmentation operations of 720 are described in detail in the following section entitled “Local Variation Segmentation with Image Detection Features” and in relation to FIGS. 15-16.
  • the segment merging operations of 724 are described in detail in the following section entitled “Segment Merger” and in relation to FIG 17.
  • the segment filtering operations of 728 are described in detail in the following section entitled “Object Detection Segment Filtering” and in relation to FIG. 18.
  • LiDAR datasets may contain a significant number of points.
• a LiDAR scanner (e.g., LiDAR sensor system 264 of FIG. 2) may produce a high-density range image that contains more than 100,000 points every 100 ms. Processing each and every LiDAR data point can be prohibitively expensive in a real-time system.
  • limiting the number of LiDAR data points that are ultimately processed by the system for object detection purposes yields advantages including, without limitation, reduced energy consumption, reduced draws on hardware capacity, and reduced system latency.
• the present solution implements a method for pruning (or reducing) the number of LiDAR data points that are processed for purposes of detecting an object (e.g., vehicle 1022 of FIG. 1) that is located in proximity to an AV (e.g., AV 102i of FIG. 1).
• Referring now to FIG. 8, there is provided a flow diagram of an illustrative method 800 for pruning (or reducing) the number of LiDAR data points that are processed for purposes of detecting an object (e.g., vehicle 1022 of FIG. 1) that is located in proximity to an AV (e.g., AV 102i of FIG. 1).
  • Method 800 may be performed by an on-board computing device (e.g., onboard computing device 220 of FIG. 2) and/or a remote computing device (e.g., computing device 110 of FIG. 1).
  • the operations of method 800 may be performed in the same or different order in accordance with a given application. Also, method 800 may be absent of one or more operations in accordance with a given application.
  • the operations of 804-814 may be performed at different points during an object detection process.
  • the downsampling operations of 804 can be performed in 714 of FIG. 7.
  • the downsampling operations of 806-808 can be performed in 714 and/or 718.
  • the downsampling operations of 810 can be performed in 718 of FIG. 7.
  • the operations of 812 can be performed in 714 and/or 722 of FIG. 7.
  • the operations of 814 can be performed in 714, 718, 722 and/or 726 of FIG. 7.
  • the present solution is not limited to the particulars of this example.
  • method 800 begins with 802 and continues with optional 804 where the LiDAR dataset is downsampled based on a planned trajectory of an AV.
  • downsampling is performed for LiDAR data points corresponding to a region of interest along a planned trajectory of the AV at a lower rate than the LiDAR data points corresponding to other regions that are not along the planned trajectory of the AV.
  • Downsampling may additionally or alternatively be performed for LiDAR data points corresponding to regions that are not of interest along the planned trajectory at a higher rate than the LiDAR data points corresponding to a region of interest.
• a region of interest may be a region that includes LiDAR data points corresponding to at least one object that is likely to interfere with the AV when following the planned trajectory (e.g., a region that includes a vehicle, a bicycle and/or a pedestrian along the planned trajectory of the AV). Regions that are not regions of interest may include LiDAR data points that correspond to at least one object that is unlikely to interfere with the AV when following the planned trajectory.
  • This object may include, but is not limited to, a parked vehicle on the side of a road, and a vehicle to the rear of the AV that is traveling in the opposite direction as the AV.
  • LiDAR data points of the LiDAR dataset are projected into a camera frame (or image) in order to transfer information from the image-based object detections to the LiDAR data points.
  • Techniques for projecting LiDAR data points into a camera frame are well known in the art. Any known or to be known technique for projecting LiDAR data points into a frame can be used here without limitation.
  • One known projection technique implements a naive projection algorithm that is defined by mathematical equation (1) provided below.
  • the transferred information is referred to herein as point labels.
  • a point label refers to an indication or description associated with a LiDAR data point that includes information or data particular to that LiDAR data point.
• a point label may include an object class identifier (e.g., a vehicle class identifier, a pedestrian class identifier, a tree class identifier, and/or a building class identifier), a color (e.g., an RGB value), at least one unique identifier (e.g., for the object, corresponding image pixel(s), and/or LiDAR data point), and/or an object instance identifier (e.g., if there are many objects of the same class detected in an image).
  • the system may downsample a LiDAR dataset based on the associated point labels.
  • points of a LiDAR dataset are partitioned into two or more classes based on the point labels associated with the points of the LiDAR dataset.
  • LiDAR data points may be separated into two classes, namely a first class containing LiDAR data points assigned high importance labels and a second class containing LiDAR data points assigned low importance labels.
  • High importance labels may comprise labels that are important to track with a high accuracy.
• a high importance label is assigned to LiDAR data points with, for example, object class identifiers associated with a vehicle class, a pedestrian class, a bicycle class, or another moving object class.
• a low importance label is assigned to LiDAR data points with, for example, object class identifiers that are associated with static object classes (e.g., a building class, a foliage class, a construction barrier class, and/or a signage class).
• LiDAR data points with low importance labels may be less important to track with a high degree of accuracy than those with high importance labels.
  • the LiDAR dataset is then downsampled based on the importance labels of the points in the LiDAR dataset (as determined by their corresponding point labels).
  • LiDAR data points having high importance labels are not downsampled, or are alternatively downsampled with a high resolution.
  • LiDAR data points having low importance labels are downsampled more aggressively than the LiDAR data points having high importance labels, i.e., with a lower resolution.
  • the present solution is not limited to the particulars of this example.
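The label-based downsampling described above might be sketched as follows, where points carrying high importance labels are kept at full resolution and low importance points are kept at a coarser stride; the class names and stride value are assumptions made for illustration only.

```python
HIGH_IMPORTANCE = {"vehicle", "pedestrian", "bicycle"}        # assumed class names

def downsample_by_importance(points, low_importance_stride=10):
    """points: iterable of dicts with at least an 'object_class' point label.
    Keeps every high-importance point; keeps 1 in `low_importance_stride`
    low-importance points (e.g., buildings, foliage, signage)."""
    kept, low_seen = [], 0
    for p in points:
        if p.get("object_class") in HIGH_IMPORTANCE:
            kept.append(p)                      # never drop high-importance points
        else:
            if low_seen % low_importance_stride == 0:
                kept.append(p)                  # coarse sampling of static clutter
            low_seen += 1
    return kept
```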
  • the LiDAR dataset is downsampled in accordance with a frustum pruning algorithm.
• a LiDAR dataset may include points that correspond to objects (e.g., other vehicles, pedestrians, cyclists, and/or signs) located on a road or other path of travel (e.g., bike trail or path), and/or points that correspond to objects (e.g., buildings, trees and/or other foliage) located off the road or other path of travel.
• a frustum may be generated for one or more of the detected objects.
  • the frustum corresponding to an image detection bounding box encompasses LiDAR data points of a point cloud that are likely to correspond to a particular object.
  • the LiDAR data points that project within or in proximity to the image detection bounding box may be of more relevance or importance to the object detection process than the LiDAR data points that project further away from the bounding box since the LiDAR data points located further away from the bounding box are unlikely to correspond to objects of interest (e.g., pedestrian, bike, vehicle).
  • the LiDAR data points may be further downsampled and/or pruned based on their distances from the bounding box. For example, pruning is performed for the LiDAR data points that are located more than a threshold distance away from the bounding box. If the distance is less than or equal to the threshold distance, then the point remains in the LiDAR dataset. If the distance is greater than the threshold distance, the point is removed from the LiDAR dataset.
• the present solution is not limited to the particulars of this example. If, in addition to the image detection bounding box, the image object boundary is known (in the form of a pixel mask, for example), then the distance to the mask can be used instead of the distance to the bounding box.
  • the decision as to whether to keep the point in the dataset is determined based on whether the point projects into the dilated mask.
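A minimal sketch of the bounding-box distance test described above is given below; it assumes each LiDAR point has already been projected to pixel coordinates, and the threshold value is an illustrative assumption.

```python
def prune_by_bbox_distance(projected_points, bbox, max_dist_px=25.0):
    """projected_points: list of (u, v) pixel coordinates for LiDAR points.
    bbox: (u_min, v_min, u_max, v_max) image detection bounding box.
    Keeps points whose distance to the box is <= max_dist_px (0 if inside)."""
    u_min, v_min, u_max, v_max = bbox
    kept = []
    for u, v in projected_points:
        du = max(u_min - u, 0.0, u - u_max)   # horizontal distance to the box
        dv = max(v_min - v, 0.0, v - v_max)   # vertical distance to the box
        if (du * du + dv * dv) ** 0.5 <= max_dist_px:
            kept.append((u, v))
    return kept
```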
  • the LiDAR dataset is downsampled using a map that includes information associated with a trajectory of an AV (e.g., AV 102i of FIG 1).
  • an AV may have a planned trajectory or path of travel that it is autonomously following.
  • the map includes various information that corresponds to the planned trajectory or path of travel. This information may include, but is not limited to, information about lane placement, surface gradient, road boundaries, and/or locations of stationary objects.
• the map may be stored in and/or retrieved from a datastore (e.g., memory 412 of FIG. 4) of the AV.
  • One or more points of the LiDAR dataset may be identified for downsampling relative to the map.
  • downsampling is performed for LiDAR data points that are located below a minimum height threshold value on the map.
• For example, an assumption is made that most LiDAR points of interest to an AV correspond to objects that have heights that exceed a certain height measurement (e.g., two feet). Points are removed from the LiDAR dataset that are associated with heights less than the minimum height threshold value (e.g., two feet). An assumption may also be made that most LiDAR points of interest to an AV correspond to objects that have heights below a maximum height threshold value (e.g., 100 feet). Thus, points are removed from the LiDAR dataset that are associated with heights exceeding the maximum height threshold value.
  • the present solution is not limited to the particulars of this example.
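The height-gating step could look like the following sketch, where each point carries a height above the map's ground surface; the specific thresholds (roughly two feet and 100 feet from the example above, expressed in meters) are assumptions for illustration.

```python
def prune_by_height(points_with_height, min_h_m=0.6, max_h_m=30.0):
    """points_with_height: iterable of (point, height_above_ground_m) pairs.
    Removes points below the minimum or above the maximum height threshold."""
    return [p for p, h in points_with_height if min_h_m <= h <= max_h_m]
```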
  • the points of the LiDAR dataset are downsampled based on process latency.
  • An object detection pipeline may employ multiple algorithms that have different time complexity characteristics.
  • the entire pipeline latency as a function of input data size may be a non-linear curve. Analysis of latency data from vehicle logs may provide insights on how the function looks.
  • the function may be a linear function and/or a higher order function (e.g., polynomial).
  • a pipeline latency model is created.
• the pipeline latency model is then utilized to estimate latency given a certain input data size, and may use this estimated latency to manipulate downsampling resolution.
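One way to realize such a latency model, sketched under the assumption that logged (input size, latency) pairs from vehicle logs are available, is to fit a low-order polynomial and then pick the finest downsampling resolution whose predicted latency stays under a budget; the budget, candidate fractions and log values are illustrative assumptions.

```python
import numpy as np

def fit_latency_model(input_sizes, latencies_ms, degree=2):
    """Fit a polynomial latency model from vehicle-log data."""
    return np.poly1d(np.polyfit(input_sizes, latencies_ms, degree))

def choose_downsample_fraction(model, n_points, budget_ms=80.0,
                               candidates=(1.0, 0.5, 0.25, 0.1)):
    """Return the largest keep-fraction whose predicted latency fits the budget."""
    for frac in candidates:                      # try the finest resolution first
        if model(n_points * frac) <= budget_ms:
            return frac
    return candidates[-1]                        # fall back to the coarsest option

# Example with made-up log data
model = fit_latency_model([1e4, 5e4, 1e5], [10.0, 45.0, 120.0])
keep_fraction = choose_downsample_fraction(model, n_points=120_000)
```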
  • 816 is performed where method 800 ends or other operations are performed.
• the LID matching algorithm of the present solution has multiple aspects. These aspects include: (i) synchronizing camera firing with LiDAR system sweeping; (ii) accounting for projection uncertainty with known camera calibration uncertainties; and (iii) determining which image detection of a plurality of image detections each point in a LiDAR dataset is most likely to be in. As noted above, aspect (i) is achieved by triggering image capturing when a focal point of the LiDAR sensor is aligned with a center of the camera's FOV. This time alignment error (i.e., the difference in time between LiDAR point capture and image capture) is minimized by this synchronization.
• Aspect (ii) involves: determining an uncertainty in camera calibration based on eleven calibration parameters (i.e., 5 intrinsic parameters: an x-y focal length, a skew, and an x-y image center; 6 extrinsic parameters: an XYZ translation and a 3 degrees-of-freedom rotation); projecting the uncertainty into a camera frame; and determining a distribution of pixels to which a LiDAR point may project (instead of a single pixel).
  • Aspect (iii) is achieved by: considering each object detection as an independent measurement; and using the confidences to compute a distribution of detections in which a LiDAR point is likely to be.
  • Aspects (i)-(iii) allow the LID matching algorithm to account for several sources of error and uncertainty to better match LiDAR points with camera-space objects.
  • the LID matching algorithm takes into account both projection uncertainty and the full confidence information in image detections. Presently, no projection uncertainty is considered and image detection confidences (in the whole detection and per-pixel in the mask) are binarized. Object type estimation would be updated to take the new matching into account.
  • the present solution computes an object type distribution for each image detection that a LiDAR point may project into.
  • the set of object type distributions are then combined using the estimated probability for each image detection.
  • a naive method might be, for a point in multiple image detections, to average the type distribution for each image detection.
• the present solution uses a weighted average, weighted by the likelihood of each image detection.
• Method 900 continues with image analysis operations 906-912. These image analysis operations 906-912 may be performed by a Commercial-Off-The-Shelf ("COTS") image analyzer implementing a conventional object detection algorithm. 906-912 generally involve identifying one or more objects (e.g., vehicle 1022 of FIG. 1, cyclist 114 of FIG. 1, pedestrian 116 of FIG. 1, and/or vehicle 1002 of FIG. 10) in the image.
  • the on-board computing device determines or obtains extrinsic LiDAR sensor and camera calibration parameters and intrinsic camera calibration parameters.
  • the extrinsic LiDAR sensor and camera calibration parameters include, but are not limited to, LiDAR sensor coordinates, and/or information indicating a correspondence between LiDAR sensor coordinates and camera coordinates.
  • the intrinsic camera calibration parameters include, but are not limited to, an x focal length, a y focal length, a skew, an image center, a focal center of the image, and/or 3D coordinates (x, y, z) of a camera position.
  • various information is input into a LID matching algorithm.
  • This information includes, but is not limited to, identifiers for each object detected in the image, mask identifiers, cell identifiers for each mask, confidence values for each cell, LiDAR point identifiers, LiDAR point coordinates, extrinsic LiDAR sensor and camera calibration parameters, and intrinsic camera calibration parameters.
  • These inputs are used in subsequent operations 918-920 to: determine (for each point of the LiDAR dataset) a probability distribution of pixels to which a LiDAR data point may project taking into account a projection uncertainty in view of camera calibration uncertainties; and determine (for each point of the LiDAR dataset) a probability distribution over a set of object detections in which a LiDAR data point is likely to be, based on the confidence values.
  • the operations of 918 are described in detail below in relation to FIG. 13.
  • the operations of 920 are described in detail below in relation to FIG. 14.
  • 922 is performed where method 900 ends or other operations are performed (e.g., return to 902).
• the on-board computing device computes a Probability Distribution Function ("PDF") f(x', y') over image space coordinates for the pixel to which a LiDAR point would probably project in accordance with a naive projection algorithm (i.e., a probability distribution that is centered around a naive projection point).
• the naive projection algorithm is defined by the following mathematical equation (1), where x' and y' represent image space coordinates for a pixel, and X, Y and Z represent LiDAR space coordinates for a point of the LiDAR dataset. Basically, each point of the LiDAR dataset is projected onto the pixel of the image that resides on the same line as the point, where a line is drawn from each pixel to a region of the image.
  • each PDF for each LiDAR point is required to be: (i) representable in image space coordinates; (ii) convertible to image detection mask coordinates (can be translated and scaled); and (iii) composable (or combinable) with other projection uncertainty PDFs.
  • the present solution uses a Jacobian of the PDF to propagate an uncertainty from LiDAR-frame to camera-frame. This (or a similar alternative for propagating uncertainty) helps satisfy requirement (i) for probability distribution.
  • the PDF is then converted to image detection mask coordinates, as shown by 1306.
  • This conversion is achieved via translation and scaling (where the scaling in x and the scaling in y are independent).
• the conversion is defined by the following mathematical equation (2), where x_bbox,min and x_bbox,max represent the image space boundaries of a bounding box and R is a mask resolution.
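Since the body of equation (2) is not reproduced above, the following is only an assumed sketch of a translation-and-scale mapping from image coordinates into an R×R detection mask, consistent with the description that the scaling in x and in y is independent.

```python
def image_to_mask_coords(x_img, y_img, bbox, mask_resolution):
    """Map image-space coordinates into mask coordinates for an R x R mask.
    bbox: (x_min, y_min, x_max, y_max) image-space bounds of the detection box.
    This is an assumed form of the translation + independent x/y scaling."""
    x_min, y_min, x_max, y_max = bbox
    R = mask_resolution
    x_mask = (x_img - x_min) * R / (x_max - x_min)   # translate, then scale in x
    y_mask = (y_img - y_min) * R / (y_max - y_min)   # translate, then scale in y
    return x_mask, y_mask
```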
  • 920 of FIG. 9 involves performing various operations 1404- 1408.
• the on-board computing device computes a probability that a LiDAR point lp_i projects into a given image detection independent of all other image detections (e.g., d2, ..., d10).
• the probability is expressed as p(lp_i ∈ d). At this point, a PDF exists for a likely LiDAR point projection over image detection mask coordinates.
• the image detection confidence c_d and the per-pixel confidences c_(x_m, y_m) are considered in this computation. These confidences are in [0, 1] but are not probabilities.
• the mapping from confidences to probabilities can include, but is not limited to, a logistic function.
  • the per-pixel confidences in the image detection mask are for the whole mask pixel (no infinitesimal coordinates). So, the onboard computing device computes the probability that a LiDAR point projects into a specific image detection mask pixel in accordance with mathematical equation (3).
• where lp is a LiDAR point, mp is a mask pixel, d_mp represents a mask pixel associated with a given object detection d, d_y represents a y-axis coordinate for the mask pixel associated with the given object detection d, and d_x represents an x-axis coordinate for the mask pixel associated with the given object detection d.
• This probability p(lp ∈ mp) is then used by the on-board computing device to compute the probability that the LiDAR point is in the image detection independent of all other image detections. This computation is defined by the following mathematical equation (4), where the mask resolution is R by R.
  • this probability is computed for each detection the LiDAR point may project into.
• the probabilities may sum to greater than one. An assumption is made that a LiDAR point can only project into a single image detection. Thus, each independent probability is treated as an independent measurement.
  • the on-board computing device further computes the probability that the LiDAR point does not project into any image detection, as shown by 1406. This computation is defined by mathematical equation (5).
  • the on-board computing device computes a dependent probability by normalizing over all computed probabilities.
  • This computation is defined by the following mathematical equation (6).
  • the LID matching algorithm outputs this probability for every detection that the LiDAR point may project into. That is, for each point, a sparse probability distribution over image detections is output from the LID matching algorithm.
  • the sparse probability distribution represents the probability distribution over a set of object detections in which a LiDAR data point is likely to be.
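Equations (4)-(6) are not reproduced above, so the snippet below is only a schematic of the normalization step: given independent probabilities that a LiDAR point projects into each candidate image detection, it adds an assumed probability of projecting into no detection (the product of complements) and normalizes so the dependent probabilities sum to one.

```python
def normalize_detection_probabilities(independent_probs):
    """independent_probs: dict mapping detection id -> p(point in that detection),
    each computed independently of the others.
    Returns a dict that also contains a 'none' entry, normalized to sum to 1."""
    p_none = 1.0
    for p in independent_probs.values():
        p_none *= (1.0 - p)               # assumed: point misses every detection
    raw = dict(independent_probs)
    raw["none"] = p_none
    total = sum(raw.values())
    return {det: p / total for det, p in raw.items()}

# Example: a point that may fall into two overlapping detections
print(normalize_detection_probabilities({"d1": 0.7, "d2": 0.2}))
```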
  • the present solution provides an improved LVS based algorithm that eliminates or minimizes the merging of close objects. This improvement is at least partially achieved through the use of additional features including (i) an image detection capability feature and (ii) a modified distance feature.
• Feature (i) is the difference between which image detections each point is in. Each point has a per-camera distribution of image detections that it is in (and the likelihood that it is not in any image detection). The information from all cameras is combined probabilistically into a single number that indicates whether the points are likely in the same image detection or not.
  • Feature (ii) is an expanded or contracted height component of a geometric distance between points.
  • Feature (ii) is provided to address the issues that point clouds do not have a uniform density of points and that there are fewer lasers pointed at the upper and lower ends of an object.
• Features (i) and (ii) are combined in the LVS based algorithm with common features such as color similarity.
  • Features (i) and (ii) provide a superior object detection capability, by being more likely to combine clusters that are in the same object and less likely to combine clusters that are not in the same object.
  • the conventional PCLVS algorithm handles segmentation in a wide variety of relatively easy and moderate scenarios for extracting objects from a point cloud, but does not currently perform as desired in challenging scenarios.
  • This approach does not leverage other aspects of the information available from the LiDAR data, such as (i) the negative information provided by LiDAR returns passing through regions of the environment without interacting and (ii) the underlying structure of how the data is captured. This information can be used to improve performance of segmentation in ambiguous or challenging scenarios.
• the PCLVS approach largely attempts to produce segments which correspond 1:1 to objects in the world, without rigorously utilizing information outside the LiDAR returns to do so. This leads to an increase in segmentation errors, particularly under-segmentation errors.
• Under-segmentation errors are particularly difficult to solve after segmentation, due to the fact that splitting an under-segmented object requires implementing a second segmentation algorithm. Biasing towards over-segmentation provides two crucial benefits: an improvement in the ability to extract the boundaries which most critically impact motion planning for an AV, and allowing post-processing to reason about merging segments together, which is a fundamentally different algorithm.
• the present solution proposes a new LVS based segmentation approach which solves these problems: providing a framework for integrating additional information from the LiDAR sensors; defining the problem to ensure that the output is structured in a fashion which is more amenable to downstream processing; and improving performance by reducing under-segmentation and improving boundary recall.
  • LiDAR data points 1502 are input into the LVS algorithm 1500.
  • the LiDAR data points 1502 are passed to a graph constructor 1504 where a connectivity graph is constructed by plotting the LiDAR data points on a 3D graph and connecting LiDAR data points.
  • the LiDAR data point connections may be made based on whether two points are within a threshold spatial distance from each other, and/or whether two points are within a threshold temporal distance from each other.
  • each LiDAR data point is connected to its K-nearest neighbors.
  • a Delaunay triangulation is constructed and used as the connectivity graph.
  • the connected LiDAR data points represent a proposed set of LiDAR data points that should be merged to form a segment 1512.
• An illustrative graph 1600 is provided in FIG. 16. As shown in FIG. 16, the graph 1600 has a plurality of nodes 1602 representing LiDAR data points or measurements. Connection lines 1604 have been added between the nodes 1602. The connection lines 1604 are also referred to herein as graph edges e_ij.
  • a descriptor determiner 1506 determines a descriptor for each node 1602 (or LiDAR data point).
  • the descriptor is a vector V of elements that characterize the node (or LiDAR data point).
• the elements include, but are not limited to, surface normals N_i, a per-point color value (R_i G_i B_i) based on an image (e.g., image 1000 of FIG. 10), an intensity I_i, and a texture T_i.
• V_i = (N_i, R_iG_iB_i, I_i, T_i, (x_i, y_i, z_i), H_i, cI_i, id_i, f_i, FPFH_i, ...)   (7)
• An edge weight assignor 1508 assigns weights to each graph edge e_ij.
• the graph edge comprises an edge feature MD_ij.
• the modified distance MD_ij is an expanded or contracted height component of a geometric distance between nodes (or LiDAR data points).
• the modified distance MD_ij may be defined by the following mathematical equation (8).
• where H is the point height above ground, and a and k are constants of a logistic function that compresses the Z-axis distances when points are close to the ground.
  • the weights each represent a dissimilarity measure between two adjacent nodes 1602.
• a weight is computed for each type of element contained in the vector V. More specifically, a weight w_n is computed for the surface normal, which may be defined by the following mathematical equation (9).
• a weight w_c is computed for color, which may be defined by the following mathematical equation (10).
• a weight w_I is computed for intensity, which may be defined by the following mathematical equation (11), where I_i and I_j are LiDAR point intensities and I_max is the maximum possible intensity value.
• a weight w_d is computed for 3D graph coordinates, which may be defined by the following mathematical equation (12), where d_min represents a minimum distance within the graph, and d_max represents a maximum distance within the graph.
• a weight w_cI is computed for the object class distribution cI, which may be defined by the following mathematical equation (13).
• the value of the weight w_cI may be 1 if the object classes are different, or -1 if the object classes are the same.
  • a graph node may be composed of multiple LiDAR points.
• cI_i is the probability distribution over object classes for the constituent points. A Bhattacharyya distance can be used to compute the similarity between two probability distributions.
• a weight w_FPFH is computed for a Fast Point Feature Histogram ("FPFH"), which may be defined by the following mathematical equation (14).
• a weight w_IDC is computed for image detection capability, which may be defined by the following mathematical equation (15), where c is the compatibility between points, C is the set of all cameras, D_c is the set of image detections in C, and d is the clamping function.
• a weight w_MD is computed for the modified distance, which may be the same as MD_ij above.
• the above weights may be combined into one non-negative scalar w(e_ij) by, for example, linear combination.
• the information from all cameras is combined probabilistically into a single number that indicates whether the points are likely in the same image detection or not.
• the non-negative scalar w(e_ij) may be defined by the following mathematical equation (16), where k_n, k_c, k_I, k_T, k_d, k_H, k_cI, k_id, k_FPFH, k_IDC and k_MD are predefined constants.
• the scalar w(e_ij) is then assigned by the edge weight assignor 1508 as the edge weight for a given graph edge e_ij.
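A sketch of the linear combination of equation (16) is given below, with the per-feature constants represented as a dictionary; the constant values are placeholders rather than values from the present solution, and the clamping to a non-negative scalar is an assumption.

```python
# Assumed per-feature constants k (placeholder values only)
K = {"n": 1.0, "c": 0.5, "I": 0.3, "T": 0.2, "d": 1.0,
     "H": 0.4, "cI": 1.5, "id": 0.6, "FPFH": 0.8, "IDC": 2.0, "MD": 1.0}

def edge_weight(feature_weights):
    """feature_weights: dict mapping feature name (e.g., 'n', 'c', 'IDC') to its
    dissimilarity value for this edge. Returns the non-negative scalar w(e_ij)."""
    w = sum(K[name] * value for name, value in feature_weights.items())
    return max(w, 0.0)    # clamp to keep the combined weight non-negative

w_ij = edge_weight({"n": 0.2, "c": 0.1, "d": 0.05, "IDC": 1.0, "MD": 0.3})
```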
• the edge weights w(e_ij) are then passed to a LiDAR point merger 1510.
• the LiDAR point merger 1510 uses the edge weights w(e_ij) to decide which LiDAR data points should be merged together to form segments 1512. The LiDAR points are merged based on these decisions.
  • the output of the LiDAR point merger 1510 is a plurality of segments 1512. The segments 1512 are used in subsequent segment merging operations.
• the iterative segment merging operations performed by the LiDAR point merger 1510 involve building segments by iteratively merging smaller segments, until a stopping condition is reached. Specifically, all nodes 1602 are initially considered individual segments, and all graph edges 1604 are sorted in ascending order by edge weight w(e_ij). The graph edges 1604 are considered in order, treating each graph edge as a merge proposal if the graph edge connects two different segments. A merge proposal is accepted if the weight between the two segments is less than the largest internal variation of the two segments, plus a term which biases segmentation to merge small segments.
  • the final output 1512 is a segmentation of all observations into distinct clusters. Each of the segments 1512 comprises one or more LiDAR points.
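The iterative merging loop can be sketched as a Felzenszwalb-style procedure over the weighted edges using a union-find structure; the small-segment bias term k/|C| is a standard choice and is an assumption here rather than the present solution's exact criterion.

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.internal = [0.0] * n          # largest internal edge weight per segment

    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]   # path halving
            a = self.parent[a]
        return a

    def union(self, a, b, w):
        ra, rb = self.find(a), self.find(b)
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]
        self.internal[ra] = max(self.internal[ra], self.internal[rb], w)

def local_variation_segment(n_points, edges, k=1.0):
    """edges: list of (weight, i, j). Merges segments when the connecting edge
    weight is below each segment's internal variation plus a small-segment bias."""
    uf = UnionFind(n_points)
    for w, i, j in sorted(edges):
        ri, rj = uf.find(i), uf.find(j)
        if ri == rj:
            continue                        # already in the same segment
        thresh_i = uf.internal[ri] + k / uf.size[ri]
        thresh_j = uf.internal[rj] + k / uf.size[rj]
        if w <= min(thresh_i, thresh_j):    # accept the merge proposal
            uf.union(ri, rj, w)
    return [uf.find(i) for i in range(n_points)]
```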
  • a metric generator 1514 is provided to collect, compute and/or generate segmentation metrics from the segmentation operation and output.
  • segmentation metrics include an under-segmentation error metric, a boundary recall metric, and an instance precision and recall metric.
• The under-segmentation error metric measures how much the segmentation results include segments which cross boundaries between distinct objects in the scene. Since an under-segmentation event involves two ground truth segments, this error metric must be computed such that it does not double count the event.
  • the under-segmentation error metric can be computed by finding each segment which intersects more than one ground-truth object, and dividing the segment between the ground-truth objects.
• the under-segmentation error metric is then defined as the sum of the smaller of the two sub-segments for all these under-segmentations, averaged over the number of points across all segments. More formally, the under-segmentation error metric UE is defined by the following mathematical equation (18), where GT represents a set of ground truth labels, and O represents a set of computed labels.
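Since equation (18) is not reproduced above, the following is only a sketch of the under-segmentation error computation, assuming each point carries a ground-truth object id and a computed segment id; it follows the verbal description (count the smaller sub-segment of each boundary-crossing segment, averaged over all points) rather than the exact formula.

```python
from collections import Counter, defaultdict

def under_segmentation_error(gt_labels, seg_labels):
    """gt_labels, seg_labels: equal-length lists of per-point ids."""
    per_segment = defaultdict(Counter)
    for gt, seg in zip(gt_labels, seg_labels):
        per_segment[seg][gt] += 1
    penalty = 0
    for counts in per_segment.values():
        if len(counts) > 1:                        # segment crosses a GT boundary
            sizes = sorted(counts.values(), reverse=True)
            penalty += sum(sizes[1:])              # all but the largest sub-segment
    return penalty / len(gt_labels)
```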
• the boundary recall metric measures a degree to which a boundary of each object is recovered by segmentation. Over-segmentation produces boundaries which are internal to ground truth segmentation, but are intrinsic to the performance improvements of the present approach. Thus, this metric aims to measure how many of the LiDAR data points which represent boundaries of objects are extracted by a given segmentation. This can be computed by projecting the 3D point cloud data into a depth image, and painting each pixel with an associated segment label. Boundaries can thus be computed by finding the edges in the image. The same process can be performed with the output segmentation, with edges then being labeled as true positives (edges present in both images) and false negatives (edges present in the ground truth data, but not in the output segmentation).
  • the boundary recall metric BR may be defined by the following mathematical equation (19).
  • a performance of extracting objects of interest can be computed as precision and recall metrics over object instances. For each object in the ground truth, a determination can be made as to whether a segment is majority associated with a ground truth label in the same fashion as is performed in under-segmentation error. With this information, precision and recall can be computed in a standard fashion.
  • the segments 1512 output from the LVS algorithm 1500 are too small for estimating cuboids.
  • a segment merger is employed to construct segments large enough for subsequent shape prior (e.g., cuboid) estimation.
• the segment merger performs segment merging operations that generally involve: selecting pairs of segments; identifying which pairs of segments have a centroid-to-centroid distance greater than a threshold value (e.g., 3 m); computing features for each segment pair (whose centroid-to-centroid distance is less than the threshold value (e.g., < 3 m)) based on the attributes of the segments contained in the pair; generating (for each segment pair) a probability that the segments should be merged based on the computed features; and merging segments based on the probabilities.
  • the segments 1512 are input into the segment merger 1700.
  • the segments 1512 may optionally be pre-processed in 1706.
  • Pre-processing operations are well known in the art.
  • the pre-processing can involve selecting pairs of segments, obtaining centroids for the segments, determining centroid-to-centroid distances for each pair of segments, identifying which pairs of segments have a centroid-to-centroid distance greater than a threshold value (e.g., 3 m), and removing the identified pairs of segments from further consideration for segment merging purposes.
• the threshold value is defined as a sum of the first segment's radius from its centroid and the second segment's radius from its centroid, plus a pre-defined constant (e.g., 0.5 m).
  • a set of attributes for each segment may be obtained and/or generated.
• a set of attributes can include, but is not limited to: (i) a 2D region that the LiDAR data points in the segment cover; (ii) an average of the probability distributions that were computed in 920 of FIG. 9 for the LiDAR data points contained in the segment; (iii) a percentage of LiDAR data points contained in the segment that are on a road; (iv) a percentage of LiDAR data points contained in the segment that are off a road; and/or (v) a total number of lanes that a segment at least partially overlaps.
• Attributes (i), (iii), (iv) and (v) may be determined using a road map, a lane map and/or other map. For example, attribute (i) is determined by identifying a region on the map where the segment resides. Attributes (iii) and (iv) are determined by identifying which LiDAR data points in a segment reside on a road contained in the map, and identifying which LiDAR data points in a segment do not reside on a road contained in the map. Attribute (v) is determined by identifying which lanes in a map the LiDAR data points of the segment cover, and counting the number of identified lanes.
• a graph is constructed in which the segments are plotted. Links are added to the graph for pairs of nearby segments (taking into account the size of each segment). These links define pairs of segments for which features should be generated by feature generator 1712.
  • each set of features describes a pairing of two segments.
  • the features may be generated using the attributes generated by attribute generator 1708.
  • the features can include, but are not limited to: a difference between an average of the probability distributions computed in 920 of FIG. 9 for a first segment and an average of the probability distributions computed in 920 of FIG. 9 for a second segment;
  • difference in off-road proportions (e.g., a difference between a percentage of LiDAR data points contained in a first segment that are off a road and a percentage of LiDAR data points contained in a second segment that are off a road);
  • region compatibility (e.g., a degree of overlap between the 2D regions which are covered by the first and second segments);
  • lane compatibility (e.g., a degree of overlap between the lanes in which the first and second segments are located);
  • object type compatibility (e.g., the first and second segments are compatible if there is any intersection in the object types to which any of their constituent points project).
  • the features are then passed from the feature generator 1712 to the machine learned classifier 1714.
  • the machine learned classifier 1714 analyzes each set of features to determine a probability that the corresponding segments should be merged. For example, a low probability for merging two segments is determined when (1) a difference between probability distribution averages exceeds a threshold value and (2) lane incompatibility exists. In contrast, a high probability exists when (1) a difference between probability distribution averages is less than a threshold value and (2) lane compatibility exists.
  • the present solution is not limited in this regard.
  • the probabilities could be assigned a numerical value (e.g., 0-10) in addition to or as an alternative to a level (e.g., low, medium, or high).
  • the level or degree of probability can be determined by any combination of features selected in accordance with a given application.
  • the machine learned classifier 1714 is trained using a machine learning algorithm that learns when two segments should be merged together in view of one or more features. Any machine learning algorithm can be used herein without limitation. For example, one or more of the following machine learning algorithms is employed here: supervised learning; unsupervised learning; semi-supervised learning; and reinforcement learning. The learned information by the machine learning algorithm can be used to generate rules for determining a probability that two segments should be merged. These rules are then implemented by the machine learned classifier 1714.
  • the merge probabilities are then analyzed by the machine learned classifier 1714 to classify the pairs of segments as merge pairs or non-merge pairs. For example, a pair of segments is classified as a merge pair when the respective merge probability has a level of high or has a numerical value greater than a threshold value. In contrast, a pair of segments is classified as a non-merge pair when the respective merge probability has a level of low or has a numerical value less than a threshold value.
  • the present solution is not limited to the particulars of this example.
  • the classifications are then passed to the merger 1716.
  • the merger 1716 merges the segments in accordance with the classifications. For example, segments in each merge pair are merged together. Notably, redundant links are not evaluated for segment merging purposes. For example, if segment A should merge with segment B and segment B should merge with segment C, then the segment merger 1716 does not evaluate merging segment A with segment C.
  • the present solution is not limited to the particulars of this example.
  • the estimated cuboid now has enough information to merge fragments based on their overlap area with the estimated cuboid.
  • Another example where cuboids help is segmentation of the buses. A large window area allows laser light to pass through and scan the interior portions of the bus resulting in multiple fragments that are far away from the L-shape of the bus exterior.
  • the merger 1716 outputs a plurality of merged segments 1714.
  • the image detections are used to find relevant objects off the road. Because only off-road moving actors are of interest, static objects can be discarded to improve the rest of the CLF object detection pipeline and reduce the CLF object detection algorithm’s computational requirements.
  • An example is a vehicle detection with a pole in front and also a pedestrian behind. LiDAR data points that belong to the true pedestrian object and the pole object will have points labeled as vehicles due to projection errors that occur during the sensor fusion stage.
  • projection characteristics are computed for all segments containing LiDAR data points that project into a particular image detection mask. One or more best matches are reported that are likely to correspond to the object detected on the image. This helps eliminate clutter from the set of tracked objects, and reduces tracking pipeline latency and computational requirements.
  • FIG. 18 provides a flow diagram of an illustrative method 1800 for object detection segment filtering.
  • Input into a segment filter is a collection of candidate segments formed at earlier stages of the pipeline, where every candidate may or may not correspond to the real world object.
  • the intuition behind adding nearby points, followed by geometric segmentation, is that the projected points of a false cluster (such as a wall or a tree) will have many N points within close distance to P points, which results in a single cluster containing both point categories.
  • the resulting false cluster will contain a relatively small number of P points compared to the total number of points in the cluster.
  • a true cluster will mostly consist of P points with a relatively small number of N points.
  • a cluster feature U is needed to discriminate true segments of LiDAR data points from false segments of LiDAR data points.
  • cluster feature U alone may not be sufficient to identify true segments, for example when distinguishing a larger true object (e.g., a vehicle) from a smaller false object (e.g., a pole) that projects into the same image detection mask. The smaller false object may consist entirely of P points, while the vehicle cluster will have some mix of P points and N points.
  • another cluster feature V is needed, and is used in conjunction with cluster feature U to verify that the segment is correctly associated with a given object detection.
  • the cluster feature V is defined by the following mathematical equation (21):
  V = count(P) / count(D)     (21)
where D represents a total number of points that project into a particular image detection mask m (e.g., mask 1200 of FIG. 12).
  • the D points are usually distributed across multiple objects in the world.
  • cluster features that can be used to identify segments of LiDAR data points that are associated with a pedestrian, a vehicle, a bicyclist, and/or any other moving object.
  • additional cluster features include a cluster feature H representing a cluster height, a cluster feature L representing a cluster length, and a cluster feature LTW representing a length-to-width ratio for a cluster.
  • Clusters with a height above 2.0 - 2.5 meters are unlikely to be associated with pedestrians.
  • Clusters over 1 meter in length are unlikely to be associated with pedestrians.
  • Clusters with a length-to-width ratio above 4.0 often tend to be associated with buildings and are unlikely associated with pedestrians.
  • Clusters with high cylinder convolution score are likely to be associated with pedestrians.
  • method 1800 begins with 1804 where various information for a given image detection mask m (e.g., mask 1200 of FIG. 12) is obtained (e.g., from memory 412 of FIG. 4).
  • This information includes, but is not limited to, Pm representing a number of points of a LiDAR data set that project into the mask m, Si representing a number of points forming a given merged segment s of LiDAR data points (e.g., merged segment 1714 of FIG. 17), Psm representing a number of points in the merged segment s that project into the mask m, and a height hs, a length ls and a width ws of the merged segment s (an illustrative sketch of the segment filtering computation that uses this information is provided following this list).
  • Cluster feature U may be determined in accordance with the following mathematical equation (22), and cluster feature V may be determined in accordance with the following mathematical equation (23).
  • Cluster feature H is set equal to hs.
  • Cluster feature L is set equal to ls.
  • Cluster feature LTW may be determined by the following mathematical equation (24).
  • a projection score PS is computed based on the cluster features U, V, H, L, and/or LTW.
  • the projection score may be defined by the following mathematical equation (25).
  • the projection score can represent the product of any combination of cluster features.
  • the projection score is used to verify that the merged segment is part of the detected object associated with a given image detection mask. Such verification can be made when the projection score is greater than a threshold value.
  • An object detection may be made in 1816 when such a verification is made. In some scenarios, the object detection is made based on the results of operations 1804-1814 for two or more merged segments that are associated with the same image detection mask. For example, an object detection is made that a given merged segment of a plurality of merged segments is associated with a given detected object when the PS computed for the given merged segment is greater than the PSs computed for the other merged segments of the plurality of merged segments. Subsequently, 1818 is performed where method 1800 ends or other processing is performed.
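The following is a minimal, illustrative Python sketch of the segment merging pipeline described in the list above (pair selection gated by a centroid-to-centroid distance, pairwise features, a learned classifier, and merging). The Segment fields, the particular feature set, the 0.5 m margin constant, and merge_classifier (any model exposing a predict_proba method, e.g., a scikit-learn classifier) are assumptions made for illustration; this is not the patented implementation.

```python
# Sketch of: select candidate pairs -> compute pair features -> classify -> merge.
from dataclasses import dataclass
from itertools import combinations
import numpy as np


@dataclass
class Segment:
    points: np.ndarray        # (N, 3) LiDAR points in the segment
    det_dist: np.ndarray      # average per-point distribution over image detections
    off_road_frac: float      # fraction of points that are off the road
    lane_ids: set             # lanes the segment at least partially overlaps

    @property
    def centroid(self):
        return self.points.mean(axis=0)

    @property
    def radius(self):
        return float(np.linalg.norm(self.points - self.centroid, axis=1).max())


def candidate_pairs(segments, extra_margin=0.5):
    """Keep pairs whose centroid distance is below the sum of radii plus a margin,
    mirroring the sum-of-radii-plus-constant threshold described above."""
    for i, j in combinations(range(len(segments)), 2):
        a, b = segments[i], segments[j]
        if np.linalg.norm(a.centroid - b.centroid) < a.radius + b.radius + extra_margin:
            yield i, j


def pair_features(a, b):
    """Features describing a pairing of two segments (illustrative subset)."""
    return np.array([
        np.abs(a.det_dist - b.det_dist).sum(),                       # detection-distribution difference
        abs(a.off_road_frac - b.off_road_frac),                      # off-road proportion difference
        len(a.lane_ids & b.lane_ids) / max(1, len(a.lane_ids | b.lane_ids)),  # lane compatibility
    ])


def merge_segments(segments, merge_classifier, p_merge=0.5):
    """Union-find merge; redundant links (A-B and B-C implying A-C) need no extra check."""
    parent = list(range(len(segments)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, j in candidate_pairs(segments):
        prob = merge_classifier.predict_proba([pair_features(segments[i], segments[j])])[0, 1]
        if prob > p_merge:
            parent[find(i)] = find(j)

    groups = {}
    for idx in range(len(segments)):
        groups.setdefault(find(idx), []).append(idx)
    return list(groups.values())
```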
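The following is a minimal, illustrative Python sketch of the segment filtering computation described above (cluster features U, V, H, L and LTW, and a projection score PS). Because mathematical equations (22)-(25) are not reproduced in this text, the exact forms used below (U = Psm/Si, V = Psm/Pm, LTW = ls/ws, and PS as a product of U and V) are assumptions consistent with the surrounding definitions rather than the claimed equations.

```python
# Sketch of cluster features and projection score for one image detection mask.

def cluster_features(p_m, s_i, p_sm, h_s, l_s, w_s):
    """p_m: points projecting into mask m; s_i: points in merged segment s;
    p_sm: points of s projecting into m; h_s/l_s/w_s: segment height/length/width."""
    u = p_sm / max(s_i, 1)       # assumed form of U: fraction of the segment inside the mask
    v = p_sm / max(p_m, 1)       # assumed form of V: fraction of mask points owned by the segment
    ltw = l_s / max(w_s, 1e-6)   # assumed form of LTW: length-to-width ratio
    return u, v, h_s, l_s, ltw


def projection_score(u, v, h, l, ltw):
    # Equation (25) is described only as a product of cluster features; the product of
    # U and V used here is one plausible choice, not the claimed equation.
    return u * v


def best_segment_for_mask(mask_stats, segment_stats):
    """Pick the merged segment with the highest projection score for a given mask.

    mask_stats:    dict with 'p_m' (points projecting into the mask)
    segment_stats: list of dicts with 's_i', 'p_sm', 'h_s', 'l_s', 'w_s'
    """
    scores = []
    for seg in segment_stats:
        u, v, h, l, ltw = cluster_features(mask_stats["p_m"], seg["s_i"], seg["p_sm"],
                                           seg["h_s"], seg["l_s"], seg["w_s"])
        scores.append(projection_score(u, v, h, l, ltw))
    best = max(range(len(scores)), key=lambda k: scores[k])
    return best, scores
```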

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Transportation (AREA)
  • Mechanical Engineering (AREA)
  • General Physics & Mathematics (AREA)
  • Chemical & Material Sciences (AREA)
  • Combustion & Propulsion (AREA)
  • Remote Sensing (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Automation & Control Theory (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Electromagnetism (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Systems and methods for object detection. The methods comprise: obtaining a LiDAR dataset and using the LiDAR dataset and image(s) to detect an object by: matching points of the LiDAR dataset to pixels in the image; generating a pruned LiDAR dataset by reducing a total number of points contained in the LiDAR dataset; computing a distribution of object detections that each point of the LiDAR dataset is likely to be in; creating a plurality of segments of LiDAR data points using the distribution of object detections; merging the plurality of segments of LiDAR data points to generate merged segments; and/or detecting the object in a point cloud defined by the LiDAR dataset based on the merged segments of LiDAR data points. The object detection may be used to facilitate at least one autonomous driving operation.

Description

SYSTEMS AND METHODS FOR CAMERA-LIDAR FUSED OBJECT DETECTION
CROSS-REFERENCE AND CLAIM OF PRIORITY
[0001] This patent document claims priority to U.S. Patent Application No. 17/078,532 filed October 23, 2020, U.S. Patent Application No. 17/078,543 filed October 23, 2020, U.S. Patent Application No. 17/078,548 filed October 23, 2020, U.S. Patent Application No. 17/078,561 filed October 23, 2020, and U.S. Patent Application No. 17/078,575 filed October 23, 2020, all of which are incorporated herein by reference.
BACKGROUND
Statement of the Technical Field
[0002] The present disclosure relates generally to object detection systems. More particularly, the present disclosure relates to implementing systems and methods for Camera-LiDAR Fused (“CLF”) object detection with LiDAR-to-image detection matching, point pruning, local variation segmentation, segment merging and/or segment filtering.
Description of the Related Art
[0003] Modern day vehicles have at least one on-board computer and have internet/satellite connectivity. The software running on these on-board computers monitors and/or controls operations of the vehicles. The vehicle also comprises LiDAR detectors for detecting objects in proximity thereto. The LiDAR detectors generate LiDAR datasets that measure the distance from the vehicle to an object at a plurality of different times. These distance measurements can be used for tracking movements of the object, making predictions as to the object’s trajectory, and planning paths of travel for the vehicle based on the predicted object’s trajectory.
SUMMARY
[0004] The present disclosure concerns implementing systems and methods for object detection with LiDAR-to-image detection matching. The object detection may be used to control an autonomous vehicle. In this scenario, the methods comprise: obtaining, by a computing device, a LiDAR dataset generated by a LiDAR system of the autonomous vehicle; and using, by a computing device, the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle. The object is detected by: matching points of the LiDAR dataset to pixels in the at least one image; and detecting the object in a point cloud defined by the LiDAR dataset based on the matching. The object detection is used to facilitate at least one autonomous driving operation (e.g., autonomous driving operation comprises an object tracking operation, an object trajectory prediction operation, a vehicle trajectory determination operation, and/or a collision avoidance operation).
[0005] In some scenarios, the methods also comprise obtaining, by the computing device, at least one image that was captured at a time when a sensor of the LiDAR system swept over a center of a camera Field Of View (“FOV”), wherein the at least one image is used in addition to the LiDAR dataset to detect the object. The matching may be based on identifiers for each object detected in the at least one image, a mask identifier, cell identifiers for a mask, confidence values for each cell, LiDAR point identifiers, LiDAR point coordinates, extrinsic LiDAR sensor and camera calibration parameters, and/or intrinsic camera calibration parameters.
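As a concrete illustration of the role of the extrinsic and intrinsic calibration parameters mentioned above, the following minimal Python sketch projects a LiDAR point into image pixel coordinates with a standard pinhole camera model. The matrix names T_cam_lidar and K and the simple perspective divide are illustrative assumptions; the patent does not prescribe this exact formulation.

```python
# Sketch: LiDAR point -> camera frame (extrinsics) -> pixel coordinates (intrinsics).
import numpy as np


def project_lidar_point(p_lidar, T_cam_lidar, K):
    """p_lidar: (3,) point in the LiDAR frame; T_cam_lidar: 4x4 extrinsic transform;
    K: 3x3 intrinsic matrix. Returns (xI, yI) pixel coordinates, or None if the point
    lies behind the image plane."""
    p_h = np.append(p_lidar, 1.0)            # homogeneous LiDAR point
    p_cam = T_cam_lidar @ p_h                # transform into the camera frame
    if p_cam[2] <= 0:                        # behind the camera
        return None
    uvw = K @ p_cam[:3]
    return uvw[0] / uvw[2], uvw[1] / uvw[2]  # perspective divide -> pixel coordinates
```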
[0006] The matching may comprise determining a probability distribution of pixels of the at least one image to which a point of the LiDAR dataset may project, taking into account a projection uncertainty in view of camera calibration uncertainties. The probability distribution is determined by computing a probability distribution function over image space coordinates for a pixel to which a point of the LiDAR dataset would probably project. The probability distribution function may be computed in accordance with the following mathematical equation: [equation image not reproduced in this text], where xI and yI represent image space coordinates for a pixel, and X, Y and Z represent LiDAR space coordinates for a point of the LiDAR dataset. The probability distribution function may be converted to image detection mask coordinates in accordance with the following mathematical equation: [equation image not reproduced in this text], where the omitted symbols represent image space boundaries of a bounding box, and R represents a mask resolution.
[0007] Alternatively or additionally, the matching comprises determining a probability distribution over a set of object detections in which a point of the LiDAR dataset is likely to be, based on at least one confidence value indicating a level of confidence that at least one respective pixel of the at least one image belongs to a given detected object. The probability distribution may be determined by computing a probability that a point of the LiDAR dataset projects into an image detection independent of all other image detections. For example, the probability may be computed in accordance with the following mathematical equation(s): [equation images not reproduced in this text], where the omitted symbols respectively represent the x limits and the y limits of the pixel in mask coordinates, dmp represents a mask pixel associated with a given object detection d, dy represents a y-axis coordinate for a mask pixel associated with the given object detection d, and dx represents an x-axis coordinate for the mask pixel associated with the given object detection d.
[0008] Alternatively or additionally, the matching comprises determining a probability that the LiDAR point does not project into any image detection. For example, the matching involves normalizing a plurality of probabilities determined for a given point of the LiDAR dataset in accordance with the following mathematical equation: [equation image not reproduced in this text], where the first omitted term represents a probability that a point of the LiDAR dataset projects into an image detection independent of all other image detections, and the second omitted term represents a probability that the LiDAR point does not project into any image detection.
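Because the equations of paragraphs [0007] and [0008] are reproduced in the original filing only as images, the following minimal Python sketch illustrates just the general idea: score each image detection by the mask confidences over the pixels a LiDAR point may project to, add a probability that the point projects into no detection, and normalize. The per-detection score, the fixed p_none value and the data structures are illustrative assumptions, not the claimed equations.

```python
# Sketch: form a normalized distribution over image detections for one LiDAR point.

def detection_distribution(pixel_probs, detection_masks, p_none=0.1):
    """pixel_probs: dict {(x, y): prob} over pixels the point may project to
    (projection uncertainty); detection_masks: dict {det_id: {(x, y): confidence}}.
    Returns a normalized distribution over detection ids plus a 'none' entry."""
    raw = {}
    for det_id, mask in detection_masks.items():
        # Probability the point projects into this detection, independent of the others.
        raw[det_id] = sum(p * mask.get(px, 0.0) for px, p in pixel_probs.items())
    raw["none"] = p_none                     # probability of projecting into no detection
    total = sum(raw.values()) or 1.0
    return {k: v / total for k, v in raw.items()}
```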
[0009] The present disclosure also concerns implementing systems and methods for CLF object detection with point pruning. The present solution can be used to operate an autonomous vehicle. In this scenario, the methods comprise: obtaining, by a computing device, a LiDAR dataset generated by a LiDAR system of the autonomous vehicle; and using, by a computing device, the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle. The object is detected by generating a pruned LiDAR dataset by reducing a total number of points contained in the LiDAR dataset, and detecting the object in a point cloud defined by the pruned LiDAR dataset. The object detection may be used by the computing device to facilitate at least one autonomous driving operation (e.g., an object tracking operation, an object trajectory prediction operation, a vehicle trajectory determination operation, and/or a collision avoidance operation).
[0010] In those or other scenarios, the methods also comprise obtaining, by the computing device, at least one image that was captured at a time when a sensor of the LiDAR system swept over a center of a camera FOV. The image is used in addition to the LiDAR dataset to detect the object.
[0011] In those or other scenarios, the pruned LiDAR dataset is generated by downsampling the points based on a planned trajectory of the autonomous vehicle. The points of the LiDAR dataset, corresponding to a first region along the planned trajectory of the autonomous vehicle, may be downsampled at a higher or lower sampling rate than points of the LiDAR dataset corresponding to a second region that is not along the planned trajectory of the autonomous vehicle. The first region may comprise a region including points corresponding to at least one object that is unlikely to interfere with the autonomous vehicle when following the planned trajectory, and the second region may comprise a region including points corresponding to at least one object that is likely to interfere with the autonomous vehicle when following the planned trajectory.
[0012] In those or other scenarios, the pruned LiDAR dataset is generated by downsampling the LiDAR dataset based on point labels assigned to the points. Each of the point labels may comprise at least one of an object class identifier, a color, and/or a unique identifier.
Alternatively or additionally, the LiDAR dataset is downsampled by assigning a first importance label to points associated with a moving object class and a second importance label to points associated with a static object class. The points assigned the first importance label may be downsampled (e.g., at a first resolution), and/or the points assigned the second importance label may be downsampled (e.g., at a second resolution lower than the first resolution).
[0013] In those or other scenarios, the pruned LiDAR dataset is generated by downsampling the LiDAR dataset based on point distances from a bounding box. A point may be removed from the LiDAR dataset when a respective one of the point distances is greater than a threshold distance.
[0014] In those or other scenarios, the pruned LiDAR dataset is generated by downsampling the LiDAR dataset using a map that includes information associated with a planned trajectory of the autonomous vehicle. A point may be removed from the LiDAR dataset when the point has a height less than a minimum height threshold value or greater than a maximum height threshold value. Additionally or alternatively, the pruned LiDAR dataset is generated by downsampling the LiDAR dataset at a resolution selected based on a modeled process latency.
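The following minimal Python sketch illustrates two of the pruning strategies described above: removing points outside a height band derived from map information, and downsampling points at a class-dependent rate based on importance labels. The thresholds, label names and keep rates are illustrative assumptions, not values taken from the disclosure.

```python
# Sketch: height gating plus class-dependent downsampling of a LiDAR dataset.
import numpy as np


def prune_lidar(points, labels, ground_z=0.0, min_h=0.1, max_h=3.5, keep_rate=None, rng=None):
    """points: (N, 3) array; labels: length-N list of importance labels
    (e.g., 'moving' for moving-object classes, 'static' for static-object classes)."""
    keep_rate = keep_rate or {"moving": 1.0, "static": 0.25}
    rng = rng or np.random.default_rng(0)

    heights = points[:, 2] - ground_z
    keep = (heights >= min_h) & (heights <= max_h)       # drop ground and overhead returns

    for i, lab in enumerate(labels):
        if keep[i] and rng.random() > keep_rate.get(lab, 1.0):
            keep[i] = False                              # downsample less important classes harder
    return points[keep]
```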
[0015] The present disclosure further concerns implementing systems and methods for object detection with local variation segmentation. The object detection may be used to control an autonomous vehicle. In this scenario, the method comprises: obtaining, by a computing device, a LiDAR dataset generated by a LiDAR system of the autonomous vehicle; and using, by a computing device, the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle. The object is detected by: computing a distribution of object detections that each point of the LiDAR dataset is likely to be in; creating a plurality of segments of LiDAR data points using the distribution of object detections; and detecting the object in a point cloud defined by the LiDAR dataset based on the plurality of segments of LiDAR data points. The object detection may be used by the computing device to facilitate at least one autonomous driving operation (e.g., autonomous driving operation comprises an object tracking operation, an object trajectory prediction operation, a vehicle trajectory determination operation, and/or a collision avoidance operation).
[0016] In those or other scenarios, the methods also comprise obtaining, by the computing device, at least one image that was captured at a time when a sensor of the LiDAR system swept over a center of a camera FOV, wherein the at least one image is used in addition to the LiDAR dataset to detect the object. The distribution of object detections may be computed based on (a) a probability distribution of pixels of the at least one image to which a point of the LiDAR dataset may project, and (b) a probability that the point does not project into any image detection.
[0017] The segments of LiDAR data points may be created by using the LiDAR dataset to construct a connectivity graph. The connectivity graph comprises points of the LiDAR dataset plotted in a 3D coordinate system and connection lines respectively connecting the points. The connection lines may be added to the connectivity graph based on whether two points of the LiDAR dataset are within a threshold spatial or temporal distance from each other, whether two points are nearest neighbors, or triangulation.
[0018] Additionally or alternatively, the segments of LiDAR data points are created by determining, for each point in the connectivity graph, a descriptor comprising a vector of elements that characterize a given point of the LiDAR data set. The elements of the vector may comprise a surface normal, a color value based on the at least one image, an intensity, a texture, spatial coordinates, a height above ground, a class label, an instance identifier, an image based feature, a Fast Point Feature Histogram, an image detection capability, and/or a modified distance.
[0019] Additionally or alternatively, the segments of LiDAR data points are created by further assigning a weight to each connection line based on the descriptor. The weight represents a dissimilarity measure between two points connected to each other in the connectivity graph via the connection line.
[0020] Additionally or alternatively, the plurality of segments of LiDAR data points are created by further merging points of the LiDAR dataset based on the weights. Two points may be merged together when a weight associated with a respective connection line is less than a threshold value.
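The following minimal Python sketch illustrates the general shape of the segmentation described above: build a k-nearest-neighbor connectivity graph over the LiDAR points, weight each connection by a dissimilarity between point descriptors, and merge connected points whose weight is below a threshold. For simplicity the sketch uses a fixed threshold rather than an adaptive local-variation criterion, and the value of k, the descriptor contents and the threshold are illustrative assumptions.

```python
# Sketch: connectivity graph -> descriptor-based edge weights -> threshold merging.
import numpy as np
from scipy.spatial import cKDTree


def segment_point_cloud(points, descriptors, k=8, weight_threshold=0.5):
    """points: (N, 3) LiDAR points; descriptors: (N, D) per-point feature vectors."""
    tree = cKDTree(points)
    _, nbrs = tree.query(points, k=k + 1)        # first neighbour is the point itself

    parent = list(range(len(points)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = []
    for i in range(len(points)):
        for j in nbrs[i][1:]:
            w = float(np.linalg.norm(descriptors[i] - descriptors[int(j)]))  # dissimilarity
            edges.append((w, i, int(j)))

    for w, i, j in sorted(edges):                # merge most similar connections first
        if w < weight_threshold:
            parent[find(i)] = find(j)

    segments = {}
    for i in range(len(points)):
        segments.setdefault(find(i), []).append(i)
    return list(segments.values())
```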
[0021] The present disclosure concerns implementing systems and methods for object detection with segment merging. The object detection may be used to control an autonomous vehicle. In this scenarios, the methods comprise: obtaining, by a computing device, a LiDAR dataset generated by a LiDAR system of the autonomous vehicle; and using, by a computing device, the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle. The object is detected by: computing a distribution of object detections that each point of the LiDAR dataset is likely to be in; creating a plurality of segments of LiDAR data points using the distribution of object detections; merging the plurality of segments of LiDAR data points to generate merged segments; and detecting the object in a point cloud defined by the LiDAR dataset based on the merged segments. The object detection may be used by the computing device to facilitate at least one autonomous driving operation (e.g., an object tracking operation, an object trajectory prediction operation, a vehicle trajectory determination operation, and/or a collision avoidance operation).
[0022] In those or other scenarios, the methods also comprise obtaining, by the computing device, at least one image that was captured at a time when a sensor of the LiDAR system swept over a center of a camera FOV, wherein the at least one image is used in addition to the LiDAR dataset to detect the object. The distribution of object detections may be computed based on (a) a probability distribution of pixels of the at least one image to which a point of the LiDAR dataset may project, and (b) a probability that the point does not project into any image detection.
[0023] The merged segments may be generated by: selecting pairs of segments from the plurality of segments of LiDAR data points; computing features for each pair of segments based on attributes of the segments contained in the pair; generating, for each pair of segments, a probability that the segments contained in the pair should be merged based on the features; and merging the plurality of segments of LiDAR data points based on the probabilities generated for the pairs of segments.
[0024] The pairs of segments may be filtered to remove pairs of segments which have centroid-to-centroid distances greater than a threshold value. The features may include, but are not limited to, a difference between the average of the probability distributions that were computed for the LiDAR data points contained in a first segment of the plurality of segments of LiDAR data points and the average of the probability distributions that were computed for the LiDAR data points contained in a second segment of the plurality of segments of LiDAR data points. The attributes may include, but are not limited to, an average of a plurality of probability distributions that were computed for the LiDAR data points contained in a given segment of the plurality of segments of LiDAR data points, and/or each probability distribution specifying detected objects in which a given LiDAR data point is likely to be.
[0025] Alternatively or additionally, the attributes include a 2D region that the LiDAR data points in a given segment cover, a percentage of LiDAR data points contained in the given segment that are on a road, a percentage of LiDAR data points contained in the given segment that are off a road, and/or a total number of lanes that the given segment at least partially overlaps. The features include a difference in on-road proportions, a difference in off-road proportions, a region compatibility, a lane compatibility, a difference between a total number of lanes that a first segment of LiDAR data points at least partially overlaps and a total number of lanes that a second segment of LiDAR data points at least partially overlaps, a difference or distance in height between segments of LiDAR data points, a mask compatibility, a difference in object type distributions, and/or an object type compatibility. [0026] The present disclosure concerns implementing systems and methods for object detection with segment filtering. The object detection can be used to control an autonomous vehicle. In these scenarios, the methods comprise: obtaining, by a computing device, a LiDAR dataset generated by a LiDAR system of the autonomous vehicle; and using, by a computing device, the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle. The object is detected by performing the following operations: computing a distribution of object detections that each point of the LiDAR dataset is likely to be in; creating a plurality of segments of LiDAR data points using the distribution of object detections; merging the plurality of segments of LiDAR data points to generate merged segments; and detecting the object in a point cloud defined by the LiDAR dataset based on the merged segments. The object detection is used by the computing device to facilitate at least one autonomous driving operation (e.g., an object tracking operation, an object trajectory prediction operation, a vehicle trajectory determination operation, and/or a collision avoidance operation).
[0027] In those or other scenarios, the methods also comprise obtaining, by the computing device, at least one image that was captured at a time when a sensor of the LiDAR system swept over a center of a camera FOV, wherein the at least one image is used in addition to the LiDAR dataset to detect the object. The distribution of object detections may be computed based on (a) a probability distribution of pixels of the at least one image to which a point of the LiDAR dataset may project, and (b) a probability that the point does not project into any image detection.
[0028] In those or other scenarios, the detecting comprises obtaining information for a given detection mask and a given merged segment of the merged segments. The information may comprise at least one of Pm representing a number of points of a LiDAR dataset that project into the given detection mask, Si representing a number of points forming the given merged segment, Psm representing a number of points in the given merged segment projecting into the given detection mask, a height of the given merged segment, a length ls of the given merged segment, and/or a width ws of the given merged segment.
[0029] In those or other scenarios, the detecting comprises determining at least one cluster feature based on the information. The cluster feature may comprise: a cluster feature U determined based on a number of points of a LiDAR dataset that project into the given detection mask and/or a number of points forming the given merged segment; a cluster feature V determined based on a number of points in the given merged segment projecting into the given detection mask and/or a number of points of a LiDAR dataset that project into the given detection mask; and/or a cluster feature H representing a cluster height, a cluster feature L representing a cluster length, a cluster feature LTW representing a length-to-width ratio for a cluster, and/or a cluster feature C representing a cylinder convolution (or fit) score of clustered LiDAR data points.
[0030] In those or other scenarios, the detecting comprises computing a projection score PS based on the at least one cluster feature. The projection score PS is a product of two or more cluster features.
[0031] In those or other scenarios, the detecting comprises using the projection score PS to verify that the given merged segment is part of a particular detected object that is associated with the given detection mask. A verification may be made that the given merged segment is part of a particular detected object that is associated with the given detection mask when the projection score PS exceeds a threshold value or has a value greater than other projection scores determined for other merged segments with points in the given detection mask.
[0032] The implementing systems can comprise: a processor; and a non-transitory computer-readable storage medium comprising programming instructions that are configured to cause the processor to implement a method for object detection. The above described methods can also be implemented by a computer program product comprising a memory and programming instructions that are configured to cause a processor to perform operations.
BRIEF DESCRIPTION OF THE DRAWINGS
[0033] The present solution will be described with reference to the following drawing figures, in which like numerals represent like items throughout the figures.
[0034] FIG. 1 is an illustration of an illustrative system. [0035] FIG. 2 is an illustration of an illustrative architecture for a vehicle.
[0036] FIG. 3 is an illustration of an illustrative architecture for a LiDAR system employed by the vehicle shown in FIG. 2.
[0037] FIG. 4 is an illustration of an illustrative computing device.
[0038] FIG. 5 provides a block diagram that is useful for understanding how vehicle control is achieved in accordance with the present solution.
[0039] FIGS. 6A-6B (collectively referred to herein as “FIG. 6”) provides a flow diagram of an illustrative method for controlling an autonomous vehicle using CLF object detection.
[0040] FIG. 7 provides a flow diagram of an illustrative method for CLF object detection.
[0041] FIG. 8 provides a flow diagram of an illustrative method for pruning (or reducing) the number of LiDAR data points that are processed for purposes of detecting an object that is located in proximity to an AV.
[0042] FIG. 9 provides a flow diagram of an illustrative method for performing a LiDAR-to-Image Detection (“LID”) matching algorithm.
[0043] FIG. 10 provides an illustrative image captured by a camera of a vehicle.
[0044] FIG. 11 provides an illustrative image having a plurality of bounding boxes overlaid thereon.
[0045] FIG. 12 provides an illustrative image having a bounding box and mask overlaid thereon.
[0046] FIG. 13 provides a flow diagram of an illustrative method for determining a probability distribution of pixels to which a LiDAR data point may project taking into account a projection uncertainty. [0047] FIG. 14 provides a flow diagram of an illustrative method for determining a probability distribution over a set of object detections in which a LiDAR data point is likely to be.
[0048] FIG. 15 provides an illustration that is useful for understanding the novel Local Variation Segmentation (“LVS”) algorithm of the present solution.
[0049] FIG. 16 provides an illustration showing a graph that is generated during the LVS algorithm of FIG. 15.
[0050] FIG. 17 provides an illustration of an illustrative architecture for a segment merger.
[0051] FIG. 18 provides a flow diagram of an illustrative method for object detection segment filtering.
DETAILED DESCRIPTION
[0052] As used in this document, the singular forms “a,” “an,” and “the” include plural references unless the context clearly dictates otherwise. Unless defined otherwise, all technical and scientific terms used herein have the same meanings as commonly understood by one of ordinary skill in the art. As used in this document, the term “comprising” means “including, but not limited to.” Definitions for additional terms that are relevant to this document are included at the end of this Detailed Description.
[0053] An “electronic device” or a “computing device” refers to a device that includes a processor and memory. Each device may have its own processor and/or memory, or the processor and/or memory may be shared with other devices as in a virtual machine or container arrangement. The memory will contain or receive programming instructions that, when executed by the processor, cause the electronic device to perform one or more operations according to the programming instructions.
[0054] The terms “memory,” “memory device,” “data store,” “data storage facility” and the like each refer to a non-transitory device on which computer-readable data, programming instructions or both are stored. Except where specifically stated otherwise, the terms “memory,” “memory device,” “data store,” “data storage facility” and the like are intended to include single device embodiments, embodiments in which multiple memory devices together or collectively store a set of data or instructions, as well as individual sectors within such devices.
[0055] The terms “processor” and “processing device” refer to a hardware component of an electronic device that is configured to execute programming instructions. Except where specifically stated otherwise, the singular term “processor” or “processing device” is intended to include both single-processing device embodiments and embodiments in which multiple processing devices together or collectively perform a process.
[0056] The term “vehicle” refers to any moving form of conveyance that is capable of carrying either one or more human occupants and/or cargo and is powered by any form of energy. The term “vehicle” includes, but is not limited to, cars, trucks, vans, trains, autonomous vehicles, aircraft, aerial drones and the like. An “autonomous vehicle” is a vehicle having a processor, programming instructions and drivetrain components that are controllable by the processor without requiring a human operator. An autonomous vehicle may be fully autonomous in that it does not require a human operator for most or all driving conditions and functions, or it may be semi-autonomous in that a human operator may be required in certain conditions or for certain operations, or that a human operator may override the vehicle’s autonomous system and may take control of the vehicle.
[0057] In this document, when terms such as “first” and “second” are used to modify a noun, such use is simply intended to distinguish one item from another, and is not intended to require a sequential order unless specifically stated. In addition, terms of relative position such as “vertical” and “horizontal”, or “front” and “rear”, when used, are intended to be relative to each other and need not be absolute, and only refer to one possible position of the device associated with those terms depending on the device’s orientation.
[0058] Real-time prediction of actions by drivers of other vehicles and pedestrians is a challenge for on-road semi-autonomous or autonomous vehicle applications. Such real-time prediction is particularly challenging when the drivers and/or pedestrians break traffic rules. Systematically assuming the worst case action from the drivers and/or pedestrians will paralyze the self-driving vehicle, but erroneously optimistic predictions can result in undesirable autonomous vehicle behavior.
[0059] This document describes methods and systems that are directed to addressing the problems described above, and/or other issues. Accordingly, the present solution concerns systems and methods for controlling vehicles. The methods generally involve: generating a vehicle trajectory for the vehicle that is in motion; performing CLF object detection operations to detect an object within a given distance from the vehicle; generating at least one possible object trajectory for the object which was detected; using the vehicle trajectory and at least one possible object trajectory to determine whether there is an undesirable probability that a collision will occur between the vehicle and the object; and modifying the vehicle trajectory when a determination is made that there is an undesirable probability that the collision will occur.
[0060] Notably, the present solution is being described herein in the context of an autonomous vehicle. The present solution is not limited to autonomous vehicle applications. The present solution can be used in other applications such as robotic application, radar system application, metric applications, and/or system performance applications.
[0061] Illustrative Systems
[0062] Referring now to FIG. 1, there is provided an illustration of an illustrative system 100. System 100 comprises a vehicle 102i that is traveling along a road in a semi-autonomous or autonomous manner. Vehicle 102i is also referred to herein as an Autonomous Vehicle (“AV”). The AV 102i can include, but is not limited to, a land vehicle (as shown in FIG. 1), an aircraft, or a watercraft.
[0063] AV 102i is generally configured to detect objects 1022, 114, 116 in proximity thereto. The objects can include, but are not limited to, a vehicle 1022, cyclist 114 (such as a rider of a bicycle, electric scooter, motorcycle, or the like) and/or a pedestrian 116. The object detection is achieved in accordance with a novel CLF object detection process. The novel CLF object detection process will be described in detail below. When such a detection is made, AV 102i performs operations to: generate one or more possible object trajectories for the detected object; and analyze at least one of the generated possible object trajectories to determine whether or not there is an undesirable probability that a collision will occur between the AV and the object in a threshold period of time (e.g., 1 minute). If so, the AV 102i performs operations to determine whether the collision can be avoided if a given vehicle trajectory is followed by the AV 102i and any one of a plurality of dynamically generated emergency maneuvers is performed in a pre-defined time period (e.g., N milliseconds). If the collision can be avoided, then the AV 102i takes no action or optionally performs a cautious maneuver (e.g., mildly slows down). In contrast, if the collision cannot be avoided, then the AV 102i immediately takes an emergency maneuver (e.g., brakes and/or changes direction of travel).
[0064] Referring now to FIG. 2, there is provided an illustration of an illustrative system architecture 200 for a vehicle. Vehicles 102i and/or 1022 of FIG. 1 can have the same or similar system architecture as that shown in FIG. 2. Thus, the following discussion of system architecture 200 is sufficient for understanding vehicle(s) 102i, 1022 of FIG. 1.
[0065] As shown in FIG. 2, the vehicle 200 includes an engine or motor 202 and various sensors 204-218 for measuring various parameters of the vehicle. In gas-powered or hybrid vehicles having a fuel-powered engine, the sensors may include, for example, an engine temperature sensor 204, a battery voltage sensor 206, an engine Rotations Per Minute (“RPM”) sensor 208, and a throttle position sensor 210. If the vehicle is an electric or hybrid vehicle, then the vehicle may have an electric motor, and accordingly will have sensors such as a battery monitoring system 212 (to measure current, voltage and/or temperature of the battery), motor current 214 and voltage 216 sensors, and motor position sensors such as resolvers and encoders 218
[0066] Operational parameter sensors that are common to both types of vehicles include, for example: a position sensor 236 such as an accelerometer, gyroscope and/or inertial measurement unit; a speed sensor 238; and an odometer sensor 240. The vehicle also may have a clock 242 that the system uses to determine vehicle time during operation. The clock 242 may be encoded into the vehicle on-board computing device, it may be a separate device, or multiple clocks may be available. [0067] The vehicle also will include various sensors that operate to gather information about the environment in which the vehicle is traveling. These sensors may include, for example: a location sensor 260 (e.g., a Global Positioning System (“GPS”) device); object detection sensors such as one or more cameras 262; a LiDAR sensor system 264; and/or a radar and/or a sonar system 266. The sensors also may include environmental sensors 268 such as a precipitation sensor and/or ambient temperature sensor. The object detection sensors may enable the vehicle to detect objects that are within a given distance range of the vehicle 200 in any direction, while the environmental sensors collect data about environmental conditions within the vehicle’s area of travel
[0068] During operations, information is communicated from the sensors to an on-board computing device 220. The on-board computing device 220 analyzes the data captured by the sensors and optionally controls operations of the vehicle based on results of the analysis. For example, the on-board computing device 220 may control: braking via a brake controller 232; direction via a steering controller 224; speed and acceleration via a throttle controller 226 (in a gas-powered vehicle) or a motor speed controller 228 (such as a current level controller in an electric vehicle); a differential gear controller 230 (in vehicles with transmissions); and/or other controllers.
[0069] Geographic location information may be communicated from the location sensor 260 to the on-board computing device 220, which may then access a map of the environment that corresponds to the location information to determine known fixed features of the environment such as streets, buildings, stop signs and/or stop/go signals. Captured images from the cameras 262 and/or object detection information captured from sensors such as LiDAR 264 is communicated from those sensors) to the on-board computing device 220. The object detection information and/or captured images are processed by the on-board computing device 220 to detect objects in proximity to the vehicle 200. Any known or to be known technique for making an object detection based on sensor data and/or captured images can be used in the embodiments disclosed in this document. [0070] LiDAR information is communicated from LiDAR sensor 264 to the on-board computing device 220. Additionally, captured images are communicated from the camera(s) 262 to the on-board computing device 220. The LiDAR information and/or captured images are processed by the on-board computing device 220 to detect objects in proximity to the vehicle 200. The manner in which the object detections are made by the on-board computing device 220 will become evident as the discussion progresses.
[0071] When the on-board computing device 220 detects a moving object, the on-board computing device 220 will generate one or more possible object trajectories for the detected object, and analyze the possible object trajectories to assess the probability of a collision between the object and the AV. If the probability exceeds an acceptable threshold, the on-board computing device 220 performs operations to determine whether the collision can be avoided if the AV follows a defined vehicle trajectory and/or implements one or more dynamically generated emergency maneuvers within a pre-defined time period (e.g., N milliseconds). If the collision can be avoided, then the on-board computing device 220 may cause the vehicle 200 to perform a cautious maneuver (e.g., mildly slow down, accelerate, or swerve). In contrast, if the collision cannot be avoided, then the on-board computing device 220 will cause the vehicle 200 to take an emergency maneuver (e.g., brake and/or change direction of travel).
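The decision flow just described can be summarized by the following minimal Python sketch. The callables collision_probability and can_avoid_with_maneuver, the threshold value and the returned action names are hypothetical placeholders for the on-board computing device's actual prediction and planning components, not part of the disclosed system.

```python
# Sketch of the supervisory decision flow: assess risk, then choose a maneuver.

def supervise(vehicle_trajectory, object_trajectories, collision_probability,
              can_avoid_with_maneuver, threshold=0.1):
    """collision_probability(vehicle_trajectory, object_trajectory) -> float in [0, 1];
    can_avoid_with_maneuver(vehicle_trajectory, object_trajectories) -> bool."""
    worst = max(collision_probability(vehicle_trajectory, t) for t in object_trajectories)
    if worst <= threshold:
        return "follow_trajectory"            # risk is acceptable
    # Risk is unacceptable: can a dynamically generated emergency maneuver,
    # executed within the pre-defined time period, still avoid the collision?
    if can_avoid_with_maneuver(vehicle_trajectory, object_trajectories):
        return "cautious_maneuver"            # e.g., mildly slow down
    return "emergency_maneuver"               # e.g., brake and/or change direction of travel
```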
[0072] Referring now to FIG. 3, there is provided an illustration of an illustrative LiDAR system 300. LiDAR system 264 of FIG. 2 may be the same as or substantially similar to the LiDAR system 300. As such, the discussion of LiDAR system 300 is sufficient for understanding LiDAR system 264 of FIG. 2.
[0073] As shown in FIG. 3, the LiDAR system 300 includes a housing 306 which may be rotatable 360° about a central axis such as hub or axle 316. The housing may include an emitter/receiver aperture 312 made of a material transparent to light. Although a single aperture is shown in FIG. 2, the present solution is not limited in this regard. In other scenarios, multiple apertures for emitting and/or receiving light may be provided. Either way, the LiDAR system 300 can emit light through one or more of the aperture(s) 312 and receive reflected light back toward one or more of the aperture(s) 211 as the housing 306 rotates around the internal components. In alternative scenarios, the outer shell of housing 306 may be a stationary dome, at least partially made of a material that is transparent to light, with rotatable components inside of the housing 306.
[0074] Inside the rotating shell or stationary dome is a light emitter system 304 that is configured and positioned to generate and emit pulses of light through the aperture 312 or through the transparent dome of the housing 306 via one or more laser emitter chips or other light emitting devices. The emitter system 304 may include any number of individual emitters (e.g., 8 emitters, 64 emitters, or 128 emitters). The emitters may emit light of substantially the same intensity or of varying intensities. The individual beams emitted by the light emitter system 304 will have a well-defined state of polarization that is not the same across the entire array. As an example, some beams may have vertical polarization and other beams may have horizontal polarization. The LiDAR system will also include a light detector 308 containing a photodetector or array of photodetectors positioned and configured to receive light reflected back into the system. The emitter system 304 and light detector 308 would rotate with the rotating shell, or they would rotate inside the stationary dome of the housing 306. One or more optical element structures 310 may be positioned in front of the light emitting unit 304 and/or the light detector 308 to serve as one or more lenses or waveplates that focus and direct light that is passed through the optical element structure 310
[0075] One or more optical element structures 310 may be positioned in front of a mirror 312 to focus and direct light that is passed through the optical element structure 310. As shown below, the system includes an optical element structure 310 positioned in front of the mirror 312 and connected to the rotating elements of the system so that the optical element structure 310 rotates with the mirror 312. Alternatively or in addition, the optical element structure 310 may include multiple such structures (for example lenses and/or waveplates). Optionally, multiple optical element structures 310 may be arranged in an array on or integral with the shell portion of the housing 306.
[0076] Optionally, each optical element structure 310 may include a beam splitter that separates light that the system receives from light that the system generates. The beam splitter may include, for example, a quarter-wave or half-wave waveplate to perform the separation and ensure that received light is directed to the receiver unit rather than to the emitter system (which could occur without such a waveplate as the emitted light and received light should exhibit the same or similar polarizations).
[0077] The LiDAR system will include a power unit 318 to power the light emitting unit 304, a motor 316, and electronic components. The LiDAR system will also include an analyzer 314 with elements such as a processor 322 and non-transitory computer-readable memory 320 containing programming instructions that are configured to enable the system to receive data collected by the light detector unit, analyze it to measure characteristics of the light received, and generate information that a connected system can use to make decisions about operating in an environment from which the data was collected. Optionally, the analyzer 314 may be integral with the LiDAR system 300 as shown, or some or all of it may be external to the LiDAR system and communicatively connected to the LiDAR system via a wired or wireless communication network or link.
[0078] Referring now to FIG. 4, there is provided an illustration of an illustrative architecture for a computing device 400. The computing device 110 of FIG. 1 and/or the vehicle on-board computing device 220 of FIG. 2 is/are the same as or similar to computing device 400. As such, the discussion of computing device 400 is sufficient for understanding the computing device 110 of FIG. 1 and the vehicle on-board computing device 220 of FIG. 2.
[0079] Computing device 400 may include more or less components than those shown in FIG. 4. However, the components shown are sufficient to disclose an illustrative solution implementing the present solution. The hardware architecture of FIG. 4 represents one implementation of a representative computing device configured to operate a vehicle, as described herein. As such, the computing device 400 of FIG. 4 implements at least a portion of the method(s) described herein.
[0080] Some or all components of the computing device 400 can be implemented as hardware, software and/or a combination of hardware and software. The hardware includes, but is not limited to, one or more electronic circuits. The electronic circuits can include, but are not limited to, passive components (e.g., resistors and capacitors) and/or active components (e.g., amplifiers and/or microprocessors). The passive and/or active components can be adapted to, arranged to and/or programmed to perform one or more of the methodologies, procedures, or functions described herein.
[0081] As shown in FIG. 4, the computing device 400 comprises a user interface 402, a Central Processing Unit (“CPU”) 406, a system bus 410, a memory 412 connected to and accessible by other portions of computing device 400 through system bus 410, a system interface 460, and hardware entities 414 connected to system bus 410. The user interface can include input devices and output devices, which facilitate user-software interactions for controlling operations of the computing device 400. The input devices include, but are not limited to, a physical and/or touch keyboard 450. The input devices can be connected to the computing device 400 via a wired or wireless connection (e.g., a Bluetooth® connection). The output devices include, but are not limited to, a speaker 452, a display 454, and/or light emitting diodes 456. System interface 460 is configured to facilitate wired or wireless communications to and from external devices (e.g., network nodes such as access points, etc ).
[0082] At least some of the hardware entities 414 perform actions involving access to and use of memory 412, which can be a Random Access Memory (“RAM”), a disk drive, flash memory, a Compact Disc Read Only Memory (“CD-ROM”) and/or another hardware device that is capable of storing instructions and data. Hardware entities 414 can include a disk drive unit 416 comprising a computer-readable storage medium 418 on which is stored one or more sets of instructions 420 (e g., software code) configured to implement one or more of the methodologies, procedures, or functions described herein. The instructions 420 can also reside, completely or at least partially, within the memory 412 and/or within the CPU 406 during execution thereof by the computing device 400. The memory 412 and the CPU 406 also can constitute machine-readable media. The term "machine-readable media", as used here, refers to a single medium or multiple media (e g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions 420. The term "machine- readable media", as used here, also refers to any medium that is capable of storing, encoding or carrying a set of instructions 420 for execution by the computing device 400 and that cause the computing device 400 to perform any one or more of the methodologies of the present disclosure.
[0083] Referring now to FIG. 5, there is provided a block diagram that is useful for understanding how vehicle control is achieved in accordance with the present solution. All of the operations performed in blocks 502-518 can be performed by the on-board computing device of a vehicle (e.g., AV 102i of FIG. 1).
[0084] In block 502, a location of the vehicle is detected. This detection can be made based on sensor data output from a location sensor (e.g., location sensor 248 of FIG. 2) of the vehicle. This sensor data can include, but is not limited to, GPS data. The detected location of the vehicle is then passed to block 506.
[0085] In block 504, an object is detected within proximity of the vehicle. This detection is made based on sensor data output from a LiDAR system (e.g., LiDAR system 264 of FIG. 2) and a camera (e.g., camera 262 of FIG. 2) of the vehicle. The manner in which the object detection is achieved will become evident as the discussion progresses. Information about the detected object is passed to block 506. This information includes, but is not limited to, an initial predicted trajectory of the object, a speed of the object, a full extent of the object, a heading of the object, a direction of travel of the object, and/or a classification of the object. The full extent of the object and the heading of the object can be specified by a cuboid defined in a 3D graph on which the LiDAR data points are plotted. The plotted LiDAR data points form a 3D point cloud. The initial predicted object trajectory can include, but is not limited to, a linear path pointing in the heading direction of the cuboid.
[0086] This object detection information output from block 504 can be subsequently used to facilitate at least one autonomous driving operation (e.g., object tracking operations, object trajectory prediction operations, vehicle trajectory determination operations, and/or collision avoidance operations). For example, a cuboid can be defined for the detected object in a 3D graph comprising a LiDAR dataset. The cuboid heading and geometry can be used to predict object trajectories in block 512 as discussed below and/or determine a vehicle trajectory in block 506 as discussed below. A worst-case predicted object trajectory can be identified and used to trigger emergency maneuvers in blocks 514-518 as discussed below. The present solution is not limited to the particulars of this example.
[0087] In block 506, a vehicle trajectory is generated using the information from blocks 502 and 504. Techniques for determining a vehicle trajectory are well known in the art. Any known or to be known technique for determining a vehicle trajectory can be used herein without limitation. For example, in some scenarios, such a technique involves determining a trajectory for the AV that would pass the object when the object is in front of the AV, the cuboid has a heading direction that is aligned with the direction in which the AV is moving, and the cuboid has a length that is greater than a threshold value. The present solution is not limited to the particulars of this scenario. The vehicle trajectory 520 can be determined based on the location information from block 502, the object detection information from block 504, and map information 528 (which is pre-stored in a data store of the vehicle). The vehicle trajectory 520 may represent a smooth path that does not have abrupt changes that would otherwise provide passenger discomfort. For example, the vehicle trajectory is defined by a path of travel along a given lane of a road in which the object is not predicted to travel within a given amount of time. The vehicle trajectory 520 is then provided to block 508.
[0088] In block 508, a steering angle and velocity command is generated based on the vehicle trajectory 520. The steering angle and velocity command is provided to block 510 for vehicle dynamics control.
[0089] Notably, the present solution augments the above-described vehicle trajectory planning process 500 of blocks 502-510 with an additional supervisory layer process 550. The additional supervisory layer process 550 optimizes the vehicle trajectory for the most likely behavior of the objects detected in block 504, but nonetheless maintains acceptable operations if worst-case behaviors occur. This additional supervisory layer process 550 is implemented by blocks 512-518.
[0090] As shown in FIG. 5, an object classification is performed in block 504 to classify the detected object into one of a plurality of classes and/or sub-classes. The classes can include, but are not limited to, a vehicle class and a pedestrian class. The vehicle class can have a plurality of vehicle sub-classes. The vehicle sub-classes can include, but are not limited to, a bicycle sub-class, a motorcycle sub-class, a skateboard sub-class, a roller blade sub-class, a scooter sub-class, a sedan sub-class, an SUV sub-class, and/or a truck sub-class. The object classification is made based on sensor data generated by a LiDAR system (e.g., LiDAR system 264 of FIG. 2) and/or a camera (e.g., camera 262 of FIG. 2) of the vehicle. Techniques for classifying objects based on LiDAR data and/or imagery data are well known in the art. Any known or to be known object classification technique can be used herein without limitation. Information 530 specifying the object’s classification is provided to block 512, in addition to the information 532 indicating the object’s actual speed and direction of travel.
[0091] Block 512 involves determining one or more possible object trajectories for the object detected in 504. The possible object trajectories can include, but are not limited to, the following trajectories:
• a trajectory defined by the object’s actual speed (e.g., 1 mile per hour) and actual direction of travel (e.g., west);
• a trajectory defined by the object’s actual speed (e.g., 1 mile per hour) and another possible direction of travel (e.g., south, south-west, or X (e.g., 40°) degrees from the object’s actual direction of travel in a direction towards the AV) for the object;
• a trajectory defined by another possible speed for the object (e.g., 2-10 miles per hour) and the object’s actual direction of travel (e.g., west); and/or
• a trajectory defined by another possible speed for the object (e.g., 2-10 miles per hour) and another possible direction of travel (e.g., south, south-west, or X (e.g., 40°) degrees from the object’s actual direction of travel in a direction towards the AV) for the object.
The possible speed(s) and/or possible direction(s) of travel may be pre-defined for objects in the same class and/or sub-class as the object. The one or more possible object trajectories 522 is(are) then passed to block 514. The system may cause the vehicle’s speed and steering controllers to move the vehicle according to the defined trajectory as discussed below.
[0092] In the case that two or more possible object trajectories are determined, then 512 may optionally also involve selecting one of the possible object trajectories which provides a worst-case collision scenario for the AV. This determination is made based on information 532 indicating the AV’s actual speed and direction of travel. The selected possible object trajectory is then passed to block 514, instead of all the possible object trajectories determined in 512.
[0093] In block 514, a collision check is performed for each of the possible object trajectories 522 passed to block 514. The collision check involves determining whether there is an undesirable probability that a collision will occur between the vehicle and the object. Such a determination is made by first determining if the vehicle trajectory 520 and a given possible object trajectory 522 intersect. If the two trajectories 520, 522 do not intersect, then the vehicle trajectory 520 is deemed to be an acceptable vehicle trajectory and no control action is taken to modify the vehicle trajectory.
[0094] In contrast, if the two trajectories 520, 522 do intersect, then a predicted time at which a collision would occur if the two trajectories are followed is determined. The predicted time is compared to a threshold value (e.g., 1 second). If the predicted time exceeds the threshold value, then the vehicle trajectory 520 is deemed to be an acceptable vehicle trajectory and no control action is taken to modify the vehicle trajectory.
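By way of a non-limiting illustration, the following Python sketch shows one way such a collision check could be performed on sampled trajectories; the function name, the proximity radius, and the time threshold are assumptions chosen for illustration only and are not part of the present solution.

```python
import numpy as np

def collision_check(av_traj, obj_traj, timestamps,
                    intersect_radius=1.5, time_threshold=1.0):
    """Return (trajectory_acceptable, predicted_collision_time).

    av_traj, obj_traj: (N, 2) arrays of x/y positions sampled at `timestamps`.
    The trajectories "intersect" when the AV and the object are predicted to be
    within `intersect_radius` meters of each other at the same timestamp.
    """
    gaps = np.linalg.norm(av_traj - obj_traj, axis=1)
    hits = np.flatnonzero(gaps < intersect_radius)
    if hits.size == 0:
        # Trajectories never intersect: the vehicle trajectory is acceptable.
        return True, None
    t_collision = timestamps[hits[0]]
    # Acceptable only if the predicted collision is far enough in the future.
    return t_collision > time_threshold, t_collision
```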
[0095] If the predicted time is equal to or less than the threshold value, then a determination is made as to whether the collision can be avoided if (a) the vehicle trajectory is followed by the AV and (b) any one of a plurality of dynamically generated emergency maneuvers is performed in a pre-defined time period (e.g., N milliseconds). The dynamically generated emergency maneuvers include, but are not limited to, the following:
• an emergency maneuver that comprises a braking command and that is determined based on the vehicle trajectory and a possible object trajectory;
• an emergency maneuver that comprises at least a steering command, and a braking command or an acceleration command, and that is determined via a gradient descent from the active AV trajectory on an objective function which penalizes collision and/or ride discomfort; and/or
• an emergency maneuver that comprises a pre-defined emergency maneuver that has been optimized via a gradient descent from the active AV trajectory on an objective function which penalizes collision and/or ride discomfort.
[0096] In some scenarios, an emergency braking maneuver is produced by postulating a trajectory that maintains the intended trajectory for the pre-defined time period (N milliseconds) and then decelerates at a maximum braking profile parameterized by maximum allowable deceleration and jerk limits. The maximum braking profile is produced along the original trajectory via Euler integration of a new velocity profile, or by other methods. The present solution is not limited to the particulars of these scenarios.
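By way of a non-limiting illustration, the following Python sketch Euler-integrates such a velocity profile; the deceleration and jerk limits and the time step are illustrative assumptions, not parameters of the present solution.

```python
import numpy as np

def max_braking_velocity_profile(v0, hold_time, max_decel=8.0, max_jerk=15.0,
                                 dt=0.01):
    """Euler-integrate a velocity profile that holds the current speed for
    `hold_time` seconds and then brakes at a jerk-limited maximum profile."""
    v, decel = v0, 0.0
    t, velocities = 0.0, [v0]
    while v > 0.0:
        if t >= hold_time:
            # Ramp deceleration up at the jerk limit until the decel limit is hit.
            decel = min(max_decel, decel + max_jerk * dt)
        v = max(0.0, v - decel * dt)
        velocities.append(v)
        t += dt
    return np.asarray(velocities)
```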
[0097] In those or other scenarios, an emergency maneuver that comprises both steering and braking is generated by: parameterizing both steering and braking with a limited set of spline points (e.g., 4 spline points for steering and 3 spline points for velocity); minimizing an objective function which penalizes collision and/or ride discomfort, as a function of those parameters, using conjugate gradient descent, Newton’s method, Powell’s method, or other existing method(s) for minimizing multivariate functions; and computing the trajectory corresponding to the parameterized spline points with the minimal objective function cost. The present solution is not limited to the particulars of these scenarios.
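A minimal sketch of this idea is given below, assuming a simple kinematic bicycle rollout, piecewise-linear interpolation standing in for the splines, and an illustrative objective; the horizon, wheelbase, weights, and object path are all assumptions for illustration and not part of the present solution.

```python
import numpy as np
from scipy.optimize import minimize

HORIZON, DT, WHEELBASE = 3.0, 0.1, 2.8          # illustrative values
STEPS = int(HORIZON / DT)

def rollout(params, state):
    """Roll out a simple kinematic model; params = 4 steering + 3 speed knots."""
    steer_knots, speed_knots = params[:4], params[4:]
    t = np.linspace(0.0, HORIZON, STEPS)
    steer = np.interp(t, np.linspace(0.0, HORIZON, 4), steer_knots)
    speed = np.interp(t, np.linspace(0.0, HORIZON, 3), speed_knots)
    x, y, yaw = state
    path = []
    for s, v in zip(steer, speed):
        x += v * np.cos(yaw) * DT
        y += v * np.sin(yaw) * DT
        yaw += v * np.tan(s) / WHEELBASE * DT
        path.append((x, y))
    return np.asarray(path)

def objective(params, state, obj_path):
    path = rollout(params, state)
    clearance = np.linalg.norm(path - obj_path, axis=1)
    collision_cost = np.sum(np.exp(-clearance))          # penalize near misses
    discomfort_cost = np.sum(np.diff(params[4:]) ** 2)   # penalize harsh speed changes
    return collision_cost + 0.1 * discomfort_cost

# Example: optimize from an initial guess that keeps the current speed.
state = (0.0, 0.0, 0.0)
obj_path = np.column_stack([np.linspace(5.0, 20.0, STEPS), np.full(STEPS, 0.5)])
x0 = np.concatenate([np.zeros(4), np.full(3, 10.0)])
result = minimize(objective, x0, args=(state, obj_path), method="Powell")
emergency_path = rollout(result.x, state)
```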
[0098] In those or other scenarios, a pre-defined emergency maneuver is generated by recording commands from a human operator during a simulated emergency braking event, or by sampling a small set of steering torques and braking profiles applied to the current vehicle state. These torques are computed at constant intervals from zero up until the limits of the steering and brake mechanism, or by other methods. The present solution is not limited to the particulars of these scenarios.
[0099] If it is determined that the collision can be avoided in the pre-defined time period, then the vehicle trajectory 520 is deemed to be an acceptable vehicle trajectory and no control action is taken to modify the vehicle trajectory. Alternatively, the AV is caused to perform a cautious maneuver (e.g., mildly slow down such as by 5-10 mph). Techniques for causing an AV to take a cautious maneuver such as slowing down are well known in the art. For example, a control action command is generated as shown by 516, and used to adjust or otherwise modify the vehicle trajectory at 508 prior to being passed to block 510. The vehicle trajectory can be adjusted or otherwise modified to cause the vehicle to decelerate, cause the vehicle to accelerate, and/or cause the vehicle to change its direction of travel.
[00100] In contrast, if it is determined that the collision cannot be avoided in the pre-defined time period, then the AV is caused to immediately take an emergency maneuver. This emergency maneuver may include one of the dynamically generated emergency maneuvers discussed above. Techniques for causing an AV to take emergency maneuvers are well known in the art.
[00101] Illustrative Methods For Controlling A Vehicle
[00102] Referring now to FIG. 6, there is provided a flow diagram of an illustrative method 600 for controlling a vehicle (e.g., vehicle 102i of FIG. 1). At least a portion of method 600 is performed by a vehicle on-board computing device (e.g., vehicle on-board computing device 220 of FIG. 2). Method 600 is performed for each object (e.g., vehicle 1022 of FIG. 1, cyclist 104 of FIG. 1, and/or pedestrian 106 of FIG. 1) that has been detected to be within a distance range from the vehicle at any given time.
[00103] Method 600 comprises a plurality of operations 602-630. The present solution is not limited to the particular order of operations 602-630 shown in FIG. 6. For example, the operations of 620 can be performed in parallel with the operations of 604-618, rather than subsequent to as shown in FIG. 6.
[00104] As shown in FIG. 6A, method 600 begins with 602 and continues with 604 where a vehicle trajectory (e.g., vehicle trajectory 520 of FIG. 5) for an AV is generated. The vehicle trajectory represents a smooth path that does not have abrupt changes that would otherwise provide passenger discomfort. Techniques for determining a vehicle trajectory are well known in the art. Any known or to be known technique for determining a vehicle trajectory can be used herein without limitation. In some scenarios, the vehicle trajectory is determined based on location information generated by a location sensor (e.g., location sensor 260 of FIG. 2) of the AV, object detection information generated by the on-board computing device (e.g., on-board computing device 220 of FIG. 2) of the AV, images captured by at least one camera (e.g., camera 262 of FIG. 2) of the AV, and map information stored in a memory (e.g., memory 412 of FIG. 4) of the AV. In other scenarios, lane information is used as an alternative to or in addition to the location information and/or map information.
[00105] Once the vehicle trajectory is generated, method 600 continues with 605 where the AV performs operations to detect an object that is in proximity thereto. A CLF object detection algorithm is employed in 605. The CLF object detection algorithm will be described in detail below. The object detection is then used to facilitate at least one autonomous driving operation (e.g., object tracking operations, object trajectory prediction operations, vehicle trajectory determination operations, and/or collision avoidance operations). For example, a cuboid can be defined for the detected object in a 3D graph comprising a LiDAR data set. The cuboid specifies a heading of the object and/or full extent of the object’s geometry. The heading and object geometry can be used to predict an object trajectory and/or determine a vehicle trajectory, as is known in the art and discussed above. The present solution is not limited to the particulars of this example.
[00106] Accordingly, method 600 continues with 606 where one or more possible object trajectories (e.g., possible object trajectories 522 of FIG. 5) are determined for the object (e.g., vehicle 1022, cyclist 104 or pedestrian 106 of FIG. 1) detected in 605. The possible object trajectories can include, but are not limited to, the following trajectories: a trajectory defined by the object’s actual speed (e.g., 1 mile per hour) and actual direction of travel (e.g., west); a trajectory defined by the object’s actual speed (e.g., 1 mile per hour) and another possible direction of travel (e.g., south, south-west, or X (e.g., 40°) degrees from the object’s actual direction of travel in a direction towards the AV); a trajectory defined by another possible speed for the object (e.g., 2-10 miles per hour) and the object’s actual direction of travel (e.g., west); and/or a trajectory defined by another possible speed for the object (e.g., 2-10 miles per hour) and another possible direction of travel (e.g., south or south-west or X (e.g., 40°) degrees from the object’s actual direction of travel in a direction towards the AV). The possible speed(s) and/or possible direction(s) of travel may be pre-defined for objects in the same class and/or sub-class as the object.
[00107] Next in 608, one of the possible object trajectories is selected for subsequent analysis. In some scenarios, the operations of 610-628 are performed (e.g., in an iterative or parallel manner) for each possible object trajectory generated in 606. In other scenarios, the operations of 610-628 are performed for only one of the possible object trajectories which provides a worst-case collision scenario for the AV. This worst-case possible object trajectory is selected based on information indicating the AV’s actual speed and direction of travel (e.g., generated by a speed sensor 238 of FIG. 2 and/or location sensor 260 of FIG. 2). A worst-case collision scenario may include, but is not limited to, a collision scenario which is to occur sooner than all other collision scenarios provided by the possible object trajectories and/or is expected to result in serious injury or death (e.g., a high speed, side-impact collision or a high speed, head-on collision). In yet other scenarios, the operations 610-628 are performed for two or more of the possible object trajectories which provide the top Z (e.g., 2 or 5) worst-case collision scenarios for the AV. Z is an integer selected in accordance with a particular application. The present solution is not limited to the particulars of these scenarios.
[00108] In next 610, a determination is made as to whether the vehicle trajectory generated in 604 and the possible object trajectory selected in 608 intersect each other. If the two trajectories do not intersect each other [611 :NO], then 612 is performed where method 600 returns to 604.
[00109] In contrast, if the two trajectories do intersect each other [611:YES], then method 600 continues to 614 where a time value is determined. This time value represents a time at which a collision will occur if the vehicle trajectory is followed by the AV and the possible object trajectory is followed by the object. The time value determined in 614 is then compared to a threshold time value, as shown by 616. The threshold time value is selected in accordance with a given application (e.g., one or more seconds). If the time value is greater than the threshold time value [616:NO], then 618 is performed where method 600 returns to 604. If the time value is equal to or less than the threshold time value [616:YES], then method 600 continues with 620-622. 620-622 involve: dynamically generating one or more emergency maneuver profiles based on the vehicle trajectory and the possible object trajectory; and determining whether the collision can be avoided if the vehicle trajectory is followed by the AV and any one of the emergency maneuvers is performed in a pre-defined time period (e.g., N milliseconds). Upon completing 622, method 600 continues with 624 of FIG. 6B.
[00110] Referring now to FIG. 6B, if the collision cannot be avoided in the pre-defined time period [624:NO], then 626 is performed where the AV is caused to immediately take a first maneuver. The first maneuver can include, but is not limited to, one of the dynamically generated emergency maneuvers discussed above in relation to 620. Techniques for causing an AV to take maneuvers are well known in the art. Any known or to be known technique for causing an AV to take maneuvers can be used here. Subsequently, 630 is performed where method 600 ends or other processing is performed.
[00111] In contrast, if the collision can be avoided in the pre-defined time period [624: YES], then 628 is performed where the AV is optionally caused to perform a second maneuver (e.g., mildly slow down). Subsequently, 630 is performed where method 600 ends or other processing is performed.
[00112] CLF Object Detection
[00113] The following discussion is directed to a novel solution for detecting objects. This novel solution may be performed in block 504 of FIG. 5 and/or block 605 of FIG. 6. The novel solution is referred to herein as a CLF based solution.
[00114] The purpose of CLF object detection is to detect objects in a LiDAR point cloud with added context from image detections. The AV may operate in a cluttered environment in which objects can move and interact with the AV and/or each other. In a pure LiDAR environment, this task is extremely difficult in situations when objects are in close proximity to each other and interact with each other. The CLF object detection takes full advantage of monocular camera image detections where detections are fused with the LiDAR point cloud. LiDAR data points are projected into the monocular camera frame in order to transfer pixel information to the LiDAR data points, as described above. The transferred information can include, but is not limited to, color, object type and object instance.
[00115] There are several challenges when transferring labels from 2D image detections to a 3D LiDAR point cloud. In this regard, it should be noted that image pixels are not acquired at exactly the same time as LiDAR data points in the sweep wedge corresponding to the camera’s FOV. The camera’s exposure time window is usually much smaller than the time it takes for a LiDAR spinning assembly to sweep over the camera’s horizontal FOV. This temporal alignment issue is most noticeable for moving objects having large angular velocity relative to the LiDAR system. It should also be noted that the LiDAR system is mounted at a different location than the monocular cameras. Due to a parallax issue there are regions of space perceived by the LiDAR system but not perceived by the camera, and vice versa. This makes label transfer ambiguous in cases where more than one LiDAR point projects into the same region of the image. There are also issues with the accuracy and limited resolution of image detection masks, sensor calibration errors, and the relative movement of the AV and actors.
[00116] The CLF based solution detects objects (e.g., objects 1022, 114 and/or 116 of FIG. 1) in a LiDAR point cloud with added context from image detections. The AV (e.g., AV 102i of FIG. 1) must operate in a cluttered environment, where objects can move and interact. In a pure LiDAR environment, this task is extremely difficult in situations when objects are in close proximity to each other and also when objects interact. Typical segment detection approaches based on Euclidean point clustering struggle to detect separate objects that are in close proximity to each other. For example, pedestrians staying close to the vehicle, loading it or entering it are likely to be represented as a single pedestrian+vehicle segment - the points are close, but the objects are separate. Conversely, a large object (such as a bus) can often be detected as multiple objects if it is partially occluded by other objects in front of it. Another challenge is that the large windowed area on the sides of the bus allows the light from the laser scanner to pass freely through the window and get returns from the objects inside the bus. This produces multiple fragments that actually belong to the same large object. The points are far apart, but they belong to the same object.
[00117] In the CLF based solution, monocular camera detections are fused with a LiDAR point cloud. Points of the LiDAR point cloud are projected into a monocular camera frame in order to transfer pixel information to each point in the LiDAR point cloud. The pixel information includes, but is not limited to, a color, an object type and an object instance. Notably, the cameras (e.g., cameras 262 of FIG. 2) of the AV may have overlapping Fields Of View (“FOV”). The LiDAR system’s vertical FOV does not perfectly overlap with the vertical FOVs of the cameras. Therefore, some LiDAR points may be visible from multiple cameras and other points may not be visible from any of the cameras. To aid with temporal alignment, the cameras are configured to fire as the LiDAR system sweeps over the center of the camera’s FOV. This time alignment error (e.g., the difference in time between LiDAR point capture and image capture) is used to compute a projection uncertainty, which is then used for LiDAR to Image Detection matching.
[00118] The camera image information is used as an additional cue to aid with LiDAR point segmentation. The distance function used to cluster LiDAR points is augmented to include a color and an image detection instance compatibility. This makes LiDAR points projecting into different object detections in the image to appear as if they are further apart to the segmentation algorithm. Similarly, LiDAR points that project into the same image detection mask appear closer. This approach provides profound improvement compared to the segmentation that relies on Euclidean distance between points alone in cases with different objects in close proximity to each other.
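By way of a non-limiting illustration, the following Python sketch shows one possible augmented point-to-point distance of this kind; the field names, weights, and the use of a per-point dictionary of detection probabilities are assumptions chosen for illustration and are not part of the present solution.

```python
import numpy as np

def augmented_distance(p_a, p_b, w_color=0.5, w_instance=2.0):
    """Distance between two LiDAR points used for clustering, assuming each
    point carries xyz coordinates, an RGB color in [0, 1], and a distribution
    over image detection instances (dict: instance id -> probability)."""
    geometric = np.linalg.norm(p_a["xyz"] - p_b["xyz"])
    color = np.linalg.norm(p_a["rgb"] - p_b["rgb"])
    # Probability that the two points fall in the same image detection.
    shared = set(p_a["detections"]) & set(p_b["detections"])
    p_same = sum(p_a["detections"][k] * p_b["detections"][k] for k in shared)
    # Points likely in the same detection look closer; points in different
    # detections look further apart to the segmentation algorithm.
    instance_penalty = 1.0 - p_same
    return geometric + w_color * color + w_instance * instance_penalty
```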
[00119] Segmentation: Any segmentation algorithm can be used by the present solution as long as it supports a customized distance function. In some scenarios, the segmentation algorithm used in the CLF based solution is LVS. For LVS, the present solution may include color distance and/or image detection instance compatibility in the distance function. The two major error modes of any segmentation algorithm are under-segmentation (multiple objects represented with a single segment) and over-segmentation (single object represented as multiple segments). In the CLF based solution, an optimization is performed for a minimal number of under-segmentation events at the cost of a high number of over-segmentation events. Over-segmentation events are then handled by a separate Segment Merger component.
[00120] Segment Merger: Any machine-learned classification technique can be employed by the present solution to learn which segments should be merged. The machine-learned classification technique includes, but is not limited to, an artificial neural network, a random forest, a decision tree, and/or a support vector machine. The machine-learned classification technique is trained to determine which segments should be merged with each other. The same image detection information that was used in segmentation is now aggregated over the constituent points of the segment in order to compute segment-level features. In addition, ground height and lane information features from the HD map are also used to aid segment merging.
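A minimal sketch of one such segment-pair classifier is shown below; the particular features, field names, and choice of a random forest are illustrative assumptions (any of the listed classifier types could be substituted).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def segment_pair_features(seg_a, seg_b):
    """Aggregate point-level attributes into pair features; the feature set
    and dictionary keys here are illustrative placeholders."""
    return np.array([
        np.linalg.norm(seg_a["centroid"] - seg_b["centroid"]),   # gap between segments
        abs(seg_a["mean_height"] - seg_b["mean_height"]),        # ground-height difference
        np.dot(seg_a["class_hist"], seg_b["class_hist"]),        # image-detection class agreement
        float(seg_a["lane_id"] == seg_b["lane_id"]),             # HD-map lane compatibility
    ])

# Train on labeled segment pairs (label 1 = segments belong to the same object).
merger = RandomForestClassifier(n_estimators=100)
# merger.fit(np.stack([segment_pair_features(a, b) for a, b in pairs]), labels)
# merge = merger.predict_proba(segment_pair_features(a, b)[None, :])[:, 1] > 0.5
```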
[00121] Segment Filter: Not all detected segments are relevant to the AV, and many of them correspond to clutter off the road (buildings, poles, garbage cans, etc.). This is where image detection information is used again to find relevant objects off the road. Because only actors that can move are of interest for tracking, static objects can be discarded to improve the rest of the tracking pipeline latency and reduce its compute requirements. It is important to distinguish relevant objects (e.g., moving objects, or objects that can move and possibly intersect the AV path if they start moving) from static objects (e.g., objects that are unlikely to move). Highly relevant objects may be assigned the highest priority in order to allocate limited onboard computation resources accordingly. Every image detection mask corresponds to a collection of LiDAR points inside a frustum in 3D space. The challenge here is that there are usually multiple objects at different depths projecting into the same image detection mask. An example is a vehicle detection with a pole in front of it and also a pedestrian behind it. LiDAR points that belong to the true pedestrian object and pole object will have points labeled as vehicle due to projection errors that occur during the sensor fusion stage. These errors arise from the difference in time between when a LiDAR point was acquired and when an image pixel was acquired, the parallax effect due to the different positions of the LiDAR system and camera (the LiDAR system may see above the object seen by the camera), AV movement, actor movement, calibration errors, and/or the accuracy and limited resolution of image detection masks. In order to resolve the ambiguity of image detection mask to segment association, projection characteristics are determined for all segments containing points that project into a particular image detection mask. Only one or a few best matches that are likely to correspond to the object detected on the image are reported. This helps eliminate clutter from the set of tracked objects and reduce tracking pipeline latency and computational requirements.
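By way of a non-limiting illustration, the following Python sketch ranks candidate segments for a single image detection mask; the data layout (per-point detection probabilities stored on each segment) and the scoring rule are assumptions for illustration only.

```python
import numpy as np

def best_segments_for_mask(segments, mask_id, top_k=1):
    """Rank candidate segments for one image detection mask by how strongly
    their points project into that mask; keep only the best match(es)."""
    scores = []
    for seg_id, seg in segments.items():
        # Average per-point probability of projecting into this particular mask.
        probs = np.array([p.get(mask_id, 0.0) for p in seg["point_detection_probs"]])
        scores.append((probs.mean(), seg_id))
    scores.sort(reverse=True)
    return [seg_id for _, seg_id in scores[:top_k]]
```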
[00122] The present CLF based solution has many advantages. For example, the present CLF based solution takes full advantage of image detections but does not rely only on image detections or machine learning. This means both separating objects in close proximity and detecting objects that have not been recognized before. This approach combines ML image detections with classical methods for point cloud segmentation.
[00123] An Over-Segmentation + Merge strategy is well known for image pixels, but may not be widely used when applied to LiDAR point clouds. In other words, many baseline LiDAR detection approaches either operate with a single cluster step, or employ deep learning methods. The proposed approach builds small clusters from low level features, but then extracts more meaningful features from the clusters to determine which clusters to merge in order to form objects.
[00124] Many learning based approaches use generic hand crafted features, or operate on the raw data (like the matching function in the Microsoft Kinect). The proposed approach incorporates several novel hand crafted features which are optimized for the objects in the environment (vehicles and vegetation).
[00125] Referring now to FIG. 7, there is provided a flow diagram of an illustrative method 700 for CLF based object detection. Method 700 begins with 702 and continues with 704 where operations are performed by a LiDAR system (e.g., LiDAR system 264 of FIG. 2) of the AV (e.g., AV 102i of FIG. 1 and/or 200 of FIG. 2) to generate a LiDAR dataset. The LiDAR dataset measures a distance (contains distance, azimuth and elevation measurements) from the AV to at least one object (e.g., vehicle 1022 of FIG. 1) at a given time t. The LiDAR dataset comprises a plurality of data points that form a point cloud when plotted on a 3D graph. Techniques for generating LiDAR datasets are well known in the art. Any known or to be known technique for generating LiDAR datasets can be used here. In some scenarios, the LiDAR system continuously spins at 10 Hz and captures data at whatever its current angle is.
[00126] In 706, a detection is made as to when a sensor of the LiDAR system is about to sweep over a center of a camera’s FOV. Operations of the camera (e.g., camera 262 of FIG. 2) are triggered when such a detection is made, as shown by 708. In 710, the camera captures an image as the LiDAR system’s sensor sweeps over the center of the camera’s FOV. The image includes content representing the location of a first object (e.g., vehicle 1022 of FIG. 1) at a given time t relative to the AV. The image is referred to herein as a camera frame or a monocular camera frame. In some scenarios, the camera is a global shutter (i.e., all pixels are captured at the same time) operating at 20 Hz. The operations of 706-710 aid with the temporal alignment of the camera’s firing with the LiDAR system sweeping. The time alignment error (i.e., the difference in time between LiDAR point capture and image capture) is therefore minimized in the camera.
[00127] In 712, an on-board computing device (e.g., on-board computing device 220 of FIG. 2) of the AV performs operations to obtain the image and the LiDAR dataset. The on-board computing device then performs operations in 714-728 to detect objects in proximity to the AV using the image and the LiDAR dataset. 714-728 involve: pruning (or reducing) a total number of points contained in the LiDAR dataset; performing LiDAR-to-Image object detection operations to compute a distribution of object detections that each point of the LiDAR dataset is likely to be in; performing local variation segmentation using the outputs of the LiDAR-to-Image object detection operations to create a plurality of segments of LiDAR data points; performing segment merging operations to merge the plurality of segments of LiDAR data points into objects; and performing segment filtering operations to detect objects in the point cloud defined by the LiDAR dataset. Notably, the LiDAR points can be further pruned one or more times during the on-board computing device’s processing of the image and LiDAR dataset as shown by 718, 722 and 726. This pruning can improve computational efficiency of the on-board computing device. Notably, the point pruning operations of 714, 718, 722 and 726 are described in detail in the following section entitled “Point Pruning” and in relation to FIG. 8. The LiDAR-to-Image object detection operations of 716 are described in detail in the following section entitled “LiDAR-to-Image Detection Matching” and in relation to FIGS. 9-14. The local variation segmentation operations of 720 are described in detail in the following section entitled “Local Variation Segmentation with Image Detection Features” and in relation to FIGS. 15-16. The segment merging operations of 724 are described in detail in the following section entitled “Segment Merger” and in relation to FIG. 17. The segment filtering operations of 728 are described in detail in the following section entitled “Object Detection Segment Filtering” and in relation to FIG. 18. Upon completing 728, 730 is performed where method 700 ends or other processing is performed.
[00128] Point Pruning
[00129] LiDAR datasets may contain a significant number of points. For instance, a LiDAR scanner (e.g., LiDAR sensor system 264 of FIG. 2) may produce a high density range image that contains more than 100,000 points every 100 ms. Processing each and every LiDAR data point can be prohibitively expensive in a real-time system. As such, limiting the number of LiDAR data points that are ultimately processed by the system for object detection purposes yields advantages including, without limitation, reduced energy consumption, reduced draws on hardware capacity, and reduced system latency. Accordingly, the present solution implements a method for pruning (or reducing) the number of LiDAR data points that are processed for purposes of detecting an object (e.g., AV 1022 of FIG. 1) that is located in proximity to an AV (e.g., AV 102i of FIG. 1).
[00130] Referring now to FIG. 8, there is provided a flow diagram of an illustrative method 800 for pruning (or reducing) the number of LiDAR data points that are processed for purposes of detecting an object (e.g., AV 1022 of FIG. 1) that is located in proximity to an AV (e.g., AV 102i of FIG. 1). Method 800 may be performed by an on-board computing device (e.g., on-board computing device 220 of FIG. 2) and/or a remote computing device (e.g., computing device 110 of FIG. 1). The operations of method 800 may be performed in the same or different order in accordance with a given application. Also, method 800 may be absent of one or more operations in accordance with a given application. In this regard, it should be understood that one or more of the below described criteria for downsampling LiDAR data points can be employed during a given application. It should also be understood that the operations of 804-814 may be performed at different points during an object detection process. For example, the downsampling operations of 804 can be performed in 714 of FIG. 7. The downsampling operations of 806-808 can be performed in 714 and/or 718. The downsampling operations of 810 can be performed in 718 of FIG. 7. The operations of 812 can be performed in 714 and/or 722 of FIG. 7. The operations of 814 can be performed in 714, 718, 722 and/or 726 of FIG. 7. The present solution is not limited to the particulars of this example.
[00131] As shown in FIG. 8, method 800 begins with 802 and continues with optional 804 where the LiDAR dataset is downsampled based on a planned trajectory of an AV. For example, downsampling is performed for LiDAR data points corresponding to a region of interest along a planned trajectory of the AV at a lower rate than the LiDAR data points corresponding to other regions that are not along the planned trajectory of the AV. Downsampling may additionally or alternatively be performed for LiDAR data points corresponding to regions that are not of interest along the planned trajectory at a higher rate than the LiDAR data points corresponding to a region of interest. A region of interest may be a region that includes LiDAR data points corresponding to at least one object that is likely to interfere with the AV when following the planned trajectory (e.g., a region that includes a vehicle, a bicycle and/or a pedestrian along the planned trajectory of the AV). Regions that are not regions of interest may include LiDAR data points that correspond to at least one object that is unlikely to interfere with the AV when following the planned trajectory. This object may include, but is not limited to, a parked vehicle on the side of a road, and a vehicle to the rear of the AV that is traveling in the opposite direction as the AV.
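By way of a non-limiting illustration, the following Python sketch downsamples a point cloud at different rates inside and outside regions of interest; the keep rates and the precomputed region-of-interest mask are assumptions for illustration only.

```python
import numpy as np

def downsample_by_trajectory(points, roi_mask, roi_keep=1.0, other_keep=0.25,
                             rng=np.random.default_rng(0)):
    """Keep points in regions of interest along the planned trajectory at a
    higher rate than points elsewhere.

    points:   (N, 3) LiDAR points.
    roi_mask: (N,) boolean array, True where the point falls in a region of
              interest along the planned AV trajectory.
    """
    keep_prob = np.where(roi_mask, roi_keep, other_keep)
    keep = rng.random(len(points)) < keep_prob
    return points[keep]
```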
[00132] In optional 806, LiDAR data points of the LiDAR dataset are projected into a camera frame (or image) in order to transfer information from the image-based object detections to the LiDAR data points. Techniques for projecting LiDAR data points into a camera frame are well known in the art. Any known or to be known technique for projecting LiDAR data points into a frame can be used here without limitation. One known projection technique implements a naive projection algorithm that is defined by mathematical equation (1) provided below. The transferred information is referred to herein as point labels. A point label refers to an indication or description associated with a LiDAR data point that includes information or data particular to that LiDAR data point. For instance, a point label may include an object class identifier (e.g., a vehicle class identifier, a pedestrian class identifier, a tree class identifier, and/or a building class identifier), a color (e.g., an RGB value), at least one unique identifier (e.g., for the object, corresponding image pixel(s), and/or LiDAR data point), and/or an object instance identifier (e.g., if there are many objects of the same class detected in an image).
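By way of a non-limiting illustration, a point label of this kind could be represented by a simple data structure such as the following Python sketch; the field names are illustrative placeholders, not part of the present solution.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class PointLabel:
    """Information transferred from an image-based detection to a LiDAR point."""
    object_class: str                      # e.g. "vehicle", "pedestrian", "building"
    color: Tuple[int, int, int]            # RGB value of the matched pixel(s)
    detection_id: Optional[int] = None     # unique identifier of the image detection
    pixel_id: Optional[int] = None         # identifier of the corresponding pixel(s)
    instance_id: Optional[int] = None      # distinguishes objects of the same class
```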
[00133] In optional 808, the system (e.g., system 100 of FIG. 1 and/or 200 of FIG. 2) may downsample a LiDAR dataset based on the associated point labels. For example, points of a LiDAR dataset are partitioned into two or more classes based on the point labels associated with the points of the LiDAR dataset. For instance, LiDAR data points may be separated into two classes, namely a first class containing LiDAR data points assigned high importance labels and a second class containing LiDAR data points assigned low importance labels. High importance labels may comprise labels that are important to track with a high accuracy. A high importance label is assigned to LiDAR data points with, for example, object class identifiers associated with a vehicle class, a pedestrian class, a bicycle class, or other moving object class. A low importance label is assigned to LiDAR data points with, for example, object class identifiers that are associated with static object classes (e.g., a building class, a foliage class, a construction barrier class, and/or a signage class). Points with low importance labels may be less important to track with a high degree of accuracy than points with high importance labels. The LiDAR dataset is then downsampled based on the importance labels of the points in the LiDAR dataset (as determined by their corresponding point labels). For example, LiDAR data points having high importance labels are not downsampled, or are alternatively downsampled with a high resolution. LiDAR data points having low importance labels are downsampled more aggressively than the LiDAR data points having high importance labels, i.e., with a lower resolution. The present solution is not limited to the particulars of this example.
[00134] In optional 810, the LiDAR dataset is downsampled in accordance with a frustum pruning algorithm. A LiDAR dataset may include points that correspond to objects (e.g., other vehicles, pedestrians, cyclists, and/or signs) located on a road or other path of travel (e.g., bike trail or path), and/or points that correspond to objects (e.g., buildings, trees and/or other foliage) located off the road or other path of travel. A frustum may be generated for one or more of the detected objects. The frustum corresponding to an image detection bounding box encompasses LiDAR data points of a point cloud that are likely to correspond to a particular object. The LiDAR data points that project within or in proximity to the image detection bounding box may be of more relevance or importance to the object detection process than the LiDAR data points that project further away from the bounding box, since the LiDAR data points located further away from the bounding box are unlikely to correspond to objects of interest (e.g., pedestrian, bike, vehicle). As such, the LiDAR data points may be further downsampled and/or pruned based on their distances from the bounding box. For example, pruning is performed for the LiDAR data points that are located more than a threshold distance away from the bounding box. If the distance is less than or equal to the threshold distance, then the point remains in the LiDAR dataset. If the distance is greater than the threshold distance, the point is removed from the LiDAR dataset. The present solution is not limited to the particulars of this example.
If, in addition to the image detection bounding box, the image object boundary is known (for example, in the form of a pixel mask), then the distance to the mask can be used instead of the distance to the bounding box. The decision as to whether to keep the point in the dataset is then based on whether the point projects into the dilated mask.
[00135] In optional 812, the LiDAR dataset is downsampled using a map that includes information associated with a trajectory of an AV (e.g., AV 102i of FIG. 1). For instance, an AV may have a planned trajectory or path of travel that it is autonomously following. The map includes various information that corresponds to the planned trajectory or path of travel. This information may include, but is not limited to, information about lane placement, surface gradient, road boundaries, and/or locations of stationary objects. The map may be stored in and/or retrieved from a datastore (e.g., memory 412 of FIG. 4) of the AV. One or more points of the LiDAR dataset may be identified for downsampling relative to the map. More specifically, downsampling is performed for LiDAR data points that are located below a minimum height threshold value on the map. For example, an assumption is made that most LiDAR points of interest to an AV correspond to objects that have heights that exceed a certain height measurement (e.g., two feet). Points are removed from the LiDAR dataset that are associated with heights less than the minimum height threshold value (e.g., two feet). An assumption may also be made that most LiDAR points of interest to an AV correspond to objects that have heights below a maximum height threshold value (e.g., 100 feet). Thus, points are removed from the LiDAR dataset that are associated with heights exceeding the maximum height threshold value. The present solution is not limited to the particulars of this example.
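By way of a non-limiting illustration, the following Python sketch applies such a height-band filter given per-point ground elevations from the map; the threshold values are illustrative assumptions only.

```python
import numpy as np

def prune_by_height(points, ground_height, min_height=0.6, max_height=30.0):
    """Remove points whose height above the map ground surface falls outside
    [min_height, max_height] meters; the thresholds are illustrative.

    points:        (N, 3) LiDAR points in map coordinates.
    ground_height: (N,) ground elevation from the HD map under each point.
    """
    height_above_ground = points[:, 2] - ground_height
    keep = (height_above_ground >= min_height) & (height_above_ground <= max_height)
    return points[keep]
```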
[00136] In optional 814, the points of the LiDAR dataset are downsampled based on process latency. An object detection pipeline may employ multiple algorithms that have different time complexity characteristics. The entire pipeline latency as a function of input data size may be a non-linear curve. Analysis of latency data from vehicle logs may provide insights on how the function looks. For example, the function may be a linear function and/or a higher order function (e.g., polynomial). By accumulating data, a pipeline latency model is created. The pipeline latency model is then utilized to estimate latency given a certain input data size, and the estimated latency may be used to manipulate downsampling resolution. Subsequently, 816 is performed where method 800 ends or other operations are performed.
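A minimal sketch of such a latency model is shown below, assuming logged input sizes and latencies are available; the polynomial degree, latency budget, and candidate resolutions are assumptions chosen for illustration.

```python
import numpy as np

def fit_latency_model(input_sizes, latencies_ms, degree=2):
    """Fit a polynomial pipeline-latency model from logged data."""
    return np.poly1d(np.polyfit(input_sizes, latencies_ms, degree))

def pick_resolution(model, n_points, budget_ms, resolutions=(1.0, 0.5, 0.25, 0.1)):
    """Choose the largest fraction of points to keep whose estimated pipeline
    latency still fits within the latency budget."""
    for keep_fraction in resolutions:
        if model(n_points * keep_fraction) <= budget_ms:
            return keep_fraction
    return resolutions[-1]
```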
[00137] LiDAR-to-Image Detection Matching
[00138] The LID matching algorithm of the present solution has multiple aspects. These aspects include: (i) synchronizing camera firing with LiDAR system sweeping; (ii) accounting for projection uncertainty with known camera calibration uncertainties; and (iii) determining which image detection of a plurality of image detections each point in a LiDAR dataset is most likely to be in. As noted above, aspect (i) is achieved by triggering image capturing when a focal point of the LiDAR sensor is aligned with a center of the camera’s FOV. The time alignment error (i.e., the difference in time between LiDAR point capture and image capture) is minimized by this synchronization. Aspect (ii) involves: determining an uncertainty in camera calibration based on eleven calibration parameters (i.e., 5 intrinsic: an xy focal length, a skew, an xy image center; 6 extrinsic: XYZ translation, 3 degrees of freedom rotation); projecting the uncertainty into a camera frame; and determining a distribution of pixels to which a LiDAR point may project (instead of a single pixel). Aspect (iii) is achieved by: considering each object detection as an independent measurement; and using the confidences to compute a distribution of detections in which a LiDAR point is likely to be. Aspects (i)-(iii) allow the LID matching algorithm to account for several sources of error and uncertainty to better match LiDAR points with camera-space objects.
[00139] Accordingly, the LID matching algorithm takes into account both projection uncertainty and the full confidence information in image detections. In conventional approaches, no projection uncertainty is considered and image detection confidences (in the whole detection and per-pixel in the mask) are binarized. Object type estimation is updated in the present solution to take the new matching into account.
[00140] There are two major issues with the conventional LID matching algorithms: mask bleed/mask erosion and mask shift. The present solution solves both of these issues by estimating p(lp_i ∈ d_j) (where lp_i represents a LiDAR point i and d_j represents an image detection j), instead of providing the image detections that a LiDAR point naively projects into. This probability estimation takes into account image detection confidences, projection uncertainty, and the interaction of multiple overlapping masks. There are several known sources of projection uncertainty such as camera intrinsics, camera-to-LiDAR extrinsics, and time alignment (due to errors in motion compensation (e.g., a bump in the road is not well tracked by pose) and due to object movement).
[00141] This change requires a change to the current single-frame object type estimation. Instead of a bitset-counting method, the present solution computes an object type distribution for each image detection that a LiDAR point may project into. The set of object type distributions are then combined using the estimated probability for each image detection. A naive method might be, for a point in multiple image detections, to average the type distribution for each image detection. The present solution is a weighted average, weighted by the likelihood of each image detection.
[00142] Referring now to FIG. 9, there is provided a flow diagram of a method 900 for performing the LID matching algorithm. Method 900 begins with 904 where an image (e.g., image 1000 of FIG. 10) is obtained by an on-board computing device (e.g., on-board computing device 220 of FIG. 2) of an AV (e.g., AV 102i of FIG. 1). Method 900 continues with image analysis operations 906-912. These image analysis operations 906-912 may be performed by a Commercial-Off-The-Shelf (“COTS”) image analyzer implementing a conventional object detection algorithm. 906-912 generally involve: identifying one or more objects (e.g., vehicle 1022 of FIG. 1, cyclist 114 of FIG. 1, pedestrian 116 of FIG. 1, and/or vehicle 1002 of FIG. 10) in the image; defining a two dimensional bounding box (e.g., bounding box 1100 of FIG. 11) encompassing each identified object; defining a mask (or grid) (e.g., mask 1200 of FIG. 12) for each two dimensional bounding box; and computing a confidence value for each cell of the mask (or grid) that the pixel(s) therein belong to a given detected object. Techniques for computing confidence values for object detection purposes are well known in the art.
[00143] In 914, the on-board computing device determines or obtains extrinsic LiDAR sensor and camera calibration parameters and intrinsic camera calibration parameters. The extrinsic LiDAR sensor and camera calibration parameters include, but are not limited to, LiDAR sensor coordinates, and/or information indicating a correspondence between LiDAR sensor coordinates and camera coordinates. The intrinsic camera calibration parameters include, but are not limited to, an x focal length, a y focal length, a skew, an image center, a focal center of the image, and/or 3D coordinates (x, y, z) of a camera position.
[00144] In 916, various information is input into a LID matching algorithm. This information includes, but is not limited to, identifiers for each object detected in the image, mask identifiers, cell identifiers for each mask, confidence values for each cell, LiDAR point identifiers, LiDAR point coordinates, extrinsic LiDAR sensor and camera calibration parameters, and intrinsic camera calibration parameters. These inputs are used in subsequent operations 918-920 to: determine (for each point of the LiDAR dataset) a probability distribution of pixels to which a LiDAR data point may project taking into account a projection uncertainty in view of camera calibration uncertainties; and determine (for each point of the LiDAR dataset) a probability distribution over a set of object detections in which a LiDAR data point is likely to be, based on the confidence values. The operations of 918 are described in detail below in relation to FIG. 13. The operations of 920 are described in detail below in relation to FIG. 14. Subsequently, 922 is performed where method 900 ends or other operations are performed (e.g., return to 902).
[00145] As shown in FIG. 13, 918 of FIG. 9 involves a plurality of sub-operations 1304-1306. In 1304, the on-board computing device computes a Probability Distribution Function (“PDF”) f(x', y') over image space coordinates for a pixel to which a LiDAR point would probably project in accordance with a naive projection algorithm (i.e., a probability distribution that is centered around a naive projection point). The naive projection algorithm is defined by the following mathematical equation (1).
$$\lambda \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = K\,[\,R \mid t\,]\begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \tag{1}$$

in which K is the intrinsic camera calibration matrix, [R | t] is the extrinsic LiDAR-to-camera transform, and λ is the depth of the point in the camera frame,
where x' and y' represent image space coordinates for a pixel, and X, Y and Z represent LiDAR space coordinates for a point of the LiDAR dataset. Basically, each point of the LiDAR dataset is projected onto the pixel of the image that resides on the same line as the point, where a line is drawn from the camera through each pixel into the scene.
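By way of a non-limiting illustration, the following Python sketch applies the naive projection of equation (1) to an array of LiDAR points; it assumes K is the 3x3 intrinsic camera matrix and (R, t) is the extrinsic LiDAR-to-camera transform, and it ignores the projection uncertainty modeling discussed next.

```python
import numpy as np

def naive_projection(points_lidar, K, R, t):
    """Project LiDAR-frame points into pixel coordinates with the pinhole model
    of equation (1)."""
    pts_cam = points_lidar @ R.T + t          # transform into the camera frame
    uv = pts_cam @ K.T                        # apply the camera intrinsics
    uv = uv[:, :2] / uv[:, 2:3]               # perspective divide by depth
    in_front = pts_cam[:, 2] > 0.0            # keep only points ahead of the camera
    return uv, in_front
```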
[00146] There are several sources of projection uncertainty, such as time alignment and object movement. If multiple sources of projection uncertainty are used, then each PDF for each LiDAR point is required to be: (i) representable in image space coordinates; (ii) convertible to image detection mask coordinates (can be translated and scaled); and (iii) composable (or combinable) with other projection uncertainty PDFs. The present solution uses a Jacobian of the PDF to propagate uncertainty from the LiDAR frame to the camera frame. This (or a similar alternative for propagating uncertainty) helps satisfy requirement (i) above.
[00147] The PDF is then converted to image detection mask coordinates, as shown by 1306.
This conversion is achieved via translation and scaling (where the scaling in x and the scaling in y are independent). The conversion is defined by the following mathematical equation (2).
$$x_m = R\,\frac{x' - x_{bbox}^{min}}{x_{bbox}^{max} - x_{bbox}^{min}}, \qquad y_m = R\,\frac{y' - y_{bbox}^{min}}{y_{bbox}^{max} - y_{bbox}^{min}} \tag{2}$$

where $x_{bbox}^{min}$ and $x_{bbox}^{max}$ (and, analogously, $y_{bbox}^{min}$ and $y_{bbox}^{max}$) represent the image space boundaries of a bounding box and R is a mask resolution.
[00148] As shown in FIG. 14, 920 of FIG. 9 involves performing various operations 1404-1408. In 1404, the on-board computing device computes a probability that a LiDAR point lp_i projects into a given image detection d_j independent of all other image detections (e.g., d_1, d_2, ..., d_10). The probability is expressed as p_ind(lp_i ∈ d_j). At this point, a PDF f(x_m, y_m) exists for a likely LiDAR point projection over image detection mask coordinates. The image detection confidence c_d and the per-pixel confidences c_{x_m, y_m} are considered in this computation. These confidences are in [0, 1] but are not probabilities. A mapping is applied to compute a probability p(d) from c_d and a probability p(mp ∈ d) from c_{x_m, y_m}, where mp represents a mask pixel. The mapping can include, but is not limited to, a logistic function. The per-pixel confidences in the image detection mask are for the whole mask pixel (no infinitesimal coordinates). So, the on-board computing device computes the probability that a LiDAR point projects into a specific image detection mask pixel in accordance with mathematical equation (3).

$$p(lp \in mp) = \int_{d_x}^{d_x + 1} \int_{d_y}^{d_y + 1} f(x_m, y_m)\,dy_m\,dx_m \tag{3}$$

where lp is a LiDAR point, mp is a mask pixel associated with a given object detection d, [d_x, d_x + 1] represents the x limits of the pixel in mask coordinates, [d_y, d_y + 1] represents the y limits of the pixel in mask coordinates, d_y represents the y-axis coordinate for the mask pixel associated with the given object detection d, and d_x represents the x-axis coordinate for the mask pixel associated with the given object detection d. This probability p(lp ∈ mp) is then used by the on-board computing device to compute the probability that the LiDAR point is in the image detection independent of all other image detections. This computation is defined by the following mathematical equation (4).

$$p_{ind}(lp \in d) = p(d) \sum_{x_m=1}^{R} \sum_{y_m=1}^{R} p(lp \in mp_{x_m, y_m})\; p(mp_{x_m, y_m} \in d) \tag{4}$$

where the mask resolution is R by R. For each point, this probability is computed for each detection the LiDAR point may project into. In some scenarios, the probabilities may sum up to greater than one. An assumption is made that a LiDAR point can only project into a single image detection. Thus, each independent probability is treated as an independent measurement, i.e., p_ind(lp ∈ d_i) is independent of p_ind(lp ∈ d_j).
[00149] The on-board computing device further computes the probability that the LiDAR point does not project into any image detection, as shown by 1406. This computation is defined by mathematical equation (5).
$$p(lp \notin \text{any } d) = \prod_{j}\bigl(1 - p_{ind}(lp \in d_j)\bigr) \tag{5}$$
[00150] Finally in 1408, the on-board computing device computes a dependent probability by normalizing over all computed probabilities. This computation is defined by the following mathematical equation (6).
$$p(lp \in d_j) = \frac{p_{ind}(lp \in d_j)}{p(lp \notin \text{any } d) + \sum_{k} p_{ind}(lp \in d_k)} \tag{6}$$

where p_ind(lp ∈ d_j) represents a probability that a point of the LiDAR dataset projects into an image detection independent of all other image detections and p(lp ∉ any d) represents a probability that the LiDAR point does not project into any image detection. The result of this computation represents the probability that a LiDAR point projects into a particular detection. For each LiDAR point, the LID matching algorithm outputs this probability for every detection that the LiDAR point may project into. That is, for each point, a sparse probability distribution over image detections is output from the LID matching algorithm. The sparse probability distribution represents the probability distribution over a set of object detections in which a LiDAR data point is likely to be.
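By way of a non-limiting illustration, the following Python sketch normalizes a set of already-computed independent probabilities into such a sparse distribution (including the "no detection" case); the dictionary-based data layout and example values are assumptions for illustration only.

```python
import numpy as np

def lid_match_distribution(independent_probs):
    """Given p_ind(lp in d_j) for every detection a LiDAR point may project
    into (dict: detection id -> probability), return the normalized
    distribution over detections plus the 'no detection' probability."""
    p_none = np.prod([1.0 - p for p in independent_probs.values()])
    norm = p_none + sum(independent_probs.values())
    dist = {d: p / norm for d, p in independent_probs.items()}
    dist[None] = p_none / norm               # None = projects into no detection
    return dist

# Example: a point that may project into detections 3 and 7.
print(lid_match_distribution({3: 0.6, 7: 0.3}))
```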
[00151] Local Variation Segmentation with Image Detection Features
[00152] In some conventional object detection algorithms using a 3D LiDAR point cloud, small point cloud clusters are created that are very similar, and therefore are almost certainly in the same object. The point cloud clusters (fewer in number and having more context than single points) are merged into an object. Common methods for point cloud clustering are Connected Components and Density-Based Spatial Clustering of Applications with Noise (“DBSCAN”). Both of these methods only take into account local information (and not the larger context of the scene). A better method is Point Cloud Local Variation Segmentation (“PCLVS”), which combines local information with wider context. An illustrative PCLVS algorithm is discussed in a document entitled “Graph Based Over-Segmentation Methods for 3D Point Clouds”, written by Yizhak Ben-Shabat et al. This document discusses using multiple features of a point (location, color based on an image, and direction of an estimated surface at the point). These features alone are not necessarily enough to keep two close objects from being merged together.
[00153] The present solution provides an improved LVS based algorithm that eliminates or minimizes the merging of close objects. This improvement is at least partially achieved through the use of additional features including (i) an image detection compatibility feature and (ii) a modified distance feature. Feature (i) is the difference between which image detections each point is in. Each point has a per-camera distribution of image detections that it is in (and the likelihood that it is not in any image detection). The information from all cameras is combined probabilistically into a single number that indicates whether the points are likely in the same image detection or not. Feature (ii) is an expanded or contracted height component of a geometric distance between points. Feature (ii) is provided to address the issues that point clouds do not have a uniform density of points and that there are fewer lasers pointed at the upper and lower ends of an object. Features (i) and (ii) are combined in the LVS based algorithm with common features such as color similarity. Features (i) and (ii) provide a superior object detection capability, by being more likely to combine clusters that are in the same object and less likely to combine clusters that are not in the same object.
[00154] The conventional PCLVS algorithm handles segmentation in a wide variety of relatively easy and moderate scenarios for extracting objects from a point cloud, but does not currently perform as desired in challenging scenarios. This approach does not leverage other aspects of the information available from the LiDAR data, such as (i) the negative information provided by LiDAR returns passing through regions of the environment without interacting and (ii) the underlying structure of how the data is captured. This information can be used to improve performance of segmentation in ambiguous or challenging scenarios. Furthermore, the PCLVS approach largely attempts to produce segments which correspond 1:1 to objects in the world, without rigorously utilizing information outside the LiDAR returns to do so. This leads to an increase in segmentation errors, particularly under-segmentation errors. Under-segmentation errors are particularly difficult to solve after segmentation, due to the fact that splitting an under-segmented object requires implementing a second segmentation algorithm. Biasing towards over-segmentation provides two crucial benefits: an improvement in the ability to extract the boundaries which most critically impact motion planning for an AV, and allowing post-processing to reason about merging segments together, which is a fundamentally different algorithm. The present solution proposes a new LVS based segmentation approach which solves these problems by providing a framework for integrating additional information from the LiDAR sensors, defining the problem to ensure that the output is structured in a fashion which is more amenable to downstream processing, and improving performance by reducing under-segmentation and improving boundary recall.
[00155] Referring now to FIG. 15, there is provided an illustration that is useful for understanding the novel LVS algorithm 1500 of the present solution. As shown in FIG. 15, LiDAR data points 1502 are input into the LVS algorithm 1500. The LiDAR data points 1502 are passed to a graph constructor 1504 where a connectivity graph is constructed by plotting the LiDAR data points on a 3D graph and connecting LiDAR data points. The LiDAR data point connections may be made based on whether two points are within a threshold spatial distance from each other, and/or whether two points are within a threshold temporal distance from each other. In other scenarios, each LiDAR data point is connected to its K-nearest neighbors. In yet other scenarios, a Delaunay triangulation is constructed and used as the connectivity graph. The connected LiDAR data points represent a proposed set of LiDAR data points that should be merged to form a segment 1512. An illustrative graph 1600 is provided in FIG. 16. As shown in FIG. 16, the graph 1600 has a plurality of nodes 1602 representing LiDAR data points or measurements. Connection lines 1604 have been added between the nodes 1602. The connection lines 1604 are also referred to herein as graph edges eij.
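A sketch of the K-nearest-neighbor variant of the connectivity graph construction is shown below. It uses a brute-force distance computation purely for clarity; the neighbor count, the distance threshold, and the function names are illustrative assumptions and not parameters taken from the described system.

```python
import numpy as np

def knn_connectivity_edges(points, k=5, max_dist=1.5):
    """Connect each LiDAR point to its k nearest neighbours, rejecting
    connections whose spatial distance exceeds max_dist.

    points: (N, 3) array of x, y, z coordinates.
    Returns a set of undirected edges as (i, j) index pairs with i < j.
    """
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # never connect a point to itself
    edges = set()
    for i in range(len(points)):
        for j in np.argsort(dists[i])[:k]:
            if dists[i, j] <= max_dist:
                a, b = sorted((i, int(j)))
                edges.add((a, b))
    return edges


rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 5.0, size=(100, 3))
print(len(knn_connectivity_edges(pts, k=5, max_dist=1.5)))
```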
[00156] Next, a descriptor determiner 1506 determines a descriptor for each node 1602 (or LiDAR data point). The descriptor is a vector V of elements that characterize the node (or LiDAR data point). The elements include, but are not limited to, a surface normal Ni, a per-point color value (RiGiBi) based on an image (e.g., image 1000 of FIG. 10), an intensity Ii, a texture Ti, spatial coordinates (xi, yi, zi), a height above ground Hi, a probability distribution cIi over object classes for the image detections that the point (or node) projects into, a set of instance identifiers {idi}, an image based feature fi, and/or a Fast Point Feature Histogram FPFHi. Each of the listed elements RiGiBi, Ii, Ti, (xi, yi, zi) and FPFHi is well known in the art. Accordingly, the vector V may be defined by the following mathematical equation (7).
V = (Ni, RiGiBi, Ii, Ti, (xi, yi, zi), Hi, cIi, idi, fi, FPFHi, ...) (7)
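A per-node descriptor of the kind shown in equation (7) might be assembled as in the following sketch. The field names mirror the elements listed above, but the concrete types (for example, a three-class probability distribution) are illustrative assumptions only.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

@dataclass
class NodeDescriptor:
    """Vector of elements characterizing one node (LiDAR point) in the graph."""
    surface_normal: Tuple[float, float, float]
    color_rgb: Tuple[int, int, int]        # per-point color taken from the image
    intensity: float
    texture: float
    xyz: Tuple[float, float, float]
    height_above_ground: float
    class_distribution: Dict[str, float]   # probability over object classes
    instance_ids: Set[int] = field(default_factory=set)


d = NodeDescriptor(
    surface_normal=(0.0, 0.0, 1.0), color_rgb=(128, 64, 32), intensity=0.7,
    texture=0.2, xyz=(12.3, -4.1, 0.4), height_above_ground=0.4,
    class_distribution={"vehicle": 0.8, "pedestrian": 0.1, "other": 0.1},
    instance_ids={42},
)
print(d.class_distribution["vehicle"])
```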
[00157] An edge weight assignor 1508 assigns weights to each graph edge eij. The graph edge comprises an edge feature, a modified distance MDij. The modified distance MDij is an expanded or contracted height component of a geometric distance between nodes (or LiDAR data points). The modified distance MDij may be defined by the following mathematical equation (8).
[Equation (8) image not reproduced]
where H is the point height above ground, and a and k are constants of a logistic function that compresses the Z-axis distances when points are close to the ground.
[00158] The weights each represent a dissimilarity measure between two adjacent nodes 1602. A weight is computed for each type of element contained in the vector V. More specifically, a weight wn is computed for surface normal, which may be defined by the following mathematical equation (9).
[Equation (9) image not reproduced]
A weight wc is computed for color, which may be defined by the following mathematical equation (10).
[Equation (10) image not reproduced]
A weight wi is computed for intensity, which may be defined by the following mathematical equation (11).
[Equation (11) image not reproduced]
where Ii and Ij are LiDAR point intensities, and Imax is the maximum possible intensity value. A weight wd is computed for 3D graph coordinates, which may be defined by the following mathematical equation (12).
[Equation (12) image not reproduced]
where dmin represents a minimum distance within the graph, and dmax represents a maximum distance within the graph.
A weight wcI is computed for cIi, which may be defined by mathematical equation (13). The value of the weight wcI may be 1 if the object classes are different, or -1 if the object classes are the same. A graph node may be composed of multiple LiDAR points. As noted above, cIi is the probability distribution over object classes for the constituent points. The Bhattacharyya distance can be used to compute the similarity between two probability distributions.
A weight wFPFH is computed for the Fast Point Feature Histogram, which may be defined by the following mathematical equation (14).
[Equation (14) image not reproduced]
A weight wIDC is computed for the image detection capability, which may be defined by the following mathematical equation (15).
[Equation (15) image not reproduced]
where c is the compatibility between points, C is the set of all cameras, Dc is the set of image detections in C, and d is the clamping function.
A weight wMD is computed for the modified distance, which may be the same as MDij above. [00159] The above weights may be combined into one non-negative scalar w(eij) by, for example, linear combination. The information from all cameras is combined probabilistically into a single number that indicates whether the points are likely in the same image detection or not. The non-negative scalar w(eij) may be defined by the following mathematical equation (16).
w(eij) = kn wn + kc wc + ki wi + kT wT + kd wd + kH wH + kcI wcI + kid wid + kFPFH wFPFH + kIDC wIDC + kMD wMD (16)
where kn, kc, ki, kT, kd, kH, kcI, kid, kFPFH, kIDC and kMD are predefined constants. The scalar w(eij) is then assigned by the edge weight assignor 1508 as the edge weight for a given graph edge eij. The edge weights w(eij) are then passed to a LiDAR point merger 1510.
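A linear combination of the per-feature dissimilarities, as described for equation (16), might look like the following sketch. The intensity term follows the description given with equation (11) (absolute difference normalized by the maximum intensity); all other weight values, coefficient values, and names are illustrative assumptions.

```python
def intensity_weight(i_a, i_b, i_max=255.0):
    """Dissimilarity of two LiDAR point intensities, normalized by the
    maximum possible intensity value (as described for equation (11))."""
    return abs(i_a - i_b) / i_max


def edge_weight(dissimilarities, coefficients):
    """Combine per-feature dissimilarities into one non-negative scalar edge
    weight by linear combination, in the spirit of equation (16)."""
    return sum(coefficients[name] * value
               for name, value in dissimilarities.items())


w = edge_weight(
    {"normal": 0.1, "intensity": intensity_weight(120, 200), "distance": 0.4},
    {"normal": 2.0, "intensity": 0.5, "distance": 1.0},
)
print(round(w, 4))  # 0.2 + 0.1569 + 0.4 = 0.7569
```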
[00160] The LiDAR point merger 1510 uses the edge weights w(eij) to decide which LiDAR data points should be merged together to form segments 1512. The LiDAR points are merged based on these decisions. The output of the LiDAR point merger 1510 is a plurality of segments 1512. The segments 1512 are used in subsequent segment merging operations.
[00161] The iterative segment merging operations performed by the LiDAR point merger 1510 involve building segments by iteratively merging smaller segments until a stopping condition is reached. Specifically, all nodes 1602 are initially considered individual segments, and all graph edges 1604 are sorted in ascending order by edge weight w(eij). The graph edges 1604 are considered in order, treating each graph edge as a merge proposal if the graph edge connects two different segments. A merge proposal is accepted if the weight between the two segments is less than the largest internal variation of the two segments, plus a term which biases segmentation to merge small segments. More formally, given segments Ci, Cj connected by an edge eij with a weight w(eij), the merge criterion may be expressed by the following mathematical equation (17).
[Equation (17) image not reproduced]
where MST(Cx) defines a minimum spanning tree of Cx, and δ represents a parameter controlling a degree of segmentation which occurs. This threshold can be applied on a per-element basis or on a weighted sum of the weights defined for the graph edge. The final output 1512 is a segmentation of all observations into distinct clusters. Each of the segments 1512 comprises one or more LiDAR points.
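The iterative merging described above can be sketched with a union-find structure, as below. The merge test shown is the standard graph-based criterion (edge weight compared against each segment's internal variation plus δ divided by the segment size); because equation (17) is not reproduced in this text, the test should be read as an assumption about its shape rather than a transcription of it.

```python
class DisjointSet:
    """Union-find over graph nodes; each root also tracks its segment's size
    and internal variation (the largest edge weight merged into it so far)."""
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.internal = [0.0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, root_a, root_b, weight):
        self.parent[root_b] = root_a
        self.size[root_a] += self.size[root_b]
        self.internal[root_a] = weight  # edges arrive in ascending order


def segment_points(num_points, weighted_edges, delta=1.0):
    """Greedy graph segmentation: visit edges in ascending weight order and
    merge the two segments an edge connects when its weight does not exceed
    either segment's internal variation plus a small-segment bias delta/|C|."""
    ds = DisjointSet(num_points)
    for weight, i, j in sorted(weighted_edges):
        ri, rj = ds.find(i), ds.find(j)
        if ri == rj:
            continue
        if weight <= min(ds.internal[ri] + delta / ds.size[ri],
                         ds.internal[rj] + delta / ds.size[rj]):
            ds.union(ri, rj, weight)
    return [ds.find(i) for i in range(num_points)]


labels = segment_points(4, [(0.2, 0, 1), (0.3, 1, 2), (5.0, 2, 3)], delta=1.0)
print(labels)  # points 0, 1, 2 share one segment label; point 3 stays separate
```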
[00162] In some scenarios, a metric generator 1514 is provided to collect, compute and/or generate segmentation metrics from the segmentation operation and output. In defining these, an assumption is made that a labeled ground truth is available for point clouds labelling all the objects which are of interest to the system. Note that since this segmentation approach is meant to detect all objects which should be avoided, labels should exist for obstacles which are not of interest to the detection system, but should be avoided (e.g., foreign-object debris, road signs, and/or garbage cans). The proposed metrics include an under-segmentation error metric, a boundary recall metric, and an instance precision and recall metric.
[00163] The under-segmentation error metric measures how much the segmentation results include segments which cross boundaries between distinct objects in the scene. Since an under-segmentation event involves two ground truth segments, this error metric must be computed such that it does not double count the event. The under-segmentation error metric can be computed by finding each segment which intersects more than one ground-truth object, and dividing the segment between the ground-truth objects. The under-segmentation error metric is then defined as the sum of the smaller of the two sub-segments for all these under-segmentations, averaged over the number of points across all segments. More formally, the under-segmentation error metric UE is defined by the following mathematical equation (18).
[Equation (18) image not reproduced]
where GT represents a set of ground truth labels, and O represents a set of computed labels.
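A sketch of the under-segmentation error computation, following the textual description above, is shown below. Equation (18) itself is not reproduced, so the exact normalization is an assumption: for each computed segment that overlaps more than one ground-truth object, only the points outside its largest ground-truth overlap are counted, and the total is averaged over the number of points.

```python
from collections import Counter

def under_segmentation_error(gt_labels, seg_labels):
    """For every computed segment that straddles more than one ground-truth
    object, count the points outside its largest ground-truth overlap, then
    average over the total number of points."""
    overlaps = {}
    for gt, seg in zip(gt_labels, seg_labels):
        overlaps.setdefault(seg, Counter())[gt] += 1
    penalty = 0
    for counts in overlaps.values():
        if len(counts) > 1:
            penalty += sum(counts.values()) - max(counts.values())
    return penalty / len(gt_labels)


gt_labels  = ["car", "car", "car", "ped", "ped"]
seg_labels = [1, 1, 1, 1, 2]   # segment 1 bleeds from the car into the pedestrian
print(under_segmentation_error(gt_labels, seg_labels))  # 0.2
```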
[00164] The boundary recall metric measures a degree to which a boundary of each object is recovered by segmentation. Over-segmentation produces boundaries which are internal to the ground truth segmentation, but these are intrinsic to the performance improvements of the present approach. Thus, this metric aims to measure how many of the LiDAR data points which represent boundaries of objects are extracted by a given segmentation. This can be computed by projecting the 3D point cloud data into a depth image, and painting each pixel with an associated segment label. Boundaries can thus be computed by finding the edges in the image. The same process can be performed with the output segmentation, with edges then being labeled as true positives (edges present in both images) or false negatives (edges present in the ground truth data, but not in the output segmentation). The boundary recall metric BR may be defined by the following mathematical equation (19).
BR = TP / (TP + FN) (19)
[00165] The performance of extracting objects of interest can be computed as precision and recall metrics over object instances. For each object in the ground truth, a determination can be made as to whether a segment is majority-associated with a ground truth label, in the same fashion as is performed for the under-segmentation error. With this information, precision and recall can be computed in a standard fashion.
[00166] Segment Merger
[00167] Notably, the segments 1512 output from the LVS algorithm 1500 are too small for estimating cuboids. As such, a segment merger is employed to construct segments large enough for subsequent shape prior (e.g., cuboid) estimation. The segment merger performs segment merging operations that generally involve: selecting pairs of segments; identifying which pairs of segments have a centroid-to-centroid distance greater than a threshold value (e.g., 3 m); computing features for each segment pair (whose centroid-to-centroid distance is less than the threshold value (e.g., < 3 m)) based on the attributes of the segments contained in the pair; generating (for each segment pair) a probability that the segments should be merged based on the computed features; and merging segments based on the probabilities.
[00168] Referring now to FIG. 17, an illustration is provided of an illustrative segment merger 1700. The segments 1512 are input into the segment merger 1700. The segments 1512 may optionally be pre-processed in 1706. Pre-processing operations are well known in the art. The pre-processing can involve selecting pairs of segments, obtaining centroids for the segments, determining centroid-to-centroid distances for each pair of segments, identifying which pairs of segments have a centroid-to-centroid distance greater than a threshold value (e.g., 3 m), and removing the identified pairs of segments from further consideration for segment merging purposes. In some scenarios, the threshold value is defined as a sum of a first segment’s radius from its centroid and a second segment’s radius from its centroid, plus a pre-defined constant (e.g., 0.5 m).
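The pre-processing step that discards distant segment pairs might be sketched as follows, assuming each segment is summarized by its centroid and a radius about that centroid. The 0.5 m constant mirrors the example given above, while the function and variable names are illustrative.

```python
import numpy as np

def candidate_pairs(centroids, radii, extra_margin=0.5):
    """Keep only segment pairs whose centroid-to-centroid distance is within
    the sum of the two segment radii plus a pre-defined constant.

    centroids: (N, 3) array of segment centroids.
    radii: (N,) array of segment radii about their centroids.
    """
    pairs = []
    n = len(centroids)
    for i in range(n):
        for j in range(i + 1, n):
            threshold = radii[i] + radii[j] + extra_margin
            if np.linalg.norm(centroids[i] - centroids[j]) <= threshold:
                pairs.append((i, j))
    return pairs


cents = np.array([[0.0, 0.0, 0.0], [1.0, 0.5, 0.0], [30.0, 0.0, 0.0]])
rads = np.array([0.8, 0.6, 2.0])
print(candidate_pairs(cents, rads))  # [(0, 1)]; the third segment is too far away
```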
[00169] The remaining segments are passed to an attribute generator 1708 and a graph constructor 1710. At the attribute generator 1708, a set of attributes for each segment may be obtained and/or generated. A set of attributes can include, but is not limited to: (i) a 2D region that the LiDAR data points in the segment cover; (ii) an average of the probability distributions that were computed in 920 of FIG. 9 for the LiDAR data points contained in the segment; (iii) a percentage of LiDAR data points contained in the segment that are on a road; (iv) a percentage of LiDAR data points contained in the segment that are off a road; and/or (v) a total number of lanes that a segment at least partially overlaps. Attributes (i), (iii), (iv) and (v) may be determined using a road map, a lane map and/or other map. For example, attribute (i) is determined by identifying a region on the map where the segment resides. Attributes (iii) and (iv) are determined by identifying which LiDAR data points in a segment reside on a road contained in the map, and which LiDAR data points in the segment do not reside on a road contained in the map. Attribute (v) is determined by identifying which lanes in a map the LiDAR data points of the segment cover, and counting the number of identified lanes.
[00170] At the graph constructor 1710, a graph is constructed in which the segments are plotted. Links are added to the graph for pairs of nearby segments (taking into account the size of each segment). These links define pairs of segments for which features should be generated by the feature generator 1712.
[00171] In some scenarios, each set of features describes a pairing of two segments. The features may be generated using the attributes generated by the attribute generator 1708. The features can include, but are not limited to, the following (a sketch of a few of these pairwise features is provided after the list): • difference between an average of the probability distributions that was computed in 920 of FIG. 9 for a first segment and an average of the probability distributions that was computed in 920 of FIG. 9 for a second segment;
• difference in on-road proportions, or difference in a percentage of LiDAR data points contained in a first segment that are on a road and a percentage of LiDAR data points contained in a second segment that are on a road;
• difference in off-road proportions (e.g., difference in a percentage of LiDAR data points contained in a first segment that are off a road and a percentage of LiDAR data points contained in a second segment that are off a road);
• region compatibility (e.g., a degree of overlap between the 2D regions which are covered by the first and second segments);
• lane compatibility (e.g., a degree of overlap between the lanes in which first and second segments are in) (e.g., If the sets of lanes overlap, then compatible. If neither segment is in any lanes, no information. If the sets of lanes do not overlap, then non-compatible);
• difference between the total number of lanes that a first segment at least partially overlaps and the total number of lanes that a second segment at least partially overlaps;
• a nearest distance between convex hulls (or points);
• a Hausdorff distance between convex hulls (or points);
• whether the convex hulls intersect;
• a difference or distance in height between segments (e.g., If a height interval for each segment intersects, this distance is zero. Otherwise, this is the distance between a minimum height of the higher segment and a maximum height of the lower segment.);
• mask compatibility, which is defined by a mathematical equation (image not reproduced) in which C represents a set of cameras, D represents a set of image detections, W1cd represents the summed probabilities of points in segment 1 matching with image detection d in camera c, and np1c represents a total number of points in segment 1 that projected into camera c; this results in a number between 0 and 1 for any segment, and weights the contribution of each camera by the number of points that project into it;
• dominant mask compatibility (If segments have a same dominant mask in any camera, then compatible. If segments have different dominant masks in any camera, then incompatible. If there is a tie, then compatibility is determined as (segment 1 points that project to segment 1 dominant mask)*(segment 2 points that project to segment 2 dominant mask));
• difference in object type distributions (e.g., earth-mover’s distance);
• average range of two segments (context feature);
• smaller most likely object size;
• shared most likely object type (e.g., If the most likely object type of each segment is the same, then that object type.);
• object type compatibility (e.g., If there is any intersection in the types that any constituent point projects to, then compatible.);
• dominant object type compatibility (e.g., If there is a dominant object type (most points project to a mask that is of a type) and this dominant type is the same for both segments, then compatible); • difference between area-related features (e.g., difference between a ratio of [sum of areas of segment convex hulls] to [area of merged segment convex hulls] for two segments); and
• difference of color histogram.
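As referenced above, a few of the simpler pairwise features can be sketched directly from per-segment attributes. The attribute keys, and the choice of which features to show, are illustrative assumptions; this is not the full feature set used by the feature generator 1712.

```python
def pair_features(seg_a, seg_b):
    """Compute a few of the pairwise merge features listed above from two
    segment attribute dictionaries (illustrative keys: 'on_road_fraction',
    'lane_ids', 'min_height', 'max_height')."""
    # Difference in on-road proportions.
    on_road_diff = abs(seg_a["on_road_fraction"] - seg_b["on_road_fraction"])

    # Lane compatibility: 1 if the lane sets overlap, 0 if neither segment is
    # in any lane (no information), -1 if the lane sets do not overlap.
    lanes_a, lanes_b = set(seg_a["lane_ids"]), set(seg_b["lane_ids"])
    if lanes_a & lanes_b:
        lane_compat = 1
    elif not lanes_a and not lanes_b:
        lane_compat = 0
    else:
        lane_compat = -1

    # Height gap: zero if the height intervals intersect, otherwise the gap
    # between the higher segment's minimum and the lower segment's maximum.
    lo, hi = sorted([seg_a, seg_b], key=lambda s: s["min_height"])
    height_gap = max(0.0, hi["min_height"] - lo["max_height"])

    return {"on_road_diff": on_road_diff,
            "lane_compat": lane_compat,
            "height_gap": height_gap}


a = {"on_road_fraction": 0.9, "lane_ids": [3], "min_height": 0.0, "max_height": 1.6}
b = {"on_road_fraction": 0.7, "lane_ids": [3, 4], "min_height": 1.9, "max_height": 3.1}
print(pair_features(a, b))  # {'on_road_diff': 0.2, 'lane_compat': 1, 'height_gap': 0.3}
```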
[00172] The features are then passed from the feature generator 1712 to the machine learned classifier 1714. The machine learned classifier 1714 analyzes each set of features to determine a probability that the corresponding segments should be merged. For example, a low probability for merging two segments is determined when (1) a difference between probability distribution averages exceeds a threshold value and (2) lane incompatibility exists. In contrast, a high probability exists when (1) a difference between probability distribution averages is less than a threshold value and (2) lane compatibility exists. The present solution is not limited in this regard. The probabilities could be assigned a numerical value (e.g., 0-10) in addition to or as an alternative to a level (e.g., low, medium, or high). The level or degree of probability can be determined by any combination of features selected in accordance with a given application.
[00173] The machine learned classifier 1714 is trained using a machine learning algorithm that learns when two segments should be merged together in view of one or more features. Any machine learning algorithm can be used herein without limitation. For example, one or more of the following machine learning algorithms is employed here: supervised learning; unsupervised learning; semi-supervised learning; and reinforcement learning. The information learned by the machine learning algorithm can be used to generate rules for determining a probability that two segments should be merged. These rules are then implemented by the machine learned classifier 1714.
[00174] The merge probabilities are then analyzed by the machine learned classifier 1714 to classify the pairs of segments as merge pairs or non-merge pairs. For example, a pair of segments is classified as a merge pair when the respective merge probability has a level of high or has a numerical value greater than a threshold value. In contrast, a pair of segments is classified as a non-merge pair when the respective merge probability has a level of low or has a numerical value less than a threshold value. The present solution is not limited to the particulars of this example.
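The final classification step can be sketched as a simple threshold over the classifier's merge probabilities. The 0.5 threshold and the tuple-keyed dictionary format are illustrative assumptions, not values taken from the described system.

```python
def classify_pairs(merge_probabilities, threshold=0.5):
    """Split candidate segment pairs into merge pairs and non-merge pairs
    based on the probability produced by the machine learned classifier."""
    merge_pairs, non_merge_pairs = [], []
    for pair, prob in merge_probabilities.items():
        (merge_pairs if prob >= threshold else non_merge_pairs).append(pair)
    return merge_pairs, non_merge_pairs


probs = {("seg_a", "seg_b"): 0.91, ("seg_b", "seg_c"): 0.12}
print(classify_pairs(probs))
# ([('seg_a', 'seg_b')], [('seg_b', 'seg_c')])
```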
[00175] The classifications are then passed to the merger 1716. The merger 1716 merges the segments in accordance with the classifications. For example, segments in each merge pair are merged together. Notably, redundant links are not evaluated for segment merging purposes. For example, if segment A should merge with segment B and segment B should merge with segment C, then the segment merger 1716 does not evaluate merging segment A with segment C. The present solution is not limited to the particulars of this example.
[00176] It should be noted that one issue that is difficult to deal with from a pairwise segment merging perspective is fragments of larger vehicles. For example, a large box truck may be observed as multiple fragments that are far apart. Due to projection uncertainty, these fragments often do not project into the same image detection mask as the back of the truck, and there is not enough context for the merger to combine these fragments based on the merge probabilities. Therefore, the merger 1716 performs additional operations to fit each large segment detected as a vehicle (e.g., the back of the truck) to a shape model (e.g., a cuboid) in order to estimate the true extent of the detected object. The bounding box estimator may use ground height and lane information from an onboard HD map and a visual heading from image detection. With the estimated cuboid, there is now enough information to merge fragments based on their overlap area with the estimated cuboid. Another example where cuboids help is segmentation of buses. A large window area allows laser light to pass through and scan the interior portions of the bus, resulting in multiple fragments that are far away from the L-shape of the bus exterior. Upon completing the merge operations, the merger 1716 outputs a plurality of merged segments 1714.
[00177] Object Detection Segment Filtering
[00178] Not all detected segments of LiDAR data points are relevant to the AV, and many of the segments correspond to the clutter off the road (e.g., buildings, poles, and/or garbage cans). The image detections are used to find relevant objects off the road. Because only off-road moving actors are of interest, static objects can be discarded to improve the rest of the CLF object detection pipeline and reduce the CLF object detection algorithm’s computational requirements.
[00179] Because of the label transfer issues, there are multiple segments of LiDAR data points inside the frustum corresponding to the image detection mask. There may be 0 or 1 true positive segments in this collection, and 0 to N false positive segments in this collection. Solving this association problem is a primary purpose of the CLF segment filter.
[00180] An example is a vehicle detection with a pole in front and also a pedestrian behind. LiDAR data points that belong to the true pedestrian object and the pole object will be labeled as vehicle points due to projection errors that occur during the sensor fusion stage. In order to resolve the ambiguity of image detection mask to segment association, projection characteristics are computed for all segments containing LiDAR data points that project into a particular image detection mask. One or more best matches are reported that are likely to correspond to the object detected on the image. This helps eliminate clutter from the set of tracked objects, and reduces tracking pipeline latency and computational requirements.
[00181] Referring now to FIG. 18, there is provided a flow diagram of an illustrative method 1800 for object detection segment filtering. Input into a segment filter is a collection of candidate segments formed at earlier stages of the pipeline, where every candidate may or may not correspond to a real world object. For every segment, there are two sets of points that form the segment: P points, which project into the image detection mask, and N points, which do not project into the image detection mask but are in close proximity to the segment. The intuition behind adding nearby points, followed by geometric segmentation, is that the projected points of a false cluster (such as a wall or a tree) will have many N points within close distance to P points, which results in a single cluster containing both point categories. The resulting false cluster will contain a relatively small number of P points compared to the total number of points in the cluster. A true cluster, however, will mostly consist of P points with a relatively small number of N points. Thus, a cluster feature U is needed to discriminate true segments of LiDAR data points from false segments of LiDAR data points. The cluster feature U is defined by the following mathematical equation (20). U = count(P) / (count(N) + count(P)) (20) where P represents a number of projected points in a cluster, and N represents a number of points in close distance to the points P. Sometimes, when using the cluster feature U alone, cases may be encountered in which it is not sufficient to identify true segments. For example, a larger true object (e.g., a vehicle) may be occluded by a smaller false object (e.g., a pole). In this case, the smaller false object may consist entirely of P points, while the vehicle cluster will have some mix of P points and N points. To help in such cases, another cluster feature V is needed, and used in conjunction with cluster feature U to verify that the segment is correctly associated with a given object detection. The cluster feature V is defined by the following mathematical equation (21).
V = count(P) / count(D) (21) where D represents a total number of points that project into a particular image detection mask m (e.g., mask 1200 of FIG. 12). The D points are usually distributed across multiple objects in the world.
[00182] There are other cluster features that can be used to identify segments of LiDAR data points that are associated with a pedestrian, a vehicle, a bicyclist, and/or any other moving object. These additional cluster features include a cluster feature H representing a cluster height, a cluster feature L representing a cluster length, and a cluster feature LTW representing a length-to-width ratio for a cluster.
[00183] Clusters with a height above 2.0 - 2.5 meters are unlikely to be associated with pedestrians. Clusters over 1 meter in length are unlikely to be associated with pedestrians. Clusters with a length-to-width ratio above 4.0 often tend to be associated with buildings and are unlikely to be associated with pedestrians. Clusters with a high cylinder convolution score are likely to be associated with pedestrians.
[00184] Accordingly, method 1800 begins with 1804 where various information for a given image detection mask m (e.g., mask 1200 of FIG. 12) is obtained (e.g., from memory 412 of FIG. 4). This information includes, but is not limited to, Pm representing a number of points of a LiDAR data set that project into the mask m, Si representing a number of points forming a given merged segment s of LiDAR data points (e.g., merged segment 1714 of FIG. 17), sm representing a number of points in the given merged segment s projecting into the mask m, a height hs of the merged segment, a length ls of the merged segment, and/or a width ws of the merged segment. This information is used in 1806-1810 to determine one or more cluster features U, V, H, L, and/or LTW. Cluster feature U may be determined in accordance with the following mathematical equation (22), and cluster feature V may be determined in accordance with the following mathematical equation (23).
U = sm / Si (22)
V = sm / Pm (23)
Cluster feature H is set equal to hs. Cluster feature L is set equal to ls. Cluster feature LTW may be determined by the following mathematical equation (24).
LTW = ls / ws (24)
[00185] In 1812, a projection score PS is computed based on the cluster features U, V, H, L, and/or LTW. The projection score may be defined by the following mathematical equation (25).
[Equation (25) image not reproduced]
Notably, the present solution is not limited to mathematical equation (25). The projection score can represent the product of any combination of cluster features.
[00186] Next in 1814, the projection score is used to verify that the merged segment is part of the detected object associated with a given image detection mask. Such verification can be made when the projection score is greater than a threshold value. An object detection may be made in 1816 when such a verification is made. In some scenarios, the object detection is made based on the results of operations 1804-1814 for two or more merged segments that are associated with the same image detection mask. For example, an object detection is made that a given merged segment of a plurality of merged segments is associated with a given detected object when the PS computed for the given merged segment is greater than the PSs computed for the other merged segments of the plurality of merged segments. Subsequently, 1818 is performed where method 1800 ends or other processing is performed.
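The segment filtering computation in 1804-1814 can be sketched as below. The forms used for U, V, and LTW follow equations (20)-(24); the projection score shown here is the product of U and V purely as one example combination, since equation (25) is not reproduced and the text notes that any combination of cluster features may be used. The threshold value is likewise an illustrative assumption.

```python
def cluster_features(n_segment_points, n_segment_in_mask, n_mask_points,
                     height, length, width):
    """Cluster features for one merged segment against one image detection
    mask: U is the fraction of the segment's points that project into the
    mask, V is the fraction of the mask's points contributed by the segment,
    and LTW is the segment's length-to-width ratio."""
    return {
        "U": n_segment_in_mask / n_segment_points,
        "V": n_segment_in_mask / n_mask_points,
        "H": height,
        "L": length,
        "LTW": length / width,
    }


def projection_score(features):
    """One example combination: the product of U and V (the text notes that
    the score can be a product of any combination of cluster features)."""
    return features["U"] * features["V"]


f = cluster_features(n_segment_points=120, n_segment_in_mask=100,
                     n_mask_points=140, height=1.7, length=0.8, width=0.5)
score = projection_score(f)
print(round(score, 3))   # 0.595
print(score > 0.3)       # verification against an illustrative threshold value
```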
[00187] Although the present solution has been illustrated and described with respect to one or more implementations, equivalent alterations and modifications will occur to others skilled in the art upon the reading and understanding of this specification and the annexed drawings. In addition, while a particular feature of the present solution may have been disclosed with respect to only one of several implementations, such feature may be combined with one or more other features of the other implementations as may be desired and advantageous for any given or particular application. Thus, the breadth and scope of the present solution should not be limited by any of the above described embodiments. Rather, the scope of the present solution should be defined in accordance with the following claims and their equivalents.

CLAIMS

What is claimed is:
1. A method for controlling an autonomous vehicle, comprising: obtaining, by a computing device, a LiDAR dataset generated by a LiDAR system of the autonomous vehicle; using, by the computing device, the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle, the object being detected by matching points of the LiDAR dataset to pixels in the at least one image, and detecting the object in a point cloud defined by the LiDAR dataset based on the matching; using, by the computing device, the object detection to facilitate at least one autonomous driving operation.
2. The method according to claim 1, further comprising obtaining, by the computing device, at least one image that was captured at a time when a sensor of the LiDAR system swept over a center of a camera field of view, wherein the at least one image is used in addition to the LiDAR dataset to detect the object.
3. The method according to claim 1, wherein the at least one autonomous driving operation comprises an object tracking operation, an object trajectory prediction operation, a vehicle trajectory determination operation, or a collision avoidance operation.
4. The method according to claim 1, wherein the matching is based on at least one of identifiers for each object detected in the at least one image, a mask identifier, cell identifiers for a mask, confidence values for each cell, LiDAR point identifiers, LiDAR point coordinates, extrinsic LiDAR sensor and camera calibration parameters, and intrinsic camera calibration parameters.
5. The method according to claim 1, wherein the matching comprises determining a probability distribution of pixels of the at least one image to which a point of the LiDAR dataset may project taking into account a projection uncertainty in view of camera calibration uncertainties.
6. The method according to claim 5, wherein the probability distribution is determined by computing a probability distribution function over image space coordinates for a pixel to which a point of the LiDAR dataset would probably project.
7. The method according to claim 6, wherein the probability distribution function is computed in accordance with the following mathematical equation
[equation image not reproduced]
where x' and y' represent image space coordinates for a pixel, and X, Y and Z represent LiDAR space coordinates for a point of the LiDAR dataset.
8. The method according to claim 6, wherein the probability distribution function is converted to image detection mask coordinates in accordance with the following mathematical equation
[equation image not reproduced]
where Xbbox- and Xbbox+ represent image space boundaries of a bounding box, and R represents a mask resolution.
9. The method according to claim 1, wherein the matching comprises determining a probability distribution over a set of object detections in which a point of the LiDAR dataset is likely to be, based on at least one confidence value indicating a level of confidence that at least one respective pixel of the at least one image belongs to a given detected object.
10. The method according to claim 9, wherein the probability distribution is determined by computing a probability that a point of the LiDAR dataset projects into an image detection independent of all other image detections.
11. The method according to claim 10, wherein the probability is computed in accordance with the following mathematical equation
[equation image not reproduced]
where Ip is a LiDAR point, mp is a mask pixel, [a first symbol, not reproduced] represents the x limits of a pixel in mask coordinates, [a second symbol, not reproduced] represents the y limits of the pixel in mask coordinates, dmp represents a mask pixel associated with a given object detection d, dy represents a y-axis coordinate for a mask pixel associated with the given object detection d, and dx represents an x-axis coordinate for the mask pixel associated with the given object detection d.
12. The method according to claim 11, wherein the probability is computed in accordance with the following mathematical equation
[equation image not reproduced]
13. The method according to claim 10, wherein the matching comprises determining a probability that the LiDAR point does not project into any image detection.
14. The method according to claim 13, wherein the matching comprises normalizing a plurality of probabilities determined for a given point of the LiDAR dataset in accordance with the following mathematical equation
[equation image not reproduced]
where [a first term, not reproduced] represents a probability that a point of the LiDAR dataset projects into an image detection independent of all other image detections, and [a second term, not reproduced] represents a probability that the LiDAR point does not project into any image detection.
15. A system, comprising: a processor; a non-transitory computer-readable storage medium comprising programming instructions that are configured to cause the processor to implement a method for operating an autonomous vehicle, wherein the programming instructions comprise instructions to: obtain, by a computing device, a LiDAR dataset generated by a LiDAR system of the autonomous vehicle; use, by a computing device, the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle, the object being detected by matching points of the LiDAR dataset to pixels in the at least one image, and detecting the object in a point cloud defined by the LiDAR dataset based on the matching; and use, by the computing device, the object detection to facilitate at least one autonomous driving operation.
16. The system according to claim 15, wherein the programming instructions further comprise instructions to obtain at least one image that was captured at a time when a sensor of the LiDAR system swept over a center of a camera field of view, wherein the at least one image is used in addition to the LiDAR dataset to detect the object.
17. The system according to claim 15, wherein the at least one autonomous driving operation comprises an object tracking operation, an object trajectory prediction operation, a vehicle trajectory determination operation, or a collision avoidance operation.
18. The system according to claim 15, wherein the matching is based on at least one of identifiers for each object detected in the at least one image, a mask identifier, cell identifiers for a mask, confidence values for each cell, LiDAR point identifiers, LiDAR point coordinates, extrinsic LiDAR sensor and camera calibration parameters, and intrinsic camera calibration parameters.
19. The system according to claim 15, wherein the matching comprises determining a probability distribution of pixels of the at least one image to which a point of the LiDAR dataset may project taking into account a projection uncertainty in view of camera calibration uncertainties.
20. The system according to claim 19, wherein the probability distribution is determined by computing a probability distribution function over image space coordinates for a pixel to which a point of the LiDAR dataset would probably project.
21. The system according to claim 20, wherein the probability distribution function is computed in accordance with the following mathematical equation
[equation image not reproduced]
where x' and y' represent image space coordinates for a pixel, and X, Y and Z represent LiDAR space coordinates for a point of the LiDAR dataset.
22. The system according to claim 20, wherein the probability distribution function is converted to image detection mask coordinates in accordance with the following mathematical equation
[equation image not reproduced]
where Xbbox- and Xbbox+ represent image space boundaries of a bounding box, and R represents a mask resolution.
23. The system according to claim 15, wherein the matching comprises determining a probability distribution over a set of object detections in which a point of the LiDAR dataset is likely to be, based on at least one confidence value indicating a level of confidence that at least one respective pixel of the at least one image belongs to a given detected object.
24. The system according to claim 23, wherein the probability distribution is determined by computing a probability that a point of the LiDAR dataset projects into an image detection independent of all other image detections.
25. The system according to claim 24, wherein the probability is computed in accordance with the following mathematical equation
[equation image not reproduced]
where Ip is a LiDAR point, mp is a mask pixel, [a first symbol, not reproduced] represents the x limits of a pixel in mask coordinates, [a second symbol, not reproduced] represents the y limits of the pixel in mask coordinates, dmp represents a mask pixel associated with a given object detection d, dy represents a y-axis coordinate for a mask pixel associated with the given object detection d, and dx represents an x-axis coordinate for the mask pixel associated with the given object detection d.
26. The system according to claim 25, wherein the probability is computed in accordance with the following mathematical equation
[equation image not reproduced]
27. The system according to claim 24, wherein the matching comprises determining a probability that the LiDAR point does not project into any image detection.
28. The system according to claim 27, wherein the matching comprises normalizing a plurality of probabilities determined for a given point of the LiDAR dataset in accordance with the following mathematical equation
[equation image not reproduced]
where [a first term, not reproduced] represents a probability that a point of the LiDAR dataset projects into an image detection independent of all other image detections, and [a second term, not reproduced] represents a probability that the LiDAR point does not project into any image detection.
29. A non-transitory computer-readable medium having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform operations comprising: obtaining a LiDAR dataset generated by a LiDAR system of an autonomous vehicle; using the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle, the object being detected by matching points of the LiDAR dataset to pixels in the at least one image, and detecting the object in a point cloud defined by the LiDAR dataset based on the matching; and using the object detection to facilitate at least one autonomous driving operation.
30. A method for operating an autonomous vehicle, comprising: obtaining, by a computing device, a LiDAR dataset generated by a LiDAR system of the autonomous vehicle; using, by a computing device, the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle, the object being detected by generating a pruned LiDAR dataset by reducing a total number of points contained in the LiDAR dataset, and detecting the object in a point cloud defined by the pruned LiDAR dataset; using, by the computing device, the object detection to facilitate at least one autonomous driving operation.
31. The method according to claim 30, further comprising obtaining, by the computing device, at least one image that was captured at a time when a sensor of the LiDAR system swept over a center of a camera Field Of View (“FOV”), wherein the at least one image is used in addition to the LiDAR dataset to detect the object.
32. The method according to claim 30, wherein the at least one autonomous driving operation comprises an object tracking operation, an object trajectory prediction operation, a vehicle trajectory determination operation, or a collision avoidance operation.
33. The method according to claim 30, wherein the pruned LiDAR dataset is generated by downsampling the points based on a planned trajectory of the autonomous vehicle.
34. The method according to claim 33, wherein points of the LiDAR dataset corresponding to a first region along the planned trajectory of the autonomous vehicle are downsampled at a lower sampling rate than points of the LiDAR dataset corresponding to a second region that is not along the planned trajectory of the autonomous vehicle.
35. The method according to claim 34, wherein points of the LiDAR dataset corresponding to a first region along the planned trajectory of the autonomous vehicle are downsampled at a higher sampling rate than points of the LiDAR dataset corresponding to a second region that is also along the planned trajectory of the autonomous vehicle.
36. The method according to claim 35, wherein the first region comprises a region including points corresponding to at least one object that is unlikely to interfere with the autonomous vehicle when following the planned trajectory, and the second region comprises a region including points corresponding to at least one object that is likely to interfere with the autonomous vehicle when following the planned trajectory.
37. The method according to claim 30, wherein the pruned LiDAR dataset is generated by downsampling the LiDAR dataset based on point labels assigned to the points.
38. The method according to claim 37, wherein each of said point labels comprises at least one of an object class identifier, a color, and a unique identifier.
39. The method according to claim 37, wherein the LiDAR dataset is downsampled by assigning a first importance label to points associated with a moving object class and a second importance label to points associated with a static object class.
40. The method according to claim 39, wherein the points assigned the first importance label are downsampled at a first resolution and the points assigned the second importance label are downsampled at a second resolution lower than the first resolution.
41. The method according to claim 39, wherein only the points assigned the second importance label are downsampled.
42. The method according to claim 30, wherein the pruned LiDAR dataset is generated by downsampling the LiDAR dataset based on point distances from a bounding box.
43. The method according to claim 42, wherein a point is removed from the LiDAR dataset when a respective one of the point distances is greater than a threshold distance.
44. The method according to claim 30, wherein the pruned LiDAR dataset is generated by downsampling the LiDAR dataset using a map that includes information associated with a planned trajectory of the autonomous vehicle.
45. The method according to claim 44, wherein a point is removed from the LiDAR dataset when the point has a height less than a minimum height threshold value or greater than a maximum height threshold value.
46. The method according to claim 30, wherein the pruned LiDAR dataset is generated by downsampling the LiDAR dataset at a resolution selected based on a modeled process latency.
47. A system, comprising: a processor; a non-transitory computer-readable storage medium comprising programming instructions that are configured to cause the processor to implement a method for operating an autonomous vehicle, wherein the programming instructions comprise instructions to: obtain a LiDAR dataset generated by a LiDAR system of the autonomous vehicle; use the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle, the object being detected by generating a pruned LiDAR dataset by reducing a total number of points contained in the LiDAR dataset, and detecting the object in a point cloud defined by the pruned LiDAR dataset; use the object detection to facilitate at least one autonomous driving operation.
48. The system according to claim 47, wherein the programming instructions comprise instructions to obtain at least one image that was captured at a time when a sensor of the LiDAR system swept over a center of a camera field of view, the at least one image is used in addition to the LiDAR dataset to detect the object.
49. The system according to claim 47, wherein the at least one autonomous driving operation comprises an object tracking operation, an object trajectory prediction operation, a vehicle trajectory determination operation, or a collision avoidance operation.
50. The system according to claim 47, wherein the pruned LiDAR dataset is generated by downsampling the points based on a planned trajectory of the autonomous vehicle.
51. The system according to claim 50, wherein points of the LiDAR dataset corresponding to a first region along the planned trajectory of the autonomous vehicle are downsampled at a lower sampling rate than points of the LiDAR dataset corresponding to a second region that is not along the planned trajectory of the autonomous vehicle.
52. The system according to claim 51, wherein points of the LiDAR dataset corresponding to a first region along the planned trajectory of the autonomous vehicle are downsampled at a higher sampling rate than points of the LiDAR dataset corresponding to a second region that is also along the planned trajectory of the autonomous vehicle.
53. The system according to claim 52, wherein the first region comprises a region including points corresponding to at least one object that is unlikely to interfere with the autonomous vehicle when following the planned trajectory, and the second region comprises a region including points corresponding to at least one object that is likely to interfere with the autonomous vehicle when following the planned trajectory.
54. The system according to claim 47, wherein the pruned LiDAR dataset is generated by downsampling the LiDAR dataset based on point labels assigned to the points.
55. The system according to claim 54, wherein each of said point labels comprises at least one of an object class identifier, a color, and a unique identifier.
56. The system according to claim 55, wherein the LiDAR dataset is downsampled by assigning a first importance label to points associated with a moving object class and a second importance label to points associated with a static object class.
57. The system according to claim 56, wherein the points assigned the first importance label are downsampled at a first resolution and the points assigned the second importance label are downsampled at a second resolution lower than the first resolution.
58. The system according to claim 56, wherein only the points assigned the second importance label are downsampled.
59. The system according to claim 47, wherein the pruned LiDAR dataset is generated by downsampling the LiDAR dataset based on point distances from a bounding box.
60. The system according to claim 59, wherein a point is removed from the LiDAR dataset when a respective one of the point distances is greater than a threshold distance.
61. The system according to claim 47, wherein the pruned LiDAR dataset is generated by downsampling the LiDAR dataset using a map that includes information associated with a planned trajectory of the autonomous vehicle.
62. The system according to claim 61, wherein a point is removed from the LiDAR dataset when the point has a height less than a minimum height threshold value or greater than a maximum height threshold value.
63. The system according to claim 47, wherein the pruned LiDAR dataset is generated by downsampling the LiDAR dataset at a resolution selected based on a modeled process latency.
64. A non-transitory computer-readable medium having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform operations comprising: obtaining, by a computing device, a LiDAR dataset generated by a LiDAR system of the autonomous vehicle; using, by a computing device, the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle, the object being detected by generating a pruned LiDAR dataset by reducing a total number of points contained in the LiDAR dataset, and detecting the object in a point cloud defined by the pruned LiDAR dataset, using, by the computing device, the object detection to facilitate at least one autonomous driving operation.
65. A method for controlling an autonomous vehicle, comprising: obtaining, by a computing device, a LiDAR dataset generated by a LiDAR system of the autonomous vehicle; using, by the computing device, the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle, the object being detected by computing a distribution of object detections that each point of the LiDAR dataset is likely to be in, creating a plurality of segments of LiDAR data points using the distribution of object detections, and detecting the object in a point cloud defined by the LiDAR dataset based on the plurality of segments of LiDAR data points; and using, by the computing device, the object detection to facilitate at least one autonomous driving operation.
66. The method according to claim 65, further comprising obtaining, by the computing device, at least one image that was captured at a time when a sensor of the LiDAR system swept over a center of a camera field of view, wherein the at least one image is used in addition to the LiDAR dataset to detect the object.
67. The method according to claim 65, wherein the at least one autonomous driving operation comprises an object tracking operation, an object trajectory prediction operation, a vehicle trajectory determination operation, or a collision avoidance operation.
68. The method according to claim 65, wherein the distribution of object detections is computed based on (a) a probability distribution of pixels of the at least one image to which a point of the LiDAR dataset may project, and (b) a probability that the point does not project into any image detection.
69. The method according to claim 65, wherein the plurality of segments of LiDAR data points are created by using the LiDAR dataset to construct a connectivity graph, the connectivity graph comprising points of the LiDAR dataset plotted in a 3D coordinate system and connection lines respectively connecting the points.
70. The method according to claim 69, wherein the connection lines are added to the connectivity graph based on whether two points of the LiDAR dataset are within a threshold spatial or temporal distance from each other, whether two points are nearest neighbors, or triangulation.
71. The method according to claim 69, wherein the plurality of segments of LiDAR data points are created by further determining, for each point in the connectivity graph, a descriptor comprising a vector of elements that characterize a given point of the LiDAR data set.
72. The method according to claim 71, wherein the elements comprise at least one of a surface normal, a color value based on the at least one image, an intensity, a texture, spatial coordinates, a height above ground, a class label, an instance identifier, an image based feature, a Fast Point Feature Histogram, an image detection capability, and a modified distance.
73. The method according to claim 71 , wherein the plurality of segments of LiDAR data points are created by further assigning a weight to each connection line based on the descriptor, the weight representing a dissimilarity measure between two points connected to each other in the connectivity graph via the connection line.
74. The method according to claim 71 , wherein the plurality of segments of LiDAR data points are created by further merging points of the LiDAR dataset based on the weights.
75. The method according to claim 74, wherein two points are merged together when a weight associated with a respective connection line is less than a threshold value.
76. A system, comprising: a processor; a non-transitory computer-readable storage medium comprising programming instructions that are configured to cause the processor to implement a method for operating an autonomous vehicle, wherein the programming instructions comprise instructions to: obtain a LiDAR dataset generated by a LiDAR system of the autonomous vehicle; use the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle, the object being detected by computing a distribution of object detections that each point of the LiDAR dataset is likely to be in, creating a plurality of segments of LiDAR data points using the distribution of object detections, and detecting the object in a point cloud defined by the LiDAR dataset based on the plurality of segments of LiDAR data points; and use the object detection to facilitate at least one autonomous driving operation.
77. The system according to claim 76, wherein the programming instructions further comprise instructions to obtain at least one image that was captured at a time when a sensor of the LiDAR system swept over a center of a camera Field Of View (“FOV”), wherein the at least one image is used in addition to the LiDAR dataset to detect the object.
78. The system according to claim 76, wherein the at least one autonomous driving operation comprises an object tracking operation, an object trajectory prediction operation, a vehicle trajectory determination operation, or a collision avoidance operation.
79. The system according to claim 76, wherein the distribution of object detections is computed based on (a) a probability distribution of pixels of the at least one image to which a point of the LiDAR dataset may project, and (b) a probability that the point does not project into any image detection.
80. The system according to claim 76, wherein the plurality of segments of LiDAR data points are created by using the LiDAR dataset to construct a connectivity graph, the connectivity graph comprising points of the LiDAR dataset plotted in a 3D coordinate system and connection lines respectively connecting the points.
81. The system according to claim 80, wherein the connection lines are added to the connectivity graph based on whether two points of the LiDAR dataset are within a threshold spatial or temporal distance from each other, whether two points are nearest neighbors, or triangulation.
82. The system according to claim 80, wherein the plurality of segments of LiDAR data points are created by further determining, for each point in the connectivity graph, a descriptor comprising a vector of elements that characterize a given point of the LiDAR data set.
83. The system according to claim 82, wherein the elements comprise at least one of a surface normal, a color value based on the at least one image, an intensity, a texture, spatial coordinates, a height above ground, a class label, an instance identifier, an image based feature, a Fast Point Feature Histogram, an image detection capability, and a modified distance.
84. The system according to claim 83, wherein the plurality of segments of LiDAR data points are created by further assigning a weight to each connection line based on the descriptor, the weight representing a dissimilarity measure between two points connected to each other in the connectivity graph via the connection line.
85. The system according to claim 83, wherein the plurality of segments of LiDAR data points are created by further merging points of the LiDAR dataset based on the weights.
86. The system according to claim 85, wherein two points are merged together when a weight associated with a respective connection line is less than a threshold value.
87. A non-transitory computer-readable medium having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform operations comprising: obtaining, by a computing device, a LiDAR dataset generated by a LiDAR system of an autonomous vehicle; using, by the computing device, the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle, the object being detected by computing a distribution of object detections that each point of the LiDAR dataset is likely to be in, creating a plurality of segments of LiDAR data points using the distribution of object detections, and detecting the object in a point cloud defined by the LiDAR dataset based on the plurality of segments of LiDAR data points; and using, by the computing device, the object detection to facilitate at least one autonomous driving operation.
88. A method for controlling an autonomous vehicle, comprising: obtaining, by a computing device, a LiDAR dataset generated by a LiDAR system of the autonomous vehicle; using, by the computing device, the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle, the object being detected by computing a distribution of object detections that each point of the LiDAR dataset is likely to be in, creating a plurality of segments of LiDAR data points using the distribution of object detections, merging the plurality of segments of LiDAR data points to generate merged segments, and detecting the object in a point cloud defined by the LiDAR dataset based on the merged segments; and using, by the computing device, the object detection to facilitate at least one autonomous driving operation.
89. The method according to claim 88, further comprising obtaining, by the computing device, at least one image that was captured at a time when a sensor of the LiDAR system swept over a center of a camera field of view, wherein the at least one image is used in addition to the LiDAR dataset to detect the object.
90. The method according to claim 88, wherein the at least one autonomous driving operation comprises an object tracking operation, an object trajectory prediction operation, a vehicle trajectory determination operation, or a collision avoidance operation.
91. The method according to claim 88, wherein the distribution of object detections is computed based on (a) a probability distribution of pixels of the at least one image to which a point of the LiDAR dataset may project, and (b) a probability that the point does not project into any image detection.
92. The method according to claim 88, wherein the merged segments are generated by: selecting pairs of segments from the plurality of segments of LiDAR data points; computing features for each pair of segments based on attributes of the segments contained in the pair; generating, for each pair of segments, a probability that the segments contained in the pair should be merged based on the features; and merging the plurality of segments of LiDAR data points based on the probabilities generated for the pairs of segments.
93. The method according to claim 92, further comprising filtering the pairs of segments to remove pairs of segments which have centroid-to-centroid distances greater than a threshold value.
94. The method according to claim 92, wherein the attributes comprise an average of a plurality of probability distributions that were computed for the LiDAR data points contained in a given segment of the plurality of segments of LiDAR data points, each probability distribution specifying detected objects in which a given LiDAR data point is likely to be.
95. The method according to claim 94, wherein the features comprise difference between the average of the probability distributions that were computed for the LiDAR data points contained in a first segment of the plurality of segments of LiDAR data points and the average of the probability distributions that were computed for the LiDAR data points contained in a second segment of the plurality of segments of LiDAR data points.
96. The method according to claim 92, wherein the attributes comprise at least one of a 2D region that the LiDAR data points in a given segment cover, a percentage of LiDAR data points contained in the given segment that are on a road, a percentage of LiDAR data points contained in the given segment that are off a road, and a total number of lanes that the given segment at least partially overlaps.
97. The method according to claim 92, wherein the features comprise at least one of difference in on-road proportions, difference in off-road proportions, region compatibility, lane compatibility, a difference between a total number of lanes that a first segment of LiDAR data points at least partially overlaps and a total number of lanes that a second segment of LiDAR data points at least partially overlaps, a difference or distance in height between segments of LiDAR data points, a mask compatibility, a difference in object type distributions, and an object type compatibility.
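The pair-wise merge procedure of claims 92 through 97 could be prototyped roughly as follows. The dictionary keys, the logistic model, and the specific feature choices are assumptions made for illustration; the claims state which attributes the features are derived from, not a particular model.

import numpy as np

def candidate_pairs(segments, max_centroid_dist=2.0):
    """Keep only segment pairs whose centroid-to-centroid distance is within a threshold (claim 93)."""
    keep = []
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if np.linalg.norm(segments[i]["centroid"] - segments[j]["centroid"]) <= max_centroid_dist:
                keep.append((i, j))
    return keep

def pair_features(seg_a, seg_b):
    """Features computed from segment attributes (claims 94-97); each segment is a dict with
    'mean_detection_dist' (average per-point detection distribution), 'on_road' and 'off_road'
    proportions, and a 'num_lanes' count."""
    return np.array([
        np.abs(seg_a["mean_detection_dist"] - seg_b["mean_detection_dist"]).sum(),
        abs(seg_a["on_road"] - seg_b["on_road"]),
        abs(seg_a["off_road"] - seg_b["off_road"]),
        abs(seg_a["num_lanes"] - seg_b["num_lanes"]),
    ])

def merge_probability(features, weights, bias=0.0):
    """Map pair features to a merge probability; a logistic model stands in here for whatever
    learned or hand-tuned model the specification may describe."""
    return 1.0 / (1.0 + np.exp(-(np.dot(weights, features) + bias)))

Pairs whose merge probability exceeds a chosen cutoff would then be merged (claim 92), for example with the same union-find structure sketched after claim 75.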
98. A system, comprising: a processor; a non-transitory computer-readable storage medium comprising programming instructions that are configured to cause the processor to implement a method for operating an autonomous vehicle, wherein the programming instructions comprise instructions to: obtain a LiDAR dataset generated by a LiDAR system of the autonomous vehicle; use the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle, the object being detected by computing a distribution of object detections that each point of the LiDAR dataset is likely to be in, creating a plurality of segments of LiDAR data points using the distribution of object detections, merging the plurality of segments of LiDAR data points to generate merged segments, and detecting the object in a point cloud defined by the LiDAR dataset based on the merged segments; and use the object detection to facilitate at least one autonomous driving operation.
99. The system according to claim 98, wherein the programming instructions further comprise instructions to obtain at least one image that was captured at a time when a sensor of the LiDAR system swept over a center of a camera field of view, wherein the at least one image is used in addition to the LiDAR dataset to detect the object.
100. The system according to claim 98, wherein the at least one autonomous driving operation comprises an object tracking operation, an object trajectory prediction operation, a vehicle trajectory determination operation, or a collision avoidance operation.
101. The system according to claim 98, wherein the distribution of object detections is computed based on (a) a probability distribution of pixels of the at least one image to which a point of the LiDAR dataset may project, and (b) a probability that the point does not project into any image detection.
102. The system according to claim 98, wherein the merged segments are generated by: selecting pairs of segments from the plurality of segments of LiDAR data points; computing features for each pair of segments based on attributes of the segments contained in the pair; generating, for each pair of segments, a probability that the segments contained in the pair should be merged based on the features; and merging the plurality of segments of LiDAR data points based on the probabilities generated for the pairs of segments.
103. The system according to claim 102, wherein the programming instructions further comprise instructions to filter the pairs of segments to remove pairs of segments which have centroid-to-centroid distances greater than a threshold value.
104. The system according to claim 102, wherein the attributes comprise an average of a plurality of probability distributions that were computed for the LiDAR data points contained in a given segment of the plurality of segments of LiDAR data points, each probability distribution specifying detected objects in which a given LiDAR data point is likely to be.
105. The system according to claim 104, wherein the features comprise difference between the average of the probability distributions that were computed for the LiDAR data points contained in a first segment of the plurality of segments of LiDAR data points and the average of the probability distributions that were computed for the LiDAR data points contained in a second segment of the plurality of segments of LiDAR data points.
106. The system according to claim 102, wherein the attributes comprise at least one of a 2D region that the LiDAR data points in a given segment cover, a percentage of LiDAR data points contained in the given segment that are on a road, a percentage of LiDAR data points contained in the given segment that are off a road, and a total number of lanes that the given segment at least partially overlaps.
107. The system according to claim 102, wherein the features comprise at least one of difference in on-road proportions, difference in off-road proportions, region compatibility, lane compatibility, a difference between a total number of lanes that a first segment of LiDAR data points at least partially overlaps and a total number of lanes that a second segment of LiDAR data points at least partially overlaps, a difference or distance in height between segments of LiDAR data points, a mask compatibility, a difference in object type distributions, and an object type compatibility.
108. A non-transitory computer-readable medium having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform operations comprising: obtaining a LiDAR dataset generated by a LiDAR system of an autonomous vehicle; using the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle, the object being detected by computing a distribution of object detections that each point of the LiDAR dataset is likely to be in, creating a plurality of segments of LiDAR data points using the distribution of object detections, merging the plurality of segments of LiDAR data points to generate merged segments, and detecting the object in a point cloud defined by the LiDAR dataset based on the merged segments; and using the object detection to facilitate at least one autonomous driving operation.
109. A method for controlling an autonomous vehicle, comprising: obtaining, by a computing device, a LiDAR dataset generated by a LiDAR system of the autonomous vehicle; using, by the computing device, the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle, the object being detected by performing the following operations: computing a distribution of object detections that each point of the LiDAR dataset is likely to be in, creating a plurality of segments of LiDAR data points using the distribution of object detections, merging the plurality of segments of LiDAR data points to generate merged segments; and detecting the object in a point cloud defined by the LiDAR dataset based on remaining ones of the merged segments, the detecting comprising obtaining information for a given detection mask and a given merged segment of the merged segments; and using, by the computing device, the object detection to facilitate at least one autonomous driving operation.
110. The method according to claim 109, further comprising obtaining, by the computing device, at least one image that was captured at a time when a sensor of the LiDAR system swept over a center of a camera field of view, wherein the at least one image is used in addition to the LiDAR dataset to detect the object.
111. The method according to claim 109, wherein the at least one autonomous driving operation comprises an object tracking operation, an object trajectory prediction operation, a vehicle trajectory determination operation, or a collision avoidance operation.
112. The method according to claim 109, wherein the distribution of object detections is computed based on (a) a probability distribution of pixels of the at least one image to which a point of the LiDAR dataset may project, and (b) a probability that the point does not project into any image detection.
113. The method according to claim 109, wherein the information comprises at least one of Pm representing a number of points of a LiDAR dataset that project into the given detection mask, Si representing a number of points forming the given merged segment, P^m representing a number of points in the given merged segment projecting into the given detection mask, a height of the given merged segment, a length ls of the given merged segment, and a width of the given merged segment.
114. The method according to claim 109, wherein the detecting further comprises determining at least one cluster feature based on the information.
115. The method according to claim 114, wherein the at least one cluster feature comprises a cluster feature U determined based on a number of points of a LiDAR dataset that project into the given detection mask and a number of points forming the given merged segment.
116. The method according to claim 114, wherein the at least one cluster feature comprises a cluster feature V determined based on a number of points in the given merged segment projecting into the given detection mask and a number of points of a LiDAR dataset that project into the given detection mask.
117. The method according to claim 114, wherein the at least one cluster feature comprises a cluster feature H representing a cluster height, a cluster feature L representing a cluster length, a cluster feature LTW representing a length-to-width ratio for a cluster, or a cluster feature C representing a cylinder convolution (or fit) score of clustered LiDAR data points.
118. The method according to claim 109, wherein the detecting further comprises computing a projection score PS based on the at least one cluster feature.
119. The method according to claim 118, wherein the projection score PS is a product of two or more cluster features.
120. The method according to claim 119, wherein the detecting further comprises using the projection score PS to verify that the given merged segment is part of a particular detected object that is associated with the given detection mask.
121. The method according to claim 120, wherein a verification is made that the given merged segment is part of a particular detected object that is associated with the given detection mask when the projection score PS exceeds a threshold value or has a value greater than other projection scores determined for other merged segments with points in the given detection mask.
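For the segment-filtering steps of claims 113 through 121, the sketch below shows cluster features U and V combined into a projection score PS formed as their product. The claims state only which quantities each feature depends on; the specific ratio forms, the 0.5 threshold, and the function names here are assumptions.

def projection_score(p_m, s_i, p_seg_in_mask):
    """Assumed forms for cluster features U (from Pm and Si) and V (from P^m and Pm),
    combined into a projection score PS as a product of cluster features."""
    u = min(p_m, s_i) / max(p_m, s_i, 1)   # cluster feature U (claim 115), assumed bounded ratio
    v = p_seg_in_mask / max(p_m, 1)        # cluster feature V (claim 116), assumed ratio
    return u * v                           # claim 119: PS as a product of two or more features

def verify_segments(scores, threshold=0.5):
    """Claim 121: a merged segment is verified as part of the detected object when its projection
    score exceeds a threshold or is the largest among the candidates for the detection mask."""
    if not scores:
        return []
    best = max(range(len(scores)), key=scores.__getitem__)
    return [i for i, s in enumerate(scores) if s > threshold or i == best]

Additional cluster features such as H, L, LTW or C (claim 117) would enter the product as further factors.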
122. A system, comprising: a processor; a non-transitory computer-readable storage medium comprising programming instructions that are configured to cause the processor to implement a method for operating an autonomous vehicle, wherein the programming instructions comprise instructions to: obtain a LiDAR dataset generated by a LiDAR system of the autonomous vehicle; use the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle, the object being detected by performing the following operations: computing a distribution of object detections that each point of the LiDAR dataset is likely to be in, creating a plurality of segments of LiDAR data points using the distribution of object detections, merging the plurality of segments of LiDAR data points to generate merged segments, and detecting the object in a point cloud defined by the LiDAR dataset based on the merged segments, the detecting comprising obtaining information for a given detection mask and a given merged segment of the merged segments; and use the object detection to facilitate at least one autonomous driving operation.
123. The system according to claim 122, wherein the programming instructions comprise instructions to obtain at least one image that was captured at a time when a sensor of the LiDAR system swept over a center of a camera field of view, wherein the at least one image is used in addition to the LiDAR dataset to detect the object.
124. The system according to claim 122, wherein the at least one autonomous driving operation comprises an object tracking operation, an object trajectory prediction operation, a vehicle trajectory determination operation, or a collision avoidance operation.
125. The system according to claim 122, wherein the distribution of object detections is computed based on (a) a probability distribution of pixels of the at least one image to which a point of the LiDAR dataset may project, and (b) a probability that the point does not project into any image detection.
126. The system according to claim 122, wherein the information comprises at least one of Pm representing a number of points of a LiDAR dataset that project into the given detection mask, Si representing a number of points forming the given merged segment, P^m representing a number of points in the given merged segment projecting into the given detection mask, a height of the given merged segment, a length ls of the given merged segment, and a width of the given merged segment.
127. The system according to claim 126, wherein the detecting further comprises determining at least one cluster feature based on the information.
128. The system according to claim 127, wherein the at least one cluster feature comprises a cluster feature U determined based on a number of points of a LiDAR dataset that project into the given detection mask and a number of points forming the given merged segment.
129. The system according to claim 128, wherein the at least one cluster feature comprises a cluster feature V determined based on a number of points in the given merged segment projecting into the given detection mask and a number of points of a LiDAR dataset that project into the given detection mask.
130. The system according to claim 128, wherein the at least one cluster feature comprises a cluster feature H representing a cluster height, a cluster feature L representing a cluster length, a cluster feature LTW representing a length-to-width ratio for a cluster, or a cluster feature C representing a cylinder convolution (or fit) score of clustered LiDAR data points.
131. The system according to claim 126, wherein the detecting further comprises computing a projection score PS based on the at least one cluster feature.
132. The system according to claim 131, wherein the projection score PS is a product of two or more cluster features.
133. The system according to claim 132, wherein the detecting further comprises using the projection score PS to verify that the given merged segment is part of a particular detected object that is associated with the given detection mask.
134. The system according to claim 133, wherein a verification is made that the given merged segment is part of a particular detected object that is associated with the given detection mask when the projection score PS exceeds a threshold value or has a value greater than other projection scores determined for other merged segments with points in the given detection mask.
135. A non-transitory computer-readable medium having instructions stored thereon that, when executed by at least one computing device, cause the at least one computing device to perform operations comprising: obtaining a LiDAR dataset generated by a LiDAR system of an autonomous vehicle; using the LiDAR dataset and at least one image to detect an object that is in proximity to the autonomous vehicle, the object being detected by performing the following operations: computing a distribution of object detections that each point of the LiDAR dataset is likely to be in, creating a plurality of segments of LiDAR data points using the distribution of object detections, merging the plurality of segments of LiDAR data points to generate merged segments, and detecting the object in a point cloud defined by the LiDAR dataset based on the merged segments, the detecting comprising obtaining information for a given detection mask and a given merged segment of the merged segments; and using the object detection to facilitate at least one autonomous driving operation.
PCT/US2021/054333 2020-10-23 2021-10-11 Systems and methods for camera-lidar fused object detection WO2022086739A2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
DE112021005607.7T DE112021005607T5 (en) 2020-10-23 2021-10-11 Systems and methods for camera-LiDAR-fused object detection
CN202180085904.7A CN116685874A (en) 2020-10-23 2021-10-11 Camera-laser radar fusion object detection system and method

Applications Claiming Priority (10)

Application Number Priority Date Filing Date Title
US17/078,543 2020-10-23
US17/078,548 2020-10-23
US17/078,548 US12135375B2 (en) 2020-10-23 2020-10-23 Systems and methods for camera-LiDAR fused object detection with local variation segmentation
US17/078,532 2020-10-23
US17/078,561 US12122428B2 (en) 2020-10-23 2020-10-23 Systems and methods for camera-LiDAR fused object detection with segment merging
US17/078,532 US12050273B2 (en) 2020-10-23 2020-10-23 Systems and methods for camera-LiDAR fused object detection with point pruning
US17/078,575 2020-10-23
US17/078,561 2020-10-23
US17/078,543 US11885886B2 (en) 2020-10-23 2020-10-23 Systems and methods for camera-LiDAR fused object detection with LiDAR-to-image detection matching
US17/078,575 US11430224B2 (en) 2020-10-23 2020-10-23 Systems and methods for camera-LiDAR fused object detection with segment filtering

Publications (2)

Publication Number Publication Date
WO2022086739A2 true WO2022086739A2 (en) 2022-04-28
WO2022086739A3 WO2022086739A3 (en) 2022-06-23

Family

ID=81291747

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2021/054333 WO2022086739A2 (en) 2020-10-23 2021-10-11 Systems and methods for camera-lidar fused object detection

Country Status (2)

Country Link
DE (1) DE112021005607T5 (en)
WO (1) WO2022086739A2 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US12117528B2 (en) * 2019-02-15 2024-10-15 Arizona Board Of Regents On Behalf Of The University Of Arizona Mobile 3D imaging system and method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10671862B2 (en) * 2018-01-30 2020-06-02 Wipro Limited Method and system for detecting obstacles by autonomous vehicles in real-time
US10726567B2 (en) * 2018-05-03 2020-07-28 Zoox, Inc. Associating LIDAR data and image data
US10491885B1 (en) * 2018-06-13 2019-11-26 Luminar Technologies, Inc. Post-processing by lidar system guided by camera information

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230303084A1 (en) * 2022-03-23 2023-09-28 Robert Bosch Gmbh Systems and methods for multi-modal data augmentation for perception tasks in autonomous driving
US12145592B2 (en) * 2022-03-23 2024-11-19 Robert Bosch Gmbh Systems and methods for multi-modal data augmentation for perception tasks in autonomous driving
CN115035184A (en) * 2022-06-13 2022-09-09 浙江大学 Honey pomelo volume estimation method based on lateral multi-view reconstruction
CN115035184B (en) * 2022-06-13 2024-05-28 浙江大学 Honey pomelo volume estimation method based on lateral multi-view reconstruction
CN117706942A (en) * 2024-02-05 2024-03-15 四川大学 Environment sensing and self-adaptive driving auxiliary electronic control method and system
CN117706942B (en) * 2024-02-05 2024-04-26 四川大学 Environment sensing and self-adaptive driving auxiliary electronic control method and system
CN117893412A (en) * 2024-03-15 2024-04-16 北京理工大学 Point cloud data filtering method, device, equipment and storage medium
CN117893412B (en) * 2024-03-15 2024-06-11 北京理工大学 Point cloud data filtering method, device, equipment and storage medium
CN118349694A (en) * 2024-06-17 2024-07-16 山东大学 Method and system for generating ramp converging region vehicle track database
CN118429914A (en) * 2024-07-04 2024-08-02 中国科学院长春光学精密机械与物理研究所 Vehicle detection method based on combination of unmanned aerial vehicle visible light video and SAR image
CN118799689A (en) * 2024-09-11 2024-10-18 成都赛力斯科技有限公司 Target detection method and device based on multi-mode fusion, electronic equipment and medium

Also Published As

Publication number Publication date
WO2022086739A3 (en) 2022-06-23
DE112021005607T5 (en) 2023-08-24

Similar Documents

Publication Publication Date Title
US11430224B2 (en) Systems and methods for camera-LiDAR fused object detection with segment filtering
US12135375B2 (en) Systems and methods for camera-LiDAR fused object detection with local variation segmentation
US12050273B2 (en) Systems and methods for camera-LiDAR fused object detection with point pruning
US12122428B2 (en) Systems and methods for camera-LiDAR fused object detection with segment merging
US12118732B2 (en) Systems and methods for object detection with LiDAR decorrelation
US11885886B2 (en) Systems and methods for camera-LiDAR fused object detection with LiDAR-to-image detection matching
WO2022086739A2 (en) Systems and methods for camera-lidar fused object detection
Jebamikyous et al. Autonomous vehicles perception (avp) using deep learning: Modeling, assessment, and challenges
CN111615703B (en) Sensor Data Segmentation
US11900692B2 (en) Multi-modal, multi-technique vehicle signal detection
US12050661B2 (en) Systems and methods for object detection using stereovision information
US20230252638A1 (en) Systems and methods for panoptic segmentation of images for autonomous driving
US20230123184A1 (en) Systems and methods for producing amodal cuboids
US20230373520A1 (en) System and method for generating information on remainder of measurement using sensor data
EP4148600A1 (en) Attentional sampling for long range detection in autonomous vehicles
KR20230167694A (en) Automatic lane marking extraction and classification from lidar scans
CN117593892B (en) Method and device for acquiring true value data, storage medium and electronic equipment
CN117591847B (en) Model pointing evaluating method and device based on vehicle condition data
US20240265298A1 (en) Conflict analysis between occupancy grids and semantic segmentation maps
Vega Deep Learning for Multi-modal Sensor Data Fusion in Autonomous Vehicles
Moonjarin et al. Automated overtaking assistance system: a real-time approach using deep learning techniques
WO2023146693A1 (en) False track mitigation in object detection systems
CN117152579A (en) System and computer-implemented method for a vehicle

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 202180085904.7

Country of ref document: CN

122 Ep: pct application non-entry in european phase

Ref document number: 21883552

Country of ref document: EP

Kind code of ref document: A2