
EP4229602A2 - Detection and ranging based on a single monoscopic frame - Google Patents

Detection and ranging based on a single monoscopic frame

Info

Publication number
EP4229602A2
Authority
EP
European Patent Office
Prior art keywords
digital image
image
stereoscopic
monoscopic
stereoscopic image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP20805109.4A
Other languages
German (de)
English (en)
French (fr)
Inventor
Behrooz MALEKI
Sarvenaz SARKHOSH
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bitanimate Inc
Original Assignee
Bitanimate Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bitanimate Inc
Publication of EP4229602A2

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/204 Image signal generators using stereoscopic image cameras
    • H04N13/207 Image signal generators using stereoscopic image cameras using a single 2D image sensor
    • H04N13/211 Image signal generators using stereoscopic image cameras using a single 2D image sensor using temporal multiplexing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/204 Image signal generators using stereoscopic image cameras
    • H04N13/239 Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/261 Image signal generators with monoscopic-to-stereoscopic image conversion
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/271 Image signal generators wherein the generated image signals comprise depth maps or disparity maps
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20 Image signal generators
    • H04N13/275 Image signal generators from 3D object models, e.g. computer-generated stereoscopic image signals
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N2013/0074 Stereoscopic image analysis
    • H04N2013/0081 Depth or disparity estimation from stereoscopic image signals

Definitions

  • the embodiments discussed in this disclosure relate to detection and ranging based on a single monoscopic frame.
  • Detection and ranging applications have increased in demand with the advent of autonomous and semi-autonomous vehicles.
  • an ability to detect and range objects in an environment becomes increasingly helpful.
  • Further considerations of autonomous and semi-autonomous operation of vehicles may include safety, such as an ability to stay on a trajectory of travel and avoid collisions with objects. Accordingly, some systems have been developed for detection, ranging, and/or safety purposes.
  • actual three-dimensional cameras may be used to capture three-dimensional images.
  • a multitude of monoscopic cameras may be employed to create a three-dimensional effect when the combined images from all the different cameras are stitched together.
  • Such systems are vision-based, while other conventional systems may be signal-based.
  • RADAR uses radio signals
  • LIDAR uses laser signals to detect and range objects.
  • each of the foregoing conventional systems may be deficient in one or more aspects.
  • three-dimensional cameras are bulky and/or expensive, as is LIDAR technology or a host of monoscopic cameras like the approximately eight cameras used by some TESLA® autonomous/semi-autonomous vehicles.
  • LIDAR may have limited usage at nighttime, in cloudy weather, or at high altitudes (e.g., above 2000 meters).
  • RADAR may not detect small objects or provide a precise image of an object due to wavelength of the radio signals.
  • binocular vision system uses two eyes spaced approximately two and a half inches (approximately 6.5 centimeters) apart. Each eye sees the world from a slightly different perspective. The brain uses the difference in these perspectives to calculate or gauge distance.
  • This binocular vision system is partly responsible for the ability to determine with relatively good accuracy the distance of an object. The relative distance of multiple objects in a field-of-view may also be determined with the help of binocular vision.
  • Three-dimensional (stereoscopic) imaging takes advantage of the depth perceived by binocular vision by presenting two images to a viewer where one image is presented to one eye (e.g., the left eye) and the other image is presented to the other eye (e.g., the right eye).
  • the images presented to the two eyes may include substantially the same elements, but the elements in the two images may be offset from each other to mimic the offsetting perspective that may be perceived by the viewer’s eyes in everyday life. Therefore, the viewer may perceive depth in the elements depicted by the images.
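The disparity-to-depth relationship described above can be sketched with the standard pinhole-stereo relation Z = f · B / d. The function name and the example focal length, baseline, and disparity values below are hypothetical illustrations, not taken from the disclosure:

```python
def depth_from_disparity(focal_length_px, baseline_m, disparity_px):
    """Pinhole-stereo relation: depth Z = f * B / d, where f is the focal
    length in pixels, B the baseline between the two viewpoints in meters,
    and d the horizontal offset (disparity) of an element between the two
    images in pixels."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_length_px * baseline_m / disparity_px

# A ~6.5 cm baseline (roughly the human interocular distance), a 1000 px
# focal length, and a 50 px disparity put the object at about 1.3 m.
distance_m = depth_from_disparity(1000, 0.065, 50)
```

A larger disparity (offset between the two perspectives) means a nearer object, which mirrors how the brain gauges distance from the difference between the two eyes' views.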
  • one or more stereoscopic images may be generated based on a single monoscopic image that may be obtained from a camera sensor.
  • the stereoscopic images may each include the first digital image and the second digital image that, when viewed using any suitable stereoscopic viewing technique, may result in a user or software program receiving a three-dimensional effect with respect to the elements included in the stereoscopic images.
  • the monoscopic image may depict a geographic setting of a particular geographic location and the resulting stereoscopic image may provide a three-dimensional (3D) rendering of the geographic setting.
  • Use of the stereoscopic image may help a system obtain more accurate detection and ranging capabilities.
  • Reference to a “stereoscopic image” in the present disclosure may refer to any configuration of the first digital image (monoscopic) and the second digital image (monoscopic) that together may generate a 3D effect as perceived by a viewer or software program.
  • Figure 1A illustrates an example system configured to generate stereoscopic (3D) images, according to some embodiments of the present disclosure.
  • Figure 1B illustrates an example environment in which stereoscopic image generation based on a single monoscopic frame occurs.
  • Figure 2 illustrates an example flow diagram of a method for topological optimization of graph-based models.
  • Figure 3 illustrates an example system that may be used in topological optimization of graph-based models.
  • Figure 4 illustrates an example of a depth map generated by a detection application and/or a stereoscopic image module.
  • Figure 5 illustrates an example of a stereoscopic pair provided to a graph-based model for training purposes.
  • FIG. 1A illustrates an example system 100 configured to generate stereoscopic (3D) images, according to some embodiments of the present disclosure.
  • the system 100 may include a stereoscopic image generation module 104 (referred to hereinafter as “stereoscopic image module 104”) configured to generate one or more stereoscopic images 108.
  • the stereoscopic image module 104 may include any suitable system, apparatus, or device configured to receive monoscopic images 102 and to generate each of the stereoscopic images 108 based on two or more of the monoscopic images 102.
  • the stereoscopic image module 104 may include software that includes computer-executable instructions configured to cause a processor to perform operations for generating the stereoscopic images 108 based on the monoscopic images 102.
  • the monoscopic images 102 may include digital images obtained by a camera sensor that depict a setting.
  • the monoscopic images 102 may include digital images that depict an object in the setting.
  • the object may be any element that is visually detectable, such as a tree, a pedestrian, a flying bird, an airplane, an airborne missile, a ship, a buoy, a river or ocean, a curb, a traffic sign, traffic lines (e.g., double lines indicating a “no pass zone”), a mountain, a wall, a house, a fire hydrant, a dog, or any other suitable object visually detectable by a camera sensor.
  • the stereoscopic image module 104 may be configured to acquire the monoscopic images 102 via a detection application communicatively coupled to the camera sensor.
  • “detection application” is short for “detection and ranging application.”
  • the stereoscopic image module 104 may be configured to access the detection application (such as the detection application 124 of FIG. 1B) via any suitable network such as the network 128 of FIG. 1B to request the monoscopic images 102 from the detection application.
  • the detection application and associated monoscopic images 102 may be stored on a same device that may include the stereoscopic image module 104.
  • the stereoscopic image module 104 may be configured to access the detection application stored on the device to request the monoscopic images 102 from a storage area of the device on which they may be stored.
  • the stereoscopic image module 104 may be included with the detection application in which the stereoscopic image module 104 may obtain the monoscopic images 102 via the detection application by accessing portions of the detection application that control obtaining the monoscopic images 102.
  • the stereoscopic image module 104 may be separate from the detection application (e.g., as shown in FIG. 1B), but may be configured to interface with the detection application to obtain the monoscopic images 102.
  • the stereoscopic image module 104 may be configured to generate the stereoscopic images 108 as indicated below.
  • the description is given with respect to generation of an example stereoscopic image 120 (illustrated in FIG. 1B and described below), which may be an example of one of the stereoscopic images 108 of FIG. 1A. Further, the description is given with respect to generation of the stereoscopic image 120 based on an example first digital image 110 and an example second digital image 112, which are illustrated in FIG. 1B.
  • the first digital image 110 and the second digital image 112 are examples of monoscopic images that may be included with the monoscopic images 102 of FIG. 1A.
  • FIG. 1B illustrates an example environment 105 in which stereoscopic image generation based on a single monoscopic frame occurs.
  • the elements of FIG. 1B may be arranged according to one or more embodiments of the present disclosure.
  • FIG. 1B includes: a machine 122 having a detection application 124 and a computing system 126; a network 128; and a stereoscopic image module 130 having a graph-based model 132 and a computing system 134. Further illustrated are a setting 109, a first digital image 110, a second digital image 112, a focal point 113, a camera 114, focal distances 115a/115b, an imaginary camera 116, and a displacement factor 118.
  • the stereoscopic image module 130 may be the same as or similar to the stereoscopic image module 104 described above in conjunction with FIG. 1A. Additionally or alternatively, the computing system 126 and the computing system 134 may be the same as or similar to the system 300 described below in conjunction with FIG. 3.
  • the setting 109 may include any geographical setting in which the camera 114 may capture an image.
  • the setting 109 may include garages, driveways, streets, sidewalks, oceans, rivers, skies, forests, cities, villages, landing/launching areas such as airport runways and flight decks, warehouses, stores, inventory aisles, and any other suitable environment in which the machine 122 may detect and range objects.
  • the first digital image 110 may include any aspect and/or portion of the setting 109.
  • the first digital image 110 may include the focal point 113 based on the focal distance 115a of the camera 114.
  • the focal distance 115a to the focal point 113 may be a known constant based on specifications of the camera 114.
  • the camera 114 may be attached to the machine 122.
  • machine may refer to any device configured to store and/or execute computer code, e.g., executable instructions of a software application.
  • the machine may be movable from a first geographic position (e.g., “Point A”) to a second geographic position (e.g., “Point B”).
  • the machine 122 may be autonomous or semi-autonomous with respect to moving between geographic positions.
  • the machine 122 may be human-operated between geographic positions.
  • Examples of a machine 122 may include robots, drones, rockets, space stations, self-driving cars/trucks, human-operated cars/trucks, equipment (e.g., construction/maintenance equipment such as a backhoe, a street-sweeper, a steam roller, etc.), storage pods (e.g., a transportable storage unit, etc.), or any other suitable device configured to move between geographic positions.
  • the machine may include a device that is stationary, and in some embodiments, fixed in position.
  • the machine may include an anti-missile device stationed at a military base, a security device fixed at a perimeter of a prison, a hovering helicopter, or any other suitable machine, whether temporarily stationary or permanently fixed in position.
  • the machine may include a client device.
  • the client device may include a mobile phone, a smartphone, a tablet computer, a laptop computer, a desktop computer, a set-top box, a virtual-reality device, a wearable device, a connected device, any mobility device that has an operating system, a satellite, etc.
  • the detection and ranging capabilities of the machine 122 enabled by the present application may be advantageous in any variety of fields or industries, including, for example: commercial/industrial purposes, manufacturing purposes, military purposes (e.g., Army, Navy, National Guard, Marines, Air Force, and Space Force), government agency purposes (e.g., Federal Bureau of Investigations, Central Intelligence Agency, and National Transportation Safety Board), etc.
  • the machine 122 may detect and/or range along a trajectory.
  • the trajectory may include any path of travel and/or a surrounding area for the machine 122, whether in air, on land, in space, or on water.
  • the camera 114 may be configured to capture in the first digital image 110 a portion of the trajectory of the machine 122, e.g., the portion of the trajectory nearest to the machine 122, another portion of the trajectory farthest away from the machine 122, or another portion not necessarily part of the trajectory of the machine 122.
  • the camera 114 may capture a portion of the trajectory up to about two meters away from the machine 122; up to about five meters away from the machine 122; up to about twenty meters away from the machine 122; up to about fifty meters away from the machine 122; up to about one hundred meters away from the machine 122; up to about two hundred meters away from the machine 122; up to about five hundred meters away from the machine 122; up to about one thousand meters away from the machine 122; up to about five thousand meters away from the machine 122; etc.
  • the advancement of camera technology may continue to facilitate advantages in imaging speed, resolution, measurement accuracy, and focal distances.
  • the first digital image 110 captured by the camera 114 may be obtained by the detection application 124.
  • the detection application 124 may request the first digital image 110 from the camera 114. Additionally or alternatively, the detection application 124 may receive the first digital image 110 as sent from the camera 114.
  • the stereoscopic image module 130 may obtain the first digital image 110 from the detection application 124.
  • the stereoscopic image module 130 may request the first digital image 110 from the detection application 124.
  • the stereoscopic image module 130 may receive the first digital image 110 as sent from the detection application 124.
  • the stereoscopic image module 130 may obtain the first digital image 110 via the network 128, e.g., where the stereoscopic image module 130 is positioned remotely from the machine 122 as shown in FIG. 1B, such as a remote server.
  • the remote server may be the same as or similar to the computing system 134.
  • the remote server may include one or more computing devices (such as a rackmount server, a router computer, a server computer, a personal computer, a mainframe computer, a laptop computer, a tablet computer, a desktop computer, a smartphone, a car, a drone, a robot, any mobility device that has an operating system, etc.), data stores (e.g., hard disks, memories, databases), networks, software components, and/or hardware components.
  • the stereoscopic image module 130 may obtain the first digital image 110 without the network 128, e.g., where the stereoscopic image module 130 is integrated with the machine 122 (e.g., not positioned at the remote server).
  • the network 128 may be any network or configuration of networks configured to send and receive communications between systems and devices.
  • the network 128 may include a conventional type network, a wired or wireless network, and may have numerous different configurations. Additionally or alternatively, the network 128 may include any suitable topology, configuration or configurations including a star configuration, token ring configuration, or other configurations.
  • the network 128 may include a local area network (LAN), a wide area network (WAN) (e.g., the Internet), DECT ULE, and/or other interconnected data paths across which multiple devices may communicate.
  • the network 128 may include a peer-to-peer network.
  • the network 128 may also be coupled to or include portions of a telecommunications network that may enable communication of data in a variety of different communication protocols.
  • the network 128 may include BlueTooth® communication networks (e.g., MESH Bluetooth) and/or cellular communication networks for sending and receiving data including via short messaging service (SMS), multimedia messaging service (MMS), hypertext transfer protocol (HTTP), direct data connection, wireless application protocol (WAP), e-mail, or the like.
  • the network 128 may include WiFi, NFC, LTE, LTE-Advanced, 1G, 2G, 3G, 4G, 5G, etc., ZigBee®, LoRA® (a wireless technology developed to enable low data rate communications to be made over long distances by sensors and actuators for machine-to-machine communication and internet of things (IoT) applications), wireless USB, or any other such wireless technology.
  • the stereoscopic image module 130 may input the first digital image 110 into the graph-based model 132.
  • the term “graph-based model” may include a deep neural network, a deep belief network, a recurrent neural network, or some other graph model such as a genetic programming model or a tree-based or forest-based machine learning model.
  • the graph-based model 132 may include any artificial intelligence system or learning-based mechanism, examples of which may include: perceptron, multilayer perceptron, feed forward, radial basis network, deep feed forward, recurrent neural network, long/short term memory, gated recurrent unit, auto encoder, variational auto encoder, denoising auto encoder, sparse auto encoder, any sequence-to-sequence model, shallow neural networks, Markov chain, Hopfield network, Boltzmann machine, restricted Boltzmann machine, deep belief network, deep convolutional network, convolutional neural network (e.g., VGG-16), deconvolutional network, deep convolutional inverse graphics network, modular neural network, generative adversarial network, liquid state machine, extreme learning machine, echo state network, recursive neural network, deep residual network, Kohonen network, support vector machine, neural Turing machine, etc.
  • the graph-based model 132 may be trained to generate (e.g., with help of the system 134) the second digital image 112 based on input in the form of the first digital image 110.
  • the training of the graph-based model 132 is described later in this disclosure.
  • the second digital image 112 may be configured to be an image of a same area or a similar area of the setting 109.
  • the first digital image 110 and the second digital image 112 may substantially overlap.
  • data may be discarded that corresponds to portions where the first digital image 110 and the second digital image 112 do not overlap.
  • the second digital image 112 may be generated as a monoscopic image that visually mimics what the imaginary camera 116 would image if the imaginary camera 116 were an actual camera like the camera 114.
  • the imaginary camera 116 is virtually positioned at a different position from an actual position of the camera 114.
  • an object imaged in the first digital image 110 may be imaged from a first position and/or at a first angle.
  • the object may be imaged in the second digital image 112 from a second position and/or at a second angle such that the second position and/or the second angle are different from the first position and the first angle, respectively.
  • the stereoscopic image 120 with perceptible depth may be generated using the first digital image 110 captured by the camera 114 and the second digital image 112 generated by the stereoscopic image module 130.
  • the positional relationship of the camera 114 relative to the imaginary camera 116 may include the displacement factor 118.
  • the displacement factor 118 may include: an angle or orientation with respect to one or more axes (e.g., roll, pitch, and yaw), an offset lateral distance or offset vertical height, etc.
  • the displacement factor 118 may be a known constant.
  • the displacement factor 118 may be set at a value such that the stereoscopic image 120 resulting from the second digital image 112 is of sufficient quality and accuracy.
  • the displacement factor 118 may be set at a value such that distance measurements based on the stereoscopic image 120 are sufficiently accurate and/or fit a certain model.
  • the stereoscopic image 120 may be used to generate a depth map.
  • the detection application 124 and/or the stereoscopic image module 130 may generate the depth map.
  • An example of a depth map is illustrated in FIG. 4.
  • the depth map may include a corresponding pixel for each pixel in the stereoscopic image 120.
  • Each corresponding pixel in the depth map may be representative of relative distance data from the camera 114 for each respective pixel in the stereoscopic image 120.
  • a pixel in the depth map having a certain shade of purple or gray-scale may correspond to a particular relative distance, which is not an actual distance value.
  • a pixel in a first depth map and a pixel in a second depth map may include a same shade of color or gray-scale, yet have different actual distance values (e.g., even orders of magnitude different actual distance values).
  • color or gray-scale in the generated depth map does not represent an actual distance value for a pixel; rather, the color or gray-scale of a pixel in the generated depth map may represent a distance value relative to adjacent pixels.
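The relative (rather than metric) nature of the depth map can be illustrated with a minimal sketch. It assumes the map is derived from per-pixel disparity values and rendered as gray-scale; the function name and example values are invented for illustration:

```python
import numpy as np

def relative_depth_map(disparity):
    """Scale per-pixel disparities to 0-255 gray-scale. The shades encode
    distance relative to neighboring pixels, not actual distance values."""
    d = np.asarray(disparity, dtype=float)
    lo, hi = d.min(), d.max()
    if hi == lo:
        return np.zeros(d.shape, dtype=np.uint8)
    return ((d - lo) / (hi - lo) * 255).astype(np.uint8)

# Two scenes whose actual distances differ by an order of magnitude can
# still produce identical depth-map shades:
near_scene = [[1.0, 2.0], [3.0, 4.0]]
far_scene = [[10.0, 20.0], [30.0, 40.0]]
```

Here `relative_depth_map(near_scene)` and `relative_depth_map(far_scene)` yield identical arrays, illustrating why a given shade in a depth map does not correspond to an actual distance value.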
  • a subset of pixels of a total amount of pixels in the depth map may be associated with an object.
  • the detection application 124 and/or the stereoscopic image module 130 may determine that the subset of pixels in the depth map is indicative of the object. In this manner, a presence of an object may be preliminarily identified or detected, though not necessarily ranged.
  • a portion of the subset of pixels associated with the object may be analyzed. In some embodiments, the portion of the subset of pixels may be analyzed as opposed to the entire subset of pixels associated with the object to reduce computational overhead, increase ranging speed, etc.
  • every pixel associated with a pedestrian (e.g., the feet, legs, torso, neck, and head) need not all be ranged. Rather, one or more portions of pixels associated with the pedestrian may be considered as representative of where the pedestrian is located relative to the camera 114 for ranging purposes.
  • the subset of pixels associated with the object may be averaged, segmented, or otherwise simplified to a portion of the subset of pixels.
  • a resolution of one or both of the stereoscopic image 120 and the depth map may be temporarily decreased (and later restored to original resolution). In this manner, the portion of the subset of pixels may include relative distance data sufficiently representative of the object.
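One way to simplify the subset of pixels to a representative portion, as described above, is to subsample the object's pixels and take a robust statistic such as the median. This is a sketch; the function name and stride value are hypothetical:

```python
import numpy as np

def representative_relative_distance(depth_map, object_mask, stride=4):
    """Range an object from only a portion of its pixels: keep every
    `stride`-th pixel under the object mask and take the median as the
    representative relative-distance value, reducing computational
    overhead and increasing ranging speed."""
    portion = depth_map[object_mask][::stride]
    return float(np.median(portion))
```

Taking the median of a sparse sample avoids ranging every pixel of, say, a pedestrian, while still yielding a value representative of where the pedestrian is located relative to the camera.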
  • the relative distance data for the object may be converted to an actual distance value (e.g., in inches, feet, meters, kilometers, etc.).
  • the relative distance data in the depth map may decrease in accuracy.
  • an amount of offset from the actual distance data may be graphed or fitted to a curve as a function of actual distance.
  • a curve of correction values may be implemented to correct an offset from the actual distance data.
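The correction-curve idea above can be sketched with a simple polynomial fit of the measurement offset as a function of distance. The calibration numbers below are invented for illustration; a real system would use measured calibration data:

```python
import numpy as np

# Hypothetical calibration data (meters): ground-truth distances and the
# distances estimated from the stereoscopic depth map, which drift apart
# as distance grows.
actual = np.array([5.0, 10.0, 20.0, 50.0, 100.0, 200.0])
estimated = np.array([5.0, 10.1, 20.5, 52.0, 106.0, 218.0])

# Fit the offset (estimated - actual) to a curve as a function of distance.
coeffs = np.polyfit(actual, estimated - actual, deg=2)

def corrected(estimate_m):
    """Subtract the fitted offset to correct a raw distance estimate."""
    return estimate_m - np.polyval(coeffs, estimate_m)
```

Note that `corrected` evaluates the offset curve at the raw estimate rather than at the (unknown) actual distance, a common approximation that works well when the offset is small relative to the distance itself.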
  • the graph-based model 132 may be trained to generate the second digital image 112 based on a single monoscopic image such as the first digital image 110 for subsequent generation of the stereoscopic image 120.
  • stereoscopic pair images may be provided to the graph-based model 132.
  • the stereoscopic pair images may include a first monoscopic image and a second monoscopic image.
  • An example of a stereoscopic pair provided to the graph-based model 132 for training purposes is illustrated in FIG. 5.
  • the first monoscopic image and the second monoscopic image may include images taken of any same or similar setting, but from different positions and/or angles.
  • the first monoscopic image and the second monoscopic image taken together may form a stereoscopic pair with perceivable depth.
  • the first monoscopic image and the second monoscopic image may include a setting 109 of any type, nature, location, or subject.
  • Some stereoscopic pair images may be related by type, nature, location, or subject; however, diversity among the stereoscopic pair images in addition to increased quantity may help improve a training quality or capability of the graph-based model 132 to generate the second digital image 112 and the stereoscopic image 120 of sufficient quality and accuracy.
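The pair-based training described above can be illustrated with a toy stand-in for the graph-based model: a linear map trained by gradient descent so that, given the first (left) image of each pair, its output matches the second (right) image. The synthetic one-pixel-shift pairs below merely mimic the offset between two viewpoints and are not real stereoscopic imagery:

```python
import numpy as np

rng = np.random.default_rng(0)
n_pixels = 16  # flattened toy "image" size

def make_training_pair():
    """Synthetic stereoscopic pair: the right view is the left view shifted
    by one pixel, mimicking a horizontally displaced second camera."""
    left = rng.random(n_pixels)
    right = np.roll(left, 1)
    return left, right

# Toy model: a linear map from the left view to a predicted right view.
W = np.zeros((n_pixels, n_pixels))
learning_rate = 0.1
for _ in range(2000):
    left, right = make_training_pair()
    predicted = W @ left
    # Gradient step on the squared error 0.5 * ||predicted - right||^2.
    W -= learning_rate * np.outer(predicted - right, left)
```

After training, `W` applied to a fresh left view closely reproduces the corresponding right view, which is the role the trained graph-based model 132 plays when generating the second digital image 112 from the first digital image 110.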
  • the training of the graph-based model 132 may occur on a server side, e.g., at the stereoscopic image module 130 when positioned remotely from the machine 122. Additionally or alternatively, the training of the graph-based model 132 may be a one-time process, after which generation of the second digital image 112 and stereoscopic image 120 may be enabled. In other embodiments, the training of the graph-based model 132 may occur on an as-needed basis, a rolling basis (e.g., continually), or on an interval basis (e.g., a predetermined schedule). As an example of an as-needed basis, inaccuracies or safety threats may come to light, e.g., in the event of a safety violation or accident.
  • additional training focused on inaccuracies or safety threats may be provided to the graph-based model 132.
  • one or more aspects of training of the graph-based model 132 may occur at the machine 122, e.g., via the detection application 124.
  • feedback may be received at the graph-based model 132 from: the detection application 124 via the machine 122, a user of the machine 122 via the machine 122, a third-party such as a law enforcement officer, etc.
  • Modifications, additions, or omissions may be made to the environment 105 without departing from the scope of the present disclosure.
  • the environment 105 may include other elements than those specifically listed. Additionally, the environment 105 may be included in any number of different systems or devices.
  • FIG. 2 illustrates an example flow diagram of a method 200 for topological optimization of graph-based models.
  • the method 200 may be arranged in accordance with at least one embodiment described in the present disclosure.
  • the method 200 may be performed, in whole or in part, in some embodiments by the software system and/or a processing system, such as a system 300 described below in conjunction with FIG. 3.
  • some or all of the steps of the method 200 may be performed based on the execution of instructions stored on one or more non-transitory computer-readable media.
  • various blocks may be divided into additional blocks, combined into fewer blocks, or eliminated, depending on the desired implementation.
  • the method 200 may begin at block 205 at which a first digital image is obtained via one or both of a detection application and a camera sensor.
  • the first digital image may be a monoscopic image that depicts a setting from a first position of the camera sensor communicatively coupled to the detection application.
  • the first digital image may include a trajectory of a machine.
  • a second digital image may be generated based on the first digital image.
  • the second digital image may be a monoscopic image that depicts a setting from a second position different from the first position.
  • the second digital image is not an image captured by a camera, such as the camera capturing the first digital image of block 205.
  • a stereoscopic image of the setting may be generated.
  • the stereoscopic image may include the first digital image and the second digital image.
  • the stereoscopic image may be an image from which detection and ranging determinations may be based.
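The flow of blocks 205-215 above can be sketched as follows. This is a minimal illustrative sketch, not the disclosed implementation: the uniform horizontal shift stands in for the trained graph-based model, which would instead predict the second viewpoint, and the function names are assumptions.

```python
import numpy as np

def generate_second_view(first_image: np.ndarray, shift_px: int = 1) -> np.ndarray:
    # Stand-in for the trained graph-based model: synthesize a second
    # viewpoint by horizontally shifting the first image. A trained model
    # would predict per-pixel displacement rather than a uniform shift.
    return np.roll(first_image, shift=shift_px, axis=1)

def make_stereoscopic_pair(first_image: np.ndarray):
    # Blocks 205-215 in miniature: obtain the first (monoscopic) image,
    # generate the second image from it, and pair the two images as a
    # stereoscopic image.
    second_image = generate_second_view(first_image)
    return first_image, second_image

# A tiny grayscale frame standing in for the camera-sensor capture.
frame = np.arange(16, dtype=np.uint8).reshape(4, 4)
left, right = make_stereoscopic_pair(frame)
```

The key point of the sketch is that the second image is generated, not captured; only one physical camera position is ever used.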
  • the blocks of the method 200 may be implemented in differing order. Furthermore, the blocks are only provided as examples, and some of the blocks may be optional, combined into fewer blocks, or expanded into additional blocks.
  • one or more additional blocks may be included in the method 200 that include obtaining a plurality of stereoscopic pair images that each includes a first monoscopic image and a second monoscopic image; sending the plurality of stereoscopic pair images as inputs into a graph-based model.
  • the graph-based model may be trained to generate a second digital image of block 210 based on the first digital image for subsequent generation of the stereoscopic image of block 215.
  • one or more additional blocks may be included in the method 200 that include sending the first digital image as an input into the graph-based model, wherein the second digital image is output from the graph-based model based on one or both of the plurality of stereoscopic pair images and the first digital image input into the graph-based model.
  • one or more additional blocks may be included in the method 200 that include generating a depth map that includes a corresponding pixel for each pixel in the stereoscopic image, each corresponding pixel in the depth map representative of relative distance data from the camera sensor for each respective pixel in the stereoscopic image.
  • one or more additional blocks may be included in the method 200 that include associating a subset of pixels of a total amount of pixels in the depth map as indicative of an object; and based on a portion of the subset of pixels in the depth map associated with the object, obtaining an actual distance from the camera sensor to the object in the stereoscopic image using: the relative distance data of the portion associated with object; a focal point of the first digital image and the second digital image; and a displacement factor between the first digital image and the second digital image.
  • obtaining an actual distance to the object may include determining a correction value that compensates for an offset in distance measurements based on perceived depth in the stereoscopic image.
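As a sketch of the triangulation implied above, the standard stereo ranging relation Z = f·B/d converts disparity into an actual distance using the focal length and the displacement between the two images. The additive `correction_m` term is a hypothetical stand-in for the correction value mentioned above, and all numeric values are illustrative assumptions.

```python
def actual_distance(disparity_px, focal_length_px, displacement_m, correction_m=0.0):
    # Standard stereo triangulation: Z = f * B / d, where f is the focal
    # length (pixels), B the displacement between the first and second
    # digital images (the stereo baseline, meters), and d the disparity
    # (pixels). `correction_m` is a hypothetical additive offset that
    # compensates for perceived-depth bias in the stereoscopic image.
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite distance")
    return focal_length_px * displacement_m / disparity_px + correction_m

# Example: 700 px focal length, 0.1 m displacement, 35 px disparity.
distance = actual_distance(35, 700, 0.1)
```

Note that larger disparities map to nearer objects, which is why the relative distance data of the pixel subset associated with the object suffices once f and B are known.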
  • one or more additional blocks may be included in the method 200 that include sending a warning for presentation via the detection application when the actual distance to the object satisfies a first threshold distance; and/or causing, via the detection application, a machine communicatively coupled to the detection application to perform a corrective action when the actual distance to the object satisfies a second threshold distance.
  • in some embodiments, the first threshold distance and the second threshold distance may be the same, while in other embodiments they may be different distances to the detected object. Additionally or alternatively, the first threshold distance and/or the second threshold distance may vary depending on any of a myriad of factors.
  • contributing factors affecting the first and second threshold distances may include: a speed of the machine and/or object, a trajectory of the machine and/or object, regulating rules or laws, a cost/benefit analysis, a risk predictive analysis, or any other suitable factor by which a threshold distance between the machine and a detected object may be merited.
  • the warning for presentation (e.g., at a display) via the detection application may include a visual warning signal and/or an audible warning signal. Additionally or alternatively, the detection application may cause the machine to perform a corrective action that includes stopping the machine, slowing the machine, swerving the machine, dropping/raising an altitude of the machine, an avoiding maneuver, or any other suitable type of corrective action to mitigate damage to the machine and the object and/or prevent contact between the machine and the object.
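The two-threshold behavior described above might be sketched as follows; the threshold values and action names are illustrative assumptions, not values from the disclosure.

```python
def respond_to_object(distance_m, warn_threshold_m=10.0, act_threshold_m=3.0):
    # Warn when the object is within the first threshold distance, and
    # additionally perform a corrective action (stop, slow, swerve, change
    # altitude, etc.) when it is within the second, closer threshold.
    actions = []
    if distance_m <= warn_threshold_m:
        actions.append("warn")               # visual and/or audible warning
    if distance_m <= act_threshold_m:
        actions.append("corrective_action")  # e.g., stop or slow the machine
    return actions
```

With the defaults above, a distant object triggers nothing, an intermediate one triggers only the warning, and a close one triggers both responses.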
  • one or more additional blocks may be included in the method 200 that include determining a presence of an object within the stereoscopic image; and based on image recognition processing of the object via a graph-based model, classifying the object.
  • determining a presence of an object within the stereoscopic image may include an analysis of pixels within the stereoscopic image and/or within the depth map. For example, if a group of pixels form an example shape or comprise a particular color or gray-scale, the presence of an object may be inferred. In these or other embodiments, recognition of the object may be a separate step.
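A minimal version of the pixel analysis described above, inferring an object's presence when enough pixels share a property, might look like this. The near-distance threshold and pixel-count criterion are assumptions for illustration; a real system would also examine shape, color, or gray-scale as noted above.

```python
import numpy as np

def object_present(depth_map: np.ndarray, near_threshold: float,
                   min_pixels: int = 3) -> bool:
    # Group pixels whose relative distance falls below a threshold; if the
    # group is large enough, infer the presence of an object in the scene.
    mask = depth_map < near_threshold
    return int(mask.sum()) >= min_pixels

# A toy depth map: three near pixels form a candidate object.
depth = np.array([[9.0, 9.0, 9.0],
                  [9.0, 2.0, 2.5],
                  [9.0, 2.2, 9.0]])
found = object_present(depth, near_threshold=5.0)
```

Recognition (classification) of the detected group would then follow as a separate step, consistent with the bullet above.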
  • image recognition may include image recognition training of a graph-based model.
  • the graph-based model may be fed input data (e.g., images of objects), and output of the graph-based model (e.g., guesses) may be compared to expected results such as predetermined or human designated labels.
  • weights, biases, and other parameters in the graph-based model may be modified to decrease the error rate of the guesses.
  • weights in the graph-based model may be adjusted so that the guesses better match the predetermined or human designated labels of the images of objects.
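The training loop described above — compare the model's guesses to designated labels, then adjust weights to reduce the error — can be sketched with a toy one-layer model. The linear model, data, and learning rate are illustrative assumptions, not the patent's graph-based model.

```python
import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(64, 3))         # input data (e.g., image features)
w_true = np.array([1.5, -2.0, 0.5])
labels = X @ w_true                  # predetermined / designated labels

w = np.zeros(3)                      # model weights, initially untrained
learning_rate = 0.1
for _ in range(200):
    guesses = X @ w                          # model output ("guesses")
    error = guesses - labels                 # compare to expected results
    grad = 2 * X.T @ error / len(X)          # gradient of the squared error
    w -= learning_rate * grad                # adjust weights toward labels

# After training, the guesses closely match the designated labels.
```

The same compare-and-adjust cycle, scaled up to many parameters and many labeled images, is what decreases the error rate of the guesses over time.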
  • the input data fed to the graph-based model for training purposes may include images of a host of different objects. Hundreds, thousands, or millions of images of objects may be provided to the graph-based model. Additionally or alternatively, the images of the objects provided to the graph-based model may include labels that correspond to one or more features, pixels, boundaries, or any other detectable aspect of the objects.
  • additional or alternative image recognition techniques may be used with the graph-based model to classify the objects. Examples may include using: greyscale; RGB (red, green, and blue) values ranging from, for example, zero to 255; pre-processing techniques (e.g., image cropping/flipping/angle manipulation, adjustment of image hue, contrast, and saturation, etc.); testing subsets or small batch sizes of data as opposed to entire datasets; and max-pooling to reduce the dimensions of an image by taking the maximum pixel value of a grid.
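As one concrete example of the techniques listed above, max-pooling reduces an image's dimensions by keeping the maximum pixel value of each grid. The sketch below assumes a 2x2 grid and image dimensions divisible by the grid size.

```python
import numpy as np

def max_pool(image: np.ndarray, grid: int = 2) -> np.ndarray:
    # Reduce the image's dimensions by taking the maximum pixel value of
    # each grid x grid block. Assumes both dimensions divide evenly by grid.
    h, w = image.shape
    blocks = image.reshape(h // grid, grid, w // grid, grid)
    return blocks.max(axis=(1, 3))

img = np.array([[1, 2, 5, 6],
                [3, 4, 7, 8],
                [9, 1, 2, 3],
                [0, 8, 4, 4]])
pooled = max_pool(img)   # 4x4 reduced to 2x2
```

Each output pixel summarizes a 2x2 neighborhood, so the pooled image is a quarter of the original size while preserving the strongest responses.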
  • FIG. 3 illustrates an example system 300 that may be used in topological optimization of graph-based models.
  • the system 300 may be arranged in accordance with at least one embodiment described in the present disclosure.
  • the system 300 may include a processor 310, memory 312, a communication unit 316, a display 318, a user interface unit 320, and a peripheral device 322, which all may be communicatively coupled.
  • the system 300 may be part of any of the systems or devices described in this disclosure.
  • the processor 310 may include any suitable special-purpose or general- purpose computer, computing entity, or processing device including various computer hardware or software modules and may be configured to execute instructions stored on any applicable computer-readable storage media.
  • the processor 310 may include a microprocessor, a microcontroller, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a Field-Programmable Gate Array (FPGA), or any other digital or analog circuitry configured to interpret and/or to execute program instructions and/or to process data.
  • the processor 310 may include any number of processors distributed across any number of networks or physical locations that are configured to perform individually or collectively any number of operations described in this disclosure.
  • the processor 310 may interpret and/or execute program instructions and/or process data stored in the memory 312.
  • the processor 310 may execute the program instructions stored in the memory 312.
  • the processor 310 may execute program instructions stored in the memory 312 that are related to detection and ranging based on a single monoscopic frame.
  • instructions may be used to perform one or more operations or functions described in the present disclosure.
  • the memory 312 may include computer-readable storage media or one or more computer-readable storage mediums for carrying or having computer-executable instructions or data structures stored thereon.
  • Such computer-readable storage media may be any available media that may be accessed by a general-purpose or special-purpose computer, such as the processor 310.
  • such computer-readable storage media may include non-transitory computer-readable storage media including Random Access Memory (RAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disk storage, magnetic disk storage or other magnetic storage devices, flash memory devices (e.g., solid state memory devices), or any other storage medium which may be used to carry or store particular program code in the form of computer-executable instructions or data structures and which may be accessed by a general-purpose or special-purpose computer. Combinations of the above may also be included within the scope of computer-readable storage media.
  • Computer-executable instructions may include, for example, instructions and data configured to cause the processor 310 to perform a certain operation or group of operations as described in this disclosure.
  • the term "non-transitory" as explained in the present disclosure should be construed to exclude only those types of transitory media that were found to fall outside the scope of patentable subject matter in the Federal Circuit decision of In re Nuijten, 500 F.3d 1346 (Fed. Cir. 2007). Combinations of the above may also be included within the scope of computer-readable media.
  • the communication unit 316 may include any component, device, system, or combination thereof that is configured to transmit or receive information over a network. In some embodiments, the communication unit 316 may communicate with other devices at other locations, the same location, or even other components within the same system.
  • the communication unit 316 may include a modem, a network card (wireless or wired), an infrared communication device, a wireless communication device (such as an antenna), and/or chipset (such as a Bluetooth device, an 802.6 device (e.g., Metropolitan Area Network (MAN)), a Wi-Fi device, a WiMax device, cellular communication facilities, etc.), and/or the like.
  • the communication unit 316 may permit data to be exchanged with a network and/or any other devices or systems described in the present disclosure.
  • the display 318 may be configured as one or more displays, like an LCD, LED, or other type of display.
  • the display 318 may be configured to present topologies, indicate mutations to topologies, indicate warning notices, show validation performance improvement values, display weights, biases, etc., and other data as directed by the processor 310.
  • the user interface unit 320 may include any device to allow a user to interface with the system 300.
  • the user interface unit 320 may include a mouse, a track pad, a keyboard, buttons, and/or a touchscreen, among other devices.
  • the user interface unit 320 may receive input from a user and provide the input to the processor 310.
  • the user interface unit 320 and the display 318 may be combined.
  • the peripheral devices 322 may include one or more devices.
  • the peripheral devices may include a sensor, a microphone, and/or a speaker, among other peripheral devices.
  • system 300 may include any number of other components that may not be explicitly illustrated or described. Further, depending on certain implementations, the system 300 may not include one or more of the components illustrated and described.
  • the use of the term "and/or" is intended to be construed in this manner.
  • the terms "about," "substantially," and "approximately" should be interpreted to mean a value within 10% of an actual value, for example, values like 3 mm or 100% (percent).
  • any disjunctive word or phrase presenting two or more alternative terms, whether in the description, claims, or drawings, should be understood to contemplate the possibilities of including one of the terms, either of the terms, or both terms.
  • the phrase "A or B" should be understood to include the possibilities of "A" or "B" or "A and B."
  • the terms "first," "second," "third," etc. are not necessarily used herein to connote a specific order or number of elements.
  • the terms "first," "second," "third," etc. are used to distinguish between different elements as generic identifiers. Absent a showing that the terms "first," "second," "third," etc. connote a specific order, these terms should not be understood to connote a specific order. Furthermore, absent a showing that the terms "first," "second," "third," etc. connote a specific number of elements, these terms should not be understood to connote a specific number of elements.
  • a first widget may be described as having a first side and a second widget may be described as having a second side.
  • the use of the term "second side" with respect to the second widget may be to distinguish such side of the second widget from the "first side" of the first widget and not to connote that the second widget has two sides.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Testing, Inspecting, Measuring Of Stereoscopic Televisions And Televisions (AREA)
  • Studio Devices (AREA)
  • Control Of Driving Devices And Active Controlling Of Vehicle (AREA)
  • Processing Or Creating Images (AREA)
EP20805109.4A 2019-01-25 2020-01-27 Detection and ranging based on a single monoscopic frame Pending EP4229602A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201962797114P 2019-01-25 2019-01-25
PCT/US2020/015278 WO2020231484A2 (en) 2019-01-25 2020-01-27 Detection and ranging based on a single monoscopic frame

Publications (1)

Publication Number Publication Date
EP4229602A2 true EP4229602A2 (en) 2023-08-23

Family

ID=71794267

Family Applications (1)

Application Number Title Priority Date Filing Date
EP20805109.4A Pending EP4229602A2 (en) 2019-01-25 2020-01-27 Detection and ranging based on a single monoscopic frame

Country Status (4)

Country Link
EP (1) EP4229602A2 (zh)
JP (1) JP2022518532A (zh)
CN (1) CN111491154A (zh)
WO (1) WO2020231484A2 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023214790A1 (ko) * 2022-05-04 2023-11-09 Hanwha Vision Co., Ltd. Image analysis apparatus and method
WO2024063242A1 (ko) * 2022-09-20 2024-03-28 Hanwha Vision Co., Ltd. Image analysis apparatus and method

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120300034A1 (en) * 2011-05-23 2012-11-29 Qualcomm Incorporated Interactive user interface for stereoscopic effect adjustment
WO2015017855A1 (en) * 2013-08-02 2015-02-05 Xactware Solutions, Inc. System and method for detecting features in aerial images using disparity mapping and segmentation techniques
US9721385B2 (en) * 2015-02-10 2017-08-01 Dreamworks Animation Llc Generation of three-dimensional imagery from a two-dimensional image using a depth map
US10129530B2 (en) * 2015-09-25 2018-11-13 Intel Corporation Video feature tagging
CN107580209B (zh) * 2017-10-24 2020-04-21 Vivo Mobile Communication Co., Ltd. Photographing and imaging method and apparatus for a mobile terminal

Also Published As

Publication number Publication date
WO2020231484A2 (en) 2020-11-19
JP2022518532A (ja) 2022-03-15
CN111491154A (zh) 2020-08-04
WO2020231484A3 (en) 2021-02-25

Similar Documents

Publication Publication Date Title
Schedl et al. An autonomous drone for search and rescue in forests using airborne optical sectioning
Geraldes et al. UAV-based situational awareness system using deep learning
KR101836304B1 (ko) Method and apparatus for detecting vehicle contour based on point cloud data
US20100305857A1 (en) Method and System for Visual Collision Detection and Estimation
US20170277187A1 (en) Aerial Three-Dimensional Scanner
US20230213643A1 (en) Camera-radar sensor fusion using local attention mechanism
US11595634B2 (en) Detection and ranging based on a single monoscopic frame
CN108140245B (zh) Ranging method, ranging apparatus, and unmanned aerial vehicle
Yeum et al. Autonomous image localization for visual inspection of civil infrastructure
Aswini et al. UAV and obstacle sensing techniques–a perspective
KR20180133745A (ko) System for identifying flying objects using a lidar sensor and a pan-tilt-zoom camera, and control method thereof
Al-Sheary et al. Crowd monitoring system using unmanned aerial vehicle (UAV)
CN107464046A (zh) UAV-based geological disaster monitoring and assessment system
EP4229602A2 (en) Detection and ranging based on a single monoscopic frame
Bovcon et al. Improving vision-based obstacle detection on USV using inertial sensor
Jindal et al. Bollard segmentation and position estimation from lidar point cloud for autonomous mooring
Baeck et al. Drone based near real-time human detection with geographic localization
Arsenos et al. Common Corruptions for Evaluating and Enhancing Robustness in Air-to-Air Visual Object Detection
Li et al. High‐resolution model reconstruction and bridge damage detection based on data fusion of unmanned aerial vehicles light detection and ranging data imagery
Kim et al. Detecting and localizing objects on an unmanned aerial system (uas) integrated with a mobile device
Gur et al. Image processing based approach for crime scene investigation using drone
Xu et al. [Retracted] Multiview Fusion 3D Target Information Perception Model in Nighttime Unmanned Intelligent Vehicles
Benoit et al. Eyes-Out Airborne Object Detector for Pilots Situational Awareness
Kainth et al. Chasing the Intruder: A Reinforcement Learning Approach for Tracking Unidentified Drones
Kim et al. Imaging lidar prototype with homography and deep learning ranging methods

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20220825

AK Designated contracting states

Kind code of ref document: A2

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR