
WO2021126062A1 - Method and system for predicting movement - Google Patents

Method and system for predicting movement

Info

Publication number
WO2021126062A1
Authority
WO
WIPO (PCT)
Prior art keywords
representations
movable object
behavioral
processing circuitry
sequence
Application number
PCT/SE2020/051222
Other languages
French (fr)
Inventor
Thomas KLINTBERG
Simon VAJEDI
Carl-Fredrik ALVEKLINT
Original Assignee
Forsete Group Ab
Application filed by Forsete Group Ab
Publication of WO2021126062A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05D SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
    • G05D1/00 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots
    • G05D1/0088 Control of position, course, altitude or attitude of land, water, air or space vehicles, e.g. using automatic pilots characterized by the autonomous decision making process, e.g. artificial intelligence, predefined behaviours
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00 Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001 Planning or execution of driving tasks
    • B60W60/0011 Planning or execution of driving tasks involving control alternatives for a single driving scenario, e.g. planning several paths to avoid obstacles
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W60/00 Drive control systems specially adapted for autonomous road vehicles
    • B60W60/001 Planning or execution of driving tasks
    • B60W60/0027 Planning or execution of driving tasks using trajectory prediction for other traffic participants
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/17 Terrestrial scenes taken from planes or by drones
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2420/00 Indexing codes relating to the type of sensors based on the principle of their operation
    • B60W2420/40 Photo, light or radio wave sensitive means, e.g. infrared sensors
    • B60W2420/403 Image sensing, e.g. optical camera
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B60 VEHICLES IN GENERAL
    • B60W CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT
    • B60W2420/00 Indexing codes relating to the type of sensors based on the principle of their operation
    • B60W2420/40 Photo, light or radio wave sensitive means, e.g. infrared sensors
    • B60W2420/408 Radar; Laser, e.g. lidar
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B64 AIRCRAFT; AVIATION; COSMONAUTICS
    • B64U UNMANNED AERIAL VEHICLES [UAV]; EQUIPMENT THEREFOR
    • B64U2201/00 UAVs characterised by their flight controls
    • B64U2201/10 UAVs characterised by their flight controls autonomous, i.e. by navigating independently from ground or air stations, e.g. by using inertial navigation systems [INS]

Definitions

  • the present disclosure generally relates to a novel concept for using machine learning to predict a future movement associated with a first movable object.
  • the present disclosure also relates to a corresponding system and a computer program product.
  • Predictive modeling generally refers to techniques for extracting information from data to build a model that can predict an output from a given input. Predicting an output can include predicting future trends or behavior patterns, to name a few examples.
  • Machine learning is used in relation to such predictive modeling.
  • Machine learning is a form of artificial intelligence that is employed to allow computers to evolve behaviors based on empirical data.
  • Machine learning may take advantage of training examples to capture characteristics of interest of their unknown underlying probability distribution. Training data may be seen as examples that illustrate relations between observed variables. Specifically, focus is nowadays on ways to automatically learn to recognize complex patterns and make intelligent decisions based on data.
  • An autonomous vehicle is typically equipped with a computer system implementing the prediction modelling based on information from a plurality of sensors, such as cameras, radar, and other similar devices, that is used to interpret a surrounding of the vehicle.
  • the computer system is adapted to execute numerous decisions while the autonomous vehicle is in motion, such as speeding up, slowing down, stopping, turning, etc.
  • Autonomous vehicles may also use the cameras, sensors, and global positioning devices to gather and interpret images and sensor data about their surrounding environment, e.g., pedestrians, bicyclists, other vehicles, parked cars, trees, buildings, etc.
  • a further example of a prediction-based implementation scheme is disclosed in US20190302767, specifically focusing on a system associated with predicting an intent of an object in an environment proximate an autonomous vehicle.
  • the prediction scheme suggested in US20190302767 is adapted to generate a discrete set of semantic intents, each of which can correspond to a candidate trajectory, staying stationary, changing lanes, or another action.
  • a computer implemented method for using machine learning to predict a future movement associated with a first movable object comprising the steps of receiving, at a processing circuitry, a sequence of representations indicative of present behavior of the first movable object, wherein the sequence of representations is acquired using a sensor and comprises a plurality of different behavioral parameters, forming, using the processing circuitry, a plurality of behavioral maps for the first movable object by transforming the plurality of representations for each of the plurality of different behavioral parameters using at least one machine learning component, applying, using the processing circuitry, a predetermined noise reduction or transformation scheme to at least a portion of the plurality of behavioral maps, wherein the predetermined noise reduction or transformation scheme is selected based on at least one of the behavioral parameters or the sensor used for acquiring the sequence of representations, and predicting, using the processing circuitry, the future movement of the first movable object based on the plurality of behavioral maps.
  • the present behavior of a first object is analyzed to determine a likely future “action” of the first object.
  • this is achieved by separating the received sequence of representations, for example being data received from an image sensor, a lidar sensor, a radar sensor, etc., into different behavioral parameters.
  • the different behavioral parameters for the first object present within the sequence of representations may generally be dependent on the type of object, whereby e.g. in case the first object is a human it will have different behavioral parameters as compared to an animal or a movable machine/robot.
  • the data relating to the behavioral parameters for the first object may then in line with the present disclosure be handled in parallel, using at least one machine learning component for forming behavioral maps for each of the plurality of different behavioral parameters.
  • the behavioral maps may in turn be seen as how the first object presently behaves within its present context and depending on the specific behavioral parameter.
  • the behavioral maps are possibly combined (or fused together) for predicting the future movement of the first movable object. Accordingly, in line with the present disclosure the overall behavior of the first object is observed and analyzed, possibly segmented, to then be combined again for predicting the future movement of the first object.
  • the method according to the present disclosure further comprises the step of filtering at least a portion of the plurality of behavioral maps using a predetermined filtering scheme, where the filtering scheme is based on the behavioral parameter or a sensor used for acquiring the sequence of representations. That is, different types of filtering schemes may be used for different sensors used for acquiring the sequence of representations, with the intention to reduce an amount of noise being introduced when determining the plurality of behavioral maps.
  • the predetermined noise reduction or transformation scheme may also be used for general manipulation, for example including normalization, scaling and/or cropping, such as for example but not limited to an implementation where the sequence of representations comprises an image.
  • a prediction model which can generate a future sequence of one or more behavior parameters (i.e. the future predicted movement of the first object), based on a past sequence of the same representations, or based on a sequence of a larger set of behavior parameters.
  • This prediction is not of a discrete set of semantic intents, which is a significant difference, and which provides a unique technical effect for the present prediction scheme, for example including the possibility to make use of and scale the present prediction scheme in complex implementations including a large plurality of (first) objects, each associated with a large number of behavioral parameters.
  • the plurality of behavioral maps for the first movable object may be seen as a game changer, in comparison to prior-art implementations, specifically in relation to specific embodiments of the present disclosure where some changes to the first object (such as relating to gaze estimation) will be challenging to predict.
  • the behavioral parameters are selected from a group comprising object pose, head pose, eye gaze, velocity, position, acceleration or an interaction map for the first movable object.
  • interaction map should within the context of the present disclosure be understood to relate to how the first object relates to other first objects (such as when the first object is part of a “crowd”) and in relation to fixed objects in the vicinity of the first object. Accordingly, it may in some embodiments be desirable to allow the sequence of representations to comprise information relating to at least one static object located in a vicinity of the first movable object.
  • a dedicated machine learning component is selected for each of at least a portion of the behavioral parameters.
  • Such an implementation may for example make a formation of the dedicated machine learning component slightly simpler as such a component may be directed only to a specific behavioral parameter.
  • Such an implementation may also allow for a modularity of the implementation, possibly allowing for a flexible introduction of further (and updated) dedicated machine learning components over time. That said, within the context of the present disclosure it may also be possible to form a single machine learning component that has been formed to handle all of the different behavioral parameters handled in line with the present disclosure.
  • the at least one machine learning component reviews at least a present and a predetermined number of previous representations for the first movable object, where the predetermined number of previous representations for the first movable object in some embodiments may be dynamic. Accordingly, the machine learning component will in such an embodiment not just review the present behavior of the first object, but also take into account how the object just previously behaved. It may in line with the present disclosure be possible to apply different weights to the different representations, where representations in the past generally will be provided with a lower weight as compared to newly received representations.
  • the scheme according to the present disclosure is performed using processing circuitry comprised with a second object, the second object being different from the first object.
  • the second object may be stationary or movable.
  • a stationary second object may be a security camera, a computer system, etc.
  • a movable second object may generally be any form of craft or vessel.
  • Possible movable second objects also include any form of unmanned aerial vehicles (UAV) (such as drones) or vehicles (such as, for example, semi- or fully autonomous vehicles). Further, both present and future stationary or movable second objects are possible and within the scope of the present disclosure.
  • An example of a stationary implementation may for example include an automatic self-checkout system in a store.
  • the second object may comprise a sensor collecting a sequence of images, and the first object is a human.
  • the overall aim may for example be to predict an intention of a customer, such as for example identifying a product that the customer is picking from a shelf in the store.
  • Such an implementation may however be expanded further and need not necessarily be directed only to an automatic self-checkout system in a store but may generally be used for predicting an intention of a customer in a store or any other area where e.g. a human is present.
  • control commands for the second movable object may be formed based on the predicted future movement of the first movable object, for example representing a trajectory for the second movable object.
  • control commands could be direct actuation commands for controlling an operation of the second object.
  • control commands for the second movable object are formed to reduce an interaction between the first and the second object, generally with the intention to ensure that the second movable object avoids the visual field of the first movable object.
  • the overall intention may for example be to ensure that the vehicle is controlled in a way to ensure that the human stays safe without being hit by the car.
  • the second object may be an airborne drone on a reconnaissance mission with the intention of “not being seen” by the first object, where again the first object may be a human, for example being part of a plurality of humans arranged in a crowd.
  • it will typically be desirable to analyze the human(s) with regard to their movements/head pose/eye gaze, etc., and to form the control commands for the drone such that the drone is arranged to fly outside of e.g. a line of sight for the human(s).
  • the scheme according to the present disclosure may be adapted in a manner to increase the interaction between the first and the second object, such as in relation to implementations where it is desirable to arrange the first and the second object in close vicinity of each other.
  • Such embodiments may for example find its way into industrial applications where a robot being the second object is to interact with an animal being the first object.
  • a computer system comprising processing circuitry, the computer system arranged to predict a future movement associated with a first movable object by adapting the processing circuitry to receive a sequence of representations indicative of present behavior of the first movable object, wherein the sequence of representations is acquired using a sensor and comprises a plurality of different behavioral parameters, form a plurality of behavioral maps for the first movable object by transforming the plurality of representations for each of the plurality of different behavioral parameters using at least one machine learning component, apply a predetermined noise reduction or transformation scheme to at least a portion of the plurality of behavioral maps, wherein the predetermined noise reduction or transformation scheme is selected based on at least one of the behavioral parameters or the sensor used for acquiring the sequence of representations, and predict the future movement of the first movable object based on the plurality of behavioral maps.
  • the computer system may in some embodiments be comprised as an onboard component of a second object (being different from the first object).
  • the second object may as such be movable or stationary.
  • a computer program product comprising a non-transitory computer readable medium having stored thereon computer program means for operating a computer system to predict a future movement associated with a first movable object using machine learning
  • the computer system comprising processing circuitry
  • the computer program product comprises code for receiving, at the processing circuitry, a sequence of representations indicative of present behavior of the first movable object, wherein the sequence of representations is acquired using a sensor and comprises a plurality of different behavioral parameters, code for forming, using the processing circuitry, a plurality of behavioral maps for the first movable object by transforming the plurality of representations for each of the plurality of different behavioral parameters using at least one machine learning component, code for applying, using the processing circuitry, a predetermined noise reduction or transformation scheme to at least a portion of the plurality of behavioral maps, wherein the predetermined noise reduction or transformation scheme is selected based on at least one of the behavioral parameters or the sensor used for acquiring the sequence of representations, and code for predicting, using the processing circuitry, the future movement of the first movable object based on the plurality of behavioral maps.
  • a software executed by the server for operation in accordance to the present disclosure may be stored on a computer readable medium, being any type of memory device, including one of a removable nonvolatile random access memory, a hard disk drive, a floppy disk, a CD-ROM, a DVD-ROM, a USB memory, an SD memory card, or a similar computer readable medium known in the art.
  • the present disclosure generally relates to a novel concept for using machine learning to predict a future movement associated with a first movable object.
  • the present disclosure also relates to a corresponding system and a computer program product.
  • Fig. 1 conceptually illustrates a computer system according to an embodiment of the present disclosure connected to a vehicle
  • Figs. 2A - 2C present a possible use of the computer system in relation to an airborne drone
  • Fig. 3 is a flow chart illustrating the steps of performing the method according to a currently preferred embodiment of the present disclosure.
  • Fig. 4 conceptually shows a possible implementation of a machine learning component that may be used in relation to the present disclosure.
  • FIG. 1 there is conceptually illustrated a computer system 100 according to an embodiment of the present disclosure.
  • the purpose of the computer system 100 is, in one embodiment, to dynamically observe and analyze a behavior of a first movable object for predicting the future movement of the first object.
  • the computer system 100 comprises processing circuitry 102 and a plurality of sensors, for example including a camera 104, a lidar sensor 106 and/or a radar sensor 108.
  • the processing circuitry 102 may for example be manifested as a general-purpose processor, an application specific processor, a circuit containing processing components, a group of distributed processing components, a group of distributed computers configured for processing, a field programmable gate array (FPGA), etc.
  • the processor may be or include any number of hardware components for conducting data or signal processing or for executing computer code stored in memory.
  • the memory may be one or more devices for storing data and/or computer code for completing or facilitating the various methods described in the present description.
  • the memory may include volatile memory or non-volatile memory.
  • the memory may include database components, object code components, script components, or any other type of information structure for supporting the various activities of the present description.
  • any distributed or local memory device may be utilized with the systems and methods of this description.
  • the memory is communicably connected to the processor (e.g., via a circuit or any other wired, wireless, or network connection) and includes computer code for executing one or more processes described herein.
  • the computer system 100 is connected to a network, such as the Internet 110, allowing the computer system 100 to communicate and exchange information with e.g. a remotely located server 112, having a thereto connected remote database 114.
  • the remotely located server 112 may be arranged to receive information about the first object and to provide the computer system 100 with general directions for operation.
  • the computer system 100 may further comprise a transceiver 116 adapted to allow for any form of wireless connections like WLAN, CDMA, GSM, GPRS, 3/4/5G mobile communications, or similar. Other present or future wireless communication protocols are possible and within the scope of the present disclosure.
  • Figs. 2A - 2C there is shown a possible approach of implementing the computer system 100 in relation to a movable second object, where in Fig. 2 the movable second object is in the form of an airborne drone 200.
  • the scheme according to the present disclosure could be implemented in relation to any other form of stationary or movable (second) objects for determining a movement of another (first) object (or e.g. group of objects).
  • a group of persons 202, 204, 206, 207 are illustrated as walking along a road 208.
  • Each of the persons 202, 204, 206, 207 is, within the scope of the present disclosure, defined as a first moving object.
  • the road 208 as well as e.g. the trees 210 along the road 208 (as well as other similar objects) are considered to be stationary objects in a vicinity of the first moving object(s) 202, 204, 206, 207.
  • in Fig. 2A there is shown a plurality of airborne drones 212, 214, 216 flying at a distance from the persons 202, 204, 206, 207.
  • Each of the drones comprises a control system 100 as presented in Fig. 1 and further detailed in Fig. 2B, preferably each arranged in communication with the remote server 112. It may also be possible to allow the airborne drones 212, 214, 216 to communicate directly between each other, using any form of wireless communication protocol, e.g. as suggested above.
  • the airborne drone 212 (or group of airborne drones) has been assigned to perform a reconnaissance mission, dynamically travelling from a start position to a destination.
  • the destination could be the same position as the start position.
  • the control system 100 has been adapted to implement the scheme according to the present disclosure.
  • the camera 104 (or any other sensor comprised with the drone 212) is used for collecting information as to a surrounding of the drone 212.
  • the information from the camera 104 is received, S1, at the processing circuitry 102.
  • the images may be defined to comprise representations indicative of a present behavior of the persons 202, 204, 206, 207.
  • Such behavior may be defined to include a plurality of different behavioral parameters, where the plurality of different behavioral parameters for example may include a direction of movement of the persons 202, 204, 206, 207 and sizes of the persons 202, 204, 206, 207.
  • Other behavioral parameters that may be identified from e.g. the images from the camera 104 may include different positions of the persons' 202, 204, 206, 207 face, head, or upper body. These positions may, for example, be the eyes, eyelids, eyebrows, nose, mouth, cheek, neck, shoulders, arms, etc.
  • the camera 104 may also detect, with further reference to Fig. 2C, if the head, or eyes, of a person is rotating to the right or left (yaw), 218, rotating up or down (pitch), 220, or, in the case of the head movements, leaning towards the right or left shoulder (roll), 222. If the camera 104 is provided with e.g. high-quality optics, it could also be possible to detect e.g. an eye gaze direction for each of the persons 202, 204, 206, 207.
  • the processing circuitry 102 may form, S2, a plurality of behavioral maps for the first movable object by transforming the plurality of representations for each of the plurality of different behavioral parameters using at least one machine learning component.
  • the at least one machine learning component reviews at least a present and a predetermined number of previous representations for the first movable object, i.e. possible historical data as to how the persons 202, 204, 206, 207 have behaved both seen separately and together.
  • the processing circuitry 102 may combine and/or fuse together information comprised with the behavioral maps for predicting, S3, the future movement of the persons 202, 204, 206, 207. That is, each of the behavioral maps will provide a component in the overall prediction of how each of the persons 202, 204, 206, 207, seen individually as well as as a group, likely will behave, and specifically move, including for example in what direction the persons 202, 204, 206, 207 likely will look, how fast they will walk, etc.
  • the processing circuitry 102 may determine or form control commands that may be used by the drone 212 (as well as the other drones 214, 216) to operate the drone 212 such that it will be positioned in an undisclosed position in relation to the predicted future movement of the persons 202, 204, 206, 207.
  • the control commands may in some embodiments be defined as a trajectory for the drone 212 that may be interpreted by control means comprised with the drone 212 for controlling e.g. electrical motors comprised with the drone 212.
  • the machine learning component 400 may on a high level be described as comprising an encoder block 402 and a decoder block 404 that together define the backbone of the machine learning component 400.
  • the machine learning component 400 is also dependent on additional software components that may be used to administer data, guide the training process and optimize the machine learning component 400.
  • the above-mentioned software components may comprise:
  • a first module adapted to generate training sequences and labels, and to batch and randomly distribute samples from the training pool to the machine learning component for processing.
  • the labels are sequences of data in the same output domain that the machine learning component 400 will operate in, and are used as ground truth to objectively measure how close or far away the current iteration of the machine learning component 400 is from a desired state.
  • a second module that objectively measures the deviation between the current prediction from the machine learning component 400 and the label distributed from the first (data administering) module, and propagates the individual gradients for the neuron connections backwards to the input neurons.
  • a third module that will collect the gradients from all connections between the neurons in the machine learning component 400 and calculate the individual amount of adjustment needed to better represent the labeled data.
  • the encoder block 402 is preferably implemented to have as many input neurons as there are input streams of data; in other words, the dimension of the input data shall preferably match the input dimension of the machine learning component 400.
  • h(n) represents the output from the encoder block 402 and i(n) represents the input to the encoder block 402.
  • the encoder block 402 preferably consists of K stacked layers.
  • the output dimension of the encoder block 402 is determined by the number of neurons or 'hidden units' in the last of the stacked layers in the encoder block 402.
  • Information contained in the output from the stacked encoder block 402 is a multidimensional representation of past and current state.
  • the output from the encoder block is passed on to the decoder block 404, where the decoder block in turn will transform the latent representation into two parts: firstly, generating new input to the encoder block 402, and secondly, generating an element of the predicted output sequence.
  • the output sequence may then be generated via sampling from a Gaussian mixture model, due to the multimodal nature of the problem domain (a minimal sketch of this encoder/decoder arrangement with mixture sampling is given after this list).
  • a positive side effect from this is that it is also possible to gauge the uncertainty in the prediction, yielding greater confidence in the predicted data.
  • the present disclosure relates to a computer implemented method for using machine learning to predict a future movement associated with a first movable object, the method comprising the steps of receiving a sequence of representations indicative of present behavior of the first movable object, wherein the sequence of representations comprises a plurality of different behavioral parameters, forming a plurality of behavioral maps for the first movable object by transforming the plurality of representations for each of the plurality of different behavioral parameters using at least one machine learning component, and predicting the future movement of the first movable object based on the plurality of behavioral maps.
  • the overall behavior of the first object may be observed and analyzed for predicting the future movement of the first object.
  • control functionality of the present disclosure may be implemented using existing computer processors, or by a special purpose computer processor for an appropriate system, incorporated for this or another purpose, or by a hardwired system.
  • Embodiments within the scope of the present disclosure include program products comprising machine-readable media for carrying or having machine-executable instructions or data structures stored thereon.
  • Such machine-readable media can be any available media that can be accessed by a general purpose or special purpose computer or other machine with a processor.
  • machine-readable media can comprise RAM, ROM, EPROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code in the form of machine-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer or other machine with a processor.
  • When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a machine, the machine properly views the connection as a machine-readable medium; thus, any such connection is properly termed a machine-readable medium.
  • Machine-executable instructions include, for example, instructions and data which cause a general-purpose computer, special purpose computer, or special purpose processing machines to perform a certain function or group of functions.
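
The encoder/decoder arrangement with Gaussian-mixture sampling described in the list above can be illustrated with a short sketch. The following Python (PyTorch) code is a hypothetical reading of that description rather than the patented implementation: the use of LSTM layers, the layer sizes and the number of mixture components are assumptions, since the text only specifies an encoder of K stacked layers feeding a decoder that generates both new encoder input and one element of the predicted output sequence per step.

```python
import torch
import torch.nn as nn

class MovementPredictor(nn.Module):
    def __init__(self, input_dim: int, hidden_dim: int = 64, k_layers: int = 2,
                 n_mixtures: int = 5):
        super().__init__()
        # Encoder block (402 in the text): K stacked recurrent layers; the
        # number of input neurons matches the number of input data streams.
        self.encoder = nn.LSTM(input_dim, hidden_dim, num_layers=k_layers,
                               batch_first=True)
        # Decoder block (404): from the latent state, produce (a) new input
        # for the encoder and (b) mixture parameters for one output element.
        self.to_next_input = nn.Linear(hidden_dim, input_dim)
        self.to_mixture = nn.Linear(hidden_dim, n_mixtures * (1 + 2 * input_dim))
        self.n_mixtures, self.input_dim = n_mixtures, input_dim

    def forward(self, past: torch.Tensor, horizon: int) -> torch.Tensor:
        """past: (batch, time, input_dim) -> sampled (batch, horizon, input_dim)."""
        out, state = self.encoder(past)
        h = out[:, -1]  # multidimensional representation of past and current state
        samples = []
        for _ in range(horizon):
            params = self.to_mixture(h)
            logits, mu, log_sigma = torch.split(
                params,
                [self.n_mixtures,
                 self.n_mixtures * self.input_dim,
                 self.n_mixtures * self.input_dim], dim=-1)
            mu = mu.view(-1, self.n_mixtures, self.input_dim)
            sigma = log_sigma.view(-1, self.n_mixtures, self.input_dim).exp()
            # Sample a mixture component, then one element of the output sequence.
            comp = torch.distributions.Categorical(logits=logits).sample()
            idx = comp[:, None, None].expand(-1, 1, self.input_dim)
            element = (mu + sigma * torch.randn_like(sigma)).gather(1, idx).squeeze(1)
            samples.append(element)
            # Feed the decoder-generated input back into the encoder.
            out, state = self.encoder(self.to_next_input(h).unsqueeze(1), state)
            h = out[:, -1]
        return torch.stack(samples, dim=1)
```

Sampling each element from the mixture, rather than taking a single mean, is what enables the uncertainty gauging mentioned above: the spread of repeated samples reflects how confident the model is in the predicted data.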

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Remote Sensing (AREA)
  • Human Computer Interaction (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Mathematical Physics (AREA)
  • Mechanical Engineering (AREA)
  • Social Psychology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Aviation & Aerospace Engineering (AREA)
  • Game Theory and Decision Science (AREA)
  • Psychiatry (AREA)
  • Business, Economics & Management (AREA)
  • Astronomy & Astrophysics (AREA)
  • Transportation (AREA)
  • Databases & Information Systems (AREA)
  • Ophthalmology & Optometry (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Control Of Position, Course, Altitude, Or Attitude Of Moving Bodies (AREA)
  • Traffic Control Systems (AREA)

Abstract

The present disclosure relates to a computer-implemented method for using machine learning to predict a future movement associated with a first movable object (202,204,206,207) comprising the steps of: - receiving (S1), at a processing circuitry (102), a sequence of representations indicative of present behavior of the first movable object (202,204,206,207), wherein the sequence of representations is acquired using a sensor (104) and comprises a plurality of different behavioral parameters, - forming (S2), using the processing circuitry, a plurality of behavioral maps for the first movable object (202,204,206,207) by transforming the plurality of representations for each of the plurality of different behavioral parameters using at least one machine learning component (400), - applying, using the processing circuitry (102), a predetermined noise reduction or transformation scheme to at least a portion of the plurality of behavioral maps, wherein the predetermined noise reduction or transformation scheme is selected based on at least one of the behavioral parameters or the sensor (104) used for acquiring the sequence of representations and - predicting (S3), using the processing circuitry (102), the future movement of the first movable object (202,204,206,207) based on the plurality of behavioral maps.

Description

METHOD AND SYSTEM FOR PREDICTING MOVEMENT
TECHNICAL FIELD
The present disclosure generally relates to a novel concept for using machine learning to predict a future movement associated with a first movable object. The present disclosure also relates to a corresponding system and a computer program product.
BACKGROUND
Recent advances in computers and communications have had great impact on how to predict the likelihood of an event occurring within a certain amount of time or the amount of time until an event is likely to occur. Predictive modeling generally refers to techniques for extracting information from data to build a model that can predict an output from a given input. Predicting an output can include predicting future trends or behavior patterns, to name a few examples.
In some implementations machine learning is used in relation to such predictive modeling. Machine learning is a form of artificial intelligence that is employed to allow computers to evolve behaviors based on empirical data. Machine learning may take advantage of training examples to capture characteristics of interest of their unknown underlying probability distribution. Training data may be seen as examples that illustrate relations between observed variables. Specifically, focus is nowadays on ways to automatically learn to recognize complex patterns and make intelligent decisions based on data.
One possible application of such machine learning based prediction modelling is in relation to semi- or fully autonomous vehicles. An autonomous vehicle is typically equipped with a computer system implementing the prediction modelling based on information from a plurality of sensors, such as cameras, radar, and other similar devices, that is used to interpret a surrounding of the vehicle. The computer system is adapted to execute numerous decisions while the autonomous vehicle is in motion, such as speeding up, slowing down, stopping, turning, etc. Autonomous vehicles may also use the cameras, sensors, and global positioning devices to gather and interpret images and sensor data about their surrounding environment, e.g., pedestrians, bicyclists, other vehicles, parked cars, trees, buildings, etc.
An exemplary implementation of such a computer system is disclosed in US9248834, presenting a solution where information from the computer system is used to form a detailed map about the vehicle's surrounding, allowing the vehicle to safely maneuver in various environments. This detailed map may describe expected conditions of the vehicle's environment such as the shape and location of roads, parking spots, dead zones, traffic signals, and other objects. In this regard, the detailed map may be used to assist in making driving decisions involving intersections and traffic signals, without the interaction of a driver/operator.
A further example of a prediction-based implementation scheme is disclosed in US20190302767, specifically focusing on a system associated with predicting an intent of an object in an environment proximate an autonomous vehicle. The prediction scheme suggested in US20190302767 is adapted to generate a discrete set of semantic intents, each of which can correspond to a candidate trajectory, staying stationary, changing lanes, or another action.
Even though both US9248834 and US20190302767 provide interesting approaches to apply a prediction scheme for improving an overall safety in operating an autonomous vehicle, there is always room for further improvements and expansion of such technology, with the intention to reduce computational complexity while at the same time improving accuracy of the prediction.
SUMMARY
In view of the above-mentioned and other drawbacks of general prior art within the technical area, it is an object of the present disclosure to provide improvements in relation to prediction of future movement associated with movable objects.
According to an aspect of the present disclosure, it is therefore provided a computer implemented method for using machine learning to predict a future movement associated with a first movable object, the method comprising the steps of receiving, at a processing circuitry, a sequence of representations indicative of present behavior of the first movable object, wherein the sequence of representations is acquired using a sensor and comprises a plurality of different behavioral parameters, forming, using the processing circuitry, a plurality of behavioral maps for the first movable object by transforming the plurality of representations for each of the plurality of different behavioral parameters using at least one machine learning component, applying, using the processing circuitry, a predetermined noise reduction or transformation scheme to at least a portion of the plurality of behavioral maps, wherein the predetermined noise reduction or transformation scheme is selected based on at least one of the behavioral parameters or the sensor used for acquiring the sequence of representations, and predicting, using the processing circuitry, the future movement of the first movable object based on the plurality of behavioral maps.
By means of the present disclosure an improved prediction scheme is provided, allowing for the possibility to improve handling of complicated scenarios, possibly including complex object dynamics, as compared to prior art prone to inaccurate object classifications, which can result in prediction errors and high computational complexity. In accordance with the present disclosure, the present behavior of a first object is analyzed to determine a likely future “action” of the first object. In line with the present disclosure, this is achieved by separating the received sequence of representations, for example being data received from an image sensor, a lidar sensor, a radar sensor, etc., into different behavioral parameters. The different behavioral parameters for the first object present within the sequence of representations may generally be dependent on the type of object, whereby e.g. in case the first object is a human it will have different behavioral parameters as compared to an animal or a movable machine/robot.
The data relating to the behavioral parameters for the first object may then in line with the present disclosure be handled in parallel, using at least one machine learning component for forming behavioral maps for each of the plurality of different behavioral parameters. The behavioral maps may in turn be seen as how the first object presently behaves within its present context and depending on the specific behavioral parameter.
Once the behavioral maps have been determined, they are possibly combined (or fused together) for predicting the future movement of the first movable object. Accordingly, in line with the present disclosure the overall behavior of the first object is observed and analyzed, possibly segmented, to then be combined again for predicting the future movement of the first object.
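As a concrete illustration of this data flow, a minimal sketch in Python follows. All names are hypothetical, and the concrete form of the behavioral maps and the fusion step is left open by the disclosure; this is one way such a per-parameter pipeline could be organized, not the patented implementation.

```python
from typing import Callable, Dict, List, Sequence
import numpy as np

# One "machine learning component" per behavioral parameter: it maps a
# per-frame stream of parameter values to a behavioral map (here simply a
# feature vector; the disclosure leaves the exact form open).
MapModel = Callable[[np.ndarray], np.ndarray]

def predict_future_movement(
    representations: Sequence[Dict[str, np.ndarray]],  # one dict per frame
    map_models: Dict[str, MapModel],  # e.g. {"head_pose": ..., "velocity": ...}
    fuse_and_predict: Callable[[List[np.ndarray]], np.ndarray],
) -> np.ndarray:
    behavioral_maps = []
    for parameter, model in map_models.items():
        # Separate the received sequence into one stream per behavioral parameter.
        stream = np.stack([frame[parameter] for frame in representations])
        # Transform the stream into a behavioral map with its dedicated component.
        behavioral_maps.append(model(stream))
    # Fuse the per-parameter maps and predict the future movement.
    return fuse_and_predict(behavioral_maps)
```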
The method according to the present disclosure further comprises the step of filtering at least a portion of the plurality of behavioral maps using a predetermined filtering scheme, where the filtering scheme is based on the behavioral parameter or a sensor used for acquiring the sequence of representations. That is, different types of filtering schemes may be used for different sensors used for acquiring the sequence of representations, with the intention to reduce an amount of noise being introduced when determining the plurality of behavioral maps. In some embodiments it may be desirable to arrange the predetermined noise reduction or transformation scheme to ensure that unwanted statistical noise is directly removed or filtered out, in real time, so that the reliability and the precision of the predicted output can be maintained. That said, the predetermined noise reduction or transformation scheme may also be used for general manipulation, for example including normalization, scaling and/or cropping, such as for example but not limited to an implementation where the sequence of representations comprises an image.
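As an illustration, such a predetermined scheme could be looked up per sensor and behavioral parameter as sketched below. The specific filters (moving average, normalization) and the (sensor, parameter) keys are assumptions made for the example; the disclosure does not prescribe particular schemes.

```python
import numpy as np

def moving_average(x: np.ndarray, k: int) -> np.ndarray:
    # Simple 1-D smoothing filter used here as a stand-in noise reducer.
    return np.convolve(x, np.ones(k) / k, mode="same")

def normalize(x: np.ndarray) -> np.ndarray:
    return (x - x.mean()) / (x.std() + 1e-8)

# Predetermined schemes keyed on (sensor, behavioral parameter).
SCHEMES = {
    ("camera", "eye_gaze"): lambda x: moving_average(normalize(x), 3),
    ("lidar", "velocity"): lambda x: moving_average(x, 7),
    ("radar", "position"): normalize,
}

def apply_scheme(sensor: str, parameter: str, behavioral_map: np.ndarray) -> np.ndarray:
    # Fall back to the identity when no scheme is registered for the pair.
    return SCHEMES.get((sensor, parameter), lambda x: x)(behavioral_map)
```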
In comparison to prior art, it is by means of the present disclosure possible to implement a prediction model which can generate a future sequence of one or more behavior parameters (i.e. the future predicted movement of the first object), based on a past sequence of the same representations, or based on a sequence of a larger set of behavior parameters. This prediction is not of a discrete set of semantic intents, which is a significant difference, and which provides a unique technical effect for the present prediction scheme, for example including the possibility to make use of and scale the present prediction scheme in complex implementations including a large plurality of (first) objects, each associated with a large number of behavioral parameters. In comparison, prior-art implementations relying on a temporal prediction model can be complicated to utilize when a task or setting must be scaled up, in dimension or scope. If a higher resolution or a more descriptive prediction is needed, then the number of semantic classes must be increased. This can have unfortunate effects, such as longer execution time for the process, more computations, or it may require more memory or hardware space. More parameters will also increase the difficulty of reaching an optimum point in terms of performance during an optimization process. Possibly, the training time may also increase with the number of behavioral parameters. Furthermore, making use of the plurality of behavioral maps for the first movable object may be seen as a game changer, in comparison to prior-art implementations, specifically in relation to specific embodiments of the present disclosure where some changes to the first object (such as relating to gaze estimation) will be challenging to predict.
Advantageously, the behavioral parameters are selected from a group comprising object pose, head pose, eye gaze, velocity, position, acceleration or an interaction map for the first movable object. The expression “interaction map” should within the context of the present disclosure be understood to relate to how the first object relates to other first objects (such as when the first object is part of a “crowd”) and in relation to fixed objects in the vicinity of the first object. Accordingly, it may in some embodiments be desirable to allow the sequence of representations to comprise information relating to at least one static object located in a vicinity of the first movable object.
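For illustration, one plausible encoding of such an interaction map is the set of distances and bearings from the first object to the other movable and static objects in its vicinity; the encoding below is an assumption, as the disclosure leaves the exact form open.

```python
import numpy as np

def interaction_map(ego_xy: np.ndarray, others_xy: np.ndarray) -> np.ndarray:
    """Return an (N, 2) array with distance and bearing (radians) per neighbor."""
    deltas = others_xy - ego_xy
    distances = np.linalg.norm(deltas, axis=1)
    bearings = np.arctan2(deltas[:, 1], deltas[:, 0])
    return np.stack([distances, bearings], axis=1)

# Example: a pedestrian at the origin, two other pedestrians and one tree.
print(interaction_map(np.zeros(2), np.array([[1.0, 0.0], [0.0, 2.0], [-3.0, -3.0]])))
```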
In some embodiments of the present disclosure a dedicated machine learning component is selected for each of at least a portion of the behavioral parameters. Such an implementation may for example make a formation of the dedicated machine learning component slightly simpler as such a component may be directed only to a specific behavioral parameter. Such an implementation may also allow for a modularity of the implementation, possibly allowing for a flexible introduction of further (and updated) dedicated machine learning components over time. That said, within the context of the present disclosure it may also be possible to form a single machine learning component that has been formed to handle all of the different behavioral parameters handled in line with the present disclosure.
In determining the behavioral map, the at least one machine learning component reviews at least a present and a predetermined number of previous representations for the first movable object, where the predetermined number of previous representations for the first movable object in some embodiments may be dynamic. Accordingly, the machine learning component will in such an embodiment not just review the present behavior of the first object, but also take into account how the object just previously behaved. It may in line with the present disclosure be possible to apply different weights to the different representations, where representations in the past generally will be provided with a lower weight as compared to newly received representations.
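Such recency weighting over a sliding window could look as follows; the exponential decay and the window length are illustrative choices rather than values from the disclosure.

```python
import numpy as np

def weighted_history(history: np.ndarray, window: int = 10, decay: float = 0.8) -> np.ndarray:
    """history: (T, D) representations, newest last; returns the weighted window."""
    recent = history[-window:]
    # Weight 1.0 for the newest frame, decay**k for a frame k steps back,
    # so older representations contribute less than newly received ones.
    weights = decay ** np.arange(len(recent) - 1, -1, -1)
    return recent * weights[:, None]
```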
In preferred embodiments of the present disclosure, the scheme according to the present disclosure is performed using processing circuitry comprised with a second object, the second object being different from the first object. The second object may be stationary or movable. For example, a stationary second object may be a security camera, a computer system, etc. Correspondingly, a movable second object may generally be any form of craft or vessel. Possible movable second objects also include any form of unmanned aerial vehicles (UAV) (such as drones) or vehicles (such as, for example, semi- or fully autonomous vehicles). Further, both present and future stationary or movable second objects are possible and within the scope of the present disclosure.
An example of a stationary implementation (or at least partly stationary implementation) may for example include an automatic self-checkout system in a store. In such an exemplary implementation, the second object may comprise a sensor collecting a sequence of images, and the first object is a human. In accordance with the general concept according to the present disclosure, in such an exemplary implementation the overall aim may for example be to predict an intention of a customer, such as for example identifying a product that the customer is picking from a shelf in the store. Such an implementation may however be expanded further and need not necessarily be directed only to an automatic self-checkout system in a store but may generally be used for predicting an intention of a customer in a store or any other area where e.g. a human is present.
When the second object is movable, it may be possible to adapt the scheme according to the present disclosure to also include the step of forming control commands for the second movable object based on the predicted future movement of the first movable object, for example representing a trajectory for the second movable object. As an alternative, the control commands could be direct actuation commands for controlling an operation of the second object.
Preferably, the control commands for the second movable object are formed to reduce an interaction between the first and the second object, generally with the intention to ensure that the second movable object avoids the visual field of the first movable object. In a general implementation, such as when the second object is a vehicle and the first object is a human, the overall intention may for example be to ensure that the vehicle is controlled in a way to ensure that the human stays safe without being hit by the car.
Along the same line, in some embodiments of the present disclosure, as will be elaborated below in relation to the detailed description, the second object may be an airborne drone on a reconnaissance mission with the intention of “not being seen” by the first object, where again the first object may be a human, for example being part of a plurality of humans arranged in a crowd. In such an embodiment it will typically be desirable to analyze the human(s) with regard to their movements/head pose/eye gaze, etc., and to form the control commands for the drone such that the drone is arranged to fly outside of e.g. a line of sight for the human(s).
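For illustration, a viewing-cone test that such a drone could apply to a person's predicted head pose is sketched below. The world-frame conventions, the use of yaw and pitch only, and the field-of-view half-angle are all assumptions made for the example.

```python
import numpy as np

def in_visual_field(person_pos, yaw, pitch, drone_pos, half_fov_deg=60.0) -> bool:
    # Unit gaze vector from the predicted head yaw (about the vertical axis)
    # and pitch; roll does not move the line of sight.
    gaze = np.array([np.cos(pitch) * np.cos(yaw),
                     np.cos(pitch) * np.sin(yaw),
                     np.sin(pitch)])
    to_drone = np.asarray(drone_pos, float) - np.asarray(person_pos, float)
    to_drone /= np.linalg.norm(to_drone)
    angle = np.degrees(np.arccos(np.clip(gaze @ to_drone, -1.0, 1.0)))
    return angle < half_fov_deg

# A trajectory planner could reject candidate waypoints for which
# in_visual_field(...) is True for any person's predicted future head pose.
```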
It should further be understood that the scheme according to the present disclosure may be adapted in a manner to increase the interaction between the first and the second object, such as in relation to implementations where it is desirable to arrange the first and the second object in close vicinity of each other. Such embodiments may for example find its way into industrial applications where a robot being the second object is to interact with an animal being the first object.
According to an aspect of the present disclosure, there is further provided a computer system comprising processing circuitry, the computer system arranged to predict a future movement associated with a first movable object by adapting the processing circuitry to receive a sequence of representations indicative of present behavior of the first movable object, wherein the sequence of representations is acquired using a sensor and comprises a plurality of different behavioral parameters, form a plurality of behavioral maps for the first movable object by transforming the plurality of representations for each of the plurality of different behavioral parameters using at least one machine learning component, apply a predetermined noise reduction or transformation scheme to at least a portion of the plurality of behavioral maps, wherein the predetermined noise reduction or transformation scheme is selected based on at least one of the behavioral parameters or the sensor used for acquiring the sequence of representations, and predict the future movement of the first movable object based on the plurality of behavioral maps. This aspect of the present disclosure provides similar advantages as discussed above in relation to the previous aspects of the present disclosure.
As indicated above, the computer system may in some embodiments be comprised as an onboard component of a second object (being different from the first object). The second object may as such be movable or stationary.
According to a further aspect of the present disclosure, there is provided a computer program product comprising a non-transitory computer readable medium having stored thereon computer program means for operating a computer system to predict a future movement associated with a first movable object using machine learning, the computer system comprising processing circuitry, wherein the computer program product comprises code for receiving, at the processing circuitry, a sequence of representations indicative of present behavior of the first movable object, wherein the sequence of representations is acquired using a sensor and comprises a plurality of different behavioral parameters, code for forming, using the processing circuitry, a plurality of behavioral maps for the first movable object by transforming the plurality of representations for each of the plurality of different behavioral parameters using at least one machine learning component, code for applying, using the processing circuitry, a predetermined noise reduction or transformation scheme to at least a portion of the plurality of behavioral maps, wherein the predetermined noise reduction or transformation scheme is selected based on at least one of the behavioral parameters or the sensor used for acquiring the sequence of representations, and code for predicting, using the processing circuitry, the future movement of the first movable object based on the plurality of behavioral maps. Also this aspect of the present disclosure provides similar advantages as discussed above in relation to the previous aspects of the present disclosure.
Software executed by the server for operation in accordance with the present disclosure may be stored on a computer readable medium, which may be any type of memory device, including one of a removable nonvolatile random access memory, a hard disk drive, a floppy disk, a CD-ROM, a DVD-ROM, a USB memory, an SD memory card, or a similar computer readable medium known in the art.
In summary, the present disclosure generally relates to a novel concept for using machine learning to predict a future movement associated with a first movable object. The present disclosure also relates to a corresponding system and a computer program product.
Further features of, and advantages with, the present disclosure will become apparent when studying the appended claims and the following description. The skilled addressee realizes that different features of the present disclosure may be combined to create embodiments other than those described in the following, without departing from the scope of the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
The various aspects of the present disclosure, including its particular features and advantages, will be readily understood from the following detailed description and the accompanying drawings, in which:
Fig. 1 conceptually illustrates a computer system according to an embodiment of the present disclosure connected to a vehicle;
Figs. 2A - 2C present a possible use of the computer system in relation to an airborne drone;
Fig. 3 is a flow chart illustrating the steps of performing the method according to a currently preferred embodiment of the present disclosure, and
Fig. 4 conceptually shows a possible implementation of a machine learning component that may be used in relation to the present disclosure.
DETAILED DESCRIPTION
The present disclosure will now be described more fully hereinafter with reference to the accompanying drawings, in which currently preferred embodiments of the present disclosure are shown. The present disclosure may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for thoroughness and completeness, and to fully convey the scope of the present disclosure to the skilled person. Like reference characters refer to like elements throughout. The following examples illustrate the present disclosure and are not intended to limit the same.

Turning now to the drawings and to Fig. 1 in particular, there is conceptually illustrated a computer system 100 according to an embodiment of the present disclosure. The purpose of the computer system 100 is, in one embodiment, to dynamically observe and analyze a behavior of a first movable object for predicting the future movement of the first object.
In a possible embodiment, the computer system 100 comprises processing circuitry 102 and a plurality of sensors, for example including a camera 104, a lidar sensor 106 and/or a radar sensor 108.
For reference, the processing circuitry 102 may for example be manifested as a general-purpose processor, an application specific processor, a circuit containing processing components, a group of distributed processing components, a group of distributed computers configured for processing, a field programmable gate array (FPGA), etc. The processor may be or include any number of hardware components for conducting data or signal processing or for executing computer code stored in memory. The memory may be one or more devices for storing data and/or computer code for completing or facilitating the various methods described in the present description. The memory may include volatile memory or non-volatile memory. The memory may include database components, object code components, script components, or any other type of information structure for supporting the various activities of the present description. According to an exemplary embodiment, any distributed or local memory device may be utilized with the systems and methods of this description. According to an exemplary embodiment the memory is communicably connected to the processor (e.g., via a circuit or any other wired, wireless, or network connection) and includes computer code for executing one or more processes described herein.
Preferably, the computer system 100 is connected to a network, such as the Internet 110, allowing the computer system 100 to communicate and exchange information with e.g. a remotely located server 112 having a remote database 114 connected thereto. The remotely located server 112 may be arranged to receive information about the first object and to provide the computer system 100 with general directions for operation.
To allow the computer system 100 to communicate with the remotely located server 112, the computer system 100 may further comprise a transceiver 116 adapted to allow for any form of wireless connection, such as WLAN, CDMA, GSM, GPRS, 3G/4G/5G mobile communications, or similar. Other present or future wireless communication protocols are possible and within the scope of the present disclosure.

With further reference to Figs. 2A - 2C, there is shown a possible approach for implementing the computer system 100 in relation to a movable second object, where the movable second object is in the form of an airborne drone 200. As indicated above, the scheme according to the present disclosure could be implemented in relation to any other form of stationary or movable (second) object for determining a movement of another (first) object (or e.g. a group of objects).
As shown in Fig. 2A, a group of persons 202, 204, 206, 207 is illustrated as walking along a road 208. Each of the persons 202, 204, 206, 207 is, within the scope of the present disclosure, defined as a first moving object. The road 208, as well as e.g. the trees 210 along the road 208 (and other similar objects), are considered to be stationary objects in the vicinity of the first moving object(s) 202, 204, 206, 207.
Furthermore, in Fig. 2A there is shown a plurality of airborne drones 212, 214, 216 flying at a distance from the persons 202, 204, 206, 207. Each of the drones comprises a computer system 100 as presented in Fig. 1 and further detailed in Fig. 2B, preferably each arranged in communication with the remote server 112. It may also be possible to allow the airborne drones 212, 214, 216 to communicate directly with each other, using any form of wireless communication protocol, e.g. as suggested above.
During operation of the computer system 100 when comprised with e.g. the airborne drone 212, and with further reference to Fig. 3, the airborne drone 212 (or group of airborne drones) has been assigned to perform a reconnaissance mission, dynamically travelling from a start position to a destination. The destination could be the same position as the start position. When performing the reconnaissance mission, it is desirable to minimize any interaction with e.g. humans (such as the persons 202, 204, 206, 207), since such an interaction could potentially result in undesired knowledge of the fact that the reconnaissance mission is ongoing.
To minimize the interaction between the airborne drone 212 and the persons 202, 204, 206, 207, the computer system 100 has been adapted to implement the scheme according to the present disclosure. As exemplified in Fig. 2B, the camera 104 (or any other sensor comprised with the drone 212) is used for collecting information about the surroundings of the drone 212.
The information from the camera 104 is received, S1, at the processing circuitry 102. When the drone 212 is within a distance from the persons 202, 204, 206, 207 allowing sufficiently clear images of the persons 202, 204, 206, 207 to be collected, the images may be defined to comprise representations indicative of a present behavior of the persons 202, 204, 206, 207. Such behavior may be defined to include a plurality of different behavioral parameters, where the plurality of different behavioral parameters may for example include a direction of movement of the persons 202, 204, 206, 207 and sizes of the persons 202, 204, 206, 207. Other behavioral parameters that may be identified from e.g. the images from the camera 104 include different positions of the persons' 202, 204, 206, 207 faces, heads, or upper bodies. These positions may, for example, be the eyes, eyelids, eyebrows, nose, mouth, cheeks, neck, shoulders, arms, etc.
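As a non-limiting sketch of how such behavioral parameters could be derived from the images, and assuming a hypothetical upstream detector (not shown) that supplies one bounding box per frame for a tracked person, the direction of movement and apparent size may be estimated as follows; all names are illustrative only.

```python
# Hypothetical illustration of turning successive camera detections into
# two of the behavioral parameters named above: direction of movement and
# apparent size. A real detector (not shown) would supply the boxes.
import numpy as np

def direction_and_size(boxes: np.ndarray):
    """boxes: (T, 4) array of [x_min, y_min, x_max, y_max] per frame."""
    centers = np.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                        (boxes[:, 1] + boxes[:, 3]) / 2], axis=1)
    step = centers[-1] - centers[0]   # net displacement over the window
    direction = step / (np.linalg.norm(step) + 1e-8)
    size = float(np.mean((boxes[:, 2] - boxes[:, 0]) *
                         (boxes[:, 3] - boxes[:, 1])))
    return direction, size

# A person drifting right and slightly down over five frames.
boxes = np.array([[10, 20, 30, 80], [12, 21, 32, 81], [14, 22, 34, 82],
                  [16, 23, 36, 83], [18, 24, 38, 84]], dtype=float)
print(direction_and_size(boxes))
```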
The camera 104 may also detect, with further reference to Fig. 2C, if the head, or eyes, of a person is rotating to the right or left (yaw), 218, rotating up or down (pitch), 220, or, in the case of head movements, leaning towards the right or left shoulder (roll), 222. If the camera 104 is provided with e.g. high-quality optics, it could also be possible to detect e.g. an eye gaze direction for each of the persons 202, 204, 206, 207.
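Purely as an illustration of the yaw 218, pitch 220 and roll 222 measurements, and assuming a hypothetical upstream pose estimator that supplies a head rotation matrix, the three angles may be recovered with a standard Z-Y-X Euler decomposition; the helper below is not part of the disclosed system.

```python
# Hypothetical helper: recover the yaw (218), pitch (220) and roll (222)
# angles from a head rotation matrix R estimated by some upstream pose
# estimator (not shown), using the common Z-Y-X Euler convention.
import numpy as np

def rotation_to_yaw_pitch_roll(R: np.ndarray):
    # Clip guards against numerical drift outside the domain of arcsin.
    pitch = np.arcsin(np.clip(-R[2, 0], -1.0, 1.0))
    yaw = np.arctan2(R[1, 0], R[0, 0])
    roll = np.arctan2(R[2, 1], R[2, 2])
    return np.degrees(yaw), np.degrees(pitch), np.degrees(roll)

# Example: a head turned 30 degrees to the left (pure yaw).
a = np.radians(30.0)
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a),  np.cos(a), 0.0],
              [0.0,        0.0,       1.0]])
print(rotation_to_yaw_pitch_roll(R))  # approximately (30.0, 0.0, 0.0)
```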
Based on the received images from the camera 104 (or from any other sensor), it may be possible for the processing circuitry 102 to form, S2, a plurality of behavioral maps for the first movable object by transforming the plurality of representations for each of the plurality of different behavioral parameters using at least one machine learning component.
As indicated above, in determining the behavioral map, the at least one machine learning component reviews at least a present and a predetermined number of previous representations for the first movable object, i.e. possible historical data on how the persons 202, 204, 206, 207 have behaved, both individually and as a group.
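A minimal sketch of keeping such a representation history is given below, assuming that the predetermined number M of previous representations may be adjusted at runtime (cf. the dynamic window discussed above); the buffer type and payload are illustrative only.

```python
# Minimal sketch of holding the present representation together with a
# predetermined number M of previous ones. M is made adjustable to mirror
# the dynamic window mentioned in the text; the payload is arbitrary.
from collections import deque

class RepresentationHistory:
    def __init__(self, m_previous: int):
        self._buffer = deque(maxlen=m_previous + 1)  # M previous + present

    def push(self, representation):
        self._buffer.append(representation)

    def window(self):
        return list(self._buffer)  # oldest first, present last

    def resize(self, m_previous: int):
        # Dynamic window: keep the most recent entries that still fit.
        self._buffer = deque(self._buffer, maxlen=m_previous + 1)

history = RepresentationHistory(m_previous=3)
for t in range(6):
    history.push({"t": t, "position": (t, 2 * t)})
print(history.window())   # representations for t = 2..5
history.resize(1)
print(history.window())   # only t = 4 and t = 5 remain
```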
Once the behavioral maps have been determined, the processing circuitry 102 may combine and/or fuse together information comprised with the behavioral maps for predicting, S3, the future movement of the persons 202, 204, 206, 207. That is, each of the behavioral maps will provide a component in the overall prediction of how each of the persons 202, 204, 206, 207, seen individually, as well as the group of persons 202, 204, 206, 207, likely will behave, and specifically move, including for example in what direction the persons 202, 204, 206, 207 likely will look, how fast they will walk, etc.
Based on the predicted future movement of the persons 202, 204, 206, 207, the processing circuitry 102 may determine or form control commands that may be used by the drone 212 (as well as the other drones 214, 216) to operate the drone 212 such that it will be positioned in an undisclosed position in relation to the predicted future movement of the persons 202, 204, 206, 207. The control commands may in some embodiments be defined as a trajectory for the drone 212 that may be interpreted by control means comprised with the drone 212 for controlling e.g. electrical motors comprised with the drone 212.

The implementation of the machine learning component 400, as part of e.g. the processing circuitry 102 and with further reference to Fig. 4, may on a high level be described as an encoder block 402 and a decoder block 404 that may define the backbone of the machine learning component 400. During training, the machine learning component 400 is also dependent on additional software components that may be used to administer data, guide the training process and optimize the machine learning component 400.
In greater detail, the above-mentioned software components may comprise:
(i) A first module adapted to generate training sequences and labels, and to batch and randomly distribute samples from the training pool to the machine learning component for processing. The labels are sequences of data in the same output domain that the machine learning component 400 operates in, and are used as ground truth to objectively measure how close or far away the current iteration of the machine learning component 400 is from a desired state.
(ii) A second module that objectively measures the deviation between the current prediction from the machine learning component 400 and the label distributed by the data-administering module defined in (i), and propagates the individual gradients for the neuron connections backwards to the input neurons.
(iii) A third module that collects the gradients from all connections between the neurons in the machine learning component 400 and calculates the individual amount of adjustment needed to better represent the labeled data.
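By way of example only, the interplay between modules (i) - (iii) corresponds to a conventional supervised training loop. The sketch below substitutes a toy linear model for the encoder/decoder blocks 402/404 and plain gradient descent for the optimizer; all names, the loss function, and the data are assumptions made for the illustration.

```python
# Sketch of how modules (i)-(iii) cooperate during training, using a toy
# linear model in place of the encoder/decoder 402/404. All names are
# illustrative; the real components operate on behavioral-map sequences.
import numpy as np

rng = np.random.default_rng(0)

def sample_batch(batch_size=32):
    # Module (i): generate training data and labels, and hand out a
    # randomly drawn batch from the training pool.
    x = rng.normal(size=(batch_size, 3))
    y = x @ np.array([1.5, -2.0, 0.5]) + 0.3   # ground-truth labels
    return x, y

w, b = np.zeros(3), 0.0
for step in range(500):
    x, y = sample_batch()
    pred = x @ w + b
    # Module (ii): objectively measure prediction vs. label (squared
    # error) and propagate gradients back toward the inputs.
    err = pred - y
    grad_w = 2 * x.T @ err / len(y)
    grad_b = 2 * err.mean()
    # Module (iii): collect the gradients and compute the individual
    # adjustment of each parameter (plain gradient descent here).
    w -= 0.1 * grad_w
    b -= 0.1 * grad_b

print(np.round(w, 2), round(b, 2))  # approaches [1.5, -2.0, 0.5] and 0.3
```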
In relation to the machine learning component 400, the encoder block 402 is preferably implemented to have an equal number of input neurons as input data streams; in other words, the dimension of the input data shall preferably match the input dimension of the machine learning component 400. The function of the encoder block 402 is here to encode temporal information from the past and combine it with a sample from the input domain, that is, the output is defined as h(n + 1) = i(n) + h(n - 1) + h(n - 2) + ... + h(n - M), where M is the defined sequence length of past history needed to step one increment into the future, h(n + 1). In relation to the above, h(n) represents the output from the encoder block 402 and i(n) represents the input to the encoder block 402. The encoder block 402 preferably consists of K stacked layers.
The output dimension of the encoder block 402 is determined by the number of neurons, or 'hidden units', in the last of the stacked layers in the encoder block 402. The information contained in the output from the stacked encoder block 402 is a multidimensional representation of the past and current state. The output from the encoder block is passed on to the decoder block 404, where the decoder block in turn transforms the latent representation into two parts: firstly, new input to the encoder block 402; secondly, an element of the predicted output sequence.
The output sequence may then be generated by sampling from a Gaussian mixture model, due to the multimodal nature of the problem domain. A positive side effect is that it also becomes possible to gauge the uncertainty in the prediction, yielding greater confidence in the predicted data.
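A compact sketch of the arrangement described above is given below, under the stated assumptions that the K stacked layers are LSTM layers, that each output element is two-dimensional, and that the Gaussian mixture has diagonal covariance; none of these choices is fixed by the present disclosure, and all dimensions are illustrative.

```python
# Sketch, under stated assumptions, of the stacked encoder (K LSTM
# layers), a decoder producing both the next encoder input and one output
# element, and sampling that element from a Gaussian mixture to reflect
# the multimodal output domain.
import torch
import torch.nn as nn

class EncoderDecoderGMM(nn.Module):
    def __init__(self, input_dim=4, hidden=64, layers=2, mixtures=3):
        super().__init__()
        self.encoder = nn.LSTM(input_dim, hidden, num_layers=layers,
                               batch_first=True)          # K stacked layers
        self.next_input = nn.Linear(hidden, input_dim)    # feeds encoder again
        self.gmm_head = nn.Linear(hidden, mixtures * 5)   # pi, mu(2), sigma(2)
        self.mixtures = mixtures

    def forward(self, history, horizon):
        # history has shape (batch, M, input_dim): the M past steps.
        out, state = self.encoder(history)
        h, samples = out[:, -1], []
        for _ in range(horizon):
            p = self.gmm_head(h).view(-1, self.mixtures, 5)
            pi = torch.softmax(p[..., 0], dim=-1)
            mu, sigma = p[..., 1:3], torch.exp(p[..., 3:5])
            # Sample one mixture component, then a 2-D point from it; the
            # spread of sigma doubles as an uncertainty gauge.
            k = torch.multinomial(pi, 1).squeeze(-1)
            idx = torch.arange(len(k))
            point = mu[idx, k] + sigma[idx, k] * torch.randn_like(sigma[idx, k])
            samples.append(point)
            # The decoder also generates the next input for the encoder.
            step_in = self.next_input(h).unsqueeze(1)
            out, state = self.encoder(step_in, state)
            h = out[:, -1]
        return torch.stack(samples, dim=1)  # (batch, horizon, 2)

model = EncoderDecoderGMM()
past = torch.randn(8, 10, 4)         # batch of 8, M = 10 past steps
print(model(past, horizon=5).shape)  # torch.Size([8, 5, 2])
```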
In summary, the present disclosure relates to a computer implemented method for using machine learning to predict a future movement associated with a first movable object, the method comprising the steps of receiving a sequence of representations indicative of present behavior of the first movable object, wherein the sequence of representations comprises a plurality of different behavioral parameters, forming a plurality of behavioral maps for the first movable object by transforming the plurality of representations for each of the plurality of different behavioral parameters using at least one machine learning component, and predicting the future movement of the first movable object based on the plurality of behavioral maps.
By means of the present disclosure, the overall behavior of the first object may be observed and analyzed for predicting the future movement of the first object.
The control functionality of the present disclosure may be implemented using existing computer processors, or by a special purpose computer processor for an appropriate system, incorporated for this or another purpose, or by a hardwire system. Embodiments within the scope of the present disclosure include program products comprising machine-readable media for carrying or having machine-executable instructions or data structures stored thereon. Such machine-readable media can be any available media that can be accessed by a general purpose or special purpose computer or other machine with a processor. By way of example, such machine-readable media can comprise RAM, ROM, EPROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to carry or store desired program code in the form of machine-executable instructions or data structures and which can be accessed by a general purpose or special purpose computer or other machine with a processor. When information is transferred or provided over a network or another communications connection (either hardwired, wireless, or a combination of hardwired or wireless) to a machine, the machine properly views the connection as a machine-readable medium. Thus, any such connection is properly termed a machine-readable medium. Combinations of the above are also included within the scope of machine-readable media. Machine-executable instructions include, for example, instructions and data which cause a general-purpose computer, special purpose computer, or special purpose processing machines to perform a certain function or group of functions.
Although the figures may show a sequence, the order of the steps may differ from what is depicted. Furthermore, two or more steps may be performed concurrently or with partial concurrence. Such variation will depend on the software and hardware systems chosen and on designer choice. All such variations are within the scope of the disclosure. Likewise, software implementations could be accomplished with standard programming techniques with rule-based logic and other logic to accomplish the various connection steps, processing steps, comparison steps and decision steps. Additionally, even though the present disclosure has been described with reference to specific exemplifying embodiments thereof, many different alterations, modifications and the like will become apparent for those skilled in the art.
In addition, variations to the disclosed embodiments can be understood and effected by the skilled addressee in practicing the claimed present disclosure, from a study of the drawings, the disclosure, and the appended claims. Furthermore, in the claims, the word "comprising" does not exclude other elements or steps, and the indefinite article "a" or "an" does not exclude a plurality.

Claims

1. A computer implemented method for using machine learning to predict a future movement associated with a first movable object, the method comprising the steps of:
- receiving, at a processing circuitry, a sequence of representations indicative of present behavior of the first movable object, wherein the sequence of representations is acquired using a sensor and comprises a plurality of different behavioral parameters,
- forming, using the processing circuitry, a plurality of behavioral maps for the first movable object by transforming the plurality of representations for each of the plurality of different behavioral parameters using at least one machine learning component,
- applying, using the processing circuitry, a predetermined noise reduction or transformation scheme to at least a portion of the plurality of behavioral maps, wherein the predetermined noise reduction or transformation scheme is selected based on at least one of the behavioral parameters or the sensor used for acquiring the sequence of representations, and
- predicting, using the processing circuitry, the future movement of the first movable object based on the plurality of behavioral maps.
2. The method according to claim 1, wherein the behavioral parameters are selected from a group comprising object pose, head pose, eye gaze, visual field, velocity, position, acceleration or an interaction map for the first movable object.
3. The method according to any one of claims 1 and 2, wherein a dedicated machine learning component is selected for each of at least a portion of the behavioral parameters.
4. The method according to any one of the preceding claims, wherein the at least one machine learning component forms its behavioral map based on a present and a predetermined number of previous representations for the first movable object.
5. The method according to claim 4, wherein the predetermined number of previous representations for the first movable object is dynamic.
6. The method according to any one of the preceding claims, wherein the steps of receiving, forming, applying and predicting are performed using processing circuitry comprised with a second object, the second object being different from the first object.
7. The method according to claim 6, wherein the second object is movable and the method further comprises the step of:
- forming, using the processing circuitry, control commands for the second movable object based on the predicted future movement of the first movable object.
8. The method according to claim 7, wherein the control commands represent a trajectory for the second movable object.
9. The method according to any one of claims 7 and 8, wherein the control commands for the second movable object are formed to reduce an interaction between the first and the second object.
10. The method according to any one of the preceding claims, wherein the first object is a human.
11. The method according to any one of claims 7 - 10, wherein the second object is at least one of a craft or a vessel.
12. The method according to any one of claims 7 - 10, wherein the second object is an unmanned aerial vehicle (UAV) or a vehicle.
13. The method according to any one of the preceding claims, wherein the sequence of representations is based on data generated by an image sensor, a lidar sensor or a radar sensor.
14. The method according to any one of the preceding claims, wherein the sequence of representations comprises information relating to at least one static object located in a vicinity of the first movable object.
15. A computer system comprising processing circuitry, the computer system arranged to predict a future movement associated with a first movable object by adapting the processing circuitry to:
- receive a sequence of representations indicative of present behavior of the first movable object, wherein the sequence of representations is acquired using a sensor and comprises a plurality of different behavioral parameters,
- form a plurality of behavioral maps for the first movable object by transforming the plurality of representations for each of the plurality of different behavioral parameters using at least one machine learning component,
- apply a predetermined noise reduction or transformation scheme to at least a portion of the plurality of behavioral maps, wherein the predetermined noise reduction or transformation scheme is selected based on at least one of the behavioral parameters or the sensor used for acquiring the sequence of representations, and
- predict the future movement of the first movable object based on the plurality of behavioral maps.
16. The computer system according to claim 15, further comprising at least one sensor for collecting the sequence of representations.
17. The computer system according to claim 16, wherein the at least one sensor comprises an image sensor, a lidar sensor or a radar sensor.
18. A second object, comprising a computer system according to any one of claims 15 - 17.
19. The second object according to claim 18, wherein the first object is a human.
20. The second object according to any one of claims 18 - 19, wherein the second object is movable.
21. The second object according to claim 20, wherein the second object is at least one of an unmanned aerial vehicle (UAV) or a vehicle.
22. The second object according to any one of claims 20 and 21, wherein the processing circuitry is further adapted to:
- form control commands for the second object based on the predicted future movement of the first movable object.
23. The second object according to claim 22, wherein the control commands represent a trajectory for the second object.
24. The second object according to any one of claims 22 and 23, wherein the control commands for the second object are formed to reduce an interaction between the first and the second object.
25. A computer program product comprising a non-transitory computer readable medium having stored thereon computer program means for operating a computer system to predict a future movement associated with a first movable object using machine learning, the computer system comprising processing circuitry, wherein the computer program product comprises:
- code for receiving, at the processing circuitry, a sequence of representations indicative of present behavior of the first movable object, wherein the sequence of representations is acquired using a sensor and comprises a plurality of different behavioral parameters,
- code for forming, using the processing circuitry, a plurality of behavioral maps for the first movable object by transforming the plurality of representations for each of the plurality of different behavioral parameters using at least one machine learning component,
- code for applying, using the processing circuitry, a predetermined noise reduction or transformation scheme to at least a portion of the plurality of behavioral maps, wherein the predetermined noise reduction or transformation scheme is selected based on at least one of the behavioral parameters or the sensor used for acquiring the sequence of representations, and
- code for predicting, using the processing circuitry, the future movement of the first movable object based on the plurality of behavioral maps.
PCT/SE2020/051222 2019-12-18 2020-12-16 Method and system for predicting movement WO2021126062A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
SE1951488-4 2019-12-18
SE1951488A SE1951488A1 (en) 2019-12-18 2019-12-18 Method and system for predicting movement

Publications (1)

Publication Number Publication Date
WO2021126062A1 true WO2021126062A1 (en) 2021-06-24

Family

ID=76476679

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/SE2020/051222 WO2021126062A1 (en) 2019-12-18 2020-12-16 Method and system for predicting movement

Country Status (2)

Country Link
SE (1) SE1951488A1 (en)
WO (1) WO2021126062A1 (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150110388A1 (en) * 2007-07-11 2015-04-23 Behavioral Recognition Systems, Inc. Semantic representation module of a machine-learning engine in a video analysis system
US9248834B1 (en) * 2014-10-02 2016-02-02 Google Inc. Predicting trajectories of objects based on contextual information
US20180124423A1 (en) * 2016-10-28 2018-05-03 Nec Laboratories America, Inc. Dynamic scene prediction with multiple interacting agents
US20190012574A1 (en) * 2017-07-05 2019-01-10 Perceptive Automata System and method of predicting human interaction with vehicles
US20190205667A1 (en) * 2017-12-29 2019-07-04 Here Global B.V. Method, apparatus, and system for generating synthetic image data for machine learning
US20190302767A1 (en) * 2018-03-28 2019-10-03 Zoox, Inc. Temporal prediction model for semantic intent understanding
US10496091B1 (en) * 2016-08-17 2019-12-03 Waymo Llc Behavior and intent estimations of road users for autonomous vehicles
US20190382007A1 (en) * 2018-06-15 2019-12-19 Uber Technologies, Inc. Multi-Task Machine-Learned Models for Object Intention Determination in Autonomous Driving
WO2020160276A1 (en) * 2019-01-30 2020-08-06 Perceptive Automata, Inc. Neural network based navigation of autonomous vehicles through traffic entities
EP3706034A1 (en) * 2019-03-06 2020-09-09 Robert Bosch GmbH Movement prediction of pedestrians useful for autonomous driving
