US20160350908A1 - Method and system for detecting sea-surface oil - Google Patents

Method and system for detecting sea-surface oil

Info

Publication number
US20160350908A1
Authority
US
United States
Prior art keywords
video
oil
sea
surface oil
foreground
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/232,743
Inventor
Wesley Kenneth Cobb
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Behavioral Recognition Systems Inc
Original Assignee
Behavioral Recognition Systems Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Behavioral Recognition Systems Inc
Priority to US15/232,743
Publication of US20160350908A1
Legal status: Abandoned

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/52 Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/0002 Inspection of images, e.g. flaw detection
    • G06T7/0004 Industrial image inspection
    • G06K9/00718
    • G06K9/00771
    • G06T7/0079
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06T2207/20144
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30108 Industrial image inspection

Definitions

  • the system may include definitions used to recognize the occurrence of a number of pre-defined events, e.g., the system may evaluate the appearance of an object classified as depicting a car (a vehicle-appear event) coming to a stop over a number of frames (a vehicle-stop event). Thereafter, a new foreground object may appear and be classified as a person (a person-appear event) and the person then walks out of frame (a person-disappear event). Further, the system may be able to recognize the combination of the first two events as a “parking-event.”
  • No currently available video surveillance system is capable of reliably identifying sea-surface oil, which can result from operations incident to the normal operation of an offshore oil platform or oil spills, leaks, etc.
  • a system designed to identify sea-surface oil must address constant variations in the maritime environment, including changes in illumination angle, transparency, aerosols, haze, cloud cover, and transitions between night and day. Such variations can produce false-positive and otherwise erroneous identifications of sea-surface oil.
  • One embodiment of the invention includes a method for analyzing a scene depicted in an input stream of video frames captured by one or more cameras.
  • This method may include, for one or more of the video frames, identifying one or more foreground blobs in the video frame.
  • Each foreground blob may correspond to one or more contiguous pixels of the video frame determined to depict sea-surface oil.
  • This method may further include evaluating the one or more foreground blobs to derive expected patterns of observations of sea-surface oil within a field-of-view of the cameras.
  • the input stream of video frames may be generated by one or more long wavelength infrared (LWIR) cameras.
  • LWIR long wavelength infrared
  • Another embodiment includes a method of analyzing a scene depicted in an input stream of video frames.
  • This method includes, for one or more of the video frames, identifying one or more foreground blobs in the video frame.
  • Each foreground blob generally corresponds to contiguous pixels of the video frame determined by a behavior recognition system to depict a patch of sea-surface oil.
  • the behavior recognition system is configured to learn to distinguish between foreground objects depicting patches of sea-surface oil and false positive detections of patches of sea-surface oil resulting from noise occurring in the one or more video frames.
  • an alert message is generated. Examples of noise that result in false-positive foreground blobs may include lighting, absorption, and extinction artifacts in the video frames.
  • Embodiments of the present invention provide a method and a system for analyzing and learning to identify unusual dispersions of oil floating on a liquid surface.
  • a computer vision engine may be configured to process video frames from multiple cameras observing a common region of sea surface. The computer vision engine may evaluate frames of video to determine what pixels depict seawater (background) and what pixels depict oil floating on the sea surface (foreground). Contiguous regions of pixels classified as foreground are passed to a machine learning engine, which observes a variety of features of the foreground blobs to learn expected patterns in the scene and issue an alert when unexpected, anomalous oil patches are observed.
  • a machine learning engine is configured to build models of certain behaviors within the scene based on the foreground blobs and extracted features, and determine whether observations indicate that the behavior of an object is anomalous or not, relative to the model.
  • the machine learning engine may model observed sea-surface oil over time, and determine whether any given foreground blob corresponding to sea-surface oil is unusual or anomalous relative to prior sea-surface oil which has been observed.
  • the machine learning engine may issue an alert when anomalous sea-surface oil is observed so that the oil may be investigated.
  • routines executed to implement the embodiments of the invention may be part of an operating system or a specific application, component, program, module, object, or sequence of instructions.
  • the computer program of the present invention is comprised typically of a multitude of instructions that will be translated by the native computer into a machine-readable format and hence executable instructions.
  • programs are comprised of variables and data structures that either reside locally to the program or are found in memory or on storage devices.
  • various programs described herein may be identified based upon the application for which they are implemented in a specific embodiment of the invention. However, it should be appreciated that any particular program nomenclature that follows is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.
  • FIG. 1 illustrates components of a video analysis and behavior-recognition system 100, according to one embodiment.
  • the behavior-recognition system 100 includes a video input source 105, a network 110, a computer system 115, and input and output devices 118 (e.g., a monitor, a keyboard, a mouse, a printer, and the like).
  • the network 110 may transmit video data recorded by the video input 105 to the computer system 115.
  • the computer system 115 includes a CPU 120, storage 125 (e.g., a disk drive, optical disk drive, floppy disk drive, and the like), and a memory 130 which includes both a computer vision engine 135 and a machine-learning engine 140.
  • the computer vision engine 135 and the machine-learning engine 140 may provide software applications configured to analyze a sequence of video frames provided by the video input 105.
  • video input source 105 may capture the infrared spectrum instead of visible light. Further, multiple cameras could be band-pass filtered to capture different wavelength bands within the infrared spectrum. In such a case, images from each camera could be registered to one another, allowing a composite image to be generated from the multiple cameras. As described in greater detail below, by using multiple observations of the sea surface in different wavelength bands of the infrared spectrum, the contrast between oil and seawater may be enhanced, making the oil more readily detectable to the background/foreground module.
  • the computer vision engine may filter clutter from the video input source, reducing input images to largely black (background) regions representing seawater and white (foreground) regions representing oil in the field of view of the cameras.
  • the machine learning engine learns to filter noise from the observations of sea-surface oil, and generates alerts after observing an unusual appearance (or behavior) of sea-surface oil.
  • examples of noise that result in false-positive foreground blobs (i.e., false positive detections of patches of sea-surface oil) include lighting, absorption, and extinction artifacts in the video frames.
  • the computer vision engine 135 is configured to receive input from a multiplexor module which multiplexes multiple data channels.
  • the computer vision engine 135 may itself include the multiplexor module.
  • the multiplexor module may process data from three (or more) channels, co-adding image data and performing operations on the video streams. Each channel may correspond to a camera capturing a different portion of the infrared spectrum.
  • the cameras may be positioned collinear to one another. That is, the cameras may each share a substantially identical field of view. Further, the images from the cameras may be registered to one another. As noted, however, each camera may cover a different band of the infrared spectrum. That is, each camera subsamples a different band of the infrared spectrum.
  • each camera is a long wavelength infrared (LWIR) camera with configurable filters.
  • the multiplexor module may take the video signals from the video sources, combine them, as further described herein, and pass the information to the computer vision engine 135.
  • the machine-learning engine 140 receives the video frames and the data generated by the computer vision engine 135.
  • the machine-learning engine 140 may be configured to analyze the received data, cluster objects having similar visual and/or kinematic features, and build semantic representations of events depicted in the video frames. Over time, the machine learning engine 140 learns expected patterns of behavior for objects that map to a given cluster. Thus, over time, the machine learning engine learns from these observed patterns to identify normal and/or abnormal events. That is, rather than having patterns, objects, object types, or activities defined in advance, the machine learning engine 140 builds its own model of what different object types have been observed (e.g., based on clusters of kinematic and/or appearance features) as well as a model of expected behavior for a given object type. Thereafter, the machine learning engine can decide whether the behavior of an observed event is anomalous or not based on prior learning.
  • Data describing whether anomalous sea-surface oil has been determined and/or describing the anomalous sea-surface oil may be provided to output devices 118 to issue alerts (e.g., an alert message presented on a GUI interface screen).
  • FIG. 1 illustrates merely one possible arrangement of the behavior-recognition system 100.
  • although the video input source 105 is shown connected to the computer system 115 via the network 110, the network 110 is not always present or needed (e.g., the video input source 105 may be directly connected to the computer system 115).
  • various components and modules of the behavior-recognition system 100 may be implemented in other systems.
  • the computer vision engine 135 may be implemented as a part of a video input device (e.g., as a firmware component wired directly into a video camera). In such a case, the output of the video camera may be provided to the machine-learning engine 140 for analysis.
  • the output from the computer vision engine 135 and machine-learning engine 140 may be supplied over computer network 110 to other computer systems.
  • the computer vision engine 135 and machine-learning engine 140 may be installed on a server system and configured to process video from multiple input sources (i.e., from multiple cameras).
  • a client application 250 running on another computer system may request (or receive) the results over network 110.
  • FIG. 2 further illustrates components of the computer vision engine 135 and the machine-learning engine 140 first illustrated in FIG. 1, according to one embodiment of the invention.
  • the computer vision engine 135 includes a background/foreground (BG/FG) component 205, a tracker component 210, an estimator/identifier component 215, and a context processor component 220.
  • BG/FG background/foreground
  • the components 205, 210, 215, and 220 provide a pipeline for processing an incoming sequence of video frames supplied by the video input source 105 (indicated by the solid arrows linking the components).
  • the output of one component may be provided to multiple stages of the component pipeline (as indicated by the dashed arrows) as well as to the machine-learning engine 140.
  • the components 205, 210, 215, and 220 may each provide a software module configured to provide the functions described herein.
  • the components 205, 210, 215, and 220 may be combined (or further subdivided) to suit the needs of a particular case, and additional components may be added to (or some may be removed from) a video surveillance system.
  • the BG/FG component 205 may be configured to separate each frame of video provided by the video input source 105 into a static part (the scene background) and a collection of volatile parts (the scene foreground).
  • the frame itself may include a two-dimensional array of pixel values for multiple channels (e.g., RGB channels for color video or grayscale channel or radiance channel for black and white video).
  • the BG/FG component 205 may model background states for each pixel using an adaptive resonance theory (ART) network. That is, each pixel may be classified as depicting scene foreground or scene background using an ART network modeling a given pixel.
  • ART adaptive resonance theory
  • the background may generally correspond to pixels depicting seawater
  • foreground may generally correspond to pixels depicting sea-surface oil.
  • the BG/FG component 205 may be configured to generate a mask used to identify which pixels of the scene are classified as depicting foreground and, conversely, which pixels are classified as depicting scene background. The BG/FG component 205 then identifies regions of the scene that contain a portion of scene foreground (referred to as a foreground “blob” or “patch”) and supplies this information to subsequent stages of the pipeline. Additionally, pixels classified as depicting scene background may be used to generate a background image modeling the scene.
  • the BG/FG component classifies pixels depicting surface oil as foreground.
  • the computer vision engine is being used as a “blob” detector/tracker, where blobs of pixels classified as foreground correspond to patches of sea-surface oil. In such a case, blobs do not need to address occlusion or depth ordering. Instead, blobs that intersect may be merged.
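The mask-to-blob step described above can be illustrated with a small connected-component sketch. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `find_blobs`, the list-of-lists mask format, and the choice of 4-connectivity are all hypothetical choices made for the example.

```python
from collections import deque


def find_blobs(mask):
    """Group foreground pixels (value 1) of a binary mask into 4-connected blobs.

    mask is a list of rows, each a list of 0/1 values (assumed format).
    Returns a list of blobs, each a list of (row, col) pixel coordinates.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(pixels)
    return blobs
```

Because touching regions end up in the same flood fill, intersecting patches are merged into one blob, which matches the note that occlusion and depth ordering need not be handled.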
  • the tracker component 210 may receive the foreground patches produced by the BG/FG component 205 and generate computational models for the patches.
  • the tracker component 210 may be configured to use this information, and each successive frame of raw-video, to attempt to track the motion of an object depicted by a given foreground patch as it moves about the scene. That is, the tracker component 210 provides continuity to other elements of the system by tracking a given object from frame-to-frame.
  • the estimator/identifier component 215 may receive the output of the tracker component 210 (and the BG/FG component 205) and identify a variety of kinematic and/or appearance features of a foreground object, e.g., size, height, width, and area (in pixels), reflectivity, shininess, rigidity, speed, velocity, etc.
  • the features of a foreground object may include the location and sizes of a foreground blob.
  • the computer vision engine could correct for distance and the solid angle effects distorting the size of a foreground object detected at different areas within the field of view of a camera.
  • Other features of a foreground blob may include rates of change in blob size and/or a measure of intensity (i.e., how bright the blob is), motion characteristics of the foreground blobs, whether the foreground blobs have non-sharp edges, whether the foreground blobs have high fractal dimension, and whether the foreground blobs are asymmetrical.
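A few of the blob features listed above (size, position, brightness, rate of change in size) can be computed directly from a blob's pixel list. The sketch below is illustrative only; the function names, the dictionary layout, and the input formats are assumptions, and it does not attempt the fractal-dimension or edge-sharpness features.

```python
def blob_features(pixels, frame):
    """Compute simple features for one blob.

    pixels: list of (row, col) coordinates belonging to the blob (assumed format).
    frame:  2D list of per-pixel brightness values for the same image.
    """
    area = len(pixels)
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    return {
        "area": area,                                    # size in pixels
        "centroid": (sum(rows) / area, sum(cols) / area),
        "height": max(rows) - min(rows) + 1,             # bounding-box height
        "width": max(cols) - min(cols) + 1,              # bounding-box width
        "mean_intensity": sum(frame[r][c] for r, c in pixels) / area,
    }


def size_change_rate(prev_area, curr_area, dt_seconds):
    """Rate of change in blob area, in pixels per second."""
    return (curr_area - prev_area) / dt_seconds
```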
  • the context processor component 220 may receive the output from other stages of the pipeline (i.e., the tracked objects, the background and foreground models, and the results of the estimator/identifier component 215). Using this information, the context processor 220 may be configured to generate a stream of context events regarding objects tracked (by tracker component 210) and evaluated (by estimator/identifier component 215). For example, the context processor component 220 may package a stream of micro-feature vectors and kinematic observations of an object and output this to the machine-learning engine 140, e.g., at a rate of 5 Hz. In one embodiment, the context events are packaged as a trajectory.
  • a trajectory generally refers to a vector packaging the kinematic data of a particular foreground object in successive frames or samples. Each element in the trajectory represents the kinematic data captured for that object at a particular point in time.
  • a complete trajectory includes the kinematic data obtained when an object is first observed in a frame of video along with each successive observation of that object up to when it leaves the scene (or becomes stationary to the point of dissolving into the frame background). Accordingly, assuming computer vision engine 135 is operating at a rate of 5 Hz, a trajectory for an object is updated every 200 milliseconds, until complete.
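The trajectory described above is essentially a time-ordered list of kinematic samples. At 5 Hz the sample period is 1000 ms / 5 = 200 ms, which the sketch below makes explicit. The class and field names (`Observation`, `Trajectory`, `update`, `close`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field

SAMPLE_RATE_HZ = 5.0
SAMPLE_PERIOD_MS = 1000.0 / SAMPLE_RATE_HZ  # 200 ms between updates at 5 Hz


@dataclass
class Observation:
    t_ms: float   # time since the object was first observed
    x: float      # blob position (assumed pixel coordinates)
    y: float
    area: float   # blob size in pixels


@dataclass
class Trajectory:
    object_id: int
    samples: list = field(default_factory=list)
    complete: bool = False

    def update(self, x, y, area):
        """Append one kinematic sample; called once per sample period."""
        if self.complete:
            raise ValueError("trajectory is already complete")
        t_ms = len(self.samples) * SAMPLE_PERIOD_MS
        self.samples.append(Observation(t_ms, x, y, area))

    def close(self):
        """Mark the trajectory complete (object left the scene or dissolved)."""
        self.complete = True
```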
  • the computer vision engine 135 may take the output from the components 205, 210, 215, and 220 describing the motions and actions of the tracked objects in the scene and supply this information to the machine-learning engine 140.
  • the context event package may include a list of foreground blobs (patches of surface oil) detected by the computer vision engine 135, the size and position of each blob, and a trajectory of a blob observed over time.
  • the context event package passed to the machine learning engine 140 could also include any other features of a foreground object detected or generated by components of the computer vision engine 135, as well as the raw data received from the video feeds.
  • the machine-learning engine 140 includes a long-term memory 225, a perceptual memory 230, an episodic memory 235, a workspace 240, codelets 245, a micro-feature classifier 255, a cluster layer 260 and a sequence layer 265.
  • the machine-learning engine 140 includes a client application 250, allowing the user to interact with the video surveillance system 100 using a graphical user interface.
  • the machine-learning engine 140 includes an event bus 222.
  • the components of the computer vision engine 135 and machine-learning engine 140 output data to the event bus 222.
  • the components of the machine-learning engine 140 may also subscribe to receive different event streams from the event bus 222.
  • the micro-feature classifier 255 may subscribe to receive the micro-feature vectors output from the computer vision engine 135.
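The event bus arrangement described above follows a publish/subscribe pattern: producers post events to named streams and consumers register for the streams they care about. A minimal sketch, assuming a hypothetical `EventBus` class and string stream names that do not come from the source:

```python
from collections import defaultdict


class EventBus:
    """Tiny in-process publish/subscribe bus (illustrative only)."""

    def __init__(self):
        # Maps a stream name to the list of handlers subscribed to it.
        self._subscribers = defaultdict(list)

    def subscribe(self, stream, handler):
        """Register a callable to receive every event published on a stream."""
        self._subscribers[stream].append(handler)

    def publish(self, stream, event):
        """Deliver an event to all handlers subscribed to the given stream."""
        for handler in self._subscribers[stream]:
            handler(event)
```

In this arrangement a classifier component would subscribe to a micro-feature stream, and the vision engine would publish each feature vector to that stream without knowing who consumes it.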
  • the workspace 240 provides a computational engine for the machine-learning engine 140.
  • the workspace 240 may be configured to copy information from the perceptual memory 230, retrieve relevant memories from the episodic memory 235 and the long-term memory 225, and select which codelets 245 to execute.
  • Each codelet 245 may be a software program configured to evaluate different sequences of events and to determine how one sequence may follow (or otherwise relate to) another (e.g., a finite state machine). More generally, each codelet may provide a software module configured to detect interesting patterns from the streams of data fed to the machine-learning engine. In turn, the codelet 245 may create, retrieve, reinforce, or modify memories in the episodic memory 235 and the long-term memory 225.
  • the machine-learning engine 140 performs a cognitive cycle used to observe, and learn, about patterns of behavior that occur within the scene.
  • the perceptual memory 230, the episodic memory 235, and the long-term memory 225 are used to identify patterns of behavior, evaluate events that transpire in the scene, and encode and store observations.
  • the perceptual memory 230 receives the output of the computer vision engine 135 (e.g., the context event stream).
  • the episodic memory 235 stores data representing observed events with details related to a particular episode, e.g., information describing time and space details related to an event.
  • the episodic memory 235 may encode specific details of a particular event, i.e., “what and where” something occurred within a scene, such as a particular vehicle (car A) moved to a location believed to be a parking space (parking space 5) at 9:43 AM.
  • the long-term memory 225 may store data generalizing events observed in the scene.
  • the long-term memory 225 may encode information capturing observations and generalizations learned by an analysis of the behavior of objects in the scene such as “vehicles in certain areas of the scene tend to be in motion,” “vehicles tend to stop in certain areas of the scene,” etc.
  • the long-term memory 225 stores observations about what happens within a scene with much of the particular episodic details stripped away.
  • memories from the episodic memory 235 and the long-term memory 225 may be used to relate and understand a current event, i.e., the new event may be compared with past experience, leading to reinforcement, decay, and adjustments to the information stored in the long-term memory 225, over time.
  • the long-term memory 225 may be implemented as an ART network and a sparse-distributed memory data structure.
  • the micro-feature classifier 255 may schedule a codelet 245 to evaluate the micro-feature vectors output by the computer vision engine 135.
  • the computer vision engine 135 may track objects frame-to-frame and generate micro-feature vectors for each foreground object at a rate of, e.g., 5 Hz.
  • the micro-feature classifier 255 may be configured to create clusters from this stream of micro-feature vectors. For example, each micro-feature vector may be supplied to an input layer of the ART network (or a combination of a self-organizing map (SOM) and ART network used to cluster nodes in the SOM).
  • SOM self organizing map
  • the ART network maps the micro-feature vector to a cluster in the ART network and updates that cluster (or creates a new cluster if the input micro-feature vector is sufficiently dissimilar to the existing clusters).
  • Each cluster is presumed to represent a distinct object type, and objects sharing similar micro-feature vectors (as determined using the choice and vigilance parameters of the ART network) may map to the same cluster.
  • each distinct cluster in the ART network generally represents a distinct type of object acting within the scene. And as new objects enter the scene, new object types may emerge in the ART network.
  • the micro-feature classifier 255 may assign an object type identifier to each cluster, providing a different object type for each cluster in the ART network.
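The cluster-or-create behavior governed by choice and vigilance parameters can be sketched with Fuzzy ART, one common member of the ART family. The source does not say which ART variant is used, so the variant, the class name `FuzzyART`, the parameter defaults, and the requirement that inputs be scaled to [0, 1] are all assumptions of this example.

```python
class FuzzyART:
    """Minimal Fuzzy ART clusterer (illustrative sketch, not the patented network)."""

    def __init__(self, vigilance=0.8, choice=0.001, learning_rate=1.0):
        self.rho = vigilance        # how similar an input must be to join a cluster
        self.alpha = choice         # choice parameter used when ranking clusters
        self.beta = learning_rate   # 1.0 means fast learning
        self.weights = []           # one weight vector per cluster

    @staticmethod
    def _complement_code(x):
        # Inputs are assumed to lie in [0, 1]; append 1 - x for each feature.
        return list(x) + [1.0 - v for v in x]

    @staticmethod
    def _fuzzy_and(a, b):
        return [min(p, q) for p, q in zip(a, b)]

    def train(self, x):
        """Map x to a cluster index, updating that cluster or creating a new one."""
        i = self._complement_code(x)
        norm_i = sum(i)

        def choice_value(j):
            w = self.weights[j]
            return sum(self._fuzzy_and(i, w)) / (self.alpha + sum(w))

        # Try clusters in order of decreasing choice value.
        for j in sorted(range(len(self.weights)), key=choice_value, reverse=True):
            w = self.weights[j]
            overlap = self._fuzzy_and(i, w)
            if sum(overlap) / norm_i >= self.rho:  # vigilance test passed
                self.weights[j] = [self.beta * o + (1.0 - self.beta) * old
                                   for o, old in zip(overlap, w)]
                return j
        # No existing cluster is similar enough: create a new one.
        self.weights.append(i)
        return len(self.weights) - 1
```

The returned index plays the role of the object type identifier: similar micro-feature vectors receive the same index, and a sufficiently dissimilar vector produces a new one.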
  • the micro-feature classifier 255 may supply the micro-feature vectors to a self-organizing map structure (SOM).
  • the ART network may cluster nodes of the SOM—and assign an object type identifier to each cluster.
  • each SOM node mapping to the same cluster is presumed to represent an instance of a common type of object.
  • the machine-learning engine 140 also includes a cluster layer 260 and a sequence layer 265.
  • the cluster layer 260 may be configured to generate clusters from the trajectories of objects classified by the micro-feature classifier 255 as being an instance of a common object type.
  • the cluster layer 260 uses a combination of a self-organizing map (SOM) and an ART network to cluster the kinematic data in the trajectories.
  • the sequence layer 265 may be configured to generate sequences encoding the observed patterns of behavior represented by the trajectories. And once generated, the sequence layer may identify segments within a sequence using a voting experts technique. Further, the sequence layer 265 may be configured to identify anomalous segments and sequences.
  • the machine learning engine 140 may observe foreground blobs (presumably patches of sea-surface oil) and, over time, identify where patches tend to appear, how frequently patches appear, how long a patch remains, how large patches tend to be, etc. And after observing a sea-surface area for a period of time, the machine learning engine 140 may distinguish between (1) patches of surface oil that occur incident to the normal operations of an offshore drilling platform and other spurious oil patches, and (2) patches of surface oil that need to be investigated or evaluated by platform personnel.
  • the machine learning engine 140 is used to learn to identify what are “normal” observations of sea-surface oil and what are “abnormal” or “unusual” observations that require investigation.
  • a machine-learning video analytics system may be configured to use a computer vision engine to observe a scene, generate information streams of observed activity, and to pass the streams to a machine learning engine.
  • the machine learning engine may engage in an undirected and unsupervised learning approach to learn patterns regarding the object behaviors in that scene. Thereafter, when unexpected (i.e., abnormal or unusual) behavior is observed, alerts may be generated.
  • a multiplexor module is configured to receive video streams (also referred to herein as “signals”) from three or more long-wavelength infrared (LWIR) cameras whose output is filtered by distinct band-pass filters and multiplex the signals to generate a single synthetic signal whose brightness indicates a match with an IR signature of sea-surface oil.
  • a computer vision engine determines, from the synthetic signal, foreground blobs representing patches of contiguous pixels having values indicating a match to the IR signature of oil, and further extracts features such as position, size, change in size, etc. which are pertinent to sea-surface oil.
  • a machine learning engine is configured to build models of behaviors within the scene based on the foreground blobs and extracted features, and determine whether observations indicate that the behavior of an object is anomalous or not, relative to the model.
  • the machine learning engine may model observed sea-surface oil over time (including spurious oil), and determine whether any given foreground blob corresponding to sea-surface oil is unusual or anomalous relative to prior sea-surface oil which has been observed.
  • the machine learning engine may issue an alert when anomalous sea-surface oil is observed so that the oil may be investigated.
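The idea of learning what is normal from prior observations and alerting on departures from it can be illustrated with a much simpler stand-in than the ART-based models described here: a running mean and standard deviation of one feature (say, blob area) with a z-score threshold. This substitution, the class name `RunningNorm`, and the threshold values are assumptions made purely for illustration.

```python
import math


class RunningNorm:
    """Online model of one feature; flags values far from what has been seen."""

    def __init__(self, threshold=3.0, min_samples=5):
        self.threshold = threshold      # z-score above which a value is unusual
        self.min_samples = min_samples  # learn quietly before flagging anything
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0                   # running sum of squared deviations

    def is_anomalous(self, value):
        if self.n < self.min_samples:
            return False
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0.0:
            return value != self.mean
        return abs(value - self.mean) / std > self.threshold

    def observe(self, value):
        """Score a new value against prior observations, then learn from it."""
        flagged = self.is_anomalous(value)
        # Welford's online update of the mean and squared deviations.
        self.n += 1
        delta = value - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (value - self.mean)
        return flagged
```

Routine patches of similar size would pass unflagged once the model has seen enough of them, while a patch several times larger than anything observed before would trigger an alert.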
  • FIG. 3 illustrates a system for generating a synthetic video stream for detecting sea-surface oil and deriving expected patterns in the synthetic video stream, according to one embodiment.
  • the system includes LWIR cameras 310-330 which capture the “thermal” part of the light spectrum. Captured light from each of the cameras 310-330 is filtered using a respective spectral band-pass filter to generate filtered signals in a distinct wavelength band.
  • objects made of normal matter (e.g., electrons, protons, and neutrons)
  • the emitted radiation may mostly be X-rays, ultraviolet light, visible light, infrared light, microwaves, or radio waves.
  • An idealized object called a blackbody that is perfectly efficient at this process would emit radiation energy as a function of the wavelength of the light according to Planck's law:
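For reference, the standard wavelength form of Planck's law is given below. The symbols follow common textbook notation, which is an assumption about the notation of the original equation: $h$ is Planck's constant, $c$ the speed of light, $k_B$ Boltzmann's constant, $T$ the absolute temperature, and $\lambda$ the wavelength.

```latex
\[
B_\lambda(T) = \frac{2 h c^{2}}{\lambda^{5}} \,
               \frac{1}{\exp\!\left(\dfrac{h c}{\lambda k_B T}\right) - 1}
\]
```

A real, non-ideal emitter is commonly modeled by scaling this blackbody radiance by a spectral emissivity between 0 and 1, i.e. $L_\lambda = \epsilon_\lambda \, B_\lambda(T)$.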
  • $\epsilon_\lambda$ is the spectral emissivity and will generally be a complicated function of wavelength, angle, and temperature, as an object can radiate more efficiently in some directions and/or colors of light than others and this dependence may vary with temperature.
  • the spectral reflectivity, which describes what fraction of energy incident upon an object is reflected back, may likewise be a function of temperature, angle, and wavelength.
  • the spectral emissivity and spectral reflectivity of oil and water at different temperatures are well-known.
  • the contrast (i.e., the difference) between modeled spectral radiances (here, the combination of emission and reflected radiances) of seawater and oil is shown in FIG. 4, which depicts the contrasts
  • the observed ocean will typically be acting as an emissive source of radiation with a temperature somewhere between 275 K and 325 K at nighttime, and during the daytime, there will be additional components to the radiation field corresponding to the reflected sunlight as well.
  • the contrast, at a given temperature, between the emitted radiance curves of seawater versus oil and/or reflected radiance of seawater versus oil, or a combination of the two, may be used to distinguish between water and oil.
  • oil and seawater may be distinguished using an approach which is sensitive to the shape of the curves in FIG. 4.
  • a thermal camera multiplexor module 340 may be configured to multiplex the input from cameras 310 - 330 equipped with band-pass filters, producing signals B 1 , B 2 , and B 3 .
  • Each of signals B 1 , B 2 , and B 3 may produce a single data-point per image-pixel corresponding to a grayscale brightness of the scene in that particular spectral band.
  • signal B 1 may be generated using an 8.0-9.0 μm band-pass filter
  • signal B 2 may be generated using an 8.0-11.5 μm band-pass filter
  • signal B 3 may be generated using an 8.0-13.0 μm band-pass filter.
  • other band-pass filters may be used, including more (or fewer) than three band-pass filters and band-pass filters for different wavelength ranges.
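  • As a rough illustration of what each filtered signal measures, the sketch below integrates blackbody radiance over the three example pass bands. It is a minimal sketch under stated assumptions: the function names are hypothetical, the emissivity is treated as a constant (the text notes it actually varies with wavelength, angle, and temperature), and reflected sunlight is ignored.

```python
import math

H = 6.62607015e-34   # Planck constant, J*s
C = 2.99792458e8     # speed of light, m/s
KB = 1.380649e-23    # Boltzmann constant, J/K

# Example pass bands for signals B1, B2, B3 in micrometres, taken from the text.
BANDS = {"B1": (8.0, 9.0), "B2": (8.0, 11.5), "B3": (8.0, 13.0)}


def planck_radiance(wavelength_m, temp_k):
    """Blackbody spectral radiance in W / (m^2 * sr * m)."""
    x = H * C / (wavelength_m * KB * temp_k)
    return (2.0 * H * C ** 2) / (wavelength_m ** 5 * math.expm1(x))


def band_radiance(lo_um, hi_um, temp_k, emissivity=1.0, steps=400):
    """Trapezoidal integral of emissivity * Planck radiance over one band.

    ASSUMPTION: a constant emissivity, used only to keep the sketch short.
    Returns in-band radiance in W / (m^2 * sr).
    """
    lo, hi = lo_um * 1e-6, hi_um * 1e-6
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        weight = 0.5 if i in (0, steps) else 1.0  # trapezoid end points
        total += weight * planck_radiance(lo + i * dx, temp_k)
    return emissivity * total * dx
```

  • Calling `band_radiance(*BANDS["B1"], temp_k=290.0)` gives the in-band radiance for the narrowest filter. Because the three example bands are nested, the values grow from B 1 to B 3 at any fixed temperature.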
  • vertical polarizing filters may also be used to minimize specular reflection effects.
  • the synthetic video stream is directly proportional to the differences a and b which correspond to contrasts between wavelength ranges in which difference between radiance from seawater is substantially greater than difference between radiance from oil, and inversely proportional to the difference c which corresponds to a contrast between wavelength ranges in which difference between radiance from oil is substantially greater than difference between radiance from seawater.
  • the synthetic video stream tends to maximize the contrast between the spectral signatures of water and oil. That is, the synthetic video stream may be a black-and-white video stream in which the brightness of each image pixel corresponds to how closely the IR signature of the pixel matches what would be expected from oil.
  • the synthetic discriminant video stream output by the multiplexor module 340 is subsequently input to video analysis system 350 , which is similar to the video analysis system described in conjunction with FIGS. 1-2 .
  • the video analysis system 350 may include a computer vision engine which determines, from the synthetic video stream, foreground blobs representing patches of contiguous pixels having values indicating a match to the IR signature of oil using, e.g., per-pixel ART networks, as previously discussed.
  • the computer vision engine may also extract features such as locations and sizes of foreground blobs, rates of change in blob size and/or a measure of intensity (i.e., how bright the blob is), motion characteristics of the foreground blobs, whether the foreground blobs have non-sharp edges, whether the foreground blobs have high fractal dimension, and whether the foreground blobs are asymmetrical, etc. which are pertinent to sea-surface oil.
  • the video analysis system 350 may further include a machine learning engine which receives the foreground blobs and features extracted by the computer vision engine, and which engages in undirected and unsupervised learning to discern patterns of object behaviors in the scene of the synthetic discriminant video stream, discussed in greater detail below. Thereafter, when unexpected (i.e., abnormal or unusual) sea-surface oil is observed, the machine learning engine may generate an alert so that the sea-surface oil may be investigated.
  • FIG. 5 illustrates an exemplary geometry for mounting video cameras on an offshore oil platform 510 , according to one embodiment.
  • the oil platform 510 includes a mast 515 on which one or more sets of LWIR cameras 520 are mounted, at a height h above a sea surface 500 , to observe the sea surface 500 .
  • Each set of LWIR cameras may include three or more cameras, with the signal of each camera in the set filtered by a band-pass filter for a distinct wavelength range and the filtered signal being multiplexed to generate a synthetic discriminant video stream that is input to a video analysis system.
  • several sets of fixed cameras, each oriented toward a different azimuth, may be used to achieve full 360° azimuthal coverage of the sea surface.
  • a single set of cameras may be configured to perform a continuing guard-tour sweep to achieve full azimuthal coverage.
  • the cameras 520 are able to view a segment of the sea surface beginning from near the platform 510 at r 1 and extending out to a distance r 2 , which may be, e.g., several kilometers away from the platform 510 .
  • wide-angle camera lenses may be used to view a relatively large portion of the sea surface.
  • the higher the cameras 520 are placed (i.e., the greater h is), the further away the apparent horizon will be, and the further the cameras 520 will be able to see.
  • due to effects from, e.g., sea-surface spray and aerosols, discrimination of oil from seawater may not be possible out to the horizon itself.
  • the particular maximum distance and limiting ranges may depend on the cameras 520 used and the arrangement of the oil platform 510 , among other things.
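  • The dependence of viewing range on mounting height h can be illustrated with the usual geometric horizon formula, d = sqrt(2Rh + h²). This is a minimal sketch, not part of the source: the function name and the Earth-radius constant are assumptions, and atmospheric refraction, spray, and aerosols (which the text notes limit the usable range) are ignored.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius (assumed value); refraction ignored


def horizon_distance_m(camera_height_m):
    """Geometric line-of-sight distance to the horizon for a camera at height h."""
    if camera_height_m < 0:
        raise ValueError("camera height must be non-negative")
    h = camera_height_m
    return math.sqrt(2.0 * EARTH_RADIUS_M * h + h * h)
```

  • For a camera 30 m above the sea surface this gives a geometric horizon on the order of 20 km, an upper bound on r 2 rather than a practical detection range.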
  • FIG. 6 illustrates a method 600 for detecting and reporting on anomalous sea-surface oil, according to one embodiment.
  • the method 600 begins at step 610 , where a camera multiplexor module receives video frames from LWIR cameras with distinct spectral band-pass filters.
  • three or more cameras may be used for purposes of detecting surface oil, and the band-pass filters may be chosen so as to let through light in wavelength ranges in which the radiance contrast between seawater and surface oil is relatively large. Doing so may permit the spectral radiance signatures of seawater and surface oil to be more clearly distinguishable from each other.
  • video frames B 1 , B 2 , and B 3 may be received, with the B 1 signal being filtered by an 8.0-9.0 μm band-pass filter, the B 2 signal being filtered by an 8.0-11.5 μm band-pass filter, and the B 3 signal being filtered by an 8.0-13.0 μm band-pass filter.
  • the specific bands here are representative values given for illustrative purposes, and the actual bands used may be different in other embodiments.
  • the multiplexor module combines the received frames to create a synthetic discriminant video stream with brightness corresponding to a match with the IR signature of oil.
  • the multiplexor module may compute differences between pairs of received video frames in different wavelength ranges.
  • the synthetic video stream may be directly proportional to the difference(s) which correspond to contrasts between wavelength ranges in which difference between radiance from seawater is substantially greater than difference between radiance from oil, and inversely proportional to difference(s) which correspond to contrast between wavelength ranges in which difference between radiance from oil is substantially greater than difference between radiance from seawater, or vice versa.
  • s 1 and s 2 are constants which normalize the ratio to, e.g., the range [0,1], with 0 being water and 1 being oil.
  • the particular form of the equation for multiplexing the received frames to generate a discriminant signal may be different.
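  • The multiplexing equation is not reproduced in this text, so the following is only a sketch of one form consistent with the description: a value proportional to two band differences, inversely proportional to a third, and normalized by constants s 1 and s 2 . The specific pairings a = B2 - B1, b = B3 - B2, c = B3 - B1, the clamping to [0, 1], and the function name are all assumptions made for illustration.

```python
def synthetic_discriminant(b1, b2, b3, s1=1.0, s2=0.0, eps=1e-9):
    """Combine three co-registered band frames into one discriminant frame.

    ASSUMPTIONS (not from the source): the pairings a = B2 - B1, b = B3 - B2,
    c = B3 - B1 and the form s1 * (a * b) / c + s2 are illustrative only.
    Frames are lists of rows of floats with identical shapes.
    """
    out = []
    for row1, row2, row3 in zip(b1, b2, b3):
        out_row = []
        for p1, p2, p3 in zip(row1, row2, row3):
            a = p2 - p1
            b = p3 - p2
            c = p3 - p1
            if abs(c) < eps:  # guard against dividing by (nearly) zero
                c = eps if c >= 0 else -eps
            value = s1 * (a * b) / c + s2
            out_row.append(min(1.0, max(0.0, value)))  # clamp to [0, 1]
        out.append(out_row)
    return out
```

  • In practice s 1 and s 2 would be chosen from calibration data so that water maps near 0 and oil near 1; the defaults here are placeholders.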
  • a video analysis system analyzes and learns behavioral patterns in the synthetic video stream.
  • a computer vision engine of the video analysis system may separate foreground blobs depicting oil from background depicting seawater given the synthetic discriminant video stream with brightness corresponding to a match with the IR signature of oil.
  • the computer vision engine may model the scene background and select pixels as foreground using per-pixel ART networks, discussed above. Contiguous regions of pixels classified as foreground may eventually be passed to the machine learning engine.
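  • To make the notion of a foreground blob concrete, the sketch below groups bright pixels of a discriminant frame into contiguous regions. It deliberately swaps in a fixed brightness threshold for the per-pixel ART background model described in the source, and uses a plain breadth-first flood fill with 4-connectivity; the function name and the threshold value are assumptions.

```python
from collections import deque


def find_blobs(frame, threshold=0.5):
    """Return a list of blobs; each blob is a list of (row, col) pixels.

    Foreground means value >= threshold, a stand-in for the learned per-pixel
    background model. Pixels are grouped with 4-connectivity.
    """
    rows = len(frame)
    cols = len(frame[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if seen[r][c] or frame[r][c] < threshold:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx]
                            and frame[ny][nx] >= threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs
```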
  • the computer vision engine may also include an estimator/identifier component which identifies kinematic and/or appearance features of foreground objects such as size, height, width, and area (in pixels), reflectivity, shininess, rigidity, speed, velocity, etc.
  • features used to determine sea-surface oil may include the locations and sizes of foreground blobs, rates of change in blob size and/or a measure of intensity (i.e., how bright the blob is), motion characteristics of the foreground blobs, whether the foreground blobs have non-sharp edges, whether the foreground blobs have high fractal dimension, and whether the foreground blobs are asymmetrical.
  • Such features may be particularly relevant to sea-surface oil, as surface oil blobs may tend to be, e.g., irregular in shape and thus have high fractal dimension, asymmetrical, lack sharp edges, move in certain ways, appear in certain places and have certain sizes, etc.
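  • A few of the simpler features listed above (location, size, intensity, and rate of change in size) can be computed directly from a blob's pixel list, as sketched below. The feature names, the `extent` measure (blob area divided by bounding-box area, a crude proxy for irregular shape), and both function names are assumptions; fractal dimension and edge sharpness are not attempted here.

```python
def blob_features(blob, frame):
    """Compute basic appearance features for one blob.

    blob is a non-empty list of (row, col) pixels; frame is a list of rows of
    brightness values.
    """
    ys = [y for y, _ in blob]
    xs = [x for _, x in blob]
    area = len(blob)
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return {
        "area": area,                                      # size in pixels
        "centroid": (sum(ys) / area, sum(xs) / area),      # location
        "bbox": (min(ys), min(xs), height, width),         # top, left, h, w
        "mean_intensity": sum(frame[y][x] for y, x in blob) / area,
        "extent": area / (height * width),                 # 1.0 = fills its box
    }


def growth_rate(prev_area, curr_area, dt_s):
    """Rate of change in blob size, in pixels per second."""
    return (curr_area - prev_area) / dt_s
```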
  • the foreground blobs and extracted features are provided to a machine learning engine of the video analysis system, which may observe foreground blobs and, over time, identify where patches tend to appear, how frequently patches appear, how long patches remain (which may depend on where they appeared), how large patches tend to be, and characteristics and/or patterns of other features as they tend to appear in the scene.
  • the machine learning engine may build a model of expected behavior in the scene. Doing so permits commonly-occurring and spurious sea-surface oil patches, which may be caused by, e.g., normal operation of the oil platform, lighting artifacts, or changes in the maritime environment, to be distinguished from anomalous patches.
  • the machine learning engine may automatically learn to classify foreground blobs by shape, location, and appearance. If an observed object in a later video frame has oil-like characteristics, and is thus extracted by the computer vision engine as a foreground blob, the machine learning engine may determine, based on the shape, location, or other appearance features of this new foreground blob, whether the blob is shaped, located, appears, etc. like objects which were previously observed.
  • the machine learning engine may include a long-term memory storing data generalizing events observed in the scene, where the long term memory is implemented as ART network(s) and sparse-distributed memory data structure(s), discussed above.
  • feature vectors may be supplied to an input layer of the ART network (or a combination of a self organizing map (SOM) and ART network used to cluster nodes in the SOM), and the ART network may map the micro-feature vector to a cluster in the ART network and update that cluster (or create a new cluster if the input micro-feature vector is sufficiently dissimilar to the existing clusters).
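  • The map-update-or-create behavior described above can be sketched with a much simplified clusterer. This is not an ART network: it uses a Euclidean distance test against a `vigilance` radius and a fixed learning rate, and the class and parameter names are hypothetical. It is meant only to show the control flow of mapping an input to an existing cluster, updating that cluster, or creating a new one when the input is sufficiently dissimilar.

```python
import math


class SimpleClusterer:
    """Nearest-prototype clustering with a vigilance-style novelty test."""

    def __init__(self, vigilance=1.0, learning_rate=0.2):
        self.vigilance = vigilance          # max distance to join a cluster
        self.learning_rate = learning_rate  # how far a prototype moves per input
        self.prototypes = []                # one prototype vector per cluster
        self.counts = []                    # inputs mapped to each cluster

    def observe(self, vector):
        """Map vector to a cluster index, updating or creating the cluster."""
        best, best_dist = None, math.inf
        for i, proto in enumerate(self.prototypes):
            dist = math.dist(proto, vector)
            if dist < best_dist:
                best, best_dist = i, dist
        if best is None or best_dist > self.vigilance:
            # Input is too dissimilar to every cluster: create a new one.
            self.prototypes.append(list(vector))
            self.counts.append(1)
            return len(self.prototypes) - 1
        # Otherwise nudge the winning prototype toward the input.
        proto = self.prototypes[best]
        for k, value in enumerate(vector):
            proto[k] += self.learning_rate * (value - proto[k])
        self.counts[best] += 1
        return best
```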
  • the video analysis system generates alerts when anomalous behavior is observed.
  • the machine learning engine may, over time, learn to distinguish between observed patches of sea-surface oil that occur normally and patches of surface oil that do not, and are thus anomalous.
  • the video analysis system may issue an alert to, e.g., a user interface, so that the anomalous surface oil patch may be investigated.
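  • One simple way to turn learned patterns into an alert decision is to flag observations that fall into rarely seen clusters once enough history has accumulated. The rule below is an assumption for illustration only; the source does not specify how anomaly is scored, and the function name, `min_fraction`, and `warmup` are hypothetical.

```python
def should_alert(cluster_count, total_count, min_fraction=0.05, warmup=50):
    """Return True if an observation's cluster is rare relative to history.

    cluster_count: prior observations that mapped to the same cluster.
    total_count:   all prior observations.
    No alerts are raised until at least `warmup` observations have been seen,
    so that the model has time to learn what is normal for the scene.
    """
    if total_count < warmup:
        return False
    return cluster_count / total_count < min_fraction
```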
  • the radiation need not be infrared light, and may instead be X-rays, ultraviolet light, visible light, microwaves, or radio waves, and appropriate cameras and/or filters may be used to capture the radiation.
  • other devices, such as spectrometers, may be used in lieu of cameras.
  • techniques disclosed herein permit surface oil to be distinguished from seawater using input from multiple LWIR cameras whose signals are band-pass filtered and multiplexed to generate a single synthetic discriminant signal. Patterns of behavior in the scene are then learned so that anomalous sea-surface oil patches, which may result from oil spills or leaks, may be reported while other surface oil patches from normal operation of the oil platform or spurious patches from changing maritime conditions, etc. are not reported.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Investigating Or Analysing Materials By Optical Means (AREA)

Abstract

A behavioral recognition system may include both a computer vision engine and a machine learning engine configured to observe and learn patterns of behavior in video data. Certain embodiments may be configured to detect and evaluate the presence of sea-surface oil on the water surrounding an offshore oil platform. The computer vision engine may be configured to segment image data into detected patches or blobs of surface oil (foreground) present in the field of view of an infrared camera (or cameras). A machine learning engine may evaluate the detected patches of surface oil to learn to distinguish between sea-surface oil incident to the operation of an offshore platform and the appearance of surface oil that should be investigated by platform personnel.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a continuation of U.S. patent application Ser. No. 14/823,771 filed on Aug. 11, 2015, which claims priority to U.S. patent application Ser. No. 13/971,027 filed on Aug. 20, 2013, which itself claims priority to provisional patent application Ser. No. 61/691,102, filed on Aug. 20, 2012, and which are hereby incorporated by reference in their entirety.
  • BACKGROUND OF THE INVENTION
  • Field of the Invention
  • Embodiments of the invention provide techniques for analyzing a sequence of video frames. More particularly, embodiments of the invention provide a combination of a camera system and a computer vision engine and machine learning system configured to detect and evaluate the presence of sea-surface oil, e.g., surrounding an offshore drilling platform.
  • Description of the Related Art
  • Some currently available video surveillance systems provide simple object recognition capabilities. For example, a video surveillance system may be configured to classify a group of pixels (referred to as a “blob”) in a given frame as being a particular object (e.g., a person or vehicle). Once identified, a “blob” may be tracked from frame-to-frame in order to follow the “blob” moving through the scene over time, e.g., a person walking across the field of vision of a video surveillance camera. Further, such systems may be configured to determine when an object has engaged in certain predefined behaviors. For example, the system may include definitions used to recognize the occurrence of a number of pre-defined events, e.g., the system may evaluate the appearance of an object classified as depicting a car (a vehicle-appear event) coming to a stop over a number of frames (a vehicle-stop event). Thereafter, a new foreground object may appear and be classified as a person (a person-appear event) and the person then walks out of frame (a person-disappear event). Further, the system may be able to recognize the combination of the first two events as a “parking-event.”
  • However, such surveillance systems typically are unable to identify or update objects, events, behaviors, or patterns (or classify such objects, events, behaviors, etc., as being normal or anomalous) by observing what happens in the scene over time; instead, such systems rely on static patterns defined in advance. Thus, in practice, these systems rely on predefined definitions for objects and/or behaviors to evaluate a video sequence. Unless the underlying system includes a description for a particular object or behavior, the system is generally incapable of recognizing that behavior (or at least instances of the pattern describing the particular object or behavior).
  • No currently available video surveillance system is capable of reliably identifying sea-surface oil, which can result from operations incident to the normal operation of an offshore oil platform or oil spills, leaks, etc. Although the optical properties of oil-films in the visible, UV, and IR spectral regions have been studied extensively, a system designed to identify sea-surface oil must address constant variations in the maritime environment, including changes in illumination angle, transparency, aerosols, haze, cloud cover, and transitions between night and day. Such variations can produce false-positive and otherwise erroneous identifications of sea-surface oil.
  • SUMMARY OF THE INVENTION
  • One embodiment of the invention includes a method for analyzing a scene depicted in an input stream of video frames captured by one or more cameras. This method may include, for one or more of the video frames, identifying one or more foreground blobs in the video frame. Each foreground blob may correspond to one or more contiguous pixels of the video frame determined to depict sea-surface oil. This method may further include evaluating the one or more foreground blobs to derive expected patterns of observations of sea-surface oil within a field-of-view of the cameras. The input stream of video frames may be generated by one or more long wavelength infrared (LWIR) cameras.
  • In a particular embodiment, this method may further include, after deriving the expected patterns of occurrences of sea-surface oil, receiving a set of foreground blobs identified in a subsequent one of the video frames and, upon determining that at least a first one of the foreground blobs does not correspond to at least one of the expected patterns of occurrences of sea-surface oil, generating an alert message.
  • Another embodiment includes a method of analyzing a scene depicted in an input stream of video frames. This method includes, for one or more of the video frames, identifying one or more foreground blobs in the video frame. Each foreground blob generally corresponds to contiguous pixels of the video frame determined by a behavior recognition system to depict a patch of sea-surface oil. Further, the behavior recognition system is configured to learn to distinguish between foreground objects depicting patches of sea-surface oil and false positive detections of patches of sea-surface oil resulting from noise occurring in the one or more video frames. Upon determining that one of the foreground blobs depicting a patch of sea-surface oil deviates from expected patterns of sea-surface oil derived by the behavior recognition system, an alert message is generated. Examples of noise that result in false-positive foreground blobs include lighting, absorption, and extinction artifacts in the video frames.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • So that the manner in which the above recited features, advantages, and objects of the present invention are attained and can be understood in detail, a more particular description of the invention, briefly summarized above, may be had by reference to the embodiments illustrated in the appended drawings.
  • It is to be noted, however, that the appended drawings illustrate only typical embodiments of this invention and are therefore not to be considered limiting of its scope, for the invention may admit to other equally effective embodiments.
  • FIG. 1 illustrates components of a video analysis system, according to one embodiment of the invention.
  • FIG. 2 further illustrates components of the video analysis system shown in FIG. 1, according to one embodiment of the invention.
  • FIG. 3 illustrates a system for generating a synthetic video stream for detecting sea-surface oil and deriving expected patterns in the synthetic video stream, according to one embodiment of the invention.
  • FIG. 4 illustrates spectral radiance contrast between seawater and modeled surface oil.
  • FIG. 5 illustrates an exemplary geometry for mounting video cameras on an offshore oil platform, according to one embodiment of the invention.
  • FIG. 6 illustrates a method for detecting and reporting on anomalous sea-surface oil, according to one embodiment of the invention.
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • Embodiments of the present invention provide a method and a system for analyzing and learning to identify unusual dispersions of oil floating on a liquid surface. A computer vision engine may be configured to process video frames from multiple cameras observing a common region of sea surface. The computer vision engine may evaluate frames of video to determine what pixels depict seawater (background) and what pixels depict oil floating on the sea surface (foreground). Contiguous regions of pixels classified as foreground are passed to a machine learning engine, which observes a variety of features of the foreground blobs to learn expected patterns in the scene and issue an alert when unexpected, anomalous oil patches are observed.
  • In one embodiment, a multiplexor module is configured to receive video streams (also referred to herein as “signals”) from three or more long-wavelength infrared (LWIR) cameras whose output is filtered by distinct band-pass filters and multiplex the signals to generate a single synthetic signal whose brightness indicates a match with an IR signature of sea-surface oil. A computer vision engine determines, from the synthetic signal, foreground blobs representing patches of contiguous pixels having values indicating a match to the IR signature of oil, and further extracts features such as position, size, change in size, etc. which are pertinent to sea-surface oil. In turn, a machine learning engine is configured to build models of certain behaviors within the scene based on the foreground blobs and extracted features, and determine whether observations indicate that the behavior of an object is anomalous or not, relative to the model. In one embodiment, e.g., the machine learning engine may model observed sea-surface oil over time, and determine whether any given foreground blob corresponding to sea-surface oil is unusual or anomalous relative to prior sea-surface oil which has been observed. The machine learning engine may issue an alert when anomalous sea-surface oil is observed so that the oil may be investigated.
  • In the following, reference is made to embodiments of the invention. However, it should be understood that the invention is not limited to any specifically described embodiment. Instead, any combination of the following features and elements, whether related to different embodiments or not, is contemplated to implement and practice the invention. Furthermore, in various embodiments the invention provides numerous advantages over the prior art. However, although embodiments of the invention may achieve advantages over other possible solutions and/or over the prior art, whether or not a particular advantage is achieved by a given embodiment is not limiting of the invention. Thus, the following aspects, features, embodiments and advantages are merely illustrative and are not considered elements or limitations of the appended claims except where explicitly recited in a claim(s). Likewise, reference to “the invention” shall not be construed as a generalization of any inventive subject matter disclosed herein and shall not be considered to be an element or limitation of the appended claims except where explicitly recited in a claim(s).
  • One embodiment of the invention is implemented as a program product for use with a computer system. The program(s) of the program product defines functions of the embodiments (including the methods described herein) and can be contained on a variety of computer-readable storage media. Examples of computer-readable storage media include (i) non-writable storage media (e.g., read-only memory devices within a computer such as CD-ROM or DVD-ROM disks readable by an optical media drive) on which information is permanently stored; and (ii) writable storage media (e.g., floppy disks within a diskette drive or hard-disk drive) on which alterable information is stored. Such computer-readable storage media, when carrying computer-readable instructions that direct the functions of the present invention, are embodiments of the present invention. Other example media include communications media through which information is conveyed to a computer, such as through a computer or telephone network, including wireless communications networks.
  • In general, the routines executed to implement the embodiments of the invention may be part of an operating system or a specific application, component, program, module, object, or sequence of instructions. The computer program of the present invention is comprised typically of a multitude of instructions that will be translated by the native computer into a machine-readable format and hence executable instructions. Also, programs are comprised of variables and data structures that either reside locally to the program or are found in memory or on storage devices. In addition, various programs described herein may be identified based upon the application for which they are implemented in a specific embodiment of the invention. However, it should be appreciated that any particular program nomenclature that follows is used merely for convenience, and thus the invention should not be limited to use solely in any specific application identified and/or implied by such nomenclature.
  • FIG. 1 illustrates components of a video analysis and behavior-recognition system 100, according to one embodiment. As shown, the behavior-recognition system 100 includes a video input source 105, a network 110, a computer system 115, and input and output devices 118 (e.g., a monitor, a keyboard, a mouse, a printer, and the like). The network 110 may transmit video data recorded by the video input source 105 to the computer system 115. Illustratively, the computer system 115 includes a CPU 120, storage 125 (e.g., a disk drive, optical disk drive, floppy disk drive, and the like), and a memory 130 which includes both a computer vision engine 135 and a machine-learning engine 140. As described in greater detail below, the computer vision engine 135 and the machine-learning engine 140 may provide software applications configured to analyze a sequence of video frames provided by the video input source 105.
  • Network 110 receives video data (e.g., video stream(s), video images, or the like) from the video input source 105. The video input source 105 may be a video camera, a VCR, DVR, DVD, computer, web-cam device, or the like. For example, the video input source 105 may be a stationary video camera aimed at a certain area (e.g., a subway station, a parking lot, a building entry/exit, etc.), which records the events taking place therein. Generally, the area visible to the camera is referred to as the “scene.” The video input source 105 may be configured to record the scene as a sequence of individual video frames at a specified frame-rate (e.g., 24 frames per second), where each frame includes a fixed number of pixels (e.g., 320×240). Each pixel of each frame may specify a color value (e.g., an RGB value) or grayscale value (e.g., a radiance value between 0-255). Further, the video stream may be formatted using known formats including MPEG2, MJPEG, MPEG4, H.263, H.264, and the like.
  • In one embodiment, video input source 105 may capture the infrared spectrum instead of visible light. Further, multiple cameras could be band-pass filtered to capture different wavelength bands within the infrared spectrum. In such a case, images from each camera could be registered to one another, allowing a composite image to be generated from the multiple cameras. As described in greater detail below, by using multiple observations of the sea surface in different wavelength bands of the infrared spectrum, the contrast between oil and seawater may be enhanced, making the oil more readily detectable to the background/foreground module. In one embodiment, the computer vision engine may filter clutter from the video input source, reducing input images to largely black (background) regions representing seawater and white (foreground) regions representing oil in the field of view of the cameras. In turn, the machine learning engine learns to filter noise from the observations of sea-surface oil, and generates alerts after observing an unusual appearance (or behavior) of sea-surface oil. Examples of noise that result in false-positive foreground blobs (i.e., false positive detections of patches of sea-surface oil) include lighting, absorption, and extinction artifacts in the video frames.
  • In one embodiment, the computer vision engine 135 is configured to receive input from a multiplexor module which multiplexes multiple data channels. Alternatively, the computer vision engine 135 may itself include the multiplexor module. In one embodiment, the multiplexor module may process data from three (or more) channels, co-adding image data and performing operations on the video streams. Each channel may correspond to a camera capturing a different portion of the infrared spectrum. The cameras may be positioned collinear to one another. That is, the cameras may each share a substantially identical field of view. Further, the images from the cameras may be registered to one another. As noted, however, each camera may cover a different band of the infrared spectrum. That is, each camera subsamples a different band of the infrared spectrum. In one embodiment, each camera is a long wavelength infrared (LWIR) camera with configurable filters. The multiplexor module may take the video signals from the video sources, combine them, as further described herein, and pass the information to the computer vision engine 135.
  • As noted above, the computer vision engine 135 may be configured to analyze image data (whether in the visible or IR spectrum (or otherwise)) to identify objects in the video stream, identify a variety of appearance and kinematic features used by a machine learning engine 140 to derive object classifications, derive a variety of metadata regarding the actions and interactions of such objects, and supply this information to the machine-learning engine 140. And in turn, the machine-learning engine 140 may be configured to evaluate, observe, learn and remember details regarding events (and types of events) that transpire within the scene over time.
  • In one embodiment, the machine-learning engine 140 receives the video frames and the data generated by the computer vision engine 135. The machine-learning engine 140 may be configured to analyze the received data, cluster objects having similar visual and/or kinematic features, and build semantic representations of events depicted in the video frames. Over time, the machine learning engine 140 learns expected patterns of behavior for objects that map to a given cluster. Thus, over time, the machine learning engine learns from these observed patterns to identify normal and/or abnormal events. That is, rather than having patterns, objects, object types, or activities defined in advance, the machine learning engine 140 builds its own model of what different object types have been observed (e.g., based on clusters of kinematic and/or appearance features) as well as a model of expected behavior for a given object type. Thereafter, the machine learning engine can decide whether the behavior of an observed event is anomalous or not based on prior learning.
  • Data describing whether anomalous sea-surface oil has been determined and/or describing the anomalous sea-surface oil may be provided to output devices 118 to issue alerts (e.g., an alert message presented on a GUI interface screen).
  • In general, the computer vision engine 135 and the machine-learning engine 140 both process video data in real-time. However, time scales for processing information by the computer vision engine 135 and the machine-learning engine 140 may differ. For example, in one embodiment, the computer vision engine 135 processes the received video data frame-by-frame, while the machine-learning engine 140 processes data every N-frames. In other words, while the computer vision engine 135 may analyze each frame in real-time to derive a set of appearance and kinematic data related to objects observed in the frame, the machine-learning engine 140 is not constrained by the real-time frame rate of the video input.
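  • The two time scales described above can be sketched as a loop in which a per-frame analysis step runs on every frame while a learning step consumes its results in batches of N. The callables `analyze_frame` and `learn_batch`, the function name, and the choice to flush a final partial batch are assumptions made for illustration.

```python
def run_pipeline(frames, analyze_frame, learn_batch, n=5):
    """Call analyze_frame on every frame and learn_batch on every n-th frame.

    analyze_frame stands in for the computer vision engine (frame-by-frame);
    learn_batch stands in for the machine-learning engine (every N frames).
    """
    pending = []
    for frame in frames:
        pending.append(analyze_frame(frame))
        if len(pending) == n:
            learn_batch(pending)
            pending = []  # start a fresh batch; the old list is not reused
    if pending:
        learn_batch(pending)  # flush a final partial batch
```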
  • Note, however, FIG. 1 illustrates merely one possible arrangement of the behavior-recognition system 100. For example, although the video input source 105 is shown connected to the computer system 115 via the network 110, the network 110 is not always present or needed (e.g., the video input source 105 may be directly connected to the computer system 115). Further, various components and modules of the behavior-recognition system 100 may be implemented in other systems. For example, in one embodiment, the computer vision engine 135 may be implemented as a part of a video input device (e.g., as a firmware component wired directly into a video camera). In such a case, the output of the video camera may be provided to the machine-learning engine 140 for analysis. Similarly, the output from the computer vision engine 135 and machine-learning engine 140 may be supplied over computer network 110 to other computer systems. For example, the computer vision engine 135 and machine-learning engine 140 may be installed on a server system and configured to process video from multiple input sources (i.e., from multiple cameras). In such a case, a client application 250 running on another computer system may request (or receive) the results over network 110.
  • FIG. 2 further illustrates components of the computer vision engine 135 and the machine-learning engine 140 first illustrated in FIG. 1, according to one embodiment of the invention. As shown, the computer vision engine 135 includes a background/foreground (BG/FG) component 205, a tracker component 210, an estimator/identifier component 215, and a context processor component 220. Collectively, the components 205, 210, 215, and 220 provide a pipeline for processing an incoming sequence of video frames supplied by the video input source 105 (indicated by the solid arrows linking the components). Additionally, the output of one component may be provided to multiple stages of the component pipeline (as indicated by the dashed arrows) as well as to the machine-learning engine 140. In one embodiment, the components 205, 210, 215, and 220 may each provide a software module configured to provide the functions described herein. Of course one of ordinary skill in the art will recognize that the components 205, 210, 215, and 220 may be combined (or further subdivided) to suit the needs of a particular case and further that additional components may be added (or some may be removed) from a video surveillance system.
  • In one embodiment, the BG/FG component 205 may be configured to separate each frame of video provided by the video input source 105 into a static part (the scene background) and a collection of volatile parts (the scene foreground). The frame itself may include a two-dimensional array of pixel values for multiple channels (e.g., RGB channels for color video or grayscale channel or radiance channel for black and white video). In one embodiment, the BG/FG component 205 may model background states for each pixel using an adaptive resonance theory (ART) network. That is, each pixel may be classified as depicting scene foreground or scene background using an ART network modeling a given pixel. Of course, other approaches to distinguish between scene foreground and background may be used. Again, in context of this discussion, the background may generally correspond to pixels depicting seawater, whereas foreground may generally correspond to pixels depicting sea-surface oil.
  • Additionally, the BG/FG component 205 may be configured to generate a mask used to identify which pixels of the scene are classified as depicting foreground and, conversely, which pixels are classified as depicting scene background. The BG/FG component 205 then identifies regions of the scene that contain a portion of scene foreground (referred to as a foreground “blob” or “patch”) and supplies this information to subsequent stages of the pipeline. Additionally, pixels classified as depicting scene background may be used to generate a background image modeling the scene.
  • In context of detecting and evaluating sea-surface oil, the BG/FG component classifies pixels depicting surface oil as foreground. Thus, the computer vision engine is being used as a “blob” detector/tracker, where blobs of pixels classified as foreground correspond to patches of sea-surface oil. In such a case, blobs do not need to address occlusion or depth ordering. Instead, blobs that intersect may be merged.
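  • As a rough illustration of the blob step described above, the following sketch groups foreground pixels of a binary mask into blobs using 4-connected component labeling. This is an assumption made for illustration only: the source does not specify how the BG/FG component 205 forms blobs, and the function name label_blobs and the toy mask are hypothetical. Because connectivity alone defines a blob here, regions that touch are merged into one blob, and no occlusion or depth ordering is modeled.

```python
from collections import deque


def label_blobs(mask):
    """Group 4-connected foreground pixels (truthy values) into blobs.

    `mask` is a list of rows; returns a list of blobs, each a list of
    (row, col) coordinates. Hypothetical helper, not the patented method.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            blob = []
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs


# Toy foreground mask: 1 marks a pixel classified as foreground (oil).
example_mask = [
    [0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
example_blobs = label_blobs(example_mask)
```

  • Each returned blob can then be summarized (e.g., by pixel count and centroid) before being handed to later pipeline stages.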
  • The tracker component 210 may receive the foreground patches produced by the BG/FG component 205 and generate computational models for the patches. The tracker component 210 may be configured to use this information, and each successive frame of raw-video, to attempt to track the motion of an object depicted by a given foreground patch as it moves about the scene. That is, the tracker component 210 provides continuity to other elements of the system by tracking a given object from frame-to-frame.
  • The estimator/identifier component 215 may receive the output of the tracker component 210 (and the BG/FG component 205) and identify a variety of kinematic and/or appearance features of a foreground object, e.g., size, height, width, and area (in pixels), reflectivity, shininess, rigidity, speed, velocity, etc.
  • In context of detecting sea-surface oil, the features of a foreground object (a blob of pixels) may include the location and sizes of a foreground blob. Note, the computer vision engine could correct for distance and the solid angle effects distorting the size of a foreground object detected at different areas within the field of view of a camera. Other features of a foreground blob may include rates of change in blob size and/or a measure of intensity (i.e., how bright the blob is), motion characteristics of the foreground blobs, whether the foreground blobs have non-sharp edges, whether the foreground blobs have high fractal dimension, and whether the foreground blobs are asymmetrical.
  • The context processor component 220 may receive the output from other stages of the pipeline (i.e., the tracked objects, the background and foreground models, and the results of the estimator/identifier component 215). Using this information, the context processor 220 may be configured to generate a stream of context events regarding objects tracked (by tracker component 210) and evaluated (by estimator/identifier component 215). For example, the context processor component 220 may package a stream of micro-feature vectors and kinematic observations of an object and output this to the machine-learning engine 140, e.g., at a rate of 5 Hz. In one embodiment, the context events are packaged as a trajectory. As used herein, a trajectory generally refers to a vector packaging the kinematic data of a particular foreground object in successive frames or samples. Each element in the trajectory represents the kinematic data captured for that object at a particular point in time. Typically, a complete trajectory includes the kinematic data obtained when an object is first observed in a frame of video along with each successive observation of that object up to when it leaves the scene (or becomes stationary to the point of dissolving into the frame background). Accordingly, assuming computer vision engine 135 is operating at a rate of 5 Hz, a trajectory for an object is updated every 200 milliseconds, until complete.
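  • The trajectory described above can be pictured as an append-only list of timestamped kinematic samples. The sketch below is a minimal illustration under stated assumptions: the class names Observation and Trajectory, their fields, and the helper sample_period_ms are hypothetical and are not taken from the system itself. It also shows the arithmetic behind the 200 millisecond update interval at 5 Hz.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """Kinematic data for one object at one sample time (hypothetical fields)."""
    t_ms: int
    x: float
    y: float
    area_px: int


@dataclass
class Trajectory:
    """Successive observations of one foreground object."""
    object_id: int
    samples: list = field(default_factory=list)
    complete: bool = False

    def update(self, obs):
        # A trajectory only grows until the object leaves the scene.
        if self.complete:
            raise ValueError("trajectory already complete")
        self.samples.append(obs)

    def close(self):
        # Called when the object leaves the scene or dissolves into background.
        self.complete = True


def sample_period_ms(rate_hz):
    """Time between trajectory updates for a given sampling rate."""
    return 1000.0 / rate_hz


traj = Trajectory(object_id=7)
traj.update(Observation(t_ms=0, x=10.0, y=20.0, area_px=50))
traj.update(Observation(t_ms=200, x=11.0, y=20.5, area_px=54))
traj.close()
```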
  • The computer vision engine 135 may take the output from the components 205, 210, 215, and 220 describing the motions and actions of the tracked objects in the scene and supply this information to the machine-learning engine 140. In context of detecting sea-surface oil, the context event package may include a list of foreground blobs (patches of surface oil) detected by the computer vision engine 135, the size and position of each blob, and a trajectory of a blob observed over time. The context event package passed to the machine learning engine 140 could also include any other features of a foreground object detected or generated by components of the computer vision engine 135, as well as the raw data received from the video feeds.
  • Illustratively, the machine-learning engine 140 includes a long-term memory 225, a perceptual memory 230, an episodic memory 235, a workspace 240, codelets 245, a micro-feature classifier 255, a cluster layer 260 and a sequence layer 265. Additionally, the machine-learning engine 140 includes a client application 250, allowing the user to interact with the video surveillance system 100 using a graphical user interface. Further still, the machine-learning engine 140 includes an event bus 222. In one embodiment, the components of the computer vision engine 135 and machine-learning engine 140 output data to the event bus 222. At the same time, the components of the machine-learning engine 140 may also subscribe to receive different event streams from the event bus 222. For example, the micro-feature classifier 255 may subscribe to receive the micro-feature vectors output from the computer vision engine 135.
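  • The publish/subscribe role of the event bus 222 can be sketched as follows. This is an illustrative assumption about one simple way such a bus could behave, not a description of the actual implementation; the class EventBus and the topic names are hypothetical.

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process publish/subscribe bus (hypothetical sketch)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        # Register a callable to receive every payload published on `topic`.
        self._subscribers[topic].append(handler)

    def publish(self, topic, payload):
        # Deliver the payload to each subscriber of this topic, in order.
        for handler in list(self._subscribers.get(topic, ())):
            handler(payload)


bus = EventBus()
received = []
# A classifier-like consumer subscribes only to micro-feature vectors.
bus.subscribe("micro_features", received.append)
bus.publish("micro_features", [0.2, 0.7])
bus.publish("trajectories", {"object_id": 7})  # no subscriber; silently dropped
```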
  • Generally, the workspace 240 provides a computational engine for the machine-learning engine 140. For example, the workspace 240 may be configured to copy information from the perceptual memory 230, retrieve relevant memories from the episodic memory 235 and the long-term memory 225, and select which codelets 245 to execute. Each codelet 245 may be a software program configured to evaluate different sequences of events and to determine how one sequence may follow (or otherwise relate to) another (e.g., a finite state machine). More generally, each codelet may provide a software module configured to detect interesting patterns from the streams of data fed to the machine-learning engine. In turn, the codelet 245 may create, retrieve, reinforce, or modify memories in the episodic memory 235 and the long-term memory 225. By repeatedly scheduling codelets 245 for execution, copying memories and percepts to/from the workspace 240, the machine-learning engine 140 performs a cognitive cycle used to observe, and learn, about patterns of behavior that occur within the scene.
  • In one embodiment, the perceptual memory 230, the episodic memory 235, and the long-term memory 225 are used to identify patterns of behavior, evaluate events that transpire in the scene, and encode and store observations. Generally, the perceptual memory 230 receives the output of the computer vision engine 135 (e.g., the context event stream). The episodic memory 235 stores data representing observed events with details related to a particular episode, e.g., information describing time and space details related to an event. That is, the episodic memory 235 may encode specific details of a particular event, i.e., “what and where” something occurred within a scene, such as a particular vehicle (car A) moved to a location believed to be a parking space (parking space 5) at 9:43 AM.
  • In contrast, the long-term memory 225 may store data generalizing events observed in the scene. To continue with the example of a vehicle parking, the long-term memory 225 may encode information capturing observations and generalizations learned by an analysis of the behavior of objects in the scene such as “vehicles in certain areas of the scene tend to be in motion,” “vehicles tend to stop in certain areas of the scene,” etc. Thus, the long-term memory 225 stores observations about what happens within a scene with much of the particular episodic details stripped away. In this way, when a new event occurs, memories from the episodic memory 235 and the long-term memory 225 may be used to relate and understand a current event, i.e., the new event may be compared with past experience, leading to reinforcement, decay, and adjustments to the information stored in the long-term memory 225, over time. In a particular embodiment, the long-term memory 225 may be implemented as an ART network and a sparse-distributed memory data structure.
  • The micro-feature classifier 255 may schedule a codelet 245 to evaluate the micro-feature vectors output by the computer vision engine 135. As noted, the computer vision engine 135 may track objects frame-to-frame and generate micro-feature vectors for each foreground object at a rate of, e.g., 5 Hz. In one embodiment, the micro-feature classifier 255 may be configured to create clusters from this stream of micro-feature vectors. For example, each micro-feature vector may be supplied to an input layer of the ART network (or a combination of a self organizing map (SOM) and ART network used to cluster nodes in the SOM). In response, the ART network maps the micro-feature vector to a cluster in the ART network and updates that cluster (or creates a new cluster if the input micro-feature vector is sufficiently dissimilar to the existing clusters). Each cluster is presumed to represent a distinct object type, and objects sharing similar micro-feature vectors (as determined using the choice and vigilance parameters of the ART network) may map to the same cluster.
  • For example, the micro-features associated with observations of many different vehicles may be similar enough to map to the same cluster (or group of clusters). At the same time, observations of many different people may map to a different cluster (or group of clusters) than the vehicles cluster. Thus, each distinct cluster in the ART network generally represents a distinct type of object acting within the scene. And as new objects enter the scene, new object types may emerge in the ART network.
  • Importantly, however, this approach does not require the different object type classifications to be defined in advance; instead, object types emerge over time as distinct clusters in the ART network. In one embodiment, the micro-feature classifier 255 may assign an object type identifier to each cluster, providing a different object type for each cluster in the ART network.
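  • To make the clustering behavior concrete, the sketch below implements a small Fuzzy ART network, one common member of the ART family, with complement coding, a choice parameter, and a vigilance parameter. The source does not state which ART variant is used, so the choice of Fuzzy ART, the class name FuzzyART, the parameter defaults, and the toy feature vectors are all assumptions made for illustration. Inputs are assumed to be scaled to the range [0, 1].

```python
class FuzzyART:
    """Minimal Fuzzy ART clusterer (illustrative; not the patented network)."""

    def __init__(self, vigilance=0.8, choice=0.001, learning_rate=1.0):
        self.rho = vigilance        # higher values yield more, tighter clusters
        self.alpha = choice         # small positive tie-breaking constant
        self.beta = learning_rate   # 1.0 corresponds to fast learning
        self.weights = []           # one weight vector per cluster

    @staticmethod
    def _complement_code(x):
        # Append 1 - x so that every coded input has the same L1 norm.
        return list(x) + [1.0 - v for v in x]

    def train(self, x):
        """Map x to a cluster, updating or creating one; return its index."""
        i = self._complement_code(x)
        norm_i = sum(i)
        scored = []
        for j, w in enumerate(self.weights):
            overlap = sum(min(a, b) for a, b in zip(i, w))
            scored.append((overlap / (self.alpha + sum(w)), overlap, j))
        # Try clusters in order of decreasing choice value.
        scored.sort(key=lambda s: s[0], reverse=True)
        for _, overlap, j in scored:
            if overlap / norm_i >= self.rho:  # vigilance (match) test
                w = self.weights[j]
                self.weights[j] = [
                    self.beta * min(a, b) + (1.0 - self.beta) * b
                    for a, b in zip(i, w)
                ]
                return j
        # No existing cluster is similar enough: create a new one.
        self.weights.append(i)
        return len(self.weights) - 1


net = FuzzyART(vigilance=0.8)
labels = [net.train(v) for v in ([0.10, 0.10], [0.12, 0.09], [0.90, 0.85])]
```

  • In this toy run the first two vectors share a cluster and the third, being sufficiently dissimilar, creates a new one, mirroring how object types may emerge without being defined in advance.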
  • In an alternative embodiment, rather than generate clusters from the micro-feature vectors directly, the micro-feature classifier 255 may supply the micro-feature vectors to a self-organizing map structure (SOM). In such a case, the ART network may cluster nodes of the SOM and assign an object type identifier to each cluster. Each SOM node mapping to the same cluster is then presumed to represent an instance of a common type of object.
  • As shown, the machine-learning engine 140 also includes a cluster layer 260 and a sequence layer 265. The cluster layer 260 may be configured to generate clusters from the trajectories of objects classified by the micro-feature classifier 255 as being an instance of a common object type. In one embodiment, the cluster layer 260 uses a combination of a self-organizing map (SOM) and an ART network to cluster the kinematic data in the trajectories. Once the trajectories are clustered, the sequence layer 265 may be configured to generate sequences encoding the observed patterns of behavior represented by the trajectories. And once generated, the sequence layer may identify segments within a sequence using a voting experts technique. Further, the sequence layer 265 may be configured to identify anomalous segments and sequences.
  • In context of detecting sea-surface oil, the machine learning engine 140 may observe foreground blobs (presumably patches of sea-surface oil) and, over time, identify where patches tend to appear, how frequently patches appear, how long a patch remains, how large patches tend to be, etc. And after observing a sea-surface area for a period of time, the machine learning engine 140 may distinguish between (1) patches of surface oil that occur incident to the normal operations of an offshore drilling platform and other spurious oil patches, and (2) patches of surface oil that need to be investigated or evaluated by platform personnel. That is, given the complexity of a maritime environment, the complexity of reflections and spurious light and oil observations in the proximity of boats, ships and offshore platforms, the machine learning engine 140 is used to learn to identify what are “normal” observations of sea-surface oil and what are “abnormal” or “unusual” observations that require investigation.
  • Detecting Anomalous Sea-Surface Oil in a Machine-Learning Video Analytics System
  • As noted above, a machine-learning video analytics system may be configured to use a computer vision engine to observe a scene, generate information streams of observed activity, and to pass the streams to a machine learning engine. In turn, the machine learning engine may engage in an undirected and unsupervised learning approach to learn patterns regarding the object behaviors in that scene. Thereafter, when unexpected (i.e., abnormal or unusual) behavior is observed, alerts may be generated.
  • In one embodiment, a multiplexor module is configured to receive video streams (also referred to herein as “signals”) from three or more long-wavelength infrared (LWIR) cameras whose output is filtered by distinct band-pass filters and multiplex the signals to generate a single synthetic signal whose brightness indicates a match with an IR signature of sea-surface oil. A computer vision engine determines, from the synthetic signal, foreground blobs representing patches of contiguous pixels having values indicating a match to the IR signature of oil, and further extracts features such as position, size, change in size, etc. which are pertinent to sea-surface oil. In turn, a machine learning engine is configured to build models of behaviors within the scene based on the foreground blobs and extracted features, and determine whether observations indicate that the behavior of an object is anomalous or not, relative to the model. In one embodiment, e.g., the machine learning engine may model observed sea-surface oil over time (including spurious oil), and determine whether any given foreground blob corresponding to sea-surface oil is unusual or anomalous relative to prior sea-surface oil which has been observed. The machine learning engine may issue an alert when anomalous sea-surface oil is observed so that the oil may be investigated.
  • FIG. 3 illustrates a system for generating a synthetic video stream for detecting sea-surface oil and deriving expected patterns in the synthetic video stream, according to one embodiment. As shown, the system includes LWIR cameras 310-330 which capture the “thermal” part of the light spectrum. Captured light from each of the cameras 310-330 is filtered using a respective spectral band-pass filter to generate filtered signals in a distinct wavelength band.
  • According to physics theory, objects made of normal matter (e.g., electrons, protons, and neutrons) and having finite non-zero temperatures continuously emit electromagnetic radiation. Depending on the temperature of a given object, the emitted radiation may mostly be X-rays, ultraviolet light, visible light, infrared light, microwaves, or radio waves. An idealized object called a blackbody that is perfectly efficient at this process would emit radiation energy as a function of the wavelength of the light according to Planck's law:
  • $$B_\lambda(T) = \frac{2hc^2/\lambda^5}{e^{hc/\lambda kT} - 1}, \tag{1}$$
  • where h is Planck's constant, c is the speed of light, k is Boltzmann's constant, λ is the wavelength of emitted radiation, and T is the temperature (in kelvin) of the emitting object. Real physical objects are not perfectly efficient radiators of electromagnetic radiation, and emit a distribution of energy that differs from Planck's law, given by:
  • $$B_\lambda(T, \theta) = \frac{2hc^2/\lambda^5}{e^{hc/\lambda kT} - 1}\,\varepsilon_\lambda(T, \theta), \tag{2}$$
  • where ελ is the spectral emissivity and will generally be a complicated function of wavelength, angle, and temperature, as an object can radiate more efficiently in some directions and/or colors of light than others and this dependence may vary with temperature. A related quantity, the spectral reflectivity ρλ, that describes what fraction of energy incident upon an object is reflected back, may likewise be a function of temperature, angle, and wavelength.
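  • Equations (1) and (2) can be evaluated numerically as in the following sketch. The function names are hypothetical, the physical constants are the SI defined values, and the emissivity is passed in as a plain number rather than modeled as a function of wavelength, angle, and temperature. No emissivity values for oil or seawater are implied by this example.

```python
import math

H = 6.62607015e-34   # Planck constant, J*s
C = 2.99792458e8     # speed of light in vacuum, m/s
K_B = 1.380649e-23   # Boltzmann constant, J/K


def blackbody_radiance(wavelength_m, temp_k):
    """Spectral radiance of an ideal blackbody, equation (1).

    Returns W * sr^-1 * m^-3 (per unit wavelength in meters).
    """
    x = H * C / (wavelength_m * K_B * temp_k)
    # math.expm1(x) computes e**x - 1 accurately, including for small x.
    return (2.0 * H * C ** 2 / wavelength_m ** 5) / math.expm1(x)


def emitted_radiance(wavelength_m, temp_k, emissivity):
    """Spectral radiance of a real emitter, equation (2).

    `emissivity` is a single number here; in general it depends on
    wavelength, viewing angle, and temperature.
    """
    return emissivity * blackbody_radiance(wavelength_m, temp_k)


# Radiance at 10 micrometers for a 300 K emitter (mid-LWIR band).
b_10um = blackbody_radiance(10e-6, 300.0)
```

  • For a 300 K emitter the blackbody curve peaks near 10 μm (by Wien's displacement law), which is consistent with the use of long-wavelength infrared cameras in this system.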
  • The spectral emissivity and spectral reflectivity of oil and water at different temperatures are well-known. The contrast (i.e., the difference) between modeled spectral radiances (here, the combination of emission and reflected radiances) of seawater and oil is shown in FIG. 4, which depicts the contrasts |radiance_oil − radiance_sea| at various temperatures from 275 K to 325 K during the daytime 410 and during the nighttime 420. Note, the observed ocean will typically be acting as an emissive source of radiation with a temperature somewhere between 275 K and 325 K at nighttime, and during the daytime, there will be additional components to the radiation field corresponding to the reflected sunlight as well. In one embodiment, the contrast, at a given temperature, between the emitted radiance curves of seawater versus oil and/or reflected radiance of seawater versus oil, or a combination of the two, may be used to distinguish between water and oil. For example, oil and seawater may be distinguished using an approach which is sensitive to the shape of the curves in FIG. 4.
  • Returning to FIG. 3, a thermal camera multiplexor module 340 may be configured to multiplex the input from cameras 310-330 equipped with band-pass filters, producing signals B1, B2, and B3. Each of signals B1, B2, and B3 may produce a single data-point per image-pixel corresponding to a grayscale brightness of the scene in that particular spectral band. In one embodiment, signal B1 may be generated using an 8.0-9.0 μm band-pass filter, signal B2 may be generated using an 8.0-11.5 μm band-pass filter, and signal B3 may be generated using an 8.0-13.0 μm band-pass filter. In alternative embodiments, other band-pass filters may be used, including more (or fewer) than three band-pass filters and band-pass filters for different wavelength ranges. In yet another embodiment, vertical polarizing filters may also be used to minimize specular reflection effects.
  • The multiplexor module 340 may then compute the differences between the inputs, a=B2−B1, b=B3−B1, and c=B3−B2, at 342₁₋₃ and examine the relative sizes of the contrast values. In one embodiment, given the differences a, b, and c between the inputs, the multiplexor module 340 may generate a synthetic discriminant video stream by taking the ratio
  • $$\frac{s_1\,(s_1 + a)\,(s_2 + b)}{(s_2 + c)^2},$$
  • where s1 and s2 are constants which normalize the ratio to, e.g., the range [0,1], with 0 being water and 1 being oil. Here, the synthetic video stream is directly proportional to the differences a and b, which correspond to contrasts between wavelength ranges in which the difference between radiance from seawater is substantially greater than the difference between radiance from oil, and inversely proportional to the difference c, which corresponds to a contrast between wavelength ranges in which the difference between radiance from oil is substantially greater than the difference between radiance from seawater. As a result, the synthetic video stream tends to maximize the contrast between the spectral signatures of water and oil. That is, the synthetic video stream may be a black-and-white video stream in which the brightness of each image pixel corresponds to how closely the IR signature of the pixel matches what would be expected from oil.
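  • The per-pixel multiplexing step can be sketched as below. The sketch assumes the ratio reads s1·(s1 + a)·(s2 + b) / (s2 + c)², which matches the statement that the stream is directly proportional to a and b and inversely proportional to c. The function names, the default values of s1 and s2, and the final clipping to [0, 1] are hypothetical choices made for illustration; the source does not give values for the constants. Frames are represented as nested lists of grayscale brightness values.

```python
def discriminant_pixel(b1, b2, b3, s1=1.0, s2=1.0):
    """Combine three band-filtered brightness values into one discriminant.

    Assumes s2 + c is nonzero; s1 and s2 are placeholder constants.
    """
    a = b2 - b1
    b = b3 - b1
    c = b3 - b2
    return s1 * (s1 + a) * (s2 + b) / (s2 + c) ** 2


def discriminant_frame(f1, f2, f3, s1=1.0, s2=1.0):
    """Apply the discriminant to every pixel of three aligned frames.

    The clipping to [0, 1] is an assumption made here so the result can be
    treated as a grayscale image; it is not taken from the source.
    """
    return [
        [
            min(1.0, max(0.0, discriminant_pixel(p1, p2, p3, s1, s2)))
            for p1, p2, p3 in zip(row1, row2, row3)
        ]
        for row1, row2, row3 in zip(f1, f2, f3)
    ]


# Two-pixel toy frames for bands B1, B2, and B3.
synthetic = discriminant_frame(
    [[0.2, 0.3]], [[0.5, 0.3]], [[0.6, 0.3]], s1=0.5, s2=1.0
)
```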
  • As shown, the synthetic discriminant video stream output by the multiplexor module 340 is subsequently input to video analysis system 350, which is similar to the video analysis system described in conjunction with FIGS. 1-2. The video analysis system 350 may include a computer vision engine which determines, from the synthetic video stream, foreground blobs representing patches of contiguous pixels having values indicating a match to the IR signature of oil using, e.g., per-pixel ART networks, as previously discussed. The computer vision engine may also extract features such as locations and sizes of foreground blobs, rates of change in blob size and/or a measure of intensity (i.e., how bright the blob is), motion characteristics of the foreground blobs, whether the foreground blobs have non-sharp edges, whether the foreground blobs have high fractal dimension, and whether the foreground blobs are asymmetrical, etc. which are pertinent to sea-surface oil. The video analysis system 350 may further include a machine learning engine which receives the foreground blobs and features extracted by the computer vision engine, and which engages in undirected and unsupervised learning to discern patterns of object behaviors in the scene of the synthetic discriminant video stream, discussed in greater detail below. Thereafter, when unexpected (i.e., abnormal or unusual) sea-surface oil is observed, the machine learning engine may generate an alert so that the sea-surface oil may be investigated.
  • FIG. 5 illustrates an exemplary geometry for mounting video cameras on an offshore oil platform 510, according to one embodiment. As shown, the oil platform 510 includes a mast 515 on which one or more sets of LWIR cameras 520 are mounted, at a height h above a sea surface 500, to observe the sea surface 500. Each set of LWIR cameras may include three or more cameras, with the signal of each camera in the set filtered by a band-pass filter for a distinct wavelength range and the filtered signal being multiplexed to generate a synthetic discriminant video stream that is input to a video analysis system. In one embodiment, several sets of fixed cameras, each oriented toward a different azimuth, may be used to achieve full 360° azimuthal coverage of the sea surface. In an alternative embodiment, a single set of cameras may be configured to perform a continuing guard-tour sweep to achieve full azimuthal coverage.
  • Illustratively, the cameras 520 are able to view a segment of the sea surface beginning from near the platform 510 at r1 and extending out to a distance r2, which may be, e.g., several kilometers away from the platform 510. In one embodiment, wide-angle camera lenses may be used to view a relatively large portion of the sea surface. Generally, the higher the cameras 520 are placed (i.e., the greater h is), the further away the apparent horizon will be, and the further the cameras 520 will be able to see. However, due to effects from, e.g., sea-surface spray and aerosols, discrimination of oil from seawater may not be possible out to the horizon itself. The particular maximum distance and limiting ranges may depend on the cameras 520 used, the arrangement of the oil platform 510, among other things.
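  • The relationship between mounting height and the apparent horizon can be estimated with the standard geometric formula d = sqrt(2Rh + h²) for a spherical Earth of radius R. The sketch below applies it; the function name, the mean Earth radius used, and the example height of 30 m are assumptions for illustration, and atmospheric refraction is ignored. As noted above, the usable discrimination range may be shorter than this geometric limit.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, meters


def horizon_distance_m(height_m, radius_m=EARTH_RADIUS_M):
    """Line-of-sight distance to the geometric horizon from a given height.

    Ignores atmospheric refraction and sea state.
    """
    return math.sqrt(2.0 * radius_m * height_m + height_m ** 2)


# Hypothetical camera height of 30 m above the sea surface.
d_30m = horizon_distance_m(30.0)
```

  • For a camera 30 m above the sea surface this gives a geometric horizon of roughly 19.6 km, and doubling the height increases the distance by a factor of about 1.41.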
  • FIG. 6 illustrates a method 600 for detecting and reporting on anomalous sea-surface oil, according to one embodiment. As shown, the method 600 begins at step 610, where a camera multiplexor module receives video frames from LWIR cameras with distinct spectral band-pass filters. In one embodiment, three or more cameras may be used for purposes of detecting surface oil, and the band-pass filters may be chosen so as to let through light in wavelength ranges in which the radiance contrast between seawater and surface oil is relatively large. Doing so may permit the spectral radiance signatures of seawater and surface oil to be more clearly distinguishable from each other. In a particular embodiment, video frames B1, B2, and B3 may be received, with the B1 signal being filtered by an 8.0-9.0 μm band-pass filter, the B2 signal being filtered by an 8.0-11.5 μm band-pass filter, and the B3 signal being filtered by an 8.0-13.0 μm band-pass filter. Note, the specific bands here are representative values given for illustrative purposes, and the actual bands used may be different in other embodiments.
  • At step 620, the multiplexor module combines the received frames to create a synthetic discriminant video stream with brightness corresponding to a match with the IR signature of oil. In one embodiment, the multiplexor module may compute differences between pairs of received video frames in different wavelength ranges. In such a case, the synthetic video stream may be directly proportional to the difference(s) which correspond to contrasts between wavelength ranges in which the difference between radiance from seawater is substantially greater than the difference between radiance from oil, and inversely proportional to the difference(s) which correspond to contrasts between wavelength ranges in which the difference between radiance from oil is substantially greater than the difference between radiance from seawater, or vice versa. Returning to the example of received signals B1, B2, and B3 discussed above, the multiplexor module may compute the differences between the inputs, a=B2−B1, b=B3−B1, and c=B3−B2, and generate the discriminant video stream by taking the ratio
  • $$\frac{s_1\,(s_1 + a)\,(s_2 + b)}{(s_2 + c)^2},$$
  • where s1 and s2 are constants which normalize the ratio to, e.g., the range [0,1], with 0 being water and 1 being oil. In alternative embodiments, the particular form of the equation for multiplexing the received frames to generate a discriminant signal may be different.
  • At step 630, a video analysis system analyzes and learns behavioral patterns in the synthetic video stream. As discussed, a computer vision engine of the video analysis system may separate foreground blobs depicting oil from background depicting seawater given the synthetic discriminant video stream with brightness corresponding to a match with the IR signature of oil. For example, the computer vision engine may model the scene background and select pixels as foreground using per-pixel ART networks, discussed above. Contiguous regions of pixels classified as foreground may eventually be passed to the machine learning engine.
  • As discussed, the computer vision engine may also include an estimator/identifier component which identifies kinematic and/or appearance features of foreground objects such as size, height, width, and area (in pixels), reflectivity, shininess, rigidity, speed, velocity, etc. In one embodiment, features used to determine sea-surface oil may include the locations and sizes of foreground blobs, rates of change in blob size and/or a measure of intensity (i.e., how bright the blob is), motion characteristics of the foreground blobs, whether the foreground blobs have non-sharp edges, whether the foreground blobs have high fractal dimension, and whether the foreground blobs are asymmetrical. Such features may be particularly relevant to sea-surface oil, as surface oil blobs may tend to be, e.g., irregular in shape and thus have high fractal dimension, asymmetrical, lack sharp edges, move in certain ways, appear in certain places and have certain sizes, etc.
  • The foreground blobs and extracted features are provided to a machine learning engine of the video analysis system, which may observe foreground blobs and, over time, identify where patches tend to appear, how frequently patches appear, how long patches remain (which may depend on where a patch appeared), how large patches tend to be, and characteristics and/or patterns of other features as they tend to appear in the scene. With the observations of the sea-surface area for a period of time, the machine learning engine may build a model of expected behavior in the scene. Doing so permits commonly-occurring and spurious sea-surface oil patches, which may be caused by, e.g., normal operation of the oil platform, lighting artifacts or changes in the maritime environment, etc., to be learned so that alerts are not generated when such commonly-occurring false-positive patches are observed. For example, using shape, location, or other appearance features, the machine learning engine may automatically learn to classify foreground blobs by shape, location, and appearance. If an observed object in a later video frame has oil-like characteristics, and is thus extracted by the computer vision engine as a foreground blob, the machine learning engine may determine, based on the shape, location, or other appearance features of this new foreground blob, whether the blob is shaped, located, appears, etc. like objects which were previously observed.
  • In one embodiment, the machine learning engine may include a long-term memory storing data generalizing events observed in the scene, where the long-term memory is implemented as ART network(s) and sparse-distributed memory data structure(s), discussed above. In such a case, feature vectors may be supplied to an input layer of the ART network (or a combination of a self-organizing map (SOM) and ART network used to cluster nodes in the SOM), and the ART network may map the input feature vector to a cluster in the ART network and update that cluster (or create a new cluster if the input feature vector is sufficiently dissimilar to the existing clusters). Over time, predictable “oil” patches, whether resulting from oil generated incident to normal operation of the platform or spurious patches resulting from lighting artifacts or changes in the maritime environment, may produce relatively dense ART network clusters. Then, when another “oil” patch having a similar feature vector is received, the machine learning engine may map this “oil” patch to one of the dense clusters and, given such a mapping, identify the patch as “normal.” That is, the system may learn to ignore commonly-occurring and spurious sea-surface oil patches caused by, e.g., normal operation of the oil platform, lighting artifacts or changes in the maritime environment, etc.
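The map-update-or-create behavior described above can be sketched with Fuzzy ART, one well-known member of the ART family. The disclosure does not say which ART variant is used, so the choice of Fuzzy ART, the parameter values, and the class below are assumptions made for illustration only. Feature values are assumed to be scaled to the range [0, 1].

```python
import numpy as np


class FuzzyART:
    """Minimal Fuzzy ART clusterer with complement coding."""

    def __init__(self, vigilance=0.75, alpha=0.001, beta=1.0):
        self.rho = vigilance    # how similar an input must be to join a cluster
        self.alpha = alpha      # small constant in the choice function
        self.beta = beta        # learning rate (1.0 is "fast learning")
        self.weights = []       # one weight vector per cluster
        self.counts = []        # number of inputs mapped to each cluster

    def _choice(self, i, j):
        w = self.weights[j]
        return np.minimum(i, w).sum() / (self.alpha + w.sum())

    def learn(self, x):
        """Map x to a cluster, update that cluster, and return its index."""
        x = np.asarray(x, dtype=float)
        i = np.concatenate([x, 1.0 - x])  # complement coding
        # Try clusters in order of decreasing choice value.
        order = sorted(range(len(self.weights)), key=lambda j: -self._choice(i, j))
        for j in order:
            overlap = np.minimum(i, self.weights[j])
            if overlap.sum() / i.sum() >= self.rho:  # vigilance test
                self.weights[j] = (self.beta * overlap
                                   + (1.0 - self.beta) * self.weights[j])
                self.counts[j] += 1
                return j
        # No existing cluster is similar enough: create a new one.
        self.weights.append(i.copy())
        self.counts.append(1)
        return len(self.weights) - 1
```

In this sketch, `counts` serves as a rough stand-in for cluster density: a cluster that has absorbed many observations corresponds to a predictable patch, while a newly created or rarely hit cluster corresponds to an unfamiliar one.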
  • Additional and further approaches for extracting objects and features from video frames and learning and reporting on behaviors in a scene are discussed in, e.g., U.S. Pat. No. 8,126,833, entitled “Detecting Anomalous Events Using a Long-Term Memory in a Video Analysis System”; U.S. Pat. No. 8,131,012, entitled “Behavioral Recognition System”; U.S. Pat. No. 8,167,430, entitled “Unsupervised Learning of Temporal Anomalies for a Video Surveillance System”; U.S. Pat. No. 8,180,105, entitled “Classifier Anomalies for Observed Behaviors in a Video Surveillance System”; U.S. Pat. No. 8,189,905, entitled “Cognitive Model for a Machine-Learning Engine in a Video Analysis System”; U.S. Pat. No. 8,218,818, entitled “Foreground Object Tracking”; U.S. Pat. No. 8,270,733, entitled “Identifying Anomalous Object Types During Classification”; U.S. Pat. No. 8,285,060, entitled “Detecting Anomalous Trajectories in a Video Surveillance System”; U.S. Pat. No. 8,300,924, entitled “Tracker Component for Behavioral Recognition System”; U.S. Pat. No. 8,358,834, entitled “Background Model for Complex and Dynamic Scenes”; U.S. Pat. No. 8,411,935, entitled “Semantic Representation Module of a Machine-Learning Engine in a Video Analysis System”; U.S. Pat. No. 8,416,296, entitled “Mapper Component for Multiple Art Networks in a Video Analysis System”; and U.S. Pat. No. 8,494,222, entitled “Classifier Anomalies for Observed Behaviors in a Video Surveillance System,” which are hereby incorporated by reference in their entirety.
  • At step 640, the video analysis system generates alerts when anomalous behavior is observed. As discussed, the machine learning engine may, over time, learn to distinguish between observed patches of sea-surface oil that occur normally and patches of surface oil that do not, and are thus anomalous. When such an anomalous surface oil patch is observed, the video analysis system may issue an alert to, e.g., a user interface, so that the anomalous surface oil patch may be investigated.
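The alerting step can be illustrated with a short sketch. It assumes that each newly observed patch has already been mapped to a cluster, and that a cluster holding only a small share of past observations is treated as anomalous. The `Alert` record, the `generate_alerts` function, and the `min_support` threshold are hypothetical names and values introduced for this example, not part of the disclosed system.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    patch_id: int
    cluster: int
    support: float
    message: str


def generate_alerts(assignments, cluster_counts, min_support=0.05):
    """Return alerts for patches mapped to sparsely populated clusters.

    assignments: list of (patch_id, cluster_index) for newly observed patches.
    cluster_counts: number of past observations in each cluster.
    min_support: clusters holding a smaller share of all past observations
        than this are treated as anomalous (illustrative threshold).
    """
    total = sum(cluster_counts)
    alerts = []
    for patch_id, cluster in assignments:
        support = cluster_counts[cluster] / total if total else 0.0
        if support < min_support:
            alerts.append(Alert(patch_id, cluster, support,
                                f"Anomalous sea-surface oil patch {patch_id}"))
    return alerts
```

The returned records could then be forwarded to a user interface or logged for later investigation.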
  • Although discussed above with respect to distinguishing oil from seawater, techniques disclosed herein may be used to distinguish other objects having different spectral radiance signatures from one another. In such cases, the radiation need not be infrared light, and may instead be X-rays, ultraviolet light, visible light, microwaves, or radio waves, and appropriate cameras and/or filters may be used to capture the radiation. Further, although discussed above with respect to cameras, other devices, such as spectrometers, may be used in lieu of cameras.
  • Advantageously, techniques disclosed herein permit surface oil to be distinguished from seawater using input from multiple LWIR cameras whose signals are band-pass filtered and multiplexed to generate a single synthetic discriminant signal. Patterns of behavior in the scene are then learned so that anomalous sea-surface oil patches, which may result from oil spills or leaks, may be reported, while other surface oil patches from normal operation of the oil platform, spurious patches from changing maritime conditions, etc., are not reported.
  • While the foregoing is directed to embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow.

Claims (1)

What is claimed is:
1. A computer-implemented method for analyzing a scene depicted in an input stream of video frames, the method comprising:
for one or more of the video frames:
identifying one or more foreground blobs in the video frames, wherein each foreground blob corresponds to one or more contiguous pixels of the video frame determined to depict sea-surface oil; and
evaluating the one or more foreground blobs to derive expected patterns of observations of sea-surface oil.
US15/232,743 2012-08-20 2016-08-09 Method and system for detecting sea-surface oil Abandoned US20160350908A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US15/232,743 US20160350908A1 (en) 2012-08-20 2016-08-09 Method and system for detecting sea-surface oil

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201261691102P 2012-08-20 2012-08-20
US13/971,027 US9104918B2 (en) 2012-08-20 2013-08-20 Method and system for detecting sea-surface oil
US14/823,771 US9412027B2 (en) 2012-08-20 2015-08-11 Detecting anamolous sea-surface oil based on a synthetic discriminant signal and learned patterns of behavior
US15/232,743 US20160350908A1 (en) 2012-08-20 2016-08-09 Method and system for detecting sea-surface oil

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US14/823,771 Continuation US9412027B2 (en) 2012-08-20 2015-08-11 Detecting anamolous sea-surface oil based on a synthetic discriminant signal and learned patterns of behavior

Publications (1)

Publication Number Publication Date
US20160350908A1 (en) 2016-12-01

Family

ID=50100053

Family Applications (3)

Application Number Title Priority Date Filing Date
US13/971,027 Active 2033-10-18 US9104918B2 (en) 2012-08-20 2013-08-20 Method and system for detecting sea-surface oil
US14/823,771 Active US9412027B2 (en) 2012-08-20 2015-08-11 Detecting anamolous sea-surface oil based on a synthetic discriminant signal and learned patterns of behavior
US15/232,743 Abandoned US20160350908A1 (en) 2012-08-20 2016-08-09 Method and system for detecting sea-surface oil

Country Status (4)

Country Link
US (3) US9104918B2 (en)
EP (1) EP2885766A4 (en)
BR (1) BR112015003444A2 (en)
WO (1) WO2014031615A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108133176A (en) * 2017-12-05 2018-06-08 交通运输部规划研究院 A kind of method for filling out sea using remote Sensing Interpretation analytical technology monitoring harbour

Families Citing this family (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
BR112015003444A2 (en) * 2012-08-20 2017-07-04 Behavioral Recognition Sys Inc method and system for detecting oil on sea surface
US10373470B2 (en) * 2013-04-29 2019-08-06 Intelliview Technologies, Inc. Object detection
CA2847707C (en) 2014-03-28 2021-03-30 Intelliview Technologies Inc. Leak detection
US10943357B2 (en) 2014-08-19 2021-03-09 Intelliview Technologies Inc. Video based indoor leak detection
CN104574427B (en) * 2015-02-04 2016-01-20 中国石油大学(华东) A kind of offshore spilled oil image partition method
US10055648B1 (en) * 2015-04-16 2018-08-21 Bae Systems Information And Electronic Systems Integration Inc. Detection, classification, and tracking of surface contacts for maritime assets
WO2017087334A1 (en) * 2015-11-16 2017-05-26 Orbital Insight, Inc. Moving vehicle detection and analysis using low resolution remote sensing imagery
US10026193B2 (en) * 2016-05-24 2018-07-17 Qualcomm Incorporated Methods and systems of determining costs for object tracking in video analytics
DE102016210632A1 (en) 2016-06-15 2017-12-21 Bayerische Motoren Werke Aktiengesellschaft Method for checking a media loss of a motor vehicle and motor vehicle and system for carrying out such a method
CN107578064B (en) * 2017-08-23 2020-09-04 中国地质大学(武汉) Sea surface oil spill detection method based on superpixel and utilizing polarization similarity parameters
CN107609577B (en) * 2017-08-23 2020-05-01 中国地质大学(武汉) Method for extracting polarized SAR sea surface oil film by using random forest
JP6797860B2 (en) * 2018-05-02 2020-12-09 株式会社日立国際電気 Water intrusion detection system and its method
WO2021154459A1 (en) * 2020-01-30 2021-08-05 Boston Polarimetrics, Inc. Systems and methods for synthesizing data for training statistical models on different imaging modalities including polarized images
DE102020203293B4 (en) * 2020-03-13 2022-09-08 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung eingetragener Verein A device for detecting water on a surface and a method for detecting water on a surface
CN111396944B (en) * 2020-03-26 2021-04-23 珠海格力电器股份有限公司 Self-adaption method and device for working state of range hood, storage medium and range hood
US11526544B2 (en) * 2020-05-07 2022-12-13 International Business Machines Corporation System for object identification
US11741625B2 (en) * 2020-06-12 2023-08-29 Elphel, Inc. Systems and methods for thermal imaging
CN113490027A (en) * 2021-07-07 2021-10-08 武汉亿融信科科技有限公司 Short video production generation processing method and equipment and computer storage medium
CN113505712B (en) * 2021-07-16 2024-06-11 自然资源部第一海洋研究所 Sea surface oil spill detection method of convolutional neural network based on quasi-balance loss function
CN113567352B (en) * 2021-08-16 2022-06-03 中国人民解放军63921部队 Ocean oil spill detection method and device based on polarized hemispherical airspace irradiation
CN113869287B (en) * 2021-12-01 2022-03-11 南京信息工程大学 Low-wind-speed sea surface oil spill detection method
US20240062511A1 (en) * 2022-08-16 2024-02-22 Saudi Arabian Oil Company Identifying and remediating oil spills

Family Cites Families (57)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4679077A (en) 1984-11-10 1987-07-07 Matsushita Electric Works, Ltd. Visual Image sensor system
US5113507A (en) 1988-10-20 1992-05-12 Universities Space Research Association Method and apparatus for a sparse distributed memory system
JP3123587B2 (en) 1994-03-09 2001-01-15 日本電信電話株式会社 Moving object region extraction method using background subtraction
JPH11502335A (en) 1995-03-22 1999-02-23 イーデーテー インテルナツィオナール ディギタール テクノロギース ドイッチュラント ゲーエムベーハー Apparatus and method for providing depth modeling and depth information of moving object
US7076102B2 (en) 2001-09-27 2006-07-11 Koninklijke Philips Electronics N.V. Video monitoring system employing hierarchical hidden markov model (HMM) event learning and classification
US5969755A (en) 1996-02-05 1999-10-19 Texas Instruments Incorporated Motion based event detection system and method
US5751378A (en) 1996-09-27 1998-05-12 General Instrument Corporation Scene change detector for digital video
US6263088B1 (en) 1997-06-19 2001-07-17 Ncr Corporation System and method for tracking movement of objects in a scene
US6711278B1 (en) 1998-09-10 2004-03-23 Microsoft Corporation Tracking semantic objects in vector image sequences
US6570608B1 (en) 1998-09-30 2003-05-27 Texas Instruments Incorporated System and method for detecting interactions of people and vehicles
WO2000034919A1 (en) 1998-12-04 2000-06-15 Interval Research Corporation Background estimation and segmentation based on range and color
US7136525B1 (en) 1999-09-20 2006-11-14 Microsoft Corporation System and method for background maintenance of an image sequence
US6674877B1 (en) 2000-02-03 2004-01-06 Microsoft Corporation System and method for visually tracking occluded objects in real time
US6940998B2 (en) 2000-02-04 2005-09-06 Cernium, Inc. System for automated screening of security cameras
US7868912B2 (en) 2000-10-24 2011-01-11 Objectvideo, Inc. Video surveillance system employing video primitives
US6678413B1 (en) 2000-11-24 2004-01-13 Yiqing Liang System and method for object identification and behavior characterization using video analysis
US20030107650A1 (en) 2001-12-11 2003-06-12 Koninklijke Philips Electronics N.V. Surveillance system with suspicious behavior detection
US20060165386A1 (en) 2002-01-08 2006-07-27 Cernium, Inc. Object selective video recording
US7436887B2 (en) 2002-02-06 2008-10-14 Playtex Products, Inc. Method and apparatus for video frame sequence-based object tracking
US6856249B2 (en) 2002-03-07 2005-02-15 Koninklijke Philips Electronics N.V. System and method of keeping track of normal behavior of the inhabitants of a house
US7006128B2 (en) 2002-05-30 2006-02-28 Siemens Corporate Research, Inc. Object detection for sudden illumination changes using order consistency
US7227893B1 (en) 2002-08-22 2007-06-05 Xlabs Holdings, Llc Application-specific object-based segmentation and recognition system
US7200266B2 (en) 2002-08-27 2007-04-03 Princeton University Method and apparatus for automated video activity analysis
US6999600B2 (en) * 2003-01-30 2006-02-14 Objectvideo, Inc. Video scene background maintenance using change detection and classification
US7026979B2 (en) 2003-07-03 2006-04-11 Hrl Labortories, Llc Method and apparatus for joint kinematic and feature tracking using probabilistic argumentation
WO2005017781A1 (en) * 2003-07-25 2005-02-24 Sony Electronics Inc. Video content scene change determination
US7127083B2 (en) 2003-11-17 2006-10-24 Vidient Systems, Inc. Video surveillance system with object detection and probability scoring based on object class
US20060018516A1 (en) 2004-07-22 2006-01-26 Masoud Osama T Monitoring activity using video information
CA2575211C (en) 2004-07-30 2012-12-11 Euclid Discoveries, Llc Apparatus and method for processing video data
JP2006080437A (en) 2004-09-13 2006-03-23 Intel Corp Method and tool for mask blank inspection
GB0424030D0 (en) * 2004-10-28 2004-12-01 British Telecomm A method and system for processing video data
US7620266B2 (en) 2005-01-20 2009-11-17 International Business Machines Corporation Robust and efficient foreground analysis for real-time video surveillance
US20060190419A1 (en) 2005-02-22 2006-08-24 Bunn Frank E Video surveillance data analysis algorithms, with local and network-shared communications for facial, physical condition, and intoxication recognition, fuzzy logic intelligent camera system
ATE487201T1 (en) 2005-03-17 2010-11-15 British Telecomm TRACKING OBJECTS IN A VIDEO SEQUENCE
US20060222206A1 (en) 2005-03-30 2006-10-05 Cernium, Inc. Intelligent video behavior recognition with multiple masks and configurable logic inference module
US7825954B2 (en) 2005-05-31 2010-11-02 Objectvideo, Inc. Multi-state target tracking
US20070250898A1 (en) 2006-03-28 2007-10-25 Object Video, Inc. Automatic extraction of secondary video streams
CN101410855B (en) 2006-03-28 2011-11-30 爱丁堡大学评议会 Method for automatically attributing one or more object behaviors
CA2649389A1 (en) 2006-04-17 2007-11-08 Objectvideo, Inc. Video segmentation using statistical pixel modeling
US8467570B2 (en) 2006-06-14 2013-06-18 Honeywell International Inc. Tracking system with fused motion and object detection
US7916944B2 (en) 2007-01-31 2011-03-29 Fuji Xerox Co., Ltd. System and method for feature level foreground segmentation
KR101260847B1 (en) 2007-02-08 2013-05-06 비헤이버럴 레코그니션 시스템즈, 인코포레이티드 Behavioral recognition system
US8358342B2 (en) 2007-02-23 2013-01-22 Johnson Controls Technology Company Video processing systems and methods
US8086036B2 (en) 2007-03-26 2011-12-27 International Business Machines Corporation Approach for resolving occlusions, splits and merges in video images
US7813528B2 (en) 2007-04-05 2010-10-12 Mitsubishi Electric Research Laboratories, Inc. Method for detecting objects left-behind in a scene
US8064639B2 (en) 2007-07-19 2011-11-22 Honeywell International Inc. Multi-pose face tracking using multiple appearance models
US8124931B2 (en) * 2007-08-10 2012-02-28 Schlumberger Technology Corporation Method and apparatus for oil spill detection
WO2009049314A2 (en) 2007-10-11 2009-04-16 Trustees Of Boston University Video processing system employing behavior subtraction between reference and observed video image sequences
EP2093698A1 (en) 2008-02-19 2009-08-26 British Telecommunications Public Limited Company Crowd congestion analysis
US8452108B2 (en) 2008-06-25 2013-05-28 Gannon Technologies Group Llc Systems and methods for image recognition using graph-based pattern matching
US8121968B2 (en) 2008-09-11 2012-02-21 Behavioral Recognition Systems, Inc. Long-term memory in a video analysis system
US9633275B2 (en) * 2008-09-11 2017-04-25 Wesley Kenneth Cobb Pixel-level based micro-feature extraction
US9373055B2 (en) 2008-12-16 2016-06-21 Behavioral Recognition Systems, Inc. Hierarchical sudden illumination change detection using radiance consistency within a spatial neighborhood
US9756262B2 (en) * 2009-06-03 2017-09-05 Flir Systems, Inc. Systems and methods for monitoring power systems
TW201140470A (en) * 2010-05-13 2011-11-16 Hon Hai Prec Ind Co Ltd System and method for monitoring objects and key persons of the objects
US8749636B2 (en) * 2011-07-12 2014-06-10 Lockheed Martin Corporation Passive multi-band aperture filters and cameras therefrom
BR112015003444A2 (en) * 2012-08-20 2017-07-04 Behavioral Recognition Sys Inc method and system for detecting oil on sea surface

Also Published As

Publication number Publication date
WO2014031615A1 (en) 2014-02-27
BR112015003444A2 (en) 2017-07-04
US9104918B2 (en) 2015-08-11
EP2885766A4 (en) 2017-04-26
EP2885766A1 (en) 2015-06-24
US9412027B2 (en) 2016-08-09
US20140050355A1 (en) 2014-02-20
US20150347856A1 (en) 2015-12-03

Similar Documents

Publication Publication Date Title
US9412027B2 (en) Detecting anamolous sea-surface oil based on a synthetic discriminant signal and learned patterns of behavior
US9111148B2 (en) Unsupervised learning of feature anomalies for a video surveillance system
US9652863B2 (en) Multi-mode video event indexing
US10282622B2 (en) Marine intrusion detection system and method
US8565484B2 (en) Forest fire smoke detection method using random forest classification
US9412025B2 (en) Systems and methods to classify moving airplanes in airports
CN102867386B (en) Intelligent video analysis-based forest smoke and fire detection method and special system thereof
US9111353B2 (en) Adaptive illuminance filter in a video analysis system
US20110043689A1 (en) Field-of-view change detection
US20160078272A1 (en) Method and system for dismount detection in low-resolution uav imagery
Tiwari et al. A survey on shadow detection and removal in images and video sequences
Yoon et al. An intelligent automatic early detection system of forest fire smoke signatures using Gaussian mixture model
Abidha et al. Reducing false alarms in vision based fire detection with nb classifier in eadf framework
Gragnaniello et al. Fire and smoke detection from videos: A literature review under a novel taxonomy
Lee et al. Fire detection using color and motion models
Hosseini et al. Anomaly and tampering detection of cameras by providing details
Patino et al. A comprehensive maritime benchmark dataset for detection, tracking and threat recognition
Muncaster et al. Real-time automated detection, tracking, classification, and geolocation of dismounts using EO and IR FMV
Schoonmaker et al. A multispectral automatic target recognition application for maritime surveillance, search, and rescue

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION