[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

US9977991B2 - Secure and private tracking across multiple cameras - Google Patents

Secure and private tracking across multiple cameras Download PDF

Info

Publication number
US9977991B2
US9977991B2 US14/693,791 US201514693791A US9977991B2 US 9977991 B2 US9977991 B2 US 9977991B2 US 201514693791 A US201514693791 A US 201514693791A US 9977991 B2 US9977991 B2 US 9977991B2
Authority
US
United States
Prior art keywords
input
site
function
video camera
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US14/693,791
Other versions
US20150227814A1 (en
Inventor
Chun-Te Chu
Jaeyeon Jung
Zicheng Liu
Ratul Mahajan
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Microsoft Technology Licensing LLC
Original Assignee
Microsoft Technology Licensing LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Microsoft Technology Licensing LLC filed Critical Microsoft Technology Licensing LLC
Priority to US14/693,791 priority Critical patent/US9977991B2/en
Assigned to MICROSOFT CORPORATION reassignment MICROSOFT CORPORATION ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: JUNG, JAEYEON, LIU, ZICHENG, MAHAJAN, RATUL, CHU, CHUN-TE
Assigned to MICROSOFT TECHNOLOGY LICENSING, LLC reassignment MICROSOFT TECHNOLOGY LICENSING, LLC ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: MICROSOFT CORPORATION
Publication of US20150227814A1 publication Critical patent/US20150227814A1/en
Priority to US15/958,820 priority patent/US10181090B2/en
Application granted granted Critical
Publication of US9977991B2 publication Critical patent/US9977991B2/en
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/02Comparing digital values
    • G06K9/6201
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/51Indexing; Data structures therefor; Storage structures
    • G06F17/3028
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • G06K9/00711
    • G06K9/52
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/292Multi-camera tracking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/18Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast
    • H04N7/181Closed-circuit television [CCTV] systems, i.e. systems in which the video signal is not broadcast for receiving images from a plurality of remote sources
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20036Morphological image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20224Image subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30232Surveillance

Definitions

  • Camera-based surveillance is widely employed to detect against crime such as burglaries and vandalism. Recently, the commoditization of video cameras and the advances in computer vision have lowered the barrier of deploying a small-scale camera-based surveillance system. These small-scale systems may be sufficient for recording activities in the immediate area, but some activities do not appear suspicious until data from multiple cameras is analyzed in aggregate. These types of activities include strangers knocking on several doors in a neighborhood (i.e., to figure out which houses are empty) and cars cruising the neighborhood without stopping anywhere (i.e., as part of reconnaissance prior to a burglary).
  • This disclosure presents a distributed surveillance technique that can track objects (e.g., humans, cars, etc.) across multiple cameras without sharing information about the objects other than the existence or absence of a match between two objects.
  • Objects are identified in video feeds captured by video cameras and represented mathematically by generating feature vectors from the objects.
  • a distance metric learning approach is used to compare the distance between two feature vectors and determine if vectors are similar enough to each other to conclude that the corresponding objects are actually two different videos of the same object. Since the creation of feature vectors is not limited to objects captured in video this technique is equally applicable in other fields that include data which can be reduced to a vector representation.
  • a distance function used to calculate the similarity between two vectors is decomposed into portions that are computed locally and portions that are computed jointly.
  • Computing systems associated with the respective cameras may solve the local portions of the distance function without privacy concerns. Therefore, computationally intensive encryption is not needed to solve the local portions of the distance function.
  • the joint portions of the distance function use data obtained from multiple cameras which can raises privacy concerns. Cryptographic operations are used to preserve privacy when solving the joint portion of the distance function so this part of the distance function imposes a higher processing burden on the associated computing systems.
  • the joint portion of the distance function uses a garbled circuit based secure multi-party computation to keep both participants' contributions (i.e., data derived from solving the local portions of the distance algorithm) hidden from each other.
  • the function may be represented as a Boolean circuit which is constructed of various logic gates. The type of and number of logic gates is an indication of the complexity, and thus the computer processing needed, to solve the function.
  • the joint portion of the distance function is constructed as an efficient Boolean circuit that uses a number and type of logic gates which allows for solving of the joint portion of the distance function with a relatively low computational burden while still maintaining privacy and preserving matching accuracy.
  • the approach of decomposing a distance function into local portions and joint portions as well as the design of the Boolean circuit provide a computationally efficient and privacy-preserving technique for comparing feature vectors. This technique may be use to track an object across multiple cameras without sharing private information between the cameras.
  • FIG. 1 is an illustrative architecture showing two sites equipped with video cameras collaborating to track an object.
  • FIG. 2 is a block diagram of components that may be present in a computing system shown in FIG. 1 .
  • FIG. 3 shows an ADD*(N, 1) garbled circuit when N is in power of 2.
  • FIG. 4 shows an ADD*(N, 1) garbled circuit when N is not in power of 2.
  • FIG. 5 shows a blind tracking distance computation (BTDC) garbled circuit.
  • FIGS. 6A and 6B show an illustrative process for joint computation of a match between two feature vectors.
  • FIG. 7 is a Receiver Operation Characteristic (ROC) curve showing the effect of training set size on accuracy of classifying object from the Viewpoint Invariant Pedestrian Recognition (VIPeR) database.
  • ROC Receiver Operation Characteristic
  • FIG. 8 is a ROC curve showing the effect of quantization level of an input vector on accuracy of classifying an object from the VIPeR database.
  • FIG. 9 is a ROC curve showing the relative accuracy of different secure matching functions for objects from the VIPeR database.
  • FIG. 10 is a ROC curve showing the relative accuracy of different secure matching functions for objects from the Imagery Library for Intelligent Detection Systems (iLIDS) database.
  • iLIDS Imagery Library for Intelligent Detection Systems
  • FIG. 11 is a ROC curve showing the relative accuracy of different secure matching functions for objects from the “human database.”
  • FIG. 12 is a ROC curve showing the relative accuracy of different secure matching functions for objects from the “car database.”
  • FIG. 13 is a graph showing linear change in the time required to compute a match between objects as a number of objects to match changes.
  • FIG. 1 shows an architecture 100 in which a first site 102 performs privacy-preserving multi-camera tracking with a second site 104 .
  • the first site 102 may be a residence located near to the second site 104 which may be a business.
  • Both the first site 102 and the second site 104 are equipped with video cameras 106 and 108 respectively.
  • each site 102 and 104 is shown as having only a single video camera 106 and 108 for simplicity, the sites 102 and 104 may each have more than one video camera.
  • the video cameras 106 and 108 may use any technology for capturing video images and may capture images across any spectrum such as visible light, infrared, etc. A still camera that captures a series of still images in succession may be substituted for either of the video cameras 106 and 108 .
  • Each of the sites 102 and 104 also contain one or more computing systems 110 and 112 .
  • the computing systems 110 and 112 may be integrated in whole or part with the video cameras 106 and 108 . Alternatively, the video cameras 106 and 108 may simply provide video feeds to the respective computing systems 110 and 112 .
  • the computing systems 110 and 112 may also be implemented as a single device such as a desktop computer, notebook computer, net book, etc., a local system comprising multiple devices such as a server and local area network, a distributed system such as a cloud computing system that may be located remotely and accessed through a communications network, or any other system architecture for processing information.
  • Each of the computing systems 110 and 112 may include a joint computation module 114 and 116 .
  • the joint computation modules 114 and 116 may facilitate each of the respective computing system's 110 and 112 participation in secure multi-party computation.
  • the joint computation modules 114 and 116 each provide an output that is usable by the other module for joint computation.
  • the joint computation modules 114 and 116 may be implemented using different code, different programming languages, and the like. Further details of the joint computation modules 114 and 116 are provided below.
  • Each of the respective sites 102 and 104 exists within a privacy boundary 118 and 120 .
  • the privacy boundaries 118 and 120 may include cameras, computing systems, and the like that are physically separated.
  • a portion of a one of the computing systems 110 and 112 implemented through a cloud computing architecture may utilize memory and processing resources that are physically remote from the respective site 102 and 104 but are nevertheless within the corresponding privacy boundary 118 and 120 due to a secure configuration of the cloud computing architecture.
  • the business at site 104 may have multiple locations (e.g., retail stores) and share video information among separate locations without privacy concerns.
  • physically proximate video cameras, computing systems and the like may be separated by a privacy boundary.
  • different residents may have their own video cameras. All of the video cameras, and possibly the respective computing systems connected to the video cameras, are located in the same apartment building.
  • each of the individual residents may have privacy concerns regarding sharing video information with his or her neighbors.
  • a single building may have multiple privacy boundaries within it.
  • the computing systems 110 and 112 have at least one communication connection 122 used for communicating with another site 102 and 104 inside another privacy boundary 118 and 120 in order to perform the secure multi-party computation.
  • the communication connection 122 may be a direct link between the computing systems 110 and 122 such as a physical cable or a wireless connection.
  • the communication connection 122 may be made through a communications network that also handles communication traffic from other computing systems.
  • the communications network may include any type of network for handling electronic data such as the Internet, a phone network, a cable network, etc.
  • each of the video cameras 106 and 108 captures video images of whatever background is in view of the video camera and foreground objects moving in front of that background.
  • the video camera 106 at the first site 102 may observe object A 124 .
  • the video camera 108 at the second site 104 may observe object B 126 .
  • Multi-camera tracking is used to determine if object A 124 is the same as object B 126 .
  • the objects may both be people and secure multi-party computation implemented in part by the joint computation modules 114 and 116 can be used to determine if object A 124 is the same person as object B 126 while still keeping the content of the respective video feeds obtained by the sites 102 and 104 secret from each other.
  • FIG. 2 shows an illustrative block diagram 200 of components that may be included in the computing system(s) 110 of FIG. 1 .
  • the computing system(s) 112 of FIG. 1 may include similar components but is omitted for simplicity.
  • the computing system(s) 110 contain one or more processing elements 202 and memory 204 distributed throughout one or more locations.
  • the processing elements 202 may include any combination of central processing units (CPUs), graphical processing units (GPUs), single core processors, multi-core processors, application-specific integrated circuits (ASICs), and the like.
  • One or more of the processing element(s) 202 may be implemented in software and/or firmware in addition to hardware implementations.
  • Software or firmware implementations of the processing element(s) 202 may include computer- or machine-executable instructions written in any suitable programming language to perform the various functions described.
  • Software implementations of the processing elements(s) 202 may be stored in whole or part in the memory 204 .
  • the memory 204 may be volatile (such as RAM) and/or non-volatile (such as ROM, flash memory, etc.).
  • the computing system(s) 110 may also include additional computer-readable media such as removable storage, non-removable storage, local storage, and/or remote storage.
  • the memory 204 and any associated computer-readable media may provide storage of computer readable instructions, data structures, program modules, and other data.
  • Computer-readable media includes, at least, two types of media, namely computer-readable storage media and communications media.
  • Computer-readable storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data.
  • Computer-readable storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device.
  • communication media may embody computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transmission mechanism.
  • a modulated data signal such as a carrier wave, or other transmission mechanism.
  • computer-readable storage media and communication media are mutually exclusive.
  • the computing system(s) 110 include a video camera connection 206 that may provide hardware and/or software interfaces for receiving a video feed from the video camera 106 and providing commands to the video camera 106 .
  • a communication connection 208 may connect the computing system(s) 110 to other computing systems such as those at the second site 104 .
  • the communication connection 208 may include hardware such as a socket for a network cable or antenna for radio transmissions as well as software for processing incoming and outgoing signals.
  • the block diagram 200 shows multiple modules included within the memory 204 . These modules may be implemented in software and alternatively, or in addition, implemented, at least in part, by one or more hardware logic components.
  • illustrative types of hardware logic components include Field-programmable Gate Arrays (FPGAs), Program-specific Integrated Circuits (ASICs), Program-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), etc.
  • An object detection and tracking module 210 detects an object in a video feed from the video camera 106 when the object is in view of the video camera 106 .
  • the object may be object A 124 shown in FIG. 1 .
  • the object detection and tracking module 210 can track objects across frames within a single camera such as video camera 106 .
  • the computation performed by the object detection and tracking module 210 involves only local processing (i.e., no information exchange needed), and thus, does not affect secure computation.
  • the object detection module 210 may perform background subtraction followed by blob extraction.
  • Background subtraction maintains a statistical background model that may be represented as a Gaussian distribution per pixel.
  • the statistical background model may be used to determine whether the pixel belongs to the background or foreground. While the module is making this determination, the mean and variance parameters of the background model can be updated.
  • the output of the background subtraction may be a binary image where 0 indicates background and 1 indicates foreground.
  • morphological reconstruction filters may be used to remove noise, connect broken regions of the foreground, and smooth the shapes of the foreground pixels.
  • the 4-connected neighbor rule can be used to extract the blobs that represent foreground objects.
  • the object detection and tracking module 210 may maintain a list of objects that are present in view of the video camera 106 .
  • Information stored for each of the objects in the list may include a state of the tracked objects including their location, size, and velocity.
  • a Kalman filter can be used to estimate the current state of a tracked object.
  • the object detection and tracking module 210 determines whether a tracked object is under occlusion by checking if a predicted state of the tracked object is consistent with the observation in the new frame. If the tracked object is not occluded, the result from blob extraction is regarded as reliable, and the nearest blob is selected as the corresponding measurement. Otherwise, the mean-shift method may be used to obtain the measurement. By using the measurement obtained either from the result of blob extraction or the result of the mean-shift method, it is possible to update the Kalman filter and obtain the current state even of occluded objects.
  • the feature vector module 212 can extract a feature vector from an object identified by the object detection and tracking module 210 .
  • Some computer vision algorithms use face or license plate features for identifying humans or cars captured in a video feed. However, in an uncontrolled environment like a neighborhood, these features may be unreliable since video cameras do not always capture a clear view of the face or license plate of an object. Therefore, the whole-body appearance of the object may provide a better basis for feature vector extraction.
  • Two categories of features that may be used in object matching (or re-identification) are local descriptors and global features.
  • Local descriptors are extracted from interest points on the objects, and the matching is performed using a voting scheme. Because local descriptors compute interest point detection in the feature extraction phase, this can add to the processing overhead and potentially affect accuracy.
  • Global features are easier to compute and some are effective for person re-identification.
  • color histograms and texture histograms may be used to create feature vectors from the whole-body or different parts of the body of an object such as a person or car.
  • One technique for generating a color histogram includes dividing an object into six horizontal regions. In each region, color histograms of RGB, YCbCr, and HSV color spaces are extracted. In this technique, each channel has 64 bins, and the histograms are concatenated into the feature vector in a 3456-dimension feature space.
  • a preprocessing module 214 performs local processing of the feature vector by normalizing the feature vector and quantizing the feature vector into an integer value to form a secure input. For example, feature vectors may be transformed into the secure input which can be a vector that has low dimensional integer entries.
  • the preprocessing module 214 may perform a local computation portion of a vector distance function that is decomposed into local and joint computation portions. The decomposition minimizes the parts of the vector distance function that are affected by the complexity of secure computation because the local computation portion may be solved without privacy-protecting security.
  • the training stage is discussed in greater detail below.
  • each entry of z i may be normalized to a real number from 0 to 1 and further quantized into an l-bit integer value to form a q-dimensional input vector z i which can be used as a secure input to a garbled circuit.
  • Matching accuracy may be improved with an offline training phase in order for the vector matching function to learn the distance metric.
  • This learning may be done in a pairwise manner. That is, a set of parameters may be learned for each pair of cameras (e.g., video camera 106 and video camera 108 ), rather than learning global parameters. Objects by the camera that the site owners have indicated a willingness to share publicly may be used as inputs for the learning.
  • a garbled circuit generator 216 generates a garbled circuit for the joint computation portion of the vector distance function. Inputs to the garbled circuit comprise the secure input from the first site 102 and a secure input from the second site 104 .
  • the second site 104 generates the secure input by local processing using techniques that are the same or similar to those described above for the first site 102 .
  • the secure computation protocol may be based on a Boolean circuit for a predefined distance function D(x i ,x j ).
  • One party (the circuit generator, here the first site 102 ) prepares a garbled version of the circuit and the second party (the circuit evaluator, here the second site 104 ) obliviously computes the output of the garbled circuit.
  • the circuit generator here the first site 102
  • the circuit evaluator here the second site 104
  • a match query module 218 sends a query via the communication connection 208 to the second site 104 to request participation in solving the jointly-computed portion of the vector distance function.
  • the match query module 218 may send a secure matching function evaluation request to one or more peer sites such as the second site 104 every time the object detection and tracking module 210 records that a suspicious object has entered the view of the video camera 106 .
  • Suspicious object may be defined by criteria that are local to the computing system(s) 110 . For example, feature vectors of suspicious or “blacklisted” objects may be stored in the database 222 and newly detected objects may be compared against this list of suspicious objects.
  • a secure matching function evaluation request may also be sent when an object enters or leaves the view of the video camera 106 .
  • the computing system(s) 110 may issue secure matching function evaluation requests to some or all of the other participating sites.
  • the issuing system e.g., computing system(s) 110 of the first site 102
  • the responding system e.g., computing system(s) 112 of the second site 104
  • the topology of cameras may be used to improve matching efficiency. Based on an object's time and spatial information, sites can send matching requests only to select other sites that have video cameras positioned to cover locations where the object is likely to travel next.
  • the joint computation module 114 introduced in FIG. 1 solves the jointly-computed portion of the vector distance function with the second site 104 to determine if the first object (e.g., object A 124 ) is the same as the second object (e.g., object B 126 ). Matching between input vectors may be performed through secure two-party computation using the joint computation module 114 of the first site 102 and the joint computation module 116 of the second site 104 .
  • the garbled circuit constructed by the garbled circuit generator 216 and used for joint computation portion of the vector distance function is an “efficient” circuit because it may be constructed to use only integer comparison (CMP), min (MIN), and addition (ADD) circuits.
  • the garbled circuit takes as input the two input vectors from the sites 102 and 104 and a comparison threshold used to set the stringency of matching. If the computed distance between the two input vectors is smaller than the comparison threshold, the vectors are regarded as matching (i.e., object A 124 and object B 126 are the same object).
  • An alert module 220 generates an alert depending on if the first object is or is not the same as the second object. In some implementations, the alert module 220 may generate the alert in real time essentially as soon as a result is received from the joint computation module 114 . Depending on the criteria used to determine whether a particular behavior is “suspicious” behavior or not, either the presence or the absence of a match may be a trigger for an alert.
  • the alert may be a local alert shared only with the first site 102 , shared only with the first site 102 and the second site 104 , or the alert may be shared with other sites that are connected to either the first site 102 or the second site 104 .
  • a database 222 may also be present in a computing system(s) 110 either in the memory 204 or another location.
  • the database 222 may store information such as feature vectors extracted from objects that have been in the view of the video camera 106 .
  • the computing system(s) 110 may store a record of all the objects seen by storing the corresponding feature vectors.
  • This database 222 may be queried in response to an incoming match request from another site (e.g., second site 104 ).
  • the computing system(s) 110 when functioning as a server may respond to queries not only for objects that are currently in view of the video camera 106 but also for objects that have previously been viewed and are stored, as feature vectors, in the database 222 .
  • Each record in the database 222 may be associated with entry and exit timestamps recording when the object entered and left the view of the video camera 106 . This may enable sites in a surveillance community to track how an object moves through the community. The path of the object may be used as a trigger to raise one or more alerts.
  • Distance metric learning is one way to improve matching accuracy by training a distance functionD (x i ,x j ) where x i and x j represent feature vectors extracted from objects.
  • distance metric learning techniques are computationally expensive and inefficient for secure two-party computation.
  • One goal of distance metric learning is to learn a distance function that can discriminate positive and negative examples (e.g., objects known to be the same and objects known to be different). That is, the distance calculated by an effective distance function D(x i ,x j ) is likely to be small when x i and x j correspond to the same object and large when x i and x j correspond to different objects.
  • a secure computation protocol starts with a Boolean circuit for D(x i , x j ).
  • a portion of the circuit may be garbled to provide privacy.
  • the garbling involves generating keys to each wire and generating a garbled truth table for each gate.
  • the number of gates determines the complexity of a garbled circuit, impacting the performance of generating and evaluating the garbled circuit.
  • XOR gates come “free” meaning that cryptographic operations are not required for computing outputs from XOR gates in garbled circuits.
  • Table I compares the size (as measured in terms of the number of non XOR gates) of efficient circuit constructions for basic functions that compute on two l-bit integers (i.e., the input vectors may be converted to integer values before being used as inputs to the garbled circuit).
  • FIG. 3 shows a garbled circuit construction 300 of this function when N is in power of 2.
  • ADD circuits are connected in a hierarchical fashion to minimize table size.
  • the output of ADD*(N,l) is a (l+log 2 N)-bit integer. Since the number of non XOR gates of ADD(l) circuit is
  • 1 as shown in Table I above, the size of ADD*(N,l) is defined by Equation 2
  • N is decomposed into the summation of a set of numbers consisting of power of 2 as shown in Equation 3.
  • roots[t] the remaining x j 13. end if 14.
  • right_tree output of the above ADD circuit 18.
  • gateN gateN +
  • r r ⁇ 1 20. end while
  • x i ⁇ d denotes the original real-valued feature vector.
  • p denote the positive training set consisting of matched feature vector pairs
  • n denote the negative training set consisting of unmatched feature vector pairs.
  • PRDC Probabilistic Relative Distance Comparison
  • the first intermediate step of Table II shows computing the absolute value of the subtraction of two l-bit integers (assuming each input element is quantized to be an l-bit integer).
  • This circuit can be constructed by connecting 2Sorter(l) (which takes two l-bit integers, x and y, and outputs min(x, y) and max(x, y)) and one regular subtraction circuit.
  • this first intermediate step requires d numbers of 2Sorter(l) and SUB(l) circuits.
  • the second intermediate step of Table II performs d ⁇ L multiplications of two l-bit integers which are the outputs of the first intermediate step. Then, the second intermediate step adds d elements of the multiplication results, which are 2l-bit integers.
  • the third intermediate step of Table II first performs L multiplications of two (2l+log 2 d)-bit integers and the output of each multiplication which is a (4l+2 log 2 d)-bit integer. This is summed up (L elements total) and the output which is a (4l+2 log 2 d+2 log 2 L)-bit integer that is compared against the threshold. The distance threshold for match determination is estimated in a training stage to achieve appropriate true positive rates and false positive rates. In sum, the number of non XOR circuits for PRDC is shown by Equation 5
  • W T x i is assumed to be quantized to be an l-bit integer. Construction of the circuits in the intermediate steps of Table III is straightforward to one of ordinary skill in the art and omitted.
  • the non-XOR circuit size for PRDC without absolute values is found by Equation 7:
  • L ⁇
  • CMP(2 l +log 2 2 l 2 L+ 2 lL+L ⁇ 1.
  • MCC Metric learning by Collapsing Classes
  • PCA Principal Component Analysis
  • W T x i is assumed to be quantized to be an l-bit integer.
  • the circuits for each of the intermediate steps of Table IV are similar to the circuits for PRDC without absolute values shown in Table III but with L replaced by q. Therefore, the circuit size for MCC is found by Equation 10:
  • q ⁇
  • CMP(2 l +log 2 q 2 l 2 q+ 2 lq+q ⁇ 1.
  • An alternative function may be developed by realizing that learning the function D(.) is equivalent to learning the parameter W and observing that the distance function used for MCC, D 3 (.), is equivalent to computing a Euclidean distance in an alternate vector space specified by W. Replacing the Euclidean distance function with a histogram intersection function after linearly projecting onto a new vector space specified by W avoids the secure multiplication operation shown in the forth intermediate step of Table IV.
  • the distance function referred to herein as Blind Tracking Distance Computation (BTDC) is Equation 12:
  • w q ] ⁇ r ⁇ q , r ⁇ d and q ⁇ r and T ⁇ 0 is a predefined parameter that ensures the non-negativity of w k T ⁇ circumflex over (x) ⁇ i +T for any feature vector ⁇ circumflex over (x) ⁇ i .
  • the BTDC function is computationally efficient in part because computation is shifted locally (i.e., where there are no privacy concerns) and joint computation is minimized.
  • the Boolean circuits used to represent the BTDC function are shown in Table V.
  • the first and second intermediate steps of Table V are preprocessing steps performed by local computation. For example, these steps may be performed by the preprocessing module 214 .
  • the secure two-party computation represented by the third and fourth intermediate steps of Table V may take only integer input, therefore each entry of z i may be first normalized to a real number from 0 to 1 and further quantized into an l-bit integer value to form a q-dimensional input vector z i .
  • BTDC simplifies the garbled circuit needed for secure function evaluation, as it only requires integer comparison (CMP), min (MIN), and addition (ADD).
  • CMP integer comparison
  • MIN min
  • ADD addition
  • the garbled circuit uses two vectors, ⁇ circumflex over (z) ⁇ i and ⁇ circumflex over (z) ⁇ i (ql-bit integers) and the comparison threshold t as inputs.
  • FIG. 5 shows the garbled circuit of the BTDC distance function.
  • the parameter W from D 4 (.) above may be learned by learning a linear projection matrix, such that the pairs in set p have small distances, and the pairs in set n have large distances. This can be achieved by defining a conditional distribution over points i ⁇ j such that Equation 15 is
  • Equation 17 the cost function ⁇ 1 (W) is defined as Equation 17:
  • Equation 18 K-L divergence which measures the distance between two distributions.
  • Equation 20 is ( w k T ⁇ circumflex over (x) ⁇ i ) 2 ⁇ w k ⁇ 2 2 ⁇ circumflex over (x) ⁇ i ⁇ 2 2 ⁇ w k ⁇ 2 2 ⁇ 2 .
  • Equation 21 If ⁇ w k ⁇ 2 2 is bounded, (w k T ⁇ circumflex over (x) ⁇ i ) 2 is bounded; that is, for all the vectors ⁇ circumflex over (x) ⁇ i , there exists a nonnegative scalar T such that w k T +T ⁇ 0, ⁇ k. This yields Equation 21:
  • the metric learning problem is formulated as finding W that minimizes the objective function J(W) as in Equation 26:
  • FIGS. 6A and 6B show process 600 illustrated as a collection of blocks in a logical flowchart, which represents a sequence of operations that can be implemented in hardware, software, or a combination of hardware and software.
  • process 600 may be performed using different architectures and/or computing systems.
  • the first feature vector may be a private input that the first site 102 desires to keep secret from the second site 104 .
  • the first feature vector may be extracted from an object viewed by the video camera 106 of the first site 102 .
  • other types of private inputs such as still images, images of human faces, handwritten text, biometric features, or the like may also be the source of the first feature vector.
  • the first feature vector may be quantized into an integer value for later processing.
  • the object is an object for which the system will seek a possible match with a peer site (e.g., second site 104 ).
  • a determination to seek a match for a given object may be based on the first object matching an object of interest, entering the view of the first video camera 106 , leaving the view of the first video camera 106 , or for another reason. Determining that the object matches an object of interest may be performed locally at the first site 102 using a distance matching function that is not necessarily private or secure. The match may be performed by comparing the first feature vector to stored feature vectors that are labeled as “objects of interest.” For example, in a security implementation, cars that belong to known criminals may be used as objects of interest. If the object is determined to be an object for which a match is sought from a peer site, process 600 proceeds along the “yes” path to 608 .
  • process 600 may proceed from 604 along the “no” path to 606 and the process 600 may end.
  • all objects may be submitted for secure matching so the process 600 would not end but rather proceed to 608 for all detected objects.
  • a secure match evaluation request is submitted to a peer computing system.
  • the peer computing system may be the computing system 112 of the second site 104 .
  • the peer computing system receives the secure match evaluation request.
  • the peer computing system at the second site 104 may obtain a feature vector at 612 by techniques similar or identical to those used by the first site 102 at 602 .
  • the second site 104 may obtain the feature vector from the current video feed of the video camera 108 or from a database that stores feature vectors from previously observed objects. For example, the peer computing system may compare the incoming requests to objects that appeared within a predetermined time window (e.g., 10 minutes) prior to receiving the request.
  • a predetermined time window e.g. 10 minutes
  • the first site 102 solves its local computation portion of a distance function.
  • the local portion of the distance function is computed within the privacy boundary 118 of the first site 102 .
  • the local computation portion may comprise transformation of the feature vector obtained at 602 to a lower dimensionality vector.
  • the transformation may be performed by PCA and linear mapping.
  • the first and second intermediate steps shown in Table V are the local computation portion of the BTDC distance function.
  • the computing system(s) 112 of the second site 104 perform a similar local computation on the feature vector obtained at 612 .
  • a joint portion of the distance function is solved by joint computation between the first site 102 and the second site 104 .
  • the joint computation portion of the distance function may be constructed as a garbled circuit to prevent either site from obtaining the other site's secure input.
  • the garbled circuit may be an efficient circuit that only contains the Boolean CMP, MIN, and ADD circuits.
  • the garbled circuit may be the circuit shown in FIG. 5 .
  • each potential match there may be one process on the client side (e.g., the first site 102 ) that communicates with a process on the server side (e.g., the second site 104 ). These two processes may communicate with each other to determine if their respective input feature vectors match.
  • the server may start one process per potential match. Each process can be started with a feature vector that belongs to a different object and uses a different transmission control protocol (TCP) port for communication. Then the server returns to the client the list of ports where each process is listening Upon receiving this list, the client may start one process for each port.
  • TCP transmission control protocol
  • the feature vector that is input to client-side processes is the same, corresponding to the object that triggered the matching process.
  • the client and server read the outputs of these processes to determine if any of the pairwise matches yield a positive result.
  • This technique may reveal to the client how many (but not which) objects were seen by the server in the matching time window. If this information is sensitive, to hide it, the server may initiate matches against a fixed number of objects. This fixed number may be an upper bound on the number of objects that can be seen in the matching time period. When fewer objects have been seen, the remaining processes can be supplied random feature vectors as input.
  • a distance determined by the joint solving of the distance function at 618 is compared to a threshold distance.
  • the threshold distance may be the threshold ⁇ circumflex over (t) ⁇ discussed above.
  • process 600 proceeds along the “yes” path to 622 .
  • process 600 proceeds along the “no” path to 624 .
  • the private input generated by the first site 102 at 614 and used for solving the joint portion of the distance function at 618 is categorized as matching the private input similarly generated by the second site 104 .
  • the match of the two private inputs indicates that the objects from which the private inputs were originally created are the same object.
  • the private inputs are interpreted as being different and both sites learn that the objects do not match without sharing any distinguishing information about their respective video feeds.
  • Each dataset consists of the snapshots of several different objects and each object has two images captured by different cameras. Thus, the datasets provide still images corresponding to individual frames captured by a video camera.
  • VIPeR Viewpoint Invariant Pedestrian Recognition
  • iLIDS Imagery Library for Intelligent Detection Systems
  • the private datasets were created by collecting multiple video clips from two video cameras pointing at two different adjacent streets. Multiple video clips were collected from each camera at different times of day for a total of 207 minutes of video. Because the cameras are at different locations, the size and perspective of the objects they capture is different. Images of 114 people from 100 minutes of the video were extracted to create a human dataset. A car dataset was created by extracting images of 83 cars from 40 minutes of video clips.
  • snapshots from the human dataset or the car dataset were manually cropped around the objects.
  • the public datasets, VIPer and iLIDS provide cropped objects.
  • An input feature vector is extracted from images of humans in the dataset by using color histogram as the input feature. Color histogram is resilient against the scale of the object. For images of cars, multiple stripes are extracted based on the principal axis of the car using color histogram. By extracting multiple stripes the feature representation of the car is rotation invariant.
  • ROC Receiver Operational Characteristic
  • FIG. 7 shows ROC curves generated by different training set sizes. Comparison was performed on images from the VIPeR dataset.
  • the training set sizes tested are 20, 50, 80, and 100 objects. Accuracy increases as the training set size increases from 20 to 120. However, the accuracy improves only slightly when the size of training set increases from 80 to 100. Accordingly, 80 randomly selected objects are used as the training set for experiments performed with the VIPer, iLIDS, and human datasets.
  • the car dataset contains images of only 83 cars so a smaller number of objects, 68, were used for training.
  • FIG. 8 shows ROC curves generated by different quantization levels.
  • an input vector is quantized into one of a finite set of prescribed integer values in order to generate and integer input to a Boolean circuit for secure computation.
  • the larger the quantization level the smaller the negative impact on accuracy.
  • Different quantization levels were tested on images from the VIPeR dataset.
  • the curve showing “real value” presents results obtained by using original real value of a vector without quantization.
  • Accuracy increases as quantization increases from 32 bits to 256 bits.
  • a quantization level of less than 64 bits results in a loss of accuracy.
  • FIG. 9 shows the ROC curves under different matching functions when tested against the VIPeR dataset.
  • the PRDC without absolute values distance metric learning technique had significantly lower accuracy than the other three techniques.
  • the BTDC distance metric learning technique derived above obtains similar performance to MCC while reducing the computational cost.
  • FIG. 10 shows the ROC curves under different matching functions when tested against the iLIDS dataset. Similar to FIG. 9 , the BTDC technique achieves a similar level of accuracy as the MCC technique. The overall performance of all the methods with respect to iLIDS dataset is slightly lower than that for the VIPeR dataset due to the presence of occlusions in front of foreground objects.
  • FIG. 11 shows the ROC curves under different matching functions when tested against the human dataset.
  • the BTDC technique has accuracy that is comparable to the PRDC and MCC techniques, while the PRDC without absolute values has much lower accuracy.
  • FIG. 12 shows the ROC curves under different matching functions when tested against the car dataset. The results are similar to FIG. 11 .
  • Tables VI-IX below show the relationship between matching accuracy and performance for the four different distance metric learning techniques.
  • Area Under an ROC Curve (AUC) is used to represent accuracy characteristics of each method because the closer a given ROC curve is to the up-left corner the more accurate the method was that generated that ROC curve and the larger the AUC.
  • AUC value ranges from 0 to 1.
  • FIG. 13 shows performance results for secure matching using the BTDC technique.
  • the system used for this test includes two netbooks each with a 1 GHz processor and 2 GB memory that run Windows 8 with a network emulator to create a round-trip network delay of 100 ms between the netbooks.
  • the total time from the client issuing the match request to it recovering the result of the match in seconds for this test system is shown on the vertical axis of FIG. 13 .
  • the time it takes to securely match objects increases linearly with the number of objects that need to be matched. For single-object matches, the time is roughly 4 seconds. For matching 4-16 objects, the total time amounts to roughly 3 seconds per match. Thus, as many as 20 objects may be securely matched in under a minute.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A technique for multi-camera object tracking is disclosed that preserves privacy of imagery from each camera or group of cameras. This technique uses secure multi-party computation to compute a distance metric across data from multiple cameras without revealing any information to operators of the cameras except whether or not an object was observed by both cameras. This is achieved by a distance metric learning technique that reduces the computing complexity of secure computation while maintaining object identification accuracy.

Description

CROSS REFERENCE TO RELATED APPLICATION
This application is a continuation of U.S. patent application Ser. No. 12/731,822 filed Dec. 31, 2012, entitled “Secure and Private Tracking Across Multiple Cameras”. Application Ser. No. 13/731,822 is fully incorporated herein by reference
BACKGROUND
Camera-based surveillance is widely employed to detect against crime such as burglaries and vandalism. Recently, the commoditization of video cameras and the advances in computer vision have lowered the barrier of deploying a small-scale camera-based surveillance system. These small-scale systems may be sufficient for recording activities in the immediate area, but some activities do not appear suspicious until data from multiple cameras is analyzed in aggregate. These types of activities include strangers knocking on several doors in a neighborhood (i.e., to figure out which houses are empty) and cars cruising the neighborhood without stopping anywhere (i.e., as part of reconnaissance prior to a burglary).
If users of the small-scale systems shared data with each other, or with a third party such as law enforcement or a monitoring company, many more suspicious activities could be detected than if each user merely viewed his or her own video cameras. However, residents and businesses may be reluctant to share data from their respective video cameras due to privacy concerns. Thus, the ability to detect suspicious activity by making inferences across data received from multiple cameras deployed by multiple different users is hindered by the respective users' reluctance to sacrifice privacy by sharing their video with others.
SUMMARY
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
This disclosure presents a distributed surveillance technique that can track objects (e.g., humans, cars, etc.) across multiple cameras without sharing information about the objects other than the existence or absence of a match between two objects. Objects are identified in video feeds captured by video cameras and represented mathematically by generating feature vectors from the objects. A distance metric learning approach is used to compare the distance between two feature vectors and determine if vectors are similar enough to each other to conclude that the corresponding objects are actually two different videos of the same object. Since the creation of feature vectors is not limited to objects captured in video this technique is equally applicable in other fields that include data which can be reduced to a vector representation.
A distance function used to calculate the similarity between two vectors is decomposed into portions that are computed locally and portions that are computed jointly. Computing systems associated with the respective cameras may solve the local portions of the distance function without privacy concerns. Therefore, computationally intensive encryption is not needed to solve the local portions of the distance function. However, the joint portions of the distance function use data obtained from multiple cameras which can raises privacy concerns. Cryptographic operations are used to preserve privacy when solving the joint portion of the distance function so this part of the distance function imposes a higher processing burden on the associated computing systems.
The joint portion of the distance function uses a garbled circuit based secure multi-party computation to keep both participants' contributions (i.e., data derived from solving the local portions of the distance algorithm) hidden from each other. The function may be represented as a Boolean circuit which is constructed of various logic gates. The type of and number of logic gates is an indication of the complexity, and thus the computer processing needed, to solve the function. The joint portion of the distance function is constructed as an efficient Boolean circuit that uses a number and type of logic gates which allows for solving of the joint portion of the distance function with a relatively low computational burden while still maintaining privacy and preserving matching accuracy.
Thus, the approach of decomposing a distance function into local portions and joint portions as well as the design of the Boolean circuit provide a computationally efficient and privacy-preserving technique for comparing feature vectors. This technique may be use to track an object across multiple cameras without sharing private information between the cameras.
BRIEF DESCRIPTION OF THE DRAWINGS
The Detailed Description is set forth with reference to the accompanying figures. In the figures, the left-most digit(s) of a reference number identifies the figure in which the reference number first appears. The use of the same reference numbers in different figures indicates similar or identical items.
FIG. 1 is an illustrative architecture showing two sites equipped with video cameras collaborating to track an object.
FIG. 2 is a block diagram of components that may be present in a computing system shown in FIG. 1.
FIG. 3 shows an ADD*(N, 1) garbled circuit when N is in power of 2.
FIG. 4 shows an ADD*(N, 1) garbled circuit when N is not in power of 2.
FIG. 5 shows a blind tracking distance computation (BTDC) garbled circuit.
FIGS. 6A and 6B show an illustrative process for joint computation of a match between two feature vectors.
FIG. 7 is a Receiver Operation Characteristic (ROC) curve showing the effect of training set size on accuracy of classifying object from the Viewpoint Invariant Pedestrian Recognition (VIPeR) database.
FIG. 8 is a ROC curve showing the effect of quantization level of an input vector on accuracy of classifying an object from the VIPeR database.
FIG. 9 is a ROC curve showing the relative accuracy of different secure matching functions for objects from the VIPeR database.
FIG. 10 is a ROC curve showing the relative accuracy of different secure matching functions for objects from the Imagery Library for Intelligent Detection Systems (iLIDS) database.
FIG. 11 is a ROC curve showing the relative accuracy of different secure matching functions for objects from the “human database.”
FIG. 12 is a ROC curve showing the relative accuracy of different secure matching functions for objects from the “car database.”
FIG. 13 is a graph showing linear change in the time required to compute a match between objects as a number of objects to match changes.
DETAILED DESCRIPTION
Illustrative Architecture
FIG. 1 shows an architecture 100 in which a first site 102 performs privacy-preserving multi-camera tracking with a second site 104. For example, the first site 102 may be a residence located near to the second site 104 which may be a business. Both the first site 102 and the second site 104 are equipped with video cameras 106 and 108 respectively. Although each site 102 and 104 is shown as having only a single video camera 106 and 108 for simplicity, the sites 102 and 104 may each have more than one video camera. The video cameras 106 and 108 may use any technology for capturing video images and may capture images across any spectrum such as visible light, infrared, etc. A still camera that captures a series of still images in succession may be substituted for either of the video cameras 106 and 108.
Each of the sites 102 and 104 also contain one or more computing systems 110 and 112. The computing systems 110 and 112 may be integrated in whole or part with the video cameras 106 and 108. Alternatively, the video cameras 106 and 108 may simply provide video feeds to the respective computing systems 110 and 112. The computing systems 110 and 112 may also be implemented as a single device such as a desktop computer, notebook computer, net book, etc., a local system comprising multiple devices such as a server and local area network, a distributed system such as a cloud computing system that may be located remotely and accessed through a communications network, or any other system architecture for processing information.
Each of the computing systems 110 and 112 may include a joint computation module 114 and 116. The joint computation modules 114 and 116 may facilitate each of the respective computing system's 110 and 112 participation in secure multi-party computation. The joint computation modules 114 and 116 each provide an output that is usable by the other module for joint computation. However, the joint computation modules 114 and 116 may be implemented using different code, different programming languages, and the like. Further details of the joint computation modules 114 and 116 are provided below.
Each of the respective sites 102 and 104 exists within a privacy boundary 118 and 120. Within the privacy boundaries 118 and 120 there are no privacy concerns regarding sharing information. Privacy concerns arise when information is shared across one of the privacy boundaries 118 and 120.
In some implementations, the privacy boundaries 118 and 120 may include cameras, computing systems, and the like that are physically separated. For example, a portion of a one of the computing systems 110 and 112 implemented through a cloud computing architecture may utilize memory and processing resources that are physically remote from the respective site 102 and 104 but are nevertheless within the corresponding privacy boundary 118 and 120 due to a secure configuration of the cloud computing architecture. As an additional example, the business at site 104 may have multiple locations (e.g., retail stores) and share video information among separate locations without privacy concerns.
In the same or different implementation, physically proximate video cameras, computing systems and the like may be separated by a privacy boundary. For example, in an apartment building different residents may have their own video cameras. All of the video cameras, and possibly the respective computing systems connected to the video cameras, are located in the same apartment building. However, each of the individual residents may have privacy concerns regarding sharing video information with his or her neighbors. Thus, a single building may have multiple privacy boundaries within it.
In any of the above implementations, the computing systems 110 and 112 have at least one communication connection 122 used for communicating with another site 102 and 104 inside another privacy boundary 118 and 120 in order to perform the secure multi-party computation. The communication connection 122 may be a direct link between the computing systems 110 and 122 such as a physical cable or a wireless connection. Alternatively, the communication connection 122 may be made through a communications network that also handles communication traffic from other computing systems. The communications network may include any type of network for handling electronic data such as the Internet, a phone network, a cable network, etc.
In the architecture 100 each of the video cameras 106 and 108 captures video images of whatever background is in view of the video camera and foreground objects moving in front of that background. For example, the video camera 106 at the first site 102 may observe object A 124. The video camera 108 at the second site 104 may observe object B 126. Multi-camera tracking is used to determine if object A 124 is the same as object B 126. For example, the objects may both be people and secure multi-party computation implemented in part by the joint computation modules 114 and 116 can be used to determine if object A 124 is the same person as object B 126 while still keeping the content of the respective video feeds obtained by the sites 102 and 104 secret from each other.
Illustrative Computing Device
FIG. 2 shows an illustrative block diagram 200 of components that may be included in the computing system(s) 110 of FIG. 1. The computing system(s) 112 of FIG. 1 may include similar components but is omitted for simplicity. The computing system(s) 110 contain one or more processing elements 202 and memory 204 distributed throughout one or more locations. The processing elements 202 may include any combination of central processing units (CPUs), graphical processing units (GPUs), single core processors, multi-core processors, application-specific integrated circuits (ASICs), and the like.
One or more of the processing element(s) 202 may be implemented in software and/or firmware in addition to hardware implementations. Software or firmware implementations of the processing element(s) 202 may include computer- or machine-executable instructions written in any suitable programming language to perform the various functions described. Software implementations of the processing elements(s) 202 may be stored in whole or part in the memory 204. The memory 204 may be volatile (such as RAM) and/or non-volatile (such as ROM, flash memory, etc.).
The computing system(s) 110 may also include additional computer-readable media such as removable storage, non-removable storage, local storage, and/or remote storage. The memory 204 and any associated computer-readable media may provide storage of computer readable instructions, data structures, program modules, and other data. Computer-readable media includes, at least, two types of media, namely computer-readable storage media and communications media.
Computer-readable storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. Computer-readable storage media includes, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device.
In contrast, communication media may embody computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, or other transmission mechanism. As defined herein, computer-readable storage media and communication media are mutually exclusive.
The computing system(s) 110 include a video camera connection 206 that may provide hardware and/or software interfaces for receiving a video feed from the video camera 106 and providing commands to the video camera 106. A communication connection 208 may connect the computing system(s) 110 to other computing systems such as those at the second site 104. The communication connection 208 may include hardware such as a socket for a network cable or antenna for radio transmissions as well as software for processing incoming and outgoing signals.
The block diagram 200 shows multiple modules included within the memory 204. These modules may be implemented in software and alternatively, or in addition, implemented, at least in part, by one or more hardware logic components. For example, and without limitation, illustrative types of hardware logic components that can be used include Field-programmable Gate Arrays (FPGAs), Program-specific Integrated Circuits (ASICs), Program-specific Standard Products (ASSPs), System-on-a-chip systems (SOCs), Complex Programmable Logic Devices (CPLDs), etc.
An object detection and tracking module 210 detects an object in a video feed from the video camera 106 when the object is in view of the video camera 106. For example, the object may be object A 124 shown in FIG. 1. The object detection and tracking module 210 can track objects across frames within a single camera such as video camera 106. The computation performed by the object detection and tracking module 210 involves only local processing (i.e., no information exchange needed), and thus, does not affect secure computation.
For one or more frames in the video feed obtained from a video camera 106, the object detection module 210 may perform background subtraction followed by blob extraction. Background subtraction maintains a statistical background model that may be represented as a Gaussian distribution per pixel. For each pixel in a new frame, the statistical background model may be used to determine whether the pixel belongs to the background or foreground. While the module is making this determination, the mean and variance parameters of the background model can be updated. The output of the background subtraction may be a binary image where 0 indicates background and 1 indicates foreground. After background subtraction, morphological reconstruction filters may be used to remove noise, connect broken regions of the foreground, and smooth the shapes of the foreground pixels. Finally, the 4-connected neighbor rule can be used to extract the blobs that represent foreground objects.
The object detection and tracking module 210 may maintain a list of objects that are present in view of the video camera 106. Information stored for each of the objects in the list may include a state of the tracked objects including their location, size, and velocity. Given the extracted blobs of a new frame, a Kalman filter can be used to estimate the current state of a tracked object. The object detection and tracking module 210 then determines whether a tracked object is under occlusion by checking if a predicted state of the tracked object is consistent with the observation in the new frame. If the tracked object is not occluded, the result from blob extraction is regarded as reliable, and the nearest blob is selected as the corresponding measurement. Otherwise, the mean-shift method may be used to obtain the measurement. By using the measurement obtained either from the result of blob extraction or the result of the mean-shift method, it is possible to update the Kalman filter and obtain the current state even of occluded objects.
The feature vector module 212 can extract a feature vector from an object identified by the object detection and tracking module 210. Some computer vision algorithms use face or license plate features for identifying humans or cars captured in a video feed. However, in an uncontrolled environment like a neighborhood, these features may be unreliable since video cameras do not always capture a clear view of the face or license plate of an object. Therefore, the whole-body appearance of the object may provide a better basis for feature vector extraction.
Two categories of features that may be used in object matching (or re-identification) are local descriptors and global features. Local descriptors are extracted from interest points on the objects, and the matching is performed using a voting scheme. Because local descriptors compute interest point detection in the feature extraction phase, this can add to the processing overhead and potentially affect accuracy. Global features are easier to compute and some are effective for person re-identification. For example, color histograms and texture histograms may be used to create feature vectors from the whole-body or different parts of the body of an object such as a person or car. One technique for generating a color histogram includes dividing an object into six horizontal regions. In each region, color histograms of RGB, YCbCr, and HSV color spaces are extracted. In this technique, each channel has 64 bins, and the histograms are concatenated into the feature vector in a 3456-dimension feature space.
A preprocessing module 214 performs local processing of the feature vector by normalizing the feature vector and quantizing the feature vector into an integer value to form a secure input. For example, feature vectors may be transformed into the secure input which can be a vector that has low dimensional integer entries. The preprocessing module 214 may perform a local computation portion of a vector distance function that is decomposed into local and joint computation portions. The decomposition minimizes the parts of the vector distance function that are affected by the complexity of secure computation because the local computation portion may be solved without privacy-protecting security.
The feature vector generated by the feature vector module 212 may be a real-valued feature vector xiϵ
Figure US09977991-20180522-P00001
d, and it may be transformed by the preprocessing module 214 into ziϵ
Figure US09977991-20180522-P00001
q with much lower dimension based on Principal Component Analysis (PCA) and linear mapping by Equation 1
z i =W T P T(x i x )+T1,
where W, P, x and T are parameters obtained in a training stage, and are known by both the first site 102 and the second site 104. The training stage is discussed in greater detail below. The secure two-party computation performed later can take integer input, so each entry of zi may be normalized to a real number from 0 to 1 and further quantized into an l-bit integer value to form a q-dimensional input vector z i which can be used as a secure input to a garbled circuit.
Matching accuracy may be improved with an offline training phase in order for the vector matching function to learn the distance metric. This learning may be done in a pairwise manner. That is, a set of parameters may be learned for each pair of cameras (e.g., video camera 106 and video camera 108), rather than learning global parameters. Objects by the camera that the site owners have indicated a willingness to share publicly may be used as inputs for the learning.
A garbled circuit generator 216 generates a garbled circuit for the joint computation portion of the vector distance function. Inputs to the garbled circuit comprise the secure input from the first site 102 and a secure input from the second site 104. The second site 104 generates the secure input by local processing using techniques that are the same or similar to those described above for the first site 102.
The secure computation protocol may be based on a Boolean circuit for a predefined distance function D(xi,xj). One party (the circuit generator, here the first site 102) prepares a garbled version of the circuit and the second party (the circuit evaluator, here the second site 104) obliviously computes the output of the garbled circuit. Thus, both parties learn the output of the circuit, but neither party has access to the other's input.
A match query module 218 sends a query via the communication connection 208 to the second site 104 to request participation in solving the jointly-computed portion of the vector distance function. The match query module 218 may send a secure matching function evaluation request to one or more peer sites such as the second site 104 every time the object detection and tracking module 210 records that a suspicious object has entered the view of the video camera 106. Suspicious object may be defined by criteria that are local to the computing system(s) 110. For example, feature vectors of suspicious or “blacklisted” objects may be stored in the database 222 and newly detected objects may be compared against this list of suspicious objects. A secure matching function evaluation request may also be sent when an object enters or leaves the view of the video camera 106.
The computing system(s) 110 may issue secure matching function evaluation requests to some or all of the other participating sites. The issuing system (e.g., computing system(s) 110 of the first site 102) becomes a client and the responding system (e.g., computing system(s) 112 of the second site 104) becomes a server for secure function evaluation. However, as more sites join a surveillance community, the overhead will increase, requiring individual sites to equip with more powerful processing elements and more memory. However, the topology of cameras may be used to improve matching efficiency. Based on an object's time and spatial information, sites can send matching requests only to select other sites that have video cameras positioned to cover locations where the object is likely to travel next.
The joint computation module 114 introduced in FIG. 1 solves the jointly-computed portion of the vector distance function with the second site 104 to determine if the first object (e.g., object A 124) is the same as the second object (e.g., object B 126). Matching between input vectors may be performed through secure two-party computation using the joint computation module 114 of the first site 102 and the joint computation module 116 of the second site 104. The garbled circuit constructed by the garbled circuit generator 216 and used for joint computation portion of the vector distance function, is an “efficient” circuit because it may be constructed to use only integer comparison (CMP), min (MIN), and addition (ADD) circuits. The garbled circuit takes as input the two input vectors from the sites 102 and 104 and a comparison threshold used to set the stringency of matching. If the computed distance between the two input vectors is smaller than the comparison threshold, the vectors are regarded as matching (i.e., object A 124 and object B 126 are the same object).
An alert module 220 generates an alert depending on if the first object is or is not the same as the second object. In some implementations, the alert module 220 may generate the alert in real time essentially as soon as a result is received from the joint computation module 114. Depending on the criteria used to determine whether a particular behavior is “suspicious” behavior or not, either the presence or the absence of a match may be a trigger for an alert. The alert may be a local alert shared only with the first site 102, shared only with the first site 102 and the second site 104, or the alert may be shared with other sites that are connected to either the first site 102 or the second site 104.
A database 222 may also be present in a computing system(s) 110 either in the memory 204 or another location. The database 222 may store information such as feature vectors extracted from objects that have been in the view of the video camera 106. Thus, the computing system(s) 110 may store a record of all the objects seen by storing the corresponding feature vectors. This database 222 may be queried in response to an incoming match request from another site (e.g., second site 104). Thus, the computing system(s) 110 when functioning as a server may respond to queries not only for objects that are currently in view of the video camera 106 but also for objects that have previously been viewed and are stored, as feature vectors, in the database 222. Each record in the database 222 may be associated with entry and exit timestamps recording when the object entered and left the view of the video camera 106. This may enable sites in a surveillance community to track how an object moves through the community. The path of the object may be used as a trigger to raise one or more alerts.
Comparison of Distance Learning Techniques
Distance metric learning is one way to improve matching accuracy by training a distance functionD (xi,xj) where xi and xj represent feature vectors extracted from objects. However, many distance metric learning techniques are computationally expensive and inefficient for secure two-party computation. One goal of distance metric learning is to learn a distance function that can discriminate positive and negative examples (e.g., objects known to be the same and objects known to be different). That is, the distance calculated by an effective distance function D(xi,xj) is likely to be small when xi and xj correspond to the same object and large when xi and xj correspond to different objects.
A secure computation protocol starts with a Boolean circuit for D(xi, xj). A portion of the circuit may be garbled to provide privacy. The garbling involves generating keys to each wire and generating a garbled truth table for each gate. Thus, the number of gates determines the complexity of a garbled circuit, impacting the performance of generating and evaluating the garbled circuit. However, XOR gates come “free” meaning that cryptographic operations are not required for computing outputs from XOR gates in garbled circuits. Table I compares the size (as measured in terms of the number of non XOR gates) of efficient circuit constructions for basic functions that compute on two l-bit integers (i.e., the input vectors may be converted to integer values before being used as inputs to the garbled circuit).
TABLE I
Gate Counts for Basic Garbled Circuits
basic circuits non XOR gates
addition ADD(l) l
subtraction SUB(l) l
multiplication MUL(l) 2l2 − l
comparison CMP(l) l
multiplexer MUX(l) l
minimum MIN(l) 2l
Creating efficient circuits for functions that are commonly used when computing distance metrics can decrease the final processing complexity. One function that is commonly used when computing distance metrics is, ADD*(N,l)=Σj=1 Nxj, where xj is an l-bit integer. FIG. 3 shows a garbled circuit construction 300 of this function when N is in power of 2. In the construction 300 ADD circuits are connected in a hierarchical fashion to minimize table size. The output of ADD*(N,l) is a (l+log2 N)-bit integer. Since the number of non XOR gates of ADD(l) circuit is |ADD(l)|=1 as shown in Table I above, the size of ADD*(N,l) is defined by Equation 2
ADD * ( N , l ) = j = 1 log 2 N N 2 j · ADD ( l + j - 1 ) = Nl + N - l - 1 - log 2 N .
FIG. 4 shows a more generalized garbled circuit construction 400 of the ADD*(N,l)=Σj=1 Nxj circuit when N is not in power of 2 (e.g., N=7). First, N is decomposed into the summation of a set of numbers consisting of power of 2 as shown in Equation 3.
K log 2 N , N = i = 0 K - 1 2 α i , log 2 N α K - 1 > > α 0 0
For i=0 to K−1, the summation of 2α i numbers is computed by constructing sub-circuits ADD*(2α i , l). By using ADD (.), the outputs of these sub-circuits are further added up sequentially in the order from the one with smallest number of bits to the one with largest number of bits. The size of ADD*(N,l) is equal to gateN. The algorithm for ADD*(N,l) when N is not in power of 2 is below.
1.  Initial X = N, t = 0, s = K − 1, and gateN = 0. N = Σ i=0 K−1 2αi,
log2 N ≥ αK−1 > ... > α0 ≥ 0
2.  while(X > 1)
3.      m = 2αs
4.      construct sub-circuit ADD*(m,l)
5.      roots[t] = output of ADD*(m,l)
6.      gateN = gateN + |ADD*(m,l)|, where |ADD*(m,l)| is
obtained by the equation for the size of ADD*(N,1)
7.      X = X − m
8.      t = t + 1
9.      s = s − 1
10.  end while
11. if(x == 1)
12.  roots[t] = the remaining xj
13. end if
14. Initial r = K − 1, right_tree = roots[K − 1]
15. while(r ≥ 1)
16.  Connect roots[r−1] and right_tree with an ADD(l + αK−r) circuit.
17.  right_tree = output of the above ADD circuit
18.  gateN = gateN + |ADD(l + αK−r)|
19.  r = r − 1
20. end while
In the following discussion of distance metric learning functions, xiϵ
Figure US09977991-20180522-P00001
d denotes the original real-valued feature vector. Let
Figure US09977991-20180522-P00002
p denote the positive training set consisting of matched feature vector pairs, and let
Figure US09977991-20180522-P00002
n denote the negative training set consisting of unmatched feature vector pairs.
One distance metric learning function that may be used for object matching is the person re-identification by Probabilistic Relative Distance Comparison (PRDC). The PRDC distance function is shown in Equation 4:
D 1(x i ,x j)=|x i −x j |WW T |x i −x j |=∥W T |x i −x j|∥2
where |·| is the entry-wise absolute function; ∥·∥ is L2 norm; W=[w1 . . . wL] ϵ
Figure US09977991-20180522-P00001
d×L and L<<d. Note that |xi−xj|needs to be computed privately (i.e., jointly) as it involves both feature vectors.
As a result, the computation of the PRDC function in secure two-party computation involves subtraction, absolute value calculation, multiplication, and addition. Moreover, the computation often operates in high dimensional space, i.e., d is usually quite large, which adds computational overhead when considering the secure two-party computation. Table II shows the computational complexity of the secure two-party computation implementation of D1 (.).
TABLE II
PRDC
intermediate steps circuits needed
1 |xi − xj| d subtraction
d absolute value
2 WT |xi − xj| d × L multiplication
(d − 1) × L addition
3 || WT |xi − xj| ||2 L multiplication
(L − 1) addition
The first intermediate step of Table II shows computing the absolute value of the subtraction of two l-bit integers (assuming each input element is quantized to be an l-bit integer). This circuit can be constructed by connecting 2Sorter(l) (which takes two l-bit integers, x and y, and outputs min(x, y) and max(x, y)) and one regular subtraction circuit. Thus, this first intermediate step requires d numbers of 2Sorter(l) and SUB(l) circuits.
The second intermediate step of Table II performs d×L multiplications of two l-bit integers which are the outputs of the first intermediate step. Then, the second intermediate step adds d elements of the multiplication results, which are 2l-bit integers. The third intermediate step of Table II first performs L multiplications of two (2l+log2 d)-bit integers and the output of each multiplication which is a (4l+2 log2 d)-bit integer. This is summed up (L elements total) and the output which is a (4l+2 log2 d+2 log2 L)-bit integer that is compared against the threshold. The distance threshold for match determination is estimated in a training stage to achieve appropriate true positive rates and false positive rates. In sum, the number of non XOR circuits for PRDC is shown by Equation 5
PRDC = d · 2 Sorter ( l ) + d · SUB ( l ) + d L · MUL ( l ) + L · ADD * ( d , 2 L ) + L · MUL ( 2 l + log 2 d ) + ADD * ( L , 4 l + 2 log 2 d ) + CMP ( 4 l + 2 log 2 d + log 2 L ) = 2 dl 2 L + dlL + 8 l 2 L + 3 dl + d L + 2 L ( log 2 d ) 2 + 8 lL log 2 d - 1.
Since d is large and multiplication is expensive to implement in secure two-party computation, the PRDC function incurs high computing overhead. Computational complexity may be reduced by omitting the absolute-value functions to arrive at the PRDC without absolute values function which is Equation 6:
D 2(x i ,x j)=(x i −x j)T WW T(x i −x j)=∥W T x i −W T x j2,
where W=[w1 . . . wL
Figure US09977991-20180522-P00001
d×L and L<<d. Note that WTxi and WTxj can be computed locally by each camera without invoking secure two-party computation. As a result, the computational complexity is reduced significantly as shown in Table III.
TABLE III
PRDC w/o absolute values
intermediate steps circuits needed
1 WT xi, WT xj (local computation)
2 WT xi − WT xj L subtraction
3 || WT xi − WT xj ||2 L multiplication
(L − 1) addition
For PRDC without absolute values, WT xi is assumed to be quantized to be an l-bit integer. Construction of the circuits in the intermediate steps of Table III is straightforward to one of ordinary skill in the art and omitted. The non-XOR circuit size for PRDC without absolute values is found by Equation 7:
|PRDC_woABS|=L·|SUB(l)|+L·|MUL(l)|+|ADD*(L,2l|+|CMP(2l+log2=2l 2 L+2lL+L−1.
Another technique for performing visual matching is Metric learning by Collapsing Classes (MCC). With MCC, use of Principal Component Analysis (PCA) for dimensionality reduction reduces the amount of computation in the training stage, especially if the feature space is high dimensional. Assume {circumflex over (x)}iϵ
TABLE I
Gate Counts for Basic Garbled Circuits
basic circuits non XOR gates
addition ADD(l) l
subtraction SUB(l) l
multiplication MUL(l) 2l2 − l
comparison CMP(l) l
multiplexer MUX(l) l
minimum MIN(l) 2l

r (r<<d) is the new feature vector after PCA, that is, Equation 8,
{circumflex over (x)} i =P T(x i −{circumflex over (x)}),
where Pϵ
TABLE I
Gate Counts for Basic Garbled Circuits
basic circuits non XOR gates
addition ADD(l) l
subtraction SUB(l) l
multiplication MUL(l) 2l2 − l
comparison CMP(l) l
multiplexer MUX(l) l
minimum MIN(l) 2l

d×r is projection matrix that maps the original d-dimension space into r dimensions (r<<d); xϵ
TABLE I
Gate Counts for Basic Garbled Circuits
basic circuits non XOR gates
addition ADD(l) l
subtraction SUB(l) l
multiplication MUL(l) 2l2 − l
comparison CMP(l) l
multiplexer MUX(l) l
minimum MIN(l) 2l

d the training data. The distance function is expressed as Equation 9:
D 3({circumflex over (x)} i ,{circumflex over (x)} j)=({circumflex over (x)} i −{circumflex over (x)} j)T WW T({circumflex over (x)} i −{circumflex over (x)} j)=∥W T {circumflex over (x)} i −W T {circumflex over (x)} j2,
where =[w1 . . . wq
TABLE I
Gate Counts for Basic Garbled Circuits
basic circuits non XOR gates
addition ADD(l) l
subtraction SUB(l) l
multiplication MUL(l) 2l2 − l
comparison CMP(l) l
multiplexer MUX(l) l
minimum MIN(l) 2l

r×x, r<<d, q≤r. Similar to PRDC without absolute values, WT{circumflex over (x)}i and WT{circumflex over (x)}j can be computed locally without invoking secure two-party computation. In addition, the dimensionality reduction is also performed locally. However, as shown in Table IV, the secure computation portion of the function uses multiplication circuits which are computationally expensive.
TABLE IV
MCC
intermediate steps circuits needed
1 {circumflex over (x)}i = PT (xi x) (local computation)
2 WT {circumflex over (x)}i, WT {circumflex over (x)}j (local computation)
3 WT {circumflex over (x)}i − WT {circumflex over (x)}j q subtraction
4 || WT {circumflex over (x)}i − WT {circumflex over (x)}j ||2 q multiplication
q − 1 addition
Similar to PRDC without absolute values, WTxi is assumed to be quantized to be an l-bit integer. The circuits for each of the intermediate steps of Table IV are similar to the circuits for PRDC without absolute values shown in Table III but with L replaced by q. Therefore, the circuit size for MCC is found by Equation 10:
|MCC|=q·|SUB(l)|+q·|MUL(l)|+|ADD*(q,2l)|+|CMP(2l+log2 q=2l 2 q+2lq+q−1.
An alternative function may be developed by realizing that learning the function D(.) is equivalent to learning the parameter W and observing that the distance function used for MCC, D3 (.), is equivalent to computing a Euclidean distance in an alternate vector space specified by W. Replacing the Euclidean distance function with a histogram intersection function after linearly projecting onto a new vector space specified by W avoids the secure multiplication operation shown in the forth intermediate step of Table IV.
The histogram intersection function is defined in Equation 11 as:
HI(h i ,h j)=Σk=1 qmin(h i k ,h j k),
where hi=[hi 1 . . . hi q]Tϵ
Figure US09977991-20180522-P00001
+ q, hj=[hj 1 . . . hj q]Tϵ
Figure US09977991-20180522-P00001
+ q are the vectors with nonnegative entries. The distance function referred to herein as Blind Tracking Distance Computation (BTDC) is Equation 12:
D 4 ( x ^ i , x ^ j ) = - HI ( W T x ^ i + T 1 , W T x ^ j + T 1 ) = - k = 1 q min ( w k T x ^ i + T , w k T x ^ j + T )
where [w1 . . . wq
Figure US09977991-20180522-P00001
r×q, r<<d and q≤r and T≥0 is a predefined parameter that ensures the non-negativity of wk T{circumflex over (x)}i+T for any feature vector {circumflex over (x)}i. The BTDC function is computationally efficient in part because computation is shifted locally (i.e., where there are no privacy concerns) and joint computation is minimized. The Boolean circuits used to represent the BTDC function are shown in Table V.
TABLE V
BTDC
intermediate steps circuits needed
1 {circumflex over (x)}i = PT (xi x) (local computation)
2 WT {circumflex over (x)}i + T1, WT {circumflex over (x)}j + T1 (local computation)
3 min(.) q min operation
4 Σk=1 q min(.) (q − 1) addition
The first and second intermediate steps of Table V are preprocessing steps performed by local computation. For example, these steps may be performed by the preprocessing module 214. Given any real-valued feature vector xiϵ
Figure US09977991-20180522-P00001
d, the feature vector can be transformed into ziϵ
Figure US09977991-20180522-P00001
q with much lower dimension based on PCA and linear mapping using Equation 13,
z i =W T P T(x i x )+T1,
where W, P, x, and T are parameters obtained in a training stage and known by both sites 102 and 104. The secure two-party computation represented by the third and fourth intermediate steps of Table V may take only integer input, therefore each entry of zi may be first normalized to a real number from 0 to 1 and further quantized into an l-bit integer value to form a q-dimensional input vector z i.
The matching between input vectors z i and z i is performed through secure two-party computation. BTDC simplifies the garbled circuit needed for secure function evaluation, as it only requires integer comparison (CMP), min (MIN), and addition (ADD). The garbled circuit uses two vectors, {circumflex over (z)}i and {circumflex over (z)}i (ql-bit integers) and the comparison threshold t as inputs. FIG. 5 shows the garbled circuit of the BTDC distance function. The circuit size for BTDC is found by Equation 14:
|BTDC|=q·|MIN(l)|+|ADD*(q,l)|+|CMP(l+log2 q)|=3ql+q−1.
The parameter W from D4(.) above may be learned by learning a linear projection matrix, such that the pairs in set
Figure US09977991-20180522-P00002
p have small distances, and the pairs in set
Figure US09977991-20180522-P00002
n have large distances. This can be achieved by defining a conditional distribution over points i≠j such that Equation 15 is
p W ( j i ) = e - D ( x ^ i , x ^ j ) Z i = e - D ( x ^ i , x ^ j ) k i e - D ( x ^ i , x ^ k ) i j
Ideally, if all the pairs in set
Figure US09977991-20180522-P00002
p have small distance, and all the pairs in set
Figure US09977991-20180522-P00002
n have large distances, the distribution becomes “bi-level,” that is, Equation 16:
p 0 ( j i ) { 1 if ( x ^ i , x ^ j ) p 0 if ( x ^ i , x ^ j ) n
Therefore, the cost function ƒ1 (W) is defined as Equation 17:
f 1 ( W ) = i KL [ p 0 ( j i ) p W ( j i ) ] = i j i p 0 ( j i ) × log ( p 0 ( j i ) p W ( j i ) ) ,
where KL[·] is K-L divergence which measures the distance between two distributions. Substitution of Equations 16 and 17 yields Equation 18,
f 1 ( W ) = i , j , ( x ^ i , x ^ j ) p D 4 ( x ^ i , x ^ j ) + i log ( Z i )
A regularization term ƒ2 (W) bounds the values of WT{circumflex over (x)}i in such a way that a nonnegative scalar T to make all the entries in WT{circumflex over (x)}i+T1 nonnegative for all i always exists. This is done by denoting the original high dimensional feature vector as
x i = [ x i 1 x id ] .
Since the BTDC computation uses a histogram-based feature, each entry is a real number from zero to one. Because the principal component matrix P=[p1 . . . pr] is a unitary matrix, the entries of the new vector after dimensional reduction, Equation 19,
x ^ i = [ x ^ i 1 x ^ id ] = P T ( x i - x _ )
are bounded. Assume ∥{circumflex over (x)}i∥θ, and according to the Cauchy-Schwarz inequality, Equation 20 is
(w k T {circumflex over (x)} i)2 ≤∥w k2 2 ∥{circumflex over (x)} i2 2 ≤∥w k2 2×θ2.
In order to confine the dynamic range of wk, the term Σk=1 qwk Twk=Tr(WTW) may be added as another cost function to make ∥wk2 2 bounded. If ∥wk2 2 is bounded, (wk T{circumflex over (x)}i)2 is bounded; that is, for all the vectors {circumflex over (x)}i, there exists a nonnegative scalar T such that wk T+T≥0, ∀k. This yields Equation 21:
f 2 ( W ) = k = 1 q w k T w k = Tr ( W T W ) .
Moreover, to satisfy the equality of self-distance, that is, Equation 22,
D 4({circumflex over (x)} k ,{circumflex over (x)} k)=D 4({circumflex over (x)} i ,{circumflex over (x)} i) for all k≠l
an additional term ƒ3 (W) is added. In order to make the distance metric function D4 (.) valid, self-distance of all the given feature vectors {circumflex over (x)}i may be the same. Thus, Equation 23:
D 4 ( x ^ i , x ^ i ) = - HI ( W T x ^ i + T 1 , W T x ^ i + T 1 ) = - k = 1 q min ( w k T x ^ i + T , w k T x ^ i + T ) = - k = 1 q w k T x ^ i + q T
Assuming Σk=1 qwk T{circumflex over (x)}i equals to zero for all i, D4({circumflex over (x)}k,{circumflex over (x)}k) is equal to D4({circumflex over (x)}l,{circumflex over (x)}l) for all k≠l. Introducing the cost function Σik=1 q=wk T{circumflex over (x)}i)2 into the objective function gives Equation 24:
f 3 ( W ) = i ( k = 1 q w k T x ^ i ) 2 .
The final objective function is the sum of the above three terms, Equation 25:
J(W)=ƒ1(W)+ƒ2(W)+ƒ3(W)
Then, the metric learning problem is formulated as finding W that minimizes the objective function J(W) as in Equation 26:
W ^ = arg min W J ( W )
Gradient descent method may be employed to solve the optimization problem. The gradient vector is Equation 27,
J ( W ) w k = i j i ( ( p 0 ( j i ) - p W ( j i ) ) × g ( x ^ i , x ^ j , w k ) ) + 2 w k + 2 i ( ( l = 1 q w l T x ^ i ) x ^ i ) ,
where, Equation 28,
g ( x ^ i , x ^ j , w k ) = { - x ^ i if w k T x ^ i < w k T x ^ j - x ^ j if w k T x ^ i > w k T x ^ j - 1 2 ( x ^ i + x ^ j ) if w k T x ^ i = w k T x ^ j
Accuracy and computational efficiency of the BTDC computation are discussed below.
Illustrative Process
For ease of understanding, the process discussed in this disclosure is delineated as separate operations represented as independent blocks. However, these separately delineated operations should not be construed as necessarily order dependent in their performance. The order in which the process is described is not intended to be construed as a limitation, and any number of the described process blocks may be combined in any order to implement the process, or an alternate process. Moreover, it is also possible that one or more of the provided operations may be modified or omitted.
FIGS. 6A and 6B show process 600 illustrated as a collection of blocks in a logical flowchart, which represents a sequence of operations that can be implemented in hardware, software, or a combination of hardware and software. For discussion purposes, the processes are described with reference to the architecture, and computing systems, shown in FIGS. 1 and 2. However, process 600 may be performed using different architectures and/or computing systems.
At 602, a first feature vector is obtained. The first feature vector may be a private input that the first site 102 desires to keep secret from the second site 104. The first feature vector may be extracted from an object viewed by the video camera 106 of the first site 102. However, other types of private inputs such as still images, images of human faces, handwritten text, biometric features, or the like may also be the source of the first feature vector. The first feature vector may be quantized into an integer value for later processing.
At 604, it is determined if the object is an object for which the system will seek a possible match with a peer site (e.g., second site 104). A determination to seek a match for a given object may be based on the first object matching an object of interest, entering the view of the first video camera 106, leaving the view of the first video camera 106, or for another reason. Determining that the object matches an object of interest may be performed locally at the first site 102 using a distance matching function that is not necessarily private or secure. The match may be performed by comparing the first feature vector to stored feature vectors that are labeled as “objects of interest.” For example, in a security implementation, cars that belong to known criminals may be used as objects of interest. If the object is determined to be an object for which a match is sought from a peer site, process 600 proceeds along the “yes” path to 608.
If, however, it is decided to not seek a match for the object process 600 may proceed from 604 along the “no” path to 606 and the process 600 may end. In some implementations, all objects may be submitted for secure matching so the process 600 would not end but rather proceed to 608 for all detected objects.
At 608, a secure match evaluation request is submitted to a peer computing system. The peer computing system may be the computing system 112 of the second site 104.
At 610, the peer computing system receives the secure match evaluation request. The peer computing system at the second site 104 may obtain a feature vector at 612 by techniques similar or identical to those used by the first site 102 at 602. The second site 104 may obtain the feature vector from the current video feed of the video camera 108 or from a database that stores feature vectors from previously observed objects. For example, the peer computing system may compare the incoming requests to objects that appeared within a predetermined time window (e.g., 10 minutes) prior to receiving the request.
At 614 the first site 102 solves its local computation portion of a distance function. The local portion of the distance function is computed within the privacy boundary 118 of the first site 102. In some implementations the local computation portion may comprise transformation of the feature vector obtained at 602 to a lower dimensionality vector. The transformation may be performed by PCA and linear mapping. For example, the first and second intermediate steps shown in Table V are the local computation portion of the BTDC distance function.
At 616, the computing system(s) 112 of the second site 104 perform a similar local computation on the feature vector obtained at 612.
At 618, a joint portion of the distance function is solved by joint computation between the first site 102 and the second site 104. The joint computation portion of the distance function may be constructed as a garbled circuit to prevent either site from obtaining the other site's secure input. The garbled circuit may be an efficient circuit that only contains the Boolean CMP, MIN, and ADD circuits. For example, the garbled circuit may be the circuit shown in FIG. 5.
For each potential match, there may be one process on the client side (e.g., the first site 102) that communicates with a process on the server side (e.g., the second site 104). These two processes may communicate with each other to determine if their respective input feature vectors match. When the server receives a match request from the client, the server may start one process per potential match. Each process can be started with a feature vector that belongs to a different object and uses a different transmission control protocol (TCP) port for communication. Then the server returns to the client the list of ports where each process is listening Upon receiving this list, the client may start one process for each port. The feature vector that is input to client-side processes is the same, corresponding to the object that triggered the matching process. The client and server read the outputs of these processes to determine if any of the pairwise matches yield a positive result.
This technique may reveal to the client how many (but not which) objects were seen by the server in the matching time window. If this information is sensitive, to hide it, the server may initiate matches against a fixed number of objects. This fixed number may be an upper bound on the number of objects that can be seen in the matching time period. When fewer objects have been seen, the remaining processes can be supplied random feature vectors as input.
Continuing on FIG. 6B, at 620, a distance determined by the joint solving of the distance function at 618 is compared to a threshold distance. The threshold distance may be the threshold {circumflex over (t)} discussed above. When the calculated distance is less than the threshold distance, process 600 proceeds along the “yes” path to 622. However, when the calculated distance is greater or equal to the threshold distance, process 600 proceeds along the “no” path to 624.
At 622, the private input generated by the first site 102 at 614 and used for solving the joint portion of the distance function at 618 is categorized as matching the private input similarly generated by the second site 104. The match of the two private inputs indicates that the objects from which the private inputs were originally created are the same object.
At 624, when the distance calculated by the distance function is equal or greater than the threshold, the private inputs are interpreted as being different and both sites learn that the objects do not match without sharing any distinguishing information about their respective video feeds.
EXAMPLES
The four distance metric learning techniques discussed above PRDC, PRDC without absolute values, MCC, and BTDC are compared using four datasets. Each dataset consists of the snapshots of several different objects and each object has two images captured by different cameras. Thus, the datasets provide still images corresponding to individual frames captured by a video camera.
Two of the datasets are public datasets, Viewpoint Invariant Pedestrian Recognition (VIPeR) and Imagery Library for Intelligent Detection Systems (iLIDS). VIPeR is a large publically available person re-identification dataset. It consists of the well-cropped snapshot images of 632 people taken outdoors. For each person, two images are captured under different viewing angles and lighting conditions which make the appearance vary and increase the difficulty of the re-identification. The iLIDS dataset is extracted from a multiple-camera tracking video scenario captured in an airport area.
Two private datasets are also used. The private datasets were created by collecting multiple video clips from two video cameras pointing at two different adjacent streets. Multiple video clips were collected from each camera at different times of day for a total of 207 minutes of video. Because the cameras are at different locations, the size and perspective of the objects they capture is different. Images of 114 people from 100 minutes of the video were extracted to create a human dataset. A car dataset was created by extracting images of 83 cars from 40 minutes of video clips.
In order to reduce the impact by errors in object localization, snapshots from the human dataset or the car dataset were manually cropped around the objects. The public datasets, VIPer and iLIDS, provide cropped objects. An input feature vector is extracted from images of humans in the dataset by using color histogram as the input feature. Color histogram is resilient against the scale of the object. For images of cars, multiple stripes are extracted based on the principal axis of the car using color histogram. By extracting multiple stripes the feature representation of the car is rotation invariant.
Before testing the four different techniques, training set size and quantization levels were determined experimentally. The size of the training set affects matching accuracy. A larger training set generally leads to higher matching accuracy. Accuracy is evaluated with Receiver Operational Characteristic (ROC) curves. A ROC curve is commonly used for binary classification evaluation. The true positive rate (TPR) is presented along the vertical axis and the false positive rate (FPR) is presented along the horizontal axis. Changes in the shape of a ROC curve can be investigated for different methods and inputs. The more accurate that a matching method is, the closer the ROC curve will be to the upper-left corner of the graph. The upper-left corner of an ROC graph corresponds to a desirable condition of high TPR and low FPR.
FIG. 7 shows ROC curves generated by different training set sizes. Comparison was performed on images from the VIPeR dataset. The training set sizes tested are 20, 50, 80, and 100 objects. Accuracy increases as the training set size increases from 20 to 120. However, the accuracy improves only slightly when the size of training set increases from 80 to 100. Accordingly, 80 randomly selected objects are used as the training set for experiments performed with the VIPer, iLIDS, and human datasets. The car dataset contains images of only 83 cars so a smaller number of objects, 68, were used for training.
FIG. 8 shows ROC curves generated by different quantization levels. As discussed above, an input vector is quantized into one of a finite set of prescribed integer values in order to generate and integer input to a Boolean circuit for secure computation. The larger the quantization level, the smaller the negative impact on accuracy. Different quantization levels were tested on images from the VIPeR dataset. The curve showing “real value” presents results obtained by using original real value of a vector without quantization. Accuracy increases as quantization increases from 32 bits to 256 bits. A quantization level of less than 64 bits results in a loss of accuracy. To minimize the loss of accuracy, 256 (l=8) levels are used in the experiments described below.
Accuracy results were tested for each of the four distance metric learning techniques—PRDC, PRDC without absolute values, MCC, and BTDC—on each of the four datasets. In the testing 15 objects were used as the testing set with 80 objects (68 for the car dataset) used as training objects. The training set and testing set do not overlap, i.e., each object only appears either in training set or testing set. In the training set, a pair of snapshots of each person or car under different views form the positive set OP, and all pairs of images of different people are the negative set
Figure US09977991-20180522-P00002
n. These sets are used in learning the parameters mentioned in above in the discussion of distance metric learning. The accuracy curves shown for a given distance metric learning technique in FIGS. 9-12 are generated from the average of ten separate trials.
FIG. 9 shows the ROC curves under different matching functions when tested against the VIPeR dataset. The PRDC without absolute values distance metric learning technique had significantly lower accuracy than the other three techniques. The BTDC distance metric learning technique derived above, obtains similar performance to MCC while reducing the computational cost.
FIG. 10 shows the ROC curves under different matching functions when tested against the iLIDS dataset. Similar to FIG. 9, the BTDC technique achieves a similar level of accuracy as the MCC technique. The overall performance of all the methods with respect to iLIDS dataset is slightly lower than that for the VIPeR dataset due to the presence of occlusions in front of foreground objects.
FIG. 11 shows the ROC curves under different matching functions when tested against the human dataset. The BTDC technique has accuracy that is comparable to the PRDC and MCC techniques, while the PRDC without absolute values has much lower accuracy. FIG. 12 shows the ROC curves under different matching functions when tested against the car dataset. The results are similar to FIG. 11.
Tables VI-IX below show the relationship between matching accuracy and performance for the four different distance metric learning techniques. The measure of performance of a technique is represented by the number of non-XOR gates calculated using the above equations for determining circuit size. In all cases shown in the following tables, the quantization level is 1=8 and d=3456. The other parameters are shown with each table. Area Under an ROC Curve (AUC) is used to represent accuracy characteristics of each method because the closer a given ROC curve is to the up-left corner the more accurate the method was that generated that ROC curve and the larger the AUC. AUC value ranges from 0 to 1.
TABLE VI
ViPER Dataset (L = 3; q = 69)
technique non XOR gates AUC
PRDC 1,508,070 0.8335
PRDC w/o abs 435 0.6708
MCC 10,009 0.8971
BTDC 1,729 0.8934
TABLE VII
iLIDS Dataset (L = 3; q = 76)
technique non XOR gates AUC
PRDC 1,508,070 0.8151
PRDC w/o abs 435 0.6646
MCC 11,022 0.8697
BTDC 1,902 0.8653
TABLE VIII
Human Dataset (L = 2 for PRDC, L =
15 for PRDC w/o abs; q = 75)
technique non XOR gates AUC
PRDC 1,033,027 0.8452
PRDC w/o abs 2,175 0.7244
MCC 10,878 0.9001
BTDC 1,878 0.9042
TABLE IX
Car Dataset (L = 2 for PRDC, L = 3 for PRDC w/o abs; q = 53)
technique non XOR gates AUC
PRDC 1,033,027 0.8336
PRDC w/o abs 435 0.7907
MCC 7,667 0.9207
BTDC 1,327 0.9210
The results shown in Tables VI-IX indicate that even though the PRDC without absolute values technique results in the smallest number of non XOR gates, this technique has much lower accuracy than any of the other techniques. The other three techniques PRDC, MCC, and BTDC provide similar accuracy. However, the BTDC technique is much more computationally efficient having average 748 times fewer non XOR gates than PRDC and 6 times fewer non XOR gates than MCC.
FIG. 13 shows performance results for secure matching using the BTDC technique. The system used for this test includes two netbooks each with a 1 GHz processor and 2 GB memory that run Windows 8 with a network emulator to create a round-trip network delay of 100 ms between the netbooks.
The total time from the client issuing the match request to it recovering the result of the match in seconds for this test system is shown on the vertical axis of FIG. 13. This time includes the time for initial handshake in which the client learns about the ports on which the server processes are running and the time to start client and server processes. Randomly selected feature vectors are used for matching Matching performance does not depend on the values in the feature vector; it only depends on the size of the feature vector, which is independent of the objects being matched. In the experiments used to generate FIG. 13 q=128, l=8, and {circumflex over (t)}=9280. Thus, the number of non XOR gates to the garbled circuit used in the BTDC technique is 3,199. Ten different trials were performed for each number of objects the mean of the trials is plotted in FIG. 13. The time it takes to securely match objects increases linearly with the number of objects that need to be matched. For single-object matches, the time is roughly 4 seconds. For matching 4-16 objects, the total time amounts to roughly 3 seconds per match. Thus, as many as 20 objects may be securely matched in under a minute.
CONCLUSION
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts are disclosed as example forms of implementing the claims.

Claims (19)

The invention claimed is:
1. A system comprising:
one or more processing elements;
an object detection module communicatively coupled to or integrated with the one or more processing elements and configured to detect a first object when the first object is in view of a first video camera associated with a first site having a first privacy boundary that encompasses the first video camera;
a preprocessing module communicatively coupled to or integrated with the one or more processing elements and configured to perform local processing of at least a portion of the first video to form a first secure input;
a circuit generator communicatively coupled to or integrated with the one or more processing elements and configured to generate a function, wherein inputs to the function comprise the first secure input from the first site and a second secure input from a second site having a second privacy boundary that encompasses a second video camera and that does not overlap with the first privacy boundary, wherein the first privacy boundary prevents a user at the first site from obtaining video captured by the second video camera at the second site and the second privacy boundary prevents a user at the second site from obtaining video captured by the first video camera at the first site, the second secure input generated by processing a second object that was in view of the second video camera associated with the second site;
a match query module communicatively coupled to or integrated with the one or more processing elements and configured to send a query to the second site to request participation in solving function; and
a joint computation module communicatively coupled to or integrated with the one or more processing elements and configured to solve the function with the second site to determine if the first object is the same as the second object.
2. The system of claim 1, wherein a first feature vector is generated from the first object, and a second feature vector is generated from the second object, the first and second feature vectors comprising at least one of color histograms or texture histograms.
3. The system of claim 1, wherein the function comprises a Boolean circuit.
4. The system of claim 1, further comprising a database communicatively coupled to or integrated with the one or more processing elements and configured to store a first feature vector generated from the first object for later comparison.
5. The system of claim 1, further comprising an alert module communicatively coupled to or integrated with the one or more processing elements and configured to generate an alert if the first object is the same as the second object.
6. The system of claim 1, wherein at least one of the preprocessing module, the circuit generator, the match query module, or the joint computation module is implemented in a hardware logic component.
7. A system comprising:
one or more processors; and
computer-readable storage including a plurality of programming instructions that are executable by the one or more processors to:
obtain an indication of a first object viewed by a first video camera within a first privacy boundary;
process at least a portion of the indication of the first object to form a first input;
issue a match evaluation request to a peer computing system having a second privacy boundary that encompasses a second video camera and does not overlap with the first privacy boundary, wherein the first privacy boundary prevents a user within the first privacy boundary from obtaining video captured by the second video camera at the within the second privacy boundary, and the second privacy boundary prevents a user within the second privacy boundary from obtaining video captured by the first video camera within the first privacy boundary;
jointly compute, by the computing system and the peer computing system, at least a portion of a distance function over the first input and a second input derived by the peer computing system from a second object viewed by the second video camera; and
characterize, based at least in part on an output of the distance function, the first object and the second object as the same or different.
8. The system of claim 7, wherein the distance function comprises determining a histogram intersection function for the first input and the second input.
9. The system of claim 7, wherein the first input comprises a transformation of a first feature vector of the first object to a lower dimensionality vector.
10. The system of claim 9, wherein the transformation comprises Principal Component Analysis (PCA) and linear mapping.
11. The system of claim 7, wherein the programming instructions are further executable by the one or more processors to issue the match evaluation request when the first object matches an object of interest, enters the view of the first video camera, or leaves the view of the first video camera.
12. The system of claim 7, wherein the programming instructions are further executable by the one or more processors to send, as part of jointly computing the distance function, a Boolean circuit of at least a portion of the distance function to the peer computing system and wait for the peer computing system to compute an output to the Boolean circuit.
13. The system of claim 7, wherein the joint computation is performed without disclosing either input to another computing system.
14. A computer-implemented method for jointly computing a distance metric using secure two-party computing to improve tracking of an object between independent computing systems, each computing comprising a computer-readable memory storing a plurality of machine-readable instructions, and one or more hardware processors coupled to the computer-readable memory and configured to read instructions from the computer-readable memory to perform operations comprising:
obtaining, by a first computing system, a first private input;
solving, by the first computing system, a local computation portion of a distance function, the local computation portion computed within a first privacy boundary of a first entity that provided the first private input, the distance function comprising a Metric learning by Collapsing Classes (MCC) distance function transformed to a different vector space by a linear projection matrix and having a Euclidean distance function replaced with a histogram intersection function;
solving, jointly with a second entity having a second computing system that has also independently solved a same local computation portion of the distance function for a second private input, a joint computation portion of the distance function that takes the first private input and the second private input as inputs, the second entity being within a second privacy boundary that does not overlap with the first privacy boundary, wherein the first privacy boundary prevents a user of the first computing system from obtaining the second private input of the second computing system and the second computing system prevents a user of the second computing system from obtaining the first private input;
comparing a computed distance determined by the distance function to a threshold distance; and
when the computed distance is smaller than the threshold distance, categorizing the first private input as matching the second private input.
15. The method of claim 14, wherein the first private input and second private input both comprise one or more of video, still images, images of human faces, handwritten text, or biometric features.
16. The method of claim 14, wherein the joint computation portion of the distance function is constructed as a circuit.
17. The method of claim 14, wherein a first secure input is determined from the first private input, and is quantized into an integer value.
18. The method of claim 17, wherein a second secure input is determined from the second private input, and the joint computation portion takes the first secure input and the second secure input as inputs.
19. The method of claim 14, wherein the joint solving comprising a single process using a transmission control protocol (TCP) port opened by the second entity exclusively for comparisons against the second private input.
US14/693,791 2012-12-31 2015-04-22 Secure and private tracking across multiple cameras Active 2033-12-16 US9977991B2 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US14/693,791 US9977991B2 (en) 2012-12-31 2015-04-22 Secure and private tracking across multiple cameras
US15/958,820 US10181090B2 (en) 2012-12-31 2018-04-20 Secure and private tracking across multiple cameras

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US73182212A 2012-12-31 2012-12-31
US13/731,822 US20140184803A1 (en) 2012-12-31 2012-12-31 Secure and Private Tracking Across Multiple Cameras
US14/693,791 US9977991B2 (en) 2012-12-31 2015-04-22 Secure and private tracking across multiple cameras

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
US13/731,822 Continuation US20140184803A1 (en) 2012-12-31 2012-12-31 Secure and Private Tracking Across Multiple Cameras

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/958,820 Continuation US10181090B2 (en) 2012-12-31 2018-04-20 Secure and private tracking across multiple cameras

Publications (2)

Publication Number Publication Date
US20150227814A1 US20150227814A1 (en) 2015-08-13
US9977991B2 true US9977991B2 (en) 2018-05-22

Family

ID=51016768

Family Applications (3)

Application Number Title Priority Date Filing Date
US13/731,822 Abandoned US20140184803A1 (en) 2012-12-31 2012-12-31 Secure and Private Tracking Across Multiple Cameras
US14/693,791 Active 2033-12-16 US9977991B2 (en) 2012-12-31 2015-04-22 Secure and private tracking across multiple cameras
US15/958,820 Active US10181090B2 (en) 2012-12-31 2018-04-20 Secure and private tracking across multiple cameras

Family Applications Before (1)

Application Number Title Priority Date Filing Date
US13/731,822 Abandoned US20140184803A1 (en) 2012-12-31 2012-12-31 Secure and Private Tracking Across Multiple Cameras

Family Applications After (1)

Application Number Title Priority Date Filing Date
US15/958,820 Active US10181090B2 (en) 2012-12-31 2018-04-20 Secure and private tracking across multiple cameras

Country Status (1)

Country Link
US (3) US20140184803A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170300754A1 (en) * 2016-04-14 2017-10-19 KickView Corporation Video object data storage and processing system
US20190303676A1 (en) * 2018-03-29 2019-10-03 Ncr Corporation Decentralized video tracking
US11120160B2 (en) 2019-05-31 2021-09-14 Advanced New Technologies Co., Ltd. Distributed personal data storage and encrypted personal data service based on secure computation

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9123227B2 (en) * 2008-10-13 2015-09-01 The Boeing Company System for checking security of video surveillance of an area
CN104079525B (en) * 2013-03-25 2015-11-11 腾讯科技(深圳)有限公司 A kind ofly prevent plug-in method and server
JP6203077B2 (en) * 2014-02-21 2017-09-27 株式会社東芝 Learning device, density measuring device, learning method, learning program, and density measuring system
US9460522B2 (en) 2014-10-29 2016-10-04 Behavioral Recognition Systems, Inc. Incremental update for background model thresholds
JP6399356B2 (en) * 2015-05-26 2018-10-03 パナソニックIpマネジメント株式会社 Tracking support device, tracking support system, and tracking support method
US10582125B1 (en) * 2015-06-01 2020-03-03 Amazon Technologies, Inc. Panoramic image generation from video
US10395137B2 (en) * 2016-03-11 2019-08-27 Irvine Sensors Corporation Method and processing unit for correlating image data content from disparate sources
US10331982B2 (en) * 2016-03-11 2019-06-25 Irvine Sensors Corp. Real time signal processor for analyzing, labeling and exploiting data in real time from hyperspectral sensor suites (Hy-ALERT)
US10402983B2 (en) * 2016-11-07 2019-09-03 Nec Corporation Surveillance system using deep network flow for multi-object tracking
US10332274B2 (en) * 2016-11-14 2019-06-25 Nec Corporation Surveillance system using accurate object proposals by tracking detections
US20180173940A1 (en) * 2016-12-19 2018-06-21 Canon Kabushiki Kaisha System and method for matching an object in captured images
US20180308243A1 (en) * 2017-03-28 2018-10-25 Irvine Sensors Corporation Cognitive Tracker -- Appliance For Enabling Camera-to-Camera Object Tracking in Multi-Camera Surveillance Systems
CN108875448B (en) * 2017-05-09 2021-06-08 深圳荆虹科技有限公司 Pedestrian re-identification method and device
US10395385B2 (en) 2017-06-27 2019-08-27 Qualcomm Incorporated Using object re-identification in video surveillance
CN107689054B (en) * 2017-08-24 2020-09-22 北京航空航天大学 Multi-camera topological connectivity graph establishing and cross-camera target tracking method
CN108062562B (en) * 2017-12-12 2020-03-10 北京图森未来科技有限公司 Object re-recognition method and device
EP3506547A1 (en) * 2017-12-28 2019-07-03 Flytxt B.V. Providing security against user collusion in data analytics using random group selection
CN108363724B (en) * 2018-01-04 2021-12-10 浙江大学 Method for solving feature extraction task by utilizing image regularization and data reconstruction
CN109376578A (en) * 2018-08-27 2019-02-22 杭州电子科技大学 A kind of small sample target identification method based on depth migration metric learning
CN109685934B (en) * 2018-12-18 2021-03-02 福州大学 Traffic estimation method based on vehicle data privacy protection and compressed sensing
CN109714570A (en) * 2018-12-28 2019-05-03 孙海 Cloud server end data-pushing mechanism and corresponding terminal
CN110162551B (en) * 2019-04-19 2020-08-14 阿里巴巴集团控股有限公司 Data processing method and device and electronic equipment
US10936605B2 (en) 2019-04-19 2021-03-02 Advanced New Technologies Co., Ltd. Providing oblivious data transfer between computing devices
WO2021056050A1 (en) * 2019-09-24 2021-04-01 NETGENIQ Pty Ltd Systems and methods for identifying and/or tracking objects
US11238093B2 (en) * 2019-10-15 2022-02-01 Adobe Inc. Video retrieval based on encoding temporal relationships among video frames
US12008115B2 (en) * 2020-02-05 2024-06-11 C2Ro Cloud Robotics Inc. System and method for privacy-aware analysis of video streams
EP3971734B1 (en) * 2020-09-22 2024-07-10 Grazper Technologies ApS Evaluation device for re-identification and corresponding method, system and computer program
CN112257684B (en) * 2020-12-07 2021-04-16 之江实验室 Cross-camera global pedestrian tracking method based on real-time video stream
US12106569B2 (en) * 2021-10-11 2024-10-01 Worlds Enterprises, Inc. Tracking two-dimensional objects in a three-dimensional coordinate space
TWI792693B (en) * 2021-11-18 2023-02-11 瑞昱半導體股份有限公司 Method and apparatus for person re-identification
US20230230379A1 (en) * 2022-01-19 2023-07-20 Target Brands, Inc. Safety compliance system and method
CN117528035B (en) * 2024-01-05 2024-03-22 四川省寰宇众恒科技有限公司 Object cross-border head tracking method and system based on active notification

Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5715325A (en) * 1995-08-30 1998-02-03 Siemens Corporate Research, Inc. Apparatus and method for detecting a face in a video image
US20030167402A1 (en) * 2001-08-16 2003-09-04 Stolfo Salvatore J. System and methods for detecting malicious email transmission
US6807360B1 (en) * 1999-07-28 2004-10-19 Optex Co., Ltd. Digital image recording system
US20050073585A1 (en) * 2003-09-19 2005-04-07 Alphatech, Inc. Tracking systems and methods
US20050149516A1 (en) * 2002-04-25 2005-07-07 Wolf Peter P. Method and system for retrieving documents with spoken queries
US20060212337A1 (en) * 2005-03-16 2006-09-21 International Business Machines Corporation Method and system for automatic assignment of sales opportunities to human agents
US20060217892A1 (en) * 2005-03-23 2006-09-28 International Business Machines Corporation Vectorization of dynamic-time-warping computation using data reshaping
US20070039030A1 (en) * 2005-08-11 2007-02-15 Romanowich John F Methods and apparatus for a wide area coordinated surveillance system
US20080021899A1 (en) * 2006-07-21 2008-01-24 Shmuel Avidan Method for classifying private data using secure classifiers
US20090154795A1 (en) * 2007-12-12 2009-06-18 Microsoft Corporation Interactive concept learning in image search
US7720261B2 (en) * 2005-12-07 2010-05-18 Sony Corporation Image processing apparatus, image processing method, program, and recording medium
US8224088B2 (en) * 2005-09-01 2012-07-17 Hitachi, Ltd. Method for background generation and its system for video surveillance
US20120319724A1 (en) * 2010-01-12 2012-12-20 Stc.Unm System and methods for generating unclonable security keys in integrated circuits
US8370421B2 (en) * 2007-05-19 2013-02-05 Videotec S.P.A. Method for coordinating a plurality of sensors
US20130035979A1 (en) * 2011-08-01 2013-02-07 Arbitron, Inc. Cross-platform audience measurement with privacy protection
US9077539B2 (en) * 2011-03-09 2015-07-07 Microsoft Technology Licensing, Llc Server-aided multi-party protocols
US9124417B2 (en) * 2010-03-05 2015-09-01 Alcatel Lucent Computation of garbled tables in garbled circuit
US9141184B2 (en) * 2010-08-27 2015-09-22 Hitachi Kokusai Electric Inc. Person detection system

Patent Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5715325A (en) * 1995-08-30 1998-02-03 Siemens Corporate Research, Inc. Apparatus and method for detecting a face in a video image
US6807360B1 (en) * 1999-07-28 2004-10-19 Optex Co., Ltd. Digital image recording system
US20030167402A1 (en) * 2001-08-16 2003-09-04 Stolfo Salvatore J. System and methods for detecting malicious email transmission
US20050149516A1 (en) * 2002-04-25 2005-07-07 Wolf Peter P. Method and system for retrieving documents with spoken queries
US20050073585A1 (en) * 2003-09-19 2005-04-07 Alphatech, Inc. Tracking systems and methods
US20060212337A1 (en) * 2005-03-16 2006-09-21 International Business Machines Corporation Method and system for automatic assignment of sales opportunities to human agents
US20060217892A1 (en) * 2005-03-23 2006-09-28 International Business Machines Corporation Vectorization of dynamic-time-warping computation using data reshaping
US20070039030A1 (en) * 2005-08-11 2007-02-15 Romanowich John F Methods and apparatus for a wide area coordinated surveillance system
US8224088B2 (en) * 2005-09-01 2012-07-17 Hitachi, Ltd. Method for background generation and its system for video surveillance
US7720261B2 (en) * 2005-12-07 2010-05-18 Sony Corporation Image processing apparatus, image processing method, program, and recording medium
US20080021899A1 (en) * 2006-07-21 2008-01-24 Shmuel Avidan Method for classifying private data using secure classifiers
US8370421B2 (en) * 2007-05-19 2013-02-05 Videotec S.P.A. Method for coordinating a plurality of sensors
US20090154795A1 (en) * 2007-12-12 2009-06-18 Microsoft Corporation Interactive concept learning in image search
US20120319724A1 (en) * 2010-01-12 2012-12-20 Stc.Unm System and methods for generating unclonable security keys in integrated circuits
US9124417B2 (en) * 2010-03-05 2015-09-01 Alcatel Lucent Computation of garbled tables in garbled circuit
US9141184B2 (en) * 2010-08-27 2015-09-22 Hitachi Kokusai Electric Inc. Person detection system
US9077539B2 (en) * 2011-03-09 2015-07-07 Microsoft Technology Licensing, Llc Server-aided multi-party protocols
US20130035979A1 (en) * 2011-08-01 2013-02-07 Arbitron, Inc. Cross-platform audience measurement with privacy protection

Non-Patent Citations (15)

* Cited by examiner, † Cited by third party
Title
A. Gilbert and R. Bowden, "Tracking objects across cameras by incrementally learning inter-camera colour calibration and patterns of activity", Proceedings of the 9th European conference on Computer Vision-vol. Part II, May 2006, 12 pgs.
A. Gilbert and R. Bowden, "Tracking objects across cameras by incrementally learning inter-camera colour calibration and patterns of activity", Proceedings of the 9th European conference on Computer Vision—vol. Part II, May 2006, 12 pgs.
A. Globerson and R. Rowels, "Metric learning by collapsing classes", Neural Information Processing Systems-NIPS , Dec. 2005, 8 pgs.
A. Globerson and R. Rowels, "Metric learning by collapsing classes", Neural Information Processing Systems—NIPS , Dec. 2005, 8 pgs.
A.J. Brush, J. Jung, R. Mahajan, and F. Martinez, "Digital Neighborhood Watch: Investigating the Sharing of Camera Data amongst Neighbors", Proceedings of the 2013 conference on Computer supported cooperative work, Feb. 2013, 8 pgs.
Bauml et al , "Evaluation of local features for person re-identification in image sequences," IEEE Intl. Conf. on Advanced Video and Signal Based Surveillance, Sep. 2011, 6 pgs.
Bishop, Pattern Recognition and Machine Learning, 2006 Book, Chapt 1.6, 1 pg.
C. Dixon, R. Mahajan, S. Agarwal, A.J. Brush, B. Lee, S. Saroiu, and P. Bahl, "An operating system for the home", Proceedings of 9th USENIX conference on Networked Systems Design and Implementation, Apr. 2012, 16 pgs.
D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking", IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 25, No. 5, May 2003, 30 pgs.
Dikmen et al, "Pedestrian recognition with a learned metric," Computer Vision-ACCV, 2010, 10th Asian Conf on Computer Vision, Nov. 2010, 12 pgs.
Dikmen et al, "Pedestrian recognition with a learned metric," Computer Vision—ACCV, 2010, 10th Asian Conf on Computer Vision, Nov. 2010, 12 pgs.
Gonzalez, Digital Image Processing, 2006 Book, 3 pgs.
K. Chen, C. Lai, Y. Hung, and C. Chen, "An adaptive learning method for target tracking across multiple cameras", Computer Vision and Pattern Recognition, IEEE Conference, Jun. 2008, 8 pgs.
M. Farenzena, L. Bazzani, A. Perina, V. Murino, and M. Cristani, "Person re-identification by symmetry-driven accumulation of local features", IEEE Conf on Computer Vision and Pattern Recognition, Jun. 2010, 8 pgs.
S. Avidan and M. Butman, "Blind vision", European Conf on Computer Vision, May 2006, 15 pgs.

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170300754A1 (en) * 2016-04-14 2017-10-19 KickView Corporation Video object data storage and processing system
US10217001B2 (en) * 2016-04-14 2019-02-26 KickView Corporation Video object data storage and processing system
US20190303676A1 (en) * 2018-03-29 2019-10-03 Ncr Corporation Decentralized video tracking
US10929675B2 (en) * 2018-03-29 2021-02-23 Ncr Corporation Decentralized video tracking
US11120160B2 (en) 2019-05-31 2021-09-14 Advanced New Technologies Co., Ltd. Distributed personal data storage and encrypted personal data service based on secure computation

Also Published As

Publication number Publication date
US20140184803A1 (en) 2014-07-03
US20180239985A1 (en) 2018-08-23
US10181090B2 (en) 2019-01-15
US20150227814A1 (en) 2015-08-13

Similar Documents

Publication Publication Date Title
US10181090B2 (en) Secure and private tracking across multiple cameras
Arroyo et al. Expert video-surveillance system for real-time detection of suspicious behaviors in shopping malls
Calderara et al. Detecting anomalies in people’s trajectories using spectral graph analysis
Tiwari et al. A computer vision based framework for visual gun detection using harris interest point detector
Walia et al. Recent advances on multicue object tracking: a survey
Lavi et al. Survey on deep learning techniques for person re-identification task
US20200012923A1 (en) Computer device for training a deep neural network
US20220174089A1 (en) Automatic identification and classification of adversarial attacks
CN108229335A (en) It is associated with face identification method and device, electronic equipment, storage medium, program
Manfredi et al. Detection of static groups and crowds gathered in open spaces by texture classification
Szwoch Extraction of stable foreground image regions for unattended luggage detection
Sabokrou et al. Fast and accurate detection and localization of abnormal behavior in crowded scenes
Manikandan et al. A neural network aided attuned scheme for gun detection in video surveillance images
Gnouma et al. Video anomaly detection and localization in crowded scenes
Denman et al. Automatic surveillance in transportation hubs: No longer just about catching the bad guy
Denman et al. Multi-spectral fusion for surveillance systems
Chebi et al. Dynamic detection of anomalies in crowd's behavior analysis
CN111160150A (en) Video monitoring crowd behavior identification method based on depth residual error neural network convolution
Ahmed et al. Real-time intruder detection in surveillance networks using adaptive kernel methods
Senior An introduction to automatic video surveillance
Maktoof et al. Machine Learning-Based Intelligent Video Surveillance in Smart City Framework
Rashed et al. Improved moving object detection algorithm based on adaptive background subtraction
Srivastava et al. Anomaly Detection Approach for Human Detection in Crowd Based Locations
Kavitha et al. Human Gait Recognition Systems (HGRS) for Viewpoint Variation in Visual Surveillance System Using Siamese Deep Convolutional Neural Network (SDCNN)
Loughlin et al. Spectral variability modeling with variational autoencoders for hyperspectral target analysis

Legal Events

Date Code Title Description
AS Assignment

Owner name: MICROSOFT TECHNOLOGY LICENSING, LLC, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:MICROSOFT CORPORATION;REEL/FRAME:035484/0445

Effective date: 20141014

Owner name: MICROSOFT CORPORATION, WASHINGTON

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHU, CHUN-TE;JUNG, JAEYEON;LIU, ZICHENG;AND OTHERS;SIGNING DATES FROM 20130521 TO 20130530;REEL/FRAME:035484/0293

STCF Information on status: patent grant

Free format text: PATENTED CASE

MAFP Maintenance fee payment

Free format text: PAYMENT OF MAINTENANCE FEE, 4TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1551); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

Year of fee payment: 4