US20150235110A1 - Object recognition or detection based on verification tests
- Publication number: US20150235110A1 (application US 14/181,077)
- Authority: US (United States)
- Prior art keywords: candidate, candidate object, predetermined, belief, source image
- Legal status: Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/30—Scenes; Scene-specific elements in albums, collections or shared content, e.g. social network photos or video
- G06K9/6267; G06K9/6212; G06T7/408
Description
- This application relates to computer vision and, in particular, to object recognition or detection.
- Social network use has expanded dramatically in recent years, with social networking services such as Facebook® (a registered trademark of Facebook, Inc. of Menlo Park, Calif.) boasting more than a billion users.
- Social networking services facilitate users posting text and images that may be viewed by others. Posted text and images may remain available for viewing and are often not removed. Accordingly, the amount of posted text and the number of posted images may grow over time.
- An object recognition system may be provided that includes an object detection module, multiple verification tests, a scoring module, and a verification module.
- the object detection module may apply a cascade classifier to a source image, which results in identification of candidate objects for a predetermined object type.
- Each of the verification tests may generate difference values for a candidate object identified by the object detection module and a corresponding reference image, where the corresponding reference image depicts an object of the predetermined object type, and where each one of the difference values represents an indication of a difference between a characteristic of the candidate object and a characteristic of the corresponding reference image.
- the scoring module may determine, for each of the candidate objects, a belief score for the candidate object based on the difference values for the candidate object. The belief score may indicate a likelihood that the candidate object is of the predetermined object type.
- the verification module may identify a set of detected objects based on the candidate objects and the belief scores for the candidate objects.
- a computer readable storage medium may be provided that includes computer executable instructions.
- source images that are shared in a social networking service may be identified.
- a candidate object may be detected in any of the source images by applying a cascade classifier in search of an object of a predetermined object type.
- Difference values may be generated based on comparisons of characteristics of the candidate object with corresponding characteristics of a reference image. Each one of the difference values may indicate a difference between a respective one of the characteristics of the candidate object and a corresponding respective one of the characteristics of the reference image.
- a belief score may be generated for the candidate object based on differences between the difference values and target difference values. The belief score may indicate the likelihood that the candidate object is an object of the predetermined object type. Any of the source images that includes the candidate object may be identified as including the predetermined object type when the belief score exceeds a threshold belief score.
- a method is provided to recognize objects in an image.
- a source image may be searched for any candidate objects of a predetermined object type by applying a cascade classifier associated with the predetermined object type to the source image.
- Scores, such as difference values, for a candidate object may be determined from a plurality of verification tests applied to the candidate object. Each one of the scores may be determined from a corresponding one of the verification tests. Each one of the scores may represent an indication of a difference between the candidate object and a set of reference images for the predetermined object type.
- a belief score may be determined for the candidate object from the scores for the candidate object. The belief score may indicate the likelihood that the candidate object is of the predetermined object type.
- the candidate object may be identified as a detected object of the predetermined object type when the belief score relative to a threshold belief score indicates the candidate object is of the predetermined object type.
- FIG. 1 illustrates an object recognition system
- FIG. 2 illustrates the logic flow of an object detection module
- FIG. 3 illustrates a first part of the logic flow of a verification module
- FIG. 4 illustrates a second part of the logic flow of a verification module
- FIG. 5 illustrates a third part of the logic flow of a verification module
- FIG. 6 illustrates a graphical user interface for building cascade classifiers
- FIG. 7 illustrates a graphical user interface for testing and adjusting parameters of an object detection module and a verification module
- FIG. 8 illustrates a graphical user interface for testing and adjusting parameters of an object detection module and a verification module in a search for multiple object types
- FIG. 9 illustrates a graphical user interface for presenting images and text available in the social networking service in which objectionable material is detected.
- FIG. 10 illustrates an example of a graphical user interface for providing feedback to improve the accuracy of object recognition.
- source images that are shared in a social networking service may be identified. For example, any images of a person that are publicly available may be identified.
- a cascade classifier associated with the predetermined object type may be applied to each of the source images.
- the predetermined object type may be a beer can, a beer bottle, or any other type of object.
- One or more candidate objects may be identified by applying the cascade classifier.
- the candidate object may not be an object of the predetermined type. Verification tests may verify whether the candidate object is such an object. Difference values may be generated based on comparisons of characteristics of the candidate object with corresponding characteristics of a reference image.
- the reference image may be an image known to depict an object of the predetermined object type.
- Each one of the difference values may indicate a difference between a respective one of the characteristics of the candidate object and a corresponding respective one of the characteristics of the reference image.
- a belief score may be determined for the candidate object based on differences between the difference values and target difference values. Each one of the target difference values may be an expected difference value for a corresponding one of the characteristics of any reference image and any candidate image that actually depicts an object of the predetermined object type.
- the belief score may indicate the likelihood that the candidate object is an object of the predetermined object type.
- the source image that includes the candidate object may be identified as including the predetermined object type when the belief score exceeds a threshold belief score.
- FIG. 1 illustrates an object recognition system 100 .
- the object recognition system 100 may recognize or detect objects in any context.
- the object recognition system 100 illustrated in FIG. 1 recognizes objects in the context of a social networking service 102 .
- the system 100 may recognize objects in a surveillance system, in a robotics system, or in any other context in which object recognition functionality may be desirable.
- the system 100 may include an object recognition device 104 and one or more client devices 106 .
- the object recognition device 104 may be in communication with the social networking service 102 and the client devices 106 over a network 108 .
- the object recognition device 104 may be included in any type of device.
- the object recognition device 104 may be included in a computer, a server, a smart phone, a smart device, a mobile phone, a robot, an appliance, a circuit, and/or an integrated circuit chip.
- the object recognition device 104 may be included in a server or servers that host the social networking service 102 .
- the social networking service 102 may be a service through which people may build social networks or social relations among each other.
- the people in a social network may share, for example, interests, activities, backgrounds, and/or connections in real-life.
- the social networking service 102 may facilitate uploading images that others may view.
- Examples of the social networking service 102 may include FACEBOOK®, INSTAGRAM® (INSTAGRAM is a registered trademark of Instagram, LLC of Menlo Park, Calif.), and/or any other social networking service.
- Each of the client devices 106 may be any computing device. Examples of the client devices 106 may include a computer, a laptop, a tablet, a mobile phone, a smart phone, an appliance, or any other type of computing device.
- the client devices 106 may be referred to as clients of the object recognition device 104 because the client devices 106 may use services provided by the object recognition device 104 .
- the network 108 may be any collection of transmission links over which data between computing nodes may be exchanged.
- the network 108 may include a local area network (LAN), a wired network, a wireless network, a wireless local area network (WLAN), a WI-FI® network (WI-FI is a registered trademark of Wireless Ethernet Compatibility Alliance, Inc. of Austin, Tex.), a personal area network (PAN), a wide area network (WAN), the Internet, an Internet Protocol (IP) network, and/or any other communications network.
- the object recognition device 104 is physically distinct from the social networking service 102 and the client devices 106 .
- the object recognition device 104 may be included in the social networking service 102 and/or in one or more servers that host the social networking service 102 .
- the object recognition device 104 may be included in one or more of the client devices 106 .
- the object recognition device 104 may include a processor 110 and a memory 112 .
- the memory 112 may include a scan engine 114 , a scan engine GUI (Graphical User Interface) module 116 , and an object detection service GUI module 118 .
- the scan engine 114 may be a component that detects any objects 122 in the source images 120 that are of a predetermined object type 124 , such as a plastic cup, a beer bottle, a tool, and/or a type of animal.
- the scan engine 114 may include an object detection module 126 and a verification module 128 .
- the object detection module 126 of the scan engine 114 may be a component that applies a cascade classifier 130 to the source images 120 or otherwise locates one or more candidate objects 132 in the source images 120 .
- application of the cascade classifier 130 , such as an XML cascade, to any of the source images 120 may locate one or more candidate objects 132 that are possibly objects of the predetermined object type 124 .
- the verification module 128 may be a component that verifies that the candidate objects 132 are objects of the predetermined object type 124 .
- the verification module 128 may include one or more reference image based verification tests 134 , one or more context based verification tests 136 , and a scoring module 138 .
- the reference image based verification tests 134 may be tests that compare the candidate objects 132 with reference images 140 to identify similarities and/or differences.
- the context based verification tests 136 may be tests that are based on a context of any of the candidate objects 132 .
- the context of a candidate object may be a location of the candidate object relative to a face detected in a source image.
- the context may include any context different from, and/or in addition to, the location of the candidate object relative to the detected face.
- the scoring module 138 of the verification module 128 may be a component that generates scores 142 from one or more of the tests 134 and/or 136 .
- Each of the scores 142 may represent an indication of a difference—or equivalently, a similarity—between one of the candidate objects 132 and one or more of the reference images 140 that depict the predetermined object type 124 .
- the scoring module 138 may be a component that generates a belief score 144 from the scores 142 generated by one or more of the tests 134 and/or 136 .
- the belief score 144 may be any indication of the likelihood that the candidate object is an object of the predetermined object type 124 .
- the belief score 144 may be a numerical value, a percentage, and/or a symbol or a phrase, such as “likely” and “unlikely.”
- the scan engine GUI module 116 may be a component that generates a GUI 146 for configuring the behavior of the scan engine 114 .
- the scan engine GUI module 116 may generate one or more web pages that are viewed at the client devices 106 .
- the scan engine GUI module 116 may generate the GUI 146 in an app or software application that executes in the client devices 106 . Examples of such a GUI are provided later below and illustrated in FIGS. 6-8 .
- the client devices 106 or a subset thereof may be devices used by one or more administrative users or developers. Alternatively or in addition, the client devices 106 or a subset thereof may be devices used by one or more end users.
- the GUI 146 generated by the scan engine GUI module 116 may be an administrator GUI 148 limited to use by administrative users in many examples.
- the object detection service GUI module 118 may be a component that generates the GUI 146 for using the scan engine 114 in the context of the social networking service 102 . Examples of such a GUI are provided later below and illustrated in FIGS. 9-10 .
- the GUI 146 generated by the object detection service GUI module 118 may be an end user GUI 150 for end users in many examples.
- the graphical user interface (GUI) 146 generated by either GUI module 116 or 118 may be a type of user interface through which a human may interact with electronic devices, such as the client devices 106 .
- the GUI 146 may include graphical icons and/or any other type of visual indicators to represent information and actions available to a user. The actions may be performed through direct manipulation of the graphical elements. Alternatively, the GUI 146 may be a text-based interface or text navigation interface.
- the scan engine 114 may search one or more of the source images 120 for the predetermined object type 124 or a set of predetermined object types.
- the source images 120 may be obtained from any source.
- the source images 120 may be obtained from the social networking service 102 .
- the source images 120 may be images in a user's social network that are public, images posted by a user that are available to members of the user's social network, images in which a user is "tagged" or otherwise identified, and/or images selected by any other criteria.
- the user may provide the object recognition device 104 with authorization to access the social networking service 102 .
- the user may provide authorization by, for example, providing log-in credentials to the object recognition device 104 .
- the source images 120 may be obtained from different sources of images.
- the source images 120 may be obtained from a web search for images associated with a person, for example.
- the source images 120 may be obtained from a camera mounted on a robot or from another image source in the robotics system.
- the source images 120 may be obtained from a security camera.
- the predetermined object type 124 or types may be any type of object that the object recognition system 100 is requested to find.
- a user may wish to identify objects that a set of people, such as employers or family members, may find objectionable.
- a user may wish to identify objects that may pose a security risk.
- Examples of the predetermined object type 124 may include a beer bottle, a beer can, a plastic cup, such as a SOLO® cup (SOLO is a registered trademark of Solo Cup Company of Lake Forest, Ill.), a beer bong, a can, a bottle, a backpack, a duffle bag, a weapon, a pistol, an animal, a person, a face, or any other type of object.
- the predetermined object type 124 or predetermined object types may be predetermined in the sense that the object type 124 or types may be determined prior to searching the source images 120 for the object type 124 or types.
- a user such as an administrative user, may identify the predetermined object type 124 or types.
- the object detection module 126 of the scan engine 114 may locate one or more candidate objects 132 in the source images 120 .
- FIG. 2 illustrates an example logic flow 200 of the object detection module 126 .
- the object detection module 126 may resize ( 206 ) an initial source image 202 to obtain a source image 204 that has a target size.
- the target size may be selected to be large enough, by pixel standards, to detect and verify the predetermined object type 124 or types, but not so large that detecting and verifying objects exceeds a threshold amount of time.
- An example of the target size may be approximately 2000 horizontal pixels and 1300 vertical pixels.
- the target size may depend on factors such as the speed of the processor 110 , characteristics of the object type 124 , and/or the number and variety of object types that the scan engine 114 searches for.
- Resizing ( 206 ) the initial source image 202 may improve the speed by which the detected objects 122 may be recognized, while only incurring a small loss of accuracy in recognizing objects. Nevertheless, the source image 204 may have any size and the initial source image 202 need not be resized.
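- As an illustration only, the resizing step might be sketched as follows in Python with OpenCV. The library choice, the exact target dimensions, and the decision to preserve aspect ratio are assumptions; the description calls only for a target size of roughly 2000 by 1300 pixels.

```python
import cv2

# Approximate target size from the description (assumed exact values).
TARGET_W, TARGET_H = 2000, 1300

def resize_to_target(initial_source_image):
    """Scale the initial source image 202 so it fits within the target
    bounds, preserving aspect ratio, to obtain the source image 204."""
    h, w = initial_source_image.shape[:2]
    scale = min(TARGET_W / w, TARGET_H / h)
    new_size = (int(w * scale), int(h * scale))  # (width, height) for cv2
    return cv2.resize(initial_source_image, new_size,
                      interpolation=cv2.INTER_AREA)
```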
- the object detection module 126 may apply ( 208 ) the cascade classifier 130 to the source image 204 .
- the cascade classifier 130 may be an XML (eXtensible Markup Language) cascade, for example.
- the type of the cascade classifier 130 applied may be any type of cascade classifier.
- the cascade classifier 130 may be a Haar-like feature classifier, a local binary pattern (LBP) feature classifier, a histogram of oriented gradients (HOG) feature classifier, or any other type of cascade classifier.
- Each type of cascade classifier may implement a corresponding detection algorithm. Examples of the detection algorithm may include Haar, LBP, HOG, or any other type of cascade algorithm.
- the type of the cascade classifier 130 that is applied to the source image 204 may vary depending on the object type 124 . Each type of object may be identified more accurately with one type of cascade classifier than another. For example, if the predetermined object type 124 is a type of object that includes lettering, then an LBP feature classifier may be associated with the predetermined object type 124 in the memory 112 .
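- A minimal sketch of applying a cascade classifier, assuming an OpenCV-style implementation; the cascade file name and the detectMultiScale parameters below are illustrative, not taken from the description.

```python
import cv2

def find_candidate_objects(source_image, cascade_path="beer_can_lbp.xml"):
    """Apply a cascade classifier (e.g., a Haar or LBP XML cascade) to
    the source image and return candidate bounding boxes as
    (x, y, width, height) tuples in pixels, matching the coordinates
    and sizes the description stores in memory."""
    classifier = cv2.CascadeClassifier(cascade_path)  # hypothetical file
    gray = cv2.cvtColor(source_image, cv2.COLOR_BGR2GRAY)
    return classifier.detectMultiScale(gray, scaleFactor=1.1,
                                       minNeighbors=4)
```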
- a user may select and/or associate a selected cascade classifier 130 with the predetermined object type 124 in the memory 112 .
- the scan engine GUI module 116 may generate a GUI, as illustrated in FIG. 6 for example, for selecting and/or associating the cascade classifier 130 with the predetermined object type 124 .
- the cascade classifier 130 may be customized with the GUI generated by the scan engine GUI module 116 as illustrated in FIG. 6 .
- the behavior of the detection algorithm of the object detection module 126 may be controlled by parameters.
- the parameters may be adjusted and passed to the object detection module 126 .
- the scan engine GUI module 116 may generate a GUI, as illustrated in FIG. 7 for example, for adjusting the parameters passed to the object detection module 126 .
- Customizing the cascade classifier 130 , associating the cascade classifier 130 with the predetermined object type 124 , and/or adjusting the parameters to the object detection module 126 may be performed prior to the object detection module 126 searching the source image 204 for the predetermined object type 124 .
- such action or actions may be performed while the object detection module 126 searches the source image 204 for the predetermined object type 124 .
- such action or actions may be performed after the object detection module 126 searches the source image 204 .
- the object detection module 126 may store a size and/or a location of each of the candidate objects 132 .
- Cartesian coordinates, measured in pixels, of each of the candidate objects 132 may be stored in the memory 112 .
- the height and width, for example in pixels, of each of the candidate objects 132 may be stored in the memory 112 .
- the object detection module 126 may detect ( 210 ) faces 212 in the source image 204 .
- the object detection module 126 may, for example, apply an XML cascade to the source image 204 thereby detecting any faces 212 in the source image 204 .
- the XML cascade may evaluate the source image 204 for Haar-like features.
- the object detection module 126 may store a location of each of the detected faces 212 .
- Cartesian coordinates, measured in pixels, of each of the detected faces 212 may be stored in the memory 112 .
- a size of each of the detected faces 212 may be stored.
- the height and width in pixels of each of the detected faces 212 may be stored in the memory 112 .
- the object detection module 126 may determine an average size of the detected faces 212 .
- the size, average size, and/or location of the detected faces 212 may provide context information 214 for the candidate objects 132 .
- the verification module 128 may use the context information 214 to verify that the candidate objects 132 are objects of the predetermined object type 124 .
- the verification module 128 may compare the size, the average size, and/or the location of the detected faces 212 with a relative expected size and/or a relative expected location of an object of the predetermined object type 124 .
- the verification module 128 may use the size, average size, and/or location of the detected faces 212 to adjust a likelihood that each of the candidate objects 132 is of the predetermined object type based on a likelihood that an object of the predetermined object type 124 may overlap any of the detected faces 212 .
- the verification module 128 may perform the reference image based verification tests 134 .
- Verification of the candidate objects 132 that are detected with the cascade classifier 130 may improve the accuracy of detecting objects over detecting objects with just the cascade classifier 130 alone.
- without verification, the cascade classifier 130 may have to be configured to achieve a suitable balance of true positives, false positives, and false negatives. As a result of achieving that balance, objects that may have otherwise been detected are eliminated from further consideration.
- when the verification tests 134 and/or 136 are performed, the cascade classifier 130 may instead be configured to identify more false positives than in the absence of the verification tests 134 and/or 136 . Accordingly, the overall accuracy in identifying the detected objects 122 may be improved.
- FIG. 3 illustrates a flow diagram of an example of part of the logic 300 of the verification module 128 .
- characteristics 302 , 304 , 306 , 308 , 310 , and/or 312 of a candidate object 314 may be generated ( 318 , 320 , 322 , 324 , 326 , and/or 328 ).
- a histogram 302 of the candidate object 314 may be generated ( 318 ).
- the histogram 302 may represent variations in shading and/or coloration.
- the histogram 302 may, for example, include a map of shading and/or color values arranged in “bins.” Each of the bins may represent a subset of a range of such values.
- the histogram 302 may provide a basis for finding similarities and/or differences between two objects. For example, the histogram 302 of a banana may match the histogram 302 of a lemon because the number of pixels that are shades representing yellow may be comparable for both objects, even though other aspects of the objects, such as their shapes, are different from each other.
- the histogram 302 of the candidate object 314 may be subsequently compared with a histogram 330 of each of the reference images 140 , such as with the histogram 330 of the reference image 350 illustrated in FIG. 3 .
- the histogram 302 may include multiple histograms because multiple types of histograms may be generated.
- Each type of histogram may represent properties of an image that are different than properties represented by the other types of histograms included in the histogram 302 .
- the histogram 302 may include a histogram of predetermined portions of color data and a histogram of grayscale shades.
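- For illustration, the two histogram types mentioned above might be computed as follows, assuming OpenCV; the bin count is an assumed, adjustable parameter.

```python
import cv2

def candidate_histograms(candidate_bgr, bins=32):
    """Build a per-channel color histogram and a grayscale histogram
    for a candidate object (OpenCV images are in B, G, R order)."""
    color_hists = [cv2.calcHist([candidate_bgr], [ch], None,
                                [bins], [0, 256])
                   for ch in range(3)]  # one histogram per color channel
    gray = cv2.cvtColor(candidate_bgr, cv2.COLOR_BGR2GRAY)
    gray_hist = cv2.calcHist([gray], [0], None, [bins], [0, 256])
    return color_hists, gray_hist
```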
- a color map 304 of color data of the candidate object 314 may be generated ( 320 ).
- the color map 304 may be a pixel by pixel representation of the image in red-green-blue (RGB) color space.
- the color map 304 of the candidate object 314 may be subsequently compared with a color map 332 of one or more of the reference images 140 .
- a hue map 306 of hue data of the candidate object 314 may be generated ( 322 ).
- the hue map 306 may be a pixel by pixel representation of the candidate object 314 in hue, saturation, and value (HSV) color space.
- the hue map 306 may be a representation of the candidate object 314 in a HSL (hue, saturation, and lightness) color space, a HSI (hue, saturation, and intensity) color space, and/or any other color space.
- the hue map 306 of the candidate object 314 may be subsequently compared with a hue map 334 of one or more of the reference images 140 .
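- A brief sketch of how the color map 304 and the hue map 306 might be obtained, assuming OpenCV, which loads images as per-pixel BGR arrays rather than RGB.

```python
import cv2

def candidate_color_and_hue_maps(candidate_bgr):
    """Return a pixel-by-pixel RGB representation (the color map) and
    an HSV representation (the hue map) of the candidate object."""
    rgb_map = cv2.cvtColor(candidate_bgr, cv2.COLOR_BGR2RGB)
    hsv_map = cv2.cvtColor(candidate_bgr, cv2.COLOR_BGR2HSV)
    return rgb_map, hsv_map
```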
- Key points 308 of the candidate object 314 may be identified ( 324 ).
- the key points 308 may represent significant features within the candidate object 314 , such as corners and areas of contrast.
- the key points 308 may include pixel information from around such features.
- the key points 308 may include descriptors that include the pixel information.
- the key points 308 of the candidate object 314 may be subsequently compared with key points 336 of one or more of the reference images 140 .
- a percentage 310 of the candidate object 314 that contains hue, saturation, and value data that are within a range that represents skin tones may be determined ( 326 ). For example, if fifty percent of the candidate object 314 contains hue, saturation and value data within the range that represents skin tones, then half of the candidate object 314 may be skin.
- the percentage 310 may also be represented as and/or referred to as a skin ratio 310 .
- the skin ratio 310 of the candidate object 314 may be subsequently compared with a skin ratio 338 of one or more of the reference images 140 .
- the range of hue, saturation, and value data that represents skin tones may be determined prior to detecting any of the candidate objects 132 .
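- A sketch of the skin ratio 310 computation, assuming OpenCV; the HSV bounds below are illustrative stand-ins for the configurable skin-tone range that the description says is determined in advance.

```python
import cv2
import numpy as np

# Assumed skin-tone bounds in OpenCV's HSV space (H: 0-179).
SKIN_LOWER = np.array([0, 40, 60], dtype=np.uint8)
SKIN_UPPER = np.array([25, 180, 255], dtype=np.uint8)

def skin_ratio(candidate_bgr):
    """Fraction of candidate pixels whose hue, saturation, and value
    fall within the configured skin-tone range."""
    hsv = cv2.cvtColor(candidate_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, SKIN_LOWER, SKIN_UPPER)
    return cv2.countNonZero(mask) / mask.size
```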
- any other characteristics 312 of the candidate object 314 that may be useful for comparison with the reference images 140 or that may provide context for the candidate object 314 may be determined and/or stored ( 328 ). Examples of such characteristics 312 may include an average color or hue of the candidate object 314 , a location of the candidate object 314 relative to any of the detected faces 212 , and/or any other characteristic of the candidate object 314 .
- the additional characteristics 312 of the candidate object 314 may be compared with corresponding additional characteristics 340 of the reference image 350 .
- the histogram 330 , the color map 332 , the hue map 334 , the key points 336 , the skin ratio 338 , and/or the additional characteristics 340 may be generated ( 352 , 354 , 356 , 358 , 360 , and/or 362 ) for each of the reference images 140 .
- Each of the reference images 140 may be an image of an object that is confirmed to be of the predetermined object type 124 .
- the reference images 140 may be customized to improve the accuracy of the verification module 128 .
- the reference images 140 may be added to, deleted from, or adjusted at any time.
- the characteristics 330 , 332 , 334 , 336 , 338 , and/or 340 of each of the reference images 140 may be used in the verification tests 134 and/or 136 for comparison with the candidate objects 132 .
- FIG. 4 illustrates a flow diagram of an example of part of the logic 400 of the verification module 128 .
- FIG. 4 illustrates a flow diagram of the logic of the reference image based verification tests 134 .
- a set of the candidate objects 132 of that type 124 may be found by the object detection module 126 .
- a series of comparisons may be made to each of the reference images 140 of the predetermined object type 124 . The comparisons may be performed by the reference image based verification tests 134 .
- the reference image based verification tests 134 may include a histogram comparator 402 , an RGB color comparator 404 , a hue comparator 406 , and/or a key point comparator 408 .
- the reference image based verification tests 134 may include additional, fewer, or different comparators than illustrated in FIG. 4 .
- the comparators 402 , 404 , 406 , and/or 408 may be provided ( 420 ) with one or more of the characteristics 302 , 304 , 306 , 308 , 310 , and/or 312 of the candidate object 314 .
- the comparators 402 , 404 , 406 , and/or 408 may be provided ( 430 ) with one or more of the characteristics 330 , 332 , 334 , 336 , 338 , and/or 340 of each of the reference images 140 .
- the comparators 402 , 404 , 406 , and/or 408 may generate ( 440 ) a numerical score.
- the numerical scores may be referred to as difference values 412 .
- Each of the difference values 412 may represent a difference between the candidate object 314 and the corresponding reference image 350 . Equivalently, each of the difference values 412 may represent a similarity between the candidate object 314 and the corresponding reference image 350 .
- the histogram comparator 402 may compare the histogram 302 of the candidate object 314 to the histogram 330 of each reference image 350 using one or more algorithms.
- the histogram comparator 402 may generate, from each comparison, a corresponding one of the difference values 412 for each algorithm that the histogram comparator 402 applies.
- the algorithm and/or algorithms may include any type of histogram comparison algorithm.
- the histogram comparator 402 may implement a correlation metric, chi-square metric, intersection metric, and/or Bhattacharyya distance metric computation.
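- All four named metrics are exposed by OpenCV's compareHist, so the histogram comparator 402 might be sketched as follows; this is an assumed implementation, not the patent's own code.

```python
import cv2

# The four comparison metrics named above, as OpenCV flags.
HIST_METRICS = {
    "correlation": cv2.HISTCMP_CORREL,
    "chi_square": cv2.HISTCMP_CHISQR,
    "intersection": cv2.HISTCMP_INTERSECT,
    "bhattacharyya": cv2.HISTCMP_BHATTACHARYYA,
}

def histogram_difference_values(candidate_hist, reference_hist):
    """Generate one difference value per comparison algorithm applied."""
    return {name: cv2.compareHist(candidate_hist, reference_hist, flag)
            for name, flag in HIST_METRICS.items()}
```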
- the RGB color comparator 404 may compare the color map 304 of the candidate object 314 to the color map 332 of each reference image 350 .
- the RGB color comparator 404 may generate, for each reference image 350 , a respective one of the difference values 412 based on the comparison of the color maps 304 and 332 .
- the RGB color comparator 404 may compare the color maps 304 and 332 using one or more types of comparisons.
- One of the types of RGB color comparisons may include a grayscale conversion comparison, for example.
- the candidate object 314 and the reference image 350 may be converted to grayscale images. For each pixel, the grayscale value (0-255) of the pixel in the candidate object 314 may be subtracted from the grayscale value of the corresponding pixel in the reference image 350 , and the difference may be squared.
- the sum of the squared values for the pixels may represent one of the difference values 412 generated by the RGB color comparator 404 .
- the types of RGB color comparisons may include a peak color difference comparison.
- each pixel in the candidate object 314 may be compared to the corresponding pixel in the reference image 350 in each color channel (Red, Green, Blue) separately.
- the color channel having the greatest difference between the pixel in the candidate object 314 and the pixel in the reference image 350 may be determined.
- the difference between the pixel in the candidate object 314 and the pixel in the reference image 350 in the determined color channel may be squared to represent a peak value.
- the sum of the peak values may represent one of the difference values 412 generated by the RGB color comparator 404 .
- the types of RGB comparisons may include a sum of squares comparison.
- Each pixel in the candidate object 314 may be compared to the corresponding pixel in the reference image 350 in each color channel (Red, Green, Blue) separately.
- a square of the difference in each channel may be determined.
- One of the difference values 412 generated by the RGB color comparator 404 may be a sum of the squares for each of the channels for all of the pixels.
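- The three RGB comparisons might be sketched as follows with NumPy and OpenCV, assuming the reference image is first resized to the candidate's dimensions so pixels correspond one to one.

```python
import cv2
import numpy as np

def rgb_difference_values(candidate_bgr, reference_bgr):
    """Compute the grayscale conversion, peak color difference, and
    sum of squares comparisons described above."""
    ref_bgr = cv2.resize(reference_bgr,
                         (candidate_bgr.shape[1], candidate_bgr.shape[0]))
    per_channel_sq = (candidate_bgr.astype(np.int64)
                      - ref_bgr.astype(np.int64)) ** 2

    # 1. Grayscale conversion comparison: squared per-pixel difference
    #    of grayscale values (0-255), summed over the image.
    g_cand = cv2.cvtColor(candidate_bgr, cv2.COLOR_BGR2GRAY).astype(np.int64)
    g_ref = cv2.cvtColor(ref_bgr, cv2.COLOR_BGR2GRAY).astype(np.int64)
    grayscale_score = int(((g_cand - g_ref) ** 2).sum())

    # 2. Peak color difference: for each pixel, keep only the squared
    #    difference of the channel that differs most, then sum.
    peak_score = int(per_channel_sq.max(axis=2).sum())

    # 3. Sum of squares: squared difference in every channel, summed.
    sum_sq_score = int(per_channel_sq.sum())

    return grayscale_score, peak_score, sum_sq_score
```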
- the hue comparator 406 may compare the hue map 306 of the candidate object 314 to the hue map 334 of the reference image 350 .
- the hue comparator 406 may compare the candidate object 314 with each reference image 350 in the HSV color space, the HSL color space, the HSI color space and/or any other color space.
- the hue comparator 406 may generate, for each comparison, a respective one of the difference values 412 .
- the hue comparator 406 may compare the hue map 306 of the candidate object 314 to the hue map 334 of the reference image 350 using one or more types of comparisons.
- the comparison or comparisons may include comparisons similar to the RGB color comparisons except that the color channels may be hue, saturation, and value (HSV); hue, saturation, and lightness (HSL); hue, saturation, and intensity (HSI); and/or any other color channels or combinations thereof.
- the key point comparator 408 may compare the key points 308 of the candidate object 314 with the key points 336 of each reference image 350 . For example, descriptors in the key points 308 and 336 may be compared with each other. The key point comparator 408 may generate, for each comparison, a respective one of the difference values 412 .
- the key points 336 may be determined using the FAST (Features from Accelerated Segment Test) feature detecting algorithm or any other feature detecting algorithm, such as difference of Gaussians (DoG).
- the descriptors for each key point may be determined using an ORB (Oriented FAST and Rotated BRIEF) keypoint detector or any other type of detector.
- the descriptors may represent a grid of pixel information surrounding each of the key points 336 , where the grid of pixel information may be configurable.
- a brute force matcher may compare each descriptor for the key points 308 in the candidate object 314 to each descriptor of the key points 336 in the reference image 350 .
- a brute force matcher is a matcher that does not apply a specialized algorithm to speed up the matching process. Alternatively, any other type of matcher may be used.
- the brute force matcher may return a location of a key point in the reference image 350 that best matches each corresponding key point in the candidate object 314 , as well as a corresponding numerical score. The numerical score may be the sum of the differences between the matching key point descriptors.
- the resulting data may be parsed to identify a single best match of each of the key points 308 in the candidate object 314 with a corresponding one of the key points 336 in the reference image 350 . In other words, none of the key points of the candidate object 314 is a best match with multiple key points 336 of the reference image 350 .
- the data may be further parsed to remove matches in which the numerical score of the respective match falls below a threshold score.
- the data may be further parsed to remove matches that fail to meet a Cartesian y-range limit.
- each of the matching descriptors is to include points that match at the same relative y position in the candidate object 314 and the reference image 350 .
- the number of matching key points that meet such criteria may be divided by the number of pixels in the candidate object 314 , resulting in the key point comparator score.
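- A sketch of the key point comparator 408 along these lines, assuming OpenCV's ORB (which detects FAST corners internally) and a brute force matcher; max_distance and y_tolerance below stand in for the adjustable threshold score and y-range limit.

```python
import cv2

def key_point_score(candidate_gray, reference_gray,
                    max_distance=64, y_tolerance=0.1):
    """Match ORB descriptors between candidate and reference, filter by
    match quality and relative y position, and normalize by the number
    of pixels in the candidate object."""
    orb = cv2.ORB_create()
    kp_c, des_c = orb.detectAndCompute(candidate_gray, None)
    kp_r, des_r = orb.detectAndCompute(reference_gray, None)
    if des_c is None or des_r is None:
        return 0.0

    # crossCheck=True keeps only mutual best matches, so no reference
    # key point is the best match of multiple candidate key points.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = matcher.match(des_c, des_r)

    kept = 0
    for m in matches:
        if m.distance > max_distance:  # threshold score filter
            continue
        # Cartesian y-range limit: matched points must sit at a similar
        # relative y position in both images.
        y_c = kp_c[m.queryIdx].pt[1] / candidate_gray.shape[0]
        y_r = kp_r[m.trainIdx].pt[1] / reference_gray.shape[0]
        if abs(y_c - y_r) <= y_tolerance:
            kept += 1

    return kept / candidate_gray.size  # divide by candidate pixel count
```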
- the variables used in this comparator may be adjustable from the GUI 146 generated by the scan engine GUI module 116 .
- FIG. 5 illustrates a flow diagram of an example of part of the logic 500 of the verification module 128 .
- FIG. 5 illustrates a flow diagram of the logic of the scoring module 138 and the logic of the context based verification tests 136 .
- the scoring module 138 may determine ( 502 ) difference ratios 504 based on the difference values 412 and on target difference values 506 .
- Each one of the target difference values 506 may be an expected difference value for a corresponding one of the characteristics 302 , 304 , 306 , 308 , 310 , and/or 312 of any reference image and any candidate image that actually depicts an object of the predetermined object type 124 .
- the expected difference value may be a minimum threshold difference value needed for the candidate object 314 to match the reference image 350 for the corresponding one of the characteristics 302 , 304 , 306 , 308 , 310 , and/or 312 .
- the difference ratio 504 for the respective one of the characteristics, c, may be determined as: (difference value_c - target difference_c) / target difference_c.
- the difference ratio 504 may be determined based on any algorithm in which the greater the negative difference between each of the difference values 412 and the corresponding one of the target difference values 506 , the greater the similarity between the candidate object 314 and the reference image 350 with respect to the corresponding characteristic.
- Conversely, the greater the positive difference between each of the difference values 412 and the corresponding one of the target difference values 506 , the greater the difference between the candidate object 314 and the reference image 350 with respect to the corresponding characteristic.
- the formula for the difference ratio 504 for the respective one of the characteristics, c, may vary depending on whether the difference score is preferably lower than the target difference or preferably greater than the target difference. If the characteristic, c, is desired to be greater than the target difference for a match, then the formula provided above may apply. However, if the characteristic, c, is desired to be lower than the target difference, then the formula (target difference_c - difference value_c) / target difference_c may apply.
- the determination of the difference ratios 504 may standardize each test to a similar range of ratios.
- For example, suppose the target difference value 506 for the histogram 302 characteristic is 10, and a greater value is more desirable than a lesser value (in other words, the larger the difference value, the better the match).
- If the difference value is 15, the difference ratio may be (15 - 10)/10, or 0.5, which is a positive number that positively influences the belief score 510 toward acceptance, particularly after multiplication with a corresponding one of the belief multipliers 512 .
- If the difference value is instead 5, the difference ratio may be (5 - 10)/10, or -0.5, which is a negative number that will negatively influence the belief score 510 , particularly after multiplication with the corresponding one of the belief multipliers 512 .
- If, on the other hand, a lesser value is more desirable, then the second formula applies: the first difference ratio may be (10 - 15)/10, or -0.5, and the second difference ratio may be (10 - 5)/10, or 0.5.
- the signs of the difference ratios are now reversed and have the opposite effect on the belief score 510 .
- the scoring module 138 may determine ( 508 ) a belief score 510 based on the difference ratios 504 and on belief multipliers 512 .
- the belief score 510 may indicate a likelihood or probability that the candidate object 314 matches the reference image 350 .
- the scoring module 138 may determine the belief score 510 based on an algorithm in which the belief score 510 falls into a suitable range.
- the suitable range may be a range in which a belief score of 50 represents a 50 percent chance that candidate object 314 matches the reference image 350 , a belief score of 100 represents an almost 100 percent chance of a match, and a score of 0 (or less) represents an almost zero percent chance of a match.
- Each of the difference ratios 504 may be applied to the belief score 510 .
- the amount of each of the difference ratios 504 that is applied is based on adjustable multipliers that determine an importance of each characteristic for the predetermined object type 124 .
- the adjustable multipliers are the belief multipliers 512 .
- the scoring module 138 may determine ( 508 ) the belief score 510 as a sum of weighted difference ratios (the difference ratios 504 weighted by the belief multipliers 512 ), the sum then multiplied by a scalar, such as 20, and added to a constant, such as 50 percent.
- the belief score 510 may be determined according to the following: belief score 510 = S × ( Σ_{c=1}^{N} M_c × r_c ) + K, where:
- r_c is the difference ratio for a characteristic, c
- N is the number of the characteristics that are applied to the belief score 510
- M_c is the belief multiplier for the characteristic, c
- S is the scalar
- K is the constant.
- the belief score 510 may be determined using other algorithms.
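- For illustration, the difference ratios 504 and the weighted belief score 510 might be computed as follows; this is a sketch in which higher_is_better selects between the two ratio formulas above, and the scalar of 20 and constant of 50 come from the example given earlier.

```python
def difference_ratio(difference_value, target_difference,
                     higher_is_better=True):
    """Standardize one test's difference value against its target."""
    if higher_is_better:
        return (difference_value - target_difference) / target_difference
    return (target_difference - difference_value) / target_difference

def belief_score(ratios, multipliers, scalar=20.0, constant=50.0):
    """belief score = S * sum(M_c * r_c) + K."""
    weighted_sum = sum(m * r for m, r in zip(multipliers, ratios))
    return scalar * weighted_sum + constant

# The histogram example above: target 10, difference value 15.
r = difference_ratio(15, 10)       # 0.5
score = belief_score([r], [1.0])   # 20 * 0.5 + 50 = 60
```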
- the belief multipliers 512 configured for some predetermined object types may differ from the belief multipliers 512 configured for other predetermined object types. For example, a first set of object types may be more accurately matched using the key points 308 characteristic, while a second set of object types may be more accurately matched using the color map 304 characteristic. Accordingly, the belief multiplier for the key points 308 characteristic that is associated with the first set of object types may be higher than the belief multiplier for the key points 308 characteristic that is associated with the second set of object types.
- a positive difference ratio may indicate that the difference value is outside the bound of the target difference, which may negatively affect the belief score 510 .
- a negative difference ratio may indicate that the difference value is inside the bound of the target difference, which may positively affect the belief score 510 .
- the greater the difference ratio, the greater the effect on the belief score 510 .
- the target difference values 506 may be adjustable and tuned by a user with the GUI 146 .
- some object types may require strict target differences for certain characteristics and more lenient target differences for others.
- the belief multipliers 512 may be adjusted and tested from within the GUI 146 for the predetermined object type 124 .
- Additional tests, such as the context based verification tests 136 , may be performed that adjust the belief score 510 .
- the context based verification tests 136 may generate ( 514 ) an adjusted belief score 516 .
- the context based verification tests 136 may include a skin tone test 520 , an image location test 522 , a face location test 524 , an image size test 526 , a face size test 528 , and/or a background color test 530 .
- the context based verification tests 136 may include fewer, additional, or different tests.
- the context information 214 used by the context based verification tests 136 may include any information that may provide context for the candidate objects 132 .
- the context information 214 may include the percentage of skin tones in the candidate object 314 , a location of the candidate object 314 within the source image 204 , a location of the candidate object 314 relative to one or more of the detected faces 212 , the size of the candidate object relative to one or more of the detected faces 212 , the size of the candidate object relative to the size of the source image 204 and/or any other information related to the context of the candidate object 314 , such as text that is associated with the source image 204 , such as a post, or a tag associated with the source image 204 .
- the skin tone test 520 may determine the percentage of the candidate object 314 that has color and/or hue values that are consistent with skin tones. The determined percentage may be compared to a predetermined minimum expected percentage and/or a predetermined maximum expected percentage. The predetermined minimum expected percentage and the predetermined maximum expected percentage may be configurable. The skin tones may be configurable. If the determined percentage is in a range between the predetermined minimum expected percentage and the predetermined maximum expected percentage, then the skin tone test 520 may not modify the belief score 510 , for example. On the other hand, if the determined percentage is less than the predetermined minimum expected percentage or greater than the predetermined maximum expected percentage, then the skin tone test 520 may determine a difference between the determined percentage and the closest of the predetermined minimum expected percentage or the predetermined maximum expected percentage. The difference may be multiplied by an adjustable multiplier to further emphasize the result, on a per candidate object basis.
- the expected percentage range of skin tones for a candidate object 314 of an in-hand object type may be set at 50-80%.
- the predetermined minimum expected percentage is 50%
- the predetermined maximum expected percentage is 80%. If only 10% of the pixels in the candidate object 314 are determined to be skin tones, then the difference in percentage points between 10% and 50% (40%) is multiplied by a skin tone multiplier resulting in a negative value that lowers the belief score 510 .
- if 90% of the pixels in the candidate object 314 are determined to be skin tones, then the difference in percentage points between 90% and 80% (10%) is multiplied by the skin tone multiplier resulting in a negative value that lowers the belief score 510 .
- if the skin percentage of the candidate object 314 falls within the predetermined percentage range, then the belief score 510 may be unaffected by the skin tone test 520 .
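- A sketch of the skin tone test 520 using the 50-80% in-hand range from the example above; the multiplier value is an assumption and would be adjustable per candidate object.

```python
def skin_tone_adjustment(skin_pct, min_pct=50.0, max_pct=80.0,
                         multiplier=-0.5):
    """Return an adjustment to the belief score. Inside the expected
    range there is no effect; outside it, the distance in percentage
    points to the nearest bound is scaled by the (negative) multiplier."""
    if min_pct <= skin_pct <= max_pct:
        return 0.0
    nearest_bound = min_pct if skin_pct < min_pct else max_pct
    return abs(skin_pct - nearest_bound) * multiplier

# 10% skin: 40 points below the minimum -> adjustment of -20.
# 90% skin: 10 points above the maximum -> adjustment of -5.
```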
- the image location test 522 may verify that the location of the candidate object 314 within the source image 204 is within a predefined area.
- the predefined area may be typical for an object of the predetermined object type 124 .
- beer cans often appear near the center to bottom half of an image, because the beer cans are most often on a table or are being held by a person below eye level.
- the center of the source image 204 may be a baseline. As the distance between the candidate object 314 and the baseline increases, the belief score 510 may decrease.
- the image location test 522 may reduce the belief score 510 by a multiplicative product of an adjustable belief multiplier and the distance that the candidate object 314 is from the baseline.
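- A sketch of the image location test 522 with the vertical center of the source image as the baseline; the multiplier value is an assumption.

```python
def image_location_adjustment(candidate_center_y, image_height,
                              multiplier=-0.05):
    """Reduce the belief score by the product of an adjustable
    multiplier and the candidate's distance from the baseline."""
    baseline_y = image_height / 2.0
    distance = abs(candidate_center_y - baseline_y)
    return distance * multiplier  # negative adjustment, grows with distance
```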
- the face location test 524 may verify that the location of the candidate object 314 relative to one or more of the detected faces 212 is appropriate for the predetermined object type. In one such example, many types of objects should not overlap any of the detected faces 212 . A beer can, for example, is relatively unlikely to overlap a face in a picture. Accordingly, if the candidate object 314 is potentially a beer can and yet the candidate object 314 overlaps any of the detected faces 212 , then the face location test 524 may decrease the belief score 510 by a predetermined amount.
- the image size test 526 may verify that the size of the candidate object 314 relative to the size of the source image 204 is within a predetermined range.
- the predetermined range may be a range that is typical for an object of the predetermined object type 124 .
- the relative size of a beer can, for example, may typically be less than thirty percent of the source image 204 and more than five percent of the source image 204 .
- the candidate objects 132 that do not fall within the predetermined size range may be eliminated from consideration early in the verification process in order to reduce computational time.
- the face size test 528 may verify that the size of the candidate object 314 relative to the size of the detected faces 212 in the source image 204 is within a predetermined range.
- the predetermined range may be typical for objects of the predetermined object type 124 . For example, a beer can in an image is unlikely to be twice the size of a human head or a tenth the size of a human head.
- the candidate objects 132 that fall outside established (and adjustable) ranges compared to the average face size in the source image 204 may be eliminated from further consideration.
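- The face location test 524 and the face size test might be sketched as follows; the rectangle overlap test, the penalty, and the 0.1-2.0 size ratios (echoing the tenth-of-a-head to twice-a-head example) are assumptions.

```python
def overlaps(box_a, box_b):
    """Axis-aligned overlap test for (x, y, w, h) bounding boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

def face_location_adjustment(candidate_box, face_boxes, penalty=-25.0):
    """Decrease the belief score by a predetermined amount when the
    candidate overlaps any detected face (for object types, such as a
    beer can, that are unlikely to overlap a face)."""
    if any(overlaps(candidate_box, f) for f in face_boxes):
        return penalty
    return 0.0

def passes_face_size_test(candidate_box, face_boxes,
                          min_ratio=0.1, max_ratio=2.0):
    """Eliminate candidates whose area falls outside an established
    (and adjustable) range relative to the average detected face size."""
    if not face_boxes:
        return True  # no faces detected: nothing to compare against
    avg_face_area = sum(w * h for _, _, w, h in face_boxes) / len(face_boxes)
    _, _, cw, ch = candidate_box
    return min_ratio <= (cw * ch) / avg_face_area <= max_ratio
```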
- the background color test 530 may compare the average color of the candidate object 314 with background colors of the source image 204 . For example, objects that may be transparent may more closely match the background colors of the source image 204 than translucent objects. The background color test 530 may verify that the average color of the candidate object 314 matches the background colors of the source image 204 to a degree that is typical for objects of the predetermined object type 124 . For example, the candidate object 314 for the predetermined object type, "plastic cup," may be part of a larger background object, such as a red fire engine. The average color (in any color space) of the candidate object 314 may be determined.
- the background color test 530 may determine a percentage of the entire source image 204 that contains the average color of the candidate object 314 and/or similar color values within an adjustable range. The percentage of the source image 204 that the candidate object 314 occupies may be compared to the percentage of the entire source image 204 that contains the range of similar color values. If the source image 204 contains a high percentage of a similar color, a similarly colored background object (such as a red fire truck) may be present in the source image 204 . The presence of the background object that is similar in color to the candidate object 314 may indicate a lower likelihood that the candidate object 314 is of the predetermined object type 124 . The lower likelihood is due to the candidate object 314 being more likely to be a section of the background object.
- the background color test 530 may reduce the belief score 510 if the source image 204 contains a high percentage of a color similar to the color of the candidate object 314 . Alternatively, if the source image 204 contains a low percentage of a color similar to the color of the candidate object 314 , then the background color test 530 may not modify the belief score 510 .
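- A simplified sketch of the background color test 530, assuming OpenCV; the tolerance, threshold, and penalty values are assumptions, and this version only checks the fraction of similarly colored pixels in the source image rather than weighing it against the candidate's own coverage.

```python
import cv2
import numpy as np

def background_color_adjustment(candidate_bgr, source_bgr,
                                tolerance=20, threshold=0.3,
                                penalty=-15.0):
    """Reduce the belief score when a high percentage of the source
    image contains colors near the candidate's average color,
    suggesting the candidate is a section of a background object."""
    avg = candidate_bgr.reshape(-1, 3).mean(axis=0)
    lower = np.clip(avg - tolerance, 0, 255).astype(np.uint8)
    upper = np.clip(avg + tolerance, 0, 255).astype(np.uint8)
    mask = cv2.inRange(source_bgr, lower, upper)
    similar_fraction = cv2.countNonZero(mask) / mask.size
    return penalty if similar_fraction > threshold else 0.0
```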
- the context information 214 may include information about the faces 212 detected by the object detection module 126 .
- the verification module 128 may further limit the information about the detected faces 212 to information about faces that are also verified by the verification module 128 .
- the verification module 128 may verify the detected faces 212 by performing the reference image based verification tests 134 or any other type of test, such as a biometric test.
- the detected faces 212 may be limited to the faces that meet or exceed a predetermined belief level, such as a fifty percent likelihood that the detected face 212 is actually a face.
- the context information 214 may include metadata, such as geo-location data, associated with the source image 204 .
- a camera, or a device that includes the camera, that captured the source image 204 may tag the source image 204 with geo-location data indicating a physical location where the source image 204 was taken.
- the scan engine 114 may extract the geo-location data and determine a likelihood that an object of the predetermined object type 124 was at the physical location where the source image 204 was captured.
- the context based verification tests 136 may adjust the belief score 510 according to the likelihood that an object of the predetermined object type 124 was at the physical location where the source image 204 was captured. For example, the belief score 510 may be increased if the predetermined object type 124 is a beer bottle and the physical location is determined to be a bar.
- the context information 214 may include a capture date.
- the capture date may indicate a date on which the source image 204 was taken.
- the date may include a time of day.
- the date may include only a time of day in some examples.
- the capture date may be extracted from the metadata associated with source image 204 .
- the metadata may be added by the camera or any other device.
- the metadata may be a date on which the source image 204 was posted in the social networking service 102 .
- the context based verification tests 136 may adjust the belief score 510 according to the likelihood that an object of the predetermined object type 124 is present on the capture date. For example, if the predetermined object type 124 is a Christmas tree, then the candidate objects 132 are more likely to be a Christmas tree if the capture date of the source image 204 is on Christmas, or within a date range that includes Christmas. As a result, the context based verification tests 136 may increase the belief scores of the candidate objects 132 when searching for a Christmas tree and the capture date of the source image 204 is on Christmas or within a date range that includes Christmas.
- the context information 214 may include information about one or more images associated with the source image 204 .
- the images associated with the source image 204 may be images captured within a predetermined time of the source image 204 .
- the images associated with the source image 204 may be images included in one photo album in the social networking service 102 .
- the inclusion of the source image 204 in a photo album that also includes an image depicting one or more objects associated with the predetermined object type 124 may increase the likelihood that the candidate objects 132 are objects of the predetermined object type 124 .
- the images associated with the source image 204 may be images having a capture date within a predetermined amount of time of the capture date of the source image 204 .
- the context based verification tests 136 may adjust the belief score 510 based on an amount of time between the capture date of the source image 204 and the capture date of an image that includes an object of the predetermined object type 124 or information associated with the predetermined object type 124 .
- the scan engine 114 detects an object of the predetermined object type 124 , such as a basketball, in an associated image with a relatively high belief score.
- the image was captured within close time proximity to (or within a predetermined amount of time of) the source image 204 .
- the associated image may be associated with the source image 204 by being in same photo album as the source image 204 .
- the context based verification tests 136 may increase the belief scores for the candidate objects 132 in the source image 204 when the scan engine searches the source image 204 for the predetermined object type 124 .
- the context information 214 may include an identity of one or more people depicted in the source image 204 and/or personally identifiable information of the people depicted in the source image 204 .
- the scan engine 114 may search for the predetermined object type 124 , such as a hand bag, in the source image 204 that depicts or is otherwise associated with individual A.
- Individual A may be associated with the source image 204 through a social tag and/or by facial recognition processing of the source image 204 .
- a database may store an indication that objects of the predetermined object type 124 have been detected in images associated with or depicting individual A. Alternatively or in addition, the database may indicate that individual A is otherwise associated with one or more suppliers of handbags.
- individual A may follow a handbag supplier on TWITTER®, be employed by the handbag supplier according to a social networking site such as LinkedIn, or have “liked” the handbag supplier's FACEBOOK® page (TWITTER is a registered mark of Twitter, Inc. of San Francisco, Calif.).
- the context based verification tests 136 may search the database for associations between the predetermined object type 124 and any individuals depicted in or otherwise associated with the source image 204 .
- the context based verification tests 136 may increase the belief scores of the candidate objects 132 when associations are found in the database.
- the context information 214 may include text-based social data associated with the source image 204 .
- the text-based social data associated with the source image 204 may be any text associated with the source image 204 in the social networking service 102 .
- Examples of the text-based social data may include album titles, photo captions, and/or comments.
- the predetermined object type 124 may be a dog and the source image 204 may be a photo pulled from the social networking service 102 . Someone may have commented on the photo with the words “cute dog.”
- the source image 204 may be an album cover for an album entitled “puppy play-date.”
- the text-based social data may be “cute dog” and “puppy play-date,” respectively.
- when the text-based social data includes words associated with the predetermined object type 124 , such as "dog" or "puppy" when searching for a dog, the context based verification tests 136 may increase the belief scores of the candidate objects 132 .
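- A keyword test over the text-based social data might look like the sketch below. It is an illustration only; the keyword list and boost factor are invented for the example rather than taken from this disclosure.

```python
def adjust_belief_for_social_text(belief_score, texts, keywords, boost=1.2):
    """Boost a belief score when album titles, photo captions, or comments
    mention terms associated with the predetermined object type."""
    lowered = " ".join(texts).lower()
    if any(keyword in lowered for keyword in keywords):
        return belief_score * boost
    return belief_score

# "cute dog" in a comment supports a candidate dog object.
score = adjust_belief_for_social_text(
    0.55, ["puppy play-date", "cute dog"], ["dog", "puppy"])
```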
- the context information 214 may include the weather on the day the source image 204 is captured.
- the context based verification tests 136 may extract the capture date and the physical location of the source image 204 from the metadata of the source image 204 or other source.
- the context based verification tests 136 may identify the weather on the capture date at the physical location from a database of known weather conditions.
- the context based verification tests 136 may adjust the belief scores of the candidate objects 132 based on a likelihood of the predetermined object type 124 being depicted in a photo on the capture date at the physical location.
- the predetermined object type 124 may be an umbrella.
- the metadata of the source image 204 may indicate that the source image 204 was captured on Apr. 14, 1991 in Arlington, Va.
- the context based verification tests 136 may determine whether it was raining on the capture date in the capture location from the database of known weather conditions. The context based verification tests 136 may increase the belief scores of the candidate objects 132 if the database indicates that it rained on Apr. 14, 1991 in Arlington, Va.
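- As a hedged sketch of the weather test, with a small dictionary standing in for the database of known weather conditions (a real system would query a weather archive or service):

```python
from datetime import date

# Stand-in for the database of known weather conditions,
# keyed by (capture date, location).
WEATHER_DB = {(date(1991, 4, 14), "Arlington, VA"): "rain"}

def adjust_belief_for_weather(belief_score, capture_date, location,
                              favorable_condition, boost=1.3):
    """Boost a belief score when the recorded weather makes the object type
    more likely, such as an umbrella on a rainy day. Values are illustrative."""
    if WEATHER_DB.get((capture_date, location)) == favorable_condition:
        return belief_score * boost
    return belief_score

score = adjust_belief_for_weather(0.5, date(1991, 4, 14), "Arlington, VA", "rain")
```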
- the belief score 510 and/or the adjusted belief score 516 is generated ( 508 and/or 514 ) for each candidate object and corresponding reference image. In other words, when multiple reference images 140 are compared with each candidate object, multiple belief scores and/or adjusted belief scores may be generated for each candidate object.
- the belief score 510 , the adjusted belief score 516 , the highest of the belief scores, and/or the highest of the adjusted belief scores may be compared to a predetermined threshold.
- the predetermined threshold may represent a threshold belief score at which the candidate object 314 is considered an object of the predetermined object type 124 .
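- Reading the preceding paragraphs together, one plausible implementation keeps a candidate when its best score over all reference images meets the threshold. The data shapes below are assumptions for illustration, not structures from this disclosure.

```python
def verify_candidates(scores_per_candidate, belief_threshold):
    """Keep each candidate whose highest belief score (or adjusted belief
    score) over all reference images meets the predetermined threshold.
    `scores_per_candidate` maps a candidate id to its per-reference scores."""
    detected = {}
    for candidate_id, scores in scores_per_candidate.items():
        best = max(scores)
        if best >= belief_threshold:
            detected[candidate_id] = best
    return detected

# Only the first candidate clears a threshold of 0.75.
detected = verify_candidates({"cup_0": [0.42, 0.81], "cup_1": [0.30, 0.35]}, 0.75)
```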
- the location of the candidate object 314 may be stored in the memory 112 .
- the highest of the belief scores and/or the highest of the adjusted belief scores for each candidate object may be stored in the memory 112 .
- the size, the type of object, and the reference image that compared most similarly with each candidate object may be stored in the memory 112 .
- the stored information such as the belief score 510 or the adjusted belief score 516 may be presented to a user in the GUI 146 as a number, a percentage, or in word format.
- the word format may be a word, symbol, or phrase that represents a level of confidence that the candidate object is, indeed, an object of the predetermined object type 124 .
- additional determinations may be made about the candidate object 314 . For example, a brand of a beverage or type of bottle may be determined for bottle objects. The additional determinations made based on the knowledge of the best matched reference object may be useful to advertisers or other parties.
- FIG. 6 illustrates an example 600 of the graphical user interface (GUI) 146 for building cascade classifiers used by the object detection module 126 .
- a user may create any number of cascade classifiers for any object using the GUI 600 .
- the GUI 600 may include, for example, an options section 602 , a positive image section 604 , and a negative image section 606 .
- the options section 602 may include options that determine the behavior of the cascade classifier as a whole.
- the options section 602 may display, and facilitate adjustment of, a type of cascade classifier (such as Haar, HOG, or LBP), the width and height of template images, the number of stages in the cascade classifier, and a maximum allowable number of false alarms.
- the positive image section 604 may display, and facilitate adjustment of, a positive image collection.
- the positive image collection is a collection of example images of the predetermined object type 124 that the cascade classifier 130 is to positively identify when applied to any source image.
- the negative image section 606 may display, and facilitate adjustment of, a negative image collection.
- the negative image collection is a collection of example images that do not depict objects of the predetermined object type 124 .
- the graphical user interface 600 may provide for simple and efficient creation of cascade classifiers from scratch.
- the custom creation of an XML cascade may comprise preparing a set of positive images that embody the predetermined object type 124 , and a set of negative images that do not contain the predetermined object type 124 .
- the number of steps 608 in the cascade process and a false alarm rate 610 of the cascade process may be adjusted in order to alter the sensitivity of the cascade.
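- Outside the GUI, comparable knobs exist in OpenCV's stock cascade-training tool, which can produce an XML cascade from the positive and negative collections. The sketch below drives opencv_traincascade (shipped with OpenCV 3.x and assumed to be installed and on the PATH) from Python; every path and count is a placeholder.

```python
import subprocess

# positives.vec would be built from the positive image collection (for
# example, with opencv_createsamples); negatives.txt lists the negative
# image files; cascade_dir receives the trained XML cascade.
subprocess.run([
    "opencv_traincascade",
    "-data", "cascade_dir",
    "-vec", "positives.vec",
    "-bg", "negatives.txt",
    "-numPos", "900", "-numNeg", "1800",
    "-numStages", "12",            # number of steps in the cascade
    "-maxFalseAlarmRate", "0.5",   # per-stage false alarm rate
    "-featureType", "LBP",
    "-w", "24", "-h", "24",        # template width and height
], check=True)
```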
- the GUI 600 may facilitate creating or modifying the cascade classifier 130 for any object type simply and quickly.
- the ability of the GUI 600 to create an XML cascade (or any other type of cascade classifier) for any type of object may eliminate a reliance on available cascades that have a limited detection scope.
- the graphical user interface 600 may facilitate creation of cascade classifiers that are overly sensitive to positive matches, unlike many cascades available for download.
- the cascade classifiers may be overly sensitive to positive matches, and hence detect more false positives, because the verification module 128 may eliminate the false positives from the final set of the detected objects 122 .
- FIG. 7 illustrates an example 700 of the graphical user interface (GUI) 146 for testing and adjusting parameters of the object detection module 126 and the verification module 128 .
- the GUI 700 may include, for example, a parameter section 702 , a feedback section 704 , and an information panel 706 .
- the parameter section 702 may display, and facilitate adjustment of, the parameters 708 of the object detection module 126 .
- the parameter section 702 may display, and facilitate adjustment of, parameters 710 of the verification module 128 .
- the parameters 710 of the verification module 128 may include the target difference values 506 used in the determination of the difference ratios 504 and the belief multipliers 512 used to adjust the impact of each characteristic on the belief score 510 .
- Additional parameters may be available for display and adjustment in the parameter section 702 , such as configuration of skin tones, key point and descriptor parameters, background matching, and the belief threshold to pass the final result to the end user interface.
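- One plausible way to carry such tunables between the GUI 146 and the scan engine 114 is a single parameter record. The fields below are drawn from this description, but the structure itself, the defaults, and the HSV skin-tone bounds are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class VerificationParameters:
    """Illustrative bundle of the tunables named above."""
    target_difference_values: dict = field(default_factory=dict)  # per characteristic
    belief_multipliers: dict = field(default_factory=dict)        # per characteristic
    belief_threshold: float = 0.75              # gate for the end user result
    skin_tone_hsv_lower: tuple = (0, 48, 80)    # assumed skin-tone bounds
    skin_tone_hsv_upper: tuple = (20, 255, 255)

params = VerificationParameters(
    target_difference_values={"histogram": 0.35},
    belief_multipliers={"histogram": 1.0})
```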
- the feedback section 704 may provide a testing feedback mechanism.
- a test source image 712 may be loaded into the feedback section 704 .
- the types of objects 714 to search for may be selected.
- the scan engine 114 may execute the object detection module 126 and the verification module 128 using the parameters set in the parameter section 702 .
- the test source image 712 may be displayed along with graphical information reflecting results of the execution of the scan engine 114 .
- the graphical information may provide insight into intermediate results obtained during the execution of the scan engine 114 for a single selected object type.
- the example illustrated in FIG. 7 is a search for plastic cups.
- the faces 212 detected by the object detection module 126 may be displayed as squares or rectangles surrounding the positively-identified faces. If a face was not properly detected in the test source image 712 , then the user may adjust the cascade classifier for faces, and re-run the test.
- Another example of the graphical information may be identification 716 of the candidate objects 132 detected in the test source image 712 by the cascade classifier 130 for the predetermined object type 124 but that are not verified by the verification module 128 .
- the unverified candidate objects 716 may have belief scores and/or adjusted belief scores that are below the belief threshold 718 .
- the candidate objects 132 in the test source image 712 that are not verified may be identified by enclosing rectangles 716 , which correspond to locations and sizes of areas detected as matching the cascade parameters.
- Yet another example of the graphical information may be identification of the detected objects 122 , which are the candidate objects 132 that are verified by the verification module 128 .
- the detected objects 122 may be identified by rectangles in the test source image 712 that represent locations and sizes of areas enclosing the detected objects 122 . If an object of the predetermined object type 124 was not properly detected in the test source image 712 , then the user may adjust any of the parameters 708 and 710 , and re-run the test to determine whether the adjustments improved the accuracy in recognizing the detected objects 122 .
- the information panel 706 may provide additional feedback information.
- the information panel 706 may display any textual output of the scan engine 114 for analysis, along with final results.
- Each of the rectangles in the test source image 712 may be numbered.
- the information panel 706 may display information related to the objects in the rectangles.
- the information panel 706 may display the location, the size, the difference values 412 , the difference ratios 504 , the belief score 510 , and/or the adjusted belief score 516 for each of the candidate objects 132 next to a number of the corresponding candidate object.
- the information panel 706 may display the characteristics of the candidate objects 132 and/or the reference images 140 .
- the final results may include, for example, the location, size, the object type, and the belief score of each of the detected objects 122 .
- the ability to adjust the parameters 708 and 710 and/or other aspects of the system 100 from within the graphical user interface 700 , and rapidly test and evaluate the adjustments, provides for dynamic and efficient tuning of the object recognition process.
- a user without extensive experience in object recognition technologies may test, evaluate, and improve the object recognition process for a large number of object types.
- FIG. 8 illustrates an example 800 of the graphical user interface (GUI) 146 for testing and adjusting the parameters 708 and 710 in a search for multiple object types 714 in a single test source image 802 .
- rectangles may overlay the verified and unverified candidate objects 132 in the feedback section 704 to represent the locations and sizes of the candidate objects 132 found by the object detection module 126 , as well as the detected objects 122 , which are the candidate objects 132 that are verified by the verification module 128 .
- yellow and purple rectangles may indicate objects detected but not verified, while white, light blue, green, and blue rectangles may indicate objects that were detected and verified by meeting the belief threshold for the respective object types. Each color may correspond to one of the object types.
- FIG. 9 illustrates an example 900 of the graphical user interface (GUI) 146 for presenting images 902 and text that are available in the social networking service 102 and in which objectionable material is detected.
- the images 902 may be organized from greatest threat level (highest belief score) to lowest threat level that exceeds the belief threshold 718 used by the scan engine 114 .
- the predetermined object types that the scan engine 114 searches the source images for may be a set of object types that are identified as objectionable.
- the object recognition device 104 may obtain the source images by searching the social networking service 102 for images that are to be scanned by the scan engine 114 .
- FIG. 10 illustrates an example 1000 of the graphical user interface (GUI) 146 for a user to provide feedback that the object recognition device 104 may use to improve the accuracy of object recognition.
- the GUI 1000 may display the source image 204 .
- the source image 204 may be selected by a user from the GUI illustrated in FIG. 9 or selected in any other manner.
- the source image 204 is scanned by the scan engine 114 for plastic cups and for any objects found to be “in-hand.”
- Objects that are “in-hand” may be objects held in a hand, or in some examples, held in a hand in a suspicious manner.
- the detected objects 122 may be identified in the source image 204 with a rectangle.
- the user may select any of the detected objects 122 for further information about the selected object.
- the GUI 1000 may display the belief score or a threat risk in easy to understand terms, such as “highly likely”, “100.00% confidence” or “minimal threat.”
- the user may also provide feedback, which may be used to help improve the accuracy of the process during future testing and adjustment.
- the GUI 1000 may display a collection of predetermined object types 1010 that the scan engine 114 searched the source image 204 for. The user may select any of the predetermined object types 1010 that are depicted 1020 in the source image 204 but that were not identified as being one of the detected objects 122 .
- the system 100 may be implemented with additional, different, or fewer components.
- the system 100 may include only the object recognition device 104 .
- the object recognition device 104 may not include the context based verification tests 136 .
- the logic flows illustrated in FIGS. 2-5 may include additional, different, or fewer operations than illustrated. The operations may be executed in a different order than illustrated.
- Each component may include additional, different, or fewer components.
- each of the client devices 106 may include a copy of all or a portion of the object recognition device 104 .
- the reference image based verification tests 134 may include the scoring module 138 or a portion thereof.
- the verification module 128 may not include the context based verification tests 136 .
- the GUI 146 generated on any of the client devices 106 may include only the admin GUI 148 , only the end user GUI 150 , or both the admin GUI 148 and the end user GUI 150 .
- Each module such as the scan engine 114 , the object detection module 126 , the verification module 128 , the reference image based verification tests 134 , the context based verification tests 136 , the scoring module 138 , the scan engine GUI module 116 , and/or the object detection service GUI module 118 , may be hardware or a combination of hardware and software.
- each module may include an application specific integrated circuit (ASIC), a Field Programmable Gate Array (FPGA), a circuit, a digital logic circuit, an analog circuit, a combination of discrete circuits, gates, or any other type of hardware or combination thereof.
- each module may include memory hardware, such as a portion of the memory 112 , for example, that comprises instructions executable with the processor 110 or other processor to implement one or more of the features of the module.
- the module may or may not include the processor.
- each module may just be the portion of the memory 112 or other physical memory that comprises instructions executable with the processor 110 or other processor to implement the features of the corresponding module without the module including any other hardware.
- Because each module includes at least some hardware, even when the included hardware comprises software, each module may be interchangeably referred to as a hardware module, such as the object detection hardware module 126 , the verification hardware module 128 , the reference image based verification tests hardware module 134 , the context based verification tests hardware module 136 , the scoring hardware module 138 , the scan engine GUI hardware module 116 , and/or the object detection service GUI hardware module 118 .
- the context based verification tests 136 adjust the belief score 510 determined from the difference ratios 504 and the belief multipliers 512 .
- the context based verification tests 136 may also generate difference ratios that are multiplied by corresponding belief multipliers in the determination of the belief score 510 .
- the difference ratios for the context based verification tests 136 may represent a difference between the candidate object 314 and corresponding characteristics of the predetermined object type.
- the processor 110 may be in communication with the memory 112 .
- the processor 110 may also be in communication with additional elements, such as a network interface and/or a display device.
- Examples of the processor 110 may include a general processor, central processing unit, a controller, an application specific integrated circuit (ASIC), a digital signal processor, a field programmable gate array (FPGA), a digital circuit, and/or an analog circuit.
- the processor 110 may be one or more devices operable to execute logic.
- the logic may include computer executable instructions or computer code embodied in the memory 112 or in other memory that when executed by the processor 110 , cause the processor 110 to perform the features of the object recognition device 104 .
- the computer code may include instructions executable with the processor 110 .
- All or part of the system 100 and its logic and data structures may be implemented in a computer readable storage medium (for example, as logic implemented as computer executable instructions or as data structures in the memory 112 ). All or part of the system and its logic and data structures may be stored on, distributed across, or read from one or more types of computer readable storage media. Examples of the computer readable storage medium may include a hard disk, a floppy disk, a CD-ROM, a flash drive, a cache, volatile memory, non-volatile memory, RAM, flash memory, or any other type of computer readable storage medium or storage media.
- the computer readable storage medium may include any type of non-transitory computer readable medium, such as a CD-ROM, a volatile memory, a non-volatile memory, ROM, RAM, or any other suitable storage device. However, the computer readable storage medium is not a transitory transmission medium for propagating signals.
- the processing capability of the system 100 may be distributed among multiple entities, such as among multiple processors and memories, optionally including multiple distributed processing systems.
- Parameters, databases, and other data structures may be separately stored and managed, may be incorporated into a single memory or database, may be logically and physically organized in many different ways, and may be implemented with different types of data structures such as linked lists, hash tables, or implicit storage mechanisms.
- Logic such as programs or circuitry, may be combined or split among multiple programs, distributed across several memories and processors, and may be implemented in a library, such as a shared library (for example, a dynamic link library (DLL)).
- the respective logic, software or instructions for implementing the processes, methods and/or techniques discussed above may be provided on computer readable storage media.
- the functions, acts or tasks illustrated in the figures or described herein may be executed in response to one or more sets of logic or instructions stored in or on computer readable media.
- the functions, acts or tasks are independent of the particular type of instruction set, storage media, processor or processing strategy and may be performed by software, hardware, integrated circuits, firmware, micro code and the like, operating alone or in combination.
- processing strategies may include multiprocessing, multitasking, parallel processing and the like.
- the instructions are stored on a removable media device for reading by local or remote systems.
- the logic or instructions are stored in a remote location for transfer through a computer network or over telephone lines.
- the logic or instructions are stored within a given computer, central processing unit (“CPU”), graphics processing unit (“GPU”), or system.
- a processor may be implemented as a microprocessor, microcontroller, application specific integrated circuit (ASIC), discrete logic, or a combination of other type of circuits or logic.
- memories may be DRAM, SRAM, Flash or any other type of memory.
- Flags, data, databases, tables, entities, and other data structures may be separately stored and managed, may be incorporated into a single memory or database, may be distributed, or may be logically and physically organized in many different ways.
- the components may operate independently or be part of a same program or apparatus.
- the components may be resident on separate hardware, such as separate removable circuit boards, or share common hardware, such as a same memory and processor for implementing instructions from the memory.
- Programs may be parts of a single program, separate programs, or distributed across several memories and processors.
- the phrases “at least one of ⁇ A>, ⁇ B>, . . . and ⁇ N>” or “at least one of ⁇ A>, ⁇ B>, . . . ⁇ N>, or combinations thereof” or “ ⁇ A>, ⁇ B>, . . . and/or ⁇ N>” are defined by the Applicant in the broadest sense, superseding any other implied definitions hereinbefore or hereinafter unless expressly asserted by the Applicant to the contrary, to mean one or more elements selected from the group comprising A, B, . . . and N.
- the phrases mean any combination of one or more of the elements A, B, . . . or N including any one element alone or the one element in combination with one or more of the other elements which may also include, in combination, additional elements not listed.
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Image Analysis (AREA)
Abstract
Description
- 1. Technical Field.
- This application relates to computer vision and, in particular, to object recognition or detection.
- 2. Related Art.
- Social network use has expanded dramatically in recent years, with social networking services such as Facebook® (a registered trademark of Facebook, Inc. of Menlo Park, Calif.) boasting more than a billion users. Social networking services facilitate users posting text and images that may be viewed by others. Posted text and images may remain available for viewing and are often not removed. Accordingly, the amount of posted text may grow over time, and the number of posted images may increase over time.
- An object recognition system may be provided that includes an object detection module, multiple verification tests, a scoring module, and a verification module. The object detection module may apply a cascade classifier to a source image, which results in identification of candidate objects for a predetermined object type. Each of the verification tests may generate difference values for a candidate object identified by the object detection module and a corresponding reference image, where the corresponding reference image depicts an object of the predetermined object type, and where each one of the difference values represents an indication of a difference between a characteristic of the candidate object and a characteristic of the corresponding reference image. The scoring module may determine, for each of the candidate objects, a belief score for the candidate object based on the difference values for the candidate object. The belief score may indicate a likelihood that the candidate object is of the predetermined object type. The verification module may identify a set of detected objects based on the candidate objects and the belief scores for the candidate objects.
- A computer readable storage medium may be provided that includes computer executable instructions. When executed, source images that are shared in a social networking service may be identified. A candidate object may be detected in any of the source images by applying a cascade classifier in search of an object of a predetermined object type. Difference values may be generated based on comparisons of characteristics of the candidate object with corresponding characteristics of a reference image. Each one of the difference values may indicate a difference between a respective one of the characteristics of the candidate object and a corresponding respective one of the characteristics of the reference image. A belief score may be generated for the candidate object based on differences between the difference values and target difference values. The belief score may indicate the likelihood that the candidate object is an object of the predetermined object type. Any of the source images that includes the candidate object may be identified as including the predetermined object type when the belief score exceeds a threshold belief score.
- A method is provided to recognize objects in an image. A source image may be searched for any candidate objects of a predetermined object type by applying a cascade classifier associated with the predetermined object type to the source image. Scores, such as difference values, for a candidate object may be determined from a plurality of verification tests applied to the candidate object. Each one of the scores may be determined from a corresponding one of the verification tests. Each one of the scores may represent an indication of a difference between the candidate object and a set of reference images for the predetermined object type. A belief score may be determined for the candidate object from the scores for the candidate object. The belief score may indicate the likelihood that the candidate object is of the predetermined object type. The candidate object may be identified as a detected object of the predetermined object type when the belief score relative to a threshold belief score indicates the candidate object is of the predetermined object type.
- The embodiments may be better understood with reference to the following drawings and description. The components in the figures are not necessarily to scale. Moreover, in the figures, like-referenced numerals designate corresponding parts throughout the different views.
- FIG. 1 illustrates an object recognition system;
- FIG. 2 illustrates the logic flow of an object detection module;
- FIG. 3 illustrates a first part of the logic flow of a verification module;
- FIG. 4 illustrates a second part of the logic flow of a verification module;
- FIG. 5 illustrates a third part of the logic flow of a verification module;
- FIG. 6 illustrates a graphical user interface for building cascade classifiers;
- FIG. 7 illustrates a graphical user interface for testing and adjusting parameters of an object detection module and a verification module;
- FIG. 8 illustrates a graphical user interface for testing and adjusting parameters of an object detection module and a verification module in a search for multiple object types;
- FIG. 9 illustrates a graphical user interface for presenting images and text available in the social networking service in which objectionable material is detected; and
- FIG. 10 illustrates an example of a graphical user interface for providing feedback to improve the accuracy of object recognition.
- In one example, source images that are shared in a social networking service may be identified. For example, any images of a person that are publicly available may be identified. To search the source images for a predetermined object type, a cascade classifier associated with the predetermined object type may be applied to each of the source images. The predetermined object type may be a beer can, a beer bottle, or any other type of object. One or more candidate objects may be identified by applying the cascade classifier.
- However, the candidate object may not be an object of the predetermined type. Verification tests may verify whether the candidate object is such an object. Difference values may be generated based on comparisons of characteristics of the candidate object with corresponding characteristics of a reference image. The reference image may be an image known to depict an object of the predetermined object type. Each one of the difference values may indicate a difference between a respective one of the characteristics of the candidate object and a corresponding respective one of the characteristics of the reference image. A belief score may be determined for the candidate object based on differences between the difference values and target difference values. Each one of the target difference values may be an expected difference value for a corresponding one of the characteristics of any reference image and any candidate image that actually depicts an object of the predetermined object type. The belief score may indicate the likelihood that the candidate object is an object of the predetermined object type. The source image that includes the candidate object may be identified as including the predetermined object type when the belief score exceeds a threshold belief score.
- FIG. 1 illustrates an object recognition system 100. The object recognition system 100 may recognize or detect objects in any context. For example, the object recognition system 100 illustrated in FIG. 1 recognizes objects in the context of a social networking service 102. In alternative examples, the system 100 may recognize objects in a surveillance system, in a robotics system, or in any other context in which object recognition functionality may be desirable.
- The system 100 may include an object recognition device 104 and one or more client devices 106. The object recognition device 104 may be in communication with the social networking service 102 and the client devices 106 over a network 108.
- The object recognition device 104 may be included in any type of device. For example, the object recognition device 104 may be included in a computer, a server, a smart phone, a smart device, a mobile phone, a robot, an appliance, a circuit, and/or an integrated circuit chip. In one example, the object recognition device 104 may be included in a server or servers that host the social networking service 102.
- The social networking service 102 may be a service through which people may build social networks or social relations among each other. The people in a social network may share, for example, interests, activities, backgrounds, and/or connections in real life. In particular, the social networking service 102 may facilitate uploading images that others may view. Examples of the social networking service 102 may include FACEBOOK®, INSTAGRAM® (INSTAGRAM is a registered trademark of Instagram, LLC of Menlo Park, Calif.), and/or any other social networking service.
- Each of the client devices 106 may be any computing device. Examples of the client devices 106 may include a computer, a laptop, a tablet, a mobile phone, a smart phone, an appliance, or any other type of computing device. The client devices 106 may be referred to as clients of the object recognition device 104 because the client devices 106 may use services provided by the object recognition device 104.
- The network 108 may be any collection of transmission links over which data may be exchanged between computing nodes. For example, the network 108 may include a local area network (LAN), a wired network, a wireless network, a wireless local area network (WLAN), a WI-FI® network (WI-FI is a registered trademark of Wireless Ethernet Compatibility Alliance, Inc. of Austin, Tex.), a personal area network (PAN), a wide area network (WAN), the Internet, an Internet Protocol (IP) network, and/or any other communications network.
- In FIG. 1, the object recognition device 104 is physically distinct from the social networking service 102 and the client devices 106. Alternatively or in addition, the object recognition device 104 may be included in the social networking service 102 and/or in one or more servers that host the social networking service 102. Alternatively or in addition, the object recognition device 104 may be included in one or more of the client devices 106.
- The object recognition device 104 may include a processor 110 and a memory 112. The memory 112 may include a scan engine 114, a scan engine GUI (Graphical User Interface) module 116, and an object detection service GUI module 118.
- The scan engine 114 may be a component that detects any objects 122 in the source images 120 that are of a predetermined object type 124, such as a plastic cup, a beer bottle, a tool, and/or a type of animal. The scan engine 114 may include an object detection module 126 and a verification module 128.
- The object detection module 126 of the scan engine 114 may be a component that applies a cascade classifier 130 to the source images 120 or otherwise locates one or more candidate objects 132 in the source images 120. For example, application of the cascade classifier 130, such as an XML cascade, to any of the source images 120 may locate one or more candidate objects 132 that are possibly objects of the predetermined object type 124.
- The verification module 128 may be a component that verifies that the candidate objects 132 are objects of the predetermined object type 124. The verification module 128 may include one or more reference image based verification tests 134, one or more context based verification tests 136, and a scoring module 138.
- As described in more detail below, the reference image based verification tests 134 may be tests that compare the candidate objects 132 with reference images 140 to identify similarities and/or differences. The context based verification tests 136 may be tests that are based on a context of any of the candidate objects 132. For example, the context of a candidate object may be a location of the candidate object relative to a face detected in a source image. As described in more detail later below, the context may include any context different from, and/or in addition to, the location of the candidate object relative to the detected face.
- The scoring module 138 of the verification module 128 may be a component that generates scores 142 from one or more of the tests 134 and/or 136. Each of the scores 142 may represent an indication of a difference, or equivalently a similarity, between one of the candidate objects 132 and one or more of the reference images 140 that depict the predetermined object type 124. Alternatively or in addition, the scoring module 138 may be a component that generates a belief score 144 from the scores 142 generated by one or more of the tests 134 and/or 136.
- The belief score 144 may be any indication of the likelihood that the candidate object is an object of the predetermined object type 124. For example, the belief score 144 may be a numerical value, a percentage, and/or a symbol or a phrase, such as "likely" and "unlikely."
- The scan engine GUI module 116 may be a component that generates a GUI 146 for configuring the behavior of the scan engine 114. For example, the scan engine GUI module 116 may generate one or more web pages that are viewed at the client devices 106. Alternatively or in addition, the scan engine GUI module 116 may generate the GUI 146 in an app or software application that executes in the client devices 106. Examples of such a GUI are provided later below and illustrated in FIGS. 6-8. The client devices 106 or a subset thereof may be devices used by one or more administrative users or developers. Alternatively or in addition, the client devices 106 or a subset thereof may be devices used by one or more end users. The GUI 146 generated by the scan engine GUI module 116 may be an administrator GUI 148 limited to use by administrative users in many examples.
- The object detection service GUI module 118 may be a component that generates the GUI 146 for using the scan engine 114 in the context of the social networking service 102. Examples of such a GUI are provided later below and illustrated in FIGS. 9-10. The GUI 146 generated by the object detection service GUI module 118 may be an end user GUI 150 for end users in many examples.
- The graphical user interface (GUI) 146 generated by either GUI module 116 or 118 may be displayed at the client devices 106. The GUI 146 may include graphical icons and/or any other type of visual indicators to represent information and actions available to a user. The actions may be performed through direct manipulation of the graphical elements. More generally, the GUI 146 may be a text-based interface or text navigation interface.
- During operation of the object recognition system 100, the scan engine 114 may search one or more of the source images 120 for the predetermined object type 124 or a set of predetermined object types. The source images 120 may be obtained from any source.
- For example, when the object recognition system 100 is applied to one or more social networking services, such as the social networking service 102 in FIG. 1, the source images 120 may be obtained from the social networking service 102. The source images 120 may be images in a user's social network that are public, images posted by a user that are available to members of the user's social network, images in which a user is "tagged" or otherwise identified, and/or images selected by any other criteria. The user may provide the object recognition device 104 with authorization to access the social networking service 102. The user may provide authorization by, for example, providing log-in credentials to the object recognition device 104.
- In different examples, the source images 120 may be obtained from different sources of images. The source images 120 may be obtained from a web search for images associated with a person, for example. In the context of a robotics system, the source images 120 may be obtained from a camera mounted on a robot or from another image source in the robotics system. In the context of a surveillance system, the source images 120 may be obtained from a security camera.
- The predetermined object type 124 or types may be any type of object that the object recognition system 100 is requested to find. For example, a user may wish to identify objects that a set of people, such as employers or family members, may find objectionable. Alternatively or in addition, a user may wish to identify objects that may pose a security risk. Examples of the predetermined object type 124 may include a beer bottle, a beer can, a plastic cup, such as a SOLO® cup (SOLO is a registered trademark of Solo Cup Company of Lake Forest, Ill.), a beer bong, a can, a bottle, a backpack, a duffle bag, a weapon, a pistol, an animal, a person, a face, or any other type of object.
- The predetermined object type 124 or predetermined object types may be predetermined in the sense that the object type 124 or types may be determined prior to searching the source images 120 for the object type 124 or types. A user, such as an administrative user, may identify the predetermined object type 124 or types.
- When scanning the source images 120 for the object type 124, the object detection module 126 of the scan engine 114 may locate one or more candidate objects 132 in the source images 120. FIG. 2 illustrates an example logic flow 200 of the object detection module 126.
- The object detection module 126 may resize (206) an initial source image 202 to obtain a source image 204 that has a target size. The target size may be selected to be large enough, by pixel standards, to detect and verify the predetermined object type 124 or types, but not so large that detecting and verifying objects exceeds a threshold amount of time. An example of the target size may be approximately 2000 horizontal pixels and 1300 vertical pixels. The target size may depend on factors such as the speed of the processor 110, characteristics of the object type 124, and/or the number and variety of object types that the scan engine 114 searches for.
- Resizing (206) the initial source image 202 may improve the speed by which the detected objects 122 may be recognized, while only incurring a small loss of accuracy in recognizing objects. Nevertheless, the source image 204 may have any size and the initial source image 202 need not be resized.
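- As a concrete sketch of the resize step (206) using OpenCV: the 2000 by 1300 bound comes from the example target size above, while the helper itself and the file path are illustrative only.

```python
import cv2

def resize_to_target(initial_source_image, max_w=2000, max_h=1300):
    """Scale the initial source image so it fits within the example target
    size while preserving its aspect ratio."""
    h, w = initial_source_image.shape[:2]
    scale = min(max_w / w, max_h / h)
    return cv2.resize(initial_source_image, (int(w * scale), int(h * scale)))

source_image = resize_to_target(cv2.imread("photo.jpg"))  # placeholder path
```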
- To locate the candidate objects 132, the object detection module 126 may apply (208) the cascade classifier 130 to the source image 204. The cascade classifier 130 may be an XML (eXtensible Markup Language) cascade, for example.
- The type of the cascade classifier 130 applied may be any type of cascade classifier. For example, the cascade classifier 130 may be a Haar-like feature classifier, a local binary pattern (LBP) feature classifier, a histogram of oriented gradients (HOG) feature classifier, or any other type of cascade classifier. Each type of cascade classifier may implement a corresponding detection algorithm. Examples of the detection algorithm may include Haar, LBP, HOG, or any other type of cascade algorithm.
- The type of the cascade classifier 130 that is applied to the source image 204 may vary depending on the object type 124. Each type of object may be identified more accurately with one type of cascade classifier than another. For example, if the predetermined object type 124 is a type of object that includes lettering, then an LBP feature classifier may be associated with the predetermined object type 124 in the memory 112.
cascade classifier 130 with thepredetermined object type 124 in thememory 112. The scanengine GUI module 116 may generate a GUI, as illustrated inFIG. 6 for example, for selecting and/or associating thecascade classifier 130 with thepredetermined object type 124. Alternatively or in addition, thecascade classifier 130 may be customized with the GUI generated by the scanengine GUI module 116 as illustrated inFIG. 6 . - The behavior of detection algorithm of the
object detection module 126 may be controlled by parameters. The parameters may be adjusted and passed to theobject detection module 126. The scanengine GUI module 116 may generate a GUI, as illustrated inFIG. 7 for example, for adjusting the parameters passed to theobject detection module 126. - Customizing the
cascade classifier 130, associating thecascade classifier 130 with thepredetermined object type 124, and/or adjusting the parameters to theobject detection module 126 may be performed prior to theobject detection module 126 searching thesource image 204 for thepredetermined object type 124. Alternatively or in addition, such action or actions may be performed while theobject detection module 126 searches thesource image 204 for thepredetermined object type 124. Alternatively or in addition, such action or actions may be performed after theobject detection module 126 searches thesource image 204. - The
object detection module 126 may store a size and/or a location of each of the candidate objects 132. For example, Cartesian coordinates, measured in pixels, of each of the candidate objects 132 may be stored in thememory 112. The height and width, for example in pixels, of each of the detected faces 212 may be stored in thememory 112. - In addition to locating the candidate objects 132 in the
source image 204, theobject detection module 126 may detect (210) faces 212 in thesource image 204. Theobject detection module 126 may, for example, apply an XML cascade to thesource image 204 thereby detecting anyfaces 212 in thesource image 204. For example the XML cascade may evaluate thesource image 204 for Haar-like features. - The
object detection module 126 may store a location of each of the detected faces 212. For example, Cartesian coordinates, measured in pixels, of each of the detected faces 212 may be stored in thememory 112. Alternatively or in addition, a size of each of the detected faces 212 may be stored. For example, the height and width in pixels of each of the detected faces 212 may be stored in thememory 112. In some examples, theobject detection module 126 may determine an average size of the detected faces 212. - The size, average size, and/or location of the detected faces 212 may provide
context information 214 for the candidate objects 132. Theverification module 128 may use thecontext information 214 to verify that the candidate objects 132 are objects of thepredetermined object type 124. In particular, as described later below, theverification module 128 may compare the size, the average size, and/or the location of the detected faces 212 with a relative expected size and/or a relative expected location of an object of thepredetermined object type 124. Alternatively or in addition, theverification module 128 may use the size, average size, and/or location of the detected faces 212 to adjust a likelihood that each of the candidate objects 132 is of the predetermined object type based on a likelihood that an object of thepredetermined object type 124 may overlap any of the detected faces 212. - In addition to the context based verification tests 136,
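- One way to use the detected faces 212 as context is to compare a candidate's area to the average face area and penalize candidates far outside an expected ratio for the object type. The ratio bounds and penalty below are invented for illustration.

```python
def adjust_belief_for_face_context(belief_score, candidate_wh, faces_wh,
                                   expected_ratio=(0.3, 1.5), penalty=0.7):
    """Penalize a candidate whose area, relative to the average detected
    face area, falls outside the expected range. Values are hypothetical."""
    if not faces_wh:
        return belief_score
    avg_face_area = sum(w * h for w, h in faces_wh) / len(faces_wh)
    ratio = (candidate_wh[0] * candidate_wh[1]) / avg_face_area
    low, high = expected_ratio
    return belief_score if low <= ratio <= high else belief_score * penalty

score = adjust_belief_for_face_context(0.8, (60, 80), [(90, 90), (100, 110)])
```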
- In addition to the context based verification tests 136, the verification module 128 may perform the reference image based verification tests 134. Verification of the candidate objects 132 that are detected with the cascade classifier 130 may improve the accuracy of detecting objects over detecting objects with the cascade classifier 130 alone. When objects are detected with just a cascade classifier, in other words, without verifying the candidate objects 132 as described herein, the cascade classifier 130 may be configured to achieve a suitable balance of true positives, false positives, and false negatives. As a result of achieving that balance, undetected objects that may have otherwise been detected are eliminated from further consideration.
- By performing the verification tests 134 and/or 136, the cascade classifier 130 may be configured to identify more false positives than in the absence of performing the verification tests 134 and/or 136. Accordingly, the overall accuracy in identifying the detected objects 122 may be improved.
- FIG. 3 illustrates a flow diagram of an example of part of the logic 300 of the verification module 128. For each of the candidate objects 132, characteristics 302, 304, 306, 308, 310, and/or 312 of the candidate object 314 may be generated (318, 320, 322, 324, 326, and/or 328).
- For example, a histogram 302 of the candidate object 314 may be generated (318). The histogram 302 may represent variations in shading and/or coloration. The histogram 302 may, for example, include a map of shading and/or color values arranged in "bins." Each of the bins may represent a subset of a range of such values.
- The histogram 302 may provide a basis for finding similarities and/or differences between two objects. For example, the histogram 302 of a banana may match the histogram 302 of a lemon because the number of pixels that are shades representing yellow may be comparable for both objects, even though other aspects of the objects, such as their shapes, are different from each other. The histogram 302 of the candidate object 314 may be subsequently compared with a histogram 330 of each of the reference images 140, such as the histogram 330 of the reference image 350 illustrated in FIG. 3. The histogram 302 may include multiple histograms because multiple types of histograms may be generated. Each type of histogram may represent properties of an image that are different than properties represented by the other types of histograms included in the histogram 302. For example, the histogram 302 may include a histogram of predetermined portions of color data and a histogram of grayscale shades.
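- With OpenCV, the two histogram flavors mentioned (a color histogram over part of the color data and a grayscale histogram) can be computed roughly as follows; the bin counts and channel choices are arbitrary illustrative values.

```python
import cv2

candidate = cv2.imread("candidate_crop.png")  # placeholder path

# Grayscale histogram: 32 bins over intensities 0-255.
gray = cv2.cvtColor(candidate, cv2.COLOR_BGR2GRAY)
gray_hist = cv2.calcHist([gray], [0], None, [32], [0, 256])

# Color histogram over the blue and green channels only
# (a "predetermined portion" of the color data), 16 bins each.
color_hist = cv2.calcHist([candidate], [0, 1], None, [16, 16],
                          [0, 256, 0, 256])

# Normalizing makes histograms of different-sized regions comparable.
cv2.normalize(gray_hist, gray_hist)
cv2.normalize(color_hist, color_hist)
```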
- A color map 304 of color data of the candidate object 314 may be generated (320). The color map 304 may be a pixel-by-pixel representation of the image in red-green-blue (RGB) color space. The color map 304 of the candidate object 314 may be subsequently compared with a color map 332 of one or more of the reference images 140.
- A hue map 306 of hue data of the candidate object 314 may be generated (322). The hue map 306 may be a pixel-by-pixel representation of the candidate object 314 in hue, saturation, and value (HSV) color space. Alternatively or in addition, the hue map 306 may be a representation of the candidate object 314 in an HSL (hue, saturation, and lightness) color space, an HSI (hue, saturation, and intensity) color space, and/or any other color space. The hue map 306 of the candidate object 314 may be subsequently compared with a hue map 334 of one or more of the reference images 140.
- Key points 308 of the candidate object 314 may be identified (324). The key points 308 may represent significant features within the candidate object 314, such as corners and areas of contrast. Such features are known as key points. The key points 308 may include pixel information from around such features. For example, the key points 308 may include descriptors that include the pixel information. The key points 308 of the candidate object 314 may be subsequently compared with key points 336 of one or more of the reference images 140.
- A percentage 310 of the candidate object 314 that contains hue, saturation, and value data within a range that represents skin tones may be determined (326). For example, if fifty percent of the candidate object 314 contains hue, saturation, and value data within the range that represents skin tones, then half of the candidate object 314 may be skin. The percentage 310 may also be represented as and/or referred to as a skin ratio 310.
- The skin ratio 310 of the candidate object 314 may be subsequently compared with a skin ratio 338 of one or more of the reference images 140. The range of hue, saturation, and value data that represents skin tones may be determined prior to detecting any of the candidate objects 132.
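- The skin ratio 310 can be approximated with an HSV range check; the bounds below are a common illustrative choice and are not taken from this description.

```python
import cv2
import numpy as np

def skin_ratio(bgr_image, lower=(0, 48, 80), upper=(20, 255, 255)):
    """Fraction of pixels whose HSV values fall within an assumed
    skin-tone range."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(lower, np.uint8),
                       np.array(upper, np.uint8))
    return cv2.countNonZero(mask) / mask.size
```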
- Alternatively or in addition, any other characteristics 312 of the candidate object 314 that may be useful for comparison with the reference images 140 or that may provide context for the candidate object 314 may be determined and/or stored (328). Examples of such characteristics 312 may include an average color or hue of the candidate object 314, a location of the candidate object 314 relative to any of the detected faces 212, and/or any other characteristic of the candidate object 314. The additional characteristics 312 of the candidate object 314 may be compared with corresponding additional characteristics 340 of the reference image 350.
- The histogram 330, the color map 332, the hue map 334, the key points 336, the skin ratio 338, and/or the additional characteristics 340 may be generated (352, 354, 356, 358, 360, and/or 362) for each of the reference images 140.
- Each of the reference images 140 may be an image of an object that is confirmed to be of the predetermined object type 124. The reference images 140 may be customized to improve the accuracy of the verification module 128. For example, the reference images 140 may be added to, deleted from, or adjusted at any time. As described in more detail below, the characteristics 330, 332, 334, 336, 338, and/or 340 of the reference images 140 may be used in the verification tests 134 and/or 136 for comparison with the candidate objects 132.
- FIG. 4 illustrates a flow diagram of an example of part of the logic 400 of the verification module 128. In particular, FIG. 4 illustrates a flow diagram of the logic of the reference image based verification tests 134. For each predetermined object type 124 that the scan engine 114 attempts to locate in the source image 204, a set of the candidate objects 132 of that type 124 may be found by the object detection module 126. For each of the candidate objects 132 found, a series of comparisons may be made to each of the reference images 140 of the predetermined object type 124. The comparisons may be performed by the reference image based verification tests 134.
- For example, the reference image based verification tests 134 may include a histogram comparator 402, an RGB color comparator 404, a hue comparator 406, and/or a key point comparator 408. The reference image based verification tests 134 may include additional, fewer, or different comparators than illustrated in FIG. 4.
- The comparators 402, 404, 406, and/or 408 may compare the characteristics 302, 304, 306, and/or 308 of the candidate object 314 with the corresponding characteristics 330, 332, 334, and/or 336 of the reference images 140. As a result of each comparison of the candidate object 314 with the corresponding reference image 350, the comparators 402, 404, 406, and/or 408 may generate difference values 412. Each of the difference values 412 may represent a difference between the candidate object 314 and the corresponding reference image 350. Equivalently, each of the difference values 412 may represent a similarity between the candidate object 314 and the corresponding reference image 350.
- For example, the histogram comparator 402 may compare the histogram 302 of the candidate object 314 to the histogram 330 of each reference image 350 using one or more algorithms. The histogram comparator 402 may generate, from each comparison, a corresponding one of the difference values 412 for each algorithm that the histogram comparator 402 applies. The algorithm and/or algorithms may include any type of histogram comparison algorithm. For example, the histogram comparator 402 may implement a correlation metric, a chi-square metric, an intersection metric, and/or a Bhattacharyya distance metric computation.
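- OpenCV exposes all four named metrics through a single call; a sketch of the histogram comparator, assuming two histograms built the same way (for example, with cv2.calcHist as above):

```python
import cv2

def histogram_difference_values(candidate_hist, reference_hist):
    """Compare two histograms under the four metrics named above.
    Correlation and intersection are similarities (higher means more
    alike); chi-square and Bhattacharyya are distances (lower means
    more alike)."""
    metrics = {
        "correlation": cv2.HISTCMP_CORREL,
        "chi_square": cv2.HISTCMP_CHISQR,
        "intersection": cv2.HISTCMP_INTERSECT,
        "bhattacharyya": cv2.HISTCMP_BHATTACHARYYA,
    }
    return {name: cv2.compareHist(candidate_hist, reference_hist, flag)
            for name, flag in metrics.items()}
```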
- The RGB color comparator 404 may compare the color map 304 of the candidate object 314 to the color map 332 of each reference image 350. The RGB color comparator 404 may generate, for each reference image 350, a respective one of the difference values 412 based on the comparison of the color maps 304 and 332. The RGB color comparator 404 may compare the color maps 304 and 332 using one or more types of comparisons. One of the types of RGB color comparisons may include a grayscale conversion comparison, for example. The candidate object 314 and the reference image 350 may be converted to grayscale images. For each pixel, the grayscale value (0-255) of the pixel in the candidate object 314 may be subtracted from the grayscale value of the corresponding pixel in the reference image 350, and the difference may be squared. The sum of the squared values for the pixels may represent one of the difference values 412 generated by the RGB color comparator 404. Alternatively or in addition, the types of RGB color comparisons may include a peak color difference comparison. For example, each pixel in the candidate object 314 may be compared to the corresponding pixel in the reference image 350 in each color channel (Red, Green, Blue) separately. The color channel having the greatest difference between the pixel in the candidate object 314 and the pixel in the reference image 350 may be determined. The difference between the two pixels in the determined color channel may be squared to represent a peak value. The sum of the peak values may represent one of the difference values 412 generated by the RGB color comparator 404. Alternatively or in addition, the types of RGB comparisons may include a sum of squares comparison. Each pixel in the candidate object 314 may be compared to the corresponding pixel in the reference image 350 in each color channel (Red, Green, Blue) separately. A square of the difference in each channel may be determined. One of the difference values 412 generated by the RGB color comparator 404 may be the sum of the squares over all of the channels for all of the pixels.
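- The three RGB comparisons translate naturally to array arithmetic. The sketch below assumes the candidate is first resized to the reference's dimensions, an alignment step this description does not spell out.

```python
import cv2
import numpy as np

def rgb_difference_values(candidate, reference):
    """Grayscale conversion, peak color difference, and sum of squares
    comparisons between a candidate and one reference image."""
    candidate = cv2.resize(candidate, (reference.shape[1], reference.shape[0]))
    cand = candidate.astype(np.int64)
    ref = reference.astype(np.int64)

    # Grayscale conversion comparison: sum of squared pixel differences.
    g1 = cv2.cvtColor(candidate, cv2.COLOR_BGR2GRAY).astype(np.int64)
    g2 = cv2.cvtColor(reference, cv2.COLOR_BGR2GRAY).astype(np.int64)
    grayscale = int(((g1 - g2) ** 2).sum())

    # Peak color difference: per pixel, square the largest per-channel
    # difference, then sum over all pixels.
    diff_sq = (cand - ref) ** 2
    peak = int(diff_sq.max(axis=2).sum())

    # Sum of squares: squared differences summed over every channel.
    sum_of_squares = int(diff_sq.sum())

    return {"grayscale": grayscale, "peak": peak,
            "sum_of_squares": sum_of_squares}
```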
The hue comparator 406 may compare the hue map 306 of the candidate object 314 to the hue map 334 of the reference image 350. The hue comparator 406 may compare the candidate object 314 with each reference image 350 in the HSV color space, the HSL color space, the HSI color space, and/or any other color space. The hue comparator 406 may generate, for each comparison, a respective one of the difference values 412. The hue comparator 406 may compare the hue map 306 of the candidate object 314 to the hue map 334 of the reference image 350 using one or more types of comparisons. The comparison or comparisons may include comparisons similar to the RGB color comparisons except that the color channels may be hue, saturation, and value (HSV); hue, saturation, and lightness (HSL); hue, saturation, and intensity (HSI); and/or any other color channels or combinations thereof.
- The key point comparator 408 may compare the key points 308 of the candidate object 314 with the key points 336 of each reference image 350. For example, descriptors of the key points 308 and 336 may be compared. The key point comparator 408 may generate, for each comparison, a respective one of the difference values 412. The key points 308 and 336 may be determined using the FAST (Features from Accelerated Segment Test) feature detecting algorithm or any other feature detecting algorithm, such as difference of Gaussians (DoG). The descriptors for each key point may be determined using an ORB (oriented BRIEF) keypoint detector or any other type of detector. The descriptors may represent a grid of pixel information surrounding each of the key points, where the grid of pixel information may be configurable. A brute force matcher may compare each descriptor for the key points 308 in the candidate object 314 to each descriptor of the key points 336 in the reference image 350. A brute force matcher is a matcher that does not apply a specialized algorithm to speed up the matching process. Alternatively, any other type of matcher may be used. The brute force matcher may return a location of a key point in the reference image 350 that best matches each corresponding key point in the candidate object 314, as well as a corresponding numerical score. The numerical score may be the sum of the differences between the matching key point descriptors. The resulting data may be parsed to identify a single best match for each of the key points 308 in the candidate object 314 with a corresponding one of the key points 336 in the reference image 350. In other words, none of the key points 308 of the candidate object 314 is a best match with multiple key points 336 of the reference image 350. The data may be further parsed to remove matches in which the numerical score of the respective match fails to meet a threshold score. The data may be further parsed to remove matches that fail to meet a Cartesian y-range limit. In other words, each matching pair of descriptors is to include points that match in approximately the same relative Y position in the candidate object 314 and the reference image 350. The number of matching key points that meet these criteria may be divided by the number of pixels in the candidate object 314, resulting in the key point comparator score. The variables used in this comparator may be adjustable from the GUI 146 generated by the scan engine GUI module 116.
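- A sketch of this key point pipeline in OpenCV terms might look like the following; the distance threshold, the Y-position tolerance, and the use of ORB's built-in FAST detector are assumptions for illustration, and cross-check matching stands in for the single-best-match parsing described above.

```python
# Sketch only: ORB key points matched by brute force, filtered by a score
# threshold and a relative Y-position limit, then normalized by pixel count.
import cv2

def key_point_score(candidate, reference, max_distance=40, y_tolerance=0.1):
    orb = cv2.ORB_create()  # ORB detects FAST corners, computes BRIEF-style descriptors
    gray_c = cv2.cvtColor(candidate, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(reference, cv2.COLOR_BGR2GRAY)
    kp_c, des_c = orb.detectAndCompute(gray_c, None)
    kp_r, des_r = orb.detectAndCompute(gray_r, None)
    if des_c is None or des_r is None:
        return 0.0
    # crossCheck=True keeps only mutual best matches: one per key point.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    kept = 0
    for m in matcher.match(des_c, des_r):
        if m.distance > max_distance:  # drop weak descriptor matches
            continue
        # Require roughly the same relative Y position in both images.
        y_c = kp_c[m.queryIdx].pt[1] / candidate.shape[0]
        y_r = kp_r[m.trainIdx].pt[1] / reference.shape[0]
        if abs(y_c - y_r) <= y_tolerance:
            kept += 1
    return kept / float(candidate.shape[0] * candidate.shape[1])
```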
FIG. 5 illustrates a flow diagram of an example of part of the logic 500 of the verification module 128. In particular, FIG. 5 illustrates a flow diagram of the logic of the scoring module 138 and the logic of the context based verification tests 136.
- The scoring module 138 may determine (502) difference ratios 504 based on the difference values 412 and on target difference values 506. Each one of the target difference values 506 may be an expected difference value for a corresponding one of the characteristics for the predetermined object type 124. In some examples, the expected difference value may be a minimum threshold difference value needed for the candidate object 314 to match the reference image 350 for the corresponding one of the characteristics.
- The difference ratio 504 for the respective one of the characteristics, c, may be determined as: [(difference valuec−target differencec)/target differencec]. Alternatively, the difference ratio 504 may be determined based on any algorithm in which the greater the negative difference between each of the difference values 412 and the corresponding one of the target difference values 506, the greater the similarity between the candidate object 314 and the reference image 350 with respect to the corresponding characteristic. Conversely, the greater the positive difference between each of the difference values 412 and the corresponding one of the target difference values 506, the greater the difference between the candidate object 314 and the reference image 350 with respect to the corresponding characteristic.
- The formula for the difference ratio 504 for the respective one of the characteristics, c, may vary depending on whether the difference value is preferably lower than the target difference or preferably greater than the target difference. If the characteristic, c, is desired to be greater than the target difference for a match, then the formula provided above may apply. However, if the characteristic, c, is desired to be lower than the target difference, then the formula [(target differencec−difference valuec)/target differencec] may apply. The determination of the difference ratios 504 may standardize each test to a similar range of ratios.
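- A compact sketch of the two formulas, with the sign convention selected per characteristic (the argument names are illustrative):

```python
# Sketch only: difference ratio for one characteristic, c.
def difference_ratio(difference_value, target_difference, higher_is_better):
    if higher_is_better:
        return (difference_value - target_difference) / target_difference
    return (target_difference - difference_value) / target_difference
```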
Consider an example where the target difference value 506 for the histogram 302 characteristic is 10, and a greater value is more desirable than a lesser value (in other words, the larger the difference value, the better the match). If the difference value for the histogram 302 of the candidate object 314 is 15, then the difference ratio may be (15−10)/10, or 0.5, which is a positive number that positively influences the belief score 510 toward acceptance, particularly after multiplication with a corresponding one of the belief multipliers 512. On the other hand, if the difference value for the histogram 302 of the candidate object is 5, then the difference ratio may be (5−10)/10, or −0.5, which is a negative number that will negatively influence the belief score 510, particularly after multiplication with the corresponding one of the belief multipliers 512. Alternatively, if a lesser difference value is more desirable than a greater difference value for the characteristic, c, then the first difference ratio may be (10−15)/10, or −0.5, and the second difference ratio may be (10−5)/10, or 0.5. The signs of the difference ratios are now reversed and have the opposite effect on the belief score 510.
- In addition to determining the difference ratios 504, the scoring module 138 may determine (508) a belief score 510 based on the difference ratios 504 and on belief multipliers 512. The belief score 510 may indicate a likelihood or probability that the candidate object 314 matches the reference image 350.
- The scoring module 138 may determine the belief score 510 based on an algorithm in which the belief score 510 falls into a suitable range. The suitable range may be a range in which a belief score of 50 represents a 50 percent chance that the candidate object 314 matches the reference image 350, a belief score of 100 represents an almost 100 percent chance of a match, and a score of 0 (or less) represents an almost zero percent chance of a match. Each of the difference ratios 504 may be applied to the belief score 510. The amount of each of the difference ratios 504 that is applied is based on adjustable multipliers that determine an importance of each characteristic for the predetermined object type 124. The adjustable multipliers are the belief multipliers 512.
- In some examples, the scoring module 138 may determine (508) the belief score 510 as a sum of weighted difference ratios (the difference ratios 504 weighted by the belief multipliers 512), the sum then multiplied by a scalar, such as 20, and added to a constant, such as 50 percent. In other words, the belief score 510 may be determined according to the following:
belief score = S × [ (M1×r1) + (M2×r2) + . . . + (MN×rN) ] + K
belief score 510; Mc is the belief multiplier for the characteristic, c; S is the scalar, and K is the constant. Alternatively, thebelief score 510 may be determined using other algorithms. - The belief multipliers 512 configured for some predetermined object types may differ from the
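- Using the example scalar of 20 and constant of 50 from above, the formula might be sketched as follows (the names are illustrative):

```python
# Sketch only: belief score as a scaled, offset sum of weighted ratios.
def belief_score(difference_ratios, belief_multipliers, scalar=20.0, constant=50.0):
    weighted_sum = sum(m * r for m, r in zip(belief_multipliers, difference_ratios))
    return scalar * weighted_sum + constant
```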
The belief multipliers 512 configured for some predetermined object types may differ from the belief multipliers 512 configured for other predetermined object types. For example, a first set of object types may be more accurately matched using the key points 308 characteristic, while a second set of object types may be more accurately matched using the color map 304 characteristic. Accordingly, the belief multiplier for the key points 308 characteristic that is associated with the first set of object types may be higher than the belief multiplier for the key points 308 characteristic that is associated with the second set of object types.
- For any characteristic, a positive difference ratio may indicate that the difference value is outside the bound of the target difference, which may negatively affect the belief score 510. Conversely, a negative difference ratio may indicate that the difference value is inside the bound of the target difference, which may positively affect the belief score 510. The greater the difference ratio, the greater the effect on the belief score 510. As illustrated in FIGS. 7 and 8, the target difference values 506 may be adjustable and tuned by a user with the GUI 146. Some object types may require strict target differences for certain characteristics and more lenient target differences for others. Like the target differences, the belief multipliers 512 may be adjusted and tested from within the GUI 146 for the predetermined object type 124.
- Additional tests, such as the context based verification tests 136, may be performed that adjust the belief score 510. Based on the context information 214, the characteristics of the candidate object 314, and/or characteristics of the predetermined object type 124, the context based verification tests 136 may generate (514) an adjusted belief score 516.
- The context based verification tests 136 may include a skin tone test 520, an image location test 522, a face location test 524, an image size test 526, a face size test 528, and/or a background color test 530. The context based verification tests 136 may include fewer, additional, or different tests.
- The context information 214 used by the context based verification tests 136 may include any information that may provide context for the candidate objects 132. For example, the context information 214 may include the percentage of skin tones in the candidate object 314, a location of the candidate object 314 within the source image 204, a location of the candidate object 314 relative to one or more of the detected faces 212, the size of the candidate object relative to one or more of the detected faces 212, the size of the candidate object relative to the size of the source image 204, and/or any other information related to the context of the candidate object 314, such as text associated with the source image 204 (for example, a post) or a tag associated with the source image 204.
- The skin tone test 520 may determine the percentage of the candidate object 314 that has color and/or hue values that are consistent with skin tones. The determined percentage may be compared to a predetermined minimum expected percentage and/or a predetermined maximum expected percentage. The predetermined minimum expected percentage and the predetermined maximum expected percentage may be configurable. The skin tones may be configurable. If the determined percentage is in a range between the predetermined minimum expected percentage and the predetermined maximum expected percentage, then the skin tone test 520 may not modify the belief score 510, for example. On the other hand, if the determined percentage is less than the predetermined minimum expected percentage or greater than the predetermined maximum expected percentage, then the skin tone test 520 may determine a difference between the determined percentage and the closest of the predetermined minimum expected percentage or the predetermined maximum expected percentage. The difference may be multiplied by an adjustable multiplier to further emphasize the result, on a per candidate object basis.
- For example, the expected percentage range of skin tones for a candidate object 314 of type in-hand may be set at 50-80%. In other words, the predetermined minimum expected percentage is 50%, and the predetermined maximum expected percentage is 80%. If only 10% of the pixels in the candidate object 314 are determined to be skin tones, then the difference in percentage points between 10% and 50% (40 points) is multiplied by a skin tone multiplier, resulting in a negative value that lowers the belief score 510. Similarly, if 90% of the pixels in the candidate object 314 are determined to be skin tones, then the difference in percentage points between 90% and 80% (10 points) is multiplied by the skin tone multiplier, resulting in a negative value that harms the belief score 510. Alternatively, if the skin tone percentage of the candidate object 314 falls within the predetermined percentage range, then the belief score 510 may be unaffected by the skin tone test 520.
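- The adjustment described in this example can be sketched as a small function; the 50-80% bounds match the in-hand example above, while the multiplier value is an illustrative assumption:

```python
# Sketch only: skin tone adjustment to the belief score. Inside the range,
# no change; outside it, subtract the distance to the nearer bound, scaled.
def skin_tone_adjustment(skin_percent, min_percent=50.0, max_percent=80.0, multiplier=0.5):
    if skin_percent < min_percent:
        return -(min_percent - skin_percent) * multiplier
    if skin_percent > max_percent:
        return -(skin_percent - max_percent) * multiplier
    return 0.0
```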
The image location test 522 may verify that the location of the candidate object 314 within the source image 204 is within a predetermined area. The predetermined area may be typical for an object of the predetermined object type 124. For example, beer cans often appear near the center to bottom half of an image, because the beer cans are most often on a table or are being held by a person below eye level. Accordingly, the center of the source image 204 may be a baseline. As the location of the candidate object 314 increases on the Y-axis from the baseline (in other words, as the candidate object 314 is located further towards the top of the source image 204 relative to the baseline), the belief score 510 may decrease. For example, the image location test 522 may reduce the belief score 510 by a multiplicative product of an adjustable belief multiplier and the distance that the candidate object 314 is from the baseline.
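- In pixel coordinates, where Y grows downward, the penalty just described might be sketched like this (the normalization and multiplier are assumptions):

```python
# Sketch only: penalize candidates located above the image-center baseline.
def image_location_penalty(candidate_center_y, image_height, multiplier=10.0):
    baseline_y = image_height / 2.0
    if candidate_center_y >= baseline_y:
        return 0.0  # at or below the center: no penalty
    height_above = (baseline_y - candidate_center_y) / image_height
    return -height_above * multiplier
```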
The face location test 524 may verify that the location of the candidate object 314 relative to one or more of the detected faces 212 is appropriate for the predetermined object type. In one such example, many types of objects should not overlap any of the detected faces 212. A beer can, for example, is relatively unlikely to overlap a face in a picture. Accordingly, if the candidate object 314 is potentially a beer can and yet the candidate object 314 overlaps any of the detected faces 212, then the face location test 524 may decrease the belief score 510 by a predetermined amount.
- The image size test 526 may verify that the size of the candidate object 314 relative to the size of the source image 204 is within a predetermined range. The predetermined range may be a range that is typical for an object of the predetermined object type 124. For example, the relative size of a beer can may typically be less than thirty percent of the source image 204 and more than five percent of the source image 204. In some examples, the candidate objects 132 that do not fall within the predetermined size range may be eliminated from consideration early in the verification process in order to reduce computational time.
- The face size test 528 may verify that the size of the candidate object 314 relative to the size of the detected faces 212 in the source image 204 is within a predetermined range. The predetermined range may be typical for objects of the predetermined object type 124. For example, a beer can in an image is unlikely to be twice the size of a human head or a tenth the size of a human head. The candidate objects 132 that fall outside established (and adjustable) ranges compared to the average face size in the source image 204 may be eliminated from further consideration.
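- The image size and face size checks are both simple area-ratio gates; a combined sketch follows, with range defaults borrowed loosely from the beer can examples (the face-size range in particular is an illustrative assumption):

```python
# Sketch only: eliminate candidates whose area ratios fall outside the
# adjustable ranges relative to the source image and the average face.
def passes_size_tests(candidate_area, image_area, avg_face_area,
                      image_range=(0.05, 0.30), face_range=(0.1, 2.0)):
    image_ratio = candidate_area / float(image_area)
    if not image_range[0] <= image_ratio <= image_range[1]:
        return False
    if avg_face_area > 0:
        face_ratio = candidate_area / float(avg_face_area)
        if not face_range[0] <= face_ratio <= face_range[1]:
            return False
    return True
```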
The background color test 530 may compare the average color of the candidate object 314 with background colors of the source image 204. For example, objects that are transparent may more closely match the background colors of the source image 204 than translucent objects. The background color test 530 may verify that the average color of the candidate object 314 matches the background colors of the source image 204 to a degree that is typical for objects of the predetermined object type 124. For example, the candidate object 314 for the predetermined object type, "plastic cup," may be part of a larger background object, such as a red fire truck. The average color (in any color space) of the candidate object 314 may be determined. The background color test 530 may determine a percentage of the entire source image 204 that contains the average color of the candidate object 314 and/or similar color values within an adjustable range. The percentage of the source image 204 that the candidate object 314 occupies may be compared to the percentage of the entire source image 204 that contains the range of similar color values. If the source image 204 contains a high percentage of a similar color, a similarly colored background object (such as a red fire truck) may be present in the source image 204. The presence of a background object that is similar in color to the candidate object 314 may indicate a lower likelihood that the candidate object 314 is of the predetermined object type 124. The lower likelihood is due to the candidate object 314 being more likely to be a section of the background object. Accordingly, the background color test 530 may reduce the belief score 510 if the source image 204 contains a high percentage of a color similar to the color of the candidate object 314. Alternatively, if the source image 204 contains a low percentage of a color similar to the color of the candidate object 314, then the background color test 530 may not modify the belief score 510.
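- A hedged sketch of this test follows; the per-channel tolerance, the "high percentage" cutoff, and the flat penalty are all illustrative assumptions:

```python
# Sketch only: reduce the belief score when a large share of the source
# image is close to the candidate's average color (likely background).
import numpy as np

def background_color_penalty(candidate, source_image, tolerance=25,
                             high_percent=40.0, penalty=15.0):
    average_color = candidate.reshape(-1, 3).mean(axis=0)
    close = np.abs(source_image.astype(np.float64) - average_color) <= tolerance
    percent_similar = 100.0 * close.all(axis=2).mean()
    return -penalty if percent_similar > high_percent else 0.0
```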
As described above, the context information 214 may include information about the faces 212 detected by the object detection module 126. The verification module 128 may further limit the information about the detected faces 212 to information about faces that are also verified by the verification module 128. For example, the verification module 128 may verify the detected faces 212 by performing the reference image based verification tests 134 or any other type of test, such as a biometric test. The detected faces 212 may be limited to the faces that meet or exceed a predetermined belief level, such as a fifty percent likelihood that the detected face 212 is actually a face.
- In some examples, the context information 214 may include metadata, such as geo-location data, associated with the source image 204. A camera, or a device that includes the camera, that captured the source image 204 may tag the source image 204 with geo-location data indicating a physical location where the source image 204 was taken. The scan engine 114 may extract the geo-location data and determine a likelihood that an object of the predetermined object type 124 was at the physical location where the source image 204 was captured. The context based verification tests 136 may adjust the belief score 510 according to the likelihood that an object of the predetermined object type 124 was at the physical location where the source image 204 was captured. For example, the belief score 510 may be increased if the predetermined object type 124 is a beer bottle and the physical location is determined to be a bar.
- The context information 214 may include a capture date. The capture date may indicate a date on which the source image 204 was taken. The date may include a time of day, or may include only a time of day in some examples. The capture date may be extracted from the metadata associated with the source image 204. The metadata may be added by the camera or any other device. For example, the metadata may indicate a date on which the source image 204 was posted in the social networking service 102.
- The context based verification tests 136 may adjust the belief score 510 according to the likelihood that an object of the predetermined object type 124 is present on the capture date. For example, if the predetermined object type 124 is a Christmas tree, then the candidate objects 132 are more likely to be a Christmas tree if the capture date of the source image 204 is on Christmas, or within a date range that includes Christmas. As a result, the context based verification tests 136 may increase the belief scores of the candidate objects 132 when searching for a Christmas tree and the capture date of the source image 204 is on Christmas or within a date range that includes Christmas.
- The context information 214 may include information about one or more images associated with the source image 204. For example, the images associated with the source image 204 may be images captured within a predetermined time of the source image 204. Alternatively or in addition, the images associated with the source image 204 may be images included in one photo album in the social networking service 102. The inclusion of the source image 204 in a photo album that also includes an image depicting one or more objects associated with the predetermined object type 124 may increase the likelihood that the candidate objects 132 are objects of the predetermined object type 124. Alternatively or in addition, the images associated with the source image 204 may be images having a capture date within a predetermined amount of time of the capture date of the source image 204.
- The context based verification tests 136 may adjust the belief score 510 based on an amount of time between the capture date of the source image 204 and the capture date of an image that includes an object of the predetermined object type 124 or information associated with the predetermined object type 124. In one such example, the scan engine 114 detects an object of the predetermined object type 124, such as a basketball, in an associated image with a relatively high belief score. The associated image was captured within close time proximity to (or within a predetermined amount of time of) the source image 204. The associated image may be associated with the source image 204 by being in the same photo album as the source image 204. As a result, the context based verification tests 136 may increase the belief scores for the candidate objects 132 in the source image 204 when the scan engine 114 searches the source image 204 for the predetermined object type 124.
The context information 214 may include an identity of one or more people depicted in the source image 204 and/or personally identifiable information of the people depicted in the source image 204. For example, the scan engine 114 may search for the predetermined object type 124, such as a handbag, in the source image 204 that depicts or is otherwise associated with individual A. Individual A may be associated with the source image 204 through a social tag and/or by facial recognition processing of the source image 204. A database may store an indication that objects of the predetermined object type 124 have been detected in images associated with or depicting individual A. Alternatively or in addition, the database may indicate that individual A is otherwise associated with one or more suppliers of handbags. For example, individual A may follow a handbag supplier on TWITTER®, be employed by the handbag supplier according to a social networking site such as LinkedIn, or have "liked" the handbag supplier's FACEBOOK® page (TWITTER is a registered mark of Twitter, Inc. of San Francisco, Calif.). The context based verification tests 136 may search the database for associations between the predetermined object type 124 and any individuals depicted in or otherwise associated with the source image 204. The context based verification tests 136 may increase the belief scores of the candidate objects 132 when associations are found in the database.
- The context information 214 may include text-based social data associated with the source image 204. The text-based social data associated with the source image 204 may be any text associated with the source image 204 in the social networking service 102. Examples of the text-based social data may include album titles, photo captions, and/or comments. For example, the predetermined object type 124 may be a dog and the source image 204 may be a photo pulled from the social networking service 102. Someone may have commented on the photo with the words "cute dog." In an alternative example, the source image 204 may be an album cover for an album entitled "puppy play-date." In these two examples, the text-based social data may be "cute dog" and "puppy play-date," respectively. As a result of finding a word and/or a phrase associated with the predetermined object type 124 in the text-based social data that is associated with the source image 204, the context based verification tests 136 may increase the belief scores of the candidate objects 132.
- The context information 214 may include the weather on the day the source image 204 was captured. The context based verification tests 136 may extract the capture date and the physical location of the source image 204 from the metadata of the source image 204 or another source. The context based verification tests 136 may identify the weather on the capture date at the physical location from a database of known weather conditions. The context based verification tests 136 may adjust the belief scores of the candidate objects 132 based on a likelihood of the predetermined object type 124 being depicted in a photo on the capture date at the physical location.
- In one such example, the predetermined object type 124 may be an umbrella. The metadata of the source image 204 may indicate that the source image 204 was captured on Apr. 14, 1991 in Arlington, Va. The context based verification tests 136 may determine whether it was raining on the capture date in the capture location from the database of known weather conditions. The context based verification tests 136 may increase the belief scores of the candidate objects 132 if the database indicates that it rained on Apr. 14, 1991 in Arlington, Va.
- The belief score 510 and/or the adjusted belief score 516 is generated (508 and/or 514) for each candidate object and corresponding reference image. In other words, when multiple reference images 140 are compared with each candidate object, multiple belief scores and/or adjusted belief scores may be generated for each candidate object.
- For each candidate object, the belief score 510, the adjusted belief score 516, the highest of the belief scores, and/or the highest of the adjusted belief scores may be compared to a predetermined threshold. The predetermined threshold may represent a threshold belief score at which the candidate object 314 is considered an object of the predetermined object type 124. The location of the candidate object 314 may be stored in the memory 112.
The highest of the belief scores and/or the highest of the adjusted belief scores for each candidate object may be stored in the memory 112. In addition, the size, the type of object, and the reference image that compared most similarly with each candidate object may be stored in the memory 112.
- The stored information, such as the belief score 510 or the adjusted belief score 516, may be presented to a user in the GUI 146 as a number, a percentage, or in word format. The word format may be a word, symbol, or phrase that represents a level of confidence that the candidate object is, indeed, an object of the predetermined object type 124.
- With knowledge of the reference image 350 that best matched (highest belief score and/or adjusted belief score) the candidate object 314, additional determinations may be made about the candidate object 314. For example, a brand of a beverage or type of bottle may be determined for bottle objects. The additional determinations made based on the knowledge of the best-matched reference image may be useful to advertisers or other parties.
FIG. 6 illustrates an example 600 of the graphical user interface (GUI) 146 for building cascade classifiers used by the object detection module 126. A user may create any number of cascade classifiers for any object using the GUI 600. The GUI 600 may include, for example, an options section 602, a positive image section 604, and a negative image section 606.
- The options section 602 may include options that determine the behavior of the cascade classifier as a whole. For example, the options section 602 may display, and facilitate adjustment of, a type of cascade classifier (such as Haar, HOG, or LBP), the width and height of template images, the number of stages in the cascade classifier, and a maximum allowable number of false alarms.
- The positive image section 604 may display, and facilitate adjustment of, a positive image collection. The positive image collection is a collection of example images of the predetermined object type 124 that the cascade classifier 130 is to positively identify when applied to any source image. Similarly, the negative image section 606 may display, and facilitate adjustment of, a negative image collection. The negative image collection is a collection of example images that do not depict objects of the predetermined object type 124.
- The graphical user interface 600 may provide for simple and efficient creation of cascade classifiers from scratch. The custom creation of an xml cascade, for example, may comprise preparing a set of positive images that embody the predetermined object type 124 and a set of negative images that do not contain the predetermined object type 124. The number of steps 608 in the cascade process and a false alarm rate 610 of the cascade process may be adjusted in order to alter the sensitivity of the cascade.
- Furthermore, the GUI 600 may create or modify the cascade classifier 130 for any object type simply and quickly. The ability of the GUI 600 to create an xml cascade (or any other type of cascade classifier) for any type of object may eliminate a reliance on available cascades that have a limited detection scope. In addition, the graphical user interface 600 may facilitate creation of cascade classifiers that are overly sensitive to positive matches, unlike many cascades available for download. The cascade classifiers may be overly sensitive to positive matches, and hence detect more false positives, because the verification module 128 may eliminate the false positives from the final set of the detected objects 122.
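- To illustrate how a deliberately sensitive cascade built this way might then be applied, here is a minimal OpenCV sketch; the xml file name and the detection parameters are assumptions, with a small scale step and a low minNeighbors value favoring recall so that the verification stage can prune the extra false positives:

```python
# Sketch only: apply a trained xml cascade to produce candidate detections.
import cv2

cascade = cv2.CascadeClassifier("plastic_cup_cascade.xml")  # hypothetical file
image = cv2.imread("source.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
candidates = cascade.detectMultiScale(gray, scaleFactor=1.05, minNeighbors=2)
for (x, y, w, h) in candidates:
    print("candidate at", (x, y), "size", (w, h))
```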
FIG. 7 illustrates an example 700 of the graphical user interface (GUI) 146 for testing and adjusting parameters of the object detection module 126 and the verification module 128. The GUI 700 may include, for example, a parameter section 702, a feedback section 704, and an information panel 706.
- The parameter section 702 may display, and facilitate adjustment of, the parameters 708 of the object detection module 126. Alternatively or in addition, the parameter section 702 may display, and facilitate adjustment of, parameters 710 of the verification module 128. For example, the parameters 710 of the verification module 128 may include the target difference values 506 used in the determination of the difference ratios 504 and the belief multipliers 512 used to adjust the impact of each characteristic on the belief score 510. Additional parameters may be available for display and adjustment in the parameter section 702, such as configuration of skin tones, key point and descriptor parameters, background matching, and the belief threshold to pass the final result to the end user interface.
- The feedback section 704 may provide a testing feedback mechanism. A test source image 712 may be loaded into the feedback section 704. The types of objects 714 to search for may be selected. The scan engine 114 may execute the object detection module 126 and the verification module 128 using the parameters set in the parameter section 702. The test source image 712 may be displayed along with graphical information reflecting results of the execution of the scan engine 114.
- The graphical information may provide insight into intermediate results obtained during the execution of the scan engine 114 for a single selected object type. The example illustrated in FIG. 7 is a search for plastic cups.
- In one example of such graphical information, the faces 212 detected by the object detection module 126 may be displayed as squares or rectangles surrounding the positively-identified faces. If a face was not properly detected in the test source image 712, then the user may adjust the cascade classifier for faces and re-run the test.
- Another example of the graphical information may be identification 716 of the candidate objects 132 detected in the test source image 712 by the cascade classifier 130 for the predetermined object type 124 but that are not verified by the verification module 128. The unverified candidate objects 716 may have belief scores and/or adjusted belief scores that are below the belief threshold 718. The candidate objects 132 in the test source image 712 that are not verified may be identified by enclosing rectangles 716, which correspond to locations and sizes of areas detected as matching the cascade parameters.
- Yet another example of the graphical information may be identification of the detected objects 122, which are the candidate objects 132 that are verified by the verification module 128. The detected objects 122 may be identified by rectangles in the test source image 712 that represent locations and sizes of areas enclosing the detected objects 122. If an object of the predetermined object type 124 was not properly detected in the test source image 712, then the user may adjust any of the parameters 708 and 710 and re-run the test.
- The information panel 706 may provide additional feedback information. For example, the information panel 706 may display any textual output of the scan engine 114 for analysis, along with final results. Each of the rectangles in the test source image 712 may be numbered in the test source image 712. The information panel 706 may display information related to the objects in the rectangles. For example, the information panel 706 may display the location, the size, the difference values 412, the difference ratios 504, the belief score 510, and/or the adjusted belief score 516 for each of the candidate objects 132 next to the number of the corresponding candidate object. Alternatively or in addition, the information panel 706 may display the characteristics of the candidate objects 132 and/or the reference images 140. The final results may include, for example, the location, the size, the object type, and the belief score of each of the detected objects 122.
- The ability to adjust the parameters 708 and 710 of the system 100 from within the graphical user interface 700, and to rapidly test and evaluate the adjustments, provides dynamic and efficient tuning of the object recognition process. A user without extensive experience in object recognition technologies may test, evaluate, and improve the object recognition process for a large number of object types.
FIG. 8 illustrates an example 800 of the graphical user interface (GUI) 146 for testing and adjusting the parameters 708 and 710 for multiple object types 714 in a single test source image 802. As in FIG. 7, rectangles may overlay the verified and unverified candidate objects 132 in the feedback section 704 to represent the locations and sizes of the candidate objects 132 found by the object detection module 126, as well as the detected objects 122, which are the candidate objects 132 that are verified by the verification module 128. In one example, yellow and purple rectangles may indicate objects detected but not verified, and white, light blue, green, and blue rectangles may indicate objects that were detected and verified by meeting the belief threshold for the respective object types. Each color may correspond to one of the object types.
- FIG. 9 illustrates an example 900 of the graphical user interface (GUI) 146 for presenting images 902 and text that are available in the social networking service 102 and in which objectionable material is detected. The images 902 may be organized from the greatest threat level (highest belief score) down to the lowest threat level that still exceeds the belief threshold 718 used by the scan engine 114. The predetermined object types that the scan engine 114 searches the source images for may be a set of object types that are identified as objectionable. The object recognition device 104 may obtain the source images by searching the social networking service 102 for images that are to be scanned by the scan engine 114.
- FIG. 10 illustrates an example 1000 of the graphical user interface (GUI) 146 with which a user may provide feedback that the object recognition device 104 may use to improve the accuracy of object recognition. The GUI 1000 may display the source image 204. The source image 204 may be selected by a user from the GUI illustrated in FIG. 9 or selected in any other manner. In the example illustrated in FIG. 10, the source image 204 is scanned by the scan engine 114 for plastic cups and for any objects found to be "in-hand." Objects that are "in-hand" may be objects held in a hand or, in some examples, held in a hand in a suspicious manner. The detected objects 122 may be identified in the source image 204 with a rectangle.
- The user may select any of the detected objects 122 for further information about the selected object. For example, the GUI 1000 may display the belief score or a threat risk in easy-to-understand terms, such as "highly likely," "100.00% confidence," or "minimal threat."
- The user may also provide feedback, which may be used to help improve the accuracy of the process during future testing and adjustment. For example, the GUI 1000 may display a collection of predetermined object types 1010 that the scan engine 114 searched the source image 204 for. The user may select any of the predetermined object types 1010 that are depicted 1020 in the source image 204 but that were not identified as one of the detected objects 122.
The system 100 may be implemented with additional, different, or fewer components. For example, the system 100 may include only the object recognition device 104. In other examples, the object recognition device 104 may not include the context based verification tests 136. - The logic flows illustrated in
FIGS. 2-5 may include additional, different, or fewer operations than illustrated. The operations may be executed in a different order than illustrated. - Each component may include additional, different, or fewer components. In one such example, each of the
client devices 106 may include a copy of all or a portion of the object recognition device 104. In another example, the reference image based verification tests 134 may include the scoring module 138 or a portion thereof. In still another example, the verification module 128 may not include the context based verification tests 136. The GUI 146 generated on any of the client devices 106 may include only the admin GUI 148, only the end user GUI 150, or both the admin GUI 148 and the end user GUI 150. - The
system 100 may be implemented in many different ways. Each module, such as the scan engine 114, the object detection module 126, the verification module 128, the reference image based verification tests 134, the context based verification tests 136, the scoring module 138, the scan engine GUI module 116, and/or the object detection service GUI module 118, may be hardware or a combination of hardware and software. For example, each module may include an application specific integrated circuit (ASIC), a Field Programmable Gate Array (FPGA), a circuit, a digital logic circuit, an analog circuit, a combination of discrete circuits, gates, or any other type of hardware or combination thereof. Alternatively or in addition, each module may include memory hardware, such as a portion of the memory 112, for example, that comprises instructions executable with the processor 110 or another processor to implement one or more of the features of the module. When any one of the modules includes the portion of the memory that comprises instructions executable with the processor, the module may or may not include the processor. In some examples, each module may just be the portion of the memory 112 or other physical memory that comprises instructions executable with the processor 110 or another processor to implement the features of the corresponding module without the module including any other hardware. Because each module includes at least some hardware even when the included hardware comprises software, each module may be interchangeably referred to as a hardware module, such as the object detection hardware module 126, the verification hardware module 128, the reference image based verification tests hardware module 134, the context based verification tests hardware module 136, the scoring hardware module 138, the scan engine GUI hardware module 116, and/or the object detection service GUI hardware module 118. - In the example illustrated in
FIG. 5, the context based verification tests 136 adjust the belief score 510 determined from the difference ratios 504 and the belief multipliers 512. Alternatively, the context based verification tests 136 may also generate difference ratios that are multiplied by corresponding belief multipliers in the determination of the belief score 510. The difference ratios for the context based verification tests 136 may represent a difference between the candidate object 314 and corresponding characteristics of the predetermined object type. - The
processor 110 may be in communication with the memory 112. In one example, the processor 110 may also be in communication with additional elements, such as a network interface and/or a display device. Examples of the processor 110 may include a general processor, a central processing unit, a controller, an application specific integrated circuit (ASIC), a digital signal processor, a field programmable gate array (FPGA), a digital circuit, and/or an analog circuit. - The
processor 110 may be one or more devices operable to execute logic. The logic may include computer executable instructions or computer code embodied in the memory 112 or in other memory that, when executed by the processor 110, causes the processor 110 to perform the features of the object recognition device 104. The computer code may include instructions executable with the processor 110. - Some features are described as implemented in a computer readable storage medium (for example, as logic implemented as computer executable instructions or as data structures in the memory 112). All or part of the system and its logic and data structures may be stored on, distributed across, or read from one or more types of computer readable storage media. Examples of the computer readable storage medium may include a hard disk, a floppy disk, a CD-ROM, a flash drive, a cache, volatile memory, non-volatile memory, RAM, flash memory, or any other type of computer readable storage medium or storage media. The computer readable storage medium may include any type of non-transitory computer readable medium, such as a CD-ROM, a volatile memory, a non-volatile memory, ROM, RAM, or any other suitable storage device. However, the computer readable storage medium is not a transitory transmission medium for propagating signals.
- The processing capability of the
system 100 may be distributed among multiple entities, such as among multiple processors and memories, optionally including multiple distributed processing systems. Parameters, databases, and other data structures may be separately stored and managed, may be incorporated into a single memory or database, may be logically and physically organized in many different ways, and may be implemented with different types of data structures such as linked lists, hash tables, or implicit storage mechanisms. Logic, such as programs or circuitry, may be combined or split among multiple programs, distributed across several memories and processors, and may be implemented in a library, such as a shared library (for example, a dynamic link library (DLL)). - All of the discussion, regardless of the particular implementation described, is exemplary in nature, rather than limiting. For example, although selected aspects, features, or components of the implementations are depicted as being stored in memories, all or part of the system or systems may be stored on, distributed across, or read from other computer readable storage media, for example, secondary storage devices such as hard disks, flash memory drives, floppy disks, and CD-ROMs. Moreover, the various modules and screen display functionality are but one example of such functionality, and any other configurations encompassing similar functionality are possible.
- 
- The respective logic, software, or instructions for implementing the processes, methods, and/or techniques discussed above may be provided on computer readable storage media. The functions, acts, or tasks illustrated in the figures or described herein may be executed in response to one or more sets of logic or instructions stored in or on computer readable media. The functions, acts, or tasks are independent of the particular type of instruction set, storage media, processor, or processing strategy and may be performed by software, hardware, integrated circuits, firmware, micro code, and the like, operating alone or in combination. Likewise, processing strategies may include multiprocessing, multitasking, parallel processing, and the like. In one embodiment, the instructions are stored on a removable media device for reading by local or remote systems. In other embodiments, the logic or instructions are stored in a remote location for transfer through a computer network or over telephone lines. In yet other embodiments, the logic or instructions are stored within a given computer, central processing unit ("CPU"), graphics processing unit ("GPU"), or system.
- 
- Furthermore, although specific components are described above, methods, systems, and articles of manufacture described herein may include additional, fewer, or different components. For example, a processor may be implemented as a microprocessor, microcontroller, application specific integrated circuit (ASIC), discrete logic, or a combination of other types of circuits or logic. Similarly, memories may be DRAM, SRAM, Flash, or any other type of memory. Flags, data, databases, tables, entities, and other data structures may be separately stored and managed, may be incorporated into a single memory or database, may be distributed, or may be logically and physically organized in many different ways. The components may operate independently or be part of a same program or apparatus. The components may be resident on separate hardware, such as separate removable circuit boards, or share common hardware, such as a same memory and processor for implementing instructions from the memory. Programs may be parts of a single program, separate programs, or distributed across several memories and processors.
- To clarify the use of and to hereby provide notice to the public, the phrases “at least one of <A>, <B>, . . . and <N>” or “at least one of <A>, <B>, . . . <N>, or combinations thereof” or “<A>, <B>, . . . and/or <N>” are defined by the Applicant in the broadest sense, superseding any other implied definitions hereinbefore or hereinafter unless expressly asserted by the Applicant to the contrary, to mean one or more elements selected from the group comprising A, B, . . . and N. In other words, the phrases mean any combination of one or more of the elements A, B, . . . or N including any one element alone or the one element in combination with one or more of the other elements which may also include, in combination, additional elements not listed.
- While various embodiments have been described, it will be apparent to those of ordinary skill in the art that many more embodiments and implementations are possible. Accordingly, the embodiments described herein are examples, not the only possible embodiments and implementations.
Claims (25)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/181,077 US9122958B1 (en) | 2014-02-14 | 2014-02-14 | Object recognition or detection based on verification tests |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/181,077 US9122958B1 (en) | 2014-02-14 | 2014-02-14 | Object recognition or detection based on verification tests |
Publications (2)
Publication Number | Publication Date |
---|---|
US20150235110A1 true US20150235110A1 (en) | 2015-08-20 |
US9122958B1 US9122958B1 (en) | 2015-09-01 |
Family
ID=53798388
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/181,077 Expired - Fee Related US9122958B1 (en) | 2014-02-14 | 2014-02-14 | Object recognition or detection based on verification tests |
Country Status (1)
Country | Link |
---|---|
US (1) | US9122958B1 (en) |
Cited By (46)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150160839A1 (en) * | 2013-12-06 | 2015-06-11 | Google Inc. | Editing options for image regions |
US20150379731A1 (en) * | 2014-06-26 | 2015-12-31 | Amazon Technologies, Inc. | Color name generation from images and color palettes |
CN105809190A (en) * | 2016-03-03 | 2016-07-27 | 南京邮电大学 | Characteristic selection based SVM cascade classifier method |
US9552656B2 (en) | 2014-06-26 | 2017-01-24 | Amazon Technologies, Inc. | Image-based color palette generation |
US9633448B1 (en) | 2014-09-02 | 2017-04-25 | Amazon Technologies, Inc. | Hue-based color naming for an image |
US9652868B2 (en) | 2014-06-26 | 2017-05-16 | Amazon Technologies, Inc. | Automatic color palette based recommendations |
US9659032B1 (en) | 2014-06-26 | 2017-05-23 | Amazon Technologies, Inc. | Building a palette of colors from a plurality of colors based on human color preferences |
WO2017090830A1 (en) * | 2015-11-27 | 2017-06-01 | 연세대학교 산학협력단 | Method for recognizing object on basis of space-object relationship graph, and device therefor |
US9679532B2 (en) | 2014-06-26 | 2017-06-13 | Amazon Technologies, Inc. | Automatic image-based recommendations using a color palette |
US9697573B1 (en) | 2014-06-26 | 2017-07-04 | Amazon Technologies, Inc. | Color-related social networking recommendations using affiliated colors |
US9727983B2 (en) | 2014-06-26 | 2017-08-08 | Amazon Technologies, Inc. | Automatic color palette based recommendations |
US9741137B2 (en) | 2014-06-26 | 2017-08-22 | Amazon Technologies, Inc. | Image-based color palette generation |
US9785649B1 (en) | 2014-09-02 | 2017-10-10 | Amazon Technologies, Inc. | Hue-based color naming for an image |
US9786000B2 (en) * | 2014-10-15 | 2017-10-10 | Toshiba Global Commerce Solutions | Method, computer program product, and system for providing a sensor-based environment |
US9792303B2 (en) | 2014-06-26 | 2017-10-17 | Amazon Technologies, Inc. | Identifying data from keyword searches of color palettes and keyword trends |
CN107506747A (en) * | 2017-09-11 | 2017-12-22 | 重庆大学 | Face identification system and method based on video data characteristic point |
US20180025253A1 (en) * | 2015-05-12 | 2018-01-25 | Lawrence Livermore National Security, Llc | Identification of uncommon objects in containers |
US9898487B2 (en) | 2014-06-26 | 2018-02-20 | Amazon Technologies, Inc. | Determining color names from keyword searches of color palettes |
US9916613B1 (en) | 2014-06-26 | 2018-03-13 | Amazon Technologies, Inc. | Automatic color palette based recommendations for affiliated colors |
US9922050B2 (en) | 2014-06-26 | 2018-03-20 | Amazon Technologies, Inc. | Identifying data from keyword searches of color palettes and color palette trends |
US9996579B2 (en) | 2014-06-26 | 2018-06-12 | Amazon Technologies, Inc. | Fast color searching |
CN108305281A (en) * | 2018-02-09 | 2018-07-20 | 深圳市商汤科技有限公司 | Calibration method, device, storage medium, program product and the electronic equipment of image |
US10073860B2 (en) | 2014-06-26 | 2018-09-11 | Amazon Technologies, Inc. | Generating visualizations from keyword searches of color palettes |
US10120880B2 (en) | 2014-06-26 | 2018-11-06 | Amazon Technologies, Inc. | Automatic image-based recommendations using a color palette |
US10169803B2 (en) | 2014-06-26 | 2019-01-01 | Amazon Technologies, Inc. | Color based social networking recommendations |
US10186054B2 (en) | 2014-06-26 | 2019-01-22 | Amazon Technologies, Inc. | Automatic image-based recommendations using a color palette |
US10223427B1 (en) | 2014-06-26 | 2019-03-05 | Amazon Technologies, Inc. | Building a palette of colors based on human color preferences |
US10235389B2 (en) | 2014-06-26 | 2019-03-19 | Amazon Technologies, Inc. | Identifying data from keyword searches of color palettes |
EP3357019A4 (en) * | 2015-09-30 | 2019-03-27 | The Nielsen Company (US), LLC. | Interactive product auditing with a mobile device |
US10255295B2 (en) | 2014-06-26 | 2019-04-09 | Amazon Technologies, Inc. | Automatic color validation of image metadata |
WO2019088511A1 (en) * | 2017-11-06 | 2019-05-09 | Samsung Electronics Co., Ltd. | Electronic device and method for reliability-based object recognition |
US10430857B1 (en) | 2014-08-01 | 2019-10-01 | Amazon Technologies, Inc. | Color name based search |
CN110533190A (en) * | 2019-07-18 | 2019-12-03 | 武汉烽火众智数字技术有限责任公司 | A kind of data object analysis method and device based on machine learning |
CN110766081A (en) * | 2019-10-24 | 2020-02-07 | 腾讯科技(深圳)有限公司 | Interface image detection method, model training method and related device |
US20200143838A1 (en) * | 2018-11-02 | 2020-05-07 | BriefCam Ltd. | Method and system for automatic object-aware video or audio redaction |
US10650233B2 (en) * | 2018-04-25 | 2020-05-12 | International Business Machines Corporation | Identifying discrete elements of a composite object |
US10691744B2 (en) | 2014-06-26 | 2020-06-23 | Amazon Technologies, Inc. | Determining affiliated colors from keyword searches of color palettes |
US10706334B2 (en) | 2017-02-20 | 2020-07-07 | Alibaba Group Holding Limited | Type prediction method, apparatus and electronic device for recognizing an object in an image |
US20200284609A1 (en) * | 2019-03-05 | 2020-09-10 | International Business Machines Corporation | Alert system for environmental changes |
US10885095B2 (en) * | 2014-03-17 | 2021-01-05 | Verizon Media Inc. | Personalized criteria-based media organization |
EP3826523A1 (en) * | 2018-10-12 | 2021-06-02 | Sony Corporation | A system, method and computer program for verifying features of a scene |
DE102020201939A1 (en) | 2020-02-17 | 2021-08-19 | Robert Bosch Gesellschaft mit beschränkter Haftung | Method and device for evaluating an image classifier |
US11170216B2 (en) * | 2017-01-20 | 2021-11-09 | Sony Network Communications Inc. | Information processing apparatus, information processing method, program, and ground marker system |
US20220001923A1 (en) * | 2020-07-03 | 2022-01-06 | Volvo Truck Corporation | Method for guiding a vehicle |
US11270420B2 (en) * | 2017-09-27 | 2022-03-08 | Samsung Electronics Co., Ltd. | Method of correcting image on basis of category and recognition rate of object included in image and electronic device implementing same |
US11928662B2 (en) * | 2021-09-30 | 2024-03-12 | Toshiba Global Commerce Solutions Holdings Corporation | End user training for computer vision system |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5227639B2 (en) * | 2008-04-04 | 2013-07-03 | 富士フイルム株式会社 | Object detection method, object detection apparatus, and object detection program |
US8311292B2 (en) | 2009-02-09 | 2012-11-13 | Cisco Technology, Inc. | Context aware, multiple target image recognition |
-
2014
- 2014-02-14 US US14/181,077 patent/US9122958B1/en not_active Expired - Fee Related
Cited By (70)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10114532B2 (en) * | 2013-12-06 | 2018-10-30 | Google Llc | Editing options for image regions |
US20150160839A1 (en) * | 2013-12-06 | 2015-06-11 | Google Inc. | Editing options for image regions |
US10885095B2 (en) * | 2014-03-17 | 2021-01-05 | Verizon Media Inc. | Personalized criteria-based media organization |
US10255295B2 (en) | 2014-06-26 | 2019-04-09 | Amazon Technologies, Inc. | Automatic color validation of image metadata |
US9697573B1 (en) | 2014-06-26 | 2017-07-04 | Amazon Technologies, Inc. | Color-related social networking recommendations using affiliated colors |
US20180040142A1 (en) * | 2014-06-26 | 2018-02-08 | Amazon Technologies, Inc. | Color name generation from images and color palettes |
US9652868B2 (en) | 2014-06-26 | 2017-05-16 | Amazon Technologies, Inc. | Automatic color palette based recommendations |
US20150379731A1 (en) * | 2014-06-26 | 2015-12-31 | Amazon Technologies, Inc. | Color name generation from images and color palettes |
US10691744B2 (en) | 2014-06-26 | 2020-06-23 | Amazon Technologies, Inc. | Determining affiliated colors from keyword searches of color palettes |
US9679532B2 (en) | 2014-06-26 | 2017-06-13 | Amazon Technologies, Inc. | Automatic image-based recommendations using a color palette |
US9916613B1 (en) | 2014-06-26 | 2018-03-13 | Amazon Technologies, Inc. | Automatic color palette based recommendations for affiliated colors |
US9727983B2 (en) | 2014-06-26 | 2017-08-08 | Amazon Technologies, Inc. | Automatic color palette based recommendations |
US9898487B2 (en) | 2014-06-26 | 2018-02-20 | Amazon Technologies, Inc. | Determining color names from keyword searches of color palettes |
US10242396B2 (en) | 2014-06-26 | 2019-03-26 | Amazon Technologies, Inc. | Automatic color palette based recommendations for affiliated colors |
US10235389B2 (en) | 2014-06-26 | 2019-03-19 | Amazon Technologies, Inc. | Identifying data from keyword searches of color palettes |
US9792303B2 (en) | 2014-06-26 | 2017-10-17 | Amazon Technologies, Inc. | Identifying data from keyword searches of color palettes and keyword trends |
US9836856B2 (en) | 2014-06-26 | 2017-12-05 | Amazon Technologies, Inc. | Color name generation from images and color palettes |
US11216861B2 (en) | 2014-06-26 | 2022-01-04 | Amazon Technologies, Inc. | Color based social networking recommendations |
US9514543B2 (en) * | 2014-06-26 | 2016-12-06 | Amazon Technologies, Inc. | Color name generation from images and color palettes |
US9659032B1 (en) | 2014-06-26 | 2017-05-23 | Amazon Technologies, Inc. | Building a palette of colors from a plurality of colors based on human color preferences |
US9552656B2 (en) | 2014-06-26 | 2017-01-24 | Amazon Technologies, Inc. | Image-based color palette generation |
US9741137B2 (en) | 2014-06-26 | 2017-08-22 | Amazon Technologies, Inc. | Image-based color palette generation |
US9922050B2 (en) | 2014-06-26 | 2018-03-20 | Amazon Technologies, Inc. | Identifying data from keyword searches of color palettes and color palette trends |
US9996579B2 (en) | 2014-06-26 | 2018-06-12 | Amazon Technologies, Inc. | Fast color searching |
US10223427B1 (en) | 2014-06-26 | 2019-03-05 | Amazon Technologies, Inc. | Building a palette of colors based on human color preferences |
US10049466B2 (en) * | 2014-06-26 | 2018-08-14 | Amazon Technologies, Inc. | Color name generation from images and color palettes |
US10073860B2 (en) | 2014-06-26 | 2018-09-11 | Amazon Technologies, Inc. | Generating visualizations from keyword searches of color palettes |
US10402917B2 (en) | 2014-06-26 | 2019-09-03 | Amazon Technologies, Inc. | Color-related social networking recommendations using affiliated colors |
US10120880B2 (en) | 2014-06-26 | 2018-11-06 | Amazon Technologies, Inc. | Automatic image-based recommendations using a color palette |
US10169803B2 (en) | 2014-06-26 | 2019-01-01 | Amazon Technologies, Inc. | Color based social networking recommendations |
US10186054B2 (en) | 2014-06-26 | 2019-01-22 | Amazon Technologies, Inc. | Automatic image-based recommendations using a color palette |
US10430857B1 (en) | 2014-08-01 | 2019-10-01 | Amazon Technologies, Inc. | Color name based search |
US10831819B2 (en) | 2014-09-02 | 2020-11-10 | Amazon Technologies, Inc. | Hue-based color naming for an image |
US9785649B1 (en) | 2014-09-02 | 2017-10-10 | Amazon Technologies, Inc. | Hue-based color naming for an image |
US9633448B1 (en) | 2014-09-02 | 2017-04-25 | Amazon Technologies, Inc. | Hue-based color naming for an image |
US9786000B2 (en) * | 2014-10-15 | 2017-10-10 | Toshiba Global Commerce Solutions | Method, computer program product, and system for providing a sensor-based environment |
US11127061B2 (en) | 2014-10-15 | 2021-09-21 | Toshiba Global Commerce Solutions Holdings Corporation | Method, product, and system for identifying items for transactions |
US20180025253A1 (en) * | 2015-05-12 | 2018-01-25 | Lawrence Livermore National Security, Llc | Identification of uncommon objects in containers |
US10592774B2 (en) * | 2015-05-12 | 2020-03-17 | Lawrence Livermore National Security, Llc | Identification of uncommon objects in containers |
US11562314B2 (en) | 2015-09-30 | 2023-01-24 | The Nielsen Company (US), LLC | Interactive product auditing with a mobile device |
EP3862948A1 (en) * | 2015-09-30 | 2021-08-11 | The Nielsen Company (US), LLC | Interactive product auditing with a mobile device |
EP3357019A4 (en) * | 2015-09-30 | 2019-03-27 | The Nielsen Company (US), LLC. | Interactive product auditing with a mobile device |
WO2017090830A1 (en) * | 2015-11-27 | 2017-06-01 | 연세대학교 산학협력단 | Method for recognizing object on basis of space-object relationship graph, and device therefor |
CN105809190A (en) * | 2016-03-03 | 2016-07-27 | 南京邮电大学 | Feature selection-based SVM cascade classifier method |
US20220036037A1 (en) * | 2017-01-20 | 2022-02-03 | Sony Network Communications Inc. | Information processing apparatus, information processing method, program, and ground marker system |
US11733042B2 (en) * | 2017-01-20 | 2023-08-22 | Sony Network Communications Inc. | Information processing apparatus, information processing method, program, and ground marker system |
US11170216B2 (en) * | 2017-01-20 | 2021-11-09 | Sony Network Communications Inc. | Information processing apparatus, information processing method, program, and ground marker system |
US10706334B2 (en) | 2017-02-20 | 2020-07-07 | Alibaba Group Holding Limited | Type prediction method, apparatus and electronic device for recognizing an object in an image |
CN107506747A (en) * | 2017-09-11 | 2017-12-22 | 重庆大学 | Face recognition system and method based on video data feature points |
US11270420B2 (en) * | 2017-09-27 | 2022-03-08 | Samsung Electronics Co., Ltd. | Method of correcting image on basis of category and recognition rate of object included in image and electronic device implementing same |
US10977819B2 (en) * | 2017-11-06 | 2021-04-13 | Samsung Electronics Co., Ltd. | Electronic device and method for reliability-based object recognition |
KR102499203B1 (en) * | 2017-11-06 | 2023-02-13 | 삼성전자 주식회사 | Electronic device and method for reliability-based ojbect recognition |
KR20190051230A (en) * | 2017-11-06 | 2019-05-15 | 삼성전자주식회사 | Electronic device and method for reliability-based ojbect recognition |
WO2019088511A1 (en) * | 2017-11-06 | 2019-05-09 | Samsung Electronics Co., Ltd. | Electronic device and method for reliability-based object recognition |
CN108305281A (en) * | 2018-02-09 | 2018-07-20 | 深圳市商汤科技有限公司 | Image calibration method, device, storage medium, program product and electronic equipment |
US10650233B2 (en) * | 2018-04-25 | 2020-05-12 | International Business Machines Corporation | Identifying discrete elements of a composite object |
US20210267435A1 (en) * | 2018-10-12 | 2021-09-02 | Sony Corporation | A system, method and computer program for verifying features of a scene |
EP3826523A1 (en) * | 2018-10-12 | 2021-06-02 | Sony Corporation | A system, method and computer program for verifying features of a scene |
US11527265B2 (en) * | 2018-11-02 | 2022-12-13 | BriefCam Ltd. | Method and system for automatic object-aware video or audio redaction |
US12125504B2 (en) | 2018-11-02 | 2024-10-22 | BriefCam Ltd. | Method and system for automatic pre-recordation video redaction of objects |
US11984141B2 (en) | 2018-11-02 | 2024-05-14 | BriefCam Ltd. | Method and system for automatic pre-recordation video redaction of objects |
US20200143838A1 (en) * | 2018-11-02 | 2020-05-07 | BriefCam Ltd. | Method and system for automatic object-aware video or audio redaction |
US20200284609A1 (en) * | 2019-03-05 | 2020-09-10 | International Business Machines Corporation | Alert system for environmental changes |
US11454509B2 (en) * | 2019-03-05 | 2022-09-27 | International Business Machines Corporation | Alert system for environmental changes |
CN110533190A (en) * | 2019-07-18 | 2019-12-03 | 武汉烽火众智数字技术有限责任公司 | Data object analysis method and device based on machine learning |
CN110766081A (en) * | 2019-10-24 | 2020-02-07 | 腾讯科技(深圳)有限公司 | Interface image detection method, model training method and related device |
DE102020201939A1 (en) | 2020-02-17 | 2021-08-19 | Robert Bosch Gesellschaft mit beschränkter Haftung | Method and device for evaluating an image classifier |
US11897541B2 (en) * | 2020-07-03 | 2024-02-13 | Volvo Truck Corporation | Method for guiding a vehicle |
US20220001923A1 (en) * | 2020-07-03 | 2022-01-06 | Volvo Truck Corporation | Method for guiding a vehicle |
US11928662B2 (en) * | 2021-09-30 | 2024-03-12 | Toshiba Global Commerce Solutions Holdings Corporation | End user training for computer vision system |
Also Published As
Publication number | Publication date |
---|---|
US9122958B1 (en) | 2015-09-01 |
Similar Documents
Publication | Title |
---|---|
US9122958B1 (en) | Object recognition or detection based on verification tests |
US20240070214A1 (en) | Image searching method and apparatus |
Singh et al. | Face detection and recognition system using digital image processing |
US10706334B2 (en) | Type prediction method, apparatus and electronic device for recognizing an object in an image |
Karaoglu et al. | Words matter: Scene text for image classification and retrieval |
US8792722B2 (en) | Hand gesture detection |
Lee et al. | Adaboost for text detection in natural scene |
WO2020082577A1 (en) | Seal anti-counterfeiting verification method, device, and computer readable storage medium |
US12038977B2 (en) | Visual recognition using user tap locations |
US9465813B1 (en) | System and method for automatically generating albums |
US8571332B2 (en) | Methods, systems, and media for automatically classifying face images |
US20150324368A1 (en) | Hierarchical ranking of facial attributes |
US9633284B2 (en) | Image processing apparatus and image processing method of identifying object in image |
Karaoglu et al. | Con-text: text detection using background connectivity for fine-grained object classification |
Nadhan et al. | Smart attendance monitoring technology for industry 4.0 |
Wang et al. | License plate localization in complex scenes based on oriented FAST and rotated BRIEF feature |
Chen et al. | Saliency modeling via outlier detection |
Kumar et al. | A technique for human upper body parts movement tracking |
JP6699048B2 (en) | Feature selecting device, tag related area extracting device, method, and program |
Wang et al. | Efficient iris localization via optimization model |
Marat et al. | Influence of the amount of context learned for improving object classification when simultaneously learning object and contextual cues |
Hao et al. | Color flag recognition based on HOG and color features in complex scene |
Naveen et al. | Pose and head orientation invariant face detection based on optimised aggregate channel feature |
Huffman et al. | Mixed media tattoo image matching using transformed edge alignment |
Gupta et al. | Design and Analysis of an Expert System for the Detection and Recognition of Criminal Faces |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: SOCIAL SWEEPSTER, LLC., INDIANA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CURTIS, TOD JOSEPH;MCGRATH, THOMAS RYAN;JAGACINSKI SCHWEICKERT, KENNETH EDWARD;SIGNING DATES FROM 20140212 TO 20140214;REEL/FRAME:032228/0456 |
|
AS | Assignment |
Owner name: SOCIAL SWEEPSTER, LLC., INDIANA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CURTIS, TOD JOSEPH;MCGRATH, THOMAS RYAN;JAGACINSKI SCHWEICKERT, KENNETH EDWARD;SIGNING DATES FROM 20140212 TO 20140214;REEL/FRAME:032364/0851 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
FEPP | Fee payment procedure |
Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
LAPS | Lapse for failure to pay maintenance fees |
Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
|
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20190901 |