WO2021015667A1

WO2021015667A1 - Image recognition devices and methods of image recognition

Info

Publication number: WO2021015667A1
Application number: PCT/SG2019/050362
Authority: WO
Inventors: Yoriko Kazama
Original assignee: Hitachi, Ltd.
Priority date: 2019-07-25
Filing date: 2019-07-25
Publication date: 2021-01-28

Abstract

According to various embodiments, there is provided a method of image recognition which includes: defining a plurality of classes for a target object, wherein each class is associated with a respective real size range of the target object and a respective set of image features of the target object; detecting a candidate object for the target object in an image; extracting image features from the candidate object; determining a real size of the candidate object based on a size of the candidate in the image and a position of the candidate object in the image; and classifying the candidate object, based on comparing the determined real size of the candidate object to the real size ranges associated with at least one class of the plurality of classes and comparing the extracted image features to the set of image features associated with at least one class of the plurality of classes.

Description

IMAGE RECOGNITION DEVICES AND METHODS OF IMAGE RECOGNITION

TECHNICAL FIELD

[0001] Various embodiments relate to image recognition devices and methods of image recognition.

BACKGROUND

[0002] Surveillance cameras, or closed circuit television (CCTV) are often used to collect data, for example for traffic monitoring or situation awareness. Image analysis technologies, such as object recognition, are required to effectively process the large amount of images collected. However, existing object recognition technologies may erroneously identify objects due to a variety of reasons. For example, the images may be unclear, or the object may be too small in the images. To perform object recognition accurately in such applications, for example, using deep learning methodology, a large quantity of data needs to be processed, including many classification categories and their definitions, huge feature vectors etc. Consequently, the computation power required becomes very large, which may be impractical and costly.

SUMMARY

[0003] According to various embodiments, there may be provided a method of image recognition. The method may include defining a plurality of classes for a target object, wherein each class is associated with a respective real size range of the target object and a respective set of image features of the target object; detecting a candidate object for the target object in an image; extracting image features from the candidate object; determining a real size of the candidate object based on a size of the candidate object in the image and a position of the candidate object in the image; and classifying the candidate object, based on comparing the determined real size of the candidate object to the real size ranges associated with at least one class of the plurality of classes for the target object and comparing the extracted image features to the set of image features associated with at least one class of the plurality of classes. [0004] According to various embodiments, there may be provided an image recognition device. The image recognition device may include: a database storing definitions of a plurality of classes for a target object, wherein each class is associated with a respective real size range of the target object and a respective set of image features of the target object; an image processor configured to detect a candidate object for the target object in an image, wherein the image processor is further configured to extract image features from the candidate object; a determination unit configured to determine a real size of the candidate object based on a size of the candidate object in the image and a position of the candidate object in the image; and a candidate classifier configured to classify the candidate object based on comparing the determined real size of the candidate object to the real size range associated with at least one class of the plurality of classes for the target object and comparing the extracted image features with the set of image features associated with at least one class of the plurality of classes.

[0005] A non-transitory computer readable medium storing instructions which when executed by a computer, makes the computer perform a method of image recognition, the method including: defining a plurality of classes for a target object, wherein each class is associated with a respective real size range of the target object and a respective set of image features of the target object; detecting a candidate object for the target object in an image; extracting image features from the candidate object; determining a real size of the candidate object based on a size of the candidate object in the image and a position of the candidate object in the image; and classifying the candidate object, based on comparing the determined real size of the candidate object to the real size range associated with at least one class of the plurality of classes for the target object and comparing the extracted image features to the set of image features associated with at least one class of the plurality of classes.

BRIEF DESCRIPTION OF THE DRAWINGS

[0006] In the drawings, like reference characters generally refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead generally being placed upon illustrating the principles of the invention. In the following description, various embodiments are described with reference to the following drawings, in which:

[0007] FIG. 1 shows a block diagram of an image recognition device according to various embodiments. [0008] FIG. 2 shows an example of an image captured by a fixed camera.

[0009] FIG. 3 shows a flow diagram of a method of image recognition according to various embodiments.

[0010] FIG. 4 shows a flow diagram of a method of image recognition according to various embodiments.

[0011] FIG. 5 shows a flow diagram of a method of image recognition according to various embodiments.

[0012] FIG. 6 shows a flow diagram of a method of image recognition according to various embodiments.

[0013] FIG. 7 shows a table that stores object information according to various embodiments.

[0014] FIG. 8 shows a representative diagram of the spatial information according to various embodiments.

[0015] FIG. 9 shows a table that stores candidate region information according to various embodiments.

[0016] FIG. 10 shows a representative diagram that illustrates the spatial scale parameters according to various embodiments.

[0017] FIG. 11 shows a representative diagram that shows an example of an image being calibrated using a reference object.

[0018] FIG. 12 shows a representative diagram that shows an example of an image where an object of interest is detected.

[0019] FIG. 13 shows a table that stores classification rates according to various embodiments.

[0020] FIG. 14 is a flow diagram showing a method of image recognition according to various embodiments.

[0021] FIG. 15 is a diagram illustrating an example of a hardware implementation for an apparatus employing a processing system.

[0022] Embodiments described below in context of the devices are analogously valid for the respective methods, and vice versa. Furthermore, it will be understood that the embodiments described below may be combined, for example, a part of one embodiment may be combined with a part of another embodiment.

[0023] It will be understood that any property described herein for a specific device may also hold for any device described herein. It will be understood that any property described herein for a specific method may also hold for any method described herein. Furthermore, it will be understood that for any device or method described herein, not necessarily all the components or steps described must be enclosed in the device or method, but only some (but not all) components or steps may be enclosed.

[0024] In this context, the device as described in this description may include a memory which is for example used in the processing carried out in the device. A memory used in the embodiments may be a volatile memory, for example a DRAM (Dynamic Random Access Memory) or a non-volatile memory, for example a PROM (Programmable Read Only Memory), an EPROM (Erasable PROM), EEPROM (Electrically Erasable PROM), or a flash memory, e.g., a floating gate memory, a charge trapping memory, an MRAM (Magnetoresistive Random Access Memory) or a PCRAM (Phase Change Random Access Memory).

[0025] The term “coupled” (or“connected”) herein may be understood as electrically coupled or as mechanically coupled, for example attached or fixed, or just in contact without any fixation, and it will be understood that both direct coupling or indirect coupling (in other words: coupling without direct contact) may be provided.

[0026] In order that the invention may be readily understood and put into practical effect, various embodiments will now be described by way of examples and not limitations, and with reference to the figures.

[0027] According to various embodiments, an image recognition device may classify an object in an image taken by a fixed camera, at least partially based on an estimated real size of the object. The image may be a two-dimensional image, in other words, may be void of depth information unlike a stereo image. By taking into consideration the size of the object, the classification process may be accurate, or may require less computation power as irrelevant classes may be ignored in the classification process. The image may be captured by a fixed camera, such as a CCTV, which may be installed at a permanent location and may always point to the same spot. The camera parameters, including at least one of its sensor resolution, location, color calibration, look angle, field-of-field may be known, such that the spatial scale parameter of the camera’s images is known.

[0028] In the context of various embodiments,“spatial scale parameter” may refer to the real size that a pixel in the image represents.

[0029] In the context of various embodiments,“real size” may refer to the size of an object in real space, in other words, real life size, in other words, actual size in world units.“Real size” may be used to distinguish from“image size”. Accordingly, the real size of a region in an image may refer to the equivalent size in real space that the region in the image represents or displays.

[0030] In the context of various embodiments,“image size” or“size in the image” may refer to the size of a representation of the object in the image.

[0031] In the context of various embodiments, “candidate object” may be used interchangeably with“candidate”.

[0032] In the context of various embodiments,“target object” may be used interchangeably with“target”.

[0033] According to various embodiments, the image recognition device may work with any type of camera.

[0034] According to various embodiments, the image recognition device may be used to monitor traffic, for example to detect traffic offences.

[0035] According to various embodiments, the image recognition device may be used for security monitoring, for example to detect vehicle intrusions.

[0036] FIG. 1 shows a block diagram of an image recognition device 100 according to various embodiments. The image recognition device 100 may include an image processor 102, a database 104, a determination unit 106 and a candidate classifier 108. The image recognition device 100 may optionally include a color calibration unit 110. The image processor 102, the database 104, the determination unit 106, the candidate classifier and the color calibration unit 110 may be coupled with each other, like indicated by lines 120, for example electrically coupled, for example using a line or a cable, and/or communicatively coupled, for example via direct wired or wireless connection or via indirect connection like through a computing cloud or a server. The individual components of the image recognition device 100 will be described in detail in relation to the other figures.

[0037] FIG. 2 shows an example of an image 200 captured by a fixed camera. Unlike a stereo image that has depth information for each pixel, the image 200 may have projection information at the entire image area. In other words, the image 200 may be void of any depth information. The image 200 may be two-dimensional. The image recognition device 100 may receive the image 200 from the fixed camera, for example, through a direct wired or wireless data transfer, or through a computing cloud or a server. The image 200 may show a scene, for example, a road. The image processor 102 of the image recognition device 100 may detect a plurality of candidate objects 202a to 202i, for a target object. In this example, the intended target object may be a vehicle, such as a car or a motorcycle. Each candidate object may be a cluster of pixels, i.e. occupy a region of the image 100, that the image processor 102 has detected as being possibly a target object. The image processor 102 may detect the plurality of candidate objects 202a to 202i based on conventional object detection techniques such as convolutional neural network (CNN), Region-based CNN (RCNN), YOLO. The image processor 102 may include a neural network that is trained on images of the target object. The image processor 102 may perform a quick and simple detection of objects that only vaguely resemble the target object, as the other components of the image recognition device 100, such as the determination unit 106 and the candidate classifier 108, may perform the higher accuracy portion of the image analysis. As such, the image processor 102 may only need to be trained on a small number of images of the target object.

[0038] FIG. 3 shows a flow diagram of a method of image recognition 300 according to various embodiments. The method of image recognition 300 may include a detection process 302, a feature extraction process 312, a classification process 308, a definition process 304 and a size estimation process 306. The detection process 302 may include detecting at least one candidate objects for a target object, in other words, detecting regions of an image that may show the target object or at least display an object that could be the target. The output of the detection process 302 may be candidate object(s). The feature extraction process 312 may include extracting image features, in other words, imagery characteristics, from the candidate object(s), such as the candidate objects 202a-202i. The image features may include at least one of edges, comers, interest points, blobs, ridges, colors and patterns. The image processor 102 may carry out the detection process 302 and the feature extraction process 312. The output of the feature extraction process 312 may be a feature descriptor, which may be a matrix or a vector. The size estimation process 306 may include estimating the real size of the candidate object. The size estimation process 306 may include determining an image size of the candidate object. Determining the image size of the candidate object may include determining the furthest distance between peripheral pixels in the candidate object along at least one axis. For example, a width of the candidate object may be determined in a horizontal axis by measuring the distance between a leftmost pixel and a rightmost pixel in the candidate object, and a height of the candidate object may be determined in a vertical axis by measuring the distance between a topmost pixel and a bottommost pixel in the candidate object. The real size of the candidate object may be estimated based on the candidate object image size and a spatial scale parameter. The spatial scale parameter may indicate the real size that each pixel of the image represents. The spatial scale parameter may differ along different positions of the image, as the image may show a skewed view of a scene. The output of the size estimation process 306 may be the estimated real size of the candidate object. The size estimation process 306 will be described in further details with respect to FIG. 10. The determination unit 106 may carry out the size estimation process 306.

[0039] The definition process 304 may include defining a plurality of classes for the target object, and storing the definitions in the database 104. The definitions may include a real size range and a set of image features for each class of the plurality of classes. The classes may be different categories for the target object. As an example, if the target object is an object on the road, the plurality of classes may include“Car”,“Bike”,“Truck”, and“Person”. Each class may be associated with a respective real size range which may include a sub-range for a first dimension along a first axis (such as width) and a sub-range for a second dimension along a second axis (such as height). The second axis may be orthogonal to the first axis. Conventional methods of feature extraction, such as blob detection, edge detection, corner detection etc. may be used on a group of images of each class of target object, to define the set of image features for each class. The output of the definition process 304 may be the definitions of each class of the plurality of classes. Each class may include a plurality of sub classes where each sub-class may correspond to a respective view of the target object of the class. Each sub-class may have its associated real size range and set of image features, since most objects may be visually different according to the perspective from which the objects are viewed. For example, the sub-classes may include at least one of front view, left side view, right side view, top view and bottom view.

[0040] The classification process 308 may include receiving the definitions of each class, and selecting the classes that match with the estimated real size of the candidate object. In other words, the estimated real size of the candidate object may be compared to the real size range of each class. The classes that have real size ranges that do not match the estimated real size of the candidate object may be eliminated from subsequent steps in the classification process. The classes that have real size ranges that at least substantially match the estimated real size, may be selected for the subsequent steps in the classification process. The classification process 308 may further include comparing the feature descriptor obtained from the feature extraction process 312, to the set of image features associated with each class of the selected classes, to obtain the likelihood for each class of the selected classes. The likelihoods for each class may indicate the probability that the candidate object contains a target of the respective class. For example, the estimated size of the candidate object may be fairly small such that only the“Bike and“Person” classes may be selected. The likelihoods 314b, 314d may be computed for these selected classes. Computational power may be saved, by not having to compute the likelihoods 314a, 314c, for the eliminated classes“Car” and“Truck”. The classification process 308 may further include classifying the candidate object into the selected class that has the highest likelihood. For example, the likelihood for“Bike” 314b may be 0.9 which is higher than the likelihood for“Person” which may be 0.7. Thus, the classification process 308 may generate a classification output 316 that indicates the candidate object contains a target object belonging to the“Bike” class. The candidate classifier 108 may carry out the classification process 308. The method 300 may use the estimated real size of the candidate object as a filter, to remove irrelevant classes, prior to classifying the candidate object based on feature recognition.

[0041] FIG. 4 shows a flow diagram of a method of image recognition 400 according to various embodiments. Like the method 300, the method of image recognition 400 may also include the detection process 302, the feature extraction process 312, the definition process 304 and the size estimation process 306. The method 400 may differ from the method 300, in that it includes a classification process 408 instead of the classification process 308. In the classification process 408, the likelihoods 314a-314d are computed for each class based on comparing the feature descriptor from the feature extraction process 312 to the set of image features of the respective class. Computing the likelihoods 314a-314d may further include comparing the estimated real size of the candidate object to the real size ranges of each class. The classification process 408 may further include computing a second set of likelihoods 414b and 414d based on comparing the estimated real size of the candidate object to the real size ranges of each class. In the example shown in FIG. 4, the estimated real size of the candidate object may fall within the real size ranges of only“Bike” class and“Person” class. The likelihoods 414b and 414d may then be computed based on the estimated real size of the candidate object and the real size ranges of the qualifying classes“Bike” and“Person”. The likelihood 414b of “Bike” may be higher than the likelihood 414d of “Person”, as the estimated real size of the candidate object may be closer to a median or average real size value for“Bike” as compared to that for“Person”. Alternatively, or additionally, the likelihood 414b of“Bike” may be higher than the likelihood 414d of“Person” because the variance of the real size range of“Bike” may be lower as compared to that for“Person”. The classification process 408 may further include obtaining a classification output 416 based on combining the likelihoods 314a-314d obtained based on features and optionally, real size estimation, with the likelihoods 414b, 414d obtained based on the real size estimation. The combination process may include multiplication and/or addition, or other combination operators. The combination process may include using the likelihoods as inputs to a function. The classification output 416 may indicate the class that the target object contained in the candidate object belongs to. The candidate classifier 108 may carry out the classification process 408. The method 400 may use the estimated real size of the candidate object to refine the classification results obtained based primarily on feature recognition, to improve classification accuracy.

[0042] According to various embodiments, the method of image recognition 300 or 400 may be a computer vision technique.

[0043] According to various embodiments, the method of image recognition 300 or 400 may further include calibrating the colors of the entire image, or at least image regions that contain the candidate objects. Calibrating the colors of the image or regions of the image may include determining a color of a reference object in the image. The real world color, herein referred to as true color, of the reference object may be known, such that the color of the reference object in the image may be compared against the true color of the reference object to obtain a calibration vector. The entire image, or the image regions of interest, may be calibrated for color, by adjusting the color values of each pixel of the image or the image regions of interest, by the calibration vector. The color calibration unit 110 may perform the above described color calibration process.

[0044] According to various embodiments, in the method of image recognition 300 or 400, the process of classifying the candidate object may further include eliminating irrelevant sub classes based on at least one of the extracted image features and the determine real size of the candidate object. The irrelevant sub-classes may also be eliminated based on knowledge of the viewing angle of the camera that captured the image. For example, if the camera is looking at oncoming traffic, the bottom view and the rear view of the target object may be disregarded.

[0045] FIG. 5 shows a flow diagram 500 of a method of image recognition according to various embodiments. Like described above with respect to FIGS. 3 and 4, the method of image recognition may include the detection process 302, the feature extraction process 312, the definition process 304, the size estimation process 306 and a classification process. The detection process 302 may include detecting candidate object regions, herein also referred to as candidates, in the image 200. The detected candidate object regions may be provided to the image processor 102 to extract the feature descriptor 512 and the estimated real size 506 of each candidate objection region. The process of extracting the features to obtain the feature descriptor 512 may be the feature extraction process 312. The process of obtaining the estimated real size 506 may be the size estimation process 306. The classification process may include calculating classification rates 514 for each class, based on the feature descriptor 512, the estimated real size 506, and definitions 504. Alternatively, the classification process may calculate the classification rates 514 based on the image size of the candidates instead of the estimated real size. The classification rates 514 may include the likelihoods of the candidate object belong to the respective classes. The definitions 504 may include the real size ranges of each class. The real size ranges may include buffer ranges. The candidate classifier 108 may compare the feature descriptor 512 and the estimated real size 506, against the definitions 504 using a predefined classification model 528, to obtain the classification rates 514. The classification model 528 may include at least one of deep learning, support vector machine, bootstrap, and random forest. The classification process may further include filtering the classes based on the estimated real size. This may improve classification accuracy and reduce calculation time. For example, while the classification rate 514 for Object 1 may be the highest for Class 10, the estimated real size of Object 1 may not match the real size range of Class 10. As such, the next highest classification rate that matches the estimated real size of Object 1 may be selected, for example, Class 1. For example, the estimated real size of Object 2 may match Class 1 and 10 only, so Class 2 may be filtered out even if the classification rate 514 of Object 2 may be high. Object 2 may be classified as Class 1, which may have the highest classification rate among the classes that are not filtered out based on size. The filtering process may also take into account the expected shape of the classes, which may include the dimensions of the object in more than one axis, for example, length and width.

[0046] FIG. 6 shows a flow diagram of a method of image recognition 600 according to various embodiments. The method 600 may be part of the method of image recognition 300 or 400. The method 600 may include the detection process 302, herein also referred to as “candidate object detection”. The method may include spatial information estimation 650 based on camera parameters 630 and objection information 640. The camera parameters 630 may include information about the camera, for example, at least one of its tilt angle, image sensor resolution, camera calibration, and lens distortion. The camera may be a CCTV set on fixed position such that the camera parameters may be constant. The camera may have a non- vertical camera setting, may have good accuracy in long range view or low camera resolution. The object information 640 may include size and image features of the target object according to various classes, described in further details with respect to FIG. 7. The object information 640 may include, or may be part of, the definitions 304. The spatial information estimation 650 may generate spatial information 660. The spatial information 660 may include the spatial scale parameters of the images captured by the camera, which may depend on a position in the images. The image recognition device may receive a video 670 from the camera. The image processor 102 of the image recognition device may detect candidate objects 202 in a detection process. The determination unit 106 may estimate the real size of the candidate object 202 in a size estimation process 306, also referred herein as candidate area estimation process. The output of the candidate area estimation may be the candidate object information 680 which will be described with respect to FIG. 9.

[0047] FIG. 7 shows a table 700 that stores object information 640 according to various embodiments. The table 700 may include an identifier column 702, a first size column 704, a first range column 706, a second size column 708, and a second range column 710. The table 700 may include, or may be part of, the definitions 504. Each row in the table 700 may relate to a respective class. The identifier column 702 may store descriptions for the plurality of classes, for example“Car”,“Bike” etc. The first size column 704 may store a first dimension of objects in the class. The second size column 708 may store a second dimension of objects in the class. The first dimension and the second dimension may be length, width, or height. The first range column 706 may store a range for the first dimension. The second range column 710 may store a range for the second dimension.

[0048] FIG. 8 shows a representative diagram 800 of the spatial information 660 according to various embodiments. The image 200 may include regions 880 where the target object may not appear. The image recognition device may mask out these regions 880, to reduce the computation workload, as well as to increase the classification accuracy by not considering objects that area in these regions 880. The image recognition device may limit the detection of candidates to the relevant region 882. For example, the image 200 may be a photograph of an expressway and the target may be a vehicle that travels on the expressway. The relevant region 882 may show the expressway. Since the vehicle cannot appear outside of the expressway in the photograph, any objects that appear in the regions 880 are not objects of interest. The gridlines 884 may be indicative of the spatial scale parameters in the image 200. The spatial scale parameters will be described subsequently with respect to FIG. 10. [0049] FIG. 9 shows a table 900 that stores candidate region information 680 according to various embodiments. The table 900 may include an identifier column 902, a first size column 904, a first range column 906, a second size column 908, and a second range column 910. Each row in the table 900 may relate to a respective candidate region 202. The identifier column 902 may store a unique identifier for each of the candidate regions 202. The first size column 904 may store a first dimension of the candidate regions 202. The second size column 908 may store a second dimension of the candidate regions 202. The first dimension and the second dimension may be length, width, or height. The first dimension and the second dimension may be the estimated real size of the candidate regions 202. The first range column 906 may store a range for the first dimension. The second range column 910 may store a range for the second dimension.

[0050] FIG. 10 shows a representative diagram 1000 that illustrates the spatial scale parameters according to various embodiments. A candidate object 202 may be detected in the image 200. The candidate object 202 may include the entire target object so that the real size of the target object may be approximated based on the image size of the candidate object 202. The spatial scale parameters may include different image sizes that correspond to a base size (for example: lm) in world units, at different positions in the image 200. For example, at a lower end of the image 200, the image length 1020a represents the base size, and at a middle section of the image 200, the image length 1020b may represent the same base size. The image length 1020a may be longer than the image length 1020b that represents that same base size. Similarly, the image height 1020c that represents the base size may be shorter than the image height 1020d that represents that same base size, because the image height 1020c is at an upper section of the image 200 whereas the image height 1020d is at a lower section of the image 200. The differences in the spatial scale parameters may be a result of the image being captured at a skewed angle. As such, computation of the real size of the candidate object 202 may require both the image size of the candidate object 202, as well as the position of the candidate object 202 in the image 200.

[0051] FIG. 11 shows a representative diagram 1100 that shows an example of an image being calibrated using a reference object 1102. In the example shown in FIG. 11, the image shows a road and the reference object 1102 is a traffic information board. The reference object 1102 may be a known object, i.e. the real world visual characteristics of reference object 1102 are known and may be stored in the database 104. The visual characteristics may include color, and may further include size and shape. The color of the reference object 1102 may be sampled from the image, and stored as a referential marker:

RGB’r = [r’i, g’i, b’J

The referential marker may be compared against the color of the reference object 1102 as obtained from the known visual characteristics of the reference object 1102. A correction factor may be computed as follows:

RGB correction factor = RGB \ / RGB_r

where RGB_r is the known colour (i.e. real world colour) of the reference object 1102. The correction factor may be applied to the entire image, or at least to the detected candidate objects, by for example, multiplying the correction factor to each pixel in the image or in the candidate objects. The image may be calibrated for colour, with the application of the correction factor. Colour is often important for object recognitions, and thus the calibration of the image for colour may significantly improve the accuracy of the classification process.

[0052] FIG. 12 shows a representative diagram 1200 that shows an example of an image where an object of interest is detected. In this example, the object of interest may be a car. The region of the image where the object of interest appears may be referred herein as a candidate 1202. The database 104 may store definitions 304 of a plurality of classes with respect to each side/view of the target object. The characteristics of the plurality of views of each class may be stored as sub-classes. Each sub-class may contain information on a respective view of the target. However, very often in reality, the images taken by a fixed camera may only capture one or two views of the target by virtue of the viewing angle of the camera with respect to a scene where the target appears. In the example shown, the camera is positioned above and in front of a road such that vehicles travelling on the road head towards the camera. As a result, the images captured by the camera primarily show the front view and top view of the vehicles. Consequently, the side and back views of the vehicles may hardly appear in the images and may be of little value to the candidate classification process. The image recognition device may determine the irrelevant views and may eliminate the irrelevant views, i.e. sub-classes from being considered in the classification process. As a result, the computing cost of the classification process may be lowered and the accuracy may be improved. The irrelevant sub-classes may be eliminated based on at least one of the extracted image features and the determined real size of the candidate object 1202.

[0053] FIG. 13 shows a table 1300 that stores classification rates according to various embodiments. The table 1300 may include a sub-class column 1302, a front view column 1304, a back view column 1306, a side view column 1308, and a top view column 1310. The sub-class column 1302 may store a descriptor for the sub-class, for example, to specify the view of the target object. Each of the columns 1304, 1306, 1308 and 1310 may store the classification rates for the respective views. For example, if the image recognition device has determined that the back view and the side views are irrelevant views that do not appear in the image, these two sub-classes may be disregarded by the candidate classifier 108.

[0054] FIG. 14 is a flow diagram 1400 showing a method of image recognition according to various embodiments. The method may include elements 1402, 1404, 1406, 1408 and 1410. Element 1402 may include defining a plurality of classes for a target object. Each class of the plurality of classes may be associated with a respective real size range of the target object and a respective set of image features of the target object. Element 1404 may include detecting a candidate object for the target object in a two-dimensional image. Element 1406 may include extracting image features from the candidate object. Element 1408 may include determining a real size of the candidate object based on a size of the candidate object in the two- dimensional image and a position of the candidate object in the image. Element 1410 may include classifying the candidate object, based on comparing the determined real size of the candidate object to the real size range associated with at least one class of the plurality of classes for the target object and comparing the extracted image features to the set of the image features associated with at least one class of the plurality of classes.

[0055] In other words, the method of image recognition may include defining a plurality of classes for a target object, detecting a candidate object for the target object in an image, extracting image features from the candidate object, determining a real size of the candidate object, and classifying the candidate object. The image may be a two-dimensional image, and may be captured by a fixed camera. The fixed camera may be configured to capture images at a fixed angle and at a fixed location. The image may be void of depth information. Defining the plurality of classes of the target object may include storing the respective real size ranges of the target object for each class in a database, and may also include storing the respective set of image features of the target object for each class in the database. As an example, the target object may be an object on a road. The target object may be any one of a person, a car, a truck, a bicycle, or a motorcycle. The plurality of classes may be defined accordingly to the possibilities of the target object, i.e.“person”,“car”,“truck”,“bicycle” and“motorcycle”. Each of these classes may have different real size dimensions. These real size dimensions may be stored in the database and may be associated with the respective classes. Each of these classes may also have different image features, for example, different shapes, edges, and colors. These image features may also be stored in the database and may be associated with the respective classes. Each class may include a plurality of sub-classes for different views of the target object, such as front view, side views, rear view and top view. Each sub class may be associated with its respective set of image features and real size dimensions. These definitions of each sub-class may be stored in the database.

[0056] Detecting a candidate object for the target object may include detecting a region of interest in the image that may contain the candidate object. The candidate object may belong to one of the plurality of classes for the target object. Extracting image features from the candidate object may include identifying and storing information on features such as edges, shapes, colors, lines and blobs that appear in the region of interest. Determining the real size of the candidate object may include determining a size of the candidate object in the image and a position of the candidate object in the image. The real size of the candidate object may be computed based on a spatial scale parameter and the determined size of the candidate object. The spatial scale parameter may depend on the position of the candidate object in the image. Determining the real size of the candidate object may include determining a distance of the candidate object from a camera that captured the image, based on the position of the candidate object in the image. The spatial scale parameter may be determined based on the distance of the candidate object from the camera. The image may include a reference object. The real size of the candidate object may be known. The real size of the candidate object may be determined by comparing the size of the candidate object in the image to the size of the reference object in the image. Alternatively, or additionally, the real size of the candidate object may be determined by comparing the position of the candidate object in the image to a position of the reference object in the image. For example, the reference object may be a traffic signboard, a landmark or any other object whose location is known. For example, the reference object may be a pathway such as a road, an expressway or a corridor, where the target object may be constrained to moving along. The color of the reference object may be known, such that the known color may be used in calibrating color values of the candidate object. Classifying the candidate object may include selecting at least one class associated with real size ranges that at least substantially match the determined real size, and comparing the extracting image features to only the set of image features associated with the at least one class. Alternatively, classifying the candidate object may include computing a respective likelihood of the candidate object belonging to each class of the plurality of classes based on comparing the extracted image features to the respective set of image features associated with the class, and rating each computed likelihood based on comparing the determined real size range associated with the respective class. Classifying the candidate object may include eliminating irrelevant sub-classes of at least one class based on at least one of the extracted image features and the determined real size. The candidate object may also be classified at least partially based on calibrated color values of the candidate object.

[0057] FIG. 15 is a diagram 1500 illustrating an example of a hardware implementation for an apparatus 100' employing a processing system 1502. In one embodiment, the apparatus 100’ may be the image recognition device 100 described above with reference to FIG. 1. The processing system 1502 may be implemented with a bus architecture, represented generally by the bus 1524. The bus 1524 may include any number of interconnecting buses and bridges depending on the specific application of the processing system 1502 and the overall design constraints. The bus 1524 links together various circuits including one or more processors and/or hardware components, represented by the processor 1504, the components 102, 104, 106, 108, 110 and the computer-readable medium / memory 1506. The bus 1524 may also link various other circuits such as timing sources, peripherals, voltage regulators, and power management circuits, which are well known in the art, and therefore, will not be described any further.

[0058] While embodiments of the invention have been particularly shown and described with reference to specific embodiments, it should be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims. The scope of the invention is thus indicated by the appended claims and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced. It will be appreciated that common numerals, used in the relevant drawings, refer to components that serve a similar or the same purpose.

[0059] It will be appreciated to a person skilled in the art that the terminology used herein is for the purpose of describing various embodiments only and is not intended to be limiting of the present invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.

[0060] It is understood that the specific order or hierarchy of blocks in the processes / flowcharts disclosed is an illustration of exemplary approaches. Based upon design preferences, it is understood that the specific order or hierarchy of blocks in the processes / flowcharts may be rearranged. Further, some blocks may be combined or omitted. The accompanying method claims present elements of the various blocks in a sample order, and are not meant to be limited to the specific order or hierarchy presented.

[0061] The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein, but is to be accorded the full scope consistent with the language claims, wherein reference to an element in the singular is not intended to mean“one and only one” unless specifically so stated, but rather“one or more.” The word“exemplary” is used herein to mean“serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects. Unless specifically stated otherwise, the term“some” refers to one or more. Combinations such as “at least one of A, B, or C,”“one or more of A, B, or C,”“at least one of A, B, and C,”“one or more of A, B, and C,” and“A, B, C, or any combination thereof’ include any combination of A, B, and/or C, and may include multiples of A, multiples of B, or multiples of C. Specifically, combinations such as“at least one of A, B, or C,”“one or more of A, B, or C,” “at least one of A, B, and C,”“one or more of A, B, and C,” and“A, B, C, or any combination thereof’ may be A only, B only, C only, A and B, A and C, B and C, or A and B and C, where any such combinations may contain one or more member or members of A, B, or C. All structural and functional equivalents to the elements of the various aspects described throughout this disclosure that are known or later come to be known to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims. Moreover, nothing disclosed herein is intended to be dedicated to the public regardless of whether such disclosure is explicitly recited in the claims. The words “module,”“mechanism,”“element,”“device,” and the like may not be a substitute for the word“means.” As such, no claim element is to be construed as a means plus function unless the element is expressly recited using the phrase“means for.”

Claims

1. A method of image recognition, the method comprising:

defining a plurality of classes for a target object, wherein each class is associated with a respective real size range of the target object and a respective set of image features of the target object;

detecting a candidate object for the target object in a two-dimensional image;

extracting image features from the candidate object;

determining a real size of the candidate object based on a size of the candidate object in the two-dimensional image and a position of the candidate object in the two-dimensional image; and

classifying the candidate object, based on comparing the determined real size of the candidate object to the real size range associated with at least one class of the plurality of classes for the target object and comparing the extracted image features to the set of image features associated with at least one class of the plurality of classes.

2. The method of claim 1, wherein determining the real size of the candidate object comprises determining a distance of the candidate object from a camera that captured the image based on the position of the candidate object in the two-dimensional image.

3. The method of any one of claims 1 to 2, wherein classifying the candidate object comprises selecting at least one class associated with real size ranges that at least substantially match the determined real size, and comparing the extracted image features to only the set of image features associated with the selected at least one class.

4. The method of any one of claims 1 to 2, wherein classifying the candidate object comprises computing a respective likelihood of the candidate object belonging to each class of the plurality of classes based on comparing the extracted image features to the respective set of image features associated with the class, and rating each computed likelihood based on comparing the determined real size to the respective real size range associated with the respective class.

5. The method of any one of claims 1 to 4, wherein the two-dimensional image is captured by a fixed camera, the fixed camera being configured to capture images at a fixed angle and at a fixed location.

6. The method of any one of claims 1 to 5, further comprising:

calibrating color values of the candidate object based on a known color of a reference object in the two-dimensional image; and

classifying the candidate object further based on the calibrated color values of the candidate object.

7. The method of any one of claims 1 to 6, wherein each class comprises sub-classes for different views of the target object, wherein classifying the candidate object comprises eliminating irrelevant sub-classes based on at least one of the extracted image features and the determined real size.

8. The method of any one of claims 1 to 7, wherein determining the real size of the candidate object comprises comparing the position of the candidate object in the two- dimensional image to a position of a reference object in the two-dimensional image, wherein the reference object is a pathway, wherein the target object is constrained to moving along the pathway.

9. The method of any one of claims 1 to 8, wherein determining the real size of the candidate object comprises comparing the size of the candidate object in the image to a size of a reference object in the two-dimensional image.

10. An image recognition device comprising:

a database storing definitions of a plurality of classes for a target object, wherein each class is associated with a respective real size range of the target object and a respective set of image features of the target object;

an image processor configured to detect a candidate object for the target object in a two-dimensional image,

wherein the image processor is further configured to extract image features from the candidate object; a determination unit configured to determine a real size of the candidate object based on a size of the candidate object in the two-dimensional image and a position of the candidate object in the two-dimensional image; and

a candidate classifier configured to classify the candidate object based on comparing the determined real size of the candidate object to the real size range associated with at least one class of the plurality of classes for the target object and comparing the extracted image features with the set of image features associated with at least one class of the plurality of classes.

11. The image recognition device of claim 10, wherein the determination unit is configured to determine a distance of the candidate object from a camera that captured the two-dimensional image based on the position of the candidate object in the two-dimensional image.

12. The image recognition device of any one of claims 10 to 11, wherein the candidate classifier is configured to select at least one class associated with real size ranges that at least substantially match the determined real size, and further configured to compare the extracted image features to only the set of image features associated with the selected at least one class.

13. The image recognition device of any one of claims 10 to 12, wherein the candidate classifier is configured to compute a respective likelihood of the candidate object belonging to each class of the plurality of classes based on comparing the extracted image features to the respective set of image features associated with the class, and further configured to rate each computed likelihood based on comparing the determined real size to the respective real size range associated with the respective class.

14. The image recognition device of any one of claims 10 to 13, wherein the two- dimensional image is captured by a fixed camera, the fixed camera being configured to capture images at a fixed angle and at a fixed location.

15. The image recognition device of any one of claims 10 to 14, further comprising: a color calibration unit configured to calibrate color values of the candidate object based on a known color of a reference object; and classifying the candidate object further based on the calibrated color values of the candidate object.

16. The image recognition device of any one of claims 10 to 15, wherein each class comprises sub-classes for different views of the target object, wherein the candidate classifier is configured to eliminate irrelevant sub-classes based on at least one of image features and the determined real size.

17. The image recognition device of any one of claims 10 to 16, wherein the determination unit is configured to compare the position of the candidate object in the image to a position of a reference object in the two-dimensional image, wherein the reference object is a pathway, wherein the target object is constrained to moving along the pathway.

18. A non-transitory computer readable medium storing instructions which when executed by a computer, makes the computer perform a method of image recognition, the method comprising:

detecting a candidate object for the target object in a two-dimensional image;

extracting image features from the candidate object;