WO2018059629A1

WO2018059629A1 - Detection and validation of objects from sequential images from a camera by means of homographies

Info

Publication number: WO2018059629A1
Application number: PCT/DE2017/200100
Authority: WO
Inventors: Michael Walter
Original assignee: Conti Temic Microelectronic Gmbh
Priority date: 2016-09-29
Filing date: 2017-09-28
Publication date: 2018-04-05
Also published as: DE112017003463A5; DE102016218851A1

Abstract

The invention relates to a method and to a device for identifying objects from images from a camera and can be used in particular in camera-based driver assistance systems. The method for detecting objects from a series of images from a vehicle camera comprises the following steps: a) capturing a series of images by means of the vehicle camera, b) determining corresponding features in two consecutive images, d) associating determined adjacent corresponding features in an image region with a plane in space, and f) determining additional corresponding features in the image region while taking into account the associated plane. The invention offers the advantage of densification of the optical flow field.

Description

DETECTION AND VALIDATION OF OBJECTS FROM SEQUENTIAL IMAGES OF A CAMERA BY HOMOGRAPHIES

The invention relates to a method for the detection of objects from images of a camera and can be used in particular in camera-based driver assistance systems.

Vehicle recognition systems according to the current state of the art are mostly classification-based. Classification-based systems can recognize vehicles or vehicle components that they have seen in their training data. However, new vehicle designs and changing structures can greatly reduce system performance and call for generic approaches to object recognition.

EP 2 993 654 A1 shows a method for forward collision warning (FCW) from camera images. Here, the image section that the ego vehicle will reach within a given time interval is analyzed. If an object is detected there, a collision warning is issued.

US 2014/0161323 A1 shows a method for generating dense three-dimensional structures in a road environment from images taken with a monocamera.

It is an object of the present invention to provide an improved method for detecting objects.

A starting point of the invention is the following consideration: if the camera positions of two frames are known, point correspondences (corresponding feature points) can be triangulated, but no objects are generated, because triangulation has no model knowledge that could cluster a point cloud into meaningful objects. Disadvantages of monocular systems are that objects close to the epipole can only be triangulated inaccurately and that even the smallest errors in the ego-motion (camera self-motion) become noticeable. An epipole is the pixel in a first camera image at which the camera center at a second point in time is imaged. During straight-ahead driving, for example, the vanishing point corresponds to the epipole. However, this is precisely the region that is relevant for detecting collisions with stationary vehicles or vehicles driving ahead. Dynamic objects can be triangulated if they move in accordance with the epipolar geometry. However, they are estimated too close or too far away because their relative speed is unknown.

If, instead of individual correspondences, several (neighboring) correspondences (corresponding features) are considered, objects can be segmented on the basis of differing speeds, scaling and deformation.

An inventive method for the detection of objects from a sequence of images of a vehicle camera comprises the steps:

a) taking a sequence of images with the vehicle camera, b) determining corresponding features in two successive images,

d) assignment of (adjacent) determined corresponding features in an image area to a plane in space, and f) determination of additional corresponding features in the image area, taking into account the plane assigned in step d).

Preferably, the vehicle camera is designed to capture the surroundings of a vehicle, in particular the surroundings in front of the vehicle. Preferably, the vehicle camera is integrated into, or can be connected to, a driver assistance device, the driver assistance device being designed in particular for object recognition from the image data provided by the vehicle camera device. Preferably, the vehicle camera device is a camera to be arranged in the interior of the motor vehicle behind the windshield and directed in the direction of travel. Particularly preferably, the vehicle camera is a monocular camera. Preferably, individual images are captured with the vehicle camera at specific or known points in time, resulting in a sequence of images.

A correspondence is the match between a feature in a first image and the same feature in a second image. Corresponding features in two images can also be described as a flow vector indicating how the feature has shifted in the image. In particular, a feature may be an image patch, a pixel, an edge or a corner.

Step d) also covers the case in which a plurality of planes is specified in space and the (adjacent) corresponding features are assigned to one of the specified planes (see steps d2) and d3) below).

The term "plane" in the context of the present invention describes the following relationships: on the one hand, a criterion for the accumulation of adjacent corresponding features, i.e. these are considered to belong together if they lie in a common plane in space and evolve over time according to the motion of that plane.

Corresponding features accumulated in this way are referred to, for example, as the "ground plane", since they all lie in the plane corresponding to the road plane. However, such a ground plane does not extend to infinity; rather, it denotes a subregion of the plane, namely the one in which corresponding features are actually located.

In step f), the phrase "taking into account the associated plane" means that the plane assigned to an image area in step d) is taken into account when determining additional corresponding features. For example, a motion of the associated plane is determined from the corresponding features already determined in an image area, and this motion is then used to predict where features of the same image area in a first image will be found in a (subsequent) second image as a result of the movement of the associated plane.

For example, the term "detection of objects" may mean a generation of object hypotheses or objects.

According to a preferred embodiment, the method comprises the step:

c) Calculation of homographies for the determined corresponding features in an image area, so that they can be assigned to a plane in space.

A homography describes the correspondence of points on a plane between two camera positions, or the correspondence of two points in two consecutive images of the vehicle camera. By calculating homographies for the determined corresponding features in an image area, the assignment to a plane in space can thus take place (see step d)). The homographies are preferably determined generically from the image or from successive images. Specifying the distance of a plane from the camera position in advance is typically not required.

Advantageously, the method comprises the steps:

d2) assignment of the determined corresponding features to one of a plurality of planes of predetermined orientation in space, and

e) Assignment to the plane in space that yields the smallest backprojection error for the determined corresponding features, the backprojection error indicating the difference between the measured correspondence of a feature in two successive images and the position of the feature predicted from the computed homography. In other words, the backprojection error of a plane describes the difference between a point x at time t-0 and the corresponding point at the previous time t-1 mapped by the homography of that plane (see Equation 4 below).
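To make step e) concrete, here is a minimal Python sketch of the backprojection error and the resulting plane choice. It assumes normalized image coordinates, homographies stored as 3x3 nested lists, and the Euclidean norm as the error measure; the function names and the two example homographies (built as I - (t/d) n' with t/d = (0, 0, 0.1)) are hypothetical illustrations, not taken from the source.

```python
import math

Matrix = list[list[float]]
Point = tuple[float, float]


def project(H: Matrix, p: Point) -> Point:
    """Map a point from time t-1 to time t-0: x_t0 = H x_t1, then divide by w."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return u / w, v / w


def backprojection_error(H: Matrix, p_t0: Point, p_t1: Point) -> float:
    """Euclidean distance between the measured point at t-0 and its prediction."""
    return math.dist(p_t0, project(H, p_t1))


def assign_plane(homographies: dict[str, Matrix], p_t0: Point, p_t1: Point) -> str:
    """Step e): choose the plane whose homography yields the smallest error."""
    return min(homographies,
               key=lambda name: backprojection_error(homographies[name], p_t0, p_t1))


# Hypothetical homographies (no rotation, t/d = (0, 0, 0.1)):
H_GROUND = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, -0.1, 1.0]]  # n' = [0 1 0]
H_BACK = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.9]]     # n' = [0 0 1]

p_t1 = (0.2, 0.1)
p_t0 = project(H_BACK, p_t1)  # a point that really lies on the backplane
print(assign_plane({"ground": H_GROUND, "back": H_BACK}, p_t0, p_t1))  # back
```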

In particular, based on the calculated homographies, the corresponding features can be segmented, that is, assigned to different image regions (or segments). In step f), it is then possible to determine additional corresponding features in an image area taking into account the assigned plane.

An advantageous development of the method comprises step d3): assignment of (adjacent) corresponding features to a ground plane, a backplane or a sidewall plane, respectively. In a coordinate system in which the x-direction is horizontal or lateral, the y-direction vertical and the z-direction along the vehicle's longitudinal axis, a ground plane normal to the y-direction, a backplane normal to the z-direction and a sidewall plane normal to the x-direction can be specified.

By calculating homographies of a ground plane, a backplane and a sidewall plane, an assignment to one of these planes can be made for the corresponding features in an image area, or for each corresponding feature. Advantageously, the distance (from the vehicle camera), in particular to a backplane or sidewall plane, is a result of the homography calculation rather than a predefined assumption.

Preferably, the homographies can be calculated for the backplane according to Equation (10), for the ground plane according to Equation (9), or for the sidewall plane according to Equation (11). Here, a, b, c are constants; x_0, y_0, x_1, y_1 denote correspondences in the first image (index 0), taken at a time t-0, and in a second image (index 1), taken at an earlier time t-1; and t_x, t_y, t_z are the components of the vector t/d, where t describes the translation of the vehicle camera and d the distance to a plane (perpendicular to this plane), i.e. along the normal vector of this plane. The components t_x, t_y and t_z are also referred to below as "inverse TTC". TTC stands for "time to collision" and is obtained, for a given spatial direction, as distance divided by translational speed.
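The inverse TTC is a per-frame quantity, so converting it into a time in seconds needs the frame interval. A minimal worked example, assuming a pure longitudinal approach, a positive t_z meaning "approaching", and hypothetical values for speed, frame rate and distance:

```python
import math


def inverse_ttc(translation_per_frame_m: float, distance_m: float) -> float:
    """One component of t/d: camera translation between two frames over plane distance."""
    return translation_per_frame_m / distance_m


def ttc_seconds(inv_ttc: float, frame_interval_s: float) -> float:
    """TTC = distance / speed. Because t is a per-frame translation, TTC = dt / (t/d)."""
    if inv_ttc <= 0.0:
        # Assumed sign convention: only positive values mean the plane is approaching.
        return math.inf
    return frame_interval_s / inv_ttc


# Hypothetical numbers: 10 m/s closing speed, 25 frames per second, backplane 20 m ahead.
frame_interval = 1.0 / 25.0
t_z = inverse_ttc(10.0 * frame_interval, 20.0)  # 0.02 per frame
print(ttc_seconds(t_z, frame_interval))         # about 2.0 s, i.e. 20 m / 10 m/s
```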

Advantageously, starting from an already determined division of a second image into different image regions, each with an associated plane, first or a few mutually corresponding features can be determined in an image region in the second image and a subsequent third image. From these first few corresponding features, a homography for this image region is then recalculated, and the newly calculated homography is used to predict the position and shape of additional corresponding features in the third image. Preferably, if not enough (first) corresponding features can be determined in an image region of the subsequent third image to recalculate a homography, the homography calculated for that image region from the first and second images can be used to predict the position of additional corresponding features of the image region in the third image.
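A minimal sketch of this prediction step with its fallback. The homography estimator is passed in as a callable (for example a four-point DLT or one of the plane models); `predict_positions`, the default `min_matches=4` (the minimum for a full homography, fewer would do for the plane models) and the callable interface are hypothetical choices for illustration.

```python
from typing import Callable

Matrix = list[list[float]]
Point = tuple[float, float]
Match = tuple[Point, Point]  # (point in second image, point in third image)


def project(H: Matrix, p: Point) -> Point:
    """Apply H to (x, y, 1) and normalize the homogeneous result."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def predict_positions(
    points: list[Point],
    first_matches: list[Match],
    estimate: Callable[[list[Match]], Matrix],
    previous_h: Matrix,
    min_matches: int = 4,
) -> list[Point]:
    """Predict where features of an image region appear in the third image.

    Re-estimate the region's homography from the first few matches if there are
    enough of them; otherwise fall back to the homography from the previous pair.
    """
    H = estimate(first_matches) if len(first_matches) >= min_matches else previous_h
    return [project(H, p) for p in points]
```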

This makes the correspondence search more robust against changes in shape and scale. According to a preferred embodiment, for each image area with an associated plane, a current image can be warped (projected or transformed) onto a previous image in accordance with the computed homography (of the associated plane), in order to determine additional mutually corresponding features in the previous and the current image. This embodiment leads to a densification of the optical flow field.

According to an advantageous development, if several planes with identical orientation occur, the planes with identical orientation can be separated on the basis of the associated t_x, t_y, t_z values. For example, two backplanes at different distances from the vehicle camera in the z-direction can be distinguished from each other by different t_z values.
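A one-dimensional grouping over t_z is one simple way to carry out this separation. The sketch below is such an assumption (gap-based grouping with a hypothetical threshold `max_gap`), not the method prescribed by the source; the segment names and t_z values are hypothetical.

```python
def split_by_inverse_ttc(segments: dict[str, float], max_gap: float) -> list[list[str]]:
    """Group backplane segments by their t_z value.

    Segments are sorted by t_z; a new group starts wherever the gap to the
    previous value exceeds max_gap.
    """
    groups: list[list[tuple[str, float]]] = []
    for name, tz in sorted(segments.items(), key=lambda item: item[1]):
        if groups and tz - groups[-1][-1][1] <= max_gap:
            groups[-1].append((name, tz))
        else:
            groups.append([(name, tz)])
    return [[name for name, _ in group] for group in groups]


# Hypothetical t_z values: two parts of a van close ahead, and a more distant gantry sign.
print(split_by_inverse_ttc({"van_upper": 0.020, "van_lower": 0.021, "gantry_sign": 0.005}, 0.003))
# [['gantry_sign'], ['van_upper', 'van_lower']]
```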

Preferably, an image can be subdivided by a grid into uniform cells, and a homography can be calculated for each cell from the corresponding features determined in it. Cells with consistent homographies can then be clustered.

Preferably, if the calculated homography of a first cell does not sufficiently match the homography of a neighboring cell, the so-called backprojection errors of the individual corresponding features can advantageously be considered in order to determine a plane boundary.

Corresponding features can be evaluated by their backprojection error. The backprojection error indicates the difference between the measured flow and the flow predicted from the computed homography.

If the backprojection error of a corresponding feature in a first cell is compared with its backprojection errors under the homographies of the neighboring cells, and the corresponding feature is assigned to the homography with the smallest error, the plane boundary (or cluster boundary) within the first cell can be refined. In this way, different corresponding features of one cell can be assigned to different planes.

Preferably, the assignment of planes to adjacent corresponding features can be determined for substantially the entire image of the vehicle camera (for example for at least 80% of the image area, preferably at least 90%). Since the method according to the invention can be implemented to run very fast, generic object detection or scene interpretation is possible for almost the entire image in real time.

The invention further provides a device for detecting objects from a sequence of images of a vehicle camera, comprising a camera control unit and evaluation electronics. The camera control unit is designed to

a) record a sequence of images with the vehicle camera. The evaluation electronics are designed to

b) determine corresponding features in an image area in two successive images,

d) assign the determined corresponding features in an image area to a plane in space, and

f) determine additional corresponding features in the image area taking into account the associated plane.

The camera control unit or the evaluation electronics may in particular comprise a microcontroller or processor, a digital signal processor (DSP), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) and the like, as well as software for carrying out the corresponding control or evaluation steps. The present invention can thus be implemented in digital electronic circuitry, computer hardware, firmware or software.

Further features, advantages and effects of the invention will become apparent from the following description of preferred embodiments of the invention. The figures show:

Fig. 1 shows schematically a typical deformation of an approaching backplane;

Fig. 2 schematically shows a typical deformation of an approaching ground plane;

Fig. 3 shows schematically a typical deformation of a) a quickly and b) a slowly approaching or more distant backplane;

Fig. 4 shows schematically the subdivision of an image with two different segments into cells;

Fig. 5 shows segmentation results after a third iteration step;

Fig. 6 shows plane orientation for target validation (validation of potential collision objects);

Fig. 8 shows the projection (or warping) of the guardrail segment at time t-0 (right) to t-1 (left).

Corresponding parts are generally provided in all figures with the same reference numerals.

In Fig. 1, a backplane is shown schematically, which occupies the hatched region (20, dotted line) at a first time t-1. At a subsequent time t, the distance between the vehicle camera and the backplane has decreased, which leads to the deformation of the region (21, solid line) of the backplane in the image indicated by the arrows (d1). The region (20; 21) scales up, i.e. becomes larger, as a result of the relative movement of the vehicle camera toward the backplane.

Fig. 2 schematically shows a ground plane, which occupies the hatched region (30, dotted line) at a first time t-1. This could be a section of road surface on which the vehicle is traveling. As a result of the ego-motion of the vehicle camera, the region (in the image) changes at a subsequent time t, which leads to the deformation of the region (32) of the ground plane sketched by the arrows (d2). At time t, the lines labeled 32 delimit the region of the ground plane. The edge of the ground plane is thus understood to mean, for example, signatures (or boundary points) on the road surface that can be tracked in the image sequence.

Fig. 3 illustrates the difference between a quickly approaching backplane (Fig. 3a: 20, 21, deformation d1) and a slowly approaching backplane (Fig. 3b: 20, 23, deformation d3), if at time t-1 the backplane (20) in Fig. 3a is at the same distance from the vehicle camera as the backplane (20) in Fig. 3b.

Alternatively, Fig. 3 could illustrate the difference between a near backplane (Fig. 3a: 20, 21; deformation d1) and a more distant backplane (Fig. 3b: 20, 23; deformation d3) moving, for example, at the same (relative) speed; the object (20, 23) shown in Fig. 3b would then be larger in real space than the object (20, 21) shown in Fig. 3a.

If several adjacent correspondences are considered instead of individual correspondences, objects can be segmented due to different speeds, scaling and deformation.

Assuming that the world consists of planes, these can be described by homographies and, as shown below, separated by their distance, speed and orientation.

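A homography describes the correspondence of points on a plane between two camera positions, or the correspondence of two points in two consecutive frames:

$$\mathbf{x}_{t0} = H\,\mathbf{x}_{t1} \quad \text{with} \quad \mathbf{x}_{t0} = \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix}, \quad \mathbf{x}_{t1} = \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix} \qquad (1)$$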

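Here, the vector x_t0 describes, in homogeneous coordinates, the correspondence at time t-0 of the vector x_t1 at time t-1. A homography can be calculated from the image alone if four point correspondences are known (see Tutorial: Multiple View Geometry, Hartley, R. and Zisserman, A., CVPR June 1999: https://en.scribd.com/document/96810936/Hartley-Tut-4up, retrieved on 26/09/2016). The relationships given at the top left of page 6 of the tutorial (slide 21) can be formulated in the notation of Equation 1 as follows:

$$\begin{bmatrix} -x_1 & -y_1 & -1 & 0 & 0 & 0 & x_1 x_0 & y_1 x_0 & x_0 \\ 0 & 0 & 0 & -x_1 & -y_1 & -1 & x_1 y_0 & y_1 y_0 & y_0 \end{bmatrix} \mathbf{h} = \mathbf{0} \qquad (2)$$

where h = [h_11 h_12 h_13 h_21 h_22 h_23 h_31 h_32 h_33]' contains the entries of H in row-major order.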
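The four-point computation from the tutorial is the direct linear transform (DLT). Below is a minimal pure-Python sketch, assuming h33 can be fixed to 1 (which turns the homogeneous system into an 8x8 linear system) and that no three of the four points are collinear. The helper names are hypothetical; an SVD of the stacked 8x9 matrix would be the more general choice.

```python
Matrix = list[list[float]]
Point = tuple[float, float]


def project(H: Matrix, p: Point) -> Point:
    """Apply H to (x, y, 1) and normalize the homogeneous result."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def solve_linear(A: Matrix, b: list[float]) -> list[float]:
    """Gaussian elimination with partial pivoting for a square system A x = b."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration, e.g. three collinear points")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def homography_from_4_points(src: list[Point], dst: list[Point]) -> Matrix:
    """Estimate H with dst ~ H src from exactly four correspondences (h33 fixed to 1).

    Each pair (x1, y1) -> (x0, y0) contributes the two DLT rows
      h11 x1 + h12 y1 + h13 - h31 x1 x0 - h32 y1 x0 = x0
      h21 x1 + h22 y1 + h23 - h31 x1 y0 - h32 y1 y0 = y0
    """
    A, b = [], []
    for (x1, y1), (x0, y0) in zip(src, dst):
        A.append([x1, y1, 1.0, 0.0, 0.0, 0.0, -x1 * x0, -y1 * x0])
        b.append(x0)
        A.append([0.0, 0.0, 0.0, x1, y1, 1.0, -x1 * y0, -y1 * y0])
        b.append(y0)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


# Usage: recover a known (hypothetical) homography from the corners of a unit square.
H_TRUE = [[1.1, 0.05, 0.2], [0.02, 0.95, -0.1], [0.01, 0.03, 1.0]]
SRC = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
H_EST = homography_from_4_points(SRC, [project(H_TRUE, p) for p in SRC])
```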

Alternatively, if the camera translation t, the rotation R and the distance d along the normal vector n of the plane are known, the homography can be calculated according to Equation 3. Equation 3 illustrates that, with an inverse TTC t/d not equal to zero, planes with different orientations n can be modeled, and that planes with identical orientation n can be separated by their inverse TTC.

$$H = \left[ R - \frac{\mathbf{t}\,\mathbf{n}'}{d} \right] \qquad (3)$$

A homography can theoretically be decomposed into the normal vector n, the rotation matrix R and the inverse TTC t/d. Unfortunately, this decomposition is numerically extremely unstable and sensitive to measurement errors.
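Instead of decomposing a homography, it is straightforward to build one in the forward direction from known t/d and n. A short sketch, reading Equation 3 as H = R - (t/d) n' and assuming normalized camera coordinates and the axis convention of step d3); the constant and function names are hypothetical. It illustrates the two statements above: equal orientation n with different t/d gives different homographies, and t/d = 0 leaves only R.

```python
Matrix = list[list[float]]
Vector = tuple[float, float, float]

IDENTITY: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Plane normals n in the camera frame, following the axis convention of step d3):
GROUND: Vector = (0.0, 1.0, 0.0)  # normal to the vertical y-direction
BACK: Vector = (0.0, 0.0, 1.0)    # normal to the longitudinal z-direction
SIDE: Vector = (1.0, 0.0, 0.0)    # normal to the lateral x-direction


def plane_homography(R: Matrix, inv_ttc: Vector, normal: Vector) -> Matrix:
    """H = R - (t/d) n' for a plane with normal n and inverse TTC t/d = (t_x, t_y, t_z)."""
    return [[R[i][j] - inv_ttc[i] * normal[j] for j in range(3)] for i in range(3)]


# Two hypothetical backplanes with the same orientation but different t_z:
near = plane_homography(IDENTITY, (0.0, 0.0, 0.05), BACK)  # near[2][2] is about 0.95
far = plane_homography(IDENTITY, (0.0, 0.0, 0.02), BACK)   # far[2][2] is about 0.98
# With t/d = 0 the plane term vanishes and H reduces to the rotation R.
print(plane_homography(IDENTITY, (0.0, 0.0, 0.0), GROUND) == IDENTITY)  # True
```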

If a scene is described by planes, it can be segmented as shown below.

Fig. 4 shows schematically a subdivision into cells (grid lines). The scene is divided into NxM initial cells, and each point correspondence is assigned a unique ID. This ID initially indicates membership of a cell. Later on, the ID can indicate membership of a cluster or an object.
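A minimal sketch of the initial ID assignment, assuming pixel coordinates and a row-major numbering of the NxM cells; `cell_id`, `group_by_cell` and the example image size are hypothetical. In the method, these cell IDs are later replaced by cluster or object IDs.

```python
from collections import defaultdict


def cell_id(x: float, y: float, width: int, height: int, n_cols: int, n_rows: int) -> int:
    """Row-major ID of the grid cell containing pixel (x, y)."""
    col = min(int(x * n_cols / width), n_cols - 1)
    row = min(int(y * n_rows / height), n_rows - 1)
    return row * n_cols + col


def group_by_cell(points, width, height, n_cols, n_rows) -> dict[int, list[int]]:
    """Initial IDs: map each cell ID to the indices of the correspondences inside it."""
    groups: dict[int, list[int]] = defaultdict(list)
    for index, (x, y) in enumerate(points):
        groups[cell_id(x, y, width, height, n_cols, n_rows)].append(index)
    return dict(groups)


# Hypothetical 1280x960 image split into 4 columns x 3 rows:
print(group_by_cell([(10, 10), (700, 100), (1279, 959), (20, 30)], 1280, 960, 4, 3))
# {0: [0, 3], 2: [1], 11: [2]}
```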

An object (in particular a backplane) in the foreground is shown hatched. The background is white. If a cell contains only one object (cells B3, D3), a homography will describe this cell very well. However, if a cell contains more than one object (cell C3), the homography will not describe either of the two objects well. If the point correspondences of cell C3 (black dot and black cross, or x) are assigned to the clusters (or segments) of the neighboring cells (B3 or D3) via their backprojection errors, the black dot is assigned to the segment of cell B3 and the black cross to the segment of cell D3, since the homography for cell C3 describes neither the foreground nor the background well.

If prior knowledge of a scene is available, the segment sizes can be adapted to the scene, for example by generating larger regions in the vicinity of the vehicle or in regions with a positive classification response. For each segment, a backplane, ground-plane and sidewall-plane homography is computed, for example as shown in Equations 5 to 10.

Computing backplane, ground-plane and sidewall-plane homographies increases the selectivity, because a homography with fewer degrees of freedom can only poorly model regions that contain different planes, so that corresponding points there have a higher backprojection error (see Fig. 4). The backprojection error e_i is therefore a measure of how well a point x at time t-0 is described by the homography of a plane i applied to the corresponding point at time t-1:

$$e_i = \mathbf{x}_{t0} - H_i\,\mathbf{x}_{t1} \qquad (4)$$

If the static mounting position of the camera and the camera rotation between two views are given (e.g. from knowledge of the camera calibration and by calculating the fundamental matrix in a monocular system, or from the rotation values of a yaw-rate sensor cluster), the inverse TTC t/d can be calculated from flow vectors compensated for the static camera rotation, as shown below by way of example for a ground plane n' = [0 1 0]. If the rotation is not known, it can be approximated by the identity matrix.

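If the quotient t/d in Equation 3 is replaced by the inverse time to collision [t_x t_y t_z]' and the ground-plane normal n' = [0 1 0] is inserted, the result is:

$$\begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} \simeq \left[ R - \begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix} \begin{bmatrix} 0 & 1 & 0 \end{bmatrix} \right] \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix} \qquad (5)$$

By introducing the constants a, b, c, where

$$\begin{bmatrix} a \\ b \\ c \end{bmatrix} = R \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix},$$

Equation 5 takes the simplified form:

$$\begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} \simeq \begin{bmatrix} a - t_x y_1 \\ b - t_y y_1 \\ c - t_z y_1 \end{bmatrix} \qquad (6)$$

where $\simeq$ denotes equality up to a scale factor. Normalizing the homogeneous coordinates yields:

$$x_0 = \frac{a - t_x y_1}{c - t_z y_1}, \qquad y_0 = \frac{b - t_y y_1}{c - t_z y_1} \qquad (7)$$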

For more than one measurement, this yields a system of equations of the form Mx = v, with a vector x to be determined, a matrix M and a vector v (see Equation 9), which, for at least three image correspondences as support points, can be solved, e.g., by a singular value decomposition of the matrix or by a least-squares method.
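Written out for the ground plane, the least-squares system can look as follows. This is a sketch derived here from the normalized form x_0 = (a - t_x y_1)/(c - t_z y_1), y_0 = (b - t_y y_1)/(c - t_z y_1) with [a b c]' = R x_t1; the row layout is an assumption and may differ from the source's Equation 9. Each correspondence contributes two rows, and the normal-equation solution shown on the right is one option besides the SVD.

```latex
% Multiply out the denominators and collect the unknowns t_x, t_y, t_z:
x_0 (c - t_z y_1) = a - t_x y_1 \;\Longleftrightarrow\; -y_1 t_x + x_0 y_1 t_z = c\,x_0 - a
y_0 (c - t_z y_1) = b - t_y y_1 \;\Longleftrightarrow\; -y_1 t_y + y_0 y_1 t_z = c\,y_0 - b
% Stacking two rows per correspondence k = 1..K gives M x = v:
\underbrace{\begin{bmatrix}
-y_1^{(1)} & 0 & x_0^{(1)} y_1^{(1)} \\
0 & -y_1^{(1)} & y_0^{(1)} y_1^{(1)} \\
\vdots & \vdots & \vdots
\end{bmatrix}}_{M}
\underbrace{\begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix}}_{x}
=
\underbrace{\begin{bmatrix} c^{(1)} x_0^{(1)} - a^{(1)} \\ c^{(1)} y_0^{(1)} - b^{(1)} \\ \vdots \end{bmatrix}}_{v},
\qquad
x = (M^\top M)^{-1} M^\top v
```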

The derivation of the backplane and sidewall-plane homographies is analogous and yields Equations 10 and 11, respectively.
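For the backplane and the sidewall plane, only the factor that multiplies t/d changes. With H = R - (t/d) n', the homogeneous result is (a - t_x s, b - t_y s, c - t_z s) with s = n' x_t1, i.e. s = y_1 (ground plane), s = 1 (backplane) or s = x_1 (sidewall plane). The following pure-Python sketch uses this observation to fit all three models with one function. It is a generalization made here under that assumption, it solves the 3x3 normal equations with Cramer's rule for brevity instead of an SVD, and its names are hypothetical.

```python
Matrix = list[list[float]]
Point = tuple[float, float]

IDENTITY: Matrix = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# s = n' x_t1 for x_t1 = [x1, y1, 1]': the factor multiplying t/d in H x_t1.
SUPPORT = {
    "ground": lambda x1, y1: y1,  # n' = [0 1 0]
    "back": lambda x1, y1: 1.0,   # n' = [0 0 1]
    "side": lambda x1, y1: x1,    # n' = [1 0 0]
}


def det3(m: Matrix) -> float:
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(A: Matrix, b: list[float]) -> list[float]:
    """Cramer's rule for a 3x3 system (adequate for these small normal equations)."""
    d = det3(A)
    if abs(d) < 1e-12:
        raise ValueError("correspondences do not constrain t/d")
    return [det3([[b[i] if j == k else A[i][j] for j in range(3)] for i in range(3)]) / d
            for k in range(3)]


def fit_inverse_ttc(matches: list[tuple[Point, Point]], plane: str,
                    R: Matrix = IDENTITY) -> list[float]:
    """Least-squares t/d = (t_x, t_y, t_z) for one plane type.

    matches holds ((x0, y0), (x1, y1)) pairs: point at t-0 and at t-1.
    Each pair adds the rows [-s, 0, x0*s] and [0, -s, y0*s] with right-hand sides
    c*x0 - a and c*y0 - b, where (a, b, c) = R x_t1.
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atv = [0.0] * 3
    for (x0, y0), (x1, y1) in matches:
        a, b, c = (R[i][0] * x1 + R[i][1] * y1 + R[i][2] for i in range(3))
        s = SUPPORT[plane](x1, y1)
        for row, rhs in (([-s, 0.0, x0 * s], c * x0 - a), ([0.0, -s, y0 * s], c * y0 - b)):
            for i in range(3):
                atv[i] += row[i] * rhs
                for j in range(3):
                    ata[i][j] += row[i] * row[j]
    return solve3(ata, atv)


# Usage: synthetic ground-plane flow with a hypothetical t/d = (0.01, 0.02, 0.05), R = I.
TRUE_T = (0.01, 0.02, 0.05)
MATCHES = []
for x1, y1 in [(0.1, 0.3), (-0.2, 0.4), (0.3, 0.5), (0.0, 0.6)]:
    w = 1.0 - TRUE_T[2] * y1
    p_t0 = ((x1 - TRUE_T[0] * y1) / w, (y1 - TRUE_T[1] * y1) / w)
    MATCHES.append((p_t0, (x1, y1)))
print(fit_inverse_ttc(MATCHES, "ground"))  # close to [0.01, 0.02, 0.05]
```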

In order to segment larger objects consisting of several cells, adjacent cells can be merged in a further step by calculating the backprojection errors $\sum \lVert \mathbf{x}^i_{t0} - H_j\,\mathbf{x}^i_{t1} \rVert$ and $\sum \lVert \mathbf{x}^j_{t0} - H_i\,\mathbf{x}^j_{t1} \rVert$ over the support points (see point 1 below: RANSAC) of the adjacent segments j and i and their homographies. Two adjacent clusters are merged if $\sum \lVert \mathbf{x}^i_{t0} - H_j\,\mathbf{x}^i_{t1} \rVert$ is smaller than $\sum \lVert \mathbf{x}^i_{t0} - H_i\,\mathbf{x}^i_{t1} \rVert$ or if, for example, the backprojection error normalized to the predicted flow length is below an adjustable threshold. In particular, two adjacent clusters can be merged if $\sum \lVert \mathbf{x}^i_{t0} - H_j\,\mathbf{x}^i_{t1} \rVert$ is smaller than $\sum \lVert \mathbf{x}^i_{t0} - H_i\,\mathbf{x}^i_{t1} \rVert$ and both backprojection errors, $\sum \lVert \mathbf{x}^i_{t0} - H_j\,\mathbf{x}^i_{t1} \rVert$ and $\sum \lVert \mathbf{x}^i_{t0} - H_i\,\mathbf{x}^i_{t1} \rVert$, fall below a threshold normalized to the flow length. Alternatively, the backprojection errors can be used as potentials in a graph in order to compute a global solution. The compactness of the clusters can then be determined via the edge potentials in the graph.
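A minimal sketch of this merge test with a union-find structure. Assumptions not fixed by the text: the Euclidean norm inside the sums, normalization by the summed predicted flow length, requiring the test to pass in both directions (i into j and j into i), and all names and example homographies, which are hypothetical.

```python
import math

Matrix = list[list[float]]
Point = tuple[float, float]
Match = tuple[Point, Point]  # (x_t0, x_t1)


def project(H: Matrix, p: Point) -> Point:
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def summed_error(H: Matrix, support: list[Match]) -> float:
    """Sum of ||x_t0 - H x_t1|| over the support points of a cluster."""
    return sum(math.dist(p0, project(H, p1)) for p0, p1 in support)


def flow_normalized_error(H: Matrix, support: list[Match]) -> float:
    """Summed error divided by the summed predicted flow length ||H x_t1 - x_t1||."""
    flow = sum(math.dist(p1, project(H, p1)) for _, p1 in support)
    return summed_error(H, support) / flow if flow > 0 else math.inf


def accepts(H_own: Matrix, H_other: Matrix, support: list[Match], threshold: float) -> bool:
    """The neighbour's homography explains this cluster better, or well enough."""
    return (summed_error(H_other, support) < summed_error(H_own, support)
            or flow_normalized_error(H_other, support) < threshold)


def merge_clusters(pairs, homographies, supports, threshold):
    """Union-find over adjacent cluster pairs; returns a representative ID per cluster."""
    parent = {cid: cid for cid in homographies}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for i, j in pairs:
        if (accepts(homographies[i], homographies[j], supports[i], threshold)
                and accepts(homographies[j], homographies[i], supports[j], threshold)):
            parent[find(j)] = find(i)
    return {cid: find(cid) for cid in homographies}


# Hypothetical cells: a and b lie on one approaching backplane, c moves laterally.
SCALE = [[1.1, 0.0, 0.0], [0.0, 1.1, 0.0], [0.0, 0.0, 1.0]]
SHIFT = [[1.0, 0.0, 0.05], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
HS = {"a": SCALE, "b": SCALE, "c": SHIFT}
POINTS_T1 = {"a": [(0.1, 0.1), (0.2, 0.1)], "b": [(0.3, 0.1), (0.35, 0.2)],
             "c": [(0.5, 0.1), (0.55, 0.2)]}
SUPPORTS = {cid: [(project(HS[cid], p1), p1) for p1 in pts] for cid, pts in POINTS_T1.items()}
LABELS = merge_clusters([("a", "b"), ("b", "c")], HS, SUPPORTS, threshold=0.2)
print(LABELS)  # a and b share one ID, c keeps its own
```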

Once the segments have been merged, the homographies are recalculated and the point correspondences are assigned to the clusters with the smallest backprojection error. If only directly adjacent clusters are considered, very compact objects can be generated. If the minimum error exceeds an adjustable threshold, the correspondences are assigned new (cluster/object) IDs in order to be able to recognize partially occluded objects or objects with slightly different TTCs. By setting the threshold, the resolution of (slightly) different objects can be adjusted.
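A minimal sketch of the reassignment step with its new-ID rule, assuming Euclidean backprojection errors and hypothetical names and example clusters. It omits the restriction to directly adjacent clusters and the biases described next; in practice, correspondences that receive new IDs would be grouped again in later iterations rather than each keeping a separate ID.

```python
import math

Matrix = list[list[float]]
Point = tuple[float, float]


def project(H: Matrix, p: Point) -> Point:
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def reassign(matches, homographies: dict[int, Matrix], max_error: float, next_id: int):
    """Assign each (x_t0, x_t1) to the cluster with the smallest backprojection error.

    If even the best error exceeds max_error, the correspondence gets a fresh ID.
    Returns the list of labels and the next unused ID.
    """
    labels = []
    for p0, p1 in matches:
        best_id = min(homographies,
                      key=lambda cid: math.dist(p0, project(homographies[cid], p1)))
        if math.dist(p0, project(homographies[best_id], p1)) > max_error:
            best_id, next_id = next_id, next_id + 1
        labels.append(best_id)
    return labels, next_id


# Hypothetical clusters: 0 = static background (identity), 1 = approaching backplane.
CLUSTERS = {0: [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
            1: [[1.1, 0.0, 0.0], [0.0, 1.1, 0.0], [0.0, 0.0, 1.0]]}
MATCHES = [((0.1, 0.1), (0.1, 0.1)), ((0.22, 0.22), (0.2, 0.2)), ((0.5, 0.1), (0.3, 0.3))]
print(reassign(MATCHES, CLUSTERS, max_error=0.01, next_id=2))  # ([0, 1, 2], 3)
```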

The backprojection errors can be provided with a bias that reduces the costs for contiguous regions, or with a bias that increases the cost of an ID change if point correspondences have had the same ID affiliation over a longer period of time. Fig. 5 shows an example of a scene segmentation:

Fig. 5a shows an image taken by a vehicle camera that is arranged inside the vehicle and captures the surroundings through the windshield. A three-lane road (51), e.g. a motorway, can be seen. The lanes are separated by corresponding lane markings. Vehicles are driving in all three lanes. The vehicle (53) driving ahead in the ego lane occludes any further vehicles that may be driving ahead in the ego lane. To the left of the three-lane road is a built-up raised boundary (52) to the opposite carriageway. To the right of the three-lane road (51) is a verge or hard shoulder, bordered on the right by a guardrail, followed by a wooded area. At some distance ahead of the ego vehicle, gantries (54) can be seen, one of which spans the three-lane road (51). This scene can be segmented analogously to the method described with reference to Fig. 4. Figs. 5b to 5d show cells (56). Point correspondences (55) are shown in the cells. The assignment of a cell (56) to a segment is shown by the color of the cell frame or of the point correspondences (55).

Fig. 5b shows the red channel of the segmented image,

Fig. 5c the green channel and Fig. 5d the blue channel.

Different segments were given different colors. One segment, which is green in the original, extends over the bottom five to six rows (shown correspondingly white in Figs. 5b and 5d and without a cell frame). This segment corresponds to the ground plane, i.e. the surface of the road (51) on which the ego vehicle is driving. Another segment, pink in the original, can be seen in the middle of the image. It therefore has high red values in Fig. 5b, weaker blue values in Fig. 5d and no green values in Fig. 5c. This segment corresponds to the backplane of the (lorry) vehicle (53) driving ahead in the ego lane.

The segmentation result shown was obtained without prior knowledge of the scene in only three iterations. This demonstrates the considerable speed and performance of an embodiment of the invention through temporal integration.

Fig. 6 shows a determination of the orientation of planes in the scene already described with reference to Fig. 5. For orientation, Fig. 6a again shows the surroundings according to Fig. 5a. All correspondences assigned to a sidewall plane are shown in Fig. 6b. The correspondences at the left edge were assigned to a right sidewall plane, which is correct, since the right side of the built-up boundary (52) to the opposite carriageway is located there in the image. The correspondences in the right half of the image were assigned to left sidewall planes, which is also correct, since the "left side" of the roadside structures or vegetation is located there in the image. Fig. 6c shows which correspondences are assigned to a ground plane, which is correct, since the surface of the road (51) can be seen there in the image.

Fig. 6d shows which correspondences are assigned to a backplane. This is largely correct. From this determination alone, different backplanes cannot yet be sufficiently distinguished, for example the delivery van (53) driving ahead in the same lane from the signs of the gantry (54) located above it in the image. However, this representation already provides important information about where raised objects occur in the vehicle's surroundings. As illustrated in Fig. 7, the inverse TTC (t_x, t_y, t_z) can be used to detect dynamic objects. Fig. 7a again shows the image of the vehicle situation (identical to Fig. 6a). The vehicle (73) driving ahead in the ego lane is a delivery van. Two vehicles (71 and 72) are in the left lane and two more vehicles (74 and 75) in the right lane.

Fig. 7b shows correspondences that again belong to the ground plane (violet in the original) and are the only ones with a red component.

Fig. 7c shows correspondences assigned to moving objects. In the original, these are green if they move away from the ego vehicle (i.e. drive faster) or turquoise if they drive more slowly.

Fig. 7d shows correspondences with a blue component, i.e. those belonging to the ground plane (see Fig. 7b), to moving objects approaching the ego vehicle (see Fig. 7c), and to static raised objects shown only in Fig. 7d, such as the wooded areas to the left and right of the motorway and the gantries. Taking Figs. 7c and 7d together, it can be seen that the vehicle (73) in the ego lane is approaching. The same applies to the vehicle ahead in the right lane (75). The other vehicles (71, 72 and 74), on the other hand, are moving away.

Owing to the lack of structure, the region corresponding to the sky in the image yields no correspondences (white in Figs. 7b to 7d).

If the ego-rotation is compensated in the correspondences before the homography is calculated, or if the ego-rotation is taken into account in the rotation matrix R, overtaking vehicles can be recognized by their negative t_z component, and vehicles cutting out of the lane or driving through a curve by a non-zero lateral t_x component. If the dynamic segments are predicted via their homographies (see the densification of the optical flow based on homographies, described below), a dynamic map can be built up over time.

Considering Equation 3, it can be seen that segments with an inverse TTC of zero describe the rotation matrix; the rotation can therefore be determined by calculating a homography with full degrees of freedom (Equation 2) from segments with t/d equal to 0. Assuming that the translational components are not noticeable in the vicinity of the epipole, the pitch rate and yaw rate can also be determined by predicting the coordinates of the epipole (x_e, y_e) with the homography of static segments and calculating atan((x_e0 - x_e1)/f) or atan((y_e0 - y_e1)/f), with the focal length f expressed in pixels.
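A minimal sketch of the two readouts just described: rotation angles from the epipole shift, and qualitative motion cues from the sign of t_z and the magnitude of t_x. It assumes that the horizontal shift corresponds to yaw and the vertical shift to pitch, that the epipole has already been predicted with the static-segment homography, and that t/d is rotation-compensated; the tolerance `eps`, the function names and the example numbers are hypothetical.

```python
import math


def yaw_pitch_from_epipole(e_t0, e_t1, focal_px: float):
    """Rotation angles (rad) from the shift of the predicted epipole between two frames.

    Assumes translation has no noticeable effect near the epipole, so the shift is
    caused by rotation only; signs follow the image axes.
    """
    yaw = math.atan((e_t0[0] - e_t1[0]) / focal_px)
    pitch = math.atan((e_t0[1] - e_t1[1]) / focal_px)
    return yaw, pitch


def motion_cues(t_x: float, t_z: float, eps: float = 1e-3) -> list[str]:
    """Qualitative cues from a rotation-compensated inverse TTC of a dynamic segment."""
    cues = []
    if t_z < -eps:
        cues.append("receding, e.g. overtaking")
    if abs(t_x) > eps:
        cues.append("lateral motion, e.g. cutting out or cornering")
    return cues


# Hypothetical values: epipole moved 10 px to the right, focal length 1000 px.
print(yaw_pitch_from_epipole((650.0, 360.0), (640.0, 360.0), 1000.0))  # yaw about 0.0100 rad, pitch 0.0
print(motion_cues(t_x=0.0, t_z=-0.02))  # ['receding, e.g. overtaking']
```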

If a homography with all degrees of freedom is calculated for each cluster, these homographies can also be used to reconstruct the 3D environment by using the predicted position H·x_t1 instead of the measured position x_t0 for triangulation. This not only reduces the influence of measurement errors but also makes it possible to reconstruct objects close to the epipole.

An embodiment for densifying the optical flow based on homographies is described below.

If the segmentation at time t-1 is known, it can be used both to predict the objects and to generate a dense flow field. Signature-based flow methods generate signatures and try to assign them uniquely in consecutive frames. In most cases, the signatures are calculated from a patch (image section or image region) of defined size. However, if the size and shape of a patch change, correspondences can no longer be found with a fixed template (template pattern, i.e. for example an image section of an image of the sequence that corresponds to an object, such as a vehicle template). For example, the size of a patch changes when approaching a backplane, and both the size and the shape of a patch change when moving over a ground plane or parallel to a sidewall plane (see Figs. 1 and 2). If the segmentation at time t-1 is available, the homographies can be recomputed from flow vectors already found and used to predict the position and shape of correspondences already established from t-1 to t-0.

Alternatively, the current frame at time t-0 can be transformed to time t-1 in order to compensate for changes in scale and shape.
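A minimal sketch of this compensation as inverse warping with nearest-neighbour sampling on a small integer image. Nearest-neighbour sampling, a homography expressed in pixel coordinates, and all names are assumptions for illustration; a production system would typically interpolate (e.g. bilinearly) and warp only the pixels of the segment that the homography belongs to.

```python
Matrix = list[list[float]]


def warp_to_previous(current: list[list[int]], H: Matrix, fill: int = 0) -> list[list[int]]:
    """Resample the frame at t-0 into the geometry of t-1 (inverse warping).

    For every output pixel x_t1, the source position in the current frame is
    x_t0 = H x_t1 (Equation 1); the nearest pixel is taken, and positions outside
    the image get `fill`.
    """
    height, width = len(current), len(current[0])
    out = [[fill] * width for _ in range(height)]
    for v in range(height):
        for u in range(width):
            w = H[2][0] * u + H[2][1] * v + H[2][2]
            x = (H[0][0] * u + H[0][1] * v + H[0][2]) / w
            y = (H[1][0] * u + H[1][1] * v + H[1][2]) / w
            xi, yi = round(x), round(y)
            if 0 <= xi < width and 0 <= yi < height:
                out[v][u] = current[yi][xi]
    return out


# Hypothetical 4x4 frame whose content moved one pixel to the right since t-1.
frame_t0 = [[10 * r + c for c in range(4)] for r in range(4)]
shift_right = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
for row in warp_to_previous(frame_t0, shift_right):
    print(row)  # first row: [1, 2, 3, 0]
```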

Fig. 8 illustrates such a procedure.

Fig. 8a shows an image of another driving situation captured by the vehicle camera at a time t-1. A motorway with three lanes in each direction of travel can be seen. To the left of the ego three-lane carriageway is a guardrail (81) as a raised boundary to the opposite carriageway. To the right of the carriageway is a noise barrier (82). Fig. 8b shows an image that was captured at the subsequent time t and transformed ("warped") via the homography of the guardrail in such a way that the changes occurring in the guardrail region as a result of the movement of the vehicle, and thus of the vehicle camera, between the two capture times are compensated. In Fig. 8b, the forward movement of the ego vehicle means that the nearest dash of the lane marking is closer to the ego vehicle than in Fig. 8a. The transformation leads to the trapezoidal displacement of the image, which is illustrated in Fig. 8f by a dashed line.

Fig. 8c now shows corresponding features (85), which were determined in the area of the guardrail (81, see Fig. 8a), as white dots.

Fig. 8d shows where these corresponding features are to be expected in the next image (86) after being transformed as described for Fig. 8b.

Figs. 8e and 8f show this again in a black-and-white representation, in which the corresponding features (85) now correspond to the black dots on the guardrail (81) in the left half of the image.

In order to generate a dense flow field, the current image can be warped onto the previous image for each segment, in order to find existing correspondences whose scale or shape has changed, or to establish new correspondences by means of congruent templates.

If there are not enough flow vectors in a current frame to recalculate a homography, the approximate homography from the last frame can be used to make the correspondence finding more robust against shape and scale changes.
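The fallback rule can be sketched as follows: re-estimate a segment's homography by the direct linear transform (DLT) when at least four flow vectors are available (a homography has eight degrees of freedom and each correspondence contributes two equations), and otherwise keep the approximate homography from the last frame. The minimum count of four and the unnormalised DLT are simplifying assumptions, and the helper names are hypothetical; a practical system would likely require more vectors and normalise the coordinates first.

```python
import numpy as np

MIN_CORRESPONDENCES = 4  # 8 DOF, two equations per point correspondence


def fit_homography(pts_t1, pts_t0):
    """DLT estimate of H with pts_t0 ~ H pts_t1 (least squares via SVD)."""
    rows = []
    for (x, y), (u, v) in zip(pts_t1, pts_t0):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y, -u])
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = vt[-1].reshape(3, 3)  # (approximate) null vector of the constraints
    return H / H[2, 2]        # fix the free scale (assumes H[2, 2] != 0)


def update_segment_homography(H_prev, pts_t1, pts_t0, min_pts=MIN_CORRESPONDENCES):
    """Return (H, recomputed): a fresh estimate if enough flow vectors exist,
    otherwise the approximate homography of the previous frame."""
    if len(pts_t1) >= min_pts:
        return fit_homography(pts_t1, pts_t0), True
    return H_prev, False


# Hypothetical segment with five exact flow vectors generated from H_true.
H_true = np.array([[1.05, 0.01, 3.0], [0.0, 1.02, -2.0], [0.0, 0.0, 1.0]])
pts_t1 = np.array([[0, 0], [100, 0], [100, 80], [0, 80], [50, 40]], dtype=float)
hom = np.hstack([pts_t1, np.ones((5, 1))]) @ H_true.T
pts_t0 = hom[:, :2] / hom[:, 2:3]

H_new, fresh = update_segment_homography(np.eye(3), pts_t1, pts_t0)
H_old, recomputed_few = update_segment_homography(np.eye(3), pts_t1[:2], pts_t0[:2])
```

With five vectors the homography is recomputed; with only two, the previous estimate (here the identity) is kept unchanged.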

The following embodiments or aspects are advantageous and can be provided individually or in combination:

1. The image is subdivided into NxM cells, and the point correspondences of a cell are assigned a unique cell ID. For the correspondences with the same ID, the back-plane, ground-plane and side-plane homographies (Equations 9, 10 and 10) are calculated using RANSAC, and both the homography with the smallest backprojection error and the support points used to calculate it are stored. In RANSAC (RAndom SAmple Consensus) methods, a minimum number of randomly selected correspondences is usually used in each iteration to form a hypothesis. For each corresponding feature, a value is then calculated that describes whether the corresponding feature supports the hypothesis. If the hypothesis receives sufficient support from the corresponding features, the non-supporting corresponding features can be discarded as outliers. Otherwise, a minimum number of correspondences is again selected at random.
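The per-cell estimation can be sketched as follows. The sketch makes two simplifying assumptions: it fits a general eight-degree-of-freedom homography by DLT instead of the constrained back-, ground- and side-plane models referenced above, and it uses a fixed inlier threshold in pixels and a minimum support fraction (`thresh`, `min_support`), both hypothetical parameters. As in the RANSAC variant described, four correspondences are drawn per hypothesis, and once a hypothesis has sufficient support the non-supporting features are discarded as outliers; the homography is then refitted on its support points, and both are stored per cell ID.

```python
import numpy as np


def cell_ids(points, img_w, img_h, n_cols, n_rows):
    """Unique ID of the grid cell (n_cols x n_rows) containing each (x, y) point."""
    cx = np.clip((points[:, 0] * n_cols / img_w).astype(int), 0, n_cols - 1)
    cy = np.clip((points[:, 1] * n_rows / img_h).astype(int), 0, n_rows - 1)
    return cy * n_cols + cx


def fit_homography(src, dst):
    """DLT: H with dst ~ H src, determined up to scale."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y, -u])
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y, -v])
    return np.linalg.svd(np.asarray(rows, dtype=float))[2][-1].reshape(3, 3)


def backprojection_error(H, src, dst):
    """Per-correspondence distance ||x_t0 - H x_t1|| (NaN/inf for degenerate H)."""
    hom = np.hstack([src, np.ones((len(src), 1))]) @ H.T
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.linalg.norm(dst - hom[:, :2] / hom[:, 2:3], axis=1)


def ransac_homography(src, dst, thresh=1.0, min_support=0.6, max_iters=500, seed=0):
    """Minimal-sample RANSAC; returns (H, support mask) or (None, all False)."""
    rng = np.random.default_rng(seed)
    for _ in range(max_iters):
        sample = rng.choice(len(src), size=4, replace=False)  # minimal sample
        H = fit_homography(src[sample], dst[sample])           # hypothesis
        support = backprojection_error(H, src, dst) < thresh   # supporting features
        if support.mean() >= min_support:
            # Sufficient support: discard outliers and refit on the support points.
            return fit_homography(src[support], dst[support]), support
    return None, np.zeros(len(src), dtype=bool)


def homographies_per_cell(src, dst, ids):
    """Map cell ID -> (H, support points at t-1, support points at t-0)."""
    result = {}
    for cid in np.unique(ids):
        in_cell = ids == cid
        if in_cell.sum() < 4:
            continue
        H, support = ransac_homography(src[in_cell], dst[in_cell])
        if H is not None:
            result[int(cid)] = (H, src[in_cell][support], dst[in_cell][support])
    return result


# Hypothetical data: 20 correspondences in one cell, the first four are outliers.
rng = np.random.default_rng(1)
src = rng.uniform(0, 100, size=(20, 2))
H_true = np.array([[1.02, 0.01, 2.0], [0.0, 0.98, -1.0], [1e-4, 0.0, 1.0]])
hom = np.hstack([src, np.ones((20, 1))]) @ H_true.T
dst = hom[:, :2] / hom[:, 2:3]
dst[:4] += 15.0
ids = cell_ids(src, img_w=640, img_h=480, n_cols=4, n_rows=3)
cells = homographies_per_cell(src, dst, ids)
```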

For neighboring cells i, j, the backprojection errors $\sum \lVert x_{t0}^{i} - H_j x_{t1}^{i} \rVert$ and $\sum \lVert x_{t0}^{j} - H_i x_{t1}^{j} \rVert$ are calculated over the support points of the adjacent homography. If the backprojection error $\sum \lVert x_{t0}^{i} - H_j x_{t1}^{i} \rVert$ is smaller than $\sum \lVert x_{t0}^{i} - H_i x_{t1}^{i} \rVert$, or if the errors fall below a threshold normalized to the flow length, the IDs are combined and the homographies are recalculated. In particular, two neighboring cells can be clustered as belonging to the same plane (or to the same segment or the same object) if the backprojection error $\sum \lVert x_{t0}^{i} - H_j x_{t1}^{i} \rVert$ is less than $\sum \lVert x_{t0}^{i} - H_i x_{t1}^{i} \rVert$ and if both backprojection errors $\sum \lVert x_{t0}^{i} - H_j x_{t1}^{i} \rVert$ and $\sum \lVert x_{t0}^{j} - H_i x_{t1}^{j} \rVert$ fall below a threshold normalized to the flow length.

The backprojection errors $\lVert x_{t0} - H_i x_{t1} \rVert$ of all point correspondences are calculated for the adjacent segments, and each point correspondence is assigned to the segment with the smallest backprojection error. If the minimum error exceeds a threshold, the correspondences are given a new object ID so that even small or partially occluded objects can be recognized. The homographies of the segments extracted at time t-1 are recalculated at the beginning of a new frame (t-0) from the image correspondences already found, and the existing segment IDs are predicted into the current frame. If not enough flow vectors are available in the current frame to recalculate a homography, the approximate homographies from the last frame can be used. To generate a dense flow field, the current frame (t-0) is warped to the last frame (t-1) for each segment in order to find existing correspondences that have changed in scale or shape, or to establish new correspondences. The backprojection errors of the back, ground and side planes can be used to validate raised objects, see Fig. 6. If, for example, a disparity map is available in a vehicle stereo camera, the absolute velocities can be calculated from the inverse TTC $t/d$, because the disparity map then provides the absolute distances $d$ for individual pixels. If a complete homography with all degrees of freedom is calculated for each segment, the rotation matrix $R$ can be determined from segments with a TTC close to infinity (or an inverse TTC approaching zero). The 3D environment can be reconstructed from the predicted positions $(H x_{t1}, x_{t1})$ instead of the measured positions $(x_{t0}, x_{t1})$, which also allows objects at the epipole to be reconstructed.
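The merging and assignment rules based on backprojection errors can be sketched as follows. This sketch implements the first (OR) merge condition stated above. `rel_thresh` expresses the threshold normalized to the flow length as a fraction of the summed flow-vector lengths, and `max_err` is the limit above which a new object ID is opened; both are hypothetical parameters, and the convention x_t0 ~ H x_t1 matches the predicted position H x_t1 used above.

```python
import numpy as np


def backprojection_errors(H, pts_t1, pts_t0):
    """Per-point ||x_t0 - H x_t1|| for (N, 2) arrays."""
    hom = np.hstack([pts_t1, np.ones((len(pts_t1), 1))]) @ H.T
    return np.linalg.norm(pts_t0 - hom[:, :2] / hom[:, 2:3], axis=1)


def should_merge(H_i, H_j, cell_i, cell_j, rel_thresh=0.05):
    """cell_i = (x_t1^i, x_t0^i): support points of cell i; likewise cell_j."""
    (p1_i, p0_i), (p1_j, p0_j) = cell_i, cell_j
    own_i = backprojection_errors(H_i, p1_i, p0_i).sum()
    cross_i = backprojection_errors(H_j, p1_i, p0_i).sum()  # sum ||x_t0^i - H_j x_t1^i||
    cross_j = backprojection_errors(H_i, p1_j, p0_j).sum()  # sum ||x_t0^j - H_i x_t1^j||
    flow_i = np.linalg.norm(p0_i - p1_i, axis=1).sum()
    flow_j = np.linalg.norm(p0_j - p1_j, axis=1).sum()
    below = cross_i < rel_thresh * flow_i and cross_j < rel_thresh * flow_j
    return bool(cross_i < own_i or below)


def assign_segment(H_by_segment, x_t1, x_t0, max_err, new_id):
    """Assign one correspondence to the segment with the smallest backprojection
    error, or to a new object ID if even that error exceeds max_err."""
    errors = {sid: backprojection_errors(H, x_t1[None, :], x_t0[None, :])[0]
              for sid, H in H_by_segment.items()}
    best = min(errors, key=errors.get)
    return best if errors[best] <= max_err else new_id


def translation(dx, dy):
    return np.array([[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]])


# Hypothetical cells: a and b move with the same translation, c is scaled.
pts = np.array([[50.0, 50.0], [55.0, 52.0], [60.0, 58.0], [52.0, 60.0]])
cell_a = (pts, pts + [2.0, 0.0])
cell_b = (pts + 10.0, pts + 10.0 + [2.0, 0.0])
cell_c = (pts + 10.0, 1.1 * (pts + 10.0))
merge_ab = should_merge(translation(2, 0), translation(2, 0), cell_a, cell_b)
merge_ac = should_merge(translation(2, 0), np.diag([1.1, 1.1, 1.0]), cell_a, cell_c)
```

Cells a and b are explained equally well by each other's homography and are merged, while cell c, which follows a different homography, stays separate.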

Claims


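1. A method for detecting objects from a sequence of images of a vehicle camera, comprising the steps of:

a) capturing a sequence of images with the vehicle camera,

b) determining corresponding features in two consecutive images,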

d) assignment of determined corresponding features in an image area to a plane in space, and

f) determining additional corresponding features in the image area taking into account the associated plane.

2. The method of claim 1, comprising the step:

c) Calculation of homographies for the determined corresponding features in an image area so that they can be assigned to a plane in space.

3. The method of claim 2, comprising the steps: d2) assigning the determined corresponding features to one of a plurality of planes of predetermined orientation in space, and

e) assigning them to the plane in space that yields the smallest backprojection error for the determined corresponding features, the backprojection error indicating the difference between the measured correspondence of a feature in two consecutive images and the prediction of the feature from the computed homography.

4. The method according to any one of the preceding claims, comprising the step: d3) assigning the determined corresponding features in an image area to a ground plane, a backplane or a sidewall plane.

5. The method of claim 4, wherein the at least one backplane is calculated according to

where $a, b, c$ are constants, $x_0, y_0, x_1, y_1$ are correspondences in the first image (index 0) and the second image (index 1), $t_x, t_y, t_z$ are the components of the vector $t/d$, $t$ describes the translation of the vehicle camera, and $d$ is the distance to a plane.

6. The method of claim 4 or 5, wherein the at least one ground plane is calculated according to

7. The method of claim 4, 5 or 6, wherein the at least one sidewall plane is calculated according to

8. The method according to any one of the preceding claims, wherein, starting from a determined division of a second image into different image areas with associated planes, first mutually corresponding features are determined in one image area in the second image and in a subsequent third image,

from these first corresponding features, a homography for this image area is recalculated, and the newly calculated homography is used to predict the position and shape of additional corresponding features in the third image.

9. The method according to any one of the preceding claims, wherein, if not enough first corresponding features for recalculating a homography can be determined in an image area of a subsequent third image,

the homography calculated for the image area from the second image is used to predict the position and shape of additional corresponding features of the image area in the third image.

10. The method according to claim 1, wherein, for each image area with an associated plane, a current image is warped to a previous image in accordance with the calculated homography in order to determine additional mutually corresponding features in the previous image and in the current image.

11. An apparatus for detecting objects from a sequence of images of a vehicle camera, comprising:

a camera controller which is configured to a) capture a sequence of images with the vehicle camera; and

evaluation electronics which are configured to b) determine corresponding features in one image area in two consecutive images, d) assign determined corresponding features in an image area to a plane in space, and

f) determine additional corresponding features in the image area taking into account the associated plane.