WO2000049525A1

WO2000049525A1 - Method and system for storing at least one image and its associated relational information

Info

Publication number: WO2000049525A1
Application number: PCT/DE2000/000386
Authority: WO
Inventors: André KAUP; Jörg Heuer
Original assignee: Siemens Aktiengesellschaft
Priority date: 1999-02-18
Filing date: 2000-02-09
Publication date: 2000-08-24
Also published as: DE19906830A1

Abstract

The invention relates to a method for storing at least one image by means of a computing unit, according to which relational information is stored such that it is associated with said at least one image.

Description


METHOD AND ARRANGEMENT FOR STORING AT LEAST ONE PICTURE WITH RELATED RELATIONAL INFORMATION

The invention relates to a method and an arrangement for storing at least one image by a computer.

A method for image compression, together with the associated arrangement, is known from [1]. The known method serves as the coding method in the MPEG standard and is essentially based on the hybrid DCT (Discrete Cosine Transformation) with motion compensation. A similar method is used for video telephony at n × 64 kbit/s (CCITT Recommendation H.261), for TV contribution at 34 or 45 Mbit/s (CCIR Recommendation 723) and for multimedia applications at 1.2 Mbit/s (ISO-MPEG-1). The hybrid DCT consists of a temporal processing stage, which exploits the relationship between successive images, and a local processing stage, which exploits the correlation within an image.

The local processing (intraframe coding) essentially corresponds to classic DCT coding. The image is divided into blocks of 8x8 pixels, each of which is transformed into the frequency domain by means of the DCT. The result is a matrix of 8x8 coefficients that approximately represent the two-dimensional spatial frequencies in the transformed image block. The coefficient with frequency 0 (DC component) represents the average gray value of the image block.
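The block transform just described can be illustrated with a direct (unoptimized) 2-D DCT-II of an 8x8 block. This is a minimal sketch; the orthonormal scaling is an assumption (coders differ in their normalization), and with it the DC coefficient equals 8 times the mean gray value of the block.

```python
import math

def dct2_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block given as 8 rows of 8 values."""
    n = 8

    def c(k):
        # Assumption: orthonormal normalization factors
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):          # vertical spatial frequency
        for v in range(n):      # horizontal spatial frequency
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = c(u) * c(v) * s
    return coeffs

# A flat block: all energy ends up in the DC coefficient (8 * 100 = 800).
flat = [[100] * 8 for _ in range(8)]
dc = dct2_8x8(flat)[0][0]
```

For a uniform block all AC coefficients vanish, which is the extreme case of the energy concentration around the DC component described next.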

The transformation itself does not reduce the amount of data; it actually expands it. In natural images, however, the energy is concentrated around the DC component, while the highest-frequency coefficients are mostly zero. In a next step, the coefficients are spectrally weighted so that the amplitude accuracy of the high-frequency coefficients is reduced. This exploits a property of the human eye, which resolves high spatial frequencies less accurately than low ones.

A second data reduction step takes the form of an adaptive quantization, by which the amplitude accuracy of the coefficients is further reduced or small amplitudes are set to zero. The coarseness of the quantization depends on the fill level of the output buffer: if the buffer is empty, a fine quantization is used, so that more data is generated; if the buffer is full, the quantization is coarser, which reduces the amount of data.
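The buffer-controlled quantization can be sketched as follows. The linear mapping from buffer fullness to a step size between 1 and 31, the function names and the truncating (dead-zone) quantizer are illustrative assumptions, not the rate control of any particular standard.

```python
def quantizer_step(buffer_fill: int, buffer_size: int,
                   q_min: int = 1, q_max: int = 31) -> int:
    """Hypothetical linear mapping from output-buffer fullness to step size."""
    if buffer_size <= 0:
        raise ValueError("buffer_size must be positive")
    fullness = min(max(buffer_fill / buffer_size, 0.0), 1.0)
    # Empty buffer -> finest step (q_min); full buffer -> coarsest step (q_max)
    return round(q_min + fullness * (q_max - q_min))

def quantize(coeffs, step):
    """Uniform quantization with truncation toward zero: small amplitudes become 0."""
    return [int(c / step) for c in coeffs]

step = quantizer_step(500, 1000)          # half-full buffer -> step 16
levels = quantize([100, 7, -3, 0], step)  # small coefficients collapse to zero
```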

After quantization, the block is scanned diagonally ("zigzag" scanning), followed by entropy coding, which brings about a further reduction in data. Two effects are used for this:

1.) The statistics of the amplitude values are exploited: high amplitude values occur less frequently than small ones, so long code words are assigned to the rare events and short code words to the frequent ones (variable-length coding, VLC). On average, this yields a lower data rate than fixed-length coding. The variable rate of the VLC is then smoothed in the buffer memory.

2.) Use is made of the fact that, in most cases, only zeros follow from a certain position onward. Instead of all these zeros, only an EOB (End Of Block) code is transmitted, which yields a significant coding gain in the compression of the image data. Instead of the original 512 bits, for example, only 46 bits are then needed for this block, which corresponds to a compression factor of over 11.
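The zigzag scan and the replacement of trailing zeros by an EOB symbol can be sketched as below. The (run, level) symbol format and the string "EOB" are illustrative assumptions; the variable-length code tables that would map these symbols to bits are omitted.

```python
def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zigzag order."""
    # Walk the anti-diagonals r + c = s; odd diagonals run downward
    # (row increasing), even diagonals run upward (row decreasing).
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]))

def run_level_eob(block):
    """Scan a quantized block and emit (zero_run, level) pairs plus 'EOB'."""
    scanned = [block[r][c] for r, c in zigzag_order(len(block))]
    # Everything after the last non-zero value is replaced by one EOB symbol
    last = max((i for i, v in enumerate(scanned) if v != 0), default=-1)
    symbols, run = [], 0
    for v in scanned[:last + 1]:
        if v == 0:
            run += 1
        else:
            symbols.append((run, v))
            run = 0
    symbols.append("EOB")
    return symbols

block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[2][0] = 50, -3, 2
symbols = run_level_eob(block)   # 4 symbols instead of 64 coefficients
```

Here 61 trailing zeros collapse into a single symbol, which is the effect that makes the quoted factor of about 11 (512 / 46 ≈ 11.1) possible.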

A further compression gain is obtained through temporal processing (interframe coding). Coding difference images requires a lower data rate than coding the original images, because the amplitude values are much lower.

However, the temporal differences are small only if the movements in the picture are small. If the movements in the picture are large, large differences arise, which in turn are difficult to code. For this reason, the picture-to-picture movement is measured (motion estimation) and compensated before the difference is formed (motion compensation).

The motion information is transmitted together with the image information; usually only one motion vector per macroblock (e.g. four 8x8 image blocks) is used.

Even smaller amplitude values of the difference images are obtained if a motion-compensated bidirectional prediction is used instead of the unidirectional prediction described above.

In a motion-compensated hybrid coder, it is not the image signal itself that is transformed but the temporal difference signal. For this reason, the coder also contains a temporal recursion loop, because the predictor must calculate the prediction value from the values of the (coded) images already transmitted. An identical temporal recursion loop is located in the decoder, so that encoder and decoder are fully synchronized.

The MPEG-2 coding method provides three main ways in which pictures can be processed:

I-pictures: No temporal prediction is used for I-pictures, i.e. the picture values are transformed and encoded directly. I-pictures are used so that the decoding process can be restarted without knowledge of the past, or to resynchronize in the event of transmission errors.

P-pictures: For P-pictures, a temporal prediction is made from a preceding I- or P-picture, and the DCT is applied to the temporal prediction error.

B-pictures: For B-pictures, the temporal bidirectional prediction error is calculated and then transformed. The bidirectional prediction basically works adaptively, i.e. forward prediction, backward prediction or interpolation are permitted.

In MPEG-2 coding, an image sequence is divided into so-called GOPs (Groups Of Pictures): the n pictures from one I-picture up to the next form a GOP. The distance between the P-pictures is denoted by m, so that there are m-1 B-pictures between consecutive P-pictures. The MPEG syntax, however, leaves the choice of m and n to the user. m = 1 means that no B-pictures are used, and n = 1 means that only I-pictures are encoded.
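As an illustration of the parameters n and m, the following hypothetical helper lists the picture types of one GOP in display order. It is a simplified sketch: the reordering of B-pictures for transmission is not modeled here.

```python
def gop_pattern(n: int, m: int) -> str:
    """Picture types of one GOP in display order (n pictures, P distance m)."""
    if n < 1 or m < 1:
        raise ValueError("n and m must be >= 1")
    types = []
    for i in range(n):
        if i == 0:
            types.append("I")      # each GOP starts with an I-picture
        elif i % m == 0:
            types.append("P")      # every m-th picture is a P-picture
        else:
            types.append("B")      # m-1 B-pictures in between
    return "".join(types)

print(gop_pattern(12, 3))  # IBBPBBPBBPBB
```

With m = 1 the helper yields only I- and P-pictures, and with n = 1 only a single I-picture per GOP, matching the two limiting cases in the text.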

[2] discloses a method for motion estimation in the context of block-based image coding. It is assumed that a digitized image has pixels that are grouped into image blocks of 8x8 or 16x16 pixels. If necessary, several image blocks can also be combined into a larger block. An example of this is a macroblock with 6 image blocks, of which 4 carry brightness information and 2 carry color information.

For a picture to be coded in a sequence of pictures, the following procedure is carried out for its picture blocks:

■ For the image block for which motion estimation is to be carried out, a value of an error measure is first determined in a temporally preceding image, starting from the image block at the same relative position in that image (= previous image block). For this purpose, a sum of the absolute differences between the coding information of the pixels of the image block and that of the previous image block is preferably determined. Coding information here means brightness information (luminance value) and/or color information (chrominance value), each of which is assigned to a pixel.

■ In a search space of predeterminable size and shape around the starting position in the previous image, a value of the error measure is determined for each area of the same size as the previous image block, shifted by one pixel or half a pixel at a time.

■ In a search space of n×n pixels, there are n² (error) values. The shifted block in the temporally preceding image for which the error measure yields the minimum error value is determined. This block is assumed to be the one that best matches the picture block to be coded for which the motion estimation is performed.

■ The result of the motion estimation is a motion vector that describes the displacement between the picture block in the picture to be coded and the selected picture block in the previous picture. The image data are compressed by encoding the motion vector and the error signal.

■ In particular, the motion estimation is carried out for each image block of an image.
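The full search described above can be sketched as follows, using the sum of absolute differences (SAD) of luminance values as the error measure. For brevity the sketch searches whole-pixel shifts only (no half-pixel positions), uses a square search window of hypothetical radius `radius`, and skips candidate blocks that would extend beyond the reference image.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between a block and a shifted reference block."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total

def full_search(cur, ref, bx, by, bs=8, radius=4):
    """Return ((dx, dy), error) of the best whole-pixel match in the search window."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidate blocks that would leave the reference image
            if not (0 <= by + dy and by + dy + bs <= h
                    and 0 <= bx + dx and bx + dx + bs <= w):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    if best is None:
        raise ValueError("block lies outside the reference image")
    return best[1], best[0]

# Synthetic test: the current picture is the previous one shifted by (-1, +2).
ref = [[(7 * x + 13 * y) % 256 for x in range(32)] for y in range(32)]
cur = [[ref[y + 2][x - 1] if y + 2 < 32 and x >= 1 else 0 for x in range(32)]
       for y in range(32)]
motion_vector, error = full_search(cur, ref, 12, 12)  # ((-1, 2), 0)
```

A window of radius r contains (2r+1)² candidate positions, i.e. the n² error values of the text with n = 2r+1.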

The use of motion estimation in the context of block-based or object-based image coding is described in [3].

In general, it is practically impossible to search for content in image data, especially compressed image data. Such a search would have to rely on objects in the image sequences that are not available in descriptive form but exist only as part of the image data stream.

The object of the invention is to make an image data stream searchable with regard to the information contained in the image data.

This object is achieved in accordance with the features of the independent claims. Further developments of the invention also result from the dependent claims.

To achieve the object, a method for storing at least one image by a computer is specified, in which relational information associated with the at least one image is stored.

This relational information can in particular be stored together with the at least one image. Alternatively, a reference (pointer) to the relational information can be stored together with the image. In a further development, the relational information is determined before it is stored.

In a further development, the relational information comprises feature information and reference information between objects and/or images. The feature information provides, e.g., information about a movement feature; the reference information creates the link to the object or image for which the feature information is relevant.

In particular, the relational information identifies a predefined relationship between two objects; both the type of the relationship (feature information) and the objects involved in it (reference information) can be combined in the relational information.

It should be noted here that the association of the relational information with the image can also be realized by storing a reference to the relational information. The information and the image data need not be held in the same memory. They can be distributed over any storage locations; preferably, link information (a pointer) is stored by means of which the actual information can be found.
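A minimal sketch of this storage scheme, under stated assumptions: the class and field names below are hypothetical, image data are treated as opaque bytes, and the "pointer" is modeled as a string key into a separate table of relational records.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple

@dataclass
class RelationalInfo:
    feature: Dict[str, float]      # feature information, e.g. relative motion
    references: List[str]          # reference information: linked objects/images
    relation: str = "included-in"  # type of relationship (hypothetical label)

@dataclass
class ImageStore:
    images: Dict[str, bytes] = field(default_factory=dict)
    relations: Dict[str, RelationalInfo] = field(default_factory=dict)
    # image key -> keys of its relational records (the stored "pointers")
    links: Dict[str, List[str]] = field(default_factory=dict)

    def store(self, key: str, data: bytes,
              relations: Iterable[Tuple[str, RelationalInfo]] = ()) -> None:
        self.images[key] = data
        for rel_key, rel in relations:
            self.relations[rel_key] = rel          # may live in a separate memory
            self.links.setdefault(key, []).append(rel_key)

    def relations_of(self, key: str) -> List[RelationalInfo]:
        """Follow the stored pointers to the actual relational information."""
        return [self.relations[r] for r in self.links.get(key, [])]

store = ImageStore()
rel = RelationalInfo(feature={"dx": 2.0, "dy": 0.0}, references=["square", "triangle"])
store.store("frame0", b"\x00\x01", [("r1", rel)])
```

Keeping the relational records in their own table means they can be searched without touching the (possibly compressed) image bytes.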

In a further development, the at least one picture is a sequence of several pictures.

When images are stored, additional information, referred to here as relational information, is accordingly determined and stored with the images. The type of relational information explained below enables a later search for certain image data. The search preferably takes place in the relational information itself, so the image data, which are preferably in compressed form, do not need to be specially decompressed. With such a search at a high level of abstraction, e.g. "red car drives from left to right through the picture", specific pictures, here those showing the red car, can be found. Such a search capability has not existed up to now for image data, in particular compressed image data.

Image compression can in particular be based on an image compression standard, e.g. an MPEG or an H.26x standard.

In a further development, the relational information comprises at least one of the following options:

a) Movement information:

As stated in [2], motion information (in particular between objects) can be determined automatically from image data. From MPEG-4 onward, objects can be identified in a picture; the picture itself is structured hierarchically (comparable to a tree structure). The hierarchical relationships of the objects to one another can be supplemented by motion information between the respective objects. This motion information identifies the relative movement of the connected objects. The complete hierarchical structure yields the total movement (relative and absolute) of all objects present and relevant in the picture. The hierarchical structure of the image (or the scene) can be defined according to different criteria. One example is an "included-in" relationship, i.e. the hierarchical structure indicates which objects are (at least partially) contained in other objects. Other hierarchical divisions of the scene are also possible.

b) Distance information:

Instead of or in addition to the movement information, the distance information between objects can be determined and stored. The distance can be determined, for example, on the basis of an edge boundary or the center of gravity of an object. Taken together, the distances between the multiple objects fully describe the arrangement of the scene.

c) Overlap information:

The overlap information records the type or degree of overlap between objects as relational information. Taken together, the overlaps describe the arrangement of the objects within the scene.

d) Information regarding a relationship between objects and/or images:

In general, any relationship between objects and/or images can be used as relational information. The hierarchical arrangement of the objects of a scene described above can follow the chosen relationship.
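Distance and overlap information can be sketched for objects given as sets of pixel coordinates. The choice of the centroid (center of gravity) distance and of the intersection-over-union ratio as the degree of overlap is an assumption; the text only requires some distance and some overlap measure. An "included-in" test is added for the relationship used in the hierarchical arrangement.

```python
import math

def rect(x0, y0, w, h):
    """Hypothetical helper: pixel set of an axis-aligned rectangular object."""
    return {(x, y) for x in range(x0, x0 + w) for y in range(y0, y0 + h)}

def centroid(mask):
    n = len(mask)
    return (sum(x for x, _ in mask) / n, sum(y for _, y in mask) / n)

def centroid_distance(a, b):
    """Distance information: Euclidean distance of the centers of gravity."""
    (ax, ay), (bx, by) = centroid(a), centroid(b)
    return math.hypot(ax - bx, ay - by)

def overlap_degree(a, b):
    """Degree of overlap (assumed measure): intersection over union, 0..1."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def is_contained_in(inner, outer):
    """Type of overlap: True if every pixel of 'inner' lies inside 'outer'."""
    return inner <= outer

a, b = rect(0, 0, 4, 4), rect(2, 0, 4, 4)
d = centroid_distance(a, b)   # 2.0
o = overlap_degree(a, b)      # 8 shared pixels / 24 total = 1/3
```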

The following parameters in particular can be recorded for the movement information: translation (along the coordinate axes), rotation and zoom (enlargement/reduction) of the object.
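The three motion parameters can be combined into one similarity transform per object. The sketch below is an assumption about representation (a counter-clockwise rotation angle in radians, a scalar zoom factor and a translation vector); it also shows how two successive motions compose into one.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Motion:
    tx: float = 0.0        # translation along the x axis
    ty: float = 0.0        # translation along the y axis
    rotation: float = 0.0  # radians, counter-clockwise (assumed convention)
    zoom: float = 1.0      # > 1 enlarges, < 1 reduces

    def apply(self, x, y):
        """Map a point: p' = zoom * R(rotation) * p + t."""
        c, s = math.cos(self.rotation), math.sin(self.rotation)
        return (self.zoom * (c * x - s * y) + self.tx,
                self.zoom * (s * x + c * y) + self.ty)

    def then(self, other):
        """Single motion equivalent to applying self first, then other."""
        tx, ty = other.apply(self.tx, self.ty)
        return Motion(tx, ty, self.rotation + other.rotation, self.zoom * other.zoom)

step1 = Motion(tx=1.0, ty=-2.0, rotation=0.3, zoom=1.5)
step2 = Motion(tx=0.5, ty=4.0, rotation=-1.1, zoom=0.8)
combined = step1.then(step2)   # motion over two time steps
```

Rotation angles add and zoom factors multiply, while the first translation is carried through the second motion; this is what allows per-step motion information to be accumulated over a sequence.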

Furthermore, information obtained by a transformation over time can also serve as relational information. In such a case, objects or images are preferably transformed over a predetermined period of time, the transformation yielding values that describe the average movement over time. Such an average is obtained, for example, by means of a discrete cosine transformation (DCT). It should be expressly noted here that the relational information can be determined in particular between two images or between two objects of an image, taking into account its change over time (for example, movement information).
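The averaging property of the DCT mentioned here can be checked with a small sketch: for an orthonormal DCT-II of the per-time-step motion values, the first (DC) coefficient equals the sum divided by the square root of the number of steps, so the mean movement can be read from it. The per-step input values are hypothetical.

```python
import math

def dct_1d(values):
    """Orthonormal 1-D DCT-II of a sequence of per-time-step motion values."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one time step")
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        coeffs.append(scale * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                                  for i, v in enumerate(values)))
    return coeffs

def mean_motion(per_step_motion):
    """Average movement recovered from the DC coefficient."""
    coeffs = dct_1d(per_step_motion)
    # X[0] = sum(values) / sqrt(N), hence mean = X[0] / sqrt(N)
    return coeffs[0] / math.sqrt(len(per_step_motion))

dx_per_step = [2.0, 3.0, 1.0, 2.0]   # hypothetical horizontal motion per step
avg_dx = mean_motion(dx_per_step)    # 2.0
```

The higher coefficients describe how the movement varies around this average, so a truncated coefficient set can serve as compact temporal relational information.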

As already mentioned above, an image or a scene can comprise a large number of objects that are connected to one another and whose positions change differently over time. The relational information can be determined between two objects according to their hierarchical arrangement. Alternatively, the relational information can also be determined based on absolute information (e.g. absolute coordinates within the image). The relative information between the objects can be derived from the absolute information, and vice versa.
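The conversion between relative and absolute information can be sketched for the hierarchy of Fig. 1 (a square containing a rectangle and a triangle). Only translation is modeled here, as a simplifying assumption; the object names and coordinates are hypothetical.

```python
def absolute_positions(parent, relative):
    """Relative -> absolute.

    parent:   object -> parent object (None for the root of the hierarchy)
    relative: object -> (x, y) offset to its parent (absolute for the root)
    """
    cache = {}

    def resolve(name):
        if name not in cache:
            rx, ry = relative[name]
            p = parent[name]
            if p is None:
                cache[name] = (rx, ry)
            else:
                px, py = resolve(p)          # walk up the tree to the root
                cache[name] = (px + rx, py + ry)
        return cache[name]

    return {name: resolve(name) for name in parent}

def relative_positions(parent, absolute):
    """Absolute -> relative, using the same parent links."""
    rel = {}
    for name, p in parent.items():
        x, y = absolute[name]
        if p is None:
            rel[name] = (x, y)
        else:
            px, py = absolute[p]
            rel[name] = (x - px, y - py)
    return rel

parent = {"square": None, "rectangle": "square", "triangle": "square"}
relative = {"square": (10, 20), "rectangle": (1, 1), "triangle": (5, 3)}
absolute = absolute_positions(parent, relative)
```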

In one embodiment, the relational information is added to a feature set of an image compression method. The image compression method is in particular a standardized one, for example an MPEG standard or an H.26x standard.

The method described can preferably be used in the context of encoding using an image compression method.

In a further development, image data stored according to the described method can be accessed selectively by means of suitable search mechanisms that evaluate the relational information. For example, the movement information between objects stored in the feature set can be searched and found in a targeted manner. This makes it possible to search for the red car, mentioned at the beginning, that moves from left to right through an image. It should be noted that the search itself can combine the different kinds of relational information described by the method in different ways. This enables an "intelligent" evaluation of the different information within an application-specific search. Relational information alone enables searching in image data that otherwise have no searchable features.
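The targeted search can be sketched as a filter over stored records. The record layout, the color attribute standing in for intrinsic information, and the criterion for "left to right" (all per-step horizontal displacements non-negative, x growing to the right) are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class ObjectRecord:
    object_id: str
    color: str                # intrinsic information (hypothetical field)
    dx_per_step: List[float]  # relational motion info relative to the scene
    frames: List[int]         # stored images in which the object appears

def moves_left_to_right(rec: ObjectRecord, min_total: float = 1.0) -> bool:
    # Assumption: x grows to the right; require steady rightward motion
    return (sum(rec.dx_per_step) >= min_total
            and all(dx >= 0 for dx in rec.dx_per_step))

def search(records: List[ObjectRecord], color: str,
           predicate: Callable[[ObjectRecord], bool]) -> List[Tuple[str, List[int]]]:
    """Only the stored metadata is inspected; no image data are decoded."""
    return [(r.object_id, r.frames) for r in records
            if r.color == color and predicate(r)]

records = [ObjectRecord("car1", "red", [3.0, 2.5, 4.0], [10, 11, 12]),
           ObjectRecord("car2", "red", [-2.0, -3.0], [20, 21]),
           ObjectRecord("bus", "blue", [5.0, 5.0], [30, 31])]
hits = search(records, "red", moves_left_to_right)   # [("car1", [10, 11, 12])]
```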

To achieve the object, an arrangement for storing at least one image is also specified, in which a processor unit is provided that is set up in such a way that relational information associated with the at least one image can be stored.

This arrangement is particularly suitable for carrying out the method according to the invention or one of its developments explained above.

Embodiments of the invention are illustrated and explained below with reference to the drawing.

The figures show:

Fig. 1 a scene that is hierarchically divided into three objects;

Fig. 2 an image sequence that represents a scene over the course of time;

Fig. 3 a possibility for storing object-related image data;

Fig. 4 a sketch illustrating a transmitter and a receiver for image compression;

Fig. 5 a sketch with an image encoder and an image decoder in greater detail;

Fig. 6 a processor unit;

Fig. 7 an alternative embodiment for storing object-related image data.

Fig. 1 shows, in the form of a tree diagram, a hierarchical structure consisting of a square 101, a rectangle 102 and a triangle 103. The connections 104 and 105, between square 101 and rectangle 102 and between square 101 and triangle 103 respectively, correspond to an "included-in" relation, i.e. the square 101 contains both the rectangle 102 and the triangle 103.

This relationship is illustrated in Fig. 2, which shows a scene in different forms 201, 202, 203 and 204 over time. The objects of the hierarchical structure of Fig. 1 are present in every temporal version of the scene.

Thus, the square 101 moves from its starting position 205 downward to the left 206, further downward 207 and then to the right 208. Within the square 101, the rectangle 102 keeps its position at the top left unchanged over the time steps (indicated by arrows 217, 218 and 219) (see positions 209, 210, 211 and 212). The triangle 103, which is also contained in the square 101, moves gradually from an initial position 213 over the time steps 217 to 219 (see positions 214, 215 and 216). The relations 104 and 105 from Fig. 1 can thus be supplemented, for each time step, by the motion information of the two connected objects: square 101 and rectangle 102 (for connection 104), and square 101 and triangle 103 (for connection 105). The relative change in position per time step is preferably specified using the parameters translation (along the coordinate axes), rotation and zoom.

Methods for motion estimation, as mentioned in the introduction, can also be used here.

Fig. 3 shows a possibility for storing image data, in particular when an image is divided into objects, e.g. according to the MPEG-4 standard. A sequence 301 of image data for an object 1 and a sequence 302 of image data for an object 2 are shown, each with a feature set that contains both intrinsic data 303 and 305 (e.g. shape and color of the object) and relational information 304 and 306.

The relation is preferably also supplemented by a reference 315 or 316 (pointer). This reference represents the link between the hierarchically structured objects. In the example of Fig. 1, object 1 corresponds to the square 101 and object 2 to the rectangle 102. Arrow 316 indicates the relation "contains" and arrow 315 the relation "contained in". The change in position between object 1 and object 2 over the sequences 301 and 302 is also stored in the fields for relational information 304 and 306, respectively.

The object-related data 307 to 310 (for object 1) and 311 to 314 (for object 2) make up the respective sequences 301 and 302. Relational information is determined and stored for these sequences; in particular, each sequence is interpreted as a "global" movement, i.e. the feature set (303 and 304, or 305 and 306) is determined and stored for the sequence as a whole.

Fig. 4 shows an arrangement that comprises two computers and a camera, illustrating image coding, transmission of the image data and image decoding.

A camera 1101 is connected to a first computer 1102 via a line 1119. The camera 1101 transmits captured images 1104 to the first computer 1102. The first computer 1102 has a first processor unit 1103, which is connected via a bus 1118 to an image memory 1105. The image coding methods are carried out with the processor unit 1103 of the first computer 1102. Image data 1106 encoded in this way is transmitted from the first computer 1102 to a second computer 1108 via a communication link 1107, preferably a line or a radio link. The second computer 1108 contains a second processor unit 1109 which is connected to the image memory 1111 via a bus 1110. Methods for image decoding are carried out on the second processor unit 1109.

Both the first computer 1102 and the second computer 1108 each have a screen 1112 or 1113 on which the image data 1104 are visualized. Input units are provided for operating both the first computer 1102 and the second computer 1108, preferably a keyboard 1114 or 1115, and a computer mouse 1116 or 1117.

The image data 1104, which are transmitted from the camera 1101 to the first computer 1102 via the line 1119, are preferably data in the time domain, while the data 1106, which are transmitted from the first computer 1102 to the second computer 1108 via the communication link 1107, are image data in the spectral domain. The decoded image data are displayed on a screen 1120.

Fig. 5 shows a sketch of an arrangement for carrying out a block-based image coding method.

A video data stream to be encoded, consisting of chronologically successive digitized images, is fed to an image coding unit 1201. The digitized images are divided into macroblocks 1202, each macroblock having 16x16 pixels. The macroblock 1202 comprises 4 picture blocks 1203, 1204, 1205 and 1206, each picture block containing 8x8 picture elements to which luminance values (brightness values) are assigned. Furthermore, each macroblock 1202 comprises two chrominance blocks 1207 and 1208 with chrominance values (color information, color saturation) assigned to the pixels.

The pixels of an image block are assigned a luminance value (= brightness), a first chrominance value (= hue) and a second chrominance value (= color saturation). The luminance value, the first chrominance value and the second chrominance value are referred to as color values.

The image blocks are fed to a transformation coding unit 1209. In differential image coding, values of image blocks of temporally preceding images are subtracted from the image blocks currently to be coded, and only the resulting difference information 1210 is fed to the transformation coding unit 1209 (discrete cosine transformation, DCT). For this purpose, the current macroblock 1202 is communicated to a motion estimation unit 1229 via a connection 1234. In the transformation coding unit 1209, spectral coefficients 1211 are formed for the picture blocks or difference picture blocks to be coded and are fed to a quantization unit 1212. This quantization unit 1212 corresponds to the quantization device according to the invention.

The quantized spectral coefficients 1213 are fed both to a scan unit 1214 and, in a reverse path, to an inverse quantization unit 1215. After a scanning process, e.g. a "zigzag" scan, entropy coding is carried out on the scanned spectral coefficients 1232 in an entropy coding unit 1216 provided for this purpose. The entropy-coded spectral coefficients are transmitted as coded image data 1217 via a channel, preferably a line or a radio link, to a decoder.

An inverse quantization of the quantized spectral coefficients 1213 takes place in the inverse quantization unit 1215. The spectral coefficients 1218 obtained in this way are fed to an inverse transformation coding unit 1219 (inverse discrete cosine transformation, IDCT). In differential image mode, the reconstructed coding values (also differential coding values) 1220 are supplied to an adder 1221. The adder 1221 also receives coding values of an image block that result from a temporally preceding image after motion compensation has been carried out. The adder 1221 forms reconstructed image blocks 1222, which are stored in an image memory 1223.

Chrominance values 1224 of the reconstructed image blocks 1222 are supplied from the image memory 1223 to a motion compensation unit 1225. For brightness values 1226, an interpolation takes place in an interpolation unit 1227 provided for this purpose. The interpolation preferably doubles the number of brightness values contained in the respective image block. All brightness values 1228 are supplied both to the motion compensation unit 1225 and to the motion estimation unit 1229. The motion estimation unit 1229 also receives, via the connection 1234, the image blocks of the macroblock to be coded in each case (16x16 pixels). In the motion estimation unit 1229, the motion estimation is carried out taking the interpolated brightness values into account ("motion estimation on a half-pixel basis"). In the motion estimation, absolute differences are preferably determined between the individual brightness values of the macroblock 1202 currently to be coded and those of the reconstructed macroblock from the previous image.
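The half-pixel interpolation feeding the motion estimation can be sketched as follows. The bilinear averaging and its rounding (adding half the divisor before integer division) are assumptions modeled on common MPEG practice; the output has 2h-1 by 2w-1 samples, i.e. roughly twice as many values per dimension.

```python
def half_pel_interpolate(img):
    """Insert half-pixel samples between the full-pixel brightness values."""
    h, w = len(img), len(img[0])
    out = [[0] * (2 * w - 1) for _ in range(2 * h - 1)]
    for y in range(2 * h - 1):
        for x in range(2 * w - 1):
            y0, x0 = y // 2, x // 2
            y1, x1 = y0 + (y % 2), x0 + (x % 2)
            # 1 neighbour at full-pel positions, 2 on edges, 4 at centres
            pts = {(y0, x0), (y0, x1), (y1, x0), (y1, x1)}
            total = sum(img[r][c] for r, c in pts)
            # Assumed rounding: (total + n/2) // n, done in integers
            out[y][x] = (2 * total + len(pts)) // (2 * len(pts))
    return out

fine = half_pel_interpolate([[10, 20], [30, 40]])
# [[10, 15, 20], [20, 25, 30], [30, 35, 40]]
```

Running the block-matching search on this finer grid, with shifts measured in half pixels, gives the "motion estimation on a half-pixel basis" named above.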

The result of the motion estimation is a motion vector 1230, by means of which a local displacement of the selected macroblock from the temporally previous image to the macroblock 1202 to be coded is expressed.

Both brightness information and chrominance information relating to the macroblock determined by the motion estimation unit 1229 are shifted by the motion vector 1230 and subtracted from the coding values of the macroblock 1202 (see data path 1231).

Fig. 6 shows a processor unit PRZE. The processor unit PRZE comprises a processor CPU, a memory SPE and an input/output interface IOS, which is used in different ways via an interface IFC: output is displayed on a monitor MON and/or printed on a printer PRT via a graphics interface. Input is made using a mouse MAS or a keyboard TAST. The processor unit PRZE also has a data bus BUS, which connects a memory MEM, the processor CPU and the input/output interface IOS. Furthermore, additional components can be connected to the data bus BUS, for example additional memory, data storage (hard disk) or a scanner.

Fig. 7 shows an alternative embodiment to Fig. 3 for storing object-related image data. A sequence 701 of image data for object 1 and a sequence 702 of image data for object 2 are shown. Intrinsic information belonging to the respective object (shape, color of the object) 703 or 704 is stored with object 701 or 702. The relational information 713 is preferably stored separately from the respective objects 701 and 702. The relational information 713 includes feature information 714 relating to the link between objects 701 and 702, e.g. the movement of object 1 relative to object 2. The link itself is established using the reference information 715, 716, which preferably contains references to the objects 701 and 702 associated with the feature information 714.

The object-related data 705 to 708 and 709 to 712 each make up a sequence belonging to the respective object. A sequence can include any number of images (of the respective object).

Bibliography:

[1] J. De Lameillieure, R. Schäfer: "MPEG-2 image coding for digital television", Television and Cinema Technology, 48th year, No. 3/1994, pp. 99-107.

[2] M. Bierling: "Displacement Estimation by Hierarchical Block Matching", SPIE Vol. 1001, Visual Communications and Image Processing '88, pp. 942-951, 1988.

[3] ITU-T, International Telecommunication Union, Telecommunication Standardization Sector of ITU: Draft ITU-T Recommendation H.263, "Video coding for low bit rate communication", 2 May 1996.

Claims


1. A method for storing at least one image by a computer, in which relational information associated with the at least one image is stored.

2. The method of claim 1, wherein the relational information on the at least one image is determined.

3. The method according to any one of the preceding claims, wherein the relational information comprises feature information and reference information between objects and/or images.

4. The method according to any one of the preceding claims, wherein the at least one image is a sequence of several images.

5. The method according to any one of the preceding claims, wherein the relational information comprises at least one of the following options: a) movement information; b) distance information; c) overlap information; d) information related to a relationship between objects and/or images.

6. The method of claim 5, wherein the movement information comprises the following parameters: a) translation, b) rotation, c) zoom.

7. The method according to any one of the preceding claims, wherein the relational information is determined based on a transformation over time.

8. The method according to any one of the preceding claims, wherein the relational information is determined for two images.

9. The method according to any one of the preceding claims, wherein the at least one image contains at least two objects for which the relational information is determined.

10. The method of claim 9, wherein the relational information is determined for two objects.

11. The method according to any one of the preceding claims, wherein the relational information is added to a set of features according to an image compression standard.

12. The method of claim 11, wherein the image compression standard is an MPEG standard or an H.26x standard.

13. The method according to any one of the preceding claims for use in encoding according to an image compression method.

14. The method according to any one of the preceding claims, in which a search in the stored data is made possible on the basis of the relational information stored with the at least one image.

15. Arrangement for storing at least one image by a computer, in which a processor unit is provided which is set up in such a way that relational information associated with the at least one image can be stored.