
WO2023160807A1 - Method and image processor unit for processing image data - Google Patents

Method and image processor unit for processing image data

Info

Publication number
WO2023160807A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
raw
images
resolution
burst
Prior art date
Application number
PCT/EP2022/054830
Other languages
French (fr)
Inventor
Noha El-Yamany
Original Assignee
Dream Chip Technologies Gmbh
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dream Chip Technologies Gmbh filed Critical Dream Chip Technologies Gmbh
Priority to PCT/EP2022/054830 priority Critical patent/WO2023160807A1/en
Priority to TW111145854A priority patent/TW202335481A/en
Publication of WO2023160807A1 publication Critical patent/WO2023160807A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/68Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
    • H04N23/682Vibration or motion blur correction
    • H04N23/685Vibration or motion blur correction performed by mechanical compensation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N25/00Circuitry of solid-state image sensors [SSIS]; Control thereof
    • H04N25/48Increasing resolution by shifting the sensor relative to the scene


Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Studio Devices (AREA)
  • Transforming Light Signals Into Electric Signals (AREA)
  • Image Processing (AREA)

Abstract

The invention refers to a method for processing image data (IMGRAW) of an image sensor (4), wherein the image data comprise a raw matrix of pixels per image, characterised by: a) Activating a mechanical actuator (6) coupled to the image sensor (4) to cause vibrational motion of the image sensor (4); b) Capturing a burst of images while the mechanical actuator (6) is activated; c) Aligning the raw matrices of pixels of the burst of captured images (IMGRAW) to one specific alignment; and d) Combining the burst of images (IMGRAW) to achieve a resulting image (IMGFIN) by use of the plurality of pixels of the raw matrices available for each pixel position in the matrix of the resulting image (IMGFIN).

Description

Method and image processor unit for processing image data
The invention relates to a method for processing image data of an image sensor, wherein the image data comprise a raw matrix of pixels per image, i.e. raw image data.
The invention further relates to an image processor unit for processing raw image data provided by an image sensor, said image sensor comprising a sensor array that provides a raw matrix of pixels per image.
Further, the invention relates to a computer program arranged to carry out the steps of the aforementioned method.
Digital imagers are widely used in everyday consumer products, such as smartphones, tablets, notebooks, cameras, cars and wearables. The use of small imaging sensors is becoming a trend to maintain a small, light-weight product form factor as well as to reduce the production cost. Even when imaging sensors with a high number of megapixels are used, colour filter arrays (CFAs), such as the common Bayer colour filter array, are typically used to reduce the cost. The use of colour filter arrays limits or reduces the spatial resolution, as the full colour image is produced via interpolation (demosaicing) of undersampled colour channels.
Digital zooming is a common feature that achieves zooming into a region of interest (ROI) in the image, enlarging it to a bigger size (even up to the full sensor size), or trimming away the content that is outside the region of interest. Digital zooming is typically cheaper than optical zooming by use of an optical lens unit, as it is performed via image processing operations and not through a complicated opto-mechanical lens system. However, unlike optical zooming, digital zooming does not always produce high-quality images. The quality of digital zooming and the achievable magnification are further limited by the aforementioned resolution-limiting factors, in particular the small sensor size and the use of colour filter arrays. Achieving high-quality zooming with arbitrary magnification, overcoming the resolution-limiting factors of the imaging system and boosting the true optical resolution are therefore very desirable features for the consumer of the imaging product in question.
The quality and the achievable magnification factor of digital zooming via interpolation of a region of interest (ROI) in a single image are limited. The use of information from multiple frames in the production of the zoomed region of interest can result in significantly better image quality (IQ). Multi-frame super-resolution (MFSR) is known as a successful means for increasing the spatial resolution as well as the detail in images. It relies on the fusion of multiple, sub-pixel shifted frames to produce a higher-resolution frame and is therefore a suitable option for digital zooming.
B. Wronski, I. Garcia-Dorado, M. Ernst, D. Kelly, M. Krainin, C. Liang, M. Levoy and P. Milanfar: “Handheld multi-frame super-resolution”, ACM Trans. Graph., Vol. 38, No. 4, Art. 28, July 2019 discloses a multi-frame super-resolution (MFSR) method wherein a burst of raw frames, shifted by sub-pixel amounts due to normal hand shaking (hand tremor), is fused to produce a higher-resolution frame. This approach is limited to the Bayer colour filter array and relies solely on the hand tremor, which would not be sufficient to produce zoomed images with arbitrary magnification factors, especially when the sensor colour filter array has a sparser sampling of the colour channels, such as the Hexadeca Bayer CFA and the RGBW CFA (RGBW = red-green-blue-white). A complete RGB image is directly created from a burst of colour filter array raw images, wherein the burst of raw frames is acquired with small offsets due to the natural hand tremor. These frames are then aligned and merged to form a single image with red, green and blue values at every pixel site. The raw image demosaicing is replaced with a multi-frame super-resolution algorithm.

N. El-Yamany and P. Papamichalis: “Robust color image superresolution: An Adaptive M-Estimation Framework”, EURASIP Journal on Image and Video Processing, Vol. 2008, Art. No. 763254, 2008 discloses an adaptive M-estimation framework for robust colour image super-resolution using a robust error norm in the data fidelity term of the objective function and adapting the estimation process to each of the low-resolution frames and each of the colour components. This results in a colour super-resolution image with crisp details and no artefacts, without the use of regularisation.
N. El-Yamany, P. Papamichalis and M. Christensen: “Adaptive framework for robust high-resolution image reconstruction in multiplexed computational imaging architectures”, Applied Optics, Vol. 47, No. 10, 01.04.2008, p. B117-B126 discloses an adaptive algorithm for robust reconstruction of high-resolution images in multiplex computational imaging architectures.
P. Vandewalle, K. Krichane, D. Alleysson and S. Süsstrunk: “Joint demosaicing and super-resolution imaging from a set of unregistered aliased images”, Proceedings Volume 6502, Digital Photography III, Electronic Imaging, 2007 presents an algorithm that performs demosaicing and super-resolution jointly from a set of raw images sampled with a colour filter array. The combined approach makes it possible to compute the alignment parameters between the images from raw camera data before interpolation artefacts are introduced. Between the input images, there is some small, unknown motion that can be modelled as planar motion. The motion is estimated using a frequency domain method on the low-frequency information of the Bayer colour filter array images. Luminance and chrominance are separated for each of the input images. The higher-resolution luminance and chrominance images are computed separately. They are combined to form the higher-resolution colour image.
S. Farsiu, M. Elad, P. Milanfar: “Multi-frame demosaicing and super-resolution of color images”, IEEE Transactions on Image Processing, Vol. 15, No. 1, pp. 141-159, January 2006 discloses a hybrid method of super-resolution and demosaicing based on a maximum a posteriori (MAP) estimation technique that minimises a multi-term cost function. The difference between the projected estimate of the high-resolution image and each low-resolution image is determined. Outliers in the data and errors due to possibly inaccurate motion estimation are removed. Bilateral regularisation is used for spatially regularising the luminance component to improve the sharpness of edges, forcing interpolation along the edges and not across them.
S. Farsiu, M. Elad, P. Milanfar: “Multi-Frame Demosaicing and Super-Resolution from Under-Sampled Color Images”, Proc. of SPIE - The International Society for Optical Engineering, May 2004 explains the fusion of a set of low-resolution images with relative motion into a high-resolution image, using an example of 16 Bayer-pattern low-resolution images. The (e.g. Bayer) colour-filtered low-resolution images are fused together, increasing the resolution by a factor of 4 in each direction by use of a maximum likelihood estimate of the high-resolution image. This is a shift-and-add of all low-resolution images. The pattern of the colour distribution in the high-resolution image does not necessarily follow the original Bayer pattern, but rather depends on the relative motion of the low-resolution images. The field of view of real-world low-resolution images changes from one frame to the other, so that the centre and the border patterns of red, green and blue pixels differ in the resulting high-resolution image.
The object of the present invention is to provide an improved method and an image processor unit for processing image data of an image sensor.
The object is achieved by the method comprising the features of claim 1, the image processor unit comprising the features of claim 11 and the computer program comprising the features of claim 13. Preferred embodiments are described in the dependent claims.
In order to achieve a burst of images with a sufficient shift of the pixel positions, the method comprises the steps of: a) Activating a mechanical actuator coupled to the image sensor to cause vibrational motion of the image sensor, and b) Capturing a burst of images while the mechanical actuator is activated. The burst of images, e.g. a burst of frames of a complete image, is then fused by the steps: c) Aligning the raw matrices of pixels of the burst of captured images to one specific alignment, and d) Combining the burst of images to achieve a resulting image by use of the plurality of pixels of the raw matrices available for each pixel position in the matrix of the resulting image.
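As an illustration, the four steps map onto a short control loop. The following is a minimal Python sketch, assuming hypothetical `camera` and `actuator` driver objects; `estimate_shift` and `fuse_raw` are placeholders for the registration and fusion operations described further below.

```python
def capture_super_resolved(camera, actuator, num_frames, magnification):
    # a) activate the mechanical actuator to impose vibrational motion
    actuator.activate()
    # b) capture a burst of raw CFA frames while the vibration is ongoing
    burst = [camera.capture_raw() for _ in range(num_frames)]
    actuator.deactivate()
    # c) align every raw matrix to a base frame (here simply the first one)
    shifts = [estimate_shift(burst[0], frame) for frame in burst]
    # d) fuse the aligned raw matrices into one higher-resolution image
    return fuse_raw(burst, shifts, magnification)
```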
This makes it possible to boost the true optical resolution during digital zooming without changes to the existing opto-mechanical lens system in the camera device, thus maintaining a small form factor and not increasing the product cost. The method achieves high-quality digital zooming with arbitrary magnification factors. There is no need for a controlled setup. Further, there is no requirement that the captured scenes meet certain constraints.
Activating a mechanical actuator has the effect that, regardless of hand tremor or in addition to it, movement of the image sensor is forced in such a way that the pixel positions shift by a considerably larger distance than when capturing sensor images with hand tremor alone. Therefore, the method can be applied to various colour filter array arrangements, in particular those in which pixels of the same colour are spaced further apart than in the Bayer CFA.
By activating an actuator of a handheld device as the mechanical actuator, e.g. a vibrator unit that is provided primarily for other purposes, no additional hardware is required. The vibrator unit can be, e.g., the vibrator unit used in smartphones, tablets and wearables for signalling purposes. Thus, existing hardware/mechanics in various commonly available camera products can be used without additional cost. Such mechanical actuators, like vibrator units, are also not necessarily coupled to the image sensor in a way that moves the image sensor relative to the housing of the camera device or to a holding frame of the camera device.
Thus, the image sensor itself remains in its relative position in respect to the holding frame of the image sensor or the device housing. Activating the mechanical actuator has the effect of vibrational movement of the holding frame or device housing together with the image sensor.
In order to control the capturing of a burst of images (including frames) while the mechanical actuator is activated, it is preferred to select the region of interest (ROI) of an image to be captured and to determine the size of the region and the magnification factor related to the raw image captured by the image sensor before proceeding with the steps a) to d). The selection of the region of interest can be used as a trigger signal for the automated activation of the mechanical actuator in step a) once the region of interest (ROI) is selected.
The size of the region of interest and the magnification factor related to the raw image or raw frame can be automatically determined based on the known size of an image captured by the image sensor and the selected region of interest of the complete image. The magnification factor is the relation between the size of the raw image captured by the image sensor and the selected size of the image to be captured, i.e. the region of interest.
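In code, this determination reduces to a simple ratio. The helper below is an illustrative sketch; the names and the per-axis `min` convention are assumptions, not taken from the text.

```python
def magnification_factor(sensor_size, roi_size):
    """Ratio between the raw image size and the selected ROI size."""
    sensor_w, sensor_h = sensor_size   # full raw image, e.g. (4000, 3000) pixels
    roi_w, roi_h = roi_size            # selected region of interest, e.g. (1000, 750)
    return min(sensor_w / roi_w, sensor_h / roi_h)

# A 1000 x 750 ROI on a 4000 x 3000 sensor yields a magnification factor of 4.
```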
Selecting the region of interest reduces the required memory capacity, since only a buffer for the ROIs is needed; this is relevant in particular for memory-limited imaging systems.
The mechanical actuator can be activated dynamically to vibrate per a pre-defined or a controlled trajectory by adapting the intensity and duration of the vibration. As a result, a sufficient number of sub-pixel shifts for the target magnification factor can be achieved while taking the colour filter array arrangement into account. As a result of the dynamic vibration, the image sensor vibrates dynamically as well. It is possible to program the actuator to vibrate per a pre-defined trajectory, which can be adapted to a respective colour filter array arrangement. This produces sub-pixel shifts that are sufficient to safeguard a combination of a burst of images such that, for each pixel position, all colour information can be obtained from the raw image data of the respective colour filter array.

The alignment of the raw matrices in step c) can be performed such that the captured burst of raw frames is aligned to a selected base frame. The pixel data of the aligned frames as well as the pixel data of the base frame are fused in the raw domain in step d) by use of the raw pixel data.
The alignment is also named “registration” of the captured burst of images, e.g. raw frames. The alignment can be performed by a registration of the burst of raw frames to the first base frame being zoomed into. Fusing the information of the registered frame as well as the base frame in the raw domain results in a full colour zoomed region of interest with a higher resolution via a robust multi-frame super-resolution algorithm.
The selection of the base frame as reference frame for the alignment can be performed adaptively, in particular to ensure that only high-quality frames with usable information are used. This can be done, e.g., with the two steps (functionalities):
1) Initial selection of the candidate frames for global motion estimation, based on an estimate of frame sharpness. Blurry frames due to optical and/or motion blur can be discarded in this selection step. This can take place via a quantification of the optical and/or motion blur in the frame, wherein:
- The gradient is calculated for the frame or the frame ROI;
- The percentage of edge pixels is calculated;
- A predefined threshold is applied on the percentage of edge pixels to decide the usability of the frame for motion estimation and applicability as reference frame for the multi-frame fusion.
2) Confirmation of the frame selection for global motion estimation and subsequent multi-frame fusion, based on an estimate of the inter-frame motion. Frames with a lot of local motion can be discarded in this confirmation step. This can take place via a quantification of the inter-frame motion, wherein:
- The percentage of reliable motion regions in the frame is calculated, for example, from the percentage of reliable motion vectors in the motion vector field;
- A predefined threshold is applied to the percentage of reliable motion vectors to decide the usability of the frame for motion estimation and usability as reference frame for the multi-frame fusion.
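A compact sketch of the two gating functionalities above; the threshold values are illustrative, not taken from the text, and the per-vector reliability mask is assumed to come from the motion estimator.

```python
import numpy as np

def frame_is_usable(frame_roi, mv_reliable, edge_thresh=0.05, motion_thresh=0.8):
    # Step 1: sharpness gate - gradient magnitude and percentage of edge pixels.
    gy, gx = np.gradient(frame_roi.astype(np.float64))
    grad = np.hypot(gx, gy)
    edge_ratio = np.mean(grad > 0.1 * grad.max())
    if edge_ratio < edge_thresh:
        return False  # blurry frame: unusable for motion estimation and fusion

    # Step 2: motion gate - percentage of reliable motion vectors in the field.
    return np.mean(mv_reliable) >= motion_thresh
```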
The robust multi-frame super-resolution result of the selected region of interest (ROI) can then be rendered or produced for the user. Thus, the resulting image achieved in step d) can be rendered for a zoomed area of the raw image captured by the image sensor. The step d) can include a step of colour interpolation of pixel data as well as an increase of the spatial resolution. The increase of the spatial resolution is primarily the result of the fusion of raw pixel data of a burst of images.
The selected region of interest can also be the full image, i.e. a magnification factor of 1 or a 100 % image size. In this way, the method can be used to boost the true optical resolution without any digital zooming performed, i.e. it can be used as a demosaicing solution compensating the resolution loss introduced by traditional colour filter array data interpolation. The method for processing image data does not require any separate step of demosaicing, since this step can also be performed by fusing the burst of raw pixel data.
The raw frames can be divided into uniformly shaped regions. The step of aligning the raw matrices in step c) can then be performed independently in each region.
For global motion model parameter estimation, the whole frame information can be used, whereas for local motion model parameter estimation, the division of frames into uniformly shaped regions and registration independently in each region can be used.
The raw matrices can be aligned in step c) by use of the colour channel with the highest sampling, e.g. G (green) in the Bayer colour filter array or W (white) in the RGBW colour filter array. Preferably, the colour channel with the highest sampling is used in the registration/alignment procedure as long as it is neither saturated nor zero.
The chosen colour channel of each of the low-resolution frames can be interpolated on the low-resolution grid to fill in the missing values due to the colour filter array sampling. The frames can be aligned to a selected base frame on the low-resolution grid based on the information of the chosen colour channel. This approach improves the accuracy of the image registration/alignment as well as its speed, since only the colour channel with the highest sampling is used in the frame matching. If more than one colour channel has the same highest sampling in the colour filter array, all of them could be used in the frame matching for alignment/registration.
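The sketch below illustrates this strategy for an RGGB Bayer mosaic: the green channel is filled in on the low-resolution grid by simple neighbour averaging, and a global shift is then estimated by phase correlation. Both the averaging kernel and the integer-only phase correlation (a real pipeline would add sub-pixel refinement) are simplifying assumptions.

```python
import numpy as np
from scipy.ndimage import convolve

def interpolate_green(bayer):
    """Fill in missing green samples of an RGGB mosaic on the low-resolution grid."""
    h, w = bayer.shape
    yy, xx = np.mgrid[0:h, 0:w]
    green = (yy + xx) % 2 == 1            # green sites of the RGGB layout
    kernel = np.ones((3, 3))
    total = convolve(np.where(green, bayer, 0.0), kernel)
    count = convolve(green.astype(float), kernel)
    return np.where(green, bayer, total / count)

def estimate_shift(base_g, frame_g):
    """Global integer translation between two green channels via phase correlation."""
    cross = np.fft.fft2(base_g) * np.conj(np.fft.fft2(frame_g))
    corr = np.fft.ifft2(cross / np.maximum(np.abs(cross), 1e-12)).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape                      # unwrap to signed displacements
    return (dy - h if dy > h // 2 else dy, dx - w if dx > w // 2 else dx)
```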
The object is further solved by the image processor unit comprising the features of claim 11 or 12 and the computer program comprising instructions which, when the program is executed by a processing unit, cause the processing unit to carry out the steps of the aforementioned method.
In the following, the invention is explained by way of exemplary embodiments with reference to the following figures, which show:
Figure 1 - Block diagram of an electronic device comprising a camera, an image processor unit and a mechanical actuator;
Figure 2 - Examples of different colour filter array patterns;
Figure 3 - Flow diagram of the method for processing image data of an image sensor;
Figure 4 - Schematic diagram of a handheld device arranged to perform the method for processing image data;
Figure 5 - Schematic diagram of the method steps of aligning and combining a burst of images,
Figure 6 - Example of resampling of low-resolution frames with the Bayer colour filter array pattern on a raw high-resolution grid;
Figure 7 - Example of fewer samples available with missing frame index numbers 9 and 14;
Figure 8 - Example of a twofold magnification of an image with the Bayer colour filter array by use of the method for image processing;
Figure 9 - Table of ideal 36 x, y shifts for a fourfold magnification;
Figure 10 - Example of a fourfold magnification for the Bayer colour filter array frames with the ideal 36 x, y shifts.

Figure 1 is an exemplary block diagram of an electronic device 1 comprising a camera 2 and an image processor unit 3 for processing raw image data IMGRAW provided by an image sensor 4 of the camera 2.
The image sensor 4 comprises an array of pixels so that the raw image IMGRAW is a data set in a raw matrix of pixels per image. In order to capture colours in the image, a colour filter array CFA is provided in the optical path in front of the image sensor 4. The camera comprises an opto-mechanical lens system 5, e.g. a fixed uncontrolled lens.
The electronic device 1 further comprises a mechanical actuator 6. The mechanical actuator 6 is provided in the electronic device primarily for other purposes, e.g. for signalling to a user. This is a well-known feature of smartphones for signalling incoming new messages or calls.
In this regard, the electronic device 1 can be a handheld device like a smartphone, a tablet, wearables or a camera or the like.
The image processor unit 3 is arranged for processing image data IMGRAW from the image sensor 4 as described in the following and for controlling the mechanical actuator 6 in order to capture a burst of images (e.g. frames) per image/frame while imposing motion of the electronic device 1, including the image sensor 4, over the course of capturing the burst for one image/frame.
The image processor unit 3 is arranged to align the matrices of pixels of bursts of captured images to one specific alignment and to combine the burst of images to achieve a resulting image IMGFIN by use of the plurality of pixels of the raw matrices available for each pixel position and the matrix of the resulting image.
Figure 2 presents different pixel arrays assigned to different colour filter arrays.
In a), the well-known low-order Bayer pixel array assigned to the Bayer colour filter array CFA is shown. A number of 2 x 2 blocks including red (R), green (G) and blue (B) pixels are repeated over the 8 x 8 pixel array. In b), a 6 x 6 colour filter array pattern is shown, which forms a higher-order colour filter array pattern compared to the Bayer colour filter array pattern. The 6 x 6 block contains four 3 x 3 blocks of a respective colour red (R), green (G) and blue (B). The 3 x 3 blocks of a specific colour are arranged according to the colour arrangement of a 2 x 2 block in the Bayer colour filter array pattern (R-G-G-B).
In c), a QuadBayer colour filter array pattern is shown. This higher-order pixel array contains pixels in an 8 x 8 matrix in which a 4 x 4 block is repeated, where each 4 x 4 block is formed by four 2 x 2 blocks of a respective colour R, G and B. The 2 x 2 blocks of a respective colour are arranged in the way of the lower-order Bayer pattern R-G-G-B.
In d), a HexaDeca colour filter array is presented, which is formed by an 8 x 8 matrix. Here, four blocks of 4 x 4 array size are assigned to a respective colour R, G, B. The four blocks of the respective colour are arranged in the same order as in the low-order Bayer colour filter array pattern shown in a).
Figure 3 shows a flow diagram of the method for processing raw image data IMGRAW of an image sensor 4 with the steps: a) Activating a mechanical actuator 6 coupled to the image sensor 4 to cause vibrational motion of the image sensor 4; b) Capturing a burst of images while the mechanical actuator 6 is activated; c) Aligning the raw matrices of pixels of the burst of captured images to one specific alignment; and d) Combining the burst of images to achieve a resulting image IMGFIN by use of the plurality of pixels of the raw matrices available for each pixel position in the matrix of the resulting image IMGFIN.
The step a) of activating the mechanical actuator 6 can be triggered by a previous step of selecting a region of interest ROI in the image into which the user would like to zoom. The selection of the region of interest ROI can be reduced to a selection of a magnification factor. When selecting the region of interest ROI, for example manually on a screen, it is possible for a programmed processor to calculate the size of the selected region of interest ROI as well as the target magnification factor. The target magnification factor is the relationship between the selected region of interest ROI and the full image size provided by the image sensor 4. When defining the magnification factor and the related region of interest ROI, the available memory resources in the electronic device 1 can be taken into account.
The electronic device and, in particular, the image processor unit can be arranged in such a way that the user can perform the selection of the region of interest ROI repeatedly to zoom into a progressively smaller region of interest ROI.
Preferably, the selection of the region of interest ROI and/or the magnification factor triggers an activation of the mechanical actuator, i.e. is used as a trigger signal for step a).
As a result of initiating the movement of the mechanical actuator 6 in step a), the user receives a haptic feedback as an acknowledgment for the request of the digital zooming feature. This is because the mechanical actuator 6 of the electronic device 1 is coupled to the housing of the electronic device 1 so that the electronic device vibrates due to the action of the mechanical actuator 6.
As a result of initiating the movement of the mechanical actuator 6, it vibrates, causing the image sensor 4 to vibrate as well. In this regard, the mechanical actuator 6 is not primarily intended for and mounted to the image sensor 4 directly in order to cause a controlled movement of the image sensor 4. The mechanical actuator 6 is rather mounted to the housing of the electronic device 1 or to a frame inside the device housing, wherein the image sensor 4 is also coupled to this frame or housing.
Preferably, the housing of the mechanical actuator 6 is coupled to the image sensor 4 and its holding frame, e.g. a printed circuit board, directly or indirectly, so that no relative movement between the mechanical actuator 6 and the image sensor 4 occurs. Driving the mechanical actuator 6 has the effect of vibration of the holding frame of the mechanical actuator 6, including the housing of the electronic device 1 and the image sensor 4.

After the mechanical actuator 6 has been activated in step a), a burst of images is captured in step b) while the mechanical actuator is still activated. The vibration of the electronic device 1, including the image sensor 4, causes sub-pixel shifts of the captured images. In the meaning of the present invention, an image can also be understood as a frame of a full image, i.e. a part image of a full image.
The intensity and duration of the vibration of the electronic device 1 due to the activation of the mechanical actuator 6 can optionally be programmed to produce enough sub-pixel shifts for the target magnification factor and to take the colour filter array arrangement into account. It is also possible to program the mechanical actuator 6 to vibrate per a pre-defined trajectory.
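To make the trajectory requirement concrete, the sketch below enumerates a dense grid of target displacements for an M-fold magnification and a CFA whose pattern repeats every P pixels. This is a sufficient rather than minimal set, under the assumption that multiples of 1/M up to the CFA period cover all colour/phase combinations; Figure 9, by contrast, lists a smaller ideal set of 36 shifts for the fourfold Bayer case.

```python
import itertools

def target_shifts(magnification, cfa_period=2):
    """Dense grid of (x, y) displacements in low-resolution pixels.

    Steps of 1/M realise every sub-pixel phase; extending them up to the
    CFA period P lets every colour of the mosaic visit every site."""
    steps = [i / magnification for i in range(magnification * cfa_period)]
    return list(itertools.product(steps, steps))

# Bayer (P = 2) at twofold magnification: 16 displacement pairs, i.e. the
# reference position plus 15 nonzero shifts, matching the example of Figure 8.
print(len(target_shifts(2)))  # 16
```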
As a result of the movement of the mechanical actuator 6, the image sensor 4 also vibrates, imaging the same scene with shifted views of it.
While the image sensor 4 is moving due to the vibration caused by the mechanical actuator 6, a burst of raw frame regions of interest (ROIs) is captured. The number of frames can be determined based on a statistical analysis of the movement of the mechanical actuator 6, the correspondingly generated sub-pixel shifts and the required number of sub-pixel shifts for the target colour filter array arrangement as well as the target magnification factor.
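One plausible way to make such a statistical analysis concrete is a coupon-collector estimate: if each captured frame lands on a uniformly random cell of the required shift grid, the expected burst length until every cell has been hit at least once can be computed as below. The uniformity assumption is introduced here purely for illustration.

```python
import math

def expected_burst_length(magnification, cfa_period=2):
    """Expected number of frames until all (M*P)^2 shift cells are covered,
    assuming each frame falls on a uniformly random cell (coupon collector)."""
    n = (magnification * cfa_period) ** 2
    harmonic = sum(1.0 / k for k in range(1, n + 1))
    return math.ceil(n * harmonic)

print(expected_burst_length(2))  # 55 frames for twofold magnification with Bayer
```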
The captured burst of raw frames or raw frame regions of interest (ROIs) is then aligned (registered) to a selected base frame (typically the first one being zoomed into), and the information of the registered frames as well as the base frame is fused in the raw domain, producing a full-colour zoomed (higher-resolution) image for the region of interest ROI via robust multi-frame super-resolution (MFSR).
The resulting image IMGFIN of the robust multi-frame super-resolution MFSR algorithm for the selected region of interest ROI can then be rendered/produced for the user.
In this regard, the method provides the steps of: c) Aligning the raw matrices of pixels of the burst of captured images to one specific alignment; and d) Combining the burst of images to achieve a resulting image by use of the plurality of pixels of the raw matrices available for each pixel position and the matrix of the resulting image IMGFIN.
These steps c) and d) are explained later on.
Figure 4 is a schematic diagram illustrating the steps of selecting the region of interest ROI by the hand 7 of a user on the display of the electronic device 1.
This causes movement of the mechanical actuator 6, i.e. it triggers the activation of the mechanical actuator 6, which causes a vibration of the electronic device and provides a haptic feedback to the user, as indicated by the arrow pointing to the first illustrated electronic device 1.
The activation of the mechanical actuator 6 has the effect that not only the housing of the electronic device but also the image sensor 4 physically move, e.g. vibrate.
The mechanical actuator 6 vibrates for a sufficient amount of time to allow enough raw images/frames to be captured. The intensity and the duration of the vibration can be tuned to generate enough sub-pixel displacements for the target zoom/magnification factor.
Figure 5 is a schematic diagram of steps c) and d) of the method.
The burst of raw images/frames IMGRAW1...n captured by the image sensor 4 comprises, for example, a matrix of pixels for red (R), green (G), blue (B) and possibly also white (W) colour information.
The sub-pixel shifts between the low-resolution frames IMGRAW1...n are key to the working of the multi-frame super-resolution MFSR algorithm for arbitrary magnification factors r. This requires an image registration method that can find the right shifts (and other motion model parameters) between each of the low-resolution frames IMGRAWi and the base low-resolution frame IMGRAW1. In addition to the required sub-pixel accuracy, the image registration algorithm should be as fast as possible in order to minimize the latency introduced by the combined operations of image registration and fusion.
Thus, it is proposed that the registration of the low-resolution frames IMGRAW1...n takes place in the raw colour filter array domain, as the intention is to jointly perform both colour interpolation (demosaicing) and spatial resolution increase. Since the registration in the under-sampled colour filter array space may not result in the required accuracy, the following strategy is pursued, as indicated in Figure 5:
1. The colour channel with the highest sampling (e.g. G in the Bayer colour filter array or W in the RGBW colour filter array) is used in the registration procedure as long as it is neither saturated nor zero.
2. The chosen colour channel of each of the low-resolution frames IMGRAW1...n is interpolated on the low-resolution grid to fill in the missing values due to the colour filter array sampling. The frames are registered to a selected base frame on the low-resolution grid based on the information from the chosen colour channel. This approach improves the accuracy of the image registration as well as its speed, since only the colour channel with the highest sampling is used in the frame matching. If more than one colour channel has the same highest sampling in the colour filter array, all of them could be used in the frame matching for registration.
3. For global motion model parameter estimation, the whole frame information can be used. For local motion model parameter estimation, the frame can be divided into uniformly shaped regions and the registration can be performed independently in each region.
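A sketch of the region-wise variant, reusing the `estimate_shift` helper from the earlier listing; the uniform grid split follows the text, while everything else is illustrative.

```python
def register_regions(base_g, frame_g, grid=(4, 4)):
    """Estimate one displacement per uniformly shaped region (local motion)."""
    h, w = base_g.shape
    rh, rw = h // grid[0], w // grid[1]
    shifts = {}
    for i in range(grid[0]):
        for j in range(grid[1]):
            win = (slice(i * rh, (i + 1) * rh), slice(j * rw, (j + 1) * rw))
            shifts[(i, j)] = estimate_shift(base_g[win], frame_g[win])
    return shifts
```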
After step c) of aligning/registering the raw low-resolution images/frames, a robust adaptive multi-frame fusion takes place in step d) separately for each colour R, G, B.
Image registration errors are a possibility due to inaccuracy in the assumed motion model, occlusion and local motion, among other reasons. Further, the presence of noise should be considered. The multi-frame super-resolution MFSR fusion should be robust to the presence of such inaccuracies, including noise, in order to render artefact-free and sharp content.
The robust fusion can be based on redescending M-estimators known from the prior art. To account for real scenarios, where both global and local motion can be present in the captured low-resolution frames and frame regions of interest ROIs, the frame can be divided into uniformly shaped regions. This is the same division which can be pursued in the registration step. Super-resolution can be pursued as a minimization of the proposed cost function as follows:
$$\hat{X} = \underset{X}{\operatorname{arg\,min}} \sum_{c=1}^{M} \sum_{r=1}^{R} \sum_{k=1}^{N} \rho_{k,r}\big(E_{k,c,r}\big), \qquad E_{k,c,r} = S_c\, D\, H_r\, F_{k,r}\, X_{c,r} - Y_{k,c,r}$$
where
X = the unknown high-resolution frame,
X_{c,r} = region r in the colour channel c of the unknown high-resolution frame, where c is the index of the colour channel in the colour filter array, e.g. R, G and B in the standard Bayer colour filter array,
F_{k,r} = the motion (displacement) operator for region r in frame k; in the case of only-global-motion models, F_{k,r} = F_k, k = 1, 2, ..., N, where N is the number of low-resolution frames/frame regions of interest (ROIs),
H_r = the point spread function (PSF) of the camera, which is allowed to be spatially variant, changing per region r if necessary,
D = the down-sampling operator, which is the reciprocal of the target magnification operator and is assumed to be the same for all frames,
S_c = the colour filter array sub-operator for the colour channel c,
Y_{k,c,r} = region r in the colour channel c of the low-resolution frame k,
ρ_{k,r} = the robust cost function/robust estimator for region r in the low-resolution frame k,
M = the number of colour channels in the colour filter array pattern,
R = the number of regions in the frame.
Each robust estimator function ρ_{k,r} has an outlier threshold T_{k,r} that is dynamically calculated based on the error term E_{k,c,r}: it is set to a high value when the error term is small in region r and to a small value when the error term is larger for that region, hence rejecting outliers from the fusion result.
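A minimal sketch of such an adaptively thresholded estimator; the hard cut-off and the median-based threshold rule are illustrative choices, as the text leaves the exact estimator and threshold mapping open.

```python
import numpy as np

def redescending_weight(error, threshold):
    """Weight 1 inside the outlier threshold, 0 beyond it (hard redescender)."""
    weight = np.ones_like(error)
    weight[np.abs(error) > threshold] = 0.0
    return weight

def dynamic_threshold(region_error, t_small=0.02, t_large=0.2):
    """T_{k,r} per region: generous when the region fits well, strict otherwise."""
    typical = np.median(np.abs(region_error))
    return t_large if typical < t_small else t_small
```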
For a better understanding of the steps c) and d) of aligning the raw matrices of pixels and combining the burst of images (fusion), the following key points are explained in more detail:

A. Raw data fusion versus demosaiced data fusion
Most of the multi-frame super-resolution solutions developed and available in the prior art rely on fusing full-colour (e.g. full-RGB) low-resolution (LR) frames in order to produce a full-colour higher-resolution frame. The full-colour low-resolution frames are typically produced in consumer cameras via demosaicing, which varies in detail reconstruction capabilities, depending on the sensor colour filter array pattern and the demosaicing solution itself. Interpolation of the under-sampled colour filter array data limits the resolution and detail in each of the low-resolution frames used in MFSR, and hence limits the final quality of the reconstructed high-resolution (HR) frame.
Unlike the fusion of demosaiced frames, raw MFSR fusion (i.e. fusion in the colour filter array data space) results in higher image quality, since the colour filter array under-sampling is compensated by the rich content available from the multiple sub-pixel shifted low-resolution frames. The MFSR in this case takes the responsibility for generating the full-colour high-resolution frame, i.e. performing colour interpolation as well as the spatial resolution increase.
B. Number of low-resolution frames for MFSR
One of the main factors that limits the achievable magnification in MFSR in uncontrolled setups is the lack of sufficient sub-pixel displacements between the captured frames that are fused to generate the higher-resolution frame. Normal hand shaking (tremor) in handheld devices, such as smartphones, tablets and wearables, is not sufficient to produce enough sub-pixel displacements to produce zoomed images at arbitrary magnification factors (small to big), especially with the colour filter array under-sampling of colours. To elaborate on this point, a monochromatic imaging sensor is considered. The ideal 15 integer shifts in the high-resolution grid (sub-pixel shifts in the low-resolution grid) to enable fourfold (4x) magnification using MFSR are listed in the table in Figure 6.
This Figure 6 shows a Bayer colour filter array pattern having a set of 4 x 4 pixel matrices for R-G-B colour.
Figure 6 further shows the resulting pattern after re-sampling the low-resolution frames on the raw high-resolution grid based on a raw image captured with the Bayer colour filter array pattern for the ideal 15 integer shifts, i.e. the frames having the index numbers 1 to 16.
For each pixel position the resulting colour R, G, B is indicated together with the frame index number for this pixel position, which refers to a specific displacement in the high-resolution grid.
The low-resolution (LR) frame #1 has no displacement and is selected as reference frame. All other LR frames #2 to #16 are to be aligned (registered) to this reference frame #1. Each of the other frames #2 to #16 has its own specific displacement from the reference LR frame #1. These displacements are indicated in the X- (horizontal) and Y- (vertical) directions as depicted in the table.
Optionally, the reference frame can be adaptively selected. The selection can be processed, in particular, to ensure that only high-quality frames with usable information are used.
The low-resolution frames are resampled to the high-resolution grid, wherein a gap of 4 pixels between neighbouring pixels of the low-resolution grid occurs at a magnification of 4. This is highlighted for the R-G-G-B pixels of frame #1 by a bold outline. The 16 displacements cover all necessary shifts in a 4x4 pixel block to enable 4x magnification in the monochromatic case. Thus, each pixel position of the 4x magnified 16x16 high-resolution frame is filled by a sample of a 4x4 low-resolution grid. Thus, the Bayer colour filtered low-resolution images are fused together, increasing the resolution by a factor of 4 in each direction by shift-and-add of all low-resolution images as described in S. Farsiu, M. Elad, P. Milanfar: “Multi-Frame Demosaicing and Super-Resolution from Under-Sampled Color Images”, Proc. of SPIE - The International Society for Optical Engineering, May 2004.
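A sketch of this shift-and-add placement; the `cfa_index` map (giving the colour index of each low-resolution site) and the wrap-around indexing used for brevity are assumptions, and the shifts are taken to be exact multiples of 1/M.

```python
import numpy as np

def shift_and_add(frames, shifts, M, cfa_index):
    """Accumulate raw CFA samples of aligned LR frames on the M-fold HR grid."""
    h, w = frames[0].shape
    acc = np.zeros((3, h * M, w * M))      # one plane per colour R, G, B
    count = np.zeros_like(acc)
    yy, xx = np.mgrid[0:h, 0:w]
    for frame, (dy, dx) in zip(frames, shifts):
        hy = (yy * M + int(round(dy * M))) % (h * M)
        hx = (xx * M + int(round(dx * M))) % (w * M)
        np.add.at(acc, (cfa_index, hy, hx), frame)
        np.add.at(count, (cfa_index, hy, hx), 1.0)
    return acc / np.maximum(count, 1), count  # mean sample and density per site
```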
Since each low-resolution frame is in planar colour-filter-array raw format, the starting pixel in the top-left corner can be any of the colours of the colour filter array, i.e. in the example R, G or B, and not necessarily R. The resulting pattern of the colour distribution in the high-resolution image does not necessarily follow the original Bayer pattern, but rather depends on the relative motion of the low-resolution images. The field of view of real-world low-resolution images changes from one frame to the other, so that the centre and the border patterns of red, green and blue pixels differ in the resulting high-resolution image.
Frame #3, for example, after alignment has its starting top-left pixel at sample position B3 with blue colour, and the frame has a displacement of 2 in the (horizontal) X-direction and 0 in the (vertical) Y-direction. The top-left sample B3 appears adjacent to the top-left sample R2 of frame #2.
The sub-pixel shift in the low-resolution grid caused by the vibration due to the mechanical actuator is related to the pixel shift in the high-resolution grid as follows. For M-fold magnification, 1/M pixel accuracy is required. Thus, the displacements in the X- and Y-directions of the low-resolution frame with respect to the selected reference frame must be multiples of the defined sub-pixel accuracy, which is 1/M. For 4x magnification, it is required to estimate the displacements between the low-resolution frames with 0.25 pixel accuracy. For 2x magnification, 0.5 pixel accuracy is needed.
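Rounding an estimated displacement to this 1/M grid is a one-liner; the helper below is illustrative, not from the text.

```python
def snap_to_grid(shift, magnification):
    """Round a (dx, dy) displacement to the nearest multiple of 1/M pixels."""
    return tuple(round(s * magnification) / magnification for s in shift)

print(snap_to_grid((0.27, -0.52), 4))  # (0.25, -0.5)
```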
The colour filter array low-resolution frames are resampled to the magnified (e.g. 4x4) high-resolution grid based on the corresponding displacements, and each frame has its own colour sample at the top-left corner pixel. This is indicated in the table of Figure 6 for the example. It is clear that after re-sampling, the raw data in the high-resolution grid is sparsely distributed and far from the original symmetric Bayer pattern. In the case of redundant low-resolution frames, some positions will have more colour samples than others. As a result, the reconstruction of the high-resolution frame would not be of spatially consistent image quality.
Figure 7 shows an example where fewer samples are available to fill in the high-resolution grid. In this example, the frame index numbers 9 and 14 are missing, because the related pixel data R9, G9, B9, R14, G14, B14 are missing in the resulting re-sampled high-resolution pattern.
Thus, the resulting reconstruction quality would be lower than in the scenario shown in Figure 6. This is indicated by the blanks in the matrix pattern.
For high-quality MFSR reconstruction that successfully overcomes the colour filter array colour under-sampling and boosts true optical resolution at arbitrary magnification factors r, it is preferred to have low-resolution frames with a sufficient number of sub-pixel shifts in the low-resolution grid so that the raw high-resolution grid would be densely populated (filled) with more colour samples available in each position.
The raw colour-filter-array data is colour-undersampled in the high-resolution grid. For example, for the Bayer pattern, only one colour sample is available for each position in the high-resolution grid. For a perfect super-resolution result, at least three colour samples R, G and B are desired for each position on the high-resolution grid. Compared to the monochromatic scenario in Figure 6, the 16 displacements are therefore not sufficient.
Figure 8 shows an example of a twofold magnification in the Bayer colour filter array pattern with the ideal 15 (x, y) shifts for the red channel R and, correspondingly, in the other colour channels, where R_i and B_i are the red and blue channel samples of frame #i, respectively, and frame #1 is, in this example, the base low-resolution frame. With the shifts listed in the table of Figure 8, there are red, green and blue samples available in each position in the high-resolution grid, enabling high-quality colour interpolation as well as the spatial resolution increase. This is indicated for a 4 x 4 matrix in the x, y directions.
In the example, for each X-, Y-displacement three frames are considered having a different color at the top-left starting point of the low-resolution grid. This is listed for the Red (R) Channel (X, Y) Shifts and the related R/B Channels in the table b) in Figure 8.
Figure 9 presents a table of the ideal 36 x, y shifts for a fourfold magnification. Again, the shift numbers of pixels in the x and y directions are listed and assigned to the frame numbers #1 to #36.
Figure 10 illustrates the example of the fourfold magnification in the Bayer colour filter array pattern with the 36 ideal x, y shifts of Figure 9. It is apparent that for each position in the high-resolution grid, four samples are available, with each of the colours R, G and B being represented in the set for the position.
Thus, it is evident that for high-quality MFSR-based digital zooming or MFSR-based demosaicing, a burst of low-resolution frames or low-resolution regions of interest ROIs with sufficient sub-pixel shifts in the low-resolution grid should be available. This is safeguarded by activating the mechanical actuator 6 of the electronic device 1, since normal hand shaking (tremor) is not sufficient to produce such a dense sub-pixel distribution in the burst of the captured raw images/raw frames IMGRAW.
Thus, by programming the mechanical actuator 6 with the right intensity and duration, the required sub-pixel shifts can be guaranteed to exist in the burst of raw frames IMGRAW (including raw full images and raw frame regions of interest ROIs). The resulting vibration of the electronic device creates uniformly distributed sub-pixel shifts in the range defined by the target magnification factor r and the respective colour filter array pattern of the camera 2.

The image processor unit can be a digital signal processor suitably programmed by a computer program that comprises instructions which, when the program is executed by the image processor unit, cause the processing unit to carry out the steps a) to d) of the aforementioned method.
The method and the image processor unit, including the computer program, can use existing hardware and, in particular, the mechanical actuator, which are commonly available for other purposes in various camera products such as smartphones, tablets and wearables. The method boosts the true optical resolution during digital zooming without changes to the existing optical lens system in the electronic device, thus maintaining a small form factor and not increasing the product cost. The method achieves high-quality digital zooming with arbitrary magnification factors r and neither assumes a controlled setup nor requires the captured scenes to meet certain constraints. The method applies to various colour filter array arrangements, wherein only a buffer for the regions of interest ROIs is required in the case of memory-limited imaging systems. The method can be used to boost the true optical resolution without any digital zooming performed, i.e. it can be used as a demosaicing solution compensating the resolution loss introduced by traditional colour filter array data interpolation.

Claims

1. Method for processing image data (IMGRAW) of an image sensor (4), wherein the image data comprise a raw matrix of pixels per image, characterised by: a) Activating a mechanical actuator (6) coupled to the image sensor (4) to cause vibrational motion of the image sensor (4); b) Capturing a burst of images while the mechanical actuator (6) is activated; c) Aligning the raw matrices of pixels of the burst of captured images (IMGRAW) to one specific alignment; and d) Combining the burst of images (IMGRAW) to achieve a resulting image (IMGFIN) by use of the plurality of pixels of the raw matrices available for each pixel position in the matrix of the resulting image (IMGFIN).
2. Method according to claim 1, characterised by activating an actuator of a handheld device, e.g. a vibrator unit, as the mechanical actuator (6), which is provided primarily for other purposes.
3. Method according to claim 1 or 2, characterised by selecting a region of interest (ROI) of an image to be captured and determining the size of the region of interest (ROI) and the magnification factor (r) related to the raw image (IMGRAW) captured by the image sensor (4) before proceeding with the steps a) to d).
4. Method according to claim 3, characterised in that the selection of the region of interest (ROI) is used as a trigger signal for the activation of the mechanical actuator (6) in step a).
5. Method according to one of the preceding claims, characterised by actuating the mechanical actuator (6) dynamically to cause the image sensor (4) to vibrate per a pre-defined or controlled trajectory.
6. Method according to one of the preceding claims, characterised by aligning the raw matrices in step c) such that the captured burst of raw images (IMGRAW) is aligned to a selected base image, wherein the pixel data of the aligned respective pixel matrix as well as the pixel data of the related base image are fused in the raw domain in step d) by use of the raw pixel data.
7. Method according to claim 6, characterised by adaptively selecting the base image, in particular the reference frame, by initially selecting candidate frames for motion estimation based on an estimate of the sharpness of the image, and confirming the selection of a frame for motion estimation and subsequent use as reference frame for the alignment based on an estimate of the inter-frame motion.
8. Method according to one of the preceding claims, characterised by rendering the resulting image (IMGFIN) achieved in step d) or a zoomed area of the raw image (IMGRAW) captured by the image sensor.
9. Method according to one of the preceding claims, characterised in that step d) comprises colour interpolation of pixel data as well as an increase of the spatial resolution.
10. Method according to one of the preceding claims, characterised by dividing the raw images (IMGRAW) into uniformly shaped regions and aligning the raw matrices of the raw images (IMGRAW) in step c) independently in each region.
11. Method according to one of the preceding claims, characterised by aligning the raw matrices in step c) by use of the colour channel with the highest sampling.
12. Image processor unit (3) for processing raw image data (IMGRAW) provided by an image sensor (4), said image sensor (4) comprising a sensor array providing a raw matrix of pixels per image, characterised in that the image processor unit (3) is arranged to: a) Activate a mechanical actuator (6) coupled to the image sensor (4) to cause vibrational motion of the image sensor; b) Capture a burst of images (IMGRAW1..n) while the mechanical actuator is activated; c) Align the raw matrices of pixels of the burst of captured images (IMGRAW1..n) to one specific alignment; and d) Combine the burst of images (IMGRAW1..n) to achieve a resulting image (IMGFIN) by use of the plurality of pixels of the raw matrices available for each pixel position in the matrix of the resulting image (IMGFIN).
13. Image processor unit (3) according to claim 12, characterised in that the image processor unit (3) is arranged for processing image data by performing the steps of one of claims 1 to 11.
14. Computer program comprising instructions which, when the program is executed by a processing unit, cause the processing unit to carry out the steps of the method of one of claims 1 to 11.
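As one possible way of realising the alignment of claim 11, and not a method prescribed by the application, the sketch below estimates the inter-frame shift on the green plane of an RGGB Bayer mosaic, which is sampled twice as densely as the red and blue channels. Phase correlation and the RGGB layout are assumptions made only for this illustration.

```python
import numpy as np

def green_channel(raw):
    """Average the two green sub-lattices of an RGGB mosaic into one
    half-resolution plane (the channel with the highest sampling)."""
    g1 = raw[0::2, 1::2].astype(float)
    g2 = raw[1::2, 0::2].astype(float)
    return 0.5 * (g1 + g2)

def estimate_shift(base_raw, frame_raw):
    """Integer-pixel shift between two raw frames via phase correlation
    on the green plane; sub-pixel refinement would follow in practice."""
    f0 = np.fft.fft2(green_channel(base_raw))
    f1 = np.fft.fft2(green_channel(frame_raw))
    cross = f0 * np.conj(f1)
    corr = np.fft.ifft2(cross / (np.abs(cross) + 1e-12)).real
    peak = np.array(np.unravel_index(np.argmax(corr), corr.shape))
    shape = np.array(corr.shape)
    shift = (peak + shape // 2) % shape - shape // 2  # wrap to signed range
    return shift * 2  # green plane is half-resolution in each axis
```

Aligning on the densest channel keeps the motion estimate well conditioned before the per-channel fusion of step d) is carried out on the full mosaic.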