Keywords

1 Introduction

Image inpainting, which is also known as “image completion”, involves the issue of filling missing parts in images. This is a challenging task in computer vision [22]. First of all, the filling algorithm should be able to successfully complete complex natural images. Secondly, it should be able to handle incomplete images with large missing parts. Finally, the algorithm’s execution should be in a fully automatic manner, i.e. without intervention from the user.

So far, many inpainting algorithms have been proposed to resolve these issues. We divide them into four categories with the respect of their capability of characterizing the redundancy of the image.

One category of image inpainting methods is diffusion-based [3, 5, 6, 8, 10, 27]. This class of methods has been first introduced by Bertalmio et al. in [6]. These methods are naturally well suited for completing straight lines, curves, and inpainting small regions. However, they are not well suited for recovering the large area textures because their incompetence of synthesizing semantic textures and structures [18].

The second category of methods that are more effective for inpainting large holes is exemplar-based, which is based on the seminal work on texture synthesis [14, 28] and exploits image statistical and self-similarity priors. The authors in [19] further categorize these methods into two subcategories: MRF-based and matching-based. The MRF-based methods are realized by optimizing discrete Markov Random Fields (MRFs) [22, 25]. These methods rearranged the patch or pixel locations to fill the image, and needed to solve a global optimization problem. This makes the methods more complex than matching-based methods. Matching-based methods [11, 13, 26, 29] are inspired by local region-growing methods which grow the texture one patch at a time. Thanks to the proposed fast PatchMatch method [4], the computational complexity of the match-based inpainting methods have been relieved.

Methods in the third category do not involve any explicit interpolation in the image domain but rather in one or several transform spaces, e.g., applying sparse representations for decomposing the image into a texture and a geometry component and then using two dictionaries of different characteristics to complete the image [15]. In [20] Koh and Rodriguez-Marek proposed a method which modified the K-clustering with singular value decomposition (K-SVD) [1] inpainting method. This type of methods usually performs well in sparse missing data or thin domains image inpainting.

Finally, there are some hybrid methods, i.e. Bertalmio et al. [7] proposed a method to decompose an image into structure and texture components, the structure component is filled by using a PDE base method, while the texture component is filled by a texture synthesis method. Methods [9, 21] used one unique energy function to combine different approaches. The paper [24] proposed the super-resolution method to retrieve high-frequency details of the inpainted areas. Arias et al. proposed a general variational framework for exemplar-based image inpainting [2, 17] and achieved very good results. All of these methods are expected to achieve an effective balance between preserving edges or structures and restoring texture.

Due to these research works, image inpainting has made considerable progress. Especially in the aspect of image retouching, image inpainting has become a standard tool, e.g. the Content Aware Fill in Adobe Photoshop is implemented in [4, 29] as reported in Adobe web site. However, there are numerous applications on image and video editing need to be enhanced in terms of quality and speed of the inpainting. There also have been a few research works on how to use image inpainting for super-resolution and zooming of images [16].

We notice that exemplar-based methods are effective in large area inpainting. However, these methods may not get ideal results for restoring complicated images. Firstly, in the patch matching, selected processing patches often centered at inpainting boundary pixels, then the patches possibly with a lot of unknown pixels have been processed earlier which will increase the matching error. Besides, in the filling process, these methods filled all the unknown pixels in the unit of a patch, this may cause the error accumulation, and lead to unsatisfied inpainting results.

Fig. 1.
figure 1

Algorithm outline. (a) Masked image. (b) Selecting the top 10 pixels with the highest priority. (c) Shift patches of the processing pixel. (d) Combining a set of matched patches for each filling pixel. (e) Our result. (f) Result of [11].

To resolve these issues, we propose an exemplar-based inpainting method based on patch shift (as showed in Fig. 1). We select the pending pixels according to their structure priority (as in Fig. 1(b)). Then we select a series of neighboring patches which contain the selected pending pixels as patches to be processed (as in Fig. 1(c)). Moreover, we delete some patches that contain a large number of unknown pixels and patches that are completely known. Furthermore, we acquire the most similar patches for each remaining patch by matching it in the known region of the image according to the method proposed in paper [4]. Finally, for each unknown pixel in the processing patches, we get a series of pixels from matched patches which include the corresponding pixel to the filling pixel, and then fill the pixel according to the values of these corresponding pixels (as in Fig. 1(d)). A variety of experiments show our method is effective and performs well in large area repairing.

The rest of the article is organized as follows. In Sect. 2, we introduce the previous works which are related with our method. In Sect. 3, we present our exemplar-based inpainting method based on patch shift. The inpainting results and comparison in real nature image and synthetic image are discussed in Sect. 4, followed by the conclusion in Sect. 5.

2 Related Work

The original intention of exemplar-based inpainting methods are to filling-in large missing portions in an image (e.g., to remove an object) in a visually plausible way. For images which have the same patterns, we can use the “texture synthesis” algorithms to solve this problem. However, nature images often contains not only the same texture information, but also many foreground objects whose contour continuity is very sensitive to human eyes. Based on this, many exemplar-based inpainting methods are paying great attention to the structural information in valid pixels around the repaired boundary. A pioneering approach in this field was proposed by Criminisi et al. [11] that combined a structural reconstruction approach with the texture synthesis to utilize advantages of both approaches. They observed that it was important to complete an image starting by recovering the structures first, and based on this idea they proposed an order calculation method to ensure the patch which the isophotes flowing into it to be filled earlier.

Here, given an input image \(\mathrm I\), as well as a target region \(\mathrm \Omega \) and a source region \(\mathrm S\), generally \(\mathrm S=I-\Omega \). The goal of the exemplar-based inpainting is to fill \(\mathrm \Omega \) in a visually plausible way by copying patches or pixels from \(\mathrm S\). Also we use \(\delta \mathrm \Omega \) to represent the boundary of the target region, i.e. the fill front. In paper [11], the processing order is given by a patch priority measure defined as the product of two terms (\(P(p)=C(p)D(p)\)). The first term C(p) is the confidence term accounts for the amount of known pixels versus unknown pixels in the processing patch, and the second term D(p) is the data term to indicate the presence of some structures in the processing path.

Based on this initial priority calculation method, some methods are proposed to amend it. For instance, many methods [12, 23, 30] were proposed to define different patch processing orders in the way to ensure the structure information to be filled in first.

In these methods, on one hand, we noticed that there are many patches with large numbers of unknown pixels and large gradient information will be filled first. On the other hand, whole unknown pixels in the processing patch were filled at the same time in these methods. These two aspects may cause filling errors, and lead to the visual discontinuity of the repairing result eventually.

3 Model Description

In this section we describe the pipeline of our algorithm. The process of our algorithm has three steps: (i) selecting pixels; (ii) matching patches; (iii) filling pixels. This process iterates until the image is completely repaired.

3.1 Selecting Pixels

In this step, image product is used to calculate the repairing image, and mark the boundary of the repairing area based on the mask image. We only use data-term D(p) as the priority term to determine which pixels to process first. An effective data-term D(p) must tell about whether linear structures are presented or not at a point p and with which angle they cross the repaired region. Because of the incomplete information of the pixels \(p\in \delta \mathrm \Omega \), the gradient \(\overrightarrow{\nabla I_q}\) cannot precisely estimated priority at this point. We use a better priority term defined as (1) which is proposed and validated in [12] to compute the priority of the marked boundary pixels.

$$\begin{aligned} D(p)=\Vert G_p\overrightarrow{n_p} \Vert _2 . \end{aligned}$$
(1)

where \(\overrightarrow{n_p}\) is the local orientation to the mask front at the point p, and \(G_p\) is a weighted average of structure tensors estimated on non-masked parts of the target patch \(\psi _p\), for color image, \(\mathrm I=(R,G,B)^T\), \(G_p\) can be defined as (2):

$$\begin{aligned} G_p=\sum _{q\in (\psi _p\cap (I-\varOmega ))}{w_q\begin{pmatrix} R_x^2+G_x^2+B_x^2 &{} R_x R_y+G_x G_y+B_x B_y\\ R_x R_y+G_x G_y+B_x B_y &{} R_y^2+G_y^2+B_y^2 \end{pmatrix}} . \end{aligned}$$
(2)

and \(w_q\) is a normalized 2d Gaussian function centered at p. The subscripts x and y indicate spatial derivatives of different color channels.

At the end of pixel selecting step, we extract top \(N_1\) pixels with the highest priority on the repairing boundary as candidate pixels to process in the subsequent step.

3.2 Matching Patches

According to selecting pixels step, pixels among the contour crosses \(\mathrm \Omega \) orthogonally are extracted. For each pixel \(p_i\) in these extracted pixels, we shift the patch \(\psi _{p_i}\)(the size of the path is \(w_s \times w_s \)) which centers at \(p_i\) to get all the patches which include the pixel \(p_i\). The obtained set \(\mathrm S_{\psi }\) of the patches is defined as (3):

$$\begin{aligned} \mathrm{S}_\psi ={\{ \psi |p_i\in \psi , i=1,2,...N_1\}} . \end{aligned}$$
(3)

Through this process, we extend the patches for further matching process. After shifting, the patches may contain a lot of unknown pixels or may be not containing any unknown pixel which are required to be filtered. For each patch \(\psi _{j}\in \mathrm{S}_\psi \), these filter condition can be formulized as (4):

$$\begin{aligned} 0<\left| p_k\in \psi _{j},\ p_k\in \mathrm \Omega \right| \le \alpha _1\mathrm N . \end{aligned}$$
(4)

where \(p_k\) is a pixel in patch \(\psi _{j}\), \(\mathrm N\) is the number of pixels in each patch (i.e. \(w_s \times w_s \)), \(\alpha _1\) is a conditional threshold.

After filtering, we obtained the processing patches set \(\mathrm{S}_\psi ^{'}\), each patch in \(\mathrm{S}_\psi ^{'}\) is matched using PatchMatch algorithm to obtain its most similar patch in the known region of the process image. As a result, we obtained a matched patches set \(\mathrm{M}_\psi ^{'}\) correspond to the processing patches \(\mathrm{S}_\psi ^{'}\).

3.3 Filling Pixels

As discussed above, our algorithm extends the processing patches \(\mathrm{S}_\psi ^{'}\) in matching patches step. This made the filling pixel may present in many different processing patches. Thus, when we want to fill pixel \(p_i\), it is necessary to consider how to fill it based on the matched patches set \(\mathrm{M}_\psi ^{'}\).

In each iterative process of our algorithm, we first exact the pixel \(p_i\) belongs to the processing patch set \(\mathrm{S}_\psi ^{'}\). According to the location of \(p_i\) in each patch \(\psi _{p_i}\) in \(\mathrm{S}_\psi ^{'}\), we get its corresponding pixel value in the matched patch \(\psi _{p_i}\) from \(\mathrm{M}_\psi ^{'}\). For these corresponding pixels value of \(p_i\), due to the weighted average method is more likely to cause fuzzy, in this paper, the median method is adopted to select the final filling value to fill \(p_i\). Finally, after \(p_i\) is filled, our algorithm updates the repairing mask and continues to process the unprocessed pixels belongs to the processing patch. When all unprocessed pixels have been processed, our algorithm will enter the next iteration.

3.4 Confidence Term Processing

Since we have done patch shift of the selected pixel, we don’t use the confidence information C(p) of pixel p to compute the processing order. To ensure the patch that contains a larger amount of reliability information to be filled in a prior order, we adopt a new strategy to use and update the confidence term C(p). During initialization, the value of C(p) is set to \(C(p)=0 \ \forall p \in \mathrm \Omega \), and \(C(p)=1\ for\ \forall p\in \mathrm S\). Our algorithm don’t update C(p) immediately after fill a pixel. Before we compute the data term D(p), if \(C(p)=0 \ for \ \forall p \in \mathrm \Omega \), we set \(D(p)=0\). Until \(D(p)=0 \ for \ \forall p \in \delta \mathrm \Omega \), we update C(p) and set \(C(p)=1\) for all filled pixels.

With this strategy, we can ensure that the repair is done from the structure area on the contour \(\delta \mathrm \Omega \) of the target region \(\mathrm \Omega \), and layer by layer to repair the target region \(\mathrm \Omega \). Figure 2 shows an example of the our algorithm.

Fig. 2.
figure 2

An example of our algorithm. (a) Input image. (b) Mask image. (c) Masked image. (d) Intermediate result image. (e) Intermediate result image. (f) Result.

4 Result and Comparisons

Here we apply our proposed exemplar-based pixel by pixel inpainting algorithm to a variety of applications, ranging from purely synthetic images to full-color photographs that contain complex textures. At the same time, we take some side-by-side comparisons to some previously proposed methods.

Fig. 3.
figure 3

Comparisons on synthetic images. The first row shows the original synthetic images. The second row shows the unknown region (black areas). The third row shows the results and defective areas (in the yellow boxes). (Color figure online)

Fig. 4.
figure 4

Comparisons for object removal. The first row shows five original images. In the remaining five rows, the first to the fifth columns show the masked images, results of [11], results of [24], results of [17] and our results.

4.1 Comparisons on Synthetic Images

Figure 3 presents six examples for synthetic images inpainting. The first row presents the original synthetic images; the second row shows the unknown region (black areas), and the third row shows the results and the defective areas (in the yellow boxes). The (a) and (b) columns show the inpainting of the synthetic images without texture, as the results showed, for such synthetic images, the proposed method is very effective and there are no obvious flaws in the results. (c) and (d) columns show the inpainting of the synthetic images with regular texture, as the results showed, for this type of synthetic images, there are some minor flaws in the results, but the repaired effect of the regular texture effect is very good. (e) and (f) columns show the inpainting results of the synthetic images with irregular texture. The results show that our method has the ability to repair the overall structure of composite images with the complex background, but there are more flaws in the results. This mainly due to the strong gradient area existed in the background texture, and these areas have a higher priority order to process, the foreground object’s boundary may be undermined in background texture’s repaired.

Fig. 5.
figure 5

Comparisons for scratch and text removal. The first row shows four original images. In the remaining four rows, the first to the sixth columns show the masked images, results of [11], results of [5], results of [27], results of [17] and our results.

4.2 Comparisons on Full-Color Photographs

To verify the effect of our method to full-color photograph inpainting applications, we have conducted a lot of experimental comparisons to classical methods on commonly used image inpainting datasets. These applications include large target removal, scratch repair and texture removal.

Figure 4 presents the comparison between our algorithm and related exemplar-based inpainting algorithms. The first row presents the five original images, and the others rows, from the first to the fifth columns show masked images and the results of the methods in [11, 17, 24] and ours. For the image with regular background texture (Fig. 4(a)), the methods for comparison and our method can repair the obvious boundary (the coastline in Fig. 4(a)) very well. For the image with irregular background texture (Fig. 4(b)), our method can be a good way to repair the boundary of images, while the repaired area does not have any obvious texture garbage. In the image Fig. 4(c) and (d), there are prominent structure need to be restored, our method achieves the best repairing results too. The Criminisi’s method has obviously repairing failures for inpainting images in Fig. 4(c) and (d). The pole is clearly disconnected in the result of approach [17]. Although the expected structures were repaired in the results of method [24], there are serrated edge on the poles in the repaired result of Fig. 4(c) and there are a lot of blurring in the repaired result of Fig. 4(d). For the inpainting image (Fig. 4(e)), which has obvious boundaries and different types of textures, our method also achieved the best results. On one hand, our method ensures the continuity of the contour, on the other hand, there is no distinct blur.

Furthermore, Fig. 5 shows a lot of examples we have done for scratch and text removal. For the scratch repairing in Fig. 5(a) and the text removal in Fig. 5(b), the PSNR of our algorithm is similar to the algorithm [17] and is superior some other algorithms [5, 11, 27]. For repairing of damaged texture areas, even algorithms [5, 17, 27] nearly achieve the similar PSNR as our method, but they still have some differences. Figure 6 shows that when we zoom in the repaired pictures, we can find our repaired picture is better than algorithms [5, 27] and close to the best scratch repainting algorithm [17]. The PSNR values of repairing algorithm whose result is presented in Fig. 5(d) of algorithms [5, 17, 27] are little better than our method. However, when we enlarge parts detail of repaired pictures (presented in the Fig. 7.), we find repaired pictures of algorithms [5, 17, 27] still have obvious outline of ‘of’. Thus, our algorithm is still superior to the traditional algorithms [5, 27] that are good at scratch repairing.

Fig. 6.
figure 6

Comparisons of the inpainting details of Fig. 5(c).

Fig. 7.
figure 7

Comparisons of the inpainting details of Fig. 5(d).

4.3 Implementation Details and Parameters

All experiments are running on a PC with an Intel Core i7 2.3 GHz CPU and 16 G RAM. Parameters of the algorithm are kept as constant for testing are presented in this paper, and the size of patch is set to \(9\times 9\), the number \(N_1=10\) in Eq. (3), \(\alpha _1=0.1\) in Eq. (4).

Limitations. Our method may fail when there is a strong structure in both the whole and the part of the image. This shortcoming is actually existed in all exemplar-based inpainting methods.

5 Conclusion

In this paper, through a lot of experiments, we prove patch shift is effective for exemplar-based inpainting algorithm and our method is effective for scratch or text removal, object removal and missing block completion.