Computer Science > Computer Vision and Pattern Recognition

arXiv:2407.10958 (cs)

[Submitted on 15 Jul 2024]

Title:InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models

Authors:Nirat Saini, Navaneeth Bodla, Ashish Shrivastava, Avinash Ravichandran, Xiao Zhang, Abhinav Shrivastava, Bharat Singh

View PDF HTML (experimental)

Abstract:We introduce InVi, an approach for inserting or replacing objects within videos (referred to as inpainting) using off-the-shelf, text-to-image latent diffusion models. InVi targets controlled manipulation of objects and blending them seamlessly into a background video unlike existing video editing methods that focus on comprehensive re-styling or entire scene alterations. To achieve this goal, we tackle two key challenges. Firstly, for high quality control and blending, we employ a two-step process involving inpainting and matching. This process begins with inserting the object into a single frame using a ControlNet-based inpainting diffusion model, and then generating subsequent frames conditioned on features from an inpainted frame as an anchor to minimize the domain gap between the background and the object. Secondly, to ensure temporal coherence, we replace the diffusion model's self-attention layers with extended-attention layers. The anchor frame features serve as the keys and values for these layers, enhancing consistency across frames. Our approach removes the need for video-specific fine-tuning, presenting an efficient and adaptable solution. Experimental results demonstrate that InVi achieves realistic object insertion with consistent blending and coherence across frames, outperforming existing methods.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2407.10958 [cs.CV]
	(or arXiv:2407.10958v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2407.10958

Submission history

From: Nirat Saini [view email]
[v1] Mon, 15 Jul 2024 17:55:09 UTC (33,890 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:InVi: Object Insertion In Videos Using Off-the-Shelf Diffusion Models

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators