From depth maps to final render: everything you need to turn flat 2D footage into cinematic 3D. Powered by AI. Optimized for speed. Designed for creators.
- Key Features
- Guide Sheet: Install
- Guide Sheet: GUI Inputs
- Pulfrich Effect Quick Guide
- Troubleshooting
- Dev Notes
- Acknowledgments & Credits
- Seamless integration with 20+ transformer-based depth models: ZoeDepth, Depth Anything, MiDaS, DPT, DepthPro, DinoV2, and more (see the inference sketch below)
- One-click model selection with automatic downloads, no CLI setup or config files
- PyTorch GPU acceleration (no OpenCV recompile needed)
- Batch support for both video sequences and image folders
- Temporal smoothing, intelligent scene-adaptive normalization
- Built-in color inversion, customizable colormaps (Viridis, Inferno, Magma)
- Real-time frame-by-frame progress bar, FPS display, and ETA tracking
- Auto-resizing, smart batching, and graceful handling of large resolutions
- Pause/resume/cancel supported during all GPU operations
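To make the depth-model integration above concrete, here is a minimal inference sketch using the Hugging Face `transformers` depth-estimation pipeline. The model id is one of the repos credited at the end of this README; the file names are placeholders, and this illustrates the underlying API rather than the app's internal code.

```python
# Minimal depth-map inference via the transformers pipeline API.
from transformers import pipeline
from PIL import Image

depth = pipeline("depth-estimation", model="LiheYoung/depth-anything-small-hf")

img = Image.open("frame_0001.png")      # placeholder input frame
result = depth(img)                     # {"predicted_depth": tensor, "depth": PIL.Image}
result["depth"].save("depth_0001.png")  # grayscale depth map, ready for 3D rendering
```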
- Pixel-accurate depth parallax shifting using CUDA + PyTorch
- Full control over foreground (pop) / midground (balance) / background (pull) parallax (see the shifting sketch below)
- Half-SBS, Full-SBS, VR, Passive Interlaced, Anaglyph, and Dynamic Floating Window formats
- Dynamic floating window with cinema-style masking that slides and eases smoothly
- Built-in Pulfrich effect renderer (motion delay-based left-eye blending)
- Feathered shift masking, sharpening, and edge-aware smoothing
- Subject tracking-based convergence for natural stereo alignment
- GPU-accelerated real-time processing with live GUI stats (FPS, elapsed time, %)
- Output is compatible with Quest VR, YouTube 3D, and most stereo players
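For intuition about the parallax shifting described above, here is a simplified PyTorch sketch (not the project's exact renderer): each pixel is displaced horizontally in proportion to its normalized depth, with separate gains for foreground pop and background pull, then resampled with `grid_sample`.

```python
import torch
import torch.nn.functional as F

def shift_eye(frame, depth, pop=6.5, pull=-12.0, eye=1):
    """frame: (1,3,H,W) float tensor; depth: (1,1,H,W) in [0,1], 1 = near."""
    _, _, h, w = frame.shape
    # Per-pixel disparity in pixels: near content pops out, far content recedes.
    disp = depth * pop + (1.0 - depth) * pull
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid_x = (xs + eye * disp[0, 0]) / (w - 1) * 2 - 1  # normalize to [-1, 1]
    grid_y = ys / (h - 1) * 2 - 1
    grid = torch.stack((grid_x, grid_y), dim=-1).unsqueeze(0).float()
    # Resample the frame at the shifted coordinates (bilinear, border padding).
    return F.grid_sample(frame, grid, padding_mode="border", align_corners=True)

# eye=+1 renders one view, eye=-1 the other; concatenate them for Half-SBS.
```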
- Integrated RIFE ONNX model (no PyTorch required) for real-time frame doubling
- Interpolation modes: 2x, 4x, 8x FPS with smooth motion blending
- Folder-based processing of raw frames + automatic video reassembly
- Preserves frame resolution, count, audio sync, and aspect ratio
- Supports preview and export at high quality using FFmpeg codecs
- Real-time progress tracking + FPS + ETA built into GUI
- Integrated Real-ESRGAN (x4) super-resolution model, exported to ONNX with full GPU support (see the runtime sketch below)
- Batch upscaling with intelligent VRAM-aware batching (1–8 frames)
- Supports 720p → 1080p, 1080p → 4K, or any custom resolution
- Automatically resizes final frames to match output format and target resolution
- Lightning-fast CUDA-accelerated ONNX runtime (no PyTorch required)
- Full integration with frame renderer: upscales after 3D rendering or interpolation
- Clean, artifact-free outputs using enhanced fp16 inference for visual clarity
- Progress bar, FPS counter, ETA timer integrated into the GUI
- Fully exportable to video with codec support: MP4V, XVID, MJPG, FFmpeg NVENC
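A rough idea of what the ONNX upscaling path looks like, assuming a CUDA build of `onnxruntime`. The model path is hypothetical and the tensor layout is an assumption; query `sess.get_inputs()` on your own export for the real names and shapes.

```python
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession(
    "weights/realesrgan_x4.onnx",  # hypothetical path to the exported model
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)
input_name = sess.get_inputs()[0].name  # query the actual input tensor name

def upscale_batch(frames):
    """frames: (N, 3, H, W) float16 in [0, 1]; returns (N, 3, 4H, 4W)."""
    return sess.run(None, {input_name: frames.astype(np.float16)})[0]
```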
- Gradient-aware artifact suppression near depth edges (limbs, hair); see the feathering sketch below
- Feathered transition masks to avoid ghosting and popping
- Depth-aware sharpening and blending for polished 3D output
- Dynamic bar generation for floating window masking that eases smoothly like theatrical films
- Real-time zero parallax estimation and smoothing per-frame
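The feathering described above can be approximated in a few lines of OpenCV. This sketch uses the GUI's default `feather_strength` and `blur_size` values but is not the app's exact implementation: strong depth gradients mark subject outlines, and a Gaussian blur turns that hard mask into a soft ramp.

```python
import cv2
import numpy as np

def feather_mask(depth, feather_strength=10.0, blur_size=9):
    """depth: HxW float32 in [0, 1]; returns an HxW feathered edge mask in [0, 1]."""
    # Depth gradients are strongest at subject outlines (limbs, hair).
    gx = cv2.Sobel(depth, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(depth, cv2.CV_32F, 0, 1, ksize=3)
    edges = np.clip(np.sqrt(gx**2 + gy**2) * feather_strength, 0.0, 1.0)
    k = blur_size | 1  # Gaussian kernel width must be odd
    return cv2.GaussianBlur(edges, (k, k), 0)
```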
- Audio button to directly rip and attach audio from the source video using FFmpeg (see the sketch below)
- Format choices: AAC, MP3, WAV with adjustable bitrate
- Built-in tools, no shell commands needed β fully GUI-based
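Conceptually, the Audio button performs the equivalent of these two FFmpeg calls (file names are placeholders): rip the source track, then mux it onto the rendered 3D video without re-encoding the video stream.

```python
import subprocess

# Extract the audio track from the original source as AAC at 192 kbps.
subprocess.run(["ffmpeg", "-y", "-i", "source.mp4",
                "-vn", "-c:a", "aac", "-b:a", "192k", "audio.m4a"], check=True)

# Attach it to the rendered (silent) 3D video; "-c copy" avoids re-encoding.
subprocess.run(["ffmpeg", "-y", "-i", "render_3d.mp4", "-i", "audio.m4a",
                "-map", "0:v", "-map", "1:a", "-c", "copy", "final_3d.mp4"], check=True)
```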
- Choose preview format: Passive Interlaced, HSBS, Shift Heatmap
- Live preview on frame for quick tuning
- Auto-exports as image preview file, no temp videos needed
- Toggle convergence depth and parallax before full render
- Lightweight player built for Half-SBS, Full-SBS, and VR output
- Fast seeking, play/pause/fullscreen toggles
- Timestamp scrubber + resolution-aware display
- Designed to instantly preview 3D results without leaving the app
- Multi-tab Tkinter interface, responsive and persistent settings
- Pause, resume, and cancel buttons for all rendering threads
- Codec selector with GPU NVENC options (H.264, HEVC, AV1-ready)
- One-click launcher, no pip/CLI scripting needed
- Slider recall and auto-cropping for black bars
- Formats: Half-SBS, Full-SBS, VR Mode, Red-Cyan Anaglyph, Passive Interlaced
- Ratios: 16:9, CinemaScope (2.39:1), 2.76:1, 4:3, 21:9, Square 1:1, Classic 2.35:1
- Supports export in MP4, MKV, AVI with codecs: XVID, MP4V, MJPG, DIVX, and FFmpeg NVENC
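For reference, the FourCC-style codecs above map directly onto OpenCV's `VideoWriter`, while NVENC output goes through FFmpeg instead. A minimal sketch with placeholder resolution and frame rate:

```python
import cv2

fourcc = cv2.VideoWriter_fourcc(*"mp4v")  # or "XVID", "MJPG", "DIVX"
out = cv2.VideoWriter("output_halfsbs.mp4", fourcc, 23.976, (1920, 1080))
# ... call out.write(frame_bgr) for each rendered frame ...
out.release()
```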
- This program runs on Python 3.12
- This program has been tested on CUDA 12.8
- Conda (optional, recommended for simplicity)
- Linux/macOS is not officially supported until a more stable solution is found
- 1️⃣ Download the VisionDepth3D zip file from the official download source (green button).
- 2️⃣ Extract the zip file to your desired folder (e.g., C:\user\VisionDepth3D).
- 3️⃣ Download the models Here and extract the weights folder into the main VisionDepth3D folder.
- 1. Press Win + R, type cmd, and hit Enter.
- 2. Clone the Repository (Skip the git clone if you downloaded the ZIP and start from cd)
```
git clone https://github.com/VisionDepth/VisionDepth3D.git
cd C:\VisionDepth3D-main
pip install -r requirements.txt
```
- Continue to installing PyTorch with CUDA, then run VisionDepth3D.bat
(Automatically manages dependencies & isolates environment.)
- 1. Clone the Repository (Skip the git clone if you downloaded the ZIP and start from cd)
- 2. Create the Conda Environment
To create the environment, copy and paste this into Conda to run:
```
git clone https://github.com/VisionDepth/VisionDepth3D.git
cd VisionDepth3D-main
conda create -n VD3D python=3.12
conda activate VD3D
pip install -r requirements.txt
```
Find Your CUDA Version: Before installing PyTorch, check which CUDA version your GPU supports:
- 1️⃣ Open Command Prompt (Win + R, type cmd, hit Enter)
- 2️⃣ Run one of the following commands:

```
nvcc --version
```

or

```
nvidia-smi
```

- 3️⃣ Look for the CUDA version (e.g., CUDA 11.8, 12.1, etc.)
Go to the official PyTorch website to find the best install command for your setup: https://pytorch.org/get-started/locally/
If you are running CUDA 12.8, install PyTorch (Nightly) for CUDA 12.8; if that doesn't work, use the CUDA 12.6 build.
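Once PyTorch is installed, you can sanity-check that it actually sees your GPU before launching the app (run this in a Python shell):

```python
import torch

print(torch.__version__)          # e.g. a build tagged +cu128
print(torch.version.cuda)         # CUDA version PyTorch was compiled against
print(torch.cuda.is_available())  # should print True on a working setup
```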
- Once PyTorch and all dependencies are installed, run the following command:
```
VisionDepth3D.bat
```
Congrats, you have successfully installed VisionDepth3D! The steps above walk you through cloning the repo, creating and activating the environment, and running the app in a few simple steps.
Use the GUI to fine-tune your 3D conversion settings.
**Codec**

- Description: Sets the output video encoder.
- Default: `mp4v` (CPU)
- Options:
  - `mp4v`, `XVID`, `DIVX` – CPU-based
  - `libx264`, `libx265` – high-quality software (CPU)
  - `h264_nvenc`, `hevc_nvenc` – GPU-accelerated (NVIDIA)
**Foreground Shift (pop)**

- Description: Pops foreground objects out of the screen.
- Default: `6.5`
- Range: `3.0` to `8.0`
- Effect: Strong values create a noticeable 3D "pop" in close objects.
**Midground Shift (balance)**

- Description: Depth for the mid-layer transition between foreground and background.
- Default: `1.5`
- Range: `-3.0` to `5.0`
- Effect: Smooths the 3D transition; higher values exaggerate depth between layers.
**Background Shift (pull)**

- Description: Shift depth for background layers (far away).
- Default: `-12.0`
- Range: `-10.0` to `0.0`
- Effect: More negative values push content into the screen (deeper background).
**Sharpness Factor**

- Description: Applies a sharpening filter to the output.
- Default: `0.2`
- Range: `-1.0` (softer) to `1.0` (sharper)
- Effect: Brings clarity to 3D edges; avoid over-sharpening to reduce halos.
**Blend Factor (Pulfrich)**

- Description: Blends delayed and current frames for Pulfrich-style motion depth.
- Default: `0.5`
- Range: `0.3` (subtle) to `0.7` (stronger)
- Effect: Controls temporal depth perception. Higher = more blur in motion.
**Delay Time (Pulfrich)**

- Description: How many seconds to delay the Pulfrich ghost frame.
- Default: `1/30`
- Range: `1/50` to `1/20`
- Effect: Smaller values = subtle motion depth; larger = stronger Pulfrich 3D.
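Putting the two Pulfrich settings together, a simplified blend loop might look like this sketch (assuming 30 fps input; `pulfrich_blend` is a hypothetical helper, not app code). At 30 fps, a `delay_time` of 1/30 s corresponds to a one-frame delay.

```python
from collections import deque
import cv2

fps, delay_time, blend_factor = 30, 1 / 30, 0.5
delay_frames = max(1, round(fps * delay_time))
history = deque(maxlen=delay_frames + 1)  # short queue of recent frames

def pulfrich_blend(frame):
    history.append(frame)
    delayed = history[0]  # oldest frame in the queue (the "ghost")
    return cv2.addWeighted(frame, 1.0 - blend_factor, delayed, blend_factor, 0)
```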
**Feather Strength**

- Description: Softens hard 3D edges using depth gradients.
- Default: `10.0`
- Range: `0` to `20`
- Effect: Reduces ghosting artifacts and hard cutouts around subjects.
**Blur Size**

- Description: How wide the smoothing kernel should be.
- Default: `9`
- Range: `1` to `15`
- Effect: Larger = more smoothing; helps reduce halo noise on edges.
**Encoding Settings**

- Codec: Choose GPU-accelerated encoders (`h264_nvenc`, `hevc_nvenc`) for faster renders.
- CRF (Constant Rate Factor):
  - Default: `23`
  - Range: `0` (lossless) to `51` (worst)
  - Lower values = better visual quality.
**Subject Tracking**

- Checkbox: Lock Subject to Screen
- Effect: Enables Dynamic Zero Parallax Tracking; the depth plane automatically follows the subject's depth to minimize excessive 3D warping.
- Great for: human characters or central objects in motion.
- Match resolution and FPS between your input video and depth map (see the check below).
- Use the Inverse Depth checkbox if bright = far instead of close.
- Recommended depth models: `ZoeDepth`, `Depth Anything V2`, `MiDaS`, `DPT-Large`, etc.
- Choose Large models for better fidelity.
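The first tip above can be checked programmatically; here is a small helper (file names are placeholders) that compares the two inputs before you commit to a long render:

```python
import cv2

def props(path):
    """Return (width, height, fps) of a video file."""
    cap = cv2.VideoCapture(path)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS)  # round this if encoders report jittery rates
    cap.release()
    return w, h, fps

assert props("input.mp4") == props("input_depth.mp4"), "resolution/FPS mismatch"
```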
Clip Length | Estimated Time (with GPU) |
---|---|
30 seconds | 1–4 mins |
5 minutes | 10–25 mins |
Full Movie | 6–24+ hours |
- Load video and matching depth map.
- Choose output format (Half-SBS, Full-SBS, Anaglyph, etc.).
- Enable "Lock Subject to Screen" for tracked parallax.
- Set feather smoothing to around `10` and blur to `9` for the cleanest edges.
- Set encoder: use NVENC for speed (`h264_nvenc`) or `libx264` for max compatibility.
- Hit "Generate 3D Video" and let it roll!
- Works by blending delayed + current frames for moving objects.
- Best for lateral motion scenes (walking, panning, cars, etc.).
- Tune: `blend_factor` = 0.4–0.6, `delay_time` = ~1/30
- Scene changes are automatically smoothed!
- Black/Empty Output: Wrong depth map resolution or mismatch with input FPS.
- Halo/Artifacts:
- Increase feather strength and blur size.
- Enable subject tracking and clamp the zero parallax offset.
- Out of Memory (OOM):
- Enable FFmpeg rendering for better memory usage.
- Use `libx264` or `h264_nvenc`, and avoid long clips in one go.
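If OOM errors persist, a generic workaround is to render in small chunks and release cached VRAM between them. This is a sketch, not app code; `render_fn` stands in for whatever per-chunk processing you run.

```python
import torch

def render_in_chunks(frames, render_fn, chunk=8):
    out = []
    for i in range(0, len(frames), chunk):
        out.extend(render_fn(frames[i:i + chunk]))
        torch.cuda.empty_cache()  # release cached GPU memory between chunks
    return out
```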
This tool is being developed by a solo dev with nightly grind energy (~4 hours a night). If you find it helpful, let me know; feedback, bug reports, and feature ideas are always welcome!
I want to express my gratitude to the amazing creators and contributors behind the depth estimation models used in this project. Your work has made it possible to push the boundaries of 3D rendering and video processing.
Model Name | Creator / Organization | Hugging Face Repository |
---|---|---|
Distil-Any-Depth-Large | xingyang1 | Distil-Any-Depth-Large-hf |
Distil-Any-Depth-Small | xingyang1 | Distil-Any-Depth-Small-hf |
Depth Anything V2 Large | Depth Anything Team | Depth-Anything-V2-Large-hf |
Depth Anything V2 Base | Depth Anything Team | Depth-Anything-V2-Base-hf |
Depth Anything V2 Small | Depth Anything Team | Depth-Anything-V2-Small-hf |
Depth Anything V1 Large | LiheYoung | depth-anything-large-hf |
Depth Anything V1 Base | LiheYoung | depth-anything-base-hf |
Depth Anything V1 Small | LiheYoung | depth-anything-small-hf |
V2-Metric-Indoor-Large | Depth Anything Team | Depth-Anything-V2-Metric-Indoor-Large-hf |
V2-Metric-Outdoor-Large | Depth Anything Team | Depth-Anything-V2-Metric-Outdoor-Large-hf |
DA_vitl14 | LiheYoung | depth_anything_vitl14 |
DA_vits14 | LiheYoung | depth_anything_vits14 |
DepthPro | Apple | DepthPro-hf |
ZoeDepth | Intel | zoedepth-nyu-kitti |
MiDaS 3.0 | Intel | dpt-hybrid-midas |
DPT-Large | Intel | dpt-large |
DinoV2 | Facebook | dpt-dinov2-small-kitti |
dpt-beit-large-512 | Intel | dpt-beit-large-512 |
Thank You! A huge thank you to all the researchers, developers, and contributors who created and shared these models. Your work is inspiring and enables developers like me to build exciting and innovative applications!