## Installation

The following setup was tested with CUDA 12.2.
Set up the root for all source files:

```bash
git clone https://github.com/tomtomtommi/stereoanyvideo
cd stereoanyvideo
export PYTHONPATH=`(cd ../ && pwd)`:`pwd`:$PYTHONPATH
```
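You can verify that the export took effect:

```bash
# Both the repository root and its parent directory should appear here.
echo $PYTHONPATH
```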
Create a conda environment:

```bash
conda create -n sav python=3.10
conda activate sav
```
Install the requirements:

```bash
conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install pip==24.0
pip install pytorch_lightning==1.6.0
pip install iopath
conda install -c bottler nvidiacub
pip install scikit-image matplotlib imageio plotly opencv-python
conda install -c fvcore -c conda-forge fvcore
pip install black usort flake8 flake8-bugbear flake8-comprehensions
conda install pytorch3d -c pytorch3d
pip install -r requirements.txt
pip install timm
```
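As a quick sanity check (a minimal sketch, not part of the official instructions), confirm that the CUDA build of PyTorch and the pytorch3d package import cleanly:

```bash
# Should print the torch version, True for CUDA, and the pytorch3d version.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import pytorch3d; print(pytorch3d.__version__)"
```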
Download the Video-Depth-Anything (VDA) checkpoints:

```bash
cd models/Video-Depth-Anything
sh get_weights.sh
```
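The remaining commands are run from the repository root, so go back up afterwards (assuming you are still inside `models/Video-Depth-Anything`):

```bash
# Return to the repository root.
cd ../..
```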
## Demo

Before running, download the checkpoints from Google Drive and copy them to `./checkpoints/` (a sketch of this step follows below). Then run:

```bash
sh demo.sh
```
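The copy step might look like this (a minimal sketch; the source path is a placeholder for wherever the Google Drive files were downloaded):

```bash
# Place the downloaded weights where the demo expects them.
mkdir -p ./checkpoints/
cp /path/to/downloaded_checkpoints/* ./checkpoints/
```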
By default, the left- and right-camera videos are expected to be structured like this:

```
./demo_video/
├── left
│   ├── left000000.png
│   ├── left000001.png
│   ├── left000002.png
│   └── ...
└── right
    ├── right000000.png
    ├── right000001.png
    ├── right000002.png
    └── ...
```
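If your footage starts as two video files rather than frame folders, something like the following produces the layout above with ffmpeg (a sketch; `left.mp4` and `right.mp4` are placeholder names):

```bash
# Split each video into numbered PNG frames matching the expected naming.
mkdir -p demo_video/left demo_video/right
ffmpeg -i left.mp4 -start_number 0 demo_video/left/left%06d.png
ffmpeg -i right.mp4 -start_number 0 demo_video/right/right%06d.png
```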
A simple way to run the demo is to use SouthKensingtonSV. To test on your own data, modify `--path ./demo_video/`. More arguments can be found and modified in `demo.py`.
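For custom data, a hypothetical invocation (assuming `demo.py` is the entry point wrapped by `demo.sh`) could be:

```bash
# Run on a custom sequence; ./my_video/ is a placeholder folder with the
# same left/ and right/ layout as ./demo_video/.
python demo.py --path ./my_video/
```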
## Evaluation and Training

Download the following datasets and put them in `./data/datasets/`:
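Each dataset can be placed, or symlinked, under `./data/datasets/`; a sketch, where `SceneFlow` is a placeholder for a dataset folder name:

```bash
# Symlink an already-downloaded dataset into the expected location.
mkdir -p ./data/datasets/
ln -s /path/to/SceneFlow ./data/datasets/SceneFlow
```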
To evaluate, run:

```bash
sh evaluate_stereoanyvideo.sh
```

To train, run:

```bash
sh train_stereoanyvideo.sh
```
## Citation

If you use our method in your research, please consider citing:

```bibtex
@misc{jing2025stereovideotemporallyconsistent,
      title={Stereo Any Video: Temporally Consistent Stereo Matching},
      author={Junpeng Jing and Weixun Luo and Ye Mao and Krystian Mikolajczyk},
      year={2025},
      eprint={2503.05549},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2503.05549},
}
```