A real-time 3D object detection system that combines YOLOv11 for object detection with Depth Anything v2 for depth estimation to create pseudo-3D bounding boxes and bird's eye view visualization.
- Real-time object detection using YOLOv11
- Depth estimation using Depth Anything v2
- 3D bounding box visualization
- Bird's Eye View (BEV) visualization
- Object tracking capabilities
- Support for video files and webcam input
- Adjustable model sizes for performance/accuracy tradeoffs
- Python 3.8+
- PyTorch 2.0+
- OpenCV
- NumPy
- Other dependencies listed in
requirements.txt
-
Clone this repository:
git clone https://github.com/niconielsen32/YOLO-3D.git cd YOLO-3D
-
Install dependencies:
pip install -r requirements.txt
-
Download model weights (will be downloaded automatically on first run)
Run the main script:
python run.py
You can modify the following parameters in run.py
:
-
Input/Output:
source
: Path to input video file or webcam index (0 for default camera)output_path
: Path to output video file
-
Model Settings:
yolo_model_size
: YOLOv11 model size ("nano", "small", "medium", "large", "extra")depth_model_size
: Depth Anything v2 model size ("small", "base", "large")
-
Detection Settings:
conf_threshold
: Confidence threshold for object detectioniou_threshold
: IoU threshold for NMSclasses
: Filter by class, e.g., [0, 1, 2] for specific classes, None for all classes
-
Feature Toggles:
enable_tracking
: Enable object trackingenable_bev
: Enable Bird's Eye View visualizationenable_pseudo_3d
: Enable 3D visualization
YOLO-3D/
│── run.py # Main script
│── detection_model.py # YOLOv11 object detection
│── depth_model.py # Depth Anything v2 depth estimation
│── bbox3d_utils.py # 3D bounding box utilities
│── load_camera_params.py # Camera parameter utilities
├── requirements.txt # Project dependencies
└── README.md # This file
- Object Detection: YOLOv11 detects objects in the frame and provides 2D bounding boxes
- Depth Estimation: Depth Anything v2 generates a depth map for the entire frame
- 3D Box Estimation: Combines 2D boxes with depth information to create 3D boxes
- Visualization: Renders 3D boxes and bird's eye view for better spatial understanding
- YOLOv11 by Ultralytics
- Depth Anything v2 by Microsoft