Shengqiong Wu1, Hao Fei1*, Jingkang Yang2, Xiangtai Li2, Juncheng Li3, Hanwang Zhang2, and Tat-Seng Chua1
The recently emerged 4D Panoptic Scene Graph (4D-PSG) provides an advanced representation for comprehensively modeling the dynamic 4D visual world. Unfortunately, current pioneering 4D-PSG research suffers severely from data scarcity and the resulting out-of-vocabulary problems; moreover, the pipeline nature of the benchmark generation method leads to suboptimal performance. To address these challenges, this paper investigates a novel framework for 4D-PSG generation that leverages rich 2D visual scene annotations to enhance 4D scene learning. First, we introduce a 4D Large Language Model (4D-LLM) integrated with a 3D mask decoder for end-to-end generation of 4D-PSG. A chained SG inference mechanism is further designed to exploit LLMs' open-vocabulary capabilities to iteratively infer accurate and comprehensive object and relation labels. Most importantly, we propose a 2D-to-4D visual scene transfer learning framework, where a spatial-temporal scene transcending strategy transfers dimension-invariant features from abundant 2D SG annotations to 4D scenes, effectively compensating for data scarcity in 4D-PSG.
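To give a feel for the chained SG inference idea described above, below is a minimal, hypothetical sketch (not part of the released code) of an iterative loop that repeatedly queries an LLM for scene-graph triplets, feeding previously discovered triplets back into the prompt so later rounds can refine labels and add missed relations. The names `chained_sg_inference`, `query_llm`, and the triplet format are assumptions for illustration only; the actual 4D-LLM interface may differ.

```python
from typing import Callable, List, Tuple

# Hypothetical format: a scene-graph triplet is (subject, relation, object).
Triplet = Tuple[str, str, str]

def chained_sg_inference(
    query_llm: Callable[[str], List[Triplet]],
    scene_description: str,
    max_rounds: int = 3,
) -> List[Triplet]:
    """Iteratively ask the LLM for triplets, conditioning each round on the
    triplets found so far (open-vocabulary, chained inference)."""
    triplets: List[Triplet] = []
    for _ in range(max_rounds):
        prompt = (
            f"Scene description: {scene_description}\n"
            f"Known triplets so far: {triplets}\n"
            "List additional or corrected (subject, relation, object) triplets."
        )
        new_triplets = query_llm(prompt)
        # Stop once the model produces nothing new.
        added = [t for t in new_triplets if t not in triplets]
        if not added:
            break
        triplets.extend(added)
    return triplets


if __name__ == "__main__":
    # Toy stand-in for the LLM, returning a fixed set of triplets.
    def fake_llm(prompt: str) -> List[Triplet]:
        return [("person", "holding", "cup"), ("cup", "on", "table")]

    print(chained_sg_inference(fake_llm, "A person drinks coffee at a table."))
```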
- The main task dataset is PSG4D; please refer to the instructions for preparation.
- For the 2D-to-4D visual scene transfer learning, the datasets we leverage are:
Please follow the instructions to prepare the datasets.
Coming soon.
If you use PSG-4D-LLM in your project, please kindly cite:
@inproceedings{wu2025psg4dllm,
title={Learning 4D Panoptic Scene Graph Generation from Rich 2D Visual Scene},
author={Shengqiong Wu and Hao Fei and Jingkang Yang and Xiangtai Li and Juncheng Li and Hanwang Zhang and Tat-Seng Chua},
booktitle={CVPR},
year={2025}
}
Our 4D-LLM is developed based on the codebases of NExT-Chat, Chat-UniVi, SA-Gate, and SAM 2, and we would like to thank all of their developers.