GitHub - csuhan/Tar: Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

Unifying Visual Understanding and Generation via Text-Aligned Representations

Jiaming Han, Hao Chen^†, Yang Zhao, Hanyu Wang, Qi Zhao, Ziyan Yang, Hao He, Xiangyu Yue^‡, Lu Jiang^‡

^† Project Lead ^‡ Corresponding Authors

News

June 2025. We are applying for code open-sourcing and will release the code and model immediately once approved.

Citation

@article{han2025tar,
  title={Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations}, 
  author={Han, Jiaming and Chen, Hao and Zhao, Yang and Wang, Hanyu and Zhao, Qi and Yang, Ziyan and He, Hao and Yue, Xiangyu and Jiang, Lu},
  journal={arXiv preprint arXiv:2506.18898},
  year={2025},
}

License

This project is licensed under the Apache 2.0 License.

