Stars
Document Scanner and Word Segmentation
Line segmentation of handwritten documents based on the A* path planning algorithm
The deslanting algorithm sets text upright in images. Python, C++ and OpenCL implementations provided.
[CVPR 2025] Official implementation for "Empowering LLMs to Understand and Generate Complex Vector Graphics" https://arxiv.org/abs/2412.11102
🖼️ A Kaggle Package project for converting natural language prompts into precise SVG code using Python.
The Accessibility Toolkit is an open-source Unity package for adding context-aware subtitles in VR, AR, and non-XR environments.
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
(CVPR 2025) Code of "Chat2SVG: Vector Graphics Generation with Large Language Models and Image Diffusion Models"
Code and data to evaluate LLMs on the ENEM, the main standardized Brazilian university admission exam.
🆙 Upscayl - #1 Free and Open Source AI Image Upscaler for Linux, macOS and Windows.
Repository accompanying the "Sign Pose-based Transformer for Word-level Sign Language Recognition" paper
Create embeddings from sign pose videos using Transformers
Client/Server Authoritative Multiplayer Addon for the Godot Engine
Library for creating real-time multiplayer games. It uses a prediction-and-rewinding networking model.
PyTorch code for NeurIPS-20 Paper "Object Goal Navigation using Goal-Oriented Semantic Exploration"
ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings. NeurIPS 2022
Vision-and-Language Navigation in Continuous Environments using Habitat
Code of the paper "NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning" (TPAMI 2025)
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
[ECCV 2024] Official implementation of NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models
Sign Language Translation with Transformers (COLING'2020, ECCV'20 SLRTP Workshop)
[AAAI 2024] Official implementation of NavGPT: Explicit Reasoning in Vision-and-Language Navigation with Large Language Models
[CVPR 2025] RoomTour3D - Geometry-aware, cheap and automatic data from web videos for embodied navigation
Official Task Suite Implementation of ICML'23 Paper "VIMA: General Robot Manipulation with Multimodal Prompts"
LaVi-Lab / NaviLLM
Forked from zd11024/NaviLLM. [CVPR 2024] The code for the paper 'Towards Learning a Generalist Model for Embodied Navigation'