Norman-Ou (Ruizhe Ou) · GitHub
Norman-Ou/README.md

Hi there 👋

I'm Ruizhe Ou

I explore how multi-modal large language models (MLLMs) can advance remote sensing tasks.

I work on text-to-image, image-to-image, and text-to-video generation, blending creativity with machine learning.

Things I've Been Discovering

GeoPix is a remote sensing MLLM that extends image understanding capabilities to the pixel level. It integrates a mask predictor into the MLLM, transforming visual features from the vision encoder into masks conditioned on the segmentation token embeddings generated by the LLM.
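As a rough illustration of the idea of decoding a mask from visual features conditioned on a segmentation token embedding, the sketch below scores every pixel feature against one embedding with a dot product and thresholds the result. This is not GeoPix's actual mask predictor: the function name `predict_mask`, the toy feature values, and the dot-product-plus-sigmoid decoding are all assumptions chosen to keep the example small.

```python
import math

def predict_mask(features, seg_embedding, threshold=0.5):
    """Return a binary mask from per-pixel features and one token embedding.

    features: H x W x C nested lists (hypothetical vision-encoder output,
              assumed to share dimension C with the token embedding).
    seg_embedding: length-C list (hypothetical segmentation-token embedding).
    """
    mask = []
    for row in features:
        mask_row = []
        for pixel in row:
            # Similarity between the pixel feature and the token embedding.
            logit = sum(f * e for f, e in zip(pixel, seg_embedding))
            prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
            mask_row.append(1 if prob > threshold else 0)
        mask.append(mask_row)
    return mask

# Toy 2 x 2 feature map with 2 channels (made-up numbers).
features = [
    [[2.0, 0.0], [0.0, 2.0]],
    [[-1.0, 0.5], [3.0, 1.0]],
]
seg_embedding = [1.0, -1.0]
mask = predict_mask(features, seg_embedding)
print(mask)
```

Because the mask depends on the embedding, a different segmentation token would select different pixels from the same feature map.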


This work proposes a method for generating disaster-affected remote sensing images by combining Stable Diffusion, BLIP, and GPT-4 with human-in-the-loop feedback. The pipeline starts from only 97 unlabelled 512×512 remote sensing images. BLIP first generates initial captions, which are then refined through expert feedback and GPT-based semantic rewriting to produce enhanced prompts. These enhanced prompts, paired with the original images, form a synthetic training set.
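The data flow described above (caption, expert refinement, rewriting, pairing with the image) can be sketched as follows. The three callables are toy stand-ins, not real BLIP or GPT calls, and every name here (`TrainingPair`, `build_training_set`, the file names) is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TrainingPair:
    image_path: str
    prompt: str

def build_training_set(image_paths, caption_fn, expert_fn, rewrite_fn):
    """Pair each image with a prompt produced by three refinement steps."""
    pairs = []
    for path in image_paths:
        caption = caption_fn(path)            # stand-in for BLIP captioning
        corrected = expert_fn(path, caption)  # stand-in for expert feedback
        prompt = rewrite_fn(corrected)        # stand-in for GPT-based rewriting
        pairs.append(TrainingPair(path, prompt))
    return pairs

# Toy stand-ins so the sketch runs without any model.
def toy_caption(path):
    return "an aerial view of a town"

def toy_expert(path, caption):
    return caption + ", flooded streets"

def toy_rewrite(text):
    return "remote sensing image, " + text

pairs = build_training_set(
    ["img_000.png", "img_001.png"], toy_caption, toy_expert, toy_rewrite
)
for pair in pairs:
    print(pair.image_path, "->", pair.prompt)
```

Passing the steps in as functions keeps the pairing logic independent of whichever captioning or rewriting model is plugged in.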


Things I've Been Creating

Line Art to Anime

I developed a ControlNet model designed to transform line art into fully colored anime-style images. This model enables precise and high-quality generation by conditioning the diffusion process on clean line drawings, making it easier to create vibrant and consistent anime artwork from simple sketches.

Image Generation Pipeline for Linky

I am the main developer of the image generation pipeline for Linky, which supports anime-style, real-style, and film-style image stylization, pose editing, and face consistency modeling.

This pipeline covers:

  • Prompt cleaning and expansion (similar to a prompt helper)
  • Image style selection
  • Pose extraction and editing
  • Face consistency enhancement
  • Risk control evaluation
  • Compute resource scheduling strategies
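A staged pipeline like the one listed above is often organised as a chain of small functions that each take and return a request object. The sketch below shows only three of the stages (prompt cleaning, style selection, risk control) and does not model pose editing, face consistency, or resource scheduling. It is not Linky's code: the `Request` fields, the style names, and the blocklist are all hypothetical.

```python
from dataclasses import dataclass, field

ALLOWED_STYLES = {"anime", "real", "film"}
BLOCKED_WORDS = {"gore"}  # hypothetical blocklist, for illustration only

@dataclass
class Request:
    prompt: str
    style: str = "anime"
    rejected: bool = False
    log: list = field(default_factory=list)

def clean_prompt(req):
    # Collapse repeated whitespace; a real cleaner would do much more.
    req.prompt = " ".join(req.prompt.split())
    req.log.append("clean_prompt")
    return req

def select_style(req):
    # Fall back to a default when the requested style is unknown.
    if req.style not in ALLOWED_STYLES:
        req.style = "anime"
    req.log.append("select_style")
    return req

def risk_check(req):
    # Flag the request if any blocked word appears in the prompt.
    words = set(req.prompt.lower().split())
    req.rejected = bool(words & BLOCKED_WORDS)
    req.log.append("risk_check")
    return req

def run_pipeline(req, stages):
    # Run stages in order and stop as soon as one rejects the request.
    for stage in stages:
        req = stage(req)
        if req.rejected:
            break
    return req

STAGES = [clean_prompt, select_style, risk_check]
ok = run_pipeline(Request("  a girl   in the rain ", style="watercolor"), STAGES)
blocked = run_pipeline(Request("gore scene"), STAGES)
print(ok.prompt, ok.style, ok.rejected)
print(blocked.rejected)
```

Keeping each stage as a plain function makes it easy to reorder stages or insert new ones without touching the runner.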

Popular repositories

  1. GeoPix Public

    [GRSM] Project Page for "GeoPix: Multi-Modal Large Language Model for Pixel-level Image Understanding in Remote Sensing"

    Python 22

  2. InstantID-with-FouriScale Public

    Combines InstantID🔥 and FouriScale to generate high-resolution images!

    Python 11 1

  3. BUPT-QM-InnovationTutor_Group61 Public

    CSS 2 1

  4. EBU6304-2022-Software-Engineering-Group-8 Public

    Java 2

  5. Norman-Ou Public

    Config files for my GitHub profile.

  6. AntennaPhaseGA Public

    This project applies Genetic Algorithms (GA) to optimize the initial phase settings of antenna arrays. The goal is to improve beamforming performance by minimizing sidelobe levels and enhancing mai…

    Python
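To give a feel for how a genetic algorithm can tune antenna phases, here is a minimal sketch that is not taken from the AntennaPhaseGA repository. It assumes a uniform linear array with half-wavelength spacing, uses peak sidelobe level as the only cost, and applies elitism, one-point crossover, and Gaussian mutation. All function names and parameter values are hypothetical.

```python
import cmath
import math
import random

def pattern(phases, n_angles=91, spacing=0.5):
    """Normalised array-factor magnitude for angles from -90 to 90 degrees."""
    values = []
    for i in range(n_angles):
        theta = math.radians(-90 + 180 * i / (n_angles - 1))
        af = sum(
            cmath.exp(1j * (2 * math.pi * spacing * n * math.sin(theta) + p))
            for n, p in enumerate(phases)
        )
        values.append(abs(af))
    peak = max(values)
    return [v / peak for v in values]

def peak_sidelobe(phases):
    """Largest pattern value outside the main lobe (linear scale, 0 to 1)."""
    p = pattern(phases)
    k = p.index(max(p))
    left = k
    while left > 0 and p[left - 1] < p[left]:
        left -= 1                      # walk down to the left edge of the lobe
    right = k
    while right < len(p) - 1 and p[right + 1] < p[right]:
        right += 1                     # walk down to the right edge of the lobe
    side = p[:left] + p[right + 1:]
    return max(side) if side else 1.0  # no sidelobe region: treat as worst case

def optimise(n_elements=8, pop_size=20, generations=20, seed=0):
    rng = random.Random(seed)
    pop = [[rng.uniform(-math.pi, math.pi) for _ in range(n_elements)]
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=peak_sidelobe)
        history.append(peak_sidelobe(pop[0]))
        elite = pop[: pop_size // 4]             # keep the best quarter unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_elements)
            child = a[:cut] + b[cut:]            # one-point crossover
            i = rng.randrange(n_elements)
            child[i] += rng.gauss(0.0, 0.3)      # mutate one phase
            children.append(child)
        pop = elite + children
    pop.sort(key=peak_sidelobe)
    history.append(peak_sidelobe(pop[0]))
    return pop[0], history

best_phases, history = optimise()
print("peak sidelobe, first vs last generation:", history[0], history[-1])
```

Because the elite individuals are carried over unchanged, the best cost never gets worse from one generation to the next. A cost based on sidelobes alone can reward an overly wide main lobe, so a practical fitness function would normally add a term for the main beam as well.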
