GitHub - hriteshMaikap/INSPIR: A repository where machine learning concepts are implemented from scratch

hriteshMaikap/INSPIR


INSPIR (Image-Text Synthesis Pipeline for Intelligent Retrieval and Generation) is a framework for image captioning and retrieval built on an ensemble of pretrained vision-language models. It generates candidate captions for an image with BLIP (Bootstrapping Language-Image Pre-training), ViT-GPT2 (a Vision Transformer encoder paired with a GPT-2 decoder), and GIT (Generative Image-to-text Transformer), then uses CLIP (Contrastive Language-Image Pre-training) to rank those candidates by their relevance to the image. The top-ranked captions are passed to Llama 3.1, which produces creative outputs for applications such as social media captions and image notes.

INSPIR also supports image retrieval: uploaded images are annotated, and users can then find them through text-based searches. By placing multiple modalities in a unified semantic space learned through contrastive learning, the work aims to move image captioning beyond conventional classification tasks and toward a model that generalizes across the complexities of language and vision interaction.
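The ranking step described above can be illustrated with a minimal sketch. It is not the repository's code: it assumes the image and each candidate caption have already been embedded into a shared space (as CLIP does) and scores them by cosine similarity. The function names `cosine_similarity` and `rank_by_similarity`, and the three-dimensional toy vectors, are hypothetical stand-ins for real CLIP embeddings. The same scoring is reused for text-based retrieval on the assumption that a query embedding is compared against stored image embeddings; the source does not say how INSPIR implements its search.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query_vec: Sequence[float],
    items: Dict[str, Sequence[float]],
    top_k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the top_k (label, score) pairs, most similar to query_vec first."""
    scored = [(label, cosine_similarity(query_vec, vec)) for label, vec in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumption): in practice these would come from CLIP's
# image and text encoders, and the captions from BLIP, ViT-GPT2, and GIT.
image_vec = [0.9, 0.1, 0.0]
caption_vecs = {
    "a dog running on a beach": [0.8, 0.2, 0.1],
    "a plate of food": [0.0, 0.1, 0.9],
    "a dog playing outside": [0.7, 0.3, 0.0],
}

# Hypothetical retrieval index: file name -> embedding of the annotated image.
image_index = {
    "beach_dog.jpg": [0.8, 0.2, 0.1],
    "dinner.jpg": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    # Captioning: keep the candidates closest to the image.
    for caption, score in rank_by_similarity(image_vec, caption_vecs, top_k=2):
        print(f"{score:.3f}  {caption}")

    # Retrieval: a text query embedded near the "food" direction.
    query_vec = [0.1, 0.0, 0.9]
    print(rank_by_similarity(query_vec, image_index, top_k=1)[0][0])
```

In a full pipeline, the captions kept by `rank_by_similarity` would be the ones handed to the language model for the creative rewriting step.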
