The aim of this project is to build a captioning model for medical images. The generative model consists of two encoders and a decoder: the first encoder extracts visual features from the images, while the second extracts semantic features. The visual and semantic features are then concatenated to represent each image, and the decoder generates the words that form the caption.
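As a rough illustration, the sketch below shows one way such a two-encoder layout could be wired up in PyTorch, assuming a ResNet-50 visual encoder, a GRU-based semantic encoder over caption tokens, and an LSTM decoder; the class names, backbone, and dimensions are illustrative assumptions, not the exact modules used in the notebooks.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class DualEncoderCaptioner(nn.Module):
    """Visual + semantic encoders whose features are concatenated and decoded into words."""
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # Visual encoder: a CNN backbone projected to embed_dim (load pretrained weights in practice).
        backbone = models.resnet50(weights=None)
        self.visual_encoder = nn.Sequential(*list(backbone.children())[:-1])
        self.visual_proj = nn.Linear(backbone.fc.in_features, embed_dim)
        # Semantic encoder: embeds caption tokens and pools them into a single vector.
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.semantic_encoder = nn.GRU(embed_dim, embed_dim, batch_first=True)
        # Decoder: LSTM conditioned on the concatenated (visual + semantic) image representation.
        self.decoder = nn.LSTM(embed_dim * 3, hidden_dim, batch_first=True)
        self.output = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, semantic_tokens, caption_tokens):
        visual = self.visual_proj(self.visual_encoder(images).flatten(1))      # (B, E)
        _, semantic = self.semantic_encoder(self.word_embed(semantic_tokens))  # (1, B, E)
        fused = torch.cat([visual, semantic.squeeze(0)], dim=1)                # (B, 2E)
        # Condition every decoding step on the fused image representation.
        embeds = self.word_embed(caption_tokens)                               # (B, T, E)
        fused = fused.unsqueeze(1).expand(-1, embeds.size(1), -1)              # (B, T, 2E)
        hidden, _ = self.decoder(torch.cat([embeds, fused], dim=2))            # (B, T, H)
        return self.output(hidden)                                             # (B, T, V)
```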
We employed both generative and retrieval models to produce the output captions. Within the generative approach, two methodologies were utilized:
- A dual-encoder configuration involving one image encoder and a separate text encoder for captions, both linked to a decoder.
- A single encoder-decoder setup.
The outputs from the chosen encoder configuration were then fed into the decoder to generate provisional captions. These were subsequently compared with the captions obtained from the retrieval model to determine the final caption.
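The comparison against the retrieval model can be pictured with the short sketch below, which assumes TF-IDF caption vectors and cosine similarity as the scoring function; the notebooks may use different similarity measures, and the function and variable names here are only illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def retrieve_most_similar(generated_caption, training_captions):
    """Return the training caption most similar to the generated one, plus its score."""
    vectorizer = TfidfVectorizer()
    train_vecs = vectorizer.fit_transform(training_captions)   # (N, V) sparse matrix
    gen_vec = vectorizer.transform([generated_caption])        # (1, V)
    scores = cosine_similarity(gen_vec, train_vecs)[0]         # (N,) similarity scores
    best = scores.argmax()
    return training_captions[best], float(scores[best])

# The final caption can then be either the generated or the retrieved one,
# e.g. falling back to the retrieved caption only when its similarity score is high enough.
```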
Install the requirements:
pip3 install -r requirements.txt
Scripts:
- process_dataset.ipynb (Task 1): select the dataset and generate the JSON data object
- data_visualize.ipynb (Task 1): visualize samples from the dataset
- vocabulary_builder.ipynb (Task 2): build the vocabulary
- vacabulary_frequency.ipynb (Task 3): plot the word occurrences
- word_embeddings.ipynb (Task 4): generate word embeddings (using different methods) and plot them
- data_loader.ipynb (Task 5): PyTorch data loading functions (see the sketch after this list)

Generative model (encoder-decoder models):
- train_duel_encoder.ipynb (Task 5): fit data: training of the dual-encoder model
- train_single_encoder.ipynb (Task 5): fit data: training of the single-encoder model
- similarity_single_encoder.ipynb (Tasks 6-9, using the single-encoder generative model):
  - Task 6: get the generated captions and their similarities to the ground-truth captions (X) using different similarity metrics
  - Task 7: retrieval method: get the most similar caption (Z) from the training set for each generated caption (Y)
  - Task 8: model fusion: compare the ground truth (X) with (Y) and (Z), pick the better of the two as the final caption, and assign it to the test image
  - Task 9: evaluation metrics
- similarity_duel_encoder.ipynb (Tasks 6-9): the same pipeline using the dual-encoder generative model
- inference.ipynb (Task 6): inference of the dual-encoder model
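For the Task 5 data-loading step, the following sketch shows one possible PyTorch Dataset over the JSON data object, assuming records of the form {"image": path, "caption": text} and a simple word-to-index vocabulary; the field names, transforms, and collate handling are assumptions rather than the notebook's actual format.

```python
import json
import torch
from torch.utils.data import Dataset, DataLoader
from PIL import Image
from torchvision import transforms

class CaptionDataset(Dataset):
    """Loads (image tensor, caption token tensor) pairs from a JSON list of records."""
    def __init__(self, json_path, word2idx):
        with open(json_path) as f:
            self.records = json.load(f)          # assumed: [{"image": ..., "caption": ...}, ...]
        self.word2idx = word2idx                 # assumed to contain an "<unk>" token
        self.transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        record = self.records[idx]
        image = self.transform(Image.open(record["image"]).convert("RGB"))
        tokens = [self.word2idx.get(w, self.word2idx["<unk>"])
                  for w in record["caption"].lower().split()]
        return image, torch.tensor(tokens, dtype=torch.long)

# Variable-length captions need a padding collate function (hypothetical pad_collate):
# loader = DataLoader(CaptionDataset("data.json", word2idx), batch_size=32,
#                     shuffle=True, collate_fn=pad_collate)
```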