Access to organized medical information in clinical datasets is limited, which prevents the data from being exploited effectively. I have developed a unified end-to-end tool that combines Named Entity Recognition (NER) for entity extraction, query-based document search, and clinical document summarization with a Large Language Model (LLM), allowing healthcare professionals to quickly retrieve and comprehend relevant medical data.
The detailed workflow is as follows:
- A pre-trained DistilBERT model is fine-tuned on Maccrobat, a medical entity dataset, for the Named Entity Recognition (NER) task (a fine-tuning sketch is given after this list)
- The fine-tuned distilbert-ner model is hosted on the Hugging Face Model Hub and can be accessed from here
- The complete training code is available in the NER_training.ipynb notebook
- Given a medical report, the fine-tuned distilbert-ner model predicts the medical entities it contains
- The distilbert-ner model also outputs contextual embeddings for the input medical report, which are used for relevant information retrieval
- Given a user query, FAISS is used to retrieve the most relevant sentences from the medical report (see the retrieval sketch after this list)
- As a final step, a summary of the retrieved sentences is generated with the pre-trained T5 (t5-base) model (a summarization sketch is also given below)
- run.ipynb integrates all the tasks of the project; run it to execute the end-to-end pipeline (entity extraction, retrieval, and summarization)
- example_run.pdf contains a sample run of the project on an example report
- Install the dependencies first with pip install -r requirements.txt
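
The actual training code is in NER_training.ipynb; the snippet below is only a minimal sketch of how a DistilBERT token-classification fine-tuning run can be set up with the Hugging Face Trainer. The dataset path, label list, column names, and hyperparameters are placeholders, not the project's exact values.

```python
# Minimal NER fine-tuning sketch. Assumes a dataset with "tokens" (word lists)
# and "ner_tags" (integer label lists) columns; the path, labels, and
# hyperparameters below are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          DataCollatorForTokenClassification,
                          Trainer, TrainingArguments)

model_name = "distilbert-base-uncased"
label_list = ["O", "B-Sign_symptom", "I-Sign_symptom"]  # placeholder subset of the Maccrobat label set

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name, num_labels=len(label_list))

raw = load_dataset("json", data_files={"train": "maccrobat_train.json"})  # hypothetical file

def tokenize_and_align(batch):
    # Tokenize pre-split words and align word-level tags to sub-word tokens;
    # special tokens and word continuations get -100 so the loss ignores them.
    enc = tokenizer(batch["tokens"], is_split_into_words=True, truncation=True)
    all_labels = []
    for i, tags in enumerate(batch["ner_tags"]):
        previous_word = None
        labels = []
        for word_id in enc.word_ids(batch_index=i):
            if word_id is None or word_id == previous_word:
                labels.append(-100)
            else:
                labels.append(tags[word_id])
            previous_word = word_id
        all_labels.append(labels)
    enc["labels"] = all_labels
    return enc

tokenized = raw.map(tokenize_and_align, batched=True,
                    remove_columns=raw["train"].column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="distilbert-ner",
                           num_train_epochs=3,
                           per_device_train_batch_size=8),
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```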
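
The inference and retrieval stage could look roughly like the sketch below: load the fine-tuned model (the Hub ID here is a placeholder for the actual distilbert-ner repository), predict entities, mean-pool the last hidden layer into per-sentence embeddings, and query a FAISS index. The pooling strategy, index type, and example texts are assumptions for illustration, not necessarily what run.ipynb uses.

```python
# Entity prediction + embedding-based retrieval sketch.
import faiss
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "your-username/distilbert-ner"  # placeholder for the actual Hub model ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

report = ("Patient presents with chest pain and shortness of breath. "
          "ECG showed ST elevation. Troponin levels were elevated.")

# 1) Predict the medical entities in the report.
ner = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
print(ner(report))

# 2) Mean-pool the last hidden layer into one embedding per sentence.
def embed(texts):
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc, output_hidden_states=True).hidden_states[-1]  # (batch, seq, dim)
    mask = enc["attention_mask"].unsqueeze(-1)          # zero out padding positions
    vecs = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
    return torch.nn.functional.normalize(vecs, dim=1).numpy()

sentences = [s.strip() for s in report.split(".") if s.strip()]
sentence_vecs = embed(sentences)

# 3) Index the sentence embeddings and retrieve the ones closest to the query.
index = faiss.IndexFlatIP(sentence_vecs.shape[1])       # inner product on normalized vectors
index.add(sentence_vecs)
query_vecs = embed(["Did the patient have cardiac symptoms?"])
scores, ids = index.search(query_vecs, min(2, len(sentences)))
print([sentences[i] for i in ids[0]])
```

Because the embeddings are L2-normalized, the inner-product scores returned by FAISS behave like cosine similarities.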
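
The last step, summarizing the retrieved sentences with t5-base, might look like the following; the "summarize:" prefix is standard T5 usage, while the generation parameters are illustrative.

```python
# Summarize the retrieved sentences with the pre-trained t5-base model.
from transformers import T5ForConditionalGeneration, T5Tokenizer

t5_tokenizer = T5Tokenizer.from_pretrained("t5-base")
t5_model = T5ForConditionalGeneration.from_pretrained("t5-base")

retrieved = [
    "Patient presents with chest pain and shortness of breath",
    "ECG showed ST elevation",
]
text = "summarize: " + ". ".join(retrieved)  # T5 expects a task prefix

inputs = t5_tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
summary_ids = t5_model.generate(**inputs, max_length=60, num_beams=4, early_stopping=True)
print(t5_tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```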