ClinicalPseudoBERT Pretrained LLM

This repository contains the corpus preprocessing and training instructions for the models and corpora from the paper Enhancing Clinical Models with Pseudo Data for De-identification.

Reproducing Results

Install the MIMIC-III database as described in the mimic package install section.
Uncompress the pseudo sources: tar jxf pseudo-source.tar.bz2
Install Python dependencies: pip install -r src/python/requirements-all.txt
Load the SQLite DB from downloaded lists: ./cpbert load
Create the admission files: ./cpbert admids
Create the masked and pseudo corpora files: ./cpbert process <admission ID file>
Create admission IDs to process: ./cpbert adms --shuffle -s 10 -o adm-ids
Process the first set of 10: ./cpbert process adm-ids/0000 -d pseudos
Create the corpus file: find pseudos -name \*-pseudo.txt -exec cat {} >> pseudo-corpus.txt \;
Check the corpus size (newline, word, and byte counts): wc pseudo-corpus.txt
Follow the instructions to reproduce the de-identification results.
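
The corpus concatenation and count steps above can be rehearsed on throwaway data before running them against real output. The following is a minimal sketch, not part of the repository: it assumes only the `*-pseudo.txt` naming shown in the steps, and the scratch directory, the sample file names, and the `a-masked.txt` file are hypothetical stand-ins.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch area standing in for the real `pseudos` output directory.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
mkdir -p "$work/pseudos/0000"

# Two fake pseudo notes, plus one other file that must not reach the corpus.
printf 'pseudo note one\n' > "$work/pseudos/0000/a-pseudo.txt"
printf 'pseudo note two\n' > "$work/pseudos/0000/b-pseudo.txt"
printf 'masked note\n' > "$work/pseudos/0000/a-masked.txt"

# Same file pattern as the step above. `>` truncates the corpus first, so
# rerunning does not duplicate notes; `+` passes many files to a single `cat`.
find "$work/pseudos" -name '*-pseudo.txt' -exec cat {} + > "$work/pseudo-corpus.txt"

# Newline and word counts, stripped of the padding some `wc` versions add.
corpus_lines=$(wc -l < "$work/pseudo-corpus.txt" | tr -d ' ')
corpus_words=$(wc -w < "$work/pseudo-corpus.txt" | tr -d ' ')
echo "lines: $corpus_lines words: $corpus_words"
```

Note that the `>>` form in the steps appends to any existing pseudo-corpus.txt, so delete that file before rebuilding the corpus, or write with `>` as in the sketch.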

Models and Corpora

The pretrained de-identification models and the pseudo corpus are available upon request. All requests require documentation of PhysioNet certification, as explained in the paper.

License

MIT License
