GitHub - brsynth/chebirgroup: Systematic expansion of R-group for ChEBI molecules from the RHEA database

brsynth/chebirgroup

ChEBI R-group

Data location

https://doi.org/10.57745/V3URYA

Installation

conda env create --file recipes/worklow.yaml --name chebirgroup
pip install --no-deps -e .

Build dataset

1 - Download PubChem

python -m chebirgroup.pubchem.download \
    --output-pubchem-dir <output dir> \
    --output-pubchem-db <output sql database>
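
Once the download has finished, it can help to check what the database actually contains before moving on. The sketch below assumes that the file passed to --output-pubchem-db is a SQLite database; the README only calls it an "sql database", so treat this as an assumption. The helper name list_tables is hypothetical, and no table names are assumed: the function only reads SQLite's own catalog.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in a SQLite file, sorted alphabetically."""
    # Assumption: the PubChem database is a SQLite file. It is opened
    # read-only so that a typo in the path does not create an empty database.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling list_tables("pubchem.db") (the file name used in step 3) would return the table names, or raise sqlite3.OperationalError if the file does not exist.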

2 - Download Rhea

python -m chebirgroup.rhea.download \
    --output-rhea-dir <output dir> \
    --parameter-release-int <int>

3 - R-group search

snakemake \
    -p \
    -j 48 \
    -c 48 \
    --workflow-profile template/chebirgroup \
    -s ./src/chebirgroup/rgroup/Snakefile \
    --use-conda \
    --latency-wait 5 \
    --rerun-incomplete \
    --config depot_dir=./src/chebirgroup/rgroup input_chebi_csv=rhea-chebi-smiles.csv input_pubchem_db=pubchem.db output_dir_str=chebi
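
When the same run has to be repeated with different inputs, the command above can be assembled from a dictionary instead of being edited by hand. This is only a sketch: every flag and value is copied from the command above, while the function name build_snakemake_command and the cores parameter are hypothetical.

```python
import shlex


def build_snakemake_command(config, cores=48):
    """Assemble the snakemake invocation shown above as an argument list."""
    cmd = [
        "snakemake",
        "-p",
        "-j", str(cores),
        "-c", str(cores),
        "--workflow-profile", "template/chebirgroup",
        "-s", "./src/chebirgroup/rgroup/Snakefile",
        "--use-conda",
        "--latency-wait", "5",
        "--rerun-incomplete",
        "--config",
    ]
    # The --config entries are passed as key=value pairs, one per argument.
    cmd += [f"{key}={value}" for key, value in config.items()]
    return cmd


config = {
    "depot_dir": "./src/chebirgroup/rgroup",
    "input_chebi_csv": "rhea-chebi-smiles.csv",
    "input_pubchem_db": "pubchem.db",
    "output_dir_str": "chebi",
}
command = build_snakemake_command(config)
# Print the command in copy-pasteable form instead of running it.
print(shlex.join(command))
```

The resulting list can be handed to subprocess.run(command, check=True) from the repository root, inside the chebirgroup conda environment.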

Dataset overview

The Snakemake workflow produces a gzip-compressed CSV file (csv.gz) with the following columns:

| column name | type |
| --- | --- |
| smiles_rhea | str |
| chebi | List[str] |
| num_heavy_atoms | int |
| exact_mol_wt | int |
| additional_substituents_smiles | List[str] |
| additional_substituents_pubchem_cid | List[List[str]] |
| only_match_rgroup_smiles | List[str] |
| only_match_rgroup_pubchem_cid | List[List[str]] |
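
The columns listed above can be read back with the Python standard library. This is a minimal sketch, not the project's own loader: it assumes the list-typed columns are serialized as Python-style literals (for example "['CHEBI:1', 'CHEBI:2']"), which the README does not state, and the function name read_dataset is hypothetical. The exact_mol_wt column is left as the raw string.

```python
import ast
import csv
import gzip

# Columns documented as List[...] in the dataset overview.
LIST_COLUMNS = (
    "chebi",
    "additional_substituents_smiles",
    "additional_substituents_pubchem_cid",
    "only_match_rgroup_smiles",
    "only_match_rgroup_pubchem_cid",
)


def read_dataset(path):
    """Read the csv.gz file into a list of dicts, decoding the list columns."""
    records = []
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            for column in LIST_COLUMNS:
                # Assumption: lists are stored as Python-style literals.
                # Empty cells become empty lists. Adjust if the file differs.
                row[column] = ast.literal_eval(row[column]) if row[column] else []
            row["num_heavy_atoms"] = int(row["num_heavy_atoms"])
            records.append(row)
    return records
```

Each record is a plain dictionary keyed by column name, so record["chebi"] is a list of strings and record["only_match_rgroup_pubchem_cid"] is a list of lists, provided the serialization assumption holds.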
