Demo of Rust molecule file format parsing using a grammar-based parser generator.
Grammars are based on the following:
The main dependencies are:
See the Cargo.toml for details.
Ensure you have rustup installed. You can also install Rust via conda:
conda install -c conda-forge rust
Clone this project, enter its directory, and run the tool using cargo:
git clone https://github.com/PatrickPenner/mol-parsing
cd mol-parsing
cargo run --release # building for release makes a significant difference when parsing larger databases
This should build the project and print the following usage message at the end:
Usage: main <format: smiles | sdf | sdfgz | pdb> <path to file>
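The usage line above implies a small dispatch on the format argument. The following is a minimal sketch of how such argument handling could look using only the standard library. It is not taken from this repository: the `Format` enum, the `USAGE` constant, and the `parse_args` helper are hypothetical names chosen for illustration.

```rust
use std::str::FromStr;

/// The four format keywords listed in the usage line.
/// (Hypothetical type, not necessarily what the project uses.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Smiles,
    Sdf,
    SdfGz,
    Pdb,
}

impl FromStr for Format {
    type Err = String;

    /// Map a command line keyword to a `Format`, rejecting anything else.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "smiles" => Ok(Format::Smiles),
            "sdf" => Ok(Format::Sdf),
            "sdfgz" => Ok(Format::SdfGz),
            "pdb" => Ok(Format::Pdb),
            other => Err(format!("unknown format: {other}")),
        }
    }
}

pub const USAGE: &str = "Usage: main <format: smiles | sdf | sdfgz | pdb> <path to file>";

/// Hypothetical helper: takes the arguments *without* the program name
/// and returns the selected format and the input path.
/// Any argument count other than two yields the usage message as the error.
pub fn parse_args<I>(args: I) -> Result<(Format, String), String>
where
    I: IntoIterator<Item = String>,
{
    let mut args = args.into_iter();
    match (args.next(), args.next(), args.next()) {
        (Some(format_arg), Some(path), None) => Ok((format_arg.parse()?, path)),
        _ => Err(USAGE.to_string()),
    }
}
```

In a binary, this helper would typically be called as `parse_args(std::env::args().skip(1))`, printing the error and exiting when it returns `Err`.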
You can now run the tool on any input file you have, for example ChEMBL 33 in .sdf.gz format:
cargo run --release sdfgz chembl_33.sdf.gz
The grammars can be found in src/grammar, and you are encouraged to hack around in them. A very useful tool for writing grammars is pest's web-based editor at pest.rs.