Official implementation of MADGEN: Mass-Spec attends to De Novo Molecular generation by Yinkai Wang, Xiaohui Chen, Liping Liu, and Soha Hassoun.
conda create --name madgen python=3.9 rdkit=2023.09.5 -c conda-forge -y
conda activate madgen
pip install -r requirements.txt
The processed data is available from the MADGEN repository on Zenodo:
mkdir -p ./data/msgym/raw ./data/canopus/raw
wget "https://zenodo.org/records/15036069/files/msgym.pkl?download=1" -O ./data/msgym/raw/msgym.pkl
wget "https://zenodo.org/records/15036069/files/canopus.pkl?download=1" -O ./data/canopus/raw/canopus.pkl
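The same download can be scripted from Python. The following is a minimal sketch, assuming the Zenodo URLs and the `data/<dataset>/raw/` layout used in the wget commands; `target_path` and `download` are hypothetical helper names and are not part of this repository.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Zenodo record and file names are copied from the wget commands.
ZENODO_FILES = "https://zenodo.org/records/15036069/files"
DATASETS = {"msgym": "msgym.pkl", "canopus": "canopus.pkl"}


def target_path(dataset: str, root: Path = Path("data")) -> Path:
    """Return <root>/<dataset>/raw/<file>, creating the parent directories."""
    path = root / dataset / "raw" / DATASETS[dataset]
    path.parent.mkdir(parents=True, exist_ok=True)
    return path


def download(dataset: str, root: Path = Path("data")) -> Path:
    """Download one processed dataset unless the file is already present."""
    path = target_path(dataset, root)
    if not path.exists():
        urlretrieve(f"{ZENODO_FILES}/{DATASETS[dataset]}?download=1", path)
    return path
```

Calling `download("msgym")` and `download("canopus")` from the repository root should place both files where the wget commands put them, and it skips any file that already exists.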
The NIST dataset is commercial and must be obtained separately.
python train.py --config configs/{dataset_name}.yaml --model Madgen
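To train on several datasets in sequence, the command above can be wrapped in a small script. This is only a sketch: it assumes that `{dataset_name}` takes the values `msgym` and `canopus` (matching the data directories) and that `configs/msgym.yaml` and `configs/canopus.yaml` exist. `train_command` and `train_all` are hypothetical helpers, not part of this repository.

```python
import subprocess


def train_command(dataset_name: str, model: str = "Madgen") -> list:
    """Build the documented training command for one dataset."""
    return [
        "python", "train.py",
        "--config", f"configs/{dataset_name}.yaml",
        "--model", model,
    ]


def train_all(dataset_names=("msgym", "canopus")) -> None:
    """Train on each dataset in turn, stopping at the first failure."""
    for name in dataset_names:
        # check=True raises CalledProcessError if train.py exits non-zero.
        subprocess.run(train_command(name), check=True)
```

Running `train_all()` from the repository root would launch one training run per dataset, using only the `--config` and `--model` flags shown in the command above.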
CUDA_VISIBLE_DEVICES=1 python sample.py \
--config configs/{dataset_name}.yaml \
--checkpoint {checkpoint_path} \
--samples samples \
--model Madgen \
--mode test \
--n_samples 50 \
--n_steps 100 \
--table_name {table_name} \
--sampling_seed 42
To run the evaluation, provide the path to the CSV file as an argument:
python evaluation_generation.py --file_path /path/to/your/csvfile.csv
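Before running the evaluation, it can help to confirm that the CSV path is valid and to look at the first few rows. The following is a minimal sketch using only the standard library. `preview_csv` is a hypothetical helper, and it makes no assumption about which columns `evaluation_generation.py` expects.

```python
import csv
from pathlib import Path


def preview_csv(file_path: str, n_rows: int = 3):
    """Return the header and the first n_rows data rows of a CSV file."""
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"No CSV file at {path}")
    with path.open(newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # empty list if the file has no rows
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows
```

For example, `header, rows = preview_csv("/path/to/your/csvfile.csv")` returns the column names and a few sample rows, or raises `FileNotFoundError` if the path is wrong.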
For predictive retrieval, please refer to JESTR.
If you have any questions, please contact yinkai.wang@tufts.edu and soha.hassoun@tufts.edu.
If you find this code useful for your research, please consider citing our paper:
@inproceedings{
wang2025madgen,
title={{MADGEN}: Mass-Spec attends to De Novo Molecular generation},
author={Yinkai Wang and Xiaohui Chen and Liping Liu and Soha Hassoun},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=78tc3EiUrN}
}