Batch Mode Support and Best Practices for Ligand Screening #31
Hey @Jnelen, I'm not sure if it is actively being worked on; perhaps @AlvaroSmorras can answer that. If you want to add code for your own use, feel free. Whether your code will be merged of course depends on whether the project remains actively maintained in the future.
I use 8 for VS.
I have been advised to use it, but I haven't benchmarked it myself. The relevant paper is here: https://pubs.acs.org/doi/10.1021/ct5010406
In my experience, very roughly an hour on a single GPU for the complete 8 cycles, but most ligands terminate before completing all 8. Of course it depends a lot on the size of the protein and on whether you use chunking, HMR, etc.
Never tried it on CPU alone, but I guess it will be extremely slow. I hope this helps.
Thanks a lot for the information. I have a first version that works quite well for me, launching SLURM jobs using the OpenMM backend. However, batching (i.e. multiple compounds per execution) currently doesn't seem to be supported for OpenMM, so I'll look into mimicking this behaviour from the AMBER backend and implementing it for OpenMM. If I can get this to work, I think it might be a nice addition, but as you indicate it is up to the main developers to decide whether to merge it!
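For reference, the launcher idea is basically one SLURM job per ligand. Here is a minimal sketch, assuming a directory of ligand files; the run command, resource requests, and input layout are placeholders, not the actual CLI of this repo.

```python
#!/usr/bin/env python3
"""Minimal sketch of a per-ligand SLURM launcher (hypothetical; the actual
run command and input layout depend on your installation and backend)."""
import subprocess
from pathlib import Path

SBATCH_TEMPLATE = """#!/bin/bash
#SBATCH --job-name=duck_{name}
#SBATCH --cpus-per-task=8
#SBATCH --time=24:00:00

# Placeholder: replace with the command used to run one ligand
# with the OpenMM backend in your setup.
run_openmm_backend --ligand {ligand}
"""

def submit_ligands(ligand_dir: str) -> None:
    """Write one sbatch script per ligand file and submit it."""
    for ligand in sorted(Path(ligand_dir).glob("*.mol2")):
        script_path = ligand.with_suffix(".sbatch")
        script_path.write_text(SBATCH_TEMPLATE.format(name=ligand.stem, ligand=ligand))
        subprocess.run(["sbatch", str(script_path)], check=True)

if __name__ == "__main__":
    submit_ligands("ligands")
```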
Hi @Jnelen Regarding your questions, I believe @simonbray answered them. I'll only add that I have indeed studied the effect of using HMR and it does not impact the results at all. Here is an example with two benchmark datasets I tried (Iridium in green, SERAPHiC in purple). Without a significant impact on the results, it takes almost half the wall-clock time. Let's see if I find time to push the publication of the code and benchmarks soon. I usually launch 10 replicas, but 8 is perfectly fine. The secret lies in having a good WQB threshold to stop the 'labile' binders early. With that, and HMR, they usually run at 800-1000 ns/day on an RTX 3080. I think the CPU execution will be rather slow.
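To make the early-stopping idea concrete, here is a small sketch of applying a WQB cutoff between SMD cycles; the threshold value and the per-cycle runner are assumptions for illustration, not the repo's actual implementation.

```python
from typing import Callable, List

WQB_THRESHOLD = 6.0  # kcal/mol; an assumed cutoff, tune it for your own screen

def screen_ligand(run_cycle: Callable[[int], float], n_cycles: int = 8) -> float:
    """Run up to n_cycles SMD cycles, stopping early once the lowest WQB
    falls below the threshold (i.e. the ligand looks like a labile binder)."""
    wqb_values: List[float] = []
    for cycle in range(n_cycles):
        wqb_values.append(run_cycle(cycle))  # run_cycle returns WQB in kcal/mol
        if min(wqb_values) < WQB_THRESHOLD:
            break  # no point spending more GPU time on this ligand
    return min(wqb_values)
```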
Thanks for your input! I'll try to work on the OpenMM batching in the coming weeks as a side project. It would be wonderful to get it merged if I can finish it! Additionally, I think (hope) the launcher script can also be very convenient to run with OpenMM (AMBER should also work), so hopefully that can also be a nice inclusion. I'll keep you updated and open a PR to review when it's ready!
Thanks for sharing this @AlvaroSmorras, really nice to see. Did you apply a WQB cutoff threshold when running these simulations? I imagine this could change the results quite significantly.
@simonbray The values shown are the free energies calculated from the exponential average (Jarzynski) of the works, and the error bars come from bootstrapping the WQBs. I ran 20 replicas without a threshold, just to have the same sampling for each point, but as a result there are some interactions with very high dispersion. Something that might be of interest is that I have also been testing steering at higher speeds, and it works with minimal differences too, so we could optimize the simulations in that dimension as well. Still, I feel that the equilibration is what takes longest (especially for the bad/labile ligands).
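For anyone following along, a generic sketch of this analysis, assuming work values in kcal/mol: the standard Jarzynski exponential average plus a bootstrap error estimate, not necessarily the exact script used here.

```python
import numpy as np

KT = 0.593  # kcal/mol at ~298 K (k_B * T); assumed temperature

def jarzynski_free_energy(works: np.ndarray, kt: float = KT) -> float:
    """Exponential-average (Jarzynski) free-energy estimate from
    nonequilibrium work values (same units as kt)."""
    w_min = works.min()  # shift for numerical stability before exponentiating
    return float(w_min - kt * np.log(np.mean(np.exp(-(works - w_min) / kt))))

def bootstrap_error(works: np.ndarray, n_boot: int = 1000, seed: int = 0) -> float:
    """Standard deviation of the Jarzynski estimate over bootstrap resamples."""
    rng = np.random.default_rng(seed)
    estimates = [
        jarzynski_free_energy(rng.choice(works, size=works.size, replace=True))
        for _ in range(n_boot)
    ]
    return float(np.std(estimates))
```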
Hi @AlvaroSmorras, as a bonus, I've also included a convenient SLURM job launcher utility, which should be especially useful for screening a larger number of ligands after initial docking. Would be great to get your thoughts when you have a chance to review it. Looking forward to your feedback!
Hi! I’ve been exploring this repo and I think that the whole concept is super exciting, with applications ranging from initial hit finding (as a post-docking filter) to hit optimization (accurate ranking of hit compounds) as has been documented across various excellent papers. Really impressive work!
I am trying to use it myself, but I had a few questions:
Batch Mode / CPU Parallelization:
Is there currently (planned) support for batch processing of ligands using OpenMM? Something akin to a built-in batch mode like what is offered with the AMBER backend?
I mostly have access to CPU-heavy clusters (limited GPU availability) and would love to analyze a few hundred ligands efficiently — ideally with SLURM support or parallelization across CPUs.
As a workaround, I’m thinking of writing a launcher script that submits one SLURM job per ligand. If I make it flexible enough, maybe it could be a useful addition to this repo? I'd be happy to share a working prototype once I’ve tested it further. I already have a Singularity container that runs smoothly with the OpenMM backend, and I’m working through additional validation now.
If there is support for OpenMM batch mode already, the launcher script could still help by submitting jobs for each batch, making it practical for systems with multiple GPUs. For example, if you have 400 ligands and 4 GPUs, the script could split them into 4 jobs of 100 ligands each and run them in parallel across the GPUs.
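To illustrate the split I have in mind, here is a quick sketch (file patterns and directory names are just placeholders) of dividing a ligand list into per-GPU batches:

```python
from pathlib import Path
from typing import List

def split_into_batches(ligands: List[Path], n_batches: int) -> List[List[Path]]:
    """Round-robin split of a ligand list into n_batches groups,
    e.g. 400 ligands over 4 GPUs -> 4 batches of 100."""
    return [ligands[i::n_batches] for i in range(n_batches)]

# Example usage: one SLURM job (and one GPU) per batch.
ligands = sorted(Path("ligands").glob("*.mol2"))  # assumed input layout
for gpu_id, batch in enumerate(split_into_batches(ligands, n_batches=4)):
    print(f"GPU {gpu_id}: {len(batch)} ligands")  # submit one job per batch here
```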
Recommended Settings:
How many SMD cycles do you recommend for virtual screening vs more accurate evaluations? Across papers, documentation and the tutorial, I have seen 5 suggested for VS, and 10–20 for higher precision runs. Is that still your go-to?
What’s your take on using Hydrogen Mass Repartitioning (HMR)? From your experience, does it significantly affect result quality, or is it generally a safe way to speed things up?
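(For reference, my understanding is that HMR in OpenMM amounts to setting hydrogenMass when building the system, which then permits a ~4 fs timestep. A minimal sketch, with input file and force-field choice as placeholders rather than what this repo actually uses:)

```python
from openmm import app, unit, LangevinMiddleIntegrator

pdb = app.PDBFile("complex.pdb")  # placeholder input structure
forcefield = app.ForceField("amber14-all.xml", "amber14/tip3p.xml")
system = forcefield.createSystem(
    pdb.topology,
    nonbondedMethod=app.PME,
    constraints=app.HBonds,
    hydrogenMass=4 * unit.amu,  # HMR: heavier hydrogens allow a longer timestep
)
integrator = LangevinMiddleIntegrator(300 * unit.kelvin, 1 / unit.picosecond,
                                      0.004 * unit.picoseconds)  # 4 fs step
```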
Performance Expectations:
Any ballpark estimates for how long it typically takes to process a single ligand in your setup (CPU vs GPU)? Just trying to calibrate my expectations.
Thanks again for the great work on this — really looking forward to experimenting with it more. Would love to contribute back if there's interest in batch mode support or other usability improvements!
Kind regards,
Jochem