Data input and T006 Maximum common substructure · Issue #429 · volkamerlab/teachopencadd

Open
dgcovell opened this issue Feb 10, 2025 · 0 comments
Labels
enhancement (New feature or request), question (Further information is requested)

Comments

@dgcovell

To begin, I would like to thank the Volkamer lab for this series of tutorials. Working through each has been rewarding and instructive.

I have focused on T006 Maximum common substructure and need further details. My first question concerns the input data
(../T005_compound_clustering/data/molecule_set_largest_cluster.sdf), which has been processed through the code appearing
under T000. Replacing this SDF file with my own is not straightforward.
sdf = str(HERE / "../T005_compound_clustering/data/molecule_set_largest_cluster.sdf")
supplier = Chem.ForwardSDMolSupplier(sdf)
mols = list(supplier)
print(f"Set with {len(mols)} molecules loaded.")
print(dir(mols))

Set with 145 molecules loaded.
['__add__', '__class__', '__class_getitem__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

  1. Replacing the input file with my own yields the same output for print(dir(mols)). However, the assignment mols = list(supplier), where
    supplier = Chem.ForwardSDMolSupplier(sdf_new), cannot be completed. I trace the Chem.ForwardSDMolSupplier input back to T000, where the
    file has been derived/processed from the ChEMBL library. Can you please indicate the steps for defining mols from an SDF file created with Open Babel?

  2. My second question has to do with correcting the error described below.

# Add molecule column to data frame
PandasTools.AddMoleculeColumnToFrame(mol_df, "smiles")
mol_df.head(3)
Failed to patch pandas - unable to change molecule rendering

Blog suggestions for correcting this do not appear to work. Might you have a solution?
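
For the first question above (defining mols from an SDF file that did not come from the tutorial pipeline), the loading step can be sketched as follows. This is a minimal sketch under stated assumptions, not the tutorial's own code: the file name `my_molecules.sdf` and the helper `load_sdf` are hypothetical, and the stand-in file is written with RDKit's `SDWriter` rather than Open Babel so that the example is self-contained. RDKit suppliers yield `None` for records they cannot parse, so the sketch drops those before building the list.

```python
from pathlib import Path
import tempfile

from rdkit import Chem


def load_sdf(path):
    """Read all records of an SDF file, dropping those RDKit cannot parse."""
    # Same supplier as in the T006 snippet; it is given the file name as a string.
    supplier = Chem.ForwardSDMolSupplier(str(path))
    # Unparsable records come back as None, so filter them out.
    return [mol for mol in supplier if mol is not None]


# Build a small stand-in SDF (hypothetical name) so the sketch runs on its own.
demo_path = Path(tempfile.mkdtemp()) / "my_molecules.sdf"
writer = Chem.SDWriter(str(demo_path))
for smiles in ["CCO", "c1ccccc1O", "CC(=O)Nc1ccccc1"]:
    writer.write(Chem.MolFromSmiles(smiles))
writer.close()

mols = load_sdf(demo_path)
print(f"Set with {len(mols)} molecules loaded.")
```

If the list comes back shorter than the number of records in the file, the dropped records are the ones RDKit rejected, which would be a natural place to start looking when a file converted with another tool does not load.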

  3. My third question is whether the steps for searching a large database using the SMARTS derived in T006 are available. I found
    https://github.com/rdkit/rdkit-tutorials/blob/master/notebooks/002_SMARTS_SubstructureMatching.ipynb
    However, a tutorial that could be used to search a large database for MCSs as derived in T006 would be helpful.
    I believe this functionality is embedded in T006, but I am not sure. As with the earlier question, searching from a generic SDF file would be needed.

  4. Fourth, I have a few related questions specific to T006:
    a. Is there a way to enlarge the images? The highlighting is difficult to see. Alternatively, is there a way to enlarge only the highlighting or enhance its color?
    b. Can you provide the code for converting between SMILES and SMARTS? I realize the world has gone to SMARTS. Having a SMARTS-to-SMILES conversion might help me move forward.
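
The substructure search and the SMILES-to-SMARTS conversion asked about in questions 3 and 4b can be sketched with standard RDKit calls. This is only an illustration under stated assumptions: the helper names `smiles_to_smarts` and `screen`, the four-molecule library, and the benzene query are hypothetical stand-ins for a real database and for the SMARTS string produced in T006.

```python
from rdkit import Chem


def smiles_to_smarts(smiles):
    """Convert a SMILES string to a SMARTS pattern string."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles!r}")
    return Chem.MolToSmarts(mol)


def screen(mols, smarts):
    """Return the molecules that contain the given SMARTS pattern."""
    pattern = Chem.MolFromSmarts(smarts)
    if pattern is None:
        raise ValueError(f"Could not parse SMARTS: {smarts!r}")
    return [m for m in mols if m is not None and m.HasSubstructMatch(pattern)]


# Hypothetical mini-library; in practice this would come from an SDF supplier.
library_smiles = ["c1ccccc1O", "CCO", "c1ccc2ccccc2c1", "CC(=O)O"]
library = [Chem.MolFromSmiles(s) for s in library_smiles]

# Stand-in query; the SMARTS string derived in T006 would be used the same way.
query_smarts = "c1ccccc1"
hits = screen(library, query_smarts)
print(f"{len(hits)} of {len(library)} molecules contain {query_smarts}")

# Atom indices of the first match, usable for highlighting in a drawing.
phenol = library[0]
match_atoms = phenol.GetSubstructMatch(Chem.MolFromSmarts(query_smarts))

# SMILES -> SMARTS; the resulting pattern still matches the source molecule.
ethanol_smarts = smiles_to_smarts("CCO")
```

The reverse direction is lossy, because SMARTS can encode queries (atom lists, ring constraints, wildcards) that have no SMILES equivalent. On the image-size question, `Draw.MolsToGridImage` takes a `subImgSize` argument and a `highlightAtomLists` argument, so passing a larger `subImgSize` together with the indices from `GetSubstructMatch` may make the highlighting easier to see.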

Thanks in advance for your help and for providing this tutorial series. Hopefully I can process these and any future ones. I see that a Machine Learning example is listed, but it is not in the teachopencadd suite. If I upload its .ipynb file, should it work as well, or do I need to include data files?

Regards,
David Covell, Ph.D.
NIH-NCI

@sakhawathsumit added the enhancement and question labels on Feb 23, 2025