Tautomer hash fix #7502 appears to introduce atom/bond order dependence #7953

Open
mcs07 opened this issue Oct 22, 2024 · 0 comments
mcs07 (Contributor) commented Oct 22, 2024

Describe the bug
In RDKit 2024.09.1 I am seeing different HetAtomTautomerv2 hashes generated depending on the order of the atoms and/or bonds in the input. It appears to be related to the recent fix #7502, which shrinks tautomeric zones in the v2 tautomer/protomer hash: I don't see the issue if I revert that change.

To Reproduce

```python
>>> from rdkit import Chem
>>> from rdkit.Chem import rdMolHash

>>> # Same molecule with atoms in different order in input SMILES
>>> mol1 = Chem.MolFromSmiles("CNC(=O)N[C@@H](C)c1ccccc1")
>>> mol2 = Chem.MolFromSmiles("C[C@H](NC(=O)NC)c1ccccc1")
>>> Chem.MolToSmiles(mol1) == Chem.MolToSmiles(mol2)
True
>>> hash1 = rdMolHash.MolHash(mol1, rdMolHash.HashFunction.HetAtomTautomerv2)
>>> hash2 = rdMolHash.MolHash(mol2, rdMolHash.HashFunction.HetAtomTautomerv2)
>>> hash1 == hash2
False
>>> hash1
'[CH3]-[N]:[C](:[O]):[N]:[C](-[CH3])-[c]1:[cH]:[cH]:[cH]:[cH]:[cH]:1_3_0'
>>> hash2
'[C]:[N]:[C](:[O]):[N]-[C@@H](-[CH3])-[c]1:[cH]:[cH]:[cH]:[cH]:[cH]:1_5_0'
```

Interestingly, using Chem.RenumberAtoms with _smilesAtomOutputOrder on both molecules in this example to get a consistent atom order does not seem to fix this; they still produce different hashes. The bonds are still in a different order, so I presume that is the issue. In general, I don't think it is straightforward to renumber atoms and bonds consistently across all the different tautomers/protomers anyway, so I don't think this can be fixed just by renumbering atoms/bonds ahead of hash generation.
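The renumbering experiment described in the previous paragraph can be sketched roughly as follows. This is a minimal sketch, assuming a recent RDKit build; `canonical_renumber` and `bond_pairs` are hypothetical helper names, not RDKit API. It reads the private `_smilesAtomOutputOrder` property that `Chem.MolToSmiles` sets, renumbers atoms with `Chem.RenumberAtoms`, and then lists bonds in bond-index order so the atom and bond orderings can be compared side by side.

```python
import ast

from rdkit import Chem


def canonical_renumber(mol):
    """Hypothetical helper: renumber atoms into canonical SMILES output order."""
    Chem.MolToSmiles(mol)  # side effect: sets the _smilesAtomOutputOrder property
    raw = mol.GetProp("_smilesAtomOutputOrder")
    # Usually stored as a string such as "[0,1,2,]"; parse it if so
    order = ast.literal_eval(raw) if isinstance(raw, str) else raw
    return Chem.RenumberAtoms(mol, [int(i) for i in order])


def bond_pairs(mol):
    """Hypothetical helper: (begin, end) atom indices, listed in bond-index order."""
    return [(b.GetBeginAtomIdx(), b.GetEndAtomIdx()) for b in mol.GetBonds()]


mol1 = canonical_renumber(Chem.MolFromSmiles("CNC(=O)N[C@@H](C)c1ccccc1"))
mol2 = canonical_renumber(Chem.MolFromSmiles("C[C@H](NC(=O)NC)c1ccccc1"))

# After renumbering, atom i is the same element in both molecules
print([a.GetSymbol() for a in mol1.GetAtoms()] == [a.GetSymbol() for a in mol2.GetAtoms()])
# Bond indices are not renumbered, so these lists can still differ in order
print(bond_pairs(mol1))
print(bond_pairs(mol2))
```

Comparing the two `bond_pairs` lists is a quick way to check whether bond order, rather than atom order, is what still differs between the inputs.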

Expected behavior
I expect the atom (and bond) order not to affect the hash, i.e. hash1 == hash2 in the example above.
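One way to turn this expectation into a regression check is to hash many random SMILES of the same molecule and group the results. This is a minimal sketch, assuming an RDKit build whose `Chem.MolToSmiles` supports `doRandom`; `group_hashes` and `tautomer_v2` are hypothetical helper names. Canonical SMILES is used as a control hash that should always produce a single group.

```python
from collections import defaultdict

from rdkit import Chem
from rdkit.Chem import rdMolHash


def group_hashes(smiles, hash_fn, n_variants=50):
    """Hypothetical helper: map each hash value to the random SMILES that produced it."""
    ref = Chem.MolFromSmiles(smiles)
    groups = defaultdict(list)
    for _ in range(n_variants):
        # Re-parsing a random SMILES gives a different atom and bond order
        variant = Chem.MolToSmiles(ref, canonical=False, doRandom=True)
        mol = Chem.MolFromSmiles(variant)
        groups[hash_fn(mol)].append(variant)
    return dict(groups)


def tautomer_v2(mol):
    """Hypothetical wrapper around the hash discussed in this issue."""
    return rdMolHash.MolHash(mol, rdMolHash.HashFunction.HetAtomTautomerv2)


smiles = "CNC(=O)N[C@@H](C)c1ccccc1"
control = group_hashes(smiles, Chem.MolToSmiles)
result = group_hashes(smiles, tautomer_v2)
print(len(control))  # canonical SMILES: expected to be a single group
print(len(result))   # more than one group means the hash depends on input order
```

Because random SMILES reorder bonds as well as atoms, this exercises the case that renumbering atoms alone does not cover.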

Configuration

  • RDKit version: 2024.09.1
  • Python version (if relevant): 3.10
  • Are you using conda? No
  • If you are not using conda: how did you install the RDKit? Source compile
mcs07 added the bug label Oct 22, 2024