LabuteASA Descriptor Misinterprets Implicit Hydrogen #8018

Open
swteer opened this issue Nov 18, 2024 · 1 comment

swteer commented Nov 18, 2024

Describe the bug
When comparing the per-atom contributions of heavy atoms to the Labute ASA of the same molecule with and without explicit hydrogens, the values differ slightly. The total hydrogen contribution also differs between the two representations.

To Reproduce

from rdkit import Chem
from rdkit.Chem import rdMolDescriptors

# Function to compute LabuteASA for each atom
def compute_labute_asa(mol, includeHs=True):
    # Compute LabuteASA for the molecule
    labute_asa = rdMolDescriptors.CalcLabuteASA(mol, includeHs=includeHs)
    # Get the contribution of each atom
    (atom_contribs, H_contrib) = rdMolDescriptors._CalcLabuteASAContribs(mol)
    # Print the contribution of each atom
    for i, contrib in enumerate(atom_contribs):
        print(f"Atom {i} ({mol.GetAtomWithIdx(i).GetSymbol()}): {contrib:.2f}")
    print('Hydrogen contribution', H_contrib)
    return labute_asa

smiles = "CCO"
# Without explicit H
mol = Chem.MolFromSmiles(smiles)
print("Without explicit H:")
print("-------------------")
total_asa = compute_labute_asa(mol)
print(f"Total LabuteASA: {total_asa:.2f}")
print("===")
mol2 = Chem.AddHs(mol)
print("With explicit H:")
print("-------------------")
total_asa = compute_labute_asa(mol2, includeHs=False)
print(f"Total LabuteASA: {total_asa:.2f}")

Output

Without explicit H:
-------------------
Atom 0 (C): 6.92
Atom 1 (C): 6.61
Atom 2 (O): 5.11
Hydrogen contribution 1.2612803353779227
Total LabuteASA: 19.90
===
With explicit H:
-------------------
Atom 0 (C): 6.88
Atom 1 (C): 6.58
Atom 2 (O): 5.11
Atom 3 (H): 1.31
Atom 4 (H): 1.31
Atom 5 (H): 1.31
Atom 6 (H): 1.31
Atom 7 (H): 1.31

Expected behavior
The LabuteASA descriptor should yield consistent results regardless of whether explicit or implicit hydrogen atoms are used.

Configuration (please complete the following information):

  • RDKit version: 2024.03.5
  • OS: Windows 11
  • Python version (if relevant): 3.10
  • Are you using conda? Yes, but RDKit installed via pip

Additional context

In MolSurf.cpp, starting at line 58, the code counts, for each explicit atom, its interaction with one and only one implicit H.
Likewise, at line 75, when computing the surface for implicit hydrogens, only one hydrogen is considered.
Shouldn't the code look up the number of implicit hydrogens on each atom?
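
To illustrate why the hydrogen count matters, here is a minimal pure-Python sketch of the overlap sum. It is not RDKit's implementation: the function names, the radii, and the bond length are hypothetical values chosen only for illustration. Each attached hydrogen removes one spherical cap from the heavy atom's sphere, so counting a single hydrogen on an atom that carries three leaves too much exposed area.

```python
import math

def cap_term(r_i, r_j, d_ij):
    """Overlap term for sphere i cut by sphere j at distance d_ij."""
    return (r_j ** 2 - (r_i - d_ij) ** 2) / d_ij

def heavy_atom_area(r_i, r_h, d_ih, n_h):
    """Exposed area of a heavy atom with n_h attached hydrogens.

    Full sphere area (4*pi*r_i^2) minus one cap per hydrogen.
    """
    overlap = n_h * cap_term(r_i, r_h, d_ih)
    return math.pi * r_i * (4.0 * r_i - overlap)

# Illustrative (assumed) radii and bond length, not RDKit's parameters.
R_HEAVY, R_H, D_IH = 1.7, 1.2, 1.1

one_h = heavy_atom_area(R_HEAVY, R_H, D_IH, n_h=1)    # only one H counted
three_h = heavy_atom_area(R_HEAVY, R_H, D_IH, n_h=3)  # all three H counted

print(f"one H counted:   {one_h:.2f}")
print(f"three H counted: {three_h:.2f}")
```

In RDKit's Python API, the per-atom hydrogen count that such a sum would need is available from `atom.GetTotalNumHs()`.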

Additionally, if the code follows the paper "A Widely Applicable Set of Descriptors", shouldn't lines 90-91 and 101-102 have parentheses, as follows?

Vi[idx1] += (Rj * Rj - (Ri - dij)**2) / dij;
Vi[idx2] += (Ri * Ri - (Rj - dij)**2) / dij;

Instead of:

Vi[idx1] += Rj * Rj - (Ri - dij)**2 / dij;
Vi[idx2] += Ri * Ri - (Rj - dij)**2 / dij;
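
The two forms can be checked against plain sphere geometry. The sketch below is a standalone check, not RDKit code; the function names and the numeric radii and distance are hypothetical and chosen only so that the two spheres overlap. It computes the area of the spherical cap that sphere j cuts from sphere i directly from the cap height, then compares it with pi * Ri times each of the two expressions.

```python
import math

def cap_area_geometric(r_i, r_j, d):
    """Cap area on sphere i from the cap height h: area = 2*pi*r_i*h."""
    # Distance from the center of sphere i to the plane of intersection.
    x = (d * d + r_i * r_i - r_j * r_j) / (2.0 * d)
    h = r_i - x
    return 2.0 * math.pi * r_i * h

def cap_area_parenthesized(r_i, r_j, d):
    """pi * Ri * (Rj^2 - (Ri - dij)^2) / dij"""
    return math.pi * r_i * (r_j ** 2 - (r_i - d) ** 2) / d

def cap_area_unparenthesized(r_i, r_j, d):
    """pi * Ri * (Rj^2 - (Ri - dij)^2 / dij)"""
    return math.pi * r_i * (r_j ** 2 - (r_i - d) ** 2 / d)

# Illustrative (assumed) values with |Ri - Rj| < dij < Ri + Rj.
R_I, R_J, D_IJ = 1.7, 1.2, 1.1

geom = cap_area_geometric(R_I, R_J, D_IJ)
with_parens = cap_area_parenthesized(R_I, R_J, D_IJ)
without_parens = cap_area_unparenthesized(R_I, R_J, D_IJ)

print(f"geometric cap area: {geom:.4f}")
print(f"with parentheses:   {with_parens:.4f}")
print(f"without:            {without_parens:.4f}")
```

For these values only the parenthesized form reproduces the geometric cap area, which is consistent with the parentheses being required.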
@swteer swteer added the bug label Nov 18, 2024
greglandrum (Member) commented

Confirmed: there is definitely something incorrect going on here.
Thanks for noticing this.
