SanitizeRxn replaces lists of atoms with a wildcard #8068

Open
hadoth opened this issue Dec 5, 2024 · 2 comments

hadoth commented Dec 5, 2024

Describe the bug
In some cases the SanitizeRxn function replaces atom lists with a wildcard, e.g.:
[H,C:4][C:1]([H])([#6:5])[O:2][H]>>[H,C:4][C:1](=[O:2])[#6:5] gives [*:4][C&!H0:1]([#6:5])[O&!H0:2]>>[*:4][C:1](=[O:2])[#6:5]
[H,C:4][C:1]([H])([O:2][H])[C:5][C:3]>>[H,C:4][C:1](=[O:2])[C:5][C:3] gives [*:4][C&!H0:1]([O&!H0:2])[C:5][C:3]>>[*:4][C:1](=[O:2])[C:5][C:3]
[O:3]([H])[C:1]([H,C:4])[C:2]([H,C:6])[O:5][H]>>[O:3]=[C:1]([H,C:4]).[C:2]([H,C:6])=[O:5] gives [O&!H0:3][C:1]([*:4])[C:2]([*:6])[O&!H0:5]>>[O:3]=[C:1][*:4].[C:2]([*:6])=[O:5]
[O:6]([H])[C:5]([C:4])([H,C:7])[C:2][C:1](=[O:3])[O:8][C:9]>>[C:4][C:5]([H,C:7])=[O:6].[C:2]([H])[C:1](=[O:3])[O:8][C:9] gives [O&!H0:6][C:5]([C:4])([*:7])[C:2][C:1](=[O:3])[O:8][C:9]>>[C:4][C:5]([*:7])=[O:6].[C:2]([#1])[C:1](=[O:3])[O:8][C:9]
[c:1][C,S:2]([H])=[O]>>[c:1][H].[Cl][C,S:2]([H])=[O] gives [c:1][!H0:2]=O>>[c:1][#1].Cl[*:2]([#1])=O

The template [O:6]([H])[C:5]([C:4])([H,C:7])[C:2][C:1](=[O:3])[O:8][C:9]>>[C:4][C:5]([H,C:7])=[O:6].[C:2]([H])[C:1](=[O:3])[O:8][C:9] should not produce any results for CC(O)(Cl)CC(=O)OC, because a chlorine sits at the position (atom map index 7) where only carbon or hydrogen should match. After sanitization, however, the reaction does match and produces the following result: COC(=O)CC(C)(O)Cl>>CC(=O)Cl.[H]CC(=O)OC.

To Reproduce

from rdkit.Chem import AllChem
from rdkit.Chem import rdChemReactions

def canonicalize_template(template: str) -> str:
    reaction = rdChemReactions.ReactionFromSmarts(template)
    AllChem.SanitizeRxn(reaction)
    return rdChemReactions.ReactionToSmarts(reaction)

for template in [
    "[H,C:4][C:1]([H])([#6:5])[O:2][H]>>[H,C:4][C:1](=[O:2])[#6:5]",
    "[H,C:4][C:1]([H])([O:2][H])[C:5][C:3]>>[H,C:4][C:1](=[O:2])[C:5][C:3]",
    "[O:3]([H])[C:1]([H,C:4])[C:2]([H,C:6])[O:5][H]>>[O:3]=[C:1]([H,C:4]).[C:2]([H,C:6])=[O:5]",
    "[O:6]([H])[C:5]([C:4])([H,C:7])[C:2][C:1](=[O:3])[O:8][C:9]>>[C:4][C:5]([H,C:7])=[O:6].[C:2]([H])[C:1](=[O:3])[O:8][C:9]",
    "[c:1][C,S:2]([H])=[O]>>[c:1][H].[Cl][C,S:2]([H])=[O]",
]:
    print(canonicalize_template(template))

Observed behavior
Atom lists are replaced with wildcards.

[*:4][C&!H0:1]([#6:5])[O&!H0:2]>>[*:4][C:1](=[O:2])[#6:5]
[*:4][C&!H0:1]([O&!H0:2])[C:5][C:3]>>[*:4][C:1](=[O:2])[C:5][C:3]
[O&!H0:3][C:1]([*:4])[C:2]([*:6])[O&!H0:5]>>[O:3]=[C:1][*:4].[C:2]([*:6])=[O:5]
[O&!H0:6][C:5]([C:4])([*:7])[C:2][C:1](=[O:3])[O:8][C:9]>>[C:4][C:5]([*:7])=[O:6].[C:2]([#1])[C:1](=[O:3])[O:8][C:9]
[c:1][!H0:2]=O>>[c:1][#1].Cl[*:2]([#1])=O
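
The lost atom lists can also be located without reading each SMARTS by eye. The following is a minimal sketch, not part of RDKit: it compares the template before and after sanitization as plain strings and reports every mapped atom whose comma-separated atom list disappeared. The helper names `mapped_queries` and `lost_atom_lists` are hypothetical, and the regular expression assumes simple bracket atoms with no recursive SMARTS and at most one occurrence of each map number per side, which holds for the templates in this report.

```python
import re

# Matches a mapped bracket atom such as [H,C:4] or [C&!H0:1] and captures
# the query part and the atom map number. Assumes no nested brackets.
MAPPED_ATOM = re.compile(r"\[([^\[\]:]+):(\d+)\]")


def mapped_queries(smarts_side):
    """Return {map_number: query_string} for one side of a reaction SMARTS."""
    return {int(num): query for query, num in MAPPED_ATOM.findall(smarts_side)}


def lost_atom_lists(original, sanitized):
    """List (side, map_number, query_before, query_after) for each atom list
    that is present in `original` but missing from `sanitized`."""
    lost = []
    sides = zip(("reactants", "products"),
                original.split(">>"), sanitized.split(">>"))
    for side, before_side, after_side in sides:
        before = mapped_queries(before_side)
        after = mapped_queries(after_side)
        for map_num, query in before.items():
            # A comma marks an atom list (logical OR) in the original query.
            if "," in query and "," not in after.get(map_num, ""):
                lost.append((side, map_num, query, after.get(map_num)))
    return lost


original = ("[O:6]([H])[C:5]([C:4])([H,C:7])[C:2][C:1](=[O:3])[O:8][C:9]"
            ">>[C:4][C:5]([H,C:7])=[O:6].[C:2]([H])[C:1](=[O:3])[O:8][C:9]")
sanitized = ("[O&!H0:6][C:5]([C:4])([*:7])[C:2][C:1](=[O:3])[O:8][C:9]"
             ">>[C:4][C:5]([*:7])=[O:6].[C:2]([#1])[C:1](=[O:3])[O:8][C:9]")

for entry in lost_atom_lists(original, sanitized):
    print(entry)
```

For the fourth template this reports map number 7 on both the reactant and the product side, with `H,C` replaced by `*`. Because the check is purely textual, it only flags a difference in the written SMARTS and says nothing about whether the two queries are chemically equivalent.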

Expected behavior
Atom lists should not be replaced.

[H,C:4][C&!H0:1]([#6:5])[O&!H0:2]>>[*:4][C:1](=[O:2])[#6:5]
[H,C:4][C&!H0:1]([O&!H0:2])[C:5][C:3]>>[*:4][C:1](=[O:2])[C:5][C:3]
[O&!H0:3][C:1]([H,C:4])[C:2]([*:6])[O&!H0:5]>>[O:3]=[C:1][*:4].[C:2]([*:6])=[O:5]
[O&!H0:6][C:5]([C:4])([H,C:7])[C:2][C:1](=[O:3])[O:8][C:9]>>[C:4][C:5]([*:7])=[O:6].[C:2]([#1])[C:1](=[O:3])[O:8][C:9]
[c:1][C,S&!H0:2]=O>>[c:1][#1].Cl[*:2]([#1])=O

Configuration (please complete the following information):

  • RDKit version: 2024.3.6
  • OS: macOS Sonoma 14.7.1
  • Python version (if relevant): 3.11.10
  • Are you using conda? No
  • If you are not using conda: how did you install the RDKit? uv pip install rdkit

hadoth added the bug label Dec 5, 2024

bp-kelley (Contributor) commented

This does look odd, I'll look into this.

bp-kelley self-assigned this Dec 9, 2024

bp-kelley (Contributor) commented

The bug is in the R-group name sanitization step (SANITIZE_RGROUP_NAMES):

>>> AllChem.ReactionToSmarts(rxn)
'[H1,C:4][C:1]([#1])([#6:5])[O:2][#1]>>[H1,C:4][C:1](=[O:2])[#6:5]'
>>> AllChem.SanitizeRxn(rxn, rdChemReactions.SANITIZE_RGROUP_NAMES)
rdkit.Chem.rdChemReactions.SanitizeFlags.SANITIZE_NONE
>>> AllChem.ReactionToSmarts(rxn)
'[*:4][C:1]([#1])([#6:5])[O:2][#1]>>[*:4][C:1](=[O:2])[#6:5]'

For now, you can turn this step off by passing only the remaining sanitization flags:

AllChem.SanitizeRxn(rxn, rdChemReactions.SANITIZE_ADJUST_REACTANTS | rdChemReactions.SANITIZE_MERGEHS | rdChemReactions.SANITIZE_ATOM_MAPS)
