Wrong SMARTS pattern in O-benzyl deprotection reaction · Issue #7989 · rdkit/rdkit · GitHub
Wrong SMARTS pattern in O-benzyl deprotection reaction #7989

Closed
marcobICR opened this issue Nov 7, 2024 · 11 comments · Fixed by #7990

@marcobICR
marcobICR commented Nov 7, 2024

Hi everyone,

I have created this issue following the discussion in #7982:
I have noticed that the SMARTS pattern in the benzyl deprotection reaction is wrong: it matches phenyl ethers instead of benzyl ethers.

import rdkit
from rdkit import Chem
from rdkit.Chem import Draw, rdDeprotect

mol = Chem.MolFromSmiles("Nc1ccccc1Oc1ccccc1")

This is the mol:
image

Draw.MolsToGridImage(
    mols = [rdDeprotect.Deprotect(mol = mol, deprotections = [dep]) for dep in rdDeprotect.GetDeprotections()],
    legends = [dep.full_name for dep in rdDeprotect.GetDeprotections()],
    molsPerRow = 7
)

This is the output:
image

This compound should not be deprotected by the benzyl reaction at all, since this is a phenyl ether.
Looking at the SMARTS in the code:

      {"alcohol", "[O;!$(*C(=O)):1]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1>>[O;H1:1]",
       "Bn", "benzyl", "NOc1ccccc1>>ON"}

It looks like the SMARTS pattern is wrong and it's missing a CH2 group between the O and the phenyl group.
Can someone confirm that this is indeed incorrect?

Originally posted by @marcobICR in #7982

@bp-kelley
Contributor

I believe it should be:

{"alcohol", "[OH1;!$(*C(=O)):1]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1>>[O;H1:1]",
"Bn", "benzyl", "NOc1ccccc1>>ON"},

@bp-kelley
Contributor

On inspection, though, it appears to be missing a carbon as well

image

@marcobICR
Author
marcobICR commented Nov 7, 2024

Yes, this is what I was pointing at. It's missing the benzylic carbon in the SMARTS definition of the reaction.
It should be:

[OH1;!$(*C(=O)):1][C;H2]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1>>[O;H1:1]

@bp-kelley
Contributor
bp-kelley commented Nov 7, 2024

So I think the correct deprotection entry is

{"alcohol", "[O;!$(*C(=O)):1][CH2]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1>>[O;H1:1]",
"Bn", "benzyl", "NOCc1ccccc1>>ON"},
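The difference between the two entries can be checked at the pattern level, without running the reaction. The following is a minimal sketch that takes only the reactant side of the original and the corrected SMARTS from this thread and tests them against the phenyl ether from the report and against benzyl methyl ether. The names `OLD`, `NEW`, and `matches` are illustrative and not part of RDKit.

```python
from rdkit import Chem

# Reactant side of the original entry (no benzylic carbon) and of the fix.
OLD = "[O;!$(*C(=O)):1]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1"
NEW = "[O;!$(*C(=O)):1][CH2]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1"

old_query = Chem.MolFromSmarts(OLD)
new_query = Chem.MolFromSmarts(NEW)


def matches(smiles):
    """Return (matches_old, matches_new) for one molecule."""
    mol = Chem.MolFromSmiles(smiles)
    return mol.HasSubstructMatch(old_query), mol.HasSubstructMatch(new_query)


# The phenyl ether from the report: only the old pattern hits it.
phenyl_ether = matches("Nc1ccccc1Oc1ccccc1")
# Benzyl methyl ether, a real O-benzyl group: only the fixed pattern hits it.
benzyl_ether = matches("COCc1ccccc1")

print("phenyl ether:", phenyl_ether)
print("benzyl ether:", benzyl_ether)
```

Benzyl methyl ether is used here rather than benzyl phenyl ether because the latter also contains an O-phenyl bond, so the old pattern would match it for the wrong reason.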

@bp-kelley
Contributor

@marcobICR Looks like we agree!

@bp-kelley
Contributor

I'll add the MR. If you are using this in practice, do you know how to load your own deprotection patterns until we get the fix in?

@marcobICR
Author

Thanks! I wouldn't know how to do the MR anyway 😅. I've just recently started learning how GitHub works.
No, some help would be appreciated to get the local fix done. Thanks for your help!

@bp-kelley
Contributor
bp-kelley commented Nov 7, 2024

Here are the current results from testing

>>> from rdkit import Chem
>>> from rdkit.Chem import rdDeprotect
>>> data = rdDeprotect.GetDeprotections()
>>> new_data = []
>>> for d in data:
...   if d.abbreviation == "Bn" and d.deprotection_class == "alcohol":
...      new_data.append(rdDeprotect.DeprotectData("alcohol", "[O;!$(*C(=O)):1][CH2]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1>>[O;H1:1]", "Bn", "benzyl"))
...   else:
...      new_data.append(d)
... 
>>> m = Chem.MolFromSmiles("c1ccccc1Oc1ccccc1")
>>> rdDeprotect.DeprotectInPlace(m, new_data)
False
>>> m = Chem.MolFromSmiles("c1ccccc1CO")
>>> rdDeprotect.DeprotectInPlace(m, new_data)
True
>>> m = Chem.MolFromSmiles("c1ccccc1COc1ccccc1")
>>> rdDeprotect.DeprotectInPlace(m, new_data)
True
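For anyone who wants to reuse this workaround, the same override can be wrapped in two small helpers. This is only a sketch: `FIXED_BN`, `patched_deprotections`, and `deprotect_smiles` are hypothetical names, and the code relies only on the `rdDeprotect` calls already shown in this thread plus `Chem.MolToSmiles`.

```python
from rdkit import Chem
from rdkit.Chem import rdDeprotect

# Corrected reaction SMARTS from this thread (benzylic CH2 included).
FIXED_BN = "[O;!$(*C(=O)):1][CH2]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1>>[O;H1:1]"


def patched_deprotections():
    """Return the built-in deprotections with the alcohol/Bn entry replaced."""
    patched = []
    for d in rdDeprotect.GetDeprotections():
        if d.abbreviation == "Bn" and d.deprotection_class == "alcohol":
            d = rdDeprotect.DeprotectData("alcohol", FIXED_BN, "Bn", "benzyl")
        patched.append(d)
    return patched


def deprotect_smiles(smiles, deprotections):
    """Apply the deprotections to one SMILES and return the product SMILES."""
    mol = Chem.MolFromSmiles(smiles)
    product = rdDeprotect.Deprotect(mol=mol, deprotections=deprotections)
    return Chem.MolToSmiles(product)


patched = patched_deprotections()
for smi in ("c1ccccc1Oc1ccccc1", "COCc1ccccc1"):
    print(smi, "->", deprotect_smiles(smi, patched))
```

Unlike `DeprotectInPlace`, `Deprotect` returns a new molecule, so the input is left untouched and the product can be inspected directly.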

@marcobICR
Author
marcobICR commented Nov 7, 2024

Thanks! Just reading through this comment: if the intent of this function is to deprotect protected groups, does it make sense to make it match the simple benzyl alcohol (your second example)? Would it be better to make it match a benzyl-O-something instead?
A quick example could be this SMARTS pattern: "[O;H0;!$(*C(=O)):1][CH2]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1>>[O;H1:1]". Using your example:

from rdkit import Chem
from rdkit.Chem import rdDeprotect
data = rdDeprotect.GetDeprotections()
new_data = []
for d in data:
    if d.abbreviation == "Bn" and d.deprotection_class == "alcohol":
        new_data.append(rdDeprotect.DeprotectData("alcohol", "[O;H0;!$(*C(=O)):1][CH2]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1>>[O;H1:1]", "Bn", "benzyl"))
    else:
        new_data.append(d)

m = Chem.MolFromSmiles("c1ccccc1Oc1ccccc1")
print(rdDeprotect.DeprotectInPlace(m, new_data))
m = Chem.MolFromSmiles("c1ccccc1CO")
print(rdDeprotect.DeprotectInPlace(m, new_data))
m = Chem.MolFromSmiles("c1ccccc1COc1ccccc1")
print(rdDeprotect.DeprotectInPlace(m, new_data))
-------------------- Output --------------------
False
False
True

I didn't see that you already made a MR about this...

@bp-kelley
Contributor

@marcobICR In general we don't bother, but we could certainly add an option to enforce r-groups at some point.

@marcobICR
Author
marcobICR commented Nov 8, 2024

@bp-kelley it has already been done for the benzyl deprotection of benzylamines. According to this reaction SMARTS:

[NX3;H0,H1;!$(NC=O):1][C;H2]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1>>[N:1]

The amine must have zero or one hydrogens, which restricts the match to secondary and tertiary amines and leaves the simple benzylamine "c1ccccc1CN" untouched, since its nitrogen has two hydrogens.

Edit: other reactions also take this into account. I can arrange a list if needed.
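The hydrogen-count restriction described above can be illustrated with a plain substructure check. This is a minimal sketch using only the reactant side of the amine SMARTS quoted in this comment; the variable names and the choice of test molecules are illustrative.

```python
from rdkit import Chem

# Reactant side of the benzylamine deprotection SMARTS quoted above.
AMINE_BN = "[NX3;H0,H1;!$(NC=O):1][C;H2]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1"
query = Chem.MolFromSmarts(AMINE_BN)

examples = {
    "benzylamine (NH2)": "NCc1ccccc1",
    "N-benzylmethylamine (NH)": "CNCc1ccccc1",
    "N-benzyldimethylamine (no NH)": "CN(C)Cc1ccccc1",
    "N-benzylacetamide (amide)": "CC(=O)NCc1ccccc1",
}

# True where the nitrogen would be treated as benzyl-protected.
results = {
    name: Chem.MolFromSmiles(smi).HasSubstructMatch(query)
    for name, smi in examples.items()
}

for name, hit in results.items():
    print(f"{name}: {hit}")
```

The `H0,H1` term excludes the primary amine, and the `!$(NC=O)` term excludes the amide nitrogen, so only the two N-substituted amines match.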
