Wrong SMARTS pattern in O-benzyl deprotection reaction #7989

marcobICR · 2024-11-07T10:20:26Z

Hi everyone,

I have created this issue following the discussion in #7982:
I noticed that the SMARTS pattern in the benzyl deprotection reaction is wrong: it matches phenyl ethers instead of benzyl ethers:

import rdkit
from rdkit import Chem
from rdkit.Chem import Draw, rdDeprotect

mol = Chem.MolFromSmiles("Nc1ccccc1Oc1ccccc1")

This is the mol:

Draw.MolsToGridImage(
    mols = [rdDeprotect.Deprotect(mol = mol, deprotections = [dep]) for dep in rdDeprotect.GetDeprotections()],
    legends = [dep.full_name for dep in rdDeprotect.GetDeprotections()],
    molsPerRow = 7
)

This is the output:

This compound should not be deprotected by the benzyl reaction at all, since this is a phenyl ether.
Looking at the SMARTS in the code:

      {"alcohol", "[O;!$(*C(=O)):1]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1>>[O;H1:1]",
       "Bn", "benzyl", "NOc1ccccc1>>ON"}

It looks like the SMARTS pattern is wrong and it's missing a CH2 group between the O and the phenyl group.
Can someone confirm that this is indeed incorrect?

Originally posted by @marcobICR in #7982
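
To make the report easy to reproduce without drawing anything, here is a minimal sketch that tests only the reactant side of the reaction SMARTS as a substructure query. The helper `matches` and the test molecule benzyl methyl ether are illustrative additions, not part of the original report; the "fixed" pattern is the one with a CH2 between the oxygen and the ring, as proposed later in this thread.

```python
from rdkit import Chem

# Reactant sides of the reaction SMARTS discussed in this issue.
OLD_REACTANT = "[O;!$(*C(=O)):1]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1"
FIXED_REACTANT = "[O;!$(*C(=O)):1][CH2]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1"


def matches(smarts: str, smiles: str) -> bool:
    """Return True if the SMARTS pattern occurs in the molecule (hypothetical helper)."""
    pattern = Chem.MolFromSmarts(smarts)
    mol = Chem.MolFromSmiles(smiles)
    return mol.HasSubstructMatch(pattern)


molecules = {
    "diphenyl ether": "c1ccccc1Oc1ccccc1",  # phenyl ether, should NOT be deprotected
    "benzyl methyl ether": "COCc1ccccc1",   # benzyl ether, should be deprotected
}

# For each molecule: (matched by old pattern, matched by fixed pattern)
results = {
    name: (matches(OLD_REACTANT, smi), matches(FIXED_REACTANT, smi))
    for name, smi in molecules.items()
}

for name, (old, fixed) in results.items():
    print(f"{name}: old={old}, fixed={fixed}")
# diphenyl ether: old=True, fixed=False
# benzyl methyl ether: old=False, fixed=True
```

The old pattern requires the oxygen to sit directly on the aromatic ring, so it hits the phenyl ether and misses the actual benzyl ether; adding the benzylic carbon inverts both results.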


bp-kelley · 2024-11-07T12:40:52Z

I believe it should be:

{"alcohol", "[OH1;!$(*C(=O)):1]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1>>[O;H1:1]",
"Bn", "benzyl", "NOc1ccccc1>>ON"},

bp-kelley · 2024-11-07T12:43:03Z

On inspection, though, it appears to be missing a carbon as well

marcobICR · 2024-11-07T12:44:05Z

Yes, this is what I was pointing at. It's missing the benzylic carbon in the SMARTS definition of the reaction.
It should be:

[OH1;!$(*C(=O)):1][C;H2]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1>>[O;H1:1]

bp-kelley · 2024-11-07T12:45:19Z

So I think the correct deprotection entry is

{"alcohol", "[O;!$(*C(=O)):1][CH2]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1>>[O;H1:1]",
"Bn", "benzyl", "NOCc1ccccc1>>ON"},

bp-kelley · 2024-11-07T12:46:29Z

@marcobICR Looks like we agree!

bp-kelley · 2024-11-07T12:47:32Z

I'll add the MR. If you are using this in practice, do you know how to load your own deprotection patterns until we get the fix in?

marcobICR · 2024-11-07T12:53:52Z

Thanks! I wouldn't know how to do the MR anyway 😅. I've just recently started learning how GitHub works.
No, some help would be appreciated to get the local fix done. Thanks for your help!

bp-kelley · 2024-11-07T12:57:31Z

Here are the current results from testing

>>> from rdkit import Chem
>>> from rdkit.Chem import rdDeprotect
>>> data = rdDeprotect.GetDeprotections()
>>> new_data = []
>>> for d in data:
...   if d.abbreviation == "Bn" and d.deprotection_class == "alcohol":
...      new_data.append(rdDeprotect.DeprotectData("alcohol", "[O;!$(*C(=O)):1][CH2]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1>>[O;H1:1]", "Bn", "benzyl"))
...   else:
...      new_data.append(d)
... 
>>> m = Chem.MolFromSmiles("c1ccccc1Oc1ccccc1")
>>> rdDeprotect.DeprotectInPlace(m, new_data)
False
>>> m = Chem.MolFromSmiles("c1ccccc1CO")
>>> rdDeprotect.DeprotectInPlace(m, new_data)
True
>>> m = Chem.MolFromSmiles("c1ccccc1COc1ccccc1")
>>> rdDeprotect.DeprotectInPlace(m, new_data)
True

marcobICR · 2024-11-07T13:48:57Z

Thanks! Reading through this comment: if the intent of this function is to remove protecting groups, does it make sense for it to match plain benzyl alcohol (your second example)? Would it be better to match only benzyl-O-something instead?
A quick example could be this SMARTS pattern, which requires the oxygen to carry no hydrogens: "[O;H0;!$(*C(=O)):1][CH2]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1>>[O;H1:1]". Using your example:

from rdkit import Chem
from rdkit.Chem import rdDeprotect
data = rdDeprotect.GetDeprotections()
new_data = []
for d in data:
    if d.abbreviation == "Bn" and d.deprotection_class == "alcohol":
        new_data.append(rdDeprotect.DeprotectData("alcohol", "[O;H0;!$(*C(=O)):1][CH2]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1>>[O;H1:1]", "Bn", "benzyl"))
    else:
        new_data.append(d)

m = Chem.MolFromSmiles("c1ccccc1Oc1ccccc1")
print(rdDeprotect.DeprotectInPlace(m, new_data))
m = Chem.MolFromSmiles("c1ccccc1CO")
print(rdDeprotect.DeprotectInPlace(m, new_data))
m = Chem.MolFromSmiles("c1ccccc1COc1ccccc1")
print(rdDeprotect.DeprotectInPlace(m, new_data))
-------------------- Output --------------------
False
False
True

I didn't see that you had already made an MR about this...

bp-kelley · 2024-11-07T18:00:48Z

@marcobICR In general we don't bother, but we could certainly add an option to enforce r-groups at some point.

marcobICR · 2024-11-08T13:12:48Z

@bp-kelley it has already been done for the benzyl deprotection of benzylamines. According to this reaction SMARTS:

[NX3;H0,H1;!$(NC=O):1][C;H2]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1>>[N:1]

The amine must have either one or zero hydrogens, which corresponds to secondary and tertiary amines only. The simple benzylamine "c1ccccc1CN" is left untouched since it has two hydrogens.

Edit: other reactions also take this into account. I can compile a list if needed.
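
The hydrogen-count argument above can be checked directly. This is a small sketch that uses the reactant side of the benzylamine reaction SMARTS quoted in this comment as a substructure query; the three test amines and the name `amine_results` are illustrative choices, not taken from the RDKit sources.

```python
from rdkit import Chem

# Reactant side of the N-benzyl deprotection SMARTS quoted above.
AMINE_REACTANT = Chem.MolFromSmarts(
    "[NX3;H0,H1;!$(NC=O):1][C;H2]c1[c;H1][c;H1][c;H1][c;H1][c;H1]1"
)

amines = {
    "benzylamine (NH2)": "c1ccccc1CN",
    "N-methylbenzylamine (NH)": "CNCc1ccccc1",
    "N,N-dimethylbenzylamine (no N-H)": "CN(C)Cc1ccccc1",
}

# True means the deprotection pattern would apply to that amine.
amine_results = {
    name: Chem.MolFromSmiles(smi).HasSubstructMatch(AMINE_REACTANT)
    for name, smi in amines.items()
}

for name, hit in amine_results.items():
    print(f"{name}: {hit}")
# benzylamine (NH2): False
# N-methylbenzylamine (NH): True
# N,N-dimethylbenzylamine (no N-H): True
```

The `H0,H1` constraint plays the same role for amines that `H0` would play in the alcohol pattern suggested earlier: it skips substrates where the "protected" atom carries no substituent besides the benzyl group.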

bp-kelley added a commit to bp-kelley/rdkit that referenced this issue Nov 7, 2024

Fixes rdkit#7989 Incorrect benzyl deprotection reaction

b1eda2d

bp-kelley mentioned this issue Nov 7, 2024

Fixes #7989 Incorrect benzyl deprotection reaction #7990

Merged

greglandrum added the bug label Nov 7, 2024

greglandrum added this to the 2024_09_3 milestone Nov 7, 2024

greglandrum pushed a commit that referenced this issue Nov 7, 2024

Fixes #7989 Incorrect benzyl deprotection reaction (#7990)

75e8858

greglandrum closed this as completed in #7990 Nov 7, 2024

greglandrum pushed a commit that referenced this issue Nov 29, 2024

Fixes #7989 Incorrect benzyl deprotection reaction (#7990)

c17d091
