Draw.MolsToGridImage([Chem.MolFromSmarts(sm) for sm in list(features_dict_bits.values())],
                     molsPerRow = 5,
                     subImgSize = (200, 100),
                     legends = [str(bn) for bn in list(features_dict_bits.keys())]
                     )
Second molecule:
input_smiles = 'Cn1cc(nn1)N2Cc3ncncc3C2=O'
features_dict_bits = dict()
features_dict_onbits = dict()
m = Chem.MolFromSmiles(input_smiles)
sfp = mfpgen.GetSparseFingerprint(m, additionalOutput = ao)
bitinfomap = ao.GetBitInfoMap()
bits = list(bitinfomap.keys())
# Note: onbits is not reassigned in this block; it keeps its earlier value.
for b, ob in zip(bits, onbits) :
    if b not in features_dict_bits :
        # Use the first (atom index, radius) pair recorded for this bit.
        tu = bitinfomap[b]
        tu0 = tu[0]
        a = tu0[0]
        r = tu0[1]
        # Bond indices of the environment of radius r around atom a.
        env = Chem.FindAtomEnvironmentOfRadiusN(m, r, a)
        atoms = set()
        if r > 0 :
            # Collect the atoms at both ends of every bond in the environment.
            for bidx in env :
                atoms.add(m.GetBondWithIdx(bidx).GetBeginAtomIdx())
                atoms.add(m.GetBondWithIdx(bidx).GetEndAtomIdx())
            usm = Chem.MolFragmentToSmiles(m, atomsToUse = list(atoms), bondsToUse = env, rootedAtAtom = a)
        else :
            # Radius 0: the fragment is the central atom alone.
            atoms.add(a)
            usm = Chem.MolFragmentToSmiles(m, atomsToUse = list(atoms), bondsToUse = [], rootedAtAtom = a)
        features_dict_bits[b] = usm
        features_dict_onbits[ob] = usm
Draw.MolsToGridImage([Chem.MolFromSmarts(sm) for sm in list(features_dict_bits.values())],
                     molsPerRow = 5,
                     subImgSize = (200, 100),
                     legends = [str(bn) for bn in list(features_dict_bits.keys())]
                     )
You may spot that feature bit 2685705257 is present in both dictionaries, but: 1) it maps to a different substructure, assuming that our code to generate the fragment is correct; 2) in the first molecule it is a 4-membered aromatic ring, in the second molecule it probably comes from some ring-opening of the triazole.
Do you have any explanation?
Is this a bit collision of the Morgan algorithm?
Do you know of or expect similar cases?
Thanks
Hi @GLPG-GT : this is a bit collision at the main hash level. It's not particularly common, especially compared to collisions due to bit folding, but it is definitely something that can happen.
Here's another example we identified some years ago: #814
The hash functions that we use to convert atom environments into bit IDs are never perfect and will always produce this type of collision. The hope is that the collisions will be rare.
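To look for collisions of this kind in your own data, one option is to compare the bit-to-fragment dictionaries built for each molecule and report bits that appear in both with different fragment strings. The sketch below is only a heuristic under stated assumptions: `find_conflicting_bits` is a hypothetical helper, not an RDKit function; the dictionaries are assumed to look like `features_dict_bits` above (bit ID mapped to a fragment SMILES); and the fragment strings in the usage part are placeholders rather than real output for these molecules. Because it compares strings, the same environment written in two different ways would also be flagged.

```python
def find_conflicting_bits(bits_a, bits_b):
    """Return {bit: (fragment_a, fragment_b)} for bits present in both
    dictionaries whose fragment strings differ."""
    conflicts = {}
    # Only bits that occur in both molecules can collide across them.
    for bit in bits_a.keys() & bits_b.keys():
        if bits_a[bit] != bits_b[bit]:
            conflicts[bit] = (bits_a[bit], bits_b[bit])
    return conflicts


# Placeholder data (hypothetical fragments, not real RDKit output).
first = {2685705257: "fragment_A", 11: "fragment_C"}
second = {2685705257: "fragment_B", 11: "fragment_C", 42: "fragment_D"}

conflicts = find_conflicting_bits(first, second)
for bit, (frag_a, frag_b) in sorted(conflicts.items()):
    print(bit, frag_a, frag_b)
```

Any bit reported this way is a candidate collision that still needs a manual look at the two environments.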
As an aside: your fragment drawings would be more representative of what the fragment actually is if you used MolFromSmiles() with sanitization turned off instead of MolFromSmarts() (which interprets unspecified bonds as 'single or aromatic').
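A minimal sketch of that suggestion, assuming RDKit is installed. The fragment string `c(n)n` is a made-up example of an aromatic fragment cut out of a ring, not one of the fragments from the molecules above. The `UpdatePropertyCache(strict=False)` call is an extra step added here on the assumption that later code wants valence information on the unsanitized molecule.

```python
from rdkit import Chem

# Hypothetical fragment SMILES: aromatic atoms without their ring closure.
fragment_smiles = "c(n)n"

# Parse as SMILES but skip sanitization, so the fragment is kept as written
# instead of being checked for valid aromaticity and valences.
frag = Chem.MolFromSmiles(fragment_smiles, sanitize=False)

# Assumption: compute valence information leniently for later use.
frag.UpdatePropertyCache(strict=False)

# For comparison, the SMARTS parser builds a query molecule instead.
query = Chem.MolFromSmarts(fragment_smiles)

print(frag.GetNumAtoms(), [str(b.GetBondType()) for b in frag.GetBonds()])
print(query.GetNumAtoms(), [b.GetSmarts() for b in query.GetBonds()])
```

The unsanitized molecules would then be passed to `Draw.MolsToGridImage` in place of the `MolFromSmarts()` results. Whether every fragment draws cleanly without sanitization is not verified here.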