3D SDF generation from csv file #2453
|
@lilleswing I got an error like the following.
Any help? |
Hi,
By default, PandasTools.AddMoleculeColumnToFrame() creates a column named
"ROMol" (http://www.rdkit.org/docs/source/rdkit.Chem.PandasTools.html).
Chem.AddHs() expects an RDKit molecule, not a SMILES string, so to add hydrogens the line should be changed to:
pp['ROMol'] = [Chem.AddHs(x) for x in pp['ROMol'].values.tolist()]
To calculate 3D coordinates, see
http://www.rdkit.org/docs/GettingStartedInPython.html#working-with-3d-molecules
I would advise writing a function that takes one molecule as input, performs
the steps listed above, and returns the new molecule object.
With pandas you can apply/map this function to the column holding the RDKit
molecule objects and store the result in a column of your choice.
Then use the PandasTools functionality to write the dataframe to SDF.
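The per-molecule function described above could be sketched roughly as follows. The function name make_3d and the fixed random seed are assumptions for illustration and are not taken from this thread; the sketch only adds hydrogens and embeds one conformer.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def make_3d(mol, seed=42):
    """Return a copy of `mol` with explicit hydrogens and 3D coordinates.

    Returns None if `mol` is not an RDKit molecule or if embedding fails.
    (Hypothetical helper; name and seed are illustrative assumptions.)
    """
    if not isinstance(mol, Chem.Mol):
        return None
    mol_h = Chem.AddHs(mol)  # AddHs returns a new molecule with explicit Hs
    # EmbedMolecule returns the new conformer ID, or -1 if embedding failed.
    if AllChem.EmbedMolecule(mol_h, randomSeed=seed) == -1:
        return None
    return mol_h


# Example on a single molecule (ethanol).
mol_3d = make_3d(Chem.MolFromSmiles("CCO"))
```

With a dataframe, something like pp['Mol3D'] = pp['ROMol'].apply(make_3d) would store the results in a separate column (the column name Mol3D is also just an example). A force-field cleanup such as AllChem.MMFFOptimizeMolecule could be added after the embedding step if needed.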
Regards,
Christos
Christos Kannas
Scientific Software Developer (Cheminformatics)
…On Sat, 18 May 2019 at 10:16, sbhakat ***@***.***> wrote:
@lilleswing <https://github.com/lilleswing> I got an error like following
>>> from rdkit import Chem
>>> pp['Smiles'] = [Chem.AddHs(x) for x in pp['Smiles'].values.tolist()]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 1, in <listcomp>
Boost.Python.ArgumentError: Python argument types in
rdkit.Chem.rdmolops.AddHs(str)
did not match C++ signature:
AddHs(RDKit::ROMol mol, bool explicitOnly=False, bool addCoords=False, boost::python::api::object bool addResidueInfo=False)
any help?
|
@CKannas I got an error like the following.
Any help? |
Hi,
Could you share your code / jupyter notebook?
Best,
Christos
…On Mon, 20 May 2019 at 19:53, sbhakat ***@***.***> wrote:
@CKannas <https://github.com/CKannas> got an error like
>>> pp['ROMol'] = [Chem.AddHs(x) for x in pp['ROMol'].values.tolist()]
Traceback (most recent call last):
File "/home/sbhakat/miniconda2/envs/rdkit/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3078, in get_loc
return self._engine.get_loc(key)
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'ROMol'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/sbhakat/miniconda2/envs/rdkit/lib/python3.6/site-packages/pandas/core/frame.py", line 2688, in __getitem__
return self._getitem_column(key)
File "/home/sbhakat/miniconda2/envs/rdkit/lib/python3.6/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
return self._get_item_cache(key)
File "/home/sbhakat/miniconda2/envs/rdkit/lib/python3.6/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
values = self._data.get(item)
File "/home/sbhakat/miniconda2/envs/rdkit/lib/python3.6/site-packages/pandas/core/internals.py", line 4115, in get
loc = self.items.get_loc(item)
File "/home/sbhakat/miniconda2/envs/rdkit/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'ROMol'
Any help?
|
Hi,
The error shows that your dataframe doesn't have a column named "ROMol".
You need to make sure that you use the column that has the RDKit Molecules.
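A minimal sketch of the ordering this implies: the molecule column has to be created before it can be read. The in-memory CSV text below is only a stand-in for only_smile.csv (its content is assumed), and the variable had_romol_before exists purely to illustrate the point.

```python
import io

import pandas as pd
from rdkit import Chem
from rdkit.Chem import PandasTools

# Stand-in for only_smile.csv: no header, one SMILES per line (assumed content).
csv_text = "CCO\nc1ccccc1\n"
pp = pd.read_csv(io.StringIO(csv_text), names=["Smiles"])

# Right after read_csv only "Smiles" exists, so pp["ROMol"] would raise KeyError.
had_romol_before = "ROMol" in pp.columns

# AddMoleculeColumnToFrame parses the SMILES and creates the "ROMol" column.
PandasTools.AddMoleculeColumnToFrame(pp, "Smiles")

# Only now can hydrogens be added to the molecule objects.
pp["ROMol"] = [Chem.AddHs(m) for m in pp["ROMol"]]
```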
Best,
Christos
|
Here is the full script that I am trying to execute.
only_smile.csv contains one SMILES string per line, as mentioned before. |
Check this gist
https://gist.github.com/CKannas/5ead64ea673388daedc60de8ef041e28
Christos
…On Tue, 21 May 2019 at 20:06, sbhakat ***@***.***> wrote:
Here is the full script which I am trying to execute
import pandas as pd
from rdkit.Chem import PandasTools
from rdkit import Chem
from rdkit import RDConfig
pp = pd.read_csv('only_smile.csv', names=['Smiles'])
pp['ROMol'] = [Chem.AddHs(x) for x in pp['ROMol'].values.tolist()]
PandasTools.AddMoleculeColumnToFrame(pp,'Smiles')
PandasTools.WriteSDF(pp, 'pp_out.sdf')
only_smile.csv contains one SMILES string per line, as mentioned before.
|
Here is my attempt.
I got an error like the following.
My .csv file has no header or anything; it just looks like the following:
|
One way to solve this is:
df = pd.read_csv(PATH_CSV, names=['SMILES'])  # Add more column names here as per your CSV
PandasTools.AddMoleculeColumnToFrame(df, "SMILES")  # This adds the ROMol column with the molecular representation
df["Mol_H"] = df["ROMol"].apply(Chem.AddHs)
Chem.PandasTools.WriteSDF(df, '{}/SDF_File.sdf'.format(source_csv_dir), idName=None, properties=list(df.columns), allNumeric=False) |
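For the 3D part of the original question, a hedged end-to-end sketch might look like the following. PandasTools.WriteSDF writes the ROMol column unless molColName is given, so the sketch passes the new column explicitly. The function name csv_to_3d_sdf, the column name Mol3D, and the fixed seed are assumptions for illustration, not something posted in this thread.

```python
import pandas as pd
from rdkit import Chem
from rdkit.Chem import AllChem, PandasTools


def csv_to_3d_sdf(csv_path, sdf_path, seed=42):
    """Read a header-less, one-SMILES-per-line CSV and write a 3D SDF.

    Returns the number of molecules written. (Hypothetical helper.)
    """
    df = pd.read_csv(csv_path, names=["SMILES"])
    # Creates the "ROMol" column; unparsable SMILES become None.
    PandasTools.AddMoleculeColumnToFrame(df, "SMILES")

    def embed(mol):
        if not isinstance(mol, Chem.Mol):
            return None
        mol_h = Chem.AddHs(mol)
        # EmbedMolecule returns -1 when no conformer could be generated.
        if AllChem.EmbedMolecule(mol_h, randomSeed=seed) == -1:
            return None
        return mol_h

    df["Mol3D"] = df["ROMol"].apply(embed)
    # Keep only rows where both parsing and embedding succeeded.
    ok = df[df["Mol3D"].notna()]
    # molColName selects which molecule column is written to the SDF.
    PandasTools.WriteSDF(ok, sdf_path, molColName="Mol3D", properties=["SMILES"])
    return len(ok)
```

Rows whose SMILES cannot be parsed or embedded are skipped rather than written, which is a design choice of this sketch; logging or collecting the failed rows may be preferable for large inputs.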
@pgg1610 thank you for your solution. How can I speed up (or parallelize) the process? I have a lot of SMILES. |
I don't know why this error happened after I re-ran it:
I updated. Please help me to solve it. My code is here:
Another error:
|
Change to |
This issue was marked as stale because it has been open for 90 days with no activity. |
This issue was closed because it has been inactive for 14 days since being marked as stale. |
I am new to RDKit and am trying to generate an SDF file from a .csv file containing >1000 SMILES. The .csv file looks something like the following (each line is one SMILES string):
I generated the 2D SDF file using the following script:
I want to add hydrogens and generate a 3D SDF file for the same molecules. Any help?