forked from grimme-lab/PubGrep

Simple bash script using the PUG REST API to automatically get compounds and compound information from the PubChem database, based on readily available input data.

tmiland/PubGrep

PubGrep

Workflow using PubGrep, CREST and CENSO

This project provides a simple bash script that uses the PUG REST API provided by the National Library of Medicine to query the PubChem database automatically from a list of readily available input data, such as CAS numbers, compound names or PubChem CIDs. Where PubChem provides one, the script downloads an arbitrary conformer structure (.sdf file) for each compound.
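As a rough illustration of the kind of PUG REST requests involved, the sketch below builds the two URLs such a lookup needs: one that resolves a name or CAS number to CIDs, and one that fetches the 3D SDF record for a CID. It is not taken from the PubGrep source; the function names `cid_lookup_url` and `sdf_3d_url` are made up for this example, and identifiers containing spaces or special characters would need URL encoding first.

```shell
#!/usr/bin/env bash
# Base URL of the PubChem PUG REST service.
PUG_BASE="https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Hypothetical helper: URL that resolves a compound name
# (or a CAS number, which PubChem treats as a name) to CIDs.
cid_lookup_url() {
    printf '%s/compound/name/%s/cids/TXT\n' "$PUG_BASE" "$1"
}

# Hypothetical helper: URL that returns the 3D SDF record for a CID,
# when PubChem has one.
sdf_3d_url() {
    printf '%s/compound/cid/%s/SDF?record_type=3d\n' "$PUG_BASE" "$1"
}

# Example requests (need network access and curl), left commented out:
# cid=$(curl -sf "$(cid_lookup_url caffeine)" | head -n 1)
# curl -sf "$(sdf_3d_url "$cid")" -o caffeine.sdf
```

Keeping URL construction separate from the download step makes it easy to inspect a request in a browser before scripting it.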

Since the structures obtained with this program are not necessarily the lowest-lying conformers, it is recommended to use conformer screening with CREST and CENSO afterward.

If you are using this script extensively for your research, please consider citing the publication (DOI: 10.1039/D3RA01705B).

Installation

If you are using Linux, you can simply download the repository, add the directory containing PubGrep to your PATH, and make the script executable with

chmod +x PubGrep
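The two installation steps can be combined in a small helper. This is a minimal sketch, assuming the repository was downloaded to a directory of your choice; `setup_pubgrep` is a hypothetical name, and the PATH change only lasts for the current shell session.

```shell
#!/usr/bin/env bash
# Hypothetical helper: make the script executable and put its directory
# on PATH for this session. $1 is wherever you downloaded the repository.
setup_pubgrep() {
    local dir="$1"
    [ -f "$dir/PubGrep" ] || { echo "no PubGrep script in $dir" >&2; return 1; }
    chmod +x "$dir/PubGrep"
    case ":$PATH:" in
        *":$dir:"*) ;;                  # already on PATH, nothing to do
        *) PATH="$dir:$PATH" ;;
    esac
    export PATH
}

# Usage (hypothetical path):
# setup_pubgrep "$HOME/PubGrep"
```

To make the change permanent, the corresponding `export PATH=...` line can be added to your shell startup file, such as `~/.bashrc`.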

Usage

In the default mode (name-based input, structure output), you first create a list of compound identifiers (e.g. CAS numbers), one per line. The list can be a plain text file (list.txt) and could look like this:

124-18-5
78-70-6
142-62-1
3796-70-1
627-93-0
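Typos in CAS numbers are easy to miss, so it can help to check a list before running it. The sketch below validates the standard CAS check digit (the last digit equals the weighted sum of the preceding digits modulo 10, with weights 1, 2, 3, ... counted from the right). It is an optional pre-check, not something PubGrep is stated to do; `cas_is_valid` and `check_cas_list` are hypothetical names, and the check only makes sense for lists that contain CAS numbers rather than names or CIDs.

```shell
#!/usr/bin/env bash
# Return 0 if $1 is a well-formed CAS number with a correct check digit.
cas_is_valid() {
    local cas="$1"
    # Format: 2-7 digits, 2 digits, 1 check digit.
    [[ "$cas" =~ ^([0-9]{2,7})-([0-9]{2})-([0-9])$ ]] || return 1
    local digits="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
    local check="${BASH_REMATCH[3]}"
    local sum=0 weight=1 i
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    for (( i=${#digits}-1; i>=0; i-- )); do
        sum=$(( sum + ${digits:i:1} * weight ))
        weight=$(( weight + 1 ))
    done
    (( sum % 10 == check ))
}

# Print every non-empty line of a list file that fails the check.
check_cas_list() {
    local line
    while IFS= read -r line || [[ -n "$line" ]]; do
        [[ -z "$line" ]] && continue
        cas_is_valid "$line" || echo "suspicious CAS number: $line"
    done < "$1"
}

# Usage:
# check_cas_list list.txt
```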

PubGrep can then be invoked as

PubGrep list.txt

which should give you the following output

---------------------------------------------------------------------
-                          PubGrep 0.3.1                            -
- This Program tries to search CIDs from the Pubchem Database based -
- on a list of compounds given as Input. Afterwards it creates sdf  -
-   Files for each Compound given in an appropriate subdirectory.   -
-     If you are using this program extensively (like, a lot!)      -
-   for your Research, please consider citing 10.1039/D3RA01705B    -
-                          MS, 2021-2023                            -
---------------------------------------------------------------------

Multiple compound mode, reading input from cas.
Testing Pubchem Server...
Pubchem Server is working fine.

Compound: 124-18-5, CID: 15600
Compound: 78-70-6, CID: 6549
Compound: 142-62-1, CID: 8892
Compound: 3796-70-1, CID: 1549778
Compound: 627-93-0, CID: 12329
Creating directories and sdf files.
15600
6549
8892
1549778
12329
Done!

Each structure file is written to its own subdirectory of pubchem_compounds. Other input and output modes are listed by calling

PubGrep --help
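After a run, it can be useful to see which structure files were actually written, for example to compare their number against the length of the input list. The sketch below assumes only that the output directory is pubchem_compounds and that structure files carry the .sdf extension; the function names are hypothetical and the exact file names inside each subdirectory are not assumed.

```shell
#!/usr/bin/env bash
# List every .sdf file below an output directory, sorted by path.
list_sdf_files() {
    local root="${1:-pubchem_compounds}"
    [ -d "$root" ] || { echo "directory not found: $root" >&2; return 1; }
    find "$root" -type f -name '*.sdf' | sort
}

# Count them, e.g. to compare against the number of lines in the input list.
count_sdf_files() {
    list_sdf_files "$1" | wc -l | tr -d ' '
}

# Usage:
# list_sdf_files pubchem_compounds
# count_sdf_files pubchem_compounds
```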

Single structure mode

You can also use PubGrep to obtain a single structure quickly from the PubChem Database by adding an identifier directly to the call instead of using the list-based input.

PubGrep caffeine 

2D to 3D conversion (experimental)

For some more complex structures, the PubChem database provides no 3D information. In that case, PubGrep issues a warning and downloads the 2D structure instead. If xTB is installed and on your PATH, PubGrep will try to use its built-in 2D -> 3D converter to produce a 3D structure guess. Use this feature with caution and double-check the generated 3D structures. An example output (e.g. for Taxol) looks like this:

Single compound mode for taxol.
Testing Pubchem Server...
Pubchem Server is working fine.

Compound: taxol, CID: 36314
Creating directories and sdf files.
36314
No 3D Conformer Data found for taxol
Retrieving 2D Conformer Data instead.
Using xTB for an attempt to convert the 2D structure to 3D.
normal termination of xtb
3D conversion successfull.
Done!
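Since the conversion depends on xTB being reachable, a quick check before a large run can save surprises. This is a minimal sketch, assuming the executable is named `xtb` (as the log line above suggests); `have_tool` is a hypothetical helper, and the messages only describe what the text above implies, not PubGrep's own output.

```shell
#!/usr/bin/env bash
# Return 0 if an executable with the given name can be found on PATH.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool xtb; then
    echo "xtb found: 2D -> 3D conversion can be attempted"
else
    echo "xtb not found: structures without 3D data will stay 2D"
fi
```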
