Drugs, Active Ingredients and Diseases Database in Spanish. Augmenting the Resources for Analyses on Drug–Illness Interactions
Figure 1. Schematic representation of our database’s structure. The database contains 11 tables, which comprise details of drugs, active ingredients, diseases and side effects.
Figure 2. Schematic overview of the operational workflow used to construct ATC-ICD-DATA and ATC-SIDEEFFECT-DATA. The integrated datasets contain 2,073,740 records for ICD–ATC pairs and 3,316,029 records for side effect–ATC pairs. The data were extracted from the relational database using XML web services (see main text for details); information such as drug name, disease name, active ingredient name, side effects and other attributes was obtained from the Vademecum (VDM) relational database using SQL SELECT statements.
Figure 3. Schematic representation of the process (option (i)) for acquiring data based on SQL statements. This process uses an inner join to select records that have matching values in the 5 tables shown in the Venn diagram. With the SQL statement, all the attributes defined in the tables can be selected, and the result can be matched against other data sources that include ATC and ICD codes with descriptions in different languages.
Figure 4. Schematic representation of the process (option (ii)). This process uses Pandas DataFrames as a data manipulation tool for joining tables.
Figure 5. Examples of the utility of our dataset. (a) Sankey diagram representing the association intensities between active ingredients and diseases. Here, the width is proportional to the number of links (associations) between ATC and ICD groups. (b) Heatmap of the associations between anatomical group names and blocks of diseases. We observe groups of active ingredients (ATC) and diseases (ICD) that concentrate high numbers of elements. (c) Bipartite network of the most promising COVID-19 treatments. In this case, nodes in green represent active ingredients, nodes in blue are diseases, and a link exists if the ATC is prescribed for a disease.
Figure 6. Comparison. (a) Venn diagram and (b) bar chart showing the frequencies of active ingredients in active ingredient–disease pairs.
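Option (i) in the captions above relies on SQL inner joins across the linking tables. The sketch below illustrates the idea with Python's built-in `sqlite3` module standing in for PostgreSQL, so that it runs without a database server. Table and column names follow the database description in this article; the join path, the reduced column sets and the toy rows are assumptions made for illustration and are not the authors' actual query.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

cur.executescript("""
CREATE TABLE atcclass (atcclassid INTEGER, code TEXT, name TEXT);
CREATE TABLE commonnamegroup_atc (atcclassid INTEGER, commonnamegroupid INTEGER);
CREATE TABLE commonnamegroup_indication (indicationid INTEGER, commonnamegroupid INTEGER);
CREATE TABLE indicationgroup_indication (indicationgroupid INTEGER, indicationid INTEGER);
CREATE TABLE cim10_indicationgroup (indicationgroupid INTEGER, cim10id INTEGER);
CREATE TABLE cim10 (cim10id INTEGER, code TEXT, name TEXT);

-- Toy rows, invented for illustration only (not taken from the dataset).
INSERT INTO atcclass VALUES (1, 'A10BA02', 'Metformin');
INSERT INTO atcclass VALUES (2, 'N02BE01', 'Paracetamol');
INSERT INTO commonnamegroup_atc VALUES (1, 100);
INSERT INTO commonnamegroup_indication VALUES (500, 100);
INSERT INTO indicationgroup_indication VALUES (900, 500);
INSERT INTO cim10_indicationgroup VALUES (900, 7);
INSERT INTO cim10 VALUES (7, 'E11', 'Type 2 diabetes mellitus');
""")

# Each INNER JOIN keeps only rows with a match in the next table, so the
# second ATC entry (which has no linking rows) drops out of the result.
query = """
SELECT DISTINCT a.code AS atc_code, a.name AS atc_name,
       c.code AS icd_code, c.name AS icd_name
FROM atcclass AS a
INNER JOIN commonnamegroup_atc AS ca ON ca.atcclassid = a.atcclassid
INNER JOIN commonnamegroup_indication AS ci ON ci.commonnamegroupid = ca.commonnamegroupid
INNER JOIN indicationgroup_indication AS ii ON ii.indicationid = ci.indicationid
INNER JOIN cim10_indicationgroup AS cg ON cg.indicationgroupid = ii.indicationgroupid
INNER JOIN cim10 AS c ON c.cim10id = cg.cim10id
"""
rows = cur.execute(query).fetchall()
print(rows)
conn.close()
```

Because the joins are inner joins, an active ingredient without any linked indication never appears in the output, which is one possible reason why a catalog can list more ingredients than the integrated dataset does.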
Abstract
1. Background
2. Methods
2.1. Data Request to Vademecum
2.2. Data Collection Process
2.3. Load Data from Flat Files to PostgreSQL Database
2.4. Standardization
2.5. Data Content of Drugs, ATC and ICD Tables
2.5.1. Drugs Table
2.5.2. ATC Table
2.5.3. ICD Table
2.6. Integrated Datasets
3. Results
3.1. Data Description
3.2. Technical Validation
3.3. Comparison with Other Databases
4. Code Availability
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Krantz, A. Diversification of the drug discovery process. Nat. Biotechnol. 1998, 16, 1294.
- Guney, E.; Menche, J.; Vidal, M.; Barabási, A.L. Network-based in silico drug efficacy screening. Nat. Commun. 2016, 7, 10331.
- Vinayagam, A.; Gibson, T.E.; Lee, H.J.; Yilmazel, B.; Roesel, C.; Hu, Y.; Kwon, Y.; Sharma, A.; Liu, Y.Y.; Perrimon, N.; et al. Controllability analysis of the directed human protein interaction network identifies disease genes and drug targets. Proc. Natl. Acad. Sci. USA 2016, 113, 4976–4981.
- Ye, H.; Tang, K.; Yang, L.; Cao, Z.; Li, Y. Study of drug function based on similarity of pathway fingerprint. Protein Cell 2012, 3, 132–139.
- Drews, J. Drug Discovery: A Historical Perspective. Science 2000, 287, 1960–1964. Available online: https://science.sciencemag.org/content/287/5460/1960.full.pdf (accessed on 2 February 2020).
- Wouters, O.J.; McKee, M.; Luyten, J. Estimated Research and Development Investment Needed to Bring a New Medicine to Market, 2009–2018. JAMA 2020, 323, 844–853. Available online: https://jamanetwork.com/journals/jama/articlepdf/2762311/jama_wouters_2020_oi_200015.pdf (accessed on 19 December 2020).
- Mohs, R.C.; Greig, N.H. Drug discovery and development: Role of basic biological research. Alzheimer’s Dement. Transl. Res. Clin. Interv. 2017, 3, 651–657.
- Leaders. Getting Medicines to Market Faster. 2018. Available online: https://www.economist.com/leaders/2018/03/24/getting-medicines-to-market-faster (accessed on 6 March 2020).
- Law, V.; Knox, C.; Djoumbou, Y.; Jewison, T.; Guo, A.C.; Liu, Y.; Maciejewski, A.; Arndt, D.; Wilson, M.; Neveu, V.; et al. DrugBank 4.0: Shedding new light on drug metabolism. Nucleic Acids Res. 2013, 42, D1091–D1097. Available online: https://academic.oup.com/nar/article-pdf/42/D1/D1091/3559045/gkt1068.pdf (accessed on 17 June 2020).
- Ursu, O.; Holmes, J.; Bologa, C.G.; Yang, J.J.; Mathias, S.L.; Stathias, V.; Nguyen, D.T.; Schürer, S.; Oprea, T. DrugCentral 2018: An update. Nucleic Acids Res. 2018, 47, D963–D970. Available online: https://academic.oup.com/nar/article-pdf/47/D1/D963/27436360/gky963.pdf (accessed on 17 June 2020).
- Kuhn, M.; Letunic, I.; Jensen, L.J.; Bork, P. The SIDER database of drugs and side effects. Nucleic Acids Res. 2015, 44, D1075–D1079.
- Spain, V.V. Vidal Vademecum Spain, Su Fuente de Conocimiento Farmacológico. 2010. Available online: https://www.vademecum.es/ (accessed on 30 November 2019).
- European Commission. European Commission - Centralised Medicinal Products for Human Use by ATC Code. 2020. Available online: http://ec.europa.eu/health/documents/community-register/html/reg_hum_atc.htm (accessed on 2 February 2020).
- World Health Organization. World Health Organization, Anatomical Therapeutic Chemical Classification System. 2018. Available online: https://www.whocc.no/ (accessed on 2 February 2020).
- World Health Organization. World Health Organization, International Statistical Classification of Diseases and Related Health Problems. 2010. Available online: http://www.who.int/classifications/ (accessed on 2 February 2020).
- Winkler, W.E. String Comparator Metrics and Enhanced Decision Rules in the Fellegi-Sunter Model of Record Linkage. In Proceedings of the Section on Survey Research Methods, Anaheim, CA, USA, 6–9 August 1990; pp. 354–359.
- FigShare. 2020. Available online: https://figshare.com/s/5b3128788640d7aa7d4f (accessed on 30 October 2020).
- Smith, D.G. The 3 Most Promising Coronavirus Treatments, Explained. 2020. Available online: https://elemental.medium.com/the-3-most-promising-coronavirus-treatments-explained-752e2c6d54d7 (accessed on 30 September 2020).
Name | Description | Number of Rows | Attribute Name | Attribute Description |
---|---|---|---|---|
atcclass | Anatomical Therapeutic Chemical Classification System (ATC International) | 6231 | atcclassid | ATC class identifier |
| | | | parentid | Top level ATC code identifier |
| | | | code | ATC code |
| | | | name | ATC title description |
cim10 | The 10th revision of the International Classification of Diseases and Related Health Problems | 40,164 | cim10id | ICD10 identifier |
| | | | parentid | ICD10 group identifier |
| | | | code | ICD10 code |
| | | | name | ICD10 description |
cim10_indicationgroup | Relationship between ICD10 and general indications | 10,432 | indicationgroupid | Indication group identifier |
| | | | cim10id | ICD10 identifier |
commonnamegroup | VMP Group defines the drugs by active ingredient, dose, dose unit, route and pharmaceutical form | 12,827 | commonnamegroupid | Virtual Medicinal Product (VMP) group identifier |
| | | | name | VMP group description |
| | | | galenicformid | Galenic form identifier of the VMP group |
| | | | name_noaccent | VMP group description without accents |
commonnamegroup_atc | Relationship between VMP Group and ATC class | 12,320 | atcclassid | ATC class identifier |
| | | | commonnamegroupid | VMP group identifier |
commonnamegroup_indication | Relationship between VMP Group and therapeutic indication | 50,662 | indicationid | Therapeutic indication identifier |
| | | | commonnamegroupid | VMP group identifier |
commonnamegroup_sideeffect | Relationship between VMP Group and adverse effects | 631,243 | commonnamegroupid | VMP group identifier |
| | | | sideeffectid | Side effect identifier |
| | | | frequency | Frequency of occurrence of adverse effects |
indicationgroup_indication | Relationship between specific therapeutic indications and general therapeutic indications | 5292 | indicationgroupid | General therapeutic indication identifier |
| | | | indicationid | Particular therapeutic indication identifier |
sideeffect | Adverse effects (these side effects imply the noxious and non-intentional responses from the use of the drugs) | 4093 | sideeffectid | Side effect identifier |
| | | | name | Side effect description |
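The shared identifier columns in the table above suggest how the tables chain together. The following sketch shows how option (ii), joining with Pandas DataFrames, might look for the ATC–side effect direction. Column names are taken from the table; the join order, the renamed output columns and all toy values (including the `frequency` labels) are assumptions for illustration, and in practice the frames would be read from the CSV files rather than built by hand.

```python
import pandas as pd

# Toy frames standing in for the CSV files (values invented for illustration).
atcclass = pd.DataFrame(
    {"atcclassid": [1, 2],
     "code": ["A10BA02", "N02BE01"],
     "name": ["Metformin", "Paracetamol"]}
)
commonnamegroup_atc = pd.DataFrame({"atcclassid": [1], "commonnamegroupid": [100]})
commonnamegroup_sideeffect = pd.DataFrame(
    {"commonnamegroupid": [100, 100],
     "sideeffectid": [10, 11],
     "frequency": ["frequent", "rare"]}  # hypothetical frequency labels
)
sideeffect = pd.DataFrame({"sideeffectid": [10, 11], "name": ["nausea", "headache"]})

# Chain of inner merges: ATC class -> VMP group -> side effect link -> side effect.
# Both atcclass and sideeffect have a "name" column, so rename before merging.
atc_sideeffect = (
    atcclass.rename(columns={"code": "atc_code", "name": "atc_name"})
    .merge(commonnamegroup_atc, on="atcclassid", how="inner")
    .merge(commonnamegroup_sideeffect, on="commonnamegroupid", how="inner")
    .merge(sideeffect.rename(columns={"name": "sideeffect_name"}),
           on="sideeffectid", how="inner")
    .loc[:, ["atc_code", "atc_name", "sideeffect_name", "frequency"]]
    .drop_duplicates()
    .reset_index(drop=True)
)
print(atc_sideeffect)
```

Each merge holds the intermediate result in memory, which is consistent with the memory requirement the authors state for the Pandas option; dropping unused columns before merging is one way to reduce that footprint.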
File Name | Description |
---|---|
get_full_data.py | Script that loads data from the CSV files into a PostgreSQL database in a methodical, automated manner and extracts the joined datasets using SQL statements (enter 1 on the command line for this option). To join the datasets using Pandas DataFrames instead, enter 2 on the command line (option 2 requires at least 8 GB of RAM) |
Technical Validation.ipynb | Jupyter notebook where data technical validation is shown |
requirements.txt | The list of Python module names required for Nbviewer |
covid_subtances.graphml | Network ready to open in Gephi or Cytoscape |
atcclass.csv | Anatomical Therapeutic Chemical Classification System (ATC International) |
cim10.csv | The 10th revision of the International Classification of Diseases and Related Health Problems |
cim10_indicationgroup.csv | Relationship between ICD10 and general indications |
commonnamegroup.csv | VMP Group defines the drugs by active ingredient, dose, dose unit, route and pharmaceutical form |
commonnamegroup_atc.csv | Relationship between VMP Group and ATC class |
commonnamegroup_indication.csv | Relationship between VMP Group and therapeutic indication |
commonnamegroup_sideeffect.csv | Relationship between VMP Group and adverse effects |
indicationgroup_indication.csv | Relationship between specific therapeutic indications and general therapeutic indications |
sideeffect.csv | Adverse effects (these side effects imply the noxious and non-intentional responses from the use of the drugs) |
atc_icd_data.csv | Integrated dataset produced by our algorithm, in CSV format; available only on FigShare, inside a ZIP file with the same file name |
atc_sideeffects_data.csv | Integrated dataset produced by our algorithm, in CSV format; available only on FigShare, inside a ZIP file with the same file name |
Code | Description |
---|---|
A | Alimentary tract and metabolism (1st level, anatomical main group) |
A10 | Drugs used in diabetes (2nd level, therapeutic subgroup) |
A10B | Blood glucose lowering drugs, excl. insulins (3rd level, pharmacological subgroup) |
A10BA | Biguanides (4th level, chemical subgroup) |
A10BA02 | Metformin (5th level, chemical substance) |
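The five ATC levels in the table above correspond to prefixes of 1, 3, 4, 5 and 7 characters of the full code. A small helper along the following lines can recover every level from a substance code; the function name and dictionary keys are illustrative choices, not part of the published scripts.

```python
def atc_levels(code: str) -> dict:
    """Split a full 7-character ATC code into its five hierarchical levels."""
    code = code.strip().upper()
    if len(code) != 7:
        raise ValueError(f"expected a 7-character ATC code, got {code!r}")
    # Prefix length of each ATC level, from the anatomical main group
    # down to the chemical substance.
    prefix_lengths = {
        "anatomical_main_group": 1,
        "therapeutic_subgroup": 3,
        "pharmacological_subgroup": 4,
        "chemical_subgroup": 5,
        "chemical_substance": 7,
    }
    return {level: code[:n] for level, n in prefix_lengths.items()}


levels = atc_levels("A10BA02")
for level, prefix in levels.items():
    print(f"{level}: {prefix}")
```

Grouping records by the first-level prefix is what makes aggregate views by anatomical group possible.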
Chapter | Group | Category | Codes |
---|---|---|---|
IX Diseases of the circulatory system (I00–I99) | Acute rheumatic fever (I00–I02) | I01 Rheumatic fever with heart involvement | I01.0 Acute rheumatic pericarditis |
| | | | I01.1 Acute rheumatic endocarditis |
| | | | I01.2 Acute rheumatic myocarditis |
| | | | I01.8 Other acute rheumatic heart disease |
| | | | I01.9 Acute rheumatic heart disease, unspecified |
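The ICD-10 hierarchy shown above can be navigated from the code string alone: the part before the dot is the three-character category, and a group is a range of categories. The helpers below are a minimal sketch under that assumption; the function names are hypothetical, and the string comparison is only valid for categories of the form letter plus two digits.

```python
def icd_category(code: str) -> str:
    """Return the three-character category of an ICD-10 code, e.g. 'I01.0' -> 'I01'."""
    return code.strip().upper().split(".")[0][:3]


def in_block(code: str, first: str, last: str) -> bool:
    """Check whether a code's category lies within a block such as I00-I02.

    Categories share the same shape (one letter, two digits), so plain
    string comparison orders them correctly.
    """
    return first <= icd_category(code) <= last


print(icd_category("I01.2"))              # category of a four-character code
print(in_block("I01.2", "I00", "I02"))    # inside Acute rheumatic fever (I00-I02)
print(in_block("I09.9", "I00", "I02"))    # outside that group
```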
Relationship | Anatomical Name Group (ATC) | Disease Names per Blocks (ICD) |
---|---|---|
A_ATC-K_ICD | A-Alimentary tract and metabolism | K-Diseases of the digestive system |
B_ATC-K_ICD | B-Blood and blood forming organs | K-Diseases of the digestive system |
C_ATC-I_ICD | C-Cardiovascular system | I-Diseases of the circulatory system |
D_ATC-L_ICD | D-Dermatologicals | L-Diseases of the skin and subcutaneous tissue |
G_ATC-N_ICD | G-Genito-urinary system and sex hormones | N-Diseases of the genitourinary system |
H_ATC-L_ICD | H-Systemic hormonal preparations, excluding sex hormones and insulins | L-Diseases of the skin and subcutaneous tissue |
J_ATC-A_ICD | J-Antiinfectives for systemic use | A-Certain infectious and parasitic diseases |
L_ATC-C_ICD | L-Antineoplastic and immunomodulating agents | C-Neoplasms |
M_ATC-M_ICD | M-Musculo-skeletal system | M-Diseases of the musculoskeletal system and connective tissue |
N_ATC-G_ICD | N-Nervous system | G-Diseases of the nervous system |
P_ATC-B_ICD | P-Antiparasitic products, insecticides and repellents | B-Certain infectious and parasitic diseases |
R_ATC-J_ICD | R-Respiratory system | J-Diseases of the respiratory system |
S_ATC-L_ICD | S-Sensory organs | L-Diseases of the skin and subcutaneous tissue |
V_ATC-L_ICD | V-Various | L-Diseases of the skin and subcutaneous tissue |
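The relationship labels above pair each ATC anatomical group with one ICD chapter letter. One plausible way to obtain such a pairing from a list of ATC–ICD code pairs is to count co-occurrences of first letters and keep the most frequent ICD letter for each ATC group. Whether the authors built the table this way is not stated here, so the sketch below is only an assumption, and the code pairs in it are deliberately artificial placeholders rather than clinical facts.

```python
from collections import Counter, defaultdict

# Artificial (ATC code, ICD code) pairs; only the first letters matter here.
pairs = [
    ("A99ZZ01", "K99"), ("A99ZZ02", "K98"), ("A99ZZ01", "E99"),
    ("C99ZZ01", "I99"), ("C99ZZ02", "I98"), ("C99ZZ01", "N99"),
]

# counts[atc_letter][icd_letter] = number of pairs linking the two groups.
counts = defaultdict(Counter)
for atc_code, icd_code in pairs:
    counts[atc_code[0]][icd_code[0]] += 1

# For each ATC group keep the ICD letter with the highest count and
# format it like the relationship labels in the table.
dominant = {
    atc: f"{atc}_ATC-{icd_counts.most_common(1)[0][0]}_ICD"
    for atc, icd_counts in sorted(counts.items())
}
print(dominant)
```

The same nested counts are also the raw material for a heatmap of ATC groups against ICD blocks: each counter entry is one cell.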
Records Analyzed | DrugBank | DrugCentral | Vademecum | SIDER |
---|---|---|---|---|
Number of unique ICD–ATC code pairs | 13,580 | 106,916 | 260,995 | 30,835 |
Unique Active Ingredients in the catalog table | 4484 | 5067 | 6231 | 1560 |
Active Ingredients related to an indication | 4484 | 3295 | 2729 | 1560 |
ATC Group | Anatomical Name Group | DrugBank | DrugCentral | Vademecum | SIDER |
---|---|---|---|---|---|
A | Alimentary tract and metabolism | 554 | 408 | 354 | 169 |
B | Blood and blood forming organs | 190 | 129 | 142 | 64 |
C | Cardiovascular system | 571 | 409 | 308 | 159 |
D | Dermatologicals | 345 | 254 | 208 | 127 |
G | Genito-urinary system and sex hormones | 264 | 192 | 160 | 85 |
H | Systemic hormonal preparations, excluding sex hormones and insulins | 69 | 51 | 51 | 39 |
J | Antiinfectives for systemic use | 402 | 316 | 261 | 151 |
L | Antineoplastic and immunomodulating agents | 344 | 295 | 251 | 123 |
M | Musculo-skeletal system | 217 | 150 | 124 | 75 |
N | Nervous system | 588 | 415 | 339 | 226 |
P | Antiparasitic products, insecticides and repellents | 118 | 77 | 56 | 25 |
R | Respiratory system | 364 | 251 | 207 | 103 |
S | Sensory organs | 264 | 239 | 157 | 148 |
V | Various | 194 | 109 | 111 | 66 |
Total | | 4484 | 3295 | 2729 | 1560 |
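Comparisons such as the Venn diagram of active ingredients across sources reduce to set operations on ATC codes. The sketch below shows the computation on toy sets; the membership of each code in each source is invented for illustration and does not reflect the real contents of DrugBank, Vademecum or SIDER.

```python
# Toy sets of ATC codes per source (membership invented for illustration).
sources = {
    "DrugBank": {"A10BA02", "C09AA02", "N02BE01"},
    "Vademecum": {"A10BA02", "N02BE01", "J01CA04"},
    "SIDER": {"A10BA02"},
}

# Codes present in every source (the central region of a Venn diagram).
shared_by_all = set.intersection(*sources.values())

# Codes found only in Vademecum (its exclusive region).
only_vademecum = sources["Vademecum"] - sources["DrugBank"] - sources["SIDER"]

# Size of each source, the quantity a bar chart would display.
sizes = {name: len(codes) for name, codes in sources.items()}

print(sorted(shared_by_all), sorted(only_vademecum), sizes)
```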
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
López-Rodríguez, I.; Reyes-Manzano, C.F.; Reyes-Ramírez, I.; Contreras-Uribe, T.J.; Guzmán-Vargas, L. Drugs, Active Ingredients and Diseases Database in Spanish. Augmenting the Resources for Analyses on Drug–Illness Interactions. Data 2021, 6, 3. https://doi.org/10.3390/data6010003