################################################################################
Chemical Entities of Biological Interest
################################################################################

Data Structure
==============

The structure of the database:

The main table is the 'compounds' table.

The following tables have a foreign key reference to the 'compounds' table.
 - chemical_data
 - database_accession
 - names
 - comments
 - reference
 - structures
 - compound_origins

The compounds table itself contains parent compounds and child compounds;
each child compound has a foreign key to its parent compound. We use the
identifier and the name of the parent compound as the ChEBI ID and ChEBI name
respectively.

The table 'relation' provides all the relationships between the compounds.

Please refer to the file 'DataModel.png' in this downloads section for a visual
representation of the data structure.

Notes:

1. The compounds table has been exported with all our compounds, although our
   public web site displays only compounds which have been annotated. This is
   because we use the identifiers for the ontology. Annotated compounds have a
   status of 'C', which means 'checked'. Compounds which have not been
   annotated have no database references, names or chemical data associated
   with them.

2. The status we refer to has the following definitions:
   'C' - Checked by one of our curators and released to the public domain.
   'E' - Exists but has not been checked by one of our curators.
   'O' - Compound was made obsolete due to the merger of compounds.

3. The data has been exported in an ASCII format. For more information on what
   the symbols represent please see the help file:
   http://www.ebi.ac.uk/chebi/pages/html/help.html#specialCharacters

4. The structures table has two columns of note. The first,
   'default_structure', identifies whether the structure is the preferred
   default structure, as there can be more than one structure for a compound.
   The second, 'autogen_structure', indicates whether a structure is
   autogenerated or not (e.g. SMILES, InChI, InChIKey).

A. Flat-File (Tab Delimited Format)
===================================

ChEBI is stored in a relational database and we currently provide the ChEBI
tables in a flat-file tab delimited format. Various spreadsheet tools can be
used to import these files into a relational database. The files are stored in
the same structure as the relational database.

B. Oracle SQL Table Dumps
=========================

ChEBI provides an Oracle SQL Table Dump that can be imported into an Oracle
relational database using the Unix 'imp' command. The steps are:

 - Create the tables using the 'create_tables.sql' file provided.
 - Disable all the constraints using the 'disable_constraints.sql' file.
 - Import the data with the import command. The parameter file 'import.par'
   should reside in the directory from which the import is run.
 - Once the import is complete, enable the constraints again with the file
   provided for that purpose.

The correct command to perform in Unix is:

    imp database_name/database_password@Instance_name PARFILE=import.par

C. OBO Ontology Format
======================

ChEBI provides the ChEBI ontology in OBO format version 1.2. More information
about the OBO format can be found at http://obo.sourceforge.net/ or
http://www.geneontology.org/GO.format.html#oboflat. Note that the OBO format
requires that certain characters such as '\' be escaped.

The tool OBO-edit can be used to view the OBO format:
http://sourceforge.net/project/showfiles.php?group_id=36855.

D. Generic Dumps
================

ChEBI provides a generic SQL dump which consists of SQL insert statements. The
archive file called 'generic_dump.zip' consists of 9 files which contain SQL
table insert statements for the entire database.
The file called 'compounds.sql' should always be inserted first in order to
avoid constraint errors, because the other tables have a foreign key reference
to the 'compounds' table. Included in the folder is a MySQL create table script
as an example for users of other databases. These insert statements should be
usable in any database which accepts SQL as its query language.

E. SDF File
===========

ChEBI provides its chemical structures and additional data in SDF format. The
data is provided in two flavours:

 * Chebi_lite.sdf file contains only the chemical structure, ChEBI identifier
   and ChEBI Name.
 * Chebi_complete.sdf file contains all the chemical structures and associated
   information. Note that it excludes any ontological information: ontological
   classes do not contain a structure and so cannot be represented in SDF.

More information about the SDF format used in ChEBI can be found in the
Developer Manual.

For any further queries please use our Developer Manual:
http://www.ebi.ac.uk/chebi/developerManualForward.do

Regards,
The ChEBI team.

###################################################################################
All data in the database is non-proprietary or is derived from a non-proprietary
source. It is thus freely accessible and available to anyone. In addition, each
data item is fully traceable and explicitly referenced to the original
source/version.

The data on this website is available under the Creative Commons License
(CC BY 4.0) https://creativecommons.org/licenses/by/4.0/

ChEBI reserves the right to change its output format in future.
###################################################################################
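
As an illustration of the flat-file format (section A) and of the status and
parent/child conventions described under "Data Structure", the following is a
minimal Python sketch, not an official ChEBI tool. The column names used here
(ID, STATUS, PARENT_ID, NAME), the 'null' marker for a missing parent and the
sample rows are all assumptions made for the example; check the header row of
the actual compounds file before relying on them.

```python
import csv
import io

# ASSUMPTION: the column names ID, STATUS, PARENT_ID and NAME are
# illustrative only; the real export may name or order columns differently.


def read_compounds(handle):
    """Parse a tab-delimited compounds export into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def checked_only(rows):
    """Keep rows whose status is 'C' (checked and publicly released)."""
    return [row for row in rows if row["STATUS"] == "C"]


def public_id(row):
    """Return the parent's identifier for a child row, else the row's own.

    ASSUMPTION: a missing parent is written as an empty field or 'null'.
    """
    parent = row.get("PARENT_ID", "")
    return parent if parent not in ("", "null") else row["ID"]


# Made-up sample data standing in for the real compounds file.
SAMPLE = (
    "ID\tSTATUS\tPARENT_ID\tNAME\n"
    "1\tC\tnull\texample parent\n"
    "2\tC\t1\texample child\n"
    "3\tE\tnull\tunchecked example\n"
)

rows = read_compounds(io.StringIO(SAMPLE))
checked = checked_only(rows)
ids = [public_id(row) for row in checked]
print(ids)
```

To run the same functions on a downloaded file, pass an open file handle to
read_compounds() in place of the in-memory sample.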