################################################################################
                                                                              
Chemical Entities of Biological Interest                                     

                                                                             
################################################################################

Data Structure
==============

The structure of the database is as follows.
The main table is the 'compounds' table.
The following tables have a foreign key reference to the 'compounds' table:
  - chemical_data
  - database_accession
  - names
  - comments
  - reference
  - structures
  - compound_origins

The compounds table itself contains parent compounds and child compounds, each
of which has a foreign key to its parent compound. We use the identifier of the
parent compound and the name of the parent compound as the ChEBI ID and ChEBI
name respectively. The table 'relation' provides all the relationships between
the compounds.

Please refer to the file 'DataModel.png' in this downloads section for a visual
representation of the data structure.
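
As a rough illustration of the parent/child arrangement described above, the
following Python sketch builds a tiny in-memory SQLite database and resolves a
compound to the ChEBI ID and ChEBI name of its parent. The column names
('id', 'name', 'parent_id', 'status') and the sample rows are assumptions made
for this example only; please check 'DataModel.png' for the real schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory 'compounds' table with hypothetical columns and rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE compounds ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT,"
        " parent_id INTEGER REFERENCES compounds(id),"
        " status TEXT)"
    )
    conn.executemany(
        "INSERT INTO compounds (id, name, parent_id, status) VALUES (?, ?, ?, ?)",
        [
            (1, "example parent", None, "C"),
            (2, "example child", 1, "C"),
            (3, "unannotated entry", None, "E"),
        ],
    )
    return conn


def resolve_chebi_entry(conn, compound_id):
    """Return (id, name) of the parent if there is one, else of the compound itself."""
    return conn.execute(
        "SELECT COALESCE(p.id, c.id), COALESCE(p.name, c.name)"
        " FROM compounds c LEFT JOIN compounds p ON c.parent_id = p.id"
        " WHERE c.id = ?",
        (compound_id,),
    ).fetchone()


conn = build_demo_db()
print(resolve_chebi_entry(conn, 2))  # the child resolves to its parent's id and name
```

The self-join is used here so that a parent compound and its children all
resolve to the same pair of values; an unknown identifier yields None.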

Notes:
1. The compounds table has been exported with all our compounds, although on our
public web site we display only compounds which have been annotated. This is
because we use the identifiers for the ontology. Annotated compounds have a
status of 'C', which means 'checked'. Compounds which have not been annotated
have no database references, names or chemical data associated with them.

2. The status we refer to has the following definitions:
   'C' - Checked by one of our curators and released to the public domain.
   'E' - Exists but has not been checked by one of our curators.
   'O' - Compound was made obsolete due to the merger of compounds.

3. The data has been exported in an ASCII format.
For more information on what the special symbols represent, please see the help file:
http://www.ebi.ac.uk/chebi/pages/html/help.html#specialCharacters

4. The structures table has two columns worth noting. The first,
'default_structure', identifies whether the structure is the preferred default
structure, as there could be more than one structure for a compound. The
second, 'autogen_structure', defines whether a structure is autogenerated or
not (e.g. 'SMILES, InChI, InChIKey').

A. Flat-File (Tab Delimited Format)
===================================

ChEBI is stored in a relational database and we currently provide the ChEBI
tables in a flat-file, tab-delimited format. Various spreadsheet and database
tools can be used to import these files into a relational database. The files
are stored in the same structure as the relational database.
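
For users who prefer to read the flat files directly, here is a minimal Python
sketch that reads tab-delimited text and keeps only rows with the status 'C'.
It assumes that the file has a header row and a column named 'STATUS'; both
are assumptions made for this example, as are the sample rows, so please
adjust them to match the file you have downloaded.

```python
import csv
import io


def read_checked_compounds(handle, status_column="STATUS"):
    """Return the rows whose status is 'C' (checked) from a tab-delimited file.

    The header row and the column name are assumptions for this sketch.
    """
    # QUOTE_NONE treats quote characters as ordinary text.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row for row in reader if row.get(status_column) == "C"]


# Hypothetical sample data standing in for a downloaded flat file.
sample = io.StringIO(
    "ID\tSTATUS\tNAME\n"
    "1\tC\texample checked compound\n"
    "2\tE\texample unchecked compound\n"
    "3\tO\texample obsolete compound\n"
)
checked = read_checked_compounds(sample)
print([row["ID"] for row in checked])  # prints ['1']
```

To read a real file, pass an open file handle instead of the sample, for
example one opened with open(path, newline="", encoding="utf-8").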


B. Oracle SQL Table Dumps
=========================

ChEBI provides an Oracle SQL table dump that can be imported into an Oracle
relational database using the Unix 'imp' command.
First create the tables using the 'create_tables.sql' file provided.
Once the tables are created, disable all the constraints using the
'disable_constraints.sql' file. Then run the import command to load the data
and, once the import is complete, re-enable the constraints with the
corresponding file provided.
The parameter file 'import.par' should reside in the directory from which the
import is run. The command to run in Unix is:

  imp database_name/database_password@Instance_name PARFILE=import.par

C. OBO Ontology Format
======================

ChEBI provides the ChEBI ontology in OBO format version 1.2. 
More information about the OBO format can be found at
http://obo.sourceforge.net/ or http://www.geneontology.org/GO.format.html#oboflat.
Note that the OBO format requires that certain characters such as '\' be escaped.
The OBO-Edit tool can be used to view files in OBO format:
http://sourceforge.net/project/showfiles.php?group_id=36855.
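
When reading OBO files with your own code, escaped characters need to be
decoded again. The following Python sketch shows one way this might be done.
It assumes the escape sequences of OBO format 1.2, where '\n', '\t' and '\W'
stand for newline, tab and a space, and any other escaped character stands for
itself; please verify this against the format documentation linked above. The
function name is hypothetical.

```python
def unescape_obo(value):
    """Decode backslash escapes in an OBO tag value (a sketch, not a full parser)."""
    special = {"n": "\n", "t": "\t", "W": " "}
    out = []
    chars = iter(value)
    for ch in chars:
        if ch == "\\":
            # Consume the character after the backslash; a trailing lone
            # backslash is simply dropped.
            escaped = next(chars, "")
            out.append(special.get(escaped, escaped))
        else:
            out.append(ch)
    return "".join(out)


print(unescape_obo("ratio 1\\:2"))  # prints ratio 1:2
```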

D. Generic Dumps
================

ChEBI provides a generic SQL dump which consists of SQL insert statements.
The archive file called 'generic_dump.zip' consists of 9 files which contain SQL
table insert statements for the entire database. The file called 'compounds.sql'
should always be loaded first in order to avoid any constraint errors.
Included in the folder is a MySQL create table script as an example for
other users of the database. These insert statements should work in any
database which accepts SQL as its query language.
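
If the files are loaded by a script, the ordering rule above can be sketched in
Python as follows. Only 'compounds.sql' is a file name taken from this
document; the other names in the example are hypothetical, and the function
itself is an illustration, not part of the ChEBI distribution.

```python
from pathlib import Path


def order_dump_files(filenames, first="compounds.sql"):
    """Return the file names with the compounds file first, the rest sorted."""
    names = sorted(filenames)
    head = [n for n in names if Path(n).name == first]
    tail = [n for n in names if Path(n).name != first]
    return head + tail


# Hypothetical file names; only 'compounds.sql' is named in this document.
example = ["names.sql", "compounds.sql", "chemical_data.sql"]
print(order_dump_files(example))
# prints ['compounds.sql', 'chemical_data.sql', 'names.sql']
```

Each file in the returned list can then be passed in turn to the SQL client of
your database.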

E. SDF File
===========

ChEBI provides its chemical structures and additional data in SDF format.
The data is provided in two flavours:
* The Chebi_lite.sdf file contains only the chemical structure, ChEBI identifier and ChEBI Name.
* The Chebi_complete.sdf file contains all the chemical structures and associated information.
Note that it excludes any ontological information, because ontological classes
do not contain a structure and so cannot be represented in SDF.
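
For those who want to read the data fields without a cheminformatics toolkit,
the following Python sketch collects the data items of each SDF record. It
relies on the general SDF conventions that a data item starts with a
'> <TAG>' line and ends at a blank line, and that '$$$$' closes a record. The
tag names 'ChEBI ID' and 'ChEBI Name' and the sample record are assumptions
made for this example; the Developer Manual describes the fields actually used.

```python
import re

# A data header line looks like "> <TAG>".
TAG_LINE = re.compile(r"^>\s*<([^>]+)>")


def iter_sdf_data(lines):
    """Yield one dict per SDF record, mapping each tag to its list of value lines.

    The structure block itself is skipped. A final record that is not
    closed by '$$$$' is not yielded.
    """
    record, tag = {}, None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.strip() == "$$$$":
            yield record
            record, tag = {}, None
            continue
        match = TAG_LINE.match(line)
        if match:
            tag = match.group(1)
            record[tag] = []
        elif tag is not None:
            if line == "":
                tag = None  # a blank line ends the current data item
            else:
                record[tag].append(line)


# Hypothetical record with placeholder values and an empty structure block.
sample = """
  example molfile header

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <ChEBI ID>
CHEBI:0000

> <ChEBI Name>
example compound

$$$$
"""
records = list(iter_sdf_data(sample.splitlines()))
print(records[0]["ChEBI Name"])  # prints ['example compound']
```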

More information about the SDF format used in ChEBI can be found in the Developer Manual. 

For any further queries, please consult our Developer Manual:
http://www.ebi.ac.uk/chebi/developerManualForward.do

Regards,
The ChEBI team.

###################################################################################

All data in the database is non-proprietary or is derived from a non-proprietary 
source. It is thus freely accessible and available to anyone. In addition, 
each data item is fully traceable and explicitly referenced to the original 
source/version.


The data on this website is available under the Creative Commons License (CC BY 4.0)
https://creativecommons.org/licenses/by/4.0/


ChEBI reserves the right to change its output format in future.

###################################################################################