KEGG Mapping
KEGG mapping as a set operation
KEGG mapping is the process of mapping molecular objects (genes, proteins, small molecules, etc.) to molecular network objects (KEGG pathway maps, BRITE hierarchies and KEGG modules). It is not simply an enrichment process; rather, it is a set operation that generates a new set. From the beginning of the KEGG project, the basic idea has been to generate organism-specific pathways automatically by a set operation between manually annotated genome data and manually created pathway maps. The KEGG mapping set operation has thus served to extend the KEGG knowledge base. It has also played another important role: assisting the integration and interpretation of users' datasets, especially large-scale datasets generated by high-throughput technologies (see: KEGG Mapper tools).
Here the network objects of pathway maps and BRITE hierarchies are explained.
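The set operation described above can be illustrated with a minimal sketch. It assumes toy data rather than real KEGG content: the gene names, the `K99999` identifier and the function name `generate_organism_pathway` are hypothetical and exist only for this example.

```python
# Toy data, not real KEGG content: K numbers placed on a reference pathway map.
reference_map_kos = {"K00844", "K12407", "K00845"}

# Toy genome annotation: gene identifier -> assigned K number.
# The gene names are hypothetical placeholders.
genome_annotation = {
    "gene1": "K00845",
    "gene2": "K00845",
    "gene3": "K99999",  # hypothetical KO that is not on this map
}


def generate_organism_pathway(reference_kos, annotation):
    """Intersect the KO set of a map with the KO set of a genome.

    Returns a dict mapping each KO present in both sets to the
    sorted list of genes annotated with it.
    """
    genome_kos = set(annotation.values())
    present = reference_kos & genome_kos  # the set operation itself
    return {
        ko: sorted(gene for gene, k in annotation.items() if k == ko)
        for ko in sorted(present)
    }


organism_pathway = generate_organism_pathway(reference_map_kos, genome_annotation)
print(organism_pathway)
```

The result is a new set (here a dict keyed by KO) rather than a score, which is the distinction from an enrichment analysis drawn above.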
KEGG Pathway Maps
Graphical map objects
The KEGG pathway map is a molecular interaction/reaction network diagram represented in terms of the KEGG Orthology (KO) groups, so that experimental evidence in specific organisms can be generalized to other organisms through genomic information. Each map is manually drawn with in-house software called KegSketch, which generates the KGML+ file. This file is an SVG file containing graphics objects that are associated with KEGG objects (see KEGG object identifiers). Basic graphics objects in the reference KEGG pathway maps are:
- boxes - ortholog (KO) groups identified by K numbers and, in metabolic maps, reactions identified by R numbers as well
- circles - other molecules, usually chemical compounds identified by C numbers, but including glycans identified by G numbers
- lines - reactions identified by R numbers in metabolic maps; ortholog (KO) groups identified by K numbers in global metabolism maps
In organism-specific pathway maps, the basic graphics objects are:
- boxes - genes or gene products identified by the combination of the KEGG organism code and gene identifiers
Convention of map number prefix
Each pathway map is identified by the combination of a 2-4 letter prefix code and a 5-digit number (see KEGG Identifiers). The prefix has the following meaning:
- map - Reference pathway
- ko - Reference pathway (KO)
- ec - Reference pathway (EC)
- rn - Reference pathway (Reaction)
- org - Organism-specific pathway map
map00010 | ko00010 | hsa00010
As shown here, "map" pathways are not colored, "ko/ec/rn" pathways are colored blue, and organism-specific pathways are colored green, where coloring indicates that map objects exist and are linked to corresponding entries.
For global metabolism maps, "map" pathways are fully colored, so that "ko/ec/rn" pathways and organism-specific pathways are generated by reducing the coloring indicating the absence of corresponding entries.
About KGML files
KGML is an exchange format of KEGG pathway maps. It is meant for outside users and is not used in any service or database update procedure within KEGG. KGML files, which are computationally generated from the manually defined KGML+ file, contain information about entries (KEGG objects) and two types of relationships.
- relations - relationships between boxes
- reactions - relationships between circles
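As a rough illustration of the three kinds of information listed above, the following sketch reads entries, relations and reactions from a small hand-written XML fragment. The fragment is not real KEGG data: its identifiers are placeholders, and the element and attribute names (`entry`, `relation`, `reaction`, `substrate`, `product`, `entry1`, `entry2`) follow the KGML format as commonly documented, which should be checked against the KGML specification before use. The function name `summarize_kgml` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment in the style of KGML; identifiers are placeholders.
KGML_SAMPLE = """\
<pathway name="path:ko00000" org="ko" number="00000">
  <entry id="1" name="ko:K00001" type="ortholog"/>
  <entry id="2" name="ko:K00002" type="ortholog"/>
  <entry id="3" name="cpd:C00001" type="compound"/>
  <entry id="4" name="cpd:C00002" type="compound"/>
  <relation entry1="1" entry2="2" type="ECrel"/>
  <reaction id="1" name="rn:R00001" type="reversible">
    <substrate id="3" name="cpd:C00001"/>
    <product id="4" name="cpd:C00002"/>
  </reaction>
</pathway>
"""


def summarize_kgml(xml_text):
    """Return (entries, relations, reactions) from a KGML-style string."""
    root = ET.fromstring(xml_text)

    # Entries: graphics-object id -> KEGG object name.
    entries = {e.get("id"): e.get("name") for e in root.findall("entry")}

    # Relations: relationships between boxes, resolved to entry names.
    relations = [
        (entries[r.get("entry1")], entries[r.get("entry2")], r.get("type"))
        for r in root.findall("relation")
    ]

    # Reactions: relationships between circles (substrates and products).
    reactions = [
        (
            rx.get("name"),
            [s.get("name") for s in rx.findall("substrate")],
            [p.get("name") for p in rx.findall("product")],
        )
        for rx in root.findall("reaction")
    ]
    return entries, relations, reactions


entries, relations, reactions = summarize_kgml(KGML_SAMPLE)
print(relations)
print(reactions)
```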
BRITE Functional Hierarchies
BRITE hierarchy files
The KEGG BRITE database is a collection of BRITE hierarchy files, called htext (hierarchical text) files, with additional files for binary relations. The htext file is manually created with in-house software called KegHierEditor. Each line of the htext file begins with "A", "B", "C", etc. in the first column to indicate the hierarchy level.
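The first-column level letters can be turned into a nested structure with a small stack. The sketch below is a simplification that handles only lines of the form shown in the sample on this page (a level letter followed by a label); real htext files may carry header lines and other markup that it does not attempt to interpret. The function name `parse_htext` is hypothetical.

```python
HTEXT_SAMPLE = """\
A Metabolism
B  Carbohydrate Metabolism
C    00010 Glycolysis / Gluconeogenesis [PATH:ko00010]
D      K00844 HK; hexokinase [EC:2.7.1.1]
D      K12407 GCK; glucokinase [EC:2.7.1.2]
D      K00845 glk; glucokinase [EC:2.7.1.2]
"""


def parse_htext(text):
    """Return one tuple per htext line: the path from the top level down to it.

    The first character of each line ("A", "B", "C", ...) gives the depth.
    Lines that do not start with an uppercase letter are skipped.
    """
    stack = []    # labels of the current ancestors, one per level
    records = []
    for line in text.splitlines():
        if not line or not ("A" <= line[0] <= "Z"):
            continue
        depth = ord(line[0]) - ord("A")
        label = line[1:].strip()
        if not label:
            continue
        del stack[depth:]      # drop siblings and deeper levels
        stack.append(label)
        records.append(tuple(stack))
    return records


records = parse_htext(HTEXT_SAMPLE)
for path in records:
    print(" > ".join(path))
```

Each "D" line ends up with its full "A > B > C > D" path, which is the pathway-based classification of that K number.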
Each BRITE hierarchy file represents a classification system of KEGG objects identified by the KEGG Identifiers; for example, pathway-based gene classification or protein family classification by the K numbers, compound classification by C numbers, drug classification by D numbers, and disease classification by H numbers.
A Metabolism
B  Carbohydrate Metabolism
C    00010 Glycolysis / Gluconeogenesis [PATH:ko00010]
D      K00844 HK; hexokinase [EC:2.7.1.1]
D      K12407 GCK; glucokinase [EC:2.7.1.2]
D      K00845 glk; glucokinase [EC:2.7.1.2]
D      ......
The binary relation files contain the relationship between KEGG objects and attributes, which can be dynamically added to the hierarchy file as additional columns using the join feature of the Brite hierarchy viewer. Many binary relation files are computationally generated from the KEGG database contents and shown in the left panel of the Brite hierarchy viewer.
The KEGG objects of BRITE files can be searched in the search box at the top of the KEGG BRITE page, in the search boxes of the Brite hierarchy viewer, and by the KEGG Mapper tools.
Convention of brite number prefix
Each BRITE hierarchy file is identified by the combination of a 2-4 letter prefix code and a 5-digit number (see KEGG Identifiers). The prefix has the following meaning:
- ko - Reference hierarchy (KO)
- org - Organism-specific hierarchy
- br - Non-KO hierarchy
- jp - Non-KO hierarchy in Japanese
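The prefix conventions for pathway maps and BRITE hierarchies share the same shape: a 2-4 letter prefix followed by five digits. A minimal sketch of splitting and labelling such identifiers is given below. It assumes lowercase prefixes and treats any prefix not in the fixed lists as an organism code without checking that the organism exists; the function name `describe_identifier` and the `kind` argument are hypothetical.

```python
import re

# 2-4 lowercase letters followed by exactly five digits (assumed form).
_ID_PATTERN = re.compile(r"^([a-z]{2,4})(\d{5})$")

PATHWAY_PREFIXES = {
    "map": "Reference pathway",
    "ko": "Reference pathway (KO)",
    "ec": "Reference pathway (EC)",
    "rn": "Reference pathway (Reaction)",
}
BRITE_PREFIXES = {
    "ko": "Reference hierarchy (KO)",
    "br": "Non-KO hierarchy",
    "jp": "Non-KO hierarchy in Japanese",
}
ORGANISM_DEFAULT = {
    "pathway": "Organism-specific pathway map",
    "brite": "Organism-specific hierarchy",
}


def describe_identifier(identifier, kind):
    """Split an identifier into (prefix, number, meaning).

    kind is "pathway" or "brite". Any prefix outside the fixed tables is
    assumed to be an organism code; this is not validated.
    """
    if kind not in ORGANISM_DEFAULT:
        raise ValueError(f"unknown kind: {kind!r}")
    match = _ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not a prefix + 5-digit identifier: {identifier!r}")
    prefix, number = match.groups()
    table = PATHWAY_PREFIXES if kind == "pathway" else BRITE_PREFIXES
    meaning = table.get(prefix, ORGANISM_DEFAULT[kind])
    return prefix, number, meaning


print(describe_identifier("map00010", "pathway"))
print(describe_identifier("hsa00010", "pathway"))
print(describe_identifier("ko00001", "brite"))
```

Note that the same prefix can mean different things depending on the database: "ko" labels a KO reference pathway in one case and a KO reference hierarchy in the other, which is why the sketch takes `kind` as an explicit argument.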
Last updated: January 1, 2024