HiGEC is a Python framework for performing hierarchical classification with automated hierarchy generation, flexible exploitation strategies, and integration with modern classifiers.
HiGEC has been evaluated on 115 multi-class datasets, where it showed significant improvements over traditional flat classification (FC) and one-vs-all (OVA) approaches, particularly with advanced classifiers such as XGBoost.
🔧 Installation
```bash
git clone https://github.com/your-username/higec.git
cd higec
pip install -r requirements.txt
```
Dependencies: numpy, scikit-learn, xgboost, scipy, matplotlib
📊 What This Project Does
HiGEC provides:
- Automated Hierarchy Generation from flat-labeled datasets
- Probabilistic and hybrid Hierarchy Exploitation strategies
- Support for any multi-class base classifier
- Benchmark-ready structure using OpenML datasets
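The idea behind generating a hierarchy from flat labels can be sketched with SciPy: represent each class by a single vector, measure distances between these representatives, and merge classes bottom-up with hierarchical agglomerative clustering (HAC). This is a minimal sketch, not HiGen's API: `class_centroid_linkage` is a hypothetical helper, and the choice of class means as representatives, Euclidean distance, and average linkage are assumptions. HiGEC's own dissimilarities (e.g. `'tsd'`, `'jsd'`) may be defined differently.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist
from sklearn.datasets import make_classification


def class_centroid_linkage(X, y, method="average"):
    """Hypothetical helper: HAC over classes using class-mean representatives."""
    classes = np.unique(y)
    # One representative per class: the mean feature vector of its samples
    centroids = np.vstack([X[y == c].mean(axis=0) for c in classes])
    # Condensed vector of pairwise Euclidean distances between representatives
    dists = pdist(centroids, metric="euclidean")
    # Each row of Z merges two clusters; new cluster ids start at len(classes)
    Z = linkage(dists, method=method)
    return classes, Z


# Synthetic 6-class data standing in for a real flat-labeled dataset
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=6, n_clusters_per_class=1, random_state=0)
classes, Z = class_centroid_linkage(X, y)
for step, (a, b, d, size) in enumerate(Z):
    print(f"node {len(classes) + step}: merge {int(a)} + {int(b)} "
          f"(distance {d:.3f}, {int(size)} classes)")
```

Because the clustering runs over class representatives rather than samples, the resulting tree has one leaf per class and `n_classes - 1` internal nodes, each of which can later host a local classifier.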
🚀 Quick Start
Run the Example:
```bash
python run_higec_example.py
```
What It Does:
- Downloads the Glass dataset from OpenML
- Performs flat classification using XGBoost
- Automatically constructs a class hierarchy using TSD (Task Similarity Distance)
- Trains a hierarchical classifier using the LCL+ scheme
- Compares the F1 scores of flat and hierarchical classification
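The flat-classification baseline in that comparison amounts to training one multi-class model on all labels and scoring it. The sketch below is illustrative only: it uses synthetic data instead of downloading Glass, a scikit-learn random forest instead of XGBoost (so it runs without xgboost installed), and assumes the reported `f1` is macro-averaged, which the example output does not state.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for Glass (9 features, 6 classes). With network access,
# the real data can be fetched with sklearn.datasets.fetch_openml(data_id=41).
X, y = make_classification(n_samples=600, n_features=9, n_informative=6,
                           n_classes=6, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Flat classification (FC): a single multi-class model over all labels
flat = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
flat_f1 = f1_score(y_te, flat.predict(X_te), average="macro")
print(f"Flat classification (macro f1): {flat_f1:.4f}")
```

The hierarchical score is then computed on the same test split, so the two numbers differ only in how the classifier uses the label structure.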
🧱 Core Components
| Component | Description |
|---|---|
| `HiGen` | Constructs a hierarchy using representative- or classifier-based distances |
| `hier_binary_tree` | Hierarchical classifier wrapper for training and prediction |
| `utils.py` | Data loader, metric scorer, plotting, label checks |
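To illustrate what a hierarchical classifier over a binary class tree does, here is a generic top-down "local classifier per parent node" (LCPN) sketch, one of the strategy families named in the customization options. It is not the `hier_binary_tree` implementation: `TopDownBinaryTree` and `leaves` are hypothetical names, the tree is written as nested tuples, and the example hierarchy is copied from the level table in the example output (`{3, 4}` vs `{0, 1, 2, 5}`, then `{1, 2}` vs `{0, 5}`).

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def leaves(node):
    """Return the set of class labels under a node (a leaf is an int label)."""
    if isinstance(node, tuple):
        return leaves(node[0]) | leaves(node[1])
    return {node}


class TopDownBinaryTree:
    """Hypothetical LCPN-style classifier: one binary model per internal node."""

    def __init__(self, tree, base_estimator):
        self.tree = tree
        self.base_estimator = base_estimator

    def fit(self, X, y):
        self.models_ = {}
        self._fit_node(self.tree, X, y)
        return self

    def _fit_node(self, node, X, y):
        if not isinstance(node, tuple):
            return  # leaves need no model
        right = leaves(node[1])
        # Train only on samples whose class lies somewhere under this node
        mask = np.isin(y, list(leaves(node)))
        target = np.isin(y[mask], list(right)).astype(int)  # 1 = go right
        self.models_[node] = clone(self.base_estimator).fit(X[mask], target)
        self._fit_node(node[0], X, y)
        self._fit_node(node[1], X, y)

    def predict(self, X):
        return np.array([self._predict_one(x.reshape(1, -1)) for x in X])

    def _predict_one(self, x):
        node = self.tree
        while isinstance(node, tuple):  # descend until a leaf label is reached
            node = node[int(self.models_[node].predict(x)[0])]
        return node


hierarchy = ((3, 4), ((1, 2), (0, 5)))  # shape taken from the example output
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=6, n_clusters_per_class=1, random_state=0)
model = TopDownBinaryTree(hierarchy, LogisticRegression(max_iter=1000)).fit(X, y)
pred = model.predict(X[:5])
print(pred)
```

Each internal node solves an easier two-way problem than the full 6-way task, at the cost of errors at upper levels propagating downward; hybrid and per-level schemes such as LCL+ are designed to mitigate that trade-off.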
🧪 Customization
You can change the following parameters in run_higec_example.py:
```python
did_ = 41            # Dataset ID (from OpenML)
hc_type = 'lcl+'     # HC strategy: 'lcl+', 'lcpn', 'lcn+f', etc.
diss_type = 'jsd'    # Dissimilarity type: 'tsd', 'jsd', 'ccm', 'cmd'
build_type = 'hdc'   # Hierarchy build type: 'hac' or 'hdc'
eval_metric = 'f1'   # Metric: 'f1', 'accuracy', 'auc'
```
You can also replace XGBClassifier() with any sklearn-compatible classifier.
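As an example of a classifier-based dissimilarity that works with any scikit-learn-compatible classifier, the sketch below compares classes by the Jensen-Shannon distance between their average out-of-fold predicted-probability vectors. This is one plausible reading of a `'jsd'`-style measure, offered as an assumption rather than HiGEC's definition; `jsd_class_dissimilarity` is a hypothetical helper, and `GaussianNB` merely stands in for `XGBClassifier()` or any other model exposing `predict_proba`. Note that SciPy's `jensenshannon` returns the JS *distance* (the square root of the divergence).

```python
import numpy as np
from scipy.spatial.distance import jensenshannon
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB


def jsd_class_dissimilarity(clf, X, y, cv=5):
    """Hypothetical helper: pairwise JS distances between class probability profiles."""
    classes = np.unique(y)
    # Out-of-fold class-probability estimates, so no sample is scored by a
    # model that saw it during training
    proba = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")
    # Average predicted distribution over the samples of each true class
    profiles = np.vstack([proba[y == c].mean(axis=0) for c in classes])
    k = len(classes)
    D = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            # base=2 keeps the distance in [0, 1]
            D[i, j] = D[j, i] = jensenshannon(profiles[i], profiles[j], base=2)
    return D


X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=6, n_clusters_per_class=1, random_state=0)
D = jsd_class_dissimilarity(GaussianNB(), X, y)  # swap in any classifier here
print(np.round(D, 3))
```

Classes that the classifier tends to confuse get similar probability profiles and hence small distances, so they end up close together in a hierarchy built from `D` (for example via `scipy.spatial.distance.squareform(D)` passed to `linkage`).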
📈 Example Output
```text
Extended linkage table for levels:
level_id:0, subsets:[[3, 4], [0, 1, 2, 5]], branch_id:[8, 9]
level_id:1, subsets:[[1, 2], [0, 5], [4], [3]], branch_id:[7, 6, 4, 3]
Performance Comparison:
- Flat Classification (f1): 0.6470
- Hierarchical lcl+f (f1): 0.7377
```
Generated Hierarchy:
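The `branch_id` values in the level table appear consistent with SciPy-style linkage numbering, where the original classes are `0..n-1` and the i-th merge creates cluster `n + i`. With 6 classes this gives `6 = {0, 5}`, `7 = {1, 2}`, `8 = {3, 4}`, `9 = {0, 1, 2, 5}` and `10` as the root. Under that assumption, the sketch below rebuilds the level-by-level view from a linkage matrix; the merge distances in `Z` are made-up placeholders (only the tree shape matters), and `members`/`level_table` are hypothetical helpers. The ordering within a level, and whether a final all-singleton level is printed, may differ from HiGEC's own output.

```python
import numpy as np

n = 6  # number of classes
# Linkage matrix rows: [child_a, child_b, merge distance, #classes in the new cluster]
Z = np.array([
    [0, 5, 0.1, 2],  # cluster 6 = {0, 5}
    [1, 2, 0.2, 2],  # cluster 7 = {1, 2}
    [3, 4, 0.3, 2],  # cluster 8 = {3, 4}
    [6, 7, 0.5, 4],  # cluster 9 = {0, 1, 2, 5}
    [8, 9, 0.8, 6],  # cluster 10 = root
])


def members(node_id, Z, n):
    """Sorted class labels contained in a cluster id."""
    if node_id < n:
        return [node_id]
    a, b = (int(v) for v in Z[node_id - n, :2])
    return sorted(members(a, Z, n) + members(b, Z, n))


def level_table(Z, n):
    """Cluster ids per level: children of the internal nodes one level up."""
    levels, frontier = [], [2 * n - 2]  # start from the root cluster id
    while frontier:
        children = [int(c) for node in frontier for c in Z[node - n, :2]]
        levels.append(children)
        frontier = [c for c in children if c >= n]  # only internal nodes split further
    return levels


for level_id, ids in enumerate(level_table(Z, n)):
    subsets = [members(i, Z, n) for i in ids]
    print(f"level_id:{level_id}, subsets:{subsets}, branch_id:{ids}")
```

Level 0 reproduces the first line of the example table exactly, and level 1 contains the same four subsets (`{3}`, `{4}`, `{0, 5}`, `{1, 2}`) in a different order.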
📂 Project Structure
```text
higec/
├── run_higec_example.py   # Main script
├── HiGen.py               # Hierarchy generation module
├── HiCl.py                # Hierarchical classifier logic
├── utils.py               # Data loading and utility functions
├── README.md
└── requirements.txt
```