HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems



Authors: Zhipeng Hou, Junyi Tang, Yipeng Wang
Contact: japhonehou@gmail.com


📌Citation

If you find our work useful in your research, please consider citing HALO as follows:

@misc{hou2025halohierarchicalautonomouslogicoriented,
      title={HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems}, 
      author={Zhipeng Hou and Junyi Tang and Yipeng Wang},
      year={2025},
      eprint={2505.13516},
      archivePrefix={arXiv},
      primaryClass={cs.MA},
      url={https://arxiv.org/abs/2505.13516}, 
}

📋️Overview

*Overview figure: HALO functions as a three-stage paradigm.*

**Abstract** Recent advancements in Multi-Agent Systems (MAS) powered by Large Language Models (LLMs) have demonstrated tremendous potential in diverse task scenarios. Nonetheless, existing agentic systems typically rely on predefined agent-role design spaces and static communication structures, limiting their adaptability as well as flexibility in complex interaction environments and leading to subpar performance on highly specialized and expert-level tasks. To address these issues, we introduce HALO, a multi-agent collaboration framework based on a hierarchical reasoning architecture. Specifically, we incorporate a high-level planning agent for task decomposition, mid-level role-design agents for subtask-specific agent instantiation, and low-level inference agents for subtask execution. Particularly, subtask execution is reformulated as a structured workflow search problem, where Monte Carlo Tree Search (MCTS) systematically explores the agentic action space to construct optimal reasoning trajectories. Additionally, as the majority of users lack expertise in prompt engineering, we leverage an Adaptive Prompt Refinement module to transform raw queries into task-specific prompts. Empirical evaluations on Code Generation (HumanEval), General Reasoning (MMLU), and Arithmetic Reasoning (MATH) benchmark datasets highlight the effectiveness of HALO, yielding a 14.4% average improvement over state-of-the-art baselines. Notably, HALO achieves up to 13.3% performance gain on the Moral Scenarios subject in the MMLU benchmark and up to 19.6% performance gain on the Algebra subarea in the MATH benchmark, indicating its advanced proficiency in tackling highly specialized and expert-level tasks.
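The workflow-search idea can be illustrated with a deliberately simplified, single-level Monte Carlo search using UCT selection. HALO's actual MCTS expands multi-step agentic trajectories; the sketch below only shows the exploration/exploitation mechanic, and all candidate names and reward functions are illustrative, not taken from this repository:

```python
import math

def uct(mean, n_child, n_parent, c=1.41):
    """UCT score: average observed reward plus an exploration bonus."""
    return mean + c * math.sqrt(math.log(n_parent) / n_child)

def search_workflow(candidates, reward_fn, iterations=200):
    """Flat Monte Carlo search over candidate workflows.

    Each candidate is scored by repeated rollouts via reward_fn;
    UCT balances replaying promising candidates against exploring
    under-sampled ones. Returns the most-visited candidate.
    """
    visits = {w: 0 for w in candidates}
    total = {w: 0.0 for w in candidates}
    for t in range(1, iterations + 1):
        # Try every candidate once before applying UCT.
        unvisited = [w for w in candidates if visits[w] == 0]
        if unvisited:
            choice = unvisited[0]
        else:
            choice = max(candidates,
                         key=lambda w: uct(total[w] / visits[w], visits[w], t))
        total[choice] += reward_fn(choice)  # rollout: execute and score the workflow
        visits[choice] += 1
    return max(candidates, key=lambda w: visits[w])
```

In HALO, the role of `reward_fn` is played by executing a partial workflow and evaluating the outcome; here it is a stand-in callable.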

⚙️Installation

1. Create a virtual environment (recommended)

conda create -n halo python=3.10
conda activate halo

2. Install packages

pip install -r requirements.txt

🔛Quick Start

1. Create the API config file

Create an `api_setting.json` file in the `HALO/configs` directory with the following contents (GPT-4o is recommended):

{
    "endpoints": "<base_url>/chat/completions",
    "api_key": "sk-xxx",
    "model": "xxx"
}
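As a sketch of how such a config might be consumed (the field names follow the JSON above; the request shape is the common OpenAI-style chat-completions payload, which the repository's actual request code may or may not follow exactly):

```python
import json
from pathlib import Path

def load_api_config(path="configs/api_setting.json"):
    """Load the endpoint settings created in step 1 and check required fields."""
    cfg = json.loads(Path(path).read_text())
    for key in ("endpoints", "api_key", "model"):
        if key not in cfg:
            raise KeyError(f"api_setting.json is missing required field: {key}")
    return cfg

def build_chat_request(cfg, prompt):
    """Assemble an OpenAI-style chat-completions request (assumed convention)."""
    headers = {
        "Authorization": f"Bearer {cfg['api_key']}",
        "Content-Type": "application/json",
    }
    body = {
        "model": cfg["model"],
        "messages": [{"role": "user", "content": prompt}],
    }
    return cfg["endpoints"], headers, body
```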

2. Modify user query (optional)

Locate the code between the "user input begin" and "user input ended" markers in the HALO/run.py script, and set QUERY to the question you want to ask.
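HALO/run.py is not reproduced here; the block you are looking for presumably resembles the sketch below. The marker comments and the QUERY variable come from the README's description; everything else is illustrative:

```python
# ---------------- user input begin ----------------
# Edit only this variable; the surrounding layout shown here is a
# sketch, not the exact contents of HALO/run.py.
QUERY = "Write a Python function that returns the n-th Fibonacci number."
# ---------------- user input ended ----------------

def main(query: str) -> None:
    # Hypothetical entry point: the raw query would be handed to HALO's
    # Adaptive Prompt Refinement module before planning begins.
    print(f"Raw user query: {query}")

main(QUERY)
```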

3. Run script (make sure to run in the HALO directory)

python run.py

📝Experiments

Performance of HALO across three benchmarks. Metrics are $pass@1$ (%) for HumanEval, $accuracy$ (%) for MMLU and MATH, and $Avg.$ (%), the mean over the three benchmarks; each number is averaged over three runs. All methods are executed with GPT-4o.

| Method | Structure | HumanEval | MMLU | MATH | Avg. |
| --- | --- | --- | --- | --- | --- |
| HALO (Ours) | Hierarchical architecture + MCTS | 95.2 | 81.6 | 58.9 | 78.6 |
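The HumanEval column reports $pass@1$. For reference, the standard unbiased $pass@k$ estimator used by the HumanEval benchmark can be computed as follows (a generic reference implementation, not code from this repository):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator.

    Probability that at least one of k samples drawn without replacement
    from n generated solutions (c of them correct) passes:
        pass@k = 1 - C(n - c, k) / C(n, k)
    """
    if n - c < k:
        return 1.0  # too few failures to fill k samples: one must pass
    return 1.0 - comb(n - c, k) / comb(n, k)
```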

Ablation study of removing the Adaptive Prompt Refinement module and the high-level planning agent on GPT-4o across three benchmarks.


1. HumanEval

1.1 On Windows, the human-eval package script needs a small modification; please refer here

1.2 Run script (make sure to run in the HALO directory)

python ./experiment/human_eval/run.py

1.3 View the results in the HALO/experiment/human_eval/results directory

2. MMLU

2.1 Download MMLU datasets

2.2 Unzip the datasets, rename the extracted folder to MMLU_data, and move it into the HALO/experiment/MMLU directory

2.3 Run script (make sure to run in the HALO directory)

python ./experiment/MMLU/run.py

2.4 View the results in the HALO/experiment/MMLU/results directory
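The layout of the files under results is not documented here. Assuming each run records one entry per question with a boolean correctness flag (an assumed format; HALO's actual results files may differ), the reported accuracy metric aggregates as:

```python
def accuracy(records: list) -> float:
    """Fraction of records marked correct.

    `records` is a list of dicts with a boolean "correct" field; this
    schema is an assumption for illustration only.
    """
    if not records:
        return 0.0
    return sum(1 for r in records if r.get("correct")) / len(records)
```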

3. MATH

3.1 Download MATH datasets

3.2 Unzip the datasets, rename the extracted folder to MATH_data, and move it into the HALO/experiment/MATH directory

3.3 Run script (make sure to run in the HALO directory)

python ./experiment/MATH/run.py

3.4 View the results in the HALO/experiment/MATH/results directory

4. Ablation study

4.1 Run script (make sure to run in the HALO directory)

python ./experiment/ablation_study/run_humaneval_w_o_prompt.py
python ./experiment/ablation_study/run_humaneval_w_o_task.py
python ./experiment/ablation_study/run_math_w_o_prompt.py
python ./experiment/ablation_study/run_math_w_o_task.py
python ./experiment/ablation_study/run_mmlu_w_o_prompt.py
python ./experiment/ablation_study/run_mmlu_w_o_task.py

4.2 View the results in the HALO/experiment/ablation_study/results directory

🎀Acknowledgement

In our experiments, data preprocessing was standardized across all tasks; at this stage we drew on DyLAN, HumanEval, MMLU, and MATH.

