Authors:
Zhipeng Hou,
Junyi Tang,
Yipeng Wang
Contact:
japhonehou@gmail.com
If you find our work useful in your research, please consider citing HALO as follows:
@misc{hou2025halohierarchicalautonomouslogicoriented,
title={HALO: Hierarchical Autonomous Logic-Oriented Orchestration for Multi-Agent LLM Systems},
author={Zhipeng Hou and Junyi Tang and Yipeng Wang},
year={2025},
eprint={2505.13516},
archivePrefix={arXiv},
primaryClass={cs.MA},
url={https://arxiv.org/abs/2505.13516},
}
Abstract
Recent advancements in Multi-Agent Systems (MAS) powered by Large Language Models (LLMs) have demonstrated tremendous potential in diverse task scenarios. Nonetheless, existing agentic systems typically rely on predefined agent-role design spaces and static communication structures, limiting their adaptability as well as flexibility in complex interaction environments and leading to subpar performance on highly specialized and expert-level tasks. To address these issues, we introduce HALO, a multi-agent collaboration framework based on a hierarchical reasoning architecture. Specifically, we incorporate a high-level planning agent for task decomposition, mid-level role-design agents for subtask-specific agent instantiation, and low-level inference agents for subtask execution. Particularly, subtask execution is reformulated as a structured workflow search problem, where Monte Carlo Tree Search (MCTS) systematically explores the agentic action space to construct optimal reasoning trajectories. Additionally, as the majority of users lack expertise in prompt engineering, we leverage an Adaptive Prompt Refinement module to transform raw queries into task-specific prompts. Empirical evaluations on Code Generation (HumanEval), General Reasoning (MMLU), and Arithmetic Reasoning (MATH) benchmark datasets highlight the effectiveness of HALO, yielding a 14.4% average improvement over state-of-the-art baselines. Notably, HALO achieves up to 13.3% performance gain on the Moral Scenarios subject in the MMLU benchmark and up to 19.6% performance gain on the Algebra subarea in the MATH benchmark, indicating its advanced proficiency in tackling highly specialized and expert-level tasks.

Installation

conda create -n halo python=3.10
conda activate halo
pip install -r requirements.txt
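For orientation, the snippet below is a minimal, hypothetical sketch of the three-level hierarchy described in the abstract: a high-level planning agent decomposes the query, mid-level role-design agents instantiate subtask-specific agents, and low-level inference agents execute each subtask. All names and signatures are illustrative and are not the repository's actual API; the MCTS-based workflow search is collapsed into a single LLM call here for brevity.

```python
# Hypothetical sketch of HALO's three-level hierarchy (illustrative only,
# not the repository's actual implementation).
from dataclasses import dataclass
from typing import Callable, List

LLM = Callable[[str], str]  # any text-in/text-out model client


@dataclass
class Agent:
    role: str            # e.g. "Python developer", "algebra tutor"
    system_prompt: str


def plan(llm: LLM, query: str) -> List[str]:
    """High-level planning agent: split the query into ordered subtasks."""
    reply = llm(f"Decompose the following task into numbered subtasks:\n{query}")
    return [line for line in reply.splitlines() if line.strip()]


def design_agent(llm: LLM, subtask: str) -> Agent:
    """Mid-level role-design agent: instantiate a subtask-specific persona."""
    role = llm(f"Name the single best expert role for this subtask: {subtask}")
    return Agent(role=role, system_prompt=f"You are {role}. Solve the subtask rigorously.")


def execute(llm: LLM, agent: Agent, subtask: str) -> str:
    """Low-level inference agent: solve the subtask (MCTS workflow search omitted)."""
    return llm(f"{agent.system_prompt}\n\nSubtask: {subtask}")


def run_pipeline(llm: LLM, query: str) -> str:
    """End-to-end pass: plan, instantiate agents, execute, and join the results."""
    results = [execute(llm, design_agent(llm, s), s) for s in plan(llm, query)]
    return "\n".join(results)
```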
Create an api_setting.json file in the HALO/configs directory and insert the following contents (GPT-4o is recommended):
{
"endpoints": "<base_url>/chat/completions",
"api_key": "sk-xxx",
"model": "xxx"
}
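HALO loads this file itself at runtime. The snippet below is only an optional, illustrative sanity check (not part of the repository) that the endpoint and key work, assuming an OpenAI-compatible /chat/completions API and the requests library.

```python
# Optional standalone check that configs/api_setting.json is valid (illustrative only).
import json

import requests

with open("configs/api_setting.json") as f:
    cfg = json.load(f)

resp = requests.post(
    cfg["endpoints"],  # "<base_url>/chat/completions"
    headers={"Authorization": f"Bearer {cfg['api_key']}"},
    json={
        "model": cfg["model"],
        "messages": [{"role": "user", "content": "ping"}],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```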
Locate the code between the "user input begin" and "user input ended" markers in the HALO/run.py script, and set "QUERY" to the question you want to ask.
python run.py
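For reference, the user-input region typically reduces to assigning a single string. The example below is illustrative; the exact marker comments and surrounding code live in HALO/run.py.

```python
# --- user input begin ---
QUERY = "Implement a function that checks whether a string is a palindrome."
# --- user input ended ---
```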
Performance of HALO across three benchmarks.

| Method | Structure | HumanEval | MMLU | MATH | Avg. |
|---|---|---|---|---|---|
| HALO (Ours) | Hierarchical architecture + MCTS | 95.2 | 81.6 | 58.9 | 78.6 |
Ablation study of removing the Adaptive Prompt Refinement module and the high-level planning agent on GPT-4o across three benchmarks.
1.1 For Windows, modify the human-eval package script; please refer here.
python ./experiment/human_eval/run.py
2.1 Download MMLU datasets
python ./experiment/MMLU/run.py
3.1 Download MATH datasets
python ./experiment/MATH/run.py
python ./experiment/ablation_study/run_humaneval_w_o_prompt.py
python ./experiment/ablation_study/run_humaneval_w_o_task.py
python ./experiment/ablation_study/run_math_w_o_prompt.py
python ./experiment/ablation_study/run_math_w_o_task.py
python ./experiment/ablation_study/run_mmlu_w_o_prompt.py
python ./experiment/ablation_study/run_mmlu_w_o_task.py
In our experiments, data preprocessing was standardized across all tasks; for this stage we followed the preprocessing procedures of DyLAN, HumanEval, MMLU, and MATH.