NetPress is a dynamic benchmark generation framework for evaluating LLM agents on real-world network applications. It integrates with network emulators to provide realistic environment feedback and supports comprehensive evaluation across three performance metrics: correctness, safety, and latency.
The research behind NetPress is detailed in our paper:
Zhou, Y., Ruan, J., Wang, E. S., Fouladi, S., Yan, F. Y., Hsieh, K., & Liu, Z. (2025).
NetPress: Dynamically Generated LLM Benchmarks for Network Applications. arXiv preprint arXiv:2506.03231. [paper](https://arxiv.org/abs/2506.03231)
Before you begin, make sure you have:
- Conda package manager
- A working Python installation
- Set up the required Conda environments:

  ```bash
  # Create the Mininet environment (for the Route and K8s applications)
  conda env create -f environment_mininet.yml

  # Create the AI Gym environment (for the Malt application)
  conda env create -f environment_ai_gym.yml
  ```
- Activate the AI Gym environment and install the additional dependencies:

  ```bash
  conda activate ai_gym_env
  pip install -r ai_gym_requirement.txt
  ```

  You can confirm that both environments were created with `conda env list`.
Execute the following commands to run the benchmark for each application (activate the matching Conda environment first):

```bash
cd experiments
./run_app_malt.sh    # Malt application (uses the AI Gym environment)
./run_app_route.sh   # Route application (uses the Mininet environment)
./run_app_k8s.sh    # K8s application (uses the Mininet environment)
```
For comprehensive testing instructions, please refer to the following guides:
- Capacity Planning (CP) Application Guide
- Routing Application Guide
- Kubernetes (K8s) Application Guide
Our evaluation framework measures three key dimensions (see the illustrative sketch after this list):
- Correctness: evaluates whether the LLM agent produces an accurate solution for each network query.
- Safety: assesses whether the LLM agent adheres to safety rules and constraints during deployment.
- Latency: measures the LLM agent's response time when solving each query.
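As an illustration, here is a minimal sketch of how these three metrics could be aggregated from per-query results. The `QueryResult` structure, its field names, and the `aggregate` helper are hypothetical placeholders, not NetPress's actual API:

```python
from dataclasses import dataclass

# Hypothetical per-query record; NetPress's real result format may differ.
@dataclass
class QueryResult:
    correct: bool           # did the answer pass the correctness check?
    safety_violations: int  # safety-rule violations observed during deployment
    latency_s: float        # wall-clock time the agent took to respond

def aggregate(results: list[QueryResult]) -> dict[str, float]:
    """Roll per-query outcomes up into the three benchmark metrics."""
    n = len(results)
    return {
        # Correctness: fraction of queries answered correctly
        "correctness": sum(r.correct for r in results) / n,
        # Safety: fraction of queries completed with zero violations
        "safety": sum(r.safety_violations == 0 for r in results) / n,
        # Latency: mean response time per query, in seconds
        "latency_s": sum(r.latency_s for r in results) / n,
    }

# Example with three made-up query outcomes
results = [
    QueryResult(correct=True, safety_violations=0, latency_s=4.2),
    QueryResult(correct=False, safety_violations=1, latency_s=9.8),
    QueryResult(correct=True, safety_violations=0, latency_s=5.1),
]
print(aggregate(results))  # {'correctness': 0.66..., 'safety': 0.66..., 'latency_s': 6.36...}
```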
A guide for adding new network applications is coming soon.
For questions or support, please:
- Open an issue on GitHub
- Contact us directly at leszhou@umd.edu