NetPress is a dynamic benchmark generation framework for evaluating LLM agents on real-world network applications. It integrates with network emulators to provide realistic environment feedback and supports comprehensive evaluation across three performance metrics: correctness, safety, and latency.
The research behind NetPress is detailed in our paper:
Zhou, Y., Ruan, J., Wang, E. S., Fouladi, S., Yan, F. Y., Hsieh, K., & Liu, Z. (2025).
NetPress: Dynamically Generated LLM Benchmarks for Network Applications. arXiv preprint arXiv:2506.03231. [paper](https://arxiv.org/abs/2506.03231)
Before you begin, make sure you have:
- Conda package manager
- A working Python installation
- Set up the required Conda environments:

  ```bash
  # Create the Mininet environment (for the Route and K8s applications)
  conda env create -f environment_mininet.yml

  # Create the AI Gym environment (for the Malt application)
  conda env create -f environment_ai_gym.yml
  ```
- Activate the AI Gym environment and install the additional dependencies:

  ```bash
  conda activate ai_gym_env
  pip install -r ai_gym_requirement.txt
  ```

  You can confirm that both environments were created with `conda env list`.
Execute the following commands to run the benchmark for each application (activate the matching Conda environment first):

```bash
cd experiments
./run_app_malt.sh    # Malt application (uses the AI Gym environment)
./run_app_route.sh   # Route application (uses the Mininet environment)
./run_app_k8s.sh    # K8s application (uses the Mininet environment)
```
For comprehensive testing instructions, please refer to the following guides:
- Capacity Planning (CP) Application Guide
- Routing Application Guide
- Kubernetes (K8s) Application Guide
Our evaluation framework measures three key dimensions (see the illustrative sketch after this list):
- Correctness: evaluates whether the LLM agent produces an accurate solution for each network query.
- Safety: assesses whether the LLM agent adheres to safety rules and constraints during deployment.
- Latency: measures the LLM agent's response time when solving each query.
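As an illustration, here is a minimal sketch of how these three metrics could be aggregated from per-query results. The `QueryResult` structure, its field names, and the `aggregate` helper are hypothetical placeholders, not NetPress's actual API:

```python
from dataclasses import dataclass

# Hypothetical per-query record; NetPress's real result format may differ.
@dataclass
class QueryResult:
    correct: bool           # did the answer pass the correctness check?
    safety_violations: int  # safety-rule violations observed during deployment
    latency_s: float        # wall-clock time the agent took to respond

def aggregate(results: list[QueryResult]) -> dict[str, float]:
    """Roll per-query outcomes up into the three benchmark metrics."""
    n = len(results)
    return {
        # Correctness: fraction of queries answered correctly
        "correctness": sum(r.correct for r in results) / n,
        # Safety: fraction of queries completed with zero violations
        "safety": sum(r.safety_violations == 0 for r in results) / n,
        # Latency: mean response time per query, in seconds
        "latency_s": sum(r.latency_s for r in results) / n,
    }

# Example with three made-up query outcomes
results = [
    QueryResult(correct=True, safety_violations=0, latency_s=4.2),
    QueryResult(correct=False, safety_violations=1, latency_s=9.8),
    QueryResult(correct=True, safety_violations=0, latency_s=5.1),
]
print(aggregate(results))  # {'correctness': 0.66..., 'safety': 0.66..., 'latency_s': 6.36...}
```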
A guide for adding new network applications is coming soon.
For questions or support, please:
- Open an issue on GitHub
- Contact us directly at leszhou@umd.edu