
Froot-NetSys/NetPress


NetPress: Dynamically Generated LLM Benchmarks for Network Applications

Overview

NetPress is a dynamic benchmark generation framework for evaluating LLM agents on real-world network applications. It integrates with network emulators to provide realistic environment feedback and evaluates agents on three performance metrics: correctness, safety, and latency.

Paper

The research behind NetPress is detailed in our paper:
Zhou, Y., Ruan, J., Wang, E. S., Fouladi, S., Yan, F. Y., Hsieh, K., & Liu, Z. (2025). NetPress: Dynamically Generated LLM Benchmarks for Network Applications. arXiv preprint arXiv:2506.03231.

Prerequisites

  • Conda package manager
  • Python environment

Installation

  1. Set up the required Conda environments:
# Create Mininet environment (for Route and K8s applications)
conda env create -f environment_mininet.yml

# Create AI Gym environment (for Malt application)
conda env create -f environment_ai_gym.yml
  2. Activate the AI Gym environment and install additional dependencies:
conda activate ai_gym_env
pip install -r ai_gym_requirement.txt
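Running conda env create again for an environment that already exists does not simply reuse it (depending on the Conda version it errors out or asks to overwrite). The following is a minimal sketch with hypothetical helper names, not part of NetPress, that reads the environment name from each YAML file's top-level name: entry (the usual Conda environment-file convention, assumed here) and skips creation when conda env list already shows it:

```bash
# Hypothetical helpers (not part of NetPress): create each Conda environment
# only if it is missing. Assumes each YAML file declares a top-level "name:" line.

# Print the value of the top-level "name:" key in a Conda environment file.
env_name_from_yaml() {
  awk -F': *' '/^name:/ { print $2; exit }' "$1"
}

# Succeed if `conda env list` already shows an environment with this name.
env_exists() {
  conda env list | awk -v n="$1" '$1 == n { found = 1 } END { exit !found }'
}

# Create the environment described by a YAML file unless it already exists.
create_if_missing() {
  local yml="$1" name
  name="$(env_name_from_yaml "$yml")"
  if [ -z "$name" ]; then
    echo "no name: entry in $yml" >&2
    return 1
  fi
  if env_exists "$name"; then
    echo "skip: $name already exists"
  else
    conda env create -f "$yml"
  fi
}
```

Usage: define the functions in a shell at the repository root, then run create_if_missing environment_mininet.yml and create_if_missing environment_ai_gym.yml in place of the two conda env create commands.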

Quick Start

Execute the following commands to run the benchmark for each application:

cd experiments
./run_app_malt.sh
./run_app_route.sh
./run_app_k8s.sh
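Each script runs the benchmark for one application. To run several in one pass without stopping at the first failure, a small wrapper like the hypothetical one below can help. It is only a sketch: it assumes the scripts are executable (as the ./ invocations above imply) and it does not activate any Conda environment. Because Malt uses the AI Gym environment while Route and K8s use the Mininet environment, you may need to call it once per activated environment.

```bash
# Hypothetical wrapper (not part of NetPress): run benchmark scripts in
# sequence, keep going past failures, and report which ones failed.
# It does not activate any Conda environment; do that beforehand.
run_benchmarks() {
  local script failed=""
  for script in "$@"; do
    echo "==> $script"
    if ! "$script"; then
      failed="$failed $script"
    fi
  done
  if [ -n "$failed" ]; then
    echo "failed:$failed" >&2
    return 1
  fi
  echo "all benchmarks completed"
}
```

Usage: from experiments/ with the Mininet environment active, run_benchmarks ./run_app_route.sh ./run_app_k8s.sh; then, with ai_gym_env active, run_benchmarks ./run_app_malt.sh.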

Detailed Application Guides

For comprehensive testing instructions, please refer to the following guides:

Results Analysis

Performance Metrics

Our evaluation framework measures three key dimensions:

  • Correctness: Evaluates whether the LLM agent produces an accurate solution for each network query.
  • Safety: Assesses whether the LLM agent adheres to safety rules and constraints during deployment.
  • Latency: Measures how long the LLM agent takes to solve each query.
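As an illustration of how per-query outcomes could roll up into these three metrics, here is a minimal sketch. The README does not describe NetPress's result format, so the CSV layout (query_id,correct,safe,latency_s, with 0/1 flags and latency in seconds), the function name, and the choice to report each metric as a per-query average are all assumptions.

```bash
# Hypothetical helper: roll per-query results up into the three metrics.
# Assumed input (not a documented NetPress format): a CSV with the header
#   query_id,correct,safe,latency_s
# where correct and safe are 0/1 flags and latency_s is in seconds.
summarize_metrics() {
  awk -F',' '
    NR > 1 { n++; correct += $2; safe += $3; latency += $4 }
    END {
      if (n == 0) { print "no results in input" > "/dev/stderr"; exit 1 }
      # Correctness and safety as the fraction of queries passing; latency as a mean.
      printf "queries=%d correctness=%.3f safety=%.3f mean_latency_s=%.3f\n",
             n, correct / n, safe / n, latency / n
    }' "$1"
}
```

For a file with four queries, three correct, two safe, and latencies of 2, 4, 3, and 1 seconds, this prints queries=4 correctness=0.750 safety=0.500 mean_latency_s=2.500.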

Statistical Analysis

  • Confidence interval comparisons between different agents

  • Comprehensive breakdown analysis of performance metrics
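The README does not say how the confidence intervals are computed. Purely as an illustration, the sketch below uses the normal (Wald) approximation for a success rate such as correctness, p ± 1.96·sqrt(p(1 - p)/n), under a hypothetical function name; NetPress's own analysis may use a different method, and the Wald interval is unreliable for small n or for rates near 0 or 1.

```bash
# Hypothetical helper: 95% confidence interval for a success rate such as
# correctness, using the normal (Wald) approximation p +/- 1.96*sqrt(p(1-p)/n).
# Usage: proportion_ci <successes> <trials>
proportion_ci() {
  awk -v k="$1" -v n="$2" 'BEGIN {
    if (n <= 0) { print "trials must be positive" > "/dev/stderr"; exit 1 }
    p = k / n
    half = 1.96 * sqrt(p * (1 - p) / n)
    lo = p - half; hi = p + half
    # Clip to [0, 1], since a rate cannot leave that range.
    if (lo < 0) lo = 0
    if (hi > 1) hi = 1
    printf "%.3f [%.3f, %.3f]\n", p, lo, hi
  }'
}
```

For example, an agent that solves 80 of 100 queries gets 0.800 [0.722, 0.878]; computing the same interval for a second agent gives a rough visual comparison of the two.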

Contributing

A guide for adding new network applications is coming soon.

Contact

For questions or support, please:
