8000 docs: sim and bench docs by jmatejcz · Pull Request #598 · RobotecAI/rai · GitHub

docs: sim and bench docs #598

Merged: 7 commits, May 29, 2025
Binary file added docs/imgs/manipulation_benchmark.png
Binary file added docs/imgs/tool_calling_agent_benchmark.png
7 changes: 4 additions & 3 deletions docs/simulation_and_benchmarking/overview.md
Expand Up @@ -10,21 +10,22 @@ RAI Sim provides a simulator-agnostic interface that allows RAI to work with any
- Easy integration with new simulators
- Seamless switching between simulation backends

The package also provides simulator bridges for concrete simulators, currently supporting only O3DE.
For detailed information about the simulation interface, see [RAI Sim Documentation](rai_sim.md).

## RAI Bench

RAI Bench builds on top of RAI Sim to provide a framework for creating and running benchmarks. It uses the simulator-agnostic interface to:
RAI Bench provides benchmarks with ready-to-use tasks and a framework for creating your own tasks. It enables you to:

- Define and execute tasks in any supported simulator
- Define and execute tasks
- Measure and evaluate performance
- Collect and analyze results

For detailed information about the benchmarking framework, see [RAI Bench Documentation](rai_bench.md).

## Integration

RAI Sim and RAI Bench work together to provide a complete simulation and evaluation environment:
RAI Sim and RAI Bench work together to provide benchmarks which utilize simulations for evaluation:

1. **Simulation Interface**: RAI Sim provides the foundation with its simulator-agnostic interface
2. **Task Definition**: RAI Bench defines tasks that can be executed in any supported simulator
Expand Down
164 changes: 77 additions & 87 deletions d 8000 ocs/simulation_and_benchmarking/rai_bench.md
@@ -1,146 +1,136 @@
# RAI Bench

RAI Bench is a framework for creating and running benchmarks in simulation environments. It builds on top of RAI Sim to provide a structured way to define tasks, scenarios, and evaluate performance.
RAI Bench is a comprehensive package that both provides benchmarks with ready-to-use tasks and offers a framework for creating new tasks. It's designed to evaluate the performance of AI agents in various environments.

## Core Components
### Available Benchmarks

### Task
- [Manipulation O3DE Benchmark](#manipulation-o3de-benchmark)
- [Tool Calling Agent Benchmark](#tool-calling-agent-benchmark)

## Manipulation O3DE Benchmark

The `Task` class is an abstract base class that defines the interface for benchmark tasks. Each task must implement:
Evaluates agent performance in robotic arm manipulation tasks within the O3DE simulation environment. The benchmark measures how well agents can process sensor data and use tools to manipulate objects in the environment.

- `get_prompt()`: Returns the task instruction for the agent
- `validate_config()`: Verifies if a simulation configuration is suitable for the task
- `calculate_result()`: Computes the task score (0.0 to 1.0)
### Framework Components

### ManipulationTask
Manipulation O3DE Benchmark provides a framework for creating custom tasks and scenarios with these core components:

A specialized `Task` class for manipulation tasks that provides common functionality for:
![Manipulation Benchmark Framework](../imgs/manipulation_benchmark.png)

### Task

- Object type filtering
- Placement validation
- Score calculation based on object positions
The `Task` class is an abstract base class that defines the interface for tasks used in this benchmark.
Each concrete Task must implement:

- prompts that will be passed to the agent
- validation of simulation configurations
- calculating results based on scene state
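To make the interface concrete, here is a minimal sketch of what a task might look like. The class, method, and field names are illustrative stand-ins for the real `rai_bench` API, not a copy of it:

```python
from abc import ABC, abstractmethod
from typing import Any


class Task(ABC):
    """Illustrative stand-in for the benchmark's Task interface."""

    @abstractmethod
    def get_prompt(self) -> str:
        """Instruction passed to the agent."""

    @abstractmethod
    def validate_config(self, simulation_config: Any) -> bool:
        """Check whether a scene configuration suits this task."""

    @abstractmethod
    def calculate_result(self, scene_state: Any) -> float:
        """Score the final scene state on a 0.0-1.0 scale."""


class MoveCubesLeftTask(Task):
    """Toy task: every cube should end up with a positive y coordinate."""

    def get_prompt(self) -> str:
        return "Move all cubes to the left side of the table."

    def validate_config(self, simulation_config: Any) -> bool:
        # The scene must contain at least one cube to be meaningful.
        return any(e["type"] == "cube" for e in simulation_config["entities"])

    def calculate_result(self, scene_state: Any) -> float:
        cubes = [e for e in scene_state["entities"] if e["type"] == "cube"]
        if not cubes:
            return 0.0
        return sum(1 for c in cubes if c["y"] > 0) / len(cubes)
```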

### Scenario

A `Scenario` represents a specific test case combining:

- A task to be executed
- A simulation configuration
- The path to the configuration file

### Benchmark
### ManipulationO3DEBenchmark

The `Benchmark` class manages the execution of scenarios and collects results. It provides:
The `ManipulationO3DEBenchmark` class manages the execution of scenarios and collects results. It provides:

- Scenario execution management
- Performance metrics tracking
- Results logging and export
- Logs and results
- The required robotic stack, provided as a `LaunchDescription`

## Available Tasks
### Available Tasks

The framework includes several predefined manipulation tasks:
The benchmark includes several predefined manipulation tasks:

1. **MoveObjectsToLeftTask**
1. **MoveObjectsToLeftTask** - Move specified objects to the left side of the table

- Moves specified objects to the left side of the table
- Success measured by objects' y-coordinate being positive
2. **PlaceObjectAtCoordTask** - Place specified objects at specific coordinates

2. **PlaceObjectAtCoordTask**
3. **PlaceCubesTask** - Place specified cubes adjacent to each other

- Places an object at specific coordinates
- Success measured by distance from target position
4. **BuildCubeTowerTask** - Stack specified cubes to form a tower

3. **PlaceCubesTask**
5. **GroupObjectsTask** - Group specified objects of specified types together

- Places cubes adjacent to each other
- Success measured by proximity to other cubes
Tasks are parametrizable so you can configure which objects should be manipulated and how much precision is needed to complete a task.

4. **BuildCubeTowerTask**
Tasks are scored on a scale from 0.0 to 1.0, where:

- Stacks cubes to form a tower
- Success measured by height and stability
- 0.0 indicates no improvement over the initial placement (or a worse one)
- 1.0 indicates perfect completion

5. **GroupObjectsTask**
The score is typically calculated as:

- Groups objects of specified types together
- Success measured by object proximity
```
score = (correctly_placed_now - correctly_placed_initially) / initially_incorrect
```
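The formula above can be checked with a small worked example; clamping negative scores to 0.0 is an assumption for illustration:

```python
def manipulation_score(correct_now: int, correct_initially: int, total: int) -> float:
    """Fraction of initially misplaced objects that ended up correctly placed."""
    initially_incorrect = total - correct_initially
    if initially_incorrect == 0:
        return 1.0  # nothing was misplaced to begin with
    return max(0.0, (correct_now - correct_initially) / initially_incorrect)

# e.g. 1 of 5 objects starts correctly placed and the agent finishes with 3:
# (3 - 1) / 4 = 0.5
```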

## Usage
### Available Scene Configs and Scenarios

### Creating Scenarios
You can find predefined scene configs in `rai_bench/manipulation_o3de/predefined/configs/`.

Scenarios can be created manually:
Predefined scenarios can be imported like:

```python
scenario = Scenario(
task=MoveObjectsToLeftTask(obj_types=["cube"]),
simulation_config=simulation_config,
simulation_config_path="path/to/config.yaml"
)
from rai_bench.manipulation_o3de import get_scenarios

get_scenarios(levels=["easy", "medium"])
```

Or automatically using the `Benchmark.create_scenarios()` method:
Choose which tasks you want by selecting difficulty levels, which range from trivial to very hard.

```python
scenarios = Benchmark.create_scenarios(
tasks=tasks,
simulation_configs=configs,
simulation_configs_paths=config_paths
)
```
## Tool Calling Agent Benchmark

### Running Benchmarks
Evaluates agent performance independently of any simulation, based only on the tool calls that the agent makes. To stay simulation-independent, this benchmark introduces tool mocks which can be adjusted for different tasks. This makes the benchmark more universal and significantly faster.

```python
benchmark = Benchmark(
simulation_bridge=bridge,
scenarios=scenarios,
results_filename="results.csv"
)
```
### Framework Components

## Scoring System
![Tool Calling Benchmark Framework](../imgs/tool_calling_agent_benchmark.png)

Tasks are scored on a scale from 0.0 to 1.0, where:
### SubTask

- 0.0 indicates no improvement or worse performance
- 1.0 indicates perfect completion
The `SubTask` class validates a single tool call. The following classes are available:

The score is typically calculated as:
- `CheckArgsToolCallSubTask` - verify that a certain tool was called with the expected arguments
- `CheckTopicFieldsToolCallSubTask` - verify that a message published to a ROS 2 topic was of the proper type and included the expected fields
- `CheckServiceFieldsToolCallSubTask` - verify that a message sent to a ROS 2 service was of the proper type and included the expected fields
- `CheckActionFieldsToolCallSubTask` - verify that a goal sent to a ROS 2 action was of the proper type and included the expected fields

```
score = (correctly_placed_now - correctly_placed_initially) / initially_incorrect
```
### Validator

The `Validator` class combines one or more subtasks into a single validation step. The following validators are available:

- `OrderedCallsValidator` - requires a strict order of subtasks: the next subtask is validated only after the previous one has been completed. The validator passes when all subtasks pass.
- `NotOrderedCallsValidator` - does not enforce an order of subtasks; every subtask is validated against every tool call. The validator passes when all subtasks pass.
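A self-contained sketch of the subtask/validator pattern, using simplified stand-ins for the real classes; whether the real ordered validator tolerates unrelated calls between matches is an assumption made here for illustration:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToolCall:
    name: str
    args: Dict[str, object]


@dataclass
class CheckArgsSubTask:
    """Passes when a tool call matches the expected name and arguments."""
    expected_tool: str
    expected_args: Dict[str, object]

    def validate(self, call: ToolCall) -> bool:
        return call.name == self.expected_tool and call.args == self.expected_args


def ordered_calls_validator(subtasks: List[CheckArgsSubTask],
                            calls: List[ToolCall]) -> bool:
    """Subtasks must be satisfied in order; unrelated calls in between are skipped."""
    idx = 0
    for call in calls:
        if idx < len(subtasks) and subtasks[idx].validate(call):
            idx += 1
    return idx == len(subtasks)
```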

### Task

## Integration with RAI Sim
A `Task` represents a specific prompt and a set of available tools. A list of validators is assigned to validate performance.

RAI Bench leverages RAI Sim's simulator-agnostic interface to:
??? info "Task class definition"

- Execute tasks in any supported simulation environment
- Access and manipulate simulation entities
- Monitor scene state and object positions
- Manage simulation configurations
::: rai_bench.tool_calling_agent.interfaces.Task

This integration allows for:
The framework is very flexible: any `SubTask` can be combined into any `Validator`, which can later be assigned to any `Task`.

- Consistent task execution across different simulators
- Reliable performance measurement
- Flexible task definition
- Comprehensive result analysis
### ToolCallingAgentBenchmark

## Configuration
The `ToolCallingAgentBenchmark` class manages the execution of tasks and collects results.

Simulation configurations are defined in YAML files that specify:
### Available Tasks

- Scene setup
- Object types and positions
- Task-specific parameters
Tasks of this benchmark are grouped by type:

## Error Handling
- Basic - basic usage of tools
- Navigation
- Spatial reasoning - questions about surroundings with images attached
- Manipulation
- Custom Interfaces - requires using messages with custom interfaces

The framework includes comprehensive error handling for:
For details about each task, see `rai_bench/tool_calling_agent/tasks`.

- Invalid configurations
- Task validation failures
- Simulation errors
- Performance tracking
## Test Models
20 changes: 13 additions & 7 deletions docs/simulation_and_benchmarking/rai_sim.md
Expand Up @@ -13,16 +13,13 @@ The `SimulationBridge` is an abstract base class that defines the interface for
- Object pose retrieval
- Scene state monitoring

### SimulationConfig
### SceneConfig

The `SimulationConfig` is a base configuration class that specifies the entities to be spawned in the simulation. Each simulation bridge can extend this with additional parameters specific to its implementation.
The `SceneConfig` is a configuration class that specifies the entities to be spawned in the simulation.

Key features:
### SimulationConfig

- Entity list management
- Unique name validation
- YAML configuration loading
- Frame ID specification
The `SimulationConfig` is an abstract configuration class. Each simulation bridge can extend this with additional parameters specific to its implementation.
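A rough sketch of the configuration side, with illustrative field names; the unique-name check mirrors the validation described in the docs, while the real classes also handle YAML loading:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    name: str
    entity_type: str
    # Pose fields trimmed to a bare minimum for the sketch.
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0


@dataclass
class SceneConfig:
    """Entities to spawn; names must be unique within a scene."""
    entities: List[Entity] = field(default_factory=list)

    def __post_init__(self) -> None:
        names = [e.name for e in self.entities]
        if len(names) != len(set(names)):
            raise ValueError("entity names must be unique")
```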

### SceneState

Expand Down Expand Up @@ -54,6 +51,7 @@ To use RAI Sim with a specific simulation environment:
1. Create a custom `SimulationBridge` implementation for your simulator
2. Extend `SimulationConfig` with simulator-specific parameters
3. Implement the required abstract methods:
- `init_simulation`
- `setup_scene`
- `_spawn_entity`
- `_despawn_entity`
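The steps above can be sketched as a skeleton; the method names follow the docs, but the bodies and the toy in-memory bridge are illustrative (the real abstract interface may include more methods):

```python
from abc import ABC, abstractmethod


class SimulationBridge(ABC):
    """Illustrative skeleton of a simulator bridge."""

    @abstractmethod
    def init_simulation(self, simulation_config) -> None: ...

    @abstractmethod
    def setup_scene(self, scene_config) -> None: ...

    @abstractmethod
    def _spawn_entity(self, entity) -> None: ...

    @abstractmethod
    def _despawn_entity(self, entity) -> None: ...


class InMemoryBridge(SimulationBridge):
    """Toy bridge that tracks spawned entities in a dict instead of a simulator."""

    def __init__(self):
        self.entities = {}

    def init_simulation(self, simulation_config) -> None:
        self.entities.clear()

    def setup_scene(self, scene_config) -> None:
        for entity in scene_config:
            self._spawn_entity(entity)

    def _spawn_entity(self, entity) -> None:
        self.entities[entity["name"]] = entity

    def _despawn_entity(self, entity) -> None:
        self.entities.pop(entity["name"], None)
```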
Expand Down Expand Up @@ -100,3 +98,11 @@ RAI Sim serves as the foundation for RAI Bench by providing:
- Configuration management

This allows RAI Bench to focus on task definition and evaluation while remaining simulator-agnostic.

## LaunchManager

RAI Sim also provides a `ROS2LaunchManager` class that manages the startup and shutdown of a ROS 2 `LaunchDescription`.

??? info "ROS2LaunchManager class definition"

::: rai_sim.launch_manager.ROS2LaunchManager
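A toy sketch of the start/shutdown lifecycle such a class manages; the method names and the context-manager wrapper are assumptions made for illustration, not the real `ROS2LaunchManager` API:

```python
class LaunchManagerSketch:
    """Toy lifecycle manager mirroring a start/shutdown pattern (not the real API)."""

    def __init__(self, launch_description):
        self.launch_description = launch_description
        self.running = False

    def start(self) -> None:
        # A real manager would hand the description to the ROS 2 launch service here.
        self.running = True

    def shutdown(self) -> None:
        self.running = False

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, *exc) -> None:
        # Tears the stack down even if a benchmark scenario fails mid-run.
        self.shutdown()
```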