How can I add termination condition to legged robot environment? #575

DINHQuangDung1999 · 2025-03-14T12:07:04Z

Hi everyone,

I am building an environment for quadruped robots in Warp, with the goal of deploying policy learning algorithms (BPTT/RL) on it. To do that, I need to define termination conditions, which I am struggling with. At the moment, the code uses a CUDA graph to capture the forward loop. My current approach is simply to update the attribute self._terminated iteratively inside the graph, but this is troublesome and does not work. I looked at the Warp examples and did not find anything helpful. Can you help me with this? Below is my code.

import math
import os
import warp as wp
import warp.examples
import warp.sim
import warp.sim.render

@wp.kernel
def check_healthy_ker(joint_q: wp.array(dtype=wp.float32),
                      healthy_z_range: wp.array(dtype=wp.float32),
                      healthy_roll_range: wp.array(dtype=wp.float32),
                      healthy_pitch_range: wp.array(dtype=wp.float32),
                      is_healthy: wp.array(dtype=wp.int32)):

    # Unhealthy if any of the 19 generalized coordinates is NaN or infinite.
    for i in range(19):
        if not wp.isfinite(joint_q[i]):
            is_healthy[0] = 0

    # Base orientation: joint_q[3:7] holds the quaternion of the floating base.
    quat = wp.quat(joint_q[3], joint_q[4], joint_q[5], joint_q[6])
    rpy = wp.sim.utils.quat_to_rpy(quat)

    # Base height is read from joint_q[1]. Each range check is written as
    # "below min or above max": `not a <= x and x <= b` parses as
    # `(not a <= x) and (x <= b)`, which never tests the upper bound.
    min_z = healthy_z_range[0]
    max_z = healthy_z_range[1]
    if joint_q[1] < min_z or joint_q[1] > max_z:
        is_healthy[0] = 0

    min_roll = healthy_roll_range[0]
    max_roll = healthy_roll_range[1]
    if rpy[0] < min_roll or rpy[0] > max_roll:
        is_healthy[0] = 0

    min_pitch = healthy_pitch_range[0]
    max_pitch = healthy_pitch_range[1]
    if rpy[1] < min_pitch or rpy[1] > max_pitch:
        is_healthy[0] = 0
    
class Example:
    def __init__(self, stage_path="example_quadruped_walking.usd", verbose=False, num_frames=300, training = False):
        self.verbose = verbose
        self.training = training

        # Simulation params
        fps = 60
        self.frame_dt = 1.0 / fps
        self.num_frames = num_frames
        self.sim_substeps = 50
        self.sim_dt = self.frame_dt / self.sim_substeps
        self.num_envs = 1
        
        # Counters
        self.iter = 0
        self.sim_time = 0.0
        self.render_time = 0.0
        self.sim_frame = 0

        # Optimizers
        self.train_rate = 0.001

        # Builders
        articulation_builder = wp.sim.ModelBuilder()
        wp.sim.parse_urdf(
            os.path.join(warp.examples.get_asset_directory(), "quadruped.urdf"),
            articulation_builder,
            xform=wp.transform([0.0, 0.8, 0.0], warp.quat_from_axis_angle(wp.vec3(1.0, 0.0, 0.0), -math.pi * 0.5)),
            floating=True,
            density=1000,
            armature=0.01,
            stiffness=50,
            damping=1,
            contact_ke=1.0e4,
            contact_kd=1.0e2,
            contact_kf=1.0e2,
            contact_mu=1.0,
            limit_ke=1.0e4,
            limit_kd=1.0e1,
        )
        
        builder = wp.sim.ModelBuilder()

        builder.add_builder(articulation_builder)
        builder.joint_axis_mode = [wp.sim.JOINT_MODE_TARGET_POSITION] * len(builder.joint_axis_mode)

        self.model = builder.finalize(requires_grad=True) # finalize model
        self.control = self.model.control()
        self.model.ground = True
        self.model.joint_attach_ke = 16000.0
        self.model.joint_attach_kd = 200.0
        
        # Metrics used to determine if the episode should be terminated
        self._healthy_z_range = wp.array((0.22, 0.65), dtype=wp.float32)
        self._healthy_pitch_range = wp.array((-10./math.pi, 10./math.pi), dtype=wp.float32)
        self._healthy_roll_range = wp.array((-10./math.pi, 10./math.pi), dtype=wp.float32)

        # allocate sim states
        self.states = []
        for _i in range(self.num_frames * self.sim_substeps + 1):
            self.states.append(self.model.state(requires_grad=True))
        wp.sim.eval_fk(self.model, self.model.joint_q, self.model.joint_qd, None, self.states[0])

        self._terminated = False
        # initialize the integrator.
        self.integrator = wp.sim.SemiImplicitIntegrator()
        # self.integrator = warp.sim.FeatherstoneIntegrator(self.model, use_tile_gemm=False)

        # capture forward/backward passes
        self.use_cuda_graph = wp.get_device().is_cuda
        if self.use_cuda_graph:
            with wp.ScopedCapture() as capture:
                self.tape = wp.Tape()
                with self.tape:
                    for i in range(self.num_frames):
                        self.forward()
                        # if self._check_termination:
                        #     print(self._terminated)
                            # break
                if self.training:
                    self.tape.backward(self.loss)
            self.graph = capture.graph

    @property
    def _check_termination(self):
        state = self.states[self.sim_frame * self.sim_substeps]
        is_healthy = wp.array([1], dtype = wp.int32)
        wp.launch(check_healthy_ker, 
                  dim = 1,
                  inputs = (state.joint_q, 
                            self._healthy_z_range, 
                            self._healthy_roll_range, 
                            self._healthy_pitch_range, 
                            is_healthy))
        self._terminated = is_healthy
        # return bool(is_healthy.numpy()[0])
        return wp.bool(is_healthy)
    

    def forward(self):
        with wp.ScopedTimer("simulate", active=self.verbose):
            # run simulation loop
            for i in range(self.sim_substeps):
                self.states[self.sim_frame * self.sim_substeps + i].clear_forces()
                wp.sim.collide(self.model, self.states[self.sim_frame * self.sim_substeps + i])
                self.integrator.simulate(
                    self.model,
                    self.states[self.sim_frame * self.sim_substeps + i],
                    self.states[self.sim_frame * self.sim_substeps + i + 1],
                    self.sim_dt,
                    self.control,
                )
                self.sim_time += self.sim_dt

    def step(self):
        with wp.ScopedTimer("step"):
            if self.use_cuda_graph:
                wp.capture_launch(self.graph)
            else:
                self.tape = wp.Tape()
                with self.tape:
                    for i in range(self.num_frames):
                        self.forward()
                        # if self._check_termination:
                        #     print(self._terminated)
                            # break
        # reset sim
        self.sim_frame = 0
        self.sim_time = 0.0
        self.states[0] = self.model.state(requires_grad=True)

        self.iter += 1

Answered by etaoxing

etaoxing · 2025-03-14T12:18:30Z

You could try adding terminal as a wp.array to model.state and updating it inside a wp.kernel. Right now, it looks like you're using the same variable across all states.

FYI: Rewarped Ant RL example :)

1 reply

DINHQuangDung1999 Mar 14, 2025
Author

Hi @etaoxing. Thank you very much for your answer. Off topic: I came across your recent paper Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation a few weeks ago on OpenReview and found it interesting. I believe no code had been published back then, so I am glad to see that it is now available.

shi-eric · 2025-03-14T16:42:08Z

Maintainer

To add some more information, Warp doesn't currently support these kinds of dynamic graph evaluations, which would require us to expose this kind of functionality: https://developer.nvidia.com/blog/dynamic-control-flow-in-cuda-graphs-with-conditional-nodes/ (You're welcome to open a feature request).

Specifically, it seems that the number of times you call forward() in the graph depends on values that are computed inside the graph itself. The control flow should be static.

0 replies
