SPiDR: A Simple Approach for Zero-Shot Safety in Sim-to-Real Transfer

Yarden As

ETH Zurich

Benjamin Unger

ETH Zurich

Dongho Kang

ETH Zurich

Max van der Hart

ETH Zurich

Laixi Shi

Johns Hopkins University

Stelian Coros

ETH Zurich

Andreas Krause

ETH Zurich

NeurIPS 2025

Core idea of SPiDR

We introduce SPiDR (Sim-to-real via Pessimistic Domain Randomization), a scalable algorithm that closes the gap between simulation and reality without compromising safety. SPiDR uses domain randomization to incorporate the uncertainty about the sim-to-real gap into the safety constraints, making it versatile and highly compatible with existing training pipelines.

We theoretically show that unsafe transfer can be related to large uncertainty about the sim-to-real gap, quantified as the disagreement among next-state predictions from domain-randomized dynamics models. This key idea is illustrated in the figure above, where spikes in uncertainty (e.g., at t = 4.6 and t = 5.3) coincide with unstable or unsafe behaviors, such as stumbling or flipping. Motivated by this insight, we propose to penalize the cost with this uncertainty, so that the policy is discouraged from visiting states and actions where the simulated dynamics disagree. This penalty is what enables safe sim-to-real transfer and leads to the design of SPiDR.
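To make the penalty concrete, here is a minimal sketch of the idea on a toy 1D point mass whose mass is domain-randomized. It is an illustration under our own assumptions, not the implementation from the paper: the function names (`step`, `disagreement`, `pessimistic_cost`), the `penalty_scale` parameter, and the choice of measuring disagreement as the norm of the per-dimension standard deviation are all hypothetical.

```python
import numpy as np


def step(state, action, mass, dt=0.05):
    """Toy point-mass dynamics (assumed for illustration): state = [position, velocity]."""
    pos, vel = state
    vel_next = vel + (action / mass) * dt
    pos_next = pos + vel_next * dt
    return np.array([pos_next, vel_next])


def disagreement(next_states):
    """Uncertainty proxy: spread of next-state predictions across the ensemble.

    next_states has shape (n_models, state_dim). The norm of the per-dimension
    standard deviation is one simple choice, assumed here.
    """
    return float(np.linalg.norm(np.std(next_states, axis=0)))


def pessimistic_cost(cost, next_states, penalty_scale):
    """Simulated cost plus an uncertainty penalty (penalty_scale is hypothetical)."""
    return cost + penalty_scale * disagreement(next_states)


# Domain randomization: sample several plausible masses for the same robot.
rng = np.random.default_rng(0)
masses = rng.uniform(0.5, 2.0, size=8)


def predict_all(state, action):
    """Next-state prediction from every domain-randomized model."""
    return np.stack([step(state, action, m) for m in masses])


state = np.array([0.0, 1.0])
gentle = predict_all(state, action=0.1)
aggressive = predict_all(state, action=5.0)

# Aggressive actions amplify the effect of the unknown mass, so the models
# disagree more and the penalized cost grows accordingly.
print("disagreement (gentle):    ", disagreement(gentle))
print("disagreement (aggressive):", disagreement(aggressive))
print("penalized cost (aggressive):", pessimistic_cost(1.0, aggressive, penalty_scale=2.0))
```

In a constrained RL pipeline, a penalized cost of this kind would stand in for the raw simulated cost when evaluating the safety constraint, which is why the approach fits into existing domain randomization training loops.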

Domain Randomization is Not Safe

Domain randomization is the de facto method for reliable transfer from simulation to real robots. Safety is a key component in robotics, but domain randomization does not address it directly. This means that a policy trained in simulation may still violate safety constraints (like falling or overusing the motors) when you deploy it on the real system. Existing methods for safe transfer typically rely on tools from robust optimization, which requires roboticists to significantly alter our beloved domain randomization training pipelines. We started this project with the following question:

How can we develop a safe sim-to-real reinforcement learning algorithm that practitioners can use without reinventing the wheel?

With that in mind, the first obvious thing to do is check whether plain domain randomization can still satisfy safety constraints under distribution shifts. The figure below shows that domain randomization fails to satisfy safety constraints on several well-known tasks in MuJoCo. In contrast, SPiDR satisfies the constraints across all tasks.

Domain randomization does not satisfy safety constraints under mismatches in the dynamics.

What About Performance?

We saw that by adding pessimism, SPiDR is able to transfer safely. But what if, with this added conservatism, SPiDR is safe but does not really solve the task? In the figure below, we show performance on the y-axis vs. safety on the x-axis (upper-left is better). As shown, compared with the baselines, SPiDR consistently satisfies the constraints while achieving good performance.

SPiDR finds a good balance between safety and performance.

Enough with MuJoCo, Show Me Some Real Robots!

In our last experiments, we took SPiDR to the real world, deploying it on a Unitree Go1 robot and a remote-controlled race car, shown below.

Real robots

In the race car task, the robot has to navigate to a goal position while avoiding the tire obstacles. For the Unitree Go1, we put a safety constraint on the robot’s joint limits.

Below, we show two trajectories of the Unitree Go1 robot. Both trajectories are conditioned on exactly the same commands. The one on the left uses SPiDR, while the one on the right uses a baseline algorithm from the robust RL literature.

SPiDR

SPiDR follows commands while satisfying the safety constraints.

Baseline

Robust safe RL baseline fails to transfer to the real robot.

Below, we showcase SPiDR using the race car robot.

SPiDR maintains safety upon deployment.

As shown, on both robots, SPiDR transfers successfully while satisfying the safety constraints.

Finally, we present the accumulated costs that measure safety on the real robots across five random seeds and for multiple trajectories per seed.

Safety is maintained when deploying on two real robotic systems.

This empirical evaluation demonstrates again that SPiDR achieves good performance while transferring safely to the real robots.

Cool! I Want to Learn More

Check out our paper for a deep dive and more cool experiments.

Cite

  @inproceedings{
  as2025spidrsimpleapproachzeroshot,
  title={{SP}i{DR}: A Simple Approach for Zero-Shot Safety in Sim-to-Real Transfer},
  author={Yarden As and Chengrui Qu and Benjamin Unger and Dongho Kang and Max van der Hart and Laixi Shi and Stelian Coros and Adam Wierman and Andreas Krause},
  booktitle={International Conference on Neural Information Processing Systems},
  year={2025},
}