ETH Zurich
Caltech
ETH Zurich
ETH Zurich
ETH Zurich
Johns Hopkins University
ETH Zurich
Caltech
ETH Zurich
NeurIPS 2025
We introduce SPiDR (Sim-to-real via Pessimistic Domain Randomization), a scalable algorithm that closes the gap between simulation and reality without compromising safety. SPiDR uses domain randomization to incorporate the uncertainty about the sim-to-real gap into the safety constraints, making it versatile and highly compatible with existing training pipelines.
We theoretically show that unsafe transfer can be related to large uncertainty about the sim-to-real gap, quantified as the disagreement among next-state predictions from domain-randomized dynamics models. This key idea is illustrated in the figure above, where spikes in uncertainty (e.g. at t = 4.6 and t = 5.3) coincide with unstable or unsafe behaviors, such as stumbling or flipping. Motivated by this insight, we propose to penalize the cost with the uncertainty to achieve safe sim-to-real transfer, leading to the design of SPiDR.
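To make the idea concrete, here is a minimal sketch of an uncertainty-penalized cost. It is not the paper's implementation: the toy point-mass dynamics, the function names (`step`, `disagreement`, `pessimistic_cost`), the use of the per-dimension standard deviation as the disagreement measure, and the `penalty_scale` weight are all assumptions made for illustration.

```python
import numpy as np


def step(state, action, params, dt=0.05):
    """Toy point-mass dynamics (hypothetical stand-in for a simulator).

    `params` is a (mass, friction) pair, i.e. one sample from the
    domain-randomization distribution.
    """
    mass, friction = params
    pos, vel = state
    acc = (action - friction * vel) / mass
    new_vel = vel + dt * acc
    new_pos = pos + dt * new_vel
    return np.array([new_pos, new_vel])


def disagreement(state, action, param_samples):
    """Disagreement among next-state predictions across randomized dynamics.

    Assumption: measured as the norm of the per-dimension standard
    deviation of the predicted next states.
    """
    preds = np.stack([step(state, action, p) for p in param_samples])
    return float(np.linalg.norm(preds.std(axis=0)))


def pessimistic_cost(cost, state, action, param_samples, penalty_scale=1.0):
    """Original cost plus a penalty proportional to the disagreement."""
    return cost + penalty_scale * disagreement(state, action, param_samples)


if __name__ == "__main__":
    state = np.array([0.0, 1.0])
    # Three hypothetical (mass, friction) samples.
    samples = [(0.5, 0.1), (1.0, 0.3), (2.0, 0.5)]
    print(pessimistic_cost(0.2, state, action=1.0, param_samples=samples))
```

When all sampled dynamics agree, the penalty vanishes and the original cost is recovered. The more the predictions diverge, the more conservative the constraint becomes.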
Domain randomization is the de facto method for reliable transfer from simulators to real robots. Safety is a key component in robotics, yet domain randomization does not address it directly. This means that you can train your robot in simulation, and it may still violate safety constraints (like falling or overusing the motors) when you deploy the policy on the real system. Existing methods for safe transfer typically rely on tools from robust optimization, and therefore require roboticists to significantly alter our beloved domain randomization training pipelines. We started this project with the following question:
How can we develop a safe sim-to-real reinforcement learning algorithm that practitioners can use without reinventing the wheel?
With that in mind, the first obvious thing to do is check whether plain domain randomization can still satisfy safety constraints under distribution shifts. The figure below shows that domain randomization alone fails to satisfy safety constraints on a bunch of well-known tasks in MuJoCo. In contrast, when using SPiDR, constraints are satisfied across all tasks.
We saw that by adding pessimism, SPiDR is able to transfer safely. But what if this added conservatism makes SPiDR safe without really solving the task? In the figure below we show performance on the y-axis vs. safety on the x-axis (upper-left is better). As shown, among all baselines, SPiDR consistently satisfies the constraints while achieving good performance.
In our last experiments, we took SPiDR to the real world, deploying it on a Unitree Go1 robot and a remote-controlled race car, shown below.
For the race car, the robot has to navigate to a goal position while avoiding the tire obstacles. For the Unitree Go1, we put a safety constraint on the robot’s joint limits.
Below, we show two trajectories of the Unitree Go1 robot. Both trajectories are conditioned on exactly the same commands, but the one on the left uses SPiDR, while the one on the right uses a baseline algorithm from the robust RL literature.
Below, we showcase SPiDR using the race car robot.
As shown, on both robots, SPiDR transfers successfully while satisfying the safety constraint.
Finally, we present the accumulated costs that measure safety on the real robots across five random seeds and for multiple trajectories per seed.
This empirical evaluation demonstrates again that SPiDR achieves good performance while transferring safely to the real robots.
Check out our paper for a deep dive and more cool experiments.
@inproceedings{
as2025spidrsimpleapproachzeroshot,
title={{SP}i{DR}: A Simple Approach for Zero-Shot Safety in Sim-to-Real Transfer},
author={Yarden As and Chengrui Qu and Benjamin Unger and Dongho Kang and Max van der Hart and Laixi Shi and Stelian Coros and Adam Wierman and Andreas Krause},
booktitle={International Conference on Neural Information Processing Systems},
year={2025},
}