Puze Liu
German Research Center for AI
Robot Parkour (2023), Qi Zhi Lab, Stanford
Robot Soccer (2023), DeepMind
Humanoid Backflip (2024), Unitree
Safe Exploration
Design safe policies under given constraints
Safety Constraint
Learn safety constraints for complex environments
$$ \begin{align*} \max_{\pi} \quad & \mathbb{E}_{\tau \sim \pi} \left[ \sum_{t=0}^{T} \gamma^t r(\vs_t, \va_t) \right] \\ \mathrm{s.t.} \quad & k(\vs_t) < 0 \quad \forall t \end{align*}$$
Robot dynamics $\dot{\vs} = f(\vs) + G(\vs) \vu_s $
Convert the constraint into a constraint manifold and construct a safe action space in which every action moves the state along the tangent space of the constraint manifold.
Safe Set $\quad \mathcal{C} = \{\vs \in \mathcal{S} | k(\vs) < 0 \}$
Constraint Manifold in Augmented State Space $$ \MM = \{(\vs, \vmu) \in \mathcal{S} \times \mathbb{R}^{+} | c(\vs, \vmu) \coloneqq k(\vs) + \vmu = 0\} $$
Dynamics for Slack Variable $$\dot{\vmu} = A(\vmu) \vu_{\mu}$$ where $A$ is continuous, strictly increasing, and satisfies $A(\vzero) = \vzero$
Augmented Dynamics $$\begin{bmatrix}\dot{\vs} \\ \dot{\vmu} \end{bmatrix} =\begin{bmatrix} f(\vs) \\ \vzero \end{bmatrix} + \begin{bmatrix} G(\vs) & \vzero \\ \vzero & A(\vmu) \end{bmatrix} \begin{bmatrix} \vu_s \\ \vu_{\mu} \end{bmatrix}$$
Velocity tangent to the manifold $$\mathrm{T}_{(\vs, \vmu)}\MM =\left\{ (\dot{\vs}, \dot{\vmu}) \;\middle|\; \dot{c}(\vs, \vmu) = \begin{bmatrix} \mJ_k & \mathbb{I} \end{bmatrix} \begin{bmatrix} \dot{\vs} \\ \dot{\vmu} \end{bmatrix} = \vzero \right\}$$
Combine with augmented state dynamics $$ \underbrace{\mJ_k\vf}_{\vpsi} + \underbrace{\begin{bmatrix} \mJ_k \mG & \mA \end{bmatrix}}_{\mJ_u} \begin{bmatrix} \vu_s \\ \vu_\mu \end{bmatrix}= \vzero $$
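Spelling out this step (a short derivation sketch, writing $\mJ_k = \partial k / \partial \vs$): differentiating $c(\vs, \vmu) = k(\vs) + \vmu$ with respect to time and substituting the augmented dynamics gives
$$ \dot{c} = \mJ_k \dot{\vs} + \dot{\vmu} = \mJ_k \left( f(\vs) + G(\vs) \vu_s \right) + A(\vmu) \vu_\mu = \vpsi + \mJ_u \begin{bmatrix} \vu_s \\ \vu_\mu \end{bmatrix} $$
so the tangency condition $\dot{c} = \vzero$ is a linear equation in the control input.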
Acting on the TAngent space of the COnstraint Manifold (ATACOM)
$$\begin{bmatrix} \vu_s \\ \vu_\mu \end{bmatrix} = \textcolor{#b2e061}{\underbrace{-\mJ_u^{\dagger} \vpsi}_{\text{Drift Comp.}}} \; \textcolor{#fd7f6f}{\underbrace{- \lambda \mJ_u^{\dagger} \vc}_{\text{Contraction}}} \; \textcolor{#7eb0d5}{\underbrace{+ \mB_u \va}_{\text{Tangential}}}$$
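The three-term control law can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `null_space_basis` and `atacom_control` are hypothetical, and the tangent basis $\mB_u$ is taken here from an SVD, which is only one possible choice (the slides do not say how $\mB_u$ is computed in general).

```python
import numpy as np

def null_space_basis(J, tol=1e-10):
    """Orthonormal basis of the null space of J (shape m x n), via SVD.

    Assumption: an SVD-based basis is used for illustration only.
    """
    _, s, vt = np.linalg.svd(J, full_matrices=True)
    rank = int(np.sum(s > tol))
    return vt[rank:].T  # shape (n, n - rank)

def atacom_control(psi, J_u, c, a, lam=1.0):
    """Return [u_s; u_mu] = -J_u^+ psi - lam * J_u^+ c + B_u a."""
    J_pinv = np.linalg.pinv(J_u)   # Moore-Penrose pseudo-inverse
    B_u = null_space_basis(J_u)    # columns span the tangent directions
    drift_comp = -J_pinv @ psi
    contraction = -lam * J_pinv @ c
    tangential = B_u @ a
    return drift_comp + contraction + tangential

# Illustrative numbers (hypothetical): one constraint, two control inputs.
J_u = np.array([[1.0, 0.75]])
psi = np.array([0.3])
c = np.array([0.05])
a = np.array([1.0])
lam = 2.0
u = atacom_control(psi, J_u, c, a, lam)
c_dot = psi + J_u @ u  # time derivative of c under this control
```

When $\mJ_u$ has full row rank, the resulting control gives $\dot{c} = -\lambda c$: the drift is cancelled, the constraint error contracts, and the action $\va$ only moves the state along the manifold.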
Velocity Controlled System: $\dot{s} = u_s$
Constraint: $s^2 < 1$
Constraint Manifold: $$\MM = \{(s, \mu) | s^2 + \mu -1 = 0 \} $$
$$\psi = 0, a=1$$
$$J_u = \begin{bmatrix} 2s & \mu \end{bmatrix}, B_u=\begin{bmatrix} -\mu \\ 2s \end{bmatrix}$$
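This one-dimensional example can be simulated directly. The sketch below is hedged: the slack dynamics $A(\mu) = \mu$ is inferred from the second entry of $J_u$, while the explicit Euler integration, the step size `dt`, the gain `lam`, and the function name `simulate` are assumptions made only for illustration.

```python
import numpy as np

def simulate(a=1.0, lam=10.0, dt=0.01, steps=500):
    """Euler simulation of s_dot = u_s with constraint s^2 < 1.

    Assumptions: A(mu) = mu (inferred from J_u = [2s, mu]); lam and dt
    are illustrative values, not taken from the slides.
    """
    s, mu = 0.0, 1.0  # starts on the manifold: s^2 + mu - 1 = 0
    traj = []
    for _ in range(steps):
        c = s**2 + mu - 1.0                  # constraint manifold residual
        J_u = np.array([2.0 * s, mu])        # [J_k G, A(mu)]
        B_u = np.array([-mu, 2.0 * s])       # tangent direction, J_u @ B_u = 0
        J_pinv = J_u / (J_u @ J_u)           # pseudo-inverse of a 1x2 row
        u = -lam * J_pinv * c + B_u * a      # psi = 0, so no drift term
        s += dt * u[0]                       # s_dot = u_s
        mu += dt * mu * u[1]                 # mu_dot = A(mu) * u_mu
        traj.append((s, mu, c))
    return np.array(traj)

traj = simulate()
s_hist, mu_hist, c_hist = traj[:, 0], traj[:, 1], traj[:, 2]
```

With the constant action $a = 1$, the state slides toward $s = -1$ but only approaches it asymptotically, since the tangential velocity $-\mu a$ vanishes as the slack variable shrinks. In discrete time the contraction gain matters: a small `lam` lets integration error accumulate faster than $\mu$ decays.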
Training result in simulation
Real World Experiment Setup
| | Success rate | Hitting velocity |
|---|---|---|
| Simulation | 86% | 0.92 m/s |
| Zero-shot transfer | 12% | 0.97 m/s |
| Fine-tuning | 71% | 0.97 m/s |
1. Separable State Space
$$ \vs = \begin{bmatrix}\vq \\ \vx \end{bmatrix}, \quad \dot{\vq} = f(\vq) + G(\vq) \vu $$
Constraint: $k(\vq, \vx) \leq \vzero$
Drift and Jacobian: $$\vpsi = \mJ_q f + \mJ_x \dot{\vx}$$ $$\mJ_u = \begin{bmatrix} \mJ_q G & A \end{bmatrix}$$
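As a concrete instance of the drift and Jacobian above, the sketch below uses a planar robot position $\vq$ with single-integrator dynamics ($f = 0$, $G = I$) and one moving obstacle at $\vx$. Everything specific here is an assumption for illustration: the constraint $k = \delta - \lVert \vq - \vx \rVert$, the choice $A(\mu) = \mu$, the numeric values, and the helper name `separable_terms`.

```python
import numpy as np

def separable_terms(J_q, J_x, f, G, A, x_dot):
    """psi = J_q f + J_x x_dot and J_u = [J_q G, A] for a separable state."""
    psi = J_q @ f + J_x @ x_dot
    J_u = np.hstack([J_q @ G, A])
    return psi, J_u

# Hypothetical setup: robot at q, obstacle at x moving toward the robot.
q = np.array([1.0, 0.0])
x = np.array([0.0, 0.0])
x_dot = np.array([1.0, 0.0])      # uncontrollable obstacle velocity
delta = 0.5                       # required clearance

d = q - x
dist = np.linalg.norm(d)
J_q = (-d / dist)[None, :]        # dk/dq for k = delta - ||q - x||
J_x = (d / dist)[None, :]         # dk/dx
mu = dist - delta                 # slack chosen so that k + mu = 0
A = np.array([[mu]])              # assumed slack dynamics A(mu) = mu

f = np.zeros(2)                   # single integrator: no drift in q
G = np.eye(2)
psi, J_u = separable_terms(J_q, J_x, f, G, A, x_dot)
```

Here the drift $\vpsi$ comes entirely from the obstacle motion, which is what the drift-compensation term has to cancel to keep the augmented state on the manifold.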
Mobile Robot with Differential Drive
15 Moving Obstacles
Safe Exploration
Design safe policies under given constraints
Safety Constraint
Learn safety constraints for complex environments
Distance-Based Constraint
$$ \Vert \vx_{\mathrm{robot}} - \vx_{\mathrm{obs}}\Vert > \delta $$
e.g., Spheres, Cylinders
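For spherical obstacles, the distance constraint and its gradient have a closed form. The sketch below rewrites $\lVert \vx_{\mathrm{robot}} - \vx_{\mathrm{obs}} \rVert > \delta$ in the $k < 0$ convention used earlier, with $\delta$ taken as the sum of obstacle and robot radii; that choice of $\delta$ and the function name `sphere_constraints` are assumptions for illustration.

```python
import numpy as np

def sphere_constraints(x_robot, obstacles, radii, r_robot=0.0):
    """Constraint values k_i = delta_i - ||x_robot - x_obs_i|| and dk/dx_robot.

    x_robot: (dim,), obstacles: (N, dim), radii: (N,).
    Assumption: delta_i = radii[i] + r_robot.
    """
    diff = x_robot - obstacles               # (N, dim)
    dist = np.linalg.norm(diff, axis=1)      # (N,)
    k = (radii + r_robot) - dist             # k_i < 0 means collision-free
    J = -diff / dist[:, None]                # gradient of k_i w.r.t. x_robot
    return k, J

x_robot = np.array([0.0, 0.0])
obstacles = np.array([[3.0, 4.0], [1.0, 0.0]])
radii = np.array([1.0, 0.5])
k, J = sphere_constraints(x_robot, obstacles, radii)
```

Each row of `J` points from the robot toward the corresponding obstacle, so moving along it increases $k_i$ and reduces the safety margin.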
e.g., Neural Network
$$ d(\vx, \textcolor{DarkSalmon}{\vq}) = \left[1-\sigma_{\vtheta}(\vx, \textcolor{DarkSalmon}{\vq})\right]\textcolor{LightGreen}{\underbrace{f_{\vtheta}(\vx, \vq)}_{\mathrm{NN}}} + \sigma_{\vtheta}(\vx, \textcolor{DarkSalmon}{\vq})\textcolor{Orange}{\underbrace{\lVert \vx - \vx_c \rVert_2}_{\mathrm{Point\,Dist.}}}$$
$ \sigma_{\vtheta}(\vx, \vq) = \sigmoid\left(\textcolor{OrangeRed}{\alpha_{\vtheta}}\left( \lVert\vx - \vx_c\rVert_2 - \textcolor{OrangeRed}{\rho_{\vtheta}} \right)\right) $
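The blending can be illustrated numerically. In the sketch below the network $f_{\vtheta}$ is replaced by a stand-in function, and $\alpha_{\vtheta}$, $\rho_{\vtheta}$, and the center $\vx_c$ are fixed constants; in the formula above they carry the subscript $\vtheta$, so treat every numeric value and the names `redsdf_distance` and `stub_network` as hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def redsdf_distance(x, q, f_theta, x_c, alpha, rho):
    """d = (1 - sigma) * f_theta(x, q) + sigma * ||x - x_c||_2."""
    point_dist = np.linalg.norm(x - x_c)
    sigma = sigmoid(alpha * (point_dist - rho))  # ~0 near the object, ~1 far away
    return (1.0 - sigma) * f_theta(x, q) + sigma * point_dist

def stub_network(x, q):
    """Stand-in for the learned network: exact distance to a sphere of radius 0.5."""
    return np.linalg.norm(x) - 0.5

x_c = np.zeros(3)
q = np.zeros(1)                    # configuration input, unused by the stub
alpha, rho = 5.0, 2.0              # illustrative values only

d_near = redsdf_distance(np.array([0.6, 0.0, 0.0]), q, stub_network, x_c, alpha, rho)
d_far = redsdf_distance(np.array([10.0, 0.0, 0.0]), q, stub_network, x_c, alpha, rho)
```

Close to the object the output follows the network, while far away it falls back to the point distance, which keeps the field well behaved in regions where the network has seen little training data.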
[Figure panels: Table, Shelf, Tiago, Human. Methods compared: Ground Truth, ReDSDF, DeepSDF, ECMNN.]
[Videos: Real World, Simulation (3x speed)]
ATACOM: Safe Exploration on the Tangent Space of the Constraint Manifold
Safe Exploration in Dynamic Environment
ReDSDF: Regularized Deep Signed Distance Field for Safe Learning and Control