TamedPUMA: safe and stable imitation learning with geometric fabrics

Our approach leverages geometric fabrics, a recently introduced geometric framework for motion planning, to learn stable motion profiles while handling whole-body collision avoidance and joint limits online.

Abstract

Using the language of dynamical systems, imitation learning (IL) provides an intuitive and effective way of teaching robots stable task-space motions with goal convergence. Yet, IL techniques face serious limitations when it comes to ensuring safety and the fulfillment of physical constraints. With this work, we address this challenge via TamedPUMA, an IL algorithm augmented with a recent development in motion planning called geometric fabrics. As both the IL policy and geometric fabrics describe motions as artificial second-order dynamical systems, we propose two variations in which the IL policy provides a navigation policy for geometric fabrics. The result is a stable imitation learning strategy within which we can seamlessly blend geometrical constraints such as collision avoidance and joint limits. Beyond providing a theoretical analysis, we demonstrate TamedPUMA with simulated and real-world tasks, including a 7-DoF manipulator.

Keywords: Imitation Learning, Dynamical Systems, Geometric Motion Planning, Fabrics, Movement Primitives

Illustrative Point-mass Example

In Fig. 1, trajectories of a 2D point-mass example are shown for the proposed methods, alongside two baselines: geometric fabrics [1, 2] with a manually designed potential function, and PUMA [3]. The purely data-driven method PUMA follows the motion profile learned from demonstrations, but it has no notion of obstacle avoidance or other physical constraints. The proposed methods, FPM and CPM, successfully follow the desired motion profile of PUMA while avoiding collisions with the obstacles. In contrast, geometric fabrics are unable to follow the desired motion profile.
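The qualitative behaviour in Fig. 1 can be mimicked with a toy rollout. The sketch below is purely illustrative and is not the formulation used in the paper: a hand-written damped attractor stands in for the learned PUMA policy, and a simple repulsive acceleration stands in for the geometric collision-avoidance terms; all functions, gains, and obstacle definitions are hypothetical.

```python
# Toy 2D point-mass rollout: a second-order policy integrated with and
# without an additional collision-avoidance acceleration. All functions
# and gains are hypothetical placeholders, not the paper's formulation.
import numpy as np

def learned_policy(x, xdot, goal):
    """Stand-in for a learned second-order dynamical system: a damped attractor."""
    return 4.0 * (goal - x) - 2.0 * xdot

def avoidance_acceleration(x, center, radius):
    """Simple repulsive acceleration that grows near the obstacle surface."""
    d = x - center
    dist = max(np.linalg.norm(d) - radius, 1e-3)
    return (2.0 / dist**2) * d / np.linalg.norm(d)

def rollout(x0, goal, obstacles, avoid, dt=0.01, steps=2000):
    x, xdot = np.array(x0, dtype=float), np.zeros(2)
    trajectory = [x.copy()]
    for _ in range(steps):
        xddot = learned_policy(x, xdot, goal)
        if avoid:
            for center, radius in obstacles:
                xddot += avoidance_acceleration(x, np.array(center), radius)
        xdot += dt * xddot
        x += dt * xdot
        trajectory.append(x.copy())
    return np.array(trajectory)

obstacles = [((0.5, 0.5), 0.2)]
goal = np.array([1.0, 1.0])
puma_like = rollout([0.0, 0.0], goal, obstacles, avoid=False)   # ignores obstacles
tamed_like = rollout([0.0, 0.0], goal, obstacles, avoid=True)   # deflects around them
```

Replacing these placeholders with the actual learned network and the geometric fabric terms is what the FPM and CPM below formalize.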


Experiment: Tomato picking with a 7-DoF manipulator

Demonstrating the tomato-picking task

Results of the tomato-picking task with TamedPUMA

Via TamedPUMA, PUMA is enhanced for safe and stable navigation, while accounting for whole-body collision avoidance and joint limits. We propose two variations, the Forcing Policy Method (FPM) and the Compatible Potential Method (CPM).

FPM

In the FPM, convergence to the goal is not formally guaranteed, but the method works well in practice.

CPM

The CPM provides a stronger notion of goal convergence than the FPM. In practice, its performance is similar to that of the FPM.
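To make the distinction concrete, the sketch below shows, at a purely conceptual level, where the learned policy could enter a forced fabric-style equation of motion. This is an assumption-laden simplification for illustration, not the paper's implementation; the names M, f, pi, learned_policy, and grad_potential are placeholders.

```python
# Conceptual sketch (not the released TamedPUMA code) of where a learned
# second-order policy could enter a forced fabric-style equation of motion
#     M(x, xdot) xddot + f(x, xdot) = pi(x, xdot),
# where M and f collect the geometric terms (collision avoidance, joint
# limits) and pi is the navigation/forcing term. All names are placeholders.
import numpy as np

def solve_forced_fabric(M, f, pi):
    """Resolve the commanded acceleration from the forced equation of motion."""
    return np.linalg.solve(M, pi - f)

def fpm_acceleration(x, xdot, M, f, learned_policy):
    """Forcing Policy Method: the learned second-order dynamical system
    itself acts as the forcing/navigation policy of the fabric."""
    pi = learned_policy(x, xdot)
    return solve_forced_fabric(M, f, pi)

def cpm_acceleration(x, xdot, M, f, grad_potential, damping=2.0):
    """Compatible Potential Method: a potential compatible with the learned
    dynamics forces the fabric; its negative gradient plus damping supplies
    the navigation term and yields a stronger convergence argument."""
    pi = -grad_potential(x) - damping * xdot
    return solve_forced_fabric(M, f, pi)
```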

An out-of-distribution scenario

When an obstacle forces the robot towards a pose far from the demonstrations, TamedPUMA recovers and reaches the goal.

Human disturbances

When a human disturbs the robot, TamedPUMA recovers and converges to the goal.

Experiment: Pouring a liquid

Avoiding the helmet

TamedPUMA avoids collisions with the helmet while performing a learned pouring task.

Online goal changes

Using TamedPUMA, the goal can be changed online.

Comparison with geometric fabrics and PUMA


Geometric Fabrics

Geometric fabrics run into a deadlock with two obstacles representing the side of the box.

TamedPUMA (ours)

The DNN encodes an intuitive movement to avoid the side of the box.

PUMA

Via PUMA, the demonstrated movement is learned while ensuring goal convergence. Obstacle avoidance and physical constraints are not considered.

TamedPUMA (ours)

Via TamedPUMA, PUMA is enhanced for safe and stable navigation. Whole-body collision avoidance and joint limits are incorporated.

Supplementary theoretical details on TamedPUMA

This link leads to a webpage with the theoretical details of TamedPUMA.

Specifications of the DNN by PUMA

Table 1 lists the specifications of the trained DNN for the tomato-picking task and the pouring task. The network provides a second-order dynamical system in which positions are trained over a Euclidean space and orientations over a spherical space. Figure 3 illustrates the performance after 5000 iterations, showing both the convergence to the goal and the closeness of the solutions to the motion profile demonstrated by the user. Training a DNN for 5000 iterations takes 9 minutes on a standard laptop (i7-12700H), and requesting an action from the DNN online takes 0.6 $\pm$ 0.1 ms.

Table 1. Hyperparameters of PUMA for a second-order dynamical system learning a pose, e.g. position and orientation.
| Hyperparameter | Value (tomato-picking) | Value (pouring) |
| --- | --- | --- |
| PUMA | | |
| Stability loss margin ($m$) | 1e-6 | 1e-6 |
| Triplet imitation loss weight ($\lambda$) | 1.0 | 1.0 |
| Window size imitation ($\mathcal{H}^i$) | 13 | 13 |
| Window size stability ($\mathcal{H}^s$) | 2 | 2 |
| Batch size imitation ($\mathcal{B}^i$) | 800 | 800 |
| Batch size stability ($\mathcal{B}^s$) | 800 | 800 |
| Neural network | | |
| Optimizer | Adam | Adam |
| Number of iterations | 40000 | 27000 |
| Learning rate | 1e-4 | 1e-4 |
| Activation function | GELU | GELU |
| Num. layers ($\varphi_{\theta}, \rho_\theta$) | (3, 3) | (3, 3) |
| Layer normalization | yes | yes |
Figure 3. Performance of the DNN trained via PUMA where all states are normalized. The demonstrations are indicated in black, the trajectories generated by the DNN for the same initial poses as the demonstrations are given in red, and trajectories from randomly sampled initial conditions are indicated in blue.
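For reference, the listing below is a hypothetical PyTorch sketch of a network consistent with the hyperparameters in Table 1 (GELU activations, layer normalization, three layers each for $\varphi_{\theta}$ and $\rho_\theta$, Adam with a 1e-4 learning rate). It is not the released PUMA implementation; the hidden sizes, state dimension, and output dimension are assumptions.

```python
# Hypothetical network sketch matching the Table 1 hyperparameters.
# Hidden sizes, the state dimension, and the output dimension are assumptions.
import torch
import torch.nn as nn

def mlp(sizes):
    """Three linear layers (per Table 1) with LayerNorm + GELU in between."""
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers += [nn.LayerNorm(sizes[i + 1]), nn.GELU()]
    return nn.Sequential(*layers)

class SecondOrderDS(nn.Module):
    """Maps a pose-velocity state to an acceleration (a second-order DS)."""
    def __init__(self, state_dim, latent_dim=64, accel_dim=6):
        super().__init__()
        self.phi = mlp([state_dim, 256, 256, latent_dim])   # encoder, 3 layers
        self.rho = mlp([latent_dim, 256, 256, accel_dim])   # decoder, 3 layers

    def forward(self, state):
        return self.rho(self.phi(state))

model = SecondOrderDS(state_dim=13)  # state dimension chosen arbitrarily here
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```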

References

  1. Ratliff, Nathan, and Van Wyk, Karl. (2023). "Fabrics: A Foundationally Stable Medium for Encoding Prior Experience." arXiv preprint arXiv:2309.07368.
  2. Ratliff, Nathan D., Van Wyk, Karl, Xie, Mandy, Li, Anqi, and Rana, Muhammad Asif. (2020). "Optimization fabrics." arXiv preprint arXiv:2008.02399.
  3. Pérez-Dattari, Rodrigo, Della Santina, Cosimo, and Kober, Jens. (2024). "Deep metric imitation learning for stable motion primitives." Advanced Intelligent Systems.

Safe and stable motion primitives via imitation learning and geometric fabrics
Saray Bakker, Rodrigo Pérez-Dattari, Cosimo Della Santina, Wendelin Böhmer, Javier Alonso-Mora. In Robotics: Science and Systems, Workshop on Structural Priors as Inductive Biases for Learning Robot Dynamics, 2024.

Using the language of dynamical systems, imitation learning (IL) provides an intuitive and effective way of teaching robots stable task-space motions with goal convergence. Yet, these techniques are affected by serious limitations when it comes to ensuring safety and fulfillment of physical constraints. With this work, we propose to solve this challenge via TamedPUMA, an IL algorithm augmented with a recent development in motion planning called geometric fabrics. We explore two variations of this approach, which we name the forcing policy method and the compatible potential method. Making these combinations possible requires two enabling factors: the possibility of learning second-order dynamical systems by imitation and the availability of a potential function that is compatible with the learned dynamics. In this paper, we show how these conditions can be met when using an IL strategy called PUMA. The result is a stable imitation learning strategy within which we can seamlessly blend geometrical constraints like collision avoidance and joint limits. Beyond providing a theoretical analysis, we demonstrate TamedPUMA with simulated and real-world tasks, including a 7-degree-of-freedom manipulator that is trained to pick a tomato from a crate in the presence of obstacles.

Reactive grasp and motion planning for adaptive mobile manipulation among obstacles
Tomas Merva, Saray Bakker, Max Spahn, Ivan Virgala, Javier Alonso-Mora. In Robotics: Science and Systems, Workshop on Frontiers of Optimization for Robotics, 2024.

Mobile manipulators are susceptible to situations in which the precomputed grasp pose is not reachable as a result of conflicts between collision-avoidance behaviour and the manipulation task. In this work, we address this issue by combining real-time grasp planning with geometric motion planning for decentralized multi-agent systems, referred to as Reactive Grasp Fabrics (RGF). We optimize the precomputed grasp pose candidate to account for obstacles and the robot's kinematics. By leveraging a reactive geometric motion planner, specifically geometric fabrics, the grasp optimization problem can be simplified, resulting in a fast, adaptive framework that can resolve deadlock situations in pick-and-place tasks. We demonstrate the robustness of this approach by controlling a mobile manipulator in both simulation and real-world experiments in dynamic environments.

Multi-Robot Local Motion Planning Using Dynamic Optimization Fabrics
Saray Bakker, Luzia Knoedler, Max Spahn, Wendelin Boehmer, Javier Alonso-Mora. In Proc. IEEE International Symposium on Multi-Robot and Multi-Agent Systems, 2023.

In this paper, we address the problem of real-time motion planning for multiple robotic manipulators that operate in close proximity. We build upon the concept of dynamic fabrics and extend them to multi-robot systems, referred to as Multi-Robot Dynamic Fabrics (MRDF). This geometric method enables a very high planning frequency for high-dimensional systems at the expense of being reactive and prone to deadlocks. To detect and resolve deadlocks, we propose Rollout Fabrics where MRDF are forward simulated in a decentralized manner. We validate the methods in simulated close-proximity pick-and-place scenarios with multiple manipulators, showing high success rates and real-time performance.