By leveraging the recently introduced geometric framework for motion planning called geometric fabrics, our approach learns stable motion profiles while considering online whole-body collision avoidance and joint limits.
Abstract
Using the language of dynamical systems, imitation learning (IL) provides an intuitive and effective way of teaching stable task-space motions to robots with goal convergence. Yet, IL techniques are affected by serious limitations when it comes to ensuring safety and fulfillment of physical constraints. With this work, we solve this challenge via TamedPUMA, an IL algorithm augmented with a recent development in motion planning called geometric fabrics.
As both the IL policy and geometric fabrics describe motions as artificial second-order dynamical systems, we propose two variations in which IL provides a navigation policy for geometric fabrics.
The result is a stable imitation learning strategy within which we can seamlessly blend geometrical constraints like collision avoidance and joint limits.
Beyond providing a theoretical analysis, we demonstrate TamedPUMA with simulated and real-world tasks, including a 7-DoF manipulator.
In Fig. 1, trajectories of a 2D point-mass example are shown for the proposed methods, alongside two baselines: geometric fabrics [1, 2] with a manually designed potential function, and PUMA [3]. The purely data-driven method PUMA follows the motion profile learned from demonstrations, but it has no notion of obstacle avoidance or other physical constraints. The proposed methods, the Forcing Policy Method (FPM) and the Compatible Potential Method (CPM), follow the desired motion profile of PUMA while avoiding collisions with the obstacles. In contrast, geometric fabrics are unable to follow the desired motion profile.
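The point-mass trajectories come from blending a learned motion with constraint-avoiding behaviour. In geometric fabrics [1, 2], each behaviour in a common space can be written as $M\ddot{x} + f = 0$, and behaviours are combined by summing these equations. With $f_i = -M_i a_i$ for a desired acceleration $a_i$, this yields the metric-weighted average $\ddot{x} = (\sum_i M_i)^{-1} \sum_i M_i a_i$. The sketch below applies only this single combination step to a 2D point mass; the policy acceleration, the obstacle normal, and the weight `w` are hypothetical numbers rather than values from TamedPUMA, and the rest of the fabric machinery (pullbacks, energization, damping) is omitted.

```python
import numpy as np


def combine(metrics, accels):
    """Combine behaviours (M_i, a_i) that live in the same space.

    Each behaviour is M_i @ xdd + f_i = 0 with f_i = -M_i @ a_i. Summing the
    equations gives (sum_i M_i) @ xdd = sum_i M_i @ a_i, solved for xdd here.
    """
    M_total = sum(metrics)
    rhs = sum(M @ a for M, a in zip(metrics, accels))
    return np.linalg.solve(M_total, rhs)


# Hypothetical stand-in for the learned (PUMA-like) acceleration, identity metric.
a_policy = np.array([-2.0, 1.0])
M_policy = np.eye(2)

# Hypothetical obstacle behaviour: accelerate along the normal n (pointing away
# from the obstacle), with a rank-one metric w * n n^T that only acts along n.
n = np.array([1.0, 0.0])
w = 3.0  # would grow as the obstacle gets closer
a_obstacle = np.array([2.0, 0.0])
M_obstacle = w * np.outer(n, n)

xdd = combine([M_policy, M_obstacle], [a_policy, a_obstacle])
print(xdd)  # [1. 1.]
```

With $w = 3$, the component along $n$ is a convex blend of the two normal accelerations with weight $w/(1+w) = 0.75$, while the tangential component is exactly the policy's. Letting $w$ grow near the obstacle therefore hands authority to the avoidance term only in the direction that matters.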
Experiment: Tomato picking with a 7-DoF manipulator
Demonstrating the tomato-picking task
Results of the tomato-picking task with TamedPUMA
Via TamedPUMA, PUMA is enhanced for safe and stable navigation, while accounting for whole-body collision avoidance and joint limits. We propose two variations, the Forcing Policy Method (FPM) and the Compatible Potential Method (CPM).
FPM
The FPM does not guarantee convergence to the goal, but it works well in practice.
CPM
CPM provides a stronger notion of goal convergence than the FPM. In practice, the performance is similar to FPM.
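The CPM relies on a potential function that is compatible with the learned dynamics. One way to make "compatible" concrete, which is our assumption here and not necessarily the definition used in the paper, is to require that the potential $\psi$ never increases along the learned velocity field $f$, i.e. $\nabla\psi(x)^\top f(x) \le 0$. The sketch below checks this condition on random samples; the linear field and the two quadratic potentials are hypothetical stand-ins for a trained PUMA network.

```python
import numpy as np


def is_compatible(grad_psi, field, samples, tol=1e-12):
    """True if psi does not increase along `field` at any sample,
    i.e. grad_psi(x) @ field(x) <= tol for every sampled x."""
    return all(float(grad_psi(x) @ field(x)) <= tol for x in samples)


# Hypothetical stand-in for a learned velocity field that converges to x = 0.
A = np.array([[1.0, 2.0], [-2.0, 1.0]])


def field(x):
    return -A @ x


# Candidate potentials psi(x) = 0.5 * x^T P x, whose gradient is P @ x.
def grad_round(x):   # P = I
    return x


def grad_skewed(x):  # P = diag(1, 100)
    return np.array([1.0, 100.0]) * x


rng = np.random.default_rng(0)
samples = rng.uniform(-1.0, 1.0, size=(500, 2))

print(is_compatible(grad_round, field, samples))   # True
print(is_compatible(grad_skewed, field, samples))  # False
```

The round potential passes because the skew-symmetric part of `A` does no work on $\|x\|^2$; the stretched potential fails at points such as $x = (1, 1)$, where descending $\psi$ and following the learned field pull in conflicting directions. A sampled check like this can only refute compatibility, not prove it.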
An out-of-distribution scenario
When an obstacle forces the robot towards a pose far from the demonstrations, TamedPUMA recovers and reaches the goal.
Human disturbances
When a human disturbs the robot, TamedPUMA recovers and converges to the goal.
Experiment: Pouring a liquid
Avoiding the helmet
TamedPUMA avoids collisions with the helmet while performing a learned pouring task.
Online goal changes
Using TamedPUMA, the goal can be changed online.
Comparison with geometric fabrics and PUMA
Geometric Fabrics
Geometric fabrics end up in a deadlock when two obstacles represent the side of the box.
TamedPUMA (ours)
The DNN encodes an intuitive movement to avoid the side of the box.
PUMA
Via PUMA, the demonstrated movement is learned while ensuring goal convergence. Obstacle avoidance and physical constraints are NOT considered.
TamedPUMA (ours)
Via TamedPUMA, PUMA is enhanced for safe and stable navigation. Whole-body collision avoidance and joint limits are incorporated.
Table 1 contains the specifications of the trained DNN for the tomato-picking and pouring tasks. The network provides a second-order dynamical system where positions are trained over a Euclidean space and orientations over a spherical space. An illustration of the performance after 5000 training iterations is provided in Figure 3. It shows convergence to the goal and the closeness of the solutions to the motion profile demonstrated by the user. Training the DNN for 5000 iterations takes 9 minutes on a standard laptop (i7-12700H), and requesting an action from the DNN online takes 0.6 $\pm$ 0.1 ms.
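Because positions live in Euclidean space and orientations on a sphere, rolling out the learned second-order system needs two different update rules. A minimal sketch, assuming unit quaternions stored as (w, x, y, z), body-frame angular velocity, and semi-implicit Euler integration (all our choices, not taken from the text): positions are updated additively, while orientations are updated through the exponential map so that the quaternion stays on the unit sphere. The accelerations `lin_acc` and `ang_acc` are hypothetical placeholders for the DNN output.

```python
import numpy as np


def quat_mul(q, r):
    """Hamilton product of quaternions stored as (w, x, y, z)."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    ])


def quat_exp(omega, dt):
    """Unit quaternion for rotating with angular velocity omega (rad/s) over dt."""
    speed = np.linalg.norm(omega)
    angle = speed * dt
    if angle < 1e-12:
        return np.array([1.0, 0.0, 0.0, 0.0])
    axis = omega / speed
    return np.concatenate([[np.cos(angle / 2)], np.sin(angle / 2) * axis])


def step(p, v, q, omega, lin_acc, ang_acc, dt):
    """One semi-implicit Euler step of a second-order pose system."""
    v = v + lin_acc * dt                  # Euclidean: plain addition
    p = p + v * dt
    omega = omega + ang_acc * dt
    q = quat_mul(q, quat_exp(omega, dt))  # spherical: move along S^3
    q = q / np.linalg.norm(q)             # remove floating-point drift
    return p, v, q, omega


# Usage: constant linear velocity and a quarter turn about z over 1 s.
p, v = np.zeros(3), np.array([1.0, 0.0, 0.0])
q, omega = np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.0, 0.0, np.pi / 2])
for _ in range(100):
    p, v, q, omega = step(p, v, q, omega, np.zeros(3), np.zeros(3), dt=0.01)
print(p, np.round(q, 4))
```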
Table 1. Hyperparameters of PUMA for a second-order dynamical system learning a pose, i.e., position and orientation.
| Hyperparameter | Value (tomato-picking) | Value (pouring) |
|---|---|---|
| PUMA | | |
| Stability loss margin ($m$) | 1e-6 | 1e-6 |
| Triplet imitation loss weight ($\lambda$) | 1.0 | 1.0 |
| Window size imitation ($\mathcal{H}^i$) | 13 | 13 |
| Window size stability ($\mathcal{H}^s$) | 2 | 2 |
| Batch size imitation ($\mathcal{B}^i$) | 800 | 800 |
| Batch size stability ($\mathcal{B}^s$) | 800 | 800 |
| Neural Network | | |
| Optimizer | Adam | Adam |
| Number of iterations | 40000 | 27000 |
| Learning rate | 1e-4 | 1e-4 |
| Activation function | GELU | GELU |
| Num. layers ($\varphi_{\theta}, \rho_\theta$) | (3, 3) | (3, 3) |
| Layer normalization | yes | yes |
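The only difference between the two tasks in Table 1 is the number of training iterations. If the hyperparameters are kept in code, a frozen dataclass is one simple way to hold them in one place; the field names below are our own and are not taken from the PUMA code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PumaConfig:
    # PUMA losses (hypothetical field names; symbols from Table 1 in comments)
    stability_margin: float = 1e-6   # m
    imitation_weight: float = 1.0    # lambda
    window_imitation: int = 13       # H^i
    window_stability: int = 2        # H^s
    batch_imitation: int = 800       # B^i
    batch_stability: int = 800       # B^s
    # Neural network
    optimizer: str = "Adam"
    iterations: int = 40000
    learning_rate: float = 1e-4
    activation: str = "GELU"
    num_layers: tuple = (3, 3)       # (varphi_theta, rho_theta)
    layer_norm: bool = True


TOMATO_PICKING = PumaConfig()
POURING = PumaConfig(iterations=27000)
```

Expressing the pouring configuration as a one-field override makes the single difference between the two tasks explicit.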
References
[1] Ratliff, Nathan, and Van Wyk, Karl. (2023). "Fabrics: A Foundationally Stable Medium for Encoding Prior Experience." arXiv preprint arXiv:2309.07368.
[2] Ratliff, Nathan D., Van Wyk, Karl, Xie, Mandy, Li, Anqi, and Rana, Muhammad Asif. (2020). "Optimization Fabrics." arXiv preprint arXiv:2008.02399.
[3] Pérez-Dattari, Rodrigo, Della Santina, Cosimo, and Kober, Jens. (2024). "Deep Metric Imitation Learning for Stable Motion Primitives." Advanced Intelligent Systems.
Related Publications
Safe and stable motion primitives via imitation learning and geometric fabrics
Saray Bakker, Rodrigo Pérez-Dattari, Cosimo Della Santina, Wendelin Böhmer, Javier Alonso-Mora.
In Robotics: Science and Systems, Workshop on Structural Priors as Inductive Biases for Learning Robot Dynamics, 2024.
Using the language of dynamical systems, imitation learning (IL) provides an intuitive and effective way of teaching stable task-space motions to robots with goal convergence. Yet, these techniques are affected by serious limitations when it comes to ensuring safety and fulfillment of physical constraints. With this work, we propose to solve this challenge via TamedPUMA, an IL algorithm augmented with a recent development in motion planning called geometric fabrics. We explore two variations of this approach, which we name the forcing policy method and the compatible potential method. Making these combinations possible requires two enabling factors: the possibility of learning second-order dynamical systems by imitation and the availability of a potential function that is compatible with the learned dynamics. In this paper, we show how these conditions can be met when using an IL strategy called PUMA. The result is a stable imitation learning strategy within which we can seamlessly blend geometrical constraints like collision avoidance and joint limits. Beyond providing a theoretical analysis, we demonstrate TamedPUMA with simulated and real-world tasks, including a 7-degree-of-freedom manipulator that is trained to pick a tomato from a crate in the presence of obstacles.
Reactive grasp and motion planning for adaptive mobile manipulation among obstacles
Tomas Merva, Saray Bakker, Max Spahn, Ivan Virgala, Javier Alonso-Mora.
In Robotics: Science and Systems, Workshop on Frontiers of Optimization for Robotics, 2024.
Mobile manipulators are susceptible to situations in which the precomputed grasp pose is not reachable as the result of conflicts between collision avoidance behaviour and the manipulation task. In this work, we address this issue by combining real-time grasp planning with geometric motion planning for decentralized multi-agent systems, referred to as Reactive Grasp Fabrics (RGF). We optimize the precomputed grasp pose candidate to account for obstacles and the robot's kinematics. By leveraging a reactive geometric motion planner, specifically geometric fabrics, the grasp optimization problem can be simplified, resulting in a fast, adaptive framework that can resolve deadlock situations in pick-and-place tasks. We demonstrate the robustness of this approach by controlling a mobile manipulator in both simulation and real-world experiments in dynamic environments.
Multi-Robot Local Motion Planning Using Dynamic Optimization Fabrics
Saray Bakker, Luzia Knoedler, Max Spahn, Wendelin Böhmer, Javier Alonso-Mora.
In Proc. IEEE International Symposium on Multi-Robot and Multi-Agent Systems, 2023.
In this paper, we address the problem of real-time motion planning for multiple robotic manipulators that operate in close proximity. We build upon the concept of dynamic fabrics and extend them to multi-robot systems, referred to as Multi-Robot Dynamic Fabrics (MRDF). This geometric method enables a very high planning frequency for high-dimensional systems at the expense of being reactive and prone to deadlocks. To detect and resolve deadlocks, we propose Rollout Fabrics where MRDF are forward simulated in a decentralized manner. We validate the methods in simulated close-proximity pick-and-place scenarios with multiple manipulators, showing high success rates and real-time performance.
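The abstract does not specify the deadlock criterion used in Rollout Fabrics. As an illustration only, a rollout-based check could forward-simulate a reactive planner over a short horizon and flag a deadlock when the simulated robot comes to rest away from its goal. The thresholds, the horizon, and the two toy acceleration functions below are hypothetical, and the decentralized, per-robot aspect of the method is omitted.

```python
import numpy as np


def rollout_deadlocks(accel_fn, x0, v0, goal, horizon=5.0, dt=0.01,
                      v_tol=1e-2, d_tol=5e-2):
    """Forward-simulate accel_fn and report a deadlock if the rollout ends
    nearly at rest while still farther than d_tol from the goal."""
    x, v = np.asarray(x0, dtype=float), np.asarray(v0, dtype=float)
    for _ in range(int(round(horizon / dt))):
        v = v + accel_fn(x, v) * dt  # semi-implicit Euler
        x = x + v * dt
    stalled = np.linalg.norm(v) < v_tol
    far = np.linalg.norm(goal - x) > d_tol
    return bool(stalled and far)


goal = np.array([1.0, 0.0])


def reaches_goal(x, v):
    # Critically damped attraction to the goal.
    return 4.0 * (goal - x) - 4.0 * v


def blocked(x, v):
    # Hypothetical stand-in for obstacles: the motion settles at (0.5, 0).
    return 4.0 * (np.array([0.5, 0.0]) - x) - 4.0 * v


print(rollout_deadlocks(reaches_goal, [0.0, 0.0], [0.0, 0.0], goal))  # False
print(rollout_deadlocks(blocked, [0.0, 0.0], [0.0, 0.0], goal))       # True
```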