READ: Retrieval-Enhanced Asymmetric Diffusion for Motion Planning
(CVPR 2024)

Takeru Oba¹, Matthew Walter², Norimichi Ukita¹

¹Toyota Technological Institute

²Toyota Technological Institute at Chicago

Abstract

This paper proposes Retrieval-Enhanced Asymmetric Diffusion (READ) for image-based robot motion planning. Given an image of the scene, READ retrieves an initial motion from a database of image-motion pairs, and uses a diffusion model to refine the motion for the given scene. Unlike prior retrieval-based diffusion models that require long forward-reverse diffusion paths, READ directly diffuses between the source (retrieved) and target motions, resulting in an efficient diffusion path. A second contribution of READ is its use of asymmetric diffusion, whereby it preserves the kinematic feasibility of the generated motion by forward diffusion in a low-dimensional latent space, while achieving high-resolution motion by reverse diffusion in the original task space using cold diffusion. Experimental results on various manipulation tasks demonstrate that READ outperforms state-of-the-art planning methods, while ablation studies elucidate the contributions of asymmetric diffusion.
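The retrieval step can be pictured as a nearest-neighbor lookup over the image-motion database. This is a minimal sketch that assumes each database image has already been embedded as a feature vector by some image encoder and that similarity is measured by cosine similarity. Neither the encoder nor the metric is specified here, so `retrieve_motion` and the toy database are hypothetical.

```python
import numpy as np

def retrieve_motion(query_feat, db_feats, db_motions):
    """Return the stored motion whose image feature is most similar to the query.

    Hypothetical: assumes precomputed image features and cosine similarity.
    """
    q = query_feat / np.linalg.norm(query_feat)
    d = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    sims = d @ q                    # cosine similarity to every database entry
    idx = int(np.argmax(sims))      # index of the nearest neighbor
    return db_motions[idx], idx

# Toy database: 3 scenes with 4-D image features; each motion is
# 5 waypoints x 7 joint angles (placeholder values).
db_feats = np.array([[1.0, 0.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0, 0.0],
                     [0.0, 0.0, 1.0, 0.0]])
db_motions = np.stack([np.full((5, 7), float(i)) for i in range(3)])

query = np.array([0.1, 0.9, 0.0, 0.2])   # most similar to scene 1
motion, idx = retrieve_motion(query, db_feats, db_motions)
print(idx, motion.shape)   # → 1 (5, 7)
```

The retrieved motion is only a starting point; the diffusion model then refines it for the query scene.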

Contributions

Retrieval enhancement: Typical stochastic diffusion-based motion planning methods (e.g., those built on DDPM) generate motions by running reverse diffusion from a random (e.g., Gaussian) sample, which may produce motion that is infeasible and/or fails to reach the goal. Retrieval-based methods instead use diffusion to refine a candidate (retrieved) motion that is assumed to be close to the target motion. However, prior retrieval-based methods first apply forward diffusion toward a zero-mean Gaussian and only then refine the motion toward the target, resulting in a longer, roundabout path. READ instead diffuses directly from the retrieved motion to the target.
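The geometric intuition behind the shorter path can be sketched by treating motions as flattened vectors. As a simplifying assumption, the forward-reverse scheme of prior methods is idealized here as its mean trajectory: shrink toward the Gaussian's zero mean, then grow toward the target. The helper names and this two-segment idealization are hypothetical and do not reproduce READ's actual schedule.

```python
import numpy as np

def path_length(points):
    """Total Euclidean length of a piecewise-linear path through `points`."""
    pts = np.asarray(points)
    return float(np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1)))

def direct_path(x_src, x_tgt, steps):
    """Move straight from the retrieved (source) motion to the target."""
    return [(1 - t) * x_src + t * x_tgt for t in np.linspace(0.0, 1.0, steps + 1)]

def roundabout_path(x_src, x_tgt, steps):
    """Idealized forward-then-reverse path: shrink the source toward the
    zero-mean Gaussian's mean (the origin), then grow toward the target."""
    half = steps // 2
    down = [(1 - t) * x_src for t in np.linspace(0.0, 1.0, half + 1)]
    up = [t * x_tgt for t in np.linspace(0.0, 1.0, steps - half + 1)]
    return down + up[1:]   # drop the duplicated origin point

# Toy flattened motions: the retrieved motion is already close to the target.
x_src = np.array([1.0, 1.0, 0.5])
x_tgt = np.array([1.1, 0.9, 0.6])

direct = path_length(direct_path(x_src, x_tgt, steps=10))
roundabout = path_length(roundabout_path(x_src, x_tgt, steps=10))
print(f"direct: {direct:.3f}, roundabout: {roundabout:.3f}")
```

By the triangle inequality, the detour through the origin (length ||x_src|| + ||x_tgt||) is never shorter than the direct path (length ||x_tgt - x_src||). The gap is largest when the retrieved motion is already close to the target.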

Latent space interpolation as forward process: Forward diffusion in the original high-dimensional task space struggles to preserve the feasibility of the motion. To refine the retrieved motion, READ instead performs forward diffusion by interpolating in a latent space that preserves the semantics of the motion, including its feasibility.
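The benefit of moving the forward process into a latent space can be illustrated on toy data. This sketch hypothetically assumes that feasible motions occupy a low-dimensional linear subspace and uses PCA as a stand-in for READ's learned latent space, whose architecture is not described here. It measures how far each intermediate motion drifts off that subspace under latent interpolation versus DDPM-style Gaussian noising in task space. The linear alpha-bar schedule is also an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "feasible" motions: flattened trajectories (5 waypoints x 3 DoF) that all
# lie on a 2-D linear subspace, standing in for the set of feasible motions.
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 15))

# Hypothetical latent space: PCA onto the top-2 principal directions.
mean = X.mean(axis=0)
_, _, vt = np.linalg.svd(X - mean, full_matrices=False)
basis = vt[:2]

def encode(x):
    return basis @ (x - mean)

def decode(z):
    return mean + basis.T @ z

def off_manifold(x):
    """Distance from x to the latent manifold (reconstruction error)."""
    return float(np.linalg.norm(x - decode(encode(x))))

def latent_forward(x_target, x_source, t, T):
    """Forward step by latent interpolation: t = 0 gives the target,
    t = T gives the retrieved source motion."""
    a = t / T
    return decode((1 - a) * encode(x_target) + a * encode(x_source))

def gaussian_forward(x_target, t, T, rng):
    """DDPM-style forward step with an assumed linear alpha-bar schedule."""
    alpha_bar = 1.0 - t / T
    noise = rng.normal(size=x_target.shape)
    return np.sqrt(alpha_bar) * x_target + np.sqrt(1.0 - alpha_bar) * noise

x_tgt, x_src = X[0], X[1]
T = 10
latent_err = max(off_manifold(latent_forward(x_tgt, x_src, t, T)) for t in range(T + 1))
gauss_err = max(off_manifold(gaussian_forward(x_tgt, t, T, rng)) for t in range(1, T + 1))
print(f"latent interpolation: {latent_err:.2e}, Gaussian forward: {gauss_err:.2e}")
```

Latent interpolation stays on the manifold up to floating-point error, while Gaussian noising leaves it. With a linear latent, this interpolation is equivalent to straight-line interpolation in task space. A nonlinear learned decoder is what would make the two differ.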

Asymmetric diffusion: While the latent space supports the generation of feasible motions, the lower dimensionality makes it difficult to model high-resolution motions such as those needed for manipulation. To address this, we propose asymmetric diffusion that performs the forward process in the latent space and the reverse process in the original space to achieve high-resolution refinement.
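The asymmetric scheme can be sketched with a cold diffusion sampling loop. This sketch assumes the improved sampling rule from the cold diffusion paper, x_{t-1} = x_t - D(x0_hat, t) + D(x0_hat, t-1). Here the degradation D is latent interpolation toward the retrieved motion, and the restoration estimate x0_hat is produced in task space. The section does not state which sampling rule READ uses. The PCA latent, `degrade`, `asymmetric_sample`, and the oracle restorer, which is a placeholder for a trained network, are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training motions (flattened, 12-D) on a 2-D subspace: the low-resolution
# latent manifold. PCA stands in for READ's learned latent space.
X = rng.normal(size=(100, 2)) @ rng.normal(size=(2, 12))
mean = X.mean(axis=0)
basis = np.linalg.svd(X - mean, full_matrices=False)[2][:2]

def encode(x):
    return basis @ (x - mean)

def decode(z):
    return mean + basis.T @ z

def degrade(x0, x_src, t, T):
    """Forward process in latent space: interpolate from x0 (t = 0) toward the
    retrieved motion x_src (t = T). As in cold diffusion, t = 0 is undegraded."""
    if t == 0:
        return x0
    a = t / T
    return decode((1 - a) * encode(x0) + a * encode(x_src))

def asymmetric_sample(x_src, restore, T):
    """Reverse process in task space using cold diffusion's improved sampling:
    x_{t-1} = x_t - D(x0_hat, t) + D(x0_hat, t - 1)."""
    x = x_src.copy()
    for t in range(T, 0, -1):
        x0_hat = restore(x, t)   # full-resolution estimate in task space
        x = x - degrade(x0_hat, x_src, t, T) + degrade(x0_hat, x_src, t - 1, T)
    return x

# Target = an on-manifold motion plus fine detail the 2-D latent cannot express.
v = rng.normal(size=12)
detail = 0.05 * (v - basis.T @ (basis @ v))   # component orthogonal to the latent
x_src, target = X[0], X[1] + detail

def oracle_restore(x, t):
    """Placeholder for a trained restoration network: returns the true target."""
    return target

result = asymmetric_sample(x_src, oracle_restore, T=10)
print(np.linalg.norm(decode(encode(target)) - target))  # detail lost by the latent
print(np.linalg.norm(result - target))                  # recovered in task space
```

With a perfect restorer, the updates telescope, so the output matches the target even though decoding the target's latent code discards its fine detail. A real, imperfect network would only approximate this. The example illustrates why running the reverse process in the original space can reach resolutions the latent space alone cannot.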