HMD2: Environment-aware Motion Generation from Single Egocentric Head-Mounted Device

1Tübingen AI Center, University of Tübingen, 2Max Planck Institute for Informatics, Saarland Informatics Campus, 3Stanford University, 4Nanyang Technological University, 5Meta Reality Labs Research
*Equal contribution. Work done during internships at Meta Reality Labs Research. Work done partially during internship at Meta Reality Labs Research.
Teaser figure.

HMD2 generates full human-body motion from a single head-mounted device equipped with an outward-facing camera in complex and diverse environments.

Abstract

This paper investigates the online generation of realistic full-body human motion using a single head-mounted device with an outward-facing color camera and the ability to perform visual SLAM. Given the inherent ambiguity of this setup, we introduce a novel system, HMD2, designed to balance motion reconstruction and motion generation.

From a reconstruction standpoint, our system aims to maximally utilize the camera streams to produce both analytical and learned features, including head motion, the SLAM point cloud, and image embeddings. On the generative front, HMD2 employs a multi-modal conditional motion diffusion model with a time-series backbone that maintains temporal coherence in the generated motions, and uses autoregressive in-painting to enable online motion inference with minimal latency (0.17 seconds). Collectively, we demonstrate that our system offers a highly effective and robust solution that scales to an extensive dataset of over 200 hours collected in a wide range of complex indoor and outdoor environments using publicly available smart glasses.
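To make the autoregressive in-painting idea concrete, the following is a minimal sketch, not the authors' released code, of how online diffusion sampling can re-use already-generated frames: at every denoising step the known prefix of the current window is re-noised and written back into the sample, so only the trailing frames are newly generated. All names and sizes here (denoise_step, add_noise, WINDOW, PREFIX, POSE_DIM, STEPS) are illustrative assumptions.

```python
# Hedged sketch of autoregressive in-painting for online motion diffusion.
# The denoiser below is a dummy stand-in for the conditional diffusion model.
import numpy as np

WINDOW, PREFIX, POSE_DIM, STEPS = 32, 24, 132, 50  # assumed sizes

def add_noise(x0, t):
    """Forward-diffuse clean motion x0 to noise level t (toy linear schedule)."""
    alpha = 1.0 - t / STEPS
    return np.sqrt(alpha) * x0 + np.sqrt(1.0 - alpha) * np.random.randn(*x0.shape)

def denoise_step(x_t, t, cond):
    """Placeholder for one reverse-diffusion step of the conditional model."""
    return x_t - (1.0 / STEPS) * (x_t - cond["head_traj"])  # dummy update rule

def sample_window(history, cond):
    """Generate one motion window, in-painting the overlap with `history`."""
    x = np.random.randn(WINDOW, POSE_DIM)            # start from pure noise
    for t in range(STEPS, 0, -1):
        if history is not None:                       # overwrite the known prefix
            x[:PREFIX] = add_noise(history[-PREFIX:], t)
        x = denoise_step(x, t, cond)
    return x

# Online loop: each call emits only WINDOW - PREFIX new frames, keeping latency low.
history = None
for _ in range(3):
    cond = {"head_traj": np.zeros((WINDOW, POSE_DIM))}  # stand-in conditioning
    window = sample_window(history, cond)
    new_frames = window if history is None else window[PREFIX:]
    history = window
```

The design point this illustrates is that the overlap region anchors each new window to previously emitted motion, so the stream stays temporally coherent without waiting for a full sequence to be generated.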

Video

System Overview

Our system, HMD2, generates realistic full-body motion that aligns with the signals from a single head-mounted device. Using the image stream from the egocentric camera, together with the head trajectory and feature point cloud from the onboard SLAM system, we employ a diffusion-based framework to generate the wearer's full-body motion.

Method figure.
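As a rough illustration of the multi-modal conditioning described above, the sketch below packs the three input streams (head trajectory, SLAM feature cloud, image embeddings) into per-frame conditioning tokens for the diffusion model. This is not the released implementation; the encoders, pooling choice, and all dimensions are assumptions made for the example.

```python
# Illustrative fusion of the three conditioning modalities into one tensor.
import numpy as np

T, IMG_DIM, PTS, COND_DIM = 32, 512, 256, 256  # assumed sizes

def encode_head(head_poses):             # (T, 7): translation + quaternion per frame
    return head_poses @ np.random.randn(7, COND_DIM) / np.sqrt(7)

def encode_points(point_cloud):          # (PTS, 3): SLAM feature cloud
    feat = np.tanh(point_cloud @ np.random.randn(3, COND_DIM))
    return feat.max(axis=0)              # permutation-invariant pooling over points

def encode_images(image_embeddings):     # (T, IMG_DIM): per-frame image features
    return image_embeddings @ np.random.randn(IMG_DIM, COND_DIM) / np.sqrt(IMG_DIM)

def build_condition(head_poses, point_cloud, image_embeddings):
    """Fuse the three modalities into a (T, 3 * COND_DIM) conditioning tensor."""
    pts = np.broadcast_to(encode_points(point_cloud), (T, COND_DIM))
    return np.concatenate(
        [encode_head(head_poses), pts, encode_images(image_embeddings)], axis=-1)

cond = build_condition(np.zeros((T, 7)),
                       np.random.randn(PTS, 3),
                       np.random.randn(T, IMG_DIM))
print(cond.shape)  # (32, 768)
```

In this reading, the head trajectory and image embeddings vary per frame while the point-cloud summary is shared across the window, giving the generator both the wearer's motion cues and a coarse description of the surrounding environment.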

Results from Unseen Environments and Subjects

Effectiveness of Image and Point Cloud Conditions

Diverse Generation with Underspecified Input

Citation

@inproceedings{guzov-jiang2024hmd2,
  title = {HMD$^2$: Environment-aware Motion Generation from Single Egocentric Head-Mounted Device},
  author = {Guzov, Vladimir and Jiang, Yifeng and Hong, Fangzhou and Pons-Moll, Gerard and Newcombe, Richard and Liu, C. Karen and Ye, Yuting and Ma, Lingni},
  booktitle = {arXiv},
  year = {2024},
  month = sep,
}