Articulat3D: Reconstructing Articulated Digital Twins From Monocular Videos with Geometric and Motion Constraints

*Equal contribution, Corresponding authors
1WHU, 2CUHK, 3CAMS & PUMC, 4UM, 5ZJU

Abstract

Building high-fidelity digital twins of articulated objects from visual data remains a central challenge. Existing approaches depend on multi-view captures of the object in discrete, static states, which severely constrains their real-world scalability. In this paper, we introduce Articulat3D, a novel framework that constructs such digital twins from casually captured monocular videos by jointly enforcing explicit 3D geometric and motion constraints. We first propose Motion Prior–Driven Initialization, which leverages 3D point tracks to exploit the low-dimensional structure of articulated motion. By modeling scene dynamics with a compact set of motion bases, we obtain a soft decomposition of the scene into multiple rigidly moving groups. Building on this initialization, we introduce Geometric and Motion Constraints Refinement, which enforces physically plausible articulation through learnable kinematic primitives, each parameterized by a joint axis, a pivot point, and per-frame motion scalars, yielding reconstructions that are both geometrically accurate and temporally coherent. Extensive experiments demonstrate that Articulat3D achieves state-of-the-art performance on synthetic benchmarks and casually captured monocular videos, advancing the feasibility of digital twin creation under real-world conditions.
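
To make the initialization step concrete, the sketch below shows one way the motion-basis model can be realized: each tracked point follows a softmax-weighted combination of K per-frame rigid motion bases, so the weights induce a soft decomposition into rigidly moving groups. This is a minimal numpy illustration under our own naming (rodrigues, blend_motion_bases, and all variables are hypothetical), not the paper's implementation; in practice the basis transforms and logits would be fit to the 3D point tracks.

import numpy as np

def rodrigues(axis_angle):
    """Axis-angle vector (3,) -> rotation matrix (3, 3) via Rodrigues' formula."""
    theta = np.linalg.norm(axis_angle)
    if theta < 1e-8:
        return np.eye(3)
    k = axis_angle / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def blend_motion_bases(x0, basis_rot, basis_trans, logits):
    """Predict per-frame point positions from K rigid motion bases.

    x0:          (N, 3) canonical 3D points
    basis_rot:   (T, K, 3) per-frame axis-angle rotation of each basis
    basis_trans: (T, K, 3) per-frame translation of each basis
    logits:      (N, K) learnable per-point assignment logits
    Returns:     (T, N, 3) predicted positions
    """
    T = basis_rot.shape[0]
    num_bases = basis_rot.shape[1]
    # Softmax over bases -> soft decomposition into rigidly moving groups.
    w = np.exp(logits - logits.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)
    out = np.zeros((T, x0.shape[0], 3))
    for t in range(T):
        for k in range(num_bases):
            R = rodrigues(basis_rot[t, k])
            out[t] += w[:, k:k + 1] * (x0 @ R.T + basis_trans[t, k])
    return out

A hard grouping for downstream refinement can then be read off as logits.argmax(axis=1), while keeping the weights soft during fitting lets points migrate between candidate groups as the residual against the tracks is minimized.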
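
The refinement stage's kinematic primitive likewise admits a compact sketch: a revolute joint rotates a part about an axis through a pivot by a per-frame scalar, and a prismatic joint translates it along the axis. Again, the functions below are our own hypothetical illustration of such primitives under stated assumptions, not the released code.

import numpy as np

def revolute_transform(points, axis, pivot, theta):
    """Rotate `points` (N, 3) by angle `theta` about the line through
    `pivot` (3,) with direction `axis` (3,); one theta per frame."""
    k = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    R = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)
    return (points - pivot) @ R.T + pivot

def prismatic_transform(points, axis, d):
    """Translate `points` (N, 3) by scalar `d` along the joint `axis`."""
    return points + d * (axis / np.linalg.norm(axis))

Because the axis and pivot are shared across the whole clip while only the per-frame scalars vary, the articulation is constrained to a physically plausible, temporally coherent one-degree-of-freedom motion.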

Pipeline
Figure 1: Overview of the proposed pipeline.

Comparison with baselines

Articulat3D consistently achieves state-of-the-art (SOTA) performance across all evaluated datasets (Video2Articulation-S, Articulat3D-Sim, and Articulat3D-Real) and all evaluation metrics.

Table 1: Comparison of Articulat3D with SOTA baselines.
Figure 2: Visual results on the Video2Articulation-S dataset.
Figure 3: Visual results on the Articulat3D-Sim dataset.
Figure 4: Visual results on the Articulat3D-Real dataset.

BibTeX


      @article{zhao2025articulat3d,
        title={Articulat3D: Reconstructing Articulated Digital Twins From Monocular Videos with Geometric and Motion Constraints},
        author={Zhao, Haoyu and Zhuang, Linghao and Zhao, Xingyue and Zeng, Cheng and Xu, Haoran and Jiang, Yuming and Cen, Jun and Wang, Kexiang and Guo, Jiayan and Huang, Siteng and Li, Xin and Zhao, Deli and Zou, Hua},
        journal={arXiv preprint arXiv:2508.08896},
        year={2025}
      }