DeformMaster: An Interactive Physics-Neural World Model for Deformable Objects from Videos

Can Li1,*, Zhoujian Li2,*, Ren Li3, Jie Gu4, Lei Lei5,*, Jingmin Chen4, Lei Sun1
1Nankai University   2Zhejiang University   3Southern University of Science and Technology
4Rightly Robotics, A4X   5University of Science and Technology of China
*Work done during internship at Rightly Robotics, A4X.
DeformMaster teaser

DeformMaster turns a phone-captured monocular video of deformable objects into an online interactive world model. By recovering both underlying physics and high-fidelity appearance, it supports long-horizon rollouts under novel actions, material-parameter variation, and novel-view synthesis.

Online Interaction

More online interactive demos are coming soon.

Abstract

World models for deformable objects should recover not only geometry and appearance, but also underlying physical dynamics, interaction grounding, and material behavior. Learning such a model from real videos is challenging because deformable linear, planar, and volumetric objects evolve under high-dimensional deformation, noisy interactions, and complex material response. The model must therefore infer a physical state from visual observations, roll it forward under new interactions, and render the resulting dynamics with high visual fidelity.

We present DeformMaster, a video-derived interactive physics–neural world model that turns real interaction videos into an online interactive model of deformable objects within a unified dynamics-and-appearance framework. DeformMaster preserves structured physical rollout while using a neural residual to compensate for unmodeled effects, grounds sparse hand motion as distributed compliant actuators for hand–continuum interaction, represents material response with spatially varying constitutive experts, and drives high-fidelity 4D appearance from the predicted physical evolution. Experiments on real-world deformable-object sequences demonstrate DeformMaster's ability to roll out future dynamics and render dynamic appearance, outperforming state-of-the-art baselines while supporting novel action rollout, material-parameter variation, and dynamic novel-view synthesis.

Method

DeformMaster method overview

DeformMaster learns a deformable-object world model from interaction videos by coupling interactive physics–neural dynamics with physics-grounded appearance. It integrates four components for stable dynamics rollout, robust hand–continuum interaction, heterogeneous material response, and high-fidelity rendering:

  • Physics–Neural Particle-Grid Dynamics (PNPGD). A differentiable MPM block advances the material-particle state over each frame, while a particle-grid neural residual estimates a residual velocity field from the post-MPM state and short motion history. This keeps rollout structured and stable while absorbing systematic real-world mismatch.
  • Distributed Compliant Actuators (DCA). Vision-derived hand tracks are treated as compliant actuator anchors rather than hard point constraints. Local actuator-particle couplings damp tracking noise and distribute forces over contact patches, yielding robust hand–continuum interaction.
  • Mixture of Constitutive Experts (MoCE). The stress response is represented as a spatially varying mixture of named constitutive laws, including Neo-Hookean, Corotated, and StVK experts. Patch expert weights and learnable material parameters let different regions express different deformation behavior.
  • Physics-grounded 4D appearance. Gaussian splats are incrementally deformed by the predicted material-particle trajectory using linear blend skinning. Rendering therefore stays aligned with physical motion without re-optimizing a separate dynamic appearance model for each new action.
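The PNPGD composition (a structured physics step followed by an additive residual velocity) can be sketched as below. This is a minimal illustration under stated assumptions, not the DeformMaster implementation: the differentiable MPM block is replaced by a hypothetical `toy_physics_step` (gravity plus a floor), and the particle-grid neural residual is replaced by a hypothetical linear map `residual_velocity` over post-physics position, velocity, and the mean velocity of a short history window. All names and constants here are illustrative.

```python
import numpy as np

# Hypothetical constants for the stand-in physics; not taken from the paper.
GRAVITY = np.array([0.0, -9.8, 0.0])
FLOOR_Y = 0.0


def toy_physics_step(x, v, dt):
    """Placeholder for the differentiable MPM block (assumption).

    A real MPM step would scatter particles to a grid, update grid momentum
    with internal stress, and gather back; here we only apply gravity and a
    floor so that the composition pattern stays runnable.
    """
    v_new = v + dt * GRAVITY
    x_new = x + dt * v_new
    below = x_new[:, 1] < FLOOR_Y
    x_new[below, 1] = FLOOR_Y
    v_new[below, 1] = 0.0
    return x_new, v_new


def residual_velocity(x_post, v_post, history, W):
    """Hypothetical residual: a linear map on per-particle features.

    Features are post-physics position, velocity, and the mean velocity over
    the history window; the paper uses a particle-grid network instead.
    """
    hist_mean = history.mean(axis=0)                              # (N, 3)
    feats = np.concatenate([x_post, v_post, hist_mean], axis=1)   # (N, 9)
    return feats @ W                                              # (N, 3)


def pnpgd_step(x, v, history, W, dt):
    """One frame: structured physics step, then an additive residual velocity."""
    x_post, v_post = toy_physics_step(x, v, dt)
    dv = residual_velocity(x_post, v_post, history, W)
    v_next = v_post + dv
    x_next = x_post + dt * dv                # advect by the correction only
    # Slide the motion-history window: drop the oldest frame, append the newest.
    history = np.concatenate([history[1:], v_next[None]], axis=0)
    return x_next, v_next, history
```

With `W` set to zeros the step reduces exactly to the physics stand-in, which mirrors the design intent described above: the physics carries the rollout and the residual only absorbs what the physics misses.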
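One way to picture a distributed compliant actuator is a spring-damper that couples a tracked hand anchor to a local patch of particles with normalized falloff weights, instead of pinning a single particle to the anchor. The sketch below assumes a Gaussian falloff truncated at three radii and hypothetical values for stiffness `k`, damping `c`, and `radius`; the paper's actual coupling form and parameters are not specified here.

```python
import numpy as np


def compliant_actuator_forces(x, v, anchor, anchor_vel, k=200.0, c=5.0, radius=0.05):
    """Distribute one hand anchor's pull over a local patch of particles.

    x, v: (N, 3) particle positions and velocities; anchor, anchor_vel: (3,).
    k, c, and radius are hypothetical stiffness, damping, and patch-size values.
    """
    d2 = np.sum((x - anchor) ** 2, axis=1)
    w = np.exp(-d2 / (2.0 * radius ** 2))      # Gaussian falloff with distance
    w[d2 > (3.0 * radius) ** 2] = 0.0          # truncate: only a local patch couples
    total = w.sum()
    if total == 0.0:                           # hand is not near the object
        return np.zeros_like(x)
    w /= total                                 # weights sum to 1 over the patch
    # The spring pulls particles toward the anchor and the damper matches
    # velocities, so jitter in the tracked anchor yields a bounded force
    # rather than a hard position snap.
    f = k * (anchor - x) + c * (anchor_vel - v)
    return w[:, None] * f
```

Normalizing the weights keeps the total coupling strength roughly independent of how many particles fall inside the patch, which is one plausible reason to prefer this over per-particle hard constraints.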
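A mixture of constitutive experts can be illustrated by computing the first Piola-Kirchhoff stress of each named law and blending them with softmax weights. The sketch below uses the standard textbook forms of Neo-Hookean, fixed Corotated, and StVK stress; the softmax gating, the shared Lamé parameters per patch, and the helper names `lame` and `moce_stress` are assumptions for illustration, not the paper's exact parameterization.

```python
import numpy as np


def lame(E, nu):
    """Convert Young's modulus and Poisson's ratio to Lamé parameters."""
    mu = E / (2.0 * (1.0 + nu))
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    return mu, lam


def neo_hookean(F, mu, lam):
    J = np.linalg.det(F)
    FinvT = np.linalg.inv(F).T
    return mu * (F - FinvT) + lam * np.log(J) * FinvT


def corotated(F, mu, lam):
    U, _, Vt = np.linalg.svd(F)
    R = U @ Vt                                 # polar rotation; assumes det(F) > 0
    J = np.linalg.det(F)
    FinvT = np.linalg.inv(F).T
    return 2.0 * mu * (F - R) + lam * (J - 1.0) * J * FinvT


def stvk(F, mu, lam):
    E = 0.5 * (F.T @ F - np.eye(3))            # Green strain
    return F @ (2.0 * mu * E + lam * np.trace(E) * np.eye(3))


EXPERTS = (neo_hookean, corotated, stvk)


def moce_stress(F, logits, mu, lam):
    """Blend expert stresses with per-patch softmax weights (hypothetical gating)."""
    w = np.exp(logits - logits.max())          # numerically stable softmax
    w /= w.sum()
    return sum(wi * expert(F, mu, lam) for wi, expert in zip(w, EXPERTS))
```

Changing `E` or `nu` before calling `lame` is one simple way to express the kind of material-parameter variation this page describes, while the logits decide which law dominates in each region.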
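Driving Gaussian splats from the material-particle trajectory with linear blend skinning can be sketched as follows: each Gaussian center is bound to its k nearest particles with inverse-distance weights, and per frame it moves by x' = Σ_j w_j (R_j (x − p_j) + p_j'). This is a minimal sketch assuming per-particle rotations `rots` are already available (how they are estimated is not covered here); Gaussian covariance and orientation updates are omitted, and `lbs_weights` and `lbs_deform` are hypothetical names.

```python
import numpy as np


def lbs_weights(centers, particles, k=4, eps=1e-8):
    """Bind each Gaussian center to its k nearest particles (inverse-distance weights)."""
    d = np.linalg.norm(centers[:, None, :] - particles[None, :, :], axis=2)  # (G, P)
    idx = np.argsort(d, axis=1)[:, :k]                                     # (G, k)
    dk = np.take_along_axis(d, idx, axis=1)
    w = 1.0 / (dk + eps)
    w /= w.sum(axis=1, keepdims=True)          # weights per Gaussian sum to 1
    return idx, w


def lbs_deform(centers, p_prev, p_next, rots, idx, w):
    """One incremental step: x' = sum_j w_j (R_j (x - p_j) + p_j').

    p_prev, p_next: (P, 3) particle positions at consecutive frames;
    rots: (P, 3, 3) per-particle rotations over that frame (assumed given).
    """
    local = centers[:, None, :] - p_prev[idx]                 # (G, k, 3)
    rotated = np.einsum("gkab,gkb->gka", rots[idx], local)    # apply R_j
    moved = rotated + p_next[idx]
    return np.sum(w[..., None] * moved, axis=1)               # (G, 3)
```

Applied frame by frame, with each output becoming the next step's `centers`, this keeps the rendered splats tied to the predicted physical motion without fitting a separate dynamic appearance model per action.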

Results

We evaluate DeformMaster on real-world deformable-object sequences spanning linear (ropes), planar (cloths, packages), and volumetric (soft-bodied toys) objects.

Qualitative results

Long-horizon rollouts of DeformMaster on deformable-object sequences from PhysTwin. Foreground: prediction; background: ground truth.

double_lift_cloth_1

double_lift_cloth_3

double_lift_sloth

double_lift_zebra

double_stretch_sloth

double_stretch_zebra

single_clift_cloth_1

single_clift_cloth_3

single_lift_cloth

single_lift_cloth_1

single_lift_cloth_3

single_lift_cloth_4

single_lift_dinosor

single_lift_rope

single_lift_sloth

single_lift_zebra

single_push_rope

single_push_rope_1

single_push_sloth

weird_package

Comparisons

Side-by-side comparisons between a baseline and DeformMaster on representative deformable-object sequences.

Baseline (PhysTwin) vs. Ours

double_lift_zebra

double_lift_sloth

Novel action- and material-conditioned prediction

DeformMaster supports rollouts under novel actions and material-parameter variations, going beyond both the interactions observed in the input videos and the baselines.

Rope

Sloth

Per-category dynamics prediction

Per-category dynamics prediction on the PhysTwin sequences (n=20), grouped into deformable linear, planar, and volumetric objects.

Method              | Linear (n=3)            | Planar (n=9)            | Volumetric (n=8)
                    | IoU ↑ Chamfer ↓ Track ↓ | IoU ↑ Chamfer ↓ Track ↓ | IoU ↑ Chamfer ↓ Track ↓
PhysTwin            | 0.658 0.007     0.013   | 0.738 0.013     0.028   | 0.748 0.013     0.021
DeformMaster (ours) | 0.721 0.005     0.010   | 0.748 0.013     0.032   | 0.756 0.012     0.020
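The three metrics in the table can be illustrated with common definitions: mask IoU between predicted and ground-truth silhouettes, a symmetric Chamfer distance between point sets, and mean L2 error over tracked points. This is a hedged sketch only: the exact conventions used for the reported numbers (squared vs. unsquared distances, averaging over frames and views, units) are not stated on this page, and the function names are hypothetical.

```python
import numpy as np


def mask_iou(pred, gt):
    """Intersection-over-union of two binary masks."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else np.logical_and(pred, gt).sum() / union


def chamfer(a, b):
    """Symmetric Chamfer distance: mean nearest-neighbor distance in both directions, averaged."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)   # (|a|, |b|)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())


def track_error(pred_tracks, gt_tracks):
    """Mean Euclidean error between corresponding tracked points, shape (..., 3)."""
    return np.linalg.norm(pred_tracks - gt_tracks, axis=-1).mean()
```

In a rollout evaluation these would typically be computed per frame and averaged over the horizon, then averaged over the sequences in each category.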

More quantitative results, qualitative results, ablations, and an interactive online playground will be released here.

BibTeX

@article{li2026deformmaster,
    title         = {DeformMaster: An Interactive Physics-Neural World Model for Deformable Objects from Videos},
    author        = {Li, Can and Li, Zhoujian and Li, Ren and Gu, Jie and Lei, Lei and Chen, Jingmin and Sun, Lei},
    journal       = {arXiv preprint arXiv:2605.09586},
    year          = {2026},
    eprint        = {2605.09586},
    archivePrefix = {arXiv}
}