Within the Dynamic Context: Inertia-aware 3D Human Modeling with Pose Sequence

Yutong Chen*,1, Yifan Zhan*,1,2, Zhihang Zhong†,1, Wei Wang1, Xiao Sun†,1, Yu Qiao1, Yinqiang Zheng2
1Shanghai AI Laboratory, OpenGVLab, 2The University of Tokyo
*Co-first authors, †Co-corresponding authors
Teaser figure

We emphasize that appearance variations depend not only on different static poses but can also be induced by inertia, such as the graceful settling of a dress drape after a sudden stop in motion. Compared with previous methods that rely solely on static poses, we encode the past pose trajectory as a pose sequence to accurately capture such dynamic effects.
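To make this conditioning concrete, below is a minimal sketch of how a delta pose sequence can be assembled from the past pose trajectory. It assumes per-joint axis-angle rotations; the function name, window length, and padding strategy are illustrative and not taken from the released code.

```python
import numpy as np

def build_delta_pose_sequence(poses, t, window=5):
    """Stack differences between the current pose and its predecessors.

    poses: (T, J, 3) per-joint axis-angle rotations over time (assumed layout).
    Returns a (window, J, 3) array of delta poses; frames before the start
    of the sequence are padded by clamping to frame 0.
    """
    deltas = []
    for k in range(1, window + 1):
        past = poses[max(t - k, 0)]      # clamp at the first frame
        deltas.append(poses[t] - past)   # delta pose relative to the current frame
    return np.stack(deltas, axis=0)

# Example: 100 frames, 24 SMPL joints
poses = np.random.randn(100, 24, 3).astype(np.float32)
cond = build_delta_pose_sequence(poses, t=50)
print(cond.shape)  # (5, 24, 3)
```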


Video demos


Rendering results of our Dyco and HumanNeRF [1] on the I3D-Human dataset.

Novel velocity rendering



We select scaling factors α ranging from 0 to 2, where a larger α implies a higher motion speed. We run inference on the same pre-trained model and show rendering results conditioned on the scaled delta pose sequence. The skirt's swing amplitude varies markedly with α, consistent with everyday experience that a skirt flares out more at higher spinning velocities.
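A minimal sketch of this scaling, assuming the delta pose sequence is a plain array condition; `scale_delta_pose_sequence` and the commented rendering call are hypothetical names, not the actual API.

```python
import numpy as np

def scale_delta_pose_sequence(delta_seq, alpha):
    """Scale the conditional delta pose sequence to emulate a different motion speed.

    alpha = 0 removes the dynamic context (quasi-static rendering);
    alpha > 1 emulates the same motion performed faster.
    """
    return alpha * delta_seq

# delta_seq stands for the (window, J, 3) delta pose condition of one frame
delta_seq = np.random.randn(5, 24, 3).astype(np.float32)
for alpha in (0.0, 0.5, 1.0, 1.5, 2.0):
    cond_alpha = scale_delta_pose_sequence(delta_seq, alpha)
    # rgb = model.render(pose, dynamic_context=cond_alpha)  # hypothetical rendering call
```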


Novel acceleration rendering



Novel acceleration is simulated by designing a pose sequence that mimics an abrupt stop. Specifically, for a spinning test sequence, we truncate the poses midway and hold all subsequent poses fixed. We show the rendering sequences with (left) and without (right) the delta pose sequence condition to illustrate that our method accurately captures the effects of novel acceleration, such as the settling of a skirt hem.
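A minimal sketch of constructing such an abrupt-stop sequence; the function name and array layout are assumptions made for illustration.

```python
import numpy as np

def make_abrupt_stop(poses, stop_frame):
    """Freeze the pose from `stop_frame` onward to simulate a sudden stop.

    poses: (T, J, 3) per-joint rotations over time (assumed layout).
    Frames before `stop_frame` keep their spinning history, so delta pose
    sequences computed around the cut still encode the lost momentum.
    """
    frozen = poses.copy()
    frozen[stop_frame:] = poses[stop_frame]
    return frozen

poses = np.random.randn(120, 24, 3).astype(np.float32)
stopped = make_abrupt_stop(poses, stop_frame=60)
assert np.allclose(stopped[60:], stopped[60])  # pose is constant after the stop
```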


Abstract

Neural rendering techniques have significantly advanced 3D human body modeling. However, previous approaches often overlook dynamics induced by factors such as motion inertia, leading to challenges in scenarios like abrupt stops after rotation, where the pose remains static while the appearance changes. This limitation arises from relying on a single pose as the conditional input, which makes the mapping ambiguous when one pose corresponds to multiple appearances.
In this study, we elucidate that variations in human appearance depend not only on the current frame's pose condition but also on past pose states. We therefore introduce Dyco, a novel method that uses a delta pose sequence representation to condition the non-rigid deformation and the canonical space, effectively modeling temporal appearance variations. To preserve the model's generalization to novel poses, we further propose a low-dimensional global context that reduces unnecessary inter-body-part dependencies and a quantization operation that mitigates overfitting to the delta pose sequence. To validate the effectiveness of our approach, we collected a novel dataset named I3D-Human, focused on capturing temporal changes in clothing appearance under similar poses. Through extensive experiments on both I3D-Human and existing datasets, our approach demonstrates superior qualitative and quantitative performance. In addition, our inertia-aware 3D human modeling method can, for the first time, simulate appearance changes caused by inertia at different velocities.
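The quantization mentioned above can be pictured as rounding the delta poses onto a coarse grid before they condition the network. This is a sketch of one plausible form only; the function name and step size are placeholders, not values from the paper.

```python
import numpy as np

def quantize_delta_poses(delta_seq, step=0.05):
    """Snap delta poses onto a coarse grid so the network cannot latch onto
    tiny per-frame differences; the step size is a placeholder value."""
    return np.round(delta_seq / step) * step

delta_seq = np.random.randn(5, 24, 3).astype(np.float32)
print(quantize_delta_poses(delta_seq).shape)  # (5, 24, 3)
```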

Idea

Velocity ambiguity

The overall pipeline of our method. The rigid and non-rigid transformation modules deform coordinates from the pose space into the canonical space, and the canonical coordinates are then fed into the triplane volume to obtain color and density. To capture variations under similar poses within different dynamic contexts, we adopt a localized dynamic context encoder that embeds pose sequences as additional conditional inputs to the transformation modules and the canonical volume.
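For readers who prefer code, the data flow can be sketched as follows. The module interfaces, names, and the additive non-rigid offset are assumptions made for illustration and do not reproduce the actual implementation.

```python
import numpy as np

def dyco_forward(x_pose, pose, dyn_ctx, rigid_tf, non_rigid_tf, triplane):
    """Schematic forward pass mirroring the pipeline description.

    x_pose:  (N, 3) query points in pose (observation) space
    pose:    current-frame pose condition
    dyn_ctx: encoded delta pose sequence (localized dynamic context)
    rigid_tf / non_rigid_tf / triplane are stand-ins for the actual modules.
    """
    x_rigid = rigid_tf(x_pose, pose)                          # skeletal (rigid) deformation
    x_canon = x_rigid + non_rigid_tf(x_rigid, pose, dyn_ctx)  # non-rigid residual offset
    rgb, sigma = triplane(x_canon, dyn_ctx)                   # canonical color and density
    return rgb, sigma

# Toy stand-ins just to illustrate the data flow
rigid_tf = lambda x, p: x
non_rigid_tf = lambda x, p, c: np.zeros_like(x)
triplane = lambda x, c: (np.ones((len(x), 3)), np.ones(len(x)))
rgb, sigma = dyco_forward(np.zeros((8, 3)), None, None, rigid_tf, non_rigid_tf, triplane)
```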



Conclusion

In this work, we present Dyco, a novel human motion modeling method that incorporates pose-sequence conditioning to mitigate appearance ambiguities induced by dynamic contexts. We posit that human appearance is determined not only by the current pose but also by the cumulative motion states induced by inertia, which can be adequately encapsulated by pose sequences. In addition to introducing pose sequences as conditional inputs, we design a localized dynamic context encoder to address the model overfitting caused by excessive reliance on delta poses. Through these modules, we resolve the appearance ambiguity caused by dynamic context and thereby enhance the rendering quality of human bodies in loose attire. The I3D-Human dataset we have developed aims to rectify the oversight of loose clothing in previous datasets and to advance research on the complex human motion common in real-life scenarios.



Reference

[1] Chung-Yi Weng, Brian Curless, Pratul P. Srinivasan, Jonathan T. Barron, and Ira Kemelmacher-Shlizerman. "HumanNeRF: Free-viewpoint rendering of moving people from monocular video." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16210-16220, 2022.