Rendering results of our Dyco and HumanNeRF [1] on the I3D-Human dataset.
We select scaling factors α ranging from 0 to 2, where a larger α corresponds to a higher motion speed. We run inference on the same pre-trained model and show rendering results conditioned on the scaled delta pose sequence. We observe a significant variation in the swing amplitude of the skirt as α changes, consistent with everyday experience that a skirt flares more at higher spinning velocities.
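The scaling experiment above can be sketched in a few lines. This is a hypothetical illustration (the paper's exact delta pose representation is not specified here): we take per-frame pose differences as the delta pose sequence and multiply them by α before using them as the condition.

```python
import numpy as np

def scaled_delta_pose_sequence(poses, alpha):
    """Hypothetical sketch: build a delta pose sequence as per-frame
    pose differences and scale it by alpha.

    poses: (T, J, 3) axis-angle joint rotations over T frames.
    alpha = 0 removes the motion context; alpha > 1 exaggerates it,
    mimicking a faster version of the same motion.
    """
    deltas = poses[1:] - poses[:-1]   # (T-1, J, 3) per-frame pose changes
    return alpha * deltas             # scaled condition fed to the model
```

With α = 0 the condition collapses to a static pose, which is exactly the single-pose setting that causes the one-pose-to-many-appearances ambiguity.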
We synthesize novel acceleration by designing a pose sequence that mimics an abrupt stop: for a spinning test sequence, we truncate the poses midway and hold all subsequent poses fixed. We show rendering sequences with (left) and without (right) the delta pose sequence condition to illustrate that our method accurately captures the characteristics of novel acceleration, such as the falling of a skirt hem.
Neural rendering techniques have significantly advanced 3D human body modeling.
However, previous approaches often overlook dynamics induced by factors such as motion inertia,
leading to challenges in scenarios like abrupt stops after rotation, where the pose remains
static while the appearance changes.
This limitation arises from reliance on a single pose as conditional input, resulting in
ambiguity in mapping one pose to multiple appearances.
In this study, we elucidate that variations in human appearance depend not only on the current
frame's pose condition but also on past pose states.
Therefore, we introduce Dyco, a novel method that utilizes a delta pose sequence representation to condition the non-rigid deformations and the canonical space, effectively modeling temporal appearance variations.
To prevent a loss of generalization to novel poses, we further propose a low-dimensional global context that reduces unnecessary inter-body-part dependencies, and a quantization operation that mitigates the model's overfitting to the delta pose sequence.
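The quantization operation can be sketched as simple rounding to a fixed step size. This is an assumption for illustration only (the paper's actual quantization scheme is not detailed here): coarsening the delta pose condition limits how finely the model can key appearance to specific training-set motions.

```python
import numpy as np

def quantize_deltas(deltas, step):
    """Round each delta-pose component to the nearest multiple of `step`,
    coarsening the condition so the model cannot latch onto fine,
    sample-specific variations in the delta pose sequence."""
    return np.round(deltas / step) * step
```

A larger `step` gives stronger regularization at the cost of a less expressive motion condition.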
To validate the effectiveness of our approach, we collected a novel dataset named I3D-Human, focused on capturing temporal changes in clothing appearance under similar poses.
Through extensive experiments on both I3D-Human and existing datasets, our approach demonstrates
superior qualitative and quantitative performance.
In addition, our inertia-aware 3D human method can, for the first time, simulate appearance changes caused by inertia at different velocities.
The overall pipeline of our method. The rigid and non-rigid transformation modules deform coordinates from the pose space into the canonical space; these canonical coordinates are then fed into the triplane volume to obtain color and density. To capture appearance variations under similar poses within different dynamic contexts, we adopt a localized dynamic context encoder that embeds pose sequences as additional conditional inputs to both the transformation module and the canonical volume.
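The per-point data flow of the pipeline can be sketched as below. All weights, dimensions, and function names here are toy stand-ins (random linear maps, not the paper's learned networks); the sketch only shows the order of operations: embed the delta-pose context, apply the context-conditioned non-rigid offset, and produce the canonical coordinate that would index the triplane volume.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for learned modules; shapes are illustrative assumptions.
W_embed = rng.normal(size=(8, 4))        # localized dynamic context encoder
W_nonrigid = rng.normal(size=(3 + 4, 3)) # non-rigid deformation head

def to_canonical(x_pose, delta_pose_feat):
    """Sketch of one sample point's path from pose space to canonical space,
    conditioned on an embedded delta-pose context.

    x_pose: (3,) coordinate already rigidly transformed toward canonical space.
    delta_pose_feat: (8,) flattened delta pose sequence features.
    """
    ctx = np.tanh(delta_pose_feat @ W_embed)             # low-dim global context
    offset = np.concatenate([x_pose, ctx]) @ W_nonrigid  # non-rigid offset
    x_can = x_pose + 0.01 * offset                       # canonical coordinate
    return x_can  # would then query the triplane volume for color and density
```

The key design point is that the same pose coordinate `x_pose` maps to different canonical coordinates (and hence different appearance) when the dynamic context differs.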
In this work, we present Dyco, a novel human motion modeling method that incorporates a pose-sequence condition to mitigate appearance ambiguities induced by dynamic contexts. We posit that human appearance is determined not only by the current pose but also by the cumulative motion states induced by inertia, which can be adequately encapsulated by pose sequences. In addition to introducing pose sequences as conditional inputs, we design a localized dynamic context encoder to address the model overfitting caused by excessive reliance on delta poses. Through these modules, we resolve the appearance ambiguity caused by dynamic context and thus enhance the rendering quality of human bodies in loose attire. The I3D-Human dataset we have developed aims to rectify the oversight of loose clothing in previous datasets and to propel research into the complex human motions common in real-life scenarios.
[1] Weng, Chung-Yi, Brian Curless, Pratul P. Srinivasan, Jonathan T. Barron, and Ira Kemelmacher-Shlizerman. "HumanNeRF: Free-Viewpoint Rendering of Moving People from Monocular Video." In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 16210-16220. 2022.