Estimating body height and weight from a single non-frontal face image performs poorly because of large variations in face pose and a lack of labeled data. To address these problems, a research team led by Shiguang SHAN published new research in Frontiers of Computer Science, which is co-published by Higher Education Press and Springer Nature. The team proposed a face-based body height and weight estimation method that combines auxiliary tasks with pose disentanglement. Motivated by the correlation among gender, age, height and weight, they use gender and age estimation as auxiliary tasks to improve performance on the primary tasks, i.e., height and weight estimation. In addition, they remove pose-relevant features from the input representation to further improve both the primary and the auxiliary tasks. Extensive experiments on both small-pose and large-pose datasets demonstrate the superiority of the proposed method.
In the research, they analyze the relationships among gender, age, height, weight and head pose. Because body shape changes with age, age-related features may contribute to the estimation of height and weight. Moreover, male and female bodies generally differ in bone mineral density and muscle mass, so facial appearance differs between genders even for people of the same height and weight; this suggests that height and weight estimation can also benefit from gender perception. Finally, since attributes such as gender, age, height and weight do not change with face pose, pose-relevant features may hinder the prediction of these attributes. Based on these relationships, they propose a face-based method that uses auxiliary tasks and pose disentanglement for body height and weight estimation.
First, general face features are extracted from the input image by several convolutional layers. Next, a pose disentanglement module removes pose-relevant features from the general face features, yielding pose-irrelevant features for both the primary and the auxiliary tasks. To further improve the primary tasks, auxiliary feature learning and fusion branches then fuse auxiliary task-specific features with the pose-irrelevant features. Subsequently, convolutional and fully connected layers project the fused features to the primary task-specific outputs. Finally, the whole framework is optimized end to end with multi-task losses. Experimental results on both the VIP-attributes dataset and the VIPL-MumoFace-WH dataset demonstrate the effectiveness of the proposed method.
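The multi-task optimization step described above can be illustrated with a minimal sketch of a combined objective. This is an illustration under stated assumptions, not the authors' implementation: the per-task losses (mean absolute error for height, weight and age; binary cross-entropy for gender), the auxiliary weights in the hypothetical AUX_WEIGHTS dictionary, and all function names are assumptions, and any loss term the pose disentanglement module itself may use is omitted.

```python
import math

# Hypothetical auxiliary-task weights (an assumption, not values from the paper).
AUX_WEIGHTS = {"gender": 0.5, "age": 0.5}


def l1(pred, target):
    """Mean absolute error over a batch of scalar predictions."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def binary_cross_entropy(prob, label, eps=1e-7):
    """Mean binary cross-entropy; prob is the predicted probability of label 1."""
    total = 0.0
    for p, y in zip(prob, label):
        p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(prob)


def multi_task_loss(outputs, targets, weights=AUX_WEIGHTS):
    """Primary-task losses (height, weight) plus weighted auxiliary-task losses
    (gender, age). Assumed form only: the paper's exact losses may differ."""
    primary = (l1(outputs["height"], targets["height"])
               + l1(outputs["weight"], targets["weight"]))
    auxiliary = (weights["gender"] * binary_cross_entropy(outputs["gender"], targets["gender"])
                 + weights["age"] * l1(outputs["age"], targets["age"]))
    return primary + auxiliary
```

For example, with height errors of 2 cm, weight errors of 5 kg, age errors of 2 years and gender probabilities of 0.5 on a two-sample batch, the sketch returns 2 + 5 + 0.5·ln 2 + 0.5·2 ≈ 8.35. Summing all terms into one scalar is what allows the whole network to be trained end to end with a single backward pass.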
DOI:10.1007/s11704-025-50162-0