In mental health research, accurately detecting depression is crucial. However, handling multimodal long-temporal data poses two major challenges: 1) feature extraction from long-temporal data is redundant, and it is unclear which features are key; 2) existing multimodal feature fusion methods are disrupted by weaker, lower-quality modalities.
To address these problems, a research team led by Shuai Ding published new research in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
The team introduced MILCAnet, a dominant-feature attention framework for multimodal long-temporal data. It integrates key feature extraction based on multi-instance learning (MIL) with feature fusion led by dominant modalities via cross-attention (CA). Specifically, the MIL module mines the key features of long-temporal data, while the CA module ensures effective fusion of multimodal features. This combination markedly improves the performance of non-contact depression detection.
The MIL module pinpoints the key temporal-segment features within long-temporal data and consolidates them into full-period features for each modality. Meanwhile, the CA module performs cross-attention between dominant and weak modalities, relying on the dominant modality to drive multimodal feature fusion. The effectiveness of both modules has been experimentally verified through ablation studies in depression detection scenarios.
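The article does not give the architecture's internals, but the two ideas it names are well known: attention-based MIL pooling (score temporal segments, then form a weighted full-period feature) and cross-attention in which the dominant modality queries the weak one. The following NumPy sketch illustrates only these generic mechanisms; all function names, shapes, and parameters are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mil_pool(segments, v, w):
    """Attention-based MIL pooling (illustrative).

    segments: (T, d) features of T temporal segments (the 'instances').
    v: (d, h), w: (h,) learned attention parameters.
    Returns one (d,) full-period feature emphasizing key segments.
    """
    scores = np.tanh(segments @ v) @ w          # (T,) importance per segment
    alpha = softmax(scores)                     # attention weights sum to 1
    return alpha @ segments                     # weighted sum over segments

def cross_attention(dominant, weak):
    """Dominant-modality-led fusion (illustrative).

    dominant: (n, d) features of the dominant modality (queries).
    weak: (m, d) features of the weaker modality (keys/values).
    Returns (n, 2d): dominant features concatenated with attended weak features.
    """
    d = dominant.shape[-1]
    attn = softmax(dominant @ weak.T / np.sqrt(d), axis=-1)  # (n, m)
    fused = attn @ weak                                      # (n, d)
    return np.concatenate([dominant, fused], axis=-1)

rng = np.random.default_rng(0)
segs = rng.normal(size=(10, 8))                 # 10 segments, 8-dim features
bag = mil_pool(segs, rng.normal(size=(8, 4)), rng.normal(size=4))
dom = rng.normal(size=(5, 8))                   # e.g. video features (assumed dominant)
wk = rng.normal(size=(7, 8))                    # e.g. audio features (assumed weaker)
out = cross_attention(dom, wk)
```

Because attention weights are asymmetric, the dominant modality steers which weak-modality features survive the fusion, which is how a framework of this kind can resist disruption from a low-quality modality.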
Compared with previous methods, MILCAnet offers a more efficient and accurate solution for non-contact depression detection, with the potential to advance mental health screening.
DOI: 10.1007/s11704-025-41443-9