Graph neural networks (GNNs) have gained traction and have been applied to a wide range of graph-based data analysis tasks thanks to their strong performance. However, a major concern is their robustness, particularly when the graph data has been deliberately or accidentally polluted with noise. Learning robust GNNs under such noisy conditions remains a challenge.
To address this issue, a research team led by Yao WU published their new research on 15 Apr 2025 in Frontiers of Computer Science, co-published by Higher Education Press and Springer Nature.
The team proposes a novel framework called Soft-GNN, which mitigates the influence of label noise by adapting the data utilized in training. By better utilizing significant training samples and reducing the impact of label noise through dynamic data selection, GNNs are trained to be more robust. Compared with existing methods, the proposed approach achieves better performance on node classification tasks with higher time efficiency.
In the study, the researchers look deeper into the training of GNNs on noisy graph data and find that mislabeled and correctly labeled nodes behave differently during training: the average loss on mislabeled nodes is larger than that on correct nodes. They further revisit these training differences from the perspective of node neighborhoods, analyzing each node from both local and global structural views. In both views, mislabeled nodes deviate noticeably from correctly labeled ones. These training statistics reveal the potential impact of label noise from the perspectives of prediction as well as the local and global neighborhood, and can be further exploited for data selection.
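The deviations described above can be illustrated on a toy graph. The following sketch is not the authors' code; the predictions, labels, and edges are made-up values chosen only to show how the prediction-loss gap and a local neighborhood-agreement statistic separate a mislabeled node from correct ones.

```python
import math

def cross_entropy(probs, label):
    """Negative log-likelihood of the labeled class."""
    return -math.log(probs[label])

# Hypothetical class probabilities from a partly trained GNN: node -> [p0, p1].
pred = {
    0: [0.90, 0.10],
    1: [0.80, 0.20],
    2: [0.20, 0.80],
    3: [0.85, 0.15],
}
true_label = {0: 0, 1: 0, 2: 1, 3: 0}
observed_label = {0: 0, 1: 0, 2: 1, 3: 1}  # node 3 carries a noisy label

# Undirected edges of the toy graph.
edges = [(0, 1), (1, 3), (2, 3), (0, 3)]
neigh = {n: set() for n in pred}
for u, v in edges:
    neigh[u].add(v)
    neigh[v].add(u)

def local_agreement(n):
    """Fraction of neighbors whose observed label matches node n's (local view)."""
    same = sum(observed_label[m] == observed_label[n] for m in neigh[n])
    return same / len(neigh[n])

mislabeled = [n for n in pred if observed_label[n] != true_label[n]]
correct = [n for n in pred if observed_label[n] == true_label[n]]
avg = lambda xs: sum(xs) / len(xs)

# Prediction view: training loss against the observed (possibly noisy) labels.
loss_mis = avg([cross_entropy(pred[n], observed_label[n]) for n in mislabeled])
loss_ok = avg([cross_entropy(pred[n], observed_label[n]) for n in correct])

# Local view: neighborhood label agreement.
agree_mis = avg([local_agreement(n) for n in mislabeled])
agree_ok = avg([local_agreement(n) for n in correct])

print(loss_mis > loss_ok)    # mislabeled nodes incur larger average loss
print(agree_mis < agree_ok)  # and agree less with their neighborhoods
```

On this toy example both statistics point the same way, mirroring the paper's observation that prediction and neighborhood signals jointly expose mislabeled nodes.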
Based on these observations, they propose a simple yet effective framework for training robust GNNs, denoted Soft-GNN. The introduced sample network exploits the observed training deviations and outputs sample weights to realize self-adaptive data utilization. Specifically, it learns each sample's weight from its prediction, local, and global deviations, and prunes labeled nodes according to the output weights. The selected labeled nodes are then balanced with their corresponding weights, and the learned weights are combined with loss correction for model training. The experimental results show that Soft-GNN learns robust GNN models under different noise types and rates, and it still performs better even when the data is noise-free. In addition, Soft-GNN improves the performance on mislabeled nodes and their neighboring nodes. This work focuses on label noise, i.e., data corruption at the node-label level; future work will examine other types of data noise.
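The weighting-and-pruning step can be sketched in miniature. This is an assumption-laden illustration, not the paper's implementation: the logistic squashing, the deviation values, and the threshold are all hypothetical choices standing in for the learned sample network.

```python
import math

def sample_weight(pred_dev, local_dev, global_dev, alpha=1.0):
    """Map the three deviation signals to a weight in (0, 1).
    Larger total deviation -> smaller weight (a logistic squashing
    is one simple stand-in for the learned sample network)."""
    score = pred_dev + local_dev + global_dev
    return 1.0 / (1.0 + math.exp(alpha * score))

# Hypothetical per-node deviations (prediction, local, global), e.g.
# loss above the batch mean and disagreement with the neighborhoods.
deviations = {
    "a": (-0.5, -0.3, -0.2),  # well-behaved node
    "b": (-0.2, -0.1, 0.0),
    "c": (2.0, 1.5, 1.0),     # likely mislabeled node
}
losses = {"a": 0.2, "b": 0.4, "c": 2.1}  # per-node training losses

weights = {n: sample_weight(*d) for n, d in deviations.items()}

# Prune labeled nodes whose weight falls below a threshold, then form
# the weighted (corrected) training loss over the surviving nodes.
threshold = 0.2
kept = {n: w for n, w in weights.items() if w >= threshold}
weighted_loss = sum(kept[n] * losses[n] for n in kept) / sum(kept.values())
```

In this sketch the high-deviation node `c` is pruned, so the noisy label no longer dominates the training objective; the surviving nodes contribute in proportion to their weights, which mirrors the balancing-with-weights idea described above.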
DOI: 10.1007/s11704-024-3575-5