Comorbidity—the co-occurrence of multiple diseases in a patient—complicates diagnosis, treatment, and prognosis. Understanding how diseases connect at a molecular level is crucial, especially in aging populations and complex conditions like COVID-19, where comorbidities worsen outcomes. While network medicine has used the human interactome—a map of protein-protein interactions—to study these links, accurately predicting comorbid pairs remains challenging.
Now, researchers from the University of Delaware have introduced a new machine learning approach that significantly improves comorbidity prediction. In a study published in Quantitative Biology, they present the Transformer with Subgraph Positional Encoding (TSPE) model, which combines a graph transformer architecture with a novel positional encoding method to capture both protein connectivity and disease-specific subgraph information.
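The article does not say how TSPE's subgraph positional encoding is computed, so the following is not TSPE. It is a minimal sketch of the general graph-transformer pattern described above, under stated assumptions. Each protein's embedding is combined with a positional encoding derived from interactome structure, and self-attention then mixes information across proteins. As a stand-in for the subgraph encoding it uses Laplacian eigenvector positional encoding, a standard structural encoding for graph transformers. The toy edge list, embedding sizes, random weights, the disease-to-protein assignments, and the `pair_score` readout (mean-pool plus sigmoid) are all hypothetical.

```python
import numpy as np


def laplacian_pe(adj: np.ndarray, k: int) -> np.ndarray:
    """Return k eigenvectors of the symmetric normalized Laplacian,
    skipping the first (eigenvalue 0 for a connected graph)."""
    deg = adj.sum(axis=1)
    # D^{-1/2}; isolated nodes (degree 0) are mapped to 0 to avoid division by zero.
    inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    lap = np.eye(adj.shape[0]) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(lap)  # eigenvalues come back in ascending order
    return eigvecs[:, 1 : k + 1]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over all nodes."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ v


def pair_score(h, proteins_a, proteins_b):
    """Hypothetical readout: mean-pool each disease's proteins, then sigmoid of dot product."""
    za = h[proteins_a].mean(axis=0)
    zb = h[proteins_b].mean(axis=0)
    return float(1.0 / (1.0 + np.exp(-(za @ zb))))


rng = np.random.default_rng(0)

# Toy interactome with 6 proteins (hypothetical edges).
n = 6
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (1, 4)]
adj = np.zeros((n, n))
for i, j in edges:
    adj[i, j] = adj[j, i] = 1.0

emb = rng.normal(size=(n, 4))          # stand-in protein embeddings
pe = laplacian_pe(adj, k=2)            # structural positional encoding
x = np.concatenate([emb, pe], axis=1)  # shape (6, 6)

d_in, d_out = x.shape[1], 4
w_q, w_k, w_v = (rng.normal(size=(d_in, d_out)) for _ in range(3))
h = self_attention(x, w_q, w_k, w_v)   # shape (6, 4)

# Hypothetical disease modules: disease A -> proteins {0, 1}, disease B -> {3, 4}.
score = pair_score(h, [0, 1], [3, 4])
print(h.shape, round(score, 3))
```

Concatenating the encoding, rather than adding it, keeps the structural signal in its own channels. How TSPE fuses its encoding and how its encoder-decoder produces the comorbidity prediction are not described in this article.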
Figure 1 illustrates the TSPE framework. Node embeddings from the interactome are enriched with subgraph-aware positional encoding and processed through a transformer encoder-decoder to predict whether two diseases are comorbid.
The team evaluated TSPE on two widely used clinical benchmarks, RR0 and RR1, which classify disease pairs based on relative risk scores. Compared with the previous state-of-the-art method, Biologically Supervised Embedding with SVM, TSPE achieved a 28.24% increase in ROC AUC and a 3.04% increase in accuracy on RR0, and a 15.40% increase in ROC AUC and a 4.93% increase in accuracy on RR1.
An ablation study confirmed the importance of the proposed subgraph positional encoding (SPE): it outperformed both a variant without positional encoding and one using Laplacian positional encoding, especially on the Matthews correlation coefficient (MCC), a key metric for imbalanced classification.
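A small worked example shows why the Matthews correlation coefficient matters when most disease pairs are not comorbid. The confusion-matrix counts below are hypothetical (1,000 disease pairs, 100 of them comorbid) and are not taken from RR0 or RR1. A classifier that labels every pair non-comorbid reaches 90% accuracy yet has an MCC of 0, while one that actually recovers most comorbid pairs gains only a little accuracy but a large amount of MCC.

```python
import math


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Fraction of all pairs classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any row or column of the confusion matrix is empty;
    # returning 0.0 in that case is a common convention.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Hypothetical test set: 100 comorbid pairs, 900 non-comorbid pairs.
always_negative = dict(tp=0, fp=0, fn=100, tn=900)
useful_model = dict(tp=60, fp=30, fn=40, tn=870)

for name, counts in [("always_negative", always_negative), ("useful_model", useful_model)]:
    print(f"{name}: accuracy={accuracy(**counts):.3f}, MCC={mcc(**counts):.3f}")
# always_negative: accuracy=0.900, MCC=0.000
# useful_model: accuracy=0.930, MCC=0.594
```

Here a 3-point gain in accuracy corresponds to an MCC jump from 0 to about 0.59, because MCC uses all four cells of the confusion matrix and cannot be inflated by predicting only the majority class.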
DOI:10.1002/qub2.70008