Molecular Merged Hypergraph Neural Network for Explainable Solvation Gibbs Free Energy Prediction

Background: Molecular interactions are central to numerous challenges in chemistry and the life sciences. Whether in solute–solvent dissolution, adverse drug–drug interactions, or protein complex formation, understanding the fundamental mechanisms of intermolecular interactions is essential for advancing molecular-level modeling.

Current machine learning strategies typically approach this problem through three paradigms: embedding merging, where molecules are encoded independently and their embeddings are then merged; feature fusion, where features are combined via attention or other fusion mechanisms; and merged molecular graphs, where solute and solvent atoms are unified into a single graph to explicitly model atomic-level interactions.

Among these, merged molecular graph-based approaches have shown strong performance and enhanced interpretability due to their ability to explicitly encode intermolecular interactions. A representative model in this category is MMGNN, which connects all possible atom pairs in a fused graph and applies attention mechanisms to prioritize critical interactions. While effective in improving the prediction of solvation free energy (ΔG_solv), MMGNN considers every atom pair, so its computational cost grows rapidly as molecular size increases, which limits its scalability and general applicability.

Method: To address these limitations, we introduce the Molecular Merged Hypergraph Neural Network (MMHNN). MMHNN takes a predefined set of molecular subgraphs and replaces each with a supernode to construct a compact hypergraph. This architectural change substantially reduces computational overhead while preserving essential molecular interactions.
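The effect of replacing subgraphs with supernodes can be illustrated with a minimal sketch that counts candidate solute–solvent pairs before and after coarsening. The group labels, the toy molecules, and the helper names `cross_pairs` and `coarsen` are hypothetical and chosen only for illustration; they are not the subgraph vocabulary or the code used in the paper.

```python
from itertools import product

def cross_pairs(solute_nodes, solvent_nodes):
    """All solute-solvent node pairs, as a fully connected merged graph would use."""
    return list(product(solute_nodes, solvent_nodes))

def coarsen(atom_to_group):
    """Collapse atoms into supernodes: return the distinct group ids, sorted."""
    return sorted(set(atom_to_group.values()))

# Hypothetical toy assignment (atom index -> group id), not taken from the paper:
# 6 solute atoms in 2 groups, 4 solvent atoms in 2 groups.
solute_groups = {0: "ring", 1: "ring", 2: "ring", 3: "ring", 4: "hydroxyl", 5: "hydroxyl"}
solvent_groups = {0: "methyl", 1: "methyl", 2: "carbonyl", 3: "carbonyl"}

atom_pairs = cross_pairs(list(solute_groups), list(solvent_groups))
super_pairs = cross_pairs(coarsen(solute_groups), coarsen(solvent_groups))

print(len(atom_pairs), len(super_pairs))  # 24 atom-level pairs vs 4 supernode-level pairs
```

In this toy case the pair count drops from the product of atom counts to the product of supernode counts, which is the kind of saving the compact hypergraph is meant to provide.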

In addition, MMHNN explicitly models non-interacting or repulsive atomic pairs by introducing a mechanism rooted in Graph Information Bottleneck (GIB) theory. This component enhances the semantic interpretability of both nodes and edges in the fused molecular graph, thereby improving the transparency and explainability of predictions.
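For orientation, the graph information bottleneck principle is commonly written in the following general form. This is the generic objective from the GIB literature, not necessarily the exact loss used in MMHNN; the symbols are assumptions introduced here: G is the merged graph, G_sub the retained substructure, Y the prediction target (here ΔG_solv), I(·;·) mutual information, and β a trade-off weight.

```latex
\min_{G_{\mathrm{sub}} \subseteq G} \; -\, I\!\left(Y;\, G_{\mathrm{sub}}\right) \;+\; \beta \, I\!\left(G;\, G_{\mathrm{sub}}\right)
```

The first term rewards substructures that remain predictive of the target, while the second penalizes retaining more of the input graph than necessary, which is what allows uninformative nodes and edges to be suppressed.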

In extensive experiments on multiple solute–solvent benchmark datasets, MMHNN demonstrates significantly improved predictive performance and efficiency over existing methods and offers clearer interpretability of molecular interactions, paving the way for efficient and scalable modeling of intermolecular relationships.

Furthermore, to evaluate the model's generalization capability across diverse solute–solvent systems, we conducted systematic generalization analyses across different solvent environments and solute scaffolds, which validated the robustness and effectiveness of the proposed model under distribution shift.
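A generalization test of this kind typically relies on a split in which no scaffold (or solvent) appears in both the training and test sets. The sketch below shows one simple way to build such a split; the function `group_split`, the greedy fill strategy, and the scaffold labels are assumptions made for illustration, not the splitting protocol reported in the paper.

```python
from collections import defaultdict

def group_split(keys, test_fraction=0.2):
    """Split sample indices so that no group key appears in both train and test."""
    groups = defaultdict(list)
    for idx, key in enumerate(keys):
        groups[key].append(idx)
    # Fill train greedily with the largest groups; a group that would
    # push train past its target size is sent to test instead.
    ordered = sorted(groups.values(), key=len, reverse=True)
    n_train_target = len(keys) * (1 - test_fraction)
    train, test = [], []
    for members in ordered:
        if len(train) + len(members) <= n_train_target:
            train.extend(members)
        else:
            test.extend(members)
    return train, test

# Hypothetical scaffold label per sample (in practice these would be computed
# from the solute structures, e.g. as Bemis-Murcko scaffolds).
scaffolds = ["benzene", "benzene", "benzene", "pyridine", "pyridine",
             "furan", "indole", "benzene", "pyridine", "furan"]
train_idx, test_idx = group_split(scaffolds, test_fraction=0.2)

train_scaffolds = {scaffolds[i] for i in train_idx}
test_scaffolds = {scaffolds[i] for i in test_idx}
print(train_scaffolds & test_scaffolds)  # empty set: no scaffold is shared
```

The same function can be applied to solvent identifiers instead of scaffold labels to hold out entire solvent environments.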

In addition, a quantitative analysis of generalization errors across different solute–solvent systems reveals that the model's errors are larger for larger molecules and vary with the atomic element types present. Similarly, clear distributional differences were observed among solute scaffolds, indicating that distribution shifts between training and testing data can significantly affect model performance.
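An error analysis by molecular size can be sketched as a mean absolute error computed per size bin. The function name `mae_by_bin`, the bin edges, and all numbers below are made up for illustration; they are not results from the paper.

```python
from collections import defaultdict

def mae_by_bin(sizes, y_true, y_pred, edges):
    """Mean absolute error per size bin; bin i covers edges[i] <= size < edges[i+1]."""
    errors = defaultdict(list)
    for size, t, p in zip(sizes, y_true, y_pred):
        for lo, hi in zip(edges[:-1], edges[1:]):
            if lo <= size < hi:
                errors[(lo, hi)].append(abs(t - p))
                break
    return {b: sum(v) / len(v) for b, v in errors.items()}

# Toy values only (atom counts, reference and predicted energies); not real data.
sizes = [5, 8, 12, 18, 25, 31]
y_true = [-3.0, -4.0, -5.0, -6.0, -7.0, -8.0]
y_pred = [-3.5, -4.5, -4.0, -7.0, -5.0, -10.0]

result = mae_by_bin(sizes, y_true, y_pred, edges=[0, 10, 20, 40])
for (lo, hi), mae in sorted(result.items()):
    print(f"{lo}-{hi} atoms: MAE = {mae:.2f}")
```

Grouping by element type instead of size would follow the same pattern, with the bin lookup replaced by a key such as the set of elements in each solute.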

Finally, the hypergraph-based molecular fusion framework significantly reduces computation time and memory consumption compared with fully connected molecular fusion graphs, while also delivering superior predictive performance.

The complete study is accessible via DOI: 10.34133/research.0740

https://spj.science.org/doi/10.34133/research.0740

Title: Molecular Merged Hypergraph Neural Network for Explainable Solvation Gibbs Free Energy Prediction
Authors: Wenjie Du, Shuai Zhang, Zhaohui Cai, Zhiyuan Liu, Junfeng Fang, Jianmin Wang, and Yang Wang
Journal: Research, 26 May 2025, Article in Press, Article ID: 0740
DOI: 10.34133/research.0740

Attached files

Figure 1: Overview of the MMHNN model framework.
Table 1: Evaluation results of MMHNN on multiple solute–solvent datasets, demonstrating its effectiveness across diverse molecular systems.
Figure 2: Generalization performance of the model across diverse solute–solvent systems.
Figure 3: Model error analysis across solute–solvent systems and molecular scaffolds.
Figure 4: Analysis of model resource consumption.

07/08/2025 Science and Technology Review Publishing House

Regions: Asia, China

Keywords: Applied science, Artificial Intelligence, Computing, Science, Chemistry

