Solvents play an indispensable role in many chemical processes, including gas absorption, extraction, and reactions, which makes solvent selection a critical decision in early-stage chemical process design. Given the vast number of candidate solvents and solvent mixtures, evaluating their properties purely by experiment is prohibitively time- and cost-intensive. The infinite dilution activity coefficient is a fundamental thermodynamic property widely used to characterize liquid mixtures, especially for describing phase and chemical equilibria.
In this study, a hybrid COSMO-SAC-ML workflow was developed that delivers fast, quantitative predictions of infinite dilution activity coefficients. To avoid labor-intensive and time-consuming quantum chemistry calculations, a multi-task deep learning model was built to predict the σ-profile and molecular cavity volume, which serve as essential inputs to COSMO-SAC calculations. The model achieved high predictive accuracy, with R² values of 0.982 for the σ-profile and 0.997 for the molecular cavity volume. It can distinguish stereoisomers, cis/trans isomers, and protonated/deprotonated species, and it is also applicable to ionic liquids.
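To illustrate why a σ-profile and a cavity volume are useful COSMO-SAC inputs, the sketch below evaluates only the Staverman–Guggenheim combinatorial contribution to ln γ∞, in the form commonly used in the 2002 COSMO-SAC model (reference volume r0 = 66.69 Å³, reference area q0 = 79.53 Å², coordination number z = 10). It assumes the σ-profile is area-weighted, so that summing its bins gives the cavity surface area. The residual (segment-interaction) term, which also draws on the σ-profile, is omitted, and the COSMO-SAC variant and parameters used in the study may differ. The function names and input numbers are hypothetical.

```python
import math

# Normalization constants and coordination number as commonly used in the
# 2002 COSMO-SAC combinatorial term (assumption; the study's variant may differ).
R0 = 66.69  # reference volume, Angstrom^3
Q0 = 79.53  # reference area, Angstrom^2
Z = 10.0    # lattice coordination number


def cavity_area(sigma_profile):
    """Total cavity surface area (A^2) from an area-weighted sigma-profile,
    i.e. a list where each bin holds the surface area with that charge density."""
    return sum(sigma_profile)


def ln_gamma_comb_inf(v_i, a_i, v_j, a_j):
    """Staverman-Guggenheim combinatorial part of ln(gamma_i^inf) for solute i
    infinitely diluted in pure solvent j. v_*: cavity volume, a_*: cavity area."""
    r_i, q_i = v_i / R0, a_i / Q0
    r_j, q_j = v_j / R0, a_j / Q0
    l_i = Z / 2 * (r_i - q_i) - (r_i - 1)
    l_j = Z / 2 * (r_j - q_j) - (r_j - 1)
    # Limits as x_i -> 0: phi_i/x_i -> r_i/r_j and theta_i/phi_i -> (q_i/q_j)/(r_i/r_j);
    # the mole-fraction-weighted sum of l over the mixture reduces to l_j.
    phi_over_x = r_i / r_j
    theta_over_phi = (q_i / q_j) / (r_i / r_j)
    return (math.log(phi_over_x)
            + Z / 2 * q_i * math.log(theta_over_phi)
            + l_i
            - phi_over_x * l_j)


# Hypothetical inputs, e.g. as they might come from a sigma-profile/volume predictor.
solute_profile = [0.0, 12.5, 40.0, 35.5, 10.0, 2.0]  # A^2 per sigma bin (made-up)
solvent_profile = [1.0, 20.0, 30.0, 20.0, 5.0, 0.0]
print(ln_gamma_comb_inf(120.0, cavity_area(solute_profile),
                        95.0, cavity_area(solvent_profile)))
```

A quick sanity check on the formula: for a solute that is identical in size and shape to the solvent, the combinatorial contribution is zero, so any nonideality must then come from the residual term.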
Using this prediction model, the original COSMO-SAC was evaluated against more than 20,000 experimental data points. The results revealed significant limitations: a mean absolute error of 0.944 and a negative R² of -0.443, meaning that the predictions fit the data worse than simply taking the mean of the experimental values. The original model therefore sometimes provides only qualitative rather than quantitative predictions, with particularly large errors for hydrogen-bonding mixtures such as halogenated compound–alcohol mixtures.
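The two metrics above can be made concrete with a short sketch. The summary does not state which quantity the errors are computed on (ln γ∞ is a common choice), and the toy values below are illustrative, not data from the study. The key point is that R² compares the model's squared error with that of a constant prediction equal to the mean, so it drops below zero as soon as the model does worse than that baseline.

```python
def mae(y_true, y_pred):
    """Mean absolute error between experimental and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.
    It is negative whenever the predictions have a larger squared error
    than the constant prediction mean(y_true)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values (not from the study), e.g. experimental vs. predicted ln(gamma_inf).
y_exp = [1.0, 2.0, 3.0]
print(r2(y_exp, [2.0, 2.0, 2.0]))   # → 0.0  (predicting the mean)
print(r2(y_exp, [3.0, 2.0, 1.0]))   # → -3.0 (trend reversed)
print(mae(y_exp, [3.0, 2.0, 1.0]))  # → 1.3333333333333333
```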
To improve performance, four adjustable parameters within COSMO-SAC were optimized against the experimental data set. Parameter optimization substantially improved accuracy, reducing the mean absolute error to 0.510 and increasing R² to 0.625. To further improve predictive performance, a boosting ensemble model was trained to predict the residuals between the experimental values and the optimized COSMO-SAC predictions. The resulting hybrid COSMO-SAC-ML model achieved a mean absolute error of 0.102, an 89.2% reduction relative to the original model, together with an R² of 0.969.
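The residual-learning step can be sketched as follows. This summary does not describe the boosting library, input features, or hyperparameters used in the study, so the code is a minimal pure-Python stand-in: squared-loss gradient boosting with depth-one regression trees (stumps) fitted to the residuals between experiment and the optimized physics model, with the learned correction then added back to the physics prediction. The feature vectors, data, and names (`ResidualBooster`, `hybrid_predict`, `n_rounds`, `learning_rate`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

# (feature index, threshold, value if feature <= threshold, value otherwise)
Stump = Tuple[int, float, float, float]


def fit_stump(X: Sequence[Sequence[float]], target: Sequence[float]) -> Stump:
    """Depth-one regression tree minimizing squared error on `target`."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [t for row, t in zip(X, target) if row[f] <= thr]
            right = [t for row, t in zip(X, target) if row[f] > thr]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((t - lv) ** 2 for t in left) + sum((t - rv) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, (f, thr, lv, rv))
    if best is None:  # no valid split: fall back to a constant prediction
        mean = sum(target) / len(target)
        return (0, float("inf"), mean, mean)
    return best[1]


def stump_value(stump: Stump, row: Sequence[float]) -> float:
    f, thr, lv, rv = stump
    return lv if row[f] <= thr else rv


@dataclass
class ResidualBooster:
    """Squared-loss gradient boosting on residuals = experiment - physics model."""
    n_rounds: int = 50
    learning_rate: float = 0.1
    stumps: List[Stump] = field(default_factory=list)

    def fit(self, X, residuals):
        self.stumps = []
        current = [0.0] * len(X)
        for _ in range(self.n_rounds):
            # Each new stump fits the part of the residual not yet explained.
            remaining = [r - c for r, c in zip(residuals, current)]
            stump = fit_stump(X, remaining)
            self.stumps.append(stump)
            current = [c + self.learning_rate * stump_value(stump, row)
                       for c, row in zip(current, X)]
        return self

    def predict(self, row):
        return sum(self.learning_rate * stump_value(s, row) for s in self.stumps)


def hybrid_predict(physics_pred, booster, row):
    """Hybrid prediction: physics-based value plus learned residual correction."""
    return physics_pred + booster.predict(row)


# Hypothetical data: one descriptor per mixture, optimized-model predictions, experiments.
X = [[0.0], [1.0], [2.0], [3.0]]
physics = [1.0, 1.0, 1.0, 1.0]
experiment = [1.0, 1.0, 2.0, 2.0]
residuals = [e - p for e, p in zip(experiment, physics)]
booster = ResidualBooster().fit(X, residuals)
print([round(hybrid_predict(p, booster, row), 3) for p, row in zip(physics, X)])
```

Because the booster learns only the residual, the physics-based estimate remains the baseline: where the learned correction is near zero, the hybrid prediction falls back to the optimized COSMO-SAC value.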
The proposed COSMO-SAC-ML strategy shows how classical thermodynamics and modern machine learning can complement each other to combine accuracy with physical insight. By broadening the practical applicability of σ-profile and molecular cavity volume prediction, and of COSMO-SAC-based infinite dilution activity coefficient calculations, this accurate and efficient approach provides a robust platform for high-throughput solvent screening across diverse chemical engineering applications and for broader mixture-property prediction.
DOI: 10.1007/s11705-026-2625-y