General Intelligence Framework to Predict Virus Adaptation Based on a Genome Language Model

Continuous viral mutation and evolution can enable viruses to cross species barriers, infect non-natural hosts, and subsequently trigger human-to-human transmission or even global pandemics. Multiple major outbreaks, including COVID-19 and influenza pandemics, have been caused by zoonotic viruses. Given the potential threat from unknown viruses, developing intelligent models that can rapidly assess adaptability and transmission risk at the genotypic level has become a frontline challenge in infectious disease prevention and control.

Traditional experimental methods for viral risk identification, while reliable, are time-consuming and low-throughput, making it difficult to perform real-time and prospective risk assessments on large-scale viral sequence data. In recent years, artificial intelligence technologies have demonstrated potential in predicting phenotypes such as receptor binding, host adaptation, and evolutionary escape based on viral gene or protein sequences. However, existing models are primarily designed for specific viruses or genes, with strict limitations on sequence type and length, thereby lacking generalizability for diverse unknown viruses. Moreover, the scarcity of phenotypic labels and the presence of annotation noise in public databases significantly constrain the performance of supervised learning models. Thus, under conditions of incomplete labeling, constructing an accurate, highly generalizable, and directly applicable intelligent framework for predicting the adaptation risk of unknown viruses represents a critical challenge in the field.

In summary, a general artificial intelligence approach that predicts adaptation risk without relying entirely on predefined labels would provide essential technical support for early warning and control decisions on emerging infectious diseases, and it holds significant theoretical and practical value.

Research Progress

A research team led by Prof. Tao Jiang and Prof. Jing Li of the State Key Laboratory of Pathogen and Biosecurity, Academy of Military Medical Sciences, developed a viral risk prediction framework named GIVAL, built on the pre-trained viral protein language model vBERT. The work was carried out in collaboration with Prof. Shi-Shun Zhao of the College of Mathematics, Jilin University, and Prof. Jianwei Wang of the NHC Key Laboratory of Systems Biology of Pathogens and Christophe Merieux Laboratory, National Institute of Pathogen Biology, Chinese Academy of Medical Sciences and Peking Union Medical College. Prof. Jing Li, Prof. Tao Jiang, and Prof. Jianwei Wang are the corresponding authors of the paper, and the first author is Shu-Yang Jiang, a Ph.D. candidate at the College of Mathematics, Jilin University. The study proposes a general intelligent prediction method for assessing the adaptation risks of unknown viruses (Fig. 1).

First, viral protein sequences were tokenized dynamically using a Hidden Markov Model (HMM). The viral protein language model vBERT, trained with statistical sampling of viral genome sequences and HMM-based dynamic tokenization, outperformed mainstream pre-trained models such as DNABERT-2, ProteinBERT, and ESM-2 in benchmark tests (Fig. 2A-E).
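To give a feel for what HMM-based dynamic tokenization can look like, here is a minimal sketch in Python. It is not the tokenizer used for vBERT, whose design is not described in this release. It assumes a hypothetical two-state HMM ("B" starts a token, "I" continues one) with made-up probabilities, and uses Viterbi decoding to split a protein sequence into variable-length tokens. The names `viterbi`, `tokenize`, and `emit` are illustrative only.

```python
import math

# Hypothetical two-state HMM: "B" starts a new token, "I" continues the current one.
# Every probability below is an invented illustrative value, not a parameter from the study.
STATES = ("B", "I")
START = {"B": 1.0, "I": 0.0}  # a sequence must open with a token start
TRANS = {"B": {"B": 0.4, "I": 0.6},
         "I": {"B": 0.5, "I": 0.5}}
HYDROPHOBIC = frozenset("AVLIMFWC")  # 8 of the 20 standard residues


def emit(state, residue):
    """Toy emission model: hydrophobic residues are more likely inside a token.

    Any character outside HYDROPHOBIC (including non-standard ones) is
    treated as non-hydrophobic.
    """
    p_hydrophobic = 0.7 if state == "I" else 0.3
    if residue in HYDROPHOBIC:
        return p_hydrophobic / len(HYDROPHOBIC)
    return (1.0 - p_hydrophobic) / (20 - len(HYDROPHOBIC))


def log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq):
    """Return the most likely state path (one state per residue)."""
    if not seq:
        return []
    # score[s] = best log-probability of any path that ends in state s
    score = {s: log(START[s]) + log(emit(s, seq[0])) for s in STATES}
    back = []
    for residue in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + log(TRANS[p][s]))
            new_score[s] = score[prev] + log(TRANS[prev][s]) + log(emit(s, residue))
            pointers[s] = prev
        score = new_score
        back.append(pointers)
    # Trace the best path backwards from the best final state
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def tokenize(seq):
    """Cut the sequence wherever the decoded path enters state "B"."""
    tokens = []
    for residue, state in zip(seq, viterbi(seq)):
        if state == "B" or not tokens:
            tokens.append(residue)
        else:
            tokens[-1] += residue
    return tokens


if __name__ == "__main__":
    example = "MKTIIALSYIFCLVFA"  # arbitrary example string
    print(tokenize(example))
```

The point of the sketch is only that token boundaries are decided by the model from the sequence itself, so token length varies, in contrast to fixed-length k-mer tokenization.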

Second, a semi-supervised general AI framework named GIVAL was built on vBERT embeddings to predict the adaptation risks of unknown viruses, and the full pipeline was systematically evaluated. Semi-supervised learning gave GIVAL higher prediction accuracy and greater tolerance to labeling errors, enabling reliable modeling and accurate prediction for unknown input sequences when labels are scarce (Fig. 2F-K).
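The general idea of semi-supervised learning on sequence embeddings can be sketched as follows. This is not GIVAL's training procedure, which is not detailed in this release. The sketch swaps in a simple, well-known technique, nearest-centroid self-training, and runs it on invented two-dimensional vectors that stand in for vBERT embeddings. The labels "low" and "high", the `margin` threshold, and the function names are all assumptions made for illustration.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def self_train(embeddings, labels, margin=0.5, max_rounds=10):
    """Nearest-centroid self-training (an illustrative stand-in, not GIVAL).

    embeddings: list of equal-length float vectors.
    labels: one class label (string) per vector, or None where unknown.
    Returns a new label list in which confidently classified points
    have received pseudo-labels; ambiguous points stay None.
    """
    labels = list(labels)
    for _ in range(max_rounds):
        classes = sorted({y for y in labels if y is not None})
        if not classes:
            return labels  # nothing to learn from
        centroids = {
            c: centroid([x for x, y in zip(embeddings, labels) if y == c])
            for c in classes
        }
        changed = False
        for i, (x, y) in enumerate(zip(embeddings, labels)):
            if y is not None:
                continue
            dists = sorted((math.dist(x, centroids[c]), c) for c in classes)
            # Accept a pseudo-label only when the nearest centroid is
            # clearly closer than the runner-up.
            if len(dists) == 1 or dists[1][0] - dists[0][0] >= margin:
                labels[i] = dists[0][1]
                changed = True
        if not changed:
            break
    return labels


if __name__ == "__main__":
    # Invented 2-D points; real embeddings would have many more dimensions.
    embeddings = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],
                  [3.0, 3.0], [3.2, 2.9], [2.8, 3.1],
                  [1.5, 1.5]]
    labels = ["low", None, None, "high", None, None, None]
    print(self_train(embeddings, labels))
    # ['low', 'low', 'low', 'high', 'high', 'high', None]
```

Only two of the seven points carry labels at the start. The unlabeled points near each cluster are labeled automatically, while the point midway between the clusters is left undecided rather than forced into a class.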

Finally, GIVAL identified the reported shift in receptor binding of two Middle East respiratory syndrome coronavirus (MERS-CoV)-related strains, distinguished adaptation differences between canine and equine H3N8 influenza viruses, inferred high-risk mutations in H5N1 influenza viruses (Fig. 3), and assessed a recent adaptation shift in monkeypox viruses.

Future Prospects

The general artificial intelligence framework for viral risk prediction proposed in this study enables assessment of the potential risks posed by previously unknown viruses. Even with incomplete viral sequences and scarce annotated data, it can deliver accurate and robust risk evaluation, providing decision support for early warning and proactive prevention and control of viral infectious diseases.

The complete study is accessible via DOI:10.34133/research.0871

https://spj.science.org/doi/10.34133/research.0871

Title: General Intelligence Framework to Predict Virus Adaptation Based on a Genome Language Model
Authors: Shu-Yang Jiang, Shi-Shun Zhao, Jun-Qing Wei, Sen Zhang, Zhongpeng Zhao, Yigang Tong, Wei Liu, Jianwei Wang, Tao Jiang, and Jing Li
Journal: Research, 30 Sep 2025, Vol 8, Article ID: 0871
DOI: 10.34133/research.0871

Attached files

Fig. 1. The pipeline of GIVAL to generally predict the adaptation risk of a virus based on vBERT embedding of its genotype
Fig. 2. Performance of HMM-based vBERT and GIVAL on different viral protein datasets
Fig. 3. Prediction and inference of human-adapted H3N8 and H5N1 IAVs based on HA RBD

13/12/2025 Science and Technology Review Publishing House

Regions: Asia, China

Keywords: Applied science, Artificial Intelligence, Health, Medical, People in health research

