https://doi.org/10.1101/gr.281006.125
A new study published in Genome Research presents an interpretable artificial intelligence framework that improves both the accuracy and transparency of genomic prediction, a key challenge in fields such as precision medicine, crop science, and animal breeding.
Predicting observable traits from genetic variation remains difficult due to the complex interplay of multiple genes and environmental influences. Widely used statistical approaches are limited in their ability to capture complex genetic interactions, while machine learning methods, although powerful, are often criticised for their lack of interpretability.
The study addresses this gap by integrating advanced machine learning models with explainable AI techniques, enabling both high predictive performance and biological insight. The researchers evaluated a broad range of computational methods across diverse datasets spanning multiple species and identified key factors that influence prediction accuracy.
The findings show that boosting algorithms, a class of machine learning models, consistently outperform traditional statistical methods, particularly for traits with well-defined genetic signals. In some cases, substantial improvements in predictive performance were observed, highlighting the potential of machine learning to advance genomic analysis. Further simulations show that machine learning models can automatically capture non-additive effects and multi-locus interactions without explicitly specifying interaction terms, thereby improving the representation of complex genetic architectures and predictive performance.
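The point about interaction terms can be illustrated with a toy example. The sketch below is not the study's code or data: it assumes three hypothetical loci coded 0/1, a made-up trait with two additive effects plus one two-locus interaction, and a deliberately minimal gradient-boosting loop written from scratch. Boosting depth-1 trees can only build an additive model, whereas depth-2 trees can pick up the interaction even though no interaction term is ever supplied. The error reported is training error on the toy grid, not held-out accuracy.

```python
from itertools import product

def mean(values):
    return sum(values) / len(values)

def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(X, r, depth):
    """Greedy regression tree fitted to residuals r; a leaf is a float."""
    if depth == 0:
        return mean(r)
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every observed value except the largest.
        for thr in sorted(set(row[j] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= thr]
            right = [i for i, row in enumerate(X) if row[j] > thr]
            score = sse([r[i] for i in left]) + sse([r[i] for i in right])
            if best is None or score < best[0]:
                best = (score, j, thr, left, right)
    if best is None:  # no locus varies within this node
        return mean(r)
    _, j, thr, left, right = best
    return (j, thr,
            build_tree([X[i] for i in left], [r[i] for i in left], depth - 1),
            build_tree([X[i] for i in right], [r[i] for i in right], depth - 1))

def predict_tree(tree, row):
    while isinstance(tree, tuple):
        j, thr, low, high = tree
        tree = low if row[j] <= thr else high
    return tree

def boosted_mse(X, y, depth, rounds=50, lr=0.3):
    """Fit boosted trees to (X, y) and return the training mean squared error."""
    pred = [mean(y)] * len(y)
    for _ in range(rounds):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        tree = build_tree(X, resid, depth)
        pred = [p + lr * predict_tree(tree, row) for p, row in zip(pred, X)]
    return mean([(yi - pi) ** 2 for yi, pi in zip(y, pred)])

# Hypothetical data: every combination of three 0/1 loci.
# Loci 0 and 1 have additive effects and interact; locus 2 is pure noise.
X = [list(g) for g in product([0, 1], repeat=3)]
y = [g[0] + g[1] + 2 * g[0] * g[1] for g in X]

mse_additive = boosted_mse(X, y, depth=1)  # stumps: additive effects only
mse_interact = boosted_mse(X, y, depth=2)  # depth 2: can model pairs of loci
print("depth-1 training MSE:", mse_additive)
print("depth-2 training MSE:", mse_interact)
```

In this toy setting the depth-1 model should retain a clearly non-zero error, because the interaction component cannot be expressed as a sum of single-locus effects, while the depth-2 model drives the training error towards zero.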
The study also demonstrates that genetic architecture plays a critical role in determining model performance. Traits influenced by a smaller number of significant genetic loci are more effectively predicted, while highly complex traits remain more challenging. In addition, the researchers show that feature selection and model optimisation are essential for maximising predictive accuracy.
A key advance of the work is the incorporation of interpretability methods, allowing the contribution of individual genetic variants to be quantified. This enables researchers to link predictions directly to specific regions of the genome, revealing both additive and interaction effects and providing deeper insight into the biological basis of complex traits.
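As an illustration of what quantifying a variant's contribution can look like, the sketch below computes exact Shapley values for a three-locus toy model. This is an assumption made for the example: the article does not say which interpretability method the study uses, and `toy_model`, its coefficients, the genotype, and the all-zero baseline are all hypothetical. Exact enumeration is only feasible for a handful of loci; genome-scale tools rely on approximations.

```python
from itertools import combinations
from math import factorial

def toy_model(g):
    """Hypothetical trait model: two additive effects plus a locus 0 x locus 2 interaction."""
    return 1.0 * g[0] + 0.5 * g[1] + 2.0 * g[0] * g[2]

def shapley_values(model, x, baseline):
    """Exact Shapley attribution of model(x) - model(baseline) to each locus."""
    n = len(x)

    def value(subset):
        # Loci in `subset` take the individual's genotype, the rest the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain locus i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

genotype = [1, 1, 1]   # hypothetical individual
baseline = [0, 0, 0]   # hypothetical reference genotype
phi = shapley_values(toy_model, genotype, baseline)
for locus, contribution in enumerate(phi):
    print(f"locus {locus}: {contribution:.3f}")
```

The attributions sum to the difference between the prediction for the individual and the prediction for the baseline, and the interaction effect is shared equally between the two loci involved, which is one way a per-variant score can reflect both additive and interaction effects.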
To support broader use, the study's authors have developed an open-source platform, AIGP (artificial intelligence genomic prediction), which integrates data preprocessing, model training, optimisation and interpretation within a single workflow. The platform is designed to make AI-driven genomic analysis more accessible to researchers across disciplines.
The findings highlight a growing shift toward more transparent and biologically informed AI applications in genomics, with potential implications for improving breeding strategies, accelerating biological discovery, and enhancing understanding of complex traits across species.
Article Reference
Authors: Lei Wei; Ziqin Jiang; Baoliang Fan; Yidan Yan; Zhenqiang Xu; Xiaoxiang Hu; Yuzhe Wang. Automated interpretable artificial intelligence genomic prediction with AIGP. Genome Research. Published online March 5, 2026. https://doi.org/10.1101/gr.281006.125
Keywords
Genomic prediction, Artificial intelligence, Interpretable machine learning, Computational genomics, Predictive modelling, Bioinformatics