AI deciphers plant DNA: language models set to transform genomics and agriculture

01/06/2025 TranSpread

By leveraging the structural parallels between genomic sequences and natural language, large language models (LLMs) can decode complex genetic information, offering unprecedented insights into plant biology. This advancement holds promise for accelerating crop improvement, enhancing biodiversity conservation, and bolstering food security in the face of global challenges.

Plant genomics has long grappled with vast and complex datasets, and progress has often been limited by narrowly task-specific traditional machine learning models and by the scarcity of annotated data. While LLMs have revolutionized fields such as natural language processing, their application in plant genomics has remained nascent. The primary hurdle has been adapting these models to interpret the unique "language" of plant genomes, which differs significantly from human linguistic patterns. This study addresses that gap, exploring how LLMs can be tailored to understand and predict plant genetic functions effectively.

A study (DOI: 10.48130/tp-0025-0008) published in Tropical Plants on 14 April 2025 by the team of Meiling Zou, Haiwei Chai and Zhiqiang Xia at Hainan University details how LLMs, when trained on extensive plant genomic data, can accurately predict gene functions and regulatory elements.

In this study, the researchers explore the potential of LLMs in plant genomics. By drawing parallels between the structure of natural language and that of genomic sequences, the study highlights how LLMs can be trained to understand and predict gene functions, regulatory elements, and expression patterns in plants. The research discusses several LLM architectures: encoder-only models such as DNABERT, decoder-only models such as DNAGPT, and encoder-decoder models such as ENBED. The workflow they describe involves pre-training LLMs on large collections of plant genomic sequences and then fine-tuning them on specific annotated datasets to improve accuracy. By treating DNA sequences like sentences, the models can identify patterns and relationships within the genetic code. These models have shown promise in tasks such as promoter prediction, enhancer identification, and gene expression analysis. Notably, plant-specific models such as AgroNT and FloraBERT have been developed, showing improved performance in annotating plant genomes and predicting tissue-specific gene expression, which illustrates the versatility and robustness of LLMs across diverse plant species.

However, the study also notes that most existing LLMs are trained on animal or microbial data, and that many plant genomes still lack comprehensive annotations. To address this, the authors advocate developing plant-focused LLMs trained on diverse plant genomic datasets, including data from underrepresented species such as tropical plants. They also emphasize the importance of integrating multi-omics data and developing standardized benchmarks to evaluate model performance.
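To make the idea of reading DNA "like sentences" more concrete, the Python sketch below shows one common way of turning a sequence into word-like tokens: overlapping k-mer tokenization, the scheme used by DNABERT, followed by the random masking that BERT-style pre-training uses to create a self-supervised learning task. It is a minimal illustration, not the pipeline used in the study; the function names, the choice of k = 6, the special-token list and the 15% masking rate are assumptions chosen for clarity.

```python
import random
from itertools import product

# Assumed special tokens, in the style of BERT vocabularies (illustrative only).
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def kmer_tokenize(seq: str, k: int = 6) -> list[str]:
    """Split a DNA sequence into overlapping k-mers (stride 1); each k-mer acts as a 'word'."""
    seq = seq.upper()
    if len(seq) < k:
        return []
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def build_vocab(k: int = 6) -> dict[str, int]:
    """Assign an integer ID to each special token and to every possible k-mer over A/C/G/T."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    return {tok: idx for idx, tok in enumerate(SPECIAL_TOKENS + kmers)}


def encode(seq: str, vocab: dict[str, int], k: int = 6) -> list[int]:
    """Convert a sequence to token IDs wrapped in [CLS] ... [SEP]; unknown k-mers (e.g. with N) map to [UNK]."""
    ids = [vocab.get(tok, vocab["[UNK]"]) for tok in kmer_tokenize(seq, k)]
    return [vocab["[CLS]"]] + ids + [vocab["[SEP]"]]


def mask_tokens(ids: list[int], vocab: dict[str, int], rate: float = 0.15, seed: int = 0):
    """Replace a random fraction of non-special tokens with [MASK].

    Returns the masked IDs and the labels the model must predict
    (-100 marks positions that are ignored in the loss).
    """
    rng = random.Random(seed)
    special_ids = {vocab[t] for t in SPECIAL_TOKENS}
    masked, labels = [], []
    for tok_id in ids:
        if tok_id not in special_ids and rng.random() < rate:
            masked.append(vocab["[MASK]"])
            labels.append(tok_id)
        else:
            masked.append(tok_id)
            labels.append(-100)
    return masked, labels


if __name__ == "__main__":
    vocab = build_vocab(k=6)
    seq = "ATGCGTACGTTAGC"
    print(kmer_tokenize(seq, k=6)[:3])  # ['ATGCGT', 'TGCGTA', 'GCGTAC']
    print(len(vocab))  # 4101 (5 special tokens + 4**6 k-mers)
    masked, labels = mask_tokens(encode(seq, vocab, k=6), vocab)
    print(masked)
    print(labels)
```

Overlapping k-mers keep every position's local context but produce roughly one token per base, so long genomic regions become long inputs; some later DNA models instead use non-overlapping k-mers or learned subword vocabularies to shorten sequences.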

In summary, this study underscores the immense potential of integrating artificial intelligence, particularly large language models, into plant genomics research. By bridging the gap between computational linguistics and genetic analysis, LLMs can revolutionize our understanding of plant biology, paving the way for innovations in agriculture, conservation, and biotechnology. Future research will focus on refining these models, expanding their training datasets, and exploring their applications in real-world agricultural scenarios to fully harness their capabilities.

###

References

DOI

10.48130/tp-0025-0008

Original Source URL

https://doi.org/10.48130/tp-0025-0008

About Tropical Plants

Tropical Plants (e-ISSN 2833-9851) is the official journal of Hainan University and published by Maximum Academic Press. Tropical Plants undergoes rigorous peer review and is published in open-access format to enable swift dissemination of research findings, facilitate exchange of academic knowledge and encourage academic discourse on innovative technologies and issues emerging in tropical plant research.

Funding Information

The research was supported by Biological Breeding-National Science and Technology Major Project (2023ZD04073), the Project of Sanya Yazhou Bay Science and Technology City (SCKJ-JYRC-2022-57), and the High-performance Computing Platform of YZBSTCACC.

Title of original paper: Artificial intelligence-driven plant bio-genomics research: a new era
Authors: Lin Yang¹, Hao Wang¹, Meiling Zou¹*, Haiwei Chai²* and Zhiqiang Xia¹*
Journal: Tropical Plants
Latest article publication date: 14 April 2025
Subject of research: Not applicable
COI statement: The authors declare that they have no competing interests.
Attached files
  • Figure 3: Similarity between genome sequence and language sequence.
Regions: North America, United States, Asia, China
Keywords: Applied science, Engineering
