EnvGPT: a specialized AI tool for climate, water, and soil science challenges

01/09/2025 TranSpread

Environmental science integrates diverse disciplines such as ecology, hydrology, and climate science, so models working in the field must handle specialized jargon and heterogeneous data. General-purpose large language models (LLMs) have advanced fields such as medicine and law, but they struggle with domain-specific environmental tasks because they are trained on little relevant text. Previous efforts such as ClimateGPT and WaterGPT focused on narrow subdomains and lacked a unified, cross-disciplinary approach. Given these challenges, there is a critical need for integrated frameworks that generate high-quality environmental training data and enable rigorous model evaluation.

In a study published on August 1, 2025, in Environmental Science and Ecotechnology (DOI: 10.1016/j.ese.2025.100608), researchers from Southern University of Science and Technology and Tsinghua University unveiled EnvGPT, a fine-tuned language model designed specifically for environmental science. The study presents a complete pipeline: a multi-agent instruction generator (EnvInstruct), a balanced 100-million-token dataset (ChatEnv), and a 4,998-item benchmark (EnvBench), used together to train and evaluate the model across five core environmental themes.
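The release describes ChatEnv only as "balanced" across five themes. As a rough illustration of what such a check might involve, the sketch below counts tokens per theme and tests whether each theme's share is close to an equal split. The record layout (`InstructionPair`), the theme labels, whitespace tokenization, and the 5-point tolerance are all hypothetical; the paper's actual balancing procedure is not described here.

```python
"""Sketch: checking whether an instruction dataset is balanced across themes.

Hypothetical assumptions (not from the paper): each record carries a theme
label, tokens are approximated by whitespace splitting, and "balanced" means
every theme's token share lies within a tolerance of an equal 1/n split.
"""
from collections import Counter
from dataclasses import dataclass


@dataclass
class InstructionPair:
    theme: str        # hypothetical label for one of the five themes
    instruction: str
    response: str


def token_shares(pairs):
    """Return each theme's fraction of the total (whitespace) token count."""
    counts = Counter()
    for p in pairs:
        counts[p.theme] += len(p.instruction.split()) + len(p.response.split())
    total = sum(counts.values())
    return {theme: n / total for theme, n in counts.items()}


def is_balanced(pairs, n_themes=5, tolerance=0.05):
    """True if all n_themes appear and each share is within tolerance of 1/n_themes."""
    shares = token_shares(pairs)
    target = 1 / n_themes
    return len(shares) == n_themes and all(
        abs(share - target) <= tolerance for share in shares.values()
    )


if __name__ == "__main__":
    # Toy data: one identical-length pair per hypothetical theme.
    demo = [
        InstructionPair(f"theme_{i}", "What controls nitrate leaching?", "Soil texture and rainfall.")
        for i in range(5)
    ]
    print(token_shares(demo))
    print(is_balanced(demo))
```

A token-based share is used rather than a record count because the release sizes ChatEnv in tokens; a real pipeline would use the model's own tokenizer instead of whitespace splitting.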

The research team built EnvCorpus from open-access environmental journals covering five key themes and used a multi-agent GPT-4 system to generate 112,946 instruction–response pairs. EnvGPT was then fine-tuned with low-rank adaptation (LoRA), which substantially reduces computational cost while maintaining performance. On the independently designed EnvBench, EnvGPT outperformed similarly sized models such as Llama-3.1-8B and Vicuna-1.5-7B, and even matched the performance of the much larger Qwen2.5-72B and the closed-source GPT-4o-mini in factual accuracy and relevance. Notably, it achieved 92.06% accuracy on EnviroExam, a benchmark of university-level multiple-choice questions, surpassing the baseline models by about 8 percentage points. The model also performed strongly on real-world tasks, especially interdisciplinary and complex-reasoning questions, as shown on the ELLE dataset.
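Low-rank adaptation is a general fine-tuning technique: the pretrained weight matrix W (d × k) is frozen, and a trainable low-rank update B·A (B is d × r, A is r × k) is added, so only r·(d + k) parameters are trained instead of d·k. The pure-Python sketch below illustrates that idea; it is not EnvGPT's training code, and the matrix sizes, rank `r`, and scaling factor `alpha` are illustrative assumptions.

```python
"""Minimal LoRA sketch in pure Python (no ML framework).

Illustrates the general low-rank adaptation technique only; this is not
EnvGPT's training code, and all sizes below are hypothetical.
"""
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(w, a, b, x, alpha, r):
    """Compute y = W x + (alpha / r) * B (A x); W stays frozen, only A and B train."""
    base = matvec(w, x)               # frozen pretrained path
    delta = matvec(b, matvec(a, x))   # low-rank path: A is r x k, B is d x r
    scale = alpha / r
    return [y0 + scale * dy for y0, dy in zip(base, delta)]


def trainable_params(d, k, r):
    """Full fine-tuning updates d*k weights; LoRA updates r*(d + k)."""
    return d * k, r * (d + k)


if __name__ == "__main__":
    random.seed(0)
    d, k, r = 4, 3, 2
    w = [[random.uniform(-1, 1) for _ in range(k)] for _ in range(d)]
    a = [[random.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(r)]
    b = [[0.0] * r for _ in range(d)]  # B starts at zero, so the adapted layer initially equals W x
    x = [1.0, 2.0, 3.0]
    assert lora_forward(w, a, b, x, alpha=16, r=r) == matvec(w, x)

    # Hypothetical 4096 x 4096 projection adapted with rank 8.
    full, lora = trainable_params(4096, 4096, 8)
    print(full, lora, f"{lora / full:.4%}")
```

For the hypothetical 4096 × 4096 layer with rank 8, the script prints `16777216 65536 0.3906%`: well under 1% of that layer's weights are trainable, which is where LoRA's memory and compute savings come from. Initializing B to zero follows the usual LoRA convention, so fine-tuning starts exactly from the pretrained model's behavior.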

"This work demonstrates how targeted fine-tuning with domain-specific data can elevate compact models to compete with giants in the field. EnvGPT sets a new standard for AI applications in environmental science," said Dr. Qing Hu, corresponding author and lead researcher at the State Key Laboratory of Soil Pollution Control and Safety.

EnvGPT can support researchers, educators, and policymakers by providing accurate, domain-aware responses to complex environmental queries. The open release of ChatEnv and EnvBench enables reproducible research and encourages community-driven improvements. Future work may integrate retrieval-augmented generation and multimodal data to enhance real-time reasoning and keep pace with evolving scientific knowledge.

###

References

DOI

10.1016/j.ese.2025.100608

Original Source URL

https://doi.org/10.1016/j.ese.2025.100608

Funding information

This research was supported by the National Key Research and Development Program of China (2024YFC3711800) and the High-level University Special Fund (G03050K001).

About Environmental Science and Ecotechnology

Environmental Science and Ecotechnology (ISSN 2666-4984) is an international, peer-reviewed, open-access journal published by Elsevier. The journal publishes significant perspectives and research across the full spectrum of ecology and environmental sciences, including climate change, sustainability, biodiversity conservation, environment and health, green catalysis and processing for pollution control, and AI-driven environmental engineering. According to the Journal Citation Reports™ 2024, the latest impact factor of ESE is 14.3.

Paper title: Fine-tuning large language models for interdisciplinary environmental challenges
Attached files
  • EnvGPT Framework: From Data to Benchmark. This diagram illustrates the three core components of the EnvGPT development pipeline: EnvInstruct for generating instruction sets, ChatEnv—a 100-million-token domain-specific dataset, and EnvBench—a comprehensive benchmark for evaluating LLMs in environmental science. Together, they enable efficient and reproducible model training and assessment.
Regions: North America, United States
Keywords: Applied science, Artificial Intelligence, Science, Environment - science