Advances in sequencing technologies have transformed life science research, enabling multi-layered exploration of biological systems across species, tissues, and developmental stages. However, this data explosion presents major challenges in standardizing storage formats, ensuring quality control, achieving cross-database interoperability, and delivering data at scale. Large-scale global genome sequencing projects, such as the Earth BioGenome Project and numerous organism-wide genomic initiatives, depend on reliable systems capable of handling diverse data types ranging from whole genomes to spatial transcriptomes. Traditional repositories alone are insufficient to support these broad and evolving needs. These challenges call for deeper research into efficient multi-omics data archiving and open-sharing frameworks.
Researchers from the China National GeneBank (CNGB) published the 2024 update of the China National GeneBank Sequence Archive (CNSA) in Horticulture Research on May 1, 2025 (DOI: 10.1093/hr/uhaf036). The report details major advances in CNSA’s data scale, data types, visualization tools, international certification, and role in supporting global multi-omics research. CNSA now archives more than 16.3 petabytes of biological data from over 560 institutions worldwide, making it one of the largest open-access repositories for life science data.
CNSA provides public archiving and open-sharing services for a broad spectrum of biological data, including genome assemblies, raw sequencing reads, gene expression matrices, variation data, metabolomics profiles, viral sequences, and single-cell and spatial transcriptomic datasets. As of August 2024, the repository includes 1,122,067 samples, 1,766,269 sequencing datasets, and 125,855 genome assemblies, representing 7,521 species, supported by 47 sequencing platforms. A key update is the addition of a spatial transcriptomics archiving system, which captures tissue section metadata, image files, barcoding information, and spatial gene expression matrices, integrated with an online viewer that enables cell-type annotation, spatial region segmentation, and cell–cell interaction analysis. CNSA now supports high-speed data access through FTP, HTTPS, and Aspera transfer protocols, and has received formal certifications including CoreTrustSeal, FAIRsharing, and re3data, demonstrating global compliance with data management and preservation standards. CNSA also contributes to major international projects such as the 10KP Plant Genome Project, the Earth BioGenome Project, and the SpatioTemporal Omics Consortium, accelerating discovery across evolution, agriculture, ecology, and human health.
“Open and well-curated biological data resources are essential to advancing global scientific collaboration,” the authors noted. “The continued development of CNSA reflects the growing need to archive, preserve, and share complex multi-omics datasets at scale. By integrating quality control systems, standardized metadata formats, visualization platforms, and international interoperability frameworks, CNSA provides researchers worldwide with the tools required to accelerate genome science and biodiversity conservation.”
The updated CNSA platform supports broad research applications in plant and animal genomics, crop breeding, evolutionary biology, microbial ecology, medical research, environmental monitoring, and biodiversity protection. Its open-access structure encourages data reuse, reduces duplication, and supports integrative analyses that combine genomics, transcriptomics, phenotyping, and spatial mapping. Future developments will integrate artificial intelligence-assisted data curation, application programming interfaces (APIs), and cloud computing platforms to enable large-scale data analysis without requiring local storage. These advancements will further enhance CNSA’s role as a critical global infrastructure for accelerating biological discovery and supporting sustainable management of genetic resources.
###
References
DOI
10.1093/hr/uhaf036
Original Source URL
https://doi.org/10.1093/hr/uhaf036
Funding information
This study was supported by the Guangdong Genomics Data Center (2021B1212100001), Shenzhen Science and Technology Program (KQTD20230301092839007), Biological Breeding-National Science and Technology Major Project (2023ZD04073), and the China National GeneBank.
About Horticulture Research
Horticulture Research is an open access journal of Nanjing Agricultural University and ranked number one in the Horticulture category of the Journal Citation Reports™ from Clarivate, 2023. The journal is committed to publishing original research articles, reviews, perspectives, comments, correspondence articles and letters to the editor related to all major horticultural plants and disciplines, including biotechnology, breeding, cellular and molecular biology, evolution, genetics, inter-species interactions, physiology, and the origination and domestication of crops.