Gene print-based cell subtypes annotation of human disease across heterogeneous datasets with gPRINT

20/05/2025 Frontiers Journals

This study presents gPRINT, a computational framework that integrates gene expression levels and chromosomal positional information to generate unique "gene prints" for annotating disease-specific cell subtypes (DSCSs). Inspired by speech recognition, the approach leverages spatial gene organization (e.g., co-regulated genes within nucleosomes) to reduce noise and improve resolution in heterogeneous datasets.
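As a rough illustration of the "gene print" idea, and not the published gPRINT implementation, the sketch below orders one cell's expression values by chromosomal position so that the result reads as a one-dimensional signal, much like a waveform in speech recognition. The function name `build_gene_print`, the chromosome ordering, and the toy gene names and coordinates are all assumptions made for this example.

```python
from typing import Dict, List, Tuple

# Assumed chromosome order for illustration; a real pipeline would take
# this from a genome annotation file.
CHROM_ORDER = [str(i) for i in range(1, 23)] + ["X", "Y"]


def build_gene_print(
    expression: Dict[str, float],
    positions: Dict[str, Tuple[str, int]],
) -> Tuple[List[str], List[float]]:
    """Order one cell's expression values by genomic position.

    expression: gene name -> expression level in a single cell.
    positions:  gene name -> (chromosome, start coordinate).
    Returns the ordered gene names and the matching expression signal.
    Genes with no measured expression are filled with 0.0; genes on
    chromosomes outside CHROM_ORDER are dropped.
    """
    chrom_rank = {chrom: rank for rank, chrom in enumerate(CHROM_ORDER)}
    located = [g for g in positions if positions[g][0] in chrom_rank]
    ordered = sorted(
        located,
        key=lambda g: (chrom_rank[positions[g][0]], positions[g][1]),
    )
    signal = [expression.get(g, 0.0) for g in ordered]
    return ordered, signal


# Toy data: hypothetical genes and coordinates, not real annotations.
toy_positions = {
    "geneA": ("2", 500),
    "geneB": ("1", 900),
    "geneC": ("1", 100),
    "geneD": ("X", 50),
}
toy_expression = {"geneA": 3.0, "geneB": 0.0, "geneC": 1.5}

genes, gene_print = build_gene_print(toy_expression, toy_positions)
print(genes)
print(gene_print)
```

Because every cell is projected onto the same position-ordered axis, signals from different platforms become directly comparable vectors, which is the property a downstream classifier would rely on.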
Targeted benchmarking against marker-based (SingleR) and clustering-based (Seurat) methods showed gPRINT’s superiority in resolving ambiguities within mixed-cell populations (e.g., tumor-stroma interfaces). In tendinopathy, gPRINT identified novel chondrogenic tendon cells marked by SOX9/COL2A1 co-expression, a population undetectable by conventional methods. Cross-species alignment further validated conserved fibroblast subtypes driving fibrotic cascades in human, mouse, and primate models.
Further comparative analyses highlighted the generalizability of the "gene print" approach across diverse tissue types and disease models. These findings establish gPRINT as a powerful tool for single-cell data integration and subtype annotation, providing a unified platform for decoding cellular heterogeneity in human diseases.
Key findings from the study include:
1. Gene Print Framework for Cross-Dataset Annotation: The gPRINT algorithm integrates gene expression and chromosomal positional information to generate unique "gene prints," enabling platform-agnostic identification of disease-specific cell subtypes (DSCSs). Validated across 1.2 million cells, gPRINT achieved 98.37% cross-platform accuracy, outperforming traditional methods (SingleR, Seurat) in resolving ambiguous populations (e.g., tumor-stroma interfaces) and identifying novel subtypes like SOX9/COL2A1-expressing chondrogenic tendon cells in tendinopathy.
2. Mechanistic Link to 3D Genome Architecture: Hi-C data confirmed that gene prints reflect spatial co-localization of signature genes (e.g., COL1A1/ACTA2 clusters on chromosome 7) in DSCSs. Disrupting chromosomal topology (e.g., CTCF anchor deletions) reduced annotation accuracy by 63%, while CRISPR-mediated enhancer deletions abolished subtype-specific pathways (e.g., TGF-β signaling).
3. Therapeutic Discovery and Universal Utility: gPRINT prioritized drug candidates (e.g., ascorbic acid, celastrol) via CMAP database integration and revealed conserved fibrotic networks across species (human/mouse/primate). Its application to TendonBase, a multi-omics database, established a universal framework for decoding cellular heterogeneity in fibrosis, cancer, and degenerative diseases.
This study established gPRINT, a computational framework that unifies cell subtype annotation across single-cell datasets by integrating gene expression and chromosomal spatial organization into unique "gene prints." Validated on 1.2 million cells, gPRINT achieved 98.37% cross-platform accuracy, identifying novel pathological subtypes and linking their gene prints to 3D chromatin architecture via Hi-C. Disrupting chromosomal topology reduced annotation accuracy by 63%, while drug-database integration prioritized candidates such as ascorbic acid for fibrosis. The work, entitled “Gene print-based cell subtypes annotation of human disease across heterogeneous datasets with gPRINT”, was published in Protein & Cell on Mar. 14, 2025.
DOI: 10.1093/procel/pwaf001
Reference: X Yan R, Fan C, Gu S, Wang T, Yin Z, Chen X. Gene print-based cell subtypes annotation of human disease across heterogeneous datasets with gPRINT. Protein Cell. 2025 March 14:pwaf001. doi: 10.1093/procel/pwaf001
Attached files
  • Image: General conceptual framework and validation of gPRINT. (A) Selected human single-cell transcriptome profiles from HCL and other public datasets were utilized to train and validate gPRINT. The human single-cell data encompasses 159,302 cells from 26 tissues and 5 platforms. (B) For each cell, a neural network of that cell and its “gene print” was constructed for supervised learning of gPRINT using known cell labels from each tissue’s transcriptome atlas. (C) The performance of gPRINT was tested using both an internal human dataset and an external test dataset, which contains single-cell transcriptome data from multiple tissues. For the external human test dataset, gPRINT was validated at the levels of cell type, hybrid hierarchy type, and cell subtype
Regions: Asia, China
Keywords: Science, Life Sciences
