Spatial transcriptomics (ST) has revolutionized biomedical research by enabling scientists to measure gene expression while preserving the spatial architecture of tissues. However, a major bottleneck remains: how to effectively group genes that share similar spatial patterns—especially rare or unique ones—into biologically meaningful modules.
Recently, a research team led by Jingyi Jessica Li from the University of California, Los Angeles, reported a novel two-step method called stIHC (spatial transcriptomics iterative hierarchical clustering). This method offers a powerful, data-driven solution for identifying spatial gene co-expression modules, even when cluster sizes are highly imbalanced.
Unlike standard clustering methods (e.g., those used in SPARK, SpatialDE, or MERINGUE), stIHC excels at detecting small or rare spatial expression patterns that are often merged into larger clusters or completely missed. It is fully data-adaptive and automatically determines the optimal number of clusters using an internal metric—requiring no manual parameter tuning. In the first step, stIHC models gene expression as smooth, continuous spatial fields using a generalized penalized regression framework, effectively reducing noise and accounting for spatial dependencies. In the second step, it applies functional iterative hierarchical clustering (funIHC) to precisely handle imbalanced clusters.
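To make the two-step idea concrete, the following is a minimal sketch and not the stIHC implementation. It assumes a Gaussian radial basis with a ridge penalty as a stand-in for the generalized penalized regression in step one, and plain average-linkage agglomerative clustering with a fixed number of clusters as a stand-in for funIHC in step two, so it does not choose the number of clusters automatically. All function names, the simulated data, and the parameter values are hypothetical.

```python
import numpy as np

def rbf_basis(coords, centers, bandwidth):
    """Gaussian radial basis evaluated at each spot (spots x centers)."""
    d2 = ((coords[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * bandwidth ** 2))

def smooth_fields(expr, basis, penalty):
    """Step 1 (stand-in): ridge-penalized fit of each gene, returned as fitted fields."""
    k = basis.shape[1]
    gram = basis.T @ basis + penalty * np.eye(k)
    coef = np.linalg.solve(gram, basis.T @ expr.T).T  # genes x centers
    return coef @ basis.T                              # genes x spots

def average_linkage(points, n_clusters):
    """Step 2 (stand-in): naive average-linkage agglomerative clustering."""
    clusters = [[i] for i in range(len(points))]
    dist = np.linalg.norm(points[:, None] - points[None, :], axis=2)
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # merge the closest pair
    labels = np.empty(len(points), dtype=int)
    for lab, members in enumerate(clusters):
        labels[members] = lab
    return labels

# Simulated, imbalanced toy data: 20 "gradient" genes and 3 rare "hotspot" genes.
rng = np.random.default_rng(0)
xs = np.linspace(0, 1, 10)
gx, gy = np.meshgrid(xs, xs)
coords = np.column_stack([gx.ravel(), gy.ravel()])        # 100 spots
gradient = coords[:, 0]
hotspot = np.exp(-((coords - 0.5) ** 2).sum(axis=1) / (2 * 0.15 ** 2))
expr = np.vstack([np.tile(gradient, (20, 1)), np.tile(hotspot, (3, 1))])
expr = expr + rng.normal(0, 0.05, size=expr.shape)        # measurement noise

cs = np.linspace(0, 1, 4)
cx, cy = np.meshgrid(cs, cs)
centers = np.column_stack([cx.ravel(), cy.ravel()])       # 16 basis centers

fitted = smooth_fields(expr, rbf_basis(coords, centers, 0.3), penalty=1.0)
labels = average_linkage(fitted, n_clusters=2)
print(labels)
```

In this toy setting the three hotspot genes end up in their own cluster because the smoothed fields of the two patterns are far apart relative to the residual noise. Real data would need normalization, a count-appropriate model, and a data-driven choice of cluster number, which is what the published method is reported to provide.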
Through three rigorous simulations and applications to real datasets from 10x Visium, 10x Xenium, and Spatial Transcriptomics technologies, stIHC consistently outperformed existing clustering approaches. It achieved perfect or near-perfect adjusted Rand index (ARI) scores in imbalanced and sparse-resolution scenarios, while competing methods struggled. Gene ontology (GO) enrichment analysis confirmed that the modules identified by stIHC are not only spatially coherent but also functionally relevant. In mouse brain datasets, stIHC automatically recovered gene clusters corresponding to key anatomical regions, including hippocampus, hypothalamus, thalamus, and cortical subplate. In human lung cancer data (Figure 1), it successfully distinguished modules associated with normal lung function from those supporting tumor growth.
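For readers unfamiliar with the adjusted Rand index, the sketch below computes it from its standard pair-counting definition using only the Python standard library. The label vectors are made up for illustration and are not taken from the study; they show why ARI is sensitive to a rare cluster being absorbed into a large one.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index from the contingency table of two labelings."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the labelings trivially agree
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())
    expected = sum_rows * sum_cols / total_pairs   # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one cluster
    return (sum_cells - expected) / (max_index - expected)

# Hypothetical imbalanced truth: 95 genes in a common module, 5 in a rare one.
truth = [0] * 95 + [1] * 5
recovered = ["a"] * 95 + ["b"] * 5   # same partition, different label names
merged = [0] * 100                   # rare module absorbed into the large one

ari_recovered = adjusted_rand_index(truth, recovered)
ari_merged = adjusted_rand_index(truth, merged)
print(ari_recovered, ari_merged)
```

Here the correctly recovered partition scores 1, while the merged labeling scores 0 even though 95 of the 100 genes sit in the "right" cluster, because ARI corrects for the agreement expected by chance.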
DOI:10.1002/qub2.70011