1. Background
Medical image segmentation is a fundamental component of many clinical applications, such as computer-aided diagnosis, radiotherapy planning, and preoperative planning. Its accuracy and stability directly affect subsequent quantitative analysis and clinical decision-making. In recent years, U-shaped encoder–decoder architectures, represented by U-Net, have become the mainstream solution, and segmentation performance has improved steadily through the design of various customized modules. However, the pursuit of higher accuracy has kept increasing the number of model parameters, which hinders deployment in real clinical environments. At the same time, a conventional decoder that restores spatial resolution step by step may introduce redundant information and accumulate noise. How to reduce model size while maintaining or even improving accuracy, and how to establish a unified framework for both 2D and 3D tasks, remain key bottlenecks for large-scale clinical and industrial applications of medical image segmentation.
2.Research Progress
Based on a systematic analysis of feature utilization in U-shaped architectures, Prof. Zhao Qi’s team at Liaoning University of Science and Technology and Prof. Shuai Jianwei’s team at the Wenzhou Institute of the University of Chinese Academy of Sciences have developed an E-shaped medical image segmentation framework that differs from conventional symmetric decoders. This framework preserves a multi-layer encoder to extract multi-scale features, but no longer relies on a step-by-step upsampling procedure in the decoder. Instead, features from different encoder stages are directly fused across layers, and a specially designed refinement module is then used to produce the final segmentation, thereby realizing a “lightweight decoding” reconstruction paradigm at the architectural level.
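The data flow described above can be illustrated with a toy sketch. It is not the authors' implementation: the pooling "encoder", the nearest-neighbor resizing, the weighted-sum "refinement" step, and all function names and weights below are assumptions chosen only to show how features from several encoder stages can be brought to the output resolution in a single jump and fused, instead of being upsampled stage by stage.

```python
"""Toy sketch of direct cross-layer fusion (illustrative assumptions only)."""


def avg_pool2x(img):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    h, w = len(img), len(img[0])
    return [[(img[r][c] + img[r][c + 1] + img[r + 1][c] + img[r + 1][c + 1]) / 4.0
             for c in range(0, w, 2)]
            for r in range(0, h, 2)]


def upsample_nearest(feat, factor):
    """Enlarge a map by an integer factor in one step (nearest neighbor)."""
    out = []
    for row in feat:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def encode(img, num_stages):
    """Stand-in encoder: stage k has 1/2**k of the input resolution."""
    feats, cur = [], img
    for _ in range(num_stages):
        cur = avg_pool2x(cur)
        feats.append(cur)
    return feats


def fuse_direct(feats, out_size):
    """Resize every stage straight to out_size and stack them as channels."""
    resized = [upsample_nearest(f, out_size // len(f)) for f in feats]
    return [[[m[r][c] for m in resized] for c in range(out_size)]
            for r in range(out_size)]


def refine(fused, weights, threshold=0.5):
    """Hypothetical refinement head: weighted channel sum, then threshold."""
    return [[1 if sum(w * v for w, v in zip(weights, px)) > threshold else 0
             for px in row]
            for row in fused]


# An 8x8 square image with a bright 4x4 block in the top-left corner.
image = [[1.0 if r < 4 and c < 4 else 0.0 for c in range(8)] for r in range(8)]
feats = encode(image, num_stages=3)            # resolutions 4x4, 2x2, 1x1
fused = fuse_direct(feats, out_size=8)         # 8x8 pixels, 3 channels each
mask = refine(fused, weights=[0.5, 0.3, 0.2])  # binary 8x8 mask
```

In this sketch no intermediate decoder stage exists: each encoder output is resized once and the only learned component after fusion would be the refinement head, which is where the reduction in decoder size would come from in a real model. A practical implementation would use learned convolutions and a deep-learning framework rather than fixed pooling and hand-set weights.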
Building on this framework, the research team has constructed both 2D and 3D model variants (Figure 1) and conducted systematic evaluations on eight multi-modal datasets, covering several representative scenarios including abdominal multi-organ segmentation, cardiac structure segmentation, polyp segmentation, and lesion segmentation in ultrasound images. The results show that the proposed method achieves segmentation accuracy comparable to or better than existing representative approaches (Figures 2 and 3), while simultaneously reducing model size and significantly improving inference speed. In addition, when existing U-shaped networks are converted to the proposed E-shaped structure, most models obtain performance gains despite having fewer parameters, demonstrating the good generalizability and practical value of the E-shaped framework.
3. Future Prospects
Thanks to its compact architecture and efficient inference, E-SegNet holds promise for applications such as automatic multi-organ delineation, tumor and organ-at-risk segmentation, and real-time lesion highlighting in endoscopic and ultrasound scenarios. It is expected to help reduce the annotation workload of clinicians and improve the consistency and objectivity of segmentation. The unified 2D/3D design and the validation on CT, MRI, ultrasound, and endoscopic data provide a solid foundation for building robust segmentation systems across multiple centers and devices. In the future, the framework can be extended to other imaging modalities such as PET and X-ray, and validated on larger-scale real-world datasets. Because the E-shaped structure is decoupled from any specific encoder, it can be combined with more lightweight backbones, model compression, and knowledge distillation techniques to better meet the resource constraints of edge devices. It can also be integrated with new long-range dependency modeling approaches and graph-based modeling methods to enhance its representation capability for complex anatomical structures and high-noise scenarios.
The complete study is accessible via DOI:10.34133/research.0869