A new method for automated concrete bridge damage detection using an efficient Vision Transformer-enhanced anchor-free YOLO (You Only Look Once) has been proposed by researchers from the University of Auckland, New Zealand, and Chongqing Jiaotong University, China. The study, published in Engineering, aims to address the challenges faced by existing deep learning techniques in detecting bridge damage captured by unmanned aerial vehicles (UAVs).
Concrete bridges are susceptible to deterioration due to environmental conditions, natural hazards, increasing traffic, and aging. Periodic inspections are crucial for assessing bridge conditions and providing early warnings of defects that may impact safety. However, traditional visual inspections are time-consuming, subjective, and error-prone. UAVs equipped with high-definition cameras have been increasingly used for bridge inspections, offering significant cost savings compared to manual inspections. Integrating UAVs with computer vision-based damage detection algorithms can further improve inspection efficiency.
Existing deep learning-based damage detection methods face several challenges. Defect scale variance, motion blur, and strong illumination significantly affect the accuracy and reliability of damage detectors. Anchor-based damage detectors struggle to generalize to real-world scenarios, and convolutional neural networks (CNNs) lack the capability to model long-range dependencies across the entire image. To address these issues, the researchers developed an efficient Vision Transformer-enhanced anchor-free YOLO method.
The researchers established a concrete bridge damage dataset, augmented with motion blur and varying brightness to better adapt to real-world conditions. They applied four key enhancements to the YOLOv5l algorithm: four detection heads to alleviate multi-scale damage detection issues, decoupled heads to address the conflict between classification and bounding box regression tasks, an anchor-free mechanism to reduce computational complexity and improve generalization, and a novel Vision Transformer (ViT) block, C3MaxViT, to enable CNNs to model long-range dependencies.
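As a rough illustration of the photometric augmentation described above, the sketch below applies a motion-blur kernel and a brightness scaling to an image using OpenCV and NumPy. The kernel sizes, brightness factors, and file name are illustrative assumptions, not the authors' published settings:

```python
import cv2
import numpy as np

def motion_blur(image: np.ndarray, kernel_size: int = 9) -> np.ndarray:
    """Simulate UAV camera shake with a horizontal motion-blur kernel."""
    kernel = np.zeros((kernel_size, kernel_size), dtype=np.float32)
    kernel[kernel_size // 2, :] = 1.0 / kernel_size  # average along one row
    return cv2.filter2D(image, -1, kernel)

def vary_brightness(image: np.ndarray, factor: float = 1.3) -> np.ndarray:
    """Simulate strong or weak illumination by scaling pixel intensities."""
    scaled = image.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)

# Hypothetical usage: generate blurred and darkened/brightened variants
# of one inspection image ("bridge_crack.jpg" is a placeholder name).
original = cv2.imread("bridge_crack.jpg")
augmented = [motion_blur(original, k) for k in (5, 9, 15)]
augmented += [vary_brightness(original, f) for f in (0.6, 1.4)]
```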
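The paper's exact head design is not reproduced here, but a decoupled, anchor-free detection head in the style popularized by YOLOX conveys the idea behind two of the enhancements: classification and box regression run through separate branches, and each feature-map location predicts a box directly rather than refining preset anchors. The PyTorch module below is a hypothetical sketch; the class name, channel widths, and class count are assumptions:

```python
import torch
import torch.nn as nn

class DecoupledAnchorFreeHead(nn.Module):
    """One detection head: separate classification and regression branches."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.stem = nn.Conv2d(in_channels, in_channels, kernel_size=1)
        # Classification branch: per-location damage class scores.
        self.cls_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.SiLU(),
            nn.Conv2d(in_channels, num_classes, kernel_size=1),
        )
        # Regression branch: anchor-free box geometry plus objectness.
        self.reg_branch = nn.Sequential(
            nn.Conv2d(in_channels, in_channels, 3, padding=1), nn.SiLU(),
        )
        self.box_pred = nn.Conv2d(in_channels, 4, kernel_size=1)  # l, t, r, b
        self.obj_pred = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor):
        x = self.stem(x)
        reg = self.reg_branch(x)
        return self.cls_branch(x), self.box_pred(reg), self.obj_pred(reg)

# In a four-head detector, one such head would attach to each of the four
# feature-map scales from the backbone and neck (channel widths illustrative).
heads = nn.ModuleList(DecoupledAnchorFreeHead(c, num_classes=4)
                      for c in (128, 256, 512, 1024))
```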
The proposed method was compared against state-of-the-art damage detection methods. Experimental results demonstrated an increase of 8.1% in mean average precision at an intersection-over-union threshold of 0.5 (mAP50) and an improvement of 8.4% in mAP@[0.5:0.05:0.95]. Ablation studies revealed that the four detection heads, the decoupled head design, the anchor-free mechanism, and C3MaxViT contributed improvements of 2.4%, 1.2%, 2.6%, and 1.9% in mAP50, respectively.
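For readers unfamiliar with these metrics: a predicted box counts as correct under mAP50 when its intersection over union (IoU) with a ground-truth box is at least 0.5, while mAP@[0.5:0.05:0.95] averages precision over ten IoU thresholds from 0.5 to 0.95. A minimal IoU computation, for illustration only:

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Example: a prediction half-overlapping a ground-truth box.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 50 / (100 + 100 - 50) ≈ 0.333
```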
The study’s main contributions include the creation of a multi-scale concrete bridge damage dataset augmented with motion blur and varying brightness levels, the development of a Vision Transformer-enhanced anchor-free YOLO method based on the YOLOv5l algorithm, and the proposal of a novel C3MaxViT block to model long-range dependencies. The proposed method shows promise in improving the accuracy and efficiency of automated concrete bridge damage detection, particularly in challenging real-world conditions. Future research could focus on establishing larger datasets, exploring dataset augmentation with synthetic defects, and developing models capable of handling low-light and extreme exposure conditions.
The paper “Automated Concrete Bridge Damage Detection Using an Efficient Vision Transformer-Enhanced Anchor-Free YOLO” is authored by Xiaofei Yang, Enrique del Rey Castillo, Yang Zou, Liam Wotherspoon, Jianxi Yang, and Hao Li. The full text of the open-access paper is available at https://doi.org/10.1016/j.eng.2025.02.018. For more information about Engineering, visit https://www.sciencedirect.com/journal/engineering.