Integrating Attention Modules with YOLOv8 for Enhanced Crack Detection and Segmentation

Samuel Owoeye; Folasade Durodola; Sikirulahi Abdulkareem; Olugbenga Omotainse

doi:10.24017/science.2026.1.9

Authors

Samuel Owoeye, Department of Mechatronics Engineering, Federal University of Agriculture, Abeokuta, Nigeria. ORCID: https://orcid.org/0000-0003-4497-2111
Folasade Durodola, Department of Mechatronics Engineering, Federal University of Agriculture, Abeokuta, Nigeria. ORCID: https://orcid.org/0000-0002-0593-6153
Sikirulahi Abdulkareem, Department of Mechatronics Engineering, Federal University of Agriculture, Abeokuta, Nigeria. ORCID: https://orcid.org/0009-0008-0177-9905
Olugbenga Omotainse, Department of Agriculture and Bioresources Engineering, Federal University of Agriculture, Abeokuta, Nigeria. ORCID: https://orcid.org/0000-0001-8398-4997

Keywords:

Attention mechanisms, Deep learning, Building inspection, Shuffle Attention, YOLOv8

Abstract

Early crack identification is crucial in structural building maintenance because cracks are a primary indicator of building deterioration. Manual inspection is slow, expensive, and prone to human error. Although You Only Look Once version 8 (YOLOv8) has emerged as a powerful framework for automated crack detection, it struggles to accurately detect small, irregularly shaped, and partially obscured cracks because of feature loss in deeper network layers and insufficient pixel-level precision. This study addresses these limitations by integrating five attention mechanisms into the YOLOv8 architecture: Convolutional Block Attention Module (CBAM), Efficient Channel Attention (ECA), Selective Kernel Attention (SKA), Shuffle Attention (SA), and Global Attention Mechanism (GAM). The attention modules were placed at critical positions within the backbone and neck to enhance feature representation without compromising computational efficiency. Using a dataset of 13,169 building crack images with 19,386 annotations, each attention-enhanced variant was trained and evaluated against the baseline YOLOv8 model. The results show consistent improvements across all attention mechanisms. CBAM achieved the highest segmentation accuracy, with a mask mean Average Precision at an IoU threshold of 0.5 (mAP@0.5) of 0.820 (a 0.4% improvement), while ECA provided the most parameter-efficient enhancement, improving box precision by 3.5% with only 41 additional parameters. SKA achieved the highest recall, 0.724 (a 1.0% improvement), which is valuable for comprehensive building crack detection. All variants remained feasible for practical deployment, supporting real-time building inspection applications. The findings confirm that integrating attention mechanisms is an effective way to enhance YOLOv8 for building crack detection. They provide empirical evidence for selecting an attention module according to specific deployment constraints and support the development of more reliable automated building inspection systems.
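To illustrate why ECA can be so parameter-efficient, the sketch below computes the adaptive kernel size proposed in the original ECA-Net formulation, where each module adds a single bias-free 1D convolution across channels, so its weight count equals its kernel size. This is an assumption-laden illustration only: the abstract does not say whether the study used this adaptive rule or a fixed kernel, the defaults `gamma=2` and `b=1` come from the ECA-Net paper rather than from this work, and the channel widths are hypothetical. It does not attempt to reproduce the reported figure of 41 parameters.

```python
import math


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Adaptive 1D kernel size for a given channel count.

    Computes (log2(C) + b) / gamma, truncates to an integer, and bumps
    even results up to the next odd number so the kernel has a centre tap.
    """
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 == 1 else t + 1


def eca_extra_params(channel_widths) -> int:
    """Total added weights when one ECA module follows each listed feature map.

    Assumes one bias-free 1D convolution per module, so each module
    contributes exactly `kernel_size` learnable weights.
    """
    return sum(eca_kernel_size(c) for c in channel_widths)


# Hypothetical insertion points: these widths are illustrative,
# not the placements or channel counts used in the study.
widths = [64, 128, 256]
for c in widths:
    print(f"channels={c:4d}  kernel_size={eca_kernel_size(c)}")
print("total extra parameters:", eca_extra_params(widths))
```

Under these assumptions the added weight count grows only with the logarithm of the channel width, which is consistent with the abstract's description of ECA as the most parameter-efficient of the five modules.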
