General Multi-modal Image Fusion Transformer Network
DOI:
https://doi.org/10.62051/ijcsit.v5n1.02

Keywords:
Image fusion, Transformer, End-to-end network

Abstract
In image fusion, images captured by multiple different sensors are combined into a single image that contains more complementary information and fewer redundant features. To handle diverse fusion tasks accurately and concisely, this paper proposes a General end-to-end Multi-modal image fusion Transformer Network (GMTN). A two-branch feature extraction module is designed that combines an improved convolutional neural network (CNN) with the self-attention mechanism of the Transformer, extracting the short-range features and long-range dependencies of the image respectively, so that the fused image accounts for both kinds of information more comprehensively. Extensive experiments show that the proposed method matches or surpasses existing image fusion methods on a variety of multi-modal medical image datasets, and also achieves good results on infrared-visible, multi-focus, and multi-exposure image fusion tasks.
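The abstract does not include code, but the two-branch idea it describes can be illustrated with a minimal NumPy sketch. All names, dimensions, and weights below are hypothetical, not from the paper: a 3x3 convolution stands in for the CNN branch (each output value depends only on a local neighbourhood), while scaled dot-product self-attention stands in for the Transformer branch (every pixel token attends to every other).

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def local_branch(x, kernel):
    """CNN-style branch: each output pixel sees only a 3x3 neighbourhood."""
    H, W = x.shape
    xp = np.pad(x, 1)                               # "same" zero padding
    out = np.empty_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + 3, j:j + 3] * kernel)
    return out

def global_branch(x, embed, Wq, Wk, Wv):
    """Transformer-style branch: every pixel token attends to all others."""
    tokens = x.reshape(-1, 1) @ embed               # (H*W, d) pixel embeddings
    Q, K, V = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))  # (H*W, H*W) attention map
    return attn @ V                                 # (H*W, d)

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 8))                   # one modality's feature map
d = 4                                               # toy embedding width
embed = rng.standard_normal((1, d))
Wq, Wk, Wv = [rng.standard_normal((d, d)) for _ in range(3)]

short_range = local_branch(img, rng.standard_normal((3, 3)))  # (8, 8)
long_range = global_branch(img, embed, Wq, Wk, Wv)            # (64, 4)

# Concatenate both feature sets per pixel, as a two-branch extractor would,
# before a fusion/reconstruction head maps them back to an image.
features = np.concatenate([short_range.reshape(-1, 1), long_range], axis=1)
print(features.shape)  # (64, 5)
```

In the actual network the two branches would of course be learned, multi-channel, and followed by a reconstruction decoder; this sketch only shows why the convolutional path captures short-range structure while the attention path captures image-wide dependencies.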
License
Copyright (c) 2025 International Journal of Computer Science and Information Technology

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
