General Multi-modal Image Fusion Transformer Network

Authors

  • Ao Dong
  • Zhi Wang

DOI:

https://doi.org/10.62051/ijcsit.v5n1.02

Keywords:

Image fusion, Transformer, End-to-end network

Abstract

In image fusion, images acquired by multiple different sensors are combined into a single image that carries more complementary information and fewer redundant features, yielding one image with enhanced information. To handle diverse fusion tasks both accurately and concisely, this paper proposes a General end-to-end Multi-modal image fusion Transformer Network (GMTN). A two-branch feature extraction module is designed that combines an improved convolutional neural network (CNN) with the self-attention mechanism of the Transformer, extracting the short-range features and long-range dependencies of the image respectively, so that the fused image accounts for diverse information more comprehensively. Extensive experimental results show that the proposed method matches or surpasses existing image fusion methods on a variety of multi-modal medical image datasets, and also achieves good results on infrared-visible, multi-focus, and multi-exposure image fusion tasks.
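The two-branch idea in the abstract can be illustrated with a minimal sketch: one branch captures short-range (local) structure with a convolution-style filter, the other captures long-range dependencies with self-attention over image patches, and the per-modality features are then fused. This is not the authors' GMTN implementation; the patch size, identity attention projections, averaging weights, and element-wise max fusion rule are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def conv_branch(img, k=3):
    """Short-range features: a simple k x k local mean filter (stand-in for a CNN)."""
    H, W = img.shape
    pad = k // 2
    p = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    for i in range(H):
        for j in range(W):
            out[i, j] = p[i:i + k, j:j + k].mean()
    return out

def attention_branch(img, patch=4):
    """Long-range dependencies: single-head self-attention over flattened patches."""
    H, W = img.shape
    # split into non-overlapping patch tokens of shape (patch*patch,)
    tokens = (img.reshape(H // patch, patch, W // patch, patch)
                 .transpose(0, 2, 1, 3)
                 .reshape(-1, patch * patch))
    d = tokens.shape[1]
    # identity Q/K/V projections keep the sketch dependency-free
    attn = softmax(tokens @ tokens.T / np.sqrt(d))
    out = attn @ tokens
    return (out.reshape(H // patch, W // patch, patch, patch)
               .transpose(0, 2, 1, 3)
               .reshape(H, W))

def fuse(img_a, img_b):
    """Two-branch extraction per modality, then element-wise max fusion (assumed rule)."""
    feats = [0.5 * conv_branch(x) + 0.5 * attention_branch(x)
             for x in (img_a, img_b)]
    return np.maximum(feats[0], feats[1])
```

The element-wise max is one common fusion rule; learned fusion layers, as in an end-to-end network, would replace it in practice.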


References

[1] Zhang, Hao, et al. "Image fusion meets deep learning: A survey and perspective." Information Fusion 76 (2021): 323-336.

[2] Ardeshir, Goshtasby A., and S. Nikolov. "Image fusion: Advances in the state of the art." Information Fusion 8.2 (2007): 114-118.

[3] Li, Hui, Xianbiao Qi, and Wuyuan Xie. "Fast infrared and visible image fusion with structural decomposition." Knowledge-Based Systems 204 (2020): 106182.

[4] Zhang, Hao, et al. "Rethinking the image fusion: A fast unified image fusion network based on proportional maintenance of gradient and intensity." Proceedings of the AAAI conference on artificial intelligence. Vol. 34. No. 07. 2020.

[5] Wu, Minghui, et al. "Infrared and visible image fusion via joint convolutional sparse representation." JOSA A 37.7 (2020): 1105-1115.

[6] Tang, Wei, et al. "A phase congruency‐based green fluorescent protein and phase contrast image fusion method in nonsubsampled shearlet transform domain." Microscopy research and technique 83.10 (2020): 1225-1234.

[7] Ho, Jonathan, et al. "Axial attention in multidimensional transformers." arXiv preprint arXiv:1912.12180 (2019).

[8] Harvard Medical School website. http://www.med.harvard.edu/AANLIB/home.html

[9] Lin, Tsung-Yi, et al. "Microsoft COCO: Common objects in context." Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V. Springer International Publishing, 2014.

[10] Tang, Linfeng, et al. "DIVFusion: Darkness-free infrared and visible image fusion." Information Fusion 91 (2023): 477-493.

[11] Li, Hui, et al. "LRRNet: A novel representation learning guided fusion network for infrared and visible images." IEEE Transactions on Pattern Analysis and Machine Intelligence 45.9 (2023): 11040-11052.

[12] Li, X., et al. "Laplacian Redecomposition for Multimodal Medical Image Fusion." IEEE Transactions on Instrumentation and Measurement 69.9 (2020): 6880-6890. doi: 10.1109/TIM.2020.2975405.

[13] Ma, Jiayi, et al. "SwinFusion: Cross-domain long-range learning for general image fusion via swin transformer." IEEE/CAA Journal of Automatica Sinica 9.7 (2022): 1200-1217.

[14] Cheng, Chunyang, Tianyang Xu, and Xiao-Jun Wu. "MUFusion: A general unsupervised image fusion network based on memory unit." Information Fusion 92 (2023): 80-92.

[15] Xu, Han, et al. "U2Fusion: A unified unsupervised image fusion network." IEEE Transactions on Pattern Analysis and Machine Intelligence 44.1 (2022): 502-518.

[16] Zhao, Zixiang, et al. "CDDFuse: Correlation-driven dual-branch feature decomposition for multi-modality image fusion." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023.


Published

23-01-2025

Issue

Section

Articles

How to Cite

Dong, A., & Wang, Z. (2025). General Multi-modal Image Fusion Transformer Network. International Journal of Computer Science and Information Technology, 5(1), 16-21. https://doi.org/10.62051/ijcsit.v5n1.02