Self-Evolving Diagnostic Framework based Gated Residual Adapters and OpenClaw-Based Medical Agents

Shuai Feng; Pan Su

doi:10.62051/ijcsit.v8n4.01

Keywords:

Fundus Image Analysis, Gated Residual Adapter, Self-Evolving Framework, Medical Agent

Abstract

Accurate and interpretable report generation from fundus images remains a critical yet challenging task in medical artificial intelligence, largely because model adaptation is static and existing agent-based frameworks have limited capacity to evolve. Although ophthalmic foundation models have significantly improved visual representation learning through large-scale self-supervised pretraining, they lack mechanisms for continual adaptation during inference. Meanwhile, current agent-based approaches enhance reasoning but remain constrained by fixed cognitive structures. In this work, we propose a self-evolving diagnostic framework that unifies parametric adaptation and cognitive evolution for fundus report generation. Specifically, we introduce a gated residual adapter that enables dynamic, inference-time knowledge integration while preserving prior knowledge. Furthermore, we develop a medical agent architecture based on the OpenClaw paradigm, which continuously refines its core reasoning strategy through physician feedback and consistency-driven constraints. By coupling model-level adaptability with agent-level reasoning evolution, the proposed framework enables sustained performance improvement in complex clinical scenarios. This work provides a new perspective on building continuously evolving intelligent diagnostic systems for real-world healthcare applications.
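
The abstract does not give the adapter's equations, so the following is only an illustrative sketch of the general gated residual adapter idea, not the authors' implementation. It assumes a bottleneck adapter (hypothetical weights `W_down`, `W_up`) whose output is added to the frozen feature `h` through a scalar gate `tanh(alpha)`; the names, the gate form, and the toy dimensions are all assumptions made for illustration.

```python
import math

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def gated_residual_adapter(h, W_down, W_up, alpha):
    """Return h + tanh(alpha) * W_up @ relu(W_down @ h).

    h      : frozen backbone feature (assumed to stay unchanged)
    W_down : down-projection to a small bottleneck (hypothetical)
    W_up   : up-projection back to the feature size (hypothetical)
    alpha  : gate parameter; alpha = 0 closes the gate exactly
    """
    z = [max(0.0, v) for v in matvec(W_down, h)]   # bottleneck + ReLU
    delta = matvec(W_up, z)                        # adapter update
    g = math.tanh(alpha)                           # scalar gate in (-1, 1)
    return [hi + g * di for hi, di in zip(h, delta)]

# Toy example: 4-dimensional feature, 2-dimensional bottleneck.
h = [0.5, -1.0, 2.0, 0.0]
W_down = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0]]
W_up = [[0.1, 0.0],
        [0.0, 0.1],
        [0.2, 0.0],
        [0.0, 0.2]]

closed = gated_residual_adapter(h, W_down, W_up, alpha=0.0)  # gate closed
opened = gated_residual_adapter(h, W_down, W_up, alpha=1.0)  # gate open
```

With the gate closed, the output equals the original feature, which is one simple way to read "preserving prior knowledge"; opening the gate blends in the adapter's update. How the gate is actually parameterized and updated at inference time in the proposed framework is not stated in the abstract.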
