Towards Efficient LLMs: Analyzing Computational Bottlenecks and Optimization Strategies

Taowen Qian

doi:10.62051/m633as12

Keywords:

Large Language Models (LLMs); efficiency; USER-LLM; OPTIMA; Infinite-LLM.

Abstract

This study focuses on the efficiency of current Large Language Models (LLMs). A review of several papers on the topic shows that the efficiency limitations of current LLMs are significant problems that deserve attention from academia. The study then surveys the progress made in addressing these limitations and explains each solution. Finally, it discusses what further development each solution needs. The study examines three systems that address efficiency problems in LLMs (USER-LLM, OPTIMA, and Infinite-LLM) and identifies the benefits each offers in easing those limitations. Experimental results show that each system still has issues that need to be solved in further research. The study explains the main efficiency problems in current LLMs and provides direction for further research. With more research on efficiency, computational costs and response times can be expected to decrease, which would improve real-time decision-making.
