A Review of Applications of Speech Synthesis Technology

Wenhao Xiong

doi:10.62051/f4gcpp59

Keywords:

Speech synthesis technology; Application; Prospect.

Abstract

Visual technologies and their applications are currently far more developed than speech technologies, yet speech and vision are equally important and appealing, so their potential should be comparable. To advance speech technology and demonstrate its application potential, this paper reviews the current status of speech synthesis technology in four application areas: spoken language education, digital music, virtual characters, and language protection and dissemination. It then identifies the technology's potential in each area: supporting the different stages of spoken language learning, generating music in a manner similar to current image generation, creating virtual characters that move people in ways other than singing, and enabling new direct and indirect ways of voice protection and dissemination. Finally, it discusses balance in the development of speech synthesis technology. By analyzing these potential applications, the paper both demonstrates the development potential of speech and shows how subsequent research can draw innovative inspiration from application areas.
