A Review of Joint Optimization Methods for Neural Network Compression

Authors

  • Haoyu Hu

DOI:

https://doi.org/10.62051/ijcsit.v8n3.11

Keywords:

Neural network, Compression, Pruning, Quantization, Knowledge distillation

Abstract

As AI models scale, their storage, compute, and energy demands block deployment on resource-constrained edge devices, and no single technique among pruning, quantization, and knowledge distillation resolves the accuracy-versus-hardware trade-off on its own. This paper systematically investigates joint model compression, specifically the integration of pruning, quantization, and knowledge distillation, to achieve efficient inference while preserving accuracy. It surveys the core compression techniques and analyzes three hybrid frameworks: pruning with distillation, quantization with distillation, and pruning with quantization. By comparing sequential and parallel optimization strategies, it shows that joint methods consistently outperform single-technique approaches, delivering higher compression ratios and better accuracy retention across benchmark models. The results further reveal that co-optimizing multiple compression dimensions enables more effective model downscaling, although challenges such as stage-wise optimization gaps and hardware-aware design remain. Beyond synthesizing recent advances, the study outlines practical pathways toward lightweight, hardware-friendly neural networks for edge AI deployment and charts directions for future work.
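To make the joint pipeline concrete, the sketch below chains the three techniques in sequence: magnitude pruning of a small student network, knowledge distillation from a frozen teacher during fine-tuning, and post-training dynamic quantization. It is a minimal illustration, not the method of any particular surveyed work; the layer sizes, the 50% sparsity, the temperature T = 4, and the loss weight alpha = 0.7 are all assumed for demonstration, and random tensors stand in for a real dataset.

```python
# Minimal sketch of joint compression in PyTorch:
# pruning -> distillation fine-tuning -> dynamic quantization.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune

torch.manual_seed(0)

# Hypothetical teacher (large) and student (small) MLPs on 32-dim inputs.
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
teacher.eval()  # frozen teacher only provides soft targets

# Step 1: prune 50% of the smallest-magnitude weights in each Linear layer.
for module in student.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.5)

# Step 2: distill -- train the pruned student to match softened teacher
# logits, mixed with the usual hard-label cross-entropy.
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
T, alpha = 4.0, 0.7  # temperature and KD weight (illustrative values)
for step in range(100):
    x = torch.randn(64, 32)              # stand-in input batch
    y = torch.randint(0, 10, (64,))      # stand-in labels
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                  F.softmax(t_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    ce = F.cross_entropy(s_logits, y)
    loss = alpha * kd + (1 - alpha) * ce
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Make the pruning permanent (fold masks into the weight tensors).
for module in student.modules():
    if isinstance(module, nn.Linear):
        prune.remove(module, "weight")

# Step 3: quantize -- post-training dynamic int8 quantization of the
# Linear layers of the pruned, distilled student.
quantized = torch.quantization.quantize_dynamic(
    student, {nn.Linear}, dtype=torch.qint8)
print(quantized)
```

This ordering is a sequential strategy; the parallel strategies compared in the paper would instead interleave pruning or quantization updates with the distillation loss rather than running them as separate stages.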

Published

20-03-2026

Issue

Vol. 8 No. 3 (2026)

Section

Articles

How to Cite

Hu, H. (2026). A Review of Joint Optimization Methods for Neural Network Compression. International Journal of Computer Science and Information Technology, 8(3), 118-124. https://doi.org/10.62051/ijcsit.v8n3.11