Comparative Analysis of Video Frame Interpolation from Optical Flow to Diffusion Models

Tianlang Yin

doi:10.62051/qxv9y811

Keywords:

Video Frame Interpolation; Optical Flow; Transformer; Diffusion Model.

Abstract

Video frame interpolation (VFI) synthesizes intermediate frames between given frames to increase the temporal resolution of a video. It underpins applications such as frame-rate up-sampling, slow-motion rendering, and video enhancement. This paper compares the strengths and limitations of several families of VFI methods in terms of architecture and interpolation performance: conventional optical-flow-based methods, kernel-based models, hybrid models that incorporate depth estimation, flow-agnostic convolutional models, Transformer models, and recent generative diffusion models. For each family, the paper compares structural design, motion-handling ability, and efficiency. The experimental evaluation shows that Transformer and diffusion models handle large and complex motions better than the other families, whereas models such as Flow-Agnostic Video Representations (FLAVR) balance efficiency and accuracy, which makes them well suited to real-time processing. The results also indicate that VFI methods are shifting toward data-driven architectures with global context, which capture complex motion more faithfully. These findings can inform future research and support real-time video applications.
