A4 Refereed article in a conference publication
ARVo: Learning All-Range Volumetric Correspondence for Video Deblurring
Authors: Li Dongxu, Xu Chenchen, Zhang Kaihao, Yu Xin, Zhong Yiran, Ren Wenqi, Suominen Hanna, Li Hongdong
Conference name: IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Publication year: 2021
Journal: IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Book title: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Journal name in source: 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021
Journal acronym: PROC CVPR IEEE
First page: 7717
Last page: 7727
Number of pages: 11
ISSN: 1063-6919
DOI: https://doi.org/10.1109/CVPR46437.2021.00763
Abstract: Video deblurring models exploit consecutive frames to remove blurs from camera shakes and object motions. In order to utilize neighboring sharp patches, typical methods rely mainly on homography or optical flows to spatially align neighboring blurry frames. However, such explicit approaches are less effective in the presence of fast motions with large pixel displacements. In this work, we propose a novel implicit method to learn spatial correspondence among blurry frames in the feature space. To construct distant pixel correspondences, our model builds a correlation volume pyramid among all the pixel-pairs between neighboring frames. To enhance the features of the reference frame, we design a correlative aggregation module that maximizes the pixel-pair correlations with its neighbors based on the volume pyramid. Finally, we feed the aggregated features into a reconstruction module to obtain the restored frame. We design a generative adversarial paradigm to optimize the model progressively. Our proposed method is evaluated on the widely-adopted DVD dataset, along with a newly collected High-Frame-Rate (1000 fps) Dataset for Video Deblurring (HFR-DVD). Quantitative and qualitative experiments show that our model performs favorably on both datasets against previous state-of-the-art methods, confirming the benefit of modeling all-range spatial correspondence for video deblurring.
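For a concrete picture of the all-range correspondence idea described in the abstract, the following Python/PyTorch sketch shows how an all-pairs correlation volume, a pooled volume pyramid, and a correlation-weighted aggregation of neighbor features could be computed. This is an illustration based only on the abstract, not the authors' implementation: the function names, tensor shapes, pyramid depth, and the softmax aggregation rule are all assumptions.

    import torch
    import torch.nn.functional as F

    def all_pairs_correlation(feat_ref, feat_nbr):
        # feat_ref, feat_nbr: (B, C, H, W) feature maps of the reference and a
        # neighboring frame (shapes are illustrative; the paper's feature
        # extractor and channel count are not specified in the abstract).
        b, c, h, w = feat_ref.shape
        ref = feat_ref.view(b, c, h * w)
        nbr = feat_nbr.view(b, c, h * w)
        # Dot-product correlation between every reference/neighbor pixel pair.
        corr = torch.einsum('bci,bcj->bij', ref, nbr) / (c ** 0.5)  # (B, HW, HW)
        return corr.view(b, h, w, h, w)

    def correlation_pyramid(corr, num_levels=3):
        # Average-pool over the neighbor-frame dimensions to obtain coarser
        # volumes, covering correspondences at several spatial ranges.
        b, h, w, h2, w2 = corr.shape
        pyramid = [corr]
        vol = corr.view(b * h * w, 1, h2, w2)
        for _ in range(num_levels - 1):
            vol = F.avg_pool2d(vol, kernel_size=2, stride=2)
            pyramid.append(vol.view(b, h, w, vol.shape[-2], vol.shape[-1]))
        return pyramid

    def correlative_aggregation(corr, feat_nbr):
        # Softmax-weighted aggregation of neighbor features for each reference
        # pixel; a simplified stand-in for the paper's correlative aggregation
        # module, whose exact rule is not given in the abstract.
        b, h, w, h2, w2 = corr.shape
        c = feat_nbr.shape[1]
        weights = F.softmax(corr.view(b, h * w, h2 * w2), dim=-1)
        nbr = feat_nbr.view(b, c, h2 * w2)
        agg = torch.einsum('bij,bcj->bci', weights, nbr)
        return agg.view(b, c, h, w)

In this sketch, the aggregated features returned by correlative_aggregation would then be passed to a reconstruction network to produce the restored frame, mirroring the pipeline the abstract outlines.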