Multi-resolution Transformer for Video Quality Assessment

Abstract: A no-reference video quality assessment framework applies a multi-resolution input representation and a patch sampling mechanism to a multi-frame video, aggregating information across different granularities in the spatial and temporal dimensions. The framework models the complex space-time distortions that occur in user-generated-content (UGC) videos. According to one aspect, the framework embeds video clips as multi-resolution patch tokens using two complementary modules: a multi-resolution video embedding module and a space-time factorized transformer encoding module. The multi-resolution video embedding module is configured to encode multi-scale quality information in the video, capturing both global video composition from lower-resolution frames and local details from higher-resolution frames. The space-time factorized transformer encoding module aggregates spatial and temporal quality from the multi-scale embedding input and is configured to output a quality score for the input video.
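The two-module pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function names (`resize_nearest`, `patchify`, `multires_tokens`, `factorized_score`), the grid-based patch extraction, the nearest-neighbor resizing, the random projection, and the parameter-free attention are all assumptions chosen only to show the data flow from frames to multi-resolution tokens to a single score.

```python
import numpy as np

def resize_nearest(frames, short_side):
    """Resize (T, H, W, C) frames so the shorter side equals short_side,
    preserving aspect ratio (nearest-neighbor; an illustrative choice)."""
    _, h, w, _ = frames.shape
    scale = short_side / min(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    return frames[:, rows][:, :, cols]

def patchify(frames, patch):
    """Split (T, H, W, C) into non-overlapping patches -> (T, N, patch*patch*C).
    A plain grid stands in for the patch sampling mechanism (assumption)."""
    t, h, w, c = frames.shape
    gh, gw = h // patch, w // patch
    x = frames[:, :gh * patch, :gw * patch]
    x = x.reshape(t, gh, patch, gw, patch, c).transpose(0, 1, 3, 2, 4, 5)
    return x.reshape(t, gh * gw, patch * patch * c)

def multires_tokens(frames, short_sides=(64, 128), patch=16, d_model=32, seed=0):
    """Embed patches from several resolutions into one token sequence per frame."""
    rng = np.random.default_rng(seed)
    # Random linear projection as a placeholder for a learned embedding.
    proj = rng.normal(0.0, 0.02, size=(patch * patch * frames.shape[-1], d_model))
    per_scale = [patchify(resize_nearest(frames, s), patch) @ proj
                 for s in short_sides]
    return np.concatenate(per_scale, axis=1)  # (T, N_total, d_model)

def self_attention(x):
    """Single-head self-attention without learned weights (stand-in only)."""
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(x.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x

def factorized_score(tokens):
    """Space-time factorization: attend within each frame, pool, then attend
    across frames and pool to a scalar. The output is untrained and has no
    meaning as a quality value; it only demonstrates the shapes involved."""
    spatial = self_attention(tokens).mean(axis=1)             # (T, d_model)
    temporal = self_attention(spatial[None])[0].mean(axis=0)  # (d_model,)
    return float(temporal.mean())

if __name__ == "__main__":
    video = np.random.default_rng(1).random((4, 96, 128, 3))  # 4 synthetic frames
    tokens = multires_tokens(video)
    print(tokens.shape, factorized_score(tokens))
```

In this sketch the lower-resolution pass contributes a few tokens that each cover a large part of the frame, while the higher-resolution pass contributes many tokens that each cover a small region, which mirrors the global-composition and local-detail roles named in the abstract. Factorizing attention into a spatial step followed by a temporal step keeps each attention matrix small compared with attending over all space-time tokens jointly.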

Inventor(s): Junjie Ke, Tianhao Zhang, Yilin Wang, Peyman Milanfar, Feng Yang

CPC Classification: G06T3/4046 (using neural networks)


20250173821. Multi-resolution Transformer Vi (Google LLC)

