18726881. Machine Learning Models Featuring Resolution-Flexible Multi-Axis Attention Blocks (GOOGLE LLC)
Organization Name
GOOGLE LLC
Inventor(s)
Yinxiao Li of Sunnyvale CA (US)
Zhengzhong Tu of Austin TX (US)
Hossein Talebi of San Jose CA (US)
Han Zhang of Sunnyvale CA (US)
Feng Yang of Sunnyvale CA (US)
Peyman Milanfar of Menlo Park CA (US)
This abstract first appeared for US patent application 18726881, titled 'Machine Learning Models Featuring Resolution-Flexible Multi-Axis Attention Blocks'.
Original Abstract Submitted
Provided are machine learning systems and models featuring resolution-flexible multi-axis attention blocks. In particular, the present disclosure provides example multi-axis MLP-based architectures (example implementations of which can be generally referred to as MAXIM) that can serve as an efficient and flexible general-purpose vision backbone for image processing tasks. In some implementations, MAXIM can use a UNet-shaped hierarchical structure and support long-range interactions enabled by spatially-gated MLPs. Specifically, some example implementations of MAXIM can contain two MLP-based building blocks: a multi-axis gated MLP that allows for efficient and scalable spatial mixing of local and global visual cues, and a cross-gating block, an alternative to cross-attention, which accounts for cross-feature mutual conditioning.
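The abstract does not describe the internals of the multi-axis gated MLP, so the following is only a minimal NumPy sketch under stated assumptions. It assumes that half of the channels are mixed within non-overlapping b x b windows (local cues), that the other half are mixed across a g x g grid spanning the whole image (global cues), and that each branch uses a gMLP-style spatial gate. All function names, parameter names, and sizes are hypothetical and chosen for illustration. Normalization and nonlinearities are omitted for brevity.

```python
import numpy as np

def block_partition(x, b):
    """(H, W, C) -> (windows, b*b, C): non-overlapping b x b local windows."""
    H, W, C = x.shape
    x = x.reshape(H // b, b, W // b, b, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, b * b, C)

def block_unpartition(t, H, W, b):
    """Inverse of block_partition."""
    C = t.shape[-1]
    t = t.reshape(H // b, W // b, b, b, C).transpose(0, 2, 1, 3, 4)
    return t.reshape(H, W, C)

def grid_partition(x, g):
    """(H, W, C) -> (groups, g*g, C): each group is a g x g grid of pixels
    sampled with stride H//g, W//g, so it spans the whole image."""
    H, W, C = x.shape
    x = x.reshape(g, H // g, g, W // g, C).transpose(1, 3, 0, 2, 4)
    return x.reshape(-1, g * g, C)

def grid_unpartition(t, H, W, g):
    """Inverse of grid_partition."""
    C = t.shape[-1]
    t = t.reshape(H // g, W // g, g, g, C).transpose(2, 0, 3, 1, 4)
    return t.reshape(H, W, C)

def spatial_gate(tokens, W_s, b_s):
    """Assumed gMLP-style gate: split channels into (u, v), mix v along the
    fixed-length token axis, then gate u with the result."""
    u, v = np.split(tokens, 2, axis=-1)
    v = np.einsum("mn,gnc->gmc", W_s, v) + b_s[None, :, None]
    return u * (v + 1.0)

def init_params(C, b, g, seed=0):
    """Hypothetical parameter set. No shape depends on H or W."""
    rng = np.random.default_rng(seed)
    s = 0.02
    return {
        "W_in": rng.normal(0, s, (C, 2 * C)),
        "W_local": rng.normal(0, s, (b * b, b * b)),
        "b_local": np.zeros(b * b),
        "W_global": rng.normal(0, s, (g * g, g * g)),
        "b_global": np.zeros(g * g),
        "W_out": rng.normal(0, s, (C, C)),
    }

def multi_axis_gated_mlp(x, p, b=4, g=4):
    """x: (H, W, C) with C even and H, W divisible by both b and g."""
    H, W, _ = x.shape
    h = x @ p["W_in"]                          # (H, W, 2C)
    local, glob = np.split(h, 2, axis=-1)      # one half of the channels per branch
    local = spatial_gate(block_partition(local, b), p["W_local"], p["b_local"])
    glob = spatial_gate(grid_partition(glob, g), p["W_global"], p["b_global"])
    local = block_unpartition(local, H, W, b)  # (H, W, C/2)
    glob = grid_unpartition(glob, H, W, g)     # (H, W, C/2)
    out = np.concatenate([local, glob], axis=-1) @ p["W_out"]
    return x + out                             # residual connection
```

In this sketch the spatial mixing weights have shapes (b*b, b*b) and (g*g, g*g), so the same parameters can be applied to inputs of different resolution, which is one plausible reading of "resolution-flexible". The cost of each branch also grows linearly with the number of pixels, since only the number of windows or grid groups changes with image size.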