OBJECT DETECTION BASED ON MOTION-GUIDED TOKENS

Organization Name

TOYOTA JIDOSHA KABUSHIKI KAISHA

Inventor(s)

Zhipeng Bao of Pittsburgh PA (US)

Pavel Tokmakov of Santa Monica CA (US)

Yuxiong Wang of Champaign IL (US)

Adrien David Gaidon of San Jose CA (US)

Martial Hebert of Pittsburgh PA (US)

OBJECT DETECTION BASED ON MOTION-GUIDED TOKENS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18394746, titled 'OBJECT DETECTION BASED ON MOTION-GUIDED TOKENS'.

Simplified Explanation

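The patent application describes a method for learning a representation of a sequence of frames. An encoder network turns the frames into a set of feature maps, a motion-guided slot learning mechanism extracts mid-level features from those maps, vector quantization converts the features into discrete tokens, and a decoder network reconstructs the frames from the tokens. The encoder and decoder are trained by optimizing a combination of a reconstruction loss and a motion loss.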

Key Features and Innovation
Encoding of frames using an encoder network
Extraction of mid-level features using a motion-guided slot learning mechanism
Quantization of mid-level features through vector quantization
Decoding of tokens using a decoder network
Optimization of reconstruction loss and motion loss to train the networks
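
The quantization step listed above can be illustrated with a minimal sketch of generic vector quantization, in which each feature vector is replaced by the index of its nearest codebook entry. The application's abstract does not describe the codebook, the distance measure, or the feature dimensions, so the names `quantize` and `lookup`, the squared Euclidean distance, and the toy values here are all assumptions chosen for illustration.

```python
def quantize(features, codebook):
    """Map each feature vector to the index (token) of its nearest codebook entry.

    Assumption: nearest means smallest squared Euclidean distance.
    """
    tokens = []
    for feature in features:
        distances = [
            sum((a - b) ** 2 for a, b in zip(feature, entry))
            for entry in codebook
        ]
        tokens.append(distances.index(min(distances)))
    return tokens


def lookup(tokens, codebook):
    """Recover the quantized vectors that a decoder would receive."""
    return [codebook[token] for token in tokens]


# Hypothetical toy data: three codebook entries and three 2-D mid-level features.
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
features = [[0.1, -0.1], [0.9, 1.2], [0.2, 0.8]]

tokens = quantize(features, codebook)
quantized = lookup(tokens, codebook)
print(tokens)  # [0, 1, 2]
```

Each token is a single integer, so a sequence of frames is represented by a short list of indices plus a shared codebook rather than by the continuous features themselves.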

Potential Applications

This technology could be applied in video compression, video editing, surveillance systems, and computer vision tasks.

Problems Solved

The method addresses the challenge of efficiently representing and reconstructing sequences of frames, especially in applications where storage or bandwidth is limited.

Benefits
Improved compression of video data
Enhanced quality of reconstructed frames
Efficient representation of motion in video sequences

Commercial Applications
Video streaming services
Security and surveillance systems
Virtual reality and augmented reality applications

Prior Art

Prior research in video compression and computer vision may provide insights into similar methods for sequence representation and reconstruction.

Frequently Updated Research

Researchers are continually exploring new techniques for improving video compression and representation methods, which may impact the development of this technology.

Questions about the Technology

1. How does the motion-guided slot learning mechanism enhance the extraction of mid-level features?
2. What are the potential limitations of using vector quantization for quantizing mid-level features in this context?

Original Abstract Submitted

A method for learning a representation of a sequence of frames includes encoding, via an encoder network, the sequence of frames to obtain a set of feature maps and extracting, via a motion-guided slot learning mechanism, mid-level features from the set of feature maps. The method further includes quantizing the mid-level features via a vector quantization process to obtain a set of tokens, and decoding, via a decoder network, the tokens to obtain a reconstructed sequence of frames. The method still further includes optimizing a combination of a reconstruction loss and a motion loss to train the encoder and decoder networks.
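
The abstract says only that "a combination of a reconstruction loss and a motion loss" is optimized. It does not give the form of either loss or how they are combined. As one possible reading, the sketch below uses a weighted sum of two mean-squared-error terms. The function names, the `motion_weight` parameter, the use of mean squared error, and the idea of comparing predicted masks against motion-derived masks are all assumptions for illustration, not details taken from the application.

```python
def mse(predicted, target):
    """Mean squared error between two equal-length lists of floats."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def combined_loss(reconstructed, frames, predicted_masks, motion_masks,
                  motion_weight=0.5):
    """Weighted sum of a reconstruction term and a motion term.

    Assumptions: both terms are MSE over flattened values, and
    motion_weight is a hypothetical hyperparameter.
    """
    reconstruction_loss = mse(reconstructed, frames)
    motion_loss = mse(predicted_masks, motion_masks)
    return reconstruction_loss + motion_weight * motion_loss


# Hypothetical flattened toy values.
frames = [0.0, 1.0, 1.0, 0.0]
reconstructed = [0.0, 1.0, 0.5, 0.0]
motion_masks = [1.0, 1.0]
predicted_masks = [1.0, 0.0]

total = combined_loss(reconstructed, frames, predicted_masks, motion_masks)
print(total)  # 0.3125
```

In a real training loop the two terms would be computed on network outputs and the sum would be minimized by gradient descent over the encoder and decoder parameters. The weighting between the terms is a design choice that the abstract leaves open.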

18394746. OBJECT DETECTION BASED ON MOTION-GUIDED TOKENS simplified abstract (TOYOTA JIDOSHA KABUSHIKI KAISHA)
