
18394746. OBJECT DETECTION BASED ON MOTION-GUIDED TOKENS simplified abstract (TOYOTA JIDOSHA KABUSHIKI KAISHA)

From WikiPatents

OBJECT DETECTION BASED ON MOTION-GUIDED TOKENS

Organization Name

TOYOTA JIDOSHA KABUSHIKI KAISHA

Inventor(s)

Zhipeng Bao of Pittsburgh PA (US)

Pavel Tokmakov of Santa Monica CA (US)

Yuxiong Wang of Champaign IL (US)

Adrien David Gaidon of San Jose CA (US)

Martial Hebert of Pittsburgh PA (US)

OBJECT DETECTION BASED ON MOTION-GUIDED TOKENS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18394746, titled 'OBJECT DETECTION BASED ON MOTION-GUIDED TOKENS'.

Simplified Explanation

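The patent application describes a method for learning a representation of a sequence of frames. An encoder network turns the frames into feature maps, a motion-guided slot learning mechanism extracts mid-level features from those maps, vector quantization converts the features into tokens, and a decoder network reconstructs the frames from the tokens. The encoder and decoder are trained by optimizing a combination of a reconstruction loss and a motion loss.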

Key Features and Innovation

  • Encoding of frames using an encoder network
  • Extraction of mid-level features using a motion-guided slot learning mechanism
  • Quantization of mid-level features through vector quantization
  • Decoding of tokens using a decoder network
  • Optimization of reconstruction loss and motion loss to train the networks
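
The quantization step listed above can be illustrated with a minimal NumPy sketch. It assumes the common nearest-neighbour form of vector quantization, in which each feature vector is replaced by the index of its closest entry in a codebook. The abstract does not specify the quantizer, so the function name, array shapes, and codebook values below are hypothetical and chosen only for illustration.

```python
import numpy as np

def vector_quantize(features, codebook):
    """Map each feature vector to its nearest codebook entry.

    features: (N, D) array of mid-level features (hypothetical shape).
    codebook: (K, D) array of code vectors (hypothetical values).
    Returns the token indices (N,) and the quantized vectors (N, D).
    """
    # Squared Euclidean distance between every feature and every code vector.
    diffs = features[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)   # shape (N, K)
    tokens = np.argmin(dists, axis=1)     # index of the nearest code per feature
    quantized = codebook[tokens]          # look the code vectors back up
    return tokens, quantized

# Toy data: three 2-D code vectors and three features, each near one code.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
features = np.array([[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]])
tokens, quantized = vector_quantize(features, codebook)
print(tokens)
```

In a trained system the codebook would be learned along with the networks; this sketch covers only the lookup that turns continuous features into discrete tokens.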

Potential Applications

This technology could be applied in video compression, video editing, surveillance systems, and computer vision tasks.

Problems Solved

The method addresses the challenge of efficiently representing and reconstructing sequences of frames, especially in applications where storage or bandwidth is limited.

Benefits

  • Improved compression of video data
  • Enhanced quality of reconstructed frames
  • Efficient representation of motion in video sequences

Commercial Applications

  • Video streaming services
  • Security and surveillance systems
  • Virtual reality and augmented reality applications

Prior Art

Prior research in video compression and computer vision may provide insights into similar methods for sequence representation and reconstruction.

Frequently Updated Research

Researchers are continually exploring new techniques for improving video compression and representation methods, which may impact the development of this technology.

Questions about the Technology

1. How does the motion-guided slot learning mechanism enhance the extraction of mid-level features?
2. What are the potential limitations of using vector quantization for quantizing mid-level features in this context?


Original Abstract Submitted

A method for learning a representation of a sequence of frames includes encoding, via an encoder network, the sequence of frames to obtain a set of feature maps and extracting, via a motion-guided slot learning mechanism, mid-level features from the set of feature maps. The method further includes quantizing the mid-level features via a vector quantization process to obtain a set of tokens, and decoding, via a decoder network, the tokens to obtain a reconstructed sequence of frames. The method still further includes optimizing a combination of a reconstruction loss and a motion loss to train the encoder and decoder networks.
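
The training objective in the abstract, a combination of a reconstruction loss and a motion loss, could be sketched as follows. The abstract does not define either term or say how they are combined, so this sketch makes three assumptions: the reconstruction loss is a mean squared error, the motion loss is a binary cross-entropy between predicted slot masks and motion masks, and the two are joined by a weighted sum. The function names and the `motion_weight` parameter are hypothetical.

```python
import numpy as np

def reconstruction_loss(frames, reconstructed):
    # Assumed form: mean squared error between input and decoded frames.
    return float(np.mean((frames - reconstructed) ** 2))

def motion_loss(slot_masks, motion_masks, eps=1e-7):
    # Assumed form: binary cross-entropy between predicted slot masks
    # (values in [0, 1]) and binary motion masks of the same shape.
    p = np.clip(slot_masks, eps, 1.0 - eps)
    bce = -(motion_masks * np.log(p) + (1.0 - motion_masks) * np.log(1.0 - p))
    return float(np.mean(bce))

def total_loss(frames, reconstructed, slot_masks, motion_masks, motion_weight=0.5):
    # Assumed combination: weighted sum; motion_weight is a hypothetical knob.
    rec = reconstruction_loss(frames, reconstructed)
    mot = motion_loss(slot_masks, motion_masks)
    return rec + motion_weight * mot

# Toy usage with random frames and masks (shapes are illustrative only).
rng = np.random.default_rng(0)
example_frames = rng.random((2, 8, 8, 3))
example_recon = example_frames + 0.05 * rng.standard_normal(example_frames.shape)
example_motion = (rng.random((2, 8, 8)) > 0.5).astype(float)
example_slots = np.clip(example_motion * 0.8 + 0.1, 0.0, 1.0)
print(total_loss(example_frames, example_recon, example_slots, example_motion))
```

A weighted sum is only one way to combine the two terms. The weight controls how strongly the motion signal shapes the learned features relative to reconstruction quality.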
