Google LLC (20240212347). MEMORY-GUIDED VIDEO OBJECT DETECTION simplified abstract

From WikiPatents

MEMORY-GUIDED VIDEO OBJECT DETECTION

Organization Name

Google LLC

Inventor(s)

Dmitry Kalenichenko of Los Angeles CA (US)

Menglong Zhu of Playa Vista CA (US)

Marie Charisse White of Mountain View CA (US)

Mason Liu of Acton MA (US)

Yinxiao Li of Sunnyvale CA (US)

MEMORY-GUIDED VIDEO OBJECT DETECTION - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240212347, titled 'MEMORY-GUIDED VIDEO OBJECT DETECTION'.

The patent application describes systems and methods for detecting objects in a video using an interleaved object detection model with multiple feature extractor networks and a shared memory layer.

  • Input a video with multiple frames into the interleaved object detection model.
  • For each frame, select one of the feature extractor networks.
  • Analyze the frame with the selected network to determine its features.
  • Update those features using previously extracted features from an earlier frame, which are stored in the shared memory layer.
  • Detect objects in frames based on updated features.
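
The steps above can be sketched as a small loop. This is only an illustration under stated assumptions, not the method claimed in the application: the abstract does not say how an extractor is selected or how the shared memory layer combines features, so the fixed-interval selection policy, the weighted blend used as "memory", the threshold detector, and every function name below are hypothetical stand-ins.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Frame = Sequence[float]   # toy stand-in for an image
Features = List[float]


def heavy_extractor(frame: Frame) -> Features:
    """Hypothetical slower, more precise extractor: keeps the raw values."""
    return [float(x) for x in frame]


def light_extractor(frame: Frame) -> Features:
    """Hypothetical faster, coarser extractor: rounds each value."""
    return [float(round(x)) for x in frame]


def select_extractor(index: int, interval: int) -> Callable[[Frame], Features]:
    """Assumed policy: use the heavy extractor once every `interval` frames."""
    return heavy_extractor if index % interval == 0 else light_extractor


def update_memory(memory: Optional[Features], features: Features,
                  weight: float = 0.5) -> Features:
    """Stand-in for the shared memory layer: blend new and stored features."""
    if memory is None:
        return list(features)
    return [weight * f + (1.0 - weight) * m for f, m in zip(features, memory)]


def detect(features: Features, threshold: float = 0.5) -> List[int]:
    """Toy detector: positions whose feature value exceeds the threshold."""
    return [i for i, value in enumerate(features) if value > threshold]


def run_interleaved(video: Sequence[Frame],
                    interval: int = 3) -> List[Tuple[str, List[int]]]:
    """Run the select / analyze / update / detect steps for every frame."""
    memory: Optional[Features] = None
    results = []
    for index, frame in enumerate(video):
        extractor = select_extractor(index, interval)
        features = extractor(frame)
        memory = update_memory(memory, features)  # updated set of features
        results.append((extractor.__name__, detect(memory)))
    return results
```

Calling `run_interleaved([[0.9, 0.1], [0.4, 0.2], [0.6, 0.8]])` runs the heavy extractor on the first frame and the light one on the next two, with detection always working from the blended features rather than the current frame alone.
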

Key Features and Innovation

  • Utilizes multiple feature extractor networks for object detection in videos.
  • Incorporates a shared memory layer to store and update features for accurate object detection.
  • Selects a feature extractor for each frame, so different frames can be analyzed by different networks.

Potential Applications

  • Video surveillance systems
  • Autonomous vehicles
  • Augmented reality applications

Problems Solved

  • Enhances object detection accuracy in videos.
  • Improves real-time object detection performance.

Benefits

  • Increased accuracy in detecting objects in videos.
  • Enhanced performance in real-time object detection applications.

Commercial Applications

Title: Advanced Video Object Detection Technology

This technology could be used in industries such as security, automotive, and entertainment for efficient and accurate object detection in videos. Possible uses include surveillance systems, safety functions in autonomous vehicles, and augmented reality applications.

Prior Art

There may be prior art related to object detection models in videos using multiple feature extractor networks and shared memory layers. Researchers and developers in the field of computer vision and artificial intelligence may have explored similar techniques.

Frequently Updated Research

Researchers in the field of computer vision are constantly working on improving object detection models in videos. Stay updated on the latest advancements in feature extraction and object detection algorithms for videos.

Questions about Video Object Detection Technology

1. How does the shared memory layer improve object detection accuracy in videos?

  - The shared memory layer stores and updates features from previous frames, allowing for better analysis and detection of objects in subsequent frames.
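
One way to picture this answer is a toy comparison of detection with and without a memory of earlier frames. The abstract does not describe how the shared memory layer works internally, so the running weighted average below is an assumed stand-in, and the class name, the `retain` parameter, the scores, and the threshold are all hypothetical.

```python
from typing import List, Optional


class SharedMemory:
    """Hypothetical memory: a running weighted average of feature values."""

    def __init__(self, retain: float = 0.6) -> None:
        self.retain = retain                  # share of the stored state kept
        self.state: Optional[List[float]] = None

    def update(self, features: List[float]) -> List[float]:
        """Blend the new features into the stored state and return it."""
        if self.state is None:
            self.state = list(features)
        else:
            self.state = [
                self.retain * stored + (1.0 - self.retain) * new
                for stored, new in zip(self.state, features)
            ]
        return self.state


def is_detected(score: float, threshold: float = 0.5) -> bool:
    """Toy detector: the object counts as found above the threshold."""
    return score > threshold


# Made-up per-frame scores for one object; the third frame is degraded.
per_frame_scores = [[0.9], [0.9], [0.2], [0.9]]

memory = SharedMemory(retain=0.6)
without_memory = [is_detected(f[0]) for f in per_frame_scores]
with_memory = [is_detected(memory.update(f)[0]) for f in per_frame_scores]
```

In this made-up sequence the frame-by-frame detector loses the object on the degraded third frame, while the memory-based version keeps it because the stored features from earlier frames still contribute to the score.
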

2. What are the potential limitations of using multiple feature extractor networks in video object detection?

  - The use of multiple feature extractor networks may increase computational complexity and require efficient memory management for optimal performance.
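
The compute side of this trade-off can be estimated with simple arithmetic. The sketch below assumes two extractors and a fixed schedule in which the more expensive one runs once every `interval` frames; the schedule and the cost figures are illustrative assumptions, not values from the application.

```python
def average_cost_per_frame(heavy_cost: float, light_cost: float,
                           interval: int) -> float:
    """Mean per-frame cost when the heavy extractor runs once per `interval`
    frames and the light extractor handles the remaining frames."""
    if interval < 1:
        raise ValueError("interval must be at least 1")
    return (heavy_cost + (interval - 1) * light_cost) / interval


# Hypothetical costs in arbitrary units (for example, milliseconds per frame).
costs = {n: average_cost_per_frame(10.0, 1.0, n) for n in (1, 5, 10)}
```

With these made-up numbers, running the heavy extractor on every frame costs 10 units per frame, while running it on one frame in ten brings the average to about 1.9 units. The saving comes at the price of keeping several networks and the shared memory state loaded at once.
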


Original Abstract Submitted

Systems and methods for detecting objects in a video are provided. A method can include inputting a video comprising a plurality of frames into an interleaved object detection model comprising a plurality of feature extractor networks and a shared memory layer. For each of one or more frames, the operations can include selecting one of the plurality of feature extractor networks to analyze the one or more frames, analyzing the one or more frames by the selected feature extractor network to determine one or more features of the one or more frames, determining an updated set of features based at least in part on the one or more features and one or more previously extracted features extracted from a previous frame stored in the shared memory layer, and detecting an object in the one or more frames based at least in part on the updated set of features.