US Patent Application 18308452. SEGMENTATION OF A SEQUENCE OF VIDEO IMAGES WITH A TRANSFORMER NETWORK simplified abstract

From WikiPatents

SEGMENTATION OF A SEQUENCE OF VIDEO IMAGES WITH A TRANSFORMER NETWORK

Organization Name

Robert Bosch GmbH


Inventor(s)

Nadine Behrmann of München (DE)

Mehdi Noroozi of Stuttgart (DE)

S. Alireza Golestaneh of Pittsburgh PA (US)

SEGMENTATION OF A SEQUENCE OF VIDEO IMAGES WITH A TRANSFORMER NETWORK - A simplified explanation of the abstract

This abstract first appeared for US patent application 18308452, titled 'SEGMENTATION OF A SEQUENCE OF VIDEO IMAGES WITH A TRANSFORMER NETWORK'.

Simplified Explanation

The patent application describes a method for converting a sequence of video frames into a sequence of scenes.

  • Features are extracted from each video frame and transformed into a feature representation in a first working space.
  • The interaction of each feature representation with the other feature representations is determined; this feature interaction characterizes a frame prediction.
  • The class of each scene that has already been determined is transformed into a scene representation in a second working space.
  • The interaction between each scene representation and all other scene representations is determined.
  • The interaction of each scene interaction with each feature interaction (the scene-feature interaction) is determined.
  • Based on the scene-feature interactions, the most plausible class for the next scene in the sequence is determined considering the frame sequence and already-determined scenes.
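
The steps above resemble a transformer encoder-decoder, and a toy sketch may make the data flow easier to follow. The code below is an illustration only, not the applicants' implementation: it assumes single-head dot-product attention without learned query/key/value projections, random untrained weights, a hypothetical start token, and a greedy choice of the next class. All names (encode_frames, next_scene_class, W_feat, E_scene, and so on) and all sizes are invented for this example.

```python
import math
import random

random.seed(0)  # deterministic toy weights; nothing here is trained

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(x, W):
    """Multiply a vector x (length n) by a matrix W (n rows, m columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) for j in range(len(W[0]))]

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    return max(range(len(values)), key=lambda i: values[i])

def attend(queries, keys_values):
    """Scaled dot-product attention; assumption: keys and values are the same vectors."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, keys_values)) for j in range(d)])
    return out

# Hypothetical sizes: 3 scene classes, 4 raw features per frame, working spaces of width 8.
NUM_CLASSES, RAW_DIM, D = 3, 4, 8
START = NUM_CLASSES                         # assumed start token for an empty scene sequence
W_feat = rand_matrix(RAW_DIM, D)            # frame features -> first working space
E_scene = rand_matrix(NUM_CLASSES + 1, D)   # scene class -> second working space
W_frame = rand_matrix(D, NUM_CLASSES)       # head for the per-frame prediction
W_next = rand_matrix(D, NUM_CLASSES)        # head for the next-scene class

def encode_frames(frame_features):
    feature_reps = [matvec(f, W_feat) for f in frame_features]
    feature_inter = attend(feature_reps, feature_reps)   # feature interactions
    frame_pred = [argmax(matvec(h, W_frame)) for h in feature_inter]
    return feature_inter, frame_pred

def next_scene_class(feature_inter, known_classes):
    scene_reps = [E_scene[c] for c in [START] + known_classes]
    scene_inter = attend(scene_reps, scene_reps)         # scene interactions
    scene_feature = attend(scene_inter, feature_inter)   # scene-feature interactions
    return argmax(matvec(scene_feature[-1], W_next))     # most plausible next class

# Usage: six random "frames", then three scenes predicted one after another.
frames = [[random.random() for _ in range(RAW_DIM)] for _ in range(6)]
feature_inter, frame_pred = encode_frames(frames)
scenes = []
for _ in range(3):
    scenes.append(next_scene_class(feature_inter, scenes))
print("frame predictions:", frame_pred)
print("scene sequence:", scenes)
```

Each predicted class is appended to the list of already-determined scenes before the next call, which mirrors the last bullet: the next scene depends on both the frame sequence and the scenes found so far. With untrained weights the printed classes carry no meaning; only the shapes and the order of operations are of interest.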


Original Abstract Submitted

A method for transforming a frame sequence of video frames into a scene sequence of scenes. In the method: features are extracted from each video frame, and are transformed into a feature representation in a first working space; a feature interaction of each feature representation with the other feature representations is ascertained, characterizing a frame prediction; the class belonging to each already-ascertained scene is transformed into a scene representation in a second working space; a scene interaction of a scene representation with each of all the other scene representations is ascertained; a scene-feature interaction of each scene interaction with each feature interaction is ascertained; and from the scene-feature interactions, at least the class of the next scene in the scene sequence that is most plausible in view of the frame sequence and the already-ascertained scenes is ascertained.