SEGMENTATION OF A SEQUENCE OF VIDEO IMAGES WITH A TRANSFORMER NETWORK

Organization Name

Inventor(s)

S. Alireza Golestaneh of Pittsburgh PA (US)

SEGMENTATION OF A SEQUENCE OF VIDEO IMAGES WITH A TRANSFORMER NETWORK - A simplified explanation of the abstract

This abstract first appeared for US patent application 18308452, titled 'SEGMENTATION OF A SEQUENCE OF VIDEO IMAGES WITH A TRANSFORMER NETWORK'.

Simplified Explanation

The patent application describes a method for converting a sequence of video frames into a sequence of scenes.

Features are extracted from each video frame and transformed into a feature representation in a first working space.
The interaction of each feature representation with the other feature representations is determined; these feature interactions characterize a frame prediction.
The class of each scene that has already been determined is transformed into a scene representation in a second working space.
The interaction between each scene representation and all other scene representations is determined.
The scene-feature interaction of each scene interaction with each feature interaction is determined.
Based on the scene-feature interactions, the most plausible class for the next scene in the sequence is determined considering the frame sequence and already-determined scenes.
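
The steps above can be sketched in code. What follows is a minimal illustration and not the method claimed in the application: it assumes that each "interaction" is a single-head scaled dot-product attention without learned projections, that both working spaces have the same dimension, and that a hypothetical START token stands in for the scene history when no scene has been determined yet. The names `make_params`, `attend` and `predict_next_scene_class` are invented for this example, and the weights are random rather than trained.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(weights, x):
    """Multiply a (rows x cols) matrix, stored as a list of rows, by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def attend(queries, keys_values):
    """Scaled dot-product attention; keys double as values (assumption)."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in keys_values]
        w = softmax(scores)
        out.append([sum(wi * kv[j] for wi, kv in zip(w, keys_values))
                    for j in range(d)])
    return out


def make_params(rng, feat_dim, d_model, num_classes):
    """Random, untrained parameters (hypothetical layout for this sketch)."""
    def rand(rows, cols):
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)]
                for _ in range(rows)]
    return {
        "W_feat": rand(d_model, feat_dim),           # features -> first working space
        "class_embed": rand(num_classes + 1, d_model),  # last row is the START token
        "W_out": rand(num_classes, d_model),         # interaction -> class scores
        "num_classes": num_classes,
    }


def predict_next_scene_class(frame_features, scene_classes, params):
    """Return (most plausible next class, class probabilities)."""
    # 1. Transform per-frame features into the first working space.
    feats = [matvec(params["W_feat"], f) for f in frame_features]
    # 2. Feature interactions: every frame attends to the frames.
    feat_inter = attend(feats, feats)
    # 3. Transform already-determined scene classes into the second
    #    working space, prefixed with the assumed START token.
    start = params["num_classes"]
    scenes = [params["class_embed"][c] for c in [start] + list(scene_classes)]
    # 4. Scene interactions: every scene attends to the scenes.
    scene_inter = attend(scenes, scenes)
    # 5. Scene-feature interactions: scene interactions attend to
    #    feature interactions.
    cross = attend(scene_inter, feat_inter)
    # 6. Score the classes from the last position and pick the best.
    probs = softmax(matvec(params["W_out"], cross[-1]))
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs


if __name__ == "__main__":
    rng = random.Random(0)
    params = make_params(rng, feat_dim=4, d_model=8, num_classes=3)
    frames = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(6)]
    next_class, probs = predict_next_scene_class(frames, [0, 2], params)
    print(next_class, probs)
```

Calling `predict_next_scene_class` repeatedly, appending each predicted class to `scene_classes`, would produce a scene sequence one scene at a time. A stopping rule, such as an end-of-sequence class, would also be needed and is not shown here.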

Original Abstract Submitted

A method for transforming a frame sequence of video frames into a scene sequence of scenes. In the method: features are extracted from each video frame, and are transformed into a feature representation in a first working space; a feature interaction of each feature representation with the other feature representations is ascertained, characterizing a frame prediction; the class belonging to each already-ascertained scene is transformed into a scene representation in a second working space; a scene interaction of a scene representation with each of all the other scene representations is ascertained; a scene-feature interaction of each scene interaction with each feature interaction is ascertained; and from the scene-feature interactions, at least the class of the next scene in the scene sequence that is most plausible in view of the frame sequence and the already-ascertained scenes is ascertained.

