Intel Corporation (20240104915). LONG DURATION STRUCTURED VIDEO ACTION SEGMENTATION simplified abstract


LONG DURATION STRUCTURED VIDEO ACTION SEGMENTATION

Organization Name

Intel Corporation

Inventor(s)

Anthony Daniel Rhodes of Portland OR (US)

Byungsu Min of Monroeville PA (US)

Subarna Tripathi of San Diego CA (US)

Giuseppe Raffa of Portland OR (US)

Sovan Biswas of Bonn (DE)

LONG DURATION STRUCTURED VIDEO ACTION SEGMENTATION - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240104915, titled 'LONG DURATION STRUCTURED VIDEO ACTION SEGMENTATION'.

Simplified Explanation

Machine learning models can process videos to generate outputs such as action segmentation (assigning portions of a video to particular actions) and action classification (assigning an action class to each frame). Models that perform well on short clips often struggle with long duration, structured videos; this application proposes a hybrid architecture that combines a temporal convolutional network with a bi-directional graph neural network.

  • A temporal convolutional network and a bi-directional graph neural network are combined in a hybrid architecture.
  • The temporal convolutional network processes the video to generate frame-wise features.
  • The frame-wise features are converted into a graph with forward and backward edges.
  • The graph neural network processes this graph to refine the final per-frame action predictions (a code sketch follows this list).
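
A minimal sketch of this pipeline is shown below in PyTorch. The module names, layer sizes, dilated-convolution stack, and the simple linear message-passing step are illustrative assumptions chosen to show the data flow, not the specific architecture claimed in the application.

import torch
import torch.nn as nn

class TemporalConvStage(nn.Module):
    # First-pass segmentation: a stack of dilated 1-D convolutions over frame features.
    def __init__(self, in_dim, hidden_dim, num_layers=4):
        super().__init__()
        layers, dim = [], in_dim
        for i in range(num_layers):
            layers += [nn.Conv1d(dim, hidden_dim, kernel_size=3,
                                 padding=2 ** i, dilation=2 ** i),
                       nn.ReLU()]
            dim = hidden_dim
        self.net = nn.Sequential(*layers)

    def forward(self, x):              # x: (batch, in_dim, num_frames)
        return self.net(x)             # -> (batch, hidden_dim, num_frames)

def temporal_adjacency(num_frames):
    # Forward edges connect frame t to t+1; backward edges connect t to t-1.
    fwd = torch.diag(torch.ones(num_frames - 1), diagonal=1)
    bwd = torch.diag(torch.ones(num_frames - 1), diagonal=-1)
    return fwd, bwd

class BiDirectionalGNNStage(nn.Module):
    # Refines per-frame features by passing messages along both edge directions.
    def __init__(self, dim, num_classes):
        super().__init__()
        self.fwd_lin = nn.Linear(dim, dim)
        self.bwd_lin = nn.Linear(dim, dim)
        self.classify = nn.Linear(dim, num_classes)

    def forward(self, feats, fwd, bwd):    # feats: (num_frames, dim)
        msg = fwd @ self.fwd_lin(feats) + bwd @ self.bwd_lin(feats)
        refined = torch.relu(feats + msg)
        return self.classify(refined)      # -> (num_frames, num_classes)

# Example: a 1000-frame video with 64-dim per-frame features and 10 action classes.
frames = torch.randn(1, 64, 1000)
tcn = TemporalConvStage(in_dim=64, hidden_dim=128)
feats = tcn(frames).squeeze(0).transpose(0, 1)        # (1000, 128) frame-wise features
fwd, bwd = temporal_adjacency(feats.shape[0])
gnn = BiDirectionalGNNStage(dim=128, num_classes=10)
per_frame_logits = gnn(feats, fwd, bwd)               # refined per-frame predictions

Keeping forward and backward edges in separate matrices lets the refinement stage learn distinct transformations for information flowing with and against temporal order, which is the point of a bi-directional graph stage.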

Potential Applications

This technology can be applied in fields such as video analysis, surveillance, sports analytics, and the entertainment industry, where accurate action segmentation and classification are needed.

Problems Solved

This technology addresses the challenge of accurately segmenting actions in long, structured videos where traditional machine learning models may struggle.

Benefits

The hybrid architecture improves the accuracy of action segmentation in long videos, providing more detailed and precise per-frame action predictions.

Potential Commercial Applications

Potential commercial applications include video editing software, sports analytics tools, security systems, and entertainment platforms.

Possible Prior Art

Prior art may include existing machine learning models for video analysis and action recognition, but the specific hybrid architecture described in this patent application may be novel.

Unanswered Questions

How does this technology compare to existing methods for action segmentation in videos?

This technology combines temporal convolutional and graph neural networks for improved action segmentation in long, structured videos. It would be interesting to compare its performance with other state-of-the-art methods in the field.

What computational resources are required to implement this hybrid architecture effectively?

Implementing a hybrid architecture involving temporal convolutional and graph neural networks may require significant computational resources. Understanding the resource requirements can help in practical implementation and scalability of the technology.


Original Abstract Submitted

Machine learning models can process a video and generate outputs such as action segmentation assigning portions of the video to a particular action, or action classification assigning an action class for each frame of the video. Some machine learning models can accurately make predictions for short videos but may not be particularly suited for performing action segmentation for long duration, structured videos. An effective machine learning model may include a hybrid architecture involving a temporal convolutional network and a bi-directional graph neural network. The machine learning model can process long duration structured videos by using a temporal convolutional network as a first pass action segmentation model to generate rich, frame-wise features. The frame-wise features can be converted into a graph having forward edges and backward edges. A graph neural network can process the graph to refine a final fine-grain per-frame action prediction.
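
The graph-construction step in the abstract can also be sketched in the edge-list form that graph neural network libraries commonly consume. The function below is a hypothetical illustration of converting frame-wise features into a graph with forward and backward edges; the exact encoding used in the application is not specified here.

import numpy as np

def frames_to_graph(frame_features):
    # frame_features: (T, D) array produced by the first-pass TCN.
    T = frame_features.shape[0]
    forward_edges = [(t, t + 1) for t in range(T - 1)]    # follow temporal order
    backward_edges = [(t + 1, t) for t in range(T - 1)]   # reverse temporal order
    edge_index = np.array(forward_edges + backward_edges).T   # shape (2, 2 * (T - 1))
    return frame_features, edge_index

nodes, edge_index = frames_to_graph(np.random.rand(1000, 128))
print(edge_index.shape)   # (2, 1998): both directions for a 1000-frame video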