Intel Corporation (20240256839). METHODS, SYSTEMS, ARTICLES OF MANUFACTURE AND APPARATUS TO GENERATE FLOW AND AUDIO MULTI-MODAL OUTPUT simplified abstract

From WikiPatents

METHODS, SYSTEMS, ARTICLES OF MANUFACTURE AND APPARATUS TO GENERATE FLOW AND AUDIO MULTI-MODAL OUTPUT

Organization Name

Intel Corporation

Inventor(s)

Jiaxiang Jiang of Santa Clara CA (US)

Mahesh Subedar of Portland OR (US)

METHODS, SYSTEMS, ARTICLES OF MANUFACTURE AND APPARATUS TO GENERATE FLOW AND AUDIO MULTI-MODAL OUTPUT - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240256839, titled 'METHODS, SYSTEMS, ARTICLES OF MANUFACTURE AND APPARATUS TO GENERATE FLOW AND AUDIO MULTI-MODAL OUTPUT'.

The patent application describes methods, systems, and apparatus for generating flow and audio multi-modal output.

  • Interface circuitry, machine-readable instructions, and at least one processor circuit are used to train an unsupervised image model to generate flow tensors based on a reference frame and a driver frame.
  • The flow tensors represent at least one of rotation information or translation information.
  • A denoising diffusion probabilistic model (DDPM) is trained based on the flow tensors, audio distributions, and prompt signals to temporally align the flow tensors with the audio distributions.
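
The abstract does not say how the flow tensors are laid out or how the DDPM is parameterized, so the following is only a minimal sketch under stated assumptions. It assumes a flow tensor can be illustrated as a dense 2D displacement field produced by a rotation about the image centre plus a translation, and it uses the standard DDPM forward (noising) step with a linear beta schedule. The function names (rigid_flow, linear_alpha_bar, ddpm_forward_noise) and all parameter values are hypothetical. The conditioning on audio distributions and prompt signals, and the denoising network itself, are omitted because the abstract gives no detail on them.

```python
import math
import random


def rigid_flow(height, width, angle_rad, tx, ty):
    """Dense flow field for a rotation about the image centre plus a translation.

    Returns flow[y][x] = (dx, dy): the displacement carrying pixel (x, y) of the
    reference frame to its position in the driver frame (an assumed layout).
    """
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    cos_a, sin_a = math.cos(angle_rad), math.sin(angle_rad)
    flow = []
    for y in range(height):
        row = []
        for x in range(width):
            rx, ry = x - cx, y - cy
            new_x = cos_a * rx - sin_a * ry + cx + tx
            new_y = sin_a * rx + cos_a * ry + cy + ty
            row.append((new_x - x, new_y - y))
        flow.append(row)
    return flow


def linear_alpha_bar(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule."""
    alpha_bar, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def ddpm_forward_noise(x0, alpha_bar_t, rng):
    """Sample x_t from q(x_t | x_0) = N(sqrt(a) * x_0, (1 - a) * I).

    x0 is a flat list of floats. Returns (x_t, eps), where eps is the noise
    that a denoiser is typically trained to predict.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    scale, sigma = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    x_t = [scale * v + sigma * e for v, e in zip(x0, eps)]
    return x_t, eps


# Illustrative values only: a 4x4 pure-translation flow field, noised at step 500.
flow = rigid_flow(4, 4, angle_rad=0.0, tx=1.5, ty=-0.5)
flat = [c for row in flow for vec in row for c in vec]
alpha_bar = linear_alpha_bar(1000)
noisy, eps = ddpm_forward_noise(flat, alpha_bar[500], random.Random(0))
```

In a training loop of this kind, a network would typically receive the noised flow values, the timestep, and the conditioning signals, and be optimized to recover eps. How the application's DDPM actually consumes the audio distributions and prompt signals is not stated in the abstract.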

Potential Applications:

  • This technology could be used in virtual reality systems to enhance the immersive experience by synchronizing audio and visual elements.
  • It could also have applications in video editing software to improve the alignment of audio and visual components.

Problems Solved:

  • The technology addresses the challenge of aligning flow tensors with audio distributions in a seamless and accurate manner.
  • It also tackles the issue of training unsupervised image models and DDPMs to work together effectively.

Benefits:

  • Improved synchronization of audio and visual elements in multimedia content.
  • Enhanced user experience in virtual reality and video editing applications.

Commercial Applications:

Title: Enhanced Audio-Visual Synchronization Technology for Virtual Reality and Video Editing

This technology could be commercially utilized in the development of virtual reality headsets, video editing software, and multimedia content creation tools. It has the potential to improve user engagement and overall quality of audio-visual content.

Questions about the Technology:

  1. How does this technology improve the user experience in virtual reality applications?
  2. What sets this audio-visual synchronization technology apart from existing solutions?


Original Abstract Submitted

Methods, systems, articles of manufacture, apparatus and methods are disclosed to generate flow and audio multi-modal output. An example apparatus includes interface circuitry, machine-readable instructions, and at least one processor circuit programmed by the machine-readable instructions to train an unsupervised image model to generate flow tensors based on a reference frame and a driver frame, the flow tensors representing at least one of rotation information or translation information. The example apparatus also includes at least one processor circuit programmed by the machine-readable instructions to train a denoising diffusion probabilistic model (DDPM) based on (a) the flow tensors, (b) audio distributions and (c) prompt signals, the trained DDPM to temporally align the flow tensors with the audio distributions.