GOOGLE LLC (20240265917). Cascaded Audiovisual Automatic Speech Recognition Models simplified abstract

From WikiPatents
Revision as of 07:31, 8 August 2024 by Unknown user (talk) (Creating a new page)

Cascaded Audiovisual Automatic Speech Recognition Models

Organization Name

GOOGLE LLC

Inventor(s)

Oscar Chang of New York NY (US)

Cascaded Audiovisual Automatic Speech Recognition Models - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240265917, titled 'Cascaded Audiovisual Automatic Speech Recognition Models'.

The patent application describes a cascaded method. An audio encoder processes a sequence of acoustic frames and generates an acoustic higher-order feature representation at each output step. For each acoustic frame that is paired with a video frame, an audiovisual encoder combines that acoustic representation with the video frame to produce an audiovisual higher-order feature representation. A joint network then generates a probability distribution over possible speech recognition hypotheses, using the audiovisual representation when a paired video frame exists and the acoustic representation alone when it does not.

  • Receiving a sequence of acoustic frames
  • Generating acoustic higher-order feature representations at each output step
  • Creating audiovisual higher-order feature representations for paired acoustic and video frames
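  • Generating, with a joint network, a probability distribution over possible speech recognition hypotheses from the audiovisual representation when a paired video frame exists
  • Falling back to the acoustic representation alone, through the same joint network, for acoustic frames that have no paired video frame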
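
The steps above can be sketched as a toy data flow. This is a minimal illustration under stated assumptions, not the implementation described in the application: the abstract gives no layer types, dimensions, or training details, so the random linear projections, the sizes, and the names `project`, `recognize`, `W_AUDIO`, `W_AV`, and `W_JOINT` are all hypothetical. The encoders here also process one frame at a time, whereas a real encoder would likely use surrounding context.

```python
import math
import random

random.seed(0)

# Hypothetical toy sizes; the abstract specifies no dimensions.
AUDIO_DIM, VIDEO_DIM, HIDDEN_DIM, NUM_HYPOTHESES = 8, 6, 4, 5


def random_matrix(rows, cols):
    """Stand-in for trained weights (assumption: plain linear layers)."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def project(vector, matrix):
    """Multiply a row vector by a matrix; len(vector) must equal len(matrix)."""
    return [sum(v * row[j] for v, row in zip(vector, matrix))
            for j in range(len(matrix[0]))]


W_AUDIO = random_matrix(AUDIO_DIM, HIDDEN_DIM)
W_AV = random_matrix(HIDDEN_DIM + VIDEO_DIM, HIDDEN_DIM)
W_JOINT = random_matrix(HIDDEN_DIM, NUM_HYPOTHESES)


def audio_encoder(acoustic_frame):
    """Acoustic frame -> acoustic higher-order feature representation."""
    return [math.tanh(x) for x in project(acoustic_frame, W_AUDIO)]


def audiovisual_encoder(acoustic_repr, video_frame):
    """Acoustic representation + paired video frame -> audiovisual representation."""
    return [math.tanh(x) for x in project(acoustic_repr + video_frame, W_AV)]


def joint_network(representation):
    """Representation -> probability distribution over hypotheses (softmax)."""
    logits = project(representation, W_JOINT)
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def recognize(acoustic_frames, video_frames):
    """video_frames[i] is None when acoustic frame i has no paired video frame."""
    distributions = []
    for acoustic_frame, video_frame in zip(acoustic_frames, video_frames):
        acoustic_repr = audio_encoder(acoustic_frame)
        if video_frame is not None:
            # Paired frame: cascade through the audiovisual encoder.
            representation = audiovisual_encoder(acoustic_repr, video_frame)
        else:
            # Unpaired frame: the joint network sees the acoustic representation.
            representation = acoustic_repr
        distributions.append(joint_network(representation))
    return distributions


# Four acoustic frames; the second and fourth have no paired video frame.
acoustic_frames = [[random.gauss(0, 1) for _ in range(AUDIO_DIM)] for _ in range(4)]
video_frames = [[random.gauss(0, 1) for _ in range(VIDEO_DIM)], None,
                [random.gauss(0, 1) for _ in range(VIDEO_DIM)], None]
distributions = recognize(acoustic_frames, video_frames)
```

In this sketch both encoders emit vectors of the same size (`HIDDEN_DIM`) so that a single joint network can accept either one. That is a design choice made for the example; the abstract only states that the same joint network handles both cases.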

Potential Applications:
  • Speech recognition systems
  • Audiovisual content analysis
  • Multimodal communication technologies

Problems Solved:
  • Enhancing speech recognition accuracy
  • Improving audiovisual synchronization
  • Facilitating multimodal data processing

Benefits:
  • Increased accuracy in speech recognition
  • Enhanced audiovisual content analysis
  • Improved performance in multimodal communication

Commercial Applications: Advanced Speech Recognition and Audiovisual Analysis Technology

This technology can be utilized in industries such as:
  • Telecommunications
  • Media and entertainment
  • Security and surveillance

Questions about the technology:
  1. How does this technology improve speech recognition accuracy?
  2. What are the potential challenges in implementing this technology in real-world applications?

Frequently Updated Research: Stay updated on the latest advancements in speech recognition technology and audiovisual analysis to enhance the performance and capabilities of this innovation.


Original Abstract Submitted

A method includes receiving a sequence of acoustic frames and generating, by an audio encoder, at each of a plurality of output steps, an acoustic higher-order feature representation for a corresponding acoustic frame in the sequence of acoustic frames. For each acoustic frame in the sequence of acoustic frames paired with a corresponding video frame, the method includes generating, by an audiovisual encoder, an audiovisual higher-order feature representation for the corresponding acoustic higher-order feature frame and the corresponding video frame; and generating, by a joint network, at an output step, a probability distribution over possible speech recognition hypotheses based on the audiovisual higher-order feature representation. The method, for each corresponding acoustic frame in the sequence of acoustic frames not paired with a corresponding video frame, includes generating, by the joint network, at an output step, a probability distribution over possible speech recognition hypotheses based on the acoustic higher-order feature representation.
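
The two cases in the abstract can be written compactly. The notation is introduced here for illustration only and does not appear in the application: $x_t$ is the acoustic frame at output step $t$, $v_t$ is its paired video frame when one exists, $h_t$ is the acoustic higher-order feature representation, and $P_t(y)$ is the probability distribution over possible speech recognition hypotheses $y$. Writing the audio encoder as a per-frame function is a simplification.

```latex
h_t = \mathrm{AudioEnc}(x_t), \qquad
P_t(y) =
\begin{cases}
\mathrm{Joint}\big(\mathrm{AVEnc}(h_t, v_t)\big) & \text{if } x_t \text{ is paired with a video frame } v_t, \\
\mathrm{Joint}(h_t) & \text{otherwise.}
\end{cases}
```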
