Nvidia Corporation (20240112021). AUTOMATIC SPEECH RECOGNITION WITH MULTI-FRAME BLANK DECODING USING NEURAL NETWORKS FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS simplified abstract

From WikiPatents

AUTOMATIC SPEECH RECOGNITION WITH MULTI-FRAME BLANK DECODING USING NEURAL NETWORKS FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS

Organization Name

Nvidia Corporation

Inventor(s)

Hainan Xu of Baltimore, MD (US)

Boris Ginsburg of Sunnyvale, CA (US)

AUTOMATIC SPEECH RECOGNITION WITH MULTI-FRAME BLANK DECODING USING NEURAL NETWORKS FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240112021 titled 'AUTOMATIC SPEECH RECOGNITION WITH MULTI-FRAME BLANK DECODING USING NEURAL NETWORKS FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS'.

Simplified Explanation

The patent application describes a machine learning system that trains a model to output a multi-frame blank symbol when processing auditory input. A multi-frame blank lets the model skip audio frames of less value, which may make processing of the remaining frames faster and more accurate.

  • The system generates paths through a probability lattice, some of which include a multi-frame blank symbol that skips at least one frame.
  • Because a blank can now cover more than one frame, including the multi-frame blank symbol may increase the total number of potential paths through the lattice, which may allow the model to process audio frames more quickly and accurately.
  • When the model outputs a multi-frame blank symbol, one or more frames of the auditory input can be omitted from processing.
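
The effect on the number of lattice paths can be illustrated with a small counting sketch. It is not taken from the patent application: it assumes a transducer-style lattice in which a blank advances the frame index, a label advances the label index, and a path ends exactly at the last frame. The function name count_lattice_paths and the parameter blank_durations are hypothetical.

```python
from functools import lru_cache


def count_lattice_paths(num_frames, num_labels, blank_durations=(1,)):
    """Count paths through a toy lattice (illustrative assumption only).

    A blank of duration d moves from frame t to frame t + d.
    A label keeps the frame and moves from label u to label u + 1.
    A path is complete when it lands exactly on num_frames with all
    labels emitted; blanks that would overshoot the end are not allowed.
    """

    @lru_cache(maxsize=None)
    def paths_from(t, u):
        if t == num_frames:
            # Complete only if every label has been emitted.
            return 1 if u == num_labels else 0
        total = 0
        for d in blank_durations:
            if t + d <= num_frames:
                total += paths_from(t + d, u)  # blank covering d frames
        if u < num_labels:
            total += paths_from(t, u + 1)  # emit the next label
        return total

    return paths_from(0, 0)


single = count_lattice_paths(4, 2, blank_durations=(1,))
multi = count_lattice_paths(4, 2, blank_durations=(1, 2))
print(single, multi)
```

Under these assumptions, allowing a two-frame blank alongside the ordinary one-frame blank yields more paths for the same 4 frames and 2 labels than the one-frame blank alone.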

Potential Applications

This technology could be applied in speech recognition systems, audio processing software, and voice-controlled devices.

Problems Solved

This technology helps improve the efficiency and accuracy of processing audio frames, allowing for better performance in tasks such as speech recognition and audio analysis.

Benefits

By letting the model skip frames of less value, multi-frame blank symbols may make processing of auditory input faster and more accurate.

Potential Commercial Applications

Potential commercial applications include speech-to-text software, virtual assistants, and automated transcription services.

Possible Prior Art

Possible prior art includes the use of blank symbols in language modeling or speech recognition systems to improve processing efficiency.

What is the impact of this technology on the accuracy of speech recognition systems?

According to the abstract, the technology may improve the accuracy of speech recognition systems by allowing the model to focus on more relevant audio frames and disregard frames of less value. The abstract does not quantify the improvement.

How does the use of multi-frame blank symbols affect the speed of audio processing in machine learning models?

The use of multi-frame blank symbols can speed up audio processing by enabling the model to skip certain frames, reducing the computational load and processing time.
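
A minimal sketch of how frame skipping might look at decoding time follows. It is an illustration under stated assumptions, not the method claimed in the application: the function greedy_decode, the predict callback, and the ("blank", duration) / ("label", text) token format are all hypothetical, and the scripted toy_predict stands in for a trained model.

```python
def greedy_decode(num_frames, predict, max_symbols_per_frame=10):
    """Greedy decoding loop that skips frames on multi-frame blanks.

    predict(t, history) is a hypothetical stand-in for the model. It
    returns ("label", text) or ("blank", duration), where duration is
    the number of frames to advance.
    """
    t = 0
    labels = []
    frames_visited = []  # frames the model was actually run on
    while t < num_frames:
        frames_visited.append(t)
        for _ in range(max_symbols_per_frame):
            kind, value = predict(t, tuple(labels))
            if kind == "blank":
                t += value  # a duration above 1 omits frames entirely
                break
            labels.append(value)
        else:
            t += 1  # safety: force progress if no blank was produced
    return labels, frames_visited


# Scripted toy outputs keyed by (frame, labels so far); anything not
# listed behaves like an ordinary one-frame blank.
SCRIPT = {
    (0, ()): ("label", "h"),
    (0, ("h",)): ("blank", 3),
    (3, ("h",)): ("label", "i"),
    (3, ("h", "i")): ("blank", 1),
    (4, ("h", "i")): ("blank", 2),
}


def toy_predict(t, history):
    return SCRIPT.get((t, history), ("blank", 1))


labels, visited = greedy_decode(6, toy_predict)
print(labels, visited)
```

In this toy run the model is evaluated on only three of the six frames, because the multi-frame blanks jump over the others. That is the source of the reduced computational load described above.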


Original Abstract Submitted

Systems and methods provide for a machine learning system to train a machine learning model to output a multi-frame blank symbol when processing an auditory input. For example, as the system generates paths through a probability lattice, one or more paths include a multi-frame blank that skips at least one frame associated with the probability lattice. The inclusion of the multi-frame blank symbol may increase a total number of potential paths through the probability lattice, and may allow the machine learning model to more quickly and accurately process audio frames, while disregarding audio frames of less value. In deployment, when an output of the machine learning model indicates a multi-frame blank symbol or token, one or more frames of the auditory input may be omitted from processing.