17959958. AUTOMATIC SPEECH RECOGNITION WITH MULTI-FRAME BLANK DECODING USING NEURAL NETWORKS FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS simplified abstract (NVIDIA Corporation)


AUTOMATIC SPEECH RECOGNITION WITH MULTI-FRAME BLANK DECODING USING NEURAL NETWORKS FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS

Organization Name

NVIDIA Corporation

Inventor(s)

Hainan Xu of Baltimore MD (US)

Boris Ginsburg of Sunnyvale CA (US)

AUTOMATIC SPEECH RECOGNITION WITH MULTI-FRAME BLANK DECODING USING NEURAL NETWORKS FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS - A simplified explanation of the abstract

This abstract first appeared for US patent application 17959958, titled 'AUTOMATIC SPEECH RECOGNITION WITH MULTI-FRAME BLANK DECODING USING NEURAL NETWORKS FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS'.

Simplified Explanation

The patent application describes a machine learning system that trains a model to output a multi-frame blank symbol when processing an auditory input. A multi-frame blank tells the decoder to advance past more than one audio frame at once, which may let the model skip less valuable frames and process the input more quickly and accurately.

  • The system generates paths through a probability lattice, some of which include a multi-frame blank symbol that skips at least one frame associated with the lattice.
  • The inclusion of the multi-frame blank symbol may increase the total number of potential paths through the lattice, and may allow the model to process audio frames more quickly and accurately.
  • When the model outputs a multi-frame blank symbol, one or more frames of the auditory input may be omitted from processing.
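The deployment behaviour in the last bullet can be sketched as a greedy decoding loop. This is a minimal illustration, not the method claimed in the application: the `greedy_decode` function, the `toy_step` stand-in for a trained model, the `SCRIPT` table, and the `("token", label)` / `("blank", duration)` output convention are all hypothetical names introduced here.

```python
from typing import Callable, List, Tuple

# Hypothetical model output convention (an assumption for this sketch):
#   ("token", label)    -> emit a label and stay on the current frame
#   ("blank", duration) -> emit nothing and advance `duration` frames;
#                          duration 1 is a standard blank, duration >= 2
#                          is a multi-frame blank
StepFn = Callable[[int, int], Tuple[str, object]]


def greedy_decode(num_frames: int, step: StepFn,
                  max_symbols_per_frame: int = 10) -> Tuple[List[str], List[int]]:
    """Greedy decoding that honours multi-frame blanks.

    Returns the emitted labels and the list of frames actually processed.
    """
    t = 0
    hypothesis: List[str] = []
    visited: List[int] = []
    while t < num_frames:
        visited.append(t)
        advance = 1  # fallback if the per-frame symbol limit is reached
        for emitted in range(max_symbols_per_frame):
            kind, value = step(t, emitted)
            if kind == "blank":
                advance = max(1, int(value))
                break
            hypothesis.append(str(value))
        t += advance  # frames between t and t + advance are never processed
    return hypothesis, visited


# Toy stand-in for a trained model: frame -> (labels to emit, blank duration).
SCRIPT = {0: (["h", "i"], 3), 3: ([], 1), 4: (["!"], 2)}


def toy_step(t: int, emitted: int) -> Tuple[str, object]:
    labels, duration = SCRIPT[t]
    if emitted < len(labels):
        return ("token", labels[emitted])
    return ("blank", duration)


hypothesis, visited = greedy_decode(6, toy_step)
print(hypothesis)  # ['h', 'i', '!']
print(visited)     # [0, 3, 4]
```

In this toy run the input has six frames, but only frames 0, 3 and 4 are processed; frames 1, 2 and 5 are skipped because a multi-frame blank jumped over them.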

Potential Applications

This technology could be applied in speech recognition systems, audio processing applications, and any other machine learning tasks involving auditory inputs.

Problems Solved

This innovation helps improve the efficiency and accuracy of machine learning models when processing audio data by allowing them to skip less valuable frames.

Benefits

The use of multi-frame blank symbols can lead to faster processing times, more accurate results, and better utilization of computational resources.

Potential Commercial Applications

This technology could be valuable in industries such as speech recognition software, audio transcription services, and automated audio analysis tools.

Possible Prior Art

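Possible prior art includes the established use of single-frame blank symbols in sequence models for speech recognition, such as CTC and RNN-Transducer models, where a blank output advances the decoder by one frame without emitting a label.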

Unanswered Questions

How does this technology compare to existing methods for processing audio data in machine learning models?

This article does not provide a direct comparison to existing methods for processing audio data.

What are the potential limitations or drawbacks of using multi-frame blank symbols in machine learning models for audio processing?

The article does not address any potential limitations or drawbacks of using multi-frame blank symbols in machine learning models.


Original Abstract Submitted

Systems and methods provide for a machine learning system to train a machine learning model to output a multi-frame blank symbol when processing an auditory input. For example, as the system generates paths through a probability lattice, one or more paths include a multi-frame blank that skips at least one frame associated with the probability lattice. The inclusion of the multi-frame blank symbol may increase a total number of potential paths through the probability lattice, and may allow the machine learning model to more quickly and accurately process audio frames, while disregarding audio frames of less value. In deployment, when an output of the machine learning model indicates a multi-frame blank symbol or token, one or more frames of the auditory input may be omitted from processing.
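The statement that a multi-frame blank may increase the number of potential paths through the probability lattice can be checked on a small example. The sketch below assumes a transducer-style lattice, which the abstract does not specify: a label move stays on the same frame, a blank move advances by one of the allowed durations, and a complete path consumes all frames and emits all labels. The `count_paths` function and its parameters are hypothetical names chosen for illustration.

```python
from typing import Sequence


def count_paths(num_frames: int, num_labels: int,
                blank_durations: Sequence[int] = (1,)) -> int:
    """Count complete paths through a (num_frames + 1) x (num_labels + 1) lattice.

    Node (t, u) means "t frames consumed, u labels emitted".
    Allowed moves (assumed for this sketch):
      label: (t, u) -> (t, u + 1), only while t < num_frames
      blank: (t, u) -> (t + d, u) for each d in blank_durations
    """
    T, U = num_frames, num_labels
    ways = [[0] * (U + 1) for _ in range(T + 1)]
    ways[0][0] = 1
    for t in range(T + 1):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            total = 0
            # Arrive via a label move on the same frame.
            if u > 0 and t < T:
                total += ways[t][u - 1]
            # Arrive via a blank of duration d from an earlier frame.
            for d in blank_durations:
                if t - d >= 0:
                    total += ways[t - d][u]
            ways[t][u] = total
    return ways[T][U]


standard = count_paths(4, 2)                 # single-frame blank only
multi = count_paths(4, 2, (1, 2))            # plus a two-frame blank
print(standard, multi)  # 10 31
```

With four frames and two labels, adding a two-frame blank raises the path count from 10 to 31 under these assumptions. The extra paths are exactly those that skip at least one frame.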