17959958. AUTOMATIC SPEECH RECOGNITION WITH MULTI-FRAME BLANK DECODING USING NEURAL NETWORKS FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS simplified abstract (NVIDIA Corporation)
Contents
- 1 AUTOMATIC SPEECH RECOGNITION WITH MULTI-FRAME BLANK DECODING USING NEURAL NETWORKS FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS
- 1.1 Organization Name
- 1.2 Inventor(s)
- 1.3 AUTOMATIC SPEECH RECOGNITION WITH MULTI-FRAME BLANK DECODING USING NEURAL NETWORKS FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS - A simplified explanation of the abstract
- 1.4 Simplified Explanation
- 1.5 Potential Applications
- 1.6 Problems Solved
- 1.7 Benefits
- 1.8 Potential Commercial Applications
- 1.9 Possible Prior Art
- 1.10 Unanswered Questions
- 1.11 Original Abstract Submitted
AUTOMATIC SPEECH RECOGNITION WITH MULTI-FRAME BLANK DECODING USING NEURAL NETWORKS FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS
Organization Name
NVIDIA Corporation
Inventor(s)
Hainan Xu of Baltimore MD (US)
Boris Ginsburg of Sunnyvale CA (US)
AUTOMATIC SPEECH RECOGNITION WITH MULTI-FRAME BLANK DECODING USING NEURAL NETWORKS FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS - A simplified explanation of the abstract
This abstract first appeared for US patent application 17959958 titled 'AUTOMATIC SPEECH RECOGNITION WITH MULTI-FRAME BLANK DECODING USING NEURAL NETWORKS FOR CONVERSATIONAL AI SYSTEMS AND APPLICATIONS'.
Simplified Explanation
The patent application describes a machine learning system that trains a model to output a multi-frame blank symbol when processing an auditory input. This allows the model to skip less valuable audio frames and process the input more quickly and accurately.
- The system generates paths through a probability lattice, some of which include a multi-frame blank symbol that skips at least one frame associated with the lattice.
- The inclusion of the multi-frame blank symbol increases the total number of potential paths through the lattice, improving the model's processing efficiency.
- When the model outputs a multi-frame blank symbol, one or more frames of the auditory input may be omitted from processing.
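The path-count claim above can be illustrated with a toy lattice. The following is a minimal sketch, assuming an RNN-T-style lattice in which emitting a label advances the label index by one and a blank of duration d advances time by d frames; the function, lattice size, and blank durations are illustrative assumptions, not the patent's actual formulation:

```python
from functools import lru_cache

def count_paths(T, U, blank_durations):
    """Count alignment paths from (0, 0) to (T, U) in a time-by-label
    lattice: emitting a label advances u by 1, while a blank of duration d
    advances t by d (illustrative model, not the patent's exact lattice)."""
    @lru_cache(maxsize=None)
    def paths(t, u):
        if t == T and u == U:
            return 1
        total = 0
        if u < U:                     # emit the next label
            total += paths(t, u + 1)
        for d in blank_durations:     # consume d frames with a blank
            if t + d <= T:
                total += paths(t + d, u)
        return total
    return paths(0, 0)

# Adding a 2-frame blank strictly enlarges the set of paths:
standard = count_paths(4, 2, (1,))    # single-frame blank only -> 15
multi = count_paths(4, 2, (1, 2))     # with a 2-frame blank    -> 51
```

Even on this tiny 4-frame, 2-label lattice, allowing a 2-frame blank more than triples the number of potential paths, consistent with the bullet above.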
Potential Applications
This technology could be applied in speech recognition systems, audio processing applications, and any other machine learning tasks involving auditory inputs.
Problems Solved
This innovation helps improve the efficiency and accuracy of machine learning models when processing audio data by allowing them to skip less valuable frames.
Benefits
The use of multi-frame blank symbols can lead to faster processing times, more accurate results, and better utilization of computational resources.
Potential Commercial Applications
This technology could be valuable in industries such as speech recognition software, audio transcription services, and automated audio analysis tools.
Possible Prior Art
Potential prior art includes the established use of blank symbols in sequence modeling tasks (for example, in CTC and transducer models) to improve efficiency and accuracy.
Unanswered Questions
How does this technology compare to existing methods for processing audio data in machine learning models?
This article does not provide a direct comparison to existing methods for processing audio data.
What are the potential limitations or drawbacks of using multi-frame blank symbols in machine learning models for audio processing?
The article does not address any potential limitations or drawbacks of using multi-frame blank symbols in machine learning models.
Original Abstract Submitted
Systems and methods provide for a machine learning system to train a machine learning model to output a multi-frame blank symbol when processing an auditory input. For example, as the system generates paths through a probability lattice, one or more paths include a multi-frame blank that skips at least one frame associated with the probability lattice. The inclusion of the multi-frame blank symbol may increase a total number of potential paths through the probability lattice, and may allow the machine learning model to more quickly and accurately process audio frames, while disregarding audio frames of less value. In deployment, when an output of the machine learning model indicates a multi-frame blank symbol or token, one or more frames of the auditory input may be omitted from processing.
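The deployment behavior in the abstract (omitting frames whenever the model emits a multi-frame blank) can be sketched as a greedy decoding loop. Everything here is an illustrative assumption: the token IDs, the blank-duration table, and the predict callable are invented for the example, not taken from the patent:

```python
# Assumed mapping from blank token id to the number of frames it covers.
BLANK_DURATIONS = {0: 1, 1: 2, 2: 4}

def greedy_decode(frames, predict):
    """Greedy pass over audio frames: a non-blank token is appended to the
    hypothesis; a (multi-frame) blank advances the frame index by its
    duration, so the frames it covers are never processed."""
    t, hypothesis = 0, []
    while t < len(frames):
        token = predict(frames[t])
        if token in BLANK_DURATIONS:
            t += BLANK_DURATIONS[token]   # multi-frame blank: jump ahead
        else:
            hypothesis.append(token)
            t += 1
    return hypothesis

# Toy "model" whose prediction is just the frame value itself:
# frame 7 yields token 7, then a 2-frame blank skips the frame holding 8.
greedy_decode([7, 1, 8, 0], predict=lambda frame: frame)  # -> [7]
```

Note how the frame containing 8 is never passed to the model at all, which is the efficiency the multi-frame blank is intended to provide.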