Google LLC (20240321263). Emitting Word Timings with End-to-End Models simplified abstract
Emitting Word Timings with End-to-End Models
Organization Name
Google LLC
Inventor(s)
Tara N. Sainath of Jersey City NJ (US)
Basilio Garcia Castillo of Mountain View CA (US)
Trevor Strohman of Mountain View CA (US)
Ruoming Pang of New York NY (US)
Emitting Word Timings with End-to-End Models - A simplified explanation of the abstract
This abstract first appeared for US patent application 20240321263 titled 'Emitting Word Timings with End-to-End Models'.
The method processes training examples, each containing audio data of a spoken utterance and its ground truth transcription. For each word in the utterance, it inserts a placeholder symbol before the word, identifies the ground truth alignment for the word's beginning and end, determines the word's beginning and ending word pieces, and generates a constrained alignment for each of those two word pieces. The first constrained alignment is aligned with the ground truth beginning of the word, and the second with its ground truth end. Both are then applied to constrain an attention head of a second pass decoder.
- Each training example contains audio data of a spoken utterance and a ground truth transcription
- A placeholder symbol is inserted before each word
- A ground truth alignment is identified for the beginning and end of each word
- A beginning word piece and an ending word piece are determined for each word, and a constrained alignment is generated for each
- The constrained alignments are applied to constrain an attention head of the second pass decoder
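The per-word steps above can be sketched in Python. This is only an illustration under stated assumptions, not the implementation described in the application: the placeholder string `<wb>`, the fixed-size `toy_word_pieces` splitter (a stand-in for a real word-piece model), the use of frame indices for timings, and all function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

PLACEHOLDER = "<wb>"  # hypothetical symbol; the abstract does not name one


@dataclass
class ConstrainedAlignment:
    token_index: int  # position of the word piece in the target sequence
    frame_index: int  # audio frame the word piece is pinned to


def toy_word_pieces(word: str, size: int = 3) -> List[str]:
    """Stand-in for a real word-piece model: fixed-size character chunks."""
    return [word[i:i + size] for i in range(0, len(word), size)]


def build_targets(
    words: Sequence[str],
    timings: Sequence[Tuple[int, int]],
) -> Tuple[List[str], List[ConstrainedAlignment]]:
    """Build the target token sequence and the constrained alignments.

    timings[i] is the assumed ground truth (start_frame, end_frame) of words[i].
    """
    if len(words) != len(timings):
        raise ValueError("each word needs one (start, end) timing")
    tokens: List[str] = []
    alignments: List[ConstrainedAlignment] = []
    for word, (start, end) in zip(words, timings):
        if not word:
            raise ValueError("empty words are not supported")
        tokens.append(PLACEHOLDER)      # placeholder goes before the word
        begin_idx = len(tokens)         # index of the beginning word piece
        tokens.extend(toy_word_pieces(word))
        end_idx = len(tokens) - 1       # index of the ending word piece
        # First constrained alignment: beginning piece at the word's start.
        alignments.append(ConstrainedAlignment(begin_idx, start))
        # Second constrained alignment: ending piece at the word's end.
        alignments.append(ConstrainedAlignment(end_idx, end))
    return tokens, alignments


tokens, alignments = build_targets(["hello", "world"], [(2, 10), (12, 20)])
print(tokens)  # ['<wb>', 'hel', 'lo', '<wb>', 'wor', 'ld']
print([(a.token_index, a.frame_index) for a in alignments])
# [(1, 2), (2, 10), (4, 12), (5, 20)]
```

In this sketch, a word that maps to a single word piece receives both constrained alignments on the same token. How the application treats that case is not stated in the abstract.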
Potential Applications:
- Speech recognition technology
- Language translation systems
- Voice-controlled devices
Problems Solved:
- Improving accuracy in speech recognition
- Enhancing the performance of language processing systems
Benefits:
- Higher accuracy in transcribing spoken language
- Improved efficiency in language translation
- Enhanced user experience with voice-controlled devices
Commercial Applications:
Title: Enhanced Speech Recognition Technology for Improved Language Processing
This technology can be utilized in various industries such as:
- Customer service for automated call centers
- Language learning applications
- Voice-activated virtual assistants
Questions about the technology:
1. How does this method improve the accuracy of speech recognition systems?
2. What are the potential limitations of using constrained alignments in language processing systems?
Frequently Updated Research: Stay updated on advancements in speech recognition technology and language processing systems to leverage the latest innovations in the field.
Original Abstract Submitted
A method includes receiving a training example that includes audio data representing a spoken utterance and a ground truth transcription. For each word in the spoken utterance, the method also includes inserting a placeholder symbol before the respective word, identifying a respective ground truth alignment for a beginning and an end of the respective word, determining a beginning word piece and an ending word piece, and generating a first constrained alignment for the beginning word piece and a second constrained alignment for the ending word piece. The first constrained alignment is aligned with the ground truth alignment for the beginning of the respective word and the second constrained alignment is aligned with the ground truth alignment for the ending of the respective word. The method also includes constraining an attention head of a second pass decoder by applying the first and second constrained alignments.
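The abstract does not say how the constrained alignments are applied to the attention head. One plausible reading, shown here only as a sketch, is an auxiliary training loss that penalizes the head when it places little weight on the constrained frame. The function name `attention_constraint_loss`, the cross-entropy form of the penalty, and the `(token_index, frame_index)` representation are all assumptions.

```python
import math
from typing import Sequence, Tuple


def attention_constraint_loss(
    attention: Sequence[Sequence[float]],
    constraints: Sequence[Tuple[int, int]],
    eps: float = 1e-9,
) -> float:
    """Mean negative log attention weight at the constrained positions.

    attention[t][f] is the weight one attention head gives to audio frame f
    while emitting token t (each row is assumed to sum to 1).
    constraints holds (token_index, frame_index) pairs, one per constrained
    word piece. Tokens without a constraint contribute nothing.
    """
    if not constraints:
        return 0.0
    total = 0.0
    for token_index, frame_index in constraints:
        weight = attention[token_index][frame_index]
        total += -math.log(max(weight, eps))  # clamp to avoid log(0)
    return total / len(constraints)


# Two tokens attending over four frames (made-up numbers for illustration).
attention = [
    [1.0, 0.0, 0.0, 0.0],      # fully focused on frame 0
    [0.25, 0.25, 0.25, 0.25],  # uniform over all frames
]
print(attention_constraint_loss(attention, [(0, 0)]))  # no penalty
print(attention_constraint_loss(attention, [(1, 3)]))  # penalty of log(4)
```

The loss is zero when the head puts all of its weight on the constrained frame and grows as the weight there shrinks, so adding it to the decoder's training objective would push that head toward the ground truth word boundaries.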