US Patent Application 17828240. CONDITIONAL FACTORIZATION FOR JOINTLY MODELING CODE-SWITCHED AND MONOLINGUAL ASR simplified abstract
Inventor(s)
Chunlei Zhang of Bellevue WA (US)
Brian Yan of Palo Alto CA (US)
This abstract first appeared for US patent application 17828240, titled 'CONDITIONAL FACTORIZATION FOR JOINTLY MODELING CODE-SWITCHED AND MONOLINGUAL ASR'.
Simplified Explanation
This patent application describes a method, apparatus, and computer-readable medium for automatic speech recognition using conditional factorization for bilingual code-switched and monolingual speech.
- The approach receives an audio observation sequence, made up of a plurality of frames, that contains audio in either a first language or a second language.
- The audio observation sequence is mapped into two separate sequences of hidden representations, one produced by an encoder specific to the first language and one by an encoder specific to the second language.
- A label-to-frame sequence is then generated from both sequences of hidden representations using a joint neural-network-based model.
- Because both language-specific encoders process every utterance, the method supports speech recognition of bilingual code-switched speech as well as purely monolingual speech.
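The steps above can be sketched as a toy pipeline. This is not the patented implementation; all dimensions, layer choices, and function names below are hypothetical, with simple random linear layers standing in for the trained language-specific encoders and the joint model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: T frames of D-dim acoustic features,
# H-dim hidden representations, V labels in the combined vocabulary.
T, D, H, V = 6, 16, 8, 10

def make_encoder(in_dim, out_dim):
    """A stand-in 'encoder': one random linear layer with a tanh."""
    W = rng.standard_normal((in_dim, out_dim)) * 0.1
    def encode(x):
        return np.tanh(x @ W)
    return encode

# One encoder per language, as the application describes.
encode_lang1 = make_encoder(D, H)
encode_lang2 = make_encoder(D, H)

# Joint model: concatenate the two hidden sequences frame by frame,
# then project to per-frame label scores.
W_joint = rng.standard_normal((2 * H, V)) * 0.1

def label_to_frame(audio):
    h1 = encode_lang1(audio)                   # (T, H) language-1 hidden states
    h2 = encode_lang2(audio)                   # (T, H) language-2 hidden states
    joint = np.concatenate([h1, h2], axis=1)   # (T, 2H) joint representation
    scores = joint @ W_joint                   # (T, V) per-frame label scores
    return scores.argmax(axis=1)               # one label per frame

audio = rng.standard_normal((T, D))  # toy audio observation sequence
labels = label_to_frame(audio)       # label-to-frame sequence, shape (T,)
```

The point of the sketch is the structure: the same audio passes through both encoders unconditionally, and only the joint model combines them, so monolingual and code-switched input follow one shared path.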
Original Abstract Submitted
A method, apparatus, and non-transitory computer-readable medium for automatic speech recognition using conditional factorization for bilingual code-switched and monolingual speech may include receiving an audio observation sequence comprising a plurality of frames, the audio observation sequence including audio in a first language or a second language. The approach may further include mapping the audio observation sequence into a first sequence of hidden representations, the mapping being generated by a first encoder corresponding to the first language and mapping the audio observation sequence into a second sequence of hidden representations, the mapping being generated by a second encoder corresponding to the second language. The approach may further include generating a label-to-frame sequence based on the first sequence of hidden representations and the second sequence of hidden representations, using a joint neural network based model.