Microsoft Technology Licensing, LLC (20240412736). FACTORIZED NEURAL TRANSDUCER FOR MULTI-SPEAKER SPEECH RECOGNITION
Organization Name
Microsoft Technology Licensing, LLC
Inventor(s)
Zhuo Chen of Woodinville WA (US)
Naoyuki Kanda of Bellevue WA (US)
Takuya Yoshioka of Bellevue WA (US)
This abstract first appeared in US patent application 20240412736, titled 'FACTORIZED NEURAL TRANSDUCER FOR MULTI-SPEAKER SPEECH RECOGNITION'.
Original Abstract Submitted
Systems and methods are provided for instantiating, modifying, adapting, and using a factorized neural transducer for multi-speaker automatic speech recognition. The factorized neural transducer includes a vocabulary predictor with multiple hidden states to process speech from different speakers, a non-vocabulary predictor that facilitates the prediction of channel change tokens indicating a speaker change in input speech data, an encoder used to encode acoustic features of the input speech data, and a joint network.
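The abstract names the components but not how they interact at decoding time. The Python sketch below is one possible reading, not the patented method. It assumes that a channel change token switches which of the vocabulary predictor's hidden states is active, so each speaker's word history is kept separately. Everything in it is hypothetical and illustrative: the token names `<blank>`, `<cc>` and `<word>`, the bigram table standing in for a neural vocabulary predictor, the rule-based non-vocabulary predictor, the pre-computed encoder logits, and the greedy search that emits at most one symbol per frame (a simplification of standard transducer decoding).

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical non-vocabulary symbols: blank, channel change, and "emit a word".
BLANK, CC, WORD = "<blank>", "<cc>", "<word>"
Logits = Dict[str, float]


def softmax(logits: Logits) -> Logits:
    """Numerically stable softmax over a dict of logits."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


class VocabularyPredictor:
    """Toy vocabulary predictor with one hidden state per output channel.

    Each 'hidden state' is just the last word emitted on that channel, and the
    logits come from a hypothetical bigram table instead of a neural network.
    """

    def __init__(self, bigram: Dict[str, Logits], num_channels: int):
        self.bigram = bigram
        self.states = ["<s>"] * num_channels

    def logits(self, channel: int) -> Logits:
        return self.bigram.get(self.states[channel], {})

    def update(self, channel: int, word: str) -> None:
        self.states[channel] = word


class NonVocabularyPredictor:
    """Toy non-vocabulary predictor: discourages two <cc> tokens in a row."""

    def __init__(self) -> None:
        self.last = "<s>"

    def logits(self) -> Logits:
        return {BLANK: 0.0, CC: -5.0 if self.last == CC else 0.0, WORD: 0.0}

    def update(self, token: str) -> None:
        self.last = token


def greedy_decode(
    frames: Sequence[Tuple[Logits, Logits]],
    bigram: Dict[str, Logits],
    num_channels: int = 2,
) -> List[List[str]]:
    """Greedy search emitting at most one symbol per encoder frame.

    Each frame is (non_vocab_logits, vocab_logits) from a hypothetical encoder.
    """
    vocab_pred = VocabularyPredictor(bigram, num_channels)
    non_vocab_pred = NonVocabularyPredictor()
    outputs: List[List[str]] = [[] for _ in range(num_channels)]
    channel = 0
    for nv_enc, vocab_enc in frames:
        # Joint network, non-vocabulary branch: encoder + non-vocabulary predictor.
        nv_pred = non_vocab_pred.logits()
        nv_probs = softmax({k: nv_enc.get(k, 0.0) + b for k, b in nv_pred.items()})
        decision = max(nv_probs, key=nv_probs.get)
        if decision == BLANK:
            continue  # nothing emitted; move to the next frame
        if decision == CC:
            channel = (channel + 1) % num_channels  # switch the active hidden state
            non_vocab_pred.update(CC)
            continue
        if not vocab_enc:
            continue  # no vocabulary scores for this frame
        # Joint network, vocabulary branch: encoder + active channel's predictor.
        lm = vocab_pred.logits(channel)
        word_probs = softmax({w: s + lm.get(w, 0.0) for w, s in vocab_enc.items()})
        word = max(word_probs, key=word_probs.get)
        outputs[channel].append(word)
        vocab_pred.update(channel, word)  # only this channel's state changes
        non_vocab_pred.update(word)
    return outputs


# Usage: two speakers, with <cc> switching between their hidden states.
bigram = {"<s>": {"good": 1.0}, "hello": {"world": 2.0}}
frames = [
    ({WORD: 5.0}, {"hello": 2.0, "hi": 1.0}),
    ({BLANK: 5.0}, {}),
    ({CC: 5.0}, {}),
    ({WORD: 5.0}, {"good": 1.0, "hi": 1.0}),
    ({CC: 5.0}, {}),
    ({WORD: 5.0}, {"world": 1.0, "there": 1.0}),
]
print(greedy_decode(frames, bigram))  # [['hello', 'world'], ['good']]
```

In the usage example, the second `<cc>` returns decoding to the first speaker's channel, whose preserved state ("hello") lets the bigram score favor "world" over "there". With a single shared state, the last word would be "good" and that bonus would not apply.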