Nvidia Corporation (20250078842). MULTI-SPEAKER SPEECH RECOGNITION FACILITATED BY LANGUAGE MODELS
Organization Name
Nvidia Corporation
Inventor(s)
Taejin Park of San Jose CA (US)
Kunal Dhawan of San Jose CA (US)
Nithin Rao Koluguri of Milpitas CA (US)
Jagadeesh Balam of Campbell CA (US)
This abstract first appeared for US patent application 20250078842 titled 'MULTI-SPEAKER SPEECH RECOGNITION FACILITATED BY LANGUAGE MODELS'.
Original Abstract Submitted
Disclosed are apparatuses, systems, and techniques that leverage one or more language models (LMs), such as large language models (LLMs), for efficient multi-speaker speech recognition. The techniques include processing, using a speaker diarization model, an audio feature to generate a first association of the audio feature with one or more prospective speakers, the audio feature being representative of one or more spoken words. The techniques further include providing, to an LM, a first prompt requesting the LM to identify a second association of the one or more spoken words with the one or more prospective speakers and receiving, from the LM, a first response identifying the second association of the one or more spoken words with the one or more prospective speakers. The techniques further include determining, using the first association and the second association, one or more speakers that produced the one or more spoken words.
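The flow described in the abstract can be illustrated with a minimal Python sketch. It is not the applicants' implementation. Everything here is an assumption made for illustration: the function names (`build_prompt`, `combine`, `attribute_speakers`), the prompt wording, the representation of each association as per-word speaker probabilities, and the weighted-average fusion rule. The abstract does not say how the two associations are represented or combined, and the LM is replaced by a stub callable.

```python
from typing import Callable, Dict, List

Probs = Dict[str, float]  # speaker label -> probability (assumed representation)


def build_prompt(words: List[str], speakers: List[str]) -> str:
    """Hypothetical first prompt asking an LM to attribute words to speakers."""
    return (
        f"Speakers: {', '.join(speakers)}\n"
        f"Words: {' '.join(words)}\n"
        "For each word, give the probability that each speaker said it."
    )


def combine(first: List[Probs], second: List[Probs],
            lm_weight: float = 0.5) -> List[str]:
    """Fuse the two associations word by word and pick the best speaker.

    Assumption: a weighted average of the two probability sets. The source
    only says both associations are used, not how.
    """
    decided = []
    for acoustic, lexical in zip(first, second):
        scores = {
            spk: (1.0 - lm_weight) * acoustic[spk]
            + lm_weight * lexical.get(spk, 0.0)
            for spk in acoustic
        }
        decided.append(max(scores, key=scores.get))
    return decided


def attribute_speakers(words: List[str], first: List[Probs],
                       query_lm: Callable[[str], List[Probs]],
                       lm_weight: float = 0.5) -> List[str]:
    """first: association from a diarization model (one entry per word).
    query_lm: any callable that takes a prompt and returns the second
    association in the same per-word format (a stand-in for a real LM)."""
    speakers = sorted(first[0])
    prompt = build_prompt(words, speakers)
    second = query_lm(prompt)
    return combine(first, second, lm_weight)


# Example with made-up numbers and a stub in place of a real LM.
words = ["hi", "there", "hello"]
first = [{"A": 0.9, "B": 0.1}, {"A": 0.45, "B": 0.55}, {"A": 0.2, "B": 0.8}]


def stub_lm(prompt: str) -> List[Probs]:
    # A real system would send `prompt` to an LM and parse its response.
    return [{"A": 0.8, "B": 0.2}, {"A": 0.8, "B": 0.2}, {"A": 0.1, "B": 0.9}]


print(attribute_speakers(words, first, stub_lm))  # ['A', 'A', 'B']
```

In this made-up example, the diarization scores alone would assign the second word to speaker B (0.55 versus 0.45), while the stub LM's lexical evidence shifts the fused decision to speaker A. That is the kind of correction the second association is meant to enable.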