Google LLC (20250078807). Injecting Text in Self-Supervised Speech Pre-training
Injecting Text in Self-Supervised Speech Pre-training
Organization Name
Google LLC
Inventor(s)
Zhehuai Chen of Jersey City NJ (US)
Bhuvana Ramabhadran of Mt. Kisco NY (US)
Andrew M. Rosenberg of Brooklyn NY (US)
Yu Zhang of Mountain View CA (US)
Pedro J. Moreno Mengibar of Jersey City NJ (US)
This abstract first appeared for US patent application 20250078807, titled 'Injecting Text in Self-Supervised Speech Pre-training'.
Original Abstract Submitted
A method includes receiving training data that includes unspoken textual utterances and un-transcribed non-synthetic speech utterances. Each unspoken textual utterance is not paired with any corresponding spoken utterance of non-synthetic speech. Each un-transcribed non-synthetic speech utterance is not paired with a corresponding transcription. The method also includes generating a corresponding synthetic speech representation for each unspoken textual utterance of the received training data using a text-to-speech model. The method also includes pre-training an audio encoder on both the synthetic speech representations generated for the unspoken textual utterances and on the un-transcribed non-synthetic speech utterances, to teach the audio encoder to jointly learn shared speech and text representations.
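
The data flow described in the abstract can be illustrated with a minimal sketch. It is not the applicants' implementation. Every name in it (`toy_tts`, `PretrainExample`, `build_pretraining_pool`) is hypothetical, and the "text-to-speech model" is a deterministic placeholder that emits one feature frame per character. The sketch shows only how text-only and audio-only data could end up in a single pre-training pool; the audio encoder and its self-supervised objective are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Toy stand-in for acoustic features: a list of frames, each a list of floats.
Features = List[List[float]]


@dataclass
class PretrainExample:
    features: Features
    source: str  # "synthetic" (from TTS) or "non_synthetic" (real audio)


def toy_tts(text: str, dim: int = 4) -> Features:
    """Hypothetical placeholder for a text-to-speech model.

    Emits one deterministic frame per character; a real system would
    produce a synthetic speech representation such as spectrogram frames.
    """
    return [[((ord(ch) * (d + 1)) % 97) / 97.0 for d in range(dim)] for ch in text]


def build_pretraining_pool(
    unspoken_text: Sequence[str],
    untranscribed_speech: Sequence[Features],
    tts: Callable[[str], Features] = toy_tts,
) -> List[PretrainExample]:
    """Combine both unpaired data sources into one pool for encoder pre-training."""
    # Text with no paired audio: synthesize a speech representation for each utterance.
    pool = [PretrainExample(tts(text), "synthetic") for text in unspoken_text]
    # Audio with no paired transcription: use the features directly.
    pool += [PretrainExample(list(feats), "non_synthetic") for feats in untranscribed_speech]
    return pool


# Usage with made-up data: two text-only utterances and one audio-only utterance.
texts = ["hello world", "text only"]
speech = [[[0.1, 0.2, 0.3, 0.4] for _ in range(5)]]
pool = build_pretraining_pool(texts, speech)
for example in pool:
    print(example.source, len(example.features))
```

In this sketch the encoder would see both kinds of example through the same feature interface, which is one plausible reading of how a shared speech and text representation could be learned from unpaired data.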