Google LLC (20250078808). Two-Level Text-To-Speech Systems Using Synthetic Training Data
Organization Name
Google LLC
Inventor(s)
Lev Finkelstein of Mountain View CA (US)
Chun-an Chan of Mountain View CA (US)
Norman Casagrande of London (GB)
Yu Zhang of Mountain View CA (US)
Robert Andrew James Clark of Hertfordshire (GB)
This abstract first appeared for US patent application 20250078808, titled 'Two-Level Text-To-Speech Systems Using Synthetic Training Data'.
Original Abstract Submitted
A method includes obtaining training data including a plurality of training audio signals and corresponding transcripts. Each training audio signal is spoken by a target speaker in a first accent/dialect. For each training audio signal of the training data, the method includes generating a training synthesized speech representation spoken by the target speaker in a second accent/dialect different than the first accent/dialect and training a text-to-speech (TTS) system based on the corresponding transcript and the training synthesized speech representation. The method also includes receiving an input text utterance to be synthesized into speech in the second accent/dialect. The method also includes obtaining conditioning inputs that include a speaker embedding and an accent/dialect identifier that identifies the second accent/dialect. The method also includes generating an output audio waveform corresponding to a synthesized speech representation of the input text sequence that clones the voice of the target speaker in the second accent/dialect.
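The data flow described in the abstract can be illustrated with a minimal Python sketch. It is not the applicants' implementation. Every name in it (TrainingExample, Conditioning, build_synthetic_pairs, ToyTTS, halve, and the "en-GB" accent label) is hypothetical, and both the accent converter and the TTS system are placeholders where a real system would use learned models. The sketch only shows the order of the steps: synthesize each recording in the second accent/dialect, train on the transcript and synthetic audio pairs, then synthesize new text given a speaker embedding and an accent/dialect identifier.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Audio = List[float]  # stand-in for a waveform or spectrogram


@dataclass(frozen=True)
class TrainingExample:
    audio: Audio      # spoken by the target speaker in the first accent/dialect
    transcript: str


@dataclass(frozen=True)
class Conditioning:
    speaker_embedding: Tuple[float, ...]
    accent_id: str    # identifies the second accent/dialect


def build_synthetic_pairs(
    data: Sequence[TrainingExample],
    convert_accent: Callable[[Audio, str], Audio],
    second_accent: str,
) -> List[Tuple[str, Audio]]:
    """Pair each transcript with a synthesized rendering in the second accent."""
    return [(ex.transcript, convert_accent(ex.audio, second_accent)) for ex in data]


class ToyTTS:
    """Hypothetical stand-in for the TTS system: it memorizes pairs, nothing more."""

    def __init__(self, accent_id: str) -> None:
        self.accent_id = accent_id
        self._table: Dict[str, Audio] = {}

    def train(self, pairs: Sequence[Tuple[str, Audio]]) -> None:
        # Training uses the transcript and the synthetic audio, not the original.
        for transcript, audio in pairs:
            self._table[transcript] = audio

    def synthesize(self, text: str, cond: Conditioning) -> Audio:
        # A real model would also condition on cond.speaker_embedding;
        # this toy only checks the accent/dialect identifier.
        if cond.accent_id != self.accent_id:
            raise ValueError("toy model only knows the accent it was trained on")
        return list(self._table[text])


def halve(audio: Audio, accent: str) -> Audio:
    """Placeholder accent converter; a real one would be a learned model."""
    return [0.5 * x for x in audio]


data = [TrainingExample([0.5, 1.0], "hello"), TrainingExample([1.0], "goodbye")]
pairs = build_synthetic_pairs(data, halve, "en-GB")

tts = ToyTTS("en-GB")
tts.train(pairs)

cond = Conditioning(speaker_embedding=(0.1, 0.3), accent_id="en-GB")
waveform = tts.synthesize("hello", cond)
```

The point of the design is that the TTS system never trains on the original first-accent recordings directly. It sees only the synthetic second-accent audio paired with the original transcripts, which is what lets it produce the target speaker's voice in an accent/dialect the speaker was not recorded in.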