US Patent Application 18356743. Speech Personalization and Federated Training Using Real World Noise simplified abstract
Speech Personalization and Federated Training Using Real World Noise
Organization Name
Inventor(s)
Matthew Sharifi of Kilchberg (CH)
Speech Personalization and Federated Training Using Real World Noise - A simplified explanation of the abstract
This abstract first appeared for US patent application 18356743, titled 'Speech Personalization and Federated Training Using Real World Noise'.
Simplified Explanation
The patent application describes a method for training a speech model using a voice-enabled device.
- The device receives a fixed set of training utterances, each pairing a transcription with a speech representation of that utterance.
- Noisy audio data is sampled from the device's environment.
- The speech representation of each training utterance is augmented with the sampled noisy audio data to create noisy audio samples.
- Each noisy audio sample is paired with the corresponding transcription.
- A speech model is then trained using these noisy audio samples.
- Because the noise is sampled from the device's own environment, the model is trained on the acoustic conditions it is likely to meet in use, which is intended to make it more accurate and robust there.
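The augmentation and pairing steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method claimed in the application: the abstract does not say how the noise is combined with the speech, so the sketch assumes each speech representation is a raw waveform and that noise is added at a chosen signal-to-noise ratio (SNR). The names `mix_at_snr`, `augment_utterances`, and `snr_dbs` are hypothetical, and the training step itself is left out.

```python
import numpy as np


def mix_at_snr(speech, noise, snr_db, rng):
    """Add a random crop of `noise` to `speech` at the requested SNR in dB.

    Assumption: additive mixing of raw waveforms. The abstract does not
    specify the augmentation technique.
    """
    # Repeat the noise recording if it is shorter than the utterance.
    if len(noise) < len(speech):
        repeats = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, repeats)

    # Pick a random segment of noise with the same length as the utterance.
    start = rng.integers(0, len(noise) - len(speech) + 1)
    crop = noise[start:start + len(speech)]

    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(crop ** 2)
    if speech_power == 0.0 or noise_power == 0.0:
        # Nothing meaningful to scale; return the utterance unchanged.
        return speech.copy()

    # Scale the noise so that speech_power / scaled_noise_power matches snr_db.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * crop


def augment_utterances(utterances, noise, snr_dbs=(0.0, 10.0, 20.0), seed=0):
    """Turn (transcription, speech) pairs into (noisy_audio, transcription) pairs.

    Each utterance yields one noisy sample per SNR level, and every noisy
    sample keeps the transcription of the utterance it came from.
    """
    rng = np.random.default_rng(seed)
    samples = []
    for transcription, speech in utterances:
        for snr_db in snr_dbs:
            noisy = mix_at_snr(speech, noise, snr_db, rng)
            samples.append((noisy, transcription))
    return samples
```

Calling `augment_utterances([("turn on the lights", waveform)], ambient_noise)` with two NumPy arrays would return three noisy copies of the utterance, one per SNR level, each still labeled "turn on the lights". Those pairs would then be passed to whatever training procedure the speech model uses.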
Original Abstract Submitted
A method of training a speech model includes receiving, at a voice-enabled device, a fixed set of training utterances where each training utterance in the fixed set of training utterances includes a transcription paired with a speech representation of the corresponding training utterance. The method also includes sampling noisy audio data from an environment of the voice-enabled device. For each training utterance in the fixed set of training utterances, the method further includes augmenting, using the noisy audio data sampled from the environment of the voice-enabled device, the speech representation of the corresponding training utterance to generate noisy audio samples and pairing each of the noisy audio samples with the corresponding transcription of the corresponding training utterance. The method additionally includes training a speech model on the noisy audio samples generated for each speech representation in the fixed set of training utterances.