US Patent Application 18356743. Speech Personalization and Federated Training Using Real World Noise simplified abstract

From WikiPatents

Speech Personalization and Federated Training Using Real World Noise

Organization Name

Google LLC


Inventor(s)

Matthew Sharifi of Kilchberg (CH)

Victor Carbune of Zürich (CH)

Speech Personalization and Federated Training Using Real World Noise - A simplified explanation of the abstract

This abstract first appeared for US patent application 18356743, titled 'Speech Personalization and Federated Training Using Real World Noise'.

Simplified Explanation

The patent application describes a method for training a speech model using a voice-enabled device.

  • The voice-enabled device receives a fixed set of training utterances, each pairing a transcription with a speech representation of the utterance.
  • Noisy audio data is sampled from the device's environment.
  • The speech representation of each training utterance is augmented with the sampled noisy audio data to create noisy audio samples.
  • Each noisy audio sample is paired with the corresponding transcription.
  • A speech model is then trained using these noisy audio samples.
  • Because the noise comes from the device's own environment, the training data reflects the conditions the device actually operates in, which is intended to make the speech model more accurate and robust there.
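
The augmentation and pairing steps above can be sketched in a few lines of Python. This is only an illustration, not the method claimed in the application: the abstract does not say how the noise is combined with the speech, so mixing at a target signal-to-noise ratio (SNR) is an assumption here, as are the function names `rms`, `mix_at_snr`, and `augment`, the `snrs_db` parameter, and the representation of audio as plain lists of float samples. The training step itself is left out.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a sequence of float samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def mix_at_snr(speech, noise, snr_db, rng):
    """Overlay a randomly chosen noise segment on speech at snr_db (assumed technique)."""
    # Pick a random starting point and wrap around if the noise clip is short.
    start = rng.randrange(len(noise))
    segment = [noise[(start + i) % len(noise)] for i in range(len(speech))]
    speech_rms, noise_rms = rms(speech), rms(segment)
    if speech_rms == 0.0 or noise_rms == 0.0:
        return list(speech)  # nothing meaningful to scale against
    # Choose gain so that 20 * log10(speech_rms / (gain * noise_rms)) == snr_db.
    gain = speech_rms / (noise_rms * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, segment)]


def augment(utterances, noise, snrs_db=(5.0, 15.0), seed=0):
    """Return (noisy_audio, transcription) pairs for a fixed set of utterances.

    utterances: list of (transcription, speech_samples) pairs.
    noise: samples recorded from the device's environment.
    snrs_db: hypothetical mixing levels; one noisy sample is made per level.
    """
    rng = random.Random(seed)
    samples = []
    for transcription, speech in utterances:
        for snr_db in snrs_db:
            noisy = mix_at_snr(speech, noise, snr_db, rng)
            # Each noisy sample keeps the transcription of its source utterance.
            samples.append((noisy, transcription))
    return samples
```

Under these assumptions, calling `augment` on a fixed set of N utterances with two SNR levels yields 2N training pairs, each of which reuses the transcription of the utterance it was derived from.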


Original Abstract Submitted

A method of training a speech model includes receiving, at a voice-enabled device, a fixed set of training utterances where each training utterance in the fixed set of training utterances includes a transcription paired with a speech representation of the corresponding training utterance. The method also includes sampling noisy audio data from an environment of the voice-enabled device. For each training utterance in the fixed set of training utterances, the method further includes augmenting, using the noisy audio data sampled from the environment of the voice-enabled device, the speech representation of the corresponding training utterance to generate noisy audio samples and pairing each of the noisy audio samples with the corresponding transcription of the corresponding training utterance. The method additionally includes training a speech model on the noisy audio samples generated for each speech representation in the fixed set of training utterances.