US Patent Application 18446623. SELF-SUPERVISED SPEECH REPRESENTATIONS FOR FAKE AUDIO DETECTION simplified abstract

From WikiPatents

SELF-SUPERVISED SPEECH REPRESENTATIONS FOR FAKE AUDIO DETECTION

Organization Name

GOOGLE LLC

Inventor(s)

Joel Shor of Mountain View CA (US)

Alanna Foster Slocum of San Francisco CA (US)

SELF-SUPERVISED SPEECH REPRESENTATIONS FOR FAKE AUDIO DETECTION - A simplified explanation of the abstract

This abstract first appeared for US patent application 18446623, titled 'SELF-SUPERVISED SPEECH REPRESENTATIONS FOR FAKE AUDIO DETECTION'.

Simplified Explanation

The patent application describes a method for detecting synthetic speech in audio data obtained by a user device. Here are the key points:

  • The method receives audio data, obtained by a user device, that characterizes speech.
  • A trained self-supervised model generates multiple audio feature vectors, each representing the audio features of one portion of the audio data.
  • A shallow discriminator model then generates a score indicating the presence of synthetic speech in the audio data, based on the audio features of each feature vector.
  • The score is compared against a synthetic speech detection threshold.
  • If the score satisfies the threshold, the speech in the audio data is determined to be synthetic.
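
The steps above can be sketched in Python. This is only an illustrative sketch under stated assumptions, not the method claimed in the application: the abstract does not specify the self-supervised model, the discriminator architecture, or what "satisfies the threshold" means. Here a toy per-frame statistic stands in for the trained self-supervised model, the shallow discriminator is assumed to be mean pooling followed by logistic regression, and "satisfies" is assumed to mean score >= threshold. All function names, the frame size, the weights, and the threshold value are hypothetical.

```python
import math
from typing import List, Sequence

FRAME_SIZE = 4  # hypothetical number of samples per "portion" of audio


def extract_feature_vectors(audio: Sequence[float],
                            frame_size: int = FRAME_SIZE) -> List[List[float]]:
    """Toy stand-in for the trained self-supervised model (assumption).

    Returns one feature vector [mean, energy] per non-overlapping frame.
    A real system would use learned speech representations instead.
    """
    vectors = []
    for start in range(0, len(audio) - frame_size + 1, frame_size):
        frame = audio[start:start + frame_size]
        mean = sum(frame) / frame_size
        energy = sum(x * x for x in frame) / frame_size
        vectors.append([mean, energy])
    return vectors


def shallow_discriminator_score(vectors: Sequence[Sequence[float]],
                                weights: Sequence[float],
                                bias: float) -> float:
    """Assumed shallow discriminator: mean-pool the vectors, then apply
    logistic regression to get a score in (0, 1)."""
    if not vectors:
        raise ValueError("no feature vectors to score")
    dim = len(vectors[0])
    pooled = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    logit = sum(w * x for w, x in zip(weights, pooled)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def is_synthetic(audio: Sequence[float],
                 weights: Sequence[float],
                 bias: float,
                 threshold: float = 0.5) -> bool:
    """Run the pipeline: features -> score -> threshold comparison.

    Assumes the threshold is "satisfied" when score >= threshold.
    """
    vectors = extract_feature_vectors(audio)
    score = shallow_discriminator_score(vectors, weights, bias)
    return score >= threshold
```

In this sketch the discriminator parameters are passed in directly; in practice they would come from training on labeled real and synthetic speech, and the threshold would be tuned on held-out data.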


Original Abstract Submitted

A method for determining synthetic speech includes receiving audio data characterizing speech in audio data obtained by a user device. The method also includes generating, using a trained self-supervised model, a plurality of audio feature vectors each representative of audio features of a portion of the audio data. The method also includes generating, using a shallow discriminator model, a score indicating a presence of synthetic speech in the audio data based on the corresponding audio features of each audio feature vector of the plurality of audio feature vectors. The method also includes determining whether the score satisfies a synthetic speech detection threshold. When the score satisfies the synthetic speech detection threshold, the method includes determining that the speech in the audio data obtained by the user device comprises synthetic speech.