Google LLC (20240203426). TARGETED VOICE SEPARATION BY SPEAKER CONDITIONED ON SPECTROGRAM MASKING simplified abstract

From WikiPatents

TARGETED VOICE SEPARATION BY SPEAKER CONDITIONED ON SPECTROGRAM MASKING

Organization Name

Google LLC

Inventor(s)

Quan Wang of Hoboken NJ (US)

Prashant Sridhar of New York NY (US)

Ignacio Lopez Moreno of New York NY (US)

Hannah Muckenhirn of Martigny (CH)

TARGETED VOICE SEPARATION BY SPEAKER CONDITIONED ON SPECTROGRAM MASKING - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240203426 titled 'TARGETED VOICE SEPARATION BY SPEAKER CONDITIONED ON SPECTROGRAM MASKING'.

The patent application describes techniques for processing audio data to generate refined versions that isolate utterances of a single human speaker.

  • Processing of audio data to isolate utterances of a single human speaker
  • Generating refined versions of audio data using a trained voice filter model
  • Applying a mask to the spectrogram representation of the audio data, where the mask is generated from that spectrogram and a speaker embedding for the target speaker
  • Applying an inverse of the frequency transformation to generate the refined audio data
  • Enhancing the quality of audio data by isolating specific speaker utterances
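
The steps listed above (frequency transformation, masking, inverse transformation) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the method claimed in the application: the short-time Fourier transform, the frame length, the hop size, and the function names `stft`, `istft`, `refine`, and `keep_below_1khz` are all choices made here. In particular, `keep_below_1khz` is a hypothetical stand-in for the trained voice filter model. It returns a fixed frequency mask and ignores any speaker embedding.

```python
import numpy as np

FRAME_LEN = 256  # samples per analysis frame (assumed value)
HOP = 128        # hop between frames, i.e. 50% overlap (assumed value)
WINDOW = np.hanning(FRAME_LEN + 1)[:-1]  # periodic Hann window


def stft(audio):
    """Frequency transformation: windowed frames -> complex spectrogram."""
    n_frames = 1 + (len(audio) - FRAME_LEN) // HOP
    frames = np.stack(
        [audio[i * HOP : i * HOP + FRAME_LEN] * WINDOW for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, FRAME_LEN // 2 + 1)


def istft(spec):
    """Inverse transformation via weighted overlap-add."""
    frames = np.fft.irfft(spec, n=FRAME_LEN, axis=1)
    length = HOP * (len(frames) - 1) + FRAME_LEN
    audio = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        span = slice(i * HOP, i * HOP + FRAME_LEN)
        audio[span] += frame * WINDOW
        norm[span] += WINDOW ** 2
    # Dividing by the summed squared window undoes both windowing steps.
    return audio / np.maximum(norm, 1e-8)


def refine(audio, mask_fn):
    """Spectrogram -> mask -> masked spectrogram -> refined audio."""
    spec = stft(audio)
    mask = mask_fn(np.abs(spec))  # values in [0, 1], same shape as spec
    return istft(mask * spec)


def keep_below_1khz(magnitude, sample_rate=16000):
    """Hypothetical stand-in for the trained model: a fixed low-pass mask."""
    freqs = np.fft.rfftfreq(FRAME_LEN, d=1.0 / sample_rate)
    return np.broadcast_to(freqs < 1000.0, magnitude.shape).astype(float)


# Toy mixture: a 440 Hz "target" tone plus a 3000 Hz "interfering" tone.
sample_rate = 16000
t = np.arange(2048) / sample_rate
target = np.sin(2 * np.pi * 440 * t)
interference = np.sin(2 * np.pi * 3000 * t)
refined = refine(target + interference, keep_below_1khz)
```

In this toy setup, `refined` should closely track `target` away from the first and last frames, because the mask removes the bins that hold the 3000 Hz tone. A real system would replace the fixed mask with the output of a trained model conditioned on the target speaker.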

Potential Applications:

  • Speech recognition technology
  • Voice-controlled devices
  • Audio editing software
  • Speaker identification systems

Problems Solved:

  • Difficulty in isolating specific speaker utterances in audio data
  • Limited accuracy of voice recognition systems
  • Poor quality of audio recordings

Benefits:

  • Enhanced audio processing capabilities
  • Improved accuracy in speaker identification
  • Enhanced user experience in voice-controlled devices

Commercial Applications:

Title: "Advanced Audio Processing Technology for Speaker Isolation"

This technology can be used in various industries, such as telecommunications, security, entertainment, and customer service, to enhance audio processing capabilities and improve speaker identification systems.

Questions about the technology:

1. How does this technology improve the accuracy of voice recognition systems?

This technology isolates specific speaker utterances, reducing background noise and improving the accuracy of voice recognition systems.

2. What are the potential applications of this technology in the entertainment industry?

This technology can be used in audio editing software to isolate specific speaker utterances in recordings, enhancing the quality of audio content.


Original Abstract Submitted

Techniques are disclosed that enable processing of audio data to generate one or more refined versions of audio data, where each of the refined versions of audio data isolates one or more utterances of a single respective human speaker. Various implementations generate a refined version of audio data that isolates utterance(s) of a single human speaker by processing a spectrogram representation of the audio data (generated by processing the audio data with a frequency transformation) using a mask generated by processing the spectrogram of the audio data and a speaker embedding for the single human speaker using a trained voice filter model. Output generated over the trained voice filter model is processed using an inverse of the frequency transformation to generate the refined audio data.
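
The abstract states that the mask is generated from both the spectrogram and a speaker embedding, but it does not say how the two inputs are combined. One possibility, assumed here purely for illustration, is to repeat the embedding for every time frame, concatenate it with that frame's spectrogram values, and map the result to per-bin values between 0 and 1. In the sketch below, `toy_voice_filter`, the single linear layer, the random untrained weights, and all dimensions are hypothetical. They are not taken from the application.

```python
import numpy as np


def toy_voice_filter(magnitude, speaker_embedding, weights, bias):
    """Hypothetical stand-in for a trained voice filter model.

    magnitude:          (n_frames, n_bins) magnitude spectrogram
    speaker_embedding:  (emb_dim,) fixed-length vector for the target speaker
    weights, bias:      parameters of a single linear layer (assumed design)
    Returns a soft mask with the same shape as `magnitude`.
    """
    n_frames = magnitude.shape[0]
    # Repeat the embedding so every frame is conditioned on the same speaker.
    tiled = np.tile(speaker_embedding, (n_frames, 1))
    features = np.concatenate([magnitude, tiled], axis=1)
    logits = features @ weights + bias
    return 1.0 / (1.0 + np.exp(-logits))  # sigmoid keeps values in (0, 1)


# Illustrative sizes and random (untrained) parameters.
rng = np.random.default_rng(0)
n_frames, n_bins, emb_dim = 15, 129, 8
magnitude = rng.random((n_frames, n_bins))
weights = 0.1 * rng.standard_normal((n_bins + emb_dim, n_bins))
bias = np.zeros(n_bins)

speaker_a = rng.standard_normal(emb_dim)
speaker_b = rng.standard_normal(emb_dim)

mask_a = toy_voice_filter(magnitude, speaker_a, weights, bias)
mask_b = toy_voice_filter(magnitude, speaker_b, weights, bias)
masked_a = mask_a * magnitude  # masked spectrogram for speaker A
```

The same spectrogram yields a different mask for each embedding, which is the sense in which the separation is conditioned on the speaker. With untrained weights the masks carry no meaning. A trained model would learn parameters so that the mask keeps the time-frequency regions dominated by the target speaker.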