17931755. SOURCE SPEECH MODIFICATION BASED ON AN INPUT SPEECH CHARACTERISTIC simplified abstract (QUALCOMM Incorporated)

SOURCE SPEECH MODIFICATION BASED ON AN INPUT SPEECH CHARACTERISTIC

Organization Name

QUALCOMM Incorporated

Inventor(s)

Kyungguen Byun of Seoul (KR)

Sunkuk Moon of San Diego CA (US)

Erik Visser of San Diego CA (US)

SOURCE SPEECH MODIFICATION BASED ON AN INPUT SPEECH CHARACTERISTIC - A simplified explanation of the abstract

This abstract first appeared for US patent application 17931755 titled 'SOURCE SPEECH MODIFICATION BASED ON AN INPUT SPEECH CHARACTERISTIC'.

Simplified Explanation

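The device described in the patent application analyzes the audio spectrum of input speech to detect a characteristic of that speech, uses the characteristic to select one or more reference embeddings, and then processes a representation of source speech with those embeddings to generate the audio spectrum of the output speech.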

  • Process the input audio spectrum of the input speech
  • Detect a first characteristic associated with the input speech
  • Select one or more reference embeddings, from among multiple reference embeddings, based at least in part on the first characteristic
  • Process a representation of the source speech using the selected reference embeddings
  • Generate the output audio spectrum of the output speech
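
The steps above can be illustrated with a minimal Python sketch. It is not the method claimed in the application: the abstract does not say what the characteristic is, how it is detected, or how the embeddings are applied. The labels ("energetic", "calm"), the energy threshold, the embedding values, and every function name below are hypothetical, chosen only to show the detect, select, and generate flow.

```python
from statistics import fmean

# Hypothetical reference embeddings, keyed by a characteristic label.
# The abstract only says there are "multiple reference embeddings";
# these labels and values are illustrative assumptions.
REFERENCE_EMBEDDINGS = {
    "energetic": [[0.5, 0.3, 0.1], [0.7, 0.1, 0.1]],
    "calm": [[-0.1, 0.0, 0.1]],
}


def detect_characteristic(input_spectrum):
    """Toy detector (assumption): label the input speech by the mean
    magnitude across all frames and frequency bins."""
    mean_magnitude = fmean(v for frame in input_spectrum for v in frame)
    return "energetic" if mean_magnitude > 0.5 else "calm"


def select_reference_embeddings(characteristic):
    """Select one or more reference embeddings based on the characteristic."""
    return REFERENCE_EMBEDDINGS[characteristic]


def generate_output_spectrum(source_representation, embeddings):
    """Toy conditioning (assumption): average the selected embeddings,
    add the result to every source frame, and clamp magnitudes at zero."""
    combined = [fmean(column) for column in zip(*embeddings)]
    return [
        [max(0.0, value + offset) for value, offset in zip(frame, combined)]
        for frame in source_representation
    ]


def modify_source_speech(input_spectrum, source_representation):
    """Run the full flow: detect, select, then generate."""
    characteristic = detect_characteristic(input_spectrum)
    embeddings = select_reference_embeddings(characteristic)
    return generate_output_spectrum(source_representation, embeddings)
```

Calling modify_source_speech with a high-magnitude input spectrum and a source representation of three-bin frames would select the two "energetic" embeddings and shift every source frame by their average. A real system would presumably use learned models for both detection and generation; simple arithmetic stands in for them here so the control flow stays visible.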

Potential Applications

  • Speech recognition technology
  • Voice cloning and synthesis
  • Audio editing and manipulation software

Problems Solved

  • Improving speech processing accuracy
  • Enhancing voice recognition capabilities
  • Facilitating audio content creation

Benefits

  • Enhanced user experience in voice-controlled devices
  • Improved speech-to-text conversion accuracy
  • Increased efficiency in audio production and editing


Original Abstract Submitted

A device includes one or more processors configured to process an input audio spectrum of input speech to detect a first characteristic associated with the input speech. The one or more processors are also configured to select, based at least in part on the first characteristic, one or more reference embeddings from among multiple reference embeddings. The one or more processors are further configured to process a representation of source speech, using the one or more reference embeddings, to generate an output audio spectrum of output speech.