20240013784. SPEAKER RECOGNITION ADAPTATION simplified abstract (Amazon Technologies, Inc.)

From WikiPatents

SPEAKER RECOGNITION ADAPTATION

Organization Name

Amazon Technologies, Inc.

Inventor(s)

Zeya Chen of San Jose CA (US)

SPEAKER RECOGNITION ADAPTATION - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240013784, titled 'SPEAKER RECOGNITION ADAPTATION'.

Simplified Explanation

The abstract describes techniques for generating speaker recognition data for multiple words using a transformation model.

  • During a speaker recognition enrollment process, audio data corresponding to one or more prompted spoken inputs is received.
  • First speaker recognition data, specific to the first word, is generated from the prompted spoken input(s).
  • The user can indicate that speaker recognition processing is to be performed using a second word.
  • Instead of going through the enrollment process again, a transformation model is applied to the first speaker recognition data to generate second speaker recognition data specific to the second word.
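The steps above can be sketched in code. This is a minimal illustration, not Amazon's actual implementation: it assumes the "speaker recognition data" is a fixed-length speaker embedding and the "transformation model" is a learned affine map (weights `W`, bias `b`); all names, shapes, and the random "learned" parameters here are assumptions for illustration.

```python
import math
import random

EMB_DIM = 8  # embedding dimension (assumed for illustration)
random.seed(0)

# First speaker recognition data: an embedding produced during
# enrollment from prompted utterances of the first word.
first_embedding = [random.gauss(0, 1) for _ in range(EMB_DIM)]

# Transformation model parameters. In practice these would be learned
# offline (e.g., from many speakers' paired word-1/word-2 enrollments);
# random values stand in for them here.
W = [[random.gauss(0, 1) for _ in range(EMB_DIM)] for _ in range(EMB_DIM)]
b = [random.gauss(0, 1) for _ in range(EMB_DIM)]

def transform(embedding):
    """Apply the transformation model: an affine map W @ e + b."""
    return [
        sum(W[i][j] * embedding[j] for j in range(EMB_DIM)) + b[i]
        for i in range(EMB_DIM)
    ]

# Second speaker recognition data, generated without re-enrollment.
second_embedding = transform(first_embedding)

def cosine_similarity(a, b):
    """Score a test utterance's embedding against enrolled data."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

At runtime, an utterance of the second word would be embedded and scored against `second_embedding` with `cosine_similarity` and a decision threshold, just as a word-1 utterance would be scored against `first_embedding`.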

Potential Applications:

  • Speaker recognition systems for authentication and access control.
  • Voice-controlled devices and virtual assistants.
  • Speech analysis and transcription services.

Problems Solved:

  • Reduces the need for users to go through the enrollment process multiple times for different words.
  • Improves the efficiency and convenience of speaker recognition systems.

Benefits:

  • Simplifies the user experience by allowing the use of multiple words without repeating the enrollment process.
  • Saves time and effort for users and reduces the burden on the system.
  • Extends speaker recognition to a wider range of words and phrases without additional enrollment.


Original Abstract Submitted

techniques for generating, from first speaker recognition data corresponding to at least a first word, second speaker recognition data corresponding to at least a second word are described. during a speaker recognition enrollment process, a device receives audio data corresponding to one or more prompted spoken inputs comprising the at least first word. using the prompted spoken input(s), the first speaker recognition data (specific to that least first word) is generated. sometime thereafter, a user may indicate that speaker recognition processing is to be performed using at least a second word. rather than have the user go through the speaker recognition enrollment process a second time, the device (or a system) may apply a transformation model to the first speaker recognition data to generate second speaker recognition data specific to the at least second word.