Amazon Technologies, Inc. (20240321264). AUTOMATIC SPEECH RECOGNITION simplified abstract

From WikiPatents

AUTOMATIC SPEECH RECOGNITION

Organization Name

Amazon Technologies, Inc.

Inventor(s)

Jing Liu of Pittsburgh PA (US)

Feng-Ju Chang of Pittsburgh PA (US)

Athanasios Mouchtaris of Pittsburgh PA (US)

Martin Radfar of North York (CA)

Maurizio Omologo of Altopiano della Vigolana (IT)

Siegfried Kunzmann of Heidelberg (DE)

AUTOMATIC SPEECH RECOGNITION - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240321264 titled 'AUTOMATIC SPEECH RECOGNITION'.

The abstract describes techniques for automatic speech recognition (ASR) that integrate contextual information from user profile data into audio encoding data to predict tokens corresponding to spoken input. The user profile data includes personalized words like contact names and device names, and the ASR component determines word embedding data using these personalized words. The ASR component applies attention to audio frames relevant to the personalized words based on processing the audio encoding data and word embedding data.

  • Integration of contextual information from user profile data into audio encoding data for ASR
  • Prediction of tokens corresponding to spoken input using personalized words from user profile data
  • Determination of word embedding data based on personalized words
  • Application of attention to audio frames relevant to personalized words
  • Processing of audio encoding data and word embedding data to enhance ASR accuracy
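The attention step summarized above can be illustrated with a small, generic cross-attention sketch. This is not the method claimed in the application, which the abstract does not spell out; it only assumes that each audio frame encoding acts as a query over the personalized word embeddings and that the resulting context vector is added back onto the frame. The function name `bias_audio_encodings`, the example words, and the two-dimensional toy vectors are all hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bias_audio_encodings(
    audio_frames: List[Vector], word_embeddings: List[Vector]
) -> Tuple[List[Vector], List[List[float]]]:
    """Hypothetical helper: scaled dot-product cross-attention.

    Each audio frame encoding (query) attends over the personalized word
    embeddings (keys and values). The attended context vector is added to
    the frame, giving a context-aware encoding. Returns the biased frames
    and the per-frame attention weights over the words.
    """
    dim = len(word_embeddings[0])
    scale = math.sqrt(dim)
    biased_frames, attention = [], []
    for frame in audio_frames:
        scores = [dot(frame, word) / scale for word in word_embeddings]
        weights = softmax(scores)
        context = [
            sum(w * word[i] for w, word in zip(weights, word_embeddings))
            for i in range(dim)
        ]
        biased_frames.append([f + c for f, c in zip(frame, context)])
        attention.append(weights)
    return biased_frames, attention


# Toy data (assumed, not from the application): two personalized words
# and three audio frames, all in a 2-dimensional embedding space.
personalized_words = ["Alice", "kitchen lamp"]
word_embeddings = [[1.0, 0.0], [0.0, 1.0]]
audio_frames = [[4.0, 0.0], [0.0, 4.0], [0.0, 0.0]]

biased_frames, attention = bias_audio_encodings(audio_frames, word_embeddings)

for index, weights in enumerate(attention):
    best = max(range(len(weights)), key=weights.__getitem__)
    print(f"frame {index}: strongest match {personalized_words[best]!r}")
```

In this toy setup the first frame attends mostly to the first word and the second frame to the second word, while the all-zero third frame spreads its attention evenly. A real system would learn the embeddings and projections rather than use hand-picked vectors.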

Potential Applications:

  • Improved accuracy in speech recognition systems
  • Personalized voice assistants
  • Enhanced user experience in voice-controlled devices

Problems Solved:

  • Difficulty in accurately recognizing personalized words in speech input
  • Lack of context awareness in traditional ASR systems

Benefits:

  • Increased accuracy and efficiency in speech recognition
  • Enhanced user experience with personalized voice interactions
  • Improved performance of voice-controlled devices

Commercial Applications:

Title: Personalized Speech Recognition Technology for Enhanced User Experience

Potential commercial uses include:

  • Voice-controlled smart devices
  • Virtual assistants in smartphones and smart home devices
  • Customer service chatbots with personalized responses

Questions about Personalized Speech Recognition Technology:

1. How does the integration of user profile data improve the accuracy of speech recognition?

  - By incorporating personalized words from user profiles, the system can better predict and recognize specific tokens in spoken input, leading to higher accuracy.

2. What are the potential privacy concerns associated with using user profile data in speech recognition technology?

  - Privacy concerns may arise regarding the collection and storage of personalized information for speech recognition purposes. It is essential to ensure data security and user consent in handling such sensitive data.


Original Abstract Submitted

Techniques for performing automatic speech recognition (ASR) are described. In some embodiments, an ASR component integrates contextual information from user profile data into audio encoding data to predict a token(s) corresponding to a spoken input. The user profile data may include personalized words, such as, contact names, device names, etc. The ASR component determines word embedding data using the personalized words. The ASR component is configured to apply attention to audio frames that are relevant to the personalized words based on processing the audio encoding data and the word embedding data.