20240021190. Sub-models for Neural Contextual Biasing with Attention and Embedding Space simplified abstract (GOOGLE LLC)


Sub-models for Neural Contextual Biasing with Attention and Embedding Space

Organization Name

GOOGLE LLC

Inventor(s)

Fadi Biadsy of Mountain View, CA (US)

Pedro Jose Moreno Mengibar of Jersey City, NJ (US)

Sub-models for Neural Contextual Biasing with Attention and Embedding Space - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240021190, titled 'Sub-models for Neural Contextual Biasing with Attention and Embedding Space'.

Simplified Explanation

The abstract describes a method for training a sub-model that biases a base speech recognition model toward a particular domain. The method obtains a base speech recognition model trained on non-biased data and a set of training utterances representative of the particular domain, each consisting of audio data and a ground truth transcription. For each training utterance, an embedding encoder determines a corresponding document embedding from the ground truth transcription. The sub-model is then trained on these document embeddings to bias the base speech recognition model to recognize speech in the particular domain.

  • The method involves training a sub-model to bias a base speech recognition model.
  • The base speech recognition model is trained on non-biased data.
  • A set of training utterances representative of a particular domain is obtained.
  • Each training utterance includes audio data and a ground truth transcription.
  • A corresponding document embedding is determined from each training utterance's ground truth transcription using an embedding encoder.
  • The sub-model is trained on these document embeddings to bias the base speech recognition model to recognize speech in the particular domain (a rough sketch follows this list).
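The abstract does not include an implementation, but the training loop it describes can be illustrated with a rough Python/PyTorch sketch. Everything below is an assumption for illustration: the module names (BiasingSubModel, train_step), the dimensions, and the interfaces of base_encoder, embedding_encoder, and decoder are hypothetical stand-ins, not the patented design. The sketch shows the key idea: the base model stays fixed while a small attention-based sub-model, conditioned on a document embedding derived from the ground truth transcription, learns to bias the base model's representations toward the target domain.

import torch
import torch.nn as nn


class BiasingSubModel(nn.Module):
    """Hypothetical biasing sub-model: attends from base-model encoder states
    to a projected document embedding and adds the result as a residual bias."""

    def __init__(self, hidden_dim: int, doc_dim: int, num_heads: int = 4):
        super().__init__()
        self.doc_proj = nn.Linear(doc_dim, hidden_dim)   # map document embedding into the model's space
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.out = nn.Linear(hidden_dim, hidden_dim)

    def forward(self, encoder_states: torch.Tensor, doc_embedding: torch.Tensor) -> torch.Tensor:
        # encoder_states: (batch, time, hidden_dim); doc_embedding: (batch, doc_dim)
        doc = self.doc_proj(doc_embedding).unsqueeze(1)  # (batch, 1, hidden_dim)
        biased, _ = self.attn(encoder_states, doc, doc)  # attend to the domain context vector
        return encoder_states + self.out(biased)         # residual bias on the base model's states


def train_step(base_encoder, embedding_encoder, decoder, sub_model,
               audio, transcript_text, transcript_tokens, optimizer, loss_fn):
    """One training step; only sub_model's parameters are in the optimizer,
    e.g. optimizer = torch.optim.Adam(sub_model.parameters(), lr=1e-4),
    so the base speech recognition model itself is not updated."""
    with torch.no_grad():
        states = base_encoder(audio)                     # frozen base speech recognition model
        doc_emb = embedding_encoder(transcript_text)     # document embedding from the ground truth transcription
    biased_states = sub_model(states, doc_emb)           # sub-model biases the base representations
    logits = decoder(biased_states)
    loss = loss_fn(logits, transcript_tokens)            # e.g. a CTC or cross-entropy sequence loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Freezing the base model and updating only the sub-model is consistent with the abstract's framing of training "a sub-model to bias the base speech recognition model", although the abstract does not state which parameters are updated or exactly how the attention over the embedding space is wired.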

Potential applications of this technology:

  • Improving speech recognition accuracy in specific domains or industries.
  • Enhancing voice assistants or speech-to-text systems for specialized tasks.
  • Customizing speech recognition models for specific user needs or preferences.

Problems solved by this technology:

  • Overcoming bias or inaccuracies in speech recognition models trained on general data.
  • Addressing the challenges of recognizing speech in specific domains with unique vocabulary or accents.
  • Improving the performance of speech recognition systems in specialized applications.

Benefits of this technology:

  • Increased accuracy and reliability of speech recognition in specific domains.
  • Customization and adaptation of speech recognition models to specific user requirements.
  • Enhanced user experience and productivity in voice-controlled applications or services.


Original Abstract Submitted

A method for training a sub-model for contextual biasing for speech recognition includes obtaining a base speech recognition model trained on non-biased data. The method includes obtaining a set of training utterances representative of a particular domain, each training utterance in the set of training utterances including audio data characterizing the training utterance and a ground truth transcription of the training utterance. The method further includes, for each corresponding training utterance in the set of training utterances, determining, using an embedding encoder, a corresponding document embedding from the ground truth transcription of the corresponding training utterance. The method includes training, using the corresponding document embeddings determined from the ground truth transcriptions of the set of training utterances, a sub-model to bias the base speech recognition model to recognize speech in the particular domain.
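The abstract leaves the embedding encoder itself unspecified. As a purely illustrative stand-in (not the patented encoder), a document embedding for a transcription could be produced by averaging learned token embeddings; in practice a pretrained sentence or document encoder could play this role.

import torch
import torch.nn as nn


class SimpleEmbeddingEncoder(nn.Module):
    """Hypothetical stand-in for the embedding encoder: mean-pooled token embeddings."""

    def __init__(self, vocab_size: int, doc_dim: int):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, doc_dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, tokens) -> document embedding: (batch, doc_dim)
        return self.token_emb(token_ids).mean(dim=1)


# Toy usage: embed a tokenized ground truth transcription (token ids are made up here).
encoder = SimpleEmbeddingEncoder(vocab_size=1000, doc_dim=64)
tokens = torch.randint(0, 1000, (1, 12))   # stand-in for one tokenized transcription
doc_embedding = encoder(tokens)            # shape: (1, 64)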