Google LLC (20240304181). CONNECTING DIFFERENT ASR APPLICATION DOMAINS WITH SPEAKER-TAGS simplified abstract

From WikiPatents

CONNECTING DIFFERENT ASR APPLICATION DOMAINS WITH SPEAKER-TAGS

Organization Name

Google LLC

Inventor(s)

Guru Prakash Arumugam of Sunnyvale CA (US)

Shuo-yiin Chang of Sunnyvale CA (US)

Shaan Jagdeep Patrick Bijwadia of San Francisco CA (US)

Weiran Wang of San Jose CA (US)

Quan Wang of Hoboken NJ (US)

Rohit Prakash Prabhavalkar of Palo Alto CA (US)

Tara N. Sainath of Jersey City NJ (US)

CONNECTING DIFFERENT ASR APPLICATION DOMAINS WITH SPEAKER-TAGS - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240304181, titled 'CONNECTING DIFFERENT ASR APPLICATION DOMAINS WITH SPEAKER-TAGS'.

Simplified Explanation:

The patent application describes a method for training a multi-domain speech recognition model on training samples drawn from multiple domains, where each sample pairs audio data for an utterance with a corresponding transcription. The method re-labels each sample by annotating its transcription with speaker tags that mark which segments were spoken by which type of speaker, and then trains a single model on the re-labeled samples so that it learns to share parameters across the domains.

  • The method involves receiving training samples from different domains with audio data and transcriptions.
  • Each training sample is re-labeled with speaker tags to identify different types of speakers in the transcription.
  • The multi-domain speech recognition model is trained to recognize speech across various domains by sharing parameters.
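The re-labeling step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not specify a tag format, so the `<speaker:...>` syntax, the `TrainingSample` class, the `relabel` helper, and the domain and speaker-type names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TrainingSample:
    domain: str         # illustrative domain name, e.g. "medical"
    audio_path: str     # stand-in for the audio data of the utterance
    transcription: str  # transcription paired with the audio


def speaker_tag(speaker_type: str) -> str:
    # Hypothetical tag syntax; the abstract does not define one.
    return f"<speaker:{speaker_type}>"


def relabel(sample: TrainingSample,
            segments: List[Tuple[str, str]]) -> TrainingSample:
    """Return a copy of `sample` with speaker tags in its transcription.

    `segments` is a list of (speaker_type, text) pairs in spoken order.
    Their texts, joined by spaces, must equal the original transcription.
    """
    joined = " ".join(text for _, text in segments)
    if joined != sample.transcription:
        raise ValueError("segments do not cover the transcription")
    annotated = " ".join(
        f"{speaker_tag(speaker)} {text}" for speaker, text in segments
    )
    return TrainingSample(sample.domain, sample.audio_path, annotated)


# Samples from two different (assumed) domains, pooled into one training set.
medical = relabel(
    TrainingSample("medical", "visit.wav", "how are you feeling fine thanks"),
    [("doctor", "how are you feeling"), ("patient", "fine thanks")],
)
search = relabel(
    TrainingSample("voice_search", "query.wav", "weather in boston"),
    [("user", "weather in boston")],
)
relabeled_training_set = [medical, search]
```

Because samples from every domain end up in the same tagged format, they can be pooled into one training set for a single model, which is one plausible reading of how the tags connect the domains.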

Key Features and Innovation:

  • Training a multi-domain speech recognition model using training samples from different domains.
  • Re-labeling training samples with speaker tags to identify different types of speakers in the transcription.
  • Teaching the model to recognize speech across multiple domains by sharing parameters.

Potential Applications:

  • Speech recognition systems in various industries such as customer service, healthcare, and education.
  • Transcription services for meetings, interviews, and lectures.
  • Voice-controlled devices and virtual assistants.

Problems Solved:

  • Improving speech recognition accuracy across different domains.
  • Enhancing the ability to distinguish different types of speakers in transcriptions.
  • Streamlining the training process for multi-domain speech recognition models.

Benefits:

  • Increased accuracy and efficiency in speech recognition.
  • Enhanced transcription services that mark which type of speaker said each segment.
  • Improved performance of voice-controlled devices and virtual assistants.

Commercial Applications:

Multi-Domain Speech Recognition Model Training for Enhanced Accuracy and Efficiency

Questions about Multi-Domain Speech Recognition Model Training:

1. How does re-labeling training samples with speaker tags improve the performance of the multi-domain speech recognition model?
2. What are the potential challenges in training a speech recognition model across multiple domains?


Original Abstract Submitted

A method includes receiving a plurality of training samples spanning multiple different domains. Each corresponding training sample includes audio data characterizing an utterance paired with a corresponding transcription of the utterance. The method also includes re-labeling each corresponding training sample of the plurality of training samples by annotating the corresponding transcription of the utterance with one or more speaker tags. Each speaker tag indicates a respective segment of the transcription for speech that was spoken by a particular type of speaker. The method also includes training a multi-domain speech recognition model on the re-labeled training samples to teach the multi-domain speech recognition model to learn to share parameters for recognizing speech across each of the different multiple different domains.