18598523. CONNECTING DIFFERENT ASR APPLICATION DOMAINS WITH SPEAKER-TAGS simplified abstract (Google LLC)

From WikiPatents

CONNECTING DIFFERENT ASR APPLICATION DOMAINS WITH SPEAKER-TAGS

Organization Name

Google LLC

Inventor(s)

Guru Prakash Arumugam of Sunnyvale CA (US)

Shuo-yiin Chang of Sunnyvale CA (US)

Shaan Jagdeep Patrick Bijwadia of San Francisco CA (US)

Weiran Wang of San Jose CA (US)

Quan Wang of Hoboken NJ (US)

Rohit Prakash Prabhavalkar of Palo Alto CA (US)

Tara N. Sainath of Jersey City NJ (US)

CONNECTING DIFFERENT ASR APPLICATION DOMAINS WITH SPEAKER-TAGS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18598523, titled 'CONNECTING DIFFERENT ASR APPLICATION DOMAINS WITH SPEAKER-TAGS'.

Simplified Explanation

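The patent application describes a method for training a multi-domain speech recognition model. Training samples are drawn from multiple domains, each pairing audio data with a transcription of the utterance. Each transcription is then re-labeled with speaker tags that mark which segments were spoken by which type of speaker, and the model is trained on the re-labeled samples.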

Key Features and Innovation

  • Receiving training samples from multiple domains with audio data and transcriptions.
  • Re-labeling training samples with speaker tags to indicate different types of speakers.
  • Training a multi-domain speech recognition model to share parameters across different domains.
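The re-labeling step listed above can be illustrated with a minimal Python sketch. This is not the method claimed in the application: the `<spk:...>` tag format, the `TrainingSample` class, the `relabel` helper, and the speaker types `agent` and `customer` are all hypothetical, and the sketch assumes the per-segment speaker types are already known.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TrainingSample:
    """One training sample: audio paired with a transcription (hypothetical structure)."""
    domain: str          # e.g. "call_center"; domain names are assumptions
    audio: bytes         # placeholder for the audio data of the utterance
    transcription: str


def speaker_tag(speaker_type: str) -> str:
    # Hypothetical tag format; the abstract does not specify one.
    return f"<spk:{speaker_type}>"


def relabel(sample: TrainingSample, segments: List[Tuple[str, str]]) -> TrainingSample:
    """Return a copy of `sample` whose transcription is annotated with speaker tags.

    `segments` is a list of (speaker_type, text) pairs in spoken order.
    """
    tagged = " ".join(f"{speaker_tag(kind)} {text}" for kind, text in segments)
    return TrainingSample(sample.domain, sample.audio, tagged)


sample = TrainingSample("call_center", b"", "how can i help my order is late")
tagged_sample = relabel(
    sample,
    [("agent", "how can i help"), ("customer", "my order is late")],
)
print(tagged_sample.transcription)
# <spk:agent> how can i help <spk:customer> my order is late
```

Because samples from every domain end up in the same tagged text format, they could in principle be pooled into one training set for a single model.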

Potential Applications

This technology can be applied in various fields such as:

  • Speech recognition software
  • Virtual assistants
  • Call center automation

Problems Solved

  • Enhancing speech recognition accuracy across different domains
  • Indicating which type of speaker produced each segment of a transcription

Benefits

  • Increased accuracy in recognizing speech
  • Better understanding of different speaker types
  • Enhanced performance in multi-domain applications

Commercial Applications

  • This technology can be used in developing advanced speech recognition systems for commercial purposes, such as customer service automation and voice-controlled devices.

Prior Art

Researchers can explore prior art related to multi-domain speech recognition models, speaker tagging in audio data, and training methods for improving speech recognition accuracy.

Frequently Updated Research

Stay updated on the latest research in multi-domain speech recognition models, speaker identification techniques, and advancements in speech recognition technology.

Questions about Multi-Domain Speech Recognition

How does speaker tagging improve speech recognition accuracy?

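According to the abstract, each speaker tag marks a segment of the transcription that was spoken by a particular type of speaker. Annotating transcriptions this way gives training samples from different domains a common labeling scheme, which is intended to help a single model share parameters across those domains.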

What are the potential challenges in training a multi-domain speech recognition model?

Training a model to recognize speech across various domains may face challenges in adapting to different accents, languages, and speech styles.


Original Abstract Submitted

A method includes receiving a plurality of training samples spanning multiple different domains. Each corresponding training sample includes audio data characterizing an utterance paired with a corresponding transcription of the utterance. The method also includes re-labeling each corresponding training sample of the plurality of training samples by annotating the corresponding transcription of the utterance with one or more speaker tags. Each speaker tag indicates a respective segment of the transcription for speech that was spoken by a particular type of speaker. The method also includes training a multi-domain speech recognition model on the re-labeled training samples to teach the multi-domain speech recognition model to learn to share parameters for recognizing speech across each of the different multiple different domains.
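To make the idea that each speaker tag "indicates a respective segment of the transcription" concrete, the following Python sketch splits a tagged transcription back into per-speaker segments. The `<spk:...>` tag syntax and the `split_by_speaker` helper are assumptions made for illustration only; the abstract does not define a tag format.

```python
import re
from typing import List, Tuple

# Hypothetical tag syntax: <spk:TYPE>, where TYPE names a type of speaker.
TAG_PATTERN = re.compile(r"<spk:([a-z_]+)>")


def split_by_speaker(tagged_transcription: str) -> List[Tuple[str, str]]:
    """Split a tagged transcription into (speaker_type, segment_text) pairs."""
    # re.split with one capturing group yields:
    # [text_before_first_tag, type_1, text_1, type_2, text_2, ...]
    parts = TAG_PATTERN.split(tagged_transcription)
    return [(parts[i], parts[i + 1].strip()) for i in range(1, len(parts), 2)]


segments = split_by_speaker("<spk:agent> how can i help <spk:customer> my order is late")
for speaker_type, text in segments:
    print(f"{speaker_type}: {text}")
```

Any text before the first tag is ignored in this sketch; a real pipeline would need to decide how to treat untagged speech.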