Google LLC (20240290323). Large-Scale Language Model Data Selection for Rare-Word Speech Recognition simplified abstract

Large-Scale Language Model Data Selection for Rare-Word Speech Recognition

Organization Name

Google LLC

Inventor(s)

Wenqian Ronny Huang of Mountain View CA (US)

Tara N. Sainath of Jersey City NJ (US)

Large-Scale Language Model Data Selection for Rare-Word Speech Recognition - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240290323, titled 'Large-Scale Language Model Data Selection for Rare-Word Speech Recognition'.

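The abstract describes a method of training a language model for rare-word speech recognition. Rather than discarding rare words, the method selects the training text samples that contain words missing from, or infrequent in, the transcriptions of the speech recognizer's training utterances, and trains the language model on those samples together with the transcriptions.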

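  • Obtaining a set of training text samples and a set of training utterances, where each utterance pairs audio data with a corresponding transcription.
  • Applying rare word filtering to the training text samples to identify the subset whose words never appear in the utterance transcriptions, or appear fewer than a threshold number of times.
  • Training the language model on the transcriptions from the training utterances and the identified subset of rare-word training text samples.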
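
The selection step can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names, the lowercase whitespace tokenization, and the default threshold of 2 are assumptions made for illustration, since the abstract specifies none of them.

```python
from collections import Counter


def count_transcript_words(transcriptions):
    """Count how often each word occurs across all utterance transcriptions."""
    counts = Counter()
    for text in transcriptions:
        # Assumption: lowercase + whitespace split stands in for real tokenization.
        counts.update(text.lower().split())
    return counts


def select_rare_word_samples(text_samples, transcriptions, threshold=2):
    """Keep text samples containing at least one word that is absent from the
    transcriptions or occurs in them fewer than `threshold` times."""
    counts = count_transcript_words(transcriptions)
    selected = []
    for sample in text_samples:
        # Counter returns 0 for unseen words, so one comparison covers both
        # the "never appears" and the "appears too rarely" cases.
        if any(counts[word] < threshold for word in sample.lower().split()):
            selected.append(sample)
    return selected


def build_lm_training_text(text_samples, transcriptions, threshold=2):
    """Combine the transcriptions with the selected rare-word text samples."""
    rare = select_rare_word_samples(text_samples, transcriptions, threshold)
    return list(transcriptions) + rare


# Hypothetical toy data, for illustration only.
transcriptions = ["play some music", "play the music", "call mom"]
text_samples = ["play music", "play zydeco music", "call mom"]

print(select_rare_word_samples(text_samples, transcriptions))
# prints ['play zydeco music', 'call mom']
```

In this toy run, "play music" is dropped because both of its words occur twice in the transcriptions, while "zydeco" (never seen) and "call" (seen once) fall below the threshold. The combined list returned by build_lm_training_text would then be passed to whatever language model trainer is in use.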

Potential Applications:

  • Improving speech recognition accuracy for rare words.
  • Enhancing the performance of language models in speech recognition systems.

Problems Solved:

  • Addressing the challenge of recognizing and transcribing rare words accurately in speech recognition.

Benefits:

  • Increased accuracy in recognizing and transcribing rare words.
  • Enhanced overall performance of language models in speech recognition applications.

Commercial Applications: Rare-word speech recognition technology can be applied in industries such as healthcare, customer service, and education to improve speech recognition systems and communication efficiency.

Questions about Rare-Word Speech Recognition Technology:

  1. How does rare-word filtering improve the training of language models for speech recognition?
  2. What are the potential limitations of using rare-word filtering in speech recognition technology?

Frequently Updated Research: Stay updated on advancements in rare-word speech recognition technology and its applications in various industries to leverage the latest innovations in speech recognition systems.


Original Abstract Submitted

A method of training a language model for rare-word speech recognition includes obtaining a set of training text samples, and obtaining a set of training utterances used for training a speech recognition model. Each training utterance in the plurality of training utterances includes audio data corresponding to an utterance and a corresponding transcription of the utterance. The method also includes applying rare word filtering on the set of training text samples to identify a subset of rare-word training text samples that include words that do not appear in the transcriptions from the set of training utterances or appear in the transcriptions from the set of training utterances less than a threshold number of times. The method further includes training the external language model on the transcriptions from the set of training utterances and the identified subset of rare-word training text samples.