20240029723. SYSTEM AND METHOD FOR COMMAND FULFILLMENT WITHOUT WAKE WORD simplified abstract (SAMSUNG ELECTRONICS CO., LTD.)

From WikiPatents

SYSTEM AND METHOD FOR COMMAND FULFILLMENT WITHOUT WAKE WORD

Organization Name

SAMSUNG ELECTRONICS CO., LTD.

Inventor(s)

Sivakumar Balasubramanian of Sunnyvale CA (US)

Gowtham Srinivasan of San Jose CA (US)

Srinivasa Rao Ponakala of Sunnyvale CA (US)

Vijendra Raj Apsingekar of San Jose CA (US)

Anil Sunder Yadav of San Jose CA (US)

SYSTEM AND METHOD FOR COMMAND FULFILLMENT WITHOUT WAKE WORD - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240029723, titled 'SYSTEM AND METHOD FOR COMMAND FULFILLMENT WITHOUT WAKE WORD'.

Simplified Explanation

The patent application describes a two-stage method for deciding whether to run automatic speech recognition (ASR) on an audio input: a frame-level detector model produces frame-level predictions, and a word-level verifier model then produces word-level probabilities for chunked audio frames. Here is a simplified explanation of the abstract:

  • The method starts by obtaining an audio input.
  • A frame-level detector model is used to analyze at least a portion of the audio input and generate frame-level predictions.
  • A first output of the frame-level detector model is obtained, which includes the frame-level predictions associated with the analyzed portion of the audio input.
  • The method then involves providing at least one chunked audio frame to a word-level verifier model.
  • A second output of the word-level verifier model is obtained, which includes word-level probabilities associated with the chunked audio frame.
  • Based on the word-level probabilities, the method instructs that automatic speech recognition be performed on the audio input.
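
The steps above can be sketched as a small two-stage gate in Python. This is only an illustrative sketch, not the implementation claimed in the application: the function name `should_run_asr`, the thresholds, the fixed chunk size, and the choice to verify only chunks flagged by the detector are all assumptions, and the toy detector and verifier stand in for trained models that the abstract does not describe.

```python
from typing import Callable, List, Sequence

# Hypothetical representation: one audio frame is a short sequence of samples.
Frame = Sequence[float]


def should_run_asr(
    frames: List[Frame],
    detector: Callable[[Frame], float],
    verifier: Callable[[List[Frame]], float],
    frame_threshold: float = 0.5,   # assumed value, not from the abstract
    word_threshold: float = 0.8,    # assumed value, not from the abstract
    chunk_size: int = 4,            # assumed value, not from the abstract
) -> bool:
    """Return True if ASR should be instructed for this audio input."""
    # Stage 1: frame-level predictions from the detector model.
    frame_scores = [detector(frame) for frame in frames]

    # Group frames into fixed-size chunks (assumption: simple non-overlapping chunks).
    for start in range(0, len(frames), chunk_size):
        chunk = frames[start:start + chunk_size]
        scores = frame_scores[start:start + chunk_size]

        # Assumption: only chunks the detector flagged are passed to the verifier.
        if max(scores) < frame_threshold:
            continue

        # Stage 2: word-level probability from the verifier model.
        if verifier(chunk) >= word_threshold:
            return True
    return False


# Toy stand-ins for the two models, for illustration only.
def toy_detector(frame: Frame) -> float:
    # Mean absolute amplitude, capped at 1.0, used as a fake per-frame score.
    return min(1.0, sum(abs(x) for x in frame) / len(frame))


def toy_verifier(chunk: List[Frame]) -> float:
    # Fraction of frames in the chunk that look "active" to the toy detector.
    return sum(1 for frame in chunk if toy_detector(frame) > 0.5) / len(chunk)


silence = [[0.0, 0.0]] * 8
speech = [[0.9, 0.9]] * 8
blip = [[0.9, 0.9]] + [[0.0, 0.0]] * 7

print(should_run_asr(silence, toy_detector, toy_verifier))
print(should_run_asr(speech, toy_detector, toy_verifier))
print(should_run_asr(blip, toy_detector, toy_verifier))
```

In this sketch the inexpensive frame-level check runs on every frame, and the verifier is consulted only for chunks that pass it, so a single loud frame (the `blip` input) is flagged by the detector but rejected by the verifier.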

Potential applications of this technology:

  • Speech recognition systems and applications
  • Voice-controlled devices and virtual assistants
  • Transcription services and software
  • Language learning and pronunciation assessment tools

Problems solved by this technology:

  • Improving the accuracy and efficiency of automatic speech recognition systems
  • Enhancing the performance of voice-controlled devices and virtual assistants
  • Enabling more accurate transcription of audio content
  • Facilitating language learning and pronunciation assessment

Benefits of this technology:

  • Higher accuracy in converting spoken language into written text
  • Improved user experience with voice-controlled devices and virtual assistants
  • Time-saving and cost-effective transcription services
  • Enhanced language learning and pronunciation assessment capabilities


Original Abstract Submitted

A method comprises obtaining an audio input. The method also includes providing at least a portion of the audio input to a frame-level detector model. The method also includes obtaining a first output of the frame-level detector model including frame-level predictions associated with at least the portion of the audio input. The method also includes providing at least one chunked audio frame to a word-level verifier model. The method also includes obtaining a second output of the word-level verifier model including word-level probabilities associated with the at least one chunked audio frame. The method also includes instructing performance of automatic speech recognition on the audio input based on the word-level probabilities associated with the at least one chunked audio frame.