18258316. SPEECH RECOGNITION METHOD AND APPARATUS simplified abstract (HUAWEI TECHNOLOGIES CO., LTD.)

From WikiPatents

SPEECH RECOGNITION METHOD AND APPARATUS

Organization Name

HUAWEI TECHNOLOGIES CO., LTD.

Inventor(s)

Xuxian Yin of Shenzhen (CN)

SPEECH RECOGNITION METHOD AND APPARATUS - A simplified explanation of the abstract

This abstract first appeared for US patent application 18258316, titled 'SPEECH RECOGNITION METHOD AND APPARATUS'.

Simplified Explanation

The patent application describes a method and apparatus for speech recognition. Here is a simplified explanation of the abstract:

  • A terminal device inputs a phoneme to be recognized into a multitask neural network model.
  • The neural network model outputs a prediction result, which includes both a character prediction and a punctuation prediction corresponding to the input phoneme.
  • The terminal device displays at least a part of the prediction result on its display.
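The three steps above can be illustrated with a minimal sketch. It is not the model from the patent application: the abstract does not describe the architecture, so the shared embedding with two output heads, the toy vocabularies (`PHONEMES`, `CHARS`, `PUNCT`), the class `MultitaskModel`, and the helper `render` are all hypothetical names chosen for illustration. The weights are random and untrained, so the predictions are arbitrary; the point is only the data flow of one input producing a character label and a punctuation label together.

```python
import random

# Hypothetical toy vocabularies; the abstract does not specify any of these.
PHONEMES = ["HH", "AY", "OW", "K", "EY"]
CHARS = ["hi", "oh", "kay", "<unk>"]
PUNCT = ["<none>", ",", ".", "?"]
HIDDEN = 8  # assumed size of the shared representation


def make_matrix(rows, cols, rng):
    """Build a rows x cols matrix of random weights (untrained)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix by a vector, returning one score per row."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def argmax(values):
    """Return the index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


class MultitaskModel:
    """One shared phoneme representation feeding two task-specific heads."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.embedding = make_matrix(len(PHONEMES), HIDDEN, rng)  # shared part
        self.char_head = make_matrix(len(CHARS), HIDDEN, rng)     # task 1
        self.punct_head = make_matrix(len(PUNCT), HIDDEN, rng)    # task 2

    def predict(self, phonemes):
        """Return a (character, punctuation) pair for each input phoneme."""
        results = []
        for phoneme in phonemes:
            shared = self.embedding[PHONEMES.index(phoneme)]
            char = CHARS[argmax(matvec(self.char_head, shared))]
            punct = PUNCT[argmax(matvec(self.punct_head, shared))]
            results.append((char, punct))
        return results


def render(results):
    """Join predictions into display text, skipping the '<none>' label."""
    return "".join(c + ("" if p == "<none>" else p) for c, p in results)


if __name__ == "__main__":
    model = MultitaskModel(seed=0)
    prediction = model.predict(["HH", "AY", "K"])
    print(render(prediction))  # stands in for the terminal device's display
```

Because both heads read the same shared representation, the character and punctuation labels come out of a single forward pass, which is the property the abstract emphasizes.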

Potential applications of this technology:

  • Speech recognition systems in various devices such as smartphones, tablets, and computers.
  • Voice-controlled virtual assistants.
  • Transcription services for converting spoken language into written text.

Problems solved by this technology:

  • Predicting the character and the punctuation mark that correspond to a phoneme at the same time, with one model.
  • Efficient and accurate speech recognition.
  • Keeping the neural network model small enough to deploy on terminal devices.

Benefits of this technology:

  • Improved user experience with accurate and real-time speech recognition.
  • Enhanced productivity through voice-controlled applications.
  • Reduced computational resources required for speech recognition on terminal devices.


Original Abstract Submitted

This application relates to a speech recognition method and apparatus. The speech recognition method includes: A terminal device inputs a to-be-recognized phoneme into a first multitask neural network model; the first multitask neural network model outputs a first prediction result, where the first prediction result includes a character prediction result and a punctuation prediction result that correspond to the to-be-recognized phoneme; and the terminal device displays at least a part of the first prediction result on a display of the terminal device. A neural network model for simultaneously predicting a character and a punctuation corresponding to a phoneme is constructed, so that the character and the punctuation corresponding to the phoneme can be simultaneously output. In addition, the neural network model is small-sized, and can be deployed on a terminal side.