20240038238. ELECTRONIC DEVICE, SPEECH RECOGNITION METHOD THEREFOR, AND MEDIUM simplified abstract (HUAWEI TECHNOLOGIES CO., LTD.)

ELECTRONIC DEVICE, SPEECH RECOGNITION METHOD THEREFOR, AND MEDIUM

Organization Name

HUAWEI TECHNOLOGIES CO., LTD.

Inventor(s)

Lei Qin of Shenzhen (CN)

Lele Zhang of Shenzhen (CN)

Hao Liu of Beijing (CN)

Yuewan Lu of Shenzhen (CN)

ELECTRONIC DEVICE, SPEECH RECOGNITION METHOD THEREFOR, AND MEDIUM - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240038238, titled 'ELECTRONIC DEVICE, SPEECH RECOGNITION METHOD THEREFOR, AND MEDIUM'.

Simplified Explanation

The patent application describes a speech recognition method that combines a facial depth image of the user with the user's audio. Here is a simplified explanation of the abstract:

  • The method involves capturing a facial depth image and the voice of a user.
  • The facial depth image is obtained using a depth camera.
  • The method then recognizes a mouth shape feature from the facial depth image.
  • It also recognizes a voice feature from the to-be-recognized audio.
  • The voice feature and the mouth shape feature are fused into a single audio-video feature.
  • Finally, the method uses the audio-video feature to recognize the voice uttered by the user.
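
The fusion step in the list above can be illustrated with a minimal sketch. The abstract does not say how the two features are fused, so this example assumes the simplest option: the mouth shape features are resampled to the audio frame count by nearest-neighbour lookup, then concatenated frame by frame with the voice features. The function names `align_to_audio` and `fuse_features`, the feature dimensions, and the sample values are all hypothetical and are not taken from the patent application.

```python
from typing import List

Vector = List[float]


def align_to_audio(mouth_feats: List[Vector], n_audio_frames: int) -> List[Vector]:
    """Nearest-neighbour resample of per-depth-frame features to the audio frame count.

    Assumption: both streams cover the same time span, so index scaling is enough.
    """
    if not mouth_feats or n_audio_frames <= 0:
        return []
    n_video = len(mouth_feats)
    return [
        mouth_feats[min(n_video - 1, i * n_video // n_audio_frames)]
        for i in range(n_audio_frames)
    ]


def fuse_features(voice_feats: List[Vector], mouth_feats: List[Vector]) -> List[Vector]:
    """Concatenate each voice feature with its time-aligned mouth shape feature.

    Assumption: concatenation stands in for the unspecified fusion in the abstract.
    """
    aligned = align_to_audio(mouth_feats, len(voice_feats))
    return [v + m for v, m in zip(voice_feats, aligned)]


# Hypothetical inputs: 4 audio frames (2-dim) and 2 depth-camera frames (1-dim).
voice = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
mouth = [[1.0], [2.0]]

fused = fuse_features(voice, mouth)
for frame in fused:
    print(frame)
```

Each fused frame is a 3-dimensional vector here: two voice values followed by one mouth shape value. In a real system the fused sequence would be passed to a recognizer, which this sketch does not attempt to model.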

Potential applications of this technology:

  • Speech recognition systems for various devices, such as smartphones, smart speakers, and computers.
  • Biometric authentication systems that use speech recognition as an additional security measure.
  • Assistive technologies for individuals with speech impairments.

Problems solved by this technology:

  • Improved accuracy in speech recognition by incorporating visual cues from the user's mouth shape.
  • Enhanced user experience by reducing the need for explicit voice commands.
  • Increased accessibility for individuals with speech impairments.

Benefits of this technology:

  • More robust and accurate speech recognition, especially in noisy environments.
  • Improved user interaction with devices through a combination of voice and visual cues.
  • Potential for more natural and intuitive user interfaces.
  • Increased inclusivity by providing speech recognition capabilities to individuals with speech impairments.


Original Abstract Submitted

Embodiments of this application provide a speech recognition method. The speech recognition method includes: obtaining a facial depth image and a to-be-recognized voice of a user, where the facial depth image is an image collected by using a depth camera; recognizing a mouth shape feature from the facial depth image, and recognizing a voice feature from a to-be-recognized audio; and fusing the voice feature and the mouth shape feature into an audio-video feature, and recognizing, based on the audio-video feature, a voice uttered by the user.