18453338. SPEECH RECOGNITION DEVICE, SPEECH RECOGNITION METHOD, AND NON-TRANSITORY COMPUTER-READABLE MEDIUM simplified abstract (Honda Motor Co., Ltd.)
SPEECH RECOGNITION DEVICE, SPEECH RECOGNITION METHOD, AND NON-TRANSITORY COMPUTER-READABLE MEDIUM
Organization Name
Honda Motor Co., Ltd.
Inventor(s)
Kazuhiro Nakadai of Saitama (JP)
SPEECH RECOGNITION DEVICE, SPEECH RECOGNITION METHOD, AND NON-TRANSITORY COMPUTER-READABLE MEDIUM - A simplified explanation of the abstract
This abstract first appeared for US patent application 18453338 titled 'SPEECH RECOGNITION DEVICE, SPEECH RECOGNITION METHOD, AND NON-TRANSITORY COMPUTER-READABLE MEDIUM'.
Simplified Explanation
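The patent application describes a speech recognition device that converts speech into text using two learned end-to-end (E2E) models, one producing text and one producing phonemes. Each model tags the vocabulary of a specific class in its result, and the tagged text is then replaced by the tagged phonemes before the final text is output. The device consists of the following parts: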
- Acquisition part: Acquires a speech signal.
- Speech feature amount calculation part: Calculates a speech feature amount.
- First speech recognition part: Performs speech recognition on the speech feature amount using a learned first E2E model, producing text, and attaches a first tag to the vocabulary portion of a specific class in that text.
- Second speech recognition part: Performs speech recognition on the same speech feature amount using a learned second E2E model, producing phonemes, and attaches a second tag to the vocabulary portion of a specific class in those phonemes.
- Phoneme replacement part: Replaces the vocabulary carrying the first tag with the phonemes carrying the second tag.
- Output part: Converts the phonemes carrying the second tag into text and outputs the result.
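The replacement and output steps listed above can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration, since the abstract does not specify any of it: the tag syntax (`<c1>`, `<c2>`), the pairing of tagged spans by their order of appearance, the lexicon-based phoneme-to-text conversion, and the names `replace_tagged_spans`, `phonemes_to_text`, `recognize`, and `LEXICON`. The two E2E models are not implemented; their tagged outputs are passed in as strings.

```python
import re

# Hypothetical tag syntax; the abstract does not say how tags are written.
TEXT_TAG = re.compile(r"<c1>(.*?)</c1>")     # first tag, in the text result
PHONEME_TAG = re.compile(r"<c2>(.*?)</c2>")  # second tag, in the phoneme result

# Hypothetical lexicon for the specific class (here: proper names).
LEXICON = {
    "n a k a d a i": "Nakadai",
    "s a i t a m a": "Saitama",
}


def replace_tagged_spans(tagged_text, tagged_phonemes):
    """Swap each first-tagged word for the corresponding second-tagged phonemes.

    Spans are paired by order of appearance, which is an assumption.
    """
    phoneme_spans = PHONEME_TAG.findall(tagged_phonemes)
    if len(TEXT_TAG.findall(tagged_text)) != len(phoneme_spans):
        raise ValueError("tag counts differ between the two recognition results")
    spans = iter(phoneme_spans)
    return TEXT_TAG.sub(lambda m: "<c2>" + next(spans) + "</c2>", tagged_text)


def phonemes_to_text(mixed):
    """Convert every second-tagged phoneme span into text using the lexicon."""
    def convert(match):
        phonemes = match.group(1).strip()
        # Fall back to the raw phonemes when the lexicon has no entry.
        return LEXICON.get(phonemes, phonemes)
    return PHONEME_TAG.sub(convert, mixed)


def recognize(tagged_text, tagged_phonemes):
    """Phoneme replacement part followed by the output part."""
    return phonemes_to_text(replace_tagged_spans(tagged_text, tagged_phonemes))


# Stand-ins for the outputs of the first and second E2E models.
first_result = "my name is <c1>nakada</c1>"
second_result = "m a i n e i m i z <c2>n a k a d a i</c2>"
print(recognize(first_result, second_result))
```

In this sketch the text model has misrecognized the name, while the phoneme model's tagged span still maps to the intended spelling through the lexicon. That is one plausible reading of why the device routes the tagged class through phonemes, not a statement of the applicant's actual method.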
Potential Applications
This technology can be used in various applications such as:
- Voice-controlled devices
- Speech-to-text transcription software
- Language translation tools
Problems Solved
The technology solves the following problems:
- Improving speech recognition accuracy
- Enhancing text output quality
- Streamlining the speech-to-text conversion process
Benefits
The benefits of this technology include:
- Increased efficiency in converting speech to text
- Enhanced user experience in voice-activated systems
- Improved accessibility for individuals with speech impairments
Potential Commercial Applications
The technology can be commercially applied in:
- Virtual assistants
- Call center automation systems
- Dictation software
Possible Prior Art
Possible prior art includes the use of deep learning models in speech recognition systems.
Unanswered Questions
How does the device handle background noise during speech recognition?
The abstract does not describe how the device handles background noise.
What languages are supported by the speech recognition device?
The abstract does not specify which languages the device can recognize and convert into text.
Original Abstract Submitted
A speech recognition device includes: an acquisition part, acquiring a speech signal; a speech feature amount calculation part, calculating a speech feature amount; a first speech recognition part, based on the speech feature amount, performing speech recognition using a learned first E2E model, attaching a first tag to a vocabulary portion of a specific class in text that is a recognition result, and outputting the same; a second speech recognition part, based on the speech feature amount, performing speech recognition using a learned second E2E model, attaching a second tag to a vocabulary portion of a specific class in a phoneme that is a recognition result, and outputting the same; a phoneme replacement part, replacing a vocabulary with the first tag with a phoneme with the second tag; and an output part, converting the phoneme with the second tag into text and outputting the same.