Jingdong Technology Holding Co.,Ltd. (20240233708). METHOD AND DEVICE FOR GENERATING SPEECH RECOGNITION TRAINING SET simplified abstract

From WikiPatents

METHOD AND DEVICE FOR GENERATING SPEECH RECOGNITION TRAINING SET

Organization Name

Jingdong Technology Holding Co.,Ltd.

Inventor(s)

Li Fu of Beijing (CN)

METHOD AND DEVICE FOR GENERATING SPEECH RECOGNITION TRAINING SET - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240233708 titled 'METHOD AND DEVICE FOR GENERATING SPEECH RECOGNITION TRAINING SET'.

The present disclosure describes a method and apparatus for creating a speech recognition training set using audio and video data.

  • Acquiring audio and video data to be processed, with the video containing text information corresponding to the audio.
  • Recognizing the audio to obtain an audio text.
  • Recognizing text information in the video to obtain a video text.
  • Based on the consistency of the audio text with the video text, using the audio as a speech sample and the video text as its label to obtain the speech recognition training set.
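The steps above can be sketched in Python. This is a minimal illustration, not the patented implementation: the abstract does not say how consistency is measured, so the sketch assumes a character-level similarity ratio from the standard library's `difflib` and an arbitrary threshold of 0.8. The names `TrainingExample`, `consistency`, `build_training_set`, `recognize_audio`, and `recognize_video_text` are hypothetical; the two recognizers are passed in as plain callables standing in for whatever speech recognizer and on-screen text (e.g. subtitle) recognizer is used.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Tuple


@dataclass
class TrainingExample:
    audio_path: str  # the to-be-processed audio, kept as the speech sample
    label: str       # the video text, kept as the label


def normalize(text: str) -> str:
    """Lowercase and drop non-alphanumeric characters before comparing."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def consistency(audio_text: str, video_text: str) -> float:
    """Similarity in [0, 1]. Assumption: the abstract names no specific metric."""
    return SequenceMatcher(None, normalize(audio_text), normalize(video_text)).ratio()


def build_training_set(
    clips: Iterable[Tuple[str, str]],
    recognize_audio: Callable[[str], str],       # hypothetical speech recognizer
    recognize_video_text: Callable[[str], str],  # hypothetical on-screen text recognizer
    threshold: float = 0.8,                      # assumed cutoff, not from the abstract
) -> List[TrainingExample]:
    """Keep (audio, video text) pairs whose two recognized texts agree closely enough."""
    examples = []
    for audio_path, video_path in clips:
        audio_text = recognize_audio(audio_path)
        video_text = recognize_video_text(video_path)
        if consistency(audio_text, video_text) >= threshold:
            examples.append(TrainingExample(audio_path, video_text))
    return examples
```

Pairs whose recognized texts disagree are simply dropped here. Whether the disclosed method discards, corrects, or re-segments such pairs is not stated in the abstract, so that choice is also an assumption of this sketch.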

Potential Applications:

  • Improving speech recognition technology.
  • Enhancing training sets for machine learning models.
  • Streamlining the process of creating speech recognition datasets.

Problems Solved:

  • Simplifying the creation of speech recognition training sets.
  • Increasing the accuracy of speech recognition systems.
  • Reducing the manual effort required to generate training data.

Benefits:

  • Enhanced accuracy in speech recognition.
  • Time-saving in creating training sets.
  • Improved efficiency in machine learning model training.

Commercial Applications:

Title: "Enhanced Speech Recognition Training Set Generation for AI Applications"

This technology can be utilized in industries such as:

  • Virtual assistants
  • Call centers
  • Transcription services
  • Voice-controlled devices

Frequently Updated Research: Stay updated on advancements in speech recognition technology and machine learning models for improved accuracy and efficiency.

Questions about Speech Recognition Training Set Generation:

  1. How does this method improve the accuracy of speech recognition systems?
  2. What are the potential applications of this technology in real-world scenarios?


Original Abstract Submitted

Disclosed in the present disclosure are a method and apparatus for generating a speech recognition training set. The method may include: acquiring a to-be-processed audio and a to-be-processed video, where the to-be-processed video comprises text information corresponding to the to-be-processed audio; recognizing the to-be-processed audio to obtain an audio text; recognizing text information in the to-be-processed video to obtain a video text; and using, based on consistency of the audio text with the video text, the to-be-processed audio as a speech sample and the video text as a label to obtain the speech recognition training set.