20240038251. AUDIO DATA PROCESSING METHOD AND APPARATUS, ELECTRONIC DEVICE, MEDIUM AND PROGRAM PRODUCT simplified abstract (Unknown Organization)

From WikiPatents

AUDIO DATA PROCESSING METHOD AND APPARATUS, ELECTRONIC DEVICE, MEDIUM AND PROGRAM PRODUCT

Organization Name

Unknown Organization

Inventor(s)

Yipeng Wang (US)

AUDIO DATA PROCESSING METHOD AND APPARATUS, ELECTRONIC DEVICE, MEDIUM AND PROGRAM PRODUCT - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240038251 titled 'AUDIO DATA PROCESSING METHOD AND APPARATUS, ELECTRONIC DEVICE, MEDIUM AND PROGRAM PRODUCT'.

Simplified Explanation

The abstract describes a method for processing audio data, specifically human voice audio data. The method obtains two recordings: the human voice audio data to be adjusted and reference human voice audio data. Each recording is divided into frames, yielding a first audio frame set (from the audio to be adjusted) and a second audio frame set (from the reference audio). The pronunciation unit corresponding to each audio frame is then recognized. Next, the timestamp of each pronunciation unit in both recordings is determined from the timestamps of the audio frames. Finally, the timestamp of at least one pronunciation unit in the audio to be adjusted is changed so that it matches the timestamp of the corresponding pronunciation unit in the reference audio.
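The framing step can be illustrated with a minimal Python sketch. It assumes mono PCM samples held in a list and fixed-length, overlapping frames; the function name `frame_audio`, the `Frame` type, and the 25 ms frame / 10 ms hop sizes are illustrative assumptions, not details taken from the patent. Each frame carries its own start and end timestamp, which the later steps rely on.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Frame:
    start_s: float          # timestamp of the first sample in the frame, in seconds
    end_s: float            # timestamp just past the last sample, in seconds
    samples: List[float]    # raw PCM samples belonging to this frame


def frame_audio(samples: Sequence[float], sample_rate: int,
                frame_len: int = 400, hop: int = 160) -> List[Frame]:
    """Hypothetical helper: split a mono signal into overlapping frames.

    frame_len and hop are in samples; the defaults (25 ms / 10 ms at 16 kHz)
    are common speech-processing choices, not values from the patent.
    Trailing samples that do not fill a whole frame are dropped.
    """
    frames = []
    for offset in range(0, len(samples) - frame_len + 1, hop):
        frames.append(Frame(start_s=offset / sample_rate,
                            end_s=(offset + frame_len) / sample_rate,
                            samples=list(samples[offset:offset + frame_len])))
    return frames


# Usage: frame both recordings the same way to get the two frame sets.
sr = 16_000
to_adjust = [0.0] * sr            # 1 s placeholder signal (silence)
reference = [0.0] * (sr // 2)     # 0.5 s placeholder signal
first_frame_set = frame_audio(to_adjust, sr)
second_frame_set = frame_audio(reference, sr)
print(len(first_frame_set), len(second_frame_set))  # prints "98 48"
```

Using identical frame and hop sizes for both recordings keeps the two frame sets on the same time grid, so timestamps derived from them are directly comparable.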

  • The method obtains human voice audio data to be adjusted and reference human voice audio data.
  • Each recording is divided into frames, giving a first audio frame set and a second audio frame set respectively.
  • Each frame is analyzed to identify the corresponding pronunciation unit.
  • The timestamp of each pronunciation unit in both recordings is determined from the timestamps of the audio frames.
  • The timestamp of at least one pronunciation unit in the audio to be adjusted is changed to match that of the corresponding pronunciation unit in the reference audio.
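The step that turns per-frame recognition results into pronunciation-unit timestamps might look like the sketch below. It assumes the recognizer (not shown) has already assigned one label per frame, that frame timestamps are integer milliseconds, and that a hypothetical label `"sil"` marks silence; `units_from_frames` and `PronunciationUnit` are illustrative names, not the patent's.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import AbstractSet, List, Sequence


@dataclass(frozen=True)
class PronunciationUnit:
    label: str      # phoneme/syllable label assumed to come from the recognizer
    start_ms: int   # start timestamp of the unit's first frame
    end_ms: int     # start timestamp of the unit's last frame plus one hop


def units_from_frames(frame_starts_ms: Sequence[int], hop_ms: int,
                      frame_labels: Sequence[str],
                      skip: AbstractSet[str] = frozenset({"sil"})) -> List[PronunciationUnit]:
    """Hypothetical helper: merge runs of same-label frames into units.

    frame_labels[i] is the recognizer output for the frame starting at
    frame_starts_ms[i]; runs whose label is in `skip` produce no unit.
    """
    units = []
    pairs = zip(frame_starts_ms, frame_labels)
    # groupby yields one group per run of consecutive frames with equal labels.
    for label, run in groupby(pairs, key=lambda p: p[1]):
        run = list(run)
        if label in skip:
            continue
        units.append(PronunciationUnit(label, run[0][0], run[-1][0] + hop_ms))
    return units


# Usage: six 10 ms frames recognized as silence, then "n", then "i".
units = units_from_frames([0, 10, 20, 30, 40, 50], 10,
                          ["sil", "n", "n", "i", "i", "i"])
# units == [PronunciationUnit('n', 10, 30), PronunciationUnit('i', 30, 60)]
```

Grouping consecutive identical labels is the simplest possible rule: two adjacent units that happen to share a label would be merged, so a practical recognizer would need to mark unit boundaries explicitly. The same function would be run on both frame sets to obtain unit timestamps for each recording.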

Potential applications of this technology:

  • Speech recognition and transcription systems could benefit from accurate alignment of pronunciation units.
  • Language learning applications could use this technology to provide precise feedback on pronunciation.
  • Voice-controlled systems, such as virtual assistants, could improve their understanding of user commands by aligning pronunciation units.

Problems solved by this technology:

  • Inaccurate alignment of pronunciation units in audio data can lead to errors in speech recognition and transcription.
  • Misalignment of pronunciation units can affect the accuracy of language learning applications.
  • Voice-controlled systems may misinterpret user commands if pronunciation units are not properly aligned.

Benefits of this technology:

  • Improved accuracy in speech recognition and transcription.
  • Enhanced language learning experiences with precise pronunciation feedback.
  • More reliable and accurate voice-controlled systems.


Original Abstract Submitted

An audio data processing method is provided. The method includes: obtaining human voice audio data to be adjusted and reference human voice audio data; performing framing on the human voice audio data to be adjusted and the reference human voice audio data respectively so as to obtain a first audio frame set and a second audio frame set respectively; recognizing a pronunciation unit corresponding to each audio frame respectively; determining, based on a timestamp of each audio frame, a timestamp of each pronunciation unit in the human voice audio data to be adjusted and the reference human voice audio data respectively; and adjusting the timestamp of at least one pronunciation unit to make the timestamp of the pronunciation unit in the human voice audio data to be adjusted to be consistent with the timestamp of the corresponding pronunciation unit in the reference human voice audio data.
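The final adjustment step in the abstract can be sketched as copying reference timestamps onto corresponding units. This is an illustration under stated assumptions: correspondence between units is found with Python's `difflib.SequenceMatcher` over the two label sequences (a swapped-in matching technique; the abstract does not say how units are paired), and `align_to_reference` and `stretch_ratios` are hypothetical helpers. The ratios indicate how much each unit's audio would have to be time-stretched to fit its new timestamps.

```python
from dataclasses import dataclass, replace
from difflib import SequenceMatcher
from typing import List


@dataclass(frozen=True)
class PronunciationUnit:
    label: str
    start_ms: int
    end_ms: int


def align_to_reference(to_adjust: List[PronunciationUnit],
                       reference: List[PronunciationUnit]) -> List[PronunciationUnit]:
    """Hypothetical helper: copy reference timestamps onto matching units.

    Matching uses difflib.SequenceMatcher on the label sequences (an assumed
    stand-in); units with no counterpart keep their original timestamps.
    """
    matcher = SequenceMatcher(None, [u.label for u in to_adjust],
                              [u.label for u in reference], autojunk=False)
    adjusted = list(to_adjust)
    # Each block pairs to_adjust[a:a+size] with reference[b:b+size];
    # the final dummy block has size 0 and is skipped by the inner loop.
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            ref = reference[block.b + k]
            adjusted[block.a + k] = replace(adjusted[block.a + k],
                                            start_ms=ref.start_ms, end_ms=ref.end_ms)
    return adjusted


def stretch_ratios(original: List[PronunciationUnit],
                   adjusted: List[PronunciationUnit]) -> List[float]:
    """Hypothetical helper: new duration / old duration per unit (assumes non-zero durations)."""
    return [(a.end_ms - a.start_ms) / (o.end_ms - o.start_ms)
            for o, a in zip(original, adjusted)]


# Usage: the user's "i" is 150 ms long, the reference's is 100 ms.
user = [PronunciationUnit("n", 0, 100), PronunciationUnit("i", 100, 250)]
ref = [PronunciationUnit("n", 20, 120), PronunciationUnit("i", 120, 220)]
new = align_to_reference(user, ref)
print(stretch_ratios(user, new))  # prints "[1.0, 0.6666666666666666]"
```

Units with no counterpart in the reference keep their original timestamps, which can leave them overlapping adjusted neighbors; a fuller implementation would have to resolve such gaps and overlaps before resynthesizing the audio.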