BANDWIDTH EXTENSION AND SPEECH ENHANCEMENT OF AUDIO: abstract simplified (18208628)

From WikiPatents
  • This abstract appeared for patent application number 18208628, titled 'BANDWIDTH EXTENSION AND SPEECH ENHANCEMENT OF AUDIO'

Simplified Explanation

This abstract describes a system, apparatus, and method for audio processing. The method obtains an input audio waveform and converts it into a mel-spectrogram using a short-time Fourier transform (STFT). Two-dimensional U-Net convolutional blocks then update the mel-spectrogram by removing noise, restoring high-frequency components, or both. The updated mel-spectrogram is converted back into an audio waveform, which is corrected in the time domain and in the frequency domain to remove artifacts or noise. Finally, the corrected waveform is processed by a one-dimensional convolutional layer, and the result is output in both the time domain and the frequency domain.
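
The first step, turning a waveform into a mel-spectrogram by way of an STFT, can be illustrated with a minimal NumPy sketch. The abstract does not give any parameters, so the sample rate, FFT size, hop length, number of mel bands, the Hann window, and the common 2595 * log10(1 + f / 700) mel formula are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (an assumption; the abstract names no formula).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def stft_magnitude(wave, n_fft=512, hop=128):
    """Magnitude STFT, shape (n_fft // 2 + 1, n_frames); no edge padding."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)).T

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bank = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lo, center, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        bank[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank

def mel_spectrogram(wave, sr=16000, n_fft=512, hop=128, n_mels=40):
    """Power spectrogram projected onto mel bands, shape (n_mels, n_frames)."""
    power = stft_magnitude(wave, n_fft, hop) ** 2
    return mel_filterbank(sr, n_fft, n_mels) @ power

# Usage: one second of a 440 Hz tone sampled at 16 kHz (illustrative values).
sr = 16000
t = np.arange(sr) / sr
wave = np.sin(2 * np.pi * 440.0 * t)
mel = mel_spectrogram(wave, sr=sr)
print(mel.shape)  # (40, 122)
```

The result is a two-dimensional array (mel bands by time frames), which is the kind of image-like input that two-dimensional convolutional blocks operate on.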


Original Abstract Submitted

There is provided a system, apparatus and a method for audio processing. The operations include obtaining an input audio waveform, obtaining a mel-spectrogram by performing a short-time Fourier transform (STFT) operation on the input audio waveform, obtaining an updated mel-spectrogram by at least one or removing noise from the mel-spectrogram or restoring high frequency components by applying two-dimensional Unet convolutional blocks to the mel-spectrogram, converting the updated mel-spectrogram to a converted audio waveform in a waveform domain, correcting the converted audio waveform in a time domain, correcting the converted audio waveform in a frequency domain to remove artifacts or noise, processing the corrected audio waveform corrected in the time domain and corrected in the frequency domain with an one-dimensional convolutional layer, and outputting the processed audio waveform in the time domain and in the frequency domain.
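
The last two operations in the abstract, passing the corrected waveform through a one-dimensional convolutional layer and outputting the result in the time domain and in the frequency domain, might look roughly like the NumPy sketch below. This is one possible reading only: the random input stands in for a corrected waveform, the 5-tap averaging kernel stands in for learned weights, the STFT is assumed as the frequency-domain representation, and every function name and parameter value is hypothetical.

```python
import numpy as np

def conv1d_same(wave, kernel):
    """Single-channel 1-D convolution that keeps the input length.

    np.convolve flips the kernel; for the symmetric kernel used here this
    matches the cross-correlation that neural-network layers compute.
    """
    return np.convolve(wave, kernel, mode="same")

def stft(wave, n_fft=256, hop=64):
    """Complex STFT, shape (n_fft // 2 + 1, n_frames); no edge padding."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wave) - n_fft) // hop
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1).T

# Stand-in for the waveform after time- and frequency-domain correction.
rng = np.random.default_rng(0)
corrected = rng.standard_normal(4000)

# Hypothetical fixed kernel; a trained layer would supply learned weights.
kernel = np.full(5, 1.0 / 5.0)

processed = conv1d_same(corrected, kernel)
time_out = processed        # time-domain output: the waveform samples
freq_out = stft(processed)  # frequency-domain output: its complex STFT
print(time_out.shape, freq_out.shape)  # (4000,) (129, 59)
```

In this sketch both outputs describe the same processed signal, once as samples and once as a time-frequency representation.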