US Patent Application 18208628. BANDWIDTH EXTENSION AND SPEECH ENHANCEMENT OF AUDIO simplified abstract

From WikiPatents

BANDWIDTH EXTENSION AND SPEECH ENHANCEMENT OF AUDIO

Inventors

Pavel Konstantinovich Andreev of Moscow (RU)


Aibek Arstanbekovich Alanov of Moscow (RU)


Oleg Yurievich Ivanov of Moscow (RU)


Dmitry Petrovich Vetrov of Moscow (RU)


BANDWIDTH EXTENSION AND SPEECH ENHANCEMENT OF AUDIO - A simplified explanation of the abstract

  • This abstract appeared for patent application number 18208628, titled 'BANDWIDTH EXTENSION AND SPEECH ENHANCEMENT OF AUDIO'

Simplified Explanation

This abstract describes a system, apparatus, and method for audio processing. The method obtains an input audio waveform and converts it into a mel-spectrogram using a short-time Fourier transform (STFT). Two-dimensional Unet convolutional blocks are then applied to the mel-spectrogram to remove noise, restore high-frequency components, or both. The updated mel-spectrogram is converted back into an audio waveform, which is corrected in the time domain and in the frequency domain to remove artifacts or noise. The corrected waveform is then passed through a one-dimensional convolutional layer, and the processed waveform is output in the time domain and in the frequency domain.
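The first stage of the pipeline (waveform to STFT to mel-spectrogram) can be sketched in plain Python as follows. This is an illustration only, not the implementation claimed in the application: the function names, the Hann window, the frame size (n_fft=64), the hop (32), the number of mel bands (8), and the commonly used 2595 * log10(1 + f / 700) mel formula are all assumptions chosen to keep the example small. The abstract does not specify any of them.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, n_fft=64, hop=32):
    """Magnitude STFT: one list of n_fft // 2 + 1 bin magnitudes per frame."""
    window = hann(n_fft)
    frames = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(n_fft)]
        bins = []
        for k in range(n_fft // 2 + 1):
            # Naive DFT for clarity; a practical system would use an FFT.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                      for n in range(n_fft))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters with centres spaced evenly on the mel scale."""
    max_mel = hz_to_mel(sample_rate / 2)
    # n_mels + 2 band-edge frequencies in Hz, from 0 up to Nyquist.
    edges = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        bank.append([max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
                     for f in bin_hz])
    return bank


def mel_spectrogram(signal, sample_rate, n_fft=64, hop=32, n_mels=8):
    """Return a list of frames, each a list of n_mels band energies."""
    spec = stft_magnitude(signal, n_fft, hop)
    bank = mel_filterbank(n_mels, n_fft, sample_rate)
    return [[sum(w * mag for w, mag in zip(row, frame)) for row in bank]
            for frame in spec]


# Usage: a 1 kHz tone sampled at 8 kHz (hypothetical test signal).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
mel = mel_spectrogram(tone, sample_rate)
```

The result is a two-dimensional array (frames by mel bands), which is the kind of input that two-dimensional convolutional blocks such as those mentioned in the abstract would operate on.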


Original Abstract Submitted

There is provided a system, apparatus and a method for audio processing. The operations include obtaining an input audio waveform, obtaining a mel-spectrogram by performing a short-time Fourier transform (STFT) operation on the input audio waveform, obtaining an updated mel-spectrogram by at least one of removing noise from the mel-spectrogram or restoring high frequency components by applying two-dimensional Unet convolutional blocks to the mel-spectrogram, converting the updated mel-spectrogram to a converted audio waveform in a waveform domain, correcting the converted audio waveform in a time domain, correcting the converted audio waveform in a frequency domain to remove artifacts or noise, processing the corrected audio waveform corrected in the time domain and corrected in the frequency domain with a one-dimensional convolutional layer, and outputting the processed audio waveform in the time domain and in the frequency domain.
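The final processing step named in the abstract, a one-dimensional convolutional layer applied to the corrected waveform, can be illustrated with a minimal single-channel sketch. The function name conv1d_same, the zero padding, and the kernel values are hypothetical; in a trained system the kernel weights would be learned, and the abstract does not state the layer's size or number of channels.

```python
def conv1d_same(waveform, kernel, bias=0.0):
    """Single-channel 1-D convolution with zero padding.

    Computed as a sliding dot product (cross-correlation), which is the usual
    convention for convolutional layers. The output has the same length as
    the input.
    """
    k = len(kernel)
    pad_left = (k - 1) // 2
    padded = [0.0] * pad_left + list(waveform) + [0.0] * (k - 1 - pad_left)
    return [
        sum(kernel[j] * padded[i + j] for j in range(k)) + bias
        for i in range(len(waveform))
    ]


# Usage: smooth a short waveform with an assumed 3-tap averaging kernel.
corrected = [0.0, 0.3, 0.9, 0.3, 0.0]
processed = conv1d_same(corrected, [1 / 3, 1 / 3, 1 / 3])
```

Because the padding keeps the length unchanged, the processed waveform stays sample-aligned with the waveform that entered the layer.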