US Patent Application 17804606. Multichannel Audio Speech Classification simplified abstract

From WikiPatents

Multichannel Audio Speech Classification

Organization Name

Microsoft Technology Licensing, LLC

Inventor(s)

Oron Nir of Herzeliya (IL)

Inbal Sagiv of Kfar-Saba (IL)

Maayan Yedidia of Ramat Gan (IL)

Fardau Van Neerden of Driel (NL)

Itai Norman of Tel Aviv (IL)

Multichannel Audio Speech Classification - A simplified explanation of the abstract

This abstract first appeared for US patent application 17804606, titled 'Multichannel Audio Speech Classification'.

Simplified Explanation

- The present disclosure describes systems and methods for multichannel audio speech classification.
- A processing device receives an audio signal that contains multiple audio channels.
- Each audio channel is transcoded to a predefined audio format.
- For each transcoded audio channel, an average power value is calculated for one or more data windows.
- For each audio channel, a correlation value is calculated between that channel's average power values and the combined average power value of the other audio channels.
- Each correlation value (or an aggregated correlation value across the channels) is compared against a threshold value to determine whether the audio signal is a speech-based communication.
- Based on that classification, an action associated with the audio signal may be performed.
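The per-channel computation in the steps above can be written compactly as follows. This is one plausible formalization, not the application's own notation. It assumes that $x_c[n]$ is the $n$-th sample of transcoded channel $c$ (of $C$ channels), that windows are non-overlapping and $N$ samples long with $W$ windows in total, that "combined" means the mean over the other $C-1$ channels, and that the correlation is Pearson's. The direction of the threshold test is also an assumption, since the source does not state it.

```latex
\begin{aligned}
P_c(w) &= \frac{1}{N}\sum_{n=wN}^{(w+1)N-1} x_c[n]^2, \qquad w = 0,\dots,W-1,\\
\bar{P}_{-c}(w) &= \frac{1}{C-1}\sum_{c' \neq c} P_{c'}(w),\\
\rho_c &= \frac{\sum_{w}\bigl(P_c(w)-\mu_c\bigr)\bigl(\bar{P}_{-c}(w)-\bar{\mu}_{-c}\bigr)}
{\sqrt{\sum_{w}\bigl(P_c(w)-\mu_c\bigr)^2}\,\sqrt{\sum_{w}\bigl(\bar{P}_{-c}(w)-\bar{\mu}_{-c}\bigr)^2}},\\
\text{speech-based communication} &\iff \rho_c < \tau \;\; \text{for all } c \quad \text{(assumed rule)}.
\end{aligned}
```

Here $\mu_c$ and $\bar{\mu}_{-c}$ are the means of $P_c(w)$ and $\bar{P}_{-c}(w)$ over the $W$ windows, and $\tau$ is the threshold value.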


Original Abstract Submitted

Examples of the present disclosure describe systems and methods for multichannel audio speech classification. In examples, an audio signal comprising multiple audio channels is received at a processing device. Each of the audio channels in the audio signal is transcoded to a predefined audio format. For each of the transcoded audio channels, an average power value is calculated for one or more data windows in the audio signal. A correlation value is calculated between the average power value for each audio channel and the combined average power value of the other audio channels in the audio signal. Each of the correlation values (or an aggregated correlation value for the audio channels) is then compared against a threshold value to determine whether the audio signal is to be classified as a speech-based communication. Based on the classification, an action associated with the audio signal may be performed.
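A minimal Python sketch of the steps in the abstract follows. It is not the applicants' implementation, and it rests on assumptions the abstract leaves open: the channels are already transcoded and decoded to equal-rate PCM sample lists; windows are non-overlapping and of a fixed size; the "combined" average power is the mean of the other channels; the correlation is Pearson's; the aggregated value is the mean of the per-channel correlations; and a value below the threshold is taken to mean speech-based communication. The function names and the `window_size` and `threshold` defaults are hypothetical.

```python
import math
from typing import List, Sequence


def window_average_power(samples: Sequence[float], window_size: int) -> List[float]:
    """Mean squared sample value for each non-overlapping window (partial tail dropped)."""
    n_windows = len(samples) // window_size
    return [
        sum(s * s for s in samples[w * window_size:(w + 1) * window_size]) / window_size
        for w in range(n_windows)
    ]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def channel_correlations(channels: Sequence[Sequence[float]], window_size: int) -> List[float]:
    """Correlate each channel's window powers with the mean window power of the others."""
    if len(channels) < 2:
        raise ValueError("need at least two audio channels")
    powers = [window_average_power(ch, window_size) for ch in channels]
    n_windows = min(len(p) for p in powers)
    if n_windows < 2:
        raise ValueError("need at least two full windows per channel")
    correlations = []
    for i, p in enumerate(powers):
        others = [q for j, q in enumerate(powers) if j != i]
        # ASSUMPTION: "combined" average power = mean over the other channels.
        combined = [sum(q[w] for q in others) / len(others) for w in range(n_windows)]
        correlations.append(pearson(p[:n_windows], combined))
    return correlations


def is_speech_communication(
    channels: Sequence[Sequence[float]],
    window_size: int = 160,  # hypothetical, e.g. 10 ms at 16 kHz
    threshold: float = 0.5,  # hypothetical threshold value
) -> bool:
    """Classify using the mean (aggregated) correlation across channels."""
    corrs = channel_correlations(channels, window_size)
    aggregated = sum(corrs) / len(corrs)
    # ASSUMPTION: low correlation between channels indicates speech-based
    # communication; the abstract does not state the comparison direction.
    return aggregated < threshold
```

For example, with `window_size=4`, two channels whose speakers alternate (one channel loud while the other is silent) produce anti-correlated window powers, so each correlation is -1.0 and the sketch returns `True`; two identical channels give correlations of 1.0 and return `False`. Using the mean of the other channels rather than their sum does not change the Pearson value, since correlation is invariant to positive scaling.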