Fake voice detection method based on dual-track differential modeling

Organization Name

Inner Mongolia University

Inventor(s)

Rui Liu of Hohhot CN

Jinhua Zhang of Hohhot CN

Yifan Hu of Hohhot CN

Haolin Zuo of Hohhot CN

Yuxuan Ma of Hohhot CN

This abstract first appeared for US patent application 20250095669, titled 'Fake voice detection method based on dual-track differential modeling'.

Original Abstract Submitted

A fake voice detection method based on dual-track differential modeling includes: converting, by a pre-trained single/dual-track voice conversion model, a single-track audio into stereo audio; extracting left-track and right-track mel spectrogram features; performing texture enhancement on the absolute values of the differences between the original single-track mel spectrogram feature and each of the left-track and right-track mel spectrogram features; inputting the texture enhancement results into left and right dual-branch feature extractors, respectively, to acquire a final attention map through feature extraction, processing, and fusion; and inputting the final attention map into an attention pooling layer and a final binary classification layer to acquire a fake voice detection result. The fake voice detection method achieves fake voice detection by means of dual-track differential modeling, fine-grained texture enhancement, and multi-head attention feature fusion, improving the accuracy, transferability, and generalization of fake voice detection.
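
The differencing and pooling steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the patented implementation: the function names, the array shapes, and the single scoring vector used for pooling are assumptions made for illustration, and the voice conversion model, texture enhancement, and dual-branch extractors are not modeled because the abstract does not specify them.

```python
import numpy as np


def dual_track_differences(mono_mel, left_mel, right_mel):
    """Return |mono - left| and |mono - right| for same-shaped mel spectrograms.

    All three inputs are assumed to be (n_mels, n_frames) arrays.
    """
    mono = np.asarray(mono_mel, dtype=np.float64)
    left = np.asarray(left_mel, dtype=np.float64)
    right = np.asarray(right_mel, dtype=np.float64)
    if not (mono.shape == left.shape == right.shape):
        raise ValueError("all three mel spectrograms must share one shape")
    return np.abs(mono - left), np.abs(mono - right)


def attention_pool(features, score_vector):
    """Generic softmax-weighted average over frames (an assumed pooling form).

    features: (n_frames, dim) array.
    score_vector: (dim,) hypothetical learned vector that scores each frame.
    """
    features = np.asarray(features, dtype=np.float64)
    scores = features @ np.asarray(score_vector, dtype=np.float64)
    scores = scores - scores.max()  # shift for numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum()
    return weights @ features  # (dim,) pooled embedding
```

In a full pipeline, the two difference maps would be passed through texture enhancement and the two feature-extractor branches before pooling, and the pooled vector would feed the binary classifier.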