Google LLC (20250022477). COMPRESSING AUDIO WAVEFORMS USING A STRUCTURED LATENT SPACE

COMPRESSING AUDIO WAVEFORMS USING A STRUCTURED LATENT SPACE

Organization Name

Google LLC

Inventor(s)

Ahmed Omran of Baar CH

Neil Zeghidour of Paris FR

Zalán Borsos of Zurich CH

Félix De Chaumont Quitry of Zürich CH

Marco Tagliasacchi of Kilchberg CH

This abstract first appeared for US patent application 20250022477, titled 'COMPRESSING AUDIO WAVEFORMS USING A STRUCTURED LATENT SPACE'.

Original Abstract Submitted

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an encoder neural network and a decoder neural network. In one aspect, a method includes: obtaining a first initial audio waveform and a first noisy audio waveform; obtaining a second initial audio waveform and a second noisy audio waveform; processing the first noisy audio waveform and the second noisy audio waveform using an encoder neural network; generating a blended embedding by concatenating (i) clean feature dimensions from an embedding of the first noisy audio waveform and (ii) noise feature dimensions from an embedding of the second noisy audio waveform; processing the blended embedding using a decoder neural network to generate a reconstructed audio waveform; determining gradients of an objective function; and updating parameter values of the encoder neural network and the decoder neural network using the gradients.
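
The blending step described in the abstract can be illustrated with a minimal Python sketch. It is not the implementation from the application. It assumes an embedding is a list of per-frame feature vectors, that the first `num_clean_dims` entries of each frame are the "clean" feature dimensions, and that the remaining entries are the "noise" feature dimensions. The abstract does not specify this layout, and the names `blend_embeddings` and `num_clean_dims` are hypothetical. The encoder, decoder, and objective function are left out because the abstract gives no details about them.

```python
from typing import List

# Assumption: an embedding is a sequence of frames, each a fixed-length feature vector.
Embedding = List[List[float]]


def blend_embeddings(first: Embedding, second: Embedding, num_clean_dims: int) -> Embedding:
    """Concatenate, frame by frame, the clean dimensions of `first`
    with the noise dimensions of `second`.

    Assumed layout (not stated in the abstract): dimensions
    [0, num_clean_dims) are "clean", the rest are "noise".
    """
    if len(first) != len(second):
        raise ValueError("embeddings must have the same number of frames")
    blended: Embedding = []
    for frame_a, frame_b in zip(first, second):
        if len(frame_a) != len(frame_b):
            raise ValueError("frames must have the same feature dimension")
        if not 0 <= num_clean_dims <= len(frame_a):
            raise ValueError("num_clean_dims is out of range")
        # Clean part comes from the first embedding, noise part from the second.
        blended.append(frame_a[:num_clean_dims] + frame_b[num_clean_dims:])
    return blended


# Toy embeddings: 2 frames, 4 feature dimensions, first 3 treated as "clean".
emb_first = [[1.0, 2.0, 3.0, 9.0], [4.0, 5.0, 6.0, 9.0]]
emb_second = [[0.0, 0.0, 0.0, 7.0], [0.0, 0.0, 0.0, 8.0]]
blended = blend_embeddings(emb_first, emb_second, num_clean_dims=3)
print(blended)  # [[1.0, 2.0, 3.0, 7.0], [4.0, 5.0, 6.0, 8.0]]
```

In a training step following the abstract, the blended embedding would then be passed to the decoder, and gradients of the objective function would be used to update both networks.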