MULTI-TIME-SCALE NEURAL AUDIO CODEC STREAMS

Organization Name

Cisco Technology, Inc.

Inventor(s)

Rafal Pilarczyk of Plock PL

Amir Salah Abdelsamie Abdelwahed of Edinburgh GB

Hui-Ling Lu of Palo Alto CA US

Ivana Balic of Studen BE CH

Yusuf Ziya Isik of Edinburgh GB

David Guoqing Zhang of Fremont CA US

Xuehong Mao of San Jose CA US

Samer Lutfi Hijazi of San Jose CA US


This abstract first appeared for US patent application 18539764, titled 'MULTI-TIME-SCALE NEURAL AUDIO CODEC STREAMS'.

Original Abstract Submitted

A data-driven audio codec system produces multiple compressed streams comprising encoded information (e.g., codeword indices) at different time scales (i.e., at different time intervals or rates). This may allow different properties of speech, such as content and aspects of style (prosody), to separate into the different compressed streams without that separation being explicitly enforced, i.e., in an unsupervised manner. Speech audio is encoded to produce a plurality of encoded streams comprising encoded information for the speech audio at different time scales. The plurality of encoded streams is decoded to generate output audio.
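To make the multi-time-scale idea concrete, here is a minimal Python sketch. It is not the patented implementation. It assumes a coarse-to-fine residual arrangement in which a coarse stream quantizes feature frames averaged over groups of frames and a finer stream quantizes the remainder. The function names (`mean_pool`, `nearest_index`, `upsample`, `encode_multiscale`, `decode_multiscale`), the fixed toy codebooks, the pooling factors, and the summation decoder are all hypothetical. In a data-driven codec, the encoder, codebooks, and decoder would be learned, and any content/prosody separation would emerge from training rather than from this hand-built example.

```python
# Hypothetical sketch: fixed toy codebooks and coarse-to-fine residual
# quantization are illustrative assumptions, not the patented method.
from typing import List, Sequence

Vector = List[float]


def mean_pool(frames: Sequence[Vector], factor: int) -> List[Vector]:
    """Average consecutive groups of `factor` frames (the last group may be shorter)."""
    pooled = []
    for start in range(0, len(frames), factor):
        group = frames[start:start + factor]
        dim = len(group[0])
        pooled.append([sum(f[d] for f in group) / len(group) for d in range(dim)])
    return pooled


def nearest_index(vec: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance to `vec`."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vec, codebook[i])))


def upsample(codewords: Sequence[Vector], factor: int, length: int) -> List[Vector]:
    """Repeat each codeword `factor` times, then trim to `length` frames."""
    out = [list(c) for c in codewords for _ in range(factor)]
    return out[:length]


def encode_multiscale(frames, codebooks, factors):
    """Return one index stream per time scale, coarsest scale first.

    Each scale quantizes the residual left over by the coarser scales."""
    residual = [list(f) for f in frames]
    streams = []
    for codebook, factor in zip(codebooks, factors):
        pooled = mean_pool(residual, factor)
        indices = [nearest_index(v, codebook) for v in pooled]
        streams.append(indices)
        # Subtract this scale's reconstruction before moving to the finer scale.
        recon = upsample([codebook[i] for i in indices], factor, len(frames))
        residual = [[r - c for r, c in zip(rf, cf)] for rf, cf in zip(residual, recon)]
    return streams


def decode_multiscale(streams, codebooks, factors, length):
    """Sum the upsampled codewords of every stream to rebuild `length` frames."""
    dim = len(codebooks[0][0])
    out = [[0.0] * dim for _ in range(length)]
    for indices, codebook, factor in zip(streams, codebooks, factors):
        recon = upsample([codebook[i] for i in indices], factor, length)
        for t in range(length):
            out[t] = [o + c for o, c in zip(out[t], recon[t])]
    return out


# Toy usage: 1-D features, a coarse stream (one index per 4 frames)
# and a fine stream (one index per frame).
frames = [[1.0], [1.2], [0.9], [1.1], [3.0], [3.1], [2.9], [3.2]]
codebooks = [[[0.0], [1.0], [3.0]],                      # coarse codebook
             [[-0.2], [-0.1], [0.0], [0.1], [0.2]]]      # fine codebook
factors = [4, 1]
streams = encode_multiscale(frames, codebooks, factors)
print(streams)  # [[1, 2], [2, 4, 1, 3, 2, 3, 1, 4]]
decoded = decode_multiscale(streams, codebooks, factors, len(frames))
```

In this toy setup, the coarse stream emits one index per four frames, so its index rate is a quarter of the fine stream's. A slowly varying property could in principle be carried cheaply on such a stream, while faster detail lands in the finer one.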