Google LLC (20240312449). Unsupervised Learning of Disentangled Speech Content and Style Representation simplified abstract

From WikiPatents

Unsupervised Learning of Disentangled Speech Content and Style Representation

Organization Name

Google LLC

Inventor(s)

Ruoming Pang of New York NY (US)

Andros Tjandra of Mountain View CA (US)

Yu Zhang of Mountain View CA (US)

Shigeki Karita of Mountain View CA (US)

Unsupervised Learning of Disentangled Speech Content and Style Representation - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240312449, titled 'Unsupervised Learning of Disentangled Speech Content and Style Representation'.

The patent application describes a model that can disentangle linguistic content and speaking style in speech.

  • Content encoder receives input speech and generates a latent representation of linguistic content.
  • Style encoder receives input speech and generates a latent representation of speaking style.
  • Decoder generates output speech based on the latent representations of content and style.
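The three components above can be sketched as follows. This is a minimal illustration, not the patented implementation: the mel-spectrogram input, the layer dimensions, and the class names are all assumptions chosen for clarity. The key structural point it demonstrates is that the content encoder produces a frame-level representation, the style encoder produces a single utterance-level vector, and the decoder can combine content from one utterance with style from another.

```python
import numpy as np

rng = np.random.default_rng(0)

class ContentEncoder:
    """Frame-level linguistic content: one latent vector per input frame (illustrative)."""
    def __init__(self, n_mels=80, dim=64):
        self.w = rng.standard_normal((n_mels, dim)) * 0.01
    def __call__(self, speech):           # speech: (frames, n_mels)
        return np.tanh(speech @ self.w)   # -> (frames, dim)

class StyleEncoder:
    """Utterance-level speaking style: a single latent vector (illustrative)."""
    def __init__(self, n_mels=80, dim=16):
        self.w = rng.standard_normal((n_mels, dim)) * 0.01
    def __call__(self, speech):                        # speech: (frames, n_mels)
        return np.tanh(speech.mean(axis=0) @ self.w)   # -> (dim,)

class Decoder:
    """Generates output frames from a content sequence plus a style vector."""
    def __init__(self, content_dim=64, style_dim=16, n_mels=80):
        self.w = rng.standard_normal((content_dim + style_dim, n_mels)) * 0.01
    def __call__(self, content, style):
        # Tile the single style vector across every content frame, then project.
        style_tiled = np.broadcast_to(style, (content.shape[0], style.shape[0]))
        return np.concatenate([content, style_tiled], axis=1) @ self.w

# Voice conversion: linguistic content from utterance A, speaking style from utterance B.
speech_a = rng.standard_normal((120, 80))   # 120 frames of 80-dim mel features
speech_b = rng.standard_normal((95, 80))
content_enc, style_enc, dec = ContentEncoder(), StyleEncoder(), Decoder()
out = dec(content_enc(speech_a), style_enc(speech_b))
print(out.shape)  # (120, 80): A's content length, rendered with B's style vector
```

Note that the output length follows the content utterance, while the style utterance contributes only a fixed-size vector; this mirrors the abstract's statement that the decoder may take style "for the same or different input speech".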

Potential Applications:

  • Speech synthesis with different speaking styles
  • Voice conversion for personalized speech
  • Language translation with style preservation

Problems Solved:

  • Separating content and style in speech data
  • Enhancing naturalness and expressiveness in speech synthesis
  • Improving cross-lingual voice conversion

Benefits:

  • Customizable speech generation
  • Enhanced naturalness in synthesized speech
  • Improved accuracy in voice conversion tasks

Commercial Applications:

This advanced speech synthesis and voice conversion technology can be used in industries such as:

  • Entertainment (creating unique character voices)
  • Customer service (personalized automated responses)
  • Language learning (accent adaptation in language courses)

Questions about the technology: 1. How does this model improve upon existing speech synthesis techniques?

  - This model allows for the separation of content and style in speech, enabling more customizable and natural-sounding speech synthesis.

2. Can this technology be applied to real-time speech processing?

  - Yes, with further optimization, this technology could potentially be used for real-time applications such as voice assistants or live translation services.


Original Abstract Submitted

A linguistic content and speaking style disentanglement model includes a content encoder, a style encoder, and a decoder. The content encoder is configured to receive input speech as input and generate a latent representation of linguistic content for the input speech as output. The content encoder is trained to disentangle speaking style information from the latent representation of linguistic content. The style encoder is configured to receive the input speech as input and generate a latent representation of speaking style for the input speech as output. The style encoder is trained to disentangle linguistic content information from the latent representation of speaking style. The decoder is configured to generate output speech based on the latent representation of linguistic content for the input speech and the latent representation of speaking style for the same or different input speech.