SRI International (20240304175). SPEECH MODIFICATION USING ACCENT EMBEDDINGS simplified abstract

From WikiPatents

SPEECH MODIFICATION USING ACCENT EMBEDDINGS

Organization Name

SRI International

Inventor(s)

Alexander Erdmann of Malvern OH (US)

Sarah Bakst of San Francisco CA (US)

Harry Bratt of Mountain View CA (US)

Dimitra Vergyri of Sunnyvale CA (US)

Horacio Franco of Menlo Park CA (US)

SPEECH MODIFICATION USING ACCENT EMBEDDINGS - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240304175, titled 'SPEECH MODIFICATION USING ACCENT EMBEDDINGS'.

The patent application describes techniques for a machine learning system that can generate synthetic speech clips based on a dataset of sample speech clips.

  • The system generates sequence embeddings, initializes and then updates speaker embeddings and accent embeddings, and combines all three into augmented embeddings from which it generates synthetic speech clips.
  • It can also decompose an audio waveform into first magnitude spectral slices and the original phase, map those slices to second magnitude spectral slices, and combine the mapped slices with the original phase to generate a modified audio waveform.
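The magnitude and phase step in the second bullet can be sketched in a few lines of Python. This is an illustration only, not the method claimed in the application: it assumes non-overlapping frames and a naive DFT so that it needs nothing beyond the standard library, and the names `decompose`, `recombine`, `modify_waveform`, and `map_magnitudes` are hypothetical. In the application the mapping between magnitude slices is performed by a machine learning system; here a plain function stands in for it.

```python
import cmath
import math
from typing import Callable, List, Tuple

Slice = List[float]


def dft(frame: List[float]) -> List[complex]:
    """Naive discrete Fourier transform of one frame (O(n^2), for illustration)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum: List[complex]) -> List[float]:
    """Inverse DFT; keeps the real part, since the input waveform is real."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def decompose(waveform: List[float], frame_len: int) -> Tuple[List[Slice], List[Slice]]:
    """Return per-frame magnitude slices and phase slices.

    Assumption: non-overlapping frames; trailing samples that do not fill
    a whole frame are dropped.
    """
    magnitudes, phases = [], []
    for start in range(0, len(waveform) - frame_len + 1, frame_len):
        spectrum = dft(waveform[start:start + frame_len])
        magnitudes.append([abs(c) for c in spectrum])
        phases.append([cmath.phase(c) for c in spectrum])
    return magnitudes, phases


def recombine(magnitudes: List[Slice], phases: List[Slice]) -> List[float]:
    """Rebuild a waveform from magnitude slices and phase slices."""
    out: List[float] = []
    for mag, ph in zip(magnitudes, phases):
        out.extend(idft([cmath.rect(m, p) for m, p in zip(mag, ph)]))
    return out


def modify_waveform(waveform: List[float],
                    map_magnitudes: Callable[[Slice], Slice],
                    frame_len: int = 8) -> List[float]:
    """Map first magnitude slices to second ones and reuse the original phase."""
    first_mags, original_phase = decompose(waveform, frame_len)
    second_mags = [map_magnitudes(m) for m in first_mags]  # stand-in for the model
    return recombine(second_mags, original_phase)


# Usage: a placeholder mapping that halves every magnitude.
waveform = [math.sin(2 * math.pi * 3 * t / 16) for t in range(16)]
quieter = modify_waveform(waveform, lambda mags: [0.5 * m for m in mags])
```

Because only the magnitudes change, the timing information carried by the phase is preserved. A practical system would more likely use overlapping windowed frames and an FFT library.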

Potential Applications:

  • This technology can be used in speech synthesis applications, virtual assistants, and voice-controlled devices.
  • It can also be applied in language translation services, voice cloning, and personalized speech generation.

Problems Solved:

  • The technology addresses the need for realistic and diverse synthetic speech generation.
  • It improves the quality and naturalness of synthesized speech by incorporating speaker and accent embeddings.

Benefits:

  • Enhanced speech synthesis capabilities for various applications.
  • Improved user experience with more natural and diverse synthetic speech.
  • Personalized speech generation for specific speakers or accents.

Commercial Applications:

  • Virtual assistants and voice-controlled devices: the technology could let these products respond with more natural and personalized synthetic speech.

Prior Art:

  • Researchers in the field of speech synthesis and machine learning have explored similar techniques for improving synthetic speech quality and diversity.

Frequently Updated Research:

  • Stay updated on advancements in machine learning models for speech synthesis and the integration of speaker embeddings for personalized speech generation.

Questions about Speech Synthesis Technology:

1. How does this technology improve the quality of synthetic speech compared to traditional methods?

  - The technology enhances speech synthesis by incorporating speaker and accent embeddings, resulting in more natural and diverse speech output.

2. What are the potential privacy concerns related to using personalized speech generation technology?

  - Privacy concerns may arise from the collection and storage of voice data used to create personalized speech models. It is essential to address data security and user consent issues in the development and deployment of such technology.


Original Abstract Submitted

Techniques for a machine learning system configured to obtain a dataset of a plurality of sample speech clips; generate a plurality of sequence embeddings; initialize a plurality of speaker embeddings and a plurality of accent embeddings; update the plurality of speaker embeddings; update the plurality of accent embeddings; generate a plurality of augmented embeddings based on the plurality of sequence embeddings, the plurality of speaker embeddings, and the plurality of accent embeddings; and generate a plurality of synthetic speech clips based on the plurality of augmented embeddings. The machine learning system may further be configured to obtain an audio waveform; decompose the audio waveform into first magnitude spectral slices and an original phase; process the first magnitude spectral slices to map the first magnitude spectral slices to second magnitude spectral slices; and generate a modified audio waveform in part by combining the second magnitude spectral slices and the original phase.
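
The embedding steps in the first sentence of the abstract can be illustrated with a toy Python sketch. The abstract does not say how the embeddings are initialized, updated, or combined, so everything concrete here is an assumption made for illustration: small random initial values, a gradient-style update with a caller-supplied gradient, and concatenation as the way to form an augmented embedding. The names `init_embeddings`, `update_embedding`, and `augment`, and the speaker and accent keys, are hypothetical.

```python
import random
from typing import Dict, List

Vector = List[float]


def init_embeddings(keys: List[str], dim: int, seed: int = 0) -> Dict[str, Vector]:
    """Initialize one small random vector per key (assumed initialization)."""
    rng = random.Random(seed)
    return {k: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for k in keys}


def update_embedding(table: Dict[str, Vector], key: str,
                     grad: Vector, lr: float = 0.1) -> None:
    """Apply one gradient-style step; the gradient itself is supplied by the caller."""
    table[key] = [v - lr * g for v, g in zip(table[key], grad)]


def augment(sequence_emb: Vector, speaker_emb: Vector, accent_emb: Vector) -> Vector:
    """Form an augmented embedding by concatenation (assumed combination rule)."""
    return list(sequence_emb) + list(speaker_emb) + list(accent_emb)


# Initialize, then update, the speaker and accent tables.
speakers = init_embeddings(["spk_a", "spk_b"], dim=2, seed=1)
accents = init_embeddings(["accent_x", "accent_y"], dim=2, seed=2)
update_embedding(speakers, "spk_a", grad=[0.5, -0.5])
update_embedding(accents, "accent_x", grad=[-1.0, 1.0])

# Same content and speaker, two different accents.
sequence_emb = [0.3, 0.7, 0.1]
with_accent_x = augment(sequence_emb, speakers["spk_a"], accents["accent_x"])
with_accent_y = augment(sequence_emb, speakers["spk_a"], accents["accent_y"])
```

Keeping speaker and accent in separate tables means the accent part of the augmented embedding can be swapped while the content and speaker parts stay fixed. In this sketch the two results differ only in their last two values.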