Amazon Technologies, Inc. (20240274122). SPEECH TRANSLATION WITH PERFORMANCE CHARACTERISTICS simplified abstract

From WikiPatents


Organization Name

Amazon Technologies, Inc.

Inventor(s)


Duo Wang of Cambridge (GB)

Vincent Laurent J. Pollet of Astene (BE)

Mikolaj Wojciech Babianski of Gdansk (PL)

Jakub Bartlomiej Swiatkowski of Warsaw (PL)


This abstract first appeared for US patent application 20240274122 titled 'SPEECH TRANSLATION WITH PERFORMANCE CHARACTERISTICS'.

Simplified Explanation: The patent application describes a system that can translate source speech in one language into synthesized speech in another language while preserving vocal performance characteristics like intonation, emphasis, and emotion.

Key Features and Innovation:

  • Processing source speech in a source language and outputting synthesized speech in a target language.
  • Retaining vocal performance characteristics such as intonation, emphasis, rhythm, style, and emotion.
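  • Receiving a transcript of the source speech and translating it to generate transcript data.
  • Processing the transcript data with a language embedding (speech characteristics of the target language), a speaker embedding (voice identity of a speaker), and a performance embedding (vocal performance of the source speech) to create synthesized speech.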
  • Controlling the duration of segments of the synthesized speech to align with the source speech for dubbing multimedia content.
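
The embedding step in the list above can be pictured with a small sketch. This is only an illustration under stated assumptions: the abstract does not say how the three embeddings are combined, so plain concatenation is assumed here, and the names `Conditioning` and `condition_tokens` are hypothetical rather than taken from the application.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Conditioning:
    """Hypothetical container for the three embeddings named in the abstract."""
    language: List[float]     # language-dependent characteristics of the target language
    speaker: List[float]      # voice identity characteristics of a speaker
    performance: List[float]  # vocal performance of the source speech

    def as_vector(self) -> List[float]:
        # Assumption: the embeddings are simply concatenated.
        # The abstract does not specify the combination method.
        return self.language + self.speaker + self.performance


def condition_tokens(token_vectors: List[List[float]],
                     cond: Conditioning) -> List[List[float]]:
    """Append the same conditioning vector to every token representation."""
    c = cond.as_vector()
    return [tok + c for tok in token_vectors]


# Toy values for illustration only.
cond = Conditioning(language=[1.0, 0.0],
                    speaker=[0.2, 0.7, 0.1],
                    performance=[0.9])
tokens = [[0.5, 0.5], [0.1, 0.3]]  # stand-ins for encoded translated-transcript tokens
conditioned = condition_tokens(tokens, cond)
print(len(conditioned), len(conditioned[0]))
```

In a real system the embeddings would come from trained encoders and the conditioned tokens would feed a speech synthesizer. Both are outside the scope of this sketch.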

Potential Applications: This technology could be used for dubbing movies, TV shows, and other multimedia content into different languages. It could also assist in language learning and accessibility for individuals with hearing impairments.

Problems Solved: The system addresses the challenge of carrying vocal performance characteristics such as intonation, emphasis, and emotion from source speech into speech synthesized in another language. It also helps produce dubbed content whose segment timing aligns with the original audio.

Benefits:


  • Preserves vocal performance characteristics across languages.
  • Enhances the quality of dubbed multimedia content.
  • Facilitates language learning and accessibility.

Commercial Applications: The technology could be valuable for dubbing studios, language learning platforms, streaming services, and accessibility-focused organizations. It has the potential to improve user experience and expand the reach of multimedia content.

Prior Art: Researchers interested in this technology may explore existing patents related to speech translation, voice synthesis, and dubbing technologies to understand the prior art landscape.

Frequently Updated Research: Stay updated on advancements in speech recognition, natural language processing, and machine learning techniques that could enhance the accuracy and efficiency of speech translation systems.

Questions about Speech Translation Systems:

  1. How does the system ensure the accurate preservation of vocal performance characteristics during translation?
  2. What are the potential challenges in aligning the duration of segments in synthesized speech with the original source speech?

Original Abstract Submitted

An expressive speech translation system may process source speech in a source language and output synthesized speech in a target language while retaining vocal performance characteristics such as intonation, emphasis, rhythm, style, and/or emotion. The system may receive a transcript of the source speech, translate it, and generate transcript data. To generate the synthesized speech, the system may process the transcript data with a language embedding representing language-dependent speech characteristics of the target language, a speaker embedding representing speaker-dependent voice identity characteristics of a speaker, and a performance embedding representing the vocal performance characteristics of the source speech. The system may control the duration of segments of the synthesized speech to better align with corresponding segments of the source speech for the purpose of dubbing multimedia content with synthesized speech in a language different from that of the original audio.
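
The duration control mentioned at the end of the abstract could work along the following lines. This is a minimal sketch, not the method claimed in the application: it assumes the synthesizer exposes per-unit predicted durations, and it applies a uniform speaking-rate scale clamped to illustrative bounds. The function name `fit_durations` and the `min_rate` and `max_rate` defaults are hypothetical.

```python
from typing import List


def fit_durations(predicted: List[float], target_total: float,
                  min_rate: float = 0.8, max_rate: float = 1.25) -> List[float]:
    """Scale predicted unit durations (seconds) toward a target segment length.

    A rate above 1 means the synthesized segment is spoken faster.
    The rate is clamped so speech does not become unnaturally fast or slow;
    the clamp bounds are illustrative assumptions.
    """
    total = sum(predicted)
    if total <= 0 or target_total <= 0:
        raise ValueError("durations must be positive")
    rate = total / target_total                 # speed-up needed for an exact fit
    rate = max(min_rate, min(max_rate, rate))   # keep the rate within natural bounds
    return [d / rate for d in predicted]


source_segment = 2.0                  # seconds of source speech in this segment
predicted = [0.5, 0.75, 0.75, 0.5]    # predicted durations, 2.5 s in total
fitted = fit_durations(predicted, source_segment)
print(round(sum(fitted), 3))
```

When the required rate falls outside the clamp, the fitted segment still differs in length from the source segment. A dubbing pipeline would then need another remedy, such as a shorter translation.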