18477859. GENERATING AUDIO FILES FROM TEXT INPUT simplified abstract (Meta Platforms Technologies, LLC)

From WikiPatents
Revision as of 03:21, 16 April 2024 by Wikipatents (Creating a new page)

GENERATING AUDIO FILES FROM TEXT INPUT

Organization Name

Meta Platforms Technologies, LLC

Inventor(s)

Yaniv Nechemia Taigman of Raanana (IL)

Felix Kruk of Rehovot (IL)

Yossef Mordechay Adi of Rishon Le Zion (IL)

Gabriel Synnaeve of Paris (FR)

Adam Polyak of Tel Aviv (IL)

Uriel Singer of Harish (IL)

Devi Niru Parikh of San Francisco CA (US)

Alexandre Défossez of Paris (FR)

Jade Copet of Paris (FR)

GENERATING AUDIO FILES FROM TEXT INPUT - A simplified explanation of the abstract

This abstract first appeared for US patent application 18477859 titled 'GENERATING AUDIO FILES FROM TEXT INPUT'.

Simplified Explanation

The patent application describes methods, systems, and storage media for generating audio data from a text input. Representative audio sources are encoded into audio tokens, and the text input is encoded into text representations. Each audio token is mapped to a text representation, and the mapping yields a relationship score that identifies a distribution over the audio tokens. A subgroup of audio tokens is then decoded to yield a reconstructed audio source.

  • Encoding audio sources into audio tokens and text inputs into text representations
  • Mapping audio tokens to text representations to determine relationship scores
  • Decoding audio tokens to reconstruct audio sources
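
The encode and decode steps listed above can be illustrated with a toy round trip. This is a minimal sketch only: the application does not disclose its encoder or decoder in the abstract, so a uniform scalar quantizer is swapped in here as a stand-in. The names `encode_audio`, `decode_audio`, and `NUM_LEVELS` are hypothetical and do not come from the application.

```python
import math

# Assumption: a uniform scalar quantizer stands in for the (unspecified)
# audio encoder/decoder. NUM_LEVELS is a hypothetical codebook size.
NUM_LEVELS = 16


def encode_audio(samples, num_levels=NUM_LEVELS):
    """Map each sample in [-1.0, 1.0] to an integer token in [0, num_levels - 1]."""
    tokens = []
    for s in samples:
        clipped = max(-1.0, min(1.0, s))
        tokens.append(round((clipped + 1.0) / 2.0 * (num_levels - 1)))
    return tokens


def decode_audio(tokens, num_levels=NUM_LEVELS):
    """Map tokens back to samples; exact up to quantization error."""
    return [t / (num_levels - 1) * 2.0 - 1.0 for t in tokens]


# A short sine burst stands in for one "representative audio source".
source = [math.sin(2 * math.pi * k / 8) for k in range(8)]
tokens = encode_audio(source)
reconstructed = decode_audio(tokens)

# The reconstruction differs from the source by at most one quantization step.
max_error = max(abs(a - b) for a, b in zip(source, reconstructed))
```

A real system would likely use a learned codec with a much larger codebook, but the shape of the data flow is the same: continuous audio in, discrete tokens out, and an approximate waveform back after decoding.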

Potential Applications

The technology can be applied in speech recognition, language translation, audio editing, and voice synthesis.

Problems Solved

This technology solves the problem of efficiently generating audio data from text inputs and representative audio sources.

Benefits

The benefits of this technology include improved audio data generation, enhanced speech recognition accuracy, and more efficient language translation.

Potential Commercial Applications

Potential commercial applications of this technology include speech-to-text software, language translation services, audio editing tools, and voice-controlled devices.

Possible Prior Art

Possible prior art includes existing speech recognition systems that map audio inputs to text outputs.

Unanswered Questions

1. How does this technology handle accents and dialects in speech recognition?
2. What is the computational complexity of mapping audio tokens to text representations?


Original Abstract Submitted

Methods, systems, and storage media for generating audio data includes receiving a text input. The method also includes receiving a plurality of representative audio sources and encoding the plurality of representative audio sources into a plurality of audio tokens. The method includes encoding the text input into a plurality of text representations. The method comprises mapping each audio tokens of the plurality of audio tokens to a text representation of the plurality of text representations. The method also comprises determining a relationship score based on mapping each audio tokens to the text representation, wherein the relationship score identifies a distribution of audio tokens from the plurality of audio tokens. The method and systems can also comprise decoding the subgroup of audio tokens to yield a reconstructed audio source.
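
The abstract says the relationship score "identifies a distribution of audio tokens" and that a subgroup of tokens is decoded, but it does not say how the score is computed. One plausible reading, sketched below under that assumption, is a dot-product similarity between each audio token embedding and a text representation, normalized with a softmax. The functions `relationship_scores` and `select_subgroup`, the 2-D embeddings, and the top-k selection rule are all hypothetical illustrations, not the claimed method.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax: turns raw scores into a probability distribution."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relationship_scores(token_embeddings, text_representation):
    """Assumed scoring rule: softmax over token/text dot-product similarities."""
    return softmax([dot(e, text_representation) for e in token_embeddings])


def select_subgroup(distribution, k):
    """Indices of the k most probable tokens (assumed selection rule)."""
    order = sorted(range(len(distribution)), key=lambda i: distribution[i], reverse=True)
    return order[:k]


# Hypothetical 2-D embeddings for four audio tokens and one text representation.
token_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7], [-1.0, 0.0]]
text_representation = [1.0, 0.5]

distribution = relationship_scores(token_embeddings, text_representation)
subgroup = select_subgroup(distribution, k=2)  # tokens 2 and 0 score highest here
```

The selected indices would then be passed to a decoder to yield the reconstructed audio source. Sampling from the distribution instead of taking the top k would be an equally valid reading of the abstract.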