18493268. METHOD FOR SPEECH-TO-SPEECH CONVERSION simplified abstract (GOOGLE LLC)

From WikiPatents
Revision as of 06:26, 8 May 2024 by Wikipatents

METHOD FOR SPEECH-TO-SPEECH CONVERSION

Organization Name

GOOGLE LLC

Inventor(s)

Oleg Rybakov of Redmond WA (US)

Fadi Biadsy of Sandyston NJ (US)

METHOD FOR SPEECH-TO-SPEECH CONVERSION - A simplified explanation of the abstract

This abstract first appeared for US patent application 18493268, titled 'METHOD FOR SPEECH-TO-SPEECH CONVERSION'.

Simplified Explanation

The present disclosure relates to a streaming speech-to-speech conversion model. An encoder runs in real time while the user is speaking; once the user stops speaking, a decoder generates the output audio in real time. Compared with non-streaming server-based models, this streaming approach produces an acceptable delay with minimal loss in conversion quality. A hybrid model approach combines look-ahead in the encoder and a non-causal stacker with non-causal self-attention.

  • Encoder runs in real time while user is speaking
  • Decoder generates output audio in real time after speaking stops
  • Streaming-based approach gives an acceptable delay with minimal loss in conversion quality compared to non-streaming server-based models
  • Hybrid model combines look-ahead in encoder and non-causal stacker with non-causal self-attention
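
The encode-while-speaking, decode-afterwards flow described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the patent application: the names `lookahead_mask`, `StreamingEncoder`, and `decode` are hypothetical, audio frames are reduced to single numbers, and the "encoder" and "decoder" are simple arithmetic stand-ins rather than neural networks. The sketch only shows how a fixed look-ahead delays each encoder output by a few frames, and how decoding starts once the input has ended.

```python
from typing import Iterator, List


def lookahead_mask(num_frames: int, lookahead: int) -> List[List[bool]]:
    """mask[i][j] is True when frame i may attend to frame j.

    Each frame sees all past frames plus `lookahead` future frames
    (illustrative assumption about how encoder look-ahead is bounded).
    """
    return [[j <= i + lookahead for j in range(num_frames)]
            for i in range(num_frames)]


class StreamingEncoder:
    """Toy encoder that runs while frames arrive (hypothetical stand-in)."""

    def __init__(self, lookahead: int) -> None:
        self.lookahead = lookahead
        self.frames: List[float] = []
        self.outputs: List[float] = []

    def _encode(self, i: int) -> float:
        # Stand-in for a real encoder layer: average every frame that
        # frame i is allowed to see (past frames plus the look-ahead window).
        visible = self.frames[: i + self.lookahead + 1]
        return sum(visible) / len(visible)

    def push(self, frame: float) -> None:
        """Accept one new input frame and encode whatever is now ready."""
        self.frames.append(frame)
        # Frame i is ready once frame i + lookahead has arrived.
        while len(self.outputs) + self.lookahead < len(self.frames):
            self.outputs.append(self._encode(len(self.outputs)))

    def finish(self) -> List[float]:
        """Called when speaking stops: flush frames still waiting for context."""
        while len(self.outputs) < len(self.frames):
            self.outputs.append(self._encode(len(self.outputs)))
        return self.outputs


def decode(encoded: List[float]) -> Iterator[float]:
    """Toy decoder: yields one output frame at a time after input has ended."""
    for value in encoded:
        yield 2.0 * value  # placeholder for real audio synthesis


# Usage: frames are pushed as the user speaks, decoding starts afterwards.
encoder = StreamingEncoder(lookahead=2)
for frame in [1.0, 2.0, 3.0, 4.0]:
    encoder.push(frame)
ready_while_speaking = len(encoder.outputs)  # only frames with full look-ahead
output_audio = list(decode(encoder.finish()))
```

With a look-ahead of two frames, the encoder lags the input by two frames while the user is speaking, so the remaining work after the last frame arrives is small. That lag is the trade-off a look-ahead design makes between delay and the amount of future context each frame can use.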

Potential Applications

This technology can be applied in real-time translation services, virtual assistants, and language learning tools.

Problems Solved

This technology reduces the delay of speech-to-speech conversion to an acceptable level while keeping the loss in conversion quality minimal relative to non-streaming server-based models.

Benefits

The benefits of this technology include real-time encoding while the user speaks, real-time generation of output audio, an acceptable delay, and minimal loss in conversion quality.

Potential Commercial Applications

Potential commercial applications of this technology include speech translation apps, virtual meeting platforms, and language tutoring services.

Possible Prior Art

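One possible source of prior art is the class of non-streaming server-based speech conversion models that the abstract uses as its point of comparison, along with traditional speech-to-text models that do not operate in real time.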

Unanswered Questions

How does this technology handle different accents and speech patterns?

The abstract does not address how the streaming speech-to-speech conversion model adapts to various accents and speech patterns.

What is the energy consumption of this technology compared to non-streaming models?

The abstract does not provide information on the energy consumption of the streaming speech-to-speech conversion model in comparison to non-streaming models.


Original Abstract Submitted

The present disclosure relates to a streaming speech-to-speech conversion model, where an encoder runs in real time while a user is speaking, then after the speaking stops, a decoder generates output audio in real time. A streaming-based approach produces an acceptable delay with minimal loss in conversion quality when compared to other non-streaming server-based models. A hybrid model approach combines look-ahead in the encoder and a non-causal stacker with non-causal self-attention.