17931026. GENERATING DUBBED AUDIO FROM A VIDEO-BASED SOURCE simplified abstract (Google LLC)

From WikiPatents

GENERATING DUBBED AUDIO FROM A VIDEO-BASED SOURCE

Organization Name

Google LLC

Inventor(s)

Andrew R. Levine of New York NY (US)

Buddhika Kottahachchi of San Mateo CA (US)

Christopher Davie of Queens NY (US)

Kulumani Sriram of Danville CA (US)

Richard James Potts of Mountain View CA (US)

Sasakthi S. Abeysinghe of Santa Clara CA (US)

GENERATING DUBBED AUDIO FROM A VIDEO-BASED SOURCE - A simplified explanation of the abstract

This abstract first appeared for US patent application 17931026, titled 'GENERATING DUBBED AUDIO FROM A VIDEO-BASED SOURCE'.

Simplified Explanation

The present disclosure describes a method for generating and adjusting translated (dubbed) audio from a video-based source. The method receives video data and corresponding audio data in a first language and generates a preliminary transcript translated into a second language. It then aligns timing windows of portions of the translated transcript with the corresponding segments of the source audio, and flags any translated portions that exceed the timing window range of those segments. The original transcript, the translated aligned transcript (including the flagged portions), and a first speech dub are transmitted to a device; a modified original transcript is received back from the device, and a second speech dub in the second language is generated from the modified transcript.

  • Receiving video and audio data in one language and generating a translated transcript in another language.
  • Aligning timing windows of translated transcript portions with corresponding audio segments.
  • Flagging transcript portions that exceed timing window range of audio segments.
  • Transmitting original and translated transcripts along with speech dub to a device.
  • Receiving modified transcript from the device and generating a second speech dub based on it.
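The alignment-and-flagging step above can be sketched in code. The following is a minimal illustration, not the patented implementation: the `Segment` type, the `flag_overlong_portions` function, and the characters-per-second speaking-rate heuristic are all hypothetical assumptions used to show how a translated portion might be compared against the timing window of its corresponding source-audio segment.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    text: str     # transcript text for this portion
    start: float  # segment start time in the source audio (seconds)
    end: float    # segment end time in the source audio (seconds)

def flag_overlong_portions(translated, source_segments, chars_per_second=15.0):
    """Flag translated portions whose estimated spoken duration exceeds
    the timing window of the corresponding source-audio segment.

    `chars_per_second` is a rough speaking-rate heuristic chosen for
    illustration; a real system would use a TTS duration model.
    """
    flagged = []
    for portion, segment in zip(translated, source_segments):
        window = segment.end - segment.start
        estimated = len(portion) / chars_per_second
        if estimated > window:
            flagged.append((portion, estimated, window))
    return flagged

# Source audio segments in the first language, with their timing windows.
source = [Segment("Hello, how are you?", 0.0, 1.5),
          Segment("Fine, thanks.", 1.5, 2.5)]
# Corresponding translated transcript portions in the second language.
translated = ["Hallo, wie geht es Ihnen heute?", "Gut, danke."]

for text, est, window in flag_overlong_portions(translated, source):
    print(f"FLAGGED: {text!r} needs ~{est:.1f}s but window is {window:.1f}s")
```

Here the first German portion is flagged because its estimated duration exceeds the 1.5-second window of the source segment, while the second fits; in the described method, such flags accompany the transcripts sent to the device so a user can shorten the flagged text before the second dub is generated.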

Potential Applications

This technology can be applied in language translation services, video content localization, educational platforms, and accessibility tools for the hearing impaired.

Problems Solved

This technology solves the problem of efficiently translating audio content from one language to another while maintaining synchronization with the original video source.

Benefits

The benefits of this technology include accurate translation of audio content, improved accessibility for non-native speakers, and enhanced user experience for multilingual audiences.

Potential Commercial Applications

Potential commercial applications of this technology include video streaming platforms, language learning apps, online education platforms, and media production companies looking to reach global audiences.

Possible Prior Art

Possible prior art includes the use of machine translation algorithms in audio transcription services; however, the specific method of aligning translated transcripts with timing windows of the corresponding audio segments, and flagging portions that exceed those windows, may be a novel aspect of this technology.

Unanswered Questions

How does this technology handle dialects or accents in the source audio data?

The abstract does not mention how the technology accounts for variations in dialects or accents that may affect the accuracy of the translation.

What is the level of accuracy achieved by this technology in translating and aligning audio content?

The abstract does not provide information on the accuracy rate or any metrics used to measure the effectiveness of the translation and alignment process.


Original Abstract Submitted

The present disclosure relates to generating and adjusting translated audio from a video-based source. The method includes receiving video data and corresponding audio data in a first language; generating a translated preliminary transcript in a second language; aligning timing windows of portions of the translated preliminary transcript with corresponding segments of the audio data; determining portions of the translated aligned transcript in the second language that exceed a timing window range of the corresponding segments of the audio data in the first language to generate flagged transcript portions; transmitting the original transcript, the translated aligned transcript, and the first speech dub to a first device, the generated flagged transcript portions included in the original transcript and the translated aligned transcript; receiving, from the first device, a modified original transcript; and generating, based on the modified original transcript, a second speech dub in the second language.