20230093405. OPTIMIZATION OF LIP SYNCING IN NATURAL LANGUAGE TRANSLATED VIDEO simplified abstract (International Business Machines Corporation)


OPTIMIZATION OF LIP SYNCING IN NATURAL LANGUAGE TRANSLATED VIDEO

Organization Name

International Business Machines Corporation

Inventor(s)

Sathya Santhar of Ramapuram (IN)

Sridevi Kannan of Katupakkam (IN)

Sarbajit K. Rakshit of Kolkata (IN)

Samuel Mathew Jawaharlal of Chennai (IN)

OPTIMIZATION OF LIP SYNCING IN NATURAL LANGUAGE TRANSLATED VIDEO - A simplified explanation of the abstract

This abstract first appeared for US patent application 20230093405, titled 'OPTIMIZATION OF LIP SYNCING IN NATURAL LANGUAGE TRANSLATED VIDEO'.

Simplified Explanation

The patent application describes a method for creating an optimized video of a speaker, in which the speech is translated from one language to another and the speaker's lip movements are synchronized with the translated speech. The approach balances translation quality against lip-syncing accuracy.

  • The source video is fed into a neural machine translation model.
  • The model generates multiple candidate translations of the speech.
  • A generative adversarial network receives the translations and produces a video for each one.
  • The network classifies each video as either in-sync or out-of-sync.
  • A lip-syncing score is assigned to each video classified as in-sync.
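The steps above can be sketched in outline. This is a minimal structural sketch only: the translation model, GAN generator, discriminator, and scoring function are all stand-in stubs invented for illustration, not the patent's actual networks or any real library's API.

```python
def translate_candidates(source_text, n=3):
    """Stand-in for the neural machine translation model: emit n candidate translations."""
    return [f"{source_text} (candidate {i})" for i in range(n)]

def generate_video(translation):
    """Stand-in for the GAN generator: one synthesized video per translation."""
    return {"translation": translation, "frames": []}

def classify_in_sync(video):
    """Stand-in for the discriminator's in-sync / out-of-sync decision."""
    return True  # a real discriminator would inspect lip/audio alignment

def lip_sync_score(video):
    """Stand-in for the lip-syncing score assigned to in-sync videos."""
    return sum(ord(c) for c in video["translation"]) % 100 / 100.0

def best_synced_video(source_text):
    """Run the full pipeline and pick the highest-scoring in-sync video."""
    videos = [generate_video(t) for t in translate_candidates(source_text)]
    in_sync = [v for v in videos if classify_in_sync(v)]
    return max(in_sync, key=lip_sync_score, default=None)
```

The point of the structure is that scoring happens only on videos the discriminator already accepted as in-sync, so the final selection optimizes lip-syncing among viable translations rather than translation quality alone.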

Potential Applications

This technology has potential applications in various fields, including:

  • Language learning: Providing learners with videos where speakers' lip movements match the translated speech can aid in understanding and pronunciation practice.
  • Dubbing and subtitling: Generating videos with accurate lip-syncing can enhance the quality of dubbed or subtitled content.
  • Remote communication: Enabling real-time translation with synchronized lip movements can improve the experience of remote communication platforms.

Problems Solved

The technology addresses the following problems:

  • Lip-syncing accuracy: By synchronizing the speaker's lip movements with the translated speech, the technology ensures a more natural and accurate viewing experience.
  • Translation optimization: The approach balances translation quality against lip-syncing quality, producing an optimized video output rather than favoring one at the expense of the other.

Benefits

The technology offers several benefits:

  • Enhanced comprehension: Viewers can better understand translated content when the speaker's lip movements align with the translated speech.
  • Improved user experience: Videos with accurate lip-syncing provide a more immersive and engaging experience for the audience.
  • Time and cost efficiency: Automating the process of generating lip-synced videos can save time and resources compared to manual editing and dubbing.


Original Abstract Submitted

An approach for generating an optimized video of a speaker, translated from a source language into a target language with the speaker's lips synchronized to the translated speech, while balancing optimization of the translation into the target language. A source video may be fed into a neural machine translation model. The model may synthesize a plurality of potential translations. The translations may be received by a generative adversarial network which generates video for each translation and classifies the translations as in-sync or out of sync. A lip-syncing score may be assigned for each of the generated videos that are classified as in-sync.