20240015371. SYSTEMS AND METHODS FOR GENERATING A VIDEO SUMMARY OF A VIRTUAL EVENT simplified abstract (Verizon Patent and Licensing Inc.)

SYSTEMS AND METHODS FOR GENERATING A VIDEO SUMMARY OF A VIRTUAL EVENT

Organization Name

Verizon Patent and Licensing Inc.

Inventor(s)

Subham Biswas of Thane (IN)

Saurabh Tahiliani of Noida (IN)

SYSTEMS AND METHODS FOR GENERATING A VIDEO SUMMARY OF A VIRTUAL EVENT - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240015371, titled 'SYSTEMS AND METHODS FOR GENERATING A VIDEO SUMMARY OF A VIRTUAL EVENT'.

Simplified Explanation

The abstract describes a video summary device that generates a textual summary from a transcription of a virtual event. The device converts that summary into a phonemic transcription and builds a text embedding from it. It also builds an audio embedding from a target voice and combines the text and audio embeddings to produce audio of the phonemic transcription spoken in that voice. Separately, the device builds an image embedding from video data of a target user, capturing images of the user's facial movements. Using the text and image embeddings, it then produces a video output showing the target user's facial movements as the user utters the phonemic transcription.

  • The video summary device can generate a textual summary of a transcription of a virtual event.
  • It can generate a phonemic transcription of the textual summary.
  • The device can create a text embedding based on the phonemic transcription.
  • It can generate an audio embedding based on a target voice.
  • The device can produce an audio output of the phonemic transcription spoken by the target voice, based on the text and audio embeddings.
  • It can generate an image embedding based on video data of a target user, capturing facial movements.
  • The device can create a video output showing different facial movements of the target user speaking the phonemic transcription, using the text and image embeddings.
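
The data flow in the steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the abstract names no models or APIs, so every function, class, and table below (summarize, to_phonemes, embed, TOY_G2P, SummaryVideo, build_summary_video) is a hypothetical stand-in. The summarizer just keeps the first sentence, the phoneme lookup is a three-word toy table, and the "embeddings" are normalized bucket counts. The sketch assumes that "based on the text embedding and the audio embedding" can be modeled as concatenating the two vectors into one conditioning input, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import List

# Toy grapheme-to-phoneme table (hypothetical; a real system would use a G2P model).
TOY_G2P = {"the": ["DH", "AH"], "team": ["T", "IY", "M"], "met": ["M", "EH", "T"]}


def summarize(transcript: str, max_sentences: int = 1) -> str:
    """Placeholder summarizer: keep only the first sentence(s) of the transcript."""
    sentences = [s.strip() for s in transcript.split(".") if s.strip()]
    return ". ".join(sentences[:max_sentences]) + "."


def to_phonemes(text: str) -> List[str]:
    """Placeholder phonemic transcription using the toy lookup table."""
    phonemes: List[str] = []
    for word in text.lower().replace(".", "").split():
        phonemes.extend(TOY_G2P.get(word, ["<unk>"]))
    return phonemes


def embed(items: List[str], dim: int = 4) -> List[float]:
    """Placeholder embedding: deterministic bucket counts, normalized to sum to 1."""
    vec = [0.0] * dim
    for item in items:
        vec[sum(ord(c) for c in item) % dim] += 1.0
    total = sum(vec) or 1.0
    return [v / total for v in vec]


@dataclass
class SummaryVideo:
    summary: str
    phonemes: List[str]
    audio_conditioning: List[float]  # text embedding + audio embedding
    video_conditioning: List[float]  # text embedding + image embedding


def build_summary_video(
    transcript: str, voice_samples: List[str], face_frames: List[str]
) -> SummaryVideo:
    """Wire the stages together in the order the abstract lists them."""
    summary = summarize(transcript)      # textual summary of the transcription
    phonemes = to_phonemes(summary)      # phonemic transcription of the summary
    text_emb = embed(phonemes)           # text embedding
    audio_emb = embed(voice_samples)     # audio embedding of the target voice
    image_emb = embed(face_frames)       # image embedding of facial movements
    # Assumption: each generator is conditioned on a simple concatenation.
    return SummaryVideo(summary, phonemes, text_emb + audio_emb, text_emb + image_emb)


result = build_summary_video(
    "The team met. Other topics were discussed.",
    voice_samples=["voice_a.wav"],                     # hypothetical file names
    face_frames=["frame_001.png", "frame_002.png"],    # hypothetical file names
)
print(result.summary)
print(result.phonemes)
```

In a real system each placeholder would be replaced by a trained component (a summarization model, a grapheme-to-phoneme converter, speech and face encoders, and audio and video generators). The point of the sketch is only the wiring: the same text embedding feeds both the audio branch and the video branch.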

Potential Applications:

  • Automated summarization of virtual events or meetings.
  • Creation of personalized videos with a target user's voice and facial movements.
  • Enhancing accessibility by providing audio outputs of textual summaries.

Problems Solved:

  • Time-consuming manual transcription and summarization of virtual events.
  • Lack of personalization in video content.
  • Limited accessibility for individuals with visual impairments.

Benefits:

  • Faster, automated summarization of virtual events.
  • Personalized and engaging video content.
  • Improved accessibility for individuals with visual impairments.


Original Abstract Submitted

A video summary device may generate a textual summary of a transcription of a virtual event. The video summary device may generate a phonemic transcription of the textual summary and generate a text embedding based on the phonemic transcription. The video summary device may generate an audio embedding based on a target voice. The video summary device may generate an audio output of the phonemic transcription uttered by the target voice. The audio output may be generated based on the text embedding and the audio embedding. The video summary device may generate an image embedding based on video data of a target user. The image embedding may include information regarding images of facial movements of the target user. The video summary device may generate a video output of different facial movements of the target user uttering the phonemic transcription, based on the text embedding and the image embedding.