18383765. DISPLAY APPARATUS THAT PROVIDES ANSWER TO QUESTION BASED ON IMAGE AND CONTROLLING METHOD THEREOF simplified abstract (SAMSUNG ELECTRONICS CO., LTD.)

From WikiPatents

DISPLAY APPARATUS THAT PROVIDES ANSWER TO QUESTION BASED ON IMAGE AND CONTROLLING METHOD THEREOF

Organization Name

SAMSUNG ELECTRONICS CO., LTD.

Inventor(s)

Jakub Hoschiowics of Warszawa (PL)

Adrian Wisniewski of Warszawa (PL)

DISPLAY APPARATUS THAT PROVIDES ANSWER TO QUESTION BASED ON IMAGE AND CONTROLLING METHOD THEREOF - A simplified explanation of the abstract

This abstract first appeared for US patent application 18383765, titled 'DISPLAY APPARATUS THAT PROVIDES ANSWER TO QUESTION BASED ON IMAGE AND CONTROLLING METHOD THEREOF'.

Simplified Explanation

The patent application describes a display apparatus that accepts a user input, such as a question, while a video is playing and generates a text response from that input and the video content.

  • The display apparatus includes a display and one or more processors.
  • Users can input information while watching a video on the display.
  • The apparatus obtains at least one passage by inputting information on the user input and the video to a first encoder.
  • The video, the information on the user input, and the obtained passage are then input into a neural network model, which generates a text using a part of the tokens included in the passage.
  • The generated text is output to the user.
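
The two-stage flow in the list above (an encoder that obtains a passage, then a model that produces text from part of the passage's tokens) can be sketched roughly as follows. This is a minimal illustration, not the method claimed in the application: the video is stood in for by a short text description, the "first encoder" is a bag-of-words overlap score, and the "neural network model" is a placeholder that keeps the first few tokens of the best passage. All names (`Passage`, `encode`, `retrieve_passages`, `generate_text`, `answer`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Passage:
    text: str
    score: float


def encode(text: str) -> Set[str]:
    """Toy stand-in for the first encoder: a set of lowercase words."""
    return set(text.lower().split())


def retrieve_passages(user_input: str, video_description: str,
                      corpus: List[str], top_k: int = 1) -> List[Passage]:
    """Score each candidate passage against the combined user input and video.

    Assumption: the video is represented by a text description; a real
    system would encode frames or audio instead.
    """
    query = encode(user_input) | encode(video_description)
    scored = []
    for candidate in corpus:
        tokens = encode(candidate)
        overlap = len(query & tokens) / max(len(tokens), 1)
        scored.append(Passage(candidate, overlap))
    scored.sort(key=lambda p: p.score, reverse=True)
    return scored[:top_k]


def generate_text(user_input: str, video_description: str,
                  passages: List[Passage], max_tokens: int = 8) -> str:
    """Placeholder for the neural network model.

    Assumption: "a part of the tokens" is modeled as the first max_tokens
    tokens of the best passage. A real model would also condition on
    user_input and video_description, which this placeholder ignores.
    """
    if not passages:
        return ""
    tokens = passages[0].text.split()
    return " ".join(tokens[:max_tokens])


def answer(user_input: str, video_description: str, corpus: List[str]) -> str:
    """Chain the two stages: obtain a passage, then produce the output text."""
    passages = retrieve_passages(user_input, video_description, corpus)
    return generate_text(user_input, video_description, passages)
```

Calling `answer(question, description, corpus)` with a small list of candidate passages returns a truncated copy of the passage that best overlaps the question and the description. The abstract does not say how passages are scored or which tokens are selected, so both choices here are illustrative only.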

---

Potential Applications

  • Educational tools for generating text summaries of video content.
  • Assistive technology for individuals with hearing impairments to understand video content.

Problems Solved

  • Difficulty in summarizing video content quickly and accurately.
  • Lack of tools for generating text from video content in real time.

Benefits

  • Improved accessibility to video content for a wider range of users.
  • Enhanced learning experience through text summaries of video content.


Original Abstract Submitted

Disclosed is a display apparatus. The display apparatus includes a display; and one or more processors configured to, based on a user input being received while a video is provided through the display, obtain at least one passage by inputting information on the user input and the video to a first encoder, obtain a text by using a part of a plurality of tokens included in the at least one passage by inputting the video, the information on the user input, and the at least one passage into a neural network model, and output the obtained text.