20240021201. AUDIO CAPTION GENERATION METHOD, AUDIO CAPTION GENERATION APPARATUS, AND PROGRAM simplified abstract (NIPPON TELEGRAPH AND TELEPHONE CORPORATION)


AUDIO CAPTION GENERATION METHOD, AUDIO CAPTION GENERATION APPARATUS, AND PROGRAM

Organization Name

NIPPON TELEGRAPH AND TELEPHONE CORPORATION

Inventor(s)

Yuma Koizumi of Tokyo (JP)

Masahiro Yasuda of Tokyo (JP)

AUDIO CAPTION GENERATION METHOD, AUDIO CAPTION GENERATION APPARATUS, AND PROGRAM - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240021201, titled 'AUDIO CAPTION GENERATION METHOD, AUDIO CAPTION GENERATION APPARATUS, AND PROGRAM'.

Simplified Explanation

The abstract describes an audio caption generation apparatus that generates accurate captions for audio signals even when only a small amount of training data is available. The apparatus includes a training data storage that holds a dataset of audio signals paired with their captions. An audio similarity calculation unit calculates the similarity between a target audio and each audio signal in the training data. A guidance caption retrieval unit then retrieves multiple captions belonging to the audio signals that are similar to the target audio. Finally, a caption generation unit generates a caption for the target audio by determining its words one at a time, starting from the first word, on the basis of the retrieved captions.

  • An audio caption generation apparatus generates captions for audio signals with high accuracy.
  • The apparatus includes a training data storage that stores a dataset of audio signals and their corresponding captions.
  • An audio similarity calculation unit calculates the similarity between a target audio and each audio signal in the training data.
  • A guidance caption retrieval unit retrieves multiple captions corresponding to audio signals similar to the target audio.
  • A caption generation unit generates a caption for the target audio by determining its words in order, starting from the first word, on the basis of the retrieved captions.
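
The similarity calculation and guidance caption retrieval steps can be sketched roughly as follows. This is a minimal illustration, not the method claimed in the application: the abstract does not say how audio is represented or how similarity is measured, so the fixed-length feature vectors, the cosine similarity measure, the parameter `k`, and the function names `cosine_similarity` and `retrieve_guidance_captions` are all assumptions made for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def retrieve_guidance_captions(target, training_data, k=2):
    """Return the captions of the k training items most similar to the target.

    training_data is a list of (feature_vector, caption) pairs, standing in
    for the training data storage described in the abstract.
    """
    scored = [(cosine_similarity(target, features), caption)
              for features, caption in training_data]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [caption for _, caption in scored[:k]]


# Toy training set: hypothetical 3-dimensional features with made-up captions.
training_data = [
    ([1.0, 0.0, 0.0], "a dog barks in the distance"),
    ([0.8, 0.6, 0.0], "a dog barks loudly"),
    ([0.0, 1.0, 0.0], "rain falls on a roof"),
    ([0.0, 0.0, 1.0], "a car engine starts"),
]
target = [1.0, 0.1, 0.0]
guidance = retrieve_guidance_captions(target, training_data, k=2)
print(guidance)
```

In a real system the feature vectors would presumably come from an audio embedding model, and the retrieved captions would be passed to the caption generation unit as guidance.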

Potential Applications:

  • Automatic captioning for audio content in various domains such as videos, podcasts, and lectures.
  • Accessibility features for individuals with hearing impairments.
  • Content indexing and searchability for audio-based platforms.

Problems Solved:

  • Overcoming the challenge of generating accurate captions for audio signals with limited training data.
  • Reducing the manual effort required for captioning audio content.
  • Improving accessibility and inclusivity by providing captions for audio-based content.

Benefits:

  • High accuracy in generating captions for audio signals, even with a small amount of training data.
  • Time and cost savings by automating the captioning process.
  • Enhanced accessibility for individuals with hearing impairments.
  • Improved searchability and indexing of audio-based content.


Original Abstract Submitted

Even in a case where an amount of training data is small, a caption for an audio signal is generated with high accuracy. An audio caption generation apparatus generates a caption for an input target audio. A training data storage stores a training data set including a set of an audio signal and a caption corresponding thereto. An audio similarity calculation unit calculates similarity between the target audio and each audio signal of training data. A guidance caption retrieval unit acquires a plurality of captions corresponding to an audio signal similar to the target audio. A caption generation unit generates a caption for the target audio by determining words in order from the head on the basis of the acquired captions.
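
To make "determining words in order from the head on the basis of the acquired captions" more concrete, here is a toy sketch. The abstract does not describe the generation model, so this example swaps in a deliberately simple stand-in: it counts which word follows which across the acquired captions and greedily picks the most frequent successor at each step. The function name `generate_caption`, the `<s>` and `</s>` boundary tokens, and the `max_len` limit are hypothetical, and the apparatus in the application may work quite differently.

```python
from collections import Counter, defaultdict

BOS, EOS = "<s>", "</s>"  # assumed start and end markers


def generate_caption(guidance_captions, max_len=20):
    """Greedily build a caption word by word from bigram counts.

    This is a stand-in for the caption generation unit: each next word is
    the one that most often follows the previous word in the guidance captions.
    """
    following = defaultdict(Counter)
    for caption in guidance_captions:
        tokens = [BOS] + caption.split() + [EOS]
        for prev, nxt in zip(tokens, tokens[1:]):
            following[prev][nxt] += 1

    words = []
    prev = BOS
    for _ in range(max_len):  # the length cap also guards against cycles
        if not following[prev]:
            break  # no guidance caption continues from this word
        nxt = following[prev].most_common(1)[0][0]
        if nxt == EOS:
            break
        words.append(nxt)
        prev = nxt
    return " ".join(words)


# Made-up guidance captions, as might be acquired for a dog-bark recording.
acquired = [
    "a dog barks loudly",
    "a dog barks in the distance",
    "a small dog barks loudly",
]
caption = generate_caption(acquired)
print(caption)
```

The point of the sketch is only the control flow: the caption is built from the first word onward, and every choice is conditioned on the captions acquired for similar audio.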