International Business Machines Corporation (20240194202). ARTIFICIAL INTELLIGENCE CAPTIONS USING AN ENSEMBLE METHOD FOR AUDIO TEMPO AND PITCH simplified abstract

From WikiPatents

ARTIFICIAL INTELLIGENCE CAPTIONS USING AN ENSEMBLE METHOD FOR AUDIO TEMPO AND PITCH

Organization Name

International Business Machines Corporation

Inventor(s)

Willie L. Scott, II of Austin, TX (US)

ARTIFICIAL INTELLIGENCE CAPTIONS USING AN ENSEMBLE METHOD FOR AUDIO TEMPO AND PITCH - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240194202 titled 'ARTIFICIAL INTELLIGENCE CAPTIONS USING AN ENSEMBLE METHOD FOR AUDIO TEMPO AND PITCH'.

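Simplified Explanation: This patent application describes a method, computer system, and computer program product for generating captions for audiovisual content. Speech recognition is run on the input audio and again on copies whose rate of speech has been altered, and the word predictions selected from those runs are integrated into the content for display.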

  • Capturing input audio containing audiovisual content
  • Processing the input audio to extract rate of speech, word timings, and word predictions
  • Generating one or more new audio files by altering the input rate of speech to fall within a predetermined range
  • Extracting new word timings and new word predictions from the new audio files
  • Creating a mapping between input and new word timings
  • Selecting word predictions based on the mapping
  • Integrating selected word predictions into the audiovisual content for display
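
The mapping and selection steps above can be sketched roughly as follows. The abstract does not say how timings are paired or how a prediction is chosen for each pair, so this sketch makes two assumptions: pairs are formed by largest time overlap after rescaling the new timings back onto the input timeline, and the prediction with the higher recognizer confidence wins. The names `Word`, `overlap`, `select_words`, and `tempo_factor` are hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str          # predicted word
    start: float       # start time in seconds
    end: float         # end time in seconds
    confidence: float  # recognizer confidence in [0, 1] (assumed to be available)


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def select_words(input_words, new_words, tempo_factor):
    """Pair each input word with the best-overlapping new word and keep the
    more confident prediction of each pair (an assumed selection rule).

    tempo_factor is the playback speed used to produce the new audio:
    0.8 means the new audio runs at 80% speed, so its timestamps are
    stretched by 1/0.8. Multiplying by tempo_factor undoes the stretch.
    """
    rescaled = [
        Word(w.text, w.start * tempo_factor, w.end * tempo_factor, w.confidence)
        for w in new_words
    ]
    selected = []
    for iw in input_words:
        best = max(
            rescaled,
            key=lambda nw: overlap(iw.start, iw.end, nw.start, nw.end),
            default=None,
        )
        if best is None or overlap(iw.start, iw.end, best.start, best.end) == 0.0:
            selected.append(iw)  # no counterpart found: keep the input prediction
            continue
        winner = best if best.confidence > iw.confidence else iw
        # Keep the input timings so captions stay aligned with the original content.
        selected.append(Word(winner.text, iw.start, iw.end, winner.confidence))
    return selected


input_words = [
    Word("the", 0.0, 0.2, 0.9),
    Word("cat", 0.2, 0.5, 0.4),
    Word("sat", 0.5, 0.9, 0.8),
]
# Hypothetical recognizer output for a copy slowed to 80% speed.
new_words = [
    Word("the", 0.0, 0.25, 0.7),
    Word("hat", 0.25, 0.625, 0.95),
    Word("sat", 0.625, 1.125, 0.6),
]
captions = select_words(input_words, new_words, tempo_factor=0.8)
print([w.text for w in captions])
```

In this made-up example the second word is replaced because the slowed copy produced a more confident prediction, while the other two input predictions are kept. With several new audio files, the same pairing could be repeated per file and the candidates combined by voting, which is again an assumption rather than something the abstract states.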

Potential Applications: This technology can be used in video editing software, online streaming platforms, accessibility tools for people who are deaf or hard of hearing, and language learning applications.

Problems Solved: This technology addresses the need for accurate and efficient caption generation for audiovisual content, improving accessibility and user experience.

Benefits: The potential benefits of this technology include more accurate captions, a faster caption generation process, improved accessibility, and better user engagement with audiovisual content.

Commercial Applications: Caption generation technology can be utilized in video production companies, streaming services, educational platforms, and communication tools for the deaf and hard of hearing community.

Prior Art: Prior research in the field of automatic speech recognition and caption generation can provide insights into similar technologies and approaches.

Frequently Updated Research: Stay updated on advancements in automatic speech recognition, natural language processing, and audiovisual content analysis for potential improvements in caption generation technology.

Questions about Caption Generation Technology:

  1. How does this technology improve user accessibility to audiovisual content?
  2. What are the key factors influencing the accuracy of word predictions in caption generation?


Original Abstract Submitted

According to one embodiment, a method, computer system, and computer program product for generating captions is provided. The present invention may include capturing input audio comprising audiovisual content; processing the input audio to extract an input rate of speech, input word timings, and input word predictions; generating one or more new audio files by altering the input rate of speech of the input audio to fall within a pre-determined range; processing the one or more new audio files to extract new word timings and a new word predictions; creating a mapping that pairs the input word timings with corresponding new word timings; selecting a word prediction for each paired input word timing and new word timing based on the mapping; and integrating the selected word predictions into the audiovisual content for display.
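
As a rough illustration of the "altering the input rate of speech ... to fall within a pre-determined range" step, the sketch below estimates the rate of speech from word timings and derives playback-speed factors for the new audio files. The abstract does not define the unit of rate, the range, or how many files are generated. Words per minute, the 120 to 180 range, and the helper names `rate_of_speech` and `tempo_factors` are all assumptions made for this example.

```python
def rate_of_speech(word_timings):
    """Words per minute, measured from the first word's start to the last word's end.

    word_timings is a list of (start_seconds, end_seconds) tuples in time order.
    """
    if not word_timings:
        return 0.0
    duration = word_timings[-1][1] - word_timings[0][0]
    return 60.0 * len(word_timings) / duration if duration > 0 else 0.0


def tempo_factors(input_wpm, low_wpm, high_wpm, count=3):
    """Playback-speed factors whose resulting rates are evenly spaced over
    [low_wpm, high_wpm]. A factor below 1 slows the audio down."""
    if input_wpm <= 0 or count < 1:
        return []
    if count == 1:
        targets = [(low_wpm + high_wpm) / 2]
    else:
        step = (high_wpm - low_wpm) / (count - 1)
        targets = [low_wpm + i * step for i in range(count)]
    return [target / input_wpm for target in targets]


# Six words spoken in 1.5 seconds: a fast 240 words per minute.
timings = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75),
           (0.75, 1.0), (1.0, 1.25), (1.25, 1.5)]
wpm = rate_of_speech(timings)
factors = tempo_factors(wpm, low_wpm=120, high_wpm=180, count=3)
print(wpm, factors)
```

Each factor would then be handed to a time-stretching tool to produce one new audio file, and each file would be transcribed again to obtain the new word timings and predictions.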