ARTIFICIAL INTELLIGENCE CAPTIONS USING AN ENSEMBLE METHOD FOR AUDIO TEMPO AND PITCH

Organization Name

International Business Machines Corporation

Inventor(s)

ARTIFICIAL INTELLIGENCE CAPTIONS USING AN ENSEMBLE METHOD FOR AUDIO TEMPO AND PITCH - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240194202, titled 'ARTIFICIAL INTELLIGENCE CAPTIONS USING AN ENSEMBLE METHOD FOR AUDIO TEMPO AND PITCH'.

Simplified Explanation: This patent application describes a method, computer system, and computer program product for generating captions for audiovisual content by altering the rate of speech and integrating word predictions.

Capturing input audio containing audiovisual content
Processing the input audio to extract rate of speech, word timings, and word predictions
Altering the rate of speech to fall within a predetermined range
Extracting new word timings and predictions from the altered audio
Creating a mapping between input and new word timings
Selecting word predictions based on the mapping
Integrating selected word predictions into the audiovisual content for display
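
The mapping and selection steps above can be sketched in a few lines of Python. This is a minimal illustration, not the method claimed in the application: the `Word` record, the nearest-midpoint pairing, the `tolerance` value, and the "keep the higher-confidence word" rule are all assumptions made for the example, since the abstract does not say how pairs are formed or how a prediction is chosen. Here `tempo_factor` is the playback-speed multiplier used to create the altered audio, so a time in the altered audio is multiplied by it to get back to the original timeline.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Word:
    text: str
    start: float       # seconds
    end: float         # seconds
    confidence: float  # hypothetical recognizer score in [0, 1]


def midpoint(word: Word, scale: float = 1.0) -> float:
    """Midpoint of a word's time span, optionally rescaled to another timeline."""
    return (word.start + word.end) / 2 * scale


def map_timings(
    input_words: List[Word],
    new_words: List[Word],
    tempo_factor: float,
    tolerance: float = 0.25,  # assumed maximum midpoint distance, in seconds
) -> List[Tuple[Word, Optional[Word]]]:
    """Pair each input word with the nearest new word on the original timeline."""
    pairs = []
    for word in input_words:
        target = midpoint(word)
        best = min(
            new_words,
            key=lambda n: abs(midpoint(n, tempo_factor) - target),
            default=None,
        )
        # Leave the input word unpaired if nothing is close enough.
        if best is not None and abs(midpoint(best, tempo_factor) - target) > tolerance:
            best = None
        pairs.append((word, best))
    return pairs


def select_predictions(pairs: List[Tuple[Word, Optional[Word]]]) -> List[str]:
    """Assumed rule: keep whichever paired prediction has the higher confidence."""
    selected = []
    for original, candidate in pairs:
        if candidate is not None and candidate.confidence > original.confidence:
            selected.append(candidate.text)
        else:
            selected.append(original.text)
    return selected


# Original audio, and the same audio slowed to half speed (tempo_factor = 0.5).
input_words = [Word("the", 0.0, 0.2, 0.9), Word("cap", 0.2, 0.6, 0.4), Word("sat", 0.6, 1.0, 0.8)]
new_words = [Word("the", 0.0, 0.4, 0.85), Word("cat", 0.4, 1.2, 0.9), Word("sat", 1.2, 2.0, 0.7)]
captions = select_predictions(map_timings(input_words, new_words, tempo_factor=0.5))
```

In this toy input the low-confidence "cap" from the original pass is replaced by "cat" from the slowed pass, while the other two words keep their original predictions. A real system would likely need a sturdier alignment than nearest midpoints, for example to handle inserted or dropped words.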

Potential Applications: This technology can be used in video editing software, online streaming platforms, accessibility tools for the hearing impaired, and language learning applications.

Problems Solved: This technology addresses the need for accurate and efficient caption generation for audiovisual content, improving accessibility and user experience.

Benefits: This technology offers more accurate captions, faster caption generation, improved accessibility, and better user engagement with audiovisual content.

Commercial Applications: Caption generation technology can be utilized in video production companies, streaming services, educational platforms, and communication tools for the deaf and hard of hearing community.

Prior Art: Prior research in the field of automatic speech recognition and caption generation can provide insights into similar technologies and approaches.

Frequently Updated Research: Stay updated on advancements in automatic speech recognition, natural language processing, and audiovisual content analysis for potential improvements in caption generation technology.

Questions about Caption Generation Technology:
1. How does this technology improve user accessibility to audiovisual content?
2. What are the key factors influencing the accuracy of word predictions in caption generation?

Original Abstract Submitted

According to one embodiment, a method, computer system, and computer program product for generating captions is provided. The present invention may include capturing input audio comprising audiovisual content; processing the input audio to extract an input rate of speech, input word timings, and input word predictions; generating one or more new audio files by altering the input rate of speech of the input audio to fall within a pre-determined range; processing the one or more new audio files to extract new word timings and new word predictions; creating a mapping that pairs the input word timings with corresponding new word timings; selecting a word prediction for each paired input word timing and new word timing based on the mapping; and integrating the selected word predictions into the audiovisual content for display.
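
As a rough illustration of "altering the input rate of speech ... to fall within a pre-determined range", the sketch below measures a rate of speech in words per minute and computes the playback-speed multiplier that would bring it to the nearest edge of a target range. The function names and the 120 to 180 words-per-minute range are hypothetical; the abstract gives no units or bounds, and the time-stretching of the audio itself is not shown.

```python
def rate_of_speech(word_count: int, duration_seconds: float) -> float:
    """Rate of speech in words per minute."""
    if duration_seconds <= 0:
        raise ValueError("duration_seconds must be positive")
    return word_count * 60.0 / duration_seconds


def tempo_factor_for_range(rate_wpm: float, low_wpm: float, high_wpm: float) -> float:
    """Playback-speed multiplier that moves rate_wpm into [low_wpm, high_wpm].

    Returns 1.0 when the rate is already inside the range. Values below 1.0
    slow the audio down; values above 1.0 speed it up.
    """
    if rate_wpm <= 0:
        raise ValueError("rate_wpm must be positive")
    if low_wpm > high_wpm:
        raise ValueError("low_wpm must not exceed high_wpm")
    if rate_wpm < low_wpm:
        return low_wpm / rate_wpm
    if rate_wpm > high_wpm:
        return high_wpm / rate_wpm
    return 1.0


# 60 words spoken in 15 seconds is 240 words per minute.
rate = rate_of_speech(word_count=60, duration_seconds=15.0)
# Assumed target range of 120 to 180 words per minute.
factor = tempo_factor_for_range(rate, low_wpm=120.0, high_wpm=180.0)
```

With these numbers the factor is 0.75, so the 15-second clip would be stretched to 20 seconds and the 60 words would then be spoken at 180 words per minute, the upper edge of the assumed range. Producing several new audio files, as the abstract allows, could amount to repeating this with different target rates inside the range.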

International Business Machines Corporation (20240194202). ARTIFICIAL INTELLIGENCE CAPTIONS USING AN ENSEMBLE METHOD FOR AUDIO TEMPO AND PITCH simplified abstract
