International Business Machines Corporation (20240194202). ARTIFICIAL INTELLIGENCE CAPTIONS USING AN ENSEMBLE METHOD FOR AUDIO TEMPO AND PITCH simplified abstract
ARTIFICIAL INTELLIGENCE CAPTIONS USING AN ENSEMBLE METHOD FOR AUDIO TEMPO AND PITCH
Organization Name
International Business Machines Corporation
Inventor(s)
Willie L. Scott, II of Austin, TX (US)
ARTIFICIAL INTELLIGENCE CAPTIONS USING AN ENSEMBLE METHOD FOR AUDIO TEMPO AND PITCH - A simplified explanation of the abstract
This abstract first appeared for US patent application 20240194202, titled 'ARTIFICIAL INTELLIGENCE CAPTIONS USING AN ENSEMBLE METHOD FOR AUDIO TEMPO AND PITCH'.
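Simplified Explanation: This patent application describes a method, computer system, and computer program product for generating captions for audiovisual content. The speech is transcribed once at its original rate and again after its rate has been adjusted into a predetermined range, and one word prediction is then selected for each matched pair of word timings. The steps are: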
- Capturing input audio containing audiovisual content
- Processing the input audio to extract rate of speech, word timings, and word predictions
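- Generating one or more new audio files by altering the rate of speech to fall within a predetermined range
- Extracting new word timings and new word predictions from the new audio files
- Creating a mapping that pairs each input word timing with its corresponding new word timing
- Selecting a word prediction for each paired input and new word timing based on the mapping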
- Integrating selected word predictions into the audiovisual content for display
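The mapping and selection steps above can be sketched in Python. This is a minimal illustration, not the applicant's implementation: the abstract does not say how timings are paired or how a prediction is chosen, so the `Word` type, the overlap-based pairing, and the "higher confidence wins" rule are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Word:
    """Hypothetical recognizer output for one word."""
    text: str
    start: float       # seconds
    end: float         # seconds
    confidence: float  # assumed recognizer score in [0, 1]


def rescale(words: List[Word], speed: float) -> List[Word]:
    """Project timings from tempo-altered audio back onto the input timeline.

    Audio played at `speed` times the original rate lasts 1/speed as long,
    so time t in the altered file corresponds to t * speed in the input.
    """
    return [Word(w.text, w.start * speed, w.end * speed, w.confidence)
            for w in words]


def overlap(a: Word, b: Word) -> float:
    """Length in seconds of the time interval shared by two words."""
    return max(0.0, min(a.end, b.end) - max(a.start, b.start))


def map_timings(input_words: List[Word],
                new_words: List[Word]) -> List[Tuple[Word, Optional[Word]]]:
    """Pair each input word with the new word it overlaps most (assumed rule)."""
    pairs = []
    for w in input_words:
        best = max(new_words, key=lambda n: overlap(w, n), default=None)
        if best is not None and overlap(w, best) == 0.0:
            best = None  # nothing in the new transcript covers this word
        pairs.append((w, best))
    return pairs


def select(pairs: List[Tuple[Word, Optional[Word]]]) -> List[str]:
    """Keep whichever prediction in each pair has the higher confidence."""
    chosen = []
    for inp, new in pairs:
        if new is not None and new.confidence > inp.confidence:
            chosen.append(new.text)
        else:
            chosen.append(inp.text)
    return chosen


# Input transcript: the middle word was recognized with low confidence.
input_words = [Word("the", 0.0, 0.2, 0.9),
               Word("cap", 0.2, 0.5, 0.4),
               Word("sat", 0.5, 0.9, 0.8)]
# Transcript of a copy slowed to half speed: every timing is doubled.
slow_words = [Word("the", 0.0, 0.4, 0.85),
              Word("cat", 0.4, 1.0, 0.9),
              Word("sat", 1.0, 1.8, 0.7)]

captions = select(map_timings(input_words, rescale(slow_words, speed=0.5)))
print(captions)
```

In this toy data the slowed-down pass recognizes the middle word with higher confidence, so its prediction replaces the one from the input pass while the other two words are kept. With several altered files, the same pairing could be repeated per file and the selection extended to a vote, which is one reading of the "ensemble" in the title.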
Potential Applications: This technology can be used in video editing software, online streaming platforms, accessibility tools for the hearing impaired, and language learning applications.
Problems Solved: This technology addresses the need for accurate and efficient caption generation for audiovisual content, improving accessibility and user experience.
Benefits: The benefits of this technology include more accurate captions, faster caption generation, improved accessibility, and better user engagement with audiovisual content.
Commercial Applications: Caption generation technology can be utilized in video production companies, streaming services, educational platforms, and communication tools for the deaf and hard of hearing community.
Prior Art: Prior research in the field of automatic speech recognition and caption generation can provide insights into similar technologies and approaches.
Frequently Updated Research: Stay updated on advancements in automatic speech recognition, natural language processing, and audiovisual content analysis for potential improvements in caption generation technology.
Questions about Caption Generation Technology:
1. How does this technology improve user accessibility to audiovisual content?
2. What are the key factors influencing the accuracy of word predictions in caption generation?
Original Abstract Submitted
According to one embodiment, a method, computer system, and computer program product for generating captions is provided. The present invention may include capturing input audio comprising audiovisual content; processing the input audio to extract an input rate of speech, input word timings, and input word predictions; generating one or more new audio files by altering the input rate of speech of the input audio to fall within a pre-determined range; processing the one or more new audio files to extract new word timings and new word predictions; creating a mapping that pairs the input word timings with corresponding new word timings; selecting a word prediction for each paired input word timing and new word timing based on the mapping; and integrating the selected word predictions into the audiovisual content for display.
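As a rough illustration of the rate-of-speech step in the abstract, the sketch below computes a speaking rate from word timings and derives a playback-speed factor that would bring it into a target range. The abstract gives no formula and no range; the words-per-minute measure, the function names, and the 120 to 180 words-per-minute range are assumptions chosen for the example.

```python
from typing import List, Tuple


def rate_of_speech(word_timings: List[Tuple[float, float]]) -> float:
    """Words per minute, measured from the first word's start to the last word's end.

    `word_timings` holds (start, end) pairs in seconds, in spoken order.
    """
    if not word_timings:
        return 0.0
    span = word_timings[-1][1] - word_timings[0][0]
    if span <= 0:
        return 0.0
    return len(word_timings) * 60.0 / span


def tempo_factor(rate_wpm: float, low: float, high: float) -> float:
    """Playback-speed multiplier that moves rate_wpm to the nearest edge of [low, high].

    A factor below 1 slows the audio down, a factor above 1 speeds it up,
    and 1.0 means the rate is already inside the range.
    """
    if rate_wpm <= 0:
        return 1.0
    if rate_wpm > high:
        return high / rate_wpm
    if rate_wpm < low:
        return low / rate_wpm
    return 1.0


# Six words spoken in 1.5 seconds: a fast 240 words per minute.
timings = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75),
           (0.75, 1.0), (1.0, 1.25), (1.25, 1.5)]
rate = rate_of_speech(timings)
factor = tempo_factor(rate, low=120.0, high=180.0)  # assumed target range
print(rate, factor)
```

The resulting factor would be handed to a time-stretching tool to produce the new audio file. The sketch does not perform the stretch itself, and the choice of tool, along with whether pitch is preserved or altered, is left open here.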