SRI International (20240257801). METHOD AND SYSTEM FOR CREATING A PROSODIC SCRIPT simplified abstract

From WikiPatents

METHOD AND SYSTEM FOR CREATING A PROSODIC SCRIPT

Organization Name

SRI International

Inventor(s)

Jeffrey Lubin of Princeton NJ (US)

Alexander Erdmann of Malvern OH (US)

James Bergen of Pennington NJ (US)

Harry Bratt of Mountain View CA (US)

Jihua Huang of Santa Clara CA (US)

Sarah Bakst of San Francisco CA (US)

Michael Lomnitz of Castro Valley CA (US)

Zachary Daniels of Robbinsville NJ (US)

John Cadigan of San Diego CA (US)

Ali Chaudhry of Princeton Junction NJ (US)

Zhiwei Zhu of Princeton NJ (US)

Joshua Chattin of Mount Laurel NJ (US)

Girish Acharya of Redwood City CA (US)

METHOD AND SYSTEM FOR CREATING A PROSODIC SCRIPT - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240257801, titled 'METHOD AND SYSTEM FOR CREATING A PROSODIC SCRIPT'.

The patent application describes a method, apparatus, and system for creating a script for rendering audio and/or video streams. This involves identifying prosodic speech features in an audio stream and/or language model, creating prosodic speech symbols for these features, converting the audio stream and/or language model into a text stream, inserting the prosodic speech symbols into the text stream, identifying prosodic gestures in a video stream, creating gesture symbols for these gestures, and inserting the gesture symbols into the text stream along with the prosodic speech symbols to create a prosodic script.

  • Method, apparatus, and system for creating a script for rendering audio and/or video streams
  • Identification of prosodic speech features in audio streams and language models
  • Creation of prosodic speech symbols for identified features
  • Conversion of audio streams and language models into text streams
  • Temporal insertion of prosodic speech symbols into the text stream
  • Identification of prosodic gestures in video streams
  • Creation of gesture symbols for identified gestures
  • Temporal insertion of gesture symbols into the text stream along with prosodic speech symbols to create a prosodic script
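
The temporal insertion steps listed above can be illustrated with a minimal Python sketch. It assumes that feature and gesture detection have already run and produced timestamped symbols; the abstract does not specify a symbol notation or data model, so the `Word` and `Symbol` classes, the `build_prosodic_script` function, and the bracket and brace labels are all hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    text: str
    start: float  # start time in seconds (assumed to come from a transcriber)


@dataclass(frozen=True)
class Symbol:
    label: str   # hypothetical notation, e.g. "[stress]" or "{nod}"
    time: float  # time in seconds at which the feature or gesture occurs


def build_prosodic_script(words, speech_symbols, gesture_symbols):
    """Merge words and symbols into one script string ordered by time."""
    # The middle element is a tie-break rank: at equal timestamps, speech
    # symbols come first, then gesture symbols, then the word itself.
    items = [(s.time, 0, s.label) for s in speech_symbols]
    items += [(g.time, 1, g.label) for g in gesture_symbols]
    items += [(w.start, 2, w.text) for w in words]
    items.sort(key=lambda item: (item[0], item[1]))
    return " ".join(token for _, _, token in items)


# Illustrative, made-up input: a four-word utterance with two speech
# symbols and one gesture symbol.
words = [Word("I", 0.0), Word("really", 0.4), Word("mean", 0.9), Word("it", 1.2)]
speech = [Symbol("[stress]", 0.4), Symbol("[pause]", 1.5)]
gestures = [Symbol("{nod}", 0.9)]

script = build_prosodic_script(words, speech, gestures)
print(script)  # I [stress] really {nod} mean it [pause]
```

Sorting on a shared timeline is only one way to realize "temporal insertion"; a real system would also need a policy for symbols that span several words, which this sketch does not attempt.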

Potential Applications:

  • Speech recognition and synthesis systems
  • Audiovisual content creation
  • Language learning tools
  • Communication aids for individuals with speech disorders

Problems Solved:

  • Enhancing the expressiveness and naturalness of synthesized speech
  • Improving the synchronization of gestures with speech in audiovisual content
  • Facilitating the creation of scripts for multimedia presentations

Benefits:

  • Enhanced user experience in speech-based applications
  • Improved accessibility for individuals with communication challenges
  • Increased engagement and effectiveness of audiovisual content

Commercial Applications:

Title: "Enhancing Audiovisual Content Creation with Prosodic Scripting Technology"

This technology can be used in the development of speech recognition software, language learning applications, and multimedia production tools. It has implications for industries such as entertainment, education, and assistive technology.

Questions about Prosodic Scripting Technology:

1. How does prosodic scripting technology improve the naturalness of synthesized speech?

Prosodic scripting technology enhances the expressiveness of synthesized speech by incorporating prosodic speech symbols that reflect intonation, rhythm, and emphasis in spoken language.

2. What are the potential applications of prosodic scripting technology beyond audiovisual content creation?

Prosodic scripting technology can be utilized in fields such as speech therapy, virtual assistants, and interactive storytelling to enhance communication and engagement.


Original Abstract Submitted

A method, apparatus, and system for creating a script for rendering audio and/or video streams include identifying at least one prosodic speech feature in a received audio stream and/or a received language model, creating a respective prosodic speech symbol for each of the at least one identified prosodic speech features, converting the received audio stream and/or the received language model into a text stream, temporally inserting the created at least one prosodic speech symbol into the text stream, identifying in a received video stream at least one prosodic gesture of at least a portion of a body of a speaker of the received audio stream, creating at least one respective gesture symbol for each of the at least one identified prosodic gestures, and temporally inserting the created at least one gesture symbol into the text stream along with the at least one prosodic speech symbol to create a prosodic script.