Intel Corporation (20240127789). SYSTEMS AND METHODS FOR PROVIDING NON-LEXICAL CUES IN SYNTHESIZED SPEECH simplified abstract

From WikiPatents

SYSTEMS AND METHODS FOR PROVIDING NON-LEXICAL CUES IN SYNTHESIZED SPEECH

Organization Name

Intel Corporation

Inventor(s)

Jessica M. Christian of Redwood City CA (US)

Peter Graff of San Jose CA (US)

Crystal A. Nakatsu of San Jose CA (US)

Beth Ann Hockey of Sunnyvale CA (US)

SYSTEMS AND METHODS FOR PROVIDING NON-LEXICAL CUES IN SYNTHESIZED SPEECH - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240127789 titled 'SYSTEMS AND METHODS FOR PROVIDING NON-LEXICAL CUES IN SYNTHESIZED SPEECH'.

Simplified Explanation

The abstract describes systems and methods for adding non-lexical cues, such as breathing and prosody cues, to synthesized speech so that it sounds more natural. Each cue is identified by a markup language tag, inserted into the text at a determined insertion point, and then used when the speech is synthesized.

  • Processor circuitry generates breathing and prosody cues to enhance synthesized speech.
  • Cues are inserted into the text at specific points based on markup language tags.
  • The cues are then used during the synthesis of the speech to improve its naturalness.
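
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method claimed in the application: the tag names (`<breath/>`, `<pitch contour="rise"/>`), the rules for choosing insertion points, and the `synthesize` stub are all hypothetical, since the abstract does not specify the markup language, the heuristics, or the synthesis engine.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical tags: the abstract only says each cue is identified by a markup tag.
BREATH_TAG = " <breath/>"
PROSODY_TAG = '<pitch contour="rise"/>'


@dataclass
class Cue:
    kind: str      # "breath" or "prosody"
    position: int  # character offset in the original text
    markup: str    # tag text to insert at that offset


def find_breath_points(text: str) -> list[Cue]:
    """Assumed rule: breathe after a period that is followed by more text."""
    return [
        Cue("breath", i + 1, BREATH_TAG)
        for i, ch in enumerate(text)
        if ch == "." and i + 1 < len(text)
    ]


def find_prosody_points(text: str) -> list[Cue]:
    """Assumed rule: mark rising intonation just before each question mark."""
    return [Cue("prosody", i, PROSODY_TAG) for i, ch in enumerate(text) if ch == "?"]


def insert_cues(text: str, cues: list[Cue]) -> str:
    """Insert tags from the rightmost offset first so earlier offsets stay valid."""
    for cue in sorted(cues, key=lambda c: c.position, reverse=True):
        text = text[: cue.position] + cue.markup + text[cue.position :]
    return text


def annotate(text: str) -> str:
    """Generate both kinds of cue and insert them at their insertion points."""
    return insert_cues(text, find_breath_points(text) + find_prosody_points(text))


def synthesize(marked_up_text: str) -> str:
    """Stand-in for triggering a real text-to-speech engine."""
    return f"[synthesizing] {marked_up_text}"


if __name__ == "__main__":
    print(synthesize(annotate("I checked the forecast. Do you want an umbrella?")))
```

Collecting every insertion point against the original text and then inserting from right to left keeps the offsets simple, because no insertion shifts a position that has yet to be processed.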

Potential Applications

This technology could be applied in various fields such as:

  • Assistive technology for individuals with speech impairments.
  • Virtual assistants and chatbots to improve the naturalness of their responses.

Problems Solved

This technology addresses the following issues:

  • Lack of naturalness in synthesized speech.
  • Difficulty in conveying emotions and intentions through synthesized speech.

Benefits

The benefits of this technology include:

  • Enhanced naturalness and expressiveness in synthesized speech.
  • Improved user experience in applications utilizing synthesized speech.

Potential Commercial Applications

Potential commercial applications of this technology include:

  • Speech synthesis software for various industries.
  • Voice-enabled devices and applications.

Possible Prior Art

One possible piece of prior art in this field is the use of markup language tags to enhance text-to-speech synthesis by inserting cues for pauses, emphasis, and intonation.
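
As an illustration of this kind of markup, the W3C Speech Synthesis Markup Language (SSML) defines `break`, `emphasis`, and `prosody` elements for pauses, emphasis, and intonation. The sketch below builds a small SSML fragment with Python's standard library. The sentence and attribute values are made up for the example and are not taken from the patent application, and the `version` and namespace attributes that a complete SSML document carries on `speak` are omitted for brevity.

```python
import xml.etree.ElementTree as ET


def build_ssml() -> str:
    """Build an illustrative SSML fragment with a pause, emphasis, and prosody."""
    speak = ET.Element("speak")
    speak.text = "The results are in."

    # A half-second pause between sentences.
    pause = ET.SubElement(speak, "break", {"time": "500ms"})
    pause.tail = " We "

    # Stress a single word.
    emphasis = ET.SubElement(speak, "emphasis", {"level": "strong"})
    emphasis.text = "won"
    emphasis.tail = ". "

    # Raise the pitch and slow the rate for the closing sentence.
    prosody = ET.SubElement(speak, "prosody", {"pitch": "+10%", "rate": "slow"})
    prosody.text = "Thank you all."

    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml())
```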

Unanswered Questions

How does this technology handle different languages and accents in speech synthesis?

The abstract does not explain how the system adapts to different languages and accents during speech synthesis.

What is the impact of these non-lexical cues on the overall performance of the synthesized speech?

The abstract does not discuss how breathing and prosody cues affect the overall quality and intelligibility of the synthesized speech.


Original Abstract Submitted

Systems and methods are disclosed for providing non-lexical cues in synthesized speech. An example system includes processor circuitry to generate a breathing cue to enhance speech to be synthesized from text; determine a first insertion point of the breathing cue in the text, wherein the breathing cue is identified by a first tag of a markup language; generate a prosody cue to enhance speech to be synthesized from the text; determine a second insertion point of the prosody cue in the text, wherein the prosody cue is identified by a second tag of the markup language; insert the breathing cue at the first insertion point based on the first tag and the prosody cue at the second insertion point based on the second tag; and trigger a synthesis of the speech from the text, the breathing cue, and the prosody cue.