20240021189. CONFIGURABLE NEURAL SPEECH SYNTHESIS simplified abstract (SoundHound, Inc.)

From WikiPatents

CONFIGURABLE NEURAL SPEECH SYNTHESIS

Organization Name

SoundHound, Inc.

Inventor(s)

Andrew Richards of Toulouse (FR)

CONFIGURABLE NEURAL SPEECH SYNTHESIS - A simplified explanation of the abstract

This abstract first appeared for US patent application 20240021189, titled 'CONFIGURABLE NEURAL SPEECH SYNTHESIS'.

Simplified Explanation

The abstract describes a discriminator, trained on labeled speech samples, that computes probabilities of voice properties. A speech synthesis generative neural network takes text and continuous-scale values of voice properties as input, and is trained to synthesize speech audio that the discriminator infers as matching those input values. Voice parameters can include speaker voice parameters, accents, and attitudes, among others. The network can be trained by transfer learning from an existing neural speech synthesis model, or with a loss function that considers both speech and parameter values. A graphical user interface lets voice designers synthesize speech with a desired voice or generate a speech synthesis engine with frozen (fixed) voice parameters. A vector of parameters can also be compared against previously registered voices in databases, such as those used for trademark registration.

  • A discriminator is trained to compute probabilities of voice properties based on labeled speech samples.
  • A speech synthesis generative neural network takes text and continuous-scale voice property values, and is trained to synthesize speech audio that the discriminator infers as matching those values.
  • Voice parameters such as speaker voice parameters, accents, and attitudes can be included.
  • Training can be done by transfer learning from an existing neural speech synthesis model, or with a loss function that considers both speech and parameter values.
  • A graphical user interface allows voice designers to synthesize speech with a desired voice or generate a speech synthesis engine with fixed voice parameters.
  • A vector of parameters can be compared to previously registered voices in databases, such as those used for trademark registration.
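
The abstract does not spell out the "loss function that considers speech and parameter values". As a minimal sketch of one way such a loss might be assembled, the Python below adds a speech term (mean squared error between synthesized and reference audio features) to a parameter term (cross-entropy between the property probabilities inferred by the discriminator and the requested property values). The function names, the choice of MSE and cross-entropy, and the `property_weight` factor are all assumptions made for illustration, not details from the application.

```python
import math
from typing import Sequence


def mse(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Speech term: mean squared error between two feature vectors."""
    if len(predicted) != len(target):
        raise ValueError("feature vectors must have the same length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def property_cross_entropy(
    inferred: Sequence[float], requested: Sequence[float], eps: float = 1e-7
) -> float:
    """Parameter term: mean cross-entropy between the probabilities the
    discriminator infers and the requested values, all in [0, 1].
    With soft targets the minimum is reached when inferred == requested,
    but that minimum is not zero."""
    if len(inferred) != len(requested):
        raise ValueError("property vectors must have the same length")
    total = 0.0
    for q, y in zip(inferred, requested):
        q = min(max(q, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(q) + (1.0 - y) * math.log(1.0 - q)
    return total / len(requested)


def combined_loss(
    synth_features: Sequence[float],
    ref_features: Sequence[float],
    inferred_props: Sequence[float],
    requested_props: Sequence[float],
    property_weight: float = 1.0,  # hypothetical weighting knob
) -> float:
    """Weighted sum of the speech term and the parameter term."""
    speech_term = mse(synth_features, ref_features)
    param_term = property_cross_entropy(inferred_props, requested_props)
    return speech_term + property_weight * param_term
```

Under these assumptions, a generator whose output the discriminator scores close to the requested values receives a lower loss than one whose output is scored far from them, which is the behaviour the abstract describes.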

Potential applications of this technology:

  • Voice design for products: The graphical user interface allows voice designers to synthesize speech with a desired voice, enabling the customization of voice properties for various products.
  • Speech synthesis engine generation: The system can generate a speech synthesis engine with fixed voice parameters, which can be used in applications such as virtual assistants and automated customer service systems.
  • Trademark registration: Voice parameter vectors can be compared to previously registered voices in databases, such as trademark registries, to help check that a new voice is distinct from registered ones.
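
Comparing a voice parameter vector against registered voices could be sketched as a nearest-neighbour lookup. The abstract does not name a distance metric or a decision rule, so the Euclidean distance, the `threshold` value, and the dictionary-based `registry` below are assumptions chosen only to make the idea concrete.

```python
import math
from typing import Mapping, Sequence, Tuple


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two voice parameter vectors."""
    if len(a) != len(b):
        raise ValueError("voice vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_registered_voice(
    candidate: Sequence[float], registry: Mapping[str, Sequence[float]]
) -> Tuple[str, float]:
    """Return the (name, distance) of the registered voice nearest to candidate."""
    if not registry:
        raise ValueError("registry is empty")
    return min(
        ((name, euclidean_distance(candidate, vec)) for name, vec in registry.items()),
        key=lambda pair: pair[1],
    )


def is_distinct(
    candidate: Sequence[float],
    registry: Mapping[str, Sequence[float]],
    threshold: float,  # hypothetical minimum separation between voices
) -> bool:
    """True if no registered voice lies within `threshold` of the candidate."""
    _, distance = closest_registered_voice(candidate, registry)
    return distance > threshold


# Example with made-up registry entries and parameter values.
registry = {"voice_a": [0.1, 0.9, 0.3], "voice_b": [0.8, 0.2, 0.5]}
name, distance = closest_registered_voice([0.15, 0.85, 0.3], registry)
```

A real registry would need an agreed parameter ordering and scale for the distances to be meaningful; the sketch simply assumes every vector uses the same layout.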

Problems solved by this technology:

  • Customization of voice properties: The system allows for the synthesis of speech with specific voice properties, enabling personalized and tailored voice experiences.
  • Efficient voice synthesis training: Transfer learning from an existing neural speech synthesis model, together with feedback from the discriminator, can make training the speech synthesis generative neural network more efficient.
  • Voice identity protection: The comparison of voice parameter vectors to registered voices helps in trademark registration, preventing unauthorized use and protecting voice identities.

Benefits of this technology:

  • Enhanced user experience: The ability to synthesize speech with desired voice properties provides a more engaging and personalized user experience in various applications.
  • Time and cost savings: The use of transfer learning and a discriminator in training the speech synthesis network can reduce the time and cost required for developing high-quality voice synthesis systems.
  • Voice identity protection: The system helps in protecting voice identities by comparing voice parameter vectors to registered voices, ensuring uniqueness and preventing unauthorized use.


Original Abstract Submitted

A discriminator trained on labeled samples of speech can compute probabilities of voice properties. A speech synthesis generative neural network that takes in text and continuous scale values of voice properties is trained to synthesize speech audio that the discriminator will infer as matching the values of the input voice properties. Voice parameters can include speaker voice parameters, accents, and attitudes, among others. Training can be done by transfer learning from an existing neural speech synthesis model or such a model can be trained with a loss function that considers speech and parameter values. A graphical user interface can allow voice designers for products to synthesize speech with a desired voice or generate a speech synthesis engine with frozen voice parameters. A vector of parameters can be used for comparison to previously registered voices in databases such as ones for trademark registration.
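
One way to picture a "speech synthesis engine with frozen voice parameters" is as a general synthesizer with its voice arguments bound once, leaving text as the only remaining input. The Python sketch below shows that binding step only. The `Synthesizer` signature, `freeze_voice`, `fake_synthesize`, and the property names are hypothetical stand-ins; no real synthesis model or SoundHound API is represented.

```python
from types import MappingProxyType
from typing import Callable, Mapping

# Hypothetical signature: (text, voice property values) -> audio bytes.
Synthesizer = Callable[[str, Mapping[str, float]], bytes]


def freeze_voice(
    synthesize: Synthesizer, voice: Mapping[str, float]
) -> Callable[[str], bytes]:
    """Return a text-only engine whose voice properties can no longer change."""
    # Copy, then wrap read-only, so later edits to the caller's mapping
    # (or attempts to mutate it inside the synthesizer) have no effect.
    frozen = MappingProxyType(dict(voice))

    def engine(text: str) -> bytes:
        return synthesize(text, frozen)

    return engine


def fake_synthesize(text: str, voice: Mapping[str, float]) -> bytes:
    """Stand-in for a real model: encodes the inputs instead of producing audio."""
    settings = ",".join(f"{key}={value:.2f}" for key, value in sorted(voice.items()))
    return f"{text}|{settings}".encode("utf-8")


# A designer picks values once, then ships the text-only engine.
designer_voice = {"warmth": 0.7, "accent_fr": 0.3}  # made-up property names
engine = freeze_voice(fake_synthesize, designer_voice)
designer_voice["warmth"] = 0.0  # does not affect the frozen engine
output = engine("hello")
```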