18033758. AUDIO SIGNAL CONVERSION MODEL LEARNING APPARATUS, AUDIO SIGNAL CONVERSION APPARATUS, AUDIO SIGNAL CONVERSION MODEL LEARNING METHOD AND PROGRAM simplified abstract (Nippon Telegraph and Telephone Corporation)

AUDIO SIGNAL CONVERSION MODEL LEARNING APPARATUS, AUDIO SIGNAL CONVERSION APPARATUS, AUDIO SIGNAL CONVERSION MODEL LEARNING METHOD AND PROGRAM

Organization Name

Nippon Telegraph and Telephone Corporation

Inventor(s)

Hirokazu Kameoka of Musashino-shi (JP)

AUDIO SIGNAL CONVERSION MODEL LEARNING APPARATUS, AUDIO SIGNAL CONVERSION APPARATUS, AUDIO SIGNAL CONVERSION MODEL LEARNING METHOD AND PROGRAM - A simplified explanation of the abstract

This abstract first appeared for US patent application 18033758, titled 'AUDIO SIGNAL CONVERSION MODEL LEARNING APPARATUS, AUDIO SIGNAL CONVERSION APPARATUS, AUDIO SIGNAL CONVERSION MODEL LEARNING METHOD AND PROGRAM'.

Simplified Explanation

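The patent application describes a device that trains a model for converting voice signals. The device acquires a voice signal as input data for learning and converts it into learning stage conversion destination data. The conversion is driven by a score function, and the device updates that score function as it updates the conversion learning model. The target of the conversion is described by a probability density function, called the target feature amount distribution function, over series of voice feature amounts of a target voice signal with a predetermined attribute.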

  • The device acquires input voice data for learning and converts it using a conversion learning model.
  • The conversion learning model is updated by learning using a probability density function.
  • The probability density function represents the distribution of feature amounts of a target voice signal.
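  • The initial value point is the point in the feature vector space that represents the series of feature amounts of the input data for learning.
  • The score function takes any point x in that space and indicates the gradient of a path from x to the stationary point of the target feature amount distribution function that is nearest to the initial value point. The device converts the input data on the basis of this score function.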
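
The role of the initial value point and the score function can be illustrated with a minimal sketch. It is not the method claimed in the application: it assumes the target feature amount distribution is a known two-component Gaussian mixture whose score (the gradient of its log density) is available in closed form, and it assumes the conversion is a plain fixed-step gradient ascent. The names gmm_score and convert, the step size, and the toy 2-D feature space are all hypothetical.

```python
import numpy as np

def gmm_score(x, means, sigma):
    """Gradient of log p(x) for an equal-weight isotropic Gaussian mixture
    (a stand-in for the target feature amount distribution function)."""
    diffs = means - x                                   # shape (K, D)
    log_w = -np.sum(diffs ** 2, axis=1) / (2 * sigma ** 2)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()                                        # component responsibilities
    return (w[:, None] * diffs).sum(axis=0) / sigma ** 2

def convert(x0, score_fn, step=0.1, n_steps=200):
    """Follow the score from the initial value point x0 towards a nearby
    stationary point of the target distribution (assumed update rule)."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = x + step * score_fn(x)
    return x

# Toy target distribution with two modes in a 2-D feature space.
means = np.array([[0.0, 0.0], [5.0, 5.0]])
sigma = 1.0

# Initial value point: features of the (hypothetical) input voice signal.
x0 = np.array([4.0, 4.5])
converted = convert(x0, lambda x: gmm_score(x, means, sigma))
print(converted)
```

In this toy setting the input starts closer to the mode at (5, 5), so the iteration settles there rather than at the other mode, which mirrors the idea of moving to the stationary point nearest to the initial value point.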

Potential Applications

  • Speech recognition and transcription systems
  • Voice conversion for entertainment purposes (changing voices in movies or video games)
  • Language translation systems with voice conversion capabilities

Problems Solved

  • Accurate conversion of voice signals with different attributes
  • Learning and adapting to different voice feature distributions
  • Efficient conversion of voice data for learning purposes

Benefits

  • Improved accuracy in voice signal conversion
  • Adaptability to different voice attributes and feature distributions
  • Enhanced learning capabilities for voice conversion models


Original Abstract Submitted

A voice signal conversion model learning device including: a data-for-learning acquisition unit that acquires input data for learning that is a voice signal input; a conversion learning model execution unit that executes a conversion learning model that converts the input data for learning into learning stage conversion destination data; and an update unit that updates the conversion learning model by learning, in which: a probability density function is defined as a target feature amount distribution function, the probability density function being a function on a vector space representing a series of voice feature amounts and representing a distribution of a series of voice feature amounts of a target voice signal that is a voice signal having a predetermined attribute; a point is defined as an initial value point, the point being in the vector space and representing a series of feature amounts of the input data for learning; a function is defined as a score function, the function having a point x in the vector space as an independent variable and indicating a gradient of a path from the point x to a stationary point that is on the target feature amount distribution function and is nearest to the initial value point; the conversion learning model execution unit performs conversion of the input data for learning on the basis of the score function; and the update unit updates the score function in updating the conversion learning model.
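
The abstract says the update unit updates the score function by learning but does not name a training objective. As a rough illustration only, the sketch below uses denoising score matching, a common way to fit a score function from samples, which is an assumption and not something the abstract states. The target features are hypothetical samples from a 2-D Gaussian, and the score model is a simple linear map fitted by least squares; the names mu, tau, s, W, b and learned_score are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target-voice features: samples from N(mu, tau^2 I) in 2-D.
mu, tau = np.array([2.0, -1.0]), 1.0
x = mu + tau * rng.standard_normal((20000, 2))

# Denoising score matching: perturb the data with Gaussian noise of scale s
# and regress onto -noise / s, the score of the perturbation kernel.
s = 0.5
eps = rng.standard_normal(x.shape)
x_noisy = x + s * eps
target = -eps / s

# Linear score model score(p) = p @ W + b, fitted in closed form.
design = np.hstack([x_noisy, np.ones((len(x_noisy), 1))])
coef, *_ = np.linalg.lstsq(design, target, rcond=None)
W, b = coef[:2], coef[2]

def learned_score(p):
    """Learned approximation of the score of the noise-smoothed target density."""
    return p @ W + b

# The learned score vanishes at its stationary point, which should lie near mu.
stationary = np.linalg.solve(W.T, -b)
print(W, b, stationary)
```

For this Gaussian toy case the noise-smoothed density has variance tau^2 + s^2 = 1.25, so the fitted W should be close to -0.8 times the identity and the stationary point close to mu. A practical system would presumably replace the linear map with a neural network trained iteratively.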