US Patent Application 18080713. METHOD AND APPARATUS FOR TRAINING NEURAL NETWORK, AND METHOD AND APPARATUS FOR AUDIO PROCESSING simplified abstract

From WikiPatents

METHOD AND APPARATUS FOR TRAINING NEURAL NETWORK, AND METHOD AND APPARATUS FOR AUDIO PROCESSING

Organization Name

BEIJING XIAOMI MOBILE SOFTWARE CO., LTD.

Inventor(s)

Wei Kang of Beijing (CN)

Povey Daniel of Beijing (CN)

Fangjun Kuang of Beijing (CN)

Liyong Guo of Beijing (CN)

Zengwei Yao of Beijing (CN)

Long Lin of Beijing (CN)

Mingshuang Luo of Beijing (CN)

METHOD AND APPARATUS FOR TRAINING NEURAL NETWORK, AND METHOD AND APPARATUS FOR AUDIO PROCESSING - A simplified explanation of the abstract

This abstract first appeared for US patent application 18080713, titled 'METHOD AND APPARATUS FOR TRAINING NEURAL NETWORK, AND METHOD AND APPARATUS FOR AUDIO PROCESSING'.

Simplified Explanation

The patent application describes a method and apparatus for training a neural network, and a method and apparatus for audio processing. Training proceeds in the following steps:

  • Training audio data is encoded by an encoder network to obtain a first encoding result, and the text label corresponding to that audio data is fed to a prediction network to obtain a first prediction result.
  • The first encoding result and the first prediction result are combined to obtain a first joint result.
  • The first encoding result and the first prediction result are pruned based on the first joint result to obtain a second encoding result and a second prediction result.
  • The second encoding result and the second prediction result are jointly processed using a joiner network to obtain a second joint result.
  • The network parameters of the encoder network, prediction network, and joiner network are adjusted based on the second joint result.
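
The steps above can be sketched in a few lines of plain Python. This is only an illustrative toy under stated assumptions, not the method claimed in the application: the dot-product scoring in `simple_joint`, the window heuristic in `prune_starts`, the `tanh` joiner, and all names and sizes (`T`, `U`, `D`, `S`) are hypothetical stand-ins, since the abstract does not say how the first joint result is computed or how the pruning bounds are chosen. The final parameter update is left out; the point is that the joiner network only sees a `T x S` grid rather than the full `T x (U + 1)` grid.

```python
import math
import random

def simple_joint(enc, pred):
    # First joint result: one cheap score per (frame t, label position u).
    # Assumption: a dot product stands in for the real combination.
    return [[sum(e * p for e, p in zip(enc_t, pred_u)) for pred_u in pred]
            for enc_t in enc]

def prune_starts(joint, width):
    # For each frame, pick the start of the `width`-wide window of label
    # positions with the highest total score, then force the starts to be
    # non-decreasing over time (hypothetical heuristic).
    starts, prev = [], 0
    for row in joint:
        last = len(row) - width
        best = max(range(last + 1), key=lambda s: sum(row[s:s + width]))
        prev = max(prev, best)
        starts.append(prev)
    return starts

def joiner(enc_t, pred_u, weights):
    # Toy joiner network: tanh of the summed inputs, projected to a scalar.
    return sum(w * math.tanh(e + p) for w, e, p in zip(weights, enc_t, pred_u))

def pruned_joint(enc, pred, weights, width):
    joint1 = simple_joint(enc, pred)          # first joint result
    starts = prune_starts(joint1, width)      # pruning bounds per frame
    # Second results: each frame keeps only `width` prediction vectors,
    # and the joiner runs on those pairs alone (second joint result).
    joint2 = [[joiner(enc_t, pred[s + k], weights) for k in range(width)]
              for enc_t, s in zip(enc, starts)]
    return joint2, starts

random.seed(0)
T, U, D, S = 6, 4, 3, 2  # frames, label length, feature size, window width
enc = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(T)]
pred = [[random.uniform(-1, 1) for _ in range(D)] for _ in range(U + 1)]
weights = [random.uniform(-1, 1) for _ in range(D)]

joint2, starts = pruned_joint(enc, pred, weights, S)
print(len(joint2), len(joint2[0]))  # T rows, S columns instead of U + 1
```

In a real training loop, a loss computed from the second joint result would drive the update of the encoder, prediction, and joiner parameters; the toy above stops before that step.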


Original Abstract Submitted

The present disclosure provides a method and apparatus for training a neural network, and a method and apparatus for audio processing. The method includes: encoding training audio data input to an encoder network to obtain a first encoding result, and predicting a text label corresponding to the training audio data input to a prediction network to obtain a first prediction result; jointing the first encoding result with the first prediction result to obtain a first joint result; pruning the first encoding result and the first prediction result according to the first joint result to obtain a second encoding result and a second prediction result; performing a joint processing on the second encoding result and the second prediction result input to a joiner network to obtain a second joint result, and adjusting network parameters of the encoder network, the prediction network and the joiner network according to the second joint result.