US Patent Application 17662435: SELF-SUPERVISED SPEECH RECOGNITION (simplified abstract)

SELF-SUPERVISED SPEECH RECOGNITION

Organization Name

INTERNATIONAL BUSINESS MACHINES CORPORATION


Inventor(s)

Cheng-I Lai of Cambridge MA (US)

Yang Zhang of Cambridge MA (US)

Kaizhi Qian of Champaign IL (US)

Chuang Gan of Cambridge MA (US)

James R. Glass of Winchester MA (US)

Alexander Haojan Liu of Malden MA (US)

SELF-SUPERVISED SPEECH RECOGNITION - A simplified explanation of the abstract

This abstract first appeared for US patent application 17662435, titled 'SELF-SUPERVISED SPEECH RECOGNITION'.

Simplified Explanation

The patent application describes a method for improving the performance of a pruned self-supervised learning (SSL) speech model through a finetuning process.

  • Using one or more computer processors, the method obtains an initial subnetwork at a target sparsity, together with a pruning mask, from a pre-trained SSL speech model.
  • The initial subnetwork is adjusted by zeroing out the weights specified by the pruning mask.
  • A new subnetwork is then trained from the adjusted subnetwork.
  • To restore the target sparsity, the method prunes the lowest-magnitude weights in the new subnetwork, regardless of network structure (i.e., unstructured pruning).
  • Finally, the finetuned subnetwork is used to classify audio segments.

Overall, this patent application presents a technique for refining a pruned self-supervised learning speech model by alternately zeroing out, retraining, and re-pruning network weights while maintaining a target sparsity, so that the resulting sparse subnetwork performs well on audio classification tasks.
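
To make the loop concrete, here is a minimal, hypothetical PyTorch sketch of this prune, adjust, and re-prune cycle. The toy two-layer model, the random training batches, the learning rate, and the 50% target sparsity are illustrative assumptions, not details from the application.

```python
import torch
import torch.nn as nn

def global_magnitude_mask(model, sparsity):
    # Global unstructured magnitude pruning: rank all weights by |w|
    # across the whole network, regardless of layer structure, and
    # mask out the smallest fraction given by `sparsity`.
    all_w = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    k = max(1, int(sparsity * all_w.numel()))
    threshold = all_w.kthvalue(k).values
    return [(p.detach().abs() > threshold).float() for p in model.parameters()]

def apply_mask(model, mask):
    # Zero out the masked weights in place.
    with torch.no_grad():
        for p, m in zip(model.parameters(), mask):
            p.mul_(m)

def finetune(model, mask, batches, loss_fn, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for x, y in batches:
        apply_mask(model, mask)        # zero out the masked weights ...
        loss = loss_fn(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()                     # ... but let updates revive them
    return model

# Toy stand-in for a pre-trained SSL speech model (assumption for the demo).
model = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 10))
batches = [(torch.randn(8, 40), torch.randint(0, 10, (8,))) for _ in range(5)]
loss_fn = nn.CrossEntropyLoss()

target = 0.5                                     # assumed target sparsity
mask = global_magnitude_mask(model, target)      # initial subnetwork + mask
finetune(model, mask, batches, loss_fn)          # adjust and retrain
apply_mask(model, global_magnitude_mask(model, target))  # re-prune to target
```

One detail worth noting in this style of loop: the optimizer step can regrow weights that the mask zeroed out, which is why the final re-pruning pass is needed to restore the target sparsity.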


Original Abstract Submitted

One or more computer processors obtain an initial subnetwork at a target sparsity and an initial pruning mask from a pre-trained self-supervised learning (SSL) speech model. The one or more computer processors finetune the initial subnetwork, comprising: the one or more computer processors zero out one or more masked weights in the initial subnetwork specified by the initial pruning mask; the one or more computer processors train a new subnetwork from the zeroed out subnetwork; the one or more computer processors prune one or more weights of lowest magnitude in the new subnetwork regardless of network structure to satisfy the target sparsity. The one or more computer processors classify an audio segment with the finetuned subnetwork.
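
Continuing the sketch above, the abstract's final step, classifying an audio segment with the finetuned subnetwork, might look like the following; the 40-dimensional feature vector is the same toy stand-in, not a detail from the application.

```python
# Hypothetical inference with the finetuned sparse subnetwork from the
# sketch above; `segment` stands in for real acoustic features
# extracted from an audio segment.
with torch.no_grad():
    segment = torch.randn(1, 40)
    predicted_class = model(segment).argmax(dim=-1)
```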