US Patent Application 17662435. SELF-SUPERVISED SPEECH RECOGNITION simplified abstract
SELF-SUPERVISED SPEECH RECOGNITION
Organization Name
INTERNATIONAL BUSINESS MACHINES CORPORATION
Inventor(s)
Cheng-I Lai of Cambridge MA (US)
Yang Zhang of Cambridge MA (US)
Kaizhi Qian of Champaign IL (US)
Chuang Gan of Cambridge MA (US)
James R. Glass of Winchester MA (US)
Alexander Haojan Liu of Malden MA (US)
SELF-SUPERVISED SPEECH RECOGNITION - A simplified explanation of the abstract
This abstract first appeared for US patent application 17662435, titled 'SELF-SUPERVISED SPEECH RECOGNITION'.
Simplified Explanation
The patent application describes a method for improving the performance of a self-supervised learning (SSL) speech model through a process called finetuning.
- The method involves using one or more computer processors to obtain an initial subnetwork at a target sparsity, along with an initial pruning mask, from a pre-trained SSL speech model.
- The initial subnetwork is adjusted by zeroing out certain weights specified by the pruning mask.
- A new subnetwork is then trained from the adjusted subnetwork.
- To achieve the target sparsity, the method further prunes the weights with the lowest magnitude in the new subnetwork, regardless of network structure (i.e., unstructured pruning).
- Finally, the finetuned subnetwork is used to classify audio segments.
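The steps above can be sketched in a few lines of NumPy. This is an illustrative toy, not the patented implementation: the function names, the identity `train_step` placeholder, and the tie-breaking behavior at the magnitude threshold are all assumptions for the sketch.

```python
import numpy as np

def magnitude_prune_mask(weights, target_sparsity):
    """Build a binary mask that zeroes out the lowest-magnitude weights,
    ignoring network structure (unstructured magnitude pruning)."""
    flat = np.abs(weights).ravel()
    k = int(target_sparsity * flat.size)  # number of weights to prune
    if k == 0:
        return np.ones_like(weights)
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    return (np.abs(weights) > threshold).astype(weights.dtype)

def finetune_subnetwork(weights, mask, target_sparsity, train_step, steps=2):
    """Sketch of the described finetuning loop: zero out the masked weights,
    train the resulting subnetwork, then re-prune the lowest-magnitude
    weights so the target sparsity is satisfied again."""
    for _ in range(steps):
        weights = weights * mask        # zero out weights under the current mask
        weights = train_step(weights)   # train a new subnetwork (placeholder)
        mask = magnitude_prune_mask(weights, target_sparsity)  # re-prune
    return weights * mask, mask
```

With a real model, `train_step` would run gradient updates on the unmasked weights; here it can be any callable, e.g. the identity function, to see the mask converge.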
Overall, this patent application presents a technique for refining a self-supervised learning speech model by iteratively adjusting and pruning the network weights to improve its performance in audio classification tasks.
Original Abstract Submitted
One or more computer processors obtain an initial subnetwork at a target sparsity and an initial pruning mask from a pre-trained self-supervised learning (SSL) speech model. The one or more computer processors finetune the initial subnetwork, comprising: the one or more computer processors zero out one or more masked weights in the initial subnetwork specified by the initial pruning mask; the one or more computer processors train a new subnetwork from the zeroed out subnetwork; the one or more computer processors prune one or more weights of lowest magnitude in the new subnetwork regardless of network structure to satisfy the target sparsity. The one or more computer processors classify an audio segment with the finetuned subnetwork.
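The abstract's final step, classifying an audio segment with the finetuned subnetwork, can be illustrated with a minimal linear-classifier sketch. All names here are hypothetical; a real system would apply the sparse weights inside the full SSL speech model rather than a single matrix product.

```python
import numpy as np

def classify_segment(features, weights, mask):
    """Classify an audio segment's feature vector with the finetuned
    subnetwork: pruned (masked) weights stay zero at inference time."""
    logits = features @ (weights * mask)  # apply the sparse weight matrix
    return int(np.argmax(logits))         # index of the predicted class
```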