MULTI-TASK LEARNING FOR PERSONALIZED KEYWORD SPOTTING: abstract simplified (18153932)
Systems and techniques are described for processing audio data using personalized keyword spotting through multi-task learning (PK-MTL). An audio sample is obtained, and a representation of a keyword and a representation of a speaker are generated from that sample, where the speaker is associated with the keyword. A similarity score is then computed between a reference representation and either the keyword representation or the speaker representation. The score is compared against a threshold to determine whether the audio sample includes the target keyword.
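The scoring and threshold step can be illustrated with a minimal sketch. The abstract does not say how the similarity score is computed or what threshold is used, so the cosine similarity, the default threshold of 0.7, and the function names `cosine_similarity` and `contains_target_keyword` are all assumptions made for illustration. The representations are treated as plain lists of floats that some upstream model is assumed to have produced.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors.

    Assumption: the abstract only says "similarity score"; cosine
    similarity is one common choice, not necessarily the one claimed.
    """
    if len(a) != len(b):
        raise ValueError("representations must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector has no direction; treat it as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def contains_target_keyword(
    representation: Sequence[float],
    reference: Sequence[float],
    threshold: float = 0.7,  # hypothetical value, not from the source
) -> bool:
    """Compare the similarity score against a threshold.

    `representation` stands in for either the keyword representation or
    the speaker representation generated from the audio sample.
    """
    return cosine_similarity(representation, reference) >= threshold
```

For example, `contains_target_keyword([1.0, 0.0], [1.0, 0.0])` accepts the sample because the two vectors point in the same direction, while `contains_target_keyword([0.0, 1.0], [1.0, 0.0])` rejects it because they are orthogonal. In practice the threshold would be tuned on held-out data rather than fixed in advance.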