METHODS AND SYSTEMS FOR SPEECH EMOTION RETRIEVAL VIA NATURAL LANGUAGE PROMPTS

Abstract: Methods and systems for generating training data for training a contrastive language-audio machine-learning model. A plurality of audio segments are retrieved from a speech emotion recognition (SER) database along with metadata associated with the audio segments. The metadata of each audio segment includes an emotion class. Words or terms associated with emotions are retrieved from a lexicon. A large language model (LLM) is executed on (i) the classes of emotion associated with the audio segments and (ii) the words or terms from the lexicon. This generates a plurality of text captions associated with emotion, which are stored in a caption pool. For each audio segment retrieved from the SER database, that audio segment is paired with one or more of the text captions from the caption pool that were generated based on the emotion class associated with that audio segment. This yields audio-text pairs for training a contrastive learning model.
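
The pairing step described in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation. The segment list, the lexicon contents, the function names, and the template-based `generate_captions` (which merely stands in for the LLM call) are all hypothetical and introduced only for illustration.

```python
import random
from collections import defaultdict

# Hypothetical stand-ins for SER database rows: (audio path, emotion class).
SEGMENTS = [
    ("clip_001.wav", "angry"),
    ("clip_002.wav", "happy"),
    ("clip_003.wav", "angry"),
]

# Hypothetical lexicon: emotion class -> associated words or terms.
LEXICON = {
    "angry": ["furious", "irritated"],
    "happy": ["joyful", "cheerful"],
}


def generate_captions(emotion, terms):
    """Placeholder for the LLM step (assumption: a real system would prompt
    an LLM with the emotion class and lexicon terms instead of a template)."""
    return [f"The speaker sounds {term}, expressing a {emotion} emotion."
            for term in terms]


def build_caption_pool(segments, lexicon):
    """Generate captions once per emotion class and store them in a pool."""
    pool = defaultdict(list)
    for emotion in sorted({emotion for _, emotion in segments}):
        pool[emotion] = generate_captions(emotion, lexicon.get(emotion, []))
    return pool


def pair_segments(segments, pool, k=1, seed=0):
    """Pair each segment with up to k captions drawn from its own class."""
    rng = random.Random(seed)
    pairs = []
    for path, emotion in segments:
        captions = pool.get(emotion, [])
        for caption in rng.sample(captions, min(k, len(captions))):
            pairs.append((path, caption))
    return pairs


caption_pool = build_caption_pool(SEGMENTS, LEXICON)
audio_text_pairs = pair_segments(SEGMENTS, caption_pool, k=1)
```

Each resulting `(path, caption)` tuple is one audio-text pair. Because captions are drawn only from the pool entry for the segment's own emotion class, every pair stays consistent with the metadata label, which is the property the abstract relies on for contrastive training.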

Inventor(s): Wei-Cheng Lin, Ho-Hsiang Wu, Shabnam Ghaffarzadegan, Luca Bondi, Abinaya Kumar, Samarjit Das

CPC Classification: G06N3/08 (Learning methods)

Patent Application Number: 20250217638