Intel Corporation (20250014590). MULTIMODAL LARGE LANGUAGE MODEL WITH AUDIO TRIGGER


MULTIMODAL LARGE LANGUAGE MODEL WITH AUDIO TRIGGER

Organization Name

Intel Corporation

Inventor(s)

Kuba Lopatka of Gdansk (PL)

This abstract first appeared for US patent application 20250014590 titled 'MULTIMODAL LARGE LANGUAGE MODEL WITH AUDIO TRIGGER'.



Original Abstract Submitted

Systems and methods to trigger LLM inference based on the presence of relevant audio, such as a keyword or sound event of interest. A detection head receives acoustic embeddings from an audio encoder and determines whether the audio stream includes relevant sounds (e.g., a selected audio trigger). When the audio stream does not include relevant sounds, multimodal LLM inference is bypassed, thereby saving power and protecting privacy. When relevant sounds are detected in the audio stream by the detector, the acoustic embeddings from the audio encoder are transmitted to the multimodal LLM, which proceeds to perform inference on the acoustic embeddings. The audio encoder and/or detection head can be offloaded in the hardware and implemented before the multimodal LLM in the hardware pipeline, while the multimodal LLM can be implemented in a neural processing unit.
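
The gating flow described in the abstract can be illustrated with a minimal Python sketch. It is not the implementation from the patent application: the function names (`encode_audio`, `detection_head`, `gated_inference`), the two-feature "embedding", the logistic scoring, and the weights, bias, and threshold values are all hypothetical stand-ins chosen only to show the control flow, in which the LLM is called only when the detection head reports a relevant sound.

```python
import math
from typing import Callable, List, Optional, Sequence

Embedding = List[float]


def encode_audio(frame: Sequence[float]) -> Embedding:
    """Hypothetical stand-in for the audio encoder.

    Returns a toy two-value embedding: mean absolute amplitude and peak
    amplitude. A real encoder would be a learned model.
    """
    if not frame:
        return [0.0, 0.0]
    mean_abs = sum(abs(x) for x in frame) / len(frame)
    peak = max(abs(x) for x in frame)
    return [mean_abs, peak]


def detection_head(embedding: Embedding,
                   weights: Sequence[float],
                   bias: float) -> float:
    """Hypothetical detection head: logistic score over the embedding."""
    z = sum(w * e for w, e in zip(weights, embedding)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def gated_inference(frame: Sequence[float],
                    llm: Callable[[Embedding], str],
                    threshold: float = 0.5,
                    weights: Sequence[float] = (4.0, 4.0),
                    bias: float = -3.0) -> Optional[str]:
    """Run the LLM only when the detection head fires.

    The weights, bias, and threshold are assumed values for illustration.
    """
    embedding = encode_audio(frame)
    score = detection_head(embedding, weights, bias)
    if score < threshold:
        # No relevant sound: bypass LLM inference entirely.
        return None
    # Relevant sound detected: forward the same embeddings to the LLM.
    return llm(embedding)
```

In this sketch the embeddings are computed once and reused, so the detection head adds only a small scoring step in front of the LLM, and a frame that does not trigger the detector never reaches the model.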