MULTIMODAL LARGE LANGUAGE MODEL WITH AUDIO TRIGGER

Organization Name

Inventor(s)

This abstract first appeared for US patent application 20250014590 titled 'MULTIMODAL LARGE LANGUAGE MODEL WITH AUDIO TRIGGER'.

Original Abstract Submitted

Systems and methods to trigger LLM inference based on the presence of relevant audio, such as a keyword or sound event of interest. A detection head receives acoustic embeddings from an audio encoder and determines whether the audio stream includes relevant sounds (e.g., a selected audio trigger). When the audio stream does not include relevant sounds, multimodal LLM inference is bypassed, thereby saving power and protecting privacy. When relevant sounds are detected in the audio stream by the detection head, the acoustic embeddings from the audio encoder are transmitted to the multimodal LLM, which proceeds to perform inference on the acoustic embeddings. The audio encoder and/or detection head can be offloaded in the hardware and implemented before the multimodal LLM in the hardware pipeline, while the multimodal LLM can be implemented in a neural processing unit.
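The gating flow described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the patent application: the function names (`audio_encoder`, `detection_head`, `multimodal_llm`), the two-value toy embedding, and the 0.5 threshold are all hypothetical stand-ins. A real system would use trained neural networks for the encoder and detector. The sketch shows only the control flow: the embedding is computed once, the detection head decides whether to proceed, and the same embedding is passed to the LLM only when a trigger is found.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence

Embedding = List[float]


def audio_encoder(frame: Sequence[float]) -> Embedding:
    """Hypothetical stand-in for the audio encoder.

    Returns a toy two-value embedding: [mean absolute amplitude, peak amplitude].
    """
    if not frame:
        return [0.0, 0.0]
    mean_abs = sum(abs(x) for x in frame) / len(frame)
    peak = max(abs(x) for x in frame)
    return [mean_abs, peak]


def detection_head(embedding: Embedding, threshold: float = 0.5) -> bool:
    """Hypothetical stand-in for the detection head.

    Fires when the peak component reaches an assumed threshold.
    """
    return embedding[1] >= threshold


def multimodal_llm(embedding: Embedding) -> str:
    """Hypothetical stand-in for multimodal LLM inference."""
    return f"LLM inference on embedding {embedding}"


@dataclass
class GatedPipeline:
    """Runs the encoder on every frame but the LLM only on triggered frames."""

    llm_calls: int = 0  # counts how often the expensive stage actually ran

    def process(self, frame: Sequence[float]) -> Optional[str]:
        embedding = audio_encoder(frame)
        if not detection_head(embedding):
            # No relevant sound: bypass LLM inference entirely.
            return None
        # Trigger detected: reuse the already computed embedding.
        self.llm_calls += 1
        return multimodal_llm(embedding)


if __name__ == "__main__":
    pipeline = GatedPipeline()
    quiet_frame = [0.01, -0.02, 0.015]
    loud_frame = [0.1, -0.9, 0.3]
    print(pipeline.process(quiet_frame))
    print(pipeline.process(loud_frame))
    print("LLM calls:", pipeline.llm_calls)
```

In this sketch the quiet frame never reaches `multimodal_llm`, which mirrors the power and privacy argument in the abstract: audio without a trigger is discarded after the lightweight detector stage.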

Intel Corporation (20250014590). MULTIMODAL LARGE LANGUAGE MODEL WITH AUDIO TRIGGER
