18967529. MULTIMODAL DATA GENERATION (BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.)

From WikiPatents

MULTIMODAL DATA GENERATION

Organization Name

BEIJING BAIDU NETCOM SCIENCE TECHNOLOGY CO., LTD.

Inventor(s)

Shuohuan Wang of BEIJING CN

Yekun Chai of BEIJING CN

Siyu Ding of BEIJING CN

Junyuan Shang of BEIJING CN

Zhenyu Zhang of BEIJING CN

Yu Sun of BEIJING CN

Hao Tian of BEIJING CN

Hua Wu of BEIJING CN

Haifeng Wang of BEIJING CN

This abstract first appeared for US patent application 18967529 titled 'MULTIMODAL DATA GENERATION'.

Original Abstract Submitted

A multimodal data generation method is provided. The method includes: inputting a query data sequence into a multimodal model, to obtain a plurality of tokens in a response data sequence, where a current token is generated through the following operations: inputting the query data sequence and a current response data sequence into the multimodal model, so that the multimodal model generates the current token based on the query data sequence and the current response data sequence, in response to determining that the current token belongs to a first data modality; or inputting the query data sequence and a current response data sequence into the multimodal model, so that the multimodal model denoises an initial token sequence based on the query data sequence and the current response data sequence, to generate a result token sequence, in response to determining that the current token belongs to a second data modality.
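
The per-token branching described in the abstract can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The abstract does not say how the modality of the current token is determined, how long the initial token sequence is, or how denoising proceeds, so the sketch assumes all three: a hypothetical `<img>` marker switches to the second modality, the initial sequence is a fixed-length run of hypothetical `<mask>` placeholders, and each denoising pass resolves one position. `ToyMultimodalModel`, its methods, and the `generate` parameters are illustrative names only.

```python
from typing import List, Sequence

IMG_START = "<img>"  # hypothetical marker that opens a second-modality span
EOS = "<eos>"        # hypothetical end-of-response marker
MASK = "<mask>"      # hypothetical placeholder for a not-yet-denoised token


class ToyMultimodalModel:
    """Scripted stand-in for the multimodal model; every method is hypothetical."""

    def __init__(self, text_script: Sequence[str]) -> None:
        self._text = iter(text_script)

    def next_modality(self, query: Sequence[str], response: Sequence[str]) -> str:
        # Assumption: the token right after IMG_START belongs to the second modality.
        return "second" if response and response[-1] == IMG_START else "first"

    def next_token(self, query: Sequence[str], response: Sequence[str]) -> str:
        # First modality: emit one token. A real model would condition on
        # query + response; this toy just replays a script.
        return next(self._text, EOS)

    def initial_tokens(self, length: int) -> List[str]:
        # Second modality: the initial (fully "noisy") token sequence.
        return [MASK] * length

    def denoise(self, query: Sequence[str], response: Sequence[str],
                block: Sequence[str], step: int) -> List[str]:
        # One refinement pass: resolve the position indexed by `step`.
        out = list(block)
        if step < len(out):
            out[step] = f"img{step}"
        return out


def generate(model: ToyMultimodalModel, query: Sequence[str],
             max_steps: int = 32, block_len: int = 3,
             denoise_steps: int = 3) -> List[str]:
    """Build the response sequence, branching on the current token's modality."""
    response: List[str] = []
    for _ in range(max_steps):
        if model.next_modality(query, response) == "first":
            # First modality: generate the current token from query + response so far.
            token = model.next_token(query, response)
            if token == EOS:
                break
            response.append(token)
        else:
            # Second modality: denoise an initial token sequence into a result
            # token sequence, conditioned on query + response so far.
            block = model.initial_tokens(block_len)
            for step in range(denoise_steps):
                block = model.denoise(query, response, block, step)
            response.extend(block)
    return response


model = ToyMultimodalModel(["Here", IMG_START, "done", EOS])
result = generate(model, ["draw", "a", "cat"])
print(result)  # ['Here', '<img>', 'img0', 'img1', 'img2', 'done']
```

In this sketch the first-modality branch adds one token per model call, while the second-modality branch adds a whole result token sequence after several denoising passes. Both branches receive the same two inputs, the query data sequence and the current response data sequence, as in the abstract.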