Google LLC (20250131208). Contrastive Pre-Training for Language Tasks
Inventor(s)
Thang Minh Luong (Santa Clara, CA, US)
Kevin Stefan Clark (San Francisco, CA, US)
This abstract first appeared for US patent application 20250131208, titled 'Contrastive Pre-Training for Language Tasks'.
Original Abstract Submitted
Systems and methods are provided that train a machine-learned language encoding model through the use of a contrastive learning task. In particular, the present disclosure describes a contrastive learning task where the encoder learns to distinguish input tokens from plausible alternatives. In some implementations, on each training example the proposed method masks out some subset (e.g., 15%) of the original input tokens, replaces the masked tokens with samples from a "generator" (e.g., which may be a small masked language model), and then trains the encoder to predict whether each token comes from the original data or is a replacement produced by the generator.
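The example-construction step described above can be sketched in a few lines. This is a toy illustration, not the patented implementation: the "generator" here is a uniform random sampler over a small vocabulary standing in for the small masked language model, and the function name, vocabulary, and labeling convention for coincidental matches are assumptions for illustration.

```python
import random

def make_rtd_example(tokens, vocab, mask_frac=0.15, seed=0):
    """Build one replaced-token-detection training example (toy sketch).

    Masks out a subset (mask_frac) of the input tokens, replaces each
    masked token with a sample from a stand-in "generator" (a uniform
    sampler over `vocab`), and labels every position 0 (original) or
    1 (replaced) -- the per-token targets the encoder is trained to
    predict.
    """
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * mask_frac))
    masked_positions = set(rng.sample(range(len(tokens)), n_mask))

    corrupted, labels = [], []
    for i, tok in enumerate(tokens):
        if i in masked_positions:
            replacement = rng.choice(vocab)  # generator sample
            corrupted.append(replacement)
            # Assumption: if the generator happens to reproduce the
            # original token, the position is labeled "original" (0).
            labels.append(0 if replacement == tok else 1)
        else:
            corrupted.append(tok)
            labels.append(0)
    return corrupted, labels
```

A real pipeline would replace the uniform sampler with a small masked language model's predictions and feed `(corrupted, labels)` to the encoder with a per-token binary classification loss.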