SYNTHETIC TRAINING DATA FOR GENERATIVE MODELS

Abstract: Implementations are directed to generating synthetic labeled/preference data by extracting preference pairs from sets of n outputs that a generative model produces for a given unlabeled input. A plurality of generative outputs is generated by a generative model from a set of input data. A reward model is used to determine a plurality of reward values, one for each of the generative outputs. Based on the reward values, a pair of generative outputs is selected from the plurality of generative outputs for inclusion in a training example. The pair includes a positive training example and a negative training example, where the reward values indicate that the positive training example is preferred over the negative training example. The process can be repeated for a plurality of sets of input data to generate a plurality of training examples for inclusion in a training dataset, which can be used to update reward model(s).
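The process described in the abstract can be illustrated with a minimal sketch. It is not the claimed implementation: the names `PreferencePair`, `select_pair`, and `build_dataset` are hypothetical, the generative model and reward model are passed in as plain callables, and pairing the highest-reward output with the lowest-reward output is an assumed selection rule (the abstract says only that the pair is selected based on the reward values).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class PreferencePair:
    """One synthetic training example (hypothetical container)."""
    prompt: str
    chosen: str    # positive example: the output with the higher reward
    rejected: str  # negative example: the output with the lower reward


def select_pair(
    prompt: str, outputs: Sequence[str], rewards: Sequence[float]
) -> Optional[PreferencePair]:
    """Pick a (positive, negative) pair from n scored outputs.

    Assumed rule: highest reward vs. lowest reward. Returns None when
    no strict preference exists (fewer than two outputs, or all tied).
    """
    if len(outputs) != len(rewards):
        raise ValueError("outputs and rewards must have the same length")
    if len(outputs) < 2:
        return None
    indices = range(len(outputs))
    best = max(indices, key=rewards.__getitem__)
    worst = min(indices, key=rewards.__getitem__)
    if rewards[best] <= rewards[worst]:
        return None  # all rewards equal, so nothing is preferred
    return PreferencePair(prompt, outputs[best], outputs[worst])


def build_dataset(
    prompts: Sequence[str],
    generate: Callable[[str, int], List[str]],  # stand-in for the generative model
    reward: Callable[[str, str], float],        # stand-in for the reward model
    n: int = 4,
) -> List[PreferencePair]:
    """Repeat generation, scoring, and pair selection for each unlabeled input."""
    dataset: List[PreferencePair] = []
    for prompt in prompts:
        outputs = generate(prompt, n)
        rewards = [reward(prompt, output) for output in outputs]
        pair = select_pair(prompt, outputs, rewards)
        if pair is not None:
            dataset.append(pair)
    return dataset
```

In practice `generate` would sample n outputs from a generative model and `reward` would call a trained reward model. The resulting list of pairs plays the role of the training dataset that the abstract says can be used to update reward model(s).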

Inventor(s): Aliaksei Severyn, Alizée Pace, Eric Malmi, Sebastian Krause, Jonathan Mallinson

CPC Classification: G06N3/0455 (Auto-encoder networks; Encoder-decoder networks)

Patent application number: 20250190762