18216271. SYNTHETIC DATA GENERATION FOR MACHINE LEARNING MODELS (Amazon Technologies, Inc.)

From WikiPatents

SYNTHETIC DATA GENERATION FOR MACHINE LEARNING MODELS

Organization Name

Amazon Technologies, Inc.

Inventor(s)

Rahul Gupta of Waltham MA (US)

Ninareh Mehrabi of Glendale CA (US)

Palash Goyal of San Jose CA (US)

Kai-Wei Chang of Los Angeles CA (US)

Aram Galstyan of Los Angeles CA (US)

This abstract first appeared for US patent application 18216271, titled 'SYNTHETIC DATA GENERATION FOR MACHINE LEARNING MODELS'.

Original Abstract Submitted

Techniques for generating synthetic data for machine learning (ML) models are described. A system includes a language model that processes a task and a corresponding set of example inputs to generate another input, referred to herein as machine-generated data. The machine-generated data is processed using the ML model for which the data is being generated to determine a model output, and the model output is analyzed to determine whether it corresponds to a target output. If the model output corresponds to the target output, the machine-generated data is added to the set of example inputs and one of the original example inputs is removed, producing an updated set of example inputs. The updated set can be used for various training techniques.
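
The loop described in the abstract can be sketched roughly as follows. This is only an illustrative reading of the abstract, not the method claimed in the application. All names here (`refine_examples`, `generate`, `target_model`, `matches_target`, and the toy stand-ins) are hypothetical. The abstract does not say which existing example is removed, so the sketch assumes the oldest one is dropped.

```python
from typing import Callable, List


def refine_examples(
    task: str,
    examples: List[str],
    generate: Callable[[str, List[str]], str],
    target_model: Callable[[str], str],
    matches_target: Callable[[str], bool],
    rounds: int = 10,
) -> List[str]:
    """Iteratively update a set of example inputs (hypothetical sketch).

    generate:       stand-in for the language model; maps (task, examples)
                    to one new machine-generated input.
    target_model:   stand-in for the ML model the data is generated for.
    matches_target: checks whether a model output corresponds to the
                    target output.
    """
    current = list(examples)  # copy so the caller's list is not mutated
    for _ in range(rounds):
        candidate = generate(task, current)
        output = target_model(candidate)
        if matches_target(output):
            # Assumption: drop the oldest example so the set size stays fixed.
            if current:
                current.pop(0)
            current.append(candidate)
    return current


def make_toy_generator() -> Callable[[str, List[str]], str]:
    """Toy generator: returns '<task> example <n>' with an increasing n."""
    counter = 0

    def generate(task: str, examples: List[str]) -> str:
        nonlocal counter
        counter += 1
        return f"{task} example {counter}"

    return generate


def toy_model(text: str) -> str:
    """Toy model: labels an input by the parity of its trailing number."""
    number = int(text.rsplit(" ", 1)[-1])
    return "even" if number % 2 == 0 else "odd"


if __name__ == "__main__":
    updated = refine_examples(
        task="greet",
        examples=["seed a", "seed b"],
        generate=make_toy_generator(),
        target_model=toy_model,
        matches_target=lambda out: out == "even",
        rounds=4,
    )
    print(updated)
```

In this toy run, only candidates that the stand-in model labels "even" are accepted, and each accepted candidate replaces one earlier example, so the set keeps its original size. In a real setting, `generate` would wrap a call to a language model and `target_model` would wrap the model under training or evaluation.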