18925017. GENERATING DIVERSE DATASETS USING MACHINE-LEARNED LARGE LANGUAGE MODELS (LLMS) BASED ON VECTOR DISTANCE CONSTRAINTS (Maplebear Inc.)
Organization Name
Maplebear Inc.
Inventor(s)
Jacob Jensen of Metuchen NJ US
Guanghua Shu of Sunnyvale CA US
This abstract first appeared for US patent application 18925017, titled 'GENERATING DIVERSE DATASETS USING MACHINE-LEARNED LARGE LANGUAGE MODELS (LLMS) BASED ON VECTOR DISTANCE CONSTRAINTS'.
Original Abstract Submitted
An online system augments a dataset in conjunction with a model serving system. The online system accesses a dataset for training a machine-learning model. The online system generates a prompt requesting candidate samples for the training dataset and provides the prompt to the model serving system. The online system receives a response comprising one or more candidate samples. The online system compares the one or more candidate samples to at least one existing sample of the dataset to determine whether each candidate sample is within a threshold level of similarity to an existing sample. If a candidate sample received from the machine-learning language model is not within the threshold level of similarity to an existing sample, the online system updates the dataset with the candidate sample.
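The filtering step described in the abstract can be illustrated with a minimal Python sketch. It is not the method claimed in the application. The function names (`embed`, `cosine_similarity`, `augment`), the bag-of-words embedding, the cosine measure, and the `max_similarity` value of 0.8 are all assumptions made for illustration. The abstract specifies neither the embedding model nor the distance metric, and the call to the model serving system is left out, so candidates are passed in as a plain list. The sketch also checks each candidate against samples accepted earlier in the same batch, which is a design choice rather than something the abstract states.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for a real embedding model: word-count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def augment(dataset, candidates, max_similarity=0.8):
    """Return a copy of dataset extended with sufficiently different candidates.

    A candidate is kept only if its similarity to every sample already in
    the dataset (including candidates accepted earlier) is below
    max_similarity, an assumed threshold.
    """
    accepted = list(dataset)
    vectors = [embed(sample) for sample in accepted]
    for candidate in candidates:
        vector = embed(candidate)
        if all(cosine_similarity(vector, v) < max_similarity for v in vectors):
            accepted.append(candidate)
            vectors.append(vector)
    return accepted


existing = ["organic whole milk one gallon"]
generated = ["organic whole milk one gallon jug", "fresh cilantro bunch"]
print(augment(existing, generated))
# -> ['organic whole milk one gallon', 'fresh cilantro bunch']
```

The first candidate is rejected because its similarity to the existing sample is about 0.91, above the assumed threshold. The second shares no words with the dataset and is added. In practice a learned embedding would replace the word-count vector, and the threshold would be tuned to the dataset.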